Questions

Ethan Wong edited this page Apr 20, 2023 · 6 revisions

wtf

  • What does it mean when there are sets of similar pages with no info? Does this refer to URLs that differ only in the fragment, e.g. ics.uci.edu#aaa vs ics.uci.edu#bbb?
  • How do we detect large files? Some parameter in pickles?
  • What counts as a page with "high textual content"?
  • Do we need to explicitly add Beautiful Soup / lxml as a dependency somewhere?
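
On the first question: if the "similar pages" turn out to be URLs that differ only in their fragment (an assumption, not something confirmed here), one possible approach is to strip the fragment before checking whether a URL was already seen. A minimal sketch using only the standard library; `normalize` and `seen` are hypothetical names, not part of any provided code.

```python
from urllib.parse import urldefrag


def normalize(url: str) -> str:
    """Return the URL with any #fragment removed (hypothetical helper)."""
    return urldefrag(url).url


# Track normalized URLs so fragment-only variants count as one page.
seen = set()
for url in ["http://ics.uci.edu#aaa", "http://ics.uci.edu#bbb"]:
    seen.add(normalize(url))
```

Under that assumption, both example URLs collapse to a single entry in `seen`.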