Prevent iteration on pages with no tables (#22)

* get fields 5 thru 14
* consider edge cases for section 14
* adjust table settings to visually join separated tables
* write csvs for all fields
* refactor writing the headers for similar dicts
* prevent iteration on pages with no tables
* use list comprehension

---------

Co-authored-by: Xavier Medrano <[email protected]>
datamade · Mar 1, 2024 · b484a22
1 parent 79ba9de
Showing 1 changed file with 2 additions and 1 deletion.
diff --git a/scrapers/financial_disclosure/parse_pdf.py b/scrapers/financial_disclosure/parse_pdf.py
@@ -85,7 +85,8 @@ def parse_pdf(pdf: pdfplumber.PDF) -> dict[str, dict[str, str | None]]:
     table_settings = {
         "intersection_tolerance": 6, # minimum allowable tolerance to grab all tables
     }
-    rows = [tuple(row) for page in pdf.pages for row in page.extract_table(table_settings=table_settings)]  # type: ignore[union-attr]
+
+    rows = [tuple(row) for page in pdf.pages if (table := page.extract_table(table_settings=table_settings)) for row in table]  # type: ignore[union-attr]
 
     grouped_rows = SubstringDict(_group_rows(rows))
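
The guard in the new line matters because pdfplumber's `page.extract_table(...)` returns `None` when it finds no table on a page, and iterating over `None` raises a `TypeError`. The following is a minimal sketch of the same pattern, not the project's code: `FakePage` and `collect_rows` are hypothetical stand-ins introduced here so the example runs without a PDF.

```python
from typing import Optional

Row = list[Optional[str]]


class FakePage:
    """Hypothetical stand-in for a pdfplumber page, for illustration only."""

    def __init__(self, table: Optional[list[Row]]) -> None:
        self._table = table

    def extract_table(self, table_settings: Optional[dict] = None) -> Optional[list[Row]]:
        # Assumed to mirror the case the commit guards against:
        # None is returned when the page has no table.
        return self._table


def collect_rows(pages, table_settings):
    # The walrus expression binds the extracted table once per page;
    # pages whose table is None (or empty) are skipped by the `if`.
    return [
        tuple(row)
        for page in pages
        if (table := page.extract_table(table_settings=table_settings))
        for row in table
    ]


pages = [
    FakePage([["a", "1"], ["b", None]]),
    FakePage(None),  # a page with no table
    FakePage([["c", "3"]]),
]
rows = collect_rows(pages, {"intersection_tolerance": 6})
print(rows)
```

Because the condition tests truthiness, a page that yields an empty table is skipped as well, which does not change the result since it would contribute no rows.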