#need to figure out a way to convert pdf to txt before grep #3

vr00n · 2016-10-30T16:09:22Z

Try using pdfgrep. With it, I was able to convert the schema PDF into fairly structured text.

From there, you can use grep's context options "-B" and "-A" to include n lines before or after each pattern match.

Here are my results from a simple pdfgrep command:

pdfgrep " " schema_alphabetic.pdf | uniq | more
State of California
Civil Service Pay Scale - Alpha by Class Title
  Schem Class
          Code   Full Class Title
                           Compensation              SISA Footnotes         AR Crit  MCR Prob. Mo. WWG NT   CBID
  CU70     1733  ACCOUNT CLERK II
                      $2,471.00 - $3,097.00           SISA                             1        6   2       R 04
  ME10     4915  ACCOUNT MANAGER, CALIFORNIA EXPOSITION AND STATE FAIR
                      $5,553.00 - $6,901.00                01 43                       1       12   E       S 01
  JL32     4177  ACCOUNTANT I (SPECIALIST)
                 A    $3,000.00 - $3,757.00                                285         1        6   2       R 01
                 L    $3,000.00 - $3,757.00                                285         1        6   2       R 01
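
As a rough sketch of the context-option idea, assume the pdfgrep output above has been saved to a text file. The file name schema.txt is hypothetical, and the sample below is just four lines copied from the output so the snippet runs without the PDF.

```shell
# Hypothetical sample file: four lines copied from the pdfgrep output above.
# With the real PDF, something like the following should produce the full file:
#   pdfgrep " " schema_alphabetic.pdf | uniq > schema.txt
cat > schema.txt <<'EOF'
  CU70     1733  ACCOUNT CLERK II
                      $2,471.00 - $3,097.00           SISA                             1        6   2       R 04
  ME10     4915  ACCOUNT MANAGER, CALIFORNIA EXPOSITION AND STATE FAIR
                      $5,553.00 - $6,901.00                01 43                       1       12   E       S 01
EOF

# -A 1 prints one line of trailing context after each match, which pulls in
# the compensation row that follows the matching class title.
grep -A 1 'ACCOUNT CLERK II' schema.txt
```

Classes with several pay ranges (such as ACCOUNTANT I (SPECIALIST) above) span more than one row after the title, so the value passed to -A would need to be larger for those.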

josephlei · 2016-10-31T03:56:00Z

Thank you for the recommendation, I will definitely check this out!

From our discussions, the source system will soon provide this data openly in a machine-readable format, and possibly through an API. I wasn't aware of this package, and I think it will be useful in other applications in the future as well. Thanks again.
