Parsing Text in Company Filings

I sat down with Sergey Chernenko in August to ask how to parse the text of company filings, and he walked me through the basics of what he has done before.

  1. EDGAR contains links to “index” files — look for the ones called “”
  2. Collect the links to the 10-Ks that you want
  3. Write a program that follows each link and downloads the filing's text file.
  4. Search the downloaded text for keywords
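
Steps 1 to 3 might look roughly like the sketch below. It is written in Python rather than Perl and is not Sergey's code. It assumes a pipe-delimited index whose rows read `CIK|Company Name|Form Type|Date Filed|Filename`, with filing paths relative to `https://www.sec.gov/Archives/`; the note does not say which index file he used, so check the layout against the file you actually download. The function names and the `user_agent` parameter are hypothetical.

```python
import urllib.request

# Assumption: filing paths in the index are relative to this base URL.
ARCHIVES_BASE = "https://www.sec.gov/Archives/"


def collect_10k_links(index_lines, form_type="10-K"):
    """Return full URLs for index rows whose form type matches exactly."""
    links = []
    for line in index_lines:
        fields = line.strip().split("|")
        # Blank lines, separators, and free-text preamble do not split into
        # five fields. The column-header row does, but its form-type field
        # is a label, so it never equals a real form type.
        if len(fields) != 5:
            continue
        cik, company, form, date_filed, filename = fields
        if form == form_type:
            links.append(ARCHIVES_BASE + filename)
    return links


def download_filing(url, user_agent):
    """Fetch one filing and return its text.

    user_agent is a string identifying you to the server (an assumption
    here: a name and contact address, as is customary for automated clients).
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")
```

The match on form type is exact, so amended filings such as 10-K/A are left out unless you ask for them separately. In practice you would also want to pause between downloads so as not to hammer the server.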

Sergey uses Perl, I believe, along with the “grep” command in a Unix shell.  When searching for keywords or phrases, especially long words or multi-word phrases, keep in mind that the word or phrase may be broken across lines.  If your program reads one line at a time during the search, it will miss any match that spans a line break.
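
One way around the line-break problem is to read the whole filing into memory and collapse all whitespace, newlines included, before searching. The sketch below does this in Python (again, not Sergey's Perl). The names `flatten`, `count_phrase`, and `join_hyphenated` are hypothetical. The optional hyphen handling assumes that a hyphen at the end of a line marks a word split for layout, which will wrongly merge genuine hyphenated compounds that happen to end a line.

```python
import re


def flatten(text, join_hyphenated=False):
    """Collapse every run of whitespace, including newlines, to one space."""
    if join_hyphenated:
        # Assumption: "impair-\nment" is one word split for layout, so the
        # hyphen and the line break are removed together.
        text = re.sub(r"-[ \t]*\r?\n[ \t]*", "", text)
    return re.sub(r"\s+", " ", text)


def count_phrase(text, phrase, join_hyphenated=False):
    """Count case-insensitive, non-overlapping occurrences of phrase."""
    flat = flatten(text, join_hyphenated).lower()
    target = flatten(phrase).lower()
    return flat.count(target)


filing = "The company recorded a goodwill\nimpairment charge."

# A line-by-line search misses the phrase because it straddles the newline.
found_by_line = any("goodwill impairment" in line for line in filing.splitlines())

# Searching the flattened text finds it.
found_flat = count_phrase(filing, "goodwill impairment")
```

Here `found_by_line` is `False` while `found_flat` is `1`. Flattening the whole file costs memory proportional to the filing's size, which is usually acceptable for a single filing.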