I sat down with Sergey Chernenko in August to ask how to do this, and he walked me through the basics of what he has done before.
- EDGAR contains links to “index” files; look for the ones called “master.zip”.
- Collect the links to the 10-Ks that you want.
- Write a program that follows each link and downloads the filing's text file.
- Search the text for key words.
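The first three steps above can be sketched in Python. This is a minimal sketch, not Sergey's method, and it rests on a few assumptions: that each quarter's index lives at `https://www.sec.gov/Archives/edgar/full-index/<year>/QTR<n>/master.zip`, that the zip holds a single `master.idx` whose data rows look like `CIK|Company Name|Form Type|Date Filed|Filename`, and that the filenames are relative to `https://www.sec.gov/Archives/`. The function names and the `USER_AGENT` value are hypothetical placeholders.

```python
import io
import urllib.request
import zipfile

# Base URL for EDGAR archives; filenames in master.idx are assumed relative to it.
ARCHIVES = "https://www.sec.gov/Archives/"

# The SEC asks automated clients to identify themselves; replace with your own details.
USER_AGENT = "Your Name your.email@example.com"  # hypothetical placeholder


def _get(url: str) -> bytes:
    """Fetch a URL with a descriptive User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def fetch_master_index(year: int, quarter: int) -> str:
    """Step 1: download one quarter's master.zip and return the master.idx text inside."""
    url = f"{ARCHIVES}edgar/full-index/{year}/QTR{quarter}/master.zip"
    with zipfile.ZipFile(io.BytesIO(_get(url))) as zf:
        return zf.read("master.idx").decode("latin-1")


def filing_links(index_text: str, forms=("10-K",)) -> list:
    """Step 2: keep rows whose form type is wanted and turn them into full URLs."""
    links = []
    for line in index_text.splitlines():
        fields = line.split("|")
        # Data rows have 5 pipe-separated fields; the free-text header
        # and the dashed separator line do not, so they are skipped.
        if len(fields) != 5:
            continue
        form, filename = fields[2], fields[4]
        if form in forms:  # exact match, so "10-K/A" amendments are excluded by default
            links.append(ARCHIVES + filename)
    return links


def fetch_filing(url: str) -> str:
    """Step 3: download one filing's full-text submission."""
    return _get(url).decode("latin-1")
```

A typical run would call `filing_links(fetch_master_index(2015, 1))` and then `fetch_filing` on each URL, pausing between requests to stay within the SEC's fair-access limits. Decoding as Latin-1 is a defensive choice: it never raises on stray bytes, at the cost of possibly mis-rendering a few non-ASCII characters.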
Sergey uses Perl, I believe, along with the “grep” command in a Unix shell. When searching for key words or phrases, especially long words (which may be hyphenated at the end of a line) or multi-word phrases, keep in mind that the word or phrase may be broken across lines. If your program reads the file one line at a time during the search, it will miss any match that spans a line break.
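One way around the line-break problem is to search the whole document at once and let the pattern treat any run of whitespace, newlines included, as a word separator. `grep` is likewise line-oriented by default, so the same caveat applies there. The Python sketch below illustrates the idea; the function names are hypothetical, and the optional de-hyphenation step is an assumption about how long words get split at line ends.

```python
import re


def phrase_pattern(phrase: str) -> re.Pattern:
    """Build a regex matching the phrase's words separated by any whitespace,
    including newlines, so a phrase split across lines still matches."""
    words = phrase.split()
    body = r"\s+".join(re.escape(w) for w in words)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def count_phrase(text: str, phrase: str, join_hyphens: bool = False) -> int:
    """Count occurrences of phrase in the whole text, not line by line."""
    if join_hyphens:
        # Optional: undo end-of-line hyphenation ("impair-\nment" -> "impairment").
        # Caveat: this also merges genuine compounds that happen to break at a hyphen.
        text = re.sub(r"(\w)-[ \t]*\r?\n\s*(\w)", r"\1\2", text)
    return len(phrase_pattern(phrase).findall(text))


if __name__ == "__main__":
    doc = "We recorded a goodwill\nimpairment charge.\nGoodwill   impairment was material."
    # Whole-text search finds both occurrences.
    print(count_phrase(doc, "goodwill impairment"))  # → 2
    # A line-by-line search misses the one broken across lines.
    print(sum(count_phrase(line, "goodwill impairment") for line in doc.splitlines()))  # → 1
```

Reading the full filing into memory is usually fine for a single 10-K; the trade-off is memory use versus never missing a match that straddles a line break.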