GitHub - arc64/dataharvest-2016-commandline

github.com

1 page links to this URL
A Poor Journalist's Text Mining Toolkit

How can journalists search and analyze collections of documents on their own computers with simple tools? At last weekend's DataHarvest, we ran a workshop trying to answer that question. This write-up covers using Apache Tika for content extraction and regular expressions in Sublime Text as an advanced search tool.