A Poor Journalist's Text Mining Toolkit

pudo.org

How can journalists search and analyze collections of documents on their own computers with simple tools? At last weekend's DataHarvest, we ran a workshop trying to answer that question. This write-up covers using Apache Tika for content extraction and regular expressions in Sublime Text as an advanced search tool.
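
To give a rough sense of what regular-expression search over extracted text looks like, here is a minimal sketch in Python rather than in Sublime Text. It assumes the documents have already been converted to plain text (for example by a content extractor such as Tika). The sample text, the two patterns, and the helper name `find_matches` are all hypothetical and only serve as illustration; real document collections will need patterns tuned to their content.

```python
import re

# Hypothetical sample standing in for plain text extracted from a document.
SAMPLE_TEXT = """Invoice 2014-03-12: payment of EUR 12,500 to Example Holdings Ltd.
Contact: jane.doe@example.org
Follow-up on 2014-04-02, contact john@example.com for details.
"""

# Illustrative patterns only: ISO-style dates and simple email addresses.
PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def find_matches(text):
    """Return (line number, pattern label, matched text) for every hit."""
    results = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                results.append((line_no, label, match.group(0)))
    return results


if __name__ == "__main__":
    for line_no, label, value in find_matches(SAMPLE_TEXT):
        print(f"line {line_no}: {label}: {value}")
```

The same pattern strings can be pasted into an editor's regex search field; the script form is mainly useful when the same search has to be repeated across many files.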
