BitCurator NLP

About the project

The BitCurator NLP project began on October 1, 2016 and will end on September 30, 2018. BitCurator NLP is funded through a grant from the Andrew W. Mellon Foundation.

The BitCurator NLP project will develop software that collecting institutions can use to extract, analyze, and report on features of interest in the text of born-digital materials held in their collections. The software will build on existing natural language processing libraries to identify and report on items likely to be relevant to ongoing preservation, information organization, and access activities. These may include entities (for example, persons, places, and organizations), potential relationships among entities (for example, entities that appear together within a document or set of documents), and topic models that provide insight into how concepts naturally cluster within the documents.
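To illustrate the idea of describing relationships through entities that appear together, here is a minimal Python sketch of document-level co-occurrence counting. It is not the BitCurator NLP implementation. It assumes entities have already been extracted for each document by some named-entity recognition step, and the function name `cooccurrence_counts`, the input format, and the sample data are all hypothetical placeholders.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs_entities):
    """Count, for each unordered pair of entities, the number of documents
    in which both appear.

    `docs_entities` maps a document ID to the list of entity strings found
    in that document (a hypothetical input format for this sketch).
    """
    counts = Counter()
    for entities in docs_entities.values():
        # Deduplicate and sort so each pair is counted at most once per
        # document and always appears in the same (alphabetical) order.
        unique = sorted(set(entities))
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


# Hypothetical entity lists standing in for the output of an NER step.
sample = {
    "letter_001.txt": ["Alice Smith", "Boston", "Alice Smith"],
    "memo_017.txt": ["Alice Smith", "Boston", "Harbor Trust"],
    "report_042.txt": ["Harbor Trust", "Boston"],
}

# Print each entity pair with the number of documents it shares.
for (a, b), n in cooccurrence_counts(sample).most_common():
    print(f"{a} <-> {b}: {n}")
```

Sorting each document's entities before pairing means that ("Alice Smith", "Boston") and ("Boston", "Alice Smith") are counted as the same relationship, and deduplicating first keeps repeated mentions within a single document from inflating the count.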

Visit the BitCurator NLP wiki page for technical content, documentation, and software downloads.