Analysis of Language and Content In a Digital Environment


ALCIDE (Analysis of Language and Content In a Digital Environment) is a web-based platform designed to help humanities scholars analyse large quantities of data, such as historical sources and literary works. The system combines a flexible suite of tools for browsing the content of document collections and analysing them along different dimensions: lexical, semantic, geographical and temporal.

In 2014, the first system prototype was designed with the help of history scholars from the Italian-German Historical Institute (ISIG) in Trento. In 2015, their feedback led us to develop a new interface and to extend the platform with new functionalities.

This demo gives access to three corpora, two in English and one in Italian.

The original documents in digital format are converted into XML, and a pipeline of NLP modules then processes them to extract relevant information. We rely mainly on TextPro, an NLP suite developed at Fondazione Bruno Kessler that includes modules for tokenisation, sentence splitting, morphological analysis, Part-of-Speech (PoS) tagging, lemmatisation, multiword recognition, keyword extraction, chunking and named entity recognition. All extracted information is stored in a MySQL database.

For visualisation, ALCIDE uses Highcharts for the most common chart types (i.e. bar and line charts), while the more interactive, custom data-driven visualisations (i.e. co-occurrences and networks) are rendered with d3.js. Interactive maps are implemented with the Leaflet library.
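The processing flow described above (XML documents passed through a chain of NLP modules, with the output written to a relational store) can be pictured with the minimal sketch below. It is an illustration under stated assumptions, not ALCIDE's actual code: the regex sentence splitter and tokeniser are naive stand-ins for TextPro modules and do not reflect TextPro's interface; SQLite replaces MySQL so the example runs without a database server; and the XML layout (`<doc id="..."><text>`), the `tokens` table schema and all function names are hypothetical.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical input: one document already converted to XML.
# The element and attribute names (<doc>, id, <text>) are assumptions.
SAMPLE_XML = """<doc id="d1">
  <text>Rome is the capital. De Gasperi spoke in Rome.</text>
</doc>"""


def split_sentences(text):
    """Naive sentence splitter (hypothetical stand-in for a real NLP module)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenise(sentence):
    """Naive tokeniser: words and punctuation marks become separate tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def run_pipeline(xml_string):
    """Parse the XML, chain the modules, yield (doc_id, sentence_no, position, token)."""
    root = ET.fromstring(xml_string)
    doc_id = root.get("id")
    text = root.findtext("text", default="")
    for s_no, sentence in enumerate(split_sentences(text)):
        for pos, token in enumerate(tokenise(sentence)):
            yield doc_id, s_no, pos, token


def store(rows, conn):
    """Store token annotations in a relational table (SQLite stands in for MySQL)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tokens ("
        "doc_id TEXT, sentence_no INTEGER, position INTEGER, token TEXT)"
    )
    conn.executemany("INSERT INTO tokens VALUES (?, ?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(run_pipeline(SAMPLE_XML), conn)
    # A simple lexical query: the most frequent word tokens in the collection.
    query = (
        "SELECT token, COUNT(*) AS freq FROM tokens "
        "WHERE token GLOB '[A-Za-z]*' GROUP BY token ORDER BY freq DESC, token"
    )
    for token, freq in conn.execute(query).fetchall()[:3]:
        print(token, freq)
```

Keeping each module as a separate function mirrors the idea of a pipeline: a richer step (for instance lemmatisation or named entity recognition) could be added as another stage whose output gets its own column or table, and the stored annotations can then be queried to feed charts, co-occurrence views or maps.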

For more information, see the project webpage, which collects all related material (slides, posters, videos).

Best viewed in Chrome and Safari.