Tool Review: TokenX and Language Analysis

[This review is cross-posted at Digital History.]

The proliferation of linguistic analysis tools has opened new avenues for historians working in the digital realm. Textual analysis is the study of newspaper articles, books, laws, oral histories, and other forms of human communication. Digital textual analysis tools help historians decipher language usage, frequency, and significance in the context of discourse, rhetoric, and ideas. These tools open numerous possibilities for historical research and for communication strategies that can introduce new thinking into the current historiography. Brian Pytlik Zillig at the Center for Digital Research in the Humanities (CDRH) at the University of Nebraska-Lincoln developed TokenX as a powerful tool for analyzing text. While TokenX continues to undergo revision and further development, tools like it can help historians integrate textual analysis into their research and trace connections within language and across multiple texts.

Accompanying language analysis tools are encoding standards such as the Text Encoding Initiative (TEI) Guidelines, which are expressed in eXtensible Markup Language (XML) and define textual elements without compromising the integrity of the original document. Text encoding is necessary for making digital representations of original analog materials, a particularly crucial step in digital research for scholars studying eras before the proliferation of computers and born-digital texts. Encoding not only gives projects a sustainable structure but also enables sophisticated analysis, because elements within a document can be flexibly defined and then queried. Furthermore, properly encoded digital texts allow more rigorous examination and manipulation. The more digital texts are available, the better textual analysis tools can produce visualizations that help define, query, and highlight associations in the record of the past.
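To illustrate how encoding makes elements of a document queryable, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes a TEI P5 document in the standard TEI namespace; the sample fragment and the helper names normalize, body_text, and tagged_text are hypothetical and invented for this example, and this is not a description of how TokenX itself processes files.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical TEI fragment invented for this example (not from a CDRH project).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample transcription</title></titleStmt>
      <publicationStmt><p>Unpublished example</p></publicationStmt>
      <sourceDesc><p>Invented for illustration</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>On <date when="1896-07-09">July 9, 1896</date>,
         <persName>William Jennings Bryan</persName> spoke in Chicago.</p>
    </body>
  </text>
</TEI>"""


def normalize(element: ET.Element) -> str:
    """Join all text inside an element and collapse runs of whitespace."""
    return " ".join("".join(element.itertext()).split())


def body_text(root: ET.Element) -> str:
    """Hypothetical helper: plain text of the TEI <body>, ready for tokenizing."""
    body = root.find(".//tei:body", TEI_NS)
    return normalize(body) if body is not None else ""


def tagged_text(root: ET.Element, tag: str) -> list[str]:
    """Hypothetical helper: text of every TEI element with the given name."""
    return [normalize(el) for el in root.findall(f".//tei:{tag}", TEI_NS)]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_TEI)
    print(body_text(root))
    print(tagged_text(root, "persName"))
    # Encoded attributes carry normalized values that the prose does not spell out.
    print([d.get("when") for d in root.findall(".//tei:date", TEI_NS)])
```

Because the date is encoded with a when attribute, software can sort or filter documents by date even when the transcribed prose writes dates in different ways.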

TokenX analyzes XML files that can either be loaded into the software manually (provided the XML document is stored on a server) or built into a digital project, a task accomplished by Pytlik Zillig and CDRH (see, for example, Framing Red Power, William Jennings Bryan and the Railroad, and What Shall be the Character of this Vast Western Territory?). Once a file is “Tokenized,” users can generate word clouds, highlight keywords, view keywords in context, count words, and perform a host of other analyses. Newer features currently being integrated into TokenX support n-gram analysis, which counts recurring phrases of n words, and concordance views, which list each occurrence of a word with its surrounding text; both break texts down even further. Word clouds provide a visual depiction of word frequency in a document. Because words are displayed in font sizes or colors that vary with their frequency, a word cloud makes a document’s most frequently used words stand out. Another impressive feature of TokenX is the ability to view particular words in context. Displaying a word in its immediate context lets one see how that word is used across its many instances within a document. Through such features, researchers and historians can mine texts for patterns that are difficult to see without machine aid and demonstrate connective tissue between the text and a historical argument.

Textual visualizations allow scholars to glean what a text or corpus of texts narrates about particular themes, people, or events. Certain elements are highlighted, and scholars can investigate the texts in numerous ways to determine why particular words or contexts come into focus while others fade in importance. In terms of scholarly communication, the digital presentation gives historians an accessible way to narrate their arguments. TokenX’s visualizations provide in-depth insights into word contexts within individual texts and across a corpus, and they serve as a method for analyzing the connective tissue within language and across texts in time and place.
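This review does not describe how TokenX computes its results internally, so the following is only a generic Python sketch of the kinds of analysis mentioned above: word counts (the data behind a word cloud), keyword-in-context lines, and n-gram counts. The function names and the sample sentence are hypothetical, and the simple regular-expression tokenizer is an assumption chosen for brevity.

```python
import re
from collections import Counter

# Hypothetical sample sentence invented for this example.
SAMPLE = "The railroad crossed the plains, and the railroad changed the plains forever."


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_counts(tokens: list[str], stopwords=frozenset()) -> Counter:
    """Frequency of each token, skipping stopwords; a word cloud scales words by these counts."""
    return Counter(t for t in tokens if t not in stopwords)


def kwic(tokens: list[str], keyword: str, width: int = 3) -> list[str]:
    """Keyword in context: every occurrence of keyword with up to `width` tokens on each side."""
    keyword = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count every run of n consecutive tokens (an n-gram)."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)


if __name__ == "__main__":
    tokens = tokenize(SAMPLE)
    print(word_counts(tokens, frozenset({"the", "and"})).most_common(2))
    for line in kwic(tokens, "railroad", width=2):
        print(line)
    print(ngram_counts(tokens, 2).most_common(2))
```

Run on the sample sentence, the keyword-in-context view returns "the [railroad] crossed the" and "and the [railroad] changed the", and the most common bigrams are "the railroad" and "the plains". Production tools may also apply stemming, richer tokenization, and larger stopword lists; those choices are left out of this sketch.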

Recently, TokenX was integrated into student projects with assistance from Pytlik Zillig. The tool has helped the students craft original historical arguments by highlighting language and word trends. The students first transcribed each of the historical documents in their source base. Transcribing documents into digital form also gives the historian deeper familiarity with each document’s content, context, and type of discourse. With a significant corpus of documents digitized, users can investigate different keywords and perform the tool’s other analytical functions. Integrating TokenX into digital projects enables their authors to present arguments interactively rather than through static screen captures of visualizations. Integration does require working through Pytlik Zillig to “Tokenize” the documents and host the material on a server, which limits the design flexibility available to project authors. However, this limitation does not detract from the usefulness of the analysis tool or the value it adds to digital scholarship.

Historians and history instructors will find textual analysis tools, like TokenX, critical for piecing together and visually demonstrating historical analysis to students and colleagues alike.

Brent Rogers and Jason Heppler
University of Nebraska-Lincoln
Reviewed: August 2009

October 25, 2009 @jaheppler