Text Analysis with Voyant
What Voyant does: Voyant uses mathematics to represent documents, one or many, in forms other than plain text. The math begins with how many words are in a given document or corpus of documents, tracks the words’ distribution through the document/corpus, and generates graphics to display their frequency, context, and location. The resulting graphics range from word clouds, which show a cluster of the words that appear most frequently in the text, to word trees, which draw associations between a selected term and the words that appear most often with it. Yet not everything on Voyant takes visual form. The original text is on view, as are the operative data: total words, “vocabulary density,” etc.
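To make the counting concrete, here is a minimal Python sketch of the arithmetic described above. It is not Voyant's own code. The tokenizing rule, the function names, and the sample sentence are all assumptions made for illustration, and "vocabulary density" is assumed here to mean the number of distinct words divided by the total number of words.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def summarize(text, top_n=3):
    """Return total words, distinct words, an assumed density ratio, and top terms."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    # Assumed definition: distinct word forms divided by total words.
    density = len(counts) / total if total else 0.0
    return {
        "total_words": total,
        "unique_words": len(counts),
        "vocabulary_density": density,
        "most_frequent": counts.most_common(top_n),
    }


# Hypothetical sample text, invented for this example.
sample = "The old man and the old folks met the old master."
summary = summarize(sample)
print(summary)
```

A word cloud is, in effect, a picture of the "most_frequent" list: the higher the count, the larger the word.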
How to use Voyant: The site appears in five sections. Beginning with the sections’ default settings, in the upper left is Cirrus, a word cloud that can take two forms: a cluster based on frequency of use and, under Links next to Cirrus on the section’s menu bar, a web that links each highlighted word via bars to the words that most often precede or follow it. Thus, “old” is linked to “man,” “folks,” and “master.”
In the upper center of the site is the Reader tool. Reader shows the word selected in Cirrus in two formats: the original full text, with the selected word highlighted in yellow, and a line graph of the word’s frequency in each of the corpus’s documents. Next to Reader on the menu bar, as Links is to Cirrus, stands TermsBerry. Here, both of Reader’s formats give way to a cloud of bubbles similar to the Cirrus cloud, except that this time the number of each bubbled word’s appearances in the text is on view inside its bubble. Reader, in sum, is most useful for (a) viewing the original text, and (b) comparing how frequently a given word is used in a corpus’s various texts. Does “old,” e.g., appear in all texts, or just in a few?
The site’s third section, Trends, expands Reader’s line graph in a variety of directions. First, it makes the line and axes bigger, and therefore easier to track. Second, it allows us to track a word’s frequency within a corpus or a single document. By double clicking on a dot on the graph (each dot marks the high point of a word’s frequency in one document, if tracking a multi-document corpus, or one part of a document, if tracking just one text), we are given two choices. Let us imagine here that we have limited our search to a single word: “old.” “Terms” lets the user see the use of “old” across all documents at once, each line laid atop the others. “Documents” lets the user isolate a single document and see where “old” appears within that document: five times in the document’s first half, say, and zero times in its second half. Next to Trends on the menu bar (see Links and TermsBerry above) is Document Terms. Here we see the data that inform the graph presented as a spreadsheet, in rows and columns: the number of uses in a document, the number of uses relative to the total words in the document, etc.
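The two views described above, a word's count in each part of one document and its raw and relative count in each document of a corpus, can be sketched in a few lines of Python. This is an illustration only, not Voyant's implementation. The function names, the equal-sized segments, and the sample texts are assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and pull out runs of letters (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def term_trend(text, term, segments=2):
    """Count a term in each of `segments` roughly equal parts of one document."""
    tokens = tokenize(text)
    n = len(tokens)
    counts = []
    for i in range(segments):
        # Integer boundaries so that exactly `segments` parts are produced.
        part = tokens[i * n // segments:(i + 1) * n // segments]
        counts.append(part.count(term.lower()))
    return counts


def document_terms(corpus, term):
    """Build spreadsheet-like rows: (document name, raw count, relative count)."""
    rows = []
    for name, text in corpus.items():
        tokens = tokenize(text)
        raw = tokens.count(term.lower())
        relative = raw / len(tokens) if tokens else 0.0
        rows.append((name, raw, relative))
    return rows


# Hypothetical texts, invented for this example.
one_document = ("old man old dog old hat old coat old shoe "
                "new day new sun new moon new star new song")
trend = term_trend(one_document, "old")
print(trend)

corpus = {"first": "the old man", "second": "a new day dawns"}
rows = document_terms(corpus, "old")
print(rows)
```

Plotting the numbers from term_trend as dots joined by a line gives a graph of the kind Trends draws. The rows from document_terms correspond to the spreadsheet view.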
The fourth section, Contexts, combines Reader’s full text and Cirrus’s Links function to abandon graphics for text. If “old” shows up five times in the document’s first half, what words appear around it? We can use Reader’s full text to find the answer. We can see which other words in a word cloud appear near “old” via Links. Or we can turn to Contexts to ask specifically: what five words precede or follow “old” in one or all of our corpus’s documents? Bubblelines, next to Contexts, augments Trends’ graph options (lines, columns, etc.) with a string of bubbles across the text, one for every time the word appears. Correlations borrows the spreadsheet format seen in Trends’ Document Terms to measure different relations between words in numerical form: how often, in hard numbers, does “come” come within five words of “old”?
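The two questions posed above (what five words precede or follow “old,” and how often does “come” fall within five words of it) can be answered with a short Python sketch. It is only an illustration. The function names and the sample sentence are assumptions, and the second function is a plain count of nearby words, which may not be the statistic Voyant itself reports.

```python
import re


def tokenize(text):
    """Lowercase the text and pull out runs of letters (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def contexts(tokens, term, window=5):
    """List each occurrence of `term` with up to `window` words on either side."""
    rows = []
    for i, token in enumerate(tokens):
        if token == term:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            rows.append((" ".join(left), token, " ".join(right)))
    return rows


def nearby_count(tokens, term, other, window=5):
    """Count how often `other` falls within `window` words of each `term`."""
    count = 0
    for i, token in enumerate(tokens):
        if token == term:
            neighbors = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            count += neighbors.count(other)
    return count


# Hypothetical sentence, invented for this example.
tokens = tokenize("Come here, old man, and come sit by the old fire.")
for left, keyword, right in contexts(tokens, "old"):
    print(f"{left:>25} | {keyword} | {right}")

come_near_old = nearby_count(tokens, "old", "come")
print(come_near_old)
```

Note that a single “come” sitting between two occurrences of “old” is counted once for each of them, a choice that a more careful measure might handle differently.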
The last and simplest of Voyant’s five sections is Summary. Here are the metadata that help to remind us which documents are biggest, or what the average number of words per sentence is in one document as compared to another. Next to Summary on the menu bar is Documents, which simply inverts the spreadsheet, with documents in rows rather than the data type or field. After that comes Phrases, which isolates and reports the frequency of phrases that appear more than once in the text.
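Two of the figures mentioned above, average words per sentence and the frequency of repeated phrases, are simple enough to sketch in Python. Again, this is an illustration under stated assumptions and not Voyant's method: sentences are split naively on ".", "!", and "?"; phrases are taken to be fixed-length runs of words; and the function names and sample text are invented.

```python
import re
from collections import Counter


def average_words_per_sentence(text):
    """Split on ., !, or ? (a naive assumption) and average the word counts."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    word_counts = [len(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    return sum(word_counts) / len(sentences)


def repeated_phrases(text, length=3):
    """Return every run of `length` words that occurs more than once."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Note: runs are allowed to cross sentence boundaries in this sketch.
    runs = Counter(
        tuple(tokens[i:i + length]) for i in range(len(tokens) - length + 1)
    )
    return {" ".join(run): count for run, count in runs.items() if count > 1}


# Hypothetical text, invented for this example.
sample = "The old man sat. The old man slept. He woke."
average = average_words_per_sentence(sample)
phrases = repeated_phrases(sample, length=3)
print(average)
print(phrases)
```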
It is important to repeat that the above descriptions concern only the default settings for each of the five sections. In the upper left of each section, visible only by passing the cursor over the corner, appear two or sometimes three additional icons. The first is simple: it allows us to export the given section’s graphic or spreadsheet to a new window in our web browser. The second is much more difficult to summarize. A dropdown list appears that shows five categories of tools: Corpus, Document, Visualization, Grid, and Other. Each of these opens its own dropdown list of five to fifteen options. Here we find the WordTree alluded to above, and here we are reminded that “Cirrus” and “Links” and “Reader” and “Bubblelines” are just the tip of the iceberg, or should we say the genie’s lamp, that is Voyant.