Another interesting humanities research tool has recently been made available by Google Labs. It allows quantitative methods to complement research in philosophy, art, language, and related fields. The Google Ngram Viewer graphs up to five words or phrases at a time, based on the full text of about 5.2 million books containing more than 500 billion words in total. And it all happens at the click of the “Search Lots of Books” button.
Googlers Go Harvard
“These datasets were the basis of a research project led by Harvard University’s Jean-Baptiste Michel and Erez Lieberman Aiden published in Science and coauthored by several Googlers,” according to the Google post. In short, the Ngram Viewer draws a graph comparing up to five words or phrases. These can be in English (with British and American English available separately), Chinese, French, German, Russian, or Spanish. Since 2004, Google has digitized more than 15 million books worldwide, and the datasets it is now making available to further humanities research are based on a subset of that corpus. “The Ngram Viewer lets you graph and compare phrases from these datasets over time (from the year 1800 to 2000), showing how their usage has waxed and waned over the years,” the Google post continues. There are many interesting suggestions for a first search, such as “World War I, Great War”, “George Washington, Thomas Jefferson, Abraham Lincoln”, or “fry, bake, grill, roast”. The user-friendly interface is available to anyone, and scholars are especially invited to try visualizing their own sets of phrases and to come up with interesting new hypotheses.
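To make "usage over time" concrete, here is a minimal sketch of the kind of calculation such a graph rests on. It assumes the plotted value is a phrase's count in a given year divided by the total number of words printed that year. The function name `relative_frequency` is hypothetical, and the counts are invented toy numbers, not real Ngram data.

```python
from typing import Dict


def relative_frequency(
    phrase_counts: Dict[int, int], total_counts: Dict[int, int]
) -> Dict[int, float]:
    """Return, per year, the phrase's share of all words counted that year.

    Assumption: usage is measured as count / yearly total, so that years
    with more digitized text do not dominate the curve.
    """
    shares = {}
    for year in sorted(total_counts):
        total = total_counts[year]
        if total <= 0:
            continue  # skip years with no text at all
        shares[year] = phrase_counts.get(year, 0) / total
    return shares


# Invented toy numbers for illustration only.
totals = {1900: 1000, 1950: 2000, 2000: 4000}
great_war = {1900: 0, 1950: 10, 2000: 4}
world_war_i = {1950: 20, 2000: 40}

series = {
    "Great War": relative_frequency(great_war, totals),
    "World War I": relative_frequency(world_war_i, totals),
}

for phrase, by_year in series.items():
    for year, share in by_year.items():
        print(f"{phrase}\t{year}\t{share:.4%}")
```

Dividing by the yearly total matters because far more text survives from recent decades. Raw counts would make almost every word appear to rise over time.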
Genomics vs. Culturomics
From the abstract of the article that the Ngram researchers themselves recently published in the journal Science, we learn: “We survey the vast terrain of ‘culturomics’, focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. ‘Culturomics’ extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.” The researchers have drawn on a digital “fossil record of human culture,” as the Harvard news portal describes it. The analogy with genomics comes from the idea of extracting a “cultural genome” from these extensive records. How can typing in a word and immediately seeing its usage frequency do so much? First, it helps us realize that language is alive and that it grows (or shrinks) over time; second, words form our worlds; and third, the world forms our words. So I typed “words” and “worlds” into the Ngram Viewer, and I could read the result this way: as the world stagnates, words are forgotten.