By Avraham Roos
This post is part of a series we will be publishing with projects from the WWP’s Institutes Series: Word Vectors for the Thoughtful Humanist.
This post is excerpted from Avraham Roos’s dissertation, “Why is This Translation Different from All Other Translations? A Linguistic and Cultural-Historical Analysis of English Translations of the Passover Haggadah from 1770 to Now.”
Former Google employee Tomas Mikolov and colleagues introduced “Word2Vec” (Mikolov et al. 2013), a tool for learning continuous word embeddings from raw data. Word2Vec assigns each word in a text a vector, so that words can be treated as points in space. The distance between two points describes the relation or similarity between the two words. Given a large unprepared corpus of data, Word2Vec can detect relationships between words and predict which words go together with a given word. I decided to examine my (whole) digital corpus using Word2Vec. After some initial automated text cleaning (mainly changing capital letters to lower case and deleting all punctuation), the corpus was loaded into a Word2Vec toolbox[1] for training (window size 6, dimensions 100, iterations 10, negative samples 15).
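The cleaning step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name clean_text and the sample sentence are hypothetical, and the toolbox's own cleaning routine may differ in its details (for instance in how it treats hyphens or apostrophes).

```python
import string

# Curly quotes and dashes are not part of string.punctuation,
# so they are listed separately (an assumption about what the corpus contains).
EXTRA_PUNCTUATION = "“”‘’—–"

def clean_text(raw: str) -> list[str]:
    """Lowercase the text, delete all punctuation, and split into word tokens."""
    # The third argument of str.maketrans lists the characters to delete.
    table = str.maketrans("", "", string.punctuation + EXTRA_PUNCTUATION)
    return raw.lower().translate(table).split()

# Hypothetical sample sentence, not quoted from any Haggadah in the corpus.
tokens = clean_text("Rabbi Eliezer said: “How do we know...?”")
print(tokens)
```

The resulting list of lower-case tokens is the kind of input a Word2Vec trainer expects; the training itself is not shown here.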
Word2Vec Word Queries
The toolbox includes a “Query term” box. Typing in a word returns the words that are closest to it in vector space, that is, words that appear in similar contexts in the corpus. For example, entering “rabbi” returns variant spellings of the names of rabbis in the Haggadah, e.g. eliezer, eliazar, eliazar, akiba, akiva, akeeba, akibah. Other rabbis (also with multiple spellings) that appear are Elazar, Azarya, Jose (the) Galilean, Tarphon, and Yehoshua. The Haggadah relates how five of these rabbis sat together in the town of Bnei Brak to celebrate Pesach (lines 59-60). Others appear much later in the text, in a rabbinical discussion on how many plagues there were in Egypt and at the sea (lines 183-200), and the Haggadah text tells us that one rabbi constructed a mnemonic device to remember the order of the plagues (line 182). Notwithstanding their physical distance in the text, the search brings them all together.
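What “closest in vector space” means can be illustrated with cosine similarity, a measure commonly used for this kind of query. The sketch below is an assumption about how such a lookup works in general, not the toolbox's actual code, and the three-dimensional vectors are invented for illustration (the trained model described above has 100 dimensions).

```python
import math

def cosine(u, v):
    """Cosine similarity: close to 1.0 when two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def closest_to(term, vectors, n=3):
    """Return the n words whose vectors are most similar to the query term."""
    target = vectors[term]
    scored = [(word, cosine(target, vec))
              for word, vec in vectors.items() if word != term]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

# Hypothetical toy vectors, NOT taken from the trained Haggadah model.
toy_vectors = {
    "rabbi":   [0.90, 0.10, 0.00],
    "akiva":   [0.80, 0.20, 0.10],
    "eliezer": [0.85, 0.15, 0.05],
    "plate":   [0.00, 0.90, 0.30],
    "matzah":  [0.10, 0.80, 0.40],
}

for word, score in closest_to("rabbi", toy_vectors, n=2):
    print(word, round(score, 3))
```

In this toy space the two rabbi names rank far above “plate” and “matzah”, which mirrors how the real query gathers rabbis regardless of where they occur in the text.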
Word2Vec Operations
The “operations” option in Word2Vec can be used to find out which of the rabbis mentioned in the text did NOT celebrate Pesach in Bnei Brak. Operations combine word vectors by addition or subtraction: addition adds the contexts associated with two terms to each other, while subtraction removes the contexts associated with one word from another. The most common example given is “woman – man + king = queen.” So the operation “Rabbi – Bnei” should return a list of those rabbis who were not part of the Bnei Brak celebration. The results:
Table 8.26 – Results for the Word2Vec operation <“Rabbi” – “Bnei”> showing those rabbis mentioned in the text that did not celebrate Pesach at Bnei Brak.
In the rabbinical discussion on the number of plagues, three rabbis are mentioned: Rabbi Jose the Galilean, Rabbi Eliezer, and Rabbi Akiva. Two of these also celebrated in Bnei Brak, but Rabbi Jose the Galilean did not. Hence, he appears in this “Rabbi – Bnei” query. The rabbi who created a mnemonic device (an acrostic of the initials of the plagues) to remember the order of the plagues was Rabbi Yehudah, who appears in Table 8.26 with three different spellings. His suggested abbreviation is “dtzach – adash – beachab”. Rabbi Yehudah did not present this idea at the celebration in Bnei Brak, so his name and references to his mnemonic appear for this operation.
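The subtraction used in this section can be sketched as plain vector arithmetic followed by a nearest-neighbour search. The two-dimensional vectors below are invented so that the first dimension stands for “rabbi-ness” and the second for association with Bnei Brak. They are hypothetical and do not reproduce the values behind Table 8.26, and the helper names are illustrative only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def subtract(u, v):
    """Element-wise u - v: remove the contexts of v from u."""
    return [a - b for a, b in zip(u, v)]

def closest_to_vector(query, vectors, exclude=(), n=3):
    """Rank words by similarity to an arbitrary query vector."""
    scored = [(word, cosine(query, vec))
              for word, vec in vectors.items() if word not in exclude]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

# Hypothetical toy vectors: [rabbi-ness, association with Bnei Brak].
toy_vectors = {
    "rabbi":   [1.0, 0.5],
    "bnei":    [0.0, 1.0],
    "brak":    [0.1, 0.9],
    "akiva":   [0.9, 0.6],   # present at Bnei Brak
    "yehudah": [0.9, -0.1],  # not present at Bnei Brak
}

query = subtract(toy_vectors["rabbi"], toy_vectors["bnei"])
results = closest_to_vector(query, toy_vectors, exclude={"rabbi", "bnei"}, n=2)
for word, score in results:
    print(word, round(score, 3))
```

With these toy values “yehudah” ranks first, because subtracting “bnei” pushes the query away from every word that shares the Bnei Brak context. The query terms themselves are excluded from the results, a common convention in such tools.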
Word2Vec Clusters
Word2Vec can also automatically create clusters of words that it decides belong together (unsupervised topic modeling). For example, it created a cluster of “fire, cat, water, dog, bit, stick, beat, ox, burned, drank, and butcher,” which reveals a local cluster from the final song in the Haggadah, on the calamities that befell the goat that father bought. In this case the clustering highlights a local passage in the text with characteristic vocabulary that does not appear in other places in the text, or appears there only in another context, e.g. “drank.”
Another local cluster tells in short what happened to the Jews in Egypt: “serve, egyptians, laid, hard, labor, embittered, hard, lives, ill, treated”. Clusters can also tell us about certain customs e.g. the last bite to be taken after the meal is from a piece of Matzah called the Afikoman. This can be seen in a cluster that contains all the italicized words in the following paragraph:
Afikoman is broken in half, into two unequal parts. The smaller part is reserved on the plate and the remaining larger part is set aside hidden away wrapped in a cloth, napkin or put in a pillow for later after the maggid section leaving it until dinner/supper when the meal or meat is brought to the table. Children often steal this part as it is necessary for the commencement of the evening so they can bargain for a prize as that portion, the afikomen/afikomon/aphikoman/afiqoman must be given during the aftermeal as dessert in the tzafun section to all partakers.
Other clusters can reveal the instructional language of the Haggadah:
Verbs: raise up, pick up, is raised, lift, point, cover, uncover, take, remove, return, replace, lay, put, held, hold, search
Nouns: matsoh/matzahs/matzos/matzot/matzoth/mazzot/mazzoth (all variant spellings), dish, plate, tray, platter, (shank) bone
By combining the verbs and nouns, we get many actions prescribed for the evening such as covering and uncovering the plate/dish/tray/platter, showing the matzah to the celebrants, and removing certain items from the plate that are later returned.
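Clusters like the ones above are typically produced by grouping word vectors that lie close together. The source does not say which algorithm the toolbox uses, so the sketch below assumes k-means, one common choice. The two-dimensional vectors and the starting centroids are invented for illustration.

```python
import math

def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def mean(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def kmeans(vectors, initial_centroids, iterations=10):
    """Group words by repeatedly assigning each to its nearest centroid."""
    centroids = [list(c) for c in initial_centroids]
    assignment = {}
    for _ in range(iterations):
        # Assignment step: each word joins the cluster of its closest centroid.
        assignment = {
            word: min(range(len(centroids)),
                      key=lambda i: distance(vec, centroids[i]))
            for word, vec in vectors.items()
        }
        # Update step: move each centroid to the mean of its members.
        for i in range(len(centroids)):
            members = [vectors[w] for w, c in assignment.items() if c == i]
            if members:
                centroids[i] = mean(members)
    clusters = [[] for _ in centroids]
    for word, i in assignment.items():
        clusters[i].append(word)
    return clusters

# Hypothetical toy vectors, NOT taken from the trained Haggadah model.
toy_vectors = {
    "fire":  [0.90, 0.10], "cat":  [0.80, 0.20], "dog":  [0.85, 0.10],
    "plate": [0.10, 0.90], "dish": [0.20, 0.80], "tray": [0.15, 0.85],
}
clusters = kmeans(toy_vectors,
                  initial_centroids=[toy_vectors["fire"], toy_vectors["plate"]])
print(clusters)
```

Here the words from the closing song and the words for tableware fall into separate groups. With real 100-dimensional vectors the number of clusters and the starting centroids would have to be chosen, and different choices can yield different groupings.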
However, although these results are fascinating, Word2Vec is of little use for the diachronic research I had in mind. When experimenting by splitting up my corpus into two parts and comparing results, I did not find anything interesting, probably because my two sub-corpora were too small. A future researcher might want to digitize more Haggadot and try this again.
[1] The Women Writers Vector Toolkit, provided by Northeastern University: Word Vector Interface | Women Writers Vector Toolkit (northeastern.edu).
Follow Avraham’s Facebook page at: https://www.facebook.com/Haggadah-Translations-Digital-Humanities-981262611957631/
Take a look at Avraham’s Website/ Blog: https://sites.google.com/site/jewishdigitalhumanities/Downhome
“When one teaches, two learn”
Robert Heinlein (American science-fiction writer, 1907-1988)