U. TORONTO (CAN) — Based on the appearance of popular words or phrases, new software can tell when medieval British documents were written.
Researchers with the Centre for Medieval Studies and the Documents of Early England Data Set (DEEDS) Project needed software that could date the roughly 10,000 British charter and property documents in their database. The documents span approximately 1066 to the 1400s, with the majority from 1100 to 1300.
Gelila Tilahun, who was doing her PhD with the Department of Statistical Sciences at the University of Toronto, created the software, which uses the DEEDS database as a source.
Given an undated text, the software aggregates the probability of occurrence of words and phrases of the text at each time period, and then estimates the date of the text to be the time value that maximizes the aggregated probabilities.
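The dating step described above can be illustrated with a rough sketch. This is not the researchers' software: the function names, the Gaussian kernel, the bandwidth, the smoothing constant, the choice to aggregate by summing log-probabilities, and the tiny toy corpus are all assumptions made for illustration. It only shows the general idea of estimating how likely each phrase is at each candidate year from dated documents, then picking the year that maximizes the aggregate.

```python
import math

def gaussian_weight(t, year, bandwidth):
    """Weight a dated document by how close its year is to the candidate year t."""
    return math.exp(-0.5 * ((t - year) / bandwidth) ** 2)

def occurrence_probability(term, t, dated_docs, bandwidth=20.0, smoothing=1e-3):
    """Kernel-weighted share of dated documents near year t that contain the term.

    The smoothing constant (an assumption) keeps the estimate strictly
    between 0 and 1 so that its logarithm is always defined.
    """
    containing = 0.0
    total = 0.0
    for year, terms in dated_docs:
        w = gaussian_weight(t, year, bandwidth)
        total += w
        if term in terms:
            containing += w
    return (containing + smoothing) / (total + 2 * smoothing)

def estimate_date(text_terms, dated_docs, candidate_years, bandwidth=20.0):
    """Return the candidate year that maximizes the summed log-probabilities."""
    best_year, best_score = None, -math.inf
    for t in candidate_years:
        score = sum(
            math.log(occurrence_probability(term, t, dated_docs, bandwidth))
            for term in text_terms
        )
        if score > best_score:
            best_year, best_score = t, score
    return best_year

# Hypothetical toy corpus: (year, set of phrases). The years and phrase
# assignments are invented for this example, not taken from DEEDS.
toy_docs = [
    (1140, {"francis et anglicis", "sciant presentes"}),
    (1160, {"francis et anglicis", "testibus"}),
    (1180, {"francis et anglicis", "sciant presentes"}),
    (1240, {"sciant presentes", "testibus"}),
    (1260, {"noverit universitas", "testibus"}),
    (1280, {"noverit universitas", "sciant presentes"}),
]
candidate_years = range(1100, 1301, 10)
```

Calling `estimate_date({"francis et anglicis"}, toy_docs, candidate_years)` on this toy data returns a year before 1204, because that phrase appears only in the earlier invented documents. A real system would need far more data and a more careful statistical treatment than this sketch provides.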
Being able to estimate the probability of occurrence of a word or phrase means the evolution of the usage of popular terms can be examined. For example, the form of address “Francis et Anglicis” (French and English) was commonly used by French and English barons to address their workers and/or soldiers from the mid-1100s.
When Normandy was lost to France in 1204 and the English no longer held lands in Normandy, the form of address gradually fell out of use.
“The idea is that language evolves through time. Some words and phrases eventually die and others continue on,” Tilahun says. “These words have their own life. It’s amazing how we can decipher the date of a document based on the evolution of word usage.”
Dating these particular types of documents has proved to be a challenge.
“The British never dated their documents. There are over a million documents in existence and nobody knows when they were written,” says Tilahun.
“You can’t use writing styles or seals because a lot of these documents are not on their original parchment. These documents would go through different monasteries—people would come with a contract that links them to their property—but everything eventually deteriorated so scribes would actually continually handwrite new copies. Now all we have are the words.”
The goal is to get the software online so that anyone can submit a document to be dated. Tilahun says they are also working on modifying the algorithm to determine where the document is from.
Tilahun’s paper on the project, co-authored by Andrey Feuerverger and Michael Gervers, appears in a recent issue of The Annals of Applied Statistics.
Source: University of Toronto