Words from the Director - Robert Morrissey
As most of our users know, the Project for American and French Research on the Treasury of the French Language (ARTFL) is a collaborative project with the French Centre national de la recherche scientifique (CNRS) laboratory Analyse et Traitement Informatique de la Langue française. With some 330 subscribing institutions across North America, the ARTFL project is now one of the oldest and most successful online full-text services for the scholarly community in research and higher education. With success have come new challenges and responsibilities. Over the last several years, the ARTFL project has worked to enhance both our collections and our software. This has been a collective effort, combining discussions with our users and internal research and development. On behalf of the whole ARTFL team, I am happy to announce this new release of both our search and retrieval engine, PhiloLogic, and our main database, ARTFL-FRANTEXT.

Growing the Collection: ARTFL-FRANTEXT and other databases

We have been steadily augmenting our holdings, and we have now consolidated them. We have integrated over 1,000 new works into our main database, bringing the total to over 2,600 works and over 150 million words. We have been growing our collections in several ways, the most important of which have been:
While the works integrated into our main database all conform to high standards of correction, we have chosen to make some texts, such as the Bibliothèque bleue and Bayle’s Dictionnaire historique et critique, available in a much rougher state. While we and our collaborative partners seek funds to carry out thorough corrections, we believe that the interests of the research community are best served by making this data available in its current state. We welcome any offers of help in correcting these texts.

Lastly, our digital edition of Diderot and D’Alembert’s Encyclopédie continues to develop. We have made over 200,000 corrections and, with the help of collaborative partners both here and abroad, we regularly add improvements and archival material to this heavily used resource. For further information, I invite you to consult the Encyclopédie home page on the ARTFL website.

Our Search and Retrieval Engine: PhiloLogic3

The ARTFL project is somewhat unusual in that it involves close collaboration between scholars trained in the analysis of texts and a technical development team specializing in computational methods for organizing, storing, searching, retrieving and analyzing textual materials of many types and in many languages. Recently completed under the direction of Mark Olsen, ARTFL’s Associate Director in charge of technical development, this new version of our search and retrieval engine offers several enhancements and new features. We have maintained our basic commitment to easy-to-use, intuitive software that scholars of various disciplines can master quickly. At the same time, we have tried to increase what might be called the dimensionality of both query formulation and results.
By this I mean that we have tried, on the one hand, to provide simple means of limiting or extending searches and, on the other, to let users view results from various perspectives (variation by date, collocations, position in the sentence, etc.). New features include:
PhiloLogic3 has been released to the open-source community, and a variety of digital text projects elsewhere now use this software for its speed and ease of use. We invite you to experiment freely with the new system and to send us your comments.

Where to Now? From Words to Works: Philomine

While we will continue to improve PhiloLogic3 (for example, we hope soon to let users move, with a single click, from the collocation table to the given collocations in context), we believe that we are reaching the limits of the traditional "single hit" approach to text analysis in the digital humanities. As its name suggests, the PhiloLogic system was designed to support fairly conventional notions of textual research informed by the long tradition of philology and historical semantics. Researchers typically examine evolving word and concept use over time through key words, such as "tradition", through clusters of terms that make up a topos, theme or commonplace, or through patterns of word use. As powerful and useful as concordances, frequency counts and collocation tables may be, this focus on small sets of words has many limitations. For example, as textual databases grow, even searches for relatively uncommon words can return tens of thousands of occurrences, far more than a user can digest. The advent of massive textual digitization projects thus presents ARTFL with a whole set of challenges that will require enhancing the retrieval and analysis capabilities of PhiloLogic and opening new lines of inquiry into textual analysis. To meet these challenges, we have embarked on a development program we have dubbed "From Words to Works." In this program, we are seeking ways to leverage machine learning to move from single-hit retrieval to large-scale analysis of results.
New techniques from information and computer science, such as information retrieval, machine learning, text data mining and document clustering, offer new ways to approach humanities text research that complement, but do not replace, the more traditional approaches supported by systems like PhiloLogic. To support our initial experimentation, the ARTFL Project has developed a set of drop-in extensions to the PhiloLogic system called Philomine. This interactive environment is designed to let us run experiments easily and quickly on a large number of different databases using a variety of feature set selections. It is also designed to let the user link back to PhiloLogic text search reports or to specific objects. The current experimental implementations still need refinement and remain too computationally expensive for public release; over time, we hope to develop and release a revised version of PhiloLogic that more coherently supports some of the Philomine functions with which we are currently experimenting. Once we have tested these new functions, we will begin to make them available to the wider community of scholars as extensions to our PhiloLogic system.

Conclusion

In closing, I would like to emphasize that most of the development efforts we have undertaken, whether in our collections or our software, would have been impossible without the help and collaboration of the rather extraordinary group of graduate students from a whole range of disciplines (humanities, social sciences, computer science, mathematics) who have worked with us over the years. I would like to express my deep gratitude for their contributions, as well as for the support of my colleagues in the Department of Romance Languages and Literatures at the University of Chicago.