The University of Chicago Full-Text System


What's New: An Open Source version is now available from the PhiloLogic Development Site.

PhiloLogic, a suite of software developed by the ARTFL Project at the University of Chicago in collaboration with The University of Chicago Library, provides sophisticated searching of a wide variety of large encoded databases on the World Wide Web. It is an easy-to-use yet powerful full-text search, retrieval, and reporting system for large multimedia databases (texts, images, sound), with the ability to handle complex text structures with extensive indexed metadata.

In its simplest form, PhiloLogic serves as a document retrieval or lookup mechanism: users search a relational database to retrieve given documents and, in some implementations, portions of texts such as acts, scenes, articles, or head-words. This same document retrieval mechanism serves as the basis for defining a corpus in a full-text search. One can, for example, either retrieve all documents in a database written by women from 1935 through 1945 or search for words or phrases within the documents that fit those criteria. The typical PhiloLogic search is broken down into five distinct stages: 1) defining a corpus (i.e., limiting a search), 2) word expansion, 3) word index searching, 4) text extraction, and 5) link resolution and formatting (e.g., SGML to HTML conversion). In other words, after defining a corpus (or choosing to search an entire database), one can execute a single-term, phrase, or proximity search. By looking up index entries for the word(s) in a relational database, PhiloLogic extracts blocks of text containing the search term(s), with links to larger blocks of text. These extracts are formatted for display in a Web browser and sometimes include links to images, sound recordings, other texts, or even other databases.
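The first, third, and fourth stages can be illustrated with a minimal Python sketch. It is not PhiloLogic code: the `Doc` record, the metadata fields, the in-memory dictionary standing in for the relational word index, and all function names are assumptions made for illustration only.

```python
import re
from dataclasses import dataclass


@dataclass
class Doc:
    # Hypothetical metadata fields; a real database would carry many more.
    doc_id: int
    author_gender: str
    year: int
    text: str


DOCS = [
    Doc(1, "F", 1938, "The river was quiet. Nobody crossed the river that night."),
    Doc(2, "M", 1940, "A river runs through the valley."),
    Doc(3, "F", 1950, "She remembered the river of her childhood."),
]


def define_corpus(docs, gender, first_year, last_year):
    """Stage 1: limit the search to documents matching the metadata criteria."""
    return [d for d in docs
            if d.author_gender == gender and first_year <= d.year <= last_year]


def build_word_index(docs):
    """Map each lowercased word to a list of (doc_id, character offset) pairs."""
    index = {}
    for d in docs:
        for m in re.finditer(r"\w+", d.text):
            index.setdefault(m.group().lower(), []).append((d.doc_id, m.start()))
    return index


def search(corpus, index, word, context=20):
    """Stages 3 and 4: look the word up in the index, then extract nearby text."""
    by_id = {d.doc_id: d for d in corpus}
    hits = []
    for doc_id, offset in index.get(word.lower(), []):
        if doc_id in by_id:  # keep only hits that fall inside the defined corpus
            text = by_id[doc_id].text
            start = max(0, offset - context)
            end = min(len(text), offset + len(word) + context)
            hits.append((doc_id, text[start:end]))
    return hits


corpus = define_corpus(DOCS, "F", 1935, 1945)
index = build_word_index(DOCS)
hits = search(corpus, index, "river")
```

Here `corpus` contains only the 1938 document, so both hits for "river" come from it even though the word occurs in all three documents.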

In addition to simple word and phrase searches, users can perform more sophisticated searches by using extended UNIX-style regular expressions for complex wildcard searching and, in some implementations, morphological and orthographic expansion. All of these mechanisms to expand words can be combined using Boolean operators such as OR (the vertical bar "|") and AND (a space) within a variety of searching contexts.
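As a rough illustration of word expansion, the following Python sketch expands a query against a small word list. The vocabulary, the function names, and the exact query syntax (terms separated by spaces for AND, alternatives joined by "|" with no surrounding spaces) are assumptions for this example, and Python's `re` module stands in for UNIX-style extended regular expressions.

```python
import re

# Hypothetical list of distinct words found in an indexed database.
VOCABULARY = ["liberty", "liberties", "liberal", "freedom", "free", "tyranny", "tyrant"]


def expand_term(term, vocabulary):
    """Expand one term (a regular expression, possibly with '|' alternatives)
    into the sorted list of vocabulary words it matches in full."""
    pattern = re.compile(term)
    return sorted(w for w in vocabulary if pattern.fullmatch(w))


def expand_query(query, vocabulary):
    """Split the query on spaces (AND) and expand each term separately.
    Returns one list of matching words per AND-ed term."""
    return [expand_term(term, vocabulary) for term in query.split()]


groups = expand_query("libert.*|freedom tyran.*", VOCABULARY)
# groups[0] holds the words matching "libert.*" OR "freedom";
# groups[1] holds the words matching "tyran.*".
```

A search would then require at least one word from each group to occur within the chosen searching context.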

Its functions were originally designed for scholarly research in databases of literary, religious, philosophical, and historical collections of texts, as well as important historical encyclopedias and dictionaries. PhiloLogic handles notes so that they do not interfere with phrase searching. Users can easily search words with diacritics (either by specifying accents or by ignoring them, which is done by typing in uppercase) and words in non-Roman scripts. At present there are some fifty databases on the Web under PhiloLogic, containing languages such as ancient Greek, Latin, Hindi, and Urdu as well as nearly all Western European languages. PhiloLogic can also be set up to recognize or ignore manuscript notations such as different brackets, which can indicate spurious text or editorial emendations. Because the software recognizes typical text structures as real data objects, it understands units such as words, sentences, paragraphs, sections, and pages, permitting very flexible searching and retrieval of these textual objects. Other full-text engines on the market search for strings of characters. Rather than searching for two words within the same sentence or paragraph (intellectual units), such engines must search for two words within a certain number of characters, regardless of sentence or paragraph boundaries. With PhiloLogic, scholars always know where they are in a given text, since pagination can be displayed alongside other objects. Although such a high degree of indexing can reduce speed, PhiloLogic's indexing has been optimized so that searching remains very fast on the Web.
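The difference between searching within intellectual units and searching within a character window can be sketched in Python. This is only an illustration: the sample text, the naive punctuation-based sentence splitter, and the function names are assumptions, and PhiloLogic itself relies on indexed text objects rather than splitting text at query time.

```python
import re

TEXT = "The king fled. Paris rose in anger at dawn. The queen stayed."


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by spaces."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def cooccur_in_sentence(text, a, b):
    """Return the sentences that contain both words (unit-based search)."""
    result = []
    for sentence in split_sentences(text):
        words = {w.lower() for w in re.findall(r"\w+", sentence)}
        if a in words and b in words:
            result.append(sentence)
    return result


def cooccur_within_chars(text, a, b, window):
    """Return True if the two words start within `window` characters of each
    other, ignoring sentence boundaries (character-based search)."""
    lower = text.lower()
    starts_a = [m.start() for m in re.finditer(r"\b" + re.escape(a) + r"\b", lower)]
    starts_b = [m.start() for m in re.finditer(r"\b" + re.escape(b) + r"\b", lower)]
    return any(abs(i - j) <= window for i in starts_a for j in starts_b)


same_sentence = cooccur_in_sentence(TEXT, "king", "paris")
near_in_chars = cooccur_within_chars(TEXT, "king", "paris", 20)
```

In this sample, "king" and "Paris" begin within 20 characters of each other, so the character-based search reports a match, while the sentence-based search finds none because the two words belong to different sentences.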

PhiloLogic is a registered trademark of the University of Chicago, Copyright 2001 University of Chicago.
