
Help

EEBO-TCP Idiosyncrasies and Searching

  • Abbreviations: abbreviations have been resolved whenever provided in the encoding (e.g., "&abper;ticuler bookes" becomes "perticuler bookes"). The expanded text appears underlined. To see the original, click on the page number to go to the EEBO page image. Unresolved abbreviations appear mostly as a tilde (˜) (e.g., sole˜nly for solemnly). The tilde is a non-word-breaking, unsearchable character, so one must search for solenly to find sole˜nly. It is best to use a wildcard such as sole*nly, which finds solemnly, solempnly, sole˜nly, and sole˜pnly.
  • Ligatures: Ligatures (ae, oe, and dz) have been resolved into two letters for searching (e.g., enter aeternal, not æternal).
  • Macron and Breve: These diacritics should not be entered in word searches. To find wrath with a long mark, enter wrath; the text will show as wra¯th in the results. Unfortunately, the breve (˘) does not display properly, so "demurelie" with short marks appears as de˘mureli˘e.
  • End of Line Word Breaks: Words that are broken at the end of a line are brought together for searching. Those that are hyphenated appear with a vertical line (e.g., de|livering); those broken without hyphen appear with a plus sign (e.g., ver+tue). Do not enter either the plus sign or vertical line when searching for these words.
  • Illegible Text [gap]: Illegible text is indicated with a [gap]. These annotations are word-breaking since gaps can be many words as well as a few letters.
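Taken together, these idiosyncrasies describe how a displayed word relates to the form a search must match. The Python sketch below is a hypothetical illustration, not PhiloLogic's indexing code: the function name searchable_form is invented, and it handles only the characters mentioned above (the tilde ˜, macron ¯, and breve ˘, the end-of-line markers | and +, and the æ/œ ligatures).

```python
# Hypothetical sketch: map a displayed EEBO-TCP word to the form a search must match.
# Not PhiloLogic's implementation; covers only the markers described above.

# Displayed but ignored for searching (assumed list): small tilde (unresolved
# abbreviation), macron, breve, and the end-of-line break markers "|" and "+".
IGNORED = {"\u02dc", "\u00af", "\u02d8", "|", "+"}

# Ligatures are resolved into two letters for searching.
LIGATURES = {"æ": "ae", "Æ": "AE", "œ": "oe", "Œ": "OE"}


def searchable_form(displayed: str) -> str:
    """Drop ignored characters and expand ligatures."""
    out = []
    for ch in displayed:
        if ch in IGNORED:
            continue
        out.append(LIGATURES.get(ch, ch))
    return "".join(out)


for word in ["sole\u02dcnly", "wra\u00afth", "de|livering", "ver+tue", "æternal"]:
    print(word, "->", searchable_form(word))
```

This is why one enters solenly, wrath, delivering, vertue, and aeternal rather than the displayed forms.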
Full-Text Searching Using PhiloLogic

The term(s) to be searched in selected documents are entered into the Search for: box on the search-form. Word searches in PhiloLogic are by default case insensitive, so that a search finds both lower and upper case representations of words. The user must, however, take into account diacritics when searching databases that have accented characters. PhiloLogic's wildcard characters may also be employed to match many forms. The simplest search in PhiloLogic is a single term search without wildcards. If searching for a term such as "mystery" in the database, simply type the word mystery into the Search for: box and press the SEARCH button.
Tip: At this time, only the first 10,000 occurrences of a word are available in the results formats "Occurrences with Context" and "Occurrences Line by Line." Because EEBO-TCP is a very large database, one will encounter this limit with some regularity. One can limit a search by using the bibliographic fields or one can run a Frequency by Title search, from which all occurrences are available.

Boolean Operators

| (vertical bar):
serves as the OR operator (e.g., freedom|liberty retrieves instances of either). Uppercase OR is also converted automatically to the vertical bar during searching.
! (exclamation point):
serves as the NOT operator (e.g., !holy ghost retrieves occurrences of ghost, but not holy ghost, whereas Jesus !Christ finds occurrences of Jesus without Christ). Uppercase NOT is also converted automatically to the exclamation point during searching.
Space:
serves as the AND operator in sentence and paragraph Proximity Searching (e.g., church state retrieves all cases where church and state appear in the same specified context; this is not the case in phrase searching, where the words must be adjacent). Uppercase AND is also converted automatically to a space during searching.
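The automatic conversion of uppercase operators can be pictured as a few text substitutions. The following Python sketch is an assumption about that behavior, not PhiloLogic's query parser, and the name convert_operators is hypothetical. It also illustrates why lowercase and, or, and not are safe to type as ordinary words: only the uppercase forms are rewritten.

```python
import re

# Hypothetical sketch of the conversion described above:
# uppercase OR -> "|", NOT -> "!", AND -> space. Matching is case sensitive,
# so lowercase "and", "or", "not" remain ordinary search words.


def convert_operators(query: str) -> str:
    query = re.sub(r"\s+OR\s+", "|", query)   # freedom OR liberty -> freedom|liberty
    query = re.sub(r"\bNOT\s+", "!", query)   # Jesus NOT Christ   -> Jesus !Christ
    query = re.sub(r"\s+AND\s+", " ", query)  # church AND state   -> church state
    return query


print(convert_operators("freedom OR liberty"))
print(convert_operators("NOT holy ghost"))
print(convert_operators("church AND state"))
```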

Wildcard Characters in Full-Text Searching
Wildcard characters allow the user to enter a single search entry that may find many forms. This is in contrast to a simple word search, which requires an exact match in order to find a word. Wildcard characters can be useful, for example, in identifying cognates obscured by affixes and vowel weakening, inconsistencies due to irregular orthography, and variations due to inflection, as well as in discovering potential emendations for uncertain readings. The most commonly used wildcards are listed below.

. (period):
matches any single character (e.g., gentlem.n will retrieve gentleman and gentlemen).
* (asterisk):
matches any string of characters, including none. Placed at the end of a term, it anchors the match at the beginning of a word (e.g., cigar* will match cigar, cigars, cigarette, etc.); placed at the start, it anchors the match at the end of a word (e.g., *habit will retrieve habit, cohabit, and inhabit); it can also appear in the middle (e.g., c*eers matches compeers, cheers, and careers).
.? (period question mark):
matches the characters entered, either alone or with one additional character in place of the question mark (e.g., hono.?r matches both honor and honour, and cat.? matches cat and cats, but not cathedral, Catherine, etc.). Try co.?templa.ion to match contemplation, contemplacion, co˜templation, co˜templacion, comtemplation, and comtemplacion, or ..?onderful to match wonderful and vvonderful.
[a-z] (square brackets):
matches a single character found in the specified range (e.g., [c-f]at will match cat, dat, eat, and fat) or any letters within the brackets (e.g., d[e|i]spis[i|y]ng will match despising, despisyng, dispising, and dispisyng).
Tip: If you are using wildcard characters and would like to see a full list of the words matching your search-term, then run your search as a "Frequency by Author" search. The results page of a "Frequency by Author" search lists all the terms found in a database that match your search-term.
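The wildcards above behave much like a small regular-expression language anchored to whole words. The Python sketch below approximates them under stated assumptions; it is not PhiloLogic's code, and the names wildcard_to_regex and matching_forms are hypothetical. matching_forms imitates the list of unique matching forms that a frequency report shows.

```python
import re

# Hypothetical sketch, not PhiloLogic's code: translate the wildcards described
# above into a Python regular expression and list the matching word forms.


def wildcard_to_regex(pattern: str) -> re.Pattern:
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            parts.append(".*")  # any string of characters, including none
        elif ch == ".":
            parts.append(".")  # exactly one character
        elif ch == "?" and i > 0 and pattern[i - 1] == ".":
            parts.append("?")  # ".?": the preceding single character is optional
        elif ch == "[":
            end = pattern.index("]", i)  # raises ValueError if "]" is missing
            # "|" inside brackets only separates the choices, e.g. d[e|i]spis[i|y]ng
            parts.append("[" + pattern[i + 1:end].replace("|", "") + "]")
            i = end
        else:
            parts.append(re.escape(ch))
        i += 1
    # Searches are case insensitive; fullmatch() below anchors to whole words.
    return re.compile("".join(parts), re.IGNORECASE)


def matching_forms(pattern: str, words: list[str]) -> list[str]:
    """Sorted unique forms matching the pattern, like the list atop a frequency report."""
    regex = wildcard_to_regex(pattern)
    return sorted({w for w in words if regex.fullmatch(w)})


corpus = ["cat", "cats", "cathedral", "Catherine", "honor", "honour"]
print(matching_forms("cat.?", corpus))  # ['cat', 'cats']
print(matching_forms("hono.?r", corpus))  # ['honor', 'honour']
```

Treating "." and ".?" as regular-expression tokens is an assumption that happens to reproduce every example given above.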

Accents and Special Characters
PhiloLogic requires that one take diacritics into account when searching documents with accented characters, in both bibliographic and full-text searching. The system provides three ways to search for accented characters: 1) type the required accented character from the keyboard; 2) use a capital letter to match all accented and unaccented forms of a letter; or 3) enter the two-character representations listed below.
Tip: If you do not want to have to think about accents, turn on "Caps Lock" and type in all uppercase. This is recommended since accentuation varies: one finds, for example, naivete, naivetè, and naïveté in the database. Be sure to enter and, or, and not in lowercase in phrase searches.

capital letter = any form of the letter
(e.g., E matches é, ê, è, ë, and e (no accent), as well as É, Ê, È, Ë, and E (no accent)).
grave = (\) back slash
(e.g., a\ matches à).
acute = (/) forward slash
(e.g., e/ matches é).
circumflex = (^) caret
(e.g., e^ matches ê).
cedilla = (,) comma
(e.g., c, matches ç).
umlaut/dieresis = (") double quote
(e.g., u" matches ü).
tilde = (~) tilde
(e.g., n~ matches ñ).
ae-ligature (æ) = ae
the ligature is resolved into two letters. (e.g., to search æther type in aether).
oe-ligature (œ) = oe
the ligature is resolved into two letters. (e.g., to search œconomy type in oeconomy).
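The capital-letter and two-character conventions can be modeled by decomposing each candidate word into base letters plus combining accent marks (Unicode NFD) and matching against that. The Python sketch below is hypothetical: the names accent_regex and accent_match are invented, it covers only options 2 and 3 (not accented characters typed directly), and the rule that a plain lowercase letter matches only the unaccented form is an assumption inferred from the statement that diacritics must be taken into account.

```python
import re
import unicodedata

# Hypothetical sketch of the accent conventions listed above. Candidate words are
# decomposed (NFD), so "é" becomes "e" followed by U+0301 (combining acute).

CODES = {
    "\\": "\u0300",  # grave       a\ -> à
    "/": "\u0301",   # acute       e/ -> é
    "^": "\u0302",   # circumflex  e^ -> ê
    "~": "\u0303",   # tilde       n~ -> ñ
    '"': "\u0308",   # dieresis    u" -> ü
    ",": "\u0327",   # cedilla     c, -> ç
}
MARK = "[\u0300-\u036f]"  # any combining diacritic


def accent_regex(term: str) -> re.Pattern:
    parts = []
    i = 0
    while i < len(term):
        ch = term[i]
        nxt = term[i + 1] if i + 1 < len(term) else ""
        if ch.isupper():
            # Capital letter: that letter with any accent or none.
            parts.append(re.escape(ch.lower()) + MARK + "*")
        elif nxt in CODES:
            # Two-character code: the letter with exactly that accent.
            parts.append(re.escape(ch) + CODES[nxt])
            i += 1
        else:
            # Plain lowercase letter: assumed to match only the unaccented form.
            parts.append(re.escape(ch) + f"(?!{MARK})")
        i += 1
    return re.compile("".join(parts))


def accent_match(term: str, word: str) -> bool:
    decomposed = unicodedata.normalize("NFD", word.lower())
    return accent_regex(term).fullmatch(decomposed) is not None


for form in ["naivete", "naivetè", "naïveté"]:
    print(form, accent_match("NAIVETE", form), accent_match("naivete", form))
```

The loop mirrors the Caps Lock tip above: the all-uppercase term matches every accented variant, while the lowercase term matches only the unaccented one.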

Punctuation and Full-Text Searching
All punctuation should be stripped from word searches except for apostrophes. Apostrophes must be entered as characters.

apostrophe (') = must be entered, without a space following.
(e.g., to search Emily's, type Emily's; likewise, one must enter d'alexandre, because a search for alexandre will not find this occurrence).
hyphen (-) = a space
the hyphen is not a searchable character. (e.g., to search capo-mastro type in capo mastro).
ampersand (&) = should be stripped
is not a searchable character. Avoid Phrase Searches where an ampersand could be used as a conjunction.
period, question mark, exclamation point, and comma = should be stripped
are not searchable characters.
parentheses, various brackets, and double quotes = should be stripped
are not searchable characters and are word-breaking (e.g., to search vor[r]ia enter vor r ia).
common mathematical symbols
the equal sign (=) and minus sign (-) will produce a "Nothing found" message. The plus sign (+) is not a searchable character and is ignored if entered.
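The punctuation rules amount to a small clean-up step for a phrase copied out of a displayed text. The Python sketch below is a hypothetical helper (query_from_text is an invented name): apostrophes are kept, hyphens, brackets, parentheses, and double quotes become spaces, and the other listed marks are dropped. Semicolons and colons are not mentioned above and are assumed to be dropped too. Note that the period, brackets, and exclamation point double as wildcard and Boolean characters, which is one more reason to strip them from copied text; the sketch is not meant for accent codes such as u", which use the double quote deliberately.

```python
# Hypothetical sketch: turn a phrase copied from a displayed text into a query
# following the punctuation rules above.

WORD_BREAKING = "-()[]{}\""  # replaced by a space
DROPPED = ".?!,&+;:"         # removed outright (";" and ":" are an assumption)

TABLE = {ord(c): " " for c in WORD_BREAKING}
TABLE.update({ord(c): None for c in DROPPED})


def query_from_text(text: str) -> str:
    # translate() applies the table; split/join collapses repeated spaces
    return " ".join(text.translate(TABLE).split())


print(query_from_text("capo-mastro"))
print(query_from_text("vor[r]ia"))
print(query_from_text("the mystery, of Christ's body."))
```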

Text Formatting
Formatting (e.g., font shifts, superscript, subscript, italics, bold, underline, etc.) is ignored in a search (e.g., a 1st with a superscript "st" is searched simply as 1st).

Selecting a Search Option: One may use upper or lower case letters; searches are case insensitive. Wildcards can be used in all search options. Be sure to review sections on accentuation and punctuation in full-text searching.

  • Single Term and Phrase Search: To search a single term in the entire database or a defined corpus make sure that the Single Term and Phrase Search radio button is highlighted, simply enter the term into the Search Text(s) For: box, and press the SEARCH button. Single Term searching supports wildcard characters and the Boolean OR-operator, which is the vertical bar (|). Entering, for example, freedom|liberty retrieves all occurrences of the word "freedom" or "liberty" in the entire database or a specified corpus. Phrase searching restricts the search to adjacent words in a particular order (punctuation in the text, except for apostrophes, should not be entered).
  • Phrase Separated by a Number of Words: If you are looking for a phrase that could have intervening words, turn on the Separated by radio button and enter the number of words (e.g., to find "mystery of His body" or "mystery of Christ's body", enter mystery body).
    Note: For better performance it is a good idea to exclude very common words such as "of" in separated phrase searches.
  • Proximity Searching in the Same Sentence or Paragraph: Searching for more than one term in a single sentence or paragraph without regard to adjacency or word order constitutes Proximity Searching. Simply type the terms in question into the Search Text(s) For: box, indicate whether they are to be found in the same sentence or paragraph by highlighting the appropriate radio button, and press SEARCH. Proximity Searching supports wildcard characters and the Boolean OR operator, the vertical bar (|). If looking for occurrences of the words "church" and "state" within the same sentence or paragraph in any order, enter church state. Entering church state|throne retrieves instances of "church" and "state" or "church" and "throne" in the same sentence or paragraph.
    Please note that many texts do not mark paragraphs and so the entire text is indexed as one paragraph. Also, some texts have the sign ¶ instead of a paragraph tag <p>. These signs are not indexed as paragraphs.
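The sentence-level Proximity Searching and Separated by options above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PhiloLogic's search engine: the names sentence_hits and separated_by are hypothetical, wildcards are ignored, the sentence splitter is deliberately naive, and the Separated by number is assumed to count intervening words.

```python
import re

# Hypothetical sketch, wildcards ignored:
# (1) proximity: every space-separated term must occur in the same sentence,
#     in any order ("|" lists alternatives for one term);
# (2) separated by: `second` follows `first` with at most n intervening words.


def sentence_hits(text: str, query: str) -> list[str]:
    terms = [set(t.lower().split("|")) for t in query.split()]
    hits = []
    # Naive sentence split after terminal punctuation (an assumption)
    for sentence in re.split(r"(?<=[.?!])\s+", text):
        words = set(re.findall(r"[\w']+", sentence.lower()))
        if all(words & alternatives for alternatives in terms):
            hits.append(sentence)
    return hits


def separated_by(words: list[str], first: str, second: str, n: int) -> bool:
    words = [w.lower() for w in words]
    first, second = first.lower(), second.lower()
    for i, w in enumerate(words):
        if w == first and second in words[i + 1:i + 2 + n]:
            return True
    return False


text = "The church and the state agree. The church obeys the throne. The state is silent."
print(sentence_hits(text, "church state|throne"))
print(separated_by("the mystery of His body".split(), "mystery", "body", 2))
```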
Selecting a Results Format: At the head of any results format one finds the bibliographic criteria limiting one's search, the number of texts searched, the search term(s) entered, and the total number of occurrences of the search term(s) in the database. The number of occurrences displays at the bottom of the report if PhiloLogic has not detected the number before generating the first 25 occurrences on the screen.
  • Occurrences with Context is the default reporting format option. In this format each occurrence is represented by a short citation consisting of the author's name and the title of the work followed by links to the occurrences within several levels of context such as page, paragraph, scene, act, chapter, body, or contents. Below the citation there is a passage of text consisting of some forty words on either side of the key word, which is highlighted. Clicking on the links takes one to that level of context at which point one finds links to the previous and next sections.
  • Occurrences Line by Line (KWIC) is a good format for scanning or printing large result sets since it limits the text displayed to a single line of text. Each occurrence is represented by its Title ID with a linked reference to where the term(s) in question occur within the document. At the bottom of the report one finds the Results Bibliography, which lists the full references for the Title ID.
  • Line by Line (KWIC) Sorted: One can also sort Line by Line results. Under Refined Search Results, click on the radio button and indicate whether the results are to be sorted by the word to the left or to the right of the keyword.
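Sorting Line by Line results amounts to sorting on the word nearest the keyword on the chosen side. In the Python sketch below, each line is modeled as a (left context, keyword, right context) tuple; that representation, the name sort_kwic, and the sample lines are hypothetical and not PhiloLogic's data format.

```python
# Hypothetical sketch: sort KWIC lines by the word immediately to the left or
# right of the keyword. Each line is a (left context, keyword, right context) tuple.

Kwic = tuple[str, str, str]


def sort_kwic(lines: list[Kwic], side: str = "right") -> list[Kwic]:
    def key(line: Kwic) -> str:
        left, _keyword, right = line
        words = left.split() if side == "left" else right.split()
        if not words:
            return ""
        # Nearest word to the keyword: last word on the left, first on the right
        return (words[-1] if side == "left" else words[0]).lower()

    return sorted(lines, key=key)


lines = [
    ("the great", "mystery", "of godliness"),
    ("this", "mystery", "is hid"),
    ("a deep", "mystery", "and secret"),
]
for left, kw, right in sort_kwic(lines, side="right"):
    print(f"{left:>12} [{kw}] {right}")
```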
Refined Search Results:
  • Frequency by Author, Year, or Title Reports do not display text. They list the number of occurrences in descending order of frequency with the frequency in bold and the rate per 10,000 in square brackets. There is also a link to the digital table of contents for each title and a link to the occurrences found within that title. At the top of frequency reports one also finds the number of unique forms derived from the search criteria (e.g., clemenc*) within the database and a full list of those unique forms (e.g., clemency | clemencye | clemencie | clemencia).
  • Frequency per 10,000 Reports differ in that they list results in descending order of rate per 10,000 words, with the [frequency] in brackets (e.g., 4.72 [4] means 4.72 occurrences per 10,000 words, with a total of 4 occurrences in that title, author, or group of years).
  • Collocation Table allows the user to discover lexical collocations within the database. The user selects one word as the node or keyword and enters it into the Search for: box. Wildcards are allowed, but only single terms are permitted, not phrases. Select the maximum number of words by which a word may be separated from the keyword (5 words is the default). The program then scans the concordance entries for the keyword and lists in table format all the words that occur within the specified distance of the keyword, in order of frequency. The three columns represent words on either side of the keyword, words to the left of the keyword, and words to the right of the keyword. Common words such as articles and demonstratives are filtered out. See the list of Filtered Words. To include filtered words in a report, select "Turn Filter Off" on the search-form.
  • Line by Line (KWIC) Sorted by Keyword allows users to sort their results by the words to the right or left of the keyword. This report does not support phrase searching; it can only be generated for a single word or word pattern (e.g., myster*). Result sets of more than 20,000 occurrences cannot be sorted.
  • Word in Clause Position (Theme/Rheme): A Word in Clause Position Report can only be generated for a single word or word pattern (e.g., concord*). A word's position is determined by where it falls, as a percentage of the clause's length: Front of Clause (first 35%), Last (last 10%), Remainder (middle 55%), or Too Short (clause length of 3 words or fewer). Words of 2 letters or fewer and numbers are excluded when calculating clause length. Please note that clauses are identified primarily by punctuation, so many unpunctuated clauses will go undetected. This feature is experimental and should be used only as a rough indicator. The search can take some time to complete, but results are displayed as they become ready. When the report is finished, click on the "Statistical Summary" link to see a rough indication of word position.
  • Word Similarity: Only one word may be used, with no pattern matching (e.g., mysterious, but not myster*). Words must be at least five characters long. (To generate a list of words using wildcards, uncheck Similarity and check Frequency by Title after entering a pattern, e.g., myster*, in the search box. At the top of the Frequency report one sees a list of all matching words.) After a Similarity report has been generated, one may check the box next to any word to include it in the search. The number to the left of the box indicates the number of times the word appears in this database.
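Two of the report calculations above can be made concrete. A rate per 10,000 is occurrences divided by total words, times 10,000, so "4.72 [4]" corresponds to a title of roughly 8,475 words (4 / 8,475 × 10,000 ≈ 4.72). The Python sketch below shows that arithmetic and a simple collocation count over a window of span words on each side. It is an illustration, not PhiloLogic's code: the function names are hypothetical, and STOP_WORDS is an invented stand-in for the real Filtered Words list, which is not reproduced here.

```python
from collections import Counter

# Hypothetical sketch of two report calculations described above.


def rate_per_10000(occurrences: int, total_words: int) -> float:
    """Occurrences per 10,000 words, as in Frequency per 10,000 reports."""
    return occurrences / total_words * 10_000


# Invented stand-in for the Filtered Words list (assumption).
STOP_WORDS = {"the", "a", "an", "this", "that", "these", "those", "of", "and"}


def collocations(words: list[str], keyword: str, span: int = 5, filter_on: bool = True):
    """Return (both sides, left, right) collocates, each in order of frequency."""
    words = [w.lower() for w in words]
    keyword = keyword.lower()
    left, right = Counter(), Counter()
    for i, w in enumerate(words):
        if w != keyword:
            continue
        left.update(words[max(0, i - span):i])    # up to `span` words before
        right.update(words[i + 1:i + 1 + span])   # up to `span` words after
    if filter_on:
        for counter in (left, right):
            for stop in STOP_WORDS:
                del counter[stop]  # Counter ignores missing keys on deletion
    both = left + right
    return both.most_common(), left.most_common(), right.most_common()


print(round(rate_per_10000(4, 8475), 2))  # 4.72
sample = "the king loved peace and justice but the people wanted peace and bread".split()
print(collocations(sample, "peace", span=2))
```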

Bibliographic Fields and their Descriptions: The following fields can be used to find documents or to limit the documents in a full-text search. Fields can be used in conjunction with each other. To see what options are available for any given field, click on the "Terms" button next to the field. To see which options remain in a field after entering search criteria in another, click on the "Terms" button next to that field once the other criteria have been entered. These are string searches, so entering "Christian" in the Subject field will also find Christianity.

  • Author: Enter last names first and be sure to include exact punctuation and spacing. If uncertain how a name is spelled, click the Terms button to the right of the Author field for a list.
  • Title: Titles can be quite lengthy and variably spelled. It is best to enter a distinctive phrase rather than a complete title.
  • Date (Numeric): The year of publication, not composition. Enter a single year or a range (e.g., 1758 or 1750-1799).
  • Date (Text): This is a text field with information on date. One cannot search a range of years.
  • Publisher: Information on publishers and printers.
  • Publisher Location: City and at times address.
  • Subjects: Generally, Library of Congress subject headings.
  • Description: Includes information on pagination, illustrations, etc.
  • Notes: This field often contains information on location or condition of the original text, subtitles, and the like.
  • Title ID: The title ID is the ID assigned to a given work by TCP. It is a constant ID so if one wishes to query the same title or set of titles repeatedly, there can be some benefit in learning their IDs.
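How the bibliographic fields combine can be sketched as follows: string fields match anywhere within the field (so "Christian" also finds "Christianity"), Date (Numeric) accepts a single year or a range, and every filled-in field must hold. In this Python sketch the Record structure, the function names, and the sample IDs and values are all invented for illustration, and case-insensitive matching is an assumption.

```python
from dataclasses import dataclass

# Hypothetical sketch of combining bibliographic criteria; not PhiloLogic's code.


@dataclass
class Record:
    title_id: str
    subjects: str
    year: int


def parse_years(date: str) -> tuple[int, int]:
    """'1758' -> (1758, 1758); '1750-1799' -> (1750, 1799)."""
    start, sep, end = date.partition("-")
    first = int(start)
    return (first, int(end)) if sep else (first, first)


def select(records: list[Record], subject: str = "", date: str = "") -> list[Record]:
    chosen = []
    for r in records:
        # String search: the entered text may appear anywhere in the field
        if subject and subject.lower() not in r.subjects.lower():
            continue
        if date:
            lo, hi = parse_years(date)
            if not lo <= r.year <= hi:
                continue
        chosen.append(r)
    return chosen


records = [
    Record("X001", "Christianity -- Early works to 1800", 1640),
    Record("X002", "Christian life", 1655),
    Record("X003", "Agriculture", 1650),
]
print([r.title_id for r in select(records, subject="Christian", date="1650-1699")])
```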

Text Objects and their Attributes: The following fields can be used to find sections of documents or to limit searching to sections of documents in a full-text search. These fields are string searches, so entering "fox" in the Div Name (Head) field also finds foxe and foxes. Fields can be used in conjunction with each other; for example, you can enter fable in the Div Type field and fox in the Div Name field to find all fables with the word fox in the title. To see what options are available for any given field, click on the "Terms" button next to the field. To see which options remain in a field after entering search criteria in another, click on the "Terms" button next to that field once the other criteria have been entered.
Note: This database is lightly tagged and as such one should avoid arguments from silence.
  Div Objects and Attributes

  • Div Name (Head): generally the name or title of a particular section (div).
  • Div Type: the type of a particular section (div) such as chapter, colophon, fable, letter, and vision.
  • Div Author: the author of a particular section within the document as opposed to the author of the actual document.
  • Div Language: indicates the language of a particular division (e.g., ara, eng, fre, gre, heb, ita, lat, spa).
  • Salutation: the text of a salutation associated with letters, speeches, dedications and the like.
  • Dateline: the date and usually also place of origin often found with letters, dedications, prefaces, and such.

  SubDiv Objects and Attributes

  • Tag: tags below the div level such as ARGUMENT, CLOSER, EPIGRAPH, LG, LIST, OPENER, SP, and STAGE.
  • Tag Type: types associated with tags below the div level such as anagram, chorus, envoi, epigram, month, song, sonnet, and syllogism.
  • Tag Language: indicates the language of the material within the tag (e.g., eng, fre, gre, ita, and lat).



Send questions or comments to ets@lib.uchicago.edu.
PhiloLogic Software, Copyright © 2001 The University of Chicago.
PhiloLogic is a registered trademark of The University of Chicago.