IWW Mark-Up (Draft 6/17/04)

Preliminary Comments:
In page, div, and all other SGML tags, any data following an equal sign (=) must be in double quotation marks. Those quotation marks cannot themselves contain quotation marks (most commonly a problem with name=""). If quotation marks are needed inside a value, use the SGML entities &ldquo; and &rdquo;.
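
The quoting rule above lends itself to a mechanical check. What follows is a minimal sketch in Python, not part of the IWW system. The function name tags_with_unquoted_values is our own invention, and the check assumes that no attribute value itself contains an equal sign. It does not catch quotation marks nested inside a value.

```python
import re

# Matches one SGML tag, e.g. <div1 name="X" type="X">.
TAG_RE = re.compile(r"<[^<>]+>")
# Matches an equal sign that is NOT immediately followed by a double quote.
UNQUOTED_RE = re.compile(r'=(?!")')


def tags_with_unquoted_values(text):
    """Return every tag in `text` that has an unquoted attribute value."""
    return [tag for tag in TAG_RE.findall(text) if UNQUOTED_RE.search(tag)]


sample = '<page n=13>\n<div1 name="Part One" type="Part">\n<page n="14">'
print(tags_with_unquoted_values(sample))  # ['<page n=13>']
```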


I. Headers
Headers are made up of only three associated IDs. Metadata will be derived from indexing. The header is followed by a <text> tag, which is immediately followed by a page tag. All text needs a page tag. Use "NA" (not available) if the page has no number. For unnumbered pages in frontmatter, calculate backwards from the first numbered page. The first page tag is always followed by a div1 tag.

<authorid n="A0051">
<titleid n="A0051-T001">
<editionid n="E00001-01">
<page n="X">
<div1 name="X" type="X">
<p>Blah, blah, blah . . .
<p>More blah, blah, blah . . .

II. Page Tag

The page tag should be constructed as follows on its own separate line with no leading or trailing spaces:
<page n="X">

Where X equals the page number. It must precede any other information on the page or it will be lost to the preceding page. This also means that if you have a page image tag, it too MUST follow the page tag. It does not matter where the page number actually appears on the printed page; it must come first. Do not center, align, bold, italicize, etc. the page number either. Do NOT surround by paragraph tags or it will break the phrase/sentence/paragraph. Words with end of line hyphenation that crosses page tags must be rejoined before adding the page tag.

Page tags may break divs, paragraphs, and sentences, but not words (or tables). Do not enter page breaks; they will be calculated from the page tag.
Be sure to end italics, bold, underline and <head> tags before starting a new page. Then you can restart with the new page.

<page n="13">
<page n="xxii">
<page n="NA">

If you see the following:

. . . This is the text that is ending the page. I must con-


tinue it to the next to show you the whole thing.
    Now on to the next paragraph. . . .

Example of the preceding bit of text tagged properly:

<p> . . . This is the text that is ending the page. I must continue
<page n="X">
it to the next to show you the whole thing.
<p>Now on to the next paragraph . . .

Remember: the page tag goes on its own line with NO leading or trailing spaces, or the notes on the page will not have a link to that page. Also, NEVER KEY IN A BLANK PAGE. A page must have something on it other than a page tag (a figure tag alone is just fine).
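
The own-line rule can be checked before a file is submitted. The following Python sketch is only an illustration under our own assumptions: the function name misplaced_page_tags is hypothetical, and any line that mentions a page tag but is not exactly one tag of the form <page n="..."> is reported.

```python
import re

# A well-formed page tag that fills the whole line, e.g. <page n="13">.
PAGE_LINE_RE = re.compile(r'<page n="[^"]+">')


def misplaced_page_tags(text):
    """Return 1-based line numbers where a page tag shares its line
    with other text or with leading/trailing spaces."""
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        if "<page" in line and not PAGE_LINE_RE.fullmatch(line):
            problems.append(number)
    return problems


sample = '<page n="13">\n <page n="14">\ntext <page n="15">\n<page n="NA">'
print(misplaced_page_tags(sample))  # [2, 3]
```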

Alternate Page Numbering
Many texts include alternate page numbers in bold and square brackets. These often refer to the pagination of earlier editions. If one types in <b>[178]</b>, the number will be indexed as text and will break phrases and sentences. Instead, enter it as <altpage n="178">; the number will then not be indexed, and we can display it however we want on output.

III. Divs

We go to three levels:

<div1 name="Frontmatter" type="Frontmatter">
  <div2 name="Home after Many Years of Travel" type="Title Page">
  <div2 name="Dedication to Mr. Fiddle" type="Dedication">
  <div2 name="Preface" type="Preface">
<div1 name="Part One" type="Part">
  <div2 name="Volume One" type="Volume">
    <div3 name="Chapter One" type="Chapter">

Divs surround their texts.

The div tag should be on its own separate line.

**If the div name is all that is needed for display, do not repeat it in the text. The div name will appear on the screen and be indexed.

After pagination, start with <div1>, subdivided by <div2>, <div3>, etc., following the logical structure shown in the example above.

Div attributes: Name and Type
Divs MUST have names (e.g., derived from a title or header). The name of a div will display and be searchable. Divs must also have types. These will display in search results in the future. Div names and types are to be constructed as such:
<div1 name="Volume I" type="Volume">
<div2 name="Chapter One" type="Chapter"></div2>
<div2 name="Chapter Two" type="Chapter"></div2>
<div1 name="Volume II" type="Volume">
<div2 name="Chapter One" type="Chapter"></div2>
<div2 name="Chapter Two" type="Chapter"></div2>

Letters and diaries have extra attributes.
<div2 name="Letter III" type="letter" year="1575" recipient="Lucas, Livingston">
Diaries have only the year attribute. If the year is unknown, enter 9999; do not leave it blank. Names of recipients should be inverted (surname first); if the recipient is unknown, enter "Not indicated" with "indicated" in all lower case.
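
For anyone generating letter divs from a spreadsheet or list, the defaults above can be applied automatically. This is a sketch only, in Python; the function letter_div and its parameters are hypothetical, and it assumes the recipient name has already been inverted by hand.

```python
def letter_div(level, name, year=None, recipient=None):
    """Build a letter div tag, filling in the required placeholder values."""
    # An unknown year is entered as 9999, never left blank.
    year_value = "9999" if year is None else str(year)
    # An unknown recipient is entered as "Not indicated".
    recipient_value = recipient if recipient else "Not indicated"
    return (
        f'<div{level} name="{name}" type="letter" '
        f'year="{year_value}" recipient="{recipient_value}">'
    )


print(letter_div(2, "Letter III", 1575, "Lucas, Livingston"))
print(letter_div(2, "Letter IV"))
```

The first call reproduces the example tag above; the second shows both placeholders in use.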

Note: The double quotes ("") around the text in name and type are mandatory! Also, if a title that is being used as a div name has a note attached to it, do NOT put the note inside the div name. Find another place on the page for that note.

Alternate titles should be constructed as such:
<div1 name="Letter of X to Y">
<head>A Poignant Love Letter</head>

Note: Both titles will be in the actual text and therefore searchable. Only the first within the div will display in the digital table of contents. If you want the editor's title, e.g., "A Poignant Love Letter," to appear in the digital table of contents as well, then make it a subtitle of the div1 name as such:
<div1 name="Letter of X to Y: A Poignant Love Letter">

Please do not put in a <br> or double quotes; use the colon and &ldquo; &rdquo;.

See the list of Div Types to be used.

IV. Paragraphs vs Double Breaks:
We prefer the end of paragraph tag (</p>) on its own separate line.
<p>first paragraph text
<p>next paragraph of text

A <p> tag is an intellectual structure. Users should be able to look for this word and that word within the same paragraph. The <p> tag defines that paragraph. If you need a blank line in the middle of a paragraph, use <br><br>.

Do not include <p> tags in notes. Use <br><br> instead.

It is best to end italics, bold, underlining, and the <head> tags with each paragraph and then start them anew with the next paragraph.

Paragraph Attributes
<p> tags can have only one attribute, align, whose values are r, l, and c. Use it very sparingly, since the system does not respond well to it. Where possible, use the <center> tag in place of <p align="c">.

V. Line Numbers and Indent:

Line Numbers
Line numbers for poetry and drama must be captured, but we do not want them to break the flow of the text. They must be put into the following construct:
<l n="x">
x is the line number. If there are no numbers then one can use <l>.

Indent
One needs to indent certain forms of poetry. (One does not need to indent the beginning of prose paragraphs.) Use the <indent> tag for this purpose, with the number of spaces noted in the "n" attribute. Also, for poetry with stanzas we use <lg> </lg> instead of <p> </p>. Example:

1 Ahi cieco amore! ad anime
      Prive di bei consiglj
      Ah perche far di figli
      Un disgraziato don!
5 No che tal don non merita
      Chi 'l suo dovere obblia,
      Chi dell' error la via
      Va trascorrendo ognor;
9 Chi do suoi vizj il cumolo
      Soverchiamente accresce,
      E l' albero, che cresce
      Chi coltivar non sa.

<lg> <l n="1">Ahi cieco amore! ad anime
<l><indent n="3"> Prive di bei consiglj
<l> <indent n="3"> Ah perche far di figli
<l> <indent n="3"> Un disgraziato don!
</lg> <lg> <l n="5">No che tal don non merita
<l> <indent n="3"> Chi 'l suo dovere obblia,
<l> <indent n="3"> Chi dell' error la via
<l> <indent n="3"> Va trascorrendo ognor;
</lg> <lg> <l n="9">Chi do suoi vizj il cumolo
<l><indent n="3"> Soverchiamente accresce,
<l> <indent n="3"> E l' albero, che cresce
<l><indent n="3"> Chi coltivar non sa. </lg>

VI. Speeches: Speakers and Stage Directions:

For drama, use a speaker tag around each speaker's name and a speech tag around dialogue with stage directions enclosed in <stage> tags as follows:
<p> <speaker>character's name</speaker>
Dialogue goes here. <stage>Stage direction in speech</stage> And speech continues.

Do NOT add a break <br> or an end of paragraph tag </p> after the </speaker> tag. The system will bold and add the break. The system will also italicize the stage directions.

VII. Figures:

All images will be in the same directory so we do NOT need a path. Simply put in the exact filename with extension (filename.ext) where you see sysid="filename.ext".

There are two kinds of images: inline and linked. Inline images display on the page itself. Set inline to "y" for yes:
<figure inline="y" sysid="filename.ext">

For linked images, set inline to "n" for no. The value of ref becomes the hypertext link to the image; with ref="Image", the word Image is the link.

<figure inline="n" sysid="filename.ext" ref="Image">

Ref options for figures can vary.
"Image" - any color or black and white graphic (photo, line drawing, etc.)
"Manuscript Image" - a facsimile of a manuscript page (handwritten)
"Page Image" - a facsimile of a printed page.

Other options for ref text may include Portrait, Map, or Chart

A third kind of figure tag exists, which we should avoid or use sparingly. The linked image tag above does not add its ref text to the index: you can search the database for "manuscript image" and not find it, even if you have hundreds of such links. To make the link text searchable (using a caption as the link to the image), do it this way:
<figure inline="n" sysid="filename.ext">the text of caption</figure>
This is likely to introduce mistakes so let's avoid it.
All complex tables should be treated as images.

VIII. Links:
In an effort to keep the database true to its name, Italian Women Writers, excessive commentary and other editorial matter will be available only as linked HTML files. Short notes and such will be indexed, however. The HTML files are to be in a directory named by the ftext_code (Title ID). The index.html file for the directory will be the table of contents page. Each HTML file will have named anchors so that one can go directly to the relevant point of the commentary. Use the hash mark (#) to point to an exact place in the HTML file (e.g., A0024-T001/Prima.html#B), which must be marked there with <a name="X"></a> (e.g., <a name="B"></a>). In the SGML text, the link looks like this:
<link sysid="A0024-T001/Prima.html#B" ref="B">
where "B" is the text that one clicks on to get to the html text.
The ref text will not be in superscript, but it will be surrounded by square brackets on output (e.g., Blah, blah, blah [B]). Do NOT add the square brackets; the system does. Keep the brackets in mind when you add the ref: if the reference already has parentheses around it in the source, it is best to strip them out.

IX. Notes:
Notes (footnotes and endnotes) will be embedded in the text.
<note></note> contains a note or annotation. General rules for inclusion: treat all footnotes and endnotes the same way. Omit notes that refer to numbered pages of the text (e.g., "For further discussion, see page 35.") as these will be meaningless; do not forget to remove the note reference along with the note. Do not use superscript for note references; the system does this. A reference number keyed as plain text fuses with the preceding word, so that Johnson2 is indexed as Johnson2. We supply note references dynamically.

Construct tag as follows:
body of doc<note ref="#">#. text of note.</note> body of doc
**A space is required following the tag if a displayed space is desired. The tag does not leave an automatic space.

In the following year Dr. Henry Parsnip1 ventured into terra incognita and witnessed many fabulous happenings.

1. Dr. Henry Parsnip is best known as the close companion of Father Francis Garden.

<p>In the following year Dr&dot; Henry Parsnip<note ref="1">1. Dr&dot; Henry Parsnip is best known as the close companion of Father Francis Garden.</note> ventured into <i>terra incognita</i> and witnessed many fabulous happenings.

a. Notes can be long and be made up of several paragraphs. Do NOT use a <p> tag in the note; use <br><br> instead.

b. Notes, not in the author's voice, that are extensive should become links.

c. Notes on notes are not permitted. Include the note on a note in square brackets []. To make sure the note does not break phrase searching, try to place it at the end of a sentence.

d. Notes cannot be attached to titles and subtitles that are entered as div names. Find another place on the page for the note reference.

X. Lists and Tables:

A. Lists
Note: a list will not be considered a separate object in terms of searching unless surrounded by <p></p>

B. Tables
All complex tables should be treated as images.
The main thing to realize is that if you insert a page tag in the middle of a table, neither half will display when a user clicks on [page]: the first half lacks the closing </table> tag, and the second half lacks the opening <table> tag. If a table crosses pages, close it at the page break and then start it over again on the new page. Goofy, but that's what we're stuck with.
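
The close-and-reopen step can be sketched in Python as follows. This is an illustration under stated assumptions rather than a tool we supply: the function name split_tables_at_pages is hypothetical, table and page tags are assumed to sit on their own lines, the reopened table is assumed to need no attributes, and the row lines in the sample are placeholders.

```python
def split_tables_at_pages(lines):
    """Close an open table before each page tag and reopen it afterwards."""
    out = []
    in_table = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("<table"):
            in_table = True
        elif stripped.startswith("</table"):
            in_table = False
        if stripped.startswith("<page ") and in_table:
            # The page tag stays on its own line between the two halves.
            out.extend(["</table>", line, "<table>"])
        else:
            out.append(line)
    return out


sample = ["<table>", "row one", '<page n="8">', "row two", "</table>"]
for line in split_tables_at_pages(sample):
    print(line)
```

Page tags that fall outside a table pass through unchanged.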

XI. Display
Simply use these HTML tags:
Most common errors: not leaving a space between words in italics or bold; putting punctuation inside of bold or italics tags.
Do not use the <font size="+1"></font> tags. Instead use <head> or some other rendering tag that is not a font tag.
It depends on what one is rendering. Speaker tags should be included and formatted in output.

Be sure to end italics, bold, underlining, centering and <head> tags before starting a new paragraph, page or div.

XII. System Requirements:
Periods following abbreviations = &dot;
Ampersands (&) = &amp;
Single quotes = &lsquo; and &rsquo;
Angle brackets = &lt; and &gt;

Apostrophes and double quotes are fine in text.
Use &laquo; for « and &raquo; for »
Section signs are okay too: § = &sect;

A fuller list of html entities is available.
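
The ampersand and angle bracket substitutions can be applied to raw text before any tags are added. The Python sketch below is a suggestion only; escape_text is a hypothetical name, and the function assumes that anything of the form &name; is already an entity and should be left alone. It must not be run on text that already contains tags, since it would convert their angle brackets as well.

```python
import re


def escape_text(text):
    """Escape &, < and > in untagged text, leaving existing entities intact."""
    # Escape ampersands first, skipping those that already start an entity.
    text = re.sub(r"&(?![A-Za-z]+;)", "&amp;", text)
    return text.replace("<", "&lt;").replace(">", "&gt;")


print(escape_text("Smith & Sons, Dr&dot; Jones, 3 < 5"))
# Smith &amp; Sons, Dr&dot; Jones, 3 &lt; 5
```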

XIII. Diacritics:
Use html entities, e.g., &egrave;=è &Egrave;=È &agrave; &Agrave; &uuml; &Uuml; &ecirc; &Ecirc; &ccedil; &ntilde; &aelig; &AElig; &oelig; &OElig;

See this Web site with character entities.

XIV. Questions and Missing Text:

Use <question> for questions to be referred to IWW editors.

If text is missing from a page use a <lacuna> tag.

If pages appear to be missing (not blank pages, but pages with text) use the following tag: <missing pp.x-y>, where x-y represents page numbers.

XV. Excisions:

Pages, illustrations, and other text to be omitted will be crossed out. The following do not need to be crossed out, but should be omitted during data capture:

Notes on Structural Hierarchies and Tagging

One must not confuse intellectual/logical divisions with formatting for display. A <div1> or <div2>, a paragraph, a sentence, and a word are all logical units and must not be broken by tags arbitrarily. Here are examples of things to watch out for:

Division <div>:
Don't use div tag for an alternate "title." div is a logical structure. Use <head> or <subhead> tags.
Alternate titles should be constructed as such:
<div1 name="Letter of Abigail Adams to John Adams, August 15 1774" type="Letter">
<head>A Poignant Love Letter</head>

Paragraph <p>:
If you want a blank line within a paragraph, you must not employ the <p> tag for this display. You must enter <br><br>.

Sentence termination will be calculated automatically from punctuation, spacing, and capitalization. Certain abbreviations marked with a period (Mr. Dr. Rev.) will break the sentence and will impair phrase searching and same-sentence proximity searching. So that these periods are not confused with full stops, replace the period with the SGML entity &dot; and we will convert it to a period on output for display.
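
The substitution can be done in bulk. Here is a minimal sketch in Python; the function name protect_abbreviations is hypothetical, and the abbreviation list contains only the three examples named above, so it would need to be extended for real texts.

```python
import re

# Only the abbreviations named in these guidelines; extend as needed.
ABBREVIATIONS = ("Mr", "Dr", "Rev")


def protect_abbreviations(text, abbreviations=ABBREVIATIONS):
    """Replace the period after each listed abbreviation with &dot;."""
    pattern = r"\b(" + "|".join(map(re.escape, abbreviations)) + r")\."
    return re.sub(pattern, r"\1&dot;", text)


print(protect_abbreviations("In the following year Dr. Henry Parsnip met Mr. Fiddle."))
# In the following year Dr&dot; Henry Parsnip met Mr&dot; Fiddle.
```

The sentence-final period is left untouched because it does not follow a listed abbreviation.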

End of line and end of page hyphenation of words must be eliminated and the words joined. The orphan text should be brought up to the previous line/page. Also, watch for word clusters: a tag does not add a space. The most common mistake involves italicized text, for example: She read the book <i>Pride and Prejudice</i>by Jane Austen. The index will have the word cluster Prejudiceby because there is no space after the closing tag. The same problem occurs with bold, underline, super/subscript and font size tags.
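
Word clusters of this kind are easy to miss by eye. The Python sketch below flags candidates for review; it is our own illustration, with the hypothetical name find_word_clusters, and it looks only at closing italic, bold, and underline tags that are followed directly by a letter or digit.

```python
import re

# A closing <i>, <b> or <u> tag followed directly by a letter or digit,
# together with the word that precedes the tag.
CLUSTER_RE = re.compile(r"\S*</(?:i|b|u)>\w+")


def find_word_clusters(text):
    """Return each spot where a closing display tag runs into the next word."""
    return CLUSTER_RE.findall(text)


bad = "She read the book <i>Pride and Prejudice</i>by Jane Austen."
good = "She read the book <i>Pride and Prejudice</i> by Jane Austen."
print(find_word_clusters(bad))   # ['Prejudice</i>by']
print(find_word_clusters(good))  # []
```

A match is not always an error (part of a word may be italicized on purpose), so each hit should be checked against the printed page.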