First Semi-Annual Report
October 1997 - March 1998

The Digital South Asia Library: A Pilot Project
Lead Institutions: University of Chicago and Columbia University Libraries

Submitted by James Nye and David Magier, June 12, 1998

The University of Chicago Library and Columbia University Libraries have completed the first half year of a two-year pilot project on South Asia under the Association of Research Libraries' "Global Resources Program" funded by the Mellon Foundation. Work began in October 1997, one month later than proposed, with the approval of Deborah Jakubs.

1. Progress Relative to Proposed Objectives

Start-up activities predominated during the first six months. By the end of this reporting period we were in production mode except in Hyderabad, where religious holidays and the search for a data-entry contractor delayed full-scale creation of periodical index records.

The schedule of activities proposed for the first year follows. We have made good progress toward meeting all of these objectives.

September 1997 - August 1998

Begun  Finished  Activity

X      X         Site visits to Hyderabad and Madras in July, August, and January supported by Columbia and Chicago [by Magier and Nye]
X      X         Select, purchase, and install equipment [by Nye, Magier, and staff in India]
X                Selection of serials for indexing and Official Publications of India for treatment during the first year at the October Advisory Board meeting [by the Advisory Board]
X                Indexing of Tamil, Urdu, and English journals [by staff in India and at Columbia]
X      X         Publish the pilot project web pages [by Nye and Magier]
X      X         Selection of reference tools for conversion during the two years [by Nye and Magier, in consultation with South Asia library colleagues in the U.S. and India]
X                Conversion of reference tools and Official Publications of India to digital form [by staff in India]
                 Critique of journal index records at the April Advisory Board meeting [by the Advisory Board]
X                Begin planning for phase two of the program, contingent on favorable assessment during first year [by Nye, Magier, the Advisory Board, and library colleagues in Madras and Hyderabad]
X                Preparation of proposals for funding phase two [by Nye and Magier]
                 Publicize the project and resources made available under the project [by Nye and Magier]
X                Submission of progress report to ARL [by Nye and Magier]

Site visits took place as scheduled. Magier was in Hyderabad in July, while Nye was in Madras and Hyderabad in September and January. These visits were extremely important in maintaining strong working relationships with our colleagues in India and in refining the approaches for creation of index files.

We purchased and installed computers at the Roja Muthiah Research Library (Madras) and the Sundarayya Vignana Kendram (Hyderabad). We selected equipment manufactured in India, and it has performed well to date. If hardware problems arise, we expect repairs to be easier to obtain because the manufacturers are local.

Members of the Selection Board began selection of serials. The Board includes two Urdu and two Tamil scholars -- one each from the U.S. and India for each language -- and a scholar of colonial India. Specifically, the Board consists of: Frances Pritchett, Professor of Urdu, Columbia University; Shamsur Rahman Faruqi, Professor of Urdu, Jamia Millia Islamia University, New Delhi; Norman Cutler, Professor of Tamil, University of Chicago; E. Annamalai, a prominent Tamil scholar, recently retired as Director of the Central Institute of Indian Languages, Mysore; and Dipesh Chakrabarty, Professor of South Asian Languages and Civilizations, University of Chicago. The Board met in conjunction with the annual South Asia Conference in Madison in October 1997 to review the project and suggest several approaches to selection of serial titles. The Board also received valued comments from Prof. C. M. Naim from the University of Chicago, Nalini Persad and Salim al-Din Quraishi from the British Library, and Gail Minault from the University of Texas on specific serials worthy of early indexing.

Staff were engaged and trained in India. We deliberately began with fewer staff than will eventually be required so that we could work out the details of data entry. We created a manual for training in data entry, a copy of which is enclosed with this report. Production figures are appropriate for a start-up phase. A total of 4,250 index records were created: 3,500 English records for the Bibliography of Asian Studies, 650 for Tamil periodicals, and 100 for Urdu periodicals.

We have not yet announced the web site for our project, pending the mounting of more usable data. However, a draft of the project web page is accessible, and some of the links from the page are already "live".

Selection of reference tools for conversion was considered at the October and March meetings of the Committee on South Asian Libraries and Documentation. There was a consensus that we should take up the late nineteenth-century Catalogue of the India Office Library, both for reproduction on acid-free paper and for conversion to an electronic resource. The BookLab, in Austin, will be asked to produce the paper copies for sale to subscribing libraries.

Selection of Official Publications of India titles for conversion is completed. The conversion to electronic text will begin during the project's second half-year.

Proposals to fund phase two of this project are under preparation.

    1. Nye contacted the Ford Foundation in New Delhi about support for the Urdu Research Centre at the Sundarayya Vignana Kendram in Hyderabad. The Foundation is willing to receive a proposal for support of the Centre's infrastructure and for a first phase of preservation activities.
    2. Magier and Nye were asked to form a new center under the Council of American Overseas Research Centers. The center will focus on issues of preservation and access for library collections with strong holdings of regional language materials of South Asia. The Center for Research Libraries is considering a proposal to incorporate the new center under CRL.
    3. The South Asia National Resource Centers at Chicago and Columbia, along with the Triangle South Asia Consortium in North Carolina, are preparing a proposal to the Department of Education under Title VI for creation of a web site containing dictionaries of South Asia. The dictionary project is an expansion of our efforts to convert existing reference works into electronic resources under this ARL project. The proposal, for submission in October, will request approximately $575,000 for conversion of at least one dictionary for each language of South Asia to a structured database marked with SGML tags. In addition to a web site with a search engine for exploring the dictionaries, the full databases will be available without charge for delivery via file transfer protocol over the Internet.

Inclusion of the Triangle South Asia Consortium in North Carolina in the proposal to the Department of Education constitutes an early realization of an objective set for phase two of this project, that is, "expansion of our strategic alliances to include locations in other South Asian countries and more North American participants in the Digital South Asia Library program."

2. Related Activities

Distribution of the Bibliography of Asian Studies (BAS) on CD-ROM has been postponed, with the editors' energies directed instead towards bringing up a searchable file on the World Wide Web. From July 1 onwards, the database will be available on a subscription basis. More information is available at the BAS site.

A preliminary list of serials at the Urdu Research Centre is available.

A list of serial holdings at the Roja Muthiah Research Library is completed and will be mounted this summer at the RMRL site.

An electronic Tamil-Tamil-English dictionary and a dictionary of Tamil idioms and phrases are being brought up by the Committee on Institutional Cooperation's (CIC) South Asia Library Project. These two dictionaries are running in test mode at Chicago. CIC expects to have a formal contract signed with the copyright holder and announce the availability of the dictionary this autumn. CIC is also in the last stages of negotiation with Oxford University Press over distribution rights for Stuart McGregor's Oxford Hindi-English Dictionary. Finally, CIC South Asia National Resource Centers are converting John Platts' A Dictionary of Urdu, Classical Hindi, and English using funds approved under a special collaborative library project for the centers.

It is noteworthy that official expressions of interest have come from Indian and Pakistani government libraries seeking collaboration on the program of access to the Official Publications. Holdings of those publications in both countries are strong, but widely dispersed. Scholars and librarians at those South Asian collections are seeking the easier access this project will provide.

3. Difficulties Encountered and Remedies Effected

The most important difficulty we have faced is recruiting and training staff in India, especially in Hyderabad. Finding indexers with the requisite knowledge and sense of commitment to undertake project work has been slightly more difficult than anticipated, especially given the short term of ensured employment. The solution we effected in Hyderabad was to subcontract the work to a private corporation, the Computer Corporation, owned by Mr. Ashhar Farhan, a leading software vendor for word-processing and typesetting in Urdu script and the son of a prominent Hyderabad Urdu scholar. The early results point towards success through this approach. While preparing the first hundred Urdu index entries, Mr. Farhan and his team have made vital suggestions on the structure of index records and the nature of the relational database containing the data. They have joined us as true colleagues.