The Historical Sciences in the Age of Digitization

Libraries are digitized and put online, respectable publishing houses make their products available on homepages, historical documents or literary works are photographed, digitized and presented to the general public. Important sources are text-genetically analyzed and electronically edited, seminal Western dictionaries are input in China, made searchable and sold on CD-ROM or made accessible on university servers. The international scientific community has never been so closely linked, and never has there been so much material available to so many researchers in the Historical Sciences. It sounds like a Garden of Eden.

So, is it all sunshine and roses, or are there one or two snakes hidden in the branches? What are the new methods we can bring to bear on our sources? Do the new possibilities make our lives easier and produce better results? Are modern researchers really in a better position to do their work than their colleagues of a generation ago? Or are there new obstacles, new problems to consider? These are some of the questions we want to address in this panel.

We invite papers dealing with topics relating to the above questions. The geographical focus need not be limited to South Asia. Indeed, researchers in exotic fields such as Japan, the Islamic World or Italian History would be warmly welcomed.

Chairperson: Mark Schneider

Mark Schneider

Instead of an Introduction: Some Remarks on Digitization with East-Asian Writing Systems

The last one or two decades have seen a fair number of ambitious digitization projects, not only in the Western hemisphere but also in parts of the world which seem far removed from the trodden paths of European and American historical, linguistic or codicological research. In this paper I want to call attention to the situation of digitization efforts in East Asia, where large projects have been accomplished almost unnoticed by the better part of the Western academic world. Notable examples are the digitization of the Chinese classics, or of the Taishô Tripiṭaka, a collection of the Chinese Buddhist canon and its Japanese commentaries.

Along with the usual difficulties, these projects have to contend with problems arising from the writing system. A logographic writing system, to use the most common term, poses its very own challenges to the task of digitizing a source, not the least among them encoding and font range. Other problems are of an institutional nature.

In my paper, I will concentrate mostly on Japan, with just a few casual remarks on the situation in China. First, I will talk about the more technical problems involved in processing historical sources produced on the basis of such a complex writing system. Then I will report on some factors obstructing or aiding large-scale digitization projects which are rooted in the peculiar structures of East Asian academic landscapes. I will proceed to introduce some concrete examples of such projects and the solutions they have found for the problems outlined before, and conclude with a general outlook.

Camillo Formigatti

<title type="alt" xml:lang="eng">TEI and Cataloging Sanskrit Manuscripts</title>

Primary sources, be they literary, archaeological or of another nature, are the backbone of historical research, and the first crucial step of every scientific enterprise is their correct classification and description. Besides being carriers of texts (i.e. literary sources), manuscripts can also be looked at as archaeological artifacts belonging to the material culture of a given society. Among the primary sources they are thus a treasure of information for archaeologists, historians and literary scholars alike. Yet precisely their richness and complexity make their evaluation and interpretation difficult. Manuscript catalogs very often provide a first clue on how to approach this task. The correct and accurate description of a manuscript includes a great deal of critical analysis, both of the text(s) it carries and of its material aspects. Moreover, a catalog entry very often gives a first appraisal of the state of the art by means of a selected bibliography.

In the field of South Asian studies, manuscript catalogs have played an even more important role. During the 19th century many European scholars traveled to the Indian subcontinent and to Central Asia in search of manuscripts of Sanskrit and Middle Indo-Aryan texts, very often working with the help of local Paṇḍits. In 1868 the Indian Government set out on an ambitious enterprise to compile a catalog of all Sanskrit manuscripts in Indian and European libraries. It is thanks to reports and catalogs written by scholars who traveled through the whole of India collecting and buying manuscripts, and to catalogs of Indian manuscripts kept in European libraries, that the knowledge of Sanskrit literature made a huge step forward. Many texts hitherto unknown, and many others that had been deemed lost, were discovered.

Unfortunately, not all collections have been fully cataloged. The collections of South Asian manuscripts in the Cambridge University Library comprise Pali, Sinhala, Burmese, Sanskrit and Prakrit manuscripts. Only a small part of the Sanskrit manuscripts was cataloged by Cecil Bendall in 1883. The Sanskrit Manuscripts Project, Cambridge is currently cataloging all Sanskrit manuscripts in the collections with the aim of making their descriptions available to the scholarly community through a digital catalog. Moreover, a significant portion of the holdings will be digitized and thus become accessible all over the world. The present paper will investigate the advantages and shortcomings of a digital catalog as compared with a traditional catalog in book form. Particular attention will be devoted to the encoding of the information according to the Text Encoding Initiative (TEI) standards. The highly hierarchical organization of the TEI schema for cataloging manuscripts often forces catalogers of South Asian manuscripts to fit the relevant data into the Procrustean bed of categories developed for Western manuscripts. Yet a digital catalog has many advantages over catalogs in book form. The second part of the paper will thus focus on aspects such as the digitization of manuscripts, the possibilities opened up by cross-referencing information and the open character of digital texts. A short conclusion will be provided, or maybe not, who knows? After all, digital texts are always an in fieri process, and so is this catalog.

Vanja Štefanec

Natural Language Processing in Philological Research

[long abstract to be found in the conference booklet]

