The Orlando Project: Building Digital Resources for an Integrated History of Women's Writing in the British Isles

Presented at the Digital Resources in the Humanities Conference 1997

Susan Brown (University of Guelph); Patricia Clements, Isobel Grundy, Susan Hockey, Terry Butler, Susan Fisher, Katherine Binhammer, and Jeanne Wood (University of Alberta)

The Orlando Project is a five-year collaborative enterprise which will produce the first full scholarly history of women's writing in the British Isles. It combines interdisciplinary research and the use of electronic technology to create a history which will appear in both printed and electronic form. We call our history "integrated" because the different strengths of these two media reinforce each other. The project brings together the disciplines of humanities computing and literary history in a team which consists of five principal investigators, two post-doctoral fellows, a project librarian and ten graduate research assistants. It is supported by a Major Collaborative Research Initiative grant for Can$1.6 million from the Canadian Social Sciences and Humanities Research Council and began in summer 1995. One principal investigator and two graduate research assistants are based at the University of Guelph. The remainder of the team are at the University of Alberta, where the project is directed by Patricia Clements, Dean of Arts.

Literary studies have entered a period of renewed interest in history, a result in part of the critique of earlier historical accounts by researchers into writing produced by women and other marginalized groups. Scholars of women's writing have long felt the need for a broad literary history that is drawn from this field of knowledge itself and not adapted from masculine norms. We aim to provide for women's writing something which the work of centuries has furnished for "mainstream" writing: an account which places various genres, issues, and local and temporary groupings in relation to each other. We will produce five printed volumes: a chronology volume and four volumes of literary history covering the periods: beginnings to 1830, 1820 to 1890, 1880 to 1945 and 1939 to the present. We will also produce an as yet unknown range of electronic products, derived from the digital resources which we are compiling.

Unlike almost all other current digital projects in the humanities, the Orlando Project is not creating electronic representations of primary sources or other existing print or manuscript material. Rather, we are developing a textbase for a literary critical history which incorporates basic research on all aspects of British women's writing. Put very simply, our textbase is a very large collection of research notes and material, derived by normal scholarly research methods. What makes our material significantly different is that the entire collection incorporates rich SGML encoding which reflects not only the structure of the documents, but description, interpretation and content tagging to meet the needs of the literary scholars as they develop and shape the intellectual goals of the project.

There were many reasons to choose SGML for this project, some of which (longevity of the material, machine-independence, etc.) will be very obvious to this conference. From our perspective, SGML allows us to address the complexities and contradictions of a scholarly field, and to provide multiple levels of encoding for framing critical arguments and for linking data and argument in hypertexts. With SGML we can write the kinds of continuous prose which scholars in the humanities expect to work with, but we can also include in that prose content tagging to make the information accessible for retrieval and analysis by scholars, students and general users alike.

Developing the SGML encoding scheme for the Orlando Project was thus somewhat different from the normal procedures for document analysis, since there were no extant documents to analyse! Rather, it became a kind of project requirements analysis, incorporating a detailed discussion of the aims of the research as the co-investigators sought to define what they wanted the project to accomplish. From this, the project had to determine how the material might best be organized and what the critical elements within that organization should be. The major components of the projected textbase are: a complex and dynamic chronology of women's writing; collections of brief informative documents on topics ranging from biographical summaries and critical assessments of writers to discussions of literary genres and short explanations of historical events; lengthier discussions treating major developments in women's history; and mapping, sound clips, images and video. All these must be integrated by SGML structures. Bringing the intellectual concerns into the SGML encoding has been the biggest challenge.

Early in our work, several primary areas of enquiry were identified: women, their writing and the social, cultural and historical context which influenced their writing. These formed the basis of the development of three major DTDs: biography, writing and events. The project has been able to build on the work of the TEI in defining the structural components of our DTDs, but we have also needed to add tagging to accommodate our research needs, to include chronology items and research and scholarly notes as well as more refined tagging for names and places. Beyond these we have devised tagsets for the discussion of issues of authorship and attribution (anonymous, pseudonymous, unattributed, disputed, and collaborative texts), for genre issues (varying names of genres, and hybrid, innovative or experimental genres), and for issues of reception (gendered responses as well as others). The concepts inherent in the TEI header have also helped us create a document header ("Orlando header") which describes the document and records its progress as the graduate research assistants, volume authors and others work on it.

Within the TEI-like shell are the content, critical and interpretive tags which are the essence of the scholarly research in the project. Hundreds of tags have been discussed and debated, as the project sought to articulate its scholarly requirements in more detail and in new ways. For each tag we must formalize the definition and then monitor how precisely and consistently it is used in practice as new research is added to the textbase.

In the first few months of the project a DTD for biographies was developed and the first group of graduate research assistants was trained in its use. The creation of biography documents began in January 1996 and inevitably revealed some shortcomings in the DTD design. These have been addressed and work is now proceeding relatively smoothly. By mid-May 1997, some 500 biography documents had been created, at least in an initial form. E-mail lists and a hypermail archive help the taggers keep track of decisions made on practice issues, and the occasional round table session gives them an opportunity to discuss problems. The writing and events DTDs are also now in place and work has begun on creating these documents, both by tagging new research and by converting information which has been compiled separately in Access and in ProCite.

Our first major milestone is the completion of the chronology. Each of the DTDs contains items for the chronology tagged as elements. With a Perl script we are able to pull all of these out and sort them by date (and by keyword if needed), thus creating a draft chronology for the co-investigators to examine and comment on. Naturally, all changes will be entered into the original documents so that they are available for all future uses of the textbase.
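The extraction step described above can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not the project's script. The element name `CHRONITEM` and the attributes `DATE` and `KEYWORDS` are hypothetical stand-ins for whatever the Orlando DTDs actually define, and dates are assumed to be ISO-style strings so that plain string sorting gives chronological order.

```python
import re
from typing import NamedTuple


class ChronItem(NamedTuple):
    date: str        # assumed ISO-style, e.g. "1792-01-01", so strings sort chronologically
    keywords: tuple  # keywords attached to the item (hypothetical attribute)
    text: str        # item prose with inline tags stripped
    source: str      # name of the document the item came from


# Hypothetical markup: <CHRONITEM DATE="1792-01-01" KEYWORDS="politics">...</CHRONITEM>
ITEM_RE = re.compile(r"<CHRONITEM\b([^>]*)>(.*?)</CHRONITEM>", re.DOTALL | re.IGNORECASE)
ATTR_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def extract_items(doc_name, sgml_text):
    """Pull every chronology element out of one document's text."""
    items = []
    for attr_text, body in ITEM_RE.findall(sgml_text):
        attrs = {name.upper(): value for name, value in ATTR_RE.findall(attr_text)}
        keywords = tuple(attrs.get("KEYWORDS", "").split())
        # Drop nested tags and collapse whitespace to get readable prose.
        text = " ".join(re.sub(r"<[^>]+>", "", body).split())
        items.append(ChronItem(attrs.get("DATE", ""), keywords, text, doc_name))
    return items


def build_chronology(documents, keyword=None):
    """Merge items from all documents, optionally filter by keyword, sort by date."""
    items = []
    for name, text in documents.items():
        items.extend(extract_items(name, text))
    if keyword is not None:
        items = [item for item in items if keyword in item.keywords]
    return sorted(items, key=lambda item: (item.date, item.source))


if __name__ == "__main__":
    docs = {
        "biography-a.sgm": '<P>Intro.</P><CHRONITEM DATE="1818-01-01" KEYWORDS="publication">'
                           "<NAME>Author A</NAME> publishes a novel.</CHRONITEM>",
        "events.sgm": '<CHRONITEM DATE="1792-01-01" KEYWORDS="politics publication">'
                      "A political treatise appears.</CHRONITEM>",
    }
    for item in build_chronology(docs):
        print(item.date, item.text, f"[{item.source}]")
```

Because the items are only read from the source documents and never edited in the output, corrections still have to be made in the original files, which matches the practice described above.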

Selecting and developing the computing tools for the Orlando Project has also posed some interesting challenges. The choice of SGML ensures that our information will outlive the system on which it is created, but we had to make it easy for a team of literary researchers with only word-processing experience to work with SGML, so that they could concentrate on the intellectual decisions in the tagging and not be distracted by too many computing issues. Windows was the obvious choice of platform and we initially also chose to use WordPerfect SGML Edition on the grounds that most of the team already had WordPerfect experience. During the first year of tagging enough problems arose with WordPerfect for us to decide to move to Author/Editor in the spring of 1997. All the taggers also have a copy of Panorama to view their documents.

Keeping track of all the documents being created and revised by a team of this size at two different sites was another challenge to be tackled. We devised a Web-based check-in/check-out system for the central repository of documents and over the summer of 1997 are developing this into more of a full-blown workflow management system. This will become crucial as we gradually move into a mode of revision and improvement to the textbase and make more use of tools that will help us get an overall picture of the material. At present we make use of SGREP to perform one-off structured searches. All the biography documents are also loaded into OpenText's LiveLink software for broad searches, and we are embarking on more customization to make this useful.
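The check-in/check-out idea mentioned above can be sketched in a few lines. This is an illustration only, not the project's Web-based system: the class and method names are hypothetical, and the state is held in memory, whereas a real repository would keep it on a shared server. The one rule it enforces is that a document has at most one holder at a time, so that taggers at the two sites cannot overwrite each other's revisions.

```python
class CheckoutError(Exception):
    """Raised when a check-out or check-in request cannot be honoured."""


class Repository:
    """In-memory stand-in for a central document repository (hypothetical design)."""

    def __init__(self, doc_ids):
        # Map each document id to its current holder; None means available.
        self._holder = {doc_id: None for doc_id in doc_ids}
        # Simple audit trail of (action, document, user) tuples.
        self.log = []

    def holder(self, doc_id):
        """Return the user holding the document, or None if it is available."""
        return self._holder[doc_id]

    def checkout(self, doc_id, user):
        """Reserve a document for one user; refuse if someone else holds it."""
        current = self._holder[doc_id]
        if current is not None:
            raise CheckoutError(f"{doc_id} is already checked out by {current}")
        self._holder[doc_id] = user
        self.log.append(("checkout", doc_id, user))

    def checkin(self, doc_id, user):
        """Release a document; only the user who holds it may do so."""
        if self._holder[doc_id] != user:
            raise CheckoutError(f"{doc_id} is not checked out by {user}")
        self._holder[doc_id] = None
        self.log.append(("checkin", doc_id, user))


if __name__ == "__main__":
    repo = Repository(["biography-001", "biography-002"])
    repo.checkout("biography-001", "tagger_at_alberta")
    try:
        repo.checkout("biography-001", "tagger_at_guelph")
    except CheckoutError as err:
        print("refused:", err)
    repo.checkin("biography-001", "tagger_at_alberta")
    print(repo.log)
```

The audit log is the natural starting point for the workflow tracking described above, since it records who worked on which document and in what order.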

However, our major computing challenge is the delivery system or systems. We do not yet know what these will be, but the team can, and does, indulge in much speculation on what we would like the user to be able to do. We have the tagging in place for the implementation of many different functions, but we have yet to see how these will work out in practice or how they will scale up as the textbase grows. At present, our energies are directed to the completion of the chronology. We have been accepted into the Electronic Book Technologies (EBT) Higher Education Grant Program and believe that we can deliver the chronology in traditional and in innovative ways using the EBT software.

As the project moves into its second phase, we plan to begin working with multimedia: geographic information systems, photos, audio and video. These will pose a new set of challenges as we grapple not only with technology that is new to most of the team but, much more significantly, with how these new media can best be used to serve the intellectual goals of the project.


