Issues, principles and policies for creating high-quality digital resources with low-cost methods

Alan Dawson, Centre for Digital Library Research, University of Strathclyde

alan.dawson@strath.ac.uk

September 2005

Scholars in the arts and humanities are using digital resources in increasingly varied and sophisticated ways. Many of these resources are digital versions of existing resources such as photographs, printed publications or correspondence. Often they are of high quality, created by libraries, museums and archives focused on making their resources available to the broad spectrum of potential users. However, where digital resources are created to meet the needs of an individual research project, the emphasis is understandably on the requirements of that specific project rather than on any broader objectives.

This presentation aims to demonstrate some low-cost methods for creating high-quality digital resources and maintaining them with non-specialist staff, while ensuring that they can support the widest possible range of uses beyond the immediate context of their creation. In this context, 'high-quality' refers to the characteristics of the digital resources, not the content itself. Five particularly desirable characteristics of digital collections are considered:

1. Sustainability
2. Longevity (preservation)
3. Accessibility
4. Interoperability
5. Reusability

For each of these key issues, the presentation will describe and demonstrate practical methods that have been used to create and manage digital resources efficiently within the Glasgow Digital Library. It will place them in a theoretical context by showing how these methods have evolved from the principles and assumptions that underpin the philosophy and purpose of the digital library. The illustrations will draw on a range of different types of material, including photographs, letters, historical ebooks, exhibitions and political ephemera.

1. Sustainability

Being able to sustain a digital collection means keeping costs as low as possible over a long period. It often means keeping a service running after project funding has expired.

Principles and assumptions | Implications and methods | Illustrations
Time is money. Time-saving methods enhance sustainability. | Automate whatever can be automated, direct manual effort where it has most value. | Show automatic creation of websites that can readily be amended by non-specialist staff.
Aim to eliminate recurrent costs such as software licences, system support contracts, redesign costs. | Use low-cost desktop software. Ensure initial website design allows for expansion and can easily be amended by content editors. | Show use of Word and Access as content management tools, with markup embedded in database for ease of editing.
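
The automatic creation of web pages from a database with embedded markup might look something like the following minimal sketch. It is an illustration only: Python and SQLite stand in for the desktop tools named above (Word and Access), and the `items` table, its columns and the page template are hypothetical.

```python
import html
import sqlite3
from pathlib import Path
from string import Template

# Hypothetical page template; the actual site design is not described here.
PAGE = Template(
    "<html><head><title>$title</title></head>\n"
    "<body><h1>$title</h1>\n$body\n</body></html>\n"
)


def demo_database() -> sqlite3.Connection:
    """Create a small in-memory database standing in for the content store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO items (title, body) VALUES (?, ?)",
        [
            ("Letters & papers", "<p>A sample <em>letter</em>.</p>"),
            ("Photographs", "<p>A sample photograph caption.</p>"),
        ],
    )
    return conn


def render_pages(conn: sqlite3.Connection) -> dict[str, str]:
    """Return a mapping of file name to HTML, one page per database row."""
    pages = {}
    query = "SELECT id, title, body FROM items ORDER BY id"
    for item_id, title, body in conn.execute(query):
        # The title is plain text, so it is escaped. The body already holds
        # markup maintained by content editors, so it is inserted unchanged.
        pages[f"item{item_id}.html"] = PAGE.substitute(
            title=html.escape(title), body=body
        )
    return pages


def write_site(pages: dict[str, str], out_dir: Path) -> None:
    """Write the rendered pages to disk as a static website."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, text in pages.items():
        (out_dir / name).write_text(text, encoding="utf-8")
```

Because the pages are regenerated from the database, a content editor only needs to change a row and rerun the script; no page is edited by hand.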

Example policy implications

Possible conflict between the sustainability of low-maintenance dynamic websites (which may have high software costs) and the accessibility of automatically-generated static websites.

2. Longevity (preservation)

Principles and assumptions | Implications and methods | Illustrations
Never put valuable data into any software unless you know how to get it out again, with all content and structure intact. | Avoid over-dependence on any software formats that are liable to become obsolete. Set up and test import and export mechanisms for databases and content repositories. | Illustrate testing of alternatives, e.g. exporting to XML as long-term resource storage format for text.
Structure is permanent, presentation format is transient. | Separate inherent structure of resources from their online presentation, and ensure that the master content repository fully captures structure. | Demonstrate how text structure can be captured in Word or even plain text as well as in a database or in XML. Show effectiveness of CSS.
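
One way to test that data can be got out again with content and structure intact is a round trip: export to XML, import the result, and compare with the original. The sketch below assumes Python's standard `xml.etree.ElementTree` module; the record layout and element names (`collection`, `item`, `p`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical records: each item has a title and an ordered list of paragraphs.
RECORDS = [
    {"id": "1", "title": "Letter to the editor",
     "paragraphs": ["Dear Sir,", "Yours & c."]},
    {"id": "2", "title": "Reply",
     "paragraphs": ["Thank you for your letter."]},
]


def export_xml(records: list[dict]) -> str:
    """Serialise records to XML, keeping title and paragraph structure."""
    root = ET.Element("collection")
    for rec in records:
        item = ET.SubElement(root, "item", id=rec["id"])
        ET.SubElement(item, "title").text = rec["title"]
        for para in rec["paragraphs"]:
            ET.SubElement(item, "p").text = para
    return ET.tostring(root, encoding="unicode")


def import_xml(xml_text: str) -> list[dict]:
    """Rebuild the record list from the exported XML."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": item.get("id"),
            "title": item.findtext("title"),
            "paragraphs": [p.text for p in item.findall("p")],
        }
        for item in root.findall("item")
    ]


def round_trip_ok(records: list[dict]) -> bool:
    """True if exporting and re-importing loses no content or structure."""
    return import_xml(export_xml(records)) == records
```

The XML holds only structure (titles, paragraphs, order); presentation would be applied separately, for example with CSS.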

Example policy implications

Possible conflict between preservation and sustainability - more copies means more work means higher costs.

3. Accessibility

The term accessibility is often used in a restricted way, referring to the compliance of websites with the needs of users with disabilities. There is also a much broader meaning of accessibility: the ability of potential users to locate information of interest and relevance to them (often referred to as resource discovery). Both meanings will be addressed here.

Principles and assumptions | Implications and methods | Illustrations
Machine-readable and searchable text is more accessible and useful than text available only in image format. | Carry out OCR and proof reading wherever possible. | Illustrate use of both image and text together.
Resource creators should acknowledge and adapt to user preferences for resource discovery. | Digital resources must be readily located via search engines (without compromising other principles). | Show how to make metadata work with Google, e.g. by loading it into the <title> tag.
Searching is not the answer to all information needs. | Provide a range of access routes, e.g. search and browse interfaces, subject terms, indexes. | Show how to convert existing book indexes to work online in ebooks.
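
Loading metadata into the title tag can be sketched as below. This is an assumption about one possible approach, not a description of the method actually used; the function name, the title format and the sample values are hypothetical.

```python
import html


def page_head(title: str, collection: str, description: str) -> str:
    """Build an HTML head with item metadata loaded into the title tag."""
    # Putting the item title first keeps it visible in search results,
    # with the collection name giving context.
    full_title = f"{title} - {collection}"
    return (
        "<head>\n"
        f"<title>{html.escape(full_title)}</title>\n"
        f'<meta name="description" content="{html.escape(description)}">\n'
        "</head>"
    )


if __name__ == "__main__":
    print(page_head("Sample leaflet", "Glasgow Digital Library",
                    'Leaflets & "ephemera"'))
```

Escaping matters here because titles and descriptions taken from a database may contain ampersands or quotation marks.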

Example policy implications

Possible conflict between preservation and accessibility, e.g. faithful reproduction of original non-standard characters may impede effectiveness of searching.

4. Interoperability

Principles and assumptions | Implications and methods | Illustrations
Think globally, act locally. Digital resources created for a specific project should not be locked in to use only within that context. | Metadata should not be constrained to one specific scheme. Website design should not enforce a single narrow view of resources. | Show storage of metadata in generic fields that allows output in different forms for different purposes, with tags for different schemes embedded in database.
The content of metadata fields (semantics) matters far more to users than the container (syntax). | Use accepted international standards for resource description while aiming to meet needs of target user groups. | Show how use of a general scheme such as LCSH can co-exist with use of a local subject scheme.
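
Storing metadata in generic fields and mapping it to different schemes on output could be sketched as follows. The field names, the sample values and the local scheme are hypothetical, and Dublin Core meta tag names are used here only as an assumed example of an output scheme.

```python
import html

# Generic record as it might be held in the database (hypothetical values).
RECORD = {
    "title": "Sample photograph",
    "creator": "Unknown photographer",
    "subject_general": "Shipbuilding",   # term from a general scheme
    "subject_local": "Clyde shipyards",  # term from a hypothetical local scheme
}

# Mapping from generic field names to scheme-specific labels.
SCHEMES = {
    "dc": {
        "title": "DC.title",
        "creator": "DC.creator",
        "subject_general": "DC.subject",
    },
    "local": {"title": "Title", "subject_local": "Topic"},
}


def render(record: dict, scheme: str) -> list[tuple[str, str]]:
    """Return (label, value) pairs for the fields that the scheme defines."""
    mapping = SCHEMES[scheme]
    return [
        (label, record[field])
        for field, label in mapping.items()
        if field in record
    ]


def as_meta_tags(record: dict) -> str:
    """Output the record as HTML meta tags using the assumed 'dc' mapping."""
    return "\n".join(
        f'<meta name="{label}" content="{html.escape(value)}">'
        for label, value in render(record, "dc")
    )
```

The stored content stays the same whichever scheme is requested; only the labels change, so supporting a further scheme means adding one mapping.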

Example policy implications

Possible conflict between accessibility for local or specialist users, who may require a specific terminology, and the demands of interoperability, which require generic terminology.

5. Reusability

Principles and assumptions | Implications and methods | Illustrations
With universal access, resource creators cannot predict the possible uses and value of their content for all users. | It should be possible to identify and extract individual images, articles and objects for use elsewhere. | Show use of item-level granular metadata for individual ebook chapters and sections.
Additional value can be obtained from items by presenting them in different contexts, but unnecessary duplication of effort and content is to be avoided. | It should be possible for the same item to appear in more than one digital collection or online exhibition, without duplication. | Show how the same item may have different metadata for different contexts.
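
The idea of one item appearing in several collections without duplication, with different metadata in each context, can be sketched with a simple data structure. The class and field names below are hypothetical and are not taken from any actual database design.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """An item stored once, with metadata that holds in every context."""
    item_id: str
    title: str
    file: str


@dataclass
class Collection:
    """A collection refers to items by id and adds context-specific metadata."""
    name: str
    entries: dict = field(default_factory=dict)  # item_id -> context metadata

    def add(self, item: Item, **context_metadata) -> None:
        self.entries[item.item_id] = context_metadata

    def describe(self, items: dict, item_id: str) -> dict:
        """Merge the shared item record with this collection's own metadata."""
        base = items[item_id]
        merged = {"title": base.title, "file": base.file}
        merged.update(self.entries[item_id])
        return merged


# A single stored item, referenced by two collections.
ITEMS = {"p1": Item("p1", "Sample photograph", "p1.jpg")}

HISTORY = Collection("Local history")
HISTORY.add(ITEMS["p1"], caption="A street scene")

EXHIBITION = Collection("Online exhibition")
EXHIBITION.add(ITEMS["p1"], caption="Exhibit 3", order=3)
```

Both collections point to the same file, so the image and its core description are held once, while captions and ordering vary by context.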

Example policy implications

Possible conflict between using value judgements to enhance the accessibility of metadata for a particular audience and preserving its potential reusability for other purposes.

Conclusion

It is possible to use low-cost desktop software for efficient content management in a small-to-medium-sized collection (around 10,000 items), while adhering to important objectives of digitisation. By making explicit the principles and assumptions underpinning a collection, it is easier to understand the implications of those assumptions and to adopt appropriate methods for flexible content management. This process in turn helps to inform the policies and decisions that are required (though not always acknowledged) for long-term maintenance of a successful digital resource collection.