Building a digital library in 80 days: the Glasgow experience

Alan Dawson

Centre for Digital Library Research, University of Strathclyde, Glasgow G1 1XH

January 2003

alan.dawson@strath.ac.uk

This article appears in the book Digital Libraries: Policy, Planning and Practice (Andrews, J. & Law, D., eds., 2004)

Summary

This chapter considers the main issues identified by researchers in the field of digital libraries and compares them with issues actually encountered in developing a real digital library service. In order to make this comparison meaningful, some background information is given about the Glasgow Digital Library. This is followed by a systematic commentary on sixteen research issues, with details of problems encountered in practice, solutions adopted, lessons learned, and an assessment of each issue's significance. The 80 days of the title refers to the time available to the author to carry out implementation.

Background

The Glasgow Digital Library (GDL)1 is intended to be a regional, co-operative, distributed digital library, based in Glasgow, the largest city in Scotland. It attempts to combine theory with practice, research project with user service, and to balance the immediate needs of local partners and users with a global and long-term perspective on digital resources. It is not a typical digital library, but there is probably no such thing.

The GDL is based in the Centre for Digital Library Research (CDLR)2, although operated as a co-operative venture. It was funded as one of several research projects under the theme of collaborative collection management.3 Its long-term aim is to create a digital library to support teaching, learning, research and public information, but its initial requirement was to investigate and report on planning, implementing, operating and evaluating the library. This meant there was a need to create a library service in order to research its management and operation, although funding was provided only to carry out the research project, not to create a user service. Once the project was under way, this paradox was partially resolved by submitting additional funding bids for specific digitisation projects, which allowed the library to create its own content. However, the time and work devoted to content creation and management meant that by the end of the two-year research funding there was little time for completion of some initial goals, such as evaluation, promotion and development of a sustainable financial model.

By early 2003 the GDL incorporated six main digital collections, with a total of around 5000 items4. Details of these collections are available via the GDL itself, along with documents recording its partners, policies and early development. Further information about the philosophy and development of the library is given by Nicholson & Macgregor (2002), while more information about its structure and collections is provided by Nicholson, Dawson & Macgregor (2003).

Issues, Problems and Solutions

In an extensive review of literature, databases and project websites concerning digital library research, Chowdhury & Chowdhury (1999) identified sixteen headings and highlighted 'major research activities in each of these areas', aiming to give 'an indication of the research issues that need to be addressed and resolved in the near future in order to bring the digital library from the researcher's laboratory to the real life environment.'

Many of the issues identified in this paper are closely interconnected, so the headings are not entirely distinct topics, but they do offer a useful means of structuring the subject area. Much of this chapter is therefore organised according to these sixteen headings. Under each heading is a commentary on issues and problems that were actually encountered in trying to create a digital library, as well as details of any solutions found and lessons learned. There is also a rating of the importance of each heading for the GDL, on a scale from 1 to 5 (1 = not important, 5 = extremely important), and an assessment of the percentage of implementation time spent on matters relating to that heading.

At the time of initial implementation, the main focus was on the collections and how to deal with their content, with little reference to research papers. The following evaluation of the relationship between theory and practice is therefore entirely retrospective.

1. Collection Development

Issues and problems: In the early stages of the GDL, collection development was not a big issue because the focus was on establishing a collaborative framework for library operation, involving all project partners. Furthermore, there was little point worrying about collection development policy when there was no funding to carry out digitisation anyway. It was necessary to deal with a preliminary issue of library purpose and philosophy (see heading 17 below) before taking a coherent view of collection development.

Solutions: Once the realities of what was feasible with available funding became clear, collection development could be addressed. Initially there were three main priorities:

In practice it was over a year before it was feasible to write a collection policy. Although the policy is brief, it is important that it exists, has been agreed by project partners, and is publicly available.

Lessons: A coherent collection development policy is essential but cannot be created in a vacuum. It has to follow on from a broad vision of library purpose and philosophy, and needs to balance ideals against practicalities, which may involve assigning development priorities. The policy is likely to require periodic updating and should be reviewed at least once a year.

Practical importance of this issue to GDL: 4
Time expended: 3%

2. Development methodology and design issues

Issues and problems: Organisation and management of diverse collections of digital objects was a big issue for the GDL, but at a simpler level than described by Chowdhury & Chowdhury, who refer to a digital library system as 'a number of servers, spread over the Internet, that interact with each other to meet user requests'. The CDLR had already established a service meeting these criteria (CAIRNS8) but this was regarded as a distributed library catalogue rather than a digital library. The GDL was not initially concerned with interactions between distributed servers, but it did need a system for managing its content. This was less simple than it sounds, with multiple contributors, collections, file formats and access methods involved. The requirements of different elements of the GDL meant that at one stage similar (but not identical) information was being held and maintained in several different forms:

While there is nothing wrong with providing information in more than one form, there had to be mechanisms for integrating its management and ensuring that updating only took place in one location.

Solutions: A clear and consistent naming scheme was devised and adopted for all objects in all collections within the library. As Access databases were required as a condition of funding for three of the first six collections9, Access was chosen as the initial content and metadata repository for all six collections. Library content and metadata was then generated from Access in different forms for different purposes, e.g. web pages and MARC 21 records10, by including HTML markup and MARC tags in the database and using Visual Basic programs to automate the integration and exporting of content, markup and metadata11. In effect, this was a modular, flexible, low-cost content management system using common desktop software. Advantages of the modular approach for the GDL were that it facilitated flexible re-use of metadata, enabled content creation to be easily distributed amongst contributors, and allowed additional collections to be plugged in to the library relatively easily and cheaply.

Lessons: Managing diverse digital content from multiple sources does not necessarily require expensive content management software, but it can help. Most people, even librarians, are not as organised or systematic as they might be. Developing an efficient methodology for a heterogeneous distributed digital library requires a diligent and consistent approach from those involved, or rigorous control policies and mechanisms, or software tools that can compensate for lack of diligence and rigour. Basically, a library of any size and complexity has to be well-organised and managed whatever software is used.

Practical importance of this issue to GDL: 5
Time expended: 20%

3. User interfaces

Issues and problems: There were five main requirements for the GDL web interface: consistency, flexibility, scalability, accessibility and feasibility. Consistency meant devising a design template that could provide visual coherence across all collections without imposing blanket uniformity. Flexibility meant enabling users to access library content in different ways, across as well as within collections. Scalability meant creating an interface that would look acceptable with only three or four collections yet be able to cope with dozens or hundreds. Accessibility meant meeting requirements of funding bodies and standard web accessibility guidelines.12 Feasibility meant creating something quickly and inexpensively.

Solutions: As there was little time to spend creating and testing complicated designs, and no designers eager to show off their skills, the obvious solution was to make a virtue out of necessity and go for simplicity. Content of the library was judged to be inherently interesting, with plenty of striking images, so all that was needed was a clear set of labels and some example images to illustrate each collection, together with options for navigating the library as a whole (by places and by subjects) as an alternative to the collection-centred view.

Lessons: There are so many interface design options for a large and complex service that it is impossible to offer general guidelines in a short space13. The critical point is to understand the main priorities for the user interface, with an appropriate balance between simplicity and sophistication, complexity and accessibility, features and feasibility, style and substance. The use of templates and stylesheets (or similar mechanisms) to provide design consistency and flexibility is essential.

Practical importance of this issue to GDL: 4
Time expended: 7%

4. Information organisation: classification and indexing

Issues and problems: One of the aims of the library was to make it possible to search and browse across collections, so a broad but controlled method of information classification was required. The CDLR has substantial experience of using the Dewey Decimal Classification scheme (DDC), in the long-established BUBL LINK catalogue of Internet resources, so is familiar with the costs and benefits of this approach. BUBL also uses a complementary system of controlled browseable subject terms14 which is popular with users, though lacking the depth of hierarchical structure offered by DDC. Both are desirable, but maintaining DDC classification and a controlled subject vocabulary is time-consuming and might not be sustainable for the GDL.

Solutions: The current approach is to use Library of Congress Subject Headings (LCSH) as the primary means of linking diverse collections into a coherent information structure. LCSH is far from perfect but it is large and widely used. One of its main problems is cultural bias toward North America, which in some areas renders it amusingly inappropriate for a Scottish context15. LCSH is therefore supplemented by controlled local subject terms, agreed with project partners, where this is considered essential. The GDL subject terms are used in the web interface while the LCSH terms are included in the metadata and used in contexts where international compatibility is required16. In addition to subject terms, controlled authority files for place names and people names are used (where relevant) to provide library-wide consistency and an alternative to the collection-centred view.

Lessons: One of the beauties of a digital library is the flexibility it provides in allowing the same item to appear in more than one place, under different subject headings, or even as part of different collections. In order to make this feasible, a controlled information structure is required, such as LCSH or an alternative taxonomy. This can also serve to illustrate the scope and scale of collections, and influence topic chunking, as well as making searching more reliable. It is much easier to make changes to a controlled vocabulary than try to introduce retrospective consistency to an uncontrolled set of keywords.

Practical importance of this issue to GDL: 5
Time expended: 5%

5. Resource discovery: metadata

Issues and problems: Inevitably, there were numerous issues associated with metadata creation and management, some quite subtle and detailed. Only the most significant can be mentioned here. One major question was whether to catalogue the original item or the digital copy, e.g. was the author (MARC) or creator (Dublin Core) of a booklet the person who wrote it or the organisation responsible for digitising it? Perhaps the biggest issue was how to handle the whole metadata creation and management process, in order to ensure consistency when the same records appeared in multiple locations. There was even the prospect of both including metadata within data (DC records embedded in web pages) and data within metadata (item descriptions or even full text included in the 500 or 520 field of a MARC record). With a large collection of relatively small digital objects, the initially clear distinction between data and metadata began to crumble. Granularity was also a major issue: were individual metadata records required for every item in a themed collection, or could we manage with a few collection-level records? This problem of granularity appeared to require a mechanism for cascading metadata, analogous to cascading style sheets for web page formatting.

Solutions: Unsurprisingly, the solution adopted was to design a flexible database structure that was not tied to any particular metadata scheme, then to write content extraction routines (Visual Basic programs) that allowed metadata to be generated in different formats for different purposes. For example, separate forename and surname fields in the database for the Aspect collection were combined to produce standard name format for display via web pages, but a surname-first format for use in embedded Dublin Core metadata and in a separate file of MARC records. The use of default values in databases made consistent record creation relatively easy, but once the records had been detached from their nest and released into the world as independent items, each of them had to have the full metadata set included. No means for implementing cascading metadata17 at a global level has yet been found.
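The general idea can be sketched in a few lines of code. The following Python example is an illustration only, not the GDL's own Visual Basic routines: the field names, the default values and the function names are all assumptions made for the purpose of the sketch. It shows collection-level defaults being merged into an item record at export time, so that each exported record carries a full metadata set, and the same forename and surname fields being rendered in two name formats.

```python
import html
from collections import ChainMap

# Hypothetical collection-level defaults (illustrative values only).
COLLECTION_DEFAULTS = {
    "publisher": "Example Digital Library",
    "language": "en",
    "type": "Text",
}


def full_record(item, defaults=COLLECTION_DEFAULTS):
    """Return a complete record; item values override collection defaults."""
    return dict(ChainMap(item, defaults))


def display_name(record):
    """Forename-first form, as might be shown on a web page."""
    return f"{record['forename']} {record['surname']}"


def inverted_name(record):
    """Surname-first form, as might be used in Dublin Core or MARC output."""
    return f"{record['surname']}, {record['forename']}"


def dublin_core_meta(record):
    """Render a few fields as embedded Dublin Core <meta> tags."""
    fields = {
        "DC.title": record["title"],
        "DC.creator": inverted_name(record),
        "DC.publisher": record["publisher"],
        "DC.language": record["language"],
    }
    return [
        f'<meta name="{html.escape(name)}" content="{html.escape(value)}">'
        for name, value in fields.items()
    ]


# Example item: only the fields that differ from the defaults are stored.
item = {"title": "Election leaflet", "forename": "Jane", "surname": "Smith"}
record = full_record(item)
print(display_name(record))
for tag in dublin_core_meta(record):
    print(tag)
```

Merging defaults only at the point of export keeps record creation simple inside the database, but it is a local convenience rather than a solution to the wider problem: once exported, each record still has to carry its own complete copy of the inherited values.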

Lessons: It is better to be accurate and consistent in creating metadata for a few key fields (e.g. title, author, summary, date, subject terms) than to be so daunted by complexity that nothing gets done. The question of which metadata standard to use (Dublin Core, IMS, MARC etc) is not a lifelong commitment, as it is possible to translate between them (up to a point), if the metadata itself is in good shape. More difficult questions concern who should create the metadata, whether central editorial control is required, and how the process is managed. The ability to carry out repetitive but precise search-and-replace editing on specific metadata fields (or use an alternative control mechanism) is enormously useful, as it allows decisions to be taken early without worrying about future revisions. Controlled vocabularies or authority files are extremely useful for fields such as resource types, dates, people, organisations and places, as well as being essential for subject terms.

Practical importance of this issue to GDL: 5
Time expended: 10%

6. Access and file management

Comments: This topic may be better labelled as access control or authentication, as it is concerned with policies and procedures for controlling who can access different types of content. This is a big issue for commercial services and some digital libraries, but not for the Glasgow Digital Library, which is freely available to all.

Solutions: None required as yet.

Lessons: If access control is required then you need a scalable system capable of handling it and sound administrative procedures.

Practical importance of this issue to GDL: 1
Time expended: 0%

7. User studies

Comments: The CDLR has conducted user studies for other research projects, but these were not specified as part of the GDL project, although their potential benefit is accepted.

Solutions: None required as yet.

Lessons: Set priorities and accept that you cannot do everything with limited resources.

Practical importance of this issue to GDL: 1
Time expended: 0%

8. Information retrieval

Issues and problems: The GDL aims to offer users several search options: across the entire library, within a single collection, within a single field in a collection or the whole library, or cross-searching the GDL with other digital libraries and library catalogues. There are numerous software solutions available; on one occasion over twenty possible search tools were counted within the CDLR alone. In contrast to this richness and complexity there is the Google factor: web users have become used to the simplest possible search interface and very fast results. There are real difficulties in offering complex search options, and in summarising the scope and meaning of these options, via a simple user interface.

Solutions: This is still an area under development. In the short term, priority was given to creating a flexible browseable interface to illustrate existing collections, as users need an overview of the content in order to carry out useful searches. Once this was in place, search options were added one collection at a time, using different software solutions for different collections18. This is not ideal from a user service perspective, as there is an inherent (though minor) inconsistency between search operations in different areas of the library. However, from a research perspective it is useful to investigate and understand different search solutions. In the longer term, the aim is to use intelligent scripting to provide cascading search facilities, i.e. to search fields with the highest value first (such as titles and subject terms), then to search other metadata fields only if no matches are found, then continue with full-text searching or cross-searching only if no matches are found in any metadata fields. Although these options will be available explicitly via an advanced search interface, most users will probably use the simple search box. The cascading search19 is intended to make a simple search as effective as possible and to transfer complexity from the interface to the information retrieval mechanism.
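The cascading logic described above can be illustrated with a short sketch. This Python example is hypothetical and is not the GDL's implementation, which was still under development at the time of writing: the tier definitions, field names and simple substring matching are all assumptions. It searches the highest-value fields first and moves to the next tier only when the current one produces no matches; cross-searching of external catalogues is omitted.

```python
# Hypothetical search tiers, highest-value fields first.
TIERS = [
    ("title", "subjects"),   # tier 0: titles and subject terms
    ("creator", "summary"),  # tier 1: other metadata fields
    ("full_text",),          # tier 2: full text
]


def field_text(record, field):
    """Return a field as lower-case text; lists of terms are joined."""
    value = record.get(field, "")
    if isinstance(value, (list, tuple)):
        value = " ".join(value)
    return value.lower()


def matches(record, fields, term):
    """True if the term occurs in any of the named fields."""
    term = term.lower()
    return any(term in field_text(record, field) for field in fields)


def cascading_search(records, term, tiers=TIERS):
    """Return (tier_index, hits) for the first tier with any match."""
    for index, fields in enumerate(tiers):
        hits = [r for r in records if matches(r, fields, term)]
        if hits:
            return index, hits
    return None, []


records = [
    {"id": 1, "title": "Red Clydeside rent strikes", "subjects": ["Strikes"],
     "summary": "Account of events in 1915", "full_text": "..."},
    {"id": 2, "title": "Election leaflet", "subjects": ["Elections"],
     "summary": "Leaflet about housing",
     "full_text": "mentions strikes in passing"},
]

tier, hits = cascading_search(records, "strikes")
print(tier, [r["id"] for r in hits])
```

Returning the tier along with the hits would allow an interface to tell the user how the matches were found, without exposing the tiers as separate options in the simple search box.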

Lessons: Although searching is conceptually simple, there are almost endless possibilities in providing complex search facilities for a digital library. Standard issues need to be resolved, such as indexing, relevance ranking, case sensitivity, phrase searching, stemming, pattern matching, Boolean searching and results paging. In the short term it is better to offer something currently feasible than nothing at all, but it is also advisable to have plans for better services and work towards them one step at a time.

Practical importance of this issue to GDL: 5
Time expended: 8%

9. Legal issues

Issues and problems: Legal issues have not been a major headache for the GDL but still had to be addressed. As a short-term project and a partnership between several institutions, its legal status was unclear for matters such as intellectual property rights and liability. Where external funding was sought for digitisation projects, the GDL had to secure rights to publish the materials created, as well as submitting them to the funding body.

Solutions: None really. The question of legal status was left unresolved in the short term (there were many more pressing matters). The main legal requirement was to obtain written permission from copyright holders before digitising any copyright material. The main implementation decision was whether to proceed in those few cases where the copyright holder could not be traced (initially no, later yes).

Lessons: Copyright is still a massive issue for most digital libraries, and service providers must be clear about their rights and responsibilities. It is important to understand the legal issues, take initial advice and follow legal requirements, but equally important not to be impeded by unnecessary legal detail or worried by improbable scenarios.

Practical importance of this issue to GDL: 2
Time expended: 1%

10. Social issues

Comments: This broad heading is used by Chowdhury & Chowdhury to refer to "elements of the social world, including a sense of community, that we do not want to lose from our notions of 'library'." Researching these issues played a larger role in the initial plan for the GDL than in its subsequent development, but they were not ignored. There were three main areas where social interaction was considered: between project partners, between the library and its users, and amongst users themselves.

Solutions: None really, other than the usual mantra that good communication is very important. For project partners, the GDL relied on the trusty mailing list, which has some advantages over more active or intrusive methods. This will also be the preferred solution for user communication until there is evidence of demand for something more sophisticated (the web has thousands of empty discussion boards created by well-meaning designers trying to create a social space).

Lessons: Communications and meetings may appear to contribute little of tangible benefit, but are valuable in keeping partners informed of developments and enthusiastic about outcomes. An active and committed user community cannot be enforced but has to be earned, for example by providing compelling content, by making it easy for users to contact the service (and each other) and by responding promptly to feedback.

Practical importance of this issue to GDL: 2
Time expended: 2%

11. Evaluation of digital information

Issues and problems: There are two main issues here: ensuring users accept the authenticity of library content and preventing others copying and misrepresenting it. As a non-commercial venture, the GDL is more concerned with open access and flexibility than with protecting content from being copied. However, it has had to consider the question of whether to digitally enhance materials in order to improve their presentation, even though the aim of digitisation was to create an accurate copy of the original source material.

Solutions: Most attention has been paid to image manipulation, but similar principles apply to text, audio, video and other content types. In practice images are routinely manipulated after creation; they are converted from TIFF to JPEG format, reduced in size, and thumbnail images may be created. Sometimes borders are trimmed, edges sharpened, and gamma correction adjusted to lighten dark areas. None of this affects authenticity of the material; it is common practice and does not involve misrepresentation. In a few cases image editing has been carried out. For example, some leaflets from the Aspect collection had been delivered to individual contributors and included their name and address. Rather than risk damaging the original by tearing off a sticky label, the personal details were edited out after digitisation, a sensible procedure that does not affect authenticity. A small step further involved digitally enhancing a few images from the Red Clydeside collection, where the original was worn or damaged.

For the GDL, this has been the limit of digital manipulation; the aim is to preserve and accurately portray the original material, not misrepresent it. Yet these examples show the issue is less clear-cut than might be imagined. It is a big step from completing a background colour to adding missing words to a document or missing faces to a photograph, yet any manipulation at all could raise questions about authenticity.

Lessons: If measures are needed to protect online content then technical solutions are available, such as digital watermarking. Proving the authenticity of objects is a different matter, where solutions are more social than technical. Digitisation involves making value judgements, so it is advisable to define limits for any digital manipulation in a policy document, ensure those involved understand and adhere to it, and inform users of this policy. Such a policy could also cover general editorial conventions such as wording, spelling and use of value judgements in item descriptions. As well as being inherently useful, consistent use of explicit guidelines helps ensure the value of digital information and the reputation of the library.

Practical importance of this issue to GDL: 3
Time expended: 3%

12. Evaluation of digital libraries

Comments: As a research project, the GDL could claim to have fulfilled its obligations by submitting a final report to its primary funding body, with any practical value to library users being a bonus. This argument is simplistic and disingenuous, but could be used to justify absence of any formal evaluation process.

Solutions: None as yet. Ideally there would be qualitative evaluation, via user studies, and quantitative evaluation, by analysing usage logs of web pages and search terms to determine the scale and nature of library usage (for example, see Dawson, 1999). In practice such formal evaluation is unlikely until the library has been further developed and promoted.

Lessons: As no practical guidance can be given based on implementation, a theoretical perspective will be offered. Chowdhury & Chowdhury point out that usability assessment based primarily on interface design is 'too narrow a basis for evaluating something as complex as a DL... evaluation of a DL's effectiveness has to be in terms of its impact on users' work.' In other words, there is little point in having a beautifully designed interface to a collection that no-one is interested in, or a superb collection that is too hard to get at.

Practical importance of this issue to GDL: 2
Time expended: 0%

13. Standards

Issues and problems: More than any other topic, the question of standards permeates other issues, especially resource discovery, information organisation, and preservation. The GDL, like the CDLR, is committed to compatibility with international standards. Easily said, but meaningless in itself, when there are so many standards to choose from. The challenge for the GDL was not just to specify the standards it will use but to create a coherent information environment and set of procedures that enable content contributors to adhere to the standards in a practicable and straightforward manner.

Solutions: Choice of standards for the GDL was determined by its global outlook and concerns for long-term interoperability. Key standards are Dublin Core and MARC 21 for metadata, LCSH for subject vocabulary, AACR220 for resource descriptions, SQL for information retrieval, XHTML and W3C web accessibility guidelines for web pages, TIFF and JPEG for image format, and Z39.50 for cross-searching with other catalogues (OAI may also become important in future). These standards are supplemented by editorial guidelines and authority files for place names and resource types. Although local in scope, these are equally important in providing consistency of resource description across collections and assisting information retrieval.

Lessons: The standards listed above are quite different from each other in purpose and implementation. In order to make sensible decisions about which standards to use in a digital library it is essential to be aware of their scope and purpose (and the primary purpose of the library), but not necessary to understand the technical details of each one. Compliance with standards for service providers (as opposed to software providers) is rarely technically difficult but does require clear policies and disciplined work practices. Adherence to chosen standards should not be seen as an additional burden to be imposed after content creation but as an inherent part of the digital library development environment.

Practical importance of this issue to GDL: 5
Time expended: 7%

14. Preservation

Issues and problems: Another aspect to the principle of 'think globally, act locally' is 'think long-term, act short-term'. The GDL aims to create, describe and manage content with an indefinite life-span, including historical material that may still be of interest in hundreds of years' time. There are three main categories of concern: physical storage media (disks, CDs, tapes etc), content format (relational databases, Word documents, web pages, image files etc) and information structures (MARC, DC, LCSH etc).

Solutions: For content and metadata, the solutions are in structures and standards. Textual content held in a consistent manner in a structured database can always be exported to another format, even to plain text. International standards such as MARC 21 are not going to disappear overnight, even if by 2203 it has become MARC 41. Storage media might be more of a problem, as there is no knowing what technical developments will take place in future, so a good policy is to keep master copies of data in two formats and be prepared to migrate should either become outdated. Keeping a paper copy of every printable object in a digital library might also guarantee preservation, but would rather miss the point. Another possibility worth considering, if material is deleted or edited as well as added, is to take a complete snapshot of library content at fixed intervals, say once a year.

Lessons: Only store content in a software package if you are sure you know how to get it out again. Be prepared to store material in a different format to that used for public access. Make sure you can always match metadata to content (for example, a well-preserved old photograph might be quite interesting, but it has far greater value if you know who or what it was and when and where it was taken). Think long-term.

Practical importance of this issue to GDL: 4
Time expended: 2%

15. Implications for library managers

Comments: There are two complementary issues here: implications for existing managers of physical libraries and implications for those who have responsibility for digital libraries but may not think of themselves as librarians. Some management issues are similar, e.g. dealing with user enquiries, others rather different, e.g. content maintenance and preservation. One major difference between the GDL and most physical libraries is that it creates and commissions content as well as collecting and managing it.

Solutions: None as yet. The GDL is still grappling with possible transition from research project to user service. Ongoing library management is a key issue to be addressed before this transition can occur.

Lessons: In a co-operative enterprise it is important for all partners to feel part of the library and be committed to its development, but co-operative management can be slow and inefficient. Someone needs to be in charge on a daily basis who can understand technical issues but not be subservient to them, and can take responsibility for maintenance matters such as collection updating, software upgrading, feature development and user enquiries.

Practical importance of this issue to GDL: 3
Time expended: 1%

16. Future directions

Under this heading Chowdhury & Chowdhury summarised initiatives and projects that were beginning or ongoing in 1999. Four years is a long time in digital library research, so it is worth considering a more recent view of the field, as offered by Soergel (2002), who presented 'a broad-based digital library research and development framework ... to evaluate and integrate existing research and practice and to provide a structured vision for what digital libraries can be'. Soergel's framework consists of three guiding principles and eleven specific themes for digital library research. The principles relate to ideological and social issues and will be discussed later, whereas the themes relate to future directions and are briefly assessed below, again by relating theory to practice.

Soergel's themes for DL research and development

Theme 1. DLs must integrate access to materials with access to tools to process these materials.

Comment: No real problem in practice if most user access is via web browsers with a plug-in for handling PDF. Where special software tools are needed, libraries need to decide how far to adjust the user interface to accommodate repetitive messages about access to supporting software.

Theme 2. DLs should support individual and community information spaces.

Comment: This is a value judgement that depends on the vision and purpose of the library (see heading 17 below). It may not be feasible or appropriate for all digital libraries. For the GDL it is regarded as a long-term priority in the collection development and management policy. In the short term it is far from practicable with current resources.

Theme 3. Digital libraries need semantic structure.

Comment: Absolutely. This theme covers similar ground to issues 4 and 5 above and is indeed, as Soergel argues, 'of prime importance'. Without semantic structure a digital library is barely a library at all, more the digital equivalent of a charity shop with content piled into unsorted heaps.

Theme 4. DLs need linked data structures for powerful navigation and search.

Comment: Soergel considers this a special case of theme 3, but with added value for users by providing 'links across disciplines and across digital libraries'. It is hard to question this principle, but putting it into practice requires subject expertise as well as effective software tools. For a library such as the GDL in an early stage of development it remains an aspiration rather than a practicable proposition.

Theme 5. DLs should support powerful search that combines information across databases.

Comment: Cross-searching is an important principle for the GDL, partly because it can draw on experience available locally from implementing the Z39.50-based CAIRNS service8. Even so, the facilities offered by current CAIRNS technology fall some way short of the sophisticated retrieval and presentation from 'distributed access to heterogeneous systems' envisaged by Soergel. This is a major research and development area for the CDLR, yet it is still a pilot rather than an operational service for the GDL. For digital libraries without similar infrastructure and expertise, this will be a difficult theme to pursue.

Theme 6. DL interfaces should guide users through complex tasks.

Comment: This is fine if you know what the complex tasks are likely to be, which may be true in subject-specific digital libraries with well-established user communities, but is not the case in a newly-created regional library such as the GDL. This theme is related to that of creating learning materials, which is considered under heading 19 below.

Theme 7. The DL field should provide ready-made tools for building and using semantically rich digital libraries.

Comment: This theme lies at the heart of the traditional divide between research and practice. The ultimate aim of much research is to deliver useful guidance and solutions for practitioners, yet the specific requirements of the GDL illustrate how difficult it is to provide generally-applicable software tools. For example, the GDL was required by funding bodies to submit content in Access databases, it was required to create websites to offer public access to content, required to offer cross-searching via Z39.50, and required to create MARC records for uploading to local and remote databases, and to do all this for a heterogeneous set of collections. There was little chance of any existing toolkit incorporating this functionality, so the GDL had to work out its own development methodology, enabling content and metadata for all collections to be integrated and re-used for different purposes from a single source. Other large libraries will have different, but perhaps similarly complex, requirements. Simpler digital libraries can perhaps be created with existing tools such as Greenstone software21. As ever, the main difficulties are not in information technology but in information description and management.
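The idea of re-using content and metadata from a single source can be sketched in a few lines of Python. This is an illustration only, not the GDL's actual method: the record, its field names and both output functions are hypothetical, and the tagged output is a simplified MARC-like listing rather than a valid MARC record or the exact input format of any conversion tool.

```python
from html import escape

# Hypothetical single-source record; field names are illustrative only.
record = {
    "id": "item001",
    "title": "Strike poster & handbill",
    "creator": "Unknown",
    "year": "1919",
}


def to_html(rec):
    """Render the record as a minimal static web page fragment."""
    return (
        f'<div id="{escape(rec["id"])}">'
        f'<h1>{escape(rec["title"])}</h1>'
        f'<p>{escape(rec["creator"])}, {escape(rec["year"])}</p>'
        "</div>"
    )


def to_tagged_text(rec):
    """Render the same record as simplified MARC-like tagged lines."""
    return "\n".join([
        f'100 $a{rec["creator"]}',  # main entry: personal name
        f'245 $a{rec["title"]}',    # title statement
        f'260 $c{rec["year"]}',     # date of publication
    ])


html_page = to_html(record)
tagged_text = to_tagged_text(record)
print(html_page)
print(tagged_text)
```

Because both outputs are derived from the same record, a correction made once at the source is carried through to the web page and the catalogue record alike.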

Theme 8. Innovative DL design should be informed by studies of user requirements and user behavior.

Comment: Undoubtedly true, but does the design have to be innovative, if the content is of inherent value? The digital library field has had no period of stability to allow user familiarity. Perhaps some web-based services (again the Google factor springs to mind) are extremely popular owing to their simplicity and predictability, in fact their lack of innovation. They provide simple interfaces to large collections, so perhaps digital libraries can learn from them too.

Theme 9. DL evaluation needs to consider new functionality.

Comment: Soergel's main point here is the same as in heading 12 above, that evaluation of effectiveness has to be in terms of the purpose of the library and its impact on users' work, not just its interface.

Theme 10. Legal/organizational issues of information access and rights management need to be addressed using new technology.

Comment: Yes, but the benefits must outweigh the costs, which may be legal as well as technical and organisational, so the inclination of many digital libraries will be to remain open-access and non-commercial if possible.

Theme 11. DLs need sustainable business models.

Comment: Indeed, but easier said than done, especially in a field largely funded by short-term projects. This theme has the potential to undermine the other ten, as they are all potentially expensive. It may be difficult to justify expenditure on themes where the benefits are unclear or immeasurable. Soergel's point is valid, but the solution elusive. The GDL operates at the low-cost end of the field, where funding for digitisation projects amounts to hundreds or a few thousand pounds, not millions. It might be possible to have a sustainable business model with a small budget, but you have to be realistic about the extent to which research issues and themes can be incorporated into a user service.

Additional digital library issues

Some significant issues that arose in implementing the Glasgow Digital Library do not easily fit into the above sixteen headings and will be briefly considered.

17. Library purpose and philosophy

Comment: The way a digital library addresses the issues described above ought to reflect its underlying principles of operation, such as who the library is for, what is its vision, and who is paying for it. Soergel partially addresses this issue by proposing three 'overarching guiding principles':

These principles are not right or wrong, they are value judgements that may not apply to all libraries. Their relevance for the GDL varies according to whether its status is one of research project or user service. An example of an alternative principle for a digital library would be to promote social inclusion by providing simple, low-cost solutions to enable as many people as possible to contribute digital content within a coherent library framework. Different principles are not necessarily mutually exclusive, but they should be explicit so they can help determine the priorities of the library manager, the content of the collection development policy and the nature of implementation.

Solutions: As a regional initiative, the GDL has been something of a hybrid between public library and higher education library. Its content is not geared to particular courses, nor limited to groups such as students, children or even Glaswegians. The involvement of public libraries at steering group level has ensured it aims to provide public information as well as material suitable for teaching, learning and research.

Lessons: Any digital library needs a clear sense of direction. Implementation will be difficult if the goalposts keep shifting. Decisions should take account of underlying principles as well as short-term priorities and technical possibilities.

Practical importance of this issue to GDL: 5
Time expended: 2%

18. Content creation

Issues and problems: Objects do not digitise themselves. Although implicit in other issues, the question of how to carry out digitisation and what file types and content formats to use is not explicitly addressed by Chowdhury & Chowdhury, yet is of fundamental importance to any digital library. Equipment such as scanners, cameras, microphones and portable hard disks is needed, or funding to pay service providers. Nor do items line themselves up in a queue next to the scanner or camera. Research has to be carried out, items selected for quality and relevance, captions written and edited, and even titles may need inventing. Decisions then have to be made about image resolution, file sizes, file formats and workflow processes. Even with a basic formula of text and web pages, options include XML, HTML, PDF, Word, RTF and plain text, and decisions have to be made about symbols and character sets. For image format there are even more choices. JPEG is common, PNG is becoming more widespread, and TIFF may be recommended for printing or preservation; there are compression options to choose, and a balance is required between image quality/size and speed of access. Large images such as maps require special software solutions at client or server end, or both. Another major issue is whether to carry out optical character reading (OCR) on text files held as images. This is time-consuming if results are properly edited, yet offers the substantial added value of fully-searchable text. For sound and video files, MP3 and MPEG are common but by no means universal, and there is a range of proprietary formats.

Solutions: For digitisation itself, a mixture of solutions was used. Some was carried out internally on a simple desktop scanner, some by project partners, some by a specialist service (including all large images and glass slides). Using different methods was not a problem in itself but did require careful administration. Similarly, some research and content selection was carried out by the GDL research assistant, some by project partners, and some by external contractors with specialist knowledge. Choice of formats was sometimes determined by the funding body for a collection: TIFF at 300dpi was the common image file specification. For public access the GDL offers lower-quality higher-speed JPEG files for screen display, with high-quality images available if requested by users for printing. XHTML is preferred for text, with PDF only used if essential for retaining complex layouts. OCR is carried out if feasible and is carefully checked before publication. MP3 will be used for audio files once these become available.

Lessons: Digitisation itself can be very quick (images can be captured in seconds), but the prerequisite selections and decisions, and the subsequent manipulation and management processes, are not trivial. They are crucial, regardless of where the digitisation itself occurs. File formats that are standard and predictable, as above, might still be the best choices in the circumstances. It is important to be aware of options and to make choices that suit the library's priorities.

Practical importance of this issue to GDL: 5
Time expended: 20%

19. Learning materials

Issues: An organised collection of objects can constitute a valid and valuable digital library, yet many libraries will wish to add value to these objects by creating educational resources based on them, drawing out themes and timelines and emphasising inter-relationships. This task should not be undertaken lightly, as it requires subject expertise, it may require tailoring content for a particular target audience, with appropriate design features and additional metadata, and may entail introducing personal judgements. It is usually expensive and time-consuming, yet it can also bring the library to life.

Solutions: In the absence of time or funding to produce learning materials, the GDL approach has been to semi-automate the process where possible, drawing on the values of controlled metadata fields for people, organisations and dates to auto-generate illustrated indexes and timelines linking related items together. Better results could certainly be obtained by additional research and hand-crafting of materials, but this simply is not always possible. The automated process is feasible and the added value is judged to be worth the extra work involved. It is also satisfying to see metadata being used to worthwhile effect.
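The semi-automated approach can be illustrated with a short Python sketch. It assumes each catalogue record carries a controlled year and a list of controlled personal names; the records, field names and function names below are hypothetical and do not reproduce the GDL's own scripts or data.

```python
from collections import defaultdict

# Hypothetical records; field names are illustrative, not the GDL schema.
records = [
    {"title": "Rally photograph", "year": 1919, "people": ["Person A", "Person B"]},
    {"title": "Campaign leaflet", "year": 1915, "people": ["Person C"]},
    {"title": "Trial report", "year": 1919, "people": ["Person B"]},
]


def build_timeline(items):
    """Group item titles by year, returned as (year, titles) pairs in date order."""
    by_year = defaultdict(list)
    for item in items:
        by_year[item["year"]].append(item["title"])
    return sorted(by_year.items())


def build_people_index(items):
    """Map each controlled personal name to the titles of items that mention it."""
    index = defaultdict(list)
    for item in items:
        for name in item["people"]:
            index[name].append(item["title"])
    return dict(sorted(index.items()))


timeline = build_timeline(records)
people_index = build_people_index(records)

for year, titles in timeline:
    print(year, "-", "; ".join(titles))
for name, titles in people_index.items():
    print(name, "-", "; ".join(titles))
```

The sketch only works because the values are controlled: if the same person were entered under two spellings, the index would silently split their items across two entries.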

Lessons: Terminology is important but imprecise. The more educational value that is added to a digital library, the more it becomes reasonable to describe it as a virtual learning environment, which may make it more attractive to funding bodies. The trick is to do this while preserving the library contents as independent items which can still be re-used and assembled in different ways for different purposes.

Practical importance of this issue to GDL: 3
Time expended: 3%

20. Promotion and publicity

Issues: Digital libraries need a promotional strategy that reflects their purpose and philosophy. One difficulty is in deciding when to promote a library if it is under constant development. Premature hype can be damaging and breed cynicism, yet funders and partners may require early promotional activities. Effective publicity may generate user demands on a library which cannot be met, so it needs careful planning and timing.

Solutions: None really for offline publicity, although an early awareness day did generate a flurry of interest. For online promotion the GDL approach was to release a prototype at an early stage of development, as a taster of things to come and evidence of progress. This was also useful in raising awareness, and ensured prominence of GDL content in web search engine results, but it also entailed subsequent design and URL changes.

Lessons: Unless a library has a captive audience, a combination of methods is advisable. Xie & Wolfram (2002) found that 'The majority of respondents reported they were informed about the state digital library service (Wisconsin) through physical libraries', suggesting that printed notices and leaflets are a worthwhile complement to online publicity. For task-oriented users and researchers, the ease of discovering individual library items via web search engines should not be underestimated. Accurate and differentiated titles and metadata within static web pages are invaluable for this purpose.

Practical importance of this issue to GDL: 2
Time expended: 2%

Overview

Many school teachers feel there is a gulf between educational research, which often takes an idealised view of the process of education, and the real world of state schools with their leaking roofs, blocked toilets, disruptive pupils, burdensome bureaucracy and fluctuating political initiatives. Not all schools have such problems, but for most teachers getting through the day is more of a priority than applying findings from educational research.

The digital library field is not like that, but there are parallels. Practitioners may have good ideas and be aware of possibilities but simply not have time or funds to do much about them, so they have to prioritise and compromise. This does not mean that digital library research is irrelevant or misguided, but it means not all of it can be applied. Practitioners have to focus on the art of the possible. For some large programmes and institutions the possibilities are very wide, for others very limited. Many of the issues and problems outlined above will be encountered, and all practitioners need to understand them, but the solutions adopted will vary from case to case. The decisions on which these solutions are based will be easier to make if the library has a clear purpose and set of policies, and if the implementers understand the broader issues as well as immediate priorities. Few of the solutions adopted for the GDL were technologically innovative, but they were achievable, they suited its particular requirements, and they sustained its potential for long-term scalability and interoperability with other digital libraries.

If any lessons learned from the Glasgow Digital Library experience can be of value elsewhere, then that adds to its success as a digital library research project. Whether it can ultimately be successful as a digital library service remains to be seen.

References

Chowdhury, G.G. & Chowdhury, S: Digital library research: major issues and trends, in Journal of Documentation, vol. 55, no. 4, September 1999. http://www.aslib.co.uk/jdoc/1999/sep/05.html

Dawson, A: Inferring User Behaviour from Journal Access Figures, in Serials Librarian vol 35 no. 3, 1999. http://bubl.ac.uk/archive/journals/serlib/v35n0399/dawson.htm

Nicholson, D & Macgregor, G: Learning lessons holistically in the Glasgow Digital Library, in D-Lib Magazine, July/August 2002. http://dlib.org/dlib/july02/nicholson/07nicholson.html

Nicholson, D, Dawson, A & Macgregor, G: GDL: Model infrastructure for a regional digital library? in Widwisawn, Issue 1, January 2003. http://widwisawn.cdlr.strath.ac.uk/Issues/issue1.htm

Soergel, D: A Framework for Digital Library Research: Broadening the Vision, in D-Lib Magazine, December 2002. http://dlib.org/dlib/december02/soergel/12soergel.html

Xie, H & Wolfram, D: State Digital Library Usability: Contributing Organizational Factors, in Journal of the American Society for Information Science and Technology, vol 53 no. 13 2002. http://www.asis.org/Publications/JASIS/vol53n13.html

Notes

  1. The Glasgow Digital Library is available at http://gdl.cdlr.strath.ac.uk/
  2. The Centre for Digital Library Research was formed in August 1999, bringing together long-standing research interests in the digital information area at the University of Strathclyde. Key aims are to combine theory with practice in innovative ways and to be a centre of excellence on digital libraries issues. As well as a practical focus, its research and development has a holistic approach, aiming to encompass all areas of digital library research, including social and human issues as well as technical and structural matters.
  3. Initial two-year funding was from the Research Support Libraries Programme, an initiative with a vision to 'facilitate the best possible arrangements for research support in UK libraries'. http://www.rslp.ac.uk/
  4. The word 'item' is beautifully vague. The quoted figure includes every document, web page and image stored centrally. There are fewer than 5000 catalogue records, as some records describe multi-image objects, such as leaflets. On the other hand, hundreds of catalogue records describe distributed items, which may be considered part of the library. These items were recorded by the GDL and integrated into it (up to a point) but not created by it.
  5. Red Clydeside: a history of the labour movement in Glasgow 1910-1932. http://gdl.cdlr.strath.ac.uk/redclyde/
  6. GlasgowInfo: directory of links to current information concerning the city of Glasgow. http://gdl.cdlr.strath.ac.uk/glasgowinfo/
  7. Aspect: a digital collection of materials from elections to the first Scottish parliament for almost 300 years. http://gdl.cdlr.strath.ac.uk/aspect/
  8. CAIRNS allows the simultaneous searching of multiple library collections of print and electronic resources held by Scottish libraries and information services. http://cairns.lib.strath.ac.uk/
  9. Funding for two collections was from SCRAN (http://www.scran.ac.uk/) and for two collections from the Resources for Learning in Scotland consortium (http://www.rls.org.uk/). Captions and metadata for both were required to be submitted in Access databases, with similar but not identical table structures.
  10. MARC records were created in text format and subsequently converted to machine-readable MARC format by using the free MarcEdit software. http://www.onid.orst.edu/~reeset/marcedit/
  11. The process of creating fixed web pages from an underlying database, as opposed to dynamic creation in response to user request, is known as static rendering by some systems.
  12. W3C Web Content Accessibility Guidelines. http://www.w3.org/TR/WAI-WEBCONTENT/
  13. Jakob Nielsen's guidance is still useful and relevant, many years after it was first offered. http://www.useit.com/
  14. Access to BUBL LINK via controlled subject terms is from http://bubl.ac.uk/link/. The DDC view of the same content is available via http://bubl.ac.uk/link/ddc.html
  15. For example, to use LCSH terms 'soccer' and 'bars (drinking establishments)' rather than 'football' and 'pubs', both cornerstones of Glasgow culture, is to invite ridicule for a Glasgow-based service. The CDLR, the National Library of Scotland and other partners are involved in a long-term initiative to extend and internationalise LCSH, working with SACO for subject terms and NACO for place names. http://www.loc.gov/catdir/pcc/
  16. The CDLR subscribes to the OCLC Connexion service (formerly CORC) and is evaluating its use for collaborative cataloguing and other purposes. In this context the use of LCSH and other established authority files is essential. http://www.oclc.org/connexion/
  17. Cascading metadata may be referred to as inherited metadata in other contexts.
  18. The ease of importing Access databases into SQL Server made this the simplest solution to implement, with ASP or Cold Fusion scripts controlling the interface and search requests.
  19. Cascading search may be referred to as automatic search broadening in other contexts.
  20. Anglo-American Cataloguing Rules, Second Edition.
  21. The Greenstone suite of software for building and distributing digital library collections was produced by the New Zealand Digital Library Project. http://www.greenstone.org/