Digitisation - peak of achievement

Scottish Mountaineering Club Journal 1890-1901

Alan Dawson, Centre for Digital Library Research, University of Strathclyde

alan.dawson@strath.ac.uk

April 2007

"Let thy words be few"

These are the first five words in the first issue of the Scottish Mountaineering Club Journal (SMCJ), published in January 1890. Well over a century later, the club and its journal are still thriving, and their published words are far from few. The club has changed substantially over the years, and membership is now restricted to accomplished rock and ice climbers, but in its early years most members were simply men who climbed hills, who would now be regarded as hillwalkers rather than mountaineers. The early issues of the SMCJ therefore document an exciting period in the exploration of the Scottish Highlands, containing the first recorded descriptions of numerous Scottish hills and crags, as well as articles on geology, photography, deer forests, snow cover, Gaelic names, aesthetics, physiology, equipment, and expeditions abroad.

The first six volumes (36 issues) of the SMCJ have recently been digitised at the Centre for Digital Library Research (CDLR), thanks to a grant of £3000 from the Scottish Mountaineering Trust. All 36 issues are now freely available online, for personal or educational use, via the Glasgow Digital Library (run by CDLR), at http://gdl.cdlr.strath.ac.uk/smcj/.

Munros

'surely it is better to follow a standard, even if occasionally wrong'
(H. T. Munro, SMCJ volume 2 number 6, p330)

Issue 6 of volume 1 of the SMCJ contains the first publication of the tables that later became known as the Munros (Scottish mountains over 3000 feet high). The list has been revised many times since (1), most recently in 1997, leading many to call for a return to the original list compiled by Hugh T. Munro. However, inspection of the original tables shows why this would not be a good idea:

The most startling difference between the first list of Munros and the current one is its much larger size. Although Munro did designate a subset of his 538 hills as 'separate mountains', he clearly regarded the full set as the standard list, so it is not clear how the set of hills regarded as Munros has shrunk from 538 (all mountains) to the current 284 (separate mountains). Perhaps the answer is hidden away in one of those volumes that have yet to be digitised.

Methodology

The methods used at CDLR to create the online version of the SMCJ are similar to those used to create accessible and easily usable ebooks in HTML format, rather than in the more cumbersome and problematic PDF or other proprietary formats. This methodology is described in detail elsewhere (2), but can be summarised as follows:

  1. Capture text and images using scanner or digital camera
  2. Convert text to machine-readable form, via OCR
  3. Convert images from TIF (kept for preservation) to JPG
  4. Assemble text for each issue into single Word document
  5. Proofread text and apply structure using Word styles: headings, quotes, tables, notes, indexes etc
  6. Insert references to image file names
  7. Convert from Word to HTML (using a Word macro), retaining only text and structure, not formatting
  8. Import all HTML files into an Access database (using an Access module)
  9. Generate web pages from database
  10. Generate cumulative indexes to all issues from database
  11. Publish generated pages on web server, with manually created stylesheet to control formatting
  12. Add link to web pages and await visit from Google robots to add searchability
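Step 7 is carried out by a Word macro whose exact rules are not given here. As a rough illustration of the same idea, the Python sketch below uses the standard library's `html.parser` to keep a whitelist of structural tags and their text while discarding classes, inline styles and wrapper elements such as `<span>` and `<font>`. The tag whitelist, the kept `img` attributes and the function name `strip_formatting` are assumptions for this example; in this sketch even bold and italic are treated as formatting and dropped.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical rules: the Word macro's actual whitelist is not described,
# so these tag and attribute sets are illustrative only.
STRUCTURAL_TAGS = {"h1", "h2", "h3", "h4", "p", "blockquote", "table",
                   "tr", "th", "td", "ul", "ol", "li", "img"}
KEPT_ATTRS = {"img": {"src", "alt"}}        # attributes that carry content
SKIP_CONTENT = {"head", "style", "script"}  # drop these and everything inside


class StructureOnly(HTMLParser):
    """Re-emit HTML keeping structural tags and text, dropping formatting."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip = 0  # nesting depth inside SKIP_CONTENT elements

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip += 1
        elif tag in STRUCTURAL_TAGS and not self.skip:
            allowed = KEPT_ATTRS.get(tag, set())
            kept = "".join(f' {name}="{escape(value or "")}"'
                           for name, value in attrs if name in allowed)
            self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip = max(0, self.skip - 1)
        elif tag in STRUCTURAL_TAGS and tag != "img" and not self.skip:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text survives; spans, fonts, bold, classes and styles do not.
        if not self.skip:
            self.out.append(escape(data, quote=False))


def strip_formatting(word_html):
    """Return only the structural skeleton and text of a Word HTML export."""
    parser = StructureOnly()
    parser.feed(word_html)
    parser.close()
    return "".join(parser.out)
```

Keeping a whitelist rather than a blacklist means that unexpected Word-specific markup (such as `<o:p>` elements) is dropped by default, leaving the stylesheet in step 11 in sole control of appearance.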

This methodology takes longer than producing facsimile pages in PDF or image format (especially step 5), but has many advantages: web pages are relatively small, quick to load, Google-friendly, fully compliant with accessibility legislation, and viewable on any browser on any machine, with no plug-in software needed. Furthermore, each page has its own distinct and precise HTML title, generated from the article title, author and date (3).
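A title of this kind can be assembled mechanically once each article's metadata is held in the database. The function below is a hypothetical sketch: the name `page_title`, the default source label and the "Title, by Author (Source, Date)" pattern are illustrative assumptions, not the format used on the Glasgow Digital Library pages.

```python
from html import escape


def page_title(article_title, author, date, source="SMCJ"):
    """Build a distinct <title> element from article metadata.

    The "Title, by Author (Source, Date)" pattern is an assumption for
    illustration, not the wording used on the live site.
    """
    text = article_title.strip()
    if author:  # editorial notes and reports may have no named author
        text += f", by {author.strip()}"
    text += f" ({source}, {date})"
    return f"<title>{escape(text)}</title>"
```

For example, `page_title("Rocks & ridges", "A. N. Author", "May 1891")` (a made-up article) returns `<title>Rocks &amp; ridges, by A. N. Author (SMCJ, May 1891)</title>`.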

The methodology also adds value to the paper publications, rather than merely digitising them. Although the collection will be searchable and discoverable via Google, the use of accented characters and the spelling variations of proper names mean that browsing is at least as effective as searching. It is therefore important for the indexes to function effectively across the whole collection, not just within an issue or volume. This has been achieved by a) converting the original indexes to each bound volume into a single cumulative index, with links to the specific issue and relevant page, and b) adding further indexes that do not appear in the paper version, such as indexes of authors, events, illustrations, places and reviews. All index entries are stored in the Word documents along with the text, so that creation of index pages can be fully automated.
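Building a cumulative index amounts to merging every volume's entries under a shared term and recording where each occurrence sits. The sketch below assumes a simple data shape (a volume number mapped to `(term, issue, page)` tuples) and a hypothetical `v{volume}n{issue}.html#p{page}` URL scheme; neither is taken from the real system. It also files entries with accents and case ignored, so that "Sgùrr" sorts beside "Sgurr", which is one way of coping with the accented characters mentioned above.

```python
import unicodedata
from collections import defaultdict
from html import escape


def sort_key(term):
    """Filing key that ignores accents and case."""
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def cumulative_index(volume_indexes):
    """Merge per-volume indexes into one cumulative index.

    volume_indexes maps a volume number to a list of (term, issue, page)
    entries from that volume's index. Returns a dict, ordered by filing
    key, of term -> sorted list of (volume, issue, page) references.
    """
    merged = defaultdict(set)  # a set drops duplicate references
    for volume, entries in volume_indexes.items():
        for term, issue, page in entries:
            merged[term].add((volume, issue, page))
    return {term: sorted(refs)
            for term, refs in sorted(merged.items(), key=lambda kv: sort_key(kv[0]))}


def link(volume, issue, page):
    # Hypothetical URL scheme; the site's real file naming is not described.
    return f"v{volume}n{issue}.html#p{page}"


def render_entry(term, refs):
    """One HTML list item linking each reference to its issue and page."""
    links = ", ".join(f'<a href="{link(v, i, p)}">vol {v}, p{p}</a>'
                      for v, i, p in refs)
    return f"<li>{escape(term)}: {links}</li>"
```

With made-up references such as `{1: [("Sgùrr nan Gillean", 3, 120), ("Ben Nevis", 2, 75)], 2: [("Ben Nevis", 1, 30), ("Sgurr Alasdair", 4, 210)]}`, the merged index lists "Ben Nevis", "Sgurr Alasdair", "Sgùrr nan Gillean" in that order, and the "Ben Nevis" entry links to both volumes.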

Many of the issues that arise from using this methodology have been addressed and resolved in earlier work on ebook creation (4). For example, policies are needed on error correction, punctuation, capitalisation, image placement, footnotes, character sets, etc. The aim is to strike a balance between access and preservation by faithfully capturing the content and structure of the original work without having to preserve typesetting or artefacts of the printing process, so that the end result is highly accurate but can take advantage of current styles and standards.
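One possible character-set policy, shown only as an illustration and not necessarily the one CDLR adopted, is to compose accented letters into single characters and then write every non-ASCII character as a numeric character reference, so that pages display correctly whatever encoding a browser assumes. The function name `to_ascii_html` is hypothetical, and it assumes that `&`, `<` and `>` in the text have already been escaped, since the `xmlcharrefreplace` error handler only touches characters outside ASCII.

```python
import unicodedata


def to_ascii_html(text):
    """Return text with every non-ASCII character as a numeric reference.

    NFC normalisation first merges a base letter and a combining accent
    (e.g. "u" + U+0300) into one character such as "ù", so each accented
    letter becomes a single reference rather than two.
    """
    composed = unicodedata.normalize("NFC", text)
    return composed.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

For example, `to_ascii_html("Sgu\u0300rr")` returns `Sg&#249;rr`, and `to_ascii_html("£3000")` returns `&#163;3000`.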

Development of the ebook methodology to make it applicable to a journal, including creation of the cumulative indexes, has made the SMCJ a useful focus for digital library research and development, as well as a valuable historical resource in its own right.

Setbacks

The process of producing the online SMCJ has been far from smooth. Copies of the paper journals had been borrowed from the SMC library in Glasgow, but the library closed during the renovation and sale of its building, so some issues had to be located elsewhere and borrowed privately. The project then ran out of funding halfway through proofreading of volume 4. Yet these were minor setbacks compared with the death of Rob Milne on Everest in May 2005. Rob had been SMC publications manager and had steered the digitisation proposal through the Scottish Mountaineering Trust committee. Shortly after Rob's death, the SMC librarian, Ian Angell, was desperately unlucky to fall into a rock crevasse while descending from Ben Donich near Arrochar, and he too was killed. In these circumstances it was inevitable that the project would take longer than originally envisaged, but it was also important not to abandon it, as that would have been against the spirit of both mountaineering and digitisation. The names of Rob Milne and Ian Angell deserve to be credited and remembered alongside those heroes from a much earlier generation of mountaineering in Scotland, whose recorded exploits are now readily available to all.

Further information

  1. For a concise summary of revisions see Statistical topics in hillwalking, by Chris Crocker and Graham Jackson: http://www.biber.fsnet.co.uk/
  2. The ebook methodology project report and toolkit are available from the Arts and Humanities Data Service: http://ahds.ac.uk/collections/ebook-methodology/
  3. For more details of this technique see Optimising metadata to make high-value content more accessible to Google users, by Alan Dawson and Val Hamilton, 2006: http://cdlr.strath.ac.uk/pubs/dawsona/ad200503.htm
  4. See Twenty issues in ebook creation, by Alan Dawson and Jake Wallis, 2005: http://cdlr.strath.ac.uk/pubs/dawsona/ad200501.htm