Inferring User Behaviour from Journal Access Figures

Alan Dawson

Centre for Digital Library Research, University of Strathclyde, Glasgow G1 1XH

alan.dawson@strath.ac.uk

1999

This article was published in The Serials Librarian, Volume 35 Number 3 1999

SUMMARY

Using the BUBL Journals service as a rich source of data, this article outlines different methods of measuring usage of electronic journals, suggests that different types of access may be mapped to the user activities of browsing, reading and searching, and draws some inferences about why different titles have differing patterns of usage. It also proposes that by measuring the 'search-to-browse' ratio of a journal it is possible to assess whether journals are being used primarily for research and reference purposes or mainly for current awareness and casual browsing.

Background

One of the best things about running an online information service is that you can get detailed information about how the service is being used. Access statistics should be treated with caution but they can help highlight popular items and show patterns of service use.

It is certainly easier and cheaper to collect usage figures for electronic periodicals than for printed ones. In a recent survey of public libraries in the UK, Evans et al. (¹) found that "most authorities were not measuring the use of serials"; of those that did, 58% provided figures for use of electronic titles, whereas only 5% measured use of printed serials. In an earlier study of printed serials use in academic libraries, Bustion et al. (²) found that "extended use of direct observation, however, is prohibitively expensive" and that "observational errors, aggregation issues, and costs limit the feasibility of long-term, direct-observation use studies". In 1993 Naylor (³) agreed that the use study (of printed periodicals) was "hard to implement, and expensive".

Since the BUBL Information Service was relaunched at the end of March 1997 (⁴), access statistics have been steadily accumulating, and after six months of operation the first detailed analysis of these statistics was carried out. Even the bare access count is interesting, as it shows precisely how many times each title has been looked at during the half-year period. Furthermore, we can see how usage has grown over the six-month period, and by analysing the raw data in different ways we can draw some reasonable inferences about how the different titles are being used. Although it might be possible to obtain equivalent information for paper journals, this would involve a far greater outlay of time and effort, becoming a significant research exercise in itself.

Context

Before considering the access figures in detail, it is worth summarising some key facts about the online journals held by BUBL, to provide a meaningful context for the data:

Of the 249 titles carried in the BUBL Journals service (⁵), 153 (61%) fall in the broad subject area of library and information science (LIS), and a further 22 (9%) are classed as computing and information technology (IT). These titles include several magazines and newsletters as well as academic journals. The other major subject areas are social and medical services; business and marketing; agriculture and food science.
The full text of only 17 titles (7%) is held by BUBL. Abstracts are held for a further 160 titles (64%), while the remaining 72 (29%) are tables of contents only.
Coverage of over 110 titles (44%) dates back at least five years to 1993 or earlier.
Only currently published titles are included in the BUBL Journals service. Titles which have ceased publication or are no longer held by BUBL are stored in the BUBL Archive service and are not included in this usage study.

Every journal is searchable in four ways:

An individual file or document (usually equating to a single issue) can be searched using the standard browser search function. This does not require any indexing by BUBL.
All issues of a single title can be searched collectively, from the title's menu page in BUBL Journals. This is referred to as the 'single-title index'. There are 249 of these indexes, one for each title.
All titles within a specific subject area can be searched collectively. This is referred to as the 'journal-subject index'. There are five of these indexes, one for each main subject area. The IT index is a subset of the LIS index.
All titles in all subject areas can be searched collectively. This is referred to as the 'all-titles index'. There is only one such index.

Users may therefore reach the content of an individual journal by several possible routes:

By searching the 'all-titles' index
By searching the 'journal-subject' index
By searching the 'single-title' index
By browsing the alphabetical list of journals
By browsing the subject listing of journals
By browsing the BUBL Updates file, which includes recent additions to the service

A further possibility is an external link to a specific title. External links are most likely to point to a menu of journal issues, but could be to a specific issue or a specific full-text article. Overall numbers of specific links are likely to be low. For example, using Infoseek (⁶) to locate external links to the Serials Librarian page on BUBL gives just four hits: two from BUBL itself, plus one from Acqweb (⁷) and one from the University of New Mexico Law School Library. The search syntax for generating this is 'link:' followed by the URL:

link:http://bubl.ac.uk/journals/lis/oz/serlib/

Browsing, Reading and Searching

The BUBL server usage log records every access to every file of every title. In order to make sense of this vast amount of data, the log was first analysed by running the Analog program (⁸) on each month's usage log to produce a count of the number of accesses to every file and every menu during the month. This set of six summary reports (one per month) was then analysed collectively, using a Microsoft Word Basic program written specially for the task, to produce the following aggregate figures for each journal title for the six-month period:

Number of accesses to the menu of issues for that journal (index.html)
Total accesses to all files holding journal content (*.htm)
Number of searches of all issues of that journal (the single-title index)
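The aggregation step described above can be sketched in a few lines of Python. This is only an illustration and not the Word Basic program that was actually used. The input format (pairs of requested path and access count, as might be extracted from each monthly report), the issue file names, the `search` path used to recognise single-title index searches, and all the counts are assumptions invented for the example.

```python
from collections import defaultdict
from posixpath import basename, dirname


def classify(path):
    """Map a requested path to (title directory, activity), or None if unrecognised."""
    name = basename(path)
    title = dirname(path)
    if name == "index.html":      # menu of issues for a title
        return title, "browsing"
    if name.endswith(".htm"):     # file holding journal content
        return title, "reading"
    if name == "search":          # hypothetical path for the single-title index
        return title, "searching"
    return None


def aggregate(monthly_reports):
    """Sum access counts per title and activity across all monthly reports."""
    totals = defaultdict(lambda: {"browsing": 0, "reading": 0, "searching": 0})
    for report in monthly_reports:
        for path, count in report:
            result = classify(path)
            if result is None:
                continue          # ignore images and other unrelated files
            title, activity = result
            totals[title][activity] += count
    return dict(totals)


# Two invented monthly reports; file names and counts are hypothetical.
reports = [
    [("/journals/lis/oz/serlib/index.html", 50),
     ("/journals/lis/oz/serlib/v31n4.htm", 120),
     ("/journals/lis/oz/serlib/search", 30)],
    [("/journals/lis/oz/serlib/index.html", 60),
     ("/journals/lis/oz/serlib/v31n4.htm", 90),
     ("/journals/lis/oz/serlib/v32n1.htm", 40),
     ("/images/logo.gif", 500)],
]

totals = aggregate(reports)
print(totals["/journals/lis/oz/serlib"])
```

Keying the totals on the title's directory means that every issue file of a title contributes to a single 'reading' figure, which matches the way the three aggregate figures are defined.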

For example, for the Serials Librarian the three figures are 326, 1509, 189 (giving a rank order of 31st, 40th, 38th out of 249).

These three numbers can reasonably be interpreted as representing three different user activities, namely browsing, reading and searching:

Access Count	Screen Display	Assumed User Activity
Menu	List of journal issues with dates	Browsing
Text file	Content of a specific journal issue	Reading
Search box	List of titles matching search term	Searching

The correlation between access count and user behaviour is assumed to be 100% in the case of searching, as this is a deliberate user activity that cannot easily be misinterpreted. However, screen display of textual content does not always imply reading behaviour. It is quite likely that a user will be scanning the text for relevance rather than reading it in full. This is particularly likely for pages displayed following a search, as the user may have to scan several different pages of apparent 'hits' before finding one that matches the topic he or she was really interested in. The equivalent behaviour might be referred to as browsing if applied to a paper periodical, where searching is not possible. In contrast, the electronic browsing identified by the access count equates to looking at the dates on the spines of a set of paper journals without opening any of them. The only information available to the electronic journal user at that point is the date and number of issues.

The parallel with reading and browsing paper journals is therefore far from exact, but the electronic activities are distinct enough to justify further analysis. Furthermore, the activities are the same for every title, and therefore we can assume that variations in patterns of access to different titles are the result of different user activities, even if we cannot be certain precisely what those activities are. In other words, we don't know exactly what users are doing, but we do know when they are using different titles in different ways.

Journal Popularity and Usage

The different measurements produce three different answers to the question of which is the most popular title. In fact they generate three quite different league tables:

	Reading	Browsing	Searching
1	ALAWON	Journal of Chronic Fatigue Syndrome	LIBRES
2	Newsletter on Serials Pricing Issues	Journal of Global Marketing	New Titles in Library and Information Science (⁹)
3	ALCTS Network News	Library Quarterly	Public Access Computer Systems Review
4	Electronic Information Report	Journal of Customer Service in Marketing and Management	ALCTS Network News
5	Library Journal	Ejournal	Newsletter on Serials Pricing Issues
6	Journal of American Society for Information Science	LIBRES	Journal of Information Science
7	Network Week	Journal of Segmentation in Marketing	Computers in Libraries
8	Network News	Advances in Librarianship	Managing Information
9	Computers in Libraries	Electronic Library	Program
10	Information Management Report	Associates	Journal of Librarianship and Information Science

Twenty-six different titles can justifiably claim to be among the ten most popular journals! Only four titles appear in two lists, and none appears in all three. As the results are drawn from six months' data, this is unlikely to be a random pattern. The differences in use are substantial, and some reasonable inferences can be drawn from them:

Titles with a high 'reading' score may be accessed by other routes in addition to the browse menu. For example, ALAWON (¹⁰) is issued very frequently and therefore appears regularly in the BUBL Updates files, from which direct access to the latest issue is possible, bypassing the browse menu. This partly explains its high 'reading' value. Other titles with a high 'reading' score may contain words that are commonly specified in user searches of the 'all-titles' index or the 'journal-subject' index, thereby triggering reading accesses but not browsing accesses. (When a user searches one of these cross-journal indexes, several different titles are usually returned as matching the search term. However, that in itself does not count as an access to the individual journal title. It is only when the user follows up the search result and selects a particular title that a journal access is recorded.)
Titles with a high 'browsing' score are being accessed via the journal subject menu or the alphabetical title menu, and therefore either the journal name itself encourages access or it is of known value from previous use. The top ten 'browse' titles include three marketing journals and one medical journal, none of which appear in the other top ten lists. This suggests use of these titles for current awareness or relatively casual use rather than for specific research or reference purposes.
Titles with a high 'searching' score are being used for research and reference purposes, most likely as a result of previous reading of that title. Given that it is possible to search across all titles, users would probably only search within a particular title if they had a good idea of its likely content, or if they were seeking to narrow a cross-journal search which had produced too many hits. This assertion is supported by the fact that all of the top ten 'search score' journals are LIS titles, which is the core subject of the journals service. In fact, the top 21 titles in the search table are all LIS titles.
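The overlap figures quoted for the three league tables (26 distinct titles, four appearing in two lists, none in all three) can be checked mechanically. The following minimal sketch simply copies the titles from the table above; the variable names are chosen for the example.

```python
from collections import Counter

reading = [
    "ALAWON", "Newsletter on Serials Pricing Issues", "ALCTS Network News",
    "Electronic Information Report", "Library Journal",
    "Journal of American Society for Information Science", "Network Week",
    "Network News", "Computers in Libraries", "Information Management Report",
]
browsing = [
    "Journal of Chronic Fatigue Syndrome", "Journal of Global Marketing",
    "Library Quarterly",
    "Journal of Customer Service in Marketing and Management", "Ejournal",
    "LIBRES", "Journal of Segmentation in Marketing",
    "Advances in Librarianship", "Electronic Library", "Associates",
]
searching = [
    "LIBRES", "New Titles in Library and Information Science",
    "Public Access Computer Systems Review", "ALCTS Network News",
    "Newsletter on Serials Pricing Issues", "Journal of Information Science",
    "Computers in Libraries", "Managing Information", "Program",
    "Journal of Librarianship and Information Science",
]

# Each list has no internal duplicates, so a count of n means "appears in n lists".
appearances = Counter(reading + browsing + searching)
distinct = len(appearances)
in_two = sorted(title for title, n in appearances.items() if n == 2)
in_three = sorted(title for title, n in appearances.items() if n == 3)

print(distinct)   # 26
print(in_two)
print(in_three)   # []
```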

The Search-Browse Ratio

The top two titles in the browsing table appear at numbers 48 and 23 in the searching table, and at numbers 15 and 26 in the reading table. These figures suggest major differences in usage between these titles and other popular journals. These differences can be quantified by measuring the ratio of searching to browsing and reading. It would be quite possible to produce a separate search-browse ratio and search-read ratio for each title. However, it is probably more helpful to generate a single figure combining these two measures. In view of the fact that online behaviour classified as reading is in practice likely to involve a substantial amount of browsing (as the term is commonly understood), the most meaningful measure is probably a single search-browse ratio (SBR). This can be expressed as a percentage by dividing the 'searching' figure by the combined 'reading' and 'browsing' figures:

searching / (reading + browsing) * 100 = SBR

For example, the figures for the Library Journal are:

331 / (3934 + 411) * 100 = 7.6%

This means there is one search of all issues of the Library Journal for every 13 accesses to the content or menu. Equivalent figures for the Serials Librarian are:

189 / (1509 + 326) * 100 = 10.3%
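For readers who want to apply the same calculation to their own figures, the formula can be written as a small Python function. The function name and the handling of the zero-access case are choices made for this sketch; the input figures are those quoted above.

```python
def search_browse_ratio(searching, reading, browsing):
    """Return the SBR as a percentage: searching / (reading + browsing) * 100."""
    accesses = reading + browsing
    if accesses == 0:
        raise ValueError("SBR is undefined without any reading or browsing accesses")
    return searching / accesses * 100


library_journal = search_browse_ratio(searching=331, reading=3934, browsing=411)
serials_librarian = search_browse_ratio(searching=189, reading=1509, browsing=326)

print(f"Library Journal:   {library_journal:.1f}%")    # 7.6%
print(f"Serials Librarian: {serials_librarian:.1f}%")  # 10.3%

# The reciprocal gives the number of content or menu accesses per search.
print(f"Accesses per search (Library Journal): {100 / library_journal:.0f}")  # 13
```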

Although the actual numbers are much smaller, the SBR is greater. The difference suggests that proportionately the Serials Librarian is used for research and reference more than the Library Journal. Both these figures are above average though not near the top. The SBR values range from 0.2% to 41.9%, with the median being 5.1%. We can therefore produce a fourth league table, one that arguably gives the best indication of the nature of periodical usage as opposed to the quantity. LIBRES again tops the table, but the other titles differ markedly from those in the other three tables:

LIBRES	41.9%
Program	28.4%
Reference Librarian	25.1%
New Titles in Library and Information Science	24.8%
Public Access Computer Systems Review	23.5%
Journal of Librarianship and Information Science	22.2%
CTI News	20.1%
CPU: Working in the Computer Industry	20.0%
Journal of Information Science	19.1%
Managing Information	18.8%
College and Research Libraries	18.8%

It would in theory be possible to have an SBR higher than 100%. This would occur if there were large numbers of unsuccessful searches, indicating that users persistently expected to find useful information in that title but failed to do so. Note that with the method of calculating SBR used in BUBL Journals, a single successful search followed by a single document access would give an SBR of 50% not 100%, as the search box is located on the menu of issues, so the equation would be 1 / (1 + 1) * 100 = 50%. This should not necessarily be regarded as an optimum score, as a series of successful searches could push the SBR above 50%.
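These boundary cases can be made concrete with a few invented visits. The sketch below assumes that each visit records exactly one menu access (where the search box is located), that repeated searches within the visit do not register further menu accesses, and that each successful search leads to exactly one document access. These are simplifying assumptions for illustration, not measured behaviour.

```python
def sbr(searching, reading, browsing):
    """SBR as a percentage: searching / (reading + browsing) * 100."""
    return searching / (reading + browsing) * 100


# Hypothetical visits; every count here is invented for illustration.
scenarios = {
    "one successful search": dict(browsing=1, searching=1, reading=1),
    "three successful searches": dict(browsing=1, searching=3, reading=3),
    "five unsuccessful searches": dict(browsing=1, searching=5, reading=0),
}

for label, counts in scenarios.items():
    print(f"{label}: {sbr(**counts):.0f}%")
# one successful search: 50%
# three successful searches: 75%
# five unsuccessful searches: 500%
```

Under these assumptions a run of successful searches approaches but never reaches 100%, so a value above 100% can only arise when searches outnumber the accesses that follow them.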

We can reasonably conclude that a high SBR means users are familiar with the type of material to be found in the journal, and are looking for something specific in it. It implies research and reference activity. A low SBR suggests a more general, possibly casual, interest in the subject area, or a desire to keep up-to-date with the journal content rather than a need to find something specific.

These conclusions are supported by the figures for the other two 'most popular' titles. ALAWON has an SBR of 2.34%, i.e. one search for every 43 accesses, which puts it in 200th place in the SBR table. This seems about right for a publication which is essentially a brief update bulletin rather than a journal. Most people would scan it for the latest news but have little need to search it. The other table topper, the Journal of Chronic Fatigue Syndrome, is also below average, with an SBR of 4.5%, suggesting that its high browse count is the result of casual interest rather than academic research and reference. The word 'casual' is perhaps unfair - we know from user feedback that the journal is used by appreciative CFS sufferers as well as by researchers.

These figures all confirm that the Search-Browse Ratio is a meaningful indicator of how electronic periodicals are used. It provides a means of standardising usage assessment regardless of actual totals, and it offers a valuable alternative to simple access counts, which can give a misleading impression of a journal's value to users.

In the BUBL Journals service there are several factors other than user behaviour that may affect the SBR - whether the full text is available online, the overall number of issues, and the intended purpose of the publication. Like most statistics, the numbers need to be sensibly interpreted, but the available evidence supports the potential usefulness of the SBR measurement in other online information services which provide single-title indexes.

Further Analysis

Although the above analysis may seem complex, it is easy to envisage further refinements and alternative measures. One obvious possibility would be to divide the 'reading' score by the number of issues of that title available online. This would give a figure for accesses per issue rather than total accesses, and yet another league table. Another option would be to compare accesses to the most recent issue with accesses to back issues - the Current-Previous Ratio - which might indicate a title's value for current awareness as opposed to research.
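Both suggested measures are simple to compute once per-issue access counts are available. The sketch below shows one possible formulation: the text does not define the Current-Previous Ratio precisely, so dividing accesses to the most recent issue by combined accesses to all back issues is an assumption, as are the function names and the invented per-issue counts.

```python
def accesses_per_issue(reading, issues_online):
    """Average number of 'reading' accesses per issue held online."""
    if issues_online <= 0:
        raise ValueError("a title must have at least one issue online")
    return reading / issues_online


def current_previous_ratio(current_issue_accesses, back_issue_accesses):
    """Accesses to the latest issue divided by accesses to all back issues (assumed definition)."""
    if back_issue_accesses == 0:
        raise ValueError("ratio is undefined without any back-issue accesses")
    return current_issue_accesses / back_issue_accesses


# Hypothetical per-issue access counts for one title, oldest issue first.
issue_accesses = [50, 50, 100, 400]

per_issue = accesses_per_issue(sum(issue_accesses), len(issue_accesses))
cpr = current_previous_ratio(issue_accesses[-1], sum(issue_accesses[:-1]))

print(per_issue)  # 150.0
print(cpr)        # 2.0
```

With this definition a ratio well above 1 would point towards current-awareness use, while a ratio below 1 would suggest that back issues are being consulted for research.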

It is worth emphasising that the 'search' figures in the SBR are all derived from the single-title index, i.e. it is only measuring within-journal searching. Clearly there are many successful searches that derive from one of the cross-journal indexes, which have not been analysed here. As the cross-journal indexes are not directed at a specific title they do not readily contribute to assessment of journal usage patterns. We can however conclude from brief analysis of the overall usage figures that both browsing and cross-journal searching are extremely important to users. Any journals service that is only available via one of these methods is therefore likely to be losing numerous potential accesses and users.

Finally there is the question of the search terms themselves. This opens up a completely new field of analysis, with the potential to show exactly what users are looking for when searching, both within and across titles. It would also enable measurement of the ratio of successful to unsuccessful searches, and show how many unsuccessful searches users are prepared to make before giving up. At present we do not keep a record of search terms used in the BUBL Journals service, but this omission is currently being reviewed.

Conclusions

For BUBL as a service provider the usage figures and analysis are of interest in themselves, but there are also broader issues to consider. We have to justify the value of our service to our funding body (¹¹), and one way we can do this is via usage statistics. In a recent review of some of the national services provided for the UK higher education community, BUBL's detailed published access statistics were judged to indicate "a sense of openness and accountability on the part of the provider team which should be applauded". But funding bodies are only interested in overall totals and patterns of growth - they are not concerned about individual titles unless an item is so hugely popular that it affects network traffic. ALAWON and LIBRES are not really that type of publication.

However, funding bodies are rightly concerned that services should be cost-effective, and in order to manage the service effectively it is very helpful to be able to assess the benefit (in terms of usage figures) of any individual title against the cost incurred in making that information available. This has been done for printed journals - for example in Francq's Usage/Cost Relational Index (¹²) - but it can more easily be applied to electronic journals services. Certainly Francq's cautionary words are equally applicable to both paper and electronic usage: "it is important for the manager to be aware of what the usage figures represent."

For BUBL Journals the cost of any title is almost entirely in the staff time it takes to provide and maintain the electronic version. If our usage analysis had found that we were spending time transcribing, checking, editing and indexing titles that no-one was looking at, we could justify dropping those titles and spending the time on something more valuable. In practice, we do not at present propose dropping any of our journals collection as a result of this review. Even the least popular of our 249 titles has racked up a total of 161 accesses in six months, which might be considered healthy for some paper journals in an academic library. However, should the need arise to curtail the services we provide in some areas, we have a set of meaningful measurements as a basis for making well-informed decisions.

We can never know from figures alone the real value of all these accesses to our users. If a single access provides a single user with precisely the information he or she was seeking, possibly saving hours of searching elsewhere, then that gives it a very high value to the user, but it is still only one access in the usage statistics. The Search-Browse Ratio provides a more meaningful measure than a simple access count, but numbers alone will never provide a complete picture of what users are doing and what they want from a service.

References

1. Evans, M.K., McKnight, C., Morris, A. & Brunskill, K: The Electronic Serials in Public Libraries Project: Summary of Initial Survey Findings. Loughborough University Department of Information and Library Studies, November 1997. http://info.lboro.ac.uk/departments/dils/research/quest.html
2. Bustion, M., Eltinge, J. & Harer, J: On the Merits of Direct Observation of Periodical Usage: An Empirical Study. College and Research Libraries, Volume 53 Number 6 1992, p537
3. Naylor, M: A Comparison of Two Methodologies for Counting Current Periodical Use. Serials Review, Volume 19 Number 1 1993
4. Dawson, A: BUBL Bursts out of Bath. The Serials Librarian, Volume 31 Number 4 1997, p15-22
5. BUBL Journals: http://bubl.ac.uk/journals/
6. Infoseek: http://www.infoseek.com/
7. Acqweb Directory of Journals, Newsletters and Listserv Archives: http://www.library.vanderbilt.edu/law/acqs/journals.html
8. Analog: Fast, freeware WWW logfile analysis for Unix, PCs, NT, Mac, VMS: http://www.statslab.cam.ac.uk/~sret1/analog/ and many other locations
9. New Titles in Library and Information Science: This is not a journal as such, but monthly extracts from the BookData BookFind CD: http://bubl.ac.uk/journals/lis/kn/ntilis/
10. ALAWON: American Library Association Washington Office Newsline: http://bubl.ac.uk/journals/lis/ae/alawon/
11. Joint Information Systems Committee: http://www.jisc.ac.uk/
12. Francq, C: Bottoming Out the Bottomless Pit with the Journal Usage/Cost Relational Index. Technical Services Quarterly, Volume 11 Number 4 1994, p13-26