Design, query, and evaluate information retrieval systems…
Most of the systems we interact with daily in a modern library setting, including our integrated library systems (ILS) and the catalog, are information retrieval systems. Our library routines are a key component of the success of these systems and, as librarians, we must be able to use their various components effectively. This includes encoding data into records as well as understanding the design features integrated into each system. As librarians, we benefit from having a theoretical as well as a working knowledge of how these systems are designed and how we can evaluate them for optimal usefulness.
In libraries, our ILSs feature various subsystems, including a circulation module for patron records, an OPAC that provides access to material records through a user-friendly graphical user interface (GUI), and a cataloging module that allows libraries to enter and classify materials. Some also have interlibrary loan (ILL) and acquisitions modules. In evaluating these systems, our analysis has to recognize that we are in a paradigm shift, as the web’s proliferation has raised the bar for retrieval. Users now expect more intuitive queries. At the same time, tools such as classification, controlled vocabulary, tagging and menus are still needed to isolate satisfactory results in our increasingly information-dense world.
My personal view is that evaluation of library systems should be weighted in favor of the user’s perspective. “The key utility measure is user happiness,” note Manning, Raghavan and Schütze (2009) as they describe the need for user-focused design and evaluation. The bottom line is that library systems should be designed to meet the rising needs and expectations of the users who engage with them, as far as is possible given technological and budgetary constraints.
Today’s librarians must also teach users how to interact with various information retrieval systems, ranging from natural-language web searching to specialized databases that may involve controlled vocabulary tools such as thesauri or advanced searching based on Boolean logic. Honing our operational ability with IR systems in the form of effective query strategies prepares librarians to meet the 21st-century demands of information users. This process also helps us better evaluate and design the IR systems for our organizations to achieve our mission.
One such skill is constructing effective query strategies, and we are also in a position to help users evaluate their search results. Working with users is also a good way to understand the challenges they face in interfacing with these complex products. This can help us to better evaluate the design of proposed systems and upgrades to systems that our libraries are considering.
Library innovator Charles Cutter’s premise was that a library catalog should offer the ability to locate items through three possible entry points: author, title and subject. This system comes from an era when index cards were the records, and his principles carried over to early library bibliographic standards and information retrieval system designs. Today’s ILS designs allow retrieval on all aspects of a data structure, including keywords within notes and summaries. However, natural language search remains an important goal for user-facing library systems in my view, and every librarian should be aware that people measure our systems not against each other but against the broader information universe that has raised the bar for IR systems.
One recognition of the need to advance library information retrieval is the introduction of Resource Description and Access (RDA) as a replacement for the decades-old Anglo-American Cataloguing Rules (AACR). Libraries and librarians should be prepared to make this leap if they have not already done so, or risk falling behind the curve in meeting user expectations in this era of information proliferation. RDA is largely an improvement in how catalogs see the physical manifestation, or format, so to speak.
RDA looks beyond books to what is referred to as a ‘work’. A user seeks a particular ‘work’ that she heard about, for example, and she can see in the library’s catalog the various ‘expressions’ that are available for checkout, such as print, audiobook and eBook. Later, there may be various ‘manifestations’ reflecting new editions or graphical versions of this same ‘work’. This layer of complexity added to traditional bibliographic description allows for better access to the information and format that the user needs. Modern library IR systems have had to be redesigned so that they better reflect these diverse choices. Users expect more, and our design standards have had to rise to keep pace with these expectations.
Other information retrieval advances include the use of the web for academic and other research traditionally reserved for subscription-based academic databases. Google Scholar presently offers a system that provides free access to abstracts of academic journal articles. It is challenging the fee-based systems, which rely more heavily on indexing and filtering methods of search but are often not as easy to use or access. Web-based IR designs are possible due to advances in information retrieval that combine programs called spiders, which crawl the web and gather pages for indexing, with algorithmic ranking strategies that estimate relevance. Librarians should keep abreast of these advances and consider them when designing our future as an information profession.
As library systems have been criticized for not being more “Google-like,” savvy librarians have recognized that usability is an important factor in evaluating our systems and services. The user experience (UX) is particularly key to evaluating our newly emerging technologies, as users will not adopt technologies they feel uncomfortable or frustrated with. Bell (2012) establishes that usability relates to the interface and how easy it is to use, whereas UX refers to the emotional experience of the user. As a public service librarian, I tend to agree that we have to look beyond features and examine library users’ experience to measure how effective our tools are or are not. We move from “Does your OPAC search by keyword?” to “Why do users express frustration when using the OPAC?” This mindset for evaluation will help sort through the many design options that can be alluring at a distance but not practical in the day-to-day setting of our libraries.
The catalog is the heart of a modern library, and in my view it should itself be subjected to ongoing evaluation and improvement. “UX design is the process used to determine what the experience will be like when a user interacts with your product,” according to usertesting.com, a blog dedicated to UX issues (Klein, 2016). Librarians and the vendors who sell to them should be subjecting their catalog designs to UX feedback to improve overall service levels. Part of this process is working with the librarians who use these systems hands-on, and librarians should engage with vendors in this dialog. As we move into the digital era of librarianship, the expectations for natural language query will continue to rise and we will be challenged to defend our systems, fiscally and otherwise. In the meantime, we have to set a high bar for our bibliographic standards so that searches are effective and easy to perform, and this should be judged from the perspective of the user, whose level of skill may vary.
Another useful set of UX standards of evaluation that I’ve been introduced to looks at three factors: value, adoptability and desirability (Guo as quoted by Bell). These standards of evaluation challenge us to consider the utility of our library information retrieval systems. We should also look at adoption rates to determine the success of our decisions relative to new platforms. Lastly, we should acknowledge that if users are frustrated or hit roadblocks, the conclusion should be that the system needs improvement, and we should move away from the assumption that the user is somehow deficient or needs to be re-educated.
One of the fundamental lessons in library science that we are introduced to in Information Retrieval (LIBR 202) relates to examining the result set for accuracy. I mention this because a system that is not effective will not provide value to the user, nor is it likely to be widely adopted. We learn that we can measure a set of results for a given query by precision and by recall. These evaluation measures capture the ability to find relevant materials within a collection using a search. They are specifically defined as follows:
- Precision is the fraction of retrieved documents relevant to the query;
- Recall is the fraction of the documents relevant to the query that are successfully retrieved.
The distinction is important when evaluating library systems because when precision is low, irrelevant results creep into the returns and our users are misdirected and lose confidence in the system. Secondly, if recall is low, items that may have been available can be missed, and as libraries we cannot afford to have our materials invisible to the potential user. This sort of poor performance can sometimes be the result of marginal cataloging standards. However, poor query strategies can also result in less than optimal relevancy. Our job as librarians involves evaluating our systems but also optimizing and supporting them for higher levels of precision and recall through an active, hands-on role from encoding to inquiry.
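The two measures defined above can be worked through with a small example. What follows is only a minimal sketch in Python: the function name `precision_recall` and the record identifiers are hypothetical, and it assumes we already know which records in the collection are relevant to the query, which in practice requires human judgment.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: identifiers the system returned for the query
    relevant:  identifiers judged relevant to the query (assumed known)
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant records that were actually retrieved

    # Guard against empty sets so we never divide by zero.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result set: four records returned, five records truly relevant.
retrieved = ["rec1", "rec2", "rec3", "rec4"]
relevant = ["rec2", "rec4", "rec5", "rec6", "rec7"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
# prints: precision = 0.50, recall = 0.40
```

In this made-up case, half of what the user sees is off-target (precision of 0.50) and three of the five relevant items stay invisible (recall of 0.40), which are the two failures described above.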
Today’s librarians are tasked with locating material and information quickly and efficiently to “save the time” of the user, as Ranganathan advocated (Rubin, 2004). This requires that we understand the nature of queries in a given system and how relevance in results can be increased through more effective queries. A query is simply a request for information from a database. But queries can be of various types, according to webtopia.com: by parameter, using menus or filters; by example, completing parts of a record structure; or by query language, such as Boolean queries or natural language search. A good librarian is an expert in the query and seeks to understand what that means for a given system.
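To make the Boolean style of query concrete, here is a minimal sketch in Python. It assumes a tiny, made-up set of title records and a simple word-level index; the names `build_index` and `postings` are hypothetical, and this is not how any particular ILS or vendor database is implemented.

```python
from collections import defaultdict


def build_index(records):
    """Map each lowercase word to the set of record ids that contain it."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for term in text.lower().split():
            index[term].add(rec_id)
    return index


def postings(index, term):
    """Return the record ids for a term (an empty set if the term is unknown)."""
    return index.get(term.lower(), set())


# Hypothetical catalog titles, for illustration only.
records = {
    1: "diabetes cookbook for families",
    2: "history of public libraries",
    3: "diabetes nutrition guide",
    4: "cookbook for young readers",
}
index = build_index(records)

# Boolean operators map directly onto set operations.
both = postings(index, "diabetes") & postings(index, "cookbook")     # AND
either = postings(index, "diabetes") | postings(index, "cookbook")   # OR
without = postings(index, "diabetes") - postings(index, "cookbook")  # NOT

print(sorted(both), sorted(either), sorted(without))
# prints: [1] [1, 3, 4] [3]
```

AND narrows the result set, OR broadens it, and NOT excludes records, which is why choosing among them is central to a query strategy.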
This IR process can be summarized as a sequence of steps: an information need arises, the need is expressed, a search strategy is formed, a precise query is made, and information results are returned (Meadow, Boyce, Kraft and Barry, 2008). Bibliographic standards and other IR tools are meant to solve this problem of a user’s need for specific information. These needs are expressed as queries that are formulated to work with a particular system design. I have personally found that this effort can be two-fold. First, a theoretical understanding of query and search helps, and this may involve a cultivated interest in database management system design (Oracle, 2016). Second, a willingness to work with the user interface to better understand the tools it offers is crucial.
Another distinction to consider is that a successful query within a library setting can involve both searching and browsing. Librarians should recognize that browsing, a broader process of inquiry, is a viable alternative to searching, and we should direct users in how to access subject headings, a form of browsing integral to our OPAC systems. Other means of browsing such as encyclopedias, bibliographies, indexes and subject thesauri are worth recommending as well. Our classification systems, Dewey Decimal Classification (DDC) and Library of Congress Classification, also support physical browsing. I mention this because my experience is that the OPAC computer terminal alone is not always the best approach to meeting user expectations, and librarians should encourage a broader search strategy.
A public-service librarian like me will work with patrons to help them understand how to leverage these tools for a given investigation. It’s my sense that people are actually getting more skilled and self-sufficient at these sorts of tasks due to the ubiquitous nature of the web in our lives. However, library systems are unique, and information literacy is something that should be built into the process of visiting a library as well as into the catalog interface as much as possible. Indeed, librarianship at its core is an active process of bringing users closer to their information need, and the query is fundamental to this process. We should gently guide users in this arena to elevate their search strategy within our systems and to help improve their overall experience in the library setting.
Evidence #1 — Design and Evaluation
One of my group assignments for Information Retrieval (LIBR 202) was to design a database schema and execute the construction of the database from this schema. This was a multi-step process involving many hours of work as individuals and in our student groups. It built upon earlier individual assignments in which we had learned how to use classification methods to describe the contents of a typical refrigerator. As a group, we chose to continue with the food theme and develop a database schema that would describe foods in a way that could benefit a diabetic person. We therefore focused on aspects such as calories per serving and other relevant data that could be retrieved from the USDA website.
With this group assignment, we learned to identify these relevant aspects, known as facets, as a means to describe our items in a way that was searchable within a database structure. We also aimed for consistency in our descriptions by developing and utilizing rules that we had crafted for our schema. The product of our efforts was a working database that is searchable and was evaluated for accuracy in response to queries. One specific piece of evidence that I am providing for Comp E is the “rules” document used for this assignment, which demonstrates this process of design and evaluation in action. It shows how each facet had to be carefully considered: what the data options to fill the fields would be, how these were to be encoded, and from what source we would draw this data.
As a group, we began to see how evaluation of such a system relates back to the rules that are written as part of the database’s schema. Poorly constructed or misapplied rules meant lower accuracy. Our evaluations involved the use of single word queries, which gave us the chance to test our encoding — which in turn reflected the application of our rules. While the database we created was quite rudimentary, this process of design, execution and evaluation helped me develop a mental framework that has proven to be useful down the line in dealing with more complex systems such as AACR2 and RDA.
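The link between rules, encoding and retrieval accuracy can be sketched in a few lines of Python. This is an illustration only, not a reconstruction of our group's database: the facet names, the allowed values, the calorie figures and the functions `validate` and `search` are all hypothetical.

```python
# Hypothetical schema rules: each facet lists the only values an encoder may use.
RULES = {
    "food_group": {"vegetable", "fruit", "dairy", "grain", "protein"},
    "storage": {"refrigerated", "frozen", "pantry"},
}


def validate(record):
    """Return a list of rule violations for one record (empty if it conforms)."""
    problems = []
    for facet, allowed in RULES.items():
        value = record.get(facet)
        if value not in allowed:
            problems.append(f"{facet}: {value!r} is not an allowed value")
    calories = record.get("calories_per_serving")
    if not isinstance(calories, int) or calories < 0:
        problems.append("calories_per_serving must be a non-negative integer")
    return problems


def search(records, word):
    """Single-word query: return names of records with a field equal to the word."""
    word = word.lower()
    return [r["name"] for r in records
            if any(str(v).lower() == word for v in r.values())]


# Made-up records; the calorie figures are placeholders, not USDA data.
records = [
    {"name": "skim milk", "food_group": "dairy",
     "storage": "refrigerated", "calories_per_serving": 90},
    {"name": "spinach", "food_group": "vegetable",
     "storage": "refrigerated", "calories_per_serving": 7},
    # Misapplied rule: "Dairy products" is not one of the controlled values.
    {"name": "yogurt", "food_group": "Dairy products",
     "storage": "refrigerated", "calories_per_serving": 150},
]

print(search(records, "dairy"))  # prints: ['skim milk']
print(validate(records[2]))      # reports the food_group violation for yogurt
```

The single-word query for "dairy" misses the yogurt record because its encoding departed from the rules, so recall drops even though the item is in the database.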
Today, I engage with MARC records that use complex bibliographic description, but the same principles apply: rules are needed, they contribute to accuracy, and ultimately this determines the quality of the database itself. The ability to create and follow rules in the process of descriptive cataloging is key to effective information retrieval from that system.
Evidence #2 — Query
One of the most challenging assignments in my graduate program was the literature review I performed for my Research Methods (LIBR 285) course, which was themed on evaluating programs and services. The literature review was a required element of this class, and I offer it as evidence of my ability with the query element of Comp E.
A literature review teaches us to apply our knowledge about information gathering in an academic setting utilizing online databases, library catalogs (OPACs) and web-based research. As researchers, we must apply standards of information literacy as well as a solid query strategy. This involves making certain our search is comprehensive within the area of study and that the articles are peer-reviewed, for instance. This process manifests in hundreds of individual queries and millions of results. Thus designing an effective query with a given system is a fundamental skill for all information retrieval students as well as librarians.
I offer up this particular literature review, which I completed as part of my Research Methods in Information Science (LIBR 285) course, because it represents the highest level of academic rigor required in this SJSU SLIS program. My paper specifically examined the topic of youth reading habits and was titled “Influencing Reading Habits among Youth in an Era of Declining Readership.” It offered me a number of critical lessons in the process that is information retrieval in action. The first task was breaking the idea down into parts, which were reading habits and declining readership. My studies had helped me to understand that facet analysis was critical to a good search strategy: I would need to look at both areas individually and then also identify where they overlapped significantly. This directed my search strategy and helped me customize my queries to pull very specifically the articles that spoke to my focus.
Some background reading, based on the National Endowment for the Arts reports and other sources on the reading habits of Americans, helped provide a broader view of the topic. However, the use of the King Library database access was essential to finding peer-reviewed materials to deepen my understanding of youth reading habits. EBSCO is one of the interfaces that I frequently leveraged in this process, and I continue to coach students in the use of EBSCO’s Academic Search Premier. Additionally, I utilized JSTOR, a database service that I had been tasked to analyze in another course and found particularly user-friendly, as well as Google Scholar. Both are tools that I direct students to as a librarian as appropriate.
Bell, S. (2012). Usability and user experience: There is a difference [blog post]. Retrieved from http://dbl.lishost.org/blog/2012/05/29/usability-and-user-experience-there-is-a-difference/#.V_ObpJMrJp9
Klein, L. (2016). What is UX design? [blog post]. Retrieved from https://www.usertesting.com/blog/2015/09/16/what-is-ux-design-15-user-experience-experts-weigh-in/
Manning, C. D., Raghavan, P., & Schütze, H. (2009). Introduction to information retrieval. Cambridge, England: Cambridge University Press.
Meadow, C. T., Boyce, B. R., Kraft, D. H., & Barry, C. (2008). Text information retrieval systems (3rd ed.). United Kingdom: Emerald Group Publishing Limited.
Oracle. (2006). Oracle RCUI Guidelines.
Rubin, R.E. (2004). Foundations of library and information science (2nd ed.). New York, NY: Neal-Schuman Publishers.