PORTFOLIO OF SAMPLES AND DELIVERABLES


Sample Final Report

[The following are selected sections from a report prepared for a corporate portal client.]

TYPES OF CLASSIFICATION SCHEMES

BACKGROUND

In the Middle Ages, the titles of items in relatively small liturgical collections could be scanned individually. As library collections grew, however, they eventually required some sort of classification scheme, that is, a way of organizing groups of texts (usually by subject) so that users could more easily find what they needed. [Client]'s hierarchy is an example of such a scheme.

According to Dr. Bella Haas Weinberg of St. John's University, numerous systems for the structured representation of knowledge and information have been developed over the past hundred years. These include both hierarchical classification systems with notation and alphabetical indexing systems with sophisticated features for representing term relationships.

ENUMERATIVE AND FACETED CLASSIFICATION SCHEMES

Within the field of library science, there are two major types of classification schemes: enumerative and faceted. The following section provides a brief introduction to both.

Enumerative Classification Schemes

Enumerative classifications aim to list all the subjects present in the literature that the scheme is intended to classify. Two major enumerative classification schemes currently in use are the Dewey Decimal Classification (DDC) and the Library of Congress Classification (LCC). The DDC, first published in 1876, was originally designed for arranging books on library shelves. The LCC was developed to encompass the wide range of materials held by the Library of Congress.

Enumeration is normally achieved by first identifying the main disciplines the scheme will cover, on either a philosophical or a pragmatic basis, and assigning each discipline the status of a main class. Each discipline is then divided into subclasses, and this subdivision continues until an appropriate level of specificity is reached. The object is to provide one place, and one place only, for each subject.

For example, the DDC classifies "Philosophy and Psychology" as class 100, which is further broken down into 100: Philosophy, 110: Metaphysics, 120: Epistemology, etc. Many enumerative schemes are thus also hierarchical.
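A minimal Python sketch of an enumerative scheme as a tree of notation codes follows. The captions come from the DDC example above; the `ClassNode` structure and `path_to` helper are illustrative assumptions, not part of the DDC itself. Finding a subject means walking the single path from the top class down to it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ClassNode:
    """One class in an enumerative scheme: notation, caption, and subclasses."""
    notation: str
    caption: str
    children: list[ClassNode] = field(default_factory=list)


def path_to(node: ClassNode, notation: str) -> list[str] | None:
    """Return the captions from `node` down to the class with `notation`.

    Because an enumerative scheme gives each subject exactly one place,
    there is at most one such path; None means the subject is not listed.
    """
    if node.notation == notation:
        return [node.caption]
    for child in node.children:
        sub_path = path_to(child, notation)
        if sub_path is not None:
            return [node.caption] + sub_path
    return None


# Hypothetical fragment built from the DDC example above.
root = ClassNode("", "All knowledge", [
    ClassNode("100", "Philosophy and Psychology", [
        ClassNode("110", "Metaphysics"),
        ClassNode("120", "Epistemology"),
    ]),
])

print(path_to(root, "120"))
# → ['All knowledge', 'Philosophy and Psychology', 'Epistemology']
```

A subject the scheme never enumerated has no node to land on until the scheme itself is revised, which is the rigidity discussed later in this report.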

[Client]'s current classification scheme similarly divides its universe of knowledge into the broad categories of "Entertainment," "Shopping," "Connecting," "Lifestyle," "Library," "Work & Money," "Computing," "Lookup," "Travel," and "Personal." Each of those categories is then subdivided iteratively until the appropriate level of granularity is reached.

Faceted Classification Schemes
Faceted classification, originated by the noted librarian S. R. Ranganathan in the 1930s, arose from the need to accommodate complex or multi-concept subjects. As Jennifer E. Rowley notes in her book Organizing Knowledge, Ranganathan recognized that the world of knowledge was growing quickly, as new areas of knowledge were discovered and new ways of combining existing subjects emerged. He understood that any classification that attempted to enumerate a finite number of subjects, without full capacity for expansion to accommodate new areas of knowledge, could never meet the needs of the future.

Ranganathan's approach was to divide knowledge into broad classes and then break those classes down into basic concepts, or elements, according to certain characteristics called facets. He proposed five fundamental types of facets that may occur in many subject fields: personality, matter, energy, space, and time (often abbreviated PMEST).
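To illustrate, a compound subject can be recorded as one value per fundamental category instead of as a single pre-listed class. In this sketch the subject, its facet assignments, and the one-line glosses of each category are simplified and hypothetical; they are not drawn from Ranganathan's Colon Classification or its notation.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PMEST:
    """A subject described by Ranganathan's five fundamental categories.

    Categories that do not apply to a subject are left as None.
    """
    personality: str | None = None  # rough gloss: the core entity the subject is about
    matter: str | None = None       # rough gloss: material or property
    energy: str | None = None       # rough gloss: action, process, or problem
    space: str | None = None        # place
    time: str | None = None         # period

    def describe(self) -> str:
        """List the facets that are present, in P-M-E-S-T order."""
        return "; ".join(
            f"{f.name}={getattr(self, f.name)}"
            for f in fields(self)
            if getattr(self, f.name) is not None
        )


# Hypothetical compound subject: "cotton textile exports from India in the 1990s".
subject = PMEST(personality="textiles", matter="cotton", energy="export",
                space="India", time="1990s")
print(subject.describe())
# → personality=textiles; matter=cotton; energy=export; space=India; time=1990s
```

A new combination (the same product and place in a different decade, say) needs no new class, only a different value in one slot.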

Facets of any specific collection, however, are determined by an evaluation of the nature of the items in that collection. Other examples of facets include:

· Subjects: general or specific
· Language: multilingual or individual language
· Geography: global or national
· Creating/supporting body
· User environment
· Structure
· Methodology

APPLICATION OF CLASSIFICATION TO THE WEB

When applied to the rapidly growing number of documents on the Internet, enumerative (hierarchical) schemes suffer from a number of disadvantages.

  • They require a substantial training period as well as time-consuming human analysis.
  • Because of their dependence on humans, they are expensive to implement.
  • Rigid hierarchical classification schemes cannot keep up with rapidly growing bodies of knowledge; they are usually updated through slow, formal processes by organized (often international) bodies.
  • They often split up collections of related material, thus necessitating good cross-referencing.
  • Their primary organization is by general discipline, not concrete topic, thus requiring users to dig through a long hierarchical structure to find specific information.
  • Existing enumerative schemes are hard to apply to the web because of the nature of both the users (general consumers as opposed to academic researchers) and the collection (popular culture as opposed to academic research).

Faceted classification, on the other hand, permits far more specific classification than most enumerative schemes do. In addition, by rotating or permuting the facet values that describe a specific object, a faceted classification provides access to a single resource in a number of different ways.
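The rotation idea can be sketched as generating one access string per rotation of an object's facet values, so that each value leads the string exactly once. This is a minimal illustration under the assumption that an object's description is a simple ordered list of facet values; the function name and sample values are hypothetical.

```python
from __future__ import annotations


def rotated_access_points(facet_values: list[str]) -> list[str]:
    """Return one access string per rotation of the facet values.

    Each value leads exactly one string, so a user who starts from any
    facet (topic, place, format, ...) still reaches the same object.
    """
    return [
        " > ".join(facet_values[i:] + facet_values[:i])
        for i in range(len(facet_values))
    ]


# Hypothetical object described by three facet values.
for entry in rotated_access_points(["Jazz", "Chicago", "Concert listings"]):
    print(entry)
# Jazz > Chicago > Concert listings
# Chicago > Concert listings > Jazz
# Concert listings > Jazz > Chicago
```

Replacing the rotation with `itertools.permutations(facet_values)` would generate every ordering (n! strings for n values) rather than n rotations; rotation keeps the number of entries small while still giving every facet a turn at the front.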

We propose that [client] gain the benefits of both enumerative and faceted classification by using the latter (with the appropriate technologies to implement it) in addition to its current hierarchy. See the following section for specific recommendations concerning faceted classification.

DEVELOPING A CLASSIFICATION SYSTEM FOR [CLIENT]
Recommendation
Use a faceted classification scheme to analyze and describe content objects in the [client] directory.

EVALUATION OF CLASSIFICATION SCHEMES AGAINST [CLIENT]'S NEEDS
To complete our analysis of classification schemes, we identified the functions [client] needs from a classification system and then evaluated the enumerative and faceted approaches against each of them. The results are displayed in the matrix below [not included here].

FACETED CLASSIFICATION AS THE BETTER APPROACH FOR [CLIENT]
Based on our analysis of the functionality matrix, we recommend that [client] begin developing a faceted classification approach to analyze content objects and use that analysis to build portions of its hierarchy from the bottom up. This will also help [client] to better leverage the information already in the [client] directory, facilitate ad targeting, and provide mapping tools to its partners.
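One way to picture building "portions of its hierarchy from the bottom up" is to derive category nodes from the facet values already assigned to objects, nesting them in a chosen facet order. The sketch below assumes each object is a dictionary of single facet values; the object titles, facet names, and `build_hierarchy` helper are hypothetical, not a description of [client]'s system.

```python
from __future__ import annotations


def build_hierarchy(objects: list[dict[str, str]], facet_order: list[str]) -> dict:
    """Nest object titles under their facet values, one level per facet.

    A category node exists only where at least one object carries that
    combination of values, so the tree grows up from the content rather
    than down from a predetermined outline. Titles are stored under "_items".
    """
    tree: dict = {}
    for obj in objects:
        node = tree
        for facet in facet_order:
            node = node.setdefault(obj.get(facet, "(unassigned)"), {})
        node.setdefault("_items", []).append(obj["title"])
    return tree


# Hypothetical directory objects, already described by facets.
objects = [
    {"title": "Site A", "topic": "Jazz", "city": "Chicago"},
    {"title": "Site B", "topic": "Jazz", "city": "Chicago"},
    {"title": "Site C", "topic": "Classical", "city": "Chicago"},
]

by_city = build_hierarchy(objects, ["city", "topic"])
by_topic = build_hierarchy(objects, ["topic", "city"])
```

Because the same facet data can be nested in different orders, the two calls produce a city-first and a topic-first branch from one set of descriptions, without reclassifying any object.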

Specifically, a faceted classification approach will:

  • Allow [client] to "slice and dice" the information in its database, that is, to provide the most relevant content to its partners, each of whom has different content needs
  • Enable the use of controlled vocabularies to increase inter-editor consistency
  • Provide a stable vocabulary for search queries
  • Make the classification scheme easier to update by allowing editorial staff to add or delete sections as priorities and other factors demand
  • Improve usability by increasing access points to information
  • Enhance the [client].com search function by allowing for filtering on a variety of facets
  • Increase monetization by offering sponsorships at the finest level of content granularity (e.g., facet values) rather than by subsections of the hierarchy
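Filtering on facets, which underlies both the "slice and dice" and the search bullets above, amounts to keeping the objects whose facet values satisfy every selected constraint; counting the values that remain tells the user which further filters are worth offering. A minimal sketch, assuming each object carries a set of values per facet; the class, functions, and catalog data are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class ContentObject:
    title: str
    facets: dict[str, set[str]]  # facet name -> one or more values


def filter_objects(objects: list[ContentObject],
                   selected: dict[str, str]) -> list[ContentObject]:
    """Keep objects that carry every selected value (AND across facets)."""
    return [
        obj for obj in objects
        if all(value in obj.facets.get(facet, set())
               for facet, value in selected.items())
    ]


def facet_counts(objects: list[ContentObject], facet: str) -> Counter:
    """Count how many objects carry each value of `facet`, for display next to filters."""
    counts: Counter = Counter()
    for obj in objects:
        counts.update(obj.facets.get(facet, set()))
    return counts


# Hypothetical catalog.
catalog = [
    ContentObject("Site A", {"topic": {"Jazz"}, "city": {"Chicago"}}),
    ContentObject("Site B", {"topic": {"Jazz", "Blues"}, "city": {"New York"}}),
    ContentObject("Site C", {"topic": {"Classical"}, "city": {"Chicago"}}),
]

jazz = filter_objects(catalog, {"topic": "Jazz"})
print([obj.title for obj in jazz])  # → ['Site A', 'Site B']
print(facet_counts(jazz, "city"))
```

A partner's content needs could be stored as its own `selected` dictionary, so each partner receives a different slice of the same underlying data.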

RECOMMENDED FACETS
We have developed the following content classification facets based on our analyses of [client] content, discussions with opinion leaders and partners, and discussions with the [client] project team. We recommend that [client] consider these facets as part of a new bottom-up information architecture strategy for classification.

In all cases, it will be necessary for [client] to define rules for using these facets to ensure consistency in their application by editors. This will help guarantee that users will obtain relevant and precise search results, that partners will be able to map directory data accurately, and that advertising is targeted appropriately.
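Editorial rules of this kind can be partly enforced in software. The sketch below assumes a hypothetical rule format in which each facet has a fixed list of allowed values, may be required or optional, and limits how many values one object may carry; it reports every violation in an editor's assignment rather than stopping at the first.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FacetRule:
    allowed: frozenset[str]  # controlled list of values for this facet
    required: bool = False   # must every object carry this facet?
    max_values: int = 1      # how many values one object may carry


def validate(assignment: dict[str, list[str]],
             rules: dict[str, FacetRule]) -> list[str]:
    """Return every rule violation in one editor's facet assignment (empty list = valid)."""
    problems = [f"unknown facet: {facet}" for facet in assignment if facet not in rules]
    for facet, rule in rules.items():
        values = assignment.get(facet, [])
        if rule.required and not values:
            problems.append(f"missing required facet: {facet}")
        if len(values) > rule.max_values:
            problems.append(f"too many values for {facet}: {len(values)}")
        problems.extend(
            f"value not in vocabulary for {facet}: {value}"
            for value in values if value not in rule.allowed
        )
    return problems


# Hypothetical rule set.
rules = {
    "topic": FacetRule(frozenset({"Jazz", "Blues", "Classical"}), required=True, max_values=2),
    "city": FacetRule(frozenset({"Chicago", "New York"})),
}

print(validate({"topic": ["Jazz"], "city": ["Chicago"]}, rules))  # → []
print(validate({"city": ["Chicgo"], "mood": ["Happy"]}, rules))
# → ['unknown facet: mood', 'missing required facet: topic', 'value not in vocabulary for city: Chicgo']
```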

The facets below [not described here] apply only to objects within the directory. Refer to the "Making It Happen" section of this report for details on the descriptive information to be assigned to categories.

Object Facets
Object Management Facets

THESAURI AND AUTHORITY FILES
Recommendation
Develop thesauri and authority files to enable automatic hierarchy generation by means of category rules and a category thesaurus.

CONTROLLED VOCABULARIES VS. FREE TEXT
Free text is vocabulary taken directly from the source material, extracted either by an indexer or automatically by software. One of the principal problems with using free text to analyze documents (or, in this case, to describe objects) is inconsistency, both between the vocabularies of different objects and between editors.

For example, one content object might use the term "automobile" and another the term "car." Both discuss the same subject and should be retrieved in a search for either term. Editors might also use inconsistent terminology to describe similar content; indeed, the same editor might use different terms for the same content at different times.

To minimize inconsistent terminology and the resulting scattering of information, librarians have long used controlled vocabularies: sets of predetermined terms that are chosen (by an editor or by software) to describe an object. Controlled vocabularies are often organized by major subject area. The National Library of Medicine, for example, developed MeSH (Medical Subject Headings) for cataloging material in its collections.
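At its simplest, applying a controlled vocabulary means mapping every entry term an editor or user might choose to a single preferred term, both when indexing and when searching. A minimal sketch using the "automobile"/"car" example above; the mapping table, the choice of "Automobiles" as the preferred term, and the object IDs are assumptions for illustration.

```python
from __future__ import annotations

# Hypothetical controlled vocabulary: entry term (lowercase) -> preferred term.
PREFERRED = {
    "automobile": "Automobiles",
    "automobiles": "Automobiles",
    "car": "Automobiles",
    "cars": "Automobiles",
}


def normalize(term: str) -> str:
    """Map a term to its preferred form; terms not in the vocabulary are just lowercased."""
    key = term.strip().lower()
    return PREFERRED.get(key, key)


def build_index(objects: dict[str, list[str]]) -> dict[str, set[str]]:
    """Index object IDs under preferred terms instead of the words editors happened to use."""
    index: dict[str, set[str]] = {}
    for object_id, terms in objects.items():
        for term in terms:
            index.setdefault(normalize(term), set()).add(object_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Normalize the query the same way, so any entry term finds the same objects."""
    return index.get(normalize(query), set())


index = build_index({"obj-1": ["automobile"], "obj-2": ["Car", "Insurance"]})
print(sorted(search(index, "cars")))        # → ['obj-1', 'obj-2']
print(sorted(search(index, "Automobile")))  # → ['obj-1', 'obj-2']
```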

Controlled vocabularies are usually either thesauri or authority files, which are described more fully in the following two sections.

THESAURI
A thesaurus is a controlled vocabulary consisting of a structured list of terms. The structure is provided by three kinds of relationships between terms: broader term (BT), narrower term (NT), and related term (RT). A thesaurus may also include nonpreferred terms, which are not used for indexing but point to the preferred terms to be used in their place.
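A thesaurus can be held as a set of term records in which every broader-term link is mirrored by a narrower-term link and every related-term link runs both ways. The sketch below enforces that reciprocity when relationships are added and resolves nonpreferred terms through USE references; the class, its method names, and the sample music terms are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


class Thesaurus:
    def __init__(self) -> None:
        self.bt: dict[str, set[str]] = defaultdict(set)  # term -> broader terms
        self.nt: dict[str, set[str]] = defaultdict(set)  # term -> narrower terms
        self.rt: dict[str, set[str]] = defaultdict(set)  # term -> related terms
        self.use: dict[str, str] = {}                    # nonpreferred -> preferred

    def add_broader(self, narrower: str, broader: str) -> None:
        """Record BT and its reciprocal NT in one step."""
        self.bt[narrower].add(broader)
        self.nt[broader].add(narrower)

    def add_related(self, a: str, b: str) -> None:
        """RT is symmetric, so record it in both directions."""
        self.rt[a].add(b)
        self.rt[b].add(a)

    def add_use(self, nonpreferred: str, preferred: str) -> None:
        self.use[nonpreferred] = preferred

    def preferred(self, term: str) -> str:
        return self.use.get(term, term)

    def expand(self, term: str) -> set[str]:
        """Preferred form of `term` plus every term below it (NT, recursively)."""
        start = self.preferred(term)
        seen, stack = {start}, [start]
        while stack:
            for child in self.nt.get(stack.pop(), set()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen


t = Thesaurus()
t.add_broader("Jazz", "Music")
t.add_broader("Bebop", "Jazz")
t.add_related("Jazz", "Blues")
t.add_use("Bop", "Bebop")
print(sorted(t.expand("Music")))  # → ['Bebop', 'Jazz', 'Music']
```

Writing each relationship through a single method that records both directions is one simple way to keep BT/NT pairs and RT links from drifting out of sync as editors add terms.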

AUTHORITY FILES
Authority files are enumerative lists of terms (such as personal names), without the BT, NT, and RT relationships of thesauri. They may, however, direct editors from nonpreferred terms to preferred ones. For example, an authority file of classical violinists might contain the following entries:
     Heifetz, Jascha
     Kennedy
     Kennedy, Nigel. Use Kennedy
     Manze, Andrew
     Salerno-Sonnenberg, Nadja
     Stern, Isaac
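Because an authority file is a flat list, resolving an editor's entry is a single lookup: a heading is either authorized or carries a "Use" reference to the authorized form. The sketch below parses entries written in the format of the example above (the ". Use " separator is taken from that example); the parser and function names are hypothetical.

```python
from __future__ import annotations

USE_MARKER = ". Use "  # separator as written in the example entries


def parse_authority_file(lines: list[str]) -> tuple[set[str], dict[str, str]]:
    """Split entries into authorized headings and nonpreferred -> preferred references."""
    authorized: set[str] = set()
    references: dict[str, str] = {}
    for raw in lines:
        entry = raw.strip()
        if not entry:
            continue
        if USE_MARKER in entry:
            nonpreferred, preferred = entry.split(USE_MARKER, 1)
            references[nonpreferred] = preferred
        else:
            authorized.add(entry)
    return authorized, references


def resolve(heading: str, authorized: set[str], references: dict[str, str]) -> str | None:
    """Return the authorized form of `heading`, or None if the file does not know it."""
    heading = references.get(heading, heading)
    return heading if heading in authorized else None


authorized, references = parse_authority_file([
    "Kennedy",
    "Kennedy, Nigel. Use Kennedy",
    "Manze, Andrew",
    "Stern, Isaac",
])
print(resolve("Kennedy, Nigel", authorized, references))  # → Kennedy
```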

NEED FOR THESAURI AND AUTHORITY FILES
As described in the "Making It Happen" section of this report, facet thesauri, category thesauri, and authority files are needed to generate the dynamic portions of the [client] directory automatically. The thesauri can also be leveraged by Oingo's lexicon and semantic mapping process to improve users' search results, by including related concepts and by clarifying ambiguous ones. In addition, thesauri can be used to build rules for mapping content to [client] partners. Thesauri and authority files will also make editors more consistent when creating data records.
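A category or partner-mapping rule can be expressed as a set of facet conditions; consulting the thesaurus's narrower terms lets a rule written against a broad term also capture objects indexed with more specific ones. This is a hypothetical sketch: the rule format, the `NARROWER` table, and the sample partner category are assumptions, not a description of [client]'s or Oingo's actual mechanism.

```python
from __future__ import annotations

# Hypothetical fragment of a facet thesaurus: term -> narrower terms.
NARROWER = {
    "Music": {"Jazz", "Classical"},
    "Jazz": {"Bebop"},
}


def with_narrower(term: str) -> set[str]:
    """Return `term` and every term beneath it in NARROWER."""
    result, stack = {term}, [term]
    while stack:
        for child in NARROWER.get(stack.pop(), set()):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


def matches(rule: dict[str, str], facets: dict[str, set[str]]) -> bool:
    """True if, for every facet in the rule, the object has the rule's term or a narrower one."""
    return all(
        facets.get(facet, set()) & with_narrower(term)
        for facet, term in rule.items()
    )


# Hypothetical rule for a partner category such as "Chicago music".
chicago_music = {"topic": "Music", "city": "Chicago"}
print(matches(chicago_music, {"topic": {"Bebop"}, "city": {"Chicago"}}))  # → True
print(matches(chicago_music, {"topic": {"Bebop"}, "city": {"Boston"}}))   # → False
```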

RECOMMENDATIONS FOR [CLIENT]
[not included here]

Contextual Analysis, LLC · Chicago, IL · 773-561-1993 · info@contextualanalysis.com · © Copyright 2003 Contextual Analysis, LLC.