[The following
represents selected sections from a report prepared for a corporate
portal client.]
TYPES OF CLASSIFICATION SCHEMES
BACKGROUND
While titles of items in relatively small liturgical collections
in the Middle Ages could be scanned individually, the growth of
such library collections eventually resulted in the need for some
sort of classification scheme, that is, a way of organizing groups
of texts (usually by subject) so that users could more easily find
what they needed. [Client]'s hierarchy is an example of such a classification
scheme.
According to Dr. Bella Haas Weinberg of St. John's University, numerous
systems for the structured representation of knowledge and information
have been developed over the past hundred years. These systems
include both hierarchical classification systems with notation
and alphabetical indexing systems with sophisticated features
for the representation of term relationships.
ENUMERATIVE AND FACETED CLASSIFICATION SCHEMES
Within the field of library science, there are two major types
of classification schemes: enumerative and faceted. The following
section provides a brief introduction to both.
Enumerative Classification Schemes
Enumerative
classifications aim to enumerate or list all subjects present
in the literature that the scheme is intended to classify. Two
major enumerative classification schemes currently in use are
the Dewey Decimal Classification (DDC) and the Library of Congress
Classification (LCC). The DDC, first published in 1876, was originally
designed for the arrangement of books on library shelves. The
LCC was developed to encompass the wide range of materials in
the Library of Congress.
The enumeration
is normally achieved by first identifying the main disciplines
to be covered by the scheme, either on a philosophical or pragmatic
basis, and allocating each a main class status. Then each discipline
is divided into subclasses. This process of subdivision is continued
until an appropriate level of specificity has been achieved.
The object is to provide one place, and one place only, for each
subject.
For example,
the DDC classifies "Philosophy and Psychology" as class
100, which is further broken down into 100: Philosophy, 110:
Metaphysics, 120: Epistemology, etc. Many enumerative schemes
are thus also hierarchical.
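The "one place only" principle can be sketched in a few lines of Python. This is only an illustration: the three DDC entries are the ones quoted above, while the dictionary layout and the function name path_from_top are assumptions made for the example, not part of any published scheme.

```python
# Illustrative only: each notation appears once and has a single parent,
# so every subject has exactly one place in the hierarchy.
SCHEME = {
    "100": ("Philosophy and Psychology", None),  # main class
    "110": ("Metaphysics", "100"),
    "120": ("Epistemology", "100"),
}


def path_from_top(notation):
    """Return the labels from the main class down to the given notation."""
    path = []
    while notation is not None:
        label, parent = SCHEME[notation]
        path.append(label)
        notation = parent
    return list(reversed(path))


print(path_from_top("120"))  # ['Philosophy and Psychology', 'Epistemology']
```

Because each entry records a single parent, an item filed under 120 can be reached by only one route through the hierarchy.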
[Client]'s current classification scheme similarly divides its
universe of knowledge into the broad categories of "Entertainment," "Shopping," "Connecting," "Lifestyle," "Library," "Work & Money," "Computing," "Lookup," "Travel," and "Personal." Each
of those categories is then subdivided iteratively until the appropriate
level of granularity is reached.
Faceted Classification Schemes
Originated by the noted librarian S. R. Ranganathan in the 1930s,
faceted classification arose from the need to accommodate complex
or multi-concept subjects. Jennifer E. Rowley, in her book Organizing
Knowledge, notes that Ranganathan recognized that the world of
knowledge was growing quickly, with new areas of knowledge being
discovered and existing subjects being combined in new ways. He
understood that any classification that attempted to enumerate
a finite number of subjects without full capabilities for expansion
to allow for new areas of knowledge could never meet the needs
of the future.
Ranganathan wanted to classify knowledge into broad classes that
were then broken down into basic concepts or elements according
to certain characteristics, called facets. He proposed five basic
types of facets that may occur in many subject fields: personality,
matter, energy, space and time.
Facets of any specific collection, however, are determined by an
evaluation of the nature of the items in that collection. Other
examples of facets include:
· Subjects: general or specific
· Language: multilingual or individual language
· Geography: global or national
· Creating/supporting body
· User environment
· Structure
· Methodology
APPLICATION OF CLASSIFICATION TO THE WEB
When applied to the rapidly growing number of documents on the
Internet, enumerative (hierarchical) schemes suffer from a number
of disadvantages.
- They require
a substantial training period as well as time-consuming human
analysis.
- Because
of their dependence on humans, they are expensive to implement.
- Rigid hierarchical
classification schemes cannot keep up with rapidly growing
bodies of knowledge; they are usually updated through slow,
formal processes by organized (often international) bodies.
- They often
split up collections of related material, thus necessitating
good cross-referencing.
- Their primary
organization is by general discipline, not concrete topic,
thus requiring users to dig through a long hierarchical structure
to find specific information.
- It is hard
to apply existing enumerative schemes to the web, due to the
nature of the users (general consumers as opposed to academic
researchers) and of the collection (popular culture as opposed
to academic research).
On the other
hand, faceted classification permits far more specific classification
than do most enumerative schemes. In addition, by rotating or
permuting the facet values that describe a specific object,
a faceted classification provides access to a single resource
in a number of different ways.
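As a rough sketch of how facet values give several routes to the same resource, the Python below indexes a handful of made-up content objects by (facet, value) pairs and then retrieves them by any combination of facets. The object ids, facet values, and function names are all hypothetical. Only the facet names (subject, language, geography) come from the list of example facets earlier in this report.

```python
from collections import defaultdict

# Hypothetical content objects, each described by one value per facet.
OBJECTS = {
    "obj-1": {"subject": "violin music", "language": "English", "geography": "global"},
    "obj-2": {"subject": "violin music", "language": "French", "geography": "national"},
    "obj-3": {"subject": "car repair", "language": "English", "geography": "national"},
}


def build_index(objects):
    """Map every (facet, value) pair to the set of object ids carrying it."""
    index = defaultdict(set)
    for obj_id, facets in objects.items():
        for facet, value in facets.items():
            index[(facet, value)].add(obj_id)
    return index


def find(index, **criteria):
    """Return the sorted ids of objects matching every facet=value pair given."""
    matches = [index.get((facet, value), set()) for facet, value in criteria.items()]
    return sorted(set.intersection(*matches)) if matches else []


INDEX = build_index(OBJECTS)
print(find(INDEX, subject="violin music"))                    # ['obj-1', 'obj-2']
print(find(INDEX, language="English", geography="national"))  # ['obj-3']
```

The same object can be reached through its subject, its language, or its geography, which is the multiple-access property described above. An enumerative scheme would instead file it in one place.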
We propose
that [client] gain the benefits of both enumerative and faceted
classification by using the latter (with the appropriate technologies
to implement it) in addition to its current hierarchy. See the
following section for specific recommendations concerning faceted
classification.
DEVELOPING A CLASSIFICATION SYSTEM FOR [CLIENT]
Recommendation
Use a faceted classification scheme to analyze and describe content
objects in the [client] directory.
EVALUATION OF CLASSIFICATION SCHEMES AGAINST [CLIENT]'S NEEDS
We completed our analysis of classification schemes by identifying
the functionalities [client] needs from a classification system.
We then compared the enumerative and faceted classification approaches
against each of these functionalities. The results
are displayed in the matrix below [not included here].
FACETED CLASSIFICATION AS THE BETTER APPROACH FOR [CLIENT]
Based on our analysis of the functionality matrix, we recommend
that [client] begin developing a faceted classification approach
to analyze content objects and use that analysis to build portions
of its hierarchy from the bottom up. This will also help [client]
to better leverage the information already in the [client] directory,
facilitate ad targeting, and provide mapping tools to its partners.
Specifically,
a faceted classification approach will:
- Allow [client]
to "slice and dice" the information in its database,
that is, to provide the most relevant content to its partners,
each of whom has different content needs
- Enable
the use of controlled vocabularies to increase inter-editor
consistency
- Provide
a stable vocabulary for search queries
- Increase
ease of updating the classification scheme by allowing editorial
staff to add or delete sections as priorities and other factors
demand
- Improve
usability by increasing access points to information
- Enhance
the [client].com search function by allowing for filtering
on a variety of facets
- Increase
monetization opportunities by offering sponsorship opportunities
at the finest level of content granularity (e.g. facet values)
rather than by subsections of the hierarchy
RECOMMENDED FACETS
We have developed the following content classification facets based
on our analyses of [client] content, discussions with opinion leaders
and partners, and discussions with the [client] project team. We
recommend that [client] consider these facets as part of a new
bottom-up information architecture strategy for classification.
In all cases,
it will be necessary for [client] to define rules for using these
facets to ensure consistency in their application by editors.
This will help guarantee that users will obtain relevant and
precise search results, that partners will be able to map directory
data accurately, and that advertising is targeted appropriately.
The facets
below [not described here] apply only to objects within the directory.
Refer to the "Making It Happen" section of this report
for details on the descriptive information to be assigned to
categories.
Object Facets
Object Management Facets
THESAURI AND AUTHORITY FILES
Recommendation
Develop thesauri and authority files to enable automatic hierarchy
generation by means of category rules and a category thesaurus.
CONTROLLED VOCABULARIES VS. FREE TEXT
Free text is vocabulary taken directly from source material. It
is extracted either by an indexer or automatically by software.
One of the principal problems with the use of free text to analyze
documents, or in this case, to describe objects, is the lack of
consistency between object vocabularies or between editors.
For example,
one content object might use the term "automobile," another
might use the term "car." Both discuss the same content
material and should be retrieved in a search for either term.
Editors might also use inconsistent terminology to describe similar
content. Indeed, the same editor might use different terminology
at different times for the same content.
To reduce
as much as possible the problem of inconsistent terminology and
the resultant scattering of information, librarians have for
many years used controlled vocabularies. Controlled vocabularies
consist of predetermined terms that are chosen (by editor or
software) to describe an object. Controlled vocabularies are
often organized by major subject areas. The National Library
of Medicine, for example, has developed MeSH (Medical Subject
Headings) for use in cataloging material in its collections.
Controlled
vocabularies are usually either thesauri or authority files,
which are described more fully in the following two sections.
THESAURI
A thesaurus is a controlled vocabulary consisting of a structured
list of terms. The structure is provided by three kinds of
relationships between terms: broader terms, narrower terms,
and related terms. In addition, non-preferred terms may be
included as entries that point to the preferred terms to be used
in their place.
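A thesaurus of this kind might be represented as sketched below. The sketch reuses the "automobile"/"car" example from the previous section. Every other term, the BT/NT/RT keys (short for broader, narrower, and related term), the USE table, and the function names are assumptions made for illustration.

```python
# Hypothetical thesaurus entries. BT = broader terms, NT = narrower terms,
# RT = related terms.
THESAURUS = {
    "automobile": {"BT": ["vehicle"], "NT": ["sports car"], "RT": ["driving"]},
    "vehicle": {"BT": [], "NT": ["automobile"], "RT": []},
    "sports car": {"BT": ["automobile"], "NT": [], "RT": []},
    "driving": {"BT": [], "NT": [], "RT": ["automobile"]},
}

# Non-preferred terms pointing to the preferred term to use in their place.
USE = {"car": "automobile", "auto": "automobile"}


def preferred(term):
    """Map a non-preferred term to its preferred term; pass others through."""
    return USE.get(term, term)


def expand(term):
    """Return the preferred term followed by its narrower and related terms."""
    entry_term = preferred(term)
    entry = THESAURUS.get(entry_term, {})
    return [entry_term] + entry.get("NT", []) + entry.get("RT", [])


print(expand("car"))  # ['automobile', 'sports car', 'driving']
```

A search for either "car" or "automobile" then resolves to the same preferred term, and can optionally be widened to narrower and related terms.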
AUTHORITY FILES
Authority files are enumerative lists of terms (such as personal
names), without the broader-term (BT), narrower-term (NT), and
related-term (RT) relationships of thesauri. However, authority files
might direct editors from non-preferred terms to preferred
terms. For example, an authority file of classical violinists may
have the following entries:
Heifetz, Jascha
Kennedy
Kennedy, Nigel. Use Kennedy
Manze, Andrew
Salerno-Sonnenberg, Nadja
Stern, Isaac
Stern, Isaac
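As a minimal sketch, the entries above could be held in a flat lookup in which a "Use" reference is stored as a pointer to the preferred form. The dictionary layout and the function name resolve are assumptions for illustration only.

```python
# Flat authority list: None marks a preferred form; a string value is the
# "Use" reference pointing to the preferred form.
AUTHORITY = {
    "Heifetz, Jascha": None,
    "Kennedy": None,
    "Kennedy, Nigel": "Kennedy",
    "Manze, Andrew": None,
    "Salerno-Sonnenberg, Nadja": None,
    "Stern, Isaac": None,
}


def resolve(name):
    """Return the preferred form of a name, or raise if it is not in the file."""
    if name not in AUTHORITY:
        raise KeyError(f"{name!r} is not in the authority file")
    return AUTHORITY[name] or name


print(resolve("Kennedy, Nigel"))  # Kennedy
print(resolve("Stern, Isaac"))    # Stern, Isaac
```

Rejecting names that are not in the file is one possible design choice. It forces editors to pick an established heading instead of creating a new variant.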
NEED FOR THESAURI AND AUTHORITY FILES
As described in the "Making It Happen" section of this
report, facet thesauri, category thesauri and authority files are
needed for the automatic generation of the dynamic portions of
the [client] directory. The various thesauri also can be leveraged
by Oingo's lexicon and semantic mapping process to improve users'
search results by including related concepts and by clarifying
ambiguous concepts. In addition, thesauri can be used to build
rules for mapping content to [client] partners. Thesauri and authority
files also will increase editors' consistency in creating data
records.
RECOMMENDATIONS FOR [CLIENT]
[not included here]