information architecture | updated 12/28/2017

1. What is information retrieval?

IR occurs when a user retrieves information objects AKA content objects (documents and metadata) from an information retrieval system (e.g. libraries, archives, repositories/portals, websites, databases). IR requires IA.

1.1. IR systems

1.2. Major activities in IR

1.2.1. Cataloging & indexing

Information professionals work to ensure that IR (1) has good precision, AKA 'satisfies the requirement for general survey' or 'satisfies the collocating requirement' or attains 'representational predictability'; (2) has good recall; and (3) is possible across multiple IR systems. They do this by (1) creating search algorithms or by (2) ingesting information objects into a catalog, which includes:

                        INDEXING AGENT
TAXONOMY    Taxonomist    Users      Machine
Yes         Closed        n/a        Auto-Tagging, Auto-Classification, Auto-Categorization
No          Open          Tagging    Information Extraction
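As a rough illustration of the precision and recall goals named above, both measures can be computed from a set of retrieved objects and a set of relevant objects. This is a minimal sketch using the standard set-based definitions; the function name and document IDs are hypothetical.

```python
from __future__ import annotations


def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one query, using set-based definitions."""
    hits = retrieved & relevant  # objects that are both retrieved and relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result: 4 objects retrieved, 3 objects actually relevant.
retrieved = {"doc1", "doc2", "doc3", "doc4"}
relevant = {"doc2", "doc4", "doc5"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here half of what was retrieved is relevant (precision 0.5), and two of the three relevant objects were found (recall about 0.67).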

1.2.2. Searching, browsing, & navigating

To retrieve information, users engage with information systems by:

2. What is information architecture?

Rosenfeld, Morville, and Arango (2015) say that IA aims to facilitate both the finding and understanding of information. The "finding" dimension of IA is information retrieval, the legacy of library science; the "understanding" dimension comes via Richard Saul Wurman, who focused on making complex systems intelligible through structured presentation, visualization, etc.

2.1. Must information be organized?

Per Abrahamson and Freedman (2008), order (like all things) has both benefit and cost. Even setting feasibility aside, it's probably never the case that 100% order is optimal. At the same time, because individuals' organization strategies and preferences vary so much, shared information spaces must (?) be organized according to intelligible principles that prospective users of the space can access and learn.

2.1.1. Anatomy of mess

Per Abrahamson and Freedman (2008):

2.1.2. Bases of organization

Hedden (2016) offers a typology in which taxonomies (i.e., knowledge organization structures) have different bases/origins. KOSs may be (1) objective, with obvious empirical basis; (2) socially-negotiated, perhaps eventually becoming (3) culturally embedded; or (4) idiosyncratic:

2.2. What ways can things be organized?

Schemas and structures work together. For example, my Zotero library is a hierarchy (structure) of topical categories (schema), automatically sorted by alphabet (schema). The distinction I draw is that "schemas" are for the conceptual (abstract, disembodied) organization of information into groups, whereas "structures" describe group configuration.

2.2.1. Information schemas

Wurman (1990) claims there are only five ways to organize items. Similarly, Wyllys (2000) states that information can be organized according to different schemas that encompass Wurman's five ways (italicized):

2.2.2. Information structures

But, Wyllys adds, there are also three fundamental information structures, AKA data models: hierarchic, graph, and relational. In their review of database history, Connolly and Begg (2015) classify Wyllys' three structures as "record-based" and add "object-based" as another top-level category:

2.2.2.1. Record-based

Record-based models express structure well but cannot express constraints on the data.

2.2.2.1.1. Hierarchical

Per Zeng (n.d.),

2.2.2.1.2. Graph

AKA networks, triples, ontologies. Enables explicit modeling of different kinds of relationships (Has, IsCreatedBy, etc.) as well as more relationships (not restricted to one parent). Records are also called nodes and segments; relationships are also called edges.
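A graph of this kind can be sketched as a list of (subject, relationship, object) triples. The node names below are hypothetical; the relationship names Has and IsCreatedBy come from the paragraph above. This is only an illustration, not a full graph database.

```python
# Each triple is (subject node, edge type, object node); node names are hypothetical.
TRIPLES = [
    ("Report", "Has", "Appendix"),
    ("Report", "IsCreatedBy", "Author"),
    ("Appendix", "IsCreatedBy", "Author"),
    ("Appendix", "IsCreatedBy", "Editor"),
]


def related(node, relationship):
    """Return the set of nodes reached from `node` by edges of the given type."""
    return {o for s, r, o in TRIPLES if s == node and r == relationship}


def edge_types(node):
    """Return the kinds of relationships that start at `node`."""
    return {r for s, r, _ in TRIPLES if s == node}


print(related("Appendix", "IsCreatedBy"))
print(edge_types("Report"))
```

Because edges are typed and a node may appear in any number of triples, "Appendix" can point to two creators, which a strict single-parent hierarchy could not express.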

2.2.2.1.3. Relational

Tables with columns, implicitly related via attributes; see notes on relational databases.
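To show what "implicitly related via attributes" can look like, here is a small sketch using Python's built-in sqlite3 module. The table and column names are assumptions made for illustration; the two titles are taken from the Sources list.

```python
import sqlite3

# In-memory database; the schema below is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE authors (author_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO authors VALUES (1, 'Wurman'), (2, 'Hedden');
    INSERT INTO documents VALUES
        (10, 'Information anxiety', 1),
        (11, 'The accidental taxonomist', 2);
    """
)

# The two tables are related only through the shared author_id attribute.
rows = conn.execute(
    """
    SELECT d.title, a.name
    FROM documents AS d
    JOIN authors AS a ON d.author_id = a.author_id
    ORDER BY d.doc_id
    """
).fetchall()

for title, name in rows:
    print(title, "|", name)
```

The relationship is not stored as an explicit edge; it exists only because both tables carry matching values in author_id.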

2.2.2.2. Object-based

Object-based models allow specification of constraints, but not overall structure. Objects are instances of classes; classes and objects have attributes (properties, characteristics, adjectives/nouns) and methods (actions, functions, behaviors, verbs).
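The class/object vocabulary can be illustrated with a small Python class. The class name, attributes, and the non-empty-title constraint are all hypothetical choices made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class ContentObject:
    # Attributes (properties) of the class; names are hypothetical.
    title: str
    fmt: str

    def __post_init__(self):
        # A constraint on the data: every object must have a non-empty title.
        if not self.title.strip():
            raise ValueError("title must not be empty")

    def describe(self) -> str:
        # A method (behavior) available on every instance.
        return f"{self.title} ({self.fmt})"


# An object is an instance of the class.
doc = ContentObject(title="Annual report", fmt="PDF")
print(doc.describe())
```

Creating ContentObject(title="", fmt="PDF") raises ValueError, which is the kind of constraint the class definition can carry with it.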

2.2.3. Questions that arise in organizing

3. What is taxonomy?

Per Hedden, taxonomy --- the law or science (nomos) of order (taxis) --- has both general and specific meanings.

In general, taxonomy is the discipline of creating and managing taxonomies, a term synonymous with knowledge organization structures/systems (KOS) and very nearly synonymous with controlled vocabularies (CVs). As a field, taxonomy has roots in biological taxonomies and library catalogs (which existed even in antiquity as scrolls). Companies began to use taxonomies for corporate knowledge asset management very marginally in the 1980s, then increasingly in the 1990s as web technologies spread.

Beyond this general sense, taxonomies are also specific kinds of KOSs.

3.1. What are knowledge organization structures?

3.1.1. KOSs by structure

In order of increasing complexity (number, kind, and configuration of relationships):

KOS          Ambiguity control   Synonym control   Hierarchical relationship   Associative relationship
Term list    X
Syn ring     X                   X
Taxonomy     X                   X                 X
Thesaurus    X                   X                 X                           X

Terms are meant to be atomic units of meaning---single or multi-word phrases.

Concepts are combinations of terms.

Term lists are essentially flat lists, though they may include "See"/"Use" references to steer people towards preferred language, or use a synonym ring approach (no preferred term among multiple).

Synonym rings AKA synsets associate synonyms without indicating preference for one over the others (equivalence relationship). Synsets are usually invisible to users, e.g. underpinning a search engine.
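One way a synonym ring might sit invisibly behind a search engine is query expansion: the user's term is replaced by every member of its ring before matching. The rings, documents, and naive substring matching below are assumptions for illustration only.

```python
# Each ring is a set of equivalent terms with no preferred member (hypothetical data).
SYNONYM_RINGS = [
    {"car", "automobile", "motorcar"},
    {"film", "movie", "motion picture"},
]

DOCUMENTS = {
    "d1": "A history of the automobile",
    "d2": "Movie posters of the 1950s",
    "d3": "Bicycle repair",
}


def expand(term):
    """Return every term in the same ring, or just the term if it has no ring."""
    for ring in SYNONYM_RINGS:
        if term in ring:
            return set(ring)
    return {term}


def search(term):
    """Naive substring search over all variants of the query term."""
    variants = expand(term.lower())
    return [
        doc_id
        for doc_id, text in DOCUMENTS.items()
        if any(v in text.lower() for v in variants)
    ]


print(search("car"))
print(search("film"))
```

The user who types "car" never sees the ring, but still retrieves the document that says "automobile".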

Taxonomies AKA hierarchies AKA hierarchical taxonomies AKA tree structures arrange terms into parent/child relationships beneath a single top term (TT). A strict hierarchy requires that each term have a single parent AKA broader term (BT), though they may have multiple siblings and children AKA narrower terms (NT).
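The BT/NT vocabulary maps naturally onto two lookup tables. The sketch below, with a hypothetical vocabulary, builds both from (narrower, broader) pairs, checks the single-parent rule of a strict hierarchy, and walks from a term up to the top term.

```python
from collections import defaultdict

# (narrower term, broader term) pairs; the vocabulary is hypothetical.
BT_PAIRS = [
    ("Mammals", "Animals"),
    ("Birds", "Animals"),
    ("Dogs", "Mammals"),
    ("Cats", "Mammals"),
]


def build(pairs):
    """Return (broader, narrower) maps: term -> set of BTs, term -> set of NTs."""
    broader, narrower = defaultdict(set), defaultdict(set)
    for nt, bt in pairs:
        broader[nt].add(bt)
        narrower[bt].add(nt)
    return broader, narrower


def is_strict(broader):
    """A strict hierarchy allows at most one broader term per term."""
    return all(len(parents) <= 1 for parents in broader.values())


def path_to_top(term, broader):
    """Follow BT links upward; assumes a strict hierarchy with no cycles."""
    path = [term]
    while broader.get(term):
        (term,) = broader[term]  # exactly one parent in a strict hierarchy
        path.append(term)
    return path


broader, narrower = build(BT_PAIRS)
print(is_strict(broader))
print(path_to_top("Dogs", broader))
print(sorted(narrower["Mammals"]))
```

Adding a second parent, for example ("Dogs", "Pets"), makes is_strict return False; the structure is then no longer a strict hierarchy.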

Microcontrolled vocabularies are subsets of a controlled vocabulary, creating a specialist CV.

Faceted taxonomies are a bottom-up approach to providing multiple views of the same content objects, based on shared attributes like topic, location, format, author, etc. They are often presented as navigation aids or search refinements; they may also be used to organize a very large controlled vocabulary.
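Facets used as search refinements can be sketched as attribute filters over the same set of content objects. The items and facet values below are hypothetical; the facet names (topic, format, location) follow the paragraph above.

```python
from collections import Counter

# Hypothetical content objects, each described by the same set of facets.
ITEMS = [
    {"title": "City map", "topic": "Geography", "format": "Map", "location": "Paris"},
    {"title": "Travel guide", "topic": "Geography", "format": "Book", "location": "Paris"},
    {"title": "Cookbook", "topic": "Food", "format": "Book", "location": "Rome"},
]


def refine(items, **facets):
    """Keep only the items that match every selected facet value."""
    return [i for i in items if all(i.get(k) == v for k, v in facets.items())]


def facet_counts(items, facet):
    """Count items per value of one facet, as a navigation sidebar might show."""
    return Counter(i[facet] for i in items if facet in i)


books_in_paris = refine(ITEMS, format="Book", location="Paris")
print([i["title"] for i in books_in_paris])
print(facet_counts(ITEMS, "format"))
```

Each facet gives a different view of the same three objects, and selections from several facets combine to narrow the result.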

Polyhierarchies are hierarchies in which children may have multiple parents.

Thesauri capture associative relationships AKA related terms (RT) in addition to equivalence (U/UF) and hierarchical (BT/NT) relationships. Relationships are also called cross-references; they should be reciprocal (explicit entry at term Y linking to term X, explicit entry at term X linking to term Y) and may or may not be symmetric. Thesauri are useful for representing a very large controlled vocabulary.

Semantic networks fall short, in some way, of full ontologies, which are defined by their:

Relevant technologies include RDF, OWL, and topic maps.

3.1.2. KOSs by role in IR

As part of ingesting a content object into an information system, KOSs can be applied to the content object at several levels, creating more or less granular pictures of what the object is about:

Information consumers depend on KOSs to browse and search content objects:

See also uses of CVs.

3.2. What are reference works?

Reference works are information-dense resources meant to be consulted for specific information, not read comprehensively. Terminology for reference works is reviewed here because it often overlaps with KOS terminology. Definitions are quoted/paraphrased from Wikipedia:

3.3. What are controlled vocabularies?

CVs (AKA authority lists) consist of terms, syntax (how terms may be combined), and term records containing each term's semantic relationships, scope notes, and history notes.

Controlled vocabularies stand in contrast with natural languages, which undermine information retrieval because they are fraught with polysemes* (ambiguous words --- both homographs and synonyms/near-synonyms/quasi-synonyms). At minimum, to avoid the IR pitfalls of natural languages CVs must:

* Why so many polysemes in natural language?

3.3.1. Uses of CVs

Per Svenonius (2005) (see also KOSs by role in IR): "In the form of terminological databanks, CVs are used to assist in both manual and automatic translation. In the form of glossaries, they standardize and explicate the meaning or usage of terms in specialized fields of activity. In the form of literary thesauri they assist in composition by facilitating the expression of ideas. In the form of conceptual structures they give backbone to knowledge representation systems."

3.3.1.1. CV-metadata connections

Per ANSI/NISO Z39.19 (p. 19),

3.3.2. Constructing CVs

3.3.2.1. Where do terms come from?

Terms are chosen from a specific domain/information space if their inclusion is warranted:

Since warrants shift over time, there should be a 'parking lot' for candidate terms AKA provisional terms. Terms can be generated by a committee (top-down, bottom-up); by empirical methods (deductive, inductive); by a machine; or from an existing CV (don't duplicate effort!).

3.3.2.2. How are terms expressed?

3.3.2.2.1. Scope & coordination/syntax

The basic rule per ANSI/NISO Z39.19 is that a term should denote a single concept or unit of thought; this is challenged by Svenonius (2005), who argues that concepts are inherently fuzzy and that quantitative linguistics provides a more objective foundation for extracting terms from natural language. At any rate, there are different kinds of concepts:

The simplest term form is a single-word term; there are several kinds of multiword AKA compound terms, the construction of which is governed by the CV's syntax:

A (1) bound term uses multiple words or a phrase to denote a single concept, e.g. oral surgery. The inverted form of bound terms may be included as an entry term, e.g. surgery, oral: see oral surgery. These (a) naturally occurring compound terms are generally preferable to (b) qualified homographs --- for example, religious tolerance is generally better than tolerance (religious). If no bound compound term exists, ambiguous terms (AKA head or focus nouns) should receive a modifier (AKA difference). Additionally, the scope of any term, not just homographs, may be clarified with a scope note (SN). If a SN mentions another term in the vocabulary, it should receive a reciprocal SN or cross-reference: term2: X SN term1.

While compound terms denote a single concept, (2) coordinated terms AKA synthesized terms associate several concepts with a single information object and generally fall in the domain of indexing languages. Coordinated terms can be (a) embedded in the object (precoordination), as with Library of Congress subject headings in books: English Language--Rhetoric, Persuasion (Rhetoric), Report Writing. Precoordinated terms enable browsing with great specificity, describe complex concepts, and impose alphabetic proximity on related terms that would otherwise be far apart. Given the high cost of taxonomist labor, though, it's often better to let users (b) combine terms interactively during search (postcoordination), e.g. English Language AND Rhetoric AND Persuasion AND Report Writing.
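Postcoordination can be sketched as a Boolean AND over an inverted index that maps each term to the objects indexed with it. The index contents and book IDs below are hypothetical; the four terms are the ones from the example above.

```python
# Inverted index: term -> set of object IDs indexed with that term (hypothetical data).
INDEX = {
    "English Language": {"b1", "b2", "b3"},
    "Rhetoric": {"b1", "b2"},
    "Persuasion": {"b1", "b4"},
    "Report Writing": {"b1", "b3"},
}


def and_search(*terms):
    """Return the objects indexed with every one of the given terms."""
    postings = [INDEX.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


# The user combines single-concept terms at search time.
print(and_search("English Language", "Rhetoric", "Persuasion", "Report Writing"))
print(sorted(and_search("English Language", "Rhetoric")))
```

With precoordination the combination would already be stored as one heading on the object; here the indexer only assigns single terms and the intersection happens during search.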

Use of a compound term should be determined by warrant, the total number of terms in the CV (more compound terms means more overall terms), and intended format (print sometimes benefits from precoordinated terms); see ANSI/NISO Z39.19 pp. 39-40 for more guidance and examples.

3.3.2.2.2. Grammatical forms

*Svenonius (2005) explains the intended effect of this standard: CVs where every term is "context independent", i.e. self-contained and thus reusable. However, context independence is "not normally operative in classification schemes where the verbal headings are governed by the principle of hierarchical force."

3.3.2.2.3. Typographic forms

All relationships are reciprocal, which should be captured by the taxonomy software. Most relationships (except RT) are asymmetric. Orphan terms lack any relationship.
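A reciprocity check of the kind taxonomy software might perform can be sketched as follows. The inverse table reflects the BT/NT, RT, and USE/UF pairs used in these notes; the term records are hypothetical and deliberately contain one missing reciprocal and one orphan.

```python
# Inverse of each relationship type; RT is the only symmetric one.
INVERSE = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}

# term -> list of (relationship, target term); hypothetical records.
RECORDS = {
    "Oral surgery": [("BT", "Surgery"), ("RT", "Dentistry")],
    "Surgery": [("NT", "Oral surgery")],
    "Dentistry": [],   # lacks the reciprocal RT back to "Oral surgery"
    "Anesthesia": [],  # orphan: no relationships in either direction
}


def missing_reciprocals(records):
    """List (term, relationship, target) entries that should exist but do not."""
    missing = []
    for term, links in records.items():
        for rel, target in links:
            if (INVERSE[rel], term) not in records.get(target, []):
                missing.append((target, INVERSE[rel], term))
    return missing


def orphans(records):
    """Return terms that neither link to nor are linked from any other term."""
    linked = {term for term, links in records.items() if links}
    linked |= {target for links in records.values() for _, target in links}
    return sorted(set(records) - linked)


print(missing_reciprocals(RECORDS))
print(orphans(RECORDS))
```

The check reports that "Dentistry" needs an RT entry pointing back to "Oral surgery", and that "Anesthesia" is an orphan term.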

3.3.2.3.1. Equivalence & preferred terms

Synonyms, near-synonyms, lexical variants, and (when generic posting is practiced) child terms are subsumed into a single preferred term with USE/USED FOR or U/UF. Preferred terms are also called descriptors and headings; non-preferred terms are also called entry or lead-in terms.
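The USE/USED FOR pair can be sketched as a lookup from entry terms to a preferred term, plus the inverted view that a term record would display. The vocabulary below is hypothetical.

```python
from collections import defaultdict

# Entry (non-preferred) term -> preferred term; hypothetical vocabulary.
USE = {
    "Automobiles": "Cars",
    "Motorcars": "Cars",
    "Autos": "Cars",
}


def preferred(term):
    """Resolve an entry term to its descriptor; preferred terms map to themselves."""
    return USE.get(term, term)


def used_for():
    """Invert USE into the reciprocal UF listing for each preferred term."""
    uf = defaultdict(list)
    for entry, descriptor in USE.items():
        uf[descriptor].append(entry)
    return {descriptor: sorted(entries) for descriptor, entries in uf.items()}


print(preferred("Autos"))
print(used_for())
```

Indexers and searchers who start from any lead-in term land on the same descriptor, so all matching content is gathered under one heading.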

3.3.2.3.2. Hierarchy

See discussion of hierarchy, above.

3.3.2.3.3. Association

Associative relationships should be recorded between terms that are needed to explain each other, or readily evoke each other, or are etymologically related, or are derived one from the other. Although they may be disambiguated in a graph database, in a thesaurus the following relationships (and more) are collapsed under RELATED TERM/RT:

3.3.2.4. CV elements by impact on IR

Per ANSI/NISO Z39.19 (p. 16),

3.3.3. Testing CVs

The usability and IR performance of CVs should be tested:

3.3.4. Documenting CVs

Documentation should cover:

3.3.5. Maintaining CVs

CVs must be updated to stay valid, so there should be a plan for reviews at defined intervals.

Updates may include:

3.3.6. Managing CVs

ANSI/NISO Z39.19 (pp. 99-102) lists desirable features of CV management software:

3.3.7. Displaying CVs

3.3.7.1. Overarching forms

3.3.7.2. Organization within or across forms

3.3.7.3. Format considerations

3.4. Interoperability

Per ANSI/NISO Z39.19, the need for interoperability arises from different sources:

Interoperability may be approached in different ways:

Interoperability data may be stored in:

4. Sources

4.1. Cited

Abrahamson, E. & Freedman, D. H. (2008). A perfect mess: The hidden benefits of disorder --- How crammed closets, cluttered offices, and on-the-fly planning make the world a better place. New York City, NY: Back Bay Books.

AfterHoursProgramming.com. (n.d.). IA tutorial. Retrieved from http://www.afterhoursprogramming.com/tutorial/Information-Architecture/Overview/

ANSI/NISO. (2005). Z39.19-2005: Guidelines for the construction, management, and format of monolingual controlled vocabularies. Retrieved from http://www.niso.org/apps/group_public/download.php/12591/z39-19-2005r2010.pdf

Connolly, T. & Begg, C. (2015). Database systems: A practical approach to design, implementation, and management (6th ed.). New York City, NY: Pearson Education.

Hedden, H. (2016). The accidental taxonomist (2nd ed.). Medford, NJ: Information Today, Inc.

Hjørland, B. (2011). Theoretical clarity is not "Manicheanism": A reply to Marcia Bates. Journal of Information Science, 37(5), 546-552. Retrieved from http://pure.iva.dk/files/31053333/JIS_1568_v3.pdf

Jacob, E. K. (2004). Classification and categorization: a difference that makes a difference. Library Trends, 52(3), 515. Retrieved from https://pdfs.semanticscholar.org/774e/ab27b22aa92dfaa9aeeeafbe845058e85f58.pdf

NISO. (1997). TR02-1997: Guidelines for indexes and related information retrieval devices. Retrieved from http://www.niso.org/publications/tr/tr02.pdf

Pomerantz, J. (2015). Metadata. The MIT Press Essential Knowledge Series. Boston, MA: MIT Press.

Rosenfeld, L., Morville, P., & Arango, J. (2015). Information architecture for the web and beyond (4th ed.). Sebastopol, CA: O'Reilly Media.

Svenonius, E. (2005). Design of controlled vocabularies. Encyclopedia of Library and Information Science, 45(10), 82–109. Retrieved from http://polaris.gseis.ucla.edu/gleazer/260_readings/Svenonius.pdf

UX Booth. (2015). A complete beginner's guide to information architecture. Retrieved from http://www.uxbooth.com/articles/complete-beginners-guide-to-information-architecture/

Wurman, R. S. (2014). Give yourself permission to follow your nose [video]. Retrieved from https://www.youtube.com/watch?v=SDm1zXxpkr8

Wurman, R. S. (1990). Information anxiety. New York City, NY: Bantam Books.

Wurman, R. S. (1997). Information architects. New York City, NY: Graphis Inc.

Wyllys, R. E. (2000). Information architecture. Retrieved from https://www.ischool.utexas.edu/~l38613dw/readings/InfoArchitecture.html

Zeng, M. (n.d.). 4.3 Hierarchical relationships. In Construction of controlled vocabularies: A primer. Retrieved from http://marciazeng.slis.kent.edu/Z3919/43hierarchy.htm

4.2. Unread