infotech

Metadata matters

Dec. 17, 2002; updated June 2, 2003


Many K-Staters maintain and contribute to webpages for their departments and the university. How many are aware of the role that metadata plays in their webpages' discovery by search engines? Metadata is key to resource discovery on the Internet. Find out what it is and why it's important to K-State.

by Donna Schenck-Hamlin, Director of Information Support Services for Agriculture, and member of the KSU Digital Libraries Program Task Force

What is metadata?

"Metadata" may sound abstract, but it's a very simple concept.

Whenever you use an online library catalog to look up a book by author, you're searching for data that has been labeled "Author". The label "Author" is a meta tag and its content (e.g. "Mark Twain") is the metadata. The metadata simply describes an attribute of the data (i.e. the book) in a way that facilitates its access. The index of terms sorted at the back of a book is also metadata content, linking keywords to their physical location in the text.

Here's a book example:

Meta tag    Metadata
Database    KANSAS STATE UNIVERSITY LIBRARIES
Author      Twain, Mark, 1835-1910.
Title       The adventures of Huckleberry Finn : and related readings / [Mark Twain ... et al.]
Published   Evanston, Ill. : McDougal Littell, c1997.

In fact, indexes and metadata are concepts that go together, because metadata directs the link between a term you are looking for and its location or relation to the retrieved document. When you do a keyword search from www.northernlight.com, or any of the other Internet search engines, you're using hidden indexes that are periodically refreshed, based on the engine's particular formula for collecting websites.

Though these formulas vary, all search engines attempt to retrieve the most "relevant" resources (i.e. the best match between a searcher's submitted terms and their occurrence in the metadata or in the actual body of Internet resources).
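
As a rough illustration of the idea, here is a minimal Python sketch of a keyword index and a naive relevance ranking. The page addresses, the sample text, and the function names build_index and search are invented for this example; real search engines use far more elaborate formulas.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word.strip('.,;:!?"()')].add(url)
    return index

def search(index, query):
    """Rank pages by how many of the query terms they contain."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for url in index.get(term, ()):
            scores[url] += 1
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical pages: the text stands in for metadata plus body content.
pages = {
    "example.edu/wheat": "Wheat varieties for Kansas farms",
    "example.edu/corn": "Corn and wheat planting dates",
    "example.edu/library": "Library catalog hours",
}
index = build_index(pages)
results = search(index, "kansas wheat")
print(results)
```

The page that matches both search terms is listed ahead of the page that matches only one, and the page that matches neither is left out.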

Essential meta tags for HTML documents

The three most fundamental HTML tags for optimizing resource discovery are "Title", "Keywords", and "Description". (Strictly speaking, the title is its own <title> element, while keywords and description are <meta> tags.) To see an example in the context of HTML, go to www.ksu.edu and view the page source in your web browser (in Netscape: click View, then Page Source; in Explorer: click View, then Source).

Most people now know to create a title for an HTML document, or they are prompted to do so by their web-design software. Where some go wrong is in applying the same title to a whole series of different webpages, or in writing titles that neither describe nor promote the page. Neither helps people who are typically browsing through thousands of retrieved titles.

"Keywords" is a meta tag that most search engines rate very highly in their indexing formulas. This is an opportunity to add words you expect searchers might use to locate your resource, whether or not they occur in the body of your text.

Misuse of this practice is known as "spamming", in which designers pack high numbers of the same word or phrase into metadata as a way to optimize their page's position in search-results lists. Your page can be penalized by search engines for this, so a general rule of thumb is to avoid use of a word more than twice, even if it's part of a keyword phrase.
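
The rule of thumb can be checked mechanically. The Python sketch below counts how often each word occurs in a keywords string and flags any word used more than twice. The function name overused_words and the sample keywords are invented for this example, and the limit of two simply mirrors the rule of thumb above, not any search engine's published threshold.

```python
from collections import Counter
import re

def overused_words(keywords, limit=2):
    """Return the words that appear more than `limit` times in a keywords string."""
    words = re.findall(r"[a-z0-9']+", keywords.lower())
    counts = Counter(words)
    return sorted(word for word, n in counts.items() if n > limit)

# Hypothetical keywords content: "wheat" occurs four times, counting phrases.
keywords = "wheat, winter wheat, wheat varieties, Kansas wheat, crops"
flagged = overused_words(keywords)
print(flagged)
```

Words inside phrases are counted too, so "winter wheat" and "Kansas wheat" both add to the tally for "wheat".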

"Description" is a short phrase or paragraph that describes the content of a webpage. It displays in the list of search-engine results, if you have created the metadata for it. If you have not, then a fixed number of words from the beginning of your webpage will appear instead, often misleading or at least not attracting searchers to look further.

Advanced searches, such as www.northernlight.com/power.html, let you focus your search by specifying attributes of a web resource, such as date or language. These are implemented by indexing derived from meta tags such as "Date" and "Language". To see an example of several more meta tags used to describe a resource in detail, view the source of the webpage at www.oznet.ksu.edu/pr_prcag/publications.shtml.
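
In outline, an attribute search of this kind is a filter over metadata that has already been indexed. The Python sketch below assumes each page's meta tags have been collected into a small dictionary. The function name advanced_search, the field names, and the records themselves are invented for illustration.

```python
def advanced_search(records, **criteria):
    """Return the URLs whose metadata matches every requested attribute exactly."""
    return [
        url
        for url, meta in records.items()
        if all(meta.get(field) == value for field, value in criteria.items())
    ]

# Hypothetical index entries built from each page's "Date" and "Language" meta tags.
records = {
    "example.edu/a": {"language": "en", "date": "2002-12-17"},
    "example.edu/b": {"language": "es", "date": "2002-12-17"},
    "example.edu/c": {"language": "en", "date": "2003-06-02"},
}
english = advanced_search(records, language="en")
english_dec = advanced_search(records, language="en", date="2002-12-17")
print(english, english_dec)
```

A page with no "Language" or "Date" metadata can never match such a filter, which is one more reason to supply these tags.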

Notice that most of these meta tags are prefixed with "DC.", which indicates that they follow the Dublin Core metadata element set and its associated definitions. "DC." has been omitted from the three critical elements named title, keywords, and description, because at this time many of the major commercial search engines do not recognize Dublin Core. Recommended best practice is either to duplicate these three elements with and without the "DC." prefix, or to leave them without the prefix.
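
If you maintain many pages, the duplication can be generated instead of typed by hand. The Python sketch below writes the three elements both with and without the "DC." prefix, pairing "keywords" with "DC.subject" as the example later in this article does. The function name meta_tags and the sample values are invented for illustration.

```python
from html import escape

def meta_tags(title, keywords, description):
    """Return head lines for the three core elements, plain and DC-prefixed."""
    pairs = [
        ("keywords", keywords),
        ("description", description),
        ("DC.title", title),
        ("DC.subject", keywords),
        ("DC.description", description),
    ]
    lines = [f"<title>{escape(title)}</title>"]
    for name, content in pairs:
        # escape() converts &, <, > and quotation marks so the attribute stays valid.
        lines.append(f'<meta name="{name}" content="{escape(content)}">')
    return "\n".join(lines)

# Hypothetical values for one page.
output = meta_tags('Wheat "Jagger" trials', "wheat, Kansas", "Results & notes")
print(output)
```

Escaping matters here because a stray quotation mark in a title or description would otherwise cut the content attribute short.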

Why is metadata important to K-State?

Despite (or because of) the many search engines we use, Internet resource discovery remains a "hit or miss" operation for students, staff, and faculty. Even when looking for webpages on K-State servers, one can encounter differing coverage and limitations of each campus search engine. Lack of metadata in our own webpages exacerbates the problem, because that metadata would be indexed by our search engines.

Two guest lecturers addressed metadata issues at K-State last year. Their PowerPoint presentations are available via the links below.

There will be increasing emphasis on this topic as our university moves toward implementing the KSU Digital Libraries Program. Under this program, digitized knowledge resources from within the university will be searchable simultaneously with the library catalog and licensed databases, using metadata stored in document type definitions (DTDs).

What can you do?

  1. Find out whether your department, college, or other working unit is establishing any standards for metadata use. For example, KSU Research and Extension Publications has initiated its own form for attaching metadata to official publications.
  2. If you are responsible for designing or updating webpages, make sure that you have incorporated at least the three essential metadata elements described above in the head section of your HTML document. Below is an example you can copy-and-paste into your webpage documents and then tailor to your data:

    <html>
    <head>
    <title>This is still required, even if it duplicates first DC tag below</title>
    <meta name="keywords" content="subject terms or keywords go here">
    <meta name="description" content="short sentence describing the page">
    
    <meta name="DC.title" content="title of webpage goes here">
    <meta name="DC.subject" content="subject terms or keywords go here">
    <meta name="DC.description" content="short sentence describing the page">
    </head>
    

  3. Learn more about various metadata element sets that are being standardized worldwide. Some of these are

Web accessibility

Metadata is just one factor in improving the public's access to webpages. As you may already know, the deadline for meeting federal and state web-accessibility guidelines was March 31, 2002.

Providing metadata on webpages is a Priority 2 checkpoint, which means: "A Web content developer should satisfy this checkpoint. Otherwise, one or more groups will find it difficult to access information in the document. Satisfying this checkpoint will remove significant barriers to accessing Web documents." -- Checklist of Checkpoints for Web Content Accessibility Guidelines 1.0

Questions and comments about accessibility issues, including metadata, can be shared with other K-State web authors by joining K-State's Web Access discussion list. WEBACCESS is an open list. To join, send an e-mail message to listserv@ksu.edu with the following command and include your first and last name:

SUB WEBACCESS yourname

Or, send metadata inquiries directly to me at donnash@lib.ksu.edu.


Kansas State University
October 22, 2009