Dec. 17, 2002. Updated June 2, 2003
by Donna Schenck-Hamlin, Director of Information Support Services for Agriculture, and member of the KSU Digital Libraries Program Task Force
What is metadata?
"Metadata" may sound abstract, but it's a very simple concept. Whenever you use an online library catalog to look up a book by author, you're searching for data that has been labeled "Author". The label "Author" is a meta tag and its content (e.g. "Mark Twain") is the metadata. The metadata simply describes an attribute of the data (i.e. the book) in a way that facilitates its access. The index of terms sorted at the back of a book is also metadata content, linking keywords to their physical location in the text. Here's a book example:
In fact, indexes and metadata are concepts that go together, because metadata directs the link between a term you are looking for and its location in, or relation to, the retrieved document. When you do a keyword search from www.northernlight.com, or any of the other Internet search engines, you're using hidden indexes that are periodically refreshed, based on the engine's particular formula for collecting websites. Though these formulas vary, all search engines attempt to retrieve the most "relevant" resources (i.e. the best match between a searcher's submitted terms and their occurrence in the metadata or in the actual body of Internet resources).
Essential meta tags for HTML documents
The most fundamental HTML meta tags used to optimize resource discovery are "Title", "Keywords", and "Description". To see an example in the context of HTML, go to www.ksu.edu and view the page source in your web-browser client (in Netscape: click View, then Page Source; in Explorer: click View, then Source).
Most people now know to create a title for an HTML document, or they are prompted to do so by their web-design software. Where some go wrong is in applying that same title to a whole series of different webpages, or in writing titles that neither describe nor promote the page. This is no help to people who are typically browsing through thousands of retrieved titles.
"Keywords" is a meta tag that most search engines rate very highly in their indexing formulas. This is an opportunity to add words you expect searchers might use to locate your resource, whether or not they occur in the body of your text. Abuse of this practice is known as "spamming": designers pack many repetitions of the same word or phrase into metadata to improve their page's position in search-results lists. Search engines can penalize your page for this, so a general rule of thumb is to avoid using a word more than twice, even if it's part of a keyword phrase.
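The rule of thumb about repeated keywords can be checked mechanically. The following is a minimal sketch, not a description of how any search engine actually scores pages. It uses Python's standard html.parser module to read the title and the named meta tags from a page head, then lists any word that appears more than twice in the keywords. The sample page head, the class name MetaTagParser, and the function name overused_keywords are all invented for illustration. Note that in HTML the title is written with its own title element rather than a meta element.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical page head, invented for illustration only.
SAMPLE_HEAD = """
<head>
<title>Wheat Production Publications</title>
<meta name="keywords" content="wheat, wheat production, wheat varieties, Kansas crops">
<meta name="description" content="Research and extension publications on growing wheat in Kansas.">
</head>
"""


class MetaTagParser(HTMLParser):
    """Collect the title text and the content of named meta tags."""

    def __init__(self):
        super().__init__()
        self.tags = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            # Store each named meta tag under its lowercased name.
            self.tags[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.tags["title"] = self.tags.get("title", "") + data.strip()


def overused_keywords(keywords, limit=2):
    """Return words that occur more than `limit` times across all keyword phrases."""
    words = keywords.lower().replace(",", " ").split()
    counts = Counter(words)
    return sorted(word for word, n in counts.items() if n > limit)


parser = MetaTagParser()
parser.feed(SAMPLE_HEAD)
print(parser.tags["title"])                         # Wheat Production Publications
print(overused_keywords(parser.tags["keywords"]))   # ['wheat']
```

In this sample, "wheat" occurs three times across the keyword phrases, so it is flagged. The limit of two is only the rule of thumb stated above, and it is exposed as a parameter so it can be adjusted.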
"Description" is a short phrase or paragraph that describes the content of a webpage. It displays in the list of search-engine results, if you have created the metadata for it. If you have not, then a fixed number of words from the beginning of your webpage will appear instead, often misleading or at least not attracting searchers to look further. Advanced searches, such as www.northernlight.com/power.html, let you focus your search by specifying attributes of a web resource, such as date or language. These are implemented by indexing derived from meta tags such as "Date" and "Language". To see an example of several more meta tags used to describe a resource in detail, view the source of the webpage at www.oznet.ksu.edu/pr_prcag/publications.shtml. Notice that most of these meta tags are preceded with "DC.", which indicates they are following the Dublin Core metadata element set, with its associated definitions. "DC." has been omitted from the critical metadata element sets named title, keywords, and description, because at this time many of the major commercial search engines are insensitive to Dublin Core. Recommended best practice is to duplicate these three elements with and without the "DC." prefix, or to leave these three without the "DC." prefix. Why is metadata important to K-State? Despite (or because of) the many search engines we use, Internet resource discovery remains a "hit or miss" operation for students, staff, and faculty. Even when looking for webpages on K-State servers, one can encounter differing coverage and limitations of each campus search engine. Lack of metadata in our own webpages exacerbates the problem, because that metadata would be indexed by our search engines. There have been two guest lecturers at K-State last year addressing metadata issues. Their PowerPoint presentations are available via the links below.
There will be increasing emphasis on this topic as our university moves toward implementing the KSU Digital Libraries Program. Under this program, digitized knowledge resources from within the university will be searchable simultaneously with the library catalog and licensed databases, using metadata stored in document type definitions (DTDs).
What can you do?
Web accessibility
Metadata is just one factor in improving the public's access to webpages. As you may already know, the deadline for meeting federal and state web-accessibility guidelines was March 31, 2002. Providing metadata on webpages is a Priority 2 checkpoint, which means:
"A Web content developer should satisfy this checkpoint. Otherwise, one or more groups will find it difficult to access information in the document. Satisfying this checkpoint will remove significant barriers to accessing Web documents." -- Checklist of Checkpoints for Web Content Accessibility Guidelines 1.0
Questions and comments about accessibility issues, including metadata, can be shared with other K-State web authors by joining K-State's Web Access discussion list. WEBACCESS is an open list. To join, send an e-mail message to listserv@ksu.edu with the following command and include your first and last name:
Or send metadata inquiries directly to me at donnash@lib.ksu.edu.