Keywords in metadata

A common source of angst among metadata reviewers as well as metadata creators and editors, keywords are subject to a number of misconceptions. Our purpose here is to help clarify the issues and reduce the angst.

Keywords as categories for which your data are relevant

  • Subjects about which your data have something to say
    • Things you measured
    • Places where you made your measurements
    • Methods by which you measured or observed things or analyzed your observations
  • Aspects of the data package, format, contents, or documentation that your data share with some other data but not with all
    • Product type
    • Data format
  • Problems for which your data might be seen as a helpful resource
    • Climate change
    • Mineral availability
  • Names of things with which you want your data to be associated
    • Program name
    • Project name

Types or names?

Keywords can be types or names, but people will associate many different ideas with named things.

How keywords might be used after you've finished your review

How to evaluate the choice of keyword thesauri

  • For concepts that are represented in common controlled vocabularies, terms should be taken from those vocabularies and should be spelled correctly.
  • For special concepts or identifiers that aren't in controlled vocabularies, use "None" as the thesaurus name.
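The two rules above can be sketched as a small check. This is only an illustration under stated assumptions: the vocabulary name "Example Thesaurus", its terms, and the function names are hypothetical, and a real review would load terms from the published vocabulary instead of hard-coding them.

```python
# Hypothetical controlled vocabulary, hard-coded only for illustration.
# A real thesaurus would be loaded from its publisher.
CONTROLLED_VOCABULARIES = {
    "Example Thesaurus": {"climate change", "groundwater", "mineral resources"},
}


def assign_thesaurus(keyword, vocabularies=CONTROLLED_VOCABULARIES):
    """Return (thesaurus name, term) for a keyword.

    A keyword found in a controlled vocabulary is returned as spelled there.
    Anything else is kept as written and assigned the thesaurus name "None".
    """
    normalized = keyword.strip().lower()
    for name, terms in vocabularies.items():
        if normalized in terms:
            return name, normalized
    return "None", keyword.strip()


def group_by_thesaurus(keywords, vocabularies=CONTROLLED_VOCABULARIES):
    """Group keywords under the thesaurus each one belongs to."""
    groups = {}
    for keyword in keywords:
        name, term = assign_thesaurus(keyword, vocabularies)
        groups.setdefault(name, []).append(term)
    return groups


groups = group_by_thesaurus(["Climate change", "Project Alpha"])
```

Here "Climate change" lands under the hypothetical controlled vocabulary, while the project name, which no vocabulary defines, lands under "None".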
How to evaluate the choices of keywords

  • Secondary validation of metadata may help with well-supported vocabularies
  • Manual analysis would be required for specialized vocabularies

But in general,

  • Will the keywords help with precision and recall if they are used to include or exclude these data in a search?
    • Precision: Results include only what you expect
    • Recall: Results include everything you expect
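Precision and recall as described above can be computed for a single keyword search. The following is a minimal sketch, assuming search results and expected records are given as sets of record identifiers; the function name and the sample identifiers are hypothetical, and returning 0.0 for an empty set is a convention chosen here, not part of the definitions.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: identifiers the keyword search returned
    relevant:  identifiers you expected the search to return
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: share of the results that you expected.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of what you expected that the results include.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: four records returned, three records expected,
# two records in common.
precision, recall = precision_recall(
    retrieved={"rec-a", "rec-b", "rec-c", "rec-d"},
    relevant={"rec-a", "rec-b", "rec-e"},
)
```

In this made-up case precision is 2/4 and recall is 2/3: a broader keyword would tend to raise recall at the cost of precision, and a narrower one the reverse.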

Who, really, should be putting keywords into metadata?

  • Metadata authors often know only their own work
  • Metadata editors often know only the work of their own office
  • People who manage data collections see more diverse information
  • Library catalogers traditionally carried out this function