
What is metadata?

21 June 2022

Having read about the FAIR principles in our last post, you might wonder: what is metadata? Metadata is data that describes other data. It summarizes basic information about data, making it easier to find and work with particular instances of data. It can be divided into three, or possibly four, categories:

Descriptive metadata is needed for discovering and identifying assets. It consists of information that describes the asset, such as the asset’s title, author, and relevant keywords. 

Structural metadata describes relationships among the various parts of a resource. It shows how a digital asset is organized, like how pages in a book are organized to form chapters. It also indicates whether a particular asset is part of a single collection or multiple collections, and facilitates the navigation and presentation of information in an electronic resource. It is the key to documenting the relationship between two assets.

Administrative metadata covers the technical origin of a digital asset, such as the file type and when and how the asset was created. It also covers usage rights and intellectual property: the owner of the asset, the license or conditions of use, and the allowed duration of use. It can be subdivided into technical metadata, preservation metadata, and rights metadata.

Provenance metadata, which is not always present, describes the history of a digital file or resource: what was done to it, when and where, who did it, which tool(s) they used, and why.
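
As a rough illustration, the four categories above could be grouped into a single record like the one below. This is only a sketch in Python: the field names, file names, and values are invented for the example and do not come from any particular metadata standard.

```python
# Hypothetical metadata record for one dataset, grouped by category.
# All field names and values are illustrative assumptions.
record = {
    "descriptive": {
        "title": "Example recordings, session 1",
        "author": "A. Researcher",
        "keywords": ["example", "recording"],
    },
    "structural": {
        "part_of": ["example-collection"],
        "parts": ["session1_part1.dat", "session1_part2.dat"],
    },
    "administrative": {
        "technical": {"file_type": "dat", "created": "2022-06-21"},
        "rights": {"owner": "Example Lab", "license": "CC-BY-4.0"},
    },
    # Provenance is optional: an empty list means no history was recorded.
    "provenance": [
        {
            "action": "filtered",
            "agent": "A. Researcher",
            "tool": "example-tool",
            "date": "2022-06-22",
        },
    ],
}


def categories_present(rec):
    """Return the sorted names of the metadata categories that are non-empty."""
    return sorted(name for name, value in rec.items() if value)


print(categories_present(record))
```

A record without any recorded history would simply carry an empty provenance list, and the helper would then report only the other three categories.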

Metadata standards help with FAIR

To be as useful as possible, metadata needs to be standardized so it can be used and understood by many, and especially by machines. Metadata can be standardized in many different ways - some examples are:

Metadata schemas that define the data elements needed to describe a particular object. A metadata schema tailored to your discipline provides a set of metadata elements designed to provide a description of your dataset that is sufficient to make it discoverable and understandable.

Controlled vocabularies that narrow the possibilities of input by limiting choices to established lists of terms or codes. This helps to eliminate variation and ambiguity. They can be arranged as alphabetical lists of terms or as taxonomies with a hierarchical structure of broader and narrower terms.

Format standards that give technical specifications for how to encode the metadata for machine readability, processing, and exchange among systems. Common examples include CSV, XML, and RDF (a data model with serializations such as Turtle and RDF/XML).
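
The three kinds of standardization listed above can be sketched together in a few lines of Python. The example below is a simplified illustration, not an implementation of any real standard: the required elements, the small species taxonomy, and all function names are assumptions made up for this sketch. It checks a record against a minimal schema and a controlled vocabulary, then encodes it as CSV and XML using the standard library.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical schema: the elements every record must provide.
REQUIRED_ELEMENTS = {"title", "author", "species"}

# Hypothetical controlled vocabulary, arranged as a small taxonomy that maps
# each narrower term to its broader term (None marks a top-level term).
SPECIES_VOCABULARY = {
    "Mammalia": None,
    "Mus musculus": "Mammalia",
    "Rattus norvegicus": "Mammalia",
}


def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = [
        f"missing element: {name}"
        for name in sorted(REQUIRED_ELEMENTS - set(record))
    ]
    species = record.get("species")
    if species is not None and species not in SPECIES_VOCABULARY:
        problems.append(f"term not in vocabulary: {species}")
    return problems


def broader_terms(term):
    """Walk up the taxonomy and return all broader terms, nearest first."""
    chain = []
    parent = SPECIES_VOCABULARY.get(term)
    while parent is not None:
        chain.append(parent)
        parent = SPECIES_VOCABULARY.get(parent)
    return chain


def to_csv(record):
    """Encode one record as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=sorted(record), lineterminator="\n")
    writer.writeheader()
    writer.writerow(record)
    return buffer.getvalue()


def to_xml(record):
    """Encode one record as an XML string, one child element per field."""
    root = ET.Element("record")
    for name in sorted(record):
        ET.SubElement(root, name).text = record[name]
    return ET.tostring(root, encoding="unicode")


good = {"title": "Example dataset", "author": "A. Researcher", "species": "Mus musculus"}
bad = {"title": "Example dataset", "species": "mouse"}

print(validate(good))
print(validate(bad))
print(broader_terms("Mus musculus"))
print(to_csv(good))
print(to_xml(good))
```

The second record fails on both counts: it lacks an author, and "mouse" is free text rather than a term from the vocabulary. That is exactly the kind of variation and ambiguity a controlled vocabulary is meant to rule out.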

Many INCF Working Groups work with community standards or best practices for metadata.