Metadata schema

It is fortunate that astronomers deal in data, are comfortable collecting it, and have devised many ways to organize it. It is also unfortunate, because having so many ways to organize data means there is no universal metadata schema for astronomy and astrophysics, much less for glass plates and their associated materials. Thus far, our research has focused on developing a functional schema that centers the glass plates, as they are the scientific product, while relying on the envelopes and logbooks as complementary records.

We chose to begin by exploring FITS (Flexible Image Transport System) headers. Contemporary astronomers are familiar with FITS headers because images from CCD telescopes use them to record metadata, albeit highly astronomy-specific metadata. One might question this strategy, as we want people without any astronomy expertise to be able to access this information. There are 53 keywords defined in the FITS standard, the majority astronomy-focused. However, several keywords (AUTHOR, DATE, DATE-OBS, OBJECT, etc.) are general enough to be comprehensible to a broad audience. The FITS dictionary also provides nearly all of the keywords we need to capture the field-specific metadata necessary for scientific research. Therefore, the FITS system currently acts as our base schema.

As described earlier, the logbooks, envelopes, and plates all contain handwritten, descriptive metadata. But the images themselves also contain important metadata that can be extracted from the plate using simple, web-based tools. Uploading a scan of a plate to nova.astrometry.net will provide accurate information about the size of the scan, the coordinates of its central point, and objects seen within the image, among other details. (See Methods and Process for more information.) While this adds a step to the process, we must emphasize the importance of collecting the metadata from the images, because doing so increases the quality of the scientifically relevant metadata available to researchers.

As alluded to earlier, the FITS dictionary is certainly not the only metadata dictionary used in the astronomy and astrophysics community. There are also general-purpose schemas that would further open this data to general research and allow it to link to other data repositories around the world. To this end, we have also researched:

We applied these dictionaries to our data samples, crosswalking each one with FITS to understand where they did and did not overlap. We also identified desired fields that are not currently covered by any of these existing schemas, or that must be derived from other metadata, but are important either for context or for scientific purposes.
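As an illustration of a field derived from other metadata, the sketch below computes a Modified Julian Date and an airmass value, which correspond in name to the DATE-MJD, ZD, and AIRMASS keywords in the list below. It is a minimal sketch, not the project's own method: it assumes the observation time has already been converted to UTC, that ZD is a zenith distance in degrees, and that the simple plane-parallel approximation (airmass = sec ZD) is acceptable. The function names are hypothetical.

```python
import math
from datetime import datetime, timezone

# Modified Julian Date counts days from 1858-11-17 00:00 UTC (MJD = JD - 2400000.5).
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)


def to_mjd(moment: datetime) -> float:
    """Convert a timezone-aware datetime to a Modified Julian Date."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware (assumed UTC here)")
    delta = moment - MJD_EPOCH
    return delta.days + delta.seconds / 86400 + delta.microseconds / 86400e6


def airmass_from_zd(zd_degrees: float) -> float:
    """Plane-parallel approximation: airmass = sec(zenith distance).

    Assumption: ZD is given in degrees. The approximation degrades
    near the horizon, so values at or beyond 90 degrees are rejected.
    """
    if not 0 <= zd_degrees < 90:
        raise ValueError("zenith distance must be in [0, 90) degrees")
    return 1.0 / math.cos(math.radians(zd_degrees))


# Example: noon UTC on 1 January 2000 corresponds to MJD 51544.5.
example_mjd = to_mjd(datetime(2000, 1, 1, 12, 0, tzinfo=timezone.utc))
```

Logbook times for historical plates are often recorded in local or sidereal time, so a real conversion would need an extra step before calling a function like `to_mjd`.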

The keywords we currently use for sky survey plates are:

COLLECT***

EPOCH***

INSTRUM***

SCANNER***

SCANRES***

SERIES***

SITE***

SITE-LAT***

SITE-LON***

TELESCOP***

TYPE***

PLATENUM**

DATE-LOG**

DATE-MJD**

EXP-LOG**

EXP-PLA**

OBJ-LOG**

OBJ-ENV**

OBJ-PLA**

DATE-SCN**

SCAN-OP**

SCANHT**

SCANWD**

EMULSION**

HA**

ZD**

AIRMASS**

SITE-ALT**

SCANID**

OBSERVER**

PUBLICAT**

REMARKS**

RA-PLA*

DEC-PLA*

RA-ENV*

DEC-ENV*

RA-LOG*

DEC-LOG*

PLATESZ*

RA-FITS*

DEC-FITS*

SCANRAD*

SCANSCAL*

SCAN-ORI*

*** indicates that the keyword is associated with the plate SERIES: a significant number of related plates, similar to a “collection” in an archive

** indicates that the keyword is universal to all types of plates

* indicates that the keyword is used for sky survey plates
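The three tiers above can be represented as a small data structure, which makes it straightforward to list the keywords that apply to a given plate and to spot cells that have not been filled in yet. The following is a sketch under stated assumptions: the keyword names come from the list above, but the function names, the plate-type label "sky survey", and the idea of reporting unfilled keywords are illustrative and not part of the project's documented workflow.

```python
# Keywords grouped by the scope markers used in the list above.
SERIES_KEYWORDS = [  # *** : shared by a whole plate series
    "COLLECT", "EPOCH", "INSTRUM", "SCANNER", "SCANRES", "SERIES",
    "SITE", "SITE-LAT", "SITE-LON", "TELESCOP", "TYPE",
]
UNIVERSAL_KEYWORDS = [  # ** : apply to every type of plate
    "PLATENUM", "DATE-LOG", "DATE-MJD", "EXP-LOG", "EXP-PLA", "OBJ-LOG",
    "OBJ-ENV", "OBJ-PLA", "DATE-SCN", "SCAN-OP", "SCANHT", "SCANWD",
    "EMULSION", "HA", "ZD", "AIRMASS", "SITE-ALT", "SCANID", "OBSERVER",
    "PUBLICAT", "REMARKS",
]
# * : type-specific keywords; only sky survey plates are defined so far.
TYPE_SPECIFIC_KEYWORDS = {
    "sky survey": [
        "RA-PLA", "DEC-PLA", "RA-ENV", "DEC-ENV", "RA-LOG", "DEC-LOG",
        "PLATESZ", "RA-FITS", "DEC-FITS", "SCANRAD", "SCANSCAL", "SCAN-ORI",
    ],
}


def expected_keywords(plate_type):
    """Return series + universal keywords, plus any type-specific ones."""
    extra = TYPE_SPECIFIC_KEYWORDS.get(plate_type, [])
    return SERIES_KEYWORDS + UNIVERSAL_KEYWORDS + extra


def unfilled_keywords(record, plate_type):
    """List expected keywords that are absent or blank in a metadata record."""
    return [
        keyword
        for keyword in expected_keywords(plate_type)
        if not str(record.get(keyword, "")).strip()
    ]
```

Keeping type-specific keywords in their own mapping means a future plate type could be added with one new entry, without touching the series or universal lists.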

Keyword Dictionary

We have applied this system to a handful of test plates, all sky surveys. Initially, metadata is recorded in a Google Sheet template. Then, a simple Python script is used to edit the FITS header with the full metadata. This step ensures that the metadata is always associated with the FITS file itself, and also provides a backup in case there is ever an issue with the Google Sheet. In the future, we expect metadata will be input directly into a database, and the Python script will scrape it to automatically write it into the headers of uploaded FITS files.
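The project's script itself is not reproduced here. As a rough illustration of the sheet-to-header step, the sketch below reads one row from a CSV export of a spreadsheet and formats each filled cell as an 80-character fixed-format FITS header card, using only the Python standard library. The sample row, the function names, and the choice of which keywords are numeric are all assumptions made for the example; in practice a FITS library such as astropy would normally handle the writing of cards into a file.

```python
import csv
import io
import math
import re

KEYWORD_RE = re.compile(r"[A-Z0-9_-]{1,8}")  # FITS keyword: at most 8 characters
NUMERIC_KEYWORDS = {"DATE-MJD", "AIRMASS"}   # assumption for illustration only


def format_card(keyword, value):
    """Return one 80-character fixed-format FITS header card."""
    if not KEYWORD_RE.fullmatch(keyword):
        raise ValueError(f"invalid FITS keyword: {keyword!r}")
    if isinstance(value, bool):
        raise TypeError("logical values are not handled in this sketch")
    if isinstance(value, int):
        field = f"{value:>20d}"                    # right-justified to column 30
    elif isinstance(value, float):
        if not math.isfinite(value):
            raise ValueError("FITS cannot store NaN or infinity")
        field = f"{repr(value).upper():>20}"       # upper-case exponent letter
    else:
        text = str(value)
        if not (text.isascii() and text.isprintable()):
            raise ValueError("FITS strings must be printable ASCII")
        text = text.replace("'", "''")             # quotes are doubled
        field = f"'{text:<8}'"                     # padded to at least 8 chars
    card = f"{keyword:<8}= {field}"
    if len(card) > 80:
        raise ValueError(f"value too long for one card: {keyword}")
    return card.ljust(80)


def row_to_cards(row):
    """Format every non-empty cell of one sheet row as a header card."""
    cards = []
    for keyword, raw in row.items():
        raw = (raw or "").strip()
        if not raw:
            continue  # skip cells that have not been filled in
        value = float(raw) if keyword in NUMERIC_KEYWORDS else raw
        cards.append(format_card(keyword, value))
    return cards


# Made-up sample row standing in for a CSV export of the sheet.
SHEET_CSV = "PLATENUM,OBJ-LOG,DATE-MJD,REMARKS\nR-0001,M31,51544.5,\n"
rows = list(csv.DictReader(io.StringIO(SHEET_CSV)))
cards = row_to_cards(rows[0])
```

Here the empty REMARKS cell is skipped, so the sample row yields three cards. Rejecting over-long values rather than truncating them is a deliberate choice in this sketch, since silent truncation of handwritten remarks would lose information.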


Support for this project comes from the National Science Foundation (Grant AST-2101781), University of Chicago College Innovation Fund, John Crerar Foundation, Kathleen and Howard Zar Science Library Fund, Institute on the Formation of Knowledge, and Yerkes Future Foundation.