Easy data and code sharing through Knowledge@UChicago
Professor Dorian Abbot’s research explains fundamental problems in earth and planetary sciences using mathematics and computational models. Abbot regularly needs to share data and code associated with his publications to meet the requirements of journals and funding agencies such as the National Science Foundation and NASA. “Sharing data is also a good practice so others can reproduce our results,” he explains.
Abbot deposits his data into Knowledge@UChicago, an open-source digital repository available to researchers across campus for preserving and sharing data and software, along with articles, presentations, dissertations, and reports. Knowledge@UChicago recently rolled out new features that improve research data and software preservation. First, the new platform integrates with GitHub, a web-based hosting service built on the Git version-control system and used for managing and storing revisions of code and files. Researchers can connect a GitHub repository to Knowledge@UChicago and opt in to automatic preservation of all new code releases. This feature is particularly valuable for a researcher who wants to link a specific version of code to a publication.
Second, Knowledge@UChicago now collects rich information about research data, making the submitted files easier to understand and reuse. Depositors can point to related research in the repository and on the web by providing a link and the type of relationship between the items, so that a journal article can be connected to a supporting dataset or images, for example. Last, as the result of a partnership between the Library and the University’s Research Computing Center, Knowledge@UChicago will soon feature an API (application programming interface) that will allow for easy and automatable upload of metadata and data files.
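To make the idea of a link plus a relationship type concrete, the following Python sketch builds a small descriptive record and serializes it to JSON. It is an illustration only: the field names, the example URL, and the `DepositRecord` and `RelatedItem` classes are hypothetical, and the relationship label "IsSupplementTo" is borrowed from the DataCite vocabulary rather than taken from the repository's own web form or forthcoming API.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class RelatedItem:
    """A link from a deposit to another work, plus how the two relate."""
    url: str
    relation: str  # hypothetical label; "IsSupplementTo" is a DataCite term


@dataclass
class DepositRecord:
    """Hypothetical descriptive metadata for a single deposit."""
    title: str
    creators: List[str]
    description: str
    related_items: List[RelatedItem] = field(default_factory=list)

    def relate(self, url: str, relation: str) -> None:
        """Record a link to related research and the type of relationship."""
        self.related_items.append(RelatedItem(url=url, relation=relation))

    def to_json(self) -> str:
        """Serialize the record, including nested related items, as JSON."""
        return json.dumps(asdict(self), indent=2)


# Placeholder values, not a real deposit.
record = DepositRecord(
    title="Example supporting dataset",
    creators=["Researcher, Example"],
    description="Model output needed to reproduce the figures in an article.",
)
record.relate("https://example.org/journal-article", "IsSupplementTo")
print(record.to_json())
```

A structured record of this kind is what makes automated upload practical, since a script can generate the same metadata that a depositor would otherwise type into a form.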
Abbot, an early adopter of Knowledge@UChicago, has made several deposits with his co-authors, including data needed to reproduce findings from “The Snowball Stratosphere,” code for a simple model of the evolution of melt pond coverage on permeable Arctic sea ice, and data for the atmospheric circulation and climate of terrestrial planets orbiting sun-like and M dwarf stars.
While there are other ways to share data and code, Abbot finds that the platform and functionalities offered through Knowledge@UChicago often serve him well. “They are very easy to work with, they are free, and they allow you to store a fairly large amount of data,” Abbot says. Knowledge@UChicago also conveniently issues a DOI, or digital object identifier, that can be used to cite the submission. An enthusiastic user, Abbot has recommended the digital repository to other faculty members.
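A DOI is citable because it can be turned into a persistent link through the public resolver at https://doi.org/. The short Python sketch below shows that conversion. The helper name `doi_to_url` and the DOI used here are hypothetical placeholders, not an identifier issued by Knowledge@UChicago.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a DOI, with or without a common prefix, into a resolver link."""
    cleaned = doi.strip()
    # Accept the forms that commonly appear in citations.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    # Every DOI begins with the directory indicator "10.".
    if not cleaned.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the "/" separator.
    return DOI_RESOLVER + quote(cleaned, safe="/")


# Placeholder DOI for illustration only.
print(doi_to_url("doi:10.1234/example-dataset"))
# prints https://doi.org/10.1234/example-dataset
```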
The value of Knowledge@UChicago is not limited to researchers in the physical sciences. “More and more often, medical grants are requesting that their awardees make their data available,” notes Sam Armato, Associate Professor of Radiology and the Committee on Medical Physics. “National Institutes of Health applications have a section for it.” The Library expects that faculty across the sciences, social sciences, and humanities will have data that can be readily shared on Knowledge@UChicago.
The Library, which runs Knowledge@UChicago in collaboration with IT Services, recently migrated the repository to the TIND digital platform. The migration, supported by capital funding, positions Knowledge@UChicago to better meet growing campus needs and interests around data management, sharing and preservation, open access, and reproducible research results. TIND is based on the Invenio open-source software originally developed at CERN, the European Organization for Nuclear Research, to manage its own digital outputs.
With the TIND migration, the Library has created simple workflows for repository use. Researchers with an active CNetID are able to log in to the repository and create a record using a web form. A member of the Library team reviews the submitted item to ensure that the files can be opened, that there are no privacy or copyright concerns, and that the metadata is sufficient to find and reuse the data.
At this time, the University of Chicago Library is able to accommodate datasets of up to about 5 GB. Researchers are encouraged to contact firstname.lastname@example.org to discuss accommodating larger datasets.