A combination of federal mandates and good research practice is driving increased interest in data publication capabilities. However, few tools are currently available to librarians, digital media managers, and others in campus organizations tasked with managing data publication. Typical approaches involve developing, installing, and configuring various software components and integrating them with existing campus identity and storage systems. This is a costly and time-consuming effort that few organizations can afford.

How It Works

Globus publication capabilities are delivered through a hosted service. Published data is stored on campus, institutional, and group resources that are often managed and operated by different administrators. To associate storage resources with a data collection, create Globus shared endpoints on those resources and link them to the data repository used for publication.

How Publication Works

Published datasets are organized by "communities" and their member "collections". For example, a national laboratory may have several member collections: Material Science, Computing, Environment and Life Sciences, to name a few. Often, collections will map to a department or group within an institution, but this is not required. Globus users can create and manage their own communities and collections through the data publication service. A collection enables the submission of datasets with policies regarding access.

A dataset comprises data and metadata. Policies can be set on communities or collections to manage:

  • Metadata (schema, requirements)
  • Access control (user and group based)
  • Curation workflow
  • Submission and distribution licenses
  • Storage
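
The relationship between communities, collections, and their policies can be pictured with a small data model. This is only an illustrative sketch, not the Globus publication API: the class names, field names, and sample values are hypothetical, and the rule that a collection-level setting overrides the community-level one is an assumption made for the example.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class Policy:
    """Hypothetical policy bundle; each field mirrors one item in the list above."""
    metadata_schema: Optional[str] = None
    allowed_groups: Optional[List[str]] = None
    curation_workflow: Optional[str] = None
    license: Optional[str] = None
    storage_endpoint: Optional[str] = None


@dataclass
class Community:
    name: str
    policy: Policy = field(default_factory=Policy)


@dataclass
class Collection:
    name: str
    community: Community
    policy: Policy = field(default_factory=Policy)

    def effective_policy(self) -> Policy:
        """Merge policies; assumed rule: collection settings override community settings."""
        merged = {}
        for f in fields(Policy):
            own = getattr(self.policy, f.name)
            inherited = getattr(self.community.policy, f.name)
            merged[f.name] = own if own is not None else inherited
        return Policy(**merged)


# Illustrative values only; none of these names come from a real deployment.
lab = Community("National Laboratory",
                Policy(metadata_schema="example-schema", license="example-license"))
materials = Collection("Material Science", lab,
                       Policy(storage_endpoint="example-shared-endpoint"))
effective = materials.effective_policy()
```

Here the collection sets only its storage location and picks up the metadata schema and license from its community, while settings that neither level defines stay unset.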

Datasets undergo curation based on a workflow defined by the community that will publish the data. Each community may customize its workflow to capture its specific metadata and to reflect its review process. After the dataset is published, it is discoverable using a faceted search that allows the researcher to progressively filter results and rapidly focus on the data of interest. The data may then be transferred to a Globus endpoint where the investigator can inspect and further process the data.
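
Progressive filtering in a faceted search can be illustrated in a few lines. This is a generic sketch of the technique, not the Globus search interface: the records, facet names, and helper functions below are hypothetical.

```python
from collections import Counter

# Hypothetical published-dataset records; titles, facets, and values are illustrative only.
DATASETS = [
    {"title": "Alloy fatigue tests", "collection": "Material Science", "year": 2015},
    {"title": "Thin film diffraction", "collection": "Material Science", "year": 2016},
    {"title": "Cluster job traces", "collection": "Computing", "year": 2016},
]


def facet_counts(records, facet):
    """Count how many records carry each value of the given facet."""
    return Counter(r[facet] for r in records)


def apply_filter(records, facet, value):
    """Keep only the records whose facet matches the chosen value."""
    return [r for r in records if r[facet] == value]


# Progressive filtering: narrow by collection first, then by year.
step1 = apply_filter(DATASETS, "collection", "Material Science")
step2 = apply_filter(step1, "year", 2016)
```

Calling `facet_counts` on the remaining records after each step gives the counts a search page would typically show next to each facet value, which is what lets a researcher decide where to narrow next.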

Data publication is a premium feature available with a Globus Subscription.

Case Study

The Materials Data Facility (MDF) is a Globus Labs project working to make it easier for researchers to publish and discover materials science data, with a special emphasis on supporting heterogeneous data types, widely distributed data sources, and dataset sizes ranging from kilobytes to terabytes. In its first year, the project has made over 7 TB of materials data available to the community, with individual datasets reaching up to 1.5 TB and more than 1 million files. In the coming year, MDF also expects to release a search index that allows researchers to discover data across various community repositories and search deeply into datasets. To achieve these results, we leverage and build upon a variety of Globus services, including data publication, data search, transfer, and Globus Auth.