Welcome to OSN’s documentation!

The Open Storage Network (OSN) is a distributed data sharing and transfer service intended to facilitate exchanges of active scientific data sets between research organizations, communities and projects, providing easy access and high bandwidth delivery of large data sets to researchers.

The OSN serves two principal purposes: (1) enable the smooth flow of large data sets between resources such as instruments, campus data centers, national supercomputing centers, and cloud providers; and (2) facilitate access to long tail data sets by the scientific community. Examples of data currently available on the OSN include synthetic data from ocean models; the widely used Extracted Features Set from the Hathi Trust Digital Library; open access earth sciences data from Pangeo; and geophysical data from BCO-DMO. These data sets are being used by researchers to train machine learning models, validate simulations, and perform statistical analysis of live data.

Note

This documentation is under active development.

Contents

System Overview

OSN data is housed in storage pods interconnected by national, high-performance networks. The result is well-connected, cloud-like storage that is easily accessible at data transfer rates comparable to or exceeding those of public cloud storage providers. Users can temporarily park data for retrieval by a collaborator, or create a repository of active research data.

This user guide is designed for the following categories of OSN user:

  • End Users who wish to view metadata and retrieve data.

  • Data Curators who maintain data sets.

  • Data Managers who grant access to data sets for Curators and End Users.

Configuration

Key characteristics of OSN storage are:

  • Ability to access data from anywhere via a RESTful interface that follows S3 conventions

  • Federated identity management, allowing access to protected information with existing identity via InCommon or commercial services

  • High speed access and transfer via national research and education networks

  • Security and data integrity

OSN storage pods are located in science DMZs at Big Data Hub sites, interconnected by national, high-performance networks. Five petabytes of storage are currently available for allocation.

OSN Pod Deployment at six sites as of January, 2021

OSN Storage Pod

File Systems

OSN storage is disk-based and primarily intended to house active data sets. Storage is allocated from the pod(s) closest to the requestor with capacity to fulfill the request. Allocations of a minimum of 10 terabytes and a maximum of 50 terabytes can be requested through the XRAS process. If your project needs more than 50 terabytes, please contact the OSN team directly to discuss before you submit your request.

The OSN supports two types of data sets:

  1. Open Access Data Sets that are readable by anyone and writable by Curators and Data Managers.

  2. Protected Access Data Sets that are readable by invitation from a Data Manager and writable by Curators and Data Managers.

Every data set is a collection of objects that are individually and uniquely accessible from anywhere. For Open Access data sets, an S3 RESTful interface allows users to manipulate storage objects simply by issuing commands in the form of Uniform Resource Identifiers.
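As a minimal sketch of what addressing an object by URI can look like, the Python snippet below builds a URL for a single object. It assumes path-style addressing (location, then allocation, then object key), which is a common S3 convention but is not confirmed by this guide; the allocation name and object key are hypothetical.

```python
from urllib.parse import quote


def object_url(location, allocation, key):
    """Build a path-style URL for one object (assumed layout, see note above)."""
    base = location.rstrip("/")
    # quote() percent-encodes spaces and other unsafe characters,
    # but leaves "/" alone so key "folders" stay readable.
    return f"{base}/{quote(allocation)}/{quote(key)}"


# Hypothetical allocation and key, used only for illustration.
url = object_url("https://mghp.osn.xsede.org", "example-allocation", "ocean model/run 01.nc")
print(url)

# For an open access data set, the object could then be fetched with the
# standard library, for example:
#   import urllib.request
#   data = urllib.request.urlopen(url).read()
```

Because the URL alone identifies the object, the same string can be pasted into a browser or passed to any HTTP client.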

For Protected Access Data Sets, the user first obtains an access key which is then embedded into the access command. Examples of each are provided below.
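One way to supply an access key pair from a script is through an S3 client library. The sketch below uses the third-party boto3 package, which is not mentioned in this guide and is chosen here only as an illustration; the key values, allocation name, and object key are placeholders you would replace with your own.

```python
def s3_client_kwargs(location, access_key, secret_key):
    """Collect the settings an S3 client needs to talk to an OSN pod."""
    return {
        "endpoint_url": location.rstrip("/"),  # pod location, not an AWS endpoint
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }


def download_object(location, access_key, secret_key, allocation, key, dest):
    """Download one object from a protected data set to a local file."""
    import boto3  # third-party package; imported here so the helper above works without it

    s3 = boto3.client("s3", **s3_client_kwargs(location, access_key, secret_key))
    s3.download_file(allocation, key, dest)


# Hypothetical usage (placeholder credentials and names):
#   download_object("https://mghp.osn.xsede.org", "MY_ACCESS_KEY", "MY_SECRET_KEY",
#                   "example-allocation", "data/file.nc", "file.nc")
```

Keeping the key pair out of the source file (for example, reading it from environment variables) is generally safer than hard-coding it.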

Coming soon: Consistent with FAIR principles, every OSN data set will have a landing page that makes it easy to “visit” a data set from a browser, search engine, or data catalog. The landing page contains metadata that describes the data set, along with the links to preconfigured, downloadable tools for accessing the data.

An active research data set can remain in OSN storage for up to five years, and usage must comply with the OSN Acceptable Use Policy.

Allocations

Storage on the OSN is allocated in standalone buckets independent of HPC allocations. There is a one-to-one mapping between buckets and allocations. This User Guide uses “Allocation” when referring to outward-facing operations such as Allocation requests, and “Bucket” when referring to inward-facing operations such as Bucket creation.

OSN storage is allocated from the resources at the location(s) closest to the requestor with capacity to fulfill the request. Allocations of a minimum of 10 terabytes and a maximum of 50 terabytes, supporting up to 1.6 million files, can be requested through the XRAS process. Larger allocations can be accommodated with additional review. If your project needs more than 50 terabytes or more than 1.6 million files, please contact the OSN team directly to discuss before you submit your request.
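The limits above can be expressed as a quick pre-check before you prepare a request. This is only an illustrative sketch: the function name and return strings are made up for this example, and the actual review is carried out through the XRAS process and the OSN team.

```python
MIN_TERABYTES = 10        # smallest allocation that can be requested
MAX_TERABYTES = 50        # largest allocation without additional review
MAX_FILES = 1_600_000     # file count supported by a standard allocation


def request_path(terabytes, file_count):
    """Return which route a storage request would likely take."""
    if terabytes < MIN_TERABYTES:
        return "below minimum allocation"
    if terabytes > MAX_TERABYTES or file_count > MAX_FILES:
        return "contact OSN team first"
    return "standard XRAS request"


print(request_path(20, 500_000))     # standard XRAS request
print(request_path(80, 500_000))     # contact OSN team first
print(request_path(20, 2_000_000))   # contact OSN team first
```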

An active research dataset can remain in OSN storage up to five years.

Accessing Datasets

OSN supports a RESTful API that is compatible with the basic data access model of the Amazon S3 API. Any software that complies with that API can access data stored on the OSN.
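To illustrate what S3 compatibility means in practice, the sketch below builds a bucket listing request in the style of the S3 ListObjectsV2 operation and parses the XML that such a request returns. Whether a given OSN pod answers this exact request for a given data set is an assumption; the allocation name and the sample response are invented for the example, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace used by S3 listing responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def list_url(location, allocation, prefix=""):
    """Build a ListObjectsV2-style URL (path-style addressing assumed)."""
    query = urlencode({"list-type": "2", "prefix": prefix})
    return f"{location.rstrip('/')}/{allocation}?{query}"


def parse_listing(xml_text):
    """Return (key, size in bytes) pairs from a listing response."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("s3:Key", namespaces=S3_NS),
         int(item.findtext("s3:Size", namespaces=S3_NS)))
        for item in root.findall("s3:Contents", S3_NS)
    ]


# Invented sample response, trimmed to the fields used above.
SAMPLE_LISTING = """
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>example-allocation</Name>
  <Contents><Key>runs/a.nc</Key><Size>1024</Size></Contents>
  <Contents><Key>runs/b.nc</Key><Size>2048</Size></Contents>
</ListBucketResult>
"""

print(list_url("https://mghp.osn.xsede.org", "example-allocation", "runs/"))
print(parse_listing(SAMPLE_LISTING))
```

In day-to-day use an S3 client library or one of the tools described below handles these details, including pagination of long listings.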

There are three common methods for connecting to and using OSN resources:

  1. OSN portal built-in web tools

  2. Third party desktop applications (e.g. Cyberduck, Rclone)

  3. Third party data management server applications (e.g. Globus and iRODS)

OSN Portal Built-in Web Tools

The OSN portal (portal.osn.xsede.org) supports a simple UI that allows end users to browse allocations and to upload and download objects via the browser. This mode of access is most appropriate for browsing a dataset and uploading or downloading smaller files (typically under 100 GB).

To use the built-in browser, a user logs in to the OSN portal and clicks on one of the allocations that they have been granted access to. This brings the user to a searchable, sortable table listing of the allocation and its subdirectories. Clicking on any of the objects shown initiates a download of the object to the local disk.

To upload a file, the user locates the file on their local filesystem and drags the file to the browser window. This initiates an upload to the bucket location that the user is currently browsing.

OSN Basic Bucket Browser
OSN Basic Bucket Explorer

Third Party Applications

There are numerous commercial and open source software tools for moving files to and from S3 buckets. These tools provide more sophisticated capabilities than the built-in browser tool, including transfer management, multi-upload management, and configuration options that can help optimize data transfer for a given computer and network environment.

To use these tools, you will need a pair of keys that grant access to the buckets stored on OSN. To obtain them, contact your data manager, who will either give you the keys directly or create an account for you on the OSN portal. If you have a portal account with access to the keys, log in to the OSN portal to retrieve them; the allocations you have access to and their associated keys are listed on your home page.

OSN Portal User Home page

Note that the “Bucket” information displayed in the portal has two components (this will be important when you configure third party tools). The bucket information contains the OSN site/pod location and the specific allocation on that pod.
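As a small illustration of the two components, the sketch below splits a bucket string into its pod location and allocation name. It assumes the portal shows the value as a URL of the form location/allocation; that format is an assumption made for this example, and the allocation name is hypothetical.

```python
from urllib.parse import urlsplit


def split_bucket_info(bucket_info):
    """Split an assumed 'location/allocation' string into its two parts."""
    parts = urlsplit(bucket_info)
    location = f"{parts.scheme}://{parts.netloc}"   # OSN site/pod location
    allocation = parts.path.strip("/")              # allocation (bucket) on that pod
    return location, allocation


# Hypothetical value, as it might be copied from the portal.
print(split_bucket_info("https://mghp.osn.xsede.org/example-allocation"))
```

Third party tools typically ask for these two values in separate fields, which is why it helps to tell them apart.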

Cyberduck

Cyberduck is a popular file transfer tool that supports the S3 API. It is a “cloud storage browser” for Mac and Windows that supports multiple storage providers and protocols. The software may be downloaded from the Cyberduck Download Page. The following describes how to configure Cyberduck to connect to an OSN resource.

Using Cyberduck with OSN is straightforward.

  1. Visit the OSN portal to retrieve your Bucket location and allocation names (see image below)

  2. Visit the OSN portal and retrieve your allocation keys or retrieve them from the data manager for your project

  3. Open Cyberduck and select the bookmarks icon (see image below)

  4. Click the add icon at the bottom left of the screen to create the bookmark

  5. Edit the new bookmark to point at the desired OSN pod using your allocation key pair

OSN Portal Location and Allocation
Selecting the bookmarks page and adding new bookmark

When specifying the server, use the hostname portion of the location. For example, if the location is https://mghp.osn.xsede.org, the hostname is “mghp.osn.xsede.org”.

When specifying “Port”, use 443 if the location starts with “https://”; use 80 if the location starts with “http://”.
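The server and port rules above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name is made up for this example.

```python
from urllib.parse import urlsplit

# Port to enter in the bookmark for each location scheme.
PORT_FOR_SCHEME = {"https": 443, "http": 80}


def server_and_port(location):
    """Return the (server, port) pair to enter for a given pod location."""
    parts = urlsplit(location)
    if parts.scheme not in PORT_FOR_SCHEME:
        raise ValueError(f"location must start with http:// or https://: {location!r}")
    return parts.hostname, PORT_FOR_SCHEME[parts.scheme]


print(server_and_port("https://mghp.osn.xsede.org"))
```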

Adding OSN pod and user information to bookmark

Anonymous Access Data Sets

Some datasets provide anonymous read access; if you are accessing buckets anonymously, type “anonymous” into the Access ID portion and Cyberduck will then select the grayed out anonymous access box in the window.

Using anonymous access as your user

Close the bookmark window to save the bookmark.

Browsing, Uploading, and Downloading

Once a bookmark is created, you can use it to access data by double-clicking the bookmark. This logs you in and lists the contents of the dataset.

Note: If your buckets have large object counts, you will need to increase the timeout setting for connections.

Go to Preferences > Connection and set Timeout for opening connections (seconds) to 90.

Directory listing within bucket

Cyberduck is a full-fledged transfer client, so uploads and downloads of data sets can be performed easily from the desktop.

The tool supports multiple upload/download streams, chunking, pausing and restarting.
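To give a sense of what chunking involves, the sketch below splits an object into byte ranges that could each be requested separately with an HTTP Range header and then reassembled. It illustrates the general idea only and is not a description of how Cyberduck implements transfers; the sizes are arbitrary.

```python
def byte_ranges(size, chunk_size):
    """Split an object of `size` bytes into inclusive (start, end) ranges."""
    if size < 0 or chunk_size <= 0:
        raise ValueError("size must be >= 0 and chunk_size must be > 0")
    return [
        (start, min(start + chunk_size, size) - 1)
        for start in range(0, size, chunk_size)
    ]


def range_header(start, end):
    """HTTP header asking for one chunk; Range byte positions are inclusive."""
    return {"Range": f"bytes={start}-{end}"}


# A 10-byte object fetched in chunks of 4 bytes.
for start, end in byte_ranges(10, 4):
    print(range_header(start, end))
```

Because each range is independent, chunks can be fetched over several streams at once, and an interrupted transfer can resume by requesting only the missing ranges.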

Rclone
Rclone Configuration
Rclone Commands

Third Party Data Management

Landing Pages


Open Access & Protected Datasets

Managing Files and Data

Transfer Data to the OSN

Help