Introduction to Cloud Native & Analysis Ready Data Formats

Instructor(s):

Tyson Lee Swetnam PhD, Carlos Lizárraga-Celaya PhD

About

This website follows the FAIR and CARE data principles and hopes to help further open science.

Agenda

Lesson | Estimated Time to Complete | Link
Introduction to Cloud Native Data Types | 15 minutes | presentation
Hands on with GeoJSON | 30 minutes | GeoJSON.io
Hands on with Cloud Optimized GeoTIFF | 30 minutes | cogeo.org
Break | 10 minutes |
Hands on with XArray & Zarr | 30 minutes | Xarray, Zarr
Hands on with Cloud Optimized Point Clouds | 30 minutes | COPC
Hands on with Spatio-Temporal Asset Catalogs | 30 minutes | STAC
Summary and Conclusion | 5 minutes |

Pre-requisites

  • a laptop with an active wifi connection

helpful but not required

Why "cloud native"?

There is a shift happening in the way we use Earth Observation System data for research and management. Cloud data storage technologies have advanced at such a pace that we can now find and explore massive amounts of data from a web browser. At the same time, online platforms with specialized software and hardware offer general data science and machine learning tools for exploring these online datasets.

With these advances it is easier to foster collaborations, promote data-driven discovery, drive scientific innovation, increase transparency and improve reproducibility.

The old ways of receiving and working with GIS data.

Many of us have taken part in "sneaker net" and "mail order" data delivery, ordering data and managing transfers over physical media. These data are then processed on our workstations and laptops, and ultimately stored on external hard drives or uploaded back to national data services. For years, GIS data have also changed hands over conventional internet protocols (https://, ftp://, and the newer s3://), where datasets are preferentially DOWNLOADED to our local compute resources and worked on there.

"Cloud Native" means you are no longer looking to download all of your GIS data. Instead, we send our "code" and our execution tasks to the "Cloud", where the data are processed and served by a variety of commercial cloud providers that already host these large geospatial datasets (often free of cost to us). Results can be viewed in the browser, or streamed in reduced formats back to our local computers.

The Cloudy way

Cloud-native and "Analysis Ready Data" formats allow us to work with large datasets on the cloud easily and rather painlessly.

Open Architectures

The new approach to data sharing focuses on object storage rather than file downloads. This cloud platform approach is scalable: instead of moving data to processing systems near users, as is the tradition, it brings processing, computing, analytics and visualization to the data. These are so-called data-proximate workbench capabilities, sometimes also referred to as server-side processing.

(Open Architecture for scalable cloud-based data analytics. From Abernathey, Ryan (2020): Data Access Modes in Science.)
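
To make the contrast between file downloads and object storage concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: `FakeObjectStore`, `read_tile`, the tile size, and the toy "raster" are hypothetical names and values invented for this example, and the store lives in memory rather than on a real cloud service. A real object store would serve `get_range` through something like an HTTP `Range` request, which is the kind of partial read that cloud-optimized formats are designed around.

```python
from dataclasses import dataclass


@dataclass
class FakeObjectStore:
    """Hypothetical in-memory stand-in for a cloud object store (not a real API)."""

    blob: bytes
    bytes_served: int = 0  # running total of bytes "sent over the network"

    def get(self) -> bytes:
        """Conventional access: transfer the entire object."""
        self.bytes_served += len(self.blob)
        return self.blob

    def get_range(self, start: int, length: int) -> bytes:
        """Cloud-native access: transfer only the requested byte range."""
        chunk = self.blob[start:start + length]
        self.bytes_served += len(chunk)
        return chunk


# Toy "raster": 100 tiles of 1,000 bytes each; tile i is filled with byte value i.
# These sizes are assumptions chosen only to keep the arithmetic easy to follow.
TILE_SIZE = 1_000
N_TILES = 100
blob = b"".join(bytes([i]) * TILE_SIZE for i in range(N_TILES))


def read_tile(store: FakeObjectStore, index: int) -> bytes:
    """Fetch a single tile by computing its byte offset inside the object."""
    return store.get_range(index * TILE_SIZE, TILE_SIZE)


# Conventional workflow: download everything, then slice locally.
download_store = FakeObjectStore(blob)
whole_file = download_store.get()
tile_from_download = whole_file[42 * TILE_SIZE:43 * TILE_SIZE]

# Cloud-native workflow: ask the store for just the tile we need.
ranged_store = FakeObjectStore(blob)
tile_from_range = read_tile(ranged_store, 42)

print("same data:", tile_from_download == tile_from_range)
print("bytes moved by full download:", download_store.bytes_served)
print("bytes moved by ranged read:", ranged_store.bytes_served)
```

Both workflows end up with the same tile, but the ranged read moves only the bytes it needs. The hands-on lessons use real formats and services in place of this toy store.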


Light reading

Gentemann, C. L., et al. (2021). "Science Storms the Cloud". AGU Advances, 2, e2020AV000354. https://doi.org/10.1029/2020AV000354

Abernathey, R. P., et al. (2021). "Cloud-Native Repositories for Big Scientific Data". Computing in Science & Engineering, 23(2), 26-35. https://doi.org/10.1109/MCSE.2021.3059437


Last update: 2022-11-15