Software
Defining Openness
"Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness)." - The Open Definition
Licenses
Open Source Initiative Licenses
Scientific Support Systems
Digital Object Identifier (DOI) Org - is the registration authority for the ISO standard (ISO 26324) for the DOI system. The DOI system provides a technical and social infrastructure for the registration and use of persistent interoperable identifiers, called DOIs, for use on digital networks.
ORCID - unique digital ID for every researcher
Zenodo - EU funded project for DOIs
Zotero - open source publication and citation manager
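As a rough illustration of how the persistent identifiers listed above are used in practice, the sketch below turns a bare DOI into its `https://doi.org/` resolver URL and checks the final character of an ORCID iD, which is an ISO 7064 MOD 11-2 check digit. The function names, the DOI-shaped regular expression, and the placeholder Zenodo DOI are assumptions made for this example; they are not part of any official client library.

```python
import re

# Heuristic shape check only (an assumption for this sketch): "10.", a 4-9 digit
# registrant code, a slash, then a non-empty suffix without whitespace.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the https://doi.org resolver URL for a bare DOI string."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI-like string: {doi!r}")
    return f"https://doi.org/{doi}"


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length, digits, and check character of a hyphenated ORCID iD."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


# "10.5281/zenodo.1234567" is a made-up placeholder in Zenodo's DOI prefix.
example_url = doi_to_url("10.5281/zenodo.1234567")
```

A structural check like this only catches typos; it does not confirm that the DOI or ORCID iD is actually registered.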
Scientific Programming Languages
Bash - is the GNU Project's shell, the Bourne Again SHell.
C - is a general-purpose computer programming language.
C++ - is a general-purpose programming language.
- is a general-purpose, multi-paradigm programming language.
Fortran - designed from the ground up for computationally intensive applications in science and engineering.
- LFortran - is a modern open-source (BSD licensed) interactive Fortran compiler built on top of LLVM. It can execute user’s code interactively to allow exploratory work (much like Python, MATLAB or Julia) as well as compile to binaries with the goal to run user’s code on modern architectures such as multi-core CPUs and GPUs.
Go - is an open source programming language supported by Google.
HTML - is a markup language used for structuring and presenting content on the World Wide Web.
JavaScript - is the programming language of the Web.
JSON - JavaScript Object Notation is a lightweight data-interchange format.
GeoJSON - JSON extension for geospatial data
TYSON - Typed JSON extension
Julia - is a high-level, high-performance, dynamic programming language.
Make - is a build automation tool that automatically builds executable programs and libraries from source code by reading files called Makefiles which specify how to derive the target program.
CMake - is an open-source, cross-platform family of tools designed to build, test and package software.
Perl - is a high-level, general-purpose, interpreted, dynamic programming language.
Python - is a high-level, interpreted, general-purpose programming language.
R - is a programming language for statistical computing and graphics.
YAML - "YAML Ain't Markup Language" is a human-readable data serialization format commonly used for application configuration.
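To make the JSON and GeoJSON entries above more concrete, here is a minimal sketch that builds a GeoJSON `Feature` with a `Point` geometry and round-trips it through Python's standard `json` module. The helper name, the coordinates, and the `name` property are illustrative assumptions; the one rule worth noting is that GeoJSON stores positions as `[longitude, latitude]`.

```python
import json


def make_point_feature(lon: float, lat: float, properties: dict) -> dict:
    """Build a GeoJSON Feature holding a single Point geometry."""
    return {
        "type": "Feature",
        # GeoJSON positions are ordered [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


# Hypothetical sample values chosen only for the example.
feature = make_point_feature(-110.95, 32.23, {"name": "example site"})

# Serialize to JSON text, then parse it back into Python objects.
text = json.dumps(feature, indent=2)
restored = json.loads(text)
```

Because GeoJSON is plain JSON, any JSON parser can read it; geospatial libraries add meaning to the `geometry` member on top of that.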
Lightweight Markup Languages
Lightweight Markup Languages are designed to be human readable with minimal syntax.
Markdown - is a lightweight markup language for creating formatted text using a plain-text editor.
reStructuredText (RST) - is a file format for textual data used primarily in the Python programming language community for technical documentation.
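As a small illustration of how minimal the syntax of these two markup languages is, the sketch below renders the same section heading in each. The helper names are hypothetical; the conventions shown (leading `#` characters in Markdown, an underline at least as long as the title in reStructuredText) are the standard ones.

```python
def markdown_heading(text: str, level: int = 1) -> str:
    """Render an ATX-style Markdown heading, e.g. '## Title' for level 2."""
    if not 1 <= level <= 6:
        raise ValueError("Markdown heading levels run from 1 to 6")
    return "#" * level + " " + text


def rst_heading(text: str, underline: str = "=") -> str:
    """Render a reStructuredText heading: the title, then an underline of equal length."""
    return text + "\n" + underline * len(text)


md_example = markdown_heading("Software", level=2)
rst_example = rst_heading("Software")
```

In reStructuredText the heading level is inferred from the order in which underline characters first appear in a document, so the choice of `=` here is only a common convention.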
Operating Systems
Linux (UNIX)
Apple
Microsoft
Windows Subsystem for Linux 2 (WSL2)
Package Managers
Version Control
Continuous Integration
Composable Computing
Ansible - simple IT automation engine that automates cloud provisioning, configuration management, application deployment, intra-service orchestration, and many other IT needs.
Argo Workflows - is an open source container-native workflow engine for orchestrating parallel jobs on Kubernetes.
Terraform (HashiCorp) - leverages declarative configuration files for deploying infrastructure.
Vagrant (HashiCorp) - leverages a declarative configuration file which describes all your software requirements, packages, operating system configuration, users, and more.
Scientific Software
Artificial Intelligence
Large Language Models
BARD - Google's Large Language Model
LLaMA - Meta's LLaMA (Large Language Model Meta AI)
ChatGPT - OpenAI's Large Language Model
Generative AI
OpenAI - is an American artificial intelligence research laboratory consisting of the non-profit OpenAI Incorporated and its for-profit subsidiary corporation OpenAI Limited Partnership.
Model Libraries
Hugging Face - is a public library of pre-trained models and applications
Machine Learning / Computer Vision
Awesome Computer Vision - meta list of many awesome lists and other links to ML, AI, and computer vision maintained by jbhuang0604
Containers
Apptainer - (also see Singularity) container platform for HPC and Cloud; the Linux Foundation-hosted continuation of the Singularity project.
containerd - open and reliable container runtime featured in Kubernetes
Docker - widely used container platform for distributed computing.
Kubernetes - open source container orchestration platform, created by Google.
Singularity - containers for HPC and Cloud.
Container Registries
Docker Hub - Official Images for Docker
Amazon Elastic Container Registry - stores container images for use on AWS
Google Container Registry - stores container images for use on Google Cloud
Azure Container Registry - stores container images for use on Azure
NVIDIA GPU Cloud - containers for GPU computing
GitHub Container Registry - managed containers on GitHub
GitLab Container Registry - managed containers on GitLab
RedHat Quay.io - containers managed by RedHat
BioContainers Registry - bioinformatics containers
Productivity Software
CryptPad - online rich text pad.
Draw.io - drawings and diagrams in browser.
Excel - love it or hate it, many people still work in it or with .xlsx format files.
Google Docs - is an online word processor included as part of the free, web-based Google Docs Editors suite offered by Google.
HackMD - online markdown editor.
GitBook - create documentation using Git and Markdown
JupyterBook - create documentation using Jupyter Notebooks and Markdown
MkDocs - is a fast, simple and downright gorgeous static site generator that's geared towards building project documentation.
LaTeX - is a high-quality typesetting system
Overleaf - LaTeX online document sharing platform.
ReadTheDocs - documentation hosting using a variety of markup languages
Software Heritage - preserves software source code for present and future generations.
Project Management Software
Workflow Managers
Scientific Workflow Systems - are a critical component of scaling out computational analyses with big data.
Apache Airflow - provides many plug-and-play operators that are ready to execute your tasks on Google Cloud Platform, Amazon Web Services, Microsoft Azure, etc.
Dask Distributed - is a task scheduler for Dask in Python (Jupyter).
Makeflow - is a workflow system for executing large complex workflows on clusters, clouds, and grids.
Nextflow - enables scalable and reproducible scientific workflows using software containers.
Pegasus - project encompasses a set of technologies that help workflow-based applications execute in a number of different environments including desktops, campus clusters, grids, and clouds.
Snakemake - workflow management system is a tool to create reproducible and scalable data analyses.
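The workflow managers listed above share one core idea: a workflow is a directed acyclic graph of tasks, and a task runs only after the tasks it depends on have finished. The sketch below shows that idea using only the Python standard library (`graphlib`, Python 3.9+). It does not reflect the API of any tool in this list; the task names, the `run_workflow` helper, and the toy data are assumptions made for the example.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies):
    """Run each task after its dependencies; return the run order and results.

    tasks:        maps a task name to a callable that receives the results so far.
    dependencies: maps a task name to the set of task names it depends on.
    """
    order = []
    results = {}
    # static_order() yields every task after all of its predecessors.
    for name in TopologicalSorter(dependencies).static_order():
        results[name] = tasks[name](results)
        order.append(name)
    return order, results


# Hypothetical four-step analysis: download -> clean -> summarize -> report.
tasks = {
    "download": lambda r: [3, 1, 2],
    "clean": lambda r: sorted(r["download"]),
    "summarize": lambda r: sum(r["clean"]),
    "report": lambda r: f"n={len(r['clean'])} total={r['summarize']}",
}
dependencies = {
    "download": set(),
    "clean": {"download"},
    "summarize": {"clean"},
    "report": {"clean", "summarize"},
}

order, results = run_workflow(tasks, dependencies)
```

Real workflow systems build on this ordering with features such as parallel execution of independent tasks, retries, caching of completed steps, and dispatch to clusters or clouds.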