Data Management¶
Tools and practices for organizing, storing, backing up, and sharing research data in wildland fire science.
Cloud Storage¶
OneDrive - Microsoft's cloud storage with Office 365 integration. 🆓 5GB free, 💰 1TB with Microsoft 365 subscription.
Google Drive - Cloud storage with Google Workspace integration. 🆓 15GB free, 💰 100GB+ paid plans.
Dropbox - Cloud storage with strong sync capabilities and file versioning. 🆓 2GB free, 💰 2TB+ paid plans.
Box - Enterprise cloud storage with advanced security and compliance features. 💰 Paid plans, free accounts for some educational institutions.
Nextcloud - Self-hosted open-source cloud storage and collaboration platform. 🆓 Free (self-hosted).
Backup Solutions¶
Backblaze - Unlimited cloud backup for computers with automatic continuous backup. 💰 $7/mo per computer.
Arq Backup - Backup software that stores to your choice of cloud provider (AWS, Google Cloud, etc.). 💰 $50 one-time + cloud storage costs.
Duplicati - Free open-source backup software supporting multiple cloud storage backends. 🆓 Free.
Time Machine (macOS) - Built-in macOS backup to external drives. 🆓 Free with macOS.
Acronis Cyber Protect - Enterprise backup and recovery with ransomware protection. 💰 Paid plans.
3-2-1 Backup Rule
Keep 3 copies of your data, on 2 different types of media, with 1 copy offsite. This protects against hardware failure, disasters, and accidental deletion. Critical research data should always follow this rule.
Large File Transfer¶
Globus - High-performance file transfer for research data, optimized for large datasets and institutional repositories. 🆓 Free for researchers.
WeTransfer - Send files up to 2GB free, 200GB with Pro. Simple browser-based transfer. 🆓 Free up to 2GB, 💰 Pro for larger files.
Send Anywhere - Direct device-to-device file transfer without cloud storage. 🆓 Free with limits, 💰 paid plans.
Dropbox Transfer - Send up to 2GB free, 100GB with paid plans, files expire after set time. 💰 Paid plans for large files.
Aspera - Ultra-fast file transfer for scientific data, used by NASA and other agencies. 💰 Enterprise licensing.
Metadata & Documentation¶
Tropy - Research photo management with metadata organization for field photos and historical documents. 🆓 Free.
Zotero - Reference manager with PDF storage and annotation, excellent for research papers. 🆓 Free with 300MB storage, 💰 paid storage plans.
Mendeley - Reference manager with collaboration features and built-in PDF reader. 🆓 Free with 2GB storage.
Obsidian - Local-first markdown note-taking with linking for building personal knowledge bases. 🆓 Free, 💰 optional cloud sync.
Notion - All-in-one workspace for notes, databases, and project management. 🆓 Free personal use, 💰 team plans.
README Files - Best practices for documenting code and data with README files in your repositories.
Data Repositories & Archiving¶
Zenodo - General-purpose open repository for research data, software, and publications with DOI assignment. 🆓 Free, up to 50GB per dataset.
Figshare - Research data repository with visualizations, unlimited public storage. 🆓 Free public storage.
Dryad - Curated repository for scientific datasets, often required by journals. 💰 Submission fees apply.
Open Science Framework (OSF) - Free project management and data sharing for research with DOI minting. 🆓 Free.
GitHub Releases - Version and archive code with DOI integration via Zenodo. 🆓 Free for public repositories.
Environmental Data Initiative (EDI) - Repository for ecological and environmental research data. 🆓 Free.
Data Organization Best Practices¶
Folder Structure
Use consistent, hierarchical folder structures:
ProjectName/
├── data/
│ ├── raw/ # Original, unmodified data
│ ├── processed/ # Cleaned and processed data
│ └── metadata/ # Data documentation
├── code/ # Analysis scripts
├── results/ # Outputs, figures, tables
├── documents/ # Papers, reports
└── README.md # Project documentation
File Naming Conventions
Use descriptive, consistent file names with dates:
- Use ISO date format: YYYYMMDD or YYYY-MM-DD
- Avoid spaces (use underscores or hyphens): fire_perimeter_20230615.shp
- Include version numbers: analysis_v02.py
- Be descriptive: modis_burned_area_california_2020.tif
Version Control
Use Git for code and text files. Don't version control large binary files directly; instead use Git LFS or store in data repositories with version tags.
Data Formats¶
Recommended Open Formats¶
- Tabular data: CSV, Apache Parquet (for large datasets)
- Geospatial vector: GeoJSON, GeoPackage (
.gpkg), Shapefile (.shp) - Geospatial raster: GeoTIFF (
.tif), Cloud Optimized GeoTIFF (COG), NetCDF (.nc) - Point clouds: LAS, LAZ (compressed)
- Documents: Markdown (
.md), plain text (.txt), PDF/A (archival) - Images: JPEG, PNG, TIFF (uncompressed for archival)
Avoid Proprietary Formats When Possible¶
While software like Excel (.xlsx) and ArcGIS (.mxd, .lyrx) are common, save copies in open formats for long-term accessibility and interoperability.
Data Security & Privacy¶
Sensitive Data
- Never commit passwords, API keys, or credentials to Git repositories
- Use
.gitignoreto exclude sensitive files - Encrypt sensitive data at rest and in transit
- Follow institutional IRB and data protection policies
- Check FAIR and CARE data principles for ethical sharing
License Your Data
Apply open licenses to your data and code: - Creative Commons (CC BY 4.0) for data and documents - MIT or Apache 2.0 for code - CC0 (Public Domain) for maximum openness
See Choose a License for guidance.
Training Resources¶
FAIR Data Principles - Make data Findable, Accessible, Interoperable, Reusable.
CARE Principles - Collective Benefit, Authority to Control, Responsibility, Ethics for Indigenous data.
Good Enough Practices in Scientific Computing - Practical advice for data management and reproducible research.
Data Management Planning - Tool for creating data management plans required by funding agencies.