DataLake Native
Lightweight, file-based data lake with modern web dashboard and dual deployment architecture
Dual Deployment Architecture
Choose your deployment strategy: Docker for local development, AWS for production
Local Development
Docker Native solution for testing and development
AWS Production
Terraform-managed infrastructure with Prefect orchestration
Key Features
Modern data platform with flexible deployment options
File-Based Storage
Zero database dependencies with JSON file storage, S3-compatible and cloud-ready
Real-Time Dashboard
Modern web interface with glassmorphism design, live metrics, and data visualization
Job Orchestration
Built-in scheduler with execution tracking, error handling, and manual triggers
Weather Pipeline
Example data collection pipeline for meteorological data with OpenWeatherMap integration
Responsive Design
Mobile-friendly interface that adapts to both desktop and mobile screens
Production Ready
Terraform infrastructure, GitHub Actions CI/CD, and comprehensive monitoring
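To make the Job Orchestration feature above more concrete, here is a minimal Python sketch of execution tracking with error handling. It is an illustration only: the `run_job` helper, the JSON Lines log format, and the field names are assumptions, not the project's actual scheduler code.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


def run_job(name: str, func: Callable[[], object], log_dir: str = "data/logs") -> dict:
    """Run func once and append the outcome to a JSON Lines log (hypothetical helper)."""
    entry = {"job": name, "started_at": datetime.now(timezone.utc).isoformat()}
    try:
        func()
        entry["status"] = "success"
    except Exception as exc:
        # Record the failure instead of letting it crash the caller.
        entry["status"] = "failed"
        entry["error"] = f"{type(exc).__name__}: {exc}"
    entry["finished_at"] = datetime.now(timezone.utc).isoformat()

    # Assumed layout: one append-only log file per job name.
    log_path = Path(log_dir)
    log_path.mkdir(parents=True, exist_ok=True)
    with open(log_path / f"{name}.jsonl", "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry
```

A manual trigger would then amount to calling `run_job("weather", collect_weather)` directly, where `collect_weather` stands in for whatever callable the pipeline exposes.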
Architecture Overview
Simple, scalable, and production-ready architecture
Data Flow Pipeline
Storage Structure
data/raw/weather/ - Raw weather data (JSON)
data/processed/ - Processed datasets
data/logs/ - Job execution logs
data/metrics/ - Calculated metrics
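As a rough illustration of the layout above, the following Python sketch writes one raw weather record as a JSON file under data/raw/weather/. The `write_raw_record` helper, the timestamp-based file name, and the record fields are assumptions made for the example and may not match the project's own code.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_raw_record(record: dict, base_dir: str = "data") -> Path:
    """Write one record as JSON under <base_dir>/raw/weather/ (hypothetical helper)."""
    target_dir = Path(base_dir) / "raw" / "weather"
    target_dir.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme: one file per record, keyed by a UTC timestamp.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    path = target_dir / f"weather_{stamp}.json"
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path
```

Because each record is a plain file, the same directory tree can later be mirrored to an S3-compatible bucket without changing the record format.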
Technology Stack
Deployment Options
Choose the right deployment for your needs
Docker Local
Perfect for: Development, testing, demos
Setup: docker-compose up -d
Features: File-based storage, instant startup
AWS Production
Perfect for: Production, scaling, enterprise
Setup: terraform apply
Features: EC2, S3, IAM, auto-scaling