# Data Analysis Workflow

## Overview
This page outlines the data analysis workflow at Orbital Kitchens, from data ingestion through analysis and reporting.
## Data Architecture

### Data Warehouse Structure

Our Snowflake data warehouse follows a three-tier medallion architecture:

- **Bronze Layer** - Raw, unprocessed data from source systems
- **Silver Layer** - Cleaned and validated data with basic transformations
- **Gold Layer** - Business-level aggregations and analytics-ready datasets
For detailed information about our data design, pipelines, and custom logic, refer to the Data Design Document.
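To make the layer boundaries concrete, here is a minimal sketch of a bronze-to-silver promotion using `snowflake-connector-python`. The `BRONZE.RAW_ORDERS` and `SILVER.ORDERS` tables and their columns are hypothetical placeholders, not our actual schema:

```python
# Hypothetical bronze-to-silver promotion via snowflake-connector-python.
# Table and column names are placeholders, not the production schema.
import snowflake.connector

conn = snowflake.connector.connect(
    account="YOUR_ACCOUNT",
    user="YOUR_USER",
    password="YOUR_PASSWORD",
    warehouse="ANALYTICS_WH",
)

# Silver = bronze rows that are typed, trimmed, validated, and deduplicated.
conn.cursor().execute("""
    CREATE OR REPLACE TABLE SILVER.ORDERS AS
    SELECT
        order_id,
        TRY_TO_TIMESTAMP(ordered_at_raw)       AS ordered_at,
        TRIM(UPPER(location_code))             AS location_code,
        TRY_TO_NUMBER(order_total_raw, 10, 2)  AS order_total
    FROM BRONZE.RAW_ORDERS
    WHERE order_id IS NOT NULL
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY order_id ORDER BY _loaded_at DESC
    ) = 1
""")
conn.close()
```

The `QUALIFY` clause keeps only the most recently loaded copy of each order, which is exactly the kind of basic transformation the silver layer exists for.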
## Tools and Platforms

### Data Ingestion - Airflow
- **Platform:** Airflow
- **Hosting:** GCP VM with Docker containers
- **Code Repository:** GitHub Repo
- ⚠️ **Note:** the GitHub repo is NOT automatically synced with the production code on the VM
    - Updates must be made in both places manually
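For orientation, a minimal ingestion DAG might look like the sketch below. This assumes Airflow 2.x; the `dag_id`, schedule, and `extract_orders` body are hypothetical, not the production code:

```python
# Minimal sketch of an ingestion DAG (Airflow 2.x style). The dag_id,
# schedule, and extract_orders body are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_orders() -> None:
    """Pull data from a source system and land it in the bronze layer."""
    ...  # e.g., call the source API, then COPY INTO a bronze table


with DAG(
    dag_id="ingest_orders",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="extract_orders", python_callable=extract_orders)
```

Remember the sync caveat above: any change to a DAG like this must be committed to GitHub and deployed to the VM separately.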
### Data Warehousing - Snowflake

All data storage, processing, and analysis happen in Snowflake.
#### 1. DBeaver / SnowSight (SQL IDE)

**Purpose:** Sandbox environment and everyday SQL queries

**Common Use Cases:**
- Checking for missing item IDs
- Identifying missing review data
- Finding duplicate row numbers in Acumatica tables
- Data quality checks and validation
- Quick updates
- Core DDL management
**Best Practices:**

- All DDL is version controlled in GitHub
- Use a service account with key-pair authentication for local development (see the sketch below)
- Focus on maintaining data quality and validation
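A sketch of the key-pair connection for local development, following the standard `snowflake-connector-python` pattern. The account, user, and key path are placeholders:

```python
import snowflake.connector
from cryptography.hazmat.primitives import serialization

# Load the service account's private key; the path is a placeholder.
with open("/path/to/rsa_key.p8", "rb") as key_file:
    private_key = serialization.load_pem_private_key(key_file.read(), password=None)

# The connector expects the key as DER-encoded PKCS#8 bytes.
private_key_bytes = private_key.private_bytes(
    encoding=serialization.Encoding.DER,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
)

conn = snowflake.connector.connect(
    account="YOUR_ACCOUNT",
    user="SVC_ANALYTICS",  # hypothetical service account name
    private_key=private_key_bytes,
)
print(conn.cursor().execute("SELECT CURRENT_USER()").fetchone())
```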
#### 2. Snowflake Notebooks

**Purpose:** Production-level data manipulation and analysis

**Key Features:**
- Think of notebooks as advanced stored procedures
- Reference cell output in later cells
- Easy debugging capabilities
- Task scheduling
- Switch between Python and SQL seamlessly
**Common Use Cases:**

- Data transformations and manipulations
- Forecasting using `SNOWFLAKE.ML.FORECAST` (see the sketch after this list)
- Complex analytical workflows
- Automated data processing pipelines
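As a sketch of the forecasting use case: in a notebook these would normally be SQL cells, but they are shown here as SQL executed from Python so the example is self-contained. `GOLD.DAILY_SALES` and its columns are hypothetical:

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="YOUR_ACCOUNT",
    user="YOUR_USER",
    password="YOUR_PASSWORD",
)
cur = conn.cursor()

# Train a forecast model on a timestamp/target pair from the gold layer.
cur.execute("""
    CREATE OR REPLACE SNOWFLAKE.ML.FORECAST daily_sales_model(
        INPUT_DATA => TABLE(GOLD.DAILY_SALES),
        TIMESTAMP_COLNAME => 'SALES_DATE',
        TARGET_COLNAME => 'NET_SALES'
    )
""")

# Produce a 14-day forecast; the result includes prediction bounds.
cur.execute("CALL daily_sales_model!FORECAST(FORECASTING_PERIODS => 14)")
for row in cur.fetchall():
    print(row)
```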
#### 3. Streamlit Apps

**Purpose:** Internal tools for data management

**Access & Security:**

- Share links with role-based access control
- Currently used by the VA team

**Key Application:**

- **Item Change Log App:** Automatically manages master IDs, item status, and item brand mappings
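A minimal sketch in the spirit of such an app, assuming it runs as a Streamlit app inside Snowflake (where `get_active_session()` supplies credentials). The table, columns, and form fields are hypothetical, not the real Item Change Log App:

```python
import streamlit as st
from snowflake.snowpark.context import get_active_session

st.title("Item Change Log")

item_id = st.text_input("Item ID")
new_status = st.selectbox("New status", ["active", "inactive", "discontinued"])

if st.button("Submit change"):
    # In Streamlit in Snowflake, get_active_session() returns the current
    # Snowpark session, so no credentials live in the app code.
    session = get_active_session()
    session.sql(
        "INSERT INTO OPS.ITEM_CHANGE_LOG (item_id, new_status) VALUES (?, ?)",
        params=[item_id, new_status],
    ).collect()
    st.success(f"Logged status change for item {item_id}")
```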
#### 4. Semantic Views & Cortex Analyst

**Purpose:** AI-powered data analysis and natural language querying

**Implementation:**

- Create semantic views using Cortex Analyst
- Edit via the visual editor or a YAML file
- One semantic view per data segment:
    - Otter data
    - ERP data
    - Customer data
    - Payroll data

**Deployment:**

- Agents deployed in Snowflake Intelligence
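For reference, a natural-language question can also be sent to Cortex Analyst over REST. The endpoint path and payload below follow Snowflake's documented `/api/v2/cortex/analyst/message` shape at the time of writing, but treat both as assumptions to verify against the current docs; the stage path and question are hypothetical:

```python
import requests

# Endpoint and payload shape follow Snowflake's Cortex Analyst REST docs
# at the time of writing; verify both before relying on this sketch.
resp = requests.post(
    "https://YOUR_ACCOUNT.snowflakecomputing.com/api/v2/cortex/analyst/message",
    headers={
        "Authorization": "Bearer <token>",
        "Content-Type": "application/json",
    },
    json={
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What were net sales last week?"}
                ],
            }
        ],
        # Hypothetical stage path to one of the per-segment semantic models.
        "semantic_model_file": "@ANALYTICS.SEMANTIC.MODELS/otter_data.yaml",
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```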
### Business Intelligence - Sigma

- **Platform:** Sigma Computing
- **Purpose:** Data reporting and visualization
- **Data Source:** Connects directly to the Snowflake Gold layer
### Operational Databases

#### Supabase (PostgreSQL)

- **Purpose:** Operational database for software tools
- **Use Case:** Backend database for internal applications
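A sketch of how an internal tool might connect, assuming SQLAlchemy with the psycopg2 driver; the host, project ref, and credentials are placeholders taken from the Supabase dashboard:

```python
from sqlalchemy import create_engine

# Supabase exposes a standard Postgres connection string; everything in
# this URL is a placeholder from the project dashboard.
engine = create_engine(
    "postgresql+psycopg2://USER:PASSWORD@db.YOUR_PROJECT.supabase.co:5432/postgres"
)

with engine.connect() as conn:
    print(conn.exec_driver_sql("SELECT 1").scalar())
```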
## Backend SQL Best Practices

### When to Use Raw SQL

Use raw SQL queries for:
- Complex logic requiring CTEs
- Multi-table joins
- Advanced analytical queries
- Performance-critical operations
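A sketch of the raw-SQL style, using SQLAlchemy's `text()` so the query still runs through the shared engine. The `orders` and `locations` tables are hypothetical:

```python
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://USER:PASSWORD@HOST:5432/postgres")

# Weekly revenue per location, expressed with a CTE and a join: the kind
# of query that reads more clearly as raw SQL than as chained ORM calls.
query = text("""
    WITH recent_orders AS (
        SELECT location_id, order_total
        FROM orders
        WHERE ordered_at >= now() - interval '7 days'
    )
    SELECT l.name, COUNT(*) AS order_count, SUM(o.order_total) AS revenue
    FROM recent_orders o
    JOIN locations l ON l.id = o.location_id
    GROUP BY l.name
    ORDER BY revenue DESC
""")

with engine.connect() as conn:
    for row in conn.execute(query):
        print(row.name, row.order_count, row.revenue)
```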
### When to Use SQLAlchemy ORM

Use the SQLAlchemy ORM for:
- Simple CRUD operations
- Single-table queries
- Basic filtering and sorting
- Standard application queries
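And the equivalent ORM style for a simple single-table query, using 2.0-style SQLAlchemy; the `Item` model is hypothetical:

```python
from sqlalchemy import String, create_engine, select
from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column


class Base(DeclarativeBase):
    pass


class Item(Base):
    """Hypothetical single-table model for illustration."""

    __tablename__ = "items"

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(100))
    status: Mapped[str] = mapped_column(String(20), default="active")


engine = create_engine("postgresql+psycopg2://USER:PASSWORD@HOST:5432/postgres")

with Session(engine) as session:
    # Basic filtering and sorting: active items, newest id first.
    stmt = select(Item).where(Item.status == "active").order_by(Item.id.desc())
    for item in session.scalars(stmt):
        print(item.id, item.name)
```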
## Workflow Summary

```text
Data Sources
      ↓
Airflow (GCP VM)
      ↓
Snowflake Bronze Layer
      ↓
Snowflake Silver Layer
      ↓
Snowflake Gold Layer
      ↓
 ├─→ Sigma (BI Reporting)
 ├─→ Snowflake Notebooks (Analysis)
 ├─→ Streamlit Apps (Internal Tools)
 └─→ Cortex Analyst (AI Querying)
```
## Quick Reference
| Task | Tool | Environment |
|---|---|---|
| Daily SQL queries | DBeaver/SnowSight | Development/Sandbox |
| Data quality checks | DBeaver/SnowSight | Development/Sandbox |
| Production data manipulation | Snowflake Notebooks | Production |
| Forecasting | Snowflake Notebooks | Production |
| Business reporting | Sigma | Production |
| Master data management | Streamlit Apps | Production |
| Natural language queries | Cortex Analyst | Production |
| Data ingestion | Airflow | Production |