# Data Analysis Workflow

## Overview
This page outlines the data analysis workflow at Orbital Kitchens, from data ingestion through analysis and reporting.
## Data Architecture

### Data Warehouse Structure

Our Snowflake data warehouse follows a three-tier medallion architecture:

- **Bronze Layer** - Raw, unprocessed data from source systems
- **Silver Layer** - Cleaned and validated data with basic transformations
- **Gold Layer** - Business-level aggregations and analytics-ready datasets
For detailed information about our data design, pipelines, and custom logic, refer to the Data Design Document.
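To make the layer boundaries concrete, here is a minimal sketch of a bronze-to-silver promotion using `snowflake-connector-python`. The `BRONZE.RAW_ORDERS` and `SILVER.ORDERS` tables and their columns are hypothetical placeholders, not our actual schema:

```python
# Hypothetical bronze-to-silver promotion via snowflake-connector-python.
# Table and column names are placeholders, not the production schema.
import snowflake.connector

conn = snowflake.connector.connect(
    account="YOUR_ACCOUNT",
    user="YOUR_USER",
    password="YOUR_PASSWORD",
    warehouse="ANALYTICS_WH",
)

# Silver = bronze rows that are typed, trimmed, validated, and deduplicated.
conn.cursor().execute("""
    CREATE OR REPLACE TABLE SILVER.ORDERS AS
    SELECT
        order_id,
        TRY_TO_TIMESTAMP(ordered_at_raw)       AS ordered_at,
        TRIM(UPPER(location_code))             AS location_code,
        TRY_TO_NUMBER(order_total_raw, 10, 2)  AS order_total
    FROM BRONZE.RAW_ORDERS
    WHERE order_id IS NOT NULL
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY order_id ORDER BY _loaded_at DESC
    ) = 1
""")
conn.close()
```

The `QUALIFY` clause keeps only the most recently loaded copy of each order, which is exactly the kind of basic transformation the silver layer exists for.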
## Tools and Platforms

### Data Ingestion - Airflow
- **Platform:** Airflow
- **Hosting:** GCP VM with Docker containers
- **Code Repository:** GitHub Repo
- ⚠️ **Note:** the GitHub repo is NOT automatically synced with the production code on the VM
    - Updates must be made in both places manually
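For orientation, a minimal ingestion DAG might look like the sketch below. This assumes Airflow 2.x; the `dag_id`, schedule, and `extract_orders` body are hypothetical, not the production code:

```python
# Minimal sketch of an ingestion DAG (Airflow 2.x style). The dag_id,
# schedule, and extract_orders body are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_orders() -> None:
    """Pull data from a source system and land it in the bronze layer."""
    ...  # e.g., call the source API, then COPY INTO a bronze table


with DAG(
    dag_id="ingest_orders",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="extract_orders", python_callable=extract_orders)
```

Remember the sync caveat above: any change to a DAG like this must be committed to GitHub and deployed to the VM separately.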
### Data Warehousing - Snowflake

All data storage, processing, and analysis happen in Snowflake.
#### 1. DBeaver / SnowSight (SQL IDE)

**Purpose:** Sandbox environment and everyday SQL queries

**Common Use Cases:**
- Checking for missing item IDs
- Identifying missing review data
- Finding duplicate row numbers in Acumatica tables
- Data quality checks and validation
- Quick updates
- Core DDL management
**Best Practices:**

- All DDL is version controlled in GitHub
- Use a service account with key-pair authentication for local development (see the sketch below)
- Focus on maintaining data quality and validation
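A sketch of the key-pair connection for local development, following the standard `snowflake-connector-python` pattern. The account, user, and key path are placeholders:

```python
import snowflake.connector
from cryptography.hazmat.primitives import serialization

# Load the service account's private key; the path is a placeholder.
with open("/path/to/rsa_key.p8", "rb") as key_file:
    private_key = serialization.load_pem_private_key(key_file.read(), password=None)

# The connector expects the key as DER-encoded PKCS#8 bytes.
private_key_bytes = private_key.private_bytes(
    encoding=serialization.Encoding.DER,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
)

conn = snowflake.connector.connect(
    account="YOUR_ACCOUNT",
    user="SVC_ANALYTICS",  # hypothetical service account name
    private_key=private_key_bytes,
)
print(conn.cursor().execute("SELECT CURRENT_USER()").fetchone())
```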
#### 2. Snowflake Notebooks

**Purpose:** Production-level data manipulation and analysis

**Key Features:**
- Think of notebooks as advanced stored procedures
- Reference cell output in later cells
- Easy debugging capabilities
- Task scheduling
- Switch between Python and SQL seamlessly
**Common Use Cases:**

- Data transformations and manipulations
- Forecasting using `SNOWFLAKE.ML.FORECAST` (see the sketch after this list)
- Complex analytical workflows
- Automated data processing pipelines
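As a sketch of the forecasting use case: in a notebook these would normally be SQL cells, but they are shown here as SQL executed from Python so the example is self-contained. `GOLD.DAILY_SALES` and its columns are hypothetical:

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="YOUR_ACCOUNT",
    user="YOUR_USER",
    password="YOUR_PASSWORD",
)
cur = conn.cursor()

# Train a forecast model on a timestamp/target pair from the gold layer.
cur.execute("""
    CREATE OR REPLACE SNOWFLAKE.ML.FORECAST daily_sales_model(
        INPUT_DATA => TABLE(GOLD.DAILY_SALES),
        TIMESTAMP_COLNAME => 'SALES_DATE',
        TARGET_COLNAME => 'NET_SALES'
    )
""")

# Produce a 14-day forecast; the result includes prediction bounds.
cur.execute("CALL daily_sales_model!FORECAST(FORECASTING_PERIODS => 14)")
for row in cur.fetchall():
    print(row)
```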
#### 3. Streamlit Apps

**Purpose:** Internal tools for data management

**Access & Security:**

- Share links with role-based access control
- Currently used by the VA team

**Key Application:**

- **Item Change Log App:** Automatically manages master IDs, item status, and item brand mappings
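A minimal sketch in the spirit of such an app, assuming it runs as a Streamlit app inside Snowflake (where `get_active_session()` supplies credentials). The table, columns, and form fields are hypothetical, not the real Item Change Log App:

```python
import streamlit as st
from snowflake.snowpark.context import get_active_session

st.title("Item Change Log")

item_id = st.text_input("Item ID")
new_status = st.selectbox("New status", ["active", "inactive", "discontinued"])

if st.button("Submit change"):
    # In Streamlit in Snowflake, get_active_session() returns the current
    # Snowpark session, so no credentials live in the app code.
    session = get_active_session()
    session.sql(
        "INSERT INTO OPS.ITEM_CHANGE_LOG (item_id, new_status) VALUES (?, ?)",
        params=[item_id, new_status],
    ).collect()
    st.success(f"Logged status change for item {item_id}")
```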
#### 4. Semantic Views & Cortex Analyst

**Purpose:** AI-powered data analysis and natural language querying

**Implementation:**

- Create semantic views using Cortex Analyst
- Edit via the visual editor or a YAML file
- One semantic view per data segment:
    - Otter data
    - ERP data
    - Customer data
    - Payroll data

**Deployment:**

- Agents deployed in Snowflake Intelligence
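For reference, a natural-language question can also be sent to Cortex Analyst over REST. The endpoint path and payload below follow Snowflake's documented `/api/v2/cortex/analyst/message` shape at the time of writing, but treat both as assumptions to verify against the current docs; the stage path and question are hypothetical:

```python
import requests

# Endpoint and payload shape follow Snowflake's Cortex Analyst REST docs
# at the time of writing; verify both before relying on this sketch.
resp = requests.post(
    "https://YOUR_ACCOUNT.snowflakecomputing.com/api/v2/cortex/analyst/message",
    headers={
        "Authorization": "Bearer <token>",
        "Content-Type": "application/json",
    },
    json={
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What were net sales last week?"}
                ],
            }
        ],
        # Hypothetical stage path to one of the per-segment semantic models.
        "semantic_model_file": "@ANALYTICS.SEMANTIC.MODELS/otter_data.yaml",
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```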
### Business Intelligence - Sigma

- **Platform:** Sigma Computing
- **Purpose:** Data reporting and visualization
- **Data Source:** Connects directly to the Snowflake Gold layer
### Operational Databases

#### Supabase (PostgreSQL)

- **Purpose:** Operational database for software tools
- **Use Case:** Backend database for internal applications
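A sketch of how an internal tool might connect, assuming SQLAlchemy with the psycopg2 driver; the host, project ref, and credentials are placeholders taken from the Supabase dashboard:

```python
from sqlalchemy import create_engine

# Supabase exposes a standard Postgres connection string; everything in
# this URL is a placeholder from the project dashboard.
engine = create_engine(
    "postgresql+psycopg2://USER:PASSWORD@db.YOUR_PROJECT.supabase.co:5432/postgres"
)

with engine.connect() as conn:
    print(conn.exec_driver_sql("SELECT 1").scalar())
```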
## Backend SQL Best Practices

### When to Use Raw SQL

Use raw SQL queries for:
- Complex logic requiring CTEs
- Multi-table joins
- Advanced analytical queries
- Performance-critical operations
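A sketch of the raw-SQL style, using SQLAlchemy's `text()` so the query still runs through the shared engine. The `orders` and `locations` tables are hypothetical:

```python
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://USER:PASSWORD@HOST:5432/postgres")

# Weekly revenue per location, expressed with a CTE and a join: the kind
# of query that reads more clearly as raw SQL than as chained ORM calls.
query = text("""
    WITH recent_orders AS (
        SELECT location_id, order_total
        FROM orders
        WHERE ordered_at >= now() - interval '7 days'
    )
    SELECT l.name, COUNT(*) AS order_count, SUM(o.order_total) AS revenue
    FROM recent_orders o
    JOIN locations l ON l.id = o.location_id
    GROUP BY l.name
    ORDER BY revenue DESC
""")

with engine.connect() as conn:
    for row in conn.execute(query):
        print(row.name, row.order_count, row.revenue)
```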
### When to Use SQLAlchemy ORM

Use the SQLAlchemy ORM for:
- Simple CRUD operations
- Single-table queries
- Basic filtering and sorting
- Standard application queries
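And the equivalent ORM style for a simple single-table query, using 2.0-style SQLAlchemy; the `Item` model is hypothetical:

```python
from sqlalchemy import String, create_engine, select
from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column


class Base(DeclarativeBase):
    pass


class Item(Base):
    """Hypothetical single-table model for illustration."""

    __tablename__ = "items"

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(100))
    status: Mapped[str] = mapped_column(String(20), default="active")


engine = create_engine("postgresql+psycopg2://USER:PASSWORD@HOST:5432/postgres")

with Session(engine) as session:
    # Basic filtering and sorting: active items, newest id first.
    stmt = select(Item).where(Item.status == "active").order_by(Item.id.desc())
    for item in session.scalars(stmt):
        print(item.id, item.name)
```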
## Workflow Summary

```text
Data Sources
      ↓
Airflow (GCP VM)
      ↓
Snowflake Bronze Layer
      ↓
Snowflake Silver Layer
      ↓
Snowflake Gold Layer
      ↓
 ├─→ Sigma (BI Reporting)
 ├─→ Snowflake Notebooks (Analysis)
 ├─→ Streamlit Apps (Internal Tools)
 └─→ Cortex Analyst (AI Querying)
```
## Quick Reference
| Task | Tool | Environment |
|---|---|---|
| Daily SQL queries | DBeaver/SnowSight | Development/Sandbox |
| Data quality checks | DBeaver/SnowSight | Development/Sandbox |
| Production data manipulation | Snowflake Notebooks | Production |
| Forecasting | Snowflake Notebooks | Production |
| Business reporting | Sigma | Production |
| Master data management | Streamlit Apps | Production |
| Natural language queries | Cortex Analyst | Production |
| Data ingestion | Airflow | Production |