Data Analysis Workflow

Overview

This page outlines the data analysis workflow at Orbital Kitchens, from data ingestion through analysis and reporting.

Data Architecture

Data Warehouse Structure

Our Snowflake data warehouse follows a three-tier medallion architecture:

  1. Bronze Layer - Raw, unprocessed data from source systems
  2. Silver Layer - Cleaned and validated data with basic transformations
  3. Gold Layer - Business-level aggregations and analytics-ready datasets

For detailed information about our data design, pipelines, and custom logic, refer to the Data Design Document.
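
For orientation, here is a minimal Snowpark sketch of what a Bronze-to-Silver promotion can look like. All names (BRONZE.ORDERS_RAW, SILVER.ORDERS, ORDER_ID, STORE_NAME) are illustrative; the real transformations and custom logic are documented in the Data Design Document.

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, trim

# Placeholder credentials; in practice use the key-pair-authenticated
# service account described under Best Practices below.
session = Session.builder.configs({
    "account": "<account>",
    "user": "<service_user>",
    "warehouse": "<warehouse>",
    "database": "<database>",
}).create()

# Promote raw Bronze rows into a cleaned, de-duplicated Silver table.
bronze = session.table("BRONZE.ORDERS_RAW")
silver = (
    bronze
    .filter(col("ORDER_ID").is_not_null())               # drop rows missing the key
    .with_column("STORE_NAME", trim(col("STORE_NAME")))  # basic cleanup transform
    .drop_duplicates("ORDER_ID")                         # one row per order
)
silver.write.save_as_table("SILVER.ORDERS", mode="overwrite")
```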

Tools and Platforms

Data Ingestion - Airflow

  • Platform: Airflow
  • Hosting: GCP VM with Docker containers
  • Code Repository: GitHub Repo
    • ⚠️ Note: The GitHub repo is NOT automatically synced with the production code on the VM
    • Updates must be applied in both places manually
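
The production DAGs live on the VM; as a point of reference, a minimal ingestion DAG looks roughly like the sketch below. The dag_id, schedule, and ingest callable are hypothetical, not a production pipeline.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest_source() -> None:
    """Pull data from a source system and land it in the Bronze layer."""
    ...  # extraction and landing logic would go here


with DAG(
    dag_id="example_bronze_ingest",   # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                # `schedule_interval` on Airflow < 2.4
    catchup=False,
) as dag:
    PythonOperator(task_id="ingest", python_callable=ingest_source)
```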

Data Warehousing - Snowflake

All data storage, processing, and analysis happens in Snowflake.

1. DBeaver / SnowSight (SQL IDE)

Purpose: Sandbox environment and everyday SQL queries

Common Use Cases:

  • Checking for missing item IDs
  • Identifying missing review data
  • Finding duplicate row numbers in Acumatica tables
  • Data quality checks and validation
  • Quick updates
  • Core DDL management

Best Practices:

  • All DDL is version controlled in GitHub
  • Use a service account with key-pair authentication for local development (see the sketch below)
  • Focus on maintaining data quality and validation
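
A minimal sketch of both practices together: a key-pair-authenticated service-account connection running an ad-hoc duplicate check. It assumes the snowflake-connector-python and cryptography packages; the key path, account, user, and Acumatica table/column names are placeholders.

```python
import snowflake.connector
from cryptography.hazmat.primitives import serialization

# Load the service account's private key (path and passphrase are placeholders).
with open("rsa_key.p8", "rb") as key_file:
    private_key = serialization.load_pem_private_key(key_file.read(), password=None)

conn = snowflake.connector.connect(
    account="<account>",
    user="<service_user>",  # service account, not a personal login
    private_key=private_key.private_bytes(
        encoding=serialization.Encoding.DER,
        format=serialization.PrivateFormat.PKCS8,
        encryption_algorithm=serialization.NoEncryption(),
    ),
)

# Example data quality check: duplicate row numbers in an Acumatica table.
dupes = conn.cursor().execute("""
    SELECT ROW_NBR, COUNT(*) AS n
    FROM SILVER.ACUMATICA_ORDERS
    GROUP BY ROW_NBR
    HAVING COUNT(*) > 1
""").fetchall()
```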

2. Snowflake Notebooks

Purpose: Production-level data manipulation and analysis

Key Features:

  • Think of notebooks as advanced stored procedures
  • Reference cell output in later cells
  • Easy debugging capabilities
  • Task scheduling
  • Switch between Python and SQL seamlessly

Common Use Cases:

  • Data transformations and manipulations
  • Forecasting using SNOWFLAKE.ML.FORECAST (sketched after this list)
  • Complex analytical workflows
  • Automated data processing pipelines
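
As an illustration of the forecasting use case, here is a hedged sketch of SNOWFLAKE.ML.FORECAST driven from a notebook Python cell; the view name, model name, and 14-day horizon are assumptions.

```python
from snowflake.snowpark.context import get_active_session

session = get_active_session()  # available inside a Snowflake Notebook

# Train a forecasting model on a Gold-layer daily sales view
# (view, column, and model names are illustrative).
session.sql("""
    CREATE OR REPLACE SNOWFLAKE.ML.FORECAST daily_sales_model(
        INPUT_DATA => TABLE(GOLD.DAILY_SALES),
        TIMESTAMP_COLNAME => 'SALE_DATE',
        TARGET_COLNAME => 'TOTAL_SALES'
    )
""").collect()

# Generate a 14-day forecast; each row carries the prediction plus bounds.
forecast = session.sql(
    "CALL daily_sales_model!FORECAST(FORECASTING_PERIODS => 14)"
).collect()
```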

3. Streamlit Apps

Purpose: Internal tools for data management

Access & Security:

  • Share links with role-based access control
  • Currently used by VA team

Key Application:

  • Item Change Log App: Automatically manages master IDs, item status, and item brand mappings
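
For flavor, a minimal Streamlit-in-Snowflake sketch of an item-mapping editor follows. This is not the actual Item Change Log App; the table and column names are hypothetical.

```python
import streamlit as st
from snowflake.snowpark.context import get_active_session

session = get_active_session()  # Streamlit-in-Snowflake provides the session

st.title("Item mapping editor (illustrative)")

# Load current mappings and let the VA team edit them in place.
items = session.table("SILVER.ITEM_MASTER").to_pandas()
edited = st.data_editor(items)

if st.button("Save changes"):
    session.write_pandas(edited, "ITEM_MASTER", schema="SILVER", overwrite=True)
    st.success("Mappings updated")
```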

4. Semantic Views & Cortex Analyst

Purpose: AI-powered data analysis and natural language querying

Implementation:

  • Create semantic views using Cortex Analyst
  • Edit via visual editor or YAML file
  • One semantic view per data segment:
    • Otter data
    • ERP data
    • Customer data
    • Payroll data

Deployment:

Business Intelligence - Sigma

  • Platform: Sigma Computing
  • Purpose: Data reporting and visualization
  • Data Source: Connects directly to Snowflake Gold layer

Operational Databases

Supabase (PostgreSQL)

  • Purpose: Operational database for software tools
  • Use Case: Backend database for internal applications

Backend SQL Best Practices

When to Use Raw SQL

Use raw SQL queries for:

  • Complex logic requiring CTEs
  • Multi-table joins
  • Advanced analytical queries
  • Performance-critical operations
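
A short sketch of the raw-SQL style against the Supabase Postgres backend, using SQLAlchemy's text() so the query stays parameterizable; the DSN and table/column names are illustrative.

```python
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@host:5432/app")  # placeholder DSN

# CTE-based query: latest status per item (table/columns are illustrative).
latest_status_sql = text("""
    WITH ranked AS (
        SELECT item_id, status,
               ROW_NUMBER() OVER (PARTITION BY item_id ORDER BY updated_at DESC) AS rn
        FROM item_changes
    )
    SELECT item_id, status
    FROM ranked
    WHERE rn = 1
""")

with engine.connect() as conn:
    latest_status = conn.execute(latest_status_sql).fetchall()
```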

When to Use SQLAlchemy ORM

Use SQLAlchemy ORM for:

  • Simple CRUD operations
  • Single-table queries
  • Basic filtering and sorting
  • Standard application queries
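
And the equivalent ORM style for a simple single-table query; the Item model and DSN are illustrative.

```python
from sqlalchemy import String, create_engine, select
from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column


class Base(DeclarativeBase):
    pass


class Item(Base):  # illustrative model, not an actual schema
    __tablename__ = "items"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(200))
    status: Mapped[str] = mapped_column(String(20))


engine = create_engine("postgresql+psycopg2://user:pass@host:5432/app")  # placeholder DSN

with Session(engine) as session:
    # Basic filtering and sorting: a natural fit for the ORM.
    active_items = session.scalars(
        select(Item).where(Item.status == "active").order_by(Item.name)
    ).all()
```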

Workflow Summary

Data Sources
  ↓
Airflow (GCP VM)
  ↓
Snowflake Bronze Layer
  ↓
Snowflake Silver Layer
  ↓
Snowflake Gold Layer
├─→ Sigma (BI Reporting)
├─→ Snowflake Notebooks (Analysis)
├─→ Streamlit Apps (Internal Tools)
└─→ Cortex Analyst (AI Querying)

Quick Reference

Task                           Tool                  Environment
Daily SQL queries              DBeaver/SnowSight     Development/Sandbox
Data quality checks            DBeaver/SnowSight     Development/Sandbox
Production data manipulation   Snowflake Notebooks   Production
Forecasting                    Snowflake Notebooks   Production
Business reporting             Sigma                 Production
Master data management         Streamlit Apps        Production
Natural language queries       Cortex Analyst        Production
Data ingestion                 Airflow               Production