Case Study

Enterprise Data Platform

An automated cloud data platform connecting 5+ source systems into a unified data lake with quality gates, delta detection, and medallion architecture โ€” powering real-time dashboards for a large professional services firm.

Azure Data FactoryData Lake Gen2Parquet SQL ServerREST APIsPower BI Key VaultSelf-Hosted IR
ProblemSourcesPipelineData LakeImpact
Before Data Trapped in Silos

Critical business data was scattered across 5+ disconnected systems. Finance teams manually exported spreadsheets, reports were built by copy-pasting into Excel โ€” slow, error-prone, and not scalable.

๐Ÿ“Š
Manual Exports
Finance manually downloading spreadsheets from the ERP every week
๐Ÿ“‹
Survey Copy-Paste
Client feedback manually downloaded and reformatted
๐Ÿ”’
No Automation
People & Culture had no automated access to partner data
Result: No single source of truth, stale data, Excel-based reporting, and no firm-wide visibility.
Sources 5 Connected Systems

The platform ingests data daily from 5 source systems โ€” cloud APIs, on-premises databases, and file servers.

โ˜๏ธ
Survey Platform API
Client surveys & feedback via REST API with async export polling
๐Ÿ—„๏ธ
Finance ERP
Actuals, budgets, org structure via SQL Server
๐Ÿ—„๏ธ
Practice Management
Partners & clients via SQL Server
๐Ÿ“
On-Prem File Servers
Excel & CSV exports via Self-Hosted IR
๐Ÿ—„๏ธ
Legacy ERP
Org structure from legacy accounting system
Pipeline Azure Data Factory Orchestration

Three business domain pipelines run in parallel โ€” each with quality gates, delta detection, and domain-specific logic.

Finance
Row Countโ†’Quality Gate ยฑ15%โ†’Copy to Parquetโ†’Delta Hashโ†’Dim & Fact Models
Clients & Markets
List Surveysโ†’ForEach ร—4โ†’Start Exportโ†’Poll Statusโ†’Download ZIP
People & Culture
Extract Partnersโ†’Copy to Parquetโ†’Delta Processing
Data Lake Medallion Architecture

Data lands in a three-tier medallion architecture โ€” Bronze (raw), Silver (modelled), Gold (analytics-ready).

๐Ÿฅ‰ Bronze โ€” Raw / Landing
Date-partitioned raw data in original format
ParquetJSONCSVDate-partitioned
๐Ÿฅˆ Silver โ€” Cleansed & Modelled
Star schema with dimension and fact tables, delta-detected
Dim_OrgStructureDim_DatesFact_ActualsFact_Budget
๐Ÿฅ‡ Gold โ€” Analytics Ready
Optimised views powering real-time Power BI dashboards across the firm
Power BIDaily @ 2:00 AM AESTFirm-wide KPIs
Results Architecture & Impact

Automated daily data platform replacing manual Excel-based reporting across the entire firm.

Azure Data Factory
Orchestration engine with parallel pipeline execution
Data Lake Gen2
Medallion architecture: Bronze โ†’ Silver โ†’ Gold
Quality Gates
ยฑ15% row count threshold protection per load
Delta Detection
Hash-based change tracking, only new/changed rows flow
Async API Polling
Survey exports with parallel ForEach processing
Key Vault + SHIR
Secure secrets, on-prem connectivity via runtime
5+
Source systems unified into one data lake
3
Business domains running in parallel daily
ยฑ15%
Quality gate threshold protecting every load
Daily
Automated execution at 2:00 AM AEST

From Excel Silos to Unified Analytics

5 source systems, 3 business domains, medallion architecture, quality gates โ€” all automated daily.