1 | | -# Retail Events Data Warehouse & Analytics Project |
2 | | - |
3 | | -## Transforming Raw Retail Data into Strategic Insights |
4 | | - |
5 | | -Welcome to my Retail Events Analytics portfolio project! This repository showcases an end-to-end data solution that I designed to help retail businesses make data-driven decisions about their promotional events and campaigns. |
6 | | - |
7 | | ---- |
| 1 | +```markdown |
| 2 | +# 🌟 Data Warehousing and Advanced Data Analytics 🌟 |
| 3 | + |
| 4 | +Welcome to the **Data Warehousing and Advanced Data Analytics** project! This repository showcases a comprehensive data analytics project that analyzed promotions and provided tangible insights to the Sales Director. |
| 5 | + |
| 6 | +## 🚀 Table of Contents |
| 7 | + |
| 8 | +1. [Project Overview](#project-overview) |
| 9 | +2. [Features](#features) |
| 10 | +3. [Technologies Used](#technologies-used) |
| 11 | +4. [Data Flow](#data-flow) |
| 12 | +5. [Data Analysis Techniques](#data-analysis-techniques) |
| 13 | +6. [Data Visualization](#data-visualization) |
| 14 | +7. [Getting Started](#getting-started) |
| 15 | +   - [Installation](#installation)
| 16 | +   - [Usage](#usage)
| 17 | +8. [Contributing](#contributing)
| 18 | +9. [License](#license)
| 19 | +10. [Releases](#releases)
8 | 20 |
|
9 | 21 | ## 📊 Project Overview |
10 | 22 |
|
11 | | -Every retailer faces critical questions about their promotions: |
12 | | -- "Which campaigns are driving the most revenue?" |
13 | | -- "Are our discount strategies working effectively?" |
14 | | -- "How do promotional events impact different product categories?" |
15 | | - |
16 | | -I built this solution to answer these questions through a carefully architected data warehouse and intuitive visualizations that transform complex data into actionable insights. |
17 | | - |
18 | | ---- |
19 | | - |
20 | | -## 🏗️ The Architecture Behind the Analysis |
21 | | - |
22 | | -I implemented the industry-standard **Medallion Architecture**, creating a robust data pipeline with three distinct layers: |
23 | | - |
24 | | - |
25 | | - |
26 | | -### The Data Journey: |
27 | | - |
28 | | -**Bronze Layer**: Raw data ingestion |
29 | | -- Captures unaltered data from source CSV files |
30 | | -- Preserves data lineage and enables reprocessing if needed |
31 | | -- Establishes the foundation for all downstream analytics |
32 | | - |
33 | | -**Silver Layer**: Data refinement |
34 | | -- Cleanses and standardizes data formats |
35 | | -- Validates data against business rules |
36 | | -- Resolves inconsistencies and handles missing values |
37 | | -- Creates reliable datasets for analysis |
38 | | - |
39 | | -**Gold Layer**: Business intelligence |
40 | | -- Implements a dimensional star schema for efficient querying |
41 | | -- Creates pre-aggregated views for common analysis patterns |
42 | | -- Optimizes for reporting performance and usability |
43 | | -- Provides business-ready datasets tailored for stakeholder needs |
44 | | - |
45 | | - |
46 | | - |
47 | | -This architecture ensures data quality while maintaining flexibility for evolving business requirements. |
| 23 | +This project analyzes promotional data and its impact on sales, and turns the results into insights that guide business decisions. The data architecture supports efficient data flow and modeling, giving the Sales Director quick access to the information they need.
48 | 24 |
|
49 | | ---- |
50 | | - |
51 | | -## 💡 Key Analytical Findings |
| 25 | +## 🏆 Features |
52 | 26 |
|
53 | | -My analysis uncovered several actionable insights that can directly impact business strategy: |
| 27 | +- In-depth analysis of promotional campaigns |
| 28 | +- Segmentation of customer data for targeted marketing |
| 29 | +- Interactive visualizations to represent data insights |
| 30 | +- Data warehousing solutions for efficient storage and retrieval |
| 31 | +- End-to-end ETL pipeline for data processing |
54 | 32 |
|
55 | | -### Promotion Strategy Effectiveness |
| 33 | +## 💻 Technologies Used |
56 | 34 |
|
57 | | - |
| 35 | +This project uses the following technologies:
58 | 36 |
|
59 | | -- **BOGOF Dominance**: The Buy One Get One Free promotion dramatically outperformed all other types, generating over 200,000 units in post-promotion sales—more than double the next best performer |
60 | | -- **Discount Paradox**: Despite offering the highest monetary value, the 50% OFF promotion showed surprisingly low effectiveness, suggesting consumers respond more to the perception of "getting something free" than equivalent percentage discounts |
61 | | -- **Pre/Post Comparison**: Analysis of baseline (pre-promotion) sales versus promotional period revealed BOGOF not only had the highest absolute sales but also generated the greatest sales uplift |
62 | | -- **Strategy Recommendation**: Prioritize BOGOF promotions for high-velocity products where margin can support the strategy |
| 37 | +- **Data Warehousing:** MSSQL for data storage |
| 38 | +- **ETL:** Python script (`etl_pipeline.py`) for extracting, transforming, and loading data
| 39 | +- **Data Visualization:** Tableau for creating interactive dashboards |
| 40 | +- **Containerization:** Docker for consistent development environments |
63 | 41 |
|
64 | | -### Product & Category Performance |
| 42 | +## 🌐 Data Flow |
65 | 43 |
|
66 | | -- **Staples Lead**: Atliq Farm Chakki Atta (1KG) emerged as the top-performing product with approximately 80,000 units sold, followed by Atliq Sunflower Oil (1L) at about 70,000 units |
67 | | -- **Category Dominance**: Grocery & Staples account for 56.6% of total promotional sales, confirming the strategy of using essential items as promotional drivers |
68 | | -- **Hidden Opportunity**: Despite representing only 7.2% of total sales, the Personal Care category includes high-performing products like Atliq Lime Cool Bathing Bar, suggesting potential for expanding promotion of higher-margin personal care items |
| 44 | +Data moves through five stages, from collection to visualization:
69 | 45 |
|
70 | | -### Campaign & Seasonal Impact |
| 46 | +1. **Data Collection:** Gather data from various sources |
| 47 | +2. **Data Cleaning:** Remove inconsistencies and prepare for analysis |
| 48 | +3. **Data Transformation:** Convert data into a usable format |
| 49 | +4. **Data Analysis:** Perform analytical tasks to derive insights |
| 50 | +5. **Data Visualization:** Present insights through visual formats |
71 | 51 |
|
72 | | -- **Festival Effect**: The Diwali campaign generated 153,338 units—over twice the sales volume of the Sankranti campaign (73,085 units) |
73 | | -- **Seasonal Planning**: This 110% performance difference highlights the importance of aligning promotional resources with cultural festivals that drive consumer purchasing behavior |
74 | | -- **Year-Round Strategy**: Analysis suggests a strategy of major resource allocation to top-performing seasonal campaigns while maintaining smaller, targeted promotions during other periods |
| 52 | +## 🔍 Data Analysis Techniques |
75 | 53 |
|
76 | | -### Geographic Distribution |
| 54 | +In this project, we applied several data analysis techniques: |
77 | 55 |
|
78 | | -- **Market Concentration**: The top three cities (Bengaluru, Chennai, and Hyderabad) account for approximately 60% of total promotional sales (257,813 units) |
79 | | -- **Expansion Potential**: The steep drop-off to mid-tier cities (Coimbatore through Madurai, each at 30,000-40,000 units) reveals untapped potential for targeted expansion |
80 | | -- **Localization Opportunity**: Cross-analysis of city performance with promotion types suggests opportunities for city-specific promotional strategies |
81 | | - |
82 | | ---- |
| 56 | +- **Descriptive Analysis:** Summarize historical data to understand trends |
| 57 | +- **Predictive Analysis:** Use statistical models to predict future outcomes |
| 58 | +- **Prescriptive Analysis:** Recommend actions based on analysis |
83 | 59 |
|
84 | | -## 🛠️ Technical Implementation |
| 60 | +## 📈 Data Visualization |
85 | 61 |
|
86 | | -### Data Engineering Excellence |
| 62 | +We used Tableau to build interactive dashboards that let users explore the data. Key visualizations include:
87 | 63 |
|
88 | | -- **ETL Pipeline**: Custom SQL Server stored procedures that handle incremental data loading |
89 | | -- **Data Quality Management**: Validation rules enforced during the Silver layer transformation |
90 | | -- **Performance Optimization**: Indexed views and smart partitioning for query efficiency |
91 | | -- **Documentation**: Comprehensive data dictionary and lineage tracking |
| 64 | +- Sales trends over time |
| 65 | +- Customer segmentation based on purchasing behavior |
| 66 | +- Promotional campaign performance metrics |
92 | 67 |
|
93 | | -### Advanced SQL Techniques |
| 68 | +## 🛠️ Getting Started |
94 | 69 |
|
95 | | -- Window functions for time-series analysis |
96 | | -- CTEs and subqueries for complex metric calculations |
97 | | -- Dynamic SQL for flexible reporting parameters |
98 | | -- Statistical calculations for significance testing |
| 70 | +To get started with this project, follow the instructions below. |
99 | 71 |
|
100 | | -### Data Visualization |
| 72 | +### Installation |
101 | 73 |
|
102 | | -My Tableau dashboard provides an intuitive interface for business users to: |
103 | | -- Filter insights by time period, region, or product category |
104 | | -- Drill down from high-level metrics to granular details |
105 | | -- Compare campaign performance side-by-side |
106 | | -- Export findings for stakeholder presentations |
| 74 | +1. Clone this repository: |
| 75 | + ```bash |
| 76 | + git clone https://github.com/EngIbrahim1/Data-Warehousing-and-Advanced-Data-Analytics.git |
| 77 | + ``` |
| 78 | +2. Navigate to the project directory: |
| 79 | + ```bash |
| 80 | + cd Data-Warehousing-and-Advanced-Data-Analytics |
| 81 | + ``` |
| 82 | +3. Set up the required environment using Docker: |
| 83 | + ```bash |
| 84 | + docker-compose up |
| 85 | + ``` |
107 | 86 |
|
108 | | ---- |
| 87 | +### Usage |
109 | 88 |
|
110 | | -## 📂 Repository Structure |
111 | | - |
112 | | -``` |
113 | | -retail-events-project/ |
114 | | -│ |
115 | | -├── datasets/ # Source data files |
116 | | -│ |
117 | | -├── docs/ # Documentation and diagrams |
118 | | -│ ├── data_architecture.drawio.png |
119 | | -│ ├── data_flow.drawio.png |
120 | | -│ ├── data_model.drawio.png |
121 | | -│ ├── promotion_performance.png |
122 | | -│ |
123 | | -├── scripts/ # SQL implementation |
124 | | -│ ├── init_database.sql # Database initialization |
125 | | -│ ├── ddl_bronze.sql # Bronze layer schema |
126 | | -│ ├── ddl_silver.sql # Silver layer transformations |
127 | | -│ ├── ddl_gold.sql # Gold layer dimensional model |
128 | | -│ ├── proc_load_bronze.sql # Data ingestion procedures |
129 | | -│ ├── proc_load_silver.sql # Data cleansing procedures |
130 | | -│ ├── gold_views.sql # Analytical views |
131 | | -│ ├── analysis_queries/ # Advanced analytical queries |
132 | | -│ ├── promotion_effectiveness.sql |
133 | | -│ ├── product_performance.sql |
134 | | -│ ├── campaign_comparison.sql |
135 | | -│ ├── geographic_analysis.sql |
136 | | -│ |
137 | | -├── tableau/ # Visualization assets |
138 | | -│ ├── Retail_Events_Insights.twbx # Interactive dashboard |
139 | | -│ |
140 | | -├── ad-hoc-requests.pdf # Business requirements |
141 | | -└── README.md # Project documentation |
142 | | -``` |
| 89 | +To analyze the data: |
143 | 90 |
|
144 | | ---- |
| 91 | +1. Ensure the data files are in the appropriate format. |
| 92 | +2. Run the ETL pipeline to load data into the database: |
| 93 | + ```bash |
| 94 | + python etl_pipeline.py |
| 95 | + ``` |
| 96 | +3. Open Tableau and connect to the database for visualization. |
145 | 97 |
|
146 | | -## 🚀 Strategic Recommendations |
| 98 | +## 🤝 Contributing |
147 | 99 |
|
148 | | -Based on the comprehensive analysis, I've developed these actionable recommendations: |
| 100 | +Contributions are welcome! If you have suggestions for improvements or would like to report an issue, please open an issue or submit a pull request. |
149 | 101 |
|
150 | | -1. **Promotion Optimization** |
151 | | - - Increase BOGOF promotions for high-velocity essential items where margins allow |
152 | | - - Reconsider 50% OFF promotions or test alternative messaging to improve perception |
153 | | - - Develop hybrid promotion strategies that combine the psychological appeal of BOGOF with sustainable economics |
| 102 | +## 📜 License |
154 | 103 |
|
155 | | -2. **Category Expansion** |
156 | | - - Maintain strong promotional focus on Grocery & Staples as traffic drivers |
157 | | - - Strategically expand promotions in the Personal Care category, targeting items with demonstrated promotion responsiveness |
158 | | - - Test bundle promotions that pair high-performing staples with higher-margin personal care items |
| 104 | +This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details. |
159 | 105 |
|
160 | | -3. **Seasonal Allocation** |
161 | | - - Allocate promotional budget with a 2:1 ratio favoring Diwali over Sankranti based on historical performance |
162 | | - - Develop Diwali-specific product bundles focused on top-performing categories |
163 | | - - Create targeted smaller promotions for Sankranti with region-specific approaches |
| 106 | +## 🔗 Releases |
164 | 107 |
|
165 | | -4. **Geographic Strategy** |
166 | | - - Maintain strong promotional presence in top-performing cities |
167 | | - - Develop tailored expansion strategies for mid-tier cities showing growth potential |
168 | | - - Consider city-specific promotion types based on local performance data |
| 108 | +You can find the latest releases of this project [here](https://github.com/EngIbrahim1/Data-Warehousing-and-Advanced-Data-Analytics/releases). Please download the relevant files and execute them as needed. |
169 | 109 |
|
170 | 110 | --- |
171 | 111 |
|
172 | | -## 🔍 Key Takeaways |
173 | | - |
174 | | -This project demonstrates my ability to: |
175 | | - |
176 | | -- Transform business questions into technical requirements |
177 | | -- Architect scalable data solutions following industry best practices |
178 | | -- Implement robust ETL processes with proper error handling |
179 | | -- Apply advanced analytical techniques to derive meaningful insights |
180 | | -- Translate data findings into concrete business recommendations |
181 | | -- Bridge the gap between technical implementation and business value |
182 | | - |
183 | | ---- |
184 | | - |
185 | | -## 🔗 Connect With Me |
186 | | - |
187 | | -I'm passionate about helping businesses leverage their data assets through thoughtful architecture and insightful analytics. |
188 | | - |
189 | | -**Sai Suraj M.V.V.** |
190 | | -Data Analytics Specialist |
191 | | - |
192 | | -📧 [saisurajmvv@gmail.com](mailto:saisurajmvv@gmail.com) |
193 | | -🔗 [LinkedIn](https://www.linkedin.com/in/saisurajmatta/) |
194 | | -🌐 [Portfolio](https://saisurajmatta.github.io/Portfolio) |
195 | | -💻 [GitHub](https://github.com/SaiSurajMatta) |
196 | | - |
197 | | -*Looking for a data professional who can turn your business questions into actionable insights? Let's connect!* |
| 112 | +### Topics |
| 113 | + |
| 114 | +This repository covers the following topics: |
| 115 | +- data |
| 116 | +- data analysis |
| 117 | +- data architecture |
| 118 | +- data flow analysis |
| 119 | +- data modeling |
| 120 | +- data pipeline |
| 121 | +- data segmentation |
| 122 | +- data visualization |
| 123 | +- data warehousing |
| 124 | +- docker |
| 125 | +- etl |
| 126 | +- etl pipeline |
| 127 | +- mssql |
| 128 | +- sql |
| 129 | +- tableau |
| 130 | + |
| 131 | +Feel free to explore the project, and let’s make data-driven decisions together! 💡
| 132 | +``` |
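The five-stage data flow listed in the new README (collection, cleaning, transformation, analysis, visualization) could look roughly like the sketch below. It is not the repository's `etl_pipeline.py`, which is not shown in this diff. The sample rows, the column names (`promo_type`, `units_before`, `units_after`), and the `promo_sales` table are all hypothetical. An in-memory SQLite database stands in for MSSQL so the example runs with only the Python standard library.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows for illustration only; not project data.
RAW_CSV = """promo_type,units_before,units_after
BOGOF,100,250
bogof ,50,130
50% OFF,80,
50% OFF,120,150
"""


def collect(text):
    """Stage 1: read raw rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Stage 2: drop rows with missing values and trim whitespace."""
    cleaned = []
    for row in rows:
        if any(value is None or value.strip() == "" for value in row.values()):
            continue
        cleaned.append({key: value.strip() for key, value in row.items()})
    return cleaned


def transform(rows):
    """Stage 3: normalise labels and cast counts to integers."""
    return [
        (r["promo_type"].upper(), int(r["units_before"]), int(r["units_after"]))
        for r in rows
    ]


def analyze(records):
    """Stage 4: load into SQLite (stand-in for MSSQL) and compute uplift."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE promo_sales "
        "(promo_type TEXT, units_before INTEGER, units_after INTEGER)"
    )
    conn.executemany("INSERT INTO promo_sales VALUES (?, ?, ?)", records)
    query = """
        SELECT promo_type,
               SUM(units_before),
               SUM(units_after),
               ROUND(100.0 * (SUM(units_after) - SUM(units_before))
                     / SUM(units_before), 1)
        FROM promo_sales
        GROUP BY promo_type
        ORDER BY promo_type
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


def visualize(summary):
    """Stage 5: text stand-in for a dashboard, one line per promotion type."""
    return [
        f"{promo}: {before} -> {after} units ({uplift:+.1f}%)"
        for promo, before, after, uplift in summary
    ]


def run_pipeline(text):
    return visualize(analyze(transform(clean(collect(text)))))


if __name__ == "__main__":
    for line in run_pipeline(RAW_CSV):
        print(line)
```

Keeping each stage as a separate function means a stage can be tested on its own or swapped out, for example replacing the SQLite connection with a real MSSQL connection, without touching the others.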