Author: Maria Ilnitsa
This project analyzes sales performance, customer purchasing behavior, and marketing campaign effectiveness in an e-commerce environment. The analysis uses multiple relational datasets containing information about customers, products, orders, and marketing campaigns.
The goal of the project is to transform raw transactional and marketing data into actionable business insights that help improve revenue generation, customer retention, and marketing efficiency.
By combining sales data with customer segmentation and campaign performance metrics, the analysis identifies key drivers of revenue and opportunities for improving marketing strategy.
An e-commerce company collects large amounts of data related to customer purchases, product sales, and marketing campaigns. However, management lacks clear visibility into:
-
which products and categories generate the most revenue
-
how customer behavior impacts sales performance
-
which marketing channels are most effective at converting users into customers
-
how customer segments contribute to overall revenue
Without structured analysis, it is difficult to make data-driven decisions about marketing investments, product strategy, and customer retention.
The main objectives of this analysis are to:
-
analyze overall sales performance and revenue distribution
-
identify top-performing products and product categories
-
examine customer purchasing behavior and repeat purchase patterns
-
evaluate marketing campaign performance using key marketing metrics
-
measure conversion efficiency across marketing channels
-
identify opportunities to improve customer retention and marketing effectiveness
The analysis was guided by several hypotheses:
• A small number of product categories generate the majority of revenue.
• Returning customers contribute significantly to overall sales.
• Most customers place only a small number of orders.
• Marketing channels differ in their conversion efficiency.
• Revenue varies across geographic markets.
The project uses four datasets representing key areas of an e-commerce business:
Information about customer demographics and signup behavior.
Product catalog data including product categories, pricing, and suppliers.
Transaction-level data describing customer purchases.
Marketing campaign data including campaign type, budget, clicks, and conversions.
The datasets contain 500 records each and include information about:
- customer locations and segments
- product categories and pricing
- order quantities and payment methods
- marketing campaign performance metrics
- Customer segment
- Product category
- Order quantity
- Order revenue
- Campaign type
- Campaign budget
- Clicks and conversions
- Each dataset contains 500 rows
- Multiple relational tables simulate a real business database structure
- Primary and foreign keys allow joins between tables
- Data types were consistent and suitable for analysis
- No missing values were detected during the preparation stage
sales_analysis
│
├── Data
│ └── campaings_data.csv
| └── customers_data.csv
| └── orders_data.csv
| └── products_data.csv
│
├── Notebooks
│ └── analysis.ipynb
│
├── Visualization
│ └── .png
│ └── .png
| └── .png
| └── .png
| └── .png
│
├── requirments.txt
|
├── README.md
|
└── DATASET_README.md
The analysis was conducted using the following tools:
- Python 3.11
- Pandas – data manipulation and analysis
- NumPy – numerical operations
- Matplotlib – data visualization
- Seaborn – statistical visualization
- MySQL – relational database queries
- Google Sheets – data exploration and dashboarding
- VS Code & Jupyter Notebook
Several preprocessing and metric calculation steps were performed to ensure the datasets were clean, consistent, and ready for analysis.
- Applied
.head()and inspected columns for all datasets:orders,customers,products,campaigns. - Verified column names, data types, and initial sample values.
- Performed preliminary validation for each dataset:
- Orders: checked
quantity,total_price,order_status - Customers: verified
customer_segment,signup_date,country/city - Products: reviewed
price,cost,category - Campaigns: inspected
start_date,end_date,budget,clicks,conversions
- Orders: checked
- Converted all date columns from strings to
datetimeobjects (pd.to_datetime) for accurate time-based analysis. - Applied to:
- Orders:
order_date - Customers:
signup_date - Campaigns:
start_dateandend_date
- Orders:
- Created new columns to support trend analysis and aggregation:
- Orders:
Year,Month,Year_Monthbased onorder_date - Customers:
Year,Monthbased onsignup_date - Campaigns:
Start_Year,Start_Month,End_Year,End_Monthbased on campaign dates
- Orders:
To measure marketing performance, several key metrics were calculated for each campaign:
- Conversion Rate (CR) =
conversions / clicks - Cost per Click (CPC) =
budget / clicks - Cost per Acquisition (CPA) =
budget / conversions
Handling divide-by-zero:
- Replaced infinite values from 0 conversions or clicks with
pd.NAto avoid errors in analysis.
Example of calculated metrics for campaigns:
| campaign_id | campaign_name | CR | CPC | CPA |
|---|---|---|---|---|
| 1 | Campaign_1 | 0.441 | 2.27 | 2.27 |
| 2 | Campaign_2 | 0.006 | 8.64 | 8.63 |
| 3 | Campaign_3 | 0.119 | 2.36 | 2.36 |
These metrics provide the foundation for evaluating campaign effectiveness and optimizing marketing spend.
- All datasets are now ready for:
- relational joins between orders, customers, products, and campaigns
- KPI and revenue analysis
- customer segmentation and campaign performance evaluation
This structured preparation ensures a clean foundation for subsequent analysis of marketing, sales, and customer behavior.
The sales analysis reveals that revenue is concentrated within a small number of product categories. Categories such as Beauty and Clothing generate the highest revenue, indicating stronger customer demand in these segments.
Additionally, a limited number of top-performing products contribute a significant share of overall sales, suggesting that focusing marketing efforts and inventory management on high-demand products could further increase revenue.
Monthly sales trends show fluctuations in purchasing activity throughout the year, highlighting opportunities for targeted marketing campaigns during higher-demand periods.
Customer analysis shows that New and Returning customers generate the majority of total revenue, while the VIP segment contributes a smaller portion of overall sales.
Purchase frequency analysis indicates that most customers place only one or two orders, suggesting that repeat purchasing behavior is relatively limited. This highlights an opportunity to increase revenue through improved customer retention strategies.
Geographically, Germany, Australia, and the United States represent the strongest markets, contributing the largest share of revenue compared to other regions.
Marketing campaign analysis shows significant differences in conversion efficiency across campaign types.
Google Ads and Social Media campaigns demonstrate the highest conversion rates, making them the most effective channels for converting users into customers.
Other campaign types such as Email and Affiliate marketing show lower conversion performance, indicating potential opportunities for optimization in targeting strategies and campaign messaging.
Budget and conversion comparisons also reveal that some channels produce higher conversions relative to their investment, highlighting differences in marketing efficiency.
Prioritize high-performing channels such as Google Ads and Social Media.
Develop retention strategies to encourage repeat purchases.
Focus marketing efforts on top-performing product categories.
Strengthen marketing campaigns in high-performing countries.
Gmail: maria.ilnitska11@gmail.com
LinkedIn: www.linkedin.com/in/maria-ilnitska
Telegram: @mariailnitska