INFO-523-DATA MINING_Final Project ‘BMW Car Price Analysis & Prediction (Project Proposal)’
Proposal
Dataset
| | Model | Year | Region | Color | Fuel_Type | Transmission | Engine_Size_L | Mileage_KM | Price_USD | Sales_Volume | Sales_Classification |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5 Series | 2016 | Asia | Red | Petrol | Manual | 3.5 | 151748 | 98740 | 8300 | High |
| 1 | i8 | 2013 | North America | Red | Hybrid | Automatic | 1.6 | 121671 | 79219 | 3428 | Low |
| 2 | 5 Series | 2022 | North America | Blue | Petrol | Automatic | 4.5 | 10991 | 113265 | 6994 | Low |
| 3 | X3 | 2024 | Middle East | Blue | Petrol | Automatic | 1.7 | 27255 | 60971 | 4047 | Low |
| 4 | 7 Series | 2020 | South America | Black | Diesel | Manual | 2.1 | 122131 | 49898 | 3080 | Low |
| 5 | 5 Series | 2017 | Middle East | Silver | Diesel | Manual | 1.9 | 171362 | 42926 | 1232 | Low |
| 6 | i8 | 2022 | Europe | White | Diesel | Manual | 1.8 | 196741 | 55064 | 7949 | High |
| 7 | M5 | 2014 | Asia | Black | Diesel | Automatic | 1.6 | 121156 | 102778 | 632 | Low |
| 8 | X3 | 2016 | South America | White | Diesel | Automatic | 1.7 | 48073 | 116482 | 8944 | High |
| 9 | i8 | 2019 | Europe | White | Electric | Manual | 3.0 | 35700 | 96257 | 4411 | Low |
Dataset Overview
This dataset, BMW Worldwide Sales Records (2010–2024), contains over 50,000 records of BMW sales and vehicle specifications across multiple regions. Key features include Model, Year, Engine_Size_L, Transmission, Fuel_Type, Color, Region, Mileage_KM, Price_USD, and Sales_Volume. It was chosen because it offers a diverse set of attributes for exploring market behavior, pricing trends, and customer preferences in the automotive industry.
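As a rough illustration of the schema described above, the sketch below parses three rows copied from the preview table and separates numeric from categorical columns. It assumes the data arrives as a CSV with these exact column names; the `load_rows` helper and the `NUMERIC` mapping are hypothetical names used only for this example, and the standard library is used so the snippet runs without extra dependencies.

```python
import csv
import io

# Three rows copied from the preview table above. The CSV layout is an
# assumption; the real file may differ in delimiter or column order.
SAMPLE_CSV = """Model,Year,Region,Color,Fuel_Type,Transmission,Engine_Size_L,Mileage_KM,Price_USD,Sales_Volume,Sales_Classification
5 Series,2016,Asia,Red,Petrol,Manual,3.5,151748,98740,8300,High
i8,2013,North America,Red,Hybrid,Automatic,1.6,121671,79219,3428,Low
X3,2024,Middle East,Blue,Petrol,Automatic,1.7,27255,60971,4047,Low
"""

# Hypothetical mapping of numeric columns to the type each value is cast to.
NUMERIC = {
    "Year": int,
    "Engine_Size_L": float,
    "Mileage_KM": int,
    "Price_USD": int,
    "Sales_Volume": int,
}


def load_rows(text):
    """Parse CSV text into dicts, casting the numeric columns."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = dict(raw)
        for column, cast in NUMERIC.items():
            row[column] = cast(row[column])
        rows.append(row)
    return rows


rows = load_rows(SAMPLE_CSV)

# Every column that is not numeric is treated as categorical and would
# need encoding before modeling.
categorical = [column for column in rows[0] if column not in NUMERIC]
print(categorical)
```

Splitting the columns this way early makes it clear which fields need encoding and which can be used for correlation analysis directly.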
Questions
- What are the key factors influencing BMW used-car prices in the market?
- Can machine learning models accurately predict used-car prices?
- How stable are these predictions over time?
- Using a temporal split, how well can the model predict pricing trends for “next year”?
Analysis Plan
- Data Cleaning & Preparation
  - Handle missing values
  - Standardize numerical units
  - Encode categorical variables
  - Ensure time-related variables are aligned for temporal analysis
- Exploratory Data Analysis (EDA)
  - Price trends by year, region, and model
  - Correlation analysis between price and vehicle attributes
  - Identify patterns that may indicate shifting market preferences
- Modeling Approach
  - Apply machine learning models:
    - Linear Regression
    - Decision Tree Regression
    - Random Forest Regression
  - Evaluate using RMSE, MAE, and R²
  - Perform:
    - Feature importance analysis
    - Temporal train–test split to simulate predicting next-year price trends
- Visualization Dashboard
  - Build clear, interactive visualizations of trends and insights using matplotlib and seaborn
  - Include:
    - Price distribution
    - Trend lines across years
    - Model performance summary
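The temporal train–test split and the evaluation metrics named in the modeling approach can be sketched as follows. This is a minimal, dependency-free illustration rather than the project's actual pipeline: the `temporal_split` and `regression_metrics` helpers are hypothetical names, the cutoff year of 2021 is an arbitrary assumption, and a mean-price baseline stands in for the Linear Regression, Decision Tree, and Random Forest models. The (Year, Price_USD) pairs are the ten rows of the preview table.

```python
import math
from statistics import mean

# (Year, Price_USD) pairs from the ten preview rows; a real run would use
# the full dataset and the remaining features.
records = [
    {"Year": 2016, "Price_USD": 98740},
    {"Year": 2013, "Price_USD": 79219},
    {"Year": 2022, "Price_USD": 113265},
    {"Year": 2024, "Price_USD": 60971},
    {"Year": 2020, "Price_USD": 49898},
    {"Year": 2017, "Price_USD": 42926},
    {"Year": 2022, "Price_USD": 55064},
    {"Year": 2014, "Price_USD": 102778},
    {"Year": 2016, "Price_USD": 116482},
    {"Year": 2019, "Price_USD": 96257},
]


def temporal_split(rows, cutoff_year):
    """Train on years up to and including cutoff_year, test on later years."""
    train = [r for r in rows if r["Year"] <= cutoff_year]
    test = [r for r in rows if r["Year"] > cutoff_year]
    return train, test


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R² for paired true and predicted values."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = mean(abs(e) for e in errors)
    rmse = math.sqrt(mean(e * e for e in errors))
    ss_res = sum(e * e for e in errors)
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    # R² is undefined when the true values have no variance.
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return {"RMSE": rmse, "MAE": mae, "R2": r2}


# Assumed cutoff: everything after 2021 is held out as "future" data.
train, test = temporal_split(records, cutoff_year=2021)

# Baseline stand-in for a fitted model: always predict the mean training price.
baseline_price = mean(r["Price_USD"] for r in train)
y_true = [r["Price_USD"] for r in test]
y_pred = [baseline_price] * len(test)

print(regression_metrics(y_true, y_pred))
```

Because the split is by year rather than random, no test-period row leaks into training, which is the point of simulating a "next year" prediction. Any real model would be expected to beat this baseline on the held-out years.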
Ethical AI Use Disclosure:
- AI tools (e.g., ChatGPT) were used ethically for code debugging, idea exploration, and documentation enhancement.
- All datasets are real and sourced from Kaggle.