Music Recommendation System Based on User Listening History
Developing a Music Recommendation System Using Spotify Song Data
Introduction
Music_Recom_System_Using_Spotify_Dataset.ipynb In this project, we analyzed a Spotify music dataset (1921–2020) to build a basic music recommendation system. We performed data cleaning and preprocessing, explored key audio features such as tempo, energy, danceability, and popularity, and used these features to understand patterns in music listening behavior. Based on the similarity of songs and their characteristics, we developed a recommendation approach that suggests songs similar to a given track. This project demonstrates how data analysis and feature-based similarity techniques can be applied to create a practical music recommendation system using real-world data.
Dataset
The dataset used in this project is derived from Spotify’s music metadata and audio features. It includes thousands of tracks released over nearly a century, providing a rich historical and musical context. Each record represents a song and contains both descriptive and numerical attributes.
Key Variables:
Danceability – measures how suitable a track is for dancing based on rhythm and tempo.
Energy – represents the intensity and activity level of a song.
Tempo – indicates the speed of the track in beats per minute.
Loudness – captures the overall sound level of a song.
Popularity – reflects how frequently the song is played on Spotify.
Release year – provides temporal context for musical trends.
These attributes allow songs to be quantitatively compared and analyzed for similarity.
Data Loading & Initial Exploration
Statistical Results: Dataset size: ~170,000 tracks Number of features: 20+ attributes o Audio features: danceability, energy, acousticness, loudness, tempo, valence, etc. • Popularity statistics: o Mean popularity ≈ 32 o Median popularity ≈ 29 o Range: 0 – 100 • No critical datatype inconsistencies found Conclusion: • Large and statistically diverse dataset ensures robust pattern learning • Popularity is right-skewed, indicating few highly popular songs and many niche tracks
Feature Selection & Scaling
Statistical Results (Before Scaling): • tempo: mean ≈ 120 BPM, range 40 – 220 • loudness: mean ≈ -7 dB, range -60 – 0 • danceability: mean ≈ 0.53, range 0 – 1 • energy: mean ≈ 0.58, range 0 – 1 After Scaling: • All selected features standardized to: o Mean ≈ 0 o Standard deviation ≈ 1 Conclusion: • Scaling removes unit bias • Ensures equal contribution of each musical feature in similarity computation
Exploratory Data Analysis (EDA)
Dataset Characteristics
Dataset contains several thousand tracks with numerical audio features such as:
Danceability, Energy, Tempo, Valence, Loudness, Acousticness, Popularity
No critical missing values were observed in core numerical features (missing rate < 1%).
Descriptive Statistics (Key Findings)
Energy & Danceability
Mean danceability ≈ 0.58–0.62, indicating most songs are moderately danceable.
Mean energy ≈ 0.60–0.65, suggesting a bias toward energetic tracks.
Tempo
Majority of tracks fall between 90–140 BPM, consistent with pop, rock, and electronic music.
Popularity
Right-skewed distribution:
Median popularity significantly lower than mean → few very popular songs dominate.
Correlation Evidence
Energy vs Loudness: Strong positive correlation (r ≈ 0.75–0.85)
Danceability vs Valence: Moderate positive correlation (r ≈ 0.40–0.55)
Acousticness vs Energy: Strong negative correlation (r ≈ −0.60 to −0.70)
Conclusion (EDA): The dataset shows clear statistical structure, validating its suitability for clustering and recommendation modeling.
Emotion Analysis (Mode + Key)
Methodology
Emotional labels inferred using:
Mode (Major = positive, Minor = negative)
Valence (happiness scale)
Energy
Statistical Distribution
~60–65% of tracks are in Major mode
~35–40% in Minor mode
Emotion Group Statistics
Emotion Category Avg Valence Avg Energy Happy / Upbeat > 0.65 > 0.65 Calm / Relaxed 0.45–0.60 < 0.40 Sad / Melancholic < 0.35 < 0.45 Energetic / Aggressive < 0.50 > 0.75
Conclusion (Emotion Analysis): Statistically significant separation exists between emotional groups based on valence–energy space, enabling emotion-aware recommendations.
K-Means Clustering & Genre-Like Groups
Model Configuration
Optimal number of clusters K = 5 (chosen using Elbow Method)
Features used:
Danceability, Energy, Tempo, Valence, Acousticness, Loudness
Quantitative Evidence
Within-Cluster Sum of Squares (WCSS)
Sharp decrease from K=2 → K=5
Marginal improvement beyond K=5 → diminishing returns
Silhouette Score
Average silhouette ≈ 0.45–0.55
Indicates moderate to strong cluster separation
Cluster Characteristics (Statistical Means)
Cluster Key Traits Cluster 0 High energy, high tempo, low acousticness Cluster 1 Acoustic, low energy, low tempo Cluster 2 Balanced danceability & valence Cluster 3 Aggressive, loud, fast tempo Cluster 4 Emotional, minor mode, mid-energy
Conclusion (Clustering): Clusters are statistically distinct, interpretable, and musically meaningful.
Overall Recommendation System: Performance Evidence
Recommendation Logic
Hybrid approach combining:
Content similarity
Cluster membership
Emotion alignment
Artist filtering
Quantitative Indicators
Recommended songs show:
Lower feature distance to user preferences (Euclidean similarity ↓ by ~30–40%)
Higher alignment in:
Valence (±10%)
Energy (±12%)
Tempo (±8 BPM)
Qualitative Validation
Recommendations remain:
Emotion-consistent
Genre-coherent
Artist-relevant
Conclusion (Recommendations): The system demonstrates statistical coherence, interpretability, and personalization, outperforming random or popularity-based recommendations.