Music Recommendation System Based on User Listening History

Developing a Music Recommendation System Using Spotify Song Data

Author
Affiliation

Srikanth (Solo) - Srikanth Oruganti

College of Information Science, University of Arizona

Introduction

Music_Recom_System_Using_Spotify_Dataset.ipynb In this project, we analyzed a Spotify music dataset (1921–2020) to build a basic music recommendation system. We performed data cleaning and preprocessing, explored key audio features such as tempo, energy, danceability, and popularity, and used these features to understand patterns in music listening behavior. Based on the similarity of songs and their characteristics, we developed a recommendation approach that suggests songs similar to a given track. This project demonstrates how data analysis and feature-based similarity techniques can be applied to create a practical music recommendation system using real-world data.

Dataset

The dataset used in this project is derived from Spotify’s music metadata and audio features. It includes thousands of tracks released over nearly a century, providing a rich historical and musical context. Each record represents a song and contains both descriptive and numerical attributes.

Key Variables:

Danceability – measures how suitable a track is for dancing based on rhythm and tempo.

Energy – represents the intensity and activity level of a song.

Tempo – indicates the speed of the track in beats per minute.

Loudness – captures the overall sound level of a song.

Popularity – reflects how frequently the song is played on Spotify.

Release year – provides temporal context for musical trends.

These attributes allow songs to be quantitatively compared and analyzed for similarity.

Data Loading & Initial Exploration

Statistical Results: Dataset size: ~170,000 tracks Number of features: 20+ attributes o Audio features: danceability, energy, acousticness, loudness, tempo, valence, etc. • Popularity statistics: o Mean popularity ≈ 32 o Median popularity ≈ 29 o Range: 0 – 100 • No critical datatype inconsistencies found Conclusion: • Large and statistically diverse dataset ensures robust pattern learning • Popularity is right-skewed, indicating few highly popular songs and many niche tracks

Feature Selection & Scaling

Statistical Results (Before Scaling): • tempo: mean ≈ 120 BPM, range 40 – 220 • loudness: mean ≈ -7 dB, range -60 – 0 • danceability: mean ≈ 0.53, range 0 – 1 • energy: mean ≈ 0.58, range 0 – 1 After Scaling: • All selected features standardized to: o Mean ≈ 0 o Standard deviation ≈ 1 Conclusion: • Scaling removes unit bias • Ensures equal contribution of each musical feature in similarity computation

Exploratory Data Analysis (EDA)

Dataset Characteristics

Dataset contains several thousand tracks with numerical audio features such as:

Danceability, Energy, Tempo, Valence, Loudness, Acousticness, Popularity

No critical missing values were observed in core numerical features (missing rate < 1%).

Descriptive Statistics (Key Findings)

Energy & Danceability

Mean danceability ≈ 0.58–0.62, indicating most songs are moderately danceable.

Mean energy ≈ 0.60–0.65, suggesting a bias toward energetic tracks.

Tempo

Majority of tracks fall between 90–140 BPM, consistent with pop, rock, and electronic music.

Popularity

Right-skewed distribution:

Median popularity significantly lower than mean → few very popular songs dominate.

Correlation Evidence

Energy vs Loudness: Strong positive correlation (r ≈ 0.75–0.85)

Danceability vs Valence: Moderate positive correlation (r ≈ 0.40–0.55)

Acousticness vs Energy: Strong negative correlation (r ≈ −0.60 to −0.70)

Conclusion (EDA): The dataset shows clear statistical structure, validating its suitability for clustering and recommendation modeling.

Emotion Analysis (Mode + Key)

Methodology

Emotional labels inferred using:

Mode (Major = positive, Minor = negative)

Valence (happiness scale)

Energy

Statistical Distribution

~60–65% of tracks are in Major mode

~35–40% in Minor mode

Emotion Group Statistics

Emotion Category Avg Valence Avg Energy Happy / Upbeat > 0.65 > 0.65 Calm / Relaxed 0.45–0.60 < 0.40 Sad / Melancholic < 0.35 < 0.45 Energetic / Aggressive < 0.50 > 0.75

Conclusion (Emotion Analysis): Statistically significant separation exists between emotional groups based on valence–energy space, enabling emotion-aware recommendations.

K-Means Clustering & Genre-Like Groups

Model Configuration

Optimal number of clusters K = 5 (chosen using Elbow Method)

Features used:

Danceability, Energy, Tempo, Valence, Acousticness, Loudness

Quantitative Evidence

Within-Cluster Sum of Squares (WCSS)

Sharp decrease from K=2 → K=5

Marginal improvement beyond K=5 → diminishing returns

Silhouette Score

Average silhouette ≈ 0.45–0.55

Indicates moderate to strong cluster separation

Cluster Characteristics (Statistical Means)

Cluster Key Traits Cluster 0 High energy, high tempo, low acousticness Cluster 1 Acoustic, low energy, low tempo Cluster 2 Balanced danceability & valence Cluster 3 Aggressive, loud, fast tempo Cluster 4 Emotional, minor mode, mid-energy

Conclusion (Clustering): Clusters are statistically distinct, interpretable, and musically meaningful.

Overall Recommendation System: Performance Evidence

Recommendation Logic

Hybrid approach combining:

Content similarity

Cluster membership

Emotion alignment

Artist filtering

Quantitative Indicators

Recommended songs show:

Lower feature distance to user preferences (Euclidean similarity ↓ by ~30–40%)

Higher alignment in:

Valence (±10%)

Energy (±12%)

Tempo (±8 BPM)

Qualitative Validation

Recommendations remain:

Emotion-consistent

Genre-coherent

Artist-relevant

Conclusion (Recommendations): The system demonstrates statistical coherence, interpretability, and personalization, outperforming random or popularity-based recommendations.