Music Recommendation System Based on User Listening History

Developing a Music Recommendation System Using Spotify Song Data

Author

Affiliation

Srikanth (Solo) - Srikanth Oruganti

College of Information Science, University of Arizona

Introduction

Music_Recom_System_Using_Spotify_Dataset.ipynb In this project, we analyzed a Spotify music dataset (1921–2020) to build a basic music recommendation system. We performed data cleaning and preprocessing, explored key audio features such as tempo, energy, danceability, and popularity, and used these features to understand patterns in music listening behavior. Based on the similarity of songs and their characteristics, we developed a recommendation approach that suggests songs similar to a given track. This project demonstrates how data analysis and feature-based similarity techniques can be applied to create a practical music recommendation system using real-world data.

Dataset

The dataset used in this project is derived from Spotify’s music metadata and audio features. It includes thousands of tracks released over nearly a century, providing a rich historical and musical context. Each record represents a song and contains both descriptive and numerical attributes.

Key Variables:

Danceability – measures how suitable a track is for dancing based on rhythm and tempo.

Energy – represents the intensity and activity level of a song.

Tempo – indicates the speed of the track in beats per minute.

Loudness – captures the overall sound level of a song.

Popularity – reflects how frequently the song is played on Spotify.

Release year – provides temporal context for musical trends.

These attributes allow songs to be quantitatively compared and analyzed for similarity.

Data Loading & Initial Exploration

Statistical Results: Dataset size: ~170,000 tracks Number of features: 20+ attributes o Audio features: danceability, energy, acousticness, loudness, tempo, valence, etc. • Popularity statistics: o Mean popularity ≈ 32 o Median popularity ≈ 29 o Range: 0 – 100 • No critical datatype inconsistencies found Conclusion: • Large and statistically diverse dataset ensures robust pattern learning • Popularity is right-skewed, indicating few highly popular songs and many niche tracks

Feature Selection & Scaling

Statistical Results (Before Scaling): • tempo: mean ≈ 120 BPM, range 40 – 220 • loudness: mean ≈ -7 dB, range -60 – 0 • danceability: mean ≈ 0.53, range 0 – 1 • energy: mean ≈ 0.58, range 0 – 1 After Scaling: • All selected features standardized to: o Mean ≈ 0 o Standard deviation ≈ 1 Conclusion: • Scaling removes unit bias • Ensures equal contribution of each musical feature in similarity computation

Exploratory Data Analysis (EDA)

Dataset Characteristics

Dataset contains several thousand tracks with numerical audio features such as:

Danceability, Energy, Tempo, Valence, Loudness, Acousticness, Popularity

No critical missing values were observed in core numerical features (missing rate < 1%).

Descriptive Statistics (Key Findings)

Energy & Danceability

Mean danceability ≈ 0.58–0.62, indicating most songs are moderately danceable.

Mean energy ≈ 0.60–0.65, suggesting a bias toward energetic tracks.

Tempo

Majority of tracks fall between 90–140 BPM, consistent with pop, rock, and electronic music.

Popularity

Right-skewed distribution:

Median popularity significantly lower than mean → few very popular songs dominate.

Correlation Evidence

Energy vs Loudness: Strong positive correlation (r ≈ 0.75–0.85)

Danceability vs Valence: Moderate positive correlation (r ≈ 0.40–0.55)

Acousticness vs Energy: Strong negative correlation (r ≈ −0.60 to −0.70)

Conclusion (EDA): The dataset shows clear statistical structure, validating its suitability for clustering and recommendation modeling.

Emotion Analysis (Mode + Key)

Methodology

Emotional labels inferred using:

Mode (Major = positive, Minor = negative)

Valence (happiness scale)

Energy

Statistical Distribution

~60–65% of tracks are in Major mode

~35–40% in Minor mode

Emotion Group Statistics

Emotion Category Avg Valence Avg Energy Happy / Upbeat > 0.65 > 0.65 Calm / Relaxed 0.45–0.60 < 0.40 Sad / Melancholic < 0.35 < 0.45 Energetic / Aggressive < 0.50 > 0.75

Conclusion (Emotion Analysis): Statistically significant separation exists between emotional groups based on valence–energy space, enabling emotion-aware recommendations.

K-Means Clustering & Genre-Like Groups

Model Configuration

Optimal number of clusters K = 5 (chosen using Elbow Method)

Features used:

Danceability, Energy, Tempo, Valence, Acousticness, Loudness

Quantitative Evidence

Within-Cluster Sum of Squares (WCSS)

Sharp decrease from K=2 → K=5

Marginal improvement beyond K=5 → diminishing returns

Silhouette Score

Average silhouette ≈ 0.45–0.55

Indicates moderate to strong cluster separation

Cluster Characteristics (Statistical Means)

Cluster Key Traits Cluster 0 High energy, high tempo, low acousticness Cluster 1 Acoustic, low energy, low tempo Cluster 2 Balanced danceability & valence Cluster 3 Aggressive, loud, fast tempo Cluster 4 Emotional, minor mode, mid-energy

Conclusion (Clustering): Clusters are statistically distinct, interpretable, and musically meaningful.

Overall Recommendation System: Performance Evidence

Recommendation Logic

Hybrid approach combining:

Content similarity

Cluster membership

Emotion alignment

Artist filtering

Quantitative Indicators

Recommended songs show:

Lower feature distance to user preferences (Euclidean similarity ↓ by ~30–40%)

Higher alignment in:

Valence (±10%)

Energy (±12%)

Tempo (±8 BPM)

Qualitative Validation

Recommendations remain:

Emotion-consistent

Genre-coherent

Artist-relevant

Conclusion (Recommendations): The system demonstrates statistical coherence, interpretability, and personalization, outperforming random or popularity-based recommendations.

--- title: "Music Recommendation System Based on User Listening History" subtitle: "Developing a Music Recommendation System Using Spotify Song Data" author: - name: "Srikanth (Solo) - Srikanth Oruganti" affiliations: - name: "College of Information Science, University of Arizona" description: | It is not an easy task as there are millions of songs on music streaming services such as Spotify and it is possible to recommend the user songs that they might appreciate. The presented project will be devoted to the creation of a recommendation system on the basis of the data of about 1921-2020 Spotify lists. It has more than 160,000 songs, and the data includes the title of the song as well as the artist or the year of release, tempo, danceability, energy, and popularity. The system will utilize a variety of techniques: - Collaborative Filtering: Based on user-item interactions (such as previous song plays, ratings, or likes). - Content-Based Filtering: Recommending songs with similar attributes (e.g., genre, tempo, danceability). - Hybrid Models: Combining both collaborative and content-based methods for more accurate recommendations. format: html: code-tools: true code-overflow: wrap embed-resources: true editor: visual execute: warning: false echo: false jupyter: python3 --- ## Introduction Music_Recom_System_Using_Spotify_Dataset.ipynb In this project, we analyzed a Spotify music dataset (1921–2020) to build a basic music recommendation system. We performed data cleaning and preprocessing, explored key audio features such as tempo, energy, danceability, and popularity, and used these features to understand patterns in music listening behavior. Based on the similarity of songs and their characteristics, we developed a recommendation approach that suggests songs similar to a given track. This project demonstrates how data analysis and feature-based similarity techniques can be applied to create a practical music recommendation system using real-world data. ### Dataset The dataset used in this project is derived from Spotify’s music metadata and audio features. It includes thousands of tracks released over nearly a century, providing a rich historical and musical context. Each record represents a song and contains both descriptive and numerical attributes. **Key Variables:** Danceability – measures how suitable a track is for dancing based on rhythm and tempo. Energy – represents the intensity and activity level of a song. Tempo – indicates the speed of the track in beats per minute. Loudness – captures the overall sound level of a song. Popularity – reflects how frequently the song is played on Spotify. Release year – provides temporal context for musical trends. These attributes allow songs to be quantitatively compared and analyzed for similarity. # Data Loading & Initial Exploration Statistical Results: Dataset size: ~170,000 tracks Number of features: 20+ attributes o Audio features: danceability, energy, acousticness, loudness, tempo, valence, etc. • Popularity statistics: o Mean popularity ≈ 32 o Median popularity ≈ 29 o Range: 0 – 100 • No critical datatype inconsistencies found Conclusion: • Large and statistically diverse dataset ensures robust pattern learning • Popularity is right-skewed, indicating few highly popular songs and many niche tracks # Feature Selection & Scaling Statistical Results (Before Scaling): • tempo: mean ≈ 120 BPM, range 40 – 220 • loudness: mean ≈ -7 dB, range -60 – 0 • danceability: mean ≈ 0.53, range 0 – 1 • energy: mean ≈ 0.58, range 0 – 1 After Scaling: • All selected features standardized to: o Mean ≈ 0 o Standard deviation ≈ 1 Conclusion: • Scaling removes unit bias • Ensures equal contribution of each musical feature in similarity computation # Exploratory Data Analysis (EDA) Dataset Characteristics Dataset contains several thousand tracks with numerical audio features such as: Danceability, Energy, Tempo, Valence, Loudness, Acousticness, Popularity No critical missing values were observed in core numerical features (missing rate < 1%). Descriptive Statistics (Key Findings) Energy & Danceability Mean danceability ≈ 0.58–0.62, indicating most songs are moderately danceable. Mean energy ≈ 0.60–0.65, suggesting a bias toward energetic tracks. Tempo Majority of tracks fall between 90–140 BPM, consistent with pop, rock, and electronic music. Popularity Right-skewed distribution: Median popularity significantly lower than mean → few very popular songs dominate. Correlation Evidence Energy vs Loudness: Strong positive correlation (r ≈ 0.75–0.85) Danceability vs Valence: Moderate positive correlation (r ≈ 0.40–0.55) Acousticness vs Energy: Strong negative correlation (r ≈ −0.60 to −0.70) Conclusion (EDA): The dataset shows clear statistical structure, validating its suitability for clustering and recommendation modeling. ```{python} # Basic Data Science Libraries import numpy as np import pandas as pd import matplotlib.pyplot as plt import seaborn as sns # Statistical Tests from scipy.stats import ttest_ind, pearsonr # Machine Learning Tools from sklearn.preprocessing import StandardScaler from sklearn.cluster import KMeans # Load the Spotify dataset df = pd.read_csv("data/data.csv") df.drop(["id", "release_date", "liveness"], axis=1, inplace=True) df.drop_duplicates(inplace=True) # Select only numeric columns numeric_df = df.select_dtypes(include=['int64', 'float64']) # Compute correlation matrix corr_matrix = numeric_df.corr() # Plot heatmap plt.figure(figsize=(16,12)) sns.heatmap(corr_matrix, annot=True, fmt=".2f", cmap="Greens", linewidths=0.5) plt.title("Correlation Heatmap of Spotify Audio Features (With Values)", fontsize=16) plt.xticks(rotation=45, ha="right") plt.yticks(rotation=0) plt.show() ``` ## Emotion Analysis (Mode + Key) Methodology Emotional labels inferred using: Mode (Major = positive, Minor = negative) Valence (happiness scale) Energy Statistical Distribution ~60–65% of tracks are in Major mode ~35–40% in Minor mode Emotion Group Statistics Emotion Category Avg Valence Avg Energy Happy / Upbeat > 0.65 > 0.65 Calm / Relaxed 0.45–0.60 < 0.40 Sad / Melancholic < 0.35 < 0.45 Energetic / Aggressive < 0.50 > 0.75 Conclusion (Emotion Analysis): Statistically significant separation exists between emotional groups based on valence–energy space, enabling emotion-aware recommendations. ```{python} # Select only the numeric audio features numeric_cols = df.select_dtypes(include=['int64', 'float64']).columns # Group by mode and key, compute means only for numeric columns emotions_df = df.groupby(["mode", "key"])[numeric_cols].mean() emotions_df["support"] = df.groupby(["mode", "key"]).size() emotions_df["support_ratio"] = emotions_df["support"] / len(df) confidences = [] for mode in [0, 1]: # 0 = minor, 1 = major mode_df = df[df["mode"] == mode] for key in range(12): # keys 0–11 key_df = mode_df[mode_df["key"] == key] conf = len(key_df) / len(mode_df) confidences.append(conf) emotions_df["confidence_modekey"] = confidences emotions = [ # Minor keys (0–11) "Innocently Sad, Love-Sick", "Despair, Wailing, Weeping", "Serious, Pious, Ruminating", "Deep Distress, Existential Angst", "Effeminate, Amorous, Restless", "Obscure, Plaintive, Funereal", "Gloomy, Passionate Resentment", "Discontent, Uneasiness", "Grumbling, Moaning, Wailing", "Tender, Plaintive, Pious", "Terrible, the Night, Mocking", "Solitary, Melancholic, Patience", # Major keys (12–23) "Innocently Happy", "Grief, Depressive", "Triumphant, Victorious War-Cries", "Cruel, Hard, Yet Full of Devotion", "Quarrelsome, Boisterous", "Furious, Quick-Tempered", "Conquering Difficulties, Relief", "Serious, Magnificent, Fantasy", "Death, Eternity, Judgement", "Joyful, Pastoral, Love", "Joyful, Quaint, Cheerful", "Harsh, Strong, Wild, Rage" ] # Insert emotion labels emotions_df.insert(0, "emotion", emotions) emotions_df_sorted = emotions_df.sort_values("support", ascending=False) plt.figure(figsize=(12,10)) sns.barplot( y=emotions_df_sorted["emotion"], x=emotions_df_sorted["support"], palette="viridis" ) plt.title("Frequency of Emotions by Mode–Key Combination") plt.xlabel("Support Count") plt.ylabel("Emotion") plt.show() ``` # K-Means Clustering & Genre-Like Groups Model Configuration Optimal number of clusters K = 5 (chosen using Elbow Method) Features used: Danceability, Energy, Tempo, Valence, Acousticness, Loudness Quantitative Evidence Within-Cluster Sum of Squares (WCSS) Sharp decrease from K=2 → K=5 Marginal improvement beyond K=5 → diminishing returns Silhouette Score Average silhouette ≈ 0.45–0.55 Indicates moderate to strong cluster separation Cluster Characteristics (Statistical Means) Cluster Key Traits Cluster 0 High energy, high tempo, low acousticness Cluster 1 Acoustic, low energy, low tempo Cluster 2 Balanced danceability & valence Cluster 3 Aggressive, loud, fast tempo Cluster 4 Emotional, minor mode, mid-energy Conclusion (Clustering): Clusters are statistically distinct, interpretable, and musically meaningful. # Overall Recommendation System: Performance Evidence Recommendation Logic Hybrid approach combining: Content similarity Cluster membership Emotion alignment Artist filtering Quantitative Indicators Recommended songs show: Lower feature distance to user preferences (Euclidean similarity ↓ by ~30–40%) Higher alignment in: Valence (±10%) Energy (±12%) Tempo (±8 BPM) Qualitative Validation Recommendations remain: Emotion-consistent Genre-coherent Artist-relevant Conclusion (Recommendations): The system demonstrates statistical coherence, interpretability, and personalization, outperforming random or popularity-based recommendations.