Music Recommendation System Based on User Listening History

Developing a Music Recommendation System Using Spotify Song Data

Author: Srikanth (Solo) - Srikanth Oruganti

Affiliation: College of Information Science, University of Arizona

import numpy as np
import pandas as pd

Dataset

Source: Kaggle (https://www.kaggle.com/datasets/fcpercival/160k-spotify-songs-sorted?select=data.csv)

import pandas as pd

# Load the dataset
data = pd.read_csv("data/data.csv")

# Display the first few rows of the dataset
data.head()

	id	name	artists	duration_ms	release_date	year	acousticness	danceability	energy	instrumentalness	liveness	loudness	speechiness	tempo	valence	mode	key	popularity
0	0gNNToCW3qjabgTyBSjt3H	!Que Vida! - Mono Version	['Love']	220560	11/1/66	1966	0.5250	0.600	0.540	0.00305	0.100	-11.803	0.0328	125.898	0.547	1	9	26
1	0tMgFpOrXZR6irEOLNWwJL	"40"	['U2']	157840	2/28/83	1983	0.2280	0.368	0.480	0.70700	0.159	-11.605	0.0306	150.166	0.338	1	8	21
2	2ZywW3VyVx6rrlrX75n3JB	"40" - Live	['U2']	226200	8/20/83	1983	0.0998	0.272	0.684	0.01450	0.946	-9.728	0.0505	143.079	0.279	1	8	41
3	6DdWA7D1o5TU2kXWyCLcch	"40" - Remastered 2008	['U2']	157667	2/28/83	1983	0.1850	0.371	0.545	0.58200	0.183	-9.315	0.0307	150.316	0.310	1	8	37
4	3vMmwsAiLDCfyc1jl76lQE	"40" - Remastered 2008	['U2']	157667	2/28/83	1983	0.1850	0.371	0.545	0.58200	0.183	-9.315	0.0307	150.316	0.310	1	8	35

The dataset comes from Kaggle and contains roughly 160,000 Spotify tracks. Each track is described by the 18 columns shown in the preview above: identifiers and metadata (id, name, artists, release_date, year), audio features (duration_ms, acousticness, danceability, energy, instrumentalness, liveness, loudness, speechiness, tempo, valence, mode, key), and popularity. It was chosen for two reasons:

1. It offers a diverse mix of continuous numerical audio features and categorical/metadata features, which allows a wide range of techniques to be applied.
2. At about 160,000 rows, it is large enough for scalable analytics.
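As a minimal sketch of how the dataset's dimensions could be computed rather than typed by hand, the helper below collects the row count, column count, and year range from a DataFrame. The function name `describe_dataset` and the three-row `sample` frame are illustrative assumptions; the sample values are copied from the preview above and stand in for `data/data.csv` so the block runs on its own.

```python
import pandas as pd

# Tiny stand-in for data/data.csv, copied from the preview rows above
# (only four of the columns, to keep the sketch short).
sample = pd.DataFrame(
    {
        "name": ["!Que Vida! - Mono Version", '"40"', '"40" - Live'],
        "year": [1966, 1983, 1983],
        "danceability": [0.600, 0.368, 0.272],
        "popularity": [26, 21, 41],
    }
)


def describe_dataset(df: pd.DataFrame) -> dict:
    """Collect the basic facts a dataset description usually reports."""
    n_rows, n_cols = df.shape
    return {
        "rows": n_rows,
        "columns": n_cols,
        "first_year": int(df["year"].min()),
        "last_year": int(df["year"].max()),
        "numeric_columns": df.select_dtypes("number").columns.tolist(),
    }


summary = describe_dataset(sample)
print(summary)
```

Calling `describe_dataset(data)` on the full file would give the real figures, which could then be referenced from inline code in the text instead of being hard-coded.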

Questions

What key audio and metadata features (such as danceability, energy, tempo, or valence) most strongly influence a song’s popularity on Spotify? This question focuses on identifying predictive attributes that determine song success, enabling classification or regression modeling to forecast popularity.
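One possible first step toward this question is to rank the numeric features by the strength of their linear correlation with popularity. The sketch below is an assumption about how that could look, not the project's chosen method: `rank_by_correlation` is a hypothetical helper, and the five-row `sample` is copied from the preview above in place of the full file. Correlation only measures linear association, so it is a screening step before any regression or classification model, not evidence of influence.

```python
import pandas as pd

# Five rows copied from the preview above; a stand-in for data/data.csv.
sample = pd.DataFrame(
    {
        "danceability": [0.600, 0.368, 0.272, 0.371, 0.371],
        "energy": [0.540, 0.480, 0.684, 0.545, 0.545],
        "tempo": [125.898, 150.166, 143.079, 150.316, 150.316],
        "valence": [0.547, 0.338, 0.279, 0.310, 0.310],
        "popularity": [26, 21, 41, 37, 35],
    }
)


def rank_by_correlation(df: pd.DataFrame, target: str = "popularity") -> pd.Series:
    """Return Pearson correlations with `target`, strongest (by magnitude) first."""
    numeric = df.select_dtypes("number")
    # Correlation of every numeric column with the target, minus the target itself.
    corr = numeric.corr()[target].drop(target)
    # Order by absolute value so strong negative relationships are not buried.
    order = corr.abs().sort_values(ascending=False).index
    return corr.loc[order]


ranking = rank_by_correlation(sample)
print(ranking)
```

With only five rows the printed values are not meaningful; the same call on the full `data` frame would give a usable ranking.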

How have the musical characteristics of popular songs evolved across different decades (1920s–2020s)? This question explores temporal trends to reveal how listener preferences and musical production styles have changed over time using clustering and trend analysis.
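The decade comparison needs a derived `decade` variable, which can be built from the existing `year` column. The following is a minimal sketch under that assumption; `add_decade` and `decade_profile` are hypothetical helper names, and the small `songs` frame reuses values from the preview above. Restricting the data to popular songs first (for example by a popularity threshold) would be an additional step not shown here.

```python
import pandas as pd

# Values copied from the preview above; a stand-in for data/data.csv.
songs = pd.DataFrame(
    {
        "year": [1966, 1983, 1983, 1983, 1983],
        "energy": [0.540, 0.480, 0.684, 0.545, 0.545],
        "valence": [0.547, 0.338, 0.279, 0.310, 0.310],
    }
)


def add_decade(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a `decade` column (e.g. 1983 -> 1980)."""
    out = df.copy()
    out["decade"] = (out["year"] // 10) * 10
    return out


def decade_profile(df: pd.DataFrame, features: list) -> pd.DataFrame:
    """Average the chosen audio features within each decade."""
    return add_decade(df).groupby("decade")[features].mean()


profile = decade_profile(songs, ["energy", "valence"])
print(profile)
```

The resulting table has one row per decade, which is a convenient input for trend plots or for clustering decades by their average audio profile.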

Analysis plan

A plan for answering each of the questions including the variables involved, variables to be created (if any), external data to be merged in (if any).

Description

With millions of songs available on music streaming services such as Spotify, recommending songs that a user is likely to appreciate is not an easy task. This project aims to build a recommendation system from a dataset of Spotify songs released between 1921 and 2020. The dataset contains more than 160,000 songs, and each record includes the song title, artist, year of release, tempo, danceability, energy, and popularity. The system will use a variety of techniques:

- Collaborative Filtering: based on user-item interactions (such as previous song plays, ratings, or likes).
- Content-Based Filtering: recommending songs with similar attributes (e.g., genre, tempo, danceability).
- Hybrid Models: combining collaborative and content-based methods for more accurate recommendations.
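The content-based filtering idea in the project description can be sketched with cosine similarity over standardized audio features. This is an illustrative assumption about one way to do it, not the project's final design: `recommend_similar` is a hypothetical helper, the feature list is arbitrary, and the four songs are copied from the dataset preview. Collaborative filtering is not shown because it needs user-item interaction data, which the columns in the preview do not contain.

```python
import numpy as np
import pandas as pd

# Four songs copied from the dataset preview; a stand-in for data/data.csv.
songs = pd.DataFrame(
    {
        "name": [
            "!Que Vida! - Mono Version",
            '"40"',
            '"40" - Live',
            '"40" - Remastered 2008',
        ],
        "danceability": [0.600, 0.368, 0.272, 0.371],
        "energy": [0.540, 0.480, 0.684, 0.545],
        "tempo": [125.898, 150.166, 143.079, 150.316],
    }
)
FEATURES = ["danceability", "energy", "tempo"]


def recommend_similar(df, song_index, features, top_n=2):
    """Return the top_n rows most similar to row `song_index` by cosine similarity."""
    x = df[features].to_numpy(dtype=float)
    # Standardize each feature so tempo (in BPM) does not dominate the 0-1 features.
    std = x.std(axis=0)
    std[std == 0] = 1.0
    z = (x - x.mean(axis=0)) / std
    # Scale rows to unit length; the dot product is then the cosine similarity.
    norms = np.linalg.norm(z, axis=1)
    norms[norms == 0] = 1.0
    unit = z / norms[:, None]
    sims = unit @ unit[song_index]
    sims[song_index] = -np.inf  # never recommend the query song itself
    best = np.argsort(sims)[::-1][:top_n]
    return df.iloc[best].assign(similarity=sims[best])


recs = recommend_similar(songs, song_index=1, features=FEATURES, top_n=2)
print(recs[["name", "similarity"]])
```

Standardizing before computing similarity is a design choice: without it, tempo differences of tens of BPM would swamp the features that live on a 0 to 1 scale.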