Music Recommendation System Based on User Listening History

Developing a Music Recommendation System Using Spotify Song Data

Author
Affiliations

Srikanth (Solo) - Srikanth Oruganti

College of Information Science, University of Arizona

import numpy as np
import pandas as pd

Dataset

  • Source: Kaggle - https://www.kaggle.com/datasets/fcpercival/160k-spotify-songs-sorted?select=data.csv
# Load the dataset
data = pd.read_csv("data/data.csv")

# Display the first few rows of the dataset
data.head()
|   | id | name | artists | duration_ms | release_date | year | acousticness | danceability | energy | instrumentalness | liveness | loudness | speechiness | tempo | valence | mode | key | popularity | explicit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0gNNToCW3qjabgTyBSjt3H | !Que Vida! - Mono Version | ['Love'] | 220560 | 11/1/66 | 1966 | 0.5250 | 0.600 | 0.540 | 0.00305 | 0.100 | -11.803 | 0.0328 | 125.898 | 0.547 | 1 | 9 | 26 | 0 |
| 1 | 0tMgFpOrXZR6irEOLNWwJL | "40" | ['U2'] | 157840 | 2/28/83 | 1983 | 0.2280 | 0.368 | 0.480 | 0.70700 | 0.159 | -11.605 | 0.0306 | 150.166 | 0.338 | 1 | 8 | 21 | 0 |
| 2 | 2ZywW3VyVx6rrlrX75n3JB | "40" - Live | ['U2'] | 226200 | 8/20/83 | 1983 | 0.0998 | 0.272 | 0.684 | 0.01450 | 0.946 | -9.728 | 0.0505 | 143.079 | 0.279 | 1 | 8 | 41 | 0 |
| 3 | 6DdWA7D1o5TU2kXWyCLcch | "40" - Remastered 2008 | ['U2'] | 157667 | 2/28/83 | 1983 | 0.1850 | 0.371 | 0.545 | 0.58200 | 0.183 | -9.315 | 0.0307 | 150.316 | 0.310 | 1 | 8 | 37 | 0 |
| 4 | 3vMmwsAiLDCfyc1jl76lQE | "40" - Remastered 2008 | ['U2'] | 157667 | 2/28/83 | 1983 | 0.1850 | 0.371 | 0.545 | 0.58200 | 0.183 | -9.315 | 0.0307 | 150.316 | 0.310 | 1 | 8 | 35 | 0 |

The dataset comes from Kaggle (see Source above) and contains roughly 160,000 Spotify tracks, each described by the 19 columns shown in the preview. It was chosen for two reasons:

  1. It offers a diverse mix of continuous numerical audio features (such as danceability, energy, and tempo) and categorical or metadata features (such as key, mode, explicit, and artists), which allows a wide range of techniques to be applied.
  2. At about 160,000 rows, it is large enough for scalable analytics.
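The dimensions and feature mix described above can be reported directly from the loaded data rather than typed by hand. The following is a minimal sketch, not the project's final code: `summarize_dataset` is a hypothetical helper, and `sample` is a two-row stand-in with values copied from the preview so that the example runs without the CSV file.

```python
import pandas as pd


def summarize_dataset(df: pd.DataFrame) -> dict:
    """Return row/column counts and a numeric vs. non-numeric column split."""
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    other_cols = [c for c in df.columns if c not in numeric_cols]
    return {
        "n_rows": len(df),
        "n_cols": df.shape[1],
        "numeric_cols": numeric_cols,
        "other_cols": other_cols,
    }


# Illustrative stand-in only: two rows and five columns taken from the preview.
# With the real file, pass pd.read_csv("data/data.csv") instead.
sample = pd.DataFrame({
    "name": ['"40"', '"40" - Live'],
    "artists": ["['U2']", "['U2']"],
    "year": [1983, 1983],
    "danceability": [0.368, 0.272],
    "popularity": [21, 41],
})

summary = summarize_dataset(sample)
print(f"{summary['n_rows']} rows x {summary['n_cols']} columns")
print("numeric:", summary["numeric_cols"])
print("other:", summary["other_cols"])
```

Computing these figures from the data frame keeps the written description in sync with the file if the dataset is later filtered or updated.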


Questions

What key audio and metadata features (such as danceability, energy, tempo, or valence) most strongly influence a song’s popularity on Spotify? This question focuses on identifying predictive attributes that determine song success, enabling classification or regression modeling to forecast popularity.
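One possible first pass at this question is to rank the numeric features by the strength of their linear correlation with `popularity` before fitting any model. The sketch below assumes Pearson correlation as the screening measure. `rank_by_correlation` and `AUDIO_FEATURES` are hypothetical names, and the data frame is synthetic (generated so that only `energy` drives popularity), so the numbers it prints say nothing about the real Spotify data.

```python
import numpy as np
import pandas as pd

# Assumed feature subset, taken from the examples named in the question.
AUDIO_FEATURES = ["danceability", "energy", "tempo", "valence"]


def rank_by_correlation(df, target="popularity", features=AUDIO_FEATURES):
    """Pearson correlation of each feature with the target, strongest first."""
    corr = df[features].corrwith(df[target])
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Synthetic stand-in data, NOT real results: popularity depends on energy only.
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "danceability": rng.uniform(0, 1, n),
    "energy": rng.uniform(0, 1, n),
    "tempo": rng.uniform(60, 180, n),
    "valence": rng.uniform(0, 1, n),
})
demo["popularity"] = 60 * demo["energy"] + rng.normal(0, 5, n)

ranking = rank_by_correlation(demo)
print(ranking)
```

Correlation only captures linear, one-feature-at-a-time relationships, so it is best treated as a screening step ahead of the regression or classification models mentioned in the question.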

How have the musical characteristics of popular songs evolved across different decades (1920s–2020s)? This question explores temporal trends to reveal how listener preferences and musical production styles have changed over time using clustering and trend analysis.
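A simple starting point for the trend analysis is to derive a `decade` variable from `year` and average each audio feature over the popular songs of that decade. The sketch below assumes "popular" means at or above a within-decade popularity quantile, which is one possible definition rather than a settled choice. `feature_means_by_decade` and `top_quantile` are hypothetical names, and the six-row frame is made up for illustration.

```python
import pandas as pd


def feature_means_by_decade(df, features, top_quantile=0.5):
    """Mean of each feature per decade, restricted to that decade's popular songs."""
    out = df.copy()
    out["decade"] = (out["year"] // 10) * 10  # e.g. 1983 -> 1980
    # Assumed definition of "popular": at or above the decade's own quantile.
    threshold = out.groupby("decade")["popularity"].transform(
        lambda s: s.quantile(top_quantile)
    )
    popular = out[out["popularity"] >= threshold]
    return popular.groupby("decade")[features].mean()


# Made-up illustration data, not taken from the real dataset.
demo = pd.DataFrame({
    "year": [1966, 1968, 1983, 1983, 1985, 1989],
    "popularity": [26, 40, 21, 41, 37, 35],
    "energy": [0.5, 0.7, 0.4, 0.6, 0.9, 0.2],
})

trend = feature_means_by_decade(demo, ["energy"])
print(trend)
```

The resulting decade-by-feature table can be plotted as line charts or used as input to clustering to compare decades.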

Analysis plan

  • A plan for answering each of the questions including the variables involved, variables to be created (if any), external data to be merged in (if any).