Music Recommendation System Based on User Listening History
Developing a Music Recommendation System Using Spotify Song Data
Dataset
- Source: Kaggle, https://www.kaggle.com/datasets/fcpercival/160k-spotify-songs-sorted?select=data.csv
import pandas as pd
# Load the dataset
data = pd.read_csv("data/data.csv")
# Display the first few rows of the dataset
data.head()
| | id | name | artists | duration_ms | release_date | year | acousticness | danceability | energy | instrumentalness | liveness | loudness | speechiness | tempo | valence | mode | key | popularity | explicit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0gNNToCW3qjabgTyBSjt3H | !Que Vida! - Mono Version | ['Love'] | 220560 | 11/1/66 | 1966 | 0.5250 | 0.600 | 0.540 | 0.00305 | 0.100 | -11.803 | 0.0328 | 125.898 | 0.547 | 1 | 9 | 26 | 0 |
| 1 | 0tMgFpOrXZR6irEOLNWwJL | "40" | ['U2'] | 157840 | 2/28/83 | 1983 | 0.2280 | 0.368 | 0.480 | 0.70700 | 0.159 | -11.605 | 0.0306 | 150.166 | 0.338 | 1 | 8 | 21 | 0 |
| 2 | 2ZywW3VyVx6rrlrX75n3JB | "40" - Live | ['U2'] | 226200 | 8/20/83 | 1983 | 0.0998 | 0.272 | 0.684 | 0.01450 | 0.946 | -9.728 | 0.0505 | 143.079 | 0.279 | 1 | 8 | 41 | 0 |
| 3 | 6DdWA7D1o5TU2kXWyCLcch | "40" - Remastered 2008 | ['U2'] | 157667 | 2/28/83 | 1983 | 0.1850 | 0.371 | 0.545 | 0.58200 | 0.183 | -9.315 | 0.0307 | 150.316 | 0.310 | 1 | 8 | 37 | 0 |
| 4 | 3vMmwsAiLDCfyc1jl76lQE | "40" - Remastered 2008 | ['U2'] | 157667 | 2/28/83 | 1983 | 0.1850 | 0.371 | 0.545 | 0.58200 | 0.183 | -9.315 | 0.0307 | 150.316 | 0.310 | 1 | 8 | 35 | 0 |
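The preview already shows one quirk worth handling before modeling. Rows 3 and 4 are the same recording ("40" - Remastered 2008 by U2, with identical duration and audio features) listed under two different ids with slightly different popularity. Below is a minimal sketch of one way to collapse such rows. It assumes that rows matching on `name`, `artists`, and `duration_ms` are the same recording, which the dataset itself does not document. `drop_reissue_duplicates` is a hypothetical helper name. The demo uses only the five preview rows above.

```python
import pandas as pd

# The five preview rows from the table above (subset of columns).
preview = pd.DataFrame({
    "id": ["0gNNToCW3qjabgTyBSjt3H", "0tMgFpOrXZR6irEOLNWwJL",
           "2ZywW3VyVx6rrlrX75n3JB", "6DdWA7D1o5TU2kXWyCLcch",
           "3vMmwsAiLDCfyc1jl76lQE"],
    "name": ["!Que Vida! - Mono Version", '"40"', '"40" - Live',
             '"40" - Remastered 2008', '"40" - Remastered 2008'],
    "artists": ["['Love']", "['U2']", "['U2']", "['U2']", "['U2']"],
    "duration_ms": [220560, 157840, 226200, 157667, 157667],
    "popularity": [26, 21, 41, 37, 35],
})

def drop_reissue_duplicates(df, keys=("name", "artists", "duration_ms")):
    """Keep one row per key combination, preferring the most popular id.

    Assumption (hypothetical rule): rows that match on all key columns are
    the same recording released under different Spotify ids.
    """
    return (df.sort_values("popularity", ascending=False)  # most popular first
              .drop_duplicates(subset=list(keys), keep="first")
              .sort_index())  # restore the original row order

deduped = drop_reissue_duplicates(preview)
print(len(preview), "->", len(deduped))  # prints: 5 -> 4
```

Keeping the most popular copy is a design choice. Summing or averaging popularity across copies would be equally defensible, depending on how popularity is modeled.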
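The dataset comes from Kaggle and contains roughly 160K songs. Each row describes one track with 19 columns. These include identifiers and metadata (id, name, artists, release_date, year, duration_ms, explicit), Spotify's audio features (acousticness, danceability, energy, instrumentalness, liveness, loudness, speechiness, tempo, valence, mode, key), and a popularity score. We chose this dataset for two reasons:

1. It offers a diverse mix of continuous audio features and categorical/metadata features, which allows us to apply a wide range of techniques.
2. At about 160K rows, it is large enough for scalable analytics.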
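Figures such as the 160K row count can be computed from the loaded `data` frame instead of being hard-coded, and then quoted inline in the text. Below is a minimal sketch. `describe_dataset` is a hypothetical helper, and the demo runs on a tiny frame built from the preview rows so that it does not depend on `data/data.csv`.

```python
import pandas as pd

def describe_dataset(df: pd.DataFrame) -> dict:
    """Summary numbers for the dataset description (hypothetical helper)."""
    n_rows, n_cols = df.shape
    return {
        "n_rows": n_rows,
        "n_cols": n_cols,
        "first_year": int(df["year"].min()),      # earliest release year
        "last_year": int(df["year"].max()),       # latest release year
        "n_missing": int(df.isna().sum().sum()),  # missing cells across all columns
    }

# Demo on the year/popularity values of the five preview rows;
# with the real data, call describe_dataset(data) instead.
demo = pd.DataFrame({"year": [1966, 1983, 1983, 1983, 1983],
                     "popularity": [26, 21, 41, 37, 35]})
summary = describe_dataset(demo)
print(summary)
# {'n_rows': 5, 'n_cols': 2, 'first_year': 1966, 'last_year': 1983, 'n_missing': 0}
```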
Questions
Which audio and metadata features (such as danceability, energy, tempo, or valence) are most strongly associated with a song’s popularity on Spotify? This question focuses on identifying the attributes that best predict popularity. It can be modeled either as regression (predicting the popularity score directly) or as classification (predicting whether a song is popular).
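One possible first baseline for this question is ordinary least squares on z-scored features. Standardizing makes coefficient magnitudes comparable across features measured in different units, such as `tempo` in BPM versus `energy` on a 0–1 scale. This is a minimal sketch that assumes a linear model is an acceptable starting point; the proposal does not fix a model. The helper name and the demo data are illustrative only. The demo data is synthetic and is not a result from the Spotify dataset.

```python
import numpy as np
import pandas as pd

FEATURES = ["danceability", "energy", "tempo", "valence"]  # features named in the question

def standardized_coefficients(df, features, target="popularity"):
    """Fit OLS on z-scored features and target; return coefficients sorted by |size|."""
    X = df[features].to_numpy(dtype=float)
    y = df[target].to_numpy(dtype=float)
    # z-score so coefficients are comparable across features with different units
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    y = (y - y.mean()) / y.std()
    # no intercept column needed: both X and y are centred
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    s = pd.Series(coef, index=features)
    return s.reindex(s.abs().sort_values(ascending=False).index)

# Synthetic data for illustration only (NOT the Spotify data):
# popularity is constructed to depend mostly on energy and a little on valence.
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "danceability": rng.uniform(0, 1, n),
    "energy": rng.uniform(0, 1, n),
    "tempo": rng.uniform(60, 200, n),
    "valence": rng.uniform(0, 1, n),
})
demo["popularity"] = 60 * demo["energy"] + 10 * demo["valence"] + rng.normal(0, 2, n)

ranked = standardized_coefficients(demo, FEATURES)
print(ranked.round(2))
```

Standardized coefficients capture only linear association. Features that are likely correlated (such as `energy` and `loudness`) can also split credit between them. A tree-based model or permutation importance would therefore be a natural follow-up to this baseline.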
How have the musical characteristics of popular songs evolved from the 1920s to the 2020s? This question uses clustering and trend analysis to reveal how listener preferences and musical production styles have changed over time.
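The trend-analysis part of this question needs a decade variable. The preview shows release dates such as `11/1/66`, whose two-digit years can be placed in the wrong century by a date parser (for example, read as 2066). Deriving the decade from the integer `year` column avoids that problem. Below is a minimal sketch of per-decade feature means. The `min_popularity` cutoff for what counts as a "popular" song is a hypothetical parameter, not a threshold from the proposal. The clustering part is not covered here.

```python
import pandas as pd

def decade_profile(df, features, min_popularity=None):
    """Mean of each audio feature per decade, derived from the integer `year` column.

    min_popularity (hypothetical) keeps only songs at or above that popularity score.
    """
    if min_popularity is not None:
        df = df[df["popularity"] >= min_popularity]
    decade = ((df["year"] // 10) * 10).rename("decade")  # e.g. 1983 -> 1980
    return df.groupby(decade)[features].mean()

# The five preview rows from the table above (subset of columns).
preview = pd.DataFrame({
    "year": [1966, 1983, 1983, 1983, 1983],
    "energy": [0.540, 0.480, 0.684, 0.545, 0.545],
    "valence": [0.547, 0.338, 0.279, 0.310, 0.310],
    "popularity": [26, 21, 41, 37, 35],
})
profile = decade_profile(preview, ["energy", "valence"])
print(profile)
```

On the full data, plotting each column of the resulting frame against the decade index gives a direct view of how features like energy or valence drift over time.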
Analysis plan
- A plan for answering each of the questions including the variables involved, variables to be created (if any), external data to be merged in (if any).