This dance, it’s like a weapon: Radiohead’s and Beck’s danceability, valence, popularity, and more from the LastFM and Spotify APIs
This article was originally published at https://rcrastinate.blogspot.com/
- Set up the R packages to access the Spotify and Last.FM APIs
- Get chart information on a specific user (in this case: me) from Last.FM
- Get information on these top artists from Spotify
- Visualize some of the information with ggplot2
- Get information on all the songs by Beck and Radiohead
- Again: Visualize this information
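library(spotifyr)
library(RLastFM)
library(RCurl) # getForm(), used in user.getTopArt2() below
library(XML) # xmlParse(), used in user.getTopArt2() below
library(dplyr)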
# Spotify
s.id <- "< API ID >"
s.sc <- "< API Secret >"
# LastFM
l.ky <- "< API Key >"
l.ap <- "http://ws.audioscrobbler.com/2.0/"
Sys.setenv(SPOTIFY_CLIENT_ID = s.id)
Sys.setenv(SPOTIFY_CLIENT_SECRET = s.sc)
access_token <- get_spotify_access_token()
I slightly adapted the user.getTopArtists() function in the RLastFM package because I wanted to use the "limit" parameter of the Last.FM API.
user.getTopArt2 <- function (username, period = NA, key = l.ky, parse = TRUE, limit = 20) {
  params = list(method = "user.gettopartists", user = username,
                period = period, api_key = key, limit = limit)
  # Drop every parameter that was left as NA
  params = params[!as.logical(lapply(params, is.na))]
  ret = getForm(l.ap, .params = params)
  doc = xmlParse(ret, asText = TRUE)
  if (parse)
    doc = RLastFM:::p.user.gettopartists(doc)
  return(doc)
}
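The NA-filtering line in the function above can be tried in isolation. This is a minimal sketch with placeholder values ("some_user" and "dummy-key" are made up) and no API call; it only shows which parameters survive the filter.

```r
# Placeholder parameter list; "period" is left as NA on purpose.
params <- list(method = "user.gettopartists", user = "some_user",
               period = NA, api_key = "dummy-key", limit = 20)

# is.na() is applied to each element; elements that are NA are dropped,
# so they never end up in the request sent to the API.
is.missing <- as.logical(lapply(params, is.na))
params.clean <- params[!is.missing]

print(names(params.clean))
```

With `period = NA`, the request carries only `method`, `user`, `api_key`, and `limit`.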
Also, I adapted the function get_artist_audio_features() from the spotifyr package to override the dialog where I am required to type the name of the artist in interactive sessions.
get.art.audio.feats <- function (artist_name, access_token = get_spotify_access_token()) {
  # Take the first search hit instead of asking interactively
  artists <- get_artists(artist_name)
  selected_artist <- artists$artist_name[1]
  artist_uri <- artists$artist_uri[artists$artist_name == selected_artist]
  albums <- get_albums(artist_uri)
  if (nrow(albums) > 0) {
    albums <- select(albums, -c(base_album_name, base_album,
                                num_albums, num_base_albums, album_rank))
  } else {
    stop(paste0("Cannot find any albums for \"", selected_artist, "\" on Spotify"))
  }
  album_popularity <- get_album_popularity(albums)
  tracks <- get_album_tracks(albums)
  track_features <- get_track_audio_features(tracks)
  track_popularity <- get_track_popularity(tracks)
  # Join album, track, feature and popularity information into one table
  tots <- albums %>%
    left_join(album_popularity, by = "album_uri") %>%
    left_join(tracks, by = "album_name") %>%
    left_join(track_features, by = "track_uri") %>%
    left_join(track_popularity, by = "track_uri")
  return(tots)
}
Now, we get the relevant data. First, we access the current top 20 artists (of all time) of a specific Last.FM user.
top.arts <- as.data.frame(user.getTopArt2("< user name >", period = "overall", limit = 20))
head(top.arts)
                     artist playcount                                 mbid rank
1                  Menomena      1601 ad386705-fb8c-40ec-94d7-e690e079e979    1
2                      Beck      1594 a8baaa41-50f1-4f63-979e-717c14979dfb    2
3                 Radiohead      1556 a74b1b7f-71a5-4011-9441-d0b5e4122711    3
4                 PJ Harvey      1546 e795e03d-b5d5-4a5f-834d-162cfb308a2c    4
5                    Prince      1501 cdc0fff7-54cf-4052-a283-319b648670fd    5
6 Nick Cave & The Bad Seeds      1243 172e1f1a-504d-4488-b053-6344ba63e6d0    6
Now, we are accessing the "artist audio features" for each one of the top 20 artists. I'm including a Sys.sleep(5) here, because I don't want to access Spotify's API too frequently. With 5 seconds between each artist, we should be on the safe side - I guess 2 seconds would be enough.
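A throttled loop of this kind can be sketched without touching the API. In the sketch below, fetch.artist() is a hypothetical stand-in for the Spotify call, "Unknown Artist" is a made-up name that triggers an error, and the pause is shortened so the example runs quickly. The tryCatch() wrapper is an addition that the loop in this post does not use; it is one possible way to keep a single failing artist from aborting the whole run.

```r
# Hypothetical stand-in for the real API call: fails for one artist.
fetch.artist <- function(name) {
  if (name == "Unknown Artist") stop("no albums found")
  data.frame(artist = name, danceability = 0.5)
}

artists <- c("Beck", "Unknown Artist", "Radiohead")
pause <- 0.1  # seconds; the post uses 5 for the real API

results <- list()
for (a in artists) {
  # On error, return NULL instead of stopping the loop
  res <- tryCatch(fetch.artist(a), error = function(e) NULL)
  if (!is.null(res)) results[[a]] <- res
  Sys.sleep(pause)  # wait before the next request
}

print(names(results))
```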
Please note that get.art.audio.feats() returns a tibble that contains one row for each of the artist's tracks. That is why I have to aggregate the data by artist. Hence, the "audio features" of an artist are the mean values of all the tracks by this artist.
art.info.list <- list()
for (i in seq_len(nrow(top.arts))) {
  cat(i, "\n") # progress indicator
  new.art.info <- get.art.audio.feats(top.arts[i, "artist"])
  new.art.info$artist <- top.arts[i, "artist"]
  art.info.list[[length(art.info.list) + 1]] <- new.art.info
  Sys.sleep(5) # pause between artists to go easy on the API
}
art.info <- do.call("rbind", art.info.list)
agg.art.info <- aggregate(cbind(danceability, energy, speechiness,
acousticness, instrumentalness, liveness,
valence, tempo, track_popularity) ~ artist,
data = art.info, FUN = mean)
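To see what this aggregation does, here is a minimal sketch on made-up track data (artists "A" and "B" and all values are invented). It also shows a detail of the formula interface of aggregate(): by default, rows with an NA in any of the listed columns are dropped before the means are computed.

```r
# Toy track-level data: two tracks per artist, one missing valence value.
tracks <- data.frame(
  artist       = c("A", "A", "B", "B"),
  danceability = c(0.2, 0.4, 0.6, 0.8),
  valence      = c(0.1, 0.3, 0.5, NA)
)

# One row per artist, each feature averaged over that artist's tracks.
agg <- aggregate(cbind(danceability, valence) ~ artist,
                 data = tracks, FUN = mean)
print(agg)
```

Artist "B" ends up with a mean danceability of 0.6 rather than 0.7, because the track with the missing valence is removed as a whole row.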
Now, let's merge Last.FM's playcount information with the information we got from Spotify's API. We are also sorting the dataframe by my playcount on Last.FM.
agg.art.info <- merge(agg.art.info, top.arts[, c("playcount", "rank", "artist")],
                      by = "artist", all.x = TRUE, all.y = FALSE)
agg.art.info <- agg.art.info[order(agg.art.info$playcount, decreasing = TRUE), ]
saveRDS(agg.art.info, file = "agg.art.info.Rds")
What we get is this:
                      artist danceability    energy speechiness acousticness instrumentalness  liveness
7                   Menomena    0.5397246 0.5475217  0.05414783    0.2678826       0.28483706 0.1451246
3                       Beck    0.6043865 0.6106874  0.06503006    0.2653964       0.20943021 0.1872540
13                 Radiohead    0.4095833 0.5755077  0.05504551    0.3131903       0.39465664 0.2026224
11                 PJ Harvey    0.5155636 0.4578315  0.09909212    0.4328224       0.13406858 0.1739836
12                    Prince    0.7078350 0.7295825  0.04883786    0.2464166       0.04265596 0.1405495
8  Nick Cave & The Bad Seeds    0.4147489 0.5152961  0.05414545    0.3465942       0.07602337 0.2952866
     valence    tempo track_popularity playcount rank
7  0.3084768 131.4496          5.84058      1601    1
3  0.4795632 115.6165         21.64417      1594    2
13 0.2858109 118.7648         47.62821      1556    3
11 0.4331848 118.7181         21.32727      1546    4
12 0.7672718 126.3280         25.81553      1501    5
8  0.3521593 119.5002         27.47619      1243    6
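The merge-and-sort step that produced this table can be tried on toy data. The sketch below uses invented artists and values; it shows that all.x = TRUE keeps every row of the first data frame (with NA where the second has no match), while all.y = FALSE drops rows that only exist in the second one.

```r
# Toy versions of the two tables being combined (all values invented).
features <- data.frame(artist = c("A", "B", "C"),
                       valence = c(0.3, 0.5, 0.7))
charts <- data.frame(artist = c("B", "A", "D"),
                     playcount = c(10, 25, 5),
                     rank = c(2, 1, 3))

# Left join on artist: "C" is kept with NA, "D" is dropped.
merged <- merge(features, charts, by = "artist", all.x = TRUE, all.y = FALSE)

# Sort by playcount, highest first; rows with NA playcount go last.
merged <- merged[order(merged$playcount, decreasing = TRUE), ]
print(merged)
```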
Just a few quick notes concerning the audio features given by the Spotify API (I'm citing Spotify's webpage here):
- danceability: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
- energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
- speechiness: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
- acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
- instrumentalness: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.
- liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.
- valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
- tempo: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
- track_popularity: The popularity of a track is a value between 0 and 100, with 100 being the most popular. The popularity is calculated by algorithm and is based, in the most part, on the total number of plays the track has had and how recent those plays are. Generally speaking, songs that are being played a lot now will have a higher popularity than songs that were played a lot in the past. Duplicate tracks (e.g. the same track from a single and an album) are rated independently.
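The speechiness thresholds quoted above (0.33 and 0.66) can be turned into labels with cut(). This is only an illustrative sketch: the label names and the example values are made up, and assigning the exact boundary values to the lower category is an assumption, since the description does not say which side they belong to.

```r
# Made-up speechiness values for three tracks.
speechiness <- c(0.05, 0.40, 0.90)

# Intervals: [0, 0.33], (0.33, 0.66], (0.66, 1]
label <- cut(speechiness,
             breaks = c(0, 0.33, 0.66, 1),
             labels = c("music", "mixed", "spoken word"),
             include.lowest = TRUE)

print(data.frame(speechiness, label))
```

By this reading, all artist-level means in the table above (between roughly 0.05 and 0.10) fall well inside the "music" range.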
library(ggplot2)
library(ggrepel)
library(tidyr)
library(scales)
library(RColorBrewer)
data <- readRDS("agg.art.info.Rds")
# Rescale the variables that are not already on a 0-1 scale
for (var in c("tempo", "track_popularity", "playcount")) {
  data[, paste0("norm.", var)] <- rescale(data[, var], to = c(0, 1))
}
# Reshape to long format: one row per artist and audio feature
data.long <- gather(data, key = variable, value = value,
                    c("danceability", "energy", "speechiness", "acousticness",
                      "instrumentalness", "liveness", "valence"))
# Capitalize the feature names for the axis labels
substr(data.long$variable, 1, 1) <- toupper(substr(data.long$variable, 1, 1))
# Keep the artists in playcount order
data.long$artist <- factor(data.long$artist, levels = data$artist)
ggplot(data.long[data.long$rank %in% 1:5,],
       aes(y = value, x = variable, group = artist, col = artist)) +
  geom_line(size = 2, lineend = "round", alpha = .8) +
  scale_color_manual(values = brewer.pal(5, "Spectral")) +
  scale_y_continuous(breaks = NULL) +
  labs(x = "", y = "", col = "Artist") +
  coord_flip() +
  theme_dark() +
  theme(axis.text.y = element_text(size = 15),
        panel.grid.major.y = element_line(size = 1))
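The normalization loop in the code above maps tempo, track popularity, and playcount onto the range from 0 to 1. For a target range of c(0, 1), this is plain min-max scaling. The sketch below reproduces it in base R with a hypothetical helper called rescale01(), applied to the six play counts shown earlier (the real data has 20 artists, so the actual normalized values differ).

```r
# Hypothetical helper: min-max scaling to the range [0, 1].
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Play counts of the six artists from the head() output above.
playcount <- c(1601, 1594, 1556, 1546, 1501, 1243)
norm.playcount <- rescale01(playcount)

print(round(norm.playcount, 3))
```

The highest play count maps to 1 and the lowest to 0; everything else falls in between in proportion to its distance from the two extremes.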