Dipack's Website

YouTube scraper for your Spotify library - in Rust!


I've been trying to learn Rust for a while, and by a while I mean for over 3 years now! Most of my attempts at learning the language have been thwarted by the learning curve, and my general laziness at having to do something that isn't strictly necessary for my day-to-day work and wellbeing.

What is Rust?

If you haven't heard of Rust, here's a brief description: it's a systems-level language (think C, C++, etc.) designed with memory safety as a first-class citizen, along with a ton of other features that I don't know enough to talk about. It has been touted as a replacement for C, C++ and similar languages in applications where performance and memory safety are top considerations. It's popular enough that Linus Torvalds has commented on possibly including the language in the Linux kernel, given enough time and development towards interoperability with other kernel code.

Why do I want to learn Rust?

My desire to learn Rust stemmed from the fact that I'd started to find developing applications in high level languages like JavaScript, Python, etc. unsatisfying from a performance perspective. While those two languages form the basis for countless tech stacks across the software development world, and are popular for many reasons, I've always appreciated compiled languages that can find errors for me before I deploy code, rather than adding tons of defensive code to catch edge cases from versions past, poorly behaved APIs, and just poorly written software.

Of course, defensive code is not something that only interpreted languages require, and I'm quite aware of TypeScript, which does something similar in terms of types and sniffing out issues before deploys. Still, there's a lot of performance left on the table given modern-day machines and their impressive capabilities, which interpreted languages... squander, in my opinion, with their massive amounts of syntactic sugar and their reliance on FFI to talk to code written in a high-performance language like C.

Have you heard of Golang?

For a while, Golang scratched this itch by giving me a language that was super easy to pick up, with a gentle learning curve and massive performance upgrades over interpreted languages. However, Golang comes with its own issues: the lack of a true inheritance-based system, no easy way to actually enforce implementation of an interface, and a frankly baffling approach to external dependency management that I still don't understand, to name a few. As a result, I find myself hesitant to use a language that has such glaring flaws and seems to be changing quite rapidly.

And so, I find myself going back to wanting to learn Rust, as a language that does all of what I want from a compiled language, and gives me a super simple way to manage memory. Rust also makes it super easy for me to deal with pointers, because even though I've known basic C++ through coursework in the 11th and 12th grades, and some interesting internships during my Bachelor's, I've always found pointer syntax quite opaque.

Enough talk - what have you built?

Given my earlier attempts to learn the language, I decided this time around was going to be different: I was going to do something that I was already super comfortable with (interacting with REST APIs), and had a need for (finding YouTube links for my Spotify library).

Long story short, I was able to build a tool in Rust to do so, and I learnt a ton of stuff along the way! I'll break down the flow of my tool below.

How does this tool work?

This tool uses the official Spotify API to pull your list of saved tracks from your user profile. It uses 3-legged OAuth client authentication to authenticate with the API.

  1. Start the OAuth callback handler at src/bin/server.rs, listening on port 4001.
  2. Then start the client application at src/main.rs.
  3. The client then uses the callback handler to complete the 3-legged OAuth handshake process, and obtain an access token with user-library-read scope.
  4. Using the obtained access token, the client uses the "Get User's Saved Tracks" endpoint to fetch all the saved tracks for the user. Source:
// ..snipped..

#[derive(Deserialize, Debug, Clone)]
struct Artist {
    name: String,
    href: String,
    id: String,
}

#[derive(Deserialize, Debug, Clone)]
struct Track {
    name: String,
    href: String,
    id: String,
    popularity: u32,
    track_number: u32,
    duration_ms: u64,
    artists: Vec<Artist>,
}

#[derive(Deserialize, Debug, Clone)]
struct TrackItem {
    added_at: String,
    track: Track,
}

/// One page of the paginated "Get User's Saved Tracks" response.
#[derive(Deserialize, Debug)]
struct SavedTracksResponse {
    href: String,
    limit: u32,
    next: Option<String>,
    offset: u32,
    previous: Option<String>,
    total: u32,
    items: Vec<TrackItem>,
}

// ..snipped..

/// Query Spotify to get all of the user's saved tracks.
fn get_user_saved_tracks(
    access_token: &str,
    client: &reqwest::blocking::Client,
) -> Result<Vec<Track>, Box<dyn std::error::Error>> {
    // Number of tracks requested per page.
    let limit: u32 = 50;
    let mut offset: u32 = 0;
    let mut tracks = Vec::new();
    loop {
        let response = client
            .get("https://api.spotify.com/v1/me/tracks")
            .query(&[("limit", limit), ("offset", offset)])
            .header("Authorization", format!("Bearer {}", access_token))
            .send()?;
        let json = response.json::<SavedTracksResponse>()?;
        // Keep only the track itself; the `added_at` wrapper is dropped.
        tracks.extend(json.items.into_iter().map(|item| item.track));
        // `next` is only present while there are more pages to fetch.
        if json.next.is_some() {
            offset += limit;
        } else {
            break;
        }
    }
    Ok(tracks)
}

// ..snipped..
  5. Once we have a list of all the user's saved tracks, we start scraping YouTube to find matches!
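
To make the OAuth handshake steps in the list above a little more concrete, here is a minimal sketch of its two ends: building the URL the user opens to grant access, and pulling the authorization code out of the redirect that lands on the callback handler. This is not the tool's actual code. The function names and the `/callback` path are hypothetical, and the sketch uses only the standard library; the authorize URL and its `response_type`, `client_id`, `scope` and `redirect_uri` parameters follow Spotify's authorization code flow.

```rust
// Sketch only: helper names and the `/callback` path are assumptions,
// not taken from the tool's source.

/// Percent-encode a value for use in a URL query string. RFC 3986
/// unreserved characters are kept as-is, everything else becomes %XX.
fn percent_encode(input: &str) -> String {
    let mut out = String::new();
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Build the URL the user opens in a browser to grant access.
fn build_authorize_url(client_id: &str, redirect_uri: &str, scope: &str) -> String {
    format!(
        "https://accounts.spotify.com/authorize?response_type=code&client_id={}&scope={}&redirect_uri={}",
        percent_encode(client_id),
        percent_encode(scope),
        percent_encode(redirect_uri),
    )
}

/// Pull the `code` query parameter out of the first line of the HTTP
/// request that the redirect sends to the callback handler, e.g.
/// `GET /callback?code=abc123&state=xyz HTTP/1.1`.
/// The value is returned as received, without percent-decoding.
fn extract_auth_code(request_line: &str) -> Option<String> {
    let path = request_line.split_whitespace().nth(1)?;
    let query = path.split_once('?')?.1;
    query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .find(|(key, _)| *key == "code")
        .map(|(_, value)| value.to_string())
}
```

Calling `build_authorize_url("my-client-id", "http://localhost:4001/callback", "user-library-read")` gives the link to open in a browser. Once the user approves, the callback handler can pass the request line it receives to `extract_auth_code`, and the client then exchanges that code for an access token.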

How do you scrape YouTube?

At this point I had two choices on how to scrape YouTube to find the best matches for a particular song title, artist query combination: Use the official YouTube API, or build a web-scraper and use YouTube.com directly.

I chose to do the latter to give myself a challenge. I also did not want to deal with setting up another client application for my project, especially since I'm quite familiar with the onerous process of registering an OAuth client application with Google, having dealt with it more times than I care to count at my current day job.

Having built a parser for websites using BeautifulSoup and Python, I thought the Rust analogue would look similar: compose a query string for the YouTube search page, use a GET request to fetch the resulting HTML, use an HTML parser to interpret the results, and find the appropriate anchor tags with the direct link to the video! Given a good enough query string, I was quite sure that the first result on the search results page would be the one I wanted, and in manual tests YouTube did not disappoint. But this was where things stopped proceeding according to plan.
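
The first step of that plan, composing the search URL from a song title and its artists, might look roughly like the sketch below. The function names are hypothetical and the exact query format the tool uses is not shown in this post, so joining the title and artist names with spaces is an assumption. Only the standard library is used.

```rust
// Sketch only: `build_search_url` and the "title + artists" query format
// are assumptions, not taken from the tool's source.

/// Percent-encode a value for use in a URL query string. RFC 3986
/// unreserved characters are kept as-is, everything else becomes %XX.
fn percent_encode(input: &str) -> String {
    let mut out = String::new();
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Build a YouTube search page URL for a track title and its artists.
fn build_search_url(title: &str, artists: &[&str]) -> String {
    // Join the title and every artist name into one free-text query.
    let query = format!("{} {}", title, artists.join(" "));
    format!(
        "https://www.youtube.com/results?search_query={}",
        percent_encode(query.trim())
    )
}
```

The resulting URL can be fetched with a plain GET request, for example through the same `reqwest::blocking::Client` used for the Spotify calls.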

It seemed that YouTube would not render the "cards" for its search as part of the HTML returned from the GET request -- there weren't even placeholder elements in the HTML!

[Image: Search results for one of the songs in my library]

Upon further digging into the returned HTML, I discovered that YouTube embeds a JSON object with an array of results containing all sorts of metadata about the query, including links to the video itself, in the HTML for the search results page!

[Image: JSON array YouTube uses to store metadata]

All I had to do now was somehow extract the JSON from the HTML, and pick the top result's link! This turned out to be more difficult than expected (obviously), but with some google-fu (ironic, I suppose?) I was able to find that someone else had already done something similar in NodeJS, and I decided to re-use their approach. Many thanks to Hermann Fassett for saving me a few hours of work!
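
As an illustration of the general idea (not the borrowed NodeJS approach, and not the tool's actual code), one way to lift an embedded JSON object out of a page is to find a marker that precedes it and then scan forward until the braces balance, skipping over anything inside string literals. The marker `ytInitialData` and the `"videoId"` key used below are assumptions about YouTube's page, which can change at any time. The function names are hypothetical, and a real implementation would more likely hand the extracted text to a JSON parser such as serde_json than search it as a string.

```rust
// Sketch only: function names, the marker and the key are assumptions.

/// Return the first balanced `{ ... }` block that appears after `marker`.
fn extract_embedded_json<'a>(html: &'a str, marker: &str) -> Option<&'a str> {
    let after_marker = html.find(marker)? + marker.len();
    let start = after_marker + html[after_marker..].find('{')?;

    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for (offset, byte) in html[start..].bytes().enumerate() {
        if in_string {
            // Braces inside string literals must not affect the depth.
            if escaped {
                escaped = false;
            } else if byte == b'\\' {
                escaped = true;
            } else if byte == b'"' {
                in_string = false;
            }
            continue;
        }
        match byte {
            b'"' => in_string = true,
            b'{' => depth += 1,
            b'}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(&html[start..=start + offset]);
                }
            }
            _ => {}
        }
    }
    None // The object was never closed.
}

/// Crude lookup of the first `"videoId":"..."` value in the JSON text.
fn first_video_id(json: &str) -> Option<&str> {
    let key = "\"videoId\":\"";
    let start = json.find(key)? + key.len();
    let end = start + json[start..].find('"')?;
    Some(&json[start..end])
}

/// Turn a video id into a direct link to the video.
fn watch_url(video_id: &str) -> String {
    format!("https://www.youtube.com/watch?v={}", video_id)
}
```

Chaining the three, `extract_embedded_json(html, "ytInitialData")` followed by `first_video_id` and `watch_url`, yields a link for the top result whenever the page still has that shape, and `None` otherwise, which is a cue to inspect the HTML again.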

What does the result look like?

Well, after spending a whole weekend working on this application, the results were quite accurate, I'm proud to say. At the risk of exposing my taste in music, the image below shows the application pulling the top result from YouTube searches, and thanks to YouTube's amazing search algorithm, the linked video is always what I want.

[Image: Rust tool at work!]

What have I learnt?

For starters, I've learnt that I can write very messy, yet decently functional Rust code. There's a lot of room for improvement in the application, especially when it comes to error handling, but I also learnt a lot about reading files, making HTTP requests, structuring Rust projects to have multiple binaries in the same code base, (de)serialising JSON and Rust structs, async-await calls, to name a few. I'm quite happy with the progress I've made, and while there's still a long way for me to go before I can call myself proficient in the language, it's definitely a good feeling to have finally made something in Rust!

© 2024 by Dipack's Website. All rights reserved.