Remote classes are starting, and you know what that means: long introduction videos, and plenty of equally long lectures to come.
It feels like I always get a migraine trying to watch these videos. No, my professors aren’t particularly boring, and the material isn’t too stale, but the effort I have to put into simply loading the videos is just Sisyphean. The places these videos are uploaded are so obscure and brittle that I really can’t imagine how they were found to begin with.
How could you expect any less?
It’s simply too much effort to upload a plain MP4 file and say, “Hey, watch this!” No, that’d be too frictionless, too easy. After all, if you don’t suffer in the attainment of something, it isn’t worth anything.
Well, this time I had to download a video with subtitles from Yuja, and I learned about HLS. Maybe you’ll find it useful, or learn a bit yourself. If you need to download from Yuja, skip to the end.
HLS videos, generally
Yuja serves videos using HLS, which is slightly more annoying to download than a plain video file. (You can’t just use the “F12->Network and look for videos” trick.)
The video’s split into variable-sized chunks, each at a different URL, a list of which is stored in an m3u8 file. If you find the m3u8 file (F12->Network should do the trick this time), you can download the individual chunks and put them into a single video file.
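To make the playlist idea concrete, here’s a minimal sketch of pulling the chunk URLs out of an m3u8 file. In a playlist, lines starting with # are tags and everything else is a chunk URI, which may be relative to wherever the playlist itself lives. The function name list_segments and the base-URL argument are my own inventions for illustration, and it doesn’t handle root-relative URIs (ones starting with /).

```shell
# list_segments PLAYLIST_FILE BASE_URL
# Prints one absolute URL per chunk listed in the playlist.
# BASE_URL is the playlist's own URL minus the file name (an assumption:
# relative chunk URIs are resolved against it by simple concatenation).
list_segments() {
    playlist=$1
    base=${2%/}   # drop one trailing slash, if any
    # Strip carriage returns, then drop tag lines ('#...') and blank lines.
    tr -d '\r' < "$playlist" | grep -v -e '^#' -e '^[[:space:]]*$' |
    while IFS= read -r uri; do
        case $uri in
            http://*|https://*) printf '%s\n' "$uri" ;;       # already absolute
            *)                  printf '%s/%s\n' "$base" "$uri" ;;  # relative
        esac
    done
}
```

You’d call it like list_segments blah.m3u8 "$BASE_URL" and feed the output to whatever downloader you like.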
At first, I did something like this:
for link in $(grep -v '#' blah.m3u8); do
    wget -O part "$link"
    cat part >> whole
done
ffmpeg -i whole video.mp4
… which works just fine! But it turns out there’s a more elegant way to do it:
ffmpeg -i $M3U8_URL video.mp4
Yea, ffmpeg’ll handle it all for you. What a good boy. :)
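If you’d rather not re-encode the whole lecture, a small wrapper along these lines might help. It’s only a sketch: hls_to_mp4 is a name I made up, and -c copy (ffmpeg’s stream-copy mode) assumes the streams in the playlist are ones an MP4 container will accept. If ffmpeg complains, drop -c copy and let it transcode.

```shell
# hls_to_mp4 M3U8_URL [OUTPUT]
# Hands the playlist URL to ffmpeg and writes OUTPUT (default: video.mp4).
hls_to_mp4() {
    url=$1
    out=${2:-video.mp4}
    # -c copy remuxes the existing audio/video streams instead of re-encoding.
    # Quoting "$url" keeps characters like & and ? away from the shell.
    ffmpeg -i "$url" -c copy "$out"
}
```

Usage would be something like hls_to_mp4 "$M3U8_URL" lecture01.mp4.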
Yuja, in particular
If you want to programmatically and easily get videos from Yuja (skipping out on the tedious F12->Network business), you’ll have to download and parse their video metadata.
Video URLs in Yuja are structured like so (where $SUB is your subdomain):
You can get a JSON file of the metadata on a given video from this URL:
From there, you can find:
- videoHLSFileKey ($HLS_KEY), which is used for getting the m3u8 URL
- captionFileKey ($SUB_KEY), which is used for getting the subtitle URL
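Pulling those two keys out of the JSON can be sketched without any dependencies beyond grep and sed. I’m not assuming anything about where the keys sit in the JSON, only that their values are plain strings without escaped quotes. The function name json_string_value is hypothetical, and a real JSON parser (jq, say) would be sturdier.

```shell
# json_string_value KEY   (reads JSON on stdin)
# Prints the first string value stored under "KEY", however deeply nested.
# Assumption: the value is a simple string with no escaped quotes in it.
json_string_value() {
    key=$1
    # Flatten to one line, grab the first  "KEY": "value"  pair,
    # then strip everything except the value itself.
    tr -d '\n' |
        grep -o "\"$key\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" |
        head -n 1 |
        sed 's/^"[^"]*"[[:space:]]*:[[:space:]]*"\(.*\)"$/\1/'
}
```

With the metadata saved to a file (metadata.json here is just a placeholder name), HLS_KEY=$(json_string_value videoHLSFileKey < metadata.json) and SUB_KEY=$(json_string_value captionFileKey < metadata.json) would fill in the two variables.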
Caption URLs are structured just like this:
m3u8 URLs seem to be structured as:
… no, that’s not a typo. I’m sure the back-end’s prettyyy.
That’s all you need, really.
[EDIT: I’ve found that using videoHLSFileKey is unreliable, and so is following that m3u8 structure. I have updated the yuja-dl script and this post with a better, more reliable method. See below.]
This is the sort of post (and the sort of script) that’s very vulnerable to bitrot. They might switch around their back-end at any moment, might swap around API calls, or maybe even dye their hair yellow. I’ll keep this post and the script updated for as long as I’m dogfooding– a few months, at least– but after that I won’t notice when it bitrots.
I don’t like leaving things to rot, though. So, if you can’t get it working with a video, please send me the URL (on GitHub or e-mail), and I’ll try and get things fixed up.
EDIT: New method for getting m3u8 links
The version up there works for some videos' m3u8s, but not all. Here is a new method that works for every video (… that I tried). Note that the previous method for getting captions still works for everything (I think).
From the same GetVideoListNodeInfo JSON you got captionFileKey from, you can also get:
- videoListNodePID ($LN_PID)
From there, you can get more metadata from
by passing the following POST data:
From there you can get this value:
- videoLink ($VFL)
And now we have enough info to get the m3u8 link! You just have to make a request like this (replacing $ID and $VFL, ofc):
Then you can get the m3u8 link(s) in a list called
Bam! Now you’re good to go! =w=