Ripping HLS videos from Yuja

2021-01-14 · english · unix · web · ripping

Remote classes are starting, and you know what that means: long introduction videos, and plenty of equally long lectures to come.

It feels like I always get a migraine trying to watch these videos. No, my professors aren’t particularly boring, and the material isn’t too stale, but the effort I have to put into simply loading the videos is just Sisyphean. The places these videos are uploaded are so obscure and brittle that I really can’t imagine how they were found to begin with.

Of course, the video isn’t just a file. Of course, the JavaScript carpet-bombs your browser into crashing within seconds. Of course, the proprietary video player is significantly worse than the one that ships with your browser.

How could you expect any less?

It’s simply too much effort to upload a plain MP4 file and say, “Hey, watch this!” No, that’d be too frictionless, too easy. After all, if you don’t suffer in the attainment of something, it wouldn’t be worth anything.

Well, this time I had to download a video with subtitles from Yuja, and I learned about HLS. Maybe you’ll find it useful, or learn a bit yourself. If you need to download from Yuja, skip to the end.

HLS videos, generally

Yuja serves videos using HLS, which is slightly more annoying to download than a plain video file. (You can’t just use the “F12->Network and look for videos” trick).

The video’s split into variable-sized chunks, each at a different URL, a list of which is stored in an m3u8 file. If you find the m3u8 file (F12->Network should do the trick this time), you can download the individual chunks and put them into a single video file.

At first, I did something like this:

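# Fetch every chunk listed in the playlist and glue them together, in order.
# Lines starting with '#' are playlist tags, not URLs, so skip those.
for link in $(grep -v '^#' blah.m3u8); do
	wget -O part "$link"
	cat part >> whole
done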

ffmpeg -i whole video.mp4

… which works just fine! But it turns out there’s a more elegant way to do it:

ffmpeg -i "$M3U8_URL" video.mp4

Yea, ffmpeg’ll handle it all for you. What a good boy. :)
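If you'd rather stick with the manual loop, one thing to watch for: entries in a playlist are allowed to be relative to the playlist's own URL, in which case wget can't fetch them as they are. Here's a rough sketch of resolving them first. The function name segment_urls and the example hosts are made up for illustration, and it only handles plain http(s) playlists.

```shell
# Print an absolute URL for every entry in an m3u8 playlist.
#   $1:    the URL the playlist was downloaded from (hypothetical example below)
#   stdin: the playlist's contents
segment_urls() {
	playlist_url="${1%%\?*}"    # drop the query string, it may contain slashes
	base="${playlist_url%/*}"   # the "directory" the playlist lives in
	origin="$(printf '%s\n' "$playlist_url" | sed -E 's|^(https?://[^/]+).*|\1|')"

	# Strip carriage returns, then skip tag lines (#...) and blank lines.
	tr -d '\r' | grep -v -e '^#' -e '^[[:space:]]*$' | while IFS= read -r uri; do
		case "$uri" in
			http://*|https://*) printf '%s\n' "$uri" ;;          # already absolute
			/*) printf '%s%s\n' "$origin" "$uri" ;;               # relative to the host
			*)  printf '%s/%s\n' "$base" "$uri" ;;                # relative to the playlist
		esac
	done
}
```

You'd use it like `segment_urls "$M3U8_URL" < blah.m3u8` in place of the grep in the loop. ffmpeg does this sort of resolution on its own, which is one more reason to let it handle things.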

Yuja, in particular

If you want to programmatically and easily get videos from Yuja (skipping out on the tedious F12->Network business), you’ll have to download and parse their video metadata.

Video URLs in Yuja are structured like so (where $SUB is your subdomain):

https://$SUB.yuja.com/V/Video?v=$ID

You can get a JSON file of the metadata on a given video from this URL:

https://$SUB.yuja.com/P/Data/GetVideoListNodeInfo?videoPID=$ID
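As a small sketch, you can derive that metadata URL straight from the video URL. The function name metadata_url and the example subdomain are made up, and this assumes the video URL really has the shape shown above.

```shell
# Turn a Yuja video URL into the matching metadata URL.
#   $1: a URL shaped like https://$SUB.yuja.com/V/Video?v=$ID
metadata_url() {
	# Subdomain: whatever sits between "://" and ".yuja.com".
	sub="$(printf '%s\n' "$1" | sed -E 's|^https?://([^./]+)\.yuja\.com/.*|\1|')"
	# Video ID: the value of the "v" query parameter.
	id="$(printf '%s\n' "$1" | sed -E 's|.*[?&]v=([^&#]+).*|\1|')"
	printf 'https://%s.yuja.com/P/Data/GetVideoListNodeInfo?videoPID=%s\n' "$sub" "$id"
}

# Usage (needs curl and network access):
#   curl -s "$(metadata_url 'https://example.yuja.com/V/Video?v=12345')" > metadata.json
```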

From there, you can find captionFileKey (the $SUB_KEY below) and videoHLSFileKey (the $HLS_KEY below).

Caption URLs are structured just like this:

https://$SUB.yuja.com/P/DataPage/CaptionFile/$SUB_KEY
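Here's a rough sketch of fishing the caption key out of the metadata and building that URL without needing a JSON parser. It assumes captionFileKey is what fills the $SUB_KEY slot and that it's stored as a plain JSON string (no escaped quotes); I haven't pinned down the exact layout of the JSON, so treat both as assumptions. The function names are made up.

```shell
# Print the first string value stored under a key in the JSON on stdin.
# Assumption: the value is a plain string containing no escaped quotes.
#   $1: the key to look for, e.g. captionFileKey
json_string() {
	tr -d '\n' |
		grep -o "\"$1\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" |
		head -n 1 |
		sed -E 's/^"[^"]*"[[:space:]]*:[[:space:]]*"(.*)"$/\1/'
}

# Build the caption URL.
#   $1: subdomain, $2: caption key
caption_url() {
	printf 'https://%s.yuja.com/P/DataPage/CaptionFile/%s\n' "$1" "$2"
}

# Usage, with metadata.json being the saved GetVideoListNodeInfo response:
#   key="$(json_string captionFileKey < metadata.json)"
#   curl -s -o captions "$(caption_url "$SUB" "$key")"
```

If you have jq around, it'll be sturdier than the grep here; the sketch just avoids the extra dependency.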

m3u8 URLs seem to be structured as:

https://my.yuja.com/P/Data/VideoUrl/level${HLS_KEY}/720p/${HLS_KEY}.m3u8.m3u8?dist=yuja-edits&key=${HLS_KEY}/720p/${HLS_KEY}.m3u8

… no, that’s not a typo. I’m sure the back-end’s prettyyy.

That’s all you need, really.

I put together a shell script that’ll do the whole thing for you, right over here: GitHub, Feneas.

[EDIT: I’ve found that using videoHLSFileKey is unreliable, and so is following that m3u8 structure. I have updated the yuja-dl script and this post with a better, more reliable method. See below.]

On bitrot

This is the sort of post (and the sort of script) that’s very vulnerable to bitrot. They might switch around their back-end at any moment, might swap around API calls, or maybe even dye their hair yellow. I’ll keep this post and the script updated for as long as I’m dogfooding– a few months, at least– but after that I won’t notice when it bitrots.

I don’t like leaving things to rot, though. So, if you can’t get it working with a video, please send me the URL (on GitHub or e-mail), and I’ll try and get things fixed up.

The version up there works for some videos' m3u8s, but not all. Here is a new method that works for every video (… that I tried). Note that the previous method for getting captions still works for everything (I think).

From the GetVideoListNodeInfo JSON you got captionFileKey from, you can also get the node value that's passed as $LN_PID below.

From there, you can get more metadata from

https://$SUB.yuja.com/P/Data/VideoJSON

by passing the following POST data:

video=$ID&node=$LN_PID

From there you can get the value that's passed as $VFL below.

And now we have enough info to get the m3u8 link! You just have to make a request like this (replacing $ID and $VFL, ofc):

https://$SUB.yuja.com/P/Data/VideoSource?video=$VFL&videoPID=$ID

Then you can get the m3u8 link(s) in a list called videoSources.
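The two requests above can be sketched with curl, roughly like so. The function names are made up, I'm assuming the host is just your own subdomain, and the m3u8_links helper doesn't actually parse videoSources; it simply greps for anything URL-shaped that contains ".m3u8", which is a shortcut rather than a proper JSON read.

```shell
# POST to VideoJSON.
#   $1: subdomain, $2: video ID, $3: node value
video_json() {
	curl -s --data "video=$2&node=$3" "https://$1.yuja.com/P/Data/VideoJSON"
}

# GET VideoSource.
#   $1: subdomain, $2: the value taken from the VideoJSON response, $3: video ID
video_source() {
	curl -s "https://$1.yuja.com/P/Data/VideoSource?video=$2&videoPID=$3"
}

# Print every m3u8-looking URL found in the JSON on stdin, in order.
m3u8_links() {
	# JSON may escape slashes as "\/", so undo that before matching.
	sed 's|\\/|/|g' | grep -o 'https\{0,1\}://[^"]*\.m3u8[^"]*'
}

# Usage (needs curl, ffmpeg, and network access):
#   url="$(video_source "$SUB" "$VFL" "$ID" | m3u8_links | head -n 1)"
#   ffmpeg -i "$url" video.mp4
```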

Bam! Now you’re good to go! =w=