Download

To download videos from YouTube, the following packages can be used:

  1. youtube-dl

    https://github.com/ytdl-org/youtube-dl

    The package can be installed with

    $ pip3 install youtube-dl
    
  2. streamlink

    https://streamlink.github.io/

    The package can be installed with

    $ pip3 install streamlink
    

I usually use youtube-dl to download a video after its stream has ended, and streamlink to record a livestream in progress. youtube-dl also accepts a user's cookies, which gives access to a channel's members-only content.

Sometimes YouTube cannot re-encode a finished stream immediately, and this can take up to several days. During this period, a download is limited to the most recent part of the video (for example the last 2, 4, or 10 hours). The length of that window depends on the segment size (for example 1 s, 2 s, or 5 s), which in turn depends on the stream settings.
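
The example figures above are consistent with a fixed number of retained segments: 2 hours at 1 s, 4 hours at 2 s, and 10 hours at 5 s each work out to 7200 segments. A minimal sketch of that arithmetic follows, assuming the pairs line up in the order listed. The 7200 figure is inferred from those numbers and is not a documented YouTube limit; `window_hours` is a name made up for illustration.

```python
# Assumption: downloadable window = retained segment count * segment length.
# RETAINED_SEGMENTS is inferred from the 2/4/10 hour examples, not documented.
RETAINED_SEGMENTS = 7200


def window_hours(segment_seconds, segments=RETAINED_SEGMENTS):
    """Return the downloadable window in hours for a given segment length."""
    return segments * segment_seconds / 3600


for seg in (1, 2, 5):
    print(f"{seg} s segments -> {window_hours(seg):g} hours")
```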

In that case, the full video can still be downloaded by first collecting the video's metadata and working from there. The metadata is written with

$ youtube-dl --skip-download --write-info-json <video_id>

Then pass the metadata JSON to the Python script below, which pads in the missing fragments. The resulting JSON can be used to download the full video with

$ youtube-dl --load-info-json <metadata_json> -f <specify_the_format_of_interest>

# Credit: https://github.com/ytdl-org/youtube-dl/issues/26330#issuecomment-678554642
# Credit: https://github.com/ytdl-org/youtube-dl/issues/26330#issuecomment-867858209
import os
import sys
import json
import re
import argparse

def pad():
    # Load the info JSON written by youtube-dl.
    with open(args.input) as json_file:
        try:
            data = json.load(json_file)
        except ValueError as err:
            print('Cannot decode this JSON file. Error:', err)
            sys.exit(1)

    print('Loaded JSON for video ID:', data['id'])
    print('Title:', data['fulltitle'])
    print('Uploader:', data['uploader'])
    print('-----------------------------------------------------------------')
    for fmt in data['formats']:
        # Formats without a fragment list need no padding.
        fragments = fmt.get('fragments')
        if not fragments:
            continue
        print('Fixing format ID:', fmt['format_id'])
        # Read the sequence number of the first fragment from its path.
        match = re.search(r'sq/(\d+)/lmt/', fragments[0].get('path', ''))
        if match is None:
            print('  Skipped: no sequence number in the first fragment path')
            continue
        first_fragment_id = int(match.group(1))
        # Prepend the missing fragments sq/0 .. sq/(first_fragment_id - 1).
        # Each padded fragment gets the hard-coded 2.0 s duration.
        missing = [{'duration': 2.0, 'path': 'sq/%d' % n}
                   for n in range(first_fragment_id)]
        fmt['fragments'] = missing + fragments
    print('-----------------------------------------------------------------')
    print('Writing result to', args.output)
    with open(args.output, 'w') as outfile:
        json.dump(data, outfile)

parser = argparse.ArgumentParser(description='Check a YouTube info JSON for missing stream fragments and pad them in.')
parser.add_argument('-i', '--input', type=str, help='path to the input JSON file.', required=True)
parser.add_argument('-o', '--output', type=str, default='output.json', help='path to the output JSON file.')
args = parser.parse_args()

if os.path.exists(args.output):
    print('WARNING:', args.output, 'already exists.')
    overwrite = input('Would you like to overwrite? [y/N] ') or 'n'
    if overwrite.lower() == 'y':
        pad()
    else:
        print('User chose not to overwrite, exiting.')
        sys.exit()
else:
    pad()
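
The padding step can be tried on a small in-memory example without a real metadata file. This is a minimal sketch only: `pad_fragments` is a hypothetical helper name, the sample paths are made up for illustration, and the 2.0 s default mirrors the duration hard-coded in the script above.

```python
import re


def pad_fragments(fragments, duration=2.0):
    """Return a new list with fragments sq/0 .. sq/(first - 1) prepended.

    Hypothetical helper mirroring the loop in pad(); not part of youtube-dl.
    """
    if not fragments:
        return []
    match = re.search(r'sq/(\d+)/lmt/', fragments[0].get('path', ''))
    if match is None:
        # Unknown path layout: return the list unchanged.
        return list(fragments)
    first = int(match.group(1))
    missing = [{'duration': duration, 'path': 'sq/%d' % n} for n in range(first)]
    return missing + list(fragments)


# Made-up sample: the metadata starts at segment 3, so 0..2 are missing.
sample = [
    {'path': 'sq/3/lmt/111', 'duration': 2.0},
    {'path': 'sq/4/lmt/111', 'duration': 2.0},
]
padded = pad_fragments(sample)
print([frag['path'] for frag in padded])
```

If the stream uses a different segment size, pass the matching value as `duration` so the padded entries agree with the real ones.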