Download
To download videos from YouTube, the following packages can be used:
-
youtube-dl
https://github.com/ytdl-org/youtube-dl
The package can be installed with
$ pip3 install youtube-dl
-
streamlink
The package can be installed with
$ pip3 install streamlink
I usually use youtube-dl to download a video after the stream has ended, and streamlink to record a livestream while it is running. youtube-dl also accepts a user's cookies, which gives access to members-only content on channels.
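The two use cases above can be sketched as command lines assembled from Python. This is only a minimal sketch: the helper names, the cookies.txt path, the output file name and the placeholder URL are all hypothetical. The only options used are --cookies for youtube-dl and -o for streamlink.

```python
import shlex


def youtube_dl_command(url, cookies_file=None):
    """Build a youtube-dl command line for a finished stream (hypothetical helper)."""
    cmd = ["youtube-dl"]
    if cookies_file is not None:
        # --cookies reads a Netscape-format cookie file, e.g. exported from a browser
        cmd += ["--cookies", cookies_file]
    cmd.append(url)
    return cmd


def streamlink_command(url, output_file, quality="best"):
    """Build a streamlink command line that records a live stream to a file (hypothetical helper)."""
    return ["streamlink", "-o", output_file, url, quality]


if __name__ == "__main__":
    url = "https://www.youtube.com/watch?v=<video_id>"  # placeholder, not a real video
    # Print the commands instead of running them, so nothing is downloaded here
    print(shlex.join(youtube_dl_command(url, cookies_file="cookies.txt")))
    print(shlex.join(streamlink_command(url, "live.ts")))
```

The printed commands can be pasted into a shell, or the lists can be passed to subprocess.run once the placeholders are replaced.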
Sometimes YouTube cannot re-encode a finished stream immediately; this can take up to several days. During this period, a download is limited to the last 2, 4, 10, etc. hours of the video. The available length depends on the segment size (1 s, 2 s, 5 s, etc.), which in turn depends on the stream settings.
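The figures above line up if the number of retained segments is fixed: 2 hours at 1 s, 4 hours at 2 s and 10 hours at 5 s all come to 7200 segments. That count is inferred from the numbers quoted here and is an assumption, not a documented YouTube limit. A small sketch of the arithmetic:

```python
def window_hours(segment_seconds, segment_count=7200):
    """Length of the downloadable window in hours.

    segment_count=7200 is an assumption inferred from the 2/4/10 hour
    figures in the text, not an official value.
    """
    return segment_seconds * segment_count / 3600


if __name__ == "__main__":
    for seconds in (1, 2, 5):
        print(f"{seconds} s segments: last {window_hours(seconds):g} hours available")
```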
In that case the full video can still be downloaded by first collecting the video's metadata and working from there. The metadata is obtained with
$ youtube-dl --skip-download --write-info-json <video_id>
Then pass the metadata JSON to the Python script below (-i for the input file, -o for the output file), which pads in the missing fragments. The stitched metadata it writes can be used to download the full video with
$ youtube-dl --load-info-json <metadata_json> -f <specify_the_format_of_interest>
### credit https://github.com/ytdl-org/youtube-dl/issues/26330#issuecomment-678554642
### credit https://github.com/ytdl-org/youtube-dl/issues/26330#issuecomment-867858209
import os
import sys
import json
import re
import argparse


def pad():
    # Load the info JSON written by youtube-dl --write-info-json
    with open(args.input) as json_file:
        try:
            data = json.load(json_file)
        except ValueError as err:
            print('Cannot decode this JSON file. Error:', err)
            sys.exit(1)

    print('Loaded JSON for video ID:', data['id'])
    print('Title:', data['fulltitle'])
    print('Uploader:', data['uploader'])
    print('-----------------------------------------------------------------')

    for fmt in data['formats']:
        print('Fixing format ID:', fmt['format_id'])
        # Get the first fragment; skip formats without a fragment list
        try:
            first_fragment = fmt['fragments'][0]
        except (KeyError, IndexError):
            continue
        # Extract the sequence number from the first fragment's path (sq/<n>/lmt/)
        match = re.search('sq/(.*)/lmt/', first_fragment['path'])
        if match is None:
            continue
        first_fragment_id = int(match.group(1))
        # Add the missing fragments sq/0 .. sq/<n-1> in front of the list.
        # The duration is hard-coded to 2.0 s; adjust it if the stream
        # uses a different segment size.
        for seq in range(0, first_fragment_id):
            new_fragment = {'duration': 2.0, 'path': 'sq/%d' % seq}
            fmt['fragments'].insert(seq, new_fragment)

    print('-----------------------------------------------------------------')
    print('Writing result to', args.output)
    with open(args.output, 'w') as outfile:
        json.dump(data, outfile)


parser = argparse.ArgumentParser(description='Check a YouTube info JSON for missing stream fragments and pad them in.')
parser.add_argument('-i', '--input', type=str, help='path to the input JSON file.', required=True)
parser.add_argument('-o', '--output', type=str, default='output.json', help='path to the output JSON file.')
args = parser.parse_args()

if os.path.exists(args.output):
    print('WARNING:', args.output, 'already exists.')
    overwrite = input('Would you like to overwrite? [y/N] ') or 'n'
    if overwrite.lower() == 'y':
        pad()
    else:
        print('User chose not to overwrite, exiting')
        sys.exit()
else:
    pad()
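To see what the padding step does without touching a real metadata file, the same idea can be applied to a small in-memory fragment list. This is a minimal sketch: pad_fragments is a hypothetical helper, and the sample paths are made up to match the sq/<n>/lmt/ pattern the script searches for. Real paths in the info JSON are longer.

```python
import re


def pad_fragments(fragments, duration=2.0):
    """Return a new list with fragments sq/0 .. sq/<first-1> prepended.

    duration is the assumed segment length in seconds for the padded entries.
    """
    if not fragments:
        return []
    # Same pattern as the script above: sequence number between sq/ and /lmt/
    match = re.search('sq/(.*)/lmt/', fragments[0]['path'])
    if match is None:
        return list(fragments)
    first_id = int(match.group(1))
    missing = [{'duration': duration, 'path': 'sq/%d' % seq} for seq in range(first_id)]
    return missing + list(fragments)


if __name__ == "__main__":
    # Made-up sample: the listing starts at sequence number 3
    sample = [
        {'duration': 2.0, 'path': 'sq/3/lmt/0'},
        {'duration': 2.0, 'path': 'sq/4/lmt/0'},
    ]
    for fragment in pad_fragments(sample):
        print(fragment['path'])
```

Unlike the script, this helper returns a new list instead of inserting in place, which makes it easier to compare the fragment list before and after padding.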