Compare commits

27 Commits

Author SHA1 Message Date
Ventilaar
570ac88b99 Reimplement orphaned video listing and more accurate stats
Some checks failed
Update worker server / build-and-publish (release) Has been cancelled
Generate docker image / build-and-publish (release) Successful in 19s
2025-02-11 22:55:21 +01:00
Ventilaar
c51a72ec2b minor typo
Some checks failed
Generate docker image / build-and-publish (release) Failing after 7s
Update worker server / build-and-publish (release) Successful in 16s
2025-02-11 17:41:20 +01:00
Ventilaar
fa8f11dad6 Major update
All checks were successful
Update worker server / build-and-publish (release) Successful in 13s
Generate docker image / build-and-publish (release) Successful in 59s
2025-02-11 13:12:10 +01:00
Ventilaar
46e5d8bb02 fuck
All checks were successful
Update worker server / build-and-publish (release) Successful in 19s
Generate docker image / build-and-publish (release) Successful in 20s
2025-01-29 22:23:41 +01:00
Ventilaar
89ce9b1c0a Add error reporting and fix channel add
All checks were successful
Update worker server / build-and-publish (release) Successful in 33s
Generate docker image / build-and-publish (release) Successful in 1m4s
2025-01-29 19:36:06 +01:00
Ventilaar
729b24debb Task routing
All checks were successful
Update worker server / build-and-publish (release) Successful in 19s
Generate docker image / build-and-publish (release) Successful in 19s
2025-01-24 20:49:17 +01:00
Ventilaar
20e5793cd8 Performance and cleanup
All checks were successful
Update worker server / build-and-publish (release) Successful in 18s
Generate docker image / build-and-publish (release) Successful in 20s
2025-01-23 15:57:36 +01:00
Ventilaar
282b895170 Bug and performance fix
All checks were successful
Generate docker image / build-and-publish (release) Successful in 19s
Update worker server / build-and-publish (release) Successful in 10s
2025-01-23 14:52:15 +01:00
Ventilaar
38f6f04260 Fix None iterable and add new background task
All checks were successful
Update worker server / build-and-publish (release) Successful in 22s
Generate docker image / build-and-publish (release) Successful in 57s
2025-01-23 14:21:01 +01:00
Ventilaar
43e6c00787 idk
All checks were successful
Update worker server / build-and-publish (release) Successful in 18s
Generate docker image / build-and-publish (release) Successful in 20s
2025-01-21 20:49:40 +01:00
Ventilaar
d42030dcbc Small concurrency and logging fix
All checks were successful
Update worker server / build-and-publish (release) Successful in 10s
Generate docker image / build-and-publish (release) Successful in 19s
2025-01-19 13:27:09 +01:00
Ventilaar
5530179558 Small fixup
All checks were successful
Update worker server / build-and-publish (release) Successful in 11s
Generate docker image / build-and-publish (release) Successful in 20s
2025-01-18 23:39:32 +01:00
Ventilaar
1186d236f2 Rework video queue download
All checks were successful
Update worker server / build-and-publish (release) Successful in 11s
Generate docker image / build-and-publish (release) Successful in 14s
2025-01-18 23:29:12 +01:00
Ventilaar
5a4726ac10 Add queue download function
All checks were successful
Update worker server / build-and-publish (release) Successful in 9s
Generate docker image / build-and-publish (release) Successful in 1m3s
2025-01-18 22:20:17 +01:00
Ventilaar
46bde82d32 Hotfix shared state issue
All checks were successful
Update worker server / build-and-publish (release) Successful in 12s
Generate docker image / build-and-publish (release) Successful in 19s
2024-12-07 14:58:52 +01:00
Ventilaar
6c681d6b07 Uhhh
All checks were successful
Update worker server / build-and-publish (release) Successful in 9s
Generate docker image / build-and-publish (release) Successful in 49s
2024-12-05 22:20:55 +01:00
Ventilaar
0d5d233e90 Cleanup and documentation
All checks were successful
Generate docker image / build-and-publish (release) Successful in 19s
Update worker server / build-and-publish (release) Successful in 20s
2024-12-05 22:15:42 +01:00
Ventilaar
548a4860fc it was google!
All checks were successful
Generate docker image / build-and-publish (release) Successful in 55s
Update worker server / build-and-publish (release) Successful in 10s
2024-10-15 16:23:43 +02:00
Ventilaar
da333ab4f6 lets hope it was a fluke
Some checks failed
Generate docker image / build-and-publish (release) Failing after 27s
Update worker server / build-and-publish (release) Successful in 11s
2024-10-15 16:20:44 +02:00
Ventilaar
f2b01033ea compact even more
Some checks failed
Generate docker image / build-and-publish (release) Has been cancelled
Update worker server / build-and-publish (release) Has been cancelled
2024-10-15 16:08:05 +02:00
Ventilaar
49f0ea7481 whyyyy
Some checks failed
Generate docker image / build-and-publish (release) Has been cancelled
Update worker server / build-and-publish (release) Has been cancelled
2024-10-15 16:06:17 +02:00
Ventilaar
f1287a4212 pymongo requires gcc now?
Some checks failed
Generate docker image / build-and-publish (release) Failing after 3m26s
Update worker server / build-and-publish (release) Successful in 9s
2024-10-15 15:59:24 +02:00
Ventilaar
30ea647ca9 Ok, long time no commit. I dont know what ive changed, pray it works
Some checks failed
Update worker server / build-and-publish (release) Successful in 15s
Generate docker image / build-and-publish (release) Failing after 25s
2024-10-15 15:48:09 +02:00
Ventilaar
a7c640a8cf Fix search error, add tombstone 2024-05-04 22:49:50 +02:00
Ventilaar
f6da232164 Rename functions 2024-04-21 00:31:25 +02:00
Ventilaar
1d5934275c Handle websub added messages to queue 2024-04-21 00:26:00 +02:00
Ventilaar
72af6b6126 Handle mass websub subscriptions with added statistics. General cleanup 2024-04-18 23:36:45 +02:00
30 changed files with 1011 additions and 403 deletions

View File

@@ -1,4 +1,4 @@
name: Generate release
name: Generate docker image
on:
release:
@@ -22,13 +22,4 @@ jobs:
uses: docker/build-push-action@v5
with:
push: true
tags: git.ventilaar.nl/ventilaar/ayta:latest
- name: Update worker server
uses: appleboy/ssh-action@v1.0.3
with:
host: 192.168.66.109
username: root
key: ${{ secrets.SERVER_KEY }}
port: 22
script: /root/update_worker.sh
tags: git.ventilaar.nl/ventilaar/ayta:latest

View File

@@ -0,0 +1,18 @@
name: Update worker server
on:
release:
types: [published]
jobs:
build-and-publish:
runs-on: ubuntu-latest
steps:
- name: Update worker server
uses: appleboy/ssh-action@v1.0.3
with:
host: 192.168.66.109
username: root
key: ${{ secrets.SERVER_KEY }}
port: 22
script: /root/update_worker.sh

View File

@@ -1,7 +1,7 @@
FROM python:3-alpine
FROM python:3.12-alpine
WORKDIR /app
COPY requirements.txt /app
RUN pip install --no-cache-dir -r requirements.txt
COPY . /app
EXPOSE 8000
CMD ["gunicorn", "--bind", "0.0.0.0:8000", "ayta:create_app()"]
CMD ["gunicorn", "--bind", "0.0.0.0:8000", "--workers", "1", "ayta:create_app()"]

View File

@@ -6,7 +6,7 @@ current cronjob yt-dlp archive service.
Partially inspired by [hobune](https://github.com/rebane2001/hobune). While that project is amazing on its own, it's just not scalable.
## The idea
Having over 250k videos, scaling the current cronjob yt-dlp archive task is just really hard. Filetypes change, things get partially downloaded and such.
Having over 350k videos, scaling the current cronjob yt-dlp archive task is just really hard. Filetypes change, things get partially downloaded and such.
Partially yt-dlp is to blame because it's a package that needs to change all the time. But with this some changes are not accounted for.
yt-dlp will still do the downloads. But a flask frontend will be developed to make all downloaded videos easily indexable.
For it to be quick (unlike hobune) a database has to be implemented. This could get solved by a static site generator type of software, but that is not my choice.
@@ -52,13 +52,22 @@ Extra functionality for further development of features.
### Stage 3
Mainly focused on retiring the cronjob based scripts and moving it to celery based tasks
- [ ] manage videos by ID's instead of per channel basis
- [ ] download videos from queue
- [ ] Manage websub callbacks
- [x] manage videos by ID's instead of per channel basis
- [x] download videos from queue
- [x] Manage websub callbacks
- [x] Implement yt-dlp proxy servers, as the VPN is blocked
- [x] Celery tasks based video downloading
- [x] Manage websub callbacks
- [x] Celery task queue views
- [x] More performant statistics
- [ ] Retire cronjobs
- [ ] Retire file based configurations
### Stage 4
MongoDB finally has its limitations.
- [ ] Migrate to postgresql
- [ ] Retire time based tasks like channel mirroring
- [ ] A more comprehensive statistics page, uploads per day, downloads per day and such
### Stage ...
Since this is the flagship software I have developed, more features will be added.

View File

@@ -1,13 +0,0 @@
#Import os Library
import os
import datetime
import json
def print_current_time(give=False):
time = datetime.datetime.now().replace(microsecond=0)
print(f'--- It is {time} ---')
return time
with open('lockfile', 'w') as file:
data = {'time': print_current_time(), 'PID': os.getpid()}
file.write(json.dumps(data, default=str))

View File

@@ -24,8 +24,19 @@ def create_app(test_config=None):
# Celery Periodic tasks
config['CELERY']['beat_schedule'] = {}
config['CELERY']['beat_schedule']['Renew WebSub endpoints'] = {'task': 'ayta.tasks.websub_renew_expiring', 'schedule': 4000}
#config['CELERY']['beat_schedule']['Process WebSub data'] = {'task': 'ayta.tasks.websub_process_data', 'schedule': 6}
config['CELERY']['beat_schedule']['Renew WebSub endpoints around every hour'] = {'task': 'ayta.tasks.websub_renew_expiring', 'schedule': 4000}
config['CELERY']['beat_schedule']['Process WebSub data around every two minutes'] = {'task': 'ayta.tasks.websub_process_data', 'schedule': 100}
config['CELERY']['beat_schedule']['Queue up new videos in static channel playlists about 2 times a day'] = {'task': 'ayta.tasks.playlist_to_queue', 'schedule': 50000}
config['CELERY']['beat_schedule']['Download around 123 videos spread out through the day'] = {'task': 'ayta.tasks.video_queue', 'schedule': 700}
config['CELERY']['beat_schedule']['Generate new statistics about every 3 hours'] = {'task': 'ayta.tasks.generate_statistics', 'schedule': 10000}
# Celery task routing
# Tasks not defined in this configuration will be routed to the default queue "celery"
config['CELERY']['task_routes'] = {
'ayta.tasks.video_download': {'queue': 'download'},
'ayta.tasks.video_queue': {'queue': 'download'}
}
app = Flask(__name__)
app.config.from_mapping(config)
@@ -41,7 +52,9 @@ def create_app(test_config=None):
app.jinja_env.filters['pretty_time'] = filters.pretty_time
app.jinja_env.filters['current_time'] = filters.current_time
app.jinja_env.filters['epoch_time'] = filters.epoch_time
app.jinja_env.filters['epoch_date'] = filters.epoch_date
app.jinja_env.filters['datetime_date'] = filters.datetime_date
from .blueprints import watch
from .blueprints import index
from .blueprints import admin

View File

@@ -1,8 +1,8 @@
from flask import Blueprint, render_template, request, redirect, url_for, flash
from flask import Blueprint, render_template, request, redirect, url_for, flash, current_app
from ..nosql import get_nosql
from ..dlp import checkChannelId, getChannelInfo
from ..decorators import login_required
from ..tasks import websub_subscribe_callback, websub_unsubscribe_callback
from ..tasks import test_sleep, websub_subscribe_callback, websub_unsubscribe_callback, video_download, video_queue, playlist_to_queue
from datetime import datetime
from secrets import token_urlsafe
@@ -30,28 +30,35 @@ def channels():
generic = {}
if request.method == 'POST':
channelId = request.form.get('channel_id', None)
originalName = request.form.get('original_name', None)
addedDate = request.form.get('added_date', None)
task = request.form.get('task', None)
if task == 'add_channel':
channelId = request.form.get('channel_id', None)
originalName = request.form.get('original_name', None)
addedDate = request.form.get('added_date', None)
### add some validation
addedDate = datetime.strptime(addedDate, '%Y-%m-%d')
if checkChannelId(channelId) is False:
channelId, originalName = getChannelInfo(channelId, ('channel_id', 'uploader'))
if not get_nosql().insert_new_channel(channelId, originalName, addedDate):
flash('Error inserting new channel, you probably made a mistake somewhere')
return redirect(url_for('admin.channels'))
### add some validation
addedDate = datetime.strptime(addedDate, '%Y-%m-%d')
if checkChannelId(channelId) is False:
channelId, originalName = getChannelInfo(channelId, ('channel_id', 'uploader'))
if not get_nosql().insert_new_channel(channelId, originalName, addedDate):
flash('Error inserting new channel, you probably made a mistake somewhere')
return redirect(url_for('admin.channels'))
return redirect(url_for('admin.channel', channelId=channelId))
return redirect(url_for('admin.channel', channelId=channelId))
elif task == 'playlist-queue':
task = playlist_to_queue.delay()
flash(f'Task playlist-queue has been queued: {task.id}')
generic['currentDate'] = datetime.utcnow()
channelIds = get_nosql().list_all_channels()
for channelId in channelIds:
channels[channelId] = get_nosql().get_channel_info(channelId)
channels[channelId] = get_nosql().get_channel_info(channelId, limited=True)
channels[channelId]['video_count'] = get_nosql().get_channel_videos_count(channelId)
return render_template('admin/channels.html', channels=channels, generic=generic)
@@ -76,10 +83,10 @@ def channel(channelId):
return redirect(url_for('admin.channel', channelId=channelId))
if task == 'update-value':
if key == 'active':
if key in ['active', 'websub']:
value = True if value else False
if key == 'added_date':
if key in ['added_date']:
value = datetime.strptime(value, '%Y-%m-%d')
get_nosql().update_channel_key(channelId, key, value)
@@ -109,6 +116,8 @@ def run(runId):
@bp.route('/websub', methods=['GET', 'POST'])
@login_required
def websub():
render = {}
if request.method == 'POST':
task = request.form.get('task', None)
value = request.form.get('value', None)
@@ -118,18 +127,30 @@ def websub():
flash(f"Started task {task.id}")
return redirect(url_for('admin.websub'))
elif task == 'clean-retired':
get_nosql().websub_cleanRetired()
return redirect(url_for('admin.websub'))
elif task == 'unsubscribe-callbacks':
for callbackId in get_nosql().websub_getCallbacks():
websub_unsubscribe_callback.delay(callbackId)
flash(f"Started unsubscribe tasks for all callbacks")
return redirect(url_for('admin.websub'))
elif task == 'subscribe-channels':
for channelId in get_nosql().list_all_channels(websub=True):
websub_subscribe_callback.delay(channelId)
flash(f'Started subscribe tasks for activated channels')
return redirect(url_for('admin.websub'))
callbackIds = get_nosql().websub_getCallbacks()
callbacks = {}
render['stats'] = get_nosql().websub_statistics()
for callbackId in callbackIds:
callbacks[callbackId] = get_nosql().websub_getCallback(callbackId)
return render_template('admin/websub.html', callbacks=callbacks)
return render_template('admin/websub.html', callbacks=callbacks, render=render)
@bp.route('/reports', methods=['GET', 'POST'])
@login_required
@@ -142,14 +163,18 @@ def reports():
get_nosql().close_report(value)
flash(f'Report closed {value}')
return redirect(url_for('admin.reports'))
elif task == 'clean-closed':
get_nosql().report_clean()
flash(f'Cleaned closed reports older than 30 days')
return redirect(url_for('admin.reports'))
reports = get_nosql().list_reports()
return render_template('admin/reports.html', reports=reports)
@bp.route('/posters', methods=['GET', 'POST'])
@bp.route('/queue', methods=['GET', 'POST'])
@login_required
def posters():
def queue():
if request.method == 'POST':
task = request.form.get('task', None)
value = request.form.get('value', None)
@@ -160,34 +185,54 @@ def posters():
flash('Description must be at least 8 characters long')
if value and len(value) >= 12:
get_nosql().poster_newEndpoint(value, description)
get_nosql().queue_newEndpoint(value, description)
flash(f'Created endpoint ID: {value}')
else:
value = token_urlsafe(16)
get_nosql().poster_newEndpoint(value, description)
get_nosql().queue_newEndpoint(value, description)
flash(f'Created endpoint ID: {value}')
elif task == 'retire':
get_nosql().poster_retireEndpoint(value)
get_nosql().queue_retireEndpoint(value)
flash(f'Endpoint retired: {value}')
elif task == 'clean-retired':
get_nosql().poster_cleanRetired()
get_nosql().queue_cleanRetired()
flash(f'Cleaned retired endpoints')
elif task == 'manual-queue':
get_nosql().poster_insertQueue('manual', value)
flash(f'Added to queue: {value}')
if not get_nosql().check_exists(value):
direct = request.form.get('direct', None)
if direct:
task = video_download.delay(value)
flash(f"Started task {task.id}")
else:
get_nosql().queue_insertQueue(value, 'webui')
flash(f'Added to queue: {value}')
else:
flash(f'This video ID already exists in the archive: {value}')
elif task == 'delete-queue':
get_nosql().poster_deleteQueue(value)
get_nosql().queue_deleteQueue(value)
flash(f'Deleted from queue: {value}')
elif task == 'empty-queue':
get_nosql().queue_emptyQueue()
flash(f'Queue has been emptied')
elif task == 'queue-run-once':
value = int(value) if value and value.isdigit() else 1
for x in range(value):
task = video_queue.delay()
flash(f'Task has been started on the oldest queued item: {task.id}')
return redirect(url_for('admin.posters'))
return redirect(url_for('admin.queue'))
endpoints = get_nosql().poster_getEndpoints()
queue = get_nosql().poster_getQueue()
endpoints = get_nosql().queue_getEndpoints()
queue = get_nosql().queue_getQueue()
count = len(list(queue.clone()))
return render_template('admin/posters.html', endpoints=endpoints, queue=queue)
return render_template('admin/queue.html', endpoints=endpoints, queue=queue, count=count)
@bp.route('/users', methods=['GET', 'POST'])
@login_required
@@ -215,4 +260,18 @@ def users():
users = get_nosql().list_all_users()
return render_template('admin/users.html', users=users)
return render_template('admin/users.html', users=users)
@bp.route('/workers', methods=['GET', 'POST'])
@login_required
def workers():
if request.method == 'POST':
task = request.form.get('task', None)
if task == 'test-sleep':
test_sleep.delay()
celery = current_app.extensions.get('celery')
tasks = celery.control.inspect().active()
reserved = celery.control.inspect().reserved()
return render_template('admin/workers.html', tasks=tasks, reserved=reserved)

View File

@@ -39,10 +39,10 @@ def websub(cap):
return abort(404)
@bp.route('/poster/<cap>', methods=['POST'])
def poster(cap):
@bp.route('/queue/<cap>', methods=['POST'])
def queue(cap):
# if endpoint does not exist
if not get_nosql().poster_isActive(cap):
if not get_nosql().queue_isActive(cap):
return abort(404)
videoId = request.form.get('v')
@@ -60,7 +60,7 @@ def poster(cap):
return abort(409)
# try to insert
if get_nosql().poster_insertQueue(cap, videoId):
if get_nosql().queue_insertQueue(videoId, cap):
return '', 202
else:
return abort(409)
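
The renamed endpoint takes a POST with the video ID in the form field `v` and answers with 202, 404 or 409. A minimal client-side sketch using only the standard library follows. The host `https://archive.example/api` and the function name `build_queue_request` are assumptions, since the diff does not show where the blueprint is mounted.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_queue_request(base_url, cap, video_id):
    """Build (but do not send) the POST that submits a video ID to a queue endpoint.

    base_url is hypothetical: it must include whatever prefix the
    blueprint is mounted under, which the diff does not show.
    """
    body = urlencode({'v': video_id}).encode('ascii')
    return Request(f'{base_url}/queue/{cap}', data=body, method='POST')


# Status codes as used by the view in the diff.
MEANING = {
    202: 'accepted, the video ID was inserted into the queue',
    404: 'the endpoint ID is not active',
    409: 'rejected, for example because the insert failed',
}

if __name__ == '__main__':
    req = build_queue_request('https://archive.example/api', 'my-endpoint-id', 'abcdefghijk')
    print(req.get_method(), req.full_url, req.data)
```

Passing the request to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be run without a server.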

View File

@@ -11,7 +11,7 @@ def base():
channelIds = get_nosql().list_all_channels()
for channelId in channelIds:
channel = get_nosql().get_channel_info(channelId)
channel = get_nosql().get_channel_info(channelId, limited=True)
channel['video_count'] = get_nosql().get_channel_videos_count(channelId)
channels.append(channel)

View File

@@ -22,5 +22,4 @@ def base():
return render_template('search/index.html', results=results, query=query)
return render_template('search/index.html', stats=get_nosql().gen_stats())
return render_template('search/index.html', stats=get_nosql().statistics_get())

View File

@@ -37,7 +37,7 @@ def base():
render['info'] = get_nosql().get_video_info(vGet)
render['params'] = request.args.get('v')
if render['info']['_status'] != 'available':
if render['info'].get('_status') != 'available':
flash(render['info'].get('_status_description', 'Video unavailable because of technical errors. Come back later.'))
return redirect(url_for('index.base'))
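
The switch from `render['info']['_status']` to `.get('_status')` means a record without that key no longer raises `KeyError`; it is treated like any other unavailable video. A small sketch of that behaviour, with hypothetical helper names `is_available` and `status_message`:

```python
FALLBACK = 'Video unavailable because of technical errors. Come back later.'


def is_available(info):
    """Return True only when the record explicitly says it is available."""
    return info.get('_status') == 'available'


def status_message(info):
    """Return the stored description, or the generic fallback text."""
    return info.get('_status_description', FALLBACK)


if __name__ == '__main__':
    for record in ({'_status': 'available'}, {'_status': 'removed'}, {}):
        print(record, is_available(record), status_message(record))
```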

View File

@@ -16,9 +16,21 @@ def pretty_time(time):
except:
return time # return given time
def epoch_time(time):
def epoch_date(epoch):
try:
return datetime.fromtimestamp(time).strftime('%d %b %Y')
return datetime.fromtimestamp(epoch).strftime('%d %b %Y')
except:
return None
def epoch_time(epoch):
try:
return datetime.fromtimestamp(epoch).strftime('%d %b %Y %H:%M:%S')
except:
return None
def datetime_date(obj):
try:
return obj.strftime('%d %b %Y %H:%M')
except:
return None
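
The three filters above share one pattern: try to format, return `None` on bad input. The bare `except:` in the diff also catches `KeyboardInterrupt` and `SystemExit`. The sketch below is a variation, not the repository's code, that narrows the handler to the exceptions bad input is likely to raise; the tuple name `_BAD_INPUT` is hypothetical.

```python
from datetime import datetime

# Exceptions that fromtimestamp()/strftime() may raise on unusable input.
_BAD_INPUT = (TypeError, ValueError, OverflowError, OSError, AttributeError)


def epoch_date(epoch):
    """Format a Unix timestamp as a date, or return None on bad input."""
    try:
        return datetime.fromtimestamp(epoch).strftime('%d %b %Y')
    except _BAD_INPUT:
        return None


def epoch_time(epoch):
    """Format a Unix timestamp as date and time, or return None on bad input."""
    try:
        return datetime.fromtimestamp(epoch).strftime('%d %b %Y %H:%M:%S')
    except _BAD_INPUT:
        return None


def datetime_date(obj):
    """Format a datetime object to minute precision, or return None on bad input."""
    try:
        return obj.strftime('%d %b %Y %H:%M')
    except _BAD_INPUT:
        return None


if __name__ == '__main__':
    print(epoch_date(1739310921), epoch_time(1739310921))
    print(datetime_date(datetime(2025, 2, 11, 22, 55, 21)))
    print(epoch_date(None), datetime_date('not a datetime'))
```

Note that `datetime.fromtimestamp` converts to the local timezone, so the epoch-based output depends on where the server runs.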

File diff suppressed because it is too large

View File

@@ -1,4 +1,11 @@
class OIDC():
"""
This class is little more than a nonce and state store that secures the authentication mechanism.
It also generates redirect URLs, checks bearer tokens for validity and caches JWT signing keys.
Fairly barebones and should be 100% secure. (famous last words)
This is made for form-posted JWTs. While not the most secure approach, it is the easiest to implement. Moving to a code-based flow might be preferred in the future.
The nonce and state stores live in memory, so only one instance can run at a time until central key caching is implemented.
"""
def __init__(self, app=None):
self.states = {}
self.nonces = {}
@@ -15,27 +22,34 @@ class OIDC():
self.client_id = config['OIDC_ID']
self.provider = config['OIDC_PROVIDER']
self.domain = config['DOMAIN']
self.window = 120 # the time window to allow states and nonces in seconds
# Authentication provider url must be HTTPS and end on a TLD
if self.provider[:8] != 'https://' or self.provider[-1] == '/':
print('Incorrect OIDC provider URI', flush=True)
exit()
# Get the provider configuration endpoints
configuration = requests.get(f'{self.provider}/.well-known/openid-configuration').json()
jwks_uri = configuration.get('jwks_uri')
self.authorize_uri = configuration.get('authorization_endpoint')
# Start the JWKS management client, it will load the keys and maintain them
self.jwks_manager = jwt.PyJWKClient(jwks_uri)
#################################
#######################################################
def state_maintenance(self):
from datetime import datetime
pivot = datetime.now().timestamp() - 120
# Current time minus the acceptable window
pivot = datetime.now().timestamp() - self.window
# List with expired states
expired_states = [state for state, timestamp in self.states.items() if timestamp <= pivot]
# Remove expired states from store
for state in expired_states:
del self.states[state]
@@ -43,30 +57,40 @@ class OIDC():
import secrets
from datetime import datetime
# Clean state store first
self.state_maintenance()
# Generate token and paired timestamp
state = secrets.token_urlsafe(8)
timestamp = datetime.now().timestamp()
# Add token to the state store
self.states[state] = timestamp
# Return the state
return state
def state_check(self, state):
# Clean state store first
self.state_maintenance()
# If given state is actively stored
if state in self.states:
# Delete state and return True
del self.states[state]
return True
# Given state is not stored
return False
#################################
#######################################################
# Same code as above but a different store for nonces #
#######################################################
def nonce_maintenance(self):
from datetime import datetime
pivot = datetime.now().timestamp() - 120
pivot = datetime.now().timestamp() - self.window
expired_nonces = [nonce for nonce, timestamp in self.nonces.items() if timestamp <= pivot]
@@ -95,7 +119,7 @@ class OIDC():
return False
#################################
#######################################################
def generate_redirect(self):
return str(f'{self.authorize_uri}'
@@ -107,21 +131,32 @@ class OIDC():
def check_bearer(self, token):
import jwt
# Test given JWT
try:
# Get the public signing key that matches the token
signing_key = self.jwks_manager.get_signing_key_from_jwt(token).key
# Try to decode the token, this will also check the validity in these points:
# 1. Token is signed by expected keys
# 2. Token is issued by the expected provider
# 3. Expected parameters are really in the token
# 4. Token is really intended for us
# 5. Token is still valid (with 5 sec margin)
decoded = jwt.decode(token, signing_key,
algorithms=jwt.algorithms.get_default_algorithms(),
issuer=self.provider,
require=['aud', 'client_id', 'exp', 'iat', 'iss', 'rat', 'sub'],
audience=self.client_id,
leeway=5)
# Any exception (invalid JWT, invalid formatting etc...) must return False
except Exception as e:
print(e, flush=True)
return False
# double check if given token is really requested by us
# Double check that the token was really requested by us by matching the nonce in the signed token
if not self.nonce_check(decoded.get('nonce', None)):
return False
# Return the unique user identifier
return decoded.get('sub', False)
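
The state and nonce stores in the class above follow the same pattern: issue a random token with a timestamp, purge anything older than the window, and accept each token only once. A minimal sketch of that pattern as one reusable class follows. The name `ExpiringTokenStore` and the injectable `clock` argument are assumptions made for illustration; the repository keeps two separate dictionaries instead.

```python
import secrets
import time


class ExpiringTokenStore:
    """In-memory single-use tokens that expire after `window` seconds."""

    def __init__(self, window=120, clock=time.time):
        self.window = window
        self.clock = clock   # injectable so expiry can be tested without sleeping
        self.tokens = {}     # token -> creation timestamp

    def maintenance(self):
        """Drop every token created at or before (now - window)."""
        pivot = self.clock() - self.window
        expired = [t for t, ts in self.tokens.items() if ts <= pivot]
        for token in expired:
            del self.tokens[token]

    def issue(self):
        """Clean the store, then create and remember a new token."""
        self.maintenance()
        token = secrets.token_urlsafe(8)
        self.tokens[token] = self.clock()
        return token

    def check(self, token):
        """Return True once for a known, unexpired token; False otherwise."""
        self.maintenance()
        if token in self.tokens:
            del self.tokens[token]
            return True
        return False


if __name__ == '__main__':
    store = ExpiringTokenStore(window=120)
    state = store.issue()
    print(store.check(state))  # first use of a fresh token
    print(store.check(state))  # replay of the same token
```

Because the tokens live in process memory, two worker processes would not see each other's tokens, which fits the move to `--workers 1` in the Dockerfile hunk.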

View File

@@ -5,6 +5,48 @@ from flask import current_app
# CELERY TASKS #
##########################################
@shared_task()
def test_sleep(time=60):
from time import sleep
sleep(time)
return True
@shared_task()
def video_download(videoId):
"""
I do not want to deal with the quirks of native yt-dlp in python, hence the subprocess.
"""
import subprocess
process = subprocess.run(['/usr/local/bin/yt-dlp', '--config-location', '/var/www/archive.ventilaar.net/goodstuff/config_video.conf', '--', f'https://www.youtube.com/watch?v={videoId}'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
if process.returncode != 0:
return (False, process.stdout)
return (True, None)
@shared_task()
def video_queue():
"""
Gets the oldest video ID from the queue and runs video_download() on it.
"""
from .nosql import get_nosql
videoId = get_nosql().queue_getNext()
if videoId:
videoId = videoId['id']
else:
return None
status, reason = video_download(videoId)
if status:
get_nosql().queue_deleteQueue(videoId)
return True
else:
get_nosql().queue_setFailed(videoId, reason)
return False
@shared_task()
def websub_subscribe_callback(channelId):
import requests
@@ -33,7 +75,9 @@ def websub_subscribe_callback(channelId):
response = requests.post(url, data=data)
if response.status_code == 202:
return True
# maybe handle errors?
return False
@shared_task()
@@ -61,6 +105,8 @@ def websub_unsubscribe_callback(callbackId):
if response.status_code == 202:
return True
# maybe handle errors?
return False
@shared_task()
@@ -68,37 +114,90 @@ def websub_process_data():
from .nosql import get_nosql
while True:
data = get_nosql().websub_getFirstPostData()
if not data:
blob = get_nosql().websub_getFirstPostData()
if not blob:
break
_id, data = data
_id, data = blob
parsed = do_parse_data(data)
if not parsed:
get_nosql().websub_deletePostProcessing(_id)
state, channelId, videoId = parsed
if parsed:
state, channelId, videoId = parsed
if state == 'added':
if not get_nosql().check_exists(videoId): # if video not exists
get_nosql().queue_insertQueue(videoId, 'WebSub')
# note for future me
# the websub notifications report ALL videos, including shorts and livestreams
# so if you are going to work on individual video downloading make sure you filter them!
elif state == 'removed':
# we currently do not do anything with removed videos
# but the idea is to trigger a full channel mirror in case a creator started to mass delete videos
pass
get_nosql().websub_deletePostProcessing(_id)
@shared_task()
def websub_renew_expiring(hours=6):
from .nosql import get_nosql
from datetime import datetime, timedelta
count = 0
for callbackId in get_nosql().websub_getCallbacks():
data = get_nosql().websub_getCallback(callbackId)
pivot = datetime.utcnow() - timedelta(hours=hours)
expires = data.get('activation_time') + timedelta(seconds=data.get('lease'))
if pivot <= expires: # if expiration happens after the calculation time pass the loop
if data.get('status') not in ['active']: # callback not active
continue
print(f'{callbackId} should be renewed')
pivot = datetime.utcnow() + timedelta(hours=hours) # hours past now
expires = data.get('activation_time') + timedelta(seconds=data.get('lease')) # callback expires at
if pivot <= expires: # expiration happens after n hours from now
continue # skip callback
# expiration happens within n hours
websub_subscribe_callback.delay(data.get('channel'))
# limit amount of subscribe requests to spread out the requests over time
# with an expiration pivot of 6h and a maximum validity of 5 days we can currently handle 3072 channels
count = count + 1
if count >= 256:
break
@shared_task()
def playlist_to_queue():
"""
As there is still one cronjob based task running daily in the background, we have to make sure it gets hooked into the system as well.
The cronjob task gets the last 50 uploads for all channels and commits the playlist JSON into the database.
This task appends the IDs we got from the playlist to the download queue.
Should ideally be run after the cronjob completes, but I don't want to implement an API that does that, so this gets run twice a day.
"""
from .nosql import get_nosql
import random
from datetime import datetime, timedelta
pivot = datetime.utcnow() - timedelta(days=3) # calculates 3 days before now
channels = list(get_nosql().list_all_channels(active=True))
random.shuffle(channels) # randomize channelId order because otherwise the queue will follow the channel order as well
for channel in channels:
info = get_nosql().get_channel_info(channel)
# if last_run not set or last_run is older than the pivot (indicating it has not been updated)
if not info.get('last_run') or info.get('last_run') < pivot:
# skip channel
continue
for item in info['playlist']['entries']:
videoId = item['id']
get_nosql().queue_insertQueue(videoId, 'Playlist mirroring')
@shared_task()
def generate_statistics():
from .nosql import get_nosql
get_nosql().statistics_generate()
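
The `websub_renew_expiring` change above moves the pivot from "n hours ago" to "n hours from now", so a callback is renewed when its lease ends inside the coming window instead of after it has already lapsed. A minimal sketch of that selection logic without the database or Celery follows. The function names and the plain list of dicts are assumptions; only the keys `status`, `activation_time`, `lease` and `channel` come from the diff.

```python
from datetime import datetime, timedelta


def needs_renewal(activation_time, lease_seconds, now, hours=6):
    """Return True when the lease ends within `hours` from `now`."""
    pivot = now + timedelta(hours=hours)
    expires = activation_time + timedelta(seconds=lease_seconds)
    return expires < pivot


def select_for_renewal(callbacks, now, hours=6, limit=256):
    """Pick at most `limit` active callbacks whose lease is about to end.

    `callbacks` is a hypothetical list of dicts standing in for the
    records that the real task loads from the database.
    """
    selected = []
    for data in callbacks:
        if data.get('status') != 'active':
            continue  # callback not active
        if not needs_renewal(data['activation_time'], data['lease'], now, hours):
            continue  # lease still runs past the window
        selected.append(data['channel'])
        if len(selected) >= limit:
            break     # spread the remaining renewals over later runs
    return selected


if __name__ == '__main__':
    now = datetime(2025, 2, 11, 12, 0, 0)
    five_days = 5 * 24 * 3600
    callbacks = [
        {'status': 'active', 'channel': 'soon', 'lease': five_days,
         'activation_time': now - timedelta(days=5) + timedelta(hours=1)},
        {'status': 'active', 'channel': 'later', 'lease': five_days,
         'activation_time': now - timedelta(days=1)},
    ]
    print(select_for_renewal(callbacks, now))
```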
##########################################
# TASK MODULES #

View File

@@ -19,7 +19,7 @@
{% for item in channelInfo %}
<form method="POST">
<div class="input-field">
<span class="supporting-text">{{ item }}</span>
<span class="supporting-text mb-2">{{ item }}</span>
<input class="validate" type="text" value="{{ item }}" name="key" hidden>
</div>

View File

@@ -15,6 +15,18 @@
</div>
</div>
<div class="row">
<div class="col s12 l4 m-4">
<div class="card">
<div class="card-content">
<span class="card-title">Direct actions</span>
<form class="mt-4" method="post">
<button class="btn mb-2 green" type="submit" name="task" value="playlist-queue">Playlist to Queue</button>
<br>
<span class="supporting-text">Forcerun playlist to queue task</span>
</form>
</div>
</div>
</div>
<div class="col s12 l4 m-4">
<div class="card">
<div class="card-content">
@@ -38,7 +50,7 @@
});
</script>
</div>
<button class="btn mt-4" type="submit" name="action" value="add_channel">Add</button>
<button class="btn mt-4" type="submit" name="task" value="add_channel">Add</button>
</form>
</div>
</div>

View File

@@ -11,79 +11,89 @@
<div class="divider"></div>
<div class="row">
<div class="col s12">
<h5>Global channel options</h5>
<h5>Global channel options</h5>
</div>
</div>
<div class="row">
<div class="col s6 l4 m-4">
<a href="{{ url_for('admin.system') }}">
<div class="card black-text">
<a href="{{ url_for('admin.system') }}">
<div class="card black-text">
<div class="card-content">
<span class="card-title">System</span>
<p class="grey-text">Internal system settings</p>
<p class="grey-text">Internal system settings</p>
</div>
</div>
</a>
</a>
</div>
<div class="col s6 l4 m-4">
<a href="{{ url_for('admin.channels') }}">
<div class="card black-text">
<a href="{{ url_for('admin.channels') }}">
<div class="card black-text">
<div class="card-content">
<span class="card-title">Channels</span>
<p class="grey-text">Manage channels in the system</p>
<p class="grey-text">Manage channels in the system</p>
</div>
</div>
</a>
</a>
</div>
<div class="col s6 l4 m-4">
<a href="{{ url_for('admin.runs') }}">
<div class="card black-text">
<a href="{{ url_for('admin.runs') }}">
<div class="card black-text">
<div class="card-content">
<span class="card-title">Archive runs</span>
<p class="grey-text">Look at the cron run logs</p>
<p class="grey-text">Look at the cron run logs</p>
</div>
</div>
</a>
</a>
</div>
<div class="col s6 l4 m-4">
<a href="{{ url_for('admin.websub') }}">
<div class="card black-text">
<a href="{{ url_for('admin.websub') }}">
<div class="card black-text">
<div class="card-content">
<span class="card-title">WebSub</span>
<p class="grey-text">Edit WebSub YouTube links</p>
<p class="grey-text">Edit WebSub YouTube links</p>
</div>
</div>
</a>
</a>
</div>
<div class="col s6 l4 m-4">
<a href="{{ url_for('admin.reports') }}">
<div class="card black-text">
<a href="{{ url_for('admin.reports') }}">
<div class="card black-text">
<div class="card-content">
<span class="card-title">Reports</span>
<p class="grey-text">View user reports</p>
<p class="grey-text">View user reports</p>
</div>
</div>
</a>
</a>
</div>
<div class="col s6 l4 m-4">
<a href="{{ url_for('admin.posters') }}">
<div class="card black-text">
<a href="{{ url_for('admin.queue') }}">
<div class="card black-text">
<div class="card-content">
<span class="card-title">Posters</span>
<p class="grey-text">Extension posters</p>
<span class="card-title">Queue</span>
<p class="grey-text">Video download queue and API access</p>
</div>
</div>
</a>
</a>
</div>
<div class="col s6 l4 m-4">
<a href="{{ url_for('admin.users') }}">
<div class="card black-text">
<a href="{{ url_for('admin.users') }}">
<div class="card black-text">
<div class="card-content">
<span class="card-title">Users</span>
<p class="grey-text">Authenticated users</p>
<p class="grey-text">Authenticated users</p>
</div>
</div>
</a>
</a>
</div>
<div class="col s6 l4 m-4">
<a href="{{ url_for('admin.workers') }}">
<div class="card black-text">
<div class="card-content">
<span class="card-title">Workers</span>
<p class="grey-text">Worker and task management</p>
</div>
</div>
</a>
</div>
</div>
{% endblock %}

View File

@@ -1,150 +0,0 @@
{% extends 'material_base.html' %}
{% block title %}Posters administration page{% endblock %}
{% block description %}Posters administration page of the AYTA system{% endblock %}
{% block content %}
<div class="row">
<div class="col s12 l11">
<h4>Posters administration page</h4>
</div>
<div class="col s12 l1 m-5">
<form method="POST">
<input title="Prunes all deleted endpoints, but keeps last 3 days" type="submit" value="clean-retired" name="task">
</form>
</div>
</div>
<div class="divider"></div>
<div class="row">
<div class="col s12">
<h5>Poster options</h5>
</div>
</div>
<div class="row">
<div class="col s12 l4 m-4">
<div class="card">
<div class="card-content">
<span class="card-title">Create new endpoint</span>
<form method="post">
<div class="row">
<div class="col s12 input-field">
<input placeholder="Custom endpoint" name="value" type="text" class="validate" minlength="12">
<span class="supporting-text">Leaving this empty will create a random secure string</span>
</div>
<div class="col s12 input-field">
<input placeholder="Description" name="description" type="text" class="validate" minlength="8" maxlength="64" required>
<span class="supporting-text">Description for the endpoint for better administration</span>
</div>
<button class="btn mt-4" type="submit" name="task" value="add-endpoint">Create</button>
</div>
</form>
</div>
</div>
</div>
<div class="col s12 l4 m-4">
<div class="card">
<div class="card-content">
<span class="card-title">Queue manually</span>
<form method="post">
<div class="row">
<div class="col s12 input-field">
<input placeholder="YouTube video ID" name="value" type="text" class="validate" minlength="11" maxlength="11" required>
<span class="supporting-text">Must be a valid YouTube video ID</span>
</div>
<div class="col s12 mt-5 input-field">
<div class="switch">
<label>Queue<input type="checkbox" value="direct" name="value" disabled><span class="lever"></span>Direct</label>
<span class="supporting-text">Queue up or start directly</span>
</div>
</div>
<button class="btn mt-4" type="submit" name="task" value="manual-queue">Queue</button>
</div>
</form>
</div>
</div>
</div>
</div>
<div class="divider"></div>
<div class="row">
<div class="col s6 l9">
<h5>Registered endpoints</h5>
</div>
<div class="col s6 l3 m-4 input-field">
<input id="filter_query" type="text">
<label for="filter_query">Filter results</label>
</div>
</div>
<div class="row">
<div class="col s12">
<table class="striped highlight responsive-table">
<thead>
<tr>
<th>Actions</th>
<th>id</th>
<th>description</th>
<th>status</th>
<th>created_time</th>
<th>retired_time</th>
</tr>
</thead>
<tbody>
{% for endpoint in endpoints %}
<tr class="filterable">
<td>
<form method="post">
<input type="text" value="{{ endpoint.get('id') }}" name="value" hidden>
<button class="btn-small waves-effect waves-light" type="submit" name="task" value="retire" title="Retire endpoint" {% if endpoint.get('status') != 'active' %}disabled{% endif %}>🗑️</button>
</form>
</td>
<td>{{ endpoint.get('id') }}</td>
<td>{{ endpoint.get('description') }}</td>
<td>{{ endpoint.get('status') }}</td>
<td>{{ endpoint.get('created_time') }}</td>
<td>{{ endpoint.get('retired_time') }}</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
</div>
<div class="divider"></div>
<div class="row">
<div class="col s6 l9">
<h5>Queued IDs</h5>
</div>
<div class="col s6 l3 m-4 input-field">
<input id="filter_query" type="text">
<label for="filter_query">Filter results</label>
</div>
</div>
<div class="row">
<div class="col s12">
<table class="striped highlight responsive-table">
<thead>
<tr>
<th>Actions</th>
<th>id</th>
<th>endpoint</th>
<th>status</th>
<th>created_time</th>
</tr>
</thead>
<tbody>
{% for id in queue %}
<tr class="filterable">
<td>
<form method="post">
<input type="text" value="{{ id.get('id') }}" name="value" hidden>
<button class="btn-small waves-effect waves-light" type="submit" name="task" value="delete-queue" title="Delete from queue" {% if id.get('status') != 'queued' %}disabled{% endif %}>🗑️</button>
</form>
</td>
<td>{{ id.get('id') }}</td>
<td>{{ id.get('endpoint') }}</td>
<td>{{ id.get('status') }}</td>
<td>{{ id.get('created_time') }}</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
</div>
{% endblock %}

View File

@@ -0,0 +1,178 @@
{% extends 'material_base.html' %}
{% block title %}Queue administration page{% endblock %}
{% block description %}Queue administration page of the AYTA system{% endblock %}
{% block content %}
<div class="row">
<div class="col s12">
<h4>Queue administration page</h4>
</div>
</div>
<div class="divider"></div>
<div class="row">
<div class="col s12">
<h5>Options</h5>
</div>
</div>
<div class="row">
<div class="col s12 l4 m-4">
<div class="card">
<div class="card-content">
<span class="card-title">Direct actions</span>
<form class="mt-4" method="post" onsubmit="return confirm('Are you sure?');">
<button class="btn mb-2 red" type="submit" name="task" value="empty-queue">Empty Queue</button>
<br>
<span class="supporting-text">Removes all queued ids</span>
</form>
<form class="mt-4" method="post" onsubmit="return confirm('Are you sure?');">
<button class="btn mb-2" type="submit" name="task" value="clean-retired">Clean retired</button>
<br>
<span class="supporting-text">Prunes all deactivated endpoints, but keeps last 3 days</span>
</form>
<form class="mt-4 input-field" method="post" onsubmit="return confirm('Are you sure?');">
<input type="number" style="width: 80px" value="1" name="value" min="1" max="99">
<button class="btn mb-2 green" type="submit" name="task" value="queue-run-once">Download oldest queued</button>
<br>
<span class="supporting-text">Will download the oldest queued video ID</span>
</form>
</div>
</div>
</div>
<div class="col s12 l4 m-4">
<div class="card">
<div class="card-content">
<span class="card-title">Create new endpoint</span>
<form method="post">
<div class="row">
<div class="col s12 input-field">
<input placeholder="Custom endpoint" name="value" type="text" class="validate" minlength="12">
<span class="supporting-text">Leaving this empty will create a random secure string</span>
</div>
<div class="col s12 input-field">
<input placeholder="Description" name="description" type="text" class="validate" minlength="8" maxlength="64" required>
<span class="supporting-text">Description for the endpoint for better administration</span>
</div>
<button class="btn mt-4" type="submit" name="task" value="add-endpoint">Create</button>
</div>
</form>
</div>
</div>
</div>
<div class="col s12 l4 m-4">
<div class="card">
<div class="card-content">
<span class="card-title">Queue manually</span>
<form method="post">
<div class="row">
<div class="col s12 input-field">
<input placeholder="YouTube video ID" name="value" type="text" class="validate" minlength="11" maxlength="11" required>
<span class="supporting-text">Must be a valid YouTube video ID</span>
</div>
<div class="col s12 mt-5 input-field">
<div class="switch">
<label>Queue<input type="checkbox" value="direct" name="direct"><span class="lever"></span>Direct</label>
<span class="supporting-text">Queue up or start directly</span>
</div>
</div>
<button class="btn mt-4" type="submit" name="task" value="manual-queue">Queue</button>
</div>
</form>
</div>
</div>
</div>
</div>
<div class="divider"></div>
<div class="row">
<div class="col s6 l9">
<h5>Registered endpoints</h5>
</div>
<div class="col s6 l3 m-4 input-field">
<input id="filter_query" type="text">
<label for="filter_query">Filter results</label>
</div>
</div>
<div class="row">
<div class="col s12">
<table class="striped highlight responsive-table">
<thead>
<tr>
<th>Actions</th>
<th>id</th>
<th>description</th>
<th>status</th>
<th>created_time</th>
<th>retired_time</th>
</tr>
</thead>
<tbody>
{% for endpoint in endpoints %}
<tr class="filterable">
<td>
<form method="post">
<input type="text" value="{{ endpoint.get('id') }}" name="value" hidden>
<button class="btn-small waves-effect waves-light" type="submit" name="task" value="retire" title="Retire endpoint" {% if endpoint.get('status') != 'active' %}disabled{% endif %}>🗑️</button>
</form>
</td>
<td>{{ endpoint.get('id') }}</td>
<td>{{ endpoint.get('description') }}</td>
<td>{{ endpoint.get('status') }}</td>
<td>{{ endpoint.get('created_time') }}</td>
<td>{{ endpoint.get('retired_time') }}</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
</div>
<div class="divider"></div>
<div class="row">
<div class="col s4 l8">
<h5>Queued IDs</h5>
</div>
<div class="col s4 l1">
<p>{{ count }} items</p>
</div>
<div class="col s4 l3 m-4 input-field">
<input id="filter_query" type="text">
<label for="filter_query">Filter results</label>
</div>
</div>
<div class="row">
<div class="col s12">
<table class="striped highlight responsive-table">
<thead>
<tr>
<th>Actions</th>
<th>id</th>
<th>endpoint</th>
<th>status</th>
<th>created_time</th>
<th>fail_reason</th>
</tr>
</thead>
<tbody>
{% for id in queue %}
<tr class="filterable">
<td>
<form method="post">
<input type="text" value="{{ id.get('id') }}" name="value" hidden>
<button class="btn-small waves-effect waves-light" type="submit" name="task" value="delete-queue" title="Delete from queue" {% if id.get('status') == 'working' %}disabled{% endif %}>🗑️</button>
</form>
<form method="post">
<input type="text" value="{{ id.get('id') }}" name="value" hidden>
<button class="btn-small waves-effect waves-light" type="submit" name="task" value="run-download" title="Run download task" disabled>⏩</button>
<!-- This function will not work until the download queue and video download process are rewritten -->
</form>
</td>
<td>{{ id.get('id') }}</td>
<td>{{ id.get('endpoint') }}</td>
<td>{{ id.get('status') }}</td>
<td>{{ id.get('created_time') }}</td>
<td><textarea class="info">{{ id.get('fail_reason') }}</textarea></td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
</div>
{% endblock %}

View File

@@ -4,14 +4,9 @@
{% block content %}
<div class="row">
<div class="col s12 l11">
<div class="col s12">
<h4>WebSub administration page</h4>
</div>
<div class="col s12 l1 m-5">
<form method="POST">
<input title="Prunes all retired callbacks, but keeps last 3 days" type="submit" value="clean-retired" name="task">
</form>
</div>
</div>
<div class="divider"></div>
<div class="row">
@@ -19,6 +14,43 @@
<h5>WebSub options</h5>
</div>
</div>
<div class="row">
<div class="col s12 l4 m-4">
<div class="card">
<div class="card-content">
<span class="card-title">Direct actions</span>
<form method="post" onsubmit="return confirm('Are you sure?');">
<button class="btn mb-2 green" type="submit" name="task" value="subscribe-channels">Subscribe channels</button>
<br>
<span class="supporting-text">Send WebSub subscription request for all activated channels. (This will renew existing ones as well)</span>
</form>
<form class="mt-4" method="post" onsubmit="return confirm('Are you sure?');">
<button class="btn mb-2 red" type="submit" name="task" value="unsubscribe-callbacks">Unsubscribe channels</button>
<br>
<span class="supporting-text">Send WebSub unsubscription request for all activated endpoints. (This will only unsubscribe, not disable)</span>
</form>
<form class="mt-4" method="post" onsubmit="return confirm('Are you sure?');">
<button class="btn mb-2" type="submit" name="task" value="clean-retired">Clean retired</button>
<br>
<span class="supporting-text">Prunes all retired callbacks, but keeps those from the last day</span>
</form>
</div>
</div>
</div>
<div class="col s12 l4 m-4">
<div class="card">
<div class="card-content">
<span class="card-title">Statistics</span>
<h6>Unprocessed callback datapoints</h6>
<p>{{ render['stats']['unprocessed_data'] }}</p>
<h6>Active callbacks</h6>
<p>{{ render['stats']['active_callbacks'] }}</p>
<h6>Something</h6>
<p>Blah</p>
</div>
</div>
</div>
</div>
<div class="divider"></div>
<div class="row">
<div class="col s6 l9">

View File

@@ -0,0 +1,80 @@
{% extends 'material_base.html' %}
{% block title %}Workers administration page{% endblock %}
{% block description %}Workers administration page of the AYTA system{% endblock %}
{% block content %}
<div class="row">
<div class="col s12">
<h4>Workers administration page</h4>
</div>
</div>
<div class="divider"></div>
<div class="row">
<div class="col s12">
<h5>Options</h5>
</div>
</div>
<form method="POST">
<input title="test-sleep" type="submit" value="test-sleep" name="task">
</form>
<div class="divider"></div>
<div class="row">
<div class="col s12 m-4">
<h5>Reserved tasks per worker</h5>
<p>Usually 4 tasks per worker</p>
{% if reserved is none %}
<h6>No workers with reserved tasks. Are there workers with stuck tasks, or are they even online?</h6>
{% else %}
{% for worker in reserved %}
<span>{{ worker }}</span>
<table class="striped highlight responsive-table" style=" border: 1px solid black;">
<thead>
<tr>
<th>ID</th>
<th>Task</th>
<th>Arguments</th>
</tr>
</thead>
<tbody>
{% for task in reserved[worker] %}
<tr>
<td>{{ task.get('id') }}</td>
<td>{{ task.get('name') }}</td>
<td>{{ task.get('args') }} {{ task.get('kwargs') }}</td>
</tr>
{% endfor %}
</tbody>
</table>
{% endfor %}
{% endif %}
</div>
<div class="col s12 m-4">
<h5>Current workers and processing tasks</h5>
{% if tasks is none %}
<h6>No workers with running tasks. Are there workers with stuck tasks, or are they even online?</h6>
{% else %}
{% for worker in tasks %}
<span>{{ worker }}</span>
<table class="striped highlight responsive-table" style=" border: 1px solid black;">
<thead>
<tr>
<th>ID</th>
<th>Task</th>
<th>Time started</th>
</tr>
</thead>
<tbody>
{% for task in tasks[worker] %}
<tr>
<td>{{ task.get('id') }}</td>
<td>{{ task.get('name') }}</td>
<td>{{ task.get('time_start')|epoch_time }}</td>
</tr>
{% endfor %}
</tbody>
</table>
{% endfor %}
{% endif %}
</div>
</div>
{% endblock %}

View File

@@ -25,12 +25,24 @@
</li>
<li class="collection-item">
<a href="{{ url_for('channel.channel', channelId='UCIcgBZ9hEJxHv6r_jDYOMqg') }}"><span class="title">Unus Annus</span></a>
<p>Reason: This channel does not exist. (Self removed)</p>
<p>Reason: This channel does not exist.</p>
</li>
<li class="collection-item">
<a href="{{ url_for('channel.channel', channelId='UCz1s8aJYSQuaXJCtEi-VWRA') }}"><span class="title">Dutch Legion</span></a>
<p>Reason: This account has been terminated due to multiple or severe violations of YouTube's policy prohibiting hate speech.</p>
</li>
<li class="collection-item">
<a href="{{ url_for('channel.channel', channelId='UC91-8aNaRbp71UMEb_34ryg') }}"><span class="title">RBMK5000</span></a>
<p>Reason: This channel does not exist.</p>
</li>
<li class="collection-item">
<a href="{{ url_for('channel.channel', channelId='UCoPSAT64vfXlulyWd_dPE3Q') }}"><span class="title">Evilfisher2</span></a>
<p>Reason: This channel was removed because it violated our Community Guidelines.</p>
</li>
<li class="collection-item">
<a href="{{ url_for('channel.channel', channelId='UCZXkvavD2YKnFCzCkZ-bNPw') }}"><span class="title">mrabhy</span></a>
<p>Reason: This channel was removed because it violated our Community Guidelines.</p>
</li>
</ul>
</div>
<div class="col s12 l6 center-align">
@@ -43,6 +55,22 @@
<a href="{{ url_for('channel.channel', channelId='UCzGdxkzULCa9RlD-Q2EZPXQ') }}"><span class="title">Kalashnikov Group</span></a>
<p>Reason: This account has been terminated for a violation of YouTube's Terms of Service.</p>
</li>
<li class="collection-item">
<a href="{{ url_for('channel.channel', channelId='UCtfg1tENiu3SgGMZVduFmTg') }}"><span class="title">FiberNinja</span></a>
<p>Reason: This channel was removed because it violated our Community Guidelines.</p>
</li>
<li class="collection-item">
<a href="{{ url_for('channel.channel', channelId='UCv4VkfbX8YfqodF-4coEEfQ') }}"><span class="title">James Somerton</span></a>
<p>Reason: This channel does not exist.</p>
</li>
<li class="collection-item">
<a href="{{ url_for('channel.channel', channelId='UC8XH9kpilkuss4bVeRZD1kw') }}"><span class="title">Plagued Moth</span></a>
<p>Reason: This channel was removed because it violated our Community Guidelines.</p>
</li>
<li class="collection-item">
<a href="{{ url_for('channel.channel', channelId='UCxZTTWP0QN7-ch2wW1QeFwg') }}"><span class="title">CowOfTheSea</span></a>
<p>Reason: This channel was removed because it violated our Community Guidelines.</p>
</li>
</ul>
</div>
</div>

View File

@@ -5,68 +5,72 @@
{% block content %}
<div class="row">
<div class="col s12 l3 m-4">
<h4>Search the archive</h4>
<p>Searching is currently partially working and will probably not work optimally for a long time until the database and backend is fully reworked.</p>
<p>In the meantime if you know the channel name and video title you can use local search on <a href="{{ url_for('channel.base') }}">this</a> page</p>
<img class="responsive-img" src="{{ url_for('static', filename='img/mongo_meme.png') }}">
{% if stats is defined %}
<div class="divider"></div>
<h5>Stats of the archive</h5>
<h4>Search the archive</h4>
<p>Searching is currently partially working and will probably not work optimally for a long time until the database and backend is fully reworked.</p>
<p>In the meantime if you know the channel name and video title you can use local search on <a href="{{ url_for('channel.base') }}">this</a> page</p>
<img class="responsive-img" src="{{ url_for('static', filename='img/mongo_meme.png') }}">
{% if stats is not none and stats is defined %}
<div class="divider"></div>
<h5>Stats of the archive</h5>
<ul class="collection">
{% for stat in stats %}
{% for stat in stats %}
<li class="collection-item">
<span class="title">{{ stat }}</span>
<p>{{ stats[stat] }}</p>
</li>
{% endfor %}
<!--<span class="title">{{ stat }}</span>-->
{% if stat == 'last_updated' %}
Last updated {{ stats[stat]|datetime_date }} UTC
{% else %}
{{ stats[stat] }}
{% endif %}
</li>
{% endfor %}
</ul>
{% endif %}
{% endif %}
</div>
<div class="col s12 l9 m-4">
<div class="row">
<div class="col s6 offset-s3">
<div class="col s6 offset-s3">
<img class="responsive-img" src="{{ url_for('static', filename='img/bing_chilling.png') }}">
</div>
</div>
<div class="col s12 center-align">
<h5>"A big archive needs a search function." -Sun Tzu</h5>
</div>
</div>
<div class="divider"></div>
<form method="post" class="">
<div class="row">
<div class="col s12 m-4 input-field">
<input id="first_name" name="query" type="text" placeholder='Search the archive!' maxlength="64" value="{{ query }}">
</div>
<div class="divider"></div>
<form method="post" class="">
<div class="row">
<div class="col s12 m-4 input-field">
<input id="first_name" name="query" type="text" placeholder='Search the archive!' maxlength="64" value="{{ query }}">
<label for="first_name">Searching in video titles, uploader names and tags.</label>
<span class="supporting-text">Input is interpreted as keywords. Use quotes (") to search for literal text, or prepend a minus (-) to exclude a term.</span>
</div>
<div class="col s12 m-4">
</div>
<div class="col s12 m-4">
<button class="btn icon-right waves-effect waves-light" type="submit" name="task" value="search">Search</button>
</div>
</div>
</form>
{% if results is defined %}
<div class="divider"></div>
</div>
</div>
</form>
{% if results is defined %}
<div class="divider"></div>
<table class="striped highlight responsive-table">
<thead>
<tr>
<th>Title</th>
<th>Uploader</th>
<th>Date</th>
<th>Date</th>
</tr>
</thead>
<tbody>
{% for result in results %}
{% for result in results %}
<tr>
<td><a href="{{ url_for('watch.base') }}?v={{ result.get('id') }}">{{ result.get('title') }}</a></td>
<td><a href="{{ url_for('channel.channel', channelId=result.get('channel_id')) }}">{{ result.get('uploader') }}</a></td>
<td>{{ result.get('upload_date')|pretty_time }}</td>
<td>{{ result.get('upload_date')|pretty_time }}</td>
</tr>
{% endfor %}
{% endfor %}
</tbody>
</table>
{% if results|length == 0 %}<h6>No results. Relax the search terms more please!</h6>{% else %}<p>Not the results you were looking for? Try adding quotes ("") around important words.</p>{% endif %}
{% endif %}
{% if results|length == 0 %}<h6>No results. Relax the search terms more please!</h6>{% else %}<p>Not the results you were looking for? Try adding quotes ("") around important words.</p>{% endif %}
{% endif %}
</div>
</div>
{% endblock %}
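
Review note on the search help text in the template above: the diff does not show how the backend interprets the query, so the following is only an illustrative sketch of the described semantics (plain keywords, quoted literal phrases, minus-prefixed exclusions). The function name `parse_query` and the returned dictionary layout are hypothetical and not taken from the repository.

```python
import re

def parse_query(query):
    """Split a search string into keywords, quoted phrases and excluded terms.

    Hypothetical helper for illustration only; not the project's actual parser.
    """
    keywords, phrases, exclude = [], [], []
    # Either an optionally negated "quoted phrase" or a run of non-space characters
    for match in re.finditer(r'(-?)"([^"]+)"|(\S+)', query):
        negated, phrase, word = match.groups()
        if phrase is not None:
            (exclude if negated else phrases).append(phrase)
        elif word.startswith('-') and len(word) > 1:
            exclude.append(word[1:])
        else:
            keywords.append(word)
    return {'keywords': keywords, 'phrases': phrases, 'exclude': exclude}

parsed = parse_query('linux "kernel panic" -windows')
print(parsed)
```

Keeping the parsing step separate from the database call would make the three cases easy to test without a running database.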

View File

@@ -6,7 +6,7 @@
<meta property="og:title" content="{{ render.get('info').get('title') }}" />
<meta property="og:type" content="website" />
<meta property="og:url" content="{{ url_for('watch.base') }}?v={{ render.get('info').get('id') }}" />
<meta property="og:image" content="https://archive.ventilaar.net/videos/automatic/{{ render.get('info').get('channel_id') }}/{{ render.get('info').get('id') }}/{{ render.get('info').get('title') }}.jpg" />
<meta property="og:image" content="https://archive.ventilaar.net/videos/automatic/{{ render.get('info').get('channel_id') }}/{{ render.get('info').get('id') }}/{{ render.get('info').get('_title_slug') }}.jpg" />
<meta property="og:description" content="{{ render.get('info').get('description', '')|truncate(100) }}" />
{% endblock %}
@@ -27,7 +27,7 @@
<div class="col s12 l3">
<p><b>Video by:</b> <a href="{{ url_for('channel.channel', channelId=render.get('info').get('channel_id')) }}">{{ render.get('info').get('uploader') }}</a></p>
<p><b>Upload date:</b> {{ render.get('info').get('upload_date')|pretty_time }}</p>
<p><b>Archive date:</b> {{ render.get('info').get('epoch')|epoch_time }}</p>
<p><b>Archive date:</b> {{ render.get('info').get('epoch')|epoch_date }}</p>
<p><b>Video length:</b> {{ render.get('info').get('duration')|pretty_duration }}</p>
</div>
<div class="col s4 l3 center-align">

View File

@@ -0,0 +1,20 @@
from ayta.nosql import Mango
#import ayta
#app = ayta.create_app()
mango = Mango('mongodb://root:example@192.168.66.140:27017')
data = mango.download_queue.find({'status': 'failed'})
for x in data:
    vId = x['id']
    lines = x['fail_reason'].splitlines()
    error = lines[-1]
    check = "This video has been removed for violating YouTube's Terms of Service"
    if check in error:
        print(vId)
        mango.info_json.insert_one({'id': vId, '_status': 'unavailable',
                                    '_status_description': f'Video is unavailable because YouTube said: {check}'})
        mango.queue_deleteQueue(vId)
    else:
        print(error)
print('done')
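
Review note: the script above takes the last line of `fail_reason` directly, which would raise an `IndexError` if a queue entry has an empty reason. A minimal sketch of the same classification step as a pure function, with no database access, might look like this. `classify_failure` and `REMOVED_MARKER` are hypothetical names introduced for illustration; only the marker text and the `'unavailable'` status come from the script.

```python
REMOVED_MARKER = "This video has been removed for violating YouTube's Terms of Service"

def classify_failure(fail_reason):
    """Return 'unavailable' if the last line of the reason carries the removal marker.

    Hypothetical helper; tolerates None and empty strings instead of raising.
    """
    lines = (fail_reason or '').splitlines()
    last_line = lines[-1] if lines else ''
    return 'unavailable' if REMOVED_MARKER in last_line else 'unknown'

# Made-up sample reasons standing in for download_queue documents
removed = classify_failure("Traceback ...\nERROR: " + REMOVED_MARKER)
other = classify_failure("Traceback ...\nERROR: Sign in to confirm your age")
empty = classify_failure('')
print(removed, other, empty)
```

Entries classified as `'unknown'` would be left in the queue for manual inspection, mirroring the `else` branch of the script.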

one_offs/archive_size.py Normal file

@@ -0,0 +1,18 @@
from ayta.nosql import Mango
#import ayta
#app = ayta.create_app()
mango = Mango('mongodb://root:example@192.168.66.140:27017')
data = mango.info_json.find({'_status': 'available'}, {'filesize_approx': 1})
total = 0
for x in data:
    size = x.get('filesize_approx')
    if size:
        total = total + int(size)
# the 5000 is a rough correction in GB for the inaccuracy of the approximation
total = int(total / 1000000000 + 5000)
print(f'Approximate size: {total} GB')
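
Review note: the summation in this script can be checked without a database by moving it into a small function. The sketch below assumes documents shaped like the `info_json` projection used above (a `filesize_approx` field that may be missing or empty); `approximate_size_gb` and its `correction_gb` parameter are hypothetical names, not part of the repository.

```python
def approximate_size_gb(documents, correction_gb=0):
    """Sum 'filesize_approx' (bytes) over documents and return whole gigabytes.

    Hypothetical helper; documents without a usable size are skipped.
    """
    total_bytes = sum(
        int(doc['filesize_approx'])
        for doc in documents
        if doc.get('filesize_approx')
    )
    return int(total_bytes / 1_000_000_000 + correction_gb)

# Made-up sample documents
sample = [
    {'filesize_approx': 1_500_000_000},
    {'filesize_approx': None},
    {},
    {'filesize_approx': '2500000000'},
]
size_plain = approximate_size_gb(sample)
size_corrected = approximate_size_gb(sample, correction_gb=5000)
print(size_plain, size_corrected)
```

Passing the correction as a parameter keeps the fixed 5000 GB offset visible at the call site rather than buried in the arithmetic.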

View File

@@ -0,0 +1,37 @@
from ayta.nosql import Mango
import matplotlib.pyplot as plt
from datetime import datetime, timedelta, timezone
#import ayta
#app = ayta.create_app()
mango = Mango('mongodb://root:example@192.168.66.140:27017')
# use a timezone-aware "now": .timestamp() on a naive utcnow() value would be read as local time
pivot = datetime.now(timezone.utc) - timedelta(days=90)
pivot = int(pivot.timestamp())
data = mango.info_json.find({'_status': 'available', 'timestamp': {'$gt': pivot}}, {'epoch': 1})
stat = {}
for x in data:
    epoch = x['epoch']
    day = datetime.fromtimestamp(epoch).strftime('%Y%m%d')
    if day not in stat:
        stat[day] = 1
    else:
        stat[day] = stat[day] + 1
dates = list(stat.keys())
values = list(stat.values())
plt.figure(figsize=(16, 8)) # Set the figure size
plt.bar(dates, values) # Create the bar chart
# Rotate the x-axis labels so they do not overlap
plt.xticks(rotation=45, ha='right') # Rotate xticklabels by 45 degrees and align them to the right
plt.xlabel('Date') # Label for x-axis
plt.ylabel('Counts') # Label for y-axis
plt.title('Bar Graph of Counts by Date') # Title of the graph
# Display the graph
plt.show()
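
Review note: the per-day counting loop in this script could also be written with `collections.Counter`. The sketch below is an assumption-labeled alternative, not the committed code: it buckets by UTC day rather than local time, and `count_per_day` is a hypothetical name. The sample epochs are made up.

```python
from collections import Counter
from datetime import datetime, timezone

def count_per_day(epochs):
    """Count epoch timestamps (in seconds) per UTC calendar day, keyed as YYYYMMDD."""
    days = (
        datetime.fromtimestamp(epoch, tz=timezone.utc).strftime('%Y%m%d')
        for epoch in epochs
    )
    return Counter(days)

# Made-up values standing in for the 'epoch' field of info_json documents
sample_epochs = [0, 3600, 86400, 86410]
counts = count_per_day(sample_epochs)
print(dict(counts))
```

The resulting mapping can be handed to `plt.bar` the same way as the `stat` dictionary, using `list(counts.keys())` and `list(counts.values())`.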

View File

@@ -0,0 +1,35 @@
from ayta.nosql import Mango
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
#import ayta
#app = ayta.create_app()
mango = Mango('mongodb://root:example@192.168.66.140:27017')
pivot = '20220101'
data = mango.info_json.find({'_status': 'available', 'upload_date': {'$gt': pivot}}, {'upload_date': 1})
stat = {}
for x in data:
    day = x['upload_date']
    if day not in stat:
        stat[day] = 1
    else:
        stat[day] = stat[day] + 1
dates = list(stat.keys())
values = list(stat.values())
plt.figure(figsize=(16, 8)) # Set the figure size
plt.bar(dates, values) # Create the bar chart
# Rotate the x-axis labels so they do not overlap
plt.xticks(rotation=45, ha='right') # Rotate xticklabels by 45 degrees and align them to the right
plt.xlabel('Date') # Label for x-axis
plt.ylabel('Counts') # Label for y-axis
plt.title('Bar Graph of Counts by Date') # Title of the graph
# Display the graph
plt.show()

View File

@@ -3,9 +3,12 @@
flask
flask-caching
flask-limiter
flask-sqlalchemy
flask-migrate
pymongo
yt-dlp
gunicorn
celery
sqlalchemy
requests
pyjwt[crypto]