mirror of https://github.com/streamlink/streamlink
docs: add plugin.api.validate API guide
parent 256800bee4
commit 003debed0c
@ -16,6 +16,9 @@ Validation schemas
.
Ideally, we'd just run autodoc on the main module and configure the order of items. :(

Please see the :ref:`validation schema guides <api_guide/validate:Validation schemas>`
for an introduction to this API and a list of examples.

.. autoclass:: streamlink.plugin.api.validate.Schema
    :members:
    :undoc-members:

@ -5,3 +5,4 @@ API Guide
    :maxdepth: 2

    api_guide/quickstart
    api_guide/validate

@ -0,0 +1,335 @@
Validation schemas
==================

.. currentmodule:: streamlink.plugin.api.validate

Introduction
------------

The :ref:`streamlink.plugin.api.validate <api/validate:Validation schemas>` module provides an API for defining declarative
validation schemas which are used to verify and extract data from various inputs, for example HTTP responses.

Validation schemas are a powerful tool for :ref:`plugin <api/plugin:Plugin>` implementors to find and extract data like
stream URLs, stream metadata and more from websites and web APIs.

Instead of verifying and extracting data programmatically and having to perform error handling manually,
declarative validation schemas allow defining comprehensive validation and extraction rules which are easy to understand
and which raise errors with meaningful messages upon extraction failure.


Examples
--------

Simple schemas
^^^^^^^^^^^^^^

Let's begin with a few simple validation schemas which are not particularly useful yet.

.. code-block:: pycon

    >>> from streamlink.plugin.api import validate

    >>> schema_one = validate.Schema("123")
    >>> schema_two = validate.Schema(123)
    >>> schema_three = validate.Schema(int, 123.0)

    >>> schema_one.validate("123")
    '123'
    >>> schema_two.validate(123)
    123
    >>> schema_three.validate(123)
    123

First, three :class:`Schema` instances are created, ``schema_one``, ``schema_two`` and ``schema_three``.

The :class:`Schema` class is the main schema validation interface and the outer wrapper for all schema definitions.
It is a subclass of :class:`validate.all <all>` which additionally implements the :meth:`Schema.validate()` method.
This interface is expected by various Streamlink methods and functions when passing the ``schema`` argument/keyword,
for example to the :class:`HTTPSession <streamlink.session.Streamlink.http>` methods or :mod:`streamlink.utils.parse` functions.

The :class:`validate.all <all>` class takes a sequence of schema object arguments and validates each one in order.
All schema objects in this schema container must be valid.

Schema objects can be anything, and depending on their type, different validations will be applied. In our example, both
``schema_one`` and ``schema_two`` contain only one schema object, namely ``"123"`` and ``123`` respectively, whereas
``schema_three`` contains two schema objects, ``int`` and ``123.0``. This means that the first two schemas validate
only one condition, while the third one validates two, first ``int``, then ``123.0``.

As you've probably already noticed, validation schemas also have a return value for their extraction purpose, but this isn't
very interesting in this example.

The ``"123"``, ``123`` and ``123.0`` schemas are simple :func:`equality validations <validate>`. This is the case for
all basic objects, and all they do is validate and return the input value again. ``int`` however is a ``type`` object,
and thus a :func:`type validation <_validate_type>`, which checks whether the input is an instance of the schema object
and then also returns the input value again. Since ``123`` is an ``int``, the schema is valid for that input.
``schema_three`` however hasn't finished validating yet at this point, as it defines two validation schemas in total.
This means that the return value of the ``int`` validation gets passed to the ``123.0`` schema validation, and as expected
when checking ``123 == 123.0``, despite the input and the schema being of different types, namely ``int`` and ``float``,
the validation succeeds and returns its input value again, causing the return value of the whole
``schema_three`` to be ``123``.
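
The chaining described above can be pictured with a small toy model. This is only a sketch that uses the standard library and
hypothetical helper names (``toy_validate``, ``toy_all``). It is not Streamlink's implementation, and it raises a plain
``ValueError`` instead of Streamlink's own exceptions.

```python
# Toy model only: NOT Streamlink's implementation, just the chaining idea.
def toy_validate(schema, value):
    """Validate one schema object against a value and return the value."""
    if isinstance(schema, type):
        # Type validation: the input must be an instance of the schema.
        if not isinstance(value, schema):
            raise ValueError(
                f"Type of {value!r} should be {schema.__name__}, "
                f"but is {type(value).__name__}"
            )
        return value
    # Equality validation for all other basic objects.
    if value != schema:
        raise ValueError(f"{value!r} does not equal {schema!r}")
    return value


def toy_all(*schemas):
    """Chain schemas: each output becomes the input of the next schema."""
    def run(value):
        for schema in schemas:
            value = toy_validate(schema, value)
        return value
    return run


toy_schema_three = toy_all(int, 123.0)
result = toy_schema_three(123)  # the int check passes, then 123 == 123.0 passes
```

In this sketch, ``result`` is still the ``int`` that was passed in, because both steps hand their input back unchanged.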

Now let's have a look at validation errors.

.. code-block:: pycon

    >>> schema_one.validate(123)
    streamlink.exceptions.PluginError: Unable to validate result: ValidationError(equality):
      123 does not equal '123'

    >>> schema_three.validate(123.0)
    streamlink.exceptions.PluginError: Unable to validate result: ValidationError(type):
      Type of 123.0 should be int, but is float

The first :meth:`Schema.validate()` call passes ``123`` to ``schema_one``. ``schema_one`` however expects ``"123"``, so
a :class:`ValidationError <_exception.ValidationError>` is raised because the input value is not equal to the schema.
:meth:`Schema.validate()` catches the error and wraps it in a :class:`PluginError <streamlink.exceptions.PluginError>`
with a specific validation message.

The second validation also fails, but here, it's because of the input type. The first sub-schema explicitly checks for
the type ``int``, and despite the following schema being ``123.0``, which is a ``float`` object that would obviously validate
a ``123.0`` ``float`` input when comparing equality, a :class:`ValidationError <_exception.ValidationError>` is raised.

Extracting JSON data
^^^^^^^^^^^^^^^^^^^^

The next example shows how to read an optional integer value from JSON data.

.. code-block:: pycon

    >>> from streamlink.plugin.api import validate

    >>> json_schema = validate.Schema(
    ...     str,
    ...     validate.parse_json(),
    ...     {
    ...         "status": validate.any(None, int),
    ...     },
    ...     validate.get("status"),
    ... )

    >>> json_schema.validate("""{"status":null}""")
    None
    >>> json_schema.validate("""{"status":123}""")
    123

    >>> json_schema.validate("""Not JSON""")
    streamlink.exceptions.PluginError: Unable to validate result: ValidationError:
      Unable to parse JSON: Expecting value: line 1 column 1 (char 0) ('Not JSON')

    >>> json_schema.validate("""{"status":"unknown"}""")
    streamlink.exceptions.PluginError: Unable to validate result: ValidationError(dict):
      Unable to validate value of key 'status'
      Context(AnySchema):
        ValidationError(equality):
          'unknown' does not equal None
        ValidationError(type):
          Type of 'unknown' should be int, but is str

Once again, we start with a new :class:`Schema` object which gets assigned to ``json_schema``. This schema collection validates
four schemas in total. Each of them must be valid, with each output being the input of the next one.

Since our goal is to parse JSON data and extract data from it, we should only accept string inputs, so we set
``str`` as the first schema in this :class:`validate.all <all>` schema collection.

Next is the :func:`validate.parse_json() <parse_json>` validation, a call of a utility function which returns
a :class:`validate.transform <transform>` schema object that does exactly what its name suggests: it takes an input and returns
something else. In this case, strings are the input and a parsed JSON object is the output, assuming that the input
is indeed valid JSON data.

Now we validate the parsed JSON object. We expect the JSON data to be a JSON ``object``, so we let the next validation schema
be a :func:`dict validation <_validate_dict>`. :class:`dict` validation schemas define a set of key-value pairs which
must exist in the input, unless keys are set as optional using :class:`validate.optional <optional>`.
For the sake of simplicity, this isn't the case in this example just yet. Each value of the key-value pairs is
a validation schema of its own, against which the corresponding input value is validated.

Here, the ``"status"`` key has a :class:`validate.any <any>` validation schema, which is also a schema collection, similar to
:class:`validate.all <all>`, but :class:`validate.any <any>` requires at least one sub-schema to be valid, not all.
Each sub-schema receives the same input, and the output of the overall schema collection is the output of the first sub-schema
that's valid. For our example, this means that the value of the ``status`` key in the JSON data must either be
``None`` (``null``) or an ``int``.
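
The "first valid sub-schema wins" rule can be sketched with a toy model. The helper names (``toy_check``, ``toy_any``) are
hypothetical and the code is not Streamlink's implementation. It only mirrors the behaviour described above, using a plain
``ValueError`` for failures.

```python
# Toy model only: NOT Streamlink's implementation.
def toy_check(schema, value):
    """Type validation for type objects, equality validation otherwise."""
    if isinstance(schema, type):
        ok = isinstance(value, schema)
    else:
        ok = value == schema
    if not ok:
        raise ValueError(f"{value!r} does not match {schema!r}")
    return value


def toy_any(*schemas):
    """Return the output of the first sub-schema that accepts the input."""
    def run(value):
        errors = []
        for schema in schemas:
            try:
                # Every sub-schema receives the same, unmodified input.
                return toy_check(schema, value)
            except ValueError as err:
                errors.append(err)
        # No sub-schema matched: report all collected failures together.
        raise ValueError("; ".join(str(err) for err in errors))
    return run


status = toy_any(None, int)
```

Calling ``status(None)`` or ``status(123)`` returns the input, whereas ``status("unknown")`` fails with both sub-schema
errors collected, which is loosely analogous to the error stack shown above.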

If any of the schemas in a nested schema definition like that fails, then a validation error stack will be generated
by :class:`ValidationError <_exception.ValidationError>`, as shown above.

The last of the four schemas in the outer :class:`validate.all <all>` schema collection is a :class:`validate.get <get>` schema.
This schema works on any kind of input which implements :func:`__getitem__()`, for example :class:`dict` objects.
And as expected, it attempts to get and return the ``"status"`` key of the output of the previous :class:`dict` validation.
The :mod:`validation <streamlink.plugin.api.validate>` module also supports getting multiple values at once using
the :class:`validate.union <union>` or :class:`validate.union_get <union_get>` schemas, but this isn't relevant here.
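
For comparison, here is roughly what the four steps of ``json_schema`` look like when written by hand with only the standard
library. The function name ``extract_status`` and the error messages are made up for this sketch, and it raises ``ValueError``
rather than Streamlink's exceptions.

```python
import json


def extract_status(text):
    """Hand-written stand-in for the four steps of ``json_schema``."""
    # 1. str: only accept string inputs.
    if not isinstance(text, str):
        raise ValueError("input must be a string")
    # 2. parse_json(): turn the string into a parsed JSON object.
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(f"Unable to parse JSON: {err}") from err
    # 3. dict validation: a JSON object with a required "status" key
    #    whose value is either None or an int.
    if not isinstance(data, dict) or "status" not in data:
        raise ValueError("expected a JSON object with a 'status' key")
    status = data["status"]
    if status is not None and not isinstance(status, int):
        raise ValueError(f"unexpected status: {status!r}")
    # 4. get("status"): return the extracted value.
    return status
```

Every step needs its own check and its own error message here, which is the manual error handling that the declarative
schema avoids.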

Finding stream URLs in HTML
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Let's imagine a simple website where a stream URL is embedded as JSON data in a ``data-player`` attribute of an unknown
HTML element, from which the website's web player reads its data.

Extracting this data could be done by using regular expressions, but then we'd have to take HTML syntax into account,
as well as JSON syntax which should usually be HTML-encoded in that HTML element attribute. This would make writing
a regular expression even harder, apart from the fact that the JSON data structure could easily change at any time.

Therefore, it makes much more sense to parse the HTML data, query the resulting node tree using an XPath query
to get the attribute value, then parse the JSON data and finally find and validate the stream URL.

We also don't want to raise validation errors unnecessarily when the user inputs a URL where no video player was found,
so we can instead return an empty list of streams in our plugin implementation and let Streamlink's CLI exit gracefully.
Validation errors are only supposed to be raised when an actual error happened due to unexpected data,
not when streams are offline or inaccessible.

Thanks to validation schemas, we can do all this declaratively, without the mess of doing it programmatically.

.. code-block:: pycon

    >>> from streamlink.plugin.api import validate

    >>> schema = validate.Schema(
    ...     validate.parse_html(),
    ...     validate.xml_xpath_string(".//*[@data-player][1]/@data-player"),
    ...     validate.none_or_all(
    ...         validate.parse_json(),
    ...         {
    ...             validate.optional("url"): validate.url(
    ...                 path=validate.endswith(".m3u8"),
    ...             ),
    ...         },
    ...         validate.get("url"),
    ...     ),
    ... )

    >>> schema.validate("""
    ...     <!doctype html>
    ...     <section class="no-video-player"></section>
    ... """)
    None

    >>> schema.validate("""
    ...     <!doctype html>
    ...     <section
    ...         class="video-player"
    ...         data-player="{
    ...             &quot;title&quot;:&quot;Offline&quot;
    ...         }"
    ...     >
    ...         ...
    ...     </section>
    ... """)
    None

    >>> schema.validate("""
    ...     <!doctype html>
    ...     <section
    ...         class="video-player"
    ...         data-player="{
    ...             &quot;title&quot;:&quot;Live&quot;,
    ...             &quot;url&quot;:&quot;https://host/hls-playlist.m3u8&quot;
    ...         }"
    ...     >
    ...         ...
    ...     </section>
    ... """)
    'https://host/hls-playlist.m3u8'

    >>> schema.validate("""
    ...     <!doctype html>
    ...     <section
    ...         class="video-player"
    ...         data-player="{
    ...             &quot;title&quot;:&quot;Live&quot;,
    ...             &quot;url&quot;:&quot;https://host/dash-manifest.mpd&quot;
    ...         }"
    ...     >
    ...         ...
    ...     </section>
    ... """)
    streamlink.exceptions.PluginError: Unable to validate result: ValidationError(NoneOrAllSchema):
      ValidationError(dict):
        Unable to validate value of key 'url'
        Context(url):
          Unable to validate URL attribute 'path'
          Context(endswith):
            '/dash-manifest.mpd' does not end with '.m3u8'

We start with a new :class:`Schema` and begin by parsing HTML using the :func:`validate.parse_html() <parse_html>`
utility function. Similar to :func:`validate.parse_json() <parse_json>`, it returns a :class:`validate.transform <transform>`
schema. :func:`validate.parse_html() <parse_html>` however returns a parsed HTML node tree via Streamlink's
:ref:`lxml dependency <install:Dependencies>`.

This is followed by an XPath query schema using the :func:`validate.xml_xpath_string() <xml_xpath_string>` utility function.
:func:`validate.xml_xpath_string() <xml_xpath_string>` is a wrapper for :func:`validate.xml_xpath() <xml_xpath>` which always
returns a string or ``None``, depending on the query result. This is useful for querying text contents or single attribute
values, like in this case. XPath queries on their own always return a result set, i.e. possibly multiple values, so when
trying to find single values, it is important to limit the number of potential return values to only one in the XPath query.

The query here attempts to find any node with a ``data-player`` attribute. It then limits the result set to the first found
element and then reads the value of its ``data-player`` attribute. :func:`validate.xml_xpath_string() <xml_xpath_string>`
turns this into a single string return value, or ``None`` if no or an empty value was returned by the query.
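
To make the "first element with a ``data-player`` attribute, or ``None``" behaviour more concrete, here is a rough
standard-library sketch. It uses ``html.parser`` instead of lxml and XPath, so it is only an approximation of what the two
schemas above do together. The names ``FirstDataPlayer`` and ``find_player_data`` are hypothetical.

```python
from html.parser import HTMLParser


class FirstDataPlayer(HTMLParser):
    """Remember the data-player value of the first element that has one."""

    def __init__(self):
        super().__init__()
        self.found = False
        self.value = None

    def handle_starttag(self, tag, attrs):
        if self.found:
            return  # only the first matching element is of interest
        for name, value in attrs:
            if name == "data-player":
                self.found = True
                # Empty or missing values are treated like "no result".
                self.value = value or None
                return


def find_player_data(html):
    """Return the first data-player attribute value, or None."""
    parser = FirstDataPlayer()
    parser.feed(html)
    parser.close()
    return parser.value
```

``html.parser`` decodes character references such as ``&quot;`` in attribute values, so the returned string is plain JSON
text that can be handed to a JSON parser.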

Since we now have two different paths for our overall validation schema, either no player data or still unvalidated player data,
our next schema is a :class:`validate.none_or_all <none_or_all>` schema. This works similarly to :class:`validate.all <all>`,
except that ``None`` inputs are skipped and get returned immediately without validating any sub-schemas. This lets us handle
cases where no player was found on the website, without raising a :class:`ValidationError <_exception.ValidationError>`.
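
The short-circuit on ``None`` can be sketched as follows. This is a toy model with a hypothetical helper name
(``toy_none_or_all``) and plain callables as steps, not Streamlink's implementation.

```python
import json


# Toy model only: NOT Streamlink's implementation.
def toy_none_or_all(*steps):
    """Return None untouched, otherwise run every step in order."""
    def run(value):
        if value is None:
            return None  # skip all steps when there is no input
        for step in steps:
            value = step(value)
        return value
    return run


# Parse JSON, then read the optional "url" key.
parse_player = toy_none_or_all(json.loads, lambda data: data.get("url"))
```

With this sketch, ``parse_player(None)`` returns ``None`` without ever calling ``json.loads``, while a JSON string is parsed
and its ``url`` value (or ``None`` if the key is absent) is returned.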

In the :class:`validate.none_or_all <none_or_all>` schema, we now attempt to parse JSON data, which was already shown
previously, except for the fact that we don't need to validate the ``str`` input here, as the XPath query must have already
returned a string value.

On to the :func:`dict validation <_validate_dict>`. We're only interested in the ``url`` key. Any other keys of the input
will get ignored. Since we're aware that ``url`` can be missing if the stream is offline, we mark it as optional using the
:class:`validate.optional <optional>` schema. This makes the :func:`dict validation <_validate_dict>` not raise an error
if it's missing, but if it's set, then its value must validate. As for the value, we want it to be a URL.

This is where the :func:`validate.url <url>` utility function comes in handy. It parses the input and lets us validate
any parts of the parsed URL with further validation schemas. The return value is always the full URL string. In our example,
we want to ensure that the URL's path ends with the ``".m3u8"`` string, which is an indicator for the stream being
an HLS stream, so we can pass the URL to Streamlink's :class:`HLS implementation <streamlink.stream.HLSStream>`.
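
The idea of validating a single part of a parsed URL can be illustrated with ``urllib.parse`` from the standard library.
This is only a sketch under the assumption that a URL needs a scheme and a host to count as valid. ``check_hls_url`` is a
hypothetical helper, not part of Streamlink.

```python
from urllib.parse import urlparse


def check_hls_url(url):
    """Stdlib sketch: accept URLs whose path ends with '.m3u8'."""
    parsed = urlparse(url)
    # Assumption for this sketch: a URL needs both a scheme and a host.
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"{url!r} is not a URL")
    # Only the path component is checked, not the whole string.
    if not parsed.path.endswith(".m3u8"):
        raise ValueError(f"{parsed.path!r} does not end with '.m3u8'")
    return url  # the full URL string is returned unchanged
```

Because only the path is inspected in this sketch, a URL such as ``https://host/hls-playlist.m3u8?token=abc`` is accepted
as well, which a plain string suffix check on the whole URL would reject.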

Lastly, we simply get the ``url`` key using :class:`validate.get <get>`. The return value must either be ``None`` if no ``url``
key was included in the JSON data, or a ``str`` containing a URL whose path ends with ``".m3u8"``.

This means that the overall schema can only return ``None`` or said kind of URL string. If the value of the ``url`` key is
not a URL, or if its path does not end with ``".m3u8"``, then a :class:`ValidationError <_exception.ValidationError>` is raised,
which is what we want. The ``None`` return value should then be checked accordingly by the plugin implementation.

Validating HTTP responses
^^^^^^^^^^^^^^^^^^^^^^^^^

In order to validate HTTP responses directly, Streamlink's :class:`HTTPSession <streamlink.session.Streamlink.http>` allows
setting the ``schema`` keyword in :meth:`HTTPSession.request() <streamlink.session.Streamlink.http.request>`,
as well as in each HTTP-verb method like ``get()``, ``post()``, etc.

Here's a simple plugin implementation with the same schema from the `Finding stream URLs in HTML`_ example above.

.. code-block:: python
    :caption: example-plugin.py
    :name: example-plugin

    import re

    from streamlink.plugin import Plugin, pluginmatcher
    from streamlink.plugin.api import validate
    from streamlink.stream.hls import HLSStream


    @pluginmatcher(re.compile(r"https://example\.tld/"))
    class ExamplePlugin(Plugin):
        def _get_streams(self):
            # Fetch the page and validate the response with the schema in one step
            hls_url = self.session.http.get(self.url, schema=validate.Schema(
                validate.parse_html(),
                validate.xml_xpath_string(".//*[@data-player][1]/@data-player"),
                validate.none_or_all(
                    validate.parse_json(),
                    {
                        validate.optional("url"): validate.url(
                            path=validate.endswith(".m3u8"),
                        ),
                    },
                    validate.get("url"),
                ),
            ))

            # No player data or no URL found: return no streams
            if not hls_url:
                return None

            return HLSStream.parse_variant_playlist(self.session, hls_url)


    __plugin__ = ExamplePlugin