Generic description of pixel formats is hard. In this case, the Apple
special format for packed YUV could have been interpreted as a RGB
format with funny packing.
Move multiple GL-specific things from the renderer to other places like
vo_opengl.c, vo_opengl_cb.c, and ra_gl.c.
The vp_w/vp_h parameters to gl_video_resize() make no sense anymore, and
are implicitly part of struct fbodst.
Checking the main framebuffer depth is moved to vo_opengl.c. For
vo_opengl_cb.c it always assumes 8. The API user now has to override
this manually. The previous heuristic didn't make much sense anyway.
The only remaining dependency on GL is the hwdec stuff, which is harder
to change.
Completely unnecessary, we can just update the uniforms immediately
after creating the program. In theory, for GLSL 4.20+, we could even
skip this, but oh well.
The vp_w/vp_h variables and parameters were not really used anymore
(they were redundant with ra_tex w/h) - but vp_h was still used to
identify whether rendering should be done mirrored.
Simplify this by adding a fbodst struct (some bad naming), which
contains the render target texture, and some parameters how it should be
rendered to (for now only flipping). It would not be appropriate to make
this a member of ra_tex, so it's a separate struct.
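Roughly, the new struct amounts to something like this (a sketch; the
actual field names may differ):

    #include <stdbool.h>

    struct ra_tex;              // forward declaration of the ra texture type

    struct fbodst {
        struct ra_tex *tex;     // color target to render into
        bool flip;              // whether to render vertically mirrored
    };
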
Introduces a weird regression for the first frame rendered after
interpolation is toggled at runtime, but seems to work otherwise. This
is possibly due to the change that blit() now mirrors, instead of just
copying. (This is also why ra_fns.blit is changed.)
Fixes #4719.
When using dumb mode, we can actually redraw a frame without uploading
it. Marking this as fresh as well results in unpredictable pass
behavior, which is confusing and makes debugging harder. So mark it as a
redraw instead, in that case.
Since the GL *gl is no longer needed for the timers, we can get rid of
the sc->gl dependency. This requires moving a utility function (which is
not GL-specific anyway) out of gl_utils.h and into utils.h
In the past, this always measured the per-shader execution times of the
individual OSD parts, which was thrown off because the shader was reused
anyway. (And apparently recording the OSD shader execution times was
removed completely, probably because they were so unreliable anyway.)
Since ra_timer no longer has the restriction of not allowing timers to
run concurrently, we can just wrap the entire OSD block inside a single
osd_timer now, and record that. (Technically, this can still be off when
using --blend-subtitles=video/yes and showing a full-screen OSD at the
same time. Maybe this can be done better?)
In order to prevent code duplication and keep the ra abstraction as
small as possible, `ra` only implements the actual timer queries,
it does not do pooling/averaging of the results. This is instead moved
to a ra-neutral struct timer_pool in utils.c.
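As a rough sketch of the intended split, the pooling layer might look
like this (names, fields and sizes are assumptions, not the exact
utils.c code):

    #include <stdint.h>

    struct ra;

    #define TIMER_SAMPLES 256            // assumed ring buffer size

    struct timer_pool {
        struct ra *ra;
        void *ra_timer;                  // opaque timer handle from the ra backend
        uint64_t samples[TIMER_SAMPLES]; // past results in nanoseconds
        int idx;
    };

    // Called with each result a finished ra timer query delivers; averages and
    // peaks are then computed over the ring buffer, outside of ra.
    static void timer_pool_record(struct timer_pool *pool, uint64_t result_ns)
    {
        pool->samples[pool->idx] = result_ns;
        pool->idx = (pool->idx + 1) % TIMER_SAMPLES;
    }
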
This code is pretty much for the sake of vo_opengl_cb API users. It
resets certain state that either the user or our code doesn't reset
correctly. This is somewhat outdated. With GL implicit state being
so awfully large, it seems more reasonable to require that any code
restores the default state when returning to the caller. Some
exceptions are defined in opengl_cb.h.
Now all GL-specifics of shader compilation are abstracted through ra.
Of course we still have everything hardcoded to GLSL - that isn't going
to change.
Some things will probably change later - in particular, the way we pass
uniforms and textures to the shader. Currently, there is a confusing
mismatch between "primitive" uniforms like floats, and others like
textures.
Also, SSBOs are not abstracted yet.
Instead of having a mutable ra_tex field (and the only one), move the
flag to struct ra, since we have only 2 tex_upload user calls anyway,
and both want the same PBO behavior. (At first I considered making it
a RA_TEX_UPLOAD_ flag, but why bother. PBOs are a terribly GL-specific
thing, so we can't expect a reasonable abstraction of it anyway.)
This requires a silly extension to ra_fns.tex_upload: since the OSD
texture can be much larger than the actual OSD image data to upload, a
mechanism for uploading only to a small part of the texture is needed.
Otherwise, we'd have to realloc/copy the data, just to pad it, and then
pay for uploading the padding too.
The RA_TEX_UPLOAD_DISCARD flag is not interpreted by GL (not sure how
you'd tell GL about this), but it clarifies the API and might be
helpful if we support other backend APIs in the future.
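A sketch of roughly how the extended upload entry point could look; the
parameter list is illustrative, not the actual ra_fns definition:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    struct ra;
    struct ra_tex;
    struct mp_rect { int x0, y0, x1, y1; };   // stand-in for mpv's mp_rect

    #define RA_TEX_UPLOAD_DISCARD (1u << 0)   // previous contents may be thrown away

    // rc selects the destination region; NULL means the whole texture.
    typedef bool (*ra_tex_upload_fn)(struct ra *ra, struct ra_tex *tex,
                                     const void *src, ptrdiff_t stride,
                                     struct mp_rect *rc, uint64_t flags);
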
Actually GL-specific parts go into gl_utils.c/h, the shader cache
(gl_sc*) into shader_cache.c/h.
No semantic changes of any kind, except that the VAO helper is made
public again as part of gl_utils.c (all while the goal for gl_utils.c
itself is to be included by GL-specific code).
Another "small" step towards removing GL dependencies from the renderer.
This commit generally passes ra_tex objects instead of GL FBO integer
IDs to various rendering functions. video.c still manually binds the
FBOs when calling shaders.
This also happens to fix a memory leak with output_fbo.
Further work removing GL dependencies from the actual video renderer,
and moving them into ra backends.
Use of glInvalidateFramebuffer() falls away. I'd like to keep this, but
it's better to readd it once shader runs are in ra.
This currently only works when using lcms-based color management
(--icc-profile-*).
In principle, we could also support using lcms even when the user has
not specified an ICC profile, by generating the profile against a fixed
reference (--target-prim/--target-trc) instead. I still might do that
some day, simply because 3dlut provides a higher quality conversion than
our simple gamut mapping does for stuff like BT.2020, and also because
it's now needed to enable embedded ICC profiles. But that would be a
separate change, so preserve the status quo for now.
(Besides, my opinion is still that you should be using an ICC profile if
you care about colors being accurate _at all_)
mesa won't pick client storage unless this bit is set, and we
*absolutely* want to be using client storage for our DR PBOs.
Performance is shit on AMD otherwise. (Nvidia always uses client storage
for persistent coherent buffers whether you tell it to or not, probably
because it's way faster and nvidia doesn't trust users to figure that
out on their own)
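In raw GL terms, the allocation boils down to something like this (a
simplified sketch - the exact flag set and error handling in the real
code differ, and GL_ARB_buffer_storage / GL 4.4 is required):

    // pbo: a generated buffer object name; buffer_size: total size of the DR pool.
    static void *alloc_dr_buffer(GLuint pbo, GLsizeiptr buffer_size)
    {
        GLbitfield flags = GL_MAP_READ_BIT | GL_MAP_WRITE_BIT |
                           GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT |
                           GL_CLIENT_STORAGE_BIT;      // the bit this commit adds
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
        glBufferStorage(GL_PIXEL_UNPACK_BUFFER, buffer_size, NULL, flags);
        return glMapBufferRange(GL_PIXEL_UNPACK_BUFFER, 0, buffer_size,
                                GL_MAP_READ_BIT | GL_MAP_WRITE_BIT |
                                GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT);
    }
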
It makes no sense to have this on an already created buffer.
If anything, the ra backend would have to export this as a global value
(e.g. struct ra field), so that whatever allocates the buffer can
account for the required alignment. Since this code is in vo_opengl.c in
the first place, and since GL doesn't dictate any special alignment
here, it doesn't make sense in the first place to export this. (Maybe
something like this will be required later.)
Breaks on mesa for whatever reason... even though it doesn't generate a
GLSL shader compiler error
Shouldn't make a performance difference for us because we cache `pos`
anyway, and most compute shaders will probably cache all of their
samples to shmem. Might have to re-visit this when we have an actual use
case for repeated sampling inside CS though. (RAVU + anti-ringing is a
possible candidate for that)
Or less appropriate, as some would argue. The new name is short for
"Apple YUV packed".
(This format is needed only for hardware decoding on rather old Apple
hardware, and a very annoying special case.)
This broke float textures, which were actually used by some shaders.
There were probably some other bugs as well.
Lots of code can be avoided by using ra_tex_params directly, so do that.
The main change is that COMPONENT/FORMAT are replaced by a single FORMAT
directive, which takes different parameters now. Due to the mess with
16/32 bit float textures, and because we want to support other APIs than
just GL in the future, it's not really clear how this should be handled,
and the nice component/type separation makes things actually harder. So
just jump the gun and use the ra_format.name names, which were
originally meant mostly for debugging. (This is probably something that
will be regretted later.)
Still only superficially tested, but seems to work.
Fixes #4708.
Since this code was already written for HDR, and is now per-channel
(because it works better for HDR as well), we can actually reuse this to
get very high quality gamut mapping without clipping. The only required
change is to move the tone mapping from before the gamut map to after
the gamut map. Additionally, we need to also account for changes in the
signal range as a result of applying the CMS when we compute ref_peak,
which is fortunately pretty easy because we only need to consider the
case of primaries mapping to themselves.
Since `HDR` no longer really makes sense as a label, rename it to
`--tone-mapping` in general. Also fits better with
`--tone-mapping-desat` etc.
Arguably we could also rename `--hdr-compute-peak`, but that option is
basically only useful for HDR content anyway because we don't need
information about the signal range for gamut mapping.
This (finally!) gives us reasonably high quality gamut mapping even in
the absence of an ICC profile / 3DLUT.
Huge thanks to @rusxg for finding this solution, which was previously
believed not to exist. Of course, we still don't actually need it, but I
don't want to leave this half-implemented in case somebody does in the
future.
So far, switching between integrated and discrete GPU would cause the
kernel to kill mpv due to an indecipherable buffer error. The technical
note TN2229 from Apple recommends enabling OpenGL Offline Renderers for
every Mac with more GPUs than displays to handle the switch between GPUs.
By ordering the array from the least commonly rejected to the most,
we can sequentially remove PixelFormat attributes to fit the host.
Fixes #2371
Also add some more helpers.
Fix the broken math.h include statement.
utils.c uses ra_gl.h internals, which it shouldn't, and which will be
removed again as soon as this code gets converted to ra fully.
The dither texture data is created as a float array, but uploaded to a
texture with GL_R16 as internal format. We relied on GL to do the
conversion from float to uint16_t. Not all GL variants even support
this: GLES does not provide this conversion (one of the reasons why this
code has a float16 code path). Also, ra is not going to do this. So just
convert on the fly.
Still keep the float16 texture format fallback, because not all GLES
implementations provide GL_R16.
There is some possibility that we'll need to provide some kind of upload
conversion anyway for float->float16. We still rely on GL doing this
implicitly, and all GL variants support it, but with RA there might be
the need for explicit conversion. Even then, it might be best to reduce
the number of conversion cases. I'll worry about this later.
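Conceptually the on-the-fly conversion is nothing more than this (a
simplified sketch, not the literal dither setup code):

    #include <stddef.h>
    #include <stdint.h>

    // Dither matrix values are in [0, 1); scale them to the full uint16_t range.
    static void dither_float_to_u16(uint16_t *dst, const float *src, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = (uint16_t)(src[i] * UINT16_MAX);
    }
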
Format handling via ra_* was added earlier, but the format negotiation
part was forgotten.
Actually move some aspects of it to ra_get_imgfmt_desc(). Also make sure
the unorm and float formats selected by the common format lookup
functions are linear filterable. (For OpenGL, this is implicitly
guaranteed, so it wasn't done before.) Whether these assumptions should
be checked/enforced in the ra code at all is a bit fuzzy, but with ra
being helper code only for the actual video renderer, it's probably
justified.
Parsing the texture data as raw strings makes the textures the most
portable and self-contained. In order to facilitate different types of
shaders, the parse_user_shader interaction has been changed to instead
have it loop through blocks and call the passed functions for each valid
block parsed. This is more modular and also cleaner, with better code
separation.
Closes #4586.
- Each struct tex_hook now stores multiple hooks, this allows us to
avoid the awkward way the current code has to add the same pass
multiple times.
- As a consequence, SHADER_MAX_HOOKS was split up into SHADER_MAX_PASSES
(number of tex_hooks) and SHADER_MAX_HOOKS (number of hooked textures
per tex_hook), and both numbers decreased correspondingly.
- Instead of having a weird free() callback, we can just leverage
talloc's recursive free behavior. The only user is the user shaders code
anyway.
This actually makes sure we don't decolor due to clipping even when the
signal itself exceeds the luma by a significant factor, which was pretty
common for saturated blues (and to a lesser degree, reds) - most
noticeable in skies etc.
This prevents the turn-the-sky-cyan effect of mobius tone mapping, and
should also improve the other tone mapping modes in quality.
This starts work on moving OpenGL-specific code out of the general
renderer code, so that we can support other GPU APIs. This is in
a very early stage and it's only a proof of concept. It's unknown
whether this will succeed or result in other backends.
For now, the GL rendering API ("ra") and its only provider (ra_gl) does
texture creation/upload/destruction only. And it's used for the main
video texture only. All other code is still hardcoded to GL.
There is some duplication with ra_format and gl_format handling. In the
end, only the ra variants will be needed (plus the gl_format table of
course). For now, this is simpler, because for some reason lots of hwdec
code still requires the GL variants, and would have to be updated to
use the ra ones.
Currently, the video.c code accesses private ra_gl fields. In the end,
it should not do that of course, and it would not include ra_gl.h.
Probably adds bugs, but you can keep them.
The radius check was not strict enough, especially not for all
platforms. To fix this, actually check the hardware capabilities instead
of relying on a hard-coded maximum radius.
The textures not having an FBO actually caused regressions when trying
to render the subtitles on top of this texture (--blend-subtitles),
which still relied on an FBO.
So just kill off the logic entirely. Why worry about a single FBO wasted
when we're allocating like 10 anyway.
Fixes #4657.
According to the OpenGL spec, atomic access to SSBO variables is *not*
guaranteed to be coherent, even when reusing the same SSBO attached to
the same shader across different frames. So we actually need a
glMemoryBarrier here, at least in theory.
This bug slipped past my attention because nvidia ignores memory
barriers, but this is not necessarily always the case. Since
image_load_store is incoherent (specifically, writing to images from
compute shaders is incoherent) we need to insert a memory barrier to
make it coherent again. Since we only care about texture fetches, that's
the only barrier we need.
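In GL terms, the two barriers in question look roughly like this as they
would appear in the render loop (illustrative; exact placement differs):

    // After a compute pass that updated the peak detection SSBO, before the
    // next pass reads it:
    glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);

    // After a compute pass that stored to an image, before sampling the result
    // as a texture:
    glMemoryBarrier(GL_TEXTURE_FETCH_BARRIER_BIT);
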
Two changes, compounded into one since they affect the same logic:
1. Never use linearization for HDR downscaling
2. Always use linearization for interpolation
Instead of fixing p->use_linear at the beginning of pass_render_frame,
we flip it on "dynamically" as needed. I plan on killing this
p->use_linear field (along with other per-pass metadata) and moving them
into their own struct for tracking the "current" state of the video, but
that's a separate/upcoming refactor.
As a small bonus, reduce some code duplication in the interpolation
logic.
Fixes #4631
Mesa 17.1 supports compute shaders, but not the full OpenGL 4.3 spec.
Change the code to detect the OpenGL extension "GL_ARB_compute_shader"
rather than OpenGL version 4.3.
HDR peak detection requires SSBOs, and the polar scaler requires the 2D
array extension. Add these extensions as requirements as well.
This performs almost 50% faster on my machine (!!), from 4650μs down to
about 3176μs for ewa_lanczossharp.
It's possible we could use a similar approach to speed up the separable
scalers, although with vastly simpler code. For separable scalers we'd
also have the additional huge benefit of only needing padding in one
direction, so we could potentially use a big 256x1 kernel or something
to essentially compute an entire row at once.
This is done via compute shaders. As a consequence, the tone mapping
algorithms had to be rewritten to compute their known constants in GLSL
(ahead of time), instead of doing it once. Didn't affect performance.
Using shmem/SSBO atomics in this way is extremely fast on nvidia, but it
might be slow on other platforms. Needs testing.
Unfortunately, setting up the SSBO still requires OpenGL calls, which
means I can't have it in video_shaders.c, where it belongs. But I'll
defer worrying about that until the backend refactor, since then I'll be
breaking up the video/video_shaders structure anyway.
These can either be invoked as dispatch_compute to do a single
computation, or finish_pass_fbo (after setting compute_size_minimum) to
render to a new texture using a compute shader. To make this stuff all
work transparently, we try really, really hard to make compute shaders
as identical to fragment shaders as possible in their behavior.
Don't use FBOTEX_FUZZY where the FBO is sized according to
p->texture_w/h, since this changes infrequently (and when it does, we
need to reset everything anyway). No real reason to make this change
other than that it possibly prevents nasty surprises in the future, so I
feel more comfortable about it.
Seems like I really like this C99 idiom. No reason not to generalize it
to snprintf(). Introduce mp_tprintf(), which basically applies this idiom
to snprintf(). This macro looks like it returns a string that was allocated
with alloca() on the caller site, except it's portable C99/C11. (And
unlike alloca(), the result is valid only within block scope.)
Use it in 2 places in the vo_opengl code. But it has the potential to
make a whole bunch of weird looking code look slightly nicer.
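The idiom looks roughly like this (a sketch of the idea; the exact
definition in mpv may differ slightly):

    #include <stdarg.h>
    #include <stddef.h>
    #include <stdio.h>

    static char *mp_tprintf_buf(char *buf, size_t size, const char *fmt, ...)
    {
        va_list ap;
        va_start(ap, fmt);
        vsnprintf(buf, size, fmt, ap);
        va_end(ap);
        return buf;
    }

    // sz must be a compile-time constant. The compound literal char[sz] lives
    // until the end of the enclosing block, which is what makes the result
    // look alloca()-like, but block-scoped.
    #define mp_tprintf(sz, ...) mp_tprintf_buf((char[sz]){0}, (sz), __VA_ARGS__)

Usage is then e.g. printf("%s\n", mp_tprintf(80, "pass %d", n)) wherever
a short temporary string is needed.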
Can be enabled via --vd-lavc-dr=yes. See manpage additions for what it
does.
This reminds of the MPlayer -dr flag, but the implementation is
completely different. It's the same basic concept: letting the decoder
render into a GPU buffer to avoid a copy. Unlike MPlayer, this doesn't
try to go through filters (libavfilter doesn't support this anyway).
Unless a filter can work in-place, DR will be silently disabled. MPlayer
had very complex semantics about buffer types and management (which
apparently nobody ever understood) and weird restrictions that mostly
limited it to mpeg2 style codecs. The mpv code does not do any of this,
and just lets the decoder allocate an arbitrary number of untyped
images. (No MPlayer code was used.)
Parts of the code based on work by atomnuker (starting point for the
generic code) and haasn (some GL definitions, some basic PBO code, and
correct fencing).
In addition to using the new VAO mechanism introduced in the previous
commit, this tries to keep the OSD code self-contained. This doesn't
work all too well (because of the pass and CMS stuff), but it's still
better than before.
This removes VAO handling from video.c. Instead the shader cache will
create the VAO as needed. The consequence is that this creates a VAO
per shader, which might be a bit wasteful, but doesn't matter anyway.
Reduce this to 1 draw call per OSD pass. This removes the need for some
annoying special handling regarding 3D video support (we supported
duplicating the OSD/subtitles for side-by-side 3D output etc.).
Remove the unneeded texture sampler uniform thing.
These are apparently expensive on some drivers which are not smart
enough to turn x/42 into x*1.0/42. So, do it for them.
My great test framework says it's okay
Performance seems pretty much unchanged but I no longer get nasty spikes
on NUMA systems, probably because glBufferSubData runs in the driver or
something.
As a simplification of the code, we also just size the PBO to always
have the full size, even for cropped textures. This seems slower but not
by relevant amounts, and only affects e.g. --vf=crop. It also slightly
increases VRAM usage for textures with big strides.
This new code path is especially nice because it no longer depends on
GL_ARB_map_buffer_range, and no longer uses any functions that can
possibly fail, thus simplifying control flow and seemingly deprecating
the manpage's claim about possible image corruption.
In theory we could also reduce NUM_PBO_BUFFERS since it doesn't seem
like we're streaming uploads anyway, but leave it in there just in
case some drivers disagree...
STREAM is better than DYNAMIC because we're only using it once per
frame. As for COPY vs DRAW, that was pretty much incorrect to begin with
- but surprisingly, COPY is actually faster (sometimes significantly so,
e.g. on my NUMA system).
After testing, the best I can gather is that it has to do with the fact
that COPY requires fewer redundant memcpy()s, and also reduces RAM
bandwidth by 3x (in theory).
Anyway, that bit shouldn't introduce any regressions, it's just a
documentation update. Maybe I'll change my mind about the comment again
in the future, it's really hard to tell. Vulkan, please save us!
Instead of allocating three PBOs and cycling through them, we allocate
one PBO that's three times as large, and cycle through the subregion
offsets.
This results in arguably simpler code and faster initialization
performance. Especially for 4K textures, initializing PBOs can take
quite some time (e.g. 180ms -> 110ms). For 1080p, it's more like 66ms ->
52ms for me.
The alignment to 4096 is completely unnecessary by spec, but we do it
anyway just for peace of mind.
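Illustrative sketch of the resulting layout (not the literal code; the
variable names are placeholders):

    #define NUM_PBO_SLICES 3

    // One buffer holds all slices; each slice is padded to a 4096-byte boundary.
    size_t slice  = (frame_size + 4095) & ~(size_t)4095;
    size_t offset = (frame_index % NUM_PBO_SLICES) * slice;

    // The plane data for this frame is written into the buffer at 'offset'
    // beforehand, and the texture upload then reads from that offset of the
    // bound PBO:
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);       // allocated once, 3 * slice bytes
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h, format, type,
                    (const void *)offset);
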
This is unnecessary to call from gl_video_resize, because the hooks only
(possibly) change when the actual vo_opengl options change. This used to
be required back when mpv still had prescaling built in, but since that
was all moved to user shaders and the code removed, this is a left-over
artifact.
The renderer code doesn't list a fixed set of supported formats, but
supports anything that is described by AVPixFmtDescriptor and follows a
number of constraints.
Plane order is not included in those constraints. This means the planes
could be in random order, rather than what the vo_opengl renderer
happens to assume. For example, it assumes that the 4th plane is alpha,
even though alpha could be on any plane. Likewise it assumes that plane
0 was always luma, and planes 2/3 chroma. (In earlier iterations of
format support, this was guaranteed by MP_IMGFLAG_YUV_P, but this is not
used anymore.)
Explicitly set the plane semantics (enum plane_type) by the component
descriptors and the used colorspace. The behavior should mostly not
change, but it's less likely to break when FFmpeg adds new pixel
formats.
With some newer ANGLE builds, mapping can fail with "Failed to create
EGL surface" during playback. The reason is unknown, and it might just
be an ANGLE bug. Probe whether it works at init time to avoid the
problem.
In commit 6eb0bbe this was changed from xs[n] to use gl_format.chroma_w
indiscriminately, which broke chroma rendering when zooming/cropping.
The solution is to only use chroma_w for chroma planes.
Fixes #4592.
On optional hook points, we store to a temp FBO and then read from it
again to complete any operations that may still be left (e.g.
sigmoidization after MAIN/LINEAR).
In theory this mechanism should be reworked to avoid the temporary FBO
until the next time we actually need one - and also skip redundant
passes if the next thing we need *is* an FBO - but both of those are
tricky. Anyway, in the meantime, at least we can label the
(semi-)redundant passes that get generated when using user shaders.
This just indicates a fixed linear coefficient to multiply into the
signal, similar to the old option --target-brightness (but the inverse
thereof). Good for testing purposes, which is why I added it. (This also
corresponds somewhat to what zimg does)
It's now possible to request non-dumb mode as a user, even when not
using any non-dumb features. This change is mostly intended for testing,
so I can easily switch between dumb and non-dumb mode on default
settings. The default behavior is unaffected.
This backend is selected if vaapi is available, but vaapi-over-EGL is
not. This causes various issues around the forced RGB conversion, which
is done with fixed, usually incorrect parameters.
It seems the existing auto probing check is too weak, and doesn't really
prevent it from getting loaded. Fix this by adding a flag to not ever
load this during auto probing.
I'm still not deleting it, because it's useful for testing on nvidia
machines.
See #4555.
The current algorithm blew up when the color was negative, such as the
case when downscaling with dscale=mitchell or other algorithms that
introduce negative ringing. The simplest solution is to just slightly
change the calculation to force both parameters to be in-range.
This is exposed so that bjin/mpv-prescalers can use textureGatherOffset
for performance.
Since there are now quite a lot of parameters where it isn't quite clear
why they're all defined, add a paragraph to the man page that explains
them a bit.
This helps prevent unnaturally, weirdly colorized blown out highlights
for direct images of the sunlit sky and other way-too-bright HDR
content. I was debating whether to set the default at 1.0 or 2.0, but
went with the more conservative option that preserves more detail/color.
This logic doesn't really make sense. copy_img_tex already binds the
texture, so why would we bind it a second time? Furthermore, nothing
actually uses this return value. Must have been some left-over artifact
of a previous iteration of this function. Anyway, it's harmless, just
nonsensical. So remove it.
This is more efficient on my machine (nvidia), but only when applied to
groups of exactly 4 texels. So we switch to the more efficient
textureGather for groups of 4. Some notes:
- textureGatherOffset seems to be faster than textureGather by a
non-negligible amount, but for some reason, textureOffset is still
slower than a straight-up texture
- textureGather* requires GLSL 400; and at least on nvidia, this
requires actually allocating a GL 4.0 context.
- the code in opengl/common.c that clamped the GLSL version to 330 is
deprecated, because the old user shader style has been removed
completely in the meantime
- To combat the growing complexity of the polar sampling code, we drop
the antiringing functionality from EWA shaders completely, since it
never really worked well for EWA to begin with. (Horrific artifacting)
- change asserts to silent exits
- check all pointers before use
- move the p->pass initialization code to the right place
This should hopefully cut down on the amount of crashing by making the
code fundamentally more robust, while also fixing a concrete issue where
opengl-cb failed to initialize p->pass.
This allows filter functions to be prematurely cut off once their
contributions start becoming insignificant. This effectively prevents
wasted GPU time sampling from parts of the function that are essentially
reduced to zero by the window function, providing anywhere from a 10% to
20% speedup. (5700μs -> 4700μs for me)
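Conceptually, the cutoff shrinks the sampled radius to where the windowed
weight last exceeds some small threshold (a sketch of the idea, not the
actual implementation):

    #include <math.h>

    // weight(): the windowed filter function; radius: its nominal radius.
    static double radius_with_cutoff(double (*weight)(double), double radius,
                                     double cutoff)
    {
        double r = radius;
        // walk inwards until the tail of the function becomes significant
        while (r > 0 && fabs(weight(r)) < cutoff)
            r -= radius / 256.0;
        return r;
    }
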
2f41c4e8 exposed some other edge cases as well. Globally resetting the
pass info was not the right way to go about it, because we don't know in
advance what the frame type is going to be - at least not with the
current code structure. (In principle, we could separately indicate the
frame type and the pass type and then only reset it on the first
actual pass_describe call, but that's annoying as well)
Also fixes a latent issue where p->pass was never initialized, which
broke the MP_DBG debugging code in some cases.
Since all existing code does gl_video_upload immediately followed by
pass_render_frame, we can just move the upload into pass_render_frame
itself, which arguably makes more sense anyway.
This replaces `vo-performance` by `vo-passes`, bringing with it a number
of changes and improvements:
1. mpv users can now introspect the vo_opengl passes, which is something
that has been requested multiple times.
2. performance data is now measured per-pass, which helps both
development and debugging.
3. since adding more passes is cheap, we can now report information for
more passes (e.g. the blit pass, and the osd pass). Note: we also
switch to nanosecond scale, to be able to measure these passes
better.
4. `--user-shaders` authors can now describe their own passes, helping
users both identify which user shaders are active at any given time
as well as helping shader authors identify performance issues.
5. the timing data per pass is now exported as a full list of samples,
so projects like Argon-/mpv-stats can immediately read out all of the
samples and render a graph without having to manually poll this
option constantly.
Due to gl_timer's design being complicated (directly reading performance
data would block, so we delay the actual read-back until the next _start
command), it's vital not to conflate different passes that might be
doing different things from one frame to another. To accomplish this,
the actual timers are stored as part of the gl_shader_cache's sc_entry,
which makes them unique for that exact shader.
Starting and stopping the time measurement is easy to unify with the
gl_sc architecture, because the existing API already relies on a
"generate, render, reset" flow, so we can just put timer_start and
timer_stop in sc_generate and sc_reset, respectively.
The ugliest thing about this code is that due to the need to keep pass
information relatively stable in between frames, we need to distinguish
between "new" and "redrawn" frames, which bloats the code somewhat and
also feels hacky and vo_opengl-specific. (But then again, this entire
thing is vo_opengl-specific)
For some braindead reason, Microsoft decided to prevent you from
dynamically loading system libraries. This makes portability harder.
And we're talking about portability between Microsoft OSes!
This partially reverts the change from a longer time ago to always build
DXVA2 and D3D11VA together.
To make it simpler, we change the following:
- building with ANGLE headers is now required to build D3D hwaccels
- if DXVA2 is enabled, D3D11VA is still forcibly built
- the CLI vo_opengl ANGLE backend is now under --egl-angle-win32
This is done to reduce the dependency mess slightly.
Another legacy annoyance. The only place where packed YUV is still
important is slightly older Apple hardware or drivers, which require
it for efficient hardware decoding.
Instead of setting up a weird swizzle (which is linked to how the
internal renderer code works, rather than the generic format code), add
per-component mapping to gl_imgfmt_desc.
The renderer still computes the weird swizzle, but at least it's
confined to itself. Also, it appears the hwdec backends don't need this
anymore.
It's really nice that the messy init_format() goes away too.
The changes to path list options is basically getting rid of the need to
pass multiple paths to a single option. Instead, you can use the option
multiple times. The old behavior can be used by using the -set suffix
with the option.
Change some options to path lists. For example --script is now append by
default, and if you use --script-set, you need to use ":"/";" as
separator instead of ",".
--sub-paths/--audio-file-paths is a deprecated alias now, and will break
if the user tries to pass multiple paths to it. I'm assuming that if
these are used, most users will pass only 1 path anyway.
--opengl-shaders has more compatibility handling, since it's probably
rather common that users pass multiple options to it.
Also document all that in the manpage.
I'll probably regret this later, as it somewhat increases the complexity
of the option parser, rather than reducing it.
Add something that allows us to extract the component order from various
RGBA formats. In fact, also handle YUV, GBRP, and XYZ formats with this.
It introduces a new struct mp_regular_imgfmt, that hopefully will
eventually replace struct mp_imgfmt_desc. The latter is still needed by
a lot of code though, especially generic code. Also vo_opengl still uses
the old one, so this commit is sort of incomplete.
Due to its genericness, it's also possible that this commit introduces
rendering bugs, or accepts formats it shouldn't accept.
Commit 3fb6380 was supposed to increase MAX_TEXTURE_HOOKS but instead
increased SHADER_MAX_HOOKS, since I forgot that they were separate (for
whatever reason).
To prevent this mistake from happening again, and to unify the location
in which user_shader-specific #defines are placed, get rid of the two
constants in opengl/video.c and move/reuse them from user_shaders.h
instead.
Also bump up MAX_SAVED_TEXTURES (now SHADER_MAX_SAVED) slightly as a
precaution against adding more passes to vo_opengl. I think we're
already flirting with the limit.
This is even better at preventing discoloration than tone mapping on the
XYZ image. Partly inspired by the HLG OOTF. Also simplifies the way we
tone map, and moves this logic to the pass_tone_map function where it
belongs.
This also fixes what could arguably be considered a bug in the HLG
implementation when using HLG for non-BT.2020 colorspaces, which is not
permitted by spec but thinkable in theory. Although in this case, I
guess it will be arbitrary whether people use the BT.2020-normalized
luma coefficients or change it to fit the colorspace, so I guess either
way could be considered "right", depending on what people end up doing.
Either way, in lieu of standard practice, we do what makes the most
sense (to me), and hopefully others will follow.
The downside is that we upload an extra vec3 uniform even if we don't
use it, but eliminating that would be ugly.
These can never be uninitialized because the enum cases are exhaustive and
the fallback is in the correct order, but GCC is too dumb to understand
this.
Also explicitly initialize tex_type, because while GCC doesn't warn
about it (for some reason), maybe it will in the future.
Moves the DXLockObjectsNV call to after PresentEx. This fixes an issue
where the presented image is a single frame late. This may be due to
DXLockObjectsNV locking the render target before StretchRect is done.
The spec indicates that the lock call should provide synchronization
for the resource, so this may be due to a driver bug.
This preserves channel balance better and helps reduce discoloration due
to nonlinear tone mapping.
I wasn't sure whether to stuff this inside pass_color_manage or
pass_tone_map but decided for the former because adding the extra
mp_csp_prim would have made the signature of the latter longer than
80col, and also because the `mp_get_cms_matrix` below it basically does
the same thing anyway, so it doesn't look that out of place. Also why is
this justification longer than the actual description of the algorithm
and what it's good for?
This introduces (yet another..) mp_colorspace members, an enum `light`
(for lack of a better name) which basically tells us whether we're
dealing with scene-referred or display-referred light, but also a bit
more metadata (in which way is the scene-referred light expected to be
mapped to the display?).
The addition of this parameter accomplishes two goals:
1. Allows us to actually support HLG more-or-less correctly[1]
2. Allows people playing back direct “camera” content (e.g. v-log or
s-log2) to treat it as scene-referred instead of display-referred
[1] Even better would be to use the display-referred OOTF instead of the
idealized OOTF, but this would require either native HLG support in
LittleCMS (unlikely) or more communication between lcms.c and
video_shaders.c than I'm remotely comfortable with
That being said, in principle we could switch our usage of the BT.1886
EOTF to the BT.709 OETF instead and treat BT.709 content as being
scene-referred under application of the 709+1886 OOTF; which moves that
particular conversion from the 3dlut to the shader code; but also allows
a) users like UliZappe to turn it off and b) supporting the full HLG
OOTF in the same framework. But I think I prefer things as they are
right now.
st2084 and std-b67 are really weird names for PQ and HLG, which is what
everybody else (including e.g. the ITU-R) calls them. Follow their
example.
I decided against naming them bt2020-pq and bt2020-hlg because it's not
necessary in this case. The standard name is only used for the other
colorspaces etc. because those literally have no other names.
List of changes:
1. Kill nom_peak, since it's a pointless non-field that stores nothing
of value and is _always_ derived from ref_white anyway.
2. Kill ref_white/--target-brightness, because the only case it really
existed for (PQ) actually doesn't need to be this general: According
to ITU-R BT.2100, PQ *always* assumes a reference monitor with a
white point of 100 cd/m².
3. Improve documentation and comments surrounding this stuff.
4. Clean up some of the code in general. Move stuff where it belongs.
This file is a leftover from when img_format.h was changed from using
the ancient FourCCs (based on Microsoft multimedia conventions) for
pixel formats to a simple enum. The remaining cases still inherently
used FourCCs for whatever reasons.
Instead of worrying about residual copyrights in this file, just move it
into code we don't want to relicense (the ancient Linux TV code). We
have to fix some other code depending on it. For the most part, we just
replace the MP_FOURCC macro with libavutil's MKTAG (although the macro
definition is exactly the same). In demux_raw, we drop some pre-defined
FourCCs, but it's not like it matters. (Instead of
--demuxer-rawvideo-format use --demuxer-rawvideo-mp-format.)
In GLES 2 mode, we can do dither, but "fruit" dithering is still out of
the question, because it does not support any high depth textures.
(Actually we probably could use an 8 bit texture too for this, at least
with small matrix sizes, but it's still too much of a pain to convert
the data, so why bother.)
This is actually a regression; before this, forcibly enabling dumb mode
due to low GL caps actually happened to avoid this case.
Fixes #4519.
I call it `mobius` because apparently the form f(x) = (cx+a)/(dx+b) is
called a Möbius transform, which is the algorithm this is based on. In
the extremes it becomes `reinhard` (param=0.0) and `clip` (param=1.0),
smoothly transitioning between the two depending on the parameter.
This is a useful tone mapping algorithm since the tunable mobius
transform allows the user to decide the trade-off between color accuracy
and detail preservation on a continuous scale. The default of 0.3 is
already far more accurate than `reinhard` while also being reasonably
good at preserving highlights, without suffering from the overall
brightness drop and color distortion of `hable`.
For these reasons, make this the new default. Also expand and improve
the documentation for these tone mapping functions.
Unfortunately quite a mess, in particular due to the need to have some
compatibility with the old API. (The old API will be supported only in
the short term.)
In a multi GPU scenario, it may be desirable to use different GPUs
for decode and display responsibilities. For example, if a secondary
GPU has better video decoding capabilities.
In such a scenario, we need to initialise a separate context for each
GPU, and use the display context in hwdec_cuda, while passing the
decode context to avcodec.
Once that's done, the actual hand-off between the two GPUs is
transparent to us (It happens during the cuMemcpy2D operation which
copies the decoded frame from a cuda buffer to the OpenGL texture).
In the end, the bulk of the work is around introducing a new
configuration option to specify the decode device.
The new API has literally no advantages (other than that we can drop
mp_vt_download_image and other things later), but it's sort-of uniform
with the other hwaccels.
"--videotoolbox-format=no" is not supported with the new API, because it
doesn't "fit in". Probably could be added later again.
The iOS code change is untested (no way to test).
This was broken in e0250b9604. In some cases, device creation will
succeed, but creating an EGL context on the device will fail. With
--angle-renderer=auto, it should try to create the context again on a
D3D9 device.
This fixes mpv in Windows Vista on VirtualBox for me.
TLS is a headache. We should avoid it if we can.
The involved mechanism is unfortunately entangled with the unfortunate
libmpv API for returning pointers to host API objects. This has to be
kept until we change the API somehow.
Practically untested out of pure laziness. I'm sure I'll get a bunch of
reports if it's broken.
Until now, the texture pointer was stored in plane 1, and the texture
array index was in plane 2. Move this down to plane 0 and plane 1. This
is to align it to the new WIP D3D11 decoding API in Libav, where we
decided that there is no reason to avoid setting plane 0, and that it
would be less weird to start at plane 0.
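Illustration of the new layout, using a stand-in struct (the real code
deals with mp_image and the Libav/FFmpeg definitions):

    #include <stdint.h>

    struct image_stub { uint8_t *planes[4]; };   // stand-in for mp_image

    static void wrap_d3d11_frame(struct image_stub *img, void *d3d11_texture,
                                 intptr_t subresource_index)
    {
        img->planes[0] = (uint8_t *)d3d11_texture;       // ID3D11Texture2D *
        img->planes[1] = (uint8_t *)subresource_index;   // texture array slice
    }
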
This reverts commit 142b2f23d4, and replaces it with another try. The
previous attempt removed the overlay on every rendering, because the
normal rendering path actually unrefs the mp_image. Consequently,
unmap_current_image() was completely inappropriate for removing the
overlay.
This drops support for the old libavcodec APIs. Now FFmpeg 3.3 or FFmpeg
git is required. Libav has no release with the new APIs yet, so for
Libav git as of a few weeks or months ago or so is required if you want
to use Libav.
Not much actually changes in hwdec_vaegl.c - some code is removed, but
the reindentation inflates the diff.
Wayland is still too amateurish, and multiple features don't work,
including critical ones. There is no solution in sight, so prefer X11.
(Which seems to mostly work ok via xwayland.)
Once all problems are solved, the defaults can be switched back.
Mostly because of ANGLE (sadly).
The implementation became unpleasantly big, but at least it's relatively
self-contained.
I'm not sure to what degree shaders from different drivers are
compatible as in whether a driver would randomly misbehave if it's fed
a binary created by another driver. The useless binaryFormat parameter
won't help with that, as they can probably easily clash. As usual, OpenGL is
pretty shit here.
gl_headers.h is basically header_fixes.h done consistently. It contains
all OpenGL defines (and some typedefs) we need. We don't include GL
headers provided by the system anymore.
Some care has to be taken by certain windowing APIs including all of
gl.h anyway. Then the definitions could clash. Fortunately, redefining
preprocessor symbols to the same content is allowed and ignored. Also,
redefining typedefs to the same thing is allowed in C11. Apparently the
latter is not allowed in C99, so there is an imperfect attempt to avoid
the typedefs if required API symbols are apparently present already.
The most risky part about this is the standard typedefs and GLAPIENTRY.
The latter is different only on win32 (and at least consistently so).
The typedefs are mostly based on stdint.h typedefs, which khrplatform.h
clumsily emulates on platforms which don't have it. The biggest
difference is that we define GLsizeiptr directly to ptrdiff_t, instead
of checking for the _WIN64 symbol and defining it to long or long long.
This also typedefs GLsync to __GLsync, just like the khronos headers.
Although symbols prefixed with __ are implementation reserved, khronos
also violates this rule, and having the same definition as khronos will
avoid problems on duplicate definitions.
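The gist of the typedefs is something like this (an excerpt-style sketch,
not the full header):

    #include <stddef.h>
    #include <stdint.h>

    typedef int8_t    GLbyte;
    typedef uint8_t   GLubyte;
    typedef int16_t   GLshort;
    typedef uint16_t  GLushort;
    typedef ptrdiff_t GLintptr;
    typedef ptrdiff_t GLsizeiptr;      // no _WIN64 check for long vs. long long
    typedef struct __GLsync *GLsync;   // same definition the khronos headers use
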
We can simplify the build scripts too. The ios-gl check seems a bit
wrong now (what we really want to test for is EAGLContext), but I can't
test and thus can't improve it.
cuda_dynamic.h redefined two GL symbols; just include the new headers
directly instead.
This is pretty trivial, but also quite annoying due to details like
mismatching eglGetProcAddress() function signature (most callers just
cast the function pointer), and ARM/Linux hacks. So move them all to one
place.
With the recent GLES3 header detection, and if ANGLE is in the search
path, the ANGLE headers will be used over the desktop GL ones. It
appears the ANGLE headers do not include <windows.h>, which leads to the
dxinterop code to fail building. Oops.
Fix this by including <windows.h> if dxinterop is compiled in.
It appears we expect IOS to provide GLES 3. The IOS block contains most
(but weirdly not all) symbols from the GLES block, so it's possible that
some symbols will be redefined, which is annoying, but harmless. I don't have
an iOS setup to test, otherwise it's likely that a modification of the
IOS include statements would take care of this.
DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL might be buggy on some hardware.
Additionally, DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL might be supported on some
Windows 7 systems with the platform update, but it might have poor
performance. In these cases, the user might want to disable the use of
DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL swap chains with --angle-flip=no.
Might be useful for other backends too. For context_vdpau, resize
handling, presentation, and handling the mapping state becomes somewhat
less awkward.
In some cases, such as when using the libmpv opengl-cb API, or with
certain vo_opengl backends, the main framebuffer is never accessed.
Instead, rendering is done to a FBO that acts as back buffer. This meant
an incorrect/broken bit depth could be used for dithering.
Change it to read the framebuffer depth lazily on the first render call.
Also move the main FBO field out of the GL struct to MPGLContext,
because the renderer's init function does not need to access it anymore.
Useful for testing. Unfortunately, the nVidia EGL driver ignores this,
and returns a GLES 3.2 context anyway (which it is allowed to do). Might
still be useable with ANGLE, which will really give you a GLES 2 context
if you ask for it.
When dumb mode is used (the "simple" rendering path), respect the dither
options. Options should never be ignored (except in GLESv2 mode); either
they should be respected in dumb mode, or they should disable dumb mode.
In this case, the former applies.
This actually fixes the dreaded errors during resizing. It works pretty
much like before, except each surface is reallocated before it's used.
It implies surfaces with the old size remain in the presentation queue
and will be displayed.
Don't assume 0 is an invalid object handle. vdpau with its weird API
design makes all objects indexes, with 0 being a perfectly valid and
common value. You need to use VDP_INVALID_HANDLE, which is not 0.
Don't crash if init fails at vdpau initialization. It's because
mp_vdpau_destroy(NULL) crashes. Simplify it.
Destroy output surface backed FBO before output surface. Also, strictly
bookkeep the map/unmap calls (and unmap surfaces before destroying the
FBO/texture). I can't see a change in the weird errors when resizing the
window, but I guess it's slightly more correct.
Add the GL_WRITE_DISCARD_NV symbol to header_fixes.h, because we might
fail compilation with headers that do not contain the vdpau extension
(well, probably doesn't matter).
The gl_timer_last_us() function could access samples[-1]. Fix by
coercing to unsigned, so the % will put it into index [0,max). The
real value returned in this corner case doesn't mean too much, I
guess.
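The fix boils down to something like this (a sketch with a stand-in
struct):

    #include <stdint.h>

    #define MAX_SAMPLES 256
    struct timer_stub { uint64_t samples[MAX_SAMPLES]; int idx; };

    static uint64_t last_sample(struct timer_stub *t)
    {
        // The 1u forces unsigned arithmetic: for idx == 0 the subtraction wraps,
        // and the modulo lands somewhere in [0, MAX_SAMPLES) instead of at -1.
        return t->samples[(t->idx - 1u) % MAX_SAMPLES];
    }
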
As the manpage says, this has no value other than adding bugs.
It uses code based on context_x11.c, and basically does very stripped
down context creation (no alpha support etc.). It uses vdpau for
display, and maps vdpau output surfaces as FBOs to render into them.
This might be good to experiment with asynchronous presentation. For
now, it presents synchronously, with a 4 frame delay (which should whack
off A/V sync). The forced 4 frame delay is probably also why interaction
feels slower.
There are some weird vdpau errors on resizing and uninit. No idea what
causes them.
Should have done this 1000 years ago. Now GL backends can use mp_log
macros directly on the MPGLContext, instead of doing stupid things like
for example MP_WARN(ctx->vo, ...).
The existing code modifies f.radius so that it is in terms of the
filter sample radius (in the source coordinate space) and has
some small errors because of this behavior.
This commit changes f.radius so that it is always in terms of
the filter function radius (in the destination coordinate space).
The sample radius can always be derived by multiplying f.radius
by filter_scale, which is the new, more descriptive name for the
previous inv_scale.
What a fucking waste of time. It also depends on which headers you
compile with, so the situation is worse and more confusing than
you'd think. God knows what brain fart made them change the numeric
ID without changing the extension name or any other ways to keep
ABI-compatibility and without any warning.
Locale-independent, and doesn't have the char vs. unsigned char problem.
(Although in this case, the code was fine, because bstr.start is
unsigned char.)
This was a hack to let libmpv API users pass a d3d device to mpv. It's
not needed anymore for 2 reasons:
1. ANGLE does not have this problem
2. Even native GL via nVidia (where this failed) seems to not require
this anymore
Implements --hwdec=videotoolbox on iOS. Similar to hwdec_osx.c, but
using CVPixelBuffer APIs available on iOS instead of the equivalent
IOSurface APIs in macOS.
We can drop the custom table.
For some reason, the interop does not accept GL_RGB_RAW_422_APPLE as
internal format for GL_RGB_422_APPLE, so switch the format table to use
GL_RGB (this way both interop and real textures work the same).
Another victim of the apparent requirement of exactly matching texture
formats is kCVPixelFormatType_32BGRA. vo_opengl wants to handle this as
normal RGBA texture, with a swizzle applied in the shader.
CGLTexImageIOSurface2D() rejects this, because it wants the exact
internal format. Just drop the format, because it's useless anyway.
(Maybe this is a bit too fragile...)
All supported pixel formats have a specific "mapping" of CPU data to
textures. This function determines the number and the formats of these
textures. Moving it to a helper will be useful for some hardware decode
interop backends, since they all need similar things.
GL_LUMINANCE_ALPHA is the only reason why per-plane swizzles exist.
Remove per-plane swizzles (again), and regrettably handle them as
special cases (again). Carry along the logical texture format (called
gl_format in some parts of the code, including the new one).
We also don't need a use_integer flag, since the new gl_format member
implies whether it's an integer texture. (Yes, there are separate
logical GL formats for integer textures. This aspect of the OpenGL API
is hysterical at best.)
This should change nothing about actual rendering logic and GL API
usage.
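What "implied by the format" means in practice is roughly this
(illustrative, assuming GL headers; not the actual mpv helper):

    #include <stdbool.h>

    // Integer textures use distinct logical (external) formats in GL.
    static bool gl_format_is_integer(GLenum format)
    {
        switch (format) {
        case GL_RED_INTEGER:
        case GL_RG_INTEGER:
        case GL_RGB_INTEGER:
        case GL_RGBA_INTEGER:
            return true;
        default:
            return false;
        }
    }
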
Originally, there was probably some sort of intention to restrict it to
formats supported by the interop, or something. But in the end it was
overcomplicated nonsense.
In the future, we could use mp_hwdec_ctx.supported_formats or other
mechanisms to handle this in a better way.
mp_hwdec_ctx.ctx is now set to a dummy pointer - hwdec_devices_load() is
only used to detect whether the vo_opengl interop is present, and the
common hwdec code expects that the .ctx field is not NULL.
This also changes videotoolbox-copy to use --videotoolbox-format,
instead of the FFmpeg-set default.
The code for copying a videotoolbox surface to mp_image was duplicated
(with some minor differences - I picked the hw_videotoolbox.c version,
because it was "better"). mp_imgfmt_from_cvpixelformat() is somewhat
duplicated with the vt_formats[] table, but this will be fixed in a
later commit, and moving the function to shared code is preparation.
This replaces the old backend that exclusively used EGL windowing with
one that can also use ANGLE's ability to render to directly to a
texture. The advantage of this is that it allows mpv to create the swap
chain itself and this allows mpv to use a flip-mode swap chain on a HWND
(which avoids problems with DirectComposition) and to use a longer swap
chain that has six backbuffers by default (which reportedly fixes
problems with rendering 24fps video on 24Hz monitors.)
Also, "screenshot window" should now work on DXGI 1.2 and up (Windows 8
and up.)
The lock was disabled recently. This commit gets rid of the dummied out
calls. The main reason for removing it is that there is no apparent need
for it anymore, and the new FFmpeg vaapi code does not use or provide
such a lock (there are some places which we cannot control and which do
vaapi API calls, like frame destructors).
It was basically inverted. Not sure how this even happened. Hopefully
it's more an "I don't know what I was doing" instead of an "I don't know
what I am doing" case.
vo_opengl used to have it as sub-option, which made it very hard to pass
down option values to backends in a generic way (even if these options
were completely backend-specific). For --opengl-dcomposition we used a
VOFLAG to deal with this. Fortunately, sub-options are gone, and we can
just add it as global option.
Move the option to context_angle.c and add it as global option. I
thought about adding a mechanism to let backends declare options, which
would get magically picked up by m_config instead of having to add them
to the global option list manually (similar to VO vo_driver.options),
but decided against this complexity just for 1 or 2 backends. Likewise,
it could have been added as a single option to avoid the boilerplate of
an option struct, but then again there are probably going to be more
angle suboptions, and it's cleaner.
Introduce the --opengl-hwdec-interop option, which replaces
--hwdec-preload. The new option allows explicit selection of the interop
backend.
This is relatively complex, and I would have preferred not to add this,
but it's probably useful to debug certain problems. In exchange, the
"new" option documents that pretty much any but the simplest use of it
will not be forward compatible.
Use the libavutil vdpau frame allocation code instead of our own "old"
code. This also uses its code for copying a video surface to normal
memory (used by vdpau-copy).
Since vdpau doesn't really have an internal pixel format, 4:2:0 can be
accessed as both nv12 and yuv420p - and libavutil prefers to report
yuv420p. The OpenGL interop has to be adjusted accordingly.
Preemption is a potential problem, but it doesn't break it more than it
already is.
This requires a bug fix to FFmpeg's vdpau code, or vdpau-copy (as well
as taking screenshots) will fail. Libav has fixed this bug ages ago.
Because it allows easier testing of filters + hwdec.
Make the texture setup code a bit more generic so it doesn't get too
much of a mess. We also use the GL renderer utility function
gl_find_unorm_format(), which saves us additional work with OpenGL's
semi-redundant format specifiers.
EGL rendering + new decode API didn't work due to a certain libva bug
with sort-of legacy API use hitting again. It will report the wrong
vaapi pixel format. It's old code and always nv12 anyway, so stop
worrying about it.
There are going to be users who have a Mesa installation which does not
support 10 bit, but a GPU which can decode to 10 bit. So it's probably
better not to hardcode whether it is supported.
Introduce a more general way to signal supported formats from renderer
to decoder. Obviously this is imperfect, because it still isn't part of
proper format negotiation (for example, what if there's a vavpp filter,
which accepts anything). Still slightly better than before.
I don't know any way to probe for vaapi dmabuf/EGL dmabuf support
properly (in particular testing specific formats, not just general
availability). So we stay with the current approach and try to create
and map dummy surfaces on init to probe for support. Overdo it and check
all formats that AVHWFramesConstraints reports, instead of only NV12 and
P010 surfaces.
Since we can support unknown formats now, add explicit checks to the
EGL/dmabuf mapper code to reject unsupported formats. I also noticed
that libavutil signals support for RGB0/BGR0, but couldn't get it to
work. Remove the DRM formats that are unused/didn't work the way I tried
to use them.
With this, 10 bit decoding + rendering should work, provided you have
a capable CPU and a patched Mesa. The required Mesa patch adds support
for the R16 and GR32 formats. It was sent by a Kodi developer to the
Mesa developer mailing list and was not accepted yet.
For surfaces allocated by libavutil, we assume that the sw_format (i.e.
in hw_subfmt in mp_image_params) is always correct. The API guarantees
that it explicitly sets the equivalent vaapi format on surface
allocation.
For surfaces allocated by mpv's old vaapi code, we explicitly retrieve
the format right after decoding. Unless the driver magically changes the
format asynchronously, it will still be correct once the surface reaches
the renderer.
In both cases, checking the format again is obviously redundant. In
addition, it doesn't require us to maintain a libva fourcc <-> mpfmt
table and the va_fourcc_to_imgfmt() function. This also unbreaks 10 bit
rendering support (still disabled by default).
This does not work, because Mesa has no support for the proposed
DRM_FORMAT_R16 and DRM_FORMAT_GR16 formats. It's also untested of
course.
As long as video/decode/vaapi.c doesn't hand down P010 surfaces, this is
fine anyway.
This can be tested by removing the code that disables P010 output:
diff --git a/video/decode/vaapi.c b/video/decode/vaapi.c
--- a/video/decode/vaapi.c
+++ b/video/decode/vaapi.c
@@ -55,13 +55,6 @@ static int init_decoder(struct lavc_ctx *ctx, int w, int h)
     assert(!ctx->avctx->hw_frames_ctx);
-    // If we use direct rendering, disallow 10 bit - it's probably not
-    // implemented yet, and our downstream components can't deal with it.
-    if (!p->own_ctx && required_sw_format != AV_PIX_FMT_NV12) {
-        MP_WARN(ctx, "10 bit surfaces are currently supported.\n");
-        return -1;
-    }
-
mp_image_hw_download() is a libavutil wrapper added in the previous
commit. We drop our own code completely, as everything is provided by
libavutil and our helper wrapper.
This breaks the screenshot code, so that has to be adjusted as well.
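For illustration only (not the actual wrapper code), the libavutil path
amounts to transferring the hardware surface into a freshly allocated
system-memory AVFrame and carrying over the metadata:

    #include <libavutil/frame.h>
    #include <libavutil/hwcontext.h>

    static AVFrame *download_hw_frame(AVFrame *hw)
    {
        AVFrame *sw = av_frame_alloc();
        if (!sw)
            return NULL;
        // With sw->format left unset, libavutil picks a supported sw format.
        if (av_hwframe_transfer_data(sw, hw, 0) < 0) {
            av_frame_free(&sw);
            return NULL;
        }
        av_frame_copy_props(sw, hw);    // carry over pts and other metadata
        return sw;
    }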
I'm not sure what systems have <sys/poll.h> (maybe there are historical
reasons why some would), but POSIX defines <poll.h>. Although this code
is full of highly OS specific calls (like ioctl()), there's no reason
not to use the more standard include path.
vo_wayland_wait_events() is going to return when it's time to swap the
buffers anyway, so calling request_frame() before that makes no sense.
Fixes the constant high CPU usage by the compositor when mpv is paused
and the window is in view.
This is actually a pretty important fix. eglChooseConfig() might be the
first thing that fails when probing for desktop GL / ES2 / ES3 support,
because EGL_RENDERABLE_TYPE is set to values specific to the underlying
APIs.
Not sure how the hell this worked before. EGL 1.4 implementations
certainly could fail the call with EGL_BAD_ATTRIBUTE if
EGL_RENDERABLE_TYPE has EGL_OPENGL_ES3_BIT set. It's quite possible that
many EGL implementations tolerate invalid EGLConfig values stemming from
uninitialized EGLConfig variables (and eglCreateWindowSurface() is even
specified to return the EGL_BAD_CONFIG error code for "not valid"
EGLConfigs).
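A simplified sketch of why config selection has to know the target API
up front (error handling trimmed; EGL_OPENGL_ES3_BIT needs EGL 1.5
headers, older ones only have the _KHR variant):

    #include <EGL/egl.h>

    static EGLConfig choose_config(EGLDisplay dpy, int es_version)
    {
        EGLint renderable = es_version == 0 ? EGL_OPENGL_BIT
                          : es_version >= 3 ? EGL_OPENGL_ES3_BIT
                                            : EGL_OPENGL_ES2_BIT;
        EGLint attrs[] = {
            EGL_SURFACE_TYPE, EGL_WINDOW_BIT,
            EGL_RENDERABLE_TYPE, renderable,
            EGL_RED_SIZE, 8, EGL_GREEN_SIZE, 8, EGL_BLUE_SIZE, 8,
            EGL_NONE,
        };
        EGLConfig config = NULL;
        EGLint num = 0;
        // On a pure EGL 1.4 implementation, passing the ES3 bit here can
        // simply fail with EGL_BAD_ATTRIBUTE and leave 'config' untouched.
        if (!eglChooseConfig(dpy, attrs, &config, 1, &num) || num < 1)
            return NULL;
        return config;
    }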
The way it should (probably) work is that selecting a RGBA framebuffer
format will simply make the compositor use the alpha. It works this way
on Wayland. On X11, this is... not done. Instead, both GLX and EGL
report two FB configs, which are exactly the same, except for the
platform-specific visual. Only the latter (non-default) points to a
visual that actually has alpha. So you can't make the pure GLX and EGL
APIs select alpha mode, and you have to override manually.
Or in other words, alpha was hacked violently into X11 for the sake of
compatibility, in a way that doesn't really make sense, and it forces
API users to wade through metaphorical cow shit to deal with it.
To be fair, some other platforms actually also require you to enable
alpha explicitly (rather than looking at the framebuffer type), but they
skip the metaphorical cow shit step.
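As a sketch of the manual override (assuming Xlib and XRender are
available), the only reliable way to tell the two otherwise identical FB
configs apart is the alpha mask of the visual they reference:

    #include <X11/Xlib.h>
    #include <X11/extensions/Xrender.h>

    /* Return non-zero if the X visual backing an FB config actually has
     * an alpha channel - GLX/EGL alone won't report this distinction. */
    static int visual_has_alpha(Display *dpy, Visual *visual)
    {
        XRenderPictFormat *fmt = XRenderFindVisualFormat(dpy, visual);
        return fmt && fmt->type == PictTypeDirect && fmt->direct.alphaMask != 0;
    }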
So that the EGL code can use it too.
Also print the actual FB config ID, instead of nonsense. (I _think_
once in the past a certain GLX implementation just used numeric config
IDs cast to EGLConfig - or at least that would explain this nonsense.)
Preparation for the following commits. Since at least theoretically the
config selection depends on the context type (EGL_RENDERABLE_TYPE has
separate bits for ES 2, ES 3, and desktop GL), doing it any other way
would be too painful.
For X11 garbage we have to pass some annoying parameters to EGL context
creation. Add some sort of extensible API, so that adding a new
parameter doesn't break all callers. We still want to keep it as a
single function, because it's so nice isolating all the EGL nonsense API
boilerplate like this. (Did I mention yet that X11 and EGL are garbage?)
Also somewhat simplifies the vo_flags mess in the helper internals.
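A hypothetical sketch of the idea, with illustrative names rather than
mpv's real API: context creation stays a single function, and a new
parameter later just becomes a new struct field whose zero value keeps
the old behavior for existing callers.

    #include <EGL/egl.h>

    struct egl_create_opts {
        int es_version;               // 0 = desktop GL, 2/3 = GLES
        const EGLint *window_attrs;   // extra attribs, NULL = none
        EGLint native_visual_id;      // X11: force a specific visual, 0 = any
    };

    EGLContext egl_create_context_opts(EGLDisplay dpy,
                                       const struct egl_create_opts *opts,
                                       EGLConfig *out_config);

    /* Existing call sites keep compiling when a field is added later:
     *     struct egl_create_opts opts = { .es_version = 0 };
     *     EGLContext ctx = egl_create_context_opts(dpy, &opts, &cfg);
     */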
The chroma alignment renormalization code forgot to account for the fact
that the chroma subsampling ratio has to be rotated.
Unfortunately, doing it this way seems to have somewhat broken the
chroma offset rotation logic for odd-sized subsampled image files. While
this is a bug, it's much, much less noticeable, so it's not nearly as
important as the bug this change fixes. Either way, a future patch needs
to still revise this logic, ideally by redesigning the entire rotation
mechanism.
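The core of the fix, sketched in isolation:

    #include <stdbool.h>

    /* Chroma subsampling shifts swap axes when the image is rotated by
     * 90 or 270 degrees, so the alignment/renormalization math has to
     * use the rotated ratio. */
    static void rotated_chroma_shift(int rotate_degrees, int xs, int ys,
                                     int *out_xs, int *out_ys)
    {
        bool swap = (rotate_degrees % 180) == 90;
        *out_xs = swap ? ys : xs;   // e.g. 4:2:2 (xs=1, ys=0) becomes
        *out_ys = swap ? xs : ys;   // vertically subsampled when rotated 90°
    }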
Apparently we don't always set the viewport to window dimensions
anymore, e.g. if nothing is actually rendered. This means the viewport
can contain old values.
The window screenshot code uses the viewport values to guess the default
framebuffer dimensions. With --force-window --idle --no-osc (which draws
nothing and issues a glClear() command only), taking a screenshot would
yield an image with the wrong size and possibly garbage in it. Fix this
by explicitly passing the currently known window dimensions. Abusing the
values stored in the viewport was questionable anyway.
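For reference, a sketch of the old, fragile approach:

    #include <GL/gl.h>

    /* Guess the default framebuffer size from the current GL viewport.
     * This only works if the last render actually covered the whole
     * window; otherwise the viewport holds stale values. The fix passes
     * the known window dimensions explicitly instead. */
    static void guess_default_fb_size(int *w, int *h)
    {
        GLint vp[4];                    // x, y, width, height
        glGetIntegerv(GL_VIEWPORT, vp);
        *w = vp[2];
        *h = vp[3];
    }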
Long planned. Leads to some sanity.
There still are some rather gross things. Especially g_groups is ugly,
and a hack that can hopefully be removed. (There is a plan for it, but
whether it's implemented depends on how much energy is left.)
We want to avoid causing problems if libmpv is used in an application
that links cuda, or if the libav* libraries are linked with cuda,
as might happen if the scale_npp filter is used.
The test ended up failing if cuda.h wasn't present, even if cuda.h
isn't used during the actual build.
This test attempts to establish whether the ffmpeg being built
against has dynlink_cuda support. While it might theoretically be
possible to build against the older normally-linked-cuda version
of ffmpeg, it seems more trouble than it's worth.
This was a typo in the extension spec and was probably always broken.
Could have led to broken builds when used with ancient ANGLE headers
(or possibly generic EGL headers).
The latest 375.xx nvidia drivers add support for P016 output
surfaces. In combination with an ffmpeg change to return those
surfaces, we can display them.
The bulk of the work is related to knowing which format you're dealing
with at the right time. Once you know, it's straightforward.
The intention was that if --blend-subtitles is enabled, the frame should
always be re-rendered instead of using e.g. a cached scaled frame. The
reason is that subtitles can change anyway, e.g. if you pause and change
subtitle size and such.
On the other hand, if the frame is marked as repeated, it should always
use the cached copy. Actually "simplify" this and drop the cache only if
playback is paused (which frame->still indicates indirectly).
Also see PR #3773.
unmap_current_image() is called after rendering. This essentially
invalidates the textures, so we can't assume that the image is still
present.
Also see PR #3773.
This allows us to define the tukey window (and other tapered windows).
Also add a missing option definition for `wblur` while we're at it, to
make testing out window-related stuff easier.
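As a hedged sketch of the tapering idea (the actual option parsing and
kernel table live in mpv's filter code): a taper parameter keeps the
window flat at 1.0 near the center and squeezes the base window into the
remaining range, so hann plus a taper yields tukey.

    #include <math.h>

    static double hann(double x)
    {
        return 0.5 + 0.5 * cos(M_PI * x);   // base window on [0, 1]
    }

    /* Flat for |x| < taper, base window compressed into [taper, 1]. */
    static double tapered(double (*window)(double), double x, double taper)
    {
        x = fabs(x);
        if (x < taper)
            return 1.0;
        return window((x - taper) / (1.0 - taper));
    }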
It's not that easy to decide whether a frame needs to be
reuploaded/rerendered. Using unique frame IDs for input makes it
slightly easier and more robust. This also removes the use of video PTS
in the interpolation path.
This should also avoid reuploading the video frame if it's just redrawn
in paused mode, or when using OSD/subtitles in cover art mode.
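A simplified sketch of the frame-ID check (assuming IDs are unique and
monotonically increasing, with 0 meaning nothing is cached):

    #include <stdbool.h>
    #include <stdint.h>

    struct upload_cache {
        uint64_t frame_id;   // ID of the currently uploaded frame, 0 = none
    };

    static bool needs_upload(struct upload_cache *c, uint64_t new_id)
    {
        if (c->frame_id == new_id && new_id != 0)
            return false;    // same frame: redraw from the cached upload
        c->frame_id = new_id;
        return true;         // new (or first) frame: upload again
    }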
It turns out the glFlush() call really helps in some cases, though only
in audio timing mode (where we render, then wait for a while, then
display the frame). Add a --opengl-early-flush=auto mode, which does
exactly that.
It's unclear whether this is fine on OSX (strange things going on
there), but it should be.
See #3670.
Thread-local storage in GCC is platform-specific, and some platforms that
are otherwise perfectly capable of running mpv may lack TLS support in GCC.
This change adds a test for the GCC variant of TLS and relies on its
result instead of an assumption.
Provided that LLVM's `__thread` support is similar to GCC's, the test is
called "GCC/LLVM TLS".
Signed-off-by: wm4 <wm4@nowhere>
- Change connector selection to accept human readable names (such as
eDP-1, HDMI-A-2) rather than arbitrary numbers.
- Change GPU selection to accept GPU number rather than device paths.
- Merge connector and GPU selection into one --drm-connector.
- Add support for --drm-connector=help.
- Add support for --drm-* in EGL backend.
- Refactor KMS; reduce state sharing across drm_common.
The glFlush() call was made optional recently
since it's not needed in most cases. On OSX though
this is needed since we removed kCGLPFADoubleBuffer
from the context creation, so the glFlush() call
was added to the cocoa backend only.
The CGLFlushDrawable() call can be safely removed
since it only does something when a double
buffered context is used. Also fixes a small typo.
Fixes#3627.
In "dumb mode" (where most features are disabled and which only performs
some basic rendering) we explicitly copy a set of whitelisted options,
and leave all the other options at their default values. Add the new
--opengl-early-flush option to this whitelist. Also remove an option
field accidentally added in the commit adding --opengl-early-flush.
It seems this can cause issues with certain platforms, so better to
disable it by default. The original reason for this isn't overly
justified, and display-sync mode should get rid of the need for it
anyway.
The new option is meant for testing, and will probably be removed if
nobody comes up and reports that enabling the option actually improves
anything.
Reduces code duplication between OpenGL backend and DRM VO.
(The control() for OpenGL backend isn't sufficiently similar to the
VO's control() to consider merging it as a whole - I extracted only the
FPS code.)
Other than being overly convoluted, this seems to make sense to me.
Except that to get the "rot" transform I have to set flip=true, which
makes no sense at all to me.
Combining rotation and cropping didn't work. It was just completely
broken.
I'm still not sure if this is correct. Chroma positioning seems to be
broken on rotation. There might also be a problem with non-mod-2 frame
sizes. Still, strictly an improvement for both rotated and non-rotated
rendering modes.
Also, this could probably be written in a more elegant way.