In "dumb mode" (where most features are disabled and which only performs
some basic rendering) we explicitly copy a set of whitelisted options,
and leave all the other options at their default values. Add the new
--opengl-early-flush option to this whitelist. Also remove an option
field accidentally added in the commit adding --opengl-early-flush.
It seems this can cause issues with certain platforms, so better to
disable it by default. The original reason for this isn't overly
justified, and display-sync mode should get rid of the need for it
anyway.
The new option is meant for testing, and will probably be removed if
nobody comes up and reports that enabling the option actually improves
anything.
Reduces code duplication between OpenGL backend and DRM VO.
(The control() for OpenGL backend isn't sufficiently similar to the
VO's control() to consider merging it as a whole - I extracted only the
FPS code.)
Other than being overly convoluted, this seems to make sense to me.
Except that to get the "rot" transform I have to set flip=true, which
makes no sense at all to me.
Combining rotation and cropping didn't work. It was just completely
broken.
I'm still not sure if this is correct. Chroma positioning seems to be
broken on rotation. There might also be a problem with non-mod-2 frame
sizes. Still, strictly an improvement for both rotated and non-rotated
rendering modes.
Also, this could probably be written in a more elegant way.
The video code can deal fine with feeding software image formats to
hwdec interop drivers. In RPI's case, this is preferable for
performance, working around OpenGL bugs (see RPI firmware issue #666),
and because OpenGL rendering doesn't bring too many advantages due to
RPI supporting GLES 2.0 only.
Maybe a way to force the normal video path is needed later. But
currently, this can be tested by just not loading the hwdec interop
driver.
If you run command-line mpv and set --hwdec to something that does
not load the RPI interop layer, you'll even have to use --hwdec-preload
manually to get it enabled.
Was intended to put the GL layer above the standard console. (But
actually that was done already, and the oddness I'm seeing seems to
be an unrelated bug.)
We always want to use __declspec(selectany) to declare GUIDs, but
manually including <initguid.h> in every file that used GUIDs was
error-prone. Since all <initguid.h> does is define INITGUID and include
<guiddef.h>, we can remove all references to <initguid.h> and just
compile with -DINITGUID to get the same effect.
Also, this partially reverts 622bcb0 by re-adding libuuid.a to the
build, since apparently some GUIDs (such as GUID_NULL) are not declared
in the source file, even when INITGUID is set.
Obviously, in the vast majority of cases, there's only one device
in the system, but doing this means we're more likely to get a
usable device in the multi-device case.
cuda would support decoding on one device and displaying on another
but the peer memory handling is not transparent and I have no way
to test it so I can't really write it.
The documentation around this stuff is poor, but I found an nvidia
sample that demonstrates how to use the interop API most efficiently.
(https://github.com/nvpro-samples/gl_cuda_interop_pingpong_st)
Key lessons are:
1) you can register the texture itself and have cuda write to it,
thereby skipping an additional copy through the PBO.
2) You don't have to be mapped when you do the copy - once you get a
mapped pointer, it remains valid. Magic!
This lets us throw out the PBOs as well as much of the explicit
alignment and stride handling.
CPU usage is slightly (~3%) lower for 4K content in one test case,
so it makes a detectable difference, and presumably saves memory.
When we rotate the inmage by 90° or 270°, chroma width and height need
to be swapped.
Fixes#3568.
But is the chroma sub location correct? Who the hell knows...
Extend the flag-based notification mechanism that was used via
M_OPT_TERM. Make the vo_opengl update mechanism use this (which, btw.,
also fixes compilation with OpenGL renderers forcibly disabled).
While this adds a 3rd mechanism and just seems to further the chaos, I'd
rather have a very simple mechanism now, than actually furthering the
mess by mixing old and new update mechanisms. In particular, we'll be
able to remove quite some property implementations, and replace them
with much simpler update handling. The new update mechanism can also
more easily refactored once we have a final mechanism that handles
everything in an uniform way.
There were multiple values under M_OPT_EXIT (M_OPT_EXIT-n for n>=0).
Somehow M_OPT_EXIT-n either meant error code n (with n==0 no error?), or
the number of option valus consumed (0 or 1). The latter is MPlayer
legacy, which left it to the option type parsers to determine whether an
option took a value or not. All of this was changed in mpv, by requiring
the user to use explicit syntax ("--opt=val" instead of "-opt val").
In any case, the n value wasn't even used (anymore), so rip this all
out. Now M_OPT_EXIT-1 doesn't mean anything, and could be used by a new
error code.
Negative height is used to signal a flipped framebuffer. There's
absolutely no reason to pass this down to overlay_adjust(), and only
requires implementers to deal with an additional special-case.
'cuda-gl' isn't right - you can turn this on without any GL and
get some non-zero benefit (with the cuda-copy hwaccel). So
'cuda-hwaccel' seems more consistent with everything else.
This happened to break because the texture unit wasn't reset to 0, which
some code expects. The OSD code in particular set the OSD texture on the
wrong texture unit, with the result that OSD/OSC was not visible.
A minor cleanup that makes the code simpler, and guarantees that we
cleanup the GL state properly at any point.
We do this by reusing the uniform caching, and assigning each sampler
uniform its own texture unit by incrementing a counter. This has various
subtle consequences for the GL driver, which hopefully don't matter. For
example, it will bind fewer textures at a time, but also rebind them
more often.
For some reason we keep TEXUNIT_VIDEO_NUM, because it limits the number
of hook passes that can be bound at the same time.
OSD rendering is an exception: we do many passes with the same shader,
and rebinding the texture each pass. For now, this is handled in an
unclean way, and we make the shader cache reserve texture unit 0 for the
OSD texture. At a later point, we should allocate that one dynamically
too, and just pass the texture unit to the OSD rendering code. Right now
I feel like vo_rpi.c (may it rot in hell) is in the way.
The caller now has to call gl_sc_reset(), and _after_ rendering. This
way we can unset OpenGL state that was setup for rendering. This affects
the shader program, for example. The next commit uses this to
automatically manage texture units via the shader cache.
vo_rpi.c changes untested.
Stops Mesa from restricting us to OpenGL 3.0. It also tries to create
GLES 3 contexts for drivers which do not just return a higher context
when requesting GLES 2.
I don't know whether this code is a good or bad idea. A not-so-good
aspect is that we don't check for EGL 1.5 (or 1.4 extensions) for some
of the more advanced context attributes. But EGL implementations should
be able to tolerate it and return an error, and then we'd use the
fallback.
This used to be shared, but since vo_rpi is going to be removed,
untangle them. There was barely any actual code shared since the recent
changes anyway.
As a subtle change, we also stop opening libGLESv2.so explicitly in the
vo_opengl backend, and use RTLD_DEFAULT instead.
Minimal support just for testing.
Only the window surface creation (including size determination) is
really platform specific, so this could be some generic thing with
platform-specific support as some sort of sub-driver, but on the other
hand I don't see much of a need for such a thing.
While most of the fbdev usage is done by the EGL driver, using this
fbdev ioctl is apparently the only way to get the display resolution.
Add a function to egl_helpers.c for creating an EGL context and make
context_x11egl.c use it. This is meant to be generic, and should work
with other windowing APIs as well. The other EGL-using code in mpv can
be switched to it.
The wrong enum got copied here, so it was essentially using the transfer
characteristics as the primaries (instead of the primaries), which
accidentally worked fine most of the time (since the two usually
coincided), but broke on weird/mistagged files.
The consequence of this was that e.g. hardware decoding with VAAPI-EGL
could sometimes not work if the compiler didn't support C11. (Although I
found this one on RPI, which also uses this mechanism.)
If the shader fails to compile, and assertion could trigger in
gl_sc_gen_shader_and_reset() due to the code trying to recreate the
shader every time, and re-appending the uniforms every time. Just reset
the uniform array to fix this.
Some disturbed GL drivers might not return anything for glGetShaderiv()
if the GL state got "lost", so initialize variables just for additional
robustness.
Since vo_rpi is going to be deprecated, better port its features to the
vo_opengl backend.
The most tricky part is the fact that recreating dispmanx elements will
conflict with the GL context. Fortunately, RPI's EGL support is
reasonably compliant, and we can transplant the context to newly created
dispmanx elements, making this much easier. This means unlike vo_rpi,
the GL state will actually not be recreated.
This overlay support specifically skips the OpenGL rendering chain, and
uses GL rendering only for OSD/subtitles. This is for devices which
don't have performant GL support.
hwdec_rpi.c contains code ported from vo_rpi.c. vo_rpi.c is going to be
deprecated. I left in the code for uploading sw surfaces (as it might
be slightly more efficient for rendering sw decoded video), although
it's dead code for now.
The nvidia examples use the old (as in CUDA 3.x) interop API which
is deprecated, and I think not even functional on recent versions
of CUDA for windows. As I was following the examples, I used this
old API.
So, let's update to the new API, and hopefully, it'll start working
on windows too.
Just another corner-caseish potential issue. Unlike unreffing the image
manually, unref_current_image() also takes care of properly unmapping
hwdec frames. (The corner-case part of this is that it's probably never
mapped at this point, but it's apparently not entirely guaranteed.)