Async compute in particular seems to cause problems on some drivers, and
even when supprted the benefits are not that massive from the tests I
have seen, so it's probably safe to keep off by default.
Async transfer on the other hand seems to work better and offers a more
substantial improvement, so it's kept on.
This gets confused by e.g. SPARSE_BIT on the TRANSFER_BIT, leading to
situations where "more specialized" is ambiguous and the logic breaks
down. So to fix it, only compare the subset we care about.
blit() implies scaling, copy() is the equivalent command to use when the
formats are compatible (same pixel size) and the rects have the same
dimensions.
This allows RAs with support for non-opaque FBO formats to use a more
appropriate FBO format for the output tex, possibly enabling a more
efficient blit operation.
This requires distinguishing between real formats (which can be used to
create textures) and fake formats (e.g. ra_gl's FBO hack).
On AMD devices, we only get one graphics pipe but several compute pipes
which can (in theory) run independently. As such, we should prefer
compute shaders over fragment shaders in scenarios where we expect them
to be better for parallelism.
This is amusingly trivial to do, and actually improves performance even
in a single-queue scenario.
Instead of using a single primary queue, we generate multiple
vk_cmdpools and pick the right one dynamically based on the intent.
This has a number of immediate benefits:
1. We can use async texture uploads
2. We can use the DMA engine for buffer updates
3. We can benefit from async compute on AMD GPUs
Unfortunately, the major downside is that due to the lack of QF
ownership tracking, we need to use CONCURRENT sharing for all resources
(buffers *and* images!). In theory, we could try figuring out a way to
get rid of the concurrent sharing for buffers (which is only needed for
compute shader UBOs), but even so, the concurrent sharing mode doesn't
really seem to have a significant impact over here (nvidia). It's
possible that other platforms may disagree.
Our deadlock-avoidance strategy is stupidly simple: Just flush the
command every time we need to switch queues, and make sure all
submission and callbacks happen in FIFO order. This required lifting the
cmds_pending and cmds_queued out from vk_cmdpool to mpvk_ctx, and some
functions died/got moved as a result, but that's a relatively minor
change.
On my hardware this is a fairly significant performance boost, mainly
due to async transfers. (Nvidia doesn't expose separate compute queues
anyway). On AMD, this should be a performance boost as well due to async
compute.
This is especially interesting for vulkan since it allows completely
skipping the layout transition as part of the renderpass. Unfortunately,
that also means it needs to be put into renderpass_params, as opposed to
renderpass_run_params (unlike #4777).
Closes#4777.
This uses the new vk_signal mechanism to order all access to textures.
This has several advantageS:
1. It allows real synchronization of image access across multiple frames
when using multiple queues for parallelism.
2. It allows using events instead of pipeline barriers, which is a
finer-grained synchronization primitive that allows for more
efficient layout transitions over longer durations.
This commit also restructures some of the implicit transition code for
renderpasses to be more flexible and correct. (Note: this technically
drops the ability to transition the image out of undefined layout when
not blending, but that was a bug anyway and needs to be done properly)
vo_gpu: vulkan: remove no-longer-true optimization
The change to the output_tex format makes this no longer true, and it
actually seems to hurt performance now as well. So just don't do it
anymore. I also realized it hurts performance when drawing an OSD, so
it's probably not a good idea anyway.
This combines VkSemaphores and VkEvents into a common umbrella
abstraction which can resolve to either.
We aggressively try to prefer VkEvents over VkSemaphores whenever the
conditions are met (1. we can unsignal the semaphore, i.e. it comes from
the same frame; and 2. it comes from the same queue).
Instead of being submitted immediately, commands are appended into an
internal submission queue, and the actual submission is done once per
frame (at the same time as queue cycling). Again, the benefits are not
immediately obvious because nothing benefits from this yet, but it will
make more sense for an upcoming vk_signal mechanism.
This also cleans up the way the ra_vk submission interacts with the
synchronization/callbacks from the ra_vk_ctx. Although currently, the
way the dependency is signalled is a bit hacky: normally it would be
associated with the ra_tex itself and waited on in the appropriate stage
implicitly. But that code is just temporary, so I'm keeping it in there
for a better commit order.
Instead of associating a single VkSemaphore with every command buffer
and allowing the user to ad-hoc wait on it during submission, make the
raw semaphores-to-signal array work like the raw semaphores-to-wait-on
array. Doesn't really provide a clear benefit yet, but it's required for
upcoming modifications.
1. No more static arrays (deps / callbacks / queues / cmds)
2. Allows safely recording multiple commands at the same time
3. Uses resources optimally by never over-allocating commands
Libav has been broken due to the hwdec changes. This was always a
temporary situation (depended on pending patches to be merged), although
it took a bit longer. This also restores the travis config.
One code change is needed in vd_lavc.c, because it checks the AV_PIX_FMT
for videotoolbox (as opposed to the mpv format identifier), which is not
available in Libav. Add an ifdef; the affected code is for a deprecated
option anyway.
This hack was part of a solution to VSync judder in desktop OpenGL on
Windows. Rather than using blocking-SwapBuffers(), mpv could use
DwmFlush() to wait for the image to be presented by the compositor.
Since this would only work while the compositor was running, and the
compositor was silently disabled when OpenGL entered exclusive
fullscreen mode, mpv needed a way to detect exclusive fullscreen mode.
The code that is being removed could detect exclusive fullscreen mode by
checking the state of an undocumented mutex using undocumented native
API functions, but because of how fragile it was, it was always meant to
be removed when a better solution for accurate VSync in OpenGL was
found. Since then, mpv got the dxinterop backend, which uses desktop
OpenGL but has accurate VSync. It also got a native Direct3D 11 backend,
which is a viable alternative to OpenGL on Windows.
For people who are still using desktop OpenGL with WGL, there shouldn't
be much of a difference, since mpv can use other API functions to detect
exclusive fullscreen.
Refactored and split the `reinit_window_state` code into four
separate functions:
- `update_window_style` used to update window styles without
modifying the window rect.
- `fit_window_on_screen` used to adjust the window size when it is
larger than the screen size. Added a helper function `fit_rect` to
fit one rect on another without using any data from w32 struct.
- `update_fullscreen_state` used to calculate the new fullscreen
state and adjust the window rect accordingly.
- `update_window_state` used to display the window on screen with
new size, position and ontop state.
This commit fixes three issues:
- fixed#4753 by skipping `fit_window_on_screen` for a maximized
window, since maximized window should already fit on the screen.
It should be noted that this bug was only reproducible with
`--fit-border` option which is enabled by default. The cause of the
bug is that after calling the `add_window_borders` for a maximized
window, the rect in result is slightly larger than the screen rect,
which is okay, `SetWindowPos` will interpret it as a maximized state
later, so no auto-fitting to screen size is needed here.
- fixed#5215 by skipping `fit_window_on_screen` when leaving fullscreen.
On a multi-monitor system if the mpv window was stretched to cover
multiple monitors, its size was reset after switching back from
fullscreen to fit the size of the active monitor. Also, when changing
`--ontop` and `--border` options, now only the
`update_window_style` and `update_window_state` functions are used,
so `fit_window_on_screen` is not used for them too.
- fixed#2451 by moving the `ITaskbarList2_MarkFullscreenWindow`
below the `SetWindowPos`. If the taskbar is notified about fullscreen
state before the window is shown on screen, the taskbar button could
be missing until Alt-TAB is pressed, usually it was reproducible on
Windows 8.
Other changes:
- In `update_fullscreen_state` the `reset window bounds` debug
message now reports client area size and position, instead of window area
size and position. This is done for consistency with debug messages
in handling fullscreen state above in this function, since they also print
window bounds of the client area.
- Refactored `gui_thread_reconfig`. Added a new window flag `fit_on_screen`
to fit the window on screen even when leaving fullscreen. This is needed
for the case when the new video opened while the window is still in the
fullscreen state.
- Moved parent and fullscreen state checks out from the WM_MOVING to
`snap_to_screen_edges` function for consistency with other functions.
There's no point in keeping these checks out of the function body.
When window and screen size and position are stored in RECT, it's
much easier to modify them using WinAPI functions.
Added two macros to get width and height of the rect.
I've decided that MP_TRACE means “noisy spam per frame”, whereas
MP_DBG just means “more verbose debugging messages than MSGL_V”.
Basically, MSGL_DBG shouldn't create spam per frame like it currently
does, and MSGL_V should make sense to the end-user and provide mostly
additional informational output.
MP_DBG is basically what I want to make the new default for --log-file,
so the cut-off point for MP_DBG is if we probably want to know if for
debugging purposes but the user most likely doesn't care about on the
terminal.
Also, the debug callbacks for libass and ffmpeg got bumped in their
verbosity levels slightly, because being external components they're a
bit less relevant to mpv debugging, and a bit too over-eager in what
they consider to be relevant information.
I exclusively used the "try it on my machine and remove messages from
MSGL_* until it does what I want it to" approach of refactoring, so
YMMV.
Annoying exception that makes no sense to keep. Normally, users or
client applications will either use --hwdec=auto, or not set the option
at all, which both leads to the expected result.
Full range YUV causes problems everywhere. For example it's usually the
wrong choice when using encoding mode, and libswscale sometimes messes
up when converting to full range too. (In this partricular case, we
found that converting rgba->yuv420p16 full range actually seems to
output limited range.)
This actually restores a similar heueristic from the late vf_scale.c.
When autoprobing the hwdec interops (which now happens to all compiled
interops if hardware decoding is used), failure to load an interop
should not print an error in the normal case. So hide it.
(We could make the log level conditional on whether autoprobing is used,
but directly loading it without autoprobing is obscure, and most other
interops don't do this either.)
For METHOD_INTERNAL hwdecs (non-copy cases), make sure the VO interops
are always loaded, because those decoders will output hardware pixel
formats, which will need special support in vo_gpu. Otherwise,
initialization will fail, complaining that it can't convert the output
format to something the VO supports.
* Distinguish between the window being moved or not.
* Skip trying to snap if currently in full screen or an embedded
window.
* Exit snapped state if the size changed when the window was being
moved.
Check the expected width and height against up-to-date
window placement. If they do not match, we will consider snapping
to have happened on Windows' side.
Fixes display-sync (though if you change virtual desktops you'll need to seek
to re-enable display-sync) partially under wayland.
As an advantage, rendering is completely disabled if you change desktops or
alt+tab so you lose no performance if you leave mpv running elsewhere as long
as it isn't visible.
This could also be ported to other VOs which supports it.
We need to support hardware/drivers which do not support ARGB8888 in
their primary plane.
We also use p->primary_plane_format when creating the gbm surface, to
make sure it always matches (in actuality there should be little
difference).
Passing in an invalid DRM overlay id with the --drm-overlay option would
cause drmplane to be freed twice: once in the for-loop and once at the
error-handler label fail.
Solve by setting drmpanel to NULL after freeing it.
Also the 'return false' statement after the error handler label should
probably be 'return NULL', given that the return type of
drm_atomic_create_context returns a pointer.
vo_x11 and vo_xv need this. According to the Linux manpage, all involved
functions are POSIX-2001 anyway. (I just assumed they were not, because
they're mostly System V UNIX legacy garbage.)
If the codec uses AV_CODEC_HW_CONFIG_METHOD_INTERNAL, and we're using
the -copy method, then don't request the native pix_fmt. It might not
have a AVFrame.hw_frames_ctx set, and we couldn't read back at all. On
top of that, most of those decoders probably don't provide read-back
when using such opaque formats anyway, while providing separate decoding
modes to decode to RAM.
Finally get rid of all the HWDEC_* things, and instead rely on the
libavutil equivalents. vdpau still uses a shitty hack, but fuck the
vdpau code.
Remove all the now unneeded remains. The vdpau preemption thing was not
unused anymore; if someone cares this could probably be restored.
This code is for trying to avoid using an emulation layer when using
auto probing, so that we end up using the actual API the drivers
provide. It was destroyed in the recent refactor.
With the recent changes, mpv's internal mechanisms got synced to
libavcodec's once more. Some things are still needed for filters (until
the mechanism gets replaced), but there's no need to require other hwdec
methods to use these fields. So remove them where they are unnecessary.
Also fix some minor leaks in the dxva2 backends, and set the driver_name
field in the Apple ones. Untested on Apple crap.
Otherwise, if e.g. "nvdec" didn't work, but "nvdec-copy" did, it would
never try "vdpau", which is actually the next non-copy mode on the
autprobe list. It's really expected that it selects "vdpau". Fix this by
sorting the -copy modes to the end of the final hwdec list.
But we still don't want preferred -copy modes like "nvdec-copy" to be
sorted after fragile non-preferred modes like "cuda", and --hwdec=auto
should prefer "nvdec-copy" over it, so make sure the copying mode does
not get precedence over preferred vs. non-preferred mode.
Also simplify the existing auto_pos sorting condition, and fix the
fallback sort order (although that doesn't matter too much).
Change it from explicit metadata about every hwaccel method to trying to
get it from libavcodec. As shown by add_all_hwdec_methods(), this is a
quite bumpy road, and a bit worse than expected.
This will probably cause a bunch of regressions. In particular I didn't
check all the strange decoder wrappers, which all cause some sort of
special cases each. You're volunteering for beta testing by using this
commit.
One interesting thing is that we completely get rid of mp_hwdec_ctx in
vd_lavc.c, and that HWDEC_* mostly goes away (some filters still use it,
and the VO hwdec interops still have a lot of code to set it up, so it's
not going away completely for now).
The libavcodec mediacodec support does not conform to the new hwaccel
APIs yet. It has been agreed uppon that this glue code can be deleted
for now, and support for it will be restored at a later point.
Readding would require that it supports the AVCodecContext.hw_device_ctx
API. The hw_device_ctx would then contain the surface ID.
vo_mediacodec_embed would actually perform the task of creating
vo.hwdec_devs and adding a mp_hwdec_ctx, whose av_device_ref is a
AVHWDeviceContext containing the android surface.
It makes more sense to have it in the general video directory (along
with vdpau.c and vaapi.c), since the decoder source files don't even
access it anymore.
Like with all hwaccels, there's little that is actually specific to
decoding (which has been moved away anyway), and what is left are
declarations (which will also go away soon).
Lots of shit code for nothing. We probably could just use libavutil's
code for all of this. But for now go with this, since it tends to
prevent stupid terminal messages during probing (libavutil has no
mechanism to selectively suppress errors specifically during probing).
Ignores the "emulated" API flag (for avoiding vaapi/vdpau wrappers), but
it doesn't matter that much for -copy anyway.
This leaked 2 unreffed AVFrame structs (roughly 1KB) per decoded frame.
Can I blame the FFmpeg API and the weird difference between freeing and
unreffing an AVFrame?
The idea is to get rid of vd_lavc_hwdec, so special functionality like
this has to go somewhere else. At this point, hwframes_refine is only
needed for d3d11, and it doesn't do much, so for now the new callback
has no context. In can be made more fancy if really needed.
The testing_only field is not referenced anymore with vaglx removed and
the previous commit dropping all uses.
The ra_hwdec_driver.api field became unused with the previous commit,
but all hwdec interop drivers still initialized it.
Since this touches highly OS-specific code, build regressions are
possible (plus the previous commit might break hw decoding at runtime).
At least hwdec_cuda.c still used the .api field, other than initializing
it.
Make the VO<->decoder interface capable of supporting multiple hwdec
APIs at once. The main gain is that this simplifies autoprobing a lot.
Before this change, it could happen that the VO loaded the "wrong" hwdec
API, and the decoder was stuck with the choice (breaking hw decoding).
With the change applied, the VO simply loads all available APIs, so
autoprobing trickery is left entirely to the decoder.
In the past, we were quite careful about not accidentally loading the
wrong interop drivers. This was in part to make sure autoprobing works,
but also because libva had this obnoxious bug of dumping garbage to
stderr when using the API. libva was fixed, so this is not a problem
anymore.
The --opengl-hwdec-interop option is changed in various ways (again...),
and renamed to --gpu-hwdec-interop. It does not have much use anymore,
other than debugging. It's notable that the order in the hwdec interop
array ra_hwdec_drivers[] still matters if multiple drivers support the
same image formats, so the option can explicitly force one, if that
should ever be necessary, or more likely, for debugging. One example are
the ra_hwdec_d3d11egl and ra_hwdec_d3d11eglrgb drivers, which both
support d3d11 input.
vo_gpu now always loads the interop lazily by default, but when it does,
it loads them all. vo_opengl_cb now always loads them when the GL
context handle is initialized. I don't expect that this causes any
problems.
It's now possible to do things like changing between vdpau and nvdec
decoding at runtime.
This is also preparation for cleaning up vd_lavc.c hwdec autoprobing.
It's another reason why hwdec_devices_request_all() does not take a
hwdec type anymore.
nvdec aka cuvid aka cuda should work much better than vdpau, and support
newer codecs (such as vp9), and more advanced surface formats (like 10
bit).
This requires moving the d3d hwaccels in the autoprobe order, since on
Windows, d3d decoding should be preferred over nvidia proprietary stuff.
Users of older drivers will need to force --hwdec=vdpau, since it could
happen that the vo_gpu cuda hwdec interop loads (so the vdpau interop is
not loaded), but the hwdec itself doesn't work.
I expect this does not break AMD (which still needs vdpau for vo_gpu
interop, until libva is fixed so it can fully support AMD).
This has stopped being useful a long time ago, and it's the only GPL
source file in the vo_gpu source directories. Recently it wasn't even
loaded at all, unless you forced loading it.
They were added to the "to deleted" list and never relicensed, because I
thought I'd delete them early. But it's possible that they'll stay in
mpv for a longer time, so relicense them. Still leaving them as
deprecated and scheduled for removal, so they can still be dropped once
there is a better way to deal with them, if they get annoying, or if a
better mechanism is found that makes them unnecessary.
All contributors agreed. There are some minor changes by people who did
not agree, but these are all not relevant or have been removed.
Almost all of them had their guts removed and replaced by libavfilter
long ago, but remove them anyway. They're pointless and have been
scheduled for deprecation.
Still leave vf_format (because we need it in some form) and vf_sub (not
sure).
This will break some builtin functionality: lavfi yadif defaults are
different, auto rotation and stereo3d downconversion are broken. These
might be fixed later.
We want to drop vf_scale, but we still need a way to auto convert
between imgfmts. In particular, vf.c will auto insert the "scale" filter
if the VO doesn't support a pixfmt.
To avoid chaos, create a new vf_convert.c filter, based on vf_scale.c,
but without the unrelicensed code parts. In particular, this filter does
not do scaling and has no options. It merely converts from one imgfmt to
another, if needed.
The D3D11_CREATE_DEVICE_BGRA_SUPPORT flag doesn't enable support for
BGRA textures. BGRA textures will be supported whether or not the flag
is passed. The flag just fails device creation if they are not supported
as an API convenience for programs that need BGRA textures, such as
programs that use D2D or D3D9 interop. We can handle devices without
BGRA support fine, so don't bother with the flag.
For consistency with already implemented shcore.dll
function loading in w32->api:
Moved loading of imm32.dll to w32_api_load, and declare
pImmDisableIME function pointer in the w32->api struct.
Removed unloading of imm32.dll.
Seems like the last refactor to this code broke playing flipped images,
at least with --opengl-pbo --gpu-api=opengl.
Flipping is rather a shitmess. The main problem is that OpenGL does not
support flipped uploading. The original vo_gl implementation considered
it important to handle the flipped case efficiently, so instead of
uploading the image line by line backwards, it uploaded it flipped, and
then flipped it in the renderer (basically the upload path ignored the
flipping). The ra code and backends probably have an insane and
inconsistent mix of semantics, so fix this by never passing it flipped
images in the first place.
In the future, the backends should probably support flipped images
directly.
Fixes#5097.
Like the manual says, this is technically undefined behaviour. See:
https://msdn.microsoft.com/en-us/library/windows/desktop/ff476085.aspx
In particular, MSDN says texture arrays created with the BIND_DECODER
flag cannot be used with CreateShaderResourceView, which means they
can't be sampled through SRVs like normal Direct3D textures. However,
some programs (Google Chrome included) do this anyway for performance
and power-usage reasons, and it appears to work with most drivers.
Older AMD drivers had a "bug" with zero-copy decoding, but this appears
to have been fixed. See #3255, #3464 and http://crbug.com/623029.
The shader cache in ra_d3d11 caches the result of shaderc, crossc and
the D3DCompiler DLL, so it should be invalidated when any of those
components are updated. This should make the cache more reliable, which
makes it safer to enable gpu-shader-cache-dir. Shader compilation is
slow with D3D11, so gpu-shader-cache-dir is highly necessary
Some shaders take a _long_ time to compile with the Direct3D compiler.
The ANGLE backend had this problem too, to a certain extent. Logging
should help identify which shaders cause long stalls and could also help
with benchmarking ways of reducing compile times.
ra_d3d11 uses the SPIR-V compiler to translate GLSL to SPIR-V, which is
then translated to HLSL. This means it always exposes the same GLSL
version that the SPIR-V compiler supports (4.50 for shaderc/glslang.)
Despite claiming to support GLSL 4.50, some features that are tied to
the GLSL version in OpenGL are not supported by ra_d3d11 when targeting
legacy Direct3D feature levels.
This includes two features that mpv relies on:
- Reading from gl_FragCoord in the fragment shader (requires FL 10_0)
- textureGather from any texture component (requires FL 11_0)
These features have been exposed as new RA caps.
This is a new RA/vo_gpu backend that uses Direct3D 11. The GLSL
generated by vo_gpu is cross-compiled to HLSL with SPIRV-Cross.
What works:
- All of mpv's internal shaders should work, including compute shaders.
- Some external shaders have been tested and work, including RAVU and
adaptive-sharpen.
- Non-dumb mode works, even on very old hardware. Most features work at
feature level 9_3 and all features work at feature level 10_0. Some
features also work at feature level 9_1 and 9_2, but without high-bit-
depth FBOs, it's not very useful. (Hardware this old is probably not
fast enough for advanced features anyway.)
Note: This is more compatible than ANGLE, which requires 9_3 to work
at all (GLES 2.0,) and 10_1 for non-dumb-mode (GLES 3.0.)
- Hardware decoding with D3D11VA, including decoding of 10-bit formats
without truncation to 8-bit.
What doesn't work / can be improved:
- PBO upload and direct rendering does not work yet. Direct rendering
requires persistent-mapped PBOs because the decoder needs to be able
to read data from images that have already been decoded and uploaded.
Unfortunately, it seems like persistent-mapped PBOs are fundamentally
incompatible with D3D11, which requires all resources to use driver-
managed memory and requires memory to be unmapped (and hence pointers
to be invalidated) when a resource is used in a draw or copy
operation.
However it might be possible to use D3D11's limited multithreading
capabilities to emulate some features of PBOs, like asynchronous
texture uploading.
- The blit() and clear() operations don't have equivalents in the D3D11
API that handle all cases, so in most cases, they have to be emulated
with a shader. This is currently done inside ra_d3d11, but ideally it
would be done in generic code, so it can take advantage of mpv's
shader generation utilities.
- SPIRV-Cross is used through a NIH C-compatible wrapper library, since
it does not expose a C interface itself.
The library is available here: https://github.com/rossy/crossc
- The D3D11 context could be made to support more modern DXGI features
in future. For example, it should be possible to add support for
high-bit-depth and HDR output with DXGI 1.5/1.6.
Backported from @haasn's change to libplacebo, except in the current RA,
there's nothing to indicate an ra_format can be bound as a storage
image, so there's no way to force all of these formats to have a
glsl_format. Instead, the layout qualifier will be removed if
glsl_format is NULL.
This is needed for the upcoming ra_d3d11 backend. In Direct3D 11, while
loading float values from unorm images often works as expected, it's
technically undefined behaviour, and in Windows 10, it will cause the
debug layer to spam the log with error messages. Also, apparently in
GLSL, the format name must match the image's format exactly (but in
Direct3D, it just has to have the same component type.)
Backported from @haasn's change to libplacebo. More flexible than the
previous "shared || non-shared" distinction. The extra flexibility is
needed for Direct3D 11, but it also doesn't hurt code-wise.
For some reason vo_lavc's draw_image can buffer the frame and encode it
only later. Also, there is logic for rendering the OSD (i.e. subtitles)
only when needed.
In theory this can lead to subtitles being pruned before it tries to
render them (as the subtitle logic doesn't know that the VO still needs
them later), although this probably never happens in reality.
The worse issue, that actually happened, is that if the last frame gets
buffered, it attempts to render subtitles in the uninit callback. At
this point, the subtitle decoder is already torn down and all subtitles
removed, thus it will draw nothing. This didn't always happen. I'm not
sure why - potentially in the working cases, the frame wasn't buffered.
Since this logic doesn't have much worth, except a minor performance
advantage if frames with subtitles are dropped, just remove it.
Hopefully fixes#4689.
Repeating frames (for display-sync) is not supposed to render the entire
frame again. When using hardware decoding, it unfortunately did: the
renderer uses the frame ID to check whether the frame data changed, and
unmapping the hwdec frame clears it.
Essentially reverts commit 761eeacf54. Back then I probably
thought it would be a good idea to release the hwdec image quickly in
order to return it to the decoder, but they're referenced anyway.
This should increase the performance and reduce GPU work.
Normally such code is didsabled by have_mglsl==false in
check_gl_features(), but apparently not this one.
Just fix it. Seems also more readable.
Fixes#5069.
Apparently this is required, but it doesn't check for it. To be fair,
this was tested by creating a compatibility context and pretending it's
GL 2.1. GL_ARB_shader_storage_buffer_object actually requires GL 4.0 or
up, but GL_ARB_uniform_buffer_object requires only GL 2.0.
vo_gpu.c will call gl_video_icc_auto_enabled() to check whether it
should retrieve the ICC profile. But the value returned by this function
will be outdated, because gl_video_update_options() is not called yet.
Change the order of function calls so that this is done after updating
the options.
(This is fairly chaotic, but I guess this code will be refactored a
dozen of times anyway in the future.)
All this code used to be required by the old variants of the libavcodec
hw decoding APIs. Almost all of that is gone, although the mediacodec
API unfortunately still pulls in some old stuff (but not all of it).
(mediacodec build/functionality is untested, but should work.)
All of this was dead code and completely unused.
get_buffer2_hwdec() is the biggest chunk. One unfortunate thing about it
is that, while it was active, it could perform a software fallback much
faster, because it didn't have to wait until a full frame is decoded (it
actually decoded a full frame, but the current code has to decode many
more frames due to the codec delay, because the current code waits until
the API returns a decoded frame.) We should probably restore the latter,
although since it's an optional optimization, and the current behavior
doesn't change with the removal of this code, don't actually do anything
about it.
This is where it should be. It only wasn't because of an old libavcodec
bug, that returned the side data only on every IDR. This required some
sort of caching, which is now dropped. (mp_image wouldn't have been able
to do this kind of caching, because this code is stateless.) We don't
support these old libavcodec versions anymore, which is why this is not
needed anymore.
Also move initialization of rotation/stereo stuff to dec_video.c.
This simply didn't work. Unlike cuda-copy, this is a true hwaccel, and
obviously we need to provide it a device.
Implement this in a relatively generic way, which can probably reused
directly by videotoolbox (not doing this yet because it would require
testing on OSX).
Like with cuda-copy, --cuda-decode-device is ignored. We might be able
to provide a more general way to select devices at some later point.
This is just a dumb consequence of HWDEC_ types somehow being part of
both decoder and VO. Obviously, the VO should only care about supporting
specific hardware surface types or providing specific device types, but
until they are separated, stupid unintuitive mismatches will occur.
See manpage additions.
(In ffmpeg-mpv and Libav, this is still called "cuvid". Libav won't work
yet, because it has no frame params support yet, but this could get
fixed soon.)
This removes the need for codec- and API-specific knowledge in the
libavcodec hardware acceleration API user. For mpv, this removes the
need for vd_lavc_hwdec.pixfmt_map and a few other things. (For now, we
still keep the "old" parts for the sake of supporting older Libav, and
FFgarbage.)
params->rc was ignored in the calculation for the buffer size. I fucking
hate this stupid ra_tex_upload signature where *rc is randomly relevant
or not.
Coverity complains about this, but it's probably a false positive.
Anyway, rewrite it in a slightly more readable way. Now it's more
obvious that it is correct.
Should speed up seeks.
(Unfortunately it's useless for backstepping. Backstepping is like
precise seeking, except we're unable to drop frames, as we can't know
the previous frame if we drop it.)
Comparing mpv's implementation against the ACES ODR reference samples
and algorithms, it seems like they're happy desaturating highlights
_way_ more aggressively than mpv currently does. And indeed, looking at
some example clips like The Redwoods (which is actually well-mastered),
the current desaturation produces unnatural-looking brightness fringes
where the sky meets the treeline.
Adjust the algorithm to make it apply to a much larger, more gradual
brightness region; and change the interpretation of the parameter. As a
bonus, the new parameter is actually sanely scaled (higher values = more
desaturation). Also, make it scale based on the signal level instead of
the luminance, to avoid under-desaturating bright blues.
The new_segment field was used to track the decoder data flow handler of
timeline boundaries, which are used for ordered chapters etc. (anything
that sets demuxer_desc.load_timeline). This broke seeking with the
demuxer cache enabled. The demuxer is expected to set the new_segment
field after every seek or segment boundary switch, so the cached packets
basically contained incorrect values for this, and the decoders were not
initialized correctly.
Fix this by getting rid of the flag completely. Let the decoders instead
compare the segment information by content, which is hopefully enough.
(In theory, two segments with same information could perhaps appear in
broken-ish corner cases, or in an attempt to simulate looping, and such.
I preferred the simple solution over others, such as generating unique
and stable segment IDs.)
We still add a "segmented" field to make it explicit whether segments
are used, instead of doing something silly like testing arbitrary other
segment fields for validity.
Cached seeking with timeline stuff is still slightly broken even with
this commit: the seek logic is not aware of the overlap that segments
can have, and the timestamp clamping that needs to be performed in
theory to account for the fact that a packet might contain a frame that
is always clipped off by segment handling. This can be fixed later.
This commit allows to use the AV_PIX_FMT_DRM_PRIME newly introduced
format in ffmpeg that allows decoders to provide an AVDRMFrameDescriptor
struct.
That struct holds dmabuf fds and information allowing zerocopy rendering
using KMS / DRM Atomic.
This has been tested on RockChip ROCK64 device.
Since we divide by it in a couple of places and compositors can be crazy,
its better to be safe than sorry.
Also checks cursor spawn durinig init (pointless since it does again on
cursor entry but its more correct).
It seems the cursor hadn't had its position properly adjusted when scaled.
Hence, bring back correct buffer scaling to make the cursor look fine.
Also the cursor surface now gets created sooner so that's better.
Regression since ec6e8a31e0. Removal of the explicit else case
always applies the conversion to premultiplied alpha in the else branch.
We want to scale with multiplied alpha, but we don't want to multiply
with alpha again on top of it.
Fixes#4983, hopefully.
This should be functionally identical to rgba16f, since the formats only
differ in their representation on the CPU, but it could be useful for RA
backends that don't expose rgba16f, like Vulkan. It's definitely useful
for the WIP D3D11 backend.
With video paused, changing the brightness controls (or similar) would
sometimes not rerender the video frame. So the OSD would redraw, but the
video wouldn't change. This is caused by output caching, and a redraw
request is free to return the cached frame. Change it such to invalidate
the cached frame if any of the options or the equalizer change.
In theory, gl_video_reset_surfaces() could be called if the equalizer
changes - this would apparently force interpolatzion to redraw all
frames. But this looks kind of crappy when changing the equalizer during
playback. It'll "eventually" use the correct settings anyway, and when
paused interpolation is off.
This was phased out, and was used only by vdpau by now. Drop the
mechanism and the vdpau special code, which means screenshots won't
include the vf_vdpaupp processing anymore. (I don't care enough about
vdpau, it's on its way out.)
The mechanism introduced in b135af6842 assumed AVHWFramesContext would
be enough. Apparently it's not - the intended use with Rockchip (not
Rokchip btw.) requires accessing actual frame data in order to access
the AVDRMFrameDescriptor struct.
Just pass the entire mp_image to the new function. This is more
flexible, although it slightly worries me that it will be less reusable
for things which require setting up mp_image_params before any real
frames are processed (such as filters).
The same should happen with any other side data that matters to mpv,
otherwise filters will drop it.
(No, don't try to argue that mpv should use AVFrame. That won't work.)
ffmpeg_garbage() is copy&paste from frame_new_side_data() in FFmpeg
(roughly feed201849b8f91), because it's not public API. The name
reflects my opinion about FFmpeg's API.
In mp_image_to_av_frame(), change the too-fragile
*new_ref = (struct mp_image){0};
into explicitly zeroing out the fields that are "transferred" to the
created AVFrame.
Merge mp_image_copy_fields_to_av_frame() into mp_image_from_av_frame(),
same for the other direction.
There isn't any good reason to keep them separate, and the refcounting
handling makes it only more awkward.
It seems this will be useful for Rokchip DRM hwcontext integration.
DRM hwcontexts have additional internal structure which can be different
depending on the decoder, and which is not part of the generic hwcontext
API. Rockchip has 1 layer, which EGL interop happens to translate to a
RGB texture, while VAAPI (mapped as DRM hwcontext) will use multiple
layers. Both will use sw_format=nv12, and thus are indistinguishable on
the mp_image_params level. But this is needed to initialize the EGL
mapping and the vo_gpu video renderer correctly.
We hope that the layer count is enough to tell whether EGL will
translate the data to a RGB texture (vs. 2 texture resembling raw nv12
data). For that we introduce MP_IMAGE_HW_FLAG_OPAQUE.
This commit adds the flag, infrastructure to set it, and an "example"
for D3D11.
The D3D11 addition is quite useless at this point. But later we want to
get rid of d3d11_update_image_attribs() anyway, while we still need a
way to force d3d11vpp filter insertion, so maybe it has some
justification (who knows). In any case it makes testing this easier.
Obviously it also adds some basic support for triggering the opaque
format for decoding, which will use a driver-specific format, but which
is not supported in shaders. The opaque flag is not used to determine
whether d3d11vpp needs to be inserted, though.
Mostly an obscure option for testing. But --videotoolbox-format can be
deprecated, as it becomes redundant.
We rely on the libavutil hwcontext implementation to reject invalid
pixfmts, or not to blow up if they are incompatible.
This was confusing at best. Change it to output the actual choices.
(Seems like in the end it's always me who has to clean up other people's
bullshit.)
Context names were not unique - but they should be, so fix it. The whole
point of the original --opengl-backend option was to side-step the
tricky auto-detection, so you know exactly what you get. The goal of
this commit is to make --gpu-context work the same way. Fix the
non-unique names by appending "vk" to the names.
Keep in mind that this was not suitable for slecting the "UI" backend
anyway, since "x11" would force GLX, whereas people on not-NVIDIA
actually want "x11egl". Users trying to use --gpu-context=x11 to force
the X11 backend would always end up with GLX, which would at least break
VAAPI hardware decoding for them. Basically the idea that this option
could select the "UI" type is completely broken - it selects an
implementation, which implies a UI. Selecting the UI type This would
require a separate mechanism. (Although in theory this separate
mechanism could be part of the --gpu-context option - in any case,
someone would have to implement it.)
To achieve help output that can actually be understood, just duplicate
the code. Most of that code is duplicated anyway, and trying to share
just the list code with the result of making the output unreadable
doesn't make too much sense. If we wanted to save code/effort, we could
just remove the help output altogether.
--gpu-api has non-unique entries, and it would be nice to group them
(e.g. list all OpenGL capable contexts with "opengl"), but C makes this
simple idea too much of a pain, so don't do it.
Also remove a stray tab from the android entry on the manpage.
If the chroma location is missing, vo_gpu will use centered chroma.
Select a better chroma location by default: normally, it will always be
MPEG video chroma location. If full levels are used, use JPEG chroma
location, because that sort of sounds like it could make sense as it
might coincide with JPEG being decoded.
See e.g. #4804.
Every compositor (including toy compositors) has had support for wl_output v2
since forever, so there's little point in supporting degraded output for 5 year
old releases (especially considering we require zxdg6 which is far more recent).
This adds symbol information to the generated SPIR-V, which shows up in
the SPIR-V assembly dump. It's also useful for potential RA backends
that use SPIRV-Cross, since the symbol information is used in the
generated shader source.
This should actually cover all of them, if you take into account that
some unchanged GPL source files include header files with such checks.
Also this was done already for the libaf derived code.
This is only for "safety" and to avoid misunderstandings.
It turns out compositors which do scaling scale the cursor as well,
so every single surface needs to get scaled too.
Also, 32 corresponds to the default size for both GTK+ and KDE.
This new interface in libva2 offers a cleaner way to export surfaces
which can then be imported to EGL. In particular, this works with
the Mesa driver, so we can have proper playback without a pointless
download and upload on AMD cards.
This change does nothing with libva1, and will fall back to the
libva1 interface (vaDeriveImage() + vaAcquireBufferHandle()) if
vaExportSurfaceHandle() is not present.
At the moment, rendering on Android requires ``--vo=opengl-cb`` and
a lot of java<->c++ bridging code to receive the receive and react to
the render callback in java. Performance also suffers with opengl-cb,
due to the overhead of context switching in JNI.
With this patch, Android can render using ``--vo=gpu --gpu-context=android``
(after setting ``--wid`` to point to an android.view.Surface on-screen).
MediaCodec uses a fixed number of output buffers to hold frames, and
expects that output buffers will be released as soon as possible. Once
rendered, the underlying frame is automatically released and cannot be
reused or rerendered.
The new VO_CAP_NOREDRAW forces mpv to release frames immediately after
they are rendered or dropped, to ensure that MediaCodec decoder does not
run out of buffers and stall out.
This partiular format is not marked as AV_PIX_FMT_FLAG_RGB in FFmpeg's
pixdesc table, so mpv assumed it's a YUV format.
This is a regression, since the old code in mp_imgfmt_get_desc() also
treated this format specially to avoid this problem. Another format
which was special-cased in the old code was AV_PIX_FMT_MONOBLACK, so
make an exception for it as well.
Maybe this problem could be avoided by mp_image_params_guess_csp() not
forcing certain colorimetric parameters by the implied colorspace, but
certainly that would cause other problems. At least there are mistagged
files out there that would break. (Do we actually care?)
Fixes#4965.
This commit:
- Implements output tracking (e.g. monitor plug/unplug)
- Creates the surface during registry (no other dependencies)
- Queues the callback immediately after surface creation
- Cleaner and better event handling (functions return directly)
- Better reconfigure handling (resizes reduced to 1 during init)
- Don't unnecessarily resize (if dimensions match)
Apart from that fixes 2 potential memory leaks (mime type and window
title), 2 string ownership issues (output name and make need to be
dup'd), fixes some style issues (switches were indented) and finally
adds messages when disabling/enabling idle inhibition.
The callback setter function was removed in preparation for the commit
which will use the frame event cb because it was unnecessary.
The VO code resets each flag individually, and it doesn't do it for this one.
Also make the prints use the struct names rather than the hardcoded ones,
forgot to add those to the last wayland_common commit.
iive agreed to relicense things that are still in mpv to LGPLv2.1. So
change the licenses of the affected files, and rename the configure
switch for LGPL mode to --enable-preliminary-lgpl2.
(The "preliminary" part will probably be removed from the configure
switch soon as well.)
Also player/main.c hasn't had GPL parts since a few commits ago.
The wayland code was written more than 4 years ago when wayland wasn't
even at version 1.0. This commit rewrites everything in a more modern way,
switches to using the new xdg v6 shell interface which solves a lot of bugs
and makes mpv tiling-friedly, adds support for drag and drop, adds support
for touchscreens, adds support for KDE's server decorations protocol,
and finally adds support for the new idle-inhibitor protocol.
It does not yet use the frame callback as a main rendering loop driver,
this will happen with a later commit.
The existing code in check_ext() avoided false positive due to
sub-strings, but allowed false negatives. Fix this with slightly better
search code, and make it available as function to other source files.
(There are some cases of strstr() still around.)
Unless FBOs are unsupported, this works. In particular, it's required to
get ICC profiles working in voluntary dumb mode. So instead of
blanket-disabling it, only disable it in the !have_fbo false case.
We allowed any input format that was generally supported by libva, but
this is probably nonsense, as the actual surface format was always fixed
to nv12. We would have to check whether libva can upload a given pixel
format to a nv12 surface. Or we would have to use a separate frame pool
for input surfaces with the exact sw_format - but then we'd also need to
check whether the vaapi VideoProc supports the surface type.
Hardcode nv12 and yuv420p as input formats, which we know can be
uploaded to nv12 surfaces. In theory we could get a list of supported
upload formats from libavutil, but that also require allocating a dummy
hw frames context just for the query.
Add a comment to the upload code why we can allocate an output surface
for input.
In the long run, we'll probably want to use libavfilter's vaapi
deinterlacer, but for now this would break at least user options.
The current check_va_status() function could probably be argued to be
derived from the original VAAPI's patch check_status() function, thus
GPL-only. While I have my doubts that it applies to an idiom on this
level, it's better to replace it. Similar idea, different expression
equals no copyright association.
An earlier commit message promised this, but it was forgotten.
Originally mpv vaapi support was based on the MPlayer-vaapi patches.
These were never merged in upstream MPlayer. The license headers
indicated they were GPL-only. Although the actual author agreed to
relicensing, the company employing him to write this code did not, so
the original code is unusable to us.
Fortunately, vaapi support was refactored and rewritten several times,
meaning little code is actually left. The previous commits removed or
moved that to GPL-only code. Namely, vo_vaapi.c remains GPL-only. The
other code went away or became unnecessary mainly because libavcodec
itself gained the ability to manage the hw decoder, and libavutil
provides code to manage vaapi surfaces. We also changed to mainly using
EGL interop, making any of the old rendering code unnecessary.
hwdec_vaglx.c is still GPL. It's possibly relicensable, because much of
it was changed, but I'm not too sure and further investigation would be
required. Also, this has been disabled by default for a while now, so
bothering with this is a waste of time. This commit simply disables it
at compile time as well in LGPL mode.
Done for license reasons. vo_vaapi.c is turned into some kind of
dumpster fire, and we'll remove it as soon as I'm mentally ready for
unkind users to complain about removal of this old POS.
This is for relicensing. Some of this code is loosely based on
vo_vaapi.c from the original MPlayer-vaapi patches. Most of the code has
changed, and only the initialization code and check_status() look
remotely similar. The initialization code is changed to be like Libav's
(hwcontext_vaapi.c). check_va_status() is just a C idiom, but to play it
safe, we'll either drop it from LGPL code (or recreate it).
vaapi.c still contains plenty of code from the original patches, but the
next commits will move them out of the LGPL code paths.
Seems to be fixed upstream in the nvidia driver, so it's probably a good
idea to 1. force the layout and 2. remove the warning, as it now
actually works. Users with older drivers would run into errors, but they
can still use shaderc as a replacement. (And it's not like the old
status quo was any better)
This was always set to the length of the VAO, but it should have been
set to the number of vertex attribs actually in use for this frame. No
idea how that managed to survive the test framework on nvidia/linux, but
ANGLE caught it.
This has several advantages:
1. no more redundant texcoords when we don't need them
2. no more arbitrary limit on how many textures we can bind
3. (that extends to user shaders as well)
4. no more arbitrary limits on tscale radius
To realize this, the VAO was moved from a hacky stateful approach
(gl_sc_set_vertex_attribs) - which always bothered me since it was
required for compute shaders as well even though they ignored it - to be
a proper parameter of gl_sc_dispatch_draw, and internally plumbed into
gl_sc_generate, which will make a (properly mangled) deep copy into
params.vertex_attribs.
FlagBits is just the name of the enum. The actual data type representing
a combination of these flags follows the *Flags convention. (The
relevant difference is that the latter is defined to be uint32_t instead
of left implicit)
For consistency, use *Flags everywhere instead of randomly switching
between *Flags and *FlagBits.
Also fix a wrong type name on `stageFlags`, pointed out by @atomnuker
Using renderpass layout transitions is more optimal and doesn't require
a redundant pipeline barrier.
Since our render passes are static and don't change throughout the
lifetime of a ra_renderpass, we unfortunately don't have much
flexibility here - so just hard-code SHADER_READ_ONLY_OPTIMAL as the
output format as this will be the most common case.
We also can't short-circuit the transition when we need to preserve the
framebuffer contents, since that depends on the current layout; so we
still use an explicit tex_barrier in this case. (Most optimal for this
scenario would be an input attachment anyway)
Now you need FFmpeg git, or something.
This also gets rid of the last real use of gpu_memcpy(). libavutil does
that itself. (vaapi.c still used it, but it was essentially unused,
because the code path isn't really in use anymore. It wasn't even
included due to the d3d-hwaccel dependency in wscript.)
This is apparently required to get storage images working on
windows/vulkan, and probably good practice either way. Not entirely sure
if it's the best idea to be always storing the value as 32-bit float,
but it should hardly matter in practice (since we're only writing one
sample per thread).
(Leaving them implicit requires the shaderStorageImageWriteWithoutFormat
feature to be enabled, which the windows nvidia vulkan driver doesn't
support, at least not for a GTX 670)
This makes the radeon driver shut up about frequently updating
STATIC_DRAW UBOs (--opengl-debug), and also reduces the amount of
synchronization necessary for vulkan uniform buffers.
Also add some extra debugging/tracing code paths. I went with a
flags-based approach in case we ever want to extend this.
In addition to the built-in nvidia compiler, we now also support a
backend based on libshaderc. shaderc is sort of like glslang except it
has a C API and is available as a dynamic library.
The generated SPIR-V is now cached alongside the VkPipeline in the
cached_program. We use a special cache header to ensure validity of this
cache before passing it blindly to the vulkan implementation, since
passing invalid SPIR-V can cause all sorts of nasty things. It's also
designed to self-invalidate if the compiler gets better, by offering a
catch-all `int compiler_version` that implementations can use as a cache
invalidation marker.
This time based on ra/vo_gpu. 2017 is the year of the vulkan desktop!
Current problems / limitations / improvement opportunities:
1. The swapchain/flipping code violates the vulkan spec, by assuming
that the presentation queue will be bounded (in cases where rendering
is significantly faster than vsync). But apparently, there's simply
no better way to do this right now, to the point where even the
stupid cube.c examples from LunarG etc. do it wrong.
(cf. https://github.com/KhronosGroup/Vulkan-Docs/issues/370)
2. The memory allocator could be improved. (This is a universal
constant)
3. Could explore using push descriptors instead of descriptor sets,
especially since we expect to switch descriptors semi-often for some
passes (like interpolation). Probably won't make a difference, but
the synchronization overhead might be a factor. Who knows.
4. Parallelism across frames / async transfer is not well-defined, we
either need to use a better semaphore / command buffer strategy or a
resource pooling layer to safely handle cross-frame parallelism.
(That said, I gave resource pooling a try and was not happy with the
result at all - so I'm still exploring the semaphore strategy)
5. We aggressively use pipeline barriers where events would offer a much
more fine-grained synchronization mechanism. As a result of this, we
might be suffering from GPU bubbles due to too-short dependencies on
objects. (That said, I'm also exploring the use of semaphores as a an
ordering tactic which would allow cross-frame time slicing in theory)
Some minor changes to the vo_gpu and infrastructure, but nothing
consequential.
NOTE: For safety, all use of asynchronous commands / multiple command
pools is currently disabled completely. There are some left-over relics
of this in the code (e.g. the distinction between dev_poll and
pool_poll), but that is kept in place mostly because this will be
re-extended in the future (vulkan rev 2).
The queue count is also currently capped to 1, because of the lack of
cross-frame semaphores means we need the implicit synchronization from
the same-queue semantics to guarantee a correct result.
Tested by making the ra_tex_resize function always fail (apart from the
initial FBO check). This required a few changes:
1. reset shaders on failed dispatch
2. reset cleanup binds on failed dispatch
3. fall back to initializing the struct image to 1x1 on failure
4. handle output_fbo_valid gracefully
This was sort of grating by default and made it really hard to actually
read e.g. text on top of a transparent background. I decided to approach
the problem from both directions, making the whites darker and the grays
lighter. This brings it closer to the dynamic range of e.g. the
wikipedia transparent svg preview.
Due to the plethora of historical baggage from different eras getting
confusing, I decided to simplify and unify the struct organization and
naming scheme.
Structs that got renamed:
1. fbodst -> ra_fbo (and moved to gpu/context.h)
2. fbotex -> removed (redundant after 2af2fa7a)
3. fbosurface -> surface
4. img_tex -> image
In addition to these structs being renamed, all of the names have been
made consistent. The new scheme is as follows:
struct image img;
struct ra_tex *tex;
struct ra_fbo fbo;
This also affects derived names, e.g. indirect_fbo -> indirect_tex.
Notably also, finish_pass_fbo -> finish_pass_tex and finish_pass_direct
-> finish_pass_fbo.
The new equivalent of fbotex_change() is called ra_tex_resize().
This commit (should) contain no logic changes, just renaming a bunch of
crap.
I've observed the garbage pixels in more scenarios. They also were never
really needed to begin with, originally being a discovered work-around
for bug that we fixed since then anyway. Doesn't really seem to even
help resizing, since the OpenGL drivers are all smart enough to pool
resources internally anyway.
Fixes#1814
Enabling double buffering fixed some graphical glitches when entering
fullscreen, but it also caused a fullscreen performance regression. We
decided that the glitches were preferable to the performance regression.
This reverts commit cee764849e.
The code used ra_ctx_destroy even though ra_ctx_create was never called
(since it's just a dummy ctx), which led to a conflict of assumptions.
The proper fix is to only use ra_gl_ctx_uninit (mirroring the
ra_gl_ctx_init) and free the dummy ctx manually.
Fixes https://github.com/cmdrkotori/mpc-qt/issues/129
We want e.g. --opengl-shaders-append=foo to resolve to the new option,
all while printing an option name. --opengl-shader is a similar case.
These options are special, because they apply "actions" on actual
options by specifying a suffix. So the alias/deprecation handling has to
be part of resolving the actual option from prefix and suffix.
Turns out the option code apparently tries to directly talloc_free() the
allocated strings, instead of going through a tactx wrapper or
something. So we can't directly overwrite it. Do something else
instead..
Almost as fast as the old code, but more general. Notably, glslang
doesn't support nested arrays.
(cf. https://github.com/KhronosGroup/glslang/issues/1057)
Also much cleaner code-wise, so I think I'll keep it even if glslang
implements array_of_arrays.
This never really made sense since the BT.1886 changes. It should get
*brighter* for bright rooms, not darker for dark rooms. Picked some new
values that seemed reasonable-ish.
This causes a performance regression on 10.11 and newer, but the single
buffered method was broken and could cause partially rendered frames to
be presented to the screen.
This reverts 9f30cd8292 and
e543853a7f.
This is done in several steps:
1. refactor MPGLContext -> struct ra_ctx
2. move GL-specific stuff in vo_opengl into opengl/context.c
3. generalize context creation to support other APIs, and add --gpu-api
4. rename all of the --opengl- options that are no longer opengl-specific
5. move all of the stuff from opengl/* that isn't GL-specific into gpu/
(note: opengl/gl_utils.h became opengl/utils.h)
6. rename vo_opengl to vo_gpu
7. to handle window screenshots, the short-term approach was to just add
it to ra_swchain_fns. Long term (and for vulkan) this has to be moved to
ra itself (and vo_gpu altered to compensate), but this was a stop-gap
measure to prevent this commit from getting too big
8. move ra->fns->flush to ra_gl_ctx instead
9. some other minor changes that I've probably already forgotten
Note: This is one half of a major refactor, the other half of which is
provided by rossy's following commit. This commit enables support for
all linux platforms, while his version enables support for all non-linux
platforms.
Note 2: vo_opengl_cb.c also re-uses ra_gl_ctx so it benefits from the
--opengl- options like --opengl-early-flush, --opengl-finish etc. Should
be a strict superset of the old functionality.
Disclaimer: Since I have no way of compiling mpv on all platforms, some
of these ports were done blindly. Specifically, the blind ports included
context_mali_fbdev.c and context_rpi.c. Since they're both based on
egl_helpers, the port should have gone smoothly without any major
changes required. But if somebody complains about a compile error on
those platforms (assuming anybody actually uses them), you know where to
complain.
See "Copyright" file for caveats.
This changes the remaining "almost LGPL" files to LGPL, because we think
that the conditions the author set for these was finally fulfilled.
This is "wrong", because you might want mp_image_copy_attributes() to
preserve the information that the colorspace parameters are unknown.
This is important for hwdec -copy modes, which call this function before
fix_image_params() and mp_colorspace_merge() are called.
Instead, just wipe the colorspace attributes if the pixel format changes
in an apparently incompatible way. Use mp_image_params_guess_csp() logic
for this and factor that into its own function.
mp_image_set_attributes() attempts to do something similar, so change
that in the same way. Also, mp_image_params_guess_csp() just returned if
the imgfmt was invalid or unset - just remove that part, because it
annoyingly doesn't fit into the new code, and had little reason to exist
to begin with. (Probably.)
I see no reason not to do this. I think the check comes from the time
when mp_image stored the image aspect ratio, instead of the pixel aspect
ratio, where the logic might have made more sense.
It was noticed that -copy hwdec modes typically dropped the
chroma_location field. This happened because the attributes on hw
download are copied with mp_image_copy_attributes(), which tries to copy
these parameters only if src and dst were both YUV (in an attempt to
copy parameters only if it makes sense).
But hardware formats did not have the YUV flag set (anymore?), and code
shouldn't attempt to check the flag in this way anyway. Drop the check,
and always copy the whole color metadata struct. There is a call to
mp_image_params_guess_csp() below, which tries to unset nonsense
metadata if it was copied from a YUV format to RGB. This function would
also do the right thing for hw formats (although for the cited bug only
the software case matters).
Fixes#4804.
This reverts commit 96462040ec.
I guess the autoprobing is still too primitive to handle this well. What
it really should be trying is initializing the wrapper decoder, and if
that doesn't work, try another method. This is complicated by hwaccels
initializing in a delayed way, so there is no easy solution yet.
Probably fixes#4865.
This overrides the use of GLX_SGI_swap_control, because apparently
GLX_SGI_swap_control doesn't support SwapInterval(0), but the
GLX_MESA_swap_interval does.
Of course, everybody except mesa just accepts SwapInterval(0) even for
GLX_SGI_swap_control, but mesa needs to be the special snowflake here
and reject it, forcing us to load their stupid named extension instead.
Meanwhile khronos has done nothing except spit out GLX_EXT_swap_control
(not to be confused with GL_EXT_swap_control, which is exported by
WGL_EXT_swap_control), that doesn't fix the problem because mesa doesn't
implement it anyway.
What a fucking mess.
Even if the contents are entirely zero. In the current code, these
entries were left uninitialized. (Which always worked for nvidia - but
randomly blew up for AMD)
This is simultaneously generalized into two directions:
1. Support more sc_uniform types (needed for SC_UNIFORM_TYPE_PUSHC)
2. Support more flexible packing (needed for both PUSHC and ra_d3d11)
This is around 512 kB, which is just way too much. Heap-allocate it
instead. Also cut down the max pass count to 64, since 128 was
unrealistically high even for vo_opengl.
Instead of relying on power-of-two buffer sizes and unsigned overflow,
make this code more robust (and also cleaner).
Why can't C get a real modulo operator?
This was there even before the refactor, but the refactor exposed the
bug. I hate C's useless fucking modulo operator so much. I've gotten hit
by this exact bug way too many times.
Since the addition of UBOs, the assumption that the uniform index
corresponds to the pass->params.inputs index is no longer true. Also,
there's no reason it would even need this - since the `input` is also
available directly in sc_uniform.
I have no idea how I've been using this code for as long as I have
without any segfaults until earlier today.
This was needlessly complicated and prone to breakage, because even the
references to the ring buffer could end up getting invalidated and
containing garbage data on e.g. shader cache flush. For much the same
reason why we can't keep around the *timer_pool, we're also forced to
hard-copy the entire sample buffer per pass per frame.
Not a huge deal, though. This is, what, a few kB per frame? We have more
pressing CPU performance concerns anyway.
Also simplified/fixed some other code.
This clearly highlights all out-of-gamut/clipped pixels. (Either too
bright or too saturated)
Has some (documented) caveats. Also make TONE_MAPPING_CLIP stop actually
clamping the value range (it's unnecessary and breaks this feature).
Redefining texture1D / texture3D seems to be illegal, they are already
built-in macros or something. So just use tex1D and tex3D instead.
Additionally, GL_KHR_vulkan_glsl requires using explicit vertex
locations and bindings, so make some changes to facilitate this. (It
also requires explicitly setting location=0 for the color attachment
output)
Vulkan compat. rgb16 doesn't exist on hardware anyway, might as well
just generate the 3DLUT against rgba16 as well. We've decided this is
the simplest way to do vulkan compatibility: just make sure we never
actually need 3-component textures.
This is mostly done so we can support using textures with more
components than the scaler LUTs have entries. But while we're at it,
also change the way the weights are packed so that they're always
sequential with no gaps. This allows us to simplify
pass_sample_separated_get_weights as well.
Mouse wheel bindings have always been a cause of user confusion.
Previously, on Wayland and macOS, precise touchpads would generate AXIS
keycodes and notched mouse wheels would generate mouse button keycodes.
On Windows, both types of device would generate AXIS keycodes and on
X11, both types of device would generate mouse button keycodes. This
made it pretty difficult for users to modify their mouse-wheel bindings,
since it differed between platforms and in some cases, between devices.
To make it more confusing, the keycodes used on Windows were changed in
18a45a42d5 without a deprecation period or adequate communication to
users.
This change aims to make mouse wheel binds less confusing. Both the
mouse button and AXIS keycodes are now deprecated aliases of the new
WHEEL keycodes. This will technically break input configs on Wayland and
macOS that assign different commands to precise and non-precise scroll
events, but this is probably uncommon (if anyone does it at all) and I
think it's a fair tradeoff for finally fixing mouse wheel-related
confusion on other platforms.
It seems like the Cocoa backend used to return the same mpv keycodes for
mouse back/forward as it did for scrolling up and down. Fix this by
explicitly mapping all Cocoa button numbers to the right mpv keycodes.
mpv's mouse button numbering is based on X11 button numbering, which
allows for an arbitrary number of buttons and includes mouse wheel input
as buttons 3-6. This button numbering was used throughout the codebase
and exposed in input.conf, and it was difficult to remember which
physical button each number actually referred to and which referred to
the scroll wheel.
In practice, PC mice only have between two and five buttons and one or
two scroll wheel axes, which are more or less in the same location and
have more or less the same function. This allows us to use names to
refer to the buttons instead of numbers, which makes input.conf syntax a
lot easier to remember. It also makes the syntax robust to changes in
mpv's underlying numbering. The old MOUSE_BTNx names are still
understood as deprecated aliases of the named buttons.
This changes both the input.conf syntax and the MP_MOUSE_BTNx symbols in
the codebase, since I think both would benefit from using names over
numbers, especially since some platforms don't use X11 button numbering
and handle different mouse buttons in different windowing system events.
This also makes the names shorter, since otherwise they would be pretty
long, and it removes the high-numbered MOUSE_BTNx_DBL names, since they
weren't used.
Names are the same as used in Qt:
https://doc.qt.io/qt-5/qt.html#MouseButton-enum
If a VO-area option changes, gl_video_resize() is called
unconditionally. This function does something even if the size does not
change (at least it discards buffered frames for interpolation), which
can lead to stutter when you keep firing option change events during
playback.
Check for an actual resize, and if nothing changes, exit early.
Could cause a crash if anything called ra_get_imgfmt_desc(imgfmt=0). Let
it fail correctly. This can happen if a hwdec backend does not set
hw_subfmt correctly.
This also introduces RA_CAP_GLOBAL_UNIFORM. If this is not set, UBOs
*must* be used for non-bindings. Currently the cap is ignored though,
and the shader_cache *always* generates UBO-using code where it can.
Could be made an option in principle.
Only enabled for drivers new enough to support explicit UBO offsets,
just in case...
No change to performance, which is probably what we expect.
This no longer concerns the API user except in as much as the API user
probably wants to know whether or not PBOs are active, so keep around
the CAP field even though it's mostly useless now.
Both vulkan and opengl distinguish between rendering to an image and
using an image as a storage attachment. So make this an explicit
capability instead of lumping it in with render_dst. (That way we could
support, for example, using an image as a storage attachment without
requiring a framebuffer)
The real reason for this change is that you can directly use the output
FBO as a storage attachment on vulkan but you can't on opengl, which
makes this param structly separate from render_dst.
I don't like the feeling of "reusing" the int binding for this. It
feels... wrong, somehow. I'd prefer to use an explicit "offset" field.
(Plus, I might re-use this for uniform buffers or something)
YMMV
This is needed for HAVE_SSE4_INTRINSICS. config.h used to be included as
a transitive dependency of vf.h, but the include statement was removed
from vf.h in 8f2ccba71b.
Also silence an unused variable warning that was introduced in the same
commit.
Not resetting hwdec_request_reinit caused it to flush on every packet,
which not only caused it to fail triggering the actual fallback, and let
it never decode a new frame, but also to get stuck on EOF.
This removes all GPL only code from it, and that's the whole purpose.
Also happens to be much simpler.
The "deinterlace" option still sort of exists, but only as runtime
changeable option. The main change in behavior is that the property will
not report back the actual deint state. Or in other words, if inserting
or initializing the filter fails, the deinterlace property will still
return "yes". This is in line with most recent behavior changes to
properties and options.
I really wouldn't care much about this, but some parts of the core code
are under HAVE_GPL, so there's some need to get rid of it. Simply turn
the video equalizer from its current fine-grained handling with vf/vo
fallbacks into global options. This makes updating them much simpler.
This removes any possibility of applying video equalizers in filters,
which affects vf_scale, and the previously removed vf_eq. Not a big
loss, since the preferred VOs have this builtin.
Remove video equalizer handling from vo_direct3d, vo_sdl, vo_vaapi, and
vo_xv. I'm not going to waste my time on these legacy VOs.
vo.eq_opts_cache exists _only_ to send a VOCTRL_SET_EQUALIZER, which
exists _only_ to trigger a redraw. This seems silly, but for now I feel
like this is less of a pain. The rest of the equalizer using code is
self-updating.
See commit 96b906a51d for how some video equalizer code was GPL only.
Some command line option names and ranges can probably be traced back to
a GPL only committer, but we don't consider these copyrightable.
Both the video equalizer command/option glue, which drives this filter,
as well as the filter itself are slightly GPL contaminated. So it goes.
After this commit, "--vf=eq" will actually use libavfilter's vf_eq (if
FFmpeg was compiled in GPL mode), but it has different options and will
not listen to the equalizer VOCTRLs.
So far, we had a thread-safe way to read options, but no option update
notification mechanism. Everything was funneled though the main thread's
central mp_option_change_callback() function. For example, if the
panscan options were changed, the function called vo_control() with
VOCTRL_SET_PANSCAN to manually notify the VO thread of updates. This
worked, but's pretty inconvenient. Most of these problems come from the
fact that MPlayer was written as a single-threaded program.
This commit works towards a more flexible mechanism. It adds an update
callback to m_config_cache (the thing that is already used for
thread-safe access of global options).
This alone would still be rather inconvenient, at least in context of
VOs. Add another mechanism on top of it that uses mp_dispatch_queue, and
takes care of some annoying synchronization issues. We extend
mp_dispatch_queue itself to make this easier and slightly more
efficient.
As a first application, use this to reimplement certain VO scaling and
renderer options. The update_opts() function translates these to the
"old" VOCTRLs, though.
An annoyingly subtle issue is that m_config_cache's destructor now
releases pending notifications, and must be released before the
associated dispatch queue. Otherwise, it could happen that option
updates during e.g. VO destruction queue or run stale entries, which is
not expected.
Rather untested. The singly-linked list code in dispatch.c is probably
buggy, and I bet some aspects about synchronization are not entirely
sane.
Also refactors the usage of tex_upload to make ra_tex_upload_pbo a
RA-internal thing again.
ra_buf_pool has the main advantage of being dynamically sized depending
on buf_poll, so for OpenGL we'll end up only using one buffer (when not
persistently mapping) - while for vulkan we'll use as many as necessary,
which depends on the swapchain depth anyway.
This adds handling of spherical video metadata: retrieving it from
demux_lavf and demux_mkv, passing it through filters, and adjusting it
with vf_format. This does not include support for rendering this type of
video.
We don't expect we need/want to support the other projection types like
cube maps, so we don't include that for now. They can be added later as
needed.
Also raise the maximum sizes of stringified image params, since they
can get really long.
This broke screensaver/powersave inhibition with at least KDE and
LXDE. This is a release blocker.
Since fdo, KDE and GNOME idiots seem to be unable to reach
a consensus on a simple protocol, this seems unlikely to get
fixed upstream this year, so revert this change.
Fixes#4752.
Breaks #4706 but I don’t give a damn.
This reverts commit 3f75b3c343.
- tex_uploads args are moved to a struct
- the ability to directly upload texture data without going through a
buffer is made explicit
- the concept of buffer updates and buffer polling is made more explicit
and generalized to buf_update as well (not just mapped buffers)
- the ability to call tex_upload/buf_update on a tex/buf is made
explicit during tex/buf creation
- uploading from buffers now uses an explicit offset instead of
implicitly comparing *src against buf->data, because not all buffers
may actually be persistently mapped
- the initial_data = immutable requirement is dropped. (May be re-added
later for D3D11 if that ever becomes a thing)
This change helps the vulkan abstraction immensely and also helps move
common code (like the PBO pooling) out of ra_gl and into the
opengl/utils.c
This also technically has the side-benefit / side-constraint of using
PBOs for OSD texture uploads as well, which actually seems to help
performance on machines where --opengl-pbo is faster than the naive code
path. Because of this, I decided to hook up the OSD code to the
opengl-pbo option as well.
One drawback of this refactor is that the GL_STREAM_COPY hack for
texture uploads "got lost", but I think I'm happy with that going away
anyway since DR almost fully deprecates it, and it's not the "right
thing" anyway - but instead an nvidia-only hack to make this stuff work
somewhat better on NUMA systems with discrete GPUs.
Another change is that due to the way fencing works with ra_buf (we get
one fence per ra_buf per upload) we have to use multiple ra_bufs instead
of offsets into a shared buffer. But for OpenGL this is probably better
anyway. It's possible that in future, we could support having
independent “buffer slices” (each with their own fence/sync object), but
this would be an optimization more than anything. I also think that we
could address the underlying problem (memory closeness) differently by
making the ra_vk memory allocator smart enough to chunk together
allocations under the hood.
Instead of merging it into render_dst. This is better for vulkan,
because blitting in vulkan both does not require a FBO *and* requires a
different image layout.
Also less "hacky" for OpenGL, since now the weird blit=FBO requirement
is an implementation detail of ra_gl
This extracts non-ANGLE specific code to d3d11_helpers.c, which is
modeled after egl_helpers.c. Currently the only consumer is
context_angle.c, but in future this may allow the D3D11 device and
swapchain creation logic to be reused in other backends.
Also includes small improvements to D3D11 device creation. It is now
possible to create feature level 11_1 devices (though ANGLE does not
support these,) and BGRA swapchains, which might be slightly more
efficient than ARGB, since its the same format used by the compositor.