this led to an unexpected videotoolbox-copy hwdec name due to the last
two chars being cut off. since selection is also done by that name, one
had to use "videotoolbox-co" to explicitly use the copy mode of
videotoolbox.
This option has been deprecated upstream for a long time, probably
doesn't even work anymore, and won't work going forward as we replace
the vulkan code with libplacebo wrappers.
I haven't removed the option completely yet since in theory we could
still add support for e.g. a native glslang wrapper in the future. But
most likely the future of this code is deletion.
As an aside, fix an issue where the man page didn't mention d3d11.
This commit bumps the libmpv version to 1.102
drm-osd-plane-id -> drm-draw-plane
drm-video-plane-id -> drm-drmprime-video-plane
drm-osd-size -> drm-draw-surface-size
"draw plane", as in the plane that OpenGL draws to, whether it be
video + OSD or just OSD.
"drmprime video plane", as in the plane used for hwdec video imported
via drmprime.
"draw surface size", as in the size of the surface used for the draw plane
The new names are invariant whether or not hwdec_drmprime_drm is being
used or not. The original naming was very confusing, as when doing
regular rendering (swdec or vaapi) the video would be displayed on the
"OSD plane", and the "Video plane" would remain unused.
Also add a general primary/overlay plane option to drm-draw-plane and
drm-drmprime-video-plane, so that the user can simply request any usable
primary or overlay plane for either of these two options. This should
be somewhat more user-friendly (especially as neither of these two
options currently has a useful help function), as usually you are only
interested in the type of the plane, not in exactly which plane gets
picked.
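For example (illustrative usage; the exact value syntax is whatever the
manual documents), one could now write:

    --drm-draw-plane=primary --drm-drmprime-video-plane=overlay

to request any usable primary plane for drawing and any usable overlay
plane for drmprime video, without knowing the driver's plane indices.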
By design, some vulkan implementations block until vsync during
vkAcquireNextImageKHR. Since mpv only considers the time that
`swap_buffers` spent blocking as constituting part of the vsync, we can
help it out a bit by pre-emptively calling this function here in order
to improve the accuracy of vsync jitter measurements on vulkan.
(If it fails, we just ignore the error and have the user call it a
second time later - maybe it will work then)
On my system this drops vsync-jitter from ~0.030 to ~0.007, an accuracy
of +/- 100μs. (Which *might* have something to do with the fact that
this is the polling interval for command polling)
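A minimal sketch of the idea (the struct and function names here are
illustrative, not mpv's actual swapchain code):

    #include <stdbool.h>
    #include <vulkan/vulkan.h>

    struct swapchain_state {
        VkDevice device;
        VkSwapchainKHR swapchain;
        VkSemaphore acquire_sem;
        uint32_t next_image;
        bool image_acquired;
    };

    /* Called at the end of swap_buffers: any time vkAcquireNextImageKHR
     * spends blocking on vsync is then attributed to the swap, which is
     * what the vsync jitter estimation measures. On failure we simply
     * leave image_acquired unset and retry at the start of the next frame. */
    static void preemptive_acquire(struct swapchain_state *s)
    {
        VkResult res = vkAcquireNextImageKHR(s->device, s->swapchain, UINT64_MAX,
                                             s->acquire_sem, VK_NULL_HANDLE,
                                             &s->next_image);
        s->image_acquired = (res >= 0);
    }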
Makes performance slightly better when using multiple queues by avoiding
unnecessary semaphores due to bad queue selection.
Also remove an aeons-old workaround for an nvidia bug that only ever
existed in the earliest beta vulkan drivers anyway.
We are currently unnecessarily including vulkan headers even when
not building with vulkan support. I also guarded the GL header
inclusion even though this doesn't appear to break anything today.
Fixes #6330.
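A sketch of the intended guard structure (the HAVE_* macro names and the
GL header path are assumptions in the style of mpv's config macros):

    #if HAVE_GL
    #include <GL/gl.h>              /* stand-in for the GL headers actually used */
    #endif

    #if HAVE_VULKAN
    #include <vulkan/vulkan.h>
    #endif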
This makes the default fit-to-screen, autofit and window-scale changing
behavior use the screen's working area instead of the whole screen
area.
As a result, the mpv window no longer covers the taskbar when opening
videos larger than the screen.
The actual behavior now matches the expected behavior for use cases 1-4
from #4363.
This commit also removes the screenrc from the w32 struct.
The screen rect can now be retrieved via the `get_screen_area` function,
which was renamed from `update_screen_rect`.
On a multi-monitor system, if the user moved the window between
monitors, this function returns the screen area of the monitor under
the window, and not the screen area of the monitor specified by the
`--screen` option. The `--screen` option only sets the initial monitor
the mpv window is displayed on.
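For reference, a minimal sketch of how the working area can be queried
with the Win32 API (illustrative only, not mpv's actual implementation):

    #include <windows.h>

    /* Working area (screen minus taskbar) of the monitor the window is
     * currently on; mi.rcMonitor would be the full screen area instead. */
    static RECT get_working_area(HWND hwnd)
    {
        HMONITOR mon = MonitorFromWindow(hwnd, MONITOR_DEFAULTTONEAREST);
        MONITORINFO mi = { .cbSize = sizeof(mi) };
        GetMonitorInfoW(mon, &mi);
        return mi.rcWork;
    }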
Returning -1 in a function with return type bool is the same as
returning true. In the error paths, false should be returned to
indicate that something went wrong.
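A toy example of the pitfall (not the actual function):

    #include <stdbool.h>
    #include <stdio.h>

    static bool init_thing(bool fail)
    {
        if (fail)
            return -1;      /* bug: -1 converts to true, masking the error */
        return true;        /* the error path should return false instead  */
    }

    int main(void)
    {
        printf("%d\n", init_thing(true));   /* prints 1, i.e. "success" */
        return 0;
    }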
depending on the capabilities of the system and the testing of various
attributes, the resulting CGL pixel format can change. due to that
probing it can be helpful to know which pixel format is used to create
the CGL context. added some verbose logging that outputs the final
pixel format.
this adds support for GPU-rendered screenshots, DR (theoretically) and
possibly other advanced functions in the future that need to be executed
from the rendering thread.
additionally, frames that would be off screen, or would not be displayed
when on screen, are now dropped.
I was inconsistent about this originally, as the functionality was
moved into the core spec in 1.1 and so both suffixed and unsuffixed
versions of everything exist and can be mixed together.
There's no reason to fail to build with 1.0.39+ so I'm fixing the
names.
Currently, the error paths in init() are a bit confusing, and we can
end up trying to pop the current context when there is no context,
which leads to distracting error messages.
I also added an explicit path to return early if the GPU backend is
not OpenGL or Vulkan. It's pointless to do any other cuda init
after that point. (Of course, someone could write more interops.)
Fixes #6256
Since 810acf32d6 video_plane can be NULL
under some circumstances. While there is a check in init, init treats
this as an error condition and would call uninit, which in turn calls
disable_video_plane, which would then segfault. Fix this by including
a NULL check inside disable_video_plane, so that it doesn't try to
disable what isn't there.
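A minimal sketch of the guard (struct and field names follow the
commit's description but are otherwise illustrative):

    struct drm_plane;                    /* opaque; details don't matter here */

    struct priv {
        struct drm_plane *video_plane;   /* may be NULL since 810acf32d6 */
    };

    static void disable_video_plane(struct priv *p)
    {
        /* don't try to disable what isn't there */
        if (!p || !p->video_plane)
            return;

        /* ... actual plane-disabling logic continues here ... */
    }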
Despite their place in the tree, hwdecs can be loaded and used just
fine by the vulkan GPU backend.
In this change we add Vulkan interop support to the cuda/nvdec hwdec.
The overall process is mostly straightforward, so the main observation
here is that I had to implement it using an intermediate Vulkan buffer
because direct VkImage usage is blocked by a bug in the nvidia
driver. When that gets fixed, I will revisit this.
Nevertheless, the intermediate buffer copy is very cheap as it's all
device memory from start to finish. Overall CPU utilisation is pretty
much the same as with the OpenGL GPU backend.
Note that we cannot use a single intermediate buffer - rather there
is a pool of them. This is done because the cuda memcpys are not
explicitly synchronised with the texture uploads.
In the basic case, this doesn't matter because the hwdec is not
asked to map and copy the next frame until after the previous one
is rendered. In the interpolation case, we need extra future frames
available immediately, so we'll be asked to map/copy those frames
and vulkan will be asked to render them. So far, harmless right? No.
All the vulkan rendering, including the upload steps, is batched
together and ends up running very asynchronously from the CUDA copies.
The end result is that all the copies happen one after another, and
only then do the uploads happen, which means all textures are uploaded
with the same, final, frame data. Whoops. Unsurprisingly, this results
in jerky motion because every 3 or 4 frames are identical.
The buffer pool ensures that we do not overwrite a buffer that is
still waiting to be uploaded. The ra_buf_pool implementation
automatically checks if existing buffers are available for use and
only creates a new one if it really has to. It's hard to say for sure
what the maximum number of buffers might be but we believe it won't
be so large as to make this strategy unusable. The highest I've seen
is 12 when using interpolation with tscale=bicubic.
A future optimisation here is to synchronise the CUDA copies with
respect to the vulkan uploads. This can be done with shared semaphores
that would ensure the copy of the second frame only happens after the
upload of the first frame, and so on. This isn't trivial to implement
as I'd have to first adjust the hwdec code to use asynchronous cuda;
without that, there's no way to use the semaphore for synchronisation.
This should result in fewer intermediate buffers being required.
This is arguably a little contrived, but in the case of CUDA interop,
we have to track additional state on the cuda side for each exported
buffer. If we want to be able to manage buffers with an ra_buf_pool,
we need some way to keep that CUDA state associated with each created
buffer. The easiest way to do that is to attach it directly to the
buffers.
The CUDA/Vulkan interop works on the basis of memory being exported
from Vulkan and then imported by CUDA. To enable this, we add a way
to declare a buffer as being intended for export, and then add a
function to do the export.
For now, we support fd- and Handle-based exports on Linux and
Windows respectively. There are others, which we can support when
a need arises.
Also note that this is just for exporting buffers, rather than
textures (VkImages). Image import on the CUDA side is supposed to
work, but it is currently buggy and waiting for a new driver release.
Finally, at least with my nvidia hardware and drivers, everything
seems to work even if we don't initialise the buffer with the right
exportability options. Nevertheless I'm enforcing it so that we're
following the spec.
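A condensed sketch of the Linux (opaque fd) path, assuming
VK_KHR_external_memory_fd and the CUDA 10+ driver API; names are
illustrative and error handling is omitted, so this is not mpv's
ra-level code:

    #include <stddef.h>
    #include <vulkan/vulkan.h>
    #include <cuda.h>

    /* Allocate VkDeviceMemory that is marked as exportable via an opaque fd. */
    static VkDeviceMemory alloc_exportable(VkDevice dev, uint32_t mem_type,
                                           VkDeviceSize size)
    {
        VkExportMemoryAllocateInfo exp = {
            .sType = VK_STRUCTURE_TYPE_EXPORT_MEMORY_ALLOCATE_INFO,
            .handleTypes = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT,
        };
        VkMemoryAllocateInfo info = {
            .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
            .pNext = &exp,
            .allocationSize = size,
            .memoryTypeIndex = mem_type,
        };
        VkDeviceMemory mem = VK_NULL_HANDLE;
        vkAllocateMemory(dev, &info, NULL, &mem);
        return mem;
    }

    /* Export the memory as an fd and import it on the CUDA side. */
    static CUdeviceptr import_into_cuda(VkDevice dev, VkDeviceMemory mem,
                                        VkDeviceSize size)
    {
        VkMemoryGetFdInfoKHR fd_info = {
            .sType = VK_STRUCTURE_TYPE_MEMORY_GET_FD_INFO_KHR,
            .memory = mem,
            .handleType = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT,
        };
        PFN_vkGetMemoryFdKHR get_fd =
            (PFN_vkGetMemoryFdKHR)vkGetDeviceProcAddr(dev, "vkGetMemoryFdKHR");
        int fd = -1;
        get_fd(dev, &fd_info, &fd);

        CUDA_EXTERNAL_MEMORY_HANDLE_DESC hdesc = {
            .type = CU_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD,
            .handle.fd = fd,
            .size = size,
        };
        CUexternalMemory ext_mem;
        cuImportExternalMemory(&ext_mem, &hdesc);

        CUDA_EXTERNAL_MEMORY_BUFFER_DESC bdesc = { .offset = 0, .size = size };
        CUdeviceptr ptr = 0;
        cuExternalMemoryGetMappedBuffer(&ptr, ext_mem, &bdesc);
        return ptr;    /* CUDA copies into this land in the Vulkan buffer */
    }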
Since the code just broke out of the loop on a match rather than jumping
straight to the end of the function body, it ended up hitting the code
path for when the end of the list was reached.
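A toy reproduction of the shape of the bug (not the actual code):

    #include <stdio.h>

    int main(void)
    {
        int list[] = {10, 20, 30};
        int wanted = 20;

        for (int i = 0; i < 3; i++) {
            if (list[i] == wanted) {
                printf("match: %d\n", list[i]);
                break;          /* falls into the end-of-list path below */
            }
        }
        /* end-of-list handling: runs even though a match was found above;
         * jumping past it (goto) or testing a "found" flag avoids this */
        printf("reached end of list\n");
        return 0;
    }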
since we draw our own title bar we lose the standard functionality of
the system-provided title bar. because of that we have to reimplement
the functionality of double-clicking the title bar. depending on the
system preferences we want to minimize, zoom or do nothing.
Fixes #6223
On a multi-monitor setup, when the center of the window went off
screen, the ICC profile would always switch to the profile of the first
screen.
This fixes the issue by defaulting the value to the current screen.
This was based on the texture height, which was a mistake. In some cases
it could exceed the actual size of the buffer, leading to a vulkan API
error. This didn't seem to cause any problems in practice, since a
too-large synchronization is just bad for performance and shouldn't do
any harm internally, but either way, it was still undefined behavior to
submit a barrier outside of the buffer size.
Fix the calculation, thus fixing this issue.
Since linear downscaling makes sense to handle independently from
linear/sigmoid upscaling, we split this option up. Now,
linear-downscaling is its own option that only controls linearization
when downscaling and nothing more. Likewise, linear-upscaling /
sigmoid-upscaling are two mutually exclusive options (the latter
overriding the former) that apply only to upscaling and no longer
implicitly enable linear light downscaling as well.
The old behavior was very confusing, as evidenced by issues such
as #6213. The current behavior should make much more sense, and only
minimally breaks backwards compatibility (since using linear-scaling
directly was very uncommon - most users got this for free as part of
gpu-hq and relied only on that).
Closes #6213.
when entering a Split View a windowDidEnterFullScreen event happens
without a previous toggleFullScreen call. in that case it tries to stop
an animation that was never initiated by us and basically breaks the
system-initiated fullscreen, or in this case the Split View. immediately
after entering the fullscreen it tries to stop the animation and
resizes the window, which causes the window to exit fullscreen. only
try to stop an animation that was initiated by us and is safe to stop.
by default the pixel format creation falls back to the software renderer
when everything else fails. this is mostly needed for VMs. additionally
one can directly request an sw renderer or exclude it entirely.
moved the retrieval of the macOS-specific options from the backend
initialisation to the initialisation of the CocoaCB class, the earliest
point possible. this way macOS-specific options can be used for the
opengl context creation, for example.
This is to improve the experience when running with default settings
on a driver that doesn't have any overlay planes (or indeed has only
one plane), but still supports DRM atomic. Since the drmprime video
plane is set to pick an overlay plane by default, it would fail on
these drivers due to not being able to create an atomic context. Users
with such cards had to manually set --drm-video-plane-id to some bogus
value (it's not used after all).
The "video" plane is only ever used by the drmprime-drm hwdec interop,
which is not used at all in the typical use case where everything is
actually rendered onto the "OSD" plane using EGL, so having an atomic
context without the "video" plane should be fine most of the time.
For vec3, the alignment and size differ. The current code will pack a
struct like { vec3; float; vec2 } into 8 machine words, whereas the spec
would only use 6.
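To make the difference concrete, here is the layout the spec's
base-alignment rules give for that example (a "machine word" being 4
bytes):

    vec3  a;   // base alignment 16, size 12 -> offset  0
    float b;   // base alignment  4, size  4 -> offset 12 (fits in the vec3's tail)
    vec2  c;   // base alignment  8, size  8 -> offset 16
               // total size: 24 bytes = 6 words

Treating the vec3 as if it occupied a full 16 bytes instead pushes the
float to offset 16 and the vec2 to offset 24, giving 32 bytes = 8 words.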
This actually fixes a real bug: The only place in the code I could find
where it was conceivably possible that a vec3 is followed by a float was
when using --gpu-dumb-mode in combination with --gamma-factor, and only
when --gpu-api=vulkan. So it's no surprise nobody ran into it yet.
These used to be unsupported long ago, but it seems glslang added
support in the meantime. (I don't know which version, but I'm guessing
it was long enough ago that we don't have to add a feature check)
Should hopefully help make push constant layouts more robust against
possible bugs either in our code or in the driver.
Certain low-end Mali GPUs have a rather low precision and overflow
during the PRNG calculations, thereby breaking e.g. deband-grain.
Modify permute() to avoid this; this does not noticeably impact the
quality of the PRNG output.
This problem was observed on:
GL_VENDOR='ARM', GL_RENDERER='Mali-T720'
GL_VERSION='OpenGL ES 3.1 v1.r15p0-00rel0.bdd9e62cdc8c88e0610a16b5901161e9'
Upstream has this now. Didn't really make any difference for me (except
making the polar compute shader 2%-3% faster), but maybe it does for
somebody else.
When using multiple compute shaders as part of the same pass, there can
be a conflict in the block sizes. In the problematic case, the HDR
detection shader can collide with the polar sampling shader. In this
case, the solution is clear - the passes that can handle any size should
"give in" and not overwrite the block sizes.
Fixes #6083.
instead of force unwrapping and chaining the optional vars in our
containsMouseLocation function, safely unwrap and guard the resulting
var.
Fixes #6062
Add another parameter to mpv_opengl_drm_params to hold the FD of the
render node, so that the fd can be passed to hwdec_vaegl.
The render node is opened in context_drm_egl and inferred from the
primary device fd using drmGetRenderDeviceNameFromFd.
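A minimal sketch of that inference step using libdrm (error handling
trimmed; not the exact mpv code):

    #include <fcntl.h>
    #include <xf86drm.h>

    /* Derive and open the render node (e.g. /dev/dri/renderD128) belonging
     * to the same GPU as an already-open primary node fd. */
    static int open_render_node(int primary_fd)
    {
        char *name = drmGetRenderDeviceNameFromFd(primary_fd);
        if (!name)
            return -1;
        int render_fd = open(name, O_RDWR | O_CLOEXEC);
        drmFree(name);
        return render_fd;
    }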
The previous code did not save enough information about the old state,
and could end up changing which plane the fbcon's FB got attached to,
or in the worst case cause a blank screen (observed in some multi-screen
setups on Sandy Bridge).
In addition, refactor the handling of drmModeModeInfo property blobs so
they don't leak, as well as to enable reuse of already created blobs.
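A rough sketch of the blob-caching idea (the cache struct is
illustrative; the libdrm calls are real):

    #include <stdint.h>
    #include <string.h>
    #include <xf86drmMode.h>

    struct mode_blob_cache {
        drmModeModeInfo mode;   /* mode the cached blob was created for */
        uint32_t blob_id;       /* 0 = no blob created yet */
    };

    /* Return a property blob id for the given mode, reusing the cached blob
     * when the mode is unchanged and destroying the stale one otherwise,
     * so blobs are neither leaked nor recreated on every commit. */
    static uint32_t get_mode_blob(int fd, struct mode_blob_cache *c,
                                  const drmModeModeInfo *mode)
    {
        if (c->blob_id && memcmp(&c->mode, mode, sizeof(*mode)) == 0)
            return c->blob_id;

        if (c->blob_id) {
            drmModeDestroyPropertyBlob(fd, c->blob_id);
            c->blob_id = 0;
        }
        if (drmModeCreatePropertyBlob(fd, mode, sizeof(*mode), &c->blob_id) != 0)
            return 0;
        c->mode = *mode;
        return c->blob_id;
    }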