Due to the COM Single-Threaded Apartment model, the thread owning the
objects will still do all the actual method calls (in the form of
message dispatches), but at least this will be COM's problem rather than
having to set up several handles and adding extra code to the event
thread.
Since the event thread still needs to own the WASAPI handles to avoid
waiting on another thread to dispatch the messages, the init and uninit
code still has to run in the thread.
This also removes a broken drain implementation and removes unused
headers from each of the files split from the original ao_wasapi.c.
ao_wasapi.c was almost entirely init code mixed with option code and
occasionally actual audio handling code. Split most things to
ao_wasapi_utils.c and keep the audio handling code in ao_wasapi.c.
Gets rid of the internal ring buffer and get_buffer. Corrects an
implementation error in thread_reset.
There is still a possible race condition on reset, and a few refactors
left to do. If feasible, the thread that handles everything
WASAPI-related will be made to only handle feed events.
This reverts commit 75dd3ec210.
This broke seeking with ordered chapters in some situations. While
the reverted commit was perfectly fine for playback of normal files,
it overlooked that in the ordered chapters case switching segments
actually reinitialized the audio chain completely, including the
decoder. And decoders still read packets on initialization. We can
restore the original commit as soon as decoders stop doing this.
Apparently the "right" place to initialize the hardware decoder is in
the libavcodec get_format callback.
This doesn't change vda.c and vdpau_old.c, because I don't have OSX, and
vdpau_old.c is probably going to be removed soon (if Libav ever manages
to release Libav 10). So for now the init_decoder callback added with
this commit is optional.
This also means vdpau.c and vaapi.c don't have to manage and check the
image parameters anymore.
This change is probably needed for when libavcodec VDA supports gets a
new iteration of its API.
This updates the logic for the new, somewhat unified behavior of SRGB
and 3DLUT since 34bf9be (not that it was particularly correct even that
change) and checks for the presence of corresponding extensions only in
the cases in which they're needed.
This commit:
- Changes some of the #define and variable names for clarification and
adds comments where appropriate.
- Unifies :srgb and :icc-profile, making them fit into the same step of
the decoding process and removing the weird interactions between both
of them.
- Makes :icc-profile take precedence over :srgb (to significantly reduce
the number of confusing and useless special cases)
- Moves BT709 decompanding (approximate or actual) to the shader in all
cases, making it happen before upscaling (instead of the old 0.45
gamma function). This is the simpler and more proper way to do it.
- Enables the approx gamma function to work with :srgb as well due to
this (since they now share the gamma expansion code).
- Renames :icc-approx-gamma to :approx-gamma since it is no longer tied
to the ICC options or LittleCMS.
- Uses gamma 2.4 as input space for the actual 3DLUT, this is now a
pretty arbitrary factor but I picked 2.4 mainly because a higher pure
power value here seems to produce visually better results with wide
gamut profiles, rather then the previous 1.95 or BT.709.
- Adds the input gamma space to the 3dlut cache header in case we change
it more in the future, or even make it user customizable (though I
don't see why the latter would really be necessary).
- Fixes the OSD's gamma when using :srgb, which was previously still
using the old (0.45) approximation in all cases.
- Updates documentation on :srgb, it was still mentioning the old
behavior from circa a year ago.
This commit should serve to both open up and make the CMS/shader code much
more accessible and less confusing/error-prone and simultaneously also
improve the performance of 3DLUTs with wide gamut color spaces.
I would liked to have made it more modular but almost all of these
changes are interdependent, save for the documentation updates.
Note: Right now, the "3DLUT takes precedence over SRGB" logic is just
coded into gl_lcms.c's compile_shaders function. Ideally, this should be
done earlier, when parsing the options (by overriding the actual
opts.srgb flag) and output a warning to the user.
Note: I'm not sure how well this works together with real-world
subtitles that may need to be color corrected as well. I'm not sure
whether :approx-gamma needs to apply to subtitles as well. I'll need to
test this on proper files later.
Note: As of now, linear light scaling is still intrinsically tied to
either :srgb or :icc-profile. It would be thinkable to have this as an
extra option, :linear-scaling or similar, that could be used with or
without the two color management options.
Assume obtained.samples contains the number of samples the SDL audio
callback will request at once. Then make sure ao.c will set the buffer
size at least to 3 times that value (or more).
Might help with bad SDL audio backends like ESD, which supposedly uses a
500ms buffer.
This obviously doesn't work. It wasn't much of a problem in the past
because most passthrough formats use 2 channels, which is also the
default for downmix.
In general, we don't need to have a large hw audio buffer size anymore,
because we can quickly fill it from the soft buffer.
Note that this probably doesn't change much anyway. On my system (dmix
enabled), the buffer size is only 170ms, and ALSA won't give more. Even
when using a hardware device the buffer size seems to be limited to
341ms.
This AO pretended to support volume operations when in spdif passthrough
mode, but actually did nothing. This is wrong: at least the GET
operations must write their argument. Signal that volume is unsupported
instead.
This was probably a hack to prevent insertion of volume filters or so,
but it didn't work anyway, while recovering after failed volume filter
insertion does work, so this is not needed at all.
Since the addition of the AO feed thread, 200ms of latency (MIN_BUFFER)
was added to all push-based AOs. This is not so nice, because even AOs
with relatively small buffering (e.g. ao_alsa on my system with ~170ms
of buffer size), the additional latency becomes noticable when e.g.
toggling mute with softvol.
Fix this by trying to keep not only 200ms minimum buffer, but also 200ms
maximum buffer. In other words, never buffer beyond 200ms in total. Do
this by estimating the AO's buffer fill status using get_space and the
initially known AO buffer size (the get_space return value on
initialization, before any audio was played). We limit the maximum
amount of data written to the soft buffer so that soft buffer size and
audio buffer size equal to 200ms (MIN_BUFFER).
To avoid weird problems with weird AOs, we buffer beyond MIN_BUFFER if
the AO's get_space requests more data than that, and as long as the soft
buffer is large enough.
Note that this is just a hack to improve the latency. When the audio
chain gains the ability to refilter data, this won't be needed anymore,
and instead we can introduce some sort of buffer replacement function in
order to update data in the soft buffer.
It is possible to have ao->reset() called between ao->pause() and
ao->resume() when seeking during the pause. If the underlying PCM
supports pausing, resuming an already reset PCM will produce an error.
Avoid that by explicitly checking PCM state before calling
snd_pcm_pause().
Signed-off-by: wm4 <wm4@nowhere>
The uint64_t math would cause overflow at long enough system uptimes
(...such as 3 days), and any precision error given by the double math will
be under one milisecond.
One strange issue is that we apparently can't stop the audio API on
audio reset (ao_driver.reset). We could use SDL_PauseAudio, but that
doesn't specify whether remaining audio is dropped. We also could use
SDL_LockAudio, but holding that over a long time will probably be bad,
and it probably doesn't drop audio. This means we simply play silence
after a reset, instead of stopping the callback completely. (The
existing code ran into an underrun in this situation.)
The delay estimation works about the same. We simply assume that the
callback is locked to audio timing (like ao_jack), and that 1 callback
corresponds to 1 period. It seems this (removed) code fragment assumes
there 1 one period size delay:
// delay subcomponent: remaining audio from the next played buffer, as
// provided by the callback
buffer_interval += callback_interval;
so we explicitly do that too.
Apparently, this is always _really_ monotonic, despite what the Linux
manpages say. So this should be much better than gettimeofday(). (At
times there were kernel bugs which broke the monotonic property.)
From the perspective of the player, time can still be discontinuous
(you could just stop the process with ^Z), but at least it's guaranteed
to be monotonic without further hacks required.
Also note that clock_gettime() returns the time in nanoseconds. We want
microseconds only, because that's the unit we chose internally. Another
problem is that nanoseconds can wrap pretty quickly (less than 300 years
in 63 bits), so it's just better to use microseconds. The devision won't
make the code that much slower (compilers can avoid a real division).
Note: this expects that the system provides clock_gettime() as well as
CLOCK_MONOTONIC. Both are optional according to POSIX. The only system
I know which doesn't have these, OSX, has seperate timer code anyway,
but I still don't know whether more obscure (yet supported) platforms
have a problem with this, so I'm playing safely. But this still expects
that CLOCK_MONOTONIC always works at runtime if it's defined.
This is probably "safer". Without it, we will play 1 sample, because the
logic was written in a way to decode 1 sample if audio is paused. 1
sample usually will initialize the audio PTS, but not play any real
audio. Also see previous commit.
In ancient times, this actually used 1 byte (instead of 1 sample), so
clearly no sample was written, unless the audio was 8-bit mono.
Remove the ao_buffer_playable_samples field. This contained the number
of samples that fill_audio_out_buffers() wanted to write to the AO (i.e.
this data was supposed to be played at some point), but ao_play()
rejected it due to partial fill.
This could happen with many AOs, notably those which align all written
data to an internal period size (often called "outburst" in the AO
code), and the accepted number of samples is rounded down to period
boundaries. The left-over samples at the end were still kept in
mpctx->ao_buffer, and had to be played later.
The reason ao_buffer_playable_samples had to exist was to make sure that
at EOF, the correct number of left-over samples was played (and not
possibly other data in the buffer that had to be sliced off due to
endpts in fill_audio_out_buffers()). (You'd think you could just slice
the entire buffer, but I suspect this wasn't done because the end time
could actually change due to A/V sync changes. Maybe that was the reason
it's so complicated.)
Some commits ago, ao.c gained internal buffering, and ao_play() will
never return partial writes - as long as you don't try to write more
samples than ao_get_space() reports. This is always the case. The only
exception is filling the audio buffers while paused. In this case, we
decode and play only 1 sample in order to initialize decoding (e.g. on
seeking). Actually playing this 1 sample is in fact a bug, but even of
the AO doesn't have period size alignment, you won't notice it. In
summary, this means we can safely remove the code.
Until now, this was always conflated with uninit. This was ugly, and
also many AOs emulated this manually (or just ignored it). Make draining
an explicit operation, so AOs which support it can provide it, and for
all others generic code will emulate it.
For ao_wasapi, we keep it simple and basically disable the internal
draining implementation (maybe it should be restored later).
Tested on Linux only.
Same deal as with the previous commit. We don't lose any functionality,
except for waiting "properly" on audio end, instead of waiting using the
delay estimate.
This removes the ringbuffer management from the code, and uses the
generic code added with the previous commit. The result should be
pretty much the same.
The "estimate" sub-option goes away. This estimation is now always
active. The new code for delay estimation is slightly different, and
follows the claim of the jack framework that callbacks are timed
exactly.
This has 2 goals:
- Ensure that AOs have always enough data, even if the device buffers
are very small.
- Reduce complexity in some AOs, which do their own buffering.
One disadvantage is that performance is slightly reduced due to more
copying.
Implementation-wise, we don't change ao.c much, and instead "redirect"
the driver's callback to an API wrapper in push.c.
Additionally, we add code for dealing with AOs that have a pull API.
These AOs usually do their own buffering (jack, coreaudio, portaudio),
and adding a thread is basically a waste. The code in pull.c manages
a ringbuffer, and allows callback-based AOs to read data directly.
Since the AO will run in a thread, and there's lots of shared state with
encoding, we have to add locking.
One case this doesn't handle correctly are the encode_lavc_available()
calls in ao_lavc.c and vo_lavc.c. They don't do much (and usually only
to protect against doing --ao=lavc with normal playback), and changing
it would be a bit messy. So just leave them.
We want to move the AO to its own thread. There's no technical reason
for making the ao struct opaque to do this. But it helps us sleep at
night, because we can control access to shared state better.
This field will be moved out of the ao struct. The encoding code was
basically using an invalid way of accessing this field.
Since the AO will be moved into its own thread too and will do its own
buffering, the AO and the playback core might not even agree which
sample a PTS timestamp belongs to. Add some extrapolation code to handle
this case.
Use QueryPerformanceCounter to improve the accuracy of
IAudioClock::GetPosition.
While this is mainly for "realtime correctness" (usually the delay is a
single sample or less), there are cases where IAudioClock::GetPosition
takes a long time to return from its call (though the documentation doesn't
define what a "long time" is), so correcting its value might be important in
case the documented possible delay happens.
The lack of device latency made get_delay report latencies shorter than
they should; on systems with fast enough drivers, the delay is not
perceptible, but high enough invisible delays would cause desyncs.
I'm not yet completely sure whether this is 100% accurate, there are
some issues involved when repeatedly pausing+unpausing (the delay might
jump around by several dozen miliseconds), but seeking seems to be
working correctly now.
The player didn't quit when the end of a file was reached. The reason
for this is that jack reported a constant audio delay even when all
audio was done playing. Whether that was recognized as EOF by the player
depended whether the exact value was higher or lower than the player's
threshhold for what it considers no more audio.
get_delay() should return amount of time it takes until the last sample
written to the audio buffer reaches the speaker. Therefore, we have to
track the estimated time when the last sample is done, and subtract it
from the calculated latency. Basically, the latency is the only amount
of time left in the delay, and it should go towards 0 as audio reaches
ths speakers.
I'm not sure if this is correct, but at least it solves the problem. One
suspicious thing is that we use system time to estimate the end of the
audio time. Maybe using jack_frame_time() would be more correct. But
apart from this, there doesn't seem to be a better way to handle this.