Commit Graph

963 Commits

Author SHA1 Message Date
James Almer 567c67c6c8 avcodec/ac3dsp: make len a size_t in float_to_fixed24
Should simplify asm implementations, and prevent UB on at least win64.

Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-22 18:33:00 -03:00
Andreas Rheinhardt 6f7bf64dbc avcodec: Remove DCT, FFT, MDCT and RDFT
They were replaced by TX from libavutil; the tremendous work
to get to this point (both creating TX as well as porting
the users of the components removed in this commit) was
completely performed by Lynne alone.

Removing the subsystems from configure may break some command lines,
because the --disable-fft etc. options are no longer recognized.

Co-authored-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-01 02:25:09 +02:00
Martin Storsjö 30cea1d39b Revert "avcodec/arm/hevc: remove duplicate mov of deblock neon"
This reverts commit 9413bdc381.

That commit broke the fate HEVC tests - unfortunately I only
tested checkasm for that patch, and that function is still
lacking checkasm coverage.

Signed-off-by: Martin Storsjö <martin@martin.st>
2023-03-23 09:39:32 +02:00
xufuji456 9413bdc381 avcodec/arm/hevc: remove duplicate mov of deblock neon
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-03-22 09:46:22 +02:00
xufuji456 b10eabdab3 codec/arm/hevcdsp_idct_neon: remove duplicate mov
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-03-07 14:31:08 +02:00
xufuji456 67fd1b79e7 libavcodec/hevc: remove duplicate semicolon in hevcdsp_init_neon
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-02-28 15:24:41 +02:00
xufuji456 05438db024 libavcodec/hevc: reuse scale_store on idct32x32_neon
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-02-28 13:29:25 +02:00
Lynne e0661fc805
dca_core: convert to lavu/tx
Thanks to Martin Storsjö <martin@martin.st> for fixing and testing the
arm32 and aarch64 changes.
2022-11-06 14:39:36 +01:00
Andreas Rheinhardt 76d8f0dd14 avcodec/ac3dsp: Remove unused parameter
Forgotten in fd98594a88.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-29 23:37:13 +02:00
Martin Storsjö 86519234b8 arm: vc1dsp: Canonicalize the syntax for aligned NEON loads/stores
This hopefully should fix building with older toolchains, hopefully
fixing the fate failures on
http://fate.ffmpeg.org/history.cgi?slot=armel5tej-qemu-debian-gcc4.4.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-29 10:28:45 +03:00
Andreas Rheinhardt 9beba05311 avcodec/fmtconvert: Remove unused AVCodecContext parameter
Unused since d74a8cb7e4.

Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-21 20:26:40 +02:00
Rémi Denis-Courmont b52034270a lavc/vorbisdsp: use ptrdiff_t rather than intptr_t
... for a difference between pointers.
2022-09-19 13:51:00 -03:00
James Cowgill 50a4dff69f avcodec/arm/sbcenc: avoid callee preserved vfp registers
When compiling FFmpeg with GCC-9, some very random segfaults were
observed in code which had previously called down into the SBC encoder
NEON assembly routines. This was caused by these functions clobbering
some of the vfp callee saved registers (d8 - d15 aka q4 - q7). GCC was
using these registers to save local variables, but after these
functions returned, they would contain garbage.

Fix by reallocating the registers in the two affected functions in
the following way:
 ff_sbc_analyze_4_neon: q2-q5 => q8-q11, then q1-q4 => q8-q11
 ff_sbc_analyze_8_neon: q2-q9 => q8-q15

The reason for using these replacements is to keep closely related
sets of registers consecutively numbered which hopefully makes the
code more easy to follow. Since this commit only reallocates
registers, it should have no performance impact.

Signed-off-by: James Cowgill <jcowgill@debian.org>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-13 09:51:51 +03:00
Andreas Rheinhardt a54e53a1c4 avcodec/vp8dsp: Constify src in vp8_mc_func
Reviewed-by: Peter Ross <pross@xvid.org>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-11 20:57:51 +02:00
Martin Storsjö 3f456dc245 arm: rv40dsp: Change stride parameters to ptrdiff_t
These were missed when h264_chroma_mc_func was changed in
e4a94d8b36.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-02 23:04:58 +03:00
Martin Storsjö 826cd5e098 arm: vc1sdp: Change stride parameters to ptrdiff_t
This was missed in db54426975.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-02 23:04:55 +03:00
Lynne f99d15cca0
arm/fft: disable NEON optimizations for 131072pt transforms
This has been broken since the start, and it was only discovered
when I started testing my replacement for the FFT.
Disable it, since there's no point in fixing slower code that's about
to be removed anyway.

The vfp version is not affected.
2022-08-29 07:13:43 +02:00
Andreas Rheinhardt 6c4595190e avcodec/flacdsp: Split encoder-only parts into a ctx of its own
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-05 03:28:45 +02:00
Andreas Rheinhardt 3a869cd5cd avcodec/flacdsp: Remove unused function parameter
Forgotten in e609cfd697.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-05 03:28:45 +02:00
Andreas Rheinhardt 333b32af8e avcodec/h264chroma: Constify src in h264_chroma_mc_func
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-05 03:02:13 +02:00
Andreas Rheinhardt b3bbbb14d0 avcodec/hevcdsp: Constify src pointers
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-05 02:54:04 +02:00
Andreas Rheinhardt 966fc1230a avcodec/mpegvideoencdsp: Allow pointers to const where possible
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-07-31 03:32:40 +02:00
Andreas Rheinhardt abb85429f3 avcodec/me_cmp: Constify me_cmp_func buffer parameters
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-07-31 03:31:53 +02:00
Andreas Rheinhardt af43da3e4d avcodec/videodsp: Constify buf in VideoDSPContext.prefetch
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-07-31 03:14:34 +02:00
Andreas Rheinhardt 7ab9b30800 avcodec/vp56: Move VP5-9 range coder functions to a header of their own
Also use a vpx prefix for them.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-07-28 03:49:54 +02:00
Ben Avison 23c92e14f5 avcodec/vc1: Arm 32-bit NEON unescape fast path
checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows.

vc1dsp.vc1_unescape_buffer_c: 918624.7
vc1dsp.vc1_unescape_buffer_neon: 142958.0

Signed-off-by: Ben Avison <bavison@riscosopen.org>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-04-01 10:03:34 +03:00
Ben Avison c07de58a72 avcodec/vc1: Arm 32-bit NEON deblocking filter fast paths
checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. Note that the C
version can still outperform the NEON version in specific cases. The balance
between different code paths is stream-dependent, but in practice the best
case happens about 5% of the time, the worst case happens about 40% of the
time, and the complexity of the remaining cases fall somewhere in between.
Therefore, taking the average of the best and worst case timings is
probably a conservative estimate of the degree by which the NEON code
improves performance.

vc1dsp.vc1_h_loop_filter4_bestcase_c: 19.0
vc1dsp.vc1_h_loop_filter4_bestcase_neon: 48.5
vc1dsp.vc1_h_loop_filter4_worstcase_c: 144.7
vc1dsp.vc1_h_loop_filter4_worstcase_neon: 76.2
vc1dsp.vc1_h_loop_filter8_bestcase_c: 41.0
vc1dsp.vc1_h_loop_filter8_bestcase_neon: 75.0
vc1dsp.vc1_h_loop_filter8_worstcase_c: 294.0
vc1dsp.vc1_h_loop_filter8_worstcase_neon: 102.7
vc1dsp.vc1_h_loop_filter16_bestcase_c: 54.7
vc1dsp.vc1_h_loop_filter16_bestcase_neon: 130.0
vc1dsp.vc1_h_loop_filter16_worstcase_c: 569.7
vc1dsp.vc1_h_loop_filter16_worstcase_neon: 186.7
vc1dsp.vc1_v_loop_filter4_bestcase_c: 20.2
vc1dsp.vc1_v_loop_filter4_bestcase_neon: 47.2
vc1dsp.vc1_v_loop_filter4_worstcase_c: 164.2
vc1dsp.vc1_v_loop_filter4_worstcase_neon: 68.5
vc1dsp.vc1_v_loop_filter8_bestcase_c: 43.5
vc1dsp.vc1_v_loop_filter8_bestcase_neon: 55.2
vc1dsp.vc1_v_loop_filter8_worstcase_c: 316.2
vc1dsp.vc1_v_loop_filter8_worstcase_neon: 72.7
vc1dsp.vc1_v_loop_filter16_bestcase_c: 62.2
vc1dsp.vc1_v_loop_filter16_bestcase_neon: 103.7
vc1dsp.vc1_v_loop_filter16_worstcase_c: 646.5
vc1dsp.vc1_v_loop_filter16_worstcase_neon: 110.7

Signed-off-by: Ben Avison <bavison@riscosopen.org>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-04-01 10:03:33 +03:00
Martin Storsjö a78f136f3f configure: Use a separate config_components.h header for $ALL_COMPONENTS
This avoids unnecessary rebuilds of most source files if only the
list of enabled components has changed, but not the other properties
of the build, set in config.h.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-03-16 14:12:49 +02:00
J. Dekker 7fc6015de9 Revert "arm: hevc_qpel: Fix the assembly to work with non-multiple of 8 widths"
This reverts commit 2589060b92 which was
originally to fix the FATE test. The real cause of the test breakage was
fixed in 22b7c37275.

Signed-off-by: J. Dekker <jdek@itanimul.li>
2022-01-04 14:31:48 +01:00
J. Dekker 22b7c37275 lavc/arm: dont assign hevc_qpel functions for non-multiple of 8 widths
The assembly is written assuming that the width is a multiple of 8.

However the real issue is the functions were errorneously assigned to
the 2, 4, 6 & 12 widths. This behaviour never broke the decoder as
samples which trigger the functions for these widths have not been found
in the wild. This relies on the mappings in ff_hevc_pel_weight[].

Signed-off-by: J. Dekker <jdek@itanimul.li>
2022-01-04 14:31:32 +01:00
Martin Storsjö 2d5a7f6d00 arm/aarch64: Improve scheduling in the avg form of h264_qpel
Don't use the loaded registers directly, avoiding stalls on in
order cores. Use vrhadd.u8 with q registers where easily possible.

Signed-off-by: Martin Storsjö <martin@martin.st>
2021-10-18 14:27:36 +03:00
Martin Storsjö 2589060b92 arm: hevc_qpel: Fix the assembly to work with non-multiple of 8 widths
This unbreaks the fate-checkasm-hevc_pel test on arm targets.

The assembly assumed that the width passed to the DSP functions is
a multiple of 8, while the checkasm test used other widths too.

This wasn't noticed before, because the hevc_pel checkasm tests
(that were added in 9c513edb79 in
January) weren't run as part of fate until in
b492cacffd in August.

As this hasn't been an issue in practice with actual full decoding
tests, it seems like the actual decoder doesn't call these functions
with such widths. Therefore, we could alternatively fix the test
to only test things that the real decoder does, and this modification
could be reverted.

Signed-off-by: Martin Storsjö <martin@martin.st>
2021-08-25 23:24:49 +03:00
Andreas Rheinhardt afc95a10ac avcodec/h264dsp, h264idct: Fix lengths of array parameters
Fixes many -Warray-parameter warnings from GCC 11.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-08-08 17:44:57 +02:00
Andreas Rheinhardt 7c1f347b18 avcodec: Remove deprecated old encode/decode APIs
Deprecated in commits 7fc329e2dd
and 31f6a4b4b8.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2021-04-27 10:43:12 -03:00
Andreas Rheinhardt f3c197b129 Include attributes.h directly
Some files currently rely on libavutil/cpu.h to include it for them;
yet said file won't use include it any more after the currently
deprecated functions are removed, so include attributes.h directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-04-19 14:34:10 +02:00
James Almer f1a894f9d3 avcodec: add missing FF_API_OLD_ENCDEC wrappers to xmm clobber functions
Signed-off-by: James Almer <jamrial@gmail.com>
2021-02-26 19:26:31 -03:00
Lynne 151b41c8cc
fft: remove 16-bit FFT and MDCT code
No longer used by anything.
Unfortunately the old FFT_FLOAT/FFT_FIXED_32 is left as-is. It's
simply too much work for code meant to be all removed anyway.
2021-01-14 01:44:21 +01:00
Lynne 9e05421dbe
ac3enc_fixed: drop unnecessary fixed-point DSP code 2021-01-14 01:44:20 +01:00
Anton Khirnov e15371061d lavu/mem: move the DECLARE_ALIGNED macro family to mem_internal on next+1 bump
They are not properly namespaced and not intended for public use.
2021-01-01 14:14:57 +01:00
Anton Khirnov c8c2dfbc37 lavu: move LOCAL_ALIGNED from internal.h to mem_internal.h
That is a more appropriate place for it.
2021-01-01 14:11:01 +01:00
Martin Storsjö b252178321 libavcodec: arm: Add a NEON implementation of pixblockdsp
Cortex A7     A8     A9    A53   A72
get_pixels_c:                144.7  146.0  143.0  137.7   69.0
get_pixels_armv6:            112.0  106.7   90.2   95.0   72.5
get_pixels_neon:              69.0   29.7   68.7   40.2   19.0
get_pixels_unaligned_c:      144.7  146.2  143.0  137.7   69.0
get_pixels_unaligned_neon:    77.0   36.5   72.5   48.5   19.0
diff_pixels_c:               376.7  319.7  265.5  307.7  148.0
diff_pixels_armv6:           179.0  159.5  205.5  139.0  142.0
diff_pixels_neon:             69.0   40.2   77.5   53.2   26.0
diff_pixels_unaligned_c:     376.7  319.7  265.5  307.7  148.0
diff_pixels_unaligned_neon:   85.0   54.5   93.5   66.7   26.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2020-05-15 23:37:43 +03:00
qoroliang cacdac819f lavc/hevcdec: fix the HEVC decoder crash when memory over-read
Fix an occasional crash for hevc decoder in ARM 32 platform, the
root cause is the memory over read(read cross the memory boundary)
in SAO NENO functions ff_hevc_sao_band_filter_neon_8 and
ff_hevc_sao_edge_filter_neon_8.

After this fix, the crash disapper in the massive Android phone
test.

Signed-off-by: qoroliang <qoroliang@tencent.com>
2020-04-20 10:28:04 +08:00
Aman Gupta 0e49560806 avcodec/arm/mlpdsp: add missing dependency for truehd
Signed-off-by: Aman Gupta <aman@tmm1.net>
2019-11-11 11:29:55 -08:00
James Almer 47e12966b7 Merge commit '0676de935b1e81bc5b5698fef3e7d48ff2ea77ff'
* commit '0676de935b1e81bc5b5698fef3e7d48ff2ea77ff':
  arm: Implement a NEON version of 422 h264_h_loop_filter_chroma

Merged-by: James Almer <jamrial@gmail.com>
2019-03-22 16:06:04 -03:00
Martin Storsjö 0676de935b arm: Implement a NEON version of 422 h264_h_loop_filter_chroma
Previously, the 420 version was used even for 422.

This fixes occasional checkasm failures.

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-03-21 22:03:46 +02:00
James Almer d6b62ce1ac Merge commit 'cef914e08310166112ac09567e66452a7679bfc8'
* commit 'cef914e08310166112ac09567e66452a7679bfc8':
  arm: vp8: Optimize put_epel16_h6v6 with vp8_epel8_v6_y2

Merged-by: James Almer <jamrial@gmail.com>
2019-03-14 16:19:41 -03:00
James Almer 7b9ca44cbc arm/h264dsp: change loop filter stride argument to ptrdiff_t
This was missed in d5d699ab6e

Signed-off-by: James Almer <jamrial@gmail.com>
2019-02-20 19:38:55 -03:00
Martin Storsjö cef914e083 arm: vp8: Optimize put_epel16_h6v6 with vp8_epel8_v6_y2
This makes it similar to put_epel16_v6, and gives a 10-25%
speedup of this function.

Before:                   Cortex A7       A8       A9      A53     A72
vp8_put_epel16_h6v6_neon:    3058.0   2218.5   2459.8   2183.0  1572.2
After:
vp8_put_epel16_h6v6_neon:    2670.8   1934.2   2244.4   1729.4  1503.9

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:46:18 +02:00
Meng Wang 3b2fd96048 avcodec/arm/hevcdsp_sao : add NEON optimization for sao
Signed-off-by: Meng Wang <wangmeng.kids@bytedance.com>
Reviewed-by: Shengbin Meng <shengbinmeng@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-04-09 03:45:15 +02:00
Martin Storsjö 5f83935de4 arm: hevcdsp: Add commas between macro arguments
When targeting darwin, clang requires commas between arguments,
while the no-comma form is allowed for other targets.

Since Xcode 9.3, the bundled clang supports altmacro and doesn't
require using gas-preprocessor any longer.

Signed-off-by: Martin Storsjö <martin@martin.st>
2018-03-31 21:59:01 +03:00