1
mirror of https://git.videolan.org/git/ffmpeg.git synced 2024-09-09 09:16:59 +02:00
Commit Graph

1032 Commits

Author SHA1 Message Date
Diego Biurrun
0a35f128f3 cabac: x86: Give optimizations header a more meaningful name 2016-12-01 08:23:54 +01:00
Diego Biurrun
0361e4dcb4 h264_qpel: x86: Move function with only one instance out of template macro
libavcodec/x86/h264_qpel.c:392:785: warning: unused function 'ff_avg_h264_qpel8or16_hv1_lowpass_mmxext' [-Wunused-function]
2016-11-08 17:21:02 +01:00
Diego Biurrun
3cba09e522 x86: Drop stray semicolons after function definitions
libavcodec/x86/rv40dsp_init.c:97:2: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]
libavcodec/x86/vp9dsp_init.c:94:40: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]
2016-11-05 12:41:45 +01:00
Martin Storsjö
2e55e26b40 vp9: Flip the order of arguments in MC functions
This makes it match the pattern already used for VP8 MC functions.

This also makes the signature match ffmpeg's version of these
functions, easing porting of code in both directions.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-03 09:12:02 +02:00
Pierre Edouard Lepere
6d5636ad9a hevc: x86: Add add_residual() SIMD optimizations
Initially written by Pierre Edouard Lepere <Pierre-Edouard.Lepere@insa-rennes.fr>,
extended by James Almer <jamrial@gmail.com>.

Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
2016-10-22 17:33:35 +02:00
Diego Biurrun
788544ff0e audiodsp: x86: Remove pointless header file
Its single forward declaration can be moved to the only place
it is used, like is done for all other dsp init files.
2016-10-19 15:20:41 +02:00
Diego Biurrun
b89804da9b x86: videodsp: Add parentheses to expression to work around warning
libavcodec/x86/videodsp.asm:128: warning: signed dword value exceeds bounds
2016-10-19 10:13:34 +02:00
Diego Biurrun
6be7944ee2 x86: Add missing colons after assembly labels
This fixes many warnings of the sort
warning: label alone on a line without a colon might be in error
2016-10-17 16:31:26 +02:00
Alexandra Hájková
112cee0241 hevc: Add SSE2 and AVX IDCT
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-11 18:21:04 +02:00
Anton Khirnov
e4128c08d7 Revert "hevc: x86: Refactor IDCT macro declarations"
This reverts commit d9dccc0389. There were
outstanding objections to this commit.
2016-10-06 15:24:04 +02:00
Diego Biurrun
5801f9ed24 h264_intrapred: x86: Update comments left behind in 95c89da36e 2016-10-06 12:32:34 +02:00
Diego Biurrun
d9dccc0389 hevc: x86: Refactor IDCT macro declarations 2016-10-06 12:32:34 +02:00
Ronald S. Bultje
715f139c9b vp9lpf/x86: make filter_16_h work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:09 +02:00
Ronald S. Bultje
8915320db9 vp9lpf/x86: make filter_48/84/88_h work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:09 +02:00
Ronald S. Bultje
725a216481 vp9lpf/x86: make filter_44_h work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:09 +02:00
Ronald S. Bultje
5bfa96c4b3 vp9lpf/x86: make filter_16_v work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:09 +02:00
Ronald S. Bultje
b905e8d2fe vp9lpf/x86: make filter_48/84_v work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
37637e6590 vp9lpf/x86: make filter_88_v work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
be10834bd9 vp9lpf/x86: make filter_44_v work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
7c62891efe vp9lpf/x86: save one register in SIGN_ADD/SUB.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
c6375a83d1 vp9lpf/x86: store unpacked intermediates for filter6/14 on stack.
filter16 goes from 508 to 482 (h) or 346 to 314 (v) cycles; filter88
goes from 240 to 238 (h) or 174 to 165 (v) cycles, measured on TOS.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
4ce8ba72f9 vp9lpf/x86: move variable assigned inside macro branch.
The value is not used outside the branch.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
e4961035b2 vp9lpf/x86: simplify ABSSUM_CMP by inverting the comparison meaning.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
683da2788e vp9lpf/x86: remove unused register from ABSSUB_CMP macro.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
6e74e9636b vp9lpf/x86: slightly simplify 44/48/84/88 h stores.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
6411c328a2 vp9lpf/x86: make cglobal statement more conservative in register allocation.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
a6e288d624 vp9lpf/x86: save one register in loopfilter surface coverage.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Clément Bœsch
0ed21bdc9e vp9lpf/x86: add ff_vp9_loop_filter_[vh]_44_16_{sse2,ssse3,avx}.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Clément Bœsch
f2e3d706a1 vp9lpf/x86: add ff_vp9_loop_filter_h_{48,84}_16_{sse2,ssse3,avx}().
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
James Almer
92d47550ea vp9lpf/x86: add an SSE2 version of vp9_loop_filter_[vh]_88_16
Similar gains as the ssse3 version once again

Additional improvements by Clément Bœsch <u@pkh.me>.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Clément Bœsch
6bea478158 vp9lpf/x86: add ff_vp9_loop_filter_[vh]_88_16_{ssse3,avx}.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
James Almer
1f451eed60 vp9lpf/x86: add ff_vp9_loop_filter_[vh]_16_16_sse2().
Similar gains in performance as the SSSE3 version

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Clément Bœsch
a692724c58 vp9lpf/x86: add x86 SSSE3/AVX SIMD for vp9_loop_filter_[vh]_16_16.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Justin Ruggles
b57e38f52c ac3dsp: x86: Replace inline asm for in-decoder downmixing with standalone asm
Adds a wrapper function for downmixing which detects channel count changes
and updates the selected downmix function accordingly.

Simplification and porting to current x86inc infrastructure by Diego Biurrun.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-10-01 00:46:25 +02:00
Justin Ruggles
43717469f9 ac3dsp: Reverse matrix in/out order in downmix()
Also use (float **) instead of (float (*)[2]). This matches the matrix
layout in libavresample so we can reuse assembly code between the two.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-10-01 00:45:55 +02:00
Hendrik Leppkes
8d1267932c x86/h264_weight: use appropriate register size for weight parameters
This fixes decoding corruption on 64 bit windows.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-09-30 12:18:22 +03:00
Diego Biurrun
2caa93b813 mpegaudiodsp: Change type of array stride parameters to ptrdiff_t
This avoids SIMD-optimized functions having to sign-extend their
stride argument manually to be able to do pointer arithmetic.
2016-09-29 17:54:24 +02:00
Diego Biurrun
e4a94d8b36 h264chroma: Change type of stride parameters to ptrdiff_t
This avoids SIMD-optimized functions having to sign-extend their
stride argument manually to be able to do pointer arithmetic.
2016-09-29 14:48:04 +02:00
Diego Biurrun
2ec9fa5ec6 idct: Change type of array stride parameters to ptrdiff_t
ptrdiff_t is the correct type for array strides and similar.
2016-09-29 14:48:03 +02:00
Diego Biurrun
009adfd4fb x86: fpel: Remove unnecessary sign extend 2016-09-29 14:47:41 +02:00
Anton Khirnov
de2ae3c1fa lavc: add clobber tests for the new encoding/decoding API 2016-09-28 10:01:52 +02:00
Anton Khirnov
12004a9a7f audiodsp/x86: yasmify vector_clipf_sse 2016-09-22 09:47:52 +02:00
Anton Khirnov
683da86aab audiodsp: reorder arguments for vector_clipf
This will make the x86 asm simpler.

ARM conversion by Martin Storsjö <martin@martin.st> and Janne Grunau
<janne-libav@jannau.net>
2016-09-22 09:47:52 +02:00
Anton Khirnov
eea9857bfd blockdsp: drop the high_bit_depth parameter
It has no effect, since the code is supposed to operate the same way for
any bit depth.
2016-09-22 09:47:52 +02:00
Anton Khirnov
75d98e30af audiodsp/x86: clear the high bits of the order parameter on 64bit
Also change shl to add, since it can be faster on some CPUs.

CC: libav-stable@libav.org
2016-09-19 19:18:07 +02:00
Anton Khirnov
1d6c76e11f audiodsp/x86: fix ff_vector_clip_int32_sse2
This version, which is the only one doing two processing cycles per loop
iteration, computes the load/store indices incorrectly for the second
cycle.

CC: libav-stable@libav.org
2016-09-19 19:18:07 +02:00
Diego Biurrun
de452e5037 pixblockdsp: Change type of stride parameters to ptrdiff_t
This avoids SIMD-optimized functions having to sign-extend their
line size argument manually to be able to do pointer arithmetic.

Also adjust parameter names to be "stride" everywhere.
2016-09-14 14:12:36 +02:00
Diego Biurrun
721d57e608 vp56: Separate VP5 and VP6 dsp initialization
VP5 has no arch-specific optimizations (nor will it get some in the
future), so it makes no sense to try to share dsp init code with VP6.
2016-08-26 11:50:22 +02:00
Diego Biurrun
3fd22538bc prores: Change type of stride parameters to ptrdiff_t
This avoids SIMD-optimized functions having to sign-extend their
line size argument manually to be able to do pointer arithmetic.

Also adjust parameter names to be "linesize" everywhere.
2016-08-26 11:50:21 +02:00
Diego Biurrun
f81be06cf6 cavs: Change type of stride parameters to ptrdiff_t
ptrdiff_t is the correct type for array strides and similar.
2016-08-26 11:48:15 +02:00