1
mirror of https://git.videolan.org/git/ffmpeg.git synced 2024-09-02 06:23:23 +02:00
Commit Graph

1858 Commits

Author SHA1 Message Date
Christophe Gisquet
9fa056ba75 pngdsp x86: use unaligned access
For test images manually generated to contain only up prediction,
timing results:
         8380x3032    255x185
before:   138635       1992
after:    139232       1996

Actually jumping to the proper version depending on the alignment:
8380x3032: 138767

A 0.5% speed improvement for gigantic images is not worth the code
duplication.

Fixes ticket #4148

Signed-off-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Tested-by: Benoit Fouet <benoit.fouet@free.fr>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-03 11:56:22 +01:00
Kieran Kunhya
36091742d1 v210enc: Add SIMD optimised 8-bit and 10-bit encoders
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-26 20:30:47 +01:00
Michael Niedermayer
ea41e6d637 Merge commit '9c12c6ff9539e926df0b2a2299e915ae71872600'
* commit '9c12c6ff9539e926df0b2a2299e915ae71872600':
  motion_est: convert stride to ptrdiff_t

Conflicts:
	libavcodec/me_cmp.c
	libavcodec/ppc/me_cmp.c
	libavcodec/x86/me_cmp_init.c

See: 9c669672c7
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-24 12:13:00 +01:00
Vittorio Giovara
9c12c6ff95 motion_est: convert stride to ptrdiff_t
CC: libav-stable@libav.org
Bug-Id: CID 700556 / CID 700557 / CID 700558
2014-11-24 01:30:10 +00:00
Carl Eugen Hoyos
600e38f563 Fix standalone compilation of the apng decoder on x86. 2014-11-23 13:21:29 +01:00
Michael Niedermayer
65ce8f8895 avcodec/x86/Makefile: fix order
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-23 01:49:04 +01:00
Michael Niedermayer
d3512a0e89 avcodec/x86/lossless_audiodsp: fix fallback code for 32bit
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-22 21:08:38 +01:00
Michael Niedermayer
4327088da3 avcodec/x86/lossless_audiodsp: support len %16 == 8 in scalarproduct_and_madd_int16()
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-22 20:40:36 +01:00
Reimar Döffinger
478c61ccb2 h264_i386: Optimize decode_significance_8x8_x86 for 64 bit.
11674 -> 10877 decicycles on my Phenom II.
Overall speedup was unfortunately within measurement error.

Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
2014-11-22 14:06:48 +01:00
James Almer
3cec54b7d7 x86/flacdsp: add SSE2 and AVX decorrelate functions
Two to four times faster depending on instruction set, block size and channel count.
2014-11-13 13:47:55 -03:00
James Almer
84ccc317ce x86/flacdsp: separate decoder and encoder dsp initialization
Signed-off-by: James Almer <jamrial@gmail.com>
2014-11-12 14:41:45 -03:00
James Almer
7292b0477a x86/hpeldsp: fix loop in {avg,avg_no_rnd}_pixels16_x2_mmx
Handle it inside the __asm__() block.
Fixes fate-vc1_ilaced_twomv when using the gcc-usan toolchain.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-10-23 13:11:05 -03:00
Michael Niedermayer
3c1378ce0a Merge commit '2d91abade29e43bb45c881d45909b8ee77e904e2'
* commit '2d91abade29e43bb45c881d45909b8ee77e904e2':
  x86: h264_intrapred: Don't treat 32-bit integers as 64-bit

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-10-08 11:48:58 +02:00
Henrik Gramner
2d91abade2 x86: h264_intrapred: Don't treat 32-bit integers as 64-bit
The upper halves are not guaranteed to be zero in x86-64.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2014-10-08 08:15:52 +00:00
Mickaël Raulet
4ba6371a83 x86/hevc: get rid off packusdw for ssse3 compatibility
cherry picked from commit df8ebe304df453f26c28ff8f11d607f49b90a4c2

Fixes out of array access
Fixes: asan_stack-oob_1046454_9_asan_stack-oob_15a9e7c_170_WP_MAIN10_B_Toshiba_3.bit

Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-10-04 21:14:15 +02:00
James Almer
0de1d6287e x86/mlpdec: add ff_mlp_rematrix_channel_{sse4,avx2}
2x to 2.5x faster than the C version.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-10-02 22:11:55 -03:00
James Almer
acebff8e5d x86/mpegvideoencdsp: improve ff_pix_sum16_sse2
~15% faster.

Also add an mmxext version that takes advantage of the new code, and
build it alongside with the mmx version only on x86_32.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-10-01 13:07:22 -03:00
Michael Niedermayer
d22e88d120 avcodec/x86/fmtconvert: Fix operand size in ff_int32_to_float_fmul_array8_sse*
Fixes acodec-dca2 fate failure

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-28 19:04:06 +02:00
James Almer
26cd7b1e1a x86/fmtconvert: add ff_int32_to_float_fmul_array8_{sse,sse2}
About two times faster than the c wrapper.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-26 20:48:40 -03:00
Carl Eugen Hoyos
c0f9df30dd lavc/x86/idctdsp.h: Fix make checkheaders. 2014-09-25 22:18:25 +02:00
James Almer
a829870b2f avcodec/svq1enc: align buffer used by simd functions
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-25 16:00:20 -03:00
James Almer
4b892e469b x86/cavsdsp: fix buffer alignment in cavs_idct8_add_mmx()
It may be used by ff_add_pixels_clamped_sse2().
Should fix fate-cavs failures on some systems.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-25 16:00:16 -03:00
James Almer
4f4f08e6f0 x86/idctdsp: port {put,add}_pixels_clamped to yasm
Also add sse2 versions for both.
put_pixels_clamped port and sse2 version originally written by Timothy Gu.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-24 21:52:13 -03:00
James Almer
c99a882814 avcodec/idctdsp: change {put,add}_pixels_clamped to ptrdiff_t line_size
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-24 21:43:19 -03:00
James Almer
ad26e83f9c avcodec/x86: use function pointers for {put,add}_pixels_clamped
Same behavior as in simple_idct.
This way the best optimized versions available will be used instead.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-24 18:52:32 -03:00
James Almer
70277d1d23 x86/videodsp: add ff_emu_edge_{hfix,hvar}_avx2
~15% faster than sse2.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-24 16:12:55 -03:00
James Almer
164d6c7f5b x86/videodsp: fix warning about discarded 'const' qualifier
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-23 19:59:20 -03:00
James Almer
6b2caa321f x86/vp9: add AVX and AVX2 MC
Roughly 25% faster MC than ssse3 for blocksizes 32 and 64.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-22 22:35:03 -03:00
James Almer
33c752be51 x86/me_cmp: port mmxext vsad functions to yasm
Also add mmxext versions of vsad8 and vsad_intra8, and sse2 versions of
vsad16 and vsad_intra16.
Since vsad8 and vsad16 are not bitexact, they are accordingly marked as
approximate.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-19 20:50:20 -03:00
James Almer
77f9a81cca x86/me_cmp: combine sad functions into a single macro
No point in having the sad8 functions separate now that the loop is no
longer unrolled.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-17 23:52:36 -03:00
Michael Niedermayer
41d82b85ab avcodec/x86/vp9lpf: Always include x86util.asm
Fixes executable stack

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-17 23:37:46 +02:00
Michael Niedermayer
85f2c0124d avcodec/x86/me_cmp: fix sad8xh
This adds back support for 8x4 and 8x16
it does not support 8x2, i think nothing uses that

Found-by: ubitux
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-17 14:08:24 +02:00
James Almer
0456d169c4 x86/me_cmp: port mmxext and sse2 sad functions to yasm
Also add a missing c->pix_abs[0][0] initialization, and sse2 versions of
sad16_x2, sad16_y2 and sad16_xy2 (%15 to %20 faster than mmxext).
Since the _xy2 versions are not bitexact, they are accordingly marked as
approximate.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-17 11:12:50 +02:00
James Almer
52ec81c67d x86/hevc_res_add: add missing guards to hevc_transform_add32_8_avx2
Should fix compilation with old Yasm/Nasm versions.

Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-04 23:34:01 -03:00
James Almer
c3d2426cca x86/hevc_res_add: add ff_hevc_transform_add32_8_avx2
~20% faster than AVX.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-04 20:21:29 -03:00
James Darnley
46ef45ab59 lavc/x86/v210: give cpuflag to INIT macro
This lets the cglobal macro automatically append a suffix to the function name.
This means that INIT_XMM avx must be used rather than INIT_AVX.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-05 00:35:07 +02:00
Michael Niedermayer
5b58d79a99 Merge commit '7a1d6ddd2c6b2d66fbc1afa584cf506930a26453'
* commit '7a1d6ddd2c6b2d66fbc1afa584cf506930a26453':
  xvid: Add C IDCT

Conflicts:
	libavcodec/dct-test.c
	libavcodec/xvididct.c

See: 298b3b6c1f
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-03 04:09:38 +02:00
Michael Niedermayer
5db23c07a3 Merge commit '95c0cec03acec0a80cc1c7db48f3b2355d9e767b'
* commit '95c0cec03acec0a80cc1c7db48f3b2355d9e767b':
  idctdsp: Add global function pointers for {add|put}_pixels_clamped functions

Conflicts:
	libavcodec/arm/idctdsp_init_arm.c
	libavcodec/dct.h
	libavcodec/idctdsp.c
	libavcodec/jrevdct.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-03 03:19:40 +02:00
Pascal Massimino
7a1d6ddd2c xvid: Add C IDCT
Thanks to Pascal Massimino and Michael Militzer for relicensing as LGPL.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-09-02 14:41:13 -07:00
Diego Biurrun
95c0cec03a idctdsp: Add global function pointers for {add|put}_pixels_clamped functions
These function pointers already existed in the ARM code. Adding them globally
allows calls to the function pointers to access arch-optimized versions of the
functions transparently.
2014-09-02 14:41:13 -07:00
Reimar Döffinger
d9e2aceb7f Add missing "const" all over the place.
Only "./configure --enable-gpl" on x86 was tested.

Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
2014-08-29 18:57:25 +02:00
Michael Niedermayer
5403a288a7 Merge commit '8d27bf1cff35be406b0fd89d832e1852d4c573bc'
* commit '8d27bf1cff35be406b0fd89d832e1852d4c573bc':
  x86: xvid: K&R formatting cosmetics

Conflicts:
	libavcodec/x86/xvididct_sse2.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-27 21:20:39 +02:00
Michael Niedermayer
b3b05a11d3 Merge commit 'dcb7c868ec7af7d3a138b3254ef2e08f074d8ec5'
* commit 'dcb7c868ec7af7d3a138b3254ef2e08f074d8ec5':
  cosmetics: Make naming scheme of Xvid IDCT consistent with other IDCTs

Conflicts:
	libavcodec/mpeg4videodec.c
	libavcodec/x86/Makefile
	libavcodec/x86/dct-test.c
	libavcodec/x86/xvididct_sse2.c
	libavcodec/xvididct.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-27 21:09:30 +02:00
Michael Niedermayer
3ff5ca89fc Merge commit '1f156af4274dc72d588620f6bedb4e9e66023c92'
* commit '1f156af4274dc72d588620f6bedb4e9e66023c92':
  x86: xvid_idct: Drop unused definitions

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-27 21:01:54 +02:00
Diego Biurrun
8d27bf1cff x86: xvid: K&R formatting cosmetics 2014-08-27 05:58:04 -07:00
Diego Biurrun
dcb7c868ec cosmetics: Make naming scheme of Xvid IDCT consistent with other IDCTs 2014-08-27 04:54:05 -07:00
Diego Biurrun
1f156af427 x86: xvid_idct: Drop unused definitions 2014-08-27 04:36:41 -07:00
Christophe Gisquet
3e892b2bcd x86: hevc_mc: split differently calls
In some cases, 2 or 3 calls are performed to functions for unusual
widths. Instead, perform 2 calls for different widths to split the
workload.

The 8+16 and 4+8 widths for respectively 8 and more than 8 bits can't
be processed that way without modifications: some calls use unaligned
buffers, and having branches to handle this was resulting in no
micro-benchmark benefit.

For block_w == 12 (around 1% of the pixels of the sequence):
Before:
12758 decicycles in epel_uni, 4093 runs, 3 skips
19389 decicycles in qpel_uni, 8187 runs, 5 skips
22699 decicycles in epel_bi, 32743 runs, 25 skips
34736 decicycles in qpel_bi, 32733 runs, 35 skips

After:
11929 decicycles in epel_uni, 4096 runs, 0 skips
18131 decicycles in qpel_uni, 8184 runs, 8 skips
20065 decicycles in epel_bi, 32750 runs, 18 skips
31458 decicycles in qpel_bi, 32753 runs, 15 skips

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-24 12:05:33 +02:00
Christophe Gisquet
38e2aa3759 x86: hevc_mc: correct unneeded use of SSE4 code
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-24 11:43:33 +02:00
Christophe Gisquet
2346f2b5db x86: hevcdsp: use compilation-time-fixed constant
The stride for some buffers is known.

Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-22 16:26:30 +02:00