ffmpeg

mirror of https://git.videolan.org/git/ffmpeg.git synced 2024-09-16 03:44:15 +02:00

Author	SHA1	Message	Date
Rostislav Pehlivanov	50945482a7	h264_idct: enable unmacro on newer NASM versions Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>	2018-02-12 10:50:37 +00:00
Martin Vignali	8f9c38b196	avcodec/utvideoenc : add SIMD (avx) for sub_left_prediction asm code by Henrik Gramner	2018-01-28 20:23:11 +01:00
James Almer	6e80079a28	avcodec: increase AV_INPUT_BUFFER_PADDING_SIZE to 64 AVX-512 support has been introduced, and even if no functions currently use zmm registers (able to load as much as 64 bytes of consecutive data per instruction), they will be added eventually. Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com> Tested-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: James Almer <jamrial@gmail.com>	2018-01-11 23:46:31 -03:00
James Almer	438f884fc4	x86/lossless_videodsp: rename ff_add_left_pred_int16_sse4 to ff_add_left_pred_int16_unaligned_ssse3 SSSE3_FAST is the proper check for it. Signed-off-by: James Almer <jamrial@gmail.com>	2017-12-10 00:51:01 -03:00
James Almer	a4fc63c0f9	x86/lossless_videodsp: don't overread the dst buffer in ff_add_left_pred_unaligned_avx2 Fixes valgrind Signed-off-by: James Almer <jamrial@gmail.com>	2017-12-10 00:38:05 -03:00
Martin Vignali	630967ef63	avcodec/utvideodec : add SIMD (SSSE3 and AVX2) for gradient_pred	2017-12-09 15:19:03 +01:00
Martin Vignali	4353c35067	avcodec/x86/lossless_videodsp : add avx2 version for add_left_pred	2017-12-09 15:16:03 +01:00
Martin Vignali	cfbcea1cca	avcodec/x86/lossless_videodsp.asm : make macro for add_left_pred_unaligned in order to add avx2 version	2017-12-09 15:15:59 +01:00
Martin Vignali	be6d1f9632	avcodec/x86/bswapdsp : use macro for 128 bits constants loading in xmm or ymm	2017-12-02 18:25:25 +01:00
Mikulas Patocka	fbdd78fa3e	avcodec/fft: fix INTERL macro on 3dnow The commit `b7c16a3f2c` ("x86: fft: Port to cpuflags") breaks the opus decoder in ffmpeg when compiling for 3dnow. The output is audible, but there's a lot of noise. The reason for the breakage is that the commit unintentionally changed the INTERL macro so that it is empty when compiling for 3dnow. This patch fixes it. Signed-off-by: Mikulas Patocka <mikulas@twibright.com> Signed-off-by: James Almer <jamrial@gmail.com>	2017-11-25 13:11:45 -03:00
Martin Vignali	515555af6c	avcodec/x86/exrdsp : use ymm constant for pb_80 speed seems to be similar, but simplify code	2017-11-23 20:00:13 +01:00
James Almer	beb63baa69	x86/utvideodsp: reuse shared constants Remove the broadcast instructions as well now that they are wide enough. Signed-off-by: James Almer <jamrial@gmail.com>	2017-11-21 10:57:14 -03:00
James Almer	ebf352116b	x86/constants: make pb_80 32 byte wide Signed-off-by: James Almer <jamrial@gmail.com>	2017-11-21 10:57:03 -03:00
Martin Vignali	ba98f8463f	avcodec/huffyuvdspenc : add diff_int16 AVX2 func	2017-11-21 09:42:08 +01:00
Martin Vignali	d189a426fa	avcodec/huffyuvdspenc : reorganize diff_int16	2017-11-21 09:42:03 +01:00
Martin Vignali	e641c94190	avcodec/huffyuvdsp : add add_int16 AVX2 func	2017-11-21 09:41:58 +01:00
Martin Vignali	6955e8842e	avcodec/huffyuvdsp : reorganize add_int16 asm	2017-11-21 09:41:52 +01:00
Martin Vignali	7f9b67bcb6	avcodec/huffyuvdsp(enc) : move duplicate macro to a template file	2017-11-21 09:41:46 +01:00
Martin Vignali	caf51a573d	avcodec/x86/utvideodsp.asm : cosmetic better func separator and add comment for the restore rgb planes10 declaration	2017-11-21 09:00:47 +01:00
Martin Vignali	b5ebe38443	avcodec/utvideodsp : add avx2 version for the dsp	2017-11-21 09:00:42 +01:00
Martin Vignali	48b7c45b0c	avcodec/x86/utvideodsp : make macro for func	2017-11-21 09:00:38 +01:00
James Almer	aea0f06db7	x86/jpeg2000dsp: add ff_ict_float_{fma3,fma4} jpeg2000_ict_float_c: 2296.0 jpeg2000_ict_float_sse: 628.0 jpeg2000_ict_float_avx: 317.0 jpeg2000_ict_float_fma3: 262.0 Signed-off-by: James Almer <jamrial@gmail.com>	2017-11-20 18:33:58 -03:00
Michael Niedermayer	58cf31cee7	avcodec/x86/mpegvideodsp: Fix signedness bug in need_emu Fixes: out of array read Fixes: 3516/attachment-311488.dat Found-by: Insu Yun, Georgia Tech. Tested-by: wuninsu@gmail.com Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-11-14 04:54:31 +01:00
Thomas Köppe	43171a2a73	Fix missing used attribute for inline assembly variables Variables used in inline assembly need to be marked with `__attribute__((used))`. Static constants already were, via the define of DECLARE_ASM_CONST. But DECLARE_ALIGNED does not add this attribute, and some of the variables defined with it are const only used in inline assembly, and therefore appeared dead. This change adds a macro DECLARE_ASM_ALIGNED that marks variables as used. This change makes FFMPEG work with Clang's ThinLTO. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-11-13 03:58:34 +01:00
Martin Vignali	0380b72d35	libavcodec/lossless_video_dsp : cosmetic add better separator for each function, in order to make reading of the asm file easier	2017-11-07 00:56:54 +01:00
Martin Vignali	da62128ea1	libavcodec/lossless_videodsp : add add_bytes avx2 version	2017-11-07 00:56:02 +01:00
James Almer	783535a4cd	x86/bswapdsp: add missing preprocessor wrappers for AVX2 functions Fixes build with old nasm/yasm. Signed-off-by: James Almer <jamrial@gmail.com>	2017-10-29 22:21:51 -03:00
Martin Vignali	e9930883a2	libavcodec/bswapdsp : add AVX2 func for bswap_buf (swap uint32_t)	2017-10-29 15:21:35 +01:00
James Almer	b7c16a3f2c	Merge commit '681a86aba6cb09b98ad716d986182060c7795d20' * commit '681a86aba6cb09b98ad716d986182060c7795d20': x86: fft: Port to cpuflags Merged-by: James Almer <jamrial@gmail.com>	2017-10-21 12:45:49 -03:00
James Almer	11f5ffd330	Merge commit 'e9bb77fb1012cba1951a82136df7071f71bce8fb' * commit 'e9bb77fb1012cba1951a82136df7071f71bce8fb': x86: h264: Simplify DEQUANT macro with cpuflags Merged-by: James Almer <jamrial@gmail.com>	2017-10-21 12:39:41 -03:00
James Almer	53eea3a569	Merge commit '307eb1a8ee363db1fcf869e427a8deb6d9538881' * commit '307eb1a8ee363db1fcf869e427a8deb6d9538881': x86: vp8dsp: port FILTER_BILINEAR macro to cpuflags Merged-by: James Almer <jamrial@gmail.com>	2017-10-21 12:28:39 -03:00
James Almer	2904db9045	Merge commit '994c4bc10751e39c7ed9f67ffd0c0dea5223daf2' * commit '994c4bc10751e39c7ed9f67ffd0c0dea5223daf2': x86util: Port all macros to cpuflags See `d5f8a642f6` Merged-by: James Almer <jamrial@gmail.com>	2017-10-21 12:15:57 -03:00
James Almer	b78bb51a7c	Merge commit '6eef263aca281fb582e1fa3d841ac20ef747a252' * commit '6eef263aca281fb582e1fa3d841ac20ef747a252': x86: Merge align directives into SECTION_RODATA declarations where possible Merged-by: James Almer <jamrial@gmail.com>	2017-10-12 13:48:35 -03:00
James Almer	18279738f9	x86/blockdsp: use three operand form for an instruction Fixes assembling with old yasm.	2017-10-04 23:51:44 -03:00
Michael Niedermayer	26ea142658	avcodec/x86/lossless_videoencdsp: Fix warning: signed dword value exceeds bounds Add () to regsize define Suggested-by: Henrik Gramner <henrik@gramner.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-10-05 01:22:44 +02:00
Michael Niedermayer	df62b70de8	avcodec/x86/lossless_videoencdsp: Fix handling of small widths Fixes out of array access Fixes: crash-huf.avi Regression since: `6b41b44149` This could also be fixed by adding checks in the C code that calls the dsp Found-by: Zhibin Hu and 连一汉 <lianyihan@360.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-10-05 01:22:44 +02:00
Martin Vignali	cbbec68847	libavcodec/blockdsp : add AVX version Also modify the required alignment, to 32 instead of 16 for several codecs Signed-off-by: James Almer <jamrial@gmail.com>	2017-10-03 19:47:37 -03:00
Martin Vignali	ac5908b13f	libavcodec/exr : add x86 SIMD for predictor Signed-off-by: James Almer <jamrial@gmail.com>	2017-10-01 17:35:30 -03:00
James Almer	0c005fa86f	Merge commit '7abdd026df6a9a52d07d8174505b33cc89db7bf6' * commit '7abdd026df6a9a52d07d8174505b33cc89db7bf6': asm: Consistently uppercase SECTION markers Merged-by: James Almer <jamrial@gmail.com>	2017-09-26 18:48:06 -03:00
James Almer	318778de9e	Merge commit 'fd9212f2edfe9b107c3c08ba2df5fd2cba5ab9e3' * commit 'fd9212f2edfe9b107c3c08ba2df5fd2cba5ab9e3': Mark some arrays that never change as const. Merged-by: James Almer <jamrial@gmail.com>	2017-09-26 16:02:40 -03:00
Henrik Gramner	18821e3ba1	x86/exrdsp: optimize ff_reorder_pixels_avx2() Tested with "checkasm --test=exrdsp -bench" Before: reorder_pixels_c: 5187.8 reorder_pixels_sse2: 377.0 reorder_pixels_avx2: 331.3 After: reorder_pixels_c: 5181.5 reorder_pixels_sse2: 377.0 reorder_pixels_avx2: 313.8 Signed-off-by: James Almer <jamrial@gmail.com>	2017-09-18 23:24:55 -03:00
James Almer	98d7ad085e	avcodec/exrdsp: improve the ExrDSPContext->reorder_pixels prototype Make dst be the first parameter and src const. It's more in line with the rest of the codebase. Signed-off-by: James Almer <jamrial@gmail.com>	2017-09-17 19:01:40 -03:00
Martin Vignali	9b8c1224d7	libavcodec/exr : add X86 SIMD for reorder_pixels Signed-off-by: James Almer <jamrial@gmail.com>	2017-09-17 17:53:57 -03:00
Michael Niedermayer	bc488ec28a	avcodec/me_cmp: Fix crashes on ARM due to misalignment Adds a diff_pixels_unaligned() Fixes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=872503 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-08-21 23:19:18 +02:00
Ivan Kalvachev	43dab86bcd	opus_pvq_search: Restore the proper use of conditional define and simplify the function name suffix handling. Using named define properly documents the code paths. It also avoids passing additional numbered arguments through multiple levels of macro templates. The suffix handling is done by concatenation, like in other asm functions and avoid having two separate "cglobal" defines. Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>	2017-08-19 22:42:56 +01:00
Rostislav Pehlivanov	3c99523a28	opus_pvq_search: split functions into exactness and only use the exact if it's faster This splits the asm function into exact and non-exact version. The exact version is as fast or faster on newer CPUs (which EXTERNAL_AVX_FAST describes well) whilst the non-exact version is faster than the exact on older CPUs. Also fixes yasm compilation which doesn't accept !cpuflags(avx) syntax. Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>	2017-08-18 19:32:55 +01:00
Rostislav Pehlivanov	f386dd70ac	opus_pvq_search: only use rsqrtps approximation on CPUs with avx Makes the search produce identical results with the C version. Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>	2017-08-18 17:30:41 +01:00
Rostislav Pehlivanov	8e53cd1fab	opus_pvq_search: remove dead macro There's no point in toggling it, even for debugging. It's just worse. Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>	2017-08-18 17:27:41 +01:00
Ivan Kalvachev	7205513f8f	SIMD opus pvq_search implementation Explanation on the workings and methods used by the Pyramid Vector Quantization Search function could be found in the following Work-In-Progress mail threads: http://ffmpeg.org/pipermail/ffmpeg-devel/2017-June/212146.html http://ffmpeg.org/pipermail/ffmpeg-devel/2017-June/212816.html http://ffmpeg.org/pipermail/ffmpeg-devel/2017-July/213030.html http://ffmpeg.org/pipermail/ffmpeg-devel/2017-July/213436.html Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>	2017-08-18 17:18:32 +01:00
Rostislav Pehlivanov	70eb77b34e	mdct15: add inverse transform postrotation SIMD 2.5ms frames: Before (c): 2638 decicycles in postrotate, 2097040 runs, 112 skips After (sse3): 1467 decicycles in postrotate, 2097083 runs, 69 skips After (avx2): 1244 decicycles in postrotate, 2097085 runs, 67 skips 5ms frames: Before (c): 4987 decicycles in postrotate, 1048371 runs, 205 skips After (sse3): 2644 decicycles in postrotate, 1048509 runs, 67 skips After (avx2): 2031 decicycles in postrotate, 1048523 runs, 53 skips 10ms frames: Before (c): 9153 decicycles in postrotate, 523575 runs, 713 skips After (sse3): 5110 decicycles in postrotate, 523726 runs, 562 skips After (avx2): 3738 decicycles in postrotate, 524223 runs, 65 skips 20ms frames: Before (c): 17857 decicycles in postrotate, 261866 runs, 278 skips After (sse3): 10041 decicycles in postrotate, 261746 runs, 398 skips After (avx2): 7050 decicycles in postrotate, 262116 runs, 28 skips Improves total decoding performance for real world content by 9% with avx2. Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>	2017-07-30 07:38:39 +01:00
Wan-Teh Chang	ea1ca17be2	avcodec/x86/cavsdsp: Delete #include "libavcodec/x86/idctdsp.h". This file already has #include "idctdsp.h", which is resolved to the idctdsp.h header in the directory where this file resides by compilers. Two other files in this directory, libavcodec/x86/idctdsp_init.c and libavcodec/x86/xvididct_init.c, also rely on #include "idctdsp.h" working this way. Signed-off-by: Wan-Teh Chang <wtc@google.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-07-21 02:08:33 +02:00
James Almer	9d5e81d3b1	Revert "x86/sbrdsp: remove unnecessary sign extend instruction in apply_noise_main" This reverts commit `24bb7db403`. noise has to after all be sign extended, not zero extended, on tests other than checkasm. Fixes most aac tests broken by the now reverted commit.	2017-07-05 10:29:15 -03:00
James Almer	24bb7db403	x86/sbrdsp: remove unnecessary sign extend instruction in apply_noise_main noise needs to be zero extended and it can be done implicitly as a side effect in a subsequent instruction. Signed-off-by: James Almer <jamrial@gmail.com>	2017-07-04 23:36:17 -03:00
James Almer	bcbe9e4447	x86/sbrdsp: zero extend m_max in apply_noise_main Tested-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: James Almer <jamrial@gmail.com>	2017-07-04 23:02:24 -03:00
James Almer	440285474b	x86/utvideodsp: make restore_rgb_planes functions work on x86_32 Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2017-07-04 22:49:45 -03:00
James Almer	ac8ad8d098	x86/sbrdsp: sign extend start and end gprs in ff_sbr_hf_gen_sse Tested-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: James Almer <jamrial@gmail.com>	2017-06-30 11:46:24 -03:00
James Darnley	0c2acccd4b	avcodec/x86: use new x86-64 functions for -idct simple They now match according to FATE, barring any further bugs with untested parts	2017-06-28 17:27:35 +02:00
James Darnley	d7246ea9f2	avcodec/x86: add an 8-bit simple IDCT function based on the x86-64 high depth functions Includes add/put functions Rounding contributed by Ronald S. Bultje	2017-06-28 17:27:35 +02:00
James Darnley	8b19467d07	avcodec/x86: allow future 8-bit simple idct to have "DC only hack" Created by Ronald S. Bultje	2017-06-28 17:27:35 +02:00
Clément Bœsch	b12a36170b	lavc/aacpsdsp: use ptrdiff_t for stride in hybrid_analysis	2017-06-28 12:22:39 +02:00
Michael Niedermayer	516c213f08	avcodec/x86/vp9dsp_init_16bpp: Fix linking to missing ff_vp9_ipred_dr_32x32_16_avx2() on 32bit Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-06-28 00:31:33 +02:00
Ilia Valiakhmetov	35a5d9715d	avcodec/vp9: add 64-bit ipred_dr_32x32_16 avx2 implementation vp9_diag_downright_32x32_12bpp_c: 429.7 vp9_diag_downright_32x32_12bpp_sse2: 158.9 vp9_diag_downright_32x32_12bpp_ssse3: 144.6 vp9_diag_downright_32x32_12bpp_avx: 141.0 vp9_diag_downright_32x32_12bpp_avx2: 73.8 Almost 50% faster than avx implementation Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2017-06-27 16:10:50 -04:00
Paul B Mahol	4ed7c2bbc3	avcodec/utvideodec: add SIMD for restore_rgb_planes Signed-off-by: Paul B Mahol <onemda@gmail.com>	2017-06-27 09:54:10 +02:00
Matthieu Bouron	db5bf64b21	lavc/x86: clear r2 higher bits in ff_sbr_sum_square Suggested-by: James Almer <jamrial@gmail.com>	2017-06-26 09:55:23 +02:00
James Almer	349446e36f	x86/mdct15: use three operand form for some instructions Fixes compilation with old yasm	2017-06-24 01:44:49 -03:00
Rostislav Pehlivanov	e1120b1c54	mdct15: add assembly optimizations for the 15-point FFT c: 1802 decicycles in fft15,16774635 runs, 2581 skips avx: 865 decicycles in fft15,16776378 runs, 838 skips Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>	2017-06-23 23:45:37 +01:00
Diego Biurrun	fd502f4f5f	build: Generalize yasm/nasm-related variable names None of them are specific to the YASM assembler. (Cherry-picked from libav commit `39e208f4d4`) Signed-off-by: James Almer <jamrial@gmail.com>	2017-06-21 17:00:29 -03:00
James Darnley	8221c71703	avcodec/x86: allow future 8-bit simple idct to use slightly different coefficients	2017-06-20 16:12:25 +02:00
James Darnley	d2597fb0c1	avcodec/x86: modify simple_idct10 macros to add an action parameter	2017-06-20 13:35:01 +02:00
James Darnley	8781330d80	avcodec/x86: cleanup simple_idct10 Use named arguments for the functions so we can remove a define. The stride/linesize argument is now ptrdiff_t type so we no longer need to sign extend the register.	2017-06-20 13:34:38 +02:00
James Darnley	e3db94302c	avcodec/x86/mpegenc: support transpose permutation type	2017-06-20 12:12:13 +02:00
James Darnley	fa30a0a548	avcodec/x86/mpegenc: check IDCT permutation type is a valid value	2017-06-20 12:12:13 +02:00
Michael Niedermayer	ae6f6d4e34	avcodec/x86/mpegvideo: Use intra scantable in dct_unquantize_h263_intra_mmx() Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-06-20 00:07:51 +02:00
James Almer	8bb59e6742	x86/aacpsdsp: add ff_ps_hybrid_analysis_ileave_sse About 2x faster than the c version.	2017-06-18 22:34:22 -03:00
James Almer	e229df9478	x86/aacpsdsp: add ff_ps_hybrid_synthesis_deint_{sse,sse4} About 2x faster than the c version.	2017-06-18 22:33:27 -03:00
James Almer	623d217ed1	avcodec/aacps: move checks for valid length outside the stereo_interpolate dsp function Signed-off-by: James Almer <jamrial@gmail.com>	2017-06-15 23:49:40 -03:00
James Almer	b3446862bf	x86/vorbisdsp: optimize ff_vorbis_inverse_coupling_sse About 7% faster.	2017-06-15 23:20:05 -03:00
Ronald S. Bultje	d35ff98e27	vp9: fix overwrite in ff_vp9_ipred_dr_16x16_16_avx2. Fixes trac issue 6459.	2017-06-14 11:37:38 -04:00
Ilia Valiakhmetov	81fc617c12	avcodec/vp9: ipred_dr_16x16_16 avx2 implementation Signed-off-by: Ilia Valiakhmetov <zakne0ne@gmail.com> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2017-06-12 12:40:58 -04:00
James Almer	497a4b554c	x86/aacpsdsp: fix output of ff_ps_stereo_interpolate_ipdopd_sse3 The fate-aac-al_sbr_ps_04_ur test did not detect this mistake.	2017-06-07 13:53:51 -03:00
Ilia Valiakhmetov	73d9a9a6af	libavcodec/vp9: ipred_dl_32x32_16 avx2 implementation vp9_diag_downleft_32x32_8bpp_c: 580.2 vp9_diag_downleft_32x32_8bpp_sse2: 75.6 vp9_diag_downleft_32x32_8bpp_ssse3: 73.7 vp9_diag_downleft_32x32_8bpp_avx: 72.7 vp9_diag_downleft_32x32_10bpp_c: 1101.2 vp9_diag_downleft_32x32_10bpp_sse2: 145.4 vp9_diag_downleft_32x32_10bpp_ssse3: 137.5 vp9_diag_downleft_32x32_10bpp_avx: 134.8 vp9_diag_downleft_32x32_10bpp_avx2: 94.0 vp9_diag_downleft_32x32_12bpp_c: 1108.5 vp9_diag_downleft_32x32_12bpp_sse2: 145.5 vp9_diag_downleft_32x32_12bpp_ssse3: 137.3 vp9_diag_downleft_32x32_12bpp_avx: 135.2 vp9_diag_downleft_32x32_12bpp_avx2: 94.0 ~30% faster than avx implementation Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2017-06-06 08:05:03 -04:00
James Almer	933dd62288	x86/aacpsdsp: optimize ff_ps_mul_pair_single_sse ~2% faster.	2017-06-04 23:29:56 -03:00
James Almer	be3809a521	x86/aacpsdsp: optimize ff_ps_stereo_interpolate_sse3 Move the unpacking outside of the loop. 5% to 10% faster. Suggested-by: ubitux Signed-off-by: James Almer <jamrial@gmail.com>	2017-06-03 12:39:43 -03:00
James Almer	b5a0971ff0	x86/aacps: add ff_ps_stereo_interpolate_ipdopd_sse3() About 2x faster than the c version. Signed-off-by: James Almer <jamrial@gmail.com>	2017-06-02 11:06:24 -03:00
James Darnley	0dea0114fb	avcodec/x86/idctdsp_init: reindent	2017-05-30 13:20:44 +02:00
James Darnley	8e89f6fd37	avcodec/x86: move simple_idct to external assembly	2017-05-30 13:20:42 +02:00
Clément Bœsch	584366a436	lavc/mpegvideoenc: reformat inv_zigzag_direct16 so the zigzag pattern is visible	2017-05-19 11:17:58 +02:00
Clément Bœsch	19bb2cade5	Merge commit 'b4a911c189962e563a09fb0efaf6fa9ab56263a4' * commit 'b4a911c189962e563a09fb0efaf6fa9ab56263a4': mpegvideoenc: make a table const Merged-by: Clément Bœsch <u@pkh.me>	2017-05-19 11:15:16 +02:00
James Darnley	7aa90b4e94	avcodec/h264: add sse2 versions of previous idct functions Kaby Lake Pentium: - ff_h264_idct_add_8_sse2: ~1.18x faster than mmxext - ff_h264_idct_dc_add_8_sse2: ~1.07x faster than mmxext	2017-05-15 15:00:20 +02:00
James Darnley	27460dfebc	avcodec/h264: add avx 8-bit h264_idct_dc_add Haswell: - 1.02x faster (405±0.7 vs. 397±0.8 decicycles) compared with mmxext Skylake-U: - 1.06x faster (498±1.8 vs. 470±1.3 decicycles) compared with mmxext	2017-05-15 15:00:19 +02:00
James Darnley	f61d454ca1	avcodec/h264: add avx 8-bit h264_idct_add Haswell: - 1.11x faster (522±0.4 vs. 469±1.8 decicycles) compared with mmxext Skylake-U: - 1.21x faster (671±5.5 vs. 555±1.4 decicycles) compared with mmxext	2017-05-15 15:00:17 +02:00
James Darnley	b5325c6711	avcodec/h264: use some 3 operand forms	2017-05-15 15:00:16 +02:00
James Darnley	060ba9e5e3	avcodec/h264: change RETs into REP_RETs where appropriate	2017-05-15 15:00:15 +02:00
Michael Niedermayer	fa8fd0808f	avcodec/x86/vc1dsp_init: Fix build failure with --disable-optimizations and clang compilers doing DCE at -O0 do not necessarily understand "complex" boolean expressions Build succeeds with this change, this was the only failure Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-04-27 04:25:31 +02:00
Clément Bœsch	5be1440c74	Merge commit '0a35f128f3c6e0ae9a0a2236c557602c108da269' * commit '0a35f128f3c6e0ae9a0a2236c557602c108da269': cabac: x86: Give optimizations header a more meaningful name Merged-by: Clément Bœsch <u@pkh.me>	2017-04-08 14:30:13 +02:00
Ronald S. Bultje	83ae7e6350	x86/idctdsp_init: reindent.	2017-04-06 10:03:28 -04:00
Ronald S. Bultje	e0c205677f	x86/simple_idct: add explicit sse2 simple_idct_put/add versions. These use the mmx IDCT, but sse2 put/add_pixels_clamped implementations. This way we don't need to use the ff_put/add_pixels_clamped function pointers.	2017-04-06 10:03:28 -04:00
Ronald S. Bultje	2f0591cfa3	cavs: add a sse2 idct implementation. This makes using the function pointer ff_add_pixels_clamped() unnecessary, since we always know what the best implementation is at compile-time.	2017-04-06 10:03:28 -04:00
Ronald S. Bultje	c9d98c5649	cavs: convert idct from inline asm to yasm.	2017-04-06 10:03:27 -04:00
Ronald S. Bultje	b51d7d89f8	x86/xvididct: remove use of ff_put/add_pixels_clamped function pointer. Since there's separate SSE2 implementations of xvid_idct_put/add, this patch has no practical impact on performance.	2017-04-06 10:03:27 -04:00

2474 Commits