ffmpeg

mirror of https://git.videolan.org/git/ffmpeg.git synced 2024-07-20 11:14:12 +02:00

Martin Storsjö 9f10cff610 aarch64: Add NEON optimizations for 10 and 12 bit vp9 loop filter		2017-01-24 22:36:11 +02:00

This work is sponsored by, and copyright, Google.

This is similar to the arm version, but due to the larger registers on aarch64, we can do 8 pixels at a time for all filter sizes.

Examples of runtimes vs the 32 bit version, on a Cortex A53:

                                             ARM  AArch64
vp9_loop_filter_h_4_8_10bpp_neon:          213.2    172.6
vp9_loop_filter_h_8_8_10bpp_neon:          281.2    244.2
vp9_loop_filter_h_16_8_10bpp_neon:         657.0    444.5
vp9_loop_filter_h_16_16_10bpp_neon:       1280.4    877.7
vp9_loop_filter_mix2_h_44_16_10bpp_neon:   397.7    358.0
vp9_loop_filter_mix2_h_48_16_10bpp_neon:   465.7    429.0
vp9_loop_filter_mix2_h_84_16_10bpp_neon:   465.7    428.0
vp9_loop_filter_mix2_h_88_16_10bpp_neon:   533.7    499.0
vp9_loop_filter_mix2_v_44_16_10bpp_neon:   271.5    244.0
vp9_loop_filter_mix2_v_48_16_10bpp_neon:   330.0    305.0
vp9_loop_filter_mix2_v_84_16_10bpp_neon:   329.0    306.0
vp9_loop_filter_mix2_v_88_16_10bpp_neon:   386.0    365.0
vp9_loop_filter_v_4_8_10bpp_neon:          150.0    115.2
vp9_loop_filter_v_8_8_10bpp_neon:          209.0    175.5
vp9_loop_filter_v_16_8_10bpp_neon:         492.7    345.2
vp9_loop_filter_v_16_16_10bpp_neon:        951.0    682.7

This is significantly faster than the ARM version in almost all cases except for the mix2 functions.

Based on START_TIMER/STOP_TIMER wrapping around a few individual functions, the speedup vs C code is around 2-3x.

Signed-off-by: Martin Storsjö <martin@martin.st>
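
To make the "significantly faster ... except for the mix2 functions" remark concrete, the runtimes quoted in the commit message above can be turned into per-function speedup ratios. This is only an illustrative sketch: the numbers are copied from the commit message, the units are not stated there (lower is assumed to be better), and the names `RUNTIMES`, `speedups` and `summarize` are hypothetical helpers, not part of ffmpeg.

```python
# Runtimes copied from the commit message (assumed: lower is better).
# Each entry maps a function name to (ARM 32-bit runtime, AArch64 runtime).
RUNTIMES = {
    "vp9_loop_filter_h_4_8_10bpp_neon": (213.2, 172.6),
    "vp9_loop_filter_h_8_8_10bpp_neon": (281.2, 244.2),
    "vp9_loop_filter_h_16_8_10bpp_neon": (657.0, 444.5),
    "vp9_loop_filter_h_16_16_10bpp_neon": (1280.4, 877.7),
    "vp9_loop_filter_mix2_h_44_16_10bpp_neon": (397.7, 358.0),
    "vp9_loop_filter_mix2_h_48_16_10bpp_neon": (465.7, 429.0),
    "vp9_loop_filter_mix2_h_84_16_10bpp_neon": (465.7, 428.0),
    "vp9_loop_filter_mix2_h_88_16_10bpp_neon": (533.7, 499.0),
    "vp9_loop_filter_mix2_v_44_16_10bpp_neon": (271.5, 244.0),
    "vp9_loop_filter_mix2_v_48_16_10bpp_neon": (330.0, 305.0),
    "vp9_loop_filter_mix2_v_84_16_10bpp_neon": (329.0, 306.0),
    "vp9_loop_filter_mix2_v_88_16_10bpp_neon": (386.0, 365.0),
    "vp9_loop_filter_v_4_8_10bpp_neon": (150.0, 115.2),
    "vp9_loop_filter_v_8_8_10bpp_neon": (209.0, 175.5),
    "vp9_loop_filter_v_16_8_10bpp_neon": (492.7, 345.2),
    "vp9_loop_filter_v_16_16_10bpp_neon": (951.0, 682.7),
}


def speedups(runtimes):
    """Return {name: arm / aarch64}; a value above 1.0 means AArch64 is faster."""
    return {name: arm / a64 for name, (arm, a64) in runtimes.items()}


def summarize(runtimes):
    """Compare the mean speedup of the mix2 functions with all other functions."""
    ratios = speedups(runtimes)
    mix2 = [r for name, r in ratios.items() if "_mix2_" in name]
    other = [r for name, r in ratios.items() if "_mix2_" not in name]
    return {
        "mix2_mean": sum(mix2) / len(mix2),
        "other_mean": sum(other) / len(other),
        "best": max(ratios, key=ratios.get),    # largest AArch64 gain
        "worst": min(ratios, key=ratios.get),   # smallest AArch64 gain
    }


if __name__ == "__main__":
    ranked = sorted(speedups(RUNTIMES).items(), key=lambda kv: kv[1], reverse=True)
    for name, ratio in ranked:
        print(f"{name:42s} {ratio:.2f}x")
    print(summarize(RUNTIMES))
```

Running the script lists the functions from largest to smallest gain. The mix2 variants come out with a noticeably lower mean ratio than the rest, which matches the commit's own remark.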
asm-offsets.h	Merge commit '705f5e5e155f6f280a360af220fc5b30cfcee702'	2016-01-02 11:14:28 +01:00
cabac.h
fft_init_aarch64.c	Merge commit '97aec6e75ef36ed0402653519daa8e1fc8ddb555'	2016-04-12 15:43:09 +01:00
fft_neon.S	Merge commit '780cd20b00a69e26bbfffbb8eec16fbe999ea793'	2014-12-09 12:08:29 +01:00
fmtconvert_init.c	Merge commit 'a0fc780a2093784e8664f88205ee1b215e109cee'	2016-01-02 11:21:16 +01:00
fmtconvert_neon.S	Merge commit 'a0fc780a2093784e8664f88205ee1b215e109cee'	2016-01-02 11:21:16 +01:00
h264chroma_init_aarch64.c
h264cmc_neon.S	avcodec: fix vc1dsp dependencies	2016-09-25 13:11:45 +02:00
h264dsp_init_aarch64.c	lavc/aarch64: Do not use the neon horizontal chroma loop filter for H.264 4:2:2.	2015-01-31 10:05:10 +01:00
h264dsp_neon.S
h264idct_neon.S	aarch64: h264idct: Use the offset parameter to movrel	2016-12-08 18:11:07 +01:00
h264pred_init.c	Merge commit 'f56d8d8dd72b1ab52aa814c5a0fccabf8040ef68'	2015-07-21 01:39:30 +02:00
h264pred_neon.S	Merge commit 'f56d8d8dd72b1ab52aa814c5a0fccabf8040ef68'	2015-07-21 01:39:30 +02:00
h264qpel_init_aarch64.c	arm64: constify src in h264qpel dsp function definitions	2015-06-24 08:41:32 +02:00
h264qpel_neon.S
hpeldsp_init_aarch64.c
hpeldsp_neon.S
Makefile	aarch64: Add NEON optimizations for 10 and 12 bit vp9 loop filter	2017-01-24 22:36:11 +02:00
mdct_neon.S
mpegaudiodsp_init.c
mpegaudiodsp_neon.S	Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb'	2016-06-21 21:55:34 +02:00
neon.S	Merge commit 'cdb1665f70def544ddab3e3ed3763ef99c8b3873'	2016-04-24 12:51:42 +01:00
neontest.c	avcodec: fix arguments on xmm/neon clobber test wrappers	2016-10-02 02:15:47 -03:00
rv40dsp_init_aarch64.c
synth_filter_init.c	avcodec/synth_filter: split off remaining code from dcadec files	2016-01-25 14:57:38 -03:00
synth_filter_neon.S	Merge commit '705f5e5e155f6f280a360af220fc5b30cfcee702'	2016-01-02 11:14:28 +01:00
vc1dsp_init_aarch64.c
videodsp_init.c
videodsp.S
vorbisdsp_init.c
vorbisdsp_neon.S
vp9dsp_init_10bpp_aarch64.c	aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC	2017-01-24 22:36:05 +02:00
vp9dsp_init_12bpp_aarch64.c	aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC	2017-01-24 22:36:05 +02:00
vp9dsp_init_16bpp_aarch64_template.c	aarch64: Add NEON optimizations for 10 and 12 bit vp9 loop filter	2017-01-24 22:36:11 +02:00
vp9dsp_init_aarch64.c	aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC	2017-01-24 22:36:05 +02:00
vp9dsp_init.h	aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC	2017-01-24 22:36:05 +02:00
vp9itxfm_16bpp_neon.S	aarch64: Add NEON optimizations for 10 and 12 bit vp9 itxfm	2017-01-24 22:36:08 +02:00
vp9itxfm_neon.S	aarch64: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32	2017-01-14 21:13:32 +01:00
vp9lpf_16bpp_neon.S	aarch64: Add NEON optimizations for 10 and 12 bit vp9 loop filter	2017-01-24 22:36:11 +02:00
vp9lpf_neon.S	aarch64: vp9: loop filter: replace 'orr; cbn?z' with 'adds; b.{eq,ne};	2017-01-14 21:13:10 +01:00
vp9mc_16bpp_neon.S	aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC	2017-01-24 22:36:05 +02:00
vp9mc_neon.S	aarch64: vp9mc: Fix a comment to refer to a register with the right name	2017-01-14 21:13:43 +01:00