ffmpeg/libswscale
Martin Storsjö 70db14376c swscale: aarch64: Optimize the final summation in the hscale routine
Before:                     Cortex A53      A72      A73  Graviton 2  Graviton 3
hscale_8_to_15_width8_neon:     8273.0   4602.5   4289.5      2429.7      1629.1
hscale_8_to_15_width16_neon:   12405.7   6803.0   6359.0      3549.0      2378.4
hscale_8_to_15_width32_neon:   21258.7  11491.7  11469.2      5797.2      3919.6
hscale_8_to_15_width40_neon:   25652.0  14173.7  12488.2      6893.5      4810.4

After:
hscale_8_to_15_width8_neon:     7633.0   3981.5   3350.2      1980.7      1261.1
hscale_8_to_15_width16_neon:   11666.7   5951.0   5512.0      3080.7      2131.4
hscale_8_to_15_width32_neon:   20900.7  10733.2   9481.7      5275.2      3862.1
hscale_8_to_15_width40_neon:   24826.0  13536.2  11502.0      6397.2      4731.9

Thus, this gives overall a 8-29% speedup for the smaller filter
sizes, around 1-8% for the larger filter sizes.

Inspired by a patch by Jonathan Swinney <jswinney@amazon.com>.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-04-22 10:49:46 +03:00
..
aarch64 swscale: aarch64: Optimize the final summation in the hscale routine 2022-04-22 10:49:46 +03:00
arm sws: rename SwsContext.swscale to convert_unscaled 2021-07-03 15:57:53 +02:00
ppc sws: rename SwsContext.swscale to convert_unscaled 2021-07-03 15:57:53 +02:00
tests swscale: introduce isSwappedChroma 2022-01-04 19:39:22 -06:00
x86 swscale/x86/swscale: Remove superfluous and invalid ';' 2022-01-22 17:00:45 +01:00
Makefile libswscale: Split version.h 2022-03-16 14:05:26 +02:00
alphablend.c swscale/alphablend: Fix slice handling 2021-10-03 20:38:29 +02:00
bayer_template.c swscale: do not drop half of bits from 16bit bayer formats 2020-08-08 12:03:42 +02:00
gamma.c swscale: re-enable gamma 2015-09-04 19:00:20 -03:00
hscale.c avutil: Rename FF_CEIL_COMPAT to AV_CEIL_COMPAT 2016-01-27 16:36:46 +00:00
hscale_fast_bilinear.c sws: Move fast bilinear C code into seperate file 2014-07-19 05:36:26 +02:00
input.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
libswscale.v build: Change structure of the linker version script templates 2016-05-29 16:43:11 +02:00
log2_tab.c lsws: duplicate ff_log2_tab 2014-08-12 20:52:21 +02:00
options.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
output.c swscale/output: use isSwappedChroma 2022-01-04 19:39:22 -06:00
rgb2rgb.c swscale: aarch64: Add a NEON implementation of interleaveBytes 2020-05-15 23:38:17 +03:00
rgb2rgb.h Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
rgb2rgb_template.c swscale/rgb2rgb_template: use shuffle macro on big-endian arches 2020-12-12 23:07:22 -05:00
slice.c Replace all occurences of av_mallocz_array() by av_calloc() 2021-09-20 01:03:52 +02:00
swscale.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
swscale.h Keep including the full version.h when headers are included externally 2022-03-19 00:01:57 +02:00
swscale_internal.h Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
swscale_unscaled.c sws: add a new scaling API 2021-09-06 09:16:52 +02:00
swscaleres.rc Add Windows resource file support for shared libraries 2013-12-05 23:42:07 +01:00
utils.c libswscale: Split version.h 2022-03-16 14:05:26 +02:00
version.h doc: Add an entry to APIchanges about changes to version.h and version_major.h 2022-03-16 14:12:46 +02:00
version_major.h libswscale: Split version.h 2022-03-16 14:05:26 +02:00
vscale.c Replace all occurences of av_mallocz_array() by av_calloc() 2021-09-20 01:03:52 +02:00
yuv2rgb.c swscale/yuv2rgb: Silence a set-but-unused-variable warning 2021-12-03 16:10:51 +01:00