Provide arm64 neon implementation for sa8d 8x8 and 16x16 functions
for 10 bit depth. Benchmarks are shown below.
sa8d_8x8_c: 2914
sa8d_8x8_neon: 608
sa8d_16x16_c: 11469
sa8d_16x16_neon: 2030
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Provide arm64 neon implementation for satd 16x8 and 16x16 functions
for 10 bit depth. Benchmarks are shown below.
satd_16x8_c: 4268
satd_16x8_neon: 1493
satd_16x16_c: 8382
satd_16x16_neon: 2908
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Provide arm64 neon implementation for pixel_var2 function
for 10 bit depth. Benchmarks are shown below.
var2_8x8_c: 1988
var2_8x8_neon: 505
var2_8x16_c: 3800
var2_8x16_neon: 862
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Provide arm64 neon implementation for ssd_nv12 function
for 10 bit depth. Benchmarks are shown below.
ssd_nv12_c: 181441
ssd_nv12_neon: 29037
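For context, a minimal scalar sketch of what an ssd_nv12-style routine
computes: the sum of squared differences between two chroma planes stored
in interleaved (NV12) U/V order, accumulated separately for U and V. The
name `ssd_nv12_ref`, its signature and the 16-bit `pixel` typedef are
illustrative assumptions, not the code added by this commit.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumption: 10 bit samples are stored in 16-bit containers. */
typedef uint16_t pixel;

/* Hypothetical scalar reference.
 * Strides are in pixels; width is the number of U/V pairs per row. */
void ssd_nv12_ref(const pixel *a, ptrdiff_t stride_a,
                  const pixel *b, ptrdiff_t stride_b,
                  int width, int height,
                  uint64_t *ssd_u, uint64_t *ssd_v)
{
    uint64_t su = 0, sv = 0;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            /* Even positions hold U samples, odd positions hold V samples. */
            int64_t du = (int64_t)a[2 * x]     - b[2 * x];
            int64_t dv = (int64_t)a[2 * x + 1] - b[2 * x + 1];
            su += (uint64_t)(du * du);
            sv += (uint64_t)(dv * dv);
        }
        a += stride_a;
        b += stride_b;
    }
    *ssd_u = su;
    *ssd_v = sv;
}
```

A vectorized version would typically load the interleaved samples, split
them into U and V lanes, and accumulate the squared differences per lane.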
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Provide arm64 neon implementation for satd 8x8 and 8x16 functions
for 10 bit depth. Benchmarks are shown below.
satd_8x8_c: 2143
satd_8x8_neon: 812
satd_8x16_c: 4228
satd_8x16_neon: 1504
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Provide arm64 neon implementation for asd8 function for 10 bit
depth. Benchmarks are shown below.
asd8_c: 4400
asd8_neon: 857
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Provide arm64 neon implementation for vsad function for 10 bit
depth. Benchmarks are shown below.
vsad_c: 3599
vsad_neon: 392
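For context, a minimal scalar sketch of a vertical SAD: the sum of
absolute differences between vertically adjacent rows of a block. The
name `vsad_ref`, the fixed width of 16 and the 16-bit `pixel` typedef are
illustrative assumptions, not the code added by this commit.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: 10 bit samples are stored in 16-bit containers. */
typedef uint16_t pixel;

/* Hypothetical scalar reference for a 16-pixel-wide block.
 * stride is in pixels; rows y-1 and y are compared for y = 1..height-1. */
int vsad_ref(const pixel *src, ptrdiff_t stride, int height)
{
    int score = 0;
    for (int y = 1; y < height; y++, src += stride)
        for (int x = 0; x < 16; x++)
            score += abs(src[x] - src[x + stride]);
    return score;
}
```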
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Provide arm64 neon implementation for denoise_dct function for high bit
depth. Benchmarks are shown below.
denoise_dct_c: 2149
denoise_dct_neon: 585
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Provide neon arm64 implementation for decimate_score64 for high bit
depth. Benchmarks are shown below.
decimate_score64_c: 894
decimate_score64_neon: 431
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Provide neon arm64 implementations for decimate score functions
for high bit depth. Benchmarks are shown below.
decimate_score15_c: 273
decimate_score15_neon: 205
decimate_score16_c: 284
decimate_score16_neon: 208
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Provide neon optimized implementation for mc_plane_copy function
from motion compensation family for 10 bit depth.
Benchmark results are shown below.
hpel_filter_c: 111495
hpel_filter_neon: 37849
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Provide neon optimized implementation for mc_store_interleave function
from motion compensation family for 10 bit depth.
Benchmark results are shown below.
load_deinterleave_chroma_fenc_c: 2910
load_deinterleave_chroma_fenc_neon: 430
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Provide neon optimized implementation for mc_load_deinterleave function
from motion compensation family for 10 bit depth.
Benchmark results are shown below.
load_deinterleave_chroma_fdec_c: 2936
load_deinterleave_chroma_fdec_neon: 422
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Provide neon optimized implementation for mc_lowres function from
motion compensation family for 10 bit depth.
Benchmark results are shown below.
lowres_init_c: 149446
lowres_init_neon: 13172
Signed-off-by: Hubert Mazur <hum@semihalf.com>
The mc_luma and get_ref wrappers were only defined for 8 bit depth.
As all required 10 bit depth helper functions now exist, move the wrappers
out of the if scope so they are always defined regardless of the bit depth.
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Provide neon optimized implementation for mc_copy functions from
motion compensation family for 10 bit depth.
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Provide neon optimized implementation for pixel_avg2 functions from
motion compensation family for 10 bit depth.
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Previously some functions from the motion compensation family used uint8_t,
while others used the pixel definition. Unify this by changing every
uint8_t usage to pixel.
This commit is a prerequisite to 10 bit depth support.
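A minimal sketch of why a single pixel type helps, assuming a build-time
macro (called HIGH_BIT_DEPTH here) selects the sample container. The
helper `copy_block` is hypothetical and only illustrates that code
written against pixel compiles unchanged for both depths.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumption: HIGH_BIT_DEPTH is set by the build; default to 10 bit here. */
#ifndef HIGH_BIT_DEPTH
#define HIGH_BIT_DEPTH 1
#endif

#if HIGH_BIT_DEPTH
typedef uint16_t pixel;   /* 10 bit samples need a 16-bit container */
#else
typedef uint8_t pixel;    /* 8 bit samples fit in one byte */
#endif

/* Hypothetical helper: copies a w x h block; strides are in pixels,
 * so the same source works for either definition of pixel. */
void copy_block(pixel *dst, ptrdiff_t dst_stride,
                const pixel *src, ptrdiff_t src_stride, int w, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            dst[x] = src[x];
        dst += dst_stride;
        src += src_stride;
    }
}
```

A function declared with uint8_t instead would silently truncate 10 bit
samples, which is the kind of mismatch this unification avoids.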
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Add if/else clauses to the files to control which code is used.
Move the generic functions out of the 8 bit depth scope into a scope
common to both modes.
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Without this change, a parallel build occasionally fails with:
    $ make --shuffle
    ...
    gcc ... -c common/opencl.c -o common/opencl-8.o ...
    common/opencl.c:116:10: fatal error: common/oclobj.h: No such file or directory
      116 | #include "common/oclobj.h"
          |          ^~~~~~~~~~~~~~~~~
The failure is most easily reproduced with `make --shuffle` mode:
    https://savannah.gnu.org/bugs/index.php?62100
This happens because `common/oclobj.h` is an autogenerated file.
Normally `.depend` would contain this autogenerated dependency,
but nothing forces `common/oclobj.h` to be generated first.
The change moves the dependency on $(GENERATED) from the final binaries
to `.depend` itself:
    .depend: $(GENERATED)