3191 Commits

Author SHA1 Message Date
Hubert Mazur 8743a46d10 pixel: Add neon sa8d implementations for 10 bit
Provide arm64 neon implementation for sa8d 8x8 and 16x16 functions
for 10 bit depth. Benchmarks are shown below.

sa8d_8x8_c: 2914
sa8d_8x8_neon: 608
sa8d_16x16_c: 11469
sa8d_16x16_neon: 2030

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:45:18 +00:00
Hubert Mazur 820fb5a7d8 pixel: Add neon satd implementations for 10 bit
Provide arm64 neon implementation for satd 16x8 and 16x16 functions
for 10 bit depth. Benchmarks are shown below.

satd_16x8_c: 4268
satd_16x8_neon: 1493
satd_16x16_c: 8382
satd_16x16_neon: 2908

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:45:18 +00:00
Hubert Mazur 9927ac9ae0 Add neon pixel_var2 implementation for 10 bit
Provide arm64 neon implementation for pixel_var2 function
for 10 bit depth. Benchmarks are shown below.

var2_8x8_c: 1988
var2_8x8_neon: 505
var2_8x16_c: 3800
var2_8x16_neon: 862

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:45:18 +00:00
Hubert Mazur 7ae0053807 Add neon pixel_var implementation for 10 bit
Provide arm64 neon implementation for pixel_var function
for 10 bit depth. Benchmarks are shown below.

var_8x8_c: 757
var_8x8_neon: 342
var_8x16_c: 1431
var_8x16_neon: 582
var_16x16_c: 2721
var_16x16_neon: 767

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:45:18 +00:00
Hubert Mazur a87a9f89eb pixel: Add neon ssd_nv12 implementation for 10 bit
Provide arm64 neon implementation for ssd_nv12 function
for 10 bit depth. Benchmarks are shown below.

ssd_nv12_c: 181441
ssd_nv12_neon: 29037

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:45:18 +00:00
Hubert Mazur 1b59a1f3ee pixel: Add neon satd implementations for 10 bit
Provide arm64 neon implementation for satd 8x8 and 8x16 functions
for 10 bit depth. Benchmarks are shown below.

satd_8x8_c: 2143
satd_8x8_neon: 812
satd_8x16_c: 4228
satd_8x16_neon: 1504

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:45:18 +00:00
Grzegorz Bernacki 1754f6b20c pixel: Add neon satd implementations for 10 bit
Provide arm64 neon implementation for satd functions for 10 bit
depth. Benchmarks are shown below.

satd_4x4_c: 858
satd_4x4_neon: 712
satd_4x8_c: 1834
satd_4x8_neon: 812
satd_4x16_c: 3677
satd_4x16_neon: 1149
satd_8x4_c: 1290
satd_8x4_neon: 427

Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com>
Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:45:18 +00:00
Hubert Mazur 8fd1e5f26d pixel: Add neon ssd implementations for 10 bit
Provide arm64 neon implementation for ssd functions for 10 bit
depth. Benchmarks are shown below.

ssd_4x4_c: 1466
ssd_4x4_neon: 240
ssd_4x8_c: 1918
ssd_4x8_neon: 482
ssd_4x16_c: 5258
ssd_4x16_neon: 1025
ssd_8x4_c: 1291
ssd_8x4_neon: 235
ssd_8x8_c: 2431
ssd_8x8_neon: 425
ssd_8x16_c: 4635
ssd_8x16_neon: 910
ssd_16x8_c: 4198
ssd_16x8_neon: 897
ssd_16x16_c: 8549
ssd_16x16_neon: 1907

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:45:18 +00:00
Hubert Mazur 90b3391ee6 pixel: Add neon asd8 implementations for 10 bit
Provide arm64 neon implementation for asd8 function for 10 bit
depth. Benchmarks are shown below.

asd8_c: 4400
asd8_neon: 857

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:45:18 +00:00
Hubert Mazur 8a90ffa7d1 pixel: Add neon vsad implementations for 10 bit
Provide arm64 neon implementation for vsad function for 10 bit
depth. Benchmarks are shown below.

vsad_c: 3599
vsad_neon: 392

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:45:18 +00:00
Hubert Mazur 3afe3c82bc pixel: Add neon sad_x3 implementations for 10 bit
Provide arm64 neon implementations for sad_x3 functions for 10 bit
depth. Benchmarks are shown below.

sad_x3_4x4_c: 710
sad_x3_4x4_neon: 286
sad_x3_4x8_c: 1422
sad_x3_4x8_neon: 430
sad_x3_8x4_c: 1350
sad_x3_8x4_neon: 269
sad_x3_8x8_c: 2851
sad_x3_8x8_neon: 440
sad_x3_8x16_c: 5597
sad_x3_8x16_neon: 734
sad_x3_16x8_c: 5414
sad_x3_16x8_neon: 722
sad_x3_16x16_c: 10729
sad_x3_16x16_neon: 1288

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:45:18 +00:00
Hubert Mazur 7882a3689b quant: Add implementation for denoise_dct function
Provide arm64 neon implementation for denoise_dct function for high bit
depth. Benchmarks are shown below.

denoise_dct_c: 2149
denoise_dct_neon: 585

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:31:51 +00:00
Hubert Mazur 01e056712c quant: Add neon implementations of coeff_level_run
Provide arm64 neon implementations for coeff_level_run functions for high bit
depth. Benchmarks are shown below.

coeff_level_run4_c: 135
coeff_level_run4_neon: 155
coeff_level_run8_c: 181
coeff_level_run8_neon: 182
coeff_level_run15_c: 296
coeff_level_run15_neon: 275
coeff_level_run16_c: 305
coeff_level_run16_neon: 264

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:31:51 +00:00
Hubert Mazur 03c0e9a900 quant: Add neon implementations of coeff_last
Provide arm64 neon implementations for coeff_last functions for high bit
depth. Benchmarks are shown below.

coeff_last4_c: 79
coeff_last4_neon: 107
coeff_last8_c: 109
coeff_last8_neon: 154
coeff_last15_c: 161
coeff_last15_neon: 135
coeff_last16_c: 160
coeff_last16_neon: 132
coeff_last64_c: 782
coeff_last64_neon: 400

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:31:51 +00:00
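The C-to-NEON ratios implied by the coeff_last figures above can be checked with a short script. This is a minimal sketch in Python: the dictionary layout and the names `bench` and `speedup` are illustrative only, the numbers are copied from the commit message, and the benchmark unit is not stated in the source.

```python
# Benchmark figures copied from the coeff_last commit message above.
# The dict layout and names are illustrative, not part of x264.
bench = {
    "coeff_last4": (79, 107),
    "coeff_last8": (109, 154),
    "coeff_last15": (161, 135),
    "coeff_last16": (160, 132),
    "coeff_last64": (782, 400),
}


def speedup(c, neon):
    """Ratio of the C figure to the NEON figure; below 1.0 means NEON is slower."""
    return c / neon


ratios = {name: speedup(c, neon) for name, (c, neon) in bench.items()}

# Functions where the NEON version measured slower than the C version.
slower = [name for name, ratio in ratios.items() if ratio < 1.0]

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
print("NEON slower than C:", slower)
```

On these figures the 4- and 8-coefficient variants measure slower in NEON than in C, while coeff_last64 is close to twice as fast.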
Hubert Mazur 7c62a144ff quant: Add implementation for decimate64
Provide neon arm64 implementation for decimate_score64 for high bit
depth. Benchmarks are shown below.

decimate_score64_c: 894
decimate_score64_neon: 431

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:31:51 +00:00
Hubert Mazur 66d000d2d6 quant: Add implementation for decimate functions
Provide neon arm64 implementations for decimate score functions
for high bit depth. Benchmarks are shown below.

decimate_score15_c: 273
decimate_score15_neon: 205
decimate_score16_c: 284
decimate_score16_neon: 208

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:31:51 +00:00
Hubert Mazur 986dd1f3b7 quant: Add implementation for dequant
Provide neon arm64 implementations for dequant functions for high bit
depth. Benchmarks are shown below.

dequant_4x4_cqm_c: 359
dequant_4x4_cqm_neon: 225
dequant_4x4_dc_cqm_c: 344
dequant_4x4_dc_cqm_neon: 208
dequant_4x4_dc_flat_c: 348
dequant_4x4_dc_flat_neon: 210
dequant_4x4_flat_c: 362
dequant_4x4_flat_neon: 227
dequant_8x8_cqm_c: 1526
dequant_8x8_cqm_neon: 517
dequant_8x8_flat_c: 1547
dequant_8x8_flat_neon: 520

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:31:51 +00:00
Hubert Mazur b8ea87e05c quant: Add neon implementation of quant functions
Provide arm64 neon implementations of quant functions for high
bit depth. Benchmarks are shown below.

quant_2x2_dc_c: 217
quant_2x2_dc_neon: 275
quant_4x4_c: 482
quant_4x4_neon: 326
quant_4x4_dc_c: 428
quant_4x4_dc_neon: 348
quant_4x4x4_c: 2508
quant_4x4x4_neon: 1027
quant_8x8_c: 2439
quant_8x8_neon: 936

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:31:51 +00:00
Hubert Mazur cc5c343f43 mc: Add arm64 neon implementation for hpel filter
Provide neon optimized implementation for hpel_filter function
from motion compensation family for 10 bit depth.
Benchmark results are shown below.

hpel_filter_c: 111495
hpel_filter_neon: 37849

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:13:40 +00:00
Hubert Mazur e47bede829 mc: Add arm64 neon implementation for copy funcs
Provide neon optimized implementation for mc_plane_copy functions
from motion compensation family for 10 bit depth.
Benchmark results are shown below.

plane_copy_c: 2955
plane_copy_neon: 2910
plane_copy_deinterleave_c: 24056
plane_copy_deinterleave_neon: 3625
plane_copy_deinterleave_rgb_c: 19928
plane_copy_deinterleave_rgb_neon: 3941
plane_copy_interleave_c: 24399
plane_copy_interleave_neon: 4723
plane_copy_swap_c: 32269
plane_copy_swap_neon: 3211

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:13:40 +00:00
Hubert Mazur df179744c9 mc: Add arm64 neon implementation for store func
Provide neon optimized implementation for mc_store_interleave function
from motion compensation family for 10 bit depth.
Benchmark results are shown below.

load_deinterleave_chroma_fenc_c: 2910
load_deinterleave_chroma_fenc_neon: 430

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:13:40 +00:00
Hubert Mazur 68d712065f mc: Add arm64 neon implementation for mc_load func
Provide neon optimized implementation for mc_load_deinterleave function
from motion compensation family for 10 bit depth.
Benchmark results are shown below.

load_deinterleave_chroma_fdec_c: 2936
load_deinterleave_chroma_fdec_neon: 422

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:13:40 +00:00
Hubert Mazur 0a810f4f58 mc: Add arm64 neon implementation for mc_lowres
Provide neon optimized implementation for mc_lowres function from
motion compensation family for 10 bit depth.
Benchmark results are shown below.

lowres_init_c: 149446
lowres_init_neon: 13172

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:13:40 +00:00
Hubert Mazur 25ef883299 mc: Add arm64 neon implementation for mc_integral
Provide neon optimized implementation for mc_integral functions from
motion compensation family for 10 bit depth.
Benchmark results are shown below.

integral_init4h_c: 2651
integral_init4h_neon: 550
integral_init4v_c: 4247
integral_init4v_neon: 612
integral_init8h_c: 2544
integral_init8h_neon: 1027
integral_init8v_c: 1996
integral_init8v_neon: 245

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:13:40 +00:00
Hubert Mazur 7ff0f978fa mc: Add arm64 neon implementation for mc_chroma
Provide neon optimized implementation for mc_chroma functions from
motion compensation family for 10 bit depth.
Benchmark results are shown below.

mc_chroma_2x2_c: 700
mc_chroma_2x2_neon: 478
mc_chroma_2x4_c: 1300
mc_chroma_2x4_neon: 765
mc_chroma_4x2_c: 1229
mc_chroma_4x2_neon: 483
mc_chroma_4x4_c: 2383
mc_chroma_4x4_neon: 773
mc_chroma_4x8_c: 4662
mc_chroma_4x8_neon: 1319
mc_chroma_8x4_c: 4450
mc_chroma_8x4_neon: 940
mc_chroma_8x8_c: 8797
mc_chroma_8x8_neon: 1638

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:13:40 +00:00
Hubert Mazur 0876120871 mc: Move mc_luma and get_ref wrappers
The mc_luma and get_ref wrappers were only defined for 8 bit depth.
As all required 10 bit depth helper functions exist, move them out of
the if scope and make them always defined regardless of the bit depth.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:13:40 +00:00
Hubert Mazur 25d5baf43d mc: Add arm64 neon implementation for mc_weight
Provide neon optimized implementation for mc_weight functions from
motion compensation family for 10 bit depth.

Benchmark results are shown below.

weight_w4_c: 4734
weight_w4_neon: 4165
weight_w8_c: 8930
weight_w8_neon: 1620
weight_w16_c: 16939
weight_w16_neon: 2729
weight_w20_c: 20721
weight_w20_neon: 3470

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:13:40 +00:00
Hubert Mazur f0b0489f19 mc: Add arm64 neon implementation for mc_copy
Provide neon optimized implementation for mc_copy functions from
motion compensation family for 10 bit depth.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:13:40 +00:00
Hubert Mazur bb3d83dd02 mc: Add arm64 neon implementation for pixel_avg2
Provide neon optimized implementation for pixel_avg2 functions from
motion compensation family for 10 bit depth.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:13:40 +00:00
Hubert Mazur 13a2488815 mc: Add arm64 neon implementation for pixel_avg
Provide neon optimized implementation for pixel_avg functions from
motion compensation family for 10 bit depth.
Checkasm benchmarks are shown below.

avg_4x2_c: 703
avg_4x2_neon: 222
avg_4x4_c: 1405
avg_4x4_neon: 516
avg_4x8_c: 2759
avg_4x8_neon: 898
avg_4x16_c: 5808
avg_4x16_neon: 1776
avg_8x4_c: 2767
avg_8x4_neon: 412
avg_8x8_c: 5559
avg_8x8_neon: 841
avg_8x16_c: 11176
avg_8x16_neon: 1668
avg_16x8_c: 10493
avg_16x8_neon: 1504
avg_16x16_c: 21116
avg_16x16_neon: 2985

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:13:40 +00:00
Hubert Mazur ba45eba390 aarch64/mc-c: Unify pixel/uint8_t usage
Previously some functions from motion compensation family used uint8_t,
while others used the pixel definition. Unify this and change every
uint8_t usage to pixel.
This commit is a prerequisite to 10 bit depth support.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:13:40 +00:00
Hubert Mazur 249924eaf1 mc: Add initial 10 bit neon support
Add if/else clauses in files to control which code is used.
Move generic functions out of the 8-bit depth scope to a common one
for both modes.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
2023-10-01 15:13:40 +00:00
Martin Husemann 834c5c92db ppc: Add x264_cpu_detect() for NetBSD/macppc
The altivec instruction set detection is very similar to FreeBSD
and OpenBSD, but uses slightly different sysctl selectors.
2023-10-01 17:35:48 +03:00
Anton Mitrofanov 31e19f92f0 ppc: Fix compilation on unknown OS 2023-10-01 17:28:26 +03:00
Anton Mitrofanov a8b68ebfaa Improve qpfile parsing resiliency 2023-04-02 15:51:50 +03:00
Anton Mitrofanov eaa68fad9e Fix high bit depth deinterleave of YUYV or UYVY 2023-01-28 22:11:33 +00:00
Anton Mitrofanov cd31a90ba5 Fix compilation of only 8 or 10 bit by a non-optimizing compiler 2023-01-28 21:45:30 +03:00
Anton Mitrofanov 17df75b32e Bump dates to 2023 2023-01-28 16:37:02 +03:00
Roger Hardiman 941cae6d1d Add RISC-V 64 bit 2022-12-17 16:09:25 +00:00
Hubert Mazur 416e3eb2b5 aarch64: pixel: add 10bits sad functions
Provide routines for sad functions for high bit depth, i.e. 10 bits.
Benchmarks run on AWS Graviton 2 instances.

sad_4x4_c: 583
sad_4x4_neon: 273
sad_4x8_c: 1179
sad_4x8_neon: 366
sad_4x16_c: 2121
sad_4x16_neon: 550
sad_8x4_c: 924
sad_8x4_neon: 213
sad_8x8_c: 1711
sad_8x8_neon: 316
sad_8x16_c: 3505
sad_8x16_neon: 497
sad_16x8_c: 3070
sad_16x8_neon: 635
sad_16x16_c: 6113
sad_16x16_neon: 1118

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com>
2022-10-28 07:11:57 +00:00
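For readers unfamiliar with the metric, SAD is the sum of absolute differences between co-located samples of two blocks. The following is a minimal scalar sketch in Python, assuming each block is given as a list of rows; the function name and data layout are illustrative and do not mirror the x264 C or NEON code.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks.

    Each block is a list of rows; each row is a list of integer samples.
    """
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


# Two 4x4 blocks of 10-bit samples (valid range 0..1023), made up for illustration.
cur = [[1023, 0, 512, 512] for _ in range(4)]
ref = [[0, 0, 500, 520] for _ in range(4)]

print(sad(cur, ref))  # 4 rows * (1023 + 0 + 12 + 8) = 4172

# Worst case for a 16x16 block of 10-bit samples: every difference is 1023.
worst_16x16 = 16 * 16 * 1023
print(worst_16x16)  # 261888, which does not fit in 16 bits
```

The worst-case figure shows why a 10 bit version has to accumulate into a wider type than the individual 16-bit samples.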
Anton Mitrofanov b093bbe7d9 ffms: Fix crash if stream properties change 2022-10-06 00:12:42 +03:00
Henrik Gramner ed0f7a6340 cli: Use space instead of newline as autocomplete delimiter
On most systems any whitespace is fine, but MSYS2 wants ASCII 0x20.
2022-10-01 17:21:11 +02:00
Sergei Trofimovich e067ab0b53 Makefile: Add missing dependency of '.depend' on 'oclobj.h'
Without this change, a parallel build occasionally fails with:

    $ make --shuffle
    ...
    gcc ... -c common/opencl.c -o common/opencl-8.o ...
    common/opencl.c:116:10: fatal error: common/oclobj.h: No such file or directory
      116 | #include "common/oclobj.h"
          |          ^~~~~~~~~~~~~~~~~

Best reproduced with `make --shuffle` mode:
   https://savannah.gnu.org/bugs/index.php?62100

This happens because `common/oclobj.h` is an autogenerated file.
Normally `.depend` would contain this autogenerated dependency.
But nothing forces `common/oclobj.h` to be generated.

The change moves dependency of $(GENERATED) from final binaries
to `.depend` itself:

    .depend: $(GENERATED)
2022-09-19 22:31:01 +01:00
Anton Mitrofanov 7628a5696f Fix memory overread in mbtree 2022-09-05 19:32:40 +00:00
Anton Mitrofanov 8bdd8b8993 CI: Fix vlc-contrib linking on macOS
Use pkg-config from the custom PATH.
2022-09-01 23:17:40 +03:00
Anton Mitrofanov f7074e12d9 CI: Migrate build runners to macOS Monterey 2022-08-31 20:06:58 +03:00
Anton Mitrofanov baee400fa9 CI: Fix vlc-contrib processing on macOS
Use perl for in-place editing because sed doesn't work with symlinks.
2022-06-02 01:31:50 +03:00
Stephen Hutchinson bfc87b7a33 configure: Allow AviSynth+ on *BSD and Haiku 2022-02-22 18:03:57 +00:00
Anton Mitrofanov 95634be643 Fix build on MIPS with AviSynth+ support 2022-02-22 20:46:39 +03:00
Anton Mitrofanov 35fe20d1ba Replace AvxSynth with AviSynth+ on POSIX systems 2022-02-21 21:57:05 +00:00