
46 Commits

Author SHA1 Message Date
Ziemowit Zabawa d8debbc6dd Fix typos with codespell tool
This includes fixing convert_method_to_flag so it recognizes "gauss"
parameter properly instead of silently defaulting to bicubic.
2021-09-29 21:11:16 +00:00
Anton Mitrofanov 8e5e8340f0 Bump dates to 2021 2021-01-24 16:38:34 +03:00
Anton Mitrofanov 33f9e14746 Fix warning: comparison of integers of different signs [-Wsign-compare] 2020-04-09 15:36:22 +03:00
Anton Mitrofanov 04e6c65e6b Bump dates to 2020 2020-02-29 22:02:01 +03:00
Henrik Gramner ec1d32302d Bump dates to 2019 2019-03-06 22:45:52 +03:00
Henrik Gramner ca5408b13c Bump dates to 2018 2018-01-17 18:31:04 +01:00
Vittorio Giovara 71ed44c731 Unify 8-bit and 10-bit CLI and libraries
Add 'i_bitdepth' to x264_param_t with the corresponding '--output-depth' CLI
option to set the bit depth at runtime.

Drop the 'x264_bit_depth' global variable. Rather than hardcoding it to an
incorrect value, it's preferable to induce a linking failure. If applications
rely on this symbol, this will make it more obvious where the problem is.

Add Makefile rules that compile modules with different bit depths. Assembly
on x86 is prefixed with the 'private_prefix' define, while all other archs
modify their function prefix internally.

Templatize the main C library, x86/x86_64 assembly, ARM assembly, AARCH64
assembly, PowerPC assembly, and MIPS assembly.

The depth and cache CLI filters depend heavily on the bit depth, so they
need to be duplicated for each value. This means having to rename these
filters, and adjust the callers to use the right version.

Unfortunately the threaded input CLI module inherits a common.h dependency
(input/frame -> common/threadpool -> common/frame -> common/common) which
is extremely complicated to address in a sensible way. Instead, duplicate
the module and select the appropriate one at run time.

Each bit depth needs different checkasm compilation rules, so split the main
checkasm target into two executables.
2017-12-24 23:47:24 +03:00
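The commit builds each module once per bit depth and selects the matching build at run time from `i_bitdepth`. Below is a minimal single-file sketch of that pattern. It assumes a macro stands in for the per-depth recompilation. `TEMPLATE_CLIP`, `clip_pixel_8`, `clip_pixel_10` and `clip_pixel_dispatch` are hypothetical names, not x264 symbols.

```c
#include <stdint.h>

/* Stamp out one copy of a function per bit depth. x264 achieves this by
 * compiling whole files more than once with different defines and prefixes;
 * a macro is used here only to keep the sketch in one file. */
#define TEMPLATE_CLIP(depth, pixel_t)                          \
    static pixel_t clip_pixel_##depth(int v)                   \
    {                                                          \
        const int max = (1 << (depth)) - 1;                    \
        return (pixel_t)(v < 0 ? 0 : v > max ? max : v);       \
    }

TEMPLATE_CLIP(8, uint8_t)   /* defines clip_pixel_8 */
TEMPLATE_CLIP(10, uint16_t) /* defines clip_pixel_10 */

/* Hypothetical runtime dispatcher: choose the instantiation that matches
 * the bit depth requested at run time. */
static int clip_pixel_dispatch(int bitdepth, int v)
{
    switch (bitdepth) {
    case 8:  return clip_pixel_8(v);
    case 10: return clip_pixel_10(v);
    default: return -1; /* depth not instantiated in this sketch */
    }
}
```

A macro does not scale well once hundreds of functions need a per-depth copy. Compiling whole modules twice, as the commit does, avoids that problem.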
Vittorio Giovara 8f2437d333 Drop the x264 prefix from static functions and variables 2017-12-24 23:11:30 +03:00
Vittorio Giovara 1d2420981a arm/aarch64: Correctly prefix integral function symbols 2017-01-21 14:10:37 +01:00
Henrik Gramner c7a2e327be Bump dates to 2017 2017-01-21 14:10:37 +01:00
Anton Mitrofanov b2b39dae0b Cosmetics
Also make x264_weighted_reference_duplicate() static.
2016-12-01 18:00:07 +01:00
Janne Grunau 5caef139cf arm/aarch64: use plane_copy wrapper macros
Move the macros to common/mc.h to share them across all architectures.
Fixes possible buffer overreads if the width of the user supplied frames
is not a multiple of 16.

Reported-by: Kirill Batuzov <batuzovk@ispras.ru>
2016-09-17 15:10:14 +02:00
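The overread comes from SIMD kernels that always process whole 16-byte chunks. Below is a hedged sketch of the general wrapper idea, not the actual macros in common/mc.h. The chunked kernel runs on every row except the last, and the last row is copied with an exact `memcpy`. The sketch assumes positive strides that are at least the width rounded up to 16. `plane_copy_core16` and `plane_copy_wrapped` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for a SIMD kernel that copies whole 16-byte chunks, so it may
 * read and write up to 15 bytes past `w` on every row. */
static void plane_copy_core16(uint8_t *dst, intptr_t i_dst,
                              const uint8_t *src, intptr_t i_src,
                              int w, int h)
{
    int w16 = (w + 15) & ~15; /* width rounded up to a multiple of 16 */
    for (int y = 0; y < h; y++)
        memcpy(dst + y * i_dst, src + y * i_src, (size_t)w16);
}

/* Row padding (stride >= rounded width, assumed) absorbs the extra bytes on
 * all rows but the last; the last row is copied exactly, so nothing past
 * the end of the source buffer is read. */
static void plane_copy_wrapped(uint8_t *dst, intptr_t i_dst,
                               const uint8_t *src, intptr_t i_src,
                               int w, int h)
{
    if (h <= 0)
        return;
    if (!(w & 15)) { /* already a multiple of 16: no overread possible */
        plane_copy_core16(dst, i_dst, src, i_src, w, h);
        return;
    }
    if (h > 1)
        plane_copy_core16(dst, i_dst, src, i_src, w, h - 1);
    memcpy(dst + (h - 1) * i_dst, src + (h - 1) * i_src, (size_t)w);
}
```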
Janne Grunau 14a58532fe arm: Add asm for mbtree fixed point conversion
7-8 times faster on a cortex-a53 vs. gcc-5.3.

mbtree_fix8_pack_c: 44114
mbtree_fix8_pack_neon: 5805
mbtree_fix8_unpack_c: 38924
mbtree_fix8_unpack_neon: 4870
2016-06-13 22:07:00 +02:00
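The `fix8` name suggests 8.8 fixed point, meaning values scaled by 256 and stored in 16 bits. Below is a scalar sketch under that assumption. The function names are hypothetical. x264's own C reference may also handle byte order, and that detail is omitted here.

```c
#include <stdint.h>

/* Pack floats into 8.8 fixed point: value * 256, truncated toward zero.
 * Inputs are assumed to lie within (-128, 128) so the result fits int16_t. */
static void fix8_pack_sketch(int16_t *dst, const float *src, int count)
{
    for (int i = 0; i < count; i++)
        dst[i] = (int16_t)(src[i] * 256.0f);
}

/* Unpack 8.8 fixed point back to float. */
static void fix8_unpack_sketch(float *dst, const int16_t *src, int count)
{
    for (int i = 0; i < count; i++)
        dst[i] = src[i] * (1.0f / 256.0f);
}
```

The quoted timings work out to 44114 / 5805 ≈ 7.6 for pack and 38924 / 4870 ≈ 8.0 for unpack, which matches the "7-8 times" figure.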
Henrik Gramner d23d186552 Bump dates to 2016 2016-01-17 00:30:13 +01:00
Janne Grunau 424534537a arm: do not fill mc_weight*_neon tabs for HIGH_BIT_DEPTH
The asm is only for 8-bit and function prototypes reflect that. Avoids
numerous warnings with --bit-depth=9/10.
2015-12-20 18:40:11 +01:00
Martin Storsjö 6f04b14687 arm: Implement x264_mbtree_propagate_{cost, list}_neon
The cost function could be simplified to avoid having to clobber
q4/q5, but this requires reordering instructions, which increases
the total runtime.

checkasm timing              Cortex-A7  A8      A9
mbtree_propagate_cost_c      63702      155835  62829
mbtree_propagate_cost_neon   17199      10454   11106

mbtree_propagate_list_c      104203     108949  84532
mbtree_propagate_list_neon   82035      78348   60410
2015-10-11 18:44:54 +02:00
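For context on what the cost function computes, below is a scalar sketch of the MB-tree propagate-cost step as it is commonly described. Each block passes on the fraction `(intra - inter) / intra` of the information it carries. The argument names, types, fixed-point scaling and clamping are assumptions and will differ from x264's C reference. `propagate_cost_sketch` is hypothetical.

```c
#include <stdint.h>

/* Amount carried by a block = what it received from later frames plus its
 * own intra cost (scaled by inverse qscale and a frame-rate factor).
 * The share it borrowed from its reference, (intra - inter) / intra,
 * is what propagates back. */
static void propagate_cost_sketch(int32_t *dst, const uint16_t *propagate_in,
                                  const uint16_t *intra_costs,
                                  const uint16_t *inter_costs,
                                  const float *inv_qscales, float fps_factor,
                                  int len)
{
    for (int i = 0; i < len; i++) {
        int intra = intra_costs[i];
        /* inter cost cannot usefully exceed intra cost */
        int inter = inter_costs[i] < intra ? inter_costs[i] : intra;
        float amount = propagate_in[i] + intra * inv_qscales[i] * fps_factor;
        dst[i] = intra > 0
            ? (int32_t)(amount * (intra - inter) / intra + 0.5f)
            : 0;
    }
}
```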
Martin Storsjö 5db8b6b93a arm: Implement x264_plane_copy_neon
checkasm timing       Cortex-A7  A8     A9
plane_copy_c          13124      10925  9106
plane_copy_neon       7349       5103   8945
2015-10-11 18:44:54 +02:00
Martin Storsjö 5265b927b0 arm: Implement integral_init4/8h/v_neon
checkasm timing       Cortex-A7  A8     A9
integral_init4h_c     10466      8590   6161
integral_init4h_neon  3021       1494   1800
integral_init4v_c     16250      13590  13628
integral_init4v_neon  3473       2073   3291
integral_init8h_c     10100      8275   5705
integral_init8h_neon  4403       2344   2751
integral_init8v_c     6403       4632   4999
integral_init8v_neon  1184       783    1306
2015-10-11 18:44:54 +02:00
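The `integral_init` functions fill integral-style sum tables. Below is a sketch of the horizontal 4-pixel case. It assumes each output adds a sliding 4-wide window sum to the value from the row above. `integral_row4_sketch` is a hypothetical name, and x264's exact table layout is not shown.

```c
#include <stdint.h>

/* sum[x] = sum_above[x] + pix[x] + pix[x+1] + pix[x+2] + pix[x+3].
 * A sliding window keeps the horizontal part to one add and one subtract
 * per output. `pix` must hold at least width + 3 samples. */
static void integral_row4_sketch(uint16_t *sum, const uint16_t *sum_above,
                                 const uint8_t *pix, int width)
{
    int v = pix[0] + pix[1] + pix[2] + pix[3]; /* first window */
    for (int x = 0; x < width; x++) {
        sum[x] = (uint16_t)(sum_above[x] + v);
        if (x + 1 < width)
            v += pix[x + 4] - pix[x]; /* slide the window by one pixel */
    }
}
```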
Yu Xiaolei 627f891c57 NV21 input support
Eliminates an extra copy when encoding Android camera preview images.

Checkasm test by Janne Grunau.
ARM assembly with improvements from Janne Grunau.
2015-07-25 22:52:54 +02:00
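NV21 differs from NV12 only in the order of the interleaved chroma bytes: V comes before U. Below is a sketch of the per-row conversion. It only illustrates the layout difference and does not show how x264's input path is structured. `nv21_row_to_nv12` is a hypothetical name.

```c
#include <stdint.h>

/* NV21 chroma: V,U,V,U,...  NV12 chroma: U,V,U,V,...
 * Converting one row is a pairwise swap. */
static void nv21_row_to_nv12(uint8_t *dst, const uint8_t *src, int pairs)
{
    for (int i = 0; i < pairs; i++) {
        dst[2 * i]     = src[2 * i + 1]; /* U */
        dst[2 * i + 1] = src[2 * i];     /* V */
    }
}
```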
Anton Mitrofanov d7ccd89f1b Bump dates to 2015 2015-02-23 13:34:44 +03:00
Anton Mitrofanov 30140b34b8 Fix bugs/typos in motion compensation and cache_load
Didn't affect output, because the incorrect values were either unused in the
code path or produced the same results as the correct values.

Also deduplicate hpel_ref arrays.
2014-12-13 00:34:15 +01:00
Janne Grunau fadc4045f9 arm: use the weight_fn_t typedef for mc weight function arrays 2014-04-22 15:37:50 -07:00
Janne Grunau 644c396be9 arm: correct x264_mc_chroma_neon function declaration 2014-04-22 15:37:50 -07:00
Janne Grunau 2e96c571b8 arm: x264_store_interleave_chroma_neon
store_interleave_chroma_c: 4036
store_interleave_chroma_neon: 1043
2014-04-22 15:37:49 -07:00
Janne Grunau 1576e51e52 arm: x264_plane_copy_interleave_neon
plane_copy_interleave_c: 40285
plane_copy_interleave_neon: 10137
2014-04-22 15:37:49 -07:00
Janne Grunau 0016dec270 arm: x264_plane_copy_deinterleave_rgb_neon
plane_copy_deinterleave_rgb_c: 31543
plane_copy_deinterleave_rgb_neon: 8312
2014-04-22 15:37:49 -07:00
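Below is a sketch of what a packed-to-planar RGB split does. It assumes `pw` is the number of bytes per packed pixel (3 or 4) and that the first three bytes of each pixel go to the three output planes. The function name and signature are hypothetical.

```c
#include <stdint.h>

/* Split packed pixels into three planes; with pw == 4 the fourth byte
 * (padding or alpha) is skipped. */
static void deinterleave_rgb_sketch(uint8_t *a, intptr_t i_a,
                                    uint8_t *b, intptr_t i_b,
                                    uint8_t *c, intptr_t i_c,
                                    const uint8_t *src, intptr_t i_src,
                                    int pw, int w, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            const uint8_t *p = src + y * i_src + x * pw;
            a[y * i_a + x] = p[0];
            b[y * i_b + x] = p[1];
            c[y * i_c + x] = p[2];
        }
    }
}
```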
Janne Grunau 5e0ca9aa4e arm: load_deinterleave_chroma_f{dec,enc}_neon
load_deinterleave_chroma_fdec_c: 4055
load_deinterleave_chroma_fdec_neon: 995
load_deinterleave_chroma_fenc_c: 4071
load_deinterleave_chroma_fenc_neon: 992
2014-04-22 15:37:48 -07:00
Janne Grunau c9a5ae0d21 arm: x264_plane_copy_deinterleave_neon
plane_copy_deinterleave_c: 42988
plane_copy_deinterleave_neon: 10184
2014-04-22 15:37:48 -07:00
Janne Grunau 2794ba5bb0 arm: add missing macro instantiation for x264_pixel_avg_4x16_neon
checkasm --bench on a cortex-a9:
avg_4x16_c: 8910
avg_4x16_neon: 2091
2014-04-22 15:37:48 -07:00
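In its unweighted form, `pixel_avg` is a rounded average of two prediction blocks. Below is a scalar 4x16 sketch, assuming 8-bit pixels. The name and argument order are hypothetical.

```c
#include <stdint.h>

/* dst = (a + b + 1) >> 1 over a 4-wide, 16-tall block. */
static void pixel_avg_4x16_sketch(uint8_t *dst, intptr_t i_dst,
                                  const uint8_t *a, intptr_t i_a,
                                  const uint8_t *b, intptr_t i_b)
{
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 4; x++)
            dst[y * i_dst + x] =
                (uint8_t)((a[y * i_a + x] + b[y * i_b + x] + 1) >> 1);
}
```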
Henrik Gramner 807aeaaae7 Bump dates to 2014
Also update AUTHORS file and my e-mail address in the headers of various files.
2014-01-08 11:15:45 -08:00
Stefan Groenroos 3a8baa0ec6 ARM: update NEON mc_chroma to work with NV12 and re-enable it
Up to 10-15% faster overall.
2013-02-26 15:13:17 -08:00
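H.264 chroma motion compensation is bilinear interpolation at 1/8-pel precision. In NV12, U and V alternate within one plane, so the horizontal neighbour of the same component sits two bytes away. Below is a scalar sketch under those assumptions. It uses 8-bit samples, a hypothetical function name, and writes output to separate packed U and V buffers.

```c
#include <stdint.h>

/* (cA*A + cB*B + cC*C + cD*D + 32) >> 6 with 1/8-pel offsets dx, dy.
 * src needs height + 1 rows and 2 * (width + 1) bytes per row. */
static void mc_chroma_nv12_sketch(uint8_t *dst_u, uint8_t *dst_v,
                                  const uint8_t *src, intptr_t i_src,
                                  int dx, int dy, int width, int height)
{
    const int cA = (8 - dx) * (8 - dy), cB = dx * (8 - dy);
    const int cC = (8 - dx) * dy,       cD = dx * dy;
    for (int y = 0; y < height; y++) {
        const uint8_t *s0 = src + y * i_src; /* current row */
        const uint8_t *s1 = s0 + i_src;      /* row below */
        for (int x = 0; x < width; x++) {
            for (int c = 0; c < 2; c++) {    /* c = 0: U, c = 1: V */
                int p = 2 * x + c;           /* same component is 2 bytes apart */
                int val = (cA * s0[p] + cB * s0[p + 2] +
                           cC * s1[p] + cD * s1[p + 2] + 32) >> 6;
                (c ? dst_v : dst_u)[y * width + x] = (uint8_t)val;
            }
        }
    }
}
```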
Loren Merritt 732b072ae2 Bump dates to 2013 2013-01-08 16:01:32 -08:00
Henrik Gramner 3131a19cab Fix incorrect zero-extension assumptions in x86_64 asm
Some x264 asm assumed that the high 32 bits of registers containing "int" values would be zero.
This is almost always the case, and it seems to work with gcc, but it is *not* guaranteed by the ABI.
As a result, it breaks with some other compilers, like Clang, that take advantage of this in optimizations.
Accordingly, fix all x86 code by using intptr_t instead of int or using movsxd where necessary.
Also add a checkasm hack to detect when assembly functions incorrectly assume that 32-bit integers are zero-extended to 64-bit.
2012-03-06 10:37:53 -08:00
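The commit's underlying point is that widening a 32-bit value to a 64-bit register can use either sign extension (what `movsxd` does) or zero extension. The two disagree for negative values such as negative strides. Below is a small C illustration; the helper names are hypothetical.

```c
#include <stdint.h>

/* Sign extension: the upper 32 bits copy the sign bit (like movsxd). */
static int64_t sign_extend32(int32_t v) { return (int64_t)v; }

/* Zero extension: the upper 32 bits are cleared. */
static int64_t zero_extend32(int32_t v) { return (int64_t)(uint32_t)v; }
```

For a stride of -16, zero extension yields an offset of 4294967280 instead of -16. Declaring the parameter as `intptr_t` moves the widening into the C caller, where the compiler performs it correctly.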
Hii 27a7b05b83 Bump dates to 2012 2012-02-04 07:18:13 -08:00
Fiona Glaser 9bbfc30284 Split prefetch_fenc between colorspaces
Add 4:2:2 version.
2011-10-21 17:22:56 -07:00
Sean McGovern ee9bc136e9 Bump dates to 2011 2011-01-25 12:16:24 -08:00
Oskar Arvidsson 1382552b8c Convert X264_HIGH_BIT_DEPTH to HIGH_BIT_DEPTH
Less verbose.
2010-11-19 09:47:36 -08:00
Fiona Glaser 213a99d070 Update source file headers
Update dates, improve file descriptions, make things more consistent.
Also add information about commercial licensing.
2010-09-18 01:30:37 -07:00
Loren Merritt 387828eda8 Convert x264 to use NV12 pixel format internally
~1% faster overall on Conroe, mostly due to improved cache locality.
Also allows improved SIMD on some chroma functions (e.g. deblock).
This change also extends the API to allow direct NV12 input, which should be a bit faster than YV12.
This isn't currently used in the x264cli, as swscale does not have fast NV12 conversion routines, but it might be useful for other applications.

Note this patch disables the chroma SIMD code for PPC and ARM until new versions are written.
2010-07-14 19:03:32 -07:00
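Converting planar chroma (as in YV12) to NV12 means interleaving the U and V planes byte by byte. Below is a scalar sketch with a hypothetical name. It assumes 8-bit samples and counts `w` in chroma samples per plane.

```c
#include <stdint.h>

/* dst row: U0 V0 U1 V1 ... built from separate U and V rows. */
static void plane_copy_interleave_sketch(uint8_t *dst, intptr_t i_dst,
                                         const uint8_t *u, intptr_t i_u,
                                         const uint8_t *v, intptr_t i_v,
                                         int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            dst[y * i_dst + 2 * x]     = u[y * i_u + x];
            dst[y * i_dst + 2 * x + 1] = v[y * i_v + x];
        }
}
```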
Oskar Arvidsson c91f43a4b0 Support for 9 and 10-bit encoding
Output bit depth is specified at compile time via --bit-depth.
There is currently almost no assembly code available for high-bit-depth modes, so encoding will be very slow.
Input is still 8-bit only; this will change in the future.

Note that very few H.264 decoders support >8 bit depth currently.
Also note that the quantizer scale differs at higher bit depths. For example, for 10-bit, the quantizer (and crf) ranges from 0 to 63 instead of 0 to 51.
2010-07-04 14:47:33 -07:00
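The 0 to 63 range follows from each extra bit of depth adding 6 QP steps to the 8-bit maximum of 51. Below is a one-line sketch of that relation; the function name is hypothetical.

```c
/* Maximum QP for a given bit depth: 51 at 8-bit, plus 6 per extra bit. */
static int qp_max_for_depth(int bit_depth)
{
    return 51 + 6 * (bit_depth - 8);
}
```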
Henrik Gramner 8c02c79035 Shrink even more constant arrays 2010-05-16 22:51:12 -07:00
David Conrad b46cec4f01 ARM NEON versions of weightp functions 2010-02-15 01:00:05 -08:00
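Weighted prediction (weightp) scales and offsets each reference pixel. Below is a scalar sketch of the H.264 explicit weighting formula for a single reference at 8-bit. The name is hypothetical, and x264's NEON functions operate on whole blocks rather than single pixels.

```c
#include <stdint.h>

/* ((pix * scale + 2^(denom-1)) >> denom) + offset, clipped to [0, 255];
 * with denom == 0 there is no rounding term or shift. */
static uint8_t weight_pixel_sketch(uint8_t pix, int scale, int denom,
                                   int offset)
{
    int v;
    if (denom >= 1)
        v = ((pix * scale + (1 << (denom - 1))) >> denom) + offset;
    else
        v = pix * scale + offset;
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}
```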
David Conrad aa48c1fbb7 Fix x264 compilation on Apple GCC
Apple's GCC stupidly ignores the ARM ABI and doesn't give any stack alignment beyond 4 bytes.
2010-01-13 23:47:02 -05:00
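When the stack is only guaranteed 4-byte alignment, one common workaround is to over-allocate a buffer and round its address up. Below is a sketch using a hypothetical helper. This commit did not necessarily use this mechanism.

```c
#include <stdint.h>

/* Round a pointer up to the next 16-byte boundary. The caller must have
 * reserved 15 spare bytes in the underlying buffer. */
static uint8_t *align16(uint8_t *p)
{
    return (uint8_t *)(((uintptr_t)p + 15) & ~(uintptr_t)15);
}
```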
David Conrad 094110915e Fix weightp on ARM + PPC
No ARM or PPC assembly yet though.
2009-11-08 20:21:52 -08:00
David Conrad 53a5772a35 Various ARM-related fixes
Fix comment for mc_copy_neon.
Fix memzero_aligned_neon prototype.
Update NEON (i)dct_dc prototypes.
Duplicate x86 behavior for global+hidden functions.
2009-11-08 20:21:47 -08:00
David Conrad 6bf21c631a GSOC merge part 4: ARM NEON mc assembly functions
prefetch, memcpy_aligned, memzero_aligned, avg, mc_luma, get_ref, mc_chroma, hpel_filter, frame_init_lowres
2009-08-24 06:00:28 -07:00
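Of the functions listed, `hpel_filter` computes half-pel samples. Below is a sketch of the horizontal case using the H.264 6-tap luma filter (1, -5, 20, 20, -5, 1). The name is hypothetical, and x264's version covers more than this one direction.

```c
#include <stdint.h>

/* Half-pel sample between src[x] and src[x+1], rounded (+16, >>5) and
 * clipped to 8 bits. src must be readable from index -2 to width + 2. */
static void hpel_h_row_sketch(uint8_t *dst, const uint8_t *src, int width)
{
    for (int x = 0; x < width; x++) {
        int v = src[x - 2] - 5 * src[x - 1] + 20 * src[x]
              + 20 * src[x + 1] - 5 * src[x + 2] + src[x + 3];
        v = (v + 16) >> 5;
        dst[x] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
    }
}
```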