ffmpeg

Commit Graph

Author	SHA1	Message	Date
Diego Biurrun	82bd04b170	rv34: Drop now unnecessary dsputil dependencies	2013-02-06 11:30:54 +01:00
Diego Biurrun	c9f933b5b6	Add av_cold attributes to arch-specific init functions	2013-02-05 17:01:05 +01:00
Diego Biurrun	26301caaa1	x86: mmx2 ---> mmxext in asm constructs	2012-11-14 00:58:51 +01:00
Janne Grunau	f101eab1be	x86: call most of the x86 dsp init functions under if (ARCH_X86). Rename the called dsp init functions to *_init_x86.	2012-10-08 11:54:05 +02:00
Diego Biurrun	e0c6cce447	x86: Replace checks for CPU extensions and flags by convenience macros. This separates code relying on inline from that relying on external assembly and fixes instances where the coalesced check was incorrect.	2012-09-08 18:18:34 +02:00
Diego Biurrun	ec36aa6944	x86: Fix linking with some or all of yasm, mmx, optimizations disabled. Some optimized template functions reference optimized symbols, so they must be explicitly disabled when those symbols are unavailable.	2012-08-30 19:37:32 +02:00
Diego Biurrun	a886b279a0	x86: cosmetics: Comment some #endifs for better readability	2012-08-30 18:50:33 +02:00
Martin Storsjö	1d9c2dc89a	Don't include common.h from avutil.h. Signed-off-by: Martin Storsjö <martin@martin.st>	2012-08-15 22:32:06 +03:00
Diego Biurrun	239fdf1b4a	x86: build: replace mmx2 by mmxext. Refactoring mmx2/mmxext YASM code with cpuflags will force renames. So switching to a consistent naming scheme beforehand is sensible. The name "mmxext" is more official and widespread and also the name of the CPU flag, as reported e.g. by the Linux kernel.	2012-08-03 22:51:05 +02:00
Ronald S. Bultje	79195ce565	x86/dsputil: put inline asm under HAVE_INLINE_ASM. This allows compiling with compilers that don't support gcc-style inline assembly. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2012-07-25 14:24:27 -04:00
Diego Biurrun	a5a93fa8f5	cosmetics: do not use full path for local headers	2012-06-22 10:49:40 +02:00
Michael Kostylev	6797d1948b	x86: rv40: Mark rv40_weight functions as MMX2; they use MMX2 instructions.	2012-05-15 23:54:08 +02:00
Christophe Gisquet	110d0cdc9d	rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC. Code mostly inspired by vp8's MC, however: (1) its MMX2 horizontal filter is worse because it can't take advantage of the coefficient redundancy; (2) that same coefficient redundancy allows better code for non-SSSE3 versions. Benchmark (rounded to tens of unit), columns V8x8 / H8x8 / 2D8x8 / V16x16 / H16x16 / 2D16x16: C 445 / 358 / 985 / 1785 / 1559 / 3280; MMX* 219 / 271 / 478 / 714 / 929 / 1443; SSE2 131 / 158 / 294 / 425 / 515 / 892; SSSE3 120 / 122 / 248 / 387 / 390 / 763. End result is overall around a 15% speedup for SSSE3 version (on 6 sequences); all loop filter functions now take around 55% of decoding time, while luma MC dsp functions are around 6%, chroma ones are 1.3% and biweight around 2.3%. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-05-10 18:42:43 +02:00
Christophe GISQUET	272b252c01	rv40dsp: implement prescaled versions for biweight. Quite often, the original weights are multiple of 512. By prescaling them by 1/512 when they are computed (once per frame), no intermediate shifting is needed, and no prescaling on each call either. The x86 code already used that trick. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-10 10:06:48 -07:00
Ronald S. Bultje	3ab9a2a557	rv34: change most "int stride" into "ptrdiff_t stride". This prevents having to sign-extend on 64-bit systems with 32-bit ints, such as x86-64. Also fixes crashes on systems where we don't do it and arguments are not in registers, such as Win64 for all weight functions.	2012-02-20 14:58:25 -08:00
Christophe Gisquet	e5c9de2ab7	rv40: x86 SIMD for biweight. Provide MMX, SSE2 and SSSE3 versions, with a fast-path when the weights are multiples of 512 (which is often the case when the values round up nicely). *_TIMER report for the 16x16 and 8x8 cases: C: 9015 decicycles in 16, 524257 runs, 31 skips; 2656 decicycles in 8, 524271 runs, 17 skips. MMX: 4156 decicycles in 16, 262090 runs, 54 skips; 1206 decicycles in 8, 262131 runs, 13 skips. MMX on fast-path: 2760 decicycles in 16, 524222 runs, 66 skips; 995 decicycles in 8, 524252 runs, 36 skips. SSE2: 2163 decicycles in 16, 262131 runs, 13 skips; 832 decicycles in 8, 262137 runs, 7 skips. SSE2 with fast path: 1783 decicycles in 16, 524276 runs, 12 skips; 711 decicycles in 8, 524283 runs, 5 skips. SSSE3: 2117 decicycles in 16, 262136 runs, 8 skips; 814 decicycles in 8, 262143 runs, 1 skips. SSSE3 with fast path: 1315 decicycles in 16, 524285 runs, 3 skips; 578 decicycles in 8, 524286 runs, 2 skips. This means around a 4% speedup for some sequences. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-01-30 23:58:25 +01:00
Diego Biurrun	91bafb52ae	x86: Give RV40 init file a more suitable name.	2012-01-30 23:58:24 +01:00

17 Commits