dav1d/NEWS

Changes for 0.7.1 'Frigatebird':
------------------------------

0.7.1 is a minor update on 0.7.0:
 - ARM32 NEON optimizations for itxfm (up to 28% speedup) and MSAC
 - SSE2 optimizations for prep_bilin and prep_8tap
 - AVX2 optimizations for MC scaled
 - Fix a clamping issue in motion vector projection
 - Fix an issue in the ipred_z AVX2 functions on some specific Haswell CPUs
 - Improvements on the dav1dplay utility player to support resizing


Changes for 0.7.0 'Frigatebird':
------------------------------

0.7.0 is a major release for dav1d:
 - Faster refmv implementation, gaining up to 12% speed while using 25% less RAM (single thread)
 - 10b/12b ARM64 optimizations are mostly complete:
   - ipred (paeth, smooth, dc, pal, filter, cfl)
   - itxfm (only 10b)
 - AVX2/SSSE3 for non-4:2:0 film grain and for mc.resize
 - AVX2 for cfl 4:4:4
 - AVX-512 CDEF filter
 - ARM64 8b improvements for cfl_ac and itxfm
 - ARM64 implementation for emu_edge in 8b/10b/12b
 - ARM32 implementation for emu_edge in 8b
 - Improvements on the dav1dplay utility player to support 10 bit,
   non-4:2:0 pixel formats and film grain on the GPU


Changes for 0.6.0 'Gyrfalcon':
------------------------------

0.6.0 is a major release for dav1d:
 - New ARM64 optimizations for the 10/12bit depth:
    - mc_avg, mc_w_avg, mc_mask
    - mc_put/mc_prep 8tap/bilin
    - mc_warp_8x8
    - mc_w_mask
    - mc_blend
    - wiener
    - SGR
    - loopfilter
    - cdef
 - New AVX-512 optimizations for prep_bilin, prep_8tap, cdef_filter, mc_avg/w_avg/mask
 - New SSSE3 optimizations for film grain
 - New AVX2 optimizations for msac_adapt16
 - Fix rare mismatches against the reference decoder, notably because of clipping
 - Improvements on ARM64 on msac, cdef and looprestoration optimizations
 - Improvements on AVX2 optimizations for cdef_filter
 - Improvements in the C version for itxfm, cdef_filter


Changes for 0.5.2 'Asiatic Cheetah':
------------------------------------

0.5.2 is a small release improving speed for ARM32 and adding minor features:
 - ARM32 optimizations for loopfilter, ipred_dc|h|v
 - Add section-5 raw OBU demuxer
 - Improve speed by reducing L2 cache collisions
 - Fix minor issues


Changes for 0.5.1 'Asiatic Cheetah':
------------------------------------

0.5.1 is a small release improving speed and fixing minor issues
compared to 0.5.0:
 - SSE2 optimizations for CDEF, wiener and warp_affine
 - NEON optimizations for SGR on ARM32
 - Fix mismatch issue in x86 asm in inverse identity transforms
 - Fix build issue in ARM64 assembly if debug info was enabled
 - Add a workaround for Xcode 11 -fstack-check bug


Changes for 0.5.0 'Asiatic Cheetah':
------------------------------------

0.5.0 is a medium release fixing regressions and minor issues,
and improving speed significantly:
 - Export ITU T.35 metadata
 - Speed improvements on blend_ on ARM
 - Speed improvements on decode_coef and MSAC
 - NEON optimizations for blend*, w_mask_, ipred functions for ARM64
 - NEON optimizations for CDEF and warp on ARM32
 - SSE2 optimizations for MSAC hi_tok decoding
 - SSSE3 optimizations for deblocking loopfilters and warp_affine
 - AVX2 optimizations for film grain and ipred_z2
 - SSE4 optimizations for warp_affine
 - VSX optimizations for wiener
 - Fix inverse transform overflows in x86 and NEON asm
 - Fix integer overflows with large frames
 - Improve film grain generation to match reference code
 - Improve compatibility with older binutils for ARM
 - More advanced Player example in tools


Changes for 0.4.0 'Cheetah':
----------------------------

 - Fix playback with unknown OBUs
 - Add an option to limit the maximum frame size
 - SSE2 and ARM64 optimizations for MSAC
 - Improve speed on 32-bit systems
 - Optimization in obmc blend
 - Reduce RAM usage significantly
 - Initial PPC SIMD code, for cdef_filter
 - NEON optimizations for blend functions on ARM
 - NEON optimizations for w_mask functions on ARM
 - NEON optimizations for inverse transforms on ARM64
 - VSX optimizations for CDEF filter
 - Improve handling of malloc failures
 - Simple Player example in tools


Changes for 0.3.1 'Sailfish':
------------------------------

 - Fix a buffer overflow in frame-threading mode on SSSE3 CPUs
 - Reduce binary size, notably on Windows
 - SSSE3 optimizations for ipred_filter
 - ARM optimizations for MSAC


Changes for 0.3.0 'Sailfish':
------------------------------

This is the final release for the numerous speed improvements of 0.3.0-rc.
It mostly:
 - Fixes an annoying crash on SSSE3 that happened in the itx functions


Changes for 0.2.2 (0.3.0-rc) 'Antelope':
----------------------------------------

 - Large improvement in MSAC decoding with SSE, bringing a 4-6% speed increase.
   The impact is significant on SSSE3, SSE4 and AVX2 CPUs
 - SSSE3 optimizations for all block sizes in itx
 - SSSE3 optimizations for ipred_paeth and ipred_cfl (420, 422 and 444)
 - Speed improvements on CDEF for SSE4 CPUs
 - NEON optimizations for SGR and loop filter
 - Fixes for minor crashes, plus improvements and build changes


Changes for 0.2.1 'Antelope':
----------------------------

 - SSSE3 optimization for cdef_dir
 - AVX2 improvements of the existing CDEF optimizations
 - NEON improvements of the existing CDEF and wiener optimizations
 - Clarification about the numbering/versioning scheme


Changes for 0.2.0 'Antelope':
----------------------------

 - ARM64 and ARM optimizations using NEON instructions
 - SSSE3 optimizations for both 32-bit and 64-bit
 - More AVX2 assembly, reaching almost completion
 - Fix installation of includes
 - Rewrite inverse transforms to avoid overflows
 - Snap packaging for Linux
 - Updated API (ABI and API break)
 - Fixes for un-decodable samples


Changes for 0.1.0 'Gazelle':
----------------------------

Initial release of dav1d, the fast and small AV1 decoder.
 - Support for all features of the AV1 bitstream
 - Support for all bit depths: 8, 10 and 12 bits
 - Support for all chroma subsamplings 4:2:0, 4:2:2, 4:4:4 *and* grayscale
 - Full acceleration for 64-bit AVX2 processors, making it the fastest decoder
 - Partial acceleration for SSSE3 processors
 - Partial acceleration for NEON processors