wireguard-linux-compat

Commit Graph

Author	SHA1	Message	Date
Jason A. Donenfeld	3d3c92b471	compat: drop CentOS 8 Stream support Nobody uses this and it's impossible to maintain given the current CI situation. RHEL 7 and 8 release remain for now, though that might not always be the case. See the link for details. Link: https://lists.zx2c4.com/pipermail/wireguard/2022-June/007664.html Suggested-by: Philip J. Perry <phil@elrepo.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-06-29 15:36:49 +02:00
Jason A. Donenfeld	99935b07b4	compat: do not backport ktime_get_coarse_boottime_ns to c8s Also bump the c8s version stamp. Reported-by: Vladimír Beneš <vbenes@redhat.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-06-28 12:44:18 +02:00
Jason A. Donenfeld	18fbcd68a3	version: bump Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-06-27 12:54:37 +02:00
Jason A. Donenfeld	3ec3e822b6	compat: handle backported rng and blake2s Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-06-22 17:14:00 +02:00
Jason A. Donenfeld	ba45dd6fbf	qemu: give up on RHEL8 in CI They keep breaking their kernel and being difficult when I send patches to fix it, so just give up on trying to support this in the CI. It'll bitrot and people will complain and we'll see what happens at that point. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-05-05 16:24:24 +02:00
Jason A. Donenfeld	c7560fd0e0	qemu: set panic_on_warn=1 from cmdline Rather than setting this once init is running, set panic_on_warn from the kernel command line, so that it catches splats from WireGuard initialization code and the various crypto selftests. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-05-05 16:24:24 +02:00
Jason A. Donenfeld	33c87a1110	qemu: use vports on arm Rather than having to hack up QEMU, just use the virtio serial device. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-05-05 16:24:24 +02:00
Jason A. Donenfeld	894152a5b8	netns: limit parallelism to $(nproc) tests at once The parallel tests were added to catch queueing issues from multiple cores. But what happens in reality when testing tons of processes is that these separate threads wind up fighting with the scheduler, and we wind up with contention in places we don't care about that decrease the chances of hitting a bug. So just do a test with the number of CPU cores, rather than trying to scale up arbitrarily. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-05-05 16:08:26 +02:00
Jason A. Donenfeld	f8886735e3	netns: make routing loop test non-fatal I hate to do this, but I still do not have a good solution to actually fix this bug across architectures. So just disable it for now, so that the CI can still deliver actionable results. This commit adds a large red warning, so that at least the failure isn't lost forever, and hopefully this can be revisited down the line. Link: https://lore.kernel.org/netdev/CAHmME9pv1x6C4TNdL6648HydD8r+txpV4hTUXOBVkrapBXH4QQ@mail.gmail.com/ Link: https://lore.kernel.org/netdev/YmszSXueTxYOC41G@zx2c4.com/ Link: https://lore.kernel.org/wireguard/CAHmME9rNnBiNvBstb7MPwK-7AmAN0sOfnhdR=eeLrowWcKxaaQ@mail.gmail.com/ Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-05-05 16:07:27 +02:00
Nikolay Aleksandrov	f9d9b4db6f	device: check for metadata_dst with skb_valid_dst() When we try to transmit an skb with md_dst attached through wireguard we hit a null pointer dereference in wg_xmit() due to the use of dst_mtu() which calls into dst_blackhole_mtu() which in turn tries to dereference dst->dev. Since wireguard doesn't use md_dsts we should use skb_valid_dst(), which checks for DST_METADATA flag, and if it's set, then falls back to wireguard's device mtu. That gives us the best chance of transmitting the packet; otherwise if the blackhole netdev is used we'd get ETH_MIN_MTU. [ 263.693506] BUG: kernel NULL pointer dereference, address: 00000000000000e0 [ 263.693908] #PF: supervisor read access in kernel mode [ 263.694174] #PF: error_code(0x0000) - not-present page [ 263.694424] PGD 0 P4D 0 [ 263.694653] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 263.694876] CPU: 5 PID: 951 Comm: mausezahn Kdump: loaded Not tainted 5.18.0-rc1+ #522 [ 263.695190] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014 [ 263.695529] RIP: 0010:dst_blackhole_mtu+0x17/0x20 [ 263.695770] Code: 00 00 00 0f 1f 44 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 47 10 48 83 e0 fc 8b 40 04 85 c0 75 09 48 8b 07 <8b> 80 e0 00 00 00 c3 66 90 0f 1f 44 00 00 48 89 d7 be 01 00 00 00 [ 263.696339] RSP: 0018:ffffa4a4422fbb28 EFLAGS: 00010246 [ 263.696600] RAX: 0000000000000000 RBX: ffff8ac9c3553000 RCX: 0000000000000000 [ 263.696891] RDX: 0000000000000401 RSI: 00000000fffffe01 RDI: ffffc4a43fb48900 [ 263.697178] RBP: ffffa4a4422fbb90 R08: ffffffff9622635e R09: 0000000000000002 [ 263.697469] R10: ffffffff9b69a6c0 R11: ffffa4a4422fbd0c R12: ffff8ac9d18b1a00 [ 263.697766] R13: ffff8ac9d0ce1840 R14: ffff8ac9d18b1a00 R15: ffff8ac9c3553000 [ 263.698054] FS: 00007f3704c337c0(0000) GS:ffff8acaebf40000(0000) knlGS:0000000000000000 [ 263.698470] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 263.698826] CR2: 00000000000000e0 CR3: 0000000117a5c000 CR4: 00000000000006e0 [ 263.699214] Call Trace: [ 263.699505] <TASK> [ 263.699759] wg_xmit+0x411/0x450 [ 263.700059] ? bpf_skb_set_tunnel_key+0x46/0x2d0 [ 263.700382] ? dev_queue_xmit_nit+0x31/0x2b0 [ 263.700719] dev_hard_start_xmit+0xd9/0x220 [ 263.701047] __dev_queue_xmit+0x8b9/0xd30 [ 263.701344] __bpf_redirect+0x1a4/0x380 [ 263.701664] __dev_queue_xmit+0x83b/0xd30 [ 263.701961] ? packet_parse_headers+0xb4/0xf0 [ 263.702275] packet_sendmsg+0x9a8/0x16a0 [ 263.702596] ? _raw_spin_unlock_irqrestore+0x23/0x40 [ 263.702933] sock_sendmsg+0x5e/0x60 [ 263.703239] __sys_sendto+0xf0/0x160 [ 263.703549] __x64_sys_sendto+0x20/0x30 [ 263.703853] do_syscall_64+0x3b/0x90 [ 263.704162] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 263.704494] RIP: 0033:0x7f3704d50506 [ 263.704789] Code: 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00 90 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 72 c3 90 55 48 83 ec 30 44 89 4c 24 2c 4c 89 [ 263.705652] RSP: 002b:00007ffe954b0b88 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 263.706141] RAX: ffffffffffffffda RBX: 0000558bb259b490 RCX: 00007f3704d50506 [ 263.706544] RDX: 000000000000004a RSI: 0000558bb259b7b2 RDI: 0000000000000003 [ 263.706952] RBP: 0000000000000000 R08: 00007ffe954b0b90 R09: 0000000000000014 [ 263.707339] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe954b0b90 [ 263.707735] R13: 000000000000004a R14: 0000558bb259b7b2 R15: 0000000000000001 [ 263.708132] </TASK> [ 263.708398] Modules linked in: bridge netconsole bonding [last unloaded: bridge] [ 263.708942] CR2: 00000000000000e0 Link: https://github.com/cilium/cilium/issues/19428 Reported-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> [Jason: polyfilled for < 4.3] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-04-14 14:49:25 +02:00
Jason A. Donenfeld	f909532a21	qemu: enable ACPI for SMP It turns out that by having CONFIG_ACPI=n, we've been failing to boot additional CPUs, and so these systems were functionally UP. The code bloat is unfortunate for build times, but I don't see an alternative. So this commit sets CONFIG_ACPI=y for x86_64 and i686 configs. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-04-06 18:01:04 +02:00
Jason A. Donenfeld	ec89ca64cb	socket: ignore v6 endpoints when ipv6 is disabled The previous commit fixed a memory leak on the send path in the event that IPv6 is disabled at compile time, but how did a packet even arrive there to begin with? It turns out we have previously allowed IPv6 endpoints even when IPv6 support is disabled at compile time. This is awkward and inconsistent. Instead, let's just ignore all things IPv6, the same way we do other malformed endpoints, in the case where IPv6 is disabled. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-04-06 18:00:26 +02:00
Wang Hai	fa32671b99	socket: free skb in send6 when ipv6 is disabled I got a memory leak report: unreferenced object 0xffff8881191fc040 (size 232): comm "kworker/u17:0", pid 23193, jiffies 4295238848 (age 3464.870s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff814c3ef4>] slab_post_alloc_hook+0x84/0x3b0 [<ffffffff814c8977>] kmem_cache_alloc_node+0x167/0x340 [<ffffffff832974fb>] __alloc_skb+0x1db/0x200 [<ffffffff82612b5d>] wg_socket_send_buffer_to_peer+0x3d/0xc0 [<ffffffff8260e94a>] wg_packet_send_handshake_initiation+0xfa/0x110 [<ffffffff8260ec81>] wg_packet_handshake_send_worker+0x21/0x30 [<ffffffff8119c558>] process_one_work+0x2e8/0x770 [<ffffffff8119ca2a>] worker_thread+0x4a/0x4b0 [<ffffffff811a88e0>] kthread+0x120/0x160 [<ffffffff8100242f>] ret_from_fork+0x1f/0x30 In function wg_socket_send_buffer_as_reply_to_skb() or wg_socket_send_ buffer_to_peer(), the semantics of send6() is required to free skb. But when CONFIG_IPV6 is disable, kfree_skb() is missing. This patch adds it to fix this bug. Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-04-06 17:59:56 +02:00
Jason A. Donenfeld	ffb8cd6233	qemu: simplify RNG seeding We don't actualy need to write anything in the pool. Instead, we just force the total over 128, and we should be good to go for all old kernels. We also only need this on getrandom() kernels, which simplifies things too. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-03-03 00:10:28 +01:00
Jason A. Donenfeld	4eff63d292	queueing: use CFI-safe ptr_ring cleanup function We make too nuanced use of ptr_ring to entirely move to the skb_array wrappers, but we at least should avoid the naughty function pointer cast when cleaning up skbs. Otherwise RAP/CFI will honk at us. This patch uses the __skb_array_destroy_skb wrapper for the cleanup, rather than directly providing kfree_skb, which is what other drivers in the same situation do too. Reported-by: PaX Team <pageexec@freemail.hu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-03-02 23:49:51 +01:00
Jason A. Donenfeld	273018b7ea	crypto: curve25519-x86_64: use in/out register constraints more precisely Rather than passing all variables as modified, pass ones that are only read into that parameter. This helps with old gcc versions when alternatives are additionally used, and lets gcc's codegen be a little bit more efficient. This also syncs up with the latest Vale/EverCrypt output. This also forward ports `3c9f3b6` ("crypto: curve25519-x86_64: solve register constraints with reserved registers"). Cc: Aymeric Fromherz <aymeric.fromherz@inria.fr> Cc: Mathias Krause <minipli@grsecurity.net> Link: https://lore.kernel.org/wireguard/1554725710.1290070.1639240504281.JavaMail.zimbra@inria.fr/ Link: https://github.com/project-everest/hacl-star/pull/501 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-12-13 17:25:37 +01:00
Jason A. Donenfeld	4f4c019873	compat: drop Ubuntu 14.04 It's been over a year since we announced sunsetting this. Link: https://lore.kernel.org/wireguard/CAHmME9rckipsdZYW+LA=x6wCMybdFFA+VqoogFXnR=kHYiCteg@mail.gmail.com/T Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-12-13 15:06:45 +01:00
Jason A. Donenfeld	743eef2350	version: bump Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-12-08 16:09:03 +01:00
Mathias Krause	3c9f3b6997	crypto: curve25519-x86_64: solve register constraints with reserved registers The register constraints for the inline assembly in fsqr() and fsqr2() are pretty tight on what the compiler may assign to the remaining three register variables. The clobber list only allows the following to be used: RDI, RSI, RBP and R12. With RAP reserving R12 and a kernel having CONFIG_FRAME_POINTER=y, claiming RBP, there are only two registers left so the compiler rightfully complains about impossible constraints. Provide alternatives that'll allow a memory reference for 'out' to solve the allocation constraint dilemma for this configuration. Also make 'out' an input-only operand as it is only used as such. This not only allows gcc to optimize its usage further, but also works around older gcc versions, apparently failing to handle multiple alternatives correctly, as in failing to initialize the 'out' operand with its input value. Signed-off-by: Mathias Krause <minipli@grsecurity.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-12-06 22:47:01 +01:00
Jason A. Donenfeld	8e40dd6270	compat: udp_tunnel: don't take reference to non-init namespace The comment to sk_change_net is instructive: Kernel sockets, f.e. rtnl or icmp_socket, are a part of a namespace. They should not hold a reference to a namespace in order to allow to stop it. Sockets after sk_change_net should be released using sk_release_kernel We weren't following these rules before, and were instead using __sock_create, which means we kept a reference to the namespace, which in turn meant that interfaces were not cleaned up on namespace exit. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-12-06 18:34:54 +01:00
Arnd Bergmann	ea6b8e7be5	compat: siphash: use _unaligned version by default On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS because the ordinary load/store instructions (ldr, ldrh, ldrb) can tolerate any misalignment of the memory address. However, load/store double and load/store multiple instructions (ldrd, ldm) may still only be used on memory addresses that are 32-bit aligned, and so we have to use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we may end up with a severe performance hit due to alignment traps that require fixups by the kernel. Testing shows that this currently happens with clang-13 but not gcc-11. In theory, any compiler version can produce this bug or other problems, as we are dealing with undefined behavior in C99 even on architectures that support this in hardware, see also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363. Fortunately, the get_unaligned() accessors do the right thing: when building for ARMv6 or later, the compiler will emit unaligned accesses using the ordinary load/store instructions (but avoid the ones that require 32-bit alignment). When building for older ARM, those accessors will emit the appropriate sequence of ldrb/mov/orr instructions. And on architectures that can truly tolerate any kind of misalignment, the get_unaligned() accessors resolve to the leXX_to_cpup accessors that operate on aligned addresses. Since the compiler will in fact emit ldrd or ldm instructions when building this code for ARM v6 or later, the solution is to use the unaligned accessors unconditionally on architectures where this is known to be fast. The _aligned version of the hash function is however still needed to get the best performance on architectures that cannot do any unaligned access in hardware. This new version avoids the undefined behavior and should produce the fastest hash on all architectures we support. Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-12-03 23:24:03 +01:00
Gustavo A. R. Silva	5325bc82aa	ratelimiter: use kvcalloc() instead of kvzalloc() Use 2-factor argument form kvcalloc() instead of kvzalloc(). Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-12-03 23:24:03 +01:00
Jason A. Donenfeld	e44c78cb66	receive: drop handshakes if queue lock is contended If we're being delivered packets from multiple CPUs so quickly that the ring lock is contended for CPU tries, then it's safe to assume that the queue is near capacity anyway, so just drop the packet rather than spinning. This helps deal with multicore DoS that can interfere with data path performance. It _still_ does not completely fix the issue, but it again chips away at it. Reported-by: Streun Fabio <fstreun@student.ethz.ch> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-12-03 23:24:03 +01:00
Jason A. Donenfeld	5707d38fb5	receive: use ring buffer for incoming handshakes Apparently the spinlock on incoming_handshake's skb_queue is highly contended, and a torrent of handshake or cookie packets can bring the data plane to its knees, simply by virtue of enqueueing the handshake packets to be processed asynchronously. So, we try switching this to a ring buffer to hopefully have less lock contention. This alleviates the problem somewhat, though it still isn't perfect, so future patches will have to improve this further. However, it at least doesn't completely diminish the data plane. Reported-by: Streun Fabio <fstreun@student.ethz.ch> Reported-by: Joel Wanner <joel.wanner@inf.ethz.ch> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-12-03 23:24:03 +01:00
Jason A. Donenfeld	68abb1b9ba	device: reset peer src endpoint when netns exits Each peer's endpoint contains a dst_cache entry that takes a reference to another netdev. When the containing namespace exits, we take down the socket and prevent future sockets from being created (by setting creating_net to NULL), which removes that potential reference on the netns. However, it doesn't release references to the netns that a netdev cached in dst_cache might be taking, so the netns still might fail to exit. Since the socket is gimped anyway, we can simply clear all the dst_caches (by way of clearing the endpoint src), which will release all references. However, the current dst_cache_reset function only releases those references lazily. But it turns out that all of our usages of wg_socket_clear_peer_endpoint_src are called from contexts that are not exactly high-speed or bottle-necked. For example, when there's connection difficulty, or when userspace is reconfiguring the interface. And in particular for this patch, when the netns is exiting. So for those cases, it makes more sense to call dst_release immediately. For that, we add a small helper function to dst_cache. This patch also adds a test to netns.sh from Hangbin Liu to ensure this doesn't regress. Test-by: Hangbin Liu <liuhangbin@gmail.com> Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-12-03 23:24:03 +01:00
Randy Dunlap	ea3f5fbe7e	main: rename 'mod_init' & 'mod_exit' functions to be module-specific Rename module_init & module_exit functions that are named "mod_init" and "mod_exit" so that they are unique in both the System.map file and in initcall_debug output instead of showing up as almost anonymous "mod_init". This is helpful for debugging and in determining how long certain module_init calls take to execute. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-12-03 23:24:03 +01:00
Jason A. Donenfeld	cb001d4540	netns: actually test for routing loops We previously removed the restriction on looping to self, and then added a test to make sure the kernel didn't blow up during a routing loop. The kernel didn't blow up, thankfully, but on certain architectures where skb fragmentation is easier, such as ppc64, the skbs weren't actually being discarded after a few rounds through. But the test wasn't catching this. So actually test explicitly for massive increases in tx to see if we have a routing loop. Note that the actual loop problem will need to be addressed in a different commit. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-12-03 23:24:03 +01:00
Peter Georg	2715e64143	compat: update for RHEL 8.5 RHEL 8.5 has been released. Replace all ISCENTOS8S checks with ISRHEL8. Increase RHEL_MINOR for CentOS 8 Stream detection to 6. Signed-off-by: Peter Georg <peter.georg@physik.uni-regensburg.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-12-03 23:24:03 +01:00
Mathias Krause	29747255f9	compat: account for grsecurity backports and changes grsecurity kernels tend to carry additional backports and changes, like commit b60b87fc2996 ("netlink: add ethernet address policy types") or the SYM_FUNC_* changes. RAP nowadays hooks the latter, therefore no diversion to RAP_ENTRY is needed any more. Instead of relying on the kernel version test, also test for the macros we're about to define to not already be defined to account for these additional changes in the grsecurity patch without breaking compatibility to the older public ones. Also test for CONFIG_PAX instead of RAP_PLUGIN for the timer API related changes as these don't depend on the RAP plugin to be enabled but just a PaX/grsecurity patch to be applied. While there is no preprocessor knob for the latter, use CONFIG_PAX as this will likely be enabled in every kernel that uses the patch. Signed-off-by: Mathias Krause <minipli@grsecurity.net> [zx2c4: small changes to include a header nearby a macro def test] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-08-08 22:28:36 +02:00
Jason A. Donenfeld	50dda8ce5e	compat: account for latest c8s backports Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-06-15 01:13:41 +02:00
Jason A. Donenfeld	d378f93078	version: bump Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-06-06 11:29:55 +02:00
Jason A. Donenfeld	fb4a0da653	qemu: increase default dmesg log size The selftests currently parse the kernel log at the end to track potential memory leaks. With these tests now reading off the end of the buffer, due to recent optimizations, some creation messages were lost, making the tests think that there was a free without an alloc. Fix this by increasing the kernel log size. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-06-06 10:38:33 +02:00
Jason A. Donenfeld	8f4414d334	qemu: add disgusting hacks for RHEL 8 Red Hat does awful things to their kernel for RHEL 8, such that it doesn't even compile in most configurations. This is utter craziness, and their response to me sending patches to fix this stuff has been to stonewall for months on end and then do nothing. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-06-06 00:12:57 +02:00
Jason A. Donenfeld	fd7a4621a5	allowedips: add missing __rcu annotation to satisfy sparse A __rcu annotation got lost during refactoring, which caused sparse to become enraged. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-06-04 17:49:34 +02:00
Jason A. Donenfeld	383461dba8	allowedips: free empty intermediate nodes when removing single node When removing single nodes, it's possible that that node's parent is an empty intermediate node, in which case, it too should be removed. Otherwise the trie fills up and never is fully emptied, leading to gradual memory leaks over time for tries that are modified often. There was originally code to do this, but was removed during refactoring in 2016 and never reworked. Now that we have proper parent pointers from the previous commits, we can implement this properly. In order to reduce branching and expensive comparisons, we want to keep the double pointer for parent assignment (which lets us easily chain up to the root), but we still need to actually get the parent's base address. So encode the bit number into the last two bits of the pointer, and pack and unpack it as needed. This is a little bit clumsy but is the fastest and less memory wasteful of the compromises. Note that we align the root struct here to a minimum of 4, because it's embedded into a larger struct, and we're relying on having the bottom two bits for our flag, which would only be 16-bit aligned on m68k. The existing macro-based helpers were a bit unwieldy for adding the bit packing to, so this commit replaces them with safer and clearer ordinary functions. We add a test to the randomized/fuzzer part of the selftests, to free the randomized tries by-peer, refuzz it, and repeat, until it's supposed to be empty, and then then see if that actually resulted in the whole thing being emptied. That combined with kmemcheck should hopefully make sure this commit is doing what it should. Along the way this resulted in various other cleanups of the tests and fixes for recent graphviz. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-06-04 16:57:59 +02:00
Jason A. Donenfeld	03add828b7	allowedips: allocate nodes in kmem_cache The previous commit moved from O(n) to O(1) for removal, but in the process introduced an additional pointer member to a struct that increased the size from 60 to 68 bytes, putting nodes in the 128-byte slab. With deployed systems having as many as 2 million nodes, this represents a significant doubling in memory usage (128 MiB -> 256 MiB). Fix this by using our own kmem_cache, that's sized exactly right. This also makes wireguard's memory usage more transparent in tools like slabtop and /proc/slabinfo. Suggested-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-06-04 16:57:59 +02:00
Jason A. Donenfeld	b56d48ce67	allowedips: remove nodes in O(1) Previously, deleting peers would require traversing the entire trie in order to rebalance nodes and safely free them. This meant that removing 1000 peers from a trie with a half million nodes would take an extremely long time, during which we're holding the rtnl lock. Large-scale users were reporting 200ms latencies added to the networking stack as a whole every time their userspace software would queue up significant removals. That's a serious situation. This commit fixes that by maintaining a double pointer to the parent's bit pointer for each node, and then using the already existing node list belonging to each peer to go directly to the node, fix up its pointers, and free it with RCU. This means removal is O(1) instead of O(n), and we don't use gobs of stack. The removal algorithm has the same downside as the code that it fixes: it won't collapse needlessly long runs of fillers. We can enhance that in the future if it ever becomes a problem. This commit documents that limitation with a TODO comment in code, a small but meaningful improvement over the prior situation. Currently the biggest flaw, which the next commit addresses, is that because this increases the node size on 64-bit machines from 60 bytes to 68 bytes. 60 rounds up to 64, but 68 rounds up to 128. So we wind up using twice as much memory per node, because of power-of-two allocations, which is a big bummer. We'll need to figure something out there. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-06-04 16:57:59 +02:00
Jason A. Donenfeld	3c14c4bf90	allowedips: initialize list head in selftest The randomized trie tests weren't initializing the dummy peer list head, resulting in a NULL pointer dereference when used. Fix this by initializing it in the randomized trie test, just like we do for the static unit test. While we're at it, all of the other strings like this have the word "self-test", so add it to the missing place here. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-06-04 16:57:59 +02:00
Jason A. Donenfeld	4d8b7edca7	peer: allocate in kmem_cache With deployments having upwards of 600k peers now, this somewhat heavy structure could benefit from more fine-grained allocations. Specifically, instead of using a 2048-byte slab for a 1544-byte object, we can now use 1544-byte objects directly, thus saving almost 25% per-peer, or with 600k peers, that's a savings of 303 MiB. This also makes wireguard's memory usage more transparent in tools like slabtop and /proc/slabinfo. Suggested-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-06-04 16:57:59 +02:00
Jason A. Donenfeld	6fbc0e6284	global: use synchronize_net rather than synchronize_rcu Many of the synchronization points are sometimes called under the rtnl lock, which means we should use synchronize_net rather than synchronize_rcu. Under the hood, this expands to using the expedited flavor of function in the event that rtnl is held, in order to not stall other concurrent changes. This fixes some very, very long delays when removing multiple peers at once, which would cause some operations to take several minutes. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-06-02 18:31:09 +02:00
Jason A. Donenfeld	405caf01e9	kbuild: do not use -O3 Apparently, various versions of gcc have O3-related miscompiles. Looking at the difference between -O2 and -O3 for gcc 11 doesn't indicate miscompiles, but the difference also doesn't seem so significant for performance that it's worth risking. Link: https://lore.kernel.org/lkml/CAHk-=wjuoGyxDhAF8SsrTkN0-YfCx7E6jUN3ikC_tn2AKWTTsA@mail.gmail.com/ Link: https://lore.kernel.org/lkml/CAHmME9otB5Wwxp7H8bR_i2uH2esEMvoBMC8uEXBMH9p0q1s6Bw@mail.gmail.com/ Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-06-02 18:31:09 +02:00
Jason A. Donenfeld	b50ef4dc45	netns: make sure rp_filter is disabled on vethc Some distros may enable strict rp_filter by default, which will prevent vethc from receiving the packets with an unroutable reverse path address. Reported-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-06-02 18:31:09 +02:00
Jason A. Donenfeld	e67b7226a3	version: bump Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-04-24 16:49:45 -04:00
Thadeu Lima de Souza Cascardo	1edffe201a	Revert "compat: skb_mark_not_on_list will be backported to Ubuntu 18.04" This reverts commit `cad80597c7`. Because this commit has not been backported so far, due to the implications of building Ubuntu's backport of wireguard in a timely manner. For now, reverting this fix would allow wireguard-linux-compat CI to work on Ubuntu 18.04. A different fix or the same one can be applied again when the time is right. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-04-23 14:32:21 -06:00
Peter Georg	2cf9543bed	compat: update and improve detection of CentOS Stream 8 CentOS Stream 8 by now (4.18.0-301.1.el8) reports RHEL_MINOR=5. The current RHEL 8 minor release is still 3. RHEL 8.4 is in beta. Replace equal comparison by greater equal to (hopefully) be a little bit more future proof. Signed-off-by: Peter Georg <peter.georg@physik.uni-regensburg.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-04-22 19:51:51 -06:00
Jason A. Donenfeld	122f06bfd8	compat: icmp_ndo_send functions were backported extensively Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-03-07 08:14:33 -07:00
Jason A. Donenfeld	6e36d32801	version: bump Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-19 15:08:33 +01:00
Jason A. Donenfeld	72a0fe52de	qemu: bump default kernel version Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-19 12:53:31 +01:00
Jason A. Donenfeld	51aff05520	compat: zero out skb->cb before icmp This corresponds to the fancier upstream commit that's still on lkml, which passes a zeroed ip_options struct to __icmp_send. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-19 12:53:05 +01:00
Thadeu Lima de Souza Cascardo	cad80597c7	compat: skb_mark_not_on_list will be backported to Ubuntu 18.04 linux commit 22f6bbb7bcfcef0b373b0502a7ff390275c575dd ("net: use skb_list_del_init() to remove from RX sublists") will be backported to Ubuntu 18.04 default kernel, which is based on linux 4.15. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-18 20:28:54 +01:00

1 2 3 4 5 ...

1297 Commits All Branches Search

1297 Commits

All Branches