1297 Commits

Author SHA1 Message Date
Jason A. Donenfeld 3d3c92b471 compat: drop CentOS 8 Stream support
Nobody uses this and it's impossible to maintain given the current CI
situation.

RHEL 7 and 8 releases remain for now, though that might not always be the
case. See the link for details.

Link: https://lists.zx2c4.com/pipermail/wireguard/2022-June/007664.html
Suggested-by: Philip J. Perry <phil@elrepo.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-06-29 15:36:49 +02:00
Jason A. Donenfeld 99935b07b4 compat: do not backport ktime_get_coarse_boottime_ns to c8s
Also bump the c8s version stamp.

Reported-by: Vladimír Beneš <vbenes@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-06-28 12:44:18 +02:00
Jason A. Donenfeld 18fbcd68a3 version: bump
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-06-27 12:54:37 +02:00
Jason A. Donenfeld 3ec3e822b6 compat: handle backported rng and blake2s
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-06-22 17:14:00 +02:00
Jason A. Donenfeld ba45dd6fbf qemu: give up on RHEL8 in CI
They keep breaking their kernel and being difficult when I send patches
to fix it, so just give up on trying to support this in the CI. It'll
bitrot and people will complain and we'll see what happens at that
point.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-05 16:24:24 +02:00
Jason A. Donenfeld c7560fd0e0 qemu: set panic_on_warn=1 from cmdline
Rather than setting this once init is running, set panic_on_warn from
the kernel command line, so that it catches splats from WireGuard
initialization code and the various crypto selftests.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-05 16:24:24 +02:00
Jason A. Donenfeld 33c87a1110 qemu: use vports on arm
Rather than having to hack up QEMU, just use the virtio serial device.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-05 16:24:24 +02:00
Jason A. Donenfeld 894152a5b8 netns: limit parallelism to $(nproc) tests at once
The parallel tests were added to catch queueing issues from multiple
cores. But what happens in reality when testing tons of processes is
that these separate threads wind up fighting with the scheduler, and we
wind up with contention in places we don't care about, which decreases the
chances of hitting a bug. So just do a test with the number of CPU
cores, rather than trying to scale up arbitrarily.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-05 16:08:26 +02:00
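
The test touched by 894152a5b8 is a shell script that uses $(nproc). The same idea can be sketched in C, assuming a hypothetical helper named parallel_test_count() that asks the system for the number of online CPUs. Note that sysconf() does not account for CPU affinity the way nproc does, so this is only an approximation:

```c
#include <unistd.h>

/* Hypothetical helper (not from the repository): how many parallel test
 * workers to launch. Uses the online CPU count rather than an arbitrary
 * large number, mirroring the $(nproc) limit described above. */
long parallel_test_count(void)
{
	long n = sysconf(_SC_NPROCESSORS_ONLN);

	/* sysconf() returns -1 on error; fall back to a single worker. */
	return n > 0 ? n : 1;
}
```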
Jason A. Donenfeld f8886735e3 netns: make routing loop test non-fatal
I hate to do this, but I still do not have a good solution to actually
fix this bug across architectures. So just disable it for now, so that
the CI can still deliver actionable results. This commit adds a large
red warning, so that at least the failure isn't lost forever, and
hopefully this can be revisited down the line.

Link: https://lore.kernel.org/netdev/CAHmME9pv1x6C4TNdL6648HydD8r+txpV4hTUXOBVkrapBXH4QQ@mail.gmail.com/
Link: https://lore.kernel.org/netdev/YmszSXueTxYOC41G@zx2c4.com/
Link: https://lore.kernel.org/wireguard/CAHmME9rNnBiNvBstb7MPwK-7AmAN0sOfnhdR=eeLrowWcKxaaQ@mail.gmail.com/
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-05 16:07:27 +02:00
Nikolay Aleksandrov f9d9b4db6f device: check for metadata_dst with skb_valid_dst()
When we try to transmit an skb with md_dst attached through wireguard
we hit a null pointer dereference in wg_xmit() due to the use of
dst_mtu() which calls into dst_blackhole_mtu() which in turn tries to
dereference dst->dev.

Since wireguard doesn't use md_dsts we should use skb_valid_dst(), which
checks for DST_METADATA flag, and if it's set, then falls back to
wireguard's device mtu. That gives us the best chance of transmitting
the packet; otherwise if the blackhole netdev is used we'd get
ETH_MIN_MTU.

 [  263.693506] BUG: kernel NULL pointer dereference, address: 00000000000000e0
 [  263.693908] #PF: supervisor read access in kernel mode
 [  263.694174] #PF: error_code(0x0000) - not-present page
 [  263.694424] PGD 0 P4D 0
 [  263.694653] Oops: 0000 [#1] PREEMPT SMP NOPTI
 [  263.694876] CPU: 5 PID: 951 Comm: mausezahn Kdump: loaded Not tainted 5.18.0-rc1+ #522
 [  263.695190] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
 [  263.695529] RIP: 0010:dst_blackhole_mtu+0x17/0x20
 [  263.695770] Code: 00 00 00 0f 1f 44 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 47 10 48 83 e0 fc 8b 40 04 85 c0 75 09 48 8b 07 <8b> 80 e0 00 00 00 c3 66 90 0f 1f 44 00 00 48 89 d7 be 01 00 00 00
 [  263.696339] RSP: 0018:ffffa4a4422fbb28 EFLAGS: 00010246
 [  263.696600] RAX: 0000000000000000 RBX: ffff8ac9c3553000 RCX: 0000000000000000
 [  263.696891] RDX: 0000000000000401 RSI: 00000000fffffe01 RDI: ffffc4a43fb48900
 [  263.697178] RBP: ffffa4a4422fbb90 R08: ffffffff9622635e R09: 0000000000000002
 [  263.697469] R10: ffffffff9b69a6c0 R11: ffffa4a4422fbd0c R12: ffff8ac9d18b1a00
 [  263.697766] R13: ffff8ac9d0ce1840 R14: ffff8ac9d18b1a00 R15: ffff8ac9c3553000
 [  263.698054] FS:  00007f3704c337c0(0000) GS:ffff8acaebf40000(0000) knlGS:0000000000000000
 [  263.698470] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [  263.698826] CR2: 00000000000000e0 CR3: 0000000117a5c000 CR4: 00000000000006e0
 [  263.699214] Call Trace:
 [  263.699505]  <TASK>
 [  263.699759]  wg_xmit+0x411/0x450
 [  263.700059]  ? bpf_skb_set_tunnel_key+0x46/0x2d0
 [  263.700382]  ? dev_queue_xmit_nit+0x31/0x2b0
 [  263.700719]  dev_hard_start_xmit+0xd9/0x220
 [  263.701047]  __dev_queue_xmit+0x8b9/0xd30
 [  263.701344]  __bpf_redirect+0x1a4/0x380
 [  263.701664]  __dev_queue_xmit+0x83b/0xd30
 [  263.701961]  ? packet_parse_headers+0xb4/0xf0
 [  263.702275]  packet_sendmsg+0x9a8/0x16a0
 [  263.702596]  ? _raw_spin_unlock_irqrestore+0x23/0x40
 [  263.702933]  sock_sendmsg+0x5e/0x60
 [  263.703239]  __sys_sendto+0xf0/0x160
 [  263.703549]  __x64_sys_sendto+0x20/0x30
 [  263.703853]  do_syscall_64+0x3b/0x90
 [  263.704162]  entry_SYSCALL_64_after_hwframe+0x44/0xae
 [  263.704494] RIP: 0033:0x7f3704d50506
 [  263.704789] Code: 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00 90 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 72 c3 90 55 48 83 ec 30 44 89 4c 24 2c 4c 89
 [  263.705652] RSP: 002b:00007ffe954b0b88 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
 [  263.706141] RAX: ffffffffffffffda RBX: 0000558bb259b490 RCX: 00007f3704d50506
 [  263.706544] RDX: 000000000000004a RSI: 0000558bb259b7b2 RDI: 0000000000000003
 [  263.706952] RBP: 0000000000000000 R08: 00007ffe954b0b90 R09: 0000000000000014
 [  263.707339] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe954b0b90
 [  263.707735] R13: 000000000000004a R14: 0000558bb259b7b2 R15: 0000000000000001
 [  263.708132]  </TASK>
 [  263.708398] Modules linked in: bridge netconsole bonding [last unloaded: bridge]
 [  263.708942] CR2: 00000000000000e0

Link: https://github.com/cilium/cilium/issues/19428
Reported-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
[Jason: polyfilled for < 4.3]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-04-14 14:49:25 +02:00
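
The MTU fallback described in f9d9b4db6f can be sketched in userspace C. The struct names, the flag value, and the function names below are stand-ins invented for illustration, not the kernel's definitions. The sketch only shows the decision: trust the route's MTU when a non-metadata dst is attached, otherwise use the device MTU.

```c
#include <stdbool.h>
#include <stddef.h>

#define DST_METADATA_SKETCH 0x1 /* stand-in flag value, not the kernel's */

struct dst_entry_sketch {
	unsigned int flags;
	unsigned int mtu;
};

struct packet_sketch {
	const struct dst_entry_sketch *dst; /* may be NULL */
};

/* Models the check: a dst is usable only if present and not metadata. */
bool packet_valid_dst(const struct packet_sketch *pkt)
{
	return pkt->dst != NULL && !(pkt->dst->flags & DST_METADATA_SKETCH);
}

/* Use the route's MTU only when the dst is valid, else the device MTU. */
unsigned int packet_mtu(const struct packet_sketch *pkt, unsigned int dev_mtu)
{
	return packet_valid_dst(pkt) ? pkt->dst->mtu : dev_mtu;
}
```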
Jason A. Donenfeld f909532a21 qemu: enable ACPI for SMP
It turns out that by having CONFIG_ACPI=n, we've been failing to boot
additional CPUs, and so these systems were functionally UP. The code
bloat is unfortunate for build times, but I don't see an alternative. So
this commit sets CONFIG_ACPI=y for x86_64 and i686 configs.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-04-06 18:01:04 +02:00
Jason A. Donenfeld ec89ca64cb socket: ignore v6 endpoints when ipv6 is disabled
The previous commit fixed a memory leak on the send path in the event
that IPv6 is disabled at compile time, but how did a packet even arrive
there to begin with? It turns out we have previously allowed IPv6
endpoints even when IPv6 support is disabled at compile time. This is
awkward and inconsistent. Instead, let's just ignore all things IPv6,
the same way we do other malformed endpoints, in the case where IPv6 is
disabled.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-04-06 18:00:26 +02:00
Wang Hai fa32671b99 socket: free skb in send6 when ipv6 is disabled
I got a memory leak report:

unreferenced object 0xffff8881191fc040 (size 232):
  comm "kworker/u17:0", pid 23193, jiffies 4295238848 (age 3464.870s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff814c3ef4>] slab_post_alloc_hook+0x84/0x3b0
    [<ffffffff814c8977>] kmem_cache_alloc_node+0x167/0x340
    [<ffffffff832974fb>] __alloc_skb+0x1db/0x200
    [<ffffffff82612b5d>] wg_socket_send_buffer_to_peer+0x3d/0xc0
    [<ffffffff8260e94a>] wg_packet_send_handshake_initiation+0xfa/0x110
    [<ffffffff8260ec81>] wg_packet_handshake_send_worker+0x21/0x30
    [<ffffffff8119c558>] process_one_work+0x2e8/0x770
    [<ffffffff8119ca2a>] worker_thread+0x4a/0x4b0
    [<ffffffff811a88e0>] kthread+0x120/0x160
    [<ffffffff8100242f>] ret_from_fork+0x1f/0x30

In wg_socket_send_buffer_as_reply_to_skb() and
wg_socket_send_buffer_to_peer(), send6() is expected to free the skb. But
when CONFIG_IPV6 is disabled, the kfree_skb() call is missing. This patch
adds it to fix the leak.

Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-04-06 17:59:56 +02:00
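
The ownership rule behind fa32671b99 (the send function consumes the buffer on every path) can be illustrated with a small userspace sketch. All names here are hypothetical, the error value is a placeholder, and the free counter exists only so the rule can be checked:

```c
#include <stdlib.h>

struct buf_sketch {
	unsigned char data[64];
};

int frees_seen; /* counts frees so the ownership rule can be verified */

void buf_free(struct buf_sketch *b)
{
	free(b);
	frees_seen++;
}

/* Consumes `b` on every path, including the "IPv6 compiled out" path. */
int send6_sketch(struct buf_sketch *b, int ipv6_enabled)
{
	if (!ipv6_enabled) {
		buf_free(b); /* the kind of free that was missing */
		return -1;   /* placeholder error value */
	}
	/* ... a real implementation would transmit here ... */
	buf_free(b);
	return 0;
}
```

Because the callee always frees, callers never need to inspect the return value to decide who owns the buffer.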
Jason A. Donenfeld ffb8cd6233 qemu: simplify RNG seeding
We don't actually need to write anything in the pool. Instead, we just
force the total over 128, and we should be good to go for all old
kernels. We also only need this on getrandom() kernels, which simplifies
things too.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-03-03 00:10:28 +01:00
Jason A. Donenfeld 4eff63d292 queueing: use CFI-safe ptr_ring cleanup function
We make too nuanced use of ptr_ring to entirely move to the skb_array
wrappers, but we at least should avoid the naughty function pointer cast
when cleaning up skbs. Otherwise RAP/CFI will honk at us. This patch
uses the __skb_array_destroy_skb wrapper for the cleanup, rather than
directly providing kfree_skb, which is what other drivers in the same
situation do too.

Reported-by: PaX Team <pageexec@freemail.hu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-03-02 23:49:51 +01:00
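
The point of 4eff63d292 is that a callback should be called through a pointer of its own type rather than through a cast. A minimal userspace sketch follows, with invented names standing in for the kernel's ptr_ring cleanup and skb free functions:

```c
#include <stdlib.h>

struct skb_sketch {
	int len;
};

int skbs_freed;

/* Typed free function, analogous in shape to a free taking a struct pointer. */
void skb_sketch_free(struct skb_sketch *skb)
{
	free(skb);
	skbs_freed++;
}

/* Wrapper whose signature exactly matches the callback type used below,
 * so no function pointer cast is needed. */
void skb_sketch_destroy(void *ptr)
{
	skb_sketch_free(ptr);
}

/* Generic cleanup that only knows about void pointers. */
void ring_cleanup(void **slots, size_t n, void (*destroy)(void *))
{
	for (size_t i = 0; i < n; i++) {
		if (slots[i])
			destroy(slots[i]);
		slots[i] = NULL;
	}
}
```

Passing skb_sketch_free directly would require casting it to void (*)(void *), which is exactly the kind of mismatched indirect call that control-flow integrity schemes reject.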
Jason A. Donenfeld 273018b7ea crypto: curve25519-x86_64: use in/out register constraints more precisely
Rather than passing all variables as modified, pass ones that are only
read into that parameter. This helps with old gcc versions when
alternatives are additionally used, and lets gcc's codegen be a little
bit more efficient. This also syncs up with the latest Vale/EverCrypt
output.

This also forward ports 3c9f3b6 ("crypto: curve25519-x86_64: solve
register constraints with reserved registers").

Cc: Aymeric Fromherz <aymeric.fromherz@inria.fr>
Cc: Mathias Krause <minipli@grsecurity.net>
Link: https://lore.kernel.org/wireguard/1554725710.1290070.1639240504281.JavaMail.zimbra@inria.fr/
Link: https://github.com/project-everest/hacl-star/pull/501
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-12-13 17:25:37 +01:00
Jason A. Donenfeld 4f4c019873 compat: drop Ubuntu 14.04
It's been over a year since we announced sunsetting this.

Link: https://lore.kernel.org/wireguard/CAHmME9rckipsdZYW+LA=x6wCMybdFFA+VqoogFXnR=kHYiCteg@mail.gmail.com/T
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-12-13 15:06:45 +01:00
Jason A. Donenfeld 743eef2350 version: bump
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-12-08 16:09:03 +01:00
Mathias Krause 3c9f3b6997 crypto: curve25519-x86_64: solve register constraints with reserved registers
The register constraints for the inline assembly in fsqr() and fsqr2()
are pretty tight on what the compiler may assign to the remaining three
register variables. The clobber list only allows the following to be
used: RDI, RSI, RBP and R12. With RAP reserving R12 and a kernel having
CONFIG_FRAME_POINTER=y, claiming RBP, there are only two registers left
so the compiler rightfully complains about impossible constraints.

Provide alternatives that'll allow a memory reference for 'out' to solve
the allocation constraint dilemma for this configuration.

Also make 'out' an input-only operand as it is only used as such. This
not only allows gcc to optimize its usage further, but also works around
older gcc versions, apparently failing to handle multiple alternatives
correctly, as in failing to initialize the 'out' operand with its input
value.

Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-12-06 22:47:01 +01:00
Jason A. Donenfeld 8e40dd6270 compat: udp_tunnel: don't take reference to non-init namespace
The comment to sk_change_net is instructive:

  Kernel sockets, f.e. rtnl or icmp_socket, are a part of a namespace.
  They should not hold a reference to a namespace in order to allow
  to stop it.
  Sockets after sk_change_net should be released using sk_release_kernel

We weren't following these rules before, and were instead using
__sock_create, which means we kept a reference to the namespace, which
in turn meant that interfaces were not cleaned up on namespace
exit.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-12-06 18:34:54 +01:00
Arnd Bergmann ea6b8e7be5 compat: siphash: use _unaligned version by default
On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
because the ordinary load/store instructions (ldr, ldrh, ldrb) can
tolerate any misalignment of the memory address. However, load/store
double and load/store multiple instructions (ldrd, ldm) may still only
be used on memory addresses that are 32-bit aligned, and so we have to
use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we
may end up with a severe performance hit due to alignment traps that
require fixups by the kernel. Testing shows that this currently happens
with clang-13 but not gcc-11. In theory, any compiler version can
produce this bug or other problems, as we are dealing with undefined
behavior in C99 even on architectures that support this in hardware,
see also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363.

Fortunately, the get_unaligned() accessors do the right thing: when
building for ARMv6 or later, the compiler will emit unaligned accesses
using the ordinary load/store instructions (but avoid the ones that
require 32-bit alignment). When building for older ARM, those accessors
will emit the appropriate sequence of ldrb/mov/orr instructions. And on
architectures that can truly tolerate any kind of misalignment, the
get_unaligned() accessors resolve to the leXX_to_cpup accessors that
operate on aligned addresses.

Since the compiler will in fact emit ldrd or ldm instructions when
building this code for ARM v6 or later, the solution is to use the
unaligned accessors unconditionally on architectures where this is
known to be fast. The _aligned version of the hash function is
however still needed to get the best performance on architectures
that cannot do any unaligned access in hardware.

This new version avoids the undefined behavior and should produce
the fastest hash on all architectures we support.

Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-12-03 23:24:03 +01:00
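
The alignment-safe access pattern discussed in ea6b8e7be5 can be sketched in portable C. This is not the kernel's get_unaligned() implementation; load_le64_unaligned() is a hypothetical name, and the sketch relies on memcpy having no alignment requirement:

```c
#include <stdint.h>
#include <string.h>

/* Read a little-endian 64-bit value from any address, aligned or not. */
uint64_t load_le64_unaligned(const void *p)
{
	unsigned char b[8];

	memcpy(b, p, sizeof(b)); /* byte copy, so no alignment is assumed */
	return (uint64_t)b[0]       | (uint64_t)b[1] << 8  |
	       (uint64_t)b[2] << 16 | (uint64_t)b[3] << 24 |
	       (uint64_t)b[4] << 32 | (uint64_t)b[5] << 40 |
	       (uint64_t)b[6] << 48 | (uint64_t)b[7] << 56;
}
```

Dereferencing a misaligned uint64_t pointer directly is undefined behavior in C, which is why the compiler is free to pick instructions that trap.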
Gustavo A. R. Silva 5325bc82aa ratelimiter: use kvcalloc() instead of kvzalloc()
Use 2-factor argument form kvcalloc() instead of kvzalloc().

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-12-03 23:24:03 +01:00
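
The benefit of the two-factor form used in 5325bc82aa is that the count and element size are multiplied with an overflow check instead of by the caller. A userspace sketch of that check, using a hypothetical function name and calloc() in place of the kernel allocator:

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate n zeroed elements of `size` bytes each, failing cleanly if
 * n * size would overflow size_t. */
void *checked_zalloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL; /* product would wrap around */
	return calloc(n, size);
}
```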
Jason A. Donenfeld e44c78cb66 receive: drop handshakes if queue lock is contended
If we're being delivered packets from multiple CPUs so quickly that the
ring lock is contended for CPU tries, then it's safe to assume that the
queue is near capacity anyway, so just drop the packet rather than
spinning. This helps deal with multicore DoS that can interfere with
data path performance. It _still_ does not completely fix the issue, but
it again chips away at it.

Reported-by: Streun Fabio <fstreun@student.ethz.ch>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-12-03 23:24:03 +01:00
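
The drop-instead-of-spin policy from e44c78cb66 can be sketched with C11 atomics. The ring layout, its size, and the single lock attempt are assumptions made for this illustration and do not reproduce the kernel's ptr_ring:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SLOTS 8

struct handshake_ring_sketch {
	atomic_flag lock;
	void *slots[RING_SLOTS];
	unsigned int count;
};

/* Returns true if queued; false means the caller should drop the packet. */
bool ring_try_enqueue(struct handshake_ring_sketch *r, void *pkt)
{
	bool queued = false;

	/* One attempt only: if another producer holds the lock, give up. */
	if (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
		return false;

	if (r->count < RING_SLOTS) {
		r->slots[r->count++] = pkt;
		queued = true;
	}
	atomic_flag_clear_explicit(&r->lock, memory_order_release);
	return queued;
}
```

Treating contention as a signal that the queue is saturated keeps producers from burning CPU time that the data path could use.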
Jason A. Donenfeld 5707d38fb5 receive: use ring buffer for incoming handshakes
Apparently the spinlock on incoming_handshake's skb_queue is highly
contended, and a torrent of handshake or cookie packets can bring the
data plane to its knees, simply by virtue of enqueueing the handshake
packets to be processed asynchronously. So, we try switching this to a
ring buffer to hopefully have less lock contention. This alleviates the
problem somewhat, though it still isn't perfect, so future patches will
have to improve this further. However, it at least doesn't completely
diminish the data plane.

Reported-by: Streun Fabio <fstreun@student.ethz.ch>
Reported-by: Joel Wanner <joel.wanner@inf.ethz.ch>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-12-03 23:24:03 +01:00
Jason A. Donenfeld 68abb1b9ba device: reset peer src endpoint when netns exits
Each peer's endpoint contains a dst_cache entry that takes a reference
to another netdev. When the containing namespace exits, we take down the
socket and prevent future sockets from being created (by setting
creating_net to NULL), which removes that potential reference on the
netns. However, it doesn't release references to the netns that a netdev
cached in dst_cache might be taking, so the netns still might fail to
exit. Since the socket is gimped anyway, we can simply clear all the
dst_caches (by way of clearing the endpoint src), which will release all
references.

However, the current dst_cache_reset function only releases those
references lazily. But it turns out that all of our usages of
wg_socket_clear_peer_endpoint_src are called from contexts that are not
exactly high-speed or bottle-necked. For example, when there's
connection difficulty, or when userspace is reconfiguring the interface.
And in particular for this patch, when the netns is exiting. So for
those cases, it makes more sense to call dst_release immediately. For
that, we add a small helper function to dst_cache.

This patch also adds a test to netns.sh from Hangbin Liu to ensure this
doesn't regress.

Test-by: Hangbin Liu <liuhangbin@gmail.com>
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-12-03 23:24:03 +01:00
Randy Dunlap ea3f5fbe7e main: rename 'mod_init' & 'mod_exit' functions to be module-specific
Rename module_init & module_exit functions that are named
"mod_init" and "mod_exit" so that they are unique in both the
System.map file and in initcall_debug output instead of showing
up as almost anonymous "mod_init".

This is helpful for debugging and in determining how long certain
module_init calls take to execute.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-12-03 23:24:03 +01:00
Jason A. Donenfeld cb001d4540 netns: actually test for routing loops
We previously removed the restriction on looping to self, and then added
a test to make sure the kernel didn't blow up during a routing loop. The
kernel didn't blow up, thankfully, but on certain architectures where
skb fragmentation is easier, such as ppc64, the skbs weren't actually
being discarded after a few rounds through. But the test wasn't catching
this. So actually test explicitly for massive increases in tx to see if
we have a routing loop. Note that the actual loop problem will need to
be addressed in a different commit.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-12-03 23:24:03 +01:00
Peter Georg 2715e64143 compat: update for RHEL 8.5
RHEL 8.5 has been released. Replace all ISCENTOS8S checks with ISRHEL8.
Increase RHEL_MINOR for CentOS 8 Stream detection to 6.

Signed-off-by: Peter Georg <peter.georg@physik.uni-regensburg.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-12-03 23:24:03 +01:00
Mathias Krause 29747255f9 compat: account for grsecurity backports and changes
grsecurity kernels tend to carry additional backports and changes, like
commit b60b87fc2996 ("netlink: add ethernet address policy types") or
the SYM_FUNC_* changes. RAP nowadays hooks the latter, therefore no
diversion to RAP_ENTRY is needed any more.

Instead of relying only on the kernel version test, also check that the
macros we're about to define are not already defined. This accounts for
these additional changes in the grsecurity patch without breaking
compatibility with the older public ones.

Also test for CONFIG_PAX instead of RAP_PLUGIN for the timer API related
changes as these don't depend on the RAP plugin to be enabled but just a
PaX/grsecurity patch to be applied. While there is no preprocessor knob
for the latter, use CONFIG_PAX as this will likely be enabled in every
kernel that uses the patch.

Signed-off-by: Mathias Krause <minipli@grsecurity.net>
[zx2c4: small changes to include a header nearby a macro def test]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-08-08 22:28:36 +02:00
Jason A. Donenfeld 50dda8ce5e compat: account for latest c8s backports
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-06-15 01:13:41 +02:00
Jason A. Donenfeld d378f93078 version: bump
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-06-06 11:29:55 +02:00
Jason A. Donenfeld fb4a0da653 qemu: increase default dmesg log size
The selftests currently parse the kernel log at the end to track
potential memory leaks. With these tests now reading off the end of the
buffer, due to recent optimizations, some creation messages were lost,
making the tests think that there was a free without an alloc. Fix this
by increasing the kernel log size.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-06-06 10:38:33 +02:00
Jason A. Donenfeld 8f4414d334 qemu: add disgusting hacks for RHEL 8
Red Hat does awful things to their kernel for RHEL 8, such that it
doesn't even compile in most configurations. This is utter craziness,
and their response to me sending patches to fix this stuff has been to
stonewall for months on end and then do nothing.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-06-06 00:12:57 +02:00
Jason A. Donenfeld fd7a4621a5 allowedips: add missing __rcu annotation to satisfy sparse
A __rcu annotation got lost during refactoring, which caused sparse to
become enraged.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-06-04 17:49:34 +02:00
Jason A. Donenfeld 383461dba8 allowedips: free empty intermediate nodes when removing single node
When removing single nodes, it's possible that that node's parent is an
empty intermediate node, in which case, it too should be removed.
Otherwise the trie fills up and never is fully emptied, leading to
gradual memory leaks over time for tries that are modified often. There
was originally code to do this, but it was removed during refactoring in
2016 and never reworked. Now that we have proper parent pointers from
the previous commits, we can implement this properly.

In order to reduce branching and expensive comparisons, we want to keep
the double pointer for parent assignment (which lets us easily chain up
to the root), but we still need to actually get the parent's base
address. So encode the bit number into the last two bits of the pointer,
and pack and unpack it as needed. This is a little bit clumsy but is the
fastest and least memory-wasteful of the compromises. Note that we align
the root struct here to a minimum of 4, because it's embedded into a
larger struct, and we're relying on having the bottom two bits for our
flag, which would only be 16-bit aligned on m68k.

The existing macro-based helpers were a bit unwieldy for adding the bit
packing to, so this commit replaces them with safer and clearer ordinary
functions.

We add a test to the randomized/fuzzer part of the selftests, to free
the randomized tries by-peer, refuzz it, and repeat, until it's supposed
to be empty, and then see if that actually resulted in the whole
thing being emptied. That combined with kmemcheck should hopefully make
sure this commit is doing what it should. Along the way this resulted in
various other cleanups of the tests and fixes for recent graphviz.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-06-04 16:57:59 +02:00
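
The bit packing described in 383461dba8 stores a small index in the two low bits of a pointer, which is only safe when the pointed-to object is at least 4-byte aligned. A generic sketch with hypothetical helper names (not the functions from the commit):

```c
#include <stdint.h>

/* Two low bits carry the tag, so packed pointers must be 4-byte aligned. */
#define TAG_MASK ((uintptr_t)3)

uintptr_t pack_ptr(void *ptr, unsigned int tag)
{
	return (uintptr_t)ptr | (tag & TAG_MASK);
}

void *unpack_ptr(uintptr_t packed)
{
	return (void *)(packed & ~TAG_MASK);
}

unsigned int unpack_tag(uintptr_t packed)
{
	return (unsigned int)(packed & TAG_MASK);
}
```

If the object were only 2-byte aligned, bit 1 of the address could be set and unpack_ptr() would silently return the wrong address, which is why the commit forces a minimum alignment of 4.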
Jason A. Donenfeld 03add828b7 allowedips: allocate nodes in kmem_cache
The previous commit moved from O(n) to O(1) for removal, but in the
process introduced an additional pointer member to a struct that
increased the size from 60 to 68 bytes, putting nodes in the 128-byte
slab. With deployed systems having as many as 2 million nodes, this
represents a significant doubling in memory usage (128 MiB -> 256 MiB).
Fix this by using our own kmem_cache, that's sized exactly right. This
also makes wireguard's memory usage more transparent in tools like
slabtop and /proc/slabinfo.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-06-04 16:57:59 +02:00
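
The size jump described in 03add828b7 follows from rounding each allocation up to a power of two. The sketch below assumes pure power-of-two size classes, as the commit message does (real allocators may also offer intermediate sizes), and reads "2 million nodes" as 2^21. Under those assumptions 60 bytes rounds to 64 and 68 bytes rounds to 128, giving 2^21 * 64 B = 128 MiB versus 2^21 * 128 B = 256 MiB.

```c
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next power of two. For values above the largest
 * representable power of two, the result saturates at that power. */
size_t round_up_pow2(size_t n)
{
	size_t p = 1;

	while (p < n && p <= SIZE_MAX / 2)
		p <<= 1;
	return p;
}
```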
Jason A. Donenfeld b56d48ce67 allowedips: remove nodes in O(1)
Previously, deleting peers would require traversing the entire trie in
order to rebalance nodes and safely free them. This meant that removing
1000 peers from a trie with a half million nodes would take an extremely
long time, during which we're holding the rtnl lock. Large-scale users
were reporting 200ms latencies added to the networking stack as a whole
every time their userspace software would queue up significant removals.
That's a serious situation.

This commit fixes that by maintaining a double pointer to the parent's
bit pointer for each node, and then using the already existing node list
belonging to each peer to go directly to the node, fix up its pointers,
and free it with RCU. This means removal is O(1) instead of O(n), and we
don't use gobs of stack.

The removal algorithm has the same downside as the code that it fixes:
it won't collapse needlessly long runs of fillers.  We can enhance that
in the future if it ever becomes a problem. This commit documents that
limitation with a TODO comment in code, a small but meaningful
improvement over the prior situation.

Currently the biggest flaw, which the next commit addresses, is that
this increases the node size on 64-bit machines from 60 bytes to
68 bytes. 60 rounds up to 64, but 68 rounds up to 128. So we wind up
using twice as much memory per node, because of power-of-two
allocations, which is a big bummer. We'll need to figure something out
there.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-06-04 16:57:59 +02:00
Jason A. Donenfeld 3c14c4bf90 allowedips: initialize list head in selftest
The randomized trie tests weren't initializing the dummy peer list head,
resulting in a NULL pointer dereference when used. Fix this by
initializing it in the randomized trie test, just like we do for the
static unit test.

While we're at it, all of the other strings like this have the word
"self-test", so add it to the missing place here.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-06-04 16:57:59 +02:00
Jason A. Donenfeld 4d8b7edca7 peer: allocate in kmem_cache
With deployments having upwards of 600k peers now, this somewhat heavy
structure could benefit from more fine-grained allocations.
Specifically, instead of using a 2048-byte slab for a 1544-byte object,
we can now use 1544-byte objects directly, thus saving almost 25%
per-peer, or with 600k peers, that's a savings of 303 MiB. This also
makes wireguard's memory usage more transparent in tools like slabtop
and /proc/slabinfo.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-06-04 16:57:59 +02:00
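
The savings quoted in 4d8b7edca7 can be checked with simple arithmetic: 2048 - 1544 = 504 bytes per peer (about 24.6% of the slab), and 504 * 600,000 = 302,400,000 bytes, in line with the roughly 303 figure in the message. A hypothetical helper for the calculation:

```c
#include <stdint.h>

/* Bytes saved when `count` objects of `obj_size` bytes stop occupying
 * `slab_size`-byte slots. Assumes slab_size >= obj_size. */
uint64_t slab_savings_bytes(uint64_t slab_size, uint64_t obj_size,
			    uint64_t count)
{
	return (slab_size - obj_size) * count;
}
```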
Jason A. Donenfeld 6fbc0e6284 global: use synchronize_net rather than synchronize_rcu
Many of the synchronization points are sometimes called under the rtnl
lock, which means we should use synchronize_net rather than
synchronize_rcu. Under the hood, this expands to using the expedited
flavor of function in the event that rtnl is held, in order to not stall
other concurrent changes.

This fixes some very, very long delays when removing multiple peers at
once, which would cause some operations to take several minutes.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-06-02 18:31:09 +02:00
Jason A. Donenfeld 405caf01e9 kbuild: do not use -O3
Apparently, various versions of gcc have O3-related miscompiles. Looking
at the difference between -O2 and -O3 for gcc 11 doesn't indicate
miscompiles, but the difference also doesn't seem so significant for
performance that it's worth risking.

Link: https://lore.kernel.org/lkml/CAHk-=wjuoGyxDhAF8SsrTkN0-YfCx7E6jUN3ikC_tn2AKWTTsA@mail.gmail.com/
Link: https://lore.kernel.org/lkml/CAHmME9otB5Wwxp7H8bR_i2uH2esEMvoBMC8uEXBMH9p0q1s6Bw@mail.gmail.com/
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-06-02 18:31:09 +02:00
Jason A. Donenfeld b50ef4dc45 netns: make sure rp_filter is disabled on vethc
Some distros may enable strict rp_filter by default, which will prevent
vethc from receiving the packets with an unroutable reverse path address.

Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-06-02 18:31:09 +02:00
Jason A. Donenfeld e67b7226a3 version: bump
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-04-24 16:49:45 -04:00
Thadeu Lima de Souza Cascardo 1edffe201a Revert "compat: skb_mark_not_on_list will be backported to Ubuntu 18.04"
This reverts commit cad80597c7.

The change it anticipated has not been backported so far, due to the
implications of building Ubuntu's backport of wireguard in a timely manner.

For now, reverting this fix allows wireguard-linux-compat CI to work
on Ubuntu 18.04.

A different fix or the same one can be applied again when the time is
right.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-04-23 14:32:21 -06:00
Peter Georg 2cf9543bed compat: update and improve detection of CentOS Stream 8
CentOS Stream 8 by now (4.18.0-301.1.el8) reports RHEL_MINOR=5. The
current RHEL 8 minor release is still 3. RHEL 8.4 is in beta. Replace
the equality comparison with a greater-than-or-equal one to (hopefully) be
a little bit more future-proof.

Signed-off-by: Peter Georg <peter.georg@physik.uni-regensburg.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-04-22 19:51:51 -06:00
Jason A. Donenfeld 122f06bfd8 compat: icmp_ndo_send functions were backported extensively
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-03-07 08:14:33 -07:00
Jason A. Donenfeld 6e36d32801 version: bump
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-19 15:08:33 +01:00
Jason A. Donenfeld 72a0fe52de qemu: bump default kernel version
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-19 12:53:31 +01:00
Jason A. Donenfeld 51aff05520 compat: zero out skb->cb before icmp
This corresponds to the fancier upstream commit that's still on lkml,
which passes a zeroed ip_options struct to __icmp_send.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-19 12:53:05 +01:00
Thadeu Lima de Souza Cascardo cad80597c7 compat: skb_mark_not_on_list will be backported to Ubuntu 18.04
linux commit 22f6bbb7bcfcef0b373b0502a7ff390275c575dd ("net: use
skb_list_del_init() to remove from RX sublists") will be backported to Ubuntu
18.04 default kernel, which is based on linux 4.15.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-18 20:28:54 +01:00