Nobody uses this and it's impossible to maintain given the current CI
situation.
RHEL 7 and 8 release remain for now, though that might not always be the
case. See the link for details.
Link: https://lists.zx2c4.com/pipermail/wireguard/2022-June/007664.html
Suggested-by: Philip J. Perry <phil@elrepo.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
They keep breaking their kernel and being difficult when I send patches
to fix it, so just give up on trying to support this in the CI. It'll
bitrot and people will complain and we'll see what happens at that
point.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Rather than setting this once init is running, set panic_on_warn from
the kernel command line, so that it catches splats from WireGuard
initialization code and the various crypto selftests.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
The parallel tests were added to catch queueing issues from multiple
cores. But what happens in reality when testing tons of processes is
that these separate threads wind up fighting with the scheduler, and we
wind up with contention in places we don't care about that decrease the
chances of hitting a bug. So just do a test with the number of CPU
cores, rather than trying to scale up arbitrarily.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
It turns out that by having CONFIG_ACPI=n, we've been failing to boot
additional CPUs, and so these systems were functionally UP. The code
bloat is unfortunate for build times, but I don't see an alternative. So
this commit sets CONFIG_ACPI=y for x86_64 and i686 configs.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
The previous commit fixed a memory leak on the send path in the event
that IPv6 is disabled at compile time, but how did a packet even arrive
there to begin with? It turns out we have previously allowed IPv6
endpoints even when IPv6 support is disabled at compile time. This is
awkward and inconsistent. Instead, let's just ignore all things IPv6,
the same way we do other malformed endpoints, in the case where IPv6 is
disabled.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
We don't actualy need to write anything in the pool. Instead, we just
force the total over 128, and we should be good to go for all old
kernels. We also only need this on getrandom() kernels, which simplifies
things too.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
We make too nuanced use of ptr_ring to entirely move to the skb_array
wrappers, but we at least should avoid the naughty function pointer cast
when cleaning up skbs. Otherwise RAP/CFI will honk at us. This patch
uses the __skb_array_destroy_skb wrapper for the cleanup, rather than
directly providing kfree_skb, which is what other drivers in the same
situation do too.
Reported-by: PaX Team <pageexec@freemail.hu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Rather than passing all variables as modified, pass ones that are only
read into that parameter. This helps with old gcc versions when
alternatives are additionally used, and lets gcc's codegen be a little
bit more efficient. This also syncs up with the latest Vale/EverCrypt
output.
This also forward ports 3c9f3b6 ("crypto: curve25519-x86_64: solve
register constraints with reserved registers").
Cc: Aymeric Fromherz <aymeric.fromherz@inria.fr>
Cc: Mathias Krause <minipli@grsecurity.net>
Link: https://lore.kernel.org/wireguard/1554725710.1290070.1639240504281.JavaMail.zimbra@inria.fr/
Link: https://github.com/project-everest/hacl-star/pull/501
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
The register constraints for the inline assembly in fsqr() and fsqr2()
are pretty tight on what the compiler may assign to the remaining three
register variables. The clobber list only allows the following to be
used: RDI, RSI, RBP and R12. With RAP reserving R12 and a kernel having
CONFIG_FRAME_POINTER=y, claiming RBP, there are only two registers left
so the compiler rightfully complains about impossible constraints.
Provide alternatives that'll allow a memory reference for 'out' to solve
the allocation constraint dilemma for this configuration.
Also make 'out' an input-only operand as it is only used as such. This
not only allows gcc to optimize its usage further, but also works around
older gcc versions, apparently failing to handle multiple alternatives
correctly, as in failing to initialize the 'out' operand with its input
value.
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
The comment to sk_change_net is instructive:
Kernel sockets, f.e. rtnl or icmp_socket, are a part of a namespace.
They should not hold a reference to a namespace in order to allow
to stop it.
Sockets after sk_change_net should be released using sk_release_kernel
We weren't following these rules before, and were instead using
__sock_create, which means we kept a reference to the namespace, which
in turn meant that interfaces were not cleaned up on namespace
exit.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
because the ordinary load/store instructions (ldr, ldrh, ldrb) can
tolerate any misalignment of the memory address. However, load/store
double and load/store multiple instructions (ldrd, ldm) may still only
be used on memory addresses that are 32-bit aligned, and so we have to
use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we
may end up with a severe performance hit due to alignment traps that
require fixups by the kernel. Testing shows that this currently happens
with clang-13 but not gcc-11. In theory, any compiler version can
produce this bug or other problems, as we are dealing with undefined
behavior in C99 even on architectures that support this in hardware,
see also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363.
Fortunately, the get_unaligned() accessors do the right thing: when
building for ARMv6 or later, the compiler will emit unaligned accesses
using the ordinary load/store instructions (but avoid the ones that
require 32-bit alignment). When building for older ARM, those accessors
will emit the appropriate sequence of ldrb/mov/orr instructions. And on
architectures that can truly tolerate any kind of misalignment, the
get_unaligned() accessors resolve to the leXX_to_cpup accessors that
operate on aligned addresses.
Since the compiler will in fact emit ldrd or ldm instructions when
building this code for ARM v6 or later, the solution is to use the
unaligned accessors unconditionally on architectures where this is
known to be fast. The _aligned version of the hash function is
however still needed to get the best performance on architectures
that cannot do any unaligned access in hardware.
This new version avoids the undefined behavior and should produce
the fastest hash on all architectures we support.
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Use 2-factor argument form kvcalloc() instead of kvzalloc().
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
If we're being delivered packets from multiple CPUs so quickly that the
ring lock is contended for CPU tries, then it's safe to assume that the
queue is near capacity anyway, so just drop the packet rather than
spinning. This helps deal with multicore DoS that can interfere with
data path performance. It _still_ does not completely fix the issue, but
it again chips away at it.
Reported-by: Streun Fabio <fstreun@student.ethz.ch>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Apparently the spinlock on incoming_handshake's skb_queue is highly
contended, and a torrent of handshake or cookie packets can bring the
data plane to its knees, simply by virtue of enqueueing the handshake
packets to be processed asynchronously. So, we try switching this to a
ring buffer to hopefully have less lock contention. This alleviates the
problem somewhat, though it still isn't perfect, so future patches will
have to improve this further. However, it at least doesn't completely
diminish the data plane.
Reported-by: Streun Fabio <fstreun@student.ethz.ch>
Reported-by: Joel Wanner <joel.wanner@inf.ethz.ch>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Each peer's endpoint contains a dst_cache entry that takes a reference
to another netdev. When the containing namespace exits, we take down the
socket and prevent future sockets from being created (by setting
creating_net to NULL), which removes that potential reference on the
netns. However, it doesn't release references to the netns that a netdev
cached in dst_cache might be taking, so the netns still might fail to
exit. Since the socket is gimped anyway, we can simply clear all the
dst_caches (by way of clearing the endpoint src), which will release all
references.
However, the current dst_cache_reset function only releases those
references lazily. But it turns out that all of our usages of
wg_socket_clear_peer_endpoint_src are called from contexts that are not
exactly high-speed or bottle-necked. For example, when there's
connection difficulty, or when userspace is reconfiguring the interface.
And in particular for this patch, when the netns is exiting. So for
those cases, it makes more sense to call dst_release immediately. For
that, we add a small helper function to dst_cache.
This patch also adds a test to netns.sh from Hangbin Liu to ensure this
doesn't regress.
Test-by: Hangbin Liu <liuhangbin@gmail.com>
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Rename module_init & module_exit functions that are named
"mod_init" and "mod_exit" so that they are unique in both the
System.map file and in initcall_debug output instead of showing
up as almost anonymous "mod_init".
This is helpful for debugging and in determining how long certain
module_init calls take to execute.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
We previously removed the restriction on looping to self, and then added
a test to make sure the kernel didn't blow up during a routing loop. The
kernel didn't blow up, thankfully, but on certain architectures where
skb fragmentation is easier, such as ppc64, the skbs weren't actually
being discarded after a few rounds through. But the test wasn't catching
this. So actually test explicitly for massive increases in tx to see if
we have a routing loop. Note that the actual loop problem will need to
be addressed in a different commit.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
RHEL 8.5 has been released. Replace all ISCENTOS8S checks with ISRHEL8.
Increase RHEL_MINOR for CentOS 8 Stream detection to 6.
Signed-off-by: Peter Georg <peter.georg@physik.uni-regensburg.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
grsecurity kernels tend to carry additional backports and changes, like
commit b60b87fc2996 ("netlink: add ethernet address policy types") or
the SYM_FUNC_* changes. RAP nowadays hooks the latter, therefore no
diversion to RAP_ENTRY is needed any more.
Instead of relying on the kernel version test, also test for the macros
we're about to define to not already be defined to account for these
additional changes in the grsecurity patch without breaking
compatibility to the older public ones.
Also test for CONFIG_PAX instead of RAP_PLUGIN for the timer API related
changes as these don't depend on the RAP plugin to be enabled but just a
PaX/grsecurity patch to be applied. While there is no preprocessor knob
for the latter, use CONFIG_PAX as this will likely be enabled in every
kernel that uses the patch.
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
[zx2c4: small changes to include a header nearby a macro def test]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
The selftests currently parse the kernel log at the end to track
potential memory leaks. With these tests now reading off the end of the
buffer, due to recent optimizations, some creation messages were lost,
making the tests think that there was a free without an alloc. Fix this
by increasing the kernel log size.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Red Hat does awful things to their kernel for RHEL 8, such that it
doesn't even compile in most configurations. This is utter craziness,
and their response to me sending patches to fix this stuff has been to
stonewall for months on end and then do nothing.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
When removing single nodes, it's possible that that node's parent is an
empty intermediate node, in which case, it too should be removed.
Otherwise the trie fills up and never is fully emptied, leading to
gradual memory leaks over time for tries that are modified often. There
was originally code to do this, but was removed during refactoring in
2016 and never reworked. Now that we have proper parent pointers from
the previous commits, we can implement this properly.
In order to reduce branching and expensive comparisons, we want to keep
the double pointer for parent assignment (which lets us easily chain up
to the root), but we still need to actually get the parent's base
address. So encode the bit number into the last two bits of the pointer,
and pack and unpack it as needed. This is a little bit clumsy but is the
fastest and less memory wasteful of the compromises. Note that we align
the root struct here to a minimum of 4, because it's embedded into a
larger struct, and we're relying on having the bottom two bits for our
flag, which would only be 16-bit aligned on m68k.
The existing macro-based helpers were a bit unwieldy for adding the bit
packing to, so this commit replaces them with safer and clearer ordinary
functions.
We add a test to the randomized/fuzzer part of the selftests, to free
the randomized tries by-peer, refuzz it, and repeat, until it's supposed
to be empty, and then then see if that actually resulted in the whole
thing being emptied. That combined with kmemcheck should hopefully make
sure this commit is doing what it should. Along the way this resulted in
various other cleanups of the tests and fixes for recent graphviz.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
The previous commit moved from O(n) to O(1) for removal, but in the
process introduced an additional pointer member to a struct that
increased the size from 60 to 68 bytes, putting nodes in the 128-byte
slab. With deployed systems having as many as 2 million nodes, this
represents a significant doubling in memory usage (128 MiB -> 256 MiB).
Fix this by using our own kmem_cache, that's sized exactly right. This
also makes wireguard's memory usage more transparent in tools like
slabtop and /proc/slabinfo.
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Previously, deleting peers would require traversing the entire trie in
order to rebalance nodes and safely free them. This meant that removing
1000 peers from a trie with a half million nodes would take an extremely
long time, during which we're holding the rtnl lock. Large-scale users
were reporting 200ms latencies added to the networking stack as a whole
every time their userspace software would queue up significant removals.
That's a serious situation.
This commit fixes that by maintaining a double pointer to the parent's
bit pointer for each node, and then using the already existing node list
belonging to each peer to go directly to the node, fix up its pointers,
and free it with RCU. This means removal is O(1) instead of O(n), and we
don't use gobs of stack.
The removal algorithm has the same downside as the code that it fixes:
it won't collapse needlessly long runs of fillers. We can enhance that
in the future if it ever becomes a problem. This commit documents that
limitation with a TODO comment in code, a small but meaningful
improvement over the prior situation.
Currently the biggest flaw, which the next commit addresses, is that
because this increases the node size on 64-bit machines from 60 bytes to
68 bytes. 60 rounds up to 64, but 68 rounds up to 128. So we wind up
using twice as much memory per node, because of power-of-two
allocations, which is a big bummer. We'll need to figure something out
there.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
The randomized trie tests weren't initializing the dummy peer list head,
resulting in a NULL pointer dereference when used. Fix this by
initializing it in the randomized trie test, just like we do for the
static unit test.
While we're at it, all of the other strings like this have the word
"self-test", so add it to the missing place here.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
With deployments having upwards of 600k peers now, this somewhat heavy
structure could benefit from more fine-grained allocations.
Specifically, instead of using a 2048-byte slab for a 1544-byte object,
we can now use 1544-byte objects directly, thus saving almost 25%
per-peer, or with 600k peers, that's a savings of 303 MiB. This also
makes wireguard's memory usage more transparent in tools like slabtop
and /proc/slabinfo.
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Many of the synchronization points are sometimes called under the rtnl
lock, which means we should use synchronize_net rather than
synchronize_rcu. Under the hood, this expands to using the expedited
flavor of function in the event that rtnl is held, in order to not stall
other concurrent changes.
This fixes some very, very long delays when removing multiple peers at
once, which would cause some operations to take several minutes.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Some distros may enable strict rp_filter by default, which will prevent
vethc from receiving the packets with an unroutable reverse path address.
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
This reverts commit cad80597c7.
Because this commit has not been backported so far, due to the implications
of building Ubuntu's backport of wireguard in a timely manner.
For now, reverting this fix would allow wireguard-linux-compat CI to work
on Ubuntu 18.04.
A different fix or the same one can be applied again when the time is
right.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
CentOS Stream 8 by now (4.18.0-301.1.el8) reports RHEL_MINOR=5. The
current RHEL 8 minor release is still 3. RHEL 8.4 is in beta. Replace
equal comparison by greater equal to (hopefully) be a little bit more
future proof.
Signed-off-by: Peter Georg <peter.georg@physik.uni-regensburg.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
This corresponds to the fancier upstream commit that's still on lkml,
which passes a zeroed ip_options struct to __icmp_send.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
linux commit 22f6bbb7bcfcef0b373b0502a7ff390275c575dd ("net: use
skb_list_del_init() to remove from RX sublists") will be backported to Ubuntu
18.04 default kernel, which is based on linux 4.15.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>