* [PATCH net] net: tun: do not call napi_disable() twice
@ 2022-06-29 9:37 Eric Dumazet
2022-06-29 16:17 ` Jakub Kicinski
0 siblings, 1 reply; 4+ messages in thread
From: Eric Dumazet @ 2022-06-29 9:37 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet, syzbot, Petar Penkov
syzbot reported a hang in tun_napi_disable() while RTNL is held.
Because tun.c logic is complicated, I chose to:
1) rename tun->napi_enabled to tun->napi_configured
2) Add a new boolean, tracking if tun->napi is enabled or not.
INFO: task kworker/0:1:14 blocked for more than 143 seconds.
Not tainted 5.19.0-rc3-syzkaller-00144-g3b0dc529f56b #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/0:1 state:D stack:27168 pid: 14 ppid: 2 flags:0x00004000
Workqueue: ipv6_addrconf addrconf_verify_work
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5146 [inline]
__schedule+0xa00/0x4b50 kernel/sched/core.c:6458
schedule+0xd2/0x1f0 kernel/sched/core.c:6530
schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6589
__mutex_lock_common kernel/locking/mutex.c:679 [inline]
__mutex_lock+0xa70/0x1350 kernel/locking/mutex.c:747
addrconf_verify_work+0xe/0x20 net/ipv6/addrconf.c:4616
process_one_work+0x996/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e9/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302
</TASK>
INFO: task dhcpcd:3190 blocked for more than 143 seconds.
Not tainted 5.19.0-rc3-syzkaller-00144-g3b0dc529f56b #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:dhcpcd state:D stack:22976 pid: 3190 ppid: 3189 flags:0x00000000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5146 [inline]
__schedule+0xa00/0x4b50 kernel/sched/core.c:6458
schedule+0xd2/0x1f0 kernel/sched/core.c:6530
schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6589
__mutex_lock_common kernel/locking/mutex.c:679 [inline]
__mutex_lock+0xa70/0x1350 kernel/locking/mutex.c:747
__netlink_dump_start+0x16a/0x900 net/netlink/af_netlink.c:2344
netlink_dump_start include/linux/netlink.h:245 [inline]
rtnetlink_rcv_msg+0x73e/0xc90 net/core/rtnetlink.c:6046
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:734
__sys_sendto+0x21a/0x320 net/socket.c:2119
__do_sys_sendto net/socket.c:2131 [inline]
__se_sys_sendto net/socket.c:2127 [inline]
__x64_sys_sendto+0xdd/0x1b0 net/socket.c:2127
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fca4209d206
RSP: 002b:00007fff12495ae8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007fff12496c20 RCX: 00007fca4209d206
RDX: 0000000000000014 RSI: 00007fff12496b40 RDI: 0000000000000018
RBP: 00007fff12496bb0 R08: 00007fff12496b24 R09: 000000000000000c
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff12496b40
R13: 00007fff12496b24 R14: 0000000000000000 R15: 00007fff12495af0
</TASK>
Showing all locks held in the system:
3 locks held by kworker/0:1/14:
1 lock held by khungtaskd/29:
1 lock held by dhcpcd/3190:
2 locks held by getty/3293:
1 lock held by syz-executor658/3647:
=============================================
NMI backtrace for cpu 0
CPU: 0 PID: 29 Comm: khungtaskd Not tainted 5.19.0-rc3-syzkaller-00144-g3b0dc529f56b #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:111
nmi_trigger_cpumask_backtrace+0x1e6/0x230 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:220 [inline]
watchdog+0xc22/0xf90 kernel/hung_task.c:378
kthread+0x2e9/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302
</TASK>
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 3647 Comm: syz-executor658 Not tainted 5.19.0-rc3-syzkaller-00144-g3b0dc529f56b #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/irqflags.h:137 [inline]
RIP: 0010:lock_is_held_type+0xf0/0x140 kernel/locking/lockdep.c:5710
Code: f0 41 0f 94 c5 48 c7 c7 e0 7c cc 89 e8 69 0d 00 00 b8 ff ff ff ff 65 0f c1 05 d4 79 8b 76 83 f8 01 75 29 9c 58 f6 c4 02 75 3d <48> f7 04 24 00 02 00 00 74 01 fb 48 83 c4 08 44 89 e8 5b 5d 41 5c
RSP: 0018:ffffc90002fcf928 EFLAGS: 00000046
RAX: 0000000000000046 RBX: 0000000000000001 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
RBP: ffff8880b9b39ed8 R08: ffff8880b9b3a908 R09: ffffffff8dbb8517
R10: fffffbfff1b770a2 R11: dffffc0000000000 R12: ffff88801ba31d80
R13: 0000000000000001 R14: 00000000ffffffff R15: ffff88801ba32808
FS: 0000000000000000(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055fe675980b0 CR3: 000000000ba8e000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
lock_is_held include/linux/lockdep.h:279 [inline]
lockdep_assert_rq_held kernel/sched/sched.h:1295 [inline]
rq_clock kernel/sched/sched.h:1450 [inline]
sched_info_arrive kernel/sched/stats.h:239 [inline]
sched_info_switch kernel/sched/stats.h:295 [inline]
prepare_task_switch kernel/sched/core.c:4955 [inline]
context_switch kernel/sched/core.c:5098 [inline]
__schedule+0x2b44/0x4b50 kernel/sched/core.c:6458
schedule+0xd2/0x1f0 kernel/sched/core.c:6530
schedule_hrtimeout_range_clock+0x195/0x390 kernel/time/hrtimer.c:2305
usleep_range_state+0x129/0x1b0 kernel/time/timer.c:2132
usleep_range include/linux/delay.h:67 [inline]
napi_disable+0xff/0x120 net/core/dev.c:6402
tun_napi_disable drivers/net/tun.c:285 [inline]
__tun_detach+0x165/0x1440 drivers/net/tun.c:643
tun_detach drivers/net/tun.c:700 [inline]
tun_chr_close+0xc4/0x180 drivers/net/tun.c:3454
__fput+0x277/0x9d0 fs/file_table.c:317
task_work_run+0xdd/0x1a0 kernel/task_work.c:177
exit_task_work include/linux/task_work.h:38 [inline]
do_exit+0xaff/0x2a00 kernel/exit.c:795
do_group_exit+0xd2/0x2f0 kernel/exit.c:925
__do_sys_exit_group kernel/exit.c:936 [inline]
__se_sys_exit_group kernel/exit.c:934 [inline]
__x64_sys_exit_group+0x3a/0x50 kernel/exit.c:934
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7efd8d173a29
Code: Unable to access opcode bytes at RIP 0x7efd8d1739ff.
RSP: 002b:00007ffdb72544d8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00007efd8d1e7330 RCX: 00007efd8d173a29
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffffffffffc0 R09: 00007ffdb72546c8
R10: 00007ffdb72546c8 R11: 0000000000000246 R12: 00007efd8d1e7330
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
</TASK>
Fixes: a8fc8cb5692a ("net: tun: stop NAPI when detaching queues")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Petar Penkov <ppenkov@aviatrix.com>
---
drivers/net/tun.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e2eb35887394e384972f573745f5870ba8c9d19b..7dab3dc1c387a4f98c72490e955e78b8d5d9da25 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -138,6 +138,7 @@ struct tun_file {
 		unsigned int ifindex;
 	};
 	struct napi_struct napi;
+	bool napi_configured;
 	bool napi_enabled;
 	bool napi_frags_enabled;
 	struct mutex napi_mutex;	/* Protects access to the above napi */
@@ -265,29 +266,34 @@ static int tun_napi_poll(struct napi_struct *napi, int budget)
 static void tun_napi_init(struct tun_struct *tun, struct tun_file *tfile,
 			  bool napi_en, bool napi_frags)
 {
-	tfile->napi_enabled = napi_en;
+	tfile->napi_configured = napi_en;
 	tfile->napi_frags_enabled = napi_en && napi_frags;
 	if (napi_en) {
 		netif_napi_add_tx(tun->dev, &tfile->napi, tun_napi_poll);
 		napi_enable(&tfile->napi);
+		tfile->napi_enabled = true;
 	}
 }
 
 static void tun_napi_enable(struct tun_file *tfile)
 {
-	if (tfile->napi_enabled)
+	if (tfile->napi_configured && !tfile->napi_enabled) {
 		napi_enable(&tfile->napi);
+		tfile->napi_enabled = true;
+	}
 }
 
 static void tun_napi_disable(struct tun_file *tfile)
 {
-	if (tfile->napi_enabled)
+	if (tfile->napi_configured && tfile->napi_enabled) {
 		napi_disable(&tfile->napi);
+		tfile->napi_enabled = false;
+	}
 }
 
 static void tun_napi_del(struct tun_file *tfile)
 {
-	if (tfile->napi_enabled)
+	if (tfile->napi_configured)
 		netif_napi_del(&tfile->napi);
 }
 
@@ -1977,7 +1983,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		napi_gro_frags(&tfile->napi);
 		local_bh_enable();
 		mutex_unlock(&tfile->napi_mutex);
-	} else if (tfile->napi_enabled) {
+	} else if (tfile->napi_configured) {
 		struct sk_buff_head *queue = &tfile->sk.sk_write_queue;
 		int queue_len;
 
@@ -2498,7 +2504,7 @@ static int tun_xdp_one(struct tun_struct *tun,
 	    !tfile->detached)
 		rxhash = __skb_get_hash_symmetric(skb);
 
-	if (tfile->napi_enabled) {
+	if (tfile->napi_configured) {
 		queue = &tfile->sk.sk_write_queue;
 		spin_lock(&queue->lock);
 		__skb_queue_tail(queue, skb);
@@ -2553,7 +2559,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 		if (flush)
 			xdp_do_flush();
 
-		if (tfile->napi_enabled && queued > 0)
+		if (tfile->napi_configured && queued > 0)
 			napi_schedule(&tfile->napi);
 
 		rcu_read_unlock();
--
2.37.0.rc0.161.g10f37bed90-goog
* Re: [PATCH net] net: tun: do not call napi_disable() twice
2022-06-29 9:37 [PATCH net] net: tun: do not call napi_disable() twice Eric Dumazet
@ 2022-06-29 16:17 ` Jakub Kicinski
2022-06-29 16:19 ` Eric Dumazet
0 siblings, 1 reply; 4+ messages in thread
From: Jakub Kicinski @ 2022-06-29 16:17 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Paolo Abeni, netdev, eric.dumazet, syzbot,
Petar Penkov
On Wed, 29 Jun 2022 09:37:52 +0000 Eric Dumazet wrote:
> syzbot reported a hang in tun_napi_disable() while RTNL is held.
>
> Because tun.c logic is complicated, I chose to:
>
> 1) rename tun->napi_enabled to tun->napi_configured
>
> 2) Add a new boolean, tracking if tun->napi is enabled or not.
Not a huge surprise TBH :S
Is there a repro?
* Re: [PATCH net] net: tun: do not call napi_disable() twice
2022-06-29 16:17 ` Jakub Kicinski
@ 2022-06-29 16:19 ` Eric Dumazet
2022-06-29 16:27 ` Jakub Kicinski
0 siblings, 1 reply; 4+ messages in thread
From: Eric Dumazet @ 2022-06-29 16:19 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S . Miller, Paolo Abeni, netdev, Eric Dumazet, syzbot,
Petar Penkov
On Wed, Jun 29, 2022 at 6:17 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 29 Jun 2022 09:37:52 +0000 Eric Dumazet wrote:
> > syzbot reported a hang in tun_napi_disable() while RTNL is held.
> >
> > Because tun.c logic is complicated, I chose to:
> >
> > 1) rename tun->napi_enabled to tun->napi_configured
> >
> > 2) Add a new boolean, tracking if tun->napi is enabled or not.
>
> Not a huge surprise TBH :S
>
> Is there a repro?
Yes, here it is:
// autogenerated by syzkaller (https://github.com/google/syzkaller)

#define _GNU_SOURCE

#include <dirent.h>
#include <endian.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static void sleep_ms(uint64_t ms)
{
  usleep(ms * 1000);
}

static uint64_t current_time_ms(void)
{
  struct timespec ts;
  if (clock_gettime(CLOCK_MONOTONIC, &ts))
    exit(1);
  return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

static bool write_file(const char* file, const char* what, ...)
{
  char buf[1024];
  va_list args;
  va_start(args, what);
  vsnprintf(buf, sizeof(buf), what, args);
  va_end(args);
  buf[sizeof(buf) - 1] = 0;
  int len = strlen(buf);
  int fd = open(file, O_WRONLY | O_CLOEXEC);
  if (fd == -1)
    return false;
  if (write(fd, buf, len) != len) {
    int err = errno;
    close(fd);
    errno = err;
    return false;
  }
  close(fd);
  return true;
}

static void kill_and_wait(int pid, int* status)
{
  kill(-pid, SIGKILL);
  kill(pid, SIGKILL);
  for (int i = 0; i < 100; i++) {
    if (waitpid(-1, status, WNOHANG | __WALL) == pid)
      return;
    usleep(1000);
  }
  DIR* dir = opendir("/sys/fs/fuse/connections");
  if (dir) {
    for (;;) {
      struct dirent* ent = readdir(dir);
      if (!ent)
        break;
      if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
        continue;
      char abort[300];
      snprintf(abort, sizeof(abort), "/sys/fs/fuse/connections/%s/abort",
               ent->d_name);
      int fd = open(abort, O_WRONLY);
      if (fd == -1) {
        continue;
      }
      if (write(fd, abort, 1) < 0) {
      }
      close(fd);
    }
    closedir(dir);
  } else {
  }
  while (waitpid(-1, status, __WALL) != pid) {
  }
}

static void setup_test()
{
  prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
  setpgrp();
  write_file("/proc/self/oom_score_adj", "1000");
}

static void execute_one(void);

#define WAIT_FLAGS __WALL

static void loop(void)
{
  int iter = 0;
  for (;; iter++) {
    int pid = fork();
    if (pid < 0)
      exit(1);
    if (pid == 0) {
      setup_test();
      execute_one();
      exit(0);
    }
    int status = 0;
    uint64_t start = current_time_ms();
    for (;;) {
      if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
        break;
      sleep_ms(1);
      if (current_time_ms() - start < 5000)
        continue;
      kill_and_wait(pid, &status);
      break;
    }
  }
}

uint64_t r[1] = {0xffffffffffffffff};

void execute_one(void)
{
  intptr_t res = 0;
  memcpy((void*)0x20000100, "/dev/net/tun\000", 13);
  res = syscall(__NR_openat, 0xffffffffffffff9cul, 0x20000100ul, 0ul, 0ul);
  if (res != -1)
    r[0] = res;
  memcpy((void*)0x20000040, "netpci0\000\000\000\000\000\000\000\000\000", 16);
  *(uint16_t*)0x20000050 = 0x2512;
  syscall(__NR_ioctl, r[0], 0x400454ca, 0x20000040ul);
  memcpy((void*)0x200001c0, "caif0\000\000\000\000\000\000\000\000\000\000\000",
         16);
  *(uint16_t*)0x200001d0 = 0x400;
  syscall(__NR_ioctl, r[0], 0x400454d9, 0x200001c0ul);
  syscall(__NR_ioctl, r[0], 0x401054d5, 0ul);
}

int main(void)
{
  syscall(__NR_mmap, 0x1ffff000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
  syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul);
  syscall(__NR_mmap, 0x21000000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
  loop();
  return 0;
}
* Re: [PATCH net] net: tun: do not call napi_disable() twice
2022-06-29 16:19 ` Eric Dumazet
@ 2022-06-29 16:27 ` Jakub Kicinski
0 siblings, 0 replies; 4+ messages in thread
From: Jakub Kicinski @ 2022-06-29 16:27 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Paolo Abeni, netdev, Eric Dumazet, syzbot,
Petar Penkov
On Wed, 29 Jun 2022 18:19:58 +0200 Eric Dumazet wrote:
> On Wed, Jun 29, 2022 at 6:17 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Wed, 29 Jun 2022 09:37:52 +0000 Eric Dumazet wrote:
> > > syzbot reported a hang in tun_napi_disable() while RTNL is held.
> > >
> > > Because tun.c logic is complicated, I chose to:
> > >
> > > 1) rename tun->napi_enabled to tun->napi_configured
> > >
> > > 2) Add a new boolean, tracking if tun->napi is enabled or not.
> >
> > Not a huge surprise TBH :S
> >
> > Is there a repro?
>
> Yes, here it is:
>
> // autogenerated by syzkaller (https://github.com/google/syzkaller)
Thanks! let me test this:
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e2eb35887394..8776a9e1a8f5 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -661,7 +661,6 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
 			sock_put(&tfile->sk);
 		} else {
 			tun_disable_queue(tun, tfile);
-			tun_napi_disable(tfile);
 		}
 
 		synchronize_net();
@@ -719,6 +718,7 @@ static void tun_detach_all(struct net_device *dev)
 		--tun->numqueues;
 	}
 	list_for_each_entry(tfile, &tun->disabled, next) {
+		tun_napi_disable(tfile);
 		tfile->socket.sk->sk_shutdown = RCV_SHUTDOWN;
 		tfile->socket.sk->sk_data_ready(tfile->socket.sk);
 		RCU_INIT_POINTER(tfile->tun, NULL);