From: Jonathan Lemon <jonathan.lemon@gmail.com>
To: <netdev@vger.kernel.org>, <ast@kernel.org>,
<daniel@iogearbox.net>, <yhs@fb.com>, <bpf@vger.kernel.org>
Cc: <kernel-team@fb.com>
Subject: [PATCH 1/1 v3 bpf-next] bpf: increment and use correct thread iterator
Date: Fri, 11 Dec 2020 09:11:38 -0800 [thread overview]
Message-ID: <20201211171138.63819-2-jonathan.lemon@gmail.com> (raw)
In-Reply-To: <20201211171138.63819-1-jonathan.lemon@gmail.com>
From: Jonathan Lemon <bsd@fb.com>
On some systems, some variant of the following splat is
repeatedly seen. The common factor in all traces seems
to be the entry point to task_file_seq_next(). With the
patch, all warnings go away.
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: \x0926-....: (20992 ticks this GP) idle=d7e/1/0x4000000000000002 softirq=81556231/81556231 fqs=4876
\x09(t=21033 jiffies g=159148529 q=223125)
NMI backtrace for cpu 26
CPU: 26 PID: 2015853 Comm: bpftool Kdump: loaded Not tainted 5.6.13-0_fbk4_3876_gd8d1f9bf80bb #1
Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A12 10/08/2018
Call Trace:
<IRQ>
dump_stack+0x50/0x70
nmi_cpu_backtrace.cold.6+0x13/0x50
? lapic_can_unplug_cpu.cold.30+0x40/0x40
nmi_trigger_cpumask_backtrace+0xba/0xca
rcu_dump_cpu_stacks+0x99/0xc7
rcu_sched_clock_irq.cold.90+0x1b4/0x3aa
? tick_sched_do_timer+0x60/0x60
update_process_times+0x24/0x50
tick_sched_timer+0x37/0x70
__hrtimer_run_queues+0xfe/0x270
hrtimer_interrupt+0xf4/0x210
smp_apic_timer_interrupt+0x5e/0x120
apic_timer_interrupt+0xf/0x20
</IRQ>
RIP: 0010:get_pid_task+0x38/0x80
Code: 89 f6 48 8d 44 f7 08 48 8b 00 48 85 c0 74 2b 48 83 c6 55 48 c1 e6 04 48 29 f0 74 19 48 8d 78 20 ba 01 00 00 00 f0 0f c1 50 20 <85> d2 74 27 78 11 83 c2 01 78 0c 48 83 c4 08 c3 31 c0 48 83 c4 08
RSP: 0018:ffffc9000d293dc8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
RAX: ffff888637c05600 RBX: ffffc9000d293e0c RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000550 RDI: ffff888637c05620
RBP: ffffffff8284eb80 R08: ffff88831341d300 R09: ffff88822ffd8248
R10: ffff88822ffd82d0 R11: 00000000003a93c0 R12: 0000000000000001
R13: 00000000ffffffff R14: ffff88831341d300 R15: 0000000000000000
? find_ge_pid+0x1b/0x20
task_seq_get_next+0x52/0xc0
task_file_seq_get_next+0x159/0x220
task_file_seq_next+0x4f/0xa0
bpf_seq_read+0x159/0x390
vfs_read+0x8a/0x140
ksys_read+0x59/0xd0
do_syscall_64+0x42/0x110
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f95ae73e76e
Code: Bad RIP value.
RSP: 002b:00007ffc02c1dbf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 000000000170faa0 RCX: 00007f95ae73e76e
RDX: 0000000000001000 RSI: 00007ffc02c1dc30 RDI: 0000000000000007
RBP: 00007ffc02c1ec70 R08: 0000000000000005 R09: 0000000000000006
R10: fffffffffffff20b R11: 0000000000000246 R12: 00000000019112a0
R13: 0000000000000000 R14: 0000000000000007 R15: 00000000004283c0
The attached patch does 3 things:
1) If unable to obtain the file structure for the current task,
proceed to the next task number after the one returned from
task_seq_get_next(), instead of the next task number from the
original iterator.
2) Use thread_group_leader() instead of the open-coded comparision
of tgid vs pid.
3) Only obtain the task reference count at the end of the RCU section
instead of repeatedly obtaining/releasing it when iterathing though
a thread group.
Fixes: a650da2ee52a ("bpf: Add task and task/file iterator targets")
Fixes: 67b6b863e6ab ("bpf: Avoid iterating duplicated files for task_file iterator")
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
kernel/bpf/task_iter.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 0458a40edf10..66a52fcf589a 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -33,17 +33,17 @@ static struct task_struct *task_seq_get_next(struct pid_namespace *ns,
pid = find_ge_pid(*tid, ns);
if (pid) {
*tid = pid_nr_ns(pid, ns);
- task = get_pid_task(pid, PIDTYPE_PID);
+ task = pid_task(pid, PIDTYPE_PID);
if (!task) {
++*tid;
goto retry;
- } else if (skip_if_dup_files && task->tgid != task->pid &&
+ } else if (skip_if_dup_files && !thread_group_leader(task) &&
task->files == task->group_leader->files) {
- put_task_struct(task);
task = NULL;
++*tid;
goto retry;
}
+ get_task_struct(task);
}
rcu_read_unlock();
@@ -164,7 +164,7 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
curr_files = get_files_struct(curr_task);
if (!curr_files) {
put_task_struct(curr_task);
- curr_tid = ++(info->tid);
+ curr_tid = curr_tid + 1;
info->fd = 0;
goto again;
}
--
2.24.1
next prev parent reply other threads:[~2020-12-11 19:00 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-12-11 17:11 [PATCH 0/1 v3 bpf-next] bpf: increment and use correct thread iterator Jonathan Lemon
2020-12-11 17:11 ` Jonathan Lemon [this message]
2020-12-11 20:23 ` [PATCH 1/1 " Andrii Nakryiko
2020-12-11 23:01 ` Jonathan Lemon
2020-12-12 0:28 ` Andrii Nakryiko
2020-12-14 7:00 ` Yonghong Song
2020-12-18 16:53 ` Yonghong Song
2020-12-18 18:06 ` Jonathan Lemon
2020-12-18 18:21 ` Yonghong Song
2020-12-18 19:28 ` Andrii Nakryiko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201211171138.63819-2-jonathan.lemon@gmail.com \
--to=jonathan.lemon@gmail.com \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=kernel-team@fb.com \
--cc=netdev@vger.kernel.org \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.