* [PATCH bpf v2 0/3] bpf: two fixes for bpf iterators
@ 2020-08-18 22:23 Yonghong Song
2020-08-18 22:23 ` [PATCH bpf v2 1/3] bpf: fix a rcu_sched stall issue with bpf task/task_file iterator Yonghong Song
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Yonghong Song @ 2020-08-18 22:23 UTC (permalink / raw)
To: bpf, netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team
Patch #1 fixed an rcu stall warning when traversing a large number
of tasks/files without overflowing the seq_file buffer. The method
is to limit the number of visited objects in bpf_seq_read()
so all bpf iterators will benefit.
Patch #2 calculated the tid properly in a namespace in order to avoid
visiting the same task multiple times.
Patch #3 handled the read() error code EAGAIN properly in the bpftool
bpf_iter userspace code that collects pids. The change is needed
due to Patch #1.
Yonghong Song (3):
bpf: fix a rcu_sched stall issue with bpf task/task_file iterator
bpf: avoid visit same object multiple times
bpftool: handle EAGAIN error code properly in pids collection
kernel/bpf/bpf_iter.c | 15 ++++++++++++++-
kernel/bpf/task_iter.c | 3 ++-
tools/bpf/bpftool/pids.c | 2 ++
3 files changed, 18 insertions(+), 2 deletions(-)
--
2.24.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH bpf v2 1/3] bpf: fix a rcu_sched stall issue with bpf task/task_file iterator
2020-08-18 22:23 [PATCH bpf v2 0/3] bpf: two fixes for bpf iterators Yonghong Song
@ 2020-08-18 22:23 ` Yonghong Song
2020-08-19 0:05 ` Alexei Starovoitov
2020-08-18 22:23 ` [PATCH bpf v2 2/3] bpf: avoid visit same object multiple times Yonghong Song
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: Yonghong Song @ 2020-08-18 22:23 UTC (permalink / raw)
To: bpf, netdev
Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Paul E . McKenney
In our production system, we observed rcu stalls when
`bpftool prog` is running.
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 	7-....: (20999 ticks this GP) idle=302/1/0x4000000000000000 softirq=1508852/1508852 fqs=4913
	(t=21031 jiffies g=2534773 q=179750)
NMI backtrace for cpu 7
CPU: 7 PID: 184195 Comm: bpftool Kdump: loaded Tainted: G W 5.8.0-00004-g68bfc7f8c1b4 #6
Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A17 05/03/2019
Call Trace:
<IRQ>
dump_stack+0x57/0x70
nmi_cpu_backtrace.cold+0x14/0x53
? lapic_can_unplug_cpu.cold+0x39/0x39
nmi_trigger_cpumask_backtrace+0xb7/0xc7
rcu_dump_cpu_stacks+0xa2/0xd0
rcu_sched_clock_irq.cold+0x1ff/0x3d9
? tick_nohz_handler+0x100/0x100
update_process_times+0x5b/0x90
tick_sched_timer+0x5e/0xf0
__hrtimer_run_queues+0x12a/0x2a0
hrtimer_interrupt+0x10e/0x280
__sysvec_apic_timer_interrupt+0x51/0xe0
asm_call_on_stack+0xf/0x20
</IRQ>
sysvec_apic_timer_interrupt+0x6f/0x80
asm_sysvec_apic_timer_interrupt+0x12/0x20
RIP: 0010:task_file_seq_get_next+0x71/0x220
Code: 00 00 8b 53 1c 49 8b 7d 00 89 d6 48 8b 47 20 44 8b 18 41 39 d3 76 75 48 8b 4f 20 8b 01 39 d0 76 61 41 89 d1 49 39 c1 48 19 c0 <48> 8b 49 08 21 d0 48 8d 04 c1 4c 8b 08 4d 85 c9 74 46 49 8b 41 38
RSP: 0018:ffffc90006223e10 EFLAGS: 00000297
RAX: ffffffffffffffff RBX: ffff888f0d172388 RCX: ffff888c8c07c1c0
RDX: 00000000000f017b RSI: 00000000000f017b RDI: ffff888c254702c0
RBP: ffffc90006223e68 R08: ffff888be2a1c140 R09: 00000000000f017b
R10: 0000000000000002 R11: 0000000000100000 R12: ffff888f23c24118
R13: ffffc90006223e60 R14: ffffffff828509a0 R15: 00000000ffffffff
task_file_seq_next+0x52/0xa0
bpf_seq_read+0xb9/0x320
vfs_read+0x9d/0x180
ksys_read+0x5f/0xe0
do_syscall_64+0x38/0x60
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f8815f4f76e
Code: c0 e9 f6 fe ff ff 55 48 8d 3d 76 70 0a 00 48 89 e5 e8 36 06 02 00 66 0f 1f 44 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 52 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5
RSP: 002b:00007fff8f9df578 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 000000000170b9c0 RCX: 00007f8815f4f76e
RDX: 0000000000001000 RSI: 00007fff8f9df5b0 RDI: 0000000000000007
RBP: 00007fff8f9e05f0 R08: 0000000000000049 R09: 0000000000000010
R10: 00007f881601fa40 R11: 0000000000000246 R12: 00007fff8f9e05a8
R13: 00007fff8f9e05a8 R14: 0000000001917f90 R15: 000000000000e22e
Note that `bpftool prog` actually calls a task_file bpf iterator
program to establish an association between prog/map/link/btf anon
files and processes.
In the case where the above rcu stall occurred, we had a process
with 1587 tasks, each task having roughly 81305 files.
This implied 129 million bpf prog invocations. Unfortunately, none of
these files were prog/map/link/btf files, so the bpf iterator prog had
to traverse all these files without being able to return to user space,
since the seq_file buffer never overflowed.
This patch fixed the issue in bpf_seq_read() by limiting the number
of visited objects. If the maximum number of visited objects is
reached, no more objects will be visited in the current syscall.
If nothing has been written to the seq_file buffer, -EAGAIN will
be returned to the user so the user can try again.
The maximum number of visited objects is set at 1 million.
On our Intel Xeon D-2191 2.3GHz 18-core server, bpf_seq_read()
visiting 1 million files takes around 0.18 seconds.
We did not use cond_resched() since for some iterators, e.g., the
netlink iterator, the rcu read_lock critical section spans
consecutive seq_ops->next() calls, which makes it impossible to do
cond_resched() in the key while loop of function bpf_seq_read().
Cc: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
kernel/bpf/bpf_iter.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c
index b6715964b685..8faa2ce89396 100644
--- a/kernel/bpf/bpf_iter.c
+++ b/kernel/bpf/bpf_iter.c
@@ -67,6 +67,9 @@ static void bpf_iter_done_stop(struct seq_file *seq)
iter_priv->done_stop = true;
}
+/* maximum visited objects before bailing out */
+#define MAX_ITER_OBJECTS 1000000
+
/* bpf_seq_read, a customized and simpler version for bpf iterator.
* no_llseek is assumed for this file.
* The following are differences from seq_read():
@@ -79,7 +82,7 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
{
struct seq_file *seq = file->private_data;
size_t n, offs, copied = 0;
- int err = 0;
+ int err = 0, num_objs = 0;
void *p;
mutex_lock(&seq->lock);
@@ -135,6 +138,7 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
while (1) {
loff_t pos = seq->index;
+ num_objs++;
offs = seq->count;
p = seq->op->next(seq, p, &seq->index);
if (pos == seq->index) {
@@ -153,6 +157,15 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
if (seq->count >= size)
break;
+ if (num_objs >= MAX_ITER_OBJECTS) {
+ if (offs == 0) {
+ err = -EAGAIN;
+ seq->op->stop(seq, p);
+ goto done;
+ }
+ break;
+ }
+
err = seq->op->show(seq, p);
if (err > 0) {
bpf_iter_dec_seq_num(seq);
--
2.24.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH bpf v2 2/3] bpf: avoid visit same object multiple times
2020-08-18 22:23 [PATCH bpf v2 0/3] bpf: two fixes for bpf iterators Yonghong Song
2020-08-18 22:23 ` [PATCH bpf v2 1/3] bpf: fix a rcu_sched stall issue with bpf task/task_file iterator Yonghong Song
@ 2020-08-18 22:23 ` Yonghong Song
2020-08-18 22:23 ` [PATCH bpf v2 3/3] bpftool: handle EAGAIN error code properly in pids collection Yonghong Song
2020-08-18 22:28 ` [PATCH bpf v2 0/3] bpf: two fixes for bpf iterators Yonghong Song
3 siblings, 0 replies; 9+ messages in thread
From: Yonghong Song @ 2020-08-18 22:23 UTC (permalink / raw)
To: bpf, netdev
Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Rik van Riel
Currently when traversing all tasks, the next tid
is always increased by one. This may result in
visiting the same task multiple times in a
pid namespace.
This patch fixed the issue by setting the next
tid to pid_nr_ns(pid, ns) + 1, similar to
function next_tgid().
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
kernel/bpf/task_iter.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index f21b5e1e4540..99af4cea1102 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -29,8 +29,9 @@ static struct task_struct *task_seq_get_next(struct pid_namespace *ns,
rcu_read_lock();
retry:
- pid = idr_get_next(&ns->idr, tid);
+ pid = find_ge_pid(*tid, ns);
if (pid) {
+ *tid = pid_nr_ns(pid, ns);
task = get_pid_task(pid, PIDTYPE_PID);
if (!task) {
++*tid;
--
2.24.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH bpf v2 3/3] bpftool: handle EAGAIN error code properly in pids collection
2020-08-18 22:23 [PATCH bpf v2 0/3] bpf: two fixes for bpf iterators Yonghong Song
2020-08-18 22:23 ` [PATCH bpf v2 1/3] bpf: fix a rcu_sched stall issue with bpf task/task_file iterator Yonghong Song
2020-08-18 22:23 ` [PATCH bpf v2 2/3] bpf: avoid visit same object multiple times Yonghong Song
@ 2020-08-18 22:23 ` Yonghong Song
2020-08-18 22:30 ` Andrii Nakryiko
2020-08-18 22:28 ` [PATCH bpf v2 0/3] bpf: two fixes for bpf iterators Yonghong Song
3 siblings, 1 reply; 9+ messages in thread
From: Yonghong Song @ 2020-08-18 22:23 UTC (permalink / raw)
To: bpf, netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team
When the error code is EAGAIN, the kernel is signaling that user
space should retry the read() operation on bpf iterators.
Let us do it.
Signed-off-by: Yonghong Song <yhs@fb.com>
---
tools/bpf/bpftool/pids.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/bpf/bpftool/pids.c b/tools/bpf/bpftool/pids.c
index e3b116325403..df7d8ec76036 100644
--- a/tools/bpf/bpftool/pids.c
+++ b/tools/bpf/bpftool/pids.c
@@ -134,6 +134,8 @@ int build_obj_refs_table(struct obj_refs_table *table, enum bpf_obj_type type)
while (true) {
ret = read(fd, buf, sizeof(buf));
if (ret < 0) {
+ if (errno == EAGAIN)
+ continue;
err = -errno;
p_err("failed to read PID iterator output: %d", err);
goto out;
--
2.24.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 0/3] bpf: two fixes for bpf iterators
2020-08-18 22:23 [PATCH bpf v2 0/3] bpf: two fixes for bpf iterators Yonghong Song
` (2 preceding siblings ...)
2020-08-18 22:23 ` [PATCH bpf v2 3/3] bpftool: handle EAGAIN error code properly in pids collection Yonghong Song
@ 2020-08-18 22:28 ` Yonghong Song
3 siblings, 0 replies; 9+ messages in thread
From: Yonghong Song @ 2020-08-18 22:28 UTC (permalink / raw)
To: bpf, netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team
On 8/18/20 3:23 PM, Yonghong Song wrote:
> Patch #1 fixed an rcu stall warning when traversing a large number
> of tasks/files without overflowing the seq_file buffer. The method
> is to limit the number of visited objects in bpf_seq_read()
> so all bpf iterators will benefit.
>
> Patch #2 calculated the tid properly in a namespace in order to avoid
> visiting the same task multiple times.
>
> Patch #3 handled the read() error code EAGAIN properly in the bpftool
> bpf_iter userspace code that collects pids. The change is needed
> due to Patch #1.
Sorry. Missed the changelog below.
Changelogs:
- v1 -> v2:
. do ratelimiting in bpf_seq_read() with a counter.
cond_resched() doesn't work as some iterators (e.g., tcp,
netlink, etc.) have an rcu read critical section across
consecutive seq_ops->next() calls.
>
> Yonghong Song (3):
> bpf: fix a rcu_sched stall issue with bpf task/task_file iterator
> bpf: avoid visit same object multiple times
> bpftool: handle EAGAIN error code properly in pids collection
>
> kernel/bpf/bpf_iter.c | 15 ++++++++++++++-
> kernel/bpf/task_iter.c | 3 ++-
> tools/bpf/bpftool/pids.c | 2 ++
> 3 files changed, 18 insertions(+), 2 deletions(-)
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 3/3] bpftool: handle EAGAIN error code properly in pids collection
2020-08-18 22:23 ` [PATCH bpf v2 3/3] bpftool: handle EAGAIN error code properly in pids collection Yonghong Song
@ 2020-08-18 22:30 ` Andrii Nakryiko
0 siblings, 0 replies; 9+ messages in thread
From: Andrii Nakryiko @ 2020-08-18 22:30 UTC (permalink / raw)
To: Yonghong Song
Cc: bpf, Networking, Alexei Starovoitov, Daniel Borkmann, Kernel Team
On Tue, Aug 18, 2020 at 3:24 PM Yonghong Song <yhs@fb.com> wrote:
>
> When the error code is EAGAIN, the kernel is signaling that user
> space should retry the read() operation on bpf iterators.
> Let us do it.
>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
LGTM.
Acked-by: Andrii Nakryiko <andriin@fb.com>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 1/3] bpf: fix a rcu_sched stall issue with bpf task/task_file iterator
2020-08-18 22:23 ` [PATCH bpf v2 1/3] bpf: fix a rcu_sched stall issue with bpf task/task_file iterator Yonghong Song
@ 2020-08-19 0:05 ` Alexei Starovoitov
2020-08-19 0:30 ` Yonghong Song
0 siblings, 1 reply; 9+ messages in thread
From: Alexei Starovoitov @ 2020-08-19 0:05 UTC (permalink / raw)
To: Yonghong Song
Cc: bpf, netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team,
Paul E . McKenney
On Tue, Aug 18, 2020 at 03:23:09PM -0700, Yonghong Song wrote:
>
> We did not use cond_resched() since for some iterators, e.g., the
> netlink iterator, the rcu read_lock critical section spans
> consecutive seq_ops->next() calls, which makes it impossible to do
> cond_resched() in the key while loop of function bpf_seq_read().
but after this patch we can, right?
>
> +/* maximum visited objects before bailing out */
> +#define MAX_ITER_OBJECTS 1000000
> +
> /* bpf_seq_read, a customized and simpler version for bpf iterator.
> * no_llseek is assumed for this file.
> * The following are differences from seq_read():
> @@ -79,7 +82,7 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
> {
> struct seq_file *seq = file->private_data;
> size_t n, offs, copied = 0;
> - int err = 0;
> + int err = 0, num_objs = 0;
> void *p;
>
> mutex_lock(&seq->lock);
> @@ -135,6 +138,7 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
> while (1) {
> loff_t pos = seq->index;
>
> + num_objs++;
> offs = seq->count;
> p = seq->op->next(seq, p, &seq->index);
> if (pos == seq->index) {
> @@ -153,6 +157,15 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
> if (seq->count >= size)
> break;
>
> + if (num_objs >= MAX_ITER_OBJECTS) {
> + if (offs == 0) {
> + err = -EAGAIN;
> + seq->op->stop(seq, p);
> + goto done;
> + }
> + break;
> + }
> +
should this block be after op->show() and error processing?
Otherwise bpf_iter_inc_seq_num() will be incorrectly incremented?
> err = seq->op->show(seq, p);
> if (err > 0) {
> bpf_iter_dec_seq_num(seq);
After op->stop() we can do cond_resched() in all cases,
since rhashtable walk does rcu_unlock in stop() callback, right?
I think copy_to_user() and mutex_unlock() don't do cond_resched()
equivalent work.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 1/3] bpf: fix a rcu_sched stall issue with bpf task/task_file iterator
2020-08-19 0:05 ` Alexei Starovoitov
@ 2020-08-19 0:30 ` Yonghong Song
2020-08-19 0:49 ` Alexei Starovoitov
0 siblings, 1 reply; 9+ messages in thread
From: Yonghong Song @ 2020-08-19 0:30 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team,
Paul E . McKenney
On 8/18/20 5:05 PM, Alexei Starovoitov wrote:
> On Tue, Aug 18, 2020 at 03:23:09PM -0700, Yonghong Song wrote:
>>
>> We did not use cond_resched() since for some iterators, e.g., the
>> netlink iterator, the rcu read_lock critical section spans
>> consecutive seq_ops->next() calls, which makes it impossible to do
>> cond_resched() in the key while loop of function bpf_seq_read().
>
> but after this patch we can, right?
We can do cond_resched() after seq->op->stop(). See more below.
>
>>
>> +/* maximum visited objects before bailing out */
>> +#define MAX_ITER_OBJECTS 1000000
>> +
>> /* bpf_seq_read, a customized and simpler version for bpf iterator.
>> * no_llseek is assumed for this file.
>> * The following are differences from seq_read():
>> @@ -79,7 +82,7 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
>> {
>> struct seq_file *seq = file->private_data;
>> size_t n, offs, copied = 0;
>> - int err = 0;
>> + int err = 0, num_objs = 0;
>> void *p;
>>
>> mutex_lock(&seq->lock);
>> @@ -135,6 +138,7 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
>> while (1) {
>> loff_t pos = seq->index;
>>
>> + num_objs++;
>> offs = seq->count;
>> p = seq->op->next(seq, p, &seq->index);
>> if (pos == seq->index) {
>> @@ -153,6 +157,15 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
>> if (seq->count >= size)
>> break;
>>
>> + if (num_objs >= MAX_ITER_OBJECTS) {
>> + if (offs == 0) {
>> + err = -EAGAIN;
>> + seq->op->stop(seq, p);
>> + goto done;
>> + }
>> + break;
>> + }
>> +
>
> should this block be after op->show() and error processing?
> Otherwise bpf_iter_inc_seq_num() will be incorrectly incremented?
The purpose of op->next() is to calculate the "next" object position,
stored in the seq private data. So for the next read() syscall, start()
will try to fetch the data based on the info in the seq private data.
This is true for the condition "if (seq->count >= size) break"
above, so the next op->start() can try to locate the correct
object. The same applies to this -EAGAIN case.
>
>> err = seq->op->show(seq, p);
>> if (err > 0) {
>> bpf_iter_dec_seq_num(seq);
>
> After op->stop() we can do cond_resched() in all cases,
> since rhashtable walk does rcu_unlock in stop() callback, right?
Yes, we can. I am thinking that since we return to user space,
cond_resched() might not be needed, as returning to user space
will trigger some kind of scheduling. This patch fixed
the rcu stall issue. But if my understanding is incorrect,
I am happy to add cond_resched().
> I think copy_to_user() and mutex_unlock() don't do cond_resched()
> equivalent work.
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 1/3] bpf: fix a rcu_sched stall issue with bpf task/task_file iterator
2020-08-19 0:30 ` Yonghong Song
@ 2020-08-19 0:49 ` Alexei Starovoitov
0 siblings, 0 replies; 9+ messages in thread
From: Alexei Starovoitov @ 2020-08-19 0:49 UTC (permalink / raw)
To: Yonghong Song
Cc: bpf, netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team,
Paul E . McKenney
On Tue, Aug 18, 2020 at 05:30:37PM -0700, Yonghong Song wrote:
>
>
> On 8/18/20 5:05 PM, Alexei Starovoitov wrote:
> > On Tue, Aug 18, 2020 at 03:23:09PM -0700, Yonghong Song wrote:
> > >
> > > We did not use cond_resched() since for some iterators, e.g., the
> > > netlink iterator, the rcu read_lock critical section spans
> > > consecutive seq_ops->next() calls, which makes it impossible to do
> > > cond_resched() in the key while loop of function bpf_seq_read().
> >
> > but after this patch we can, right?
>
> We can do cond_resched() after seq->op->stop(). See more below.
>
> >
> > > +/* maximum visited objects before bailing out */
> > > +#define MAX_ITER_OBJECTS 1000000
> > > +
> > > /* bpf_seq_read, a customized and simpler version for bpf iterator.
> > > * no_llseek is assumed for this file.
> > > * The following are differences from seq_read():
> > > @@ -79,7 +82,7 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
> > > {
> > > struct seq_file *seq = file->private_data;
> > > size_t n, offs, copied = 0;
> > > - int err = 0;
> > > + int err = 0, num_objs = 0;
> > > void *p;
> > > mutex_lock(&seq->lock);
> > > @@ -135,6 +138,7 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
> > > while (1) {
> > > loff_t pos = seq->index;
> > > + num_objs++;
> > > offs = seq->count;
> > > p = seq->op->next(seq, p, &seq->index);
> > > if (pos == seq->index) {
> > > @@ -153,6 +157,15 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
> > > if (seq->count >= size)
> > > break;
> > > + if (num_objs >= MAX_ITER_OBJECTS) {
> > > + if (offs == 0) {
> > > + err = -EAGAIN;
> > > + seq->op->stop(seq, p);
> > > + goto done;
> > > + }
> > > + break;
> > > + }
> > > +
> >
> > should this block be after op->show() and error processing?
> > Otherwise bpf_iter_inc_seq_num() will be incorrectly incremented?
>
> The purpose of op->next() is to calculate the "next" object position,
> stored in the seq private data. So for the next read() syscall, start()
> will try to fetch the data based on the info in the seq private data.
>
> This is true for the condition "if (seq->count >= size) break"
> above, so the next op->start() can try to locate the correct
> object. The same applies to this -EAGAIN case.
>
> >
> > > err = seq->op->show(seq, p);
> > > if (err > 0) {
> > > bpf_iter_dec_seq_num(seq);
> >
> > After op->stop() we can do cond_resched() in all cases,
> > since rhashtable walk does rcu_unlock in stop() callback, right?
>
> Yes, we can. I am thinking that since we return to user space,
> cond_resched() might not be needed, as returning to user space
> will trigger some kind of scheduling. This patch fixed
> the rcu stall issue. But if my understanding is incorrect,
> I am happy to add cond_resched().
ahh. you're correct on both counts. Applied all three patches to bpf tree. thanks!
^ permalink raw reply [flat|nested] 9+ messages in thread