* [PATCH 0/1 v3 bpf-next] bpf: increment and use correct thread iterator
@ 2020-12-11 17:11 Jonathan Lemon
  2020-12-11 17:11 ` [PATCH 1/1 " Jonathan Lemon
  0 siblings, 1 reply; 10+ messages in thread
From: Jonathan Lemon @ 2020-12-11 17:11 UTC (permalink / raw)
  To: netdev, ast, daniel, yhs, bpf; +Cc: kernel-team

From: Jonathan Lemon <bsd@fb.com>

Reposting with one of the many splats seen.

v2->v3:
  Add splat to commit log descriptions

v1->v2:
  Use Fixes: shas from correct tree

Jonathan Lemon (1):
  bpf: increment and use correct thread iterator

 kernel/bpf/task_iter.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

-- 
2.24.1



* [PATCH 1/1 v3 bpf-next] bpf: increment and use correct thread iterator
  2020-12-11 17:11 [PATCH 0/1 v3 bpf-next] bpf: increment and use correct thread iterator Jonathan Lemon
@ 2020-12-11 17:11 ` Jonathan Lemon
  2020-12-11 20:23   ` Andrii Nakryiko
  2020-12-18 16:53   ` Yonghong Song
  0 siblings, 2 replies; 10+ messages in thread
From: Jonathan Lemon @ 2020-12-11 17:11 UTC (permalink / raw)
  To: netdev, ast, daniel, yhs, bpf; +Cc: kernel-team

From: Jonathan Lemon <bsd@fb.com>

On some systems, some variant of the following splat is
repeatedly seen.  The common factor in all traces seems
to be the entry point to task_file_seq_next().  With the
patch, all warnings go away.

    rcu: INFO: rcu_sched self-detected stall on CPU
    rcu: \x0926-....: (20992 ticks this GP) idle=d7e/1/0x4000000000000002 softirq=81556231/81556231 fqs=4876
    \x09(t=21033 jiffies g=159148529 q=223125)
    NMI backtrace for cpu 26
    CPU: 26 PID: 2015853 Comm: bpftool Kdump: loaded Not tainted 5.6.13-0_fbk4_3876_gd8d1f9bf80bb #1
    Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A12 10/08/2018
    Call Trace:
     <IRQ>
     dump_stack+0x50/0x70
     nmi_cpu_backtrace.cold.6+0x13/0x50
     ? lapic_can_unplug_cpu.cold.30+0x40/0x40
     nmi_trigger_cpumask_backtrace+0xba/0xca
     rcu_dump_cpu_stacks+0x99/0xc7
     rcu_sched_clock_irq.cold.90+0x1b4/0x3aa
     ? tick_sched_do_timer+0x60/0x60
     update_process_times+0x24/0x50
     tick_sched_timer+0x37/0x70
     __hrtimer_run_queues+0xfe/0x270
     hrtimer_interrupt+0xf4/0x210
     smp_apic_timer_interrupt+0x5e/0x120
     apic_timer_interrupt+0xf/0x20
     </IRQ>
    RIP: 0010:get_pid_task+0x38/0x80
    Code: 89 f6 48 8d 44 f7 08 48 8b 00 48 85 c0 74 2b 48 83 c6 55 48 c1 e6 04 48 29 f0 74 19 48 8d 78 20 ba 01 00 00 00 f0 0f c1 50 20 <85> d2 74 27 78 11 83 c2 01 78 0c 48 83 c4 08 c3 31 c0 48 83 c4 08
    RSP: 0018:ffffc9000d293dc8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
    RAX: ffff888637c05600 RBX: ffffc9000d293e0c RCX: 0000000000000000
    RDX: 0000000000000001 RSI: 0000000000000550 RDI: ffff888637c05620
    RBP: ffffffff8284eb80 R08: ffff88831341d300 R09: ffff88822ffd8248
    R10: ffff88822ffd82d0 R11: 00000000003a93c0 R12: 0000000000000001
    R13: 00000000ffffffff R14: ffff88831341d300 R15: 0000000000000000
     ? find_ge_pid+0x1b/0x20
     task_seq_get_next+0x52/0xc0
     task_file_seq_get_next+0x159/0x220
     task_file_seq_next+0x4f/0xa0
     bpf_seq_read+0x159/0x390
     vfs_read+0x8a/0x140
     ksys_read+0x59/0xd0
     do_syscall_64+0x42/0x110
     entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7f95ae73e76e
    Code: Bad RIP value.
    RSP: 002b:00007ffc02c1dbf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
    RAX: ffffffffffffffda RBX: 000000000170faa0 RCX: 00007f95ae73e76e
    RDX: 0000000000001000 RSI: 00007ffc02c1dc30 RDI: 0000000000000007
    RBP: 00007ffc02c1ec70 R08: 0000000000000005 R09: 0000000000000006
    R10: fffffffffffff20b R11: 0000000000000246 R12: 00000000019112a0
    R13: 0000000000000000 R14: 0000000000000007 R15: 00000000004283c0

The attached patch does 3 things:

1) If unable to obtain the file structure for the current task,
   proceed to the next task number after the one returned from
   task_seq_get_next(), instead of the next task number from the
   original iterator.

2) Use thread_group_leader() instead of the open-coded comparison
   of tgid vs pid.

3) Only obtain the task reference count at the end of the RCU section
   instead of repeatedly obtaining/releasing it when iterating through
   a thread group.

Fixes: a650da2ee52a ("bpf: Add task and task/file iterator targets")
Fixes: 67b6b863e6ab ("bpf: Avoid iterating duplicated files for task_file iterator")

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 kernel/bpf/task_iter.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 0458a40edf10..66a52fcf589a 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -33,17 +33,17 @@ static struct task_struct *task_seq_get_next(struct pid_namespace *ns,
 	pid = find_ge_pid(*tid, ns);
 	if (pid) {
 		*tid = pid_nr_ns(pid, ns);
-		task = get_pid_task(pid, PIDTYPE_PID);
+		task = pid_task(pid, PIDTYPE_PID);
 		if (!task) {
 			++*tid;
 			goto retry;
-		} else if (skip_if_dup_files && task->tgid != task->pid &&
+		} else if (skip_if_dup_files && !thread_group_leader(task) &&
 			   task->files == task->group_leader->files) {
-			put_task_struct(task);
 			task = NULL;
 			++*tid;
 			goto retry;
 		}
+		get_task_struct(task);
 	}
 	rcu_read_unlock();
 
@@ -164,7 +164,7 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
 		curr_files = get_files_struct(curr_task);
 		if (!curr_files) {
 			put_task_struct(curr_task);
-			curr_tid = ++(info->tid);
+			curr_tid = curr_tid + 1;
 			info->fd = 0;
 			goto again;
 		}
-- 
2.24.1



* Re: [PATCH 1/1 v3 bpf-next] bpf: increment and use correct thread iterator
  2020-12-11 17:11 ` [PATCH 1/1 " Jonathan Lemon
@ 2020-12-11 20:23   ` Andrii Nakryiko
  2020-12-11 23:01     ` Jonathan Lemon
  2020-12-18 16:53   ` Yonghong Song
  1 sibling, 1 reply; 10+ messages in thread
From: Andrii Nakryiko @ 2020-12-11 20:23 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: Networking, Alexei Starovoitov, Daniel Borkmann, Yonghong Song,
	bpf, Kernel Team

On Fri, Dec 11, 2020 at 10:56 AM Jonathan Lemon
<jonathan.lemon@gmail.com> wrote:
>
> From: Jonathan Lemon <bsd@fb.com>
>
> On some systems, some variant of the following splat is
> repeatedly seen.  The common factor in all traces seems
> to be the entry point to task_file_seq_next().  With the
> patch, all warnings go away.
>
>     [ ... RCU stall splat snipped; quoted in full in the patch above ... ]
>
> The attached patch does 3 things:
>
> 1) If unable to obtain the file structure for the current task,
>    proceed to the next task number after the one returned from
>    task_seq_get_next(), instead of the next task number from the
>    original iterator.
>
> 2) Use thread_group_leader() instead of the open-coded comparison
>    of tgid vs pid.
>
> 3) Only obtain the task reference count at the end of the RCU section
>    instead of repeatedly obtaining/releasing it when iterating through
>    a thread group.
>
> Fixes: a650da2ee52a ("bpf: Add task and task/file iterator targets")
> Fixes: 67b6b863e6ab ("bpf: Avoid iterating duplicated files for task_file iterator")
>
> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> ---
>  kernel/bpf/task_iter.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 0458a40edf10..66a52fcf589a 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -33,17 +33,17 @@ static struct task_struct *task_seq_get_next(struct pid_namespace *ns,
>         pid = find_ge_pid(*tid, ns);
>         if (pid) {
>                 *tid = pid_nr_ns(pid, ns);
> -               task = get_pid_task(pid, PIDTYPE_PID);
> +               task = pid_task(pid, PIDTYPE_PID);
>                 if (!task) {
>                         ++*tid;
>                         goto retry;
> -               } else if (skip_if_dup_files && task->tgid != task->pid &&
> +               } else if (skip_if_dup_files && !thread_group_leader(task) &&
>                            task->files == task->group_leader->files) {
> -                       put_task_struct(task);
>                         task = NULL;
>                         ++*tid;
>                         goto retry;
>                 }
> +               get_task_struct(task);
>         }

This part looks good. I'd say it deserves a separate patch, but it's minor.

>         rcu_read_unlock();
>
> @@ -164,7 +164,7 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
>                 curr_files = get_files_struct(curr_task);
>                 if (!curr_files) {
>                         put_task_struct(curr_task);
> -                       curr_tid = ++(info->tid);
> +                       curr_tid = curr_tid + 1;

Yonghong might know definitively, but it seems like we need to update
info->tid here as well:

info->tid = curr_tid;

If the search eventually yields no task, then info->tid will stay at
some potentially much smaller value, and we'll keep re-searching tasks
from the same TID on each subsequent read (if user keeps reading the
file). So corner case, but good to have covered.
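
I.e., something like this on top of your hunk (untested sketch):

        curr_files = get_files_struct(curr_task);
        if (!curr_files) {
                put_task_struct(curr_task);
                curr_tid = curr_tid + 1;
                info->tid = curr_tid;   /* keep the saved iterator position in sync */
                info->fd = 0;
                goto again;
        }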

>                         info->fd = 0;
>                         goto again;
>                 }
> --
> 2.24.1
>


* Re: [PATCH 1/1 v3 bpf-next] bpf: increment and use correct thread iterator
  2020-12-11 20:23   ` Andrii Nakryiko
@ 2020-12-11 23:01     ` Jonathan Lemon
  2020-12-12  0:28       ` Andrii Nakryiko
  2020-12-14  7:00       ` Yonghong Song
  0 siblings, 2 replies; 10+ messages in thread
From: Jonathan Lemon @ 2020-12-11 23:01 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Networking, Alexei Starovoitov, Daniel Borkmann, Yonghong Song,
	bpf, Kernel Team

On Fri, Dec 11, 2020 at 12:23:34PM -0800, Andrii Nakryiko wrote:
> > @@ -164,7 +164,7 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
> >                 curr_files = get_files_struct(curr_task);
> >                 if (!curr_files) {
> >                         put_task_struct(curr_task);
> > -                       curr_tid = ++(info->tid);
> > +                       curr_tid = curr_tid + 1;
> 
> Yonghong might know definitively, but it seems like we need to update
> info->tid here as well:
> 
> info->tid = curr_tid;
> 
> If the search eventually yields no task, then info->tid will stay at
> some potentially much smaller value, and we'll keep re-searching tasks
> from the same TID on each subsequent read (if user keeps reading the
> file). So corner case, but good to have covered.

That applies earlier as well:

                curr_task = task_seq_get_next(ns, &curr_tid, true);
                if (!curr_task) {
                        info->task = NULL;
                        info->files = NULL;
                        return NULL;
                }

The logic seems to be "if task == NULL, then return NULL and stop". 
Is the seq_iterator allowed to continue/restart if seq_next returns NULL?
--
Jonathan


* Re: [PATCH 1/1 v3 bpf-next] bpf: increment and use correct thread iterator
  2020-12-11 23:01     ` Jonathan Lemon
@ 2020-12-12  0:28       ` Andrii Nakryiko
  2020-12-14  7:00       ` Yonghong Song
  1 sibling, 0 replies; 10+ messages in thread
From: Andrii Nakryiko @ 2020-12-12  0:28 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: Networking, Alexei Starovoitov, Daniel Borkmann, Yonghong Song,
	bpf, Kernel Team

On Fri, Dec 11, 2020 at 3:01 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
>
> On Fri, Dec 11, 2020 at 12:23:34PM -0800, Andrii Nakryiko wrote:
> > > @@ -164,7 +164,7 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
> > >                 curr_files = get_files_struct(curr_task);
> > >                 if (!curr_files) {
> > >                         put_task_struct(curr_task);
> > > -                       curr_tid = ++(info->tid);
> > > +                       curr_tid = curr_tid + 1;
> >
> > Yonghong might know definitively, but it seems like we need to update
> > info->tid here as well:
> >
> > info->tid = curr_tid;
> >
> > If the search eventually yields no task, then info->tid will stay at
> > some potentially much smaller value, and we'll keep re-searching tasks
> > from the same TID on each subsequent read (if user keeps reading the
> > file). So corner case, but good to have covered.
>
> That applies earlier as well:
>
>                 curr_task = task_seq_get_next(ns, &curr_tid, true);
>                 if (!curr_task) {
>                         info->task = NULL;
>                         info->files = NULL;
>                         return NULL;
>                 }
>

True, info->tid = curr_tid + 1; seems to be needed here?

> The logic seems to be "if task == NULL, then return NULL and stop".
> Is the seq_iterator allowed to continue/restart if seq_next returns NULL?

I don't think we allow seeking, so no restarts. But nothing will
prevent the user from calling read() again after it returns 0 bytes,
so yes, continuation is possible.
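
For example, nothing stops a reader from doing something like this
(user-space sketch; the pin path is made up):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        char buf[4096];
        ssize_t n;
        /* hypothetical pin path for a task_file iterator */
        int fd = open("/sys/fs/bpf/task_file_iter", O_RDONLY);

        if (fd < 0)
                return 1;
        /* normal consumption: read until a 0-byte return */
        while ((n = read(fd, buf, sizeof(buf))) > 0)
                fwrite(buf, 1, n, stdout);
        /*
         * Nothing prevents another read() here; it goes back into
         * bpf_seq_read() and resumes the task search from whatever
         * info->tid was left at.
         */
        n = read(fd, buf, sizeof(buf));
        close(fd);
        return 0;
}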

> --
> Jonathan


* Re: [PATCH 1/1 v3 bpf-next] bpf: increment and use correct thread iterator
  2020-12-11 23:01     ` Jonathan Lemon
  2020-12-12  0:28       ` Andrii Nakryiko
@ 2020-12-14  7:00       ` Yonghong Song
  1 sibling, 0 replies; 10+ messages in thread
From: Yonghong Song @ 2020-12-14  7:00 UTC (permalink / raw)
  To: Jonathan Lemon, Andrii Nakryiko
  Cc: Networking, Alexei Starovoitov, Daniel Borkmann, bpf, Kernel Team



On 12/11/20 3:01 PM, Jonathan Lemon wrote:
> On Fri, Dec 11, 2020 at 12:23:34PM -0800, Andrii Nakryiko wrote:
>>> @@ -164,7 +164,7 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
>>>                  curr_files = get_files_struct(curr_task);
>>>                  if (!curr_files) {
>>>                          put_task_struct(curr_task);
>>> -                       curr_tid = ++(info->tid);
>>> +                       curr_tid = curr_tid + 1;
>>
>> Yonghong might know definitively, but it seems like we need to update
>> info->tid here as well:
>>
>> info->tid = curr_tid;
>>
>> If the search eventually yields no task, then info->tid will stay at
>> some potentially much smaller value, and we'll keep re-searching tasks
>> from the same TID on each subsequent read (if user keeps reading the
>> file). So corner case, but good to have covered.
> 
> That applies earlier as well:
> 
>                  curr_task = task_seq_get_next(ns, &curr_tid, true);
>                  if (!curr_task) {
>                          info->task = NULL;
>                          info->files = NULL;
>                          return NULL;
>                  }
> 
> The logic seems to be "if task == NULL, then return NULL and stop".
> Is the seq_iterator allowed to continue/restart if seq_next returns NULL?

If seq_next() returns NULL, bpf_seq_read() will end and control will
return to user space. There are two cases here:
    - there is something in the buffer, so the user gets non-zero-length
      return data and typically calls the read() syscall again. In that
      case, the search will resume from the last info->tid.
    - the buffer is empty, so the user gets a "0" return value from the
      read() syscall. Typically, the user will not call read() any more,
      but if it does, the search will start from the last info->tid.

I agree with Andrii: in general, this should not be a big problem,
but it is good to get this fixed too.

> --
> Jonathan
> 


* Re: [PATCH 1/1 v3 bpf-next] bpf: increment and use correct thread iterator
  2020-12-11 17:11 ` [PATCH 1/1 " Jonathan Lemon
  2020-12-11 20:23   ` Andrii Nakryiko
@ 2020-12-18 16:53   ` Yonghong Song
  2020-12-18 18:06     ` Jonathan Lemon
  1 sibling, 1 reply; 10+ messages in thread
From: Yonghong Song @ 2020-12-18 16:53 UTC (permalink / raw)
  To: Jonathan Lemon, netdev, ast, daniel, bpf; +Cc: kernel-team



On 12/11/20 9:11 AM, Jonathan Lemon wrote:
> From: Jonathan Lemon <bsd@fb.com>
> 
> On some systems, some variant of the following splat is
> repeatedly seen.  The common factor in all traces seems
> to be the entry point to task_file_seq_next().  With the
> patch, all warnings go away.
> 
>      [ ... RCU stall splat snipped; quoted in full in the patch above ... ]
> 
> The attached patch does 3 things:
> 
> 1) If unable to obtain the file structure for the current task,
>     proceed to the next task number after the one returned from
>     task_seq_get_next(), instead of the next task number from the
>     original iterator.

Looks like this fix is the real fix for the above warnings.
Basically, say we have
    info->tid = 10 and returned curr_tid = 3000 and tid 3000 has no files.
the current logic will go through
    - set curr_tid = 11 (info->tid++) and returned curr_tid = 3000
    - set curr_tid = 12 and returned curr_tid = 3000
    ...
    - set curr_tid = 3000 and returned curr_tid = 3000
    - set curr_tid = 3001 and return curr_tid >= 3001

All of the above work is redundant, and it may cause issues
on a non-preemptible kernel.

I suggest you factor this change out, plus the following change
suggested by Andrii earlier, into a separate patch carrying
the Fixes tag below.

diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 0458a40edf10..56bcaef72e36 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -158,6 +158,7 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
                 if (!curr_task) {
                         info->task = NULL;
                         info->files = NULL;
+                       info->tid = curr_tid + 1;
                         return NULL;
                 }

> 
> 2) Use thread_group_leader() instead of the open-coded comparison
>     of tgid vs pid.

My experiments show there is no difference between thread_group_leader()
and comparing tgid/pid, but using the existing helper is indeed better,
so I am okay with this.

> 
> 3) Only obtain the task reference count at the end of the RCU section
>     instead of repeatedly obtaining/releasing it when iterating through
>     a thread group.

The above two changes do not really fix the RCU warnings, but
they are indeed nice to have. Could you put them into a separate
patch (patch 2)? You can add the following improvement as well:
-bash-4.4$ git diff
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 0458a40edf10..f61e5ddb38ce 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -148,12 +148,12 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
          * it held a reference to the task/files_struct/file.
          * Otherwise, it does not hold any reference.
          */
-again:
         if (info->task) {
                 curr_task = info->task;
                 curr_files = info->files;
                 curr_fd = info->fd;
         } else {
+again:
                 curr_task = task_seq_get_next(ns, &curr_tid, true);
                 if (!curr_task) {
                         info->task = NULL;

This reduces one branch when searching for the next task in
task_file_seq_get_next().

> 
> Fixes: a650da2ee52a ("bpf: Add task and task/file iterator targets")
> Fixes: 67b6b863e6ab ("bpf: Avoid iterating duplicated files for task_file iterator")
> 
> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> ---
>   kernel/bpf/task_iter.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 0458a40edf10..66a52fcf589a 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -33,17 +33,17 @@ static struct task_struct *task_seq_get_next(struct pid_namespace *ns,
>   	pid = find_ge_pid(*tid, ns);
>   	if (pid) {
>   		*tid = pid_nr_ns(pid, ns);
> -		task = get_pid_task(pid, PIDTYPE_PID);
> +		task = pid_task(pid, PIDTYPE_PID);
>   		if (!task) {
>   			++*tid;
>   			goto retry;
> -		} else if (skip_if_dup_files && task->tgid != task->pid &&
> +		} else if (skip_if_dup_files && !thread_group_leader(task) &&
>   			   task->files == task->group_leader->files) {
> -			put_task_struct(task);
>   			task = NULL;
>   			++*tid;
>   			goto retry;
>   		}
> +		get_task_struct(task);
>   	}
>   	rcu_read_unlock();
>   
> @@ -164,7 +164,7 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
>   		curr_files = get_files_struct(curr_task);
>   		if (!curr_files) {
>   			put_task_struct(curr_task);
> -			curr_tid = ++(info->tid);
> +			curr_tid = curr_tid + 1;
>   			info->fd = 0;
>   			goto again;
>   		}
> 


* Re: [PATCH 1/1 v3 bpf-next] bpf: increment and use correct thread iterator
  2020-12-18 16:53   ` Yonghong Song
@ 2020-12-18 18:06     ` Jonathan Lemon
  2020-12-18 18:21       ` Yonghong Song
  2020-12-18 19:28       ` Andrii Nakryiko
  0 siblings, 2 replies; 10+ messages in thread
From: Jonathan Lemon @ 2020-12-18 18:06 UTC (permalink / raw)
  To: Yonghong Song; +Cc: netdev, ast, daniel, bpf, kernel-team

On Fri, Dec 18, 2020 at 08:53:22AM -0800, Yonghong Song wrote:
> 
> 
> On 12/11/20 9:11 AM, Jonathan Lemon wrote:
> > From: Jonathan Lemon <bsd@fb.com>
> > 
> > On some systems, some variant of the following splat is
> > repeatedly seen.  The common factor in all traces seems
> > to be the entry point to task_file_seq_next().  With the
> > patch, all warnings go away.
> > 
> >      [ ... RCU stall splat snipped; quoted in full in the patch above ... ]
> > 
> > The attached patch does 3 things:
> > 
> > 1) If unable to obtain the file structure for the current task,
> >     proceed to the next task number after the one returned from
> >     task_seq_get_next(), instead of the next task number from the
> >     original iterator.
> 
> Looks like this fix is the real fix for the above warnings.
> Basically, say we have
>    info->tid = 10 and returned curr_tid = 3000 and tid 3000 has no files.
> the current logic will go through
>    - set curr_tid = 11 (info->tid++) and returned curr_tid = 3000
>    - set curr_tid = 12 and returned curr_tid = 3000
>    ...
>    - set curr_tid = 3000 and returned curr_tid = 3000
>    - set curr_tid = 3001 and return curr_tid >= 3001
> 
> All of the above work is redundant, and it may cause issues
> on a non-preemptible kernel.
> 
> I suggest you factor this change out, plus the following change
> suggested by Andrii earlier, into a separate patch carrying
> the Fixes tag below.
> 
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 0458a40edf10..56bcaef72e36 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -158,6 +158,7 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
>                 if (!curr_task) {
>                         info->task = NULL;
>                         info->files = NULL;
> +                       info->tid = curr_tid + 1;
>                         return NULL;
>                 }

Are you sure this isn't supposed to be 'curr_tid'?  task_seq_get_next()
stops when there are no more threads found.  This increments the thread
id past the search point, and would seem to introduce a potential
off-by-one error.

That is:
   curr_tid = 3000. 
   call task_seq_get_next() --> return NULL, curr_tid = 3000.
      (so there is no tid >= 3000)
   set curr_tid = 3001.  

   next restart (if there is one) skips a newly created 3000.

-- 
Jonathan


* Re: [PATCH 1/1 v3 bpf-next] bpf: increment and use correct thread iterator
  2020-12-18 18:06     ` Jonathan Lemon
@ 2020-12-18 18:21       ` Yonghong Song
  2020-12-18 19:28       ` Andrii Nakryiko
  1 sibling, 0 replies; 10+ messages in thread
From: Yonghong Song @ 2020-12-18 18:21 UTC (permalink / raw)
  To: Jonathan Lemon; +Cc: netdev, ast, daniel, bpf, kernel-team



On 12/18/20 10:06 AM, Jonathan Lemon wrote:
> On Fri, Dec 18, 2020 at 08:53:22AM -0800, Yonghong Song wrote:
>>
>>
>> On 12/11/20 9:11 AM, Jonathan Lemon wrote:
>>> From: Jonathan Lemon <bsd@fb.com>
>>>
>>> On some systems, some variant of the following splat is
>>> repeatedly seen.  The common factor in all traces seems
>>> to be the entry point to task_file_seq_next().  With the
>>> patch, all warnings go away.
>>>
>>>       [ ... RCU stall splat snipped; quoted in full in the patch above ... ]
>>>
>>> The attached patch does 3 things:
>>>
>>> 1) If unable to obtain the file structure for the current task,
>>>      proceed to the next task number after the one returned from
>>>      task_seq_get_next(), instead of the next task number from the
>>>      original iterator.
>>
>> Looks like this fix is the real fix for the above warnings.
>> Basically, say we have
>>     info->tid = 10 and returned curr_tid = 3000 and tid 3000 has no files.
>> the current logic will go through
>>     - set curr_tid = 11 (info->tid++) and returned curr_tid = 3000
>>     - set curr_tid = 12 and returned curr_tid = 3000
>>     ...
>>     - set curr_tid = 3000 and returned curr_tid = 3000
>>     - set curr_tid = 3001 and return curr_tid >= 3001
>>
>> All of the above work is redundant, and it may cause issues
>> on a non-preemptible kernel.
>>
>> I suggest you factor this change out, plus the following change
>> suggested by Andrii earlier, into a separate patch carrying
>> the Fixes tag below.
>>
>> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
>> index 0458a40edf10..56bcaef72e36 100644
>> --- a/kernel/bpf/task_iter.c
>> +++ b/kernel/bpf/task_iter.c
>> @@ -158,6 +158,7 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
>>                  if (!curr_task) {
>>                          info->task = NULL;
>>                          info->files = NULL;
>> +                       info->tid = curr_tid + 1;
>>                          return NULL;
>>                  }
> 
> Sure this isn't supposed to be 'curr_tid'?  task_seq_get_next() stops
> when there are no more threads found.  This increments the thread id
> past the search point, and would seem to introduce a potential off-by-one
> error.

This is for the case where the read() syscall returns length 0, but
user space still keeps calling read(). You are right, we may indeed
skip a newly created tid.

Although such a use case is not common, info->tid = curr_tid is
certainly more correct than info->tid = curr_tid + 1.

So thanks for the suggestion, LGTM.

> 
> That is:
>     curr_tid = 3000.
>     call task_seq_get_next() --> return NULL, curr_tid = 3000.
>        (so there is no tid >= 3000)
>     set curr_tid = 3001.
> 
>     next restart (if there is one) skips a newly created 3000.
> 


* Re: [PATCH 1/1 v3 bpf-next] bpf: increment and use correct thread iterator
  2020-12-18 18:06     ` Jonathan Lemon
  2020-12-18 18:21       ` Yonghong Song
@ 2020-12-18 19:28       ` Andrii Nakryiko
  1 sibling, 0 replies; 10+ messages in thread
From: Andrii Nakryiko @ 2020-12-18 19:28 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: Yonghong Song, Networking, Alexei Starovoitov, Daniel Borkmann,
	bpf, Kernel Team

On Fri, Dec 18, 2020 at 10:08 AM Jonathan Lemon
<jonathan.lemon@gmail.com> wrote:
>
> On Fri, Dec 18, 2020 at 08:53:22AM -0800, Yonghong Song wrote:
> >
> >
> > On 12/11/20 9:11 AM, Jonathan Lemon wrote:
> > > From: Jonathan Lemon <bsd@fb.com>
> > >
> > > On some systems, some variant of the following splat is
> > > repeatedly seen.  The common factor in all traces seems
> > > to be the entry point to task_file_seq_next().  With the
> > > patch, all warnings go away.
> > >
> > >      [ ... RCU stall splat snipped; quoted in full in the patch above ... ]
> > >
> > > The attached patch does 3 things:
> > >
> > > 1) If unable to obtain the file structure for the current task,
> > >     proceed to the next task number after the one returned from
> > >     task_seq_get_next(), instead of the next task number from the
> > >     original iterator.
> >
> > Looks like this fix is the real fix for the above warnings.
> > Basically, say we have
> >    info->tid = 10 and returned curr_tid = 3000 and tid 3000 has no files.
> > the current logic will go through
> >    - set curr_tid = 11 (info->tid++) and returned curr_tid = 3000
> >    - set curr_tid = 12 and returned curr_tid = 3000
> >    ...
> >    - set curr_tid = 3000 and returned curr_tid = 3000
> >    - set curr_tid = 3001 and return curr_tid >= 3001
> >
> > All of the above work is redundant, and it may cause issues
> > on a non-preemptible kernel.
> >
> > I suggest you factor this change out, plus the following change
> > suggested by Andrii earlier, into a separate patch carrying
> > the Fixes tag below.
> >
> > diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> > index 0458a40edf10..56bcaef72e36 100644
> > --- a/kernel/bpf/task_iter.c
> > +++ b/kernel/bpf/task_iter.c
> > @@ -158,6 +158,7 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
> >                 if (!curr_task) {
> >                         info->task = NULL;
> >                         info->files = NULL;
> > +                       info->tid = curr_tid + 1;
> >                         return NULL;
> >                 }
>
> Sure this isn't supposed to be 'curr_tid'?  task_seq_get_next() stops
> when there are no more threads found.  This increments the thread id
> past the search point, and would seem to introduce a potential off-by-one
> error.
>
> That is:
>    curr_tid = 3000.
>    call task_seq_get_next() --> return NULL, curr_tid = 3000.
>       (so there is no tid >= 3000)
>    set curr_tid = 3001.
>
>    next restart (if there is one) skips a newly created 3000.

Seems fine to me to skip 3000 in such a case. 3000 didn't exist at the
time of iteration. If there was a tid >= 3001, it would have been
skipped as well.

>
> --
> Jonathan

