* [PATCH bpf-next v2 0/2] bpf: avoid iterating duplicated files for task_file iterator @ 2020-08-28 5:38 Yonghong Song 2020-08-28 5:38 ` [PATCH bpf-next v2 1/2] " Yonghong Song 2020-08-28 5:38 ` [PATCH bpf-next v2 2/2] selftests/bpf: test task_file iterator without visiting pthreads Yonghong Song 0 siblings, 2 replies; 8+ messages in thread From: Yonghong Song @ 2020-08-28 5:38 UTC (permalink / raw) To: bpf, netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team Commit e679654a704e ("bpf: Fix a rcu_sched stall issue with bpf task/task_file iterator") introduced rate limiting in bpf_seq_read() to fix a case where traversing too many tasks and files (tens of millions of files) may cause kernel rcu stall. But rate limiting won't reduce the amount of work to traverse all these files. In practice, for a user process, typically all threads belongs to that process share the same file table and there is no need to visit every thread for its files. This patch added additional logic for task_file iterator to skip tasks if those tasks are not group_leaders and their files are the same as those of group_leaders. Such reduction of unnecessary work will make iterator runtime much faster if there are a lot of non-main threads and open files for the process. Patch #1 is the kernel implementation and Patch #2 is the selftest. v1 -> v2: - for task_file, no need for additional user parameter, kernel can just skip those files already visited, and this should not impact user space. (Andrii) - to add group_leader-only customization for task will be considered later. - remove Patch #1 and sent it separately as this patch set won't depend on it any more. Yonghong Song (2): bpf: avoid iterating duplicated files for task_file iterator selftests/bpf: test task_file iterator without visiting pthreads kernel/bpf/task_iter.c | 14 ++++++++----- .../selftests/bpf/prog_tests/bpf_iter.c | 21 +++++++++++++++++++ .../selftests/bpf/progs/bpf_iter_task_file.c | 10 ++++++++- 3 files changed, 39 insertions(+), 6 deletions(-) -- 2.24.1 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH bpf-next v2 1/2] bpf: avoid iterating duplicated files for task_file iterator 2020-08-28 5:38 [PATCH bpf-next v2 0/2] bpf: avoid iterating duplicated files for task_file iterator Yonghong Song @ 2020-08-28 5:38 ` Yonghong Song 2020-09-01 16:17 ` Josef Bacik 2020-09-01 17:18 ` Josef Bacik 2020-08-28 5:38 ` [PATCH bpf-next v2 2/2] selftests/bpf: test task_file iterator without visiting pthreads Yonghong Song 1 sibling, 2 replies; 8+ messages in thread From: Yonghong Song @ 2020-08-28 5:38 UTC (permalink / raw) To: bpf, netdev Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Andrii Nakryiko Currently, task_file iterator iterates all files from all tasks. This may potentially visit a lot of duplicated files if there are many tasks sharing the same files, e.g., typical pthreads where these pthreads and the main thread are sharing the same files. This patch changed task_file iterator to skip a particular task if that task shares the same files as its group_leader (the task having the same tgid and also task->tgid == task->pid). This will preserve the same result, visiting all files from all tasks, and will reduce runtime cost significantl, e.g., if there are a lot of pthreads and the process has a lot of open files. Suggested-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> --- kernel/bpf/task_iter.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) It would be good if somebody familar with sched code can help check whether I missed anything or not (e.g., locks, etc.) for the code change task->files == task->group_leader->files Note the change in this patch might have conflicts with e60572b8d4c3 ("bpf: Avoid visit same object multiple times") which is merged into bpf/net sometimes back. diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c index 232df29793e9..0c5c96bb6964 100644 --- a/kernel/bpf/task_iter.c +++ b/kernel/bpf/task_iter.c @@ -22,7 +22,8 @@ struct bpf_iter_seq_task_info { }; static struct task_struct *task_seq_get_next(struct pid_namespace *ns, - u32 *tid) + u32 *tid, + bool skip_if_dup_files) { struct task_struct *task = NULL; struct pid *pid; @@ -32,7 +33,10 @@ static struct task_struct *task_seq_get_next(struct pid_namespace *ns, pid = idr_get_next(&ns->idr, tid); if (pid) { task = get_pid_task(pid, PIDTYPE_PID); - if (!task) { + if (!task || + (skip_if_dup_files && + task->tgid != task->pid && + task->files == task->group_leader->files)) { ++*tid; goto retry; } @@ -47,7 +51,7 @@ static void *task_seq_start(struct seq_file *seq, loff_t *pos) struct bpf_iter_seq_task_info *info = seq->private; struct task_struct *task; - task = task_seq_get_next(info->common.ns, &info->tid); + task = task_seq_get_next(info->common.ns, &info->tid, false); if (!task) return NULL; @@ -64,7 +68,7 @@ static void *task_seq_next(struct seq_file *seq, void *v, loff_t *pos) ++*pos; ++info->tid; put_task_struct((struct task_struct *)v); - task = task_seq_get_next(info->common.ns, &info->tid); + task = task_seq_get_next(info->common.ns, &info->tid, false); if (!task) return NULL; @@ -147,7 +151,7 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info, curr_files = *fstruct; curr_fd = info->fd; } else { - curr_task = task_seq_get_next(ns, &curr_tid); + curr_task = task_seq_get_next(ns, &curr_tid, true); if (!curr_task) return NULL; -- 2.24.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH bpf-next v2 1/2] bpf: avoid iterating duplicated files for task_file iterator 2020-08-28 5:38 ` [PATCH bpf-next v2 1/2] " Yonghong Song @ 2020-09-01 16:17 ` Josef Bacik 2020-09-01 17:18 ` Josef Bacik 1 sibling, 0 replies; 8+ messages in thread From: Josef Bacik @ 2020-09-01 16:17 UTC (permalink / raw) To: Yonghong Song, bpf, netdev Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Andrii Nakryiko On 8/28/20 1:38 AM, Yonghong Song wrote: > Currently, task_file iterator iterates all files from all tasks. > This may potentially visit a lot of duplicated files if there are > many tasks sharing the same files, e.g., typical pthreads > where these pthreads and the main thread are sharing the same files. > > This patch changed task_file iterator to skip a particular task > if that task shares the same files as its group_leader (the task > having the same tgid and also task->tgid == task->pid). > This will preserve the same result, visiting all files from all > tasks, and will reduce runtime cost significantl, e.g., if there are > a lot of pthreads and the process has a lot of open files. > > Suggested-by: Andrii Nakryiko <andriin@fb.com> > Signed-off-by: Yonghong Song <yhs@fb.com> > --- > kernel/bpf/task_iter.c | 14 +++++++++----- > 1 file changed, 9 insertions(+), 5 deletions(-) > > It would be good if somebody familar with sched code can help check > whether I missed anything or not (e.g., locks, etc.) > for the code change > task->files == task->group_leader->files > > Note the change in this patch might have conflicts with > e60572b8d4c3 ("bpf: Avoid visit same object multiple times") > which is merged into bpf/net sometimes back. > > diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c > index 232df29793e9..0c5c96bb6964 100644 > --- a/kernel/bpf/task_iter.c > +++ b/kernel/bpf/task_iter.c > @@ -22,7 +22,8 @@ struct bpf_iter_seq_task_info { > }; > > static struct task_struct *task_seq_get_next(struct pid_namespace *ns, > - u32 *tid) > + u32 *tid, > + bool skip_if_dup_files) > { > struct task_struct *task = NULL; > struct pid *pid; > @@ -32,7 +33,10 @@ static struct task_struct *task_seq_get_next(struct pid_namespace *ns, > pid = idr_get_next(&ns->idr, tid); > if (pid) { > task = get_pid_task(pid, PIDTYPE_PID); > - if (!task) { > + if (!task || > + (skip_if_dup_files && > + task->tgid != task->pid && > + task->files == task->group_leader->files)) { This is fine, we're not deref'ing files, if we were you'd need get_files_struct(). You can deref task->group_leader here because you got the task so this is safe. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Thanks, Josef ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH bpf-next v2 1/2] bpf: avoid iterating duplicated files for task_file iterator 2020-08-28 5:38 ` [PATCH bpf-next v2 1/2] " Yonghong Song 2020-09-01 16:17 ` Josef Bacik @ 2020-09-01 17:18 ` Josef Bacik 2020-09-01 17:38 ` Yonghong Song 1 sibling, 1 reply; 8+ messages in thread From: Josef Bacik @ 2020-09-01 17:18 UTC (permalink / raw) To: Yonghong Song, bpf, netdev Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Andrii Nakryiko On 8/28/20 1:38 AM, Yonghong Song wrote: > Currently, task_file iterator iterates all files from all tasks. > This may potentially visit a lot of duplicated files if there are > many tasks sharing the same files, e.g., typical pthreads > where these pthreads and the main thread are sharing the same files. > > This patch changed task_file iterator to skip a particular task > if that task shares the same files as its group_leader (the task > having the same tgid and also task->tgid == task->pid). > This will preserve the same result, visiting all files from all > tasks, and will reduce runtime cost significantl, e.g., if there are > a lot of pthreads and the process has a lot of open files. > > Suggested-by: Andrii Nakryiko <andriin@fb.com> > Signed-off-by: Yonghong Song <yhs@fb.com> > --- > kernel/bpf/task_iter.c | 14 +++++++++----- > 1 file changed, 9 insertions(+), 5 deletions(-) > > It would be good if somebody familar with sched code can help check > whether I missed anything or not (e.g., locks, etc.) > for the code change > task->files == task->group_leader->files > > Note the change in this patch might have conflicts with > e60572b8d4c3 ("bpf: Avoid visit same object multiple times") > which is merged into bpf/net sometimes back. > > diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c > index 232df29793e9..0c5c96bb6964 100644 > --- a/kernel/bpf/task_iter.c > +++ b/kernel/bpf/task_iter.c > @@ -22,7 +22,8 @@ struct bpf_iter_seq_task_info { > }; > > static struct task_struct *task_seq_get_next(struct pid_namespace *ns, > - u32 *tid) > + u32 *tid, > + bool skip_if_dup_files) > { > struct task_struct *task = NULL; > struct pid *pid; > @@ -32,7 +33,10 @@ static struct task_struct *task_seq_get_next(struct pid_namespace *ns, > pid = idr_get_next(&ns->idr, tid); > if (pid) { > task = get_pid_task(pid, PIDTYPE_PID); > - if (!task) { > + if (!task || > + (skip_if_dup_files && > + task->tgid != task->pid && > + task->files == task->group_leader->files)) { > ++*tid; > goto retry; Sorry I only checked the task->files and task->group_leader thing, I forgot to actually pay attention to what the patch itself was doing. This will leak task structs, you need something like if (!task) { ++*tid; goto retry; } if (skip_if_dup_files && etc) { ++*tid; put_task_struct(task); goto retry; } otherwise you'll leak tasks. Thanks, Josef ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH bpf-next v2 1/2] bpf: avoid iterating duplicated files for task_file iterator 2020-09-01 17:18 ` Josef Bacik @ 2020-09-01 17:38 ` Yonghong Song 0 siblings, 0 replies; 8+ messages in thread From: Yonghong Song @ 2020-09-01 17:38 UTC (permalink / raw) To: Josef Bacik, bpf, netdev Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Andrii Nakryiko On 9/1/20 10:18 AM, Josef Bacik wrote: > On 8/28/20 1:38 AM, Yonghong Song wrote: >> Currently, task_file iterator iterates all files from all tasks. >> This may potentially visit a lot of duplicated files if there are >> many tasks sharing the same files, e.g., typical pthreads >> where these pthreads and the main thread are sharing the same files. >> >> This patch changed task_file iterator to skip a particular task >> if that task shares the same files as its group_leader (the task >> having the same tgid and also task->tgid == task->pid). >> This will preserve the same result, visiting all files from all >> tasks, and will reduce runtime cost significantl, e.g., if there are >> a lot of pthreads and the process has a lot of open files. >> >> Suggested-by: Andrii Nakryiko <andriin@fb.com> >> Signed-off-by: Yonghong Song <yhs@fb.com> >> --- >> kernel/bpf/task_iter.c | 14 +++++++++----- >> 1 file changed, 9 insertions(+), 5 deletions(-) >> >> It would be good if somebody familar with sched code can help check >> whether I missed anything or not (e.g., locks, etc.) >> for the code change >> task->files == task->group_leader->files >> >> Note the change in this patch might have conflicts with >> e60572b8d4c3 ("bpf: Avoid visit same object multiple times") >> which is merged into bpf/net sometimes back. >> >> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c >> index 232df29793e9..0c5c96bb6964 100644 >> --- a/kernel/bpf/task_iter.c >> +++ b/kernel/bpf/task_iter.c >> @@ -22,7 +22,8 @@ struct bpf_iter_seq_task_info { >> }; >> static struct task_struct *task_seq_get_next(struct pid_namespace *ns, >> - u32 *tid) >> + u32 *tid, >> + bool skip_if_dup_files) >> { >> struct task_struct *task = NULL; >> struct pid *pid; >> @@ -32,7 +33,10 @@ static struct task_struct *task_seq_get_next(struct >> pid_namespace *ns, >> pid = idr_get_next(&ns->idr, tid); >> if (pid) { >> task = get_pid_task(pid, PIDTYPE_PID); >> - if (!task) { >> + if (!task || >> + (skip_if_dup_files && >> + task->tgid != task->pid && >> + task->files == task->group_leader->files)) { >> ++*tid; >> goto retry; > > Sorry I only checked the task->files and task->group_leader thing, I > forgot to actually pay attention to what the patch itself was doing. > > This will leak task structs, you need something like > > if (!task) { > ++*tid; > goto retry; > } > if (skip_if_dup_files && etc) { > ++*tid; > put_task_struct(task); > goto retry; > } > > otherwise you'll leak tasks. Thanks, Or, yes, good catch. Thanks! I too focused on task->files == task->group_lead->files and did not think deep on leaking tasks. Will send v2. > > Josef ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH bpf-next v2 2/2] selftests/bpf: test task_file iterator without visiting pthreads 2020-08-28 5:38 [PATCH bpf-next v2 0/2] bpf: avoid iterating duplicated files for task_file iterator Yonghong Song 2020-08-28 5:38 ` [PATCH bpf-next v2 1/2] " Yonghong Song @ 2020-08-28 5:38 ` Yonghong Song 2020-09-02 0:41 ` Andrii Nakryiko 1 sibling, 1 reply; 8+ messages in thread From: Yonghong Song @ 2020-08-28 5:38 UTC (permalink / raw) To: bpf, netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team Modified existing bpf_iter_test_file.c program to check whether all accessed files from the main thread or not. Modified existing bpf_iter_test_file program to check whether all accessed files from the main thread or not. $ ./test_progs -n 4 ... #4/7 task_file:OK ... #4 bpf_iter:OK Summary: 1/24 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Yonghong Song <yhs@fb.com> --- .../selftests/bpf/prog_tests/bpf_iter.c | 21 +++++++++++++++++++ .../selftests/bpf/progs/bpf_iter_task_file.c | 10 ++++++++- 2 files changed, 30 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c index 7375d9a6d242..375ffaf85d78 100644 --- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c +++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c @@ -132,17 +132,38 @@ static void test_task_stack(void) bpf_iter_task_stack__destroy(skel); } +static void *do_nothing(void *arg) +{ + pthread_exit(arg); +} + static void test_task_file(void) { struct bpf_iter_task_file *skel; + pthread_t thread_id; + void *ret; skel = bpf_iter_task_file__open_and_load(); if (CHECK(!skel, "bpf_iter_task_file__open_and_load", "skeleton open_and_load failed\n")) return; + skel->bss->tgid = getpid(); + + if (CHECK(pthread_create(&thread_id, NULL, &do_nothing, NULL), + "pthread_create", "pthread_create failed\n")) + goto done; + do_dummy_read(skel->progs.dump_task_file); + if (CHECK(pthread_join(thread_id, &ret) || ret != NULL, + "pthread_join", "pthread_join failed\n")) + goto done; + + CHECK(skel->bss->count != 0, "", + "invalid non pthread file visit %d\n", skel->bss->count); + +done: bpf_iter_task_file__destroy(skel); } diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c b/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c index 8b787baa2654..b2f7c7c5f952 100644 --- a/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c +++ b/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c @@ -6,6 +6,9 @@ char _license[] SEC("license") = "GPL"; +int count = 0; +int tgid = 0; + SEC("iter/task_file") int dump_task_file(struct bpf_iter__task_file *ctx) { @@ -17,8 +20,13 @@ int dump_task_file(struct bpf_iter__task_file *ctx) if (task == (void *)0 || file == (void *)0) return 0; - if (ctx->meta->seq_num == 0) + if (ctx->meta->seq_num == 0) { + count = 0; BPF_SEQ_PRINTF(seq, " tgid gid fd file\n"); + } + + if (tgid == task->tgid && task->tgid != task->pid) + count++; BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd, (long)file->f_op); -- 2.24.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH bpf-next v2 2/2] selftests/bpf: test task_file iterator without visiting pthreads 2020-08-28 5:38 ` [PATCH bpf-next v2 2/2] selftests/bpf: test task_file iterator without visiting pthreads Yonghong Song @ 2020-09-02 0:41 ` Andrii Nakryiko 2020-09-02 2:18 ` Yonghong Song 0 siblings, 1 reply; 8+ messages in thread From: Andrii Nakryiko @ 2020-09-02 0:41 UTC (permalink / raw) To: Yonghong Song Cc: bpf, Networking, Alexei Starovoitov, Daniel Borkmann, Kernel Team On Thu, Aug 27, 2020 at 10:38 PM Yonghong Song <yhs@fb.com> wrote: > > Modified existing bpf_iter_test_file.c program to check whether > all accessed files from the main thread or not. > > Modified existing bpf_iter_test_file program to check > whether all accessed files from the main thread or not. > $ ./test_progs -n 4 > ... > #4/7 task_file:OK > ... > #4 bpf_iter:OK > Summary: 1/24 PASSED, 0 SKIPPED, 0 FAILED > > Signed-off-by: Yonghong Song <yhs@fb.com> > --- Acked-by: Andrii Nakryiko <andriin@fb.com> > .../selftests/bpf/prog_tests/bpf_iter.c | 21 +++++++++++++++++++ > .../selftests/bpf/progs/bpf_iter_task_file.c | 10 ++++++++- > 2 files changed, 30 insertions(+), 1 deletion(-) > [...] > + if (CHECK(pthread_join(thread_id, &ret) || ret != NULL, > + "pthread_join", "pthread_join failed\n")) > + goto done; > + > + CHECK(skel->bss->count != 0, "", nit: please use non-empty string for second argument > + "invalid non pthread file visit %d\n", skel->bss->count); > + > +done: > bpf_iter_task_file__destroy(skel); > } > [...] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH bpf-next v2 2/2] selftests/bpf: test task_file iterator without visiting pthreads 2020-09-02 0:41 ` Andrii Nakryiko @ 2020-09-02 2:18 ` Yonghong Song 0 siblings, 0 replies; 8+ messages in thread From: Yonghong Song @ 2020-09-02 2:18 UTC (permalink / raw) To: Andrii Nakryiko Cc: bpf, Networking, Alexei Starovoitov, Daniel Borkmann, Kernel Team On 9/1/20 5:41 PM, Andrii Nakryiko wrote: > On Thu, Aug 27, 2020 at 10:38 PM Yonghong Song <yhs@fb.com> wrote: >> >> Modified existing bpf_iter_test_file.c program to check whether >> all accessed files from the main thread or not. >> >> Modified existing bpf_iter_test_file program to check >> whether all accessed files from the main thread or not. >> $ ./test_progs -n 4 >> ... >> #4/7 task_file:OK >> ... >> #4 bpf_iter:OK >> Summary: 1/24 PASSED, 0 SKIPPED, 0 FAILED >> >> Signed-off-by: Yonghong Song <yhs@fb.com> >> --- > > Acked-by: Andrii Nakryiko <andriin@fb.com> > >> .../selftests/bpf/prog_tests/bpf_iter.c | 21 +++++++++++++++++++ >> .../selftests/bpf/progs/bpf_iter_task_file.c | 10 ++++++++- >> 2 files changed, 30 insertions(+), 1 deletion(-) >> > > [...] > >> + if (CHECK(pthread_join(thread_id, &ret) || ret != NULL, >> + "pthread_join", "pthread_join failed\n")) >> + goto done; >> + >> + CHECK(skel->bss->count != 0, "", > > nit: please use non-empty string for second argument Okay, will change to "check_count" instead of empty string. Thanks! > >> + "invalid non pthread file visit %d\n", skel->bss->count); >> + >> +done: >> bpf_iter_task_file__destroy(skel); >> } >> > > [...] > ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2020-09-02 2:19 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-08-28 5:38 [PATCH bpf-next v2 0/2] bpf: avoid iterating duplicated files for task_file iterator Yonghong Song 2020-08-28 5:38 ` [PATCH bpf-next v2 1/2] " Yonghong Song 2020-09-01 16:17 ` Josef Bacik 2020-09-01 17:18 ` Josef Bacik 2020-09-01 17:38 ` Yonghong Song 2020-08-28 5:38 ` [PATCH bpf-next v2 2/2] selftests/bpf: test task_file iterator without visiting pthreads Yonghong Song 2020-09-02 0:41 ` Andrii Nakryiko 2020-09-02 2:18 ` Yonghong Song
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).