bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Song Liu <songliubraving@fb.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: "bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"ast@kernel.org" <ast@kernel.org>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	"andrii@kernel.org" <andrii@kernel.org>,
	"john.fastabend@gmail.com" <john.fastabend@gmail.com>,
	"kpsingh@chromium.org" <kpsingh@chromium.org>,
	Kernel Team <Kernel-team@fb.com>
Subject: Re: [PATCH v2 bpf-next 1/4] bpf: introduce task_vma bpf_iter
Date: Thu, 17 Dec 2020 22:08:31 +0000	[thread overview]
Message-ID: <C4D9D25A-C3DD-4081-9EAD-B7A5B6B74F45@fb.com> (raw)
In-Reply-To: <20201217190308.insbsxpf6ujapbs3@ast-mbp>



> On Dec 17, 2020, at 11:03 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> On Tue, Dec 15, 2020 at 03:36:59PM -0800, Song Liu wrote:
>> +/*
>> + * Key information from vm_area_struct. We need this because we cannot
>> + * assume the vm_area_struct is still valid after each iteration.
>> + */
>> +struct __vm_area_struct {
>> +	__u64 start;
>> +	__u64 end;
>> +	__u64 flags;
>> +	__u64 pgoff;
>> +};
> 
> Where it's inside .c or exposed in uapi/bpf.h it will become uapi
> if it's used this way. Let's switch to arbitrary BTF-based access instead.
> 
>> +static struct __vm_area_struct *
>> +task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
>> +{
>> +	struct pid_namespace *ns = info->common.ns;
>> +	struct task_struct *curr_task;
>> +	struct vm_area_struct *vma;
>> +	u32 curr_tid = info->tid;
>> +	bool new_task = false;
>> +
>> +	/* If this function returns a non-NULL __vm_area_struct, it held
>> +	 * a reference to the task_struct. If info->file is non-NULL, it
>> +	 * also holds a reference to the file. If this function returns
>> +	 * NULL, it does not hold any reference.
>> +	 */
>> +again:
>> +	if (info->task) {
>> +		curr_task = info->task;
>> +	} else {
>> +		curr_task = task_seq_get_next(ns, &curr_tid, true);
>> +		if (!curr_task) {
>> +			info->task = NULL;
>> +			info->tid++;
>> +			return NULL;
>> +		}
>> +
>> +		if (curr_tid != info->tid) {
>> +			info->tid = curr_tid;
>> +			new_task = true;
>> +		}
>> +
>> +		if (!curr_task->mm)
>> +			goto next_task;
>> +		info->task = curr_task;
>> +	}
>> +
>> +	mmap_read_lock(curr_task->mm);
> 
> That will hurt. /proc readers do that and it causes all sorts
> of production issues. We cannot take this lock.
> There is no need to take it.
> Switch the whole thing to probe_read style walking.
> And reimplement find_vma with probe_read while omitting vmacache.
> It will be short rbtree walk.
> bpf prog doesn't need to see a stable struct. It will read it through ptr_to_btf_id
> which will use probe_read equivalent underneath.

rw_semaphore is designed to avoid write starvation, so read_lock should not cause
problem unless the lock was taken for extended period. [1] was a recent fix that 
avoids /proc issue by releasing mmap_lock between iterations. We are using similar
mechanism here. BTW: I changed this to mmap_read_lock_killable() in the next version. 

On the other hand, we need a valid vm_file pointer for bpf_d_path. So walking the 
rbtree without taking any lock would not work. We can avoid taking the lock when 
some SPF like mechanism merged (hopefully soonish). 

Did I miss anything? 

We can improve bpf_iter with some mechanism to specify which task to iterate, so 
that we don't have to iterate through all tasks when the user only want to inspect 
vmas in one task. 

Thanks,
Song

[1] ff9f47f6f00c ("mm: proc: smaps_rollup: do not stall write attempts on mmap_lock")


  reply	other threads:[~2020-12-17 22:09 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-15 23:36 [PATCH v2 bpf-next 0/4] introduce bpf_iter for task_vma Song Liu
2020-12-15 23:36 ` [PATCH v2 bpf-next 1/4] bpf: introduce task_vma bpf_iter Song Liu
2020-12-16 17:36   ` Yonghong Song
2020-12-16 19:41     ` Song Liu
2020-12-17  0:34   ` Andrii Nakryiko
2020-12-17  1:51     ` Song Liu
2020-12-17 19:03   ` Alexei Starovoitov
2020-12-17 22:08     ` Song Liu [this message]
2020-12-18  2:34       ` Alexei Starovoitov
2020-12-18  3:15         ` Yonghong Song
2020-12-18  4:33         ` Song Liu
2020-12-18  5:23           ` Alexei Starovoitov
2020-12-18 16:38             ` Yonghong Song
2020-12-18 17:23               ` Song Liu
2021-01-05  1:46                 ` Alexei Starovoitov
2021-01-05  5:47                   ` Song Liu
2021-01-05 16:27                     ` Alexei Starovoitov
2021-01-05 17:10                       ` Song Liu
2021-01-05 17:27                         ` Alexei Starovoitov
2021-01-05 19:38                           ` Song Liu
2021-01-05 19:46                             ` Alexei Starovoitov
2021-01-05 19:51                               ` Song Liu
2020-12-15 23:37 ` [PATCH v2 bpf-next 2/4] bpf: allow bpf_d_path in sleepable bpf_iter program Song Liu
2020-12-16 17:41   ` Yonghong Song
2020-12-16 18:15   ` KP Singh
2020-12-16 18:31     ` KP Singh
2021-01-25 12:52   ` Jiri Olsa
2020-12-15 23:37 ` [PATCH v2 bpf-next 3/4] libbpf: introduce section "iter.s/" for " Song Liu
2020-12-16 17:42   ` Yonghong Song
2020-12-16 18:00     ` KP Singh
2020-12-15 23:37 ` [PATCH v2 bpf-next 4/4] selftests/bpf: add test for bpf_iter_task_vma Song Liu
2020-12-16 18:18   ` Yonghong Song
2020-12-16 23:23     ` Song Liu
2020-12-16 17:00 ` [PATCH v2 bpf-next 0/4] introduce bpf_iter for task_vma Yonghong Song
2020-12-16 17:35   ` Song Liu
2020-12-16 18:31     ` Yonghong Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=C4D9D25A-C3DD-4081-9EAD-B7A5B6B74F45@fb.com \
    --to=songliubraving@fb.com \
    --cc=Kernel-team@fb.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=john.fastabend@gmail.com \
    --cc=kpsingh@chromium.org \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).