All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"ast@kernel.org" <ast@kernel.org>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	"andrii@kernel.org" <andrii@kernel.org>,
	"john.fastabend@gmail.com" <john.fastabend@gmail.com>,
	"kpsingh@chromium.org" <kpsingh@chromium.org>,
	Kernel Team <Kernel-team@fb.com>
Subject: Re: [PATCH v2 bpf-next 1/4] bpf: introduce task_vma bpf_iter
Date: Tue, 5 Jan 2021 09:27:45 -0800	[thread overview]
Message-ID: <CAADnVQLg-kQ9Neva0DvUW8CMiuNhv0HTHdsV5fgV8+ra98wE5w@mail.gmail.com> (raw)
In-Reply-To: <EB23A240-8A1B-468B-86C8-CF372FE745C5@fb.com>

On Tue, Jan 5, 2021 at 9:11 AM Song Liu <songliubraving@fb.com> wrote:
>
>
>
> > On Jan 5, 2021, at 8:27 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, Jan 4, 2021 at 9:47 PM Song Liu <songliubraving@fb.com> wrote:
> >>
> >>
> >>
> >>> On Jan 4, 2021, at 5:46 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >>>
> >>> On Fri, Dec 18, 2020 at 05:23:25PM +0000, Song Liu wrote:
> >>>>
> >>>>
> >>>>> On Dec 18, 2020, at 8:38 AM, Yonghong Song <yhs@fb.com> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 12/17/20 9:23 PM, Alexei Starovoitov wrote:
> >>>>>> On Thu, Dec 17, 2020 at 8:33 PM Song Liu <songliubraving@fb.com> wrote:
> >>>>>>>>
> >>>>>>>> ahh. I missed that. Makes sense.
> >>>>>>>> vm_file needs to be accurate, but vm_area_struct should be accessed as ptr_to_btf_id.
> >>>>>>>
> >>>>>>> Passing pointer of vm_area_struct into BPF will be tricky. For example, shall we
> >>>>>>> allow the user to access vma->vm_file? IIUC, with ptr_to_btf_id the verifier will
> >>>>>>> allow access of vma->vm_file as a valid pointer to struct file. However, since the
> >>>>>>> vma might be freed, vma->vm_file could point to random data.
> >>>>>> I don't think so. The proposed patch will do get_file() on it.
> >>>>>> There is actually no need to assign it into a different variable.
> >>>>>> Accessing it via vma->vm_file is safe and cleaner.
> >>>>>
> >>>>> I did not check the code but do you have scenarios where vma is freed but old vma->vm_file is not freed due to reference counting, but
> >>>>> freed vma area is reused so vma->vm_file could be garbage?
> >>>>
> >>>> AFAIK, once we unlock mmap_sem, the vma could be freed and reused. I guess ptr_to_btf_id
> >>>> or probe_read would not help with this?
> >>>
> >>> Theoretically we can hack the verifier to treat some ptr_to_btf_id as "less
> >>> valid" than the other ptr_to_btf_id, but the user experience will not be great.
> >>> Reading such bpf prog will not be obvious. I think it's better to run bpf prog
> >>> in mmap_lock then and let it access vma->vm_file. After prog finishes the iter
> >>> bit can do if (mmap_lock_is_contended()) before iterating. That will deliver
> >>> better performance too. Instead of task_vma_seq_get_next() doing
> >>> mmap_lock/unlock at every vma. No need for get_file() either. And no
> >>> __vm_area_struct exposure.
> >>
> >> I think there might be concern calling BPF program with mmap_lock, especially that
> >> the program is sleepable (for bpf_d_path). It shouldn't be a problem for common
> >> cases, but I am not 100% sure for corner cases (many instructions in BPF + sleep).
> >> Current version is designed to be very safe for the workload, which might be too
> >> conservative.
> >
> > I know and I agree with all that, but how do you propose to fix the
> > vm_file concern
> > without holding the lock while prog is running? I couldn't come up with a way.
>
> I guess the gap here is that I don't see why __vm_area_struct exposure is an issue.
> It is similar to __sk_buff, and simpler (though we had more reasons to introduce
> __sk_buff back when there wasn't BTF).
>
> If we can accept __vm_area_struct, current version should work, as it doesn't have
> problem with vm_file

True. The problem with __vm_area_struct is that it is a hard coded
uapi with little to none
extensibility. In this form vma iterator is not really useful beyond
the example in selftest.

  reply	other threads:[~2021-01-05 17:28 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-15 23:36 [PATCH v2 bpf-next 0/4] introduce bpf_iter for task_vma Song Liu
2020-12-15 23:36 ` [PATCH v2 bpf-next 1/4] bpf: introduce task_vma bpf_iter Song Liu
2020-12-16 17:36   ` Yonghong Song
2020-12-16 19:41     ` Song Liu
2020-12-17  0:34   ` Andrii Nakryiko
2020-12-17  1:51     ` Song Liu
2020-12-17 19:03   ` Alexei Starovoitov
2020-12-17 22:08     ` Song Liu
2020-12-18  2:34       ` Alexei Starovoitov
2020-12-18  3:15         ` Yonghong Song
2020-12-18  4:33         ` Song Liu
2020-12-18  5:23           ` Alexei Starovoitov
2020-12-18 16:38             ` Yonghong Song
2020-12-18 17:23               ` Song Liu
2021-01-05  1:46                 ` Alexei Starovoitov
2021-01-05  5:47                   ` Song Liu
2021-01-05 16:27                     ` Alexei Starovoitov
2021-01-05 17:10                       ` Song Liu
2021-01-05 17:27                         ` Alexei Starovoitov [this message]
2021-01-05 19:38                           ` Song Liu
2021-01-05 19:46                             ` Alexei Starovoitov
2021-01-05 19:51                               ` Song Liu
2020-12-15 23:37 ` [PATCH v2 bpf-next 2/4] bpf: allow bpf_d_path in sleepable bpf_iter program Song Liu
2020-12-16 17:41   ` Yonghong Song
2020-12-16 18:15   ` KP Singh
2020-12-16 18:31     ` KP Singh
2021-01-25 12:52   ` Jiri Olsa
2020-12-15 23:37 ` [PATCH v2 bpf-next 3/4] libbpf: introduce section "iter.s/" for " Song Liu
2020-12-16 17:42   ` Yonghong Song
2020-12-16 18:00     ` KP Singh
2020-12-15 23:37 ` [PATCH v2 bpf-next 4/4] selftests/bpf: add test for bpf_iter_task_vma Song Liu
2020-12-16 18:18   ` Yonghong Song
2020-12-16 23:23     ` Song Liu
2020-12-16 17:00 ` [PATCH v2 bpf-next 0/4] introduce bpf_iter for task_vma Yonghong Song
2020-12-16 17:35   ` Song Liu
2020-12-16 18:31     ` Yonghong Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAADnVQLg-kQ9Neva0DvUW8CMiuNhv0HTHdsV5fgV8+ra98wE5w@mail.gmail.com \
    --to=alexei.starovoitov@gmail.com \
    --cc=Kernel-team@fb.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=john.fastabend@gmail.com \
    --cc=kpsingh@chromium.org \
    --cc=netdev@vger.kernel.org \
    --cc=songliubraving@fb.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.