All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Luigi Rizzo <lrizzo@google.com>
Cc: Yonghong Song <yhs@fb.com>,
	Liam Howlett <liam.howlett@oracle.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Michel Lespinasse <walken@google.com>, bpf <bpf@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	"kernel-team@fb.com" <kernel-team@fb.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH mm/bpf v2] mm: bpf: add find_vma_no_check() without lockdep_assert on mm->mmap_lock
Date: Wed, 8 Sep 2021 10:21:18 -0700	[thread overview]
Message-ID: <20210908172118.n2f4w7epm6hh62zf@ast-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <CAMOZA0+2KLgYTXDZHGUYFnYezee=_hH6kFVM+-n2ZQuFTfh6yg@mail.gmail.com>

On Wed, Sep 08, 2021 at 07:09:23PM +0200, Luigi Rizzo wrote:
> On Wed, Sep 8, 2021 at 6:10 PM Yonghong Song <yhs@fb.com> wrote:
> >
> >
> >
> > On 9/8/21 8:12 AM, Liam Howlett wrote:
> > > * Luigi Rizzo <lrizzo@google.com> [210908 10:44]:
> > >> On Wed, Sep 8, 2021 at 4:16 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > >>>
> > >>> On Wed, Sep 08, 2021 at 10:53:26AM -0300, Jason Gunthorpe wrote:
> > >>>> On Wed, Sep 08, 2021 at 02:20:17PM +0200, Daniel Borkmann wrote:
> > >>>>
> > >>>>>> The warning is due to commit 5b78ed24e8ec("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()")
> > >>>>>> which added mmap_assert_locked() in find_vma() function. The mmap_assert_locked() function
> > >>>>>> asserts that mm->mmap_lock needs to be held. But this is not the case for
> > >>>>>> bpf_get_stack() or bpf_get_stackid() helper (kernel/bpf/stackmap.c), which
> > >>>>>> uses mmap_read_trylock_non_owner() instead. Since mm->mmap_lock is not held
> > >>>>>> in bpf_get_stack[id]() use case, the above warning is emitted during test run.
> > >> ...
> > >>>>> Luigi / Liam / Andrew, if the below looks reasonable to you, any objections to route the
> > >>>>> fix with your ACKs via bpf tree to Linus (or strong preference via -mm fixes)?
> > >>>>
> > >>>> Michel added this remark along with the mmap_read_trylock_non_owner:
> > >>>>
> > >>>>      It's still not ideal that bpf/stackmap subverts the lock ownership in this
> > >>>>      way.  Thanks to Peter Zijlstra for suggesting this API as the least-ugly
> > >>>>      way of addressing this in the short term.
> > >>>>
> > >>>> Subverting lockdep and then adding more and more core MM APIs to
> > >>>> support this seems quite a bit more ugly than originally expected.
> > >>>>
> > >>>> Michel's original idea to split out the lockdep abuse and put it only
> > >>>> in BPF is probably better. Obtain the mmap_read_trylock normally as
> > >>>> owner and then release ownership only before triggering the work. At
> > >>>> least lockdep will continue to work properly for the find_vma.
> > >>>
> > >>> The only right solution to all of this is the below. That function
> > >>> downright subverts all the locking rules we have. Spreading the hacks
> > >>> any further than that one function is absolutely unacceptable.
> > >>
> > >> I'd be inclined to agree that we should not introduce hacks around
> > >> locking rules. I don't know enough about the constraints of
> > >> bpf/stackmap, how much of a performance penalty do we pay with Peter's
> > >> patch,
> > >> and ow often one is expected to call this function ?
> > >>
> > >> cheers
> > >> luigi
> > >
> > > The hack already exists.  The symptom of the larger issue is that
> > > lockdep now catches the hack when using find_vma().
> > >
> > > If Peter's solution is acceptable to the bpf folks, then we can go ahead
> > > and drop the option of using the non_owner variant - which would be
> > > best.  Otherwise the hack around the locking rule still exists as long
> > > as the find_vma() interface isn't used.
> >
> > Hi, Peter, Luigi, Liam, Jason,
> >
> > Peter's solution will definitely break user applications using
> > BPF_F_USER_BUILD_ID feature
> 
> Again I am ignorant on the details so if you can clarify the following
> it may help me and others to better understand the problem:
> 
> 1. Peter's patch appears to just take the same "fallback" path
>    that would be taken if the trylock failed.
>    Is this really a breakage or just loss of performance ?
>    I would expect the latter, since it is called "fallback".

As Yonghong explained it's a user space breakage.
User space tooling expects build_id to be available 99.999% of the time
and that's what users observed in practice.
They've built a bunch of tools on top of this feature.
The data from these tools goes into various datacenter tables
and humans analyze it later.
So Peter's proposal is not acceptable. We don't want to get yelled at.

> 2. Assuming it is just loss of performance, it becomes important to

It's nothing to do with performance.

  reply	other threads:[~2021-09-08 17:21 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-08  4:44 [PATCH mm/bpf v2] mm: bpf: add find_vma_no_check() without lockdep_assert on mm->mmap_lock Yonghong Song
2021-09-08 12:20 ` Daniel Borkmann
2021-09-08 13:53   ` Jason Gunthorpe
2021-09-08 14:15     ` Peter Zijlstra
2021-09-08 14:43       ` Luigi Rizzo
2021-09-08 14:43         ` Luigi Rizzo
2021-09-08 15:12         ` Liam Howlett
2021-09-08 16:09           ` Yonghong Song
2021-09-08 17:09             ` Luigi Rizzo
2021-09-08 17:09               ` Luigi Rizzo
2021-09-08 17:21               ` Alexei Starovoitov [this message]
2021-09-08 17:52                 ` Andrew Morton
2021-09-08 18:02                   ` Alexei Starovoitov
2021-09-08 18:15                     ` Andrew Morton
2021-09-08 18:20                       ` Alexei Starovoitov
2021-09-08 18:20                         ` Alexei Starovoitov
2021-09-08 18:30                         ` Liam Howlett
2021-09-08 18:45                           ` Yonghong Song
2021-09-08 18:49                           ` Jason Gunthorpe
2021-09-08 19:11                             ` Yonghong Song
2021-09-08 23:33                               ` Jason Gunthorpe
2021-09-09  5:50                                 ` Yonghong Song
2021-09-09  8:05                           ` Peter Zijlstra
2021-09-08 18:43                     ` Liam Howlett
2021-09-08 19:42                       ` Alexei Starovoitov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210908172118.n2f4w7epm6hh62zf@ast-mbp.dhcp.thefacebook.com \
    --to=alexei.starovoitov@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=jgg@ziepe.ca \
    --cc=kernel-team@fb.com \
    --cc=liam.howlett@oracle.com \
    --cc=linux-mm@kvack.org \
    --cc=lrizzo@google.com \
    --cc=peterz@infradead.org \
    --cc=walken@google.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.