From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0E6DFC433EF for ; Wed, 8 Sep 2021 17:09:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DFD3661131 for ; Wed, 8 Sep 2021 17:09:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232976AbhIHRKq (ORCPT ); Wed, 8 Sep 2021 13:10:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40384 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229689AbhIHRKp (ORCPT ); Wed, 8 Sep 2021 13:10:45 -0400 Received: from mail-ej1-x62c.google.com (mail-ej1-x62c.google.com [IPv6:2a00:1450:4864:20::62c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 02362C061575 for ; Wed, 8 Sep 2021 10:09:36 -0700 (PDT) Received: by mail-ej1-x62c.google.com with SMTP id n27so5604501eja.5 for ; Wed, 08 Sep 2021 10:09:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=oHtgMWyzDNnZhRHsFAyIiwKY5ZE/+saSyjV0LIIBwfU=; b=eIvwXEJ39rNOrLUIF/hRy/ntwr16AcVpFXMYuNAdtXwRRWf3sUpD+swY3FxNZhrs9Q 8INR7pBFdEgzIMAd+EC1dP5QhrZt6zp6MHGbgpNV3mJ71swD8kzoA5+iKR7zGwaetxGi wt4pOOHfX9sXOXPwAN6vRYu4PaSLGc7d6bq5E/x51gdE7pWgMNhi6+Yj3+E22roxynL8 fSeUg7MeAwUCEshyYT5czbBpIpo3sLMPNMXj300GiUM0yos2rEjBeAmt/hZYOkTyxh2y GP3rCTt6OrGXbE0O83R7x5WfJY50HRPP7cKFHi/yaE9xrDVY9IXA7Qd/dOOi2pDcJ5bo +ywQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=oHtgMWyzDNnZhRHsFAyIiwKY5ZE/+saSyjV0LIIBwfU=; b=YMetxqIw1WStPIFwBTJOQAGr5UAQQ3Js9kIL10jv+6YzKwZHeom1zc4GffUGpT3pWF GIPUgu2Q20IzKKnUrhm2vO487r95OSN8MlMXgq+mSYq1WwzsM+/zPe9aafp6eTeRx2qh h4Fjix229ZH+Flx7NdD/R4BWuW+wN/OYDdgKsgmmr3VEUaZNgArit8xlZfJfIXVsPlEv jKfOG9p5npaJ+PhnqgzlJMZ0tkRyH+mX+Lr890jdeEzTRokvyrEMZCg5BeeE6qOUfcCJ G/tmbr0uFtELmUzpOH765Tdb9x5i7YwKo2HsOc/ZEu5vW4OHiFkPlTL3zRJhti7hx/Ko Fxsw== X-Gm-Message-State: AOAM531U1FSlPePSgMMtxccXSrkKRHLDUEifBCm5ux91zy9OhUHDTz5f 8VNYk2PHBDifHJTrCTqHB+MXjkqcAGqwCwfo4Xk8VA== X-Google-Smtp-Source: ABdhPJx27s5VEnYXZCusQFMWhYQE0tirmNpwOb0SmDjaXnGCRGisp+oqqHw4eZjRuvnTq3XKlPLP2bHvXYOOtyVml6w= X-Received: by 2002:a17:907:76ee:: with SMTP id kg14mr956915ejc.90.1631120975299; Wed, 08 Sep 2021 10:09:35 -0700 (PDT) MIME-Version: 1.0 References: <20210908044427.3632119-1-yhs@fb.com> <20210908135326.GZ1200268@ziepe.ca> <20210908151230.m2zyslt4qrufm4bv@revolver> In-Reply-To: From: Luigi Rizzo Date: Wed, 8 Sep 2021 19:09:23 +0200 Message-ID: Subject: Re: [PATCH mm/bpf v2] mm: bpf: add find_vma_no_check() without lockdep_assert on mm->mmap_lock To: Yonghong Song Cc: Liam Howlett , Peter Zijlstra , Jason Gunthorpe , Daniel Borkmann , Michel Lespinasse , bpf , "linux-mm@kvack.org" , Alexei Starovoitov , Andrii Nakryiko , "kernel-team@fb.com" , Andrew Morton Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org On Wed, Sep 8, 2021 at 6:10 PM Yonghong Song wrote: > > > > On 9/8/21 8:12 AM, Liam Howlett wrote: > > * Luigi Rizzo [210908 10:44]: > >> On Wed, Sep 8, 2021 at 4:16 PM Peter Zijlstra wrote: > >>> > >>> On Wed, Sep 08, 2021 at 10:53:26AM -0300, Jason Gunthorpe wrote: > >>>> On Wed, Sep 08, 2021 at 02:20:17PM +0200, Daniel Borkmann wrote: > >>>> > >>>>>> The warning is due to commit 5b78ed24e8ec("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()") > >>>>>> which added mmap_assert_locked() in find_vma() function. The mmap_assert_locked() function > >>>>>> asserts that mm->mmap_lock needs to be held. But this is not the case for > >>>>>> bpf_get_stack() or bpf_get_stackid() helper (kernel/bpf/stackmap.c), which > >>>>>> uses mmap_read_trylock_non_owner() instead. Since mm->mmap_lock is not held > >>>>>> in bpf_get_stack[id]() use case, the above warning is emitted during test run. > >> ... > >>>>> Luigi / Liam / Andrew, if the below looks reasonable to you, any objections to route the > >>>>> fix with your ACKs via bpf tree to Linus (or strong preference via -mm fixes)? > >>>> > >>>> Michel added this remark along with the mmap_read_trylock_non_owner: > >>>> > >>>> It's still not ideal that bpf/stackmap subverts the lock ownership in this > >>>> way. Thanks to Peter Zijlstra for suggesting this API as the least-ugly > >>>> way of addressing this in the short term. > >>>> > >>>> Subverting lockdep and then adding more and more core MM APIs to > >>>> support this seems quite a bit more ugly than originally expected. > >>>> > >>>> Michel's original idea to split out the lockdep abuse and put it only > >>>> in BPF is probably better. Obtain the mmap_read_trylock normally as > >>>> owner and then release ownership only before triggering the work. At > >>>> least lockdep will continue to work properly for the find_vma. > >>> > >>> The only right solution to all of this is the below. That function > >>> downright subverts all the locking rules we have. Spreading the hacks > >>> any further than that one function is absolutely unacceptable. > >> > >> I'd be inclined to agree that we should not introduce hacks around > >> locking rules. I don't know enough about the constraints of > >> bpf/stackmap, how much of a performance penalty do we pay with Peter's > >> patch, > >> and ow often one is expected to call this function ? > >> > >> cheers > >> luigi > > > > The hack already exists. The symptom of the larger issue is that > > lockdep now catches the hack when using find_vma(). > > > > If Peter's solution is acceptable to the bpf folks, then we can go ahead > > and drop the option of using the non_owner variant - which would be > > best. Otherwise the hack around the locking rule still exists as long > > as the find_vma() interface isn't used. > > Hi, Peter, Luigi, Liam, Jason, > > Peter's solution will definitely break user applications using > BPF_F_USER_BUILD_ID feature Again I am ignorant on the details so if you can clarify the following it may help me and others to better understand the problem: 1. Peter's patch appears to just take the same "fallback" path that would be taken if the trylock failed. Is this really a breakage or just loss of performance ? I would expect the latter, since it is called "fallback". 2. Assuming it is just loss of performance, it becomes important to clarify how much we are losing, and whether this is something to worry about, or it is only some rarely used function. Things like: - total CPU cost + latency difference (including deferred work, if any) in the "trylock ok" and "trylock failed" cases. I have no idea but since there is a trylock involved, I suppose we are in O(500ns) territory (when cold or contended) - how often we'd expect calls to the bpf helpers involved (BPF_F_USER_BUILD_ID in bpf_get_stack() or bpf_get_stack_pe()) ? thanks luigi From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4BFE9C433F5 for ; Wed, 8 Sep 2021 17:09:39 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id B938561157 for ; Wed, 8 Sep 2021 17:09:38 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org B938561157 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id D6F1F6B0071; Wed, 8 Sep 2021 13:09:37 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id CF79D6B0072; Wed, 8 Sep 2021 13:09:37 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BBFBD900002; Wed, 8 Sep 2021 13:09:37 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0030.hostedemail.com [216.40.44.30]) by kanga.kvack.org (Postfix) with ESMTP id AA0BD6B0071 for ; Wed, 8 Sep 2021 13:09:37 -0400 (EDT) Received: from smtpin27.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 4A770183F96A3 for ; Wed, 8 Sep 2021 17:09:37 +0000 (UTC) X-FDA: 78565042794.27.5766F5D Received: from mail-ej1-f49.google.com (mail-ej1-f49.google.com [209.85.218.49]) by imf28.hostedemail.com (Postfix) with ESMTP id DBFD990000A5 for ; Wed, 8 Sep 2021 17:09:36 +0000 (UTC) Received: by mail-ej1-f49.google.com with SMTP id jg16so5638758ejc.1 for ; Wed, 08 Sep 2021 10:09:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=oHtgMWyzDNnZhRHsFAyIiwKY5ZE/+saSyjV0LIIBwfU=; b=eIvwXEJ39rNOrLUIF/hRy/ntwr16AcVpFXMYuNAdtXwRRWf3sUpD+swY3FxNZhrs9Q 8INR7pBFdEgzIMAd+EC1dP5QhrZt6zp6MHGbgpNV3mJ71swD8kzoA5+iKR7zGwaetxGi wt4pOOHfX9sXOXPwAN6vRYu4PaSLGc7d6bq5E/x51gdE7pWgMNhi6+Yj3+E22roxynL8 fSeUg7MeAwUCEshyYT5czbBpIpo3sLMPNMXj300GiUM0yos2rEjBeAmt/hZYOkTyxh2y GP3rCTt6OrGXbE0O83R7x5WfJY50HRPP7cKFHi/yaE9xrDVY9IXA7Qd/dOOi2pDcJ5bo +ywQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=oHtgMWyzDNnZhRHsFAyIiwKY5ZE/+saSyjV0LIIBwfU=; b=Fj5bzyMsB5dKb6e1BiVMELq0kj0NXutAfqFrpgLjYnbh2ko8YTBMmv+dOaYEjDo7M+ UXPDl8e1Hw/2Ynk2B647yxNkenanW066yUY+dSRgANIpiAGoD+ymrhRI/0en+ROndMUd gyOq7dBEb2I/tJ7v1D7NtkJUBe7IV5uZojcyY0WpW1PFyMY5rsLiJsntSKq2lXiiu5uK 0umEmAn9AawBgcOixRWqGpYGvn/jlR70yDmFH0JepuefAeB0bni0/3y+uVahXB2aVf9D CjZD3GR4F3bVCoPtjfdYbDiByalDuf9QRlICW7y+EJw7gOeUnENx+M1jfuJTX9TuIgxk Wdsw== X-Gm-Message-State: AOAM531igNimtqfyVeo3SuYJWXTM4MtqYpYTsvahiJfefo4d6lR8dQNY L8r8ZFohGHS18sqUz+I8Om7aR8iP60RI2hHZo7Ppqg== X-Google-Smtp-Source: ABdhPJx27s5VEnYXZCusQFMWhYQE0tirmNpwOb0SmDjaXnGCRGisp+oqqHw4eZjRuvnTq3XKlPLP2bHvXYOOtyVml6w= X-Received: by 2002:a17:907:76ee:: with SMTP id kg14mr956915ejc.90.1631120975299; Wed, 08 Sep 2021 10:09:35 -0700 (PDT) MIME-Version: 1.0 References: <20210908044427.3632119-1-yhs@fb.com> <20210908135326.GZ1200268@ziepe.ca> <20210908151230.m2zyslt4qrufm4bv@revolver> In-Reply-To: From: Luigi Rizzo Date: Wed, 8 Sep 2021 19:09:23 +0200 Message-ID: Subject: Re: [PATCH mm/bpf v2] mm: bpf: add find_vma_no_check() without lockdep_assert on mm->mmap_lock To: Yonghong Song Cc: Liam Howlett , Peter Zijlstra , Jason Gunthorpe , Daniel Borkmann , Michel Lespinasse , bpf , "linux-mm@kvack.org" , Alexei Starovoitov , Andrii Nakryiko , "kernel-team@fb.com" , Andrew Morton Content-Type: text/plain; charset="UTF-8" X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: DBFD990000A5 X-Stat-Signature: y8i971sejpzb8c5tptkwqsb9z4hep8fk Authentication-Results: imf28.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=eIvwXEJ3; spf=pass (imf28.hostedemail.com: domain of lrizzo@google.com designates 209.85.218.49 as permitted sender) smtp.mailfrom=lrizzo@google.com; dmarc=pass (policy=reject) header.from=google.com X-HE-Tag: 1631120976-806675 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Sep 8, 2021 at 6:10 PM Yonghong Song wrote: > > > > On 9/8/21 8:12 AM, Liam Howlett wrote: > > * Luigi Rizzo [210908 10:44]: > >> On Wed, Sep 8, 2021 at 4:16 PM Peter Zijlstra wrote: > >>> > >>> On Wed, Sep 08, 2021 at 10:53:26AM -0300, Jason Gunthorpe wrote: > >>>> On Wed, Sep 08, 2021 at 02:20:17PM +0200, Daniel Borkmann wrote: > >>>> > >>>>>> The warning is due to commit 5b78ed24e8ec("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()") > >>>>>> which added mmap_assert_locked() in find_vma() function. The mmap_assert_locked() function > >>>>>> asserts that mm->mmap_lock needs to be held. But this is not the case for > >>>>>> bpf_get_stack() or bpf_get_stackid() helper (kernel/bpf/stackmap.c), which > >>>>>> uses mmap_read_trylock_non_owner() instead. Since mm->mmap_lock is not held > >>>>>> in bpf_get_stack[id]() use case, the above warning is emitted during test run. > >> ... > >>>>> Luigi / Liam / Andrew, if the below looks reasonable to you, any objections to route the > >>>>> fix with your ACKs via bpf tree to Linus (or strong preference via -mm fixes)? > >>>> > >>>> Michel added this remark along with the mmap_read_trylock_non_owner: > >>>> > >>>> It's still not ideal that bpf/stackmap subverts the lock ownership in this > >>>> way. Thanks to Peter Zijlstra for suggesting this API as the least-ugly > >>>> way of addressing this in the short term. > >>>> > >>>> Subverting lockdep and then adding more and more core MM APIs to > >>>> support this seems quite a bit more ugly than originally expected. > >>>> > >>>> Michel's original idea to split out the lockdep abuse and put it only > >>>> in BPF is probably better. Obtain the mmap_read_trylock normally as > >>>> owner and then release ownership only before triggering the work. At > >>>> least lockdep will continue to work properly for the find_vma. > >>> > >>> The only right solution to all of this is the below. That function > >>> downright subverts all the locking rules we have. Spreading the hacks > >>> any further than that one function is absolutely unacceptable. > >> > >> I'd be inclined to agree that we should not introduce hacks around > >> locking rules. I don't know enough about the constraints of > >> bpf/stackmap, how much of a performance penalty do we pay with Peter's > >> patch, > >> and ow often one is expected to call this function ? > >> > >> cheers > >> luigi > > > > The hack already exists. The symptom of the larger issue is that > > lockdep now catches the hack when using find_vma(). > > > > If Peter's solution is acceptable to the bpf folks, then we can go ahead > > and drop the option of using the non_owner variant - which would be > > best. Otherwise the hack around the locking rule still exists as long > > as the find_vma() interface isn't used. > > Hi, Peter, Luigi, Liam, Jason, > > Peter's solution will definitely break user applications using > BPF_F_USER_BUILD_ID feature Again I am ignorant on the details so if you can clarify the following it may help me and others to better understand the problem: 1. Peter's patch appears to just take the same "fallback" path that would be taken if the trylock failed. Is this really a breakage or just loss of performance ? I would expect the latter, since it is called "fallback". 2. Assuming it is just loss of performance, it becomes important to clarify how much we are losing, and whether this is something to worry about, or it is only some rarely used function. Things like: - total CPU cost + latency difference (including deferred work, if any) in the "trylock ok" and "trylock failed" cases. I have no idea but since there is a trylock involved, I suppose we are in O(500ns) territory (when cold or contended) - how often we'd expect calls to the bpf helpers involved (BPF_F_USER_BUILD_ID in bpf_get_stack() or bpf_get_stack_pe()) ? thanks luigi