From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8F644C43215 for ; Wed, 13 Nov 2019 18:46:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id B9E77206F0 for ; Wed, 13 Nov 2019 18:46:05 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=intel-com.20150623.gappssmtp.com header.i=@intel-com.20150623.gappssmtp.com header.b="a7A+dcs4" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726409AbfKMSqC (ORCPT ); Wed, 13 Nov 2019 13:46:02 -0500 Received: from mail-ot1-f65.google.com ([209.85.210.65]:37305 "EHLO mail-ot1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727049AbfKMSqC (ORCPT ); Wed, 13 Nov 2019 13:46:02 -0500 Received: by mail-ot1-f65.google.com with SMTP id d5so2554816otp.4 for ; Wed, 13 Nov 2019 10:46:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=intel-com.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=cT4Ynp7AABnbetkKbYtY9L84HJMiir4x9aUvgO3qIkA=; b=a7A+dcs4qLyRwykFz5P8aF/5gHL9KK36xtIfXVr/kBk3uPlOEckoeys38y+nnoealE xzZU97hBkbVqSDW3gJxcm8meBZzrkQaKb0fOpplJk89VtBOEN/d/rvjF/p1FfJJqTTXv 4RFlHCPjmONBQGpTT3WPud0MHuI1arLibsaS9qkCPzQA4NQZsJVxDT/hq2PSgxZUbK6j hmavUnr3qxAC5dpB11ue+Kc8RjftT0go7LTB3CRWT06X6W6/Vr35u8YiRNHJv46YGe1d 7YKSAbhOh/ee5b8uqRA8DDWS3/IOwVUmAGMREeaTodHgoCOvb7Na9VVd6Z5oKT3OMDr8 TTJg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=cT4Ynp7AABnbetkKbYtY9L84HJMiir4x9aUvgO3qIkA=; b=h3Vy+pCVBUplcpqmLsJq2E4TF9mJlqtNcf26+rtRUkhjBjGW/r99cPKTo1hRf1CG0P 1VPNNCYOt7URAXBh8Hpg+EoCxfF9VKLzUBmSbHWozwJqnG2CEi16AYtlt3ykzKgIS1IB Wt+tmPlYjxRdFFVRN1rpLB3pYPer1OhRiJbdKFdoT5S07ZN0yoIEmpRs3Zp8UT06h4jQ mISvCD/NLt6I8Rn27NzPHTOceSA/EveLbeZh3lmF2ewHpHxIAdwH+JN/f7+8I7lhq9AT o5ZgQEINMxA05XqqNI4YuHXNWepDdN0/x/ehCC3w1wCcv91dgolO2IPS+dsJ6kC2QZ/D ATSg== X-Gm-Message-State: APjAAAUfp8Oqv3I0KYu8s6Pdmy+HJqaX89JGEvHh3zVAOSXi9ckb+WFW YJKYPU3Cxn+J3RjbGRHP1FbjJUyeaQqn3uv7A1IamQ== X-Google-Smtp-Source: APXvYqyPn+rDZrNle0VfTIMlvvY+/KynBCjFHBDHqyFwY/x5q0B2O33H3JfKZdicH8PxUo9GG3nCazg6kogrLq1IvZ4= X-Received: by 2002:a05:6830:1af7:: with SMTP id c23mr4066831otd.247.1573670761142; Wed, 13 Nov 2019 10:46:01 -0800 (PST) MIME-Version: 1.0 References: <20191113042710.3997854-1-jhubbard@nvidia.com> <20191113042710.3997854-10-jhubbard@nvidia.com> <20191113104308.GE6367@quack2.suse.cz> In-Reply-To: <20191113104308.GE6367@quack2.suse.cz> From: Dan Williams Date: Wed, 13 Nov 2019 10:45:49 -0800 Message-ID: Subject: Re: [PATCH v4 09/23] mm/gup: introduce pin_user_pages*() and FOLL_PIN To: Jan Kara Cc: John Hubbard , Andrew Morton , Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?UTF-8?B?QmrDtnJuIFTDtnBlbA==?= , Christoph Hellwig , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?UTF-8?B?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , bpf@vger.kernel.org, Maling list - DRI developers , KVM list , linux-block@vger.kernel.org, Linux Doc Mailing List , linux-fsdevel , linux-kselftest@vger.kernel.org, "Linux-media@vger.kernel.org" , linux-rdma , linuxppc-dev , Netdev , Linux MM , LKML , Mike Rapoport Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: bpf-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org On Wed, Nov 13, 2019 at 2:43 AM Jan Kara wrote: > > On Tue 12-11-19 20:26:56, John Hubbard wrote: > > Introduce pin_user_pages*() variations of get_user_pages*() calls, > > and also pin_longterm_pages*() variations. > > > > These variants all set FOLL_PIN, which is also introduced, and > > thoroughly documented. > > > > The pin_longterm*() variants also set FOLL_LONGTERM, in addition > > to FOLL_PIN: > > > > pin_user_pages() > > pin_user_pages_remote() > > pin_user_pages_fast() > > > > pin_longterm_pages() > > pin_longterm_pages_remote() > > pin_longterm_pages_fast() > > > > All pages that are pinned via the above calls, must be unpinned via > > put_user_page(). > > > > The underlying rules are: > > > > * These are gup-internal flags, so the call sites should not directly > > set FOLL_PIN nor FOLL_LONGTERM. That behavior is enforced with > > assertions, for the new FOLL_PIN flag. However, for the pre-existing > > FOLL_LONGTERM flag, which has some call sites that still directly > > set FOLL_LONGTERM, there is no assertion yet. > > > > * Call sites that want to indicate that they are going to do DirectIO > > ("DIO") or something with similar characteristics, should call a > > get_user_pages()-like wrapper call that sets FOLL_PIN. These wrappers > > will: > > * Start with "pin_user_pages" instead of "get_user_pages". That > > makes it easy to find and audit the call sites. > > * Set FOLL_PIN > > > > * For pages that are received via FOLL_PIN, those pages must be returne= d > > via put_user_page(). > > > > Thanks to Jan Kara and Vlastimil Babka for explaining the 4 cases > > in this documentation. (I've reworded it and expanded upon it.) > > > > Reviewed-by: Mike Rapoport # Documentation > > Reviewed-by: J=C3=A9r=C3=B4me Glisse > > Cc: Jonathan Corbet > > Cc: Ira Weiny > > Signed-off-by: John Hubbard > > Thanks for the documentation. It looks great! > > > diff --git a/mm/gup.c b/mm/gup.c > > index 83702b2e86c8..4409e84dff51 100644 > > --- a/mm/gup.c > > +++ b/mm/gup.c > > @@ -201,6 +201,10 @@ static struct page *follow_page_pte(struct vm_area= _struct *vma, > > spinlock_t *ptl; > > pte_t *ptep, pte; > > > > + /* FOLL_GET and FOLL_PIN are mutually exclusive. */ > > + if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) =3D=3D > > + (FOLL_PIN | FOLL_GET))) > > + return ERR_PTR(-EINVAL); > > retry: > > if (unlikely(pmd_bad(*pmd))) > > return no_page_table(vma, flags); > > How does FOLL_PIN result in grabbing (at least normal, for now) page refe= rence? > I didn't find that anywhere in this patch but it is a prerequisite to > converting any user to pin_user_pages() interface, right? > > > +/** > > + * pin_user_pages_fast() - pin user pages in memory without taking loc= ks > > + * > > + * Nearly the same as get_user_pages_fast(), except that FOLL_PIN is s= et. See > > + * get_user_pages_fast() for documentation on the function arguments, = because > > + * the arguments here are identical. > > + * > > + * FOLL_PIN means that the pages must be released via put_user_page().= Please > > + * see Documentation/vm/pin_user_pages.rst for further details. > > + * > > + * This is intended for Case 1 (DIO) in Documentation/vm/pin_user_page= s.rst. It > > + * is NOT intended for Case 2 (RDMA: long-term pins). > > + */ > > +int pin_user_pages_fast(unsigned long start, int nr_pages, > > + unsigned int gup_flags, struct page **pages) > > +{ > > + /* FOLL_GET and FOLL_PIN are mutually exclusive. */ > > + if (WARN_ON_ONCE(gup_flags & FOLL_GET)) > > + return -EINVAL; > > + > > + gup_flags |=3D FOLL_PIN; > > + return internal_get_user_pages_fast(start, nr_pages, gup_flags, p= ages); > > +} > > +EXPORT_SYMBOL_GPL(pin_user_pages_fast); > > I was somewhat wondering about the number of functions you add here. So w= e > have: > > pin_user_pages() > pin_user_pages_fast() > pin_user_pages_remote() > > and then longterm variants: > > pin_longterm_pages() > pin_longterm_pages_fast() > pin_longterm_pages_remote() > > and obviously we have gup like: > get_user_pages() > get_user_pages_fast() > get_user_pages_remote() > ... and some other gup variants ... > > I think we really should have pin_* vs get_* variants as they are very > different in terms of guarantees and after conversion, any use of get_* > variant in non-mm code should be closely scrutinized. OTOH pin_longterm_* > don't look *that* useful to me and just using pin_* instead with > FOLL_LONGTERM flag would look OK to me and somewhat reduce the number of > functions which is already large enough? What do people think? I don't fe= el > too strongly about this but wanted to bring this up. I'd vote for FOLL_LONGTERM should obviate the need for {get,pin}_user_pages_longterm(). It's a property that is passed by the call site, not an internal flag.