Date: Tue, 8 Dec 2020 15:49:05 -0400
From: Jason Gunthorpe
To: Joao Martins
Cc: linux-mm@kvack.org, Dan Williams, Ira Weiny,
 linux-nvdimm@lists.01.org, Matthew Wilcox, Jane Chu, Muchun Song,
 Mike Kravetz, Andrew Morton
Subject: Re: [PATCH RFC 6/9] mm/gup: Grab head page refcount once for group of subpages
Message-ID: <20201208194905.GQ5487@ziepe.ca>
References: <20201208172901.17384-1-joao.m.martins@oracle.com>
 <20201208172901.17384-8-joao.m.martins@oracle.com>
In-Reply-To: <20201208172901.17384-8-joao.m.martins@oracle.com>

On Tue, Dec 08, 2020 at 05:28:58PM +0000, Joao Martins wrote:
> Much like hugetlbfs or THPs, we treat device pagemaps with
> compound pages like the rest of GUP handling of compound pages.
>
> Rather than incrementing the refcount every 4K, we record
> all sub pages and increment by @refs amount *once*.
>
> Performance measured by gup_benchmark improves considerably
> get_user_pages_fast() and pin_user_pages_fast():
>
>  $ gup_benchmark -f /dev/dax0.2 -m 16384 -r 10 -S [-u,-a] -n 512 -w
>
> (get_user_pages_fast 2M pages) ~75k us -> ~3.6k us
> (pin_user_pages_fast 2M pages) ~125k us -> ~3.8k us
>
> Signed-off-by: Joao Martins
>  mm/gup.c | 67 ++++++++++++++++++++++++++++++++++++++++++--------------
>  1 file changed, 51 insertions(+), 16 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 98eb8e6d2609..194e6981eb03 100644
> +++ b/mm/gup.c
> @@ -2250,22 +2250,68 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>  }
>  #endif /* CONFIG_ARCH_HAS_PTE_SPECIAL */
>
> +
> +static int record_subpages(struct page *page, unsigned long addr,
> +			   unsigned long end, struct page **pages)
> +{
> +	int nr;
> +
> +	for (nr = 0; addr != end; addr += PAGE_SIZE)
> +		pages[nr++] = page++;
> +
> +	return nr;
> +}
> +
>  #if defined(CONFIG_ARCH_HAS_PTE_DEVMAP) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
> -static int __gup_device_huge(unsigned long pfn, unsigned long addr,
> -			     unsigned long end, unsigned int flags,
> -			     struct page **pages, int *nr)
> +static int __gup_device_compound_huge(struct dev_pagemap *pgmap,
> +				      struct page *head, unsigned long sz,
> +				      unsigned long addr, unsigned long end,
> +				      unsigned int flags, struct page **pages)
> +{
> +	struct page *page;
> +	int refs;
> +
> +	if (!(pgmap->flags & PGMAP_COMPOUND))
> +		return -1;
> +
> +	page = head + ((addr & (sz-1)) >> PAGE_SHIFT);

All the places that call record_subpages() do some kind of maths like
this; it should be placed inside record_subpages() and not open-coded
everywhere.

> +	refs = record_subpages(page, addr, end, pages);
> +
> +	SetPageReferenced(page);
> +	head = try_grab_compound_head(head, refs, flags);
> +	if (!head) {
> +		ClearPageReferenced(page);
> +		return 0;
> +	}
> +
> +	return refs;
> +}

Why is all of this special?
Any time we see a PMD/PGD/etc pointing to a PFN we can apply this
optimization. How come device has its own special path to do this??

Why do we need to check PGMAP_COMPOUND? Why do we need to get pgmap?
(We already removed that from the hmm version of this, was that wrong?
Is this different?) Dan?

Also undo_dev_pagemap() is now out of date: we have unpin_user_pages()
for that, and no other error unwind touches ClearPageReferenced.

Basic idea is good though!

Jason
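
For illustration only, the record_subpages() suggestion above could be
sketched roughly as below. This is a userspace model, not kernel code
and not a patch from this thread: the extra head/sz parameters are an
assumed signature, struct page is a dummy stand-in used purely for
pointer arithmetic, PAGE_SHIFT is fixed at 12, and the assert() is only
there to state the power-of-two assumption on sz.

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Dummy stand-in for the kernel's struct page (assumption for this sketch). */
struct page { int dummy; };

/*
 * Hypothetical variant of record_subpages(): the caller passes the
 * compound head and the mapping size sz, and the helper derives the
 * first subpage from addr itself instead of every caller open-coding
 * head + ((addr & (sz - 1)) >> PAGE_SHIFT).
 */
static int record_subpages(struct page *head, unsigned long sz,
			   unsigned long addr, unsigned long end,
			   struct page **pages)
{
	struct page *page;
	int nr;

	/* The mask below only works if sz is a power of two. */
	assert(sz != 0 && (sz & (sz - 1)) == 0);

	/* Offset of addr within the sz-sized mapping, in pages. */
	page = head + ((addr & (sz - 1)) >> PAGE_SHIFT);

	/* Record one entry per PAGE_SIZE step in [addr, end). */
	for (nr = 0; addr != end; addr += PAGE_SIZE)
		pages[nr++] = page++;

	return nr;
}
```

With a helper shaped like this, a caller such as
__gup_device_compound_huge() would pass head and sz straight through
and keep only the refcount handling locally.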