From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
To: linux-mm@kvack.org
Cc: "Matthew Wilcox (Oracle)", linux-kernel@vger.kernel.org, John Hubbard,
	Christoph Hellwig, Jason Gunthorpe, William Kucharski
Subject: [PATCH 12/75] mm: Make compound_pincount always available
Date: Fri, 4 Feb 2022 19:57:49 +0000
Message-Id: <20220204195852.1751729-13-willy@infradead.org>
In-Reply-To: <20220204195852.1751729-1-willy@infradead.org>
References: <20220204195852.1751729-1-willy@infradead.org>

Move compound_pincount from the third page to the second page, which means
it's available for all compound pages.  That lets us delete
hpage_pincount_available().
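For orientation, here is a rough sketch (an editorial illustration, not part
of the patch) of where the relevant fields live in the tail pages after this
change.  The struct names and stand-in typedefs are invented for the example;
the real layout is the anonymous union in struct page, shown in the
mm_types.h hunk further down.

```c
/* Stand-ins for the kernel types, so the sketch compiles on its own. */
typedef struct { int counter; } atomic_t;
struct list_head { struct list_head *next, *prev; };

struct first_tail_overlay {		/* overlays page[1] of a compound page */
	unsigned char compound_dtor;
	unsigned char compound_order;
	atomic_t compound_mapcount;
	atomic_t compound_pincount;	/* moved here from page[2] */
#ifdef CONFIG_64BIT
	unsigned int compound_nr;	/* 1 << compound_order; 64-bit only now */
#endif
};

struct second_tail_overlay {		/* overlays page[2] of a compound page */
	unsigned long _compound_pad_1;	/* compound_head */
	unsigned long _compound_pad_2;	/* was hpage_pinned_refcount */
	/* For both global and memcg */
	struct list_head deferred_list;
};
```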
On 32-bit systems, there isn't enough space for both compound_pincount and
compound_nr in the second page (it would collide with page->private, which
is in use for pages in the swap cache), so revert the optimisation of
storing both compound_order and compound_nr on 32-bit systems.

Signed-off-by: Matthew Wilcox (Oracle)
Reviewed-by: John Hubbard
Reviewed-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Reviewed-by: William Kucharski
---
 Documentation/core-api/pin_user_pages.rst | 18 +++++++++---------
 include/linux/mm.h                        | 21 ++++++++-------------
 include/linux/mm_types.h                  |  7 +++++--
 mm/debug.c                                | 14 ++++----------
 mm/gup.c                                  | 20 +++++++++-----------
 mm/page_alloc.c                           |  3 +--
 mm/rmap.c                                 |  6 ++----
 7 files changed, 38 insertions(+), 51 deletions(-)

diff --git a/Documentation/core-api/pin_user_pages.rst b/Documentation/core-api/pin_user_pages.rst
index fcf605be43d0..b18416f4500f 100644
--- a/Documentation/core-api/pin_user_pages.rst
+++ b/Documentation/core-api/pin_user_pages.rst
@@ -55,18 +55,18 @@ flags the caller provides. The caller is required to pass in a non-null struct
 pages* array, and the function then pins pages by incrementing each by a special
 value: GUP_PIN_COUNTING_BIAS.
 
-For huge pages (and in fact, any compound page of more than 2 pages), the
-GUP_PIN_COUNTING_BIAS scheme is not used. Instead, an exact form of pin counting
-is achieved, by using the 3rd struct page in the compound page. A new struct
-page field, hpage_pinned_refcount, has been added in order to support this.
+For compound pages, the GUP_PIN_COUNTING_BIAS scheme is not used. Instead,
+an exact form of pin counting is achieved, by using the 2nd struct page
+in the compound page. A new struct page field, compound_pincount, has
+been added in order to support this.
 
 This approach for compound pages avoids the counting upper limit problems that
 are discussed below. Those limitations would have been aggravated severely by
 huge pages, because each tail page adds a refcount to the head page. And in
-fact, testing revealed that, without a separate hpage_pinned_refcount field,
+fact, testing revealed that, without a separate compound_pincount field,
 page overflows were seen in some huge page stress tests.
 
-This also means that huge pages and compound pages (of order > 1) do not suffer
+This also means that huge pages and compound pages do not suffer
 from the false positives problem that is mentioned below.::
 
         Function
@@ -264,9 +264,9 @@ place.)
 Other diagnostics
 =================
 
-dump_page() has been enhanced slightly, to handle these new counting fields, and
-to better report on compound pages in general. Specifically, for compound pages
-with order > 1, the exact (hpage_pinned_refcount) pincount is reported.
+dump_page() has been enhanced slightly, to handle these new counting
+fields, and to better report on compound pages in general. Specifically,
+for compound pages, the exact (compound_pincount) pincount is reported.
 
 References
 ==========
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e679a7d66200..dd7d6e95e43b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -891,17 +891,6 @@ static inline void destroy_compound_page(struct page *page)
 	compound_page_dtors[page[1].compound_dtor](page);
 }
 
-static inline bool hpage_pincount_available(struct page *page)
-{
-	/*
-	 * Can the page->hpage_pinned_refcount field be used? That field is in
-	 * the 3rd page of the compound page, so the smallest (2-page) compound
-	 * pages cannot support it.
-	 */
-	page = compound_head(page);
-	return PageCompound(page) && compound_order(page) > 1;
-}
-
 static inline int head_compound_pincount(struct page *head)
 {
 	return atomic_read(compound_pincount_ptr(head));
@@ -909,7 +898,7 @@ static inline int head_compound_pincount(struct page *head)
 
 static inline int compound_pincount(struct page *page)
 {
-	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
+	VM_BUG_ON_PAGE(!PageCompound(page), page);
 	page = compound_head(page);
 	return head_compound_pincount(page);
 }
@@ -917,7 +906,9 @@ static inline int compound_pincount(struct page *page)
 static inline void set_compound_order(struct page *page, unsigned int order)
 {
 	page[1].compound_order = order;
+#ifdef CONFIG_64BIT
 	page[1].compound_nr = 1U << order;
+#endif
 }
 
 /* Returns the number of pages in this potentially compound page. */
@@ -925,7 +916,11 @@ static inline unsigned long compound_nr(struct page *page)
 {
 	if (!PageHead(page))
 		return 1;
+#ifdef CONFIG_64BIT
 	return page[1].compound_nr;
+#else
+	return 1UL << compound_order(page);
+#endif
 }
 
 /* Returns the number of bytes in this potentially compound page. */
@@ -1307,7 +1302,7 @@ void unpin_user_pages(struct page **pages, unsigned long npages);
  */
 static inline bool page_maybe_dma_pinned(struct page *page)
 {
-	if (hpage_pincount_available(page))
+	if (PageCompound(page))
 		return compound_pincount(page) > 0;
 
 	/*
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5140e5feb486..e510ff214acf 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -126,11 +126,14 @@ struct page {
 			unsigned char compound_dtor;
 			unsigned char compound_order;
 			atomic_t compound_mapcount;
+			atomic_t compound_pincount;
+#ifdef CONFIG_64BIT
 			unsigned int compound_nr; /* 1 << compound_order */
+#endif
 		};
 		struct {	/* Second tail page of compound page */
 			unsigned long _compound_pad_1;	/* compound_head */
-			atomic_t hpage_pinned_refcount;
+			unsigned long _compound_pad_2;
 			/* For both global and memcg */
 			struct list_head deferred_list;
 		};
@@ -285,7 +288,7 @@ static inline atomic_t *compound_mapcount_ptr(struct page *page)
 
 static inline atomic_t *compound_pincount_ptr(struct page *page)
 {
-	return &page[2].hpage_pinned_refcount;
+	return &page[1].compound_pincount;
 }
 
 /*
diff --git a/mm/debug.c b/mm/debug.c
index bc9ac87f0e08..c4cf44266430 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -92,16 +92,10 @@ static void __dump_page(struct page *page)
 			page, page_ref_count(head), mapcount, mapping,
 			page_to_pgoff(page), page_to_pfn(page));
 	if (compound) {
-		if (hpage_pincount_available(page)) {
-			pr_warn("head:%p order:%u compound_mapcount:%d compound_pincount:%d\n",
-					head, compound_order(head),
-					head_compound_mapcount(head),
-					head_compound_pincount(head));
-		} else {
-			pr_warn("head:%p order:%u compound_mapcount:%d\n",
-					head, compound_order(head),
-					head_compound_mapcount(head));
-		}
+		pr_warn("head:%p order:%u compound_mapcount:%d compound_pincount:%d\n",
+				head, compound_order(head),
+				head_compound_mapcount(head),
+				head_compound_pincount(head));
 	}
 
 #ifdef CONFIG_MEMCG
diff --git a/mm/gup.c b/mm/gup.c
index af623a139995..a444b94c96fd 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -99,12 +99,11 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
  *
  *    FOLL_GET: page's refcount will be incremented by @refs.
  *
- *    FOLL_PIN on compound pages that are > two pages long: page's refcount will
- *    be incremented by @refs, and page[2].hpage_pinned_refcount will be
- *    incremented by @refs * GUP_PIN_COUNTING_BIAS.
+ *    FOLL_PIN on compound pages: page's refcount will be incremented by
+ *    @refs, and page[1].compound_pincount will be incremented by @refs.
  *
- *    FOLL_PIN on normal pages, or compound pages that are two pages long:
- *    page's refcount will be incremented by @refs * GUP_PIN_COUNTING_BIAS.
+ *    FOLL_PIN on normal pages: page's refcount will be incremented by
+ *    @refs * GUP_PIN_COUNTING_BIAS.
  *
  * Return: head page (with refcount appropriately incremented) for success, or
  * NULL upon failure. If neither FOLL_GET nor FOLL_PIN was set, that's
@@ -135,16 +134,15 @@ __maybe_unused struct page *try_grab_compound_head(struct page *page,
 		return NULL;
 
 	/*
-	 * When pinning a compound page of order > 1 (which is
-	 * what hpage_pincount_available() checks for), use an
-	 * exact count to track it.
+	 * When pinning a compound page, use an exact count to
+	 * track it.
 	 *
 	 * However, be sure to *also* increment the normal page
 	 * refcount field at least once, so that the page really
 	 * is pinned.  That's why the refcount from the earlier
 	 * try_get_compound_head() is left intact.
 	 */
-	if (hpage_pincount_available(page))
+	if (PageHead(page))
 		atomic_add(refs, compound_pincount_ptr(page));
 	else
 		page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
@@ -166,7 +164,7 @@ static void put_compound_head(struct page *page, int refs, unsigned int flags)
 	if (flags & FOLL_PIN) {
 		mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED,
 				    refs);
-		if (hpage_pincount_available(page))
+		if (PageHead(page))
 			atomic_sub(refs, compound_pincount_ptr(page));
 		else
 			refs *= GUP_PIN_COUNTING_BIAS;
@@ -211,7 +209,7 @@ bool __must_check try_grab_page(struct page *page, unsigned int flags)
 		 * increment the normal page refcount field at least once,
 		 * so that the page really is pinned.
 		 */
-		if (hpage_pincount_available(page)) {
+		if (PageHead(page)) {
 			page_ref_add(page, 1);
 			atomic_add(1, compound_pincount_ptr(page));
 		} else {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3589febc6d31..02283598fd14 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -734,8 +734,7 @@ static void prep_compound_head(struct page *page, unsigned int order)
 	set_compound_page_dtor(page, COMPOUND_PAGE_DTOR);
 	set_compound_order(page, order);
 	atomic_set(compound_mapcount_ptr(page), -1);
-	if (hpage_pincount_available(page))
-		atomic_set(compound_pincount_ptr(page), 0);
+	atomic_set(compound_pincount_ptr(page), 0);
 }
 
 static void prep_compound_tail(struct page *head, int tail_idx)
diff --git a/mm/rmap.c b/mm/rmap.c
index 6a1e8c7f6213..a531b64d53fa 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1216,8 +1216,7 @@ void page_add_new_anon_rmap(struct page *page,
 		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
 		/* increment count (starts at -1) */
 		atomic_set(compound_mapcount_ptr(page), 0);
-		if (hpage_pincount_available(page))
-			atomic_set(compound_pincount_ptr(page), 0);
+		atomic_set(compound_pincount_ptr(page), 0);
 
 		__mod_lruvec_page_state(page, NR_ANON_THPS, nr);
 	} else {
@@ -2439,8 +2438,7 @@ void hugepage_add_new_anon_rmap(struct page *page,
 {
 	BUG_ON(address < vma->vm_start || address >= vma->vm_end);
 	atomic_set(compound_mapcount_ptr(page), 0);
-	if (hpage_pincount_available(page))
-		atomic_set(compound_pincount_ptr(page), 0);
+	atomic_set(compound_pincount_ptr(page), 0);
 
 	__page_set_anon_rmap(page, vma, address, 1);
 }
-- 
2.34.1
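Editorial footnote, not part of the patch: the FOLL_PIN accounting described
in the updated try_grab_compound_head() comment can be modelled in plain C as
below.  The toy_page struct and pin() helper are invented for the example and
only mirror the net effect on the counters, not the real struct page code.

```c
#include <stdio.h>

/* 1024 in current kernels (1U << 10); used to bias refcounts of pinned pages. */
#define GUP_PIN_COUNTING_BIAS 1024

struct toy_page {
	int refcount;
	int compound_pincount;	/* only meaningful for compound heads */
	int is_compound_head;
};

static void pin(struct toy_page *page, int refs)
{
	if (page->is_compound_head) {
		/* Compound pages: exact pincount, plus @refs on the refcount. */
		page->refcount += refs;
		page->compound_pincount += refs;
	} else {
		/* Normal pages: refcount is bumped by @refs * GUP_PIN_COUNTING_BIAS. */
		page->refcount += refs * GUP_PIN_COUNTING_BIAS;
	}
}

int main(void)
{
	struct toy_page head = { .refcount = 1, .is_compound_head = 1 };
	struct toy_page normal = { .refcount = 1 };

	pin(&head, 3);
	pin(&normal, 3);
	printf("compound: ref=%d pincount=%d\n", head.refcount, head.compound_pincount);
	printf("normal:   ref=%d\n", normal.refcount);
	return 0;
}
```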