From: Jason Gunthorpe <jgg@ziepe.ca>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Mike Rapoport <rppt@kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	David Hildenbrand <david@redhat.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Qi Zheng <zhengqi.arch@bytedance.com>,
	Yang Shi <shy828301@gmail.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Peter Xu <peterx@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Will Deacon <will@kernel.org>, Yu Zhao <yuzhao@google.com>,
	Alistair Popple <apopple@nvidia.com>,
	Ralph Campbell <rcampbell@nvidia.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Steven Price <steven.price@arm.com>,
	SeongJae Park <sj@kernel.org>,
	Lorenzo Stoakes <lstoakes@gmail.com>,
	Huang Ying <ying.huang@intel.com>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Zack Rusin <zackr@vmware.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Minchan Kim <minchan@kernel.org>,
	Christoph Hellwig <hch@infradead.org>, Song Liu <song@kernel.org>,
	Thomas Hellstrom <thomas.hellstrom@linux.intel.com>,
	Russell King <linux@armlinux.org.uk>,
	"David S. Miller" <davem@davemloft.net>,
	Michael Ellerman <mpe@ellerman.id.au>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Jann Horn <jannh@google.com>,
	Vishal Moola <vishal.moola@gmail.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2 05/12] powerpc: add pte_free_defer() for pgtables sharing page
Date: Tue, 20 Jun 2023 20:52:38 -0300
Message-ID: <ZJI7xkXWmjrE1yY3@ziepe.ca>
In-Reply-To: <2ad8b6cf-692a-ff89-ecc-586c20c5e07f@google.com>

On Tue, Jun 20, 2023 at 12:54:25PM -0700, Hugh Dickins wrote:
> On Tue, 20 Jun 2023, Jason Gunthorpe wrote:
> > On Tue, Jun 20, 2023 at 12:47:54AM -0700, Hugh Dickins wrote:
> > > Add powerpc-specific pte_free_defer(), to call pte_free() via call_rcu().
> > > pte_free_defer() will be called inside khugepaged's retract_page_tables()
> > > loop, where allocating extra memory cannot be relied upon.  This precedes
> > > the generic version to avoid build breakage from incompatible pgtable_t.
> > > 
> > > This is awkward because the struct page contains only one rcu_head, but
> > > that page may be shared between PTE_FRAG_NR pagetables, each wanting to
> > > use the rcu_head at the same time: account concurrent deferrals with a
> > > heightened refcount, only the first making use of the rcu_head, but
> > > re-deferring if more deferrals arrived during its grace period.
> > 
> > You didn't answer my question why we can't just move the rcu to the
> > actual free page?
> 
> I thought that I had answered it, perhaps not to your satisfaction:
> 
> https://lore.kernel.org/linux-mm/9130acb-193-6fdd-f8df-75766e663978@google.com/
> 
> My conclusion then was:
> Not very good reasons: good enough, or can you supply a better patch?

Oh, I guess I didn't read that email as answering the question.

I was saying to make pte_fragment_free() unconditionally do the RCU
free. It is the only thing that uses page->rcu_head, and it means
PPC would double RCU the final free on the TLB path, but that is
probably OK for now. This means pte_free_defer() won't do anything
special on PPC, as PPC will always RCU free these things. This
addresses the defer concern too, I think. Overall it is easier to
reason about.
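
For illustration only, here is a userspace sketch of that idea: every
fragment drops one reference, and only the drop that reaches zero queues
the page for a deferred free. All fake_* names are hypothetical stand-ins
and not kernel APIs; the counter is a plain int (the kernel uses an
atomic), and the "grace period" is just a function that runs the queued
callbacks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rcu_head; not the kernel definition. */
struct fake_rcu_head {
	struct fake_rcu_head *next;
	void (*func)(struct fake_rcu_head *head);
};

/* Hypothetical stand-in for the parts of struct page used here. */
struct fake_page {
	int pt_frag_refcount;		/* fragments still in use (not atomic here) */
	struct fake_rcu_head rcu_head;	/* single head shared by all fragments */
};

static struct fake_rcu_head *pending;	/* callbacks waiting for a grace period */
static int pages_freed;

#define fake_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Queue a callback; a toy model of call_rcu(), single threaded. */
static void fake_call_rcu(struct fake_rcu_head *head,
			  void (*func)(struct fake_rcu_head *head))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Pretend a grace period has elapsed: run every queued callback. */
static void fake_grace_period(void)
{
	while (pending) {
		struct fake_rcu_head *head = pending;

		pending = head->next;
		head->func(head);
	}
}

static void fake_free_cb(struct fake_rcu_head *head)
{
	struct fake_page *page =
		fake_container_of(head, struct fake_page, rcu_head);

	free(page);
	pages_freed++;
}

static struct fake_page *fake_page_alloc(int nr_frags)
{
	struct fake_page *page = calloc(1, sizeof(*page));

	assert(page);
	page->pt_frag_refcount = nr_frags;
	return page;
}

/* Drop one fragment; only the last drop touches the rcu_head. */
static void fake_fragment_free(struct fake_page *page)
{
	assert(page->pt_frag_refcount > 0);
	if (--page->pt_frag_refcount == 0)
		fake_call_rcu(&page->rcu_head, fake_free_cb);
}
```

Because the rcu_head is only used once the count reaches zero, there is
never more than one user of it per page in this model, which is the
property that makes the scheme easy to reason about.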

I looked at fixing the TLB stuff to avoid the double RCU but quickly
got scared that PPC is using a kmem_cache to allocate the other page
table sizes, so there is not a reliable struct page to get an rcu_head
from. This looks like the main challenge for PPC... We'd have to teach
the TLB code to not do its own RCU stuff for table levels that the
arch is already RCU freeing - and that won't get us to full RCU
freeing on PPC.

Anyhow, this is a full version of what I was thinking:

diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index 20652daa1d7e3a..b5dcd0f27fc115 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -106,6 +106,21 @@ pte_t *pte_fragment_alloc(struct mm_struct *mm, int kernel)
 	return __alloc_for_ptecache(mm, kernel);
 }
 
+static void pgtable_free_cb(struct rcu_head *head)
+{
+	struct page *page = container_of(head, struct page, rcu_head);
+
+	pgtable_pte_page_dtor(page);
+	__free_page(page);
+}
+
+static void pgtable_free_cb_kernel(struct rcu_head *head)
+{
+	struct page *page = container_of(head, struct page, rcu_head);
+
+	__free_page(page);
+}
+
 void pte_fragment_free(unsigned long *table, int kernel)
 {
 	struct page *page = virt_to_page(table);
@@ -115,8 +130,13 @@ void pte_fragment_free(unsigned long *table, int kernel)
 
 	BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
 	if (atomic_dec_and_test(&page->pt_frag_refcount)) {
+		/*
+		 * Always RCU free pagetable memory. rcu_head overlaps with lru
+		 * which is no longer in use by the time the table is freed.
+		 */
 		if (!kernel)
-			pgtable_pte_page_dtor(page);
-		__free_page(page);
+			call_rcu(&page->rcu_head, pgtable_free_cb);
+		else
+			call_rcu(&page->rcu_head, pgtable_free_cb_kernel);
 	}
 }
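
As a reading aid for the two callbacks in the diff above, here is a small
userspace sketch of the pattern they rely on: the callback receives only
a pointer to the embedded head, recovers the enclosing object with a
container_of()-style macro, and the user variant runs an extra
destructor step before the free. The demo_* names are hypothetical, and
counters stand in for pgtable_pte_page_dtor() and __free_page().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rcu_head. */
struct demo_rcu_head {
	void (*func)(struct demo_rcu_head *head);
};

/* Hypothetical stand-in for struct page; the head is not the first member. */
struct demo_page {
	unsigned long flags;
	struct demo_rcu_head rcu_head;
	int dtor_calls;		/* stands in for pgtable_pte_page_dtor() */
	int free_calls;		/* stands in for __free_page() */
};

#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* User page tables: destructor first, then free. */
static void demo_free_cb(struct demo_rcu_head *head)
{
	struct demo_page *page =
		demo_container_of(head, struct demo_page, rcu_head);

	page->dtor_calls++;
	page->free_calls++;
}

/* Kernel page tables: no destructor, just free. */
static void demo_free_cb_kernel(struct demo_rcu_head *head)
{
	struct demo_page *page =
		demo_container_of(head, struct demo_page, rcu_head);

	page->free_calls++;
}

/* Pick the callback the same way the diff does, keyed on "kernel". */
static void demo_queue_free(struct demo_page *page, int kernel)
{
	page->rcu_head.func = kernel ? demo_free_cb_kernel : demo_free_cb;
}

/* Run the stored callback, as would happen after a grace period. */
static void demo_run_callback(struct demo_page *page)
{
	page->rcu_head.func(&page->rcu_head);
}
```

Selecting the callback at queue time means the "kernel" flag does not
need to be stored anywhere in the page for the deferred free to do the
right thing.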

Thread overview: 47+ messages
2023-06-20  7:35 [PATCH v2 00/12] mm: free retracted page table by RCU Hugh Dickins
2023-06-20  7:40 ` [PATCH v2 01/12] mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s Hugh Dickins
2023-06-20  7:42 ` [PATCH v2 02/12] mm/pgtable: add PAE safety to __pte_offset_map() Hugh Dickins
2023-06-20  7:43 ` [PATCH v2 03/12] arm: adjust_pte() use pte_offset_map_nolock() Hugh Dickins
2023-06-20  7:45 ` [PATCH v2 04/12] powerpc: assert_pte_locked() " Hugh Dickins
2023-06-20  7:47 ` [PATCH v2 05/12] powerpc: add pte_free_defer() for pgtables sharing page Hugh Dickins
2023-06-20 11:45   ` Jason Gunthorpe
2023-06-20 19:54     ` Hugh Dickins
2023-06-20 23:52       ` Jason Gunthorpe [this message]
2023-06-22  2:36         ` Hugh Dickins
2023-06-27 17:01           ` Jason Gunthorpe
2023-06-27 20:53             ` Hugh Dickins
2023-06-20  7:49 ` [PATCH v2 06/12] sparc: add pte_free_defer() for pte_t *pgtable_t Hugh Dickins
2023-06-20  7:51 ` [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page Hugh Dickins
2023-06-28 19:16   ` Gerald Schaefer
2023-06-29  5:08     ` Hugh Dickins
2023-06-29 15:22       ` Jason Gunthorpe
2023-06-29 15:56         ` Gerald Schaefer
2023-06-30  6:00           ` Hugh Dickins
2023-07-02  4:32             ` Hugh Dickins
2023-07-04 13:40               ` Alexander Gordeev
2023-07-04 16:03                 ` Hugh Dickins
2023-07-04 15:19               ` Gerald Schaefer
2023-07-04 17:03                 ` Hugh Dickins
2023-07-05 12:55                   ` Gerald Schaefer
2023-07-06  1:20                     ` Hugh Dickins
2023-07-06 15:02                       ` Gerald Schaefer
2023-07-06 19:45                         ` Hugh Dickins
2023-07-10 17:21                     ` Jason Gunthorpe
2023-07-05  6:46               ` Alexander Gordeev
2023-07-06  0:52                 ` Hugh Dickins
2023-07-07 14:37                   ` Gerald Schaefer
2023-07-03 16:10             ` Gerald Schaefer
2023-06-29 13:59     ` Alexander Gordeev
2023-06-29 15:43       ` Gerald Schaefer
2023-06-30 13:38   ` Claudio Imbrenda
2023-06-30 15:28     ` Hugh Dickins
2023-06-30 16:25       ` Claudio Imbrenda
2023-06-30 19:22         ` Hugh Dickins
2023-07-03 11:00           ` Claudio Imbrenda
2023-07-03 21:29             ` Jason Gunthorpe
2023-06-20  7:53 ` [PATCH v2 08/12] mm/pgtable: add pte_free_defer() for pgtable as page Hugh Dickins
2023-06-20  7:54 ` [PATCH v2 09/12] mm/khugepaged: retract_page_tables() without mmap or vma lock Hugh Dickins
2023-06-20  7:56 ` [PATCH v2 10/12] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock() Hugh Dickins
2023-06-20  8:04   ` [PATCH mm " Hugh Dickins
2023-06-20  7:58 ` [PATCH v2 11/12] mm/khugepaged: delete khugepaged_collapse_pte_mapped_thps() Hugh Dickins
2023-06-20  7:59 ` [PATCH v2 12/12] mm: delete mmap_write_trylock() and vma_try_start_write() Hugh Dickins
