All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
	Mike Rapoport <rppt@kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	David Hildenbrand <david@redhat.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Qi Zheng <zhengqi.arch@bytedance.com>,
	Yang Shi <shy828301@gmail.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Peter Xu <peterx@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Will Deacon <will@kernel.org>, Yu Zhao <yuzhao@google.com>,
	Alistair Popple <apopple@nvidia.com>,
	Ralph Campbell <rcampbell@nvidia.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Steven Price <steven.price@arm.com>,
	SeongJae Park <sj@kernel.org>,
	Lorenzo Stoakes <lstoakes@gmail.com>,
	Huang Ying <ying.huang@intel.com>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Zack Rusin <zackr@vmware.com>, Jason Gunthorpe <jgg@ziepe.ca>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Minchan Kim <minchan@kernel.org>,
	Christoph Hellwig <hch@infradead.org>, Song Liu <song@kernel.org>,
	Thomas Hellstrom <thomas.hellstrom@linux.intel.com>,
	Russell King <linux@armlinux.org.uk>,
	"David S. Miller" <davem@davemloft.net>,
	Michael Ellerman <mpe@ellerman.id.au>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>, Jann Horn <jannh@google.com>,
	Vishal Moola <vishal.moola@gmail.com>,
	Vlastimil Babka <vbabka@suse.cz>, Zi Yan <ziy@nvidia.com>,
	linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v3 13/13] mm/pgtable: notes on pte_offset_map[_lock]()
Date: Tue, 11 Jul 2023 21:46:23 -0700 (PDT)	[thread overview]
Message-ID: <b791c3b0-25c6-a263-d785-d564344eb644@google.com> (raw)
In-Reply-To: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>

Add a block of comments on pte_offset_map_lock(), pte_offset_map() and
pte_offset_map_nolock() to mm/pgtable-generic.c, to help explain them.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/pgtable-generic.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index fa9d4d084291..4fcd959dcc4d 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -315,6 +315,50 @@ pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
 	return pte;
 }
 
+/*
+ * pte_offset_map_lock(mm, pmd, addr, ptlp), and its internal implementation
+ * __pte_offset_map_lock() below, is usually called with the pmd pointer for
+ * addr, reached by walking down the mm's pgd, p4d, pud for addr: either while
+ * holding mmap_lock or vma lock for read or for write; or in truncate or rmap
+ * context, while holding file's i_mmap_lock or anon_vma lock for read (or for
+ * write). In a few cases, it may be used with pmd pointing to a pmd_t already
+ * copied to or constructed on the stack.
+ *
+ * When successful, it returns the pte pointer for addr, with its page table
+ * kmapped if necessary (when CONFIG_HIGHPTE), and locked against concurrent
+ * modification by software, with a pointer to that spinlock in ptlp (in some
+ * configs mm->page_table_lock, in SPLIT_PTLOCK configs a spinlock in table's
+ * struct page).  pte_unmap_unlock(pte, ptl) to unlock and unmap afterwards.
+ *
+ * But it is unsuccessful, returning NULL with *ptlp unchanged, if there is no
+ * page table at *pmd: if, for example, the page table has just been removed,
+ * or replaced by the huge pmd of a THP.  (When successful, *pmd is rechecked
+ * after acquiring the ptlock, and retried internally if it changed: so that a
+ * page table can be safely removed or replaced by THP while holding its lock.)
+ *
+ * pte_offset_map(pmd, addr), and its internal helper __pte_offset_map() above,
+ * just returns the pte pointer for addr, its page table kmapped if necessary;
+ * or NULL if there is no page table at *pmd.  It does not attempt to lock the
+ * page table, so cannot normally be used when the page table is to be updated,
+ * or when entries read must be stable.  But it does take rcu_read_lock(): so
+ * that even when page table is racily removed, it remains a valid though empty
+ * and disconnected table.  Until pte_unmap(pte) unmaps and rcu_read_unlock()s
+ * afterwards.
+ *
+ * pte_offset_map_nolock(mm, pmd, addr, ptlp), above, is like pte_offset_map();
+ * but when successful, it also outputs a pointer to the spinlock in ptlp - as
+ * pte_offset_map_lock() does, but in this case without locking it.  This helps
+ * the caller to avoid a later pte_lockptr(mm, *pmd), which might by that time
+ * act on a changed *pmd: pte_offset_map_nolock() provides the correct spinlock
+ * pointer for the page table that it returns.  In principle, the caller should
+ * recheck *pmd once the lock is taken; in practice, no callsite needs that -
+ * either the mmap_lock for write, or pte_same() check on contents, is enough.
+ *
+ * Note that free_pgtables(), used after unmapping detached vmas, or when
+ * exiting the whole mm, does not take page table lock before freeing a page
+ * table, and may not use RCU at all: "outsiders" like khugepaged should avoid
+ * pte_offset_map() and co once the vma is detached from mm or mm_users is zero.
+ */
 pte_t *__pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
 			     unsigned long addr, spinlock_t **ptlp)
 {
-- 
2.35.3


WARNING: multiple messages have this Message-ID (diff)
From: Hugh Dickins <hughd@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>,
	David Hildenbrand <david@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Yang Shi <shy828301@gmail.com>, Peter Xu <peterx@redhat.com>,
	linux-kernel@vger.kernel.org, Song Liu <song@kernel.org>,
	sparclinux@vger.kernel.org,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>,
	Will Deacon <will@kernel.org>,
	linux-s390@vger.kernel.org, Yu Zhao <yuzhao@google.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Alistair Popple <apopple@nvidia.com>,
	Russell King <linux@armlinux.org.uk>,
	Matthew Wilcox <willy@infradead.org>,
	Steven Price <steven.price@arm.com>,
	Christoph Hellwig <hch@infradead.org>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	Zi Yan <ziy@nvidia.com>, Huang Ying <ying.huang@intel.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Thomas Hellstrom <thomas.hellstrom@linux.intel.com>,
	Ralph Campbell <rcampbell@nvidia.com>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Qi Zheng <zhengqi.arch@bytedance.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	linux-arm-kernel@lists.infradead.org,
	SeongJae Park <sj@kernel.org>,
	Lorenzo Stoakes <lstoakes@gmail.com>,
	Jann Horn <jannh@google.com>,
	linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	Zack Rusin <zackr@vmware.com>,
	Vishal Moola <vishal.moola@gmail.com>,
	Minchan Kim <minchan@kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	"David S. Miller" <davem@davemloft.net>,
	Mike Rapoport <rppt@kernel.org>,
	Mike Kravetz <mike.kravetz@oracle.com>
Subject: [PATCH v3 13/13] mm/pgtable: notes on pte_offset_map[_lock]()
Date: Tue, 11 Jul 2023 21:46:23 -0700 (PDT)	[thread overview]
Message-ID: <b791c3b0-25c6-a263-d785-d564344eb644@google.com> (raw)
In-Reply-To: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>

Add a block of comments on pte_offset_map_lock(), pte_offset_map() and
pte_offset_map_nolock() to mm/pgtable-generic.c, to help explain them.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/pgtable-generic.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index fa9d4d084291..4fcd959dcc4d 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -315,6 +315,50 @@ pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
 	return pte;
 }
 
+/*
+ * pte_offset_map_lock(mm, pmd, addr, ptlp), and its internal implementation
+ * __pte_offset_map_lock() below, is usually called with the pmd pointer for
+ * addr, reached by walking down the mm's pgd, p4d, pud for addr: either while
+ * holding mmap_lock or vma lock for read or for write; or in truncate or rmap
+ * context, while holding file's i_mmap_lock or anon_vma lock for read (or for
+ * write). In a few cases, it may be used with pmd pointing to a pmd_t already
+ * copied to or constructed on the stack.
+ *
+ * When successful, it returns the pte pointer for addr, with its page table
+ * kmapped if necessary (when CONFIG_HIGHPTE), and locked against concurrent
+ * modification by software, with a pointer to that spinlock in ptlp (in some
+ * configs mm->page_table_lock, in SPLIT_PTLOCK configs a spinlock in table's
+ * struct page).  pte_unmap_unlock(pte, ptl) to unlock and unmap afterwards.
+ *
+ * But it is unsuccessful, returning NULL with *ptlp unchanged, if there is no
+ * page table at *pmd: if, for example, the page table has just been removed,
+ * or replaced by the huge pmd of a THP.  (When successful, *pmd is rechecked
+ * after acquiring the ptlock, and retried internally if it changed: so that a
+ * page table can be safely removed or replaced by THP while holding its lock.)
+ *
+ * pte_offset_map(pmd, addr), and its internal helper __pte_offset_map() above,
+ * just returns the pte pointer for addr, its page table kmapped if necessary;
+ * or NULL if there is no page table at *pmd.  It does not attempt to lock the
+ * page table, so cannot normally be used when the page table is to be updated,
+ * or when entries read must be stable.  But it does take rcu_read_lock(): so
+ * that even when page table is racily removed, it remains a valid though empty
+ * and disconnected table.  Until pte_unmap(pte) unmaps and rcu_read_unlock()s
+ * afterwards.
+ *
+ * pte_offset_map_nolock(mm, pmd, addr, ptlp), above, is like pte_offset_map();
+ * but when successful, it also outputs a pointer to the spinlock in ptlp - as
+ * pte_offset_map_lock() does, but in this case without locking it.  This helps
+ * the caller to avoid a later pte_lockptr(mm, *pmd), which might by that time
+ * act on a changed *pmd: pte_offset_map_nolock() provides the correct spinlock
+ * pointer for the page table that it returns.  In principle, the caller should
+ * recheck *pmd once the lock is taken; in practice, no callsite needs that -
+ * either the mmap_lock for write, or pte_same() check on contents, is enough.
+ *
+ * Note that free_pgtables(), used after unmapping detached vmas, or when
+ * exiting the whole mm, does not take page table lock before freeing a page
+ * table, and may not use RCU at all: "outsiders" like khugepaged should avoid
+ * pte_offset_map() and co once the vma is detached from mm or mm_users is zero.
+ */
 pte_t *__pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
 			     unsigned long addr, spinlock_t **ptlp)
 {
-- 
2.35.3


  parent reply	other threads:[~2023-07-12  4:46 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-12  4:27 [PATCH v3 00/13] mm: free retracted page table by RCU Hugh Dickins
2023-07-12  4:27 ` Hugh Dickins
2023-07-12  4:30 ` [PATCH v3 01/13] mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s Hugh Dickins
2023-07-12  4:30   ` Hugh Dickins
2023-07-12  4:32 ` [PATCH v3 02/13] mm/pgtable: add PAE safety to __pte_offset_map() Hugh Dickins
2023-07-12  4:32   ` Hugh Dickins
2023-07-12  4:33 ` [PATCH v3 03/13] arm: adjust_pte() use pte_offset_map_nolock() Hugh Dickins
2023-07-12  4:33   ` Hugh Dickins
2023-07-12  4:34 ` [PATCH v3 04/13] powerpc: assert_pte_locked() " Hugh Dickins
2023-07-12  4:34   ` Hugh Dickins
2023-07-18 10:41   ` Aneesh Kumar K.V
2023-07-18 10:41     ` Aneesh Kumar K.V
2023-07-19  5:04     ` Hugh Dickins
2023-07-19  5:04       ` Hugh Dickins
2023-07-19  5:24       ` Aneesh Kumar K V
2023-07-19  5:24         ` Aneesh Kumar K V
2023-07-21 13:13         ` Jay Patel
2023-07-21 13:13           ` Jay Patel
2023-07-23 22:26           ` [PATCH v3 04/13 fix] powerpc: assert_pte_locked() use pte_offset_map_nolock(): fix Hugh Dickins
2023-07-23 22:26             ` Hugh Dickins
2023-07-12  4:35 ` [PATCH v3 05/13] powerpc: add pte_free_defer() for pgtables sharing page Hugh Dickins
2023-07-12  4:35   ` Hugh Dickins
2023-07-12  4:37 ` [PATCH v3 06/13] sparc: add pte_free_defer() for pte_t *pgtable_t Hugh Dickins
2023-07-12  4:37   ` Hugh Dickins
2023-07-12  4:38 ` [PATCH v3 07/13] s390: add pte_free_defer() for pgtables sharing page Hugh Dickins
2023-07-12  4:38   ` Hugh Dickins
2023-07-13  4:47   ` Alexander Gordeev
2023-07-13  4:47     ` Alexander Gordeev
2023-07-19 14:25   ` Claudio Imbrenda
2023-07-19 14:25     ` Claudio Imbrenda
2023-07-23 22:29     ` [PATCH v3 07/13 fix] s390: add pte_free_defer() for pgtables sharing page: fix Hugh Dickins
2023-07-23 22:29       ` Hugh Dickins
2023-07-12  4:39 ` [PATCH v3 08/13] mm/pgtable: add pte_free_defer() for pgtable as page Hugh Dickins
2023-07-12  4:39   ` Hugh Dickins
2023-07-12  4:41 ` [PATCH v3 09/13] mm/khugepaged: retract_page_tables() without mmap or vma lock Hugh Dickins
2023-07-12  4:41   ` Hugh Dickins
2023-07-12  4:42 ` [PATCH v3 10/13] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock() Hugh Dickins
2023-07-12  4:42   ` Hugh Dickins
2023-07-23 22:32   ` [PATCH v3 10/13 fix] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock(): fix Hugh Dickins
2023-07-23 22:32     ` Hugh Dickins
2023-08-03  9:17   ` [PATCH v3 10/13] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock() Qi Zheng
2023-08-03  9:17     ` Qi Zheng
2023-08-06  3:55     ` Hugh Dickins
2023-08-06  3:55       ` Hugh Dickins
2023-08-07  2:21       ` Qi Zheng
2023-08-07  2:21         ` Qi Zheng
2023-08-06  3:59     ` [PATCH v3 10/13 fix2] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock(): fix2 Hugh Dickins
2023-08-06  3:59       ` Hugh Dickins
2023-08-14 20:36   ` [BUG] Re: [PATCH v3 10/13] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock() Jann Horn
2023-08-14 20:36     ` Jann Horn
2023-08-15  6:34     ` Hugh Dickins
2023-08-15  6:34       ` Hugh Dickins
2023-08-15  7:11       ` David Hildenbrand
2023-08-15  7:11         ` David Hildenbrand
2023-08-15 15:41         ` Hugh Dickins
2023-08-15 15:41           ` Hugh Dickins
2023-08-21 19:48     ` Hugh Dickins
2023-08-21 19:48       ` Hugh Dickins
2023-07-12  4:43 ` [PATCH v3 11/13] mm/khugepaged: delete khugepaged_collapse_pte_mapped_thps() Hugh Dickins
2023-07-12  4:43   ` Hugh Dickins
2023-07-23 22:35   ` [PATCH v3 11/13 fix] mm/khugepaged: delete khugepaged_collapse_pte_mapped_thps(): fix Hugh Dickins
2023-07-23 22:35     ` Hugh Dickins
2023-07-12  4:44 ` [PATCH v3 12/13] mm: delete mmap_write_trylock() and vma_try_start_write() Hugh Dickins
2023-07-12  4:44   ` Hugh Dickins
2023-07-12  4:48   ` [PATCH mm " Hugh Dickins
2023-07-12  4:48     ` Hugh Dickins
2023-07-12  4:46 ` Hugh Dickins [this message]
2023-07-12  4:46   ` [PATCH v3 13/13] mm/pgtable: notes on pte_offset_map[_lock]() Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b791c3b0-25c6-a263-d785-d564344eb644@google.com \
    --to=hughd@google.com \
    --cc=agordeev@linux.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=anshuman.khandual@arm.com \
    --cc=apopple@nvidia.com \
    --cc=axelrasmussen@google.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=christophe.leroy@csgroup.eu \
    --cc=davem@davemloft.net \
    --cc=david@redhat.com \
    --cc=gerald.schaefer@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=hch@infradead.org \
    --cc=imbrenda@linux.ibm.com \
    --cc=ira.weiny@intel.com \
    --cc=jannh@google.com \
    --cc=jgg@ziepe.ca \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=lstoakes@gmail.com \
    --cc=mgorman@techsingularity.net \
    --cc=mike.kravetz@oracle.com \
    --cc=minchan@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=naoya.horiguchi@nec.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rcampbell@nvidia.com \
    --cc=rppt@kernel.org \
    --cc=shy828301@gmail.com \
    --cc=sj@kernel.org \
    --cc=song@kernel.org \
    --cc=sparclinux@vger.kernel.org \
    --cc=steven.price@arm.com \
    --cc=surenb@google.com \
    --cc=thomas.hellstrom@linux.intel.com \
    --cc=vbabka@suse.cz \
    --cc=vishal.moola@gmail.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    --cc=yuzhao@google.com \
    --cc=zackr@vmware.com \
    --cc=zhengqi.arch@bytedance.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.