From: Hugh Dickins <hughd@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
	Mike Rapoport <rppt@kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	David Hildenbrand <david@redhat.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Qi Zheng <zhengqi.arch@bytedance.com>,
	Yang Shi <shy828301@gmail.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Peter Xu <peterx@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Will Deacon <will@kernel.org>,
	Yu Zhao <yuzhao@google.com>,
	Alistair Popple <apopple@nvidia.com>,
	Ralph Campbell <rcampbell@nvidia.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Steven Price <steven.price@arm.com>,
	SeongJae Park <sj@kernel.org>,
	Lorenzo Stoakes <lstoakes@gmail.com>,
	Huang Ying <ying.huang@intel.com>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Zack Rusin <zackr@vmware.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Minchan Kim <minchan@kernel.org>,
	Christoph Hellwig <hch@infradead.org>,
	Song Liu <song@kernel.org>,
	Thomas Hellstrom <thomas.hellstrom@linux.intel.com>,
	Russell King <linux@armlinux.org.uk>,
	"David S. Miller" <davem@davemloft.net>,
	Michael Ellerman <mpe@ellerman.id.au>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Jann Horn <jannh@google.com>,
	Vishal Moola <vishal.moola@gmail.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Zi Yan <ziy@nvidia.com>,
	linux-arm-kernel@lists.infradead.org,
	sparclinux@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org,
	linux-s390@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: [PATCH v3 13/13] mm/pgtable: notes on pte_offset_map[_lock]()
Date: Tue, 11 Jul 2023 21:46:23 -0700 (PDT)
Message-ID: <b791c3b0-25c6-a263-d785-d564344eb644@google.com>
In-Reply-To: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>

Add a block of comments on pte_offset_map_lock(), pte_offset_map() and
pte_offset_map_nolock() to mm/pgtable-generic.c, to help explain them.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/pgtable-generic.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index fa9d4d084291..4fcd959dcc4d 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -315,6 +315,50 @@ pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
 	return pte;
 }
 
+/*
+ * pte_offset_map_lock(mm, pmd, addr, ptlp), and its internal implementation
+ * __pte_offset_map_lock() below, is usually called with the pmd pointer for
+ * addr, reached by walking down the mm's pgd, p4d, pud for addr: either while
+ * holding mmap_lock or vma lock for read or for write; or in truncate or rmap
+ * context, while holding file's i_mmap_lock or anon_vma lock for read (or for
+ * write). In a few cases, it may be used with pmd pointing to a pmd_t already
+ * copied to or constructed on the stack.
+ *
+ * When successful, it returns the pte pointer for addr, with its page table
+ * kmapped if necessary (when CONFIG_HIGHPTE), and locked against concurrent
+ * modification by software, with a pointer to that spinlock in ptlp (in some
+ * configs mm->page_table_lock, in SPLIT_PTLOCK configs a spinlock in table's
+ * struct page).  pte_unmap_unlock(pte, ptl) to unlock and unmap afterwards.
+ *
+ * But it is unsuccessful, returning NULL with *ptlp unchanged, if there is no
+ * page table at *pmd: if, for example, the page table has just been removed,
+ * or replaced by the huge pmd of a THP.  (When successful, *pmd is rechecked
+ * after acquiring the ptlock, and retried internally if it changed: so that a
+ * page table can be safely removed or replaced by THP while holding its lock.)
+ *
+ * pte_offset_map(pmd, addr), and its internal helper __pte_offset_map() above,
+ * just returns the pte pointer for addr, its page table kmapped if necessary;
+ * or NULL if there is no page table at *pmd.  It does not attempt to lock the
+ * page table, so cannot normally be used when the page table is to be updated,
+ * or when entries read must be stable.  But it does take rcu_read_lock(): so
+ * that even when page table is racily removed, it remains a valid though empty
+ * and disconnected table.  Until pte_unmap(pte) unmaps and rcu_read_unlock()s
+ * afterwards.
+ *
+ * pte_offset_map_nolock(mm, pmd, addr, ptlp), above, is like pte_offset_map();
+ * but when successful, it also outputs a pointer to the spinlock in ptlp - as
+ * pte_offset_map_lock() does, but in this case without locking it.  This helps
+ * the caller to avoid a later pte_lockptr(mm, *pmd), which might by that time
+ * act on a changed *pmd: pte_offset_map_nolock() provides the correct spinlock
+ * pointer for the page table that it returns.  In principle, the caller should
+ * recheck *pmd once the lock is taken; in practice, no callsite needs that -
+ * either the mmap_lock for write, or pte_same() check on contents, is enough.
+ *
+ * Note that free_pgtables(), used after unmapping detached vmas, or when
+ * exiting the whole mm, does not take page table lock before freeing a page
+ * table, and may not use RCU at all: "outsiders" like khugepaged should avoid
+ * pte_offset_map() and co once the vma is detached from mm or mm_users is zero.
+ */
 pte_t *__pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
 			     unsigned long addr, spinlock_t **ptlp)
 {
-- 
2.35.3
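
The contract described in the comment block (snapshot *pmd, take the lock,
recheck *pmd, retry if it changed, return NULL with *ptlp untouched when there
is no table) may be easier to follow as a small userspace model. This is only
a sketch and not kernel code: struct model_table, struct model_pmd,
model_map_lock() and model_unmap_unlock() are hypothetical names invented for
illustration, a pthread mutex stands in for the page table spinlock, and the
model assumes tables are never freed, which is the guarantee that
rcu_read_lock() gives the real implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_PTRS 4

/* Hypothetical stand-in for a page table: a lock plus a few "pte" slots. */
struct model_table {
	pthread_mutex_t lock;		/* models the split ptlock */
	int entries[MODEL_PTRS];	/* models the pte_t slots */
};

/* Hypothetical stand-in for *pmd: NULL models "no page table here". */
struct model_pmd {
	_Atomic(struct model_table *) table;
};

/*
 * Model of the pte_offset_map_lock() contract: on success, return the
 * entry pointer with the table's lock held and *lockp set; on failure,
 * return NULL and leave *lockp unchanged.
 *
 * Assumption: a table's memory stays valid after it is disconnected
 * from the pmd (in the kernel, RCU provides that; here tables are
 * simply never freed).
 */
static int *model_map_lock(struct model_pmd *pmd, size_t idx,
			   pthread_mutex_t **lockp)
{
	assert(idx < MODEL_PTRS);

	for (;;) {
		/* Snapshot the pmd before taking any lock. */
		struct model_table *table = atomic_load(&pmd->table);

		if (!table)
			return NULL;	/* no table: *lockp untouched */

		pthread_mutex_lock(&table->lock);

		/* Recheck under the lock: is this still the table at pmd? */
		if (atomic_load(&pmd->table) == table) {
			*lockp = &table->lock;
			return &table->entries[idx];
		}

		/* The pmd changed while we waited: drop the lock and retry. */
		pthread_mutex_unlock(&table->lock);
	}
}

/* Model of pte_unmap_unlock(): release the lock returned via lockp. */
static void model_unmap_unlock(pthread_mutex_t *lock)
{
	pthread_mutex_unlock(lock);
}
```

In this model a remover would take the table's lock before clearing the pmd,
so a caller that passes the recheck knows the table cannot be disconnected
until it calls model_unmap_unlock().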