All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org,
	linux-kselftest@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	liubo <liubo254@huawei.com>, Peter Xu <peterx@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	Hugh Dickins <hughd@google.com>, Jason Gunthorpe <jgg@ziepe.ca>,
	John Hubbard <jhubbard@nvidia.com>, Mel Gorman <mgorman@suse.de>,
	Shuah Khan <shuah@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH v2 2/8] smaps: use vm_normal_page_pmd() instead of follow_trans_huge_pmd()
Date: Wed, 2 Aug 2023 17:34:34 +0200	[thread overview]
Message-ID: <36dc6356-78b6-5cc5-0a1a-ef01bbce15f9@redhat.com> (raw)
In-Reply-To: <20230802151613.3nyg3xof3gyovlxu@techsingularity.net>

On 02.08.23 17:16, Mel Gorman wrote:
> On Tue, Aug 01, 2023 at 02:48:38PM +0200, David Hildenbrand wrote:
>> We shouldn't be using a GUP-internal helper if it can be avoided.
>>
>> Similar to smaps_pte_entry() that uses vm_normal_page(), let's use
>> vm_normal_page_pmd() that similarly refuses to return the huge zeropage.
>>
>> In contrast to follow_trans_huge_pmd(), vm_normal_page_pmd():
>>
>> (1) Will always return the head page, not a tail page of a THP.
>>
>>   If we'd ever call smaps_account with a tail page while setting "compound
>>   = true", we could be in trouble, because smaps_account() would look at
>>   the memmap of unrelated pages.
>>
>>   If we're unlucky, that memmap does not exist at all. Before we removed
>>   PG_doublemap, we could have triggered something similar as in
>>   commit 24d7275ce279 ("fs/proc: task_mmu.c: don't read mapcount for
>>   migration entry").
>>
>>   This can theoretically happen ever since commit ff9f47f6f00c ("mm: proc:
>>   smaps_rollup: do not stall write attempts on mmap_lock"):
>>
>>    (a) We're in show_smaps_rollup() and processed a VMA
>>    (b) We release the mmap lock in show_smaps_rollup() because it is
>>        contended
>>    (c) We merged that VMA with another VMA
>>    (d) We collapsed a THP in that merged VMA at that position
>>
>>   If the end address of the original VMA falls into the middle of a THP
>>   area, we would call smap_gather_stats() with a start address that falls
>>   into a PMD-mapped THP. It's probably very rare to trigger when not
>>   really forced.
>>
>> (2) Will succeed on a is_pci_p2pdma_page(), like vm_normal_page()
>>
>>   Treat such PMDs here just like smaps_pte_entry() would treat such PTEs.
>>   If such pages would be anonymous, we most certainly would want to
>>   account them.
>>
>> (3) Will skip over pmd_devmap(), like vm_normal_page() for pte_devmap()
>>
>>   As noted in vm_normal_page(), that is only for handling legacy ZONE_DEVICE
>>   pages. So just like smaps_pte_entry(), we'll now also ignore such PMD
>>   entries.
>>
>>   Especially, follow_pmd_mask() never ends up calling
>>   follow_trans_huge_pmd() on pmd_devmap(). Instead it calls
>>   follow_devmap_pmd() -- which will fail if neither FOLL_GET nor FOLL_PIN
>>   is set.
>>
>>   So skipping pmd_devmap() pages seems to be the right thing to do.
>>
>> (4) Will properly handle VM_MIXEDMAP/VM_PFNMAP, like vm_normal_page()
>>
>>   We won't be returning a memmap that should be ignored by core-mm, or
>>   worse, a memmap that does not even exist. Note that while
>>   walk_page_range() will skip VM_PFNMAP mappings, walk_page_vma() won't.
>>
>>   Most probably this case doesn't currently really happen on the PMD level,
>>   otherwise we'd already be able to trigger kernel crashes when reading
>>   smaps / smaps_rollup.
>>
>> So most probably only (1) is relevant in practice as of now, but could only
>> cause trouble in extreme corner cases.
>>
>> Fixes: ff9f47f6f00c ("mm: proc: smaps_rollup: do not stall write attempts on mmap_lock")
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> Maybe move the follow_trans_huge_pmd() declaration from linux/huge_mm.h
> to mm/internal.h to discourage future mistakes? Otherwise
> 

Makes sense.

> Acked-by: Mel Gorman <mgorman@techsingularity.net>

Thanks!

-- 
Cheers,

David / dhildenb


  reply	other threads:[~2023-08-02 15:36 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-01 12:48 [PATCH v2 0/8] smaps / mm/gup: fix gup_can_follow_protnone fallout David Hildenbrand
2023-08-01 12:48 ` [PATCH v2 1/8] mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT David Hildenbrand
2023-08-01 15:48   ` Peter Xu
2023-08-01 16:15     ` David Hildenbrand
2023-08-01 17:04       ` Peter Xu
2023-08-01 17:09         ` David Hildenbrand
2023-08-02 15:08   ` Mel Gorman
2023-08-02 15:12     ` David Hildenbrand
2023-08-01 12:48 ` [PATCH v2 2/8] smaps: use vm_normal_page_pmd() instead of follow_trans_huge_pmd() David Hildenbrand
2023-08-02 15:16   ` Mel Gorman
2023-08-02 15:34     ` David Hildenbrand [this message]
2023-08-01 12:48 ` [PATCH v2 3/8] kvm: explicitly set FOLL_HONOR_NUMA_FAULT in hva_to_pfn_slow() David Hildenbrand
2023-08-02 15:27   ` Mel Gorman
2023-08-02 15:29     ` David Hildenbrand
2023-08-01 12:48 ` [PATCH v2 4/8] mm/gup: don't implicitly set FOLL_HONOR_NUMA_FAULT David Hildenbrand
2023-08-02 15:28   ` Mel Gorman
2023-08-01 12:48 ` [PATCH v2 5/8] pgtable: improve pte_protnone() comment David Hildenbrand
2023-08-02 15:35   ` Mel Gorman
2023-08-01 12:48 ` [PATCH v2 6/8] mm/huge_memory: remove stale NUMA hinting comment from follow_trans_huge_pmd() David Hildenbrand
2023-08-01 16:07   ` Peter Xu
2023-08-01 16:16     ` David Hildenbrand
2023-08-02 15:34   ` Mel Gorman
2023-08-01 12:48 ` [PATCH v2 7/8] selftest/mm: ksm_functional_tests: test in mmap_and_merge_range() if anything got merged David Hildenbrand
2023-08-01 12:48 ` [PATCH v2 8/8] selftest/mm: ksm_functional_tests: Add PROT_NONE test David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=36dc6356-78b6-5cc5-0a1a-ef01bbce15f9@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=jgg@ziepe.ca \
    --cc=jhubbard@nvidia.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=liubo254@huawei.com \
    --cc=mgorman@suse.de \
    --cc=mgorman@techsingularity.net \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=shuah@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.