All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Yang Shi <shy828301@gmail.com>,
	Wang Yugui <wangyugui@e16-tech.com>,
	Matthew Wilcox <willy@infradead.org>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	Alistair Popple <apopple@nvidia.com>,
	Ralph Campbell <rcampbell@nvidia.com>, Zi Yan <ziy@nvidia.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Minchan Kim <minchan@kernel.org>, Jue Wang <juew@google.com>,
	Peter Xu <peterx@redhat.com>, Jan Kara <jack@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 5/7] mm/thp: fix page_vma_mapped_walk() if huge page mapped by ptes
Date: Fri, 4 Jun 2021 19:24:02 +0300	[thread overview]
Message-ID: <20210604162402.iclcdd3ywynkoamy@box.shutemov.name> (raw)
In-Reply-To: <alpine.LSU.2.11.2106011411290.2148@eggly.anvils>

On Tue, Jun 01, 2021 at 02:13:21PM -0700, Hugh Dickins wrote:
> Running certain tests with a DEBUG_VM kernel would crash within hours,
> on the total_mapcount BUG() in split_huge_page_to_list(), while trying
> to free up some memory by punching a hole in a shmem huge page: split's
> try_to_unmap() was unable to find all the mappings of the page (which,
> on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).
> 
> Crash dumps showed two tail pages of a shmem huge page remained mapped
> by pte: ptes in a non-huge-aligned vma of a gVisor process, at the end
> of a long unmapped range; and no page table had yet been allocated for
> the head of the huge page to be mapped into.
> 
> Although designed to handle these odd misaligned huge-page-mapped-by-pte
> cases, page_vma_mapped_walk() falls short by returning false prematurely
> when !pmd_present or !pud_present or !p4d_present or !pgd_present: there
> are cases when a huge page may span the boundary, with ptes present in
> the next.

Oh. My bad. I guess it was pain to debug.

> Restructure page_vma_mapped_walk() as a loop to continue in these cases,
> while keeping its layout much as before. Add a step_forward() helper to
> advance pvmw->address across those boundaries: originally I tried to use
> mm's standard p?d_addr_end() macros, but hit the same crash 512 times
> less often: because of the way redundant levels are folded together,
> but folded differently in different configurations, it was just too
> difficult to use them correctly; and step_forward() is simpler anyway.
> 
> Merged various other minor fixes and cleanups into page_vma_mapped_walk()
> as I worked on it: which I find much easier to enumerate here than to
> prise apart into separate commits.

But it makes it harder to review...

> Handle all of the hugetlbfs PageHuge() case once at the start,
> so we don't need to worry about it again further down.
> 
> Sometimes local copy of pvmw->page was used, sometimes pvmw->page:
> just use pvmw->page throughout (and continue to use pvmw->address
> throughout, though we could take a local copy instead).
> 
> Use pmd_read_atomic() with barrier() instead of READ_ONCE() for pmde:
> some architectures (e.g. i386 with PAE) have a multi-word pmd entry,
> for which READ_ONCE() is not good enough.
> 
> Re-evaluate pmde after taking lock, then use it in subsequent tests,
> instead of repeatedly dereferencing pvmw->pmd pointer.
> 
> Rearrange the !pmd_present block to follow the same "return not_found,
> return not_found, return true" pattern as the block above it (note:
> returning not_found there is never premature, since the existence or
> prior existence of a huge pmd guarantees good alignment).
> 
> Adjust page table boundary test in case address was not page-aligned.
> 
> Reset pvmw->pte to NULL after unmapping that page table.
> 
> Respect the 80-column line limit.
> 
> Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: <stable@vger.kernel.org>

I tried to review it and superficially it looks good, but it has to be
split into bunch of patches.


>  		/* when pud is not present, pte will be NULL */
> -		pvmw->pte = huge_pte_offset(mm, pvmw->address, page_size(page));
> +		pvmw->pte = huge_pte_offset(mm, pvmw->address,
> +					    page_size(pvmw->page));

AFAICS, it exactly fits into 80-column.

>  		if (!pvmw->pte)
>  			return false;
>  
> -		pvmw->ptl = huge_pte_lockptr(page_hstate(page), mm, pvmw->pte);
> +		pvmw->ptl = huge_pte_lockptr(page_hstate(pvmw->page),
> +					     mm, pvmw->pte);

And this one end on 79.

Hm?

-- 
 Kirill A. Shutemov

  reply	other threads:[~2021-06-04 16:24 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-01 21:03 [PATCH 0/7] mm/thp: fix THP splitting unmap BUGs and related Hugh Dickins
2021-06-01 21:03 ` Hugh Dickins
2021-06-01 21:05 ` [PATCH 1/7] mm/thp: fix __split_huge_pmd_locked() on shmem migration entry Hugh Dickins
2021-06-01 21:05   ` Hugh Dickins
2021-06-03 21:26   ` Yang Shi
2021-06-03 21:26     ` Yang Shi
2021-06-04  2:22     ` Hugh Dickins
2021-06-04  2:22       ` Hugh Dickins
2021-06-04 18:03       ` Yang Shi
2021-06-04 18:03         ` Yang Shi
2021-06-04 21:52         ` Hugh Dickins
2021-06-04 21:52           ` Hugh Dickins
2021-06-04 15:34   ` Kirill A. Shutemov
2021-06-04 21:29     ` Hugh Dickins
2021-06-04 21:29       ` Hugh Dickins
2021-06-01 21:07 ` [PATCH 2/7] mm/thp: try_to_unmap() use TTU_SYNC for safe DEBUG_VM splitting Hugh Dickins
2021-06-01 21:07   ` Hugh Dickins
2021-06-02  1:59   ` Alistair Popple
2021-06-03 21:45   ` Yang Shi
2021-06-03 21:45     ` Yang Shi
2021-06-04  2:45     ` Hugh Dickins
2021-06-04  2:45       ` Hugh Dickins
2021-06-04 18:24       ` Yang Shi
2021-06-04 18:24         ` Yang Shi
2021-06-03 21:48   ` Peter Xu
2021-06-04  2:54     ` Hugh Dickins
2021-06-04  2:54       ` Hugh Dickins
2021-06-04 14:48       ` Peter Xu
2021-06-04 22:26         ` Hugh Dickins
2021-06-04 22:26           ` Hugh Dickins
2021-06-04 15:47       ` Kirill A. Shutemov
2021-06-01 21:09 ` [PATCH 3/7] mm/thp: fix vma_address() if virtual address below file offset Hugh Dickins
2021-06-01 21:09   ` Hugh Dickins
2021-06-01 21:30   ` Matthew Wilcox
2021-06-03 21:36     ` Hugh Dickins
2021-06-03 21:36       ` Hugh Dickins
2021-06-03 21:40       ` [PATCH v2 " Hugh Dickins
2021-06-03 21:40         ` Hugh Dickins
2021-06-04 15:53         ` Kirill A. Shutemov
2021-06-04 17:36         ` Matthew Wilcox
2021-06-04 22:35           ` Hugh Dickins
2021-06-04 22:35             ` Hugh Dickins
2021-06-01 21:11 ` [PATCH 4/7] mm/thp: fix page_address_in_vma() on file THP tails Hugh Dickins
2021-06-01 21:11   ` Hugh Dickins
2021-06-01 21:32   ` Matthew Wilcox
2021-06-03 22:06   ` Yang Shi
2021-06-03 22:06     ` Yang Shi
2021-06-04 15:54   ` Kirill A. Shutemov
2021-06-01 21:13 ` [PATCH 5/7] mm/thp: fix page_vma_mapped_walk() if huge page mapped by ptes Hugh Dickins
2021-06-01 21:13   ` Hugh Dickins
2021-06-04 16:24   ` Kirill A. Shutemov [this message]
2021-06-04 17:42     ` Matthew Wilcox
2021-06-04 22:56     ` Hugh Dickins
2021-06-04 22:56       ` Hugh Dickins
2021-06-01 21:15 ` [PATCH 6/7] mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page() Hugh Dickins
2021-06-01 21:15   ` Hugh Dickins
2021-06-04 16:39   ` Kirill A. Shutemov
2021-06-04 23:07     ` Hugh Dickins
2021-06-04 23:07       ` Hugh Dickins
2021-06-01 21:17 ` [PATCH 7/7] mm/thp: remap_page() is only needed on anonymous THP Hugh Dickins
2021-06-01 21:17   ` Hugh Dickins
2021-06-03 22:09   ` Yang Shi
2021-06-03 22:09     ` Yang Shi
2021-06-04 16:41   ` Kirill A. Shutemov
2021-06-02  2:07 ` [PATCH 0/7] mm/thp: fix THP splitting unmap BUGs and related Alistair Popple
2021-06-03 22:21 ` Hugh Dickins
2021-06-03 22:21   ` Hugh Dickins
2021-06-03 23:03   ` Andrew Morton
2021-06-03 22:26 ` [PATCH 6.1/7] mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split Hugh Dickins
2021-06-03 22:26   ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210604162402.iclcdd3ywynkoamy@box.shutemov.name \
    --to=kirill@shutemov.name \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=juew@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan@kernel.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=peterx@redhat.com \
    --cc=rcampbell@nvidia.com \
    --cc=shy828301@gmail.com \
    --cc=wangyugui@e16-tech.com \
    --cc=willy@infradead.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.