From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Yang Shi <shy828301@gmail.com>,
Wang Yugui <wangyugui@e16-tech.com>,
Matthew Wilcox <willy@infradead.org>,
Naoya Horiguchi <naoya.horiguchi@nec.com>,
Alistair Popple <apopple@nvidia.com>,
Ralph Campbell <rcampbell@nvidia.com>, Zi Yan <ziy@nvidia.com>,
Miaohe Lin <linmiaohe@huawei.com>,
Minchan Kim <minchan@kernel.org>, Jue Wang <juew@google.com>,
Peter Xu <peterx@redhat.com>, Jan Kara <jack@suse.cz>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 5/7] mm/thp: fix page_vma_mapped_walk() if huge page mapped by ptes
Date: Fri, 4 Jun 2021 19:24:02 +0300
Message-ID: <20210604162402.iclcdd3ywynkoamy@box.shutemov.name>
In-Reply-To: <alpine.LSU.2.11.2106011411290.2148@eggly.anvils>
On Tue, Jun 01, 2021 at 02:13:21PM -0700, Hugh Dickins wrote:
> Running certain tests with a DEBUG_VM kernel would crash within hours,
> on the total_mapcount BUG() in split_huge_page_to_list(), while trying
> to free up some memory by punching a hole in a shmem huge page: split's
> try_to_unmap() was unable to find all the mappings of the page (which,
> on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).
>
> Crash dumps showed two tail pages of a shmem huge page remained mapped
> by pte: ptes in a non-huge-aligned vma of a gVisor process, at the end
> of a long unmapped range; and no page table had yet been allocated for
> the head of the huge page to be mapped into.
>
> Although designed to handle these odd misaligned huge-page-mapped-by-pte
> cases, page_vma_mapped_walk() falls short by returning false prematurely
> when !pmd_present or !pud_present or !p4d_present or !pgd_present: there
> are cases when a huge page may span the boundary, with ptes present in
> the next.
Oh. My bad. I guess it was a pain to debug.
> Restructure page_vma_mapped_walk() as a loop to continue in these cases,
> while keeping its layout much as before. Add a step_forward() helper to
> advance pvmw->address across those boundaries: originally I tried to use
> mm's standard p?d_addr_end() macros, but hit the same crash 512 times
> less often: because of the way redundant levels are folded together,
> but folded differently in different configurations, it was just too
> difficult to use them correctly; and step_forward() is simpler anyway.
>
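For readers following along, the boundary-stepping idea described above can be sketched in user space. This is only an illustration under stated assumptions: the patch body of step_forward() is not quoted in this message, so the function below is a hypothetical stand-in that takes a bare address rather than a pvmw pointer, and the saturation to ULONG_MAX on wrap-around is an assumed way of making a walk loop terminate.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical user-space stand-in for a step_forward()-style helper.
 * Advances addr to the start of the next size-aligned region.
 * Assumption: size is a power of two (e.g. the span of one pmd or pud).
 */
static unsigned long step_forward(unsigned long addr, unsigned long size)
{
	/* The mask arithmetic below only works for power-of-two sizes. */
	assert(size != 0 && (size & (size - 1)) == 0);

	/* Round up to the next boundary; an aligned addr moves a full step. */
	addr = (addr + size) & ~(size - 1);

	/* Unsigned wrap past the top of the address space yields 0: saturate. */
	if (!addr)
		addr = ULONG_MAX;
	return addr;
}
```

With a 2MB step (0x200000), an address of 0x201000 advances to 0x400000, and an address in the last 2MB of the address space saturates to ULONG_MAX instead of wrapping to 0. Because the helper takes an explicit size, it does not depend on how page table levels are folded in a given configuration.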
> Merged various other minor fixes and cleanups into page_vma_mapped_walk()
> as I worked on it: which I find much easier to enumerate here than to
> prise apart into separate commits.
But it makes it harder to review...
> Handle all of the hugetlbfs PageHuge() case once at the start,
> so we don't need to worry about it again further down.
>
> Sometimes local copy of pvmw->page was used, sometimes pvmw->page:
> just use pvmw->page throughout (and continue to use pvmw->address
> throughout, though we could take a local copy instead).
>
> Use pmd_read_atomic() with barrier() instead of READ_ONCE() for pmde:
> some architectures (e.g. i386 with PAE) have a multi-word pmd entry,
> for which READ_ONCE() is not good enough.
>
> Re-evaluate pmde after taking lock, then use it in subsequent tests,
> instead of repeatedly dereferencing pvmw->pmd pointer.
>
> Rearrange the !pmd_present block to follow the same "return not_found,
> return not_found, return true" pattern as the block above it (note:
> returning not_found there is never premature, since the existence or
> prior existence of a huge pmd guarantees good alignment).
>
> Adjust page table boundary test in case address was not page-aligned.
>
> Reset pvmw->pte to NULL after unmapping that page table.
>
> Respect the 80-column line limit.
>
> Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: <stable@vger.kernel.org>
I tried to review it and superficially it looks good, but it has to be
split into a bunch of patches.
> /* when pud is not present, pte will be NULL */
> - pvmw->pte = huge_pte_offset(mm, pvmw->address, page_size(page));
> + pvmw->pte = huge_pte_offset(mm, pvmw->address,
> + page_size(pvmw->page));
AFAICS, it fits exactly into 80 columns.
> if (!pvmw->pte)
> return false;
>
> - pvmw->ptl = huge_pte_lockptr(page_hstate(page), mm, pvmw->pte);
> + pvmw->ptl = huge_pte_lockptr(page_hstate(pvmw->page),
> + mm, pvmw->pte);
And this one ends on 79.
Hm?
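
The column counts can be checked mechanically. The sketch below is a small user-space helper, not anything from the patch: display_width() is a hypothetical name, and it assumes the kernel convention of 8-column tab stops. The leading whitespace of the quoted lines is not preserved in this message, so the two-tab indentation used in the usage note is an assumption as well.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: width of a line as displayed, where each tab
 * advances to the next multiple of `tab` columns (8 in kernel style).
 * Counting stops at the end of the string or at the first newline.
 */
static size_t display_width(const char *line, size_t tab)
{
	size_t col = 0;

	assert(tab > 0);
	for (; *line && *line != '\n'; line++) {
		if (*line == '\t')
			col += tab - (col % tab);	/* jump to next tab stop */
		else
			col++;				/* ordinary character */
	}
	return col;
}
```

Assuming two leading tabs, display_width() gives 80 for the original huge_pte_offset() line and 79 for the original huge_pte_lockptr() line, which matches the counts above.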
--
Kirill A. Shutemov