From: Zi Yan <ziy@nvidia.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Yang Shi <shy828301@gmail.com>,
	Huang Ying <ying.huang@intel.com>
Subject: Re: [PATCH v3 10/18] mm: Allow non-hugetlb large folios to be batch processed
Date: Wed, 06 Mar 2024 13:41:13 -0500
Message-ID: <03CE3A00-917C-48CC-8E1C-6A98713C817C@nvidia.com>
In-Reply-To: <36bdda72-2731-440e-ad15-39b845401f50@arm.com>

On 6 Mar 2024, at 12:41, Ryan Roberts wrote:

> On 06/03/2024 16:19, Ryan Roberts wrote:
>> On 06/03/2024 16:09, Matthew Wilcox wrote:
>>> On Wed, Mar 06, 2024 at 01:42:06PM +0000, Ryan Roberts wrote:
>>>> When running some swap tests with this change (which is in mm-stable)
>>>> present, I see BadThings(TM). Usually I see a "bad page state"
>>>> followed by a delay of a few seconds, followed by an oops or NULL
>>>> pointer deref. Bisect points to this change, and if I revert it,
>>>> the problem goes away.
>>>
>>> That oops is really messed up ;-(  We've clearly got two CPUs oopsing at
>>> the same time and it's all interleaved.  That said, I can pick some
>>> nuggets out of it.
>>>
>>>> [   76.239466] BUG: Bad page state in process usemem  pfn:2554a0
>>>> [   76.240196] kernel BUG at include/linux/mm.h:1120!
>>>
>>> These are the two different BUGs being called simultaneously ...
>>>
>>> The first one is bad_page() in page_alloc.c and the second is
>>> put_page_testzero()
>>>         VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
>>>
>>> I'm sure it's significant that both of these are the same page (pfn
>>> 2554a0).  Feels like we have two CPUs calling folio_put() at the same
>>> time, and one of them underflows.  It probably doesn't matter which call
>>> trace ends up in bad_page() and which in put_page_testzero().
>>>
>>> One of them is coming from deferred_split_scan(), which is weird because
>>> we can see the folio_try_get() earlier in the function.  So whatever
>>> this folio was, we found it on the deferred split list, got its refcount,
>>> moved it to the local list, either failed to get the lock, or
>>> successfully got the lock, split it, unlocked it and put it.
>>>
>>> (I can see this was invoked from page fault -> memcg shrinking.  That's
>>> probably irrelevant but explains some of the functions in the backtrace)
>>>
>>> The other call trace comes from migrate_folio_done() where we're putting
>>> the _source_ folio.  That was called from migrate_pages_batch() which
>>> was called from kcompactd.
>>>
>>> Um.  Where do we handle the deferred list in the migration code?
>>>
>>>
>>> I've also tried looking at this from a different angle -- what is it
>>> about this commit that produces this problem?  It's a fairly small
>>> commit:
>>>
>>> -               if (folio_test_large(folio)) {
>>> +               /* hugetlb has its own memcg */
>>> +               if (folio_test_hugetlb(folio)) {
>>>                         if (lruvec) {
>>>                                 unlock_page_lruvec_irqrestore(lruvec, flags);
>>>                                 lruvec = NULL;
>>>                         }
>>> -                       __folio_put_large(folio);
>>> +                       free_huge_folio(folio);
>>>
>>> So all that's changed is that large non-hugetlb folios do not call
>>> __folio_put_large().  As a reminder, that function does:
>>>
>>>         if (!folio_test_hugetlb(folio))
>>>                 page_cache_release(folio);
>>>         destroy_large_folio(folio);
>>>
>>> and destroy_large_folio() does:
>>>         if (folio_test_large_rmappable(folio))
>>>                 folio_undo_large_rmappable(folio);
>>>
>>>         mem_cgroup_uncharge(folio);
>>>         free_the_page(&folio->page, folio_order(folio));
>>>
>>> So after my patch, instead of calling (in order):
>>>
>>> 	page_cache_release(folio);
>>> 	folio_undo_large_rmappable(folio);
>>> 	mem_cgroup_uncharge(folio);
>>> 	free_unref_page()
>>>
>>> it calls:
>>>
>>> 	__page_cache_release(folio, &lruvec, &flags);
>>> 	mem_cgroup_uncharge_folios()
>>> 	folio_undo_large_rmappable(folio);
>>>
>>> So have I simply widened the window for this race
>>
>> Yes that's the conclusion I'm coming to. I have reverted this patch and am still
>> seeing what looks like the same problem very occasionally. (I was just about to
>> let you know when I saw this reply). It's much harder to reproduce now... great.
>>
>> The original oops I reported against your RFC is here:
>> https://lore.kernel.org/linux-mm/eeaf36cf-8e29-4de2-9e5a-9ec2a5e30c61@arm.com/
>>
>> Looks like I had UBSAN enabled for that run. Let me turn on all the bells and
>> whistles and see if I can get it to repro more reliably to bisect.
>>
>> Assuming the original oops and this are related, that implies that the problem
>> is lurking somewhere in this series, if not this patch.
>>
>> I'll come back to you shortly...
>
> Just a bunch of circumstantial observations, I'm afraid. No conclusions yet...
>
> With this patch reverted:
>
> - Haven't triggered with any of the sanitizers compiled in
> - Have only triggered when my code is on top (swap-out mTHP)
> - Have only triggered when compiled using GCC 12.2 (can't trigger with 11.4)
>
> So perhaps I'm looking at 2 different things, with this new intermittent problem
> caused by my changes. Or perhaps my changes increase the window significantly.
>
> I have to go pick up my daughter now. Can look at this some more tomorrow, but
> struggling for ideas - need a way to more reliably reproduce.
>
>>
>>> , whatever it is
>>> exactly?  Something involving mis-handling of the deferred list?

I had a chat with willy about the deferred list mis-handling. The current
migration code (starting from commit 616b8371539a6 ("mm: thp: enable thp
migration in generic path")) does not properly handle THPs and mTHPs on the
deferred list: if the source folio is on the deferred list, the destination
folio will not be on it after migration. But this seems to be a benign bug,
since the only consequence is that the opportunity to split a partially
mapped THP/mTHP is lost.

In terms of potential races, the source folio's refcount is elevated before
migration, so deferred_split_scan() can move the folio off the deferred_list
but cannot split it. During folio_migrate_mapping(), while the folio is frozen,
deferred_split_scan() cannot even move the folio off the deferred_list,
because its folio_try_get() fails on a frozen folio.
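
To make the two cases concrete, here is a rough user-space model, not kernel
code. The toy_* names are made up for illustration, and the "expected"
count passed to the freeze is a simplification of what the real split path
computes. It only assumes that folio_try_get() refuses to take a reference
when the count is zero, and that freezing succeeds only when the count
matches what the caller expects.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a folio: only the refcount is modelled. */
struct toy_folio {
	atomic_int refcount;
};

/* Models folio_try_get(): take a reference unless the count is zero. */
static bool toy_try_get(struct toy_folio *f)
{
	int old = atomic_load(&f->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&f->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Models a refcount freeze: succeeds only if the count equals 'expected'. */
static bool toy_ref_freeze(struct toy_folio *f, int expected)
{
	return atomic_compare_exchange_strong(&f->refcount, &expected, 0);
}

/* Undo a successful freeze by restoring the count. */
static void toy_ref_unfreeze(struct toy_folio *f, int count)
{
	atomic_store(&f->refcount, count);
}

/* Case 1: migration holds an extra reference while the scanner runs. */
static bool toy_scanner_can_split_during_migration(void)
{
	struct toy_folio f;
	bool split;

	atomic_init(&f.refcount, 1);		/* base reference */
	atomic_fetch_add(&f.refcount, 1);	/* migration's reference */

	if (!toy_try_get(&f))			/* scanner's own reference */
		return false;

	/* The scanner only expects the base reference plus its own. */
	split = toy_ref_freeze(&f, 2);
	if (split)
		toy_ref_unfreeze(&f, 2);

	atomic_fetch_sub(&f.refcount, 1);	/* scanner drops its reference */
	return split;
}

/* Case 2: the folio is frozen (count == 0) when the scanner looks at it. */
static bool toy_scanner_can_get_frozen_folio(void)
{
	struct toy_folio f;
	bool frozen, got;

	atomic_init(&f.refcount, 2);		/* base + migration */
	frozen = toy_ref_freeze(&f, 2);		/* migration freezes the folio */
	assert(frozen);

	got = toy_try_get(&f);			/* scanner tries to take a ref */
	toy_ref_unfreeze(&f, 2);
	return got;
}
```

Both helpers return false in this model: in case 1 the scanner gets its
reference but the freeze fails because of migration's extra reference, and
in case 2 the scanner cannot get a reference at all.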

I am going to send a patch to fix the deferred_list handling in migration,
but it does not seem to be related to the bug in this email thread.


--
Best Regards,
Yan, Zi

