From: Matthew Wilcox <willy@infradead.org>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Yang Shi <shy828301@gmail.com>,
	Huang Ying <ying.huang@intel.com>
Subject: Re: [PATCH v3 10/18] mm: Allow non-hugetlb large folios to be batch processed
Date: Sun, 10 Mar 2024 04:23:13 +0000	[thread overview]
Message-ID: <Ze01sd5T6tLm6Ep8@casper.infradead.org> (raw)
In-Reply-To: <08c01e9d-beda-435c-93ac-f303a89379df@arm.com>

On Sat, Mar 09, 2024 at 09:38:42AM +0000, Ryan Roberts wrote:
> > I think split_queue_len is getting out of sync with the number of items on the
> > queue? We only decrement it if we lost the race with folio_put(). But we are
> > unconditionally taking folios off the list here. So we are definitely out of
> > sync until we take the lock again below. But we only put folios back on the list
> > that failed to split. A successful split used to decrement this variable
> > (because the folio was on _a_ list). But now it doesn't. So we are always
> > mismatched after the first failed split?
> 
> Oops, I meant first *successful* split.

Agreed, nice fix.

> I've run the full test 5 times, and haven't seen any slow down or RCU stall
> warning. But on the 5th time, I saw the non-NULL mapping oops (your new check
> did not trigger):
> 
> [  944.475632] BUG: Bad page state in process usemem  pfn:252932
> [  944.477314] page:00000000ad4feba6 refcount:0 mapcount:0 mapping:000000003a777cd9 index:0x1 pfn:0x252932
> [  944.478575] aops:0x0 ino:dead000000000122
> [  944.479130] flags: 0xbfffc0000000000(node=0|zone=2|lastcpupid=0xffff)
> [  944.479934] page_type: 0xffffffff()
> [  944.480328] raw: 0bfffc0000000000 0000000000000000 fffffc00084a4c90
> fffffc00084a4c90
> [  944.481734] raw: 0000000000000001 0000000000000000 00000000ffffffff
> 0000000000000000
> [  944.482475] page dumped because: non-NULL mapping

> So what do we know?
> 
>  - the above page looks like it was the 3rd page of a large folio
>     - words 3 and 4 are the same, meaning they are likely empty _deferred_list
>     - pfn alignment is correct for this
>  - The _deferred_list for all previously freed large folios was empty
>     - but the folio could have been in the new deferred split batch?

I don't think it could be in a deferred split batch because we hold the
refcount at that point ...

>  - free_tail_page_prepare() zeroed mapping/_deferred_list during free
>  - _deferred_list was subsequently reinitialized to "empty" while on free list
> 
> So how about this for a rough hypothesis:
> 
> 
> CPU1                                  CPU2
> deferred_split_scan
> list_del_init
> folio_batch_add
>                                       folio_put -> free
> 		                        free_tail_page_prepare
> 			                  is on deferred list? -> no
> split_huge_page_to_list_to_order
>   list_empty(folio->_deferred_list)
>     -> yes
>   list_del_init
> 			                  mapping = NULL
> 					    -> (_deferred_list.prev = NULL)
> 			                put page on free list
>     INIT_LIST_HEAD(entry);
>       -> "mapping" no longer NULL
> 
> 
> But CPU1 is holding a reference, so that could only happen if a reference was
> put one too many times. Ugh.

Before we start blaming the CPU for doing something impossible, what if
we're taking the wrong lock?  I know that seems crazy, but if page->flags
gets corrupted to the point where we change some of the bits in the
nid, when we free the folio, we call folio_undo_large_rmappable(),
get the wrong ds_queue back from get_deferred_split_queue(), take the
wrong split_queue_lock, corrupt the deferred list of a different node,
and bad things happen?

I don't think we can detect that folio->nid has become corrupted in the
page allocation/freeing code (can we?), but we can tell if a folio is
on the wrong ds_queue in deferred_split_scan():

        list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
                                                        _deferred_list) {
+		VM_BUG_ON_FOLIO(folio_nid(folio) != sc->nid, folio);
+		VM_BUG_ON_FOLIO(folio_order(folio) < 2, folio);
                list_del_init(&folio->_deferred_list);

(also testing the hypothesis that somehow a split folio has ended up
on the deferred split list)

This wouldn't catch the splat above early, I don't think, but it might
trigger early enough with your workload that it'd be useful information.

(I reviewed the patch you're currently testing with and it matches with
what I think we should be doing)


