All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Zi Yan <ziy@nvidia.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Yang Shi <shy828301@gmail.com>,
	Huang Ying <ying.huang@intel.com>
Subject: Re: [PATCH v3 10/18] mm: Allow non-hugetlb large folios to be batch processed
Date: Wed, 6 Mar 2024 19:55:50 +0000	[thread overview]
Message-ID: <ZejKRs0Ft9w2nm0s@casper.infradead.org> (raw)
In-Reply-To: <03CE3A00-917C-48CC-8E1C-6A98713C817C@nvidia.com>

On Wed, Mar 06, 2024 at 01:41:13PM -0500, Zi Yan wrote:
> I had a chat with willy on the deferred list mis-handling. Current migration
> code (starting from commit 616b8371539a6 ("mm: thp: enable thp migration in
> generic path")) does not properly handle THP and mTHP on the deferred list.
> So if the source folio is on the deferred list, after migration,
> the destination folio will not. But this seems a benign bug, since
> the opportunity of splitting a partially mapped THP/mTHP is gone.
> 
> In terms of potential races, the source folio refcount is elevated before
> migration, deferred_split_scan() can move the folio off the deferred_list,
> but cannot split it. During folio_migrate_mapping() when folio is frozen,
> deferred_split_scan() cannot move the folio off the deferred_list to begin
> with.
> 
> I am going to send a patch to fix the deferred_list handling in migration,
> but it seems not be related to the bug in this email thread.

... IOW the source folio remains on the deferred list until its
refcount goes to 0, at which point we call folio_undo_large_rmappable()
and remove it from the deferred list.

A different line of enquiry might be the "else /* We lost race with
folio_put() */" in deferred_split_scan().  If somebody froze the
refcount, we can lose track of a deferred-split folio.  But I think
that's OK too.  The only places which freeze a folio are vmscan (about
to free), folio_migrate_mapping() (discussed above), and page splitting.
In none of these cases do we want to keep the folio on the deferred
split list because we're either freeing it, migrating it or splitting
it.

Oh, and there's something in s390 that I can't be bothered to look at.


Hang on, I think I see it.  It is a race between folio freeing and
deferred_split_scan(), but page migration is absolved.  Look:

CPU 1: deferred_split_scan:
spin_lock_irqsave(split_queue_lock)
list_for_each_entry_safe()
folio_try_get()
list_move(&folio->_deferred_list, &list);
spin_unlock_irqrestore(split_queue_lock)
list_for_each_entry_safe() {
	folio_trylock() <- fails
	folio_put(folio);

CPU 2: folio_put:
folio_undo_large_rmappable
        ds_queue = get_deferred_split_queue(folio);
        spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
                list_del_init(&folio->_deferred_list);
*** at this point CPU 1 is not holding the split_queue_lock; the
folio is on the local list.  Which we just corrupted ***

Now anything can happen.  It's a pretty tight race that involves at
least two CPUs (CPU 2 might have been the one to have the folio locked
at the time CPU 1 caalled folio_trylock()).  But I definitely widened
the window by moving the decrement of the refcount and the removal from
the deferred list further apart.


OK, so what's the solution here?  Personally I favour using a
folio_batch in deferred_split_scan() to hold the folios that we're
going to try to remove instead of a linked list.  Other ideas that are
perhaps less intrusive?


  reply	other threads:[~2024-03-06 19:55 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-27 17:42 [PATCH v3 00/18] Rearrange batched folio freeing Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 01/18] mm: Make folios_put() the basis of release_pages() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 02/18] mm: Convert free_unref_page_list() to use folios Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 03/18] mm: Add free_unref_folios() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 04/18] mm: Use folios_put() in __folio_batch_release() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 05/18] memcg: Add mem_cgroup_uncharge_folios() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 06/18] mm: Remove use of folio list from folios_put() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 07/18] mm: Use free_unref_folios() in put_pages_list() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 08/18] mm: use __page_cache_release() in folios_put() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 09/18] mm: Handle large folios in free_unref_folios() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 10/18] mm: Allow non-hugetlb large folios to be batch processed Matthew Wilcox (Oracle)
2024-03-06 13:42   ` Ryan Roberts
2024-03-06 16:09     ` Matthew Wilcox
2024-03-06 16:19       ` Ryan Roberts
2024-03-06 17:41         ` Ryan Roberts
2024-03-06 18:41           ` Zi Yan
2024-03-06 19:55             ` Matthew Wilcox [this message]
2024-03-06 21:55               ` Matthew Wilcox
2024-03-07  8:56                 ` Ryan Roberts
2024-03-07 13:50                   ` Yin, Fengwei
2024-03-07 14:05                     ` Re: Matthew Wilcox
2024-03-07 15:24                       ` Re: Ryan Roberts
2024-03-07 16:24                         ` Re: Ryan Roberts
2024-03-07 23:02                           ` Re: Matthew Wilcox
2024-03-08  1:06                       ` Re: Yin, Fengwei
2024-03-07 17:33                   ` [PATCH v3 10/18] mm: Allow non-hugetlb large folios to be batch processed Matthew Wilcox
2024-03-07 18:35                     ` Ryan Roberts
2024-03-07 20:42                       ` Matthew Wilcox
2024-03-08 11:44                     ` Ryan Roberts
2024-03-08 12:09                       ` Ryan Roberts
2024-03-08 14:21                         ` Ryan Roberts
2024-03-08 15:11                           ` Matthew Wilcox
2024-03-08 16:03                             ` Matthew Wilcox
2024-03-08 17:13                               ` Ryan Roberts
2024-03-08 18:09                                 ` Ryan Roberts
2024-03-08 18:18                                   ` Matthew Wilcox
2024-03-09  4:34                                     ` Andrew Morton
2024-03-09  4:52                                       ` Matthew Wilcox
2024-03-09  8:05                                         ` Ryan Roberts
2024-03-09 12:33                                           ` Ryan Roberts
2024-03-10 13:38                                             ` Matthew Wilcox
2024-03-08 15:33                         ` Matthew Wilcox
2024-03-09  6:09                       ` Matthew Wilcox
2024-03-09  7:59                         ` Ryan Roberts
2024-03-09  8:18                           ` Ryan Roberts
2024-03-09  9:38                             ` Ryan Roberts
2024-03-10  4:23                               ` Matthew Wilcox
2024-03-10  8:23                                 ` Ryan Roberts
2024-03-10 11:08                                   ` Matthew Wilcox
2024-03-10 11:01       ` Ryan Roberts
2024-03-10 11:11         ` Matthew Wilcox
2024-03-10 16:31           ` Ryan Roberts
2024-03-10 19:57             ` Matthew Wilcox
2024-03-10 19:59             ` Ryan Roberts
2024-03-10 20:46               ` Matthew Wilcox
2024-03-10 21:52                 ` Matthew Wilcox
2024-03-11  9:01                   ` Ryan Roberts
2024-03-11 12:26                     ` Matthew Wilcox
2024-03-11 12:36                       ` Ryan Roberts
2024-03-11 15:50                         ` Matthew Wilcox
2024-03-11 16:14                           ` Ryan Roberts
2024-03-11 17:49                             ` Matthew Wilcox
2024-03-12 11:57                               ` Ryan Roberts
2024-03-11 19:26                             ` Matthew Wilcox
2024-03-10 11:14         ` Ryan Roberts
2024-02-27 17:42 ` [PATCH v3 11/18] mm: Free folios in a batch in shrink_folio_list() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 12/18] mm: Free folios directly in move_folios_to_lru() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 13/18] memcg: Remove mem_cgroup_uncharge_list() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 14/18] mm: Remove free_unref_page_list() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 15/18] mm: Remove lru_to_page() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 16/18] mm: Convert free_pages_and_swap_cache() to use folios_put() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 17/18] mm: Use a folio in __collapse_huge_page_copy_succeeded() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 18/18] mm: Convert free_swap_cache() to take a folio Matthew Wilcox (Oracle)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZejKRs0Ft9w2nm0s@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=linux-mm@kvack.org \
    --cc=ryan.roberts@arm.com \
    --cc=shy828301@gmail.com \
    --cc=ying.huang@intel.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.