All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>, linux-mm@kvack.org
Subject: Re: [PATCH v3 10/18] mm: Allow non-hugetlb large folios to be batch processed
Date: Mon, 11 Mar 2024 17:49:09 +0000	[thread overview]
Message-ID: <Ze9EFdFLXQEUVtKl@casper.infradead.org> (raw)
In-Reply-To: <7177da75-0158-4227-98cf-ca93a73389a0@arm.com>

On Mon, Mar 11, 2024 at 04:14:06PM +0000, Ryan Roberts wrote:
> > With this patch, it took about 700 seconds in my xfstests run to find
> > the one in shrink_folio_list().  It took 3260 seconds to catch the one
> > in move_folios_to_lru().
> 
> 54 hours?? That's before I even reported that we still had a bug! Either you're
> anticipating my every move, or you have a lot of stuff running in parallel :)

One hour is 3600 seconds ;-)  So more like 53 minutes.

> > That one is only probably OK.  We usually manage to split a folio before
> > we get to this point, so we should remove it from the deferred list then.
> > You'd need to buy a lottery ticket if you managed to get hwpoison in a
> > folio that was deferred split ...
> 
> OK, but "probably"? Given hwpoison is surely not a hot path, why not just be
> safe and call folio_undo_large_rmappable()?

Yes, I've added it; just commenting on the likelihood of being able to
hit it in testing, even with aggressive error injection.

> >> But taking a step back, I wonder whether the charge() and uncharge() functions
> >> should really be checking to see if the folio is on a deferred split list and if
> >> so, then move the folio to the corect list?
> > 
> > I don't think that's the right thing to do.  All of these places which
> > uncharge a folio are part of the freeing path, so we always want it
> > removed, not moved to a different deferred list.
> 
> Well I'm just thinking about trying to be robust. Clearly you would prefer that
> folio_undo_large_rmappable() has been called before uncharge(), then uncharge()
> notices that there is nothing on the deferred list (and doesn't take the lock).
> But if it's not, is it better to randomly crash (costing best part of a week to
> debug) or move the folio to the right list?

Neither ;-)  The right option is to include the assertion that the
deferred list is empty.  That way we get to see the backtrace of whoever
forgot to take the folio off the deferred list.

> Alternatively, can we refactor so that there aren't 9 separate uncharge() call
> sites. Those sites are all trying to free the folio so is there a way to better
> refactor that into a single place (I guess the argument for the current
> arrangement is reducing the number of times that we have to iterate through the
> batch?). Then we only have to get it right once.

I have been wondering about a better way to do it.  I've also been
looking a bit askance at put_pages_list() which doesn't do memcg
uncharging ...

> > 
> > But what about mem_cgroup_move_account()?  Looks like that's memcg v1
> > only?  Should still be fixed though ...
> 
> Right.
> 
> And what about the first bug you found with the local list corruption? I'm not
> running with that fix so its obviously not a problem here. But I still think its
> a bug that we should fix? list_for_each_entry_safe() isn't safe against
> *concurrent* list modification, right?

I've been thinking about that too.  I decided that the local list is
actually protected by the lock after all.  It's a bit fiddly to prove,
but:

1. We have a reference on every folio ahead on the list (not behind us,
but see below)
2. If split_folio succeeds, it takes the lock that would protect the
list we are on.
3. If it doesn't, and folio_put() turns out to be the last reference,
__folio_put_large -> destroy_large_folio -> folio_undo_large_rmappable
takes the lock that protects the list we would be on.

So we can analyse this loop as:

	list_for_each_entry_safe(folio, next, &list, _deferred_list) {
		if (random() & 1)
			continue;
		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
		list_del_init(&folio->_deferred_list);
		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
	}

We're guaranteed that 'next' is a valid folio because we hold a refcount
on it.  Anything left on the list between &list and next may have been
removed from the list, but we don't look at those entries until after
we take the split_queue_lock again to do the list_splice_tail().

I'm too scared to write a loop like this, but I don't think it contains
a bug.


  reply	other threads:[~2024-03-11 17:49 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-27 17:42 [PATCH v3 00/18] Rearrange batched folio freeing Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 01/18] mm: Make folios_put() the basis of release_pages() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 02/18] mm: Convert free_unref_page_list() to use folios Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 03/18] mm: Add free_unref_folios() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 04/18] mm: Use folios_put() in __folio_batch_release() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 05/18] memcg: Add mem_cgroup_uncharge_folios() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 06/18] mm: Remove use of folio list from folios_put() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 07/18] mm: Use free_unref_folios() in put_pages_list() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 08/18] mm: use __page_cache_release() in folios_put() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 09/18] mm: Handle large folios in free_unref_folios() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 10/18] mm: Allow non-hugetlb large folios to be batch processed Matthew Wilcox (Oracle)
2024-03-06 13:42   ` Ryan Roberts
2024-03-06 16:09     ` Matthew Wilcox
2024-03-06 16:19       ` Ryan Roberts
2024-03-06 17:41         ` Ryan Roberts
2024-03-06 18:41           ` Zi Yan
2024-03-06 19:55             ` Matthew Wilcox
2024-03-06 21:55               ` Matthew Wilcox
2024-03-07  8:56                 ` Ryan Roberts
2024-03-07 13:50                   ` Yin, Fengwei
2024-03-07 14:05                     ` Re: Matthew Wilcox
2024-03-07 15:24                       ` Re: Ryan Roberts
2024-03-07 16:24                         ` Re: Ryan Roberts
2024-03-07 23:02                           ` Re: Matthew Wilcox
2024-03-08  1:06                       ` Re: Yin, Fengwei
2024-03-07 17:33                   ` [PATCH v3 10/18] mm: Allow non-hugetlb large folios to be batch processed Matthew Wilcox
2024-03-07 18:35                     ` Ryan Roberts
2024-03-07 20:42                       ` Matthew Wilcox
2024-03-08 11:44                     ` Ryan Roberts
2024-03-08 12:09                       ` Ryan Roberts
2024-03-08 14:21                         ` Ryan Roberts
2024-03-08 15:11                           ` Matthew Wilcox
2024-03-08 16:03                             ` Matthew Wilcox
2024-03-08 17:13                               ` Ryan Roberts
2024-03-08 18:09                                 ` Ryan Roberts
2024-03-08 18:18                                   ` Matthew Wilcox
2024-03-09  4:34                                     ` Andrew Morton
2024-03-09  4:52                                       ` Matthew Wilcox
2024-03-09  8:05                                         ` Ryan Roberts
2024-03-09 12:33                                           ` Ryan Roberts
2024-03-10 13:38                                             ` Matthew Wilcox
2024-03-08 15:33                         ` Matthew Wilcox
2024-03-09  6:09                       ` Matthew Wilcox
2024-03-09  7:59                         ` Ryan Roberts
2024-03-09  8:18                           ` Ryan Roberts
2024-03-09  9:38                             ` Ryan Roberts
2024-03-10  4:23                               ` Matthew Wilcox
2024-03-10  8:23                                 ` Ryan Roberts
2024-03-10 11:08                                   ` Matthew Wilcox
2024-03-10 11:01       ` Ryan Roberts
2024-03-10 11:11         ` Matthew Wilcox
2024-03-10 16:31           ` Ryan Roberts
2024-03-10 19:57             ` Matthew Wilcox
2024-03-10 19:59             ` Ryan Roberts
2024-03-10 20:46               ` Matthew Wilcox
2024-03-10 21:52                 ` Matthew Wilcox
2024-03-11  9:01                   ` Ryan Roberts
2024-03-11 12:26                     ` Matthew Wilcox
2024-03-11 12:36                       ` Ryan Roberts
2024-03-11 15:50                         ` Matthew Wilcox
2024-03-11 16:14                           ` Ryan Roberts
2024-03-11 17:49                             ` Matthew Wilcox [this message]
2024-03-12 11:57                               ` Ryan Roberts
2024-03-11 19:26                             ` Matthew Wilcox
2024-03-10 11:14         ` Ryan Roberts
2024-02-27 17:42 ` [PATCH v3 11/18] mm: Free folios in a batch in shrink_folio_list() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 12/18] mm: Free folios directly in move_folios_to_lru() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 13/18] memcg: Remove mem_cgroup_uncharge_list() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 14/18] mm: Remove free_unref_page_list() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 15/18] mm: Remove lru_to_page() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 16/18] mm: Convert free_pages_and_swap_cache() to use folios_put() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 17/18] mm: Use a folio in __collapse_huge_page_copy_succeeded() Matthew Wilcox (Oracle)
2024-02-27 17:42 ` [PATCH v3 18/18] mm: Convert free_swap_cache() to take a folio Matthew Wilcox (Oracle)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Ze9EFdFLXQEUVtKl@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=linux-mm@kvack.org \
    --cc=ryan.roberts@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.