From: Matthew Wilcox <willy@infradead.org>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>, linux-mm@kvack.org
Subject: Re: [PATCH v3 10/18] mm: Allow non-hugetlb large folios to be batch processed
Date: Mon, 11 Mar 2024 15:50:09 +0000
Message-ID: <Ze8oMcUrmWWzt9Qd@casper.infradead.org>
In-Reply-To: <79dad067-1d26-4867-8eb1-941277b9a77b@arm.com>

On Mon, Mar 11, 2024 at 12:36:00PM +0000, Ryan Roberts wrote:
> On 11/03/2024 12:26, Matthew Wilcox wrote:
> > On Mon, Mar 11, 2024 at 09:01:16AM +0000, Ryan Roberts wrote:
> >> [  153.499149] Call trace:
> >> [  153.499470]  uncharge_folio+0x1d0/0x2c8
> >> [  153.500045]  __mem_cgroup_uncharge_folios+0x5c/0xb0
> >> [  153.500795]  move_folios_to_lru+0x5bc/0x5e0
> >> [  153.501275]  shrink_lruvec+0x5f8/0xb30
> > 
> >> And that code is from your commit 29f3843026cf ("mm: free folios
> >> directly in move_folios_to_lru()"), which is another patch in the
> >> same series. It suffers from the same problem: uncharge before
> >> removing the folio from the deferred list, so the wrong lock is
> >> used - there are two sites in this function that do this.
> > 
> > Two sites, but basically the same thing; one is for "the batch is full"
> > and the other is "we finished the list".
> > 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index a0e53999a865..f60c5b3977dc 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1842,6 +1842,9 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec,
> >  		if (unlikely(folio_put_testzero(folio))) {
> >  			__folio_clear_lru_flags(folio);
> >  
> > +			if (folio_test_large(folio) &&
> > +			    folio_test_large_rmappable(folio))
> > +				folio_undo_large_rmappable(folio);
> >  			if (folio_batch_add(&free_folios, folio) == 0) {
> >  				spin_unlock_irq(&lruvec->lru_lock);
> >  				mem_cgroup_uncharge_folios(&free_folios);
> > 
> >> A quick grep over the entire series has a lot of hits for "uncharge". I
> >> wonder if we need a full audit of that series for other places that
> >> could potentially be doing the same thing?
> > 
> > I think this assertion will catch all occurrences of the same thing,
> > as long as people who are testing are testing in a memcg.  My setup
> > doesn't use a memcg, so I never saw any of this ;-(
> > 
> > If you confirm this fixes it, I'll send two patches: a respin of the patch
> > I sent on Sunday that calls undo_large_rmappable in this one extra place,
> > and then a patch to add the assertions.
> 
> Good timing on your response - I've just finished testing! Although my patch
> included both the site you have above and another that I fixed up speculatively
> in shrink_folio_list() based on reviewing all mem_cgroup_uncharge_folios() call
> sites.

I've been running some tests with this:

+++ b/mm/huge_memory.c
@@ -3223,6 +3221,7 @@ void folio_undo_large_rmappable(struct folio *folio)
        struct deferred_split *ds_queue;
        unsigned long flags;
 
+       folio_clear_large_rmappable(folio);
        if (folio_order(folio) <= 1)
                return;
 
+++ b/mm/memcontrol.c
@@ -7483,6 +7483,8 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
        struct obj_cgroup *objcg;

        VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
+       VM_BUG_ON_FOLIO(folio_test_large(folio) &&
+                       folio_test_large_rmappable(folio), folio);

        /*
         * Nobody should be changing or seriously looking at


which I think is too aggressive to be added.  It does "catch" one spot
where we don't call folio_undo_large_rmappable() in
__filemap_add_folio(), but since that's the "we allocated a folio,
charged it, but failed to add it to the pagecache" path, there's no way
it can have been mmapped, so it can't be on the split list.
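
To spell out why that one is a false positive, here's a heavily
simplified sketch of the path in question - not the actual kernel code,
and the *_sketch helper names are made up purely for illustration:

/*
 * Simplified sketch of the "charged, but never made it into the page
 * cache" path that trips the assertion; not the real kernel code.
 */
static int add_to_page_cache_sketch(struct address_space *mapping,
                struct folio *folio, pgoff_t index, gfp_t gfp)
{
        int err;

        /* The freshly allocated folio is charged to the current memcg. */
        err = mem_cgroup_charge(folio, NULL, gfp);
        if (err)
                return err;

        /* Try to store the folio in the page cache xarray. */
        err = store_in_pagecache_sketch(mapping, folio, index, gfp);
        if (err) {
                /*
                 * The folio was never visible in the page cache, so it
                 * was never mapped and cannot be on a deferred split
                 * queue; uncharging it here without calling
                 * folio_undo_large_rmappable() is safe, but the new
                 * assertion fires anyway.
                 */
                mem_cgroup_uncharge(folio);
                folio_put(folio);
        }
        return err;
}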

With this patch, it took about 700 seconds in my xfstests run to find
the one in shrink_folio_list().  It took 3260 seconds to catch the one
in move_folios_to_lru().

The thought occurs that we know these folios are large rmappable (if
they're large).  It's a little late to make that optimisation, but
during the upcoming development cycle I'm going to remove that
half of the test, i.e. I'll make it look like this:

@@ -1433,6 +1433,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
                 */
                nr_reclaimed += nr_pages;

+               if (folio_test_large(folio))
+                       folio_undo_large_rmappable(folio);
                if (folio_batch_add(&free_folios, folio) == 0) {
                        mem_cgroup_uncharge_folios(&free_folios);
                        try_to_unmap_flush();

> There is also a call to mem_cgroup_uncharge() in delete_from_lru_cache(), which
> I couldn't convince myself was safe. Perhaps you could do a quick audit of the
> call sites?

That one is only probably OK.  We usually manage to split a folio before
we get to this point, so it should already have been removed from the
deferred list by then.  You'd need to buy a lottery ticket if you
managed to get hwpoison in a folio that was deferred split ...
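
For reference, the uncharge Ryan is pointing at lives in the hwpoison
path; roughly, paraphrasing from memory rather than quoting the literal
source, so treat the details as approximate:

/*
 * Rough paraphrase of delete_from_lru_cache() in mm/memory-failure.c;
 * not the literal source.
 */
static int delete_from_lru_cache_sketch(struct folio *folio)
{
        if (folio_isolate_lru(folio)) {
                folio_clear_active(folio);
                folio_clear_unevictable(folio);

                /*
                 * A poisoned folio may never drop its refcount to zero,
                 * so it is uncharged by hand here - before anything has
                 * taken it off a deferred split queue, which is exactly
                 * the ordering in question.
                 */
                mem_cgroup_uncharge(folio);

                /* Drop the reference taken by folio_isolate_lru(). */
                folio_put(folio);
                return 0;
        }
        return -EIO;
}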

I reviewed all the locations that call mem_cgroup_uncharge:

__filemap_add_folio	Never mmapable
free_huge_folio		hugetlb, never splittable
delete_from_lru_cache	Discussed above
free_zone_device_page	compound pages not yet supported
destroy_large_folio	folio_undo_large_rmappable already called
__folio_put_small	not a large folio

Then all the places that call mem_cgroup_uncharge_folios:

folios_put_refs		Just fixed
shrink_folio_list	Just fixed
move_folios_to_lru	Just fixed

So I think we're good.
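
For completeness, destroy_large_folio is the pattern the fixes above
follow: undo the deferred-split state while folio_memcg() is still
valid, and only then uncharge.  Roughly (again paraphrasing from
memory, so the details are approximate, not the literal source):

/*
 * Rough paraphrase of destroy_large_folio(); not the literal source.
 * What matters is the ordering of the last two steps.
 */
void destroy_large_folio_sketch(struct folio *folio)
{
        if (folio_test_hugetlb(folio)) {
                /* hugetlb folios are never on a deferred split queue */
                free_huge_folio(folio);
                return;
        }

        /* Take it off the deferred split queue using the right lock. */
        if (folio_test_large_rmappable(folio))
                folio_undo_large_rmappable(folio);

        /* No queue entry left to point at a stale memcg. */
        mem_cgroup_uncharge(folio);

        /* ... then free the underlying pages ... */
}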

> But taking a step back, I wonder whether the charge() and uncharge() functions
> should really be checking to see if the folio is on a deferred split list and if
> so, then move the folio to the correct list?

I don't think that's the right thing to do.  All of these places which
uncharge a folio are part of the freeing path, so we always want it
removed, not moved to a different deferred list.

But what about mem_cgroup_move_account()?  Looks like that's memcg v1
only?  Should still be fixed though ...
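
Background for anyone joining the thread here: the deferred split queue
(and therefore the lock protecting it) is looked up from the folio's
current memcg, which is why uncharging a folio (or moving it to another
memcg) while it is still queued means a later lookup takes the wrong
split_queue_lock.  A rough sketch of that lookup, paraphrasing
mm/huge_memory.c with CONFIG_MEMCG assumed, details approximate:

/*
 * Rough paraphrase of how the deferred split queue is chosen
 * (CONFIG_MEMCG assumed); not the literal source.
 */
static struct deferred_split *split_queue_lookup_sketch(struct folio *folio)
{
        struct mem_cgroup *memcg = folio_memcg(folio);
        struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));

        /* memcg-charged folios use the per-memcg queue and lock ... */
        if (memcg)
                return &memcg->deferred_split_queue;
        /* ... everything else falls back to the per-node queue. */
        return &pgdat->deferred_split_queue;
}

Once folio_memcg() has been cleared by an uncharge, this returns the
per-node queue even though the folio is still linked into the per-memcg
list, and the wrong lock gets taken.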


