From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 11 Mar 2024 15:50:09 +0000
From: Matthew Wilcox <willy@infradead.org>
To: Ryan Roberts
Cc: Andrew Morton, linux-mm@kvack.org
Subject: Re: [PATCH v3 10/18] mm: Allow non-hugetlb large folios to be batch processed
References: <8cd67a3d-81a7-4127-9d17-a1d465c3f9e8@arm.com> <02e820c2-8a1d-42cc-954b-f9e041c4417a@arm.com> <9dfd3b3b-4733-4c7c-b09c-5e6388531e49@arm.com> <79dad067-1d26-4867-8eb1-941277b9a77b@arm.com>
In-Reply-To: <79dad067-1d26-4867-8eb1-941277b9a77b@arm.com>

On Mon, Mar 11, 2024 at 12:36:00PM +0000, Ryan Roberts wrote:
> On 11/03/2024 12:26, Matthew Wilcox wrote:
> > On Mon, Mar 11, 2024 at 09:01:16AM +0000, Ryan Roberts wrote:
> >> [  153.499149]  Call trace:
> >> [  153.499470]  uncharge_folio+0x1d0/0x2c8
> >> [  153.500045]  __mem_cgroup_uncharge_folios+0x5c/0xb0
> >> [  153.500795]  move_folios_to_lru+0x5bc/0x5e0
> >> [  153.501275]
> >>  shrink_lruvec+0x5f8/0xb30
> >
> >> And that code is from your commit 29f3843026cf ("mm: free folios
> >> directly in move_folios_to_lru()") which is another patch in the same
> >> series. This suffers from the same problem; uncharge before removing
> >> the folio from the deferred list, so using the wrong lock - there are
> >> 2 sites in this function that do this.
> >
> > Two sites, but basically the same thing; one is for "the batch is full"
> > and the other is "we finished the list".
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index a0e53999a865..f60c5b3977dc 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1842,6 +1842,9 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec,
> >  		if (unlikely(folio_put_testzero(folio))) {
> >  			__folio_clear_lru_flags(folio);
> >  
> > +			if (folio_test_large(folio) &&
> > +			    folio_test_large_rmappable(folio))
> > +				folio_undo_large_rmappable(folio);
> >  			if (folio_batch_add(&free_folios, folio) == 0) {
> >  				spin_unlock_irq(&lruvec->lru_lock);
> >  				mem_cgroup_uncharge_folios(&free_folios);
> >
> >> A quick grep over the entire series has a lot of hits for "uncharge". I
> >> wonder if we need a full audit of that series for other places that
> >> could potentially be doing the same thing?
> >
> > I think this assertion will catch all occurrences of the same thing,
> > as long as people who are testing are testing in a memcg.  My setup
> > doesn't use a memcg, so I never saw any of this ;-(
> >
> > If you confirm this fixes it, I'll send two patches; a respin of the
> > patch I sent on Sunday that calls undo_large_rmappable in this one
> > extra place, and then a patch to add the assertions.
> 
> Good timing on your response - I've just finished testing!  Although my
> patch included both the site you have above and another that I fixed up
> speculatively in shrink_folio_list() based on reviewing all
> mem_cgroup_uncharge_folios() call sites.
I've been running some tests with this:

+++ b/mm/huge_memory.c
@@ -3223,6 +3221,7 @@ void folio_undo_large_rmappable(struct folio *folio)
 	struct deferred_split *ds_queue;
 	unsigned long flags;
 
+	folio_clear_large_rmappable(folio);
 	if (folio_order(folio) <= 1)
 		return;

+++ b/mm/memcontrol.c
@@ -7483,6 +7483,8 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
 	struct obj_cgroup *objcg;
 
 	VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
+	VM_BUG_ON_FOLIO(folio_test_large(folio) &&
+			folio_test_large_rmappable(folio), folio);
 
 	/*
 	 * Nobody should be changing or seriously looking at

which I think is too aggressive to be added.  It does "catch" one spot
where we don't call folio_undo_large_rmappable() in __filemap_add_folio(),
but since it's the "we allocated a folio, charged it, but failed to add
it to the pagecache" path, there's no way that it can have been mmaped,
so it can't be on the split list.

With this patch, it took about 700 seconds in my xfstests run to find
the one in shrink_folio_list().  It took 3260 seconds to catch the one
in move_folios_to_lru().

The thought occurs that we know these folios are large rmappable (if
they're large).  It's a little late to make that optimisation, but
during the upcoming development cycle, I'm going to remove that half of
the test.  ie I'll make it look like this:

@@ -1433,6 +1433,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		 */
 		nr_reclaimed += nr_pages;
 
+		if (folio_test_large(folio))
+			folio_undo_large_rmappable(folio);
 		if (folio_batch_add(&free_folios, folio) == 0) {
 			mem_cgroup_uncharge_folios(&free_folios);
 			try_to_unmap_flush();

> There is also a call to mem_cgroup_uncharge() in delete_from_lru_cache(),
> which I couldn't convince myself was safe. Perhaps you could do a quick
> audit of the call sites?

That one is only probably OK.  We usually manage to split a folio before
we get to this point, so we should remove it from the deferred list then.
You'd need to buy a lottery ticket if you managed to get hwpoison in a
folio that was deferred split ...

I reviewed all the locations that call mem_cgroup_uncharge():

__filemap_add_folio	Never mmapable
free_huge_folio		hugetlb, never splittable
delete_from_lru_cache	Discussed above
free_zone_device_page	compound pages not yet supported
destroy_large_folio	folio_undo_large_rmappable already called
__folio_put_small	not a large folio

Then all the places that call mem_cgroup_uncharge_folios():

folios_put_refs		Just fixed
shrink_folio_list	Just fixed
move_folios_to_lru	Just fixed

So I think we're good.

> But taking a step back, I wonder whether the charge() and uncharge()
> functions should really be checking to see if the folio is on a deferred
> split list and if so, then move the folio to the correct list?

I don't think that's the right thing to do.  All of these places which
uncharge a folio are part of the freeing path, so we always want it
removed, not moved to a different deferred list.

But what about mem_cgroup_move_account()?  Looks like that's memcg v1
only?  Should still be fixed though ...