From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5488DC54791 for ; Sun, 10 Mar 2024 11:01:14 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5F3A56B0072; Sun, 10 Mar 2024 07:01:13 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 5A2ED6B0074; Sun, 10 Mar 2024 07:01:13 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 46AF16B0075; Sun, 10 Mar 2024 07:01:13 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 3801F6B0072 for ; Sun, 10 Mar 2024 07:01:13 -0400 (EDT) Received: from smtpin09.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id A2A7F40DE2 for ; Sun, 10 Mar 2024 11:01:12 +0000 (UTC) X-FDA: 81880837584.09.714A133 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf10.hostedemail.com (Postfix) with ESMTP id AD414C001B for ; Sun, 10 Mar 2024 11:01:10 +0000 (UTC) Authentication-Results: imf10.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf10.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1710068471; a=rsa-sha256; cv=none; b=kUvEbd3ttam5HV7L/kHUkfKsKcOHnWHbV9aa4fcip9aJIPi3Jf84nhCC13uWnxrY1y2tef 8aSmHsoJuuVxxLwc8UI8EsLr/Ja5Te6NDhXP/TsqbUONP9xLzo1KUOLHiIEPjLoYVOuA1/ w/vQdoJIXt8JhS+UYxMZRN4z45r8odI= ARC-Authentication-Results: i=1; imf10.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf10.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1710068471; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LnBDsSGufDLgenMMNSSZCjsZGIQ1aDXNQrq0l+N4Acc=; b=2HIpU2+Pik1FcZox1e2/sDe9wHKGwDmpgM0+7hD0lTyUlyGvA74EEvYorDJ/rm76/MHyDl NkWn8JhByZjCl8apYI32mrD/IlkT1p+yfyN0P0NVSdZjgh2NL7X7ZGlVMa0uFXtYTuT5tD uq1tNshfovYLlP7yFbc2i42NDLb0V+4= Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 8B5DB113E; Sun, 10 Mar 2024 04:01:45 -0700 (PDT) Received: from [10.57.68.196] (unknown [10.57.68.196]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id A8E893F738; Sun, 10 Mar 2024 04:01:08 -0700 (PDT) Message-ID: <8cd67a3d-81a7-4127-9d17-a1d465c3f9e8@arm.com> Date: Sun, 10 Mar 2024 11:01:06 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v3 10/18] mm: Allow non-hugetlb large folios to be batch processed Content-Language: en-GB To: Matthew Wilcox Cc: Andrew Morton , linux-mm@kvack.org References: <20240227174254.710559-1-willy@infradead.org> <20240227174254.710559-11-willy@infradead.org> <367a14f7-340e-4b29-90ae-bc3fcefdd5f4@arm.com> From: Ryan Roberts In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: AD414C001B X-Stat-Signature: 4fhom5t5krenhgy6dtde7ci51r37g5u5 X-Rspam-User: X-HE-Tag: 1710068470-555329 X-HE-Meta: U2FsdGVkX191Bd4YlM0tJxGjq4eUuvV7khIh9adJ9QQC/u87dz0UawILm7ymIvpy9Y1w/Gs0js3XHV++Eeoh3s05lX1hZhtvWHAmB/rEHjOk257TMX6y5/VK2atnbBXtL2Q4kIYbxCFvYoXFRHOCZhoF1ingiyhD+vnL7qhGKoz7yVHlKdLL3cm6If7LtPnBoXriBVwygwVQ3mBM0PSKnoizstqLUi+gvksv7/oOdvFUIx0B3N4qCTLwCUpM4vO5MiGSDNU9RJNq5mNjtGK442TxlHRr+ILstg6BYLHi0RL7utRrFPmip8orySAXkoIU6bbBI/5MiKtRHMfLDuQPE7QvvLt8F+l29AwMuvlEtTZCvVwkzPohDaQ1yGf2uzgEvdStjZM9iym4byKQcp3VODaTTUa2h7ftIiJZV3+AL21dfiCTCDP+VxTZK8t9IF0etuTHRi8PpWu+Bs1Irq4yyBbblMtYbgOTtfb12uVcXj2nTWf6oB0OSGtwM6oMHpa1RUw/wl8Bfug99e44iWHO1A1LBadgOGXz/2ivF/thGWCDGX11izAsWbi9feauvDqWxQWrzc7Zylfskp/0XuNN8/YmeV2CMQIdYfenJzwQ5/1aXS7/eWd/TCgkViMBxcyzisMQvOMRVfGoBcigLw8WZrqIAQ0X8eGoV74zcNY2hBvJMakeslP3qXPTZ5fJX0NM1gBgEwy60cyKQmQd5jO+bE97/Bdl0q2C+uDgw6erjV2+r7/l8i7P/VduU9MvXpxgQVZsApgCKE5eNQaReJLnoYiUNEnYj/2tyId612Zh6emmqn3Rr74S8/kdkGIuSfp7yRyq6KAE6NyXBa57EKUsO4pzoWftvtZ9 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 06/03/2024 16:09, Matthew Wilcox wrote: > On Wed, Mar 06, 2024 at 01:42:06PM +0000, Ryan Roberts wrote: >> When running some swap tests with this change (which is in mm-stable) >> present, I see BadThings(TM). Usually I see a "bad page state" >> followed by a delay of a few seconds, followed by an oops or NULL >> pointer deref. Bisect points to this change, and if I revert it, >> the problem goes away. > > That oops is really messed up ;-( We're clearly got two CPUs oopsing at > the same time and it's all interleaved. That said, I can pick some > nuggets out of it. > >> [ 76.239466] BUG: Bad page state in process usemem pfn:2554a0 >> [ 76.240196] kernel BUG at include/linux/mm.h:1120! > > These are the two different BUGs being called simultaneously ... > > The first one is bad_page() in page_alloc.c and the second is > put_page_testzero() > VM_BUG_ON_PAGE(page_ref_count(page) == 0, page); > > I'm sure it's significant that both of these are the same page (pfn > 2554a0). Feels like we have two CPUs calling put_folio() at the same > time, and one of them underflows. It probably doesn't matter which call > trace ends up in bad_page() and which in put_page_testzero(). > > One of them is coming from deferred_split_scan(), which is weird because > we can see the folio_try_get() earlier in the function. So whatever > this folio was, we found it on the deferred split list, got its refcount, > moved it to the local list, either failed to get the lock, or > successfully got the lock, split it, unlocked it and put it. > > (I can see this was invoked from page fault -> memcg shrinking. That's > probably irrelevant but explains some of the functions in the backtrace) > > The other call trace comes from migrate_folio_done() where we're putting > the _source_ folio. That was called from migrate_pages_batch() which > was called from kcompactd. > > Um. Where do we handle the deferred list in the migration code? > > > I've also tried looking at this from a different angle -- what is it > about this commit that produces this problem? It's a fairly small > commit: > > - if (folio_test_large(folio)) { > + /* hugetlb has its own memcg */ > + if (folio_test_hugetlb(folio)) { > if (lruvec) { > unlock_page_lruvec_irqrestore(lruvec, flags); > lruvec = NULL; > } > - __folio_put_large(folio); > + free_huge_folio(folio); > > So all that's changed is that large non-hugetlb folios do not call > __folio_put_large(). As a reminder, that function does: > > if (!folio_test_hugetlb(folio)) > page_cache_release(folio); > destroy_large_folio(folio); > > and destroy_large_folio() does: > if (folio_test_large_rmappable(folio)) > folio_undo_large_rmappable(folio); > > mem_cgroup_uncharge(folio); > free_the_page(&folio->page, folio_order(folio)); > > So after my patch, instead of calling (in order): > > page_cache_release(folio); > folio_undo_large_rmappable(folio); > mem_cgroup_uncharge(folio); > free_unref_page() > > it calls: > > __page_cache_release(folio, &lruvec, &flags); > mem_cgroup_uncharge_folios() > folio_undo_large_rmappable(folio); I was just looking at this again, and something pops out... You have swapped the order of folio_undo_large_rmappable() and mem_cgroup_uncharge(). But folio_undo_large_rmappable() calls get_deferred_split_queue() which tries to get the split queue from folio_memcg(folio) first and falls back to pgdat otherwise. If you are now calling mem_cgroup_uncharge_folios() first, will that remove the folio from the cgroup? Then we are operating on the wrong list? (just a guess based on the name of the function...) > > So have I simply widened the window for this race, whatever it is > exactly? Something involving mis-handling of the deferred list? >