From: Ryan Roberts
To: Matthew Wilcox
Cc: Andrew Morton, linux-mm@kvack.org
Subject: Re: [PATCH v3 10/18] mm: Allow non-hugetlb large folios to be batch processed
Date: Mon, 11 Mar 2024 09:01:16 +0000
References: <20240227174254.710559-1-willy@infradead.org> <20240227174254.710559-11-willy@infradead.org> <367a14f7-340e-4b29-90ae-bc3fcefdd5f4@arm.com> <8cd67a3d-81a7-4127-9d17-a1d465c3f9e8@arm.com> <02e820c2-8a1d-42cc-954b-f9e041c4417a@arm.com> <9dfd3b3b-4733-4c7c-b09c-5e6388531e49@arm.com>

On 10/03/2024 21:52, Matthew Wilcox wrote:
> On Sun, Mar 10, 2024 at 08:46:58PM +0000, Matthew Wilcox wrote:
>> On Sun, Mar 10, 2024 at 07:59:46PM +0000, Ryan Roberts wrote:
>>> I've now been able to repro this without any of my code on top - just mm-unstable and your fix for the memcg uncharging ordering issue. So we have a separate, more difficult to repro bug. I've discovered CONFIG_DEBUG_LIST so enabled that. I'll try to bisect in the morning, but I suspect it will be slow going.
>>>
>>> [ 390.317982] ------------[ cut here ]------------
>>> [ 390.318646] list_del corruption. prev->next should be fffffc00152a9090, but was fffffc002798a490. (prev=fffffc002798a490)
>>
>> Interesting. So prev->next is pointing to prev, ie prev is an empty
>> list, but it should be pointing to this entry ... this is feeling like
>> another missing lock.
>
> Let's check that we're not inverting the order of memcg_uncharge and
> removing a folio from the deferred list (build tested only, but only
> one line of this will be new to you):

OK found it - it's another instance of the same issue...

Applied your below patch (resulting code: mm-unstable (d7182786dd0a) + yesterday's fix ("mm: Remove folio from deferred split list before uncharging it") + below patch).

The new check triggered:

[ 153.459843] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffffd5fc0 pfn:0x4da690
[ 153.460667] head: order:4 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 153.461218] memcg:ffff0000c7fa1000
[ 153.461519] anon flags: 0xbfffc00000a0048(uptodate|head|mappedtodisk|swapbacked|node=0|zone=2|lastcpupid=0xffff)
[ 153.462678] page_type: 0xffffffff()
[ 153.463294] raw: 0bfffc00000a0048 dead000000000100 dead000000000122 ffff0000fbfa29c1
[ 153.470267] raw: 0000000ffffd5fc0 0000000000000000 00000000ffffffff ffff0000c7fa1000
[ 153.471395] head: 0bfffc00000a0048 dead000000000100 dead000000000122 ffff0000fbfa29c1
[ 153.472494] head: 0000000ffffd5fc0 0000000000000000 00000000ffffffff ffff0000c7fa1000
[ 153.473357] head: 0bfffc0000020204 fffffc001269a401 dead000000000122 00000000ffffffff
[ 153.481663] head: 0000001000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 153.482438] page dumped because: VM_BUG_ON_FOLIO(folio_order(folio) > 1 && !list_empty(&folio->_deferred_list))
[ 153.483464] ------------[ cut here ]------------
[ 153.484000] kernel BUG at mm/memcontrol.c:7486!
[ 153.484484] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
[ 153.485249] Modules linked in:
[ 153.485621] CPU: 33 PID: 2146 Comm: usemem Not tainted 6.8.0-rc5-00463-gb5100df1d6f3 #5
[ 153.486552] Hardware name: linux,dummy-virt (DT)
[ 153.487300] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 153.488363] pc : uncharge_folio+0x1d0/0x2c8
[ 153.488922] lr : uncharge_folio+0x1d0/0x2c8
[ 153.489384] sp : ffff80008ea0b6d0
[ 153.489747] x29: ffff80008ea0b6d0 x28: 0000000000000000 x27: 00000000fffffffe
[ 153.490626] x26: dead000000000100 x25: dead000000000122 x24: 0000000000000020
[ 153.491435] x23: ffff80008ea0b918 x22: ffff0000c7f88850 x21: ffff0000c7f88800
[ 153.492255] x20: ffff80008ea0b730 x19: fffffc001269a400 x18: 0000000000000006
[ 153.493087] x17: 212026262031203e x16: 20296f696c6f6628 x15: 0720072007200720
[ 153.494175] x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720
[ 153.495186] x11: 0720072007200720 x10: ffff0013f5e7b7c0 x9 : ffff800080128e84
[ 153.496142] x8 : 00000000ffffbfff x7 : ffff0013f5e7b7c0 x6 : 80000000ffffc000
[ 153.497050] x5 : ffff0013a5987d08 x4 : 0000000000000000 x3 : 0000000000000000
[ 153.498041] x2 : 0000000000000000 x1 : ffff0000cbc2c500 x0 : 0000000000000063
[ 153.499149] Call trace:
[ 153.499470] uncharge_folio+0x1d0/0x2c8
[ 153.500045] __mem_cgroup_uncharge_folios+0x5c/0xb0
[ 153.500795] move_folios_to_lru+0x5bc/0x5e0
[ 153.501275] shrink_lruvec+0x5f8/0xb30
[ 153.501833] shrink_node+0x4d8/0x8b0
[ 153.502227] do_try_to_free_pages+0xe0/0x5a8
[ 153.502835] try_to_free_mem_cgroup_pages+0x128/0x2d0
[ 153.503708] try_charge_memcg+0x114/0x658
[ 153.504344] __mem_cgroup_charge+0x6c/0xd0
[ 153.505007] __handle_mm_fault+0x42c/0x1640
[ 153.505684] handle_mm_fault+0x70/0x290
[ 153.506136] do_page_fault+0xfc/0x4d8
[ 153.506659] do_translation_fault+0xa4/0xc0
[ 153.507140] do_mem_abort+0x4c/0xa8
[ 153.507716] el0_da+0x2c/0x78
[ 153.508169] el0t_64_sync_handler+0xb8/0x130
[ 153.508810] el0t_64_sync+0x190/0x198
[ 153.509410] Code: 910c8021 a9025bf5 a90363f7 97fd7bef (d4210000)
[ 153.510309] ---[ end trace 0000000000000000 ]---
[ 153.510974] Kernel panic - not syncing: Oops - BUG: Fatal exception
[ 153.511727] SMP: stopping secondary CPUs
[ 153.513519] Kernel Offset: disabled
[ 153.514090] CPU features: 0x0,00000020,7002014a,2140720b
[ 153.514960] Memory Limit: none
[ 153.515457] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]---

move_folios_to_lru+0x5bc/0x5e0 is:

static unsigned int move_folios_to_lru(struct lruvec *lruvec,
		struct list_head *list)
{
...
	if (free_folios.nr) {
		spin_unlock_irq(&lruvec->lru_lock);
		mem_cgroup_uncharge_folios(&free_folios);  <<<<<<<<<<< HERE
		free_unref_folios(&free_folios);
		spin_lock_irq(&lruvec->lru_lock);
	}

	return nr_moved;
}

And that code is from your commit 29f3843026cf ("mm: free folios directly in move_folios_to_lru()"), which is another patch in the same series. This suffers from the same problem: it uncharges the folio before removing it from the deferred list, so the wrong lock is used. There are 2 sites in this function that do this.

A quick grep over the entire series has a lot of hits for "uncharge". I wonder if we need a full audit of that series for other places that could potentially be doing the same thing?
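To make the ordering constraint concrete, here is a minimal userspace sketch of what the new check enforces. It is only a model under stated assumptions: memcg_model, folio_model, defer_split(), undo_deferred_split(), uncharge() and free_folio_ordered() are hypothetical names made up for illustration, not kernel functions, and all locking is left out. The assumption it encodes is that the deferred split queue (and hence its lock) is found through the folio's memcg, so the folio has to come off the queue while it is still charged.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail_entry(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void list_del_init_entry(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	init_list_head(entry);
}

/* Hypothetical stand-ins for a memcg and a large folio (not kernel types). */
struct memcg_model { struct list_head split_queue; int nr_charged; };
struct folio_model { struct memcg_model *memcg; struct list_head deferred_list; };

static void memcg_model_init(struct memcg_model *m)
{
	init_list_head(&m->split_queue);
	m->nr_charged = 0;
}

static void folio_model_init(struct folio_model *f, struct memcg_model *m)
{
	f->memcg = m;
	m->nr_charged++;
	init_list_head(&f->deferred_list);
}

/* Queue the folio on its memcg's deferred split queue. */
static void defer_split(struct folio_model *f)
{
	if (list_is_empty(&f->deferred_list))
		list_add_tail_entry(&f->deferred_list, &f->memcg->split_queue);
}

/* Must run while f->memcg is still set: that is how the queue is found. */
static void undo_deferred_split(struct folio_model *f)
{
	if (!list_is_empty(&f->deferred_list))
		list_del_init_entry(&f->deferred_list);
}

/* Refuses to uncharge a still-queued folio, mirroring the new VM_BUG_ON_FOLIO. */
static bool uncharge(struct folio_model *f)
{
	if (!list_is_empty(&f->deferred_list))
		return false;
	f->memcg->nr_charged--;
	f->memcg = NULL;
	return true;
}

/* The ordering the call sites need: dequeue first, then uncharge. */
static bool free_folio_ordered(struct folio_model *f)
{
	undo_deferred_split(f);
	return uncharge(f);
}
```

In this model, calling uncharge() directly on a folio that has been through defer_split() is refused, whereas free_folio_ordered() succeeds and leaves the memcg's queue empty. Any other site found by the audit would need the same dequeue-then-uncharge order.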
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index bb57b3d0c8cd..61fd1a4b424d 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -792,8 +792,6 @@ void folio_prep_large_rmappable(struct folio *folio)
>  {
>  	if (!folio || !folio_test_large(folio))
>  		return;
> -	if (folio_order(folio) > 1)
> -		INIT_LIST_HEAD(&folio->_deferred_list);
>  	folio_set_large_rmappable(folio);
>  }
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 79d0848c10a5..690c68c18c23 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -525,6 +525,8 @@ static inline void prep_compound_head(struct page *page, unsigned int order)
>  	atomic_set(&folio->_entire_mapcount, -1);
>  	atomic_set(&folio->_nr_pages_mapped, 0);
>  	atomic_set(&folio->_pincount, 0);
> +	if (order > 1)
> +		INIT_LIST_HEAD(&folio->_deferred_list);
>  }
>
>  static inline void prep_compound_tail(struct page *head, int tail_idx)
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 138bcfa18234..e2334c4ee550 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -7483,6 +7483,8 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
>  	struct obj_cgroup *objcg;
>
>  	VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
> +	VM_BUG_ON_FOLIO(folio_order(folio) > 1 &&
> +			!list_empty(&folio->_deferred_list), folio);
>
>  	/*
>  	 * Nobody should be changing or seriously looking at
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bdff5c0a7c76..1c1925b92934 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1006,10 +1006,11 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
>  		}
>  		break;
>  	case 2:
> -		/*
> -		 * the second tail page: ->mapping is
> -		 * deferred_list.next -- ignore value.
> -		 */
> +		/* the second tail page: deferred_list overlaps ->mapping */
> +		if (unlikely(!list_empty(&folio->_deferred_list))) {
> +			bad_page(page, "on deferred list");
> +			goto out;
> +		}
>  		break;
>  	default:
>  		if (page->mapping != TAIL_MAPPING) {