From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <9ed188de-648e-463a-832b-ef132da1d16e@arm.com>
Date: Thu, 7 Mar 2024 16:24:43 +0000
Subject: Re:
From: Ryan Roberts <ryan.roberts@arm.com>
To: Matthew Wilcox, "Yin, Fengwei"
Cc: Zi Yan, Andrew Morton, linux-mm@kvack.org, Yang Shi, Huang Ying
References: <20240227174254.710559-11-willy@infradead.org>
 <367a14f7-340e-4b29-90ae-bc3fcefdd5f4@arm.com>
 <85cc26ed-6386-4d6b-b680-1e5fba07843f@arm.com>
 <36bdda72-2731-440e-ad15-39b845401f50@arm.com>
 <03CE3A00-917C-48CC-8E1C-6A98713C817C@nvidia.com>
 <0f5bdbf3-725b-49c7-ba66-973b7cfc93be@intel.com>
Content-Type: text/plain; charset=UTF-8

On 07/03/2024 15:24, Ryan Roberts wrote:
> On 07/03/2024 14:05, Matthew Wilcox wrote:
>> On Thu, Mar 07, 2024 at 09:50:09PM +0800, Yin, Fengwei wrote:
>>> On 3/7/2024 4:56 PM, wrote:
>>>> I just want to make sure I've understood correctly: CPU1's folio_put()
>>>> is not the last reference, and it keeps iterating through the local
>>>> list. Then CPU2 does the final folio_put() which causes list_del_init()
>>>> to modify the local list concurrently with CPU1's iteration, so CPU1
>>>> probably goes into the weeds?
>>>
>>> My understanding is that this cannot corrupt folio->deferred_list, as
>>> this folio was already iterated.
>>
>> I am not convinced about that at all. It's possible this isn't the only
>> problem, but deleting something from a list without holding (the correct)
>> lock is something you have to think incredibly hard about to get right.
>> I didn't bother going any deeper into the analysis once I spotted the
>> locking problem, but the burden of proof is very much on you to show
>> that this is not a bug!
>>
>>> But I did see another strange thing:
>>>
>>> [   76.269942] page: refcount:0 mapcount:1 mapping:0000000000000000 index:0xffffbd0a0 pfn:0x2554a0
>>> [   76.270483] note: kcompactd0[62] exited with preempt_count 1
>>> [   76.271344] head: order:0 entire_mapcount:1 nr_pages_mapped:0 pincount:0
>>>
>>> This large folio has order 0? Maybe folio->_flags_1 was screwed up?
>>>
>>> In free_unref_folios(), there is code like the following:
>>>
>>> 	if (order > 0 && folio_test_large_rmappable(folio))
>>> 		folio_undo_large_rmappable(folio);
>>>
>>> But with destroy_large_folio():
>>>
>>> 	if (folio_test_large_rmappable(folio))
>>> 		folio_undo_large_rmappable(folio);
>>>
>>> Could this be connected to the folio that still has a zero refcount
>>> while on the deferred list, seen with Matthew's patch?
>>>
>>> It looks like the folio order was cleared unexpectedly somewhere.
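
To make the guard difference above concrete, here's a tiny userspace mock -
not kernel code; struct mock_folio, undo_large_rmappable() and the
on_deferred_list flag are just stand-ins, and it assumes
folio_undo_large_rmappable() is what unlinks the folio from the deferred
split list. The point is only that if the order field reads as 0, the
free_unref_folios()-style "order > 0 &&" guard skips the unlink, so the folio
would be freed while still queued:

    /*
     * Userspace mock of the two guards quoted above. All types and helpers
     * here are made up for illustration; they are not the kernel's.
     */
    #include <stdbool.h>
    #include <stdio.h>

    struct mock_folio {
        unsigned int order;      /* the kernel keeps this in _flags_1 */
        bool large_rmappable;
        bool on_deferred_list;   /* stand-in for !list_empty(&_deferred_list) */
    };

    static void undo_large_rmappable(struct mock_folio *f)
    {
        f->on_deferred_list = false;    /* unlink before the folio is freed */
    }

    static void free_like_free_unref_folios(struct mock_folio *f)
    {
        if (f->order > 0 && f->large_rmappable)  /* order guard quoted above */
            undo_large_rmappable(f);
    }

    static void free_like_destroy_large_folio(struct mock_folio *f)
    {
        if (f->large_rmappable)                  /* no order check */
            undo_large_rmappable(f);
    }

    int main(void)
    {
        /* simulate a large folio whose order field has been clobbered to 0 */
        struct mock_folio a = { .order = 0, .large_rmappable = true,
                                .on_deferred_list = true };
        struct mock_folio b = a;

        free_like_free_unref_folios(&a);    /* unlink skipped: dangling entry */
        free_like_destroy_large_folio(&b);  /* unlink still happens */

        printf("free_unref_folios-style: still queued? %d\n", a.on_deferred_list);
        printf("destroy_large_folio-style: still queued? %d\n", b.on_deferred_list);
        return 0;
    }

If the order really is being clobbered somewhere, that would explain a stale
deferred-list entry surviving the free on one path but not the other.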
>
> I think there could be something to this...
>
> I have a setup where, when running with Matthew's deferred split fix AND with
> commit 31b2ff82aefb "mm: handle large folios in free_unref_folios()" REVERTED,
> everything works as expected. And at the end, I have the expected amount of
> memory free (seen in meminfo and buddyinfo).
>
> But if I run only with the deferred split fix and DO NOT revert the other
> change, everything grinds to a halt when swapping 2M pages. Sometimes there
> are RCU stalls where I can't even interact on the serial port. Sometimes
> (more usually) everything just gets stuck trying to reclaim and allocate
> memory. And when I kill the jobs, I still have barely any memory in the
> system - about 10% of what I would expect.
>
> So is it possible that after commit 31b2ff82aefb "mm: handle large folios in
> free_unref_folios()", when freeing a 2M folio back to the buddy, we are
> actually only telling it about the first 4K page? So we end up leaking the
> rest?

I notice that before the commit, large folios were uncharged with
__mem_cgroup_uncharge(), and now they use __mem_cgroup_uncharge_folios(). The
former has an upfront check:

	if (!folio_memcg(folio))
		return;

I'm not exactly sure what that's checking, but could the fact that this check
is missing after the change cause things to go wonky?

>
>>
>> No, we intentionally clear it:
>>
>> free_unref_folios -> free_unref_page_prepare -> free_pages_prepare ->
>> 	page[1].flags &= ~PAGE_FLAGS_SECOND;
>>
>> PAGE_FLAGS_SECOND includes the order, which is why we have to save it
>> away in folio->private so that we know what it is in the second loop.
>> So it's always been cleared by the time we call free_page_is_bad().
>
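
As a rough userspace sketch of the save-then-reuse dance described just above
(all names below are made up for illustration; this is not the kernel
implementation): the prepare step wipes the flags word that encodes the
order, so the order has to be stashed in something like ->private first, and
the second pass must read the saved copy rather than the flags:

    /* Userspace sketch only; mock_folio and FLAGS_ORDER_MASK are invented. */
    #include <stdio.h>

    #define FLAGS_ORDER_MASK 0xffu

    struct mock_folio {
        unsigned long flags_1;   /* encodes the order, like _flags_1 */
        unsigned long private;   /* stand-in for folio->private */
    };

    static unsigned int mock_folio_order(const struct mock_folio *f)
    {
        return f->flags_1 & FLAGS_ORDER_MASK;
    }

    static void prepare_for_free(struct mock_folio *f)
    {
        f->flags_1 &= ~FLAGS_ORDER_MASK;    /* ~PAGE_FLAGS_SECOND analogue */
    }

    int main(void)
    {
        struct mock_folio f = { .flags_1 = 9 };  /* a 2M folio, order 9 */

        /* first loop: remember the order before prepare wipes it */
        f.private = mock_folio_order(&f);
        prepare_for_free(&f);

        /* second loop: must use the saved copy, not the flags */
        printf("order from flags after prepare: %u\n", mock_folio_order(&f)); /* 0 */
        printf("order saved in private: %lu\n", f.private);                   /* 9 */
        return 0;
    }

That matches the quoted explanation: by the time the second loop runs,
reading the order back from the page flags would always give 0.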