From: Zi Yan
To: Ryan Roberts
Cc: Matthew Wilcox, Andrew Morton, linux-mm@kvack.org, Yang Shi, Huang Ying
Subject: Re: [PATCH v3 10/18] mm: Allow non-hugetlb large folios to be batch processed
Date: Wed, 06 Mar 2024 13:41:13 -0500
Message-ID: <03CE3A00-917C-48CC-8E1C-6A98713C817C@nvidia.com>
In-Reply-To: <36bdda72-2731-440e-ad15-39b845401f50@arm.com>
References: <20240227174254.710559-1-willy@infradead.org>
 <20240227174254.710559-11-willy@infradead.org>
 <367a14f7-340e-4b29-90ae-bc3fcefdd5f4@arm.com>
 <85cc26ed-6386-4d6b-b680-1e5fba07843f@arm.com>
 <36bdda72-2731-440e-ad15-39b845401f50@arm.com>

On 6 Mar 2024, at 12:41, Ryan Roberts wrote:

> On 06/03/2024 16:19, Ryan Roberts wrote:
>> On 06/03/2024 16:09, Matthew Wilcox wrote:
>>> On Wed, Mar 06, 2024 at 01:42:06PM +0000, Ryan Roberts wrote:
>>>> When running some swap tests with this change (which is in mm-stable)
>>>> present, I see BadThings(TM). Usually I see a "bad page state"
>>>> followed by a delay of a few seconds, followed by an oops or NULL
>>>> pointer deref. Bisect points to this change, and if I revert it,
>>>> the problem goes away.
>>>
>>> That oops is really messed up ;-(  We're clearly got two CPUs oopsing at
>>> the same time and it's all interleaved.  That said, I can pick some
>>> nuggets out of it.
>>>
>>>> [   76.239466] BUG: Bad page state in process usemem  pfn:2554a0
>>>> [   76.240196] kernel BUG at include/linux/mm.h:1120!
>>>
>>> These are the two different BUGs being called simultaneously ...
>>>
>>> The first one is bad_page() in page_alloc.c and the second is
>>> put_page_testzero()
>>>         VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
>>>
>>> I'm sure it's significant that both of these are the same page (pfn
>>> 2554a0).
>>> Feels like we have two CPUs calling put_folio() at the same
>>> time, and one of them underflows.  It probably doesn't matter which call
>>> trace ends up in bad_page() and which in put_page_testzero().
>>>
>>> One of them is coming from deferred_split_scan(), which is weird because
>>> we can see the folio_try_get() earlier in the function.  So whatever
>>> this folio was, we found it on the deferred split list, got its refcount,
>>> moved it to the local list, either failed to get the lock, or
>>> successfully got the lock, split it, unlocked it and put it.
>>>
>>> (I can see this was invoked from page fault -> memcg shrinking.  That's
>>> probably irrelevant but explains some of the functions in the backtrace)
>>>
>>> The other call trace comes from migrate_folio_done() where we're putting
>>> the _source_ folio.  That was called from migrate_pages_batch() which
>>> was called from kcompactd.
>>>
>>> Um.  Where do we handle the deferred list in the migration code?
>>>
>>>
>>> I've also tried looking at this from a different angle -- what is it
>>> about this commit that produces this problem?  It's a fairly small
>>> commit:
>>>
>>> -               if (folio_test_large(folio)) {
>>> +               /* hugetlb has its own memcg */
>>> +               if (folio_test_hugetlb(folio)) {
>>>                         if (lruvec) {
>>>                                 unlock_page_lruvec_irqrestore(lruvec, flags);
>>>                                 lruvec = NULL;
>>>                         }
>>> -                       __folio_put_large(folio);
>>> +                       free_huge_folio(folio);
>>>
>>> So all that's changed is that large non-hugetlb folios do not call
>>> __folio_put_large().
>>> As a reminder, that function does:
>>>
>>>         if (!folio_test_hugetlb(folio))
>>>                 page_cache_release(folio);
>>>         destroy_large_folio(folio);
>>>
>>> and destroy_large_folio() does:
>>>         if (folio_test_large_rmappable(folio))
>>>                 folio_undo_large_rmappable(folio);
>>>
>>>         mem_cgroup_uncharge(folio);
>>>         free_the_page(&folio->page, folio_order(folio));
>>>
>>> So after my patch, instead of calling (in order):
>>>
>>>         page_cache_release(folio);
>>>         folio_undo_large_rmappable(folio);
>>>         mem_cgroup_uncharge(folio);
>>>         free_unref_page()
>>>
>>> it calls:
>>>
>>>         __page_cache_release(folio, &lruvec, &flags);
>>>         mem_cgroup_uncharge_folios()
>>>         folio_undo_large_rmappable(folio);
>>>
>>> So have I simply widened the window for this race
>>
>> Yes that's the conclusion I'm coming to. I have reverted this patch and am still
>> seeing what looks like the same problem very occasionally. (I was just about to
>> let you know when I saw this reply). It's much harder to reproduce now... great.
>>
>> The original oops I reported against your RFC is here:
>> https://lore.kernel.org/linux-mm/eeaf36cf-8e29-4de2-9e5a-9ec2a5e30c61@arm.com/
>>
>> Looks like I had UBSAN enabled for that run. Let me turn on all the bells and
>> whistles and see if I can get it to repro more reliably to bisect.
>>
>> Assuming the original oops and this are related, that implies that the problem
>> is lurking somewhere in this series, if not this patch.
>>
>> I'll come back to you shortly...
>
> Just a bunch of circumstantial observations, I'm afraid. No conclusions yet...
>
> With this patch reverted:
>
> - Haven't triggered with any of the sanitizers compiled in
> - Have only triggered when my code is on top (swap-out mTHP)
> - Have only triggered when compiled using GCC 12.2 (can't trigger with 11.4)
>
> So perhaps I'm looking at 2 different things, with this new intermittent problem
> caused by my changes. Or perhaps my changes increase the window significantly.
>
> I have to go pick up my daughter now. Can look at this some more tomorrow, but
> struggling for ideas - need a way to more reliably reproduce.
>
>>
>>> , whatever it is
>>> exactly?  Something involving mis-handling of the deferred list?

I had a chat with willy on the deferred list mis-handling. The current
migration code (starting from commit 616b8371539a6 ("mm: thp: enable thp
migration in generic path")) does not properly handle THP and mTHP on the
deferred list. So if the source folio is on the deferred list, after
migration, the destination folio will not be. But this seems to be a benign
bug, since the only consequence is that the opportunity to split a partially
mapped THP/mTHP is lost.

In terms of potential races, the source folio refcount is elevated before
migration, so deferred_split_scan() can move the folio off the deferred_list
but cannot split it. During folio_migrate_mapping(), when the folio is
frozen, deferred_split_scan() cannot move the folio off the deferred_list to
begin with.

I am going to send a patch to fix the deferred_list handling in migration,
but it does not seem to be related to the bug in this email thread.
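
To make the refcount argument above easier to check, here is a small
userspace toy model. It is only a sketch and not kernel code: struct
toy_folio and the toy_*() helpers are made-up names that loosely mimic
folio_try_get(), folio_put() and the refcount freeze/unfreeze helpers, and
the expected counts (one base reference plus one pin per actor) are a
simplifying assumption that ignores mapcount and page cache references.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for a folio: only the refcount matters here. */
struct toy_folio {
	atomic_int refcount;
};

/* Take a reference unless the count is 0 (frozen or already freed). */
static bool toy_try_get(struct toy_folio *f)
{
	int old = atomic_load(&f->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&f->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; dropping below zero would be the underflow bug. */
static void toy_put(struct toy_folio *f)
{
	int old = atomic_fetch_sub(&f->refcount, 1);

	assert(old > 0);
}

/* Freeze: succeeds only if the count is exactly what the caller expects. */
static bool toy_freeze(struct toy_folio *f, int expected)
{
	return atomic_compare_exchange_strong(&f->refcount, &expected, 0);
}

static void toy_unfreeze(struct toy_folio *f, int count)
{
	atomic_store(&f->refcount, count);
}

/* Returns 0 if the model matches the argument, else the failing step. */
static int toy_scenario(void)
{
	struct toy_folio src;

	atomic_init(&src.refcount, 1);	/* assumed single base reference */

	/* Migration pins the source folio first. */
	if (!toy_try_get(&src))
		return 1;		/* refcount is now 2 */

	/* Case 1: the scan runs while migration only holds its pin. */
	if (!toy_try_get(&src))
		return 2;		/* refcount 3, scan may delist the folio */
	if (toy_freeze(&src, 2))
		return 3;		/* split must fail: migration's pin is extra */
	toy_put(&src);			/* scan drops its reference, back to 2 */

	/* Case 2: migration has frozen the folio to switch the mapping. */
	if (!toy_freeze(&src, 2))
		return 4;		/* refcount is now 0 */
	if (toy_try_get(&src))
		return 5;		/* scan must fail and cannot even delist */
	toy_unfreeze(&src, 2);

	toy_put(&src);			/* migration drops its pin */
	return atomic_load(&src.refcount) == 1 ? 0 : 6;
}
```

Calling toy_scenario() from a test main() should return 0 under these
assumptions. In this model neither interleaving lets the count drop below
the base reference, which is why I do not think this path explains the
underflow reported here.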
--
Best Regards,
Yan, Zi