From: Charan Teja Kalla
Date: Tue, 6 Feb 2024 20:24:30 +0530
Subject: Re: [PATCH] mm: Migrate high-order folios in swap cache correctly
To: "Matthew Wilcox (Oracle)", Andrew Morton
Message-ID: <69cb784f-578d-ded1-cd9f-c6db04696336@quicinc.com>
In-Reply-To: <20231214045841.961776-1-willy@infradead.org>
References: <20231214045841.961776-1-willy@infradead.org>
Hi Matthew,

It seems the issue is not completely fixed by the change below. This time it is because ->private is not filled in for the subpages of a THP, which can result in the same type of issue this patch tried to fix:

1) For reclaim of an anon THP, the folio is first added to the swap cache.
As part of this, contiguous swap space of THP size is allocated, the folio is stored at each of its folio_nr_pages() swap cache indices, and each subpage's ->private is set to its respective swap entry:

add_to_swap_cache():
	for (i = 0; i < nr; i++) {
		set_page_private(folio_page(folio, i), entry.val + i);
		xas_store(&xas, folio);
		xas_next(&xas);
	}

2) Migrating a folio that is sitting in the swap cache sets the swap entry only on the head page, via newfolio->private; the tail pages are missed:

folio_migrate_mapping():
	newfolio->private = folio_get_private(folio);
	...
	for (i = 0; i < entries; i++) {
		xas_store(&xas, newfolio);
		xas_next(&xas);
	}

3) When this migrated folio, still sitting in the swap cache, is migrated again, the THP can be split (see migrate_pages()->try_split_thp()). The split stores the subpages at their respective swap cache indices, but the subpages' ->private does not contain a valid swap entry:

__split_huge_page():
	for (i = nr - 1; i >= 1; i--) {
		if (...) {
			...
		} else if (swap_cache) {
			__xa_store(&swap_cache->i_pages, offset + i, head + i, 0);
		}
	}

This leads to a state where all the subpages sit in the swap cache with ->private not holding a valid value. When such a subpage goes through delete_from_swap_cache(), it replaces the __wrong swap cache index__ with a NULL/shadow value and then decrements the subpage's refcount:

	for (i = 0; i < nr; i++) {
		if (subpage == page)
			continue;
		free_page_and_swap_cache(subpage):
			free_swap_cache(page);	/* calls delete_from_swap_cache() */
			put_page(page);
	}

Consider a folio just sitting in the swap cache: its subpages drop from a refcount of 3 (isolate + swap cache + lru_add_page_tail()) to 1 (refcounts decremented by swap cache deletion + put_page()). When migrate_pages() is tried again on these split THP pages, the refcount of 1 makes each page be freed directly (unmap_and_move()).
So this leads to a state of a "freed page entry in the swap cache", which can cause various corruptions, a loop under the RCU read lock (observed in mapping_get_entry()), etc.

It seems the change below is also required here:

diff --git a/mm/migrate.c b/mm/migrate.c
index 9f5f52d..8049f4e 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -427,10 +427,8 @@ int folio_migrate_mapping(struct address_space *mapping,
 	folio_ref_add(newfolio, nr); /* add cache reference */
 	if (folio_test_swapbacked(folio)) {
 		__folio_set_swapbacked(newfolio);
-		if (folio_test_swapcache(folio)) {
+		if (folio_test_swapcache(folio))
 			folio_set_swapcache(newfolio);
-			newfolio->private = folio_get_private(folio);
-		}
 		entries = nr;
 	} else {
 		VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
@@ -446,6 +444,8 @@ int folio_migrate_mapping(struct address_space *mapping,
 
 	/* Swap cache still stores N entries instead of a high-order entry */
 	for (i = 0; i < entries; i++) {
+		set_page_private(folio_page(newfolio, i),
+				 folio_page(folio, i)->private);
 		xas_store(&xas, newfolio);
 		xas_next(&xas);
 	}

On 12/14/2023 10:28 AM, Matthew Wilcox (Oracle) wrote:
> From: Charan Teja Kalla
>
> Large folios occupy N consecutive entries in the swap cache
> instead of using multi-index entries like the page cache.
> However, if a large folio is re-added to the LRU list, it can
> be migrated.  The migration code was not aware of the difference
> between the swap cache and the page cache and assumed that a single
> xas_store() would be sufficient.
>
> This leaves potentially many stale pointers to the now-migrated folio
> in the swap cache, which can lead to almost arbitrary data corruption
> in the future.  This can also manifest as infinite loops with the
> RCU read lock held.
>
> Signed-off-by: Charan Teja Kalla
> [modifications to the changelog & tweaked the fix]
> Signed-off-by: Matthew Wilcox (Oracle)
> ---
>  mm/migrate.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index d9d2b9432e81..2d67ca47d2e2 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -405,6 +405,7 @@ int folio_migrate_mapping(struct address_space *mapping,
>  	int dirty;
>  	int expected_count = folio_expected_refs(mapping, folio) + extra_count;
>  	long nr = folio_nr_pages(folio);
> +	long entries, i;
>
>  	if (!mapping) {
>  		/* Anonymous page without mapping */
> @@ -442,8 +443,10 @@ int folio_migrate_mapping(struct address_space *mapping,
>  			folio_set_swapcache(newfolio);
>  			newfolio->private = folio_get_private(folio);
>  		}
> +		entries = nr;
>  	} else {
>  		VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
> +		entries = 1;
>  	}
>
>  	/* Move dirty while page refs frozen and newpage not yet exposed */
> @@ -453,7 +456,11 @@ int folio_migrate_mapping(struct address_space *mapping,
>  		folio_set_dirty(newfolio);
>  	}
>
> -	xas_store(&xas, newfolio);
> +	/* Swap cache still stores N entries instead of a high-order entry */
> +	for (i = 0; i < entries; i++) {
> +		xas_store(&xas, newfolio);
> +		xas_next(&xas);
> +	}
>
>  	/*
>  	 * Drop cache reference from old page by unfreezing