Date: Sat, 4 Mar 2023 22:51:12 -0800
In-Reply-To: <20230305065112.1932255-1-jiaqiyan@google.com>
References: <20230305065112.1932255-1-jiaqiyan@google.com>
Message-ID: <20230305065112.1932255-4-jiaqiyan@google.com>
Subject: [PATCH v10 3/3] mm/khugepaged: recover from poisoned file-backed memory
From: Jiaqi Yan
To: kirill.shutemov@linux.intel.com, kirill@shutemov.name, shy828301@gmail.com, tongtiangen@huawei.com, tony.luck@intel.com, akpm@linux-foundation.org
Cc: naoya.horiguchi@nec.com, linmiaohe@huawei.com, jiaqiyan@google.com, linux-mm@kvack.org, osalvador@suse.de, wangkefeng.wang@huawei.com

Make collapse_file roll back when copying pages fails.
More concretely:
- extract copying operations into a separate loop
- postpone the updates for nr_none until both scanning and copying
  succeeded
- postpone joining small xarray entries until both scanning and
  copying succeeded
- postpone the update operations to NR_XXX_THPS until both scanning
  and copying succeeded
- for non-SHMEM file, roll back filemap_nr_thps_inc if scan succeeded
  but copying failed

Tested manually:
0. Enable khugepaged on system under test. Mount tmpfs at
   /mnt/ramdisk.
1. Start a two-thread application. Each thread allocates a chunk of
   non-huge memory buffer from /mnt/ramdisk.
2. Pick 4 random buffer addresses (2 in each thread) and inject
   uncorrectable memory errors at the corresponding physical
   addresses.
3. Signal both threads to make their memory buffer collapsible, i.e.
   by calling madvise(MADV_HUGEPAGE).
4. Wait and then check kernel log: khugepaged is able to recover from
   poisoned pages by skipping them.
5. Signal both threads to inspect their buffer contents and make sure
   there is no data corruption.

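For illustration only, steps 1, 3 and 5 of the manual test could look
roughly like the user-space sketch below. It is not the test program
used for this patch: the helper names, the 4 MiB buffer size and the
byte pattern are assumptions, it runs in a single thread, and the error
injection of step 2 is left out because the injection mechanism is not
specified here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed size: 4 MiB, enough to span a 2 MiB PMD range on x86-64. */
#define BUF_SIZE ((size_t)4 << 20)

/*
 * Step 1 (hypothetical helper): map a regular, non-huge shared buffer
 * backed by the file at `path` and fill it with a pattern derived
 * from `seed`. Returns NULL on failure.
 */
static unsigned char *map_and_fill(const char *path, unsigned char seed)
{
	unsigned char *buf;
	size_t i;
	int fd = open(path, O_RDWR | O_CREAT, 0600);

	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t)BUF_SIZE) != 0) {
		close(fd);
		return NULL;
	}
	buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping keeps the file contents reachable */
	if (buf == MAP_FAILED)
		return NULL;
	for (i = 0; i < BUF_SIZE; i++)
		buf[i] = (unsigned char)(seed + i);
	return buf;
}

/*
 * Step 3: mark the range as a collapse candidate. Returns 0 on
 * success, -1 with errno set otherwise (for example when the kernel
 * was built without transparent hugepage support).
 */
static int make_collapsible(unsigned char *buf)
{
	return madvise(buf, BUF_SIZE, MADV_HUGEPAGE);
}

/* Step 5: count the bytes that no longer match the pattern. */
static size_t count_mismatches(const unsigned char *buf, unsigned char seed)
{
	size_t i, bad = 0;

	for (i = 0; i < BUF_SIZE; i++)
		if (buf[i] != (unsigned char)(seed + i))
			bad++;
	return bad;
}
```

A caller would pass a path under /mnt/ramdisk to map_and_fill(), call
make_collapsible(), wait for khugepaged, and then expect
count_mismatches() to return 0.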
Signed-off-by: Jiaqi Yan
---
 mm/khugepaged.c | 78 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 48 insertions(+), 30 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index c3c217f6ebc6e..3ea2aa55c2c52 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1890,6 +1890,9 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 {
 	struct address_space *mapping = file->f_mapping;
 	struct page *hpage;
+	struct page *page;
+	struct page *tmp;
+	struct folio *folio;
 	pgoff_t index = 0, end = start + HPAGE_PMD_NR;
 	LIST_HEAD(pagelist);
 	XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER);
@@ -1934,8 +1937,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 
 	xas_set(&xas, start);
 	for (index = start; index < end; index++) {
-		struct page *page = xas_next(&xas);
-		struct folio *folio;
+		page = xas_next(&xas);
 
 		VM_BUG_ON(index != xas.xa_index);
 		if (is_shmem) {
@@ -2117,10 +2119,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	}
 	nr = thp_nr_pages(hpage);
 
-	if (is_shmem)
-		__mod_lruvec_page_state(hpage, NR_SHMEM_THPS, nr);
-	else {
-		__mod_lruvec_page_state(hpage, NR_FILE_THPS, nr);
+	if (!is_shmem) {
 		filemap_nr_thps_inc(mapping);
 		/*
 		 * Paired with smp_mb() in do_dentry_open() to ensure
@@ -2131,21 +2130,10 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		smp_mb();
 		if (inode_is_open_for_write(mapping->host)) {
 			result = SCAN_FAIL;
-			__mod_lruvec_page_state(hpage, NR_FILE_THPS, -nr);
 			filemap_nr_thps_dec(mapping);
 			goto xa_locked;
 		}
 	}
-
-	if (nr_none) {
-		__mod_lruvec_page_state(hpage, NR_FILE_PAGES, nr_none);
-		/* nr_none is always 0 for non-shmem. */
-		__mod_lruvec_page_state(hpage, NR_SHMEM, nr_none);
-	}
-
-	/* Join all the small entries into a single multi-index entry */
-	xas_set_order(&xas, start, HPAGE_PMD_ORDER);
-	xas_store(&xas, hpage);
 xa_locked:
 	xas_unlock_irq(&xas);
 xa_unlocked:
@@ -2158,21 +2146,35 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	try_to_unmap_flush();
 
 	if (result == SCAN_SUCCEED) {
-		struct page *page, *tmp;
-		struct folio *folio;
-
 		/*
 		 * Replacing old pages with new one has succeeded, now we
-		 * need to copy the content and free the old pages.
+		 * attempt to copy the contents.
 		 */
 		index = start;
-		list_for_each_entry_safe(page, tmp, &pagelist, lru) {
+		list_for_each_entry(page, &pagelist, lru) {
 			while (index < page->index) {
 				clear_highpage(hpage + (index % HPAGE_PMD_NR));
 				index++;
 			}
-			copy_highpage(hpage + (page->index % HPAGE_PMD_NR),
-				      page);
+			if (copy_mc_highpage(hpage + (page->index % HPAGE_PMD_NR),
+					     page) > 0) {
+				result = SCAN_COPY_MC;
+				break;
+			}
+			index++;
+		}
+		while (result == SCAN_SUCCEED && index < end) {
+			clear_highpage(hpage + (index % HPAGE_PMD_NR));
+			index++;
+		}
+	}
+
+	if (result == SCAN_SUCCEED) {
+		/*
+		 * Copying old pages to huge one has succeeded, now we
+		 * need to free the old pages.
+		 */
+		list_for_each_entry_safe(page, tmp, &pagelist, lru) {
 			list_del(&page->lru);
 			page->mapping = NULL;
 			page_ref_unfreeze(page, 1);
@@ -2180,12 +2182,23 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 			ClearPageUnevictable(page);
 			unlock_page(page);
 			put_page(page);
-			index++;
 		}
-		while (index < end) {
-			clear_highpage(hpage + (index % HPAGE_PMD_NR));
-			index++;
+
+		xas_lock_irq(&xas);
+		if (is_shmem)
+			__mod_lruvec_page_state(hpage, NR_SHMEM_THPS, nr);
+		else
+			__mod_lruvec_page_state(hpage, NR_FILE_THPS, nr);
+
+		if (nr_none) {
+			__mod_lruvec_page_state(hpage, NR_FILE_PAGES, nr_none);
+			/* nr_none is always 0 for non-shmem. */
+			__mod_lruvec_page_state(hpage, NR_SHMEM, nr_none);
 		}
+		/* Join all the small entries into a single multi-index entry. */
+		xas_set_order(&xas, start, HPAGE_PMD_ORDER);
+		xas_store(&xas, hpage);
+		xas_unlock_irq(&xas);
 
 		folio = page_folio(hpage);
 		folio_mark_uptodate(folio);
@@ -2203,8 +2216,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		unlock_page(hpage);
 		hpage = NULL;
 	} else {
-		struct page *page;
-
 		/* Something went wrong: roll back page cache changes */
 		xas_lock_irq(&xas);
 		if (nr_none) {
@@ -2238,6 +2249,13 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 			xas_lock_irq(&xas);
 		}
 		VM_BUG_ON(nr_none);
+		/*
+		 * Undo the updates of filemap_nr_thps_inc for non-SHMEM file only.
+		 * This undo is not needed unless failure is due to SCAN_COPY_MC.
+		 */
+		if (!is_shmem && result == SCAN_COPY_MC)
+			filemap_nr_thps_dec(mapping);
+
 		xas_unlock_irq(&xas);
 
 		hpage->mapping = NULL;
-- 
2.40.0.rc0.216.gc4246ad0f0-goog
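
The ordering this patch introduces (copy everything first, publish
counters and the joined entry only if every copy succeeded) can be
modelled in user space as below. This is a simplified sketch, not
kernel code: struct cache_model, copy_mc_model() and collapse_model()
are hypothetical names, only the SCAN_SUCCEED and SCAN_COPY_MC names
are taken from the patch (their values here are arbitrary), and
locking is ignored. Unlike collapse_file(), which has already changed
the page cache by the time it copies and therefore needs an explicit
rollback path, the model simply leaves shared state untouched until
the commit step.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NR_SMALL	8	/* stand-in for HPAGE_PMD_NR */
#define SMALL_SIZE	16	/* stand-in for PAGE_SIZE */

/* Names from the patch; the numeric values are arbitrary here. */
enum scan_result { SCAN_SUCCEED, SCAN_COPY_MC };

struct small_page {
	unsigned char data[SMALL_SIZE];
	bool poisoned;		/* models an uncorrectable memory error */
};

/* Hypothetical model of the state that must not change on failure. */
struct cache_model {
	struct small_page *slots[NR_SMALL];	/* NULL models a hole */
	unsigned char *huge;	/* set once the joined entry is stored */
	long nr_thps;		/* models the NR_*_THPS counters */
};

/* Models copy_mc_highpage(): a positive return means the copy failed. */
static int copy_mc_model(unsigned char *dst, const struct small_page *src)
{
	if (src->poisoned)
		return SMALL_SIZE;
	memcpy(dst, src->data, SMALL_SIZE);
	return 0;
}

static enum scan_result collapse_model(struct cache_model *c,
				       unsigned char *huge)
{
	size_t i;

	/* Phase 1: copy or zero-fill; shared state is not touched. */
	for (i = 0; i < NR_SMALL; i++) {
		unsigned char *dst = huge + i * SMALL_SIZE;

		if (!c->slots[i])
			memset(dst, 0, SMALL_SIZE);
		else if (copy_mc_model(dst, c->slots[i]) > 0)
			return SCAN_COPY_MC;
	}

	/* Phase 2: every copy succeeded, so publish the result. */
	for (i = 0; i < NR_SMALL; i++)
		c->slots[i] = NULL;
	c->huge = huge;
	c->nr_thps += NR_SMALL;
	return SCAN_SUCCEED;
}
```

With one slot marked poisoned, collapse_model() returns SCAN_COPY_MC
and leaves slots, huge and nr_thps exactly as they were, so the caller
can keep using the small pages.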