From: Yang Shi <shy828301@gmail.com>
Date: Fri, 24 Mar 2023 14:15:09 -0700
Subject: Re: [PATCH v10 3/3] mm/khugepaged: recover from poisoned file-backed memory
To: Jiaqi Yan
Cc: kirill.shutemov@linux.intel.com, kirill@shutemov.name, tongtiangen@huawei.com,
    tony.luck@intel.com, akpm@linux-foundation.org, naoya.horiguchi@nec.com,
    linmiaohe@huawei.com, linux-mm@kvack.org, osalvador@suse.de,
    wangkefeng.wang@huawei.com
References: <20230305065112.1932255-1-jiaqiyan@google.com> <20230305065112.1932255-4-jiaqiyan@google.com>
In-Reply-To: <20230305065112.1932255-4-jiaqiyan@google.com>

On Sat, Mar 4, 2023 at 10:51 PM Jiaqi Yan wrote:
>
> Make collapse_file roll back when copying pages failed.
> More concretely:
> - extract copying operations into a separate loop
> - postpone the updates for nr_none until both scanning and copying
>   succeeded
> - postpone joining small xarray entries until both scanning and copying
>   succeeded
> - postpone the update operations to NR_XXX_THPS until both scanning and
>   copying succeeded
> - for non-SHMEM file, roll back filemap_nr_thps_inc if scan succeeded but
>   copying failed
>
> Tested manually:
> 0. Enable khugepaged on system under test. Mount tmpfs at /mnt/ramdisk.
> 1. Start a two-thread application. Each thread allocates a chunk of
>    non-huge memory buffer from /mnt/ramdisk.
> 2. Pick 4 random buffer addresses (2 in each thread) and inject
>    uncorrectable memory errors at the corresponding physical addresses.
> 3. Signal both threads to make their memory buffers collapsible, i.e.
>    by calling madvise(MADV_HUGEPAGE).
> 4. Wait and then check the kernel log: khugepaged is able to recover from
>    poisoned pages by skipping them.
> 5. Signal both threads to inspect their buffer contents and make sure
>    there is no data corruption.
>
> Signed-off-by: Jiaqi Yan

Reviewed-by: Yang Shi

Just a nit below:

> ---
>  mm/khugepaged.c | 78 ++++++++++++++++++++++++++++++-------------------
>  1 file changed, 48 insertions(+), 30 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index c3c217f6ebc6e..3ea2aa55c2c52 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1890,6 +1890,9 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>  {
>         struct address_space *mapping = file->f_mapping;
>         struct page *hpage;
> +       struct page *page;
> +       struct page *tmp;
> +       struct folio *folio;
>         pgoff_t index = 0, end = start + HPAGE_PMD_NR;
>         LIST_HEAD(pagelist);
>         XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER);
> @@ -1934,8 +1937,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>
>         xas_set(&xas, start);
>         for (index = start; index < end; index++) {
> -               struct page *page = xas_next(&xas);
> -               struct folio *folio;
> +               page = xas_next(&xas);
>
>                 VM_BUG_ON(index != xas.xa_index);
>                 if (is_shmem) {
> @@ -2117,10 +2119,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>         }
>         nr = thp_nr_pages(hpage);
>
> -       if (is_shmem)
> -               __mod_lruvec_page_state(hpage, NR_SHMEM_THPS, nr);
> -       else {
> -               __mod_lruvec_page_state(hpage, NR_FILE_THPS, nr);
> +       if (!is_shmem) {
>                 filemap_nr_thps_inc(mapping);
>                 /*
>                  * Paired with smp_mb() in do_dentry_open() to ensure
> @@ -2131,21 +2130,10 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>                 smp_mb();
>                 if (inode_is_open_for_write(mapping->host)) {
>                         result = SCAN_FAIL;
> -                       __mod_lruvec_page_state(hpage, NR_FILE_THPS, -nr);
>                         filemap_nr_thps_dec(mapping);
>                         goto xa_locked;
>                 }
>         }
> -
> -       if (nr_none) {
> -               __mod_lruvec_page_state(hpage, NR_FILE_PAGES, nr_none);
> -               /* nr_none is always 0 for non-shmem. */
> -               __mod_lruvec_page_state(hpage, NR_SHMEM, nr_none);
> -       }
> -
> -       /* Join all the small entries into a single multi-index entry */
> -       xas_set_order(&xas, start, HPAGE_PMD_ORDER);
> -       xas_store(&xas, hpage);
>  xa_locked:
>         xas_unlock_irq(&xas);
>  xa_unlocked:
> @@ -2158,21 +2146,35 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>         try_to_unmap_flush();
>
>         if (result == SCAN_SUCCEED) {
> -               struct page *page, *tmp;
> -               struct folio *folio;
> -
>                 /*
>                  * Replacing old pages with new one has succeeded, now we
> -                * need to copy the content and free the old pages.
> +                * attempt to copy the contents.
>                  */
>                 index = start;
> -               list_for_each_entry_safe(page, tmp, &pagelist, lru) {
> +               list_for_each_entry(page, &pagelist, lru) {
>                         while (index < page->index) {
>                                 clear_highpage(hpage + (index % HPAGE_PMD_NR));
>                                 index++;
>                         }
> -                       copy_highpage(hpage + (page->index % HPAGE_PMD_NR),
> -                                     page);
> +                       if (copy_mc_highpage(hpage + (page->index % HPAGE_PMD_NR),
> +                                            page) > 0) {
> +                               result = SCAN_COPY_MC;
> +                               break;
> +                       }
> +                       index++;
> +               }
> +               while (result == SCAN_SUCCEED && index < end) {
> +                       clear_highpage(hpage + (index % HPAGE_PMD_NR));
> +                       index++;
> +               }
> +       }
> +
> +       if (result == SCAN_SUCCEED) {
> +               /*
> +                * Copying old pages to huge one has succeeded, now we
> +                * need to free the old pages.
> +                */
> +               list_for_each_entry_safe(page, tmp, &pagelist, lru) {
>                         list_del(&page->lru);
>                         page->mapping = NULL;
>                         page_ref_unfreeze(page, 1);
> @@ -2180,12 +2182,23 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>                         ClearPageUnevictable(page);
>                         unlock_page(page);
>                         put_page(page);
> -                       index++;
>                 }
> -               while (index < end) {
> -                       clear_highpage(hpage + (index % HPAGE_PMD_NR));
> -                       index++;
> +
> +               xas_lock_irq(&xas);
> +               if (is_shmem)
> +                       __mod_lruvec_page_state(hpage, NR_SHMEM_THPS, nr);
> +               else
> +                       __mod_lruvec_page_state(hpage, NR_FILE_THPS, nr);
> +
> +               if (nr_none) {
> +                       __mod_lruvec_page_state(hpage, NR_FILE_PAGES, nr_none);
> +                       /* nr_none is always 0 for non-shmem. */
> +                       __mod_lruvec_page_state(hpage, NR_SHMEM, nr_none);
>                 }
> +               /* Join all the small entries into a single multi-index entry. */
> +               xas_set_order(&xas, start, HPAGE_PMD_ORDER);
> +               xas_store(&xas, hpage);
> +               xas_unlock_irq(&xas);
>
>                 folio = page_folio(hpage);
>                 folio_mark_uptodate(folio);
> @@ -2203,8 +2216,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>                 unlock_page(hpage);
>                 hpage = NULL;
>         } else {
> -               struct page *page;
> -
>                 /* Something went wrong: roll back page cache changes */
>                 xas_lock_irq(&xas);
>                 if (nr_none) {
> @@ -2238,6 +2249,13 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>                         xas_lock_irq(&xas);
>                 }
>                 VM_BUG_ON(nr_none);
> +               /*
> +                * Undo the updates of filemap_nr_thps_inc for non-SHMEM file only.
> +                * This undo is not needed unless failure is due to SCAN_COPY_MC.
> +                */
> +               if (!is_shmem && result == SCAN_COPY_MC)
> +                       filemap_nr_thps_dec(mapping);

We may need a memory barrier here. But a missing memory barrier is not a
fatal issue either; the worst case is an unnecessary truncate from the open
path if it sees an obsolete nr_thps counter. It may be better to handle this
in a follow-up patch by moving the smp_mb() into the filemap_nr_thps_xxx
functions.
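To make the suggestion concrete, such a follow-up could fold the barrier into the helpers themselves so no caller can forget it. A rough, uncompiled sketch against the filemap_nr_thps_inc/dec helpers in include/linux/pagemap.h (the smp_mb() placement is the proposal here, not what mainline currently does):

```c
static inline void filemap_nr_thps_inc(struct address_space *mapping)
{
#ifdef CONFIG_READ_ONLY_THP_FOR_FS
	atomic_inc(&mapping->nr_thps);
	/*
	 * Pair with smp_mb() in do_dentry_open() so that whoever sees
	 * the file opened for write also sees the updated nr_thps
	 * counter, and vice versa.
	 */
	smp_mb();
#endif
}

static inline void filemap_nr_thps_dec(struct address_space *mapping)
{
#ifdef CONFIG_READ_ONLY_THP_FOR_FS
	atomic_dec(&mapping->nr_thps);
	smp_mb();
#endif
}
```

Then the explicit smp_mb() at the collapse_file call sites (and the one missing after the SCAN_COPY_MC rollback above) would come for free.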
> +
>         xas_unlock_irq(&xas);
>
>         hpage->mapping = NULL;
> --
> 2.40.0.rc0.216.gc4246ad0f0-goog
>