From: Jiaqi Yan
Date: Mon, 27 Mar 2023 14:15:29 -0700
Subject: Re: [PATCH v10 3/3] mm/khugepaged: recover from poisoned file-backed memory
To: Hugh Dickins
Cc: Yang Shi, kirill.shutemov@linux.intel.com, kirill@shutemov.name, tongtiangen@huawei.com, tony.luck@intel.com, akpm@linux-foundation.org, naoya.horiguchi@nec.com, linmiaohe@huawei.com, linux-mm@kvack.org, osalvador@suse.de, wangkefeng.wang@huawei.com

On Fri, Mar 24, 2023 at 5:39 PM Hugh Dickins wrote:
>
> On Fri, 24 Mar 2023, Jiaqi Yan wrote:
> > On Fri, Mar 24, 2023 at 2:15 PM Yang Shi wrote:
> > > On Sat, Mar 4, 2023 at 10:51 PM Jiaqi Yan wrote:
> > > >
> > > > Make collapse_file roll back when copying pages failed.
> > > > More concretely:
> > > > - extract copying operations into a separate loop
> > > > - postpone the updates for nr_none until both scanning and copying
> > > >   succeeded
> > > > - postpone joining small xarray entries until both scanning and copying
> > > >   succeeded
> > > > - postpone the update operations to NR_XXX_THPS until both scanning and
> > > >   copying succeeded
> > > > - for non-SHMEM file, roll back filemap_nr_thps_inc if scan succeeded but
> > > >   copying failed
> > > >
> > > > Tested manually:
> > > > 0. Enable khugepaged on system under test. Mount tmpfs at /mnt/ramdisk.
> > > > 1. Start a two-thread application. Each thread allocates a chunk of
> > > >    non-huge memory buffer from /mnt/ramdisk.
> > > > 2. Pick 4 random buffer address (2 in each thread) and inject
> > > >    uncorrectable memory errors at physical addresses.
> > > > 3. Signal both threads to make their memory buffer collapsible, i.e.
> > > >    calling madvise(MADV_HUGEPAGE).
> > > > 4. Wait and then check kernel log: khugepaged is able to recover from
> > > >    poisoned pages by skipping them.
> > > > 5. Signal both threads to inspect their buffer contents and make sure no
> > > >    data corruption.
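The roll-back ordering in the quoted commit message can be sketched as a small user-space model. This is a hypothetical illustration, not the kernel implementation. `struct model_page`, `model_copy_page()`, `model_collapse()` and the two counters are invented names, and `NR_SUBPAGES` stands in for `HPAGE_PMD_NR`. The model assumes the scan phase has already succeeded, and it assumes the non-shmem counter (standing in for `filemap_nr_thps_inc()`) is bumped before copying, which is why it must be undone when a poisoned subpage makes the copy loop fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sizes: NR_SUBPAGES stands in for HPAGE_PMD_NR. */
#define NR_SUBPAGES 4
#define SUBPAGE_BYTES 16

struct model_page {
	char data[SUBPAGE_BYTES];
	bool poisoned;            /* models an uncorrectable memory error */
};

struct model_stats {
	long nr_thps;             /* models NR_SHMEM_THPS / NR_FILE_THPS */
	long mapping_nr_thps;     /* models filemap_nr_thps_inc() / _dec() */
};

/* Models a machine-check-safe copy: reports poison instead of crashing. */
static int model_copy_page(char *dst, const struct model_page *src)
{
	if (src->poisoned)
		return -1;
	memcpy(dst, src->data, SUBPAGE_BYTES);
	return 0;
}

/*
 * Returns 0 if the collapse committed, -1 if it was rolled back.
 * Assumption: the scan phase already succeeded before this is called.
 */
static int model_collapse(const struct model_page *pages, char *huge,
			  bool is_shmem, struct model_stats *st)
{
	assert(pages != NULL && huge != NULL && st != NULL);

	/* Non-shmem: counter bumped after the scan, so undo it on failure. */
	if (!is_shmem)
		st->mapping_nr_thps++;

	/* Copying runs in its own loop; no statistics are committed yet. */
	for (int i = 0; i < NR_SUBPAGES; i++) {
		if (model_copy_page(huge + i * SUBPAGE_BYTES, &pages[i])) {
			if (!is_shmem)
				st->mapping_nr_thps--;   /* roll back */
			return -1;
		}
	}

	/* Both scanning and copying succeeded: only now update statistics. */
	st->nr_thps += NR_SUBPAGES;
	return 0;
}
```

With one poisoned subpage, `model_collapse()` returns -1 and leaves both counters as they were before the call. The xarray handling and the real machine-check-safe copy are outside what this model covers.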
> > > >
> > > > Signed-off-by: Jiaqi Yan
> > >
> > > Reviewed-by: Yang Shi
> > >
> > > Just a nit below:
>
> Acked-by: Hugh Dickins
>
> with a little nit from me below, if you are respinning:
>
> > >
> > > > ---
> > > >  mm/khugepaged.c | 78 ++++++++++++++++++++++++++++++-------------------------
> > > >  1 file changed, 48 insertions(+), 30 deletions(-)
> > > >
> > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > > index c3c217f6ebc6e..3ea2aa55c2c52 100644
> > > > --- a/mm/khugepaged.c
> > > > +++ b/mm/khugepaged.c
> > > > @@ -1890,6 +1890,9 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
> > > >  {
> > > >         struct address_space *mapping = file->f_mapping;
> > > >         struct page *hpage;
> > > > +       struct page *page;
> > > > +       struct page *tmp;
> > > > +       struct folio *folio;
> > > >         pgoff_t index = 0, end = start + HPAGE_PMD_NR;
> > > >         LIST_HEAD(pagelist);
> > > >         XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER);
> > > > @@ -1934,8 +1937,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
> > > >
> > > >         xas_set(&xas, start);
> > > >         for (index = start; index < end; index++) {
> > > > -               struct page *page = xas_next(&xas);
> > > > -               struct folio *folio;
> > > > +               page = xas_next(&xas);
> > > >
> > > >                 VM_BUG_ON(index != xas.xa_index);
> > > >                 if (is_shmem) {
> > > > @@ -2117,10 +2119,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
> > > >         }
> > > >         nr = thp_nr_pages(hpage);
> > > >
> > > > -       if (is_shmem)
> > > > -               __mod_lruvec_page_state(hpage, NR_SHMEM_THPS, nr);
> > > > -       else {
> > > > -               __mod_lruvec_page_state(hpage, NR_FILE_THPS, nr);
> > > > +       if (!is_shmem) {
> > > >                 filemap_nr_thps_inc(mapping);
> > > >                 /*
> > > >                  * Paired with smp_mb() in do_dentry_open() to ensure
>
> That "nr = thp_nr_pages(hpage);" above becomes stranded a long way away
> from where "nr" is actually used for updating those statistics: please
> move it down with them.
> (I see "nr" is also reported in the tracepoint
> at the end, FWIW, so maybe that will show "0" in more failure cases than
> it used to, but that's okay - it has been decently initialized.)
>
Thanks Hugh! I will make sure V11 moves "nr" closer to the place it is used.

> Thanks,
> Hugh
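Hugh's point about where "nr" lives can be illustrated with another hypothetical model. This is not the kernel code: `model_collapse_file()`, `struct model_trace`, `model_nr_thps` and `MODEL_HPAGE_NR` are invented names, and `MODEL_HPAGE_NR` stands in for `thp_nr_pages(hpage)`. Here `nr` starts at 0 and is assigned immediately before the statistics update that consumes it. A failure path that skips the update therefore also leaves the traced value at 0, which is the behavior Hugh describes as acceptable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: stands in for thp_nr_pages(hpage) on a PMD-sized THP. */
#define MODEL_HPAGE_NR 512

struct model_trace {
	int result;
	int nr;                  /* value a tracepoint at the end would report */
};

static long model_nr_thps;   /* models NR_SHMEM_THPS / NR_FILE_THPS */

static struct model_trace model_collapse_file(bool copy_ok)
{
	int nr = 0;              /* initialized, so failure paths report 0 */
	int result = -1;

	if (!copy_ok)
		goto out;        /* failure: statistics and nr left untouched */

	nr = MODEL_HPAGE_NR;     /* computed right where it is used ... */
	model_nr_thps += nr;     /* ... for the statistics update */
	result = 0;
out:
	/* nr is non-zero only when the statistics were actually updated. */
	assert(nr == 0 || result == 0);
	return (struct model_trace){ .result = result, .nr = nr };
}
```

Keeping the assignment next to its only real consumer makes the invariant in the final `assert()` easy to see. The trade-off is the one Hugh notes: the tracepoint now reports 0 on more failure paths.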