From: "Luck, Tony" <tony.luck@intel.com>
To: Shuai Xue <xueshuai@linux.alibaba.com>, David Laight <David.Laight@ACULAB.COM>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>, Andrew Morton <akpm@linux-foundation.org>, Miaohe Lin <linmiaohe@huawei.com>, "Matthew Wilcox" <willy@infradead.org>, "Williams, Dan J" <dan.j.williams@intel.com>, Michael Ellerman <mpe@ellerman.id.au>, Nicholas Piggin <npiggin@gmail.com>, Christophe Leroy <christophe.leroy@csgroup.eu>, "linux-mm@kvack.org" <linux-mm@kvack.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>
Subject: RE: [PATCH v2] mm, hwpoison: Try to recover from copy-on write faults
Date: Fri, 21 Oct 2022 16:30:50 +0000
Message-ID: <SJ1PR11MB6083CEDBA2719825A1AD325EFC2D9@SJ1PR11MB6083.namprd11.prod.outlook.com>
In-Reply-To: <dda2321d-15f4-342a-2fbe-5c535858eb34@linux.alibaba.com>

>> But maybe it is some RMW instruction ... then, if all the above options didn't happen ... we
>> could get another machine check from the same address. But then we just follow the usual
>> recovery path.

> Let assume the instruction that cause the COW is in the 63/64 case, aka,
> it is writing a different cache line from the poisoned one. But the new_page
> allocated in COW is dropped right? So might page fault again?

It can, but this should be no surprise to a user who has a signal handler for
a h/w event (SIGBUS, SIGSEGV, SIGILL) that does nothing to address the problem,
but simply returns to re-execute the same instruction that caused the original
trap.

There may be badly written signal handlers that do this. But they just cause
pain for themselves. Linux can keep taking the traps, fixing things up, and
sending a new signal over and over.

In this case that loop may involve taking the machine check again, so there is
some extra pain for the kernel. But a few generations back, recoverable machine
checks on Intel/x86 switched from being broadcast to being delivered only to the
logical CPU that tried to consume the poison. So this is only a bit more painful
than a repeated page fault.

-Tony
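To make the "badly written signal handler" loop concrete, here is a minimal Linux sketch, assuming a truncated file mapping as a stand-in for a poisoned page: touching a mapped page that now lies beyond end-of-file raises SIGBUS, and injecting real poison (e.g. with madvise(MADV_HWPOISON)) would need CAP_SYS_ADMIN and CONFIG_MEMORY_FAILURE. For a real action-required memory error the kernel also delivers SIGBUS, with si_code BUS_MCEERR_AR. The names count_traps and on_sigbus are hypothetical. The handler "fixes" nothing and returns a configurable number of times; each return re-executes the faulting load, which traps again. After that it escapes with siglongjmp, which is what a well-behaved handler should do instead of retrying. Note that POSIX leaves returning from a fault-generated SIGBUS handler undefined; the retry behavior shown is what Linux does.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf escape;                  /* landing point for the "good" exit path */
static volatile sig_atomic_t traps;        /* SIGBUS deliveries seen so far */
static volatile sig_atomic_t returns_left; /* times the handler will just return */
static volatile char sink;                 /* forces the faulting load to be emitted */

static void on_sigbus(int sig, siginfo_t *si, void *uctx)
{
    (void)sig; (void)si; (void)uctx;
    traps++;
    if (returns_left > 0) {
        /* Badly written: nothing is fixed, so the load re-executes and traps again. */
        returns_left--;
        return;
    }
    /* Well behaved: leave the faulting code path instead of retrying it. */
    siglongjmp(escape, 1);
}

/* Hypothetical helper: touch an unreadable page and report how many SIGBUS
 * traps were taken before the handler stopped returning. -1 on setup error. */
int count_traps(int max_returns)
{
    long page = sysconf(_SC_PAGESIZE);
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    int fd = fileno(f);

    /* Map one page of the file, then shrink the file to zero bytes: the
     * mapping remains, but any access to it now raises SIGBUS. */
    if (ftruncate(fd, page) != 0) { fclose(f); return -1; }
    void *map = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { fclose(f); return -1; }
    if (ftruncate(fd, 0) != 0) { munmap(map, (size_t)page); fclose(f); return -1; }
    volatile char *p = map;

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) != 0) {
        munmap(map, (size_t)page);
        fclose(f);
        return -1;
    }

    traps = 0;
    returns_left = max_returns;
    if (sigsetjmp(escape, 1) == 0)
        sink = p[0];    /* faults; every handler return retries this same load */

    sigaction(SIGBUS, &old, NULL);   /* restore the previous disposition */
    munmap(map, (size_t)page);
    fclose(f);
    return traps;
}
```

On Linux, count_traps(0) should take a single trap, while count_traps(3) should take four: three returns that each re-execute the load, plus the final trap that escapes. A handler that always returns would loop forever, which is the self-inflicted pain described above.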