From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Date: Tue, 5 Jun 2018 16:11:04 +0200
From: Michal Hocko
To: Dan Williams
Cc: linux-nvdimm, linux-edac@vger.kernel.org, Tony Luck, Borislav Petkov,
 Jérôme Glisse, Jan Kara, "H. Peter Anvin", X86 ML, Thomas Gleixner,
 Christoph Hellwig, Ross Zwisler, Matthew Wilcox, Ingo Molnar,
 Naoya Horiguchi, Souptick Joarder, Linux MM, linux-fsdevel
Subject: Re: [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages
Message-ID: <20180605141104.GF19202@dhcp22.suse.cz>
References: <152800336321.17112.3300876636370683279.stgit@dwillia2-desk3.amr.corp.intel.com>
 <20180604124031.GP19202@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Sender: owner-linux-mm@kvack.org
List-ID:

On Mon 04-06-18 07:31:25, Dan Williams wrote:
[...]
> I'm trying to solve this real world problem when real poison is
> consumed through a dax mapping:
>
> mce: Uncorrected hardware memory error in user-access at af34214200
> {1}[Hardware Error]: It has been corrected by h/w and requires
> no further action
> mce: [Hardware Error]: Machine check events logged
> {1}[Hardware Error]: event severity: corrected
> Memory failure: 0xaf34214: reserved kernel page still
> referenced by 1 users
> [..]
> Memory failure: 0xaf34214: recovery action for reserved kernel
> page: Failed
> mce: Memory error not recovered
>
> ...i.e. currently all poison consumed through dax mappings is
> needlessly system fatal.

Thanks. That should be a part of the changelog. It would be great to
describe why this cannot be simply handled by hwpoison code without any
ZONE_DEVICE specific hacks? The error is recoverable so why does
hwpoison code even care?
-- 
Michal Hocko
SUSE Labs