Date: Tue, 5 Jun 2018 16:11:04 +0200
From: Michal Hocko
To: Dan Williams
Cc: Tony Luck, Jan Kara, Matthew Wilcox, linux-nvdimm, X86 ML, Linux MM,
 Jérôme Glisse, Ingo Molnar, Borislav Petkov, Souptick Joarder,
 "H. Peter Anvin", linux-fsdevel, Thomas Gleixner, Christoph Hellwig,
 Naoya Horiguchi, linux-edac@vger.kernel.org
Subject: Re: [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages
Message-ID: <20180605141104.GF19202@dhcp22.suse.cz>
References: <152800336321.17112.3300876636370683279.stgit@dwillia2-desk3.amr.corp.intel.com>
 <20180604124031.GP19202@dhcp22.suse.cz>

On Mon 04-06-18 07:31:25, Dan Williams wrote:
[...]
> I'm trying to solve this real world problem when real poison is
> consumed through a dax mapping:
>
>         mce: Uncorrected hardware memory error in user-access at af34214200
>         {1}[Hardware Error]: It has been corrected by h/w and requires
> no further action
>         mce: [Hardware Error]: Machine check events logged
>         {1}[Hardware Error]: event severity: corrected
>         Memory failure: 0xaf34214: reserved kernel page still
> referenced by 1 users
>         [..]
>         Memory failure: 0xaf34214: recovery action for reserved kernel
> page: Failed
>         mce: Memory error not recovered
>
> ...i.e. currently all poison consumed through dax mappings is
> needlessly system fatal.

Thanks. That should be a part of the changelog.
It would be great to describe why this cannot simply be handled by the
hwpoison code without any ZONE_DEVICE-specific hacks. The error is
recoverable, so why does the hwpoison code even care?
-- 
Michal Hocko
SUSE Labs