From: Dan Williams
Date: Thu, 12 Jul 2018 21:44:44 -0700
Subject: Re: [PATCH v5 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages
To: linux-nvdimm
Cc: linux-edac@vger.kernel.org, Tony Luck, Borislav Petkov, Jérôme Glisse,
    Jan Kara, "H. Peter Anvin", X86 ML, Thomas Gleixner, Christoph Hellwig,
    Ross Zwisler, Ingo Molnar, Michal Hocko, Naoya Horiguchi,
    Souptick Joarder, linux-fsdevel, Linux MM,
    Linux Kernel Mailing List, Matthew Wilcox

On Wed, Jul 4, 2018 at 2:40 PM, Dan Williams wrote:
> Changes since v4 [1]:
> * Rework dax_lock_page() to reuse get_unlocked_mapping_entry() (Jan)
>
> * Change the calling convention to take a 'struct page *' and return
>   success / failure instead of performing the pfn_to_page() internal
>   to the API (Jan, Ross).
>
> * Rename dax_lock_page() to dax_lock_mapping_entry() (Jan)
>
> * Account for the case that a given pfn can be fsdax mapped with
>   different sizes in different vmas (Jan)
>
> * Update collect_procs() to determine the mapping size of the pfn for
>   each page, given it can be variable in the dax case.
>
> [1]: https://lists.01.org/pipermail/linux-nvdimm/2018-June/016279.html
>
> ---
>
> As it stands, memory_failure() gets thoroughly confused by dev_pagemap
> backed mappings. The recovery code has specific enabling for several
> possible page states and needs new enabling to handle poison in dax
> mappings.
>
> In order to support reliable reverse mapping of user space addresses:
>
> 1/ Add new locking in the memory_failure() rmap path to prevent races
>    that would typically be handled by the page lock.
>
> 2/ Since dev_pagemap pages are hidden from the page allocator and the
>    "compound page" accounting machinery, add a mechanism to determine
>    the size of the mapping that encompasses a given poisoned pfn.
>
> 3/ Given pmem errors can be repaired, change the speculatively accessed
>    poison protection, mce_unmap_kpfn(), to be reversible and otherwise
>    allow ongoing access from the kernel.
>
> A side effect of this enabling is that MADV_HWPOISON becomes usable
> for dax mappings; however, the primary motivation is to allow the
> system to survive userspace consumption of hardware poison via dax.
> Specifically, the current behavior is:
>
>     mce: Uncorrected hardware memory error in user-access at af34214200
>     {1}[Hardware Error]: It has been corrected by h/w and requires no further action
>     mce: [Hardware Error]: Machine check events logged
>     {1}[Hardware Error]: event severity: corrected
>     Memory failure: 0xaf34214: reserved kernel page still referenced by 1 users
>     [..]
>     Memory failure: 0xaf34214: recovery action for reserved kernel page: Failed
>     mce: Memory error not recovered
>
> ...and with these changes:
>
>     Injecting memory failure for pfn 0x20cb00 at process virtual address 0x7f763dd00000
>     Memory failure: 0x20cb00: Killing dax-pmd:5421 due to hardware memory corruption
>     Memory failure: 0x20cb00: recovery action for dax page: Recovered
>
> Given all the cross dependencies, I propose taking this through
> nvdimm.git with acks from Naoya, x86/core, x86/RAS, and of course the
> dax folks.

Hi,

Any comments on this series?
Matthew is patiently holding off on rebasing some of his XArray work until the dax_lock_mapping_entry() changes hit -next.
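For reference, the reworked calling convention described in the changelog amounts to roughly the sketch below. This is illustrative only, not code from the series: the caller name is invented, and dax_unlock_mapping_entry() is assumed as the unlock counterpart; the only parts taken from the cover letter are that dax_lock_mapping_entry() takes a 'struct page *' and returns success / failure.

    /*
     * Hypothetical caller: under the new convention the pfn_to_page()
     * happens in the memory-failure path itself, and
     * dax_lock_mapping_entry() just takes the page and reports whether
     * the mapping entry could be locked.
     */
    static int handle_dax_poison(unsigned long pfn)
    {
            struct page *page = pfn_to_page(pfn);

            /* pin the page->mapping association against truncate */
            if (!dax_lock_mapping_entry(page))
                    return -EBUSY;

            /* ... reverse map the pfn, signal affected processes ... */

            dax_unlock_mapping_entry(page);  /* assumed counterpart */
            return 0;
    }

The point of the convention change is just that the lock helper no longer performs the pfn_to_page() translation internally, so callers that already hold a page pointer can use it directly.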