* Re: [PATCH] mm: clean up hwpoison page cache page in fault path
2022-02-11 22:05 [PATCH] mm: clean up hwpoison page cache page in fault path Rik van Riel
@ 2022-02-12 3:10 ` Miaohe Lin
2022-02-12 18:17 ` kernel test robot
2022-02-13 8:56 ` John Hubbard
2 siblings, 0 replies; 5+ messages in thread
From: Miaohe Lin @ 2022-02-12 3:10 UTC (permalink / raw)
To: Rik van Riel
Cc: kernel-team, linux-mm, Andrew Morton, Mel Gorman,
Johannes Weiner, Matthew Wilcox, linux-kernel
On 2022/2/12 6:05, Rik van Riel wrote:
> Sometimes the page offlining code can leave behind a hwpoisoned clean
> page cache page. This can lead to programs being killed over and over
Yep, __soft_offline_page tries to invalidate_inode_page in a lightway.
> and over again as they fault in the hwpoisoned page, get killed, and
> then get re-spawned by whatever wanted to run them.
>
> This is particularly embarrassing when the page was offlined due to
> having too many corrected memory errors. Now we are killing tasks
> due to them trying to access memory that probably isn't even corrupted.
>
> This problem can be avoided by invalidating the page from the page
> fault handler, which already has a branch for dealing with these
> kinds of pages. With this patch we simply pretend the page fault
> was successful if the page was invalidated, return to userspace,
> incur another page fault, read in the file from disk (to a new
> memory page), and then everything works again.
>
> Signed-off-by: Rik van Riel <riel@surriel.com>
Good catch! This looks good to me. Thanks.
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
>
> diff --git a/mm/memory.c b/mm/memory.c
> index c125c4969913..2300358e268c 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3871,11 +3871,16 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
> return ret;
>
> if (unlikely(PageHWPoison(vmf->page))) {
> - if (ret & VM_FAULT_LOCKED)
> + int poisonret = VM_FAULT_HWPOISON;
> + if (ret & VM_FAULT_LOCKED) {
> + /* Retry if a clean page was removed from the cache. */
> + if (invalidate_inode_page(vmf->page))
> + poisonret = 0;
> unlock_page(vmf->page);
> + }
> put_page(vmf->page);
> vmf->page = NULL;
> - return VM_FAULT_HWPOISON;
> + return poisonret;
> }
>
> if (unlikely(!(ret & VM_FAULT_LOCKED)))
>
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] mm: clean up hwpoison page cache page in fault path
2022-02-11 22:05 [PATCH] mm: clean up hwpoison page cache page in fault path Rik van Riel
2022-02-12 3:10 ` Miaohe Lin
@ 2022-02-12 18:17 ` kernel test robot
2022-02-13 8:56 ` John Hubbard
2 siblings, 0 replies; 5+ messages in thread
From: kernel test robot @ 2022-02-12 18:17 UTC (permalink / raw)
To: Rik van Riel, linux-kernel
Cc: kbuild-all, kernel-team, linux-mm, Andrew Morton, Mel Gorman,
Johannes Weiner, Matthew Wilcox
Hi Rik,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on hnaz-mm/master]
url: https://github.com/0day-ci/linux/commits/Rik-van-Riel/mm-clean-up-hwpoison-page-cache-page-in-fault-path/20220212-060643
base: https://github.com/hnaz/linux-mm master
config: sparc-randconfig-s031-20220211 (https://download.01.org/0day-ci/archive/20220212/202202122306.S9ByO64R-lkp@intel.com/config)
compiler: sparc-linux-gcc (GCC) 11.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# apt-get install sparse
# sparse version: v0.6.4-dirty
# https://github.com/0day-ci/linux/commit/abd960cdbc9487dbf0a0dc3b2395825a38f8fa44
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Rik-van-Riel/mm-clean-up-hwpoison-page-cache-page-in-fault-path/20220212-060643
git checkout abd960cdbc9487dbf0a0dc3b2395825a38f8fa44
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=sparc SHELL=/bin/bash
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
sparse warnings: (new ones prefixed by >>)
>> mm/memory.c:3913:33: sparse: sparse: incorrect type in initializer (different base types) @@ expected int poisonret @@ got restricted vm_fault_t @@
mm/memory.c:3913:33: sparse: expected int poisonret
mm/memory.c:3913:33: sparse: got restricted vm_fault_t
>> mm/memory.c:3922:24: sparse: sparse: incorrect type in return expression (different base types) @@ expected restricted vm_fault_t @@ got int [assigned] poisonret @@
mm/memory.c:3922:24: sparse: expected restricted vm_fault_t
mm/memory.c:3922:24: sparse: got int [assigned] poisonret
mm/memory.c:1024:17: sparse: sparse: context imbalance in 'copy_pte_range' - different lock contexts for basic block
mm/memory.c:1740:16: sparse: sparse: context imbalance in '__get_locked_pte' - different lock contexts for basic block
mm/memory.c:1788:9: sparse: sparse: context imbalance in 'insert_page' - different lock contexts for basic block
mm/memory.c:2290:17: sparse: sparse: context imbalance in 'remap_pte_range' - different lock contexts for basic block
mm/memory.c:2546:17: sparse: sparse: context imbalance in 'apply_to_pte_range' - unexpected unlock
mm/memory.c:2834:17: sparse: sparse: context imbalance in 'wp_page_copy' - unexpected unlock
mm/memory.c:3173:17: sparse: sparse: context imbalance in 'wp_pfn_shared' - unexpected unlock
mm/memory.c:3236:19: sparse: sparse: context imbalance in 'do_wp_page' - different lock contexts for basic block
mm/memory.c:4939:5: sparse: sparse: context imbalance in 'follow_invalidate_pte' - different lock contexts for basic block
mm/memory.c: note: in included file (through arch/sparc/include/asm/pgtable.h, include/linux/pgtable.h, include/linux/mm.h):
arch/sparc/include/asm/pgtable_32.h:275:29: sparse: sparse: context imbalance in 'follow_pfn' - unexpected unlock
vim +3913 mm/memory.c
3875
3876 /*
3877 * The mmap_lock must have been held on entry, and may have been
3878 * released depending on flags and vma->vm_ops->fault() return value.
3879 * See filemap_fault() and __lock_page_retry().
3880 */
3881 static vm_fault_t __do_fault(struct vm_fault *vmf)
3882 {
3883 struct vm_area_struct *vma = vmf->vma;
3884 vm_fault_t ret;
3885
3886 /*
3887 * Preallocate pte before we take page_lock because this might lead to
3888 * deadlocks for memcg reclaim which waits for pages under writeback:
3889 * lock_page(A)
3890 * SetPageWriteback(A)
3891 * unlock_page(A)
3892 * lock_page(B)
3893 * lock_page(B)
3894 * pte_alloc_one
3895 * shrink_page_list
3896 * wait_on_page_writeback(A)
3897 * SetPageWriteback(B)
3898 * unlock_page(B)
3899 * # flush A, B to clear the writeback
3900 */
3901 if (pmd_none(*vmf->pmd) && !vmf->prealloc_pte) {
3902 vmf->prealloc_pte = pte_alloc_one(vma->vm_mm);
3903 if (!vmf->prealloc_pte)
3904 return VM_FAULT_OOM;
3905 }
3906
3907 ret = vma->vm_ops->fault(vmf);
3908 if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
3909 VM_FAULT_DONE_COW)))
3910 return ret;
3911
3912 if (unlikely(PageHWPoison(vmf->page))) {
> 3913 int poisonret = VM_FAULT_HWPOISON;
3914 if (ret & VM_FAULT_LOCKED) {
3915 /* Retry if a clean page was removed from the cache. */
3916 if (invalidate_inode_page(vmf->page))
3917 poisonret = 0;
3918 unlock_page(vmf->page);
3919 }
3920 put_page(vmf->page);
3921 vmf->page = NULL;
> 3922 return poisonret;
3923 }
3924
3925 if (unlikely(!(ret & VM_FAULT_LOCKED)))
3926 lock_page(vmf->page);
3927 else
3928 VM_BUG_ON_PAGE(!PageLocked(vmf->page), vmf->page);
3929
3930 return ret;
3931 }
3932
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] mm: clean up hwpoison page cache page in fault path
2022-02-11 22:05 [PATCH] mm: clean up hwpoison page cache page in fault path Rik van Riel
2022-02-12 3:10 ` Miaohe Lin
2022-02-12 18:17 ` kernel test robot
@ 2022-02-13 8:56 ` John Hubbard
2022-02-13 19:21 ` Rik van Riel
2 siblings, 1 reply; 5+ messages in thread
From: John Hubbard @ 2022-02-13 8:56 UTC (permalink / raw)
To: Rik van Riel
Cc: linux-kernel, kernel-team, linux-mm, Andrew Morton, Mel Gorman,
Johannes Weiner, Matthew Wilcox
On Fri, 11 Feb 2022, Rik van Riel wrote:
> Sometimes the page offlining code can leave behind a hwpoisoned clean
> page cache page. This can lead to programs being killed over and over
> and over again as they fault in the hwpoisoned page, get killed, and
> then get re-spawned by whatever wanted to run them.
Hi Rik,
This looks good. Some minor suggestions below:
>
> This is particularly embarrassing when the page was offlined due to
> having too many corrected memory errors. Now we are killing tasks
> due to them trying to access memory that probably isn't even corrupted.
I'd recommend deleting that paragraph entirely. It's a separate
question, and it is not necessarily an accurate assessment of that
question either: the engineers who set the thresholds for "too many
corrected errors" may not--in fact, probably *will not*--agree with your
feeling that the memory is still working and reliable!
>
> This problem can be avoided by invalidating the page from the page
> fault handler, which already has a branch for dealing with these
> kinds of pages. With this patch we simply pretend the page fault
> was successful if the page was invalidated, return to userspace,
> incur another page fault, read in the file from disk (to a new
> memory page), and then everything works again.
Very nice write-up here, it's rare that I get to read such a short, yet
clear explanation. Thank you for that. :)
>
> Signed-off-by: Rik van Riel <riel@surriel.com>
>
> diff --git a/mm/memory.c b/mm/memory.c
> index c125c4969913..2300358e268c 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3871,11 +3871,16 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
> return ret;
>
> if (unlikely(PageHWPoison(vmf->page))) {
> - if (ret & VM_FAULT_LOCKED)
> + int poisonret = VM_FAULT_HWPOISON;
> + if (ret & VM_FAULT_LOCKED) {
How about changing those two lines to these three (note the newline
after the declaration):
vm_fault_t poison_ret = VM_FAULT_HWPOISON;
if (ret & VM_FAULT_LOCKED) {
...which should fix the krobot complaint, and the underscore is just a
very tiny readability improvement while we're there.
> + /* Retry if a clean page was removed from the cache. */
This is more cryptic than necessary, and in fact you've already provided
a write-up, so how about something like this, instead of the above:
/*
* Avoid refaulting on the same poisoned page
* forever: attempt to invalidate the page. If that
* succeeds, then pretend the page fault was successful,
* return to userspace, incur another page fault, read
* in the file from disk (to a new page),and then
* everything works again.
*/
> + if (invalidate_inode_page(vmf->page))
> + poisonret = 0;
> unlock_page(vmf->page);
> + }
> put_page(vmf->page);
> vmf->page = NULL;
> - return VM_FAULT_HWPOISON;
> + return poisonret;
> }
>
> if (unlikely(!(ret & VM_FAULT_LOCKED)))
>
>
> --
> All rights reversed.
Anyway, those are all documentation nits, except for the vm_fault_t,
which is a declaration nit :) , so I'm comfortable saying:
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
--
John Hubbard
NVIDIA
^ permalink raw reply [flat|nested] 5+ messages in thread