[RFC,v1,04/11] mm: madvise: call soft_offline_page() without MF_COUNT_INCREASED
diff mbox series

Message ID 1541746035-13408-5-git-send-email-n-horiguchi@ah.jp.nec.com
State New
Headers show
Series
  • hwpoison improvement part 1
Related show

Commit Message

Naoya Horiguchi Nov. 9, 2018, 6:47 a.m. UTC
Currently madvise_inject_error() pins the target page when calling
memory error handler, but it's not good because the refcount is just
an artifact of error injector and mock nothing about hw error itself.
IOW, pinning the error page is part of error handler's task, so
let's stop doing it.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/madvise.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

Comments

Anshuman Khandual Nov. 9, 2018, 10:46 a.m. UTC | #1
On 11/09/2018 12:17 PM, Naoya Horiguchi wrote:
> Currently madvise_inject_error() pins the target page when calling
> memory error handler, but it's not good because the refcount is just
> an artifact of error injector and mock nothing about hw error itself.
> IOW, pinning the error page is part of error handler's task, so
> let's stop doing it.

Did not get that. Could you please kindly explain how an incremented
ref count through get_user_pages_fast() was a mocking the HW error
previously ? Though I might be missing the some context here.

> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> ---
>  mm/madvise.c | 25 +++++++++++--------------
>  1 file changed, 11 insertions(+), 14 deletions(-)
> 
> diff --git v4.19-mmotm-2018-10-30-16-08/mm/madvise.c v4.19-mmotm-2018-10-30-16-08_patched/mm/madvise.c
> index 6cb1ca9..9fa0225 100644
> --- v4.19-mmotm-2018-10-30-16-08/mm/madvise.c
> +++ v4.19-mmotm-2018-10-30-16-08_patched/mm/madvise.c
> @@ -637,6 +637,16 @@ static int madvise_inject_error(int behavior,
>  		ret = get_user_pages_fast(start, 1, 0, &page);
>  		if (ret != 1)
>  			return ret;
> +		/*
> +		 * The get_user_pages_fast() is just to get the pfn of the
> +		 * given address, and the refcount has nothing to do with
> +		 * what we try to test, so it should be released immediately.
> +		 * This is racy but it's intended because the real hardware
> +		 * errors could happen at any moment and memory error handlers
> +		 * must properly handle the race.
> +		 */
> +		put_page(page);
> +
>  		pfn = page_to_pfn(page);
>  
>  		/*
> @@ -646,16 +656,11 @@ static int madvise_inject_error(int behavior,
>  		 */
>  		order = compound_order(compound_head(page));
>  
> -		if (PageHWPoison(page)) {
> -			put_page(page);
> -			continue;
> -		}
> -
>  		if (behavior == MADV_SOFT_OFFLINE) {
>  			pr_info("Soft offlining pfn %#lx at process virtual address %#lx\n",
>  					pfn, start);
>  
> -			ret = soft_offline_page(page, MF_COUNT_INCREASED);
> +			ret = soft_offline_page(page, 0);

Probably something defined as a new "ignored" in the memory faults flag
enumeration instead of passing '0' directly.
Naoya Horiguchi Nov. 13, 2018, 12:18 a.m. UTC | #2
On Fri, Nov 09, 2018 at 04:16:55PM +0530, Anshuman Khandual wrote:
> 
> 
> On 11/09/2018 12:17 PM, Naoya Horiguchi wrote:
> > Currently madvise_inject_error() pins the target page when calling
> > memory error handler, but it's not good because the refcount is just
> > an artifact of error injector and mock nothing about hw error itself.
> > IOW, pinning the error page is part of error handler's task, so
> > let's stop doing it.
> 
> Did not get that. Could you please kindly explain how an incremented
> ref count through get_user_pages_fast() was a mocking the HW error
> previously ? Though I might be missing the some context here.

I meant in "mock nothing about hw error itself" that in the code path
for actual HW error (from MCE handler code) the error page is not pinned
outside (but inside) memory_failure().
So it makes more sense to me to do similarly also in error injection code,
and another good thing is that that makes code more simple (A later patch
eliminates MF_COUNT_INCREASED.)

> 
> > 
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > ---
> >  mm/madvise.c | 25 +++++++++++--------------
> >  1 file changed, 11 insertions(+), 14 deletions(-)
> > 
> > diff --git v4.19-mmotm-2018-10-30-16-08/mm/madvise.c v4.19-mmotm-2018-10-30-16-08_patched/mm/madvise.c
> > index 6cb1ca9..9fa0225 100644
> > --- v4.19-mmotm-2018-10-30-16-08/mm/madvise.c
> > +++ v4.19-mmotm-2018-10-30-16-08_patched/mm/madvise.c
> > @@ -637,6 +637,16 @@ static int madvise_inject_error(int behavior,
> >  		ret = get_user_pages_fast(start, 1, 0, &page);
> >  		if (ret != 1)
> >  			return ret;
> > +		/*
> > +		 * The get_user_pages_fast() is just to get the pfn of the
> > +		 * given address, and the refcount has nothing to do with
> > +		 * what we try to test, so it should be released immediately.
> > +		 * This is racy but it's intended because the real hardware
> > +		 * errors could happen at any moment and memory error handlers
> > +		 * must properly handle the race.
> > +		 */
> > +		put_page(page);
> > +
> >  		pfn = page_to_pfn(page);
> >  
> >  		/*
> > @@ -646,16 +656,11 @@ static int madvise_inject_error(int behavior,
> >  		 */
> >  		order = compound_order(compound_head(page));
> >  
> > -		if (PageHWPoison(page)) {
> > -			put_page(page);
> > -			continue;
> > -		}
> > -
> >  		if (behavior == MADV_SOFT_OFFLINE) {
> >  			pr_info("Soft offlining pfn %#lx at process virtual address %#lx\n",
> >  					pfn, start);
> >  
> > -			ret = soft_offline_page(page, MF_COUNT_INCREASED);
> > +			ret = soft_offline_page(page, 0);
> 
> Probably something defined as a new "ignored" in the memory faults flag
> enumeration instead of passing '0' directly.

MF_* flags are defined as bitmap, not separate values. And according to
other caller like do_memory_failure(), multiple bits in flags can be set together.

    static int do_memory_failure(struct mce *m)
    {
            int flags = MF_ACTION_REQUIRED;
            ....
            if (!(m->mcgstatus & MCG_STATUS_RIPV))
                    flags |= MF_MUST_KILL;
            ret = memory_failure(m->addr >> PAGE_SHIFT, flags);

So I think that simply adding new MF_* value doesn't work, and "flags == 0"
seems to me to show "no flag set" in the clearest way.
Or if you have any code suggestion, that's great.

Thanks,
Naoya Horiguchi

Patch
diff mbox series

diff --git v4.19-mmotm-2018-10-30-16-08/mm/madvise.c v4.19-mmotm-2018-10-30-16-08_patched/mm/madvise.c
index 6cb1ca9..9fa0225 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/madvise.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/madvise.c
@@ -637,6 +637,16 @@  static int madvise_inject_error(int behavior,
 		ret = get_user_pages_fast(start, 1, 0, &page);
 		if (ret != 1)
 			return ret;
+		/*
+		 * The get_user_pages_fast() is just to get the pfn of the
+		 * given address, and the refcount has nothing to do with
+		 * what we try to test, so it should be released immediately.
+		 * This is racy but it's intended because the real hardware
+		 * errors could happen at any moment and memory error handlers
+		 * must properly handle the race.
+		 */
+		put_page(page);
+
 		pfn = page_to_pfn(page);
 
 		/*
@@ -646,16 +656,11 @@  static int madvise_inject_error(int behavior,
 		 */
 		order = compound_order(compound_head(page));
 
-		if (PageHWPoison(page)) {
-			put_page(page);
-			continue;
-		}
-
 		if (behavior == MADV_SOFT_OFFLINE) {
 			pr_info("Soft offlining pfn %#lx at process virtual address %#lx\n",
 					pfn, start);
 
-			ret = soft_offline_page(page, MF_COUNT_INCREASED);
+			ret = soft_offline_page(page, 0);
 			if (ret)
 				return ret;
 			continue;
@@ -663,14 +668,6 @@  static int madvise_inject_error(int behavior,
 
 		pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
 				pfn, start);
-
-		/*
-		 * Drop the page reference taken by get_user_pages_fast(). In
-		 * the absence of MF_COUNT_INCREASED the memory_failure()
-		 * routine is responsible for pinning the page to prevent it
-		 * from being released back to the page allocator.
-		 */
-		put_page(page);
 		ret = memory_failure(pfn, 0);
 		if (ret)
 			return ret;