All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Luck, Tony" <tony.luck@intel.com>
To: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>,
	Aili Yao <yaoaili@kingsoft.com>,
	"david@redhat.com" <david@redhat.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"bp@alien8.de" <bp@alien8.de>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"inux-edac@vger.kernel.org" <inux-edac@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"yangfeng1@kingsoft.com" <yangfeng1@kingsoft.com>
Subject: Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned
Date: Thu, 25 Feb 2021 10:15:42 -0800	[thread overview]
Message-ID: <20210225181542.GA178925@agluck-desk2.amr.corp.intel.com> (raw)
In-Reply-To: <20210225123806.GA15006@hori.linux.bs1.fc.nec.co.jp>

On Thu, Feb 25, 2021 at 12:38:06PM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> Thank you for shedding light on this, this race looks worrisome to me.
> We call try_to_unmap() inside memory_failure(), where we find affected
> ptes by page_vma_mapped_walk() and convert into hwpoison entires in
> try_to_unmap_one().  So there seems two racy cases:
> 
>   1)
>      CPU 0                          CPU 1
>      page_vma_mapped_walk
>                                     clear _PAGE_PRESENT bit
>        // skipped the entry
> 
>   2)
>      CPU 0                          CPU 1
>      page_vma_mapped_walk
>        try_to_unmap_one
>                                     clear _PAGE_PRESENT bit
>          convert the entry
>          set_pte_at
> 
> In case 1, the affected processes get signals on later access,
> so although the info in SIGBUS could be different, that's OK.
> And we have no impact in case 2.

I've been debugging a similar issue for a few days and finally
got enough traces to partially understand what happened.

The test case is a multi-threaded pointer chasing micro-benchmark
running on all logical CPUs. We then inject poison into the address
space of the process.

All works fine if one thread consumes poison and completes all
Linux machine check processing before any other threads read the
poison. The page is unmapped, a SIGBUS is sent (which kills all
threads).

But in the problem case I see:

CPU1 reads poison, takes a machine check. Gets to the
kill_me_maybe() task work, which calls memory_failure()
this CPU sets the page poison flag, but is still executing the
rest of the flow to hunt down tasks/mappings to invalidate pages
and send SIGBUS if required.

CPU2 reads the poison. When it gets to memory_failure()
there's an early return because the poison flag is already
set. So in current code it returns and takes the machine
check again.

CPU3 reads the poison and starts along same path that CPU2
did.

Meanwhile CPU1 gets far enough along in memory failure and hits
a problem. It prints:

[ 1867.409837] Memory failure: 0x42a9ff6: reserved kernel page still referenced by 1 users
[ 1867.409850] Memory failure: 0x42a9ff6: recovery action for reserved kernel page: Failed

and doesn't complete unmapping the page that CPU2 and CPU3 are touching.

Other CPUs gradually reach the poison and join in the fun of repeatedly
taking machine checks.

I have not yet tracked why this user access is reporting as a "reserved kernel page".
Some traces showed that futex(2) syscall was in use from this benchmark,
so maybe the kernel locked a user page that was a contended futex???


Idea for what we should do next ... Now that x86 is calling memory_failure()
from user context ... maybe parallel calls for the same page should
be blocked until the first caller completes so we can:
a) know that pages are unmapped (if that happens)
b) all get the same success/fail status

-Tony

  reply	other threads:[~2021-02-25 18:16 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-24  7:16 [PATCH] mm,hwpoison: return -EBUSY when page already poisoned Aili Yao
2021-02-24 10:10 ` David Hildenbrand
2021-02-24 10:31 ` Oscar Salvador
2021-02-25  3:43   ` Aili Yao
2021-02-25 11:28     ` HORIGUCHI NAOYA(堀口 直也)
2021-02-25 11:39       ` Oscar Salvador
2021-02-25 12:38         ` HORIGUCHI NAOYA(堀口 直也)
2021-02-25 18:15           ` Luck, Tony [this message]
2021-02-26  2:19             ` HORIGUCHI NAOYA(堀口 直也)
2021-02-26  2:59               ` Aili Yao
2021-03-03  3:39                 ` Luck, Tony
2021-03-03  3:57                   ` Aili Yao
2021-03-03  8:39                     ` Aili Yao
2021-03-03 15:41                       ` Luck, Tony
2021-03-04  2:16                         ` Aili Yao
2021-03-04  4:19                           ` Aili Yao
2021-03-04  6:45                             ` Aili Yao
2021-03-04 23:57                               ` Luck, Tony
2021-03-05  1:30                                 ` Aili Yao
2021-03-05  1:36                                   ` Aili Yao
2021-03-05 22:11                                     ` Luck, Tony
2021-03-08  6:45                                       ` HORIGUCHI NAOYA(堀口 直也)
2021-03-08 18:54                                         ` Luck, Tony
2021-03-08 22:38                                           ` HORIGUCHI NAOYA(堀口 直也)
2021-03-08 22:55                                             ` [PATCH] mm/memory-failure: Use a mutex to avoid memory_failure() races Luck, Tony
2021-03-08 23:42                                               ` HORIGUCHI NAOYA(堀口 直也)
2021-03-09  2:04                                               ` Aili Yao
2021-03-09  6:04                                                 ` HORIGUCHI NAOYA(堀口 直也)
2021-03-09  6:35                                                   ` [PATCH v2] mm,hwpoison: return -EBUSY when page already poisoned Aili Yao
2021-03-09  8:28                                                     ` HORIGUCHI NAOYA(堀口 直也)
2021-03-09 20:01                                                       ` Luck, Tony
2021-03-10  8:05                                                         ` HORIGUCHI NAOYA(堀口 直也)
2021-03-13  1:55                                                         ` Jue Wang
2021-03-13  1:55                                                           ` Jue Wang
2021-03-10  8:01                                                       ` Aili Yao
2021-03-31 11:25                                                     ` [PATCH v3] mm,hwpoison: return -EHWPOISON " Aili Yao
2021-04-01 15:33                                                       ` Luck, Tony
2021-04-02  1:18                                                         ` Aili Yao
2021-04-02 15:11                                                           ` Luck, Tony
2021-04-05 13:50                                                             ` HORIGUCHI NAOYA(堀口 直也)
2021-04-06  1:04                                                               ` Aili Yao
2021-03-09  6:38                                                   ` [PATCH] mm/memory-failure: Use a mutex to avoid memory_failure() races Aili Yao
2021-03-05 15:55                                   ` [PATCH] mm,hwpoison: return -EBUSY when page already poisoned Luck, Tony
2021-03-10  6:10                                     ` Aili Yao
2021-03-11  8:55                                       ` HORIGUCHI NAOYA(堀口 直也)
2021-03-11 11:23                                         ` Aili Yao
2021-03-11 17:05                                         ` Luck, Tony
2021-03-12  5:55                                           ` Aili Yao
2021-03-12 16:29                                             ` Luck, Tony
2021-03-12 23:48                                               ` Luck, Tony
2021-03-16  6:42                                                 ` HORIGUCHI NAOYA(堀口 直也)
2021-03-16  7:54                                                   ` Aili Yao
2021-03-17  0:29                                                 ` Luck, Tony
2021-03-17  9:07                                                   ` Aili Yao
2021-03-17  7:48                                         ` Aili Yao
2021-03-17  8:23                                           ` Aili Yao
2021-02-26  3:26               ` Tony Luck
2021-02-26  3:26                 ` Tony Luck
2021-02-26  2:52         ` Aili Yao
2021-02-26 17:58           ` Luck, Tony
2021-03-02  4:32             ` Aili Yao
2021-03-31 10:56         ` Aili Yao
2021-03-31 10:58           ` David Hildenbrand
2021-03-01 23:21 Jue Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210225181542.GA178925@agluck-desk2.amr.corp.intel.com \
    --to=tony.luck@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=david@redhat.com \
    --cc=hpa@zytor.com \
    --cc=inux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@redhat.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=osalvador@suse.de \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    --cc=yangfeng1@kingsoft.com \
    --cc=yaoaili@kingsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.