From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: Li Wang <liwang@redhat.com>
Cc: Linux-MM <linux-mm@kvack.org>, LTP List <ltp@lists.linux.it>,
	"mike.kravetz@oracle.com" <mike.kravetz@oracle.com>,
	"xishi.qiuxishi@alibaba-inc.com" <xishi.qiuxishi@alibaba-inc.com>,
	"mhocko@kernel.org" <mhocko@kernel.org>,
	Cyril Hrubis <chrubis@suse.cz>
Subject: Re: [MM Bug?] mmap() triggers SIGBUS while doing the numa_move_pages() for offlined hugepage in background
Date: Fri, 2 Aug 2019 03:48:26 +0000
Message-ID: <20190802034825.GA20130@hori.linux.bs1.fc.nec.co.jp>
In-Reply-To: <CAEemH2dMW6oh6Bbm=yqUADF+mDhuQgFTTGYftB+xAhqqdYV3Ng@mail.gmail.com>

On Mon, Jul 29, 2019 at 01:17:27PM +0800, Li Wang wrote:
> Hi Naoya and Linux-MMers,
> 
> The LTP/move_pages12 V2 test triggers SIGBUS when testing kernel v5.2.3.
> https://github.com/wangli5665/ltp/blob/master/testcases/kernel/syscalls/move_pages/move_pages12.c
> 
> It seems that the retried mmap() triggers SIGBUS while numa_move_pages() is
> running in the background. This is very similar to the kernel bug described in
> commit 6bc9b56433b76e40d (mm: fix race on soft-offlining): a race condition
> between soft offline and hugetlb_fault that causes the process to be killed
> unexpectedly with SIGBUS.
> 
> I'm not sure whether the patch below makes sense for memory-failure.c, but after
> building a new kernel-5.2.3 with this change, the problem can NOT be reproduced.
> 
> Any comments?
> 
> ----------------------------------
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1695,15 +1695,16 @@ static int soft_offline_huge_page(struct page *page, int flags)
>         unlock_page(hpage);
> 
>         ret = isolate_huge_page(hpage, &pagelist);
> +       if (!ret) {
> +               pr_info("soft offline: %#lx hugepage failed to isolate\n", pfn);
> +               return -EBUSY;
> +       }
> +
>         /*
>          * get_any_page() and isolate_huge_page() takes a refcount each,
>          * so need to drop one here.
>          */
>         put_hwpoison_page(hpage);
> -       if (!ret) {
> -               pr_info("soft offline: %#lx hugepage failed to isolate\n", pfn);
> -               return -EBUSY;
> -       }

Sorry for my late response.

This change skips put_hwpoison_page() in the failure path, so soft_offline_page()
would return without releasing the hpage refcount taken by get_any_page(),
which is probably not what we want.
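
To illustrate the concern, here is a rough userspace sketch of the refcount
accounting. It is not kernel code: struct toy_page and every toy_* and
offline_* name are hypothetical stand-ins, and the only behavior modeled is
that get_any_page() takes one reference and isolate_huge_page() takes another
only when it succeeds, as the comment in the patch context says.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the reference count is modeled. */
struct toy_page {
	int refcount;
};

/* Models get_any_page(): takes one reference. */
static void toy_get_any_page(struct toy_page *p)
{
	p->refcount++;
}

/* Models isolate_huge_page(): takes a reference only on success. */
static bool toy_isolate_huge_page(struct toy_page *p, bool can_isolate)
{
	if (!can_isolate)
		return false;
	p->refcount++;
	return true;
}

/* Models put_hwpoison_page(): drops one reference. */
static void toy_put_hwpoison_page(struct toy_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* Current ordering: drop the get_any_page() reference, then check ret. */
static int offline_original(struct toy_page *p, bool can_isolate)
{
	bool ret;

	toy_get_any_page(p);
	ret = toy_isolate_huge_page(p, can_isolate);
	toy_put_hwpoison_page(p);
	if (!ret)
		return -EBUSY;
	return 0;
}

/* Proposed ordering: check ret first, so the put is skipped on failure. */
static int offline_patched(struct toy_page *p, bool can_isolate)
{
	bool ret;

	toy_get_any_page(p);
	ret = toy_isolate_huge_page(p, can_isolate);
	if (!ret)
		return -EBUSY;	/* get_any_page() reference is never dropped */
	toy_put_hwpoison_page(p);
	return 0;
}
```

In this model, starting from a refcount of 1 and forcing isolation to fail,
offline_original() leaves the count at 1 while offline_patched() leaves it
at 2. On the success path both orderings end with the same count.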

- Naoya
