From: Li Wang <liwang@redhat.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>, Linux-MM <linux-mm@kvack.org>,
	LTP List <ltp@lists.linux.it>, xishi.qiuxishi@alibaba-inc.com,
	mhocko@kernel.org, Cyril Hrubis <chrubis@suse.cz>
Subject: Re: [MM Bug?] mmap() triggers SIGBUS while doing the numa_move_pages() for offlined hugepage in background
Date: Tue, 30 Jul 2019 14:29:09 +0800	[thread overview]
Message-ID: <CAEemH2d=vEfppCbCgVoGdHed2kuY3GWnZGhymYT1rnxjoWNdcQ@mail.gmail.com> (raw)
In-Reply-To: <47999e20-ccbe-deda-c960-473db5b56ea0@oracle.com>

[-- Attachment #1: Type: text/plain, Size: 3983 bytes --]

Hi Mike,

Thanks for trying this.

On Tue, Jul 30, 2019 at 3:01 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 7/28/19 10:17 PM, Li Wang wrote:
> > Hi Naoya and Linux-MMers,
> >
> > The LTP/move_pages12 V2 triggers SIGBUS in the kernel-v5.2.3 testing.
> > https://github.com/wangli5665/ltp/blob/master/testcases/kernel/syscalls/move_pages/move_pages12.c
> >
> > It seems like the retried mmap() triggers SIGBUS while doing the
> > numa_move_pages() in the background. That is very similar to the kernel
> > bug mentioned by commit 6bc9b56433b76e40d ("mm: fix race on
> > soft-offlining"): a race condition between soft offline and
> > hugetlb_fault which causes an unexpected SIGBUS killing the process.
> >
> > I'm not sure whether the patch below makes sense for memory-failure.c,
> > but after building a new kernel-5.2.3 with this change, the problem
> > can NOT be reproduced.
> >
> > Any comments?
>
> Something seems strange. I can not reproduce with unmodified 5.2.3

It's not 100% reproducible; in ten tries I only hit the failure 4~6 times.
Did you try the test case with patch V3 (in my branch)?
https://github.com/wangli5665/ltp/commit/198fca89870c1b807a01b27bb1d2ec6e2af1c7b6

# git clone https://github.com/wangli5665/ltp ltp.wangli --depth=1
# cd ltp.wangli/; make autotools
# ./configure; make -j24
# cd testcases/kernel/syscalls/move_pages/
# ./move_pages12
tst_test.c:1100: INFO: Timeout per run is 0h 05m 00s
move_pages12.c:249: INFO: Free RAM 64386300 kB
move_pages12.c:267: INFO: Increasing 2048kB hugepages pool on node 0 to 4
move_pages12.c:277: INFO: Increasing 2048kB hugepages pool on node 1 to 4
move_pages12.c:193: INFO: Allocating and freeing 4 hugepages on node 0
move_pages12.c:193: INFO: Allocating and freeing 4 hugepages on node 1
move_pages12.c:183: PASS: Bug not reproduced
tst_test.c:1145: BROK: Test killed by SIGBUS!
move_pages12.c:117: FAIL: move_pages failed: ESRCH

# uname -r
5.2.3

# numactl -H
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 32 33 34 35 36 37 38 39
node 0 size: 16049 MB
node 0 free: 15736 MB
node 1 cpus: 8 9 10 11 12 13 14 15 40 41 42 43 44 45 46 47
node 1 size: 16123 MB
node 1 free: 15850 MB
node 2 cpus: 16 17 18 19 20 21 22 23 48 49 50 51 52 53 54 55
node 2 size: 16123 MB
node 2 free: 15989 MB
node 3 cpus: 24 25 26 27 28 29 30 31 56 57 58 59 60 61 62 63
node 3 size: 16097 MB
node 3 free: 15278 MB
node distances:
node   0   1   2   3
  0:  10  20  20  20
  1:  20  10  20  20
  2:  20  20  10  20
  3:  20  20  20  10

> Also, the soft_offline_huge_page() code should not come into play with
> this specific test.

I got the "soft offline: ... hugepage failed to isolate" message from
soft_offline_huge_page() in the dmesg log.

=== debug print info ===

--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1701,7 +1701,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
 	 */
 	put_hwpoison_page(hpage);
 	if (!ret) {
-		pr_info("soft offline: %#lx hugepage failed to isolate\n", pfn);
+		pr_info("liwang -- soft offline: %#lx hugepage failed to isolate\n", pfn);
 		return -EBUSY;
 	}

# dmesg
...
[ 1068.947205] Soft offlining pfn 0x40b200 at process virtual address 0x7f9d8d000000
[ 1068.987054] Soft offlining pfn 0x40ac00 at process virtual address 0x7f9d8d200000
[ 1069.048478] Soft offlining pfn 0x40a800 at process virtual address 0x7f9d8d000000
[ 1069.087413] Soft offlining pfn 0x40ae00 at process virtual address 0x7f9d8d200000
[ 1069.123285] liwang -- soft offline: 0x40ae00 hugepage failed to isolate
[ 1069.160137] Soft offlining pfn 0x80f800 at process virtual address 0x7f9d8d000000
[ 1069.196009] Soft offlining pfn 0x80fe00 at process virtual address 0x7f9d8d200000
[ 1069.243436] Soft offlining pfn 0x40a400 at process virtual address 0x7f9d8d000000
[ 1069.281301] Soft offlining pfn 0x40a600 at process virtual address 0x7f9d8d200000
[ 1069.318171] liwang -- soft offline: 0x40a600 hugepage failed to isolate

-- 
Regards,
Li Wang

[-- Attachment #2: Type: text/html, Size: 5786 bytes --]