* [PATCH 0/2] mm/memory-failure: rework fix on huge_zero_page splitting
@ 2022-04-27  6:10 Xu Yu
  2022-04-27  6:10 ` [PATCH 1/2] Revert "mm/memory-failure.c: skip huge_zero_page in memory_failure()" Xu Yu
  2022-04-27  6:10 ` [PATCH 2/2] mm/huge_memory: do not overkill when splitting huge_zero_page Xu Yu
  0 siblings, 2 replies; 18+ messages in thread
From: Xu Yu @ 2022-04-27  6:10 UTC
  To: linux-mm; +Cc: akpm, naoya.horiguchi, shy828301

This is effectively v3 of the fix for the bug addressed by v2 [1], which
was prematurely merged into mainline.

Therefore, this patchset first reverts PATCH v2, and then provides the
v3 fix as the subsequent patch.

[1] https://lkml.kernel.org/r/497d3835612610e370c74e697ea3c721d1d55b9c.1649775850.git.xuyu@linux.alibaba.com

---
v2->v3: replace the BUG with WARN + returning -EBUSY when splitting
huge_zero_page, and leave memory_failure unchanged.

Xu Yu (2):
  Revert "mm/memory-failure.c: skip huge_zero_page in memory_failure()"
  mm/huge_memory: do not overkill when splitting huge_zero_page

 mm/huge_memory.c    |  4 +++-
 mm/memory-failure.c | 13 -------------
 2 files changed, 3 insertions(+), 14 deletions(-)

-- 
2.20.1.2432.ga663e714



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/2] Revert "mm/memory-failure.c: skip huge_zero_page in memory_failure()"
  2022-04-27  6:10 [PATCH 0/2] mm/memory-failure: rework fix on huge_zero_page splitting Xu Yu
@ 2022-04-27  6:10 ` Xu Yu
  2022-04-27 21:13   ` Yang Shi
  2022-04-28  2:23   ` Miaohe Lin
  2022-04-27  6:10 ` [PATCH 2/2] mm/huge_memory: do not overkill when splitting huge_zero_page Xu Yu
  1 sibling, 2 replies; 18+ messages in thread
From: Xu Yu @ 2022-04-27  6:10 UTC
  To: linux-mm; +Cc: akpm, naoya.horiguchi, shy828301

This reverts commit d173d5417fb67411e623d394aab986d847e47dad.

The commit d173d5417fb6 ("mm/memory-failure.c: skip huge_zero_page in
memory_failure()") explicitly skips huge_zero_page in memory_failure(),
in order to avoid triggering VM_BUG_ON_PAGE on huge_zero_page in
split_huge_page_to_list().

This works, but Yang Shi pointed out that:

    Raising BUG is overkilling for splitting huge_zero_page. The
    huge_zero_page can't be met from normal paths other than memory
    failure, but memory failure is a valid caller. So I tend to replace
    the BUG to WARN + returning -EBUSY. If we don't care about the
    reason code in memory failure, we don't have to touch memory
    failure.

As for huge_zero_page ending up with PG_has_hwpoisoned set, Yang Shi
commented:

    The anonymous page fault doesn't check if the page is poisoned or
    not since it typically gets a fresh allocated page and assumes the
    poisoned page (isolated successfully) can't be reallocated again.
    But huge zero page and base zero page are reused every time. So no
    matter what fix we pick, the issue is always there.
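
A hypothetical userspace sketch of Yang Shi's point (toy_page,
toy_alloc_fresh() and toy_read_fault() are invented for illustration and
are not kernel APIs): the fresh-allocation path never hands out a page
that was poisoned and isolated, while the shared zero page is returned on
every read fault with no poison check, so flagging it does not stop its
reuse.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the poison bit matters. */
struct toy_page {
	bool hwpoison;
};

/* One shared zero page, reused by every read fault (like huge_zero_page). */
static struct toy_page toy_zero_page;

/* A tiny pool of "fresh" pages used by write faults. */
#define POOL_SIZE 4
static struct toy_page pool[POOL_SIZE];
static bool pool_used[POOL_SIZE];

/*
 * Fresh allocation: a poisoned page is assumed to have been isolated,
 * so it is never handed out again.
 */
static struct toy_page *toy_alloc_fresh(void)
{
	for (size_t i = 0; i < POOL_SIZE; i++) {
		if (!pool_used[i] && !pool[i].hwpoison) {
			pool_used[i] = true;
			return &pool[i];
		}
	}
	return NULL;
}

/*
 * Read fault on anonymous memory: the shared zero page is mapped
 * unconditionally; nothing on this path looks at the poison bit.
 */
static struct toy_page *toy_read_fault(void)
{
	return &toy_zero_page;
}
```

In this model, poisoning pool[0] makes the next fresh allocation return
pool[1], whereas after poisoning toy_zero_page every read fault still
returns it.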

Finally, Yang, David, Anshuman and Naoya all agreed to fix the bug, i.e.,
the attempt to split huge_zero_page, in split_huge_page_to_list().

This patch reverts commit d173d5417fb6 ("mm/memory-failure.c: skip
huge_zero_page in memory_failure()"); the original bug is fixed by the
next patch.

Suggested-by: Yang Shi <shy828301@gmail.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
---
 mm/memory-failure.c | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 27760c19bad7..2020944398c9 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1860,19 +1860,6 @@ int memory_failure(unsigned long pfn, int flags)
 	}
 
 	if (PageTransHuge(hpage)) {
-		/*
-		 * Bail out before SetPageHasHWPoisoned() if hpage is
-		 * huge_zero_page, although PG_has_hwpoisoned is not
-		 * checked in set_huge_zero_page().
-		 *
-		 * TODO: Handle memory failure of huge_zero_page thoroughly.
-		 */
-		if (is_huge_zero_page(hpage)) {
-			action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
-			res = -EBUSY;
-			goto unlock_mutex;
-		}
-
 		/*
 		 * The flag must be set after the refcount is bumped
 		 * otherwise it may race with THP split.
-- 
2.20.1.2432.ga663e714
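
A simplified model of the THP branch of memory_failure() before and after
this revert. All names here are hypothetical; the sketch only captures
the ordering of the huge_zero_page check, SetPageHasHWPoisoned() and the
split attempt, assuming the split itself returns -EBUSY for the huge zero
page as proposed in PATCH 2/2.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical page model: only the bits this flow touches. */
struct toy_thp {
	bool is_huge_zero;
	bool has_hwpoisoned;
};

/* Models the split after PATCH 2/2: the huge zero page is refused. */
static int toy_split(struct toy_thp *p)
{
	if (p->is_huge_zero)
		return -EBUSY;	/* WARN + -EBUSY instead of BUG */
	return 0;
}

/*
 * Models the THP branch of memory_failure():
 *   skip_hzp == true:  behaviour with d173d5417fb6 (bail out early)
 *   skip_hzp == false: behaviour after this revert
 */
static int toy_memory_failure_thp(struct toy_thp *p, bool skip_hzp)
{
	if (skip_hzp && p->is_huge_zero)
		return -EBUSY;		/* MF_MSG_UNSPLIT_THP, MF_IGNORED */
	p->has_hwpoisoned = true;	/* SetPageHasHWPoisoned(hpage) */
	return toy_split(p);		/* failure => MF_MSG_UNSPLIT_THP */
}
```

Either way the caller sees -EBUSY for the huge zero page; the visible
difference is that after the revert PG_has_hwpoisoned is set on it, which
is the issue Yang Shi describes as present regardless of the fix chosen.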




* [PATCH 2/2] mm/huge_memory: do not overkill when splitting huge_zero_page
  2022-04-27  6:10 [PATCH 0/2] mm/memory-failure: rework fix on huge_zero_page splitting Xu Yu
  2022-04-27  6:10 ` [PATCH 1/2] Revert "mm/memory-failure.c: skip huge_zero_page in memory_failure()" Xu Yu
@ 2022-04-27  6:10 ` Xu Yu
  2022-04-27  7:12   ` HORIGUCHI NAOYA(堀口 直也)
                     ` (4 more replies)
  1 sibling, 5 replies; 18+ messages in thread
From: Xu Yu @ 2022-04-27  6:10 UTC
  To: linux-mm; +Cc: akpm, naoya.horiguchi, shy828301

The kernel panics when a memory failure is injected into the global
huge_zero_page with CONFIG_DEBUG_VM enabled:

  Injecting memory failure for pfn 0x109ff9 at process virtual address 0x20ff9000
  page:00000000fb053fc3 refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x109e00
  head:00000000fb053fc3 order:9 compound_mapcount:0 compound_pincount:0
  flags: 0x17fffc000010001(locked|head|node=0|zone=2|lastcpupid=0x1ffff)
  raw: 017fffc000010001 0000000000000000 dead000000000122 0000000000000000
  raw: 0000000000000000 0000000000000000 00000002ffffffff 0000000000000000
  page dumped because: VM_BUG_ON_PAGE(is_huge_zero_page(head))
  ------------[ cut here ]------------
  kernel BUG at mm/huge_memory.c:2499!
  invalid opcode: 0000 [#1] PREEMPT SMP PTI
  CPU: 6 PID: 553 Comm: split_bug Not tainted 5.18.0-rc1+ #11
  Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 3288b3c 04/01/2014
  RIP: 0010:split_huge_page_to_list+0x66a/0x880
  Code: 84 9b fb ff ff 48 8b 7c 24 08 31 f6 e8 9f 5d 2a 00 b8 b8 02 00 00 e9 e8 fb ff ff 48 c7 c6 e8 47 3c 82 4c b
  RSP: 0018:ffffc90000dcbdf8 EFLAGS: 00010246
  RAX: 000000000000003c RBX: 0000000000000001 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: ffffffff823e4c4f RDI: 00000000ffffffff
  RBP: ffff88843fffdb40 R08: 0000000000000000 R09: 00000000fffeffff
  R10: ffffc90000dcbc48 R11: ffffffff82d68448 R12: ffffea0004278000
  R13: ffffffff823c6203 R14: 0000000000109ff9 R15: ffffea000427fe40
  FS:  00007fc375a26740(0000) GS:ffff88842fd80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fc3757c9290 CR3: 0000000102174006 CR4: 00000000003706e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
  try_to_split_thp_page+0x3a/0x130
  memory_failure+0x128/0x800
  madvise_inject_error.cold+0x8b/0xa1
  __x64_sys_madvise+0x54/0x60
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x7fc3754f8bf9
  Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 8
  RSP: 002b:00007ffeda93a1d8 EFLAGS: 00000217 ORIG_RAX: 000000000000001c
  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3754f8bf9
  RDX: 0000000000000064 RSI: 0000000000003000 RDI: 0000000020ff9000
  RBP: 00007ffeda93a200 R08: 0000000000000000 R09: 0000000000000000
  R10: 00000000ffffffff R11: 0000000000000217 R12: 0000000000400490
  R13: 00007ffeda93a2e0 R14: 0000000000000000 R15: 0000000000000000
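
The values in the oops are self-consistent: the injected pfn 0x109ff9
lies inside the order-9 (512-page) compound page whose head is pfn
0x109e00, the head shown in the page dump. The helpers below are
hypothetical and only check that arithmetic.

```c
#include <assert.h>

/* Head pfn of the naturally aligned compound page of @order containing @pfn. */
static unsigned long head_pfn(unsigned long pfn, unsigned int order)
{
	return pfn & ~((1UL << order) - 1);
}

/* Index of @pfn within that compound page. */
static unsigned long subpage_index(unsigned long pfn, unsigned int order)
{
	return pfn & ((1UL << order) - 1);
}
```

head_pfn(0x109ff9, 9) gives 0x109e00 and subpage_index() gives 0x1f9
(505). In the user-space registers, ORIG_RAX 0x1c (28) is madvise on
x86_64, RDI 0x20ff9000 is the start address, RSI 0x3000 the length, and
RDX 0x64 (100) is MADV_HWPOISON.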

Raising a BUG is overkill when splitting huge_zero_page. The
huge_zero_page cannot be reached from normal paths other than memory
failure, but memory failure is a valid caller. So replace the BUG with a
WARN + returning -EBUSY, so that the panic above no longer happens.

Suggested-by: Yang Shi <shy828301@gmail.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
---
 mm/huge_memory.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c468fee595ff..3bb464509518 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2496,10 +2496,12 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	int extra_pins, ret;
 	pgoff_t end;
 
-	VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
 	VM_BUG_ON_PAGE(!PageLocked(head), head);
 	VM_BUG_ON_PAGE(!PageCompound(head), head);
 
+	if (VM_WARN_ON_ONCE_PAGE(is_huge_zero_page(head), head))
+		return -EBUSY;
+
 	if (PageWriteback(head))
 		return -EBUSY;
 
-- 
2.20.1.2432.ga663e714
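
A userspace sketch of the guard this patch adds, assuming a simplified
stand-in for WARN_ON_ONCE() (toy_warn_on_once(), toy_page and
toy_split_huge_page() are hypothetical): the condition is reported once
and the function returns -EBUSY instead of crashing, leaving the caller
to treat the page as unsplittable.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_page {
	bool is_huge_zero;
};

static int toy_warn_count;

/* Stand-in for WARN_ON_ONCE(): reports the first hit, always returns cond. */
static bool toy_warn_on_once(bool cond, const char *what)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		toy_warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Sketch of the new check at the top of split_huge_page_to_list(). */
static int toy_split_huge_page(struct toy_page *head)
{
	if (toy_warn_on_once(head->is_huge_zero, "is_huge_zero_page(head)"))
		return -EBUSY;	/* refuse instead of BUG(); caller copes */
	return 0;		/* the real split would happen here */
}
```

The "once" semantics keep repeated injections from flooding the log,
while every call still gets the -EBUSY result.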




* Re: [PATCH 2/2] mm/huge_memory: do not overkill when splitting huge_zero_page
  2022-04-27  6:10 ` [PATCH 2/2] mm/huge_memory: do not overkill when splitting huge_zero_page Xu Yu
@ 2022-04-27  7:12   ` HORIGUCHI NAOYA(堀口 直也)
  2022-04-27  7:37     ` Yu Xu
  2022-04-27 19:00     ` Andrew Morton
  2022-04-27  9:01   ` kernel test robot
                     ` (3 subsequent siblings)
  4 siblings, 2 replies; 18+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-04-27  7:12 UTC
  To: Xu Yu; +Cc: linux-mm, akpm, shy828301

On Wed, Apr 27, 2022 at 02:10:17PM +0800, Xu Yu wrote:
> Kernel panic when injecting memory_failure for the global
> huge_zero_page, when CONFIG_DEBUG_VM is enabled, as follows.
> 
>   [ oops trace snipped ]
> 
> We think that raising BUG is overkilling for splitting huge_zero_page,
> the huge_zero_page can't be met from normal paths other than memory
> failure, but memory failure is a valid caller. So we tend to replace the
> BUG to WARN + returning -EBUSY, and thus the panic above won't happen
> again.
> 
> Suggested-by: Yang Shi <shy828301@gmail.com>
> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>

Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

What about -stable?
The older version was backported to 5.15.z and 5.17.z, so if you choose
to send this to stable, 1/2 should also be sent to stable.

Thanks,
Naoya Horiguchi


* Re: [PATCH 2/2] mm/huge_memory: do not overkill when splitting huge_zero_page
  2022-04-27  7:12   ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-04-27  7:37     ` Yu Xu
  2022-04-27 19:00     ` Andrew Morton
  1 sibling, 0 replies; 18+ messages in thread
From: Yu Xu @ 2022-04-27  7:37 UTC
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: linux-mm, akpm, shy828301

On 4/27/22 3:12 PM, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Wed, Apr 27, 2022 at 02:10:17PM +0800, Xu Yu wrote:
>> Kernel panic when injecting memory_failure for the global
>> huge_zero_page, when CONFIG_DEBUG_VM is enabled, as follows.
>>
>>    [ oops trace snipped ]
>>
>> We think that raising BUG is overkilling for splitting huge_zero_page,
>> the huge_zero_page can't be met from normal paths other than memory
>> failure, but memory failure is a valid caller. So we tend to replace the
>> BUG to WARN + returning -EBUSY, and thus the panic above won't happen
>> again.
>>
>> Suggested-by: Yang Shi <shy828301@gmail.com>
>> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
>> Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
> 
> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> What to do on -stable?
> The older version was backported to 5.15.z and 5.17.z, so if you choose
> to send this to stable, 1/2 should be also sent to stable.

IMHO, I would view v3 as an optimization of v2, since the older version
also fixes this bug correctly. Since the Fixes tag was already added to
the older version, let's keep -stable on the older version.

Anyway, the older version is not bad. :)


> 
> Thanks,
> Naoya Horiguchi

-- 
Thanks,
Yu



* Re: [PATCH 2/2] mm/huge_memory: do not overkill when splitting huge_zero_page
  2022-04-27  6:10 ` [PATCH 2/2] mm/huge_memory: do not overkill when splitting huge_zero_page Xu Yu
  2022-04-27  7:12   ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-04-27  9:01   ` kernel test robot
  2022-04-27  9:48       ` Yu Xu
  2022-04-27  9:36   ` kernel test robot
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: kernel test robot @ 2022-04-27  9:01 UTC
  To: Xu Yu, linux-mm; +Cc: kbuild-all, akpm, naoya.horiguchi, shy828301

Hi Xu,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on hnaz-mm/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Xu-Yu/mm-memory-failure-rework-fix-on-huge_zero_page-splitting/20220427-141253
base:   https://github.com/hnaz/linux-mm master
config: s390-randconfig-r044-20220425 (https://download.01.org/0day-ci/archive/20220427/202204271636.UqHlxRwk-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 11.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/988ec6e274e00e5706be7590a4a39427fbe856b1
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Xu-Yu/mm-memory-failure-rework-fix-on-huge_zero_page-splitting/20220427-141253
        git checkout 988ec6e274e00e5706be7590a4a39427fbe856b1
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=s390 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from include/linux/bits.h:22,
                    from include/linux/ratelimit_types.h:5,
                    from include/linux/printk.h:10,
                    from include/asm-generic/bug.h:22,
                    from arch/s390/include/asm/bug.h:68,
                    from include/linux/bug.h:5,
                    from include/linux/mmdebug.h:5,
                    from include/linux/mm.h:6,
                    from mm/huge_memory.c:8:
   mm/huge_memory.c: In function 'split_huge_page_to_list':
>> include/linux/build_bug.h:30:33: error: void value not ignored as it ought to be
      30 | #define BUILD_BUG_ON_INVALID(e) ((void)(sizeof((__force long)(e))))
         |                                 ^
   include/linux/mmdebug.h:81:43: note: in expansion of macro 'BUILD_BUG_ON_INVALID'
      81 | #define VM_WARN_ON_ONCE_PAGE(cond, page)  BUILD_BUG_ON_INVALID(cond)
         |                                           ^~~~~~~~~~~~~~~~~~~~
   mm/huge_memory.c:2553:13: note: in expansion of macro 'VM_WARN_ON_ONCE_PAGE'
    2553 |         if (VM_WARN_ON_ONCE_PAGE(is_huge_zero_page(head), head))
         |             ^~~~~~~~~~~~~~~~~~~~


vim +30 include/linux/build_bug.h

527edbc18a70e74 Masahiro Yamada 2019-01-03  18  
527edbc18a70e74 Masahiro Yamada 2019-01-03  19  /* Force a compilation error if a constant expression is not a power of 2 */
527edbc18a70e74 Masahiro Yamada 2019-01-03  20  #define __BUILD_BUG_ON_NOT_POWER_OF_2(n)	\
527edbc18a70e74 Masahiro Yamada 2019-01-03  21  	BUILD_BUG_ON(((n) & ((n) - 1)) != 0)
527edbc18a70e74 Masahiro Yamada 2019-01-03  22  #define BUILD_BUG_ON_NOT_POWER_OF_2(n)			\
527edbc18a70e74 Masahiro Yamada 2019-01-03  23  	BUILD_BUG_ON((n) == 0 || (((n) & ((n) - 1)) != 0))
bc6245e5efd70c4 Ian Abbott      2017-07-10  24  
bc6245e5efd70c4 Ian Abbott      2017-07-10  25  /*
bc6245e5efd70c4 Ian Abbott      2017-07-10  26   * BUILD_BUG_ON_INVALID() permits the compiler to check the validity of the
bc6245e5efd70c4 Ian Abbott      2017-07-10  27   * expression but avoids the generation of any code, even if that expression
bc6245e5efd70c4 Ian Abbott      2017-07-10  28   * has side-effects.
bc6245e5efd70c4 Ian Abbott      2017-07-10  29   */
bc6245e5efd70c4 Ian Abbott      2017-07-10 @30  #define BUILD_BUG_ON_INVALID(e) ((void)(sizeof((__force long)(e))))
bc6245e5efd70c4 Ian Abbott      2017-07-10  31  
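
As the comment says, BUILD_BUG_ON_INVALID() only type-checks its
argument: the operand of sizeof is not evaluated, and the whole
expression is cast to void. The sketch below reuses the same shape in
userspace (without the sparse-only __force annotation; the TOY_ name and
helpers are hypothetical) to show that a side-effecting argument never
runs.

```c
#include <assert.h>

/* Same shape as the kernel macro, minus the sparse-only __force. */
#define TOY_BUILD_BUG_ON_INVALID(e) ((void)(sizeof((long)(e))))

static int calls;

static int has_side_effect(void)
{
	calls++;
	return 1;
}

static void check_only(void)
{
	/* Type-checked, but the operand of sizeof is never evaluated. */
	TOY_BUILD_BUG_ON_INVALID(has_side_effect());
}
```

Because the result has type void, writing
if (TOY_BUILD_BUG_ON_INVALID(x)) fails to compile, which is exactly the
"void value not ignored" error reported above.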

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp



* Re: [PATCH 2/2] mm/huge_memory: do not overkill when splitting huge_zero_page
  2022-04-27  6:10 ` [PATCH 2/2] mm/huge_memory: do not overkill when splitting huge_zero_page Xu Yu
  2022-04-27  7:12   ` HORIGUCHI NAOYA(堀口 直也)
  2022-04-27  9:01   ` kernel test robot
@ 2022-04-27  9:36   ` kernel test robot
  2022-04-27  9:44   ` [PATCH 2/2 RESEND] " Xu Yu
  2022-04-28  1:59   ` [PATCH 2/2] " kernel test robot
  4 siblings, 0 replies; 18+ messages in thread
From: kernel test robot @ 2022-04-27  9:36 UTC
  To: Xu Yu, linux-mm; +Cc: llvm, kbuild-all, akpm, naoya.horiguchi, shy828301

Hi Xu,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on hnaz-mm/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Xu-Yu/mm-memory-failure-rework-fix-on-huge_zero_page-splitting/20220427-141253
base:   https://github.com/hnaz/linux-mm master
config: i386-randconfig-a003-20220425 (https://download.01.org/0day-ci/archive/20220427/202204271706.mGX6CwrT-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 1cddcfdc3c683b393df1a5c9063252eb60e52818)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/988ec6e274e00e5706be7590a4a39427fbe856b1
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Xu-Yu/mm-memory-failure-rework-fix-on-huge_zero_page-splitting/20220427-141253
        git checkout 988ec6e274e00e5706be7590a4a39427fbe856b1
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> mm/huge_memory.c:2553:2: error: statement requires expression of scalar type ('void' invalid)
           if (VM_WARN_ON_ONCE_PAGE(is_huge_zero_page(head), head))
           ^   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   1 error generated.


vim +2553 mm/huge_memory.c

  2519	
  2520	/*
  2521	 * This function splits huge page into normal pages. @page can point to any
  2522	 * subpage of huge page to split. Split doesn't change the position of @page.
  2523	 *
  2524	 * Only caller must hold pin on the @page, otherwise split fails with -EBUSY.
  2525	 * The huge page must be locked.
  2526	 *
  2527	 * If @list is null, tail pages will be added to LRU list, otherwise, to @list.
  2528	 *
  2529	 * Both head page and tail pages will inherit mapping, flags, and so on from
  2530	 * the hugepage.
  2531	 *
  2532	 * GUP pin and PG_locked transferred to @page. Rest subpages can be freed if
  2533	 * they are not mapped.
  2534	 *
  2535	 * Returns 0 if the hugepage is split successfully.
  2536	 * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
  2537	 * us.
  2538	 */
  2539	int split_huge_page_to_list(struct page *page, struct list_head *list)
  2540	{
  2541		struct folio *folio = page_folio(page);
  2542		struct page *head = &folio->page;
  2543		struct deferred_split *ds_queue = get_deferred_split_queue(head);
  2544		XA_STATE(xas, &head->mapping->i_pages, head->index);
  2545		struct anon_vma *anon_vma = NULL;
  2546		struct address_space *mapping = NULL;
  2547		int extra_pins, ret;
  2548		pgoff_t end;
  2549	
  2550		VM_BUG_ON_PAGE(!PageLocked(head), head);
  2551		VM_BUG_ON_PAGE(!PageCompound(head), head);
  2552	
> 2553		if (VM_WARN_ON_ONCE_PAGE(is_huge_zero_page(head), head))
  2554			return -EBUSY;
  2555	
  2556		if (PageWriteback(head))
  2557			return -EBUSY;
  2558	
  2559		if (PageAnon(head)) {
  2560			/*
  2561			 * The caller does not necessarily hold an mmap_lock that would
  2562			 * prevent the anon_vma disappearing so we first we take a
  2563			 * reference to it and then lock the anon_vma for write. This
  2564			 * is similar to folio_lock_anon_vma_read except the write lock
  2565			 * is taken to serialise against parallel split or collapse
  2566			 * operations.
  2567			 */
  2568			anon_vma = page_get_anon_vma(head);
  2569			if (!anon_vma) {
  2570				ret = -EBUSY;
  2571				goto out;
  2572			}
  2573			end = -1;
  2574			mapping = NULL;
  2575			anon_vma_lock_write(anon_vma);
  2576		} else {
  2577			mapping = head->mapping;
  2578	
  2579			/* Truncated ? */
  2580			if (!mapping) {
  2581				ret = -EBUSY;
  2582				goto out;
  2583			}
  2584	
  2585			xas_split_alloc(&xas, head, compound_order(head),
  2586					mapping_gfp_mask(mapping) & GFP_RECLAIM_MASK);
  2587			if (xas_error(&xas)) {
  2588				ret = xas_error(&xas);
  2589				goto out;
  2590			}
  2591	
  2592			anon_vma = NULL;
  2593			i_mmap_lock_read(mapping);
  2594	
  2595			/*
  2596			 *__split_huge_page() may need to trim off pages beyond EOF:
  2597			 * but on 32-bit, i_size_read() takes an irq-unsafe seqlock,
  2598			 * which cannot be nested inside the page tree lock. So note
  2599			 * end now: i_size itself may be changed at any moment, but
  2600			 * head page lock is good enough to serialize the trimming.
  2601			 */
  2602			end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE);
  2603			if (shmem_mapping(mapping))
  2604				end = shmem_fallocend(mapping->host, end);
  2605		}
  2606	
  2607		/*
  2608		 * Racy check if we can split the page, before unmap_page() will
  2609		 * split PMDs
  2610		 */
  2611		if (!can_split_folio(folio, &extra_pins)) {
  2612			ret = -EBUSY;
  2613			goto out_unlock;
  2614		}
  2615	
  2616		unmap_page(head);
  2617	
  2618		/* block interrupt reentry in xa_lock and spinlock */
  2619		local_irq_disable();
  2620		if (mapping) {
  2621			/*
  2622			 * Check if the head page is present in page cache.
  2623			 * We assume all tail are present too, if head is there.
  2624			 */
  2625			xas_lock(&xas);
  2626			xas_reset(&xas);
  2627			if (xas_load(&xas) != head)
  2628				goto fail;
  2629		}
  2630	
  2631		/* Prevent deferred_split_scan() touching ->_refcount */
  2632		spin_lock(&ds_queue->split_queue_lock);
  2633		if (page_ref_freeze(head, 1 + extra_pins)) {
  2634			if (!list_empty(page_deferred_list(head))) {
  2635				ds_queue->split_queue_len--;
  2636				list_del(page_deferred_list(head));
  2637			}
  2638			spin_unlock(&ds_queue->split_queue_lock);
  2639			if (mapping) {
  2640				int nr = thp_nr_pages(head);
  2641	
  2642				xas_split(&xas, head, thp_order(head));
  2643				if (PageSwapBacked(head)) {
  2644					__mod_lruvec_page_state(head, NR_SHMEM_THPS,
  2645								-nr);
  2646				} else {
  2647					__mod_lruvec_page_state(head, NR_FILE_THPS,
  2648								-nr);
  2649					filemap_nr_thps_dec(mapping);
  2650				}
  2651			}
  2652	
  2653			__split_huge_page(page, list, end);
  2654			ret = 0;
  2655		} else {
  2656			spin_unlock(&ds_queue->split_queue_lock);
  2657	fail:
  2658			if (mapping)
  2659				xas_unlock(&xas);
  2660			local_irq_enable();
  2661			remap_page(folio, folio_nr_pages(folio));
  2662			ret = -EBUSY;
  2663		}
  2664	
  2665	out_unlock:
  2666		if (anon_vma) {
  2667			anon_vma_unlock_write(anon_vma);
  2668			put_anon_vma(anon_vma);
  2669		}
  2670		if (mapping)
  2671			i_mmap_unlock_read(mapping);
  2672	out:
  2673		/* Free any memory we didn't use */
  2674		xas_nomem(&xas, 0);
  2675		count_vm_event(!ret ? THP_SPLIT_PAGE : THP_SPLIT_PAGE_FAILED);
  2676		return ret;
  2677	}
  2678	
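
In the listing, the split only proceeds if
page_ref_freeze(head, 1 + extra_pins) succeeds: the refcount must equal
exactly the caller's pin plus the expected extra references computed by
can_split_folio(), otherwise the split fails with -EBUSY. A minimal C11
sketch of that compare-and-exchange, assuming toy_ref_freeze() and
toy_ref_unfreeze() as hypothetical stand-ins for the kernel helpers:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Modeled on page_ref_freeze(): atomically replace the refcount with 0,
 * but only if it currently equals @expected.
 */
static bool toy_ref_freeze(atomic_int *refcount, int expected)
{
	return atomic_compare_exchange_strong(refcount, &expected, 0);
}

/* Modeled on page_ref_unfreeze(): publish the count again. */
static void toy_ref_unfreeze(atomic_int *refcount, int count)
{
	atomic_store(refcount, count);
}
```

With extra_pins = 2 (an arbitrary value for illustration), a refcount of
3 freezes to 0, while a refcount of 4 (one unexpected pin, e.g. from GUP)
is left untouched and the freeze fails.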

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


* [PATCH 2/2 RESEND] mm/huge_memory: do not overkill when splitting huge_zero_page
  2022-04-27  6:10 ` [PATCH 2/2] mm/huge_memory: do not overkill when splitting huge_zero_page Xu Yu
                     ` (2 preceding siblings ...)
  2022-04-27  9:36   ` kernel test robot
@ 2022-04-27  9:44   ` Xu Yu
  2022-04-27 21:15     ` Yang Shi
                       ` (2 more replies)
  2022-04-28  1:59   ` [PATCH 2/2] " kernel test robot
  4 siblings, 3 replies; 18+ messages in thread
From: Xu Yu @ 2022-04-27  9:44 UTC
  To: linux-mm; +Cc: akpm, naoya.horiguchi, shy828301

The kernel panics when a memory failure is injected into the global
huge_zero_page with CONFIG_DEBUG_VM enabled:

  Injecting memory failure for pfn 0x109ff9 at process virtual address 0x20ff9000
  page:00000000fb053fc3 refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x109e00
  head:00000000fb053fc3 order:9 compound_mapcount:0 compound_pincount:0
  flags: 0x17fffc000010001(locked|head|node=0|zone=2|lastcpupid=0x1ffff)
  raw: 017fffc000010001 0000000000000000 dead000000000122 0000000000000000
  raw: 0000000000000000 0000000000000000 00000002ffffffff 0000000000000000
  page dumped because: VM_BUG_ON_PAGE(is_huge_zero_page(head))
  ------------[ cut here ]------------
  kernel BUG at mm/huge_memory.c:2499!
  invalid opcode: 0000 [#1] PREEMPT SMP PTI
  CPU: 6 PID: 553 Comm: split_bug Not tainted 5.18.0-rc1+ #11
  Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 3288b3c 04/01/2014
  RIP: 0010:split_huge_page_to_list+0x66a/0x880
  Code: 84 9b fb ff ff 48 8b 7c 24 08 31 f6 e8 9f 5d 2a 00 b8 b8 02 00 00 e9 e8 fb ff ff 48 c7 c6 e8 47 3c 82 4c b
  RSP: 0018:ffffc90000dcbdf8 EFLAGS: 00010246
  RAX: 000000000000003c RBX: 0000000000000001 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: ffffffff823e4c4f RDI: 00000000ffffffff
  RBP: ffff88843fffdb40 R08: 0000000000000000 R09: 00000000fffeffff
  R10: ffffc90000dcbc48 R11: ffffffff82d68448 R12: ffffea0004278000
  R13: ffffffff823c6203 R14: 0000000000109ff9 R15: ffffea000427fe40
  FS:  00007fc375a26740(0000) GS:ffff88842fd80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fc3757c9290 CR3: 0000000102174006 CR4: 00000000003706e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
  try_to_split_thp_page+0x3a/0x130
  memory_failure+0x128/0x800
  madvise_inject_error.cold+0x8b/0xa1
  __x64_sys_madvise+0x54/0x60
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x7fc3754f8bf9
  Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 8
  RSP: 002b:00007ffeda93a1d8 EFLAGS: 00000217 ORIG_RAX: 000000000000001c
  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3754f8bf9
  RDX: 0000000000000064 RSI: 0000000000003000 RDI: 0000000020ff9000
  RBP: 00007ffeda93a200 R08: 0000000000000000 R09: 0000000000000000
  R10: 00000000ffffffff R11: 0000000000000217 R12: 0000000000400490
  R13: 00007ffeda93a2e0 R14: 0000000000000000 R15: 0000000000000000

We think that raising a BUG is overkill when splitting huge_zero_page.
The huge_zero_page cannot be reached from normal paths other than memory
failure, but memory failure is a valid caller. So we replace the BUG
with a WARN and return -EBUSY instead, which prevents the panic above.

Suggested-by: Yang Shi <shy828301@gmail.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
---
 mm/huge_memory.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c468fee595ff..910a138e9859 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2495,11 +2495,16 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	struct address_space *mapping = NULL;
 	int extra_pins, ret;
 	pgoff_t end;
+	bool is_hzp;
 
-	VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
 	VM_BUG_ON_PAGE(!PageLocked(head), head);
 	VM_BUG_ON_PAGE(!PageCompound(head), head);
 
+	is_hzp = is_huge_zero_page(head);
+	VM_WARN_ON_ONCE_PAGE(is_hzp, head);
+	if (is_hzp)
+		return -EBUSY;
+
 	if (PageWriteback(head))
 		return -EBUSY;
 
-- 
2.20.1.2432.ga663e714
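
As a rough illustration of the behaviour change in this patch, here is a userspace C sketch. Everything in it is hypothetical: `struct fake_page`, `FAKE_BUG_ON()`, `FAKE_WARN_ON_ONCE()` and `fake_split_huge_page()` only model the kernel's `struct page`, `VM_BUG_ON_PAGE()`, `VM_WARN_ON_ONCE_PAGE()` and the entry checks of `split_huge_page_to_list()`. `assert()` plays the role of the fatal check (and, like `VM_BUG_ON_PAGE()` without `CONFIG_DEBUG_VM`, it disappears when compiled out via `NDEBUG`), while the huge zero page now gets a one-time warning and `-EBUSY` instead of a crash.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct page: only the bits this sketch needs. */
struct fake_page {
	bool locked;
	bool compound;
	bool huge_zero;
	bool writeback;
};

/* Fatal check, modelling VM_BUG_ON_PAGE(): aborts the program. */
#define FAKE_BUG_ON(cond) assert(!(cond))

/* Non-fatal, once-per-call-site check, modelling VM_WARN_ON_ONCE_PAGE(). */
#define FAKE_WARN_ON_ONCE(cond)                                  \
	do {                                                     \
		static bool warned;                              \
		if ((cond) && !warned) {                         \
			warned = true;                           \
			fprintf(stderr, "WARNING: %s\n", #cond); \
		}                                                \
	} while (0)

/* Hypothetical simplification of the patched entry checks. */
static int fake_split_huge_page(struct fake_page *head)
{
	bool is_hzp;

	/* Real invariants stay fatal. */
	FAKE_BUG_ON(!head->locked);
	FAKE_BUG_ON(!head->compound);

	/* The huge zero page is refused, not treated as a kernel bug. */
	is_hzp = head->huge_zero;
	FAKE_WARN_ON_ONCE(is_hzp);
	if (is_hzp)
		return -EBUSY;

	if (head->writeback)
		return -EBUSY;

	return 0; /* pretend the split succeeded */
}
```

Calling `fake_split_huge_page()` twice on a page marked `huge_zero` returns `-EBUSY` both times but prints the warning only once; an ordinary locked compound page passes all checks.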



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] mm/huge_memory: do not overkill when splitting huge_zero_page
  2022-04-27  9:01   ` kernel test robot
@ 2022-04-27  9:48       ` Yu Xu
  0 siblings, 0 replies; 18+ messages in thread
From: Yu Xu @ 2022-04-27  9:48 UTC (permalink / raw)
  To: kernel test robot, linux-mm; +Cc: kbuild-all, akpm, naoya.horiguchi, shy828301

Thanks!

Sorry, I only tested with CONFIG_DEBUG_VM enabled; this build error is
triggered only when CONFIG_DEBUG_VM is disabled.

PATCH 2/2 has been resent.

On 4/27/22 5:01 PM, kernel test robot wrote:
> Hi Xu,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on hnaz-mm/master]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Xu-Yu/mm-memory-failure-rework-fix-on-huge_zero_page-splitting/20220427-141253
> base:   https://github.com/hnaz/linux-mm master
> config: s390-randconfig-r044-20220425 (https://download.01.org/0day-ci/archive/20220427/202204271636.UqHlxRwk-lkp@intel.com/config)
> compiler: s390-linux-gcc (GCC) 11.3.0
> reproduce (this is a W=1 build):
>          wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>          chmod +x ~/bin/make.cross
>          # https://github.com/intel-lab-lkp/linux/commit/988ec6e274e00e5706be7590a4a39427fbe856b1
>          git remote add linux-review https://github.com/intel-lab-lkp/linux
>          git fetch --no-tags linux-review Xu-Yu/mm-memory-failure-rework-fix-on-huge_zero_page-splitting/20220427-141253
>          git checkout 988ec6e274e00e5706be7590a4a39427fbe856b1
>          # save the config file
>          mkdir build_dir && cp config build_dir/.config
>          COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=s390 SHELL=/bin/bash
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> 
> All errors (new ones prefixed by >>):
> 
>     In file included from include/linux/bits.h:22,
>                      from include/linux/ratelimit_types.h:5,
>                      from include/linux/printk.h:10,
>                      from include/asm-generic/bug.h:22,
>                      from arch/s390/include/asm/bug.h:68,
>                      from include/linux/bug.h:5,
>                      from include/linux/mmdebug.h:5,
>                      from include/linux/mm.h:6,
>                      from mm/huge_memory.c:8:
>     mm/huge_memory.c: In function 'split_huge_page_to_list':
>>> include/linux/build_bug.h:30:33: error: void value not ignored as it ought to be
>        30 | #define BUILD_BUG_ON_INVALID(e) ((void)(sizeof((__force long)(e))))
>           |                                 ^
>     include/linux/mmdebug.h:81:43: note: in expansion of macro 'BUILD_BUG_ON_INVALID'
>        81 | #define VM_WARN_ON_ONCE_PAGE(cond, page)  BUILD_BUG_ON_INVALID(cond)
>           |                                           ^~~~~~~~~~~~~~~~~~~~
>     mm/huge_memory.c:2553:13: note: in expansion of macro 'VM_WARN_ON_ONCE_PAGE'
>      2553 |         if (VM_WARN_ON_ONCE_PAGE(is_huge_zero_page(head), head))
>           |             ^~~~~~~~~~~~~~~~~~~~
> 
> 
> vim +30 include/linux/build_bug.h
> 
> 527edbc18a70e74 Masahiro Yamada 2019-01-03  18
> 527edbc18a70e74 Masahiro Yamada 2019-01-03  19  /* Force a compilation error if a constant expression is not a power of 2 */
> 527edbc18a70e74 Masahiro Yamada 2019-01-03  20  #define __BUILD_BUG_ON_NOT_POWER_OF_2(n)	\
> 527edbc18a70e74 Masahiro Yamada 2019-01-03  21  	BUILD_BUG_ON(((n) & ((n) - 1)) != 0)
> 527edbc18a70e74 Masahiro Yamada 2019-01-03  22  #define BUILD_BUG_ON_NOT_POWER_OF_2(n)			\
> 527edbc18a70e74 Masahiro Yamada 2019-01-03  23  	BUILD_BUG_ON((n) == 0 || (((n) & ((n) - 1)) != 0))
> bc6245e5efd70c4 Ian Abbott      2017-07-10  24
> bc6245e5efd70c4 Ian Abbott      2017-07-10  25  /*
> bc6245e5efd70c4 Ian Abbott      2017-07-10  26   * BUILD_BUG_ON_INVALID() permits the compiler to check the validity of the
> bc6245e5efd70c4 Ian Abbott      2017-07-10  27   * expression but avoids the generation of any code, even if that expression
> bc6245e5efd70c4 Ian Abbott      2017-07-10  28   * has side-effects.
> bc6245e5efd70c4 Ian Abbott      2017-07-10  29   */
> bc6245e5efd70c4 Ian Abbott      2017-07-10 @30  #define BUILD_BUG_ON_INVALID(e) ((void)(sizeof((__force long)(e))))
> bc6245e5efd70c4 Ian Abbott      2017-07-10  31
> 
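
A minimal userspace sketch of why the first version of 2/2 broke `!CONFIG_DEBUG_VM` builds, assuming simplified macros: `WARN_DEBUG()` and `WARN_NODEBUG()` are hypothetical stand-ins for the two flavours of `VM_WARN_ON_ONCE_PAGE()`, the second one copied from the `BUILD_BUG_ON_INVALID()` definition quoted above minus the sparse-only `__force` annotation. `fake_is_hzp()` is a hypothetical predicate with a call counter so the test can see when it actually runs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * CONFIG_DEBUG_VM=y flavour (simplified): an expression with a truth value.
 * CONFIG_DEBUG_VM=n flavour: has type void and only type-checks cond.
 */
#define WARN_DEBUG(cond)   (!!(cond))
#define WARN_NODEBUG(cond) ((void)(sizeof((long)(cond))))

static int calls; /* how often the predicate really runs */

/* Hypothetical predicate standing in for is_huge_zero_page(). */
static bool fake_is_hzp(bool v)
{
	calls++;
	return v;
}

/* First-version shape: only compiles while the macro yields a value. */
static int split_v1_debug(bool hzp)
{
	if (WARN_DEBUG(fake_is_hzp(hzp)))
		return -EBUSY;
	/*
	 * Using WARN_NODEBUG() in that if instead is rejected by GCC with
	 * "void value not ignored as it ought to be".
	 */
	return 0;
}

/* RESEND shape: evaluate once, warn as a statement, branch on the bool. */
static int split_resend_nodebug(bool hzp)
{
	bool is_hzp = fake_is_hzp(hzp);

	WARN_NODEBUG(is_hzp); /* type-checked only, generates no code */
	if (is_hzp)
		return -EBUSY;
	return 0;
}

/* The operand of sizeof is not evaluated, so fake_is_hzp() never runs. */
static void nodebug_skips_evaluation(void)
{
	WARN_NODEBUG(fake_is_hzp(true));
}
```

The last function is the important part: in a `!CONFIG_DEBUG_VM` build the warning vanishes entirely, so keeping `is_huge_zero_page()` in an ordinary `bool`, as the resent patch does, makes the `-EBUSY` return independent of the debug macro.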

-- 
Thanks,
Yu


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] mm/huge_memory: do not overkill when splitting huge_zero_page
  2022-04-27  7:12   ` HORIGUCHI NAOYA(堀口 直也)
  2022-04-27  7:37     ` Yu Xu
@ 2022-04-27 19:00     ` Andrew Morton
  1 sibling, 0 replies; 18+ messages in thread
From: Andrew Morton @ 2022-04-27 19:00 UTC (permalink / raw)
  To: HORIGUCHI NAOYA; +Cc: Xu Yu, linux-mm, shy828301

On Wed, 27 Apr 2022 07:12:32 +0000 HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote:

> What to do on -stable?
> The older version was backported to 5.15.z and 5.17.z, so if you choose
> to send this to stable, 1/2 should be also sent to stable.

I added

Fixes: d173d5417fb6 ("mm/memory-failure.c: skip huge_zero_page in memory_failure()")
Fixes: 6a46079cf57a ("HWPOISON: The high level memory error handler in the VM v7")
Cc: <stable@vger.kernel.org>

to both patches.  I think -stable people will be able to sort that out.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] Revert "mm/memory-failure.c: skip huge_zero_page in memory_failure()"
  2022-04-27  6:10 ` [PATCH 1/2] Revert "mm/memory-failure.c: skip huge_zero_page in memory_failure()" Xu Yu
@ 2022-04-27 21:13   ` Yang Shi
  2022-04-28  2:23   ` Miaohe Lin
  1 sibling, 0 replies; 18+ messages in thread
From: Yang Shi @ 2022-04-27 21:13 UTC (permalink / raw)
  To: Xu Yu
  Cc: Linux MM, Andrew Morton,
	HORIGUCHI NAOYA(堀口 直也)

On Tue, Apr 26, 2022 at 11:10 PM Xu Yu <xuyu@linux.alibaba.com> wrote:
>
> This reverts commit d173d5417fb67411e623d394aab986d847e47dad.
>
> The commit d173d5417fb6 ("mm/memory-failure.c: skip huge_zero_page in
> memory_failure()") explicitly skips huge_zero_page in memory_failure(),
> in order to avoid triggering VM_BUG_ON_PAGE on huge_zero_page in
> split_huge_page_to_list().
>
> This works, but Yang Shi thinks that,
>
>     Raising BUG is overkilling for splitting huge_zero_page. The
>     huge_zero_page can't be met from normal paths other than memory
>     failure, but memory failure is a valid caller. So I tend to replace
>     the BUG to WARN + returning -EBUSY. If we don't care about the
>     reason code in memory failure, we don't have to touch memory
>     failure.
>
> And for the issue that huge_zero_page will be set PG_has_hwpoisoned,
> Yang Shi comments that,
>
>     The anonymous page fault doesn't check if the page is poisoned or
>     not since it typically gets a fresh allocated page and assumes the
>     poisoned page (isolated successfully) can't be reallocated again.
>     But huge zero page and base zero page are reused every time. So no
>     matter what fix we pick, the issue is always there.
>
> Finally, Yang, David, Anshuman and Naoya all agree to fix the bug, i.e.,
> to split huge_zero_page, in split_huge_page_to_list().
>
> This reverts the commit d173d5417fb6 ("mm/memory-failure.c: skip
> huge_zero_page in memory_failure()"), and the original bug will be fixed
> by the next patch.

Reviewed-by: Yang Shi <shy828301@gmail.com>

>
> Suggested-by: Yang Shi <shy828301@gmail.com>
> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
> ---
>  mm/memory-failure.c | 13 -------------
>  1 file changed, 13 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 27760c19bad7..2020944398c9 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1860,19 +1860,6 @@ int memory_failure(unsigned long pfn, int flags)
>         }
>
>         if (PageTransHuge(hpage)) {
> -               /*
> -                * Bail out before SetPageHasHWPoisoned() if hpage is
> -                * huge_zero_page, although PG_has_hwpoisoned is not
> -                * checked in set_huge_zero_page().
> -                *
> -                * TODO: Handle memory failure of huge_zero_page thoroughly.
> -                */
> -               if (is_huge_zero_page(hpage)) {
> -                       action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
> -                       res = -EBUSY;
> -                       goto unlock_mutex;
> -               }
> -
>                 /*
>                  * The flag must be set after the refcount is bumped
>                  * otherwise it may race with THP split.
> --
> 2.20.1.2432.ga663e714
>
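
With the explicit `is_huge_zero_page()` skip reverted, memory_failure() relies on the split refusing the huge zero page. The sketch below is a simplified model, not the kernel code: `struct fake_thp`, `fake_split()`, `fake_memory_failure_thp()` and `enum fake_result` are hypothetical, and it assumes the THP branch sets PG_has_hwpoisoned first and handles a failed split the way the removed block did (MF_MSG_UNSPLIT_THP, MF_IGNORED, -EBUSY). It also reflects Yang's point that the flag still ends up set on the shared huge zero page.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcomes, loosely named after MF_MSG_UNSPLIT_THP/MF_IGNORED. */
enum fake_result { FAKE_CONTINUE, FAKE_UNSPLIT_THP_IGNORED };

struct fake_thp {
	bool huge_zero;
	bool has_hwpoisoned; /* models PG_has_hwpoisoned */
};

/* Stand-in for split_huge_page_to_list() after PATCH 2/2. */
static int fake_split(struct fake_thp *thp)
{
	return thp->huge_zero ? -EBUSY : 0;
}

/*
 * Simplified THP branch of memory_failure() after the revert: the flag is
 * set before the split, and a failed split is reported as an ignored,
 * unsplit THP instead of being special-cased up front.
 */
static int fake_memory_failure_thp(struct fake_thp *thp, enum fake_result *out)
{
	thp->has_hwpoisoned = true;
	if (fake_split(thp) < 0) {
		*out = FAKE_UNSPLIT_THP_IGNORED;
		return -EBUSY;
	}
	*out = FAKE_CONTINUE; /* split done; handling continues on the base page */
	return 0;
}
```

For the huge zero page the caller sees the same -EBUSY result the reverted code produced, but PG_has_hwpoisoned is now set, which is the remaining issue Yang describes.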


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2 RESEND] mm/huge_memory: do not overkill when splitting huge_zero_page
  2022-04-27  9:44   ` [PATCH 2/2 RESEND] " Xu Yu
@ 2022-04-27 21:15     ` Yang Shi
  2022-04-28  2:25     ` Miaohe Lin
  2022-04-28 16:04     ` David Hildenbrand
  2 siblings, 0 replies; 18+ messages in thread
From: Yang Shi @ 2022-04-27 21:15 UTC (permalink / raw)
  To: Xu Yu
  Cc: Linux MM, Andrew Morton,
	HORIGUCHI NAOYA(堀口 直也)

On Wed, Apr 27, 2022 at 2:45 AM Xu Yu <xuyu@linux.alibaba.com> wrote:
>
> Kernel panic when injecting memory_failure for the global
> huge_zero_page, when CONFIG_DEBUG_VM is enabled, as follows.
>
>   Injecting memory failure for pfn 0x109ff9 at process virtual address 0x20ff9000
>   page:00000000fb053fc3 refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x109e00
>   head:00000000fb053fc3 order:9 compound_mapcount:0 compound_pincount:0
>   flags: 0x17fffc000010001(locked|head|node=0|zone=2|lastcpupid=0x1ffff)
>   raw: 017fffc000010001 0000000000000000 dead000000000122 0000000000000000
>   raw: 0000000000000000 0000000000000000 00000002ffffffff 0000000000000000
>   page dumped because: VM_BUG_ON_PAGE(is_huge_zero_page(head))
>   ------------[ cut here ]------------
>   kernel BUG at mm/huge_memory.c:2499!
>   invalid opcode: 0000 [#1] PREEMPT SMP PTI
>   CPU: 6 PID: 553 Comm: split_bug Not tainted 5.18.0-rc1+ #11
>   Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 3288b3c 04/01/2014
>   RIP: 0010:split_huge_page_to_list+0x66a/0x880
>   Code: 84 9b fb ff ff 48 8b 7c 24 08 31 f6 e8 9f 5d 2a 00 b8 b8 02 00 00 e9 e8 fb ff ff 48 c7 c6 e8 47 3c 82 4c b
>   RSP: 0018:ffffc90000dcbdf8 EFLAGS: 00010246
>   RAX: 000000000000003c RBX: 0000000000000001 RCX: 0000000000000000
>   RDX: 0000000000000000 RSI: ffffffff823e4c4f RDI: 00000000ffffffff
>   RBP: ffff88843fffdb40 R08: 0000000000000000 R09: 00000000fffeffff
>   R10: ffffc90000dcbc48 R11: ffffffff82d68448 R12: ffffea0004278000
>   R13: ffffffff823c6203 R14: 0000000000109ff9 R15: ffffea000427fe40
>   FS:  00007fc375a26740(0000) GS:ffff88842fd80000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 00007fc3757c9290 CR3: 0000000102174006 CR4: 00000000003706e0
>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>   Call Trace:
>   try_to_split_thp_page+0x3a/0x130
>   memory_failure+0x128/0x800
>   madvise_inject_error.cold+0x8b/0xa1
>   __x64_sys_madvise+0x54/0x60
>   do_syscall_64+0x35/0x80
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
>   RIP: 0033:0x7fc3754f8bf9
>   Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 8
>   RSP: 002b:00007ffeda93a1d8 EFLAGS: 00000217 ORIG_RAX: 000000000000001c
>   RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3754f8bf9
>   RDX: 0000000000000064 RSI: 0000000000003000 RDI: 0000000020ff9000
>   RBP: 00007ffeda93a200 R08: 0000000000000000 R09: 0000000000000000
>   R10: 00000000ffffffff R11: 0000000000000217 R12: 0000000000400490
>   R13: 00007ffeda93a2e0 R14: 0000000000000000 R15: 0000000000000000
>
> We think that raising BUG is overkilling for splitting huge_zero_page,
> the huge_zero_page can't be met from normal paths other than memory
> failure, but memory failure is a valid caller. So we tend to replace the
> BUG to WARN + returning -EBUSY, and thus the panic above won't happen
> again.

Reviewed-by: Yang Shi <shy828301@gmail.com>

>
> Suggested-by: Yang Shi <shy828301@gmail.com>
> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
> ---
>  mm/huge_memory.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index c468fee595ff..910a138e9859 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2495,11 +2495,16 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>         struct address_space *mapping = NULL;
>         int extra_pins, ret;
>         pgoff_t end;
> +       bool is_hzp;
>
> -       VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
>         VM_BUG_ON_PAGE(!PageLocked(head), head);
>         VM_BUG_ON_PAGE(!PageCompound(head), head);
>
> +       is_hzp = is_huge_zero_page(head);
> +       VM_WARN_ON_ONCE_PAGE(is_hzp, head);
> +       if (is_hzp)
> +               return -EBUSY;
> +
>         if (PageWriteback(head))
>                 return -EBUSY;
>
> --
> 2.20.1.2432.ga663e714
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] mm/huge_memory: do not overkill when splitting huge_zero_page
  2022-04-27  6:10 ` [PATCH 2/2] mm/huge_memory: do not overkill when splitting huge_zero_page Xu Yu
                     ` (3 preceding siblings ...)
  2022-04-27  9:44   ` [PATCH 2/2 RESEND] " Xu Yu
@ 2022-04-28  1:59   ` kernel test robot
  4 siblings, 0 replies; 18+ messages in thread
From: kernel test robot @ 2022-04-28  1:59 UTC (permalink / raw)
  To: Xu Yu, linux-mm; +Cc: kbuild-all, akpm, naoya.horiguchi, shy828301

Hi Xu,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on hnaz-mm/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Xu-Yu/mm-memory-failure-rework-fix-on-huge_zero_page-splitting/20220427-141253
base:   https://github.com/hnaz/linux-mm master
config: arc-randconfig-r005-20220425 (https://download.01.org/0day-ci/archive/20220428/202204280339.5Akc9USp-lkp@intel.com/config)
compiler: arc-elf-gcc (GCC) 11.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/988ec6e274e00e5706be7590a4a39427fbe856b1
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Xu-Yu/mm-memory-failure-rework-fix-on-huge_zero_page-splitting/20220427-141253
        git checkout 988ec6e274e00e5706be7590a4a39427fbe856b1
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=arc SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from include/asm-generic/bug.h:5,
                    from arch/arc/include/asm/bug.h:30,
                    from include/linux/bug.h:5,
                    from include/linux/mmdebug.h:5,
                    from include/linux/mm.h:6,
                    from mm/huge_memory.c:8:
   mm/huge_memory.c: In function 'split_huge_page_to_list':
>> include/linux/compiler.h:56:45: error: invalid use of void expression
      56 | #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
         |                                             ^
   include/linux/compiler.h:58:52: note: in definition of macro '__trace_if_var'
      58 | #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
         |                                                    ^~~~
   mm/huge_memory.c:2553:9: note: in expansion of macro 'if'
    2553 |         if (VM_WARN_ON_ONCE_PAGE(is_huge_zero_page(head), head))
         |         ^~
>> include/linux/compiler.h:56:45: error: invalid use of void expression
      56 | #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
         |                                             ^
   include/linux/compiler.h:58:61: note: in definition of macro '__trace_if_var'
      58 | #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
         |                                                             ^~~~
   mm/huge_memory.c:2553:9: note: in expansion of macro 'if'
    2553 |         if (VM_WARN_ON_ONCE_PAGE(is_huge_zero_page(head), head))
         |         ^~
>> include/linux/compiler.h:56:45: error: invalid use of void expression
      56 | #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
         |                                             ^
   include/linux/compiler.h:69:10: note: in definition of macro '__trace_if_value'
      69 |         (cond) ?                                        \
         |          ^~~~
   include/linux/compiler.h:56:28: note: in expansion of macro '__trace_if_var'
      56 | #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
         |                            ^~~~~~~~~~~~~~
   mm/huge_memory.c:2553:9: note: in expansion of macro 'if'
    2553 |         if (VM_WARN_ON_ONCE_PAGE(is_huge_zero_page(head), head))
         |         ^~


vim +56 include/linux/compiler.h

2bcd521a684cc94 Steven Rostedt 2008-11-21  50  
2bcd521a684cc94 Steven Rostedt 2008-11-21  51  #ifdef CONFIG_PROFILE_ALL_BRANCHES
2bcd521a684cc94 Steven Rostedt 2008-11-21  52  /*
2bcd521a684cc94 Steven Rostedt 2008-11-21  53   * "Define 'is'", Bill Clinton
2bcd521a684cc94 Steven Rostedt 2008-11-21  54   * "Define 'if'", Steven Rostedt
2bcd521a684cc94 Steven Rostedt 2008-11-21  55   */
a15fd609ad53a63 Linus Torvalds 2019-03-20 @56  #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
a15fd609ad53a63 Linus Torvalds 2019-03-20  57  
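
The arc report fails for the same underlying reason through a different wrapper: with CONFIG_PROFILE_ALL_BRANCHES, the `if` macro quoted above wraps every condition in `!!(...)`, and `!!` cannot be applied to a void expression. The sketch below uses a hypothetical `PROFILED_COND()` macro (not the kernel's `__trace_if_var()`) that likewise applies `!!` and counts outcomes, to show that a plain `bool`, as in the resent patch, is a valid operand.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical branch counters. */
static unsigned long branch_true, branch_false;

/*
 * Hypothetical branch-profiling wrapper: applies !! to the condition,
 * records the outcome and yields it. !! needs a scalar operand, so passing
 * a void expression is a compile error ("invalid use of void expression").
 */
#define PROFILED_COND(cond) \
	(!!(cond) ? (branch_true++, 1) : (branch_false++, 0))

static int split_entry(bool is_hzp)
{
	if (PROFILED_COND(is_hzp)) /* a plain bool is fine here */
		return -EBUSY;
	return 0;
}
```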

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] Revert "mm/memory-failure.c: skip huge_zero_page in memory_failure()"
  2022-04-27  6:10 ` [PATCH 1/2] Revert "mm/memory-failure.c: skip huge_zero_page in memory_failure()" Xu Yu
  2022-04-27 21:13   ` Yang Shi
@ 2022-04-28  2:23   ` Miaohe Lin
  1 sibling, 0 replies; 18+ messages in thread
From: Miaohe Lin @ 2022-04-28  2:23 UTC (permalink / raw)
  To: Xu Yu; +Cc: akpm, naoya.horiguchi, shy828301, Linux-MM

On 2022/4/27 14:10, Xu Yu wrote:
> This reverts commit d173d5417fb67411e623d394aab986d847e47dad.
> 
> The commit d173d5417fb6 ("mm/memory-failure.c: skip huge_zero_page in
> memory_failure()") explicitly skips huge_zero_page in memory_failure(),
> in order to avoid triggering VM_BUG_ON_PAGE on huge_zero_page in
> split_huge_page_to_list().
> 
> This works, but Yang Shi thinks that,
> 
>     Raising BUG is overkilling for splitting huge_zero_page. The
>     huge_zero_page can't be met from normal paths other than memory
>     failure, but memory failure is a valid caller. So I tend to replace
>     the BUG to WARN + returning -EBUSY. If we don't care about the
>     reason code in memory failure, we don't have to touch memory
>     failure.
> 
> And for the issue that huge_zero_page will be set PG_has_hwpoisoned,
> Yang Shi comments that,
> 
>     The anonymous page fault doesn't check if the page is poisoned or
>     not since it typically gets a fresh allocated page and assumes the
>     poisoned page (isolated successfully) can't be reallocated again.
>     But huge zero page and base zero page are reused every time. So no
>     matter what fix we pick, the issue is always there.
> 
> Finally, Yang, David, Anshuman and Naoya all agree to fix the bug, i.e.,
> to split huge_zero_page, in split_huge_page_to_list().
> 
> This reverts the commit d173d5417fb6 ("mm/memory-failure.c: skip
> huge_zero_page in memory_failure()"), and the original bug will be fixed
> by the next patch.
> 
> Suggested-by: Yang Shi <shy828301@gmail.com>
> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>

LGTM. Thanks!

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

> ---
>  mm/memory-failure.c | 13 -------------
>  1 file changed, 13 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 27760c19bad7..2020944398c9 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1860,19 +1860,6 @@ int memory_failure(unsigned long pfn, int flags)
>  	}
>  
>  	if (PageTransHuge(hpage)) {
> -		/*
> -		 * Bail out before SetPageHasHWPoisoned() if hpage is
> -		 * huge_zero_page, although PG_has_hwpoisoned is not
> -		 * checked in set_huge_zero_page().
> -		 *
> -		 * TODO: Handle memory failure of huge_zero_page thoroughly.
> -		 */
> -		if (is_huge_zero_page(hpage)) {
> -			action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
> -			res = -EBUSY;
> -			goto unlock_mutex;
> -		}
> -
>  		/*
>  		 * The flag must be set after the refcount is bumped
>  		 * otherwise it may race with THP split.
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2 RESEND] mm/huge_memory: do not overkill when splitting huge_zero_page
  2022-04-27  9:44   ` [PATCH 2/2 RESEND] " Xu Yu
  2022-04-27 21:15     ` Yang Shi
@ 2022-04-28  2:25     ` Miaohe Lin
  2022-04-28 16:04     ` David Hildenbrand
  2 siblings, 0 replies; 18+ messages in thread
From: Miaohe Lin @ 2022-04-28  2:25 UTC (permalink / raw)
  To: Xu Yu; +Cc: akpm, naoya.horiguchi, shy828301, Linux-MM

On 2022/4/27 17:44, Xu Yu wrote:
> Kernel panic when injecting memory_failure for the global
> huge_zero_page, when CONFIG_DEBUG_VM is enabled, as follows.
> 
>   Injecting memory failure for pfn 0x109ff9 at process virtual address 0x20ff9000
>   page:00000000fb053fc3 refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x109e00
>   head:00000000fb053fc3 order:9 compound_mapcount:0 compound_pincount:0
>   flags: 0x17fffc000010001(locked|head|node=0|zone=2|lastcpupid=0x1ffff)
>   raw: 017fffc000010001 0000000000000000 dead000000000122 0000000000000000
>   raw: 0000000000000000 0000000000000000 00000002ffffffff 0000000000000000
>   page dumped because: VM_BUG_ON_PAGE(is_huge_zero_page(head))
>   ------------[ cut here ]------------
>   kernel BUG at mm/huge_memory.c:2499!
>   invalid opcode: 0000 [#1] PREEMPT SMP PTI
>   CPU: 6 PID: 553 Comm: split_bug Not tainted 5.18.0-rc1+ #11
>   Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 3288b3c 04/01/2014
>   RIP: 0010:split_huge_page_to_list+0x66a/0x880
>   Code: 84 9b fb ff ff 48 8b 7c 24 08 31 f6 e8 9f 5d 2a 00 b8 b8 02 00 00 e9 e8 fb ff ff 48 c7 c6 e8 47 3c 82 4c b
>   RSP: 0018:ffffc90000dcbdf8 EFLAGS: 00010246
>   RAX: 000000000000003c RBX: 0000000000000001 RCX: 0000000000000000
>   RDX: 0000000000000000 RSI: ffffffff823e4c4f RDI: 00000000ffffffff
>   RBP: ffff88843fffdb40 R08: 0000000000000000 R09: 00000000fffeffff
>   R10: ffffc90000dcbc48 R11: ffffffff82d68448 R12: ffffea0004278000
>   R13: ffffffff823c6203 R14: 0000000000109ff9 R15: ffffea000427fe40
>   FS:  00007fc375a26740(0000) GS:ffff88842fd80000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 00007fc3757c9290 CR3: 0000000102174006 CR4: 00000000003706e0
>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>   Call Trace:
>   try_to_split_thp_page+0x3a/0x130
>   memory_failure+0x128/0x800
>   madvise_inject_error.cold+0x8b/0xa1
>   __x64_sys_madvise+0x54/0x60
>   do_syscall_64+0x35/0x80
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
>   RIP: 0033:0x7fc3754f8bf9
>   Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 8
>   RSP: 002b:00007ffeda93a1d8 EFLAGS: 00000217 ORIG_RAX: 000000000000001c
>   RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3754f8bf9
>   RDX: 0000000000000064 RSI: 0000000000003000 RDI: 0000000020ff9000
>   RBP: 00007ffeda93a200 R08: 0000000000000000 R09: 0000000000000000
>   R10: 00000000ffffffff R11: 0000000000000217 R12: 0000000000400490
>   R13: 00007ffeda93a2e0 R14: 0000000000000000 R15: 0000000000000000
> 
> We think that raising BUG is overkilling for splitting huge_zero_page,
> the huge_zero_page can't be met from normal paths other than memory
> failure, but memory failure is a valid caller. So we tend to replace the
> BUG to WARN + returning -EBUSY, and thus the panic above won't happen
> again.
> 
> Suggested-by: Yang Shi <shy828301@gmail.com>
> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>

LGTM. Thanks!

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

> ---
>  mm/huge_memory.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index c468fee595ff..910a138e9859 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2495,11 +2495,16 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>  	struct address_space *mapping = NULL;
>  	int extra_pins, ret;
>  	pgoff_t end;
> +	bool is_hzp;
>  
> -	VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
>  	VM_BUG_ON_PAGE(!PageLocked(head), head);
>  	VM_BUG_ON_PAGE(!PageCompound(head), head);
>  
> +	is_hzp = is_huge_zero_page(head);
> +	VM_WARN_ON_ONCE_PAGE(is_hzp, head);
> +	if (is_hzp)
> +		return -EBUSY;
> +
>  	if (PageWriteback(head))
>  		return -EBUSY;
>  
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2 RESEND] mm/huge_memory: do not overkill when splitting huge_zero_page
  2022-04-27  9:44   ` [PATCH 2/2 RESEND] " Xu Yu
  2022-04-27 21:15     ` Yang Shi
  2022-04-28  2:25     ` Miaohe Lin
@ 2022-04-28 16:04     ` David Hildenbrand
  2022-04-28 17:18       ` Yang Shi
  2 siblings, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2022-04-28 16:04 UTC
  To: Xu Yu, linux-mm; +Cc: akpm, naoya.horiguchi, shy828301

On 27.04.22 11:44, Xu Yu wrote:
> The kernel panics when injecting memory_failure for the global
> huge_zero_page with CONFIG_DEBUG_VM enabled, as follows.
> 
>   Injecting memory failure for pfn 0x109ff9 at process virtual address 0x20ff9000
>   page:00000000fb053fc3 refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x109e00
>   head:00000000fb053fc3 order:9 compound_mapcount:0 compound_pincount:0
>   flags: 0x17fffc000010001(locked|head|node=0|zone=2|lastcpupid=0x1ffff)
>   raw: 017fffc000010001 0000000000000000 dead000000000122 0000000000000000
>   raw: 0000000000000000 0000000000000000 00000002ffffffff 0000000000000000
>   page dumped because: VM_BUG_ON_PAGE(is_huge_zero_page(head))
>   ------------[ cut here ]------------
>   kernel BUG at mm/huge_memory.c:2499!
>   invalid opcode: 0000 [#1] PREEMPT SMP PTI
>   CPU: 6 PID: 553 Comm: split_bug Not tainted 5.18.0-rc1+ #11
>   Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 3288b3c 04/01/2014
>   RIP: 0010:split_huge_page_to_list+0x66a/0x880
>   Code: 84 9b fb ff ff 48 8b 7c 24 08 31 f6 e8 9f 5d 2a 00 b8 b8 02 00 00 e9 e8 fb ff ff 48 c7 c6 e8 47 3c 82 4c b
>   RSP: 0018:ffffc90000dcbdf8 EFLAGS: 00010246
>   RAX: 000000000000003c RBX: 0000000000000001 RCX: 0000000000000000
>   RDX: 0000000000000000 RSI: ffffffff823e4c4f RDI: 00000000ffffffff
>   RBP: ffff88843fffdb40 R08: 0000000000000000 R09: 00000000fffeffff
>   R10: ffffc90000dcbc48 R11: ffffffff82d68448 R12: ffffea0004278000
>   R13: ffffffff823c6203 R14: 0000000000109ff9 R15: ffffea000427fe40
>   FS:  00007fc375a26740(0000) GS:ffff88842fd80000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 00007fc3757c9290 CR3: 0000000102174006 CR4: 00000000003706e0
>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>   Call Trace:
>   try_to_split_thp_page+0x3a/0x130
>   memory_failure+0x128/0x800
>   madvise_inject_error.cold+0x8b/0xa1
>   __x64_sys_madvise+0x54/0x60
>   do_syscall_64+0x35/0x80
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
>   RIP: 0033:0x7fc3754f8bf9
>   Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 8
>   RSP: 002b:00007ffeda93a1d8 EFLAGS: 00000217 ORIG_RAX: 000000000000001c
>   RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3754f8bf9
>   RDX: 0000000000000064 RSI: 0000000000003000 RDI: 0000000020ff9000
>   RBP: 00007ffeda93a200 R08: 0000000000000000 R09: 0000000000000000
>   R10: 00000000ffffffff R11: 0000000000000217 R12: 0000000000400490
>   R13: 00007ffeda93a2e0 R14: 0000000000000000 R15: 0000000000000000
> 
> We think that raising BUG is overkill when splitting huge_zero_page:
> the huge_zero_page can't be reached from normal paths other than memory
> failure, but memory failure is a valid caller. So we replace the BUG
> with WARN + returning -EBUSY, and thus the panic above won't happen
> again.
> 
> Suggested-by: Yang Shi <shy828301@gmail.com>
> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
> ---
>  mm/huge_memory.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index c468fee595ff..910a138e9859 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2495,11 +2495,16 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>  	struct address_space *mapping = NULL;
>  	int extra_pins, ret;
>  	pgoff_t end;
> +	bool is_hzp;
>  
> -	VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
>  	VM_BUG_ON_PAGE(!PageLocked(head), head);
>  	VM_BUG_ON_PAGE(!PageCompound(head), head);
>  
> +	is_hzp = is_huge_zero_page(head);
> +	VM_WARN_ON_ONCE_PAGE(is_hzp, head);

If this code can validly be reached, VM_WARN_ON_ONCE_PAGE is most
probably the wrong choice.

IIUC, after patch #1 (revert) we can reach this again?

-- 
Thanks,

David / dhildenb


* Re: [PATCH 2/2 RESEND] mm/huge_memory: do not overkill when splitting huge_zero_page
  2022-04-28 16:04     ` David Hildenbrand
@ 2022-04-28 17:18       ` Yang Shi
  0 siblings, 0 replies; 18+ messages in thread
From: Yang Shi @ 2022-04-28 17:18 UTC
  To: David Hildenbrand
  Cc: Xu Yu, Linux MM, Andrew Morton,
	HORIGUCHI NAOYA(堀口 直也)

On Thu, Apr 28, 2022 at 9:04 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 27.04.22 11:44, Xu Yu wrote:
> > The kernel panics when injecting memory_failure for the global
> > huge_zero_page with CONFIG_DEBUG_VM enabled, as follows.
> >
> >   Injecting memory failure for pfn 0x109ff9 at process virtual address 0x20ff9000
> >   page:00000000fb053fc3 refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x109e00
> >   head:00000000fb053fc3 order:9 compound_mapcount:0 compound_pincount:0
> >   flags: 0x17fffc000010001(locked|head|node=0|zone=2|lastcpupid=0x1ffff)
> >   raw: 017fffc000010001 0000000000000000 dead000000000122 0000000000000000
> >   raw: 0000000000000000 0000000000000000 00000002ffffffff 0000000000000000
> >   page dumped because: VM_BUG_ON_PAGE(is_huge_zero_page(head))
> >   ------------[ cut here ]------------
> >   kernel BUG at mm/huge_memory.c:2499!
> >   invalid opcode: 0000 [#1] PREEMPT SMP PTI
> >   CPU: 6 PID: 553 Comm: split_bug Not tainted 5.18.0-rc1+ #11
> >   Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 3288b3c 04/01/2014
> >   RIP: 0010:split_huge_page_to_list+0x66a/0x880
> >   Code: 84 9b fb ff ff 48 8b 7c 24 08 31 f6 e8 9f 5d 2a 00 b8 b8 02 00 00 e9 e8 fb ff ff 48 c7 c6 e8 47 3c 82 4c b
> >   RSP: 0018:ffffc90000dcbdf8 EFLAGS: 00010246
> >   RAX: 000000000000003c RBX: 0000000000000001 RCX: 0000000000000000
> >   RDX: 0000000000000000 RSI: ffffffff823e4c4f RDI: 00000000ffffffff
> >   RBP: ffff88843fffdb40 R08: 0000000000000000 R09: 00000000fffeffff
> >   R10: ffffc90000dcbc48 R11: ffffffff82d68448 R12: ffffea0004278000
> >   R13: ffffffff823c6203 R14: 0000000000109ff9 R15: ffffea000427fe40
> >   FS:  00007fc375a26740(0000) GS:ffff88842fd80000(0000) knlGS:0000000000000000
> >   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >   CR2: 00007fc3757c9290 CR3: 0000000102174006 CR4: 00000000003706e0
> >   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >   Call Trace:
> >   try_to_split_thp_page+0x3a/0x130
> >   memory_failure+0x128/0x800
> >   madvise_inject_error.cold+0x8b/0xa1
> >   __x64_sys_madvise+0x54/0x60
> >   do_syscall_64+0x35/0x80
> >   entry_SYSCALL_64_after_hwframe+0x44/0xae
> >   RIP: 0033:0x7fc3754f8bf9
> >   Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 8
> >   RSP: 002b:00007ffeda93a1d8 EFLAGS: 00000217 ORIG_RAX: 000000000000001c
> >   RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3754f8bf9
> >   RDX: 0000000000000064 RSI: 0000000000003000 RDI: 0000000020ff9000
> >   RBP: 00007ffeda93a200 R08: 0000000000000000 R09: 0000000000000000
> >   R10: 00000000ffffffff R11: 0000000000000217 R12: 0000000000400490
> >   R13: 00007ffeda93a2e0 R14: 0000000000000000 R15: 0000000000000000
> >
> > We think that raising BUG is overkill when splitting huge_zero_page:
> > the huge_zero_page can't be reached from normal paths other than memory
> > failure, but memory failure is a valid caller. So we replace the BUG
> > with WARN + returning -EBUSY, and thus the panic above won't happen
> > again.
> >
> > Suggested-by: Yang Shi <shy828301@gmail.com>
> > Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > Reported-by: kernel test robot <lkp@intel.com>
> > Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
> > ---
> >  mm/huge_memory.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index c468fee595ff..910a138e9859 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2495,11 +2495,16 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
> >       struct address_space *mapping = NULL;
> >       int extra_pins, ret;
> >       pgoff_t end;
> > +     bool is_hzp;
> >
> > -     VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
> >       VM_BUG_ON_PAGE(!PageLocked(head), head);
> >       VM_BUG_ON_PAGE(!PageCompound(head), head);
> >
> > +     is_hzp = is_huge_zero_page(head);
> > +     VM_WARN_ON_ONCE_PAGE(is_hzp, head);
>
> If this code can validly be reached, VM_WARN_ON_ONCE_PAGE is most
> probably the wrong choice.

Only from the memory failure path; any other path is invalid. The
warning is mainly meant to catch the invalid cases. Memory failure on
the huge zero page should be rare in real life.

>
> IIUC, after patch #1 (revert) we can reach this again?
>
> --
> Thanks,
>
> David / dhildenb
>


end of thread

Thread overview: 18+ messages
2022-04-27  6:10 [PATCH 0/2] mm/memory-failure: rework fix on huge_zero_page splitting Xu Yu
2022-04-27  6:10 ` [PATCH 1/2] Revert "mm/memory-failure.c: skip huge_zero_page in memory_failure()" Xu Yu
2022-04-27 21:13   ` Yang Shi
2022-04-28  2:23   ` Miaohe Lin
2022-04-27  6:10 ` [PATCH 2/2] mm/huge_memory: do not overkill when splitting huge_zero_page Xu Yu
2022-04-27  7:12   ` HORIGUCHI NAOYA(堀口 直也)
2022-04-27  7:37     ` Yu Xu
2022-04-27 19:00     ` Andrew Morton
2022-04-27  9:01   ` kernel test robot
2022-04-27  9:48     ` Yu Xu
2022-04-27  9:48       ` Yu Xu
2022-04-27  9:36   ` kernel test robot
2022-04-27  9:44   ` [PATCH 2/2 RESEND] " Xu Yu
2022-04-27 21:15     ` Yang Shi
2022-04-28  2:25     ` Miaohe Lin
2022-04-28 16:04     ` David Hildenbrand
2022-04-28 17:18       ` Yang Shi
2022-04-28  1:59   ` [PATCH 2/2] " kernel test robot