linux-mm.kvack.org archive mirror
* [PATCH] mm: fix race by making init_zero_pfn() early_initcall
@ 2021-03-29  5:24 Ilya Lipnitskiy
  2021-03-29  5:29 ` [PATCH v2] " Ilya Lipnitskiy
  0 siblings, 1 reply; 4+ messages in thread
From: Ilya Lipnitskiy @ 2021-03-29  5:24 UTC
  To: Andrew Morton, linux-mm, linux-kernel; +Cc: Ilya Lipnitskiy, Eric W. Biederman

There are code paths that rely on zero_pfn being fully initialized
before core_initcall. For example, wq_sysfs_init() is a core_initcall
function that eventually results in a call to kernel_execve(), which
causes a page fault with a subsequent mmput(). If zero_pfn is not
initialized by then, the faulted-in page may not be accounted for
properly when the mm is torn down, resulting in an error:
  BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:1

Here is an analysis of the race as seen on a MIPS device. On this
particular MT7621 device (Ubiquiti ER-X), zero_pfn is PFN 0 until
initialized, at which point it becomes PFN 5120:
  1. wq_sysfs_init calls into kobject_uevent_env at core_initcall:
       [<80340dc8>] kobject_uevent_env+0x7e4/0x7ec
       [<8033f8b8>] kset_register+0x68/0x88
       [<803cf824>] bus_register+0xdc/0x34c
       [<803cfac8>] subsys_virtual_register+0x34/0x78
       [<8086afb0>] wq_sysfs_init+0x1c/0x4c
       [<80001648>] do_one_initcall+0x50/0x1a8
       [<8086503c>] kernel_init_freeable+0x230/0x2c8
       [<8066bca0>] kernel_init+0x10/0x100
       [<80003038>] ret_from_kernel_thread+0x14/0x1c

  2. kobject_uevent_env() calls call_usermodehelper_exec(), which executes
     kernel_execve() asynchronously.

  3. Copying the argument strings in kernel_execve() triggers a page
     fault, bumping the MM_ANONPAGES RSS counter:
       [<8015adb4>] add_mm_counter_fast+0xb4/0xc0
       [<80160d58>] handle_mm_fault+0x6e4/0xea0
       [<80158aa4>] __get_user_pages.part.78+0x190/0x37c
       [<8015992c>] __get_user_pages_remote+0x128/0x360
       [<801a6d9c>] get_arg_page+0x34/0xa0
       [<801a7394>] copy_string_kernel+0x194/0x2a4
       [<801a880c>] kernel_execve+0x11c/0x298
       [<800420f4>] call_usermodehelper_exec_async+0x114/0x194

  4. If zero_pfn has not been initialized yet, zap_pte_range() does not
     decrement the MM_ANONPAGES RSS counter, and the BUG message is
     triggered shortly afterwards when __mmdrop() checks the RSS counters:
       [<800285e8>] __mmdrop+0x98/0x1d0
       [<801a6de8>] free_bprm+0x44/0x118
       [<801a86a8>] kernel_execve+0x160/0x1d8
       [<800420f4>] call_usermodehelper_exec_async+0x114/0x194
       [<80003198>] ret_from_kernel_thread+0x14/0x1c

To avoid races such as the one described above, run init_zero_pfn() at
early_initcall level. Depending on the architecture, ZERO_PAGE is either
constant or is initialized even earlier, in paging_init(), so
initializing zero_pfn earlier is safe.

ML discussion: https://lore.kernel.org/lkml/CALCv0x2YqOXEAy2Q=hafjhHCtTHVodChv1qpM=niAXOpqEbt7w@mail.gmail.com/

Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 46ef306375bd..a8bbc4fc121f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -166,7 +166,7 @@ static int __init init_zero_pfn(void)
 	zero_pfn = page_to_pfn(ZERO_PAGE(0));
 	return 0;
 }
-core_initcall(init_zero_pfn);
+early_initcall(init_zero_pfn);
 
 void mm_trace_rss_stat(struct mm_struct *mm, int member, long count)
 {
-- 
2.31.0




* [PATCH v2] mm: fix race by making init_zero_pfn() early_initcall
  2021-03-29  5:24 [PATCH] mm: fix race by making init_zero_pfn() early_initcall Ilya Lipnitskiy
@ 2021-03-29  5:29 ` Ilya Lipnitskiy
  2021-03-30  4:42   ` [PATCH v3] " Ilya Lipnitskiy
  0 siblings, 1 reply; 4+ messages in thread
From: Ilya Lipnitskiy @ 2021-03-29  5:29 UTC
  To: Andrew Morton, linux-mm, linux-kernel
  Cc: Ilya Lipnitskiy, Eric W. Biederman, stable

There are code paths that rely on zero_pfn being fully initialized
before core_initcall. For example, wq_sysfs_init() is a core_initcall
function that eventually results in a call to kernel_execve(), which
causes a page fault with a subsequent mmput(). If zero_pfn is not
initialized by then, the faulted-in page may not be accounted for
properly when the mm is torn down, resulting in an error:
  BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:1

Here is an analysis of the race as seen on a MIPS device. On this
particular MT7621 device (Ubiquiti ER-X), zero_pfn is PFN 0 until
initialized, at which point it becomes PFN 5120:
  1. wq_sysfs_init calls into kobject_uevent_env at core_initcall:
       [<80340dc8>] kobject_uevent_env+0x7e4/0x7ec
       [<8033f8b8>] kset_register+0x68/0x88
       [<803cf824>] bus_register+0xdc/0x34c
       [<803cfac8>] subsys_virtual_register+0x34/0x78
       [<8086afb0>] wq_sysfs_init+0x1c/0x4c
       [<80001648>] do_one_initcall+0x50/0x1a8
       [<8086503c>] kernel_init_freeable+0x230/0x2c8
       [<8066bca0>] kernel_init+0x10/0x100
       [<80003038>] ret_from_kernel_thread+0x14/0x1c

  2. kobject_uevent_env() calls call_usermodehelper_exec(), which executes
     kernel_execve() asynchronously.

  3. Copying the argument strings in kernel_execve() triggers a page
     fault, bumping the MM_ANONPAGES RSS counter:
       [<8015adb4>] add_mm_counter_fast+0xb4/0xc0
       [<80160d58>] handle_mm_fault+0x6e4/0xea0
       [<80158aa4>] __get_user_pages.part.78+0x190/0x37c
       [<8015992c>] __get_user_pages_remote+0x128/0x360
       [<801a6d9c>] get_arg_page+0x34/0xa0
       [<801a7394>] copy_string_kernel+0x194/0x2a4
       [<801a880c>] kernel_execve+0x11c/0x298
       [<800420f4>] call_usermodehelper_exec_async+0x114/0x194

  4. If zero_pfn has not been initialized yet, zap_pte_range() does not
     decrement the MM_ANONPAGES RSS counter, and the BUG message is
     triggered shortly afterwards when __mmdrop() checks the RSS counters:
       [<800285e8>] __mmdrop+0x98/0x1d0
       [<801a6de8>] free_bprm+0x44/0x118
       [<801a86a8>] kernel_execve+0x160/0x1d8
       [<800420f4>] call_usermodehelper_exec_async+0x114/0x194
       [<80003198>] ret_from_kernel_thread+0x14/0x1c

To avoid races such as the one described above, run init_zero_pfn() at
early_initcall level. Depending on the architecture, ZERO_PAGE is either
constant or is initialized even earlier, in paging_init(), so
initializing zero_pfn earlier is safe.

ML discussion: https://lore.kernel.org/lkml/CALCv0x2YqOXEAy2Q=hafjhHCtTHVodChv1qpM=niAXOpqEbt7w@mail.gmail.com/

Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: stable@vger.kernel.org
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 46ef306375bd..a8bbc4fc121f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -166,7 +166,7 @@ static int __init init_zero_pfn(void)
 	zero_pfn = page_to_pfn(ZERO_PAGE(0));
 	return 0;
 }
-core_initcall(init_zero_pfn);
+early_initcall(init_zero_pfn);
 
 void mm_trace_rss_stat(struct mm_struct *mm, int member, long count)
 {
-- 
2.31.0




* [PATCH v3] mm: fix race by making init_zero_pfn() early_initcall
  2021-03-29  5:29 ` [PATCH v2] " Ilya Lipnitskiy
@ 2021-03-30  4:42   ` Ilya Lipnitskiy
  2021-03-30  4:59     ` Zhou Yanjie
  0 siblings, 1 reply; 4+ messages in thread
From: Ilya Lipnitskiy @ 2021-03-30  4:42 UTC
  To: Andrew Morton, linux-mm, linux-kernel
  Cc: Ilya Lipnitskiy, Hugh Dickins, Eric W. Biederman, stable

There are code paths that rely on zero_pfn being fully initialized
before core_initcall. For example, wq_sysfs_init() is a core_initcall
function that eventually results in a call to kernel_execve(), which
causes a page fault with a subsequent mmput(). If zero_pfn is not
initialized by then, the faulted-in page may not be accounted for
properly when the mm is torn down, resulting in an error:
  BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:1

Here is an analysis of the race as seen on a MIPS device. On this
particular MT7621 device (Ubiquiti ER-X), zero_pfn is PFN 0 until
initialized, at which point it becomes PFN 5120:
  1. wq_sysfs_init calls into kobject_uevent_env at core_initcall:
       [<80340dc8>] kobject_uevent_env+0x7e4/0x7ec
       [<8033f8b8>] kset_register+0x68/0x88
       [<803cf824>] bus_register+0xdc/0x34c
       [<803cfac8>] subsys_virtual_register+0x34/0x78
       [<8086afb0>] wq_sysfs_init+0x1c/0x4c
       [<80001648>] do_one_initcall+0x50/0x1a8
       [<8086503c>] kernel_init_freeable+0x230/0x2c8
       [<8066bca0>] kernel_init+0x10/0x100
       [<80003038>] ret_from_kernel_thread+0x14/0x1c

  2. kobject_uevent_env() calls call_usermodehelper_exec(), which executes
     kernel_execve() asynchronously.

  3. Copying the argument strings in kernel_execve() triggers a page
     fault, bumping the MM_ANONPAGES RSS counter:
       [<8015adb4>] add_mm_counter_fast+0xb4/0xc0
       [<80160d58>] handle_mm_fault+0x6e4/0xea0
       [<80158aa4>] __get_user_pages.part.78+0x190/0x37c
       [<8015992c>] __get_user_pages_remote+0x128/0x360
       [<801a6d9c>] get_arg_page+0x34/0xa0
       [<801a7394>] copy_string_kernel+0x194/0x2a4
       [<801a880c>] kernel_execve+0x11c/0x298
       [<800420f4>] call_usermodehelper_exec_async+0x114/0x194

  4. If zero_pfn has not been initialized yet, zap_pte_range() does not
     decrement the MM_ANONPAGES RSS counter, and the BUG message is
     triggered shortly afterwards when __mmdrop() checks the RSS counters:
       [<800285e8>] __mmdrop+0x98/0x1d0
       [<801a6de8>] free_bprm+0x44/0x118
       [<801a86a8>] kernel_execve+0x160/0x1d8
       [<800420f4>] call_usermodehelper_exec_async+0x114/0x194
       [<80003198>] ret_from_kernel_thread+0x14/0x1c

To avoid races such as the one described above, run init_zero_pfn() at
early_initcall level. Depending on the architecture, ZERO_PAGE is either
constant or is initialized even earlier, in paging_init(), so
initializing zero_pfn earlier is safe.

Discussion: https://lkml.kernel.org/r/CALCv0x2YqOXEAy2Q=hafjhHCtTHVodChv1qpM=niAXOpqEbt7w@mail.gmail.com

Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: stable@vger.kernel.org
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 5c3b29d3af66..e66b11ac1659 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -166,7 +166,7 @@ static int __init init_zero_pfn(void)
 	zero_pfn = page_to_pfn(ZERO_PAGE(0));
 	return 0;
 }
-core_initcall(init_zero_pfn);
+early_initcall(init_zero_pfn);
 
 void mm_trace_rss_stat(struct mm_struct *mm, int member, long count)
 {
-- 
2.31.0




* Re: [PATCH v3] mm: fix race by making init_zero_pfn() early_initcall
  2021-03-30  4:42   ` [PATCH v3] " Ilya Lipnitskiy
@ 2021-03-30  4:59     ` Zhou Yanjie
  0 siblings, 0 replies; 4+ messages in thread
From: Zhou Yanjie @ 2021-03-30  4:59 UTC
  To: Ilya Lipnitskiy, Andrew Morton, linux-mm, linux-kernel
  Cc: Hugh Dickins, Eric W. Biederman, stable

Hi Ilya,

On 2021/3/30 12:42 PM, Ilya Lipnitskiy wrote:
> There are code paths that rely on zero_pfn being fully initialized
> before core_initcall. For example, wq_sysfs_init() is a core_initcall
> function that eventually results in a call to kernel_execve(), which
> causes a page fault with a subsequent mmput(). If zero_pfn is not
> initialized by then, the faulted-in page may not be accounted for
> properly when the mm is torn down, resulting in an error:
>    BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:1
>
> Here is an analysis of the race as seen on a MIPS device. On this
> particular MT7621 device (Ubiquiti ER-X), zero_pfn is PFN 0 until
> initialized, at which point it becomes PFN 5120:
>    1. wq_sysfs_init calls into kobject_uevent_env at core_initcall:
>         [<80340dc8>] kobject_uevent_env+0x7e4/0x7ec
>         [<8033f8b8>] kset_register+0x68/0x88
>         [<803cf824>] bus_register+0xdc/0x34c
>         [<803cfac8>] subsys_virtual_register+0x34/0x78
>         [<8086afb0>] wq_sysfs_init+0x1c/0x4c
>         [<80001648>] do_one_initcall+0x50/0x1a8
>         [<8086503c>] kernel_init_freeable+0x230/0x2c8
>         [<8066bca0>] kernel_init+0x10/0x100
>         [<80003038>] ret_from_kernel_thread+0x14/0x1c
>
>    2. kobject_uevent_env() calls call_usermodehelper_exec(), which executes
>       kernel_execve() asynchronously.
>
>    3. Copying the argument strings in kernel_execve() triggers a page
>       fault, bumping the MM_ANONPAGES RSS counter:
>         [<8015adb4>] add_mm_counter_fast+0xb4/0xc0
>         [<80160d58>] handle_mm_fault+0x6e4/0xea0
>         [<80158aa4>] __get_user_pages.part.78+0x190/0x37c
>         [<8015992c>] __get_user_pages_remote+0x128/0x360
>         [<801a6d9c>] get_arg_page+0x34/0xa0
>         [<801a7394>] copy_string_kernel+0x194/0x2a4
>         [<801a880c>] kernel_execve+0x11c/0x298
>         [<800420f4>] call_usermodehelper_exec_async+0x114/0x194
>
>    4. If zero_pfn has not been initialized yet, zap_pte_range() does not
>       decrement the MM_ANONPAGES RSS counter, and the BUG message is
>       triggered shortly afterwards when __mmdrop() checks the RSS counters:
>         [<800285e8>] __mmdrop+0x98/0x1d0
>         [<801a6de8>] free_bprm+0x44/0x118
>         [<801a86a8>] kernel_execve+0x160/0x1d8
>         [<800420f4>] call_usermodehelper_exec_async+0x114/0x194
>         [<80003198>] ret_from_kernel_thread+0x14/0x1c
>
> To avoid races such as the one described above, run init_zero_pfn() at
> early_initcall level. Depending on the architecture, ZERO_PAGE is either
> constant or is initialized even earlier, in paging_init(), so
> initializing zero_pfn earlier is safe.
>
> Discussion: https://lkml.kernel.org/r/CALCv0x2YqOXEAy2Q=hafjhHCtTHVodChv1qpM=niAXOpqEbt7w@mail.gmail.com
>
> Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Cc: stable@vger.kernel.org
> ---
>   mm/memory.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)


Tested-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com> # on CU1000-Neo/X1000E and CU1830-Neo/X1830


> diff --git a/mm/memory.c b/mm/memory.c
> index 5c3b29d3af66..e66b11ac1659 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -166,7 +166,7 @@ static int __init init_zero_pfn(void)
>   	zero_pfn = page_to_pfn(ZERO_PAGE(0));
>   	return 0;
>   }
> -core_initcall(init_zero_pfn);
> +early_initcall(init_zero_pfn);
>   
>   void mm_trace_rss_stat(struct mm_struct *mm, int member, long count)
>   {

