* [PATCH 1/2] mm: fix null-ptr-deref in kswapd_is_running()
@ 2022-08-24 7:19 Kefeng Wang
2022-08-24 7:19 ` [PATCH 2/2] mm: slince possible data races about pgdat->kswapd Kefeng Wang
2022-08-24 7:56 ` [PATCH 1/2] mm: fix null-ptr-deref in kswapd_is_running() David Hildenbrand
0 siblings, 2 replies; 8+ messages in thread
From: Kefeng Wang @ 2022-08-24 7:19 UTC (permalink / raw)
To: Andrew Morton, linux-mm; +Cc: muchun.song, linux-kernel, Kefeng Wang
kswapd_run() and kswapd_stop() set pgdat->kswapd to NULL, which
can race with kswapd_is_running() in kcompactd():

kswapd_run/stop()                 kcompactd()
                                    kswapd_is_running()
                                      if (pgdat->kswapd) // load non-NULL pgdat->kswapd
pgdat->kswapd = NULL
                                      task_is_running(pgdat->kswapd) // NULL pointer dereference

KASAN reports the null-ptr-deref shown below:

vmscan: Failed to start kswapd on node 0
...
BUG: KASAN: null-ptr-deref in kcompactd+0x440/0x504
Read of size 8 at addr 0000000000000024 by task kcompactd0/37
CPU: 0 PID: 37 Comm: kcompactd0 Kdump: loaded Tainted: G OE 5.10.60 #1
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
Call trace:
dump_backtrace+0x0/0x394
show_stack+0x34/0x4c
dump_stack+0x158/0x1e4
__kasan_report+0x138/0x140
kasan_report+0x44/0xdc
__asan_load8+0x94/0xd0
kcompactd+0x440/0x504
kthread+0x1a4/0x1f0
ret_from_fork+0x10/0x18
To fix the race between kswapd_run() and kcompactd(), store the result of
kthread_run() in a temporary variable and assign it to pgdat->kswapd only
if kthread_run() returned a valid task_struct.

To fix the race between kswapd_stop() and kcompactd(), call kcompactd_stop()
before kswapd_stop().
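
A rough user-space illustration of the kswapd_run() change described above
(this is not kernel code). The names struct task, struct node_data,
start_worker() and worker_run() are hypothetical stand-ins, and failure is
modelled with NULL rather than ERR_PTR(). It only shows that the shared
pointer is written once, after creation succeeded, so a concurrent reader
never observes an error value in it:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for task_struct and pg_data_t. */
struct task { int state; };
struct node_data { struct task *worker; };

/* Hypothetical creator; returns NULL on failure (the kernel uses ERR_PTR()). */
static struct task *start_worker(struct task *storage, int should_fail)
{
	if (should_fail)
		return NULL;
	storage->state = 1;
	return storage;
}

/* Publish the worker pointer only after creation has succeeded. */
static void worker_run(struct node_data *nd, struct task *storage, int should_fail)
{
	struct task *t;

	assert(nd != NULL);
	if (nd->worker)
		return;

	t = start_worker(storage, should_fail);
	if (!t)
		return;		/* nd->worker stays NULL on failure */

	nd->worker = t;
}
```
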
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
mm/memory_hotplug.c | 2 +-
mm/vmscan.c | 8 +++++---
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index fad6d1f2262a..2fd45ccbce45 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1940,8 +1940,8 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,

 	node_states_clear_node(node, &arg);
 	if (arg.status_change_nid >= 0) {
-		kswapd_stop(node);
 		kcompactd_stop(node);
+		kswapd_stop(node);
 	}

 	writeback_set_ratelimit();
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b2b1431352dc..08c6497f76c3 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4642,16 +4642,18 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
 void kswapd_run(int nid)
 {
 	pg_data_t *pgdat = NODE_DATA(nid);
+	struct task_struct *t;

 	if (pgdat->kswapd)
 		return;

-	pgdat->kswapd = kthread_run(kswapd, pgdat, "kswapd%d", nid);
-	if (IS_ERR(pgdat->kswapd)) {
+	t = kthread_run(kswapd, pgdat, "kswapd%d", nid);
+	if (IS_ERR(t)) {
 		/* failure at boot is fatal */
 		BUG_ON(system_state < SYSTEM_RUNNING);
 		pr_err("Failed to start kswapd on node %d\n", nid);
-		pgdat->kswapd = NULL;
+	} else {
+		pgdat->kswapd = t;
 	}
 }
--
2.35.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/2] mm: slince possible data races about pgdat->kswapd
2022-08-24 7:19 [PATCH 1/2] mm: fix null-ptr-deref in kswapd_is_running() Kefeng Wang
@ 2022-08-24 7:19 ` Kefeng Wang
2022-08-24 8:24 ` David Hildenbrand
2022-08-24 7:56 ` [PATCH 1/2] mm: fix null-ptr-deref in kswapd_is_running() David Hildenbrand
1 sibling, 1 reply; 8+ messages in thread
From: Kefeng Wang @ 2022-08-24 7:19 UTC (permalink / raw)
To: Andrew Morton, linux-mm; +Cc: muchun.song, linux-kernel, Kefeng Wang
pgdat->kswapd can be accessed concurrently by kswapd_run() and
kcompactd() without being protected by any lock, which can lead to
data races. Add READ_ONCE()/WRITE_ONCE() to silence them.
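
A minimal user-space sketch of the single-snapshot idea, for illustration
only. The READ_ONCE()/WRITE_ONCE() macros below are simplified
approximations that assume the GCC/Clang __typeof__ extension (the kernel
versions do more), and struct task, struct node_data and
worker_is_running() are hypothetical stand-ins. The pointer is loaded once
into a local, so the NULL check and the dereference see the same value.
Note that this does not keep the task alive after the load.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified approximations of the kernel macros (assumption, see above). */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical stand-ins for task_struct and pg_data_t. */
struct task { int running; };
struct node_data { struct task *worker; };

static int worker_is_running(struct node_data *nd)
{
	/* One load: the check and the dereference use the same snapshot. */
	struct task *t = READ_ONCE(nd->worker);

	assert(nd != NULL);
	return t && t->running;
}
```
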
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
mm/compaction.c | 4 +++-
mm/vmscan.c | 8 ++++----
2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 640fa76228dd..aa1cfe47f046 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1983,7 +1983,9 @@ static inline bool is_via_compact_memory(int order)

 static bool kswapd_is_running(pg_data_t *pgdat)
 {
-	return pgdat->kswapd && task_is_running(pgdat->kswapd);
+	struct task_struct *t = READ_ONCE(pgdat->kswapd);
+
+	return t && task_is_running(t);
 }

 /*
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 08c6497f76c3..65b19ca8c8ee 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4644,7 +4644,7 @@ void kswapd_run(int nid)
 	pg_data_t *pgdat = NODE_DATA(nid);
 	struct task_struct *t;

-	if (pgdat->kswapd)
+	if (READ_ONCE(pgdat->kswapd))
 		return;

 	t = kthread_run(kswapd, pgdat, "kswapd%d", nid);
@@ -4653,7 +4653,7 @@ void kswapd_run(int nid)
 		BUG_ON(system_state < SYSTEM_RUNNING);
 		pr_err("Failed to start kswapd on node %d\n", nid);
 	} else {
-		pgdat->kswapd = t;
+		WRITE_ONCE(pgdat->kswapd, t);
 	}
 }

@@ -4663,11 +4663,11 @@ void kswapd_run(int nid)
  */
 void kswapd_stop(int nid)
 {
-	struct task_struct *kswapd = NODE_DATA(nid)->kswapd;
+	struct task_struct *kswapd = READ_ONCE(NODE_DATA(nid)->kswapd);

 	if (kswapd) {
 		kthread_stop(kswapd);
-		NODE_DATA(nid)->kswapd = NULL;
+		WRITE_ONCE(NODE_DATA(nid)->kswapd, NULL);
 	}
 }

--
2.35.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] mm: fix null-ptr-deref in kswapd_is_running()
2022-08-24 7:19 [PATCH 1/2] mm: fix null-ptr-deref in kswapd_is_running() Kefeng Wang
2022-08-24 7:19 ` [PATCH 2/2] mm: slince possible data races about pgdat->kswapd Kefeng Wang
@ 2022-08-24 7:56 ` David Hildenbrand
1 sibling, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2022-08-24 7:56 UTC (permalink / raw)
To: Kefeng Wang, Andrew Morton, linux-mm; +Cc: muchun.song, linux-kernel
On 24.08.22 09:19, Kefeng Wang wrote:
> The kswapd_run/stop() will set pgdat->kswapd to NULL, which
> could race with kswapd_is_running() in kcompactd(),
>
> kswapd_run/stop() kcompactd()
> kswapd_is_running()
> if (pgdat->kswapd) // load non-NULL pgdat->kswapd
> pgdat->kswapd = NULL
> task_is_running(pgdat->kswapd) // Null pointer derefence
>
> The KASAN report the null-ptr-deref shown below,
>
> vmscan: Failed to start kswapd on node 0
> ...
> BUG: KASAN: null-ptr-deref in kcompactd+0x440/0x504
> Read of size 8 at addr 0000000000000024 by task kcompactd0/37
>
> CPU: 0 PID: 37 Comm: kcompactd0 Kdump: loaded Tainted: G OE 5.10.60 #1
> Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
> Call trace:
> dump_backtrace+0x0/0x394
> show_stack+0x34/0x4c
> dump_stack+0x158/0x1e4
> __kasan_report+0x138/0x140
> kasan_report+0x44/0xdc
> __asan_load8+0x94/0xd0
> kcompactd+0x440/0x504
> kthread+0x1a4/0x1f0
> ret_from_fork+0x10/0x18
>
> For race between kswapd_run() and kcompactd(), adding a temporary value
> when create a kthread, and only set it to pgdat->kswapd if kthread_run()
> return successful task_struct to fix the issue.
>
> For race between kswapd_stop() and kcompactd(), let's call kcompactd_stop()
> before kswapd_stop() to fix the issue.
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
> mm/memory_hotplug.c | 2 +-
> mm/vmscan.c | 8 +++++---
> 2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index fad6d1f2262a..2fd45ccbce45 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1940,8 +1940,8 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>
> node_states_clear_node(node, &arg);
> if (arg.status_change_nid >= 0) {
> - kswapd_stop(node);
> kcompactd_stop(node);
> + kswapd_stop(node);
> }
This looks fragile: it could easily break again in the future when
people work on this code without being aware of this condition, or once
there are other (future?) kswapd_is_running() users. We at least need a
comment explaining that the order here matters and why.

But I do wonder if we can't handle it in a cleaner, more obvious way.
kswapd_run()/kswapd_stop() should have a proper way to synchronize
with kswapd_is_running(). It's just a matter of finding a suitable locking
primitive :)
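
To make the idea concrete, here is a hedged user-space sketch of what a
dedicated lock could look like. It uses a pthread mutex purely for
illustration; which kernel primitive would be suitable is left open here,
and struct task, struct node_data, worker_lock, worker_set() and
worker_is_running() are all hypothetical names, not existing kernel API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for task_struct and pg_data_t. */
struct task { int running; };

struct node_data {
	pthread_mutex_t worker_lock;	/* hypothetical dedicated lock */
	struct task *worker;		/* protected by worker_lock */
};

/* Writers (the run/stop paths) update the pointer under the lock. */
static void worker_set(struct node_data *nd, struct task *t)
{
	assert(nd != NULL);
	pthread_mutex_lock(&nd->worker_lock);
	nd->worker = t;
	pthread_mutex_unlock(&nd->worker_lock);
}

/* Readers check and dereference the pointer under the same lock. */
static int worker_is_running(struct node_data *nd)
{
	int running;

	pthread_mutex_lock(&nd->worker_lock);
	running = nd->worker && nd->worker->running;
	pthread_mutex_unlock(&nd->worker_lock);

	return running;
}
```

With this shape the correctness of the reader no longer depends on the
order in which the stop functions are called.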
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] mm: slince possible data races about pgdat->kswapd
2022-08-24 7:19 ` [PATCH 2/2] mm: slince possible data races about pgdat->kswapd Kefeng Wang
@ 2022-08-24 8:24 ` David Hildenbrand
2022-08-24 9:51 ` Kefeng Wang
2022-08-25 2:34 ` Kefeng Wang
0 siblings, 2 replies; 8+ messages in thread
From: David Hildenbrand @ 2022-08-24 8:24 UTC (permalink / raw)
To: Kefeng Wang, Andrew Morton, linux-mm; +Cc: muchun.song, linux-kernel
On 24.08.22 09:19, Kefeng Wang wrote:
> The pgdat->kswapd could be accessed concurrently by kswapd_run() and
> kcompactd(), it don't be protected by any lock, which could leads to
> data races, adding READ/WRITE_ONCE() to slince it.
Okay, I think this patch makes it clearer that we really just want
proper synchronization instead of hacking around it.
What speaks against protecting pgdat->kswapd using a proper
locking primitive?
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
> mm/compaction.c | 4 +++-
> mm/vmscan.c | 8 ++++----
> 2 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 640fa76228dd..aa1cfe47f046 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1983,7 +1983,9 @@ static inline bool is_via_compact_memory(int order)
>
> static bool kswapd_is_running(pg_data_t *pgdat)
> {
> - return pgdat->kswapd && task_is_running(pgdat->kswapd);
> + struct task_struct *t = READ_ONCE(pgdat->kswapd);
> +
> + return t && task_is_running(t);
> }
>
> /*
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 08c6497f76c3..65b19ca8c8ee 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4644,7 +4644,7 @@ void kswapd_run(int nid)
> pg_data_t *pgdat = NODE_DATA(nid);
> struct task_struct *t;
>
> - if (pgdat->kswapd)
> + if (READ_ONCE(pgdat->kswapd))
> return;
>
> t = kthread_run(kswapd, pgdat, "kswapd%d", nid);
> @@ -4653,7 +4653,7 @@ void kswapd_run(int nid)
> BUG_ON(system_state < SYSTEM_RUNNING);
> pr_err("Failed to start kswapd on node %d\n", nid);
> } else {
> - pgdat->kswapd = t;
> + WRITE_ONCE(pgdat->kswapd, t);
> }
> }
>
> @@ -4663,11 +4663,11 @@ void kswapd_run(int nid)
> */
> void kswapd_stop(int nid)
> {
> - struct task_struct *kswapd = NODE_DATA(nid)->kswapd;
> + struct task_struct *kswapd = READ_ONCE(NODE_DATA(nid)->kswapd);
>
> if (kswapd) {
> kthread_stop(kswapd);
> - NODE_DATA(nid)->kswapd = NULL;
> + WRITE_ONCE(NODE_DATA(nid)->kswapd, NULL);
> }
> }
>
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] mm: slince possible data races about pgdat->kswapd
2022-08-24 8:24 ` David Hildenbrand
@ 2022-08-24 9:51 ` Kefeng Wang
2022-08-25 2:34 ` Kefeng Wang
1 sibling, 0 replies; 8+ messages in thread
From: Kefeng Wang @ 2022-08-24 9:51 UTC (permalink / raw)
To: David Hildenbrand, Andrew Morton, linux-mm; +Cc: muchun.song, linux-kernel
On 2022/8/24 16:24, David Hildenbrand wrote:
> On 24.08.22 09:19, Kefeng Wang wrote:
>> The pgdat->kswapd could be accessed concurrently by kswapd_run() and
>> kcompactd(), it don't be protected by any lock, which could leads to
>> data races, adding READ/WRITE_ONCE() to slince it.
> Okay, I think this patch here makes it clearer that we really just want
> proper synchronization instead of hacking around it.
>
> What speaks against protecting pgdat->kswapd this using some proper
> locking primitive?
So, add a new lock to struct pglist_data to protect pgdat->kswapd? Or is
there another option? Thanks.
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
>> mm/compaction.c | 4 +++-
>> mm/vmscan.c | 8 ++++----
>> 2 files changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 640fa76228dd..aa1cfe47f046 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1983,7 +1983,9 @@ static inline bool is_via_compact_memory(int order)
>>
>> static bool kswapd_is_running(pg_data_t *pgdat)
>> {
>> - return pgdat->kswapd && task_is_running(pgdat->kswapd);
>> + struct task_struct *t = READ_ONCE(pgdat->kswapd);
>> +
>> + return t && task_is_running(t);
>> }
>>
>> /*
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 08c6497f76c3..65b19ca8c8ee 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -4644,7 +4644,7 @@ void kswapd_run(int nid)
>> pg_data_t *pgdat = NODE_DATA(nid);
>> struct task_struct *t;
>>
>> - if (pgdat->kswapd)
>> + if (READ_ONCE(pgdat->kswapd))
>> return;
>>
>> t = kthread_run(kswapd, pgdat, "kswapd%d", nid);
>> @@ -4653,7 +4653,7 @@ void kswapd_run(int nid)
>> BUG_ON(system_state < SYSTEM_RUNNING);
>> pr_err("Failed to start kswapd on node %d\n", nid);
>> } else {
>> - pgdat->kswapd = t;
>> + WRITE_ONCE(pgdat->kswapd, t);
>> }
>> }
>>
>> @@ -4663,11 +4663,11 @@ void kswapd_run(int nid)
>> */
>> void kswapd_stop(int nid)
>> {
>> - struct task_struct *kswapd = NODE_DATA(nid)->kswapd;
>> + struct task_struct *kswapd = READ_ONCE(NODE_DATA(nid)->kswapd);
>>
>> if (kswapd) {
>> kthread_stop(kswapd);
>> - NODE_DATA(nid)->kswapd = NULL;
>> + WRITE_ONCE(NODE_DATA(nid)->kswapd, NULL);
>> }
>> }
>>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] mm: slince possible data races about pgdat->kswapd
2022-08-24 8:24 ` David Hildenbrand
2022-08-24 9:51 ` Kefeng Wang
@ 2022-08-25 2:34 ` Kefeng Wang
2022-08-25 8:22 ` David Hildenbrand
1 sibling, 1 reply; 8+ messages in thread
From: Kefeng Wang @ 2022-08-25 2:34 UTC (permalink / raw)
To: David Hildenbrand, Andrew Morton, linux-mm; +Cc: muchun.song, linux-kernel
On 2022/8/24 16:24, David Hildenbrand wrote:
> On 24.08.22 09:19, Kefeng Wang wrote:
>> The pgdat->kswapd could be accessed concurrently by kswapd_run() and
>> kcompactd(), it don't be protected by any lock, which could leads to
>> data races, adding READ/WRITE_ONCE() to slince it.
> Okay, I think this patch here makes it clearer that we really just want
> proper synchronization instead of hacking around it.
>
> What speaks against protecting pgdat->kswapd this using some proper
> locking primitive?
According to the comment on kswapd in struct pglist_data, pgdat->kswapd
should be protected by mem_hotplug_begin()/mem_hotplug_done(). How about
doing it this way?
diff --git a/mm/compaction.c b/mm/compaction.c
index 640fa76228dd..62018f35242a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1983,7 +1983,13 @@ static inline bool is_via_compact_memory(int order)

 static bool kswapd_is_running(pg_data_t *pgdat)
 {
-	return pgdat->kswapd && task_is_running(pgdat->kswapd);
+	bool running;
+
+	mem_hotplug_begin();
+	running = pgdat->kswapd && task_is_running(pgdat->kswapd);
+	mem_hotplug_done();
+
+	return running;
 }
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] mm: slince possible data races about pgdat->kswapd
2022-08-25 2:34 ` Kefeng Wang
@ 2022-08-25 8:22 ` David Hildenbrand
2022-08-25 9:48 ` Kefeng Wang
0 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2022-08-25 8:22 UTC (permalink / raw)
To: Kefeng Wang, Andrew Morton, linux-mm; +Cc: muchun.song, linux-kernel
On 25.08.22 04:34, Kefeng Wang wrote:
>
> On 2022/8/24 16:24, David Hildenbrand wrote:
>> On 24.08.22 09:19, Kefeng Wang wrote:
>>> The pgdat->kswapd could be accessed concurrently by kswapd_run() and
>>> kcompactd(), it don't be protected by any lock, which could leads to
>>> data races, adding READ/WRITE_ONCE() to slince it.
>> Okay, I think this patch here makes it clearer that we really just want
>> proper synchronization instead of hacking around it.
>>
>> What speaks against protecting pgdat->kswapd this using some proper
>> locking primitive?
>
> as comments about kswapd in struct pglist_data, pgdat->kswapd should be
>
> protected by mem_hotplug_begin/done(), how about this way?
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 640fa76228dd..62018f35242a 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1983,7 +1983,13 @@ static inline bool is_via_compact_memory(int order)
>
> static bool kswapd_is_running(pg_data_t *pgdat)
> {
> - return pgdat->kswapd && task_is_running(pgdat->kswapd);
> + bool running;
> +
> + mem_hotplug_begin();
> + running = pgdat->kswapd && task_is_running(pgdat->kswapd);
> + mem_hotplug_end();
> +
> + return running;
> }
I'd much rather just use a dedicated lock that does not involve memory
hotplug.
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] mm: slince possible data races about pgdat->kswapd
2022-08-25 8:22 ` David Hildenbrand
@ 2022-08-25 9:48 ` Kefeng Wang
0 siblings, 0 replies; 8+ messages in thread
From: Kefeng Wang @ 2022-08-25 9:48 UTC (permalink / raw)
To: David Hildenbrand, Andrew Morton, linux-mm; +Cc: muchun.song, linux-kernel
On 2022/8/25 16:22, David Hildenbrand wrote:
> On 25.08.22 04:34, Kefeng Wang wrote:
>> On 2022/8/24 16:24, David Hildenbrand wrote:
>>> On 24.08.22 09:19, Kefeng Wang wrote:
>>>> The pgdat->kswapd could be accessed concurrently by kswapd_run() and
>>>> kcompactd(), it don't be protected by any lock, which could leads to
>>>> data races, adding READ/WRITE_ONCE() to slince it.
>>> Okay, I think this patch here makes it clearer that we really just want
>>> proper synchronization instead of hacking around it.
>>>
>>> What speaks against protecting pgdat->kswapd this using some proper
>>> locking primitive?
>> as comments about kswapd in struct pglist_data, pgdat->kswapd should be
>>
>> protected by mem_hotplug_begin/done(), how about this way?
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 640fa76228dd..62018f35242a 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1983,7 +1983,13 @@ static inline bool is_via_compact_memory(int order)
>>
>> static bool kswapd_is_running(pg_data_t *pgdat)
>> {
>> - return pgdat->kswapd && task_is_running(pgdat->kswapd);
>> + bool running;
>> +
>> + mem_hotplug_begin();
>> + running = pgdat->kswapd && task_is_running(pgdat->kswapd);
>> + mem_hotplug_end();
>> +
>> + return running;
>> }
> I'd much rather just use a dedicated lock that does not involve memory
> hotplug.
The issue only occurs because of memory hotplug; without it, kswapd is
never stopped or restarted, so the issue above cannot happen. Adding a new
lock would duplicate that protection, although its scope would be smaller.
I can repost with a new lock if there are no further comments.
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2022-08-25 9:52 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
2022-08-24 7:19 [PATCH 1/2] mm: fix null-ptr-deref in kswapd_is_running() Kefeng Wang
2022-08-24 7:19 ` [PATCH 2/2] mm: slince possible data races about pgdat->kswapd Kefeng Wang
2022-08-24 8:24 ` David Hildenbrand
2022-08-24 9:51 ` Kefeng Wang
2022-08-25 2:34 ` Kefeng Wang
2022-08-25 8:22 ` David Hildenbrand
2022-08-25 9:48 ` Kefeng Wang
2022-08-24 7:56 ` [PATCH 1/2] mm: fix null-ptr-deref in kswapd_is_running() David Hildenbrand