All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty()
@ 2023-01-29  2:44 Kefeng Wang
  2023-01-29  4:09 ` [PATCH resend] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty_slowpath() Kefeng Wang
  2023-01-29 21:48 ` [PATCH] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty() Andrew Morton
  0 siblings, 2 replies; 11+ messages in thread
From: Kefeng Wang @ 2023-01-29  2:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Tejun Heo, Jens Axboe, Jan Kara, Shakeel Butt, Naoya Horiguchi,
	linux-kernel, linux-mm, Kefeng Wang, Ma Wupeng

As commit 18365225f044 ("hwpoison, memcg: forcibly uncharge LRU pages"),
hwpoison will forcibly uncharg a LRU hwpoisoned page, the folio_memcg
could be NULl, then, mem_cgroup_track_foreign_dirty_slowpath() could
occurs a NULL pointer dereference, let's do not record the foreign
writebacks for folio memcg is null in mem_cgroup_track_foreign() to
fix it.

Reported-by: Ma Wupeng <mawupeng1@huawei.com>
Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 include/linux/memcontrol.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index eb6e5b18e1ad..35478695cabf 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1688,10 +1688,13 @@ void mem_cgroup_track_foreign_dirty_slowpath(struct folio *folio,
 static inline void mem_cgroup_track_foreign_dirty(struct folio *folio,
 						  struct bdi_writeback *wb)
 {
+	struct mem_cgroup *memcg;
+
 	if (mem_cgroup_disabled())
 		return;
 
-	if (unlikely(&folio_memcg(folio)->css != wb->memcg_css))
+	memcg = folio_memcg(folio);
+	if (unlikely(memcg && &memcg->css != wb->memcg_css))
 		mem_cgroup_track_foreign_dirty_slowpath(folio, wb);
 }
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH resend] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty_slowpath()
  2023-01-29  2:44 [PATCH] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty() Kefeng Wang
@ 2023-01-29  4:09 ` Kefeng Wang
  2023-01-29 10:38   ` mikoxyzzz
  2023-01-29 21:48 ` [PATCH] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty() Andrew Morton
  1 sibling, 1 reply; 11+ messages in thread
From: Kefeng Wang @ 2023-01-29  4:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Tejun Heo, Jens Axboe, Jan Kara, Shakeel Butt, Naoya Horiguchi,
	linux-kernel, linux-mm, Kefeng Wang, Ma Wupeng

As commit 18365225f044 ("hwpoison, memcg: forcibly uncharge LRU pages"),
hwpoison will forcibly uncharg a LRU hwpoisoned page, the folio_memcg
could be NULl, then, mem_cgroup_track_foreign_dirty_slowpath() could
occurs a NULL pointer dereference, let's do not record the foreign
writebacks for folio memcg is null in mem_cgroup_track_foreign_dirty()
to fix it.

Reported-by: Ma Wupeng <mawupeng1@huawei.com>
Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
resend: correct function name
 include/linux/memcontrol.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index eb6e5b18e1ad..35478695cabf 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1688,10 +1688,13 @@ void mem_cgroup_track_foreign_dirty_slowpath(struct folio *folio,
 static inline void mem_cgroup_track_foreign_dirty(struct folio *folio,
 						  struct bdi_writeback *wb)
 {
+	struct mem_cgroup *memcg;
+
 	if (mem_cgroup_disabled())
 		return;
 
-	if (unlikely(&folio_memcg(folio)->css != wb->memcg_css))
+	memcg = folio_memcg(folio);
+	if (unlikely(memcg && &memcg->css != wb->memcg_css))
 		mem_cgroup_track_foreign_dirty_slowpath(folio, wb);
 }
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH resend] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty_slowpath()
  2023-01-29  4:09 ` [PATCH resend] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty_slowpath() Kefeng Wang
@ 2023-01-29 10:38   ` mikoxyzzz
  0 siblings, 0 replies; 11+ messages in thread
From: mikoxyzzz @ 2023-01-29 10:38 UTC (permalink / raw)
  To: Kefeng Wang, Andrew Morton
  Cc: Tejun Heo, Jens Axboe, Jan Kara, Shakeel Butt, Naoya Horiguchi,
	linux-kernel, linux-mm, Ma Wupeng

On Sun, 2023-01-29 at 12:09 +0800, Kefeng Wang wrote:
> As commit 18365225f044 ("hwpoison, memcg: forcibly uncharge LRU
> pages"),
> hwpoison will forcibly uncharg a LRU hwpoisoned page, the folio_memcg
> could be NULl, then, mem_cgroup_track_foreign_dirty_slowpath() could
> occurs a NULL pointer dereference, let's do not record the foreign
> writebacks for folio memcg is null in
> mem_cgroup_track_foreign_dirty()
> to fix it.
> 
> Reported-by: Ma Wupeng <mawupeng1@huawei.com>
> Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty
> flushing")
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
> resend: correct function name
>  include/linux/memcontrol.h | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index eb6e5b18e1ad..35478695cabf 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -1688,10 +1688,13 @@ void
> mem_cgroup_track_foreign_dirty_slowpath(struct folio *folio,
>  static inline void mem_cgroup_track_foreign_dirty(struct folio
> *folio,
>                                                   struct
> bdi_writeback *wb)
>  {
> +       struct mem_cgroup *memcg;
> +
>         if (mem_cgroup_disabled())
>                 return;
>  
> -       if (unlikely(&folio_memcg(folio)->css != wb->memcg_css))
> +       memcg = folio_memcg(folio);
> +       if (unlikely(memcg && &memcg->css != wb->memcg_css))
>                 mem_cgroup_track_foreign_dirty_slowpath(folio, wb);
>  }
>  

Might want to Cc linux-stable.
Tested-by: Miko Larsson <mikoxyzzz@gmail.com>

--
~miko

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty()
  2023-01-29  2:44 [PATCH] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty() Kefeng Wang
  2023-01-29  4:09 ` [PATCH resend] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty_slowpath() Kefeng Wang
@ 2023-01-29 21:48 ` Andrew Morton
  2023-01-30  1:16   ` Kefeng Wang
  1 sibling, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2023-01-29 21:48 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Tejun Heo, Jens Axboe, Jan Kara, Shakeel Butt, Naoya Horiguchi,
	linux-kernel, linux-mm, Ma Wupeng, Michal Hocko

On Sun, 29 Jan 2023 10:44:51 +0800 Kefeng Wang <wangkefeng.wang@huawei.com> wrote:

> As commit 18365225f044 ("hwpoison, memcg: forcibly uncharge LRU pages"),

Merged in 2017.

> hwpoison will forcibly uncharg a LRU hwpoisoned page, the folio_memcg
> could be NULl, then, mem_cgroup_track_foreign_dirty_slowpath() could
> occurs a NULL pointer dereference, let's do not record the foreign
> writebacks for folio memcg is null in mem_cgroup_track_foreign() to
> fix it.
> 
> Reported-by: Ma Wupeng <mawupeng1@huawei.com>
> Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing")

Merged in 2019.

> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -1688,10 +1688,13 @@ void mem_cgroup_track_foreign_dirty_slowpath(struct folio *folio,
>  static inline void mem_cgroup_track_foreign_dirty(struct folio *folio,
>  						  struct bdi_writeback *wb)
>  {
> +	struct mem_cgroup *memcg;
> +
>  	if (mem_cgroup_disabled())
>  		return;
>  
> -	if (unlikely(&folio_memcg(folio)->css != wb->memcg_css))
> +	memcg = folio_memcg(folio);
> +	if (unlikely(memcg && &memcg->css != wb->memcg_css))
>  		mem_cgroup_track_foreign_dirty_slowpath(folio, wb);
>  }

Has this null deref actually been observed, or is this from code
inspection?  (This is why it's nice to include the Link: after a
Reported-by!)

Do we have any theories why this took so many years to surface?

I'm confused about the mention of 18365225f044, but the Fixes: target
is a different commit.  Please explain this?

Do you think the fix should be backported into earlier -stable kernels?
If so, it will need some rework due to the subsequent folio
conversion.




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty()
  2023-01-29 21:48 ` [PATCH] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty() Andrew Morton
@ 2023-01-30  1:16   ` Kefeng Wang
  2023-01-30  8:48     ` Michal Hocko
  0 siblings, 1 reply; 11+ messages in thread
From: Kefeng Wang @ 2023-01-30  1:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Tejun Heo, Jens Axboe, Jan Kara, Shakeel Butt, Naoya Horiguchi,
	linux-kernel, linux-mm, Ma Wupeng, Michal Hocko



On 2023/1/30 5:48, Andrew Morton wrote:
> On Sun, 29 Jan 2023 10:44:51 +0800 Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> 
>> As commit 18365225f044 ("hwpoison, memcg: forcibly uncharge LRU pages"),
> 
> Merged in 2017.
> 
>> hwpoison will forcibly uncharg a LRU hwpoisoned page, the folio_memcg
>> could be NULl, then, mem_cgroup_track_foreign_dirty_slowpath() could
>> occurs a NULL pointer dereference, let's do not record the foreign
>> writebacks for folio memcg is null in mem_cgroup_track_foreign() to
>> fix it.
>>
>> Reported-by: Ma Wupeng <mawupeng1@huawei.com>
>> Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing")
> 
> Merged in 2019.
> 
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -1688,10 +1688,13 @@ void mem_cgroup_track_foreign_dirty_slowpath(struct folio *folio,
>>   static inline void mem_cgroup_track_foreign_dirty(struct folio *folio,
>>   						  struct bdi_writeback *wb)
>>   {
>> +	struct mem_cgroup *memcg;
>> +
>>   	if (mem_cgroup_disabled())
>>   		return;
>>   
>> -	if (unlikely(&folio_memcg(folio)->css != wb->memcg_css))
>> +	memcg = folio_memcg(folio);
>> +	if (unlikely(memcg && &memcg->css != wb->memcg_css))
>>   		mem_cgroup_track_foreign_dirty_slowpath(folio, wb);
>>   }
> 
> Has this null deref actually been observed, or is this from code
> inspection?  (This is why it's nice to include the Link: after a
> Reported-by!)

It does occurs in our internal test and report by wupeng(based v5.10),

BUG: KASAN: user-memory-access in 
mem_cgroup_track_foreign_dirty_slowpath+0xc0/0x480 mm/memcontrol.c:4708
Read of size 8 at addr 0000000000001000 by task syz-executor.2/28325

CPU: 2 PID: 28325 Comm: syz-executor.2 Tainted: G        W 
5.10.0-03333-g48e46a146cbc-dirty #1
Hardware name: linux,dummy-virt (DT)
Call trace:
  ...
  mem_cgroup_track_foreign_dirty_slowpath+0xc0/0x480 mm/memcontrol.c:4708
  mem_cgroup_track_foreign_dirty include/linux/memcontrol.h:1880 [inline]
  account_page_dirtied+0x9a0/0xa90 mm/page-writeback.c:2436
  __set_page_dirty+0x1f8/0x4b0 fs/buffer.c:608
  __set_page_dirty_buffers+0x3d0/0x550 fs/buffer.c:668
  set_page_dirty+0x158/0x500 mm/page-writeback.c:2575
  filemap_page_mkwrite+0x3dc/0x490 mm/filemap.c:3224
  do_page_mkwrite+0xc4/0x3d0 mm/memory.c:2786
  wp_page_shared+0x14c/0x980 mm/memory.c:3118
  do_wp_page+0x930/0xbc0 mm/memory.c:3219
  handle_pte_fault+0x5e0/0x630 mm/memory.c:4570
  __handle_mm_fault+0x41c/0x910 mm/memory.c:4690
  handle_mm_fault+0x25c/0x484 mm/memory.c:4788
  __do_page_fault arch/arm64/mm/fault.c:440 [inline]
  do_page_fault+0x3ac/0x9d4 arch/arm64/mm/fault.c:539

> 
> Do we have any theories why this took so many years to surface?

After google, I found a similar issue[1], maybe 
hwpoison/mem_cgroup_track_foreign_dirty concurrency is uncommon.

[1] https://syzkaller.appspot.com/bug?extid=0c84bf23aed8ee0d8399

> 
> I'm confused about the mention of 18365225f044, but the Fixes: target
> is a different commit.  Please explain this?

18365225f044 said that it will uncharge it manually from its memcg in 
hwpison handler, but when the new feature "writeback, memcg: Implement 
foreign dirty flushing" is introduced, we don't consider this, when
folio's memcg is cleared by hwpison handler, the issue occurs.

> 
> Do you think the fix should be backported into earlier -stable kernels?

it's better to send stable kernel.

> If so, it will need some rework due to the subsequent folio
> conversion.

When the patch is merged, I could refresh and send to stable maillist.

> 
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty()
  2023-01-30  1:16   ` Kefeng Wang
@ 2023-01-30  8:48     ` Michal Hocko
  2023-01-30 12:20       ` Kefeng Wang
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2023-01-30  8:48 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Andrew Morton, Tejun Heo, Jens Axboe, Jan Kara, Shakeel Butt,
	Naoya Horiguchi, linux-kernel, linux-mm, Ma Wupeng

On Mon 30-01-23 09:16:13, Kefeng Wang wrote:
> 
> 
> On 2023/1/30 5:48, Andrew Morton wrote:
> > On Sun, 29 Jan 2023 10:44:51 +0800 Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> > 
> > > As commit 18365225f044 ("hwpoison, memcg: forcibly uncharge LRU pages"),
> > 
> > Merged in 2017.
> > 
> > > hwpoison will forcibly uncharg a LRU hwpoisoned page, the folio_memcg
> > > could be NULl, then, mem_cgroup_track_foreign_dirty_slowpath() could
> > > occurs a NULL pointer dereference, let's do not record the foreign
> > > writebacks for folio memcg is null in mem_cgroup_track_foreign() to
> > > fix it.
> > > 
> > > Reported-by: Ma Wupeng <mawupeng1@huawei.com>
> > > Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing")
> > 
> > Merged in 2019.
> > 
> > > --- a/include/linux/memcontrol.h
> > > +++ b/include/linux/memcontrol.h
> > > @@ -1688,10 +1688,13 @@ void mem_cgroup_track_foreign_dirty_slowpath(struct folio *folio,
> > >   static inline void mem_cgroup_track_foreign_dirty(struct folio *folio,
> > >   						  struct bdi_writeback *wb)
> > >   {
> > > +	struct mem_cgroup *memcg;
> > > +
> > >   	if (mem_cgroup_disabled())
> > >   		return;
> > > -	if (unlikely(&folio_memcg(folio)->css != wb->memcg_css))
> > > +	memcg = folio_memcg(folio);
> > > +	if (unlikely(memcg && &memcg->css != wb->memcg_css))
> > >   		mem_cgroup_track_foreign_dirty_slowpath(folio, wb);
> > >   }
> > 
> > Has this null deref actually been observed, or is this from code
> > inspection?  (This is why it's nice to include the Link: after a
> > Reported-by!)
> 
> It does occurs in our internal test and report by wupeng(based v5.10),
> 
> BUG: KASAN: user-memory-access in
> mem_cgroup_track_foreign_dirty_slowpath+0xc0/0x480 mm/memcontrol.c:4708
> Read of size 8 at addr 0000000000001000 by task syz-executor.2/28325
> 
> CPU: 2 PID: 28325 Comm: syz-executor.2 Tainted: G        W
> 5.10.0-03333-g48e46a146cbc-dirty #1
> Hardware name: linux,dummy-virt (DT)
> Call trace:
>  ...
>  mem_cgroup_track_foreign_dirty_slowpath+0xc0/0x480 mm/memcontrol.c:4708
>  mem_cgroup_track_foreign_dirty include/linux/memcontrol.h:1880 [inline]
>  account_page_dirtied+0x9a0/0xa90 mm/page-writeback.c:2436
>  __set_page_dirty+0x1f8/0x4b0 fs/buffer.c:608
>  __set_page_dirty_buffers+0x3d0/0x550 fs/buffer.c:668
>  set_page_dirty+0x158/0x500 mm/page-writeback.c:2575
>  filemap_page_mkwrite+0x3dc/0x490 mm/filemap.c:3224
>  do_page_mkwrite+0xc4/0x3d0 mm/memory.c:2786
>  wp_page_shared+0x14c/0x980 mm/memory.c:3118
>  do_wp_page+0x930/0xbc0 mm/memory.c:3219
>  handle_pte_fault+0x5e0/0x630 mm/memory.c:4570
>  __handle_mm_fault+0x41c/0x910 mm/memory.c:4690
>  handle_mm_fault+0x25c/0x484 mm/memory.c:4788
>  __do_page_fault arch/arm64/mm/fault.c:440 [inline]
>  do_page_fault+0x3ac/0x9d4 arch/arm64/mm/fault.c:539

Just to make sure I understand. The page has been hwpoisoned, uncharged
but stayed in the page cache so a next page fault on the address has blowned
up?

Say we address the NULL memcg case. What is the resulting behavior?
Doesn't userspace access a poisoned page and get a silend memory
corruption?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty()
  2023-01-30  8:48     ` Michal Hocko
@ 2023-01-30 12:20       ` Kefeng Wang
  2023-01-30 13:02         ` Michal Hocko
  2023-01-30 19:30         ` Yang Shi
  0 siblings, 2 replies; 11+ messages in thread
From: Kefeng Wang @ 2023-01-30 12:20 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Tejun Heo, Jens Axboe, Jan Kara, Shakeel Butt,
	Naoya Horiguchi, linux-kernel, linux-mm, Ma Wupeng, shy828301



On 2023/1/30 16:48, Michal Hocko wrote:
> On Mon 30-01-23 09:16:13, Kefeng Wang wrote:
>>
>>
>> On 2023/1/30 5:48, Andrew Morton wrote:
>>> On Sun, 29 Jan 2023 10:44:51 +0800 Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>>
>>>> As commit 18365225f044 ("hwpoison, memcg: forcibly uncharge LRU pages"),
>>>
>>> Merged in 2017.
>>>
>>>> hwpoison will forcibly uncharg a LRU hwpoisoned page, the folio_memcg
>>>> could be NULl, then, mem_cgroup_track_foreign_dirty_slowpath() could
>>>> occurs a NULL pointer dereference, let's do not record the foreign
>>>> writebacks for folio memcg is null in mem_cgroup_track_foreign() to
>>>> fix it.
>>>>
>>>> Reported-by: Ma Wupeng <mawupeng1@huawei.com>
>>>> Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing")
>>>
>>> Merged in 2019.
>>>
...
> 
> Just to make sure I understand. The page has been hwpoisoned, uncharged
> but stayed in the page cache so a next page fault on the address has blowned
> up?
> 
> Say we address the NULL memcg case. What is the resulting behavior?
> Doesn't userspace access a poisoned page and get a silend memory
> corruption?

+ Yang Shi

Check previous link[1], seems that it is a known issue, and there is a 
TODO list for storage backed filesystems from Yang.


[1] 
https://lore.kernel.org/all/20211020210755.23964-6-shy828301@gmail.com/T/#m1d40559ca2dcf94396df5369214288f69dec379b

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty()
  2023-01-30 12:20       ` Kefeng Wang
@ 2023-01-30 13:02         ` Michal Hocko
  2023-01-30 19:30         ` Yang Shi
  1 sibling, 0 replies; 11+ messages in thread
From: Michal Hocko @ 2023-01-30 13:02 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Andrew Morton, Tejun Heo, Jens Axboe, Jan Kara, Shakeel Butt,
	Naoya Horiguchi, linux-kernel, linux-mm, Ma Wupeng, shy828301

On Mon 30-01-23 20:20:16, Kefeng Wang wrote:
> 
> 
> On 2023/1/30 16:48, Michal Hocko wrote:
> > On Mon 30-01-23 09:16:13, Kefeng Wang wrote:
> > > 
> > > 
> > > On 2023/1/30 5:48, Andrew Morton wrote:
> > > > On Sun, 29 Jan 2023 10:44:51 +0800 Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> > > > 
> > > > > As commit 18365225f044 ("hwpoison, memcg: forcibly uncharge LRU pages"),
> > > > 
> > > > Merged in 2017.
> > > > 
> > > > > hwpoison will forcibly uncharg a LRU hwpoisoned page, the folio_memcg
> > > > > could be NULl, then, mem_cgroup_track_foreign_dirty_slowpath() could
> > > > > occurs a NULL pointer dereference, let's do not record the foreign
> > > > > writebacks for folio memcg is null in mem_cgroup_track_foreign() to
> > > > > fix it.
> > > > > 
> > > > > Reported-by: Ma Wupeng <mawupeng1@huawei.com>
> > > > > Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing")
> > > > 
> > > > Merged in 2019.
> > > > 
> ...
> > 
> > Just to make sure I understand. The page has been hwpoisoned, uncharged
> > but stayed in the page cache so a next page fault on the address has blowned
> > up?
> > 
> > Say we address the NULL memcg case. What is the resulting behavior?
> > Doesn't userspace access a poisoned page and get a silend memory
> > corruption?
> 
> + Yang Shi
> 
> Check previous link[1], seems that it is a known issue, and there is a TODO
> list for storage backed filesystems from Yang.

OK, so IIUC this patch will just help the test to not blow up but it
will not allow the test to behave consistently. From my past experience
the hwpoisoning is not really something that any production environment
should be relying on working properly.

But this patch is straightforward so no objection from me. 
Acked-by: Michal Hocko <mhocko@suse.com>
Thanks!

> [1] https://lore.kernel.org/all/20211020210755.23964-6-shy828301@gmail.com/T/#m1d40559ca2dcf94396df5369214288f69dec379b

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty()
  2023-01-30 12:20       ` Kefeng Wang
  2023-01-30 13:02         ` Michal Hocko
@ 2023-01-30 19:30         ` Yang Shi
  2023-02-01  8:07           ` Michal Hocko
  1 sibling, 1 reply; 11+ messages in thread
From: Yang Shi @ 2023-01-30 19:30 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Michal Hocko, Andrew Morton, Tejun Heo, Jens Axboe, Jan Kara,
	Shakeel Butt, Naoya Horiguchi, linux-kernel, linux-mm, Ma Wupeng

On Mon, Jan 30, 2023 at 4:20 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>
>
>
> On 2023/1/30 16:48, Michal Hocko wrote:
> > On Mon 30-01-23 09:16:13, Kefeng Wang wrote:
> >>
> >>
> >> On 2023/1/30 5:48, Andrew Morton wrote:
> >>> On Sun, 29 Jan 2023 10:44:51 +0800 Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> >>>
> >>>> As commit 18365225f044 ("hwpoison, memcg: forcibly uncharge LRU pages"),
> >>>
> >>> Merged in 2017.
> >>>
> >>>> hwpoison will forcibly uncharg a LRU hwpoisoned page, the folio_memcg
> >>>> could be NULl, then, mem_cgroup_track_foreign_dirty_slowpath() could
> >>>> occurs a NULL pointer dereference, let's do not record the foreign
> >>>> writebacks for folio memcg is null in mem_cgroup_track_foreign() to
> >>>> fix it.
> >>>>
> >>>> Reported-by: Ma Wupeng <mawupeng1@huawei.com>
> >>>> Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing")
> >>>
> >>> Merged in 2019.
> >>>
> ...
> >
> > Just to make sure I understand. The page has been hwpoisoned, uncharged
> > but stayed in the page cache so a next page fault on the address has blowned
> > up?
> >
> > Say we address the NULL memcg case. What is the resulting behavior?
> > Doesn't userspace access a poisoned page and get a silend memory
> > corruption?
>
> + Yang Shi
>
> Check previous link[1], seems that it is a known issue, and there is a
> TODO list for storage backed filesystems from Yang.

For tmpfs and hugetlbfs, the page cache still stay in page cache, the
later page fault will handle the case gracefully. Other real storage
backed filesystem will have page cache truncated.

The page cache will be uncharged before truncate. If the truncate
fails, we may end up in this case.

>
>
> [1]
> https://lore.kernel.org/all/20211020210755.23964-6-shy828301@gmail.com/T/#m1d40559ca2dcf94396df5369214288f69dec379b

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty()
  2023-01-30 19:30         ` Yang Shi
@ 2023-02-01  8:07           ` Michal Hocko
  2023-02-01 17:21             ` Yang Shi
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2023-02-01  8:07 UTC (permalink / raw)
  To: Yang Shi
  Cc: Kefeng Wang, Andrew Morton, Tejun Heo, Jens Axboe, Jan Kara,
	Shakeel Butt, Naoya Horiguchi, linux-kernel, linux-mm, Ma Wupeng

On Mon 30-01-23 11:30:47, Yang Shi wrote:
> On Mon, Jan 30, 2023 at 4:20 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> >
> >
> >
> > On 2023/1/30 16:48, Michal Hocko wrote:
> > > On Mon 30-01-23 09:16:13, Kefeng Wang wrote:
> > >>
> > >>
> > >> On 2023/1/30 5:48, Andrew Morton wrote:
> > >>> On Sun, 29 Jan 2023 10:44:51 +0800 Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> > >>>
> > >>>> As commit 18365225f044 ("hwpoison, memcg: forcibly uncharge LRU pages"),
> > >>>
> > >>> Merged in 2017.
> > >>>
> > >>>> hwpoison will forcibly uncharg a LRU hwpoisoned page, the folio_memcg
> > >>>> could be NULl, then, mem_cgroup_track_foreign_dirty_slowpath() could
> > >>>> occurs a NULL pointer dereference, let's do not record the foreign
> > >>>> writebacks for folio memcg is null in mem_cgroup_track_foreign() to
> > >>>> fix it.
> > >>>>
> > >>>> Reported-by: Ma Wupeng <mawupeng1@huawei.com>
> > >>>> Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing")
> > >>>
> > >>> Merged in 2019.
> > >>>
> > ...
> > >
> > > Just to make sure I understand. The page has been hwpoisoned, uncharged
> > > but stayed in the page cache so a next page fault on the address has blowned
> > > up?
> > >
> > > Say we address the NULL memcg case. What is the resulting behavior?
> > > Doesn't userspace access a poisoned page and get a silend memory
> > > corruption?
> >
> > + Yang Shi
> >
> > Check previous link[1], seems that it is a known issue, and there is a
> > TODO list for storage backed filesystems from Yang.
> 
> For tmpfs and hugetlbfs, the page cache still stay in page cache, the
> later page fault will handle the case gracefully. Other real storage
> backed filesystem will have page cache truncated.
> 
> The page cache will be uncharged before truncate. If the truncate
> fails, we may end up in this case.

This would be a good addendum to the changelog. What would be a typical
failure in the truncation path?
 
> >
> >
> > [1]
> > https://lore.kernel.org/all/20211020210755.23964-6-shy828301@gmail.com/T/#m1d40559ca2dcf94396df5369214288f69dec379b

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty()
  2023-02-01  8:07           ` Michal Hocko
@ 2023-02-01 17:21             ` Yang Shi
  0 siblings, 0 replies; 11+ messages in thread
From: Yang Shi @ 2023-02-01 17:21 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kefeng Wang, Andrew Morton, Tejun Heo, Jens Axboe, Jan Kara,
	Shakeel Butt, Naoya Horiguchi, linux-kernel, linux-mm, Ma Wupeng

On Wed, Feb 1, 2023 at 12:07 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 30-01-23 11:30:47, Yang Shi wrote:
> > On Mon, Jan 30, 2023 at 4:20 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> > >
> > >
> > >
> > > On 2023/1/30 16:48, Michal Hocko wrote:
> > > > On Mon 30-01-23 09:16:13, Kefeng Wang wrote:
> > > >>
> > > >>
> > > >> On 2023/1/30 5:48, Andrew Morton wrote:
> > > >>> On Sun, 29 Jan 2023 10:44:51 +0800 Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> > > >>>
> > > >>>> As commit 18365225f044 ("hwpoison, memcg: forcibly uncharge LRU pages"),
> > > >>>
> > > >>> Merged in 2017.
> > > >>>
> > > >>>> hwpoison will forcibly uncharg a LRU hwpoisoned page, the folio_memcg
> > > >>>> could be NULl, then, mem_cgroup_track_foreign_dirty_slowpath() could
> > > >>>> occurs a NULL pointer dereference, let's do not record the foreign
> > > >>>> writebacks for folio memcg is null in mem_cgroup_track_foreign() to
> > > >>>> fix it.
> > > >>>>
> > > >>>> Reported-by: Ma Wupeng <mawupeng1@huawei.com>
> > > >>>> Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing")
> > > >>>
> > > >>> Merged in 2019.
> > > >>>
> > > ...
> > > >
> > > > Just to make sure I understand. The page has been hwpoisoned, uncharged
> > > > but stayed in the page cache so a next page fault on the address has blowned
> > > > up?
> > > >
> > > > Say we address the NULL memcg case. What is the resulting behavior?
> > > > Doesn't userspace access a poisoned page and get a silend memory
> > > > corruption?
> > >
> > > + Yang Shi
> > >
> > > Check previous link[1], seems that it is a known issue, and there is a
> > > TODO list for storage backed filesystems from Yang.
> >
> > For tmpfs and hugetlbfs, the page cache still stay in page cache, the
> > later page fault will handle the case gracefully. Other real storage
> > backed filesystem will have page cache truncated.
> >
> > The page cache will be uncharged before truncate. If the truncate
> > fails, we may end up in this case.
>
> This would be a good addendum to the changelog. What would be a typical
> failure in the truncation path?

For memory failure path, there may be a couple of cases, for example,
page is not for a regular file (maybe directory), fail to release
buffers, etc.

>
> > >
> > >
> > > [1]
> > > https://lore.kernel.org/all/20211020210755.23964-6-shy828301@gmail.com/T/#m1d40559ca2dcf94396df5369214288f69dec379b
>
> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-02-01 17:21 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-29  2:44 [PATCH] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty() Kefeng Wang
2023-01-29  4:09 ` [PATCH resend] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty_slowpath() Kefeng Wang
2023-01-29 10:38   ` mikoxyzzz
2023-01-29 21:48 ` [PATCH] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty() Andrew Morton
2023-01-30  1:16   ` Kefeng Wang
2023-01-30  8:48     ` Michal Hocko
2023-01-30 12:20       ` Kefeng Wang
2023-01-30 13:02         ` Michal Hocko
2023-01-30 19:30         ` Yang Shi
2023-02-01  8:07           ` Michal Hocko
2023-02-01 17:21             ` Yang Shi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.