linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
@ 2021-01-25  4:24 Waiman Long
  2021-01-25  4:36 ` Matthew Wilcox
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Waiman Long @ 2021-01-25  4:24 UTC (permalink / raw)
  To: Andrew Morton, Johannes Weiner, Alex Shi
  Cc: linux-mm, linux-kernel, Waiman Long

The commit 3fea5a499d57 ("mm: memcontrol: convert page
cache to a new mem_cgroup_charge() API") introduced a bug in
__add_to_page_cache_locked() causing the following splat:

 [ 1570.068330] page dumped because: VM_BUG_ON_PAGE(page_memcg(page))
 [ 1570.068333] pages's memcg:ffff8889a4116000
 [ 1570.068343] ------------[ cut here ]------------
 [ 1570.068346] kernel BUG at mm/memcontrol.c:2924!
 [ 1570.068355] invalid opcode: 0000 [#1] SMP KASAN PTI
 [ 1570.068359] CPU: 35 PID: 12345 Comm: cat Tainted: G S      W I       5.11.0-rc4-debug+ #1
 [ 1570.068363] Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017
 [ 1570.068365] RIP: 0010:commit_charge+0xf4/0x130
   :
 [ 1570.068375] RSP: 0018:ffff8881b38d70e8 EFLAGS: 00010286
 [ 1570.068379] RAX: 0000000000000000 RBX: ffffea00260ddd00 RCX: 0000000000000027
 [ 1570.068382] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88907ebe05a8
 [ 1570.068384] RBP: ffffea00260ddd00 R08: ffffed120fd7c0b6 R09: ffffed120fd7c0b6
 [ 1570.068386] R10: ffff88907ebe05ab R11: ffffed120fd7c0b5 R12: ffffea00260ddd38
 [ 1570.068389] R13: ffff8889a4116000 R14: ffff8889a4116000 R15: 0000000000000001
 [ 1570.068391] FS:  00007ff039638680(0000) GS:ffff88907ea00000(0000) knlGS:0000000000000000
 [ 1570.068394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [ 1570.068396] CR2: 00007f36f354cc20 CR3: 00000008a0126006 CR4: 00000000007706e0
 [ 1570.068398] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 [ 1570.068400] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 [ 1570.068402] PKRU: 55555554
 [ 1570.068404] Call Trace:
 [ 1570.068407]  mem_cgroup_charge+0x175/0x770
 [ 1570.068413]  __add_to_page_cache_locked+0x712/0xad0
 [ 1570.068439]  add_to_page_cache_lru+0xc5/0x1f0
 [ 1570.068461]  cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles]
 [ 1570.068524]  __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache]
 [ 1570.068540]  __nfs_readpages_from_fscache+0x16d/0x630 [nfs]
 [ 1570.068585]  nfs_readpages+0x24e/0x540 [nfs]
 [ 1570.068693]  read_pages+0x5b1/0xc40
 [ 1570.068711]  page_cache_ra_unbounded+0x460/0x750
 [ 1570.068729]  generic_file_buffered_read_get_pages+0x290/0x1710
 [ 1570.068756]  generic_file_buffered_read+0x2a9/0xc30
 [ 1570.068832]  nfs_file_read+0x13f/0x230 [nfs]
 [ 1570.068872]  new_sync_read+0x3af/0x610
 [ 1570.068901]  vfs_read+0x339/0x4b0
 [ 1570.068909]  ksys_read+0xf1/0x1c0
 [ 1570.068920]  do_syscall_64+0x33/0x40
 [ 1570.068926]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 [ 1570.068930] RIP: 0033:0x7ff039135595

Before that commit, there was a try_charge() and commit_charge()
in __add_to_page_cache_locked(). These 2 separated charge functions
were replaced by a single mem_cgroup_charge(). However, it forgot
to add a matching mem_cgroup_uncharge() when the xarray insertion
failed with the page released back to the pool. Fix this by adding a
mem_cgroup_uncharge() call when insertion error happens.

Fixes: 3fea5a499d57 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")
Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/filemap.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/filemap.c b/mm/filemap.c
index 5c9d564317a5..aa0e0fb04670 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -835,6 +835,7 @@ noinline int __add_to_page_cache_locked(struct page *page,
 	XA_STATE(xas, &mapping->i_pages, offset);
 	int huge = PageHuge(page);
 	int error;
+	bool charged = false;
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(PageSwapBacked(page), page);
@@ -848,6 +849,7 @@ noinline int __add_to_page_cache_locked(struct page *page,
 		error = mem_cgroup_charge(page, current->mm, gfp);
 		if (error)
 			goto error;
+		charged = true;
 	}
 
 	gfp &= GFP_RECLAIM_MASK;
@@ -896,6 +898,8 @@ noinline int __add_to_page_cache_locked(struct page *page,
 
 	if (xas_error(&xas)) {
 		error = xas_error(&xas);
+		if (charged)
+			mem_cgroup_uncharge(page);
 		goto error;
 	}
 
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
  2021-01-25  4:24 [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked() Waiman Long
@ 2021-01-25  4:36 ` Matthew Wilcox
  2021-01-25  6:34   ` Miaohe Lin
  2021-01-25 14:09   ` Waiman Long
  2021-01-25  4:41 ` Alex Shi
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 26+ messages in thread
From: Matthew Wilcox @ 2021-01-25  4:36 UTC (permalink / raw)
  To: Waiman Long
  Cc: Andrew Morton, Johannes Weiner, Alex Shi, linux-mm, linux-kernel

On Sun, Jan 24, 2021 at 11:24:41PM -0500, Waiman Long wrote:
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 5c9d564317a5..aa0e0fb04670 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -835,6 +835,7 @@ noinline int __add_to_page_cache_locked(struct page *page,
>  	XA_STATE(xas, &mapping->i_pages, offset);
>  	int huge = PageHuge(page);
>  	int error;
> +	bool charged = false;

I don't think we need this extra bool.

> @@ -896,6 +898,8 @@ noinline int __add_to_page_cache_locked(struct page *page,
>  
>  	if (xas_error(&xas)) {
>  		error = xas_error(&xas);
> +		if (charged)
> +			mem_cgroup_uncharge(page);
>  		goto error;
>  	}

Better:

-		goto error;
+		goto uncharge;
...
+uncharge:
+	if (!huge)
+		mem_cgroup_uncharge(page);
 error:
...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
  2021-01-25  4:24 [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked() Waiman Long
  2021-01-25  4:36 ` Matthew Wilcox
@ 2021-01-25  4:41 ` Alex Shi
  2021-01-25  6:30 ` Miaohe Lin
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Alex Shi @ 2021-01-25  4:41 UTC (permalink / raw)
  To: Waiman Long, Andrew Morton, Johannes Weiner; +Cc: linux-mm, linux-kernel

Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>

在 2021/1/25 下午12:24, Waiman Long 写道:
> The commit 3fea5a499d57 ("mm: memcontrol: convert page
> cache to a new mem_cgroup_charge() API") introduced a bug in
> __add_to_page_cache_locked() causing the following splat:
> 
>  [ 1570.068330] page dumped because: VM_BUG_ON_PAGE(page_memcg(page))
>  [ 1570.068333] pages's memcg:ffff8889a4116000
>  [ 1570.068343] ------------[ cut here ]------------
>  [ 1570.068346] kernel BUG at mm/memcontrol.c:2924!
>  [ 1570.068355] invalid opcode: 0000 [#1] SMP KASAN PTI
>  [ 1570.068359] CPU: 35 PID: 12345 Comm: cat Tainted: G S      W I       5.11.0-rc4-debug+ #1
>  [ 1570.068363] Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017
>  [ 1570.068365] RIP: 0010:commit_charge+0xf4/0x130
>    :
>  [ 1570.068375] RSP: 0018:ffff8881b38d70e8 EFLAGS: 00010286
>  [ 1570.068379] RAX: 0000000000000000 RBX: ffffea00260ddd00 RCX: 0000000000000027
>  [ 1570.068382] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88907ebe05a8
>  [ 1570.068384] RBP: ffffea00260ddd00 R08: ffffed120fd7c0b6 R09: ffffed120fd7c0b6
>  [ 1570.068386] R10: ffff88907ebe05ab R11: ffffed120fd7c0b5 R12: ffffea00260ddd38
>  [ 1570.068389] R13: ffff8889a4116000 R14: ffff8889a4116000 R15: 0000000000000001
>  [ 1570.068391] FS:  00007ff039638680(0000) GS:ffff88907ea00000(0000) knlGS:0000000000000000
>  [ 1570.068394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  [ 1570.068396] CR2: 00007f36f354cc20 CR3: 00000008a0126006 CR4: 00000000007706e0
>  [ 1570.068398] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  [ 1570.068400] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  [ 1570.068402] PKRU: 55555554
>  [ 1570.068404] Call Trace:
>  [ 1570.068407]  mem_cgroup_charge+0x175/0x770
>  [ 1570.068413]  __add_to_page_cache_locked+0x712/0xad0
>  [ 1570.068439]  add_to_page_cache_lru+0xc5/0x1f0
>  [ 1570.068461]  cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles]
>  [ 1570.068524]  __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache]
>  [ 1570.068540]  __nfs_readpages_from_fscache+0x16d/0x630 [nfs]
>  [ 1570.068585]  nfs_readpages+0x24e/0x540 [nfs]
>  [ 1570.068693]  read_pages+0x5b1/0xc40
>  [ 1570.068711]  page_cache_ra_unbounded+0x460/0x750
>  [ 1570.068729]  generic_file_buffered_read_get_pages+0x290/0x1710
>  [ 1570.068756]  generic_file_buffered_read+0x2a9/0xc30
>  [ 1570.068832]  nfs_file_read+0x13f/0x230 [nfs]
>  [ 1570.068872]  new_sync_read+0x3af/0x610
>  [ 1570.068901]  vfs_read+0x339/0x4b0
>  [ 1570.068909]  ksys_read+0xf1/0x1c0
>  [ 1570.068920]  do_syscall_64+0x33/0x40
>  [ 1570.068926]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>  [ 1570.068930] RIP: 0033:0x7ff039135595
> 
> Before that commit, there was a try_charge() and commit_charge()
> in __add_to_page_cache_locked(). These 2 separated charge functions
> were replaced by a single mem_cgroup_charge(). However, it forgot
> to add a matching mem_cgroup_uncharge() when the xarray insertion
> failed with the page released back to the pool. Fix this by adding a
> mem_cgroup_uncharge() call when insertion error happens.
> 
> Fixes: 3fea5a499d57 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  mm/filemap.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 5c9d564317a5..aa0e0fb04670 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -835,6 +835,7 @@ noinline int __add_to_page_cache_locked(struct page *page,
>  	XA_STATE(xas, &mapping->i_pages, offset);
>  	int huge = PageHuge(page);
>  	int error;
> +	bool charged = false;
>  
>  	VM_BUG_ON_PAGE(!PageLocked(page), page);
>  	VM_BUG_ON_PAGE(PageSwapBacked(page), page);
> @@ -848,6 +849,7 @@ noinline int __add_to_page_cache_locked(struct page *page,
>  		error = mem_cgroup_charge(page, current->mm, gfp);
>  		if (error)
>  			goto error;
> +		charged = true;
>  	}
>  
>  	gfp &= GFP_RECLAIM_MASK;
> @@ -896,6 +898,8 @@ noinline int __add_to_page_cache_locked(struct page *page,
>  
>  	if (xas_error(&xas)) {
>  		error = xas_error(&xas);
> +		if (charged)
> +			mem_cgroup_uncharge(page);
>  		goto error;
>  	}
>  
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
  2021-01-25  4:24 [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked() Waiman Long
  2021-01-25  4:36 ` Matthew Wilcox
  2021-01-25  4:41 ` Alex Shi
@ 2021-01-25  6:30 ` Miaohe Lin
  2021-01-25 14:12   ` Waiman Long
  2021-01-25  8:07 ` Muchun Song
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Miaohe Lin @ 2021-01-25  6:30 UTC (permalink / raw)
  To: Waiman Long
  Cc: linux-mm, linux-kernel, Andrew Morton, Johannes Weiner, Alex Shi

On 2021/1/25 12:24, Waiman Long wrote:
> The commit 3fea5a499d57 ("mm: memcontrol: convert page
> cache to a new mem_cgroup_charge() API") introduced a bug in
> __add_to_page_cache_locked() causing the following splat:
> 
>  [ 1570.068330] page dumped because: VM_BUG_ON_PAGE(page_memcg(page))
>  [ 1570.068333] pages's memcg:ffff8889a4116000
>  [ 1570.068343] ------------[ cut here ]------------
>  [ 1570.068346] kernel BUG at mm/memcontrol.c:2924!
>  [ 1570.068355] invalid opcode: 0000 [#1] SMP KASAN PTI
>  [ 1570.068359] CPU: 35 PID: 12345 Comm: cat Tainted: G S      W I       5.11.0-rc4-debug+ #1
>  [ 1570.068363] Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017
>  [ 1570.068365] RIP: 0010:commit_charge+0xf4/0x130
>    :
>  [ 1570.068375] RSP: 0018:ffff8881b38d70e8 EFLAGS: 00010286
>  [ 1570.068379] RAX: 0000000000000000 RBX: ffffea00260ddd00 RCX: 0000000000000027
>  [ 1570.068382] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88907ebe05a8
>  [ 1570.068384] RBP: ffffea00260ddd00 R08: ffffed120fd7c0b6 R09: ffffed120fd7c0b6
>  [ 1570.068386] R10: ffff88907ebe05ab R11: ffffed120fd7c0b5 R12: ffffea00260ddd38
>  [ 1570.068389] R13: ffff8889a4116000 R14: ffff8889a4116000 R15: 0000000000000001
>  [ 1570.068391] FS:  00007ff039638680(0000) GS:ffff88907ea00000(0000) knlGS:0000000000000000
>  [ 1570.068394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  [ 1570.068396] CR2: 00007f36f354cc20 CR3: 00000008a0126006 CR4: 00000000007706e0
>  [ 1570.068398] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  [ 1570.068400] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  [ 1570.068402] PKRU: 55555554
>  [ 1570.068404] Call Trace:
>  [ 1570.068407]  mem_cgroup_charge+0x175/0x770
>  [ 1570.068413]  __add_to_page_cache_locked+0x712/0xad0
>  [ 1570.068439]  add_to_page_cache_lru+0xc5/0x1f0
>  [ 1570.068461]  cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles]
>  [ 1570.068524]  __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache]
>  [ 1570.068540]  __nfs_readpages_from_fscache+0x16d/0x630 [nfs]
>  [ 1570.068585]  nfs_readpages+0x24e/0x540 [nfs]
>  [ 1570.068693]  read_pages+0x5b1/0xc40
>  [ 1570.068711]  page_cache_ra_unbounded+0x460/0x750
>  [ 1570.068729]  generic_file_buffered_read_get_pages+0x290/0x1710
>  [ 1570.068756]  generic_file_buffered_read+0x2a9/0xc30
>  [ 1570.068832]  nfs_file_read+0x13f/0x230 [nfs]
>  [ 1570.068872]  new_sync_read+0x3af/0x610
>  [ 1570.068901]  vfs_read+0x339/0x4b0
>  [ 1570.068909]  ksys_read+0xf1/0x1c0
>  [ 1570.068920]  do_syscall_64+0x33/0x40
>  [ 1570.068926]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>  [ 1570.068930] RIP: 0033:0x7ff039135595
> 
> Before that commit, there was a try_charge() and commit_charge()
> in __add_to_page_cache_locked(). These 2 separated charge functions
> were replaced by a single mem_cgroup_charge(). However, it forgot
> to add a matching mem_cgroup_uncharge() when the xarray insertion
> failed with the page released back to the pool. Fix this by adding a
> mem_cgroup_uncharge() call when insertion error happens.
> 
> Fixes: 3fea5a499d57 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")
> Signed-off-by: Waiman Long <longman@redhat.com>

Cc stable should be needed.

> ---
>  mm/filemap.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 5c9d564317a5..aa0e0fb04670 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -835,6 +835,7 @@ noinline int __add_to_page_cache_locked(struct page *page,
>  	XA_STATE(xas, &mapping->i_pages, offset);
>  	int huge = PageHuge(page);
>  	int error;
> +	bool charged = false;
>  
>  	VM_BUG_ON_PAGE(!PageLocked(page), page);
>  	VM_BUG_ON_PAGE(PageSwapBacked(page), page);
> @@ -848,6 +849,7 @@ noinline int __add_to_page_cache_locked(struct page *page,
>  		error = mem_cgroup_charge(page, current->mm, gfp);
>  		if (error)
>  			goto error;
> +		charged = true;
>  	}
>  
>  	gfp &= GFP_RECLAIM_MASK;
> @@ -896,6 +898,8 @@ noinline int __add_to_page_cache_locked(struct page *page,
>  
>  	if (xas_error(&xas)) {
>  		error = xas_error(&xas);
> +		if (charged)
> +			mem_cgroup_uncharge(page);
>  		goto error;
>  	}
>  
> 

Looks good to me. Thanks.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
  2021-01-25  4:36 ` Matthew Wilcox
@ 2021-01-25  6:34   ` Miaohe Lin
  2021-01-25 14:09   ` Waiman Long
  1 sibling, 0 replies; 26+ messages in thread
From: Miaohe Lin @ 2021-01-25  6:34 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Johannes Weiner, Alex Shi, linux-mm, linux-kernel,
	Waiman Long

Hi:
On 2021/1/25 12:36, Matthew Wilcox wrote:
> On Sun, Jan 24, 2021 at 11:24:41PM -0500, Waiman Long wrote:
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 5c9d564317a5..aa0e0fb04670 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -835,6 +835,7 @@ noinline int __add_to_page_cache_locked(struct page *page,
>>  	XA_STATE(xas, &mapping->i_pages, offset);
>>  	int huge = PageHuge(page);
>>  	int error;
>> +	bool charged = false;
> 
> I don't think we need this extra bool.
> 
>> @@ -896,6 +898,8 @@ noinline int __add_to_page_cache_locked(struct page *page,
>>  
>>  	if (xas_error(&xas)) {
>>  		error = xas_error(&xas);
>> +		if (charged)
>> +			mem_cgroup_uncharge(page);
>>  		goto error;
>>  	}
> 
> Better:
> 
> -		goto error;
> +		goto uncharge;
> ...
> +uncharge:
> +	if (!huge)

Since commit 7a02fa97b897 ("secretmem: add memcg accounting"), this may be:
	if (!huge && !page_is_secretmem(page))
Or just bool charged is better?
Many thanks.
> +		mem_cgroup_uncharge(page);
>  error:
> ...
> 
> .
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
  2021-01-25  4:24 [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked() Waiman Long
                   ` (2 preceding siblings ...)
  2021-01-25  6:30 ` Miaohe Lin
@ 2021-01-25  8:07 ` Muchun Song
  2021-01-25 15:35   ` Waiman Long
  2021-01-25  9:28 ` Michal Hocko
  2021-01-25 19:32 ` Johannes Weiner
  5 siblings, 1 reply; 26+ messages in thread
From: Muchun Song @ 2021-01-25  8:07 UTC (permalink / raw)
  To: Waiman Long
  Cc: Andrew Morton, Johannes Weiner, Alex Shi, linux-mm, linux-kernel

On Mon, Jan 25, 2021 at 12:29 PM Waiman Long <longman@redhat.com> wrote:
>
> The commit 3fea5a499d57 ("mm: memcontrol: convert page
> cache to a new mem_cgroup_charge() API") introduced a bug in
> __add_to_page_cache_locked() causing the following splat:
>
>  [ 1570.068330] page dumped because: VM_BUG_ON_PAGE(page_memcg(page))
>  [ 1570.068333] pages's memcg:ffff8889a4116000
>  [ 1570.068343] ------------[ cut here ]------------
>  [ 1570.068346] kernel BUG at mm/memcontrol.c:2924!
>  [ 1570.068355] invalid opcode: 0000 [#1] SMP KASAN PTI
>  [ 1570.068359] CPU: 35 PID: 12345 Comm: cat Tainted: G S      W I       5.11.0-rc4-debug+ #1
>  [ 1570.068363] Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017
>  [ 1570.068365] RIP: 0010:commit_charge+0xf4/0x130
>    :
>  [ 1570.068375] RSP: 0018:ffff8881b38d70e8 EFLAGS: 00010286
>  [ 1570.068379] RAX: 0000000000000000 RBX: ffffea00260ddd00 RCX: 0000000000000027
>  [ 1570.068382] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88907ebe05a8
>  [ 1570.068384] RBP: ffffea00260ddd00 R08: ffffed120fd7c0b6 R09: ffffed120fd7c0b6
>  [ 1570.068386] R10: ffff88907ebe05ab R11: ffffed120fd7c0b5 R12: ffffea00260ddd38
>  [ 1570.068389] R13: ffff8889a4116000 R14: ffff8889a4116000 R15: 0000000000000001
>  [ 1570.068391] FS:  00007ff039638680(0000) GS:ffff88907ea00000(0000) knlGS:0000000000000000
>  [ 1570.068394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  [ 1570.068396] CR2: 00007f36f354cc20 CR3: 00000008a0126006 CR4: 00000000007706e0
>  [ 1570.068398] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  [ 1570.068400] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  [ 1570.068402] PKRU: 55555554
>  [ 1570.068404] Call Trace:
>  [ 1570.068407]  mem_cgroup_charge+0x175/0x770
>  [ 1570.068413]  __add_to_page_cache_locked+0x712/0xad0
>  [ 1570.068439]  add_to_page_cache_lru+0xc5/0x1f0
>  [ 1570.068461]  cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles]
>  [ 1570.068524]  __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache]
>  [ 1570.068540]  __nfs_readpages_from_fscache+0x16d/0x630 [nfs]
>  [ 1570.068585]  nfs_readpages+0x24e/0x540 [nfs]
>  [ 1570.068693]  read_pages+0x5b1/0xc40
>  [ 1570.068711]  page_cache_ra_unbounded+0x460/0x750
>  [ 1570.068729]  generic_file_buffered_read_get_pages+0x290/0x1710
>  [ 1570.068756]  generic_file_buffered_read+0x2a9/0xc30
>  [ 1570.068832]  nfs_file_read+0x13f/0x230 [nfs]
>  [ 1570.068872]  new_sync_read+0x3af/0x610
>  [ 1570.068901]  vfs_read+0x339/0x4b0
>  [ 1570.068909]  ksys_read+0xf1/0x1c0
>  [ 1570.068920]  do_syscall_64+0x33/0x40
>  [ 1570.068926]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>  [ 1570.068930] RIP: 0033:0x7ff039135595
>
> Before that commit, there was a try_charge() and commit_charge()
> in __add_to_page_cache_locked(). These 2 separated charge functions
> were replaced by a single mem_cgroup_charge(). However, it forgot
> to add a matching mem_cgroup_uncharge() when the xarray insertion
> failed with the page released back to the pool. Fix this by adding a
> mem_cgroup_uncharge() call when insertion error happens.
>
> Fixes: 3fea5a499d57 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  mm/filemap.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 5c9d564317a5..aa0e0fb04670 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -835,6 +835,7 @@ noinline int __add_to_page_cache_locked(struct page *page,
>         XA_STATE(xas, &mapping->i_pages, offset);
>         int huge = PageHuge(page);
>         int error;
> +       bool charged = false;
>
>         VM_BUG_ON_PAGE(!PageLocked(page), page);
>         VM_BUG_ON_PAGE(PageSwapBacked(page), page);
> @@ -848,6 +849,7 @@ noinline int __add_to_page_cache_locked(struct page *page,
>                 error = mem_cgroup_charge(page, current->mm, gfp);
>                 if (error)
>                         goto error;
> +               charged = true;
>         }
>
>         gfp &= GFP_RECLAIM_MASK;
> @@ -896,6 +898,8 @@ noinline int __add_to_page_cache_locked(struct page *page,
>
>         if (xas_error(&xas)) {
>                 error = xas_error(&xas);
> +               if (charged)

Can "if (!huge)" replace "if (charged)"?

> +                       mem_cgroup_uncharge(page);
>                 goto error;
>         }
>
> --
> 2.18.1
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
  2021-01-25  4:24 [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked() Waiman Long
                   ` (3 preceding siblings ...)
  2021-01-25  8:07 ` Muchun Song
@ 2021-01-25  9:28 ` Michal Hocko
  2021-01-25 15:57   ` Waiman Long
  2021-01-25 19:32 ` Johannes Weiner
  5 siblings, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2021-01-25  9:28 UTC (permalink / raw)
  To: Waiman Long
  Cc: Andrew Morton, Johannes Weiner, Alex Shi, linux-mm, linux-kernel

On Sun 24-01-21 23:24:41, Waiman Long wrote:
> The commit 3fea5a499d57 ("mm: memcontrol: convert page
> cache to a new mem_cgroup_charge() API") introduced a bug in
> __add_to_page_cache_locked() causing the following splat:
> 
>  [ 1570.068330] page dumped because: VM_BUG_ON_PAGE(page_memcg(page))
>  [ 1570.068333] pages's memcg:ffff8889a4116000
>  [ 1570.068343] ------------[ cut here ]------------
>  [ 1570.068346] kernel BUG at mm/memcontrol.c:2924!
>  [ 1570.068355] invalid opcode: 0000 [#1] SMP KASAN PTI
>  [ 1570.068359] CPU: 35 PID: 12345 Comm: cat Tainted: G S      W I       5.11.0-rc4-debug+ #1
>  [ 1570.068363] Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017
>  [ 1570.068365] RIP: 0010:commit_charge+0xf4/0x130
>    :
>  [ 1570.068375] RSP: 0018:ffff8881b38d70e8 EFLAGS: 00010286
>  [ 1570.068379] RAX: 0000000000000000 RBX: ffffea00260ddd00 RCX: 0000000000000027
>  [ 1570.068382] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88907ebe05a8
>  [ 1570.068384] RBP: ffffea00260ddd00 R08: ffffed120fd7c0b6 R09: ffffed120fd7c0b6
>  [ 1570.068386] R10: ffff88907ebe05ab R11: ffffed120fd7c0b5 R12: ffffea00260ddd38
>  [ 1570.068389] R13: ffff8889a4116000 R14: ffff8889a4116000 R15: 0000000000000001
>  [ 1570.068391] FS:  00007ff039638680(0000) GS:ffff88907ea00000(0000) knlGS:0000000000000000
>  [ 1570.068394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  [ 1570.068396] CR2: 00007f36f354cc20 CR3: 00000008a0126006 CR4: 00000000007706e0
>  [ 1570.068398] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  [ 1570.068400] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  [ 1570.068402] PKRU: 55555554
>  [ 1570.068404] Call Trace:
>  [ 1570.068407]  mem_cgroup_charge+0x175/0x770
>  [ 1570.068413]  __add_to_page_cache_locked+0x712/0xad0
>  [ 1570.068439]  add_to_page_cache_lru+0xc5/0x1f0
>  [ 1570.068461]  cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles]
>  [ 1570.068524]  __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache]
>  [ 1570.068540]  __nfs_readpages_from_fscache+0x16d/0x630 [nfs]
>  [ 1570.068585]  nfs_readpages+0x24e/0x540 [nfs]
>  [ 1570.068693]  read_pages+0x5b1/0xc40
>  [ 1570.068711]  page_cache_ra_unbounded+0x460/0x750
>  [ 1570.068729]  generic_file_buffered_read_get_pages+0x290/0x1710
>  [ 1570.068756]  generic_file_buffered_read+0x2a9/0xc30
>  [ 1570.068832]  nfs_file_read+0x13f/0x230 [nfs]
>  [ 1570.068872]  new_sync_read+0x3af/0x610
>  [ 1570.068901]  vfs_read+0x339/0x4b0
>  [ 1570.068909]  ksys_read+0xf1/0x1c0
>  [ 1570.068920]  do_syscall_64+0x33/0x40
>  [ 1570.068926]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>  [ 1570.068930] RIP: 0033:0x7ff039135595
> 
> Before that commit, there was a try_charge() and commit_charge()
> in __add_to_page_cache_locked(). These 2 separated charge functions
> were replaced by a single mem_cgroup_charge(). However, it forgot
> to add a matching mem_cgroup_uncharge() when the xarray insertion
> failed with the page released back to the pool. Fix this by adding a
> mem_cgroup_uncharge() call when insertion error happens.
> 
> Fixes: 3fea5a499d57 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")
> Signed-off-by: Waiman Long <longman@redhat.com>

OK, this is indeed a subtle bug. The patch aimed at simplifying the
charge lifetime so that users do not really have to think about when to
uncharge as that happens when the page is freed. fscache somehow breaks
that assumption because it doesn't free up pages but it keeps some of
them in the cache.

I have tried to wrap my head around the cached object life time in
fscache but failed and got lost in the maze. Is this the only instance
of the problem? Would it make more sense to explicitly handle charges in
the fscache code or there are other potential users to fall into this
trap?

> ---
>  mm/filemap.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 5c9d564317a5..aa0e0fb04670 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -835,6 +835,7 @@ noinline int __add_to_page_cache_locked(struct page *page,
>  	XA_STATE(xas, &mapping->i_pages, offset);
>  	int huge = PageHuge(page);
>  	int error;
> +	bool charged = false;
>  
>  	VM_BUG_ON_PAGE(!PageLocked(page), page);
>  	VM_BUG_ON_PAGE(PageSwapBacked(page), page);
> @@ -848,6 +849,7 @@ noinline int __add_to_page_cache_locked(struct page *page,
>  		error = mem_cgroup_charge(page, current->mm, gfp);
>  		if (error)
>  			goto error;
> +		charged = true;
>  	}
>  
>  	gfp &= GFP_RECLAIM_MASK;
> @@ -896,6 +898,8 @@ noinline int __add_to_page_cache_locked(struct page *page,
>  
>  	if (xas_error(&xas)) {
>  		error = xas_error(&xas);
> +		if (charged)
> +			mem_cgroup_uncharge(page);
>  		goto error;
>  	}
>  
> -- 
> 2.18.1
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
  2021-01-25  4:36 ` Matthew Wilcox
  2021-01-25  6:34   ` Miaohe Lin
@ 2021-01-25 14:09   ` Waiman Long
  1 sibling, 0 replies; 26+ messages in thread
From: Waiman Long @ 2021-01-25 14:09 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Johannes Weiner, Alex Shi, linux-mm, linux-kernel

On 1/24/21 11:36 PM, Matthew Wilcox wrote:
> On Sun, Jan 24, 2021 at 11:24:41PM -0500, Waiman Long wrote:
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 5c9d564317a5..aa0e0fb04670 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -835,6 +835,7 @@ noinline int __add_to_page_cache_locked(struct page *page,
>>   	XA_STATE(xas, &mapping->i_pages, offset);
>>   	int huge = PageHuge(page);
>>   	int error;
>> +	bool charged = false;
> I don't think we need this extra bool.
>
>> @@ -896,6 +898,8 @@ noinline int __add_to_page_cache_locked(struct page *page,
>>   
>>   	if (xas_error(&xas)) {
>>   		error = xas_error(&xas);
>> +		if (charged)
>> +			mem_cgroup_uncharge(page);
>>   		goto error;
>>   	}
> Better:
>
> -		goto error;
> +		goto uncharge;
> ...
> +uncharge:
> +	if (!huge)
> +		mem_cgroup_uncharge(page);
>   error:
> ...
>
That was my original plan. After finding out there was a potentially 
conflicting patch in linux-next:

commit 7a02fa97b897 ("secretmem: add memcg accounting")

@@ -839,7 +840,7 @@ noinline int __add_to_page_cache_locked(struct page 
*page,
         page->mapping = mapping;
         page->index = offset;

-       if (!huge) {
+       if (!huge && !page_is_secretmem(page)) {
                 error = mem_cgroup_charge(page, current->mm, gfp);
                 if (error)
                         goto error;

Adding a boolean is an easy way out without conflicting it.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
  2021-01-25  6:30 ` Miaohe Lin
@ 2021-01-25 14:12   ` Waiman Long
  2021-01-25 23:11     ` Andrew Morton
  0 siblings, 1 reply; 26+ messages in thread
From: Waiman Long @ 2021-01-25 14:12 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: linux-mm, linux-kernel, Andrew Morton, Johannes Weiner, Alex Shi

On 1/25/21 1:30 AM, Miaohe Lin wrote:
> On 2021/1/25 12:24, Waiman Long wrote:
>> The commit 3fea5a499d57 ("mm: memcontrol: convert page
>> cache to a new mem_cgroup_charge() API") introduced a bug in
>> __add_to_page_cache_locked() causing the following splat:
>>
>>   [ 1570.068330] page dumped because: VM_BUG_ON_PAGE(page_memcg(page))
>>   [ 1570.068333] pages's memcg:ffff8889a4116000
>>   [ 1570.068343] ------------[ cut here ]------------
>>   [ 1570.068346] kernel BUG at mm/memcontrol.c:2924!
>>   [ 1570.068355] invalid opcode: 0000 [#1] SMP KASAN PTI
>>   [ 1570.068359] CPU: 35 PID: 12345 Comm: cat Tainted: G S      W I       5.11.0-rc4-debug+ #1
>>   [ 1570.068363] Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017
>>   [ 1570.068365] RIP: 0010:commit_charge+0xf4/0x130
>>     :
>>   [ 1570.068375] RSP: 0018:ffff8881b38d70e8 EFLAGS: 00010286
>>   [ 1570.068379] RAX: 0000000000000000 RBX: ffffea00260ddd00 RCX: 0000000000000027
>>   [ 1570.068382] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88907ebe05a8
>>   [ 1570.068384] RBP: ffffea00260ddd00 R08: ffffed120fd7c0b6 R09: ffffed120fd7c0b6
>>   [ 1570.068386] R10: ffff88907ebe05ab R11: ffffed120fd7c0b5 R12: ffffea00260ddd38
>>   [ 1570.068389] R13: ffff8889a4116000 R14: ffff8889a4116000 R15: 0000000000000001
>>   [ 1570.068391] FS:  00007ff039638680(0000) GS:ffff88907ea00000(0000) knlGS:0000000000000000
>>   [ 1570.068394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>   [ 1570.068396] CR2: 00007f36f354cc20 CR3: 00000008a0126006 CR4: 00000000007706e0
>>   [ 1570.068398] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>   [ 1570.068400] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>   [ 1570.068402] PKRU: 55555554
>>   [ 1570.068404] Call Trace:
>>   [ 1570.068407]  mem_cgroup_charge+0x175/0x770
>>   [ 1570.068413]  __add_to_page_cache_locked+0x712/0xad0
>>   [ 1570.068439]  add_to_page_cache_lru+0xc5/0x1f0
>>   [ 1570.068461]  cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles]
>>   [ 1570.068524]  __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache]
>>   [ 1570.068540]  __nfs_readpages_from_fscache+0x16d/0x630 [nfs]
>>   [ 1570.068585]  nfs_readpages+0x24e/0x540 [nfs]
>>   [ 1570.068693]  read_pages+0x5b1/0xc40
>>   [ 1570.068711]  page_cache_ra_unbounded+0x460/0x750
>>   [ 1570.068729]  generic_file_buffered_read_get_pages+0x290/0x1710
>>   [ 1570.068756]  generic_file_buffered_read+0x2a9/0xc30
>>   [ 1570.068832]  nfs_file_read+0x13f/0x230 [nfs]
>>   [ 1570.068872]  new_sync_read+0x3af/0x610
>>   [ 1570.068901]  vfs_read+0x339/0x4b0
>>   [ 1570.068909]  ksys_read+0xf1/0x1c0
>>   [ 1570.068920]  do_syscall_64+0x33/0x40
>>   [ 1570.068926]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>   [ 1570.068930] RIP: 0033:0x7ff039135595
>>
>> Before that commit, there was a try_charge() and commit_charge()
>> in __add_to_page_cache_locked(). These 2 separated charge functions
>> were replaced by a single mem_cgroup_charge(). However, it forgot
>> to add a matching mem_cgroup_uncharge() when the xarray insertion
>> failed with the page released back to the pool. Fix this by adding a
>> mem_cgroup_uncharge() call when insertion error happens.
>>
>> Fixes: 3fea5a499d57 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")
>> Signed-off-by: Waiman Long <longman@redhat.com>
> Cc stable should be needed.

Yes, this patch should go to stable. I think the stable tree maintainers 
will automatically pick up patches with the "Fixes" tag. That is why I 
don't explicitly put a "cc:stable" line in the patch.

Thanks,
Longman


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
  2021-01-25  8:07 ` Muchun Song
@ 2021-01-25 15:35   ` Waiman Long
  0 siblings, 0 replies; 26+ messages in thread
From: Waiman Long @ 2021-01-25 15:35 UTC (permalink / raw)
  To: Muchun Song
  Cc: Andrew Morton, Johannes Weiner, Alex Shi, linux-mm, linux-kernel

On 1/25/21 3:07 AM, Muchun Song wrote:
> On Mon, Jan 25, 2021 at 12:29 PM Waiman Long <longman@redhat.com> wrote:
>> The commit 3fea5a499d57 ("mm: memcontrol: convert page
>> cache to a new mem_cgroup_charge() API") introduced a bug in
>> __add_to_page_cache_locked() causing the following splat:
>>
>>   [ 1570.068330] page dumped because: VM_BUG_ON_PAGE(page_memcg(page))
>>   [ 1570.068333] pages's memcg:ffff8889a4116000
>>   [ 1570.068343] ------------[ cut here ]------------
>>   [ 1570.068346] kernel BUG at mm/memcontrol.c:2924!
>>   [ 1570.068355] invalid opcode: 0000 [#1] SMP KASAN PTI
>>   [ 1570.068359] CPU: 35 PID: 12345 Comm: cat Tainted: G S      W I       5.11.0-rc4-debug+ #1
>>   [ 1570.068363] Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017
>>   [ 1570.068365] RIP: 0010:commit_charge+0xf4/0x130
>>     :
>>   [ 1570.068375] RSP: 0018:ffff8881b38d70e8 EFLAGS: 00010286
>>   [ 1570.068379] RAX: 0000000000000000 RBX: ffffea00260ddd00 RCX: 0000000000000027
>>   [ 1570.068382] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88907ebe05a8
>>   [ 1570.068384] RBP: ffffea00260ddd00 R08: ffffed120fd7c0b6 R09: ffffed120fd7c0b6
>>   [ 1570.068386] R10: ffff88907ebe05ab R11: ffffed120fd7c0b5 R12: ffffea00260ddd38
>>   [ 1570.068389] R13: ffff8889a4116000 R14: ffff8889a4116000 R15: 0000000000000001
>>   [ 1570.068391] FS:  00007ff039638680(0000) GS:ffff88907ea00000(0000) knlGS:0000000000000000
>>   [ 1570.068394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>   [ 1570.068396] CR2: 00007f36f354cc20 CR3: 00000008a0126006 CR4: 00000000007706e0
>>   [ 1570.068398] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>   [ 1570.068400] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>   [ 1570.068402] PKRU: 55555554
>>   [ 1570.068404] Call Trace:
>>   [ 1570.068407]  mem_cgroup_charge+0x175/0x770
>>   [ 1570.068413]  __add_to_page_cache_locked+0x712/0xad0
>>   [ 1570.068439]  add_to_page_cache_lru+0xc5/0x1f0
>>   [ 1570.068461]  cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles]
>>   [ 1570.068524]  __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache]
>>   [ 1570.068540]  __nfs_readpages_from_fscache+0x16d/0x630 [nfs]
>>   [ 1570.068585]  nfs_readpages+0x24e/0x540 [nfs]
>>   [ 1570.068693]  read_pages+0x5b1/0xc40
>>   [ 1570.068711]  page_cache_ra_unbounded+0x460/0x750
>>   [ 1570.068729]  generic_file_buffered_read_get_pages+0x290/0x1710
>>   [ 1570.068756]  generic_file_buffered_read+0x2a9/0xc30
>>   [ 1570.068832]  nfs_file_read+0x13f/0x230 [nfs]
>>   [ 1570.068872]  new_sync_read+0x3af/0x610
>>   [ 1570.068901]  vfs_read+0x339/0x4b0
>>   [ 1570.068909]  ksys_read+0xf1/0x1c0
>>   [ 1570.068920]  do_syscall_64+0x33/0x40
>>   [ 1570.068926]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>   [ 1570.068930] RIP: 0033:0x7ff039135595
>>
>> Before that commit, there was a try_charge() and commit_charge()
>> in __add_to_page_cache_locked(). These 2 separated charge functions
>> were replaced by a single mem_cgroup_charge(). However, it forgot
>> to add a matching mem_cgroup_uncharge() when the xarray insertion
>> failed with the page released back to the pool. Fix this by adding a
>> mem_cgroup_uncharge() call when insertion error happens.
>>
>> Fixes: 3fea5a499d57 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   mm/filemap.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 5c9d564317a5..aa0e0fb04670 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -835,6 +835,7 @@ noinline int __add_to_page_cache_locked(struct page *page,
>>          XA_STATE(xas, &mapping->i_pages, offset);
>>          int huge = PageHuge(page);
>>          int error;
>> +       bool charged = false;
>>
>>          VM_BUG_ON_PAGE(!PageLocked(page), page);
>>          VM_BUG_ON_PAGE(PageSwapBacked(page), page);
>> @@ -848,6 +849,7 @@ noinline int __add_to_page_cache_locked(struct page *page,
>>                  error = mem_cgroup_charge(page, current->mm, gfp);
>>                  if (error)
>>                          goto error;
>> +               charged = true;
>>          }
>>
>>          gfp &= GFP_RECLAIM_MASK;
>> @@ -896,6 +898,8 @@ noinline int __add_to_page_cache_locked(struct page *page,
>>
>>          if (xas_error(&xas)) {
>>                  error = xas_error(&xas);
>> +               if (charged)
> Can "if (!huge)" replace "if (charged)"?

See my reply to Mathew Wilcox.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
  2021-01-25  9:28 ` Michal Hocko
@ 2021-01-25 15:57   ` Waiman Long
  2021-01-25 16:03     ` Michal Hocko
  0 siblings, 1 reply; 26+ messages in thread
From: Waiman Long @ 2021-01-25 15:57 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Johannes Weiner, Alex Shi, linux-mm, linux-kernel

On 1/25/21 4:28 AM, Michal Hocko wrote:
> On Sun 24-01-21 23:24:41, Waiman Long wrote:
>> The commit 3fea5a499d57 ("mm: memcontrol: convert page
>> cache to a new mem_cgroup_charge() API") introduced a bug in
>> __add_to_page_cache_locked() causing the following splat:
>>
>>   [ 1570.068330] page dumped because: VM_BUG_ON_PAGE(page_memcg(page))
>>   [ 1570.068333] pages's memcg:ffff8889a4116000
>>   [ 1570.068343] ------------[ cut here ]------------
>>   [ 1570.068346] kernel BUG at mm/memcontrol.c:2924!
>>   [ 1570.068355] invalid opcode: 0000 [#1] SMP KASAN PTI
>>   [ 1570.068359] CPU: 35 PID: 12345 Comm: cat Tainted: G S      W I       5.11.0-rc4-debug+ #1
>>   [ 1570.068363] Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017
>>   [ 1570.068365] RIP: 0010:commit_charge+0xf4/0x130
>>     :
>>   [ 1570.068375] RSP: 0018:ffff8881b38d70e8 EFLAGS: 00010286
>>   [ 1570.068379] RAX: 0000000000000000 RBX: ffffea00260ddd00 RCX: 0000000000000027
>>   [ 1570.068382] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88907ebe05a8
>>   [ 1570.068384] RBP: ffffea00260ddd00 R08: ffffed120fd7c0b6 R09: ffffed120fd7c0b6
>>   [ 1570.068386] R10: ffff88907ebe05ab R11: ffffed120fd7c0b5 R12: ffffea00260ddd38
>>   [ 1570.068389] R13: ffff8889a4116000 R14: ffff8889a4116000 R15: 0000000000000001
>>   [ 1570.068391] FS:  00007ff039638680(0000) GS:ffff88907ea00000(0000) knlGS:0000000000000000
>>   [ 1570.068394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>   [ 1570.068396] CR2: 00007f36f354cc20 CR3: 00000008a0126006 CR4: 00000000007706e0
>>   [ 1570.068398] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>   [ 1570.068400] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>   [ 1570.068402] PKRU: 55555554
>>   [ 1570.068404] Call Trace:
>>   [ 1570.068407]  mem_cgroup_charge+0x175/0x770
>>   [ 1570.068413]  __add_to_page_cache_locked+0x712/0xad0
>>   [ 1570.068439]  add_to_page_cache_lru+0xc5/0x1f0
>>   [ 1570.068461]  cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles]
>>   [ 1570.068524]  __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache]
>>   [ 1570.068540]  __nfs_readpages_from_fscache+0x16d/0x630 [nfs]
>>   [ 1570.068585]  nfs_readpages+0x24e/0x540 [nfs]
>>   [ 1570.068693]  read_pages+0x5b1/0xc40
>>   [ 1570.068711]  page_cache_ra_unbounded+0x460/0x750
>>   [ 1570.068729]  generic_file_buffered_read_get_pages+0x290/0x1710
>>   [ 1570.068756]  generic_file_buffered_read+0x2a9/0xc30
>>   [ 1570.068832]  nfs_file_read+0x13f/0x230 [nfs]
>>   [ 1570.068872]  new_sync_read+0x3af/0x610
>>   [ 1570.068901]  vfs_read+0x339/0x4b0
>>   [ 1570.068909]  ksys_read+0xf1/0x1c0
>>   [ 1570.068920]  do_syscall_64+0x33/0x40
>>   [ 1570.068926]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>   [ 1570.068930] RIP: 0033:0x7ff039135595
>>
>> Before that commit, there was a try_charge() and commit_charge()
>> in __add_to_page_cache_locked(). These 2 separated charge functions
>> were replaced by a single mem_cgroup_charge(). However, it forgot
>> to add a matching mem_cgroup_uncharge() when the xarray insertion
>> failed with the page released back to the pool. Fix this by adding a
>> mem_cgroup_uncharge() call when insertion error happens.
>>
>> Fixes: 3fea5a499d57 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")
>> Signed-off-by: Waiman Long <longman@redhat.com>
> OK, this is indeed a subtle bug. The patch aimed at simplifying the
> charge lifetime so that users do not really have to think about when to
> uncharge as that happens when the page is freed. fscache somehow breaks
> that assumption because it doesn't free up pages but it keeps some of
> them in the cache.
>
> I have tried to wrap my head around the cached object life time in
> fscache but failed and got lost in the maze. Is this the only instance
> of the problem? Would it make more sense to explicitly handle charges in
> the fscache code or there are other potential users to fall into this
> trap?

There may be other places that have similar problem. I focus on the 
filemap.c case as I have a test case that can reliably produce the bug 
splat. This patch does fix it for my test case.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
  2021-01-25 15:57   ` Waiman Long
@ 2021-01-25 16:03     ` Michal Hocko
  2021-01-25 16:25       ` Matthew Wilcox
  0 siblings, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2021-01-25 16:03 UTC (permalink / raw)
  To: Waiman Long
  Cc: Andrew Morton, Johannes Weiner, Alex Shi, linux-mm, linux-kernel

On Mon 25-01-21 10:57:54, Waiman Long wrote:
> On 1/25/21 4:28 AM, Michal Hocko wrote:
> > On Sun 24-01-21 23:24:41, Waiman Long wrote:
> > > The commit 3fea5a499d57 ("mm: memcontrol: convert page
> > > cache to a new mem_cgroup_charge() API") introduced a bug in
> > > __add_to_page_cache_locked() causing the following splat:
> > > 
> > >   [ 1570.068330] page dumped because: VM_BUG_ON_PAGE(page_memcg(page))
> > >   [ 1570.068333] pages's memcg:ffff8889a4116000
> > >   [ 1570.068343] ------------[ cut here ]------------
> > >   [ 1570.068346] kernel BUG at mm/memcontrol.c:2924!
> > >   [ 1570.068355] invalid opcode: 0000 [#1] SMP KASAN PTI
> > >   [ 1570.068359] CPU: 35 PID: 12345 Comm: cat Tainted: G S      W I       5.11.0-rc4-debug+ #1
> > >   [ 1570.068363] Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017
> > >   [ 1570.068365] RIP: 0010:commit_charge+0xf4/0x130
> > >     :
> > >   [ 1570.068375] RSP: 0018:ffff8881b38d70e8 EFLAGS: 00010286
> > >   [ 1570.068379] RAX: 0000000000000000 RBX: ffffea00260ddd00 RCX: 0000000000000027
> > >   [ 1570.068382] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88907ebe05a8
> > >   [ 1570.068384] RBP: ffffea00260ddd00 R08: ffffed120fd7c0b6 R09: ffffed120fd7c0b6
> > >   [ 1570.068386] R10: ffff88907ebe05ab R11: ffffed120fd7c0b5 R12: ffffea00260ddd38
> > >   [ 1570.068389] R13: ffff8889a4116000 R14: ffff8889a4116000 R15: 0000000000000001
> > >   [ 1570.068391] FS:  00007ff039638680(0000) GS:ffff88907ea00000(0000) knlGS:0000000000000000
> > >   [ 1570.068394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >   [ 1570.068396] CR2: 00007f36f354cc20 CR3: 00000008a0126006 CR4: 00000000007706e0
> > >   [ 1570.068398] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > >   [ 1570.068400] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > >   [ 1570.068402] PKRU: 55555554
> > >   [ 1570.068404] Call Trace:
> > >   [ 1570.068407]  mem_cgroup_charge+0x175/0x770
> > >   [ 1570.068413]  __add_to_page_cache_locked+0x712/0xad0
> > >   [ 1570.068439]  add_to_page_cache_lru+0xc5/0x1f0
> > >   [ 1570.068461]  cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles]
> > >   [ 1570.068524]  __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache]
> > >   [ 1570.068540]  __nfs_readpages_from_fscache+0x16d/0x630 [nfs]
> > >   [ 1570.068585]  nfs_readpages+0x24e/0x540 [nfs]
> > >   [ 1570.068693]  read_pages+0x5b1/0xc40
> > >   [ 1570.068711]  page_cache_ra_unbounded+0x460/0x750
> > >   [ 1570.068729]  generic_file_buffered_read_get_pages+0x290/0x1710
> > >   [ 1570.068756]  generic_file_buffered_read+0x2a9/0xc30
> > >   [ 1570.068832]  nfs_file_read+0x13f/0x230 [nfs]
> > >   [ 1570.068872]  new_sync_read+0x3af/0x610
> > >   [ 1570.068901]  vfs_read+0x339/0x4b0
> > >   [ 1570.068909]  ksys_read+0xf1/0x1c0
> > >   [ 1570.068920]  do_syscall_64+0x33/0x40
> > >   [ 1570.068926]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > >   [ 1570.068930] RIP: 0033:0x7ff039135595
> > > 
> > > Before that commit, there was a try_charge() and commit_charge()
> > > in __add_to_page_cache_locked(). These 2 separated charge functions
> > > were replaced by a single mem_cgroup_charge(). However, it forgot
> > > to add a matching mem_cgroup_uncharge() when the xarray insertion
> > > failed with the page released back to the pool. Fix this by adding a
> > > mem_cgroup_uncharge() call when insertion error happens.
> > > 
> > > Fixes: 3fea5a499d57 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")
> > > Signed-off-by: Waiman Long <longman@redhat.com>
> > OK, this is indeed a subtle bug. The patch aimed at simplifying the
> > charge lifetime so that users do not really have to think about when to
> > uncharge as that happens when the page is freed. fscache somehow breaks
> > that assumption because it doesn't free up pages but it keeps some of
> > them in the cache.
> > 
> > I have tried to wrap my head around the cached object life time in
> > fscache but failed and got lost in the maze. Is this the only instance
> > of the problem? Would it make more sense to explicitly handle charges in
> > the fscache code or there are other potential users to fall into this
> > trap?
> 
> There may be other places that have similar problem. I focus on the
> filemap.c case as I have a test case that can reliably produce the bug
> splat. This patch does fix it for my test case.

I believe this needs a more general fix than catching a random places
which you can trigger. Would it make more sense to address this at the
fscache level and always make sure that a page returned to the pool is
always uncharged instead?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
  2021-01-25 16:03     ` Michal Hocko
@ 2021-01-25 16:25       ` Matthew Wilcox
  2021-01-25 16:41         ` Michal Hocko
  0 siblings, 1 reply; 26+ messages in thread
From: Matthew Wilcox @ 2021-01-25 16:25 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Waiman Long, Andrew Morton, Johannes Weiner, Alex Shi, linux-mm,
	linux-kernel

On Mon, Jan 25, 2021 at 05:03:28PM +0100, Michal Hocko wrote:
> On Mon 25-01-21 10:57:54, Waiman Long wrote:
> > On 1/25/21 4:28 AM, Michal Hocko wrote:
> > > On Sun 24-01-21 23:24:41, Waiman Long wrote:
> > > > The commit 3fea5a499d57 ("mm: memcontrol: convert page
> > > > cache to a new mem_cgroup_charge() API") introduced a bug in
> > > > __add_to_page_cache_locked() causing the following splat:
> > > > 
> > > >   [ 1570.068330] page dumped because: VM_BUG_ON_PAGE(page_memcg(page))
> > > >   [ 1570.068333] pages's memcg:ffff8889a4116000
> > > >   [ 1570.068343] ------------[ cut here ]------------
> > > >   [ 1570.068346] kernel BUG at mm/memcontrol.c:2924!
> > > >   [ 1570.068355] invalid opcode: 0000 [#1] SMP KASAN PTI
> > > >   [ 1570.068359] CPU: 35 PID: 12345 Comm: cat Tainted: G S      W I       5.11.0-rc4-debug+ #1
> > > >   [ 1570.068363] Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017
> > > >   [ 1570.068365] RIP: 0010:commit_charge+0xf4/0x130
> > > >     :
> > > >   [ 1570.068375] RSP: 0018:ffff8881b38d70e8 EFLAGS: 00010286
> > > >   [ 1570.068379] RAX: 0000000000000000 RBX: ffffea00260ddd00 RCX: 0000000000000027
> > > >   [ 1570.068382] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88907ebe05a8
> > > >   [ 1570.068384] RBP: ffffea00260ddd00 R08: ffffed120fd7c0b6 R09: ffffed120fd7c0b6
> > > >   [ 1570.068386] R10: ffff88907ebe05ab R11: ffffed120fd7c0b5 R12: ffffea00260ddd38
> > > >   [ 1570.068389] R13: ffff8889a4116000 R14: ffff8889a4116000 R15: 0000000000000001
> > > >   [ 1570.068391] FS:  00007ff039638680(0000) GS:ffff88907ea00000(0000) knlGS:0000000000000000
> > > >   [ 1570.068394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > >   [ 1570.068396] CR2: 00007f36f354cc20 CR3: 00000008a0126006 CR4: 00000000007706e0
> > > >   [ 1570.068398] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > >   [ 1570.068400] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > >   [ 1570.068402] PKRU: 55555554
> > > >   [ 1570.068404] Call Trace:
> > > >   [ 1570.068407]  mem_cgroup_charge+0x175/0x770
> > > >   [ 1570.068413]  __add_to_page_cache_locked+0x712/0xad0
> > > >   [ 1570.068439]  add_to_page_cache_lru+0xc5/0x1f0
> > > >   [ 1570.068461]  cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles]
> > > >   [ 1570.068524]  __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache]
> > > >   [ 1570.068540]  __nfs_readpages_from_fscache+0x16d/0x630 [nfs]
> > > >   [ 1570.068585]  nfs_readpages+0x24e/0x540 [nfs]
> > > >   [ 1570.068693]  read_pages+0x5b1/0xc40
> > > >   [ 1570.068711]  page_cache_ra_unbounded+0x460/0x750
> > > >   [ 1570.068729]  generic_file_buffered_read_get_pages+0x290/0x1710
> > > >   [ 1570.068756]  generic_file_buffered_read+0x2a9/0xc30
> > > >   [ 1570.068832]  nfs_file_read+0x13f/0x230 [nfs]
> > > >   [ 1570.068872]  new_sync_read+0x3af/0x610
> > > >   [ 1570.068901]  vfs_read+0x339/0x4b0
> > > >   [ 1570.068909]  ksys_read+0xf1/0x1c0
> > > >   [ 1570.068920]  do_syscall_64+0x33/0x40
> > > >   [ 1570.068926]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > >   [ 1570.068930] RIP: 0033:0x7ff039135595
> > > > 
> > > > Before that commit, there was a try_charge() and commit_charge()
> > > > in __add_to_page_cache_locked(). These 2 separated charge functions
> > > > were replaced by a single mem_cgroup_charge(). However, it forgot
> > > > to add a matching mem_cgroup_uncharge() when the xarray insertion
> > > > failed with the page released back to the pool. Fix this by adding a
> > > > mem_cgroup_uncharge() call when insertion error happens.
> > > > 
> > > > Fixes: 3fea5a499d57 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")
> > > > Signed-off-by: Waiman Long <longman@redhat.com>
> > > OK, this is indeed a subtle bug. The patch aimed at simplifying the
> > > charge lifetime so that users do not really have to think about when to
> > > uncharge as that happens when the page is freed. fscache somehow breaks
> > > that assumption because it doesn't free up pages but it keeps some of
> > > them in the cache.
> > > 
> > > I have tried to wrap my head around the cached object life time in
> > > fscache but failed and got lost in the maze. Is this the only instance
> > > of the problem? Would it make more sense to explicitly handle charges in
> > > the fscache code or there are other potential users to fall into this
> > > trap?
> > 
> > There may be other places that have similar problem. I focus on the
> > filemap.c case as I have a test case that can reliably produce the bug
> > splat. This patch does fix it for my test case.
> 
> I believe this needs a more general fix than catching a random places
> which you can trigger. Would it make more sense to address this at the
> fscache level and always make sure that a page returned to the pool is
> always uncharged instead?

I believe you mean "page cache" -- there is a separate thing called
'fscache' which is used to cache network filesystems.

I don't understand the memcg code at all, so I have no useful feedback
on what you're saying other than this.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
  2021-01-25 16:25       ` Matthew Wilcox
@ 2021-01-25 16:41         ` Michal Hocko
  2021-01-25 18:14           ` Michal Hocko
  2021-01-25 18:43           ` Johannes Weiner
  0 siblings, 2 replies; 26+ messages in thread
From: Michal Hocko @ 2021-01-25 16:41 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Waiman Long, Andrew Morton, Johannes Weiner, Alex Shi, linux-mm,
	linux-kernel

On Mon 25-01-21 16:25:06, Matthew Wilcox wrote:
> On Mon, Jan 25, 2021 at 05:03:28PM +0100, Michal Hocko wrote:
> > On Mon 25-01-21 10:57:54, Waiman Long wrote:
> > > On 1/25/21 4:28 AM, Michal Hocko wrote:
> > > > On Sun 24-01-21 23:24:41, Waiman Long wrote:
> > > > > The commit 3fea5a499d57 ("mm: memcontrol: convert page
> > > > > cache to a new mem_cgroup_charge() API") introduced a bug in
> > > > > __add_to_page_cache_locked() causing the following splat:
> > > > > 
> > > > >   [ 1570.068330] page dumped because: VM_BUG_ON_PAGE(page_memcg(page))
> > > > >   [ 1570.068333] pages's memcg:ffff8889a4116000
> > > > >   [ 1570.068343] ------------[ cut here ]------------
> > > > >   [ 1570.068346] kernel BUG at mm/memcontrol.c:2924!
> > > > >   [ 1570.068355] invalid opcode: 0000 [#1] SMP KASAN PTI
> > > > >   [ 1570.068359] CPU: 35 PID: 12345 Comm: cat Tainted: G S      W I       5.11.0-rc4-debug+ #1
> > > > >   [ 1570.068363] Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017
> > > > >   [ 1570.068365] RIP: 0010:commit_charge+0xf4/0x130
> > > > >     :
> > > > >   [ 1570.068375] RSP: 0018:ffff8881b38d70e8 EFLAGS: 00010286
> > > > >   [ 1570.068379] RAX: 0000000000000000 RBX: ffffea00260ddd00 RCX: 0000000000000027
> > > > >   [ 1570.068382] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88907ebe05a8
> > > > >   [ 1570.068384] RBP: ffffea00260ddd00 R08: ffffed120fd7c0b6 R09: ffffed120fd7c0b6
> > > > >   [ 1570.068386] R10: ffff88907ebe05ab R11: ffffed120fd7c0b5 R12: ffffea00260ddd38
> > > > >   [ 1570.068389] R13: ffff8889a4116000 R14: ffff8889a4116000 R15: 0000000000000001
> > > > >   [ 1570.068391] FS:  00007ff039638680(0000) GS:ffff88907ea00000(0000) knlGS:0000000000000000
> > > > >   [ 1570.068394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > >   [ 1570.068396] CR2: 00007f36f354cc20 CR3: 00000008a0126006 CR4: 00000000007706e0
> > > > >   [ 1570.068398] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > >   [ 1570.068400] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > >   [ 1570.068402] PKRU: 55555554
> > > > >   [ 1570.068404] Call Trace:
> > > > >   [ 1570.068407]  mem_cgroup_charge+0x175/0x770
> > > > >   [ 1570.068413]  __add_to_page_cache_locked+0x712/0xad0
> > > > >   [ 1570.068439]  add_to_page_cache_lru+0xc5/0x1f0
> > > > >   [ 1570.068461]  cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles]
> > > > >   [ 1570.068524]  __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache]
> > > > >   [ 1570.068540]  __nfs_readpages_from_fscache+0x16d/0x630 [nfs]
> > > > >   [ 1570.068585]  nfs_readpages+0x24e/0x540 [nfs]
> > > > >   [ 1570.068693]  read_pages+0x5b1/0xc40
> > > > >   [ 1570.068711]  page_cache_ra_unbounded+0x460/0x750
> > > > >   [ 1570.068729]  generic_file_buffered_read_get_pages+0x290/0x1710
> > > > >   [ 1570.068756]  generic_file_buffered_read+0x2a9/0xc30
> > > > >   [ 1570.068832]  nfs_file_read+0x13f/0x230 [nfs]
> > > > >   [ 1570.068872]  new_sync_read+0x3af/0x610
> > > > >   [ 1570.068901]  vfs_read+0x339/0x4b0
> > > > >   [ 1570.068909]  ksys_read+0xf1/0x1c0
> > > > >   [ 1570.068920]  do_syscall_64+0x33/0x40
> > > > >   [ 1570.068926]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > >   [ 1570.068930] RIP: 0033:0x7ff039135595
> > > > > 
> > > > > Before that commit, there was a try_charge() and commit_charge()
> > > > > in __add_to_page_cache_locked(). These 2 separated charge functions
> > > > > were replaced by a single mem_cgroup_charge(). However, it forgot
> > > > > to add a matching mem_cgroup_uncharge() when the xarray insertion
> > > > > failed with the page released back to the pool. Fix this by adding a
> > > > > mem_cgroup_uncharge() call when insertion error happens.
> > > > > 
> > > > > Fixes: 3fea5a499d57 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")
> > > > > Signed-off-by: Waiman Long <longman@redhat.com>
> > > > OK, this is indeed a subtle bug. The patch aimed at simplifying the
> > > > charge lifetime so that users do not really have to think about when to
> > > > uncharge as that happens when the page is freed. fscache somehow breaks
> > > > that assumption because it doesn't free up pages but it keeps some of
> > > > them in the cache.
> > > > 
> > > > I have tried to wrap my head around the cached object life time in
> > > > fscache but failed and got lost in the maze. Is this the only instance
> > > > of the problem? Would it make more sense to explicitly handle charges in
> > > > the fscache code or there are other potential users to fall into this
> > > > trap?
> > > 
> > > There may be other places that have similar problem. I focus on the
> > > filemap.c case as I have a test case that can reliably produce the bug
> > > splat. This patch does fix it for my test case.
> > 
> > I believe this needs a more general fix than catching a random places
> > which you can trigger. Would it make more sense to address this at the
> > fscache level and always make sure that a page returned to the pool is
> > always uncharged instead?
> 
> I believe you mean "page cache" -- there is a separate thing called
> 'fscache' which is used to cache network filesystems.

Yes, I really had fscache in mind because it does have an "unusual" page
life time rules.

> I don't understand the memcg code at all, so I have no useful feedback
> on what you're saying other than this.

Well the memcg accounting rules after the rework should have simplified
the API usage for most users. You will get memory charged when it is
used and it will go away when the page is freed. If a page is not really
freed in some cases and it can be reused then it doesn't really fit into
this scheme automagically. I do undestand that this puts some additional
burden on those special cases. I am not really sure what is the right
way here myself but considering there might be other similar cases like
that I would lean towards special casing where the pool is implemented.
I would expect there is some state to be maintain for that purpose
already.

But as I've said I am not really familiar with fscache so this might be
just unfeasible and a better fix would be in a library code.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
  2021-01-25 16:41         ` Michal Hocko
@ 2021-01-25 18:14           ` Michal Hocko
  2021-01-25 18:23             ` Waiman Long
  2021-01-25 18:43           ` Johannes Weiner
  1 sibling, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2021-01-25 18:14 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Waiman Long, Andrew Morton, Johannes Weiner, Alex Shi, linux-mm,
	linux-kernel

On Mon 25-01-21 17:41:19, Michal Hocko wrote:
> On Mon 25-01-21 16:25:06, Matthew Wilcox wrote:
> > On Mon, Jan 25, 2021 at 05:03:28PM +0100, Michal Hocko wrote:
> > > On Mon 25-01-21 10:57:54, Waiman Long wrote:
> > > > On 1/25/21 4:28 AM, Michal Hocko wrote:
> > > > > On Sun 24-01-21 23:24:41, Waiman Long wrote:
> > > > > > The commit 3fea5a499d57 ("mm: memcontrol: convert page
> > > > > > cache to a new mem_cgroup_charge() API") introduced a bug in
> > > > > > __add_to_page_cache_locked() causing the following splat:
> > > > > > 
> > > > > >   [ 1570.068330] page dumped because: VM_BUG_ON_PAGE(page_memcg(page))
> > > > > >   [ 1570.068333] pages's memcg:ffff8889a4116000
> > > > > >   [ 1570.068343] ------------[ cut here ]------------
> > > > > >   [ 1570.068346] kernel BUG at mm/memcontrol.c:2924!
> > > > > >   [ 1570.068355] invalid opcode: 0000 [#1] SMP KASAN PTI
> > > > > >   [ 1570.068359] CPU: 35 PID: 12345 Comm: cat Tainted: G S      W I       5.11.0-rc4-debug+ #1
> > > > > >   [ 1570.068363] Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017
> > > > > >   [ 1570.068365] RIP: 0010:commit_charge+0xf4/0x130
> > > > > >     :
> > > > > >   [ 1570.068375] RSP: 0018:ffff8881b38d70e8 EFLAGS: 00010286
> > > > > >   [ 1570.068379] RAX: 0000000000000000 RBX: ffffea00260ddd00 RCX: 0000000000000027
> > > > > >   [ 1570.068382] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88907ebe05a8
> > > > > >   [ 1570.068384] RBP: ffffea00260ddd00 R08: ffffed120fd7c0b6 R09: ffffed120fd7c0b6
> > > > > >   [ 1570.068386] R10: ffff88907ebe05ab R11: ffffed120fd7c0b5 R12: ffffea00260ddd38
> > > > > >   [ 1570.068389] R13: ffff8889a4116000 R14: ffff8889a4116000 R15: 0000000000000001
> > > > > >   [ 1570.068391] FS:  00007ff039638680(0000) GS:ffff88907ea00000(0000) knlGS:0000000000000000
> > > > > >   [ 1570.068394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > >   [ 1570.068396] CR2: 00007f36f354cc20 CR3: 00000008a0126006 CR4: 00000000007706e0
> > > > > >   [ 1570.068398] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > >   [ 1570.068400] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > >   [ 1570.068402] PKRU: 55555554
> > > > > >   [ 1570.068404] Call Trace:
> > > > > >   [ 1570.068407]  mem_cgroup_charge+0x175/0x770
> > > > > >   [ 1570.068413]  __add_to_page_cache_locked+0x712/0xad0
> > > > > >   [ 1570.068439]  add_to_page_cache_lru+0xc5/0x1f0
> > > > > >   [ 1570.068461]  cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles]
> > > > > >   [ 1570.068524]  __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache]
> > > > > >   [ 1570.068540]  __nfs_readpages_from_fscache+0x16d/0x630 [nfs]
> > > > > >   [ 1570.068585]  nfs_readpages+0x24e/0x540 [nfs]
> > > > > >   [ 1570.068693]  read_pages+0x5b1/0xc40
> > > > > >   [ 1570.068711]  page_cache_ra_unbounded+0x460/0x750
> > > > > >   [ 1570.068729]  generic_file_buffered_read_get_pages+0x290/0x1710
> > > > > >   [ 1570.068756]  generic_file_buffered_read+0x2a9/0xc30
> > > > > >   [ 1570.068832]  nfs_file_read+0x13f/0x230 [nfs]
> > > > > >   [ 1570.068872]  new_sync_read+0x3af/0x610
> > > > > >   [ 1570.068901]  vfs_read+0x339/0x4b0
> > > > > >   [ 1570.068909]  ksys_read+0xf1/0x1c0
> > > > > >   [ 1570.068920]  do_syscall_64+0x33/0x40
> > > > > >   [ 1570.068926]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > > >   [ 1570.068930] RIP: 0033:0x7ff039135595
> > > > > > 
> > > > > > Before that commit, there was a try_charge() and commit_charge()
> > > > > > in __add_to_page_cache_locked(). These 2 separated charge functions
> > > > > > were replaced by a single mem_cgroup_charge(). However, it forgot
> > > > > > to add a matching mem_cgroup_uncharge() when the xarray insertion
> > > > > > failed with the page released back to the pool. Fix this by adding a
> > > > > > mem_cgroup_uncharge() call when insertion error happens.
> > > > > > 
> > > > > > Fixes: 3fea5a499d57 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")
> > > > > > Signed-off-by: Waiman Long <longman@redhat.com>
> > > > > OK, this is indeed a subtle bug. The patch aimed at simplifying the
> > > > > charge lifetime so that users do not really have to think about when to
> > > > > uncharge as that happens when the page is freed. fscache somehow breaks
> > > > > that assumption because it doesn't free up pages but it keeps some of
> > > > > them in the cache.
> > > > > 
> > > > > I have tried to wrap my head around the cached object life time in
> > > > > fscache but failed and got lost in the maze. Is this the only instance
> > > > > of the problem? Would it make more sense to explicitly handle charges in
> > > > > the fscache code or there are other potential users to fall into this
> > > > > trap?
> > > > 
> > > > There may be other places that have similar problem. I focus on the
> > > > filemap.c case as I have a test case that can reliably produce the bug
> > > > splat. This patch does fix it for my test case.
> > > 
> > > I believe this needs a more general fix than catching a random places
> > > which you can trigger. Would it make more sense to address this at the
> > > fscache level and always make sure that a page returned to the pool is
> > > always uncharged instead?
> > 
> > I believe you mean "page cache" -- there is a separate thing called
> > 'fscache' which is used to cache network filesystems.
> 
> Yes, I really had fscache in mind because it does have an "unusual" page
> life time rules.
> 
> > I don't understand the memcg code at all, so I have no useful feedback
> > on what you're saying other than this.
> 
> Well the memcg accounting rules after the rework should have simplified
> the API usage for most users. You will get memory charged when it is
> used and it will go away when the page is freed. If a page is not really
> freed in some cases and it can be reused then it doesn't really fit into
> this scheme automagically. I do undestand that this puts some additional
> burden on those special cases. I am not really sure what is the right
> way here myself but considering there might be other similar cases like
> that I would lean towards special casing where the pool is implemented.
> I would expect there is some state to be maintain for that purpose
> already.

After some more thinking I've came to conclusion that the patch as
proposed is the proper way forward. It is easier to follow if the
unwinding of state changes are local to the function.

With the proposed simplification by Willy
Acked-by: Michal Hocko <mhocko@suse.com>
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
  2021-01-25 18:14           ` Michal Hocko
@ 2021-01-25 18:23             ` Waiman Long
  2021-01-25 18:29               ` Matthew Wilcox
                                 ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Waiman Long @ 2021-01-25 18:23 UTC (permalink / raw)
  To: Michal Hocko, Matthew Wilcox
  Cc: Andrew Morton, Johannes Weiner, Alex Shi, linux-mm, linux-kernel

On 1/25/21 1:14 PM, Michal Hocko wrote:
> On Mon 25-01-21 17:41:19, Michal Hocko wrote:
>> On Mon 25-01-21 16:25:06, Matthew Wilcox wrote:
>>> On Mon, Jan 25, 2021 at 05:03:28PM +0100, Michal Hocko wrote:
>>>> On Mon 25-01-21 10:57:54, Waiman Long wrote:
>>>>> On 1/25/21 4:28 AM, Michal Hocko wrote:
>>>>>> On Sun 24-01-21 23:24:41, Waiman Long wrote:
>>>>>>> The commit 3fea5a499d57 ("mm: memcontrol: convert page
>>>>>>> cache to a new mem_cgroup_charge() API") introduced a bug in
>>>>>>> __add_to_page_cache_locked() causing the following splat:
>>>>>>>
>>>>>>>    [ 1570.068330] page dumped because: VM_BUG_ON_PAGE(page_memcg(page))
>>>>>>>    [ 1570.068333] pages's memcg:ffff8889a4116000
>>>>>>>    [ 1570.068343] ------------[ cut here ]------------
>>>>>>>    [ 1570.068346] kernel BUG at mm/memcontrol.c:2924!
>>>>>>>    [ 1570.068355] invalid opcode: 0000 [#1] SMP KASAN PTI
>>>>>>>    [ 1570.068359] CPU: 35 PID: 12345 Comm: cat Tainted: G S      W I       5.11.0-rc4-debug+ #1
>>>>>>>    [ 1570.068363] Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017
>>>>>>>    [ 1570.068365] RIP: 0010:commit_charge+0xf4/0x130
>>>>>>>      :
>>>>>>>    [ 1570.068375] RSP: 0018:ffff8881b38d70e8 EFLAGS: 00010286
>>>>>>>    [ 1570.068379] RAX: 0000000000000000 RBX: ffffea00260ddd00 RCX: 0000000000000027
>>>>>>>    [ 1570.068382] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88907ebe05a8
>>>>>>>    [ 1570.068384] RBP: ffffea00260ddd00 R08: ffffed120fd7c0b6 R09: ffffed120fd7c0b6
>>>>>>>    [ 1570.068386] R10: ffff88907ebe05ab R11: ffffed120fd7c0b5 R12: ffffea00260ddd38
>>>>>>>    [ 1570.068389] R13: ffff8889a4116000 R14: ffff8889a4116000 R15: 0000000000000001
>>>>>>>    [ 1570.068391] FS:  00007ff039638680(0000) GS:ffff88907ea00000(0000) knlGS:0000000000000000
>>>>>>>    [ 1570.068394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>    [ 1570.068396] CR2: 00007f36f354cc20 CR3: 00000008a0126006 CR4: 00000000007706e0
>>>>>>>    [ 1570.068398] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>    [ 1570.068400] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>    [ 1570.068402] PKRU: 55555554
>>>>>>>    [ 1570.068404] Call Trace:
>>>>>>>    [ 1570.068407]  mem_cgroup_charge+0x175/0x770
>>>>>>>    [ 1570.068413]  __add_to_page_cache_locked+0x712/0xad0
>>>>>>>    [ 1570.068439]  add_to_page_cache_lru+0xc5/0x1f0
>>>>>>>    [ 1570.068461]  cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles]
>>>>>>>    [ 1570.068524]  __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache]
>>>>>>>    [ 1570.068540]  __nfs_readpages_from_fscache+0x16d/0x630 [nfs]
>>>>>>>    [ 1570.068585]  nfs_readpages+0x24e/0x540 [nfs]
>>>>>>>    [ 1570.068693]  read_pages+0x5b1/0xc40
>>>>>>>    [ 1570.068711]  page_cache_ra_unbounded+0x460/0x750
>>>>>>>    [ 1570.068729]  generic_file_buffered_read_get_pages+0x290/0x1710
>>>>>>>    [ 1570.068756]  generic_file_buffered_read+0x2a9/0xc30
>>>>>>>    [ 1570.068832]  nfs_file_read+0x13f/0x230 [nfs]
>>>>>>>    [ 1570.068872]  new_sync_read+0x3af/0x610
>>>>>>>    [ 1570.068901]  vfs_read+0x339/0x4b0
>>>>>>>    [ 1570.068909]  ksys_read+0xf1/0x1c0
>>>>>>>    [ 1570.068920]  do_syscall_64+0x33/0x40
>>>>>>>    [ 1570.068926]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>>>    [ 1570.068930] RIP: 0033:0x7ff039135595
>>>>>>>
>>>>>>> Before that commit, there was a try_charge() and commit_charge()
>>>>>>> in __add_to_page_cache_locked(). These 2 separated charge functions
>>>>>>> were replaced by a single mem_cgroup_charge(). However, it forgot
>>>>>>> to add a matching mem_cgroup_uncharge() when the xarray insertion
>>>>>>> failed with the page released back to the pool. Fix this by adding a
>>>>>>> mem_cgroup_uncharge() call when insertion error happens.
>>>>>>>
>>>>>>> Fixes: 3fea5a499d57 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")
>>>>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>>>> OK, this is indeed a subtle bug. The patch aimed at simplifying the
>>>>>> charge lifetime so that users do not really have to think about when to
>>>>>> uncharge as that happens when the page is freed. fscache somehow breaks
>>>>>> that assumption because it doesn't free up pages but it keeps some of
>>>>>> them in the cache.
>>>>>>
>>>>>> I have tried to wrap my head around the cached object life time in
>>>>>> fscache but failed and got lost in the maze. Is this the only instance
>>>>>> of the problem? Would it make more sense to explicitly handle charges in
>>>>>> the fscache code or there are other potential users to fall into this
>>>>>> trap?
>>>>> There may be other places that have similar problem. I focus on the
>>>>> filemap.c case as I have a test case that can reliably produce the bug
>>>>> splat. This patch does fix it for my test case.
>>>> I believe this needs a more general fix than catching a random places
>>>> which you can trigger. Would it make more sense to address this at the
>>>> fscache level and always make sure that a page returned to the pool is
>>>> always uncharged instead?
>>> I believe you mean "page cache" -- there is a separate thing called
>>> 'fscache' which is used to cache network filesystems.
>> Yes, I really had fscache in mind because it does have an "unusual" page
>> life time rules.
>>
>>> I don't understand the memcg code at all, so I have no useful feedback
>>> on what you're saying other than this.
>> Well the memcg accounting rules after the rework should have simplified
>> the API usage for most users. You will get memory charged when it is
>> used and it will go away when the page is freed. If a page is not really
>> freed in some cases and it can be reused then it doesn't really fit into
>> this scheme automagically. I do undestand that this puts some additional
>> burden on those special cases. I am not really sure what is the right
>> way here myself but considering there might be other similar cases like
>> that I would lean towards special casing where the pool is implemented.
>> I would expect there is some state to be maintain for that purpose
>> already.
> After some more thinking I've came to conclusion that the patch as
> proposed is the proper way forward. It is easier to follow if the
> unwinding of state changes are local to the function.
I think so. It is easier to understand if the charge and uncharge 
functions are grouped together in the same function.
>
> With the proposed simplification by Willy
> Acked-by: Michal Hocko <mhocko@suse.com>

Thank for the ack. However, I am a bit confused about what you mean by 
simplification. There is another linux-next patch that changes the 
condition for mem_cgroup_charge() to

-       if (!huge) {
+       if (!huge && !page_is_secretmem(page)) {
                 error = mem_cgroup_charge(page, current->mm, gfp);

That is the main reason why I introduced the boolean variable as I don't 
want to call the external page_is_secretmem() function twice.

Cheers,
Longman



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
  2021-01-25 18:23             ` Waiman Long
@ 2021-01-25 18:29               ` Matthew Wilcox
  2021-01-25 18:45                 ` Waiman Long
  2021-01-25 18:31               ` Michal Hocko
  2021-01-25 18:52               ` Johannes Weiner
  2 siblings, 1 reply; 26+ messages in thread
From: Matthew Wilcox @ 2021-01-25 18:29 UTC (permalink / raw)
  To: Waiman Long
  Cc: Michal Hocko, Andrew Morton, Johannes Weiner, Alex Shi, linux-mm,
	linux-kernel

On Mon, Jan 25, 2021 at 01:23:58PM -0500, Waiman Long wrote:
> On 1/25/21 1:14 PM, Michal Hocko wrote:
> > On Mon 25-01-21 17:41:19, Michal Hocko wrote:
> > > On Mon 25-01-21 16:25:06, Matthew Wilcox wrote:
> > > > On Mon, Jan 25, 2021 at 05:03:28PM +0100, Michal Hocko wrote:
> > > > > On Mon 25-01-21 10:57:54, Waiman Long wrote:
> > > > > > On 1/25/21 4:28 AM, Michal Hocko wrote:
> > > > > > > On Sun 24-01-21 23:24:41, Waiman Long wrote:
> > > > > > > > The commit 3fea5a499d57 ("mm: memcontrol: convert page
> > > > > > > > cache to a new mem_cgroup_charge() API") introduced a bug in
> > > > > > > > __add_to_page_cache_locked() causing the following splat:
> > > > > > > > 
> > > > > > > >    [ 1570.068330] page dumped because: VM_BUG_ON_PAGE(page_memcg(page))
> > > > > > > >    [ 1570.068333] pages's memcg:ffff8889a4116000
> > > > > > > >    [ 1570.068343] ------------[ cut here ]------------
> > > > > > > >    [ 1570.068346] kernel BUG at mm/memcontrol.c:2924!
> > > > > > > >    [ 1570.068355] invalid opcode: 0000 [#1] SMP KASAN PTI
> > > > > > > >    [ 1570.068359] CPU: 35 PID: 12345 Comm: cat Tainted: G S      W I       5.11.0-rc4-debug+ #1
> > > > > > > >    [ 1570.068363] Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017
> > > > > > > >    [ 1570.068365] RIP: 0010:commit_charge+0xf4/0x130
> > > > > > > >      :
> > > > > > > >    [ 1570.068375] RSP: 0018:ffff8881b38d70e8 EFLAGS: 00010286
> > > > > > > >    [ 1570.068379] RAX: 0000000000000000 RBX: ffffea00260ddd00 RCX: 0000000000000027
> > > > > > > >    [ 1570.068382] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88907ebe05a8
> > > > > > > >    [ 1570.068384] RBP: ffffea00260ddd00 R08: ffffed120fd7c0b6 R09: ffffed120fd7c0b6
> > > > > > > >    [ 1570.068386] R10: ffff88907ebe05ab R11: ffffed120fd7c0b5 R12: ffffea00260ddd38
> > > > > > > >    [ 1570.068389] R13: ffff8889a4116000 R14: ffff8889a4116000 R15: 0000000000000001
> > > > > > > >    [ 1570.068391] FS:  00007ff039638680(0000) GS:ffff88907ea00000(0000) knlGS:0000000000000000
> > > > > > > >    [ 1570.068394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > >    [ 1570.068396] CR2: 00007f36f354cc20 CR3: 00000008a0126006 CR4: 00000000007706e0
> > > > > > > >    [ 1570.068398] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > > >    [ 1570.068400] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > > >    [ 1570.068402] PKRU: 55555554
> > > > > > > >    [ 1570.068404] Call Trace:
> > > > > > > >    [ 1570.068407]  mem_cgroup_charge+0x175/0x770
> > > > > > > >    [ 1570.068413]  __add_to_page_cache_locked+0x712/0xad0
> > > > > > > >    [ 1570.068439]  add_to_page_cache_lru+0xc5/0x1f0
> > > > > > > >    [ 1570.068461]  cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles]
> > > > > > > >    [ 1570.068524]  __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache]
> > > > > > > >    [ 1570.068540]  __nfs_readpages_from_fscache+0x16d/0x630 [nfs]
> > > > > > > >    [ 1570.068585]  nfs_readpages+0x24e/0x540 [nfs]
> > > > > > > >    [ 1570.068693]  read_pages+0x5b1/0xc40
> > > > > > > >    [ 1570.068711]  page_cache_ra_unbounded+0x460/0x750
> > > > > > > >    [ 1570.068729]  generic_file_buffered_read_get_pages+0x290/0x1710
> > > > > > > >    [ 1570.068756]  generic_file_buffered_read+0x2a9/0xc30
> > > > > > > >    [ 1570.068832]  nfs_file_read+0x13f/0x230 [nfs]
> > > > > > > >    [ 1570.068872]  new_sync_read+0x3af/0x610
> > > > > > > >    [ 1570.068901]  vfs_read+0x339/0x4b0
> > > > > > > >    [ 1570.068909]  ksys_read+0xf1/0x1c0
> > > > > > > >    [ 1570.068920]  do_syscall_64+0x33/0x40
> > > > > > > >    [ 1570.068926]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > > > > >    [ 1570.068930] RIP: 0033:0x7ff039135595
> > > > > > > > 
> > > > > > > > Before that commit, there was a try_charge() and commit_charge()
> > > > > > > > in __add_to_page_cache_locked(). These 2 separated charge functions
> > > > > > > > were replaced by a single mem_cgroup_charge(). However, it forgot
> > > > > > > > to add a matching mem_cgroup_uncharge() when the xarray insertion
> > > > > > > > failed with the page released back to the pool. Fix this by adding a
> > > > > > > > mem_cgroup_uncharge() call when insertion error happens.
> > > > > > > > 
> > > > > > > > Fixes: 3fea5a499d57 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")
> > > > > > > > Signed-off-by: Waiman Long <longman@redhat.com>
> > > > > > > OK, this is indeed a subtle bug. The patch aimed at simplifying the
> > > > > > > charge lifetime so that users do not really have to think about when to
> > > > > > > uncharge as that happens when the page is freed. fscache somehow breaks
> > > > > > > that assumption because it doesn't free up pages but it keeps some of
> > > > > > > them in the cache.
> > > > > > > 
> > > > > > > I have tried to wrap my head around the cached object life time in
> > > > > > > fscache but failed and got lost in the maze. Is this the only instance
> > > > > > > of the problem? Would it make more sense to explicitly handle charges in
> > > > > > > the fscache code or there are other potential users to fall into this
> > > > > > > trap?
> > > > > > There may be other places that have similar problem. I focus on the
> > > > > > filemap.c case as I have a test case that can reliably produce the bug
> > > > > > splat. This patch does fix it for my test case.
> > > > > I believe this needs a more general fix than catching a random places
> > > > > which you can trigger. Would it make more sense to address this at the
> > > > > fscache level and always make sure that a page returned to the pool is
> > > > > always uncharged instead?
> > > > I believe you mean "page cache" -- there is a separate thing called
> > > > 'fscache' which is used to cache network filesystems.
> > > Yes, I really had fscache in mind because it does have an "unusual" page
> > > life time rules.
> > > 
> > > > I don't understand the memcg code at all, so I have no useful feedback
> > > > on what you're saying other than this.
> > > Well the memcg accounting rules after the rework should have simplified
> > > the API usage for most users. You will get memory charged when it is
> > > used and it will go away when the page is freed. If a page is not really
> > > freed in some cases and it can be reused then it doesn't really fit into
> > > this scheme automagically. I do undestand that this puts some additional
> > > burden on those special cases. I am not really sure what is the right
> > > way here myself but considering there might be other similar cases like
> > > that I would lean towards special casing where the pool is implemented.
> > > I would expect there is some state to be maintain for that purpose
> > > already.
> > After some more thinking I've came to conclusion that the patch as
> > proposed is the proper way forward. It is easier to follow if the
> > unwinding of state changes are local to the function.
> I think so. It is easier to understand if the charge and uncharge functions
> are grouped together in the same function.
> > 
> > With the proposed simplification by Willy
> > Acked-by: Michal Hocko <mhocko@suse.com>
> 
> Thank for the ack. However, I am a bit confused about what you mean by
> simplification. There is another linux-next patch that changes the condition
> for mem_cgroup_charge() to
> 
> -       if (!huge) {
> +       if (!huge && !page_is_secretmem(page)) {
>                 error = mem_cgroup_charge(page, current->mm, gfp);
> 
> That is the main reason why I introduced the boolean variable as I don't
> want to call the external page_is_secretmem() function twice.

See earlier emails to Mike suggesting that the accounting of secretmem
here is wrong.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
  2021-01-25 18:23             ` Waiman Long
  2021-01-25 18:29               ` Matthew Wilcox
@ 2021-01-25 18:31               ` Michal Hocko
  2021-01-25 18:52               ` Johannes Weiner
  2 siblings, 0 replies; 26+ messages in thread
From: Michal Hocko @ 2021-01-25 18:31 UTC (permalink / raw)
  To: Waiman Long
  Cc: Matthew Wilcox, Andrew Morton, Johannes Weiner, Alex Shi,
	linux-mm, linux-kernel

On Mon 25-01-21 13:23:58, Waiman Long wrote:
> On 1/25/21 1:14 PM, Michal Hocko wrote:
[...]
> > With the proposed simplification by Willy
> > Acked-by: Michal Hocko <mhocko@suse.com>
> 
> Thank for the ack. However, I am a bit confused about what you mean by
> simplification. There is another linux-next patch that changes the condition
> for mem_cgroup_charge() to

This is obviously a fix and I believe should go in the next rc while
secretmem is a new feature which should wait at least to the next merge
window.

> -       if (!huge) {
> +       if (!huge && !page_is_secretmem(page)) {
>                 error = mem_cgroup_charge(page, current->mm, gfp);
> 
> That is the main reason why I introduced the boolean variable as I don't
> want to call the external page_is_secretmem() function twice.
> 
> Cheers,
> Longman
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
  2021-01-25 16:41         ` Michal Hocko
  2021-01-25 18:14           ` Michal Hocko
@ 2021-01-25 18:43           ` Johannes Weiner
  1 sibling, 0 replies; 26+ messages in thread
From: Johannes Weiner @ 2021-01-25 18:43 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Matthew Wilcox, Waiman Long, Andrew Morton, Alex Shi, linux-mm,
	linux-kernel

On Mon, Jan 25, 2021 at 05:41:18PM +0100, Michal Hocko wrote:
> On Mon 25-01-21 16:25:06, Matthew Wilcox wrote:
> > On Mon, Jan 25, 2021 at 05:03:28PM +0100, Michal Hocko wrote:
> > > On Mon 25-01-21 10:57:54, Waiman Long wrote:
> > > > On 1/25/21 4:28 AM, Michal Hocko wrote:
> > > > > On Sun 24-01-21 23:24:41, Waiman Long wrote:
> > > > > > The commit 3fea5a499d57 ("mm: memcontrol: convert page
> > > > > > cache to a new mem_cgroup_charge() API") introduced a bug in
> > > > > > __add_to_page_cache_locked() causing the following splat:
> > > > > > 
> > > > > >   [ 1570.068330] page dumped because: VM_BUG_ON_PAGE(page_memcg(page))
> > > > > >   [ 1570.068333] pages's memcg:ffff8889a4116000
> > > > > >   [ 1570.068343] ------------[ cut here ]------------
> > > > > >   [ 1570.068346] kernel BUG at mm/memcontrol.c:2924!
> > > > > >   [ 1570.068355] invalid opcode: 0000 [#1] SMP KASAN PTI
> > > > > >   [ 1570.068359] CPU: 35 PID: 12345 Comm: cat Tainted: G S      W I       5.11.0-rc4-debug+ #1
> > > > > >   [ 1570.068363] Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017
> > > > > >   [ 1570.068365] RIP: 0010:commit_charge+0xf4/0x130
> > > > > >     :
> > > > > >   [ 1570.068375] RSP: 0018:ffff8881b38d70e8 EFLAGS: 00010286
> > > > > >   [ 1570.068379] RAX: 0000000000000000 RBX: ffffea00260ddd00 RCX: 0000000000000027
> > > > > >   [ 1570.068382] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88907ebe05a8
> > > > > >   [ 1570.068384] RBP: ffffea00260ddd00 R08: ffffed120fd7c0b6 R09: ffffed120fd7c0b6
> > > > > >   [ 1570.068386] R10: ffff88907ebe05ab R11: ffffed120fd7c0b5 R12: ffffea00260ddd38
> > > > > >   [ 1570.068389] R13: ffff8889a4116000 R14: ffff8889a4116000 R15: 0000000000000001
> > > > > >   [ 1570.068391] FS:  00007ff039638680(0000) GS:ffff88907ea00000(0000) knlGS:0000000000000000
> > > > > >   [ 1570.068394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > >   [ 1570.068396] CR2: 00007f36f354cc20 CR3: 00000008a0126006 CR4: 00000000007706e0
> > > > > >   [ 1570.068398] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > >   [ 1570.068400] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > >   [ 1570.068402] PKRU: 55555554
> > > > > >   [ 1570.068404] Call Trace:
> > > > > >   [ 1570.068407]  mem_cgroup_charge+0x175/0x770
> > > > > >   [ 1570.068413]  __add_to_page_cache_locked+0x712/0xad0
> > > > > >   [ 1570.068439]  add_to_page_cache_lru+0xc5/0x1f0
> > > > > >   [ 1570.068461]  cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles]
> > > > > >   [ 1570.068524]  __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache]
> > > > > >   [ 1570.068540]  __nfs_readpages_from_fscache+0x16d/0x630 [nfs]
> > > > > >   [ 1570.068585]  nfs_readpages+0x24e/0x540 [nfs]
> > > > > >   [ 1570.068693]  read_pages+0x5b1/0xc40
> > > > > >   [ 1570.068711]  page_cache_ra_unbounded+0x460/0x750
> > > > > >   [ 1570.068729]  generic_file_buffered_read_get_pages+0x290/0x1710
> > > > > >   [ 1570.068756]  generic_file_buffered_read+0x2a9/0xc30
> > > > > >   [ 1570.068832]  nfs_file_read+0x13f/0x230 [nfs]
> > > > > >   [ 1570.068872]  new_sync_read+0x3af/0x610
> > > > > >   [ 1570.068901]  vfs_read+0x339/0x4b0
> > > > > >   [ 1570.068909]  ksys_read+0xf1/0x1c0
> > > > > >   [ 1570.068920]  do_syscall_64+0x33/0x40
> > > > > >   [ 1570.068926]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > > >   [ 1570.068930] RIP: 0033:0x7ff039135595
> > > > > > 
> > > > > > Before that commit, there was a try_charge() and commit_charge()
> > > > > > in __add_to_page_cache_locked(). These 2 separated charge functions
> > > > > > were replaced by a single mem_cgroup_charge(). However, it forgot
> > > > > > to add a matching mem_cgroup_uncharge() when the xarray insertion
> > > > > > failed with the page released back to the pool. Fix this by adding a
> > > > > > mem_cgroup_uncharge() call when insertion error happens.
> > > > > > 
> > > > > > Fixes: 3fea5a499d57 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")
> > > > > > Signed-off-by: Waiman Long <longman@redhat.com>
> > > > > OK, this is indeed a subtle bug. The patch aimed at simplifying the
> > > > > charge lifetime so that users do not really have to think about when to
> > > > > uncharge as that happens when the page is freed. fscache somehow breaks
> > > > > that assumption because it doesn't free up pages but it keeps some of
> > > > > them in the cache.
> > > > > 
> > > > > I have tried to wrap my head around the cached object life time in
> > > > > fscache but failed and got lost in the maze. Is this the only instance
> > > > > of the problem? Would it make more sense to explicitly handle charges in
> > > > > the fscache code or there are other potential users to fall into this
> > > > > trap?
> > > > 
> > > > There may be other places that have similar problem. I focus on the
> > > > filemap.c case as I have a test case that can reliably produce the bug
> > > > splat. This patch does fix it for my test case.
> > > 
> > > I believe this needs a more general fix than catching a random places
> > > which you can trigger. Would it make more sense to address this at the
> > > fscache level and always make sure that a page returned to the pool is
> > > always uncharged instead?
> > 
> > I believe you mean "page cache" -- there is a separate thing called
> > 'fscache' which is used to cache network filesystems.
> 
> Yes, I really had fscache in mind because it does have an "unusual" page
> life time rules.
> 
> > I don't understand the memcg code at all, so I have no useful feedback
> > on what you're saying other than this.
> 
> Well the memcg accounting rules after the rework should have simplified
> the API usage for most users. You will get memory charged when it is
> used and it will go away when the page is freed. If a page is not really
> freed in some cases and it can be reused then it doesn't really fit into
> this scheme automagically. I do undestand that this puts some additional
> burden on those special cases. I am not really sure what is the right
> way here myself but considering there might be other similar cases like
> that I would lean towards special casing where the pool is implemented.
> I would expect there is some state to be maintain for that purpose
> already.

FWIW, khugepaged does a similar type of page recycling, where when one
virtual area fails to collapse, the scanner caches the physical page
and tries to reuse it for the next area. See the mem_cgroup_uncharge()
call in collapse_huge_page().

While it's nice to be able to leave the uncharge to the free call, in
reuse cases like this I don't think it's a problem to have a matching
uncharge.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
  2021-01-25 18:29               ` Matthew Wilcox
@ 2021-01-25 18:45                 ` Waiman Long
  0 siblings, 0 replies; 26+ messages in thread
From: Waiman Long @ 2021-01-25 18:45 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Michal Hocko, Andrew Morton, Johannes Weiner, Alex Shi, linux-mm,
	linux-kernel

On 1/25/21 1:29 PM, Matthew Wilcox wrote:
> On Mon, Jan 25, 2021 at 01:23:58PM -0500, Waiman Long wrote:
>> On 1/25/21 1:14 PM, Michal Hocko wrote:
>>> On Mon 25-01-21 17:41:19, Michal Hocko wrote:
>>>> On Mon 25-01-21 16:25:06, Matthew Wilcox wrote:
>>>>> On Mon, Jan 25, 2021 at 05:03:28PM +0100, Michal Hocko wrote:
>>>>>> On Mon 25-01-21 10:57:54, Waiman Long wrote:
>>>>>>> On 1/25/21 4:28 AM, Michal Hocko wrote:
>>>>>>>> On Sun 24-01-21 23:24:41, Waiman Long wrote:
>>>>>>>>> The commit 3fea5a499d57 ("mm: memcontrol: convert page
>>>>>>>>> cache to a new mem_cgroup_charge() API") introduced a bug in
>>>>>>>>> __add_to_page_cache_locked() causing the following splat:
>>>>>>>>>
>>>>>>>>>     [ 1570.068330] page dumped because: VM_BUG_ON_PAGE(page_memcg(page))
>>>>>>>>>     [ 1570.068333] pages's memcg:ffff8889a4116000
>>>>>>>>>     [ 1570.068343] ------------[ cut here ]------------
>>>>>>>>>     [ 1570.068346] kernel BUG at mm/memcontrol.c:2924!
>>>>>>>>>     [ 1570.068355] invalid opcode: 0000 [#1] SMP KASAN PTI
>>>>>>>>>     [ 1570.068359] CPU: 35 PID: 12345 Comm: cat Tainted: G S      W I       5.11.0-rc4-debug+ #1
>>>>>>>>>     [ 1570.068363] Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017
>>>>>>>>>     [ 1570.068365] RIP: 0010:commit_charge+0xf4/0x130
>>>>>>>>>       :
>>>>>>>>>     [ 1570.068375] RSP: 0018:ffff8881b38d70e8 EFLAGS: 00010286
>>>>>>>>>     [ 1570.068379] RAX: 0000000000000000 RBX: ffffea00260ddd00 RCX: 0000000000000027
>>>>>>>>>     [ 1570.068382] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88907ebe05a8
>>>>>>>>>     [ 1570.068384] RBP: ffffea00260ddd00 R08: ffffed120fd7c0b6 R09: ffffed120fd7c0b6
>>>>>>>>>     [ 1570.068386] R10: ffff88907ebe05ab R11: ffffed120fd7c0b5 R12: ffffea00260ddd38
>>>>>>>>>     [ 1570.068389] R13: ffff8889a4116000 R14: ffff8889a4116000 R15: 0000000000000001
>>>>>>>>>     [ 1570.068391] FS:  00007ff039638680(0000) GS:ffff88907ea00000(0000) knlGS:0000000000000000
>>>>>>>>>     [ 1570.068394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>>>     [ 1570.068396] CR2: 00007f36f354cc20 CR3: 00000008a0126006 CR4: 00000000007706e0
>>>>>>>>>     [ 1570.068398] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>>>     [ 1570.068400] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>>>     [ 1570.068402] PKRU: 55555554
>>>>>>>>>     [ 1570.068404] Call Trace:
>>>>>>>>>     [ 1570.068407]  mem_cgroup_charge+0x175/0x770
>>>>>>>>>     [ 1570.068413]  __add_to_page_cache_locked+0x712/0xad0
>>>>>>>>>     [ 1570.068439]  add_to_page_cache_lru+0xc5/0x1f0
>>>>>>>>>     [ 1570.068461]  cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles]
>>>>>>>>>     [ 1570.068524]  __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache]
>>>>>>>>>     [ 1570.068540]  __nfs_readpages_from_fscache+0x16d/0x630 [nfs]
>>>>>>>>>     [ 1570.068585]  nfs_readpages+0x24e/0x540 [nfs]
>>>>>>>>>     [ 1570.068693]  read_pages+0x5b1/0xc40
>>>>>>>>>     [ 1570.068711]  page_cache_ra_unbounded+0x460/0x750
>>>>>>>>>     [ 1570.068729]  generic_file_buffered_read_get_pages+0x290/0x1710
>>>>>>>>>     [ 1570.068756]  generic_file_buffered_read+0x2a9/0xc30
>>>>>>>>>     [ 1570.068832]  nfs_file_read+0x13f/0x230 [nfs]
>>>>>>>>>     [ 1570.068872]  new_sync_read+0x3af/0x610
>>>>>>>>>     [ 1570.068901]  vfs_read+0x339/0x4b0
>>>>>>>>>     [ 1570.068909]  ksys_read+0xf1/0x1c0
>>>>>>>>>     [ 1570.068920]  do_syscall_64+0x33/0x40
>>>>>>>>>     [ 1570.068926]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>>>>>     [ 1570.068930] RIP: 0033:0x7ff039135595
>>>>>>>>>
>>>>>>>>> Before that commit, there was a try_charge() and commit_charge()
>>>>>>>>> in __add_to_page_cache_locked(). These 2 separated charge functions
>>>>>>>>> were replaced by a single mem_cgroup_charge(). However, it forgot
>>>>>>>>> to add a matching mem_cgroup_uncharge() when the xarray insertion
>>>>>>>>> failed with the page released back to the pool. Fix this by adding a
>>>>>>>>> mem_cgroup_uncharge() call when insertion error happens.
>>>>>>>>>
>>>>>>>>> Fixes: 3fea5a499d57 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")
>>>>>>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>>>>>> OK, this is indeed a subtle bug. The patch aimed at simplifying the
>>>>>>>> charge lifetime so that users do not really have to think about when to
>>>>>>>> uncharge as that happens when the page is freed. fscache somehow breaks
>>>>>>>> that assumption because it doesn't free up pages but it keeps some of
>>>>>>>> them in the cache.
>>>>>>>>
>>>>>>>> I have tried to wrap my head around the cached object life time in
>>>>>>>> fscache but failed and got lost in the maze. Is this the only instance
>>>>>>>> of the problem? Would it make more sense to explicitly handle charges in
>>>>>>>> the fscache code or there are other potential users to fall into this
>>>>>>>> trap?
>>>>>>> There may be other places that have similar problem. I focus on the
>>>>>>> filemap.c case as I have a test case that can reliably produce the bug
>>>>>>> splat. This patch does fix it for my test case.
>>>>>> I believe this needs a more general fix than catching a random places
>>>>>> which you can trigger. Would it make more sense to address this at the
>>>>>> fscache level and always make sure that a page returned to the pool is
>>>>>> always uncharged instead?
>>>>> I believe you mean "page cache" -- there is a separate thing called
>>>>> 'fscache' which is used to cache network filesystems.
>>>> Yes, I really had fscache in mind because it does have an "unusual" page
>>>> life time rules.
>>>>
>>>>> I don't understand the memcg code at all, so I have no useful feedback
>>>>> on what you're saying other than this.
>>>> Well the memcg accounting rules after the rework should have simplified
>>>> the API usage for most users. You will get memory charged when it is
>>>> used and it will go away when the page is freed. If a page is not really
>>>> freed in some cases and it can be reused then it doesn't really fit into
>>>> this scheme automagically. I do undestand that this puts some additional
>>>> burden on those special cases. I am not really sure what is the right
>>>> way here myself but considering there might be other similar cases like
>>>> that I would lean towards special casing where the pool is implemented.
>>>> I would expect there is some state to be maintain for that purpose
>>>> already.
>>> After some more thinking I've came to conclusion that the patch as
>>> proposed is the proper way forward. It is easier to follow if the
>>> unwinding of state changes are local to the function.
>> I think so. It is easier to understand if the charge and uncharge functions
>> are grouped together in the same function.
>>> With the proposed simplification by Willy
>>> Acked-by: Michal Hocko <mhocko@suse.com>
>> Thank for the ack. However, I am a bit confused about what you mean by
>> simplification. There is another linux-next patch that changes the condition
>> for mem_cgroup_charge() to
>>
>> -       if (!huge) {
>> +       if (!huge && !page_is_secretmem(page)) {
>>                  error = mem_cgroup_charge(page, current->mm, gfp);
>>
>> That is the main reason why I introduced the boolean variable as I don't
>> want to call the external page_is_secretmem() function twice.
> See earlier emails to Mike suggesting that the accounting of secretmem
> here is wrong.
>
I am not in the secretmem discussion and so I am not aware of the issue 
with that patchset. Anyway, my goal is to minimize potential conflict 
with other patches.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
  2021-01-25 18:23             ` Waiman Long
  2021-01-25 18:29               ` Matthew Wilcox
  2021-01-25 18:31               ` Michal Hocko
@ 2021-01-25 18:52               ` Johannes Weiner
  2021-01-25 18:57                 ` Waiman Long
  2 siblings, 1 reply; 26+ messages in thread
From: Johannes Weiner @ 2021-01-25 18:52 UTC (permalink / raw)
  To: Waiman Long
  Cc: Michal Hocko, Matthew Wilcox, Andrew Morton, Alex Shi, linux-mm,
	linux-kernel

On Mon, Jan 25, 2021 at 01:23:58PM -0500, Waiman Long wrote:
> On 1/25/21 1:14 PM, Michal Hocko wrote:
> > On Mon 25-01-21 17:41:19, Michal Hocko wrote:
> > > On Mon 25-01-21 16:25:06, Matthew Wilcox wrote:
> > > > On Mon, Jan 25, 2021 at 05:03:28PM +0100, Michal Hocko wrote:
> > > > > On Mon 25-01-21 10:57:54, Waiman Long wrote:
> > > > > > On 1/25/21 4:28 AM, Michal Hocko wrote:
> > > > > > > On Sun 24-01-21 23:24:41, Waiman Long wrote:
> > > > > > > > The commit 3fea5a499d57 ("mm: memcontrol: convert page
> > > > > > > > cache to a new mem_cgroup_charge() API") introduced a bug in
> > > > > > > > __add_to_page_cache_locked() causing the following splat:
> > > > > > > > 
> > > > > > > >    [ 1570.068330] page dumped because: VM_BUG_ON_PAGE(page_memcg(page))
> > > > > > > >    [ 1570.068333] pages's memcg:ffff8889a4116000
> > > > > > > >    [ 1570.068343] ------------[ cut here ]------------
> > > > > > > >    [ 1570.068346] kernel BUG at mm/memcontrol.c:2924!
> > > > > > > >    [ 1570.068355] invalid opcode: 0000 [#1] SMP KASAN PTI
> > > > > > > >    [ 1570.068359] CPU: 35 PID: 12345 Comm: cat Tainted: G S      W I       5.11.0-rc4-debug+ #1
> > > > > > > >    [ 1570.068363] Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017
> > > > > > > >    [ 1570.068365] RIP: 0010:commit_charge+0xf4/0x130
> > > > > > > >      :
> > > > > > > >    [ 1570.068375] RSP: 0018:ffff8881b38d70e8 EFLAGS: 00010286
> > > > > > > >    [ 1570.068379] RAX: 0000000000000000 RBX: ffffea00260ddd00 RCX: 0000000000000027
> > > > > > > >    [ 1570.068382] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88907ebe05a8
> > > > > > > >    [ 1570.068384] RBP: ffffea00260ddd00 R08: ffffed120fd7c0b6 R09: ffffed120fd7c0b6
> > > > > > > >    [ 1570.068386] R10: ffff88907ebe05ab R11: ffffed120fd7c0b5 R12: ffffea00260ddd38
> > > > > > > >    [ 1570.068389] R13: ffff8889a4116000 R14: ffff8889a4116000 R15: 0000000000000001
> > > > > > > >    [ 1570.068391] FS:  00007ff039638680(0000) GS:ffff88907ea00000(0000) knlGS:0000000000000000
> > > > > > > >    [ 1570.068394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > >    [ 1570.068396] CR2: 00007f36f354cc20 CR3: 00000008a0126006 CR4: 00000000007706e0
> > > > > > > >    [ 1570.068398] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > > >    [ 1570.068400] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > > >    [ 1570.068402] PKRU: 55555554
> > > > > > > >    [ 1570.068404] Call Trace:
> > > > > > > >    [ 1570.068407]  mem_cgroup_charge+0x175/0x770
> > > > > > > >    [ 1570.068413]  __add_to_page_cache_locked+0x712/0xad0
> > > > > > > >    [ 1570.068439]  add_to_page_cache_lru+0xc5/0x1f0
> > > > > > > >    [ 1570.068461]  cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles]
> > > > > > > >    [ 1570.068524]  __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache]
> > > > > > > >    [ 1570.068540]  __nfs_readpages_from_fscache+0x16d/0x630 [nfs]
> > > > > > > >    [ 1570.068585]  nfs_readpages+0x24e/0x540 [nfs]
> > > > > > > >    [ 1570.068693]  read_pages+0x5b1/0xc40
> > > > > > > >    [ 1570.068711]  page_cache_ra_unbounded+0x460/0x750
> > > > > > > >    [ 1570.068729]  generic_file_buffered_read_get_pages+0x290/0x1710
> > > > > > > >    [ 1570.068756]  generic_file_buffered_read+0x2a9/0xc30
> > > > > > > >    [ 1570.068832]  nfs_file_read+0x13f/0x230 [nfs]
> > > > > > > >    [ 1570.068872]  new_sync_read+0x3af/0x610
> > > > > > > >    [ 1570.068901]  vfs_read+0x339/0x4b0
> > > > > > > >    [ 1570.068909]  ksys_read+0xf1/0x1c0
> > > > > > > >    [ 1570.068920]  do_syscall_64+0x33/0x40
> > > > > > > >    [ 1570.068926]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > > > > >    [ 1570.068930] RIP: 0033:0x7ff039135595
> > > > > > > > 
> > > > > > > > Before that commit, there was a try_charge() and commit_charge()
> > > > > > > > in __add_to_page_cache_locked(). These 2 separated charge functions
> > > > > > > > were replaced by a single mem_cgroup_charge(). However, it forgot
> > > > > > > > to add a matching mem_cgroup_uncharge() when the xarray insertion
> > > > > > > > failed with the page released back to the pool. Fix this by adding a
> > > > > > > > mem_cgroup_uncharge() call when insertion error happens.
> > > > > > > > 
> > > > > > > > Fixes: 3fea5a499d57 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")
> > > > > > > > Signed-off-by: Waiman Long <longman@redhat.com>
> > > > > > > OK, this is indeed a subtle bug. The patch aimed at simplifying the
> > > > > > > charge lifetime so that users do not really have to think about when to
> > > > > > > uncharge as that happens when the page is freed. fscache somehow breaks
> > > > > > > that assumption because it doesn't free up pages but it keeps some of
> > > > > > > them in the cache.
> > > > > > > 
> > > > > > > I have tried to wrap my head around the cached object life time in
> > > > > > > fscache but failed and got lost in the maze. Is this the only instance
> > > > > > > of the problem? Would it make more sense to explicitly handle charges in
> > > > > > > the fscache code or there are other potential users to fall into this
> > > > > > > trap?
> > > > > > There may be other places that have similar problem. I focus on the
> > > > > > filemap.c case as I have a test case that can reliably produce the bug
> > > > > > splat. This patch does fix it for my test case.
> > > > > I believe this needs a more general fix than catching a random places
> > > > > which you can trigger. Would it make more sense to address this at the
> > > > > fscache level and always make sure that a page returned to the pool is
> > > > > always uncharged instead?
> > > > I believe you mean "page cache" -- there is a separate thing called
> > > > 'fscache' which is used to cache network filesystems.
> > > Yes, I really had fscache in mind because it does have an "unusual" page
> > > life time rules.
> > > 
> > > > I don't understand the memcg code at all, so I have no useful feedback
> > > > on what you're saying other than this.
> > > Well the memcg accounting rules after the rework should have simplified
> > > the API usage for most users. You will get memory charged when it is
> > > used and it will go away when the page is freed. If a page is not really
> > > freed in some cases and it can be reused then it doesn't really fit into
> > > this scheme automagically. I do undestand that this puts some additional
> > > burden on those special cases. I am not really sure what is the right
> > > way here myself but considering there might be other similar cases like
> > > that I would lean towards special casing where the pool is implemented.
> > > I would expect there is some state to be maintain for that purpose
> > > already.
> > After some more thinking I've came to conclusion that the patch as
> > proposed is the proper way forward. It is easier to follow if the
> > unwinding of state changes are local to the function.
> I think so. It is easier to understand if the charge and uncharge functions
> are grouped together in the same function.
> > 
> > With the proposed simplification by Willy
> > Acked-by: Michal Hocko <mhocko@suse.com>
> 
> Thank for the ack. However, I am a bit confused about what you mean by
> simplification. There is another linux-next patch that changes the condition
> for mem_cgroup_charge() to
> 
> -       if (!huge) {
> +       if (!huge && !page_is_secretmem(page)) {
>                 error = mem_cgroup_charge(page, current->mm, gfp);
> 
> That is the main reason why I introduced the boolean variable as I don't
> want to call the external page_is_secretmem() function twice.

The variable works for me.

On the other hand, as Michal points out, the uncharge function will be
called again on the page when it's being freed (in non-fscache cases),
so you're already relying on being able to call it on any page -
charged, uncharged, never charged. It would be fine to call it
unconditionally in the error path. Aesthetic preference, I guess.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
  2021-01-25 18:52               ` Johannes Weiner
@ 2021-01-25 18:57                 ` Waiman Long
  2021-01-26  8:01                   ` Michal Hocko
  0 siblings, 1 reply; 26+ messages in thread
From: Waiman Long @ 2021-01-25 18:57 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Michal Hocko, Matthew Wilcox, Andrew Morton, Alex Shi, linux-mm,
	linux-kernel

On 1/25/21 1:52 PM, Johannes Weiner wrote:
> On Mon, Jan 25, 2021 at 01:23:58PM -0500, Waiman Long wrote:
>> On 1/25/21 1:14 PM, Michal Hocko wrote:
>>> On Mon 25-01-21 17:41:19, Michal Hocko wrote:
>>>> On Mon 25-01-21 16:25:06, Matthew Wilcox wrote:
>>>>> On Mon, Jan 25, 2021 at 05:03:28PM +0100, Michal Hocko wrote:
>>>>>> On Mon 25-01-21 10:57:54, Waiman Long wrote:
>>>>>>> On 1/25/21 4:28 AM, Michal Hocko wrote:
>>>>>>>> On Sun 24-01-21 23:24:41, Waiman Long wrote:
>>>>>>>>> The commit 3fea5a499d57 ("mm: memcontrol: convert page
>>>>>>>>> cache to a new mem_cgroup_charge() API") introduced a bug in
>>>>>>>>> __add_to_page_cache_locked() causing the following splat:
>>>>>>>>>
>>>>>>>>>     [ 1570.068330] page dumped because: VM_BUG_ON_PAGE(page_memcg(page))
>>>>>>>>>     [ 1570.068333] pages's memcg:ffff8889a4116000
>>>>>>>>>     [ 1570.068343] ------------[ cut here ]------------
>>>>>>>>>     [ 1570.068346] kernel BUG at mm/memcontrol.c:2924!
>>>>>>>>>     [ 1570.068355] invalid opcode: 0000 [#1] SMP KASAN PTI
>>>>>>>>>     [ 1570.068359] CPU: 35 PID: 12345 Comm: cat Tainted: G S      W I       5.11.0-rc4-debug+ #1
>>>>>>>>>     [ 1570.068363] Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017
>>>>>>>>>     [ 1570.068365] RIP: 0010:commit_charge+0xf4/0x130
>>>>>>>>>       :
>>>>>>>>>     [ 1570.068375] RSP: 0018:ffff8881b38d70e8 EFLAGS: 00010286
>>>>>>>>>     [ 1570.068379] RAX: 0000000000000000 RBX: ffffea00260ddd00 RCX: 0000000000000027
>>>>>>>>>     [ 1570.068382] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88907ebe05a8
>>>>>>>>>     [ 1570.068384] RBP: ffffea00260ddd00 R08: ffffed120fd7c0b6 R09: ffffed120fd7c0b6
>>>>>>>>>     [ 1570.068386] R10: ffff88907ebe05ab R11: ffffed120fd7c0b5 R12: ffffea00260ddd38
>>>>>>>>>     [ 1570.068389] R13: ffff8889a4116000 R14: ffff8889a4116000 R15: 0000000000000001
>>>>>>>>>     [ 1570.068391] FS:  00007ff039638680(0000) GS:ffff88907ea00000(0000) knlGS:0000000000000000
>>>>>>>>>     [ 1570.068394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>>>     [ 1570.068396] CR2: 00007f36f354cc20 CR3: 00000008a0126006 CR4: 00000000007706e0
>>>>>>>>>     [ 1570.068398] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>>>     [ 1570.068400] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>>>     [ 1570.068402] PKRU: 55555554
>>>>>>>>>     [ 1570.068404] Call Trace:
>>>>>>>>>     [ 1570.068407]  mem_cgroup_charge+0x175/0x770
>>>>>>>>>     [ 1570.068413]  __add_to_page_cache_locked+0x712/0xad0
>>>>>>>>>     [ 1570.068439]  add_to_page_cache_lru+0xc5/0x1f0
>>>>>>>>>     [ 1570.068461]  cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles]
>>>>>>>>>     [ 1570.068524]  __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache]
>>>>>>>>>     [ 1570.068540]  __nfs_readpages_from_fscache+0x16d/0x630 [nfs]
>>>>>>>>>     [ 1570.068585]  nfs_readpages+0x24e/0x540 [nfs]
>>>>>>>>>     [ 1570.068693]  read_pages+0x5b1/0xc40
>>>>>>>>>     [ 1570.068711]  page_cache_ra_unbounded+0x460/0x750
>>>>>>>>>     [ 1570.068729]  generic_file_buffered_read_get_pages+0x290/0x1710
>>>>>>>>>     [ 1570.068756]  generic_file_buffered_read+0x2a9/0xc30
>>>>>>>>>     [ 1570.068832]  nfs_file_read+0x13f/0x230 [nfs]
>>>>>>>>>     [ 1570.068872]  new_sync_read+0x3af/0x610
>>>>>>>>>     [ 1570.068901]  vfs_read+0x339/0x4b0
>>>>>>>>>     [ 1570.068909]  ksys_read+0xf1/0x1c0
>>>>>>>>>     [ 1570.068920]  do_syscall_64+0x33/0x40
>>>>>>>>>     [ 1570.068926]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>>>>>     [ 1570.068930] RIP: 0033:0x7ff039135595
>>>>>>>>>
>>>>>>>>> Before that commit, there was a try_charge() and commit_charge()
>>>>>>>>> in __add_to_page_cache_locked(). These 2 separated charge functions
>>>>>>>>> were replaced by a single mem_cgroup_charge(). However, it forgot
>>>>>>>>> to add a matching mem_cgroup_uncharge() when the xarray insertion
>>>>>>>>> failed with the page released back to the pool. Fix this by adding a
>>>>>>>>> mem_cgroup_uncharge() call when insertion error happens.
>>>>>>>>>
>>>>>>>>> Fixes: 3fea5a499d57 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")
>>>>>>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>>>>>> OK, this is indeed a subtle bug. The patch aimed at simplifying the
>>>>>>>> charge lifetime so that users do not really have to think about when to
>>>>>>>> uncharge as that happens when the page is freed. fscache somehow breaks
>>>>>>>> that assumption because it doesn't free up pages but it keeps some of
>>>>>>>> them in the cache.
>>>>>>>>
>>>>>>>> I have tried to wrap my head around the cached object life time in
>>>>>>>> fscache but failed and got lost in the maze. Is this the only instance
>>>>>>>> of the problem? Would it make more sense to explicitly handle charges in
>>>>>>>> the fscache code or there are other potential users to fall into this
>>>>>>>> trap?
>>>>>>> There may be other places that have similar problem. I focus on the
>>>>>>> filemap.c case as I have a test case that can reliably produce the bug
>>>>>>> splat. This patch does fix it for my test case.
>>>>>> I believe this needs a more general fix than catching a random places
>>>>>> which you can trigger. Would it make more sense to address this at the
>>>>>> fscache level and always make sure that a page returned to the pool is
>>>>>> always uncharged instead?
>>>>> I believe you mean "page cache" -- there is a separate thing called
>>>>> 'fscache' which is used to cache network filesystems.
>>>> Yes, I really had fscache in mind because it does have an "unusual" page
>>>> life time rules.
>>>>
>>>>> I don't understand the memcg code at all, so I have no useful feedback
>>>>> on what you're saying other than this.
>>>> Well the memcg accounting rules after the rework should have simplified
>>>> the API usage for most users. You will get memory charged when it is
>>>> used and it will go away when the page is freed. If a page is not really
>>>> freed in some cases and it can be reused then it doesn't really fit into
>>>> this scheme automagically. I do undestand that this puts some additional
>>>> burden on those special cases. I am not really sure what is the right
>>>> way here myself but considering there might be other similar cases like
>>>> that I would lean towards special casing where the pool is implemented.
>>>> I would expect there is some state to be maintain for that purpose
>>>> already.
>>> After some more thinking I've came to conclusion that the patch as
>>> proposed is the proper way forward. It is easier to follow if the
>>> unwinding of state changes are local to the function.
>> I think so. It is easier to understand if the charge and uncharge functions
>> are grouped together in the same function.
>>> With the proposed simplification by Willy
>>> Acked-by: Michal Hocko <mhocko@suse.com>
>> Thank for the ack. However, I am a bit confused about what you mean by
>> simplification. There is another linux-next patch that changes the condition
>> for mem_cgroup_charge() to
>>
>> -       if (!huge) {
>> +       if (!huge && !page_is_secretmem(page)) {
>>                  error = mem_cgroup_charge(page, current->mm, gfp);
>>
>> That is the main reason why I introduced the boolean variable as I don't
>> want to call the external page_is_secretmem() function twice.
> The variable works for me.
>
> On the other hand, as Michal points out, the uncharge function will be
> called again on the page when it's being freed (in non-fscache cases),
> so you're already relying on being able to call it on any page -
> charged, uncharged, never charged. It would be fine to call it
> unconditionally in the error path. Aesthetic preference, I guess.

That may be true. However, I haven't fully studied how the huge page 
memory accounting work to make sure the uncharge function can be called 
for huge pages. So I will keep the current code for now.

Thanks,
Longman



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
  2021-01-25  4:24 [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked() Waiman Long
                   ` (4 preceding siblings ...)
  2021-01-25  9:28 ` Michal Hocko
@ 2021-01-25 19:32 ` Johannes Weiner
  5 siblings, 0 replies; 26+ messages in thread
From: Johannes Weiner @ 2021-01-25 19:32 UTC (permalink / raw)
  To: Waiman Long; +Cc: Andrew Morton, Alex Shi, linux-mm, linux-kernel

On Sun, Jan 24, 2021 at 11:24:41PM -0500, Waiman Long wrote:
> The commit 3fea5a499d57 ("mm: memcontrol: convert page
> cache to a new mem_cgroup_charge() API") introduced a bug in
> __add_to_page_cache_locked() causing the following splat:
> 
>  [ 1570.068330] page dumped because: VM_BUG_ON_PAGE(page_memcg(page))
>  [ 1570.068333] pages's memcg:ffff8889a4116000
>  [ 1570.068343] ------------[ cut here ]------------
>  [ 1570.068346] kernel BUG at mm/memcontrol.c:2924!
>  [ 1570.068355] invalid opcode: 0000 [#1] SMP KASAN PTI
>  [ 1570.068359] CPU: 35 PID: 12345 Comm: cat Tainted: G S      W I       5.11.0-rc4-debug+ #1
>  [ 1570.068363] Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017
>  [ 1570.068365] RIP: 0010:commit_charge+0xf4/0x130
>    :
>  [ 1570.068375] RSP: 0018:ffff8881b38d70e8 EFLAGS: 00010286
>  [ 1570.068379] RAX: 0000000000000000 RBX: ffffea00260ddd00 RCX: 0000000000000027
>  [ 1570.068382] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88907ebe05a8
>  [ 1570.068384] RBP: ffffea00260ddd00 R08: ffffed120fd7c0b6 R09: ffffed120fd7c0b6
>  [ 1570.068386] R10: ffff88907ebe05ab R11: ffffed120fd7c0b5 R12: ffffea00260ddd38
>  [ 1570.068389] R13: ffff8889a4116000 R14: ffff8889a4116000 R15: 0000000000000001
>  [ 1570.068391] FS:  00007ff039638680(0000) GS:ffff88907ea00000(0000) knlGS:0000000000000000
>  [ 1570.068394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  [ 1570.068396] CR2: 00007f36f354cc20 CR3: 00000008a0126006 CR4: 00000000007706e0
>  [ 1570.068398] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  [ 1570.068400] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  [ 1570.068402] PKRU: 55555554
>  [ 1570.068404] Call Trace:
>  [ 1570.068407]  mem_cgroup_charge+0x175/0x770
>  [ 1570.068413]  __add_to_page_cache_locked+0x712/0xad0
>  [ 1570.068439]  add_to_page_cache_lru+0xc5/0x1f0
>  [ 1570.068461]  cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles]
>  [ 1570.068524]  __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache]
>  [ 1570.068540]  __nfs_readpages_from_fscache+0x16d/0x630 [nfs]
>  [ 1570.068585]  nfs_readpages+0x24e/0x540 [nfs]
>  [ 1570.068693]  read_pages+0x5b1/0xc40
>  [ 1570.068711]  page_cache_ra_unbounded+0x460/0x750
>  [ 1570.068729]  generic_file_buffered_read_get_pages+0x290/0x1710
>  [ 1570.068756]  generic_file_buffered_read+0x2a9/0xc30
>  [ 1570.068832]  nfs_file_read+0x13f/0x230 [nfs]
>  [ 1570.068872]  new_sync_read+0x3af/0x610
>  [ 1570.068901]  vfs_read+0x339/0x4b0
>  [ 1570.068909]  ksys_read+0xf1/0x1c0
>  [ 1570.068920]  do_syscall_64+0x33/0x40
>  [ 1570.068926]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>  [ 1570.068930] RIP: 0033:0x7ff039135595
> 
> Before that commit, there was a try_charge() and commit_charge()
> in __add_to_page_cache_locked(). These 2 separated charge functions
> were replaced by a single mem_cgroup_charge(). However, it forgot
> to add a matching mem_cgroup_uncharge() when the xarray insertion
> failed with the page released back to the pool. Fix this by adding a
> mem_cgroup_uncharge() call when insertion error happens.
> 
> Fixes: 3fea5a499d57 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")
> Signed-off-by: Waiman Long <longman@redhat.com>

Thanks Waiman.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Since this crashes the kernel, we should also add

Cc: stable@vger.kernel.org # 5.8+

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
  2021-01-25 14:12   ` Waiman Long
@ 2021-01-25 23:11     ` Andrew Morton
  2021-01-25 23:13       ` Waiman Long
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2021-01-25 23:11 UTC (permalink / raw)
  To: Waiman Long; +Cc: Miaohe Lin, linux-mm, linux-kernel, Johannes Weiner, Alex Shi

On Mon, 25 Jan 2021 09:12:56 -0500 Waiman Long <longman@redhat.com> wrote:

> Yes, this patch should go to stable. I think the stable tree maintainers 
> will automatically pick up patches with the "Fixes" tag. That is why I 
> don't explicitly put a "cc:stable" line in the patch.

No.  Both the Fixes: and the cc:stable are needed for the backport. 
Because sometimes we fix things but don't consider the fix important
enough to risk the backport.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
  2021-01-25 23:11     ` Andrew Morton
@ 2021-01-25 23:13       ` Waiman Long
  0 siblings, 0 replies; 26+ messages in thread
From: Waiman Long @ 2021-01-25 23:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Miaohe Lin, linux-mm, linux-kernel, Johannes Weiner, Alex Shi

On 1/25/21 6:11 PM, Andrew Morton wrote:
> On Mon, 25 Jan 2021 09:12:56 -0500 Waiman Long <longman@redhat.com> wrote:
>
>> Yes, this patch should go to stable. I think the stable tree maintainers
>> will automatically pick up patches with the "Fixes" tag. That is why I
>> don't explicitly put a "cc:stable" line in the patch.
> No.  Both the Fixes: and the cc:stable are needed for the backport.
> Because sometimes we fix things but don't consider the fix important
> enough to risk the backport.
>
I see. Thanks for the clarification. I will keep that in mind next time.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
  2021-01-25 18:57                 ` Waiman Long
@ 2021-01-26  8:01                   ` Michal Hocko
  0 siblings, 0 replies; 26+ messages in thread
From: Michal Hocko @ 2021-01-26  8:01 UTC (permalink / raw)
  To: Waiman Long
  Cc: Johannes Weiner, Matthew Wilcox, Andrew Morton, Alex Shi,
	linux-mm, linux-kernel

On Mon 25-01-21 13:57:18, Waiman Long wrote:
> On 1/25/21 1:52 PM, Johannes Weiner wrote:
> > On Mon, Jan 25, 2021 at 01:23:58PM -0500, Waiman Long wrote:
> > > On 1/25/21 1:14 PM, Michal Hocko wrote:
[...]
> > > > With the proposed simplification by Willy
> > > > Acked-by: Michal Hocko <mhocko@suse.com>
> > > Thank for the ack. However, I am a bit confused about what you mean by
> > > simplification. There is another linux-next patch that changes the condition
> > > for mem_cgroup_charge() to
> > > 
> > > -       if (!huge) {
> > > +       if (!huge && !page_is_secretmem(page)) {
> > >                  error = mem_cgroup_charge(page, current->mm, gfp);
> > > 
> > > That is the main reason why I introduced the boolean variable as I don't
> > > want to call the external page_is_secretmem() function twice.
> > The variable works for me.
> > 
> > On the other hand, as Michal points out, the uncharge function will be
> > called again on the page when it's being freed (in non-fscache cases),
> > so you're already relying on being able to call it on any page -
> > charged, uncharged, never charged. It would be fine to call it
> > unconditionally in the error path. Aesthetic preference, I guess.

Yes aesthetic preference... Just compare to 
diff --git a/mm/filemap.c b/mm/filemap.c
index 5c9d564317a5..7aa05420107e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -896,11 +896,14 @@ noinline int __add_to_page_cache_locked(struct page *page,
 
 	if (xas_error(&xas)) {
 		error = xas_error(&xas);
-		goto error;
+		goto error_uncharge;
 	}
 
 	trace_mm_filemap_add_to_page_cache(page);
 	return 0;
+error_uncharge:
+	/* memcg will ignore uncharged pages */
+	mem_cgroup_uncharge(page);
 error:
 	page->mapping = NULL;
 	/* Leave page->index set: truncation relies upon it */

which resembles our usual state unwinding style much more.

> That may be true. However, I haven't fully studied how the huge page memory
> accounting work to make sure the uncharge function can be called for huge
> pages.

... but this is rather lame argument to make, don't you think. This
sounds like a ducktaping engineering to me. Over time this leads to a
terrible code. Seriously!

All that being said I do not want to block this or bother people with
more emails but geez
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-01-26 20:46 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-25  4:24 [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked() Waiman Long
2021-01-25  4:36 ` Matthew Wilcox
2021-01-25  6:34   ` Miaohe Lin
2021-01-25 14:09   ` Waiman Long
2021-01-25  4:41 ` Alex Shi
2021-01-25  6:30 ` Miaohe Lin
2021-01-25 14:12   ` Waiman Long
2021-01-25 23:11     ` Andrew Morton
2021-01-25 23:13       ` Waiman Long
2021-01-25  8:07 ` Muchun Song
2021-01-25 15:35   ` Waiman Long
2021-01-25  9:28 ` Michal Hocko
2021-01-25 15:57   ` Waiman Long
2021-01-25 16:03     ` Michal Hocko
2021-01-25 16:25       ` Matthew Wilcox
2021-01-25 16:41         ` Michal Hocko
2021-01-25 18:14           ` Michal Hocko
2021-01-25 18:23             ` Waiman Long
2021-01-25 18:29               ` Matthew Wilcox
2021-01-25 18:45                 ` Waiman Long
2021-01-25 18:31               ` Michal Hocko
2021-01-25 18:52               ` Johannes Weiner
2021-01-25 18:57                 ` Waiman Long
2021-01-26  8:01                   ` Michal Hocko
2021-01-25 18:43           ` Johannes Weiner
2021-01-25 19:32 ` Johannes Weiner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).