linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm/memcg: set memcg when split pages
@ 2021-03-02  1:34 Zhou Guanghui
  2021-03-02  1:59 ` Zi Yan
  2021-03-02  9:17 ` Michal Hocko
  0 siblings, 2 replies; 12+ messages in thread
From: Zhou Guanghui @ 2021-03-02  1:34 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: akpm, npiggin, wangkefeng.wang, guohanjun, dingtianhong,
	chenweilong, rui.xiang

When split page, the memory cgroup info recorded in first page is
not copied to tail pages. In this case, when the tail pages are
freed, the uncharge operation is not performed. As a result, the
usage of this memcg keeps increasing, and the OOM may occur.

So, the copying of first page's memory cgroup info to tail pages
is needed when split page.

Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
---
 include/linux/memcontrol.h | 10 ++++++++++
 mm/page_alloc.c            |  4 +++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e6dc793d587d..c7e2b4421dc1 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -867,6 +867,12 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg);
 extern bool cgroup_memory_noswap;
 #endif
 
+static inline void copy_page_memcg(struct page *dst, struct page *src)
+{
+	if (src->memcg_data)
+		dst->memcg_data = src->memcg_data;
+}
+
 struct mem_cgroup *lock_page_memcg(struct page *page);
 void __unlock_page_memcg(struct mem_cgroup *memcg);
 void unlock_page_memcg(struct page *page);
@@ -1291,6 +1297,10 @@ mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
 {
 }
 
+static inline void copy_page_memcg(struct page *dst, struct page *src)
+{
+}
+
 static inline struct mem_cgroup *lock_page_memcg(struct page *page)
 {
 	return NULL;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3e4b29ee2b1e..ee0a63dc1c9b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3307,8 +3307,10 @@ void split_page(struct page *page, unsigned int order)
 	VM_BUG_ON_PAGE(PageCompound(page), page);
 	VM_BUG_ON_PAGE(!page_count(page), page);
 
-	for (i = 1; i < (1 << order); i++)
+	for (i = 1; i < (1 << order); i++) {
 		set_page_refcounted(page + i);
+		copy_page_memcg(page + i, page);
+	}
 	split_page_owner(page, 1 << order);
 }
 EXPORT_SYMBOL_GPL(split_page);
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm/memcg: set memcg when split pages
  2021-03-02  1:34 [PATCH] mm/memcg: set memcg when split pages Zhou Guanghui
@ 2021-03-02  1:59 ` Zi Yan
  2021-03-02  7:05   ` Zhouguanghui (OS Kernel)
  2021-03-02  9:17 ` Michal Hocko
  1 sibling, 1 reply; 12+ messages in thread
From: Zi Yan @ 2021-03-02  1:59 UTC (permalink / raw)
  To: Zhou Guanghui
  Cc: linux-kernel, linux-mm, akpm, npiggin, wangkefeng.wang,
	guohanjun, dingtianhong, chenweilong, rui.xiang, Johannes Weiner,
	Michal Hocko, Vladimir Davydov

[-- Attachment #1: Type: text/plain, Size: 2357 bytes --]

On 1 Mar 2021, at 20:34, Zhou Guanghui wrote:

> When split page, the memory cgroup info recorded in first page is
> not copied to tail pages. In this case, when the tail pages are
> freed, the uncharge operation is not performed. As a result, the
> usage of this memcg keeps increasing, and the OOM may occur.
>
> So, the copying of first page's memory cgroup info to tail pages
> is needed when split page.
>
> Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
> ---
>  include/linux/memcontrol.h | 10 ++++++++++
>  mm/page_alloc.c            |  4 +++-
>  2 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index e6dc793d587d..c7e2b4421dc1 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -867,6 +867,12 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg);
>  extern bool cgroup_memory_noswap;
>  #endif
>
> +static inline void copy_page_memcg(struct page *dst, struct page *src)
> +{
> +	if (src->memcg_data)
> +		dst->memcg_data = src->memcg_data;
> +}
> +
>  struct mem_cgroup *lock_page_memcg(struct page *page);
>  void __unlock_page_memcg(struct mem_cgroup *memcg);
>  void unlock_page_memcg(struct page *page);
> @@ -1291,6 +1297,10 @@ mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
>  {
>  }
>
> +static inline void copy_page_memcg(struct page *dst, struct page *src)
> +{
> +}
> +
>  static inline struct mem_cgroup *lock_page_memcg(struct page *page)
>  {
>  	return NULL;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3e4b29ee2b1e..ee0a63dc1c9b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3307,8 +3307,10 @@ void split_page(struct page *page, unsigned int order)
>  	VM_BUG_ON_PAGE(PageCompound(page), page);
>  	VM_BUG_ON_PAGE(!page_count(page), page);
>
> -	for (i = 1; i < (1 << order); i++)
> +	for (i = 1; i < (1 << order); i++) {
>  		set_page_refcounted(page + i);
> +		copy_page_memcg(page + i, page);
> +	}
>  	split_page_owner(page, 1 << order);
>  }
>  EXPORT_SYMBOL_GPL(split_page);
> -- 
> 2.25.0

+memcg maintainers

split_page() is used for non-compound higher-order pages. I am not sure
if there is any such pages monitored by memcg. Please let me know
if I miss anything.

—
Best Regards,
Yan Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm/memcg: set memcg when split pages
  2021-03-02  1:59 ` Zi Yan
@ 2021-03-02  7:05   ` Zhouguanghui (OS Kernel)
  2021-03-02 15:37     ` Zi Yan
  0 siblings, 1 reply; 12+ messages in thread
From: Zhouguanghui (OS Kernel) @ 2021-03-02  7:05 UTC (permalink / raw)
  To: Zi Yan
  Cc: linux-kernel, linux-mm, akpm, npiggin, Wangkefeng (OS Kernel Lab),
	Guohanjun (Hanjun Guo),
	Dingtianhong, Chenweilong, Xiangrui (Euler),
	Johannes Weiner, Michal Hocko, Vladimir Davydov

在 2021/3/2 10:00, Zi Yan 写道:
> On 1 Mar 2021, at 20:34, Zhou Guanghui wrote:
> 
>> When split page, the memory cgroup info recorded in first page is
>> not copied to tail pages. In this case, when the tail pages are
>> freed, the uncharge operation is not performed. As a result, the
>> usage of this memcg keeps increasing, and the OOM may occur.
>>
>> So, the copying of first page's memory cgroup info to tail pages
>> is needed when split page.
>>
>> Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
>> ---
>>   include/linux/memcontrol.h | 10 ++++++++++
>>   mm/page_alloc.c            |  4 +++-
>>   2 files changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index e6dc793d587d..c7e2b4421dc1 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -867,6 +867,12 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg);
>>   extern bool cgroup_memory_noswap;
>>   #endif
>>
>> +static inline void copy_page_memcg(struct page *dst, struct page *src)
>> +{
>> +	if (src->memcg_data)
>> +		dst->memcg_data = src->memcg_data;
>> +}
>> +
>>   struct mem_cgroup *lock_page_memcg(struct page *page);
>>   void __unlock_page_memcg(struct mem_cgroup *memcg);
>>   void unlock_page_memcg(struct page *page);
>> @@ -1291,6 +1297,10 @@ mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
>>   {
>>   }
>>
>> +static inline void copy_page_memcg(struct page *dst, struct page *src)
>> +{
>> +}
>> +
>>   static inline struct mem_cgroup *lock_page_memcg(struct page *page)
>>   {
>>   	return NULL;
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 3e4b29ee2b1e..ee0a63dc1c9b 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -3307,8 +3307,10 @@ void split_page(struct page *page, unsigned int order)
>>   	VM_BUG_ON_PAGE(PageCompound(page), page);
>>   	VM_BUG_ON_PAGE(!page_count(page), page);
>>
>> -	for (i = 1; i < (1 << order); i++)
>> +	for (i = 1; i < (1 << order); i++) {
>>   		set_page_refcounted(page + i);
>> +		copy_page_memcg(page + i, page);
>> +	}
>>   	split_page_owner(page, 1 << order);
>>   }
>>   EXPORT_SYMBOL_GPL(split_page);
>> -- 
>> 2.25.0
> 
> +memcg maintainers
> 
> split_page() is used for non-compound higher-order pages. I am not sure
> if there is any such pages monitored by memcg. Please let me know
> if I miss anything.

Thank you for taking time for this.

This should be put in kmemcg, and I'll modify it.

When the kmemcg is enabled and _GFP_ACCOUNT is set, the charged and 
uncharged sizes do not match when alloc/free_pages_exact method is used 
to apply for or free memory with exact size. This is because memcg data 
of the tail page is not set during the split page.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm/memcg: set memcg when split pages
  2021-03-02  1:34 [PATCH] mm/memcg: set memcg when split pages Zhou Guanghui
  2021-03-02  1:59 ` Zi Yan
@ 2021-03-02  9:17 ` Michal Hocko
  2021-03-02 15:51   ` Michal Hocko
  2021-03-02 20:24   ` Hugh Dickins
  1 sibling, 2 replies; 12+ messages in thread
From: Michal Hocko @ 2021-03-02  9:17 UTC (permalink / raw)
  To: Zhou Guanghui
  Cc: linux-kernel, linux-mm, akpm, wangkefeng.wang, guohanjun,
	dingtianhong, chenweilong, rui.xiang, Johannes Weiner,
	Nicholas Piggin

[Cc Johannes for awareness and fixup Nick's email]

On Tue 02-03-21 01:34:51, Zhou Guanghui wrote:
> When split page, the memory cgroup info recorded in first page is
> not copied to tail pages. In this case, when the tail pages are
> freed, the uncharge operation is not performed. As a result, the
> usage of this memcg keeps increasing, and the OOM may occur.
> 
> So, the copying of first page's memory cgroup info to tail pages
> is needed when split page.

I was not aware that alloc_pages_exact is used for accounted allocations
but git grep told me otherwise so this is not a theoretical one. Both
users (arm64 and s390 kvm) are quite recent AFAICS. split_page is also
used in dma allocator but I got lost in indirection so I have no idea
whether there are any users there.

The page itself looks reasonable to me.

> Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>

Acked-by: Michal Hocko <mhocko@suse.com>

Minor nit

> ---
>  include/linux/memcontrol.h | 10 ++++++++++
>  mm/page_alloc.c            |  4 +++-
>  2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index e6dc793d587d..c7e2b4421dc1 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -867,6 +867,12 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg);
>  extern bool cgroup_memory_noswap;
>  #endif
>  
> +static inline void copy_page_memcg(struct page *dst, struct page *src)
> +{
> +	if (src->memcg_data)
> +		dst->memcg_data = src->memcg_data;

I would just drop the test. The struct page is a single cache line which
is dirty by the reference count so another store will unlikely be
noticeable even when NULL is stored here and you safe a conditional.

> +}
> +
>  struct mem_cgroup *lock_page_memcg(struct page *page);
>  void __unlock_page_memcg(struct mem_cgroup *memcg);
>  void unlock_page_memcg(struct page *page);
> @@ -1291,6 +1297,10 @@ mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
>  {
>  }
>  
> +static inline void copy_page_memcg(struct page *dst, struct page *src)
> +{
> +}
> +
>  static inline struct mem_cgroup *lock_page_memcg(struct page *page)
>  {
>  	return NULL;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3e4b29ee2b1e..ee0a63dc1c9b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3307,8 +3307,10 @@ void split_page(struct page *page, unsigned int order)
>  	VM_BUG_ON_PAGE(PageCompound(page), page);
>  	VM_BUG_ON_PAGE(!page_count(page), page);
>  
> -	for (i = 1; i < (1 << order); i++)
> +	for (i = 1; i < (1 << order); i++) {
>  		set_page_refcounted(page + i);
> +		copy_page_memcg(page + i, page);
> +	}
>  	split_page_owner(page, 1 << order);
>  }
>  EXPORT_SYMBOL_GPL(split_page);
> -- 
> 2.25.0
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm/memcg: set memcg when split pages
  2021-03-02  7:05   ` Zhouguanghui (OS Kernel)
@ 2021-03-02 15:37     ` Zi Yan
  2021-03-02 15:42       ` Michal Hocko
  0 siblings, 1 reply; 12+ messages in thread
From: Zi Yan @ 2021-03-02 15:37 UTC (permalink / raw)
  To: Zhouguanghui (OS Kernel)
  Cc: linux-kernel, linux-mm, akpm, npiggin, Wangkefeng (OS Kernel Lab),
	Guohanjun (Hanjun Guo),
	Dingtianhong, Chenweilong, Xiangrui (Euler),
	Johannes Weiner, Michal Hocko, Vladimir Davydov

[-- Attachment #1: Type: text/plain, Size: 3421 bytes --]

On 2 Mar 2021, at 2:05, Zhouguanghui (OS Kernel) wrote:

> 在 2021/3/2 10:00, Zi Yan 写道:
>> On 1 Mar 2021, at 20:34, Zhou Guanghui wrote:
>>
>>> When split page, the memory cgroup info recorded in first page is
>>> not copied to tail pages. In this case, when the tail pages are
>>> freed, the uncharge operation is not performed. As a result, the
>>> usage of this memcg keeps increasing, and the OOM may occur.
>>>
>>> So, the copying of first page's memory cgroup info to tail pages
>>> is needed when split page.
>>>
>>> Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
>>> ---
>>>   include/linux/memcontrol.h | 10 ++++++++++
>>>   mm/page_alloc.c            |  4 +++-
>>>   2 files changed, 13 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>>> index e6dc793d587d..c7e2b4421dc1 100644
>>> --- a/include/linux/memcontrol.h
>>> +++ b/include/linux/memcontrol.h
>>> @@ -867,6 +867,12 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg);
>>>   extern bool cgroup_memory_noswap;
>>>   #endif
>>>
>>> +static inline void copy_page_memcg(struct page *dst, struct page *src)
>>> +{
>>> +	if (src->memcg_data)
>>> +		dst->memcg_data = src->memcg_data;
>>> +}
>>> +
>>>   struct mem_cgroup *lock_page_memcg(struct page *page);
>>>   void __unlock_page_memcg(struct mem_cgroup *memcg);
>>>   void unlock_page_memcg(struct page *page);
>>> @@ -1291,6 +1297,10 @@ mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
>>>   {
>>>   }
>>>
>>> +static inline void copy_page_memcg(struct page *dst, struct page *src)
>>> +{
>>> +}
>>> +
>>>   static inline struct mem_cgroup *lock_page_memcg(struct page *page)
>>>   {
>>>   	return NULL;
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 3e4b29ee2b1e..ee0a63dc1c9b 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -3307,8 +3307,10 @@ void split_page(struct page *page, unsigned int order)
>>>   	VM_BUG_ON_PAGE(PageCompound(page), page);
>>>   	VM_BUG_ON_PAGE(!page_count(page), page);
>>>
>>> -	for (i = 1; i < (1 << order); i++)
>>> +	for (i = 1; i < (1 << order); i++) {
>>>   		set_page_refcounted(page + i);
>>> +		copy_page_memcg(page + i, page);
>>> +	}
>>>   	split_page_owner(page, 1 << order);
>>>   }
>>>   EXPORT_SYMBOL_GPL(split_page);
>>> -- 
>>> 2.25.0
>>
>> +memcg maintainers
>>
>> split_page() is used for non-compound higher-order pages. I am not sure
>> if there is any such pages monitored by memcg. Please let me know
>> if I miss anything.
>
> Thank you for taking time for this.
>
> This should be put in kmemcg, and I'll modify it.
>
> When the kmemcg is enabled and _GFP_ACCOUNT is set, the charged and
> uncharged sizes do not match when alloc/free_pages_exact method is used
> to apply for or free memory with exact size. This is because memcg data
> of the tail page is not set during the split page.

Thanks for your clarification. I missed kmemcg.

I have a question on copy_page_memcg above. By reading __memcg_kmem_charge_page
and __memcg_kmem_uncharge_page, it seems to me that every single page requires
a css_get(&memcg->css) at charge time and a css_put(&memcg->css) at uncharge time.
But your copy_page_memcg does not do css_get for split subpages. Will it cause
memcg->css underflow when subpages are uncharged?


—
Best Regards,
Yan Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm/memcg: set memcg when split pages
  2021-03-02 15:37     ` Zi Yan
@ 2021-03-02 15:42       ` Michal Hocko
  0 siblings, 0 replies; 12+ messages in thread
From: Michal Hocko @ 2021-03-02 15:42 UTC (permalink / raw)
  To: Zi Yan
  Cc: Zhouguanghui (OS Kernel),
	linux-kernel, linux-mm, akpm, Wangkefeng (OS Kernel Lab),
	Guohanjun (Hanjun Guo),
	Dingtianhong, Chenweilong, Xiangrui (Euler),
	Johannes Weiner, Vladimir Davydov, Nicholas Piggin

On Tue 02-03-21 10:37:13, Zi Yan wrote:
[...]
> I have a question on copy_page_memcg above. By reading __memcg_kmem_charge_page
> and __memcg_kmem_uncharge_page, it seems to me that every single page requires
> a css_get(&memcg->css) at charge time and a css_put(&memcg->css) at uncharge time.
> But your copy_page_memcg does not do css_get for split subpages. Will it cause
> memcg->css underflow when subpages are uncharged?

yes, well spotted. I have completely missed that. This will also discard
my comment on testing the memcg.


-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm/memcg: set memcg when split pages
  2021-03-02  9:17 ` Michal Hocko
@ 2021-03-02 15:51   ` Michal Hocko
  2021-03-02 20:24   ` Hugh Dickins
  1 sibling, 0 replies; 12+ messages in thread
From: Michal Hocko @ 2021-03-02 15:51 UTC (permalink / raw)
  To: Zhou Guanghui
  Cc: linux-kernel, linux-mm, akpm, wangkefeng.wang, guohanjun,
	dingtianhong, chenweilong, rui.xiang, Johannes Weiner,
	Nicholas Piggin

On Tue 02-03-21 10:17:03, Michal Hocko wrote:
> [Cc Johannes for awareness and fixup Nick's email]
> 
> On Tue 02-03-21 01:34:51, Zhou Guanghui wrote:
> > When split page, the memory cgroup info recorded in first page is
> > not copied to tail pages. In this case, when the tail pages are
> > freed, the uncharge operation is not performed. As a result, the
> > usage of this memcg keeps increasing, and the OOM may occur.
> > 
> > So, the copying of first page's memory cgroup info to tail pages
> > is needed when split page.
> 
> I was not aware that alloc_pages_exact is used for accounted allocations
> but git grep told me otherwise so this is not a theoretical one. Both
> users (arm64 and s390 kvm) are quite recent AFAICS. split_page is also
> used in dma allocator but I got lost in indirection so I have no idea
> whether there are any users there.
> 
> The page itself looks reasonable to me.
> 
> > Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
> 
> Acked-by: Michal Hocko <mhocko@suse.com>
> 
> Minor nit
> 
> > ---
> >  include/linux/memcontrol.h | 10 ++++++++++
> >  mm/page_alloc.c            |  4 +++-
> >  2 files changed, 13 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index e6dc793d587d..c7e2b4421dc1 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -867,6 +867,12 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg);
> >  extern bool cgroup_memory_noswap;
> >  #endif
> >  
> > +static inline void copy_page_memcg(struct page *dst, struct page *src)
> > +{
> > +	if (src->memcg_data)
> > +		dst->memcg_data = src->memcg_data;
> 
> I would just drop the test. The struct page is a single cache line which
> is dirty by the reference count so another store will unlikely be
> noticeable even when NULL is stored here and you safe a conditional.

Disregard this. As Zi Yan mentioned in other reply, we need to keep the
check and take a css reference along with transfering the memcg.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm/memcg: set memcg when split pages
  2021-03-02  9:17 ` Michal Hocko
  2021-03-02 15:51   ` Michal Hocko
@ 2021-03-02 20:24   ` Hugh Dickins
  2021-03-02 22:56     ` Johannes Weiner
  1 sibling, 1 reply; 12+ messages in thread
From: Hugh Dickins @ 2021-03-02 20:24 UTC (permalink / raw)
  To: Zhou Guanghui
  Cc: Michal Hocko, linux-kernel, linux-mm, akpm, wangkefeng.wang,
	guohanjun, dingtianhong, chenweilong, rui.xiang, Johannes Weiner,
	Nicholas Piggin, Kirill A. Shutemov, Hugh Dickins

On Tue, 2 Mar 2021, Michal Hocko wrote:
> [Cc Johannes for awareness and fixup Nick's email]
> 
> On Tue 02-03-21 01:34:51, Zhou Guanghui wrote:
> > When split page, the memory cgroup info recorded in first page is
> > not copied to tail pages. In this case, when the tail pages are
> > freed, the uncharge operation is not performed. As a result, the
> > usage of this memcg keeps increasing, and the OOM may occur.
> > 
> > So, the copying of first page's memory cgroup info to tail pages
> > is needed when split page.
> 
> I was not aware that alloc_pages_exact is used for accounted allocations
> but git grep told me otherwise so this is not a theoretical one. Both
> users (arm64 and s390 kvm) are quite recent AFAICS. split_page is also
> used in dma allocator but I got lost in indirection so I have no idea
> whether there are any users there.

Yes, it's a bit worrying that such a low-level thing as split_page()
can now get caught up in memcg accounting, but I suppose that's okay.

I feel rather strongly that whichever way it is done, THP splitting
and split_page() should use the same interface to memcg.

And a look at mem_cgroup_split_huge_fixup() suggests that nowadays
there need to be css_get()s too - or better, a css_get_many().

Its #ifdef CONFIG_TRANSPARENT_HUGEPAGE should be removed, rename
it mem_cgroup_split_page_fixup(), and take order from caller.

Though I've never much liked that separate pass: would it be
better page by page, like this copy_page_memcg() does?  Though
mem_cgroup_disabled() and css_getting make that less appealing.

Hugh

> 
> The page itself looks reasonable to me.
> 
> > Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
> 
> Acked-by: Michal Hocko <mhocko@suse.com>
> 
> Minor nit
> 
> > ---
> >  include/linux/memcontrol.h | 10 ++++++++++
> >  mm/page_alloc.c            |  4 +++-
> >  2 files changed, 13 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index e6dc793d587d..c7e2b4421dc1 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -867,6 +867,12 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg);
> >  extern bool cgroup_memory_noswap;
> >  #endif
> >  
> > +static inline void copy_page_memcg(struct page *dst, struct page *src)
> > +{
> > +	if (src->memcg_data)
> > +		dst->memcg_data = src->memcg_data;
> 
> I would just drop the test. The struct page is a single cache line which
> is dirty by the reference count so another store will unlikely be
> noticeable even when NULL is stored here and you safe a conditional.
> 
> > +}
> > +
> >  struct mem_cgroup *lock_page_memcg(struct page *page);
> >  void __unlock_page_memcg(struct mem_cgroup *memcg);
> >  void unlock_page_memcg(struct page *page);
> > @@ -1291,6 +1297,10 @@ mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
> >  {
> >  }
> >  
> > +static inline void copy_page_memcg(struct page *dst, struct page *src)
> > +{
> > +}
> > +
> >  static inline struct mem_cgroup *lock_page_memcg(struct page *page)
> >  {
> >  	return NULL;
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 3e4b29ee2b1e..ee0a63dc1c9b 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3307,8 +3307,10 @@ void split_page(struct page *page, unsigned int order)
> >  	VM_BUG_ON_PAGE(PageCompound(page), page);
> >  	VM_BUG_ON_PAGE(!page_count(page), page);
> >  
> > -	for (i = 1; i < (1 << order); i++)
> > +	for (i = 1; i < (1 << order); i++) {
> >  		set_page_refcounted(page + i);
> > +		copy_page_memcg(page + i, page);
> > +	}
> >  	split_page_owner(page, 1 << order);
> >  }
> >  EXPORT_SYMBOL_GPL(split_page);
> > -- 
> > 2.25.0
> > 
> 
> -- 
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm/memcg: set memcg when split pages
  2021-03-02 20:24   ` Hugh Dickins
@ 2021-03-02 22:56     ` Johannes Weiner
  2021-03-02 23:49       ` Hugh Dickins
  2021-03-03  7:46       ` Michal Hocko
  0 siblings, 2 replies; 12+ messages in thread
From: Johannes Weiner @ 2021-03-02 22:56 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Zhou Guanghui, Michal Hocko, linux-kernel, linux-mm, akpm,
	wangkefeng.wang, guohanjun, dingtianhong, chenweilong, rui.xiang,
	Nicholas Piggin, Kirill A. Shutemov

On Tue, Mar 02, 2021 at 12:24:41PM -0800, Hugh Dickins wrote:
> On Tue, 2 Mar 2021, Michal Hocko wrote:
> > [Cc Johannes for awareness and fixup Nick's email]
> > 
> > On Tue 02-03-21 01:34:51, Zhou Guanghui wrote:
> > > When split page, the memory cgroup info recorded in first page is
> > > not copied to tail pages. In this case, when the tail pages are
> > > freed, the uncharge operation is not performed. As a result, the
> > > usage of this memcg keeps increasing, and the OOM may occur.
> > > 
> > > So, the copying of first page's memory cgroup info to tail pages
> > > is needed when split page.
> > 
> > I was not aware that alloc_pages_exact is used for accounted allocations
> > but git grep told me otherwise so this is not a theoretical one. Both
> > users (arm64 and s390 kvm) are quite recent AFAICS. split_page is also
> > used in dma allocator but I got lost in indirection so I have no idea
> > whether there are any users there.
> 
> Yes, it's a bit worrying that such a low-level thing as split_page()
> can now get caught up in memcg accounting, but I suppose that's okay.
> 
> I feel rather strongly that whichever way it is done, THP splitting
> and split_page() should use the same interface to memcg.
> 
> And a look at mem_cgroup_split_huge_fixup() suggests that nowadays
> there need to be css_get()s too - or better, a css_get_many().
> 
> Its #ifdef CONFIG_TRANSPARENT_HUGEPAGE should be removed, rename
> it mem_cgroup_split_page_fixup(), and take order from caller.

+1

There is already a split_page_owner() in both these places as well
which does a similar thing. Mabye we can match that by calling it
split_page_memcg() and having it take a nr of pages?

> Though I've never much liked that separate pass: would it be
> better page by page, like this copy_page_memcg() does?  Though
> mem_cgroup_disabled() and css_getting make that less appealing.

Agreed on both counts. mem_cgroup_disabled() is a jump label and would
be okay, IMO, but the refcounting - though it is (usually) per-cpu -
adds at least two branches and rcu read locking.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm/memcg: set memcg when split pages
  2021-03-02 22:56     ` Johannes Weiner
@ 2021-03-02 23:49       ` Hugh Dickins
  2021-03-03  7:46       ` Michal Hocko
  1 sibling, 0 replies; 12+ messages in thread
From: Hugh Dickins @ 2021-03-02 23:49 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Hugh Dickins, Zhou Guanghui, Michal Hocko, linux-kernel,
	linux-mm, akpm, wangkefeng.wang, guohanjun, dingtianhong,
	chenweilong, rui.xiang, Nicholas Piggin, Kirill A. Shutemov

On Tue, 2 Mar 2021, Johannes Weiner wrote:
> On Tue, Mar 02, 2021 at 12:24:41PM -0800, Hugh Dickins wrote:
> > On Tue, 2 Mar 2021, Michal Hocko wrote:
> > > [Cc Johannes for awareness and fixup Nick's email]
> > > 
> > > On Tue 02-03-21 01:34:51, Zhou Guanghui wrote:
> > > > When split page, the memory cgroup info recorded in first page is
> > > > not copied to tail pages. In this case, when the tail pages are
> > > > freed, the uncharge operation is not performed. As a result, the
> > > > usage of this memcg keeps increasing, and the OOM may occur.
> > > > 
> > > > So, the copying of first page's memory cgroup info to tail pages
> > > > is needed when split page.
> > > 
> > > I was not aware that alloc_pages_exact is used for accounted allocations
> > > but git grep told me otherwise so this is not a theoretical one. Both
> > > users (arm64 and s390 kvm) are quite recent AFAICS. split_page is also
> > > used in dma allocator but I got lost in indirection so I have no idea
> > > whether there are any users there.
> > 
> > Yes, it's a bit worrying that such a low-level thing as split_page()
> > can now get caught up in memcg accounting, but I suppose that's okay.
> > 
> > I feel rather strongly that whichever way it is done, THP splitting
> > and split_page() should use the same interface to memcg.
> > 
> > And a look at mem_cgroup_split_huge_fixup() suggests that nowadays
> > there need to be css_get()s too - or better, a css_get_many().
> > 
> > Its #ifdef CONFIG_TRANSPARENT_HUGEPAGE should be removed, rename
> > it mem_cgroup_split_page_fixup(), and take order from caller.
> 
> +1
> 
> There is already a split_page_owner() in both these places as well
> which does a similar thing. Mabye we can match that by calling it
> split_page_memcg() and having it take a nr of pages?

Agreed on both counts :) "fixup" was not an inspiring name.

> 
> > Though I've never much liked that separate pass: would it be
> > better page by page, like this copy_page_memcg() does?  Though
> > mem_cgroup_disabled() and css_getting make that less appealing.
> 
> Agreed on both counts. mem_cgroup_disabled() is a jump label and would
> be okay, IMO, but the refcounting - though it is (usually) per-cpu -
> adds at least two branches and rcu read locking.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm/memcg: set memcg when split pages
  2021-03-02 22:56     ` Johannes Weiner
  2021-03-02 23:49       ` Hugh Dickins
@ 2021-03-03  7:46       ` Michal Hocko
  2021-03-03  9:15         ` Zhouguanghui (OS Kernel)
  1 sibling, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2021-03-03  7:46 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Hugh Dickins, Zhou Guanghui, linux-kernel, linux-mm, akpm,
	wangkefeng.wang, guohanjun, dingtianhong, chenweilong, rui.xiang,
	Nicholas Piggin, Kirill A. Shutemov

On Tue 02-03-21 17:56:07, Johannes Weiner wrote:
> On Tue, Mar 02, 2021 at 12:24:41PM -0800, Hugh Dickins wrote:
> > On Tue, 2 Mar 2021, Michal Hocko wrote:
> > > [Cc Johannes for awareness and fixup Nick's email]
> > > 
> > > On Tue 02-03-21 01:34:51, Zhou Guanghui wrote:
> > > > When split page, the memory cgroup info recorded in first page is
> > > > not copied to tail pages. In this case, when the tail pages are
> > > > freed, the uncharge operation is not performed. As a result, the
> > > > usage of this memcg keeps increasing, and the OOM may occur.
> > > > 
> > > > So, the copying of first page's memory cgroup info to tail pages
> > > > is needed when split page.
> > > 
> > > I was not aware that alloc_pages_exact is used for accounted allocations
> > > but git grep told me otherwise so this is not a theoretical one. Both
> > > users (arm64 and s390 kvm) are quite recent AFAICS. split_page is also
> > > used in dma allocator but I got lost in indirection so I have no idea
> > > whether there are any users there.
> > 
> > Yes, it's a bit worrying that such a low-level thing as split_page()
> > can now get caught up in memcg accounting, but I suppose that's okay.
> > 
> > I feel rather strongly that whichever way it is done, THP splitting
> > and split_page() should use the same interface to memcg.
> > 
> > And a look at mem_cgroup_split_huge_fixup() suggests that nowadays
> > there need to be css_get()s too - or better, a css_get_many().
> > 
> > Its #ifdef CONFIG_TRANSPARENT_HUGEPAGE should be removed, rename
> > it mem_cgroup_split_page_fixup(), and take order from caller.
> 
> +1
> 
> There is already a split_page_owner() in both these places as well
> which does a similar thing. Mabye we can match that by calling it
> split_page_memcg() and having it take a nr of pages?

Sounds good to me.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm/memcg: set memcg when split pages
  2021-03-03  7:46       ` Michal Hocko
@ 2021-03-03  9:15         ` Zhouguanghui (OS Kernel)
  0 siblings, 0 replies; 12+ messages in thread
From: Zhouguanghui (OS Kernel) @ 2021-03-03  9:15 UTC (permalink / raw)
  To: Michal Hocko, Johannes Weiner
  Cc: Hugh Dickins, linux-kernel, linux-mm, akpm,
	Wangkefeng (OS Kernel Lab), Guohanjun (Hanjun Guo),
	Dingtianhong, Chenweilong, Xiangrui (Euler),
	Nicholas Piggin, Kirill A. Shutemov, Zi Yan

在 2021/3/3 15:46, Michal Hocko 写道:
> On Tue 02-03-21 17:56:07, Johannes Weiner wrote:
>> On Tue, Mar 02, 2021 at 12:24:41PM -0800, Hugh Dickins wrote:
>>> On Tue, 2 Mar 2021, Michal Hocko wrote:
>>>> [Cc Johannes for awareness and fixup Nick's email]
>>>>
>>>> On Tue 02-03-21 01:34:51, Zhou Guanghui wrote:
>>>>> When split page, the memory cgroup info recorded in first page is
>>>>> not copied to tail pages. In this case, when the tail pages are
>>>>> freed, the uncharge operation is not performed. As a result, the
>>>>> usage of this memcg keeps increasing, and the OOM may occur.
>>>>>
>>>>> So, the copying of first page's memory cgroup info to tail pages
>>>>> is needed when split page.
>>>>
>>>> I was not aware that alloc_pages_exact is used for accounted allocations
>>>> but git grep told me otherwise so this is not a theoretical one. Both
>>>> users (arm64 and s390 kvm) are quite recent AFAICS. split_page is also
>>>> used in dma allocator but I got lost in indirection so I have no idea
>>>> whether there are any users there.
>>>
>>> Yes, it's a bit worrying that such a low-level thing as split_page()
>>> can now get caught up in memcg accounting, but I suppose that's okay.
>>>
>>> I feel rather strongly that whichever way it is done, THP splitting
>>> and split_page() should use the same interface to memcg.
>>>
>>> And a look at mem_cgroup_split_huge_fixup() suggests that nowadays
>>> there need to be css_get()s too - or better, a css_get_many().
>>>
>>> Its #ifdef CONFIG_TRANSPARENT_HUGEPAGE should be removed, rename
>>> it mem_cgroup_split_page_fixup(), and take order from caller.
>>
>> +1
>>
>> There is already a split_page_owner() in both these places as well
>> which does a similar thing. Mabye we can match that by calling it
>> split_page_memcg() and having it take a nr of pages?
> 
> Sounds good to me.
> 
  Hi, Michal, Johannes, Hugh, and Zi Yan, thank you for taking time for 
this.

I agree, and will send v2 patches for taking these.

Thanks

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-03-03 16:22 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-02  1:34 [PATCH] mm/memcg: set memcg when split pages Zhou Guanghui
2021-03-02  1:59 ` Zi Yan
2021-03-02  7:05   ` Zhouguanghui (OS Kernel)
2021-03-02 15:37     ` Zi Yan
2021-03-02 15:42       ` Michal Hocko
2021-03-02  9:17 ` Michal Hocko
2021-03-02 15:51   ` Michal Hocko
2021-03-02 20:24   ` Hugh Dickins
2021-03-02 22:56     ` Johannes Weiner
2021-03-02 23:49       ` Hugh Dickins
2021-03-03  7:46       ` Michal Hocko
2021-03-03  9:15         ` Zhouguanghui (OS Kernel)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).