linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm/memcg: set memcg when split pages
@ 2021-03-02  1:34 Zhou Guanghui
  2021-03-02  1:59 ` Zi Yan
  2021-03-02  9:17 ` Michal Hocko
  0 siblings, 2 replies; 6+ messages in thread
From: Zhou Guanghui @ 2021-03-02  1:34 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: akpm, npiggin, wangkefeng.wang, guohanjun, dingtianhong,
	chenweilong, rui.xiang

When split page, the memory cgroup info recorded in first page is
not copied to tail pages. In this case, when the tail pages are
freed, the uncharge operation is not performed. As a result, the
usage of this memcg keeps increasing, and the OOM may occur.

So, the copying of first page's memory cgroup info to tail pages
is needed when split page.

Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
---
 include/linux/memcontrol.h | 10 ++++++++++
 mm/page_alloc.c            |  4 +++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e6dc793d587d..c7e2b4421dc1 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -867,6 +867,12 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg);
 extern bool cgroup_memory_noswap;
 #endif
 
+static inline void copy_page_memcg(struct page *dst, struct page *src)
+{
+	if (src->memcg_data)
+		dst->memcg_data = src->memcg_data;
+}
+
 struct mem_cgroup *lock_page_memcg(struct page *page);
 void __unlock_page_memcg(struct mem_cgroup *memcg);
 void unlock_page_memcg(struct page *page);
@@ -1291,6 +1297,10 @@ mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
 {
 }
 
+static inline void copy_page_memcg(struct page *dst, struct page *src)
+{
+}
+
 static inline struct mem_cgroup *lock_page_memcg(struct page *page)
 {
 	return NULL;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3e4b29ee2b1e..ee0a63dc1c9b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3307,8 +3307,10 @@ void split_page(struct page *page, unsigned int order)
 	VM_BUG_ON_PAGE(PageCompound(page), page);
 	VM_BUG_ON_PAGE(!page_count(page), page);
 
-	for (i = 1; i < (1 << order); i++)
+	for (i = 1; i < (1 << order); i++) {
 		set_page_refcounted(page + i);
+		copy_page_memcg(page + i, page);
+	}
 	split_page_owner(page, 1 << order);
 }
 EXPORT_SYMBOL_GPL(split_page);
-- 
2.25.0



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm/memcg: set memcg when split pages
  2021-03-02  1:34 [PATCH] mm/memcg: set memcg when split pages Zhou Guanghui
@ 2021-03-02  1:59 ` Zi Yan
  2021-03-02  7:05   ` Zhouguanghui (OS Kernel)
  2021-03-02  9:17 ` Michal Hocko
  1 sibling, 1 reply; 6+ messages in thread
From: Zi Yan @ 2021-03-02  1:59 UTC (permalink / raw)
  To: Zhou Guanghui
  Cc: linux-kernel, linux-mm, akpm, npiggin, wangkefeng.wang,
	guohanjun, dingtianhong, chenweilong, rui.xiang, Johannes Weiner,
	Michal Hocko, Vladimir Davydov

[-- Attachment #1: Type: text/plain, Size: 2357 bytes --]

On 1 Mar 2021, at 20:34, Zhou Guanghui wrote:

> When split page, the memory cgroup info recorded in first page is
> not copied to tail pages. In this case, when the tail pages are
> freed, the uncharge operation is not performed. As a result, the
> usage of this memcg keeps increasing, and the OOM may occur.
>
> So, the copying of first page's memory cgroup info to tail pages
> is needed when split page.
>
> Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
> ---
>  include/linux/memcontrol.h | 10 ++++++++++
>  mm/page_alloc.c            |  4 +++-
>  2 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index e6dc793d587d..c7e2b4421dc1 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -867,6 +867,12 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg);
>  extern bool cgroup_memory_noswap;
>  #endif
>
> +static inline void copy_page_memcg(struct page *dst, struct page *src)
> +{
> +	if (src->memcg_data)
> +		dst->memcg_data = src->memcg_data;
> +}
> +
>  struct mem_cgroup *lock_page_memcg(struct page *page);
>  void __unlock_page_memcg(struct mem_cgroup *memcg);
>  void unlock_page_memcg(struct page *page);
> @@ -1291,6 +1297,10 @@ mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
>  {
>  }
>
> +static inline void copy_page_memcg(struct page *dst, struct page *src)
> +{
> +}
> +
>  static inline struct mem_cgroup *lock_page_memcg(struct page *page)
>  {
>  	return NULL;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3e4b29ee2b1e..ee0a63dc1c9b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3307,8 +3307,10 @@ void split_page(struct page *page, unsigned int order)
>  	VM_BUG_ON_PAGE(PageCompound(page), page);
>  	VM_BUG_ON_PAGE(!page_count(page), page);
>
> -	for (i = 1; i < (1 << order); i++)
> +	for (i = 1; i < (1 << order); i++) {
>  		set_page_refcounted(page + i);
> +		copy_page_memcg(page + i, page);
> +	}
>  	split_page_owner(page, 1 << order);
>  }
>  EXPORT_SYMBOL_GPL(split_page);
> -- 
> 2.25.0

+memcg maintainers

split_page() is used for non-compound higher-order pages. I am not sure
if there is any such pages monitored by memcg. Please let me know
if I miss anything.

—
Best Regards,
Yan Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm/memcg: set memcg when split pages
  2021-03-02  1:59 ` Zi Yan
@ 2021-03-02  7:05   ` Zhouguanghui (OS Kernel)
  0 siblings, 0 replies; 6+ messages in thread
From: Zhouguanghui (OS Kernel) @ 2021-03-02  7:05 UTC (permalink / raw)
  To: Zi Yan
  Cc: linux-kernel, linux-mm, akpm, npiggin, Wangkefeng (OS Kernel Lab),
	Guohanjun (Hanjun Guo),
	Dingtianhong, Chenweilong, Xiangrui (Euler),
	Johannes Weiner, Michal Hocko, Vladimir Davydov

在 2021/3/2 10:00, Zi Yan 写道:
> On 1 Mar 2021, at 20:34, Zhou Guanghui wrote:
> 
>> When split page, the memory cgroup info recorded in first page is
>> not copied to tail pages. In this case, when the tail pages are
>> freed, the uncharge operation is not performed. As a result, the
>> usage of this memcg keeps increasing, and the OOM may occur.
>>
>> So, the copying of first page's memory cgroup info to tail pages
>> is needed when split page.
>>
>> Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
>> ---
>>   include/linux/memcontrol.h | 10 ++++++++++
>>   mm/page_alloc.c            |  4 +++-
>>   2 files changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index e6dc793d587d..c7e2b4421dc1 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -867,6 +867,12 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg);
>>   extern bool cgroup_memory_noswap;
>>   #endif
>>
>> +static inline void copy_page_memcg(struct page *dst, struct page *src)
>> +{
>> +	if (src->memcg_data)
>> +		dst->memcg_data = src->memcg_data;
>> +}
>> +
>>   struct mem_cgroup *lock_page_memcg(struct page *page);
>>   void __unlock_page_memcg(struct mem_cgroup *memcg);
>>   void unlock_page_memcg(struct page *page);
>> @@ -1291,6 +1297,10 @@ mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
>>   {
>>   }
>>
>> +static inline void copy_page_memcg(struct page *dst, struct page *src)
>> +{
>> +}
>> +
>>   static inline struct mem_cgroup *lock_page_memcg(struct page *page)
>>   {
>>   	return NULL;
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 3e4b29ee2b1e..ee0a63dc1c9b 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -3307,8 +3307,10 @@ void split_page(struct page *page, unsigned int order)
>>   	VM_BUG_ON_PAGE(PageCompound(page), page);
>>   	VM_BUG_ON_PAGE(!page_count(page), page);
>>
>> -	for (i = 1; i < (1 << order); i++)
>> +	for (i = 1; i < (1 << order); i++) {
>>   		set_page_refcounted(page + i);
>> +		copy_page_memcg(page + i, page);
>> +	}
>>   	split_page_owner(page, 1 << order);
>>   }
>>   EXPORT_SYMBOL_GPL(split_page);
>> -- 
>> 2.25.0
> 
> +memcg maintainers
> 
> split_page() is used for non-compound higher-order pages. I am not sure
> if there is any such pages monitored by memcg. Please let me know
> if I miss anything.

Thank you for taking time for this.

This should be put in kmemcg, and I'll modify it.

When the kmemcg is enabled and _GFP_ACCOUNT is set, the charged and 
uncharged sizes do not match when alloc/free_pages_exact method is used 
to apply for or free memory with exact size. This is because memcg data 
of the tail page is not set during the split page.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm/memcg: set memcg when split pages
  2021-03-02  1:34 [PATCH] mm/memcg: set memcg when split pages Zhou Guanghui
  2021-03-02  1:59 ` Zi Yan
@ 2021-03-02  9:17 ` Michal Hocko
       [not found]   ` <alpine.LSU.2.11.2103021157160.8450@eggly.anvils>
  1 sibling, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2021-03-02  9:17 UTC (permalink / raw)
  To: Zhou Guanghui
  Cc: linux-kernel, linux-mm, akpm, wangkefeng.wang, guohanjun,
	dingtianhong, chenweilong, rui.xiang, Johannes Weiner,
	Nicholas Piggin

[Cc Johannes for awareness and fixup Nick's email]

On Tue 02-03-21 01:34:51, Zhou Guanghui wrote:
> When split page, the memory cgroup info recorded in first page is
> not copied to tail pages. In this case, when the tail pages are
> freed, the uncharge operation is not performed. As a result, the
> usage of this memcg keeps increasing, and the OOM may occur.
> 
> So, the copying of first page's memory cgroup info to tail pages
> is needed when split page.

I was not aware that alloc_pages_exact is used for accounted allocations
but git grep told me otherwise so this is not a theoretical one. Both
users (arm64 and s390 kvm) are quite recent AFAICS. split_page is also
used in dma allocator but I got lost in indirection so I have no idea
whether there are any users there.

The page itself looks reasonable to me.

> Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>

Acked-by: Michal Hocko <mhocko@suse.com>

Minor nit

> ---
>  include/linux/memcontrol.h | 10 ++++++++++
>  mm/page_alloc.c            |  4 +++-
>  2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index e6dc793d587d..c7e2b4421dc1 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -867,6 +867,12 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg);
>  extern bool cgroup_memory_noswap;
>  #endif
>  
> +static inline void copy_page_memcg(struct page *dst, struct page *src)
> +{
> +	if (src->memcg_data)
> +		dst->memcg_data = src->memcg_data;

I would just drop the test. The struct page is a single cache line which
is dirty by the reference count so another store will unlikely be
noticeable even when NULL is stored here and you safe a conditional.

> +}
> +
>  struct mem_cgroup *lock_page_memcg(struct page *page);
>  void __unlock_page_memcg(struct mem_cgroup *memcg);
>  void unlock_page_memcg(struct page *page);
> @@ -1291,6 +1297,10 @@ mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
>  {
>  }
>  
> +static inline void copy_page_memcg(struct page *dst, struct page *src)
> +{
> +}
> +
>  static inline struct mem_cgroup *lock_page_memcg(struct page *page)
>  {
>  	return NULL;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3e4b29ee2b1e..ee0a63dc1c9b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3307,8 +3307,10 @@ void split_page(struct page *page, unsigned int order)
>  	VM_BUG_ON_PAGE(PageCompound(page), page);
>  	VM_BUG_ON_PAGE(!page_count(page), page);
>  
> -	for (i = 1; i < (1 << order); i++)
> +	for (i = 1; i < (1 << order); i++) {
>  		set_page_refcounted(page + i);
> +		copy_page_memcg(page + i, page);
> +	}
>  	split_page_owner(page, 1 << order);
>  }
>  EXPORT_SYMBOL_GPL(split_page);
> -- 
> 2.25.0
> 

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm/memcg: set memcg when split pages
       [not found]     ` <YD7Ch/8QebzmneCR@cmpxchg.org>
@ 2021-03-03  7:46       ` Michal Hocko
  2021-03-03  9:15         ` Zhouguanghui (OS Kernel)
  0 siblings, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2021-03-03  7:46 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Hugh Dickins, Zhou Guanghui, linux-kernel, linux-mm, akpm,
	wangkefeng.wang, guohanjun, dingtianhong, chenweilong, rui.xiang,
	Nicholas Piggin, Kirill A. Shutemov

On Tue 02-03-21 17:56:07, Johannes Weiner wrote:
> On Tue, Mar 02, 2021 at 12:24:41PM -0800, Hugh Dickins wrote:
> > On Tue, 2 Mar 2021, Michal Hocko wrote:
> > > [Cc Johannes for awareness and fixup Nick's email]
> > > 
> > > On Tue 02-03-21 01:34:51, Zhou Guanghui wrote:
> > > > When split page, the memory cgroup info recorded in first page is
> > > > not copied to tail pages. In this case, when the tail pages are
> > > > freed, the uncharge operation is not performed. As a result, the
> > > > usage of this memcg keeps increasing, and the OOM may occur.
> > > > 
> > > > So, the copying of first page's memory cgroup info to tail pages
> > > > is needed when split page.
> > > 
> > > I was not aware that alloc_pages_exact is used for accounted allocations
> > > but git grep told me otherwise so this is not a theoretical one. Both
> > > users (arm64 and s390 kvm) are quite recent AFAICS. split_page is also
> > > used in dma allocator but I got lost in indirection so I have no idea
> > > whether there are any users there.
> > 
> > Yes, it's a bit worrying that such a low-level thing as split_page()
> > can now get caught up in memcg accounting, but I suppose that's okay.
> > 
> > I feel rather strongly that whichever way it is done, THP splitting
> > and split_page() should use the same interface to memcg.
> > 
> > And a look at mem_cgroup_split_huge_fixup() suggests that nowadays
> > there need to be css_get()s too - or better, a css_get_many().
> > 
> > Its #ifdef CONFIG_TRANSPARENT_HUGEPAGE should be removed, rename
> > it mem_cgroup_split_page_fixup(), and take order from caller.
> 
> +1
> 
> There is already a split_page_owner() in both these places as well
> which does a similar thing. Mabye we can match that by calling it
> split_page_memcg() and having it take a nr of pages?

Sounds good to me.

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm/memcg: set memcg when split pages
  2021-03-03  7:46       ` Michal Hocko
@ 2021-03-03  9:15         ` Zhouguanghui (OS Kernel)
  0 siblings, 0 replies; 6+ messages in thread
From: Zhouguanghui (OS Kernel) @ 2021-03-03  9:15 UTC (permalink / raw)
  To: Michal Hocko, Johannes Weiner
  Cc: Hugh Dickins, linux-kernel, linux-mm, akpm,
	Wangkefeng (OS Kernel Lab), Guohanjun (Hanjun Guo),
	Dingtianhong, Chenweilong, Xiangrui (Euler),
	Nicholas Piggin, Kirill A. Shutemov, Zi Yan

在 2021/3/3 15:46, Michal Hocko 写道:
> On Tue 02-03-21 17:56:07, Johannes Weiner wrote:
>> On Tue, Mar 02, 2021 at 12:24:41PM -0800, Hugh Dickins wrote:
>>> On Tue, 2 Mar 2021, Michal Hocko wrote:
>>>> [Cc Johannes for awareness and fixup Nick's email]
>>>>
>>>> On Tue 02-03-21 01:34:51, Zhou Guanghui wrote:
>>>>> When split page, the memory cgroup info recorded in first page is
>>>>> not copied to tail pages. In this case, when the tail pages are
>>>>> freed, the uncharge operation is not performed. As a result, the
>>>>> usage of this memcg keeps increasing, and the OOM may occur.
>>>>>
>>>>> So, the copying of first page's memory cgroup info to tail pages
>>>>> is needed when split page.
>>>>
>>>> I was not aware that alloc_pages_exact is used for accounted allocations
>>>> but git grep told me otherwise so this is not a theoretical one. Both
>>>> users (arm64 and s390 kvm) are quite recent AFAICS. split_page is also
>>>> used in dma allocator but I got lost in indirection so I have no idea
>>>> whether there are any users there.
>>>
>>> Yes, it's a bit worrying that such a low-level thing as split_page()
>>> can now get caught up in memcg accounting, but I suppose that's okay.
>>>
>>> I feel rather strongly that whichever way it is done, THP splitting
>>> and split_page() should use the same interface to memcg.
>>>
>>> And a look at mem_cgroup_split_huge_fixup() suggests that nowadays
>>> there need to be css_get()s too - or better, a css_get_many().
>>>
>>> Its #ifdef CONFIG_TRANSPARENT_HUGEPAGE should be removed, rename
>>> it mem_cgroup_split_page_fixup(), and take order from caller.
>>
>> +1
>>
>> There is already a split_page_owner() in both these places as well
>> which does a similar thing. Mabye we can match that by calling it
>> split_page_memcg() and having it take a nr of pages?
> 
> Sounds good to me.
> 
  Hi, Michal, Johannes, Hugh, and Zi Yan, thank you for taking time for 
this.

I agree, and will send v2 patches for taking these.

Thanks


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-03-03  9:16 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-02  1:34 [PATCH] mm/memcg: set memcg when split pages Zhou Guanghui
2021-03-02  1:59 ` Zi Yan
2021-03-02  7:05   ` Zhouguanghui (OS Kernel)
2021-03-02  9:17 ` Michal Hocko
     [not found]   ` <alpine.LSU.2.11.2103021157160.8450@eggly.anvils>
     [not found]     ` <YD7Ch/8QebzmneCR@cmpxchg.org>
2021-03-03  7:46       ` Michal Hocko
2021-03-03  9:15         ` Zhouguanghui (OS Kernel)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).