linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [Patch v4] mm: thp: remove the defer list related code since this will not happen
@ 2020-01-17 23:38 Wei Yang
  2020-01-18  0:57 ` Yang Shi
  2020-01-18 22:54 ` Andrew Morton
  0 siblings, 2 replies; 13+ messages in thread
From: Wei Yang @ 2020-01-17 23:38 UTC (permalink / raw)
  To: hannes, mhocko, vdavydov.dev, akpm, ktkhai, kirill.shutemov, yang.shi
  Cc: cgroups, linux-mm, linux-kernel, alexander.duyck, rientjes,
	Wei Yang, stable

If compound is true, this means it is a PMD mapped THP. Which implies
the page is not linked to any defer list. So the first code chunk will
not be executed.

Also with this reason, it would not be proper to add this page to a
defer list. So the second code chunk is not correct.

Based on this, we should remove the defer list related code.

Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>    [5.4+]

---
v4:
  * finally we identified the related code is not necessary and not
    correct, just remove it
  * thanks to Kirill T first spot some problem
v3:
  * remove all review/ack tag since rewrite the changelog
  * use deferred_split_huge_page as the example of race
  * add cc stable 5.4+ tag as suggested by David Rientjes

v2:
  * move check on compound outside suggested by Alexander
  * an example of the race condition, suggested by Michal
---
 mm/memcontrol.c | 18 ------------------
 1 file changed, 18 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6c83cf4ed970..27c231bf4565 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5340,14 +5340,6 @@ static int mem_cgroup_move_account(struct page *page,
 		__mod_lruvec_state(to_vec, NR_WRITEBACK, nr_pages);
 	}
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	if (compound && !list_empty(page_deferred_list(page))) {
-		spin_lock(&from->deferred_split_queue.split_queue_lock);
-		list_del_init(page_deferred_list(page));
-		from->deferred_split_queue.split_queue_len--;
-		spin_unlock(&from->deferred_split_queue.split_queue_lock);
-	}
-#endif
 	/*
 	 * It is safe to change page->mem_cgroup here because the page
 	 * is referenced, charged, and isolated - we can't race with
@@ -5357,16 +5349,6 @@ static int mem_cgroup_move_account(struct page *page,
 	/* caller should have done css_get */
 	page->mem_cgroup = to;
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	if (compound && list_empty(page_deferred_list(page))) {
-		spin_lock(&to->deferred_split_queue.split_queue_lock);
-		list_add_tail(page_deferred_list(page),
-			      &to->deferred_split_queue.split_queue);
-		to->deferred_split_queue.split_queue_len++;
-		spin_unlock(&to->deferred_split_queue.split_queue_lock);
-	}
-#endif
-
 	spin_unlock_irqrestore(&from->move_lock, flags);
 
 	ret = 0;
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [Patch v4] mm: thp: remove the defer list related code since this will not happen
  2020-01-17 23:38 [Patch v4] mm: thp: remove the defer list related code since this will not happen Wei Yang
@ 2020-01-18  0:57 ` Yang Shi
  2020-01-18  5:30   ` Yang Shi
  2020-01-18 22:54 ` Andrew Morton
  1 sibling, 1 reply; 13+ messages in thread
From: Yang Shi @ 2020-01-18  0:57 UTC (permalink / raw)
  To: Wei Yang, hannes, mhocko, vdavydov.dev, akpm, ktkhai, kirill.shutemov
  Cc: cgroups, linux-mm, linux-kernel, alexander.duyck, rientjes, stable



On 1/17/20 3:38 PM, Wei Yang wrote:
> If compound is true, this means it is a PMD mapped THP. Which implies
> the page is not linked to any defer list. So the first code chunk will
> not be executed.
>
> Also with this reason, it would not be proper to add this page to a
> defer list. So the second code chunk is not correct.
>
> Based on this, we should remove the defer list related code.
>
> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
>
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: <stable@vger.kernel.org>    [5.4+]
>
> ---
> v4:
>    * finally we identified the related code is not necessary and not
>      correct, just remove it
>    * thanks to Kirill T first spot some problem

Thanks for debugging and figuring this out. Acked-by: Yang Shi 
<yang.shi@linux.alibaba.com>

> v3:
>    * remove all review/ack tag since rewrite the changelog
>    * use deferred_split_huge_page as the example of race
>    * add cc stable 5.4+ tag as suggested by David Rientjes
>
> v2:
>    * move check on compound outside suggested by Alexander
>    * an example of the race condition, suggested by Michal
> ---
>   mm/memcontrol.c | 18 ------------------
>   1 file changed, 18 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6c83cf4ed970..27c231bf4565 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5340,14 +5340,6 @@ static int mem_cgroup_move_account(struct page *page,
>   		__mod_lruvec_state(to_vec, NR_WRITEBACK, nr_pages);
>   	}
>   
> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -	if (compound && !list_empty(page_deferred_list(page))) {
> -		spin_lock(&from->deferred_split_queue.split_queue_lock);
> -		list_del_init(page_deferred_list(page));
> -		from->deferred_split_queue.split_queue_len--;
> -		spin_unlock(&from->deferred_split_queue.split_queue_lock);
> -	}
> -#endif
>   	/*
>   	 * It is safe to change page->mem_cgroup here because the page
>   	 * is referenced, charged, and isolated - we can't race with
> @@ -5357,16 +5349,6 @@ static int mem_cgroup_move_account(struct page *page,
>   	/* caller should have done css_get */
>   	page->mem_cgroup = to;
>   
> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -	if (compound && list_empty(page_deferred_list(page))) {
> -		spin_lock(&to->deferred_split_queue.split_queue_lock);
> -		list_add_tail(page_deferred_list(page),
> -			      &to->deferred_split_queue.split_queue);
> -		to->deferred_split_queue.split_queue_len++;
> -		spin_unlock(&to->deferred_split_queue.split_queue_lock);
> -	}
> -#endif
> -
>   	spin_unlock_irqrestore(&from->move_lock, flags);
>   
>   	ret = 0;



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Patch v4] mm: thp: remove the defer list related code since this will not happen
  2020-01-18  0:57 ` Yang Shi
@ 2020-01-18  5:30   ` Yang Shi
  0 siblings, 0 replies; 13+ messages in thread
From: Yang Shi @ 2020-01-18  5:30 UTC (permalink / raw)
  To: Wei Yang, hannes, mhocko, vdavydov.dev, akpm, ktkhai, kirill.shutemov
  Cc: cgroups, linux-mm, linux-kernel, alexander.duyck, rientjes, stable



On 1/17/20 4:57 PM, Yang Shi wrote:
>
>
> On 1/17/20 3:38 PM, Wei Yang wrote:
>> If compound is true, this means it is a PMD mapped THP. Which implies
>> the page is not linked to any defer list. So the first code chunk will
>> not be executed.
>>
>> Also with this reason, it would not be proper to add this page to a
>> defer list. So the second code chunk is not correct.
>>
>> Based on this, we should remove the defer list related code.
>>
>> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg 
>> aware")
>>
>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Cc: <stable@vger.kernel.org>    [5.4+]
>>
>> ---
>> v4:
>>    * finally we identified the related code is not necessary and not
>>      correct, just remove it
>>    * thanks to Kirill T first spot some problem
>
> Thanks for debugging and figuring this out. Acked-by: Yang Shi 
> <yang.shi@linux.alibaba.com>

BTW, the patch itself is fine, but the subject looks really confusing. 
It sounds like we would remove all deferred list code. I'd suggest 
rephrase it to:

mm: thp: don't need care deferred split queue in memcg charge move path

>
>> v3:
>>    * remove all review/ack tag since rewrite the changelog
>>    * use deferred_split_huge_page as the example of race
>>    * add cc stable 5.4+ tag as suggested by David Rientjes
>>
>> v2:
>>    * move check on compound outside suggested by Alexander
>>    * an example of the race condition, suggested by Michal
>> ---
>>   mm/memcontrol.c | 18 ------------------
>>   1 file changed, 18 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 6c83cf4ed970..27c231bf4565 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -5340,14 +5340,6 @@ static int mem_cgroup_move_account(struct page 
>> *page,
>>           __mod_lruvec_state(to_vec, NR_WRITEBACK, nr_pages);
>>       }
>>   -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> -    if (compound && !list_empty(page_deferred_list(page))) {
>> - spin_lock(&from->deferred_split_queue.split_queue_lock);
>> -        list_del_init(page_deferred_list(page));
>> -        from->deferred_split_queue.split_queue_len--;
>> - spin_unlock(&from->deferred_split_queue.split_queue_lock);
>> -    }
>> -#endif
>>       /*
>>        * It is safe to change page->mem_cgroup here because the page
>>        * is referenced, charged, and isolated - we can't race with
>> @@ -5357,16 +5349,6 @@ static int mem_cgroup_move_account(struct page 
>> *page,
>>       /* caller should have done css_get */
>>       page->mem_cgroup = to;
>>   -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> -    if (compound && list_empty(page_deferred_list(page))) {
>> - spin_lock(&to->deferred_split_queue.split_queue_lock);
>> -        list_add_tail(page_deferred_list(page),
>> - &to->deferred_split_queue.split_queue);
>> -        to->deferred_split_queue.split_queue_len++;
>> - spin_unlock(&to->deferred_split_queue.split_queue_lock);
>> -    }
>> -#endif
>> -
>>       spin_unlock_irqrestore(&from->move_lock, flags);
>>         ret = 0;
>



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Patch v4] mm: thp: remove the defer list related code since this will not happen
  2020-01-17 23:38 [Patch v4] mm: thp: remove the defer list related code since this will not happen Wei Yang
  2020-01-18  0:57 ` Yang Shi
@ 2020-01-18 22:54 ` Andrew Morton
  2020-01-18 23:36   ` David Rientjes
  1 sibling, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2020-01-18 22:54 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes, mhocko, vdavydov.dev, ktkhai, kirill.shutemov, yang.shi,
	cgroups, linux-mm, linux-kernel, alexander.duyck, rientjes,
	stable

On Sat, 18 Jan 2020 07:38:36 +0800 Wei Yang <richardw.yang@linux.intel.com> wrote:

> If compound is true, this means it is a PMD mapped THP. Which implies
> the page is not linked to any defer list. So the first code chunk will
> not be executed.
> 
> Also with this reason, it would not be proper to add this page to a
> defer list. So the second code chunk is not correct.
> 
> Based on this, we should remove the defer list related code.
> 
> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
> 
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: <stable@vger.kernel.org>    [5.4+]

This patch is identical to "mm: thp: grab the lock before manipulating
defer list", which is rather confusing.  Please let people know when
this sort of thing is done.

The earlier changelog mentioned a possible race condition.  This
changelog does not.  In fact this changelog fails to provide any
description of any userspace-visible runtime effects of the bug. 
Please send along such a description for inclusion, as always.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Patch v4] mm: thp: remove the defer list related code since this will not happen
  2020-01-18 22:54 ` Andrew Morton
@ 2020-01-18 23:36   ` David Rientjes
  2020-01-19  2:24     ` Wei Yang
  2020-01-20  7:22     ` Michal Hocko
  0 siblings, 2 replies; 13+ messages in thread
From: David Rientjes @ 2020-01-18 23:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Wei Yang, hannes, mhocko, vdavydov.dev, ktkhai, kirill.shutemov,
	yang.shi, cgroups, linux-mm, linux-kernel, alexander.duyck,
	stable

On Sat, 18 Jan 2020, Andrew Morton wrote:

> On Sat, 18 Jan 2020 07:38:36 +0800 Wei Yang <richardw.yang@linux.intel.com> wrote:
> 
> > If compound is true, this means it is a PMD mapped THP. Which implies
> > the page is not linked to any defer list. So the first code chunk will
> > not be executed.
> > 
> > Also with this reason, it would not be proper to add this page to a
> > defer list. So the second code chunk is not correct.
> > 
> > Based on this, we should remove the defer list related code.
> > 
> > Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
> > 
> > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> > Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Cc: <stable@vger.kernel.org>    [5.4+]
> 
> This patch is identical to "mm: thp: grab the lock before manipulating
> defer list", which is rather confusing.  Please let people know when
> this sort of thing is done.
> 
> The earlier changelog mentioned a possible race condition.  This
> changelog does not.  In fact this changelog fails to provide any
> description of any userspace-visible runtime effects of the bug. 
> Please send along such a description for inclusion, as always.
> 

The locking concern that Wei was originally looking at is no longer an 
issue because we determined that the code in question could simply be 
removed.

I think the following can be added to the changelog:

----->o-----

When migrating memcg charges of thp memory, there are two possibilities:

 (1) The underlying compound page is mapped by a pmd and thus does is not 
     on a deferred split queue (it's mapped), or

 (2) The compound page is not mapped by a pmd and is awaiting split on a
     deferred split queue.

The current charge migration implementation does *not* migrate charges for 
thp memory on the deferred split queue, it only migrates charges for pages 
that are mapped by a pmd.

Thus, to migrate charges, the underlying compound page cannot be on a 
deferred split queue; no list manipulation needs to be done in 
mem_cgroup_move_account().

With the current code, the underlying compound page is moved to the 
deferred split queue of the memcg its memory is not charged to, so 
susbequent reclaim will consider these pages for the wrong memcg.  Remove 
the deferred split queue handling in mem_cgroup_move_account() entirely.

----->o-----

Acked-by: David Rientjes <rientjes@google.com>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Patch v4] mm: thp: remove the defer list related code since this will not happen
  2020-01-18 23:36   ` David Rientjes
@ 2020-01-19  2:24     ` Wei Yang
  2020-01-20  7:22     ` Michal Hocko
  1 sibling, 0 replies; 13+ messages in thread
From: Wei Yang @ 2020-01-19  2:24 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Wei Yang, hannes, mhocko, vdavydov.dev, ktkhai,
	kirill.shutemov, yang.shi, cgroups, linux-mm, linux-kernel,
	alexander.duyck, stable

On Sat, Jan 18, 2020 at 03:36:06PM -0800, David Rientjes wrote:
>On Sat, 18 Jan 2020, Andrew Morton wrote:
>
>> On Sat, 18 Jan 2020 07:38:36 +0800 Wei Yang <richardw.yang@linux.intel.com> wrote:
>> 
>> > If compound is true, this means it is a PMD mapped THP. Which implies
>> > the page is not linked to any defer list. So the first code chunk will
>> > not be executed.
>> > 
>> > Also with this reason, it would not be proper to add this page to a
>> > defer list. So the second code chunk is not correct.
>> > 
>> > Based on this, we should remove the defer list related code.
>> > 
>> > Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
>> > 
>> > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> > Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> > Cc: <stable@vger.kernel.org>    [5.4+]
>> 
>> This patch is identical to "mm: thp: grab the lock before manipulating
>> defer list", which is rather confusing.  Please let people know when
>> this sort of thing is done.
>> 
>> The earlier changelog mentioned a possible race condition.  This
>> changelog does not.  In fact this changelog fails to provide any
>> description of any userspace-visible runtime effects of the bug. 
>> Please send along such a description for inclusion, as always.
>> 
>
>The locking concern that Wei was originally looking at is no longer an 
>issue because we determined that the code in question could simply be 
>removed.
>
>I think the following can be added to the changelog:
>
>----->o-----
>
>When migrating memcg charges of thp memory, there are two possibilities:
>
> (1) The underlying compound page is mapped by a pmd and thus does is not 
>     on a deferred split queue (it's mapped), or
>
> (2) The compound page is not mapped by a pmd and is awaiting split on a
>     deferred split queue.
>
>The current charge migration implementation does *not* migrate charges for 
>thp memory on the deferred split queue, it only migrates charges for pages 
>that are mapped by a pmd.
>
>Thus, to migrate charges, the underlying compound page cannot be on a 
>deferred split queue; no list manipulation needs to be done in 
>mem_cgroup_move_account().
>
>With the current code, the underlying compound page is moved to the 
>deferred split queue of the memcg its memory is not charged to, so 
>susbequent reclaim will consider these pages for the wrong memcg.  Remove 
>the deferred split queue handling in mem_cgroup_move_account() entirely.
>
>----->o-----
>
>Acked-by: David Rientjes <rientjes@google.com>

Hi David,

The changlog looks awesome to me. Thanks ~

Hi Andrew

I see you queue this in you tree, do I need to rewrite a patch with better
changelog?

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Patch v4] mm: thp: remove the defer list related code since this will not happen
  2020-01-18 23:36   ` David Rientjes
  2020-01-19  2:24     ` Wei Yang
@ 2020-01-20  7:22     ` Michal Hocko
  2020-01-20  8:17       ` Wei Yang
  2020-01-20 21:10       ` David Rientjes
  1 sibling, 2 replies; 13+ messages in thread
From: Michal Hocko @ 2020-01-20  7:22 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Wei Yang, hannes, vdavydov.dev, ktkhai,
	kirill.shutemov, yang.shi, cgroups, linux-mm, linux-kernel,
	alexander.duyck, stable

On Sat 18-01-20 15:36:06, David Rientjes wrote:
> On Sat, 18 Jan 2020, Andrew Morton wrote:
> 
> > On Sat, 18 Jan 2020 07:38:36 +0800 Wei Yang <richardw.yang@linux.intel.com> wrote:
> > 
> > > If compound is true, this means it is a PMD mapped THP. Which implies
> > > the page is not linked to any defer list. So the first code chunk will
> > > not be executed.
> > > 
> > > Also with this reason, it would not be proper to add this page to a
> > > defer list. So the second code chunk is not correct.
> > > 
> > > Based on this, we should remove the defer list related code.
> > > 
> > > Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
> > > 
> > > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> > > Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > Cc: <stable@vger.kernel.org>    [5.4+]
> > 
> > This patch is identical to "mm: thp: grab the lock before manipulating
> > defer list", which is rather confusing.  Please let people know when
> > this sort of thing is done.
> > 
> > The earlier changelog mentioned a possible race condition.  This
> > changelog does not.  In fact this changelog fails to provide any
> > description of any userspace-visible runtime effects of the bug. 
> > Please send along such a description for inclusion, as always.
> > 
> 
> The locking concern that Wei was originally looking at is no longer an 
> issue because we determined that the code in question could simply be 
> removed.
> 
> I think the following can be added to the changelog:
> 
> ----->o-----
> 
> When migrating memcg charges of thp memory, there are two possibilities:
> 
>  (1) The underlying compound page is mapped by a pmd and thus does is not 
>      on a deferred split queue (it's mapped), or
> 
>  (2) The compound page is not mapped by a pmd and is awaiting split on a
>      deferred split queue.
> 
> The current charge migration implementation does *not* migrate charges for 
> thp memory on the deferred split queue, it only migrates charges for pages 
> that are mapped by a pmd.
> 
> Thus, to migrate charges, the underlying compound page cannot be on a 
> deferred split queue; no list manipulation needs to be done in 
> mem_cgroup_move_account().
> 
> With the current code, the underlying compound page is moved to the 
> deferred split queue of the memcg its memory is not charged to, so 
> susbequent reclaim will consider these pages for the wrong memcg.  Remove 
> the deferred split queue handling in mem_cgroup_move_account() entirely.

I believe this still doesn't describe the underlying problem to the full
extent. What happens with the page on the deferred list when it
shouldn't be there in fact? Unless I am missing something deferred_split_scan
will simply split that huge page. Which is a bit unfortunate but nothing
really critical. This should be mentioned in the changelog.

With that clarified, feel free to add

Acked-by: Michal Hocko <mhocko@suse.com>

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Patch v4] mm: thp: remove the defer list related code since this will not happen
  2020-01-20  7:22     ` Michal Hocko
@ 2020-01-20  8:17       ` Wei Yang
  2020-01-20 21:10       ` David Rientjes
  1 sibling, 0 replies; 13+ messages in thread
From: Wei Yang @ 2020-01-20  8:17 UTC (permalink / raw)
  To: Michal Hocko
  Cc: David Rientjes, Andrew Morton, Wei Yang, hannes, vdavydov.dev,
	ktkhai, kirill.shutemov, yang.shi, cgroups, linux-mm,
	linux-kernel, alexander.duyck, stable

On Mon, Jan 20, 2020 at 08:22:37AM +0100, Michal Hocko wrote:
>On Sat 18-01-20 15:36:06, David Rientjes wrote:
>> On Sat, 18 Jan 2020, Andrew Morton wrote:
>> 
>> > On Sat, 18 Jan 2020 07:38:36 +0800 Wei Yang <richardw.yang@linux.intel.com> wrote:
>> > 
>> > > If compound is true, this means it is a PMD mapped THP. Which implies
>> > > the page is not linked to any defer list. So the first code chunk will
>> > > not be executed.
>> > > 
>> > > Also with this reason, it would not be proper to add this page to a
>> > > defer list. So the second code chunk is not correct.
>> > > 
>> > > Based on this, we should remove the defer list related code.
>> > > 
>> > > Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
>> > > 
>> > > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> > > Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> > > Cc: <stable@vger.kernel.org>    [5.4+]
>> > 
>> > This patch is identical to "mm: thp: grab the lock before manipulating
>> > defer list", which is rather confusing.  Please let people know when
>> > this sort of thing is done.
>> > 
>> > The earlier changelog mentioned a possible race condition.  This
>> > changelog does not.  In fact this changelog fails to provide any
>> > description of any userspace-visible runtime effects of the bug. 
>> > Please send along such a description for inclusion, as always.
>> > 
>> 
>> The locking concern that Wei was originally looking at is no longer an 
>> issue because we determined that the code in question could simply be 
>> removed.
>> 
>> I think the following can be added to the changelog:
>> 
>> ----->o-----
>> 
>> When migrating memcg charges of thp memory, there are two possibilities:
>> 
>>  (1) The underlying compound page is mapped by a pmd and thus does is not 
>>      on a deferred split queue (it's mapped), or
>> 
>>  (2) The compound page is not mapped by a pmd and is awaiting split on a
>>      deferred split queue.
>> 
>> The current charge migration implementation does *not* migrate charges for 
>> thp memory on the deferred split queue, it only migrates charges for pages 
>> that are mapped by a pmd.
>> 
>> Thus, to migrate charges, the underlying compound page cannot be on a 
>> deferred split queue; no list manipulation needs to be done in 
>> mem_cgroup_move_account().
>> 
>> With the current code, the underlying compound page is moved to the 
>> deferred split queue of the memcg its memory is not charged to, so 
>> susbequent reclaim will consider these pages for the wrong memcg.  Remove 
>> the deferred split queue handling in mem_cgroup_move_account() entirely.
>
>I believe this still doesn't describe the underlying problem to the full
>extent. What happens with the page on the deferred list when it
>shouldn't be there in fact? Unless I am missing something deferred_split_scan
>will simply split that huge page. Which is a bit unfortunate but nothing
>really critical. This should be mentioned in the changelog.
>

Per my understanding, if we do the split when it is not necessary, we
probably have a lower performance due to tlb miss. For others, I don't see the
impact.

>With that clarified, feel free to add
>
>Acked-by: Michal Hocko <mhocko@suse.com>
>
>-- 
>Michal Hocko
>SUSE Labs

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Patch v4] mm: thp: remove the defer list related code since this will not happen
  2020-01-20  7:22     ` Michal Hocko
  2020-01-20  8:17       ` Wei Yang
@ 2020-01-20 21:10       ` David Rientjes
  2020-01-20 21:27         ` Michal Hocko
  1 sibling, 1 reply; 13+ messages in thread
From: David Rientjes @ 2020-01-20 21:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Wei Yang, hannes, vdavydov.dev, ktkhai,
	kirill.shutemov, yang.shi, cgroups, linux-mm, linux-kernel,
	alexander.duyck, stable

On Mon, 20 Jan 2020, Michal Hocko wrote:

> > When migrating memcg charges of thp memory, there are two possibilities:
> > 
> >  (1) The underlying compound page is mapped by a pmd and thus does is not 
> >      on a deferred split queue (it's mapped), or
> > 
> >  (2) The compound page is not mapped by a pmd and is awaiting split on a
> >      deferred split queue.
> > 
> > The current charge migration implementation does *not* migrate charges for 
> > thp memory on the deferred split queue, it only migrates charges for pages 
> > that are mapped by a pmd.
> > 
> > Thus, to migrate charges, the underlying compound page cannot be on a 
> > deferred split queue; no list manipulation needs to be done in 
> > mem_cgroup_move_account().
> > 
> > With the current code, the underlying compound page is moved to the 
> > deferred split queue of the memcg its memory is not charged to, so 
> > susbequent reclaim will consider these pages for the wrong memcg.  Remove 
> > the deferred split queue handling in mem_cgroup_move_account() entirely.
> 
> I believe this still doesn't describe the underlying problem to the full
> extent. What happens with the page on the deferred list when it
> shouldn't be there in fact? Unless I am missing something deferred_split_scan
> will simply split that huge page. Which is a bit unfortunate but nothing
> really critical. This should be mentioned in the changelog.
> 

Are you referring to a compound page on the deferred split queue before a 
task is moved?  I'm not sure this is within the scope of Wei's patch.. 
this is simply preventing a page from being moved to the deferred split
queue of a memcg that it is not charged to.  Is there a concern about why 
this code can be removed or a suggestion on something else it should be 
doing instead?


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Patch v4] mm: thp: remove the defer list related code since this will not happen
  2020-01-20 21:10       ` David Rientjes
@ 2020-01-20 21:27         ` Michal Hocko
  2020-01-21 23:08           ` David Rientjes
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2020-01-20 21:27 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Wei Yang, hannes, vdavydov.dev, ktkhai,
	kirill.shutemov, yang.shi, cgroups, linux-mm, linux-kernel,
	alexander.duyck, stable

On Mon 20-01-20 13:10:56, David Rientjes wrote:
> On Mon, 20 Jan 2020, Michal Hocko wrote:
> 
> > > When migrating memcg charges of thp memory, there are two possibilities:
> > > 
> > >  (1) The underlying compound page is mapped by a pmd and thus does is not 
> > >      on a deferred split queue (it's mapped), or
> > > 
> > >  (2) The compound page is not mapped by a pmd and is awaiting split on a
> > >      deferred split queue.
> > > 
> > > The current charge migration implementation does *not* migrate charges for 
> > > thp memory on the deferred split queue, it only migrates charges for pages 
> > > that are mapped by a pmd.
> > > 
> > > Thus, to migrate charges, the underlying compound page cannot be on a 
> > > deferred split queue; no list manipulation needs to be done in 
> > > mem_cgroup_move_account().
> > > 
> > > With the current code, the underlying compound page is moved to the 
> > > deferred split queue of the memcg its memory is not charged to, so 
> > > susbequent reclaim will consider these pages for the wrong memcg.  Remove 
> > > the deferred split queue handling in mem_cgroup_move_account() entirely.
> > 
> > I believe this still doesn't describe the underlying problem to the full
> > extent. What happens with the page on the deferred list when it
> > shouldn't be there in fact? Unless I am missing something deferred_split_scan
> > will simply split that huge page. Which is a bit unfortunate but nothing
> > really critical. This should be mentioned in the changelog.
> > 
> 
> Are you referring to a compound page on the deferred split queue before a 
> task is moved?  I'm not sure this is within the scope of Wei's patch.. 
> this is simply preventing a page from being moved to the deferred split
> queue of a memcg that it is not charged to.  Is there a concern about why 
> this code can be removed or a suggestion on something else it should be 
> doing instead?

No, I do not have any concern about the patch itslef. It is that the
changelog doesn't decribe the user visible effect. All I am saying is
that the current code splits THPs of moved pages under memory pressure
even if that is not needed. And that is a clear bug.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Patch v4] mm: thp: remove the defer list related code since this will not happen
  2020-01-20 21:27         ` Michal Hocko
@ 2020-01-21 23:08           ` David Rientjes
  2020-01-22  8:14             ` Michal Hocko
  0 siblings, 1 reply; 13+ messages in thread
From: David Rientjes @ 2020-01-21 23:08 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Wei Yang, hannes, vdavydov.dev, ktkhai,
	kirill.shutemov, yang.shi, cgroups, linux-mm, linux-kernel,
	alexander.duyck, stable

On Mon, 20 Jan 2020, Michal Hocko wrote:

> > > > When migrating memcg charges of thp memory, there are two possibilities:
> > > > 
> > > >  (1) The underlying compound page is mapped by a pmd and thus does is not 
> > > >      on a deferred split queue (it's mapped), or
> > > > 
> > > >  (2) The compound page is not mapped by a pmd and is awaiting split on a
> > > >      deferred split queue.
> > > > 
> > > > The current charge migration implementation does *not* migrate charges for 
> > > > thp memory on the deferred split queue, it only migrates charges for pages 
> > > > that are mapped by a pmd.
> > > > 
> > > > Thus, to migrate charges, the underlying compound page cannot be on a 
> > > > deferred split queue; no list manipulation needs to be done in 
> > > > mem_cgroup_move_account().
> > > > 
> > > > With the current code, the underlying compound page is moved to the 
> > > > deferred split queue of the memcg its memory is not charged to, so 
> > > > susbequent reclaim will consider these pages for the wrong memcg.  Remove 
> > > > the deferred split queue handling in mem_cgroup_move_account() entirely.
> > > 
> > > I believe this still doesn't describe the underlying problem to the full
> > > extent. What happens with the page on the deferred list when it
> > > shouldn't be there in fact? Unless I am missing something deferred_split_scan
> > > will simply split that huge page. Which is a bit unfortunate but nothing
> > > really critical. This should be mentioned in the changelog.
> > > 
> > 
> > Are you referring to a compound page on the deferred split queue before a 
> > task is moved?  I'm not sure this is within the scope of Wei's patch.. 
> > this is simply preventing a page from being moved to the deferred split
> > queue of a memcg that it is not charged to.  Is there a concern about why 
> > this code can be removed or a suggestion on something else it should be 
> > doing instead?
> 
> No, I do not have any concern about the patch itslef. It is that the
> changelog doesn't decribe the user visible effect. All I am saying is
> that the current code splits THPs of moved pages under memory pressure
> even if that is not needed. And that is a clear bug.

Ah, gotcha.  I tried to do this in the final paragraph of my amedment to 
Wei's patch and why it's important that this is marked as stable.

The current code in 5.4 from commit 87eaceb3faa59 places any migrated 
compound page onto the deferred split queue of the destination memcg 
regardless of whether it has a mapping pmd 
(list_empty(page_deferred_list()) was already false) or it does not have a 
mapping pmd (but is now on the wrong queue).  For the latter, 
can_split_huge_page() can help for the actual split but not for the 
removal of the page that is now erroneously on the queue.  For the former, 
memcg reclaim would not see the pages that it should split under memcg 
pressure so we'll see the same memcg oom conditions we saw before the 
deferred split shrinker became SHRINKER_MEMCG_AWARE: unnecessary ooms.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Patch v4] mm: thp: remove the defer list related code since this will not happen
  2020-01-21 23:08           ` David Rientjes
@ 2020-01-22  8:14             ` Michal Hocko
  2020-01-22 23:39               ` David Rientjes
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2020-01-22  8:14 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Wei Yang, hannes, vdavydov.dev, ktkhai,
	kirill.shutemov, yang.shi, cgroups, linux-mm, linux-kernel,
	alexander.duyck, stable

On Tue 21-01-20 15:08:39, David Rientjes wrote:
> On Mon, 20 Jan 2020, Michal Hocko wrote:
> 
> > > > > When migrating memcg charges of thp memory, there are two possibilities:
> > > > > 
> > > > >  (1) The underlying compound page is mapped by a pmd and thus does is not 
> > > > >      on a deferred split queue (it's mapped), or
> > > > > 
> > > > >  (2) The compound page is not mapped by a pmd and is awaiting split on a
> > > > >      deferred split queue.
> > > > > 
> > > > > The current charge migration implementation does *not* migrate charges for 
> > > > > thp memory on the deferred split queue, it only migrates charges for pages 
> > > > > that are mapped by a pmd.
> > > > > 
> > > > > Thus, to migrate charges, the underlying compound page cannot be on a 
> > > > > deferred split queue; no list manipulation needs to be done in 
> > > > > mem_cgroup_move_account().
> > > > > 
> > > > > With the current code, the underlying compound page is moved to the 
> > > > > deferred split queue of the memcg its memory is not charged to, so 
> > > > > susbequent reclaim will consider these pages for the wrong memcg.  Remove 
> > > > > the deferred split queue handling in mem_cgroup_move_account() entirely.
> > > > 
> > > > I believe this still doesn't describe the underlying problem to the full
> > > > extent. What happens with the page on the deferred list when it
> > > > shouldn't be there in fact? Unless I am missing something deferred_split_scan
> > > > will simply split that huge page. Which is a bit unfortunate but nothing
> > > > really critical. This should be mentioned in the changelog.
> > > > 
> > > 
> > > Are you referring to a compound page on the deferred split queue before a 
> > > task is moved?  I'm not sure this is within the scope of Wei's patch.. 
> > > this is simply preventing a page from being moved to the deferred split
> > > queue of a memcg that it is not charged to.  Is there a concern about why 
> > > this code can be removed or a suggestion on something else it should be 
> > > doing instead?
> > 
> > No, I do not have any concern about the patch itslef. It is that the
> > changelog doesn't decribe the user visible effect. All I am saying is
> > that the current code splits THPs of moved pages under memory pressure
> > even if that is not needed. And that is a clear bug.
> 
> Ah, gotcha.  I tried to do this in the final paragraph of my amedment to 
> Wei's patch and why it's important that this is marked as stable.

I considered "susbequent reclaim will consider these pages for the wrong
memcg." quite unclear TBH.
 
> The current code in 5.4 from commit 87eaceb3faa59 places any migrated 
> compound page onto the deferred split queue of the destination memcg 
> regardless of whether it has a mapping pmd 
> (list_empty(page_deferred_list()) was already false) or it does not have a 
> mapping pmd (but is now on the wrong queue).  For the latter, 
> can_split_huge_page() can help for the actual split but not for the 
> removal of the page that is now erroneously on the queue.

Does that mean that those fully mapped THPs are not going to be split?

> For the former, 
> memcg reclaim would not see the pages that it should split under memcg 
> pressure so we'll see the same memcg oom conditions we saw before the 
> deferred split shrinker became SHRINKER_MEMCG_AWARE: unnecessary ooms.

OK, this is yet another user visibile effect and it would be better to
mention it explicitly in the changelog. 

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Patch v4] mm: thp: remove the defer list related code since this will not happen
  2020-01-22  8:14             ` Michal Hocko
@ 2020-01-22 23:39               ` David Rientjes
  0 siblings, 0 replies; 13+ messages in thread
From: David Rientjes @ 2020-01-22 23:39 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Wei Yang, hannes, vdavydov.dev, ktkhai,
	kirill.shutemov, yang.shi, cgroups, linux-mm, linux-kernel,
	alexander.duyck, stable

On Wed, 22 Jan 2020, Michal Hocko wrote:

> > The current code in 5.4 from commit 87eaceb3faa59 places any migrated 
> > compound page onto the deferred split queue of the destination memcg 
> > regardless of whether it has a mapping pmd 
> > (list_empty(page_deferred_list()) was already false) or it does not have a 
> > mapping pmd (but is now on the wrong queue).  For the latter, 
> > can_split_huge_page() can help for the actual split but not for the 
> > removal of the page that is now erroneously on the queue.
> 
> Does that mean that those fully mapped THPs are not going to be split?
> 

It believe it should but deferred_split_scan() also won't take it off the 
wrong split queue so it will remain there and any other checks for 
page_deferred_list(page) will succeed.

> > For the former, 
> > memcg reclaim would not see the pages that it should split under memcg 
> > pressure so we'll see the same memcg oom conditions we saw before the 
> > deferred split shrinker became SHRINKER_MEMCG_AWARE: unnecessary ooms.
> 
> OK, this is yet another user visibile effect and it would be better to
> mention it explicitly in the changelog. 
> 

Ok, feel free to add to the last paragraph:

Memcg reclaim would not see the compound pages that it should split under 
memcg pressure so we'll otherwise see the same memcg oom conditions we saw 
before the deferred split shrinker became SHRINKER_MEMCG_AWARE: 
unnecessary ooms.


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-01-22 23:39 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-17 23:38 [Patch v4] mm: thp: remove the defer list related code since this will not happen Wei Yang
2020-01-18  0:57 ` Yang Shi
2020-01-18  5:30   ` Yang Shi
2020-01-18 22:54 ` Andrew Morton
2020-01-18 23:36   ` David Rientjes
2020-01-19  2:24     ` Wei Yang
2020-01-20  7:22     ` Michal Hocko
2020-01-20  8:17       ` Wei Yang
2020-01-20 21:10       ` David Rientjes
2020-01-20 21:27         ` Michal Hocko
2020-01-21 23:08           ` David Rientjes
2020-01-22  8:14             ` Michal Hocko
2020-01-22 23:39               ` David Rientjes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).