linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Patch v2] mm: thp: grab the lock before manipulation defer list
@ 2020-01-09 14:30 Wei Yang
  2020-01-09 18:52 ` David Rientjes
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Wei Yang @ 2020-01-09 14:30 UTC (permalink / raw)
  To: hannes, mhocko, vdavydov.dev, akpm
  Cc: cgroups, linux-mm, linux-kernel, kirill.shutemov, yang.shi,
	alexander.duyck, rientjes, Wei Yang

As all the other places, we grab the lock before manipulate the defer list.
Current implementation may face a race condition.

For example, the potential race would be:

    CPU1                      CPU2
    mem_cgroup_move_account   split_huge_page_to_list
      !list_empty
                                lock
                                !list_empty
                                list_del
                                unlock
      lock
      # !list_empty might not hold anymore
      list_del_init
      unlock

When this sequence happens, the list_del_init() in
mem_cgroup_move_account() would crash if CONFIG_DEBUG_LIST since the
page is already been removed by list_del in split_huge_page_to_list().

Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Acked-by: David Rientjes <rientjes@google.com>

---
v2:
  * move check on compound outside suggested by Alexander
  * an example of the race condition, suggested by Michal
---
 mm/memcontrol.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index bc01423277c5..1492eefe4f3c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5368,10 +5368,12 @@ static int mem_cgroup_move_account(struct page *page,
 	}
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	if (compound && !list_empty(page_deferred_list(page))) {
+	if (compound) {
 		spin_lock(&from->deferred_split_queue.split_queue_lock);
-		list_del_init(page_deferred_list(page));
-		from->deferred_split_queue.split_queue_len--;
+		if (!list_empty(page_deferred_list(page))) {
+			list_del_init(page_deferred_list(page));
+			from->deferred_split_queue.split_queue_len--;
+		}
 		spin_unlock(&from->deferred_split_queue.split_queue_lock);
 	}
 #endif
@@ -5385,11 +5387,13 @@ static int mem_cgroup_move_account(struct page *page,
 	page->mem_cgroup = to;
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	if (compound && list_empty(page_deferred_list(page))) {
+	if (compound) {
 		spin_lock(&to->deferred_split_queue.split_queue_lock);
-		list_add_tail(page_deferred_list(page),
-			      &to->deferred_split_queue.split_queue);
-		to->deferred_split_queue.split_queue_len++;
+		if (list_empty(page_deferred_list(page))) {
+			list_add_tail(page_deferred_list(page),
+				      &to->deferred_split_queue.split_queue);
+			to->deferred_split_queue.split_queue_len++;
+		}
 		spin_unlock(&to->deferred_split_queue.split_queue_lock);
 	}
 #endif
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
  2020-01-09 14:30 [Patch v2] mm: thp: grab the lock before manipulation defer list Wei Yang
@ 2020-01-09 18:52 ` David Rientjes
  2020-01-09 18:58 ` Michal Hocko
  2020-01-11  0:03 ` Kirill A. Shutemov
  2 siblings, 0 replies; 16+ messages in thread
From: David Rientjes @ 2020-01-09 18:52 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes, mhocko, vdavydov.dev, akpm, cgroups, linux-mm,
	linux-kernel, kirill.shutemov, yang.shi, alexander.duyck

On Thu, 9 Jan 2020, Wei Yang wrote:

> As all the other places, we grab the lock before manipulate the defer list.
> Current implementation may face a race condition.
> 
> For example, the potential race would be:
> 
>     CPU1                      CPU2
>     mem_cgroup_move_account   split_huge_page_to_list
>       !list_empty
>                                 lock
>                                 !list_empty
>                                 list_del
>                                 unlock
>       lock
>       # !list_empty might not hold anymore
>       list_del_init
>       unlock
> 
> When this sequence happens, the list_del_init() in
> mem_cgroup_move_account() would crash if CONFIG_DEBUG_LIST since the
> page is already been removed by list_del in split_huge_page_to_list().
> 
> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
> 
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> Acked-by: David Rientjes <rientjes@google.com>

Thanks Wei!

Andrew, I'd also suggest:

Cc: stable@vger.kernel.org # 5.4+

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
  2020-01-09 14:30 [Patch v2] mm: thp: grab the lock before manipulation defer list Wei Yang
  2020-01-09 18:52 ` David Rientjes
@ 2020-01-09 18:58 ` Michal Hocko
  2020-01-11  0:03 ` Kirill A. Shutemov
  2 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2020-01-09 18:58 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes, vdavydov.dev, akpm, cgroups, linux-mm, linux-kernel,
	kirill.shutemov, yang.shi, alexander.duyck, rientjes

On Thu 09-01-20 22:30:54, Wei Yang wrote:
> As all the other places, we grab the lock before manipulate the defer list.
> Current implementation may face a race condition.
> 
> For example, the potential race would be:
> 
>     CPU1                      CPU2
>     mem_cgroup_move_account   split_huge_page_to_list
>       !list_empty
>                                 lock
>                                 !list_empty
>                                 list_del
>                                 unlock
>       lock
>       # !list_empty might not hold anymore
>       list_del_init
>       unlock
> 
> When this sequence happens, the list_del_init() in
> mem_cgroup_move_account() would crash if CONFIG_DEBUG_LIST since the
> page is already been removed by list_del in split_huge_page_to_list().
> 
> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
> 
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> Acked-by: David Rientjes <rientjes@google.com>

Thanks a lot for the changelog approvements!
Acked-by: Michal Hocko <mhocko@suse.com>

> 
> ---
> v2:
>   * move check on compound outside suggested by Alexander
>   * an example of the race condition, suggested by Michal
> ---
>  mm/memcontrol.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index bc01423277c5..1492eefe4f3c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5368,10 +5368,12 @@ static int mem_cgroup_move_account(struct page *page,
>  	}
>  
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -	if (compound && !list_empty(page_deferred_list(page))) {
> +	if (compound) {
>  		spin_lock(&from->deferred_split_queue.split_queue_lock);
> -		list_del_init(page_deferred_list(page));
> -		from->deferred_split_queue.split_queue_len--;
> +		if (!list_empty(page_deferred_list(page))) {
> +			list_del_init(page_deferred_list(page));
> +			from->deferred_split_queue.split_queue_len--;
> +		}
>  		spin_unlock(&from->deferred_split_queue.split_queue_lock);
>  	}
>  #endif
> @@ -5385,11 +5387,13 @@ static int mem_cgroup_move_account(struct page *page,
>  	page->mem_cgroup = to;
>  
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -	if (compound && list_empty(page_deferred_list(page))) {
> +	if (compound) {
>  		spin_lock(&to->deferred_split_queue.split_queue_lock);
> -		list_add_tail(page_deferred_list(page),
> -			      &to->deferred_split_queue.split_queue);
> -		to->deferred_split_queue.split_queue_len++;
> +		if (list_empty(page_deferred_list(page))) {
> +			list_add_tail(page_deferred_list(page),
> +				      &to->deferred_split_queue.split_queue);
> +			to->deferred_split_queue.split_queue_len++;
> +		}
>  		spin_unlock(&to->deferred_split_queue.split_queue_lock);
>  	}
>  #endif
> -- 
> 2.17.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
  2020-01-09 14:30 [Patch v2] mm: thp: grab the lock before manipulation defer list Wei Yang
  2020-01-09 18:52 ` David Rientjes
  2020-01-09 18:58 ` Michal Hocko
@ 2020-01-11  0:03 ` Kirill A. Shutemov
  2020-01-12  2:28   ` Wei Yang
  2020-01-14  9:31   ` Michal Hocko
  2 siblings, 2 replies; 16+ messages in thread
From: Kirill A. Shutemov @ 2020-01-11  0:03 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes, mhocko, vdavydov.dev, akpm, cgroups, linux-mm,
	linux-kernel, kirill.shutemov, yang.shi, alexander.duyck,
	rientjes

On Thu, Jan 09, 2020 at 10:30:54PM +0800, Wei Yang wrote:
> As all the other places, we grab the lock before manipulate the defer list.
> Current implementation may face a race condition.
> 
> For example, the potential race would be:
> 
>     CPU1                      CPU2
>     mem_cgroup_move_account   split_huge_page_to_list
>       !list_empty
>                                 lock
>                                 !list_empty
>                                 list_del
>                                 unlock
>       lock
>       # !list_empty might not hold anymore
>       list_del_init
>       unlock

I don't think this particular race is possible. Both parties take page
lock before messing with deferred queue, but anytway:

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

> 
> When this sequence happens, the list_del_init() in
> mem_cgroup_move_account() would crash if CONFIG_DEBUG_LIST since the
> page is already been removed by list_del in split_huge_page_to_list().
> 
> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
> 
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> Acked-by: David Rientjes <rientjes@google.com>
> 
> ---
> v2:
>   * move check on compound outside suggested by Alexander
>   * an example of the race condition, suggested by Michal
> ---
>  mm/memcontrol.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index bc01423277c5..1492eefe4f3c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5368,10 +5368,12 @@ static int mem_cgroup_move_account(struct page *page,
>  	}
>  
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -	if (compound && !list_empty(page_deferred_list(page))) {
> +	if (compound) {
>  		spin_lock(&from->deferred_split_queue.split_queue_lock);
> -		list_del_init(page_deferred_list(page));
> -		from->deferred_split_queue.split_queue_len--;
> +		if (!list_empty(page_deferred_list(page))) {
> +			list_del_init(page_deferred_list(page));
> +			from->deferred_split_queue.split_queue_len--;
> +		}
>  		spin_unlock(&from->deferred_split_queue.split_queue_lock);
>  	}
>  #endif
> @@ -5385,11 +5387,13 @@ static int mem_cgroup_move_account(struct page *page,
>  	page->mem_cgroup = to;
>  
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -	if (compound && list_empty(page_deferred_list(page))) {
> +	if (compound) {
>  		spin_lock(&to->deferred_split_queue.split_queue_lock);
> -		list_add_tail(page_deferred_list(page),
> -			      &to->deferred_split_queue.split_queue);
> -		to->deferred_split_queue.split_queue_len++;
> +		if (list_empty(page_deferred_list(page))) {
> +			list_add_tail(page_deferred_list(page),
> +				      &to->deferred_split_queue.split_queue);
> +			to->deferred_split_queue.split_queue_len++;
> +		}
>  		spin_unlock(&to->deferred_split_queue.split_queue_lock);
>  	}
>  #endif
> -- 
> 2.17.1
> 
> 

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
  2020-01-11  0:03 ` Kirill A. Shutemov
@ 2020-01-12  2:28   ` Wei Yang
  2020-01-12 22:57     ` Kirill A. Shutemov
  2020-01-14  9:31   ` Michal Hocko
  1 sibling, 1 reply; 16+ messages in thread
From: Wei Yang @ 2020-01-12  2:28 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Wei Yang, hannes, mhocko, vdavydov.dev, akpm, cgroups, linux-mm,
	linux-kernel, kirill.shutemov, yang.shi, alexander.duyck,
	rientjes

On Sat, Jan 11, 2020 at 03:03:52AM +0300, Kirill A. Shutemov wrote:
>On Thu, Jan 09, 2020 at 10:30:54PM +0800, Wei Yang wrote:
>> As all the other places, we grab the lock before manipulate the defer list.
>> Current implementation may face a race condition.
>> 
>> For example, the potential race would be:
>> 
>>     CPU1                      CPU2
>>     mem_cgroup_move_account   split_huge_page_to_list
>>       !list_empty
>>                                 lock
>>                                 !list_empty
>>                                 list_del
>>                                 unlock
>>       lock
>>       # !list_empty might not hold anymore
>>       list_del_init
>>       unlock
>
>I don't think this particular race is possible. Both parties take page
>lock before messing with deferred queue, but anytway:

I am afraid not. Page lock is per page, while defer queue is per pgdate or
memcg.

It is possible two page in the same pgdate or memcg grab page lock
respectively and then access the same defer queue concurrently.

>
>Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>
>> 
>> When this sequence happens, the list_del_init() in
>> mem_cgroup_move_account() would crash if CONFIG_DEBUG_LIST since the
>> page is already been removed by list_del in split_huge_page_to_list().
>> 
>> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
>> 
>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> Acked-by: David Rientjes <rientjes@google.com>
>> 
>> ---
>> v2:
>>   * move check on compound outside suggested by Alexander
>>   * an example of the race condition, suggested by Michal
>> ---
>>  mm/memcontrol.c | 18 +++++++++++-------
>>  1 file changed, 11 insertions(+), 7 deletions(-)
>> 
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index bc01423277c5..1492eefe4f3c 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -5368,10 +5368,12 @@ static int mem_cgroup_move_account(struct page *page,
>>  	}
>>  
>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> -	if (compound && !list_empty(page_deferred_list(page))) {
>> +	if (compound) {
>>  		spin_lock(&from->deferred_split_queue.split_queue_lock);
>> -		list_del_init(page_deferred_list(page));
>> -		from->deferred_split_queue.split_queue_len--;
>> +		if (!list_empty(page_deferred_list(page))) {
>> +			list_del_init(page_deferred_list(page));
>> +			from->deferred_split_queue.split_queue_len--;
>> +		}
>>  		spin_unlock(&from->deferred_split_queue.split_queue_lock);
>>  	}
>>  #endif
>> @@ -5385,11 +5387,13 @@ static int mem_cgroup_move_account(struct page *page,
>>  	page->mem_cgroup = to;
>>  
>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> -	if (compound && list_empty(page_deferred_list(page))) {
>> +	if (compound) {
>>  		spin_lock(&to->deferred_split_queue.split_queue_lock);
>> -		list_add_tail(page_deferred_list(page),
>> -			      &to->deferred_split_queue.split_queue);
>> -		to->deferred_split_queue.split_queue_len++;
>> +		if (list_empty(page_deferred_list(page))) {
>> +			list_add_tail(page_deferred_list(page),
>> +				      &to->deferred_split_queue.split_queue);
>> +			to->deferred_split_queue.split_queue_len++;
>> +		}
>>  		spin_unlock(&to->deferred_split_queue.split_queue_lock);
>>  	}
>>  #endif
>> -- 
>> 2.17.1
>> 
>> 
>
>-- 
> Kirill A. Shutemov

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
  2020-01-12  2:28   ` Wei Yang
@ 2020-01-12 22:57     ` Kirill A. Shutemov
  2020-01-13  0:44       ` Wei Yang
  0 siblings, 1 reply; 16+ messages in thread
From: Kirill A. Shutemov @ 2020-01-12 22:57 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes, mhocko, vdavydov.dev, akpm, cgroups, linux-mm,
	linux-kernel, kirill.shutemov, yang.shi, alexander.duyck,
	rientjes

On Sun, Jan 12, 2020 at 10:28:58AM +0800, Wei Yang wrote:
> On Sat, Jan 11, 2020 at 03:03:52AM +0300, Kirill A. Shutemov wrote:
> >On Thu, Jan 09, 2020 at 10:30:54PM +0800, Wei Yang wrote:
> >> As all the other places, we grab the lock before manipulate the defer list.
> >> Current implementation may face a race condition.
> >> 
> >> For example, the potential race would be:
> >> 
> >>     CPU1                      CPU2
> >>     mem_cgroup_move_account   split_huge_page_to_list
> >>       !list_empty
> >>                                 lock
> >>                                 !list_empty
> >>                                 list_del
> >>                                 unlock
> >>       lock
> >>       # !list_empty might not hold anymore
> >>       list_del_init
> >>       unlock
> >
> >I don't think this particular race is possible. Both parties take page
> >lock before messing with deferred queue, but anytway:
> 
> I am afraid not. Page lock is per page, while defer queue is per pgdate or
> memcg.
> 
> It is possible two page in the same pgdate or memcg grab page lock
> respectively and then access the same defer queue concurrently.

Look closer on the list_empty() argument. It's list_head local to the
page. Too different pages can be handled in parallel without any problem
in this particular scenario. As long as we as we modify it under the lock.

Said that, page lock here was somewhat accidential and I still belive we
need to move the check under the lock anyway.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
  2020-01-12 22:57     ` Kirill A. Shutemov
@ 2020-01-13  0:44       ` Wei Yang
  2020-01-13  7:36         ` Kirill A. Shutemov
  0 siblings, 1 reply; 16+ messages in thread
From: Wei Yang @ 2020-01-13  0:44 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Wei Yang, hannes, mhocko, vdavydov.dev, akpm, cgroups, linux-mm,
	linux-kernel, kirill.shutemov, yang.shi, alexander.duyck,
	rientjes

On Mon, Jan 13, 2020 at 01:57:18AM +0300, Kirill A. Shutemov wrote:
>On Sun, Jan 12, 2020 at 10:28:58AM +0800, Wei Yang wrote:
>> On Sat, Jan 11, 2020 at 03:03:52AM +0300, Kirill A. Shutemov wrote:
>> >On Thu, Jan 09, 2020 at 10:30:54PM +0800, Wei Yang wrote:
>> >> As all the other places, we grab the lock before manipulate the defer list.
>> >> Current implementation may face a race condition.
>> >> 
>> >> For example, the potential race would be:
>> >> 
>> >>     CPU1                      CPU2
>> >>     mem_cgroup_move_account   split_huge_page_to_list
>> >>       !list_empty
>> >>                                 lock
>> >>                                 !list_empty
>> >>                                 list_del
>> >>                                 unlock
>> >>       lock
>> >>       # !list_empty might not hold anymore
>> >>       list_del_init
>> >>       unlock
>> >
>> >I don't think this particular race is possible. Both parties take page
>> >lock before messing with deferred queue, but anytway:
>> 
>> I am afraid not. Page lock is per page, while defer queue is per pgdate or
>> memcg.
>> 
>> It is possible two page in the same pgdate or memcg grab page lock
>> respectively and then access the same defer queue concurrently.
>
>Look closer on the list_empty() argument. It's list_head local to the
>page. Too different pages can be handled in parallel without any problem
>in this particular scenario. As long as we as we modify it under the lock.
>
>Said that, page lock here was somewhat accidential and I still belive we
>need to move the check under the lock anyway.
>

If my understanding is correct, you agree with my statement?

>-- 
> Kirill A. Shutemov

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
  2020-01-13  0:44       ` Wei Yang
@ 2020-01-13  7:36         ` Kirill A. Shutemov
  2020-01-13  8:23           ` Wei Yang
  0 siblings, 1 reply; 16+ messages in thread
From: Kirill A. Shutemov @ 2020-01-13  7:36 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes, mhocko, vdavydov.dev, akpm, cgroups, linux-mm,
	linux-kernel, kirill.shutemov, yang.shi, alexander.duyck,
	rientjes

On Mon, Jan 13, 2020 at 08:44:57AM +0800, Wei Yang wrote:
> >> It is possible two page in the same pgdate or memcg grab page lock
> >> respectively and then access the same defer queue concurrently.
> 
> If my understanding is correct, you agree with my statement?

Which one? If the one above then no. list_empty only accesses list_head
for the struct page, not the queue.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
  2020-01-13  7:36         ` Kirill A. Shutemov
@ 2020-01-13  8:23           ` Wei Yang
  0 siblings, 0 replies; 16+ messages in thread
From: Wei Yang @ 2020-01-13  8:23 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Wei Yang, hannes, mhocko, vdavydov.dev, akpm, cgroups, linux-mm,
	linux-kernel, kirill.shutemov, yang.shi, alexander.duyck,
	rientjes

On Mon, Jan 13, 2020 at 10:36:14AM +0300, Kirill A. Shutemov wrote:
>On Mon, Jan 13, 2020 at 08:44:57AM +0800, Wei Yang wrote:
>> >> It is possible two page in the same pgdate or memcg grab page lock
>> >> respectively and then access the same defer queue concurrently.
>> 
>> If my understanding is correct, you agree with my statement?
>
>Which one? If the one above then no. list_empty only accesses list_head
>for the struct page, not the queue.
>

Ah, I get your point.

>-- 
> Kirill A. Shutemov

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
  2020-01-11  0:03 ` Kirill A. Shutemov
  2020-01-12  2:28   ` Wei Yang
@ 2020-01-14  9:31   ` Michal Hocko
  2020-01-14 10:31     ` Kirill A. Shutemov
  1 sibling, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2020-01-14  9:31 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Wei Yang, hannes, vdavydov.dev, akpm, cgroups, linux-mm,
	linux-kernel, kirill.shutemov, yang.shi, alexander.duyck,
	rientjes

On Sat 11-01-20 03:03:52, Kirill A. Shutemov wrote:
> On Thu, Jan 09, 2020 at 10:30:54PM +0800, Wei Yang wrote:
> > As all the other places, we grab the lock before manipulate the defer list.
> > Current implementation may face a race condition.
> > 
> > For example, the potential race would be:
> > 
> >     CPU1                      CPU2
> >     mem_cgroup_move_account   split_huge_page_to_list
> >       !list_empty
> >                                 lock
> >                                 !list_empty
> >                                 list_del
> >                                 unlock
> >       lock
> >       # !list_empty might not hold anymore
> >       list_del_init
> >       unlock
> 
> I don't think this particular race is possible. Both parties take page
> lock before messing with deferred queue, but anytway:
> 
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

I am confused, if the above race is not possible then what would be a
real race? We really do not want to have a patch with a misleading
changelog, do we?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
  2020-01-14  9:31   ` Michal Hocko
@ 2020-01-14 10:31     ` Kirill A. Shutemov
  2020-01-14 10:59       ` Kirill A. Shutemov
  0 siblings, 1 reply; 16+ messages in thread
From: Kirill A. Shutemov @ 2020-01-14 10:31 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Wei Yang, hannes, vdavydov.dev, akpm, cgroups, linux-mm,
	linux-kernel, kirill.shutemov, yang.shi, alexander.duyck,
	rientjes

On Tue, Jan 14, 2020 at 10:31:22AM +0100, Michal Hocko wrote:
> On Sat 11-01-20 03:03:52, Kirill A. Shutemov wrote:
> > On Thu, Jan 09, 2020 at 10:30:54PM +0800, Wei Yang wrote:
> > > As all the other places, we grab the lock before manipulate the defer list.
> > > Current implementation may face a race condition.
> > > 
> > > For example, the potential race would be:
> > > 
> > >     CPU1                      CPU2
> > >     mem_cgroup_move_account   split_huge_page_to_list
> > >       !list_empty
> > >                                 lock
> > >                                 !list_empty
> > >                                 list_del
> > >                                 unlock
> > >       lock
> > >       # !list_empty might not hold anymore
> > >       list_del_init
> > >       unlock
> > 
> > I don't think this particular race is possible. Both parties take page
> > lock before messing with deferred queue, but anytway:
> > 
> > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> 
> I am confused, if the above race is not possible then what would be a
> real race? We really do not want to have a patch with a misleading
> changelog, do we?

The alternative is to make sure that all page_deferred_list() called with
page lock taken.

I'll look into it.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
  2020-01-14 10:31     ` Kirill A. Shutemov
@ 2020-01-14 10:59       ` Kirill A. Shutemov
  2020-01-14 20:57         ` David Rientjes
  2020-01-15  1:07         ` Wei Yang
  0 siblings, 2 replies; 16+ messages in thread
From: Kirill A. Shutemov @ 2020-01-14 10:59 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Wei Yang, hannes, vdavydov.dev, akpm, cgroups, linux-mm,
	linux-kernel, kirill.shutemov, yang.shi, alexander.duyck,
	rientjes

On Tue, Jan 14, 2020 at 01:31:12PM +0300, Kirill A. Shutemov wrote:
> On Tue, Jan 14, 2020 at 10:31:22AM +0100, Michal Hocko wrote:
> > On Sat 11-01-20 03:03:52, Kirill A. Shutemov wrote:
> > > On Thu, Jan 09, 2020 at 10:30:54PM +0800, Wei Yang wrote:
> > > > As all the other places, we grab the lock before manipulate the defer list.
> > > > Current implementation may face a race condition.
> > > > 
> > > > For example, the potential race would be:
> > > > 
> > > >     CPU1                      CPU2
> > > >     mem_cgroup_move_account   split_huge_page_to_list
> > > >       !list_empty
> > > >                                 lock
> > > >                                 !list_empty
> > > >                                 list_del
> > > >                                 unlock
> > > >       lock
> > > >       # !list_empty might not hold anymore
> > > >       list_del_init
> > > >       unlock
> > > 
> > > I don't think this particular race is possible. Both parties take page
> > > lock before messing with deferred queue, but anytway:
> > > 
> > > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > 
> > I am confused, if the above race is not possible then what would be a
> > real race? We really do not want to have a patch with a misleading
> > changelog, do we?
> 
> The alternative is to make sure that all page_deferred_list() called with
> page lock taken.
> 
> I'll look into it.

split_huge_page_to_list() has page lock taken.

free_transhuge_page() is in the free path and doesn't susceptible to the
race.

deferred_split_scan() is trickier. list_move() should be safe against
list_empty() as it will not produce false-positive list_empty().
list_del_init() *should* (correct me if I'm wrong) be safe because the page
is freeing and memcg will not touch the page anymore.

deferred_split_huge_page() is a problematic one. It called from
page_remove_rmap() path witch does require page lock. I don't see any
obvious way to exclude race with mem_cgroup_move_account() here.
Anybody else?

Wei, could you rewrite the commit message with deferred_split_huge_page()
as a race source instead of split_huge_page_to_list()?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
  2020-01-14 10:59       ` Kirill A. Shutemov
@ 2020-01-14 20:57         ` David Rientjes
  2020-01-15  1:19           ` Wei Yang
  2020-01-15  1:07         ` Wei Yang
  1 sibling, 1 reply; 16+ messages in thread
From: David Rientjes @ 2020-01-14 20:57 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Michal Hocko, Wei Yang, hannes, vdavydov.dev, akpm, cgroups,
	linux-mm, linux-kernel, kirill.shutemov, yang.shi,
	alexander.duyck

On Tue, 14 Jan 2020, Kirill A. Shutemov wrote:

> split_huge_page_to_list() has page lock taken.
> 
> free_transhuge_page() is in the free path and doesn't susceptible to the
> race.
> 
> deferred_split_scan() is trickier. list_move() should be safe against
> list_empty() as it will not produce false-positive list_empty().
> list_del_init() *should* (correct me if I'm wrong) be safe because the page
> is freeing and memcg will not touch the page anymore.
> 
> deferred_split_huge_page() is a problematic one. It called from
> page_remove_rmap() path witch does require page lock. I don't see any
> obvious way to exclude race with mem_cgroup_move_account() here.
> Anybody else?
> 
> Wei, could you rewrite the commit message with deferred_split_huge_page()
> as a race source instead of split_huge_page_to_list()?
> 

I think describing the race in terms of deferred_split_huge_page() makes 
the most sense and I'd prefer a cc to stable for 5.4+.  Even getting the 
split_queue_len, which is unsigned long, to underflow because of a 
list_empty(page_deferred_list()) check that is no longer accurate after 
the lock is taken would be a significant issue for shrinkers.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
  2020-01-14 10:59       ` Kirill A. Shutemov
  2020-01-14 20:57         ` David Rientjes
@ 2020-01-15  1:07         ` Wei Yang
  2020-01-15  3:26           ` David Rientjes
  1 sibling, 1 reply; 16+ messages in thread
From: Wei Yang @ 2020-01-15  1:07 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Michal Hocko, Wei Yang, hannes, vdavydov.dev, akpm, cgroups,
	linux-mm, linux-kernel, kirill.shutemov, yang.shi,
	alexander.duyck, rientjes

On Tue, Jan 14, 2020 at 01:59:21PM +0300, Kirill A. Shutemov wrote:
>On Tue, Jan 14, 2020 at 01:31:12PM +0300, Kirill A. Shutemov wrote:
>> On Tue, Jan 14, 2020 at 10:31:22AM +0100, Michal Hocko wrote:
>> > On Sat 11-01-20 03:03:52, Kirill A. Shutemov wrote:
>> > > On Thu, Jan 09, 2020 at 10:30:54PM +0800, Wei Yang wrote:
>> > > > As all the other places, we grab the lock before manipulate the defer list.
>> > > > Current implementation may face a race condition.
>> > > > 
>> > > > For example, the potential race would be:
>> > > > 
>> > > >     CPU1                      CPU2
>> > > >     mem_cgroup_move_account   split_huge_page_to_list
>> > > >       !list_empty
>> > > >                                 lock
>> > > >                                 !list_empty
>> > > >                                 list_del
>> > > >                                 unlock
>> > > >       lock
>> > > >       # !list_empty might not hold anymore
>> > > >       list_del_init
>> > > >       unlock
>> > > 
>> > > I don't think this particular race is possible. Both parties take page
>> > > lock before messing with deferred queue, but anytway:
>> > > 
>> > > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> > 
>> > I am confused, if the above race is not possible then what would be a
>> > real race? We really do not want to have a patch with a misleading
>> > changelog, do we?
>> 
>> The alternative is to make sure that all page_deferred_list() called with
>> page lock taken.
>> 
>> I'll look into it.
>
>split_huge_page_to_list() has page lock taken.
>
>free_transhuge_page() is in the free path and doesn't susceptible to the
>race.
>
>deferred_split_scan() is trickier. list_move() should be safe against
>list_empty() as it will not produce false-positive list_empty().
>list_del_init() *should* (correct me if I'm wrong) be safe because the page
>is freeing and memcg will not touch the page anymore.
>
>deferred_split_huge_page() is a problematic one. It called from
>page_remove_rmap() path witch does require page lock. I don't see any
>obvious way to exclude race with mem_cgroup_move_account() here.
>Anybody else?

If my understanding is correct, the reason is deferred_split_huge_page()
doesn't has page lock taken, right?

>
>Wei, could you rewrite the commit message with deferred_split_huge_page()
>as a race source instead of split_huge_page_to_list()?
>
>-- 
> Kirill A. Shutemov

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
  2020-01-14 20:57         ` David Rientjes
@ 2020-01-15  1:19           ` Wei Yang
  0 siblings, 0 replies; 16+ messages in thread
From: Wei Yang @ 2020-01-15  1:19 UTC (permalink / raw)
  To: David Rientjes
  Cc: Kirill A. Shutemov, Michal Hocko, Wei Yang, hannes, vdavydov.dev,
	akpm, cgroups, linux-mm, linux-kernel, kirill.shutemov, yang.shi,
	alexander.duyck

On Tue, Jan 14, 2020 at 12:57:22PM -0800, David Rientjes wrote:
>On Tue, 14 Jan 2020, Kirill A. Shutemov wrote:
>
>> split_huge_page_to_list() has page lock taken.
>> 
>> free_transhuge_page() is in the free path and doesn't susceptible to the
>> race.
>> 
>> deferred_split_scan() is trickier. list_move() should be safe against
>> list_empty() as it will not produce false-positive list_empty().
>> list_del_init() *should* (correct me if I'm wrong) be safe because the page
>> is freeing and memcg will not touch the page anymore.
>> 
>> deferred_split_huge_page() is a problematic one. It called from
>> page_remove_rmap() path witch does require page lock. I don't see any
>> obvious way to exclude race with mem_cgroup_move_account() here.
>> Anybody else?
>> 
>> Wei, could you rewrite the commit message with deferred_split_huge_page()
>> as a race source instead of split_huge_page_to_list()?
>> 
>
>I think describing the race in terms of deferred_split_huge_page() makes 
>the most sense and I'd prefer a cc to stable for 5.4+.  Even getting the 
>split_queue_len, which is unsigned long, to underflow because of a 
>list_empty(page_deferred_list()) check that is no longer accurate after 
>the lock is taken would be a significant issue for shrinkers.

Oh, you are right. Even the list is not corrupted between
deferred_split_scan() and mem_cgroup_move_account(), split_queue_len would be
in a wrong state.

Hmm... to some extend, the page lock complicates the picture a little here,
even it helps in some cases.

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
  2020-01-15  1:07         ` Wei Yang
@ 2020-01-15  3:26           ` David Rientjes
  0 siblings, 0 replies; 16+ messages in thread
From: David Rientjes @ 2020-01-15  3:26 UTC (permalink / raw)
  To: Wei Yang
  Cc: Kirill A. Shutemov, Michal Hocko, hannes, vdavydov.dev, akpm,
	cgroups, linux-mm, linux-kernel, kirill.shutemov, yang.shi,
	alexander.duyck

On Wed, 15 Jan 2020, Wei Yang wrote:

> >split_huge_page_to_list() has page lock taken.
> >
> >free_transhuge_page() is in the free path and doesn't susceptible to the
> >race.
> >
> >deferred_split_scan() is trickier. list_move() should be safe against
> >list_empty() as it will not produce false-positive list_empty().
> >list_del_init() *should* (correct me if I'm wrong) be safe because the page
> >is freeing and memcg will not touch the page anymore.
> >
> >deferred_split_huge_page() is a problematic one. It called from
> >page_remove_rmap() path witch does require page lock. I don't see any
> >obvious way to exclude race with mem_cgroup_move_account() here.
> >Anybody else?
> 
> If my understanding is correct, the reason is deferred_split_huge_page()
> doesn't has page lock taken, right?
> 

I think the fix that you have proposed has inspired some deeper looks at 
the locking around the deferred split queue and the hope was that perhaps 
this could be protected by the page lock but it was found that at least in 
one path that isn't taken.  So I believe your fix is still needed and any 
possible optimizations in this area can be proposed on top.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-01-15  3:26 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-09 14:30 [Patch v2] mm: thp: grab the lock before manipulation defer list Wei Yang
2020-01-09 18:52 ` David Rientjes
2020-01-09 18:58 ` Michal Hocko
2020-01-11  0:03 ` Kirill A. Shutemov
2020-01-12  2:28   ` Wei Yang
2020-01-12 22:57     ` Kirill A. Shutemov
2020-01-13  0:44       ` Wei Yang
2020-01-13  7:36         ` Kirill A. Shutemov
2020-01-13  8:23           ` Wei Yang
2020-01-14  9:31   ` Michal Hocko
2020-01-14 10:31     ` Kirill A. Shutemov
2020-01-14 10:59       ` Kirill A. Shutemov
2020-01-14 20:57         ` David Rientjes
2020-01-15  1:19           ` Wei Yang
2020-01-15  1:07         ` Wei Yang
2020-01-15  3:26           ` David Rientjes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).