* [Patch v2] mm: thp: grab the lock before manipulation defer list
@ 2020-01-09 14:30 Wei Yang
2020-01-09 18:52 ` David Rientjes
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Wei Yang @ 2020-01-09 14:30 UTC (permalink / raw)
To: hannes, mhocko, vdavydov.dev, akpm
Cc: cgroups, linux-mm, linux-kernel, kirill.shutemov, yang.shi,
alexander.duyck, rientjes, Wei Yang
As all the other places, we grab the lock before manipulate the defer list.
Current implementation may face a race condition.
For example, the potential race would be:
CPU1 CPU2
mem_cgroup_move_account split_huge_page_to_list
!list_empty
lock
!list_empty
list_del
unlock
lock
# !list_empty might not hold anymore
list_del_init
unlock
When this sequence happens, the list_del_init() in
mem_cgroup_move_account() would crash if CONFIG_DEBUG_LIST since the
page is already been removed by list_del in split_huge_page_to_list().
Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Acked-by: David Rientjes <rientjes@google.com>
---
v2:
* move check on compound outside suggested by Alexander
* an example of the race condition, suggested by Michal
---
mm/memcontrol.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index bc01423277c5..1492eefe4f3c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5368,10 +5368,12 @@ static int mem_cgroup_move_account(struct page *page,
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- if (compound && !list_empty(page_deferred_list(page))) {
+ if (compound) {
spin_lock(&from->deferred_split_queue.split_queue_lock);
- list_del_init(page_deferred_list(page));
- from->deferred_split_queue.split_queue_len--;
+ if (!list_empty(page_deferred_list(page))) {
+ list_del_init(page_deferred_list(page));
+ from->deferred_split_queue.split_queue_len--;
+ }
spin_unlock(&from->deferred_split_queue.split_queue_lock);
}
#endif
@@ -5385,11 +5387,13 @@ static int mem_cgroup_move_account(struct page *page,
page->mem_cgroup = to;
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- if (compound && list_empty(page_deferred_list(page))) {
+ if (compound) {
spin_lock(&to->deferred_split_queue.split_queue_lock);
- list_add_tail(page_deferred_list(page),
- &to->deferred_split_queue.split_queue);
- to->deferred_split_queue.split_queue_len++;
+ if (list_empty(page_deferred_list(page))) {
+ list_add_tail(page_deferred_list(page),
+ &to->deferred_split_queue.split_queue);
+ to->deferred_split_queue.split_queue_len++;
+ }
spin_unlock(&to->deferred_split_queue.split_queue_lock);
}
#endif
--
2.17.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
2020-01-09 14:30 [Patch v2] mm: thp: grab the lock before manipulation defer list Wei Yang
@ 2020-01-09 18:52 ` David Rientjes
2020-01-09 18:58 ` Michal Hocko
2020-01-11 0:03 ` Kirill A. Shutemov
2 siblings, 0 replies; 16+ messages in thread
From: David Rientjes @ 2020-01-09 18:52 UTC (permalink / raw)
To: Wei Yang
Cc: hannes, mhocko, vdavydov.dev, akpm, cgroups, linux-mm,
linux-kernel, kirill.shutemov, yang.shi, alexander.duyck
On Thu, 9 Jan 2020, Wei Yang wrote:
> As all the other places, we grab the lock before manipulate the defer list.
> Current implementation may face a race condition.
>
> For example, the potential race would be:
>
> CPU1 CPU2
> mem_cgroup_move_account split_huge_page_to_list
> !list_empty
> lock
> !list_empty
> list_del
> unlock
> lock
> # !list_empty might not hold anymore
> list_del_init
> unlock
>
> When this sequence happens, the list_del_init() in
> mem_cgroup_move_account() would crash if CONFIG_DEBUG_LIST since the
> page is already been removed by list_del in split_huge_page_to_list().
>
> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
>
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> Acked-by: David Rientjes <rientjes@google.com>
Thanks Wei!
Andrew, I'd also suggest:
Cc: stable@vger.kernel.org # 5.4+
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
2020-01-09 14:30 [Patch v2] mm: thp: grab the lock before manipulation defer list Wei Yang
2020-01-09 18:52 ` David Rientjes
@ 2020-01-09 18:58 ` Michal Hocko
2020-01-11 0:03 ` Kirill A. Shutemov
2 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2020-01-09 18:58 UTC (permalink / raw)
To: Wei Yang
Cc: hannes, vdavydov.dev, akpm, cgroups, linux-mm, linux-kernel,
kirill.shutemov, yang.shi, alexander.duyck, rientjes
On Thu 09-01-20 22:30:54, Wei Yang wrote:
> As all the other places, we grab the lock before manipulate the defer list.
> Current implementation may face a race condition.
>
> For example, the potential race would be:
>
> CPU1 CPU2
> mem_cgroup_move_account split_huge_page_to_list
> !list_empty
> lock
> !list_empty
> list_del
> unlock
> lock
> # !list_empty might not hold anymore
> list_del_init
> unlock
>
> When this sequence happens, the list_del_init() in
> mem_cgroup_move_account() would crash if CONFIG_DEBUG_LIST since the
> page is already been removed by list_del in split_huge_page_to_list().
>
> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
>
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> Acked-by: David Rientjes <rientjes@google.com>
Thanks a lot for the changelog approvements!
Acked-by: Michal Hocko <mhocko@suse.com>
>
> ---
> v2:
> * move check on compound outside suggested by Alexander
> * an example of the race condition, suggested by Michal
> ---
> mm/memcontrol.c | 18 +++++++++++-------
> 1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index bc01423277c5..1492eefe4f3c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5368,10 +5368,12 @@ static int mem_cgroup_move_account(struct page *page,
> }
>
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> - if (compound && !list_empty(page_deferred_list(page))) {
> + if (compound) {
> spin_lock(&from->deferred_split_queue.split_queue_lock);
> - list_del_init(page_deferred_list(page));
> - from->deferred_split_queue.split_queue_len--;
> + if (!list_empty(page_deferred_list(page))) {
> + list_del_init(page_deferred_list(page));
> + from->deferred_split_queue.split_queue_len--;
> + }
> spin_unlock(&from->deferred_split_queue.split_queue_lock);
> }
> #endif
> @@ -5385,11 +5387,13 @@ static int mem_cgroup_move_account(struct page *page,
> page->mem_cgroup = to;
>
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> - if (compound && list_empty(page_deferred_list(page))) {
> + if (compound) {
> spin_lock(&to->deferred_split_queue.split_queue_lock);
> - list_add_tail(page_deferred_list(page),
> - &to->deferred_split_queue.split_queue);
> - to->deferred_split_queue.split_queue_len++;
> + if (list_empty(page_deferred_list(page))) {
> + list_add_tail(page_deferred_list(page),
> + &to->deferred_split_queue.split_queue);
> + to->deferred_split_queue.split_queue_len++;
> + }
> spin_unlock(&to->deferred_split_queue.split_queue_lock);
> }
> #endif
> --
> 2.17.1
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
2020-01-09 14:30 [Patch v2] mm: thp: grab the lock before manipulation defer list Wei Yang
2020-01-09 18:52 ` David Rientjes
2020-01-09 18:58 ` Michal Hocko
@ 2020-01-11 0:03 ` Kirill A. Shutemov
2020-01-12 2:28 ` Wei Yang
2020-01-14 9:31 ` Michal Hocko
2 siblings, 2 replies; 16+ messages in thread
From: Kirill A. Shutemov @ 2020-01-11 0:03 UTC (permalink / raw)
To: Wei Yang
Cc: hannes, mhocko, vdavydov.dev, akpm, cgroups, linux-mm,
linux-kernel, kirill.shutemov, yang.shi, alexander.duyck,
rientjes
On Thu, Jan 09, 2020 at 10:30:54PM +0800, Wei Yang wrote:
> As all the other places, we grab the lock before manipulate the defer list.
> Current implementation may face a race condition.
>
> For example, the potential race would be:
>
> CPU1 CPU2
> mem_cgroup_move_account split_huge_page_to_list
> !list_empty
> lock
> !list_empty
> list_del
> unlock
> lock
> # !list_empty might not hold anymore
> list_del_init
> unlock
I don't think this particular race is possible. Both parties take page
lock before messing with deferred queue, but anytway:
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>
> When this sequence happens, the list_del_init() in
> mem_cgroup_move_account() would crash if CONFIG_DEBUG_LIST since the
> page is already been removed by list_del in split_huge_page_to_list().
>
> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
>
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> Acked-by: David Rientjes <rientjes@google.com>
>
> ---
> v2:
> * move check on compound outside suggested by Alexander
> * an example of the race condition, suggested by Michal
> ---
> mm/memcontrol.c | 18 +++++++++++-------
> 1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index bc01423277c5..1492eefe4f3c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5368,10 +5368,12 @@ static int mem_cgroup_move_account(struct page *page,
> }
>
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> - if (compound && !list_empty(page_deferred_list(page))) {
> + if (compound) {
> spin_lock(&from->deferred_split_queue.split_queue_lock);
> - list_del_init(page_deferred_list(page));
> - from->deferred_split_queue.split_queue_len--;
> + if (!list_empty(page_deferred_list(page))) {
> + list_del_init(page_deferred_list(page));
> + from->deferred_split_queue.split_queue_len--;
> + }
> spin_unlock(&from->deferred_split_queue.split_queue_lock);
> }
> #endif
> @@ -5385,11 +5387,13 @@ static int mem_cgroup_move_account(struct page *page,
> page->mem_cgroup = to;
>
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> - if (compound && list_empty(page_deferred_list(page))) {
> + if (compound) {
> spin_lock(&to->deferred_split_queue.split_queue_lock);
> - list_add_tail(page_deferred_list(page),
> - &to->deferred_split_queue.split_queue);
> - to->deferred_split_queue.split_queue_len++;
> + if (list_empty(page_deferred_list(page))) {
> + list_add_tail(page_deferred_list(page),
> + &to->deferred_split_queue.split_queue);
> + to->deferred_split_queue.split_queue_len++;
> + }
> spin_unlock(&to->deferred_split_queue.split_queue_lock);
> }
> #endif
> --
> 2.17.1
>
>
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
2020-01-11 0:03 ` Kirill A. Shutemov
@ 2020-01-12 2:28 ` Wei Yang
2020-01-12 22:57 ` Kirill A. Shutemov
2020-01-14 9:31 ` Michal Hocko
1 sibling, 1 reply; 16+ messages in thread
From: Wei Yang @ 2020-01-12 2:28 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Wei Yang, hannes, mhocko, vdavydov.dev, akpm, cgroups, linux-mm,
linux-kernel, kirill.shutemov, yang.shi, alexander.duyck,
rientjes
On Sat, Jan 11, 2020 at 03:03:52AM +0300, Kirill A. Shutemov wrote:
>On Thu, Jan 09, 2020 at 10:30:54PM +0800, Wei Yang wrote:
>> As all the other places, we grab the lock before manipulate the defer list.
>> Current implementation may face a race condition.
>>
>> For example, the potential race would be:
>>
>> CPU1 CPU2
>> mem_cgroup_move_account split_huge_page_to_list
>> !list_empty
>> lock
>> !list_empty
>> list_del
>> unlock
>> lock
>> # !list_empty might not hold anymore
>> list_del_init
>> unlock
>
>I don't think this particular race is possible. Both parties take page
>lock before messing with deferred queue, but anytway:
I am afraid not. Page lock is per page, while defer queue is per pgdate or
memcg.
It is possible two page in the same pgdate or memcg grab page lock
respectively and then access the same defer queue concurrently.
>
>Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>
>>
>> When this sequence happens, the list_del_init() in
>> mem_cgroup_move_account() would crash if CONFIG_DEBUG_LIST since the
>> page is already been removed by list_del in split_huge_page_to_list().
>>
>> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
>>
>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> Acked-by: David Rientjes <rientjes@google.com>
>>
>> ---
>> v2:
>> * move check on compound outside suggested by Alexander
>> * an example of the race condition, suggested by Michal
>> ---
>> mm/memcontrol.c | 18 +++++++++++-------
>> 1 file changed, 11 insertions(+), 7 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index bc01423277c5..1492eefe4f3c 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -5368,10 +5368,12 @@ static int mem_cgroup_move_account(struct page *page,
>> }
>>
>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> - if (compound && !list_empty(page_deferred_list(page))) {
>> + if (compound) {
>> spin_lock(&from->deferred_split_queue.split_queue_lock);
>> - list_del_init(page_deferred_list(page));
>> - from->deferred_split_queue.split_queue_len--;
>> + if (!list_empty(page_deferred_list(page))) {
>> + list_del_init(page_deferred_list(page));
>> + from->deferred_split_queue.split_queue_len--;
>> + }
>> spin_unlock(&from->deferred_split_queue.split_queue_lock);
>> }
>> #endif
>> @@ -5385,11 +5387,13 @@ static int mem_cgroup_move_account(struct page *page,
>> page->mem_cgroup = to;
>>
>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> - if (compound && list_empty(page_deferred_list(page))) {
>> + if (compound) {
>> spin_lock(&to->deferred_split_queue.split_queue_lock);
>> - list_add_tail(page_deferred_list(page),
>> - &to->deferred_split_queue.split_queue);
>> - to->deferred_split_queue.split_queue_len++;
>> + if (list_empty(page_deferred_list(page))) {
>> + list_add_tail(page_deferred_list(page),
>> + &to->deferred_split_queue.split_queue);
>> + to->deferred_split_queue.split_queue_len++;
>> + }
>> spin_unlock(&to->deferred_split_queue.split_queue_lock);
>> }
>> #endif
>> --
>> 2.17.1
>>
>>
>
>--
> Kirill A. Shutemov
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
2020-01-12 2:28 ` Wei Yang
@ 2020-01-12 22:57 ` Kirill A. Shutemov
2020-01-13 0:44 ` Wei Yang
0 siblings, 1 reply; 16+ messages in thread
From: Kirill A. Shutemov @ 2020-01-12 22:57 UTC (permalink / raw)
To: Wei Yang
Cc: hannes, mhocko, vdavydov.dev, akpm, cgroups, linux-mm,
linux-kernel, kirill.shutemov, yang.shi, alexander.duyck,
rientjes
On Sun, Jan 12, 2020 at 10:28:58AM +0800, Wei Yang wrote:
> On Sat, Jan 11, 2020 at 03:03:52AM +0300, Kirill A. Shutemov wrote:
> >On Thu, Jan 09, 2020 at 10:30:54PM +0800, Wei Yang wrote:
> >> As all the other places, we grab the lock before manipulate the defer list.
> >> Current implementation may face a race condition.
> >>
> >> For example, the potential race would be:
> >>
> >> CPU1 CPU2
> >> mem_cgroup_move_account split_huge_page_to_list
> >> !list_empty
> >> lock
> >> !list_empty
> >> list_del
> >> unlock
> >> lock
> >> # !list_empty might not hold anymore
> >> list_del_init
> >> unlock
> >
> >I don't think this particular race is possible. Both parties take page
> >lock before messing with deferred queue, but anytway:
>
> I am afraid not. Page lock is per page, while defer queue is per pgdate or
> memcg.
>
> It is possible two page in the same pgdate or memcg grab page lock
> respectively and then access the same defer queue concurrently.
Look closer on the list_empty() argument. It's list_head local to the
page. Too different pages can be handled in parallel without any problem
in this particular scenario. As long as we as we modify it under the lock.
Said that, page lock here was somewhat accidential and I still belive we
need to move the check under the lock anyway.
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
2020-01-12 22:57 ` Kirill A. Shutemov
@ 2020-01-13 0:44 ` Wei Yang
2020-01-13 7:36 ` Kirill A. Shutemov
0 siblings, 1 reply; 16+ messages in thread
From: Wei Yang @ 2020-01-13 0:44 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Wei Yang, hannes, mhocko, vdavydov.dev, akpm, cgroups, linux-mm,
linux-kernel, kirill.shutemov, yang.shi, alexander.duyck,
rientjes
On Mon, Jan 13, 2020 at 01:57:18AM +0300, Kirill A. Shutemov wrote:
>On Sun, Jan 12, 2020 at 10:28:58AM +0800, Wei Yang wrote:
>> On Sat, Jan 11, 2020 at 03:03:52AM +0300, Kirill A. Shutemov wrote:
>> >On Thu, Jan 09, 2020 at 10:30:54PM +0800, Wei Yang wrote:
>> >> As all the other places, we grab the lock before manipulate the defer list.
>> >> Current implementation may face a race condition.
>> >>
>> >> For example, the potential race would be:
>> >>
>> >> CPU1 CPU2
>> >> mem_cgroup_move_account split_huge_page_to_list
>> >> !list_empty
>> >> lock
>> >> !list_empty
>> >> list_del
>> >> unlock
>> >> lock
>> >> # !list_empty might not hold anymore
>> >> list_del_init
>> >> unlock
>> >
>> >I don't think this particular race is possible. Both parties take page
>> >lock before messing with deferred queue, but anytway:
>>
>> I am afraid not. Page lock is per page, while defer queue is per pgdate or
>> memcg.
>>
>> It is possible two page in the same pgdate or memcg grab page lock
>> respectively and then access the same defer queue concurrently.
>
>Look closer on the list_empty() argument. It's list_head local to the
>page. Too different pages can be handled in parallel without any problem
>in this particular scenario. As long as we as we modify it under the lock.
>
>Said that, page lock here was somewhat accidential and I still belive we
>need to move the check under the lock anyway.
>
If my understanding is correct, you agree with my statement?
>--
> Kirill A. Shutemov
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
2020-01-13 0:44 ` Wei Yang
@ 2020-01-13 7:36 ` Kirill A. Shutemov
2020-01-13 8:23 ` Wei Yang
0 siblings, 1 reply; 16+ messages in thread
From: Kirill A. Shutemov @ 2020-01-13 7:36 UTC (permalink / raw)
To: Wei Yang
Cc: hannes, mhocko, vdavydov.dev, akpm, cgroups, linux-mm,
linux-kernel, kirill.shutemov, yang.shi, alexander.duyck,
rientjes
On Mon, Jan 13, 2020 at 08:44:57AM +0800, Wei Yang wrote:
> >> It is possible two page in the same pgdate or memcg grab page lock
> >> respectively and then access the same defer queue concurrently.
>
> If my understanding is correct, you agree with my statement?
Which one? If the one above then no. list_empty only accesses list_head
for the struct page, not the queue.
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
2020-01-13 7:36 ` Kirill A. Shutemov
@ 2020-01-13 8:23 ` Wei Yang
0 siblings, 0 replies; 16+ messages in thread
From: Wei Yang @ 2020-01-13 8:23 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Wei Yang, hannes, mhocko, vdavydov.dev, akpm, cgroups, linux-mm,
linux-kernel, kirill.shutemov, yang.shi, alexander.duyck,
rientjes
On Mon, Jan 13, 2020 at 10:36:14AM +0300, Kirill A. Shutemov wrote:
>On Mon, Jan 13, 2020 at 08:44:57AM +0800, Wei Yang wrote:
>> >> It is possible two page in the same pgdate or memcg grab page lock
>> >> respectively and then access the same defer queue concurrently.
>>
>> If my understanding is correct, you agree with my statement?
>
>Which one? If the one above then no. list_empty only accesses list_head
>for the struct page, not the queue.
>
Ah, I get your point.
>--
> Kirill A. Shutemov
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
2020-01-11 0:03 ` Kirill A. Shutemov
2020-01-12 2:28 ` Wei Yang
@ 2020-01-14 9:31 ` Michal Hocko
2020-01-14 10:31 ` Kirill A. Shutemov
1 sibling, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2020-01-14 9:31 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Wei Yang, hannes, vdavydov.dev, akpm, cgroups, linux-mm,
linux-kernel, kirill.shutemov, yang.shi, alexander.duyck,
rientjes
On Sat 11-01-20 03:03:52, Kirill A. Shutemov wrote:
> On Thu, Jan 09, 2020 at 10:30:54PM +0800, Wei Yang wrote:
> > As all the other places, we grab the lock before manipulate the defer list.
> > Current implementation may face a race condition.
> >
> > For example, the potential race would be:
> >
> > CPU1 CPU2
> > mem_cgroup_move_account split_huge_page_to_list
> > !list_empty
> > lock
> > !list_empty
> > list_del
> > unlock
> > lock
> > # !list_empty might not hold anymore
> > list_del_init
> > unlock
>
> I don't think this particular race is possible. Both parties take page
> lock before messing with deferred queue, but anytway:
>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
I am confused, if the above race is not possible then what would be a
real race? We really do not want to have a patch with a misleading
changelog, do we?
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
2020-01-14 9:31 ` Michal Hocko
@ 2020-01-14 10:31 ` Kirill A. Shutemov
2020-01-14 10:59 ` Kirill A. Shutemov
0 siblings, 1 reply; 16+ messages in thread
From: Kirill A. Shutemov @ 2020-01-14 10:31 UTC (permalink / raw)
To: Michal Hocko
Cc: Wei Yang, hannes, vdavydov.dev, akpm, cgroups, linux-mm,
linux-kernel, kirill.shutemov, yang.shi, alexander.duyck,
rientjes
On Tue, Jan 14, 2020 at 10:31:22AM +0100, Michal Hocko wrote:
> On Sat 11-01-20 03:03:52, Kirill A. Shutemov wrote:
> > On Thu, Jan 09, 2020 at 10:30:54PM +0800, Wei Yang wrote:
> > > As all the other places, we grab the lock before manipulate the defer list.
> > > Current implementation may face a race condition.
> > >
> > > For example, the potential race would be:
> > >
> > > CPU1 CPU2
> > > mem_cgroup_move_account split_huge_page_to_list
> > > !list_empty
> > > lock
> > > !list_empty
> > > list_del
> > > unlock
> > > lock
> > > # !list_empty might not hold anymore
> > > list_del_init
> > > unlock
> >
> > I don't think this particular race is possible. Both parties take page
> > lock before messing with deferred queue, but anytway:
> >
> > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>
> I am confused, if the above race is not possible then what would be a
> real race? We really do not want to have a patch with a misleading
> changelog, do we?
The alternative is to make sure that all page_deferred_list() called with
page lock taken.
I'll look into it.
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
2020-01-14 10:31 ` Kirill A. Shutemov
@ 2020-01-14 10:59 ` Kirill A. Shutemov
2020-01-14 20:57 ` David Rientjes
2020-01-15 1:07 ` Wei Yang
0 siblings, 2 replies; 16+ messages in thread
From: Kirill A. Shutemov @ 2020-01-14 10:59 UTC (permalink / raw)
To: Michal Hocko
Cc: Wei Yang, hannes, vdavydov.dev, akpm, cgroups, linux-mm,
linux-kernel, kirill.shutemov, yang.shi, alexander.duyck,
rientjes
On Tue, Jan 14, 2020 at 01:31:12PM +0300, Kirill A. Shutemov wrote:
> On Tue, Jan 14, 2020 at 10:31:22AM +0100, Michal Hocko wrote:
> > On Sat 11-01-20 03:03:52, Kirill A. Shutemov wrote:
> > > On Thu, Jan 09, 2020 at 10:30:54PM +0800, Wei Yang wrote:
> > > > As all the other places, we grab the lock before manipulate the defer list.
> > > > Current implementation may face a race condition.
> > > >
> > > > For example, the potential race would be:
> > > >
> > > > CPU1 CPU2
> > > > mem_cgroup_move_account split_huge_page_to_list
> > > > !list_empty
> > > > lock
> > > > !list_empty
> > > > list_del
> > > > unlock
> > > > lock
> > > > # !list_empty might not hold anymore
> > > > list_del_init
> > > > unlock
> > >
> > > I don't think this particular race is possible. Both parties take page
> > > lock before messing with deferred queue, but anytway:
> > >
> > > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> >
> > I am confused, if the above race is not possible then what would be a
> > real race? We really do not want to have a patch with a misleading
> > changelog, do we?
>
> The alternative is to make sure that all page_deferred_list() called with
> page lock taken.
>
> I'll look into it.
split_huge_page_to_list() has page lock taken.
free_transhuge_page() is in the free path and doesn't susceptible to the
race.
deferred_split_scan() is trickier. list_move() should be safe against
list_empty() as it will not produce false-positive list_empty().
list_del_init() *should* (correct me if I'm wrong) be safe because the page
is freeing and memcg will not touch the page anymore.
deferred_split_huge_page() is a problematic one. It called from
page_remove_rmap() path witch does require page lock. I don't see any
obvious way to exclude race with mem_cgroup_move_account() here.
Anybody else?
Wei, could you rewrite the commit message with deferred_split_huge_page()
as a race source instead of split_huge_page_to_list()?
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
2020-01-14 10:59 ` Kirill A. Shutemov
@ 2020-01-14 20:57 ` David Rientjes
2020-01-15 1:19 ` Wei Yang
2020-01-15 1:07 ` Wei Yang
1 sibling, 1 reply; 16+ messages in thread
From: David Rientjes @ 2020-01-14 20:57 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Michal Hocko, Wei Yang, hannes, vdavydov.dev, akpm, cgroups,
linux-mm, linux-kernel, kirill.shutemov, yang.shi,
alexander.duyck
On Tue, 14 Jan 2020, Kirill A. Shutemov wrote:
> split_huge_page_to_list() has page lock taken.
>
> free_transhuge_page() is in the free path and doesn't susceptible to the
> race.
>
> deferred_split_scan() is trickier. list_move() should be safe against
> list_empty() as it will not produce false-positive list_empty().
> list_del_init() *should* (correct me if I'm wrong) be safe because the page
> is freeing and memcg will not touch the page anymore.
>
> deferred_split_huge_page() is a problematic one. It called from
> page_remove_rmap() path witch does require page lock. I don't see any
> obvious way to exclude race with mem_cgroup_move_account() here.
> Anybody else?
>
> Wei, could you rewrite the commit message with deferred_split_huge_page()
> as a race source instead of split_huge_page_to_list()?
>
I think describing the race in terms of deferred_split_huge_page() makes
the most sense and I'd prefer a cc to stable for 5.4+. Even getting the
split_queue_len, which is unsigned long, to underflow because of a
list_empty(page_deferred_list()) check that is no longer accurate after
the lock is taken would be a significant issue for shrinkers.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
2020-01-14 10:59 ` Kirill A. Shutemov
2020-01-14 20:57 ` David Rientjes
@ 2020-01-15 1:07 ` Wei Yang
2020-01-15 3:26 ` David Rientjes
1 sibling, 1 reply; 16+ messages in thread
From: Wei Yang @ 2020-01-15 1:07 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Michal Hocko, Wei Yang, hannes, vdavydov.dev, akpm, cgroups,
linux-mm, linux-kernel, kirill.shutemov, yang.shi,
alexander.duyck, rientjes
On Tue, Jan 14, 2020 at 01:59:21PM +0300, Kirill A. Shutemov wrote:
>On Tue, Jan 14, 2020 at 01:31:12PM +0300, Kirill A. Shutemov wrote:
>> On Tue, Jan 14, 2020 at 10:31:22AM +0100, Michal Hocko wrote:
>> > On Sat 11-01-20 03:03:52, Kirill A. Shutemov wrote:
>> > > On Thu, Jan 09, 2020 at 10:30:54PM +0800, Wei Yang wrote:
>> > > > As all the other places, we grab the lock before manipulate the defer list.
>> > > > Current implementation may face a race condition.
>> > > >
>> > > > For example, the potential race would be:
>> > > >
>> > > > CPU1 CPU2
>> > > > mem_cgroup_move_account split_huge_page_to_list
>> > > > !list_empty
>> > > > lock
>> > > > !list_empty
>> > > > list_del
>> > > > unlock
>> > > > lock
>> > > > # !list_empty might not hold anymore
>> > > > list_del_init
>> > > > unlock
>> > >
>> > > I don't think this particular race is possible. Both parties take page
>> > > lock before messing with deferred queue, but anytway:
>> > >
>> > > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> >
>> > I am confused, if the above race is not possible then what would be a
>> > real race? We really do not want to have a patch with a misleading
>> > changelog, do we?
>>
>> The alternative is to make sure that all page_deferred_list() called with
>> page lock taken.
>>
>> I'll look into it.
>
>split_huge_page_to_list() has page lock taken.
>
>free_transhuge_page() is in the free path and doesn't susceptible to the
>race.
>
>deferred_split_scan() is trickier. list_move() should be safe against
>list_empty() as it will not produce false-positive list_empty().
>list_del_init() *should* (correct me if I'm wrong) be safe because the page
>is freeing and memcg will not touch the page anymore.
>
>deferred_split_huge_page() is a problematic one. It called from
>page_remove_rmap() path witch does require page lock. I don't see any
>obvious way to exclude race with mem_cgroup_move_account() here.
>Anybody else?
If my understanding is correct, the reason is deferred_split_huge_page()
doesn't has page lock taken, right?
>
>Wei, could you rewrite the commit message with deferred_split_huge_page()
>as a race source instead of split_huge_page_to_list()?
>
>--
> Kirill A. Shutemov
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
2020-01-14 20:57 ` David Rientjes
@ 2020-01-15 1:19 ` Wei Yang
0 siblings, 0 replies; 16+ messages in thread
From: Wei Yang @ 2020-01-15 1:19 UTC (permalink / raw)
To: David Rientjes
Cc: Kirill A. Shutemov, Michal Hocko, Wei Yang, hannes, vdavydov.dev,
akpm, cgroups, linux-mm, linux-kernel, kirill.shutemov, yang.shi,
alexander.duyck
On Tue, Jan 14, 2020 at 12:57:22PM -0800, David Rientjes wrote:
>On Tue, 14 Jan 2020, Kirill A. Shutemov wrote:
>
>> split_huge_page_to_list() has page lock taken.
>>
>> free_transhuge_page() is in the free path and doesn't susceptible to the
>> race.
>>
>> deferred_split_scan() is trickier. list_move() should be safe against
>> list_empty() as it will not produce false-positive list_empty().
>> list_del_init() *should* (correct me if I'm wrong) be safe because the page
>> is freeing and memcg will not touch the page anymore.
>>
>> deferred_split_huge_page() is a problematic one. It called from
>> page_remove_rmap() path witch does require page lock. I don't see any
>> obvious way to exclude race with mem_cgroup_move_account() here.
>> Anybody else?
>>
>> Wei, could you rewrite the commit message with deferred_split_huge_page()
>> as a race source instead of split_huge_page_to_list()?
>>
>
>I think describing the race in terms of deferred_split_huge_page() makes
>the most sense and I'd prefer a cc to stable for 5.4+. Even getting the
>split_queue_len, which is unsigned long, to underflow because of a
>list_empty(page_deferred_list()) check that is no longer accurate after
>the lock is taken would be a significant issue for shrinkers.
Oh, you are right. Even the list is not corrupted between
deferred_split_scan() and mem_cgroup_move_account(), split_queue_len would be
in a wrong state.
Hmm... to some extend, the page lock complicates the picture a little here,
even it helps in some cases.
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch v2] mm: thp: grab the lock before manipulation defer list
2020-01-15 1:07 ` Wei Yang
@ 2020-01-15 3:26 ` David Rientjes
0 siblings, 0 replies; 16+ messages in thread
From: David Rientjes @ 2020-01-15 3:26 UTC (permalink / raw)
To: Wei Yang
Cc: Kirill A. Shutemov, Michal Hocko, hannes, vdavydov.dev, akpm,
cgroups, linux-mm, linux-kernel, kirill.shutemov, yang.shi,
alexander.duyck
On Wed, 15 Jan 2020, Wei Yang wrote:
> >split_huge_page_to_list() has page lock taken.
> >
> >free_transhuge_page() is in the free path and doesn't susceptible to the
> >race.
> >
> >deferred_split_scan() is trickier. list_move() should be safe against
> >list_empty() as it will not produce false-positive list_empty().
> >list_del_init() *should* (correct me if I'm wrong) be safe because the page
> >is freeing and memcg will not touch the page anymore.
> >
> >deferred_split_huge_page() is a problematic one. It called from
> >page_remove_rmap() path witch does require page lock. I don't see any
> >obvious way to exclude race with mem_cgroup_move_account() here.
> >Anybody else?
>
> If my understanding is correct, the reason is deferred_split_huge_page()
> doesn't has page lock taken, right?
>
I think the fix that you have proposed has inspired some deeper looks at
the locking around the deferred split queue and the hope was that perhaps
this could be protected by the page lock but it was found that at least in
one path that isn't taken. So I believe your fix is still needed and any
possible optimizations in this area can be proposed on top.
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2020-01-15 3:26 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-09 14:30 [Patch v2] mm: thp: grab the lock before manipulation defer list Wei Yang
2020-01-09 18:52 ` David Rientjes
2020-01-09 18:58 ` Michal Hocko
2020-01-11 0:03 ` Kirill A. Shutemov
2020-01-12 2:28 ` Wei Yang
2020-01-12 22:57 ` Kirill A. Shutemov
2020-01-13 0:44 ` Wei Yang
2020-01-13 7:36 ` Kirill A. Shutemov
2020-01-13 8:23 ` Wei Yang
2020-01-14 9:31 ` Michal Hocko
2020-01-14 10:31 ` Kirill A. Shutemov
2020-01-14 10:59 ` Kirill A. Shutemov
2020-01-14 20:57 ` David Rientjes
2020-01-15 1:19 ` Wei Yang
2020-01-15 1:07 ` Wei Yang
2020-01-15 3:26 ` David Rientjes
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).