linux-mm.kvack.org archive mirror
* [PATCH v2] mm/list_lru: Fix possible race in memcg_reparent_list_lru_node()
@ 2022-03-30 17:26 Waiman Long
  2022-03-30 19:46 ` Roman Gushchin
  2022-03-31  2:14 ` Andrew Morton
  0 siblings, 2 replies; 10+ messages in thread
From: Waiman Long @ 2022-03-30 17:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Muchun Song, Roman Gushchin, Waiman Long

Muchun Song found that there could be a race between list_lru_add()
and memcg_reparent_list_lru_node(), causing the latter function to miss
reparenting of an lru entry, as shown below:

CPU0:                           CPU1:
list_lru_add()
    spin_lock(&nlru->lock)
    l = list_lru_from_kmem(memcg)
                                memcg_reparent_objcgs(memcg)
                                memcg_reparent_list_lrus(memcg)
                                    memcg_reparent_list_lru()
                                        memcg_reparent_list_lru_node()
                                            if (!READ_ONCE(nlru->nr_items))
                                                // Miss reparenting
                                                return
    // Assume 0->1
    l->nr_items++
    // Assume 0->1
    nlru->nr_items++

Though it is unlikely that a list_lru_node with 0 items will suddenly
gain a newly added lru entry at the end of its life, the race is still
theoretically possible.

With the lock/unlock pair used within the percpu_ref_kill() which is
the last function call of memcg_reparent_objcgs(), any read issued
in memcg_reparent_list_lru_node() will not be reordered before the
reparenting of objcgs.

Add a !spin_is_locked()/smp_rmb()/!READ_ONCE(nlru->nr_items) check
to ensure that either the reading of nr_items is valid or the racing
list_lru_add() will see the reparented objcg.

Fixes: 405cc51fc104 ("mm/list_lru: optimize memcg_reparent_list_lru_node()")
Reported-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/list_lru.c | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/mm/list_lru.c b/mm/list_lru.c
index c669d87001a6..08ff54ffabd6 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -395,10 +395,33 @@ static void memcg_reparent_list_lru_node(struct list_lru *lru, int nid,
 	struct list_lru_one *src, *dst;
 
 	/*
-	 * If there is no lru entry in this nlru, we can skip it immediately.
+	 * With the lock/unlock pair used within the percpu_ref_kill()
+	 * which is the last function call of memcg_reparent_objcgs(), any
+	 * read issued here will not be reordered before the reparenting
+	 * of objcgs.
+	 *
+	 * Assuming a racing list_lru_add():
+	 * list_lru_add()
+	 *				<- memcg_reparent_list_lru_node()
+	 *   spin_lock(&nlru->lock)
+	 *   l = list_lru_from_kmem(memcg)
+	 *   nlru->nr_items++
+	 *   spin_unlock(&nlru->lock)
+	 *				<- memcg_reparent_list_lru_node()
+	 *
+	 * If the !spin_is_locked(&nlru->lock) check is true, we are
+	 * either before the spin_lock() or after the spin_unlock(). In the
+	 * former case, list_lru_add() will see the reparented objcg and so
+	 * won't touch the lru to be reparented. In the latter case, it will
+	 * see the updated nr_items. So we can use the optimization that if
+	 * there is no lru entry in this nlru, skip it immediately.
 	 */
-	if (!READ_ONCE(nlru->nr_items))
-		return;
+	if (!spin_is_locked(&nlru->lock)) {
+		/* nr_items read must be ordered after nlru->lock */
+		smp_rmb();
+		if (!READ_ONCE(nlru->nr_items))
+			return;
+	}
 
 	/*
 	 * Since list_lru_{add,del} may be called under an IRQ-safe lock,
@@ -407,7 +430,7 @@ static void memcg_reparent_list_lru_node(struct list_lru *lru, int nid,
 	spin_lock_irq(&nlru->lock);
 
 	src = list_lru_from_memcg_idx(lru, nid, src_idx);
-	if (!src)
+	if (!src || !src->nr_items)
 		goto out;
 	dst = list_lru_from_memcg_idx(lru, nid, dst_idx);
 
-- 
2.27.0




* Re: [PATCH v2] mm/list_lru: Fix possible race in memcg_reparent_list_lru_node()
  2022-03-30 17:26 [PATCH v2] mm/list_lru: Fix possible race in memcg_reparent_list_lru_node() Waiman Long
@ 2022-03-30 19:46 ` Roman Gushchin
  2022-03-31  2:14 ` Andrew Morton
  1 sibling, 0 replies; 10+ messages in thread
From: Roman Gushchin @ 2022-03-30 19:46 UTC (permalink / raw)
  To: Waiman Long; +Cc: Andrew Morton, linux-mm, linux-kernel, Muchun Song

On Wed, Mar 30, 2022 at 01:26:46PM -0400, Waiman Long wrote:
> Muchun Song found out there could be a race between list_lru_add()
> and memcg_reparent_list_lru_node() causing the later function to miss
> reparenting of a lru entry as shown below:
> 
> CPU0:                           CPU1:
> list_lru_add()
>     spin_lock(&nlru->lock)
>     l = list_lru_from_kmem(memcg)
>                                 memcg_reparent_objcgs(memcg)
>                                 memcg_reparent_list_lrus(memcg)
>                                     memcg_reparent_list_lru()
>                                         memcg_reparent_list_lru_node()
>                                             if (!READ_ONCE(nlru->nr_items))
>                                                 // Miss reparenting
>                                                 return
>     // Assume 0->1
>     l->nr_items++
>     // Assume 0->1
>     nlru->nr_items++
> 
> Though it is not likely that a list_lru_node that has 0 item suddenly
> has a newly added lru entry at the end of its life. The race is still
> theoretically possible.
> 
> With the lock/unlock pair used within the percpu_ref_kill() which is
> the last function call of memcg_reparent_objcgs(), any read issued
> in memcg_reparent_list_lru_node() will not be reordered before the
> reparenting of objcgs.
> 
> Adding a !spin_is_locked()/smp_rmb()/!READ_ONCE(nlru->nr_items) check
> to ensure that either the reading of nr_items is valid or the racing
> list_lru_add() will see the reparented objcg.
> 
> Fixes: 405cc51fc104 ("mm/list_lru: optimize memcg_reparent_list_lru_node()")
> Reported-by: Muchun Song <songmuchun@bytedance.com>
> Signed-off-by: Waiman Long <longman@redhat.com>

Acked-by: Roman Gushchin <roman.gushchin@linux.dev>



* Re: [PATCH v2] mm/list_lru: Fix possible race in memcg_reparent_list_lru_node()
  2022-03-30 17:26 [PATCH v2] mm/list_lru: Fix possible race in memcg_reparent_list_lru_node() Waiman Long
  2022-03-30 19:46 ` Roman Gushchin
@ 2022-03-31  2:14 ` Andrew Morton
  2022-03-31  2:48   ` Roman Gushchin
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2022-03-31  2:14 UTC (permalink / raw)
  To: Waiman Long; +Cc: linux-mm, linux-kernel, Muchun Song, Roman Gushchin

On Wed, 30 Mar 2022 13:26:46 -0400 Waiman Long <longman@redhat.com> wrote:

> Muchun Song found out there could be a race between list_lru_add()
> and memcg_reparent_list_lru_node() causing the later function to miss
> reparenting of a lru entry as shown below:
> 
> CPU0:                           CPU1:
> list_lru_add()
>     spin_lock(&nlru->lock)
>     l = list_lru_from_kmem(memcg)
>                                 memcg_reparent_objcgs(memcg)
>                                 memcg_reparent_list_lrus(memcg)
>                                     memcg_reparent_list_lru()
>                                         memcg_reparent_list_lru_node()
>                                             if (!READ_ONCE(nlru->nr_items))
>                                                 // Miss reparenting
>                                                 return
>     // Assume 0->1
>     l->nr_items++
>     // Assume 0->1
>     nlru->nr_items++
> 
> Though it is not likely that a list_lru_node that has 0 item suddenly
> has a newly added lru entry at the end of its life. The race is still
> theoretically possible.
> 
> With the lock/unlock pair used within the percpu_ref_kill() which is
> the last function call of memcg_reparent_objcgs(), any read issued
> in memcg_reparent_list_lru_node() will not be reordered before the
> reparenting of objcgs.
> 
> Adding a !spin_is_locked()/smp_rmb()/!READ_ONCE(nlru->nr_items) check
> to ensure that either the reading of nr_items is valid or the racing
> list_lru_add() will see the reparented objcg.
> 
> ...
>
> --- a/mm/list_lru.c
> +++ b/mm/list_lru.c
> @@ -395,10 +395,33 @@ static void memcg_reparent_list_lru_node(struct list_lru *lru, int nid,
>  	struct list_lru_one *src, *dst;
>  
>  	/*
> -	 * If there is no lru entry in this nlru, we can skip it immediately.
> +	 * With the lock/unlock pair used within the percpu_ref_kill()
> +	 * which is the last function call of memcg_reparent_objcgs(), any
> +	 * read issued here will not be reordered before the reparenting
> +	 * of objcgs.
> +	 *
> +	 * Assuming a racing list_lru_add():
> +	 * list_lru_add()
> +	 *				<- memcg_reparent_list_lru_node()
> +	 *   spin_lock(&nlru->lock)
> +	 *   l = list_lru_from_kmem(memcg)
> +	 *   nlru->nr_items++
> +	 *   spin_unlock(&nlru->lock)
> +	 *				<- memcg_reparent_list_lru_node()
> +	 *
> +	 * The !spin_is_locked(&nlru->lock) check is true means it is
> +	 * either before the spin_lock() or after the spin_unlock(). In the
> +	 * former case, list_lru_add() will see the reparented objcg and so
> +	 * won't touch the lru to be reparented. In the later case, it will
> +	 * see the updated nr_items. So we can use the optimization that if
> +	 * there is no lru entry in this nlru, skip it immediately.
>  	 */
> -	if (!READ_ONCE(nlru->nr_items))
> -		return;
> +	if (!spin_is_locked(&nlru->lock)) {

ick.

> +		/* nr_items read must be ordered after nlru->lock */
> +		smp_rmb();
> +		if (!READ_ONCE(nlru->nr_items))
> +			return;
> +	}

include/linux/spinlock_up.h has

#define arch_spin_is_locked(lock)	((void)(lock), 0)

so this `if' will always be true on CONFIG_SMP=n.  Will the kernel
still work?

At the very least let's have changelogging and commenting explaining
that we've actually thought about this.

Preferably, can we fix this hole properly and avoid this hack?  There is
a reason for this:

hp2:/usr/src/25> grep spin_is_locked mm/*.c
hp2:/usr/src/25> 





* Re: [PATCH v2] mm/list_lru: Fix possible race in memcg_reparent_list_lru_node()
  2022-03-31  2:14 ` Andrew Morton
@ 2022-03-31  2:48   ` Roman Gushchin
  2022-03-31  6:39     ` Shakeel Butt
  0 siblings, 1 reply; 10+ messages in thread
From: Roman Gushchin @ 2022-03-31  2:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Waiman Long, linux-mm, linux-kernel, Muchun Song



> On Mar 30, 2022, at 7:14 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> On Wed, 30 Mar 2022 13:26:46 -0400 Waiman Long <longman@redhat.com> wrote:
> 
>> Muchun Song found out there could be a race between list_lru_add()
>> and memcg_reparent_list_lru_node() causing the later function to miss
>> reparenting of a lru entry as shown below:
>> 
>> CPU0:                           CPU1:
>> list_lru_add()
>>    spin_lock(&nlru->lock)
>>    l = list_lru_from_kmem(memcg)
>>                                memcg_reparent_objcgs(memcg)
>>                                memcg_reparent_list_lrus(memcg)
>>                                    memcg_reparent_list_lru()
>>                                        memcg_reparent_list_lru_node()
>>                                            if (!READ_ONCE(nlru->nr_items))
>>                                                // Miss reparenting
>>                                                return
>>    // Assume 0->1
>>    l->nr_items++
>>    // Assume 0->1
>>    nlru->nr_items++
>> 
>> Though it is not likely that a list_lru_node that has 0 item suddenly
>> has a newly added lru entry at the end of its life. The race is still
>> theoretically possible.
>> 
>> With the lock/unlock pair used within the percpu_ref_kill() which is
>> the last function call of memcg_reparent_objcgs(), any read issued
>> in memcg_reparent_list_lru_node() will not be reordered before the
>> reparenting of objcgs.
>> 
>> Adding a !spin_is_locked()/smp_rmb()/!READ_ONCE(nlru->nr_items) check
>> to ensure that either the reading of nr_items is valid or the racing
>> list_lru_add() will see the reparented objcg.
>> 
>> ...
>> 
>> --- a/mm/list_lru.c
>> +++ b/mm/list_lru.c
>> @@ -395,10 +395,33 @@ static void memcg_reparent_list_lru_node(struct list_lru *lru, int nid,
>>    struct list_lru_one *src, *dst;
>> 
>>    /*
>> -     * If there is no lru entry in this nlru, we can skip it immediately.
>> +     * With the lock/unlock pair used within the percpu_ref_kill()
>> +     * which is the last function call of memcg_reparent_objcgs(), any
>> +     * read issued here will not be reordered before the reparenting
>> +     * of objcgs.
>> +     *
>> +     * Assuming a racing list_lru_add():
>> +     * list_lru_add()
>> +     *                <- memcg_reparent_list_lru_node()
>> +     *   spin_lock(&nlru->lock)
>> +     *   l = list_lru_from_kmem(memcg)
>> +     *   nlru->nr_items++
>> +     *   spin_unlock(&nlru->lock)
>> +     *                <- memcg_reparent_list_lru_node()
>> +     *
>> +     * The !spin_is_locked(&nlru->lock) check is true means it is
>> +     * either before the spin_lock() or after the spin_unlock(). In the
>> +     * former case, list_lru_add() will see the reparented objcg and so
>> +     * won't touch the lru to be reparented. In the later case, it will
>> +     * see the updated nr_items. So we can use the optimization that if
>> +     * there is no lru entry in this nlru, skip it immediately.
>>     */
>> -    if (!READ_ONCE(nlru->nr_items))
>> -        return;
>> +    if (!spin_is_locked(&nlru->lock)) {
> 
> ick.
> 
>> +        /* nr_items read must be ordered after nlru->lock */
>> +        smp_rmb();
>> +        if (!READ_ONCE(nlru->nr_items))
>> +            return;
>> +    }
> 
> include/linux/spinlock_up.h has
> 
> #define arch_spin_is_locked(lock)    ((void)(lock), 0)
> 
> so this `if' will always be true on CONFIG_SMP=n.  Will the kernel
> still work?

I guess yes, because this race is not possible on a !smp machine.

> 
> At the very least let's have changelogging and commenting explaining
> that we've actually thought about this.
> 
> Preferably, can we fix this hole properly and avoid this hack?  There is
> a reason for this:
> 
> hp2:/usr/src/25> grep spin_is_locked mm/*.c
> hp2:/usr/src/25> 


But honestly, I’d drop the original optimization together with the fix, especially since there is no _real world_ data on the problem and the improvement. It seems like it started as a nice simple improvement, but the race makes it complex and probably not worth the added complexity and fragility.

Thanks!


* Re: [PATCH v2] mm/list_lru: Fix possible race in memcg_reparent_list_lru_node()
  2022-03-31  2:48   ` Roman Gushchin
@ 2022-03-31  6:39     ` Shakeel Butt
  2022-03-31  7:46       ` Michal Hocko
  0 siblings, 1 reply; 10+ messages in thread
From: Shakeel Butt @ 2022-03-31  6:39 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, Waiman Long, linux-mm, linux-kernel, Muchun Song

On Wed, Mar 30, 2022 at 07:48:45PM -0700, Roman Gushchin wrote:
> 
> 
[...]
> 
> 
> But honestly, I’d drop the original optimization together with the fix, if only there is no _real world_ data on the problem and the improvement. It seems like it has started as a nice simple improvement, but the race makes it complex and probably not worth the added complexity and fragility.

I agree with dropping the original optimization as it is not really
fixing an observed issue which may justify adding some complexity.

thanks,
Shakeel



* Re: [PATCH v2] mm/list_lru: Fix possible race in memcg_reparent_list_lru_node()
  2022-03-31  6:39     ` Shakeel Butt
@ 2022-03-31  7:46       ` Michal Hocko
  2022-04-01  1:11         ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Michal Hocko @ 2022-03-31  7:46 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Roman Gushchin, Andrew Morton, Waiman Long, linux-mm,
	linux-kernel, Muchun Song

On Thu 31-03-22 06:39:56, Shakeel Butt wrote:
> On Wed, Mar 30, 2022 at 07:48:45PM -0700, Roman Gushchin wrote:
> > 
> > 
> [...]
> > 
> > 
> > But honestly, I’d drop the original optimization together with
> > the fix, if only there is no _real world_ data on the problem and
> > the improvement. It seems like it has started as a nice simple
> > improvement, but the race makes it complex and probably not worth
> > the added complexity and fragility.
> 
> I agree with dropping the original optimization as it is not really
> fixing an observed issue which may justify adding some complexity.

Completely agreed. The patch as proposed is not really acceptable
IMHO, and I have to say I am worried that this is not the first time we
are in a situation where follow-up fixes or unrelated patches grow in
complexity to fit on top of performance optimizations which do not
refer to any actual numbers.
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v2] mm/list_lru: Fix possible race in memcg_reparent_list_lru_node()
  2022-03-31  7:46       ` Michal Hocko
@ 2022-04-01  1:11         ` Andrew Morton
  2022-04-01  4:23           ` Shakeel Butt
                             ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Andrew Morton @ 2022-04-01  1:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Shakeel Butt, Roman Gushchin, Waiman Long, linux-mm,
	linux-kernel, Muchun Song

On Thu, 31 Mar 2022 09:46:52 +0200 Michal Hocko <mhocko@suse.com> wrote:

> On Thu 31-03-22 06:39:56, Shakeel Butt wrote:
> > On Wed, Mar 30, 2022 at 07:48:45PM -0700, Roman Gushchin wrote:
> > > 
> > > 
> > [...]
> > > 
> > > 
> > > But honestly, I’d drop the original optimization together with
> > > the fix, if only there is no _real world_ data on the problem and
> > > the improvement. It seems like it has started as a nice simple
> > > improvement, but the race makes it complex and probably not worth
> > > the added complexity and fragility.
> > 
> > I agree with dropping the original optimization as it is not really
> > fixing an observed issue which may justify adding some complexity.
> 
> Completely agreed. The patch as it is proposed is not really acceptable
> IMHO and I have to say I am worried that this is not the first time we
> are in a situation when a follow up fixes or unrelated patches are
> growing in complexity to fit on top of a performance optimizations which
> do not refer to any actual numbers.

Yup.  I did this:

From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm/list_lru.c: revert "mm/list_lru: optimize memcg_reparent_list_lru_node()"

405cc51fc1049c73 ("mm/list_lru: optimize memcg_reparent_list_lru_node()")
has subtle races which are proving ugly to fix.  Revert the original
optimization.  If quantitative testing indicates that we have a
significant problem here then other implementations can be looked at.

Fixes: 405cc51fc1049c73 ("mm/list_lru: optimize memcg_reparent_list_lru_node()")
Cc: Waiman Long <longman@redhat.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/list_lru.c |    6 ------
 1 file changed, 6 deletions(-)

--- a/mm/list_lru.c~revert-1
+++ a/mm/list_lru.c
@@ -395,12 +395,6 @@ static void memcg_reparent_list_lru_node
 	struct list_lru_one *src, *dst;
 
 	/*
-	 * If there is no lru entry in this nlru, we can skip it immediately.
-	 */
-	if (!READ_ONCE(nlru->nr_items))
-		return;
-
-	/*
 	 * Since list_lru_{add,del} may be called under an IRQ-safe lock,
 	 * we have to use IRQ-safe primitives here to avoid deadlock.
 	 */
_




* Re: [PATCH v2] mm/list_lru: Fix possible race in memcg_reparent_list_lru_node()
  2022-04-01  1:11         ` Andrew Morton
@ 2022-04-01  4:23           ` Shakeel Butt
  2022-04-01  4:30           ` Muchun Song
  2022-04-01 11:28           ` Michal Hocko
  2 siblings, 0 replies; 10+ messages in thread
From: Shakeel Butt @ 2022-04-01  4:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Roman Gushchin, Waiman Long, Linux MM, LKML, Muchun Song

On Thu, Mar 31, 2022 at 6:11 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 31 Mar 2022 09:46:52 +0200 Michal Hocko <mhocko@suse.com> wrote:
>
> > On Thu 31-03-22 06:39:56, Shakeel Butt wrote:
> > > On Wed, Mar 30, 2022 at 07:48:45PM -0700, Roman Gushchin wrote:
> > > >
> > > >
> > > [...]
> > > >
> > > >
> > > > But honestly, I’d drop the original optimization together with
> > > > the fix, if only there is no _real world_ data on the problem and
> > > > the improvement. It seems like it has started as a nice simple
> > > > improvement, but the race makes it complex and probably not worth
> > > > the added complexity and fragility.
> > >
> > > I agree with dropping the original optimization as it is not really
> > > fixing an observed issue which may justify adding some complexity.
> >
> > Completely agreed. The patch as it is proposed is not really acceptable
> > IMHO and I have to say I am worried that this is not the first time we
> > are in a situation when a follow up fixes or unrelated patches are
> > growing in complexity to fit on top of a performance optimizations which
> > do not refer to any actual numbers.
>
> Yup.  I did this:
>
> From: Andrew Morton <akpm@linux-foundation.org>
> Subject: mm/list_lru.c: revert "mm/list_lru: optimize memcg_reparent_list_lru_node()"
>
> 405cc51fc1049c73 ("mm/list_lru: optimize memcg_reparent_list_lru_node()")
> has subtle races which are proving ugly to fix.  Revert the original
> optimization.  If quantitative testing indicates that we have a
> significant problem here then other implementations can be looked at.
>
> Fixes: 405cc51fc1049c73 ("mm/list_lru: optimize memcg_reparent_list_lru_node()")
> Cc: Waiman Long <longman@redhat.com>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Muchun Song <songmuchun@bytedance.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Shakeel Butt <shakeelb@google.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Acked-by: Shakeel Butt <shakeelb@google.com>



* Re: [PATCH v2] mm/list_lru: Fix possible race in memcg_reparent_list_lru_node()
  2022-04-01  1:11         ` Andrew Morton
  2022-04-01  4:23           ` Shakeel Butt
@ 2022-04-01  4:30           ` Muchun Song
  2022-04-01 11:28           ` Michal Hocko
  2 siblings, 0 replies; 10+ messages in thread
From: Muchun Song @ 2022-04-01  4:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Shakeel Butt, Roman Gushchin, Waiman Long,
	Linux Memory Management List, LKML

On Fri, Apr 1, 2022 at 9:11 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 31 Mar 2022 09:46:52 +0200 Michal Hocko <mhocko@suse.com> wrote:
>
> > On Thu 31-03-22 06:39:56, Shakeel Butt wrote:
> > > On Wed, Mar 30, 2022 at 07:48:45PM -0700, Roman Gushchin wrote:
> > > >
> > > >
> > > [...]
> > > >
> > > >
> > > > But honestly, I’d drop the original optimization together with
> > > > the fix, if only there is no _real world_ data on the problem and
> > > > the improvement. It seems like it has started as a nice simple
> > > > improvement, but the race makes it complex and probably not worth
> > > > the added complexity and fragility.
> > >
> > > I agree with dropping the original optimization as it is not really
> > > fixing an observed issue which may justify adding some complexity.
> >
> > Completely agreed. The patch as it is proposed is not really acceptable
> > IMHO and I have to say I am worried that this is not the first time we
> > are in a situation when a follow up fixes or unrelated patches are
> > growing in complexity to fit on top of a performance optimizations which
> > do not refer to any actual numbers.
>
> Yup.  I did this:
>
> From: Andrew Morton <akpm@linux-foundation.org>
> Subject: mm/list_lru.c: revert "mm/list_lru: optimize memcg_reparent_list_lru_node()"
>
> 405cc51fc1049c73 ("mm/list_lru: optimize memcg_reparent_list_lru_node()")
> has subtle races which are proving ugly to fix.  Revert the original
> optimization.  If quantitative testing indicates that we have a
> significant problem here then other implementations can be looked at.
>
> Fixes: 405cc51fc1049c73 ("mm/list_lru: optimize memcg_reparent_list_lru_node()")
> Cc: Waiman Long <longman@redhat.com>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Muchun Song <songmuchun@bytedance.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Shakeel Butt <shakeelb@google.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Reviewed-by: Muchun Song <songmuchun@bytedance.com>



* Re: [PATCH v2] mm/list_lru: Fix possible race in memcg_reparent_list_lru_node()
  2022-04-01  1:11         ` Andrew Morton
  2022-04-01  4:23           ` Shakeel Butt
  2022-04-01  4:30           ` Muchun Song
@ 2022-04-01 11:28           ` Michal Hocko
  2 siblings, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2022-04-01 11:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Shakeel Butt, Roman Gushchin, Waiman Long, linux-mm,
	linux-kernel, Muchun Song

On Thu 31-03-22 18:11:26, Andrew Morton wrote:
[...]
> Yup.  I did this:
> 
> From: Andrew Morton <akpm@linux-foundation.org>
> Subject: mm/list_lru.c: revert "mm/list_lru: optimize memcg_reparent_list_lru_node()"
> 
> 405cc51fc1049c73 ("mm/list_lru: optimize memcg_reparent_list_lru_node()")
> has subtle races which are proving ugly to fix.  Revert the original
> optimization.  If quantitative testing indicates that we have a
> significant problem here then other implementations can be looked at.
> 
> Fixes: 405cc51fc1049c73 ("mm/list_lru: optimize memcg_reparent_list_lru_node()")
> Cc: Waiman Long <longman@redhat.com>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Muchun Song <songmuchun@bytedance.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Shakeel Butt <shakeelb@google.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
> 
>  mm/list_lru.c |    6 ------
>  1 file changed, 6 deletions(-)
> 
> --- a/mm/list_lru.c~revert-1
> +++ a/mm/list_lru.c
> @@ -395,12 +395,6 @@ static void memcg_reparent_list_lru_node
>  	struct list_lru_one *src, *dst;
>  
>  	/*
> -	 * If there is no lru entry in this nlru, we can skip it immediately.
> -	 */
> -	if (!READ_ONCE(nlru->nr_items))
> -		return;
> -
> -	/*
>  	 * Since list_lru_{add,del} may be called under an IRQ-safe lock,
>  	 * we have to use IRQ-safe primitives here to avoid deadlock.
>  	 */
> _

-- 
Michal Hocko
SUSE Labs


