linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Waiman Long <longman@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Muchun Song <songmuchun@bytedance.com>
Subject: Re: [PATCH v2] mm/list_lru: Fix possible race in memcg_reparent_list_lru_node()
Date: Wed, 30 Mar 2022 19:48:45 -0700	[thread overview]
Message-ID: <89B53D3A-FCC5-4107-8D49-81D5B9AE5172@linux.dev> (raw)
In-Reply-To: <20220330191440.1cc1b2de2b849d1ba93d2ba7@linux-foundation.org>



> On Mar 30, 2022, at 7:14 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> On Wed, 30 Mar 2022 13:26:46 -0400 Waiman Long <longman@redhat.com> wrote:
> 
>> Muchun Song found out there could be a race between list_lru_add()
>> and memcg_reparent_list_lru_node() causing the later function to miss
>> reparenting of a lru entry as shown below:
>> 
>> CPU0:                           CPU1:
>> list_lru_add()
>>    spin_lock(&nlru->lock)
>>    l = list_lru_from_kmem(memcg)
>>                                memcg_reparent_objcgs(memcg)
>>                                memcg_reparent_list_lrus(memcg)
>>                                    memcg_reparent_list_lru()
>>                                        memcg_reparent_list_lru_node()
>>                                            if (!READ_ONCE(nlru->nr_items))
>>                                                // Miss reparenting
>>                                                return
>>    // Assume 0->1
>>    l->nr_items++
>>    // Assume 0->1
>>    nlru->nr_items++
>> 
>> Though it is not likely that a list_lru_node that has 0 item suddenly
>> has a newly added lru entry at the end of its life. The race is still
>> theoretically possible.
>> 
>> With the lock/unlock pair used within the percpu_ref_kill() which is
>> the last function call of memcg_reparent_objcgs(), any read issued
>> in memcg_reparent_list_lru_node() will not be reordered before the
>> reparenting of objcgs.
>> 
>> Adding a !spin_is_locked()/smp_rmb()/!READ_ONCE(nlru->nr_items) check
>> to ensure that either the reading of nr_items is valid or the racing
>> list_lru_add() will see the reparented objcg.
>> 
>> ...
>> 
>> --- a/mm/list_lru.c
>> +++ b/mm/list_lru.c
>> @@ -395,10 +395,33 @@ static void memcg_reparent_list_lru_node(struct list_lru *lru, int nid,
>>    struct list_lru_one *src, *dst;
>> 
>>    /*
>> -     * If there is no lru entry in this nlru, we can skip it immediately.
>> +     * With the lock/unlock pair used within the percpu_ref_kill()
>> +     * which is the last function call of memcg_reparent_objcgs(), any
>> +     * read issued here will not be reordered before the reparenting
>> +     * of objcgs.
>> +     *
>> +     * Assuming a racing list_lru_add():
>> +     * list_lru_add()
>> +     *                <- memcg_reparent_list_lru_node()
>> +     *   spin_lock(&nlru->lock)
>> +     *   l = list_lru_from_kmem(memcg)
>> +     *   nlru->nr_items++
>> +     *   spin_unlock(&nlru->lock)
>> +     *                <- memcg_reparent_list_lru_node()
>> +     *
>> +     * The !spin_is_locked(&nlru->lock) check is true means it is
>> +     * either before the spin_lock() or after the spin_unlock(). In the
>> +     * former case, list_lru_add() will see the reparented objcg and so
>> +     * won't touch the lru to be reparented. In the later case, it will
>> +     * see the updated nr_items. So we can use the optimization that if
>> +     * there is no lru entry in this nlru, skip it immediately.
>>     */
>> -    if (!READ_ONCE(nlru->nr_items))
>> -        return;
>> +    if (!spin_is_locked(&nlru->lock)) {
> 
> ick.
> 
>> +        /* nr_items read must be ordered after nlru->lock */
>> +        smp_rmb();
>> +        if (!READ_ONCE(nlru->nr_items))
>> +            return;
>> +    }
> 
> include/linux/spinlock_up.h has
> 
> #define arch_spin_is_locked(lock)    ((void)(lock), 0)
> 
> so this `if' will always be true on CONFIG_SMP=n.  Will the kernel
> still work?

I guess yes, because this race is not possible on a !smp machine.

> 
> At the very least let's have changelogging and commenting explaining
> that we've actually thought about this.
> 
> Preferably, can we fix this hole properly and avoid this hack?  There is
> a reason for this:
> 
> hp2:/usr/src/25> grep spin_is_locked mm/*.c
> hp2:/usr/src/25> 


But honestly, I’d drop the original optimization together with the fix, if only there is no _real world_ data on the problem and the improvement. It seems like it has started as a nice simple improvement, but the race makes it complex and probably not worth the added complexity and fragility.

Thanks!

  reply	other threads:[~2022-03-31  2:48 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-30 17:26 [PATCH v2] mm/list_lru: Fix possible race in memcg_reparent_list_lru_node() Waiman Long
2022-03-30 19:46 ` Roman Gushchin
2022-03-31  2:14 ` Andrew Morton
2022-03-31  2:48   ` Roman Gushchin [this message]
2022-03-31  6:39     ` Shakeel Butt
2022-03-31  7:46       ` Michal Hocko
2022-04-01  1:11         ` Andrew Morton
2022-04-01  4:23           ` Shakeel Butt
2022-04-01  4:30           ` Muchun Song
2022-04-01 11:28           ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=89B53D3A-FCC5-4107-8D49-81D5B9AE5172@linux.dev \
    --to=roman.gushchin@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=longman@redhat.com \
    --cc=songmuchun@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).