Re: [PATCH] locking/osq_lock: fix a data race in osq_wait_next

From: Qian Cai <cai@lca.pw>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Marco Elver <elver@google.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Will Deacon <will@kernel.org>, Ingo Molnar <mingo@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	kasan-dev <kasan-dev@googlegroups.com>
Subject: Re: [PATCH] locking/osq_lock: fix a data race in osq_wait_next
Date: Thu, 30 Jan 2020 22:32:29 -0500
Message-ID: <4A97061E-2152-4734-92C6-F5431C27360B@lca.pw>
In-Reply-To: <20200130134851.GY14914@hirez.programming.kicks-ass.net>

> On Jan 30, 2020, at 8:48 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Thu, Jan 30, 2020 at 02:39:38PM +0100, Marco Elver wrote:
>> On Wed, 29 Jan 2020 at 19:40, Peter Zijlstra <peterz@infradead.org> wrote:
> 
>>> It's probably not terrible to put a READ_ONCE() there; we just need to
>>> make sure the compiler doesn't do something stupid (it is known to do
>>> stupid when 'volatile' is present).
>> 
>> Maybe we need to optimize READ_ONCE().
> 
> I think recent compilers have gotten better at volatile. In part because
> of our complaints.
> 
>> 'if (data_race(..))' would also work here and has no cost.
> 
> Right, that might be the best option.
> 

OK, I’ll send a patch for that.
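
For reference, a rough userspace sketch of the difference between the two
annotations discussed above. This is not the kernel's code: the macro names
READ_ONCE_SKETCH and DATA_RACE_SKETCH, the variable shared_next, and the
helper functions are hypothetical stand-ins. It assumes a build without
KCSAN, where data_race() reduces to the bare expression, and it uses the
GCC/Clang __typeof__ extension.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified stand-in for READ_ONCE(): the volatile access forces one
 * real load that the compiler may not drop, repeat, or fuse.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/*
 * Stand-in for data_race() without KCSAN: just the expression, so the
 * compiler stays free to optimize it. With KCSAN enabled, the real macro
 * also tells the tool that the race is intentional.
 */
#define DATA_RACE_SKETCH(expr) (expr)

static int *shared_next; /* hypothetical pointer written by another CPU */

static int next_is_set_once(void)
{
        return READ_ONCE_SKETCH(shared_next) != NULL;
}

static int next_is_set_racy(void)
{
        return DATA_RACE_SKETCH(shared_next) != NULL;
}

/* Single-threaded check: both forms agree on the value they observe. */
static int annotation_self_check(void)
{
        static int dummy;

        shared_next = NULL;
        assert(!next_is_set_once() && !next_is_set_racy());
        shared_next = &dummy;
        assert(next_is_set_once() && next_is_set_racy());
        return 0;
}
```

Both helpers return the same answer; the only difference is how much
freedom the compiler has when generating the load.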

BTW, I have another one to report. I can’t see how load tearing on the
plain read of prev->next (marked below) would cause any real issue.

[  519.240629] BUG: KCSAN: data-race in osq_lock / osq_unlock

[  519.249088] write (marked) to 0xffff8bb2f133be40 of 8 bytes by task 421 on cpu 38:
[  519.257427]  osq_unlock+0xa8/0x170 kernel/locking/osq_lock.c:219
[  519.261571]  __mutex_lock+0x4b3/0xd20
[  519.265972]  mutex_lock_nested+0x31/0x40
[  519.270639]  memcg_create_kmem_cache+0x2e/0x190
[  519.275922]  memcg_kmem_cache_create_func+0x40/0x80
[  519.281553]  process_one_work+0x54c/0xbe0
[  519.286308]  worker_thread+0x80/0x650
[  519.290715]  kthread+0x1e0/0x200
[  519.294690]  ret_from_fork+0x27/0x50

void osq_unlock(struct optimistic_spin_queue *lock)
{
        struct optimistic_spin_node *node, *next;
        int curr = encode_cpu(smp_processor_id());

        /*
         * Fast path for the uncontended case.
         */
        if (likely(atomic_cmpxchg_release(&lock->tail, curr,
                                          OSQ_UNLOCKED_VAL) == curr))
                return;

        /*
         * Second most likely case.
         */
        node = this_cpu_ptr(&osq_node);
        next = xchg(&node->next, NULL);    <--------------------------
        if (next) {
                WRITE_ONCE(next->locked, 1);
                return;
        }

        next = osq_wait_next(lock, node, NULL);
        if (next)
                WRITE_ONCE(next->locked, 1);
}

[  519.301232] read to 0xffff8bb2f133be40 of 8 bytes by task 196 on cpu 12:
[  519.308705]  osq_lock+0x1e2/0x340 kernel/locking/osq_lock.c:157
[  519.312762]  __mutex_lock+0x277/0xd20
[  519.317167]  mutex_lock_nested+0x31/0x40
[  519.321838]  memcg_create_kmem_cache+0x2e/0x190
[  519.327120]  memcg_kmem_cache_create_func+0x40/0x80
[  519.332751]  process_one_work+0x54c/0xbe0
[  519.337508]  worker_thread+0x80/0x650
[  519.341922]  kthread+0x1e0/0x200
[  519.345889]  ret_from_fork+0x27/0x50

        for (;;) {
                if (prev->next == node &&         <------------------------
                    cmpxchg(&prev->next, node, NULL) == node)
                        break;

                /*
                 * We can only fail the cmpxchg() racing against an unlock(),
                 * in which case we should observe @node->locked becoming
                 * true.
                 */
                if (smp_load_acquire(&node->locked))
                        return true;

                cpu_relax();

                /*
                 * Or we race against a concurrent unqueue()'s step-B, in which
                 * case its step-C will write us a new @node->prev pointer.
                 */
                prev = READ_ONCE(node->prev);
        }
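
The shape of the loop above, a plain read used only as a cheap filter in
front of cmpxchg(), can be sketched in userspace C11 roughly as follows.
This is an illustration under assumptions, not the kernel code: struct
qnode, try_unlink() and unlink_self_check() are hypothetical names, and a
relaxed atomic load stands in for the kernel's plain read. The point is
that a stale or torn value in the pre-check can at worst skip the
compare-exchange or let it fail; the compare-exchange itself re-reads the
pointer atomically.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct qnode {
        _Atomic(struct qnode *) next;
};

/*
 * Try to detach @node from @prev. Returns true only if @prev->next was
 * atomically changed from @node to NULL.
 */
static bool try_unlink(struct qnode *prev, struct qnode *node)
{
        struct qnode *expected = node;

        /* Cheap filter: a wrong answer here only costs a retry. */
        if (atomic_load_explicit(&prev->next, memory_order_relaxed) != node)
                return false;

        /* The authoritative check: succeeds only if next is still @node. */
        return atomic_compare_exchange_strong(&prev->next, &expected,
                                              (struct qnode *)NULL);
}

/* Single-threaded check of the two outcomes. */
static int unlink_self_check(void)
{
        struct qnode a, b;

        atomic_init(&a.next, &b);
        atomic_init(&b.next, (struct qnode *)NULL);

        assert(try_unlink(&a, &b));            /* a.next was &b: unlinked */
        assert(atomic_load(&a.next) == NULL);
        assert(!try_unlink(&a, &b));           /* already gone: filtered  */
        return 0;
}
```

A caller would retry try_unlink() in a loop, as the kernel code does,
re-checking its other exit conditions between attempts.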

[  519.352420] Reported by Kernel Concurrency Sanitizer on:
[  519.358492] CPU: 12 PID: 196 Comm: kworker/12:1 Tainted: G        W    L    5.5.0-next-20200130+ #3
[  519.368317] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
[  519.377627] Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func