Subject: Re: kvm: use-after-free in process_srcu
From: Paolo Bonzini
To: Dmitry Vyukov, Paul McKenney
Cc: Steve Rutherford, syzkaller, Radim Krčmář, KVM list, LKML
Date: Tue, 17 Jan 2017 12:08:30 +0100
References: <754246063.9562871.1484603305281.JavaMail.zimbra@redhat.com>

On 17/01/2017 10:56, Dmitry Vyukov wrote:
>> I am seeing use-after-frees in process_srcu as struct srcu_struct
>> is already freed.  Before freeing struct srcu_struct, the code does
>> cleanup_srcu_struct(&kvm->irq_srcu).  We also tried to do:
>>
>> +       srcu_barrier(&kvm->irq_srcu);
>>         cleanup_srcu_struct(&kvm->irq_srcu);
>>
>> It reduced the rate of use-after-frees, but did not eliminate them
>> completely.  The full thread is here:
>> https://groups.google.com/forum/#!msg/syzkaller/i48YZ8mwePY/0PQ8GkQTBwAJ
>>
>> Does Paolo's fix above make sense to you?  Namely, adding
>> flush_delayed_work(&sp->work) to cleanup_srcu_struct()?
>
> I am not sure about the interaction of flush_delayed_work and
> srcu_reschedule... flush_delayed_work probably assumes that no work
> is queued concurrently, but what if srcu_reschedule queues another
> work item concurrently... can't it happen that flush_delayed_work
> will miss that newly scheduled work?

Newly scheduled callbacks would be a bug in SRCU usage, but my patch
is indeed insufficient.  Because of SRCU's two-phase algorithm, it is
possible that the first flush_delayed_work doesn't invoke all
callbacks.
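For context, the teardown pattern that triggers the report looks
roughly like this (a sketch modeled on the kvm_destroy_vm path in
virt/kvm/kvm_main.c, with unrelated cleanup elided):

    /*
     * Rough sketch of the KVM teardown path.  Both srcu_structs are
     * embedded in struct kvm, so a process_srcu work item that is
     * still queued when the VM is freed dereferences freed memory:
     * the use-after-free that syzkaller reports.
     */
    static void kvm_destroy_vm(struct kvm *kvm)
    {
            /* ... destroy devices, buses, MMU, drop references ... */
            cleanup_srcu_struct(&kvm->irq_srcu);  /* must flush sp->work */
            cleanup_srcu_struct(&kvm->srcu);
            kvm_arch_free_vm(kvm);                /* frees struct kvm itself */
    }

The fix therefore has to make cleanup_srcu_struct wait until no work
is queued at all, so that nothing can outlive the structure once the
caller frees it.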
Instead I would propose this (still untested, but this time with a
commit message):

---------------- 8< --------------
From: Paolo Bonzini
Subject: [PATCH] srcu: wait for all callbacks before deeming SRCU "cleaned up"

Even though there are no concurrent readers, it is possible that the
work item is queued for delayed processing when cleanup_srcu_struct
is called.  The work item needs to be flushed before returning, or a
use-after-free can ensue.

Furthermore, because of SRCU's two-phase algorithm it may take up to
two executions of srcu_advance_batches before all callbacks are
invoked.  This can happen if the first flush_delayed_work runs as
follows:

    srcu_read_lock
                            process_srcu
                              srcu_advance_batches
                                ...
                                if (!try_check_zero(sp, idx^1, trycount))
                                        // there is a reader
                                        return;
                              srcu_invoke_callbacks
                                ...
    srcu_read_unlock
    cleanup_srcu_struct
      flush_delayed_work
                            srcu_reschedule
                              queue_delayed_work

Now flush_delayed_work returns, but srcu_reschedule will *not* have
cleared sp->running to false.

Not-tested-by: Paolo Bonzini
Signed-off-by: Paolo Bonzini

diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c
index 9b9cdd549caa..9470f1ba2ef2 100644
--- a/kernel/rcu/srcu.c
+++ b/kernel/rcu/srcu.c
@@ -283,6 +283,14 @@ void cleanup_srcu_struct(struct srcu_struct *sp)
 {
 	if (WARN_ON(srcu_readers_active(sp)))
 		return; /* Leakage unless caller handles error. */
+
+	/*
+	 * No readers active, so any pending callbacks will rush through
+	 * the two batches before sp->running becomes false.  No risk of
+	 * busy-waiting.
+	 */
+	while (sp->running)
+		flush_delayed_work(&sp->work);
+
 	free_percpu(sp->per_cpu_ref);
 	sp->per_cpu_ref = NULL;
 }

Thanks,

Paolo