From: Dmitry Vyukov
Date: Tue, 17 Jan 2017 10:56:34 +0100
Subject: Re: kvm: use-after-free in process_srcu
To: Paolo Bonzini, Paul McKenney
Cc: Steve Rutherford, syzkaller, Radim Krčmář, KVM list, LKML

On Tue, Jan 17, 2017 at 10:47 AM, Dmitry Vyukov wrote:
> On Mon, Jan 16, 2017 at 10:48 PM, Paolo Bonzini wrote:
>>
>>
>> ----- Original Message -----
>>> From: "Dmitry Vyukov"
>>> To: "Steve Rutherford"
>>> Cc: "syzkaller", "Paolo Bonzini", "Radim Krčmář",
>>>     "KVM list", "LKML"
>>> Sent: Monday, January 16, 2017 10:34:26 PM
>>> Subject: Re: kvm: use-after-free in process_srcu
>>>
>>> On Sun, Jan 15, 2017 at 6:11 PM, Dmitry Vyukov wrote:
>>> > On Fri, Jan 13, 2017 at 10:19 AM, Dmitry Vyukov wrote:
>>> >> On Fri, Jan 13, 2017 at 4:30 AM, Steve Rutherford wrote:
>>> >>> I'm not that familiar with the kernel's workqueues, but this seems
>>> >>> like the classic "callback outlives the memory it references"
>>> >>> use-after-free, where the process_srcu callback is outliving struct
>>> >>> kvm (which contains the srcu_struct). If that's right, then calling
>>> >>> srcu_barrier (which should wait for all of the call_srcu callbacks to
>>> >>> complete, which are what enqueue the process_srcu callbacks) before
>>> >>> cleanup_srcu_struct in kvm_destroy_vm probably fixes this.
>>> >>>
>>> >>> The corresponding patch to virt/kvm/kvm_main.c looks something like:
>>> >>>
>>> >>> static void kvm_destroy_vm(struct kvm *kvm)
>>> >>> {
>>> >>> ...
>>> >>>         for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
>>> >>>                 kvm_free_memslots(kvm, kvm->memslots[i]);
>>> >>> +       srcu_barrier(&kvm->irq_srcu);
>>> >>>         cleanup_srcu_struct(&kvm->irq_srcu);
>>> >>> +       srcu_barrier(&kvm->srcu);
>>> >>>         cleanup_srcu_struct(&kvm->srcu);
>>> >>> ...
>>> >>>
>>> >>> Since we don't have a repro, this obviously won't be readily testable.
>>> >>> I find srcu subtle enough that I don't trust my reasoning fully (in
>>> >>> particular, I don't trust that waiting for all of the call_srcu
>>> >>> callbacks to complete also waits for all of the process_srcu
>>> >>> callbacks). Does someone else know if that's the case?
>>> >>
>>> >>
>>> >> From the function description it looks like it should do the trick:
>>> >>
>>> >> 514 /**
>>> >> 515  * srcu_barrier - Wait until all in-flight call_srcu() callbacks complete.
>>> >> 516  * @sp: srcu_struct on which to wait for in-flight callbacks.
>>> >> 517  */
>>> >> 518 void srcu_barrier(struct srcu_struct *sp)
>>> >>
>>> >> I see this failure happening several times per day. I've applied your
>>> >> patch locally and will check whether I still see these failures.
>>> >
>>> >
>>> > I have not seen the crash in 3 days, when usually I see several
>>> > crashes per night.
>>> > So I think we can consider that the patch fixes the crash:
>>> >
>>> > Tested-by: Dmitry Vyukov
>>>
>>>
>>> Unfortunately I hit it again with the patch applied. It definitely
>>> happens less frequently now, but still happens:
>>
>> Try this:
>>
>> diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c
>> index 9b9cdd549caa..ef5599c65299 100644
>> --- a/kernel/rcu/srcu.c
>> +++ b/kernel/rcu/srcu.c
>> @@ -283,6 +283,7 @@ void cleanup_srcu_struct(struct srcu_struct *sp)
>>  {
>>         if (WARN_ON(srcu_readers_active(sp)))
>>                 return; /* Leakage unless caller handles error. */
>> +       flush_delayed_work(&sp->work);
>>         free_percpu(sp->per_cpu_ref);
>>         sp->per_cpu_ref = NULL;
>>  }
>>
>> I think it should subsume Steve's patch, but I'm not 101% sure. We
>> will have to run this through Paul.
>
>
> Hi +Paul,
>
> I am seeing use-after-frees in process_srcu because struct srcu_struct
> is already freed. Before freeing struct srcu_struct, the code does
> cleanup_srcu_struct(&kvm->irq_srcu). We also tried to do:
>
> +       srcu_barrier(&kvm->irq_srcu);
>         cleanup_srcu_struct(&kvm->irq_srcu);
>
> It reduced the rate of use-after-frees, but did not eliminate them
> completely. The full thread is here:
> https://groups.google.com/forum/#!msg/syzkaller/i48YZ8mwePY/0PQ8GkQTBwAJ
>
> Does Paolo's fix above make sense to you? Namely, adding
> flush_delayed_work(&sp->work) to cleanup_srcu_struct()?

I am not sure about the interaction of flush_delayed_work and
srcu_reschedule... flush_delayed_work probably assumes that no new work
is queued concurrently, but what if srcu_reschedule requeues the work at
the same time? Can't flush_delayed_work miss that newly scheduled work?

Meanwhile I will apply this change instead of Steve's change and see
what happens.
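
For reference, the interleaving I am worried about looks roughly like
this. It is only a paraphrased sketch of my reading of kernel/rcu/srcu.c,
not the actual kernel code, and the *_sketch names are made up for
illustration:

/*
 * Rough sketch only: the bodies below paraphrase my reading of
 * kernel/rcu/srcu.c; they are not the actual kernel code.
 */

/* Workqueue side: process_srcu() ends by calling srcu_reschedule(),
 * which requeues the same work item if callbacks are still pending. */
static void process_srcu_sketch(struct work_struct *work)
{
        struct srcu_struct *sp;

        sp = container_of(work, struct srcu_struct, work.work);

        /* ... advance the grace period, invoke ready callbacks ... */

        /* srcu_reschedule(): callbacks still pending, so requeue. */
        queue_delayed_work(system_power_efficient_wq, &sp->work,
                           SRCU_INTERVAL);
}

/* Destroy side: cleanup_srcu_struct() with the proposed fix applied. */
static void cleanup_srcu_struct_sketch(struct srcu_struct *sp)
{
        flush_delayed_work(&sp->work);  /* waits for the instance of
                                           sp->work that is queued or
                                           running right now ... */
        free_percpu(sp->per_cpu_ref);   /* ... but if that instance has
                                           just requeued itself via
                                           srcu_reschedule(), can the
                                           requeued work still run after
                                           this free? */
        sp->per_cpu_ref = NULL;
}

If flush_delayed_work() only waits for the instance that was pending when
it was called, the requeued instance could still touch the srcu_struct
after it is freed; if it also catches the requeue, then Paolo's patch
should be enough.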