From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751052AbdAQVBO (ORCPT ); Tue, 17 Jan 2017 16:01:14 -0500
Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:35004 "EHLO
	mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750997AbdAQVBL (ORCPT ); Tue, 17 Jan 2017 16:01:11 -0500
Date: Tue, 17 Jan 2017 12:34:36 -0800
From: "Paul E. McKenney"
To: Paolo Bonzini
Cc: Dmitry Vyukov, Steve Rutherford, syzkaller, Radim Krčmář, KVM list, LKML
Subject: Re: kvm: use-after-free in process_srcu
Reply-To: paulmck@linux.vnet.ibm.com
References: <754246063.9562871.1484603305281.JavaMail.zimbra@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: 
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-GCONF: 00
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 17011720-0004-0000-0000-000011512B62
X-IBM-SpamModules-Scores: 
X-IBM-SpamModules-Versions: BY=3.00006451; HX=3.00000240; KW=3.00000007;
	PH=3.00000004; SC=3.00000199; SDB=6.00808993; UDB=6.00394058; IPR=6.00586351;
	BA=6.00005066; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000;
	ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00013953; XFM=3.00000011;
	UTC=2017-01-17 20:34:39
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 17011720-0005-0000-0000-00007C409A35
Message-Id: <20170117203436.GC5238@linux.vnet.ibm.com>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,,
	definitions=2017-01-17_13:,, signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
	spamscore=0 suspectscore=7 malwarescore=0 phishscore=0 adultscore=0
	bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1
	engine=8.0.1-1612050000 definitions=main-1701170272
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jan 17, 2017 at 01:03:28PM +0100, Paolo Bonzini wrote:
> 
> 
> On 17/01/2017 12:13, Dmitry Vyukov wrote:
> > On Tue, Jan 17, 2017 at 12:08 PM, Paolo Bonzini wrote:
> >>
> >>
> >> On 17/01/2017 10:56, Dmitry Vyukov wrote:
> >>>> I am seeing use-after-frees in process_srcu as struct srcu_struct is
> >>>> already freed.  Before freeing struct srcu_struct, the code does
> >>>> cleanup_srcu_struct(&kvm->irq_srcu).  We also tried to do:
> >>>>
> >>>> +	srcu_barrier(&kvm->irq_srcu);
> >>>> 	cleanup_srcu_struct(&kvm->irq_srcu);
> >>>>
> >>>> It reduced the rate of use-after-frees, but did not eliminate them
> >>>> completely.  The full thread is here:
> >>>> https://groups.google.com/forum/#!msg/syzkaller/i48YZ8mwePY/0PQ8GkQTBwAJ
> >>>>
> >>>> Does Paolo's fix above make sense to you?  Namely adding
> >>>> flush_delayed_work(&sp->work) to cleanup_srcu_struct()?

Yes, we do need a flush_delayed_work(), good catch!  But doing multiple
of them should not be necessary, because there shouldn't be any callbacks
at all once the srcu_barrier() returns, and the only time SRCU queues
more work is if there is at least one callback pending.

The code is making sure that no new call_srcu() invocations happen
before it does the srcu_barrier(), right?

So if you are seeing failures even with the single flush_delayed_work(),
it would be interesting to set a flag in the srcu_struct at
cleanup_srcu_struct() time, and then splat if srcu_reschedule() does its
queue_delayed_work() when that flag is set.
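For concreteness, an untested sketch of that debugging instrumentation is
below.  It is written against the old single-file kernel/rcu/srcu.c, and the
field name "cleanup_started" is invented here purely for illustration;
srcu_struct has no such field:

	/*
	 * Untested debugging sketch only; "cleanup_started" is an invented
	 * field name, not something that exists in struct srcu_struct.
	 */
	struct srcu_struct {
		/* ... existing fields ... */
		bool cleanup_started;		/* set once cleanup begins */
	};

	void cleanup_srcu_struct(struct srcu_struct *sp)
	{
		if (WARN_ON(srcu_readers_active(sp)))
			return;	/* Leakage unless caller handles error. */
		sp->cleanup_started = true;	/* no further work should be queued */
		flush_delayed_work(&sp->work);
		free_percpu(sp->per_cpu_ref);
		sp->per_cpu_ref = NULL;
	}

	/* ...and in srcu_reschedule(), just before its queue_delayed_work(): */
	WARN_ON(sp->cleanup_started);	/* splat if work is requeued after cleanup began */

The idea is simply to record that cleanup has begun and then warn if the
work item is ever requeued afterwards, which would point at either an SRCU
bug or a late call_srcu().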
> >>> I am not sure about the interaction of flush_delayed_work and
> >>> srcu_reschedule... flush_delayed_work probably assumes that no work is
> >>> queued concurrently, but what if srcu_reschedule queues another work
> >>> item concurrently... can't it happen that flush_delayed_work will miss
> >>> that newly scheduled work?
> >>
> >> Newly scheduled callbacks would be a bug in SRCU usage, but my patch is
> > 
> > I mean not SRCU callbacks, but the sp->work being rescheduled.
> > Consider that callbacks are already scheduled.  We call
> > flush_delayed_work, it waits for completion of process_srcu.  But that
> > process_srcu schedules sp->work again in srcu_reschedule.

It only does this if there are callbacks still on the srcu_struct, so if
you are seeing this, we either have a bug in SRCU that finds callbacks
when none are present or we have a usage bug that is creating new
callbacks after srcu_barrier() starts.

Do any of your callback functions invoke call_srcu()?  (Hey, I have to ask!)

> >> indeed insufficient.  Because of SRCU's two-phase algorithm, it's possible
> >> that the first flush_delayed_work doesn't invoke all callbacks.  Instead I
> >> would propose this (still untested, but this time with a commit message):
> >>
> >> ---------------- 8< --------------
> >> From: Paolo Bonzini
> >> Subject: [PATCH] srcu: wait for all callbacks before deeming SRCU "cleaned up"
> >>
> >> Even though there are no concurrent readers, it is possible that the
> >> work item is queued for delayed processing when cleanup_srcu_struct is
> >> called.  The work item needs to be flushed before returning, or a
> >> use-after-free can ensue.
> >>
> >> Furthermore, because of SRCU's two-phase algorithm it may take up to
> >> two executions of srcu_advance_batches before all callbacks are invoked.
> >> This can happen if the first flush_delayed_work happens as follows:
> >>
> >>     srcu_read_lock
> >>                                 process_srcu
> >>                                   srcu_advance_batches
> >>                                     ...
> >>                                     if (!try_check_zero(sp, idx^1, trycount))
> >>                                       // there is a reader
> >>                                       return;
> >>                                   srcu_invoke_callbacks
> >>                                     ...
> >>     srcu_read_unlock
> >>     cleanup_srcu_struct
> >>       flush_delayed_work
> >>                                   srcu_reschedule
> >>                                     queue_delayed_work
> >>
> >> Now flush_delayed_work returns, but srcu_reschedule will *not* have
> >> cleared sp->running to false.

But srcu_reschedule() sets sp->running to false if there are no callbacks.
And at that point, there had better be no callbacks.

> >> Not-tested-by: Paolo Bonzini
> >> Signed-off-by: Paolo Bonzini
> >>
> >> diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c
> >> index 9b9cdd549caa..9470f1ba2ef2 100644
> >> --- a/kernel/rcu/srcu.c
> >> +++ b/kernel/rcu/srcu.c
> >> @@ -283,6 +283,14 @@ void cleanup_srcu_struct(struct srcu_struct *sp)
> >>  {
> >>  	if (WARN_ON(srcu_readers_active(sp)))
> >>  		return; /* Leakage unless caller handles error. */
> >> +
> >> +	/*
> >> +	 * No readers active, so any pending callbacks will rush through the two
> >> +	 * batches before sp->running becomes false.  No risk of busy-waiting.
> >> +	 */
> >> +	while (sp->running)
> >> +		flush_delayed_work(&sp->work);
> >
> > Unsynchronized accesses to shared state make me nervous.  running is
> > meant to be protected by sp->queue_lock.
> 
> I think it could just be
> 
>     while (flush_delayed_work(&sp->work));
> 
> but let's wait for Paul.

If it needs to be more than just a single flush_delayed_work(), we have
some other bug somewhere.  ;-)
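As an aside on the two loop shapes above, here is an untested, purely
illustrative comparison.  The relevant detail is that flush_delayed_work()
returns false once the work item was already idle, so the second form
terminates without ever reading sp->running outside sp->queue_lock:

	/* Variant from the patch above: re-check the (unlocked) running flag. */
	while (sp->running)
		flush_delayed_work(&sp->work);

	/*
	 * Alternative suggested above: loop until flush_delayed_work() reports
	 * that the work item was already idle, i.e. nothing was pending and
	 * nothing was requeued; sp->running is not read at all here.
	 */
	while (flush_delayed_work(&sp->work))
		;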
							Thanx, Paul

> Paolo
> 
> > At least we will get back to you with a KTSAN report.
> > 
> >>  	free_percpu(sp->per_cpu_ref);
> >>  	sp->per_cpu_ref = NULL;
> >>  }
> >>
> >>
> >> Thanks,
> >>
> >> Paolo
> >>
> >> --
> >> You received this message because you are subscribed to the Google Groups "syzkaller" group.
> >> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller+unsubscribe@googlegroups.com.
> >> For more options, visit https://groups.google.com/d/optout.
> > 
> 