From: Paolo Bonzini
Date: Tue, 09 Oct 2012 16:35:00 +0200
Subject: Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
To: Avi Kivity
Cc: Kevin Wolf, Anthony Liguori, Ping Fan Liu, Stefan Hajnoczi, qemu-devel@nongnu.org, Jan Kiszka
Message-ID: <50743614.7080805@redhat.com>
In-Reply-To: <50743390.3@redhat.com>
References: <1348577763-12920-1-git-send-email-pbonzini@redhat.com>
 <20121008113932.GB16332@stefanha-thinkpad.redhat.com>
 <5072CE54.8020208@redhat.com>
 <20121009090811.GB13775@stefanha-thinkpad.redhat.com>
 <5073EDB3.3020804@redhat.com> <5073FE3A.1090903@redhat.com>
 <507401D8.8090203@redhat.com> <507405B5.4060108@redhat.com>
 <507410BD.6050901@redhat.com> <50741218.90000@redhat.com>
 <5074171A.2030904@redhat.com> <5074226A.3030907@redhat.com>
 <507424E5.4060705@redhat.com> <50742B97.2060608@redhat.com>
 <50743390.3@redhat.com>

On 09/10/2012 16:24, Avi Kivity wrote:
> > But we are not Linux, and I think the tradeoffs are different for RCU in
> > Linux vs. QEMU.
> >
> > For CPUs in the kernel, running user code is just one way to get things
> > done; QEMU threads are much more event driven, and their whole purpose
> > is to either run the guest or sleep, until "something happens" (VCPU
> > exit or readable fd). In other words, QEMU threads should be able to
> > stay most of the time in KVM_RUN or select() for any workload (to some
> > approximation).
>
> If you're streaming data (the saturated iothread from that other thread)
> or live migrating or have a block job with fast storage, this isn't
> necessarily true. You could make sure each thread polls the rcu state
> periodically though.

Yep, that was the approximation part.

>> > Not just that: we do not need to minimize RCU critical sections, because
>> > anyway we want to minimize the time spent in QEMU, period.
>> >
>> > So I believe that to some approximation, in QEMU we can completely
>> > ignore everything else, and behave as if threads were always under
>> > rcu_read_lock(), except if in KVM_RUN/select. KVM_RUN and select are
>> > what Paul McKenney calls extended quiescent states, and in fact the
>> > following mapping works:
>> >
>> >     rcu_extended_quiesce_start()  -> rcu_read_unlock();
>> >     rcu_extended_quiesce_end()    -> rcu_read_lock();
>> >     rcu_read_lock/unlock()        -> nop
>> >
>> > This in turn means that dispatching inside the RCU critical section is
>> > not really bad.

> I believe you still cannot synchronize_rcu() while in an rcu critical
> section per the rcu documentation, even when lock/unlock map to nops.

Right, what the userspace RCU library does is that synchronize_rcu()
also calls rcu_extended_quiesce_start/end() around the actual
synchronization, so that synchronize_rcu() does not wait for its own
grace period.

Instead of a complete nop, rcu_read_lock/unlock() can just write to a
thread-local variable if you want to assert that synchronize_rcu() is
not called within a critical section. Probably a good idea.
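
To make the thread-local idea concrete, here is a minimal C11 sketch. It
is only an illustration under stated assumptions, not the userspace RCU
library's code: the names rcu_read_depth, rcu_quiescent and
rcu_in_read_section() are hypothetical, the quiesce hooks are
placeholders, and the actual wait for other threads is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread bookkeeping (not from any real library). */
static _Thread_local unsigned rcu_read_depth; /* nesting of read sections */
static _Thread_local bool rcu_quiescent;      /* in an extended quiescent state */

/* Read-side entry: no barrier and no shared write, only a thread-local
 * increment so that misuse can be detected later. */
static inline void rcu_read_lock(void)
{
    rcu_read_depth++;
}

static inline void rcu_read_unlock(void)
{
    assert(rcu_read_depth > 0); /* catches an unbalanced unlock */
    rcu_read_depth--;
}

static inline bool rcu_in_read_section(void)
{
    return rcu_read_depth > 0;
}

/* Placeholder hooks, to be called around KVM_RUN/select(). A real
 * implementation would publish this state to the writer side. */
static inline void rcu_extended_quiesce_start(void)
{
    assert(!rcu_in_read_section());
    rcu_quiescent = true;
}

static inline void rcu_extended_quiesce_end(void)
{
    rcu_quiescent = false;
}

static inline void synchronize_rcu(void)
{
    /* The assertion discussed above: never wait for a grace period
     * from inside a read-side critical section. */
    assert(!rcu_in_read_section());

    /* Go quiescent so that we do not wait for our own grace period. */
    rcu_extended_quiesce_start();
    /* ... wait for all other threads to pass a quiescent state
     * (omitted in this sketch) ... */
    rcu_extended_quiesce_end();
}
```

With this in place the read side stays nearly free, while calling
synchronize_rcu() between rcu_read_lock() and rcu_read_unlock() trips
the assertion in debug builds instead of deadlocking silently.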
> Of course we can violate that and it wouldn't know a thing, but I prefer
> to stick to the established pattern.

I wasn't suggesting that, just evaluating the different tradeoffs QEMU
could make. Reference counting is complicated because it has to apply
to all objects used as opaques, and we're using things other than the
DeviceState as opaques in many cases.

Paolo