From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([208.118.235.92]:46469) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TRQp7-0005JI-FH for qemu-devel@nongnu.org; Thu, 25 Oct 2012 13:02:31 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1TRQp2-0005y1-6n for qemu-devel@nongnu.org; Thu, 25 Oct 2012 13:02:25 -0400 Received: from mx1.redhat.com ([209.132.183.28]:42724) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TRQp1-0005xm-V3 for qemu-devel@nongnu.org; Thu, 25 Oct 2012 13:02:20 -0400 Message-ID: <50897089.50304@redhat.com> Date: Thu, 25 Oct 2012 19:02:01 +0200 From: Avi Kivity MIME-Version: 1.0 References: <1350897839-29593-1-git-send-email-pingfank@linux.vnet.ibm.com> <1350897839-29593-13-git-send-email-pingfank@linux.vnet.ibm.com> <50865DB9.1060106@siemens.com> <50879606.3070004@web.de> <50890002.90701@redhat.com> <508906DC.8090502@web.de> <50896707.9060405@redhat.com> <50896B51.3040708@siemens.com> In-Reply-To: <50896B51.3040708@siemens.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000 List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Jan Kiszka Cc: Liu Ping Fan , Stefan Hajnoczi , Marcelo Tosatti , "qemu-devel@nongnu.org" , liu ping fan , Anthony Liguori , Paolo Bonzini On 10/25/2012 06:39 PM, Jan Kiszka wrote: >> >> That doesn't work cross-thread. >> >> vcpu A: write to device X, dma-ing to device Y >> vcpu B: write to device Y, dma-ing to device X > > We will deny DMA-ing from device X on behalf of a VCPU, ie. in dispatch > context, to Y. > > What we do not deny, though, is DMA-ing from an I/O thread that > processes an event for device X. I would really like to avoid depending on the context. In real hardware, there is no such thing. > If the invoked callback of device X > holds the device lock across some DMA request to Y, then we risk to run > into the same ABBA issue. Hmm... Yup. > >> >> My suggestion was to drop the locks around DMA, then re-acquire the lock >> and re-validate data. > > Maybe possible, but hairy depending on the device model. It's unpleasant, yes. Note depending on the device, we may not need to re-validate data, it may be sufficient to load it into local variables to we know it is consistent at some point. But all those solutions suffer from requiring device model authors to understand all those issues, rather than just add a simple lock around access to their data structures. >>>>> I see that we have a all-or-nothing problem here: to address this >>>>> properly, we need to convert the IRQ path to lock-less (or at least >>>>> compatible with holding per-device locks) as well. >>>> >>>> There is a transitional path where writing to a register that can cause >>>> IRQ changes takes both the big lock and the local lock. >>>> >>>> Eventually, though, of course all inner subsystems must be threaded for >>>> this work to have value. >>>> >>> >>> But that transitional path must not introduce regressions. Opening a >>> race window between IRQ cause update and event injection is such a >>> thing, just like dropping concurrent requests on the floor. >> >> Can you explain the race? > > Context A Context B > > device.lock > ... > device.set interrupt_cause = 0 > lower_irq = true > ... > device.unlock > device.lock > ... > device.interrupt_cause = 42 > raise_irq = true > ... > device.unlock > if (raise_irq) > bql.lock > set_irq(device.irqno) > bql.unlock > if (lower_irq) > bql.lock > clear_irq(device.irqno) > bql.unlock > > > And there it goes, our interrupt event. Obviously you'll need to reacquire the device lock after taking bql and revalidate stuff. But that is not what I am suggesting. Instead, any path that can lead to an irq update (or timer update etc) will take both the bql and the device lock. This will leave after the first pass only side effect free register reads and writes, which is silly if we keep it that way, but we will follow with a threaded timer and irq subsystem and we'll peel away those big locks. device_mmio_write: if register is involved in irq or timers or block layer or really anything that matters: bql.acquire device.lock.acquire do stuff device.lock.release if that big condition from above was true: bql.release -- error compiling committee.c: too many arguments to function