In-Reply-To: <4E5E7AB8.9060304@siemens.com>
References: <4E58FC3F.6080809@web.de> <4E5BE7C5.60705@us.ibm.com> <4E5BFF51.9010503@web.de> <4E5C00F0.9070103@redhat.com> <4E5D39C8.5020205@web.de> <4E5E1297.3050904@siemens.com> <4E5E7AB8.9060304@siemens.com>
From: Blue Swirl
Date: Wed, 31 Aug 2011 19:44:07 +0000
Subject: Re: [Qemu-devel] [PATCH] pc: Clean up PIC-to-APIC IRQ path
To: Jan Kiszka
Cc: Lucas Meneghel Rodrigues, Peter Maydell, Anthony Liguori, Marcelo Tosatti, qemu-devel, Avi Kivity, Gerd Hoffmann

On Wed, Aug 31, 2011 at 6:17 PM, Jan Kiszka wrote:
> On 2011-08-31 19:41, Blue Swirl wrote:
>> On Wed, Aug 31, 2011 at 10:53 AM, Jan Kiszka wrote:
>>> On 2011-08-31 10:25, Peter Maydell wrote:
>>>> On 30 August 2011 20:28, Jan Kiszka wrote:
>>>>> Yes, that's the current state. Once we have bidirectional IRQ links in
>>>>> place (pushing downward, querying upward - required to skip IRQ routers
>>>>> for fast, lockless deliveries), that should change again.
>>>>
>>>> Can you elaborate a bit more on this? I don't think anybody has
>>>> proposed links with their own internal state before in the qdev/qom
>>>> discussions...
>>>
>>> That basic idea is to allow
>>>
>>> a) a discovery of the currently active IRQ path from source to sink
>>>    (that would be possible via QOM just using forward links)
>>
>> Why, only for b)? This is not possible with real hardware.
>>
>>> b) skip updating the states of IRQ routers in the common case, just
>>>    signaling directly the sink from the source (to allow in-kernel IRQ
>>>    delivery or to skip taking some device locks). Whenever some router
>>>    is queried for its current IRQ line state, it would have to ask the
>>>    preceding IRQ source for its state. So we need a backward link.
>>
>> I think this would need pretty heavy changes everywhere. At board
>> level the full path needs to be identified and special versions of
>> IRQs installed along the way. The routers would need to use callbacks
>> to inform other parties about routing changes.
>
> It already works in practice (based on a hack and minus IRQ router state
> updates) for x86 PCI device pass-through. At least I don't want this
> upstream but instead a generic solution. The ability to skip IRQ routers
> also in pure user space device model scenarios is a useful by-product.
>
>>
>>> We haven't thought about how this could be implemented in details yet
>>> though. Among other things, it heavily depends on the final QOM design.
>>
>> Perhaps a global IRQ manager could help. It would keep track of the
>> whole IRQ matrix, what are input (x axis) and output (y axis) states
>> and what each matrix node (router state) looks like (or able to
>> compute) if asked. I don't think backward links would be needed with
>> this approach.
>
> Well, the backward links would then be moved to that global IRQ manager.
> It's just moving the data management, but if it turns out to allow a
> cleaner device design, I would surely not vote against it.
> But that manager must support lazy updates as well because we cannot
> call it from kernel space for each and every event.

The global IRQ switch matrix would take over all routing for the devices
in question. As an example, let's consider a PCI card (source), a PCI
host bridge, an IO-APIC, a LAPIC and a CPU (final destination). The
matrix would get its input from the PCI card and output directly to the
LAPIC, since the CPU does not use a qemu_irq yet. These leaf devices
actually need no changes; only their qemu_irq is routed elsewhere. The
matrix would also have access to the intermediate devices, but their
qemu_irq inputs would not be updated in lazy mode and their outputs need
not go anywhere.

So for a matrix where x0, x1, ... are used for inputs and y0, y1, ... for
outputs, in the example x0 would be connected to the PCI card, x1 to the
PCI host bridge output and x2 to the IO-APIC (ignoring multiple lines).
Then y0 would be the calculated output of the PCI host bridge, y1 that of
the IO-APIC, and only y2 would be connected to the LAPIC.

If the intermediate IRQ state between the nodes is queried (for example
at the IO-APIC), the matrix must be able to tell what the true state at
that point would be, and this value would then be supplied to the
device. So a method for a device to query its IRQ input state from the
matrix on demand is needed. This could be as unspecific as "refresh all
my inputs".

A backward link may not be able to tell the state if information about
the next backward link is lost (a logical OR of several signals), whereas
the matrix should be able to do that even in that case.

If routes in the IO-APIC change, the matrix needs to be recalculated and
the states of the affected devices should be updated. This update cycle
could be triggered via the normal qemu_irq device input lines or via a
direct callback (which needs more changes). Maybe even this could be
handled lazily, so that no updates would be needed until the final output
changes or the state of an intermediate device is queried.

The global manager should handle saving and restoring states; the device
states would not matter. If it pushed the true state to devices, we would
be back to the ordering problem.