From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:48722)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <blauwirbel@gmail.com>) id 1Qyqi5-0005hR-Id
	for qemu-devel@nongnu.org; Wed, 31 Aug 2011 15:44:30 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <blauwirbel@gmail.com>) id 1Qyqi3-0007W3-Ss
	for qemu-devel@nongnu.org; Wed, 31 Aug 2011 15:44:29 -0400
Received: from mail-qw0-f47.google.com ([209.85.216.47]:58941)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <blauwirbel@gmail.com>) id 1Qyqi3-0007Vu-Od
	for qemu-devel@nongnu.org; Wed, 31 Aug 2011 15:44:27 -0400
Received: by qwh5 with SMTP id 5so849435qwh.34
	for <qemu-devel@nongnu.org>; Wed, 31 Aug 2011 12:44:27 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <4E5E7AB8.9060304@siemens.com>
References: <4E58FC3F.6080809@web.de> <4E5BE7C5.60705@us.ibm.com>
	<4E5BFF51.9010503@web.de> <4E5C00F0.9070103@redhat.com>
	<CAAu8pHs+thsnWrmvWg45BDVLkiVj682GNpbEMUnCCZ6qnyK+YA@mail.gmail.com>
	<4E5D39C8.5020205@web.de>
	<CAFEAcA8yN6=TXYLprvH+bee9GRmcuOuLTRUi2QgcFCvasgMbQw@mail.gmail.com>
	<4E5E1297.3050904@siemens.com>
	<CAAu8pHv-RVcwWgWBjc8gTJQLcKW1+vhT_3DvL5y=JD7rKhJ7Mw@mail.gmail.com>
	<4E5E7AB8.9060304@siemens.com>
From: Blue Swirl <blauwirbel@gmail.com>
Date: Wed, 31 Aug 2011 19:44:07 +0000
Message-ID: <CAAu8pHtUhpvvP4OC-Tegp1oVD06gQy5xigFOdOuZFdyS1N84ww@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [PATCH] pc: Clean up PIC-to-APIC IRQ path
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Lucas Meneghel Rodrigues <lmr@redhat.com>, Peter Maydell <peter.maydell@linaro.org>, Anthony Liguori <aliguori@us.ibm.com>, Marcelo Tosatti <mtosatti@redhat.com>, qemu-devel <qemu-devel@nongnu.org>, Avi Kivity <avi@redhat.com>, Gerd Hoffmann <kraxel@redhat.com>

On Wed, Aug 31, 2011 at 6:17 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2011-08-31 19:41, Blue Swirl wrote:
>> On Wed, Aug 31, 2011 at 10:53 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>> On 2011-08-31 10:25, Peter Maydell wrote:
>>>> On 30 August 2011 20:28, Jan Kiszka <jan.kiszka@web.de> wrote:
>>>>> Yes, that's the current state. Once we have bidirectional IRQ links in
>>>>> place (pushing downward, querying upward - required to skip IRQ routers
>>>>> for fast, lockless deliveries), that should change again.
>>>>
>>>> Can you elaborate a bit more on this? I don't think anybody has
>>>> proposed links with their own internal state before in the qdev/qom
>>>> discussions...
>>>
>>> That basic idea is to allow
>>>
>>> a) a discovery of the currently active IRQ path from source to sink
>>>   (that would be possible via QOM just using forward links)
>>
>> Why, only for b)? This is not possible with real hardware.
>>
>>> b) skip updating the states of IRQ routers in the common case, just
>>>   signaling directly the sink from the source (to allow in-kernel IRQ
>>>   delivery or to skip taking some device locks). Whenever some router
>>>   is queried for its current IRQ line state, it would have to ask the
>>>   preceding IRQ source for its state. So we need a backward link.
>>
>> I think this would need pretty heavy changes everywhere. At board
>> level the full path needs to be identified and special versions of
>> IRQs installed along the way. The routers would need to use callbacks
>> to inform other parties about routing changes.
>
> It already works in practice (based on a hack and minus IRQ router state
> updates) for x86 PCI device pass-through. At least I don't want this
> upstream but instead a generic solution. The ability to skip IRQ routers
> also in pure user space device model scenarios is a useful by-product.
>
>>
>>> We haven't thought about how this could be implemented in details yet
>>> though. Among other things, it heavily depends on the final QOM design.
>>
>> Perhaps a global IRQ manager could help. It would keep track of the
>> whole IRQ matrix, what are input (x axis) and output (y axis) states
>> and what each matrix node (router state) looks like (or able to
>> compute) if asked. I don't think backward links would be needed with
>> this approach.
>
> Well, the backward links would then be moved to that global IRQ manager.
> It's just moving the data management, but if it turns out to allow a
> cleaner device design, I would surely not vote against it. But that
> manager must support lazy updates as well because we cannot call it from
> kernel space for each and every event.

The global IRQ switch matrix would take over all routing for the
devices in question. As an example, let's consider a PCI card
(source), PCI host bridge, IO-APIC, LAPIC and CPU (final destination).
The matrix would get input from the PCI card and output directly to
the LAPIC, since the CPU does not use a qemu_irq yet. These leaf
devices need no changes at all; their qemu_irq is just routed
elsewhere. The matrix would also have access to the intermediate
devices, but in lazy mode their qemu_irq inputs would not be updated
and their outputs need not go anywhere.

So for a matrix where x0, x1... are used for inputs and y0, y1... for
outputs, x0 would be connected in the example to the PCI card, x1 to
the PCI host bridge output and x2 to the IO-APIC (ignoring multiple
lines). Then y0 would be the calculated output from the PCI host
bridge, y1 that of the IO-APIC, and only y2 would be connected to the
LAPIC.
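
To make the idea a bit more concrete, here is a rough C sketch of one
possible reading of the above: the matrix holds the path as a chain of
router stages, remembers only the source level, and signals the sink
directly. All names (IRQMatrix, matrix_add_stage, matrix_level_after,
matrix_set_source, the demo routers and sink) are made up for
illustration and are not existing QEMU interfaces; plain callbacks
stand in for qemu_irq, and the chain does not try to reproduce the
exact x/y numbering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MATRIX_MAX_STAGES 4

/* Hypothetical router callback: output level for a given input level. */
typedef bool (*route_fn)(const void *router_state, bool level_in);

/* Hypothetical sink callback, standing in for the sink's qemu_irq. */
typedef void (*sink_fn)(void *sink_state, bool level);

typedef struct IRQMatrix {
    route_fn route[MATRIX_MAX_STAGES];
    const void *router_state[MATRIX_MAX_STAGES];
    size_t nb_stages;
    bool source_level;  /* last level raised by the source (PCI card) */
    bool sink_level;    /* last level delivered to the sink (LAPIC) */
    sink_fn sink;
    void *sink_state;
} IRQMatrix;

void matrix_add_stage(IRQMatrix *m, route_fn fn, const void *state)
{
    assert(m->nb_stages < MATRIX_MAX_STAGES);
    m->route[m->nb_stages] = fn;
    m->router_state[m->nb_stages] = state;
    m->nb_stages++;
}

/* Level after `stages` routers: 0 is the source, nb_stages the sink input. */
bool matrix_level_after(const IRQMatrix *m, size_t stages)
{
    bool level = m->source_level;

    assert(stages <= m->nb_stages);
    for (size_t i = 0; i < stages; i++) {
        level = m->route[i](m->router_state[i], level);
    }
    return level;
}

/* Source-side entry point; intermediate device inputs are never touched. */
void matrix_set_source(IRQMatrix *m, bool level)
{
    bool out;

    m->source_level = level;
    out = matrix_level_after(m, m->nb_stages);
    if (out != m->sink_level) {
        m->sink_level = out;
        m->sink(m->sink_state, out);
    }
}

/* Demo router: a host bridge that passes the level through unchanged. */
bool bridge_route(const void *state, bool level_in)
{
    (void)state;
    return level_in;
}

/* Demo router: an IO-APIC-like entry; `state` points to its mask bit. */
bool ioapic_route(const void *state, bool level_in)
{
    const bool *masked = state;
    return !*masked && level_in;
}

/* Demo sink that records the last level and counts deliveries. */
typedef struct DemoSink {
    int deliveries;
    bool level;
} DemoSink;

void demo_sink(void *state, bool level)
{
    DemoSink *s = state;
    s->deliveries++;
    s->level = level;
}
```

With a bridge stage and an IO-APIC stage added, matrix_set_source()
reaches the sink only when the computed final level changes, and
matrix_level_after(&m, 1) gives the level the IO-APIC input would see
without any intermediate device having been updated.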

If the intermediate IRQ state between the nodes is queried (for
example at the IO-APIC), the matrix must be able to tell what the true
state at that point would be, and this value would then be supplied to
the device. So a method to query a device's IRQ input state from the
matrix on demand is needed. This could be as unspecific as "refresh
all my inputs". A backward link may not be able to tell the state if
information about the next backward link is lost (logical OR of
several signals), whereas the matrix should be able to do that even in
that case.
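
The OR case could look roughly like this in C. The assumption here is
that the matrix records the last level of every source separately, so
the state of a shared line can be recomputed at any time. IRQInputs,
IRQOrNode and the three functions are hypothetical names for this
sketch only, not anything that exists in QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MATRIX_MAX_INPUTS 32

/* Per-source levels kept by the matrix: bit n is the level of source n. */
typedef struct IRQInputs {
    uint32_t levels;
} IRQInputs;

/* Hypothetical shared line: logical OR of the sources set in `members`. */
typedef struct IRQOrNode {
    uint32_t members;
} IRQOrNode;

/* Record the level of source n; the shared line itself stores nothing. */
void inputs_set(IRQInputs *in, unsigned n, bool level)
{
    assert(n < MATRIX_MAX_INPUTS);
    if (level) {
        in->levels |= UINT32_C(1) << n;
    } else {
        in->levels &= ~(UINT32_C(1) << n);
    }
}

/* True state of the shared line, computed on demand. */
bool or_node_level(const IRQInputs *in, const IRQOrNode *node)
{
    return (in->levels & node->members) != 0;
}

/* Bitmask of the member sources that currently hold the line high. */
uint32_t or_node_asserting(const IRQInputs *in, const IRQOrNode *node)
{
    return in->levels & node->members;
}
```

Since the per-source bits survive the OR, lowering one of two
asserting sources still leaves the shared line high, and a query can
also tell which source is responsible.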

If routes in the IO-APIC change, the matrix needs to be recalculated
and the states in the affected devices should be updated. This update
cycle could be triggered via the normal qemu_irq device input lines or
a direct callback (which needs more changes). Maybe even this could be
handled lazily, so that no updates would be needed until the final
output changes or the state of an intermediate device is queried.
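
A minimal sketch of such lazy handling in C, assuming a single route
with one enable bit standing in for an IO-APIC routing entry: changes
only mark the cached result as stale, and the recalculation happens
when the output is next asked for. LazyRoute and its functions are
invented for this example; the recalcs counter exists only to make the
laziness visible.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct LazyRoute {
    bool source_level;   /* last level reported by the source */
    bool route_enabled;  /* hypothetical stand-in for a routing entry */
    bool cached_output;  /* valid only while dirty is false */
    bool dirty;          /* set when the cached output may be stale */
    unsigned recalcs;    /* number of recalculations, illustration only */
} LazyRoute;

/* Start with a stale cache so that the first query computes the output. */
void lazy_init(LazyRoute *r)
{
    r->source_level = false;
    r->route_enabled = false;
    r->cached_output = false;
    r->dirty = true;
    r->recalcs = 0;
}

/* Routing change: remember it, but do not recalculate yet. */
void lazy_route_change(LazyRoute *r, bool enabled)
{
    r->route_enabled = enabled;
    r->dirty = true;
}

/* Source change: also just recorded in this simplified sketch. */
void lazy_source_change(LazyRoute *r, bool level)
{
    r->source_level = level;
    r->dirty = true;
}

/* Query: recalculate only if something changed since the last query. */
bool lazy_output(LazyRoute *r)
{
    if (r->dirty) {
        r->cached_output = r->route_enabled && r->source_level;
        r->dirty = false;
        r->recalcs++;
    }
    return r->cached_output;
}
```

Any number of route changes between two queries then costs a single
recalculation.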

The global manager should handle saving and restoring the IRQ states;
the device states would then not matter. If it pushed the true state
to the devices, we would be back to the ordering problem.