All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Gleb Natapov <gleb@redhat.com>
Cc: Avi Kivity <avi@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC] kvm: enable irq injection from interrupt context
Date: Thu, 16 Sep 2010 16:23:35 +0200	[thread overview]
Message-ID: <20100916142335.GC24850@redhat.com> (raw)
In-Reply-To: <20100916140615.GJ3008@redhat.com>

On Thu, Sep 16, 2010 at 04:06:15PM +0200, Gleb Natapov wrote:
> On Thu, Sep 16, 2010 at 03:51:37PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Sep 16, 2010 at 03:18:23PM +0200, Gleb Natapov wrote:
> > > On Thu, Sep 16, 2010 at 02:57:17PM +0200, Michael S. Tsirkin wrote:
> > > > On Thu, Sep 16, 2010 at 02:33:01PM +0200, Gleb Natapov wrote:
> > > > > On Thu, Sep 16, 2010 at 02:13:38PM +0200, Michael S. Tsirkin wrote:
> > > > > > > > We haver two users: qemu does deasserts, vhost-net does asserts.
> > > > > > > Well this is broken. You want KVM to track level for you and this is
> > > > > > > wrong. KVM does this anyway because it can't relay on devise model
> > > > > > > to behave correctly [0], but in your case it is designed to behave
> > > > > > > incorrectly.
> > > > > > > 
> > > > > > > Interrupt type is a device property. PCI devices just happen to be level
> > > > > > > triggered according to PCI spec. What if you want to use vhost-net to
> > > > > > > implement network device which has active-low interrupt line? [1]
> > > > > > 
> > > > > > The polarity would have to be reversed in gsi (irq line can be shared,
> > > > > > all devices must be active high or low consistently).
> > > > > > 
> > > > > There are gsi dedicated to PCI. They can be shared only between PCI
> > > > > devices.
> > > > > 
> > > > > > > If you want to split parts that asserts irq and de-asserts it then we
> > > > > > > should have irqfd that tracks line status and knows interrupt line
> > > > > > > polarity.
> > > > > > 
> > > > > > Yes, it can know about polarity even though I think it's cleaner to do this
> > > > > > per gsi. But it can not track line status as line is shared with
> > > > > > other devices.
> > > > > It should track only device's line status.
> > > > 
> > > > There is no such thing as device's line status on real hardware, either.
> > > > Devices do not drive INT# high: they drive it low (all the time)
> > > > or do not drive it at all.
> > > Same thing, other naming. Device either drive it low (irq_set(1)) or
> > > not (irq_set(0)).
> > 
> > Well, if I drive it low any number of times it should hae no effect.
> > 
> There is no meaning of "driving low the line multiple times" on real HW.
> You either drive it low or not. We try to emulate this with individual
> "drive low/high" events in software.
> > > > 
> > > > Or consider express, the spec explicitly says:
> > > > "Note: Duplicate Assert_INTx/Deassert_INTx Messages have no effect, but

Express has assert and deassert messages. It might be easier for
you to think in these terms. level 1: assert, level 0: deassert.
Seems a simple model and what we do models this pretty well.

> > > >  are not errors."
> > > > 
> > > > > > 
> > > > > > > > Another application is out of process virtio (sandboxing!).
> > > > > > > It will still assert and de-assert irq at the same code, so it will be
> > > > > > > able to track irq line status.
> > > > > > > 
> > > > > > > > Again, pci stuff needs to stay in qemu.
> > > > > > > > 
> > > > > > > 
> > > > > > > Nothing to do with PCI whatsoever.
> > > > > > > 
> > > > > > > [0] most qemu devices behave incorrectly and trigger level irq more then
> > > > > > >     needed.
> > > > > > 
> > > > > > Which devices?
> > > > > Most of them. They just call update_irq_status() or something and
> > > > > re-assert interrupt regardless of what previous status was.
> > > > 
> > > > At least for PCI devices, these calls do nothing if status does not change.
> > > I am not sure what code are you locking at. e1000 device emulation
> > > doesn't check previous line status before calling qemu_set_irq().
> > 
> > Right. If you dig through useless levels of indirection, you will
> > see that each PCI device has 4 pin levels, when one of these
> > changes this makes it up level to the parent bus, and so on.
> Yes. Qemu PCI level does it right. Ideally device would not even invoke 
> this logic if interrupt status haven't changed.

It needs to call *some* function to check status
and assert, right? qemu_set_irq is that function.

> > 
> > > > 
> > > > > > pci core tracks line status and will never assert the same
> > > > > > line multiple times.
> > > > > That's good if pci core does this, but device shouldn't even try it.
> > > > 
> > > > I disagree. We don't want to duplicate a ton of code all over
> > > > the codebase.
> > > > 
> > > So abstract it into a function. It shouldn't be part of PCI emulation.
> > 
> > I don't know what you mean by this, send a patch and we can discuss?
> I don't care enough to send patch.  Just remember previous irq status
> and do not call qemu_set_irq() if it doesn't change. Three lines of
> code.

Heh, we have a ton of devices to support.
And then we need to migrate this extra status, and make sure it's in
sync with PCI code.  We'll end up with much more more than 3 lines all
of it in a very sensitive and hard to test parts code.

> > Note that when I patches PCI interrupt handling for compliance
> > I made it mimic hardware as closely as possible: devices
> > can send any # of assert/deassert messages, bus discards duplicates.
> > 
> Qemu PCI code is correct as far as I can see. Not all devices are connected
> via PCI and there is not need to go through couple of layer of
> indirection to figure out that nothing should be done.
> 

If we want to remove the indirection I would be much more
interested to remove it for all cases, not just when
nothing should be done.

> > > > > > 
> > > > > > > [1] this is how correct PCI device should behave but we override
> > > > > > >     polarity in ACPI, but now incorrect behaviour is deeply designed
> > > > > > >     into vhost-net.
> > > > > > 
> > > > > > Not really, vhost net signals an eventfd. What happens then is
> > > > > > up to kvm.
> > > > > > 
> > > > > That is what current broken design does and it works, but if you want to
> > > > > save unneeded calls into kvm fix design.
> > > > 
> > > > The interface seems clean enough: vhost handles virtio ring, qemu/kvm handle pci.
> > > > Making vhost aware of pci breaks this, I would not call that fixing the
> > > > design.
> > > > 
> > > Once again. Nothing to do with PCI, everything to do with device
> > > emulation. And I propose to abstract interrupt assertion part into
> > > irqfd, not into vhost.
> > > 
> > > --
> > > 			Gleb.
> > 
> > This could work. KVM would need to find all irqfd
> > objects mapped to gsi and notify them on deassert.
> > No idea how hard this is.
> > 
> What for? Device emulation should do de-assert.

Sorry, but at this point I have no idea what you call device emulation.
qemu has code to de-assert. vhost has code to assert.
I would like to optimize level interrupts and stop driving
scheduler insane if at all possible.

> --
> 			Gleb.

  reply	other threads:[~2010-09-16 14:29 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-15 18:54 [PATCH RFC] kvm: enable irq injection from interrupt context Michael S. Tsirkin
2010-09-16  9:02 ` Gleb Natapov
2010-09-16  9:10   ` Michael S. Tsirkin
2010-09-16  9:25     ` Gleb Natapov
2010-09-16  9:43       ` Michael S. Tsirkin
2010-09-16  9:46       ` Avi Kivity
2010-09-16  9:53         ` Michael S. Tsirkin
2010-09-16 10:13           ` Gleb Natapov
2010-09-16 10:13             ` Michael S. Tsirkin
2010-09-16 10:20               ` Gleb Natapov
2010-09-16 10:44                 ` Michael S. Tsirkin
2010-09-16 10:54                   ` Gleb Natapov
2010-09-16 10:53                     ` Michael S. Tsirkin
2010-09-16 11:17                       ` Gleb Natapov
2010-09-16 12:13                         ` Michael S. Tsirkin
2010-09-16 12:33                           ` Gleb Natapov
2010-09-16 12:57                             ` Michael S. Tsirkin
2010-09-16 13:14                               ` Avi Kivity
2010-09-16 13:38                                 ` Michael S. Tsirkin
2010-09-16 13:50                                   ` Avi Kivity
2010-09-16 13:55                                   ` Gleb Natapov
2010-09-16 13:18                               ` Gleb Natapov
2010-09-16 13:51                                 ` Michael S. Tsirkin
2010-09-16 14:06                                   ` Gleb Natapov
2010-09-16 14:23                                     ` Michael S. Tsirkin [this message]
2010-09-16 14:51                                       ` Gleb Natapov
2010-09-16 15:24                                         ` Michael S. Tsirkin
2010-09-16 15:43                                           ` Gleb Natapov
2010-09-16 22:07                                             ` Michael S. Tsirkin
2010-09-17  7:59                                               ` Gleb Natapov
2010-09-19 10:45                                                 ` Michael S. Tsirkin
2010-09-19 10:56                                                   ` Avi Kivity
2010-09-19 10:55                                                     ` Michael S. Tsirkin
2010-09-19 11:05                                                       ` Avi Kivity
2010-09-19 11:23                                                         ` Michael S. Tsirkin
2010-09-19 11:18                                                       ` Gleb Natapov
2010-09-19 11:21                                                         ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100916142335.GC24850@redhat.com \
    --to=mst@redhat.com \
    --cc=avi@redhat.com \
    --cc=gleb@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.