All of lore.kernel.org
 help / color / mirror / Atom feed
From: Gleb Natapov <gleb@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Avi Kivity <avi@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC] kvm: enable irq injection from interrupt context
Date: Thu, 16 Sep 2010 16:51:17 +0200	[thread overview]
Message-ID: <20100916145117.GK3008@redhat.com> (raw)
In-Reply-To: <20100916142335.GC24850@redhat.com>

On Thu, Sep 16, 2010 at 04:23:35PM +0200, Michael S. Tsirkin wrote:
> > > > > There is no such thing as device's line status on real hardware, either.
> > > > > Devices do not drive INT# high: they drive it low (all the time)
> > > > > or do not drive it at all.
> > > > Same thing, other naming. Device either drive it low (irq_set(1)) or
> > > > not (irq_set(0)).
> > > 
> > > Well, if I drive it low any number of times it should hae no effect.
> > > 
> > There is no meaning of "driving low the line multiple times" on real HW.
> > You either drive it low or not. We try to emulate this with individual
> > "drive low/high" events in software.
> > > > > 
> > > > > Or consider express, the spec explicitly says:
> > > > > "Note: Duplicate Assert_INTx/Deassert_INTx Messages have no effect, but
> 
> Express has assert and deassert messages. It might be easier for
> you to think in these terms. level 1: assert, level 0: deassert.
> Seems a simple model and what we do models this pretty well.
> 
They are between device and pci controller. I am talking about what
happens between pci controller and irq chip. We are talking about
different things really.

> > > > >  are not errors."
> > > > > 
> > > > > > > 
> > > > > > > > > Another application is out of process virtio (sandboxing!).
> > > > > > > > It will still assert and de-assert irq at the same code, so it will be
> > > > > > > > able to track irq line status.
> > > > > > > > 
> > > > > > > > > Again, pci stuff needs to stay in qemu.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Nothing to do with PCI whatsoever.
> > > > > > > > 
> > > > > > > > [0] most qemu devices behave incorrectly and trigger level irq more then
> > > > > > > >     needed.
> > > > > > > 
> > > > > > > Which devices?
> > > > > > Most of them. They just call update_irq_status() or something and
> > > > > > re-assert interrupt regardless of what previous status was.
> > > > > 
> > > > > At least for PCI devices, these calls do nothing if status does not change.
> > > > I am not sure what code are you locking at. e1000 device emulation
> > > > doesn't check previous line status before calling qemu_set_irq().
> > > 
> > > Right. If you dig through useless levels of indirection, you will
> > > see that each PCI device has 4 pin levels, when one of these
> > > changes this makes it up level to the parent bus, and so on.
> > Yes. Qemu PCI level does it right. Ideally device would not even invoke 
> > this logic if interrupt status haven't changed.
> 
> It needs to call *some* function to check status
> and assert, right? qemu_set_irq is that function.
qemu_set_irq does not check previous level and calls a callback
unconditionally.

> 
> > > 
> > > > > 
> > > > > > > pci core tracks line status and will never assert the same
> > > > > > > line multiple times.
> > > > > > That's good if pci core does this, but device shouldn't even try it.
> > > > > 
> > > > > I disagree. We don't want to duplicate a ton of code all over
> > > > > the codebase.
> > > > > 
> > > > So abstract it into a function. It shouldn't be part of PCI emulation.
> > > 
> > > I don't know what you mean by this, send a patch and we can discuss?
> > I don't care enough to send patch.  Just remember previous irq status
> > and do not call qemu_set_irq() if it doesn't change. Three lines of
> > code.
> 
> Heh, we have a ton of devices to support.
So?

> And then we need to migrate this extra status, and make sure it's in
> sync with PCI code.  We'll end up with much more more than 3 lines all
> of it in a very sensitive and hard to test parts code.
> 
You should be able to reconstruct it from device state. What should be
in sync with PCI code? 

> > > Note that when I patches PCI interrupt handling for compliance
> > > I made it mimic hardware as closely as possible: devices
> > > can send any # of assert/deassert messages, bus discards duplicates.
> > > 
> > Qemu PCI code is correct as far as I can see. Not all devices are connected
> > via PCI and there is not need to go through couple of layer of
> > indirection to figure out that nothing should be done.
> > 
> 
> If we want to remove the indirection I would be much more
> interested to remove it for all cases, not just when
> nothing should be done.
I don't care. This indirection may be justified for all I know. You try to
shift this discussion to areas I am not interested to look into :) All I
am saying is that each device is capable of knowing its current irq line
state and optimize out function call + additional logic. Whether upper
layer should handle two asserts without de-assert in between is different
point and I think we agree on it.

> 
> > > > > > > 
> > > > > > > > [1] this is how correct PCI device should behave but we override
> > > > > > > >     polarity in ACPI, but now incorrect behaviour is deeply designed
> > > > > > > >     into vhost-net.
> > > > > > > 
> > > > > > > Not really, vhost net signals an eventfd. What happens then is
> > > > > > > up to kvm.
> > > > > > > 
> > > > > > That is what current broken design does and it works, but if you want to
> > > > > > save unneeded calls into kvm fix design.
> > > > > 
> > > > > The interface seems clean enough: vhost handles virtio ring, qemu/kvm handle pci.
> > > > > Making vhost aware of pci breaks this, I would not call that fixing the
> > > > > design.
> > > > > 
> > > > Once again. Nothing to do with PCI, everything to do with device
> > > > emulation. And I propose to abstract interrupt assertion part into
> > > > irqfd, not into vhost.
> > > > 
> > > > --
> > > > 			Gleb.
> > > 
> > > This could work. KVM would need to find all irqfd
> > > objects mapped to gsi and notify them on deassert.
> > > No idea how hard this is.
> > > 
> > What for? Device emulation should do de-assert.
> 
> Sorry, but at this point I have no idea what you call device emulation.
The same thing everyone calls device emulation. In case of virtio-net it
is in hw/virtio-net.c. If vhost-net is in use device emulation is split
between userspace and kernel, but it is still just device emulation.

> qemu has code to de-assert. vhost has code to assert.
Good. So qemu will de-assert. So what do you mean by
"KVM would need to find all irqfd objects mapped to gsi and notify
them on deassert"

> I would like to optimize level interrupts and stop driving
> scheduler insane if at all possible.
> 
Worthy goal. Do it in irqfd. Irqfd shouldn't call kvm_set_irq() if irq
level hasn't changed.

--
			Gleb.

  reply	other threads:[~2010-09-16 14:51 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-15 18:54 [PATCH RFC] kvm: enable irq injection from interrupt context Michael S. Tsirkin
2010-09-16  9:02 ` Gleb Natapov
2010-09-16  9:10   ` Michael S. Tsirkin
2010-09-16  9:25     ` Gleb Natapov
2010-09-16  9:43       ` Michael S. Tsirkin
2010-09-16  9:46       ` Avi Kivity
2010-09-16  9:53         ` Michael S. Tsirkin
2010-09-16 10:13           ` Gleb Natapov
2010-09-16 10:13             ` Michael S. Tsirkin
2010-09-16 10:20               ` Gleb Natapov
2010-09-16 10:44                 ` Michael S. Tsirkin
2010-09-16 10:54                   ` Gleb Natapov
2010-09-16 10:53                     ` Michael S. Tsirkin
2010-09-16 11:17                       ` Gleb Natapov
2010-09-16 12:13                         ` Michael S. Tsirkin
2010-09-16 12:33                           ` Gleb Natapov
2010-09-16 12:57                             ` Michael S. Tsirkin
2010-09-16 13:14                               ` Avi Kivity
2010-09-16 13:38                                 ` Michael S. Tsirkin
2010-09-16 13:50                                   ` Avi Kivity
2010-09-16 13:55                                   ` Gleb Natapov
2010-09-16 13:18                               ` Gleb Natapov
2010-09-16 13:51                                 ` Michael S. Tsirkin
2010-09-16 14:06                                   ` Gleb Natapov
2010-09-16 14:23                                     ` Michael S. Tsirkin
2010-09-16 14:51                                       ` Gleb Natapov [this message]
2010-09-16 15:24                                         ` Michael S. Tsirkin
2010-09-16 15:43                                           ` Gleb Natapov
2010-09-16 22:07                                             ` Michael S. Tsirkin
2010-09-17  7:59                                               ` Gleb Natapov
2010-09-19 10:45                                                 ` Michael S. Tsirkin
2010-09-19 10:56                                                   ` Avi Kivity
2010-09-19 10:55                                                     ` Michael S. Tsirkin
2010-09-19 11:05                                                       ` Avi Kivity
2010-09-19 11:23                                                         ` Michael S. Tsirkin
2010-09-19 11:18                                                       ` Gleb Natapov
2010-09-19 11:21                                                         ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100916145117.GK3008@redhat.com \
    --to=gleb@redhat.com \
    --cc=avi@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=mtosatti@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.