From mboxrd@z Thu Jan  1 00:00:00 1970
From: Scott Wood <scottwood@freescale.com>
Subject: Re: [RFC PATCH 1/6] kvm: add device control API
Date: Wed, 20 Feb 2013 20:05:12 -0600
Message-ID: <1361412312.31212.18@snotra>
References: <20130220130949.GN3600@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1;
	delsp=Yes	format=Flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Alexander Graf <agraf@suse.de>, <kvm-ppc@vger.kernel.org>,
	<kvm@vger.kernel.org>, Christoffer Dall <cdall@cs.columbia.edu>
To: Gleb Natapov <gleb@redhat.com>
Return-path: <kvm-ppc-owner@vger.kernel.org>
In-Reply-To: <20130220130949.GN3600@redhat.com> (from gleb@redhat.com on Wed
	Feb 20 07:09:49 2013)
Content-Disposition: inline
Sender: kvm-ppc-owner@vger.kernel.org
List-Id: kvm.vger.kernel.org

  On 02/20/2013 07:09:49 AM, Gleb Natapov wrote:
> On Tue, Feb 19, 2013 at 03:16:37PM -0600, Scott Wood wrote:
> > On 02/19/2013 06:24:18 AM, Gleb Natapov wrote:
> > >On Mon, Feb 18, 2013 at 05:01:40PM -0600, Scott Wood wrote:
> > >> The ability to set/get attributes is needed.  Sorry, but "get or=
 =20
> set
> > >> one blob of data, up to 512 bytes, for the entire irqchip" is =20
> just
> > >> not good enough -- assuming you don't want us to start sticking
> > >> pointers and commands in *that* data. :-)
> > >>
> > >Proposed interface sticks pointers into ioctl data, so why doing
> > >the same
> > >for KVM_SET_IRQCHIP/KVM_GET_IRQCHIP makes you smile.
> >
> > There's a difference between putting a pointer in an ioctl control
> > structure that is specifically documented as being that way (as in
> > ONE_REG), versus taking an ioctl that claims to be setting/getting =
a
> > blob of state and embedding pointers in it.  It would be like
> > sticking a pointer in the attribute payload of this API, which I
> > think is something to be discouraged.
> If documentation is what differentiate for you between silly and smar=
t
> then write documentation instead of new interfaces.

You mean like what we did with SREGS, that got deprecated and replaced =
=20
with ONE_REG?

How is writing documentation not creating new interfaces, if the =20
documentation is different from what the interface is currently =20
understood to do?
Note that Marcelo seems to view KVM_SET_IRQCHIP as effectively being a =
=20
device reset, which is rather different.

> KVM_SET_IRQCHIP/KVM_GET_IRQCHIP is defined to operate on blob of data=
 =20
> on
> x86, nothing prevent you from adding MPIC specifics to the interface,
> Add mpic state into kvm_irqchip structure and if 512 bytes is not =20
> enough
> for you to transfer the state put pointers there and _document_ them.

So basically, you want me to keep this interface but share the ioctl =20
number with an older interface? :-P

> But with 512 bytes you can transfer properties inline, so you probabl=
y
> do not need pointer there anyway. I see you have three properties 2 o=
f
> them 32bit and one 64bit.

Three *groups* of properties.  One of the property groups is per =20
source, and we can have hundreds of sources.  Another exposes the =20
register space, which is 64 KiB (admittedly it's somewhat sparse, but =20
there's more than 512 bytes of real data in there).  And we don't =20
necessarily want to set *everything*.

> >                                        It'd also be using
> > KVM_SET_IRQCHIP to read data, which is the sort of thing you object
> > to later on regarding KVM_IRQ_LINE_STATUS.
> >
> Do not see why.

It's either that, or have the data direction of the "chip" field in =20
KVM_GET_IRQCHIP not be entirely in the "get" direction.

> > Then there's the silliness of transporting 512 bytes just to read a
> > descriptor for transporting something else.
> >
> Yes, agree. But is this enough of a reason to introduce entirely new
> interface? Is it on performance critical path? Doubt it, unless you
> abuse the interface to send interrupts, but then isn't it silty to
> do copy_from_user() twice to inject an interrupt like proposed =20
> interface
> does?

It should probably be get_user() instead, which is pretty fast in the =20
absence of a fault.

> > >For signaling irqs (I think this is what you mean by "commands")
> > >we have KVM_IRQ_LINE.
> >
> > It's one type of command.  Another is setting the address.  Another
> > is writing to registers that have side effects (this is how MSI
> > injection is done on MPIC, just as in real hardware).
> >
> Setting the address is setting an attribute. Sending MSI is a command=
=2E
> Things you set/get during init/migration are attributes. Things you d=
o
> to cause side-effects are commands.

What if I set the address at a time that isn't init/migration (the =20
hardware allows moving it, like a PCI BAR)?  Suddenly it becomes not an=
 =20
attribute due to how the caller uses it?

> > What is the benefit of KVM_IRQ_LINE over what MPIC does?  What real
> > (non-glue/wrapper) code can become common?
> >
> No new ioctl with exactly same result (well actually even faster sinc=
e
> less copying is done).

Which ioctl would go away?

> You need to show us the benefits of the new interface
> vs existing one, not vice versa.

Well, as I said to Marcello, the real reason why we probably need to =20
use the existing interface is irqfd.  That doesn't make the device =20
control stuff go away.

> > And I really hope you don't want us to do MSIs the x86 way.
> >
> What is wrong with KVM_SIGNAL_MSI? Except that you'll need code to =20
> glue it
> to mpic.

We'll have to write extra code for it compared to the current way where=
 =20
it works with *zero* code beyond what is wanted for other purposes such=
 =20
as debug and (eventually) migration.  At least it's more direct than =20
having to establish a GSI route...

> > In the XICS thread, Paul brought up the possibliity of cascaded
> > MPICs.  It's not relevant to the systems we're trying to model, but
> > if one did want to use the in-kernel irqchip interface for that, it
> > would be really nice to be able to operate on a specific MPIC for
> > injection rather than have to come up with some sort of global
> > identifier (above and beyond the minor flattening we'd need to do t=
o
> > represent a single MPIC's interrupts in a flat numberspace).
> >
> ARM encodes information in irq field of KVM_IRQ_LINE like that:
>  =A0bits:  | 31 ... 24 | 23  ... 16 | 15    ...     0 |
>   field: | irq_type  | vcpu_index |   irq_number    |
> Why will similar approach not work?

Well, it conflicts with the GSI routing stuff, and I don't see an irq =20
chip ID field...

But otherwise (or assuming you mean to use such an encoding when =20
setting up a GSI route), I didn't say this part couldn't be made to =20
work.  It will require new kernel code for managing a GSI table in a =20
non-APIC way, and a new callback into the device code, but as I've said=
 =20
elsewhere I think we need it for irqfd anyway.  If I use KVM_IRQ_LINE =20
for injecting interrupts, do you still object to the rest of it?

> > Could an error return be used for cases where the IRQ was not
> > delivered, in the very unlikely event that we want to implement
> > something similar on MPIC?
> We can, but I do not think it will be good API. This condition is not=
 =20
> an
> error.

-EBUSY seems appropriate enough...

> >                            Note again that MPIC's decision to use
> > or not use KVM_IRQ_LINE is only about what MPIC does; it is not
> > inherent in the device control API.
> That's the crux of the problem though. MPIC tries to be different jus=
t
> for the sake to be different. Why? The only explanation you provide i=
s
> because current API is "silly", not that you cannot implement MPIC =20
> with
> it or it will be unnecessary slow, just "silly".

It's not about "silliness" as that this new thing I added for other =20
reasons did the job just as well (again, except when it comes to =20
irqfd), and avoided the need for a GSI table, etc.  IRQ injection was =20
not the main point of the new interface.

> > >Other devices may get other commands that need
> > >response, so if we design generic interface we should take it into
> > >account. I think using KVM_SET_DEVICE_ATTR to inject interrupts is=
 =20
> a
> > >misnomer, you do not set internal device attribute, you toggle
> > >external
> > >input. My be another ioctl KVM_SEND_DEVICE_COMMAND is needed.
> >
> > I see no need for a separate ioctl in terms of the underlying
> > infrastructure for distinguishing "attribute" from "write-only
> > command".  I'm open to improvements on what the ioctl is called.
> > It's basically like setting a register on a device, except I was
> > concerned that if we actually called it a "register" that people
> > would take it too literally and think it's only for the architected
> > register state of the emulated device.
> I agree "attribute" is better name than "register", but injecting
> interrupt is not setting an attribute.

It's a dynamic attribute -- the state of the input line.  Better names =
=20
are welcome.  I don't see this difference as enough to warrant separate=
 =20
ioctls.

> > >> >ARM vGIC code, that is ready to go upstream, uses old way too. =
=20
> So
> > >> >it will
> > >> >be 2 archs against one.
> > >>
> > >> I wasn't aware that that's how it worked. :-P
> > >>
> > >What worked? That vGIC uses existing interface or that non generic
> > >interface used by many arches wins generic one used by only one =20
> arch?
> >
> > The latter.  Two wrongs don't make a right, and adding another
> > inextensible, device-specific API is not the answer to the existing
> > APIs being too inextensible and device/arch-specific.  Some portion
> > will always need to be device-specific because we're controlling th=
e
> > creation and of a specific device, but the glue does not need to be=
=2E
> >
> This is not "adding another inextensible, device-specific API" vs =20
> "adding
> cool generic extensible API" though. It is "using existing =20
> inextensible,
> device-specific API" vs "adding cool generic extensible API".

The "existing inextensible device-specific API" doesn't have support =20
for this "specific device".  Something new has to be added one way or =20
another.

> > >APIs are easy to add and impossible to remove.
> >
> > That's why I want to get it right this time.
> >
> And what if you'll fail?

That's always a possibility of course.  I don't think that's a good =20
reason to avoid trying to move in the right direction.

> What if next architecture will bring new
> developer that will proclaim your new interface "silly" since it does=
 =20
> not
> allow for device destruction and do not return file descriptor for =20
> newly
> created device that userspace can do select on to wait for a device's
> events or mmap memory for fast userspace/device communication?

The device id that gets returned is arbitrary; you could turn it into =20
an fd later with no loss of compatibility.

Device destruction would complicate things and I would not support =20
requiring all devices to allow it.  If someone wanted to add it for =20
certain devices, at the interface level it would just be a new ioctl.

-Scott

From mboxrd@z Thu Jan  1 00:00:00 1970
From: Scott Wood <scottwood@freescale.com>
Date: Thu, 21 Feb 2013 02:05:12 +0000
Subject: Re: [RFC PATCH 1/6] kvm: add device control API
Message-Id: <1361412312.31212.18@snotra>
List-Id: <kvm-ppc.vger.kernel.org>
References: <20130220130949.GN3600@redhat.com>
In-Reply-To: <20130220130949.GN3600@redhat.com> (from gleb@redhat.com on Wed
	Feb 20 07:09:49 2013)
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
To: Gleb Natapov <gleb@redhat.com>
Cc: Alexander Graf <agraf@suse.de>, kvm-ppc@vger.kernel.org, kvm@vger.kernel.org, Christoffer Dall <cdall@cs.columbia.edu>

  On 02/20/2013 07:09:49 AM, Gleb Natapov wrote:
> On Tue, Feb 19, 2013 at 03:16:37PM -0600, Scott Wood wrote:
> > On 02/19/2013 06:24:18 AM, Gleb Natapov wrote:
> > >On Mon, Feb 18, 2013 at 05:01:40PM -0600, Scott Wood wrote:
> > >> The ability to set/get attributes is needed.  Sorry, but "get or =20
> set
> > >> one blob of data, up to 512 bytes, for the entire irqchip" is =20
> just
> > >> not good enough -- assuming you don't want us to start sticking
> > >> pointers and commands in *that* data. :-)
> > >>
> > >Proposed interface sticks pointers into ioctl data, so why doing
> > >the same
> > >for KVM_SET_IRQCHIP/KVM_GET_IRQCHIP makes you smile.
> >
> > There's a difference between putting a pointer in an ioctl control
> > structure that is specifically documented as being that way (as in
> > ONE_REG), versus taking an ioctl that claims to be setting/getting a
> > blob of state and embedding pointers in it.  It would be like
> > sticking a pointer in the attribute payload of this API, which I
> > think is something to be discouraged.
> If documentation is what differentiate for you between silly and smart
> then write documentation instead of new interfaces.

You mean like what we did with SREGS, that got deprecated and replaced =20
with ONE_REG?

How is writing documentation not creating new interfaces, if the =20
documentation is different from what the interface is currently =20
understood to do?
Note that Marcelo seems to view KVM_SET_IRQCHIP as effectively being a =20
device reset, which is rather different.

> KVM_SET_IRQCHIP/KVM_GET_IRQCHIP is defined to operate on blob of data =20
> on
> x86, nothing prevent you from adding MPIC specifics to the interface,
> Add mpic state into kvm_irqchip structure and if 512 bytes is not =20
> enough
> for you to transfer the state put pointers there and _document_ them.

So basically, you want me to keep this interface but share the ioctl =20
number with an older interface? :-P

> But with 512 bytes you can transfer properties inline, so you probably
> do not need pointer there anyway. I see you have three properties 2 of
> them 32bit and one 64bit.

Three *groups* of properties.  One of the property groups is per =20
source, and we can have hundreds of sources.  Another exposes the =20
register space, which is 64 KiB (admittedly it's somewhat sparse, but =20
there's more than 512 bytes of real data in there).  And we don't =20
necessarily want to set *everything*.

> >                                        It'd also be using
> > KVM_SET_IRQCHIP to read data, which is the sort of thing you object
> > to later on regarding KVM_IRQ_LINE_STATUS.
> >
> Do not see why.

It's either that, or have the data direction of the "chip" field in =20
KVM_GET_IRQCHIP not be entirely in the "get" direction.

> > Then there's the silliness of transporting 512 bytes just to read a
> > descriptor for transporting something else.
> >
> Yes, agree. But is this enough of a reason to introduce entirely new
> interface? Is it on performance critical path? Doubt it, unless you
> abuse the interface to send interrupts, but then isn't it silty to
> do copy_from_user() twice to inject an interrupt like proposed =20
> interface
> does?

It should probably be get_user() instead, which is pretty fast in the =20
absence of a fault.

> > >For signaling irqs (I think this is what you mean by "commands")
> > >we have KVM_IRQ_LINE.
> >
> > It's one type of command.  Another is setting the address.  Another
> > is writing to registers that have side effects (this is how MSI
> > injection is done on MPIC, just as in real hardware).
> >
> Setting the address is setting an attribute. Sending MSI is a command.
> Things you set/get during init/migration are attributes. Things you do
> to cause side-effects are commands.

What if I set the address at a time that isn't init/migration (the =20
hardware allows moving it, like a PCI BAR)?  Suddenly it becomes not an =20
attribute due to how the caller uses it?

> > What is the benefit of KVM_IRQ_LINE over what MPIC does?  What real
> > (non-glue/wrapper) code can become common?
> >
> No new ioctl with exactly same result (well actually even faster since
> less copying is done).

Which ioctl would go away?

> You need to show us the benefits of the new interface
> vs existing one, not vice versa.

Well, as I said to Marcello, the real reason why we probably need to =20
use the existing interface is irqfd.  That doesn't make the device =20
control stuff go away.

> > And I really hope you don't want us to do MSIs the x86 way.
> >
> What is wrong with KVM_SIGNAL_MSI? Except that you'll need code to =20
> glue it
> to mpic.

We'll have to write extra code for it compared to the current way where =20
it works with *zero* code beyond what is wanted for other purposes such =20
as debug and (eventually) migration.  At least it's more direct than =20
having to establish a GSI route...

> > In the XICS thread, Paul brought up the possibliity of cascaded
> > MPICs.  It's not relevant to the systems we're trying to model, but
> > if one did want to use the in-kernel irqchip interface for that, it
> > would be really nice to be able to operate on a specific MPIC for
> > injection rather than have to come up with some sort of global
> > identifier (above and beyond the minor flattening we'd need to do to
> > represent a single MPIC's interrupts in a flat numberspace).
> >
> ARM encodes information in irq field of KVM_IRQ_LINE like that:
>  =A0bits:  | 31 ... 24 | 23  ... 16 | 15    ...     0 |
>   field: | irq_type  | vcpu_index |   irq_number    |
> Why will similar approach not work?

Well, it conflicts with the GSI routing stuff, and I don't see an irq =20
chip ID field...

But otherwise (or assuming you mean to use such an encoding when =20
setting up a GSI route), I didn't say this part couldn't be made to =20
work.  It will require new kernel code for managing a GSI table in a =20
non-APIC way, and a new callback into the device code, but as I've said =20
elsewhere I think we need it for irqfd anyway.  If I use KVM_IRQ_LINE =20
for injecting interrupts, do you still object to the rest of it?

> > Could an error return be used for cases where the IRQ was not
> > delivered, in the very unlikely event that we want to implement
> > something similar on MPIC?
> We can, but I do not think it will be good API. This condition is not =20
> an
> error.

-EBUSY seems appropriate enough...

> >                            Note again that MPIC's decision to use
> > or not use KVM_IRQ_LINE is only about what MPIC does; it is not
> > inherent in the device control API.
> That's the crux of the problem though. MPIC tries to be different just
> for the sake to be different. Why? The only explanation you provide is
> because current API is "silly", not that you cannot implement MPIC =20
> with
> it or it will be unnecessary slow, just "silly".

It's not about "silliness" as that this new thing I added for other =20
reasons did the job just as well (again, except when it comes to =20
irqfd), and avoided the need for a GSI table, etc.  IRQ injection was =20
not the main point of the new interface.

> > >Other devices may get other commands that need
> > >response, so if we design generic interface we should take it into
> > >account. I think using KVM_SET_DEVICE_ATTR to inject interrupts is =20
> a
> > >misnomer, you do not set internal device attribute, you toggle
> > >external
> > >input. My be another ioctl KVM_SEND_DEVICE_COMMAND is needed.
> >
> > I see no need for a separate ioctl in terms of the underlying
> > infrastructure for distinguishing "attribute" from "write-only
> > command".  I'm open to improvements on what the ioctl is called.
> > It's basically like setting a register on a device, except I was
> > concerned that if we actually called it a "register" that people
> > would take it too literally and think it's only for the architected
> > register state of the emulated device.
> I agree "attribute" is better name than "register", but injecting
> interrupt is not setting an attribute.

It's a dynamic attribute -- the state of the input line.  Better names =20
are welcome.  I don't see this difference as enough to warrant separate =20
ioctls.

> > >> >ARM vGIC code, that is ready to go upstream, uses old way too. =20
> So
> > >> >it will
> > >> >be 2 archs against one.
> > >>
> > >> I wasn't aware that that's how it worked. :-P
> > >>
> > >What worked? That vGIC uses existing interface or that non generic
> > >interface used by many arches wins generic one used by only one =20
> arch?
> >
> > The latter.  Two wrongs don't make a right, and adding another
> > inextensible, device-specific API is not the answer to the existing
> > APIs being too inextensible and device/arch-specific.  Some portion
> > will always need to be device-specific because we're controlling the
> > creation and of a specific device, but the glue does not need to be.
> >
> This is not "adding another inextensible, device-specific API" vs =20
> "adding
> cool generic extensible API" though. It is "using existing =20
> inextensible,
> device-specific API" vs "adding cool generic extensible API".

The "existing inextensible device-specific API" doesn't have support =20
for this "specific device".  Something new has to be added one way or =20
another.

> > >APIs are easy to add and impossible to remove.
> >
> > That's why I want to get it right this time.
> >
> And what if you'll fail?

That's always a possibility of course.  I don't think that's a good =20
reason to avoid trying to move in the right direction.

> What if next architecture will bring new
> developer that will proclaim your new interface "silly" since it does =20
> not
> allow for device destruction and do not return file descriptor for =20
> newly
> created device that userspace can do select on to wait for a device's
> events or mmap memory for fast userspace/device communication?

The device id that gets returned is arbitrary; you could turn it into =20
an fd later with no loss of compatibility.

Device destruction would complicate things and I would not support =20
requiring all devices to allow it.  If someone wanted to add it for =20
certain devices, at the interface level it would just be a new ioctl.

-Scott