From mboxrd@z Thu Jan 1 00:00:00 1970 From: Scott Wood Subject: Re: [RFC PATCH 1/6] kvm: add device control API Date: Wed, 20 Feb 2013 20:05:12 -0600 Message-ID: <1361412312.31212.18@snotra> References: <20130220130949.GN3600@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; delsp=Yes format=Flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Alexander Graf , , , Christoffer Dall To: Gleb Natapov Return-path: In-Reply-To: <20130220130949.GN3600@redhat.com> (from gleb@redhat.com on Wed Feb 20 07:09:49 2013) Content-Disposition: inline Sender: kvm-ppc-owner@vger.kernel.org List-Id: kvm.vger.kernel.org On 02/20/2013 07:09:49 AM, Gleb Natapov wrote: > On Tue, Feb 19, 2013 at 03:16:37PM -0600, Scott Wood wrote: > > On 02/19/2013 06:24:18 AM, Gleb Natapov wrote: > > >On Mon, Feb 18, 2013 at 05:01:40PM -0600, Scott Wood wrote: > > >> The ability to set/get attributes is needed. Sorry, but "get or= =20 > set > > >> one blob of data, up to 512 bytes, for the entire irqchip" is =20 > just > > >> not good enough -- assuming you don't want us to start sticking > > >> pointers and commands in *that* data. :-) > > >> > > >Proposed interface sticks pointers into ioctl data, so why doing > > >the same > > >for KVM_SET_IRQCHIP/KVM_GET_IRQCHIP makes you smile. > > > > There's a difference between putting a pointer in an ioctl control > > structure that is specifically documented as being that way (as in > > ONE_REG), versus taking an ioctl that claims to be setting/getting = a > > blob of state and embedding pointers in it. It would be like > > sticking a pointer in the attribute payload of this API, which I > > think is something to be discouraged. > If documentation is what differentiate for you between silly and smar= t > then write documentation instead of new interfaces. You mean like what we did with SREGS, that got deprecated and replaced = =20 with ONE_REG? How is writing documentation not creating new interfaces, if the =20 documentation is different from what the interface is currently =20 understood to do? Note that Marcelo seems to view KVM_SET_IRQCHIP as effectively being a = =20 device reset, which is rather different. > KVM_SET_IRQCHIP/KVM_GET_IRQCHIP is defined to operate on blob of data= =20 > on > x86, nothing prevent you from adding MPIC specifics to the interface, > Add mpic state into kvm_irqchip structure and if 512 bytes is not =20 > enough > for you to transfer the state put pointers there and _document_ them. So basically, you want me to keep this interface but share the ioctl =20 number with an older interface? :-P > But with 512 bytes you can transfer properties inline, so you probabl= y > do not need pointer there anyway. I see you have three properties 2 o= f > them 32bit and one 64bit. Three *groups* of properties. One of the property groups is per =20 source, and we can have hundreds of sources. Another exposes the =20 register space, which is 64 KiB (admittedly it's somewhat sparse, but =20 there's more than 512 bytes of real data in there). And we don't =20 necessarily want to set *everything*. > > It'd also be using > > KVM_SET_IRQCHIP to read data, which is the sort of thing you object > > to later on regarding KVM_IRQ_LINE_STATUS. > > > Do not see why. It's either that, or have the data direction of the "chip" field in =20 KVM_GET_IRQCHIP not be entirely in the "get" direction. > > Then there's the silliness of transporting 512 bytes just to read a > > descriptor for transporting something else. > > > Yes, agree. But is this enough of a reason to introduce entirely new > interface? Is it on performance critical path? Doubt it, unless you > abuse the interface to send interrupts, but then isn't it silty to > do copy_from_user() twice to inject an interrupt like proposed =20 > interface > does? It should probably be get_user() instead, which is pretty fast in the =20 absence of a fault. > > >For signaling irqs (I think this is what you mean by "commands") > > >we have KVM_IRQ_LINE. > > > > It's one type of command. Another is setting the address. Another > > is writing to registers that have side effects (this is how MSI > > injection is done on MPIC, just as in real hardware). > > > Setting the address is setting an attribute. Sending MSI is a command= =2E > Things you set/get during init/migration are attributes. Things you d= o > to cause side-effects are commands. What if I set the address at a time that isn't init/migration (the =20 hardware allows moving it, like a PCI BAR)? Suddenly it becomes not an= =20 attribute due to how the caller uses it? > > What is the benefit of KVM_IRQ_LINE over what MPIC does? What real > > (non-glue/wrapper) code can become common? > > > No new ioctl with exactly same result (well actually even faster sinc= e > less copying is done). Which ioctl would go away? > You need to show us the benefits of the new interface > vs existing one, not vice versa. Well, as I said to Marcello, the real reason why we probably need to =20 use the existing interface is irqfd. That doesn't make the device =20 control stuff go away. > > And I really hope you don't want us to do MSIs the x86 way. > > > What is wrong with KVM_SIGNAL_MSI? Except that you'll need code to =20 > glue it > to mpic. We'll have to write extra code for it compared to the current way where= =20 it works with *zero* code beyond what is wanted for other purposes such= =20 as debug and (eventually) migration. At least it's more direct than =20 having to establish a GSI route... > > In the XICS thread, Paul brought up the possibliity of cascaded > > MPICs. It's not relevant to the systems we're trying to model, but > > if one did want to use the in-kernel irqchip interface for that, it > > would be really nice to be able to operate on a specific MPIC for > > injection rather than have to come up with some sort of global > > identifier (above and beyond the minor flattening we'd need to do t= o > > represent a single MPIC's interrupts in a flat numberspace). > > > ARM encodes information in irq field of KVM_IRQ_LINE like that: > =A0bits: | 31 ... 24 | 23 ... 16 | 15 ... 0 | > field: | irq_type | vcpu_index | irq_number | > Why will similar approach not work? Well, it conflicts with the GSI routing stuff, and I don't see an irq =20 chip ID field... But otherwise (or assuming you mean to use such an encoding when =20 setting up a GSI route), I didn't say this part couldn't be made to =20 work. It will require new kernel code for managing a GSI table in a =20 non-APIC way, and a new callback into the device code, but as I've said= =20 elsewhere I think we need it for irqfd anyway. If I use KVM_IRQ_LINE =20 for injecting interrupts, do you still object to the rest of it? > > Could an error return be used for cases where the IRQ was not > > delivered, in the very unlikely event that we want to implement > > something similar on MPIC? > We can, but I do not think it will be good API. This condition is not= =20 > an > error. -EBUSY seems appropriate enough... > > Note again that MPIC's decision to use > > or not use KVM_IRQ_LINE is only about what MPIC does; it is not > > inherent in the device control API. > That's the crux of the problem though. MPIC tries to be different jus= t > for the sake to be different. Why? The only explanation you provide i= s > because current API is "silly", not that you cannot implement MPIC =20 > with > it or it will be unnecessary slow, just "silly". It's not about "silliness" as that this new thing I added for other =20 reasons did the job just as well (again, except when it comes to =20 irqfd), and avoided the need for a GSI table, etc. IRQ injection was =20 not the main point of the new interface. > > >Other devices may get other commands that need > > >response, so if we design generic interface we should take it into > > >account. I think using KVM_SET_DEVICE_ATTR to inject interrupts is= =20 > a > > >misnomer, you do not set internal device attribute, you toggle > > >external > > >input. My be another ioctl KVM_SEND_DEVICE_COMMAND is needed. > > > > I see no need for a separate ioctl in terms of the underlying > > infrastructure for distinguishing "attribute" from "write-only > > command". I'm open to improvements on what the ioctl is called. > > It's basically like setting a register on a device, except I was > > concerned that if we actually called it a "register" that people > > would take it too literally and think it's only for the architected > > register state of the emulated device. > I agree "attribute" is better name than "register", but injecting > interrupt is not setting an attribute. It's a dynamic attribute -- the state of the input line. Better names = =20 are welcome. I don't see this difference as enough to warrant separate= =20 ioctls. > > >> >ARM vGIC code, that is ready to go upstream, uses old way too. = =20 > So > > >> >it will > > >> >be 2 archs against one. > > >> > > >> I wasn't aware that that's how it worked. :-P > > >> > > >What worked? That vGIC uses existing interface or that non generic > > >interface used by many arches wins generic one used by only one =20 > arch? > > > > The latter. Two wrongs don't make a right, and adding another > > inextensible, device-specific API is not the answer to the existing > > APIs being too inextensible and device/arch-specific. Some portion > > will always need to be device-specific because we're controlling th= e > > creation and of a specific device, but the glue does not need to be= =2E > > > This is not "adding another inextensible, device-specific API" vs =20 > "adding > cool generic extensible API" though. It is "using existing =20 > inextensible, > device-specific API" vs "adding cool generic extensible API". The "existing inextensible device-specific API" doesn't have support =20 for this "specific device". Something new has to be added one way or =20 another. > > >APIs are easy to add and impossible to remove. > > > > That's why I want to get it right this time. > > > And what if you'll fail? That's always a possibility of course. I don't think that's a good =20 reason to avoid trying to move in the right direction. > What if next architecture will bring new > developer that will proclaim your new interface "silly" since it does= =20 > not > allow for device destruction and do not return file descriptor for =20 > newly > created device that userspace can do select on to wait for a device's > events or mmap memory for fast userspace/device communication? The device id that gets returned is arbitrary; you could turn it into =20 an fd later with no loss of compatibility. Device destruction would complicate things and I would not support =20 requiring all devices to allow it. If someone wanted to add it for =20 certain devices, at the interface level it would just be a new ioctl. -Scott From mboxrd@z Thu Jan 1 00:00:00 1970 From: Scott Wood Date: Thu, 21 Feb 2013 02:05:12 +0000 Subject: Re: [RFC PATCH 1/6] kvm: add device control API Message-Id: <1361412312.31212.18@snotra> List-Id: References: <20130220130949.GN3600@redhat.com> In-Reply-To: <20130220130949.GN3600@redhat.com> (from gleb@redhat.com on Wed Feb 20 07:09:49 2013) MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable To: Gleb Natapov Cc: Alexander Graf , kvm-ppc@vger.kernel.org, kvm@vger.kernel.org, Christoffer Dall On 02/20/2013 07:09:49 AM, Gleb Natapov wrote: > On Tue, Feb 19, 2013 at 03:16:37PM -0600, Scott Wood wrote: > > On 02/19/2013 06:24:18 AM, Gleb Natapov wrote: > > >On Mon, Feb 18, 2013 at 05:01:40PM -0600, Scott Wood wrote: > > >> The ability to set/get attributes is needed. Sorry, but "get or =20 > set > > >> one blob of data, up to 512 bytes, for the entire irqchip" is =20 > just > > >> not good enough -- assuming you don't want us to start sticking > > >> pointers and commands in *that* data. :-) > > >> > > >Proposed interface sticks pointers into ioctl data, so why doing > > >the same > > >for KVM_SET_IRQCHIP/KVM_GET_IRQCHIP makes you smile. > > > > There's a difference between putting a pointer in an ioctl control > > structure that is specifically documented as being that way (as in > > ONE_REG), versus taking an ioctl that claims to be setting/getting a > > blob of state and embedding pointers in it. It would be like > > sticking a pointer in the attribute payload of this API, which I > > think is something to be discouraged. > If documentation is what differentiate for you between silly and smart > then write documentation instead of new interfaces. You mean like what we did with SREGS, that got deprecated and replaced =20 with ONE_REG? How is writing documentation not creating new interfaces, if the =20 documentation is different from what the interface is currently =20 understood to do? Note that Marcelo seems to view KVM_SET_IRQCHIP as effectively being a =20 device reset, which is rather different. > KVM_SET_IRQCHIP/KVM_GET_IRQCHIP is defined to operate on blob of data =20 > on > x86, nothing prevent you from adding MPIC specifics to the interface, > Add mpic state into kvm_irqchip structure and if 512 bytes is not =20 > enough > for you to transfer the state put pointers there and _document_ them. So basically, you want me to keep this interface but share the ioctl =20 number with an older interface? :-P > But with 512 bytes you can transfer properties inline, so you probably > do not need pointer there anyway. I see you have three properties 2 of > them 32bit and one 64bit. Three *groups* of properties. One of the property groups is per =20 source, and we can have hundreds of sources. Another exposes the =20 register space, which is 64 KiB (admittedly it's somewhat sparse, but =20 there's more than 512 bytes of real data in there). And we don't =20 necessarily want to set *everything*. > > It'd also be using > > KVM_SET_IRQCHIP to read data, which is the sort of thing you object > > to later on regarding KVM_IRQ_LINE_STATUS. > > > Do not see why. It's either that, or have the data direction of the "chip" field in =20 KVM_GET_IRQCHIP not be entirely in the "get" direction. > > Then there's the silliness of transporting 512 bytes just to read a > > descriptor for transporting something else. > > > Yes, agree. But is this enough of a reason to introduce entirely new > interface? Is it on performance critical path? Doubt it, unless you > abuse the interface to send interrupts, but then isn't it silty to > do copy_from_user() twice to inject an interrupt like proposed =20 > interface > does? It should probably be get_user() instead, which is pretty fast in the =20 absence of a fault. > > >For signaling irqs (I think this is what you mean by "commands") > > >we have KVM_IRQ_LINE. > > > > It's one type of command. Another is setting the address. Another > > is writing to registers that have side effects (this is how MSI > > injection is done on MPIC, just as in real hardware). > > > Setting the address is setting an attribute. Sending MSI is a command. > Things you set/get during init/migration are attributes. Things you do > to cause side-effects are commands. What if I set the address at a time that isn't init/migration (the =20 hardware allows moving it, like a PCI BAR)? Suddenly it becomes not an =20 attribute due to how the caller uses it? > > What is the benefit of KVM_IRQ_LINE over what MPIC does? What real > > (non-glue/wrapper) code can become common? > > > No new ioctl with exactly same result (well actually even faster since > less copying is done). Which ioctl would go away? > You need to show us the benefits of the new interface > vs existing one, not vice versa. Well, as I said to Marcello, the real reason why we probably need to =20 use the existing interface is irqfd. That doesn't make the device =20 control stuff go away. > > And I really hope you don't want us to do MSIs the x86 way. > > > What is wrong with KVM_SIGNAL_MSI? Except that you'll need code to =20 > glue it > to mpic. We'll have to write extra code for it compared to the current way where =20 it works with *zero* code beyond what is wanted for other purposes such =20 as debug and (eventually) migration. At least it's more direct than =20 having to establish a GSI route... > > In the XICS thread, Paul brought up the possibliity of cascaded > > MPICs. It's not relevant to the systems we're trying to model, but > > if one did want to use the in-kernel irqchip interface for that, it > > would be really nice to be able to operate on a specific MPIC for > > injection rather than have to come up with some sort of global > > identifier (above and beyond the minor flattening we'd need to do to > > represent a single MPIC's interrupts in a flat numberspace). > > > ARM encodes information in irq field of KVM_IRQ_LINE like that: > =A0bits: | 31 ... 24 | 23 ... 16 | 15 ... 0 | > field: | irq_type | vcpu_index | irq_number | > Why will similar approach not work? Well, it conflicts with the GSI routing stuff, and I don't see an irq =20 chip ID field... But otherwise (or assuming you mean to use such an encoding when =20 setting up a GSI route), I didn't say this part couldn't be made to =20 work. It will require new kernel code for managing a GSI table in a =20 non-APIC way, and a new callback into the device code, but as I've said =20 elsewhere I think we need it for irqfd anyway. If I use KVM_IRQ_LINE =20 for injecting interrupts, do you still object to the rest of it? > > Could an error return be used for cases where the IRQ was not > > delivered, in the very unlikely event that we want to implement > > something similar on MPIC? > We can, but I do not think it will be good API. This condition is not =20 > an > error. -EBUSY seems appropriate enough... > > Note again that MPIC's decision to use > > or not use KVM_IRQ_LINE is only about what MPIC does; it is not > > inherent in the device control API. > That's the crux of the problem though. MPIC tries to be different just > for the sake to be different. Why? The only explanation you provide is > because current API is "silly", not that you cannot implement MPIC =20 > with > it or it will be unnecessary slow, just "silly". It's not about "silliness" as that this new thing I added for other =20 reasons did the job just as well (again, except when it comes to =20 irqfd), and avoided the need for a GSI table, etc. IRQ injection was =20 not the main point of the new interface. > > >Other devices may get other commands that need > > >response, so if we design generic interface we should take it into > > >account. I think using KVM_SET_DEVICE_ATTR to inject interrupts is =20 > a > > >misnomer, you do not set internal device attribute, you toggle > > >external > > >input. My be another ioctl KVM_SEND_DEVICE_COMMAND is needed. > > > > I see no need for a separate ioctl in terms of the underlying > > infrastructure for distinguishing "attribute" from "write-only > > command". I'm open to improvements on what the ioctl is called. > > It's basically like setting a register on a device, except I was > > concerned that if we actually called it a "register" that people > > would take it too literally and think it's only for the architected > > register state of the emulated device. > I agree "attribute" is better name than "register", but injecting > interrupt is not setting an attribute. It's a dynamic attribute -- the state of the input line. Better names =20 are welcome. I don't see this difference as enough to warrant separate =20 ioctls. > > >> >ARM vGIC code, that is ready to go upstream, uses old way too. =20 > So > > >> >it will > > >> >be 2 archs against one. > > >> > > >> I wasn't aware that that's how it worked. :-P > > >> > > >What worked? That vGIC uses existing interface or that non generic > > >interface used by many arches wins generic one used by only one =20 > arch? > > > > The latter. Two wrongs don't make a right, and adding another > > inextensible, device-specific API is not the answer to the existing > > APIs being too inextensible and device/arch-specific. Some portion > > will always need to be device-specific because we're controlling the > > creation and of a specific device, but the glue does not need to be. > > > This is not "adding another inextensible, device-specific API" vs =20 > "adding > cool generic extensible API" though. It is "using existing =20 > inextensible, > device-specific API" vs "adding cool generic extensible API". The "existing inextensible device-specific API" doesn't have support =20 for this "specific device". Something new has to be added one way or =20 another. > > >APIs are easy to add and impossible to remove. > > > > That's why I want to get it right this time. > > > And what if you'll fail? That's always a possibility of course. I don't think that's a good =20 reason to avoid trying to move in the right direction. > What if next architecture will bring new > developer that will proclaim your new interface "silly" since it does =20 > not > allow for device destruction and do not return file descriptor for =20 > newly > created device that userspace can do select on to wait for a device's > events or mmap memory for fast userspace/device communication? The device id that gets returned is arbitrary; you could turn it into =20 an fd later with no loss of compatibility. Device destruction would complicate things and I would not support =20 requiring all devices to allow it. If someone wanted to add it for =20 certain devices, at the interface level it would just be a new ioctl. -Scott