* XenProject/XenServer QEMU working group, Friday 8th July, 2016, 15:00.
@ 2016-07-28 17:01 Jennifer Herbert
  2016-08-01 11:32 ` Device model operation hypercall (DMOP, re qemu depriv) Ian Jackson
  2016-09-09 16:18 ` XenProject/XenServer QEMU working group minutes, 30th August 2016 Jennifer Herbert
  0 siblings, 2 replies; 74+ messages in thread

From: Jennifer Herbert @ 2016-07-28 17:01 UTC (permalink / raw)
To: xen-devel; +Cc: Andrew Cooper, Ian Jackson, David Vrabel

[-- Attachment #2: qemu-col-notes.txt --]

XenProject/XenServer QEMU working group, Friday 8th July, 2016, 15:00.

Date and Attendees
==================

XenProject/XenServer QEMU working group, held on Friday the 8th of
July, 2016 at 15:00.

The following were present:
 Ian Jackson      [platform team]
 David Vrabel     [ring0]
 Andrew Cooper    [ring0]
 Jennifer Herbert [ring0]

Purpose of meeting
==================

Both XenServer (currently using qemu-trad with de-priv) and XenProject
(using QEMU (upstream) without de-priv) would like to move to using
QEMU (upstream) de-privileged.  This meeting was intended to restart
XenServer/ring0 and XenProject's collaboration.

Agenda includes:
* Discuss requirements.
* Review our status and strategy for achieving these requirements.
* Agree next steps.

Meeting Actions
===============

Ian: Find all the Xenstore keys used by QEMU, and evaluate how much
work it would be to either read them before privileges are dropped, or
otherwise stop reading/writing them when de-privileged.

Ian: Write up the DM-op design (discussed during meeting); talk to Jan
about this.

XenServer: Go through all the priv-cmd ops; check how they would fit
within the DM-op design.

David: Post event channel / priv command patch to upstream.
(Now done)

Meeting Transcript - abridged
=============================

Discussed Goals
---------------

Everyone agrees we want to stop anything gaining control of QEMU from
also gaining access to the platform.

XenProject would like to use depriv by default.  Should not preclude
other configurations such as PCI pass-through.

Ian: There is a project in progress to use stub domains with QEMU –
would expect a user to run depriv or stubdoms, but not both at the
same time.

XS: States that it does not want to go down the stub domains route, as
it is not considered scalable.  (Boot storms, etc.)

Ian: In upstream we expect to pursue both depriv qemu and stub qemu as
implementation strategies.  Depriv qemu is probably deliverable
sooner.  Stub qemu is more secure but not suitable for all users (eg,
maybe not for XenServer, as David suggested).

Going through the list of technology areas in the document shared
beforehand.

Disk / Network / Host IO
------------------------

Can be addressed by running as an unprivileged user.

Andrew: Rather Linux-centric – no POSIX solution for all these
problems.
Ian: There are similar interfaces on other platforms.  In upstream we
expect to at least be able to run as an unprivileged user, which is
indeed portable.

Mem map
-------

XenServer has a solution.
Ian: Would like to see this shared immediately.
Andrew: Consider rest of design first.

HyperCalls
----------

Three options:
1: XSM.
2: Fix ABI – re-arrange priv-cmd to make it easier to parse and
   restrict.
3: Loadable pattern matching solution.  All agree this would be
   horrible to maintain.

Ian: Difference between not having a stable API, and the instability
of the ABI stopping the determination of the domain.

Ian: Version compatibility out of scope.
Andrew: Disagree, should do both together.

Andrew: Requirement – no interdependence between QEMU and the kernel,
such that you need to update both together.
All: Agree the Xen interface should not be embedded in the kernel.

Jen: Asked why Ian was against using XSM.
Ian: XSM wholesale replaces existing checks, and current XSM policies
are unproven.
Andrew: XS is experimenting, and there are problems.
Ian & Andrew: Agree it would need auditing.
Ian: Reluctant to link XSM to this work.

David: Can do both – XSM and 'Fix ABI'.
Ian: With both, wouldn't need the nesting.
David: Would still need the nesting to wrap domains.
Ian: Doesn't have to do all ….
David: <Unconvinced>

David: (wearing a Linux hat): Sceptical about a stable API – don't
want to need a new kernel when new API features are added.

Ian: Suggests idea of DM-space – D-ctrl space.
David: Would need to be a strong component.  Both ends would want
commitment – no snowflakes.
Ian: Shouldn't be a problem.

Andrew: PVH would replace QEMU.
Ian: Not relevant.

Idea of DM-ops is discussed more.  DM-op restricted to domain.  No
need to version this.  Important fields (domain) fixed, such that
DM-ops can be added/changed without affecting the kernel interface.

Andrew brings up XenGT.  Is this considered part of QEMU?  QEMU
shouldn't set PCI BARs.

Question whether DM-op would just be HVM-op.
David: Should enumerate hypercalls, make sure they are in DM-ops.
Andrew & Ian: Would need to discuss with Jan.

Some things may need to be in both DM-op and elsewhere, ie the guest
interface.  This is OK.

DM-ops is tentatively agreed on as a way forward.

XenStore Restrict
-----------------

Jen: Upstream discussing maybe dropping sockets.  (Necessary for
restrict.)
David: Problem is it gives the same permissions as the guest – does
this restrict too much?  Should look through all uses, see if they can
be done before it drops privileges.
Ian: Already looked at the QEMU PV backend/DM split work.  That was
awkward, disjoint work.  I think this approach would work.

Jen: Brought up physmap keys in xenstore…  All agree this isn't nice.
Andrew: Mentions memory accounting swamp he wants to sort out.
Others: Don't want to tie this work to that.
Ian: Guest can blow off its own head – physmap not great, but OK.
Need to get the Xenstore keys, to see the issues.

Ian: Happy to say everyone should move to xenstored, but need to find
a way to multiplex this, such that it can be used from stub domains.
Need some sort of plan.  New ring – not a good idea.  Alternative
plan: prefix command.
Andrew: Protocol command?
David: Does not like that idea.

Ian: Huge patch needed to separate PV-QEMU from the DM part, running
in separate qemus.
David: XenServer has no PV part, so XS dodges this issue.
Ian: Might be xenstore path issues, if not using the switch.
David: It may be that we can remove all use of xenstore while
deprivileged, so we could avoid the need for xenstore changes.

Agreed someone would need to look through the remaining uses, to see
how much work this would be.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
* Device model operation hypercall (DMOP, re qemu depriv)
  2016-07-28 17:01 XenProject/XenServer QEMU working group, Friday 8th July, 2016, 15:00 Jennifer Herbert
@ 2016-08-01 11:32 ` Ian Jackson
  2016-08-01 12:41   ` Jan Beulich
                      ` (2 more replies)
  2016-09-09 16:18 ` XenProject/XenServer QEMU working group minutes, 30th August 2016 Jennifer Herbert
  1 sibling, 3 replies; 74+ messages in thread

From: Ian Jackson @ 2016-08-01 11:32 UTC (permalink / raw)
To: Jan Beulich, Andrew Cooper, dgdegra, Wei Liu, George Dunlap,
    Stefano Stabellini, Tim Deegan, David Vrabel, Anthony Perard,
    Konrad Rzeszutek Wilk
Cc: xen-devel

Introducing HVMCTL, Jan wrote:
> A long while back separating out all control kind operations (intended
> for use by only the control domain or device model) from the current
> hvmop hypercall has been discussed.  This series aims at finally making
> this reality (at once allowing to streamline the associated XSM
> checking).

I think we need to introduce a new hypercall (which I will call DMOP
for now) which may augment or replace some of HVMCTL.  Let me explain:

We would like to be able to deprivilege qemu-in-dom0.  This is because
qemu has a large attack surface and has a history of security bugs.
If we get this right we can easily reduce the impact of `guest can
take over qemu' bugs to DoS; and perhaps with a bit of effort we can
eliminate the DoS too.  (qemu stubdoms are another way to do this but
they have their own difficulties.)

A part of this plan has to be a way for qemu to make hypercalls
related to the guest it is servicing.  But qemu needs to be _unable_
to make _other_ hypercalls.

I see four possible approaches.  In IMO increasing order of
desirability:

1. We could simply patch the dom0 privcmd driver to know exactly which
hypercalls are permitted.  This is obviously never going to work
because there would have to be a massive table in the kernel, kept in
step with Xen.
We could have a kind of pattern matching engine instead, and load the
tables from userspace, but that's a daft edifice to be building (even
if we reuse BPF or something) and a total pain to maintain.

2. We could have some kind of privileged proxy or helper process,
which makes the hypercalls on instruction from qemu.  This would be
quite complicated and involve a lot of back-and-forth parameter
passing.  Like option 1, this arrangement would end up embedding
detailed knowledge about which hypercalls are appropriate, and would
have to understand all of their parameters.

3. We could have the dom0 privcmd driver wrap each of qemu's
hypercalls in a special "wrap up with different XSM tag" hypercall.
Then, we could specify the set of allowable hypercalls with XSM.

If we want qemu deprivileged by default, this depends on turning XSM
on by default.  But we want qemu depriv ASAP and there are
difficulties with XSM by default.  This approach also involves writing
a large and hard-to-verify hypercall permission table, in the form of
an XSM policy.

4. We could invent a new hypercall `DMOP' for hypercalls which device
models should be able to use, which always has the target domain in a
fixed location in the arguments.  We have the dom0 privcmd driver know
about this one hypercall number and the location of the target domid.

Option 4 has the following advantages:

* The specification of which hypercalls are authorised to qemu is
  integrated with the specification of the hypercalls themselves:
  There is no need to maintain a separate table which can get out of
  step (or contain security bugs).

* The changes required to the rest of the system are fairly small.
  In particular:

* We need only one small, non-varying, patch to the dom0 kernel.

Let me flesh out option 4 in more detail:

We define a new hypercall DMOP.

Its first argument is always a target domid.  The DMOP hypercall
number and position of the target domid in the arguments are fixed.
A DMOP is defined to never put at risk the stability or security of
the whole system, nor of the domain which calls DMOP.  However, a DMOP
may have arbitrary effects on the target domid.

In the privcmd driver, we provide a new restriction ioctl, which takes
a domid parameter.  After that restriction ioctl is called, the
privcmd driver will permit only DMOP hypercalls, and only with the
specified target domid.  Since the hypercall number and the target
domid are stable, this is a simple check which will not need to be
updated as new DMOPs are defined (and old ones retired).

DMOPs are not available to guests (other than stub device model
domains) and do not form part of the guest-stable ABI.  Where the set
of operations provided through DMOPs overlaps with guest-stable
hypercalls, identical functionality must be provided through both
parts of the hypercall namespace.

Privileged toolstack software is permitted to use DMOPs as well as
other hypercalls, of course.  So there is no need to duplicate
functionality between DMOPs and non-stable privileged toolstack
hypercalls.

On ABI/API stability:

For this scheme to work, it is not essential that the DMOPs themselves
should have a stable ABI.  However, we do want to be able to decouple
qemu versions from Xen versions.  This could be done by having the
relevant bit of libxc (let us suppose libdevicemodel) be capable of
driving multiple versions of Xen.  Or by having different
libdevicemodel versions, one for each version of Xen, and some kind of
ad-hoc select-the-right-library arrangement to cope with dual booting.

Alternatively, old DMOP interfaces (ie, old DMOPs) could simply be
retained for a few Xen releases and then retired, providing a
semi-stable ABI to device model software.

In any case, probably the DMOP opcode needs to be a wide field so that
when new DMOPs, or new versions of old DMOPs, arise, we can assign
them new numbers.
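The restriction check described above could be sketched as follows.
The hypercall number, the `DOMID_INVALID` sentinel, and all struct and
function names are assumptions for illustration only, not the real
privcmd driver code:

```c
#include <errno.h>
#include <stdint.h>

typedef uint16_t domid_t;

#define HYPERCALL_DM_OP 63              /* illustrative number only */
#define DOMID_INVALID   ((domid_t)0x7FF4)

/* Per-file-handle state: set once by the hypothetical restriction
 * ioctl, never cleared for the lifetime of the handle. */
struct privcmd_data {
    domid_t restrict_domid;             /* DOMID_INVALID => unrestricted */
};

/* Because both the DMOP hypercall number and the position of the
 * target domid are fixed, this check never needs updating as
 * individual DMOPs are added or retired. */
static int privcmd_hypercall_allowed(const struct privcmd_data *d,
                                     unsigned int hypercall,
                                     domid_t target)
{
    if (d->restrict_domid == DOMID_INVALID)
        return 0;                       /* unrestricted handle */
    if (hypercall != HYPERCALL_DM_OP)
        return -EPERM;                  /* only DMOP once restricted */
    if (target != d->restrict_domid)
        return -EPERM;                  /* and only against one domain */
    return 0;
}
```

A restricted qemu would call the restriction ioctl with its guest's
domid immediately before dropping privileges; every later hypercall
ioctl would pass through a check of this shape.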
(Alternatively we could have a version field in every DMOP which is
checked for equality, but that makes some compatibility strategies
more painful.)

What do people think?

Thanks,
Ian.
* Re: Device model operation hypercall (DMOP, re qemu depriv)
  2016-08-01 11:32 ` Device model operation hypercall (DMOP, re qemu depriv) Ian Jackson
@ 2016-08-01 12:41   ` Jan Beulich
  2016-08-02 11:38     ` Wei Liu
  2016-08-02 11:37   ` Wei Liu
  2016-09-09 15:16   ` Jennifer Herbert
  2 siblings, 1 reply; 74+ messages in thread

From: Jan Beulich @ 2016-08-01 12:41 UTC (permalink / raw)
To: Andrew Cooper, Ian Jackson
Cc: Stefano Stabellini, Wei Liu, George Dunlap, Tim Deegan,
    David Vrabel, Anthony Perard, xen-devel, dgdegra

>>> On 01.08.16 at 13:32, <ian.jackson@eu.citrix.com> wrote:
> 4. We could invent a new hypercall `DMOP' for hypercalls which device
>    models should be able to use, which always has the target domain in
>    a fixed location in the arguments.  We have the dom0 privcmd driver
>    know about this one hypercall number and the location of the target
>    domid.
>
> Option 4 has the following advantages:
>
> * The specification of which hypercalls are authorised to qemu is
>   integrated with the specification of the hypercalls themselves:
>   There is no need to maintain a separate table which can get out of
>   step (or contain security bugs).
>
> * The changes required to the rest of the system are fairly small.
>   In particular:
>
> * We need only one small, non-varying, patch to the dom0 kernel.
>
> Let me flesh out option 4 in more detail:
>
> We define a new hypercall DMOP.
>
> Its first argument is always a target domid.  The DMOP hypercall
> number and position of the target domid in the arguments are fixed.
>
> A DMOP is defined to never put at risk the stability or security of
> the whole system, nor of the domain which calls DMOP.  However, a DMOP
> may have arbitrary effects on the target domid.

With the exception of this and the privcmd layer described below,
DMOP == HVMCTL afaics.  The privcmd layer is independent anyway.
And the security aspect mentioned above won't disappear if we
use DMOP instead of HVMCTL.
So I don't see why the hvmctl series as is can't be the starting point
of this, with the stability/security concerns addressed subsequently,
for being orthogonal.

Jan
* Re: Device model operation hypercall (DMOP, re qemu depriv)
  2016-08-01 12:41 ` Jan Beulich
@ 2016-08-02 11:38   ` Wei Liu
  2016-08-02 11:58     ` Jan Beulich
  2016-08-03 10:29     ` Ian Jackson
  0 siblings, 2 replies; 74+ messages in thread

From: Wei Liu @ 2016-08-02 11:38 UTC (permalink / raw)
To: Jan Beulich
Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
    Ian Jackson, Tim Deegan, David Vrabel, Anthony Perard, xen-devel,
    dgdegra

On Mon, Aug 01, 2016 at 06:41:20AM -0600, Jan Beulich wrote:
> >>> On 01.08.16 at 13:32, <ian.jackson@eu.citrix.com> wrote:
> > 4. We could invent a new hypercall `DMOP' for hypercalls which device
> >    models should be able to use, which always has the target domain in
> >    a fixed location in the arguments.  We have the dom0 privcmd driver
> >    know about this one hypercall number and the location of the target
> >    domid.
> >
> > Option 4 has the following advantages:
> >
> > * The specification of which hypercalls are authorised to qemu is
> >   integrated with the specification of the hypercalls themselves:
> >   There is no need to maintain a separate table which can get out of
> >   step (or contain security bugs).
> >
> > * The changes required to the rest of the system are fairly small.
> >   In particular:
> >
> > * We need only one small, non-varying, patch to the dom0 kernel.
> >
> > Let me flesh out option 4 in more detail:
> >
> > We define a new hypercall DMOP.
> >
> > Its first argument is always a target domid.  The DMOP hypercall
> > number and position of the target domid in the arguments are fixed.
> >
> > A DMOP is defined to never put at risk the stability or security of
> > the whole system, nor of the domain which calls DMOP.  However, a DMOP
> > may have arbitrary effects on the target domid.
>
> With the exception of this and the privcmd layer described below,
> DMOP == HVMCTL afaics.  The privcmd layer is independent anyway.
> And the security aspect mentioned above won't disappear if we
> use DMOP instead of HVMCTL.
> So I don't see why the hvmctl
> series as is can't be the starting point of this, with the stability/
> security concerns addressed subsequently, for being orthogonal.
>

Yeah, to turn HVMCTL to DMOP:

1. s/HVMCTL/DMOP/
2. maybe s/interface_version//

I think we could at least do #1 and merge the series.

Wei.

> Jan
* Re: Device model operation hypercall (DMOP, re qemu depriv)
  2016-08-02 11:38 ` Wei Liu
@ 2016-08-02 11:58   ` Jan Beulich
  2016-08-02 13:02     ` David Vrabel
  2016-08-03 10:29   ` Ian Jackson
  1 sibling, 1 reply; 74+ messages in thread

From: Jan Beulich @ 2016-08-02 11:58 UTC (permalink / raw)
To: Wei Liu
Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
    Tim Deegan, David Vrabel, Anthony Perard, xen-devel, dgdegra

>>> On 02.08.16 at 13:38, <wei.liu2@citrix.com> wrote:
> On Mon, Aug 01, 2016 at 06:41:20AM -0600, Jan Beulich wrote:
>> >>> On 01.08.16 at 13:32, <ian.jackson@eu.citrix.com> wrote:
>> > 4. We could invent a new hypercall `DMOP' for hypercalls which device
>> >    models should be able to use, which always has the target domain in
>> >    a fixed location in the arguments.  We have the dom0 privcmd driver
>> >    know about this one hypercall number and the location of the target
>> >    domid.
>> >
>> > Option 4 has the following advantages:
>> >
>> > * The specification of which hypercalls are authorised to qemu is
>> >   integrated with the specification of the hypercalls themselves:
>> >   There is no need to maintain a separate table which can get out of
>> >   step (or contain security bugs).
>> >
>> > * The changes required to the rest of the system are fairly small.
>> >   In particular:
>> >
>> > * We need only one small, non-varying, patch to the dom0 kernel.
>> >
>> > Let me flesh out option 4 in more detail:
>> >
>> > We define a new hypercall DMOP.
>> >
>> > Its first argument is always a target domid.  The DMOP hypercall
>> > number and position of the target domid in the arguments are fixed.
>> >
>> > A DMOP is defined to never put at risk the stability or security of
>> > the whole system, nor of the domain which calls DMOP.  However, a DMOP
>> > may have arbitrary effects on the target domid.
>>
>> With the exception of this and the privcmd layer described below,
>> DMOP == HVMCTL afaics.  The privcmd layer is independent anyway.
>> And the security aspect mentioned above won't disappear if we
>> use DMOP instead of HVMCTL.  So I don't see why the hvmctl
>> series as is can't be the starting point of this, with the stability/
>> security concerns addressed subsequently, for being orthogonal.
>>
>
> Yeah, to turn HVMCTL to DMOP:
>
> 1. s/HVMCTL/DMOP/
> 2. maybe s/interface_version//

Andrew had brought up 2 too, but I'm really not sure that'd be a good
idea.  I rather think we should keep it but maybe (other than
domctl/sysctl) recognize older versions.  In any event I consider
having it better for an unstable interface (as Ian said, libxc is
supposed to provide the stable one).

Jan
* Re: Device model operation hypercall (DMOP, re qemu depriv)
  2016-08-02 11:58 ` Jan Beulich
@ 2016-08-02 13:02   ` David Vrabel
  2016-08-02 13:29     ` Jan Beulich
  0 siblings, 1 reply; 74+ messages in thread

From: David Vrabel @ 2016-08-02 13:02 UTC (permalink / raw)
To: Jan Beulich, Wei Liu
Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
    Tim Deegan, Anthony Perard, xen-devel, dgdegra

On 02/08/16 12:58, Jan Beulich wrote:
>>>> On 02.08.16 at 13:38, <wei.liu2@citrix.com> wrote:
>> On Mon, Aug 01, 2016 at 06:41:20AM -0600, Jan Beulich wrote:
>>>>>> On 01.08.16 at 13:32, <ian.jackson@eu.citrix.com> wrote:
>>>> 4. We could invent a new hypercall `DMOP' for hypercalls which device
>>>>    models should be able to use, which always has the target domain in
>>>>    a fixed location in the arguments.  We have the dom0 privcmd driver
>>>>    know about this one hypercall number and the location of the target
>>>>    domid.
>>>>
>>>> Option 4 has the following advantages:
>>>>
>>>> * The specification of which hypercalls are authorised to qemu is
>>>>   integrated with the specification of the hypercalls themselves:
>>>>   There is no need to maintain a separate table which can get out of
>>>>   step (or contain security bugs).
>>>>
>>>> * The changes required to the rest of the system are fairly small.
>>>>   In particular:
>>>>
>>>> * We need only one small, non-varying, patch to the dom0 kernel.
>>>>
>>>> Let me flesh out option 4 in more detail:
>>>>
>>>> We define a new hypercall DMOP.
>>>>
>>>> Its first argument is always a target domid.  The DMOP hypercall
>>>> number and position of the target domid in the arguments are fixed.
>>>>
>>>> A DMOP is defined to never put at risk the stability or security of
>>>> the whole system, nor of the domain which calls DMOP.  However, a DMOP
>>>> may have arbitrary effects on the target domid.
>>>
>>> With the exception of this and the privcmd layer described below,
>>> DMOP == HVMCTL afaics.
>>> The privcmd layer is independent anyway.
>>> And the security aspect mentioned above won't disappear if we
>>> use DMOP instead of HVMCTL.  So I don't see why the hvmctl
>>> series as is can't be the starting point of this, with the stability/
>>> security concerns addressed subsequently, for being orthogonal.
>>>
>>
>> Yeah, to turn HVMCTL to DMOP:
>>
>> 1. s/HVMCTL/DMOP/
>> 2. maybe s/interface_version//
>
> Andrew had brought up 2 too, but I'm really not sure that'd be a
> good idea.  I rather think we should keep it but maybe (other than
> domctl/sysctl) recognize older versions.  In any event I consider
> having it better for an unstable interface (as Ian said, libxc is
> supposed to provide the stable one).

A stable user space library API is no good for an in-kernel emulator,
like that needed for Intel GVT-g -- the hypercall ABI needs to be
stable.

David
* Re: Device model operation hypercall (DMOP, re qemu depriv)
  2016-08-02 13:02 ` David Vrabel
@ 2016-08-02 13:29   ` Jan Beulich
  0 siblings, 0 replies; 74+ messages in thread

From: Jan Beulich @ 2016-08-02 13:29 UTC (permalink / raw)
To: David Vrabel
Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
    Ian Jackson, Tim Deegan, Anthony Perard, xen-devel, dgdegra

>>> On 02.08.16 at 15:02, <david.vrabel@citrix.com> wrote:
> On 02/08/16 12:58, Jan Beulich wrote:
>>>>> On 02.08.16 at 13:38, <wei.liu2@citrix.com> wrote:
>>> On Mon, Aug 01, 2016 at 06:41:20AM -0600, Jan Beulich wrote:
>>>>>>> On 01.08.16 at 13:32, <ian.jackson@eu.citrix.com> wrote:
>>>>> 4. We could invent a new hypercall `DMOP' for hypercalls which device
>>>>>    models should be able to use, which always has the target domain in
>>>>>    a fixed location in the arguments.  We have the dom0 privcmd driver
>>>>>    know about this one hypercall number and the location of the target
>>>>>    domid.
>>>>>
>>>>> Option 4 has the following advantages:
>>>>>
>>>>> * The specification of which hypercalls are authorised to qemu is
>>>>>   integrated with the specification of the hypercalls themselves:
>>>>>   There is no need to maintain a separate table which can get out of
>>>>>   step (or contain security bugs).
>>>>>
>>>>> * The changes required to the rest of the system are fairly small.
>>>>>   In particular:
>>>>>
>>>>> * We need only one small, non-varying, patch to the dom0 kernel.
>>>>>
>>>>> Let me flesh out option 4 in more detail:
>>>>>
>>>>> We define a new hypercall DMOP.
>>>>>
>>>>> Its first argument is always a target domid.  The DMOP hypercall
>>>>> number and position of the target domid in the arguments are fixed.
>>>>>
>>>>> A DMOP is defined to never put at risk the stability or security of
>>>>> the whole system, nor of the domain which calls DMOP.  However, a DMOP
>>>>> may have arbitrary effects on the target domid.
>>>>
>>>> With the exception of this and the privcmd layer described below,
>>>> DMOP == HVMCTL afaics.  The privcmd layer is independent anyway.
>>>> And the security aspect mentioned above won't disappear if we
>>>> use DMOP instead of HVMCTL.  So I don't see why the hvmctl
>>>> series as is can't be the starting point of this, with the stability/
>>>> security concerns addressed subsequently, for being orthogonal.
>>>>
>>>
>>> Yeah, to turn HVMCTL to DMOP:
>>>
>>> 1. s/HVMCTL/DMOP/
>>> 2. maybe s/interface_version//
>>
>> Andrew had brought up 2 too, but I'm really not sure that'd be a
>> good idea.  I rather think we should keep it but maybe (other than
>> domctl/sysctl) recognize older versions.  In any event I consider
>> having it better for an unstable interface (as Ian said, libxc is
>> supposed to provide the stable one).
>
> A stable user space library API is no good for an in-kernel emulator,
> like that needed for Intel GVT-g -- the hypercall ABI needs to be
> stable.

I'm pretty certain only a (perhaps small) subset of the proposed new
operations would be needed by them, which we could then consider
marking stable.

Jan
* Re: Device model operation hypercall (DMOP, re qemu depriv)
  2016-08-02 11:38 ` Wei Liu
  2016-08-02 11:58   ` Jan Beulich
@ 2016-08-03 10:29   ` Ian Jackson
  2016-08-03 12:03     ` Jan Beulich
  1 sibling, 1 reply; 74+ messages in thread

From: Ian Jackson @ 2016-08-03 10:29 UTC (permalink / raw)
To: Wei Liu
Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, Tim Deegan,
    David Vrabel, Jan Beulich, Anthony Perard, xen-devel, dgdegra

Wei Liu writes ("Re: Device model operation hypercall (DMOP, re qemu
depriv)"):
> On Mon, Aug 01, 2016 at 06:41:20AM -0600, Jan Beulich wrote:
> > > A DMOP is defined to never put at risk the stability or security of
> > > the whole system, nor of the domain which calls DMOP.  However, a DMOP
> > > may have arbitrary effects on the target domid.
> >
> > With the exception of this and the privcmd layer described below,
> > DMOP == HVMCTL afaics.  The privcmd layer is independent anyway.
> > And the security aspect mentioned above won't disappear if we
> > use DMOP instead of HVMCTL.  So I don't see why the hvmctl
> > series as is can't be the starting point of this, with the stability/
> > security concerns addressed subsequently, for being orthogonal.

I don't (currently) have a clear understanding of how my proposed DMOP
relates to HVMCTL.

I thought it useful to set out the DMOP proposal from first
principles, with clear motivation, discussion of not-chosen
alternatives, and of course with a clear statement of the principles
of operation and of the security design.

The security property I have quoted above is absolutely critical to
the DMOP proposal.  I'm a bit concerned by comments like the above
`with the exception of this' (which seems to refer to the security
property).

Earlier during one of the HVMCTL threads I asked

   This is a slight digression, but is it intended that all of these
   hvmctl's are safe to expose to a deprivileged device model process
   in dom0, or to a device model stub domain ?
Jan replied:

   Yes, afaict (they've been exposed the same way before).

Does that mean that the functionality exposed by all the proposed
HVMCTLs is currently available to stubdoms?

> Yeah, to turn HVMCTL to DMOP:
>
> 1. s/HVMCTL/DMOP/
> 2. maybe s/interface_version//

Well, that would certainly be nice.  But there are some caveats I
would like sorting out.

> > So I don't see why the hvmctl series as is can't be the starting
> > point of this, with the stability/ security concerns addressed
> > subsequently, for being orthogonal.

Please don't misunderstand me as trying to compete with or block your
HVMCTL work.  It may well be that HVMCTL is what I want, but:

If we adopt the design principles I describe in my DMOP proposal, I
don't think the security concerns are separable.

ISTM that a patch series introducing DMOP should start with a patch
which introduces the DMOP hypercall, with no sub-operations.

Such a patch would have code content very like that in
  [PATCH 01/11] public / x86: introduce hvmctl hypercall

But, such a patch should also explain the semantics.  The Xen public
headers ought to contain explanations of the promises that the
hypervisor makes about DMOP.  Importantly:
 - the promise that a DMOP cannot harm anyone except the target domid
 - the ABI stability of the target domid field
 - what the ABI stability policy is wrt the actual DMOPs themselves

If the 01/ patch contains such promises, then logically the 02/ patch
which introduces the first DMOP is extending that promise to that
operation.  It is at that point that the security decision should be
made.

Now, there may be other ways to represent/record the security status.
But it will be necessary to either (i) avoid violating the DMOP
security promise, by making questionable calls not available via
DMOP, or (ii) trying to retrofit the security promise to DMOP later.

I think (ii) is not a good approach.
It would amount to introducing a whole new set of interfaces, and then
later trying to redefine them to have a particular security property
which was not originally there.

Ian.
* Re: Device model operation hypercall (DMOP, re qemu depriv)
  2016-08-03 10:29 ` Ian Jackson
@ 2016-08-03 12:03   ` Jan Beulich
  2016-08-03 13:37     ` Ian Jackson
  0 siblings, 1 reply; 74+ messages in thread

From: Jan Beulich @ 2016-08-03 12:03 UTC (permalink / raw)
To: Ian Jackson
Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
    Tim Deegan, David Vrabel, Anthony Perard, xen-devel, dgdegra

>>> On 03.08.16 at 12:29, <ian.jackson@eu.citrix.com> wrote:
> Wei Liu writes ("Re: Device model operation hypercall (DMOP, re qemu
> depriv)"):
>> On Mon, Aug 01, 2016 at 06:41:20AM -0600, Jan Beulich wrote:
>> > > A DMOP is defined to never put at risk the stability or security of
>> > > the whole system, nor of the domain which calls DMOP.  However, a DMOP
>> > > may have arbitrary effects on the target domid.
>> >
>> > With the exception of this and the privcmd layer described below,
>> > DMOP == HVMCTL afaics.  The privcmd layer is independent anyway.
>> > And the security aspect mentioned above won't disappear if we
>> > use DMOP instead of HVMCTL.  So I don't see why the hvmctl
>> > series as is can't be the starting point of this, with the stability/
>> > security concerns addressed subsequently, for being orthogonal.
>
> I don't (currently) have a clear understanding of how my proposed DMOP
> relates to HVMCTL.
>
> I thought it useful to set out the DMOP proposal from first
> principles, with clear motivation, discussion of not-chosen
> alternatives, and of course with a clear statement of the principles
> of operation and of the security design.

Okay; nevertheless I did get the feeling that some of this was
prompted by the hvmctl series posting.

> The security property I have quoted above is absolutely critical to
> the DMOP proposal.  I'm a bit concerned by comments like the above
> `with the exception of this' (which seems to refer to the security
> property).

Indeed it does.
> Earlier during one of the HVMCTL threads I asked > > This is a slight digression, but is it intended that all of these > hvmctl's are safe to expose to a deprivileged device model process in > dom0, or to a device model stub domain ? > > Jan replied: > > Yes, afaict (they've been exposed the same way before). > > Does that mean that functionality exposed by all the prooposed HVMCTLs > is currently available to stubdoms ? That series only moves code from one hypercall to another (new) one, without any security implications at all. What has been available to stubdoms prior to that series will be available the same way once it got applied. >> > So I don't see why the hvmctl series as is can't be the starting >> > point of this, with the stability/ security concerns addressed >> > subsequently, for being orthogonal. > > Please don't misunderstand me as trying to compete with or block > your HVMCTL work. It may well be that HVMCTL is what I want, but: > > If we adopt the design principles I describe in my DMOP proposal, I > don't think the security concerns are separable. > > ISTM that a patch series introducing DMOP should start with a patch > which introduces the DMOP hypercall, with no sub-operations. > > Such a patch would have code content very like that in > [PATCH 01/11] public / x86: introduce hvmctl hypercall > > But, such a patch should also explain the semantics. The Xen public > headers ought to contain explanations of the promises that the > hypervisor makes about DMOP. Importantly: > - the promise that a DMOP cannot harm anyone except the target domid > - the ABI stability of the target domid field > - what the ABI stability policy is wrt the actual DMOPs themselves Well, none of that was the original goal of the series; some of this could be merged in. > If the 01/ patch contains such promises, then logically the 02/ patch > which introduces the first DMOP is extending that promise to that > operation. 
It is at that point that the security decision should be > made. Correct. Yet again the original goal of the series was just proper separation of two groups of operations that should never have been all thrown under the same hypercall. > Now, there may be other ways to represent/record the security status. > But it will be necessary to either (i) avoid violating the DMOP > security promise, by making questionable calls not available via DMOP > or (ii) trying to retrofit the security promise to DMOP later. > > I think (ii) is not a good approach. It would amount to introducing a > whole new set of interfaces, and then later trying to redefine them to > have a particular security property which was not originally there. I agree that (i) would be the better approach, but I don't think I can assess when I would find the time to do the required auditing of all involved code. Plus I don't see the difference between going the (ii) route with the presented hvmctl series vs keeping things as they are (under hvmop) - in both cases the security promise will need to be retrofit onto existing code. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv)

From: Ian Jackson @ 2016-08-03 13:37 UTC
To: Jan Beulich
Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Tim Deegan, David Vrabel, Anthony Perard, xen-devel, dgdegra

Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu depriv)"):
> On 03.08.16 at 12:29, <ian.jackson@eu.citrix.com> wrote:
> > I thought it useful to set out the DMOP proposal from first principles, with clear motivation, discussion of not-chosen alternatives, and of course with a clear statement of the principles of operation and of the security design.
>
> Okay; nevertheless I did get the feeling that some of this was prompted by the hvmctl series posting.

Well, I think the HVMCTL series is an important part of this conversation - even though this DMOP idea is motivated by the qemu depriv proposals (initiated mostly by Stefano in the last release cycle).

> > Does that mean that functionality exposed by all the proposed HVMCTLs is currently available to stubdoms ?
>
> That series only moves code from one hypercall to another (new) one, without any security implications at all. What has been available to stubdoms prior to that series will be available the same way once it got applied.

So what you mean is that in HVMCTL, the privilege check is retained in the individual HVMCTL sub-operation ? (Ie what used to be IS_PRIV or IS_PRIV_FOR - and implicitly, some sub-ops would be IS_PRIV and some IS_PRIV_FOR.)

But looking at your v2 01/11, I see this in do_hvmctl:

+    rc = xsm_hvm_control(XSM_DM_PRIV, d, op.cmd);
+    if ( rc )
+    {
+        rcu_unlock_domain(d);
+        return rc;
+    }

And looking at v2 02/11, I didn't see any privilege check in the specific hypercall.

With DMOP it would make sense for the privilege check to be IS_PRIV_FOR, in the DMOP dispatcher. But it seems that this privilege check already exists in HVMCTL in the right form.

So AFAICT HVMCTLs already guarantee not to have an adverse impact on the whole system. If that weren't the case, then a stub dm could exploit the situation.

Is the audit that is required, to check that the DMOP doesn't have an adverse effect on the _calling domain_ ? AFAICT most HVMCTLs/DMOPs have /no/ effect on the calling domain, other than as part of argument passing. So that audit should be easy.

So I am confused. What am I missing ?

> > If the 01/ patch contains such promises, then logically the 02/ patch which introduces the first DMOP is extending that promise to that operation. It is at that point that the security decision should be made.
>
> Correct. Yet again the original goal of the series was just proper separation of two groups of operations that should never have been all thrown under the same hypercall.

What I am doing is presenting another, additional goal. I don't dispute the usefulness of the HVMCTL cleanup. But I want to understand our future intentions. In particular, as a proponent of the DMOP idea, I want to avoid the situation that my idea is blocked somehow by a conflicting series.

> > Now, there may be other ways to represent/record the security status. But it will be necessary to either (i) avoid violating the DMOP security promise, by making questionable calls not available via DMOP or (ii) trying to retrofit the security promise to DMOP later.
> >
> > I think (ii) is not a good approach. It would amount to introducing a whole new set of interfaces, and then later trying to redefine them to have a particular security property which was not originally there.
>
> I agree that (i) would be the better approach, but I don't think I can assess when I would find the time to do the required auditing of all involved code. Plus I don't see the difference between going the (ii) route with the presented hvmctl series vs keeping things as they are (under hvmop) - in both cases the security promise will need to be retrofitted onto existing code.

If we don't apply HVMCTL, we can introduce DMOP and then individually move hypercalls from hvmop to DMOP as they are audited.

Would a similar approach be acceptable even after HVMCTL ?

That is, the following plan:

1. Apply HVMCTL right away. This solves the cleanup problem, but leaves the qemu depriv problem unsolved.

2. After the necessary discussion to understand and refine the DMOP design, create a new DMOP hypercall with appropriate security promises and whatever stability promises are agreed, but with no sub-ops.

3. Move each HVMCTL to DMOP, one by one, as it is audited.

4. Perhaps some HVMCTLs will remain which are inherently unsuitable for use with qemu depriv. If not, perhaps eventually delete HVMCTL entirely, or leave it empty (with no subops).

This has the downside that we end up moving all the DMOPs twice. But it does allow us to separate the audit work from the cleanup/reorg.

Regards,
Ian.
* Re: Device model operation hypercall (DMOP, re qemu depriv)

From: Jan Beulich @ 2016-08-03 14:16 UTC
To: Ian Jackson
Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Tim Deegan, David Vrabel, Anthony Perard, xen-devel, dgdegra

>>> On 03.08.16 at 15:37, <ian.jackson@eu.citrix.com> wrote:
> Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu depriv)"):
>> On 03.08.16 at 12:29, <ian.jackson@eu.citrix.com> wrote:
>> > Does that mean that functionality exposed by all the proposed HVMCTLs is currently available to stubdoms ?
>>
>> That series only moves code from one hypercall to another (new) one, without any security implications at all. What has been available to stubdoms prior to that series will be available the same way once it got applied.
>
> So what you mean is that in HVMCTL, the privilege check is retained in the individual HVMCTL sub-operation ? (Ie what used to be IS_PRIV or IS_PRIV_FOR - and implicitly, some sub-ops would be IS_PRIV and some IS_PRIV_FOR.)
>
> But looking at your v2 01/11, I see this in do_hvmctl:
>
> +    rc = xsm_hvm_control(XSM_DM_PRIV, d, op.cmd);
> +    if ( rc )
> +    {
> +        rcu_unlock_domain(d);
> +        return rc;
> +    }
>
> And looking at v2 02/11, I didn't see any privilege check in the specific hypercall.

Of course - there were no explicit IS_PRIV{,_FOR}() uses before (i.e. the patch also doesn't remove any), these all sit behind the XSM hook. And no, there are no per-sub-op privilege checks, they're being consolidated to just the one you quote above.

> With DMOP it would make sense for the privilege check to be IS_PRIV_FOR, in the DMOP dispatcher. But it seems that this privilege check already exists in HVMCTL in the right form.
>
> So AFAICT HVMCTLs already guarantee not to have an adverse impact on the whole system. If that weren't the case, then a stub dm could exploit the situation.

And to tell you the truth, I'm not entirely convinced that all the code implementing those operations (be it in their current hvmop form or the new hvmctl one - again, the series only moves code around) is really safe in this regard. But with there even being at least one domctl not on xsm-flask.txt's safe-for-disaggregation list, but reportedly used by qemu (I don't recall right now which exact one it is), stubdoms can't be considered fully secure right now anyway.

> Is the audit that is required, to check that the DMOP doesn't have an adverse effect on the _calling domain_ ? AFAICT most HVMCTLs/DMOPs have /no/ effect on the calling domain, other than as part of argument passing. So that audit should be easy.
>
> So I am confused. What am I missing ?

The main adverse effect on the whole system that I can see would be long latency operations, but I can't exclude others: Just look at the final patch of the series, which fixes a serialization bug which I noticed while doing the code movement. I don't think lack of proper serialization is guaranteed to only affect the target domain.

>> > Now, there may be other ways to represent/record the security status. [...]
>>
>> I agree that (i) would be the better approach, but I don't think I can assess when I would find the time to do the required auditing of all involved code. Plus I don't see the difference between going the (ii) route with the presented hvmctl series vs keeping things as they are (under hvmop) - in both cases the security promise will need to be retrofitted onto existing code.
>
> If we don't apply HVMCTL, we can introduce DMOP and then individually move hypercalls from hvmop to DMOP as they are audited.
>
> Would a similar approach be acceptable even after HVMCTL ?
>
> That is, the following plan:
>
> 1. Apply HVMCTL right away. This solves the cleanup problem, but leaves the qemu depriv problem unsolved.
>
> 2. After the necessary discussion to understand and refine the DMOP design, create a new DMOP hypercall with appropriate security promises and whatever stability promises are agreed, but with no sub-ops.
>
> 3. Move each HVMCTL to DMOP, one by one, as it is audited.
>
> 4. Perhaps some HVMCTLs will remain which are inherently unsuitable for use with qemu depriv. If not, perhaps eventually delete HVMCTL entirely, or leave it empty (with no subops).
>
> This has the downside that we end up moving all the DMOPs twice. But it does allow us to separate the audit work from the cleanup/reorg.

Well, moving things twice doesn't sound attractive. I was rather thinking of declaring hvmctl (or dmop, if that's the intended name) sub-ops secure one by one as they get audited. Part of my reason to be hesitant to do such an audit myself (apart from the time constraint) is me remembering my failure to do so properly as a follow-up to XSA-77 (which needed to be corrected a couple of months back), plus not really having got any help with that back then (which arguably might end up better here, as this appears to be an area of wider interest).

Jan
* Re: Device model operation hypercall (DMOP, re qemu depriv)

From: George Dunlap @ 2016-08-03 14:21 UTC
To: Jan Beulich, Ian Jackson
Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Tim Deegan, David Vrabel, Anthony Perard, xen-devel, dgdegra

On 03/08/16 15:16, Jan Beulich wrote:
[...]
> Well, moving things twice doesn't sound attractive. I was rather thinking of declaring hvmctl (or dmop, if that's the intended name) sub-ops secure one by one as they get audited. Part of my reason to be hesitant to do such an audit myself (apart from the time constraint) is me remembering my failure to do so properly as a follow-up to XSA-77 (which needed to be corrected a couple of months back), plus not really having got any help with that back then (which arguably might end up better here, as this appears to be an area of wider interest).

There is certainly a wider interest, and I don't think anyone was suggesting that you would be doing any of the work we're discussing to allow qemu depriv.

But from the qemu depriv perspective, we want filtering to happen at dom0, and we want dom0 to have a very simple rule for knowing whether to allow a call or not: "Is the hypercall equal to the DMOP hypercall?"

So before qemu depriv can be usable, *all* the HVMCTL operations would need to be audited, and those that were deemed insecure would need to be either fixed or removed.

-George
* Re: Device model operation hypercall (DMOP, re qemu depriv)

From: Ian Jackson @ 2016-08-03 16:10 UTC
To: George Dunlap
Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Tim Deegan, David Vrabel, Jan Beulich, Anthony Perard, xen-devel, dgdegra

George Dunlap writes ("Re: Device model operation hypercall (DMOP, re qemu depriv)"):
> So before qemu depriv can be usable, *all* the HVMCTL operations would need to be audited, and those that were deemed insecure would need to be either fixed or removed.

Even worse, the bad HVMCTLs would be retrospectively turned into security-bugs-in-old-hypervisors. I don't think this is tenable.

Ian.
* Re: Device model operation hypercall (DMOP, re qemu depriv)

From: Jan Beulich @ 2016-08-03 16:18 UTC
To: Ian Jackson
Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Tim Deegan, George Dunlap, David Vrabel, Anthony Perard, xen-devel, dgdegra

>>> On 03.08.16 at 18:10, <ian.jackson@eu.citrix.com> wrote:
> George Dunlap writes ("Re: Device model operation hypercall (DMOP, re qemu depriv)"):
>> So before qemu depriv can be usable, *all* the HVMCTL operations would need to be audited, and those that were deemed insecure would need to be either fixed or removed.
>
> Even worse, the bad HVMCTLs would be retrospectively turned into security-bugs-in-old-hypervisors. I don't think this is tenable.

How would a bug in the respective current hvmop then not be a security issue as well?

Jan
* Re: Device model operation hypercall (DMOP, re qemu depriv)

From: Ian Jackson @ 2016-08-04 11:21 UTC
To: Jan Beulich
Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim Deegan, George Dunlap, David Vrabel, Anthony Perard, xen-devel, dgdegra

Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu depriv)"):
> On 03.08.16 at 18:10, <ian.jackson@eu.citrix.com> wrote:
> > George Dunlap writes ("Re: Device model operation hypercall (DMOP, re qemu depriv)"):
> >> So before qemu depriv can be usable, *all* the HVMCTL operations would need to be audited, and those that were deemed insecure would need to be either fixed or removed.
> >
> > Even worse, the bad HVMCTLs would be retrospectively turned into security-bugs-in-old-hypervisors. I don't think this is tenable.
>
> How would a bug in the respective current hvmop then not be a security issue as well?

AFAICT there are two kinds of possible bug:

1. An HVMCTL (or hvmop) that can have an adverse effect on the whole system. Such bugs would already be security bugs, covered by our security support policy. Such a bug would be a security bug whether the operation is an hvmop or an HVMCTL or a DMOP.

2. An HVMCTL (or hvmop) that can have an adverse effect on the calling domain. Such bugs are not currently security bugs. But the qemu depriv project requires them to be removed. Such a bug is a security bug if it is a DMOP[1] but not otherwise.

By "is a security bug if it is a DMOP" I mean "is a security bug if exposed via a hypercall which promises the security property necessary for DMOP, even if that hypercall is called something else".

Strictly speaking, in order to move existing hypercalls into a new DMOP hypercall, no further auditing is required for bugs of class 1 (other than that the privilege check is correct, ie that the new DMOP is indeed currently available to stub device models). But in order to move hypercalls into a new DMOP hypercall, they need to be checked for bugs of class 2.

If we move some hypercalls which have potential bugs of class 2 from hvmop to HVMCTL, and then later think we want to create something like DMOP, we have a number of choices:

We can invent a new DMOP hypercall and start moving everything from HVMCTL to DMOP as we audit it.

We could audit every HVMCTL, recording the audit status in-tree somewhere, moving hypercalls with class 2 problems out of HVMCTL to a new hypercall, and eventually, after we have audited everything, we could change HVMCTL's hypercall number (and maybe rename it DMOP).

What we cannot do is audit every HVMCTL, fix the class 2 problems, and then declare HVMCTL to have the relevant security property, and implement corresponding code in dom0's privcmd drivers which relies on the security property. This is because the dom0 privcmd driver doesn't know whether the HVMCTLs it is allowing not-fully-trusted userspace to make are actually trustworthy (with the specific hypervisor version in question.)

Ian.
* Re: Device model operation hypercall (DMOP, re qemu depriv)

From: Jan Beulich @ 2016-08-04 13:24 UTC
To: Ian Jackson
Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim Deegan, George Dunlap, David Vrabel, Anthony Perard, xen-devel, dgdegra

>>> On 04.08.16 at 13:21, <ian.jackson@eu.citrix.com> wrote:
> What we cannot do is audit every HVMCTL, fix the class 2 problems, and then declare HVMCTL to have the relevant security property, and implement corresponding code in dom0's privcmd drivers which relies on the security property. This is because the dom0 privcmd driver doesn't know whether the HVMCTLs it is allowing not-fully-trusted userspace to make are actually trustworthy (with the specific hypervisor version in question.)

I continue to not really understand this argumentation: Dom0's privcmd driver doesn't really matter here. If there's a bug in something qemu uses, this is a problem no matter whether that operation gets called through the to-be-added privcmd logic, or straight from a stubdom qemu. Both are less than fully privileged. What do I continue to be missing?

Jan
* Re: Device model operation hypercall (DMOP, re qemu depriv)

From: Ian Jackson @ 2016-08-05 16:28 UTC
To: Jan Beulich
Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim Deegan, George Dunlap, David Vrabel, Anthony Perard, xen-devel, dgdegra

Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu depriv)"):
> On 04.08.16 at 13:21, <ian.jackson@eu.citrix.com> wrote:
> > What we cannot do is audit every HVMCTL, fix the class 2 problems, and then declare HVMCTL to have the relevant security property, and implement corresponding code in dom0's privcmd drivers which relies on the security property. This is because the dom0 privcmd driver doesn't know whether the HVMCTLs it is allowing not-fully-trusted userspace to make are actually trustworthy (with the specific hypervisor version in question.)
>
> I continue to not really understand this argumentation: Dom0's privcmd driver doesn't really matter here. If there's a bug in something qemu uses, this is a problem no matter whether that operation gets called through the to-be-added privcmd logic, or straight from a stubdom qemu. Both are less than fully privileged. What do I continue to be missing?

Let me try again. Earlier I wrote:

   AFAICT there are two kinds of possible bug:

   1. An HVMCTL (or hvmop) that can have an adverse effect on the whole system. Such bugs would already be security bugs, covered by our security support policy. Such a bug would be a security bug whether the operation is an hvmop or an HVMCTL or a DMOP.

   2. An HVMCTL (or hvmop) that can have an adverse effect on the calling domain. Such bugs are not currently security bugs. But the qemu depriv project requires them to be removed. Such a bug is a security bug if it is a DMOP[1] but not otherwise.

Bugs of class 1 are already security bugs. They can already be exploited by stub device models.

Bugs of class 2 are only security bugs if we allow unprivileged callers within a privileged domain to use the corresponding hypercall.

That is, a bug of class 2 would allow the unprivileged qemu process in dom0 to cause damage to other parts of dom0.

Bugs of class 2 are of no interest to an attacker who has control of a stub device model, because all it allows them to do is damage the device model domain (which they already control).

Ian.
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-05 16:28 ` Ian Jackson @ 2016-08-08 11:18 ` Jan Beulich 2016-08-08 13:46 ` Ian Jackson 0 siblings, 1 reply; 74+ messages in thread From: Jan Beulich @ 2016-08-08 11:18 UTC (permalink / raw) To: Ian Jackson Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Tim Deegan, GeorgeDunlap, David Vrabel, Anthony Perard, xen-devel, dgdegra >>> On 05.08.16 at 18:28, <ian.jackson@eu.citrix.com> wrote: > Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu > depriv)"): >> On 04.08.16 at 13:21, <ian.jackson@eu.citrix.com> wrote: >> > What we cannot do is audit every HVMCTL, fix the class 2 problems, and >> > then declare HVMCTL to have the relevant security property, and >> > implement corresponding code in dom0's privcmd drivers which relies on >> > the security property. This is because the dom0 privcmd driver >> > doesn't know whether the HVMCTLs it is allowing not-fully-trusted >> > userspace to make are actually trustworthy (with the specific >> > hypervisor version in question.) >> >> I continue to not really understand this argumentation: Dom0's >> privcmd driver doesn't really matter here. If there's a bug in >> something qemu uses, this is a problem no matter whether that >> operation gets called though the to-be-added privcmd logic, or >> straight from a stubdom qemu. Both are less than fully privileged. >> What do I continue to be missing? > > Let me try again. Earlier I wrote: > > AFAICT there are two kinds of possible bug: > > 1. An HVMCTL (or hvmop) that can have an adverse affect on the whole > system. Such bugs would already be security bugs, covered by our > security support policy. Such a bug would be a security bug whether > the operation is an hvmop or an HVMCTL or a DMOP. > > 2. An HVMCTL (or hvmop) that can have an adverse effect on the calling > domain. Such bugs are not currently security bugs. But the of qemu > depriv project requires them to be removed. 
Such a bug is a > security bug if it is a DMOP[1] but not otherwise. > > Bugs of class 1 are already security bugs. They can already be > exploited by stub device models. > > Bugs of class 2 are only security bugs if we allow unprivileged > callers within a privileged domain to use the corresponding hypercall. > > That is, a bug of class 2 would allow the unprivileged qemu process in > dom0 to cause damage to other parts of dom0. > > Bugs of class 2 are of no interest to an attacker who has control of a > stub device model, because all it allows them to do is damage the > device domain (which they already control). Ah, okay, I think I finally understand. I'm sorry for being dense. I'm having, however, a hard time imagining a class 2 bug for any of the hvmop-s that are being converted by the hvmctl series: These act on the target domain, so would not touch the calling one's state other than for copying argument structures to/from guest memory (and I don't view mixing up domain pointers as a likely source of problems). Any other problem they might reasonably have would then affect the system as a whole (class 1) or just the target domain (non-security bug). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-08 11:18 ` Jan Beulich @ 2016-08-08 13:46 ` Ian Jackson 2016-08-08 14:07 ` Jan Beulich 2016-08-09 10:29 ` Jan Beulich 0 siblings, 2 replies; 74+ messages in thread From: Ian Jackson @ 2016-08-08 13:46 UTC (permalink / raw) To: Jan Beulich Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Tim Deegan, GeorgeDunlap, David Vrabel, Anthony Perard, xen-devel, dgdegra Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu depriv)"): > On 05.08.16 at 18:28, <ian.jackson@eu.citrix.com> wrote: > > That is, a bug of class 2 would allow the unprivileged qemu process in > > dom0 to cause damage to other parts of dom0. ... > Ah, okay, I think I finally understand. [...] > > I'm having, however, a hard time imagining a class 2 bug for any > of the hvmop-s that are being converted by the hvmctl series: > These act on the target domain, so would not touch the calling > one's state other than for copying argument structures to/from > guest memory (and I don't view mixing up domain pointers as > a likely source of problems). Right. AIUI all the hypercall arguments are passed using calling-guest-virtual addresses, which the dom0 privcmd arrangements are capable of ensuring are sane. > Any other problem they might > reasonably have would then affect the system as a whole (class > 1) or just the target domain (non-security bug). So would it therefore be OK to introduce the enhanced security promise - the lack of `class 2' bugs - for HVMCTL from the beginning ? This would involve a small amount of extra thought for each individual hypercall, just to check that the assumptions we are relying on (as you put them above) are not violated. If so then good, but I think we still need to have a proper conversation about how we are going to provide ABI stability to qemu in practice. Ian. 
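The sanity check Ian alludes to - that privcmd can ensure calling-guest-virtual buffer addresses are sane - can be sketched roughly as follows. This is an illustrative sketch only, assuming a fixed user/kernel address split analogous to Linux's access_ok(); USER_LIMIT and the function name are invented for the example and are not the real driver code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative user/kernel split; stands in for the platform's real
 * boundary (e.g. TASK_SIZE on x86-64 Linux). */
#define USER_LIMIT ((uint64_t)1 << 47)

/* Reject any buffer that wraps around or reaches into kernel space,
 * checking both the start and the end of the range. */
static bool privcmd_buffer_ok(uint64_t uaddr, uint64_t len)
{
    if (len > USER_LIMIT)
        return false;                 /* guards against uaddr + len overflow */
    return uaddr <= USER_LIMIT - len; /* whole range below the split */
}
```

A buffer starting just below the boundary with length 1 passes, while one starting at the boundary, or with an overflowing length, is rejected.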
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-08 13:46 ` Ian Jackson @ 2016-08-08 14:07 ` Jan Beulich 2016-08-26 11:38 ` Ian Jackson 2016-08-09 10:29 ` Jan Beulich 1 sibling, 1 reply; 74+ messages in thread From: Jan Beulich @ 2016-08-08 14:07 UTC (permalink / raw) To: Ian Jackson Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Tim Deegan, GeorgeDunlap, David Vrabel, Anthony Perard, xen-devel, dgdegra >>> On 08.08.16 at 15:46, <ian.jackson@eu.citrix.com> wrote: > Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu > depriv)"): >> I'm having, however, a hard time imagining a class 2 bug for any >> of the hvmop-s that are being converted by the hvmctl series: >> These act on the target domain, so would not touch the calling >> ones state other than for copying argument structures to/from >> guest memory (and I don't view mixing up domain pointers as >> a likely source of problems). > > Right. AIUI all the hypercall arguments are passed using > calling-guest-virtual addresses, which the dom0 privcmd arrangements > are capable of ensuring are sane. > >> Any other problem they might >> reasonably have would then affect the system as a whole (class >> 1) or just the target domain (non-security bug). > > So would it therefore be OK to introduce the enhanced security promise > - the lack of `class 2' bugs - for HVMCTL from the beginning ? I think so, since ... > This would involve a small amount of extra thought for each invididual > hypercall, just to check that the assumptions we are relying on (as > you put them above) are not violated. ... this looks to be a manageable amount of code auditing (albeit I'd like to see whether someone else can perhaps come up with some potential, more realistic kind of bug that could fall into class 2 before volunteering to make an attempt at doing such an audit). 
> If so then good, but I think we still need to have a proper > conversation about how we are going to provide ABI stability to qemu > in practice. Yes, the ABI stability aspect still needs coming to an agreement. Since relaxation is easier than tightening, I'm going to retain the current way the patches are in this regard, i.e. assume an unstable hypercall ABI. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-08 14:07 ` Jan Beulich @ 2016-08-26 11:38 ` Ian Jackson 2016-08-26 12:58 ` Jan Beulich 0 siblings, 1 reply; 74+ messages in thread From: Ian Jackson @ 2016-08-26 11:38 UTC (permalink / raw) To: Jan Beulich Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Tim Deegan, GeorgeDunlap, David Vrabel, Anthony Perard, xen-devel, dgdegra Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu depriv)"): > On 08.08.16 at 15:46, <ian.jackson@eu.citrix.com> wrote: > > So would it therefore be OK to introduce the enhanced security promise > > - the lack of `class 2' bugs - for HVMCTL from the beginning ? > > I think so, since ... > > > This would involve a small amount of extra thought for each invididual > > hypercall, just to check that the assumptions we are relying on (as > > you put them above) are not violated. > > ... this looks to be a manageable amount of code auditing (albeit > I'd like to see whether someone else can perhaps come up with > some potential, more realistic kind of bug that could fall into class > 2 before volunteering to make an attempt at doing such an audit). Right. Let me try to think of some examples. Thinking `aloud': The real problem comes if a DMOP talks about the calling domain's resources or namespaces, implicitly or explicitly. An example of an explicit reference to the calling domain's resources is the references to memory space in the calling domain (vaddrs). We have already had an extensive discussion of that... Another example would be a DMOP that takes (or returns) an event channel number in the calling domain. This would be a problem because there would be nothing to stop qemu from messing about with evtchns which dom0 is using for other purposes (or conversely, there would be no way for the dom0 evtchn driver to know about the returned evtchn number and allow qemu to receive it). 
Another might be a DMOP that implicitly grants the target domain some of the calling domain's scheduling priority. (I realise this is quite implausible from a scheduling API POV, but it gives an idea.) Another example is that of course VCPU pool management and VCPU-PCPU pinning must not be available via DMOP. (I write `qemu' here for brevity and clarity, but really I mean any DMOP caller which is supposed to be privileged for the target domain but not generally privileged.) Does that help at all ? Thanks, Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-26 11:38 ` Ian Jackson @ 2016-08-26 12:58 ` Jan Beulich 2016-08-26 14:35 ` Ian Jackson 0 siblings, 1 reply; 74+ messages in thread From: Jan Beulich @ 2016-08-26 12:58 UTC (permalink / raw) To: Ian Jackson Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Tim Deegan, GeorgeDunlap, David Vrabel, Anthony Perard, xen-devel, dgdegra >>> On 26.08.16 at 13:38, <ian.jackson@eu.citrix.com> wrote: > Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu > depriv)"): >> On 08.08.16 at 15:46, <ian.jackson@eu.citrix.com> wrote: >> > So would it therefore be OK to introduce the enhanced security promise >> > - the lack of `class 2' bugs - for HVMCTL from the beginning ? >> >> I think so, since ... >> >> > This would involve a small amount of extra thought for each invididual >> > hypercall, just to check that the assumptions we are relying on (as >> > you put them above) are not violated. >> >> ... this looks to be a manageable amount of code auditing (albeit >> I'd like to see whether someone else can perhaps come up with >> some potential, more realistic kind of bug that could fall into class >> 2 before volunteering to make an attempt at doing such an audit). > > Right. > > Let me try to think of some examples. Thinking `aloud': > > The real problem comes if a DMOP talks about the calling domain's > resources or namespaces, implicitly or explicitly. > > An example of an explicit reference to the calling domain's resources > is the references to memory space in the calling domain (vaddrs). We > have already had an extensive discussion of that... > > Another example would be a DMOP that takes (or returns) an event > channel number in the calling domain. 
This would be a problem because > there would be nothing to stop qemu from messing about with evtchns > which dom0 is using for other purposes (or conversely, there would be > no way for the dom0 evtchn driver to know about the returned evtchn > number and allow qemu to receive it). Doesn't that follow the more general "mixing up own and target domains" pattern, which is relatively easy to audit for? > Another might be a DMOP that implicitly grants the target domain some > of the calling domain's scheduling priority. (I realise this is quite > implausible from a scheduling API POV, but it gives an idea.) > > Another example is that of course VCPU pool management and VCPU-PCPU > pinning must not be available via DMOP. > > (I write `qemu' here for brevity and clarity, but really I mean any > DMOP caller which is supposed to be privileged for the target domain > but not generally privileged.) These all look rather contrived, especially keeping in mind that what we mean to exclude right now are accidental violations of the intended isolation. I.e. I think for all of those one would need to go to some lengths to actually achieve the "goal", but they are rather unlikely to be the result of a bug. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-26 12:58 ` Jan Beulich @ 2016-08-26 14:35 ` Ian Jackson 2016-08-26 15:13 ` Jan Beulich 0 siblings, 1 reply; 74+ messages in thread From: Ian Jackson @ 2016-08-26 14:35 UTC (permalink / raw) To: Jan Beulich Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Tim Deegan, GeorgeDunlap, David Vrabel, Anthony Perard, xen-devel, dgdegra Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu depriv)"): > On 26.08.16 at 13:38, <ian.jackson@eu.citrix.com> wrote: > > Another example would be a DMOP that takes (or returns) an event > > channel number in the calling domain. This would be a problem because > > there would be nothing to stop qemu from messing about with evtchns > > which dom0 is using for other purposes (or conversely, there would be > > no way for the dom0 evtchn driver to know about the returned evtchn > > number and allow qemu to receive it). > > Doesn't that follow the more general "mixing up own and target > domains" pattern, which is relatively easy to audit for? Yes, as I understand what you mean by that pattern, indeed. > > Another might be a DMOP that implicitly grants the target domain some > > of the calling domain's scheduling priority. (I realise this is quite > > implausible from a scheduling API POV, but it gives an idea.) > > > > Another example is that of course VCPU pool management and VCPU-PCPU > > pinning must not be available via DMOP. > > > > (I write `qemu' here for brevity and clarity, but really I mean any > > DMOP caller which is supposed to be privileged for the target domain > > but not generally privileged.) > > These all look rather contrived, especially keeping in mind that > what we mean to exclude right now are accidental violations of > the intended isolation. I.e. I think for all of those one would need > to go to some lengths to actually achieve the "goal", but they are > rather unlikely to be the result of a bug. Right. 
So I think this confirms your conclusion that this "audit" (ie, checking that there are no problems in these kinds of categories) won't be very difficult ? Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-26 14:35 ` Ian Jackson @ 2016-08-26 15:13 ` Jan Beulich 2016-08-30 11:02 ` Ian Jackson 0 siblings, 1 reply; 74+ messages in thread From: Jan Beulich @ 2016-08-26 15:13 UTC (permalink / raw) To: Ian Jackson Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Tim Deegan, GeorgeDunlap, David Vrabel, Anthony Perard, xen-devel, dgdegra >>> On 26.08.16 at 16:35, <ian.jackson@eu.citrix.com> wrote: > Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu > depriv)"): >> On 26.08.16 at 13:38, <ian.jackson@eu.citrix.com> wrote: >> > Another example would be a DMOP that takes (or returns) an event >> > channel number in the calling domain. This would be a problem because >> > there would be nothing to stop qemu from messing about with evtchns >> > which dom0 is using for other purposes (or conversely, there would be >> > no way for the dom0 evtchn driver to know about the returned evtchn >> > number and allow qemu to receive it). >> >> Doesn't that follow the more general "mixing up own and target >> domains" pattern, which is relatively easy to audit for? > > Yes, as I understand what you mean by that pattern, indeed. > >> > Another might be a DMOP that implicitly grants the target domain some >> > of the calling domain's scheduling priority. (I realise this is quite >> > implausible from a scheduling API POV, but it gives an idea.) >> > >> > Another example is that of course VCPU pool management and VCPU-PCPU >> > pinning must not be available via DMOP. >> > >> > (I write `qemu' here for brevity and clarity, but really I mean any >> > DMOP caller which is supposed to be privileged for the target domain >> > but not generally privileged.) >> >> These all look rather contrived, especially keeping in mind that >> what we mean to exclude right now are accidental violations of >> the intended isolation. I.e. 
I think for all of those one would need >> to go to some lengths to actually achieve the "goal", but they are >> rather unlikely to be the result of a bug. > > Right. > > So I think this confirms your conclusion that this "audit" (ie, > checking that there are problems in these kind of categories) won't be > very difficult ? Well, in a way. And then not: Initially I had thought no issue would arise, until I came to realize the kernel memory corruption potential. Question is whether now we're overlooking some other not immediately obvious issue. The problem with auditing is that generally you can only look for things you're aware of (or magically become aware of while looking at the code). But I guess we should just go ahead with the patterns we know of. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-26 15:13 ` Jan Beulich @ 2016-08-30 11:02 ` Ian Jackson 2016-08-30 21:47 ` Stefano Stabellini 2016-09-02 14:08 ` Wei Liu 0 siblings, 2 replies; 74+ messages in thread From: Ian Jackson @ 2016-08-30 11:02 UTC (permalink / raw) To: Jan Beulich Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Tim Deegan, GeorgeDunlap, David Vrabel, Anthony Perard, xen-devel, dgdegra Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu depriv)"): > Well, in a way. And then not: Initially I had thought no issue would > arise, until I came to realize the kernel memory corruption potential. > Question is whether now we're overlooking some other not > immediately obvious issue. The problem with auditing is that > generally you can only look for things you're aware of (or magically > become aware of while looking at the code). But I guess we should > just go ahead with the patterns we know of. I think so, yes. I will take a look at the interfaces, at least, to see if I can spot anything missing. This will probably generate some more stupid questions... So, then, is everyone now happy with the overall approach ? That is, as I wrote up in: Message-ID: <22464.10246.708893.563258@mariner.uk.xensource.com> Subject: Re: Device model operation hypercall (DMOP, re qemu depriv) Date: Fri, 26 Aug 2016 12:29:10 +0100 Thanks, Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-30 11:02 ` Ian Jackson @ 2016-08-30 21:47 ` Stefano Stabellini 2016-09-02 14:08 ` Wei Liu 1 sibling, 0 replies; 74+ messages in thread From: Stefano Stabellini @ 2016-08-30 21:47 UTC (permalink / raw) To: Ian Jackson Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Tim Deegan, GeorgeDunlap, David Vrabel, Jan Beulich, Anthony Perard, xen-devel, dgdegra On Tue, 30 Aug 2016, Ian Jackson wrote: > Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu depriv)"): > > Well, in a way. And then not: Initially I had thought no issue would > > arise, until I came to realize the kernel memory corruption potential. > > Question is whether now we're overlooking some other not > > immediately obvious issue. The problem with auditing is that > > generally you can only look for things you're aware of (or magically > > become aware of while looking at the code). But I guess we should > > just go ahead with the patterns we know of. > > I think so, yes. I will take a look at the interfaces, at least, to > see if I can spot anything missing. This will probably generate some > more stupid questions... > > So, then, is everyone now happy with the overall approach ? That is, > as I wrote up in: > Message-ID: <22464.10246.708893.563258@mariner.uk.xensource.com> > Subject: Re: Device model operation hypercall (DMOP, re qemu depriv) > Date: Fri, 26 Aug 2016 12:29:10 +0100 I think it looks good _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-30 11:02 ` Ian Jackson 2016-08-30 21:47 ` Stefano Stabellini @ 2016-09-02 14:08 ` Wei Liu 1 sibling, 0 replies; 74+ messages in thread From: Wei Liu @ 2016-09-02 14:08 UTC (permalink / raw) To: Ian Jackson Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Tim Deegan, GeorgeDunlap, David Vrabel, Jan Beulich, Anthony Perard, xen-devel, dgdegra On Tue, Aug 30, 2016 at 12:02:26PM +0100, Ian Jackson wrote: > Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu depriv)"): > > Well, in a way. And then not: Initially I had thought no issue would > > arise, until I came to realize the kernel memory corruption potential. > > Question is whether now we're overlooking some other not > > immediately obvious issue. The problem with auditing is that > > generally you can only look for things you're aware of (or magically > > become aware of while looking at the code). But I guess we should > > just go ahead with the patterns we know of. > > I think so, yes. I will take a look at the interfaces, at least, to > see if I can spot anything missing. This will probably generate some > more stupid questions... > > So, then, is everyone now happy with the overall approach ? That is, > as I wrote up in: > Message-ID: <22464.10246.708893.563258@mariner.uk.xensource.com> > Subject: Re: Device model operation hypercall (DMOP, re qemu depriv) > Date: Fri, 26 Aug 2016 12:29:10 +0100 > It looks like a reasonable plan to me. Wei. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-08 13:46 ` Ian Jackson 2016-08-08 14:07 ` Jan Beulich @ 2016-08-09 10:29 ` Jan Beulich 2016-08-09 10:48 ` Ian Jackson 2016-08-15 14:50 ` David Vrabel 1 sibling, 2 replies; 74+ messages in thread From: Jan Beulich @ 2016-08-09 10:29 UTC (permalink / raw) To: Ian Jackson Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Tim Deegan, GeorgeDunlap, David Vrabel, Anthony Perard, xen-devel, dgdegra >>> On 08.08.16 at 15:46, <ian.jackson@eu.citrix.com> wrote: > Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu > depriv)"): >> On 05.08.16 at 18:28, <ian.jackson@eu.citrix.com> wrote: >> > That is, a bug of class 2 would allow the unprivileged qemu process in >> > dom0 to cause damage to other parts of dom0. > ... >> Ah, okay, I think I finally understand. [...] >> >> I'm having, however, a hard time imagining a class 2 bug for any >> of the hvmop-s that are being converted by the hvmctl series: >> These act on the target domain, so would not touch the calling >> ones state other than for copying argument structures to/from >> guest memory (and I don't view mixing up domain pointers as >> a likely source of problems). > > Right. AIUI all the hypercall arguments are passed using > calling-guest-virtual addresses, which the dom0 privcmd arrangements > are capable of ensuring are sane. Actually, having thought about this some more, and taking this together with the expectations to the privcmd driver previously outlined, I think this part is problematic: If all the driver is to know is the position (within the interface structure) of the target domain ID, then any guest handles embedded in the interface structure (XEN_HVMCTL_track_dirty_vram only for now) couldn't get validated, and hence user mode code would have a way to access or modify kernel memory. 
Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
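The hazard Jan describes can be seen from the shape of such an interface structure: a driver that is told only where the target domid lives sees everything else as opaque bytes, so an embedded guest handle cannot be distinguished from plain data. The sketch below is hypothetical - the field names are loosely modelled on XEN_HVMCTL_track_dirty_vram but are not the real ABI.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interface structure: an embedded guest handle (a
 * guest-virtual address) sits alongside ordinary scalar fields. */
struct hvmctl_track_dirty_vram {
    uint16_t domain;        /* target domain ID                */
    uint32_t nr_frames;     /* frames covered by the bitmap    */
    uint64_t dirty_bitmap;  /* embedded guest handle (a vaddr) */
};

/* A forwarder that knows only the offset of the domid accepts the op
 * even when dirty_bitmap points at kernel memory - it has no way to
 * know that field is a pointer at all. */
static bool forwarder_accepts(const void *op, size_t domid_offset)
{
    (void)op;           /* cannot interpret the payload ... */
    (void)domid_offset; /* ... beyond locating the domid    */
    return true;
}
```

Passing an op whose dirty_bitmap holds a kernel-space address still gets through, which is exactly the user-to-kernel memory access Jan is worried about.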
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-09 10:29 ` Jan Beulich @ 2016-08-09 10:48 ` Ian Jackson 2016-08-09 11:30 ` Jan Beulich 2016-08-15 14:50 ` David Vrabel 1 sibling, 1 reply; 74+ messages in thread From: Ian Jackson @ 2016-08-09 10:48 UTC (permalink / raw) To: Jan Beulich Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Tim Deegan, GeorgeDunlap, David Vrabel, Anthony Perard, xen-devel, dgdegra Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu depriv)"): > Actually, having thought about this some more, and taking this > together with the expectations to the privcmd driver previously > outlined, I think this part is problematic: If all the driver is to know > is the position (within the interface structure) of the target domain > ID, then any guest handles embedded in the interface structure > (XEN_HVMCTL_track_dirty_vram only for now) couldn't get > validated, and hence user mode code would have a way to access > or modify kernel memory. Could the hypervisor know the difference between user and kernel memory, in principle ? Alternatively, would it be possible for the ABI to specify somehow what parameters are guest handles, so that the privcmd driver could check them ? (Would it be sufficient to check the starts, or would the ends need to be checked too?) Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-09 10:48 ` Ian Jackson @ 2016-08-09 11:30 ` Jan Beulich 2016-08-12 9:44 ` George Dunlap 0 siblings, 1 reply; 74+ messages in thread From: Jan Beulich @ 2016-08-09 11:30 UTC (permalink / raw) To: Ian Jackson Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Tim Deegan, GeorgeDunlap, David Vrabel, Anthony Perard, xen-devel, dgdegra >>> On 09.08.16 at 12:48, <ian.jackson@eu.citrix.com> wrote: > Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu > depriv)"): >> Actually, having thought about this some more, and taking this >> together with the expectations to the privcmd driver previously >> outlined, I think this part is problematic: If all the driver is to know >> is the position (within the interface structure) of the target domain >> ID, then any guest handles embedded in the interface structure >> (XEN_HVMCTL_track_dirty_vram only for now) couldn't get >> validated, and hence user mode code would have a way to access >> or modify kernel memory. > > Could the hypervisor know the difference between user and kernel > memory, in principle ? Not without further new hypercalls, as the guest kernel would need to tell Xen what address ranges are kernel vs user (and that implies that any OS wishing to be able to act as Dom0 has a uniform separation of address spaces). > Alternatively, would it be possible for the ABI to specify somehow > what parameters are guest handles, so that the privcmd driver could > check them ? We could presumably invent something, but I'm afraid it would end up quite ugly. > (Would it be sufficient to check the starts, or would > the ends need to be checked too?) Both would need to be checked, so the size field(s) would need to be locatable too. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-09 11:30 ` Jan Beulich @ 2016-08-12 9:44 ` George Dunlap 2016-08-12 11:50 ` Jan Beulich 0 siblings, 1 reply; 74+ messages in thread From: George Dunlap @ 2016-08-12 9:44 UTC (permalink / raw) To: Jan Beulich, Ian Jackson Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Tim Deegan, David Vrabel, Anthony Perard, xen-devel, dgdegra On 09/08/16 12:30, Jan Beulich wrote: >>>> On 09.08.16 at 12:48, <ian.jackson@eu.citrix.com> wrote: >> Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu >> depriv)"): >>> Actually, having thought about this some more, and taking this >>> together with the expectations to the privcmd driver previously >>> outlined, I think this part is problematic: If all the driver is to know >>> is the position (within the interface structure) of the target domain >>> ID, then any guest handles embedded in the interface structure >>> (XEN_HVMCTL_track_dirty_vram only for now) couldn't get >>> validated, and hence user mode code would have a way to access >>> or modify kernel memory. >> >> Could the hypervisor know the difference between user and kernel >> memory, in principle ? > > Not without further new hypercalls, as the guest kernel would need > to tell Xen what address ranges are kernel vs user (and that implies > that any OS wishing to be able to act as Dom0 has a uniform > separation of address spaces). Couldn't Xen tell from the guest pagetables whether the memory being accessed was user-mode or kernel mode? >> Alternatively, would it be possible for the ABI to specify somehow >> what parameters are guest handles, so that the privcmd driver could >> check them ? > > We could presumably invent something, but I'm afraid it would end > up quite ugly. > >> (Would it be sufficient to check the starts, or would >> the ends need to be checked too?) > > Both would need to be checked, so the size field(s) would need to > be locatable too. 
We could have the "fixed" part of the structure contain domid and an array of <ptr, len> which the privcmd driver could check. I don't think that would be terrible. Alternately, the "fixed" part of the hypercall could contain a <start,end> range, which if non-zero, Xen should use to check any pointer contained in the struct -- that would be more flexible probably. Or we could do as Jan hints at above -- have some way to have dom0 communicate the kernel address range to Xen (either via hypercall, or maybe via the shared info page) so that Xen just knows the address layout for any individual domain. And unless we're positive the guest kernel will never need these hypercalls, we probably need a flag that allows kernel-mode pointers. It's worth pointing out that the problem Xen has distinguishing user/kernel mode pointers is the same even if we use the alternate suggestion of per-process XSM labels. -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
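George's first suggestion - a fixed part the driver fully understands, carrying the domid plus an array of <ptr, len> descriptors - might look roughly like this. The names and layout are illustrative only (not the eventual DMOP ABI), and a fixed user/kernel split is assumed as before.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative user/kernel split (not the platform's real boundary). */
#define USER_LIMIT ((uint64_t)1 << 47)

/* One guest-virtual buffer the operation will touch. */
struct dm_op_buf {
    uint64_t uaddr;  /* start of the buffer (calling-guest vaddr) */
    uint64_t size;   /* length in bytes                           */
};

/* Fixed part the privcmd driver fully understands: the target domid
 * plus every <ptr, len> pair the hypervisor may dereference. */
struct dm_op {
    uint16_t domid;
    uint16_t nr_bufs;
    struct dm_op_buf buf[4];
};

/* The driver can now audit every range the op describes, checking both
 * its start and its end (hence Jan's point that the size fields must
 * be locatable too). */
static bool privcmd_check_bufs(const struct dm_op *op)
{
    if (op->nr_bufs > 4)
        return false;
    for (uint16_t i = 0; i < op->nr_bufs; i++) {
        uint64_t start = op->buf[i].uaddr, len = op->buf[i].size;
        if (len > USER_LIMIT || start > USER_LIMIT - len)
            return false;  /* wraps, or reaches kernel space */
    }
    return true;
}
```

With this shape, everything the hypervisor interprets lives inside described buffers, so the driver needs no knowledge of individual sub-operations - it just walks the descriptor array.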
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-12 9:44 ` George Dunlap @ 2016-08-12 11:50 ` Jan Beulich 2016-08-15 9:39 ` George Dunlap 0 siblings, 1 reply; 74+ messages in thread From: Jan Beulich @ 2016-08-12 11:50 UTC (permalink / raw) To: George Dunlap Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Ian Jackson, Tim Deegan, David Vrabel, Anthony Perard, xen-devel, dgdegra >>> On 12.08.16 at 11:44, <george.dunlap@citrix.com> wrote: > On 09/08/16 12:30, Jan Beulich wrote: >>>>> On 09.08.16 at 12:48, <ian.jackson@eu.citrix.com> wrote: >>> Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu >>> depriv)"): >>>> Actually, having thought about this some more, and taking this >>>> together with the expectations to the privcmd driver previously >>>> outlined, I think this part is problematic: If all the driver is to know >>>> is the position (within the interface structure) of the target domain >>>> ID, then any guest handles embedded in the interface structure >>>> (XEN_HVMCTL_track_dirty_vram only for now) couldn't get >>>> validated, and hence user mode code would have a way to access >>>> or modify kernel memory. >>> >>> Could the hypervisor know the difference between user and kernel >>> memory, in principle ? >> >> Not without further new hypercalls, as the guest kernel would need >> to tell Xen what address ranges are kernel vs user (and that implies >> that any OS wishing to be able to act as Dom0 has a uniform >> separation of address spaces). > > Couldn't Xen tell from the guest pagetables whether the memory being > accessed was user-mode or kernel mode? That would be possible, but would feel like adding heuristics instead of a proper distinction. Clearly we'd already be in some trouble if there were cases where some structure doesn't get written to (and hence could live in user-r/o mapped space), but others would need to be verified to be user-r/w mapped. 
A lot of special casing, that is, and hence of lot of things to be got wrong. And then there is the problem of calling code being in rings 1 or 2: Page tables don't guard ring 0 against such, and we don't even have the notion of selectors (and hence address ranges) bounding accessible regions. We can't even say we assume all of them to be %ds-relative, as it would certainly be legitimate for such a structure to e.g. live on the stack. Of course an option would be to require the kernel driver to not allow requests from other than ring 3. Plus finally - how would we tell interface structures coming from a kernel invoked hypercall from those originating from user mode? There would need to be at least some kind of flag then, which the privcmd driver set, but normal hypercalls originating in the kernel don't. Or would you envision to allow this DMOP hypercall to only be made by user mode tools? If so, does stubdom run its qemu in ring 3 or rather in ring 0? [breaking the order of quoting] > And unless we're positive the guest kernel will never need these > hypercalls, we probably need a flag that allows kernel-mode pointers. Ah, you actually mention that already. >>> (Would it be sufficient to check the starts, or would >>> the ends need to be checked too?) >> >> Both would need to be checked, so the size field(s) would need to >> be locatable too. > > We could have the "fixed" part of the structure contain domid and an > array of <ptr, len> which the privcmd driver could check. I don't think > that would be terrible. Doable, yes, but not really nice, especially for the party invoking the hypercall as well as the backing implementation in Xen (as opposed to the privcmd driver, for which such a model would likely work quite well), as they then can't use the normal, simple reading of structure fields, but instead would need to populate array elements in the right order. 
Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-12 11:50 ` Jan Beulich @ 2016-08-15 9:39 ` George Dunlap 2016-08-15 10:19 ` Jan Beulich 0 siblings, 1 reply; 74+ messages in thread From: George Dunlap @ 2016-08-15 9:39 UTC (permalink / raw) To: Jan Beulich Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Ian Jackson, Tim Deegan, David Vrabel, Anthony Perard, xen-devel, dgdegra On 12/08/16 12:50, Jan Beulich wrote: >>>> On 12.08.16 at 11:44, <george.dunlap@citrix.com> wrote: >> On 09/08/16 12:30, Jan Beulich wrote: >>>>>> On 09.08.16 at 12:48, <ian.jackson@eu.citrix.com> wrote: >>>> Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu >>>> depriv)"): >>>>> Actually, having thought about this some more, and taking this >>>>> together with the expectations to the privcmd driver previously >>>>> outlined, I think this part is problematic: If all the driver is to know >>>>> is the position (within the interface structure) of the target domain >>>>> ID, then any guest handles embedded in the interface structure >>>>> (XEN_HVMCTL_track_dirty_vram only for now) couldn't get >>>>> validated, and hence user mode code would have a way to access >>>>> or modify kernel memory. >>>> >>>> Could the hypervisor know the difference between user and kernel >>>> memory, in principle ? >>> >>> Not without further new hypercalls, as the guest kernel would need >>> to tell Xen what address ranges are kernel vs user (and that implies >>> that any OS wishing to be able to act as Dom0 has a uniform >>> separation of address spaces). >> >> Couldn't Xen tell from the guest pagetables whether the memory being >> accessed was user-mode or kernel mode? > > That would be possible, but would feel like adding heuristics instead > of a proper distinction. 
Clearly we'd already be in some trouble if > there were cases where some structure doesn't get written to (and > hence could live in user-r/o mapped space), but others would need > to be verified to be user-r/w mapped. A lot of special casing, that is, > and hence of lot of things to be got wrong. > > And then there is the problem of calling code being in rings 1 or 2: > Page tables don't guard ring 0 against such, and we don't even have > the notion of selectors (and hence address ranges) bounding > accessible regions. We can't even say we assume all of them to be > %ds-relative, as it would certainly be legitimate for such a structure > to e.g. live on the stack. Of course an option would be to require > the kernel driver to not allow requests from other than ring 3. > > Plus finally - how would we tell interface structures coming from a > kernel invoked hypercall from those originating from user mode? > There would need to be at least some kind of flag then, which the > privcmd driver set, but normal hypercalls originating in the kernel > don't. Or would you envision to allow this DMOP hypercall to only > be made by user mode tools? If so, does stubdom run its qemu in > ring 3 or rather in ring 0? > > [breaking the order of quoting] >> And unless we're positive the guest kernel will never need these >> hypercalls, we probably need a flag that allows kernel-mode pointers. > > Ah, you actually mention that already. > >>>> (Would it be sufficient to check the starts, or would >>>> the ends need to be checked too?) >>> >>> Both would need to be checked, so the size field(s) would need to >>> be locatable too. >> >> We could have the "fixed" part of the structure contain domid and an >> array of <ptr, len> which the privcmd driver could check. I don't think >> that would be terrible. 
> > Doable, yes, but not really nice, especially for the party invoking > the hypercall as well as the backing implementation in Xen (as > opposed to the privcmd driver, for which such a model would likely > work quite well), as they then can't use the normal, simple reading > of structure fields, but instead would need to populate array > elements in the right order. So on the whole, what would be your suggestion for how to solve the userspace-pointer problem? -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-15 9:39 ` George Dunlap @ 2016-08-15 10:19 ` Jan Beulich 2016-08-15 10:47 ` George Dunlap 0 siblings, 1 reply; 74+ messages in thread From: Jan Beulich @ 2016-08-15 10:19 UTC (permalink / raw) To: George Dunlap Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Ian Jackson, Tim Deegan, David Vrabel, Anthony Perard, xen-devel, dgdegra >>> On 15.08.16 at 11:39, <george.dunlap@citrix.com> wrote: > On 12/08/16 12:50, Jan Beulich wrote: >>>>> On 12.08.16 at 11:44, <george.dunlap@citrix.com> wrote: >>> On 09/08/16 12:30, Jan Beulich wrote: >>>>>>> On 09.08.16 at 12:48, <ian.jackson@eu.citrix.com> wrote: >>>>> Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu >>>>> depriv)"): >>>>>> Actually, having thought about this some more, and taking this >>>>>> together with the expectations to the privcmd driver previously >>>>>> outlined, I think this part is problematic: If all the driver is to know >>>>>> is the position (within the interface structure) of the target domain >>>>>> ID, then any guest handles embedded in the interface structure >>>>>> (XEN_HVMCTL_track_dirty_vram only for now) couldn't get >>>>>> validated, and hence user mode code would have a way to access >>>>>> or modify kernel memory. >>>>> >>>>> Could the hypervisor know the difference between user and kernel >>>>> memory, in principle ? >>>> >>>> Not without further new hypercalls, as the guest kernel would need >>>> to tell Xen what address ranges are kernel vs user (and that implies >>>> that any OS wishing to be able to act as Dom0 has a uniform >>>> separation of address spaces). >>> >>> Couldn't Xen tell from the guest pagetables whether the memory being >>> accessed was user-mode or kernel mode? >> >> That would be possible, but would feel like adding heuristics instead >> of a proper distinction. 
Clearly we'd already be in some trouble if >> there were cases where some structure doesn't get written to (and >> hence could live in user-r/o mapped space), but others would need >> to be verified to be user-r/w mapped. A lot of special casing, that is, >> and hence of lot of things to be got wrong. >> >> And then there is the problem of calling code being in rings 1 or 2: >> Page tables don't guard ring 0 against such, and we don't even have >> the notion of selectors (and hence address ranges) bounding >> accessible regions. We can't even say we assume all of them to be >> %ds-relative, as it would certainly be legitimate for such a structure >> to e.g. live on the stack. Of course an option would be to require >> the kernel driver to not allow requests from other than ring 3. >> >> Plus finally - how would we tell interface structures coming from a >> kernel invoked hypercall from those originating from user mode? >> There would need to be at least some kind of flag then, which the >> privcmd driver set, but normal hypercalls originating in the kernel >> don't. Or would you envision to allow this DMOP hypercall to only >> be made by user mode tools? If so, does stubdom run its qemu in >> ring 3 or rather in ring 0? >> >> [breaking the order of quoting] >>> And unless we're positive the guest kernel will never need these >>> hypercalls, we probably need a flag that allows kernel-mode pointers. >> >> Ah, you actually mention that already. >> >>>>> (Would it be sufficient to check the starts, or would >>>>> the ends need to be checked too?) >>>> >>>> Both would need to be checked, so the size field(s) would need to >>>> be locatable too. >>> >>> We could have the "fixed" part of the structure contain domid and an >>> array of <ptr, len> which the privcmd driver could check. I don't think >>> that would be terrible. 
>> >> Doable, yes, but not really nice, especially for the party invoking >> the hypercall as well as the backing implementation in Xen (as >> opposed to the privcmd driver, for which such a model would likely >> work quite well), as they then can't use the normal, simple reading >> of structure fields, but instead would need to populate array >> elements in the right order. > > So on the whole, what would be your suggestion for how to solve the > userspace-pointer problem? Well, none of the options considered so far are really nice or readily available. I think the easiest to use for both the caller and the implementation of the hypercall would be the auxiliary hypercall for a kernel to indicate user/kernel boundaries plus a flag on the DMOP one for the kernel mode driver to indicate its user mode origin. The main (purely theoretical afaict) downside of this is the difficulty to use it in OSes with variable user/kernel boundaries. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-15 10:19 ` Jan Beulich @ 2016-08-15 10:47 ` George Dunlap 2016-08-15 11:20 ` Jan Beulich 0 siblings, 1 reply; 74+ messages in thread From: George Dunlap @ 2016-08-15 10:47 UTC (permalink / raw) To: Jan Beulich Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Ian Jackson, Tim Deegan, David Vrabel, Anthony Perard, xen-devel, dgdegra On 15/08/16 11:19, Jan Beulich wrote: >>>> On 15.08.16 at 11:39, <george.dunlap@citrix.com> wrote: >> On 12/08/16 12:50, Jan Beulich wrote: >>>>>> On 12.08.16 at 11:44, <george.dunlap@citrix.com> wrote: >>>> On 09/08/16 12:30, Jan Beulich wrote: >>>>>>>> On 09.08.16 at 12:48, <ian.jackson@eu.citrix.com> wrote: >>>>>> Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu >>>>>> depriv)"): >>>>>>> Actually, having thought about this some more, and taking this >>>>>>> together with the expectations to the privcmd driver previously >>>>>>> outlined, I think this part is problematic: If all the driver is to know >>>>>>> is the position (within the interface structure) of the target domain >>>>>>> ID, then any guest handles embedded in the interface structure >>>>>>> (XEN_HVMCTL_track_dirty_vram only for now) couldn't get >>>>>>> validated, and hence user mode code would have a way to access >>>>>>> or modify kernel memory. >>>>>> >>>>>> Could the hypervisor know the difference between user and kernel >>>>>> memory, in principle ? >>>>> >>>>> Not without further new hypercalls, as the guest kernel would need >>>>> to tell Xen what address ranges are kernel vs user (and that implies >>>>> that any OS wishing to be able to act as Dom0 has a uniform >>>>> separation of address spaces). >>>> >>>> Couldn't Xen tell from the guest pagetables whether the memory being >>>> accessed was user-mode or kernel mode? >>> >>> That would be possible, but would feel like adding heuristics instead >>> of a proper distinction. 
Clearly we'd already be in some trouble if >>> there were cases where some structure doesn't get written to (and >>> hence could live in user-r/o mapped space), but others would need >>> to be verified to be user-r/w mapped. A lot of special casing, that is, >>> and hence of lot of things to be got wrong. >>> >>> And then there is the problem of calling code being in rings 1 or 2: >>> Page tables don't guard ring 0 against such, and we don't even have >>> the notion of selectors (and hence address ranges) bounding >>> accessible regions. We can't even say we assume all of them to be >>> %ds-relative, as it would certainly be legitimate for such a structure >>> to e.g. live on the stack. Of course an option would be to require >>> the kernel driver to not allow requests from other than ring 3. >>> >>> Plus finally - how would we tell interface structures coming from a >>> kernel invoked hypercall from those originating from user mode? >>> There would need to be at least some kind of flag then, which the >>> privcmd driver set, but normal hypercalls originating in the kernel >>> don't. Or would you envision to allow this DMOP hypercall to only >>> be made by user mode tools? If so, does stubdom run its qemu in >>> ring 3 or rather in ring 0? >>> >>> [breaking the order of quoting] >>>> And unless we're positive the guest kernel will never need these >>>> hypercalls, we probably need a flag that allows kernel-mode pointers. >>> >>> Ah, you actually mention that already. >>> >>>>>> (Would it be sufficient to check the starts, or would >>>>>> the ends need to be checked too?) >>>>> >>>>> Both would need to be checked, so the size field(s) would need to >>>>> be locatable too. >>>> >>>> We could have the "fixed" part of the structure contain domid and an >>>> array of <ptr, len> which the privcmd driver could check. I don't think >>>> that would be terrible. 
>>> >>> Doable, yes, but not really nice, especially for the party invoking >>> the hypercall as well as the backing implementation in Xen (as >>> opposed to the privcmd driver, for which such a model would likely >>> work quite well), as they then can't use the normal, simple reading >>> of structure fields, but instead would need to populate array >>> elements in the right order. >> >> So on the whole, what would be your suggestion for how to solve the >> userspace-pointer problem? > > Well, none of the options considered so far are really nice or > readily available. I think the easiest to use for both the caller and > the implementation of the hypercall would be the auxiliary > hypercall for a kernel to indicate user/kernel boundaries plus a > flag on the DMOP one for the kernel mode driver to indicate its > user mode origin. The main (purely theoretical afaict) downside > of this is the difficulty to use it in OSes with variable user/kernel > boundaries. What about including in the "fixed" part of the hypercall a virtual address range that all pointers must be in? That wouldn't even require a user/kernel flag actually; and could conceivably be used by the caller (either userspace or kernel space) to thwart certain kinds of potential attacks. It would take changing the copy_guest() macros to include a potential range argument, but that shouldn't be too intrusive on the whole, I wouldn't think. -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-15 10:47 ` George Dunlap @ 2016-08-15 11:20 ` Jan Beulich 2016-08-15 12:07 ` Ian Jackson 2016-08-15 14:57 ` George Dunlap 0 siblings, 2 replies; 74+ messages in thread From: Jan Beulich @ 2016-08-15 11:20 UTC (permalink / raw) To: George Dunlap Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Ian Jackson, Tim Deegan, David Vrabel, Anthony Perard, xen-devel, dgdegra >>> On 15.08.16 at 12:47, <george.dunlap@citrix.com> wrote: > On 15/08/16 11:19, Jan Beulich wrote: >> Well, none of the options considered so far are really nice or >> readily available. I think the easiest to use for both the caller and >> the implementation of the hypercall would be the auxiliary >> hypercall for a kernel to indicate user/kernel boundaries plus a >> flag on the DMOP one for the kernel mode driver to indicate its >> user mode origin. The main (purely theoretical afaict) downside >> of this is the difficulty to use it in OSes with variable user/kernel >> boundaries. > > What about including in the "fixed" part of the hypercall a virtual > address range that all pointers must be in? That wouldn't even require > a user/kernel flag actually; and could conceivably be used by the caller > (either userspace or kernel space) to thwart certain kinds of potential > attacks. That's definitely an option, if we're sufficiently certain that no OSes will ever require two or more ranges. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-15 11:20 ` Jan Beulich @ 2016-08-15 12:07 ` Ian Jackson 2016-08-15 14:20 ` Jan Beulich 2016-08-15 14:57 ` George Dunlap 1 sibling, 1 reply; 74+ messages in thread From: Ian Jackson @ 2016-08-15 12:07 UTC (permalink / raw) To: Jan Beulich Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Tim Deegan, George Dunlap, David Vrabel, Anthony Perard, xen-devel, dgdegra Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu depriv)"): > On 15.08.16 at 12:47, <george.dunlap@citrix.com> wrote: > > What about including in the "fixed" part of the hypercall a virtual > > address range that all pointers must be in? That wouldn't even require > > a user/kernel flag actually; and could conceivably be used by the caller > > (either userspace or kernel space) to thwart certain kinds of potential > > attacks. > > That's definitely an option, if we're sufficiently certain that no OSes > will ever require two or more ranges. How hard would it be to allow the caller to specify several allowable ranges ? Note that the hypercall argument construction code in libxc already has to handle all hypercall argument memory specially, so libxc could automatically build a list of the arguments' memory addresses. What would be needed is some kind of restriction on (or variant of) copy_* which double-checked against the list provided in the non-op-specific part of the hypercall. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-15 12:07 ` Ian Jackson @ 2016-08-15 14:20 ` Jan Beulich 0 siblings, 0 replies; 74+ messages in thread From: Jan Beulich @ 2016-08-15 14:20 UTC (permalink / raw) To: Ian Jackson Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Tim Deegan, George Dunlap, David Vrabel, Anthony Perard, xen-devel, dgdegra >>> On 15.08.16 at 14:07, <ian.jackson@eu.citrix.com> wrote: > Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu > depriv)"): >> On 15.08.16 at 12:47, <george.dunlap@citrix.com> wrote: >> > What about including in the "fixed" part of the hypercall a virtual >> > address range that all pointers must be in? That wouldn't even require >> > a user/kernel flag actually; and could conceivably be used by the caller >> > (either userspace or kernel space) to thwart certain kinds of potential >> > attacks. >> >> That's definitely an option, if we're sufficiently certain that no OSes >> will ever require two or more ranges. > > How hard would it be to allow the caller to specify several allowable > ranges ? Not very hard - just like with so many things an additional level of indirection would do. > Note that the hypercall argument construction code in libxc already > has to handle all hypercall argument memory specially, so libxc could > automatically build a list of the arguments' memory addresses. > > What would be needed is some kind of restriction on (or variant of) > copy_* which double-checked against the list provided in the > non-op-specific part of the hypercall. Yeah, as George already mentioned. I'd favor "variant of", to avoid penalizing all other callers. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-15 11:20 ` Jan Beulich 2016-08-15 12:07 ` Ian Jackson @ 2016-08-15 14:57 ` George Dunlap 2016-08-15 15:22 ` Jan Beulich 1 sibling, 1 reply; 74+ messages in thread From: George Dunlap @ 2016-08-15 14:57 UTC (permalink / raw) To: Jan Beulich Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Ian Jackson, Tim Deegan, David Vrabel, Anthony Perard, xen-devel, dgdegra On 15/08/16 12:20, Jan Beulich wrote: >>>> On 15.08.16 at 12:47, <george.dunlap@citrix.com> wrote: >> On 15/08/16 11:19, Jan Beulich wrote: >>> Well, none of the options considered so far are really nice or >>> readily available. I think the easiest to use for both the caller and >>> the implementation of the hypercall would be the auxiliary >>> hypercall for a kernel to indicate user/kernel boundaries plus a >>> flag on the DMOP one for the kernel mode driver to indicate its >>> user mode origin. The main (purely theoretical afaict) downside >>> of this is the difficulty to use it in OSes with variable user/kernel >>> boundaries. >> >> What about including in the "fixed" part of the hypercall a virtual >> address range that all pointers must be in? That wouldn't even require >> a user/kernel flag actually; and could conceivably be used by the caller >> (either userspace or kernel space) to thwart certain kinds of potential >> attacks. > > That's definitely an option, if we're sufficiently certain that no OSes > will ever require two or more ranges. I'm sorry, I think this is getting a bit silly. There are currently no known operating systems which have discontinuous user-space virtual address ranges. Even if there were, the hypercall structure I'm proposing would still function; the only restriction would be that any single hypercall would have to have all arguments within one of those ranges. If we find such a monster in the future, we can try to figure out how to accommodate it at that point. 
Another idea I had was a multicall-style "wrapper" hypercall, similar to David's XSM idea, but instead of running with a specific XSM label, it would restrict the target domain and VA range of all included hypercalls. Then the individual hypercall structure wouldn't need to be known at all; and if it ever became important to have (say) two VA ranges, we could simply add a new "wrapper" hypercall.

-George
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-15 14:57 ` George Dunlap @ 2016-08-15 15:22 ` Jan Beulich 0 siblings, 0 replies; 74+ messages in thread From: Jan Beulich @ 2016-08-15 15:22 UTC (permalink / raw) To: George Dunlap Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Ian Jackson, Tim Deegan, David Vrabel, Anthony Perard, xen-devel, dgdegra >>> On 15.08.16 at 16:57, <george.dunlap@citrix.com> wrote: > On 15/08/16 12:20, Jan Beulich wrote: >>>>> On 15.08.16 at 12:47, <george.dunlap@citrix.com> wrote: >>> On 15/08/16 11:19, Jan Beulich wrote: >>>> Well, none of the options considered so far are really nice or >>>> readily available. I think the easiest to use for both the caller and >>>> the implementation of the hypercall would be the auxiliary >>>> hypercall for a kernel to indicate user/kernel boundaries plus a >>>> flag on the DMOP one for the kernel mode driver to indicate its >>>> user mode origin. The main (purely theoretical afaict) downside >>>> of this is the difficulty to use it in OSes with variable user/kernel >>>> boundaries. >>> >>> What about including in the "fixed" part of the hypercall a virtual >>> address range that all pointers must be in? That wouldn't even require >>> a user/kernel flag actually; and could conceivably be used by the caller >>> (either userspace or kernel space) to thwart certain kinds of potential >>> attacks. >> >> That's definitely an option, if we're sufficiently certain that no OSes >> will ever require two or more ranges. > > I'm sorry, I think this is getting a bit silly. There are currently no > known operating systems which have discontinuous user-space virtual > address ranges. Even if there were, the hypercall structure I'm > proposing would still function; the only restriction would be that any > single hypercall would have to have all arguments within one of those > ranges. Well, you then discount the original vDSO range 64-bit Linux had high up in the fixmap area. 
Not that this area would be usable for anything bad here, but it shows that such an idea isn't pure theory.

> Another idea I had was to do a multi-call-style hypercall "wrapper"
> hypercall, similar to David's XSM idea, but instead of running with a
> specific XSM label, would restrict the target domain and VA range of all
> included hypercalls. Then the individual hypercall structure wouldn't
> need to be known at all; and if it ever became important to have (say)
> two VA ranges, we could simply add a new "wrapper" hypercall.

That should work too, yes.

Jan
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-09 10:29 ` Jan Beulich 2016-08-09 10:48 ` Ian Jackson @ 2016-08-15 14:50 ` David Vrabel 2016-08-15 15:24 ` Jan Beulich 1 sibling, 1 reply; 74+ messages in thread From: David Vrabel @ 2016-08-15 14:50 UTC (permalink / raw) To: Jan Beulich, Ian Jackson Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Tim Deegan, GeorgeDunlap, Anthony Perard, xen-devel, dgdegra On 09/08/16 11:29, Jan Beulich wrote: >>>> On 08.08.16 at 15:46, <ian.jackson@eu.citrix.com> wrote: >> Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu >> depriv)"): >>> On 05.08.16 at 18:28, <ian.jackson@eu.citrix.com> wrote: >>>> That is, a bug of class 2 would allow the unprivileged qemu process in >>>> dom0 to cause damage to other parts of dom0. >> ... >>> Ah, okay, I think I finally understand. [...] >>> >>> I'm having, however, a hard time imagining a class 2 bug for any >>> of the hvmop-s that are being converted by the hvmctl series: >>> These act on the target domain, so would not touch the calling >>> ones state other than for copying argument structures to/from >>> guest memory (and I don't view mixing up domain pointers as >>> a likely source of problems). >> >> Right. AIUI all the hypercall arguments are passed using >> calling-guest-virtual addresses, which the dom0 privcmd arrangements >> are capable of ensuring are sane. > > Actually, having thought about this some more, and taking this > together with the expectations to the privcmd driver previously > outlined, I think this part is problematic: If all the driver is to know > is the position (within the interface structure) of the target domain > ID, then any guest handles embedded in the interface structure > (XEN_HVMCTL_track_dirty_vram only for now) couldn't get > validated, and hence user mode code would have a way to access > or modify kernel memory. 
It seems simpler to me to have, in the privcmd driver:

    if (op == HVMCTL_track_dirty_vram)
        ret = access_ok(...);

It looks simpler to me to fix the ABI and add the checking in the privcmd driver than to add one of the proposed mechanisms to allow the hypervisor to do the checking.

To avoid the issue of having to update the kernel in lock-step with the hypervisor (if new checks are needed in privcmd), we can (in the common case) version the checking in the driver.

I.e., if the privcmd driver knows about hypervisor ABI V but qemu needs V+1, then it can choose to disable the depriv and thus continue to work (with reduced defense in depth).

David
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-15 14:50 ` David Vrabel @ 2016-08-15 15:24 ` Jan Beulich 2016-08-26 11:29 ` Ian Jackson 0 siblings, 1 reply; 74+ messages in thread From: Jan Beulich @ 2016-08-15 15:24 UTC (permalink / raw) To: David Vrabel Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Ian Jackson, GeorgeDunlap, Tim Deegan, Anthony Perard, xen-devel, dgdegra >>> On 15.08.16 at 16:50, <david.vrabel@citrix.com> wrote: > On 09/08/16 11:29, Jan Beulich wrote: >>>>> On 08.08.16 at 15:46, <ian.jackson@eu.citrix.com> wrote: >>> Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu >>> depriv)"): >>>> On 05.08.16 at 18:28, <ian.jackson@eu.citrix.com> wrote: >>>>> That is, a bug of class 2 would allow the unprivileged qemu process in >>>>> dom0 to cause damage to other parts of dom0. >>> ... >>>> Ah, okay, I think I finally understand. [...] >>>> >>>> I'm having, however, a hard time imagining a class 2 bug for any >>>> of the hvmop-s that are being converted by the hvmctl series: >>>> These act on the target domain, so would not touch the calling >>>> ones state other than for copying argument structures to/from >>>> guest memory (and I don't view mixing up domain pointers as >>>> a likely source of problems). >>> >>> Right. AIUI all the hypercall arguments are passed using >>> calling-guest-virtual addresses, which the dom0 privcmd arrangements >>> are capable of ensuring are sane. >> >> Actually, having thought about this some more, and taking this >> together with the expectations to the privcmd driver previously >> outlined, I think this part is problematic: If all the driver is to know >> is the position (within the interface structure) of the target domain >> ID, then any guest handles embedded in the interface structure >> (XEN_HVMCTL_track_dirty_vram only for now) couldn't get >> validated, and hence user mode code would have a way to access >> or modify kernel memory. 
> > It seems simpler to me to have in the privcmd driver: > > if (op == HVMCTL_track_dirty_vram) > ret = access_ok(...); > > It looks simpler to me to fix the ABI and add the checking in the > privcmd driver than add one of the proposed mechanisms to allow the > hypervisor to do the checking. Simpler, yes, but ... > To avoid the issues with having to update the kernel in lock-step with > the hypervisor (if new checks are needed in privcmd), we can (in the > common case) do version the checking in the driver. > > i.e., if the privcmd drivers knows about hypervisor ABI V but qemu needs > V+1 then it can choose to disable the depriv and thus continue to work > (with reduced defense in depth). ... less flexible, and prone to end up in a mess once we have more than a handful of versions for the driver to deal with. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv)
  2016-08-15 15:24 ` Jan Beulich
@ 2016-08-26 11:29 ` Ian Jackson
  2016-08-26 12:58 ` Jan Beulich
  0 siblings, 1 reply; 74+ messages in thread
From: Ian Jackson @ 2016-08-26 11:29 UTC (permalink / raw)
To: Jan Beulich
Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim Deegan,
    George Dunlap, David Vrabel, Anthony Perard, xen-devel, dgdegra

Jan Beulich writes ("Re: Device model operation hypercall (DMOP, re qemu depriv)"):
> On 15.08.16 at 16:50, <david.vrabel@citrix.com> wrote:
> > It seems simpler to me to have in the privcmd driver:
> >
> > if (op == HVMCTL_track_dirty_vram)
> >     ret = access_ok(...);
> >
> > It looks simpler to me to fix the ABI and add the checking in the
> > privcmd driver than add one of the proposed mechanisms to allow the
> > hypervisor to do the checking.
>
> Simpler, yes, but ...
>
> > To avoid the issues with having to update the kernel in lock-step with
> > the hypervisor (if new checks are needed in privcmd), we can (in the
> > common case) version the checking in the driver.
> >
> > i.e., if the privcmd driver knows about hypervisor ABI V but qemu needs
> > V+1 then it can choose to disable the depriv and thus continue to work
> > (with reduced defense in depth).
>
> ... less flexible, and prone to end up in a mess once we have more
> than a handful of versions for the driver to deal with.

I agree.

To summarise the current proposal: DMOP is roughly the same as Jan's
HVMCTL. It needs two enhancements compared to HVMCTL:

* Each hypercall transferred to DMOP[1] must promise not to do bad
  things to the calling domain (or to the whole system). There will
  need to be some minimal audit (or consideration of the hypercall's
  functionality) for this purpose. Jan thinks this is not too much
  work but wants an example or two of a plausible vulnerability that
  needs to be excluded.

* We will need to enable the privcmd driver to defend dom0's memory.
  This will be done by the privcmd driver specifying to the hypervisor
  (either inside each DMOP hypercall struct, or in a previous
  hypercall) a single address range which is permitted for this
  purpose. (This would be the vaddr range which is used for
  userspace.) If the range is specified separately, the DMOP hypercall
  needs a flag to say whether this specific DMOP call ought to be
  subject to the restriction.

We also discussed compatibility. We need to present a stable ABI and
API to qemu. We considered making a limited stability promise for the
DMOP hypercalls, but this would involve the hypervisor supporting
multiple old versions of DMOPs, which is not ideal from a security
point of view. Instead, it is proposed that this will be dealt with in
libxc.

In more detail, I think the plan then has to look something like this:

Suppose a DMOP is introduced in Xen version K, and then modified in a
non-ABI-compatible way in Xen version K+1. Xen K+1 implements only the
new ABI, and Xen K provides only the old.

libxc in K provides only the single entrypoint for the old DMOP.
libxc in K+1 provides a new entrypoint for the new DMOP (with enhanced
parameters, or whatever). This new entrypoint needs no compatibility
handling and works only on Xen >= K+1 (and otherwise returns ENOSYS).

However, libxc in K+1 also provides the old entrypoint. This old
entrypoint checks the hypervisor DMOP ABI version, and makes either
the old or new hypercall.

For this to work, the ABI definition for the retired DMOP must remain
in xen.git somewhere, so that the code in libxc can call it. Also,
there should be a single DMOP ABI version which can be retrieved from
the hypervisor and cached by libxc. This could be the hypervisor ABI
version if we don't mind breakage during development.

Is this plan a good one for everyone?

Thanks,
Ian.

[1] By this I don't mean to take a point of view about the name.
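[Editor's sketch] The versioned-entrypoint scheme described above can be illustrated in plain user-space C. All names here are hypothetical stand-ins: a real libxc would issue actual hypercalls through the privcmd device, and the version constants and stub return values exist only so the dispatch logic can be shown and tested.

```c
#include <assert.h>

/* Hypothetical ABI versions: the DMOP introduced in Xen K is v1,
 * the incompatible revision shipped in Xen K+1 is v2. */
enum { DMOP_ABI_V1 = 1, DMOP_ABI_V2 = 2 };

/* Stubs standing in for the raw hypercalls. */
static int dmop_track_dirty_vram_v1(unsigned long first_pfn)
{
    (void)first_pfn;
    return 1;                       /* pretend the old ABI handled it */
}

static int dmop_track_dirty_vram_v2(unsigned long first_pfn,
                                    unsigned int flags)
{
    (void)first_pfn; (void)flags;
    return 2;                       /* pretend the new ABI handled it */
}

/* In a real libxc this would be retrieved from the hypervisor once
 * and cached per xc_interface handle. */
static int cached_dmop_abi_version;

/* The old (stable) entrypoint kept for API compatibility: it picks
 * whichever hypercall matches the hypervisor's DMOP ABI version. */
static int xc_track_dirty_vram(unsigned long first_pfn)
{
    if (cached_dmop_abi_version >= DMOP_ABI_V2)
        return dmop_track_dirty_vram_v2(first_pfn, 0 /* default flags */);
    return dmop_track_dirty_vram_v1(first_pfn);
}
```

The new entrypoint added in K+1 would call the v2 hypercall unconditionally and fail with ENOSYS on older hypervisors; only the old entrypoint carries dispatch logic.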
* Re: Device model operation hypercall (DMOP, re qemu depriv)
  2016-08-26 11:29 ` Ian Jackson
@ 2016-08-26 12:58 ` Jan Beulich
  0 siblings, 0 replies; 74+ messages in thread
From: Jan Beulich @ 2016-08-26 12:58 UTC (permalink / raw)
To: Ian Jackson
Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim Deegan,
    George Dunlap, David Vrabel, Anthony Perard, xen-devel, dgdegra

>>> On 26.08.16 at 13:29, <ian.jackson@eu.citrix.com> wrote:
> Is this plan a good one for everyone?

Sounds reasonable to me; just needs settling on a few of the actual
details.

Jan
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-01 11:32 ` Device model operation hypercall (DMOP, re qemu depriv) Ian Jackson 2016-08-01 12:41 ` Jan Beulich @ 2016-08-02 11:37 ` Wei Liu 2016-08-02 11:42 ` George Dunlap 2016-09-09 15:16 ` Jennifer Herbert 2 siblings, 1 reply; 74+ messages in thread From: Wei Liu @ 2016-08-02 11:37 UTC (permalink / raw) To: Ian Jackson Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Tim Deegan, David Vrabel, Jan Beulich, Anthony Perard, xen-devel, dgdegra On Mon, Aug 01, 2016 at 12:32:54PM +0100, Ian Jackson wrote: > Introducing HVMCTL, Jan wrote: > > A long while back separating out all control kind operations (intended > > for use by only the control domain or device model) from the currect > > hvmop hypercall has been discussed. This series aims at finally making > > this reality (at once allowing to streamline the associated XSM checking). > > I think we need to introduce a new hypercall (which I will call DMOP > for now) which may augment or replace some of HVMCTL. Let me explain: > > > We would like to be able to deprivilege qemu-in-dom0. This is > because qemu has a large attack surface and has a history of security > bugs. If we get this right we can easily reduce the impact of `guest > can take over qemu' bugs to DoS; and perhaps with a bit of effort we > can eliminate the DoS too. (qemu stubdom are another way to do this > but they have their own difficulties.) > > A part of this plan has to be a way for qemu to make hypercalls > related to the guest it is servicing. But qemu needs to be _unable_ > to make _other_ hypercalls. > > I see four possible approaches. In IMO increasing order of > desirability: > > 1. We could simply patch the dom0 privcmd driver to know exactly which > hypercalls are permitted. This is obviously never going to work > because there would have to be a massive table in the kernel, kept > in step with Xen. 
We could have a kind of pattern matching engine > instead, and load the tables from userspace, but that's a daft > edifice to be building (even if we reuse BPF or something) and a > total pain to maintain. > > 2. We could have some kind of privileged proxy or helper process, > which makes the hypercalls on instruction from qemu. This would be > quite complicated and involve a lot of back-and-forth parameter > passing. Like option 1, this arrangement would end up embedding > detailed knowledge about which hypercalls are appropriate, and have > to understand all of their parameters. > > 3. We could have the dom0 privcmd driver wrap each of qemu's > hypercalls in a special "wrap up with different XSM tag" hypercall. > Then, we could specify the set of allowable hypercalls with XSM. > If we want qemu deprivileged by default, this depends on turning > XSM on by default. But we want qemu depriv ASAP and there are > difficulties with XSM by default. This approach also involves > writing a large and hard-to-verify hypercall permission table, in > the form of an XSM policy. > > 4. We could invent a new hypercall `DMOP' for hypercalls which device > models should be able to use, which always has the target domain in > a fixed location in the arguments. We have the dom0 privcmd driver > know about this one hypercall number and the location of the target > domid. > > Option 4 has the following advantages: > > * The specification of which hypercalls are authorised to qemu is > integrated with the specification of the hypercalls themselves: > There is no need to maintain a separate table which can get out of > step (or contain security bugs). > > * The changes required to the rest of the system are fairly small. > In particular: > > * We need only one small, non-varying, patch to the dom0 kernel. > I think your analysis makes sense. > > Let me flesh out option 4 in more detail: > > > We define a new hypercall DMOP. > > Its first argument is always a target domid. 
> The DMOP hypercall
> number and position of the target domid in the arguments are fixed.
>
> A DMOP is defined to never put at risk the stability or security of
> the whole system, nor of the domain which calls DMOP. However, a DMOP
> may have arbitrary effects on the target domid.

I would like to point out that this is non-trivial, since we would
need to audit a lot of stuff. But the requirement to audit the
interface is not unique to DMOP -- I expect this is needed for any
other approach.

> In the privcmd driver, we provide a new restriction ioctl, which takes
> a domid parameter. After that restriction ioctl is called, the
> privcmd driver will permit only DMOP hypercalls, and only with the
> specified target domid.

This is phrased as though the guest kernel is supposed to enforce the
policy? Would it be possible to make Xen do it? I don't think we
should trust the DM domain kernel here.

Wei.
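[Editor's sketch] The restriction ioctl in the quoted proposal amounts to a small, one-way filter in the privcmd driver. The sketch below models only that filtering logic in user-space C; the hypercall numbers, the DOMID_INVALID sentinel value, and all function names are illustrative assumptions, not the real Linux driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical hypercall numbers and an "unrestricted" sentinel. */
#define HYPERCALL_DM_OP   41
#define HYPERCALL_DOMCTL  36
#define DOMID_INVALID     0x7FF4U

/* Per-file-handle state, as the privcmd driver might keep it. */
struct privcmd_data {
    unsigned int restrict_domid;   /* DOMID_INVALID => not restricted */
};

/* The restriction ioctl: one-way, cannot be lifted afterwards. */
static void privcmd_ioctl_restrict(struct privcmd_data *d,
                                   unsigned int domid)
{
    d->restrict_domid = domid;
}

/* Gate applied before forwarding any hypercall to Xen. */
static bool privcmd_hypercall_allowed(const struct privcmd_data *d,
                                      unsigned int hypercall,
                                      unsigned int target_domid)
{
    if (d->restrict_domid == DOMID_INVALID)
        return true;                          /* not restricted */
    return hypercall == HYPERCALL_DM_OP &&    /* only DMOP ...  */
           target_domid == d->restrict_domid; /* ... on the one domain */
}
```

As Wei notes, this check runs in the DM domain's kernel, so Xen is still trusting that kernel to apply it; the filter narrows the attack surface exposed to the qemu process, not to the kernel itself.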
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-02 11:37 ` Wei Liu @ 2016-08-02 11:42 ` George Dunlap 2016-08-02 12:34 ` Wei Liu 0 siblings, 1 reply; 74+ messages in thread From: George Dunlap @ 2016-08-02 11:42 UTC (permalink / raw) To: Wei Liu, Ian Jackson Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, Tim Deegan, David Vrabel, Jan Beulich, Anthony Perard, xen-devel, dgdegra On 02/08/16 12:37, Wei Liu wrote: > On Mon, Aug 01, 2016 at 12:32:54PM +0100, Ian Jackson wrote: >> Introducing HVMCTL, Jan wrote: >>> A long while back separating out all control kind operations (intended >>> for use by only the control domain or device model) from the currect >>> hvmop hypercall has been discussed. This series aims at finally making >>> this reality (at once allowing to streamline the associated XSM checking). >> >> I think we need to introduce a new hypercall (which I will call DMOP >> for now) which may augment or replace some of HVMCTL. Let me explain: >> >> >> We would like to be able to deprivilege qemu-in-dom0. This is >> because qemu has a large attack surface and has a history of security >> bugs. If we get this right we can easily reduce the impact of `guest >> can take over qemu' bugs to DoS; and perhaps with a bit of effort we >> can eliminate the DoS too. (qemu stubdom are another way to do this >> but they have their own difficulties.) >> >> A part of this plan has to be a way for qemu to make hypercalls >> related to the guest it is servicing. But qemu needs to be _unable_ >> to make _other_ hypercalls. >> >> I see four possible approaches. In IMO increasing order of >> desirability: >> >> 1. We could simply patch the dom0 privcmd driver to know exactly which >> hypercalls are permitted. This is obviously never going to work >> because there would have to be a massive table in the kernel, kept >> in step with Xen. 
We could have a kind of pattern matching engine >> instead, and load the tables from userspace, but that's a daft >> edifice to be building (even if we reuse BPF or something) and a >> total pain to maintain. >> >> 2. We could have some kind of privileged proxy or helper process, >> which makes the hypercalls on instruction from qemu. This would be >> quite complicated and involve a lot of back-and-forth parameter >> passing. Like option 1, this arrangement would end up embedding >> detailed knowledge about which hypercalls are appropriate, and have >> to understand all of their parameters. >> >> 3. We could have the dom0 privcmd driver wrap each of qemu's >> hypercalls in a special "wrap up with different XSM tag" hypercall. >> Then, we could specify the set of allowable hypercalls with XSM. >> If we want qemu deprivileged by default, this depends on turning >> XSM on by default. But we want qemu depriv ASAP and there are >> difficulties with XSM by default. This approach also involves >> writing a large and hard-to-verify hypercall permission table, in >> the form of an XSM policy. >> >> 4. We could invent a new hypercall `DMOP' for hypercalls which device >> models should be able to use, which always has the target domain in >> a fixed location in the arguments. We have the dom0 privcmd driver >> know about this one hypercall number and the location of the target >> domid. >> >> Option 4 has the following advantages: >> >> * The specification of which hypercalls are authorised to qemu is >> integrated with the specification of the hypercalls themselves: >> There is no need to maintain a separate table which can get out of >> step (or contain security bugs). >> >> * The changes required to the rest of the system are fairly small. >> In particular: >> >> * We need only one small, non-varying, patch to the dom0 kernel. >> > > I think your analysis makes sense. > >> >> Let me flesh out option 4 in more detail: >> >> >> We define a new hypercall DMOP. 
>> >> Its first argument is always a target domid. The DMOP hypercall >> number and position of the target domid in the arguments are fixed. >> >> A DMOP is defined to never put at risk the stability or security of >> the whole system, nor of the domain which calls DMOP. However, a DMOP >> may have arbitrary effects on the target domid. >> > > I would like to point out that this is non-trivial since we would need > to audit a lot of stuff. > > But the requirement to audit interface is not unique to DMOP -- I expect > this is needed for any other approach. > >> In the privcmd driver, we provide a new restriction ioctl, which takes >> a domid parameter. After that restriction ioctl is called, the >> privcmd driver will permit only DMOP hypercalls, and only with the >> specified target domid. >> > > It is phrased like that the guest kernel is supposed to enforce the > policy? Would it be possible to make Xen do it? I don't think we should > trust DM domain kernel here. The problem is that Xen doesn't know what process is running, and so can't tell whether qemuA is accessing domainA's memory, or whether qemuB is accessing domainA's memory. The two options that have been proposed are: 1. Have a way for dom0 to give Xen an XSM tag for the current process (so Xen can do the enforcing) 2. Have dom0 filter out the calls based on the fact that all the hypercalls have the same template (i.e., domid in the same position). Either way you are relying on dom0 ("trusting" dom0) to DTRT -- either to do the filtering properly, or to give you the right XSM tag. -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-08-02 11:42 ` George Dunlap @ 2016-08-02 12:34 ` Wei Liu 0 siblings, 0 replies; 74+ messages in thread From: Wei Liu @ 2016-08-02 12:34 UTC (permalink / raw) To: George Dunlap Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, David Vrabel, Jan Beulich, Anthony Perard, xen-devel, dgdegra On Tue, Aug 02, 2016 at 12:42:36PM +0100, George Dunlap wrote: > On 02/08/16 12:37, Wei Liu wrote: > > On Mon, Aug 01, 2016 at 12:32:54PM +0100, Ian Jackson wrote: > >> Introducing HVMCTL, Jan wrote: > >>> A long while back separating out all control kind operations (intended > >>> for use by only the control domain or device model) from the currect > >>> hvmop hypercall has been discussed. This series aims at finally making > >>> this reality (at once allowing to streamline the associated XSM checking). > >> > >> I think we need to introduce a new hypercall (which I will call DMOP > >> for now) which may augment or replace some of HVMCTL. Let me explain: > >> > >> > >> We would like to be able to deprivilege qemu-in-dom0. This is > >> because qemu has a large attack surface and has a history of security > >> bugs. If we get this right we can easily reduce the impact of `guest > >> can take over qemu' bugs to DoS; and perhaps with a bit of effort we > >> can eliminate the DoS too. (qemu stubdom are another way to do this > >> but they have their own difficulties.) > >> > >> A part of this plan has to be a way for qemu to make hypercalls > >> related to the guest it is servicing. But qemu needs to be _unable_ > >> to make _other_ hypercalls. > >> > >> I see four possible approaches. In IMO increasing order of > >> desirability: > >> > >> 1. We could simply patch the dom0 privcmd driver to know exactly which > >> hypercalls are permitted. This is obviously never going to work > >> because there would have to be a massive table in the kernel, kept > >> in step with Xen. 
We could have a kind of pattern matching engine > >> instead, and load the tables from userspace, but that's a daft > >> edifice to be building (even if we reuse BPF or something) and a > >> total pain to maintain. > >> > >> 2. We could have some kind of privileged proxy or helper process, > >> which makes the hypercalls on instruction from qemu. This would be > >> quite complicated and involve a lot of back-and-forth parameter > >> passing. Like option 1, this arrangement would end up embedding > >> detailed knowledge about which hypercalls are appropriate, and have > >> to understand all of their parameters. > >> > >> 3. We could have the dom0 privcmd driver wrap each of qemu's > >> hypercalls in a special "wrap up with different XSM tag" hypercall. > >> Then, we could specify the set of allowable hypercalls with XSM. > >> If we want qemu deprivileged by default, this depends on turning > >> XSM on by default. But we want qemu depriv ASAP and there are > >> difficulties with XSM by default. This approach also involves > >> writing a large and hard-to-verify hypercall permission table, in > >> the form of an XSM policy. > >> > >> 4. We could invent a new hypercall `DMOP' for hypercalls which device > >> models should be able to use, which always has the target domain in > >> a fixed location in the arguments. We have the dom0 privcmd driver > >> know about this one hypercall number and the location of the target > >> domid. > >> > >> Option 4 has the following advantages: > >> > >> * The specification of which hypercalls are authorised to qemu is > >> integrated with the specification of the hypercalls themselves: > >> There is no need to maintain a separate table which can get out of > >> step (or contain security bugs). > >> > >> * The changes required to the rest of the system are fairly small. > >> In particular: > >> > >> * We need only one small, non-varying, patch to the dom0 kernel. > >> > > > > I think your analysis makes sense. 
> > > >> > >> Let me flesh out option 4 in more detail: > >> > >> > >> We define a new hypercall DMOP. > >> > >> Its first argument is always a target domid. The DMOP hypercall > >> number and position of the target domid in the arguments are fixed. > >> > >> A DMOP is defined to never put at risk the stability or security of > >> the whole system, nor of the domain which calls DMOP. However, a DMOP > >> may have arbitrary effects on the target domid. > >> > > > > I would like to point out that this is non-trivial since we would need > > to audit a lot of stuff. > > > > But the requirement to audit interface is not unique to DMOP -- I expect > > this is needed for any other approach. > > > >> In the privcmd driver, we provide a new restriction ioctl, which takes > >> a domid parameter. After that restriction ioctl is called, the > >> privcmd driver will permit only DMOP hypercalls, and only with the > >> specified target domid. > >> > > > > It is phrased like that the guest kernel is supposed to enforce the > > policy? Would it be possible to make Xen do it? I don't think we should > > trust DM domain kernel here. > > The problem is that Xen doesn't know what process is running, and so > can't tell whether qemuA is accessing domainA's memory, or whether qemuB > is accessing domainA's memory. > > The two options that have been proposed are: > > 1. Have a way for dom0 to give Xen an XSM tag for the current process > (so Xen can do the enforcing) > > 2. Have dom0 filter out the calls based on the fact that all the > hypercalls have the same template (i.e., domid in the same position). > > Either way you are relying on dom0 ("trusting" dom0) to DTRT -- either > to do the filtering properly, or to give you the right XSM tag. > Right. I think that's what slipped my mind. Thanks for explaining! Wei. 
> -George
* Re: Device model operation hypercall (DMOP, re qemu depriv)
  2016-08-01 11:32 ` Device model operation hypercall (DMOP, re qemu depriv) Ian Jackson
  2016-08-01 12:41 ` Jan Beulich
  2016-08-02 11:37 ` Wei Liu
@ 2016-09-09 15:16 ` Jennifer Herbert
  2016-09-09 15:34 ` David Vrabel
  ` (2 more replies)
  2 siblings, 3 replies; 74+ messages in thread
From: Jennifer Herbert @ 2016-09-09 15:16 UTC (permalink / raw)
To: Ian Jackson, Jan Beulich, Andrew Cooper, dgdegra, Wei Liu,
    George Dunlap, Stefano Stabellini, Tim Deegan, David Vrabel,
    Anthony Perard, Konrad Rzeszutek Wilk
Cc: xen-devel

On 01/08/16 12:32, Ian Jackson wrote:
> I think we need to introduce a new hypercall (which I will call DMOP
> for now) which may augment or replace some of HVMCTL. Let me explain:

I believe the new 'DMOP' hypercall is a good idea, but following on
from discussions, I propose a revised design, which I present below.
Please let me know what you think.

Thanks,
Jenny.

DMOP (multi-buffer variant)
============================

Introduction
------------

A previous proposal for a 'DMOP' was put forward by Ian Jackson on the
1st of August. That proposal seemed very promising; however, a problem
was identified with it, which this proposal addresses.

The aim of DMOP, as before, is to prevent a compromised device model
from compromising domains other than the one it is associated with
(which, by that point, is likely already compromised).

The previous proposal adds a DMOP hypercall, for use by the device
models, which places the domain ID in a fixed place within the calling
args, such that the privcmd driver can always find it, and need not
know any further details about the particular DMOP in order to
validate it against the domain previously set (via ioctl).

The problem occurs when you have a DMOP with references to other user
memory objects, such as with track-dirty-VRAM (as used with the VGA
buffer). In this case, the address of this other user object needs to
be vetted, to ensure it is not within a restricted address range, such
as kernel memory. The real problem comes down to how you would vet
this address - the ideal place to do this is within the privcmd
driver, since it has knowledge of the address space involved. However,
since a principal goal of DMOP is to keep privcmd free from any
knowledge of DMOP's sub-ops, it would have no way to identify any user
buffer addresses that need checking. The alternative of having the
hypervisor vet the address is also problematic, since it has no
knowledge of the guest memory layout.

The Design
----------

As with the previous design, we provide a new restriction ioctl, which
takes a domid parameter. After that restriction ioctl is called, the
privcmd driver will permit only DMOP hypercalls, and only with the
specified target domid.

In the previous design, a DMOP consisted of one buffer, containing all
of the DMOP parameters, which may include further explicit references
to more buffers. In this design, an array of buffers (each with a
length) is presented, with the first one containing the DMOP
parameters, which may implicitly refer to further buffers within the
array. Here, the only user buffers passed are those found within the
array, and so all can be audited by privcmd. Since the length of the
buffer array is passed too, privcmd does not need to know which DMOP
it is in order to audit them. If the hypervisor ends up with the
wrong number of buffers, it can reject the DMOP at that point.

The following code illustrates this idea:

typedef struct dm_op_buffer {
    XEN_GUEST_HANDLE(void) h;
    size_t len;
} dm_op_buffer_t;

int
HYPERVISOR_device_model_op(
    domid_t domid,
    unsigned int nr_buffers,
    XEN_GUEST_HANDLE_PARAM(dm_op_buffer_t) buffers)

@domid: the domain the hypercall operates on.
@nr_buffers: the number of buffers in the @buffers array.
@buffers: an array of buffers. @buffers[0] contains device_model_op -
the structure describing the sub-op and its parameters. @buffers[1],
@buffers[2] etc. may be used by a sub-op for passing additional
buffers.

struct device_model_op {
    uint32_t op;
    union {
        struct op_1 op1;
        struct op_2 op2;
        /* etc... */
    } u;
};

It is forbidden for the above struct (device_model_op) to contain any
guest handles - if they are needed, they should instead be in
HYPERVISOR_device_model_op->buffers.

Validation by privcmd driver
----------------------------

If the privcmd driver has been restricted to a specific domain (using
the new ioctl), when it receives an op it will:

1. Check the hypercall is DMOP.
2. Check domid == restricted domid.
3. For each of the @nr_buffers in @buffers: check @h and @len give a
   buffer wholly in the user-space part of the virtual address space
   (e.g., on Linux use access_ok()).

Xen Implementation
------------------

Since a DMOP sub-op may need to copy or return a buffer from the
guest, as well as the DMOP itself needing to fetch the initial
parameter buffer, functions for doing this would be written as below.
Note that care is taken to prevent damage from buffer under- or
over-run situations. If the DMOP is called with insufficient buffers,
zeros will be read, while extra buffers are ignored.

int copy_dm_buffer_from_guest(
    void *dst,               /* Kernel destination buffer      */
    size_t max_len,          /* Size of destination buffer     */
    XEN_GUEST_HANDLE_PARAM(dm_op_buffer_t) buffers,
                             /* dm_op_buffers passed into DMOP */
    unsigned int nr_buffers, /* total number of dm_op_buffers  */
    unsigned int idx)        /* Index of buffer we require     */
{
    struct dm_op_buffer buffer;
    size_t len;

    memset(dst, 0, max_len);

    if ( idx >= nr_buffers )
        return -EFAULT;

    if ( copy_from_guest_offset(&buffer, buffers, idx, 1) )
        return -EFAULT;

    len = min(max_len, buffer.len);
    if ( raw_copy_from_guest(dst, buffer.h, len) )
        return -EFAULT;

    return 0;
}

int copy_dm_buffer_to_guest(...)
{
    /* Similar to the above, except copying in the other direction. */
}

This leaves do_device_model_op easy to implement, as below:

int do_device_model_op(domid_t domid, unsigned int nr_buffers,
                       XEN_GUEST_HANDLE_PARAM(dm_op_buffer_t) buffers)
{
    struct device_model_op op;
    int ret;

    ret = copy_dm_buffer_from_guest(&op, sizeof(op), buffers,
                                    nr_buffers, 0);
    if ( ret < 0 )
        return ret;

    switch ( op.op )
    {
    case DMOP_sub_op1:
        /* ... */
        break;
    /* etc. */
    }

    return 0;
}

Summary
-------

Advantages of this system over previous DMOP proposals:

* The validation of address ranges is easily done by the privcmd
driver, using standard kernel mechanisms. There is no need to get Xen
thinking about guest memory layout, which it should be independent
of, and which could potentially add confusion.

* An arbitrary number of additional address ranges can be validated
with the same mechanism as the initial parameter block.

* No need for any new copy_from_guest() variants in the hypervisor,
which among other things prevents code using the wrong one by error,
potentially bypassing security.

And as with the original DMOP proposal:

* The privcmd driver, and any other kernel parts, will not need to be
updated when new DMOPs are added or changed.

Disadvantages of this system:

* Minor stylistic issue relating to buffers being implicitly referred
to.
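[Editor's sketch] The privcmd-side checks listed under "Validation by privcmd driver" can be modelled outside the kernel. This sketch simulates only step 3, the per-buffer bounds check; FAKE_TASK_SIZE, the struct layout, and all names are illustrative assumptions (a real Linux driver would call access_ok() on the actual guest handle).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Stand-in for the kernel's user/kernel address split. */
#define FAKE_TASK_SIZE 0x800000000000ULL

struct dm_op_buffer {
    uint64_t uaddr;  /* guest-virtual start of the buffer */
    uint64_t len;
};

/* Mimics access_ok(): reject wrap-around and any buffer reaching
 * above the user address limit. */
static bool fake_access_ok(uint64_t uaddr, uint64_t len)
{
    return len <= FAKE_TASK_SIZE && uaddr <= FAKE_TASK_SIZE - len;
}

/* Step 3 of the validation: every buffer must lie wholly within the
 * user-space part of the address space. */
static bool validate_dm_buffers(const struct dm_op_buffer *bufs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!fake_access_ok(bufs[i].uaddr, bufs[i].len))
            return false;
    return true;
}
```

Because the check iterates over an opaque (handle, length) array, it needs no knowledge of which sub-op is being invoked, which is the design's central point.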
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-09-09 15:16 ` Jennifer Herbert @ 2016-09-09 15:34 ` David Vrabel 2016-09-12 13:47 ` George Dunlap 2016-09-12 14:32 ` Jan Beulich 2 siblings, 0 replies; 74+ messages in thread From: David Vrabel @ 2016-09-09 15:34 UTC (permalink / raw) To: Jennifer Herbert, Ian Jackson, Jan Beulich, Andrew Cooper, dgdegra, Wei Liu, George Dunlap, Stefano Stabellini, Tim Deegan, David Vrabel, Anthony Perard, Konrad Rzeszutek Wilk Cc: xen-devel On 09/09/16 16:16, Jennifer Herbert wrote: > On 01/08/16 12:32, Ian Jackson wrote: >> I think we need to introduce a new hypercall (which I will call DMOP >> for now) which may augment or replace some of HVMCTL. Let me explain: >> > > I believe the new 'DMOP' hypercall is a good idea, but following on > from discussions, I propose a revised design, which I present below. > Please let me know what you think. > > Thanks, > Jenny. > > > DMOP (multi-buffer variant) > ============================ [...] > Advantages of this system, over previouse DMOP proposals: > > > * The validation of address ranges is easily done by the privcmd driver, > using standard kernel mechanisms. No need to get Xen thinking about > guest memory layout, which it should be independent of, and potentially > adding confusion. +1. The user address limit in Linux is a per-thread property, so trying to pass this information to the hypervisor would require passing this information in every dm_op, or switching the information on every task switch. The user address limit is also architecture specific, and would require every arch to expose this via some new API. David _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-09-09 15:16 ` Jennifer Herbert 2016-09-09 15:34 ` David Vrabel @ 2016-09-12 13:47 ` George Dunlap 2016-09-12 14:32 ` Jan Beulich 2 siblings, 0 replies; 74+ messages in thread From: George Dunlap @ 2016-09-12 13:47 UTC (permalink / raw) To: Jennifer Herbert, Ian Jackson, Jan Beulich, Andrew Cooper, dgdegra, Wei Liu, George Dunlap, Stefano Stabellini, Tim Deegan, David Vrabel, Anthony Perard, Konrad Rzeszutek Wilk Cc: xen-devel On 09/09/16 16:16, Jennifer Herbert wrote: > The following code illustrates this idea: > > typedef struct dm_op_buffer { > XEN_GUEST_HANDLE(void) h; > size_t len; > } dm_op_buffer_t; > > int > HYPERVISOR_device_model_op( > domid_t domid, > unsigned int nr_buffers, > XEN_GUEST_HANDLE_PARAM(dm_op_buffer_t) buffers) > > @domid: the domain the hypercall operates on. > @nr_buffers; the number of buffers in the @buffers array. > > @buffers: an array of buffers. @buffers[0] contains device_model_op - the > structure describing the sub-op and its parameters. @buffers[1], > @buffers[2] > etc. may be used by a sub-op for passing additional buffers. > > struct device_model_op { > uint32_t op; > union { > struct op_1 op1; > struct op_2 op2; > /* etc... */ > } u; > }; > > It is forbidden for the above struct (device_model_op) to contain any > guest handles - if they are needed, they should instead be in > HYPERVISOR_device_model_op->buffers. Sounds plausible to me. -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-09-09 15:16 ` Jennifer Herbert 2016-09-09 15:34 ` David Vrabel 2016-09-12 13:47 ` George Dunlap @ 2016-09-12 14:32 ` Jan Beulich 2016-09-13 10:37 ` George Dunlap 2016-09-13 16:07 ` David Vrabel 2 siblings, 2 replies; 74+ messages in thread From: Jan Beulich @ 2016-09-12 14:32 UTC (permalink / raw) To: Jennifer Herbert Cc: StefanoStabellini, Wei Liu, George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, David Vrabel, Anthony Perard, xen-devel, dgdegra >>> On 09.09.16 at 17:16, <Jennifer.Herbert@citrix.com> wrote: > The following code illustrates this idea: > > typedef struct dm_op_buffer { > XEN_GUEST_HANDLE(void) h; > size_t len; > } dm_op_buffer_t; This implies that we'll lose all type safety on the handles passed, as is also emphasized by the use of raw_copy_from_guest() in the code outline further down. > int > HYPERVISOR_device_model_op( > domid_t domid, > unsigned int nr_buffers, > XEN_GUEST_HANDLE_PARAM(dm_op_buffer_t) buffers) Along the same lines this will add implicit agreements (presumably solely written out as comments, or maybe via manifest constants) about which element of the buffer array has which meaning, quite contrary to the otherwise enforced agreement (through use of structure fields). > int copy_dm_buffer_from_guest( > void *dst, /* Kernel destination buffer */ > size_t max_len, /* Size of destination buffer */ > XEN_GUEST_HANDLE_PARAM(dm_op_buffer_t) buffers, > /* dm_op_buffers passed into DMOP */ > unsigned int nr_buffers, /* total number of dm_op_buffers */ > unsigned int idx) /* Index of buffer we require */ Looking at other hypercalls, it is not uncommon that arrays get read element by element. While of course this function can be extended suitably (perhaps with additional macro wrappers to deal with the base types of such arrays), it will then become more cumbersome to use, extending the "Minor stylistic issue" mentioned in the disadvantages section further down. 
Jan
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-09-12 14:32 ` Jan Beulich @ 2016-09-13 10:37 ` George Dunlap 2016-09-13 11:53 ` Jan Beulich 2016-09-13 16:07 ` David Vrabel 1 sibling, 1 reply; 74+ messages in thread From: George Dunlap @ 2016-09-13 10:37 UTC (permalink / raw) To: Jan Beulich, Jennifer Herbert Cc: StefanoStabellini, Wei Liu, George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, David Vrabel, Anthony Perard, xen-devel, dgdegra On 12/09/16 15:32, Jan Beulich wrote: >>>> On 09.09.16 at 17:16, <Jennifer.Herbert@citrix.com> wrote: >> The following code illustrates this idea: >> >> typedef struct dm_op_buffer { >> XEN_GUEST_HANDLE(void) h; >> size_t len; >> } dm_op_buffer_t; > > This implies that we'll lose all type safety on the handles passed, as > is also emphasized by the use of raw_copy_from_guest() in the code > outline further down. If most of the time the hypercalls are made by calling libxc functions, and the libxc functions have types as arguments, then the end caller has the same type safety. We'll have to be careful inside the individual libxc functions, but that should be fairly straightforward to do. So the places where we need to take extra care should be very localized. -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-09-13 10:37 ` George Dunlap @ 2016-09-13 11:53 ` Jan Beulich 0 siblings, 0 replies; 74+ messages in thread From: Jan Beulich @ 2016-09-13 11:53 UTC (permalink / raw) To: George Dunlap, Jennifer Herbert Cc: StefanoStabellini, Wei Liu, George Dunlap, Andrew Cooper, IanJackson, Tim Deegan, David Vrabel, Anthony Perard, xen-devel, dgdegra >>> On 13.09.16 at 12:37, <george.dunlap@citrix.com> wrote: > On 12/09/16 15:32, Jan Beulich wrote: >>>>> On 09.09.16 at 17:16, <Jennifer.Herbert@citrix.com> wrote: >>> The following code illustrates this idea: >>> >>> typedef struct dm_op_buffer { >>> XEN_GUEST_HANDLE(void) h; >>> size_t len; >>> } dm_op_buffer_t; >> >> This implies that we'll lose all type safety on the handles passed, as >> is also emphasized by the use of raw_copy_from_guest() in the code >> outline further down. > > If most of the time the hypercalls are made by calling libxc functions, > and the libxc functions have types as arguments, then the end caller has > the same type safety. We'll have to be careful inside the individual > libxc functions, but that should be fairly straightforward to do. So > the places where we need to take extra care should be very localized. My main fear isn't so much to get it wrong the first time round, but to break things down the road by not adjusting types in all needed places at once (because the compiler doesn't tell the programmer). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-09-12 14:32 ` Jan Beulich 2016-09-13 10:37 ` George Dunlap @ 2016-09-13 16:07 ` David Vrabel 2016-09-14 9:51 ` Jan Beulich 1 sibling, 1 reply; 74+ messages in thread
From: David Vrabel @ 2016-09-13 16:07 UTC (permalink / raw)
To: Jan Beulich, Jennifer Herbert
Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, David Vrabel, Anthony Perard, xen-devel, dgdegra

On 12/09/16 15:32, Jan Beulich wrote:
>>>> On 09.09.16 at 17:16, <Jennifer.Herbert@citrix.com> wrote:
>> The following code illustrates this idea:
>>
>> typedef struct dm_op_buffer {
>>     XEN_GUEST_HANDLE(void) h;
>>     size_t len;
>> } dm_op_buffer_t;
>
> This implies that we'll lose all type safety on the handles passed, as
> is also emphasized by the use of raw_copy_from_guest() in the code
> outline further down.

This is a direct result of the requirement that the privcmd driver does not know the types of the sub-ops themselves. We can't have this requirement and type safety. Which do we want?

I would point out that the Linux copy_from_user() and copy_to_user() functions are not type safe, and I'm not aware that this causes many problems.

David
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-09-13 16:07 ` David Vrabel @ 2016-09-14 9:51 ` Jan Beulich 2016-09-21 11:21 ` Ian Jackson 0 siblings, 1 reply; 74+ messages in thread From: Jan Beulich @ 2016-09-14 9:51 UTC (permalink / raw) To: David Vrabel, Jennifer Herbert Cc: StefanoStabellini, Wei Liu, George Dunlap, Andrew Cooper, Ian Jackson, TimDeegan, Anthony Perard, xen-devel, dgdegra >>> On 13.09.16 at 18:07, <david.vrabel@citrix.com> wrote: > On 12/09/16 15:32, Jan Beulich wrote: >>>>> On 09.09.16 at 17:16, <Jennifer.Herbert@citrix.com> wrote: >>> The following code illustrates this idea: >>> >>> typedef struct dm_op_buffer { >>> XEN_GUEST_HANDLE(void) h; >>> size_t len; >>> } dm_op_buffer_t; >> >> This implies that we'll lose all type safety on the handles passed, as >> is also emphasized by the use of raw_copy_from_guest() in the code >> outline further down. > > This is an direct result of the requirement that the privcmd driver does > not know the types of the sub-ops themselves. We can't have this > requirement and type safety. Which do we want? Both, which the proposal utilizing side band information on VA ranges allows for. (And in any event this to me clearly is an aspect that would need to be mentioned in the disadvantages section of the document.) Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-09-14 9:51 ` Jan Beulich @ 2016-09-21 11:21 ` Ian Jackson 2016-09-21 11:28 ` George Dunlap 2016-09-21 11:55 ` Jan Beulich 0 siblings, 2 replies; 74+ messages in thread From: Ian Jackson @ 2016-09-21 11:21 UTC (permalink / raw) To: Jan Beulich Cc: StefanoStabellini, Wei Liu, George Dunlap, Andrew Cooper, Jennifer Herbert, TimDeegan, David Vrabel, Anthony Perard, xen-devel, dgdegra Jan Beulich writes ("Re: [Xen-devel] Device model operation hypercall (DMOP, re qemu depriv)"): > On 13.09.16 at 18:07, <david.vrabel@citrix.com> wrote: > > This is an direct result of the requirement that the privcmd driver does > > not know the types of the sub-ops themselves. We can't have this > > requirement and type safety. Which do we want? > > Both, which the proposal utilizing side band information on VA > ranges allows for. (And in any event this to me clearly is an > aspect that would need to be mentioned in the disadvantages > section of the document.) The side band information approach has the problem that it is easy to accidentally write hypervisor code which misses the additionally-required range check. It might be possible to provide both strict type safety of the kind you're concerned about here, _and_ protection against missing access checks, but it's not very straightforward. One approach to do that would be to augment the guest handle array proposal with a typesafe macro system for accessing the guest handle slots. But I think George has a good argument as to why this is not needed: If most of the time the hypercalls are made by calling libxc functions, and the libxc functions have types as arguments, then the end caller has the same type safety. We'll have to be careful inside the individual libxc functions, but that should be fairly straightforward to do. So the places where we need to take extra care should be very localized. We do not expect DMOP callers to make the hypercalls directly. 
(They can't sensibly do so because the DMOP ABI is not stable - they need the assistance of the stability layer which we intend to provide in libxc.)

So the actual hypercall call sites are all in-tree, in libxc. If the format of the struct used for one of these guest handle slots changes, the same struct definition from the Xen headers is used both in the hypervisor and in libxc, and any incompatibility will be detected.

It's true that changes to the semantics of these slots (for example, a change of a slot from "array of struct X" to "one struct Y") will not be detected by the compiler.

But *all* ABI changes to the DMOP interface need to be accompanied by changes to libxc to add compatibility code. I think it will be fairly straightforward to check, during review, that each DMOP change comes with a corresponding libxc change.

But if you do not agree, then how about hiding the slots with a macro setup? Something like:

    struct device_model_op {
        uint32_t op;
        union {
            struct op_foo foo;
            struct op_bar bar;
            /* etc... */
        } u;
    };

    struct op_foo {
        some stuff;
    #define DMOP_GUESTHANDLES_foo(O,ONE,MANY) \
        ONE(O, struct op_foo_big_block, big_block) \
        MANY(O, struct op_foo_data_array_element, data_array)
    };
    DMOP_DEFINE(foo)

Supported (preceded by) something like this (many toothpicks elided for clarity and ease of editing):

    /* dmop typesafe slot machinery: */

    #define DMOP_DEFINE(opname) \
        enum {
            DMOP_GUESTHANDLES_##opname(DMOP__SLOT_INDEX,DMOP__SLOT_INDEX)
        };
        DMOP_GUESTHANDLES_##opname(DMOP__ACCESSORS_ONE,DMOP__ACCESSORS_MANY)

    #define DMOP__SLOT_INDEX(O,T,N) DMOP__SLOT_INDEX__##O##_##N,

    #define DMOP__ACCESSORS_ONE(O,T,N) \
        static inline int copy_dm_buffer_from_guest_##O##_##N(
            T *dst,
            const struct device_model_op *dmop /* accesses [nr_]buffers */)
        {
            return copy_dm_buffer_from_guest__raw
                ((void*)dst, sizeof(*dst), dmop,
                 DMOP__SLOT_INDEX__##O##_##N);
        }
        likewise copy_...to

    #define DMOP__ACCESSORS_MANY(O,T,N) \
        static inline size_t get_dm_buffer_array_size_##O##_##N ...
        {
            calculation divides buffer size by type size;
        }
        static inline long copy_dm_buffer_from_guest_##O##_##N(
            T *dst, size_t nr_dst,
            const struct device_model_op *dmop /* accesses [nr_]buffers */)
        {

Personally I don't think this is worth the effort. But if you do I would be happy to turn this into something which actually compiles :-).

Thanks for your attention.

Ian.
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-09-21 11:21 ` Ian Jackson @ 2016-09-21 11:28 ` George Dunlap 2016-09-21 11:58 ` Jan Beulich 2016-09-21 11:55 ` Jan Beulich 1 sibling, 1 reply; 74+ messages in thread From: George Dunlap @ 2016-09-21 11:28 UTC (permalink / raw) To: Ian Jackson, Jan Beulich Cc: StefanoStabellini, Wei Liu, George Dunlap, Andrew Cooper, Jennifer Herbert, TimDeegan, David Vrabel, Anthony Perard, xen-devel, dgdegra On 21/09/16 12:21, Ian Jackson wrote: > Jan Beulich writes ("Re: [Xen-devel] Device model operation hypercall (DMOP, re qemu depriv)"): >> On 13.09.16 at 18:07, <david.vrabel@citrix.com> wrote: >>> This is an direct result of the requirement that the privcmd driver does >>> not know the types of the sub-ops themselves. We can't have this >>> requirement and type safety. Which do we want? >> >> Both, which the proposal utilizing side band information on VA >> ranges allows for. (And in any event this to me clearly is an >> aspect that would need to be mentioned in the disadvantages >> section of the document.) > > The side band information approach has the problem that it is easy to > accidentally write hypervisor code which misses the > additionally-required range check. > > It might be possible to provide both strict type safety of the kind > you're concerned about here, _and_ protection against missing access > checks, but it's not very straightforward. > > One approach to do that would be to augment the guest handle array > proposal with a typesafe macro system for accessing the guest handle > slots. > > But I think George has a good argument as to why this is not needed: > > If most of the time the hypercalls are made by calling libxc functions, > and the libxc functions have types as arguments, then the end caller has > the same type safety. We'll have to be careful inside the individual > libxc functions, but that should be fairly straightforward to do. 
So > the places where we need to take extra care should be very localized. > > We do not expect DMOP callers to make the hypercalls directly. (They > can't sensibly do so because the DMOP ABI is not stable - they need > the assistance of the stability layer which we intend to provide in > libxc.) > > So the actual hypercall call sites are all in-tree, in libxc. If the > format of the struct used for one of these guest handle slots changes, > the same struct definition from the Xen headers is used both in the > hypervisor and in libxc, and any incompatibility will be detected. > > It's true that changes to the semantics of these slots (for example, a > change of a slot from "array of struct X" to "one struct Y") will not > be detected by the compiler. > > But *all* ABI changes to the DMOP interface need to be accompanied by > changes to libxc to add compatibility code. I think it will be fairly > straightforward to check, during review, that each DMOP change comes > with a corresponding libxc change. > > > But if you do not agree, then how about hiding the slots with a macro > setup ? Something like: > > struct device_model_op { > uint32_t op; > union { > struct op_foo foo; > struct op_bar bar; > /* etc... 
*/
> } u;
> };
>
> struct op_foo {
>     some stuff;
> #define DMOP_GUESTHANDLES_foo(O,ONE,MANY) \
>     ONE(O, struct op_foo_big_block, big_block) \
>     MANY(O, struct op_foo_data_array_element, data_array)
> };
> DMOP_DEFINE(foo)
>
> Supported (preceded by) something like this (many toothpicks elided
> for clarity and ease of editing):
>
> /* dmop typesafe slot machinery: */
>
> #define DMOP_DEFINE(opname) \
>     enum {
>         DMOP_GUESTHANDLES_##opname(DMOP__SLOT_INDEX,DMOP__SLOT_INDEX)
>     };
>     DMOP_GUESTHANDLES_##opname(DMOP__ACCESSORS_ONE,DMOP__ACCESSORS_MANY)
>
> #define DMOP__SLOT_INDEX(O,T,N) DMOP__SLOT_INDEX__##O##_##N,
>
> #define DMOP__ACCESSORS_ONE(O,T,N) \
>     static inline int copy_dm_buffer_from_guest_##O##_##N(
>         T *dst,
>         const struct device_model_op *dmop /* accesses [nr_]buffers */)
>     {
>         return copy_dm_buffer_from_guest__raw
>             ((void*)dst, sizeof(*dst), dmop,
>              DMOP__SLOT_INDEX__##O##_##N);
>     }
>     likewise copy_...to
>
> #define DMOP__ACCESSORS_MANY(O,T,N) \
>     static inline size_t get_dm_buffer_array_size_##O##_##N ... {
>         calculation divides buffer size by type size;
>     }
>     static inline long copy_dm_buffer_from_guest_##O##_##N(
>         T *dst, size_t nr_dst,
>         const struct device_model_op *dmop /* accesses [nr_]buffers */)
>     {
>
> Personally I don't think this is worth the effort. But if you do I
> would be happy to turn this into something which actually compiles
> :-).
>
> Thanks for your attention.

I think the other slight bit of information missing from Jan at least (and perhaps also David) is the difference between their preference / recommendation and their requirement.

That is, if David and Jan each strongly recommend their own preferred method, but are willing to (perhaps grudgingly) give Acks to an implementation of the other's method, then it's really down to the one doing the implementation to read the advice and make their own decision as best they can.
If someone feels strongly enough to Nack the other version, please say so now; otherwise, I think Ian (since it seems like he'll be the one implementing it) should make his best judgement based on the arguments available and begin implementation. -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-09-21 11:28 ` George Dunlap @ 2016-09-21 11:58 ` Jan Beulich 0 siblings, 0 replies; 74+ messages in thread
From: Jan Beulich @ 2016-09-21 11:58 UTC (permalink / raw)
To: George Dunlap
Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Jennifer Herbert, Tim Deegan, David Vrabel, Anthony Perard, xen-devel, dgdegra, Ian Jackson

>>> On 21.09.16 at 13:28, <george.dunlap@citrix.com> wrote:
> I think the other slight bit of information missing from Jan at least
> (and perhaps also David) is the difference between their preference /
> recommendation and their requirement.
>
> That is, if David and Jan each strongly recommend their own preferred
> method, but are willing to (perhaps grudgingly) give Acks to an
> implementation of the other's method, then it's really down to the one
> doing the implementation to read the advice and make their own decision
> as best they can.
>
> If someone feels strongly enough to Nack the other version, please say
> so now; otherwise, I think Ian (since it seems like he'll be the one
> implementing it) should make his best judgement based on the arguments
> available and begin implementation.

No, I don't think I'd nack David's / Jenny's version (after all the type issue can be taken care of by sufficient discipline of both submitters and reviewers), but I do think that it failed to point out this downside, which imo moves the balance between the two proposals.

Jan
* Re: Device model operation hypercall (DMOP, re qemu depriv) 2016-09-21 11:21 ` Ian Jackson 2016-09-21 11:28 ` George Dunlap @ 2016-09-21 11:55 ` Jan Beulich 2016-09-21 12:23 ` Device model operation hypercall (DMOP, re qemu depriv) [and 1 more messages] Ian Jackson 1 sibling, 1 reply; 74+ messages in thread From: Jan Beulich @ 2016-09-21 11:55 UTC (permalink / raw) To: Ian Jackson Cc: StefanoStabellini, Wei Liu, George Dunlap, Andrew Cooper, Jennifer Herbert, TimDeegan, David Vrabel, Anthony Perard, xen-devel, dgdegra >>> On 21.09.16 at 13:21, <ian.jackson@eu.citrix.com> wrote: > Jan Beulich writes ("Re: [Xen-devel] Device model operation hypercall (DMOP, > re qemu depriv)"): >> On 13.09.16 at 18:07, <david.vrabel@citrix.com> wrote: >> > This is an direct result of the requirement that the privcmd driver does >> > not know the types of the sub-ops themselves. We can't have this >> > requirement and type safety. Which do we want? >> >> Both, which the proposal utilizing side band information on VA >> ranges allows for. (And in any event this to me clearly is an >> aspect that would need to be mentioned in the disadvantages >> section of the document.) > > The side band information approach has the problem that it is easy to > accidentally write hypervisor code which misses the > additionally-required range check. > > It might be possible to provide both strict type safety of the kind > you're concerned about here, _and_ protection against missing access > checks, but it's not very straightforward. > > One approach to do that would be to augment the guest handle array > proposal with a typesafe macro system for accessing the guest handle > slots. > > But I think George has a good argument as to why this is not needed: > > If most of the time the hypercalls are made by calling libxc functions, > and the libxc functions have types as arguments, then the end caller has > the same type safety. 
We'll have to be careful inside the individual > libxc functions, but that should be fairly straightforward to do. So > the places where we need to take extra care should be very localized. > > We do not expect DMOP callers to make the hypercalls directly. (They > can't sensibly do so because the DMOP ABI is not stable - they need > the assistance of the stability layer which we intend to provide in > libxc.) > > So the actual hypercall call sites are all in-tree, in libxc. If the > format of the struct used for one of these guest handle slots changes, > the same struct definition from the Xen headers is used both in the > hypervisor and in libxc, and any incompatibility will be detected. Wait, no. The guest handle slots in Jenny's proposal are basically typeless: typedef struct dm_op_buffer { XEN_GUEST_HANDLE(void) h; size_t len; } dm_op_buffer_t; Each sub-op implies a certain type for each handle slot it actually uses. And then I disagree with the general view taken here: The issue I see is not with caller of the libxc interfaces (it was always clear that it would be possible to have type safety at that layer), but between libxc as the consumer and Xen as the producer of the interface. > It's true that changes to the semantics of these slots (for example, a > change of a slot from "array of struct X" to "one struct Y") will not > be detected by the compiler. Well, the "one" vs "array of" part can't normally be taken care of by compiler type checking anyway, except that for the "one" case we wouldn't normally use a handle in the first place. > But *all* ABI changes to the DMOP interface need to be accompanied by > changes to libxc to add compatibility code. I think it will be fairly > straightforward to check, during review, that each DMOP change comes > with a corresponding libxc change. Well, yes, we can only hope for that. 
The main risk is that someone doesn't update the other side at all, when a change to one side gets done which compiles fine at both ends, and perhaps doesn't even alter the public header. > But if you do not agree, then how about hiding the slots with a macro > setup ? Something like: >[...] Well, that's exactly the kind of workaround I'd like to avoid. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) [and 1 more messages] 2016-09-21 11:55 ` Jan Beulich @ 2016-09-21 12:23 ` Ian Jackson 2016-09-21 12:48 ` Jan Beulich 0 siblings, 1 reply; 74+ messages in thread From: Ian Jackson @ 2016-09-21 12:23 UTC (permalink / raw) To: Jan Beulich Cc: StefanoStabellini, Wei Liu, George Dunlap, Andrew Cooper, Jennifer Herbert, George Dunlap, TimDeegan, David Vrabel, Anthony Perard, xen-devel, dgdegra Jan Beulich writes ("Re: [Xen-devel] Device model operation hypercall (DMOP, re qemu depriv)"): > Jan Beulich writes ("Re: [Xen-devel] Device model operation hypercall (DMOP, > > So the actual hypercall call sites are all in-tree, in libxc. If the > > format of the struct used for one of these guest handle slots changes, > > the same struct definition from the Xen headers is used both in the > > hypervisor and in libxc, and any incompatibility will be detected. > > Wait, no. The guest handle slots in Jenny's proposal are basically > typeless: ... > Each sub-op implies a certain type for each handle slot it actually > uses. Yes. But in practice each such slot will in practice be accessed by using copy_dm_buffer_from_guest (or whatever) to copy it to a struct. It is intended that that same struct type will be used by libxc to populate the buffer. Now, it is true that the compiler cannot check that the _same type_ is used in both places. So, as I say: It's true that changes to the semantics of these slots (for example, a change of a slot from "array of struct X" to "one struct Y") will not be detected by the compiler. But changes in the contents of the specific struct /will/ be spotted. For example, consider the struct device_model_op, which in Jennifer's document is in slot 0. If someone edits struct device_model_op to change its contents, both libxc and Xen see the same modified struct. The problem can only occur if someone accesses a slot with entirely the wrong struct type, or by the wrong slot number. 
In such a situation at least in new code, the thing probably wouldn't work at all. And even if we had some kind of typesafe system, the typesafety still wouldn't spot uncontrolled ABI changes which would silently break old callers. So the need for careful review of DMOP changes is still present. So overall I think this typesafety problem is fairly limited. > And then I disagree with the general view taken here: The issue > I see is not with caller of the libxc interfaces (it was always clear > that it would be possible to have type safety at that layer), but > between libxc as the consumer and Xen as the producer of the > interface. I think you have misunderstood me. I was discussing precisely typesafety of that interface. > > But *all* ABI changes to the DMOP interface need to be accompanied by > > changes to libxc to add compatibility code. I think it will be fairly > > straightforward to check, during review, that each DMOP change comes > > with a corresponding libxc change. > > Well, yes, we can only hope for that. The main risk is that someone > doesn't update the other side at all, when a change to one side gets > done which compiles fine at both ends, and perhaps doesn't even > alter the public header. As I say I think if people are making uncontrolled changes to this area, we have ABI stability problems not covered by the guest handle typesafety system. If we wanted to sort that would probably need to make a proper DSL for the ABI. > > But if you do not agree, then how about hiding the slots with a macro > > setup ? Something like: > >[...] > > Well, that's exactly the kind of workaround I'd like to avoid. Well, I think it's a lot of effort for not much gain. But it's certainly doable and if you think it worthwhile I'm prepared to try it. 
Jan Beulich writes ("Re: [Xen-devel] Device model operation hypercall (DMOP, re qemu depriv)"):
> On 21.09.16 at 13:28, <george.dunlap@citrix.com> wrote:
> > If someone feels strongly enough to Nack the other version, please say
> > so now; otherwise, I think Ian (since it seems like he'll be the one
> > implementing it) should make his best judgement based on the arguments
> > available and begin implementation.
>
> No, I don't think I'd nack David's / Jenny's version (after all the type
> issue can be taken care of by sufficient discipline of both submitters
> and reviewers),

Thanks. Personally I think David/Jennifer's version is better than the previous proposal based on implicit or separate address limits.

> but I do think that it failed to point out this downside,
> which imo moves the balance between the two proposals.

I'm afraid that I want to complain about this part of your approach to the thread, which I think is unduly harsh.

It is of course fair to point out a potential downside of the proposal, which wasn't clearly discussed in the discussion document. And it is sensible for us all to consider that potential downside along with the others - as indeed I do above.

But I don't think it is really fair to criticise *the discussion document* (which is what Jennifer's email was) on the grounds that it omitted to discuss a potential downside which its authors felt was minor[1]. The purpose of a discussion document is to put forward a proposal and/or summarise previous discussion. It is not required to be either neutral or, indeed, comprehensive - and even so, I found this one quite comprehensive.

Personally I think Jennifer's email was an excellent contribution [2] and I would be inclined to use it as an example if anyone were to ask in future what a constructive design discussion document should look like.
If there was a plan to *commit* the discussion document to xen.git or formally publish it somewhere then you might say "please also discuss this question". Then it would be worth the discouragement implied by the criticism, to improve the document. But since the pros/cons section of this document has now served its purpose there is only harm in criticising it in this way. (And as I say I think the criticism is unfounded.)

In summary: writing up a good summary of the discussion is a very virtuous thing to do, which we in the Xen community should do more of. The goal of encouraging people to do that is undermined by complaining about the contents of such a writeup, rather than by constructively addressing the actual substantive issues at stake.

Thanks,
Ian.

[1] Full disclosure: Jennifer sent me a preview of the email beforehand, and I didn't think the typesafety issue was significant enough to suggest she should mention it.

[2] As indeed I said to Jennifer in private email before she posted it.
* Re: Device model operation hypercall (DMOP, re qemu depriv) [and 1 more messages] 2016-09-21 12:23 ` Device model operation hypercall (DMOP, re qemu depriv) [and 1 more messages] Ian Jackson @ 2016-09-21 12:48 ` Jan Beulich 2016-09-21 13:24 ` Ian Jackson 0 siblings, 1 reply; 74+ messages in thread From: Jan Beulich @ 2016-09-21 12:48 UTC (permalink / raw) To: Ian Jackson Cc: StefanoStabellini, Wei Liu, George Dunlap, Andrew Cooper, Jennifer Herbert, George Dunlap, TimDeegan, David Vrabel, Anthony Perard, xen-devel, dgdegra >>> On 21.09.16 at 14:23, <ian.jackson@eu.citrix.com> wrote: > Jan Beulich writes ("Re: [Xen-devel] Device model operation hypercall (DMOP, > re qemu depriv)"): >> Jan Beulich writes ("Re: [Xen-devel] Device model operation hypercall (DMOP, >> > So the actual hypercall call sites are all in-tree, in libxc. If the >> > format of the struct used for one of these guest handle slots changes, >> > the same struct definition from the Xen headers is used both in the >> > hypervisor and in libxc, and any incompatibility will be detected. >> >> Wait, no. The guest handle slots in Jenny's proposal are basically >> typeless: > ... >> Each sub-op implies a certain type for each handle slot it actually >> uses. > > Yes. But in practice each such slot will in practice be accessed by > using copy_dm_buffer_from_guest (or whatever) to copy it to a struct. > It is intended that that same struct type will be used by libxc to > populate the buffer. > > Now, it is true that the compiler cannot check that the _same type_ is > used in both places. So, as I say: > > It's true that changes to the semantics of these slots (for example, a > change of a slot from "array of struct X" to "one struct Y") will not > be detected by the compiler. > > But changes in the contents of the specific struct /will/ be spotted. As long as it is a structure, yes. What about someone changing uint64_t to xen_pfn_t? 
>> but I do think that it failed to point out this downside, >> which imo moves the balance between the two proposals. > > I'm afraid that I want to complain about this part of your approach to > the thread, which I think is unduly harsh. > > It is of course fair to point out a potential downside of the proposal, > which wasn't clearly discussed in the discussion document. And it is > sensible for us all to consider that potential downside along with the > others - as indeed I do above. > > But I don't think it is really fair to criticise *the discussion > document* (which is what Jennifer's email was) on the grounds that it > ommitted to discuss a potential downside which its authors felt was > minor[1]. Hmm, then I'm sorry if this came over the wrong way. I didn't mean to criticize the document as a whole, or how it was written. > The purpose of a discussion document is to put forward a > proposal and/or summarise previous discussion. It is not required to > be either neutral or, indeed, comprehensive - and even so, I found > this one quite comprehensive. Nevertheless I think such a document should be as honest as possible wrt downsides of the approach preferred by the author(s) among potentially multiple ones. If some aspect is deemed minor, I don't think it should be omitted (as then readers won't know whether the aspect was considered at all), but mentioned and stated to be believed to be minor. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) [and 1 more messages] 2016-09-21 12:48 ` Jan Beulich @ 2016-09-21 13:24 ` Ian Jackson 2016-09-21 13:56 ` Jan Beulich 0 siblings, 1 reply; 74+ messages in thread From: Ian Jackson @ 2016-09-21 13:24 UTC (permalink / raw) To: Jan Beulich Cc: StefanoStabellini, Wei Liu, George Dunlap, Andrew Cooper, Jennifer Herbert, George Dunlap, TimDeegan, David Vrabel, Anthony Perard, xen-devel, dgdegra Jan Beulich writes ("Re: [Xen-devel] Device model operation hypercall (DMOP, re qemu depriv) [and 1 more messages]"): > On 21.09.16 at 14:23, <ian.jackson@eu.citrix.com> wrote: > > But changes in the contents of the specific struct /will/ be spotted. > > As long as it is a structure, yes. What about someone changing > uint64_t to xen_pfn_t? You mean if someone changes one of the buffers from "array of uint64_t" to "array of xen_pfn_t", so that the ABI changes on some but not all architectures ? We could declare a rule that a dmop buffer may contain only a struct or type name specific to that dmop (or an array of such), and must always be accessed through a copy to or from such a type. The rule would mean that if we had a dmop which wanted to call, using one of the buffers, a function like this: int some_function(uint64_t *pfns, size_t n_pfns); You'd have to write the dmop definition like this: typedef uint64_t xen_dmop_foo_entry; xen_dmop_foo_entry *pfns; size_t n_pfns; ...allocate table somehow... ret = copy_dm_buffer_from_guest(pfns, sizeof(pfns[0])*n_pfns, buffers, nr_buffers, 1); ... ret = some_function(pfns, n_pfns); And similar code, again using xen_dmop_foo_entry. on the calling side. Then if the buffer entry type is changed to xen_pfn_t you change it to: typedef xen_pfn_t xen_dmop_foo_entry; Now, unless the rest of the code is changed too, the compiler will warn that a xen_pfn_t* cannot be converted to a uint64_t* in the call to some_function. (And likewise at the hypercall site in libxc.) 
Would this be enough of an improvement, for you, to be worth the additional type name clutter etc. ? > > But I don't think it is really fair to criticise *the discussion > > document* (which is what Jennifer's email was) on the grounds that it > > ommitted to discuss a potential downside which its authors felt was > > minor[1]. > > Hmm, then I'm sorry if this came over the wrong way. I didn't > mean to criticize the document as a whole, or how it was written. Thanks. > > The purpose of a discussion document is to put forward a > > proposal and/or summarise previous discussion. It is not required to > > be either neutral or, indeed, comprehensive - and even so, I found > > this one quite comprehensive. > > Nevertheless I think such a document should be as honest as > possible wrt downsides of the (by the author(s)) preferred of > potentially multiple approaches. I agree with that, although the word "honest" is rather tendentious. > If some aspect is deemed minor, > I don't think it should be omitted (as then readers won't know > whether the aspect was considered at all), but mentioned and > stated to be believed to be minor. I think there is a balance to be struck between mentioning every possible consideration, however minor and remote, and making the document concise and readable. It will inevitably sometimes occur (as it did here) that some issue that one person thought not worth mentioning, seems quite important to another person. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) [and 1 more messages] 2016-09-21 13:24 ` Ian Jackson @ 2016-09-21 13:56 ` Jan Beulich 2016-09-21 15:06 ` Ian Jackson 2016-09-21 17:09 ` George Dunlap 0 siblings, 2 replies; 74+ messages in thread From: Jan Beulich @ 2016-09-21 13:56 UTC (permalink / raw) To: Ian Jackson Cc: StefanoStabellini, Wei Liu, George Dunlap, Andrew Cooper, Jennifer Herbert, George Dunlap, TimDeegan, David Vrabel, Anthony Perard, xen-devel, dgdegra >>> On 21.09.16 at 15:24, <ian.jackson@eu.citrix.com> wrote: > Jan Beulich writes ("Re: [Xen-devel] Device model operation hypercall (DMOP, > re qemu depriv) [and 1 more messages]"): >> On 21.09.16 at 14:23, <ian.jackson@eu.citrix.com> wrote: >> > But changes in the contents of the specific struct /will/ be spotted. >> >> As long as it is a structure, yes. What about someone changing >> uint64_t to xen_pfn_t? > > You mean if someone changes one of the buffers from "array of > uint64_t" to "array of xen_pfn_t", so that the ABI changes on some but > not all architectures ? > > We could declare a rule that a dmop buffer may contain only a struct > or type name specific to that dmop (or an array of such), and must > always be accessed through a copy to or from such a type. > > The rule would mean that if we had a dmop which wanted to call, using > one of the buffers, a function like this: > > int some_function(uint64_t *pfns, size_t n_pfns); > > You'd have to write the dmop definition like this: > > typedef uint64_t xen_dmop_foo_entry; > > xen_dmop_foo_entry *pfns; > size_t n_pfns; > > ...allocate table somehow... > ret = copy_dm_buffer_from_guest(pfns, > sizeof(pfns[0])*n_pfns, > buffers, nr_buffers, 1); > ... > ret = some_function(pfns, n_pfns); > > And similar code, again using xen_dmop_foo_entry. on the calling side. 
> Then if the buffer entry type is changed to xen_pfn_t you change it > to: > > typedef xen_pfn_t xen_dmop_foo_entry; > > Now, unless the rest of the code is changed too, the compiler will > warn that a xen_pfn_t* cannot be converted to a uint64_t* in the call > to some_function. (And likewise at the hypercall site in libxc.) > > Would this be enough of an improvement, for you, to be worth the > additional type name clutter etc. ? Again this looks like much clutter with little benefit to me, i.e. I'd then rather go with the unmodified original proposal. That's largely because nothing really enforces anyone to use the (disconnected) xen_dmop_foo_entry type. I.e. it continues to be a matter of the programmer's and the reviewers' attention/discipline. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
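The typedef discipline Ian describes above can be sketched in ordinary C. This is a hedged illustration, not real Xen code: `xen_dmop_foo_entry` and `copy_dm_buffer_from_guest` follow the hypothetical names used in the thread, and the copy helper here is just a stand-in `memcpy`. The point is that the consumer (`sum_pfns`) names a concrete element type, so retyping the typedef to a distinct type would produce an incompatible-pointer warning at the call site.

```c
/* Sketch of the dmop typedef discipline discussed above.
 * Names (xen_dmop_foo_entry, copy_dm_buffer_from_guest) are the
 * hypothetical examples from the thread, not a real Xen API. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t xen_dmop_foo_entry;   /* retyping this to xen_pfn_t would
                                          trigger warnings below on some
                                          architectures */

/* Consumer with a concrete element type. */
static uint64_t sum_pfns(const uint64_t *pfns, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += pfns[i];
    return total;
}

/* Stand-in for the real copy-from-guest helper: here just a memcpy. */
static int copy_dm_buffer_from_guest(void *dst, size_t len, const void *src)
{
    memcpy(dst, src, len);
    return 0;
}

static uint64_t handle_foo(const xen_dmop_foo_entry *guest_buf, size_t n)
{
    xen_dmop_foo_entry pfns[16];

    if (n > 16)
        return 0;
    copy_dm_buffer_from_guest(pfns, sizeof(pfns[0]) * n, guest_buf);
    /* If xen_dmop_foo_entry were changed to a distinct type, this call
     * would warn about an incompatible pointer, flagging the ABI change. */
    return sum_pfns(pfns, n);
}
```

The protection is only as strong as the rule that every access goes through the typedef, which is exactly the discipline point Jan raises in reply.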
* Re: Device model operation hypercall (DMOP, re qemu depriv) [and 1 more messages] 2016-09-21 13:56 ` Jan Beulich @ 2016-09-21 15:06 ` Ian Jackson 2016-09-21 17:09 ` George Dunlap 1 sibling, 0 replies; 74+ messages in thread From: Ian Jackson @ 2016-09-21 15:06 UTC (permalink / raw) To: Jan Beulich Cc: StefanoStabellini, Wei Liu, George Dunlap, Andrew Cooper, Jennifer Herbert, George Dunlap, TimDeegan, David Vrabel, Anthony Perard, xen-devel, dgdegra Jan Beulich writes ("Re: [Xen-devel] Device model operation hypercall (DMOP, re qemu depriv) [and 1 more messages]"): > On 21.09.16 at 15:24, <ian.jackson@eu.citrix.com> wrote: > > Would this be enough of an improvement, for you, to be worth the > > additional type name clutter etc. ? > > Again this looks like much clutter with little benefit to me, i.e. I'd > then rather go with the unmodified original proposal. That's largely > because nothing really enforces anyone to use the (disconnected) > xen_dmop_foo_entry type. I.e. it continues to be a matter of the > programmer's and the reviewers' attention/discipline. Fair enough. Thanks for your opinions. Regards, Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) [and 1 more messages] 2016-09-21 13:56 ` Jan Beulich 2016-09-21 15:06 ` Ian Jackson @ 2016-09-21 17:09 ` George Dunlap 2016-09-22 8:47 ` Jan Beulich 1 sibling, 1 reply; 74+ messages in thread From: George Dunlap @ 2016-09-21 17:09 UTC (permalink / raw) To: Jan Beulich Cc: StefanoStabellini, Wei Liu, Ian Jackson, TimDeegan, Andrew Cooper, David Vrabel, Jennifer Herbert, Anthony Perard, xen-devel, Daniel De Graaf On Wed, Sep 21, 2016 at 2:56 PM, Jan Beulich <JBeulich@suse.com> wrote: > Again this looks like much clutter with little benefit to me, i.e. I'd > then rather go with the unmodified original proposal. That's largely > because nothing really enforces anyone to use the (disconnected) > xen_dmop_foo_entry type. I.e. it continues to be a matter of the > programmer's and the reviewers' attention/discipline. But is "Must use hypercall dmop, subop foo with struct dmop_foo" any different than "Must use hypercall bar with struct bar"? In theory there could be a mismatch between the struct libxc expected to use for a whole hypercall with the struct Xen expected to find for an entire hypercall. But in practice that never happens because we set up the call with the appropriate struct once and then never change it. (That is, we may change the struct elements, but not the name.) This seems to me like a fairly similar case. Note that I'm not arguing for the extra "clutter" per se, I just think that it will be pretty effective in mitigating the type safety issue. -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: Device model operation hypercall (DMOP, re qemu depriv) [and 1 more messages] 2016-09-21 17:09 ` George Dunlap @ 2016-09-22 8:47 ` Jan Beulich 0 siblings, 0 replies; 74+ messages in thread From: Jan Beulich @ 2016-09-22 8:47 UTC (permalink / raw) To: George Dunlap Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson, Tim Deegan, David Vrabel, Jennifer Herbert, Anthony Perard, xen-devel, Daniel De Graaf >>> On 21.09.16 at 19:09, <George.Dunlap@eu.citrix.com> wrote: > On Wed, Sep 21, 2016 at 2:56 PM, Jan Beulich <JBeulich@suse.com> wrote: >> Again this looks like much clutter with little benefit to me, i.e. I'd >> then rather go with the unmodified original proposal. That's largely >> because nothing really enforces anyone to use the (disconnected) >> xen_dmop_foo_entry type. I.e. it continues to be a matter of the >> programmer's and the reviewers' attention/discipline. > > But is "Must use hypercall dmop, subop foo with struct dmop_foo" any > different than "Must use hypercall bar with struct bar"? > > In theory there could be a mismatch between the struct libxc expected > to use for a whole hypercall with the struct Xen expected to find for > an entire hypercall. But in practice that never happens because we set > up the call with the appropriate struct once and then never change it. Yes. > (That is, we may change the struct elements, but not the name.) This > seems to me like a fairly similar case. I don't think so - what I'm talking about here is the equivalent of "we may change the struct elements". Such changes would go unnoticed by the compiler if the respective pseudo-struct-element is a handle. But anyway - I think we've beaten this horse to death, and I'm the only one worried about type issues here. Therefore I think we should just stop this discussion and go with the proposed model. 
Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
* XenProject/XenServer QEMU working group minutes, 30th August 2016 2016-07-28 17:01 XenProject/XenServer QEMU working group, Friday 8th July, 2016, 15:00 Jennifer Herbert 2016-08-01 11:32 ` Device model operation hypercall (DMOP, re qemu depriv) Ian Jackson @ 2016-09-09 16:18 ` Jennifer Herbert 2016-09-12 7:16 ` Juergen Gross 2016-10-14 18:01 ` QEMU XenServer/XenProject Working group meeting 29th September 2016 Jennifer Herbert 1 sibling, 2 replies; 74+ messages in thread From: Jennifer Herbert @ 2016-09-09 16:18 UTC (permalink / raw) To: xen-devel QEMU XenServer/XenProject Working group meeting 30th August 2016 ================================================================ Attendance ---------- Andrew Cooper Ian Jackson Paul Durrant David Vrabel Jennifer Herbert Introduction ------------ Paul Durrant was introduced to the working group. We started by recapping our purpose: to make it possible for QEMU to make hypercalls without too much privilege, in a way which is upstreamable. A guest must not be able to abuse the interface to compromise the dom0 kernel. QEMU Hypercalls – DM op ----------------------- There has been much discussion on xen-devel. A problem identified is operations with references to other user memory objects, such as track-dirty-VRAM (as used with the VGA buffer). At the moment there is apparently only that one, but others may emerge. The most obvious solution would involve the guest kernel validating the virtual addresses passed; however, that would rely on the guest kernel knowing where the objects were. This is to be avoided. Ian recounted how there were various proposals on xen-devel involving, essentially, informing the hypervisor in some way about which virtual addresses the hypercall was talking about. Many of these involved this information being transmitted via a different channel. 
Ian suggested providing a way for the kernel to tell the hypervisor which user virtual ranges are DM-op-allowed memory. There would then be a flag, in the dm op, in a fixed location, telling the hypervisor that the op only talks about this special pre-approved memory. A scheme of pre-nominating an area in QEMU, maybe using hypercall buffers, was briefly discussed, as well as a few other ideas, but it was concluded that this doesn't really address the problem of future DM ops – of which there could easily be more. Even if we can avoid the problem with special cases for our current set-up, we still need a story for how to add future interfaces with handles without needing to change the kernel interface. Once we come up with a story, we wouldn't necessarily have to implement it. The concept of using physically addressed hypercall buffers was discussed. Privcmd could allocate you a place and mmap it into user RAM, and this would be the only memory used with the hypercalls. A hypercall would tell you the buffer range. Each QEMU would need to be associated with the correct set of physical buffers. A recent AMD proposal was discussed, which would use only physical addresses, no virtual addresses. The upshot is that we should come up with a solution that is not incompatible with this. Ideas further discussed: user code could just put stuff in mmaped memory, and only refer to offsets within that buffer. The privcmd driver would fill in the physical details. All dm ops would have 3 arguments: the dm op, a pointer to a struct, and an optional pointer to a restriction array – the last of which is filled in by the privcmd driver. It was discussed how the privcmd driver must not look at the dm op number, in particular to know how to validate addresses, as it must be independent from the API. A scheme where QEMU calls an ioctl before it drops privileges, to set up restrictions ahead of time, was discussed. One scheme might work by setting up a range for a given domain or VCPU. 
The assumption is that all device models running in the same domain have the same virtual address layout. There would then be a flag, in the stable part of the API, saying whether to apply that restriction - an in-kernel dm op caller would not apply it. The idea can be extended – to have more than one address range, or to have the range explicitly provided in the hypercall. This latter suggestion is preferred; however, each platform would have different valid address ranges, and privcmd is platform independent. It was discussed how a function could be created to return valid ranges for your given platform, but this is not considered an elegant solution. The third parameter of the dm op could be an array of ranges, where the common case for virtual addresses may be 0-3GB, but for physical addresses it might be quite fragmented. A further idea was proposed: extend the dm op to have a fixed part and an array of guest handles the kernel can audit. The arguments would be: Arg1: Dom ID; Arg2: guest handle array of tuples (address, size); Arg3: number of guest handles. The first element of the array could be the DM op structure itself, containing the DM op code and the other arguments to the particular op. The privcmd driver would only pass through what is provided by the user. Any extra elements would be ignored by the hypercall, and if there were insufficient, the hypercall code would see a NULL and be able to fail gracefully. The initial block (of dm arguments) passed in the array would be copied into pre-zeroed memory of the maximum op size, having checked the size is not greater than this. There is no need to check a minimum: the buffer is initialised to zero, so zero length would result in op 0 being called. Functions/macros could be created to make retrieving such a block easier. Any further blocks needed would be implicitly referred to, as a given dm op knows it will put buffer foo in array position bar. It would then use the provided functions/macros to retrieve it. 
This last idea was compared with the proposal previously posted to xen-devel by Ian. This scheme is slightly messier in the dm op code, having to refer to numbers instead of fields; however, the pros are that: * It's more extensible. It doesn't involve providing a new, unusual copy-to-user-memory macro that could be misused, with security implications. * The restriction is bound to the specific call, and can vary. * Privcmd is slightly simpler: it can just call access_ok. * It is compatible with the physical-address scheme. David agreed to write this idea up in a design document. He will not need to discuss any individual dm ops, but should describe the pros and cons compared with the other ideas on the table. XenStore -------- The xs-restrict mechanism was summarised, along with its limitation – it does not work through the kernel XenStore driver, which is needed to talk to a XenStore domain. A way to fix this would be to create a wrapper. Another approach is to try to remove XenStore from all non-privileged parts of QEMU – as it is thought there isn't that much use remaining. Protocols such as QMP would be used instead. PV drivers such as QDISK could be run in a separate qemu process – for which a patch exists. There were concerns this would take a lot of time to achieve. Although time ran out, it was vaguely concluded that multiple approaches could be run in parallel, where initially xs-restrict is used as-is, and then the xenstore wrapper could be developed alongside efforts to reduce XenStore use in QEMU. Even with the XenStore wrapper, QEMU may benefit from reducing the number of communication protocols in use – i.e. removing XenStore use. Action items ------------ David: Write up latest DM op proposal. Jenny: Write up and arrange next meetup. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
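The buffer-array scheme described in these minutes can be sketched in C. This is an illustrative sketch only: the names (`xen_dm_op_buf`, `dm_op_hdr`, `copy_buf`, `dm_op`) are hypothetical stand-ins, not the final Xen interface, and guest-memory copies are modelled with plain `memcpy`. It shows the behaviour the minutes call out: the first array slot holds the op structure itself, missing slots fail gracefully, oversize ops are rejected, and the destination is pre-zeroed so a short (even zero-length) copy yields op 0.

```c
/* Hedged sketch of the (address, size) buffer-array DM-op scheme
 * described above; all names are illustrative. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct xen_dm_op_buf {      /* one (address, size) tuple */
    void *addr;
    size_t size;
};

struct dm_op_hdr {          /* first buffer: the op itself */
    uint32_t op;            /* sub-op code; 0 in zeroed memory => op 0 */
    uint32_t arg;           /* further arguments to the particular op */
};

/* Copy bufs[idx] into a pre-zeroed destination of known maximum size,
 * failing gracefully if the slot is absent or too large. */
static int copy_buf(void *dst, size_t max,
                    const struct xen_dm_op_buf *bufs,
                    unsigned int nr, unsigned int idx)
{
    if (idx >= nr || bufs[idx].addr == NULL)
        return -1;                    /* missing slot: graceful failure */
    if (bufs[idx].size > max)
        return -1;                    /* size greater than max op size */
    memset(dst, 0, max);              /* zero first; no minimum-size check */
    memcpy(dst, bufs[idx].addr, bufs[idx].size);
    return 0;
}

/* dm_op(domid, bufs, nr): fetch the header from slot 0 and dispatch.
 * Here we just return the op code instead of switching on it. */
static int dm_op(uint16_t domid, const struct xen_dm_op_buf *bufs,
                 unsigned int nr)
{
    struct dm_op_hdr hdr;

    (void)domid;
    if (copy_buf(&hdr, sizeof(hdr), bufs, nr, 0))
        return -1;
    return (int)hdr.op;               /* would switch on hdr.op here */
}
```

Note how the privcmd driver's role stays op-agnostic in this shape: it only needs to validate the (address, size) tuples, never to interpret the op number.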
* Re: XenProject/XenServer QEMU working group minutes, 30th August 2016 2016-09-09 16:18 ` XenProject/XenServer QEMU working group minutes, 30th August 2016 Jennifer Herbert @ 2016-09-12 7:16 ` Juergen Gross 2016-10-14 18:01 ` QEMU XenServer/XenProject Working group meeting 29th September 2016 Jennifer Herbert 1 sibling, 0 replies; 74+ messages in thread From: Juergen Gross @ 2016-09-12 7:16 UTC (permalink / raw) To: Jennifer Herbert, xen-devel On 09/09/16 18:18, Jennifer Herbert wrote: > QEMU XenServer/XenProject Working group meeting 30th August 2016 > ================================================================ ... > XenStore > -------- > > The xs-restrict mechanism was summarised, and its limitation – it does > not work though the kernel XenStore driver, which is needed to talk to > a XenStore domain. A way to fix this would be to create a wrapper. > > Another approach is to try and remove XenStore from all > non-priverlaged parts of QEMU – as it is thought there isn't that much > use remaining. Protocols such as QMP would be used instead. PV > drivers such as QDISK could be run in a separate qemu process – for > which a patch exists. There where concerns this would like a lot of > time to achieve. > > Although time ran out, it was vaguely concluded that multiple > approaches could be run in parallel, where initially xs-restrict is > used as is, and then a the xenstore wrapper could be developed > alongside efforts to reduce XenStore use in QEMU. Even with the > XenStore wrapper, QEMU may benefit from reducing the number of > communication protocols in use – ie removing XenStore use. What about the following: Split up the transaction ID of Xenstore messages (32 bits) to two 16 bit parts. The high part is a restriction ID (0 == unrestricted). As the transaction ID is local to a connection 16 bits should always be enough for transaction IDs: we would still have more transaction IDs available than domain IDs. 
xs-restrict returns a restriction ID to be used for the connection from now on. This can be easily added by a kernel wrapper without having to modify the Xenstore protocol: there are just some bits with a special meaning now which have always been under control of the Xenstore daemon/domain. A socket connection with Xenstore daemon using xs-restrict will force the complete connection to be restricted (as today in oxenstored), while a connection from another domain (Xenstore domain case) will rely on the kernel wrapper of the connecting domain to do the restriction ID handling correctly. Adding support for that feature in Xenstore daemon/domain is very easy as the needed modifications should be rather local. Juergen _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
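Juergen's proposed split of the 32-bit transaction ID field can be expressed in a few lines of C. This is a sketch of the arithmetic only; the helper names are hypothetical, and only the layout (high 16 bits = restriction ID, 0 meaning unrestricted; low 16 bits = transaction ID) comes from the proposal above.

```c
/* Sketch of splitting the 32-bit Xenstore transaction ID field into a
 * 16-bit restriction ID (high half, 0 == unrestricted) and a 16-bit
 * per-connection transaction ID (low half). Helper names are illustrative. */
#include <stdint.h>

#define RESTRICT_SHIFT 16
#define TX_MASK        0xffffu

static inline uint16_t restrict_id(uint32_t tx_field)
{
    return (uint16_t)(tx_field >> RESTRICT_SHIFT);
}

static inline uint16_t tx_id(uint32_t tx_field)
{
    return (uint16_t)(tx_field & TX_MASK);
}

static inline uint32_t make_tx_field(uint16_t rid, uint16_t tid)
{
    return ((uint32_t)rid << RESTRICT_SHIFT) | tid;
}

/* Restriction ID 0 means unrestricted, as in the proposal above. */
static inline int is_unrestricted(uint32_t tx_field)
{
    return restrict_id(tx_field) == 0;
}
```

A kernel wrapper would stamp the restriction ID into every outgoing message's transaction field and mask it back out of replies, leaving the wire protocol itself unchanged.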
* QEMU XenServer/XenProject Working group meeting 29th September 2016 2016-09-09 16:18 ` XenProject/XenServer QEMU working group minutes, 30th August 2016 Jennifer Herbert 2016-09-12 7:16 ` Juergen Gross @ 2016-10-14 18:01 ` Jennifer Herbert 2016-10-18 19:54 ` Stefano Stabellini 2017-02-28 18:18 ` QEMU XenServer/XenProject Working group meeting 10th February 2017 Jennifer Herbert 1 sibling, 2 replies; 74+ messages in thread From: Jennifer Herbert @ 2016-10-14 18:01 UTC (permalink / raw) To: xen-devel QEMU XenServer/XenProject Working group meeting 29th September 2016 =================================================================== Attendees --------- David Vrabel Jennifer Herbert Ian Jackson Andrew Cooper Paul Durrant Lars Kurth QEMU depriv =========== DMOP ---- There has been agreement on-list on the DMOP proposal. The HVMCTL patch series which was proposed should need only mechanical changes to use it as a basis for DMOP. Privcmd ------- The privcmd changes should be fairly trivial to implement. Libxc would need changing, but this code is also in the HVMCTL patch series. This means the only thing needed from QEMU is to call the restrict ioctl to enable it. If the restrict ioctl is missing, an error would be returned. QEMU would probably want an option to indicate that de-priv is required. Given this option, QEMU would raise an error if the restrict ioctl were not present. In order to avoid accidents due to ABI instability, old DMOP numbers would be retired when a DMOP is changed in an ABI-incompatible way - there is no shortage of new DMOP numbers. Eventchan --------- Event channel restriction is in 4.7, but the libxc parts need to be done. This should not be much work. XenStore -------- For the non-PV part of QEMU, XenStore is only used in two places: the DM state, and the physmap mechanism. Although there is a vague plan for replacing the physmap mechanism, it is some way off. 
The DM state key is used for knowing when the qemu process is running, etcetera. QMP would seem to be an option to replace it - however, there is no (nice) way to wait on a socket until it has been opened. One solution might be to use Xenstore to let you know the QMP sockets were available before QEMU drops privileges, and then QMP could be used to know QEMU is in the running state. To avoid the need to use xs-restrict, you would need to both replace physmap and rework the qemu startup procedure. The use of xs-restrict would be more expedient, and does not look to need that much work. Discussion was had over how secure it would be to allow a guest access to these Xenstore keys - it was concluded that a guest could mostly only mess itself up. If a guest attempted to prevent itself from being migrated, the toolstack would time it out, and could kill it. There followed a discussion on the Xenbus protocol, and the additions needed. The aim is merely to restrict the permissions for the command to those of the guest whose domID you provide. It was proposed that it use the header as-is, with its 16 bytes, with the command 'one-time-restrict', and then the payload would have two additional fields at the start. These two fields would correspond to the domid to restrict as, and the real command. Transaction ID and tags would be taken from the real header. Although inter-domain xs-restrict is not specifically needed for this project, it is thought it might be a blocking item for upstream acceptance. It is thought these changes would not require that much work to implement, and may be useful in other use cases. Only a few changes to QEMU would be needed, and libxl should be able to track QEMU versions. Ian Jackson volunteered to look at this, with David helping with the kernel bits. Ian won't have time to look at this until after Xen 4.8 is released. There was discussion about what may fail once privileges are taken away, which would include CDs and PCI pass-through. 
It is thought the full list can only be known by trying. Not everything needs to work for acceptance upstream, such as PCI pass-through. If such an incompatible feature is needed, restrictions can be turned off. These problems can be fixed in a later phase, with CDs likely being at the top of the list. Action items ------------ The hypervisor bits are really needed first, but can't be done until 4.8 has been released. Ian to look at the Xenstore items. David is to look at the kernel items. Paul is to audit the HVM ops, checking parameters etc. It is too late to get this into 4.8, but it is desired to get this in early in 4.9 so that there can be a period of stabilisation. With the release of 4.8 imminent, little work will happen until after that. However, Paul, David and Ian are asked to have a think about their respective areas, and have a plan for when they can be done. They are welcome to start tackling them if they have time. disaggregation ============== A disaggregation proposal which had previously been posted to a QEMU forum was discussed. It was not previously accepted by all. The big question was how to separate the device models from the machine, with a particular point of contention being around PIIX and the idea of starting a QEMU instance without one. The general desire from us is that we want to have a specific device emulated and nothing else. It is suggested you would have a software interface between each device that looked like a software version of PCI. The PIIX device could be attached to the CPU via this pseudo-PCI interface. This would fit in well with how the IOREQ server and IOMMU work. Although this sounds like a large architectural change is wanted, it's suggested that actually it's just that we're asking them to take a different stability and pluggability posture on the interfaces they already have. This architectural issue is the cause behind lots of little annoyances, which have been going on for years. 
Xen is having to make up lots of strange stuff to keep QEMU happy, and there is confusion over memory ownership. Fixing the architecture should make our lives much easier. These architectural issues are also making things difficult for Intel, who are trying to work around them with Xen changes, which may just worsen the problem. This means this is effectively blocking them. It is proposed that instead of having a QEMU binary, what is really wanted is a QEMU library. With a library you could easily take the bits needed, create your own main loop and link them to whatever interface, IOREQ services or IPC mechanism is needed. There would no longer be a need for the IOREQ server to be in QEMU, which it is thought should be an attractive idea for the QEMU maintainers. It is also thought that other projects, such as the Clear Containers people, would benefit from such an architecture. The idea of splitting out the CPU code from the device code may even be attractive to KVM. The code in the Xen tools directory would be a small event loop (probably using glib), a thing that reads ioreqs off a ring, and a thing that speaks Xenstore. There would be a bunch of initialisation calls that call into libqemu and initialise the various devices, with device structures for them indicating where they should be mapped and so forth. There would be no IDE code in our tree, and no ioreq server in the QEMU tree. The QEMU maintainers should be in favour of removing Xen-specific code from QEMU, and it is also thought that you could demonstrate how to use this to make disaggregated device models for KVM's case. It is further postulated that there may be many people out there with dev boards and experiments with FPGAs and strange PCI stuff who don't want to wrestle with QEMU's PCI emulator. With libqemu, it may take just 50 lines of code for a random developer to plug some hardware together and make a simulator. There was discussion on whether a halfway solution might be easier. 
However, it was concluded that such a solution would likely only benefit Xen as a quick fix, and not as much as the full libqemu idea, and so would not look that appealing from QEMU's perspective - while the full libqemu idea would benefit many more people, allowing an explosion of potential QEMU use cases and users. More people using a project should mean more contributors. It was concluded that this was largely a political issue, and that we need to find out what the objections really are. If we were to convince everyone of the benefit, then we'd probably need to step up and do much of the work - however, this is still likely to be less work than maintaining the current set-up. There was further discussion on who our allies might be, and the approach we should take to persuade people. It was stressed that we need to sell the benefits of this system, i.e. "Releasing its full potential". The alternative to the politics may be to simply fork the project again - the previous fork lasted a decade. However, it should be much better to cooperate, and so we must try. Action item ----------- Ian to reach out to Peter Maydell and discuss the issue, and to consider writing down a new proposal. (Update: probably too early for a writeup.) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 74+ messages in thread
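The 'one-time-restrict' wrapping described in the XenStore section of these minutes might look roughly like this. The 16-byte header matches the real Xenstore wire header (`struct xsd_sockmsg` from `xs_wire.h`), but the command number and the two-field payload prefix are hypothetical, following only what the minutes describe.

```c
/* Hedged sketch of the proposed 'one-time-restrict' Xenbus command.
 * struct xsd_sockmsg mirrors the existing 16-byte Xenstore wire header;
 * XS_ONE_TIME_RESTRICT and struct restrict_prefix are hypothetical. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct xsd_sockmsg {        /* existing 16-byte Xenstore header */
    uint32_t type;
    uint32_t req_id;
    uint32_t tx_id;
    uint32_t len;
};

#define XS_ONE_TIME_RESTRICT 128   /* hypothetical command number */

/* Two additional fields at the start of the payload: the domid to
 * restrict as, and the real command. Transaction ID and tag come from
 * the outer header. */
struct restrict_prefix {
    uint32_t domid;
    uint32_t real_cmd;
};

/* Wrap a real command's payload; returns total bytes written to out,
 * which must have room for header + prefix + payload. */
static size_t wrap_restricted(uint8_t *out, uint32_t domid, uint32_t real_cmd,
                              uint32_t req_id, uint32_t tx_id,
                              const void *payload, uint32_t payload_len)
{
    struct xsd_sockmsg hdr = {
        .type   = XS_ONE_TIME_RESTRICT,
        .req_id = req_id,
        .tx_id  = tx_id,
        .len    = (uint32_t)sizeof(struct restrict_prefix) + payload_len,
    };
    struct restrict_prefix pfx = { .domid = domid, .real_cmd = real_cmd };

    memcpy(out, &hdr, sizeof(hdr));
    memcpy(out + sizeof(hdr), &pfx, sizeof(pfx));
    memcpy(out + sizeof(hdr) + sizeof(pfx), payload, payload_len);
    return sizeof(hdr) + sizeof(pfx) + payload_len;
}
```

On receipt, the Xenstore daemon would process the inner command with the permissions of the named domid, which is all the restriction the scheme requires.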
* Re: QEMU XenServer/XenProject Working group meeting 29th September 2016 2016-10-14 18:01 ` QEMU XenServer/XenProject Working group meeting 29th September 2016 Jennifer Herbert @ 2016-10-18 19:54 ` Stefano Stabellini 2016-10-20 17:37 ` Lars Kurth 2017-02-28 18:18 ` QEMU XenServer/XenProject Working group meeting 10th February 2017 Jennifer Herbert 1 sibling, 1 reply; 74+ messages in thread From: Stefano Stabellini @ 2016-10-18 19:54 UTC (permalink / raw) To: Jennifer Herbert; +Cc: xen-devel I think calls of this kind should be announced on xen-devel before they happen, to give other people a chance to participate (I cannot promise I would have participated, but it is the principle that counts). If I missed the announcement, I apologize. On Fri, 14 Oct 2016, Jennifer Herbert wrote: > XenStore > -------- > > For the non-pv part of QEMU, XenStore is only used in two places. > There is the DM state, and the physmap mechanism. Although there is a > vague plan for replacing the physmap mechanism, it is some way off. > > The DM state key is used for knowing when the qemu process is running > etcetera. QMP would seem to be an option to replace it - however there > is no (nice) way to wait on a socket until it has been opened. One > solution might be to use Xenstore to let you know the QMP sockets > were available, before QEMU drops privileges, and then QMP could be > used to know QEMU is in the running state. > > To avoid the need to use xs-restrict, you would need to both replace > physmap and rework the qemu startup procedure. The use of xs-restrict would > be more expedient, and does not look to need that much work. 
The aim is merely to restrict the permission for the command > to that of the guest whose domid you provide. It was proposed that > it uses the header as is, with its 16 bytes, with the command > 'one-time-restrict', and then the payload would have two additional > fields at the start. These two fields would correspond to the domid to > restrict as, and the real command. Transaction ID and tags would be > taken from the real header. > > Although inter-domain xs-restrict is not specifically needed for this > project, it is thought it might be a blocking item for upstream > acceptance. It is thought these changes would not require that much > work to implement, and may be useful in other use cases. Only a few > changes to QEMU would be needed, and libxl should be able to track > QEMU versions. Ian Jackson volunteered to look at this, with David > helping with the kernel bits. Ian won't have time to look at this > until after Xen 4.8 is released. > > There was discussion about what may fail once privileges are taken away, > which would include CDs and PCI passthrough. It is thought the full > list can only be known by trying. Not everything needs to work for > acceptance upstream, such as PCI passthrough. If such an > incompatible feature is needed, restrictions can be turned off. These > problems can be fixed in a later phase, with CDs likely being at the > top of the list. One thing to note is that xs-restrict is unimplemented in cxenstored. > disaggregation > ============= > > A disaggregation proposal which had previously been posted to a QEMU > forum was discussed. It was not previously accepted by all. The big > question was how to separate the device models from the machine, with > a particular point of contention being around PIIX and the idea of > starting a QEMU instance without one. Right. In particular I tend to agree with the other QEMU maintainers when they say: why ask for a PIIX3 compatible machine, when actually you don't want to be PIIX3 compatible? 
> The general desire from us is > we want to have a specific device emulated and nothing else. This is really not possible with QEMU, because QEMU is a machine emulator, not a device emulator. BTW who wants this? I mean, why is this part of the QEMU depriv discussion? It is not necessary. I think what we want for QEMU depriv is to be able to build a QEMU PV machine with just the PV backends in it, which is attainable with the current architecture. I know there are use cases for having an emulator of just one device, but I don't think they should be confused with the more important underlying issue here, which is QEMU running with full privileges. > It is > suggested you would have a software interface between each device that > looked like a software version of PCI. The PIIX device could be attached to > the CPU via this pseudo-PCI interface. This would fit in well with how IOREQ > server and IOMMU works. Although this sounds like a large > architectural change is wanted, it's suggested that actually it's just > that we're asking them to take a different stability and plug-ability > posture on the interfaces they already have. > > This architectural issue is the cause behind lots of little > annoyances, which have been going on for years. Xen is having to make > up lots of strange stuff to keep QEMU happy, and there is confusion > over memory ownership. Fixing the architecture should make our lives > much easier. 
There would be > no longer be a need for the IOREQ server to be in QEMU, which is > thought should be an attractive idea for the QEMU maintainers. It is > also thought that other projects, such as the clear containers people > would also benefit from such an architecture. The idea of spiltting > out the CPU code from the device code may even be attractive to KVM. The idea of having a QEMU library has always been resisted upstream. It takes the project in a very different direction. As QEMU maintainer I don't know if such a thing would actually be good for the QEMU community. As I wrote, I would like to see a PV machine which can be built with just the PV backends. This can be done without libqemu. We might also be able to emulate a PIIX3 compatible machine without any CPUs, also without libqemu. > The code in the Xen tools directory, would be a small event loop, > using glib probably, thing that reads ioreq off a ring, and a > thing that speaks Xenstore. There would be a bunch of initialisation > calls, that calls into libqemu and initialise the various devices,with > device structures for them, indicating where they should be mapped and > so forth. There would be no IDE code in our tree, and no ioreq > server in the QEMU tree. > > The QEMU maintainers should be in favour of removing Xen specific code > from QEMU, and it is also thought that you could demonstrate how to > use this to make disaggregated device models for KVM's case. It is > further postulated that there may be many people out there with dev > boards and experiments with FPGAs and strange PCI stuff, they don't > want to wrestle with QEMUs PCI emulator. With the libqemu, it may > just take 50 lines of of code for a random developer to plug some > hardware together and make a simulator. The way QEMU community solved this problem is by using device tree to assemble a machine at runtime with the things needed. It works very nicely. 
It can be done on x86 too as long as one doesn't try to pretend to emulate an existing well defined machine such as PIIX3. > There was discussion on if a halfway solution might be easier. > However it was concluded that such a solution would likely only > benefit Xen as a quick fix, and not as much as the full libqemu idea, > and so not look that appealing from QEMUs perspective. While the > the full libqemu idea would benefit many more people, allowing an > explosion of QEMU potential use cases and users. More people using a > project should mean more contributors. > > It was concluded that this was largely a political issue, and that we > need to find out what the objections really are. If we where to > convince everyone of the benefit, then we'd probably need to step up > and to much of the work - however, this is still likely to be less > work then maintaining the current set-up. There was further > discussion on who our allies might be, and the approach should take to > persuade people. It was stressed that we need to sell the benefits of > this system i.e. "Releasing its full potential". > > The alternative to the politics may be to simply fork the project > again - the previous fork lasted a decade. However it should be much > better to cooperate, and so we much try. > > Action item > ----------- > > Ian to reach out to Peter Maydell and discuss the issue, and to > consider writting down new proposal. > (updated: Probebly to early for writeup.)
* Re: QEMU XenServer/XenProject Working group meeting 29th September 2016 2016-10-18 19:54 ` Stefano Stabellini @ 2016-10-20 17:37 ` Lars Kurth 2016-10-20 18:53 ` Stefano Stabellini 0 siblings, 1 reply; 74+ messages in thread From: Lars Kurth @ 2016-10-20 17:37 UTC (permalink / raw) To: Stefano Stabellini; +Cc: Jennifer Herbert, xen-devel > On 18 Oct 2016, at 20:54, Stefano Stabellini <sstabellini@kernel.org> wrote: > > I think this kind of calls should be announced on xen-devel before they > happen, to give a chance to other people to participate (I cannot > promise I would have participated but it is the principle that counts). > > If I missed the announcement, I apologize. Stefano, the meeting started off as an internal meeting to brainstorm and share experiences and challenges we have with QEMU amongst different Citrix teams with a view to get a wider dialog started. Maybe we are at the stage where it makes sense to open it up. > On Fri, 14 Oct 2016, Jennifer Herbert wrote: >> XenStore >> -------- >> >> For the non-pv part of QEMU, XenStore is only used in two places. >> There is the DM state, and the physmap mechanism. Although there is a >> vague plan for replacing the physmap mechanism, it is some way off. >> >> The DM state key is used for knowing when the qemu process is running >> etcetera, QMP would seem to be an option to replace it - however there >> is no (nice) way to wait on a socket until it has been opened. One >> solution might be to use Xenstore to let you know the QMP sockets >> where available, before QEMU drops privileges, and then QMP could be >> used to know QEMU is in the running state. >> >> To avoid the need to use xs-restrict, you would need to both replace >> physmap and rework qemu startup procedure. The use of xs-restrict would >> be more expedient, and does not look to need that much work. 
>> >> Discussion was had over how secure it would be to allow a guest access >> to these Xenstore keys - it was concluded that a guest could mostly >> only mess itself up. If I guest attempted to prevent itself from being >> migrated, the tool stack time it out, and could kill it. >> >> There followed a discussion on the Xenbus protocol, and additions >> needed. The aim is to merely restrict the permission for the command, >> to that of the guest who's domID you provide. It was proposed that >> it uses the header as is, with its 16 bytes, with the command >> 'one-time-restrict' , and then the payload would have two additional >> field at the start. These two field would correspond to the domid to >> restrict as, and the real command. Transaction ID and tags would be >> taken from the real header. >> >> Although inter domain xs-restrict is not specifically needed for this >> project, it is thought it might be a blocking items for upstream >> acceptance. It it thoughts these changes would not require that much >> work to implement, and may be useful in use use cases. Only a few >> changes to QEMU would be needed, and libxl should be able to track >> QEMU versions. Ian Jackson volunteered to look at this, with David >> helping with the kernel bits. Ian won't have time to look at this >> until after Xen 4.8 is released. >> >> There discussion about what may fail once privileges are taken away, >> which would include CDs and PCI pass though. It is thought the full >> list can only be known by trying. Not everything needs to work for >> acceptance upstream, such as PCI pass though. If such an >> incompatible feature is needed, restrictions can be turned off. These >> problems can be fixed in a later phase, with CDs likely being at teh >> top of the list. > > One thing to note is that xs-restrict is unimplemented in cxenstored. > > >> disaggregation >> ============= >> >> A disaggregation proposal which had previously been posted to a QEMU >> forum was discussed. 
It was not previously accepted by all. The big >> question was how to separate the device models from the machine, with >> a particular point of contention being around PIIX and the idea of >> starting a QEMU instance without one. > > Right. In particular I tend to agree with the other QEMU maintainers > when they say: why ask for a PIIX3 compatible machine, when actually you > don't want to be PIIX3 compatible? > > >> The general desire from us is >> we want to have a specific device emulated and nothing else. > > This is really not possible with QEMU, because QEMU is a machine > emulator, not a device emulator. BTW who wants this? I mean, why is this > part of the QEMU depriv discussion? It is not necessary. I think what we > want for QEMU depriv is to be able to build a QEMU PV machine with just > the PV backends in it, which is attainable with the current > architecture. I know there are use cases for having an emulator of just > one device, but I don't think they should be confused with the more > important underlying issue here, which is QEMU running with full > privileges. > > >> It is >> suggested you would have a software interface between each device that >> looked a software version of PCI. The PIIX device could be attached to >> CPU this pseudo PCI interface. This would fit in well with how IOREQ >> server and IOMMU works. Although this sounds like a large >> architectural change is wanted, its suggested that actually its just >> that we're asking them to take a different stability and plug-ability >> posture on the interfaces they already have. >> >> This architectural issue is the cause behind lots of little >> annoyances, which have been going on for years. Xen is having to make >> up lots of strange stuff to keep QEMU happy, and there is confusion >> over memory ownership. Fixing the architecture should make our lives >> much easier. 
These architectural issues are also making things >> difficult for Intel, who are trying to work around the issue with Xen >> changes, which may just worsen the problem. This means this is >> effectively blocking them. >> >> It is proposed that instead of having a QEMU binary, what is really >> wanted is a QEMU library. With a library you could easily take the >> bits needed, create your own main loop and link them to whatever >> interface, IOREQ services or IPC mechanism is needed. There would be >> no longer be a need for the IOREQ server to be in QEMU, which is >> thought should be an attractive idea for the QEMU maintainers. It is >> also thought that other projects, such as the clear containers people >> would also benefit from such an architecture. The idea of spiltting >> out the CPU code from the device code may even be attractive to KVM. > > The idea of having a QEMU library has always been resisted upstream. It > takes the project in a very different direction. As QEMU maintainer I > don't know if such a thing would actually be good for the QEMU > community. We revisited the original disaggregation thread (Wei originally proposed the patches) and what we proposed at the time was a sort of a half-way house that was very Xen specific and not really of much use to any QEMU downstream other than Xen. Even then, opinions amongst QEMU maintainers were divided: some were in favour, some were not. But we would definitely need to make a good case, do some convincing upfront and address the concerns of the QEMU community and work with the QEMU maintainers from the get-go. As you rightly point out, such an approach does change some of the fundamental assumptions within QEMU and we wouldn't want to do this, if there are no benefits to QEMU. I think it is worthwhile trying this again. You may have some further insights, which would be quite valuable. 
Regards, Lars
* Re: QEMU XenServer/XenProject Working group meeting 29th September 2016 2016-10-20 17:37 ` Lars Kurth @ 2016-10-20 18:53 ` Stefano Stabellini 0 siblings, 0 replies; 74+ messages in thread From: Stefano Stabellini @ 2016-10-20 18:53 UTC (permalink / raw) To: Lars Kurth; +Cc: Stefano Stabellini, Jennifer Herbert, xen-devel On Thu, 20 Oct 2016, Lars Kurth wrote: > > On 18 Oct 2016, at 20:54, Stefano Stabellini <sstabellini@kernel.org> wrote: > > > > I think this kind of calls should be announced on xen-devel before they > > happen, to give a chance to other people to participate (I cannot > > promise I would have participated but it is the principle that counts). > > > > If I missed the announcement, I apologize. > > Stefano, the meeting started off as an internal meeting to brainstorm and share experiences and challenges we have with QEMU amongst different Citrix teams with a view to get a wider dialog started. Maybe we are at the stage where it makes sense to open it up. No worries, I didn't mean to pick on you guys, as I wrote I am not sure I would actually have participated, but I think the meeting would work better for you and the Xen community if it was open. In fact I think that we don't have enough open meetings in the Xen community in general: for your information Julien and I are going to start organizing one for Xen on ARM soon, with the intention of making it a regular monthly meeting. > > On Fri, 14 Oct 2016, Jennifer Herbert wrote: > >> XenStore > >> -------- > >> > >> For the non-pv part of QEMU, XenStore is only used in two places. > >> There is the DM state, and the physmap mechanism. Although there is a > >> vague plan for replacing the physmap mechanism, it is some way off. > >> > >> The DM state key is used for knowing when the qemu process is running > >> etcetera, QMP would seem to be an option to replace it - however there > >> is no (nice) way to wait on a socket until it has been opened. 
One > >> solution might be to use Xenstore to let you know the QMP sockets > >> where available, before QEMU drops privileges, and then QMP could be > >> used to know QEMU is in the running state. > >> > >> To avoid the need to use xs-restrict, you would need to both replace > >> physmap and rework qemu startup procedure. The use of xs-restrict would > >> be more expedient, and does not look to need that much work. > >> > >> Discussion was had over how secure it would be to allow a guest access > >> to these Xenstore keys - it was concluded that a guest could mostly > >> only mess itself up. If I guest attempted to prevent itself from being > >> migrated, the tool stack time it out, and could kill it. > >> > >> There followed a discussion on the Xenbus protocol, and additions > >> needed. The aim is to merely restrict the permission for the command, > >> to that of the guest who's domID you provide. It was proposed that > >> it uses the header as is, with its 16 bytes, with the command > >> 'one-time-restrict' , and then the payload would have two additional > >> field at the start. These two field would correspond to the domid to > >> restrict as, and the real command. Transaction ID and tags would be > >> taken from the real header. > >> > >> Although inter domain xs-restrict is not specifically needed for this > >> project, it is thought it might be a blocking items for upstream > >> acceptance. It it thoughts these changes would not require that much > >> work to implement, and may be useful in use use cases. Only a few > >> changes to QEMU would be needed, and libxl should be able to track > >> QEMU versions. Ian Jackson volunteered to look at this, with David > >> helping with the kernel bits. Ian won't have time to look at this > >> until after Xen 4.8 is released. > >> > >> There discussion about what may fail once privileges are taken away, > >> which would include CDs and PCI pass though. It is thought the full > >> list can only be known by trying. 
Not everything needs to work for > >> acceptance upstream, such as PCI pass though. If such an > >> incompatible feature is needed, restrictions can be turned off. These > >> problems can be fixed in a later phase, with CDs likely being at teh > >> top of the list. > > > > One thing to note is that xs-restrict is unimplemented in cxenstored. > > > > > >> disaggregation > >> ============= > >> > >> A disaggregation proposal which had previously been posted to a QEMU > >> forum was discussed. It was not previously accepted by all. The big > >> question was how to separate the device models from the machine, with > >> a particular point of contention being around PIIX and the idea of > >> starting a QEMU instance without one. > > > > Right. In particular I tend to agree with the other QEMU maintainers > > when they say: why ask for a PIIX3 compatible machine, when actually you > > don't want to be PIIX3 compatible? > > > > > >> The general desire from us is > >> we want to have a specific device emulated and nothing else. > > > > This is really not possible with QEMU, because QEMU is a machine > > emulator, not a device emulator. BTW who wants this? I mean, why is this > > part of the QEMU depriv discussion? It is not necessary. I think what we > > want for QEMU depriv is to be able to build a QEMU PV machine with just > > the PV backends in it, which is attainable with the current > > architecture. I know there are use cases for having an emulator of just > > one device, but I don't think they should be confused with the more > > important underlying issue here, which is QEMU running with full > > privileges. > > > > > >> It is > >> suggested you would have a software interface between each device that > >> looked a software version of PCI. The PIIX device could be attached to > >> CPU this pseudo PCI interface. This would fit in well with how IOREQ > >> server and IOMMU works. 
Although this sounds like a large > >> architectural change is wanted, its suggested that actually its just > >> that we're asking them to take a different stability and plug-ability > >> posture on the interfaces they already have. > >> > >> This architectural issue is the cause behind lots of little > >> annoyances, which have been going on for years. Xen is having to make > >> up lots of strange stuff to keep QEMU happy, and there is confusion > >> over memory ownership. Fixing the architecture should make our lives > >> much easier. These architectural issues are also making things > >> difficult for Intel, who are trying to work around the issue with Xen > >> changes, which may just worsen the problem. This means this is > >> effectively blocking them. > >> > >> It is proposed that instead of having a QEMU binary, what is really > >> wanted is a QEMU library. With a library you could easily take the > >> bits needed, create your own main loop and link them to whatever > >> interface, IOREQ services or IPC mechanism is needed. There would be > >> no longer be a need for the IOREQ server to be in QEMU, which is > >> thought should be an attractive idea for the QEMU maintainers. It is > >> also thought that other projects, such as the clear containers people > >> would also benefit from such an architecture. The idea of spiltting > >> out the CPU code from the device code may even be attractive to KVM. > > > > The idea of having a QEMU library has always been resisted upstream. It > > takes the project in a very different direction. As QEMU maintainer I > > don't know if such a thing would actually be good for the QEMU > > community. > > We revisited the original disaggregation thread (Wei originally proposed the patches) and what we proposed at the time was a sort of a half-way house that was very Xen specific and not really of much use to anyone other QEMU downstream but Xen. 
Even then, opinions amongst QEMU maintainers were divided: some were in favour, some were not. But we would definitely need to make a good case, do some convincing upfront and address the concerns of the QEMU community and work with the QEMU maintainers from the get-go. As you rightly point out, such an approach does change some of the fundamental assumptions within QEMU and we wouldn't want to do this, if there are no benefits to QEMU. I think it is worthwhile trying this again. You may have some further insights, which would be quite valuable. OK, these are my 2 cents: for political, licensing and technical reasons I don't think that pushing for libqemu is a worthwhile goal. But I think that something along the lines of what Wei suggested originally could still work, but I agree that we need to make it less Xen specific. I think the way to generalize it is to disentangle CPU emulation, architecture emulation (x86/ARM/etc.) and board emulation (VersatilePB/CubieBoard/etc.). QEMU should be able to emulate a board with no CPUs. Or the same board with an ARM or a PowerPC CPU. For example QEMU should definitely be able to emulate a VersatilePB without a CPU or with a non-ARM CPU. At that point we could ask QEMU to emulate a PIIX3 board with no CPUs or a Xen PV board with no CPUs. This idea comes from past conversations with the QEMU guys.
* Re: QEMU XenServer/XenProject Working group meeting 10th February 2017 2016-10-14 18:01 ` QEMU XenServer/XenProject Working group meeting 29th September 2016 Jennifer Herbert 2016-10-18 19:54 ` Stefano Stabellini @ 2017-02-28 18:18 ` Jennifer Herbert 2017-06-05 13:48 ` QEMU XenServer/XenProject Working group meeting 10th May 2017 Jennifer Herbert 1 sibling, 1 reply; 74+ messages in thread From: Jennifer Herbert @ 2017-02-28 18:18 UTC (permalink / raw) To: xen-devel

QEMU XenServer/XenProject Working group meeting 10th February 2017
==================================================================

Attendance
----------

Andrew Cooper
Ian Jackson
Paul Durrant
Jennifer Herbert
George Dunlap
Igor Druzhinin
Sergey Dyasli

First topic: DMOP
-----------------

After an introduction, Paul started by describing his recent progress with DMOP. He has posted a privcmd series, including the DMOP hypervisor parts. This adds the dmop ioctl and the restrict ioctl. The restrict ioctl restricts DMOP access to that of the given domain, and prevents further access via privcmd. Paul has also written all the code for libdevicemodel - 4 patches. This code is Linux-specific, and will detect if it's running on an OS it doesn't support (either old Linux or Windows), or where the privcmd driver doesn't support the DM ioctl. He next intends to add a compatibility layer to libxenctrl, which reroutes old xc calls to the new interface. This would make the calls switch transparently, but you'd know at the time you made the restriction call whether it succeeded, and can use local policy to decide whether to continue. Following this, QEMU needs to be ported onto this new library. Next was a lengthy discussion of QEMU's mapcache, and of how to secure mmap calls. It was eventually concluded that Paul could just extend his current patches to also include the mmap calls. 
It was asked if using this library would mean having two fds, but this was said to be true only if you continued to keep xenctrl open – which you shouldn't, since the new restricted ioctl fd should be able to do everything you might need the old one for. Paul detailed how the privcmd driver now has a domid associated with each fd, which defaults to 'invalid'. The restrict ioctl sets this domid, and from then on, every call from that fd is checked against that domain. The restrict ioctl won't allow the domid to be changed again later. It was suggested other calls should be checked, but Paul didn't think there were any others not already discussed.

Next topic: How to prevent QEMU running as root
-----------------------------------------------

There is a patch to make QEMU change away from the root user. This is expected to make a number of things stop working:

* CD-ROM insert - eject is expected to work, if you could insert one.
* Emulated hotplug - unsure.
* PCI passthrough – this is deemed out of scope, since PCI passthrough is inherently unsafe anyway. It should give a useful error message suggesting turning off de-priv. This could be made to work in future.

It is thought that some of these problems could be solved by passing fds over QMP. Furthermore, it was thought that DavidV. had looked, and found you could already do this. If so, only toolstack changes would be needed. Making emulated network hotplug or IDE hotplug work is probably not important.

NOTE: Need to check QMP and its applicability to this problem.

Next topic: Xenstore
--------------------

Juergen has a proposal for xenstored, to have something similar to the dm-op ioctl: each command can have a prefix which requests it be executed with a specified domain security context. It's also necessary to run multiple QEMUs, such that the PV back-ends can be run in a separate instance from the emulators. In doing this, you also need to separate out the xenstore paths, such that each QEMU uses its own xenstore branch. 
Part of the reason is that the backend paths that you are using to control the emulated deprivileged QEMU have to be writeable by the guest. This is because the emulated QEMU is going to have to write to them. The emulator patch ID stuff is already upstream – and can be invoked. But the toolstack has to be able to cope. Ian wrote a LibXL patch series (20~30 patches) 18 months ago, but that has bit-rotted. This series was said to be surprisingly long. Andrew is concerned that giving a guest permission to write to the xenstore paths used by its QEMU may cause a significant security issue, while others think it would merely offer a new way for it to crash itself. Andrew is to inspect Ian's patch series to evaluate this risk. There are only a handful of xenstore paths needed by the emulated QEMU – a number which will only drop as QMP use increases. Another thing to consider is that you'll have to be able to talk to multiple QEMUs via QMP – knowing which is which! There was a suggestion that you could differentiate between QEMU PV and QEMU emulated by how you're communicating with it – but this wouldn't really be possible, as QEMU upstream won't accept anything using xenstore if it might be useful to more than just the Xen community. XenServer is currently working towards moving to QEMU upstream, with its core functionality currently working using temporary Python shims translating the API calls. If depriv is upstreamed in time, XenServer won't have to transition the APIs twice. Given this, it was decided it would be easiest for LibXL to be tested with the changes first, to see how well QEMU works as non-root.

Next topic: Drop caps
---------------------

Andrew points out that there is more to running without root privileges than just changing your UID. He recalls how a past colleague had needed to use drop-caps, and moved things into an anonymous namespace, to prevent things like being able to open TCP sockets. 
Such operations are Linux-specific, but as long as they are enabled by an option, this is OK. It was concluded these were incremental benefits which could be done later, after the basics are working. This issue is also not Xen-specific.

Action Items
------------

Paul to carry on with the Xen device model work.
Ian to re-factor libxl after the libxl split.
Ian to negotiate xenstore restrictions.
Ian to check QMP can take an fd and use it for CD insert and emulated hotplug.
Andrew to look over Ian's patch queue to see how bad giving a guest the extra xenstore permissions would be.
Jennifer to continue leading XenServer's changeover from QEMU trad to QEMU upstream. XenServer is making slow progress here, but it is not the highest priority.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
* QEMU XenServer/XenProject Working group meeting 10th May 2017
From: Jennifer Herbert @ 2017-06-05 13:48 UTC (permalink / raw)
To: xen-devel

QEMU XenServer/XenProject Working group meeting 5th May 2017
=============================================================

Attendees:

* Paul Durrant
* Andrew Cooper
* Ian Jackson
* Jenny Herbert
* Igor Druzhinin
* Simon Crow
* Marcus Granado

Reviewed previous action points:

* Paul to carry on with the Xen device model work. - Done.
* Ian - No progress with re-factoring the libxl split, XenStore restrictions, or using an fd for QMP CD insert and emulated hotplug, due to the feature freeze on 4.8.9.
* Andrew to look over Ian's patch series to see how bad the extra XenStore permissions would be. - Not done.
* Jenny – XenServer is continuing to make slow progress. We have a XenCenter patch that means it can talk to QEMU trad, and are working on some bugs. Various other small bits of work done.

Andrew to double-check the dm-op sub-ops for continuation support.

XenServer uses XenStore to communicate the VNC QEMU clipboard to the guest and back. It was concluded this isn't nice, as it has caused various security problems in the past. The new plan is to implement this using a shared page.

Next, XenStore restriction was discussed, including an approach involving a shared ring buffer: removing the previous xs-restrict and instead implementing an xs-restriction prefixing command within the xenbus command. Slightly fiddly, but a small number of parts. It's doable and not controversial.

Andrew had a different plan: to remove any use of XenStore from QEMU – specifically for the HVM case, where there is currently little use. Two uses should be trivial to implement in QMP, and the other is physmap, which Igor has been working on.
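The two trivial uses map naturally onto QMP's wire format, where each command is a single JSON object on the socket. As a sketch (`xen-set-global-dirty-log` and `system_powerdown` are existing QMP commands, but whether they cover exactly what the XenStore keys currently do is an assumption):

```python
import json

def qmp_command(name, **arguments):
    """Frame a QMP command in the standard wire format:
    {"execute": <name>, "arguments": {...}} as one JSON object."""
    cmd = {"execute": name}
    if arguments:
        cmd["arguments"] = arguments
    return json.dumps(cmd)

# Candidate QMP replacements for the two remaining XenStore keys:
start_log_dirty = qmp_command("xen-set-global-dirty-log", enable=True)
shutdown = qmp_command("system_powerdown")
```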
It's agreed the physmap keys need removing, and that removing the last few uses of XenStore would be a good idea. However, if dealing with the physmap issue were to take a long time, xs-restrict could be used as an intermediate plan, which would allow this project to move on while physmap or any other complications are being sorted out. Andrew is not convinced that allowing a guest to fiddle with the physmap keys (as would have to be permitted under the xs-restrict scheme) would be safe. Certainly the guest could destroy itself, which in itself is permissible, but as it changes the internal virtual memory layout for QEMU, there might be some exploit hidden there – this would need to be checked.

The conversation turns to the physmap keys, these being the biggest XenStore concern. Andrew suggests that the correct way to fix this compatibly would be if grant-map-foreign took in a pointer at which the range would be mmapped; the pointer would allow the range to be mapped exactly where it pointed. A new xen foreign-map library call could be written, but the complication of QEMU needing compatibility with older versions of Xen was raised. Ian suggested that this wasn't really a problem, since we could do the old thing with the old versions. Libxl already tests the version of QEMU, and so if it finds it too old, it does the physmap-keys thing; otherwise, it can use QEMU-depriv, including the new physmap mechanism.

The new physmap mechanism would not require a substitute for the physmap keys, as QEMU already has all the information it needs. The keys were entirely there to allow two parts of the startup logic in QEMU to be reversed. Andrew says that Igor has a solution but doesn't think it's upstreamable. Ian suggests that, given the size of the xs-restrict patch queue, if we can fix the physmap issue relatively easily we should, and then he could drop most of the xs-restrict patch queue. Ian offers to help with the libxl part of the physmap work.
Igor explains that an approach he tried hit a block with QXL, which initialises certain registers and writes them into various memory locations. He tried, in each place, to insert an 'if Xen and runstate == resuming, don't touch' check. But the fix is very intrusive in terms of the QXL code, and so he didn't try to upstream it.

The idea of using helper functions was discussed. Helper functions have been used before for similar problems. A function could be created to access vram, instead of just writing through a pointer. All the accessing parts of the QXL code could be redirected through this helper function, and that would likely be upstreamable. In particular, it has been suggested before that helper functions could be used for range checking. They could be added for range checking, and then also modified with the 'if Xen and resuming' clauses.

Igor explains how another approach he had been looking at was to change the order of QEMU startup: move the memory map, and actually map the memory where QEMU expects it to be mapped. Here again, there was the issue of compatibility. The compat layer is still needed to work with old versions of libxc. It was discussed how QEMU would have to be able to work with both skews. This could be decided at compile time, as at that point you know whether you have the new function call; if the new function call is not available, it would default to the old behaviour.

A goal of Igor's was to remove the physmap XenStore keys code altogether, but for the depriv project the important bit was that they were not used. It was also noted that the compat code can eventually be retired, when it is felt no one is using it any more. In the XenServer case, using QEMU-trad, XenServer hasn't needed to create the physmap keys, and would rather not start. A benefit for libxl is that it could get rid of a magic record in the migration stream. The new xen foreign-mem library call would still be needed. This is too late for 4.9, but that's not a problem, and XenServer can backport.
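The helper-function idea above can be modelled roughly as follows (a toy Python model, not QXL code: class and field names are invented, and the point is only the shape – every vram access goes through one function that range-checks and, under "Xen and resuming", leaves the already-restored contents alone):

```python
class VRam:
    """Toy model of vram accessed only through a helper function."""

    def __init__(self, size, xen=False, resuming=False):
        self.buf = bytearray(size)
        self.xen, self.resuming = xen, resuming

    def write(self, offset, data):
        # Range checking: the part already suggested upstream on its own merits.
        if not 0 <= offset <= len(self.buf) - len(data):
            raise IndexError("vram write out of range")
        # The 'if Xen and resuming' clause: don't clobber state that was
        # restored from the migration stream.
        if self.xen and self.resuming:
            return
        self.buf[offset:offset + len(data)] = data
```

Because every access site calls `write()`, the Xen-specific clause lives in one place instead of being scattered through the device code, which is what makes the approach plausibly upstreamable.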
RFCs can be posted sooner.

The conversation returns to XenStore. The patch queue can be substantially cut down, which is a good thing. The other uses of XenStore are thought to be the 'start log dirty' and 'shutdown' keys. It was thought likely that QEMU might already have QMP commands to do these things, but if not, upstream are likely to be very happy to allow them to be added. After this, XenStore can be moved into the PV-specific part of the code. Jenny suggested ring0 could put an item on their backlog to look at the log-dirty and shutdown code. They should also do a scan for other uses. The xenstore clipboard problem is entirely a XenServer problem, and they can implement the new clipboard mechanism in their own time.

Ian is still on the hook for the CD and libxl work. He will still have to do the two-QEMUs code, and all the code to do with starting and stopping two QEMUs – but it is much less painful.

Jenny has already created a confluence page linking to previously posted meeting notes, and it was suggested this could be extended to include action points, links to created code, or links to related discussions.

Action Items
------------

* Andrew to double-check the dm-op sub-ops for continuation support.
* Igor to implement and post patches to create the new mem-map function, and to implement the new physmap mechanism in QEMU.
* Ian to look at using an fd for QMP CD insert and emulated hotplug, as well as splitting the two QEMUs.
* Jenny to create internal Jira tickets for looking at removing other uses of XenStore in QEMU, as well as updating the confluence page.