From: Lars Kurth <lars.kurth.xen@gmail.com>
To: Stefano Stabellini <sstabellini@kernel.org>
Cc: Jennifer Herbert <Jennifer.Herbert@citrix.com>, xen-devel@lists.xen.org
Subject: Re: QEMU XenServer/XenProject Working group meeting 29th September 2016
Date: Thu, 20 Oct 2016 18:37:46 +0100
Message-ID: <BCA53EC6-758A-4516-9145-9666EBB909BA@gmail.com>
In-Reply-To: <alpine.DEB.2.10.1610181222070.16441@sstabellini-ThinkPad-X260>


> On 18 Oct 2016, at 20:54, Stefano Stabellini <sstabellini@kernel.org> wrote:
> 
> I think this kind of call should be announced on xen-devel before it
> happens, to give other people a chance to participate (I cannot
> promise I would have participated but it is the principle that counts).
> 
> If I missed the announcement, I apologize.

Stefano, the meeting started off as an internal meeting for different Citrix teams to brainstorm and share the experiences and challenges we have with QEMU, with a view to getting a wider dialogue started. Maybe we are now at the stage where it makes sense to open it up.

> On Fri, 14 Oct 2016, Jennifer Herbert wrote:
>> XenStore
>> --------
>> 
>> For the non-pv part of QEMU, XenStore is only used in two places.
>> There is the DM state, and the physmap mechanism.  Although there is a
>> vague plan for replacing the physmap mechanism, it is some way off.
>> 
>> The DM state key is used for knowing when the qemu process is running,
>> etcetera. QMP would seem to be an option to replace it; however, there
>> is no (nice) way to wait on a socket until it has been opened.  One
>> solution might be to use Xenstore to let you know the QMP sockets
>> were available, before QEMU drops privileges, and then QMP could be
>> used to know QEMU is in the running state.
>> 
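To illustrate the "no (nice) way to wait on a socket" point above, here is a minimal sketch of the not-so-nice alternative: polling a UNIX socket path until something is listening on it. The function name, timeout and interval are made up for illustration, and nothing here is taken from libxl or QEMU.

```python
import socket
import time


def wait_for_unix_socket(path, timeout=10.0, interval=0.1):
    """Poll until a UNIX stream socket at `path` accepts a connection.

    Hypothetical helper: returns the connected socket, or raises
    TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return sock
        except (FileNotFoundError, ConnectionRefusedError):
            # Path not created yet, or created but nobody is listening.
            sock.close()
            if time.monotonic() >= deadline:
                raise TimeoutError("no listener on %s" % path)
            time.sleep(interval)
```

A caller would then speak QMP over the returned socket (for example, query-status after the capabilities handshake). The polling and the arbitrary timeout are exactly what a Xenstore notification that the sockets exist would avoid.
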
>> To avoid the need to use xs-restrict, you would need to both replace
>> physmap and rework qemu startup procedure. The use of xs-restrict would
>> be more expedient, and does not look to need that much work.
>> 
>> Discussion was had over how secure it would be to allow a guest access
>> to these Xenstore keys - it was concluded that a guest could mostly
>> only mess itself up.  If a guest attempted to prevent itself from being
>> migrated, the tool stack would time it out, and could kill it.
>> 
>> There followed a discussion on the Xenbus protocol, and additions
>> needed.  The aim is to merely restrict the permission for the command,
>> to that of the guest whose domID you provide.  It was proposed that
>> it uses the header as is, with its 16 bytes, with the command
>> 'one-time-restrict', and then the payload would have two additional
>> fields at the start.  These two fields would correspond to the domid to
>> restrict as, and the real command. Transaction ID and tags would be
>> taken from the real header.
>> 
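As a rough sketch of the framing described above: the 16-byte header is the existing xenstore wire header (type, req_id, tx_id, len, each a 32-bit integer). Everything else is an assumption made for illustration: the numeric value for 'one-time-restrict', the 32-bit width of the two extra payload fields, and reading "tags" as req_id.

```python
import struct

# Existing xenstore wire header: type, req_id, tx_id, len (4 x uint32).
HEADER = struct.Struct("=IIII")
# Assumed layout of the two extra fields: domid, real command.
WRAP = struct.Struct("=II")

XS_READ = 2                    # existing message type from xs_wire.h
XS_ONE_TIME_RESTRICT = 0x7F00  # hypothetical value, not a real assignment


def wrap_restricted(domid, real_type, req_id, tx_id, payload):
    """Wrap a request so it is executed with the permissions of `domid`."""
    body = WRAP.pack(domid, real_type) + payload
    header = HEADER.pack(XS_ONE_TIME_RESTRICT, req_id, tx_id, len(body))
    return header + body


def unwrap_restricted(msg):
    """Recover (domid, real_type, req_id, tx_id, payload) from a message."""
    msg_type, req_id, tx_id, length = HEADER.unpack_from(msg, 0)
    if msg_type != XS_ONE_TIME_RESTRICT:
        raise ValueError("not a one-time-restrict message")
    body = msg[HEADER.size:HEADER.size + length]
    domid, real_type = WRAP.unpack_from(body, 0)
    # req_id and tx_id come from the outer (real) header, as proposed.
    return domid, real_type, req_id, tx_id, body[WRAP.size:]
```

For example, wrap_restricted(5, XS_READ, 1, 0, b"/local/domain/5/name\0") produces a read request that the daemon would evaluate as if domain 5 had sent it.
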
>> Although inter domain xs-restrict is not specifically needed for this
>> project, it is thought it might be a blocking item for upstream
>> acceptance.  It is thought these changes would not require that much
>> work to implement, and may be useful in other use cases. Only a few
>> changes to QEMU would be needed, and libxl should be able to track
>> QEMU versions.  Ian Jackson volunteered to look at this, with David
>> helping with the kernel bits.  Ian won't have time to look at this
>> until after Xen 4.8 is released.
>> 
>> There was discussion about what may fail once privileges are taken away,
>> which would include CDs and PCI pass through.  It is thought the full
>> list can only be known by trying.  Not everything needs to work for
>> acceptance upstream, such as PCI pass through.  If such an
>> incompatible feature is needed, restrictions can be turned off.  These
>> problems can be fixed in a later phase, with CDs likely being at the
>> top of the list.
> 
> One thing to note is that xs-restrict is unimplemented in cxenstored.
> 
> 
>> disaggregation
>> =============
>> 
>> A disaggregation proposal which had previously been posted to a QEMU
>> forum was discussed.  It was not previously accepted by all. The big
>> question was how to separate the device models from the machine, with
>> a particular point of contention being around PIIX and the idea of
>> starting a QEMU instance without one.
> 
> Right. In particular I tend to agree with the other QEMU maintainers
> when they say: why ask for a PIIX3 compatible machine, when actually you
> don't want to be PIIX3 compatible?
> 
> 
>> The general desire from us is
>> we want to have a specific device emulated and nothing else.
> 
> This is really not possible with QEMU, because QEMU is a machine
> emulator, not a device emulator. BTW who wants this? I mean, why is this
> part of the QEMU depriv discussion? It is not necessary. I think what we
> want for QEMU depriv is to be able to build a QEMU PV machine with just
> the PV backends in it, which is attainable with the current
> architecture. I know there are use cases for having an emulator of just
> one device, but I don't think they should be confused with the more
> important underlying issue here, which is QEMU running with full
> privileges.
> 
> 
>> It is
>> suggested you would have a software interface between each device that
>> looked like a software version of PCI.  The PIIX device could be attached
>> to the CPU via this pseudo PCI interface.  This would fit in well with how
>> IOREQ server and IOMMU works.  Although this sounds like a large
>> architectural change is wanted, it's suggested that actually it's just
>> that we're asking them to take a different stability and plug-ability
>> posture on the interfaces they already have.
>> 
>> This architectural issue is the cause behind lots of little
>> annoyances, which have been going on for years. Xen is having to make
>> up lots of strange stuff to keep QEMU happy, and there is confusion
>> over memory ownership.  Fixing the architecture  should make our lives
>> much easier.  These architectural issues are also making things
>> difficult for Intel, who are trying to work around the issue with Xen
>> changes, which may just worsen the problem.  This means this is
>> effectively blocking them.
>> 
>> It is proposed that instead of having a QEMU binary, what is really
>> wanted is a QEMU library.  With a library you could easily take the
>> bits needed, create your own main loop and link them to whatever
>> interface, IOREQ services or IPC mechanism is needed. There would
>> no longer be a need for the IOREQ server to be in QEMU, which it is
>> thought should be an attractive idea for the QEMU maintainers.  It is
>> also thought that other projects, such as the clear containers people
>> would also benefit from such an architecture.  The idea of splitting
>> out the CPU code from the device code may even be attractive to KVM.
> 
> The idea of having a QEMU library has always been resisted upstream. It
> takes the project in a very different direction. As QEMU maintainer I
> don't know if such a thing would actually be good for the QEMU
> community.

We revisited the original disaggregation thread (Wei originally proposed the patches). What we proposed at the time was a sort of half-way house that was very Xen-specific and not really of much use to any QEMU downstream other than Xen. Even then, opinions amongst QEMU maintainers were divided: some were in favour, some were not. Still, we would definitely need to make a good case, do some convincing upfront, address the concerns of the QEMU community, and work with the QEMU maintainers from the get-go. As you rightly point out, such an approach changes some of the fundamental assumptions within QEMU, and we wouldn't want to do this if there are no benefits to QEMU. I think it is worthwhile trying this again. You may have some further insights, which would be quite valuable.

Regards
Lars

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
