All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Juergen Gross <jgross@suse.com>,
	Xen-devel <xen-devel@lists.xenproject.org>
Cc: "Lars Kurth" <lars.kurth@citrix.com>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"Wei Liu" <wl@xen.org>,
	"George Dunlap" <George.Dunlap@eu.citrix.com>,
	"Tim Deegan" <tim@xen.org>, "Julien Grall" <julien.grall@arm.com>,
	"Paul Durrant" <paul.durrant@citrix.com>,
	"Jan Beulich" <jbeulich@suse.com>,
	"Ian Jackson" <ian.jackson@citrix.com>,
	"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: [Xen-devel] [PATCH] docs/sphinx: todo/wishlist
Date: Tue, 23 Jul 2019 20:05:53 +0100	[thread overview]
Message-ID: <8092088a-ebd7-ac32-5f48-a411f88628d8@citrix.com> (raw)
In-Reply-To: <712cf182-861f-3d10-9abe-b0ae689eb24c@suse.com>

On 23/07/2019 16:30, Juergen Gross wrote:
> On 23.07.19 15:38, Andrew Cooper wrote:
>> On 23/07/2019 05:36, Juergen Gross wrote:
>>> On 22.07.19 21:20, Andrew Cooper wrote:
>>>> diff --git a/docs/misc/wishlist.rst b/docs/misc/wishlist.rst
>>>> new file mode 100644
>>>> index 0000000000..6cdb47d6e7
>>>> --- /dev/null
>>>> +++ b/docs/misc/wishlist.rst
>>>> @@ -0,0 +1,53 @@
>>>> +Development Wishlist
>>>> +====================
>>>> +
>>>> +Remove xenstored's dependencies on unstable interfaces
>>>> +------------------------------------------------------
>>>> +
>>>> +Various xenstored implementations use libxc for two purposes.  It
>>>> would be a
>>>> +substantial advantage to move xenstored onto entirely stable
>>>> interfaces, which
>>>> +disconnects it from the internal of the libxc.
>>>> +
>>>> +1. Foreign mapping of the store ring
>>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>> +
>>>> +This is obsolete since :xen-cs:`6a2de353a9` (2012) which allocated
>>>> grant
>>>> +entries instead, to allow xenstored to function as a stub-domain
>>>> without dom0
>>>> +permissions.  :xen-cs:`38eeb3864d` dropped foreign mapping for
>>>> cxenstored.
>>>> +However, there are no OCaml bindings for libxengnttab.
>>>> +
>>>> +Work Items:
>>>> +
>>>> +* Minimal ``tools/ocaml/libs/xg/`` binding for
>>>> ``tools/libs/gnttab/``.
>>>> +* Replicate :xen-cs:`38eeb3864d` for oxenstored as well.
>>>> +
>>>> +
>>>> +2. Figuring out which domain(s) have gone away
>>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>> +
>>>> +Currently, the handling of domains is asymmetric.
>>>> +
>>>> +* When a domain is created, the toolstack explicitly sends an
>>>> +  ``XS_INTRODUCE(domid, store mfn, store evtchn)`` message to
>>>> xenstored, to
>>>> +  cause xenstored to connect to the guest ring, and fire the
>>>> +  ``@introduceDomain`` watch.
>>>> +* When a domain is destroyed, Xen fires ``VIRQ_DOM_EXC`` which is
>>>> bound by
>>>> +  xenstored, rather than the toolstack.  xenstored updates its idea
>>>> of the
>>>> +  status of domains, and fires the ``@releaseDomain`` watch.
>>>> +
>>>> +Xenstored uses ``xc_domain_getinfo()``, to work out which domain(s)
>>>> have gone
>>>> +away, and only cares about the shutdown status.
>>>> +
>>>> +Furthermore, ``@releaseDomain`` (like ``VIRQ_DOM_EXC``) is a
>>>> single-bit
>>>> +message, which requires all listeners to evaluate whether the
>>>> message applies
>>>> +to them or not.  This results in a flurry of ``xc_domain_getinfo()``
>>>> calls
>>>> +from multiple entities in the system, which all serialise on the
>>>> domctl lock
>>>> +in Xen.
>>>> +
>>>> +Work Items:
>>>> +
>>>> +* Figure out how shutdown status can be expressed in a stable way
>>>> from Xen.
>>>> +* Figure out if ``VIRQ_DOM_EXC`` and ``@releaseDomain`` can be
>>>> extended to
>>>> +  carry at least a domid, to make domain shutdown scale better.
>>>
>>> @releaseDomain (and @introduceDomain) can't be extended, we'd need to
>>> add another watch path like @domainStatus/<domid>/<newState>. Xenstored
>>> could advertise its capability to raise this watch in /tool/xenstored.
>>
>> I guess I was being a bit fast and loose with terminology.  I didn't
>> intend to imply "literally modify @{introduce,release}Domain", as they
>> are already fixed ABIs, but more to "compatibly build something which is
>> better".
>
> Okay.
>
>>
>> That scheme would work for improved @releaseDomain, but it wouldn't work
>> for an improved introduce.  Introduce needs a single key to watch on,
>> which hands back the domid so you don't need to go searching for it.
>
> Yes, and? Its perfectly fine to set a watch firing if anything below
> @domainStatus is changing.

Hmm - that might work if no other information was put into domainStatus,
but would quickly become a scalability problem otherwise.

>
>>
>>>
>>> As VIRQ_DOM_EXC is just an event I don't see how the domid could be
>>> passed by it. I guess we'd need e.g. a shared memory area which the
>>> domain registered for VIRQ_DOM_EXC could map and which would contain a
>>> bitmap (one bit per domain). The hypervisor would set the bit on a
>>> status change and fire VIRQ_DOM_EXC, xenstored would look for a set
>>> bit, clear it and read the status of the related domain.
>>
>> The point here is to avoid using xc_domain_getinfo() in the first place,
>> so there needs to be no "getting the status of the domain".
>
> I'd guess a single xc_domain_getinfo() in the tools wouldn't be so
> problematic. The caller would know the domid already, so no need to
> query all domains.

Its still a problem when you've got 1000 Qemu's, they all get
@releaseDomain, and try to figure out if it is their own domain which
went away.

This shouldn't require taking the domctl lock in Xen 1000 times to
figure out, seeing as xenstored knows exactly which domain actually went
away.

>
>> DOM_EXC is fired for domain_shutdown() only (but for reasons which
>> escape me, fired twice).  Given that a domid is a 15 bit number, a
>> bitmap of all domains does fit within a single 4k page.
>
> Firing twice is needed: first time for disconnecting all backends
> and the second time for cleaning up when the domain is completely
> gone.

Do we really have pieces of code which count the the @releaseDomain's
for a specific domain?  Please say no...

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

  reply	other threads:[~2019-07-23 19:06 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-22 19:20 [Xen-devel] [PATCH] docs/sphinx: todo/wishlist Andrew Cooper
2019-07-23  4:36 ` Juergen Gross
2019-07-23 13:38   ` Andrew Cooper
2019-07-23 15:30     ` Juergen Gross
2019-07-23 19:05       ` Andrew Cooper [this message]
2019-07-24  5:04         ` Juergen Gross
2019-07-23 13:53 ` Andrew Cooper
2019-09-09 11:02   ` Ian Jackson
2019-09-09 11:56     ` Andrew Cooper
2019-07-29  3:15 ` Rich Persaud
2019-09-06  7:48 ` Lars Kurth

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8092088a-ebd7-ac32-5f48-a411f88628d8@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=George.Dunlap@eu.citrix.com \
    --cc=ian.jackson@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=jgross@suse.com \
    --cc=julien.grall@arm.com \
    --cc=lars.kurth@citrix.com \
    --cc=paul.durrant@citrix.com \
    --cc=roger.pau@citrix.com \
    --cc=sstabellini@kernel.org \
    --cc=tim@xen.org \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.