KVM Archive on lore.kernel.org
 help / color / Atom feed
From: Alex Williamson <alex.williamson@redhat.com>
To: Cornelia Huck <cohuck@redhat.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>,
	"Sylvain Bauza" <sbauza@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"Libvirt Devel" <libvir-list@redhat.com>,
	"Kirti Wankhede" <kwankhede@nvidia.com>,
	"Erik Skultety" <eskultet@redhat.com>,
	"Pavel Hrdina" <phrdina@redhat.com>
Subject: Re: mdevctl: A shoestring mediated device management and persistence utility
Date: Wed, 19 Jun 2019 12:46:33 -0600
Message-ID: <20190619124633.1c573484@x1.home> (raw)
In-Reply-To: <20190619114659.1f20c773.cohuck@redhat.com>

On Wed, 19 Jun 2019 11:46:59 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> On Wed, 19 Jun 2019 08:28:02 +0100
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> > On Tue, Jun 18, 2019 at 04:12:10PM -0600, Alex Williamson wrote:  
> > > On Tue, 18 Jun 2019 14:48:11 +0200
> > > Sylvain Bauza <sbauza@redhat.com> wrote:
> > >     
> > > > On Tue, Jun 18, 2019 at 1:01 PM Cornelia Huck <cohuck@redhat.com> wrote:  
> 
> > > > > I think we need to reach consensus about the actual scope of the
> > > > > mdevctl tool.
> > > > >
> > > > >      
> > > > Thanks Cornelia, my thoughts:
> > > > 
> > > > - Is it supposed to be responsible for managing *all* mdev devices in    
> > > > >   the system, or is it more supposed to be a convenience helper for
> > > > >   users/software wanting to manage mdevs?
> > > > >      
> > > > 
> > > > The latter. If an operator (or some software) wants to create mdevs by not
> > > > using mdevctl (and rather directly calling the sysfs), I think it's OK.
> > > > That said, mdevs created by mdevctl would be supported by systemctl, while
> > > > the others not but I think it's okay.    
> > > 
> > > I agree (sort of), and I'm hearing that we should drop any sort of
> > > automatic persistence of mdevs created outside of mdevctl.  The problem
> > > comes when we try to draw the line between unmanaged and manged
> > > devices.  For instance, if we have a command to list mdevs it would
> > > feel incomplete if it didn't list all mdevs both those managed by
> > > mdevctl and those created elsewhere.  For managed devices, I expect
> > > we'll also have commands that allow the mode of the device to be
> > > switched between transient, saved, and persistent.  Should a user then  
> 
> Hm, what's the difference between 'saved' and 'persistent'? That
> 'saved' devices are not necessarily present?

It seems like we're coming up with the following classes:

1) transient
  a) mdevctl created
  b) foreign
2) defined
  a) automatic start-up
  b) manual start-up

I was using persistent for 2b), but that's probably not a good name
because devices can still be stopped, so they're not really
persistently available even in this class.

mdevctl today only has defined devices, transient needs to be
implemented, which should lead to a conclusion on whether 1a) and 1b)
are really the same.

> > > be allowed to promote an unmanaged device to one of these modes via the
> > > same command?  Should they be allowed to stop an unmanaged device
> > > through driverctl?  Through systemctl?  These all seem like reasonable
> > > things to do, so what then is the difference between transient and
> > > unmanaged mdev and is mdevctl therefore managing all mdevs, not just
> > > those it has created?    
> > 
> > To my mind there shouldn't really need to be a difference between
> > transient mdevs created by mdevctrl and mdevs created by an user
> > directly using sysfs. Both are mdevs on the running system with
> > no config file that you have to enumerate by looking at sysfs.
> > This ties back to my belief that we shouldn't need to have any
> > config on disk for a transient mdev, just discover them all
> > dynamically when required.  
> 
> So mdevctl can potentially interact with any mdev device on the system,
> it just has to be instructed by a user or software to do so? I think we
> can work with that.

Some TBDs around systemd/init support for transient devices and how
transient devices can be promoted to defined.  For instance if a
vfio-ap device requires matrix programming after instantiation, can we
glean that programming from sysfs or is there metadata irrecoverably
lost if no config file is created for a transient device?  This would
also imply that a 1b) foreign device could not be promoted to 2x)
defined device.

> > > > - Do we want mdevctl to manage config files for individual mdevs, or    
> > > > >   are they supposed to be in a common format that can also be managed
> > > > >   by e.g. libvirt?
> > > > >      
> > > > 
> > > > Unless I misunderstand, I think mdevctl just helps to create mdevs for
> > > > being used by guests created either by libvirt or QEMU or even others.
> > > > How a guest would allocate a mdev (ie. saying "I'll use this specific mdev
> > > > UUID") is IMHO not something for mdevctl.    
> > > 
> > > Right, mdevctl isn't concerned with how a specific mdev is used, but I
> > > think what Connie is after is more the proposal from Daniel where
> > > libvirt can essentially manage mdevctl config files itself and then
> > > only invoke mdevctl for the dirty work of creating and deleting
> > > devices.  In fact, assuming systemd, libvirt could avoid direct
> > > interaction with mdevctl entirely, instead using systemctl device units
> > > to start and stop the mdevs.  Maybe where that proposal takes a turn is
> > > when we again consider non-systemd hosts, where maybe mdevctl needs to
> > > write out an init script per mdev and libvirt injecting itself into
> > > manipulation of the config files would either need to perform the same
> > > or fall back to mdevctl.  Unfortunately there seems to be an ultimatum
> > > to either condone external config file manipulation or expand the scope
> > > of the project into becoming a library.    
> > 
> > Is mdevctl really tackling a problem that is complex enough that we
> > will gain significantly by keeping the config files private and forcing
> > use of a CLI or Library to access them ?  The amount of information
> > that we need to store per mdev looks pretty small, with minimal
> > compound structure. To me it feels like we can easily define a standard
> > config format without suffering any serious long term pain, as the chances
> > we'd need to radically change it look minimal.  
> 
> We probably can get some consensus on a file format pretty quickly, I
> guess; and I agree that there's probably not enough magic in there to
> make future changes painful. So yes, let's try to agree on a format
> that various entities can consume.
> 
> >   
> > > > - Should mdevctl be a stand-alone tool, provide library functions, or    
> > > > >   both? Related: should it keep any internal state that is not written
> > > > >   to disk? (I think that also plays into the transient vs. persistent
> > > > >   question.)    
> > > 
> > > I don't think we want an mdevctld, if that's what you mean by internal
> > > state not written to disk.  I think we ideally want all state in the
> > > mdev config files or discerned through sysfs.  How we handle
> > > non-systemd hosts may throw a wrench in that though since currently the
> > > systemd integration relies on a template to support arbitrary mdevs and
> > > I'm not sure how to replicate that in other init services.  If we need
> > > to dynamically manage per mdev init files in addition to config files,
> > > we're not so self contained.    
> > 
> > The most important part of the init script integration is just the bulk
> > creation of mdevs on startup. I think this could be handled on non-systemd
> > hosts via a fairly dumb init script that does this something approximating
> > this:
> > 
> >     for dev in `mdevctl list`
> >     do
> >         mdevctrl get-autostart $dev
> > 	test $? = 0 && mdevctrl start $dev
> >     done
> > 
> > ie, iterate over all configs. If the config is marked to autostart,
> > then start it.  
> 
> How likely is a non-systemd system to not have udev? Can we rely on it
> for dynamic management? (Any system will likely have _some_ kind of
> uevent-consuming handlers, I guess, unless it is a very specialized
> system where mdevctl is unlikely to be used at all.)

Yep, I'd be uncomfortable with anything that only processed parent
devices once at boot time, we need to expect mdev parent devices (or at
least their support of mdev within that parent driver) to be dynamic.
Churning through devices at a fixed point in boot also only provides the
persistence, not the $INITSYSTEM based management where there seem to
be multiple affirmations that systemctl base management is a good
thing.  How much do we *really* care about non-systemd hosts?  Or
maybe, how much do we really care about feature parity for non-systemd
hosts and could we shrug off anything beyond basic persistence and
mdevctl-based management?

> > > > FWIW, I'd love using mdevctl for OpenStack (Nova) just at least for
> > > > creating persisted mdevs (ie. mdevs that would be recreated after rebooting
> > > > using systemctl). That's the real use case I need.
> > > > Whether libvirt would internally support mdevctl would be nice but that's
> > > > not really something Nova needs, so I leave others providing their own
> > > > thoughts.
> > > > 
> > > > 
> > > > My personal opinion is that mdevctl should be able to tolerate mdevs    
> > > > > being configured by other means, but probably should not try to impose
> > > > > its own configuration if it detects that (unless explicitly asked to do
> > > > > so). Not sure how feasible that goal is.
> > > > >
> > > > > That's what I misunderstand : in order to have a guest using a vGPU, you      
> > > > need to do two things :
> > > > 1/ create the mdev
> > > > 2/ allocate this created dev to a specific guest config
> > > > 
> > > > Of course, we could imagine a way to have both steps to be done directly by
> > > > libvirt, but from my opinion, mdevctl is really helping 1/ and not 2/.    
> 
> I don't think mdevctl should care about mdev-to-guest assignments; that
> should be done by libvirt, OpenStack, whatever.
> 
> > > 
> > > Yep, we also don't want to presume libvirt is the only consumer here.
> > > mdevctl should also support other VM management tools, users who write
> > > their own management scripts, and even non-VM related use cases.
> > >     
> > > > > A well-defined config file format is probably a win, even if it only
> > > > > ends up being used by mdevctl itself.    
> > > 
> > > Yes, regardless of whether others touch them, conversion scripts on
> > > upgrade should be avoided.  Do we need something beyond a key=value
> > > file?  So far we're only storing the mdev type and startup mode, but
> > > vfio-ap clearly needs more, apparently key=value1,value2,... type
> > > representation.  Still, I think I'd prefer simple over jumping to xml
> > > or json or yaml.  Thanks,    
> > 
> > For libvirt our preference would be something we can easily support
> > without having to write new parsers. I'm not going to suggest XML,
> > but JSON is probably our highest preference. If not then a simple
> > flat file with one line of   key="value"  per setting is something
> > we already parse for /etc/libvirt/libvirtd.conf file. For slightly
> > more structure the .ini style file is also good. That's basically
> > just flat  key=value  pairs, but with [section] headers so you can
> > represent some level of structured data.  
> 
> It would be nice if a simple .ini style would cover all things, but I'm
> not sure whether vfio-ap isn't already a bit awkward to support with
> that. I don't mind JSON much, but would like to avoid XML :)

I think the vfio-ap issue is simply that we have multiple values for
the same key, if we choose our key to be sysfs attributes, but we can
either redefine our key or the value or the config format to resolve
this.  For instance we can add a sequence number to the key such that
we maintain a 1:1/key:value and interpret the key accordingly.  We can
also escape the value into an array and apply each value in sequence
to the key.  Does something in JSON provide an obviously better
solution to either of these?  Thanks,

Alex

  reply index

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-23 23:20 Alex Williamson
2019-05-24 10:11 ` Cornelia Huck
2019-05-24 14:43   ` Alex Williamson
2019-06-07 16:06   ` Halil Pasic
2019-06-11 19:45     ` Cornelia Huck
2019-06-11 20:28       ` Alex Williamson
2019-06-12  7:14         ` Cornelia Huck
2019-06-12 15:54           ` Halil Pasic
2019-06-13 10:02             ` Cornelia Huck
2019-06-12 15:07       ` Halil Pasic
2019-06-13 16:17 ` Christophe de Dinechin
2019-06-13 16:35   ` Alex Williamson
2019-06-14  9:54     ` [libvirt] " Christophe de Dinechin
2019-06-14 14:23       ` Alex Williamson
2019-06-14 15:06         ` Christophe de Dinechin
2019-06-14 16:04           ` Alex Williamson
2019-06-17 16:03             ` Cornelia Huck
2019-06-17 14:00 ` Daniel P. Berrangé
2019-06-17 14:54   ` Alex Williamson
2019-06-17 15:10     ` Daniel P. Berrangé
2019-06-17 17:05       ` Alex Williamson
2019-06-18  8:44         ` Daniel P. Berrangé
2019-06-18 11:01         ` Cornelia Huck
     [not found]           ` <CALOCmukPWiXiM+mN0hCTvSwfdHy5UdERU8WnvOXiBrMQ9tH3VA@mail.gmail.com>
2019-06-18 22:12             ` Alex Williamson
2019-06-19  7:28               ` Daniel P. Berrangé
2019-06-19  9:46                 ` Cornelia Huck
2019-06-19 18:46                   ` Alex Williamson [this message]
2019-06-20  8:24                     ` Daniel P. Berrangé
     [not found]               ` <CALOCmu=6Xmw-_-SVXujCEcgPY2CQiBQKgfUMJ45WnZ_9XORyUw@mail.gmail.com>
2019-06-19  9:57                 ` Cornelia Huck
2019-06-19 19:53                 ` Alex Williamson
2019-06-25 22:52 ` Alex Williamson
2019-06-26  9:58   ` Cornelia Huck
2019-06-26 14:37     ` Alex Williamson
2019-06-27  1:53       ` Alex Williamson
2019-06-27 12:26         ` Cornelia Huck
2019-06-27 15:00           ` Matthew Rosato
2019-06-27 15:38             ` Alex Williamson
2019-06-27 16:13               ` Matthew Rosato
2019-06-27 21:15               ` Alex Williamson
2019-06-28  1:57                 ` Alex Williamson
2019-06-28  9:06                   ` Cornelia Huck
2019-06-28 14:01                     ` Matthew Rosato
2019-06-28 17:05                     ` Alex Williamson
2019-07-01  8:20                       ` Cornelia Huck
2019-07-01 14:40                         ` Alex Williamson
2019-07-01 17:13                           ` Cornelia Huck

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190619124633.1c573484@x1.home \
    --to=alex.williamson@redhat.com \
    --cc=berrange@redhat.com \
    --cc=cohuck@redhat.com \
    --cc=eskultet@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=libvir-list@redhat.com \
    --cc=phrdina@redhat.com \
    --cc=sbauza@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git