From: George Dunlap <george.dunlap@eu.citrix.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
	Ian Jackson <ian.jackson@eu.citrix.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: RFC: Automatically making a PCI device assignable in the config file
Date: Tue, 9 Jul 2013 17:38:08 +0100
Message-ID: <51DC3C70.2010605@eu.citrix.com>
In-Reply-To: <20130709142527.GD24897@phenom.dumpdata.com>

On 07/09/2013 03:25 PM, Konrad Rzeszutek Wilk wrote:
> On Tue, Jul 09, 2013 at 01:52:38PM +0100, George Dunlap wrote:
>> On 07/08/2013 08:23 PM, Konrad Rzeszutek Wilk wrote:
>>> On Fri, Jul 05, 2013 at 02:52:08PM +0100, George Dunlap wrote:
>>>> On 05/07/13 14:48, Andrew Cooper wrote:
>>>>> On 05/07/13 14:45, George Dunlap wrote:
>>>>>> On 05/07/13 14:39, Andrew Cooper wrote:
>>>>>>> On 05/07/13 12:01, George Dunlap wrote:
>>>>>>>> I've been doing some work to try to make driver domains easier to set
>>>>>>>> up and use.  At the moment, in order to pass a device through to a
>>>>>>>> guest, you first need to assign it to pciback.  This involves doing
>>>>>>>> one of three things:
>>>>>>>> * Running xl pci-assignable-add for the device
>>>>>>>> * Specifying the device to be grabbed on the dom0 Linux command-line
>>>>>>>> * Doing some hackery in /etc/modules.d
>>>>>>>>
>>>>>>>> None of these are very satisfying.  What I think would be better is if
>>>>>>>> there was a way to specify in the guest config file, "If device X is
>>>>>>>> not assignable, try to make it assignable".  That way you can have a
>>>>>>>> driver domain grab the appropriate device just by running "xl create
>>>>>>>> domnet"; and once we have the xendomains script up and running with
>>>>>>>> xl, you can simply configure your domnet appropriately, and then put
>>>>>>>> it in /etc/xen/auto, to be started automatically on boot.
>>>>>>>>
>>>>>>>> My initial idea was to add a parameter to the pci argument in the
>>>>>>>> config file; for example:
>>>>>>>>
>>>>>>>> pci = ['08:04.1,permissive=1,seize=1']
>>>>>>>>
>>>>>>>> The 'seize=1' would indicate that if bdf 08:04.1 is not already
>>>>>>>> assignable, that xl should try to make it assignable.
>>>>>>>>
>>>>>>>> The problem here is that this would need to be parsed by
>>>>>>>> xlu_pci_parse_bdf(), which only takes an argument of type
>>>>>>>> libxl_device_pci.
>>>>>>>>
>>>>>>>> Now it seems to me that the right place to do this "seizing" is in xl,
>>>>>>>> not inside libxl -- the functions for doing assignment exist already,
>>>>>>>> and are simple and straightforward.  But doing it in xl, but as a
>>>>>>>> parameter of the "pci" setting, means changing xlu_pci_parse_bdf() to
>>>>>>>> pass something else back, which begins to get awkward.
>>>>>>>>
>>>>>>>> So it seems to me we have a couple of options:
>>>>>>>> 1. Create a new argument, "pci_seize" or something like that, which
>>>>>>>> would be processed separately from pci
>>>>>>>> 2. Change xlu_pci_parse_bdf to take a pointer to an extra struct, for
>>>>>>>> arguments directed at xl rather than libxl
>>>>>>>> 3. Add "seize" to libxl_device_pci, but have it only used by xl
>>>>>>>> 4. Add "seize" to libxl_device_pci, and have libxl do the seizing.
>>>>>>>>
>>>>>>>> Any preference -- or any other ideas?
>>>>>>>>
>>>>>>>>    -George
>>>>>>> How about a setting in xl.conf of "auto-seize pci devices" ?  That way
>>>>>>> the seizing is entirely part of xl
>>>>>> Auto-seizing is fairly dangerous; you could easily accidentally yank
>>>>>> out the ethernet card, or even the disk that dom0 is using.  I really
>>>>>> think it should have to be enabled on a device-by-device basis.
>>>>>>
>>>>>> I suppose another option would be to be able to set, in xl.conf, a
>>>>>> list of auto-seizeable devices.  I don't really like that option as
>>>>>> well, though.  I'd rather be able to keep all the configuration in one
>>>>>> place.
>>>>>>
>>>>>>   -George
>>>>> Or a slightly less extreme version.
>>>>>
>>>>> If xl sees that it would need to seize a device, it could ask "You are
>>>>> trying to create a domain with device $FOO.  Would you like to seize it
>>>>> from dom0 ?"
>>>>
>>>> That won't work for driver domains, as we want it all to happen
>>>> automatically when the host is booting. :-)
>>>
>>> The high-level goal is that we want to put the network devices with a
>>> network backend and storage devices with storage backend. Ignoring
>>> that for network devices you might want separate backends for each
>>> device (say one backend for Wireless, one for Ethernet, etc).
>>>
>>> Perhaps the logic ought to do grouping - so you say:
>>>   a) "backends:all-network" (which would create one backend with all of the
>>>     wireless, ethernet, etc PCI devices), or
>>>   b) "backends:all-network,seperate-storage", which would create one backend with
>>>    all of the wireless, ethernet, etc. devices, and one backend domain for each
>>>    storage device?
>>>
>>> Naturally the user gets to choose which grouping they would like?
>>
>> We seem to be talking about different things.  You seem to be
>> talking about automatically starting some pre-made VMs and assigning
>> devices and backends to them?  But I'm not really sure.
>
> I am trying to look at it from a high perspective to see whether we can
> make this automated for 99% of people out of the box. Hence the
> idea of grouping. And yes to '..assigning devices and backends to them'.
>>
>> I was assuming that the user was going to be installing and
>> configuring their own driver domains.  The user already has to
>> specify "pci=['$BDF']" in their config file to get specific devices
>> passed through -- this would just be making it easy to have the
>> device assigned to pciback as well.
>
> I think the technical bits of what libxl is doing and yanking devices
> around is driven either by the admin or a policy. If the policy
> is this idea of grouping (that is a terrible name now that I think
> of it), then perhaps we should think how to make that work and then
> the details (such as this automatic yanking of devices to pci-back)
> can be filled in.
>
>
>>
>> I suspect that a lot of people will want to have one network card
>> assigned to domain 0 as a "management network", and only have other
>> devices assigned to driver domains.  I think that having one device
>> per domain is probably the best recommendation; although we
>> obviously want to support someone who wants a single "manage all the
>> devices" domain, we should assume that people are going to have one
>> device per driver domain.
>
> I don't know. My feeble idea was that we would have at minimum _two_
> guests on bootup. One is a control one that has no devices - but is
> the one that launches the guests.
>
> Then there is the dom1 which would have all (or some) of the storage
> and network devices plugged in along with the backends. Then a dom2
> which would be the old-style-dom0 - so it would have the graphic card
> and the rest of the PCI devices.
>
> In other words, when I boot I would have two tiny domains launch
> right before "old-style-dom0" is started. But I am getting into specifics
> here.
>
> Perhaps you could explain to me how you envisioned how the device
> driver domains idea would work? How would you want it to work on your
> laptop?
>
> Or are we right now just thinking of the small pieces of making the
> code be able to yank the devices around and assign them?

I was thinking for now of just making the "manually configure it" case
easier.  I decided to switch one of my test boxen to using a network 
driver domain by default, and although the core is there, there are a 
bunch of things that are unnecessarily crufty.
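
As a rough illustration of the 'seize=1' idea quoted above (a sketch only, in Python rather than the C used by xl and libxl): one way to keep xl-only settings out of libxl_device_pci is to split them off before the rest of the spec is parsed. The helper name split_pci_spec and the xl_only tuple are hypothetical; nothing with these names is claimed to exist in the tree.

```python
def split_pci_spec(spec, xl_only=("seize",)):
    """Split a pci spec such as '08:04.1,permissive=1,seize=1'.

    Returns (bdf, libxl_opts, xl_opts). xl_opts holds the keys that
    only the toolstack front end would act on (a hypothetical split).
    """
    bdf, *fields = [f.strip() for f in spec.split(",")]
    libxl_opts, xl_opts = {}, {}
    for field in fields:
        key, sep, value = field.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {field!r}")
        # Route each option to whichever layer is assumed to consume it.
        target = xl_opts if key in xl_only else libxl_opts
        target[key] = value
    return bdf, libxl_opts, xl_opts


bdf, libxl_opts, xl_opts = split_pci_spec("08:04.1,permissive=1,seize=1")
print(bdf, libxl_opts, xl_opts)
```

This corresponds loosely to option 2 in the original list: the caller gets the xl-directed arguments back separately and can decide whether to make the device assignable before handing the rest on.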

I do agree that long term it would be nice to make it easy to make 
driver domains the default, but that's not what I had in mind for this 
conversation. :-)

The hard part for making it really automated, it seems to me, comes from
two things.

One, you have to make sure your driver domain has the appropriate 
hardware drivers for your system as well.  We don't want to be in the 
business of maintaining a distro; most people will probably want the 
driver domain to be from the same distro they're using for dom0, which 
means that setting up such a domain will need to be done differently on 
a distro-by-distro basis.

Two, you have the configuration problem.  In Debian, for instance, if 
you wanted to switch a device from being owned by dom0 to being in a 
driver domain, you'd have to:
* Copy over the udev rules recognizing the MAC address, so it got the
same ethN
* Copy over the eth and bridge info from dom0's /etc/network/interfaces
into the guest /etc/network/interfaces
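
To give a feel for the second step, here is a minimal Python sketch that pulls the stanzas for selected interfaces out of an interfaces(5)-style file so they can be pasted into the guest. It is an assumption-laden illustration, not an existing tool: extract_stanzas is a hypothetical helper, the sample file is made up, and the parsing is deliberately simplified (comments and blank lines are dropped, and a stanza is matched only by the names on its first line).

```python
STANZA_WORDS = ("auto", "iface", "mapping", "source")


def extract_stanzas(text, names):
    """Return the stanzas of an interfaces(5)-style file whose first
    line names one of the given interfaces (e.g. 'eth1', 'br0')."""
    stanzas, current = [], None
    for line in text.splitlines():
        words = line.split()
        if words and (words[0] in STANZA_WORDS or words[0].startswith("allow-")):
            # A stanza keyword starts a new stanza.
            current = [line]
            stanzas.append(current)
        elif current is not None and words and not words[0].startswith("#"):
            # Option lines belong to the stanza opened most recently.
            current.append(line)
    wanted = set(names)
    kept = [s for s in stanzas if wanted & set(s[0].split()[1:])]
    return "\n".join("\n".join(s) for s in kept)


# Made-up dom0 configuration, for illustration only.
DOM0 = """\
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp

auto br0
iface br0 inet dhcp
    bridge_ports eth1
"""

print(extract_stanzas(DOM0, ["eth1", "br0"]))
```

The udev rule for the MAC address would still have to be copied separately, and the matching stanzas removed from dom0 by hand.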

I'm not sure exactly what you have to do in Fedora, but I bet it's 
something similar.

It might be nice to work with distros to make the process of making 
driver domains / stub domains easier, and to make it easy to configure 
driver domain networking options from the distro's network scripts; but 
that's kind of another level of functionality.

I think first things first: make manually-set-up driver domains actually 
easy to use.

  -George
