From mboxrd@z Thu Jan 1 00:00:00 1970 From: Wei Liu Subject: Re: [RFC][PATCH 01/13] tools: introduce some new parameters to set rdm policy Date: Mon, 18 May 2015 20:17:43 +0100 Message-ID: <20150518191743.GJ9503@zion.uk.xensource.com> References: <1428657724-3498-1-git-send-email-tiejun.chen@intel.com> <1428657724-3498-2-git-send-email-tiejun.chen@intel.com> <20150508130429.GL3848@zion.uk.xensource.com> <55503F8A.5070709@intel.com> <20150511145432.GD30997@zion.uk.xensource.com> <55555157.3000604@intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Content-Disposition: inline In-Reply-To: <55555157.3000604@intel.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: "Chen, Tiejun" Cc: kevin.tian@intel.com, Wei Liu , ian.campbell@citrix.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, stefano.stabellini@citrix.com, JBeulich@suse.com, yang.z.zhang@intel.com, Ian.Jackson@eu.citrix.com List-Id: xen-devel@lists.xenproject.org On Fri, May 15, 2015 at 09:52:23AM +0800, Chen, Tiejun wrote: [...] > > > ... > > B has the form C<[KEY=VALUE,KEY=VALUE,...> where: > > =over 4 > > =item B > > Possible Bs are: > > =over 4 > > =item B > > Currently we just have one type. "host" means all reserved device memory on > this platform should be reserved in this VM's pfn space. > > =over 4 > > =item B > ... > Yes, something like this. > > > > >>> [...] > >>>>index 9af0e99..d7434d6 100644 > >>>>--- a/docs/misc/vtd.txt > >>>>+++ b/docs/misc/vtd.txt > >>>>@@ -111,6 +111,40 @@ in the config file: > >>>> To override for a specific device: > >>>> pci = [ '01:00.0,msitranslate=0', '03:00.0' ] > >>>> > >>>>+RDM, 'reserved device memory', for PCI Device Passthrough > >>>>+--------------------------------------------------------- > >>>>+ > >>>>+There are some devices the BIOS controls, for e.g. USB devices to perform > >>>>+PS2 emulation. The regions of memory used for these devices are marked > >>>>+reserved in the e820 map. When we turn on DMA translation, DMA to those > >>>>+regions will fail. Hence BIOS uses RMRR to specify these regions along with > >>>>+devices that need to access these regions. OS is expected to setup > >>>>+identity mappings for these regions for these devices to access these regions. > >>>>+ > >>>>+While creating a VM we should reserve them in advance, and avoid any conflicts. > >>>>+So we introduce user configurable parameters to specify RDM resource and > >>>>+according policies, > >>>>+ > >>>>+To enable this globally, add "rdm" in the config file: > >>>>+ > >>>>+ rdm = [ 'host, reserve=force/try' ] > >>>>+ > >>> > >>>The "force/try" should be called "policy". And then you explain what > >>>policies we have. > >> > >>Do you mean we should rename this? > >> > >>rdm = [ 'host, policy=force/try' ] > >> > > > >No, I didn't ask you to rename that. > > > >The above line is an example which should reflect the correct syntax. > >"force/try" is not the *actual syntax*, i.e. you won't write that in > >your config file. > > > >I meant to changes it to "reserve=POLICY". Then you explain the possible > >values of POLICY. > > > > Understood so what about this, > > To enable this globally, add "rdm" in the config file: > > rdm = [ 'host, reserve=' ] > OK, so this is a specific example in vtd.txt. Last time I misread it as part of the manpage. I think you meant in this specific example (with other suggestions incorporated): rdm = "type=host, reserve=force" Then you point user to xl.cfg manpage. > Or just for a specific device: > > pci = [ '01:00.0,rdm_reserve=', '03:00.0' ] > Same here. Just don't write "force/try" or "strcit/relax" because that's not the exact syntax you would use in a real config file. > Global RDM parameter allows user to specify reserved regions explicitly. > Using "host" to include all reserved regions reported on this platform > which is good to handle hotplug scenario. In the future this parameter > may be further extended to allow specifying random regions, e.g. even > those belonging to another platform as a preparation for live migration > with passthrough devices. > > Currently "POLICY" includes two options, "strict" and "relaxed". It decides > how to handle conflict when reserving RDM regions in pfn space. If conflict > ... > > >>This is really a policy but 'reserve' may can reflect our action explicitly, > >>right? > >> > >>> > >>>>+Or just for a specific device: > >>>>+ > >>>>+ pci = [ '01:00.0,rdm_reserve=force/try', '03:00.0' ] > >> > >>And you also can see this. > >> > >>But anyway, if you're really stick to rename this, I'm going to be fine as > >>well but we should ping every one to check this point since this name is > >>from our previous discussion. > >> > >>>>+ > >>>>+Global RDM parameter allows user to specify reserved regions explicitly. > >>>>+Using 'host' to include all reserved regions reported on this platform > >>>>+which is good to handle hotplug scenario. In the future this parameter > >>>>+may be further extended to allow specifying random regions, e.g. even > >>>>+those belonging to another platform as a preparation for live migration > >>>>+with passthrough devices. > >>>>+ > >>>>+'force/try' policy decides how to handle conflict when reserving RDM > >>>>+regions in pfn space. If conflict exists, 'force' means an immediate error > >>>>+so VM will be killed, while 'try' allows moving forward with a warning > >>> > >>>Be killed by whom? I think it's hvmloader crashes voluntarily, right? > >> > >>s/VM will be kille/hvmloader crashes voluntarily > >> > >>> > >>>>+message thrown out. > >>>>+ > >>>> > >>>> Caveat on Conventional PCI Device Passthrough > >>>> --------------------------------------------- > >>>>diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c > >>>>index 98687bd..9ed40d4 100644 > >>>>--- a/tools/libxl/libxl_create.c > >>>>+++ b/tools/libxl/libxl_create.c > >>>>@@ -1407,6 +1407,11 @@ static void domcreate_attach_pci(libxl__egc *egc, libxl__multidev *multidev, > >>>> } > >>>> > >>>> for (i = 0; i < d_config->num_pcidevs; i++) { > >>>>+ /* > >>>>+ * If the rdm global policy is 'force' we should override each device. > >>>>+ */ > >>>>+ if (d_config->b_info.rdm.reserve == LIBXL_RDM_RESERVE_FLAG_FORCE) > >>>>+ d_config->pcidevs[i].rdm_reserve = LIBXL_RDM_RESERVE_FLAG_FORCE; > >>>> ret = libxl__device_pci_add(gc, domid, &d_config->pcidevs[i], 1); > >>>> if (ret < 0) { > >>>> LIBXL__LOG(ctx, LIBXL__LOG_ERROR, > >>>>diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl > >>>>index 47af340..5786455 100644 > >>>>--- a/tools/libxl/libxl_types.idl > >>>>+++ b/tools/libxl/libxl_types.idl > >>>>@@ -71,6 +71,17 @@ libxl_domain_type = Enumeration("domain_type", [ > >>>> (2, "PV"), > >>>> ], init_val = "LIBXL_DOMAIN_TYPE_INVALID") > >>>> > >>>>+libxl_rdm_reserve_type = Enumeration("rdm_reserve_type", [ > >>>>+ (0, "none"), > >>>>+ (1, "host"), > >>>>+ ]) > >>>>+ > >>>>+libxl_rdm_reserve_flag = Enumeration("rdm_reserve_flag", [ > >>>>+ (-1, "invalid"), > >>>>+ (0, "force"), > >>>>+ (1, "try"), > >>>>+ ]) > >>> > >>>If you don't set init_val, the default value would be "force" (0), is this > >> > >>Yes. > >> > >>>want you want? > >> > >>We have a little bit of complexity here, > >> > >>"Default per-device RDM policy is 'force', while default global RDM policy > >>is 'try'. When both policies are specified on a given region, 'force' is > >>always preferred." > >> > > > >This is going to be done in actual code anyway. > > > >This type is used both in global and per-device setting, so I envisage > > Yes. > > >this to have an invalid value to start with. Appropriate default values > > Sounds I should set this, > > +libxl_rdm_reserve_flag = Enumeration("rdm_reserve_flag", [ > + (-1, "invalid"), > + (0, "strict"), > + (1, "relaxed"), > + ], init_val = "LIBXL_RDM_RESERVE_FLAG_INVALID") > + > Yes, and then don't forget to set the value to appropriate value in the _setdefault functions for different types. > > >should be done in libxl_TYPE_setdefault functions. And the logic to > >detect conflict and preferences done in your construct function. > > > >What do you think? > > > >>> > >>>>+ [...] > >>>> pcidev->permissive = atoi(tok); > >>>> }else if ( !strcmp(optkey, "seize") ) { > >>>> pcidev->seize = atoi(tok); > >>>>+ }else if ( !strcmp(optkey, "rdm_reserve") ) { > >>>>+ if ( !strcmp(tok, "force") ) { > >>>>+ pcidev->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_FORCE; > >>>>+ } else if ( !strcmp(tok, "try") ) { > >>>>+ pcidev->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_TRY; > >>>>+ } else { > >>>>+ pcidev->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_FORCE; > >>>>+ XLU__PCI_ERR(cfg, "Unknown PCI RDM property flag value:" > >>>>+ " %s, so goes 'force' by default.", > >>> > >>>If this is not an error, you don't need XLU__PCI_ERR. > >>> > >>>But I would say we should just treat this as an error and > >>>abort/exit/report (whatever the parser should do in this case). > >> > >>In our case we just want to post a message to set a appropriate flag to > >>recover this behavior like we write here, > >> > >> XLU__PCI_ERR(cfg, "Unknown PCI RDM property flag > >>value:" > >> " %s, so goes 'strict' by > >>default.", > >> tok); > > > >I suggest we just abort in this case and not second guess what the admin > >wants. > > Okay, > } else { > XLU__PCI_ERR(cfg, "%s is not an valid PCI RDM > property" > " flag: 'strict' or 'relaxed'.", > tok); > abort(); > No, not calling the "abort" function. I meant returning appropriate error value and let the caller handles this situation. > > > > >> > >>This may just be a warning? But I don't we have this sort of definition, > >>XLU__PCI_WARN, ... > >> > >>So what LOG format can be adopted here? > > > >Feel free to introduce XLU__PCI_WARN if it turns out to be necessary. > > If it goes to abort(), I think XLU__PCI_ERR() should be good. > > > > >> > >>> > >>>>+ tok); > >>>>+ } > >>>> }else{ > >>>> XLU__PCI_ERR(cfg, "Unknown PCI BDF option: %s", optkey); > >>>> } > >>>>@@ -167,6 +180,71 @@ parse_error: > >>>> return ERROR_INVAL; > >>>> } > >>>> > >>>>+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str) > >>>>+{ > >>>>+ unsigned state = STATE_TYPE; > >>>>+ char *buf2, *tok, *ptr, *end; > >>>>+ > >>>>+ if ( NULL == (buf2 = ptr = strdup(str)) ) > >>>>+ return ERROR_NOMEM; > >>>>+ > >>>>+ for(tok = ptr, end = ptr + strlen(ptr) + 1; ptr < end; ptr++) { > >>>>+ switch(state) { > > > >Coding style. I haven't checked what actual style this file uses, but > >there is inconsistency in this function by itself. > > I just refer to xlu_pci_parse_bdf() to generate xlu_rdm_parse(), and they > are in the same file... > > Anyway, I should change this line, > > for ( tok = ptr, end = ptr + strlen(ptr) + 1; ptr < end; ptr++ ) { > for (tok = ptr, end...) switch (state) { > > > >>>>+ case STATE_TYPE: > >>>>+ if ( *ptr == '\0' || *ptr == ',' ) { > >>>>+ state = STATE_CHECK_FLAG; > >>>>+ *ptr = '\0'; [...] > >>>> > >>>>+ /* > >>>>+ * By default our global policy is to query all rdm entries, and > >>>>+ * force reserve them. > >>>>+ */ > >>>>+ b_info->rdm.type = LIBXL_RDM_RESERVE_TYPE_HOST; > >>>>+ b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_TRY; > >>> > >>>This should probably to into the _setdefault function of > >>>libxl_domain_build_info. > >> > >>Sorry, I just see this > >> > >>libxl_domain_build_info_init() > >> | > >> + libxl_rdm_reserve_init(&p->rdm); > >> | > >> + memset(p, '\0', sizeof(*p)); > >> > >>But this should be generated automatically, right? So how to implement your > >>idea? Could you give me a show? > >> > > > >Check libxl_domain_build_info_setdefault. > > > >To use libxl types. You normally do: > > > > libxl_TYPE_init > > libxl_TYPE_setdefault > > > > DO STUFF > > > > libxl_TYPE_dispose > > > >_init and _dispose are auto-generated. _setdefault is not. > > So in our case, maybe we can do this, > > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c > index f0da7dc..461606c 100644 > --- a/tools/libxl/libxl_create.c > +++ b/tools/libxl/libxl_create.c > @@ -100,6 +100,17 @@ static int sched_params_valid(libxl__gc *gc, > return 1; > } > > +void libxl__device_rdm_setdefault(libxl__gc *gc, > + libxl_domain_build_info *b_info) It's not a device. Use libxl__rdm_setdefault. > +{ > + /* > + * By default our global policy is to query all rdm entries, and > + * force reserve them. > + */ > + b_info->rdm.type = LIBXL_RDM_RESERVE_TYPE_HOST; > + b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_STRICT; > +} > + Isn't the global policy "relaxed" (or "try")? At least that's what your old code does. BTW your original code contradicts your original comment. > int libxl__domain_build_info_setdefault(libxl__gc *gc, > libxl_domain_build_info *b_info) > { > @@ -410,6 +421,8 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc, > libxl_domain_type_to_string(b_info->type)); > return ERROR_INVAL; > } > + > + libxl__device_rdm_setdefault(gc, b_info); > return 0; > } > And you also need to modify libxl__device_pci_setdefault. I actually have another question on the interface design. To recap, in your patch: diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index 47af340..5786455 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -71,6 +71,17 @@ libxl_domain_type = Enumeration("domain_type", [ (2, "PV"), ], init_val = "LIBXL_DOMAIN_TYPE_INVALID") +libxl_rdm_reserve_type = Enumeration("rdm_reserve_type", [ + (0, "none"), + (1, "host"), + ]) + +libxl_rdm_reserve_flag = Enumeration("rdm_reserve_flag", [ + (-1, "invalid"), + (0, "force"), + (1, "try"), + ]) + libxl_channel_connection = Enumeration("channel_connection", [ (0, "UNKNOWN"), (1, "PTY"), @@ -356,6 +367,11 @@ libxl_domain_sched_params = Struct("domain_sched_params",[ ("budget", integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}), ]) +libxl_rdm_reserve = Struct("rdm_reserve", [ + ("type", libxl_rdm_reserve_type), + ("reserve", libxl_rdm_reserve_flag), + ]) + libxl_domain_build_info = Struct("domain_build_info",[ ("max_vcpus", integer), ("avail_vcpus", libxl_bitmap), @@ -401,6 +417,7 @@ libxl_domain_build_info = Struct("domain_build_info",[ ("kernel", string), ("cmdline", string), ("ramdisk", string), + ("rdm", libxl_rdm_reserve), ("u", KeyedUnion(None, libxl_domain_type, "type", [("hvm", Struct(None, [("firmware", string), ("bios", libxl_bios_type), @@ -521,6 +538,7 @@ libxl_device_pci = Struct("device_pci", [ ("power_mgmt", bool), ("permissive", bool), ("seize", bool), + ("rdm_reserve", libxl_rdm_reserve_flag), ]) Do you actually need libxl_rdm_reserve type? I.e. do you envisage that structure to change a lot? Can you not just use libxl_rdm_reserve_type and libxl_rdm_reserve_flag in build_info. Wei.