From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Chen, Tiejun" Subject: Re: [RFC][PATCH 01/13] tools: introduce some new parameters to set rdm policy Date: Tue, 19 May 2015 11:16:33 +0800 Message-ID: <555AAB11.3080105@intel.com> References: <1428657724-3498-1-git-send-email-tiejun.chen@intel.com> <1428657724-3498-2-git-send-email-tiejun.chen@intel.com> <20150508130429.GL3848@zion.uk.xensource.com> <55503F8A.5070709@intel.com> <20150511145432.GD30997@zion.uk.xensource.com> <55555157.3000604@intel.com> <20150518191743.GJ9503@zion.uk.xensource.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20150518191743.GJ9503@zion.uk.xensource.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Wei Liu Cc: kevin.tian@intel.com, ian.campbell@citrix.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, stefano.stabellini@citrix.com, JBeulich@suse.com, yang.z.zhang@intel.com, Ian.Jackson@eu.citrix.com List-Id: xen-devel@lists.xenproject.org On 2015/5/19 3:17, Wei Liu wrote: > On Fri, May 15, 2015 at 09:52:23AM +0800, Chen, Tiejun wrote: > [...] >> >> >> ... >> >> B has the form C<[KEY=VALUE,KEY=VALUE,...> where: >> >> =over 4 >> >> =item B >> >> Possible Bs are: >> >> =over 4 >> >> =item B >> >> Currently we just have one type. "host" means all reserved device memory on >> this platform should be reserved in this VM's pfn space. >> >> =over 4 >> >> =item B >> ... >> > > Yes, something like this. > >> >>> >>>>> > [...] >>>>>> index 9af0e99..d7434d6 100644 >>>>>> --- a/docs/misc/vtd.txt >>>>>> +++ b/docs/misc/vtd.txt >>>>>> @@ -111,6 +111,40 @@ in the config file: >>>>>> To override for a specific device: >>>>>> pci = [ '01:00.0,msitranslate=0', '03:00.0' ] >>>>>> >>>>>> +RDM, 'reserved device memory', for PCI Device Passthrough >>>>>> +--------------------------------------------------------- >>>>>> + >>>>>> +There are some devices the BIOS controls, for e.g. USB devices to perform >>>>>> +PS2 emulation. The regions of memory used for these devices are marked >>>>>> +reserved in the e820 map. When we turn on DMA translation, DMA to those >>>>>> +regions will fail. Hence BIOS uses RMRR to specify these regions along with >>>>>> +devices that need to access these regions. OS is expected to setup >>>>>> +identity mappings for these regions for these devices to access these regions. >>>>>> + >>>>>> +While creating a VM we should reserve them in advance, and avoid any conflicts. >>>>>> +So we introduce user configurable parameters to specify RDM resource and >>>>>> +according policies, >>>>>> + >>>>>> +To enable this globally, add "rdm" in the config file: >>>>>> + >>>>>> + rdm = [ 'host, reserve=force/try' ] >>>>>> + >>>>> >>>>> The "force/try" should be called "policy". And then you explain what >>>>> policies we have. >>>> >>>> Do you mean we should rename this? >>>> >>>> rdm = [ 'host, policy=force/try' ] >>>> >>> >>> No, I didn't ask you to rename that. >>> >>> The above line is an example which should reflect the correct syntax. >>> "force/try" is not the *actual syntax*, i.e. you won't write that in >>> your config file. >>> >>> I meant to changes it to "reserve=POLICY". Then you explain the possible >>> values of POLICY. >>> >> >> Understood so what about this, >> >> To enable this globally, add "rdm" in the config file: >> >> rdm = [ 'host, reserve=' ] >> > > OK, so this is a specific example in vtd.txt. Last time I misread it as > part of the manpage. > > I think you meant in this specific example (with other suggestions > incorporated): > > rdm = "type=host, reserve=force" > > Then you point user to xl.cfg manpage. > >> Or just for a specific device: >> >> pci = [ '01:00.0,rdm_reserve=', '03:00.0' ] >> > > Same here. > In order to be compatible with vtd.txt, could this work for you? To enable this globally, add "rdm" in the config file: rdm = [ 'type=host, reserve=' ] (default policy is "relaxed") Or just for a specific device: pci = [ '01:00.0,rdm_reserve=' ] (default policy is "strict") > Just don't write "force/try" or "strcit/relax" because that's not the > exact syntax you would use in a real config file. Yeah. > >> Global RDM parameter allows user to specify reserved regions explicitly. >> Using "host" to include all reserved regions reported on this platform >> which is good to handle hotplug scenario. In the future this parameter >> may be further extended to allow specifying random regions, e.g. even >> those belonging to another platform as a preparation for live migration >> with passthrough devices. >> >> Currently "POLICY" includes two options, "strict" and "relaxed". It decides >> how to handle conflict when reserving RDM regions in pfn space. If conflict >> ... >> >>>> This is really a policy but 'reserve' may can reflect our action explicitly, >>>> right? >>>> >>>>> >>>>>> +Or just for a specific device: >>>>>> + >>>>>> + pci = [ '01:00.0,rdm_reserve=force/try', '03:00.0' ] >>>> >>>> And you also can see this. >>>> >>>> But anyway, if you're really stick to rename this, I'm going to be fine as >>>> well but we should ping every one to check this point since this name is >>> >from our previous discussion. >>>> >>>>>> + >>>>>> +Global RDM parameter allows user to specify reserved regions explicitly. >>>>>> +Using 'host' to include all reserved regions reported on this platform >>>>>> +which is good to handle hotplug scenario. In the future this parameter >>>>>> +may be further extended to allow specifying random regions, e.g. even >>>>>> +those belonging to another platform as a preparation for live migration >>>>>> +with passthrough devices. >>>>>> + >>>>>> +'force/try' policy decides how to handle conflict when reserving RDM >>>>>> +regions in pfn space. If conflict exists, 'force' means an immediate error >>>>>> +so VM will be killed, while 'try' allows moving forward with a warning >>>>> >>>>> Be killed by whom? I think it's hvmloader crashes voluntarily, right? >>>> >>>> s/VM will be kille/hvmloader crashes voluntarily >>>> >>>>> >>>>>> +message thrown out. >>>>>> + >>>>>> >>>>>> Caveat on Conventional PCI Device Passthrough >>>>>> --------------------------------------------- >>>>>> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c >>>>>> index 98687bd..9ed40d4 100644 >>>>>> --- a/tools/libxl/libxl_create.c >>>>>> +++ b/tools/libxl/libxl_create.c >>>>>> @@ -1407,6 +1407,11 @@ static void domcreate_attach_pci(libxl__egc *egc, libxl__multidev *multidev, >>>>>> } >>>>>> >>>>>> for (i = 0; i < d_config->num_pcidevs; i++) { >>>>>> + /* >>>>>> + * If the rdm global policy is 'force' we should override each device. >>>>>> + */ >>>>>> + if (d_config->b_info.rdm.reserve == LIBXL_RDM_RESERVE_FLAG_FORCE) >>>>>> + d_config->pcidevs[i].rdm_reserve = LIBXL_RDM_RESERVE_FLAG_FORCE; >>>>>> ret = libxl__device_pci_add(gc, domid, &d_config->pcidevs[i], 1); >>>>>> if (ret < 0) { >>>>>> LIBXL__LOG(ctx, LIBXL__LOG_ERROR, >>>>>> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl >>>>>> index 47af340..5786455 100644 >>>>>> --- a/tools/libxl/libxl_types.idl >>>>>> +++ b/tools/libxl/libxl_types.idl >>>>>> @@ -71,6 +71,17 @@ libxl_domain_type = Enumeration("domain_type", [ >>>>>> (2, "PV"), >>>>>> ], init_val = "LIBXL_DOMAIN_TYPE_INVALID") >>>>>> >>>>>> +libxl_rdm_reserve_type = Enumeration("rdm_reserve_type", [ >>>>>> + (0, "none"), >>>>>> + (1, "host"), >>>>>> + ]) >>>>>> + >>>>>> +libxl_rdm_reserve_flag = Enumeration("rdm_reserve_flag", [ >>>>>> + (-1, "invalid"), >>>>>> + (0, "force"), >>>>>> + (1, "try"), >>>>>> + ]) >>>>> >>>>> If you don't set init_val, the default value would be "force" (0), is this >>>> >>>> Yes. >>>> >>>>> want you want? >>>> >>>> We have a little bit of complexity here, >>>> >>>> "Default per-device RDM policy is 'force', while default global RDM policy >>>> is 'try'. When both policies are specified on a given region, 'force' is >>>> always preferred." >>>> >>> >>> This is going to be done in actual code anyway. >>> >>> This type is used both in global and per-device setting, so I envisage >> >> Yes. >> >>> this to have an invalid value to start with. Appropriate default values >> >> Sounds I should set this, >> >> +libxl_rdm_reserve_flag = Enumeration("rdm_reserve_flag", [ >> + (-1, "invalid"), >> + (0, "strict"), >> + (1, "relaxed"), >> + ], init_val = "LIBXL_RDM_RESERVE_FLAG_INVALID") >> + >> > > Yes, and then don't forget to set the value to appropriate value in the > _setdefault functions for different types. Currently "type" is not associated to "policy" so we can do this if necessary in the future. > >> >>> should be done in libxl_TYPE_setdefault functions. And the logic to >>> detect conflict and preferences done in your construct function. >>> >>> What do you think? >>> >>>>> >>>>>> + > > [...] > >>>>>> pcidev->permissive = atoi(tok); >>>>>> }else if ( !strcmp(optkey, "seize") ) { >>>>>> pcidev->seize = atoi(tok); >>>>>> + }else if ( !strcmp(optkey, "rdm_reserve") ) { >>>>>> + if ( !strcmp(tok, "force") ) { >>>>>> + pcidev->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_FORCE; >>>>>> + } else if ( !strcmp(tok, "try") ) { >>>>>> + pcidev->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_TRY; >>>>>> + } else { >>>>>> + pcidev->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_FORCE; >>>>>> + XLU__PCI_ERR(cfg, "Unknown PCI RDM property flag value:" >>>>>> + " %s, so goes 'force' by default.", >>>>> >>>>> If this is not an error, you don't need XLU__PCI_ERR. >>>>> >>>>> But I would say we should just treat this as an error and >>>>> abort/exit/report (whatever the parser should do in this case). >>>> >>>> In our case we just want to post a message to set a appropriate flag to >>>> recover this behavior like we write here, >>>> >>>> XLU__PCI_ERR(cfg, "Unknown PCI RDM property flag >>>> value:" >>>> " %s, so goes 'strict' by >>>> default.", >>>> tok); >>> >>> I suggest we just abort in this case and not second guess what the admin >>> wants. >> >> Okay, >> } else { >> XLU__PCI_ERR(cfg, "%s is not an valid PCI RDM >> property" >> " flag: 'strict' or 'relaxed'.", >> tok); >> abort(); >> > > No, not calling the "abort" function. I meant returning appropriate error > value and let the caller handles this situation. Okay, just call "goto parse_error". > >> >>> >>>> >>>> This may just be a warning? But I don't we have this sort of definition, >>>> XLU__PCI_WARN, ... >>>> >>>> So what LOG format can be adopted here? >>> >>> Feel free to introduce XLU__PCI_WARN if it turns out to be necessary. >> >> If it goes to abort(), I think XLU__PCI_ERR() should be good. >> >>> >>>> >>>>> >>>>>> + tok); >>>>>> + } >>>>>> }else{ >>>>>> XLU__PCI_ERR(cfg, "Unknown PCI BDF option: %s", optkey); >>>>>> } >>>>>> @@ -167,6 +180,71 @@ parse_error: >>>>>> return ERROR_INVAL; >>>>>> } >>>>>> >>>>>> +int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str) >>>>>> +{ >>>>>> + unsigned state = STATE_TYPE; >>>>>> + char *buf2, *tok, *ptr, *end; >>>>>> + >>>>>> + if ( NULL == (buf2 = ptr = strdup(str)) ) >>>>>> + return ERROR_NOMEM; >>>>>> + >>>>>> + for(tok = ptr, end = ptr + strlen(ptr) + 1; ptr < end; ptr++) { >>>>>> + switch(state) { >>> >>> Coding style. I haven't checked what actual style this file uses, but >>> there is inconsistency in this function by itself. >> >> I just refer to xlu_pci_parse_bdf() to generate xlu_rdm_parse(), and they >> are in the same file... >> >> Anyway, I should change this line, >> >> for ( tok = ptr, end = ptr + strlen(ptr) + 1; ptr < end; ptr++ ) { >> > > for (tok = ptr, end...) > > switch (state) { But what is the difference to compare the initial code? >>>>>> + if ( NULL == (buf2 = ptr = strdup(str)) ) >>>>>> + return ERROR_NOMEM; >>>>>> + >>>>>> + for(tok = ptr, end = ptr + strlen(ptr) + 1; ptr < end; ptr++) { >>>>>> + switch(state) { I thought initially you let me to follow that previous "if" :) > > >>> >>>>>> + case STATE_TYPE: >>>>>> + if ( *ptr == '\0' || *ptr == ',' ) { >>>>>> + state = STATE_CHECK_FLAG; >>>>>> + *ptr = '\0'; > > [...] > >>>>>> >>>>>> + /* >>>>>> + * By default our global policy is to query all rdm entries, and >>>>>> + * force reserve them. >>>>>> + */ >>>>>> + b_info->rdm.type = LIBXL_RDM_RESERVE_TYPE_HOST; >>>>>> + b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_TRY; >>>>> >>>>> This should probably to into the _setdefault function of >>>>> libxl_domain_build_info. >>>> >>>> Sorry, I just see this >>>> >>>> libxl_domain_build_info_init() >>>> | >>>> + libxl_rdm_reserve_init(&p->rdm); >>>> | >>>> + memset(p, '\0', sizeof(*p)); >>>> >>>> But this should be generated automatically, right? So how to implement your >>>> idea? Could you give me a show? >>>> >>> >>> Check libxl_domain_build_info_setdefault. >>> >>> To use libxl types. You normally do: >>> >>> libxl_TYPE_init >>> libxl_TYPE_setdefault >>> >>> DO STUFF >>> >>> libxl_TYPE_dispose >>> >>> _init and _dispose are auto-generated. _setdefault is not. >> >> So in our case, maybe we can do this, >> >> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c >> index f0da7dc..461606c 100644 >> --- a/tools/libxl/libxl_create.c >> +++ b/tools/libxl/libxl_create.c >> @@ -100,6 +100,17 @@ static int sched_params_valid(libxl__gc *gc, >> return 1; >> } >> >> +void libxl__device_rdm_setdefault(libxl__gc *gc, >> + libxl_domain_build_info *b_info) > > It's not a device. Use libxl__rdm_setdefault. Okay. > >> +{ >> + /* >> + * By default our global policy is to query all rdm entries, and >> + * force reserve them. >> + */ >> + b_info->rdm.type = LIBXL_RDM_RESERVE_TYPE_HOST; >> + b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_STRICT; >> +} >> + > > Isn't the global policy "relaxed" (or "try")? At least that's what your > old code does. BTW your original code contradicts your original comment. Right, sorry for this typo. > >> int libxl__domain_build_info_setdefault(libxl__gc *gc, >> libxl_domain_build_info *b_info) >> { >> @@ -410,6 +421,8 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc, >> libxl_domain_type_to_string(b_info->type)); >> return ERROR_INVAL; >> } >> + >> + libxl__device_rdm_setdefault(gc, b_info); >> return 0; >> } >> > > And you also need to modify libxl__device_pci_setdefault. > Okay. > I actually have another question on the interface design. To recap, in > your patch: > > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl > index 47af340..5786455 100644 > --- a/tools/libxl/libxl_types.idl > +++ b/tools/libxl/libxl_types.idl > @@ -71,6 +71,17 @@ libxl_domain_type = Enumeration("domain_type", [ > (2, "PV"), > ], init_val = "LIBXL_DOMAIN_TYPE_INVALID") > > +libxl_rdm_reserve_type = Enumeration("rdm_reserve_type", [ > + (0, "none"), > + (1, "host"), > + ]) > + > +libxl_rdm_reserve_flag = Enumeration("rdm_reserve_flag", [ > + (-1, "invalid"), > + (0, "force"), > + (1, "try"), > + ]) > + > libxl_channel_connection = Enumeration("channel_connection", [ > (0, "UNKNOWN"), > (1, "PTY"), > @@ -356,6 +367,11 @@ libxl_domain_sched_params = Struct("domain_sched_params",[ > ("budget", integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}), > ]) > > +libxl_rdm_reserve = Struct("rdm_reserve", [ > + ("type", libxl_rdm_reserve_type), > + ("reserve", libxl_rdm_reserve_flag), > + ]) > + > libxl_domain_build_info = Struct("domain_build_info",[ > ("max_vcpus", integer), > ("avail_vcpus", libxl_bitmap), > @@ -401,6 +417,7 @@ libxl_domain_build_info = Struct("domain_build_info",[ > ("kernel", string), > ("cmdline", string), > ("ramdisk", string), > + ("rdm", libxl_rdm_reserve), > ("u", KeyedUnion(None, libxl_domain_type, "type", > [("hvm", Struct(None, [("firmware", string), > ("bios", libxl_bios_type), > @@ -521,6 +538,7 @@ libxl_device_pci = Struct("device_pci", [ > ("power_mgmt", bool), > ("permissive", bool), > ("seize", bool), > + ("rdm_reserve", libxl_rdm_reserve_flag), > ]) > > Do you actually need libxl_rdm_reserve type? I.e. do you envisage that > structure to change a lot? Can you not just use libxl_rdm_reserve_type > and libxl_rdm_reserve_flag in build_info. > We'd like to introduce this type, libxl_rdm_reserve, to combine "type" and "flag". From my point of view, this sole structure can represent a holistic approach to rdm because, #1. Obviously its easy to get all; #2. It will probably be extended since like this name, rdm, reserved device memory, this should not be restricted to RMRR currently. So I just feel its flexible to support others in the future, or much more properties :) Thanks Tiejun