From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Chen, Tiejun" Subject: Re: [RFC][PATCH 04/13] tools/libxl: detect and avoid conflicts with RDM Date: Fri, 08 May 2015 09:24:56 +0800 Message-ID: <554C1068.7020500@intel.com> References: <1428657724-3498-1-git-send-email-tiejun.chen@intel.com> <1428657724-3498-5-git-send-email-tiejun.chen@intel.com> <5534FB700200007800073B6C@mail.emea.novell.com> <554A2CA1.9030907@intel.com> <554A509402000078000773F6@mail.emea.novell.com> <554ACC5C.2040300@intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <554ACC5C.2040300@intel.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Jan Beulich , ian.campbell@citrix.com, stefano.stabellini@citrix.com, wei.liu2@citrix.com, Ian.Jackson@eu.citrix.com Cc: yang.z.zhang@intel.com, andrew.cooper3@citrix.com, kevin.tian@intel.com, tim@xen.org, xen-devel@lists.xen.org List-Id: xen-devel@lists.xenproject.org Campbell, Jackson, Wei and Stefano, Any consideration? I can follow up Jan's idea but I need you guys make sure I'm going to do this properly. Thanks Tiejun On 2015/5/7 10:22, Chen, Tiejun wrote: > On 2015/5/6 23:34, Jan Beulich wrote: >>>>> On 06.05.15 at 17:00, wrote: >>> On 2015/4/20 19:13, Jan Beulich wrote: >>>>>>> On 10.04.15 at 11:21, wrote: >>>>> --- a/tools/libxc/xc_domain.c >>>>> +++ b/tools/libxc/xc_domain.c >>>>> @@ -1665,6 +1665,46 @@ int xc_assign_device( >>>>> return do_domctl(xch, &domctl); >>>>> } >>>>> >>>>> +struct xen_reserved_device_memory >>>>> +*xc_device_get_rdm(xc_interface *xch, >>>>> + uint32_t flag, >>>>> + uint16_t seg, >>>>> + uint8_t bus, >>>>> + uint8_t devfn, >>>>> + unsigned int *nr_entries) >>>>> +{ >>>> >>>> So what's the point of having both this new function and >>>> xc_reserved_device_memory_map()? Is the latter useful for >>>> anything besides the purpose here? >>> >>> I just hope xc_reserved_device_memory_map() is a standard interface to >>> call that XENMEM_reserved_device_memory_map, but xc_device_get_rdm() can >>> handle some errors in current case. >>> >>> I think you are hinting we just need one, right? >> >> Correct. But remember - I'm not a maintainer of this code, so > > But this may be a little complex with one... > >> maintainers may be of different opinion. > > Anyway, let me ask our tools maintainers. > > Campbell, Jackson, Wei and Stefano, > > What about your concern to this? > >> >>>>> + struct xen_reserved_device_memory *xrdm = NULL; >>>>> + int rc = xc_reserved_device_memory_map(xch, flag, seg, bus, >>>>> devfn, xrdm, >>>>> + nr_entries); >>>>> + >>>>> + if ( rc < 0 ) >>>>> + { >>>>> + if ( errno == ENOBUFS ) >>>>> + { >>>>> + if ( (xrdm = malloc(*nr_entries * >>>>> + >>>>> sizeof(xen_reserved_device_memory_t))) == NULL ) >>>>> + { >>>>> + PERROR("Could not allocate memory."); >>>> >>>> Now that's exactly the kind of error message that makes no sense: >>>> As errno will already cause PERROR() to print something along the >>>> lines of the message you provide here, you're just creating >>>> redundancy. Indicating the purpose of the allocation, otoh, would >>>> add helpful context for the one inspecting the resulting log. >>> >>> What about this? >>> >>> PERROR("Could not allocate memory buffers to store reserved device >>> memory entries."); >> >> You kind of go from one extreme to the other - the message >> doesn't need to be overly long, but it should be distinct from >> all other messages (so that when seen one can identify what >> went wrong). > > I originally refer to some existing examples like this, > > int > xc_core_arch_memory_map_get(xc_interface *xch, struct > xc_core_arch_context *unused, > xc_dominfo_t *info, shared_info_any_t > *live_shinfo, > xc_core_memory_map_t **mapp, > unsigned int *nr_entries) > { > ... > map = malloc(sizeof(*map)); > if ( map == NULL ) > { > PERROR("Could not allocate memory"); > return -1; > } > > Maybe this is wrong to my case. Could I change this? > > PERROR("Could not allocate memory for XENMEM_reserved_device_memory_map > hypercall"); > > Or just give me your line. > >> >>>>> @@ -302,8 +300,11 @@ static int setup_guest(xc_interface *xch, >>>>> >>>>> for ( i = 0; i < nr_pages; i++ ) >>>>> page_array[i] = i; >>>>> - for ( i = mmio_start >> PAGE_SHIFT; i < nr_pages; i++ ) >>>>> - page_array[i] += mmio_size >> PAGE_SHIFT; >>>>> + /* >>>>> + * This condition 'lowmem_end <= mmio_start' is always true. >>>>> + */ >>>> >>>> For one I think you mean "The", not "This", as there's no such >>>> condition around here. And then - why? DYM "is supposed to >>>> always be true"? In which case you may want to check... >>> >>> I always do this inside libxl__build_hvm() but before setup_guest(), >>> >>> + if (args.lowmem_size > mmio_start) >>> + args.lowmem_size = mmio_start; >>> >>> And plus, we also another policy to rdm, >>> >>> #1. Above a predefined boundary (default 2G) >>> - move lowmem_end below reserved region to solve conflict; >>> >>> This means there's such a likelihood of args.lowmem_size < mmio_start) >>> as well. >>> >>> So here I'm saying the condition is always true. >> >> Okay, but again - if this is relevant to the following code, an >> assertion or alike may still be warranted. > > Yes I should add 'assert()' here. > >> >>>> and hence don't have the final say on stylistic issues, I don't see >>>> why the above couldn't be expressed with a single return statement. >>> >>> Are you saying something like this? Note this was showed by yourself >>> long time ago. >> >> I know, and hence I was puzzled to still see you use the more >> convoluted form. >> >>> static bool check_mmio_hole_conflict(uint64_t start, uint64_t memsize, >>> uint64_t mmio_start, uint64_t >>> mmio_size) >>> { >>> return start + memsize > mmio_start && start < mmio_start + >>> mmio_size; >>> } >>> >>> But I don't think this really can't work out our case. >> >> It's equivalent to the original you had, so I don't see what you >> mean with "this really can't work out our case". >> > > Let me make this point clear. > > The original implementation, > > +static int check_rdm_hole(uint64_t start, uint64_t memsize, > + uint64_t rdm_start, uint64_t rdm_size) > +{ > + if (start + memsize <= rdm_start || start >= rdm_start + rdm_size) > + return 0; > + else > + return 1; > +} > > means it returns 'false' in two cases: > > #1. end = start + memsize; end <= rdm_start; > > This region [start, end] is below of rdm entry. > > #2. rdm_end = rdm_start + rdm_size; stat >= rdm_end; > > This region [start, end] is above of rdm entry. > > So others conditions should indicate that rdm entry is overlapping with > this region. Actually this has three cases: > > #1. This region just conflicts with the first half of rdm entry; > #2. This region just conflicts with the second half of rdm entry; > #3. This whole region falls inside of rdm entry; > > Then it should return 'true', right? > > But with this single line, > > return start + memsize > rdm_start && start < rdm_start + rdm_size; > > => > > return end > rdm_start && start < rdm_end; > > This just guarantee it return 'true' *only* if #3 above occurs. > >>>>> +int libxl__domain_device_check_rdm(libxl__gc *gc, >>>>> + libxl_domain_config *d_config, >>>>> + uint64_t rdm_mem_guard, >>>>> + struct xc_hvm_build_args *args) >>>>> +{ >>>>> + int i, j, conflict; >>>>> + libxl_ctx *ctx = libxl__gc_owner(gc); >>>>> + struct xen_reserved_device_memory *xrdm = NULL; >>>>> + unsigned int nr_all_rdms = 0; >>>>> + uint64_t rdm_start, rdm_size, highmem_end = (1ULL << 32); >>>>> + uint32_t type = d_config->b_info.rdm.type; >>>>> + uint16_t seg; >>>>> + uint8_t bus, devfn; >>>>> + >>>>> + /* Might not to expose rdm. */ >>>>> + if ((type == LIBXL_RDM_RESERVE_TYPE_NONE) && >>>>> !d_config->num_pcidevs) >>>>> + return 0; >>>>> + >>>>> + /* Collect all rdm info if exist. */ >>>>> + xrdm = xc_device_get_rdm(ctx->xch, LIBXL_RDM_RESERVE_TYPE_HOST, >>>>> + 0, 0, 0, &nr_all_rdms); >>>> >>>> What meaning has passing a libxl private value to a libxc function? >>> >>> We intend to collect all rdm entries info in advance and then we can >>> construct d_config->rdms based on our policies as follows. Because we >>> need to first allocate d_config->rdms properly to store rdms, but in >>> some cases we don't know how many buffers are enough. For example, we >>> don't have that global flag but with multiple pci devices. And even a >>> shared entry worsen this situation. >>> >>> So here, we set that flag as LIBXL_RDM_RESERVE_TYPE_HOST but without any >>> SBDF to grab all rdms. >> >> I'm afraid you didn't get my point: Values passed to libxc should be > > Sorry for this misunderstanding. > >> known to libxc. Values privately defined by libxl for its own purposes >> aren't known to libxc, and hence shouldn't be passed to libxc >> functions. > > I think we should set this with 'PCI_DEV_RDM_ALL' since, > > struct xen_reserved_device_memory_map { > /* IN */ > /* Currently just one bit to indicate checkng all Reserved Device > Memory. */ > #define PCI_DEV_RDM_ALL 0x1 > >> >>>>> + * 'try' policy is specified, and we also mark this as INVALID >>>>> not to expose >>>>> + * this entry to hvmloader. >>>> >>>> What is "this" in "... also mark this as ..."? Certainly neither the >>>> conflict >>>> nor the warning. >>> >>> Sorry, this is my fault. >>> >>> * If a conflict is detected on a given RMRR entry, an error >>> will be >>> * returned if 'strict' policy is specified. Or conflict is >>> treated as a >>> * warning if 'relaxed' policy is specified, and we also mark >>> this as >>> * INVALID not to expose this entry to hvmloader. >> >> The same "this" still doesn't have anything reasonable it references. I >> think you mean "the entry" (in which case the subsequent "this entry" >> could become just "it" afaict). But (not being a native speaker) the >> grammar of the second half of the sentence looks odd (and hence >> potentially confusing) to me anyway (i.e. even with the previous > > Sure, we need to make this better and clear. > >> issue fixed). > > * If a conflict is detected on a given RMRR entry, an error will be > * returned if 'strict' policy is specified. Instead, if 'relaxed' > policy > * specified, this conflict is treated just as a warning, but we > mark this > * RMRR entry as INVALID to indicate that this entry shouldn't be > exposed > * to hvmloader. > > I hope this can help us understand what we do. > >> >>>>> + * >>>>> + * Firstly we should check the case of rdm < 4G because we may >>>>> need to >>>>> + * expand highmem_end. >>>>> + */ >>>>> + for (i = 0; i < d_config->num_rdms; i++) { >>>>> + rdm_start = d_config->rdms[i].start; >>>>> + rdm_size = d_config->rdms[i].size; >>>>> + conflict = check_rdm_hole(0, args->lowmem_size, rdm_start, >>>>> rdm_size); >>>>> + >>>>> + if (!conflict) >>>>> + continue; >>>>> + >>>>> + /* >>>>> + * Just check if RDM > our memory boundary >>>>> + */ >>>>> + if (d_config->rdms[i].start > rdm_mem_guard) { >>>>> + /* >>>>> + * We will move downwards lowmem_end so we have to expand >>>>> + * highmem_end. >>>>> + */ >>>>> + highmem_end += (args->lowmem_size - rdm_start); >>>>> + /* Now move downwards lowmem_end. */ >>>>> + args->lowmem_size = rdm_start; >>>> >>>> Considering that the action here doesn't depend on the specific >>>> ->rdms[] slot being looked at, I don't see why the loop needs to >>> >>> I'm not sure if I understand what you mean. >>> >>> All rdm entries are organized disorderly in d_config->rdms, so we should >>> traverse all entries to make sure args->lowmem_size is below all rdms' >>> start address. >> >> I think I see what confused me: in the if() condition you reference >> d_config->rdms[i].start, yet the body of the if() has no reference >> to d_config->rdms[i] at all. If the if() used rdm_start it would >> become obvious that this is being latched at the beginning of the > > Indeed, I really should use rdm_start here. > >> body (which is what I overlooked, assuming the variable's value >> to have got set prior to the loop), and hence the body is not loop >> invariant. >> > > So just replace d_config->rdms[i].start with rdm_start like this, > > /* > * Just check if RDM > our memory boundary > */ > if (rdm_start > rdm_mem_guard) { > /* > * We will move downwards lowmem_end so we have to expand > * highmem_end. > */ > highmem_end += (args->lowmem_size - rdm_start); > /* Now move downwards lowmem_end. */ > args->lowmem_size = rdm_start; > } > } > > Thanks > Tiejun > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel > >