From: "Chen, Tiejun"
Subject: Re: [RFC][PATCH 04/13] tools/libxl: detect and avoid conflicts with RDM
Date: Tue, 19 May 2015 14:47:05 +0800
Message-ID: <555ADC69.5060605@intel.com>
References: <1428657724-3498-1-git-send-email-tiejun.chen@intel.com>
 <1428657724-3498-5-git-send-email-tiejun.chen@intel.com>
 <20150508144343.GO3848@zion.uk.xensource.com>
 <555063D1.7030907@intel.com>
 <20150511113247.GC30997@zion.uk.xensource.com>
 <55545C81.8070509@intel.com>
 <20150518200017.GA9893@zion.uk.xensource.com>
In-Reply-To: <20150518200017.GA9893@zion.uk.xensource.com>
To: Wei Liu
Cc: kevin.tian@intel.com, ian.campbell@citrix.com, andrew.cooper3@citrix.com,
 tim@xen.org, xen-devel@lists.xen.org, stefano.stabellini@citrix.com,
 JBeulich@suse.com, yang.z.zhang@intel.com, Ian.Jackson@eu.citrix.com
List-Id: xen-devel@lists.xenproject.org

On 2015/5/19 4:00, Wei Liu wrote:
> On Thu, May 14, 2015 at 04:27:45PM +0800, Chen, Tiejun wrote:
>> On 2015/5/11 19:32, Wei Liu wrote:
>>> On Mon, May 11, 2015 at 04:09:53PM +0800, Chen, Tiejun wrote:
>>>> On 2015/5/8 22:43, Wei Liu wrote:
>>>>> Sorry for the late review. This series fell through the cracks.
>>>>
>>>> Thanks for your review.
>>>>
>>>>> On Fri, Apr 10, 2015 at 05:21:55PM +0800, Tiejun Chen wrote:
>>>>>> While building a VM, the HVM domain builder provides struct
>>>>>> hvm_info_table{} to help hvmloader. Currently it includes two
>>>>>> fields used by hvmloader to construct the guest e820 table,
>>>>>> low_mem_pgend and high_mem_pgend, so we should check them to fix
>>>>>> any conflict with RAM.
>>>>>>
>>>>>> An RMRR can theoretically reside in the address space beyond 4G,
>>>>>> but we never see this in the real world. So in order to avoid
>>>>>> breaking the highmem layout
>>>>>
>>>>> How does this break highmem layout?
>>>>
>>>> In most cases highmem is contiguous, like [4G, ...], while an RMRR
>>>> can theoretically sit above 4G, although that is very rare; we have
>>>> never seen it happen in the real world. So we don't want such a case
>>>> to break the highmem layout.
>>>
>>> The problem is that taking this approach just because it rarely
>>> happens *now* is not future proof. It needs to be clearly documented
>>> somewhere in the manual (or any other Intel docs) and be referenced
>>> in the code. Otherwise we might end up in a situation where no-one
>>> knows how it is supposed to work and no-one can fix it if it breaks
>>> in the future, that is, if every single device on earth suddenly
>>> requires an RMRR above 4G (I'm exaggerating).
>>>
>>> Or you can just make it work with highmem. How much more work do you
>>> envisage?
>>>
>>> (If my above comment makes no sense do let me know. I only have a
>>> very shallow understanding of RMRR.)
>>
>> Maybe I'm misleading you :)
>>
>> I don't mean that an RMRR above 4G is not allowed to work in our
>> implementation. What I'm saying is that our *policy* is deliberately
>> simple for this kind of rare highmem case...
>
> Yes, but this policy is hardcoded (as in, you bail out when detecting a
> conflict in the highmem region). I don't think we have the expertise to
> un-hardcode it in the future (it might require x86-specific insider
> information and specific hardware to test). So I would like to make it
> as future proof as possible.
>
> I know you're limited by hvm_info. I would accept this hardcoded policy
> as a short-term solution, but I would like a commitment from Intel to
> maintain this piece of code and to properly work out a flexible
> solution for the >4G case.

It looks like Kevin has already replied to this.
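Just to make the current simple policy concrete, the check it boils down
to is roughly the sketch below. low_mem_pgend/high_mem_pgend are the
hvm_info fields mentioned in the patch description; the helper itself is
made up for illustration only:

    #include <stdbool.h>
    #include <stdint.h>

    #define GUEST_PAGE_SHIFT 12
    #define GB4 (1ULL << 32)

    /*
     * Hypothetical helper: does the RDM range [start, start + size)
     * overlap guest RAM as described by hvm_info?  Lowmem RAM is
     * [0, low_end) and highmem RAM, if any, is [4G, high_end).
     * size is assumed to be non-zero.
     */
    static bool rdm_overlaps_guest_ram(uint64_t start, uint64_t size,
                                       uint32_t low_mem_pgend,
                                       uint32_t high_mem_pgend)
    {
        uint64_t end = start + size;
        uint64_t low_end = (uint64_t)low_mem_pgend << GUEST_PAGE_SHIFT;
        uint64_t high_end = (uint64_t)high_mem_pgend << GUEST_PAGE_SHIFT;

        /* Conflict with lowmem RAM [0, low_end)? */
        if (start < low_end)
            return true;

        /* Conflict with highmem RAM [4G, high_end)?  This is the rare
         * case the simple policy does not try to fix up. */
        if (high_end > GB4 && start < high_end && end > GB4)
            return true;

        return false;
    }

The second check is exactly the one we currently refuse to resolve: we
bail out instead of reshuffling highmem.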
>>>>>> we don't solve the highmem conflict. Note this means a highmem
>>>>>> RMRR could still be supported if there is no conflict.
>>
>> Like these two sentences above.
>>
>>>>> Aren't you actively trying to avoid conflict in libxl?
>>>>
>>>> RMRR is fixed by the BIOS so we can't avoid the conflict. Here we
>>>> want to adopt some good policies to address RMRR. In the highmem
>>>> case, that simple policy should be enough?
>>>
>>> Whatever policy you and the hypervisor maintainers agree on. Just
>>> clearly document it.
>>
>> Do you mean I should turn this patch description into a separate
>> document?
>
> Either in a code comment or in a separate document.

Okay.

> [...]
>
>>>> The caller to xc_device_get_rdm always frees this.
>>>
>>> I think I misunderstood how this function works. I thought xrdm was
>>> passed in by the caller, which is clearly not the case. Sorry!
>>>
>>> In that case, the `if ( rc < 0 )' is not needed because the call
>>> should always return rc < 0. An assert is good enough.
>>
>> assert(rc < 0)? But we can't presume the caller always passes '0'
>> first, and besides, there may be no RMRR at all.
>>
>> So I guess what you want is,
>>
>> assert( rc <= 0 );
>> if ( !rc )
>>     goto xrdm;
>
> Yes, I was thinking about something like this.
>
> How in this case can rc equal 0? Is it because you don't actually have
> any regions?

Yes.

> If so, please write a comment.

Sure.

>> if ( errno == ENOBUFS )
>>     ...
>>
>> Right?
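To make sure we mean the same thing, the flow inside xc_device_get_rdm()
would be roughly as below. This is only a sketch: the types and the
query_rdm_entries() wrapper are hypothetical stand-ins for the real libxc
ones (stubbed so the sketch compiles), not the actual patch code:

    #include <assert.h>
    #include <errno.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Hypothetical stand-ins for the real libxc types. */
    struct xen_reserved_device_memory { uint64_t start_pfn, nr_pages; };
    typedef void xc_interface;

    /* Hypothetical wrapper for the underlying hypercall: returns 0 when
     * there are no RDM entries at all; returns -1 with errno == ENOBUFS
     * and sets *nr_entries when the supplied buffer is too small.
     * (Stubbed as "no entries" here just so the sketch compiles.) */
    static int query_rdm_entries(xc_interface *xch,
                                 unsigned int *nr_entries,
                                 struct xen_reserved_device_memory *buf)
    {
        (void)xch; (void)nr_entries; (void)buf;
        return 0;
    }

    /* The probe-then-retry flow we discussed above. */
    static int get_rdm_entries(xc_interface *xch,
                               struct xen_reserved_device_memory **xrdm,
                               unsigned int *nr_entries)
    {
        int rc;

        *xrdm = NULL;
        *nr_entries = 0;

        /* First call probes how many entries exist; it never returns
         * a positive value. */
        rc = query_rdm_entries(xch, nr_entries, NULL);
        assert(rc <= 0);

        /* rc == 0: this platform simply has no RDM entries at all. */
        if (!rc)
            return 0;

        if (errno != ENOBUFS)
            return -1;

        *xrdm = calloc(*nr_entries, sizeof(**xrdm));
        if (!*xrdm)
            return -1;

        /* Second call fills the buffer we just sized; the caller of
         * xc_device_get_rdm() owns and frees *xrdm. */
        rc = query_rdm_entries(xch, nr_entries, *xrdm);
        if (rc < 0) {
            free(*xrdm);
            *xrdm = NULL;
        }
        return rc;
    }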
> [...]
>
>>>>>> +    memset(d_config->rdms, 0, sizeof(libxl_device_rdm));
>>>>>> +
>>>>>> +    /* Query all RDM entries in this platform */
>>>>>> +    if (type == LIBXL_RDM_RESERVE_TYPE_HOST) {
>>>>>> +        d_config->num_rdms = nr_all_rdms;
>>>>>> +        for (i = 0; i < d_config->num_rdms; i++) {
>>>>>> +            d_config->rdms[i].start =
>>>>>> +                (uint64_t)xrdm[i].start_pfn << XC_PAGE_SHIFT;
>>>>>> +            d_config->rdms[i].size =
>>>>>> +                (uint64_t)xrdm[i].nr_pages << XC_PAGE_SHIFT;
>>>>>> +            d_config->rdms[i].flag = d_config->b_info.rdm.reserve;
>>>>>> +        }
>>>>>> +    } else {
>>>>>> +        d_config->num_rdms = 0;
>>>>>> +    }
>>>>>
>>>>> And you should move d_config->rdms = libxl__calloc inside that
>>>>> `if'. That is, don't allocate memory if you don't need it.
>>>>
>>>> We can't, since in all cases we need to preallocate this first, and
>>>> then we handle it according to our policy.
>>>
>>> How would it ever be used again if you set d_config->num_rdms to 0?
>>> How do you know the exact size of your array again?
>>
>> If we leave aside the case where multiple devices share one RDM entry,
>> our workflow is as follows:
>>
>> #1. We always preallocate all of rdms[], zeroed with memset().
>
> Because the number of RDM entries used in a domain never exceeds the
> number of global entries? I.e. we never risk overrunning the array?

Yes. "global" means the count covers all RDM entries on the platform.
Without the "global" flag, we only count the RDM entries associated with
the PCI devices assigned to this domain. So we actually have this
invariant:

    number of entries without "global" <= number of entries with "global"

>> #2. Then there are two cases for the global rule:
>>
>> #2.1 If we really need all entries according to the global rule, we
>> fill all of rdms[] with the real RDM info and set d_config->num_rdms.
>> #2.2 If there is no such global rule to obtain all entries, we just
>> clear d_config->num_rdms.
>
> No, don't do this. In any failure path, if the free/dispose function
> depends on num_rdms to iterate through the whole list to dispose of
> memory (if your rdm structure later contains pointers to allocated
> memory), this method leaks memory.
>
> The num_ field should always reflect the actual size of the array.
>
>> #3. Then for the per-device rule:
>>
>> #3.1 From #2.1, we just need to check whether we should change a given
>> RDM entry's policy if that entry is associated with this device.
>> #3.2 From #2.2, we just fill the rdms entries one by one. Of course,
>> it's quite possible that we don't fill all of them, since the
>> passed-through devices may have no RDM at all, or only occupy a few
>> entries. In any case, we finally sync d_config->num_rdms.
>
> Sorry, I don't follow. But if your problem is that you don't know how
> many entries you actually need, just use libxl__realloc?
>
>>>>>> +    free(xrdm);
>>>>>> +
>>>>>> +    /* Query RDM entries per-device */
>>>>>> +    for (i = 0; i < d_config->num_pcidevs; i++) {
>>>>>> +        unsigned int nr_entries = 0;
>>>>
>>>> Maybe I should restate this,
>>>>     unsigned int nr_entries;
>
> [...]
>
>>>> Like I replied above, we always preallocate this at the beginning.
>>>
>>> Ah, OK.
>>>
>>> But please don't do this. It's hard to see that you don't overrun the
>>> buffer. Please allocate memory only when you need it.
>>
>> Sorry, I don't understand this. As I mentioned above, we don't know
>> how many RDM entries we really need to allocate, which is why we'd
>> like to preallocate all rdms at the beginning. That covers all cases:
>> global policy, global policy plus per-device, and per-device only.
>> Even if multiple devices share one RDM entry, we also need to avoid
>> duplicating it...
>
> Can you use libxl__realloc?

It looks like this can expand the size automatically each time, right?
If so, I think we can try it.

Thanks
Tiejun
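P.S. If we go the libxl__realloc() route, the per-entry helper might look
like the sketch below, which also keeps num_rdms exactly in sync with the
array size as you asked. libxl__realloc() and NOGC are the ones from
libxl_internal.h (NOGC because d_config must outlive the gc); the helper
name is made up:

    /* Hypothetical helper: grow d_config->rdms one entry at a time
     * instead of preallocating nr_all_rdms entries up front.  num_rdms
     * always reflects the actual size of the array. */
    static void add_rdm_entry(libxl_domain_config *d_config,
                              uint64_t start, uint64_t size, uint32_t flag)
    {
        d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
                                        (d_config->num_rdms + 1) *
                                        sizeof(libxl_device_rdm));

        d_config->rdms[d_config->num_rdms].start = start;
        d_config->rdms[d_config->num_rdms].size = size;
        d_config->rdms[d_config->num_rdms].flag = flag;
        d_config->num_rdms++;
    }

A shared entry then just becomes "search first, only append if not
found", which would also handle the multiple-devices-share-one-RMRR case
without duplicates.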