From mboxrd@z Thu Jan 1 00:00:00 1970 From: Wei Liu Subject: Re: [RFC][PATCH 04/13] tools/libxl: detect and avoid conflicts with RDM Date: Mon, 11 May 2015 12:32:47 +0100 Message-ID: <20150511113247.GC30997@zion.uk.xensource.com> References: <1428657724-3498-1-git-send-email-tiejun.chen@intel.com> <1428657724-3498-5-git-send-email-tiejun.chen@intel.com> <20150508144343.GO3848@zion.uk.xensource.com> <555063D1.7030907@intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Content-Disposition: inline In-Reply-To: <555063D1.7030907@intel.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: "Chen, Tiejun" Cc: kevin.tian@intel.com, Wei Liu , ian.campbell@citrix.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, stefano.stabellini@citrix.com, JBeulich@suse.com, yang.z.zhang@intel.com, Ian.Jackson@eu.citrix.com List-Id: xen-devel@lists.xenproject.org On Mon, May 11, 2015 at 04:09:53PM +0800, Chen, Tiejun wrote: > On 2015/5/8 22:43, Wei Liu wrote: > >Sorry for the late review. This series fell through the crack. > > > > Thanks for your review. > > >On Fri, Apr 10, 2015 at 05:21:55PM +0800, Tiejun Chen wrote: > >>While building a VM, HVM domain builder provides struct hvm_info_table{} > >>to help hvmloader. Currently it includes two fields to construct guest > >>e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should > >>check them to fix any conflict with RAM. > >> > >>RMRR can reside in address space beyond 4G theoretically, but we never > >>see this in real world. So in order to avoid breaking highmem layout > > > >How does this break highmem layout? > > In most cases highmen is always continuous like [4G, ...] but RMRR is > theoretically beyond 4G but very rarely. Especially we don't see this > happened in real world. So we don't want to such a case breaking the > highmem. > The problem is we take this approach just because this rarely happens *now* is not future proof. It needs to be clearly documented somewhere in the manual (or any other Intel docs) and be referenced in the code. Otherwise we might end up in a situation that no-one knows how it is supposed to work and no-one can fix it if it breaks in the future, that is, every single device on earth requires RMRR > 4G overnight (I'm exaggerating). Or you can just make it works with highmem. How much more work do you envisage? (If my above comment makes no sense do let me know. I only have very shallow understanding of RMRR) > > > >>we don't solve highmem conflict. Note this means highmem rmrr could still > >>be supported if no conflict. > >> > > > >Aren't you actively trying to avoid conflict in libxl? > > RMRR is fixed by BIOS so we can't aovid conflict. Here we want to adopt some > good policies to address RMRR. In the case of highmemt, that simple policy > should be enough? > Whatever policy you and HV maintainers agree on. Just clearly document it. > > > >>But in the case of lowmem, RMRR probably scatter the whole RAM space. > >>Especially multiple RMRR entries would worsen this to lead a complicated > >>memory layout. And then its hard to extend hvm_info_table{} to work > >>hvmloader out. So here we're trying to figure out a simple solution to > >>avoid breaking existing layout. So when a conflict occurs, > >> > >> #1. Above a predefined boundary (default 2G) > >> - move lowmem_end below reserved region to solve conflict; > >> > > > >I hope this "predefined boundary" is user tunable. I will check later in > >this patch if it is the case. > > Yes. As we clarified in that comments, > > * TODO: Its batter to provide a config parameter for this boundary value. > > This mean I would provide a patch address this since currently I just think > this is not a big deal? > Yes please provide a config option to override that. It's reasonable that user wants to change that. > > > >> #2 Below a predefined boundary (default 2G) > >> - Check force/try policy. > >> "force" policy leads to fail libxl. Note when both policies > >> are specified on a given region, 'force' is always preferred. > >> "try" policy issue a warning message and also mask this entry INVALID > >> to indicate we shouldn't expose this entry to hvmloader. > >> > >>Signed-off-by: Tiejun Chen > >>--- > >> tools/libxc/include/xenctrl.h | 8 ++ > >> tools/libxc/include/xenguest.h | 3 +- > >> tools/libxc/xc_domain.c | 40 +++++++++ > >> tools/libxc/xc_hvm_build_x86.c | 28 +++--- > >> tools/libxl/libxl_create.c | 2 +- > >> tools/libxl/libxl_dm.c | 195 +++++++++++++++++++++++++++++++++++++++++ > >> tools/libxl/libxl_dom.c | 27 +++++- > >> tools/libxl/libxl_internal.h | 11 ++- > >> tools/libxl/libxl_types.idl | 7 ++ > >> 9 files changed, 303 insertions(+), 18 deletions(-) > >> > >>diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h > >>index 59bbe06..299b95f 100644 > >>--- a/tools/libxc/include/xenctrl.h > >>+++ b/tools/libxc/include/xenctrl.h > >>@@ -2053,6 +2053,14 @@ int xc_get_device_group(xc_interface *xch, > >> uint32_t *num_sdevs, > >> uint32_t *sdev_array); > >> > >>+struct xen_reserved_device_memory > >>+*xc_device_get_rdm(xc_interface *xch, > >>+ uint32_t flag, > >>+ uint16_t seg, > >>+ uint8_t bus, > >>+ uint8_t devfn, > >>+ unsigned int *nr_entries); > >>+ > >> int xc_test_assign_device(xc_interface *xch, > >> uint32_t domid, > >> uint32_t machine_sbdf); [...] > > > >> uint64_t mem_target; /* Memory target in bytes. */ > >> uint64_t mmio_size; /* Size of the MMIO hole in bytes. */ > >> const char *image_file_name; /* File name of the image to load. */ > >>diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c > >>index 4f8383e..85b18ea 100644 > >>--- a/tools/libxc/xc_domain.c > >>+++ b/tools/libxc/xc_domain.c > >>@@ -1665,6 +1665,46 @@ int xc_assign_device( > >> return do_domctl(xch, &domctl); > >> } > >> > >>+struct xen_reserved_device_memory > >>+*xc_device_get_rdm(xc_interface *xch, > >>+ uint32_t flag, > >>+ uint16_t seg, > >>+ uint8_t bus, > >>+ uint8_t devfn, > >>+ unsigned int *nr_entries) > >>+{ > >>+ struct xen_reserved_device_memory *xrdm = NULL; > >>+ int rc = xc_reserved_device_memory_map(xch, flag, seg, bus, devfn, xrdm, > >>+ nr_entries); > >>+ > >>+ if ( rc < 0 ) > >>+ { > >>+ if ( errno == ENOBUFS ) > >>+ { > >>+ if ( (xrdm = malloc(*nr_entries * > >>+ sizeof(xen_reserved_device_memory_t))) == NULL ) > >>+ { > >>+ PERROR("Could not allocate memory."); > >>+ goto out; > >>+ } > > > >Don't you leak origin xrdm in this case? > > The caller to xc_device_get_rdm always frees this. > I think I misunderstood how this function works. I thought xrdm was passed in by caller, which is clearly not the case. Sorry! In that case, the `if ( rc < 0 )' is not needed because the call should always return rc < 0. An assert is good enough. > > > >And, this style is not very good. Shouldn't the caller allocate enough > >memory before hand? > > Are you saying the caller to xc_device_get_rdm()? If so, any caller don't > know this, too. > I see. > Actually this is just a wrapper of that fundamental hypercall, > xc_reserved_device_memory_map() in patch #2, and based on that, we always > have to first call this to inquire how much memory we really need. And this > is why we have this wrapper since we don't want to duplicate more codes. > > One error handler of this wrapper is just handling ENOBUFS since the caller > never know how much memory we should allocate. So oftentimes we always set > 'entries = 0' to inquire firstly. > > Here Jan suggested we may need to figure out a good way to consolidate > xc_reserved_device_memory_map() and its wrapper, xc_device_get_rdm(). > > But in some ways, that wrapper likes a static function so we just need to > move this into that file associated to its caller, right? > Yes, if there is only one user at the moment, make a static function. > > > >>+ rc = xc_reserved_device_memory_map(xch, flag, seg, bus, devfn, xrdm, > >>+ nr_entries); > >>+ if ( rc ) > >>+ { > >>+ PERROR("Could not get reserved device memory maps."); > >>+ free(xrdm); > >>+ xrdm = NULL; > >>+ } > >>+ } > >>+ else { > >>+ PERROR("Could not get reserved device memory maps."); > >>+ } > >>+ } > >>+ > >>+ out: > >>+ return xrdm; > >>+} > >>+ > >> int xc_get_device_group( > >> xc_interface *xch, > >> uint32_t domid, > >>diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c > >>index c81a25b..3f87bb3 100644 > >>--- a/tools/libxc/xc_hvm_build_x86.c > >>+++ b/tools/libxc/xc_hvm_build_x86.c > >>@@ -89,19 +89,16 @@ static int modules_init(struct xc_hvm_build_args *args, > >> } > >> > >> static void build_hvm_info(void *hvm_info_page, uint64_t mem_size, > >>- uint64_t mmio_start, uint64_t mmio_size) > >>+ uint64_t lowmem_end) > >> { > >> struct hvm_info_table *hvm_info = (struct hvm_info_table *) > >> (((unsigned char *)hvm_info_page) + HVM_INFO_OFFSET); > >>- uint64_t lowmem_end = mem_size, highmem_end = 0; > >>+ uint64_t highmem_end = 0; > >> uint8_t sum; > >> int i; > >> > >>- if ( lowmem_end > mmio_start ) > >>- { > >>- highmem_end = (1ull<<32) + (lowmem_end - mmio_start); > >>- lowmem_end = mmio_start; > >>- } > >>+ if ( mem_size > lowmem_end ) > >>+ highmem_end = (1ull<<32) + (mem_size - lowmem_end); > >> > >> memset(hvm_info_page, 0, PAGE_SIZE); [...] > > > >>+ struct xc_hvm_build_args *args) > > > >This function does more than "checking", so a better name is needed. > > > >May be you should split this function to one "build" function and one > >"check" function? What do you think? > > We'd better keep this big one since this can make our policy understandable, > but I agree we need to rename this like, > > libxl__domain_device_construct_rdm() > > construct = check + build :) I'm fine with this. > > > > >>+{ > >>+ int i, j, conflict; > >>+ libxl_ctx *ctx = libxl__gc_owner(gc); > > > >You can just use CTX macro. [...] > > > >>+ if ((type == LIBXL_RDM_RESERVE_TYPE_NONE) && !d_config->num_pcidevs) > >>+ return 0; > >>+ > >>+ /* Collect all rdm info if exist. */ > >>+ xrdm = xc_device_get_rdm(ctx->xch, LIBXL_RDM_RESERVE_TYPE_HOST, > >>+ 0, 0, 0, &nr_all_rdms); > >>+ if (!nr_all_rdms) > >>+ return 0; > >>+ d_config->rdms = libxl__calloc(gc, nr_all_rdms, > >>+ sizeof(libxl_device_rdm)); > > > >Note that if you use "gc" here the allocated memory will be, well, > >garbage collected at some point. If you don't want them to be gc'ed you > >should use NOGC. > > Sorry, what does that mean by 'garbage collected'? > That means the memory allocated with gc will be freed at some point by GC_FREE, because those memory regions are meant to be temporary and used internally. When entering a libxl public API function (those start wtih libxl_), that function calls GC_INIT to initialise a garbage collector. When that function exits, it calls GC_FREE to free all memory allocated with that gc. Since d_config is very likely to be used by libxl user (xl, libvirt etc), you probably don't want to fill it with gc allocated memory. > > > >>+ memset(d_config->rdms, 0, sizeof(libxl_device_rdm)); > >>+ > >>+ /* Query all RDM entries in this platform */ > >>+ if (type == LIBXL_RDM_RESERVE_TYPE_HOST) { > >>+ d_config->num_rdms = nr_all_rdms; > >>+ for (i = 0; i < d_config->num_rdms; i++) { > >>+ d_config->rdms[i].start = > >>+ (uint64_t)xrdm[i].start_pfn << XC_PAGE_SHIFT; > >>+ d_config->rdms[i].size = > >>+ (uint64_t)xrdm[i].nr_pages << XC_PAGE_SHIFT; > >>+ d_config->rdms[i].flag = d_config->b_info.rdm.reserve; > >>+ } > >>+ } else { > >>+ d_config->num_rdms = 0; > >>+ } > > > >And you should move d_config->rdms = libxl__calloc inside that `if'. > >That is, don't allocate memory if you don't need it. > > We can't since in all cases we need to preallocate this, and then we will > handle this according to our policy. > How would it ever be used again if you set d_config->num_rdms to 0? How do you know the exact size of your array again? > > > >>+ free(xrdm); > >>+ > >>+ /* Query RDM entries per-device */ > >>+ for (i = 0; i < d_config->num_pcidevs; i++) { > >>+ unsigned int nr_entries = 0; > > Maybe I should restate this, > unsigned int nr_entries; > > >>+ bool new = true; > >>+ seg = d_config->pcidevs[i].domain; > >>+ bus = d_config->pcidevs[i].bus; > >>+ devfn = PCI_DEVFN(d_config->pcidevs[i].dev, d_config->pcidevs[i].func); > >>+ nr_entries = 0; > > > >You've already initialised this variable. > > We need to set this as zero to start. > Either of the tow works for me. Just don't want redundant initialisation. > > > >>+ xrdm = xc_device_get_rdm(ctx->xch, LIBXL_RDM_RESERVE_TYPE_NONE, > >>+ seg, bus, devfn, &nr_entries); > >>+ /* No RDM to associated with this device. */ > >>+ if (!nr_entries) > >>+ continue; > >>+ > >>+ /* Need to check whether this entry is already saved in the array. > >>+ * This could come from two cases: > >>+ * > >>+ * - user may configure to get all RMRRs in this platform, which > >>+ * is already queried before this point > > > >Formatting. > > Are you saying this? I mean you need to move "is already..." to the right go align with previous line. > > + /* Need to check whether this entry is already saved in the array. > > => The CODING_STYLE in libxl doesn't seem to enforce this, so you can just follow other examples. > /* > > * Need to check whether this entry is already saved in the array. > * This could come from two cases: > > > > >>+ * - or two assigned devices may share one RMRR entry > >>+ * > >>+ * different policies may be configured on the same RMRR due to above > >>+ * two cases. We choose a simple policy to always favor stricter policy > >>+ */ > >>+ for (j = 0; j < d_config->num_rdms; j++) { > >>+ if (d_config->rdms[j].start == > >>+ (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT) > >>+ { > >>+ if (d_config->rdms[j].flag != LIBXL_RDM_RESERVE_FLAG_FORCE) > >>+ d_config->rdms[j].flag = d_config->pcidevs[i].rdm_reserve; > >>+ new = false; > >>+ break; > >>+ } > >>+ } > >>+ > >>+ if (new) { > >>+ if (d_config->num_rdms > nr_all_rdms - 1) { > >>+ LIBXL__LOG(CTX, LIBXL__LOG_ERROR, "Exceed rdm array boundary!\n"); > > > >LOG(ERROR, ...) > > Fixed. > > > > >>+ free(xrdm); > >>+ return -1; > > > >Please use goto out idiom. > > We just have two 'return -1' differently so I'm not sure its worth doing > this. > Yes, please comply with libxl idiom. > > > >>+ } > >>+ > >>+ /* > >>+ * This is a new entry. > >>+ */ > > > >/* This is a new entry. */ > > Fixed. > > > > >>+ d_config->rdms[d_config->num_rdms].start = > >>+ (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT; > >>+ d_config->rdms[d_config->num_rdms].size = > >>+ (uint64_t)xrdm[0].nr_pages << XC_PAGE_SHIFT; > >>+ d_config->rdms[d_config->num_rdms].flag = d_config->pcidevs[i].rdm_reserve; > >>+ d_config->num_rdms++; > > > >Does this work? I don't see you reallocate memory. > > Like I replied above we always preallocate this at the beginning. > Ah, OK. But please don't do this. It's hard to see you don't overrun the buffer. Please allocate memory only when you need it. > > > >>+ } > >>+ free(xrdm); > > > >Bug: you free xrdm several times. > > No any conflict. > > What I did is that I would free once I finish to calling every > xc_device_get_rdm(). > OK. I misread. Sorry. [...] > > > > >>+ /* Next step is to check and avoid potential conflict between RDM entries > >>+ * and guest RAM. To avoid intrusive impact to existing memory layout > >>+ * {lowmem, mmio, highmem} which is passed around various function blocks, > >>+ * below conflicts are not handled which are rare and handling them would > >>+ * lead to a more scattered layout: > >>+ * - RMRR in highmem area (>4G) > >>+ * - RMRR lower than a defined memory boundary (e.g. 2G) > >>+ * Otherwise for conflicts between boundary and 4G, we'll simply move lowmem > >>+ * end below reserved region to solve conflict. > >>+ * > >>+ * If a conflict is detected on a given RMRR entry, an error will be > >>+ * returned. > >>+ * If 'force' policy is specified. Or conflict is treated as a warning if > >>+ * 'try' policy is specified, and we also mark this as INVALID not to expose > >>+ * this entry to hvmloader. > >>+ * > >>+ * Firstly we should check the case of rdm < 4G because we may need to > >>+ * expand highmem_end. > > > >Is this strategy agreed in previous discussion? How future-proof is this > > Yes, this is based on that design. > OK. [...] > >> > >> int libxl__build_hvm(libxl__gc *gc, uint32_t domid, > >>- libxl_domain_build_info *info, > >>+ libxl_domain_config *d_config, > >> libxl__domain_build_state *state) > >> { > >> libxl_ctx *ctx = libxl__gc_owner(gc); > >> struct xc_hvm_build_args args = {}; > >> int ret, rc = ERROR_FAIL; > >>+ libxl_domain_build_info *const info = &d_config->b_info; > >>+ uint64_t rdm_mem_boundary, mmio_start; I didn't mention this in the first pass. You seem to have inserted some tabs? We use space to indent. Wei.