On 19.12.2013 11:19, Ian Campbell wrote:
> On Wed, 2013-12-18 at 17:44 -0700, Jim Fehlig wrote:
>> Stefan Bader wrote:
>>> On 18.12.2013 14:28, Ian Campbell wrote:
>>>
>>>> On Wed, 2013-12-18 at 14:12 +0100, Stefan Bader wrote:
>>>>
>>>>> On 18.12.2013 13:27, Ian Campbell wrote:
>>>>>
>>>>>> On Tue, 2013-12-17 at 18:32 +0100, Stefan Bader wrote:
>>>>>>
>>>>>>>> Might this libxl fix be relevant:
>>>>>>>> commit 5420f26507fc5c9853eb1076401a8658d72669da
>>>>>>>> Author: Jim Fehlig
>>>>>>>> Date: Fri Jan 11 12:22:26 2013 +0000
>>>>>>>>
>>>>>>>> libxl: Set vfb and vkb devid if not done so by the caller
>>>>>>>>
>>>>>>>> Other devices set a sensible devid if the caller has not done so.
>>>>>>>> Do the same for vfb and vkb. While at it, factor out the common code
>>>>>>>> used to determine a sensible devid, so it can be used by other
>>>>>>>> libxl__device_*_add functions.
>>>>>>>>
>>>>>>>> Signed-off-by: Jim Fehlig
>>>>>>>> Acked-by: Ian Campbell
>>>>>>>> Committed-by: Ian Campbell
>>>>>>>>
>>>>>>>> and a follow up in dfeccbeaa. Although the comment implies that nic's
>>>>>>>> were already correctly assigning a devid if the caller specified -1, so
>>>>>>>> I don't know why it doesn't work for you :-(
>>>>>>>>
>>>>>>> Ok, yes, the commit above indeed changes libxl__device_nic_add to call
>>>>>>> libxl__device_nextid for the devid... Just how is this actually called.
>>>>>>> Maybe not sufficient but "git grep libxl__device_nic_add" in the xen code only
>>>>>>> shows the definition and a declaration in libxl_internal.h to me...
>>>>>>>
>>>>>> I have a feeling a macro might be involved...
>>>>>>
>>>>>> Here we go, look for DEFINE_DEVICE_REMOVE in libxl.c. We should really
>>>>>> add the eventual function names in comments to provide grep fodder....
>>>>>>
>>>>> Oh duh, yeah. So in DEFINE_DEVICE_ADD a libxl_device_nic_add is created which
>>>>> calls to libxl__device_nic_add.
>>>>> When I look for the single _ version I find a
>>>>> call from xl_cmdimpl.c and its public declaration in libxl.h.
>>>>> So I guess the bug is that libvirt in the libxl driver never seems to do so
>>>>>
>>>> The macro creates libxl__add_nics which adds the nics from the
>>>> libxl_domain_config->nics array. I don't think libvirt needs to call
>>>> libxl_device_nic_add manually unless it is hotplugging a new nic at
>>>> runtime.
>>>>
>>>>
>>>
>>> Hm, so I think this is the path:
>>>
>>> libxl_domain_create_new
>>>   -> do_domain_create
>>>   -> initiate_domain_create
>>>   -> libxl__bootloader_run (HVM domain, skipping bootloader)
>>>   <- domcreate_bootloader_done
>>>   -> domcreate_rebuild_done
>>>   <- domcreate_launch_dm
>>>   -> libxl__spawn_local_dm
>>>   <- domcreate_devmodel_started
>>>
>>> In libxl__spawn_local_dm, there is the following loop:
>>>
>>>     for (i = 0; i < d_config->num_nics; i++) {
>>>         /* We have to init the nic here, because we still haven't
>>>          * called libxl_device_nic_add at this point, but qemu needs
>>>          * the nic information to be complete.
>>>          */
>>>         ret = libxl__device_nic_setdefault(gc, &d_config->nics[i], domid);
>>>         if (ret)
>>>             goto error_out;
>>>     }
>>>
>>> So I think when starting the dm, the devid just is not set as setdefault does
>>> not seem to do so. It would be done in the later domcreate_devmodel_started
>>> callback but that is too late for the generated qemu arguments.
>>>
>>
>> Sorry for jumping in late...
>>
>> I stumbled across this problem just before openSUSE13.1 released and did
>> a quick fix in libvirt
>>
>> https://build.opensuse.org/package/view_file/Virtualization:openSUSE13.1/libvirt/libxl-hvm-nic.patch?expand=1
>>
>> I removed setting the NIC devid in the libxl driver a while back to be
>> consistent with other devices
>>
>> http://libvirt.org/git/?p=libvirt.git;a=commit;h=ba64b97134a6129a48684f22f31be92c3b6eef96
>>
>> The quick fix was to essentially revert the above commit until I could
>> investigate further.
>> Thank you for now having done that investigation
>> :). Can the devid assignment logic be moved from
>> libxl__device_nic_add() to libxl__device_nic_setdefault()?
>
> It certainly seems like it would be more natural to do it there.
>
> I suspect it might be done this way because at setdefault time you might
> be walking a list of nics none of which have been created yet -- so
> looking in xenstore would return "devid zero is free" for every one of
> them?
>
> How about we:
>  * move the init to setdefault to catch the single NIC added via
>    hotplug case

Init of devid? Hm, would that work? I am not sure there is a simple way of
differentiating between a NIC config for a single hotplug and one that is
part of a create-time array...

>  * we add somewhere early in the domain create path a call to a
>    function which assigns devids to an entire array of devices (and
>    do it for all the different device types). Perhaps in
>    initiate_domain_create() after the calls to
>    libxl__domain_create_info_setdefault and
>    libxl__domain_build_info_setdefault but before the loop calling
>    libxl__device_disk_setdefault for the disks.
>  * perhaps that same function should call setdefault too, after
>    having assigned the device, rather than it being done later in
>    an adhoc way?
>
> Does that sound at all plausible?

I wonder: this won't help for any other device types (maybe that is not
really needed), but what about just adding the following to the existing
loop in domcreate_launch_dm (just a brain dump, I have not even tried to
compile it):

     for (i = 0; i < d_config->num_nics; i++) {
         /* We have to init the nic here, because we still haven't
          * called libxl_device_nic_add at this point, but qemu needs
          * the nic information to be complete.
          */
         ret = libxl__device_nic_setdefault(gc, &d_config->nics[i], domid);
         if (ret)
             goto error_out;
+        if (d_config->nics[i].devid < 0)
+            d_config->nics[i].devid = i;
     }

Of course this again won't work well if the caller had assigned some devids
but not others.
OK, maybe do the loop twice: the first round sets the defaults and picks the
highest pre-assigned devid, and the second round makes sure any still
unassigned ones are set to ++that.

Oh, while we are talking about setdefault: Jim, this is one of the odd things
when moving libvirt from the xm to the xl stack. When no model is specified,
libvirt defaults to the netfront NIC and sets the type. The libxl setdefault
function sets the model to rtl8139 but leaves the type untouched. So setting
no model in the XML config creates a domain with no emulated NIC. This does
not matter once Linux is up, because the emulated devices get unplugged, but
PXE boot will not work. It gets odd because with the old xen (xm) driver, no
model meant rtl8139.

Sigh, and to hijack this thread even further: I noticed some quite unexpected
behaviour when starting a domain through libvirt and then trying to use
"xl list -l" to get config details. "xl list" shows all running domains, but
"xl list -l" produces something like "you have to specify a domain name". I
traced this to libxl_userdata_retrieve, which takes a userdata_userid as an
argument. Libvirt uses "libvirt-xml" for that, while xl uses "xl". This might
be intentional, and the bug is just that we need a better check for missing
userdata, skipping those domains. On the other hand ... it is, after all, in
both cases a domain created and started through libxl...

Stefan

>
> Ian.
>
>