From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:39483) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XPJai-00034B-6f for qemu-devel@nongnu.org; Wed, 03 Sep 2014 19:04:01 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XPJaV-0003b7-Uo for qemu-devel@nongnu.org; Wed, 03 Sep 2014 19:03:52 -0400 Received: from e34.co.us.ibm.com ([32.97.110.152]:44256) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XPJaV-0003am-Np for qemu-devel@nongnu.org; Wed, 03 Sep 2014 19:03:39 -0400 Received: from /spool/local by e34.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Wed, 3 Sep 2014 17:03:37 -0600 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable From: Michael Roth In-Reply-To: References: <1408407718-10835-1-git-send-email-mdroth@linux.vnet.ibm.com> <1408407718-10835-10-git-send-email-mdroth@linux.vnet.ibm.com> Message-ID: <20140903230331.32021.36244@loki> Date: Wed, 03 Sep 2014 18:03:31 -0500 Subject: Re: [Qemu-devel] [PATCH 09/12] spapr_pci: enable basic hotplug operations List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Bharata B Rao Cc: aik@ozlabs.ru, "qemu-devel@nongnu.org" , Alexander Graf , ncmike@ncultra.org, "qemu-ppc@nongnu.org" , tyreld@linux.vnet.ibm.com, nfont@linux.vnet.ibm.com Quoting Bharata B Rao (2014-09-03 05:33:56) > On Tue, Aug 19, 2014 at 5:51 AM, Michael Roth = wrote: > > This enables hotplug for PHB bridges. Upon hotplug we generate the > > OF-nodes required by PAPR specification and IEEE 1275-1994 > > "PCI Bus Binding to Open Firmware" for the device. > > > > We associate the corresponding FDT for these nodes with the DrcEntry > > corresponding to the slot, which will be fetched via > > ibm,configure-connector RTAS calls by the guest as described by PAPR > > specification. The FDT is cleaned up in the case of unplug. > > > > Signed-off-by: Michael Roth > > --- > > hw/ppc/spapr_pci.c | 235 +++++++++++++++++++++++++++++++++++++++++= ++++++-- > > include/hw/ppc/spapr.h | 1 + > > 2 files changed, 228 insertions(+), 8 deletions(-) > > > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > > index 96a57be..23864ab 100644 > > --- a/hw/ppc/spapr_pci.c > > +++ b/hw/ppc/spapr_pci.c > > @@ -87,6 +87,17 @@ > > #define ENCODE_DRC_STATE(val, m, s) \ > > (((uint32_t)(val) << (s)) & (uint32_t)(m)) > > > > +#define FDT_MAX_SIZE 0x10000 > > +#define _FDT(exp) \ > > + do { \ > > + int ret =3D (exp); \ > > + if (ret < 0) { \ > > + return ret; \ > > + } \ > > + } while (0) > > + > > +static void spapr_drc_state_reset(sPAPRDrcEntry *drc_entry); > > + > > static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid) > > { > > sPAPRPHBState *sphb; > > @@ -476,6 +487,22 @@ static void rtas_set_indicator(PowerPCCPU *cpu, sP= APREnvironment *spapr, > > /* encode the new value into the correct bit field */ > > shift =3D INDICATOR_ISOLATION_SHIFT; > > mask =3D INDICATOR_ISOLATION_MASK; > > + if (drc_entry) { > > + /* transition from unisolated to isolated for a hotplug sl= ot > > + * entails completion of guest-side device unplug/cleanup,= so > > + * we can now safely remove the device if qemu is waiting = for > > + * it to be released > > + */ > > + if (DECODE_DRC_STATE(*pind, mask, shift) !=3D indicator_st= ate) { > > + if (indicator_state =3D=3D 0 && drc_entry->awaiting_re= lease) { > > + /* device_del has been called and host is waiting > > + * for guest to release/isolate device, go ahead > > + * and remove it now > > + */ > > + spapr_drc_state_reset(drc_entry); > > + } > > + } > > + } > > break; > > case 9002: /* DR */ > > shift =3D INDICATOR_DR_SHIFT; > > @@ -816,6 +843,198 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *= bus, void *opaque, int devfn) > > return &phb->iommu_as; > > } > > > > +static int spapr_populate_pci_child_dt(PCIDevice *dev, void *fdt, int = offset, > > + int phb_index) > > +{ > > + int slot =3D PCI_SLOT(dev->devfn); > > + char slotname[16]; > > + bool is_bridge =3D 1; > > + sPAPRDrcEntry *drc_entry, *drc_entry_slot; > > + uint32_t reg[RESOURCE_CELLS_TOTAL * 8] =3D { 0 }; > > + uint32_t assigned[RESOURCE_CELLS_TOTAL * 8] =3D { 0 }; > > + int reg_size, assigned_size; > > + > > + drc_entry =3D spapr_phb_to_drc_entry(phb_index + SPAPR_PCI_BASE_BU= ID); > > + g_assert(drc_entry); > > + drc_entry_slot =3D &drc_entry->child_entries[slot]; > > + > > + if (pci_default_read_config(dev, PCI_HEADER_TYPE, 1) =3D=3D > > + PCI_HEADER_TYPE_NORMAL) { > > + is_bridge =3D 0; > > + } > > + > > + _FDT(fdt_setprop_cell(fdt, offset, "vendor-id", > > + pci_default_read_config(dev, PCI_VENDOR_ID, = 2))); > > + _FDT(fdt_setprop_cell(fdt, offset, "device-id", > > + pci_default_read_config(dev, PCI_DEVICE_ID, = 2))); > > + _FDT(fdt_setprop_cell(fdt, offset, "revision-id", > > + pci_default_read_config(dev, PCI_REVISION_ID= , 1))); > > + _FDT(fdt_setprop_cell(fdt, offset, "class-code", > > + pci_default_read_config(dev, PCI_CLASS_DEVIC= E, 2) << 8)); > > + > > + _FDT(fdt_setprop_cell(fdt, offset, "interrupts", > > + pci_default_read_config(dev, PCI_INTERRUPT_P= IN, 1))); > > + > > + /* if this device is NOT a bridge */ > > + if (!is_bridge) { > > + _FDT(fdt_setprop_cell(fdt, offset, "min-grant", > > + pci_default_read_config(dev, PCI_MIN_GNT, 1))); > > + _FDT(fdt_setprop_cell(fdt, offset, "max-latency", > > + pci_default_read_config(dev, PCI_MAX_LAT, 1))); > > + _FDT(fdt_setprop_cell(fdt, offset, "subsystem-id", > > + pci_default_read_config(dev, PCI_SUBSYSTEM_ID, 2))); > > + _FDT(fdt_setprop_cell(fdt, offset, "subsystem-vendor-id", > > + pci_default_read_config(dev, PCI_SUBSYSTEM_VENDOR_ID, 2))); > > + } > > + > > + _FDT(fdt_setprop_cell(fdt, offset, "cache-line-size", > > + pci_default_read_config(dev, PCI_CACHE_LINE_SIZE, 1))); > > + > > + /* the following fdt cells are masked off the pci status register = */ > > + int pci_status =3D pci_default_read_config(dev, PCI_STATUS, 2); > > + _FDT(fdt_setprop_cell(fdt, offset, "devsel-speed", > > + PCI_STATUS_DEVSEL_MASK & pci_status)); > > + _FDT(fdt_setprop_cell(fdt, offset, "fast-back-to-back", > > + PCI_STATUS_FAST_BACK & pci_status)); > > + _FDT(fdt_setprop_cell(fdt, offset, "66mhz-capable", > > + PCI_STATUS_66MHZ & pci_status)); > > + _FDT(fdt_setprop_cell(fdt, offset, "udf-supported", > > + PCI_STATUS_UDF & pci_status)); > > + > > + _FDT(fdt_setprop_string(fdt, offset, "name", "pci")); > > + sprintf(slotname, "Slot %d", slot + phb_index * 32); > > + _FDT(fdt_setprop(fdt, offset, "ibm,loc-code", slotname, strlen(slo= tname))); > > + _FDT(fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", > > + drc_entry_slot->drc_index)); > > + > > + _FDT(fdt_setprop_cell(fdt, offset, "#address-cells", > > + RESOURCE_CELLS_ADDRESS)); > > + _FDT(fdt_setprop_cell(fdt, offset, "#size-cells", > > + RESOURCE_CELLS_SIZE)); > > + _FDT(fdt_setprop_cell(fdt, offset, "ibm,req#msi-x", > > + RESOURCE_CELLS_SIZE)); > > + fill_resource_props(dev, phb_index, reg, ®_size, > > + assigned, &assigned_size); > > + _FDT(fdt_setprop(fdt, offset, "reg", reg, reg_size)); > > + _FDT(fdt_setprop(fdt, offset, "assigned-addresses", > > + assigned, assigned_size)); > > + > > + return 0; > > +} > > + > > +static int spapr_device_hotplug_add(DeviceState *qdev, PCIDevice *dev) > > +{ > > + sPAPRPHBState *phb =3D SPAPR_PCI_HOST_BRIDGE(qdev); > > + sPAPRDrcEntry *drc_entry, *drc_entry_slot; > > + sPAPRConfigureConnectorState *ccs; > > + int slot =3D PCI_SLOT(dev->devfn); > > + int offset, ret; > > + void *fdt_orig, *fdt; > > + char nodename[512]; > > + uint32_t encoded =3D ENCODE_DRC_STATE(INDICATOR_ENTITY_SENSE_PRESE= NT, > > + INDICATOR_ENTITY_SENSE_MASK, > > + INDICATOR_ENTITY_SENSE_SHIFT); > > + > = > I am building on this infrastructure of yours and adding CPU hotplug > support to sPAPR guests. > = > So we start with dr state of UNUSABLE and change it to PRESENT like > above when performing hotplug operation. But after this, in case of > CPU hotplug, the CPU hotplug code in the kernel > (arch/powerpc/platforms/pseries/dlpar.c:dlpar_acquire_drc()) does > get-sensor-state rtas call and expects the dr state to be UNUSABLE. Is > the guest kernel right in expecting dr state to be in UNUSABLE state > like this ? I have in fact disabled this check in the guest kernel to > get a CPU hotplugged successfully, but wanted to understand the state > changes and the expectations from the guest kernel correctly. According to PAPR+ 2.7 13.5.3.3, PRESENT (1): = Returned for logical and physical DR entities when the DR connector is allocated to the OS and the DR entity is present. For physical DR entitie= s, this indicates that the DR connector actually has a DR entity plugged into it. For DR connectors of physical DR entities, the DR connector must be allocated to the OS to return this value, otherwise a status of -3, no su= ch sensor implemented, will be returned from the get-sensor-state RTAS call.= For DR connectors of logical DR entities, the DR connector must be allocated = to the OS to return this value, otherwise a sensor value of 2 or 3 will be returned. = UNUSABLE (2): = Returned for logical DR entities when the DR entity is not currently available to the OS, but may possibly be made available to the OS by call= ing set-indicator with the allocation-state indicator, setting that indicator= to usable. So it seems 'PRESENT' is in fact the right value immediately after PCI hotplug, but it doesn't seem clear from the documentation whether 'PRESENT' or 'UNUSABLE' is more correct immediately after CPU hotplug. What does seem clear as that UNUSABLE does not have any use in the case of PCI devices: just PRESENT/EMPTY(0). But we actually 0-initialize the sensor field for DRCEntrys corresponding to PCI devices, which corresponds to 'EMPTY' (0), so the handling seems correct for PCI devices... And since we already initialize PHB sensors to UNUSABLE in the top-level DRC list, I'm not sure why adjacent CPU entries would be affected by what we do later for PCI devices? It seems like you'd just need to do the equivalent of what we do for PHBs during realize: spapr_add_phb_to_drc_table(sphb->buid, 2 /* Unusable */); So I'm not sure where the need for guest kernel changes is coming from for CPU hotplug. Do you have a snippet of what the initialize/hot_add hooks like in your case? > = > Regards, > Bharata.