From mboxrd@z Thu Jan 1 00:00:00 1970 From: Konrad Rzeszutek Wilk Subject: Is: 'basic pci bridge and root device support. 'Was:Re: Discussion about virtual iommu support for Xen guest Date: Fri, 3 Jun 2016 15:51:33 -0400 Message-ID: <20160603195133.GF20730@char.us.oracle.com> References: <5746B3FA.6020401@intel.com> <5746DF93.8090803@citrix.com> <57480327.60609@intel.com> <78d122f0-c773-7fa3-1258-c551937c508e@intel.com> <575081E8.6070609@citrix.com> <57518B78.6060604@citrix.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="9jxsPFA5p3P2qPhR" Return-path: Content-Disposition: inline In-Reply-To: <57518B78.6060604@citrix.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" To: Andrew Cooper Cc: "Lan, Tianyu" , "yang.zhang.wz@gmail.com" , "Tian, Kevin" , "sstabellini@kernel.org" , "Nakajima, Jun" , "Dong, Eddie" , "ian.jackson@eu.citrix.com" , "xen-devel@lists.xensource.com" , "jbeulich@suse.com" , "anthony.perard@citrix.com" , Roger Pau Monne List-Id: xen-devel@lists.xenproject.org --9jxsPFA5p3P2qPhR Content-Type: text/plain; charset=us-ascii Content-Disposition: inline > For HVMLite, there is specifically no qemu, and we need something which > can function when we want PCI Passthrough to work. I am quite confident > that the correct solution here is to have a basic host bridge/root port > implementation in Xen (as we already have 80% of this already), at which > point we don't need any qemu interaction for PCI Passthrough at all, even > for HVM guests. Could you expand on this a bit? I am asking because some time ago I wrote code in Xen to construct a full view of the bridges->devices hierarchy (including the various branches) so that I could renumber (expand) the bus values on bridges and their devices. This was done solely so that I could use SR-IOV devices on non-SR-IOV capable BIOSes. 
I am wondering how much of the basic functionality (enumeration, keeping track, etc) could be worked in this 'basic host bridge/root port' implementation idea of yours. Attaching the patches. --9jxsPFA5p3P2qPhR Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="0001-pci-On-PCI-dump-device-keyhandler-include-Device-and.patch" >>From 4ea2d880c0250c1278995e5ee7d9e48151c4e4e1 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Tue, 4 Feb 2014 12:52:35 -0500 Subject: [PATCH 1/5] pci: On PCI dump device keyhandler include Device and Vendor ID As it helps in troubleshooting if the initial domain has re-numbered the bus numbers and what Xen sees is not the reality. Signed-off-by: Konrad Rzeszutek Wilk --- xen/drivers/passthrough/pci.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index cdbabc2..5e5097e 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -1211,9 +1211,12 @@ static int _dump_pci_devices(struct pci_seg *pseg, void *arg) list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list ) { - printk("%04x:%02x:%02x.%u - dom %-3d - node %-3d - MSIs < ", + int id = pci_conf_read32(pseg->nr, pdev->bus, PCI_SLOT(pdev->devfn), + PCI_FUNC(pdev->devfn), 0); + printk("%04x:%02x:%02x.%u (%04x:%04x)- dom %-3d - node %-3d - MSIs < ", pseg->nr, pdev->bus, PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn), + id & 0xffff, (id >> 16) & 0xffff, pdev->domain ? pdev->domain->domain_id : -1, (pdev->node != NUMA_NO_NODE) ? 
pdev->node : -1); list_for_each_entry ( msi, &pdev->msi_list, list ) -- 2.5.5 --9jxsPFA5p3P2qPhR Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="0002-DEBUG-Include-upstream-bridge-information.patch" >>From ac79b6cdd20765d30adbff40514e729a2c33e74e Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Tue, 4 Feb 2014 17:01:42 -0500 Subject: [PATCH 2/5] DEBUG: Include upstream bridge information. Signed-off-by: Konrad Rzeszutek Wilk --- xen/drivers/passthrough/pci.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index 5e5097e..ae6df78 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -1213,6 +1213,9 @@ static int _dump_pci_devices(struct pci_seg *pseg, void *arg) { int id = pci_conf_read32(pseg->nr, pdev->bus, PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn), 0); + int rc = 0; + u8 bus, devfn, secbus; + printk("%04x:%02x:%02x.%u (%04x:%04x)- dom %-3d - node %-3d - MSIs < ", pseg->nr, pdev->bus, PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn), @@ -1221,7 +1224,14 @@ static int _dump_pci_devices(struct pci_seg *pseg, void *arg) (pdev->node != NUMA_NO_NODE) ? 
pdev->node : -1); list_for_each_entry ( msi, &pdev->msi_list, list ) printk("%d ", msi->irq); - printk(">\n"); + bus = pdev->bus; + devfn = pdev->devfn; + + rc = find_upstream_bridge( pseg->nr, &bus, &devfn, &secbus ); + if ( rc < 0) + printk(">\n"); + else + printk(">[%02x:%02x.%u]\n", bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); } printk("==== Bus2Bridge %04x ====\n", pseg->nr); spin_lock(&pseg->bus2bridge_lock); -- 2.5.5 --9jxsPFA5p3P2qPhR Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="0003-xen-pci-assign-buses-Renumber-the-bus-if-there-is-a-.patch" >>From 954e04936a7fdbd5a10807b901812f09b382f07a Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Mon, 12 Jan 2015 16:37:32 -0500 Subject: [PATCH 3/5] xen/pci=assign-buses: Renumber the bus if there is a need to (v6). Xen can re-number the PCI buses if there are SR-IOV devices there and the BIOS hadn't done its job. Use pci=assign-buses,verbose to see it work. Signed-off-by: Konrad Rzeszutek Wilk --- xen/arch/x86/setup.c | 2 + xen/drivers/passthrough/pci.c | 647 ++++++++++++++++++++++++++++++++++++++++++ xen/include/xen/pci.h | 1 + 3 files changed, 650 insertions(+) diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c index 90b1b6c..81a7d6d 100644 --- a/xen/arch/x86/setup.c +++ b/xen/arch/x86/setup.c @@ -1474,6 +1474,8 @@ void __init noreturn __start_xen(unsigned long mbi_p) acpi_mmcfg_init(); + early_pci_reassign_busses(); + early_msi_init(); iommu_setup(); /* setup iommu if available */ diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index ae6df78..62b5f85 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -178,6 +178,8 @@ custom_param("pci-phantom", parse_phantom_dev); static u16 __read_mostly command_mask; static u16 __read_mostly bridge_ctl_mask; +static unsigned int __initdata assign_busses; +static unsigned int __initdata verbose; /* * The 'pci' parameter controls certain PCI device aspects. 
@@ -213,6 +215,10 @@ static void __init parse_pci_param(char *s) cmd_mask = PCI_COMMAND_PARITY; brctl_mask = PCI_BRIDGE_CTL_PARITY; } + else if ( !strcmp(s, "assign-buses") ) + assign_busses = 1; + else if ( !strcmp(s, "verbose") ) + verbose = 1; if ( on ) { @@ -1091,6 +1097,647 @@ static int __hwdom_init _setup_hwdom_pci_devices(struct pci_seg *pseg, void *arg return 0; } +/* Move this to its own file */ +#define DEBUG 1 + +struct early_pci_bus; + +struct early_pci_dev { + struct list_head bus_list; /* Linked against 'devices' */ + unsigned int is_serial:1; + unsigned int is_ehci:1; + unsigned int is_sriov:1; + unsigned int is_bridge:1; + u16 vendor; + u16 device; + u8 devfn; + u16 total_vfs; + u16 revision; + u32 class; /* 24-bit class code; a u16 would truncate it */ + struct early_pci_bus *bus; /* On what bus we are. */ + struct early_pci_bus *bridge; /* Ourselves if we are a bridge */ +}; +struct early_pci_bus { + struct list_head next; + struct list_head devices; + struct list_head children; + struct early_pci_bus *parent; /* Bus upstream of us. */ + struct early_pci_dev *self; /* The PCI device that controls this bus. 
*/ + u8 primary; /* The (parent) bus number */ + u8 number; + u8 start; + u8 end; + u8 new_end; /* To be updated too */ + u8 new_start; + u8 new_primary; + u8 old_number; +}; + +static struct list_head __initdata early_buses_list; +#define PCI_CLASS_SERIAL_USB_EHCI 0x0c0320 + +static __init struct early_pci_dev *early_alloc_pci_dev(struct early_pci_bus *bus, + u8 devfn) +{ + struct early_pci_dev *dev; + u8 type; + u16 class_dev, total; + u32 class, id; + unsigned int pos; + + if ( !bus ) + return NULL; + + dev = xzalloc(struct early_pci_dev); + if ( !dev ) + return NULL; + + INIT_LIST_HEAD(&dev->bus_list); + dev->devfn = devfn; + dev->bus = bus; + class = pci_conf_read32(0, bus->number, PCI_SLOT(devfn), PCI_FUNC(devfn), + PCI_CLASS_REVISION); + + dev->revision = class & 0xff; + dev->class = class >> 8; + if ( dev->class == PCI_CLASS_SERIAL_USB_EHCI ) + dev->is_ehci = 1; + + class_dev = pci_conf_read16(0, bus->number, PCI_SLOT(devfn), PCI_FUNC(devfn), + PCI_CLASS_DEVICE); + switch ( class_dev ) + { + case 0x0700: /* single port serial */ + case 0x0702: /* multi port serial */ + case 0x0780: /* other (e.g serial+parallel) */ + dev->is_serial = 1; + default: + break; + } + type = pci_conf_read8(0, bus->number, PCI_SLOT(devfn), PCI_FUNC(devfn), + PCI_HEADER_TYPE); + switch ( type & 0x7f ) + { + case PCI_HEADER_TYPE_BRIDGE: + case PCI_HEADER_TYPE_CARDBUS: + dev->is_bridge = 1; + break; + case PCI_HEADER_TYPE_NORMAL: + pos = pci_find_cap_offset(0, bus->number, PCI_SLOT(devfn), + PCI_FUNC(devfn), PCI_CAP_ID_EXP); + if (!pos) /* Not PCIe */ + break; + pos = pci_find_ext_capability(0, bus->number, devfn, + PCI_EXT_CAP_ID_SRIOV); + if (!pos) /* Not SR-IOV */ + break; + total = pci_conf_read16(0, bus->number, PCI_SLOT(devfn), + PCI_FUNC(devfn), pos + PCI_SRIOV_TOTAL_VF); + if (!total) + break; + dev->is_sriov = 1; + dev->total_vfs = total; + /* Fall through */ + default: + break; + } + id = pci_conf_read32(0, bus->number, PCI_SLOT(devfn), PCI_FUNC(devfn), + PCI_VENDOR_ID); + 
dev->vendor = id & 0xffff; + dev->device = (id >> 16) & 0xffff; + /* In case MCFG is not configured we have our blacklist */ + switch ( dev->vendor ) + { + case 0x8086: /* Intel */ + switch ( dev->device ) + { + case 0x10c9: /* Intel Corporation 82576 Gigabit Network Connection (rev 01) */ + if ( dev->is_sriov ) + break; + dev->is_sriov = 1; + dev->total_vfs = 8; + } + default: + break; + } + return dev; +} + +static __init struct early_pci_bus *__find_bus(struct early_pci_bus *parent, + u8 nr) +{ + struct early_pci_bus *child, *bus; + + if ( parent->number == nr ) + return parent; + + list_for_each_entry ( child, &parent->children, next ) + { + if ( child->number == nr ) + return child; + bus = __find_bus(child, nr); + if ( bus ) + return bus; + } + return NULL; +} + +static __init struct early_pci_bus *find_bus(u8 nr) +{ + struct early_pci_bus *bus, *child; + + list_for_each_entry ( bus, &early_buses_list, next ) + { + child = __find_bus(bus, nr); + if ( child ) + return child; + } + return NULL; +} + +static __init struct early_pci_dev *find_dev(u8 nr, u8 devfn) +{ + struct early_pci_bus *bus = NULL; + + bus = find_bus(nr); + if ( bus ) { + struct early_pci_dev *dev = NULL; + + list_for_each_entry ( dev, &bus->devices, bus_list ) + if ( dev->devfn == devfn ) + return dev; + } + return NULL; +} + +static __init struct early_pci_bus *early_alloc_pci_bus(struct early_pci_dev *dev, u8 nr) +{ + struct early_pci_bus *bus; + + bus = xzalloc(struct early_pci_bus); + if ( !bus ) + return NULL; + + INIT_LIST_HEAD(&bus->next); + INIT_LIST_HEAD(&bus->devices); + INIT_LIST_HEAD(&bus->children); + bus->number = nr; + bus->old_number = nr; + bus->self = dev; + if ( dev ) + if ( !dev->bridge ) + dev->bridge = bus; + return bus; +} + +static void __init early_free_pci_bus(struct early_pci_bus *bus) +{ + struct early_pci_dev *dev, *d_tmp; + struct early_pci_bus *b, *b_tmp; + + list_for_each_entry_safe ( b, b_tmp, &bus->children, next ) + { + early_free_pci_bus (b); + list_del ( 
&b->next ); + } + list_for_each_entry_safe ( dev, d_tmp, &bus->devices, bus_list ) + { + list_del ( &dev->bus_list ); + xfree ( dev ); + } +} + +static void __init early_free_all(void) +{ + struct early_pci_bus *bus, *tmp; + + list_for_each_entry_safe( bus, tmp, &early_buses_list, next ) + { + early_free_pci_bus (bus); + list_del( &bus->next ); + xfree(bus); + } +} + +unsigned int __init pci_iov_scan(struct early_pci_bus *bus) +{ + struct early_pci_dev *dev; + unsigned int max = 0; + u8 busnr; + + list_for_each_entry ( dev, &bus->devices, bus_list ) + { + if ( !dev->is_sriov ) + continue; + if ( !dev->total_vfs ) + continue; + busnr = (dev->total_vfs) / 8; /* How many buses we will need */ + if ( busnr > max ) + max = busnr; + } + /* Do we have enough space for them ? */ + if ( (bus->end - bus->start) >= max ) + return 0; + return max; +} + +#ifdef DEBUG +static __init const char *spaces(unsigned int lvl) +{ + if (lvl == 0) + return " "; + if (lvl == 1) + return " +--+"; + if (lvl == 2) + return " +-+"; + if (lvl == 3) + return " +-+"; + return " +...+"; +} + +static void __init print_devs(struct early_pci_bus *parent, int lvl) +{ + struct early_pci_dev *dev; + struct early_pci_bus *bus; + + list_for_each_entry( dev, &parent->devices, bus_list ) + { + printk("%s%04x:%02x:%u [%04x:%04x] class %06x", spaces(lvl), parent->number, + PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn), dev->vendor, + dev->device, dev->class); + if ( dev->is_bridge ) + { + printk(" BRIDGE"); + if ( dev->bridge ) + { + struct early_pci_bus *bridge = dev->bridge; + printk(" to BUS %x [spans %x->%x] primary BUS %x", bridge->number, bridge->start, bridge->end, bridge->primary); + printk(" (primary: %x spans %x->%x)", bridge->new_primary, bridge->new_start, bridge->new_end); + } + } + if ( dev->is_sriov ) + printk(" sriov: %d", dev->total_vfs); + if ( dev->is_ehci ) + printk (" EHCI DEBUG "); + if ( dev->is_serial ) + printk (" SERIAL "); + printk("\n"); + } + list_for_each_entry( bus, 
&parent->children, next ) + print_devs(bus, lvl + 1); +} +#endif + +static void __init print_devices(void) +{ +#ifdef DEBUG + struct early_pci_bus *bus; + + if ( !verbose ) + return; + + list_for_each_entry( bus, &early_buses_list, next ) + print_devs(bus, 0); +#endif +} + +unsigned int pci_scan_bus( struct early_pci_bus *bus); +unsigned int __init pci_scan_slot(struct early_pci_bus *bus, unsigned int devfn) +{ + struct early_pci_dev *dev; + + if ( find_dev(bus->number, devfn) ) + return 0; + + if ( !pci_device_detect (0, bus->number, PCI_SLOT(devfn), PCI_FUNC(devfn)) ) + return 0; + + dev = early_alloc_pci_dev(bus, devfn); + if ( !dev ) + return -ENODEV; + + list_add_tail(&dev->bus_list, &bus->devices); + return 0; +} + +static int __init pci_scan_bridge(struct early_pci_bus *bus, + struct early_pci_dev *dev, + unsigned int max) +{ + struct early_pci_bus *child; + u32 buses; + u8 primary, secondary, subordinate; + unsigned int cmax = 0; + + buses = pci_conf_read32(0, bus->number, PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn), + PCI_PRIMARY_BUS); + + primary = buses & 0xFF; + secondary = (buses >> 8) & 0xFF; + subordinate = (buses >> 16) & 0xFF; + + if (!primary && (primary != bus->number) && secondary && subordinate) { + printk("Primary bus is hard wired to 0\n"); + primary = bus->number; + } + + child = find_bus(secondary); + if ( !child ) + { + child = early_alloc_pci_bus(dev, secondary); + if ( !child ) + goto out; + /* Add to the parent's bus list */ + list_add_tail(&child->next, &bus->children); + /* The primary is the upstream bus number. 
*/ + child->primary = primary; + child->start = secondary; + child->end = subordinate; + child->parent = bus; + } + cmax = pci_scan_bus(child); + if ( cmax > max ) + max = cmax; + + if ( child->end > max ) + max = child->end; +out: + return max; +} + +unsigned int __init pci_scan_bus( struct early_pci_bus *bus) +{ + unsigned int max = 0, devfn; + struct early_pci_dev *dev; + + for ( devfn = 0; devfn < 0x100; devfn++ ) + pci_scan_slot (bus, devfn); + + /* Walk all devices and create the bus structs */ + list_for_each_entry ( dev, &bus->devices, bus_list ) + { + if ( !dev->is_bridge ) + continue; + if ( verbose ) + printk("Scanning bridge %04x:%02x.%u [%04x:%04x] class %06x\n", bus->number, + PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn), dev->vendor, dev->device, + dev->class); + max = pci_scan_bridge(bus, dev, max); + } + if ( max > bus->end ) + bus->end = max; + return max; +} + +static __init unsigned int adjust_span(struct early_pci_bus *bus, + unsigned int offset) +{ + struct early_pci_bus *child = NULL; + unsigned int scan; + + bus->new_start = bus->start; + bus->new_end = bus->end; + /* We can't check against offset as the loop might have altered it. */ + /* N.B. Ignore host bridges. */ + if ( offset && bus->parent ) + bus->new_start += offset; + + scan = pci_iov_scan(bus); + offset += scan; + + list_for_each_entry( child, &bus->children, next ) + { + unsigned int new_offset; + + new_offset = adjust_span(child , offset); + if ( new_offset > offset ) + /* A new contender ! 
*/ + offset = new_offset; + } + bus->new_end += offset; + return offset; +} + +static __init void adjust_primary(struct early_pci_bus *bus, + unsigned int offset) +{ + struct early_pci_bus *child; + + list_for_each_entry( child, &bus->children, next ) + { + child->new_primary = bus->new_start; + adjust_primary(child, offset); + + } +} + +static void __init pci_disable_forwarding(struct early_pci_bus *parent) +{ + struct early_pci_dev *dev; + u32 buses; + + list_for_each_entry ( dev, &parent->devices, bus_list ) + { + u8 bus; + u16 bctl; + + if ( !dev->is_bridge ) + continue; + + bus = dev->bus->number; + buses = pci_conf_read32(0, bus, PCI_SLOT(dev->devfn), + PCI_FUNC(dev->devfn), PCI_PRIMARY_BUS); + if ( verbose ) + printk("%04x:%02x.%u PCI_PRIMARY_BUS read %x [%s]\n", bus, + PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn), buses, __func__); + /* Lifted from Linux but not sure if this MasterAbort masking is + * still needed. */ + + bctl = pci_conf_read16(0, bus, PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn), + PCI_BRIDGE_CONTROL); + + pci_conf_write16(0, bus, PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn), + PCI_BRIDGE_CONTROL, bctl & ~PCI_BRIDGE_CTL_MASTER_ABORT); + + if ( verbose ) + printk("%04x:%02x.%u clearing PCI_PRIMARY_BUS %x\n", bus, PCI_SLOT(dev->devfn), + PCI_FUNC(dev->devfn), buses & ~0xffffff); + + /* Disable forwarding */ + pci_conf_write32(0, bus, PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn), + PCI_PRIMARY_BUS, buses & ~0xffffff); + + pci_conf_write16(0, bus, PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn), + PCI_BRIDGE_CONTROL, bctl); + } +} + +static void __init __pci_program_bridge(struct early_pci_dev *dev, u8 bus) +{ + u16 bctl; + u32 buses; + struct early_pci_bus *child, *bridges; + u8 primary, secondary, subordinate; + + child = dev->bridge; /* The bridge we are serving and don't use parent. 
*/ + ASSERT( child ); + + buses = pci_conf_read32(0, bus, PCI_SLOT(dev->devfn), + PCI_FUNC(dev->devfn), PCI_PRIMARY_BUS); + if ( verbose ) + printk("%04x:%02x.%u PCI_PRIMARY_BUS read %x [%s]\n", bus, + PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn), buses, __func__); + + /* Lifted from Linux but not sure if this MasterAbort masking is + * still needed. */ + bctl = pci_conf_read16(0, bus, PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn), + PCI_BRIDGE_CONTROL); + pci_conf_write16(0, bus, PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn), + PCI_BRIDGE_CONTROL, bctl & ~PCI_BRIDGE_CTL_MASTER_ABORT); + + pci_conf_write16(0, bus, PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn), + PCI_STATUS, 0xffff); + + buses = (buses & 0xff000000) + | ((unsigned int)(child->new_primary) << 0) + | ((unsigned int)(child->new_start) << 8) + | ((unsigned int)(child->new_end) << 16); + if ( verbose ) + printk("%04x:%02x.%u wrote to PCI_PRIMARY_BUS %x\n", bus, PCI_SLOT(dev->devfn), + PCI_FUNC(dev->devfn), buses); + + pci_conf_write32(0, bus, PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn), + PCI_PRIMARY_BUS, buses); + + pci_conf_write16(0, bus, PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn), + PCI_BRIDGE_CONTROL, bctl); + + /* Double check that it is correct. 
*/ + buses = pci_conf_read32(0, bus, PCI_SLOT(dev->devfn), + PCI_FUNC(dev->devfn), PCI_PRIMARY_BUS); + if ( verbose ) + printk("%04x:%02x.%u PCI_PRIMARY_BUS read %x\n", bus, + PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn), buses); + + primary = buses & 0xFF; + secondary = (buses >> 8) & 0xFF; + subordinate = (buses >> 16) & 0xFF; + + ASSERT(primary == child->new_primary); + ASSERT(secondary == child->new_start); + ASSERT(subordinate == child->new_end); + + child->number = child->new_start; + child->primary = child->new_primary; + child->start = child->new_start; + child->end = child->new_end; + + pci_disable_forwarding( child ); /* Bridges below us */ + + list_for_each_entry ( bridges, &child->children, next ) + { + if ( bridges->self ) + __pci_program_bridge(bridges->self, child->number); + } +} + +static void __init pci_program_bridge(struct early_pci_bus *bus) +{ + struct early_pci_dev *dev; + + list_for_each_entry ( dev, &bus->devices, bus_list ) + { + if ( !dev->is_bridge ) + continue; + __pci_program_bridge(dev, bus->number); + } +} +static void __init update_console_devices(struct early_pci_bus *parent) +{ + struct early_pci_dev *dev; + struct early_pci_bus *bus; + + list_for_each_entry( dev, &parent->devices, bus_list ) + { + if ( dev->is_ehci || dev->is_serial || dev->is_bridge ) + { + ;/* TODO */ + } + } + list_for_each_entry( bus, &parent->children, next ) + update_console_devices(bus); +} + +void __init early_pci_reassign_busses(void) +{ + unsigned int nr; + struct early_pci_bus *bus; + unsigned int max = 0, adjust = 0, last_end; + + if ( !assign_busses ) + return; + + INIT_LIST_HEAD(&early_buses_list); + for ( nr = 0; nr < 256; nr++ ) + { + if ( !pci_device_detect (0, nr, 0, 0) ) + continue; + if ( find_bus(nr) ) + continue; + /* Host bridges do not have any parent devices ! 
*/ + bus = early_alloc_pci_bus(NULL, nr); + if ( !bus ) + goto out; + bus->start = nr; + bus->primary = 0; /* Points to host, which is zero */ + max = pci_scan_bus(bus); + list_add_tail(&bus->next, &early_buses_list); + } + /* Walk all the devices, figure out what will be the _new_ + * max if any. */ + last_end = 0; + list_for_each_entry( bus, &early_buses_list, next ) + { + unsigned int offset; + /* Oh no, the previous end bus number overlaps! */ + if ( last_end > bus->start ) + { + bus->new_start = last_end; + bus->new_end = bus->new_end + last_end; + } + last_end = bus->end; + offset = adjust_span(bus, 0 /* no offset ! */); + if (offset > adjust) { + adjust = offset; + last_end = bus->new_end; + } + adjust_primary(bus, 0); + } + + print_devices(); + if ( !adjust ) + { + printk("No need to reassign busses.\n"); + goto out; + } + printk("Re-assigning busses to make space for %d bus numbers.\n", adjust); + + /* Walk all the bridges, disable forwarding */ + /* Walk all bridges, reprogram with max (so new primary, secondary and such. 
*/ + list_for_each_entry( bus, &early_buses_list, next ) + { + pci_disable_forwarding(bus); + pci_program_bridge(bus); + } + /* Walk all devices, re-enable serial, ehci with new bus number */ + list_for_each_entry( bus, &early_buses_list, next ) + update_console_devices(bus); + + print_devices(); +out: + early_free_all(); +} + void __hwdom_init setup_hwdom_pci_devices( struct domain *d, int (*handler)(u8 devfn, struct pci_dev *)) { diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h index 6ed29dd..ad09cce 100644 --- a/xen/include/xen/pci.h +++ b/xen/include/xen/pci.h @@ -111,6 +111,7 @@ struct pci_dev *pci_lock_domain_pdev( void setup_hwdom_pci_devices(struct domain *, int (*)(u8 devfn, struct pci_dev *)); int pci_release_devices(struct domain *d); +void early_pci_reassign_busses(void); int pci_add_segment(u16 seg); const unsigned long *pci_get_ro_map(u16 seg); int pci_add_device(u16 seg, u8 bus, u8 devfn, -- 2.5.5 --9jxsPFA5p3P2qPhR Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="0004-pci-assign-buses-Suspend-resume-the-console-device-a.patch" >>From fa86138d42b1976b61485c886f1ba280dd23c29d Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Fri, 21 Feb 2014 11:43:51 -0500 Subject: [PATCH 4/5] pci/assign-buses: Suspend/resume the console device and update bus (v2). When we suspend and resume the console devices we need the proper bus number. With us altering the bus numbers we need to update the bus numbers otherwise the console device might reprogram the wrong device. 
Signed-off-by: Konrad Rzeszutek Wilk --- xen/drivers/char/ehci-dbgp.c | 24 +++++++++++++++++++++++- xen/drivers/char/ns16550.c | 37 +++++++++++++++++++++++++++++++++++++ xen/drivers/char/serial.c | 17 +++++++++++++++++ xen/drivers/passthrough/pci.c | 17 ++++++++++++++++- xen/include/xen/serial.h | 7 +++++++ 5 files changed, 100 insertions(+), 2 deletions(-) diff --git a/xen/drivers/char/ehci-dbgp.c b/xen/drivers/char/ehci-dbgp.c index 3feeafe..3266307 100644 --- a/xen/drivers/char/ehci-dbgp.c +++ b/xen/drivers/char/ehci-dbgp.c @@ -1437,7 +1437,27 @@ static void ehci_dbgp_resume(struct serial_port *port) ehci_dbgp_setup_preirq(dbgp); ehci_dbgp_setup_postirq(dbgp); } +static int __init ehci_dbgp_is_owner(struct serial_port *port, u8 bus, u8 devfn) +{ + struct ehci_dbgp *dbgp = port->uart; + if ( dbgp->bus == bus && dbgp->slot == PCI_SLOT(devfn) && + dbgp->func == PCI_FUNC(devfn)) + return 1; + return -ENODEV; +} +static int __init ehci_dbgp_update_bus(struct serial_port *port, u8 old_bus, + u8 devfn, u8 new_bus) +{ + struct ehci_dbgp *dbgp; + + if ( ehci_dbgp_is_owner (port, old_bus, devfn) < 0 ) + return -ENODEV; + + dbgp = port->uart; + dbgp->bus = new_bus; + return 1; +} static struct uart_driver __read_mostly ehci_dbgp_driver = { .init_preirq = ehci_dbgp_init_preirq, .init_postirq = ehci_dbgp_init_postirq, @@ -1447,7 +1467,9 @@ static struct uart_driver __read_mostly ehci_dbgp_driver = { .tx_ready = ehci_dbgp_tx_ready, .putc = ehci_dbgp_putc, .flush = ehci_dbgp_flush, - .getc = ehci_dbgp_getc + .getc = ehci_dbgp_getc, + .is_owner = ehci_dbgp_is_owner, + .update_bus = ehci_dbgp_update_bus, }; static struct ehci_dbgp ehci_dbgp = { .state = dbgp_unsafe, .phys_port = 1 }; diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c index b2b5f56..51b71ee 100644 --- a/xen/drivers/char/ns16550.c +++ b/xen/drivers/char/ns16550.c @@ -821,7 +821,40 @@ static const struct vuart_info *ns16550_vuart_info(struct serial_port *port) return &uart->vuart; } #endif +#ifdef 
HAS_PCI +static int __init ns16550_is_owner(struct serial_port *port, u8 bus, u8 devfn) +{ + struct ns16550 *uart = port->uart; + + if ( uart->ps_bdf_enable ) + { + if ( (bus == uart->ps_bdf[0]) && (uart->ps_bdf[1] == PCI_SLOT(devfn)) && + (uart->ps_bdf[2] == PCI_FUNC(devfn)) ) + return 1; + } + if ( uart->pb_bdf_enable ) + { + if ( (bus == uart->pb_bdf[0]) && (uart->pb_bdf[1] == PCI_SLOT(devfn)) && + (uart->pb_bdf[2] == PCI_FUNC(devfn)) ) + return 1; + } + return -ENODEV; +} +static int __init ns16550_update_bus(struct serial_port *port, u8 old_bus, + u8 devfn, u8 new_bus) +{ + struct ns16550 *uart; + if ( ns16550_is_owner(port, old_bus, devfn ) < 0 ) + return -ENODEV; + uart = port->uart; + if ( uart->ps_bdf_enable ) + uart->ps_bdf[0]= new_bus; + if ( uart->pb_bdf_enable ) + uart->pb_bdf[0] = new_bus; + return 1; +} +#endif static struct uart_driver __read_mostly ns16550_driver = { .init_preirq = ns16550_init_preirq, .init_postirq = ns16550_init_postirq, @@ -835,6 +868,10 @@ static struct uart_driver __read_mostly ns16550_driver = { #ifdef CONFIG_ARM .vuart_info = ns16550_vuart_info, #endif +#ifdef HAS_PCI + .is_owner = ns16550_is_owner, + .update_bus = ns16550_update_bus, +#endif }; static int __init parse_parity_char(int c) diff --git a/xen/drivers/char/serial.c b/xen/drivers/char/serial.c index c583a48..f7b8178 100644 --- a/xen/drivers/char/serial.c +++ b/xen/drivers/char/serial.c @@ -543,6 +543,23 @@ const struct vuart_info *serial_vuart_info(int idx) return NULL; } +int __init serial_is_owner(u8 bus, u8 devfn) +{ + int i; + for ( i = 0; i < ARRAY_SIZE(com); i++ ) + if ( com[i].driver->is_owner ) + return com[i].driver->is_owner(&com[i], bus, devfn); + + return 0; +} +int __init serial_update_bus(u8 old_bus, u8 devfn, u8 new_bus) +{ + int i; + for ( i = 0; i < ARRAY_SIZE(com); i++ ) + if ( com[i].driver->update_bus ) + return com[i].driver->update_bus(&com[i], old_bus, devfn, new_bus); + return 0; +} void serial_suspend(void) { int i; diff --git 
a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index 62b5f85..d7bdbd5 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -1098,6 +1098,7 @@ static int __hwdom_init _setup_hwdom_pci_devices(struct pci_seg *pseg, void *arg } /* Move this to its own file */ +#include <xen/serial.h> #define DEBUG 1 struct early_pci_bus; @@ -1661,7 +1662,14 @@ static void __init update_console_devices(struct early_pci_bus *parent) { if ( dev->is_ehci || dev->is_serial || dev->is_bridge ) { - ;/* TODO */ + int rc = 0; + if ( serial_is_owner(parent->old_number , dev->devfn ) < 0 ) + continue; + rc = serial_update_bus(parent->old_number, dev->devfn, parent->number); + if ( verbose ) + printk("%02x:%02x.%u bus %x -> %x, rc=%d\n", parent->number, + PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn), + parent->old_number, parent->number, rc); } } list_for_each_entry( bus, &parent->children, next ) @@ -1722,6 +1730,10 @@ void __init early_pci_reassign_busses(void) } printk("Re-assigning busses to make space for %d bus numbers.\n", adjust); + /* Walk all the devices, disable serial and ehci */ + if ( !verbose) + serial_suspend(); + /* Walk all the bridges, disable forwarding */ /* Walk all bridges, reprogram with max (so new primary, secondary and such. 
*/ list_for_each_entry( bus, &early_buses_list, next ) @@ -1733,6 +1745,9 @@ void __init early_pci_reassign_busses(void) list_for_each_entry( bus, &early_buses_list, next ) update_console_devices(bus); + if ( !verbose ) + serial_resume(); + print_devices(); out: early_free_all(); diff --git a/xen/include/xen/serial.h b/xen/include/xen/serial.h index 1212a12..2ba9da7 100644 --- a/xen/include/xen/serial.h +++ b/xen/include/xen/serial.h @@ -87,6 +87,10 @@ struct uart_driver { void (*stop_tx)(struct serial_port *); /* Get serial information */ const struct vuart_info *(*vuart_info)(struct serial_port *); + /* Check if the BDF matches this device */ + int (*is_owner)(struct serial_port *, u8 , u8); + /* Update its BDF due to bus number changing. devfn still same. */ + int (*update_bus)(struct serial_port *, u8, u8, u8); }; /* 'Serial handles' are composed from the following fields. */ @@ -140,6 +144,9 @@ int serial_irq(int idx); /* Retrieve basic UART information to emulate it (base address, size...) */ const struct vuart_info* serial_vuart_info(int idx); +int serial_is_owner(u8 bus, u8 devfn); +int serial_update_bus(u8 old_bus, u8 devfn, u8 bus); + /* Serial suspend/resume. 
*/ void serial_suspend(void); void serial_resume(void); -- 2.5.5 --9jxsPFA5p3P2qPhR Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="0005-pci-assign-busses-Add-Mellenox.patch" >>From 867032964e63165019487112b6317c6e75437bee Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Thu, 24 Apr 2014 20:29:50 -0400 Subject: [PATCH 5/5] pci/assign-busses: Add Mellenox --- xen/drivers/passthrough/pci.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index d7bdbd5..455aed5 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -1217,7 +1217,20 @@ static __init struct early_pci_dev *early_alloc_pci_dev(struct early_pci_bus *bu break; dev->is_sriov = 1; dev->total_vfs = 8; + break; + } + break; + case 0x15b3: + switch ( dev->device ) + { + case 0x673c: /* InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] */ + if ( dev->is_sriov ) + break; + dev->is_sriov = 1; + dev->total_vfs = 64; + break; } + break; default: break; } -- 2.5.5 --9jxsPFA5p3P2qPhR--