All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges
@ 2012-04-11  0:10 Jiang Liu
  2012-04-11  0:10 ` [PATCH V4 1/6] PCI, x86: split out pci_mmcfg_check_reserved() for code reuse Jiang Liu
                   ` (6 more replies)
  0 siblings, 7 replies; 26+ messages in thread
From: Jiang Liu @ 2012-04-11  0:10 UTC (permalink / raw)
  To: Yinghai Lu, Kenji Kaneshige, Taku Izumi
  Cc: Jiang Liu, Bjorn Helgaas, Jiang Liu, Keping Chen, linux-pci

This patchset enhance pci_root driver to update MMCFG information when
hot-plugging PCI root bridges. It applies to Yinghai's tree at
git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-pci-root-bus-hotplug

The second patch is based on Taku Izumi work with some enhancements to
correctly handle PCI host bridges without _CBA method.

-v2: split into smaller patches and skip updating MMCFG information when
     MMCFG is disabled
-v3: add mmconf_added to simply free path, also make pci_mmconfig_insert()
     to process extra exist case --- By Yinghai
-v4: tune arch_acpi_pci_root_add() to handle a corner case raised by Kenji

Jiang Liu (6):
  PCI, x86: split out pci_mmcfg_check_reserved() for code reuse
  PCI, x86: split out pci_mmconfig_alloc() for code reuse
  PCI, x86: use RCU list to protect mmconfig list
  PCI, x86: introduce pci_mmcfg_arch_map()/pci_mmcfg_arch_unmap()
  PCI, x86: introduce pci_mmconfig_insert()/delete() for PCI root
    bridge hotplug
  PCI, ACPI, x86: update MMCFG information when hot-plugging PCI host
    bridges

 arch/x86/include/asm/pci_x86.h |    5 +
 arch/x86/pci/acpi.c            |   58 +++++++++++++
 arch/x86/pci/mmconfig-shared.c |  185 +++++++++++++++++++++++++++++++---------
 arch/x86/pci/mmconfig_32.c     |   30 ++++++-
 arch/x86/pci/mmconfig_64.c     |   37 +++++++-
 drivers/acpi/pci_root.c        |   16 ++++
 include/acpi/acnames.h         |    1 +
 include/acpi/acpi_bus.h        |    1 +
 include/linux/pci-acpi.h       |    3 +
 9 files changed, 287 insertions(+), 49 deletions(-)

-- 
1.7.5.4


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH V4 1/6] PCI, x86: split out pci_mmcfg_check_reserved() for code reuse
  2012-04-11  0:10 [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges Jiang Liu
@ 2012-04-11  0:10 ` Jiang Liu
  2012-04-11  0:10 ` [PATCH V4 2/6] PCI, x86: split out pci_mmconfig_alloc() " Jiang Liu
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: Jiang Liu @ 2012-04-11  0:10 UTC (permalink / raw)
  To: Yinghai Lu, Kenji Kaneshige, Taku Izumi
  Cc: Jiang Liu, Bjorn Helgaas, Jiang Liu, Keping Chen, linux-pci

Split out pci_mmcfg_check_reserved() for code reuse, which will be used
when supporting PCI host bridge hotplug.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
---
 arch/x86/pci/mmconfig-shared.c |   51 +++++++++++++++++++--------------------
 1 files changed, 25 insertions(+), 26 deletions(-)

diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 301e325..f799949 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -474,39 +474,38 @@ static int __init is_mmconf_reserved(check_reserved_t is_reserved,
 	return valid;
 }
 
+static int __devinit pci_mmcfg_check_reserved(struct pci_mmcfg_region *cfg,
+					      int early)
+{
+	if (!early && !acpi_disabled) {
+		if (is_mmconf_reserved(is_acpi_reserved, cfg, 0))
+			return 1;
+		else
+			printk(KERN_ERR FW_BUG PREFIX
+			       "MMCONFIG at %pR not reserved in "
+			       "ACPI motherboard resources\n",
+			       &cfg->res);
+	}
+
+	/* Don't try to do this check unless configuration
+	   type 1 is available. how about type 2 ?*/
+	if (raw_pci_ops)
+		return is_mmconf_reserved(e820_all_mapped, cfg, 1);
+
+	return 0;
+}
+
 static void __init pci_mmcfg_reject_broken(int early)
 {
 	struct pci_mmcfg_region *cfg;
 
 	list_for_each_entry(cfg, &pci_mmcfg_list, list) {
-		int valid = 0;
-
-		if (!early && !acpi_disabled) {
-			valid = is_mmconf_reserved(is_acpi_reserved, cfg, 0);
-
-			if (valid)
-				continue;
-			else
-				printk(KERN_ERR FW_BUG PREFIX
-				       "MMCONFIG at %pR not reserved in "
-				       "ACPI motherboard resources\n",
-				       &cfg->res);
+		if (pci_mmcfg_check_reserved(cfg, early) == 0) {
+			printk(KERN_INFO PREFIX "not using MMCONFIG\n");
+			free_all_mmcfg();
+			return;
 		}
-
-		/* Don't try to do this check unless configuration
-		   type 1 is available. how about type 2 ?*/
-		if (raw_pci_ops)
-			valid = is_mmconf_reserved(e820_all_mapped, cfg, 1);
-
-		if (!valid)
-			goto reject;
 	}
-
-	return;
-
-reject:
-	printk(KERN_INFO PREFIX "not using MMCONFIG\n");
-	free_all_mmcfg();
 }
 
 static int __initdata known_bridge;
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH V4 2/6] PCI, x86: split out pci_mmconfig_alloc() for code reuse
  2012-04-11  0:10 [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges Jiang Liu
  2012-04-11  0:10 ` [PATCH V4 1/6] PCI, x86: split out pci_mmcfg_check_reserved() for code reuse Jiang Liu
@ 2012-04-11  0:10 ` Jiang Liu
  2012-04-11  0:11 ` [PATCH V4 3/6] PCI, x86: use RCU list to protect mmconfig list Jiang Liu
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: Jiang Liu @ 2012-04-11  0:10 UTC (permalink / raw)
  To: Yinghai Lu, Kenji Kaneshige, Taku Izumi
  Cc: Jiang Liu, Bjorn Helgaas, Jiang Liu, Keping Chen, linux-pci

Split out pci_mmconfig_alloc() for code reuse, which will be used
when supporting PCI root bridge hotplug.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
---
 arch/x86/pci/mmconfig-shared.c |   19 +++++++++++++++----
 1 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index f799949..5e2cd2a 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -61,8 +61,9 @@ static __init void list_add_sorted(struct pci_mmcfg_region *new)
 	list_add_tail(&new->list, &pci_mmcfg_list);
 }
 
-static __init struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
-							int end, u64 addr)
+static __devinit struct pci_mmcfg_region *pci_mmconfig_alloc(int segment,
+							     int start,
+							     int end, u64 addr)
 {
 	struct pci_mmcfg_region *new;
 	struct resource *res;
@@ -79,8 +80,6 @@ static __init struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
 	new->start_bus = start;
 	new->end_bus = end;
 
-	list_add_sorted(new);
-
 	res = &new->res;
 	res->start = addr + PCI_MMCFG_BUS_OFFSET(start);
 	res->end = addr + PCI_MMCFG_BUS_OFFSET(end + 1) - 1;
@@ -96,6 +95,18 @@ static __init struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
 	return new;
 }
 
+static __init struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
+							int end, u64 addr)
+{
+	struct pci_mmcfg_region *new;
+
+	new = pci_mmconfig_alloc(segment, start, end, addr);
+	if (new)
+		list_add_sorted(new);
+
+	return new;
+}
+
 struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus)
 {
 	struct pci_mmcfg_region *cfg;
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH V4 3/6] PCI, x86: use RCU list to protect mmconfig list
  2012-04-11  0:10 [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges Jiang Liu
  2012-04-11  0:10 ` [PATCH V4 1/6] PCI, x86: split out pci_mmcfg_check_reserved() for code reuse Jiang Liu
  2012-04-11  0:10 ` [PATCH V4 2/6] PCI, x86: split out pci_mmconfig_alloc() " Jiang Liu
@ 2012-04-11  0:11 ` Jiang Liu
  2012-04-11  0:11 ` [PATCH V4 4/6] PCI, x86: introduce pci_mmcfg_arch_map()/pci_mmcfg_arch_unmap() Jiang Liu
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: Jiang Liu @ 2012-04-11  0:11 UTC (permalink / raw)
  To: Yinghai Lu, Kenji Kaneshige, Taku Izumi
  Cc: Jiang Liu, Bjorn Helgaas, Jiang Liu, Keping Chen, linux-pci

Use RCU list to protect mmconfig list from dynamic change
when supporting PCI host bridge hotplug.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
---
 arch/x86/pci/mmconfig-shared.c |   11 ++++++-----
 arch/x86/pci/mmconfig_32.c     |   13 +++++++++++--
 arch/x86/pci/mmconfig_64.c     |   13 +++++++++++--
 3 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 5e2cd2a..3bcc361 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -17,6 +17,7 @@
 #include <linux/bitmap.h>
 #include <linux/dmi.h>
 #include <linux/slab.h>
+#include <linux/rculist.h>
 #include <asm/e820.h>
 #include <asm/pci_x86.h>
 #include <asm/acpi.h>
@@ -45,20 +46,20 @@ static __init void free_all_mmcfg(void)
 		pci_mmconfig_remove(cfg);
 }
 
-static __init void list_add_sorted(struct pci_mmcfg_region *new)
+static __devinit void list_add_sorted(struct pci_mmcfg_region *new)
 {
 	struct pci_mmcfg_region *cfg;
 
 	/* keep list sorted by segment and starting bus number */
-	list_for_each_entry(cfg, &pci_mmcfg_list, list) {
+	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) {
 		if (cfg->segment > new->segment ||
 		    (cfg->segment == new->segment &&
 		     cfg->start_bus >= new->start_bus)) {
-			list_add_tail(&new->list, &cfg->list);
+			list_add_tail_rcu(&new->list, &cfg->list);
 			return;
 		}
 	}
-	list_add_tail(&new->list, &pci_mmcfg_list);
+	list_add_tail_rcu(&new->list, &pci_mmcfg_list);
 }
 
 static __devinit struct pci_mmcfg_region *pci_mmconfig_alloc(int segment,
@@ -111,7 +112,7 @@ struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus)
 {
 	struct pci_mmcfg_region *cfg;
 
-	list_for_each_entry(cfg, &pci_mmcfg_list, list)
+	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
 		if (cfg->segment == segment &&
 		    cfg->start_bus <= bus && bus <= cfg->end_bus)
 			return cfg;
diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c
index 5372e86..5dad04a 100644
--- a/arch/x86/pci/mmconfig_32.c
+++ b/arch/x86/pci/mmconfig_32.c
@@ -11,6 +11,7 @@
 
 #include <linux/pci.h>
 #include <linux/init.h>
+#include <linux/rcupdate.h>
 #include <asm/e820.h>
 #include <asm/pci_x86.h>
 #include <acpi/acpi.h>
@@ -60,9 +61,12 @@ err:		*value = -1;
 		return -EINVAL;
 	}
 
+	rcu_read_lock();
 	base = get_base_addr(seg, bus, devfn);
-	if (!base)
+	if (!base) {
+		rcu_read_unlock();
 		goto err;
+	}
 
 	raw_spin_lock_irqsave(&pci_config_lock, flags);
 
@@ -80,6 +84,7 @@ err:		*value = -1;
 		break;
 	}
 	raw_spin_unlock_irqrestore(&pci_config_lock, flags);
+	rcu_read_unlock();
 
 	return 0;
 }
@@ -93,9 +98,12 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
 	if ((bus > 255) || (devfn > 255) || (reg > 4095))
 		return -EINVAL;
 
+	rcu_read_lock();
 	base = get_base_addr(seg, bus, devfn);
-	if (!base)
+	if (!base) {
+		rcu_read_unlock();
 		return -EINVAL;
+	}
 
 	raw_spin_lock_irqsave(&pci_config_lock, flags);
 
@@ -113,6 +121,7 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
 		break;
 	}
 	raw_spin_unlock_irqrestore(&pci_config_lock, flags);
+	rcu_read_unlock();
 
 	return 0;
 }
diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
index 915a493..acc48c5 100644
--- a/arch/x86/pci/mmconfig_64.c
+++ b/arch/x86/pci/mmconfig_64.c
@@ -9,6 +9,7 @@
 #include <linux/init.h>
 #include <linux/acpi.h>
 #include <linux/bitmap.h>
+#include <linux/rcupdate.h>
 #include <asm/e820.h>
 #include <asm/pci_x86.h>
 
@@ -34,9 +35,12 @@ err:		*value = -1;
 		return -EINVAL;
 	}
 
+	rcu_read_lock();
 	addr = pci_dev_base(seg, bus, devfn);
-	if (!addr)
+	if (!addr) {
+		rcu_read_unlock();
 		goto err;
+	}
 
 	switch (len) {
 	case 1:
@@ -49,6 +53,7 @@ err:		*value = -1;
 		*value = mmio_config_readl(addr + reg);
 		break;
 	}
+	rcu_read_unlock();
 
 	return 0;
 }
@@ -62,9 +67,12 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
 	if (unlikely((bus > 255) || (devfn > 255) || (reg > 4095)))
 		return -EINVAL;
 
+	rcu_read_lock();
 	addr = pci_dev_base(seg, bus, devfn);
-	if (!addr)
+	if (!addr) {
+		rcu_read_unlock();
 		return -EINVAL;
+	}
 
 	switch (len) {
 	case 1:
@@ -77,6 +85,7 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
 		mmio_config_writel(addr + reg, value);
 		break;
 	}
+	rcu_read_unlock();
 
 	return 0;
 }
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH V4 4/6] PCI, x86: introduce pci_mmcfg_arch_map()/pci_mmcfg_arch_unmap()
  2012-04-11  0:10 [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges Jiang Liu
                   ` (2 preceding siblings ...)
  2012-04-11  0:11 ` [PATCH V4 3/6] PCI, x86: use RCU list to protect mmconfig list Jiang Liu
@ 2012-04-11  0:11 ` Jiang Liu
  2012-04-11  0:11 ` [PATCH V4 5/6] PCI, x86: introduce pci_mmconfig_insert()/delete() for PCI root bridge hotplug Jiang Liu
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: Jiang Liu @ 2012-04-11  0:11 UTC (permalink / raw)
  To: Yinghai Lu, Kenji Kaneshige, Taku Izumi
  Cc: Jiang Liu, Bjorn Helgaas, Jiang Liu, Keping Chen, linux-pci

Introduce pci_mmcfg_arch_map()/pci_mmcfg_arch_unmap(), which will be used
when supporting PCI root bridge hotplug.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
---
 arch/x86/include/asm/pci_x86.h |    2 ++
 arch/x86/pci/mmconfig_32.c     |   15 +++++++++++++++
 arch/x86/pci/mmconfig_64.c     |   22 +++++++++++++++++++++-
 3 files changed, 38 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index b3a5317..df898ce 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -135,6 +135,8 @@ struct pci_mmcfg_region {
 
 extern int __init pci_mmcfg_arch_init(void);
 extern void __init pci_mmcfg_arch_free(void);
+extern int __devinit pci_mmcfg_arch_map(struct pci_mmcfg_region *cfg);
+extern void pci_mmcfg_arch_unmap(struct pci_mmcfg_region *cfg);
 extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
 
 extern struct list_head pci_mmcfg_list;
diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c
index 5dad04a..a22785d 100644
--- a/arch/x86/pci/mmconfig_32.c
+++ b/arch/x86/pci/mmconfig_32.c
@@ -141,3 +141,18 @@ int __init pci_mmcfg_arch_init(void)
 void __init pci_mmcfg_arch_free(void)
 {
 }
+
+int __devinit pci_mmcfg_arch_map(struct pci_mmcfg_region *cfg)
+{
+	return 0;
+}
+
+void pci_mmcfg_arch_unmap(struct pci_mmcfg_region *cfg)
+{
+	unsigned long flags;
+
+	/* Invalidate the cached mmcfg map entry. */
+	raw_spin_lock_irqsave(&pci_config_lock, flags);
+	mmcfg_last_accessed_device = 0;
+	raw_spin_unlock_irqrestore(&pci_config_lock, flags);
+}
diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
index acc48c5..4e05779 100644
--- a/arch/x86/pci/mmconfig_64.c
+++ b/arch/x86/pci/mmconfig_64.c
@@ -95,7 +95,7 @@ static const struct pci_raw_ops pci_mmcfg = {
 	.write =	pci_mmcfg_write,
 };
 
-static void __iomem * __init mcfg_ioremap(struct pci_mmcfg_region *cfg)
+static void __iomem * __devinit mcfg_ioremap(struct pci_mmcfg_region *cfg)
 {
 	void __iomem *addr;
 	u64 start, size;
@@ -138,3 +138,23 @@ void __init pci_mmcfg_arch_free(void)
 		}
 	}
 }
+
+int __devinit pci_mmcfg_arch_map(struct pci_mmcfg_region *cfg)
+{
+	cfg->virt = mcfg_ioremap(cfg);
+	if (!cfg->virt) {
+		printk(KERN_ERR PREFIX "can't map MMCONFIG at %pR\n",
+		       &cfg->res);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+void pci_mmcfg_arch_unmap(struct pci_mmcfg_region *cfg)
+{
+	if (cfg && cfg->virt) {
+		iounmap(cfg->virt + PCI_MMCFG_BUS_OFFSET(cfg->start_bus));
+		cfg->virt = NULL;
+	}
+}
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH V4 5/6] PCI, x86: introduce pci_mmconfig_insert()/delete() for PCI root bridge hotplug
  2012-04-11  0:10 [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges Jiang Liu
                   ` (3 preceding siblings ...)
  2012-04-11  0:11 ` [PATCH V4 4/6] PCI, x86: introduce pci_mmcfg_arch_map()/pci_mmcfg_arch_unmap() Jiang Liu
@ 2012-04-11  0:11 ` Jiang Liu
  2012-04-11  0:11 ` [PATCH V4 6/6] PCI, ACPI, x86: update MMCFG information when hot-plugging PCI host bridges Jiang Liu
  2012-04-11  4:02 ` [PATCH V4 0/6] PCI, " Bjorn Helgaas
  6 siblings, 0 replies; 26+ messages in thread
From: Jiang Liu @ 2012-04-11  0:11 UTC (permalink / raw)
  To: Yinghai Lu, Kenji Kaneshige, Taku Izumi
  Cc: Jiang Liu, Bjorn Helgaas, Jiang Liu, Keping Chen, linux-pci

Introduce pci_mmconfig_insert()/pci_mmconfig_delete(), which will be used to
update MMCFG information when supporting PCI root bridge hotplug.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
---
 arch/x86/include/asm/pci_x86.h |    2 +
 arch/x86/pci/mmconfig-shared.c |  104 +++++++++++++++++++++++++++++++++++++---
 2 files changed, 99 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index df898ce..3252c97 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -137,6 +137,8 @@ extern int __init pci_mmcfg_arch_init(void);
 extern void __init pci_mmcfg_arch_free(void);
 extern int __devinit pci_mmcfg_arch_map(struct pci_mmcfg_region *cfg);
 extern void pci_mmcfg_arch_unmap(struct pci_mmcfg_region *cfg);
+extern int __devinit pci_mmconfig_insert(int seg, int start, int end, u64 addr);
+extern int pci_mmconfig_delete(int seg, int start, int end);
 extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
 
 extern struct list_head pci_mmcfg_list;
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 3bcc361..ee1eaab 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -17,6 +17,7 @@
 #include <linux/bitmap.h>
 #include <linux/dmi.h>
 #include <linux/slab.h>
+#include <linux/mutex.h>
 #include <linux/rculist.h>
 #include <asm/e820.h>
 #include <asm/pci_x86.h>
@@ -376,8 +377,8 @@ static void __init pci_mmcfg_insert_resources(void)
 	pci_mmcfg_resources_inserted = 1;
 }
 
-static acpi_status __init check_mcfg_resource(struct acpi_resource *res,
-					      void *data)
+static acpi_status __devinit check_mcfg_resource(struct acpi_resource *res,
+						 void *data)
 {
 	struct resource *mcfg_res = data;
 	struct acpi_resource_address64 address;
@@ -413,8 +414,8 @@ static acpi_status __init check_mcfg_resource(struct acpi_resource *res,
 	return AE_OK;
 }
 
-static acpi_status __init find_mboard_resource(acpi_handle handle, u32 lvl,
-		void *context, void **rv)
+static acpi_status __devinit find_mboard_resource(acpi_handle handle, u32 lvl,
+						  void *context, void **rv)
 {
 	struct resource *mcfg_res = context;
 
@@ -427,7 +428,7 @@ static acpi_status __init find_mboard_resource(acpi_handle handle, u32 lvl,
 	return AE_OK;
 }
 
-static int __init is_acpi_reserved(u64 start, u64 end, unsigned not_used)
+static int __devinit is_acpi_reserved(u64 start, u64 end, unsigned not_used)
 {
 	struct resource mcfg_res;
 
@@ -446,8 +447,9 @@ static int __init is_acpi_reserved(u64 start, u64 end, unsigned not_used)
 
 typedef int (*check_reserved_t)(u64 start, u64 end, unsigned type);
 
-static int __init is_mmconf_reserved(check_reserved_t is_reserved,
-				    struct pci_mmcfg_region *cfg, int with_e820)
+static int __devinit is_mmconf_reserved(check_reserved_t is_reserved,
+					struct pci_mmcfg_region *cfg,
+					int with_e820)
 {
 	u64 addr = cfg->res.start;
 	u64 size = resource_size(&cfg->res);
@@ -676,3 +678,91 @@ static int __init pci_mmcfg_late_insert_resources(void)
  * with other system resources.
  */
 late_initcall(pci_mmcfg_late_insert_resources);
+
+static DEFINE_MUTEX(pci_mmcfg_lock);
+
+/* Add MMCFG information for hot-added host bridges at runtime */
+int __devinit pci_mmconfig_insert(int segment, int start, int end, u64 addr)
+{
+	int rc = -EINVAL;
+	struct pci_mmcfg_region *cfg = NULL;
+
+	if (segment < 0 || segment > USHRT_MAX ||
+	    start < 0 || start > 255 || end < start || end > 255)
+		return rc;
+
+	mutex_lock(&pci_mmcfg_lock);
+	cfg = pci_mmconfig_lookup(segment, start);
+	if (cfg) {
+		if (cfg->start_bus <= start && cfg->end_bus >= end) {
+			rc = -EEXIST;
+		} else {
+			printk(KERN_WARNING PREFIX
+			       "MMCONFIG for domain %04x [bus %02x-%02x] "
+			       "conflicts with domain %04x [bus %02x-%02x]\n",
+			       segment, start, end,
+			       cfg->segment, cfg->start_bus, cfg->end_bus);
+		}
+		goto out;
+	}
+	if (!addr)
+		goto out;
+
+	cfg = pci_mmconfig_alloc(segment, start, end, addr);
+	if (cfg == NULL) {
+		rc = -ENOMEM;
+	} else if (!pci_mmcfg_check_reserved(cfg, 0)) {
+		printk(KERN_WARNING PREFIX
+		       "MMCONFIG for domain %04x [bus %02x-%02x] "
+		       "isn't reserved\n", segment, start, end);
+	} else if (insert_resource(&iomem_resource, &cfg->res)) {
+		rc = -EBUSY;
+		printk(KERN_WARNING PREFIX
+		       "failed to insert resource for domain "
+		       "%04x [bus %02x-%02x]\n", segment, start, end);
+	} else if (pci_mmcfg_arch_map(cfg)) {
+		rc = -EBUSY;
+		printk(KERN_WARNING PREFIX
+		       "failed to map resource for domain "
+		       "%04x [bus %02x-%02x]\n", segment, start, end);
+	} else {
+		list_add_sorted(cfg);
+		cfg = NULL;
+		rc = 0;
+	}
+
+	if (cfg) {
+		if (cfg->res.parent)
+			release_resource(&cfg->res);
+		kfree(cfg);
+	}
+
+out:
+	mutex_unlock(&pci_mmcfg_lock);
+
+	return rc;
+}
+
+/* Delete MMCFG information at runtime */
+int pci_mmconfig_delete(int segment, int start, int end)
+{
+	struct pci_mmcfg_region *cfg;
+
+	mutex_lock(&pci_mmcfg_lock);
+	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) {
+		if (cfg->segment == segment && cfg->start_bus == start &&
+		    cfg->end_bus == end) {
+			list_del_rcu(&cfg->list);
+			synchronize_rcu();
+			pci_mmcfg_arch_unmap(cfg);
+			if (cfg->res.parent)
+				release_resource(&cfg->res);
+			mutex_unlock(&pci_mmcfg_lock);
+			kfree(cfg);
+			return 0;
+		}
+	}
+	mutex_unlock(&pci_mmcfg_lock);
+
+	return -ENOENT;
+}
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH V4 6/6] PCI, ACPI, x86: update MMCFG information when hot-plugging PCI host bridges
  2012-04-11  0:10 [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges Jiang Liu
                   ` (4 preceding siblings ...)
  2012-04-11  0:11 ` [PATCH V4 5/6] PCI, x86: introduce pci_mmconfig_insert()/delete() for PCI root bridge hotplug Jiang Liu
@ 2012-04-11  0:11 ` Jiang Liu
  2012-04-18  6:47   ` Taku Izumi
  2012-04-11  4:02 ` [PATCH V4 0/6] PCI, " Bjorn Helgaas
  6 siblings, 1 reply; 26+ messages in thread
From: Jiang Liu @ 2012-04-11  0:11 UTC (permalink / raw)
  To: Yinghai Lu, Kenji Kaneshige, Taku Izumi
  Cc: Jiang Liu, Bjorn Helgaas, Jiang Liu, Keping Chen, linux-pci

This patch enhances pci_root driver to update MMCFG information when
hot-plugging PCI root bridges on x86 platforms.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
---
 arch/x86/include/asm/pci_x86.h |    1 +
 arch/x86/pci/acpi.c            |   58 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/pci/mmconfig_32.c     |    2 +-
 arch/x86/pci/mmconfig_64.c     |    2 +-
 drivers/acpi/pci_root.c        |   16 +++++++++++
 include/acpi/acnames.h         |    1 +
 include/acpi/acpi_bus.h        |    1 +
 include/linux/pci-acpi.h       |    3 ++
 8 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index 3252c97..5e1d9a6 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -100,6 +100,7 @@ struct pci_raw_ops {
 extern const struct pci_raw_ops *raw_pci_ops;
 extern const struct pci_raw_ops *raw_pci_ext_ops;
 
+extern const struct pci_raw_ops pci_mmcfg;
 extern const struct pci_raw_ops pci_direct_conf1;
 extern bool port_cf9_safe;
 
diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index da0149d..729d694 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -4,6 +4,7 @@
 #include <linux/irq.h>
 #include <linux/dmi.h>
 #include <linux/slab.h>
+#include <linux/pci-acpi.h>
 #include <asm/numa.h>
 #include <asm/pci_x86.h>
 
@@ -488,6 +489,63 @@ struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root *root)
 	return bus;
 }
 
+int __devinit arch_acpi_pci_root_add(struct acpi_pci_root *root)
+{
+	int result = 0;
+	acpi_status status;
+	unsigned long long base_addr;
+
+	/* MMCFG not in use */
+	if (raw_pci_ext_ops && raw_pci_ext_ops != &pci_mmcfg)
+		return result;
+
+	status = acpi_evaluate_integer(root->device->handle, METHOD_NAME__CBA,
+				       NULL, &base_addr);
+	if (ACPI_FAILURE(status))
+		base_addr = 0;
+
+	/*
+	 * Try to insert MMCFG information for host bridge
+	 */
+	result = pci_mmconfig_insert(root->segment, root->secondary.start,
+				     root->secondary.end, base_addr);
+	if (result == -EEXIST)
+		result = 0;
+	else if (!result) {
+		root->mmconf_added = true;
+
+		/* Try to enable MMCFG if it hasn't been enabled yet */
+		if (raw_pci_ext_ops == NULL) {
+			if (pci_probe & PCI_PROBE_MMCONF)
+				raw_pci_ext_ops = &pci_mmcfg;
+			else {
+				arch_acpi_pci_root_remove(root);
+				result = -ENODEV;
+			}
+		}
+	}
+	if (result)
+		printk(KERN_ERR
+			"can't add MMCFG information for Bus %04x:%02x\n",
+			root->segment, (unsigned int)root->secondary.start);
+
+	/* still could use conf1 with it */
+	if (!root->segment)
+		result = 0;
+
+	return result;
+}
+
+void arch_acpi_pci_root_remove(struct acpi_pci_root *root)
+{
+	if (root->mmconf_added) {
+		pci_mmconfig_delete(root->segment,
+				    root->secondary.start,
+				    root->secondary.end);
+		root->mmconf_added = false;
+	}
+}
+
 int __init pci_acpi_init(void)
 {
 	struct pci_dev *dev = NULL;
diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c
index a22785d..db63ac2 100644
--- a/arch/x86/pci/mmconfig_32.c
+++ b/arch/x86/pci/mmconfig_32.c
@@ -126,7 +126,7 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
 	return 0;
 }
 
-static const struct pci_raw_ops pci_mmcfg = {
+const struct pci_raw_ops pci_mmcfg = {
 	.read =		pci_mmcfg_read,
 	.write =	pci_mmcfg_write,
 };
diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
index 4e05779..34c08dd 100644
--- a/arch/x86/pci/mmconfig_64.c
+++ b/arch/x86/pci/mmconfig_64.c
@@ -90,7 +90,7 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
 	return 0;
 }
 
-static const struct pci_raw_ops pci_mmcfg = {
+const struct pci_raw_ops pci_mmcfg = {
 	.read =		pci_mmcfg_read,
 	.write =	pci_mmcfg_write,
 };
diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index 4a7d575..b7c0b53 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -448,6 +448,15 @@ out:
 }
 EXPORT_SYMBOL(acpi_pci_osc_control_set);
 
+int __weak arch_acpi_pci_root_add(struct acpi_pci_root *root)
+{
+	return 0;
+}
+
+void __weak arch_acpi_pci_root_remove(struct acpi_pci_root *root)
+{
+}
+
 static int __devinit acpi_pci_root_add(struct acpi_device *device)
 {
 	unsigned long long segment, bus;
@@ -504,6 +513,11 @@ static int __devinit acpi_pci_root_add(struct acpi_device *device)
 	strcpy(acpi_device_class(device), ACPI_PCI_ROOT_CLASS);
 	device->driver_data = root;
 
+	if (arch_acpi_pci_root_add(root)) {
+		result = -ENODEV;
+		goto out_free;
+	}
+
 	/*
 	 * All supported architectures that use ACPI have support for
 	 * PCI domains, so we indicate this in _OSC support capabilities.
@@ -629,6 +643,7 @@ out_del_root:
 	list_del_rcu(&root->node);
 	mutex_unlock(&acpi_pci_root_lock);
 	synchronize_rcu();
+	arch_acpi_pci_root_remove(root);
 out_free:
 	kfree(root);
 	return result;
@@ -679,6 +694,7 @@ out:
 	list_del_rcu(&root->node);
 	mutex_unlock(&acpi_pci_root_lock);
 	synchronize_rcu();
+	arch_acpi_pci_root_remove(root);
 	kfree(root);
 
 	return 0;
diff --git a/include/acpi/acnames.h b/include/acpi/acnames.h
index 38f5088..99bda75 100644
--- a/include/acpi/acnames.h
+++ b/include/acpi/acnames.h
@@ -62,6 +62,7 @@
 #define METHOD_NAME__AEI        "_AEI"
 #define METHOD_NAME__PRW        "_PRW"
 #define METHOD_NAME__SRS        "_SRS"
+#define METHOD_NAME__CBA	"_CBA"
 
 /* Method names - these methods must appear at the namespace root */
 
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index f1c8ca6..cb4f402 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -370,6 +370,7 @@ struct acpi_pci_root {
 
 	u32 osc_support_set;	/* _OSC state of support bits */
 	u32 osc_control_set;	/* _OSC state of control bits */
+	bool mmconf_added;
 };
 
 /* helper */
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index ac93634..7745b8a 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -18,6 +18,9 @@ extern acpi_status pci_acpi_add_pm_notifier(struct acpi_device *dev,
 					     struct pci_dev *pci_dev);
 extern acpi_status pci_acpi_remove_pm_notifier(struct acpi_device *dev);
 
+int arch_acpi_pci_root_add(struct acpi_pci_root *root);
+void arch_acpi_pci_root_remove(struct acpi_pci_root *root);
+
 static inline acpi_handle acpi_find_root_bridge_handle(struct pci_dev *pdev)
 {
 	struct pci_bus *pbus = pdev->bus;
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges
  2012-04-11  0:10 [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges Jiang Liu
                   ` (5 preceding siblings ...)
  2012-04-11  0:11 ` [PATCH V4 6/6] PCI, ACPI, x86: update MMCFG information when hot-plugging PCI host bridges Jiang Liu
@ 2012-04-11  4:02 ` Bjorn Helgaas
  2012-04-11 12:05   ` Kenji Kaneshige
  6 siblings, 1 reply; 26+ messages in thread
From: Bjorn Helgaas @ 2012-04-11  4:02 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Yinghai Lu, Kenji Kaneshige, Taku Izumi, Jiang Liu, Keping Chen,
	linux-pci

On Tue, Apr 10, 2012 at 6:10 PM, Jiang Liu <liuj97@gmail.com> wrote:
> This patchset enhance pci_root driver to update MMCFG information when
> hot-plugging PCI root bridges. It applies to Yinghai's tree at
> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-pci-root-bus-hotplug
>
> The second patch is based on Taku Izumi work with some enhancements to
> correctly handle PCI host bridges without _CBA method.

I'm sorry I won't have time to really review these for a couple weeks.

It always seemed wrong to me that we parse MCFG and set things up
before we even look at PNP0A03/PNP0A08 devices.  It would make more
sense to me to have something in acpi_pci_root_add() to set up
MMCONFIG using _CBA if available, and falling back to parsing MCFG if
it's not.

Bjorn

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges
  2012-04-11  4:02 ` [PATCH V4 0/6] PCI, " Bjorn Helgaas
@ 2012-04-11 12:05   ` Kenji Kaneshige
  2012-04-11 15:34     ` Jiang Liu
  0 siblings, 1 reply; 26+ messages in thread
From: Kenji Kaneshige @ 2012-04-11 12:05 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jiang Liu, Yinghai Lu, Taku Izumi, Jiang Liu, Keping Chen, linux-pci

(2012/04/11 13:02), Bjorn Helgaas wrote:
> On Tue, Apr 10, 2012 at 6:10 PM, Jiang Liu<liuj97@gmail.com>  wrote:
>> This patchset enhance pci_root driver to update MMCFG information when
>> hot-plugging PCI root bridges. It applies to Yinghai's tree at
>> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-pci-root-bus-hotplug
>>
>> The second patch is based on Taku Izumi work with some enhancements to
>> correctly handle PCI host bridges without _CBA method.
>
> I'm sorry I won't have time to really review these for a couple weeks.
>
> It always seemed wrong to me that we parse MCFG and set things up
> before we even look at PNP0A03/PNP0A08 devices.  It would make more
> sense to me to have something in acpi_pci_root_add() to set up
> MMCONFIG using _CBA if available, and falling back to parsing MCFG if
> it's not.

I think your idea could make the code (design) much cleaner.
Do you have any other reason why you think "It always seemed
wrong..."?

Regards,
Kenji Kaneshige

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges
  2012-04-11 12:05   ` Kenji Kaneshige
@ 2012-04-11 15:34     ` Jiang Liu
  2012-04-12  0:06       ` Bjorn Helgaas
  0 siblings, 1 reply; 26+ messages in thread
From: Jiang Liu @ 2012-04-11 15:34 UTC (permalink / raw)
  To: Kenji Kaneshige
  Cc: Bjorn Helgaas, Yinghai Lu, Taku Izumi, Jiang Liu, Keping Chen, linux-pci

On 04/11/2012 08:05 PM, Kenji Kaneshige wrote:
> (2012/04/11 13:02), Bjorn Helgaas wrote:
>> On Tue, Apr 10, 2012 at 6:10 PM, Jiang Liu<liuj97@gmail.com>  wrote:
>>> This patchset enhance pci_root driver to update MMCFG information when
>>> hot-plugging PCI root bridges. It applies to Yinghai's tree at
>>> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-pci-root-bus-hotplug
>>>
>>> The second patch is based on Taku Izumi work with some enhancements to
>>> correctly handle PCI host bridges without _CBA method.
>>
>> I'm sorry I won't have time to really review these for a couple weeks.
>>
>> It always seemed wrong to me that we parse MCFG and set things up
>> before we even look at PNP0A03/PNP0A08 devices.  It would make more
>> sense to me to have something in acpi_pci_root_add() to set up
>> MMCONFIG using _CBA if available, and falling back to parsing MCFG if
>> it's not.
> 
> I think your idea could make the code (design) much cleaner.
> Do you have any other reason why you think "It always seemed
> wrong..."?
Yeah, that may lead to a cleaner design. 
But there are still some special cases, such as:
1) ACPI subsystem is disabled by kernel boot options, so we can't rely
on the ACPI pci_root driver to initialize the MMCFG.
2) Some PCI host bridges are not reported by the ACPI namespace. My partner
has observed a system which doesn't report the host bridges embedded in the
NHM-EX processors.

Thanks!

> 
> Regards,
> Kenji Kaneshige


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges
  2012-04-11 15:34     ` Jiang Liu
@ 2012-04-12  0:06       ` Bjorn Helgaas
  2012-04-13 10:48         ` Kenji Kaneshige
  0 siblings, 1 reply; 26+ messages in thread
From: Bjorn Helgaas @ 2012-04-12  0:06 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Kenji Kaneshige, Yinghai Lu, Taku Izumi, Jiang Liu, Keping Chen,
	linux-pci

On Wed, Apr 11, 2012 at 9:34 AM, Jiang Liu <liuj97@gmail.com> wrote:
> On 04/11/2012 08:05 PM, Kenji Kaneshige wrote:
>> (2012/04/11 13:02), Bjorn Helgaas wrote:
>>> On Tue, Apr 10, 2012 at 6:10 PM, Jiang Liu<liuj97@gmail.com>  wrote:
>>>> This patchset enhance pci_root driver to update MMCFG information when
>>>> hot-plugging PCI root bridges. It applies to Yinghai's tree at
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-pci-root-bus-hotplug
>>>>
>>>> The second patch is based on Taku Izumi work with some enhancements to
>>>> correctly handle PCI host bridges without _CBA method.
>>>
>>> I'm sorry I won't have time to really review these for a couple weeks.
>>>
>>> It always seemed wrong to me that we parse MCFG and set things up
>>> before we even look at PNP0A03/PNP0A08 devices.  It would make more
>>> sense to me to have something in acpi_pci_root_add() to set up
>>> MMCONFIG using _CBA if available, and falling back to parsing MCFG if
>>> it's not.
>>
>> I think your idea could make the code (design) much cleaner.
>> Do you have any other reason why you think "It always seemed
>> wrong..."?

The current scheme is just an ugly design.  Does I need more reasons?  :)

> Yeah, that may lead to a cleaner design.
> But there are still some special cases, such as:
> 1) ACPI subsystem is disabled by kernel boot options, so we can't rely
> on the ACPI pci_root driver to initialize the MMCFG.

I don't think it's a requirement to make everything work with
"acpi=off".  On a system with ACPI, running with "acpi=off" is just a
kludge and if things work at all, it's only because we're very lucky.

> 2) Some PCI host bridges are not reported by the ACPI namespace. My partner
> has observed a system which doesn't report the host bridges embedded in the
> NHM-EX processors.

I don't think it's a requirement for Linux to use PCI devices behind
unreported host bridges.  I'd like to pick a date and say "after BIOS
date X, we will no longer blindly probe for these unreported host
bridges."

Bjorn

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges
  2012-04-12  0:06       ` Bjorn Helgaas
@ 2012-04-13 10:48         ` Kenji Kaneshige
  2012-04-13 14:33           ` Jiang Liu
  0 siblings, 1 reply; 26+ messages in thread
From: Kenji Kaneshige @ 2012-04-13 10:48 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jiang Liu, Yinghai Lu, Taku Izumi, Jiang Liu, Keping Chen, linux-pci

(2012/04/12 9:06), Bjorn Helgaas wrote:
> On Wed, Apr 11, 2012 at 9:34 AM, Jiang Liu<liuj97@gmail.com>  wrote:
>> On 04/11/2012 08:05 PM, Kenji Kaneshige wrote:
>>> (2012/04/11 13:02), Bjorn Helgaas wrote:
>>>> On Tue, Apr 10, 2012 at 6:10 PM, Jiang Liu<liuj97@gmail.com>    wrote:
>>>>> This patchset enhance pci_root driver to update MMCFG information when
>>>>> hot-plugging PCI root bridges. It applies to Yinghai's tree at
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-pci-root-bus-hotplug
>>>>>
>>>>> The second patch is based on Taku Izumi work with some enhancements to
>>>>> correctly handle PCI host bridges without _CBA method.
>>>>
>>>> I'm sorry I won't have time to really review these for a couple weeks.
>>>>
>>>> It always seemed wrong to me that we parse MCFG and set things up
>>>> before we even look at PNP0A03/PNP0A08 devices.  It would make more
>>>> sense to me to have something in acpi_pci_root_add() to set up
>>>> MMCONFIG using _CBA if available, and falling back to parsing MCFG if
>>>> it's not.
>>>
>>> I think your idea could make the code (design) much cleaner.
>>> Do you have any other reason why you think "It always seemed
>>> wrong..."?
>
> The current scheme is just an ugly design.  Does I need more reasons?  :)

Ok, I just wanted to know if I'm missing anything we need to
take into account when re-factoring the code.

By the way, the following code makes me think there could be
some hardwares that need a fixup using mmconfig access before
scanning the PCI tree. If this is the case, we would need
something to enable early mmconfig initialization for those
hardwares.

static __init int pci_arch_init(void)
{
	...
         if (!(pci_probe & PCI_PROBE_NOEARLY))
                 pci_mmcfg_early_init();


Regards,
Kenji Kaneshige



>
>> Yeah, that may lead to a cleaner design.
>> But there are still some special cases, such as:
>> 1) ACPI subsystem is disabled by kernel boot options, so we can't rely
>> on the ACPI pci_root driver to initialize the MMCFG.
>
> I don't think it's a requirement to make everything work with
> "acpi=off".  On a system with ACPI, running with "acpi=off" is just a
> kludge and if things work at all, it's only because we're very lucky.
>
>> 2) Some PCI host bridges are not reported by the ACPI namespace. My partner
>> has observed a system which doesn't report the host bridges embedded in the
>> NHM-EX processors.
>
> I don't think it's a requirement for Linux to use PCI devices behind
> unreported host bridges.  I'd like to pick a date and say "after BIOS
> date X, we will no longer blindly probe for these unreported host
> bridges."
>
> Bjorn
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges
  2012-04-13 10:48         ` Kenji Kaneshige
@ 2012-04-13 14:33           ` Jiang Liu
  2012-04-16 15:30             ` Don Dutile
  0 siblings, 1 reply; 26+ messages in thread
From: Jiang Liu @ 2012-04-13 14:33 UTC (permalink / raw)
  To: Kenji Kaneshige
  Cc: Bjorn Helgaas, Yinghai Lu, Taku Izumi, Jiang Liu, Keping Chen, linux-pci

On 04/13/2012 06:48 PM, Kenji Kaneshige wrote:
> (2012/04/12 9:06), Bjorn Helgaas wrote:
>> On Wed, Apr 11, 2012 at 9:34 AM, Jiang Liu<liuj97@gmail.com>  wrote:
>>> On 04/11/2012 08:05 PM, Kenji Kaneshige wrote:
>>>> (2012/04/11 13:02), Bjorn Helgaas wrote:
>>>>> On Tue, Apr 10, 2012 at 6:10 PM, Jiang Liu<liuj97@gmail.com>    wrote:
>>>>>> This patchset enhance pci_root driver to update MMCFG information when
>>>>>> hot-plugging PCI root bridges. It applies to Yinghai's tree at
>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-pci-root-bus-hotplug
>>>>>>
>>>>>> The second patch is based on Taku Izumi work with some enhancements to
>>>>>> correctly handle PCI host bridges without _CBA method.
>>>>>
>>>>> I'm sorry I won't have time to really review these for a couple weeks.
>>>>>
>>>>> It always seemed wrong to me that we parse MCFG and set things up
>>>>> before we even look at PNP0A03/PNP0A08 devices.  It would make more
>>>>> sense to me to have something in acpi_pci_root_add() to set up
>>>>> MMCONFIG using _CBA if available, and falling back to parsing MCFG if
>>>>> it's not.
>>>>
>>>> I think your idea could make the code (design) much cleaner.
>>>> Do you have any other reason why you think "It always seemed
>>>> wrong..."?
>>
>> The current scheme is just an ugly design.  Does I need more reasons?  :)
> 
> Ok, I just wanted to know if I'm missing anything we need to
> take into account when re-factoring the code.
> 
> By the way, the following code makes me think there could be
> some hardwares that need a fixup using mmconfig access before
> scanning the PCI tree. If this is the case, we would need
> something to enable early mmconfig initialization for those
> hardwares.
> 
> static __init int pci_arch_init(void)
> {
>     ...
>         if (!(pci_probe & PCI_PROBE_NOEARLY))
>                 pci_mmcfg_early_init();
> 
> 
> Regards,
> Kenji Kaneshige

If MMCFG could be treated as an optional configuration space access method,
we can refine the MMCFG code according to Bjorn's suggestion. And as Kenji
has mentioned, there may be some risks ahead. So need more confirmation
from other PCI experts here.

It may be a good idea to ping the ACPI community to check whether ACPICA
has any dependency on the MMCFG mechanism too.

Thanks
Gerry

> 
> 
> 
>>
>>> Yeah, that may lead to a cleaner design.
>>> But there are still some special cases, such as:
>>> 1) ACPI subsystem is disabled by kernel boot options, so we can't rely
>>> on the ACPI pci_root driver to initialize the MMCFG.
>>
>> I don't think it's a requirement to make everything work with
>> "acpi=off".  On a system with ACPI, running with "acpi=off" is just a
>> kludge and if things work at all, it's only because we're very lucky.
>>
>>> 2) Some PCI host bridges are not reported by the ACPI namespace. My partner
>>> has observed a system which doesn't report the host bridges embedded in the
>>> NHM-EX processors.
>>
>> I don't think it's a requirement for Linux to use PCI devices behind
>> unreported host bridges.  I'd like to pick a date and say "after BIOS
>> date X, we will no longer blindly probe for these unreported host
>> bridges."
>>
>> Bjorn
>>
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges
  2012-04-13 14:33           ` Jiang Liu
@ 2012-04-16 15:30             ` Don Dutile
  2012-04-16 16:09               ` Jiang Liu
  0 siblings, 1 reply; 26+ messages in thread
From: Don Dutile @ 2012-04-16 15:30 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Kenji Kaneshige, Bjorn Helgaas, Yinghai Lu, Taku Izumi,
	Jiang Liu, Keping Chen, linux-pci

On 04/13/2012 10:33 AM, Jiang Liu wrote:
> On 04/13/2012 06:48 PM, Kenji Kaneshige wrote:
>> (2012/04/12 9:06), Bjorn Helgaas wrote:
>>> On Wed, Apr 11, 2012 at 9:34 AM, Jiang Liu<liuj97@gmail.com>   wrote:
>>>> On 04/11/2012 08:05 PM, Kenji Kaneshige wrote:
>>>>> (2012/04/11 13:02), Bjorn Helgaas wrote:
>>>>>> On Tue, Apr 10, 2012 at 6:10 PM, Jiang Liu<liuj97@gmail.com>     wrote:
>>>>>>> This patchset enhance pci_root driver to update MMCFG information when
>>>>>>> hot-plugging PCI root bridges. It applies to Yinghai's tree at
>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-pci-root-bus-hotplug
>>>>>>>
>>>>>>> The second patch is based on Taku Izumi work with some enhancements to
>>>>>>> correctly handle PCI host bridges without _CBA method.
>>>>>>
>>>>>> I'm sorry I won't have time to really review these for a couple weeks.
>>>>>>
>>>>>> It always seemed wrong to me that we parse MCFG and set things up
>>>>>> before we even look at PNP0A03/PNP0A08 devices.  It would make more
>>>>>> sense to me to have something in acpi_pci_root_add() to set up
>>>>>> MMCONFIG using _CBA if available, and falling back to parsing MCFG if
>>>>>> it's not.
>>>>>
>>>>> I think your idea could make the code (design) much cleaner.
>>>>> Do you have any other reason why you think "It always seemed
>>>>> wrong..."?
>>>
>>> The current scheme is just an ugly design.  Does I need more reasons?  :)
>>
>> Ok, I just wanted to know if I'm missing anything we need to
>> take into account when re-factoring the code.
>>
>> By the way, the following code makes me think there could be
>> some hardwares that need a fixup using mmconfig access before
>> scanning the PCI tree. If this is the case, we would need
>> something to enable early mmconfig initialization for those
>> hardwares.
>>
>> static __init int pci_arch_init(void)
>> {
>>      ...
>>          if (!(pci_probe&  PCI_PROBE_NOEARLY))
>>                  pci_mmcfg_early_init();
>>
>>
>> Regards,
>> Kenji Kaneshige
>
> If MMCFG could be treated as an optional configuration space access method,
> we can refine the MMCFG code according to Bjorn's suggestion. And as Kenji
> has mentioned, there may be some risks ahead. So need more confirmation
> from other PCI experts here.
>
I looked at the thread, but didn't know which suggestion of Bjorn's you were referring to.
But, mmcfg access to PCI config space is need for any cap structure
greater than 256 byte offset.  A number of devices have cap structures
in this upper PCI config space, esp. SRIOV devices.
So, if 'optional MMCFG' only means at the beginning of kernel scanning of
PCI (pass-0 scanning), that should be ok, but in-depth, pass-1 scanning
of PCIe devices may require MMCFG for full functional support.

> It may be a good idea to ping the ACPI community to check whether ACPICA
> has any dependency on the MMCFG mechanism too.
>
> Thanks
> Gerry
>
>>
>>
>>
>>>
>>>> Yeah, that may lead to a cleaner design.
>>>> But there are still some special cases, such as:
>>>> 1) ACPI subsystem is disabled by kernel boot options, so we can't rely
>>>> on the ACPI pci_root driver to initialize the MMCFG.
>>>
>>> I don't think it's a requirement to make everything work with
>>> "acpi=off".  On a system with ACPI, running with "acpi=off" is just a
>>> kludge and if things work at all, it's only because we're very lucky.
>>>
>>>> 2) Some PCI host bridges are not reported by the ACPI namespace. My partner
>>>> has observed a system which doesn't report the host bridges embedded in the
>>>> NHM-EX processors.
>>>
>>> I don't think it's a requirement for Linux to use PCI devices behind
>>> unreported host bridges.  I'd like to pick a date and say "after BIOS
>>> date X, we will no longer blindly probe for these unreported host
>>> bridges."
>>>
>>> Bjorn
>>>
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges
  2012-04-16 15:30             ` Don Dutile
@ 2012-04-16 16:09               ` Jiang Liu
  2012-04-16 17:54                 ` Don Dutile
  0 siblings, 1 reply; 26+ messages in thread
From: Jiang Liu @ 2012-04-16 16:09 UTC (permalink / raw)
  To: Don Dutile
  Cc: Kenji Kaneshige, Bjorn Helgaas, Yinghai Lu, Taku Izumi,
	Jiang Liu, Keping Chen, linux-pci

Hi Don,
	Thanks for your comments and please refer to inline comments below.
On 04/16/2012 11:30 PM, Don Dutile wrote:
> On 04/13/2012 10:33 AM, Jiang Liu wrote:
>> On 04/13/2012 06:48 PM, Kenji Kaneshige wrote:
>>> (2012/04/12 9:06), Bjorn Helgaas wrote:
>>>> On Wed, Apr 11, 2012 at 9:34 AM, Jiang Liu<liuj97@gmail.com>   wrote:
>>>>> On 04/11/2012 08:05 PM, Kenji Kaneshige wrote:
>>>>>> (2012/04/11 13:02), Bjorn Helgaas wrote:
>>>>>>> On Tue, Apr 10, 2012 at 6:10 PM, Jiang Liu<liuj97@gmail.com>     wrote:
>>>>>>>> This patchset enhance pci_root driver to update MMCFG information when
>>>>>>>> hot-plugging PCI root bridges. It applies to Yinghai's tree at
>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-pci-root-bus-hotplug
>>>>>>>>
>>>>>>>> The second patch is based on Taku Izumi work with some enhancements to
>>>>>>>> correctly handle PCI host bridges without _CBA method.
>>>>>>>
>>>>>>> I'm sorry I won't have time to really review these for a couple weeks.
>>>>>>>
>>>>>>> It always seemed wrong to me that we parse MCFG and set things up
>>>>>>> before we even look at PNP0A03/PNP0A08 devices.  It would make more
>>>>>>> sense to me to have something in acpi_pci_root_add() to set up
>>>>>>> MMCONFIG using _CBA if available, and falling back to parsing MCFG if
>>>>>>> it's not.
>>>>>>
>>>>>> I think your idea could make the code (design) much cleaner.
>>>>>> Do you have any other reason why you think "It always seemed
>>>>>> wrong..."?
>>>>
>>>> The current scheme is just an ugly design.  Does I need more reasons?  :)
>>>
>>> Ok, I just wanted to know if I'm missing anything we need to
>>> take into account when re-factoring the code.
>>>
>>> By the way, the following code makes me think there could be
>>> some hardwares that need a fixup using mmconfig access before
>>> scanning the PCI tree. If this is the case, we would need
>>> something to enable early mmconfig initialization for those
>>> hardwares.
>>>
>>> static __init int pci_arch_init(void)
>>> {
>>>      ...
>>>          if (!(pci_probe&  PCI_PROBE_NOEARLY))
>>>                  pci_mmcfg_early_init();
>>>
>>>
>>> Regards,
>>> Kenji Kaneshige
>>
>> If MMCFG could be treated as an optional configuration space access method,
>> we can refine the MMCFG code according to Bjorn's suggestion. And as Kenji
>> has mentioned, there may be some risks ahead. So need more confirmation
>> from other PCI experts here.
>>
> I looked at the thread, but didn't know which suggestion of Bjorn's you were referring to.
> But, mmcfg access to PCI config space is need for any cap structure
> greater than 256 byte offset.  A number of devices have cap structures
> in this upper PCI config space, esp. SRIOV devices.
> So, if 'optional MMCFG' only means at the beginning of kernel scanning of
> PCI (pass-0 scanning), that should be ok, but in-depth, pass-1 scanning
> of PCIe devices may require MMCFG for full functional support.
For mainstream systems with support of ACPI and MMCFG, the booting sequences are about:
1) Probe for legacy PCI configuration access mechanism, such as CONF1, CONF2, BIOS
2) Start ACPICA/ACPI subsystem with the legacy PCI configuration access mechanism
3) Enumerate PCI root bridges (PNP0A03/PNP0A08) in ACPI namespace and bind pci_root
   driver to them
4) pci_root driver calls into arch code to add MMCFG information for the host bridge
5) pci_root driver calls PCI core to enumerate all PCI devices under the host bridge

The above flow should work for SRIOV case. But still need to check following cases:
1) ACPICA/ACPI subsystem has no dependency on MMCFG
2) Systems implementing SFI instead of ACPI work as expected
3) ACPI has been disabled by user (Bjorn points out we could ignore this case)
4) Some host bridges are not reported by ACPI (Bjorn points out we should eventually
   get rid of the blind probing logic)

So we may have a try according to Bjorn's suggestions.
Thanks!


> 
>> It may be a good idea to ping the ACPI community to check whether ACPICA
>> has any dependency on the MMCFG mechanism too.
>>
>> Thanks
>> Gerry
>>
>>>
>>>
>>>
>>>>
>>>>> Yeah, that may lead to a cleaner design.
>>>>> But there are still some special cases, such as:
>>>>> 1) ACPI subsystem is disabled by kernel boot options, so we can't rely
>>>>> on the ACPI pci_root driver to initialize the MMCFG.
>>>>
>>>> I don't think it's a requirement to make everything work with
>>>> "acpi=off".  On a system with ACPI, running with "acpi=off" is just a
>>>> kludge and if things work at all, it's only because we're very lucky.
>>>>
>>>>> 2) Some PCI host bridges are not reported by the ACPI namespace. My partner
>>>>> has observed a system which doesn't report the host bridges embedded in the
>>>>> NHM-EX processors.
>>>>
>>>> I don't think it's a requirement for Linux to use PCI devices behind
>>>> unreported host bridges.  I'd like to pick a date and say "after BIOS
>>>> date X, we will no longer blindly probe for these unreported host
>>>> bridges."
>>>>
>>>> Bjorn
>>>>
>>>
>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges
  2012-04-16 16:09               ` Jiang Liu
@ 2012-04-16 17:54                 ` Don Dutile
  2012-04-23 17:41                   ` Bjorn Helgaas
  0 siblings, 1 reply; 26+ messages in thread
From: Don Dutile @ 2012-04-16 17:54 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Kenji Kaneshige, Bjorn Helgaas, Yinghai Lu, Taku Izumi,
	Jiang Liu, Keping Chen, linux-pci

On 04/16/2012 12:09 PM, Jiang Liu wrote:
> Hi Don,
> 	Thanks for your comments and please refer to inline comments below.

Thanks for the info below; couple quick replies below.. - Don

> On 04/16/2012 11:30 PM, Don Dutile wrote:
>> On 04/13/2012 10:33 AM, Jiang Liu wrote:
>>> On 04/13/2012 06:48 PM, Kenji Kaneshige wrote:
>>>> (2012/04/12 9:06), Bjorn Helgaas wrote:
>>>>> On Wed, Apr 11, 2012 at 9:34 AM, Jiang Liu<liuj97@gmail.com>    wrote:
>>>>>> On 04/11/2012 08:05 PM, Kenji Kaneshige wrote:
>>>>>>> (2012/04/11 13:02), Bjorn Helgaas wrote:
>>>>>>>> On Tue, Apr 10, 2012 at 6:10 PM, Jiang Liu<liuj97@gmail.com>      wrote:
>>>>>>>>> This patchset enhance pci_root driver to update MMCFG information when
>>>>>>>>> hot-plugging PCI root bridges. It applies to Yinghai's tree at
>>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-pci-root-bus-hotplug
>>>>>>>>>
>>>>>>>>> The second patch is based on Taku Izumi work with some enhancements to
>>>>>>>>> correctly handle PCI host bridges without _CBA method.
>>>>>>>>
>>>>>>>> I'm sorry I won't have time to really review these for a couple weeks.
>>>>>>>>
>>>>>>>> It always seemed wrong to me that we parse MCFG and set things up
>>>>>>>> before we even look at PNP0A03/PNP0A08 devices.  It would make more
>>>>>>>> sense to me to have something in acpi_pci_root_add() to set up
>>>>>>>> MMCONFIG using _CBA if available, and falling back to parsing MCFG if
>>>>>>>> it's not.
>>>>>>>
>>>>>>> I think your idea could make the code (design) much cleaner.
>>>>>>> Do you have any other reason why you think "It always seemed
>>>>>>> wrong..."?
>>>>>
>>>>> The current scheme is just an ugly design.  Does I need more reasons?  :)
>>>>
>>>> Ok, I just wanted to know if I'm missing anything we need to
>>>> take into account when re-factoring the code.
>>>>
>>>> By the way, the following code makes me think there could be
>>>> some hardwares that need a fixup using mmconfig access before
>>>> scanning the PCI tree. If this is the case, we would need
>>>> something to enable early mmconfig initialization for those
>>>> hardwares.
>>>>
>>>> static __init int pci_arch_init(void)
>>>> {
>>>>       ...
>>>>           if (!(pci_probe&   PCI_PROBE_NOEARLY))
>>>>                   pci_mmcfg_early_init();
>>>>
>>>>
>>>> Regards,
>>>> Kenji Kaneshige
>>>
>>> If MMCFG could be treated as an optional configuration space access method,
>>> we can refine the MMCFG code according to Bjorn's suggestion. And as Kenji
>>> has mentioned, there may be some risks ahead. So need more confirmation
>>> from other PCI experts here.
>>>
>> I looked at the thread, but didn't know which suggestion of Bjorn's you were referring to.
>> But, mmcfg access to PCI config space is need for any cap structure
>> greater than 256 byte offset.  A number of devices have cap structures
>> in this upper PCI config space, esp. SRIOV devices.
>> So, if 'optional MMCFG' only means at the beginning of kernel scanning of
>> PCI (pass-0 scanning), that should be ok, but in-depth, pass-1 scanning
>> of PCIe devices may require MMCFG for full functional support.
> For mainstream systems with support of ACPI and MMCFG, the booting sequences are about:
> 1) Probe for legacy PCI configuration access mechanism, such as CONF1, CONF2, BIOS
> 2) Start ACPICA/ACPI subsystem with the legacy PCI configuration access mechanism
> 3) Enumerate PCI root bridges (PNP0A03/PNP0A08) in ACPI namespace and bind pci_root
>     driver to them
> 4) pci_root driver calls into arch code to add MMCFG information for the host bridge
> 5) pci_root driver calls PCI core to enumerate all PCI devices under the host bridge
>
> The above flow should work for SRIOV case. But still need to check following cases:
> 1) ACPICA/ACPI subsystem has no dependency on MMCFG
> 2) Systems implementing SFI instead of ACPI work as expected
> 3) ACPI has been disabled by user (Bjorn points out we could ignore this case)
Agreed. My least favorite bz: "I set boot param to noacpi and can't scan entire PCI space.... duh!

> 4) Some host bridges are not reported by ACPI (Bjorn points out we should eventually
>     get rid of the blind probing logic)
And depend on BIOS-ACPI to be correct all the time? ....hahahahahaha  ...
sorry.... you hit my funny bone! ;-)
Is blind probing problematic ?
Seems like a pci-fixup/quirk can be implemented under arch/<>/pci to handle
these cases, and thus, depend on ACPI for host-bridge info... wait! did I just
say depend on ACPI?!?!   :)

>
> So we may have a try according to Bjorn's suggestions.
> Thanks!
>
>
>>
>>> It may be a good idea to ping the ACPI community to check whether ACPICA
>>> has any dependency on the MMCFG mechanism too.
>>>
>>> Thanks
>>> Gerry
>>>
>>>>
>>>>
>>>>
>>>>>
>>>>>> Yeah, that may lead to a cleaner design.
>>>>>> But there are still some special cases, such as:
>>>>>> 1) ACPI subsystem is disabled by kernel boot options, so we can't rely
>>>>>> on the ACPI pci_root driver to initialize the MMCFG.
>>>>>
>>>>> I don't think it's a requirement to make everything work with
>>>>> "acpi=off".  On a system with ACPI, running with "acpi=off" is just a
>>>>> kludge and if things work at all, it's only because we're very lucky.
>>>>>
>>>>>> 2) Some PCI host bridges are not reported by the ACPI namespace. My partner
>>>>>> has observed a system which doesn't report the host bridges embedded in the
>>>>>> NHM-EX processors.
>>>>>
>>>>> I don't think it's a requirement for Linux to use PCI devices behind
>>>>> unreported host bridges.  I'd like to pick a date and say "after BIOS
>>>>> date X, we will no longer blindly probe for these unreported host
>>>>> bridges."
>>>>>
>>>>> Bjorn
>>>>>
>>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 6/6] PCI, ACPI, x86: update MMCFG information when hot-plugging PCI host bridges
  2012-04-11  0:11 ` [PATCH V4 6/6] PCI, ACPI, x86: update MMCFG information when hot-plugging PCI host bridges Jiang Liu
@ 2012-04-18  6:47   ` Taku Izumi
  2012-04-18  7:49     ` Jiang Liu
  0 siblings, 1 reply; 26+ messages in thread
From: Taku Izumi @ 2012-04-18  6:47 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Yinghai Lu, Kenji Kaneshige, Jiang Liu, Bjorn Helgaas,
	Keping Chen, linux-pci


Hi Jiang,

On Wed, 11 Apr 2012 08:11:03 +0800
Jiang Liu <liuj97@gmail.com> wrote:

> This patch enhances pci_root driver to update MMCFG information when
> hot-plugging PCI root bridges on x86 platforms.
> 
> Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
> ---
>  arch/x86/include/asm/pci_x86.h |    1 +
>  arch/x86/pci/acpi.c            |   58 ++++++++++++++++++++++++++++++++++++++++
>  arch/x86/pci/mmconfig_32.c     |    2 +-
>  arch/x86/pci/mmconfig_64.c     |    2 +-
>  drivers/acpi/pci_root.c        |   16 +++++++++++
>  include/acpi/acnames.h         |    1 +
>  include/acpi/acpi_bus.h        |    1 +
>  include/linux/pci-acpi.h       |    3 ++
>  8 files changed, 82 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
> index 3252c97..5e1d9a6 100644
> --- a/arch/x86/include/asm/pci_x86.h
> +++ b/arch/x86/include/asm/pci_x86.h
> @@ -100,6 +100,7 @@ struct pci_raw_ops {
>  extern const struct pci_raw_ops *raw_pci_ops;
>  extern const struct pci_raw_ops *raw_pci_ext_ops;
>  
> +extern const struct pci_raw_ops pci_mmcfg;
>  extern const struct pci_raw_ops pci_direct_conf1;
>  extern bool port_cf9_safe;
>  
> diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
> index da0149d..729d694 100644
> --- a/arch/x86/pci/acpi.c
> +++ b/arch/x86/pci/acpi.c
> @@ -4,6 +4,7 @@
>  #include <linux/irq.h>
>  #include <linux/dmi.h>
>  #include <linux/slab.h>
> +#include <linux/pci-acpi.h>
>  #include <asm/numa.h>
>  #include <asm/pci_x86.h>
>  
> @@ -488,6 +489,63 @@ struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root *root)
>  	return bus;
>  }
>  
> +int __devinit arch_acpi_pci_root_add(struct acpi_pci_root *root)
> +{
> +	int result = 0;
> +	acpi_status status;
> +	unsigned long long base_addr;
> +
> +	/* MMCFG not in use */
> +	if (raw_pci_ext_ops && raw_pci_ext_ops != &pci_mmcfg)
> +		return result;
> +
> +	status = acpi_evaluate_integer(root->device->handle, METHOD_NAME__CBA,
> +				       NULL, &base_addr);
> +	if (ACPI_FAILURE(status))
> +		base_addr = 0;
> +	/*
> +	 * Try to insert MMCFG information for host bridge
> +	 */
> +	result = pci_mmconfig_insert(root->segment, root->secondary.start,
> +				     root->secondary.end, base_addr);

	This logic looks strange.
	If the evaluation of "_CBA" failed (for example, "_CBA" didn't exist), 
	pci_mmconfig_insert() should not be invoked.


> +	if (result == -EEXIST)
> +		result = 0;
> +	else if (!result) {
> +		root->mmconf_added = true;
> +
> +		/* Try to enable MMCFG if it hasn't been enabled yet */
> +		if (raw_pci_ext_ops == NULL) {
> +			if (pci_probe & PCI_PROBE_MMCONF)
> +				raw_pci_ext_ops = &pci_mmcfg;
> +			else {
> +				arch_acpi_pci_root_remove(root);
> +				result = -ENODEV;
> +			}
> +		}
> +	}
> +	if (result)
> +		printk(KERN_ERR
> +			"can't add MMCFG information for Bus %04x:%02x\n",
> +			root->segment, (unsigned int)root->secondary.start);
> +
> +	/* still could use conf1 with it */
> +	if (!root->segment)
> +		result = 0;
> +
> +	return result;
> +}
> +
> +void arch_acpi_pci_root_remove(struct acpi_pci_root *root)
> +{
> +	if (root->mmconf_added) {
> +		pci_mmconfig_delete(root->segment,
> +				    root->secondary.start,
> +				    root->secondary.end);
> +		root->mmconf_added = false;
> +	}
> +}
> +
>  int __init pci_acpi_init(void)
>  {
>  	struct pci_dev *dev = NULL;
> diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c
> index a22785d..db63ac2 100644
> --- a/arch/x86/pci/mmconfig_32.c
> +++ b/arch/x86/pci/mmconfig_32.c
> @@ -126,7 +126,7 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
>  	return 0;
>  }
>  
> -static const struct pci_raw_ops pci_mmcfg = {
> +const struct pci_raw_ops pci_mmcfg = {
>  	.read =		pci_mmcfg_read,
>  	.write =	pci_mmcfg_write,
>  };
> diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
> index 4e05779..34c08dd 100644
> --- a/arch/x86/pci/mmconfig_64.c
> +++ b/arch/x86/pci/mmconfig_64.c
> @@ -90,7 +90,7 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
>  	return 0;
>  }
>  
> -static const struct pci_raw_ops pci_mmcfg = {
> +const struct pci_raw_ops pci_mmcfg = {
>  	.read =		pci_mmcfg_read,
>  	.write =	pci_mmcfg_write,
>  };
> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> index 4a7d575..b7c0b53 100644
> --- a/drivers/acpi/pci_root.c
> +++ b/drivers/acpi/pci_root.c
> @@ -448,6 +448,15 @@ out:
>  }
>  EXPORT_SYMBOL(acpi_pci_osc_control_set);
>  
> +int __weak arch_acpi_pci_root_add(struct acpi_pci_root *root)
> +{
> +	return 0;
> +}
> +
> +void __weak arch_acpi_pci_root_remove(struct acpi_pci_root *root)
> +{
> +}
> +
>  static int __devinit acpi_pci_root_add(struct acpi_device *device)
>  {
>  	unsigned long long segment, bus;
> @@ -504,6 +513,11 @@ static int __devinit acpi_pci_root_add(struct acpi_device *device)
>  	strcpy(acpi_device_class(device), ACPI_PCI_ROOT_CLASS);
>  	device->driver_data = root;
>  
> +	if (arch_acpi_pci_root_add(root)) {
> +		result = -ENODEV;
> +		goto out_free;
> +	}
> +
>  	/*
>  	 * All supported architectures that use ACPI have support for
>  	 * PCI domains, so we indicate this in _OSC support capabilities.
> @@ -629,6 +643,7 @@ out_del_root:
>  	list_del_rcu(&root->node);
>  	mutex_unlock(&acpi_pci_root_lock);
>  	synchronize_rcu();
> +	arch_acpi_pci_root_remove(root);
>  out_free:
>  	kfree(root);
>  	return result;
> @@ -679,6 +694,7 @@ out:
>  	list_del_rcu(&root->node);
>  	mutex_unlock(&acpi_pci_root_lock);
>  	synchronize_rcu();
> +	arch_acpi_pci_root_remove(root);
>  	kfree(root);
>  
>  	return 0;
> diff --git a/include/acpi/acnames.h b/include/acpi/acnames.h
> index 38f5088..99bda75 100644
> --- a/include/acpi/acnames.h
> +++ b/include/acpi/acnames.h
> @@ -62,6 +62,7 @@
>  #define METHOD_NAME__AEI        "_AEI"
>  #define METHOD_NAME__PRW        "_PRW"
>  #define METHOD_NAME__SRS        "_SRS"
> +#define METHOD_NAME__CBA	"_CBA"
>  
>  /* Method names - these methods must appear at the namespace root */
>  
> diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
> index f1c8ca6..cb4f402 100644
> --- a/include/acpi/acpi_bus.h
> +++ b/include/acpi/acpi_bus.h
> @@ -370,6 +370,7 @@ struct acpi_pci_root {
>  
>  	u32 osc_support_set;	/* _OSC state of support bits */
>  	u32 osc_control_set;	/* _OSC state of control bits */
> +	bool mmconf_added;
>  };
>  
>  /* helper */
> diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
> index ac93634..7745b8a 100644
> --- a/include/linux/pci-acpi.h
> +++ b/include/linux/pci-acpi.h
> @@ -18,6 +18,9 @@ extern acpi_status pci_acpi_add_pm_notifier(struct acpi_device *dev,
>  					     struct pci_dev *pci_dev);
>  extern acpi_status pci_acpi_remove_pm_notifier(struct acpi_device *dev);
>  
> +int arch_acpi_pci_root_add(struct acpi_pci_root *root);
> +void arch_acpi_pci_root_remove(struct acpi_pci_root *root);
> +
>  static inline acpi_handle acpi_find_root_bridge_handle(struct pci_dev *pdev)
>  {
>  	struct pci_bus *pbus = pdev->bus;
> -- 
> 1.7.5.4
> 
> 


-- 
Taku Izumi <izumi.taku@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 6/6] PCI, ACPI, x86: update MMCFG information when hot-plugging PCI host bridges
  2012-04-18  6:47   ` Taku Izumi
@ 2012-04-18  7:49     ` Jiang Liu
  2012-04-19  6:49       ` Taku Izumi
  0 siblings, 1 reply; 26+ messages in thread
From: Jiang Liu @ 2012-04-18  7:49 UTC (permalink / raw)
  To: Taku Izumi
  Cc: Jiang Liu, Yinghai Lu, Kenji Kaneshige, Bjorn Helgaas,
	Keping Chen, linux-pci

Hi Taku,
	Thanks for your comments and please refer to inline comments
below.
	gerry
On 2012-4-18 14:47, Taku Izumi wrote:
>
> Hi Jiang,
>
> On Wed, 11 Apr 2012 08:11:03 +0800
> Jiang Liu<liuj97@gmail.com>  wrote:
>
>> This patch enhances pci_root driver to update MMCFG information when
>> hot-plugging PCI root bridges on x86 platforms.
>>
>> Signed-off-by: Jiang Liu<jiang.liu@huawei.com>
>> ---
>>   arch/x86/include/asm/pci_x86.h |    1 +
>>   arch/x86/pci/acpi.c            |   58 ++++++++++++++++++++++++++++++++++++++++
>>   arch/x86/pci/mmconfig_32.c     |    2 +-
>>   arch/x86/pci/mmconfig_64.c     |    2 +-
>>   drivers/acpi/pci_root.c        |   16 +++++++++++
>>   include/acpi/acnames.h         |    1 +
>>   include/acpi/acpi_bus.h        |    1 +
>>   include/linux/pci-acpi.h       |    3 ++
>>   8 files changed, 82 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
>> index 3252c97..5e1d9a6 100644
>> --- a/arch/x86/include/asm/pci_x86.h
>> +++ b/arch/x86/include/asm/pci_x86.h
>> @@ -100,6 +100,7 @@ struct pci_raw_ops {
>>   extern const struct pci_raw_ops *raw_pci_ops;
>>   extern const struct pci_raw_ops *raw_pci_ext_ops;
>>
>> +extern const struct pci_raw_ops pci_mmcfg;
>>   extern const struct pci_raw_ops pci_direct_conf1;
>>   extern bool port_cf9_safe;
>>
>> diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
>> index da0149d..729d694 100644
>> --- a/arch/x86/pci/acpi.c
>> +++ b/arch/x86/pci/acpi.c
>> @@ -4,6 +4,7 @@
>>   #include<linux/irq.h>
>>   #include<linux/dmi.h>
>>   #include<linux/slab.h>
>> +#include<linux/pci-acpi.h>
>>   #include<asm/numa.h>
>>   #include<asm/pci_x86.h>
>>
>> @@ -488,6 +489,63 @@ struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root *root)
>>   	return bus;
>>   }
>>
>> +int __devinit arch_acpi_pci_root_add(struct acpi_pci_root *root)
>> +{
>> +	int result = 0;
>> +	acpi_status status;
>> +	unsigned long long base_addr;
>> +
>> +	/* MMCFG not in use */
>> +	if (raw_pci_ext_ops&&  raw_pci_ext_ops !=&pci_mmcfg)
>> +		return result;
>> +
>> +	status = acpi_evaluate_integer(root->device->handle, METHOD_NAME__CBA,
>> +				       NULL,&base_addr);
>> +	if (ACPI_FAILURE(status))
>> +		base_addr = 0;
>> +	/*
>> +	 * Try to insert MMCFG information for host bridge
>> +	 */
>> +	result = pci_mmconfig_insert(root->segment, root->secondary.start,
>> +				     root->secondary.end, base_addr);
>
> 	This logic looks strange.
> 	If the evaluation of "_CBA" failed (for example, "_CBA" didn't exist),
> 	pci_mmconfig_insert() should not be invoked.
>
Yes, the code looks a little strange here, maybe need more clear
comments here.

For hotplug capable (physical hotplug) host bridges, BIOS should provide
_CBA method for them and we should insert the new MMCFG info here.
For non-hotplug-capable host bridges, we check that MMCFG information
for the host bridge already exists.

So the magic part is in function pci_mmconfig_insert(). The algorithm
for function pci_mmconfig_insert() is
{
	if (base_addr is not zero)
		return try_to_insert_MMCFG_info();
	else if (MMCFG info for (seg, start, bus) already exists)
		return -EEXIST;
	else
		return -EINVAL;
}

We will check for -EEXIST on return from pci_mmconfig_insert() for the
special case.

>
>> +	if (result == -EEXIST)
>> +		result = 0;
>> +	else if (!result) {
>> +		root->mmconf_added = true;
>> +
>> +		/* Try to enable MMCFG if it hasn't been enabled yet */
>> +		if (raw_pci_ext_ops == NULL) {
>> +			if (pci_probe&  PCI_PROBE_MMCONF)
>> +				raw_pci_ext_ops =&pci_mmcfg;
>> +			else {
>> +				arch_acpi_pci_root_remove(root);
>> +				result = -ENODEV;
>> +			}
>> +		}
>> +	}
>> +	if (result)
>> +		printk(KERN_ERR
>> +			"can't add MMCFG information for Bus %04x:%02x\n",
>> +			root->segment, (unsigned int)root->secondary.start);
>> +
>> +	/* still could use conf1 with it */
>> +	if (!root->segment)
>> +		result = 0;
>> +
>> +	return result;
>> +}
>> +
>> +void arch_acpi_pci_root_remove(struct acpi_pci_root *root)
>> +{
>> +	if (root->mmconf_added) {
>> +		pci_mmconfig_delete(root->segment,
>> +				    root->secondary.start,
>> +				    root->secondary.end);
>> +		root->mmconf_added = false;
>> +	}
>> +}
>> +
>>   int __init pci_acpi_init(void)
>>   {
>>   	struct pci_dev *dev = NULL;
>> diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c
>> index a22785d..db63ac2 100644
>> --- a/arch/x86/pci/mmconfig_32.c
>> +++ b/arch/x86/pci/mmconfig_32.c
>> @@ -126,7 +126,7 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
>>   	return 0;
>>   }
>>
>> -static const struct pci_raw_ops pci_mmcfg = {
>> +const struct pci_raw_ops pci_mmcfg = {
>>   	.read =		pci_mmcfg_read,
>>   	.write =	pci_mmcfg_write,
>>   };
>> diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
>> index 4e05779..34c08dd 100644
>> --- a/arch/x86/pci/mmconfig_64.c
>> +++ b/arch/x86/pci/mmconfig_64.c
>> @@ -90,7 +90,7 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
>>   	return 0;
>>   }
>>
>> -static const struct pci_raw_ops pci_mmcfg = {
>> +const struct pci_raw_ops pci_mmcfg = {
>>   	.read =		pci_mmcfg_read,
>>   	.write =	pci_mmcfg_write,
>>   };
>> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
>> index 4a7d575..b7c0b53 100644
>> --- a/drivers/acpi/pci_root.c
>> +++ b/drivers/acpi/pci_root.c
>> @@ -448,6 +448,15 @@ out:
>>   }
>>   EXPORT_SYMBOL(acpi_pci_osc_control_set);
>>
>> +int __weak arch_acpi_pci_root_add(struct acpi_pci_root *root)
>> +{
>> +	return 0;
>> +}
>> +
>> +void __weak arch_acpi_pci_root_remove(struct acpi_pci_root *root)
>> +{
>> +}
>> +
>>   static int __devinit acpi_pci_root_add(struct acpi_device *device)
>>   {
>>   	unsigned long long segment, bus;
>> @@ -504,6 +513,11 @@ static int __devinit acpi_pci_root_add(struct acpi_device *device)
>>   	strcpy(acpi_device_class(device), ACPI_PCI_ROOT_CLASS);
>>   	device->driver_data = root;
>>
>> +	if (arch_acpi_pci_root_add(root)) {
>> +		result = -ENODEV;
>> +		goto out_free;
>> +	}
>> +
>>   	/*
>>   	 * All supported architectures that use ACPI have support for
>>   	 * PCI domains, so we indicate this in _OSC support capabilities.
>> @@ -629,6 +643,7 @@ out_del_root:
>>   	list_del_rcu(&root->node);
>>   	mutex_unlock(&acpi_pci_root_lock);
>>   	synchronize_rcu();
>> +	arch_acpi_pci_root_remove(root);
>>   out_free:
>>   	kfree(root);
>>   	return result;
>> @@ -679,6 +694,7 @@ out:
>>   	list_del_rcu(&root->node);
>>   	mutex_unlock(&acpi_pci_root_lock);
>>   	synchronize_rcu();
>> +	arch_acpi_pci_root_remove(root);
>>   	kfree(root);
>>
>>   	return 0;
>> diff --git a/include/acpi/acnames.h b/include/acpi/acnames.h
>> index 38f5088..99bda75 100644
>> --- a/include/acpi/acnames.h
>> +++ b/include/acpi/acnames.h
>> @@ -62,6 +62,7 @@
>>   #define METHOD_NAME__AEI        "_AEI"
>>   #define METHOD_NAME__PRW        "_PRW"
>>   #define METHOD_NAME__SRS        "_SRS"
>> +#define METHOD_NAME__CBA	"_CBA"
>>
>>   /* Method names - these methods must appear at the namespace root */
>>
>> diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
>> index f1c8ca6..cb4f402 100644
>> --- a/include/acpi/acpi_bus.h
>> +++ b/include/acpi/acpi_bus.h
>> @@ -370,6 +370,7 @@ struct acpi_pci_root {
>>
>>   	u32 osc_support_set;	/* _OSC state of support bits */
>>   	u32 osc_control_set;	/* _OSC state of control bits */
>> +	bool mmconf_added;
>>   };
>>
>>   /* helper */
>> diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
>> index ac93634..7745b8a 100644
>> --- a/include/linux/pci-acpi.h
>> +++ b/include/linux/pci-acpi.h
>> @@ -18,6 +18,9 @@ extern acpi_status pci_acpi_add_pm_notifier(struct acpi_device *dev,
>>   					     struct pci_dev *pci_dev);
>>   extern acpi_status pci_acpi_remove_pm_notifier(struct acpi_device *dev);
>>
>> +int arch_acpi_pci_root_add(struct acpi_pci_root *root);
>> +void arch_acpi_pci_root_remove(struct acpi_pci_root *root);
>> +
>>   static inline acpi_handle acpi_find_root_bridge_handle(struct pci_dev *pdev)
>>   {
>>   	struct pci_bus *pbus = pdev->bus;
>> --
>> 1.7.5.4
>>
>>
>
>



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 6/6] PCI, ACPI, x86: update MMCFG information when hot-plugging PCI host bridges
  2012-04-18  7:49     ` Jiang Liu
@ 2012-04-19  6:49       ` Taku Izumi
  2012-04-19  7:04         ` Jiang Liu
  0 siblings, 1 reply; 26+ messages in thread
From: Taku Izumi @ 2012-04-19  6:49 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Jiang Liu, Yinghai Lu, Kenji Kaneshige, Bjorn Helgaas,
	Keping Chen, linux-pci


Hi Jiang,
   Thanks, and I have another question.

    What does "acpi_pci_root.mmconf_added" indicate ?
    In case of non-hotpluggable hostbridges, that is, when MMCONF 
    is described using MCFG table, acpi_pci_root.mmconf_added is always 0.
    Is "acpi_pci_root.mmconf_added" only for hotpluggable hostbridges ?

-- 
Best regarads,
Taku Izumi <izumi.taku@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 6/6] PCI, ACPI, x86: update MMCFG information when hot-plugging PCI host bridges
  2012-04-19  6:49       ` Taku Izumi
@ 2012-04-19  7:04         ` Jiang Liu
  0 siblings, 0 replies; 26+ messages in thread
From: Jiang Liu @ 2012-04-19  7:04 UTC (permalink / raw)
  To: Taku Izumi
  Cc: Jiang Liu, Yinghai Lu, Kenji Kaneshige, Bjorn Helgaas,
	Keping Chen, linux-pci

Hi Taku,
	You are right. acpi_pci_root.mmconf_added is used to track
whether MMCFG info is added by pci_mmconfig_insert(), and
arch_acpi_pci_root_remove() will only delete MMCFG info added by
pci_mmconfig_insert().
	gerry

On 2012-4-19 14:49, Taku Izumi wrote:
>
> Hi Jiang,
>     Thanks, and I have another question.
>
>      What does "acpi_pci_root.mmconf_added" indicate ?
>      In case of non-hotpluggable hostbridges, that is, when MMCONF
>      is described using MCFG table, acpi_pci_root.mmconf_added is always 0.
>      Is "acpi_pci_root.mmconf_added" only for hotpluggable hostbridges ?
>



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges
  2012-04-16 17:54                 ` Don Dutile
@ 2012-04-23 17:41                   ` Bjorn Helgaas
  2012-04-23 18:50                     ` Don Dutile
  0 siblings, 1 reply; 26+ messages in thread
From: Bjorn Helgaas @ 2012-04-23 17:41 UTC (permalink / raw)
  To: Don Dutile
  Cc: Jiang Liu, Kenji Kaneshige, Yinghai Lu, Taku Izumi, Jiang Liu,
	Keping Chen, linux-pci

On Mon, Apr 16, 2012 at 11:54 AM, Don Dutile <ddutile@redhat.com> wrote:
> On 04/16/2012 12:09 PM, Jiang Liu wrote:
>>
>> Hi Don,
>>        Thanks for your comments and please refer to inline comments below.
>
>
> Thanks for the info below; couple quick replies below.. - Don
>
>
>> On 04/16/2012 11:30 PM, Don Dutile wrote:
>>>
>>> On 04/13/2012 10:33 AM, Jiang Liu wrote:
>>>>
>>>> On 04/13/2012 06:48 PM, Kenji Kaneshige wrote:
>>>>>
>>>>> (2012/04/12 9:06), Bjorn Helgaas wrote:
>>>>>>
>>>>>> On Wed, Apr 11, 2012 at 9:34 AM, Jiang Liu<liuj97@gmail.com>    wrote:
>>>>>>>
>>>>>>> On 04/11/2012 08:05 PM, Kenji Kaneshige wrote:
>>>>>>>>
>>>>>>>> (2012/04/11 13:02), Bjorn Helgaas wrote:
>>>>>>>>>
>>>>>>>>> On Tue, Apr 10, 2012 at 6:10 PM, Jiang Liu<liuj97@gmail.com>
>>>>>>>>>  wrote:
>>>>>>>>>>
>>>>>>>>>> This patchset enhance pci_root driver to update MMCFG information
>>>>>>>>>> when
>>>>>>>>>> hot-plugging PCI root bridges. It applies to Yinghai's tree at
>>>>>>>>>>
>>>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
>>>>>>>>>> for-pci-root-bus-hotplug
>>>>>>>>>>
>>>>>>>>>> The second patch is based on Taku Izumi work with some
>>>>>>>>>> enhancements to
>>>>>>>>>> correctly handle PCI host bridges without _CBA method.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'm sorry I won't have time to really review these for a couple
>>>>>>>>> weeks.
>>>>>>>>>
>>>>>>>>> It always seemed wrong to me that we parse MCFG and set things up
>>>>>>>>> before we even look at PNP0A03/PNP0A08 devices.  It would make more
>>>>>>>>> sense to me to have something in acpi_pci_root_add() to set up
>>>>>>>>> MMCONFIG using _CBA if available, and falling back to parsing MCFG
>>>>>>>>> if
>>>>>>>>> it's not.
>>>>>>>>
>>>>>>>>
>>>>>>>> I think your idea could make the code (design) much cleaner.
>>>>>>>> Do you have any other reason why you think "It always seemed
>>>>>>>> wrong..."?
>>>>>>
>>>>>>
>>>>>> The current scheme is just an ugly design.  Does I need more reasons?
>>>>>>  :)
>>>>>
>>>>>
>>>>> Ok, I just wanted to know if I'm missing anything we need to
>>>>> take into account when re-factoring the code.
>>>>>
>>>>> By the way, the following code makes me think there could be
>>>>> some hardwares that need a fixup using mmconfig access before
>>>>> scanning the PCI tree. If this is the case, we would need
>>>>> something to enable early mmconfig initialization for those
>>>>> hardwares.
>>>>>
>>>>> static __init int pci_arch_init(void)
>>>>> {
>>>>>      ...
>>>>>          if (!(pci_probe&   PCI_PROBE_NOEARLY))
>>>>>                  pci_mmcfg_early_init();
>>>>>
>>>>>
>>>>> Regards,
>>>>> Kenji Kaneshige
>>>>
>>>>
>>>> If MMCFG could be treated as an optional configuration space access
>>>> method,
>>>> we can refine the MMCFG code according to Bjorn's suggestion. And as
>>>> Kenji
>>>> has mentioned, there may be some risks ahead. So need more confirmation
>>>> from other PCI experts here.
>>>>
>>> I looked at the thread, but didn't know which suggestion of Bjorn's you
>>> were referring to.
>>> But, mmcfg access to PCI config space is need for any cap structure
>>> greater than 256 byte offset.  A number of devices have cap structures
>>> in this upper PCI config space, esp. SRIOV devices.
>>> So, if 'optional MMCFG' only means at the beginning of kernel scanning of
>>> PCI (pass-0 scanning), that should be ok, but in-depth, pass-1 scanning
>>> of PCIe devices may require MMCFG for full functional support.
>>
>> For mainstream systems with support of ACPI and MMCFG, the booting
>> sequences are about:
>> 1) Probe for legacy PCI configuration access mechanism, such as CONF1,
>> CONF2, BIOS
>> 2) Start ACPICA/ACPI subsystem with the legacy PCI configuration access
>> mechanism
>> 3) Enumerate PCI root bridges (PNP0A03/PNP0A08) in ACPI namespace and bind
>> pci_root
>>    driver to them
>> 4) pci_root driver calls into arch code to add MMCFG information for the
>> host bridge
>> 5) pci_root driver calls PCI core to enumerate all PCI devices under the
>> host bridge
>>
>> The above flow should work for SRIOV case. But still need to check
>> following cases:
>> 1) ACPICA/ACPI subsystem has no dependency on MMCFG
>> 2) Systems implementing SFI instead of ACPI work as expected
>> 3) ACPI has been disabled by user (Bjorn points out we could ignore this
>> case)
>
> Agreed. My least favorite bz: "I set boot param to noacpi and can't scan
> entire PCI space.... duh!
>
>
>> 4) Some host bridges are not reported by ACPI (Bjorn points out we should
>> eventually
>>    get rid of the blind probing logic)
>
> And depend on BIOS-ACPI to be correct all the time? ....hahahahahaha  ...
> sorry.... you hit my funny bone! ;-)
> Is blind probing problematic ?
> Seems like a pci-fixup/quirk can be implemented under arch/<>/pci to handle
> these cases, and thus, depend on ACPI for host-bridge info... wait! did I
> just
> say depend on ACPI?!?!   :)

Hope your funny bone has stopped tingling by now :)

When we probe blindly, we don't know what resources are available on
the bus (except for AMD systems).  Therefore, we can't do reliable
assignment, and we have to rely on whatever the BIOS did.

Blind probing finds devices not exposed by the BIOS.  This might be a
BIOS bug, or it might be a conscious decision to hide the devices from
the OS.  Some OEMs hide devices to reduce the likelihood of users
messing things up with setpci.

It would be interesting and relatively easy to figure out whether
Windows ever discovers a device behind an unreported host bridge.  My
guess is "no," but I haven't had time to verify this.

Bjorn

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges
  2012-04-23 17:41                   ` Bjorn Helgaas
@ 2012-04-23 18:50                     ` Don Dutile
  2012-04-25 16:50                       ` Bjorn Helgaas
  0 siblings, 1 reply; 26+ messages in thread
From: Don Dutile @ 2012-04-23 18:50 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jiang Liu, Kenji Kaneshige, Yinghai Lu, Taku Izumi, Jiang Liu,
	Keping Chen, linux-pci

On 04/23/2012 01:41 PM, Bjorn Helgaas wrote:
> On Mon, Apr 16, 2012 at 11:54 AM, Don Dutile<ddutile@redhat.com>  wrote:
>> On 04/16/2012 12:09 PM, Jiang Liu wrote:
>>>
>>> Hi Don,
>>>         Thanks for your comments and please refer to inline comments below.
>>
>>
>> Thanks for the info below; couple quick replies below.. - Don
>>
>>
>>> On 04/16/2012 11:30 PM, Don Dutile wrote:
>>>>
>>>> On 04/13/2012 10:33 AM, Jiang Liu wrote:
>>>>>
>>>>> On 04/13/2012 06:48 PM, Kenji Kaneshige wrote:
>>>>>>
>>>>>> (2012/04/12 9:06), Bjorn Helgaas wrote:
>>>>>>>
>>>>>>> On Wed, Apr 11, 2012 at 9:34 AM, Jiang Liu<liuj97@gmail.com>      wrote:
>>>>>>>>
>>>>>>>> On 04/11/2012 08:05 PM, Kenji Kaneshige wrote:
>>>>>>>>>
>>>>>>>>> (2012/04/11 13:02), Bjorn Helgaas wrote:
>>>>>>>>>>
>>>>>>>>>> On Tue, Apr 10, 2012 at 6:10 PM, Jiang Liu<liuj97@gmail.com>
>>>>>>>>>>   wrote:
>>>>>>>>>>>
>>>>>>>>>>> This patchset enhance pci_root driver to update MMCFG information
>>>>>>>>>>> when
>>>>>>>>>>> hot-plugging PCI root bridges. It applies to Yinghai's tree at
>>>>>>>>>>>
>>>>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
>>>>>>>>>>> for-pci-root-bus-hotplug
>>>>>>>>>>>
>>>>>>>>>>> The second patch is based on Taku Izumi work with some
>>>>>>>>>>> enhancements to
>>>>>>>>>>> correctly handle PCI host bridges without _CBA method.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I'm sorry I won't have time to really review these for a couple
>>>>>>>>>> weeks.
>>>>>>>>>>
>>>>>>>>>> It always seemed wrong to me that we parse MCFG and set things up
>>>>>>>>>> before we even look at PNP0A03/PNP0A08 devices.  It would make more
>>>>>>>>>> sense to me to have something in acpi_pci_root_add() to set up
>>>>>>>>>> MMCONFIG using _CBA if available, and falling back to parsing MCFG
>>>>>>>>>> if
>>>>>>>>>> it's not.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think your idea could make the code (design) much cleaner.
>>>>>>>>> Do you have any other reason why you think "It always seemed
>>>>>>>>> wrong..."?
>>>>>>>
>>>>>>>
>>>>>>> The current scheme is just an ugly design.  Does I need more reasons?
>>>>>>>   :)
>>>>>>
>>>>>>
>>>>>> Ok, I just wanted to know if I'm missing anything we need to
>>>>>> take into account when re-factoring the code.
>>>>>>
>>>>>> By the way, the following code makes me think there could be
>>>>>> some hardwares that need a fixup using mmconfig access before
>>>>>> scanning the PCI tree. If this is the case, we would need
>>>>>> something to enable early mmconfig initialization for those
>>>>>> hardwares.
>>>>>>
>>>>>> static __init int pci_arch_init(void)
>>>>>> {
>>>>>>       ...
>>>>>>           if (!(pci_probe&     PCI_PROBE_NOEARLY))
>>>>>>                   pci_mmcfg_early_init();
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Kenji Kaneshige
>>>>>
>>>>>
>>>>> If MMCFG could be treated as an optional configuration space access
>>>>> method,
>>>>> we can refine the MMCFG code according to Bjorn's suggestion. And as
>>>>> Kenji
>>>>> has mentioned, there may be some risks ahead. So need more confirmation
>>>>> from other PCI experts here.
>>>>>
>>>> I looked at the thread, but didn't know which suggestion of Bjorn's you
>>>> were referring to.
>>>> But, mmcfg access to PCI config space is need for any cap structure
>>>> greater than 256 byte offset.  A number of devices have cap structures
>>>> in this upper PCI config space, esp. SRIOV devices.
>>>> So, if 'optional MMCFG' only means at the beginning of kernel scanning of
>>>> PCI (pass-0 scanning), that should be ok, but in-depth, pass-1 scanning
>>>> of PCIe devices may require MMCFG for full functional support.
>>>
>>> For mainstream systems with support of ACPI and MMCFG, the booting
>>> sequences are about:
>>> 1) Probe for legacy PCI configuration access mechanism, such as CONF1,
>>> CONF2, BIOS
>>> 2) Start ACPICA/ACPI subsystem with the legacy PCI configuration access
>>> mechanism
>>> 3) Enumerate PCI root bridges (PNP0A03/PNP0A08) in ACPI namespace and bind
>>> pci_root
>>>     driver to them
>>> 4) pci_root driver calls into arch code to add MMCFG information for the
>>> host bridge
>>> 5) pci_root driver calls PCI core to enumerate all PCI devices under the
>>> host bridge
>>>
>>> The above flow should work for SRIOV case. But still need to check
>>> following cases:
>>> 1) ACPICA/ACPI subsystem has no dependency on MMCFG
>>> 2) Systems implementing SFI instead of ACPI work as expected
>>> 3) ACPI has been disabled by user (Bjorn points out we could ignore this
>>> case)
>>
>> Agreed. My least favorite bz: "I set boot param to noacpi and can't scan
>> entire PCI space.... duh!
>>
>>
>>> 4) Some host bridges are not reported by ACPI (Bjorn points out we should
>>> eventually
>>>     get rid of the blind probing logic)
>>
>> And depend on BIOS-ACPI to be correct all the time? ....hahahahahaha  ...
>> sorry.... you hit my funny bone! ;-)
>> Is blind probing problematic ?
>> Seems like a pci-fixup/quirk can be implemented under arch/<>/pci to handle
>> these cases, and thus, depend on ACPI for host-bridge info... wait! did I
>> just
>> say depend on ACPI?!?!   :)
>
> Hope your funny bone has stopped tingling by now :)
>
Not when ACPI's always there to bang it again! ;-)

> When we probe blindly, we don't know what resources are available on
> the bus (except for AMD systems).  Therefore, we can't do reliable
> assignment, and we have to rely on whatever the BIOS did.
>
> Blind probing finds devices not exposed by the BIOS.  This might be a
> BIOS bug, or it might be a conscious decision to hide the devices from
> the OS.  Some OEMs hide devices to reduce the likelihood of users
> messing things up with setpci.
>
> It would be interesting and relatively easy to figure out whether
> Windows ever discovers a device behind an unreported host bridge.  My
> guess is "no," but I haven't had time to verify this.
>
> Bjorn
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Well, call me crazy(again!), but why put a host-bridge device on a system
and then (try to) hide it with software(BIOS/ACPI) ?
Sounds like a recipe for disaster if the OS tries to
reconfigure address space or bus-number space.....
then again, mmconfig is tied to acpi (for x86) and maybe more ACPI
magic can restrict cfg space regions to alleviate this problem/issue/condition/practice.
Hiding a PCI device is as simple as not loading its driver... :)

Have to run.... /me has to tell associate to get a new BIOS update
b/c when they enable the BIOS option for IOMMU, the
ACPI DMAR tables are borked.... glad paranoid checks
were added to the Linux IOMMU setup code for the ACPI-DMAR tables!
... of course, assuming such an update exists ....
another borked ACPI table strikes again! :) :) :) ... that's my funny
bone being hit again!


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges
  2012-04-23 18:50                     ` Don Dutile
@ 2012-04-25 16:50                       ` Bjorn Helgaas
  2012-04-26  3:35                         ` Don Dutile
  0 siblings, 1 reply; 26+ messages in thread
From: Bjorn Helgaas @ 2012-04-25 16:50 UTC (permalink / raw)
  To: Don Dutile
  Cc: Jiang Liu, Kenji Kaneshige, Yinghai Lu, Taku Izumi, Jiang Liu,
	Keping Chen, linux-pci, Pearson, Greg

On Mon, Apr 23, 2012 at 12:50 PM, Don Dutile <ddutile@redhat.com> wrote:
> On 04/23/2012 01:41 PM, Bjorn Helgaas wrote:
>>
>> On Mon, Apr 16, 2012 at 11:54 AM, Don Dutile<ddutile@redhat.com>  wrote:
>>>
>>> On 04/16/2012 12:09 PM, Jiang Liu wrote:
>>>>
>>>>
>>>> Hi Don,
>>>>        Thanks for your comments and please refer to inline comments
>>>> below.
>>>
>>>
>>>
>>> Thanks for the info below; couple quick replies below.. - Don
>>>
>>>
>>>> On 04/16/2012 11:30 PM, Don Dutile wrote:
>>>>>
>>>>>
>>>>> On 04/13/2012 10:33 AM, Jiang Liu wrote:
>>>>>>
>>>>>>
>>>>>> On 04/13/2012 06:48 PM, Kenji Kaneshige wrote:
>>>>>>>
>>>>>>>
>>>>>>> (2012/04/12 9:06), Bjorn Helgaas wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Apr 11, 2012 at 9:34 AM, Jiang Liu<liuj97@gmail.com>
>>>>>>>>  wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 04/11/2012 08:05 PM, Kenji Kaneshige wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> (2012/04/11 13:02), Bjorn Helgaas wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Apr 10, 2012 at 6:10 PM, Jiang Liu<liuj97@gmail.com>
>>>>>>>>>>>  wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> This patchset enhance pci_root driver to update MMCFG
>>>>>>>>>>>> information
>>>>>>>>>>>> when
>>>>>>>>>>>> hot-plugging PCI root bridges. It applies to Yinghai's tree at
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
>>>>>>>>>>>> for-pci-root-bus-hotplug
>>>>>>>>>>>>
>>>>>>>>>>>> The second patch is based on Taku Izumi work with some
>>>>>>>>>>>> enhancements to
>>>>>>>>>>>> correctly handle PCI host bridges without _CBA method.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I'm sorry I won't have time to really review these for a couple
>>>>>>>>>>> weeks.
>>>>>>>>>>>
>>>>>>>>>>> It always seemed wrong to me that we parse MCFG and set things up
>>>>>>>>>>> before we even look at PNP0A03/PNP0A08 devices.  It would make
>>>>>>>>>>> more
>>>>>>>>>>> sense to me to have something in acpi_pci_root_add() to set up
>>>>>>>>>>> MMCONFIG using _CBA if available, and falling back to parsing
>>>>>>>>>>> MCFG
>>>>>>>>>>> if
>>>>>>>>>>> it's not.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I think your idea could make the code (design) much cleaner.
>>>>>>>>>> Do you have any other reason why you think "It always seemed
>>>>>>>>>> wrong..."?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The current scheme is just an ugly design.  Does I need more
>>>>>>>> reasons?
>>>>>>>>  :)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Ok, I just wanted to know if I'm missing anything we need to
>>>>>>> take into account when re-factoring the code.
>>>>>>>
>>>>>>> By the way, the following code makes me think there could be
>>>>>>> some hardwares that need a fixup using mmconfig access before
>>>>>>> scanning the PCI tree. If this is the case, we would need
>>>>>>> something to enable early mmconfig initialization for those
>>>>>>> hardwares.
>>>>>>>
>>>>>>> static __init int pci_arch_init(void)
>>>>>>> {
>>>>>>>      ...
>>>>>>>          if (!(pci_probe&     PCI_PROBE_NOEARLY))
>>>>>>>                  pci_mmcfg_early_init();
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Kenji Kaneshige
>>>>>>
>>>>>>
>>>>>>
>>>>>> If MMCFG could be treated as an optional configuration space access
>>>>>> method,
>>>>>> we can refine the MMCFG code according to Bjorn's suggestion. And as
>>>>>> Kenji
>>>>>> has mentioned, there may be some risks ahead. So need more
>>>>>> confirmation
>>>>>> from other PCI experts here.
>>>>>>
>>>>> I looked at the thread, but didn't know which suggestion of Bjorn's you
>>>>> were referring to.
>>>>> But, mmcfg access to PCI config space is need for any cap structure
>>>>> greater than 256 byte offset.  A number of devices have cap structures
>>>>> in this upper PCI config space, esp. SRIOV devices.
>>>>> So, if 'optional MMCFG' only means at the beginning of kernel scanning
>>>>> of
>>>>> PCI (pass-0 scanning), that should be ok, but in-depth, pass-1 scanning
>>>>> of PCIe devices may require MMCFG for full functional support.
>>>>
>>>>
>>>> For mainstream systems with support of ACPI and MMCFG, the booting
>>>> sequences are about:
>>>> 1) Probe for legacy PCI configuration access mechanism, such as CONF1,
>>>> CONF2, BIOS
>>>> 2) Start ACPICA/ACPI subsystem with the legacy PCI configuration access
>>>> mechanism
>>>> 3) Enumerate PCI root bridges (PNP0A03/PNP0A08) in ACPI namespace and
>>>> bind
>>>> pci_root
>>>>    driver to them
>>>> 4) pci_root driver calls into arch code to add MMCFG information for the
>>>> host bridge
>>>> 5) pci_root driver calls PCI core to enumerate all PCI devices under the
>>>> host bridge
>>>>
>>>> The above flow should work for SRIOV case. But still need to check
>>>> following cases:
>>>> 1) ACPICA/ACPI subsystem has no dependency on MMCFG
>>>> 2) Systems implementing SFI instead of ACPI work as expected
>>>> 3) ACPI has been disabled by user (Bjorn points out we could ignore this
>>>> case)
>>>
>>>
>>> Agreed. My least favorite bz: "I set boot param to noacpi and can't scan
>>> entire PCI space.... duh!
>>>
>>>
>>>> 4) Some host bridges are not reported by ACPI (Bjorn points out we
>>>> should
>>>> eventually
>>>>    get rid of the blind probing logic)
>>>
>>>
>>> And depend on BIOS-ACPI to be correct all the time? ....hahahahahaha  ...
>>> sorry.... you hit my funny bone! ;-)
>>> Is blind probing problematic ?
>>> Seems like a pci-fixup/quirk can be implemented under arch/<>/pci to
>>> handle
>>> these cases, and thus, depend on ACPI for host-bridge info... wait! did I
>>> just
>>> say depend on ACPI?!?!   :)
>>
>>
>> Hope your funny bone has stopped tingling by now :)
>>
> Not when ACPI's always there to bang it again! ;-)
>
>> When we probe blindly, we don't know what resources are available on
>> the bus (except for AMD systems).  Therefore, we can't do reliable
>> assignment, and we have to rely on whatever the BIOS did.
>>
>> Blind probing finds devices not exposed by the BIOS.  This might be a
>> BIOS bug, or it might be a conscious decision to hide the devices from
>> the OS.  Some OEMs hide devices to reduce the likelihood of users
>> messing things up with setpci.
>>
>> It would be interesting and relatively easy to figure out whether
>> Windows ever discovers a device behind an unreported host bridge.  My
>> guess is "no," but I haven't had time to verify this.
>>
> Well, call me crazy(again!), but why put a host-bridge device on a system
> and then (try to) hide it with software(BIOS/ACPI) ?
> Sounds like a recipe for disaster if the OS tries to
> reconfigure address space or bus-number space.....

Some OEMs do leave host bridges unreported.  Here's an example, from
an HP DL380 G7:

  ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-1a])
  pci 0000:00:00.0: [8086:3406] type 0 class 0x000600
  pci 0000:00:01.0: [8086:3408] type 1 class 0x000604
  pci 0000:00:02.0: [8086:3409] type 1 class 0x000604
  ...
  PCI: Discovered peer bus 3e
  pci 0000:3e:00.0: [8086:2c70] type 0 class 0x000600
  pci 0000:3e:00.1: [8086:2d81] type 0 class 0x000600
  pci 0000:3e:02.0: [8086:2d90] type 0 class 0x000600
  ...
  PCI: Discovered peer bus 3f
  pci 0000:3f:00.0: [8086:2c70] type 0 class 0x000600
  pci 0000:3f:00.1: [8086:2d81] type 0 class 0x000600
  pci 0000:3f:02.0: [8086:2d90] type 0 class 0x000600

ACPI did not report host bridges leading to buses 3e and 3f; we found
those devices by probing blindly.

In this case, these are Intel CPU uncore devices, and they don't
consume MMIO or IO port resources.  But you're absolutely right that
we can't safely reconfigure anything we find this way.

> Hiding a PCI device is as simple as not loading its driver... :)

Sure, if it's the OS that wants to hide it.  In this case, it's the
OEM that wants to hide it, and the only mechanism for doing that is to
refrain from telling the OS how to find it.  I'm pretty confident that
the DL380 omission of those host bridges is an intentional choice by
the BIOS writers.

Bjorn

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges
  2012-04-25 16:50                       ` Bjorn Helgaas
@ 2012-04-26  3:35                         ` Don Dutile
  2012-04-26  3:53                           ` Jiang Liu
  2012-04-26  4:02                           ` Jiang Liu
  0 siblings, 2 replies; 26+ messages in thread
From: Don Dutile @ 2012-04-26  3:35 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jiang Liu, Kenji Kaneshige, Yinghai Lu, Taku Izumi, Jiang Liu,
	Keping Chen, linux-pci, Pearson, Greg

On 04/25/2012 12:50 PM, Bjorn Helgaas wrote:
> On Mon, Apr 23, 2012 at 12:50 PM, Don Dutile<ddutile@redhat.com>  wrote:
>> On 04/23/2012 01:41 PM, Bjorn Helgaas wrote:
>>>
>>> On Mon, Apr 16, 2012 at 11:54 AM, Don Dutile<ddutile@redhat.com>    wrote:
>>>>
>>>> On 04/16/2012 12:09 PM, Jiang Liu wrote:
>>>>>
>>>>>
>>>>> Hi Don,
>>>>>         Thanks for your comments and please refer to inline comments
>>>>> below.
>>>>
>>>>
>>>>
>>>> Thanks for the info below; couple quick replies below.. - Don
>>>>
>>>>
>>>>> On 04/16/2012 11:30 PM, Don Dutile wrote:
>>>>>>
>>>>>>
>>>>>> On 04/13/2012 10:33 AM, Jiang Liu wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 04/13/2012 06:48 PM, Kenji Kaneshige wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> (2012/04/12 9:06), Bjorn Helgaas wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Apr 11, 2012 at 9:34 AM, Jiang Liu<liuj97@gmail.com>
>>>>>>>>>   wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 04/11/2012 08:05 PM, Kenji Kaneshige wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> (2012/04/11 13:02), Bjorn Helgaas wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Apr 10, 2012 at 6:10 PM, Jiang Liu<liuj97@gmail.com>
>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> This patchset enhance pci_root driver to update MMCFG
>>>>>>>>>>>>> information
>>>>>>>>>>>>> when
>>>>>>>>>>>>> hot-plugging PCI root bridges. It applies to Yinghai's tree at
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
>>>>>>>>>>>>> for-pci-root-bus-hotplug
>>>>>>>>>>>>>
>>>>>>>>>>>>> The second patch is based on Taku Izumi work with some
>>>>>>>>>>>>> enhancements to
>>>>>>>>>>>>> correctly handle PCI host bridges without _CBA method.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I'm sorry I won't have time to really review these for a couple
>>>>>>>>>>>> weeks.
>>>>>>>>>>>>
>>>>>>>>>>>> It always seemed wrong to me that we parse MCFG and set things up
>>>>>>>>>>>> before we even look at PNP0A03/PNP0A08 devices.  It would make
>>>>>>>>>>>> more
>>>>>>>>>>>> sense to me to have something in acpi_pci_root_add() to set up
>>>>>>>>>>>> MMCONFIG using _CBA if available, and falling back to parsing
>>>>>>>>>>>> MCFG
>>>>>>>>>>>> if
>>>>>>>>>>>> it's not.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I think your idea could make the code (design) much cleaner.
>>>>>>>>>>> Do you have any other reason why you think "It always seemed
>>>>>>>>>>> wrong..."?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The current scheme is just an ugly design.  Does I need more
>>>>>>>>> reasons?
>>>>>>>>>   :)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Ok, I just wanted to know if I'm missing anything we need to
>>>>>>>> take into account when re-factoring the code.
>>>>>>>>
>>>>>>>> By the way, the following code makes me think there could be
>>>>>>>> some hardwares that need a fixup using mmconfig access before
>>>>>>>> scanning the PCI tree. If this is the case, we would need
>>>>>>>> something to enable early mmconfig initialization for those
>>>>>>>> hardwares.
>>>>>>>>
>>>>>>>> static __init int pci_arch_init(void)
>>>>>>>> {
>>>>>>>>       ...
>>>>>>>>           if (!(pci_probe&       PCI_PROBE_NOEARLY))
>>>>>>>>                   pci_mmcfg_early_init();
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Kenji Kaneshige
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> If MMCFG could be treated as an optional configuration space access
>>>>>>> method,
>>>>>>> we can refine the MMCFG code according to Bjorn's suggestion. And as
>>>>>>> Kenji
>>>>>>> has mentioned, there may be some risks ahead. So need more
>>>>>>> confirmation
>>>>>>> from other PCI experts here.
>>>>>>>
>>>>>> I looked at the thread, but didn't know which suggestion of Bjorn's you
>>>>>> were referring to.
>>>>>> But, mmcfg access to PCI config space is need for any cap structure
>>>>>> greater than 256 byte offset.  A number of devices have cap structures
>>>>>> in this upper PCI config space, esp. SRIOV devices.
>>>>>> So, if 'optional MMCFG' only means at the beginning of kernel scanning
>>>>>> of
>>>>>> PCI (pass-0 scanning), that should be ok, but in-depth, pass-1 scanning
>>>>>> of PCIe devices may require MMCFG for full functional support.
>>>>>
>>>>>
>>>>> For mainstream systems with support of ACPI and MMCFG, the booting
>>>>> sequences are about:
>>>>> 1) Probe for legacy PCI configuration access mechanism, such as CONF1,
>>>>> CONF2, BIOS
>>>>> 2) Start ACPICA/ACPI subsystem with the legacy PCI configuration access
>>>>> mechanism
>>>>> 3) Enumerate PCI root bridges (PNP0A03/PNP0A08) in ACPI namespace and
>>>>> bind
>>>>> pci_root
>>>>>     driver to them
>>>>> 4) pci_root driver calls into arch code to add MMCFG information for the
>>>>> host bridge
>>>>> 5) pci_root driver calls PCI core to enumerate all PCI devices under the
>>>>> host bridge
>>>>>
>>>>> The above flow should work for SRIOV case. But still need to check
>>>>> following cases:
>>>>> 1) ACPICA/ACPI subsystem has no dependency on MMCFG
>>>>> 2) Systems implementing SFI instead of ACPI work as expected
>>>>> 3) ACPI has been disabled by user (Bjorn points out we could ignore this
>>>>> case)
>>>>
>>>>
>>>> Agreed. My least favorite bz: "I set boot param to noacpi and can't scan
>>>> entire PCI space.... duh!
>>>>
>>>>
>>>>> 4) Some host bridges are not reported by ACPI (Bjorn points out we
>>>>> should
>>>>> eventually
>>>>>     get rid of the blind probing logic)
>>>>
>>>>
>>>> And depend on BIOS-ACPI to be correct all the time? ....hahahahahaha  ...
>>>> sorry.... you hit my funny bone! ;-)
>>>> Is blind probing problematic ?
>>>> Seems like a pci-fixup/quirk can be implemented under arch/<>/pci to
>>>> handle
>>>> these cases, and thus, depend on ACPI for host-bridge info... wait! did I
>>>> just
>>>> say depend on ACPI?!?!   :)
>>>
>>>
>>> Hope your funny bone has stopped tingling by now :)
>>>
>> Not when ACPI's always there to bang it again! ;-)
>>
>>> When we probe blindly, we don't know what resources are available on
>>> the bus (except for AMD systems).  Therefore, we can't do reliable
>>> assignment, and we have to rely on whatever the BIOS did.
>>>
>>> Blind probing finds devices not exposed by the BIOS.  This might be a
>>> BIOS bug, or it might be a conscious decision to hide the devices from
>>> the OS.  Some OEMs hide devices to reduce the likelihood of users
>>> messing things up with setpci.
>>>
>>> It would be interesting and relatively easy to figure out whether
>>> Windows ever discovers a device behind an unreported host bridge.  My
>>> guess is "no," but I haven't had time to verify this.
>>>
>> Well, call me crazy(again!), but why put a host-bridge device on a system
>> and then (try to) hide it with software(BIOS/ACPI) ?
>> Sounds like a recipe for disaster if the OS tries to
>> reconfigure address space or bus-number space.....
>
> Some OEMs do leave host bridges unreported.  Here's an example, from
> an HP DL380 G7:
>
I didn't doubt they tried to.... again, why!?!

>    ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-1a])
>    pci 0000:00:00.0: [8086:3406] type 0 class 0x000600
>    pci 0000:00:01.0: [8086:3408] type 1 class 0x000604
>    pci 0000:00:02.0: [8086:3409] type 1 class 0x000604
>    ...
>    PCI: Discovered peer bus 3e
>    pci 0000:3e:00.0: [8086:2c70] type 0 class 0x000600
>    pci 0000:3e:00.1: [8086:2d81] type 0 class 0x000600
>    pci 0000:3e:02.0: [8086:2d90] type 0 class 0x000600
>    ...
>    PCI: Discovered peer bus 3f
>    pci 0000:3f:00.0: [8086:2c70] type 0 class 0x000600
>    pci 0000:3f:00.1: [8086:2d81] type 0 class 0x000600
>    pci 0000:3f:02.0: [8086:2d90] type 0 class 0x000600
>
> ACPI did not report host bridges leading to buses 3e and 3f; we found
> those devices by probing blindly.
>
> In this case, these are Intel CPU uncore devices, and they don't
> consume MMIO or IO port resources.  But you're absolutely right that
> we can't safely reconfigure anything we find this way.
>
>> Hiding a PCI device is as simple as not loading its driver... :)
>
> Sure, if it's the OS that wants to hide it.  In this case, it's the
> OEM that wants to hide it, and the only mechanism for doing that is to
> refrain from telling the OS how to find it.  I'm pretty confident that
> the DL380 omission of those host bridges is an intentional choice by
> the BIOS writers.
>
> Bjorn

but this kind of hiding will break code paths/logic like pci=assign-busses,
b/c if the hidden bus nums aren't registered, and they are used by
the assign-bus code, then duplicate busses will have overlapping sec-num/sub-num
ranges and then there are two bridges that will race (or as I experienced, hang)
to reply with config cycles in the overlapping range.
so, scanning & registering (but not configuring) 'hidden' (non-ACPI-id'd)
bridges/busses sounds like something the pci code has to do.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges
  2012-04-26  3:35                         ` Don Dutile
@ 2012-04-26  3:53                           ` Jiang Liu
  2012-04-26  4:02                           ` Jiang Liu
  1 sibling, 0 replies; 26+ messages in thread
From: Jiang Liu @ 2012-04-26  3:53 UTC (permalink / raw)
  To: Don Dutile
  Cc: Bjorn Helgaas, Jiang Liu, Kenji Kaneshige, Yinghai Lu,
	Taku Izumi, Keping Chen, linux-pci, Pearson, Greg

On 2012-4-26 11:35, Don Dutile wrote:
> On 04/25/2012 12:50 PM, Bjorn Helgaas wrote:
>> On Mon, Apr 23, 2012 at 12:50 PM, Don Dutile<ddutile@redhat.com> wrote:
>>> On 04/23/2012 01:41 PM, Bjorn Helgaas wrote:
>>>>
>>>> On Mon, Apr 16, 2012 at 11:54 AM, Don Dutile<ddutile@redhat.com> wrote:
>>>>>
>>>>> On 04/16/2012 12:09 PM, Jiang Liu wrote:
>>>>>>
>>>>>>
>>>>>> Hi Don,
>>>>>> Thanks for your comments and please refer to inline comments
>>>>>> below.
>>>>>
>>>>>
>>>>>
>>>>> Thanks for the info below; couple quick replies below.. - Don
>>>>>
>>>>>
>>>>>> On 04/16/2012 11:30 PM, Don Dutile wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 04/13/2012 10:33 AM, Jiang Liu wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 04/13/2012 06:48 PM, Kenji Kaneshige wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> (2012/04/12 9:06), Bjorn Helgaas wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 11, 2012 at 9:34 AM, Jiang Liu<liuj97@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 04/11/2012 08:05 PM, Kenji Kaneshige wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> (2012/04/11 13:02), Bjorn Helgaas wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Apr 10, 2012 at 6:10 PM, Jiang Liu<liuj97@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This patchset enhance pci_root driver to update MMCFG
>>>>>>>>>>>>>> information
>>>>>>>>>>>>>> when
>>>>>>>>>>>>>> hot-plugging PCI root bridges. It applies to Yinghai's
>>>>>>>>>>>>>> tree at
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> for-pci-root-bus-hotplug
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The second patch is based on Taku Izumi work with some
>>>>>>>>>>>>>> enhancements to
>>>>>>>>>>>>>> correctly handle PCI host bridges without _CBA method.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm sorry I won't have time to really review these for a
>>>>>>>>>>>>> couple
>>>>>>>>>>>>> weeks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It always seemed wrong to me that we parse MCFG and set
>>>>>>>>>>>>> things up
>>>>>>>>>>>>> before we even look at PNP0A03/PNP0A08 devices. It would make
>>>>>>>>>>>>> more
>>>>>>>>>>>>> sense to me to have something in acpi_pci_root_add() to set up
>>>>>>>>>>>>> MMCONFIG using _CBA if available, and falling back to parsing
>>>>>>>>>>>>> MCFG
>>>>>>>>>>>>> if
>>>>>>>>>>>>> it's not.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I think your idea could make the code (design) much cleaner.
>>>>>>>>>>>> Do you have any other reason why you think "It always seemed
>>>>>>>>>>>> wrong..."?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The current scheme is just an ugly design. Does I need more
>>>>>>>>>> reasons?
>>>>>>>>>> :)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ok, I just wanted to know if I'm missing anything we need to
>>>>>>>>> take into account when re-factoring the code.
>>>>>>>>>
>>>>>>>>> By the way, the following code makes me think there could be
>>>>>>>>> some hardwares that need a fixup using mmconfig access before
>>>>>>>>> scanning the PCI tree. If this is the case, we would need
>>>>>>>>> something to enable early mmconfig initialization for those
>>>>>>>>> hardwares.
>>>>>>>>>
>>>>>>>>> static __init int pci_arch_init(void)
>>>>>>>>> {
>>>>>>>>> ...
>>>>>>>>> if (!(pci_probe& PCI_PROBE_NOEARLY))
>>>>>>>>> pci_mmcfg_early_init();
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Kenji Kaneshige
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> If MMCFG could be treated as an optional configuration space access
>>>>>>>> method,
>>>>>>>> we can refine the MMCFG code according to Bjorn's suggestion.
>>>>>>>> And as
>>>>>>>> Kenji
>>>>>>>> has mentioned, there may be some risks ahead. So need more
>>>>>>>> confirmation
>>>>>>>> from other PCI experts here.
>>>>>>>>
>>>>>>> I looked at the thread, but didn't know which suggestion of
>>>>>>> Bjorn's you
>>>>>>> were referring to.
>>>>>>> But, mmcfg access to PCI config space is need for any cap structure
>>>>>>> greater than 256 byte offset. A number of devices have cap
>>>>>>> structures
>>>>>>> in this upper PCI config space, esp. SRIOV devices.
>>>>>>> So, if 'optional MMCFG' only means at the beginning of kernel
>>>>>>> scanning
>>>>>>> of
>>>>>>> PCI (pass-0 scanning), that should be ok, but in-depth, pass-1
>>>>>>> scanning
>>>>>>> of PCIe devices may require MMCFG for full functional support.
>>>>>>
>>>>>>
>>>>>> For mainstream systems with support of ACPI and MMCFG, the booting
>>>>>> sequences are about:
>>>>>> 1) Probe for legacy PCI configuration access mechanism, such as
>>>>>> CONF1,
>>>>>> CONF2, BIOS
>>>>>> 2) Start ACPICA/ACPI subsystem with the legacy PCI configuration
>>>>>> access
>>>>>> mechanism
>>>>>> 3) Enumerate PCI root bridges (PNP0A03/PNP0A08) in ACPI namespace and
>>>>>> bind
>>>>>> pci_root
>>>>>> driver to them
>>>>>> 4) pci_root driver calls into arch code to add MMCFG information
>>>>>> for the
>>>>>> host bridge
>>>>>> 5) pci_root driver calls PCI core to enumerate all PCI devices
>>>>>> under the
>>>>>> host bridge
>>>>>>
>>>>>> The above flow should work for SRIOV case. But still need to check
>>>>>> following cases:
>>>>>> 1) ACPICA/ACPI subsystem has no dependency on MMCFG
>>>>>> 2) Systems implementing SFI instead of ACPI work as expected
>>>>>> 3) ACPI has been disabled by user (Bjorn points out we could
>>>>>> ignore this
>>>>>> case)
>>>>>
>>>>>
>>>>> Agreed. My least favorite bz: "I set boot param to noacpi and can't
>>>>> scan
>>>>> entire PCI space.... duh!
>>>>>
>>>>>
>>>>>> 4) Some host bridges are not reported by ACPI (Bjorn points out we
>>>>>> should
>>>>>> eventually
>>>>>> get rid of the blind probing logic)
>>>>>
>>>>>
>>>>> And depend on BIOS-ACPI to be correct all the time?
>>>>> ....hahahahahaha ...
>>>>> sorry.... you hit my funny bone! ;-)
>>>>> Is blind probing problematic ?
>>>>> Seems like a pci-fixup/quirk can be implemented under arch/<>/pci to
>>>>> handle
>>>>> these cases, and thus, depend on ACPI for host-bridge info... wait!
>>>>> did I
>>>>> just
>>>>> say depend on ACPI?!?! :)
>>>>
>>>>
>>>> Hope your funny bone has stopped tingling by now :)
>>>>
>>> Not when ACPI's always there to bang it again! ;-)
>>>
>>>> When we probe blindly, we don't know what resources are available on
>>>> the bus (except for AMD systems). Therefore, we can't do reliable
>>>> assignment, and we have to rely on whatever the BIOS did.
>>>>
>>>> Blind probing finds devices not exposed by the BIOS. This might be a
>>>> BIOS bug, or it might be a conscious decision to hide the devices from
>>>> the OS. Some OEMs hide devices to reduce the likelihood of users
>>>> messing things up with setpci.
>>>>
>>>> It would be interesting and relatively easy to figure out whether
>>>> Windows ever discovers a device behind an unreported host bridge. My
>>>> guess is "no," but I haven't had time to verify this.
>>>>
>>> Well, call me crazy(again!), but why put a host-bridge device on a
>>> system
>>> and then (try to) hide it with software(BIOS/ACPI) ?
>>> Sounds like a recipe for disaster if the OS tries to
>>> reconfigure address space or bus-number space.....
>>
>> Some OEMs do leave host bridges unreported. Here's an example, from
>> an HP DL380 G7:
>>
> I didn't doubt they tried to.... again, why!?!
Some OEMs may choose to hidden special PCI host bridges due to security
reasons.
For example, Intel NHM-EX processor has an embedded PCI host bridge,
which hosts all processor specific configuration information.
You could do memory physical address to memory device address
translation according to information exposed by PCI devices on those
embedded PCI host buses. (Once we have done that.) But later those
PCI host bridges are hidden from OS by BIOS due to security concerns.

>
>> ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-1a])
>> pci 0000:00:00.0: [8086:3406] type 0 class 0x000600
>> pci 0000:00:01.0: [8086:3408] type 1 class 0x000604
>> pci 0000:00:02.0: [8086:3409] type 1 class 0x000604
>> ...
>> PCI: Discovered peer bus 3e
>> pci 0000:3e:00.0: [8086:2c70] type 0 class 0x000600
>> pci 0000:3e:00.1: [8086:2d81] type 0 class 0x000600
>> pci 0000:3e:02.0: [8086:2d90] type 0 class 0x000600
>> ...
>> PCI: Discovered peer bus 3f
>> pci 0000:3f:00.0: [8086:2c70] type 0 class 0x000600
>> pci 0000:3f:00.1: [8086:2d81] type 0 class 0x000600
>> pci 0000:3f:02.0: [8086:2d90] type 0 class 0x000600
>>
>> ACPI did not report host bridges leading to buses 3e and 3f; we found
>> those devices by probing blindly.
>>
>> In this case, these are Intel CPU uncore devices, and they don't
>> consume MMIO or IO port resources. But you're absolutely right that
>> we can't safely reconfigure anything we find this way.
>>
>>> Hiding a PCI device is as simple as not loading its driver... :)
>>
>> Sure, if it's the OS that wants to hide it. In this case, it's the
>> OEM that wants to hide it, and the only mechanism for doing that is to
>> refrain from telling the OS how to find it. I'm pretty confident that
>> the DL380 omission of those host bridges is an intentional choice by
>> the BIOS writers.
>>
>> Bjorn
>
> but this kind of hiding will break code paths/logic like pci=assign-busses,
> b/c if the hidden bus nums aren't registered, and they are used by
> the assign-bus code, then duplicate busses will have overlapping
> sec-num/sub-num
> ranges and then there are two bridges that will race (or as I
> experienced, hang)
> to reply with config cycles in the overlapping range.
> so, scanning & registering (but not configuring) 'hidden' (non-ACPI-id'd)
> bridges/busses sounds like something the pci code has to do.
>
> .
>



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges
  2012-04-26  3:35                         ` Don Dutile
  2012-04-26  3:53                           ` Jiang Liu
@ 2012-04-26  4:02                           ` Jiang Liu
  1 sibling, 0 replies; 26+ messages in thread
From: Jiang Liu @ 2012-04-26  4:02 UTC (permalink / raw)
  To: Don Dutile
  Cc: Bjorn Helgaas, Jiang Liu, Kenji Kaneshige, Yinghai Lu,
	Taku Izumi, Keping Chen, linux-pci, Pearson, Greg

On 2012-4-26 11:35, Don Dutile wrote:
> On 04/25/2012 12:50 PM, Bjorn Helgaas wrote:
>> On Mon, Apr 23, 2012 at 12:50 PM, Don Dutile<ddutile@redhat.com> wrote:
>>> On 04/23/2012 01:41 PM, Bjorn Helgaas wrote:
>>>>
>>>> On Mon, Apr 16, 2012 at 11:54 AM, Don Dutile<ddutile@redhat.com> wrote:
>>>>>
>>>>> On 04/16/2012 12:09 PM, Jiang Liu wrote:
>>>>>>
>>>>>>
>>>>>> Hi Don,
>>>>>> Thanks for your comments and please refer to inline comments
>>>>>> below.
>>>>>
>>>>>
>>>>>
>>>>> Thanks for the info below; couple quick replies below.. - Don
>>>>>
>>>>>
>>>>>> On 04/16/2012 11:30 PM, Don Dutile wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 04/13/2012 10:33 AM, Jiang Liu wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 04/13/2012 06:48 PM, Kenji Kaneshige wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> (2012/04/12 9:06), Bjorn Helgaas wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 11, 2012 at 9:34 AM, Jiang Liu<liuj97@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 04/11/2012 08:05 PM, Kenji Kaneshige wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> (2012/04/11 13:02), Bjorn Helgaas wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Apr 10, 2012 at 6:10 PM, Jiang Liu<liuj97@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This patchset enhance pci_root driver to update MMCFG
>>>>>>>>>>>>>> information
>>>>>>>>>>>>>> when
>>>>>>>>>>>>>> hot-plugging PCI root bridges. It applies to Yinghai's
>>>>>>>>>>>>>> tree at
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> for-pci-root-bus-hotplug
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The second patch is based on Taku Izumi work with some
>>>>>>>>>>>>>> enhancements to
>>>>>>>>>>>>>> correctly handle PCI host bridges without _CBA method.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm sorry I won't have time to really review these for a
>>>>>>>>>>>>> couple
>>>>>>>>>>>>> weeks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It always seemed wrong to me that we parse MCFG and set
>>>>>>>>>>>>> things up
>>>>>>>>>>>>> before we even look at PNP0A03/PNP0A08 devices. It would make
>>>>>>>>>>>>> more
>>>>>>>>>>>>> sense to me to have something in acpi_pci_root_add() to set up
>>>>>>>>>>>>> MMCONFIG using _CBA if available, and falling back to parsing
>>>>>>>>>>>>> MCFG
>>>>>>>>>>>>> if
>>>>>>>>>>>>> it's not.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I think your idea could make the code (design) much cleaner.
>>>>>>>>>>>> Do you have any other reason why you think "It always seemed
>>>>>>>>>>>> wrong..."?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The current scheme is just an ugly design. Does I need more
>>>>>>>>>> reasons?
>>>>>>>>>> :)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ok, I just wanted to know if I'm missing anything we need to
>>>>>>>>> take into account when re-factoring the code.
>>>>>>>>>
>>>>>>>>> By the way, the following code makes me think there could be
>>>>>>>>> some hardwares that need a fixup using mmconfig access before
>>>>>>>>> scanning the PCI tree. If this is the case, we would need
>>>>>>>>> something to enable early mmconfig initialization for those
>>>>>>>>> hardwares.
>>>>>>>>>
>>>>>>>>> static __init int pci_arch_init(void)
>>>>>>>>> {
>>>>>>>>> ...
>>>>>>>>> if (!(pci_probe& PCI_PROBE_NOEARLY))
>>>>>>>>> pci_mmcfg_early_init();
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Kenji Kaneshige
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> If MMCFG could be treated as an optional configuration space access
>>>>>>>> method,
>>>>>>>> we can refine the MMCFG code according to Bjorn's suggestion.
>>>>>>>> And as
>>>>>>>> Kenji
>>>>>>>> has mentioned, there may be some risks ahead. So need more
>>>>>>>> confirmation
>>>>>>>> from other PCI experts here.
>>>>>>>>
>>>>>>> I looked at the thread, but didn't know which suggestion of
>>>>>>> Bjorn's you
>>>>>>> were referring to.
>>>>>>> But, mmcfg access to PCI config space is need for any cap structure
>>>>>>> greater than 256 byte offset. A number of devices have cap
>>>>>>> structures
>>>>>>> in this upper PCI config space, esp. SRIOV devices.
>>>>>>> So, if 'optional MMCFG' only means at the beginning of kernel
>>>>>>> scanning
>>>>>>> of
>>>>>>> PCI (pass-0 scanning), that should be ok, but in-depth, pass-1
>>>>>>> scanning
>>>>>>> of PCIe devices may require MMCFG for full functional support.
>>>>>>
>>>>>>
>>>>>> For mainstream systems with support of ACPI and MMCFG, the booting
>>>>>> sequences are about:
>>>>>> 1) Probe for legacy PCI configuration access mechanism, such as
>>>>>> CONF1,
>>>>>> CONF2, BIOS
>>>>>> 2) Start ACPICA/ACPI subsystem with the legacy PCI configuration
>>>>>> access
>>>>>> mechanism
>>>>>> 3) Enumerate PCI root bridges (PNP0A03/PNP0A08) in ACPI namespace and
>>>>>> bind
>>>>>> pci_root
>>>>>> driver to them
>>>>>> 4) pci_root driver calls into arch code to add MMCFG information
>>>>>> for the
>>>>>> host bridge
>>>>>> 5) pci_root driver calls PCI core to enumerate all PCI devices
>>>>>> under the
>>>>>> host bridge
>>>>>>
>>>>>> The above flow should work for SRIOV case. But still need to check
>>>>>> following cases:
>>>>>> 1) ACPICA/ACPI subsystem has no dependency on MMCFG
>>>>>> 2) Systems implementing SFI instead of ACPI work as expected
>>>>>> 3) ACPI has been disabled by user (Bjorn points out we could
>>>>>> ignore this
>>>>>> case)
>>>>>
>>>>>
>>>>> Agreed. My least favorite bz: "I set boot param to noacpi and can't
>>>>> scan
>>>>> entire PCI space.... duh!
>>>>>
>>>>>
>>>>>> 4) Some host bridges are not reported by ACPI (Bjorn points out we
>>>>>> should
>>>>>> eventually
>>>>>> get rid of the blind probing logic)
>>>>>
>>>>>
>>>>> And depend on BIOS-ACPI to be correct all the time?
>>>>> ....hahahahahaha ...
>>>>> sorry.... you hit my funny bone! ;-)
>>>>> Is blind probing problematic ?
>>>>> Seems like a pci-fixup/quirk can be implemented under arch/<>/pci to
>>>>> handle
>>>>> these cases, and thus, depend on ACPI for host-bridge info... wait!
>>>>> did I
>>>>> just
>>>>> say depend on ACPI?!?! :)
>>>>
>>>>
>>>> Hope your funny bone has stopped tingling by now :)
>>>>
>>> Not when ACPI's always there to bang it again! ;-)
>>>
>>>> When we probe blindly, we don't know what resources are available on
>>>> the bus (except for AMD systems). Therefore, we can't do reliable
>>>> assignment, and we have to rely on whatever the BIOS did.
>>>>
>>>> Blind probing finds devices not exposed by the BIOS. This might be a
>>>> BIOS bug, or it might be a conscious decision to hide the devices from
>>>> the OS. Some OEMs hide devices to reduce the likelihood of users
>>>> messing things up with setpci.
>>>>
>>>> It would be interesting and relatively easy to figure out whether
>>>> Windows ever discovers a device behind an unreported host bridge. My
>>>> guess is "no," but I haven't had time to verify this.
>>>>
>>> Well, call me crazy(again!), but why put a host-bridge device on a
>>> system
>>> and then (try to) hide it with software(BIOS/ACPI) ?
>>> Sounds like a recipe for disaster if the OS tries to
>>> reconfigure address space or bus-number space.....
>>
>> Some OEMs do leave host bridges unreported. Here's an example, from
>> an HP DL380 G7:
>>
> I didn't doubt they tried to.... again, why!?!
>
>> ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-1a])
>> pci 0000:00:00.0: [8086:3406] type 0 class 0x000600
>> pci 0000:00:01.0: [8086:3408] type 1 class 0x000604
>> pci 0000:00:02.0: [8086:3409] type 1 class 0x000604
>> ...
>> PCI: Discovered peer bus 3e
>> pci 0000:3e:00.0: [8086:2c70] type 0 class 0x000600
>> pci 0000:3e:00.1: [8086:2d81] type 0 class 0x000600
>> pci 0000:3e:02.0: [8086:2d90] type 0 class 0x000600
>> ...
>> PCI: Discovered peer bus 3f
>> pci 0000:3f:00.0: [8086:2c70] type 0 class 0x000600
>> pci 0000:3f:00.1: [8086:2d81] type 0 class 0x000600
>> pci 0000:3f:02.0: [8086:2d90] type 0 class 0x000600
>>
>> ACPI did not report host bridges leading to buses 3e and 3f; we found
>> those devices by probing blindly.
>>
>> In this case, these are Intel CPU uncore devices, and they don't
>> consume MMIO or IO port resources. But you're absolutely right that
>> we can't safely reconfigure anything we find this way.
>>
>>> Hiding a PCI device is as simple as not loading its driver... :)
>>
>> Sure, if it's the OS that wants to hide it. In this case, it's the
>> OEM that wants to hide it, and the only mechanism for doing that is to
>> refrain from telling the OS how to find it. I'm pretty confident that
>> the DL380 omission of those host bridges is an intentional choice by
>> the BIOS writers.
>>
>> Bjorn
>
> but this kind of hiding will break code paths/logic like pci=assign-busses,
> b/c if the hidden bus nums aren't registered, and they are used by
> the assign-bus code, then duplicate busses will have overlapping
> sec-num/sub-num
> ranges and then there are two bridges that will race (or as I
> experienced, hang)
> to reply with config cycles in the overlapping range.
> so, scanning & registering (but not configuring) 'hidden' (non-ACPI-id'd)
> bridges/busses sounds like something the pci code has to do.

This is really a issue. If we could distinguish segment 0 and other
segments, things may become much more simple. Segment 0 may be treated
specially for compatibility issues.



^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2012-04-26  4:04 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-11  0:10 [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges Jiang Liu
2012-04-11  0:10 ` [PATCH V4 1/6] PCI, x86: split out pci_mmcfg_check_reserved() for code reuse Jiang Liu
2012-04-11  0:10 ` [PATCH V4 2/6] PCI, x86: split out pci_mmconfig_alloc() " Jiang Liu
2012-04-11  0:11 ` [PATCH V4 3/6] PCI, x86: use RCU list to protect mmconfig list Jiang Liu
2012-04-11  0:11 ` [PATCH V4 4/6] PCI, x86: introduce pci_mmcfg_arch_map()/pci_mmcfg_arch_unmap() Jiang Liu
2012-04-11  0:11 ` [PATCH V4 5/6] PCI, x86: introduce pci_mmconfig_insert()/delete() for PCI root bridge hotplug Jiang Liu
2012-04-11  0:11 ` [PATCH V4 6/6] PCI, ACPI, x86: update MMCFG information when hot-plugging PCI host bridges Jiang Liu
2012-04-18  6:47   ` Taku Izumi
2012-04-18  7:49     ` Jiang Liu
2012-04-19  6:49       ` Taku Izumi
2012-04-19  7:04         ` Jiang Liu
2012-04-11  4:02 ` [PATCH V4 0/6] PCI, " Bjorn Helgaas
2012-04-11 12:05   ` Kenji Kaneshige
2012-04-11 15:34     ` Jiang Liu
2012-04-12  0:06       ` Bjorn Helgaas
2012-04-13 10:48         ` Kenji Kaneshige
2012-04-13 14:33           ` Jiang Liu
2012-04-16 15:30             ` Don Dutile
2012-04-16 16:09               ` Jiang Liu
2012-04-16 17:54                 ` Don Dutile
2012-04-23 17:41                   ` Bjorn Helgaas
2012-04-23 18:50                     ` Don Dutile
2012-04-25 16:50                       ` Bjorn Helgaas
2012-04-26  3:35                         ` Don Dutile
2012-04-26  3:53                           ` Jiang Liu
2012-04-26  4:02                           ` Jiang Liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.