* [PATCH 0/9] PCI: Use IORESOURCE_UNSET for unassigned BARs
@ 2014-02-26 19:37 Bjorn Helgaas
  2014-02-26 19:37 ` [PATCH 1/9] resource: Add resource_contains() Bjorn Helgaas
                   ` (9 more replies)
  0 siblings, 10 replies; 25+ messages in thread
From: Bjorn Helgaas @ 2014-02-26 19:37 UTC (permalink / raw)
  To: linux-pci; +Cc: linux-kernel

I'm trying to unify the way we handle unassigned PCI BARs, i.e., resources
where we know the type and size, but we haven't assigned an address yet.
The PCI core and the various architectures don't really have a consistent
way of dealing with these.

Many places currently use "res->start == 0" to indicate unassigned
resources.  I don't think that's a good idea in general, because it's
possible for a resource to actually start at zero.  Zero is also a
perfectly good BAR value, especially for a host bridge that translates
addresses, so I want to support that, too.
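
As a rough userspace illustration of the ambiguity (not part of this
series; the struct below is a simplified stand-in for the kernel's struct
resource, and both helper names are hypothetical), compare a start-based
test with a flag-based one:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct resource (assumption). */
struct resource {
	uint64_t start;
	uint64_t end;
	unsigned long flags;
};

#define IORESOURCE_MEM		0x00000200UL
#define IORESOURCE_UNSET	0x20000000UL	/* No address assigned yet */

/* Hypothetical helper: the "res->start == 0" heuristic. */
static bool unassigned_by_start(const struct resource *res)
{
	return res->start == 0;
}

/* Hypothetical helper: the flag-based test this series moves toward. */
static bool unassigned_by_flag(const struct resource *res)
{
	return (res->flags & IORESOURCE_UNSET) != 0;
}
```

A resource that really lives at address zero is misclassified by the first
helper but not by the second, which is the case a translating host bridge
can produce.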

The IORESOURCE_UNSET flag exists already, but is hardly used at all.  In
drivers/pci, we set it for an obscure error case, and clear it when
updating a BAR.  The microblaze and powerpc architectures use it the same
way I want to use it here: to indicate a resource with no assigned address.

Here's the outline of what this series does:

- Add resource_contains(): true iff r1 contains r2 (for minor cleanup)
- Make %pR print resource size, not address, when IORESOURCE_UNSET
- Stop advertising pci_find_parent_resource() for use in allocation
- Mark PCI resources IORESOURCE_UNSET when BIOS left decoding disabled
- Mark PCI resources IORESOURCE_UNSET while we're trying to assign addresses
- Don't enable PCI decoding when no address has been assigned to BARs

It might be too aggressive to ignore the initial value of a BAR and try to
reassign it when the BIOS left decoding disabled.  If the BIOS left
decoding *enabled*, we can have some confidence that the BAR value is
valid.  It's possible the BAR is also valid even if the BIOS turned off
decoding.  We could conceivably try to use BAR values that are inside
upstream bridge windows, even if the BAR was initially disabled.  But this
first pass just ignores the values in BARs that are disabled.

I welcome any comments :)

---

Bjorn Helgaas (9):
      resource: Add resource_contains()
      vsprintf: Add support for IORESOURCE_UNSET in %pR
      PCI: Remove pci_find_parent_resource() use for allocation
      PCI: Mark resources as IORESOURCE_UNSET if we can't assign them
      PCI: Don't clear IORESOURCE_UNSET when updating BAR
      PCI: Check IORESOURCE_UNSET before updating BAR
      PCI: Don't try to claim IORESOURCE_UNSET resources
      PCI: Ignore BAR contents when firmware left decoding disabled
      PCI: Don't enable decoding if BAR hasn't been assigned an address


 drivers/pci/host-bridge.c |    8 --------
 drivers/pci/pci.c         |   41 +++++++++++++++++++++++++----------------
 drivers/pci/probe.c       |    8 +++++++-
 drivers/pci/quirks.c      |    5 +++++
 drivers/pci/rom.c         |    2 ++
 drivers/pci/setup-res.c   |   37 +++++++++++++++++++++++++------------
 include/linux/ioport.h    |   12 +++++++++++-
 kernel/resource.c         |    8 ++------
 lib/vsprintf.c            |   13 +++++++++----
 9 files changed, 86 insertions(+), 48 deletions(-)

* [PATCH 1/9] resource: Add resource_contains()
  2014-02-26 19:37 [PATCH 0/9] PCI: Use IORESOURCE_UNSET for unassigned BARs Bjorn Helgaas
@ 2014-02-26 19:37 ` Bjorn Helgaas
  2014-02-26 19:37 ` [PATCH 2/9] vsprintf: Add support for IORESOURCE_UNSET in %pR Bjorn Helgaas
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Bjorn Helgaas @ 2014-02-26 19:37 UTC (permalink / raw)
  To: linux-pci; +Cc: linux-kernel

We have two identical copies of resource_contains() already, and more
places that could use it.  This moves it to ioport.h where it can be
shared.

resource_contains(struct resource *r1, struct resource *r2) returns true
iff r1 and r2 are the same type (most callers already checked this
separately) and the r1 address range completely contains r2.

In addition, the new resource_contains() checks that both r1 and r2 have
addresses assigned to them.  If a resource is IORESOURCE_UNSET, it doesn't
have a valid address and can't contain or be contained by another resource.
Some callers already check for this, or test res->start instead.
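
The three conditions can be exercised in userspace with a sketch like the
following.  It assumes a simplified struct resource and copies only the
flag values needed here; it is meant as an illustration of the semantics
described above, not as the kernel header itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct resource (assumption). */
struct resource {
	uint64_t start;
	uint64_t end;
	unsigned long flags;
};

#define IORESOURCE_TYPE_BITS	0x00001f00UL
#define IORESOURCE_IO		0x00000100UL
#define IORESOURCE_MEM		0x00000200UL
#define IORESOURCE_UNSET	0x20000000UL

static unsigned long resource_type(const struct resource *res)
{
	return res->flags & IORESOURCE_TYPE_BITS;
}

/* True iff r1 completely contains r2 */
static bool resource_contains(const struct resource *r1,
			      const struct resource *r2)
{
	/* Different types (e.g. IO vs. MEM) never contain each other */
	if (resource_type(r1) != resource_type(r2))
		return false;
	/* A resource without an address has no meaningful range */
	if ((r1->flags & IORESOURCE_UNSET) || (r2->flags & IORESOURCE_UNSET))
		return false;
	return r1->start <= r2->start && r1->end >= r2->end;
}
```

With a MEM window of [0xe0000000-0xefffffff], a MEM BAR at
[0xe0100000-0xe0100fff] is contained, while an IO resource with the same
numeric range, or the same BAR marked IORESOURCE_UNSET, is not.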

No functional change.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/host-bridge.c |    8 --------
 include/linux/ioport.h    |   10 ++++++++++
 kernel/resource.c         |    8 ++------
 3 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
index 06ace6248c61..47aaf22d814e 100644
--- a/drivers/pci/host-bridge.c
+++ b/drivers/pci/host-bridge.c
@@ -32,11 +32,6 @@ void pci_set_host_bridge_release(struct pci_host_bridge *bridge,
 	bridge->release_data = release_data;
 }
 
-static bool resource_contains(struct resource *res1, struct resource *res2)
-{
-	return res1->start <= res2->start && res1->end >= res2->end;
-}
-
 void pcibios_resource_to_bus(struct pci_bus *bus, struct pci_bus_region *region,
 			     struct resource *res)
 {
@@ -45,9 +40,6 @@ void pcibios_resource_to_bus(struct pci_bus *bus, struct pci_bus_region *region,
 	resource_size_t offset = 0;
 
 	list_for_each_entry(window, &bridge->windows, list) {
-		if (resource_type(res) != resource_type(window->res))
-			continue;
-
 		if (resource_contains(window->res, res)) {
 			offset = window->offset;
 			break;
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 89b7c24a36e9..9fcaac8bc4f6 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -169,6 +169,16 @@ static inline unsigned long resource_type(const struct resource *res)
 {
 	return res->flags & IORESOURCE_TYPE_BITS;
 }
+/* True iff r1 completely contains r2 */
+static inline bool resource_contains(struct resource *r1, struct resource *r2)
+{
+	if (resource_type(r1) != resource_type(r2))
+		return false;
+	if (r1->flags & IORESOURCE_UNSET || r2->flags & IORESOURCE_UNSET)
+		return false;
+	return r1->start <= r2->start && r1->end >= r2->end;
+}
+
 
 /* Convenience shorthand with allocation */
 #define request_region(start,n,name)		__request_region(&ioport_resource, (start), (n), (name), 0)
diff --git a/kernel/resource.c b/kernel/resource.c
index 3f285dce9347..a8344dda7049 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -432,11 +432,6 @@ static void resource_clip(struct resource *res, resource_size_t min,
 		res->end = max;
 }
 
-static bool resource_contains(struct resource *res1, struct resource *res2)
-{
-	return res1->start <= res2->start && res1->end >= res2->end;
-}
-
 /*
  * Find empty slot in the resource tree with the given range and
  * alignment constraints
@@ -471,10 +466,11 @@ static int __find_resource(struct resource *root, struct resource *old,
 		arch_remove_reservations(&tmp);
 
 		/* Check for overflow after ALIGN() */
-		avail = *new;
 		avail.start = ALIGN(tmp.start, constraint->align);
 		avail.end = tmp.end;
+		avail.flags = new->flags & ~IORESOURCE_UNSET;
 		if (avail.start >= tmp.start) {
+			alloc.flags = avail.flags;
 			alloc.start = constraint->alignf(constraint->alignf_data, &avail,
 					size, constraint->align);
 			alloc.end = alloc.start + size - 1;


* [PATCH 2/9] vsprintf: Add support for IORESOURCE_UNSET in %pR
  2014-02-26 19:37 [PATCH 0/9] PCI: Use IORESOURCE_UNSET for unassigned BARs Bjorn Helgaas
  2014-02-26 19:37 ` [PATCH 1/9] resource: Add resource_contains() Bjorn Helgaas
@ 2014-02-26 19:37 ` Bjorn Helgaas
  2014-02-26 19:37 ` [PATCH 3/9] PCI: Remove pci_find_parent_resource() use for allocation Bjorn Helgaas
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Bjorn Helgaas @ 2014-02-26 19:37 UTC (permalink / raw)
  To: linux-pci; +Cc: linux-kernel

Sometimes we have a struct resource where we know the type (MEM/IO/etc.)
and the size, but we haven't assigned address space for it.  The
IORESOURCE_UNSET flag is a way to indicate this situation.  For these
"unset" resources, the start address is meaningless, so print only the
size, e.g.,

  - pci 0000:0c:00.0: reg 184: [mem 0x00000000-0x00001fff 64bit]
  + pci 0000:0c:00.0: reg 184: [mem size 0x2000 64bit]

For %pr (printing with raw flags), we still print the address range,
because %pr is mostly used for debugging anyway.
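
The intended output can be approximated in userspace with snprintf().
This is only a sketch: format_mem_resource() is a hypothetical helper, the
struct is a simplified stand-in, and the "64bit" and other flag suffixes
that the real resource_string() appends are left out.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for the kernel's struct resource (assumption). */
struct resource {
	uint64_t start;
	uint64_t end;
	unsigned long flags;
};

#define IORESOURCE_MEM		0x00000200UL
#define IORESOURCE_UNSET	0x20000000UL

/* Hypothetical helper mimicking the %pR decision for a MEM resource. */
static int format_mem_resource(char *buf, size_t len,
			       const struct resource *res)
{
	/* Unset: the address is meaningless, so show only the size */
	if (res->flags & IORESOURCE_UNSET)
		return snprintf(buf, len, "[mem size 0x%" PRIx64 "]",
				res->end - res->start + 1);

	/* Assigned: show the address range */
	return snprintf(buf, len, "[mem 0x%08" PRIx64 "-0x%08" PRIx64 "]",
			res->start, res->end);
}
```

For the resource in the example above (start 0, end 0x1fff), the helper
produces "[mem size 0x2000]" when IORESOURCE_UNSET is set and
"[mem 0x00000000-0x00001fff]" when it is not.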

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 include/linux/ioport.h |    2 +-
 lib/vsprintf.c         |   13 +++++++++----
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 9fcaac8bc4f6..5e3a906cc089 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -51,7 +51,7 @@ struct resource {
 
 #define IORESOURCE_EXCLUSIVE	0x08000000	/* Userland may not map this resource */
 #define IORESOURCE_DISABLED	0x10000000
-#define IORESOURCE_UNSET	0x20000000
+#define IORESOURCE_UNSET	0x20000000	/* No address assigned yet */
 #define IORESOURCE_AUTO		0x40000000
 #define IORESOURCE_BUSY		0x80000000	/* Driver has marked this resource busy */
 
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 185b6d300ebc..c14669f4ffc4 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -719,10 +719,15 @@ char *resource_string(char *buf, char *end, struct resource *res,
 		specp = &mem_spec;
 		decode = 0;
 	}
-	p = number(p, pend, res->start, *specp);
-	if (res->start != res->end) {
-		*p++ = '-';
-		p = number(p, pend, res->end, *specp);
+	if (decode && res->flags & IORESOURCE_UNSET) {
+		p = string(p, pend, "size ", str_spec);
+		p = number(p, pend, res->end - res->start + 1, *specp);
+	} else {
+		p = number(p, pend, res->start, *specp);
+		if (res->start != res->end) {
+			*p++ = '-';
+			p = number(p, pend, res->end, *specp);
+		}
 	}
 	if (decode) {
 		if (res->flags & IORESOURCE_MEM_64)


* [PATCH 3/9] PCI: Remove pci_find_parent_resource() use for allocation
  2014-02-26 19:37 [PATCH 0/9] PCI: Use IORESOURCE_UNSET for unassigned BARs Bjorn Helgaas
  2014-02-26 19:37 ` [PATCH 1/9] resource: Add resource_contains() Bjorn Helgaas
  2014-02-26 19:37 ` [PATCH 2/9] vsprintf: Add support for IORESOURCE_UNSET in %pR Bjorn Helgaas
@ 2014-02-26 19:37 ` Bjorn Helgaas
  2014-02-26 19:37 ` [PATCH 4/9] PCI: Mark resources as IORESOURCE_UNSET if we can't assign them Bjorn Helgaas
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Bjorn Helgaas @ 2014-02-26 19:37 UTC (permalink / raw)
  To: linux-pci; +Cc: linux-kernel

If the resource hasn't been allocated yet, pci_find_parent_resource() is
documented as returning the region "where it should be allocated from."
This is impossible in general because there may be several candidates: a
prefetchable BAR can be put in either a prefetchable or non-prefetchable
window, a transparent bridge may have overlapping positively- and
subtractively-decoded windows, and a root bus may have several windows of
the same type.

Allocation should be done by pci_bus_alloc_resource(), which iterates
through all bus resources and looks for the best match, e.g., one with the
desired prefetchability attributes, and falls back to less-desired
possibilities.

The only valid use of pci_find_parent_resource() is to find the parent of
an already-allocated resource so we can claim it via request_resource(),
and all we need for that is a bus region of the correct type that contains
the resource.

Note that like 8c8def26bfaa ("PCI: allow matching of prefetchable resources
to non-prefetchable windows"), this depends on pci_bus_for_each_resource()
iterating through positively-decoded regions before subtractively-decoded
ones.  We prefer not to return a subtractively-decoded region because
requesting from it will likely conflict with the overlapping positively-
decoded window (see Launchpad report below).

Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/424142
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/pci/pci.c |   39 +++++++++++++++++++++++----------------
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 1febe90831b4..99293fa40db9 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -401,33 +401,40 @@ EXPORT_SYMBOL_GPL(pci_find_ht_capability);
  * @res: child resource record for which parent is sought
  *
  *  For given resource region of given device, return the resource
- *  region of parent bus the given region is contained in or where
- *  it should be allocated from.
+ *  region of parent bus the given region is contained in.
  */
 struct resource *
 pci_find_parent_resource(const struct pci_dev *dev, struct resource *res)
 {
 	const struct pci_bus *bus = dev->bus;
+	struct resource *r;
 	int i;
-	struct resource *best = NULL, *r;
 
 	pci_bus_for_each_resource(bus, r, i) {
 		if (!r)
 			continue;
-		if (res->start && !(res->start >= r->start && res->end <= r->end))
-			continue;	/* Not contained */
-		if ((res->flags ^ r->flags) & (IORESOURCE_IO | IORESOURCE_MEM))
-			continue;	/* Wrong type */
-		if (!((res->flags ^ r->flags) & IORESOURCE_PREFETCH))
-			return r;	/* Exact match */
-		/* We can't insert a non-prefetch resource inside a prefetchable parent .. */
-		if (r->flags & IORESOURCE_PREFETCH)
-			continue;
-		/* .. but we can put a prefetchable resource inside a non-prefetchable one */
-		if (!best)
-			best = r;
+		if (res->start && resource_contains(r, res)) {
+
+			/*
+			 * If the window is prefetchable but the BAR is
+			 * not, the allocator made a mistake.
+			 */
+			if (r->flags & IORESOURCE_PREFETCH &&
+			    !(res->flags & IORESOURCE_PREFETCH))
+				return NULL;
+
+			/*
+			 * If we're below a transparent bridge, there may
+			 * be both a positively-decoded aperture and a
+			 * subtractively-decoded region that contain the BAR.
+			 * We want the positively-decoded one, so this depends
+			 * on pci_bus_for_each_resource() giving us those
+			 * first.
+			 */
+			return r;
+		}
 	}
-	return best;
+	return NULL;
 }
 
 /**


* [PATCH 4/9] PCI: Mark resources as IORESOURCE_UNSET if we can't assign them
  2014-02-26 19:37 [PATCH 0/9] PCI: Use IORESOURCE_UNSET for unassigned BARs Bjorn Helgaas
                   ` (2 preceding siblings ...)
  2014-02-26 19:37 ` [PATCH 3/9] PCI: Remove pci_find_parent_resource() use for allocation Bjorn Helgaas
@ 2014-02-26 19:37 ` Bjorn Helgaas
  2014-02-26 19:37 ` [PATCH 5/9] PCI: Don't clear IORESOURCE_UNSET when updating BAR Bjorn Helgaas
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Bjorn Helgaas @ 2014-02-26 19:37 UTC (permalink / raw)
  To: linux-pci; +Cc: linux-kernel

When assigning addresses to resources, mark them with IORESOURCE_UNSET
before we start and clear IORESOURCE_UNSET if assignment is successful.
That means that if we print the resource during assignment, we will show
the size, not a meaningless address, and once an address has been assigned
we print that address because it is now valid.

Also mark resources IORESOURCE_UNSET in the quirks and alignment code
that reset a BAR to start at zero so it can be reassigned.
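
The set-then-clear pattern looks roughly like this in isolation.  The
sketch assumes a simplified struct resource; assign_resource() and the two
allocator callbacks are hypothetical stand-ins for pci_assign_resource()
and the real allocator, which do considerably more.

```c
#include <stdint.h>

/* Simplified stand-in for the kernel's struct resource (assumption). */
struct resource {
	uint64_t start;
	uint64_t end;
	unsigned long flags;
};

#define IORESOURCE_MEM		0x00000200UL
#define IORESOURCE_UNSET	0x20000000UL

/* Hypothetical allocator: returns 0 and fills in start/end on success. */
typedef int (*alloc_fn)(struct resource *res, uint64_t size);

/* Hypothetical allocator that always places the resource at one base. */
static int alloc_at_fixed_base(struct resource *res, uint64_t size)
{
	res->start = 0xe0000000;
	res->end = res->start + size - 1;
	return 0;
}

/* Hypothetical allocator that never finds space. */
static int alloc_always_fails(struct resource *res, uint64_t size)
{
	(void)res;
	(void)size;
	return -1;
}

static int assign_resource(struct resource *res, alloc_fn alloc)
{
	uint64_t size = res->end - res->start + 1;
	int ret;

	/* No valid address while assignment is in progress */
	res->flags |= IORESOURCE_UNSET;

	ret = alloc(res, size);
	if (!ret)
		res->flags &= ~IORESOURCE_UNSET;	/* address is valid now */

	return ret;
}
```

If the allocator fails, the resource stays marked IORESOURCE_UNSET, so any
later message that prints it shows the size rather than a stale address.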

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/pci.c       |    2 ++
 drivers/pci/quirks.c    |    5 +++++
 drivers/pci/rom.c       |    2 ++
 drivers/pci/setup-res.c |    4 ++++
 4 files changed, 13 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 99293fa40db9..dc9ce62be7aa 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4244,6 +4244,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev)
 				"Rounding up size of resource #%d to %#llx.\n",
 				i, (unsigned long long)size);
 		}
+		r->flags |= IORESOURCE_UNSET;
 		r->end = size - 1;
 		r->start = 0;
 	}
@@ -4257,6 +4258,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev)
 			r = &dev->resource[i];
 			if (!(r->flags & IORESOURCE_MEM))
 				continue;
+			r->flags |= IORESOURCE_UNSET;
 			r->end = resource_size(r) - 1;
 			r->start = 0;
 		}
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 5cb726c193de..6e596ab77fb9 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -296,6 +296,7 @@ static void quirk_s3_64M(struct pci_dev *dev)
 	struct resource *r = &dev->resource[0];
 
 	if ((r->start & 0x3ffffff) || r->end != r->start + 0x3ffffff) {
+		r->flags |= IORESOURCE_UNSET;
 		r->start = 0;
 		r->end = 0x3ffffff;
 	}
@@ -937,6 +938,8 @@ DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_AMD,	PCI_DEVICE_ID_AMD_FE_GATE_700C
 static void quirk_dunord(struct pci_dev *dev)
 {
 	struct resource *r = &dev->resource [1];
+
+	r->flags |= IORESOURCE_UNSET;
 	r->start = 0;
 	r->end = 0xffffff;
 }
@@ -1740,6 +1743,7 @@ static void quirk_tc86c001_ide(struct pci_dev *dev)
 	struct resource *r = &dev->resource[0];
 
 	if (r->start & 0x8) {
+		r->flags |= IORESOURCE_UNSET;
 		r->start = 0;
 		r->end = 0xf;
 	}
@@ -1769,6 +1773,7 @@ static void quirk_plx_pci9050(struct pci_dev *dev)
 			dev_info(&dev->dev,
 				 "Re-allocating PLX PCI 9050 BAR %u to length 256 to avoid bit 7 bug\n",
 				 bar);
+			r->flags |= IORESOURCE_UNSET;
 			r->start = 0;
 			r->end = 0xff;
 		}
diff --git a/drivers/pci/rom.c b/drivers/pci/rom.c
index 5d595724e5f4..c1839450d4d6 100644
--- a/drivers/pci/rom.c
+++ b/drivers/pci/rom.c
@@ -197,8 +197,10 @@ void pci_unmap_rom(struct pci_dev *pdev, void __iomem *rom)
 void pci_cleanup_rom(struct pci_dev *pdev)
 {
 	struct resource *res = &pdev->resource[PCI_ROM_RESOURCE];
+
 	if (res->flags & IORESOURCE_ROM_COPY) {
 		kfree((void*)(unsigned long)res->start);
+		res->flags |= IORESOURCE_UNSET;
 		res->flags &= ~IORESOURCE_ROM_COPY;
 		res->start = 0;
 		res->end = 0;
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 5c060b152ce6..0474b0217fdf 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -263,6 +263,7 @@ int pci_assign_resource(struct pci_dev *dev, int resno)
 	resource_size_t align, size;
 	int ret;
 
+	res->flags |= IORESOURCE_UNSET;
 	align = pci_resource_alignment(dev, res);
 	if (!align) {
 		dev_info(&dev->dev, "BAR %d: can't assign %pR "
@@ -282,6 +283,7 @@ int pci_assign_resource(struct pci_dev *dev, int resno)
 		ret = pci_revert_fw_address(res, dev, resno, size);
 
 	if (!ret) {
+		res->flags &= ~IORESOURCE_UNSET;
 		res->flags &= ~IORESOURCE_STARTALIGN;
 		dev_info(&dev->dev, "BAR %d: assigned %pR\n", resno, res);
 		if (resno < PCI_BRIDGE_RESOURCES)
@@ -297,6 +299,7 @@ int pci_reassign_resource(struct pci_dev *dev, int resno, resource_size_t addsiz
 	resource_size_t new_size;
 	int ret;
 
+	res->flags |= IORESOURCE_UNSET;
 	if (!res->parent) {
 		dev_info(&dev->dev, "BAR %d: can't reassign an unassigned resource %pR "
 			 "\n", resno, res);
@@ -307,6 +310,7 @@ int pci_reassign_resource(struct pci_dev *dev, int resno, resource_size_t addsiz
 	new_size = resource_size(res) + addsize;
 	ret = _pci_assign_resource(dev, resno, new_size, min_align);
 	if (!ret) {
+		res->flags &= ~IORESOURCE_UNSET;
 		res->flags &= ~IORESOURCE_STARTALIGN;
 		dev_info(&dev->dev, "BAR %d: reassigned %pR\n", resno, res);
 		if (resno < PCI_BRIDGE_RESOURCES)


* [PATCH 5/9] PCI: Don't clear IORESOURCE_UNSET when updating BAR
  2014-02-26 19:37 [PATCH 0/9] PCI: Use IORESOURCE_UNSET for unassigned BARs Bjorn Helgaas
                   ` (3 preceding siblings ...)
  2014-02-26 19:37 ` [PATCH 4/9] PCI: Mark resources as IORESOURCE_UNSET if we can't assign them Bjorn Helgaas
@ 2014-02-26 19:37 ` Bjorn Helgaas
  2014-02-26 19:37 ` [PATCH 6/9] PCI: Check IORESOURCE_UNSET before " Bjorn Helgaas
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Bjorn Helgaas @ 2014-02-26 19:37 UTC (permalink / raw)
  To: linux-pci; +Cc: linux-kernel

Clear IORESOURCE_UNSET when we assign an address to a resource, not when we
write the address to the BAR.

Also, drop the "BAR %d: set to %pR" message; this is mostly redundant with
the "BAR %d: assigned %pR" message from pci_assign_resource().

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/setup-res.c |    5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 0474b0217fdf..725d5b28398c 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -101,11 +101,6 @@ void pci_update_resource(struct pci_dev *dev, int resno)
 
 	if (disable)
 		pci_write_config_word(dev, PCI_COMMAND, cmd);
-
-	res->flags &= ~IORESOURCE_UNSET;
-	dev_dbg(&dev->dev, "BAR %d: set to %pR (PCI address [%#llx-%#llx])\n",
-		resno, res, (unsigned long long)region.start,
-		(unsigned long long)region.end);
 }
 
 int pci_claim_resource(struct pci_dev *dev, int resource)


* [PATCH 6/9] PCI: Check IORESOURCE_UNSET before updating BAR
  2014-02-26 19:37 [PATCH 0/9] PCI: Use IORESOURCE_UNSET for unassigned BARs Bjorn Helgaas
                   ` (4 preceding siblings ...)
  2014-02-26 19:37 ` [PATCH 5/9] PCI: Don't clear IORESOURCE_UNSET when updating BAR Bjorn Helgaas
@ 2014-02-26 19:37 ` Bjorn Helgaas
  2014-02-26 19:37 ` [PATCH 7/9] PCI: Don't try to claim IORESOURCE_UNSET resources Bjorn Helgaas
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Bjorn Helgaas @ 2014-02-26 19:37 UTC (permalink / raw)
  To: linux-pci; +Cc: linux-kernel

Check to make sure we don't update a BAR with an address we haven't
assigned.

If we haven't assigned an address to a resource, we shouldn't write it to a
BAR.  This isn't a problem for the usual path via pci_assign_resource(),
which clears IORESOURCE_UNSET before calling pci_update_resource(), but
paths like pci_restore_bars() can call this for resources we haven't
assigned.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/setup-res.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 725d5b28398c..7f7652176fc5 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -44,6 +44,9 @@ void pci_update_resource(struct pci_dev *dev, int resno)
 	if (!res->flags)
 		return;
 
+	if (res->flags & IORESOURCE_UNSET)
+		return;
+
 	/*
 	 * Ignore non-moveable resources.  This might be legacy resources for
 	 * which no functional BAR register exists or another important


* [PATCH 7/9] PCI: Don't try to claim IORESOURCE_UNSET resources
  2014-02-26 19:37 [PATCH 0/9] PCI: Use IORESOURCE_UNSET for unassigned BARs Bjorn Helgaas
                   ` (5 preceding siblings ...)
  2014-02-26 19:37 ` [PATCH 6/9] PCI: Check IORESOURCE_UNSET before " Bjorn Helgaas
@ 2014-02-26 19:37 ` Bjorn Helgaas
  2014-02-26 19:37 ` [PATCH 8/9] PCI: Ignore BAR contents when firmware left decoding disabled Bjorn Helgaas
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Bjorn Helgaas @ 2014-02-26 19:37 UTC (permalink / raw)
  To: linux-pci; +Cc: linux-kernel

If the IORESOURCE_UNSET bit is set, it means we haven't assigned an address
yet, so don't try to claim the region.

Also, make the error messages more uniform and add info about which BAR is
involved.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/setup-res.c |   15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 7f7652176fc5..6e443135ba24 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -111,18 +111,23 @@ int pci_claim_resource(struct pci_dev *dev, int resource)
 	struct resource *res = &dev->resource[resource];
 	struct resource *root, *conflict;
 
+	if (res->flags & IORESOURCE_UNSET) {
+		dev_info(&dev->dev, "can't claim BAR %d %pR: no address assigned\n",
+			 resource, res);
+		return -EINVAL;
+	}
+
 	root = pci_find_parent_resource(dev, res);
 	if (!root) {
-		dev_info(&dev->dev, "no compatible bridge window for %pR\n",
-			 res);
+		dev_info(&dev->dev, "can't claim BAR %d %pR: no compatible bridge window\n",
+			 resource, res);
 		return -EINVAL;
 	}
 
 	conflict = request_resource_conflict(root, res);
 	if (conflict) {
-		dev_info(&dev->dev,
-			 "address space collision: %pR conflicts with %s %pR\n",
-			 res, conflict->name, conflict);
+		dev_info(&dev->dev, "can't claim BAR %d %pR: address conflict with %s %pR\n",
+			 resource, res, conflict->name, conflict);
 		return -EBUSY;
 	}
 


* [PATCH 8/9] PCI: Ignore BAR contents when firmware left decoding disabled
  2014-02-26 19:37 [PATCH 0/9] PCI: Use IORESOURCE_UNSET for unassigned BARs Bjorn Helgaas
                   ` (6 preceding siblings ...)
  2014-02-26 19:37 ` [PATCH 7/9] PCI: Don't try to claim IORESOURCE_UNSET resources Bjorn Helgaas
@ 2014-02-26 19:37 ` Bjorn Helgaas
  2014-03-13  8:51   ` Ming Lei
  2014-03-19 18:54   ` Bjorn Helgaas
  2014-02-26 19:38 ` [PATCH 9/9] PCI: Don't enable decoding if BAR hasn't been assigned an address Bjorn Helgaas
  2014-03-04 20:53 ` [PATCH 0/9] PCI: Use IORESOURCE_UNSET for unassigned BARs Bjorn Helgaas
  9 siblings, 2 replies; 25+ messages in thread
From: Bjorn Helgaas @ 2014-02-26 19:37 UTC (permalink / raw)
  To: linux-pci; +Cc: linux-kernel

Don't rely on BAR contents when the command register says the BAR is
disabled.

If we receive a PCI device from firmware (or a hot-added device that was
just powered up) with the MEMORY or IO enable bits in the PCI command
register cleared, there's no reason to believe the BARs contain valid
addresses.

In that case, we still know the type and size of the BAR, but this
patch marks the resource as "unset" so we have a chance to reassign it.

Historically, we often used "BAR == 0" to decide the BAR is invalid.  But 0
is a legal BAR value, especially if the host bridge translates addresses,
so I think it's better to decide based on the PCI command register, and
store the conclusion in the IORESOURCE_UNSET bit.
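
Reduced to its core, the decision depends only on the BAR type and the
command register, never on the BAR value.  The following userspace sketch
assumes a simplified struct resource; mark_unset_if_decode_disabled() is a
hypothetical name, and the real logic is part of __pci_read_base().

```c
#include <stdint.h>

/* Simplified stand-in for the kernel's struct resource (assumption). */
struct resource {
	uint64_t start;
	uint64_t end;
	unsigned long flags;
};

#define IORESOURCE_IO		0x00000100UL
#define IORESOURCE_MEM		0x00000200UL
#define IORESOURCE_UNSET	0x20000000UL

#define PCI_COMMAND_IO		0x1	/* Enable response in I/O space */
#define PCI_COMMAND_MEMORY	0x2	/* Enable response in Memory space */

/* Hypothetical helper: cmd is the command register as firmware left it. */
static void mark_unset_if_decode_disabled(struct resource *res, uint16_t cmd)
{
	if (res->flags & IORESOURCE_IO) {
		if (!(cmd & PCI_COMMAND_IO))
			res->flags |= IORESOURCE_UNSET;
	} else {
		if (!(cmd & PCI_COMMAND_MEMORY))
			res->flags |= IORESOURCE_UNSET;
	}
}
```

A memory BAR that reads as zero with PCI_COMMAND_MEMORY set is kept as a
valid address, while a memory BAR with a plausible-looking value but
decoding disabled is marked unset.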

Reference: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=679545
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=48451
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/probe.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 6e34498ec9f0..02654b5ec1b9 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -177,9 +177,10 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
 
 	mask = type ? PCI_ROM_ADDRESS_MASK : ~0;
 
+	pci_read_config_word(dev, PCI_COMMAND, &orig_cmd);
+
 	/* No printks while decoding is disabled! */
 	if (!dev->mmio_always_on) {
-		pci_read_config_word(dev, PCI_COMMAND, &orig_cmd);
 		if (orig_cmd & PCI_COMMAND_DECODE_ENABLE) {
 			pci_write_config_word(dev, PCI_COMMAND,
 				orig_cmd & ~PCI_COMMAND_DECODE_ENABLE);
@@ -215,9 +216,13 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
 		if (res->flags & IORESOURCE_IO) {
 			l &= PCI_BASE_ADDRESS_IO_MASK;
 			mask = PCI_BASE_ADDRESS_IO_MASK & (u32) IO_SPACE_LIMIT;
+			if (!(orig_cmd & PCI_COMMAND_IO))
+				res->flags |= IORESOURCE_UNSET;
 		} else {
 			l &= PCI_BASE_ADDRESS_MEM_MASK;
 			mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
+			if (!(orig_cmd & PCI_COMMAND_MEMORY))
+				res->flags |= IORESOURCE_UNSET;
 		}
 	} else {
 		res->flags |= (l & IORESOURCE_ROM_ENABLE);
@@ -252,6 +257,7 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
 			/* Address above 32-bit boundary; disable the BAR */
 			pci_write_config_dword(dev, pos, 0);
 			pci_write_config_dword(dev, pos + 4, 0);
+			res->flags |= IORESOURCE_UNSET;
 			region.start = 0;
 			region.end = sz64;
 			bar_disabled = true;


* [PATCH 9/9] PCI: Don't enable decoding if BAR hasn't been assigned an address
  2014-02-26 19:37 [PATCH 0/9] PCI: Use IORESOURCE_UNSET for unassigned BARs Bjorn Helgaas
                   ` (7 preceding siblings ...)
  2014-02-26 19:37 ` [PATCH 8/9] PCI: Ignore BAR contents when firmware left decoding disabled Bjorn Helgaas
@ 2014-02-26 19:38 ` Bjorn Helgaas
  2014-03-04 20:53 ` [PATCH 0/9] PCI: Use IORESOURCE_UNSET for unassigned BARs Bjorn Helgaas
  9 siblings, 0 replies; 25+ messages in thread
From: Bjorn Helgaas @ 2014-02-26 19:38 UTC (permalink / raw)
  To: linux-pci; +Cc: linux-kernel

Don't enable memory or I/O decoding if we haven't assigned or claimed the
BAR's resource.

If we enable decoding for a BAR that hasn't been assigned an address, we'll
likely cause bus conflicts.  This patch declines to enable decoding for
resources marked IORESOURCE_UNSET.

Note that drivers can use pci_enable_device_io() or pci_enable_device_mem()
if they only care about specific types of BARs.  In that case, we don't
bother checking whether resources of the other types are assigned or
claimed.
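
The effect of the mask can be seen in a userspace sketch of the check.  It
assumes a simplified struct resource with only a parent pointer added;
enable_resources() is a hypothetical reduction of pci_enable_resources()
that performs the checks but does not touch any command register.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct resource (assumption). */
struct resource {
	uint64_t start;
	uint64_t end;
	unsigned long flags;
	struct resource *parent;	/* non-NULL once claimed */
};

#define IORESOURCE_IO		0x00000100UL
#define IORESOURCE_MEM		0x00000200UL
#define IORESOURCE_UNSET	0x20000000UL

/* Hypothetical helper: bit i of mask selects res[i] for checking. */
static int enable_resources(const struct resource *res, int count, int mask)
{
	for (int i = 0; i < count; i++) {
		const struct resource *r = &res[i];

		/* Caller doesn't care about this BAR */
		if (!(mask & (1 << i)))
			continue;
		/* Not an I/O or memory BAR */
		if (!(r->flags & (IORESOURCE_IO | IORESOURCE_MEM)))
			continue;
		/* No address assigned: enabling decode would be unsafe */
		if (r->flags & IORESOURCE_UNSET)
			return -EINVAL;
		/* Address present but never claimed in the resource tree */
		if (!r->parent)
			return -EINVAL;
	}
	return 0;
}
```

With an assigned and claimed memory BAR 0 and an unset I/O BAR 1, a mask
of 0x1 succeeds and a mask of 0x3 fails with -EINVAL.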

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/setup-res.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 6e443135ba24..7eed671d5586 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -343,9 +343,15 @@ int pci_enable_resources(struct pci_dev *dev, int mask)
 				(!(r->flags & IORESOURCE_ROM_ENABLE)))
 			continue;
 
+		if (r->flags & IORESOURCE_UNSET) {
+			dev_err(&dev->dev, "can't enable device: BAR %d %pR not assigned\n",
+				i, r);
+			return -EINVAL;
+		}
+
 		if (!r->parent) {
-			dev_err(&dev->dev, "device not available "
-				"(can't reserve %pR)\n", r);
+			dev_err(&dev->dev, "can't enable device: BAR %d %pR not claimed\n",
+				i, r);
 			return -EINVAL;
 		}
 


* Re: [PATCH 0/9] PCI: Use IORESOURCE_UNSET for unassigned BARs
  2014-02-26 19:37 [PATCH 0/9] PCI: Use IORESOURCE_UNSET for unassigned BARs Bjorn Helgaas
                   ` (8 preceding siblings ...)
  2014-02-26 19:38 ` [PATCH 9/9] PCI: Don't enable decoding if BAR hasn't been assigned an address Bjorn Helgaas
@ 2014-03-04 20:53 ` Bjorn Helgaas
  9 siblings, 0 replies; 25+ messages in thread
From: Bjorn Helgaas @ 2014-03-04 20:53 UTC (permalink / raw)
  To: linux-pci; +Cc: linux-kernel

On Wed, Feb 26, 2014 at 12:37 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> I'm trying to unify the way we handle unassigned PCI BARs, i.e., resources
> where we know the type and size, but we haven't assigned an address yet.
> The PCI core and the various architectures don't really have a consistent
> way of dealing with these.
>
> Many places currently use "res->start == 0" to indicate unassigned
> resources.  I don't think that's a good idea in general, because it's
> possible for a resource to actually start at zero.  Zero is also a
> perfectly good BAR value, especially for a host bridge that translates
> addresses, so I want to support that, too.
>
> The IORESOURCE_UNSET flag exists already, but is hardly used at all.  In
> drivers/pci, we set it for an obscure error case, and clear it when
> updating a BAR.  The microblaze and powerpc architectures use it the same
> way I want to use it here: to indicate a resource with no assigned address.
>
> Here's the outline of what this series does:
>
> - Add resource_contains(): true iff r1 contains r2 (for minor cleanup)
> - Make %pR print resource size, not address, when IORESOURCE_UNSET
> - Stop advertising pci_find_parent_resource() for use in allocation
> - Mark PCI resources IORESOURCE_UNSET when BIOS left decoding disabled
> - Mark PCI resources IORESOURCE_UNSET while we're trying to assign addresses
> - Don't enable PCI decoding when no address has been assigned to BARs
>
> It might be too aggressive to ignore the initial value of a BAR and try to
> reassign it when the BIOS left decoding disabled.  If the BIOS left
> decoding *enabled*, we can have some confidence that the BAR value is
> valid.  It's possible the BAR is also valid even if the BIOS turned off
> decoding.  We could conceivably try to use BAR values that are inside
> upstream bridge windows, even if the BAR was initially disabled.  But this
> first pass just ignores the values in BARs that are disabled.
>
> I welcome any comments :)
>
> ---
>
> Bjorn Helgaas (9):
>       resource: Add resource_contains()
>       vsprintf: Add support for IORESOURCE_UNSET in %pR
>       PCI: Remove pci_find_parent_resource() use for allocation
>       PCI: Mark resources as IORESOURCE_UNSET if we can't assign them
>       PCI: Don't clear IORESOURCE_UNSET when updating BAR
>       PCI: Check IORESOURCE_UNSET before updating BAR
>       PCI: Don't try to claim IORESOURCE_UNSET resources
>       PCI: Ignore BAR contents when firmware left decoding disabled
>       PCI: Don't enable decoding if BAR hasn't been assigned an address
>
>
>  drivers/pci/host-bridge.c |    8 --------
>  drivers/pci/pci.c         |   41 +++++++++++++++++++++++++----------------
>  drivers/pci/probe.c       |    8 +++++++-
>  drivers/pci/quirks.c      |    5 +++++
>  drivers/pci/rom.c         |    2 ++
>  drivers/pci/setup-res.c   |   37 +++++++++++++++++++++++++------------
>  include/linux/ioport.h    |   12 +++++++++++-
>  kernel/resource.c         |    8 ++------
>  lib/vsprintf.c            |   13 +++++++++----
>  9 files changed, 86 insertions(+), 48 deletions(-)

I applied these to pci/resource for v3.15.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 8/9] PCI: Ignore BAR contents when firmware left decoding disabled
  2014-02-26 19:37 ` [PATCH 8/9] PCI: Ignore BAR contents when firmware left decoding disabled Bjorn Helgaas
@ 2014-03-13  8:51   ` Ming Lei
  2014-03-13 16:08     ` Bjorn Helgaas
  2014-03-19 18:54   ` Bjorn Helgaas
  1 sibling, 1 reply; 25+ messages in thread
From: Ming Lei @ 2014-03-13  8:51 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Linux Kernel Mailing List, Rusty Russell

Hi Bjorn,

I found this patch broke virtio-pci devices.

On Thu, Feb 27, 2014 at 3:37 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> Don't rely on BAR contents when the command register says the BAR is
> disabled.
>
> If we receive a PCI device from firmware (or a hot-added device that was
> just powered up) with the MEMORY or IO enable bits in the PCI command
> register cleared, there's no reason to believe the BARs contain valid
> addresses.

From the PCI Local Bus Specification, Rev. 3.0, both
PCI_COMMAND_IO and PCI_COMMAND_MEMORY should be
cleared after reset, so it looks like the patch sets IORESOURCE_UNSET
too early, because PCI drivers may call pci_enable_device()
(->pci_enable_resources()) to enable the two bits of
PCI_COMMAND explicitly.

With this patch, driver can't enable device/resource with
pci_enable_device() any more because IORESOURCE_UNSET
has been set already.
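
To illustrate the failure mode described above, here is a minimal
user-space sketch, not the kernel code. It assumes an enable routine that
refuses any implemented BAR still marked unset, in the spirit of the
-EINVAL path in patch 9/9. The sk_* names and SK_RES_* flag values are
made up for this example; only the two PCI_COMMAND bit values are the real
ones.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Command-register bits; these two values match the PCI spec. */
#define PCI_COMMAND_IO      0x1
#define PCI_COMMAND_MEMORY  0x2

/* Hypothetical flag values for this sketch only, not the kernel's. */
#define SK_RES_IO     0x1u
#define SK_RES_MEM    0x2u
#define SK_RES_UNSET  0x4u

#define SK_NUM_BARS 6

struct sk_bar {
	uint64_t start, end;
	unsigned int flags;	/* 0 means "BAR not implemented" */
};

/*
 * Compute the command bits needed to decode the given BARs.
 * Returns 0 and stores the new command value in *cmd on success,
 * or -EINVAL (leaving *cmd untouched) if any implemented BAR is
 * still marked unset.
 */
int sk_enable_resources(const struct sk_bar bars[SK_NUM_BARS], uint16_t *cmd)
{
	uint16_t new_cmd;
	int i;

	assert(bars && cmd);
	new_cmd = *cmd;

	for (i = 0; i < SK_NUM_BARS; i++) {
		const struct sk_bar *r = &bars[i];

		if (!(r->flags & (SK_RES_IO | SK_RES_MEM)))
			continue;		/* unimplemented BAR */
		if (r->flags & SK_RES_UNSET)
			return -EINVAL;		/* no address: refuse */
		if (r->flags & SK_RES_IO)
			new_cmd |= PCI_COMMAND_IO;
		if (r->flags & SK_RES_MEM)
			new_cmd |= PCI_COMMAND_MEMORY;
	}

	*cmd = new_cmd;
	return 0;
}
```

In this model, a device that comes out of reset with both decode bits
clear has every BAR flagged unset at probe time, so the enable call fails
unless something assigns addresses and clears the flag first.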

>
> In that case, we still know the type and size of the BAR, but this
> patch marks the resource as "unset" so we have a chance to reassign it.
>
> Historically, we often used "BAR == 0" to decide the BAR is invalid.  But 0
> is a legal BAR value, especially if the host bridge translates addresses,
> so I think it's better to decide based on the PCI command register, and
> store the conclusion in the IORESOURCE_UNSET bit.
>
> Reference: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=679545
> Reference: https://bugzilla.kernel.org/show_bug.cgi?id=48451
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>  drivers/pci/probe.c |    8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 6e34498ec9f0..02654b5ec1b9 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -177,9 +177,10 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
>
>         mask = type ? PCI_ROM_ADDRESS_MASK : ~0;
>
> +       pci_read_config_word(dev, PCI_COMMAND, &orig_cmd);
> +
>         /* No printks while decoding is disabled! */
>         if (!dev->mmio_always_on) {
> -               pci_read_config_word(dev, PCI_COMMAND, &orig_cmd);
>                 if (orig_cmd & PCI_COMMAND_DECODE_ENABLE) {
>                         pci_write_config_word(dev, PCI_COMMAND,
>                                 orig_cmd & ~PCI_COMMAND_DECODE_ENABLE);
> @@ -215,9 +216,13 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
>                 if (res->flags & IORESOURCE_IO) {
>                         l &= PCI_BASE_ADDRESS_IO_MASK;
>                         mask = PCI_BASE_ADDRESS_IO_MASK & (u32) IO_SPACE_LIMIT;
> +                       if (!(orig_cmd & PCI_COMMAND_IO))
> +                               res->flags |= IORESOURCE_UNSET;
>                 } else {
>                         l &= PCI_BASE_ADDRESS_MEM_MASK;
>                         mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
> +                       if (!(orig_cmd & PCI_COMMAND_MEMORY))
> +                               res->flags |= IORESOURCE_UNSET;
>                 }
>         } else {
>                 res->flags |= (l & IORESOURCE_ROM_ENABLE);
> @@ -252,6 +257,7 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
>                         /* Address above 32-bit boundary; disable the BAR */
>                         pci_write_config_dword(dev, pos, 0);
>                         pci_write_config_dword(dev, pos + 4, 0);
> +                       res->flags |= IORESOURCE_UNSET;
>                         region.start = 0;
>                         region.end = sz64;
>                         bar_disabled = true;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


Thanks,
-- 
Ming Lei

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 8/9] PCI: Ignore BAR contents when firmware left decoding disabled
  2014-03-13  8:51   ` Ming Lei
@ 2014-03-13 16:08     ` Bjorn Helgaas
  2014-03-14  1:48       ` Ming Lei
  0 siblings, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2014-03-13 16:08 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-pci, Linux Kernel Mailing List, Rusty Russell

On Thu, Mar 13, 2014 at 2:51 AM, Ming Lei <tom.leiming@gmail.com> wrote:
> Hi Bjorn,
>
> I found this patch broke virtio-pci devices.

Thanks a lot for testing this.

> On Thu, Feb 27, 2014 at 3:37 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> Don't rely on BAR contents when the command register says the BAR is
>> disabled.
>>
>> If we receive a PCI device from firmware (or a hot-added device that was
>> just powered up) with the MEMORY or IO enable bits in the PCI command
>> register cleared, there's no reason to believe the BARs contain valid
>> addresses.
>
> From PCI LOCAL BUS SPECIFICATION, REV. 3.0, both
> PCI_COMMAND_IO and PCI_COMMAND_MEM should be
> cleared after reset, so looks the patch sets IORESOURCE_UNSET
> too early because PCI drivers may call pci_enable_device()
> (->pci_enable_resources()) to enable the two bits of
> PCI_COMMAND explicitly.

The point is that it's not safe to enable those two bits unless we're
certain that the BARs they control contain valid values that don't
conflict with anything else in the system.

Maybe we should only set IORESOURCE_UNSET when we find a conflict or a
BAR that's not contained by an upstream bridge window, and we should
try to reallocate then.  I'm pretty sure we do that at least in some
cases, but it would probably simplify things if we did it more
consistently, and maybe we shouldn't set it at all here in
__pci_read_base().
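
A rough sketch of the alternative floated above: keep the firmware's BAR
value when an upstream bridge window contains it, and mark the BAR unset
only otherwise. This is user-space C written for illustration; the
containment test is modeled on the resource_contains() helper from patch
1/9, but the sk_* types, names, and flag values are assumptions, and
conflict detection between sibling BARs is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch only, not the kernel's. */
#define SK_RES_IO     0x1u
#define SK_RES_MEM    0x2u
#define SK_RES_UNSET  0x4u
#define SK_RES_TYPE   (SK_RES_IO | SK_RES_MEM)

struct sk_res {
	uint64_t start, end;	/* inclusive range */
	unsigned int flags;
};

/*
 * True iff r1 and r2 have the same type, both have an assigned
 * address, and r1 fully contains r2.
 */
bool sk_res_contains(const struct sk_res *r1, const struct sk_res *r2)
{
	assert(r1 && r2);
	if ((r1->flags & SK_RES_TYPE) != (r2->flags & SK_RES_TYPE))
		return false;
	if ((r1->flags | r2->flags) & SK_RES_UNSET)
		return false;
	return r1->start <= r2->start && r1->end >= r2->end;
}

/*
 * Leave the BAR alone if some upstream window contains it;
 * otherwise mark it unset so it can be reassigned later.
 */
void sk_validate_bar(struct sk_res *bar, const struct sk_res *windows,
		     int nr_windows)
{
	int i;

	for (i = 0; i < nr_windows; i++)
		if (sk_res_contains(&windows[i], bar))
			return;

	bar->flags |= SK_RES_UNSET;
}
```

With this policy the command register plays no part in the decision, so a
device with decoding disabled but a sane BAR value would keep its address.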

But I'd like to understand your situation better, so can you provide
more details, please?  Complete before/after dmesg logs would go a
long way toward illustrating the problem you're seeing.

> With this patch, driver can't enable device/resource with
> pci_enable_device() any more because IORESOURCE_UNSET
> has been set already.

>> In that case, we still know the type and size of the BAR, but this
>> patch marks the resource as "unset" so we have a chance to reassign it.
>>
>> Historically, we often used "BAR == 0" to decide the BAR is invalid.  But 0
>> is a legal BAR value, especially if the host bridge translates addresses,
>> so I think it's better to decide based on the PCI command register, and
>> store the conclusion in the IORESOURCE_UNSET bit.
>>
>> Reference: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=679545
>> Reference: https://bugzilla.kernel.org/show_bug.cgi?id=48451
>> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>> ---
>>  drivers/pci/probe.c |    8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index 6e34498ec9f0..02654b5ec1b9 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -177,9 +177,10 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
>>
>>         mask = type ? PCI_ROM_ADDRESS_MASK : ~0;
>>
>> +       pci_read_config_word(dev, PCI_COMMAND, &orig_cmd);
>> +
>>         /* No printks while decoding is disabled! */
>>         if (!dev->mmio_always_on) {
>> -               pci_read_config_word(dev, PCI_COMMAND, &orig_cmd);
>>                 if (orig_cmd & PCI_COMMAND_DECODE_ENABLE) {
>>                         pci_write_config_word(dev, PCI_COMMAND,
>>                                 orig_cmd & ~PCI_COMMAND_DECODE_ENABLE);
>> @@ -215,9 +216,13 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
>>                 if (res->flags & IORESOURCE_IO) {
>>                         l &= PCI_BASE_ADDRESS_IO_MASK;
>>                         mask = PCI_BASE_ADDRESS_IO_MASK & (u32) IO_SPACE_LIMIT;
>> +                       if (!(orig_cmd & PCI_COMMAND_IO))
>> +                               res->flags |= IORESOURCE_UNSET;
>>                 } else {
>>                         l &= PCI_BASE_ADDRESS_MEM_MASK;
>>                         mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
>> +                       if (!(orig_cmd & PCI_COMMAND_MEMORY))
>> +                               res->flags |= IORESOURCE_UNSET;
>>                 }
>>         } else {
>>                 res->flags |= (l & IORESOURCE_ROM_ENABLE);
>> @@ -252,6 +257,7 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
>>                         /* Address above 32-bit boundary; disable the BAR */
>>                         pci_write_config_dword(dev, pos, 0);
>>                         pci_write_config_dword(dev, pos + 4, 0);
>> +                       res->flags |= IORESOURCE_UNSET;
>>                         region.start = 0;
>>                         region.end = sz64;
>>                         bar_disabled = true;
>>
>
>
> Thanks,
> --
> Ming Lei

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 8/9] PCI: Ignore BAR contents when firmware left decoding disabled
  2014-03-13 16:08     ` Bjorn Helgaas
@ 2014-03-14  1:48       ` Ming Lei
  2014-03-18  0:27         ` Bjorn Helgaas
  0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2014-03-14  1:48 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Linux Kernel Mailing List, Rusty Russell

[-- Attachment #1: Type: text/plain, Size: 1980 bytes --]

On Fri, Mar 14, 2014 at 12:08 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Thu, Mar 13, 2014 at 2:51 AM, Ming Lei <tom.leiming@gmail.com> wrote:
>> Hi Bjorn,
>>
>> I found this patch broke virtio-pci devices.
>
> Thanks a lot for testing this.
>
>> On Thu, Feb 27, 2014 at 3:37 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>> Don't rely on BAR contents when the command register says the BAR is
>>> disabled.
>>>
>>> If we receive a PCI device from firmware (or a hot-added device that was
>>> just powered up) with the MEMORY or IO enable bits in the PCI command
>>> register cleared, there's no reason to believe the BARs contain valid
>>> addresses.
>>
>> From PCI LOCAL BUS SPECIFICATION, REV. 3.0, both
>> PCI_COMMAND_IO and PCI_COMMAND_MEM should be
>> cleared after reset, so looks the patch sets IORESOURCE_UNSET
>> too early because PCI drivers may call pci_enable_device()
>> (->pci_enable_resources()) to enable the two bits of
>> PCI_COMMAND explicitly.
>
> The point is that it's not safe to enable those two bits unless we're
> certain that the BARs they control contain valid values that don't
> conflict with anything else in the system.
>
> Maybe we should only set IORESOURCE_UNSET when we find a conflict or a
> BAR that's not contained by an upstream bridge window, and we should
> try to reallocate then.  I'm pretty sure we do that at least in some
> cases, but it would probably simplify things if we did it more
> consistently, and maybe we shouldn't set it at all here in
> __pci_read_base().

I think so, because __pci_read_base() is called in the device emulation
path.

>
> But I'd like to understand your situation better, so can you provide
> more details, please?  Complete before/after dmesg logs would go a
> long way toward illustrating the problem you're seeing.

Please see the two attached logs. The memory allocation failure
is caused by a mistaken value read from a PCI address after the device
failed to enable.


Thanks,
-- 
Ming Lei

[-- Attachment #2: dmesg_before_revert.tar.gz --]
[-- Type: application/x-gzip, Size: 6458 bytes --]

[-- Attachment #3: dmesg_after_revert.tar.gz --]
[-- Type: application/x-gzip, Size: 10732 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 8/9] PCI: Ignore BAR contents when firmware left decoding disabled
  2014-03-14  1:48       ` Ming Lei
@ 2014-03-18  0:27         ` Bjorn Helgaas
  2014-03-19  3:32           ` Ming Lei
  0 siblings, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2014-03-18  0:27 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-pci, Linux Kernel Mailing List, Rusty Russell

On Fri, Mar 14, 2014 at 09:48:35AM +0800, Ming Lei wrote:
> On Fri, Mar 14, 2014 at 12:08 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > On Thu, Mar 13, 2014 at 2:51 AM, Ming Lei <tom.leiming@gmail.com> wrote:
> >> Hi Bjorn,
> >>
> >> I found this patch broke virtio-pci devices.
> >
> > Thanks a lot for testing this.
> >
> >> On Thu, Feb 27, 2014 at 3:37 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> >>> Don't rely on BAR contents when the command register says the BAR is
> >>> disabled.
> >>>
> >>> If we receive a PCI device from firmware (or a hot-added device that was
> >>> just powered up) with the MEMORY or IO enable bits in the PCI command
> >>> register cleared, there's no reason to believe the BARs contain valid
> >>> addresses.
> >>
> >> From PCI LOCAL BUS SPECIFICATION, REV. 3.0, both
> >> PCI_COMMAND_IO and PCI_COMMAND_MEM should be
> >> cleared after reset, so looks the patch sets IORESOURCE_UNSET
> >> too early because PCI drivers may call pci_enable_device()
> >> (->pci_enable_resources()) to enable the two bits of
> >> PCI_COMMAND explicitly.
> >
> > The point is that it's not safe to enable those two bits unless we're
> > certain that the BARs they control contain valid values that don't
> > conflict with anything else in the system.
> >
> > Maybe we should only set IORESOURCE_UNSET when we find a conflict or a
> > BAR that's not contained by an upstream bridge window, and we should
> > try to reallocate then.  I'm pretty sure we do that at least in some
> > cases, but it would probably simplify things if we did it more
> > consistently, and maybe we shouldn't set it at all here in
> > __pci_read_base().
> 
> I think so because __pci_read_base() is called in device emulation
> path.

Which path is this?  I don't know anything about virtio-pci, and I only see
calls to __pci_read_base() from:

  sriov_init()
  pci_sriov_resource_alignment()
  pci_read_bases()

> > But I'd like to understand your situation better, so can you provide
> > more details, please?  Complete before/after dmesg logs would go a
> > long way toward illustrating the problem you're seeing.
> 
> Please see the two attachment log. The memory allocation failure
> is caused by mistaken value read from pci address after the device
> is failed to enable.

Your logs are harder than necessary to compare because one has a lot more
debug turned on than the other.

In the failing case, we ignore all the initial BAR values, but we do assign
values to all of them later:

  pci 0000:00:00.0: can't claim BAR 0 [mem size 0x00000400]: no address assigned
  pci 0000:00:00.0: can't claim BAR 1 [io  size 0x0400]: no address assigned
  ...
  pci 0000:00:00.0: BAR 0: assigned [mem 0x40000000-0x400003ff]
  pci 0000:00:00.0: BAR 1: assigned [io  0x1000-0x13ff]
  ...

The newly-assigned values look valid, and as far as I can tell, they should
work.  Do you know why they don't?  Is there an assumption somewhere that
we never change BAR values?
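
For reading the log lines above: an unset resource is printed with its
size only, and an assigned one with its address range. The following
user-space sketch mimics that output shape. It is not the vsprintf.c
implementation; the sk_* names, flag values, and field widths are
assumptions chosen to match the four lines quoted here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag values for this sketch only, not the kernel's. */
#define SK_RES_IO     0x1u
#define SK_RES_MEM    0x2u
#define SK_RES_UNSET  0x4u

struct sk_res {
	uint64_t start, end;	/* inclusive range */
	unsigned int flags;
};

/*
 * Format a resource the way the log lines above show it:
 * "[mem size 0x...]" when unset, "[mem 0x...-0x...]" otherwise.
 * Returns the snprintf() result.
 */
int sk_format_res(char *buf, size_t len, const struct sk_res *r)
{
	const char *type;
	int width;

	assert(buf && r);
	type = (r->flags & SK_RES_IO) ? "io " : "mem";
	width = (r->flags & SK_RES_IO) ? 4 : 8;

	if (r->flags & SK_RES_UNSET)
		return snprintf(buf, len, "[%s size 0x%0*llx]", type, width,
				(unsigned long long)(r->end - r->start + 1));

	return snprintf(buf, len, "[%s 0x%0*llx-0x%0*llx]", type,
			width, (unsigned long long)r->start,
			width, (unsigned long long)r->end);
}
```

Printing only the size avoids showing a start address that was never
assigned, which is the ambiguity the "res->start == 0" convention had.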

Even if we don't need to ignore BAR values in as many cases as we do, it
should be legal to ignore them and reassign them, so I want to understand
what's going on here before reverting this.

Is there an easy way I can reproduce the problem on my own box?

Bjorn

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 8/9] PCI: Ignore BAR contents when firmware left decoding disabled
  2014-03-18  0:27         ` Bjorn Helgaas
@ 2014-03-19  3:32           ` Ming Lei
  2014-03-19  4:52             ` Ming Lei
  0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2014-03-19  3:32 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Linux Kernel Mailing List, Rusty Russell

On Tue, Mar 18, 2014 at 8:27 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Fri, Mar 14, 2014 at 09:48:35AM +0800, Ming Lei wrote:
>> On Fri, Mar 14, 2014 at 12:08 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> > On Thu, Mar 13, 2014 at 2:51 AM, Ming Lei <tom.leiming@gmail.com> wrote:
>> >> Hi Bjorn,
>> >>
>> >> I found this patch broke virtio-pci devices.
>> >
>> > Thanks a lot for testing this.
>> >
>> >> On Thu, Feb 27, 2014 at 3:37 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> >>> Don't rely on BAR contents when the command register says the BAR is
>> >>> disabled.
>> >>>
>> >>> If we receive a PCI device from firmware (or a hot-added device that was
>> >>> just powered up) with the MEMORY or IO enable bits in the PCI command
>> >>> register cleared, there's no reason to believe the BARs contain valid
>> >>> addresses.
>> >>
>> >> From PCI LOCAL BUS SPECIFICATION, REV. 3.0, both
>> >> PCI_COMMAND_IO and PCI_COMMAND_MEM should be
>> >> cleared after reset, so looks the patch sets IORESOURCE_UNSET
>> >> too early because PCI drivers may call pci_enable_device()
>> >> (->pci_enable_resources()) to enable the two bits of
>> >> PCI_COMMAND explicitly.
>> >
>> > The point is that it's not safe to enable those two bits unless we're
>> > certain that the BARs they control contain valid values that don't
>> > conflict with anything else in the system.
>> >
>> > Maybe we should only set IORESOURCE_UNSET when we find a conflict or a
>> > BAR that's not contained by an upstream bridge window, and we should
>> > try to reallocate then.  I'm pretty sure we do that at least in some
>> > cases, but it would probably simplify things if we did it more
>> > consistently, and maybe we shouldn't set it at all here in
>> > __pci_read_base().
>>
>> I think so because __pci_read_base() is called in device emulation
>> path.
>
> Which path is this?  I don't know anything about virtio-pci, and I only see
> calls to __pci_read_base() from:
>
>   sriov_init()
>   pci_sriov_resource_alignment()
>   pci_read_bases()
>
>> > But I'd like to understand your situation better, so can you provide
>> > more details, please?  Complete before/after dmesg logs would go a
>> > long way toward illustrating the problem you're seeing.
>>
>> Please see the two attachment log. The memory allocation failure
>> is caused by mistaken value read from pci address after the device
>> is failed to enable.
>
> Your logs are harder than necessary to compare because one has a lot more
> debug turned on than the other.
>
> In the failing case, we ignore all the initial BAR values, but we do assign
> values to all of them later:
>
>   pci 0000:00:00.0: can't claim BAR 0 [mem size 0x00000400]: no address assigned
>   pci 0000:00:00.0: can't claim BAR 1 [io  size 0x0400]: no address assigned
>   ...
>   pci 0000:00:00.0: BAR 0: assigned [mem 0x40000000-0x400003ff]
>   pci 0000:00:00.0: BAR 1: assigned [io  0x1000-0x13ff]
>   ...
>
> The newly-assigned values look valid, and as far as I can tell, they should
> work.  Do you know why they don't?  Is there an assumption somewhere that
> we never change BAR values?

I don't know the cause; maybe it is related to the hypervisor
implementation.

>
> Even if we don't need to ignore BAR values in as many cases as we do, it
> should be legal to ignore them and reassign them, so I want to understand
> what's going on here before reverting this.
>
> Is there an easy way I can reproduce the problem on my own box?

It is not difficult: you can build lkvm by following the README at the
link below and test the -next tree on that small kvm hypervisor:

     https://github.com/penberg/linux-kvm/blob/master/tools/kvm/README

Thanks,
-- 
Ming Lei

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 8/9] PCI: Ignore BAR contents when firmware left decoding disabled
  2014-03-19  3:32           ` Ming Lei
@ 2014-03-19  4:52             ` Ming Lei
  2014-03-19 16:45               ` Bjorn Helgaas
  0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2014-03-19  4:52 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Linux Kernel Mailing List, Rusty Russell,
	Pekka Enberg, Sasha Levin

Hi,

It looks like Sasha fixed the problem in the lkvm tool [1].

Sasha, it looks like we both saw the problem, but from a technical
point of view I am wondering whether the fix is correct, because the
PCI spec requires the IO/MMIO bits in the COMMAND register to be
cleared after reset. Maybe there is some potential problem
in the lkvm PCI emulation.


[1] commit 6478ce1416aacf1ce35530f79ea035f89fb21e90
Author: Sasha Levin <sasha.levin@oracle.com>
Date:   Wed Mar 5 23:08:16 2014 -0500

    kvm tools: mark our PCI card as PIO and MMIO able


Thanks,
--
Ming Lei


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 8/9] PCI: Ignore BAR contents when firmware left decoding disabled
  2014-03-19  4:52             ` Ming Lei
@ 2014-03-19 16:45               ` Bjorn Helgaas
  2014-03-20  1:32                 ` Ming Lei
  0 siblings, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2014-03-19 16:45 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-pci, Linux Kernel Mailing List, Rusty Russell,
	Pekka Enberg, Sasha Levin

On Tue, Mar 18, 2014 at 10:52 PM, Ming Lei <tom.leiming@gmail.com> wrote:
> Hi,
>
> Looks Sasha fixed the problem in lkvm tool[1].
>
> Sasha, looks we both saw the problem, but from technical
> view, I am wondering if the fix is correct, because PCI spec.
> requires that the IO/MMIO bits in COMMAND register should
> be cleared after reset, maybe there are some potential problem
> in lkvm pci emulation.

I think I'm going to revert this patch ([2], "Ignore BAR contents when
firmware left decoding disabled").  The main reason for that patch was
to try for a consistent way of figuring out whether BARs are valid
that we could use on all architectures, but I think we can do it in a
better way.

That said, this kvm change should not be necessary.  We *should* be
able to take any PCI device and initialize it from power-on state
without any dependencies on what the BIOS left in the BARs or the
command register.  As far as I can tell, the PCI core actually worked
fine in this case (we assigned valid addresses to the devices), but
something else blew up.  If I revert that patch, it will cover up
whatever this other bug is, but it would be much better to figure out
what it is and fix it.

You said earlier that "The memory allocation failure is caused by
mistaken value read from pci address after the device is failed to
enable."  Can you elaborate on that?  Are you saying that something
tried to read from a region mapped by a BAR even though
pci_enable_device() failed?  That would be a programming error, of
course.  If you have any more details about exactly where this
happened, that would help a lot in finding the problem.

Bjorn

[2] http://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/commit/?id=5c89a9ee943d5e

> [1],  commit 6478ce1416aacf1ce35530f79ea035f89fb21e90
> Author: Sasha Levin <sasha.levin@oracle.com>
> Date:   Wed Mar 5 23:08:16 2014 -0500
>
>     kvm tools: mark our PCI card as PIO and MMIO able
>
>
> Thanks,
> --
> Ming Lei
>
> On Wed, Mar 19, 2014 at 11:32 AM, Ming Lei <tom.leiming@gmail.com> wrote:
>> On Tue, Mar 18, 2014 at 8:27 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>> On Fri, Mar 14, 2014 at 09:48:35AM +0800, Ming Lei wrote:
>>>> On Fri, Mar 14, 2014 at 12:08 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>>> > On Thu, Mar 13, 2014 at 2:51 AM, Ming Lei <tom.leiming@gmail.com> wrote:
>>>> >> Hi Bjorn,
>>>> >>
>>>> >> I found this patch broke virtio-pci devices.
>>>> >
>>>> > Thanks a lot for testing this.
>>>> >
>>>> >> On Thu, Feb 27, 2014 at 3:37 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>>> >>> Don't rely on BAR contents when the command register says the BAR is
>>>> >>> disabled.
>>>> >>>
>>>> >>> If we receive a PCI device from firmware (or a hot-added device that was
>>>> >>> just powered up) with the MEMORY or IO enable bits in the PCI command
>>>> >>> register cleared, there's no reason to believe the BARs contain valid
>>>> >>> addresses.
>>>> >>
>>>> >> From PCI LOCAL BUS SPECIFICATION, REV. 3.0, both
>>>> >> PCI_COMMAND_IO and PCI_COMMAND_MEM should be
>>>> >> cleared after reset, so looks the patch sets IORESOURCE_UNSET
>>>> >> too early because PCI drivers may call pci_enable_device()
>>>> >> (->pci_enable_resources()) to enable the two bits of
>>>> >> PCI_COMMAND explicitly.
>>>> >
>>>> > The point is that it's not safe to enable those two bits unless we're
>>>> > certain that the BARs they control contain valid values that don't
>>>> > conflict with anything else in the system.
>>>> >
>>>> > Maybe we should only set IORESOURCE_UNSET when we find a conflict or a
>>>> > BAR that's not contained by an upstream bridge window, and we should
>>>> > try to reallocate then.  I'm pretty sure we do that at least in some
>>>> > cases, but it would probably simplify things if we did it more
>>>> > consistently, and maybe we shouldn't set it at all here in
>>>> > __pci_read_base().
>>>>
>>>> I think so because __pci_read_base() is called in device emulation
>>>> path.
>>>
>>> Which path is this?  I don't know anything about virtio-pci, and I only see
>>> calls to __pci_read_base() from:
>>>
>>>   sriov_init()
>>>   pci_sriov_resource_alignment()
>>>   pci_read_bases()
>>>
>>>> > But I'd like to understand your situation better, so can you provide
>>>> > more details, please?  Complete before/after dmesg logs would go a
>>>> > long way toward illustrating the problem you're seeing.
>>>>
>>>> Please see the two attachment log. The memory allocation failure
>>>> is caused by mistaken value read from pci address after the device
>>>> is failed to enable.
>>>
>>> Your logs are harder than necessary to compare because one has a lot more
>>> debug turned on than the other.
>>>
>>> In the failing case, we ignore all the initial BAR values, but we do assign
>>> values to all of them later:
>>>
>>>   pci 0000:00:00.0: can't claim BAR 0 [mem size 0x00000400]: no address assigned
>>>   pci 0000:00:00.0: can't claim BAR 1 [io  size 0x0400]: no address assigned
>>>   ...
>>>   pci 0000:00:00.0: BAR 0: assigned [mem 0x40000000-0x400003ff]
>>>   pci 0000:00:00.0: BAR 1: assigned [io  0x1000-0x13ff]
>>>   ...
>>>
>>> The newly-assigned values look valid, and as far as I can tell, they should
>>> work.  Do you know why they don't?  Is there an assumption somewhere that
>>> we never change BAR values?
>>
>> I don't know the cause; maybe it is related to the hypervisor
>> implementation.
>>
>>>
>>> Even if we don't need to ignore BAR values in as many cases as we do, it
>>> should be legal to ignore them and reassign them, so I want to understand
>>> what's going on here before reverting this.
>>>
>>> Is there an easy way I can reproduce the problem on my own box?
>>
>> It is not difficult: you can build lkvm by following the README at the
>> link below and test the -next tree on the small kvm hypervisor:
>>
>>      https://github.com/penberg/linux-kvm/blob/master/tools/kvm/README
>>
>> Thanks,
>> --
>> Ming Lei
>
>
>
> --
> Ming Lei

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 8/9] PCI: Ignore BAR contents when firmware left decoding disabled
  2014-02-26 19:37 ` [PATCH 8/9] PCI: Ignore BAR contents when firmware left decoding disabled Bjorn Helgaas
  2014-03-13  8:51   ` Ming Lei
@ 2014-03-19 18:54   ` Bjorn Helgaas
  2014-03-19 21:16     ` Bjorn Helgaas
  1 sibling, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2014-03-19 18:54 UTC (permalink / raw)
  To: linux-pci
  Cc: linux-kernel, Ming Lei, Rusty Russell, Pekka Enberg, Sasha Levin

[+cc Ming, Rusty, Pekka, Sasha]

On Wed, Feb 26, 2014 at 12:37:57PM -0700, Bjorn Helgaas wrote:
> Don't rely on BAR contents when the command register says the BAR is
> disabled.
> 
> If we receive a PCI device from firmware (or a hot-added device that was
> just powered up) with the MEMORY or IO enable bits in the PCI command
> register cleared, there's no reason to believe the BARs contain valid
> addresses.
> 
> In that case, we still know the type and size of the BAR, but this
> patch marks the resource as "unset" so we have a chance to reassign it.
> 
> Historically, we often used "BAR == 0" to decide the BAR is invalid.  But 0
> is a legal BAR value, especially if the host bridge translates addresses,
> so I think it's better to decide based on the PCI command register, and
> store the conclusion in the IORESOURCE_UNSET bit.
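
The contrast drawn in the quoted text can be sketched as two small
predicates. This is only an illustration of the approach the quoted
patch description proposes (which the reply below replaces with a
narrower change), not code from drivers/pci. The sketch_* names are
hypothetical; the two bit values match PCI_COMMAND_IO and
PCI_COMMAND_MEMORY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_COMMAND_IO     0x1u	/* same bit as PCI_COMMAND_IO */
#define SKETCH_COMMAND_MEMORY 0x2u	/* same bit as PCI_COMMAND_MEMORY */

/* Hypothetical BAR classification used only by this sketch. */
enum sketch_bar_kind { SKETCH_BAR_IO, SKETCH_BAR_MEM };

/* Historical heuristic: a BAR of zero is treated as "not assigned". */
static bool sketch_bar_trusted_legacy(uint64_t bar_addr)
{
	return bar_addr != 0;
}

/*
 * Proposed heuristic: trust the BAR only if decoding for its address
 * space is enabled in the command register.  The BAR value itself is
 * not consulted, so a BAR that legitimately holds zero can be trusted.
 */
static bool sketch_bar_trusted_by_command(uint16_t command,
					  enum sketch_bar_kind kind)
{
	uint16_t bit;

	assert(kind == SKETCH_BAR_IO || kind == SKETCH_BAR_MEM);
	bit = (kind == SKETCH_BAR_IO) ? SKETCH_COMMAND_IO
				      : SKETCH_COMMAND_MEMORY;
	return (command & bit) != 0;
}
```

With decoding enabled, a memory BAR at address zero is rejected by the
first predicate but accepted by the second, which is the case the
patch description wants to support.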

I plan to replace this patch with the following, which only sets
IORESOURCE_UNSET when we already have been clearing the bus region start
address.  (This probably should have been a separate patch to begin with,
mea culpa.)

This is intended for the v3.15 merge window, so I made the minimal change
to reduce risk.

Thanks to Ming Lei for prompting me to look at this; I think the issue he
reported with the original patch is really a problem somewhere else that
the patch just happened to expose, but the original patch was more
aggressive than necessary, so this revision tones it down a bit.

Bjorn


PCI: Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit

From: Bjorn Helgaas <bhelgaas@google.com>

If we don't support 64-bit addresses, i.e., CONFIG_PHYS_ADDR_T_64BIT is not
set, we can't deal with BARs above 4GB.  In this case we already pretend
the BAR contained zero; this patch also sets IORESOURCE_UNSET so we can try
to reallocate it later.

I don't think this is exactly correct: what we care about here are *bus*
addresses, not CPU addresses, so the tests of sizeof(resource_size_t)
probably should be on sizeof(dma_addr_t) instead.  But this is what's been
in -next, so we'll fix that later.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/probe.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 6e34498ec9f0..78335efbbb74 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -252,6 +252,7 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
 			/* Address above 32-bit boundary; disable the BAR */
 			pci_write_config_dword(dev, pos, 0);
 			pci_write_config_dword(dev, pos + 4, 0);
+			res->flags |= IORESOURCE_UNSET;
 			region.start = 0;
 			region.end = sz64;
 			bar_disabled = true;
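
The behavior described in the commit message above can be modeled in
userspace roughly as follows. This is a sketch under stated
assumptions, not the kernel implementation: the sketch_* names and the
can_address_64bit parameter (standing in for the
sizeof(resource_size_t) test) are hypothetical, and the config-space
writes that zero the BAR are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's IORESOURCE_UNSET flag. */
#define SKETCH_RES_UNSET 0x1u

struct sketch_resource {
	uint64_t start;
	uint64_t end;
	unsigned int flags;
};

/*
 * Decode a 64-bit memory BAR from its low and high config dwords.
 * Returns true if the address is usable.  If the address lies above
 * 4GB and we can only handle 32-bit addresses, pretend the BAR held
 * zero, keep the size, and mark the resource unset so it can be
 * reallocated later.
 */
static bool sketch_decode_bar64(uint32_t lo, uint32_t hi, uint64_t size,
				bool can_address_64bit,
				struct sketch_resource *res)
{
	/* The low four bits of a memory BAR are type flags, not address. */
	uint64_t addr = ((uint64_t)hi << 32) | (lo & ~0xfu);

	/* BAR sizes are powers of two. */
	assert(size != 0 && (size & (size - 1)) == 0);

	res->flags = 0;
	if (!can_address_64bit && (addr >> 32) != 0) {
		res->flags |= SKETCH_RES_UNSET;
		res->start = 0;
		res->end = size - 1;
		return false;
	}

	res->start = addr;
	res->end = addr + size - 1;
	return true;
}
```

A caller would check the return value (or the unset flag) before
enabling decoding, and hand unset resources to whatever allocator
assigns addresses later.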

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 8/9] PCI: Ignore BAR contents when firmware left decoding disabled
  2014-03-19 18:54   ` Bjorn Helgaas
@ 2014-03-19 21:16     ` Bjorn Helgaas
  2014-03-19 21:23       ` Sasha Levin
  0 siblings, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2014-03-19 21:16 UTC (permalink / raw)
  To: linux-pci
  Cc: linux-kernel, Ming Lei, Rusty Russell, Pekka Enberg, Sasha Levin

On Wed, Mar 19, 2014 at 12:54 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> [+cc Ming, Rusty, Pekka, Sasha]
> ...
> I plan to replace this patch with the following, which only sets
> IORESOURCE_UNSET when we already have been clearing the bus region start
> address.  (This probably should have been a separate patch to begin with,
> mea culpa.)
>
> This is intended for the v3.15 merge window, so I made the minimal change
> to reduce risk.

I put this patch in my pci/resource branch and re-merged it into my
"next" branch.  This rebased both pci/resource and next, which is
unfortunate, but I think it's the cleanest and least risky way at this
point.

Bjorn

> PCI: Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit
>
> From: Bjorn Helgaas <bhelgaas@google.com>
>
> If we don't support 64-bit addresses, i.e., CONFIG_PHYS_ADDR_T_64BIT is not
> set, we can't deal with BARs above 4GB.  In this case we already pretend
> the BAR contained zero; this patch also sets IORESOURCE_UNSET so we can try
> to reallocate it later.
>
> I don't think this is exactly correct: what we care about here are *bus*
> addresses, not CPU addresses, so the tests of sizeof(resource_size_t)
> probably should be on sizeof(dma_addr_t) instead.  But this is what's been
> in -next, so we'll fix that later.
>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>  drivers/pci/probe.c |    1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 6e34498ec9f0..78335efbbb74 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -252,6 +252,7 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
>                         /* Address above 32-bit boundary; disable the BAR */
>                         pci_write_config_dword(dev, pos, 0);
>                         pci_write_config_dword(dev, pos + 4, 0);
> +                       res->flags |= IORESOURCE_UNSET;
>                         region.start = 0;
>                         region.end = sz64;
>                         bar_disabled = true;

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 8/9] PCI: Ignore BAR contents when firmware left decoding disabled
  2014-03-19 21:16     ` Bjorn Helgaas
@ 2014-03-19 21:23       ` Sasha Levin
  0 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2014-03-19 21:23 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: linux-kernel, Ming Lei, Rusty Russell, Pekka Enberg

On 03/19/2014 05:16 PM, Bjorn Helgaas wrote:
> On Wed, Mar 19, 2014 at 12:54 PM, Bjorn Helgaas<bhelgaas@google.com>  wrote:
>> >[+cc Ming, Rusty, Pekka, Sasha]
>> >...
>> >I plan to replace this patch with the following, which only sets
>> >IORESOURCE_UNSET when we already have been clearing the bus region start
>> >address.  (This probably should have been a separate patch to begin with,
>> >mea culpa.)
>> >
>> >This is intended for the v3.15 merge window, so I made the minimal change
>> >to reduce risk.
> I put this patch in my pci/resource branch and re-merged it into my
> "next" branch.  This rebased both pci/resource and next, which is
> unfortunate, but I think it's the cleanest and least risky way at this
> point.

Thanks for pointing out the issue Ming. I must admit that I haven't referred
to the PCI spec when sending in my "fix" since the upstream patch sort of
made sense. OTOH, you can't really call the kvm tool PCI implementation spec
compliant :)

Pekka, we can either revert my patch completely since the issue won't ever be
visible (since the pci tree got rebased) or just keep it in. Let me know what you
prefer.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 8/9] PCI: Ignore BAR contents when firmware left decoding disabled
  2014-03-19 16:45               ` Bjorn Helgaas
@ 2014-03-20  1:32                 ` Ming Lei
  2014-03-21 20:07                   ` Bjorn Helgaas
  0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2014-03-20  1:32 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Linux Kernel Mailing List, Rusty Russell,
	Pekka Enberg, Sasha Levin

On Thu, Mar 20, 2014 at 12:45 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Tue, Mar 18, 2014 at 10:52 PM, Ming Lei <tom.leiming@gmail.com> wrote:
>> Hi,
>>
>> It looks like Sasha fixed the problem in the lkvm tool [1].
>>
>> Sasha, it looks like we both saw the problem, but from a technical
>> point of view I wonder whether the fix is correct, because the PCI spec
>> requires that the IO/MMIO bits in the COMMAND register be
>> cleared after reset; maybe there are some potential problems
>> in the lkvm PCI emulation.
>
> I think I'm going to revert this patch ([2], "Ignore BAR contents when
> firmware left decoding disabled").  The main reason for that patch was
> to try for a consistent way of figuring out whether BARs are valid
> that we could use on all architectures, but I think we can do it in a
> better way.
>
> That said, this kvm change should not be necessary.  We *should* be
> able to take any PCI device and initialize it from power-on state
> without any dependencies on what the BIOS left in the BARs or the
> command register.  As far as I can tell, the PCI core actually worked
> fine in this case (we assigned valid addresses to the devices), but
> something else blew up.  If I revert that patch, it will cover up
> whatever this other bug is, but it would be much better to figure out
> what it is and fix it.
>
> You said earlier that "The memory allocation failure is caused by
> mistaken value read from pci address after the device is failed to
> enable."  Can you elaborate on that?  Are you saying that something

Sorry, I took that for granted.

> tried to read from a region mapped by a BAR even though
> pci_enable_device() failed?  That would be a programming error, of
> course.  If you have any more details about exactly where this
> happened, that would help a lot in finding the problem.

When I checked again, as you saw in the dmesg log after reverting, the
virtio device was enabled successfully and there is no obvious PCI
failure; the only problem is that the virtio driver reads a queue number
of zero from a region mapped by a BAR:

ioread16(vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NUM)
         <- setup_vq(): drivers/virtio/virtio_pci.c

That causes the memory allocation failure.

Thanks,
-- 
Ming Lei

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 8/9] PCI: Ignore BAR contents when firmware left decoding disabled
  2014-03-20  1:32                 ` Ming Lei
@ 2014-03-21 20:07                   ` Bjorn Helgaas
  2014-03-21 20:25                     ` Sasha Levin
  0 siblings, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2014-03-21 20:07 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-pci, Linux Kernel Mailing List, Rusty Russell,
	Pekka Enberg, Sasha Levin, kvm

[+cc kvm list]

On Wed, Mar 19, 2014 at 7:32 PM, Ming Lei <tom.leiming@gmail.com> wrote:
> On Thu, Mar 20, 2014 at 12:45 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Tue, Mar 18, 2014 at 10:52 PM, Ming Lei <tom.leiming@gmail.com> wrote:
>>> Hi,
>>>
>>> It looks like Sasha fixed the problem in the lkvm tool [1].
>>>
>>> Sasha, it looks like we both saw the problem, but from a technical
>>> point of view I wonder whether the fix is correct, because the PCI spec
>>> requires that the IO/MMIO bits in the COMMAND register be
>>> cleared after reset; maybe there are some potential problems
>>> in the lkvm PCI emulation.
>>
>> I think I'm going to revert this patch ([2], "Ignore BAR contents when
>> firmware left decoding disabled").  The main reason for that patch was
>> to try for a consistent way of figuring out whether BARs are valid
>> that we could use on all architectures, but I think we can do it in a
>> better way.
>>
>> That said, this kvm change should not be necessary.  We *should* be
>> able to take any PCI device and initialize it from power-on state
>> without any dependencies on what the BIOS left in the BARs or the
>> command register.  As far as I can tell, the PCI core actually worked
>> fine in this case (we assigned valid addresses to the devices), but
>> something else blew up.  If I revert that patch, it will cover up
>> whatever this other bug is, but it would be much better to figure out
>> what it is and fix it.

I think I figured out what the problem is.  In virtio_pci__init(), we
allocate some address space with pci_get_io_space_block(), save its
address in vpci->mmio_addr, and hook that address space up to
virtio_pci__io_mmio_callback with kvm__register_mmio().

But when we update the BAR value in pci__config_wr(), the address
space mapping is never updated.  I think this means that virtio-pci
can't tolerate its devices being moved by the OS.

In my opinion, this is a bug in linux-kvm.  We've managed to avoid
triggering this bug by preventing Linux from moving the BAR (either by
me reverting my patch, or by Sasha's linux-kvm change [1]).  But it's
not very robust to assume that the OS will never change the BAR, so
it's quite possible that you'll trip over this again in the future.

Bjorn

[1] 6478ce1416aa kvm tools: mark our PCI card as PIO and MMIO able
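
The stale-mapping problem described above can be modeled with a tiny
emulated device. This is a hedged sketch, not linux-kvm code: all
sketch_* names are hypothetical, the whole window returns a single
register for brevity, and a read that misses the window is assumed to
return zero, chosen to match the zero queue number reported earlier in
the thread.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_dev {
	uint32_t bar;		/* value the guest sees in config space */
	uint32_t mapped_at;	/* address the emulator actually decodes */
	uint32_t size;		/* size of the decoded window in bytes */
	uint16_t queue_num;	/* register exposed through the window */
};

/* Guest read: a hit returns the register, a miss returns zero. */
static uint16_t sketch_read16(const struct sketch_dev *d, uint32_t addr)
{
	assert(d->size != 0);
	/* Unsigned subtraction makes this a single range check. */
	if (addr - d->mapped_at < d->size)
		return d->queue_num;
	return 0;
}

/* Buggy handler: records the new BAR value but leaves the window put. */
static void sketch_bar_write_stale(struct sketch_dev *d, uint32_t val)
{
	d->bar = val;
}

/* Fixed handler: moves the decoded window along with the BAR. */
static void sketch_bar_write_remap(struct sketch_dev *d, uint32_t val)
{
	d->bar = val;
	d->mapped_at = val;
}
```

After sketch_bar_write_stale() moves the BAR, a read at the address in
d->bar misses the window and yields zero even though the device is
otherwise healthy; after sketch_bar_write_remap() the same read returns
the real queue size.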

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 8/9] PCI: Ignore BAR contents when firmware left decoding disabled
  2014-03-21 20:07                   ` Bjorn Helgaas
@ 2014-03-21 20:25                     ` Sasha Levin
  2014-03-21 20:40                       ` Bjorn Helgaas
  0 siblings, 1 reply; 25+ messages in thread
From: Sasha Levin @ 2014-03-21 20:25 UTC (permalink / raw)
  To: Bjorn Helgaas, Ming Lei
  Cc: linux-pci, Linux Kernel Mailing List, Rusty Russell, Pekka Enberg, kvm

On 03/21/2014 04:07 PM, Bjorn Helgaas wrote:
> I think I figured out what the problem is.  In virtio_pci__init(), we
> allocate some address space with pci_get_io_space_block(), save its
> address in vpci->mmio_addr, and hook that address space up to
> virtio_pci__io_mmio_callback with kvm__register_mmio().
>
> But when we update the BAR value in pci__config_wr(), the address
> space mapping is never updated.  I think this means that virtio-pci
> can't tolerate its devices being moved by the OS.
>
> In my opinion, this is a bug in linux-kvm.  We've managed to avoid
> triggering this bug by preventing Linux from moving the BAR (either by
> me reverting my patch, or by Sasha's linux-kvm change [1]).  But it's
> not very robust to assume that the OS will never change the BAR, so
> it's quite possible that you'll trip over this again in the future.

The purpose of KVM tool is to implement as much as possible of the KVM
interface and the virtio spec so that we'll have a good development/testing
environment with a very simple to understand codebase.

The issue you've mentioned is the "evil" side of the KVM tool. It never
tried (or claimed) to implement anything close to legacy hardware
interfaces. This means, for example, that it doesn't run any BIOS, APIC
support is very limited, and the kernel is just injected into the virtual
RAM and run from there.

It also means that we went into the PCI spec deep enough to get the code
to work with the kernel. The only reason we implemented MSI interrupts
for example is because they provide improved performance with KVM, not
because we were trying to get a complete implementation of the PCI spec.

So yes, the PCI implementation in the KVM tool is lacking and what we
have there might be broken by making the kernel conform more closely
to the spec, but we are always happy to adapt and improve our code to
work with any changes in the kernel.

To sum it up, if you end up adding a change to the kernel that is
valid according to the spec but breaks the KVM tool, we'll just go ahead
and fix the tool. You really don't need to worry about breaking it.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 8/9] PCI: Ignore BAR contents when firmware left decoding disabled
  2014-03-21 20:25                     ` Sasha Levin
@ 2014-03-21 20:40                       ` Bjorn Helgaas
  0 siblings, 0 replies; 25+ messages in thread
From: Bjorn Helgaas @ 2014-03-21 20:40 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Ming Lei, linux-pci, Linux Kernel Mailing List, Rusty Russell,
	Pekka Enberg, kvm

On Fri, Mar 21, 2014 at 2:25 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
> On 03/21/2014 04:07 PM, Bjorn Helgaas wrote:
>>
>> I think I figured out what the problem is.  In virtio_pci__init(), we
>> allocate some address space with pci_get_io_space_block(), save its
>> address in vpci->mmio_addr, and hook that address space up to
>> virtio_pci__io_mmio_callback with kvm__register_mmio().
>>
>> But when we update the BAR value in pci__config_wr(), the address
>> space mapping is never updated.  I think this means that virtio-pci
>> can't tolerate its devices being moved by the OS.
>>
>> In my opinion, this is a bug in linux-kvm.  We've managed to avoid
>> triggering this bug by preventing Linux from moving the BAR (either by
>> me reverting my patch, or by Sasha's linux-kvm change [1]).  But it's
>> not very robust to assume that the OS will never change the BAR, so
>> it's quite possible that you'll trip over this again in the future.
>
>
> The purpose of KVM tool is to implement as much as possible of the KVM
> interface and the virtio spec so that we'll have a good development/testing
> environment with a very simple to understand codebase.
>
> The issue you've mentioned is the "evil" side of the KVM tool. It never
> tried (or claimed) to implement anything close to legacy hardware
> interfaces. This means, for example, that it doesn't run any BIOS, APIC
> support is very limited, and the kernel is just injected into the virtual
> RAM and run from there.
>
> It also means that we went into the PCI spec deep enough to get the code
> to work with the kernel. The only reason we implemented MSI interrupts
> for example is because they provide improved performance with KVM, not
> because we were trying to get a complete implementation of the PCI spec.
>
> So yes, the PCI implementation in the KVM tool is lacking and what we
> have there might be broken by making the kernel conform more closely
> to the spec, but we are always happy to adapt and improve our code to
> work with any changes in the kernel.
>
> To sum it up, if you end up adding a change to the kernel that is
> valid according to the spec but breaks the KVM tool, we'll just go ahead
> and fix the tool. You really don't need to worry about breaking it.

That makes sense, and I'm glad I had a chance to get acquainted with
the KVM tool.  If I get another problem report related to it, I'll try
to remember that I don't need to worry about breaking it :)

Bjorn

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2014-03-21 20:41 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-26 19:37 [PATCH 0/9] PCI: Use IORESOURCE_UNSET for unassigned BARs Bjorn Helgaas
2014-02-26 19:37 ` [PATCH 1/9] resource: Add resource_contains() Bjorn Helgaas
2014-02-26 19:37 ` [PATCH 2/9] vsprintf: Add support for IORESOURCE_UNSET in %pR Bjorn Helgaas
2014-02-26 19:37 ` [PATCH 3/9] PCI: Remove pci_find_parent_resource() use for allocation Bjorn Helgaas
2014-02-26 19:37 ` [PATCH 4/9] PCI: Mark resources as IORESOURCE_UNSET if we can't assign them Bjorn Helgaas
2014-02-26 19:37 ` [PATCH 5/9] PCI: Don't clear IORESOURCE_UNSET when updating BAR Bjorn Helgaas
2014-02-26 19:37 ` [PATCH 6/9] PCI: Check IORESOURCE_UNSET before " Bjorn Helgaas
2014-02-26 19:37 ` [PATCH 7/9] PCI: Don't try to claim IORESOURCE_UNSET resources Bjorn Helgaas
2014-02-26 19:37 ` [PATCH 8/9] PCI: Ignore BAR contents when firmware left decoding disabled Bjorn Helgaas
2014-03-13  8:51   ` Ming Lei
2014-03-13 16:08     ` Bjorn Helgaas
2014-03-14  1:48       ` Ming Lei
2014-03-18  0:27         ` Bjorn Helgaas
2014-03-19  3:32           ` Ming Lei
2014-03-19  4:52             ` Ming Lei
2014-03-19 16:45               ` Bjorn Helgaas
2014-03-20  1:32                 ` Ming Lei
2014-03-21 20:07                   ` Bjorn Helgaas
2014-03-21 20:25                     ` Sasha Levin
2014-03-21 20:40                       ` Bjorn Helgaas
2014-03-19 18:54   ` Bjorn Helgaas
2014-03-19 21:16     ` Bjorn Helgaas
2014-03-19 21:23       ` Sasha Levin
2014-02-26 19:38 ` [PATCH 9/9] PCI: Don't enable decoding if BAR hasn't been assigned an address Bjorn Helgaas
2014-03-04 20:53 ` [PATCH 0/9] PCI: Use IORESOURCE_UNSET for unassigned BARs Bjorn Helgaas
