* [PATCH v4] PCI: ARM: add support for generic PCI host controller
@ 2014-02-24 18:59 ` Will Deacon
  0 siblings, 0 replies; 12+ messages in thread
From: Will Deacon @ 2014-02-24 18:59 UTC (permalink / raw)
  To: linux-arm-kernel; +Cc: arnd, linux-pci, bhelgaas, jgunthorpe, Will Deacon

This patch adds support for a generic PCI host controller, such as a
firmware-initialised device with static windows or an emulation such as
the one provided by kvmtool.

The controller itself has no configuration registers and has its address
spaces described entirely by the device-tree (using the bindings from
ePAPR). Both CAM and ECAM are supported for Config Space accesses.

Corresponding documentation is added for the DT binding.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---

Hello,

This is version 4 of the patches previously posted here:

  v1: http://lists.infradead.org/pipermail/linux-arm-kernel/2014-February/229679.html
  v2: http://lists.infradead.org/pipermail/linux-arm-kernel/2014-February/232213.html
  v3: http://lists.infradead.org/pipermail/linux-arm-kernel/2014-February/233491.html

Changes since v3 include:

  - Dropped the rest of the series (I sent the bios32 patch to rmk, dropped the
    bus claiming patch and the mach-virt Kconfig patch is trivial)

  - Added support for "linux,pci-probe-only" instead of using cmdline

  - Fixed offset calculation for I/O windows and reworked resource registration

As per usual, all feedback is welcome.

Will

 .../devicetree/bindings/pci/host-generic-pci.txt   | 100 ++++++
 drivers/pci/host/Kconfig                           |   7 +
 drivers/pci/host/Makefile                          |   1 +
 drivers/pci/host/pci-host-generic.c                | 374 +++++++++++++++++++++
 4 files changed, 482 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/pci/host-generic-pci.txt
 create mode 100644 drivers/pci/host/pci-host-generic.c

diff --git a/Documentation/devicetree/bindings/pci/host-generic-pci.txt b/Documentation/devicetree/bindings/pci/host-generic-pci.txt
new file mode 100644
index 000000000000..3b1e6e5ccb99
--- /dev/null
+++ b/Documentation/devicetree/bindings/pci/host-generic-pci.txt
@@ -0,0 +1,100 @@
+* Generic PCI host controller
+
+Firmware-initialised PCI host controllers and PCI emulations, such as the
+virtio-pci implementations found in kvmtool and other para-virtualised
+systems, do not require driver support for complexities such as regulator
+and clock management. In fact, the controller may not even require the
+configuration of a control interface by the operating system, instead
+presenting a set of fixed windows describing a subset of IO, Memory and
+Configuration Spaces.
+
+Such a controller can be described purely in terms of the standardized device
+tree bindings documented in pci.txt:
+
+
+Properties of the host controller node:
+
+- compatible     : Must be "pci-host-cam-generic" or "pci-host-ecam-generic"
+                   depending on the layout of configuration space (CAM vs
+                   ECAM respectively).
+
+- device_type    : Must be "pci".
+
+- ranges         : As described in IEEE Std 1275-1994, but must provide
+                   at least a definition of non-prefetchable memory. One
+                   or both of prefetchable Memory and IO Space may also
+                   be provided.
+
+- bus-range      : Optional property (also described in IEEE Std 1275-1994)
+                   to indicate the range of bus numbers for this controller.
+                   If absent, defaults to <0 255> (i.e. all buses).
+
+- #address-cells : Must be 3.
+
+- #size-cells    : Must be 2.
+
+- reg            : The Configuration Space base address and size, as accessed
+                   from the parent bus.
+
+
+Properties of the /chosen node:
+
+- linux,pci-probe-only
+                 : Optional property which takes a single-cell argument.
+                   If '0', then Linux will assign devices in its usual manner,
+                   otherwise it will not try to assign devices and instead use
+                   them as they are configured already.
+
+Configuration Space is assumed to be memory-mapped (as opposed to being
+accessed via an ioport) and laid out with a direct correspondence to the
+geography of a PCI bus address by concatenating the various components to
+form an offset.
+
+For CAM, this 24-bit offset is:
+
+        cfg_offset(bus, device, function, register) =
+                   bus << 16 | device << 11 | function << 8 | register
+
+Whilst ECAM extends this by 4 bits to accommodate 4k of function space:
+
+        cfg_offset(bus, device, function, register) =
+                   bus << 20 | device << 15 | function << 12 | register
+
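+For example, the offset for bus 1, device 2, function 0, register 0x10 is:
+
+        CAM:  1 << 16 | 2 << 11 | 0 << 8  | 0x10 = 0x11010
+        ECAM: 1 << 20 | 2 << 15 | 0 << 12 | 0x10 = 0x110010
+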
+Interrupt mapping is exactly as described in `Open Firmware Recommended
+Practice: Interrupt Mapping' and requires the following properties:
+
+- #interrupt-cells   : Must be 1
+
+- interrupt-map      : <see aforementioned specification>
+
+- interrupt-map-mask : <see aforementioned specification>
+
+
+Example:
+
+pci {
+    compatible = "pci-host-cam-generic";
+    device_type = "pci";
+    #address-cells = <3>;
+    #size-cells = <2>;
+    bus-range = <0x0 0x1>;
+
+    // CPU_PHYSICAL(2)  SIZE(2)
+    reg = <0x0 0x40000000  0x0 0x1000000>;
+
+    // BUS_ADDRESS(3)  CPU_PHYSICAL(2)  SIZE(2)
+    ranges = <0x1000000 0x0 0x01000000  0x0 0x01000000  0x0 0x00010000>,
+             <0x2000000 0x0 0x41000000  0x0 0x41000000  0x0 0x3f000000>;
+
+
+    #interrupt-cells = <0x1>;
+
+    // PCI_DEVICE(3)  INT#(1)  CONTROLLER(PHANDLE)  CONTROLLER_DATA(3)
+    interrupt-map = <  0x0 0x0 0x0  0x1  &gic  0x0 0x4 0x1
+                     0x800 0x0 0x0  0x1  &gic  0x0 0x5 0x1
+                    0x1000 0x0 0x0  0x1  &gic  0x0 0x6 0x1
+                    0x1800 0x0 0x0  0x1  &gic  0x0 0x7 0x1>;
+
+    // PCI_DEVICE(3)  INT#(1)
+    interrupt-map-mask = <0xf800 0x0 0x0  0x7>;
+};
diff --git a/drivers/pci/host/Kconfig b/drivers/pci/host/Kconfig
index 47d46c6d8468..f91637d47065 100644
--- a/drivers/pci/host/Kconfig
+++ b/drivers/pci/host/Kconfig
@@ -33,4 +33,11 @@ config PCI_RCAR_GEN2
 	  There are 3 internal PCI controllers available with a single
 	  built-in EHCI/OHCI host controller present on each one.
 
+config PCI_HOST_GENERIC
+	bool "Generic PCI host controller"
+	depends on ARM && OF
+	help
+	  Say Y here if you want to support a simple generic PCI host
+	  controller, such as the one emulated by kvmtool.
+
 endmenu
diff --git a/drivers/pci/host/Makefile b/drivers/pci/host/Makefile
index 13fb3333aa05..bd1bf1ab4ac8 100644
--- a/drivers/pci/host/Makefile
+++ b/drivers/pci/host/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_PCI_IMX6) += pci-imx6.o
 obj-$(CONFIG_PCI_MVEBU) += pci-mvebu.o
 obj-$(CONFIG_PCI_TEGRA) += pci-tegra.o
 obj-$(CONFIG_PCI_RCAR_GEN2) += pci-rcar-gen2.o
+obj-$(CONFIG_PCI_HOST_GENERIC) += pci-host-generic.o
diff --git a/drivers/pci/host/pci-host-generic.c b/drivers/pci/host/pci-host-generic.c
new file mode 100644
index 000000000000..71b0960772ec
--- /dev/null
+++ b/drivers/pci/host/pci-host-generic.c
@@ -0,0 +1,374 @@
+/*
+ * Simple, generic PCI host controller driver targeting firmware-initialised
+ * systems and virtual machines (e.g. the PCI emulation provided by kvmtool).
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (C) 2014 ARM Limited
+ *
+ * Author: Will Deacon <will.deacon@arm.com>
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/of_address.h>
+#include <linux/of_pci.h>
+#include <linux/platform_device.h>
+
+struct gen_pci_cfg_bus_ops {
+	u32 bus_shift;
+	void __iomem *(*map_bus)(struct pci_bus *, unsigned int, int);
+};
+
+struct gen_pci_cfg_windows {
+	struct resource				res;
+	struct resource				bus_range;
+	void __iomem				**win;
+
+	const struct gen_pci_cfg_bus_ops	*ops;
+};
+
+struct gen_pci {
+	struct pci_host_bridge			host;
+	struct gen_pci_cfg_windows		cfg;
+};
+
+static void __iomem *gen_pci_map_cfg_bus_cam(struct pci_bus *bus,
+					     unsigned int devfn,
+					     int where)
+{
+	struct pci_sys_data *sys = bus->sysdata;
+	struct gen_pci *pci = sys->private_data;
+	resource_size_t idx = bus->number - pci->cfg.bus_range.start;
+
+	return pci->cfg.win[idx] + ((devfn << 8) | where);
+}
+
+static struct gen_pci_cfg_bus_ops gen_pci_cfg_cam_bus_ops = {
+	.bus_shift	= 16,
+	.map_bus	= gen_pci_map_cfg_bus_cam,
+};
+
+static void __iomem *gen_pci_map_cfg_bus_ecam(struct pci_bus *bus,
+					      unsigned int devfn,
+					      int where)
+{
+	struct pci_sys_data *sys = bus->sysdata;
+	struct gen_pci *pci = sys->private_data;
+	resource_size_t idx = bus->number - pci->cfg.bus_range.start;
+
+	return pci->cfg.win[idx] + ((devfn << 12) | where);
+}
+
+static struct gen_pci_cfg_bus_ops gen_pci_cfg_ecam_bus_ops = {
+	.bus_shift	= 20,
+	.map_bus	= gen_pci_map_cfg_bus_ecam,
+};
+
+static int gen_pci_config_read(struct pci_bus *bus, unsigned int devfn,
+				int where, int size, u32 *val)
+{
+	void __iomem *addr;
+	struct pci_sys_data *sys = bus->sysdata;
+	struct gen_pci *pci = sys->private_data;
+
+	addr = pci->cfg.ops->map_bus(bus, devfn, where);
+
+	switch (size) {
+	case 1:
+		*val = readb(addr);
+		break;
+	case 2:
+		*val = readw(addr);
+		break;
+	default:
+		*val = readl(addr);
+	}
+
+	return PCIBIOS_SUCCESSFUL;
+}
+
+static int gen_pci_config_write(struct pci_bus *bus, unsigned int devfn,
+				 int where, int size, u32 val)
+{
+	void __iomem *addr;
+	struct pci_sys_data *sys = bus->sysdata;
+	struct gen_pci *pci = sys->private_data;
+
+	addr = pci->cfg.ops->map_bus(bus, devfn, where);
+
+	switch (size) {
+	case 1:
+		writeb(val, addr);
+		break;
+	case 2:
+		writew(val, addr);
+		break;
+	default:
+		writel(val, addr);
+	}
+
+	return PCIBIOS_SUCCESSFUL;
+}
+
+static struct pci_ops gen_pci_ops = {
+	.read	= gen_pci_config_read,
+	.write	= gen_pci_config_write,
+};
+
+static int gen_pci_calc_io_offset(struct device *dev,
+				  struct of_pci_range *range,
+				  struct resource *res,
+				  resource_size_t *offset)
+{
+	static atomic_t wins = ATOMIC_INIT(0);
+	int err, idx, max_win;
+	unsigned int io_offset;
+
+	/* Ensure the CPU physical address is aligned to a 64K boundary */
+	range->size += range->cpu_addr & (SZ_64K - 1);
+	range->cpu_addr = round_down(range->cpu_addr, SZ_64K);
+	range->pci_addr = round_down(range->pci_addr, SZ_64K);
+
+	if (range->size > SZ_64K)
+		return -E2BIG;
+
+	max_win = IO_SPACE_LIMIT + 1 / SZ_64K;
+	idx = atomic_inc_return(&wins);
+	if (idx >= max_win)
+		return -ENOSPC;
+
+	io_offset = (idx - 1) * SZ_64K;
+	err = pci_ioremap_io(io_offset, range->cpu_addr);
+	if (err)
+		return err;
+
+	of_pci_range_to_resource(range, dev->of_node, res);
+	res->start = io_offset + range->pci_addr;
+	res->end = res->start + range->size - 1;
+	*offset = io_offset;
+	return 0;
+}
+
+static int gen_pci_calc_mem_offset(struct device *dev,
+				   struct of_pci_range *range,
+				   struct resource *res,
+				   resource_size_t *offset)
+{
+	of_pci_range_to_resource(range, dev->of_node, res);
+	*offset = range->cpu_addr - range->pci_addr;
+	return 0;
+}
+
+static const struct of_device_id gen_pci_of_match[] = {
+	{ .compatible = "pci-host-cam-generic",
+	  .data = &gen_pci_cfg_cam_bus_ops },
+
+	{ .compatible = "pci-host-ecam-generic",
+	  .data = &gen_pci_cfg_ecam_bus_ops },
+
+	{ },
+};
+MODULE_DEVICE_TABLE(of, gen_pci_of_match);
+
+static int gen_pci_setup(int nr, struct pci_sys_data *sys)
+{
+	int err, res_valid;
+	u8 bus_max;
+	struct pci_host_bridge_window *win;
+	struct resource *bus_range;
+	resource_size_t busn;
+	const char *type;
+	struct of_pci_range range;
+	struct of_pci_range_parser parser;
+	const struct of_device_id *of_id;
+	const int *prop;
+	struct gen_pci *pci = sys->private_data;
+	struct device *dev = pci->host.dev.parent;
+	struct device_node *np = dev->of_node;
+
+	type = of_get_property(np, "device_type", NULL);
+	if (!type || strcmp(type, "pci")) {
+		dev_err(dev, "invalid \"device_type\" %s\n", type);
+		return -EINVAL;
+	}
+
+	if (of_pci_range_parser_init(&parser, np)) {
+		dev_err(dev, "missing \"ranges\" property\n");
+		return -EINVAL;
+	}
+
+	prop = of_get_property(of_chosen, "linux,pci-probe-only", NULL);
+	if (prop) {
+		if (*prop)
+			pci_add_flags(PCI_PROBE_ONLY);
+		else
+			pci_clear_flags(PCI_PROBE_ONLY);
+	}
+
+	if (of_pci_parse_bus_range(np, &pci->cfg.bus_range))
+		pci->cfg.bus_range = (struct resource) {
+			.name	= np->name,
+			.start	= 0,
+			.end	= 0xff,
+			.flags	= IORESOURCE_BUS,
+		};
+
+	err = of_address_to_resource(np, 0, &pci->cfg.res);
+	if (err) {
+		dev_err(dev, "missing \"reg\" property\n");
+		return err;
+	}
+
+	pci->cfg.win = devm_kcalloc(dev, resource_size(&pci->cfg.bus_range),
+				    sizeof(*pci->cfg.win), GFP_KERNEL);
+	if (!pci->cfg.win)
+		return -ENOMEM;
+
+	of_id = of_match_node(gen_pci_of_match, np);
+	pci->cfg.ops = of_id->data;
+	INIT_LIST_HEAD(&pci->host.windows);
+
+	/* Limit the bus-range to fit within reg */
+	bus_max = pci->cfg.bus_range.start +
+		  (resource_size(&pci->cfg.res) >> pci->cfg.ops->bus_shift) - 1;
+	pci->cfg.bus_range.end = min_t(resource_size_t, pci->cfg.bus_range.end,
+				       bus_max);
+
+	/* Create and register resources from the ranges property */
+	res_valid = 0;
+	for_each_of_pci_range(&parser, &range) {
+		struct resource *parent, *res;
+		resource_size_t offset;
+		u32 restype = range.flags & IORESOURCE_TYPE_BITS;
+
+		res = devm_kmalloc(dev, sizeof(*res), GFP_KERNEL);
+		if (!res) {
+			err = -ENOMEM;
+			goto out_release_res;
+		}
+
+		switch (restype) {
+		case IORESOURCE_IO:
+			parent = &ioport_resource;
+			err = gen_pci_calc_io_offset(dev, &range, res, &offset);
+			break;
+		case IORESOURCE_MEM:
+			parent = &iomem_resource;
+			err = gen_pci_calc_mem_offset(dev, &range, res, &offset);
+			res_valid |= !(res->flags & IORESOURCE_PREFETCH || err);
+			break;
+		default:
+			err = -EINVAL;
+			continue;
+		}
+
+		if (err) {
+			dev_warn(dev,
+				 "error %d: failed to add resource [type 0x%x, %lld bytes]\n",
+				 err, restype, range.size);
+			continue;
+		}
+
+		err = request_resource(parent, res);
+		if (err)
+			goto out_release_res;
+
+		pci_add_resource_offset(&sys->resources, res, offset);
+	}
+
+	if (!res_valid) {
+		dev_err(dev, "non-prefetchable memory resource required\n");
+		err = -EINVAL;
+		goto out_release_res;
+	}
+
+	/* Map our Configuration Space windows */
+	if (!request_mem_region(pci->cfg.res.start,
+				resource_size(&pci->cfg.res),
+				"Configuration Space"))
+		goto out_release_res;
+
+	bus_range = &pci->cfg.bus_range;
+	for (busn = bus_range->start; busn <= bus_range->end; ++busn) {
+		u32 idx = busn - bus_range->start;
+		u32 sz = 1 << pci->cfg.ops->bus_shift;
+
+		pci->cfg.win[idx] = devm_ioremap(dev,
+						 pci->cfg.res.start + busn * sz,
+						 sz);
+		if (!pci->cfg.win[idx]) {
+			err = -ENOMEM;
+			goto out_unmap_cfg;
+		}
+	}
+
+	/* Register bus resource */
+	pci_add_resource(&sys->resources, bus_range);
+	return 1;
+
+out_unmap_cfg:
+	while (busn-- > bus_range->start)
+		devm_iounmap(dev, pci->cfg.win[busn - bus_range->start]);
+
+out_release_res:
+	release_child_resources(&iomem_resource);
+	release_child_resources(&ioport_resource);
+
+	list_for_each_entry(win, &sys->resources, list)
+		devm_kfree(dev, win->res);
+	pci_free_resource_list(&sys->resources);
+
+	devm_kfree(dev, pci->cfg.win);
+	return err;
+}
+
+static int gen_pci_probe(struct platform_device *pdev)
+{
+	struct hw_pci hw;
+	struct gen_pci *pci;
+	struct device *dev = &pdev->dev;
+
+	if (!dev->of_node)
+		return -ENODEV;
+
+	pci = devm_kzalloc(dev, sizeof(*pci), GFP_KERNEL);
+	if (!pci)
+		return -ENOMEM;
+
+	pci->host.dev.parent = dev;
+	hw = (struct hw_pci) {
+		.nr_controllers	= 1,
+		.private_data	= (void **)&pci,
+		.setup		= gen_pci_setup,
+		.map_irq	= of_irq_parse_and_map_pci,
+		.ops		= &gen_pci_ops,
+	};
+
+	pci_common_init_dev(dev, &hw);
+	return 0;
+}
+
+static struct platform_driver gen_pci_driver = {
+	.driver = {
+		.name = "pci-host-generic",
+		.owner = THIS_MODULE,
+		.of_match_table = gen_pci_of_match,
+	},
+	.probe = gen_pci_probe,
+};
+module_platform_driver(gen_pci_driver);
+
+MODULE_DESCRIPTION("Generic PCI host driver");
+MODULE_AUTHOR("Will Deacon <will.deacon@arm.com>");
+MODULE_LICENSE("GPL v2");
-- 
1.8.2.2


* Re: [PATCH v4] PCI: ARM: add support for generic PCI host controller
  2014-02-24 18:59 ` Will Deacon
@ 2014-02-24 19:23   ` Arnd Bergmann
  -1 siblings, 0 replies; 12+ messages in thread
From: Arnd Bergmann @ 2014-02-24 19:23 UTC (permalink / raw)
  To: linux-arm-kernel; +Cc: Will Deacon, bhelgaas, linux-pci, jgunthorpe

On Monday 24 February 2014 18:59:19 Will Deacon wrote:
> +    // BUS_ADDRESS(3)  CPU_PHYSICAL(2)  SIZE(2)
> +    ranges = <0x1000000 0x0 0x01000000  0x0 0x01000000  0x0 0x00010000>,
> +             <0x2000000 0x0 0x41000000  0x0 0x41000000  0x0 0x3f000000>;

The I/O window looks very odd. Why would you start it at bus address
0x01000000 rather than 0 like everyone else?

> +static int gen_pci_calc_io_offset(struct device *dev,
> +				  struct of_pci_range *range,
> +				  struct resource *res,
> +				  resource_size_t *offset)
> +{
> +	static atomic_t wins = ATOMIC_INIT(0);
> +	int err, idx, max_win;
> +	unsigned int io_offset;
> +
> +	/* Ensure the CPU physical address is aligned to a 64K boundary */
> +	range->size += range->cpu_addr & (SZ_64K - 1);
> +	range->cpu_addr = round_down(range->cpu_addr, SZ_64K);
> +	range->pci_addr = round_down(range->pci_addr, SZ_64K);
> +
> +	if (range->size > SZ_64K)
> +		return -E2BIG;

What is the rounding for? Isn't it enough to round to PAGE_SIZE here?
Do you just want to make sure it works with 64K pages on ARM64? I guess
if you end up rounding for the mapping, you should later make sure
that you still only register the original resource.

Presumably, the size would normally be 64K, so if you add anything
for rounding, you just fail here. How about cropping to 64K in that
case?

> +	max_win = IO_SPACE_LIMIT + 1 / SZ_64K;
> +	idx = atomic_inc_return(&wins);
> +	if (idx >= max_win)
> +		return -ENOSPC;
> +
> +	io_offset = (idx - 1) * SZ_64K;
> +	err = pci_ioremap_io(io_offset, range->cpu_addr);

As discussed the last time, calling it "io_offset" here
is extremely confusing. Don't do that, and call it "window"
or something appropriate instead.

> +	if (err)
> +		return err;
> +
> +	of_pci_range_to_resource(range, dev->of_node, res);
> +	res->start = io_offset + range->pci_addr;
> +	res->end = res->start + range->size - 1;

You have just rounded down range->pci_addr, so the resource
now contains more than what the bus node described.

> +	prop = of_get_property(of_chosen, "linux,pci-probe-only", NULL);

of_property_read_bool()
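
i.e. something like this sketch (untested; note that, unlike the current
cell-value check, it would treat a present-but-zero property the same as
a nonzero one):

	if (of_property_read_bool(of_chosen, "linux,pci-probe-only"))
		pci_add_flags(PCI_PROBE_ONLY);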

> +	if (of_pci_parse_bus_range(np, &pci->cfg.bus_range))
> +		pci->cfg.bus_range = (struct resource) {
> +			.name	= np->name,
> +			.start	= 0,
> +			.end	= 0xff,
> +			.flags	= IORESOURCE_BUS,
> +		};

I wonder if this should just be part of of_pci_parse_bus_range(),
or maybe an extended helper.

> +	/* Limit the bus-range to fit within reg */
> +	bus_max = pci->cfg.bus_range.start +
> +		  (resource_size(&pci->cfg.res) >> pci->cfg.ops->bus_shift) - 1;
> +	pci->cfg.bus_range.end = min_t(resource_size_t, pci->cfg.bus_range.end,
> +				       bus_max);

I think I would instead print a warning and bail out here. This should
only happen for inconsistent DT properties.

> +		err = request_resource(parent, res);
> +		if (err)
> +			goto out_release_res;
> +
> +		pci_add_resource_offset(&sys->resources, res, offset);
> +	}

Wrong offset for the I/O space. Why not call pci_add_resource_offset()
directly from gen_pci_calc_io_offset where you have the right number?

> +	bus_range = &pci->cfg.bus_range;
> +	for (busn = bus_range->start; busn <= bus_range->end; ++busn) {
> +		u32 idx = busn - bus_range->start;
> +		u32 sz = 1 << pci->cfg.ops->bus_shift;
> +
> +		pci->cfg.win[idx] = devm_ioremap(dev,
> +						 pci->cfg.res.start + busn * sz,
> +						 sz);
> +		if (!pci->cfg.win[idx]) {
> +			err = -ENOMEM;
> +			goto out_unmap_cfg;
> +		}
> +	}

I saw a trick in the tegra driver that maps the buses dynamically during
probe. While I mentioned that we can't dynamically map/unmap the config
space during access, it's probably fine to just map each bus at the first
access, since that will happen during the initial bus probe that is allowed
to sleep.
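
A rough sketch of what I mean for the CAM accessor (hypothetical and
untested; the config read/write ops would also need to cope with a NULL
mapping):

static void __iomem *gen_pci_map_cfg_bus_cam(struct pci_bus *bus,
					     unsigned int devfn,
					     int where)
{
	struct pci_sys_data *sys = bus->sysdata;
	struct gen_pci *pci = sys->private_data;
	resource_size_t idx = bus->number - pci->cfg.bus_range.start;
	u32 sz = 1 << pci->cfg.ops->bus_shift;

	/* Map this bus's window on first use; the initial probe may sleep */
	if (!pci->cfg.win[idx])
		pci->cfg.win[idx] = devm_ioremap(pci->host.dev.parent,
						 pci->cfg.res.start +
						 bus->number * sz, sz);
	if (!pci->cfg.win[idx])
		return NULL;

	return pci->cfg.win[idx] + ((devfn << 8) | where);
}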

	Arnd

* Re: [PATCH v4] PCI: ARM: add support for generic PCI host controller
  2014-02-24 19:23   ` Arnd Bergmann
@ 2014-02-25 18:01     ` Will Deacon
  -1 siblings, 0 replies; 12+ messages in thread
From: Will Deacon @ 2014-02-25 18:01 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: linux-arm-kernel, bhelgaas, linux-pci, jgunthorpe

Hi Arnd,

On Mon, Feb 24, 2014 at 07:23:23PM +0000, Arnd Bergmann wrote:
> On Monday 24 February 2014 18:59:19 Will Deacon wrote:
> > +    // BUS_ADDRESS(3)  CPU_PHYSICAL(2)  SIZE(2)
> > +    ranges = <0x1000000 0x0 0x01000000  0x0 0x01000000  0x0 0x00010000>,
> > +             <0x2000000 0x0 0x41000000  0x0 0x41000000  0x0 0x3f000000>;
> 
> The I/O window looks very odd. Why would you start it at bus address
> 0x01000000 rather than 0 like everyone else?

I was just trying to change the example, since Jason didn't like me using
CPU physical 0x0. There was also mention of maintaining a 1:1 mapping between
bus address and CPU physical address in a previous version of this thread.

Would you be happy with bus address 0x0 and CPU physical 0x01000000? (I
don't have a feel for realism in my virtual environment :)

> > +static int gen_pci_calc_io_offset(struct device *dev,
> > +				  struct of_pci_range *range,
> > +				  struct resource *res,
> > +				  resource_size_t *offset)
> > +{
> > +	static atomic_t wins = ATOMIC_INIT(0);
> > +	int err, idx, max_win;
> > +	unsigned int io_offset;
> > +
> > +	/* Ensure the CPU physical address is aligned to a 64K boundary */
> > +	range->size += range->cpu_addr & (SZ_64K - 1);
> > +	range->cpu_addr = round_down(range->cpu_addr, SZ_64K);
> > +	range->pci_addr = round_down(range->pci_addr, SZ_64K);
> > +
> > +	if (range->size > SZ_64K)
> > +		return -E2BIG;
> 
> What is the rounding for? Isn't it enough to round to PAGE_SIZE here?
> Do you just want to make sure it works with 64K pages on ARM64? I guess
> if you end up rounding for the mapping, you should later make sure
> that you still only register the original resource.

I need the alignment code so I can handle being passed a CPU physical
address which isn't 64K aligned. In this case, pci_ioremap_io needs a 64K
aligned address so that the __io macro does the right thing.

For example, I can tell kvmtool to place I/O space at CPU physical 0x16000,
bus address 0x6000 (so ports 0x0-0x5fff are unavailable). In that case, I
still need to map those lower ports for the __io macro to offset into the
window correctly.
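
Concretely, for that layout (numbers from the kvmtool example above,
assuming an I/O window of 0xa000 bytes):

	range->size     = 0xa000 + (0x16000 & 0xffff)  = 0x10000
	range->cpu_addr = round_down(0x16000, SZ_64K)  = 0x10000
	range->pci_addr = round_down(0x6000, SZ_64K)   = 0x0

so pci_ioremap_io() is handed a 64K-aligned CPU address and the mapping
still covers the ports below 0x6000.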

> Presumably, the size would normally be 64K, so if you add anything
> for rounding, you just fail here. How about cropping to 64K in that
> case?

Yeah, I can do that instead of failing.

> > +	max_win = IO_SPACE_LIMIT + 1 / SZ_64K;
> > +	idx = atomic_inc_return(&wins);
> > +	if (idx >= max_win)
> > +		return -ENOSPC;
> > +
> > +	io_offset = (idx - 1) * SZ_64K;
> > +	err = pci_ioremap_io(io_offset, range->cpu_addr);
> 
> As discussed the last time, calling it "io_offset" here
> is extremely confusing. Don't do that, and call it "window"
> or something appropriate instead.

It's called io_offset because it *is* the thing I'm passing to
pci_add_resource, as you point out later. I can change the name once we've
sorted that out (more below).

> > +	if (err)
> > +		return err;
> > +
> > +	of_pci_range_to_resource(range, dev->of_node, res);
> > +	res->start = io_offset + range->pci_addr;
> > +	res->end = res->start + range->size - 1;
> 
> You have just rounded down range->pci_addr, so the resource
> now contains more than what the bus node described.

I've attempted to fix this below.

> > +	prop = of_get_property(of_chosen, "linux,pci-probe-only", NULL);
> 
> of_property_read_bool()

Ok.

> 
> > +	if (of_pci_parse_bus_range(np, &pci->cfg.bus_range))
> > +		pci->cfg.bus_range = (struct resource) {
> > +			.name	= np->name,
> > +			.start	= 0,
> > +			.end	= 0xff,
> > +			.flags	= IORESOURCE_BUS,
> > +		};
> 
> I wonder if this should just be part of of_pci_parse_bus_range(),
> or maybe an extended helper.

Sure, although I want to get to the point where we're happy with this driver
(from a technical perspective) before I try and split it up into helpers.

> > +	/* Limit the bus-range to fit within reg */
> > +	bus_max = pci->cfg.bus_range.start +
> > +		  (resource_size(&pci->cfg.res) >> pci->cfg.ops->bus_shift) - 1;
> > +	pci->cfg.bus_range.end = min_t(resource_size_t, pci->cfg.bus_range.end,
> > +				       bus_max);
> 
> I think I would instead print a warning and bail out here. This should
> only happen for inconsistent DT properties.

Not sure; that forces somebody to provide a bus-range property if their reg
property doesn't describe the entire [E]CAM space. I think the truncation is
useful in that case.

> > +		err = request_resource(parent, res);
> > +		if (err)
> > +			goto out_release_res;
> > +
> > +		pci_add_resource_offset(&sys->resources, res, offset);
> > +	}
> 
> Wrong offset for the I/O space. Why not call pci_add_resource_offset()
> directly from gen_pci_calc_io_offset where you have the right number?

You're thinking of passing in io_offset - range->pci_addr, right? So, along
with the earlier feedback, we get something along the lines of (I can move the
resource registration in here, but this is just for discussion atm):


static int gen_pci_calc_io_offset(struct device *dev,
				  struct of_pci_range *range,
				  struct resource *res,
				  resource_size_t *offset)
{
	static atomic_t wins = ATOMIC_INIT(0);
	int err, idx, max_win;
	unsigned int window;

	max_win = (IO_SPACE_LIMIT + 1) / SZ_64K;
	idx = atomic_inc_return(&wins);
	if (idx >= max_win)
		return -ENOSPC;

	window = (idx - 1) * SZ_64K;
	err = pci_ioremap_io(window, round_down(range->cpu_addr, SZ_64K));
	if (err)
		return err;

	of_pci_range_to_resource(range, dev->of_node, res);
	res->start = window + range->pci_addr;
	res->end = res->start + range->size - 1;
	*offset = window - range->pci_addr;
	return 0;
}


In that case, the PCI core (in pci_create_root_bus) does:


	/* Add initial resources to the bus */
	list_for_each_entry_safe(window, n, resources, list) {
		list_move_tail(&window->list, &bridge->windows);
		res = window->res;
		offset = window->offset;
		if (res->flags & IORESOURCE_BUS)
			pci_bus_insert_busn_res(b, bus, res->end);
		else
			pci_bus_add_resource(b, res, 0);
		if (offset) {
			if (resource_type(res) == IORESOURCE_IO)
				fmt = " (bus address [%#06llx-%#06llx])";
			else
				fmt = " (bus address [%#010llx-%#010llx])";
			snprintf(bus_addr, sizeof(bus_addr), fmt,
				 (unsigned long long) (res->start - offset),
				 (unsigned long long) (res->end - offset));
		} else
			bus_addr[0] = '\0';
		dev_info(&b->dev, "root bus resource %pR%s\n", res, bus_addr);
	}


so here, we print out the `bus address' by computing res->start (window +
range->pci_addr in my driver) - window->offset. That means that
window->offset needs to equal window, otherwise the probing code gets in a
pickle:

  pci_bus 0000:00: root bus resource [io 0x6000-0xffff] (bus address [0xc000-0x15fff])

and then attempts to re-assign BARs with bogus addresses.

If I do:

	*offset = window;

then things work as expected (this is all referring back to the kvmtool example
I mentioned earlier in this mail).
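
Putting numbers on that (same kvmtool example, first I/O window, so
window == 0x0 and res->start == 0x6000):

	*offset = window - range->pci_addr = 0x0 - 0x6000
	          bus address = 0x6000 - (-0x6000) = 0xc000	(bogus)

	*offset = window = 0x0
	          bus address = 0x6000 - 0x0 = 0x6000		(matches the DT)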

As discussed on IRC, we're almost certainly continuing to talk past each
other, but hopefully this example helps show where I'm coming from with the
current code.

> > +	bus_range = &pci->cfg.bus_range;
> > +	for (busn = bus_range->start; busn <= bus_range->end; ++busn) {
> > +		u32 idx = busn - bus_range->start;
> > +		u32 sz = 1 << pci->cfg.ops->bus_shift;
> > +
> > +		pci->cfg.win[idx] = devm_ioremap(dev,
> > +						 pci->cfg.res.start + busn * sz,
> > +						 sz);
> > +		if (!pci->cfg.win[idx]) {
> > +			err = -ENOMEM;
> > +			goto out_unmap_cfg;
> > +		}
> > +	}
> 
> I saw a trick in the tegra driver that maps the buses dynamically during
> probe. While I mentioned that we can't dynamically map/unmap the config
> space during access, it's probably fine to just map each bus at the first
> access, since that will happen during the initial bus probe that is allowed
> to sleep.

I don't really see what that gains us if we have the bus-range property
(which keeps the config accessors pretty clean).

Cheers,

Will

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v4] PCI: ARM: add support for generic PCI host controller
@ 2014-02-25 18:01     ` Will Deacon
  0 siblings, 0 replies; 12+ messages in thread
From: Will Deacon @ 2014-02-25 18:01 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Arnd,

On Mon, Feb 24, 2014 at 07:23:23PM +0000, Arnd Bergmann wrote:
> On Monday 24 February 2014 18:59:19 Will Deacon wrote:
> > +    // BUS_ADDRESS(3)  CPU_PHYSICAL(2)  SIZE(2)
> > +    ranges = <0x1000000 0x0 0x01000000  0x0 0x01000000  0x0 0x00010000>,
> > +             <0x2000000 0x0 0x41000000  0x0 0x41000000  0x0 0x3f000000>;
> 
> The I/O window looks very odd. Why would you start it at bus address
> 0x01000000 rather than 0 like everyone else?

I was just trying to change the example, since Jason didn't like me using
CPU physical 0x0. There was also mention of maintaining a 1:1 mapping between
bus address and CPU physical address in a previous version of this thread.

Would you be happy with bus address 0x0 and CPU physical 0x01000000? (I
don't have a feel for realism in my virtual environment :)

> > +static int gen_pci_calc_io_offset(struct device *dev,
> > +				  struct of_pci_range *range,
> > +				  struct resource *res,
> > +				  resource_size_t *offset)
> > +{
> > +	static atomic_t wins = ATOMIC_INIT(0);
> > +	int err, idx, max_win;
> > +	unsigned int io_offset;
> > +
> > +	/* Ensure the CPU physical address is aligned to a 64K boundary */
> > +	range->size += range->cpu_addr & (SZ_64K - 1);
> > +	range->cpu_addr = round_down(range->cpu_addr, SZ_64K);
> > +	range->pci_addr = round_down(range->pci_addr, SZ_64K);
> > +
> > +	if (range->size > SZ_64K)
> > +		return -E2BIG;
> 
> What is the rounding for? Isn't it enough to round to PAGE_SIZE here?
> Do you just want to make sure it works with 64K pages on ARM64? I guess
> if you end up rounding for the mapping, you should later make sure
> that you still only register the original resource.

I need the alignment code so I can handle being passed a CPU physical
address which isn't 64K aligned. In this case, pci_ioremap_io needs a 64K
aligned address so that the __io macro does the right thing.

For example, I can tell kvmtool to place I/O space at CPU physical 0x16000,
bus address 0x6000 (so ports 0x0-0x5fff are unavailable). In that case, I
still need to map those lower ports for the __io macro to offset into the
window correctly.

> Presumably, the size would normally be 64K, so if you add anything
> for rounding, you just fail here. How about cropping to 64K in that
> case?

Yeah, I can do that instead of failing.

> > +	max_win = IO_SPACE_LIMIT + 1 / SZ_64K;
> > +	idx = atomic_inc_return(&wins);
> > +	if (idx >= max_win)
> > +		return -ENOSPC;
> > +
> > +	io_offset = (idx - 1) * SZ_64K;
> > +	err = pci_ioremap_io(io_offset, range->cpu_addr);
> 
> As discussed the last time, calling it "io_offset" here
> is extremely confusing. Don't do that, and call it "window"
> or something appropriate instead.

It's called io_offset because it *is* the thing I'm passing to
pci_add_resource, as you point out later. I can change the name once we've
sorted that out (more below).

> > +	if (err)
> > +		return err;
> > +
> > +	of_pci_range_to_resource(range, dev->of_node, res);
> > +	res->start = io_offset + range->pci_addr;
> > +	res->end = res->start + range->size - 1;
> 
> You have just rounded down range->pci_addr, so the resource
> now contains more than what the bus node described.

I've attempted to fix this below.

> > +	prop = of_get_property(of_chosen, "linux,pci-probe-only", NULL);
> 
> of_property_read_bool()

Ok.

> 
> > +	if (of_pci_parse_bus_range(np, &pci->cfg.bus_range))
> > +		pci->cfg.bus_range = (struct resource) {
> > +			.name	= np->name,
> > +			.start	= 0,
> > +			.end	= 0xff,
> > +			.flags	= IORESOURCE_BUS,
> > +		};
> 
> I wonder if this should just be part of of_pci_parse_bus_range(),
> or maybe an extended helper.

Sure, although I want to get to the point where we're happy with this driver
(from a technical perspective) before I try and split it up into helpers.

> > +	/* Limit the bus-range to fit within reg */
> > +	bus_max = pci->cfg.bus_range.start +
> > +		  (resource_size(&pci->cfg.res) >> pci->cfg.ops->bus_shift) - 1;
> > +	pci->cfg.bus_range.end = min_t(resource_size_t, pci->cfg.bus_range.end,
> > +				       bus_max);
> 
> I think I would instead print a warning and bail out here. This should
> only happen for inconsistent DT properties.

Not sure; that forces somebody to provide a bus-range property if their reg
property doesn't describe the entire [E]CAM space. I think the truncation is
useful in that case.

> > +		err = request_resource(parent, res);
> > +		if (err)
> > +			goto out_release_res;
> > +
> > +		pci_add_resource_offset(&sys->resources, res, offset);
> > +	}
> 
> Wrong offset for the I/O space. Why not call pci_add_resource_offset()
> directly from gen_pci_calc_io_offset where you have the right number?

You're thinking of passing in io_offset - range->pci_addr, right? So, along
with the earlier feedback, we get something along the lines of (I can move the
resource registration in here, but this is just for discussion atm):


static int gen_pci_calc_io_offset(struct device *dev,
				  struct of_pci_range *range,
				  struct resource *res,
				  resource_size_t *offset)
{
	static atomic_t wins = ATOMIC_INIT(0);
	int err, idx, max_win;
	unsigned int window;

	max_win = (IO_SPACE_LIMIT + 1) / SZ_64K;
	idx = atomic_inc_return(&wins);
	if (idx >= max_win)
		return -ENOSPC;

	window = (idx - 1) * SZ_64K;
	err = pci_ioremap_io(window, round_down(range->cpu_addr, SZ_64K));
	if (err)
		return err;

	of_pci_range_to_resource(range, dev->of_node, res);
	res->start = window + range->pci_addr;
	res->end = res->start + range->size - 1;
	*offset = window - range->pci_addr;
	return 0;
}


In that case, the PCI core (in pci_create_root_bus) does:


	/* Add initial resources to the bus */
	list_for_each_entry_safe(window, n, resources, list) {
		list_move_tail(&window->list, &bridge->windows);
		res = window->res;
		offset = window->offset;
		if (res->flags & IORESOURCE_BUS)
			pci_bus_insert_busn_res(b, bus, res->end);
		else
			pci_bus_add_resource(b, res, 0);
		if (offset) {
			if (resource_type(res) == IORESOURCE_IO)
				fmt = " (bus address [%#06llx-%#06llx])";
			else
				fmt = " (bus address [%#010llx-%#010llx])";
			snprintf(bus_addr, sizeof(bus_addr), fmt,
				 (unsigned long long) (res->start - offset),
				 (unsigned long long) (res->end - offset));
		} else
			bus_addr[0] = '\0';
		dev_info(&b->dev, "root bus resource %pR%s\n", res, bus_addr);
	}


so here, we print out the `bus address' by computing res->start (window +
range->pci_addr in my driver) - window->offset. That means that
window->offset needs to equal window, otherwise the probing code gets in a
pickle:

  pci_bus 0000:00: root bus resource [io 0x6000-0xffff] (bus address [0xc000-0x15fff])

and then attempts to re-assign BARs with bogus addresses.
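
Spelling out the arithmetic for the kvmtool example (window = 0x0,
pci_addr = 0x6000, size = 0xa000), as I read it:

	res->start = window + pci_addr     = 0x0    + 0x6000     = 0x6000
	res->end   = res->start + size - 1 = 0x6000 + 0xa000 - 1 = 0xffff
	*offset    = window - pci_addr     = 0x0    - 0x6000     = -0x6000

	bus start  = res->start - offset   = 0x6000 + 0x6000     = 0xc000
	bus end    = res->end   - offset   = 0xffff + 0x6000     = 0x15fff

which is exactly the bogus range in the message above.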

If I do:

	*offset = window;

then things work as expected (this is all referring back to the kvmtool example
I mentioned earlier in this mail).

As discussed on IRC, we're almost certainly continuing to talk past each
other, but hopefully this example helps show where I'm coming from with the
current code.

> > +	bus_range = &pci->cfg.bus_range;
> > +	for (busn = bus_range->start; busn <= bus_range->end; ++busn) {
> > +		u32 idx = busn - bus_range->start;
> > +		u32 sz = 1 << pci->cfg.ops->bus_shift;
> > +
> > +		pci->cfg.win[idx] = devm_ioremap(dev,
> > +						 pci->cfg.res.start + busn * sz,
> > +						 sz);
> > +		if (!pci->cfg.win[idx]) {
> > +			err = -ENOMEM;
> > +			goto out_unmap_cfg;
> > +		}
> > +	}
> 
> I saw a trick in the tegra driver that maps the buses dynamically during
> probe. While I mentioned that we can't dynamically map/unmap the config
> space during access, it's probably fine to just map each bus at the first
> access, since that will happen during the initial bus probe that is allowed
> to sleep.

I don't really see what that gains us if we have the bus-range property
(which keeps the config accessors pretty clean).

Cheers,

Will

* Re: [PATCH v4] PCI: ARM: add support for generic PCI host controller
  2014-02-25 18:01     ` Will Deacon
@ 2014-02-25 22:30       ` Arnd Bergmann
  -1 siblings, 0 replies; 12+ messages in thread
From: Arnd Bergmann @ 2014-02-25 22:30 UTC (permalink / raw)
  To: Will Deacon; +Cc: linux-arm-kernel, bhelgaas, linux-pci, jgunthorpe

On Tuesday 25 February 2014, Will Deacon wrote:
> Hi Arnd,
> 
> On Mon, Feb 24, 2014 at 07:23:23PM +0000, Arnd Bergmann wrote:
> > On Monday 24 February 2014 18:59:19 Will Deacon wrote:
> > > +    // BUS_ADDRESS(3)  CPU_PHYSICAL(2)  SIZE(2)
> > > +    ranges = <0x1000000 0x0 0x01000000  0x0 0x01000000  0x0 0x00010000>,
> > > +             <0x2000000 0x0 0x41000000  0x0 0x41000000  0x0 0x3f000000>;
> > 
> > The I/O window looks very odd. Why would you start it at bus address
> > 0x01000000 rather than 0 like everyone else?
> 
> I was just trying to change the example, since Jason didn't like me using
> CPU physical 0x0. There was also mention of maintaining a 1:1 mapping between
> bus address and CPU physical address in a previous version of this thread.
> 
> Would you be happy with bus address 0x0 and CPU physical 0x01000000? (I
> don't have a feel for realism in my virtual environment :)

That would be a good example, yes. The bus address normally starts at
zero so you can have legacy ISA components like the UART you have on
there.

> > > +static int gen_pci_calc_io_offset(struct device *dev,
> > > +				  struct of_pci_range *range,
> > > +				  struct resource *res,
> > > +				  resource_size_t *offset)
> > > +{
> > > +	static atomic_t wins = ATOMIC_INIT(0);
> > > +	int err, idx, max_win;
> > > +	unsigned int io_offset;
> > > +
> > > +	/* Ensure the CPU physical address is aligned to a 64K boundary */
> > > +	range->size += range->cpu_addr & (SZ_64K - 1);
> > > +	range->cpu_addr = round_down(range->cpu_addr, SZ_64K);
> > > +	range->pci_addr = round_down(range->pci_addr, SZ_64K);
> > > +
> > > +	if (range->size > SZ_64K)
> > > +		return -E2BIG;
> > 
> > What is the rounding for? Isn't it enough to round to PAGE_SIZE here?
> > Do you just want to make sure it works with 64K pages on ARM64? I guess
> > if you end up rounding for the mapping, you should later make sure
> > that you still only register the original resource.
> 
> I need the alignment code so I can handle being passed a CPU physical
> address which isn't 64K aligned. In this case, pci_ioremap_io needs a 64K
> aligned address so that the __io macro does the right thing.

I still don't see why pci_ioremap_io needs 64K alignment. As far as I can tell,
it just needs a page-aligned address.

> For example, I can tell kvmtool to place I/O space at CPU physical 0x16000,
> bus address 0x6000 (so ports 0x0-0x5fff are unavailable). In that case, I
> still need to map those lower ports for the __io macro to offset into the
> window correctly.
 
I don't think so. pci_ioremap_io will still try to map 64K of address space,
but you can map 0x16000-0x25fff to PCI_IO_VIRT_BASE. The io_offset in
that case will be negative (-0x6000).
 
> > > +	max_win = IO_SPACE_LIMIT + 1 / SZ_64K;
> > > +	idx = atomic_inc_return(&wins);
> > > +	if (idx >= max_win)
> > > +		return -ENOSPC;
> > > +
> > > +	io_offset = (idx - 1) * SZ_64K;
> > > +	err = pci_ioremap_io(io_offset, range->cpu_addr);
> > 
> > As discussed the last time, calling it "io_offset" here
> > is extremely confusing. Don't do that, and call it "window"
> > or something appropriate instead.
> 
> It's called io_offset because it *is* the thing I'm passing to
> pci_add_resource, as you point out later. I can change the name once we've
> sorted that out (more below).

Ok.

> > > +	if (of_pci_parse_bus_range(np, &pci->cfg.bus_range))
> > > +		pci->cfg.bus_range = (struct resource) {
> > > +			.name	= np->name,
> > > +			.start	= 0,
> > > +			.end	= 0xff,
> > > +			.flags	= IORESOURCE_BUS,
> > > +		};
> > 
> > I wonder if this should just be part of of_pci_parse_bus_range(),
> > or maybe an extended helper.
> 
> Sure, although I want to get to the point where we're happy with this driver
> (from a technical perspective) before I try and split it up into helpers.

Right, makes sense.

> > > +	/* Limit the bus-range to fit within reg */
> > > +	bus_max = pci->cfg.bus_range.start +
> > > +		  (resource_size(&pci->cfg.res) >> pci->cfg.ops->bus_shift) - 1;
> > > +	pci->cfg.bus_range.end = min_t(resource_size_t, pci->cfg.bus_range.end,
> > > +				       bus_max);
> > 
> > I think I would instead print a warning and bail out here. This should
> > only happen for inconsistent DT properties.
> 
> Not sure; that forces somebody to provide a bus-range property if their reg
> property doesn't describe the entire [E]CAM space. I think the truncation is
> useful in that case.

Right, good point.

> > > +		err = request_resource(parent, res);
> > > +		if (err)
> > > +			goto out_release_res;
> > > +
> > > +		pci_add_resource_offset(&sys->resources, res, offset);
> > > +	}
> > 
> > Wrong offset for the I/O space. Why not call pci_add_resource_offset()
> > directly from gen_pci_calc_io_offset where you have the right number?
> 
> You're thinking of passing in io_offset - range->pci_addr, right? 

Right.

> So, along
> with the earlier feedback, we get something along the lines of (I can move the
> resource registration in here, but this is just for discussion atm):
>
> 
> static int gen_pci_calc_io_offset(struct device *dev,
> 				  struct of_pci_range *range,
> 				  struct resource *res,
> 				  resource_size_t *offset)
> {
> 	static atomic_t wins = ATOMIC_INIT(0);
> 	int err, idx, max_win;
> 	unsigned int window;
> 
> 	max_win = (IO_SPACE_LIMIT + 1) / SZ_64K;
> 	idx = atomic_inc_return(&wins);
> 	if (idx >= max_win)
> 		return -ENOSPC;
> 
> 	window = (idx - 1) * SZ_64K;
> 	err = pci_ioremap_io(window, round_down(range->cpu_addr, SZ_64K));
> 	if (err)
> 		return err;
> 
> 	of_pci_range_to_resource(range, dev->of_node, res);
> 	res->start = window + range->pci_addr;
> 	res->end = res->start + range->size - 1;
> 	*offset = window - range->pci_addr;
> 	return 0;
> }

The offset is good now, but the start is not: res->start refers to
the linux I/O space and should just be 'window' here. Adding up
window and pci_addr makes no sense. Think of the case where you
have two host bridges, one using bus address 0x0-0xffff and the other
one using 0x10000-0x1ffff. You really want the first one to
have res->start=0 and the second one res->start=0x10000, both
with offset=0. Instead, the second one gets start=0x20000,
which is wrong in both CPU and bus space.
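
Concretely (assuming the second bridge gets the second 64K window):

	bridge 0: window 0x00000, pci_addr 0x00000 -> window + pci_addr = 0x00000 (ok)
	bridge 1: window 0x10000, pci_addr 0x10000 -> window + pci_addr = 0x20000 (wrong)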

The round_down() also still gets you into trouble here, because you
don't adjust the start or the size.

> In that case, the PCI core (in pci_create_root_bus) does:
> 
> 
> 	/* Add initial resources to the bus */
> 	list_for_each_entry_safe(window, n, resources, list) {
> 		list_move_tail(&window->list, &bridge->windows);
> 		res = window->res;
> 		offset = window->offset;
> 		if (res->flags & IORESOURCE_BUS)
> 			pci_bus_insert_busn_res(b, bus, res->end);
> 		else
> 			pci_bus_add_resource(b, res, 0);
> 		if (offset) {
> 			if (resource_type(res) == IORESOURCE_IO)
> 				fmt = " (bus address [%#06llx-%#06llx])";
> 			else
> 				fmt = " (bus address [%#010llx-%#010llx])";
> 			snprintf(bus_addr, sizeof(bus_addr), fmt,
> 				 (unsigned long long) (res->start - offset),
> 				 (unsigned long long) (res->end - offset));
> 		} else
> 			bus_addr[0] = '\0';
> 		dev_info(&b->dev, "root bus resource %pR%s\n", res, bus_addr);
> 	}
> 
> 
> so here, we print out the `bus address' by computing res->start (window +
> range->pci_addr in my driver) - window->offset. That means that
> window->offset needs to equal window, otherwise the probing code gets in a
> pickle:
> 
>   pci_bus 0000:00: root bus resource [io 0x6000-0xffff] (bus address [0xc000-0x15fff])

This is the result of res->start being wrong.

> and then attempts to re-assign BARs with bogus addresses.
> 
> If I do:
> 
> 	*offset = window;
> 
> then things work as expected (this is all referring back to the kvmtool example
> I mentioned earlier in this mail).

Yes, but only because you have two bugs that cancel each other out
to some degree. It only works for the first bus though, as explained
above: if offset is >0, the Linux I/O port number that gets assigned
in the resource is no longer the one whose virtual address maps to
the right physical address.

> > > +	bus_range = &pci->cfg.bus_range;
> > > +	for (busn = bus_range->start; busn <= bus_range->end; ++busn) {
> > > +		u32 idx = busn - bus_range->start;
> > > +		u32 sz = 1 << pci->cfg.ops->bus_shift;
> > > +
> > > +		pci->cfg.win[idx] = devm_ioremap(dev,
> > > +						 pci->cfg.res.start + busn * sz,
> > > +						 sz);
> > > +		if (!pci->cfg.win[idx]) {
> > > +			err = -ENOMEM;
> > > +			goto out_unmap_cfg;
> > > +		}
> > > +	}
> > 
> > I saw a trick in the tegra driver that maps the buses dynamically during
> > probe. While I mentioned that we can't dynamically map/unmap the config
> > space during access, it's probably fine to just map each bus at the first
> > access, since that will happen during the initial bus probe that is allowed
> > to sleep.
> 
> I don't really see what that gains us if we have the bus-range property
> (which keeps the config accessors pretty clean).

The main difference is that you end up mapping only the buses that are used,
not all the ones that are allowed. It's quite typical to have a domain that
supports all 256 buses, but have all devices on the first bus. I suppose
this is always the case for kvmtool.
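
Something like this is what I had in mind (a completely untested sketch;
I'm guessing at the struct gen_pci layout and the dev pointer, so take
the details with a pinch of salt):

static void __iomem *gen_pci_map_cfg_bus(struct gen_pci *pci,
					 unsigned int busn)
{
	u32 idx = busn - pci->cfg.bus_range.start;
	u32 sz = 1 << pci->cfg.ops->bus_shift;

	/* Map lazily on first use; the initial bus probe may sleep */
	if (!pci->cfg.win[idx])
		pci->cfg.win[idx] = devm_ioremap(pci->dev,
						 pci->cfg.res.start + busn * sz,
						 sz);

	return pci->cfg.win[idx];
}

with the config accessors calling this instead of indexing cfg.win
directly.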

	Arnd

* Re: [PATCH v4] PCI: ARM: add support for generic PCI host controller
  2014-02-25 22:30       ` Arnd Bergmann
@ 2014-02-26 17:46         ` Will Deacon
  -1 siblings, 0 replies; 12+ messages in thread
From: Will Deacon @ 2014-02-26 17:46 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: linux-arm-kernel, bhelgaas, linux-pci, jgunthorpe

Hi Arnd,

On Tue, Feb 25, 2014 at 10:30:33PM +0000, Arnd Bergmann wrote:
> On Tuesday 25 February 2014, Will Deacon wrote:
> > I need the alignment code so I can handle being passed a CPU physical
> > address which isn't 64K aligned. In this case, pci_ioremap_io needs a 64K
> > aligned address so that the __io macro does the right thing.
> 
> I still don't see why pci_ioremap_io needs 64K alignment. For all I can tell,
> it just needs a page-aligned address.

I misread __io as an '|' rather than a '+'. You're right, we just need page
alignment for the mapping.

> > For example, I can tell kvmtool to place I/O space at CPU physical 0x16000,
> > bus address 0x6000 (so ports 0x0-0x5fff are unavailable). In that case, I
> > still need to map those lower ports for the __io macro to offset into the
> > window correctly.
>  
> I don't think so. pci_ioremap_io will still try to map 64K of address space,
> but you can map 0x16000-0x25fff to PCI_IO_VIRT_BASE. The io_offset in
> that case will be negative (-0x6000).

Yes, and that means that all this rounding is unnecessary; I can just fail
addresses that aren't page-aligned (which is much more reasonable).

> > 	of_pci_range_to_resource(range, dev->of_node, res);
> > 	res->start = window + range->pci_addr;
> > 	res->end = res->start + range->size - 1;
> > 	*offset = window - range->pci_addr;
> > 	return 0;
> > }
> 
> The offset is good now, but the start is not: res->start refers to
> the linux I/O space and should just be 'window' here. Adding up
> window and pci_addr makes no sense. Think of the case where you
> have two host bridges, one using bus address 0x0-0xffff and the other
> one using 0x10000-0x1ffff. You really want the first one to
> have res->start=0 and the second one res->start=0x10000, and
> using offset=0. Instead the second one get start=0x20000,
> which is wrong in both CPU and bus space.

Aha, so res->start is the offset into our virtual I/O space at which the
window starts? That's what I've been getting wrong -- I thought res->start
was a PCI bus address for some reason.

> The round_down() also still gets you into trouble here, because you
> don't adjust the start or the size.

I'll just get rid of the rounding.

> > If I do:
> > 
> > 	*offset = window;
> > 
> > then things work as expected (this is all referring back to the kvmtool example
> > I mentioned earlier in this mail).
> 
> Yes, but only only because you have two bugs that cancel each other out
> to some degree. It only works for the first bus though, as explained
> above: if offset is >0, the Linux I/O port number that gets assigned
> in the resource is no longer the one that maps to the one in your
> virtual address space pointing to the right physical address.

Ok, so taking all this on board I end up with the diff below (against v4).
It seems to work well with my kvmtool setup:

  pci_bus 0000:00: root bus resource [io  0x0000-0x9fff] (bus address [0x6000-0xffff])
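
Checking the numbers with the new code: window = 0x0, pci_addr = 0x6000,
size = 0xa000, so:

	res->start = window            = 0x0
	res->end   = 0x0 + 0xa000 - 1  = 0x9fff
	*offset    = window - pci_addr = -0x6000

and the reported bus address is 0x0 - (-0x6000) = 0x6000 through
0x9fff + 0x6000 = 0xffff, i.e. exactly what kvmtool assigned.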

Will

--->8

diff --git a/drivers/pci/host/pci-host-generic.c b/drivers/pci/host/pci-host-generic.c
index 71b0960772ec..b761f3e64a6a 100644
--- a/drivers/pci/host/pci-host-generic.c
+++ b/drivers/pci/host/pci-host-generic.c
@@ -133,30 +133,25 @@ static int gen_pci_calc_io_offset(struct device *dev,
 {
        static atomic_t wins = ATOMIC_INIT(0);
        int err, idx, max_win;
-       unsigned int io_offset;
+       unsigned int window;
 
-       /* Ensure the CPU physical address is aligned to a 64K boundary */
-       range->size += range->cpu_addr & (SZ_64K - 1);
-       range->cpu_addr = round_down(range->cpu_addr, SZ_64K);
-       range->pci_addr = round_down(range->pci_addr, SZ_64K);
-
-       if (range->size > SZ_64K)
-               return -E2BIG;
+       if (!PAGE_ALIGNED(range->cpu_addr))
+               return -EINVAL;
 
-       max_win = IO_SPACE_LIMIT + 1 / SZ_64K;
+       max_win = (IO_SPACE_LIMIT + 1) / SZ_64K;
        idx = atomic_inc_return(&wins);
        if (idx >= max_win)
                return -ENOSPC;
 
-       io_offset = (idx - 1) * SZ_64K;
-       err = pci_ioremap_io(io_offset, range->cpu_addr);
+       window = (idx - 1) * SZ_64K;
+       err = pci_ioremap_io(window, range->cpu_addr);
        if (err)
                return err;
 
        of_pci_range_to_resource(range, dev->of_node, res);
-       res->start = io_offset + range->pci_addr;
+       res->start = window;
        res->end = res->start + range->size - 1;
-       *offset = io_offset;
+       *offset = window - range->pci_addr;
        return 0;
 }

* Re: [PATCH v4] PCI: ARM: add support for generic PCI host controller
  2014-02-26 17:46         ` Will Deacon
@ 2014-02-26 18:01           ` Liviu Dudau
  -1 siblings, 0 replies; 12+ messages in thread
From: Liviu Dudau @ 2014-02-26 18:01 UTC (permalink / raw)
  To: Will Deacon
  Cc: Arnd Bergmann, linux-arm-kernel, bhelgaas, linux-pci, jgunthorpe

On Wed, Feb 26, 2014 at 05:46:59PM +0000, Will Deacon wrote:
> Hi Arnd,
> 
> On Tue, Feb 25, 2014 at 10:30:33PM +0000, Arnd Bergmann wrote:
> > On Tuesday 25 February 2014, Will Deacon wrote:
> > > I need the alignment code so I can handle being passed a CPU physical
> > > address which isn't 64K aligned. In this case, pci_ioremap_io needs a 64K
> > > aligned address so that the __io macro does the right thing.
> > 
> > I still don't see why pci_ioremap_io needs 64K alignment. For all I can tell,
> > it just needs a page-aligned address.
> 
> I misread __io as an '|' rather than a '+'. You're right, we just need page
> alignment for the mapping.
> 
> > > For example, I can tell kvmtool to place I/O space at CPU physical 0x16000,
> > > bus address 0x6000 (so ports 0x0-0x5fff are unavailable). In that case, I
> > > still need to map those lower ports for the __io macro to offset into the
> > > window correctly.
> >  
> > I don't think so. pci_ioremap_io will still try to map 64K of address space,
> > but you can map 0x16000-0x25fff to PCI_IO_VIRT_BASE. The io_offset in
> > that case will be negative (-0x6000).
> 
> Yes, and that means that all this rounding is unnecessary; I can just fail
> addresses that aren't page-aligned (which is much more reasonable).
> 
> > > 	of_pci_range_to_resource(range, dev->of_node, res);
> > > 	res->start = window + range->pci_addr;
> > > 	res->end = res->start + range->size - 1;
> > > 	*offset = window - range->pci_addr;
> > > 	return 0;
> > > }
> > 
> > The offset is good now, but the start is not: res->start refers to
> > the linux I/O space and should just be 'window' here. Adding up
> > window and pci_addr makes no sense. Think of the case where you
> > have two host bridges, one using bus address 0x0-0xffff and the other
> > one using 0x10000-0x1ffff. You really want the first one to
> > have res->start=0 and the second one res->start=0x10000, and
> > using offset=0. Instead the second one get start=0x20000,
> > which is wrong in both CPU and bus space.
> 
> Aha, so res->start is the offset into our virtual I/O space at which the
> window starts? That's what I've been getting wrong -- I thought res->start
> was a PCI bus address for some reason.

No, res->start should be the start of the logical I/O space. I will send the
patch that fixes the of_pci_range_to_resource() call today, and then you should
no longer have to worry about offsets (other than keeping a logical io_offset
in the bridge to account for multiple host bridges being active).

> 
> > The round_down() also still gets you into trouble here, because you
> > don't adjust the start or the size.
> 
> I'll just get rid of the rounding.
> 
> > > If I do:
> > > 
> > > 	*offset = window;
> > > 
> > > then things work as expected (this is all referring back to the kvmtool example
> > > I mentioned earlier in this mail).
> > 
> > Yes, but only only because you have two bugs that cancel each other out
> > to some degree. It only works for the first bus though, as explained
> > above: if offset is >0, the Linux I/O port number that gets assigned
> > in the resource is no longer the one that maps to the one in your
> > virtual address space pointing to the right physical address.
> 
> Ok, so taking all this on board I end up with the diff below (against v4).
> It seems to work well with my kvmtool setup:
> 
>   pci_bus 0000:00: root bus resource [io  0x0000-0x9fff] (bus address [0x6000-0xffff])
> 
> Will
> 
> --->8
> 
> diff --git a/drivers/pci/host/pci-host-generic.c b/drivers/pci/host/pci-host-generic.c
> index 71b0960772ec..b761f3e64a6a 100644
> --- a/drivers/pci/host/pci-host-generic.c
> +++ b/drivers/pci/host/pci-host-generic.c
> @@ -133,30 +133,25 @@ static int gen_pci_calc_io_offset(struct device *dev,
>  {
>         static atomic_t wins = ATOMIC_INIT(0);
>         int err, idx, max_win;
> -       unsigned int io_offset;
> +       unsigned int window;
>  
> -       /* Ensure the CPU physical address is aligned to a 64K boundary */
> -       range->size += range->cpu_addr & (SZ_64K - 1);
> -       range->cpu_addr = round_down(range->cpu_addr, SZ_64K);
> -       range->pci_addr = round_down(range->pci_addr, SZ_64K);
> -
> -       if (range->size > SZ_64K)
> -               return -E2BIG;
> +       if (!PAGE_ALIGNED(range->cpu_addr))
> +               return -EINVAL;
>  
> -       max_win = IO_SPACE_LIMIT + 1 / SZ_64K;
> +       max_win = (IO_SPACE_LIMIT + 1) / SZ_64K;
>         idx = atomic_inc_return(&wins);
>         if (idx >= max_win)
>                 return -ENOSPC;
>  
> -       io_offset = (idx - 1) * SZ_64K;
> -       err = pci_ioremap_io(io_offset, range->cpu_addr);
> +       window = (idx - 1) * SZ_64K;
> +       err = pci_ioremap_io(window, range->cpu_addr);
>         if (err)
>                 return err;
>  
>         of_pci_range_to_resource(range, dev->of_node, res);
> -       res->start = io_offset + range->pci_addr;
> +       res->start = window;
>         res->end = res->start + range->size - 1;
> -       *offset = io_offset;
> +       *offset = window - range->pci_addr;
>         return 0;
>  }

-- 
-------------------
   .oooO
   (   )
    \ (  Oooo.
     \_) (   )
          ) /
         (_/

 One small step
   for me ...

