* [RFC PATCH V3 00/17] Enable SRIOV on POWER8
@ 2014-06-10  1:56 ` Wei Yang
  0 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-10  1:56 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu, Wei Yang

This patch set enables SR-IOV on POWER8. This is not the final version; some
patches rely on patches that have not been merged yet.

The general idea is to put each VF in its own PE and to allocate the necessary
resources, such as the DMA/IOMMU table.

One special thing about a VF PE is that we use the M64BT to cover the IOV BAR.
This means we need to adjust the PCI device's resources:
1. Expand the IOV BAR properly.
2. Shift the IOV BAR properly.
3. Use the total size, rather than an individual VF BAR size, as the IOV BAR
   alignment (a worked example follows below).
4. Take this additional IOV BAR alignment into consideration when sizing and
   assigning.
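To make point 3 concrete, here is an illustrative calculation; the per-VF BAR
size and the total_pe value are made-up assumptions, not figures taken from the
tested adapters:

	per-VF BAR size             : 1 MiB
	total_pe on the PHB         : 256
	expanded IOV BAR size       : 1 MiB * 256 = 256 MiB
	IOV BAR alignment requested : 256 MiB (the total size, not the
	                              1 MiB per-VF size)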

Test Environment:
       The SR-IOV devices tested are an Emulex Lancer and a Mellanox ConnectX-3.

Example of passing a VF through to a guest with VFIO:
	1. install necessary modules
	   modprobe vfio
	   modprobe vfio-pci
	2. retrieve the iommu_group the device belongs to
	   readlink /sys/bus/pci/devices/0000:06:0d.0/iommu_group
	   ../../../../kernel/iommu_groups/26
	   This means it belongs to group 26
	3. list the devices in this iommu_group
	   ls /sys/kernel/iommu_groups/26/devices/
	4. unbind the original driver and bind to vfio-pci driver
	   echo 0000:06:0d.0 > /sys/bus/pci/devices/0000:06:0d.0/driver/unbind
	   echo  1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id
	   Note: this should be done for each device in the same iommu_group
	5. Start qemu and pass device through vfio
	   /home/ywywyang/git/qemu-impreza/ppc64-softmmu/qemu-system-ppc64 \
		   -M pseries -m 2048 -enable-kvm -nographic \
		   -drive file=/home/ywywyang/kvm/fc19.img \
		   -monitor telnet:localhost:5435,server,nowait -boot cd \
		   -device "spapr-pci-vfio-host-bridge,id=CXGB3,iommu=26,index=6"

Verify that the responding device is indeed the VF:
	1. ping the VF from a machine in the same subnet (the broadcast domain)
	2. run arp -n on that machine
	   9.115.251.20             ether   00:00:c9:df:ed:bf   C eth0
	3. ifconfig in the guest
	   # ifconfig eth1
	   eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
	        inet 9.115.251.20  netmask 255.255.255.0  broadcast 9.115.251.255
		inet6 fe80::200:c9ff:fedf:edbf  prefixlen 64  scopeid 0x20<link>
	        ether 00:00:c9:df:ed:bf  txqueuelen 1000 (Ethernet)
	        RX packets 175  bytes 13278 (12.9 KiB)
	        RX errors 0  dropped 0  overruns 0  frame 0
		TX packets 58  bytes 9276 (9.0 KiB)
	        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
	4. Confirm that the MAC address reported by arp matches the one in the guest

	Note: make sure you shut down the other network interfaces in the guest.

---
v2 -> v3:
   1. change the return type of virtfn_bus/virtfn_devfn to int and rename
      these two functions to pci_iov_virtfn_bus/pci_iov_virtfn_devfn
   2. drop the second parameter of pcibios_sriov_disable()
   3. use data instead of pe in "ppc/pnv: allocate pe->iommu_table dynamically"
   4. rename __pci_sriov_resource_size to pcibios_sriov_resource_size
   5. rename __pci_sriov_resource_alignment to pcibios_sriov_resource_alignment
v1 -> v2:
   1. change the return value of virtfn_bus/virtfn_devfn to 0
   2. move some TCE related macro definitions to
      arch/powerpc/platforms/powernv/pci.h
   3. fix __pci_sriov_resource_alignment on the powernv platform
      During the sizing stage, the IOV BAR is truncated to 0, which affects
      the order of allocation. Fix this so that BARs are still allocated in
      order of their alignment.
v0 -> v1:
   1. Improve the change log for
      "PCI: Add weak __pci_sriov_resource_size() interface"
      "PCI: Add weak __pci_sriov_resource_alignment() interface"
      "PCI: take additional IOV BAR alignment in sizing and assigning"
   2. Wrap VF PE code in CONFIG_PCI_IOV
   3. Did regression test on P7.

Wei Yang (17):
  pci/iov: Export interfaces for retrieving a VF's BDF
  pci/of: Match PCI VFs to dev-tree nodes dynamically
  ppc/pci: don't unset pci resources for VFs
  PCI: SRIOV: add VF enable/disable hook
  ppc/pnv: use macros to define the TCE size
  ppc/pnv: allocate pe->iommu_table dynamically
  ppc/pnv: Add function to deconfig a PE
  PCI: Add weak pcibios_sriov_resource_size() interface
  PCI: Add weak pcibios_sriov_resource_alignment() interface
  PCI: take additional IOV BAR alignment in sizing and assigning
  ppc/pnv: Expand VF resources according to the number of total_pe
  powerpc/powernv: implement pcibios_sriov_resource_alignment on
    powernv
  powerpc/powernv: shift VF resource with an offset
  ppc/pci: create/release dev-tree node for VFs
  powerpc/powernv: allocate VF PE
  ppc/pci: Expanding IOV BAR, with m64_per_iov supported
  ppc/pnv: Group VF PE when IOV BAR is big on PHB3

 arch/powerpc/include/asm/iommu.h          |    3 +
 arch/powerpc/include/asm/machdep.h        |    7 +
 arch/powerpc/include/asm/pci-bridge.h     |    7 +
 arch/powerpc/include/asm/tce.h            |    3 +-
 arch/powerpc/kernel/pci-common.c          |   29 +
 arch/powerpc/platforms/powernv/Kconfig    |    1 +
 arch/powerpc/platforms/powernv/pci-ioda.c |  824 +++++++++++++++++++++++++++--
 arch/powerpc/platforms/powernv/pci.c      |   22 +-
 arch/powerpc/platforms/powernv/pci.h      |   17 +-
 drivers/pci/iov.c                         |   84 ++-
 drivers/pci/pci.h                         |   21 -
 drivers/pci/setup-bus.c                   |   66 ++-
 include/linux/pci.h                       |   46 ++
 13 files changed, 1041 insertions(+), 89 deletions(-)

-- 
1.7.9.5


* [RFC PATCH V3 01/17] pci/iov: Export interfaces for retrieving a VF's BDF
  2014-06-10  1:56 ` Wei Yang
@ 2014-06-10  1:56   ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-10  1:56 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu, Wei Yang

When implementing SR-IOV on the PowerNV platform, some resource reservation is
needed for VFs, which do not exist at boot time. To match resources with VFs,
the code needs to get a VF's BDF in advance.

This patch exports the interfaces used to retrieve a VF's BDF, as sketched
below:
   * make virtfn_bus an exported interface
   * make virtfn_devfn an exported interface
   * rename them with more specific names
   * clean up code in pci_sriov_resource_alignment()
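For illustration, platform code could walk the BDFs of the VFs that a PF would
create as shown below; the wrapper function is hypothetical, and only
pci_iov_virtfn_bus()/pci_iov_virtfn_devfn() (added here) and
pci_sriov_get_totalvfs() (existing PCI core) are real interfaces:

	/* Hypothetical sketch: compute each VF's BDF before the VFs exist,
	 * e.g. to reserve per-VF resources.  Assumes <linux/pci.h>.
	 */
	static void example_walk_vf_bdfs(struct pci_dev *pf)
	{
		int i, total = pci_sriov_get_totalvfs(pf);

		for (i = 0; i < total; i++) {
			int busnr = pci_iov_virtfn_bus(pf, i);   /* -1 if pf is not a PF */
			int devfn = pci_iov_virtfn_devfn(pf, i); /* -1 if pf is not a PF */

			if (busnr < 0 || devfn < 0)
				break;

			dev_info(&pf->dev, "VF%d -> %02x:%02x.%d\n",
				 i, busnr, PCI_SLOT(devfn), PCI_FUNC(devfn));
		}
	}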

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
---
 drivers/pci/iov.c   |   26 +++++++-------------------
 drivers/pci/pci.h   |   21 ---------------------
 include/linux/pci.h |   43 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 50 insertions(+), 40 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 9dce7c5..589ef7d 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -19,18 +19,6 @@
 
 #define VIRTFN_ID_LEN	16
 
-static inline u8 virtfn_bus(struct pci_dev *dev, int id)
-{
-	return dev->bus->number + ((dev->devfn + dev->sriov->offset +
-				    dev->sriov->stride * id) >> 8);
-}
-
-static inline u8 virtfn_devfn(struct pci_dev *dev, int id)
-{
-	return (dev->devfn + dev->sriov->offset +
-		dev->sriov->stride * id) & 0xff;
-}
-
 static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
 {
 	struct pci_bus *child;
@@ -69,7 +57,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
 	struct pci_bus *bus;
 
 	mutex_lock(&iov->dev->sriov->lock);
-	bus = virtfn_add_bus(dev->bus, virtfn_bus(dev, id));
+	bus = virtfn_add_bus(dev->bus, pci_iov_virtfn_bus(dev, id));
 	if (!bus)
 		goto failed;
 
@@ -77,7 +65,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
 	if (!virtfn)
 		goto failed0;
 
-	virtfn->devfn = virtfn_devfn(dev, id);
+	virtfn->devfn = pci_iov_virtfn_devfn(dev, id);
 	virtfn->vendor = dev->vendor;
 	pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_DID, &virtfn->device);
 	pci_setup_device(virtfn);
@@ -140,8 +128,8 @@ static void virtfn_remove(struct pci_dev *dev, int id, int reset)
 	struct pci_sriov *iov = dev->sriov;
 
 	virtfn = pci_get_domain_bus_and_slot(pci_domain_nr(dev->bus),
-					     virtfn_bus(dev, id),
-					     virtfn_devfn(dev, id));
+					     pci_iov_virtfn_bus(dev, id),
+					     pci_iov_virtfn_devfn(dev, id));
 	if (!virtfn)
 		return;
 
@@ -307,7 +295,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	iov->offset = offset;
 	iov->stride = stride;
 
-	if (virtfn_bus(dev, nr_virtfn - 1) > dev->bus->busn_res.end) {
+	if (pci_iov_virtfn_bus(dev, nr_virtfn - 1) > dev->bus->busn_res.end) {
 		dev_err(&dev->dev, "SR-IOV: bus number out of range\n");
 		return -ENOMEM;
 	}
@@ -616,7 +604,7 @@ resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno)
 	if (!reg)
 		return 0;
 
-	 __pci_read_base(dev, type, &tmp, reg);
+	__pci_read_base(dev, type, &tmp, reg);
 	return resource_alignment(&tmp);
 }
 
@@ -646,7 +634,7 @@ int pci_iov_bus_range(struct pci_bus *bus)
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		if (!dev->is_physfn)
 			continue;
-		busnr = virtfn_bus(dev, dev->sriov->total_VFs - 1);
+		busnr = pci_iov_virtfn_bus(dev, dev->sriov->total_VFs - 1);
 		if (busnr > max)
 			max = busnr;
 	}
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 4df38df..51f1f7c 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -223,27 +223,6 @@ static inline int pci_ari_enabled(struct pci_bus *bus)
 void pci_reassigndev_resource_alignment(struct pci_dev *dev);
 void pci_disable_bridge_window(struct pci_dev *dev);
 
-/* Single Root I/O Virtualization */
-struct pci_sriov {
-	int pos;		/* capability position */
-	int nres;		/* number of resources */
-	u32 cap;		/* SR-IOV Capabilities */
-	u16 ctrl;		/* SR-IOV Control */
-	u16 total_VFs;		/* total VFs associated with the PF */
-	u16 initial_VFs;	/* initial VFs associated with the PF */
-	u16 num_VFs;		/* number of VFs available */
-	u16 offset;		/* first VF Routing ID offset */
-	u16 stride;		/* following VF stride */
-	u32 pgsz;		/* page size for BAR alignment */
-	u8 link;		/* Function Dependency Link */
-	u16 driver_max_VFs;	/* max num VFs driver supports */
-	struct pci_dev *dev;	/* lowest numbered PF */
-	struct pci_dev *self;	/* this PF */
-	struct mutex lock;	/* lock for VF bus */
-	struct work_struct mtask; /* VF Migration task */
-	u8 __iomem *mstate;	/* VF Migration State Array */
-};
-
 #ifdef CONFIG_PCI_ATS
 void pci_restore_ats_state(struct pci_dev *dev);
 #else
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 33aa2ca..ddb1ca0 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -240,6 +240,27 @@ struct pci_vpd;
 struct pci_sriov;
 struct pci_ats;
 
+/* Single Root I/O Virtualization */
+struct pci_sriov {
+	int pos;		/* capability position */
+	int nres;		/* number of resources */
+	u32 cap;		/* SR-IOV Capabilities */
+	u16 ctrl;		/* SR-IOV Control */
+	u16 total_VFs;		/* total VFs associated with the PF */
+	u16 initial_VFs;	/* initial VFs associated with the PF */
+	u16 num_VFs;		/* number of VFs available */
+	u16 offset;		/* first VF Routing ID offset */
+	u16 stride;		/* following VF stride */
+	u32 pgsz;		/* page size for BAR alignment */
+	u8 link;		/* Function Dependency Link */
+	u16 driver_max_VFs;	/* max num VFs driver supports */
+	struct pci_dev *dev;	/* lowest numbered PF */
+	struct pci_dev *self;	/* this PF */
+	struct mutex lock;	/* lock for VF bus */
+	struct work_struct mtask; /* VF Migration task */
+	u8 __iomem *mstate;	/* VF Migration State Array */
+};
+
 /*
  * The pci_dev structure is used to describe PCI devices.
  */
@@ -1595,6 +1616,20 @@ int pci_ext_cfg_avail(void);
 void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar);
 
 #ifdef CONFIG_PCI_IOV
+static inline int pci_iov_virtfn_bus(struct pci_dev *dev, int id)
+{
+	if (!dev->is_physfn)
+		return -1;
+	return dev->bus->number + ((dev->devfn + dev->sriov->offset +
+				    dev->sriov->stride * id) >> 8);
+}
+static inline int pci_iov_virtfn_devfn(struct pci_dev *dev, int id)
+{
+	if (!dev->is_physfn)
+		return -1;
+	return (dev->devfn + dev->sriov->offset +
+		dev->sriov->stride * id) & 0xff;
+}
 int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
 void pci_disable_sriov(struct pci_dev *dev);
 irqreturn_t pci_sriov_migration(struct pci_dev *dev);
@@ -1603,6 +1638,14 @@ int pci_vfs_assigned(struct pci_dev *dev);
 int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
 int pci_sriov_get_totalvfs(struct pci_dev *dev);
 #else
+static inline int pci_iov_virtfn_bus(struct pci_dev *dev, int id)
+{
+	return -1;
+}
+static inline int pci_iov_virtfn_devfn(struct pci_dev *dev, int id)
+{
+	return -1;
+}
 static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn)
 { return -ENODEV; }
 static inline void pci_disable_sriov(struct pci_dev *dev) { }
-- 
1.7.9.5


* [RFC PATCH V3 02/17] pci/of: Match PCI VFs to dev-tree nodes dynamically
  2014-06-10  1:56 ` Wei Yang
@ 2014-06-10  1:56   ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-10  1:56 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu, Wei Yang

As introduced by commit 98d9f30c82 ("pci/of: Match PCI devices to dev-tree
nodes dynamically"), we need to match PCI devices to their corresponding
dev-tree nodes. For VFs, however, this step was missed.

This patch matches VFs' PCI devices to dev-tree nodes dynamically.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
---
 drivers/pci/iov.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 589ef7d..1d21f43 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -67,6 +67,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
 
 	virtfn->devfn = pci_iov_virtfn_devfn(dev, id);
 	virtfn->vendor = dev->vendor;
+	pci_set_of_node(virtfn);
 	pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_DID, &virtfn->device);
 	pci_setup_device(virtfn);
 	virtfn->dev.parent = dev->dev.parent;
-- 
1.7.9.5


* [RFC PATCH V3 03/17] ppc/pci: don't unset pci resources for VFs
  2014-06-10  1:56 ` Wei Yang
@ 2014-06-10  1:56   ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-10  1:56 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu, Wei Yang

When PCI_REASSIGN_ALL_RSRC is set, every resource of a pci_dev is unset,
which means the PCI core will reassign those resources.

However, this would also wipe out the resource values of VFs, which are
calculated in virtfn_add() (see the simplified sketch below).

This patch adds a check: if the pci_dev is a VF, skip the resource unset
step.
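For context, here is a simplified, paraphrased sketch of the kind of
calculation virtfn_add() performs when it sets up a VF's BARs; it is not the
literal kernel code, and 'pf', 'virtfn' and 'vf_id' are illustrative names:

	/* Paraphrased sketch: each VF BAR is one fixed-size slice of the PF's
	 * corresponding IOV BAR, so clearing these values later would lose
	 * information that cannot be recomputed by the generic resource code.
	 */
	static void example_vf_bar_slice(struct pci_dev *pf,
					 struct pci_dev *virtfn, int vf_id)
	{
		int i;

		for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
			struct resource *res = &pf->resource[PCI_IOV_RESOURCES + i];
			resource_size_t vf_size;

			if (!res->parent)
				continue;

			/* one slice per possible VF */
			vf_size = resource_size(res) / pci_sriov_get_totalvfs(pf);

			virtfn->resource[i].flags = res->flags;
			virtfn->resource[i].start = res->start + vf_size * vf_id;
			virtfn->resource[i].end   = virtfn->resource[i].start +
						    vf_size - 1;
		}
	}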

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/pci-common.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index d9476c1..c449a26 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -824,6 +824,12 @@ static void pcibios_fixup_resources(struct pci_dev *dev)
 		       pci_name(dev));
 		return;
 	}
+
+#ifdef CONFIG_PCI_IOV
+	if (dev->is_virtfn)
+		return;
+#endif
+
 	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
 		struct resource *res = dev->resource + i;
 		struct pci_bus_region reg;
-- 
1.7.9.5


* [RFC PATCH V3 04/17] PCI: SRIOV: add VF enable/disable hook
  2014-06-10  1:56 ` Wei Yang
@ 2014-06-10  1:56   ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-10  1:56 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu, Wei Yang

VFs are dynamically created/released when the driver enables/disables them.
On some platforms, such as PowerNV, extra platform resources are necessary
before VFs can be enabled.

This patch adds two weak hooks, pcibios_sriov_enable() and
pcibios_sriov_disable(), that platforms can override around VF creation and
removal (a sketch of an override follows).
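As an illustration only (the bodies are placeholders, not the PowerNV
implementation that later patches in this series provide), a platform override
of the new weak hooks could look like this:

	/* Illustrative override of the weak hooks added by this patch. */
	int pcibios_sriov_enable(struct pci_dev *pdev, u16 vf_num)
	{
		/* e.g. reserve platform resources (PEs, M64 windows) for vf_num VFs */
		dev_dbg(&pdev->dev, "preparing platform state for %d VFs\n", vf_num);
		return 0;	/* a non-zero return aborts sriov_enable() */
	}

	int pcibios_sriov_disable(struct pci_dev *pdev)
	{
		/* e.g. release the platform resources reserved above */
		return 0;
	}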

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
---
 drivers/pci/iov.c |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 1d21f43..cc87773 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -250,6 +250,11 @@ static void sriov_disable_migration(struct pci_dev *dev)
 	iounmap(iov->mstate);
 }
 
+int __weak pcibios_sriov_enable(struct pci_dev *pdev, u16 vf_num)
+{
+       return 0;
+}
+
 static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 {
 	int rc;
@@ -260,6 +265,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	struct pci_dev *pdev;
 	struct pci_sriov *iov = dev->sriov;
 	int bars = 0;
+	int retval;
 
 	if (!nr_virtfn)
 		return 0;
@@ -334,6 +340,12 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	if (nr_virtfn < initial)
 		initial = nr_virtfn;
 
+	if ((retval = pcibios_sriov_enable(dev, initial))) {
+		dev_err(&dev->dev, "Failure %d from pcibios_sriov_enable()\n",
+				retval);
+		return retval;
+	}
+
 	for (i = 0; i < initial; i++) {
 		rc = virtfn_add(dev, i, 0);
 		if (rc)
@@ -368,6 +380,11 @@ failed:
 	return rc;
 }
 
+int __weak pcibios_sriov_disable(struct pci_dev *pdev)
+{
+       return 0;
+}
+
 static void sriov_disable(struct pci_dev *dev)
 {
 	int i;
@@ -382,6 +399,8 @@ static void sriov_disable(struct pci_dev *dev)
 	for (i = 0; i < iov->num_VFs; i++)
 		virtfn_remove(dev, i, 0);
 
+	pcibios_sriov_disable(dev);
+
 	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
 	pci_cfg_access_lock(dev);
 	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
-- 
1.7.9.5


* [RFC PATCH V3 05/17] ppc/pnv: use macros to define the TCE size
  2014-06-10  1:56 ` Wei Yang
@ 2014-06-10  1:56   ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-10  1:56 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu, Wei Yang

During TVT/TCE initialization, bare numbers are used to specify the TCE I/O
page size, the TCE table size, the TCE entry size, and so on.

This patch replaces those magic numbers with macros, which are more meaningful
and easier to read; the resulting values are worked out below.
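For reference, the values that the new macros evaluate to, derived directly
from the definitions in this patch:

	TCE_PAGE_SIZE      = 1 << 12                 = 4 KiB per TCE I/O page
	TCE_ENTRY_SIZE     = 1 << 3                  = 8 bytes per TCE
	PNV_TCE32_SEG_SIZE = 1 << 28                 = 256 MiB per 32-bit DMA segment
	PNV_TCE32_TAB_SIZE = (256 MiB / 4 KiB) * 8 B = 512 KiB of TCEs per segment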

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/tce.h            |    3 ++-
 arch/powerpc/platforms/powernv/pci-ioda.c |   25 +++++++++++--------------
 arch/powerpc/platforms/powernv/pci.c      |    2 +-
 arch/powerpc/platforms/powernv/pci.h      |    5 +++++
 4 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/tce.h b/arch/powerpc/include/asm/tce.h
index 743f36b..28a1d06 100644
--- a/arch/powerpc/include/asm/tce.h
+++ b/arch/powerpc/include/asm/tce.h
@@ -40,7 +40,8 @@
 #define TCE_SHIFT	12
 #define TCE_PAGE_SIZE	(1 << TCE_SHIFT)
 
-#define TCE_ENTRY_SIZE		8		/* each TCE is 64 bits */
+#define TCE_ENTRY_SHIFT		3
+#define TCE_ENTRY_SIZE		(1 << TCE_ENTRY_SHIFT)	/* each TCE is 64 bits */
 
 #define TCE_RPN_MASK		0xfffffffffful  /* 40-bit RPN (4K pages) */
 #define TCE_RPN_SHIFT		12
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 8ae09cf..9715351 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -820,9 +820,6 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 	int64_t rc;
 	void *addr;
 
-	/* 256M DMA window, 4K TCE pages, 8 bytes TCE */
-#define TCE32_TABLE_SIZE	((0x10000000 / 0x1000) * 8)
-
 	/* XXX FIXME: Handle 64-bit only DMA devices */
 	/* XXX FIXME: Provide 64-bit DMA facilities & non-4K TCE tables etc.. */
 	/* XXX FIXME: Allocate multi-level tables on PHB3 */
@@ -834,7 +831,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 	/* Grab a 32-bit TCE table */
 	pe->tce32_seg = base;
 	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
-		(base << 28), ((base + segs) << 28) - 1);
+		(base << PNV_TCE32_SEG_SHIFT), ((base + segs) << PNV_TCE32_SEG_SHIFT) - 1);
 
 	/* XXX Currently, we allocate one big contiguous table for the
 	 * TCEs. We only really need one chunk per 256M of TCE space
@@ -842,21 +839,21 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 	 * requires some added smarts with our get/put_tce implementation
 	 */
 	tce_mem = alloc_pages_node(phb->hose->node, GFP_KERNEL,
-				   get_order(TCE32_TABLE_SIZE * segs));
+				   get_order(PNV_TCE32_TAB_SIZE * segs));
 	if (!tce_mem) {
 		pe_err(pe, " Failed to allocate a 32-bit TCE memory\n");
 		goto fail;
 	}
 	addr = page_address(tce_mem);
-	memset(addr, 0, TCE32_TABLE_SIZE * segs);
+	memset(addr, 0, PNV_TCE32_TAB_SIZE * segs);
 
 	/* Configure HW */
 	for (i = 0; i < segs; i++) {
 		rc = opal_pci_map_pe_dma_window(phb->opal_id,
 					      pe->pe_number,
 					      base + i, 1,
-					      __pa(addr) + TCE32_TABLE_SIZE * i,
-					      TCE32_TABLE_SIZE, 0x1000);
+					      __pa(addr) + PNV_TCE32_TAB_SIZE * i,
+					      PNV_TCE32_TAB_SIZE, TCE_PAGE_SIZE);
 		if (rc) {
 			pe_err(pe, " Failed to configure 32-bit TCE table,"
 			       " err %ld\n", rc);
@@ -866,8 +863,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 
 	/* Setup linux iommu table */
 	tbl = &pe->tce32_table;
-	pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
-				  base << 28);
+	pnv_pci_setup_iommu_table(tbl, addr, PNV_TCE32_TAB_SIZE * segs,
+				  base << PNV_TCE32_SEG_SHIFT);
 
 	/* OPAL variant of P7IOC SW invalidated TCEs */
 	swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
@@ -898,7 +895,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 	if (pe->tce32_seg >= 0)
 		pe->tce32_seg = -1;
 	if (tce_mem)
-		__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
+		__free_pages(tce_mem, get_order(PNV_TCE32_TAB_SIZE * segs));
 }
 
 static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
@@ -968,7 +965,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 	/* The PE will reserve all possible 32-bits space */
 	pe->tce32_seg = 0;
 	end = (1 << ilog2(phb->ioda.m32_pci_base));
-	tce_table_size = (end / 0x1000) * 8;
+	tce_table_size = (end / TCE_PAGE_SIZE) * TCE_ENTRY_SIZE;
 	pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
 		end);
 
@@ -988,7 +985,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 	 */
 	rc = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
 					pe->pe_number << 1, 1, __pa(addr),
-					tce_table_size, 0x1000);
+					tce_table_size, TCE_PAGE_SIZE);
 	if (rc) {
 		pe_err(pe, "Failed to configure 32-bit TCE table,"
 		       " err %ld\n", rc);
@@ -1573,7 +1570,7 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	INIT_LIST_HEAD(&phb->ioda.pe_list);
 
 	/* Calculate how many 32-bit TCE segments we have */
-	phb->ioda.tce32_count = phb->ioda.m32_pci_base >> 28;
+	phb->ioda.tce32_count = phb->ioda.m32_pci_base >> PNV_TCE32_SEG_SHIFT;
 
 #if 0 /* We should really do that ... */
 	rc = opal_pci_set_phb_mem_window(opal->phb_id,
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 8518817..687a068 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -597,7 +597,7 @@ void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
 	tbl->it_page_shift = IOMMU_PAGE_SHIFT_4K;
 	tbl->it_offset = dma_offset >> tbl->it_page_shift;
 	tbl->it_index = 0;
-	tbl->it_size = tce_size >> 3;
+	tbl->it_size = tce_size >> TCE_ENTRY_SHIFT;
 	tbl->it_busno = 0;
 	tbl->it_type = TCE_PCI;
 }
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 3e5f5a1..90f6da4 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -227,4 +227,9 @@ extern void pnv_pci_init_ioda2_phb(struct device_node *np);
 extern void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
 					__be64 *startp, __be64 *endp, bool rm);
 
+#define PNV_TCE32_SEG_SHIFT     28
+#define PNV_TCE32_SEG_SIZE      (1UL << PNV_TCE32_SEG_SHIFT)
+/* 256M DMA window, 4K TCE pages, 8 bytes TCE */
+#define PNV_TCE32_TAB_SIZE	((PNV_TCE32_SEG_SIZE / TCE_PAGE_SIZE) * TCE_ENTRY_SIZE)
+
 #endif /* __POWERNV_PCI_H */
-- 
1.7.9.5


* [RFC PATCH V3 06/17] ppc/pnv: allocate pe->iommu_table dynamically
  2014-06-10  1:56 ` Wei Yang
@ 2014-06-10  1:56   ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-10  1:56 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu, Wei Yang

Currently the iommu_table of a PE is an embedded (static) field of the PE
structure. This is a problem when iommu_free_table() is called on it.

This patch allocates the iommu_table dynamically and adds a back-pointer
(iommu_table::data) from the table to its owning PE, as sketched below.
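A minimal sketch of the lookup change; the identifiers come from this patch,
but the two fragments are condensed for illustration rather than quoted
verbatim:

	/* Before this patch: the table is embedded in the PE, so the PE is
	 * recovered with container_of().
	 */
	struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
					      tce32_table);

	/* After this patch: the table is allocated with kzalloc_node() and
	 * carries an explicit back-pointer, so the lookup becomes:
	 */
	struct pnv_ioda_pe *pe = tbl->data;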

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/iommu.h          |    3 +++
 arch/powerpc/platforms/powernv/pci-ioda.c |   24 +++++++++++++-----------
 arch/powerpc/platforms/powernv/pci.h      |    2 +-
 3 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 42632c7..0fedacb 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -78,6 +78,9 @@ struct iommu_table {
 	struct iommu_group *it_group;
 #endif
 	void (*set_bypass)(struct iommu_table *tbl, bool enable);
+#ifdef CONFIG_PPC_POWERNV
+	void           *data;
+#endif
 };
 
 /* Pure 2^n version of get_order */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 9715351..8ca3926 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -608,6 +608,10 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
 		return;
 	}
 
+	pe->tce32_table = kzalloc_node(sizeof(struct iommu_table),
+			GFP_KERNEL, hose->node);
+	pe->tce32_table->data = pe;
+
 	/* Associate it with all child devices */
 	pnv_ioda_setup_same_PE(bus, pe);
 
@@ -675,7 +679,7 @@ static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb *phb, struct pci_dev *pdev
 
 	pe = &phb->ioda.pe_array[pdn->pe_number];
 	WARN_ON(get_dma_ops(&pdev->dev) != &dma_iommu_ops);
-	set_iommu_table_base_and_group(&pdev->dev, &pe->tce32_table);
+	set_iommu_table_base_and_group(&pdev->dev, pe->tce32_table);
 }
 
 static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
@@ -702,7 +706,7 @@ static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
 	} else {
 		dev_info(&pdev->dev, "Using 32-bit DMA via iommu\n");
 		set_dma_ops(&pdev->dev, &dma_iommu_ops);
-		set_iommu_table_base(&pdev->dev, &pe->tce32_table);
+		set_iommu_table_base(&pdev->dev, pe->tce32_table);
 	}
 	return 0;
 }
@@ -712,7 +716,7 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, struct pci_bus *bus)
 	struct pci_dev *dev;
 
 	list_for_each_entry(dev, &bus->devices, bus_list) {
-		set_iommu_table_base_and_group(&dev->dev, &pe->tce32_table);
+		set_iommu_table_base_and_group(&dev->dev, pe->tce32_table);
 		if (dev->subordinate)
 			pnv_ioda_setup_bus_dma(pe, dev->subordinate);
 	}
@@ -798,8 +802,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
 void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
 				 __be64 *startp, __be64 *endp, bool rm)
 {
-	struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
-					      tce32_table);
+	struct pnv_ioda_pe *pe = tbl->data;
 	struct pnv_phb *phb = pe->phb;
 
 	if (phb->type == PNV_PHB_IODA1)
@@ -862,7 +865,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 	}
 
 	/* Setup linux iommu table */
-	tbl = &pe->tce32_table;
+	tbl = pe->tce32_table;
 	pnv_pci_setup_iommu_table(tbl, addr, PNV_TCE32_TAB_SIZE * segs,
 				  base << PNV_TCE32_SEG_SHIFT);
 
@@ -900,8 +903,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 
 static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
 {
-	struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
-					      tce32_table);
+	struct pnv_ioda_pe *pe = tbl->data;
 	uint16_t window_id = (pe->pe_number << 1 ) + 1;
 	int64_t rc;
 
@@ -942,10 +944,10 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
 	pe->tce_bypass_base = 1ull << 59;
 
 	/* Install set_bypass callback for VFIO */
-	pe->tce32_table.set_bypass = pnv_pci_ioda2_set_bypass;
+	pe->tce32_table->set_bypass = pnv_pci_ioda2_set_bypass;
 
 	/* Enable bypass by default */
-	pnv_pci_ioda2_set_bypass(&pe->tce32_table, true);
+	pnv_pci_ioda2_set_bypass(pe->tce32_table, true);
 }
 
 static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
@@ -993,7 +995,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 	}
 
 	/* Setup linux iommu table */
-	tbl = &pe->tce32_table;
+	tbl = pe->tce32_table;
 	pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0);
 
 	/* OPAL variant of PHB3 invalidated TCEs */
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 90f6da4..9fbf7c0 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -60,7 +60,7 @@ struct pnv_ioda_pe {
 	/* "Base" iommu table, ie, 4K TCEs, 32-bit DMA */
 	int			tce32_seg;
 	int			tce32_segcount;
-	struct iommu_table	tce32_table;
+	struct iommu_table	*tce32_table;
 	phys_addr_t		tce_inval_reg_phys;
 
 	/* 64-bit TCE bypass region */
-- 
1.7.9.5


* [RFC PATCH V3 07/17] ppc/pnv: Add function to deconfig a PE
  2014-06-10  1:56 ` Wei Yang
@ 2014-06-10  1:56   ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-10  1:56 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu, Wei Yang

The PowerNV platform will support dynamic PE allocation and deallocation.

This patch adds a function that releases the resources associated with a PE.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c |   77 +++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 8ca3926..87cb3089 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -330,6 +330,83 @@ static struct pnv_ioda_pe *pnv_ioda_get_pe(struct pci_dev *dev)
 }
 #endif /* CONFIG_PCI_MSI */
 
+static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
+{
+	struct pci_dev *parent;
+	uint8_t bcomp, dcomp, fcomp;
+	int64_t rc;
+	long rid_end, rid;
+	if (pe->pbus) {
+		int count;
+
+		dcomp = OPAL_IGNORE_RID_DEVICE_NUMBER;
+		fcomp = OPAL_IGNORE_RID_FUNCTION_NUMBER;
+		parent = pe->pbus->self;
+		if (pe->flags & PNV_IODA_PE_BUS_ALL)
+			count = pe->pbus->busn_res.end - pe->pbus->busn_res.start + 1;
+		else
+			count = 1;
+
+		switch(count) {
+		case  1: bcomp = OpalPciBusAll;         break;
+		case  2: bcomp = OpalPciBus7Bits;       break;
+		case  4: bcomp = OpalPciBus6Bits;       break;
+		case  8: bcomp = OpalPciBus5Bits;       break;
+		case 16: bcomp = OpalPciBus4Bits;       break;
+		case 32: bcomp = OpalPciBus3Bits;       break;
+		default:
+			pr_err("%s: Number of subordinate busses %d"
+			       " unsupported\n",
+			       pci_name(pe->pbus->self), count);
+			/* Do an exact match only */
+			bcomp = OpalPciBusAll;
+		}
+		rid_end = pe->rid + (count << 8);
+	}else {
+		parent = pe->pdev->bus->self;
+		bcomp = OpalPciBusAll;
+		dcomp = OPAL_COMPARE_RID_DEVICE_NUMBER;
+		fcomp = OPAL_COMPARE_RID_FUNCTION_NUMBER;
+		rid_end = pe->rid + 1;
+	}
+
+	/* Disable MVT on IODA1 */
+	if (phb->type == PNV_PHB_IODA1) {
+		rc = opal_pci_set_mve_enable(phb->opal_id,
+					     pe->mve_number, OPAL_DISABLE_MVE);
+		if (rc) {
+			pe_err(pe, "OPAL error %ld enabling MVE %d\n",
+			       rc, pe->mve_number);
+			pe->mve_number = -1;
+		}
+	}
+	/* Clear the reverse map */
+	for (rid = pe->rid; rid < rid_end; rid++)
+		phb->ioda.pe_rmap[rid] = 0;
+
+	/* Release from all parents PELT-V */
+	while (parent) {
+		struct pci_dn *pdn = pci_get_pdn(parent);
+		if (pdn && pdn->pe_number != IODA_INVALID_PE) {
+			rc = opal_pci_set_peltv(phb->opal_id, pdn->pe_number,
+						pe->pe_number, OPAL_REMOVE_PE_FROM_DOMAIN);
+			/* XXX What to do in case of error ? */
+		}
+		parent = parent->bus->self;
+	}
+
+	/* Dissociate PE in PELT */
+	rc = opal_pci_set_pe(phb->opal_id, pe->pe_number, pe->rid,
+			     bcomp, dcomp, fcomp, OPAL_UNMAP_PE);
+	if (rc)
+		pe_err(pe, "OPAL error %ld trying to setup PELT table\n", rc);
+
+	pe->pbus = NULL;
+	pe->pdev = NULL;
+
+	return 0;
+}
+
 static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 {
 	struct pci_dev *parent;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH V3 08/17] PCI: Add weak pcibios_sriov_resource_size() interface
  2014-06-10  1:56 ` Wei Yang
@ 2014-06-10  1:56   ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-10  1:56 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu, Wei Yang

When retrieving the SRIOV resource size in pci_sriov_resource_size(), the total
IOV resource size is divided by the total_VFs number. This is correct for most
cases, but may not hold on some platforms.

For example, on the powernv platform the IOV BAR is expanded to fit a hardware
alignment, so dividing by total_VFs no longer gives the per-VF size.

This patch introduces a weak pcibios_sriov_resource_size() interface, which
gives the platform a chance to implement its own method of calculating the
SRIOV resource size.
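
As a rough illustration of the two paths, here is a user-space sketch with
made-up numbers (not the kernel code itself): a non-zero result from the
platform hook wins, otherwise the generic fallback divides the PF's IOV BAR
evenly by total_VFs.

	#include <stdio.h>
	#include <stdint.h>

	/* Illustrative only: a 2 MB IOV BAR shared by 16 VFs. */
	#define IOV_BAR_SIZE	(2ULL << 20)
	#define TOTAL_VFS	16

	/* Mimics the weak default: 0 means "no platform override". */
	static uint64_t platform_sriov_resource_size(void)
	{
		return 0;
	}

	static uint64_t sriov_resource_size(void)
	{
		uint64_t size = platform_sriov_resource_size();

		if (size != 0)
			return size;			/* platform-specific answer */

		return IOV_BAR_SIZE / TOTAL_VFS;	/* generic: total / total_VFs */
	}

	int main(void)
	{
		printf("per-VF BAR size: 0x%llx bytes\n",
		       (unsigned long long)sriov_resource_size());
		return 0;
	}

With these numbers the generic path reports 0x20000 (128 KB) per VF.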

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
---
 drivers/pci/iov.c   |   27 +++++++++++++++++++++++++--
 include/linux/pci.h |    3 +++
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index cc87773..9fd4648 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -45,6 +45,30 @@ static void virtfn_remove_bus(struct pci_bus *physbus, struct pci_bus *virtbus)
 		pci_remove_bus(virtbus);
 }
 
+resource_size_t __weak pcibios_sriov_resource_size(struct pci_dev *dev, int resno)
+{
+	return 0;
+}
+
+resource_size_t pci_sriov_resource_size(struct pci_dev *dev, int resno)
+{
+	u64 size;
+	struct pci_sriov *iov;
+
+	if (!dev->is_physfn)
+		return 0;
+
+	size = pcibios_sriov_resource_size(dev, resno);
+	if (size != 0)
+		return size;
+
+	iov = dev->sriov;
+	size = resource_size(dev->resource + resno);
+	do_div(size, iov->total_VFs);
+
+	return size;
+}
+
 static int virtfn_add(struct pci_dev *dev, int id, int reset)
 {
 	int i;
@@ -81,8 +105,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
 			continue;
 		virtfn->resource[i].name = pci_name(virtfn);
 		virtfn->resource[i].flags = res->flags;
-		size = resource_size(res);
-		do_div(size, iov->total_VFs);
+		size = pci_sriov_resource_size(dev, i + PCI_IOV_RESOURCES);
 		virtfn->resource[i].start = res->start + size * id;
 		virtfn->resource[i].end = virtfn->resource[i].start + size - 1;
 		rc = request_resource(res, &virtfn->resource[i]);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index ddb1ca0..315c150 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1637,6 +1637,7 @@ int pci_num_vf(struct pci_dev *dev);
 int pci_vfs_assigned(struct pci_dev *dev);
 int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
 int pci_sriov_get_totalvfs(struct pci_dev *dev);
+resource_size_t pci_sriov_resource_size(struct pci_dev *dev, int resno);
 #else
 static inline int pci_iov_virtfn_bus(struct pci_dev *dev, int id)
 {
@@ -1658,6 +1659,8 @@ static inline int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs)
 { return 0; }
 static inline int pci_sriov_get_totalvfs(struct pci_dev *dev)
 { return 0; }
+static inline resource_size_t pci_sriov_resource_size(struct pci_dev *dev, int resno)
+{ return -1; }
 #endif
 
 #if defined(CONFIG_HOTPLUG_PCI) || defined(CONFIG_HOTPLUG_PCI_MODULE)
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH V3 09/17] PCI: Add weak pcibios_sriov_resource_alignment() interface
  2014-06-10  1:56 ` Wei Yang
@ 2014-06-10  1:56   ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-10  1:56 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu, Wei Yang

The SRIOV resource alignment is defined as the size of an individual VF
resource. This works fine for many platforms, but on the powernv platform it
needs to change.

The original definition works because, at the sizing and assigning stage, the
requirement comes from an individual VF's resource size rather than from the
whole IOV BAR. That is why the original code simply uses the individual VF
size as the alignment.

On the powernv platform, the whole IOV BAR must be aligned to a hardware
aperture, so the alignment of the SRIOV resource should be the total size of
the IOV BAR.

This patch introduces a weak pcibios_sriov_resource_alignment() interface,
which gives the platform a chance to implement its own method of calculating
the SRIOV resource alignment.
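
A small sketch of the difference, using assumed numbers rather than real
hardware values: with a 128 KB per-VF BAR and 256 PEs, the powernv rule aligns
the whole IOV BAR to its total size instead of to the per-VF size.

	#include <stdio.h>
	#include <stdint.h>

	#define PER_VF_SIZE	(128ULL << 10)	/* assumed per-VF BAR size */
	#define TOTAL_PE	256		/* assumed number of PEs */

	int main(void)
	{
		uint64_t generic_align = PER_VF_SIZE;		 /* default rule */
		uint64_t powernv_align = PER_VF_SIZE * TOTAL_PE; /* whole IOV BAR */

		printf("generic alignment: 0x%llx\n",
		       (unsigned long long)generic_align);
		printf("powernv alignment: 0x%llx\n",
		       (unsigned long long)powernv_align);
		return 0;
	}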

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
---
 drivers/pci/iov.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 9fd4648..dd7fc42 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -628,6 +628,12 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno,
 		4 * (resno - PCI_IOV_RESOURCES);
 }
 
+resource_size_t __weak pcibios_sriov_resource_alignment(struct pci_dev *dev,
+		int resno, resource_size_t align)
+{
+	return align;
+}
+
 /**
  * pci_sriov_resource_alignment - get resource alignment for VF BAR
  * @dev: the PCI device
@@ -642,13 +648,16 @@ resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno)
 {
 	struct resource tmp;
 	enum pci_bar_type type;
+	resource_size_t align;
 	int reg = pci_iov_resource_bar(dev, resno, &type);
 
 	if (!reg)
 		return 0;
 
 	__pci_read_base(dev, type, &tmp, reg);
-	return resource_alignment(&tmp);
+	align = resource_alignment(&tmp);
+
+	return pcibios_sriov_resource_alignment(dev, resno, align);
 }
 
 /**
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH V3 10/17] PCI: take additional IOV BAR alignment in sizing and assigning
  2014-06-10  1:56 ` Wei Yang
@ 2014-06-10  1:56   ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-10  1:56 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu, Wei Yang

At the resource sizing/assigning stage, resources are divided into two lists,
the requested list and the additional list, but the alignment of an additional
IOV BAR is not taken into account by the sizing and assigning procedure.

This is reasonable in the original implementation, since the IOV BAR's
alignment usually equals the PF BAR's alignment and is therefore already
accounted for. However, this assumption may be violated on some platforms.

This patch takes the additional IOV BAR alignment into account explicitly at
the sizing and assigning stage.
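
The effect on bridge window sizing can be sketched with a simplified model
(assumed sizes, not the setup-bus.c algorithm itself): if the additional
alignment of an IOV BAR is ignored, the window is sized against the smaller
per-device alignment and cannot host the aligned IOV BAR.

	#include <stdio.h>
	#include <stdint.h>

	static uint64_t round_up(uint64_t x, uint64_t align)
	{
		return (x + align - 1) & ~(align - 1);
	}

	int main(void)
	{
		uint64_t requested = 1ULL << 20;   /* assumed: 1 MB of ordinary BARs */
		uint64_t iov_size  = 32ULL << 20;  /* assumed: 32 MB expanded IOV BAR */
		uint64_t min_align = 1ULL << 20;   /* alignment of the ordinary BARs */
		uint64_t add_align = 32ULL << 20;  /* alignment the IOV BAR needs */

		/* Old behaviour: additional alignment not considered. */
		uint64_t old_window = round_up(requested + iov_size, min_align);
		/* New behaviour: size the window to max(min_align, add_align). */
		uint64_t new_window = round_up(requested + iov_size,
					       add_align > min_align ?
					       add_align : min_align);

		printf("window without add_align: 0x%llx\n",
		       (unsigned long long)old_window);
		printf("window with    add_align: 0x%llx\n",
		       (unsigned long long)new_window);
		return 0;
	}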

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
---
 drivers/pci/setup-bus.c |   66 ++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 59 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 9509ffa..0c3b3a5 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -120,6 +120,28 @@ static resource_size_t get_res_add_size(struct list_head *head,
 	return 0;
 }
 
+static resource_size_t get_res_add_align(struct list_head *head,
+		struct resource *res)
+{
+	struct pci_dev_resource *dev_res;
+
+	list_for_each_entry(dev_res, head, list) {
+		if (dev_res->res == res) {
+			int idx = res - &dev_res->dev->resource[0];
+
+			dev_printk(KERN_DEBUG, &dev_res->dev->dev,
+				   "res[%d]=%pR get_res_add_align min_align %llx\n",
+				   idx, dev_res->res,
+				   (unsigned long long)dev_res->min_align);
+
+			return dev_res->min_align;
+		}
+	}
+
+	return 0;
+}
+
+
 /* Sort resources by alignment */
 static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 {
@@ -369,8 +391,9 @@ static void __assign_resources_sorted(struct list_head *head,
 	LIST_HEAD(save_head);
 	LIST_HEAD(local_fail_head);
 	struct pci_dev_resource *save_res;
-	struct pci_dev_resource *dev_res, *tmp_res;
+	struct pci_dev_resource *dev_res, *tmp_res, *dev_res2;
 	unsigned long fail_type;
+	resource_size_t add_align, align;
 
 	/* Check if optional add_size is there */
 	if (!realloc_head || list_empty(realloc_head))
@@ -385,10 +408,31 @@ static void __assign_resources_sorted(struct list_head *head,
 	}
 
 	/* Update res in head list with add_size in realloc_head list */
-	list_for_each_entry(dev_res, head, list)
+	list_for_each_entry_safe(dev_res, tmp_res, head, list) {
 		dev_res->res->end += get_res_add_size(realloc_head,
 							dev_res->res);
 
+		if (!(dev_res->res->flags & IORESOURCE_STARTALIGN))
+			continue;
+
+		add_align = get_res_add_align(realloc_head, dev_res->res);
+
+		if (add_align > dev_res->res->start) {
+			dev_res->res->start = add_align;
+			dev_res->res->end = add_align +
+				            resource_size(dev_res->res);
+
+			list_for_each_entry(dev_res2, head, list) {
+				align = pci_resource_alignment(dev_res2->dev,
+							       dev_res2->res);
+				if (add_align > align)
+					list_move_tail(&dev_res->list,
+						       &dev_res2->list);
+			}
+               }
+
+	}
+
 	/* Try updated head list with add_size added */
 	assign_requested_resources_sorted(head, &local_fail_head);
 
@@ -928,6 +972,8 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 					 mask | IORESOURCE_PREFETCH, type);
 	unsigned int mem64_mask = 0;
 	resource_size_t children_add_size = 0;
+	resource_size_t children_add_align = 0;
+	resource_size_t add_align = 0;
 
 	if (!b_res)
 		return 0;
@@ -955,6 +1001,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 			/* put SRIOV requested res to the optional list */
 			if (realloc_head && i >= PCI_IOV_RESOURCES &&
 					i <= PCI_IOV_RESOURCE_END) {
+				add_align = max(pci_resource_alignment(dev, r), add_align);
 				r->end = r->start - 1;
 				add_to_list(realloc_head, dev, r, r_size, 0/* don't care */);
 				children_add_size += r_size;
@@ -982,8 +1029,11 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 				max_order = order;
 			mem64_mask &= r->flags & IORESOURCE_MEM_64;
 
-			if (realloc_head)
+			if (realloc_head) {
 				children_add_size += get_res_add_size(realloc_head, r);
+				children_add_align = get_res_add_align(realloc_head, r);
+				add_align = max(add_align, children_add_align);
+			}
 		}
 	}
 
@@ -994,7 +1044,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 		add_size = children_add_size;
 	size1 = (!realloc_head || (realloc_head && !add_size)) ? size0 :
 		calculate_memsize(size, min_size, add_size,
-				resource_size(b_res), min_align);
+				resource_size(b_res), max(min_align, add_align));
 	if (!size0 && !size1) {
 		if (b_res->start || b_res->end)
 			dev_info(&bus->self->dev, "disabling bridge window "
@@ -1007,10 +1057,12 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 	b_res->end = size0 + min_align - 1;
 	b_res->flags |= IORESOURCE_STARTALIGN | mem64_mask;
 	if (size1 > size0 && realloc_head) {
-		add_to_list(realloc_head, bus->self, b_res, size1-size0, min_align);
+		add_to_list(realloc_head, bus->self, b_res, size1-size0,
+				max(min_align, add_align));
 		dev_printk(KERN_DEBUG, &bus->self->dev, "bridge window "
-				 "%pR to %pR add_size %llx\n", b_res,
-				 &bus->busn_res, (unsigned long long)size1-size0);
+				 "%pR to %pR add_size %llx add_align %llx\n", b_res,
+				 &bus->busn_res, (unsigned long long)size1-size0,
+				 max(min_align, add_align));
 	}
 	return 1;
 }
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH V3 11/17] ppc/pnv: Expand VF resources according to the number of total_pe
  2014-06-10  1:56 ` Wei Yang
@ 2014-06-10  1:56   ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-10  1:56 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu, Wei Yang

On PHB3, VF resources are covered by an M64 BAR for better PE isolation.
In most cases the total_pe number differs from total_VFs, which leads to a
conflict between the MMIO space layout and the PE numbering.

This patch expands the VF resource size to reserve room for total_pe VFs'
worth of resources, which prevents the conflict.
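
A worked example of the expansion with assumed numbers: a device with
total_VFs = 16 and a 128 KB per-VF BAR would normally need a 2 MB IOV BAR,
but with total_pe = 256 the BAR is grown to reserve one slot per PE.

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		uint64_t per_vf_size = 128ULL << 10;	/* assumed per-VF BAR size */
		unsigned int total_vfs = 16;		/* assumed SRIOV total_VFs */
		unsigned int total_pe  = 256;		/* assumed PEs on the PHB */

		uint64_t original = per_vf_size * total_vfs;
		uint64_t expanded = per_vf_size * total_pe;

		printf("original IOV BAR: 0x%llx (%u VFs)\n",
		       (unsigned long long)original, total_vfs);
		printf("expanded IOV BAR: 0x%llx (%u PE slots)\n",
		       (unsigned long long)expanded, total_pe);
		return 0;
	}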

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/machdep.h        |    6 +++
 arch/powerpc/include/asm/pci-bridge.h     |    3 ++
 arch/powerpc/kernel/pci-common.c          |   15 ++++++
 arch/powerpc/platforms/powernv/pci-ioda.c |   83 +++++++++++++++++++++++++++++
 4 files changed, 107 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index ad3025d..2f2e770 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -234,9 +234,15 @@ struct machdep_calls {
 
 	/* Called after scan and before resource survey */
 	void (*pcibios_fixup_phb)(struct pci_controller *hose);
+#ifdef CONFIG_PCI_IOV
+	void (*pcibios_fixup_sriov)(struct pci_bus *bus);
+#endif /* CONFIG_PCI_IOV */
 
 	/* Called during PCI resource reassignment */
 	resource_size_t (*pcibios_window_alignment)(struct pci_bus *, unsigned long type);
+#ifdef CONFIG_PCI_IOV
+	resource_size_t (*__pci_sriov_resource_size)(struct pci_dev *, int resno);
+#endif /* CONFIG_PCI_IOV */
 
 	/* Called to shutdown machine specific hardware not already controlled
 	 * by other drivers.
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 4ca90a3..8c849d8 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -168,6 +168,9 @@ struct pci_dn {
 #define IODA_INVALID_PE		(-1)
 #ifdef CONFIG_PPC_POWERNV
 	int	pe_number;
+#ifdef CONFIG_PCI_IOV
+	u16     vfs;
+#endif /* CONFIG_PCI_IOV */
 #endif
 };
 
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index c449a26..c4e2e92 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -120,6 +120,16 @@ resource_size_t pcibios_window_alignment(struct pci_bus *bus,
 	return 1;
 }
 
+#ifdef CONFIG_PCI_IOV
+resource_size_t pcibios_sriov_resource_size(struct pci_dev *pdev, int resno)
+{
+	if (ppc_md.__pci_sriov_resource_size)
+		return ppc_md.__pci_sriov_resource_size(pdev, resno);
+
+	return 0;
+}
+#endif /* CONFIG_PCI_IOV */
+
 static resource_size_t pcibios_io_size(const struct pci_controller *hose)
 {
 #ifdef CONFIG_PPC64
@@ -1675,6 +1685,11 @@ void pcibios_scan_phb(struct pci_controller *hose)
 	if (ppc_md.pcibios_fixup_phb)
 		ppc_md.pcibios_fixup_phb(hose);
 
+#ifdef CONFIG_PCI_IOV
+	if (ppc_md.pcibios_fixup_sriov)
+		ppc_md.pcibios_fixup_sriov(bus);
+#endif /* CONFIG_PCI_IOV */
+
 	/* Configure PCI Express settings */
 	if (bus && !pci_has_flag(PCI_PROBE_ONLY)) {
 		struct pci_bus *child;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 87cb3089..7dfad6a 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1298,6 +1298,67 @@ static void pnv_pci_init_ioda_msis(struct pnv_phb *phb)
 static void pnv_pci_init_ioda_msis(struct pnv_phb *phb) { }
 #endif /* CONFIG_PCI_MSI */
 
+#ifdef CONFIG_PCI_IOV
+static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
+{
+	struct pci_controller *hose;
+	struct pnv_phb *phb;
+	struct resource *res;
+	int i;
+	resource_size_t size;
+	struct pci_dn *pdn;
+
+	if (!pdev->is_physfn || pdev->is_added)
+		return;
+
+	hose = pci_bus_to_host(pdev->bus);
+	if (!hose) {
+		dev_err(&pdev->dev, "%s: NULL pci_controller\n", __func__);
+		return;
+	}
+
+	phb = hose->private_data;
+	if (!phb) {
+		dev_err(&pdev->dev, "%s: NULL PHB\n", __func__);
+		return;
+	}
+
+	pdn = pci_get_pdn(pdev);
+	pdn->vfs = 0;
+
+	for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++) {
+		res = &pdev->resource[i];
+		if (!res->flags || res->parent)
+			continue;
+
+		if (!is_mem_pref_64_type(res->flags))
+			continue;
+
+		dev_info(&pdev->dev, "PowerNV: Fixing VF BAR[%d] %pR to\n",
+				i, res);
+		size = pci_sriov_resource_size(pdev, i);
+		res->end = res->start + size * phb->ioda.total_pe - 1;
+		dev_info(&pdev->dev, "                       %pR\n", res);
+	}
+	pdn->vfs = phb->ioda.total_pe;
+}
+
+static void pnv_pci_ioda_fixup_sriov(struct pci_bus *bus)
+{
+	struct pci_dev *pdev;
+	struct pci_bus *b;
+
+	list_for_each_entry(pdev, &bus->devices, bus_list) {
+		b = pdev->subordinate;
+
+		if (b)
+			pnv_pci_ioda_fixup_sriov(b);
+
+		pnv_pci_ioda_fixup_iov_resources(pdev);
+	}
+}
+#endif /* CONFIG_PCI_IOV */
+
 /*
  * This function is supposed to be called on basis of PE from top
  * to bottom style. So the the I/O or MMIO segment assigned to
@@ -1498,6 +1559,22 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
 	return phb->ioda.io_segsize;
 }
 
+#ifdef CONFIG_PCI_IOV
+static resource_size_t __pnv_pci_sriov_resource_size(struct pci_dev *pdev, int resno)
+{
+	struct pci_dn *pdn = pci_get_pdn(pdev);
+	u64 size = 0;
+
+	if (!pdn->vfs)
+		return size;
+
+	size = resource_size(pdev->resource + resno);
+	do_div(size, pdn->vfs);
+
+	return size;
+}
+#endif /* CONFIG_PCI_IOV */
+
 /* Prevent enabling devices for which we couldn't properly
  * assign a PE
  */
@@ -1692,9 +1769,15 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	 * for the P2P bridge bars so that each PCI bus (excluding
 	 * the child P2P bridges) can form individual PE.
 	 */
+#ifdef CONFIG_PCI_IOV
+	ppc_md.pcibios_fixup_sriov = pnv_pci_ioda_fixup_sriov;
+#endif /* CONFIG_PCI_IOV */
 	ppc_md.pcibios_fixup = pnv_pci_ioda_fixup;
 	ppc_md.pcibios_enable_device_hook = pnv_pci_enable_device_hook;
 	ppc_md.pcibios_window_alignment = pnv_pci_window_alignment;
+#ifdef CONFIG_PCI_IOV
+	ppc_md.__pci_sriov_resource_size = __pnv_pci_sriov_resource_size;
+#endif /* CONFIG_PCI_IOV */
 	pci_add_flags(PCI_REASSIGN_ALL_RSRC);
 
 	/* Reset IODA tables to a clean state */
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH V3 12/17] powerpc/powernv: implement pcibios_sriov_resource_alignment on powernv
  2014-06-10  1:56 ` Wei Yang
@ 2014-06-10  1:56   ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-10  1:56 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu, Wei Yang

This patch implements pcibios_sriov_resource_alignment() on the powernv
platform.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/machdep.h        |    1 +
 arch/powerpc/kernel/pci-common.c          |    8 ++++++++
 arch/powerpc/platforms/powernv/pci-ioda.c |   17 +++++++++++++++++
 3 files changed, 26 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 2f2e770..3bbc55f 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -242,6 +242,7 @@ struct machdep_calls {
 	resource_size_t (*pcibios_window_alignment)(struct pci_bus *, unsigned long type);
 #ifdef CONFIG_PCI_IOV
 	resource_size_t (*__pci_sriov_resource_size)(struct pci_dev *, int resno);
+	resource_size_t (*__pci_sriov_resource_alignment)(struct pci_dev *, int resno, resource_size_t align);
 #endif /* CONFIG_PCI_IOV */
 
 	/* Called to shutdown machine specific hardware not already controlled
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index c4e2e92..35345ac 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -128,6 +128,14 @@ resource_size_t pcibios_sriov_resource_size(struct pci_dev *pdev, int resno)
 
 	return 0;
 }
+
+resource_size_t pcibios_sriov_resource_alignment(struct pci_dev *pdev, int resno, resource_size_t align)
+{
+	if (ppc_md.__pci_sriov_resource_alignment)
+		return ppc_md.__pci_sriov_resource_alignment(pdev, resno, align);
+
+	return 0;
+}
 #endif /* CONFIG_PCI_IOV */
 
 static resource_size_t pcibios_io_size(const struct pci_controller *hose)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 7dfad6a..b0ac851 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1573,6 +1573,22 @@ static resource_size_t __pnv_pci_sriov_resource_size(struct pci_dev *pdev, int r
 
 	return size;
 }
+
+static resource_size_t __pnv_pci_sriov_resource_alignment(struct pci_dev *pdev, int resno,
+		resource_size_t align)
+{
+	struct pci_dn *pdn = pci_get_pdn(pdev);
+	resource_size_t iov_align;
+
+	iov_align = resource_size(&pdev->resource[resno]);
+	if (iov_align)
+		return iov_align;
+
+	if (pdn->vfs)
+		return pdn->vfs * align;
+
+	return align;
+}
 #endif /* CONFIG_PCI_IOV */
 
 /* Prevent enabling devices for which we couldn't properly
@@ -1777,6 +1793,7 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	ppc_md.pcibios_window_alignment = pnv_pci_window_alignment;
 #ifdef CONFIG_PCI_IOV
 	ppc_md.__pci_sriov_resource_size = __pnv_pci_sriov_resource_size;
+	ppc_md.__pci_sriov_resource_alignment = __pnv_pci_sriov_resource_alignment;
 #endif /* CONFIG_PCI_IOV */
 	pci_add_flags(PCI_REASSIGN_ALL_RSRC);
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH V3 13/17] powerpc/powernv: shift VF resource with an offset
  2014-06-10  1:56 ` Wei Yang
@ 2014-06-10  1:56   ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-10  1:56 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu, Wei Yang

On the powernv platform, a resource's position in M64 implies the PE# the
resource belongs to. In some cases a resource must be adjusted to land at the
correct position in M64.

This patch introduces a function that shifts the 'real' VF BAR address by a
given offset.
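
The shift itself is per-VF-size arithmetic; a sketch with assumed values (not
the powernv code) shows how moving the BAR start by size * offset maps VF0
onto a later M64 segment, i.e. a later PE number.

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		uint64_t bar_start   = 0x3fe000000000ULL; /* assumed IOV BAR start */
		uint64_t per_vf_size = 128ULL << 10;	  /* assumed per-VF BAR size */
		int offset = 5;				  /* assumed PE offset for VF0 */

		uint64_t shifted = bar_start + per_vf_size * offset;

		printf("IOV BAR start: 0x%llx\n", (unsigned long long)bar_start);
		printf("shifted start: 0x%llx (VF0 now lands in slot %d)\n",
		       (unsigned long long)shifted, offset);
		return 0;
	}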

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c |   30 +++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index b0ac851..e46c5bf 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -13,6 +13,7 @@
 
 #include <linux/kernel.h>
 #include <linux/pci.h>
+#include <linux/pci_regs.h>
 #include <linux/debugfs.h>
 #include <linux/delay.h>
 #include <linux/string.h>
@@ -544,6 +545,35 @@ static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
 	return 10;
 }
 
+#ifdef CONFIG_PCI_IOV
+static void pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
+{
+	struct pci_dn *pdn = pci_get_pdn(dev);
+	int i;
+	struct resource *res;
+	resource_size_t size;
+
+	if (dev->is_physfn) {
+		for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
+			res = dev->resource + PCI_IOV_RESOURCES + i;
+			if (!res->flags || !res->parent)
+				continue;
+
+			if (!is_mem_pref_64_type(res->flags))
+				continue;
+
+			dev_info(&dev->dev, "PowerNV: Shifting VF BAR %pR to\n", res);
+			size = pci_sriov_resource_size(dev, PCI_IOV_RESOURCES + i);
+			res->start += size*offset;
+
+			dev_info(&dev->dev, "                         %pR\n", res);
+			pci_update_resource(dev, PCI_IOV_RESOURCES + i);
+		}
+		pdn->vfs -= offset;
+	}
+}
+#endif /* CONFIG_PCI_IOV */
+
 #if 0
 static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
 {
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH V3 14/17] ppc/pci: create/release dev-tree node for VFs
  2014-06-10  1:56 ` Wei Yang
@ 2014-06-10  1:56   ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-10  1:56 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu, Wei Yang

Currently, the powernv platform is not aware of VFs, so no device-tree node
represents a VF. In addition, a VF's PCI device is only created when the PF
driver enables it, which leaves pdn->pdev and pdn->pe_number holding invalid
values.

This patch creates/releases device-tree nodes for VFs and fixes up the pdn
fields when a VF's pci_dev is created.
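
For orientation, the first cell of a PCI device node's "reg" property encodes
the bus number and devfn; the sketch below (standard OF PCI binding layout,
with an assumed address) shows why the VF node built here places devfn in
bits 15:8.

	#include <stdio.h>
	#include <stdint.h>

	/* OF PCI binding: phys.hi = npt000ss bbbbbbbb dddddfff rrrrrrrr */
	static uint32_t pci_reg_phys_hi(unsigned int bus, unsigned int devfn)
	{
		return (bus << 16) | (devfn << 8);
	}

	int main(void)
	{
		/* assumed address 0000:06:0d.0 */
		unsigned int bus = 0x06, devfn = (0x0d << 3) | 0;

		printf("reg phys.hi = 0x%08x\n", pci_reg_phys_hi(bus, devfn));
		return 0;
	}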

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/Kconfig    |    1 +
 arch/powerpc/platforms/powernv/pci-ioda.c |  103 +++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/pci.c      |   20 ++++++
 3 files changed, 124 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index 895e8a2..0dd331b 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -11,6 +11,7 @@ config PPC_POWERNV
 	select PPC_UDBG_16550
 	select PPC_SCOM
 	select ARCH_RANDOM
+	select OF_DYNAMIC
 	default y
 
 config PPC_POWERNV_RTAS
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index e46c5bf..9ace027 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -23,6 +23,7 @@
 #include <linux/io.h>
 #include <linux/msi.h>
 #include <linux/memblock.h>
+#include <linux/of_pci.h>
 
 #include <asm/sections.h>
 #include <asm/io.h>
@@ -771,6 +772,108 @@ static void pnv_pci_ioda_setup_PEs(void)
 	}
 }
 
+#ifdef CONFIG_PCI_IOV
+static void pnv_pci_create_vf_node(struct pci_dev *dev, u16 vf_num)
+{
+	struct device_node *dn, *p_dn;
+	struct pci_dn *pdn;
+	struct pci_controller *hose;
+	struct property *pp;
+	void* value;
+	u16 id;
+
+	hose = pci_bus_to_host(dev->bus);
+
+	/* Create dev-tree node for VFs if this is a PF */
+	p_dn = pci_bus_to_OF_node(dev->bus);
+	if (p_dn == NULL) {
+		dev_err(&dev->dev, "SRIOV: VF bus NULL device node\n");
+		return;
+	}
+
+	for (id = 0; id < vf_num; id++) {
+		dn = kzalloc(sizeof(*dn), GFP_KERNEL);
+		pdn = kzalloc(sizeof(*pdn), GFP_KERNEL);
+		pp  = kzalloc(sizeof(*pp), GFP_KERNEL);
+		value = kzalloc(sizeof(u32), GFP_KERNEL);
+
+		if (!dn || !pdn || !pp || !value) {
+			kfree(dn);
+			kfree(pdn);
+			kfree(pp);
+			kfree(value);
+			dev_warn(&dev->dev, "%s: failed to create"
+				"dev-tree node for idx(%d)\n",
+				__func__, id);
+
+			break;
+		}
+
+		pp->value = value;
+		pdn->node = dn;
+		pdn->devfn = pci_iov_virtfn_devfn(dev, id);
+		pdn->busno = dev->bus->number;
+		pdn->pe_number = IODA_INVALID_PE;
+		pdn->phb = hose;
+
+		dn->data = pdn;
+		kref_init(&dn->kref);
+		dn->full_name = dn->name =
+			kasprintf(GFP_KERNEL, "%s/vf%d",
+				p_dn->full_name, pdn->devfn);
+		dn->parent = p_dn;
+
+		pp->name = kasprintf(GFP_KERNEL, "reg");
+		pp->length = 5 * sizeof(__be32);
+		*(u32*)pp->value = cpu_to_be32(pdn->devfn) << 8;
+		dn->properties = pp;
+
+		of_attach_node(dn);
+	}
+}
+
+static void pnv_pci_release_vf_node(struct pci_dev *dev, u16 vf_num)
+{
+	struct device_node *dn;
+	struct property *pp;
+	u16 id;
+
+	for (id = 0; id < vf_num; id++) {
+		dn = of_pci_find_child_device(dev->bus->dev.of_node,
+				pci_iov_virtfn_devfn(dev, id));
+		if (!dn)
+			continue;
+
+		of_detach_node(dn);
+		pp = dn->properties;
+		kfree(pp->name);
+		kfree(pp->value);
+		kfree(pp);
+		kfree(dn->data);
+		kfree(dn);
+	}
+}
+
+int pcibios_sriov_disable(struct pci_dev *pdev)
+{
+	struct pci_sriov *iov;
+	u16 vf_num;
+
+	iov = pdev->sriov;
+	vf_num = iov->num_VFs;
+	pnv_pci_release_vf_node(pdev, vf_num);
+
+	return 0;
+}
+
+int pcibios_sriov_enable(struct pci_dev *pdev, u16 vf_num)
+{
+	pnv_pci_create_vf_node(pdev, vf_num);
+
+	return 0;
+}
+#endif /* CONFIG_PCI_IOV */
+
 static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb *phb, struct pci_dev *pdev)
 {
 	struct pci_dn *pdn = pci_get_pdn(pdev);
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 687a068..43fcc73 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -654,6 +654,26 @@ static void pnv_pci_dma_dev_setup(struct pci_dev *pdev)
 {
 	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
 	struct pnv_phb *phb = hose->private_data;
+#ifdef CONFIG_PCI_IOV
+	struct pnv_ioda_pe *pe;
+	struct pci_dn *pdn;
+
+	/* Fix the VF pdn PE number */
+	if (pdev->is_virtfn) {
+		pdn = pci_get_pdn(pdev);
+		if (pdn->pcidev == NULL || pdn->pe_number == IODA_INVALID_PE) {
+			list_for_each_entry(pe, &phb->ioda.pe_list, list) {
+				if (pe->rid ==
+					((pdev->bus->number << 8) | (pdev->devfn & 0xff))) {
+					pdn->pcidev = pdev;
+					pdn->pe_number = pe->pe_number;
+					pe->pdev = pdev;
+					break;
+				}
+			}
+		}
+	}
+#endif /* CONFIG_PCI_IOV */
 
 	/* If we have no phb structure, try to setup a fallback based on
 	 * the device-tree (RTAS PCI for example)
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH V3 15/17] powerpc/powernv: allocate VF PE
  2014-06-10  1:56 ` Wei Yang
@ 2014-06-10  1:56   ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-10  1:56 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu, Wei Yang

VFs are created when the driver intends to enable SRIOV.

This patch assigns the related resources and allocates PEs for the VFs at
that point. It allocates enough M64 windows to cover the IOV BAR and shifts
the VF resources to match the PE# indicated by M64.
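
As a rough illustration of the PE reservation step only (the array size and
in-use pattern are assumed values, and the real code works on the PHB's PE
bitmap), here is a minimal sketch of finding a contiguous block of free PE
numbers so that VF i gets PE (offset + i):

#include <stdio.h>
#include <stdbool.h>

#define TOTAL_PE 32   /* hypothetical; a real PHB3 has more */

/* Toy model of reserving a contiguous block of free PE numbers for vf_num
 * VFs, so that VF i is later assigned PE (offset + i). */
static int find_free_pe_block(const bool *in_use, int vf_num)
{
	int start, i;

	for (start = 0; start + vf_num <= TOTAL_PE; start++) {
		bool ok = true;

		for (i = 0; i < vf_num; i++)
			if (in_use[start + i]) {
				ok = false;
				break;
			}
		if (ok)
			return start;
	}
	return -1;
}

int main(void)
{
	bool pe_in_use[TOTAL_PE] = { [0] = true, [1] = true, [4] = true };
	int vf_num = 3;
	int offset = find_free_pe_block(pe_in_use, vf_num);
	int i;

	if (offset < 0) {
		printf("not enough free PEs for %d VFs\n", vf_num);
		return 1;
	}
	for (i = 0; i < vf_num; i++)
		printf("VF%d -> PE#%d\n", i, offset + i);
	return 0;
}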

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h     |    2 +
 arch/powerpc/platforms/powernv/pci-ioda.c |  340 ++++++++++++++++++++++++++++-
 arch/powerpc/platforms/powernv/pci.h      |   10 +-
 3 files changed, 339 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 8c849d8..72f0af5 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -170,6 +170,8 @@ struct pci_dn {
 	int	pe_number;
 #ifdef CONFIG_PCI_IOV
 	u16     vfs;
+	int     offset;
+	int     m64_wins[PCI_SRIOV_NUM_BARS];
 #endif /* CONFIG_PCI_IOV */
 #endif
 };
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 9ace027..fb2c2c6 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -42,6 +42,17 @@
 #include "powernv.h"
 #include "pci.h"
 
+#ifdef CONFIG_PCI_IOV
+#define VF_PE_LOG						\
+	else if (pe->flags & PNV_IODA_PE_VF)                    \
+		sprintf(pfix, "%04x:%02x:%2x.%d",               \
+			pci_domain_nr(pe->parent_dev->bus),     \
+			(pe->rid & 0xff00) >> 8,                \
+			PCI_SLOT(pe->rid), PCI_FUNC(pe->rid));
+#else  /* CONFIG_PCI_IOV*/
+#define VF_PE_LOG
+#endif /* CONFIG_PCI_IOV*/
+
 #define define_pe_printk_level(func, kern_level)		\
 static int func(const struct pnv_ioda_pe *pe, const char *fmt, ...)	\
 {								\
@@ -55,13 +66,14 @@ static int func(const struct pnv_ioda_pe *pe, const char *fmt, ...)	\
 	vaf.fmt = fmt;						\
 	vaf.va = &args;						\
 								\
-	if (pe->pdev)						\
+	if (pe->flags & PNV_IODA_PE_DEV)			\
 		strlcpy(pfix, dev_name(&pe->pdev->dev),		\
 			sizeof(pfix));				\
-	else							\
+	else if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)) \
 		sprintf(pfix, "%04x:%02x     ",			\
 			pci_domain_nr(pe->pbus),		\
 			pe->pbus->number);			\
+	VF_PE_LOG						\
 	r = printk(kern_level "pci %s: [PE# %.3d] %pV",		\
 		   pfix, pe->pe_number, &vaf);			\
 								\
@@ -365,7 +377,12 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 		}
 		rid_end = pe->rid + (count << 8);
 	}else {
-		parent = pe->pdev->bus->self;
+#ifdef CONFIG_PCI_IOV
+		if (pe->flags & PNV_IODA_PE_VF)
+			parent = pe->parent_dev;
+		else
+#endif /* CONFIG_PCI_IOV */
+			parent = pe->pdev->bus->self;
 		bcomp = OpalPciBusAll;
 		dcomp = OPAL_COMPARE_RID_DEVICE_NUMBER;
 		fcomp = OPAL_COMPARE_RID_FUNCTION_NUMBER;
@@ -405,6 +422,9 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 
 	pe->pbus = NULL;
 	pe->pdev = NULL;
+#ifdef CONFIG_PCI_IOV
+	pe->parent_dev = NULL;
+#endif /* CONFIG_PCI_IOV */
 
 	return 0;
 }
@@ -443,7 +463,12 @@ static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 		}
 		rid_end = pe->rid + (count << 8);
 	} else {
-		parent = pe->pdev->bus->self;
+#ifdef CONFIG_PCI_IOV
+		if (pe->flags & PNV_IODA_PE_VF)
+			parent = pe->parent_dev;
+		else
+#endif /* CONFIG_PCI_IOV */
+			parent = pe->pdev->bus->self;
 		bcomp = OpalPciBusAll;
 		dcomp = OPAL_COMPARE_RID_DEVICE_NUMBER;
 		fcomp = OPAL_COMPARE_RID_FUNCTION_NUMBER;
@@ -773,6 +798,114 @@ static void pnv_pci_ioda_setup_PEs(void)
 }
 
 #ifdef CONFIG_PCI_IOV
+static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
+{
+	struct pci_bus        *bus;
+	struct pci_controller *hose;
+	struct pnv_phb        *phb;
+	struct pci_dn         *pdn;
+	int                    i;
+
+	bus = pdev->bus;
+	hose = pci_bus_to_host(bus);
+	phb = hose->private_data;
+	pdn = pci_get_pdn(pdev);
+
+	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
+		if (pdn->m64_wins[i] == -1)
+			continue;
+		opal_pci_phb_mmio_enable(phb->opal_id,
+				OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i], 0);
+		clear_bit(pdn->m64_wins[i], &phb->ioda.m64win_alloc);
+		pdn->m64_wins[i] = -1;
+	}
+
+	return 0;
+}
+
+static int pnv_pci_vf_assign_m64(struct pci_dev *pdev)
+{
+	struct pci_bus        *bus;
+	struct pci_controller *hose;
+	struct pnv_phb        *phb;
+	struct pci_dn         *pdn;
+	unsigned int           win;
+	struct resource       *res;
+	int                    i;
+	int64_t                rc;
+
+	bus = pdev->bus;
+	hose = pci_bus_to_host(bus);
+	phb = hose->private_data;
+	pdn = pci_get_pdn(pdev);
+
+	/* Initialize the m64_wins to -1 */
+	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
+		pdn->m64_wins[i] = -1;
+
+	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
+		res = pdev->resource + PCI_IOV_RESOURCES + i;
+		if (!res->flags || !res->parent)
+			continue;
+
+		if (!is_mem_pref_64_type(res->flags))
+			continue;
+
+		do {
+			win = find_next_zero_bit(&phb->ioda.m64win_alloc,
+					phb->ioda.m64_bars, 0);
+
+			if (win >= phb->ioda.m64_bars)
+				goto m64_failed;
+		} while (test_and_set_bit(win, &phb->ioda.m64win_alloc));
+
+		pdn->m64_wins[i] = win;
+
+		/* Map the M64 here */
+		rc = opal_pci_set_phb_mem_window(phb->opal_id,
+						 OPAL_M64_WINDOW_TYPE,
+						 pdn->m64_wins[i],
+						 res->start,
+						 0, /* unused */
+						 resource_size(res));
+		if (rc != OPAL_SUCCESS) {
+			pr_err("Failed to map M64 BAR #%d: %lld\n", win, rc);
+			goto m64_failed;
+		}
+
+		rc = opal_pci_phb_mmio_enable(phb->opal_id,
+				OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i], 1);
+		if (rc != OPAL_SUCCESS) {
+			pr_err("Failed to enable M64 BAR #%d: %llx\n", win, rc);
+			goto m64_failed;
+		}
+	}
+	return 0;
+
+m64_failed:
+	pnv_pci_vf_release_m64(pdev);
+	return -EBUSY;
+}
+
+static void pnv_pci_release_dev_dma(struct pci_dev *dev, struct pnv_ioda_pe *pe)
+{
+	struct pci_bus        *bus;
+	struct pci_controller *hose;
+	struct pnv_phb        *phb;
+	struct iommu_table    *tbl;
+	unsigned long         addr;
+
+	bus = dev->bus;
+	hose = pci_bus_to_host(bus);
+	phb = hose->private_data;
+	tbl = pe->tce32_table;
+	addr = tbl->it_base;
+
+	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
+	free_pages(addr, get_order(PNV_TCE32_TAB_SIZE));
+	pe->tce32_table = NULL;
+}
+
 static void pnv_pci_create_vf_node(struct pci_dev *dev, u16 vf_num)
 {
 	struct device_node *dn, *p_dn;
@@ -854,23 +987,186 @@ static void pnv_pci_release_vf_node(struct pci_dev *dev, u16 vf_num)
 	}
 }
 
+static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
+{
+	struct pci_bus        *bus;
+	struct pci_controller *hose;
+	struct pnv_phb        *phb;
+	struct pnv_ioda_pe    *pe, *pe_n;
+	struct pci_dn         *pdn;
+
+	bus = pdev->bus;
+	hose = pci_bus_to_host(bus);
+	phb = hose->private_data;
+
+	if (!pdev->is_physfn)
+		return;
+
+	pdn = pci_get_pdn(pdev);
+	list_for_each_entry_safe(pe, pe_n, &phb->ioda.pe_list, list) {
+		if (pe->parent_dev != pdev)
+			continue;
+
+		pnv_pci_release_dev_dma(pdev, pe);
+
+		/* Remove from list */
+		mutex_lock(&phb->ioda.pe_list_mutex);
+		list_del(&pe->list);
+		mutex_unlock(&phb->ioda.pe_list_mutex);
+
+		pnv_ioda_deconfigure_pe(phb, pe);
+
+		pnv_ioda_free_pe(phb, pe->pe_number);
+	}
+}
+
 int pcibios_sriov_disable(struct pci_dev *pdev)
 {
-	struct pci_sriov *iov;
+	struct pci_bus        *bus;
+	struct pci_controller *hose;
+	struct pnv_phb        *phb;
+	struct pci_dn         *pdn;
+	struct pci_sriov      *iov;
 	u16 vf_num;
 
+	bus = pdev->bus;
+	hose = pci_bus_to_host(bus);
+	phb = hose->private_data;
+	pdn = pci_get_pdn(pdev);
 	iov = pdev->sriov;
 	vf_num = iov->num_VFs;
+
+	/* Release VF PEs */
+	pnv_ioda_release_vf_PE(pdev);
 	pnv_pci_release_vf_node(pdev, vf_num);
 
+	if (phb->type == PNV_PHB_IODA2) {
+		pnv_pci_vf_resource_shift(pdev, -pdn->offset);
+
+		/* Release M64 BARs */
+		pnv_pci_vf_release_m64(pdev);
+
+		/* Release PE numbers */
+		bitmap_clear(phb->ioda.pe_alloc, pdn->offset, vf_num);
+		pdn->offset = 0;
+	}
+
 	return 0;
 }
 
+static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
+				       struct pnv_ioda_pe *pe);
+static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 vf_num)
+{
+	struct pci_bus        *bus;
+	struct pci_controller *hose;
+	struct pnv_phb        *phb;
+	struct pnv_ioda_pe    *pe;
+	int                    pe_num;
+	u16                    vf_index;
+	struct pci_dn         *pdn;
+
+	bus = pdev->bus;
+	hose = pci_bus_to_host(bus);
+	phb = hose->private_data;
+	pdn = pci_get_pdn(pdev);
+
+	if (!pdev->is_physfn)
+		return;
+
+	/* Reserve PE for each VF */
+	for (vf_index = 0; vf_index < vf_num; vf_index++) {
+		pe_num = pdn->offset + vf_index;
+
+		pe = &phb->ioda.pe_array[pe_num];
+		pe->pe_number = pe_num;
+		pe->phb = phb;
+		pe->flags = PNV_IODA_PE_VF;
+		pe->pbus = NULL;
+		pe->parent_dev = pdev;
+		pe->tce32_seg = -1;
+		pe->mve_number = -1;
+		pe->rid = (pci_iov_virtfn_bus(pdev, vf_index) << 8) |
+			   pci_iov_virtfn_devfn(pdev, vf_index);
+
+		pe_info(pe, "VF %04d:%02d:%02d.%d associated with PE#%d\n",
+			hose->global_number, pdev->bus->number,
+			PCI_SLOT(pci_iov_virtfn_devfn(pdev, vf_index)),
+			PCI_FUNC(pci_iov_virtfn_devfn(pdev, vf_index)), pe_num);
+
+		if (pnv_ioda_configure_pe(phb, pe)) {
+			/* XXX What do we do here ? */
+			if (pe_num)
+				pnv_ioda_free_pe(phb, pe_num);
+			pe->pdev = NULL;
+			continue;
+		}
+
+		pe->tce32_table = kzalloc_node(sizeof(struct iommu_table),
+				GFP_KERNEL, hose->node);
+		pe->tce32_table->data = pe;
+
+		/* Put PE to the list */
+		mutex_lock(&phb->ioda.pe_list_mutex);
+		list_add_tail(&pe->list, &phb->ioda.pe_list);
+		mutex_unlock(&phb->ioda.pe_list_mutex);
+
+		pnv_pci_ioda2_setup_dma_pe(phb, pe);
+
+	}
+}
+
 int pcibios_sriov_enable(struct pci_dev *pdev, u16 vf_num)
 {
+	struct pci_bus        *bus;
+	struct pci_controller *hose;
+	struct pnv_phb        *phb;
+	struct pci_dn         *pdn;
+	int                    ret;
+
+	bus = pdev->bus;
+	hose = pci_bus_to_host(bus);
+	phb = hose->private_data;
+	pdn = pci_get_pdn(pdev);
+
+	if (phb->type == PNV_PHB_IODA2) {
+		/* Calculate available PE for required VFs */
+		mutex_lock(&phb->ioda.pe_alloc_mutex);
+		pdn->offset = bitmap_find_next_zero_area(
+			phb->ioda.pe_alloc, phb->ioda.total_pe,
+			0, vf_num, 0);
+		if (pdn->offset >= phb->ioda.total_pe) {
+			mutex_unlock(&phb->ioda.pe_alloc_mutex);
+			pr_info("Failed to enable %d VFs, reduce VF number"
+				" and try again\n", vf_num);
+			pdn->offset = 0;
+			return -EBUSY;
+		}
+		bitmap_set(phb->ioda.pe_alloc, pdn->offset, vf_num);
+		mutex_unlock(&phb->ioda.pe_alloc_mutex);
+
+		/* Assign M64 BAR accordingly */
+		ret = pnv_pci_vf_assign_m64(pdev);
+		if (ret) {
+			pr_info("No enough M64 resource\n");
+			goto m64_failed;
+		}
+
+		/* Do some magic shift */
+		pnv_pci_vf_resource_shift(pdev, pdn->offset);
+	}
+
+	/* Setup VF PEs */
 	pnv_pci_create_vf_node(pdev, vf_num);
+	pnv_ioda_setup_vf_PE(pdev, vf_num);
 
 	return 0;
+
+m64_failed:
+	bitmap_clear(phb->ioda.pe_alloc, pdn->offset, vf_num);
+	pdn->offset = 0;
+
+	return ret;
 }
 #endif /* CONFIG_PCI_IOV */
 
@@ -1095,12 +1391,22 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 			       TCE_PCI_SWINV_PAIR;
 	}
 	iommu_init_table(tbl, phb->hose->node);
-	iommu_register_group(tbl, pci_domain_nr(pe->pbus), pe->pe_number);
 
-	if (pe->pdev)
+	if (pe->flags & PNV_IODA_PE_DEV) {
+		iommu_register_group(tbl, pci_domain_nr(pe->pdev->bus),
+				pe->pe_number);
 		set_iommu_table_base_and_group(&pe->pdev->dev, tbl);
-	else
+	}
+	else if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)) {
+		iommu_register_group(tbl, pci_domain_nr(pe->pbus),
+				pe->pe_number);
 		pnv_ioda_setup_bus_dma(pe, pe->pbus);
+	}
+#ifdef CONFIG_PCI_IOV
+	else if (pe->flags & PNV_IODA_PE_VF)
+		iommu_register_group(tbl, pci_domain_nr(pe->parent_dev->bus),
+				pe->pe_number);
+#endif /* CONFIG_PCI_IOV */
 
 	return;
  fail:
@@ -1223,12 +1529,22 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 		tbl->it_type = TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE;
 	}
 	iommu_init_table(tbl, phb->hose->node);
-	iommu_register_group(tbl, pci_domain_nr(pe->pbus), pe->pe_number);
 
-	if (pe->pdev)
+	if (pe->flags & PNV_IODA_PE_DEV) {
+		iommu_register_group(tbl, pci_domain_nr(pe->pdev->bus),
+				pe->pe_number);
 		set_iommu_table_base_and_group(&pe->pdev->dev, tbl);
-	else
+	}
+	else if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)) {
+		iommu_register_group(tbl, pci_domain_nr(pe->pbus),
+				pe->pe_number);
 		pnv_ioda_setup_bus_dma(pe, pe->pbus);
+	}
+#ifdef CONFIG_PCI_IOV
+	else if (pe->flags & PNV_IODA_PE_VF)
+		iommu_register_group(tbl, pci_domain_nr(pe->parent_dev->bus),
+				pe->pe_number);
+#endif /* CONFIG_PCI_IOV */
 
 	/* Also create a bypass window */
 	pnv_pci_ioda2_setup_bypass_pe(phb, pe);
@@ -1813,6 +2129,7 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	phb->hub_id = hub_id;
 	phb->opal_id = phb_id;
 	phb->type = ioda_type;
+	mutex_init(&phb->ioda.pe_alloc_mutex);
 
 	/* Detect specific models for error handling */
 	if (of_device_is_compatible(np, "ibm,p7ioc-pciex"))
@@ -1873,6 +2190,7 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np,
 
 	INIT_LIST_HEAD(&phb->ioda.pe_dma_list);
 	INIT_LIST_HEAD(&phb->ioda.pe_list);
+	mutex_init(&phb->ioda.pe_list_mutex);
 
 	/* Calculate how many 32-bit TCE segments we have */
 	phb->ioda.tce32_count = phb->ioda.m32_pci_base >> PNV_TCE32_SEG_SHIFT;
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 9fbf7c0..e3ca524 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -28,8 +28,9 @@ static inline bool is_mem_pref_64_type(unsigned long flags)
 
 #define PNV_PCI_DIAG_BUF_SIZE	8192
 #define PNV_IODA_PE_DEV		(1 << 0)	/* PE has single PCI device	*/
-#define PNV_IODA_PE_BUS		(1 << 1)	/* PE has primary PCI bus	*/
-#define PNV_IODA_PE_BUS_ALL	(1 << 2)	/* PE has subordinate buses	*/
+#define PNV_IODA_PE_VF		(1 << 1)	/* PE for one VF 		*/
+#define PNV_IODA_PE_BUS		(1 << 2)	/* PE has primary PCI bus	*/
+#define PNV_IODA_PE_BUS_ALL	(1 << 3)	/* PE has subordinate buses	*/
 
 /* Data associated with a PE, including IOMMU tracking etc.. */
 struct pnv_phb;
@@ -41,6 +42,9 @@ struct pnv_ioda_pe {
 	 * entire bus (& children). In the former case, pdev
 	 * is populated, in the later case, pbus is.
 	 */
+#ifdef CONFIG_PCI_IOV
+	struct pci_dev          *parent_dev;
+#endif
 	struct pci_dev		*pdev;
 	struct pci_bus		*pbus;
 
@@ -156,6 +160,7 @@ struct pnv_phb {
 
 			/* PE allocation bitmap */
 			unsigned long		*pe_alloc;
+			struct mutex             pe_alloc_mutex;
 
 			/* M64 window allocation bitmap */
 			unsigned long		m64win_alloc;
@@ -174,6 +179,7 @@ struct pnv_phb {
 			 * on the sequence of creation
 			 */
 			struct list_head	pe_list;
+			struct mutex            pe_list_mutex;
 
 			/* Reverse map of PEs, will have to extend if
 			 * we are to support more than 256 PEs, indexed
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH V3 16/17] ppc/pci: Expanding IOV BAR, with m64_per_iov supported
  2014-06-10  1:56 ` Wei Yang
@ 2014-06-10  1:56   ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-10  1:56 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu, Wei Yang

The M64 aperture size is limited on PHB3. When the IOV BAR is too big, it
exceeds this limit and fails to be assigned.

This patch introduces a different expansion strategy based on the VF BAR size:

If the VF BAR size is smaller than 64M, expand the IOV BAR to total_pe VFs.
If the VF BAR size is bigger than 64M, round the number of VFs up to a power of two.
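
A rough, self-contained sketch of the two strategies follows; total_pe,
total_vfs and the per-VF BAR sizes are assumed values, not read from hardware:

#include <stdio.h>

/* Toy model of the two expansion strategies described above. */
static unsigned int roundup_pow_of_two_u(unsigned int x)
{
	unsigned int r = 1;

	while (r < x)
		r <<= 1;
	return r;
}

int main(void)
{
	unsigned int total_pe = 256;               /* PEs on the PHB (assumed) */
	unsigned int total_vfs = 63;               /* SRIOV TotalVFs (assumed) */
	unsigned long long sizes[2] = { 8ULL << 20, 128ULL << 20 };
	int i;

	for (i = 0; i < 2; i++) {
		unsigned long long size = sizes[i];
		unsigned int mul = (size > (1ULL << 26)) ?
			roundup_pow_of_two_u(total_vfs) : total_pe;

		printf("VF BAR %lluM -> IOV BAR expanded to %u x %lluM = %lluM\n",
		       size >> 20, mul, size >> 20, (mul * size) >> 20);
	}
	return 0;
}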

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h     |    2 ++
 arch/powerpc/platforms/powernv/pci-ioda.c |   28 ++++++++++++++++++++++++++--
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 72f0af5..36b88e4 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -171,6 +171,8 @@ struct pci_dn {
 #ifdef CONFIG_PCI_IOV
 	u16     vfs;
 	int     offset;
+#define M64_PER_IOV 4
+	int     m64_per_iov;
 	int     m64_wins[PCI_SRIOV_NUM_BARS];
 #endif /* CONFIG_PCI_IOV */
 #endif
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index fb2c2c6..98fc163 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1756,6 +1756,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 	int i;
 	resource_size_t size;
 	struct pci_dn *pdn;
+	int mul, total_vfs;
 
 	if (!pdev->is_physfn || pdev->is_added)
 		return;
@@ -1775,6 +1776,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 	pdn = pci_get_pdn(pdev);
 	pdn->vfs = 0;
 
+	total_vfs = pci_sriov_get_totalvfs(pdev);
+	pdn->m64_per_iov = 1;
+	mul = phb->ioda.total_pe;
+
 	for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++) {
 		res = &pdev->resource[i];
 		if (!res->flags || res->parent)
@@ -1783,13 +1788,32 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 		if (!is_mem_pref_64_type(res->flags))
 			continue;
 
+		size = pci_sriov_resource_size(pdev, i);
+
+		/* bigger than 64M */
+		if (size > (1 << 26)) {
+			dev_info(&pdev->dev, "PowerNV: VF BAR[%d] size "
+					"is bigger than 64M, roundup power2\n", i);
+			pdn->m64_per_iov = M64_PER_IOV;
+			mul = __roundup_pow_of_two(total_vfs);
+			break;
+		}
+	}
+
+	for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++) {
+		res = &pdev->resource[i];
+		if (!res->flags || res->parent)
+			continue;
+		if (!is_mem_pref_64_type(res->flags))
+			continue;
+
 		dev_info(&pdev->dev, "PowerNV: Fixing VF BAR[%d] %pR to\n",
 				i, res);
 		size = pci_sriov_resource_size(pdev, i);
-		res->end = res->start + size * phb->ioda.total_pe - 1;
+		res->end = res->start + size * mul - 1;
 		dev_info(&pdev->dev, "                       %pR\n", res);
 	}
-	pdn->vfs = phb->ioda.total_pe;
+	pdn->vfs = mul;
 }
 
 static void pnv_pci_ioda_fixup_sriov(struct pci_bus *bus)
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [RFC PATCH V3 17/17] ppc/pnv: Group VF PE when IOV BAR is big on PHB3
  2014-06-10  1:56 ` Wei Yang
@ 2014-06-10  1:56   ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-10  1:56 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu, Wei Yang

When the IOV BAR is big, each of them is covered by 4 M64 windows. This leads
to several VF PEs sharing one PE in terms of M64.

This patch groups VF PEs according to the M64 allocation.
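
A rough sketch of the grouping arithmetic only (vf_num is an assumed value;
the window and PE bookkeeping done by the real code is omitted):

#include <stdio.h>

/* Toy model of the grouping: with 4 M64 windows per IOV BAR, the VFs are
 * split into 4 groups of vf_per_group, and all VFs of a group share that
 * group's window. */
static unsigned int roundup_pow_of_two_u(unsigned int x)
{
	unsigned int r = 1;

	while (r < x)
		r <<= 1;
	return r;
}

int main(void)
{
	const int m64_per_iov = 4;
	int vf_num = 10;                           /* assumed VF count */
	int vf_per_group = roundup_pow_of_two_u(vf_num) / m64_per_iov;
	int vf;

	for (vf = 0; vf < vf_num; vf++)
		printf("VF%d -> group %d (M64 window %d of this BAR)\n",
		       vf, vf / vf_per_group, vf / vf_per_group);
	return 0;
}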

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h     |    2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |  183 +++++++++++++++++++++++------
 2 files changed, 145 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 36b88e4..f0a21f5 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -173,7 +173,7 @@ struct pci_dn {
 	int     offset;
 #define M64_PER_IOV 4
 	int     m64_per_iov;
-	int     m64_wins[PCI_SRIOV_NUM_BARS];
+	int     m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
 #endif /* CONFIG_PCI_IOV */
 #endif
 };
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 98fc163..86688cd 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -804,26 +804,27 @@ static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
 	struct pci_controller *hose;
 	struct pnv_phb        *phb;
 	struct pci_dn         *pdn;
-	int                    i;
+	int                    i, j;
 
 	bus = pdev->bus;
 	hose = pci_bus_to_host(bus);
 	phb = hose->private_data;
 	pdn = pci_get_pdn(pdev);
 
-	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
-		if (pdn->m64_wins[i] == -1)
-			continue;
-		opal_pci_phb_mmio_enable(phb->opal_id,
-				OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i], 0);
-		clear_bit(pdn->m64_wins[i], &phb->ioda.m64win_alloc);
-		pdn->m64_wins[i] = -1;
-	}
+	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
+		for (j = 0; j < M64_PER_IOV; j++) {
+			if (pdn->m64_wins[i][j] == -1)
+				continue;
+			opal_pci_phb_mmio_enable(phb->opal_id,
+				OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0);
+			clear_bit(pdn->m64_wins[i][j], &phb->ioda.m64win_alloc);
+			pdn->m64_wins[i][j] = -1;
+		}
 
 	return 0;
 }
 
-static int pnv_pci_vf_assign_m64(struct pci_dev *pdev)
+static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 vf_num)
 {
 	struct pci_bus        *bus;
 	struct pci_controller *hose;
@@ -831,17 +832,33 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev)
 	struct pci_dn         *pdn;
 	unsigned int           win;
 	struct resource       *res;
-	int                    i;
+	int                    i, j;
 	int64_t                rc;
+	int                    total_vfs;
+	resource_size_t        size, start;
+	int                    pe_num;
+	int                    vf_groups;
+	int                    vf_per_group;
 
 	bus = pdev->bus;
 	hose = pci_bus_to_host(bus);
 	phb = hose->private_data;
 	pdn = pci_get_pdn(pdev);
+	total_vfs = pci_sriov_get_totalvfs(pdev);
 
 	/* Initialize the m64_wins to -1 */
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
-		pdn->m64_wins[i] = -1;
+		for (j = 0; j < 4; j++)
+			pdn->m64_wins[i][j] = -1;
+
+	if (pdn->m64_per_iov == M64_PER_IOV) {
+		vf_groups = (vf_num <= M64_PER_IOV) ? vf_num: M64_PER_IOV;
+		vf_per_group = (vf_num <= M64_PER_IOV)? 1:
+			__roundup_pow_of_two(vf_num) / pdn->m64_per_iov;
+	} else {
+		vf_groups = 1;
+		vf_per_group = 1;
+	}
 
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
 		res = pdev->resource + PCI_IOV_RESOURCES + i;
@@ -851,33 +868,61 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev)
 		if (!is_mem_pref_64_type(res->flags))
 			continue;
 
-		do {
-			win = find_next_zero_bit(&phb->ioda.m64win_alloc,
-					phb->ioda.m64_bars, 0);
-
-			if (win >= phb->ioda.m64_bars)
-				goto m64_failed;
-		} while (test_and_set_bit(win, &phb->ioda.m64win_alloc));
+		for (j = 0; j < vf_groups; j++) {
+			do {
+				win = find_next_zero_bit(&phb->ioda.m64win_alloc,
+						phb->ioda.m64_bars, 0);
+
+				if (win >= phb->ioda.m64_bars)
+					goto m64_failed;
+			} while (test_and_set_bit(win, &phb->ioda.m64win_alloc));
+
+			pdn->m64_wins[i][j] = win;
+
+			if (pdn->m64_per_iov == M64_PER_IOV) {
+				size = pci_sriov_resource_size(pdev,
+						PCI_IOV_RESOURCES + i);
+				size = size * vf_per_group;
+				start = res->start + size * j;
+			} else {
+				size = resource_size(res);
+				start = res->start;
+			}
 
-		pdn->m64_wins[i] = win;
+			/* Map the M64 here */
+			if (pdn->m64_per_iov == M64_PER_IOV) {
+				pe_num = pdn->offset + j;
+				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+						pe_num, OPAL_M64_WINDOW_TYPE,
+						pdn->m64_wins[i][j], 0);
+			}
 
-		/* Map the M64 here */
-		rc = opal_pci_set_phb_mem_window(phb->opal_id,
+			rc = opal_pci_set_phb_mem_window(phb->opal_id,
 						 OPAL_M64_WINDOW_TYPE,
-						 pdn->m64_wins[i],
-						 res->start,
+						 pdn->m64_wins[i][j],
+						 start,
 						 0, /* unused */
-						 resource_size(res));
-		if (rc != OPAL_SUCCESS) {
-			pr_err("Failed to map M64 BAR #%d: %lld\n", win, rc);
-			goto m64_failed;
-		}
+						 size);
 
-		rc = opal_pci_phb_mmio_enable(phb->opal_id,
-				OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i], 1);
-		if (rc != OPAL_SUCCESS) {
-			pr_err("Failed to enable M64 BAR #%d: %llx\n", win, rc);
-			goto m64_failed;
+
+			if (rc != OPAL_SUCCESS) {
+				pr_err("Failed to set M64 BAR #%d: %lld\n",
+						win, rc);
+				goto m64_failed;
+			}
+
+			if (pdn->m64_per_iov == M64_PER_IOV)
+				rc = opal_pci_phb_mmio_enable(phb->opal_id,
+				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 2);
+			else
+				rc = opal_pci_phb_mmio_enable(phb->opal_id,
+				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 1);
+
+			if (rc != OPAL_SUCCESS) {
+				pr_err("Failed to enable M64 BAR #%d: %llx\n",
+						win, rc);
+				goto m64_failed;
+			}
 		}
 	}
 	return 0;
@@ -987,21 +1032,51 @@ static void pnv_pci_release_vf_node(struct pci_dev *dev, u16 vf_num)
 	}
 }
 
-static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
+static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 vf_num)
 {
 	struct pci_bus        *bus;
 	struct pci_controller *hose;
 	struct pnv_phb        *phb;
 	struct pnv_ioda_pe    *pe, *pe_n;
 	struct pci_dn         *pdn;
+	u16                    vf_index;
+	int64_t                rc;
 
 	bus = pdev->bus;
 	hose = pci_bus_to_host(bus);
 	phb = hose->private_data;
+	pdn = pci_get_pdn(pdev);
 
 	if (!pdev->is_physfn)
 		return;
 
+	if (pdn->m64_per_iov == M64_PER_IOV && vf_num > M64_PER_IOV) {
+		int   vf_group;
+		int   vf_per_group;
+		int   vf_index1;
+
+		vf_per_group = __roundup_pow_of_two(vf_num) / pdn->m64_per_iov;
+
+		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++)
+			for (vf_index = vf_group * vf_per_group;
+				vf_index < (vf_group + 1) * vf_per_group;
+				vf_index++)
+				for (vf_index1 = vf_group * vf_per_group;
+					vf_index1 < (vf_group + 1) * vf_per_group;
+					vf_index1++){
+
+					rc = opal_pci_set_peltv(phb->opal_id,
+						pdn->offset + vf_index,
+						pdn->offset + vf_index1,
+						OPAL_REMOVE_PE_FROM_DOMAIN);
+
+					if (rc)
+					    pr_warn("%s: Failed to unlink same"
+						" group PE#%d(%lld)\n", __func__,
+						pdn->offset + vf_index1, rc);
+				}
+	}
+
 	pdn = pci_get_pdn(pdev);
 	list_for_each_entry_safe(pe, pe_n, &phb->ioda.pe_list, list) {
 		if (pe->parent_dev != pdev)
@@ -1037,11 +1112,12 @@ int pcibios_sriov_disable(struct pci_dev *pdev)
 	vf_num = iov->num_VFs;
 
 	/* Release VF PEs */
-	pnv_ioda_release_vf_PE(pdev);
+	pnv_ioda_release_vf_PE(pdev, vf_num);
 	pnv_pci_release_vf_node(pdev, vf_num);
 
 	if (phb->type == PNV_PHB_IODA2) {
-		pnv_pci_vf_resource_shift(pdev, -pdn->offset);
+		if (pdn->m64_per_iov == 1)
+			pnv_pci_vf_resource_shift(pdev, -pdn->offset);
 
 		/* Release M64 BARs */
 		pnv_pci_vf_release_m64(pdev);
@@ -1065,6 +1141,7 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 vf_num)
 	int                    pe_num;
 	u16                    vf_index;
 	struct pci_dn         *pdn;
+	int64_t                rc;
 
 	bus = pdev->bus;
 	hose = pci_bus_to_host(bus);
@@ -1112,7 +1189,34 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 vf_num)
 		mutex_unlock(&phb->ioda.pe_list_mutex);
 
 		pnv_pci_ioda2_setup_dma_pe(phb, pe);
+	}
 
+	if (pdn->m64_per_iov == M64_PER_IOV && vf_num > M64_PER_IOV) {
+		int   vf_group;
+		int   vf_per_group;
+		int   vf_index1;
+
+		vf_per_group = __roundup_pow_of_two(vf_num) / pdn->m64_per_iov;
+
+		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++)
+			for (vf_index = vf_group * vf_per_group;
+				vf_index < (vf_group + 1) * vf_per_group;
+				vf_index++)
+				for (vf_index1 = vf_group * vf_per_group;
+					vf_index1 < (vf_group + 1) * vf_per_group;
+					vf_index1++) {
+
+					rc = opal_pci_set_peltv(phb->opal_id,
+						pdn->offset + vf_index,
+						pdn->offset + vf_index1,
+						OPAL_ADD_PE_TO_DOMAIN);
+
+					if (rc)
+					    pr_warn("%s: Failed to link same "
+						"group PE#%d(%lld)\n",
+						__func__,
+						pdn->offset + vf_index1, rc);
+			}
 	}
 }
 
@@ -1146,14 +1250,15 @@ int pcibios_sriov_enable(struct pci_dev *pdev, u16 vf_num)
 		mutex_unlock(&phb->ioda.pe_alloc_mutex);
 
 		/* Assign M64 BAR accordingly */
-		ret = pnv_pci_vf_assign_m64(pdev);
+		ret = pnv_pci_vf_assign_m64(pdev, vf_num);
 		if (ret) {
 			pr_info("No enough M64 resource\n");
 			goto m64_failed;
 		}
 
 		/* Do some magic shift */
-		pnv_pci_vf_resource_shift(pdev, pdn->offset);
+		if (pdn->m64_per_iov == 1)
+			pnv_pci_vf_resource_shift(pdev, pdn->offset);
 	}
 
 	/* Setup VF PEs */
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 100+ messages in thread

+			rc = opal_pci_set_phb_mem_window(phb->opal_id,
 						 OPAL_M64_WINDOW_TYPE,
-						 pdn->m64_wins[i],
-						 res->start,
+						 pdn->m64_wins[i][j],
+						 start,
 						 0, /* unused */
-						 resource_size(res));
-		if (rc != OPAL_SUCCESS) {
-			pr_err("Failed to map M64 BAR #%d: %lld\n", win, rc);
-			goto m64_failed;
-		}
+						 size);
 
-		rc = opal_pci_phb_mmio_enable(phb->opal_id,
-				OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i], 1);
-		if (rc != OPAL_SUCCESS) {
-			pr_err("Failed to enable M64 BAR #%d: %llx\n", win, rc);
-			goto m64_failed;
+
+			if (rc != OPAL_SUCCESS) {
+				pr_err("Failed to set M64 BAR #%d: %lld\n",
+						win, rc);
+				goto m64_failed;
+			}
+
+			if (pdn->m64_per_iov == M64_PER_IOV)
+				rc = opal_pci_phb_mmio_enable(phb->opal_id,
+				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 2);
+			else
+				rc = opal_pci_phb_mmio_enable(phb->opal_id,
+				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 1);
+
+			if (rc != OPAL_SUCCESS) {
+				pr_err("Failed to enable M64 BAR #%d: %llx\n",
+						win, rc);
+				goto m64_failed;
+			}
 		}
 	}
 	return 0;
@@ -987,21 +1032,51 @@ static void pnv_pci_release_vf_node(struct pci_dev *dev, u16 vf_num)
 	}
 }
 
-static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
+static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 vf_num)
 {
 	struct pci_bus        *bus;
 	struct pci_controller *hose;
 	struct pnv_phb        *phb;
 	struct pnv_ioda_pe    *pe, *pe_n;
 	struct pci_dn         *pdn;
+	u16                    vf_index;
+	int64_t                rc;
 
 	bus = pdev->bus;
 	hose = pci_bus_to_host(bus);
 	phb = hose->private_data;
+	pdn = pci_get_pdn(pdev);
 
 	if (!pdev->is_physfn)
 		return;
 
+	if (pdn->m64_per_iov == M64_PER_IOV && vf_num > M64_PER_IOV) {
+		int   vf_group;
+		int   vf_per_group;
+		int   vf_index1;
+
+		vf_per_group = __roundup_pow_of_two(vf_num) / pdn->m64_per_iov;
+
+		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++)
+			for (vf_index = vf_group * vf_per_group;
+				vf_index < (vf_group + 1) * vf_per_group;
+				vf_index++)
+				for (vf_index1 = vf_group * vf_per_group;
+					vf_index1 < (vf_group + 1) * vf_per_group;
+					vf_index1++){
+
+					rc = opal_pci_set_peltv(phb->opal_id,
+						pdn->offset + vf_index,
+						pdn->offset + vf_index1,
+						OPAL_REMOVE_PE_FROM_DOMAIN);
+
+					if (rc)
+					    pr_warn("%s: Failed to unlink same"
+						" group PE#%d(%lld)\n", __func__,
+						pdn->offset + vf_index1, rc);
+				}
+	}
+
 	pdn = pci_get_pdn(pdev);
 	list_for_each_entry_safe(pe, pe_n, &phb->ioda.pe_list, list) {
 		if (pe->parent_dev != pdev)
@@ -1037,11 +1112,12 @@ int pcibios_sriov_disable(struct pci_dev *pdev)
 	vf_num = iov->num_VFs;
 
 	/* Release VF PEs */
-	pnv_ioda_release_vf_PE(pdev);
+	pnv_ioda_release_vf_PE(pdev, vf_num);
 	pnv_pci_release_vf_node(pdev, vf_num);
 
 	if (phb->type == PNV_PHB_IODA2) {
-		pnv_pci_vf_resource_shift(pdev, -pdn->offset);
+		if (pdn->m64_per_iov == 1)
+			pnv_pci_vf_resource_shift(pdev, -pdn->offset);
 
 		/* Release M64 BARs */
 		pnv_pci_vf_release_m64(pdev);
@@ -1065,6 +1141,7 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 vf_num)
 	int                    pe_num;
 	u16                    vf_index;
 	struct pci_dn         *pdn;
+	int64_t                rc;
 
 	bus = pdev->bus;
 	hose = pci_bus_to_host(bus);
@@ -1112,7 +1189,34 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 vf_num)
 		mutex_unlock(&phb->ioda.pe_list_mutex);
 
 		pnv_pci_ioda2_setup_dma_pe(phb, pe);
+	}
 
+	if (pdn->m64_per_iov == M64_PER_IOV && vf_num > M64_PER_IOV) {
+		int   vf_group;
+		int   vf_per_group;
+		int   vf_index1;
+
+		vf_per_group = __roundup_pow_of_two(vf_num) / pdn->m64_per_iov;
+
+		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++)
+			for (vf_index = vf_group * vf_per_group;
+				vf_index < (vf_group + 1) * vf_per_group;
+				vf_index++)
+				for (vf_index1 = vf_group * vf_per_group;
+					vf_index1 < (vf_group + 1) * vf_per_group;
+					vf_index1++) {
+
+					rc = opal_pci_set_peltv(phb->opal_id,
+						pdn->offset + vf_index,
+						pdn->offset + vf_index1,
+						OPAL_ADD_PE_TO_DOMAIN);
+
+					if (rc)
+					    pr_warn("%s: Failed to link same "
+						"group PE#%d(%lld)\n",
+						__func__,
+						pdn->offset + vf_index1, rc);
+			}
 	}
 }
 
@@ -1146,14 +1250,15 @@ int pcibios_sriov_enable(struct pci_dev *pdev, u16 vf_num)
 		mutex_unlock(&phb->ioda.pe_alloc_mutex);
 
 		/* Assign M64 BAR accordingly */
-		ret = pnv_pci_vf_assign_m64(pdev);
+		ret = pnv_pci_vf_assign_m64(pdev, vf_num);
 		if (ret) {
 			pr_info("No enough M64 resource\n");
 			goto m64_failed;
 		}
 
 		/* Do some magic shift */
-		pnv_pci_vf_resource_shift(pdev, pdn->offset);
+		if (pdn->m64_per_iov == 1)
+			pnv_pci_vf_resource_shift(pdev, pdn->offset);
 	}
 
 	/* Setup VF PEs */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 14/17] ppc/pci: create/release dev-tree node for VFs
  2014-06-10  1:56   ` Wei Yang
  (?)
@ 2014-06-18 18:26   ` Grant Likely
  2014-06-18 20:51       ` Benjamin Herrenschmidt
  2014-06-19  2:46     ` Wei Yang
  -1 siblings, 2 replies; 100+ messages in thread
From: Grant Likely @ 2014-06-18 18:26 UTC (permalink / raw)
  To: Wei Yang
  Cc: Benjamin Herrenschmidt, linux-pci, gwshan, Mike Qiu,
	Bjorn Helgaas, yan, linuxppc-dev

On Tue, Jun 10, 2014 at 2:56 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote:
> Currently, powernv platform is not aware of VFs. This means no dev-node
> represents a VF. Also, VF PCI device is created when PF driver want to enable
> it. This leads to the pdn->pdev and pdn->pe_number an invalid value.
>
> This patch create/release dev-node for VF and fixs this when a VF's pci_dev
> is created.
>
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>

I don't think this is the right way to handle this. Unless it is a
fixup to a buggy devicetree provided by firmware, I don't want to see
any code modifying the devicetree to describe stuff that is able to be
directly enumerated. Really the pci code should handle the lack of a
device_node gracefully. If it cannot then it should be fixed.

g.

> ---
>  arch/powerpc/platforms/powernv/Kconfig    |    1 +
>  arch/powerpc/platforms/powernv/pci-ioda.c |  103 +++++++++++++++++++++++++++++
>  arch/powerpc/platforms/powernv/pci.c      |   20 ++++++
>  3 files changed, 124 insertions(+)
>
> diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
> index 895e8a2..0dd331b 100644
> --- a/arch/powerpc/platforms/powernv/Kconfig
> +++ b/arch/powerpc/platforms/powernv/Kconfig
> @@ -11,6 +11,7 @@ config PPC_POWERNV
>         select PPC_UDBG_16550
>         select PPC_SCOM
>         select ARCH_RANDOM
> +       select OF_DYNAMIC
>         default y
>
>  config PPC_POWERNV_RTAS
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index e46c5bf..9ace027 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -23,6 +23,7 @@
>  #include <linux/io.h>
>  #include <linux/msi.h>
>  #include <linux/memblock.h>
> +#include <linux/of_pci.h>
>
>  #include <asm/sections.h>
>  #include <asm/io.h>
> @@ -771,6 +772,108 @@ static void pnv_pci_ioda_setup_PEs(void)
>         }
>  }
>
> +#ifdef CONFIG_PCI_IOV
> +static void pnv_pci_create_vf_node(struct pci_dev *dev, u16 vf_num)
> +{
> +       struct device_node *dn, *p_dn;
> +       struct pci_dn *pdn;
> +       struct pci_controller *hose;
> +       struct property *pp;
> +       void* value;
> +       u16 id;
> +
> +       hose = pci_bus_to_host(dev->bus);
> +
> +       /* Create dev-tree node for VFs if this is a PF */
> +       p_dn = pci_bus_to_OF_node(dev->bus);
> +       if (p_dn == NULL) {
> +               dev_err(&dev->dev, "SRIOV: VF bus NULL device node\n");
> +               return;
> +       }
> +
> +       for (id = 0; id < vf_num; id++) {
> +               dn = kzalloc(sizeof(*dn), GFP_KERNEL);
> +               pdn = kzalloc(sizeof(*pdn), GFP_KERNEL);
> +               pp  = kzalloc(sizeof(*pp), GFP_KERNEL);
> +               value = kzalloc(sizeof(u32), GFP_KERNEL);
> +
> +               if (!dn || !pdn || !pp || !value) {
> +                       kfree(dn);
> +                       kfree(pdn);
> +                       kfree(pp);
> +                       kfree(value);
> +                       dev_warn(&dev->dev, "%s: failed to create"
> +                               "dev-tree node for idx(%d)\n",
> +                               __func__, id);
> +
> +                       break;
> +               }
> +
> +               pp->value = value;
> +               pdn->node = dn;
> +               pdn->devfn = pci_iov_virtfn_devfn(dev, id);
> +               pdn->busno = dev->bus->number;
> +               pdn->pe_number = IODA_INVALID_PE;
> +               pdn->phb = hose;
> +
> +               dn->data = pdn;
> +               kref_init(&dn->kref);
> +               dn->full_name = dn->name =
> +                       kasprintf(GFP_KERNEL, "%s/vf%d",
> +                               p_dn->full_name, pdn->devfn);
> +               dn->parent = p_dn;
> +
> +               pp->name = kasprintf(GFP_KERNEL, "reg");
> +               pp->length = 5 * sizeof(__be32);
> +               *(u32*)pp->value = cpu_to_be32(pdn->devfn) << 8;
> +               dn->properties = pp;
> +
> +               of_attach_node(dn);
> +       }
> +}
> +
> +static void pnv_pci_release_vf_node(struct pci_dev *dev, u16 vf_num)
> +{
> +       struct device_node *dn;
> +       struct property *pp;
> +       u16 id;
> +
> +       for (id = 0; id < vf_num; id++) {
> +               dn = of_pci_find_child_device(dev->bus->dev.of_node,
> +                               pci_iov_virtfn_devfn(dev, id));
> +               if (!dn)
> +                       continue;
> +
> +               of_detach_node(dn);
> +               pp = dn->properties;
> +               kfree(pp->name);
> +               kfree(pp->value);
> +               kfree(pp);
> +               kfree(dn->data);
> +               kfree(dn);
> +       }
> +}
> +
> +int pcibios_sriov_disable(struct pci_dev *pdev)
> +{
> +       struct pci_sriov *iov;
> +       u16 vf_num;
> +
> +       iov = pdev->sriov;
> +       vf_num = iov->num_VFs;
> +       pnv_pci_release_vf_node(pdev, vf_num);
> +
> +       return 0;
> +}
> +
> +int pcibios_sriov_enable(struct pci_dev *pdev, u16 vf_num)
> +{
> +       pnv_pci_create_vf_node(pdev, vf_num);
> +
> +       return 0;
> +}
> +#endif /* CONFIG_PCI_IOV */
> +
>  static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb *phb, struct pci_dev *pdev)
>  {
>         struct pci_dn *pdn = pci_get_pdn(pdev);
> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> index 687a068..43fcc73 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -654,6 +654,26 @@ static void pnv_pci_dma_dev_setup(struct pci_dev *pdev)
>  {
>         struct pci_controller *hose = pci_bus_to_host(pdev->bus);
>         struct pnv_phb *phb = hose->private_data;
> +#ifdef CONFIG_PCI_IOV
> +       struct pnv_ioda_pe *pe;
> +       struct pci_dn *pdn;
> +
> +       /* Fix the VF pdn PE number */
> +       if (pdev->is_virtfn) {
> +               pdn = pci_get_pdn(pdev);
> +               if (pdn->pcidev == NULL || pdn->pe_number == IODA_INVALID_PE) {
> +                       list_for_each_entry(pe, &phb->ioda.pe_list, list) {
> +                               if (pe->rid ==
> +                                       ((pdev->bus->number << 8) | (pdev->devfn & 0xff))) {
> +                                       pdn->pcidev = pdev;
> +                                       pdn->pe_number = pe->pe_number;
> +                                       pe->pdev = pdev;
> +                                       break;
> +                               }
> +                       }
> +               }
> +       }
> +#endif /* CONFIG_PCI_IOV */
>
>         /* If we have no phb structure, try to setup a fallback based on
>          * the device-tree (RTAS PCI for example)
> --
> 1.7.9.5
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 14/17] ppc/pci: create/release dev-tree node for VFs
  2014-06-18 18:26   ` Grant Likely
@ 2014-06-18 20:51       ` Benjamin Herrenschmidt
  2014-06-19  2:46     ` Wei Yang
  1 sibling, 0 replies; 100+ messages in thread
From: Benjamin Herrenschmidt @ 2014-06-18 20:51 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-pci, gwshan, Mike Qiu, Bjorn Helgaas, yan, linuxppc-dev,
	Grant Likely

On Wed, 2014-06-18 at 19:26 +0100, Grant Likely wrote:
> I don't think this is the right way to handle this. Unless it is a
> fixup to a buggy devicetree provided by firmware, I don't want to see
> any code modifying the devicetree to describe stuff that is able to be
> directly enumerated. Really the pci code should handle the lack of a
> device_node gracefully. If it cannot then it should be fixed.

Right, I've long said that we need to get rid of that "pci_dn" structure
we've been carrying around forever on ppc64.

Any auxiliary data structures we keep around associated with a PCI
device should be pointed to by the pci_dev itself, possibly using
firmware_data or similar.

Cheers,
Ben.


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 14/17] ppc/pci: create/release dev-tree node for VFs
  2014-06-18 18:26   ` Grant Likely
  2014-06-18 20:51       ` Benjamin Herrenschmidt
@ 2014-06-19  2:46     ` Wei Yang
  2014-06-19  8:30       ` Grant Likely
  1 sibling, 1 reply; 100+ messages in thread
From: Wei Yang @ 2014-06-19  2:46 UTC (permalink / raw)
  To: Grant Likely
  Cc: Wei Yang, Benjamin Herrenschmidt, linux-pci, gwshan, Mike Qiu,
	Bjorn Helgaas, yan, linuxppc-dev

On Wed, Jun 18, 2014 at 07:26:27PM +0100, Grant Likely wrote:
>On Tue, Jun 10, 2014 at 2:56 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote:
>> Currently, powernv platform is not aware of VFs. This means no dev-node
>> represents a VF. Also, VF PCI device is created when PF driver want to enable
>> it. This leads to the pdn->pdev and pdn->pe_number an invalid value.
>>
>> This patch create/release dev-node for VF and fixs this when a VF's pci_dev
>> is created.
>>
>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>
>I don't think this is the right way to handle this. Unless it is a
>fixup to a buggy devicetree provided by firmware, I don't want to see
>any code modifying the devicetree to describe stuff that is able to be
>directly enumerated. Really the pci code should handle the lack of a
>device_node gracefully. If it cannot then it should be fixed.

Grant,

Glad to see your comment.

I will fix this in the firmware.

>
>g.
>
>> ---
>>  arch/powerpc/platforms/powernv/Kconfig    |    1 +
>>  arch/powerpc/platforms/powernv/pci-ioda.c |  103 +++++++++++++++++++++++++++++
>>  arch/powerpc/platforms/powernv/pci.c      |   20 ++++++
>>  3 files changed, 124 insertions(+)
>>
>> diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
>> index 895e8a2..0dd331b 100644
>> --- a/arch/powerpc/platforms/powernv/Kconfig
>> +++ b/arch/powerpc/platforms/powernv/Kconfig
>> @@ -11,6 +11,7 @@ config PPC_POWERNV
>>         select PPC_UDBG_16550
>>         select PPC_SCOM
>>         select ARCH_RANDOM
>> +       select OF_DYNAMIC
>>         default y
>>
>>  config PPC_POWERNV_RTAS
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>> index e46c5bf..9ace027 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -23,6 +23,7 @@
>>  #include <linux/io.h>
>>  #include <linux/msi.h>
>>  #include <linux/memblock.h>
>> +#include <linux/of_pci.h>
>>
>>  #include <asm/sections.h>
>>  #include <asm/io.h>
>> @@ -771,6 +772,108 @@ static void pnv_pci_ioda_setup_PEs(void)
>>         }
>>  }
>>
>> +#ifdef CONFIG_PCI_IOV
>> +static void pnv_pci_create_vf_node(struct pci_dev *dev, u16 vf_num)
>> +{
>> +       struct device_node *dn, *p_dn;
>> +       struct pci_dn *pdn;
>> +       struct pci_controller *hose;
>> +       struct property *pp;
>> +       void* value;
>> +       u16 id;
>> +
>> +       hose = pci_bus_to_host(dev->bus);
>> +
>> +       /* Create dev-tree node for VFs if this is a PF */
>> +       p_dn = pci_bus_to_OF_node(dev->bus);
>> +       if (p_dn == NULL) {
>> +               dev_err(&dev->dev, "SRIOV: VF bus NULL device node\n");
>> +               return;
>> +       }
>> +
>> +       for (id = 0; id < vf_num; id++) {
>> +               dn = kzalloc(sizeof(*dn), GFP_KERNEL);
>> +               pdn = kzalloc(sizeof(*pdn), GFP_KERNEL);
>> +               pp  = kzalloc(sizeof(*pp), GFP_KERNEL);
>> +               value = kzalloc(sizeof(u32), GFP_KERNEL);
>> +
>> +               if (!dn || !pdn || !pp || !value) {
>> +                       kfree(dn);
>> +                       kfree(pdn);
>> +                       kfree(pp);
>> +                       kfree(value);
>> +                       dev_warn(&dev->dev, "%s: failed to create"
>> +                               "dev-tree node for idx(%d)\n",
>> +                               __func__, id);
>> +
>> +                       break;
>> +               }
>> +
>> +               pp->value = value;
>> +               pdn->node = dn;
>> +               pdn->devfn = pci_iov_virtfn_devfn(dev, id);
>> +               pdn->busno = dev->bus->number;
>> +               pdn->pe_number = IODA_INVALID_PE;
>> +               pdn->phb = hose;
>> +
>> +               dn->data = pdn;
>> +               kref_init(&dn->kref);
>> +               dn->full_name = dn->name =
>> +                       kasprintf(GFP_KERNEL, "%s/vf%d",
>> +                               p_dn->full_name, pdn->devfn);
>> +               dn->parent = p_dn;
>> +
>> +               pp->name = kasprintf(GFP_KERNEL, "reg");
>> +               pp->length = 5 * sizeof(__be32);
>> +               *(u32*)pp->value = cpu_to_be32(pdn->devfn) << 8;
>> +               dn->properties = pp;
>> +
>> +               of_attach_node(dn);
>> +       }
>> +}
>> +
>> +static void pnv_pci_release_vf_node(struct pci_dev *dev, u16 vf_num)
>> +{
>> +       struct device_node *dn;
>> +       struct property *pp;
>> +       u16 id;
>> +
>> +       for (id = 0; id < vf_num; id++) {
>> +               dn = of_pci_find_child_device(dev->bus->dev.of_node,
>> +                               pci_iov_virtfn_devfn(dev, id));
>> +               if (!dn)
>> +                       continue;
>> +
>> +               of_detach_node(dn);
>> +               pp = dn->properties;
>> +               kfree(pp->name);
>> +               kfree(pp->value);
>> +               kfree(pp);
>> +               kfree(dn->data);
>> +               kfree(dn);
>> +       }
>> +}
>> +
>> +int pcibios_sriov_disable(struct pci_dev *pdev)
>> +{
>> +       struct pci_sriov *iov;
>> +       u16 vf_num;
>> +
>> +       iov = pdev->sriov;
>> +       vf_num = iov->num_VFs;
>> +       pnv_pci_release_vf_node(pdev, vf_num);
>> +
>> +       return 0;
>> +}
>> +
>> +int pcibios_sriov_enable(struct pci_dev *pdev, u16 vf_num)
>> +{
>> +       pnv_pci_create_vf_node(pdev, vf_num);
>> +
>> +       return 0;
>> +}
>> +#endif /* CONFIG_PCI_IOV */
>> +
>>  static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb *phb, struct pci_dev *pdev)
>>  {
>>         struct pci_dn *pdn = pci_get_pdn(pdev);
>> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>> index 687a068..43fcc73 100644
>> --- a/arch/powerpc/platforms/powernv/pci.c
>> +++ b/arch/powerpc/platforms/powernv/pci.c
>> @@ -654,6 +654,26 @@ static void pnv_pci_dma_dev_setup(struct pci_dev *pdev)
>>  {
>>         struct pci_controller *hose = pci_bus_to_host(pdev->bus);
>>         struct pnv_phb *phb = hose->private_data;
>> +#ifdef CONFIG_PCI_IOV
>> +       struct pnv_ioda_pe *pe;
>> +       struct pci_dn *pdn;
>> +
>> +       /* Fix the VF pdn PE number */
>> +       if (pdev->is_virtfn) {
>> +               pdn = pci_get_pdn(pdev);
>> +               if (pdn->pcidev == NULL || pdn->pe_number == IODA_INVALID_PE) {
>> +                       list_for_each_entry(pe, &phb->ioda.pe_list, list) {
>> +                               if (pe->rid ==
>> +                                       ((pdev->bus->number << 8) | (pdev->devfn & 0xff))) {
>> +                                       pdn->pcidev = pdev;
>> +                                       pdn->pe_number = pe->pe_number;
>> +                                       pe->pdev = pdev;
>> +                                       break;
>> +                               }
>> +                       }
>> +               }
>> +       }
>> +#endif /* CONFIG_PCI_IOV */
>>
>>         /* If we have no phb structure, try to setup a fallback based on
>>          * the device-tree (RTAS PCI for example)
>> --
>> 1.7.9.5
>>
>> _______________________________________________
>> Linuxppc-dev mailing list
>> Linuxppc-dev@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-dev

-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 14/17] ppc/pci: create/release dev-tree node for VFs
  2014-06-19  2:46     ` Wei Yang
@ 2014-06-19  8:30       ` Grant Likely
  2014-06-19  9:42         ` Wei Yang
  2014-06-20  3:46         ` Wei Yang
  0 siblings, 2 replies; 100+ messages in thread
From: Grant Likely @ 2014-06-19  8:30 UTC (permalink / raw)
  To: Wei Yang
  Cc: Benjamin Herrenschmidt, linux-pci, gwshan, Mike Qiu,
	Bjorn Helgaas, yan, linuxppc-dev

On Thu, Jun 19, 2014 at 3:46 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote:
> On Wed, Jun 18, 2014 at 07:26:27PM +0100, Grant Likely wrote:
>>On Tue, Jun 10, 2014 at 2:56 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote:
>>> Currently, powernv platform is not aware of VFs. This means no dev-node
>>> represents a VF. Also, VF PCI device is created when PF driver want to enable
>>> it. This leads to the pdn->pdev and pdn->pe_number an invalid value.
>>>
>>> This patch create/release dev-node for VF and fixs this when a VF's pci_dev
>>> is created.
>>>
>>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>
>>I don't think this is the right way to handle this. Unless it is a
>>fixup to a buggy devicetree provided by firmware, I don't want to see
>>any code modifying the devicetree to describe stuff that is able to be
>>directly enumerated. Really the pci code should handle the lack of a
>>device_node gracefully. If it cannot then it should be fixed.
>
> Grant,
>
> Glad to see your comment.
>
> I will fix this in the firmware.

That's not really what I meant. The kernel should be able to deal with
virtual functions even if firmware doesn't know how, and the kernel
should not require modifying the device tree to support them.

I'm saying fix the kernel so that a device node is not necessary for
virtual functions.

g.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 14/17] ppc/pci: create/release dev-tree node for VFs
  2014-06-19  8:30       ` Grant Likely
@ 2014-06-19  9:42         ` Wei Yang
  2014-06-20  3:46         ` Wei Yang
  1 sibling, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-19  9:42 UTC (permalink / raw)
  To: Grant Likely
  Cc: Wei Yang, Benjamin Herrenschmidt, linux-pci, gwshan, Mike Qiu,
	Bjorn Helgaas, yan, linuxppc-dev

On Thu, Jun 19, 2014 at 09:30:47AM +0100, Grant Likely wrote:
>On Thu, Jun 19, 2014 at 3:46 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote:
>> On Wed, Jun 18, 2014 at 07:26:27PM +0100, Grant Likely wrote:
>>>On Tue, Jun 10, 2014 at 2:56 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote:
>>>> Currently, powernv platform is not aware of VFs. This means no dev-node
>>>> represents a VF. Also, VF PCI device is created when PF driver want to enable
>>>> it. This leads to the pdn->pdev and pdn->pe_number an invalid value.
>>>>
>>>> This patch create/release dev-node for VF and fixs this when a VF's pci_dev
>>>> is created.
>>>>
>>>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>>
>>>I don't think this is the right way to handle this. Unless it is a
>>>fixup to a buggy devicetree provided by firmware, I don't want to see
>>>any code modifying the devicetree to describe stuff that is able to be
>>>directly enumerated. Really the pci code should handle the lack of a
>>>device_node gracefully. If it cannot then it should be fixed.
>>
>> Grant,
>>
>> Glad to see your comment.
>>
>> I will fix this in the firmware.
>
>That's not really what I meant. The kernel should be able to deal with
>virtual functions even if firmware doesn't know how, and the kernel
>should not require modifying the device tree to support them.
>
>I'm saying fix the kernel so that a device node is not necessary for
>virtual functions.

Oh, sorry for my poor understanding. Let me do some investigation to see
whether it is fine to get rid of the device node for VFs.

>
>g.

-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 14/17] ppc/pci: create/release dev-tree node for VFs
  2014-06-19  8:30       ` Grant Likely
  2014-06-19  9:42         ` Wei Yang
@ 2014-06-20  3:46         ` Wei Yang
  1 sibling, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-20  3:46 UTC (permalink / raw)
  To: Grant Likely
  Cc: Wei Yang, Benjamin Herrenschmidt, linux-pci, gwshan, Mike Qiu,
	Bjorn Helgaas, yan, linuxppc-dev

On Thu, Jun 19, 2014 at 09:30:47AM +0100, Grant Likely wrote:
>On Thu, Jun 19, 2014 at 3:46 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote:
>> On Wed, Jun 18, 2014 at 07:26:27PM +0100, Grant Likely wrote:
>>>On Tue, Jun 10, 2014 at 2:56 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote:
>>>> Currently, powernv platform is not aware of VFs. This means no dev-node
>>>> represents a VF. Also, VF PCI device is created when PF driver want to enable
>>>> it. This leads to the pdn->pdev and pdn->pe_number an invalid value.
>>>>
>>>> This patch create/release dev-node for VF and fixs this when a VF's pci_dev
>>>> is created.
>>>>
>>>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>>
>>>I don't think this is the right way to handle this. Unless it is a
>>>fixup to a buggy devicetree provided by firmware, I don't want to see
>>>any code modifying the devicetree to describe stuff that is able to be
>>>directly enumerated. Really the pci code should handle the lack of a
>>>device_node gracefully. If it cannot then it should be fixed.
>>
>> Grant,
>>
>> Glad to see your comment.
>>
>> I will fix this in the firmware.
>
>That's not really what I meant. The kernel should be able to deal with
>virtual functions even if firmware doesn't know how, and the kernel
>should not require modifying the device tree to support them.
>
>I'm saying fix the kernel so that a device node is not necessary for
>virtual functions.
>
>g.

Grant,

After doing some investigation, I found two places that rely heavily on this
information, and not only for VFs but also for PFs.

1. pnv_pci_read_config()/pnv_pci_cfg_read()
   When doing a config space read, this needs the phb information.
   In commit 61305a96, the phb is retrieved from the bus, and in commit
   9bf41be6 it switched to using the device node for the EEH hotplug case.
   VFs may face a similar case for EEH hotplug (this is under development).

   To get rid of the device node/pci_dn, we would need special handling for
   VFs. Hmm... that doesn't look nice.

2. pnv_pci_ioda_dma_dev_setup()/pnv_pci_ioda_dma_set_mask()
   pci_dn has a field, pe_number, which is used to retrieve the PE this
   pci device is associated with.

   If we don't have a pci_dn for a VF, we need to store this information
   somewhere else, perhaps in the PF's pci_dn? Hmm... that doesn't look
   nice either.

Generally, we could find a workaround to make VFs work without a device
node/pci_dn, but it would do some harm to the infrastructure, making it
inconsistent and harder to read/maintain. A rough sketch of the kind of
special-casing I mean is below.
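
For example (just a fragment to illustrate the point, reusing helpers that
already exist in pci-ioda.c; not a complete or tested patch):

	if (pdev->is_virtfn) {
		struct pci_controller *hose = pci_bus_to_host(pdev->bus);
		struct pnv_phb *phb = hose->private_data;
		struct pnv_ioda_pe *pe;

		/* 1. config accesses: take the phb from the bus instead of
		 *    from pci_dn */
		/* 2. DMA setup: look the PE up by RID instead of reading
		 *    pdn->pe_number */
		list_for_each_entry(pe, &phb->ioda.pe_list, list)
			if (pe->rid == ((pdev->bus->number << 8) |
					(pdev->devfn & 0xff)))
				break;
	}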

Currently I can't find a neat way to get rid of the device node/pci_dn for
VFs only. It may require a careful restructuring to do so.

BTW, my understanding may not be correct. If you have a better idea, please
let me know :-) Thanks a lot.

-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 04/17] PCI: SRIOV: add VF enable/disable hook
  2014-06-10  1:56   ` Wei Yang
@ 2014-06-23  5:03     ` Gavin Shan
  -1 siblings, 0 replies; 100+ messages in thread
From: Gavin Shan @ 2014-06-23  5:03 UTC (permalink / raw)
  To: Wei Yang; +Cc: benh, linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu

On Tue, Jun 10, 2014 at 09:56:26AM +0800, Wei Yang wrote:
>VFs are dynamically created/released when driver enable them. On some
>platforms, like PowerNV, special resources are necessary to enable VFs.
>
>This patch adds two hooks for platform initialization before creating the VFs.
>
>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>---
> drivers/pci/iov.c |   19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
>diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>index 1d21f43..cc87773 100644
>--- a/drivers/pci/iov.c
>+++ b/drivers/pci/iov.c
>@@ -250,6 +250,11 @@ static void sriov_disable_migration(struct pci_dev *dev)
> 	iounmap(iov->mstate);
> }
>
>+int __weak pcibios_sriov_enable(struct pci_dev *pdev, u16 vf_num)
>+{
>+       return 0;
>+}
>+
> static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
> {
> 	int rc;
>@@ -260,6 +265,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
> 	struct pci_dev *pdev;
> 	struct pci_sriov *iov = dev->sriov;
> 	int bars = 0;
>+	int retval;
>
> 	if (!nr_virtfn)
> 		return 0;
>@@ -334,6 +340,12 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
> 	if (nr_virtfn < initial)
> 		initial = nr_virtfn;
>
>+	if ((retval = pcibios_sriov_enable(dev, initial))) {
>+		dev_err(&dev->dev, "Failure %d from pcibios_sriov_setup()\n",
>+				retval);

		dev_err(&dev->dev, "Failure %d from pcibios_sriov_enable()\n",
			retval);

>+		return retval;
>+	}
>+
> 	for (i = 0; i < initial; i++) {
> 		rc = virtfn_add(dev, i, 0);
> 		if (rc)
>@@ -368,6 +380,11 @@ failed:
> 	return rc;
> }
>
>+int __weak pcibios_sriov_disable(struct pci_dev *pdev)
>+{
>+       return 0;
>+}
>+
> static void sriov_disable(struct pci_dev *dev)
> {
> 	int i;
>@@ -382,6 +399,8 @@ static void sriov_disable(struct pci_dev *dev)
> 	for (i = 0; i < iov->num_VFs; i++)
> 		virtfn_remove(dev, i, 0);
>
>+	pcibios_sriov_disable(dev);
>+
> 	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
> 	pci_cfg_access_lock(dev);
> 	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 02/17] pci/of: Match PCI VFs to dev-tree nodes dynamically
  2014-06-10  1:56   ` Wei Yang
@ 2014-06-23  5:07     ` Gavin Shan
  -1 siblings, 0 replies; 100+ messages in thread
From: Gavin Shan @ 2014-06-23  5:07 UTC (permalink / raw)
  To: Wei Yang; +Cc: benh, linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu

On Tue, Jun 10, 2014 at 09:56:24AM +0800, Wei Yang wrote:
>As introduced by commit 98d9f30c82 ("pci/of: Match PCI devices to dev-tree nodes
>dynamically"), we need to match PCI devices to their corresponding dev-tree
>nodes. While for VFs, this step was missed.
>
>This patch matches VFs' PCI devices to dev-tree nodes dynamically.
>
>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>---
> drivers/pci/iov.c |    1 +
> 1 file changed, 1 insertion(+)
>
>diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>index 589ef7d..1d21f43 100644
>--- a/drivers/pci/iov.c
>+++ b/drivers/pci/iov.c
>@@ -67,6 +67,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
>
> 	virtfn->devfn = pci_iov_virtfn_devfn(dev, id);
> 	virtfn->vendor = dev->vendor;
>+	pci_set_of_node(virtfn);

If the VF and PF sit on different PCI buses, I guess pci_set_of_node() always
binds nothing to the VF. It might be one of the problems your code missed, and
I didn't catch it in the previous code review. However, it shouldn't be a real
problem if we're not going to rely on dynamic device_nodes.
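
For reference, pci_set_of_node() boils down to roughly this (simplified and
from memory, so treat it as a sketch rather than the exact code):

	void pci_set_of_node(struct pci_dev *dev)
	{
		if (!dev->bus->dev.of_node)
			return;
		dev->dev.of_node = of_pci_find_child_device(dev->bus->dev.of_node,
							    dev->devfn);
	}

so a VF sitting on a bus without an of_node, or without a matching child
"reg", simply ends up with dev.of_node == NULL.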

> 	pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_DID, &virtfn->device);
> 	pci_setup_device(virtfn);
> 	virtfn->dev.parent = dev->dev.parent;

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 05/17] ppc/pnv: user macro to define the TCE size
  2014-06-10  1:56   ` Wei Yang
@ 2014-06-23  5:12     ` Gavin Shan
  -1 siblings, 0 replies; 100+ messages in thread
From: Gavin Shan @ 2014-06-23  5:12 UTC (permalink / raw)
  To: Wei Yang; +Cc: benh, linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu

On Tue, Jun 10, 2014 at 09:56:27AM +0800, Wei Yang wrote:
>During the initialization of the TVT/TCE, it uses digits to specify the TCE IO
>Page Size, TCE Table Size, TCE Entry Size, etc.
>
>This patch replaces those digits with macros, which will be more meaningful and
>easy to read.
>
>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>

It looks like it conflicts with the "dynamic page size support" series posted by Alexey:

http://patchwork.ozlabs.org/patch/356718/
 
>---
> arch/powerpc/include/asm/tce.h            |    3 ++-
> arch/powerpc/platforms/powernv/pci-ioda.c |   25 +++++++++++--------------
> arch/powerpc/platforms/powernv/pci.c      |    2 +-
> arch/powerpc/platforms/powernv/pci.h      |    5 +++++
> 4 files changed, 19 insertions(+), 16 deletions(-)
>
>diff --git a/arch/powerpc/include/asm/tce.h b/arch/powerpc/include/asm/tce.h
>index 743f36b..28a1d06 100644
>--- a/arch/powerpc/include/asm/tce.h
>+++ b/arch/powerpc/include/asm/tce.h
>@@ -40,7 +40,8 @@
> #define TCE_SHIFT	12
> #define TCE_PAGE_SIZE	(1 << TCE_SHIFT)
>
>-#define TCE_ENTRY_SIZE		8		/* each TCE is 64 bits */
>+#define TCE_ENTRY_SHIFT		3
>+#define TCE_ENTRY_SIZE		(1 << TCE_ENTRY_SHIFT)	/* each TCE is 64 bits */
>
> #define TCE_RPN_MASK		0xfffffffffful  /* 40-bit RPN (4K pages) */
> #define TCE_RPN_SHIFT		12
>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>index 8ae09cf..9715351 100644
>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>@@ -820,9 +820,6 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> 	int64_t rc;
> 	void *addr;
>
>-	/* 256M DMA window, 4K TCE pages, 8 bytes TCE */
>-#define TCE32_TABLE_SIZE	((0x10000000 / 0x1000) * 8)
>-
> 	/* XXX FIXME: Handle 64-bit only DMA devices */
> 	/* XXX FIXME: Provide 64-bit DMA facilities & non-4K TCE tables etc.. */
> 	/* XXX FIXME: Allocate multi-level tables on PHB3 */
>@@ -834,7 +831,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> 	/* Grab a 32-bit TCE table */
> 	pe->tce32_seg = base;
> 	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
>-		(base << 28), ((base + segs) << 28) - 1);
>+		(base << PNV_TCE32_SEG_SHIFT), ((base + segs) << PNV_TCE32_SEG_SHIFT) - 1);
>
> 	/* XXX Currently, we allocate one big contiguous table for the
> 	 * TCEs. We only really need one chunk per 256M of TCE space
>@@ -842,21 +839,21 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> 	 * requires some added smarts with our get/put_tce implementation
> 	 */
> 	tce_mem = alloc_pages_node(phb->hose->node, GFP_KERNEL,
>-				   get_order(TCE32_TABLE_SIZE * segs));
>+				   get_order(PNV_TCE32_TAB_SIZE * segs));
> 	if (!tce_mem) {
> 		pe_err(pe, " Failed to allocate a 32-bit TCE memory\n");
> 		goto fail;
> 	}
> 	addr = page_address(tce_mem);
>-	memset(addr, 0, TCE32_TABLE_SIZE * segs);
>+	memset(addr, 0, PNV_TCE32_TAB_SIZE * segs);
>
> 	/* Configure HW */
> 	for (i = 0; i < segs; i++) {
> 		rc = opal_pci_map_pe_dma_window(phb->opal_id,
> 					      pe->pe_number,
> 					      base + i, 1,
>-					      __pa(addr) + TCE32_TABLE_SIZE * i,
>-					      TCE32_TABLE_SIZE, 0x1000);
>+					      __pa(addr) + PNV_TCE32_TAB_SIZE * i,
>+					      PNV_TCE32_TAB_SIZE, TCE_PAGE_SIZE);
> 		if (rc) {
> 			pe_err(pe, " Failed to configure 32-bit TCE table,"
> 			       " err %ld\n", rc);
>@@ -866,8 +863,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>
> 	/* Setup linux iommu table */
> 	tbl = &pe->tce32_table;
>-	pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
>-				  base << 28);
>+	pnv_pci_setup_iommu_table(tbl, addr, PNV_TCE32_TAB_SIZE * segs,
>+				  base << PNV_TCE32_SEG_SHIFT);
>
> 	/* OPAL variant of P7IOC SW invalidated TCEs */
> 	swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
>@@ -898,7 +895,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> 	if (pe->tce32_seg >= 0)
> 		pe->tce32_seg = -1;
> 	if (tce_mem)
>-		__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
>+		__free_pages(tce_mem, get_order(PNV_TCE32_TAB_SIZE * segs));
> }
>
> static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
>@@ -968,7 +965,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> 	/* The PE will reserve all possible 32-bits space */
> 	pe->tce32_seg = 0;
> 	end = (1 << ilog2(phb->ioda.m32_pci_base));
>-	tce_table_size = (end / 0x1000) * 8;
>+	tce_table_size = (end / TCE_PAGE_SIZE) * TCE_ENTRY_SIZE;
> 	pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
> 		end);
>
>@@ -988,7 +985,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> 	 */
> 	rc = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
> 					pe->pe_number << 1, 1, __pa(addr),
>-					tce_table_size, 0x1000);
>+					tce_table_size, TCE_PAGE_SIZE);
> 	if (rc) {
> 		pe_err(pe, "Failed to configure 32-bit TCE table,"
> 		       " err %ld\n", rc);
>@@ -1573,7 +1570,7 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np,
> 	INIT_LIST_HEAD(&phb->ioda.pe_list);
>
> 	/* Calculate how many 32-bit TCE segments we have */
>-	phb->ioda.tce32_count = phb->ioda.m32_pci_base >> 28;
>+	phb->ioda.tce32_count = phb->ioda.m32_pci_base >> PNV_TCE32_SEG_SHIFT;
>
> #if 0 /* We should really do that ... */
> 	rc = opal_pci_set_phb_mem_window(opal->phb_id,
>diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>index 8518817..687a068 100644
>--- a/arch/powerpc/platforms/powernv/pci.c
>+++ b/arch/powerpc/platforms/powernv/pci.c
>@@ -597,7 +597,7 @@ void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
> 	tbl->it_page_shift = IOMMU_PAGE_SHIFT_4K;
> 	tbl->it_offset = dma_offset >> tbl->it_page_shift;
> 	tbl->it_index = 0;
>-	tbl->it_size = tce_size >> 3;
>+	tbl->it_size = tce_size >> TCE_ENTRY_SHIFT;
> 	tbl->it_busno = 0;
> 	tbl->it_type = TCE_PCI;
> }
>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>index 3e5f5a1..90f6da4 100644
>--- a/arch/powerpc/platforms/powernv/pci.h
>+++ b/arch/powerpc/platforms/powernv/pci.h
>@@ -227,4 +227,9 @@ extern void pnv_pci_init_ioda2_phb(struct device_node *np);
> extern void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
> 					__be64 *startp, __be64 *endp, bool rm);
>
>+#define PNV_TCE32_SEG_SHIFT     28
>+#define PNV_TCE32_SEG_SIZE      (1UL << PNV_TCE32_SEG_SHIFT)
>+/* 256M DMA window, 4K TCE pages, 8 bytes TCE */
>+#define PNV_TCE32_TAB_SIZE	((PNV_TCE32_SEG_SIZE / TCE_PAGE_SIZE) * TCE_ENTRY_SIZE)
>+
> #endif /* __POWERNV_PCI_H */

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 07/17] ppc/pnv: Add function to deconfig a PE
  2014-06-10  1:56   ` Wei Yang
@ 2014-06-23  5:27     ` Gavin Shan
  -1 siblings, 0 replies; 100+ messages in thread
From: Gavin Shan @ 2014-06-23  5:27 UTC (permalink / raw)
  To: Wei Yang; +Cc: benh, linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu

On Tue, Jun 10, 2014 at 09:56:29AM +0800, Wei Yang wrote:
>On PowerNV platform, it will support dynamic PE allocation and deallocation.
>
>This patch adds a function to release those resources related to a PE.
>
>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>---
> arch/powerpc/platforms/powernv/pci-ioda.c |   77 +++++++++++++++++++++++++++++
> 1 file changed, 77 insertions(+)
>
>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>index 8ca3926..87cb3089 100644
>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>@@ -330,6 +330,83 @@ static struct pnv_ioda_pe *pnv_ioda_get_pe(struct pci_dev *dev)
> }
> #endif /* CONFIG_PCI_MSI */
>
>+static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>+{

Richard, it seems that the deconfiguration is incomplete. Something seems to
be missing: DMA, IO/MMIO and MSI. If I understand correctly,
pnv_ioda_deconfigure_pe() won't tear down DMA, IO/MMIO and MSI properly. For
MSI/MSI-X it wouldn't be a problem, as the VF driver should disable them
before calling this function.

>+	struct pci_dev *parent;
>+	uint8_t bcomp, dcomp, fcomp;
>+	int64_t rc;
>+	long rid_end, rid;

A blank line is needed here to separate the variable declarations from the logic.
And I think we won't run into the "if (pe->pbus)" case for now, so it's worth
adding a comment to explain that for a bit :-)
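
For instance (just a sketch, the comment wording is mine):

	struct pci_dev *parent;
	uint8_t bcomp, dcomp, fcomp;
	int64_t rc;
	long rid_end, rid;

	/*
	 * pe->pbus is only set for bus-level PEs.  For now the release path
	 * is only exercised for device (VF) PEs, so the bus branch below is
	 * kept mostly for completeness.
	 */
	if (pe->pbus) {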

>+	if (pe->pbus) {
>+		int count;
>+
>+		dcomp = OPAL_IGNORE_RID_DEVICE_NUMBER;
>+		fcomp = OPAL_IGNORE_RID_FUNCTION_NUMBER;
>+		parent = pe->pbus->self;
>+		if (pe->flags & PNV_IODA_PE_BUS_ALL)
>+			count = pe->pbus->busn_res.end - pe->pbus->busn_res.start + 1;
>+		else
>+			count = 1;
>+
>+		switch(count) {
>+		case  1: bcomp = OpalPciBusAll;         break;
>+		case  2: bcomp = OpalPciBus7Bits;       break;
>+		case  4: bcomp = OpalPciBus6Bits;       break;
>+		case  8: bcomp = OpalPciBus5Bits;       break;
>+		case 16: bcomp = OpalPciBus4Bits;       break;
>+		case 32: bcomp = OpalPciBus3Bits;       break;
>+		default:
>+			pr_err("%s: Number of subordinate busses %d"
>+			       " unsupported\n",
>+			       pci_name(pe->pbus->self), count);

I guess it's not safe to do "pci_name(pe->pbus->self)" for the root bus.
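One way to keep that print safe when pe->pbus is the root bus (so ->self is
NULL) -- just a sketch:

	pr_err("%s: Number of subordinate busses %d unsupported\n",
	       pe->pbus->self ? pci_name(pe->pbus->self) : "root bus",
	       count);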

>+			/* Do an exact match only */
>+			bcomp = OpalPciBusAll;
>+		}
>+		rid_end = pe->rid + (count << 8);
>+	}else {

	} else {

>+		parent = pe->pdev->bus->self;
>+		bcomp = OpalPciBusAll;
>+		dcomp = OPAL_COMPARE_RID_DEVICE_NUMBER;
>+		fcomp = OPAL_COMPARE_RID_FUNCTION_NUMBER;
>+		rid_end = pe->rid + 1;
>+	}
>+
>+	/* Disable MVT on IODA1 */
>+	if (phb->type == PNV_PHB_IODA1) {
>+		rc = opal_pci_set_mve_enable(phb->opal_id,
>+					     pe->mve_number, OPAL_DISABLE_MVE);
>+		if (rc) {
>+			pe_err(pe, "OPAL error %ld enabling MVE %d\n",
>+			       rc, pe->mve_number);
>+			pe->mve_number = -1;
>+		}
>+	}
>+	/* Clear the reverse map */
>+	for (rid = pe->rid; rid < rid_end; rid++)
>+		phb->ioda.pe_rmap[rid] = 0;
>+
>+	/* Release from all parents PELT-V */
>+	while (parent) {
>+		struct pci_dn *pdn = pci_get_pdn(parent);
>+		if (pdn && pdn->pe_number != IODA_INVALID_PE) {
>+			rc = opal_pci_set_peltv(phb->opal_id, pdn->pe_number,
>+						pe->pe_number, OPAL_REMOVE_PE_FROM_DOMAIN);
>+			/* XXX What to do in case of error ? */
>+		}
>+		parent = parent->bus->self;
>+	}

It seems that you missed removing the PE from its own PELTV, which was
introduced by commit 631ad69 ("powerpc/powernv: Add PE to its own PELTV").
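Something like the below would mirror the OPAL_REMOVE_PE_FROM_DOMAIN calls
above (untested, just a sketch of the missing step):

	/* Undo "Add PE to its own PELTV" */
	rc = opal_pci_set_peltv(phb->opal_id, pe->pe_number,
				pe->pe_number, OPAL_REMOVE_PE_FROM_DOMAIN);
	if (rc)
		pe_err(pe, "OPAL error %ld removing self from PELTV\n", rc);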

>+
>+	/* Dissociate PE in PELT */
>+	rc = opal_pci_set_pe(phb->opal_id, pe->pe_number, pe->rid,
>+			     bcomp, dcomp, fcomp, OPAL_UNMAP_PE);
>+	if (rc)
>+		pe_err(pe, "OPAL error %ld trying to setup PELT table\n", rc);
>+
>+	pe->pbus = NULL;
>+	pe->pdev = NULL;
>+
>+	return 0;
>+}
>+
> static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
> {
> 	struct pci_dev *parent;

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 08/17] PCI: Add weak pcibios_sriov_resource_size() interface
  2014-06-10  1:56   ` Wei Yang
@ 2014-06-23  5:41     ` Gavin Shan
  -1 siblings, 0 replies; 100+ messages in thread
From: Gavin Shan @ 2014-06-23  5:41 UTC (permalink / raw)
  To: Wei Yang; +Cc: benh, linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu

On Tue, Jun 10, 2014 at 09:56:30AM +0800, Wei Yang wrote:
>When retrieving sriov resource size in pci_sriov_resource_size(), it will
>divide the total IOV resource size with the totalVF number. This is true for
>most cases, while may not be correct on some specific platform.
>
>For example on powernv platform, in order to fix the IOV BAR into a hardware
>alignment, the IOV resource size would be expended. This means the original
>method couldn't work.
>
>This patch introduces a weak pcibios_sriov_resource_size() interface, which
>gives platform a chance to implement specific method to calculate the sriov
>resource size.
>
>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>---
> drivers/pci/iov.c   |   27 +++++++++++++++++++++++++--
> include/linux/pci.h |    3 +++
> 2 files changed, 28 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>index cc87773..9fd4648 100644
>--- a/drivers/pci/iov.c
>+++ b/drivers/pci/iov.c
>@@ -45,6 +45,30 @@ static void virtfn_remove_bus(struct pci_bus *physbus, struct pci_bus *virtbus)
> 		pci_remove_bus(virtbus);
> }
>
>+resource_size_t __weak pcibios_sriov_resource_size(struct pci_dev *dev, int resno)
>+{
>+	return 0;
>+}
>+

Please declare the prototype of the weak function in a header file (e.g.
include/linux/pci.h) :-)

If you missed doing the same thing for the weak functions added in the
previous patches, you need to fix those as well.
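For this one, a one-liner next to the pci_sriov_resource_size() declaration you
add below should do (just a sketch):

	/* Arch-specific hook; the default __weak implementation returns 0 */
	resource_size_t pcibios_sriov_resource_size(struct pci_dev *dev, int resno);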

>+resource_size_t pci_sriov_resource_size(struct pci_dev *dev, int resno)
>+{
>+	u64 size;

I guess it'd better be "resource_size_t".

>+	struct pci_sriov *iov;
>+
>+	if (!dev->is_physfn)
>+		return 0;
>+
>+	size = pcibios_sriov_resource_size(dev, resno);
>+	if (size != 0)
>+		return size;
>+
>+	iov = dev->sriov;
>+	size = resource_size(dev->resource + resno);
>+	do_div(size, iov->total_VFs);
>+
>+	return size;
>+}
>+
> static int virtfn_add(struct pci_dev *dev, int id, int reset)
> {
> 	int i;
>@@ -81,8 +105,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
> 			continue;
> 		virtfn->resource[i].name = pci_name(virtfn);
> 		virtfn->resource[i].flags = res->flags;
>-		size = resource_size(res);
>-		do_div(size, iov->total_VFs);
>+		size = pci_sriov_resource_size(dev, i + PCI_IOV_RESOURCES);
> 		virtfn->resource[i].start = res->start + size * id;
> 		virtfn->resource[i].end = virtfn->resource[i].start + size - 1;
> 		rc = request_resource(res, &virtfn->resource[i]);
>diff --git a/include/linux/pci.h b/include/linux/pci.h
>index ddb1ca0..315c150 100644
>--- a/include/linux/pci.h
>+++ b/include/linux/pci.h
>@@ -1637,6 +1637,7 @@ int pci_num_vf(struct pci_dev *dev);
> int pci_vfs_assigned(struct pci_dev *dev);
> int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
> int pci_sriov_get_totalvfs(struct pci_dev *dev);
>+resource_size_t pci_sriov_resource_size(struct pci_dev *dev, int resno);
> #else
> static inline int pci_iov_virtfn_bus(struct pci_dev *dev, int id)
> {
>@@ -1658,6 +1659,8 @@ static inline int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs)
> { return 0; }
> static inline int pci_sriov_get_totalvfs(struct pci_dev *dev)
> { return 0; }
>+static inline resource_size_t pci_sriov_resource_size(struct pci_dev *dev, int resno)
>+{ return -1; }
> #endif
>
> #if defined(CONFIG_HOTPLUG_PCI) || defined(CONFIG_HOTPLUG_PCI_MODULE)

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 11/17] ppc/pnv: Expand VF resources according to the number of total_pe
  2014-06-10  1:56   ` Wei Yang
@ 2014-06-23  6:07     ` Gavin Shan
  -1 siblings, 0 replies; 100+ messages in thread
From: Gavin Shan @ 2014-06-23  6:07 UTC (permalink / raw)
  To: Wei Yang; +Cc: benh, linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu

On Tue, Jun 10, 2014 at 09:56:33AM +0800, Wei Yang wrote:
>On PHB3, VF resources will be covered by M64 BAR to have better PE isolation.
>Mostly the total_pe number is different from the total_VFs, which will lead to
>a conflict between MMIO space and the PE number.
>
>This patch expands the VF resource size to reserve total_pe number of VFs'
>resource, which prevents the conflict.
>
>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>---
> arch/powerpc/include/asm/machdep.h        |    6 +++
> arch/powerpc/include/asm/pci-bridge.h     |    3 ++
> arch/powerpc/kernel/pci-common.c          |   15 ++++++
> arch/powerpc/platforms/powernv/pci-ioda.c |   83 +++++++++++++++++++++++++++++
> 4 files changed, 107 insertions(+)
>
>diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
>index ad3025d..2f2e770 100644
>--- a/arch/powerpc/include/asm/machdep.h
>+++ b/arch/powerpc/include/asm/machdep.h
>@@ -234,9 +234,15 @@ struct machdep_calls {
>
> 	/* Called after scan and before resource survey */
> 	void (*pcibios_fixup_phb)(struct pci_controller *hose);
>+#ifdef CONFIG_PCI_IOV
>+	void (*pcibios_fixup_sriov)(struct pci_bus *bus);
>+#endif /* CONFIG_PCI_IOV */
>
> 	/* Called during PCI resource reassignment */
> 	resource_size_t (*pcibios_window_alignment)(struct pci_bus *, unsigned long type);
>+#ifdef CONFIG_PCI_IOV
>+	resource_size_t (*__pci_sriov_resource_size)(struct pci_dev *, int resno);

	resource_size_t (*pcibios_sriov_resource_size)(struct pci_dev *, int resno);

You probably can put all SRIOV related functions together:

#ifdef CONFIG_PCI_IOV
	func_a;
	func_b;
	 :
#endif
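
Concretely, with the two hooks from this patch and the rename above, that would
read something like:

#ifdef CONFIG_PCI_IOV
	void (*pcibios_fixup_sriov)(struct pci_bus *bus);
	resource_size_t (*pcibios_sriov_resource_size)(struct pci_dev *, int resno);
#endif /* CONFIG_PCI_IOV */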

>+#endif /* CONFIG_PCI_IOV */
>
> 	/* Called to shutdown machine specific hardware not already controlled
> 	 * by other drivers.
>diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
>index 4ca90a3..8c849d8 100644
>--- a/arch/powerpc/include/asm/pci-bridge.h
>+++ b/arch/powerpc/include/asm/pci-bridge.h
>@@ -168,6 +168,9 @@ struct pci_dn {
> #define IODA_INVALID_PE		(-1)
> #ifdef CONFIG_PPC_POWERNV
> 	int	pe_number;
>+#ifdef CONFIG_PCI_IOV
>+	u16     vfs;
>+#endif /* CONFIG_PCI_IOV */
> #endif
> };
>
>diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
>index c449a26..c4e2e92 100644
>--- a/arch/powerpc/kernel/pci-common.c
>+++ b/arch/powerpc/kernel/pci-common.c
>@@ -120,6 +120,16 @@ resource_size_t pcibios_window_alignment(struct pci_bus *bus,
> 	return 1;
> }
>
>+#ifdef CONFIG_PCI_IOV
>+resource_size_t pcibios_sriov_resource_size(struct pci_dev *pdev, int resno)
>+{
>+	if (ppc_md.__pci_sriov_resource_size)
>+		return ppc_md.__pci_sriov_resource_size(pdev, resno);
>+
>+	return 0;
>+}
>+#endif /* CONFIG_PCI_IOV */
>+
> static resource_size_t pcibios_io_size(const struct pci_controller *hose)
> {
> #ifdef CONFIG_PPC64
>@@ -1675,6 +1685,11 @@ void pcibios_scan_phb(struct pci_controller *hose)
> 	if (ppc_md.pcibios_fixup_phb)
> 		ppc_md.pcibios_fixup_phb(hose);
>
>+#ifdef CONFIG_PCI_IOV
>+	if (ppc_md.pcibios_fixup_sriov)
>+		ppc_md.pcibios_fixup_sriov(bus);

One question I probably asked before: why can't we put the logic of
ppc_md.pcibios_fixup_sriov() into ppc_md.pcibios_fixup_phb()?
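If it moved there, the powernv side could look roughly like this (a sketch only;
the callback name is hypothetical, and it assumes the PHB's root bus is already
scanned when pcibios_fixup_phb() runs):

	static void pnv_pci_ioda_fixup_phb(struct pci_controller *hose)
	{
		/* ... existing per-PHB fixups ... */

		if (hose->bus)
			pnv_pci_ioda_fixup_sriov(hose->bus);
	}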

>+#endif /* CONFIG_PCI_IOV */
>+
> 	/* Configure PCI Express settings */
> 	if (bus && !pci_has_flag(PCI_PROBE_ONLY)) {
> 		struct pci_bus *child;
>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>index 87cb3089..7dfad6a 100644
>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>@@ -1298,6 +1298,67 @@ static void pnv_pci_init_ioda_msis(struct pnv_phb *phb)
> static void pnv_pci_init_ioda_msis(struct pnv_phb *phb) { }
> #endif /* CONFIG_PCI_MSI */
>
>+#ifdef CONFIG_PCI_IOV
>+static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>+{
>+	struct pci_controller *hose;
>+	struct pnv_phb *phb;
>+	struct resource *res;
>+	int i;
>+	resource_size_t size;
>+	struct pci_dn *pdn;
>+
>+	if (!pdev->is_physfn || pdev->is_added)
>+		return;
>+
>+	hose = pci_bus_to_host(pdev->bus);
>+	if (!hose) {
>+		dev_err(&pdev->dev, "%s: NULL pci_controller\n", __func__);
>+		return;
>+	}
>+
>+	phb = hose->private_data;
>+	if (!phb) {
>+		dev_err(&pdev->dev, "%s: NULL PHB\n", __func__);
>+		return;
>+	}
>+
>+	pdn = pci_get_pdn(pdev);
>+	pdn->vfs = 0;
>+
>+	for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++) {
>+		res = &pdev->resource[i];
>+		if (!res->flags || res->parent)
>+			continue;
>+
>+		if (!is_mem_pref_64_type(res->flags))
>+			continue;
>+
>+		dev_info(&pdev->dev, "PowerNV: Fixing VF BAR[%d] %pR to\n",
>+				i, res);
>+		size = pci_sriov_resource_size(pdev, i);
>+		res->end = res->start + size * phb->ioda.total_pe - 1;
>+		dev_info(&pdev->dev, "                       %pR\n", res);
>+	}
>+	pdn->vfs = phb->ioda.total_pe;
>+}
>+
>+static void pnv_pci_ioda_fixup_sriov(struct pci_bus *bus)
>+{
>+	struct pci_dev *pdev;
>+	struct pci_bus *b;
>+
>+	list_for_each_entry(pdev, &bus->devices, bus_list) {
>+		b = pdev->subordinate;
>+
>+		if (b)
>+			pnv_pci_ioda_fixup_sriov(b);
>+
>+		pnv_pci_ioda_fixup_iov_resources(pdev);
>+	}
>+}
>+#endif /* CONFIG_PCI_IOV */
>+
> /*
>  * This function is supposed to be called on basis of PE from top
>  * to bottom style. So the the I/O or MMIO segment assigned to
>@@ -1498,6 +1559,22 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
> 	return phb->ioda.io_segsize;
> }
>
>+#ifdef CONFIG_PCI_IOV
>+static resource_size_t __pnv_pci_sriov_resource_size(struct pci_dev *pdev, int resno)
>+{
>+	struct pci_dn *pdn = pci_get_pdn(pdev);
>+	u64 size = 0;
>+
>+	if (!pdn->vfs)
>+		return size;
>+
>+	size = resource_size(pdev->resource + resno);
>+	do_div(size, pdn->vfs);
>+
>+	return size;
>+}
>+#endif /* CONFIG_PCI_IOV */
>+
> /* Prevent enabling devices for which we couldn't properly
>  * assign a PE
>  */
>@@ -1692,9 +1769,15 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np,
> 	 * for the P2P bridge bars so that each PCI bus (excluding
> 	 * the child P2P bridges) can form individual PE.
> 	 */
>+#ifdef CONFIG_PCI_IOV
>+	ppc_md.pcibios_fixup_sriov = pnv_pci_ioda_fixup_sriov;
>+#endif /* CONFIG_PCI_IOV */
> 	ppc_md.pcibios_fixup = pnv_pci_ioda_fixup;
> 	ppc_md.pcibios_enable_device_hook = pnv_pci_enable_device_hook;
> 	ppc_md.pcibios_window_alignment = pnv_pci_window_alignment;
>+#ifdef CONFIG_PCI_IOV
>+	ppc_md.__pci_sriov_resource_size = __pnv_pci_sriov_resource_size;
>+#endif /* CONFIG_PCI_IOV */
> 	pci_add_flags(PCI_REASSIGN_ALL_RSRC);
>
> 	/* Reset IODA tables to a clean state */

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 12/17] powerpc/powernv: implement pcibios_sriov_resource_alignment on powernv
  2014-06-10  1:56   ` Wei Yang
@ 2014-06-23  6:09     ` Gavin Shan
  -1 siblings, 0 replies; 100+ messages in thread
From: Gavin Shan @ 2014-06-23  6:09 UTC (permalink / raw)
  To: Wei Yang; +Cc: benh, linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu

On Tue, Jun 10, 2014 at 09:56:34AM +0800, Wei Yang wrote:
>This patch implements the pcibios_sriov_resource_alignment() on powernv
>platform.
>
>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>---
> arch/powerpc/include/asm/machdep.h        |    1 +
> arch/powerpc/kernel/pci-common.c          |    8 ++++++++
> arch/powerpc/platforms/powernv/pci-ioda.c |   17 +++++++++++++++++
> 3 files changed, 26 insertions(+)
>
>diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
>index 2f2e770..3bbc55f 100644
>--- a/arch/powerpc/include/asm/machdep.h
>+++ b/arch/powerpc/include/asm/machdep.h
>@@ -242,6 +242,7 @@ struct machdep_calls {
> 	resource_size_t (*pcibios_window_alignment)(struct pci_bus *, unsigned long type);
> #ifdef CONFIG_PCI_IOV
> 	resource_size_t (*__pci_sriov_resource_size)(struct pci_dev *, int resno);
>+	resource_size_t (*__pci_sriov_resource_alignment)(struct pci_dev *, int resno, resource_size_t align);
> #endif /* CONFIG_PCI_IOV */
>
> 	/* Called to shutdown machine specific hardware not already controlled
>diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
>index c4e2e92..35345ac 100644
>--- a/arch/powerpc/kernel/pci-common.c
>+++ b/arch/powerpc/kernel/pci-common.c
>@@ -128,6 +128,14 @@ resource_size_t pcibios_sriov_resource_size(struct pci_dev *pdev, int resno)
>
> 	return 0;
> }
>+
>+resource_size_t pcibios_sriov_resource_alignment(struct pci_dev *pdev, int resno, resource_size_t align)
>+{
>+	if (ppc_md.__pci_sriov_resource_alignment)
>+		return ppc_md.__pci_sriov_resource_alignment(pdev, resno, align);
>+
>+	return 0;
>+}
> #endif /* CONFIG_PCI_IOV */
>
> static resource_size_t pcibios_io_size(const struct pci_controller *hose)
>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>index 7dfad6a..b0ac851 100644
>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>@@ -1573,6 +1573,22 @@ static resource_size_t __pnv_pci_sriov_resource_size(struct pci_dev *pdev, int r
>
> 	return size;
> }
>+
>+static resource_size_t __pnv_pci_sriov_resource_alignment(struct pci_dev *pdev, int resno,
>+		resource_size_t align)

The function could be "pcibios_sriov_resource_alignment()", but it's not a big deal.
If you prefer the original one, then keep it :)

>+{
>+	struct pci_dn *pdn = pci_get_pdn(pdev);
>+	resource_size_t iov_align;
>+
>+	iov_align = resource_size(&pdev->resource[resno]);
>+	if (iov_align)
>+		return iov_align;
>+
>+	if (pdn->vfs)
>+		return pdn->vfs * align;
>+
>+	return align;
>+}
> #endif /* CONFIG_PCI_IOV */
>
> /* Prevent enabling devices for which we couldn't properly
>@@ -1777,6 +1793,7 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np,
> 	ppc_md.pcibios_window_alignment = pnv_pci_window_alignment;
> #ifdef CONFIG_PCI_IOV
> 	ppc_md.__pci_sriov_resource_size = __pnv_pci_sriov_resource_size;
>+	ppc_md.__pci_sriov_resource_alignment = __pnv_pci_sriov_resource_alignment;
> #endif /* CONFIG_PCI_IOV */
> 	pci_add_flags(PCI_REASSIGN_ALL_RSRC);
>

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 02/17] pci/of: Match PCI VFs to dev-tree nodes dynamically
  2014-06-23  5:07     ` Gavin Shan
@ 2014-06-23  6:29       ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-23  6:29 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Wei Yang, benh, linuxppc-dev, bhelgaas, linux-pci, yan, qiudayu

On Mon, Jun 23, 2014 at 03:07:47PM +1000, Gavin Shan wrote:
>On Tue, Jun 10, 2014 at 09:56:24AM +0800, Wei Yang wrote:
>>As introduced by commit 98d9f30c82 ("pci/of: Match PCI devices to dev-tree nodes
>>dynamically"), we need to match PCI devices to their corresponding dev-tree
>>nodes. While for VFs, this step was missed.
>>
>>This patch matches VFs' PCI devices to dev-tree nodes dynamically.
>>
>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>---
>> drivers/pci/iov.c |    1 +
>> 1 file changed, 1 insertion(+)
>>
>>diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>>index 589ef7d..1d21f43 100644
>>--- a/drivers/pci/iov.c
>>+++ b/drivers/pci/iov.c
>>@@ -67,6 +67,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
>>
>> 	virtfn->devfn = pci_iov_virtfn_devfn(dev, id);
>> 	virtfn->vendor = dev->vendor;
>>+	pci_set_of_node(virtfn);
>
>If the VF and PF sit on different PCI buses, I guess pci_set_of_node() always
>binds nothing to the VF. It might be one of the problems your code missed and
>I didn't catch it in the code review done previously. However, it shouldn't
>be a real problem if we're not going to rely on dynamic device_nodes.
>

Thanks for the comment.

This case was not taken into consideration yet, so it is not supported now.
But I think it is time to think about a solution.

Hmm... after reading the code for a while, it seems this needs some changes to the
current code.

1. The hierarchy of VF's device node
   
                    +---------+
                    |P2P      |parent
                    +----+----+
                         |       pbus
            +------------+------------+              vbus
            |                         |           ---------------+
       +---------+              +-----+---+          |
       |DEV      |child1        |DEV      |child2    |
       +---------+              +---------+          |
                                                  +----+------+
                                                  |VF         | vchild
                                                  +-----------+

   From the chart above, the left side is the device node hierarchy without
   VFs. Each pci device is a direct child of the P2P bridge. When matching a
   pci device to its device node, the code goes through the parent bus node's
   child list and finds the one with the same devfn (in pci_set_of_node()).
   And we can tell the parent bus node is the P2P bridge's device node.

   This works fine until VFs need to be added. vbus is a child of pbus, and
   vbus->self is NULL. So the first thing is to set the correct device node
   for this virtual bus. From the chart above, it looks like both the P2P
   bridge and the DEV could serve as that device node, while the latter seems
   more reasonable (a rough sketch follows this list).

2. Reserve virtual bus numbers in firmware
   This is not a big issue, just reserve enough bus numbers in firmware.
   Otherwise, the pci device and device node may not match.
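
A rough sketch of the idea in point 1, with a hypothetical helper (the name and
placement are mine, not from the patch):

	/*
	 * Hypothetical: give vbus a device node so that pci_set_of_node() on
	 * each VF has a child list to search.  Per the chart, the PF's node
	 * looks like the natural choice for that.
	 */
	static void pci_iov_set_bus_of_node(struct pci_dev *physfn,
					    struct pci_bus *virtbus)
	{
		if (!virtbus->dev.of_node)
			virtbus->dev.of_node =
				of_node_get(pci_device_to_OF_node(physfn));
	}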

>> 	pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_DID, &virtfn->device);
>> 	pci_setup_device(virtfn);
>> 	virtfn->dev.parent = dev->dev.parent;
>
>Thanks,
>Gavin

-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 04/17] PCI: SRIOV: add VF enable/disable hook
  2014-06-23  5:03     ` Gavin Shan
@ 2014-06-23  6:29       ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-23  6:29 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Wei Yang, benh, linuxppc-dev, bhelgaas, linux-pci, yan, qiudayu

On Mon, Jun 23, 2014 at 03:03:10PM +1000, Gavin Shan wrote:
>On Tue, Jun 10, 2014 at 09:56:26AM +0800, Wei Yang wrote:
>>VFs are dynamically created/released when driver enable them. On some
>>platforms, like PowerNV, special resources are necessary to enable VFs.
>>
>>This patch adds two hooks for platform initialization before creating the VFs.
>>
>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>---
>> drivers/pci/iov.c |   19 +++++++++++++++++++
>> 1 file changed, 19 insertions(+)
>>
>>diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>>index 1d21f43..cc87773 100644
>>--- a/drivers/pci/iov.c
>>+++ b/drivers/pci/iov.c
>>@@ -250,6 +250,11 @@ static void sriov_disable_migration(struct pci_dev *dev)
>> 	iounmap(iov->mstate);
>> }
>>
>>+int __weak pcibios_sriov_enable(struct pci_dev *pdev, u16 vf_num)
>>+{
>>+       return 0;
>>+}
>>+
>> static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
>> {
>> 	int rc;
>>@@ -260,6 +265,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
>> 	struct pci_dev *pdev;
>> 	struct pci_sriov *iov = dev->sriov;
>> 	int bars = 0;
>>+	int retval;
>>
>> 	if (!nr_virtfn)
>> 		return 0;
>>@@ -334,6 +340,12 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
>> 	if (nr_virtfn < initial)
>> 		initial = nr_virtfn;
>>
>>+	if ((retval = pcibios_sriov_enable(dev, initial))) {
>>+		dev_err(&dev->dev, "Failure %d from pcibios_sriov_setup()\n",
>>+				retval);
>
>		dev_err(&dev->dev, "Failure %d from pcibios_sriov_enable()\n",
>			retval);

Thanks

>
>>+		return retval;
>>+	}
>>+
>> 	for (i = 0; i < initial; i++) {
>> 		rc = virtfn_add(dev, i, 0);
>> 		if (rc)
>>@@ -368,6 +380,11 @@ failed:
>> 	return rc;
>> }
>>
>>+int __weak pcibios_sriov_disable(struct pci_dev *pdev)
>>+{
>>+       return 0;
>>+}
>>+
>> static void sriov_disable(struct pci_dev *dev)
>> {
>> 	int i;
>>@@ -382,6 +399,8 @@ static void sriov_disable(struct pci_dev *dev)
>> 	for (i = 0; i < iov->num_VFs; i++)
>> 		virtfn_remove(dev, i, 0);
>>
>>+	pcibios_sriov_disable(dev);
>>+
>> 	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
>> 	pci_cfg_access_lock(dev);
>> 	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
>
>Thanks,
>Gavin

-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 05/17] ppc/pnv: user macro to define the TCE size
  2014-06-23  5:12     ` Gavin Shan
@ 2014-06-23  6:31       ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-23  6:31 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Wei Yang, benh, linuxppc-dev, bhelgaas, linux-pci, yan, qiudayu

On Mon, Jun 23, 2014 at 03:12:33PM +1000, Gavin Shan wrote:
>On Tue, Jun 10, 2014 at 09:56:27AM +0800, Wei Yang wrote:
>>During the initialization of the TVT/TCE, it uses digits to specify the TCE IO
>>Page Size, TCE Table Size, TCE Entry Size, etc.
>>
>>This patch replaces those digits with macros, which will be more meaningful and
>>easy to read.
>>
>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>
>It looks conflicting with "dynamic page size support" posted by Alexey:
>
>http://patchwork.ozlabs.org/patch/356718/

Ok, will make some changes in the next version.

>
>>---
>> arch/powerpc/include/asm/tce.h            |    3 ++-
>> arch/powerpc/platforms/powernv/pci-ioda.c |   25 +++++++++++--------------
>> arch/powerpc/platforms/powernv/pci.c      |    2 +-
>> arch/powerpc/platforms/powernv/pci.h      |    5 +++++
>> 4 files changed, 19 insertions(+), 16 deletions(-)
>>
>>diff --git a/arch/powerpc/include/asm/tce.h b/arch/powerpc/include/asm/tce.h
>>index 743f36b..28a1d06 100644
>>--- a/arch/powerpc/include/asm/tce.h
>>+++ b/arch/powerpc/include/asm/tce.h
>>@@ -40,7 +40,8 @@
>> #define TCE_SHIFT	12
>> #define TCE_PAGE_SIZE	(1 << TCE_SHIFT)
>>
>>-#define TCE_ENTRY_SIZE		8		/* each TCE is 64 bits */
>>+#define TCE_ENTRY_SHIFT		3
>>+#define TCE_ENTRY_SIZE		(1 << TCE_ENTRY_SHIFT)	/* each TCE is 64 bits */
>>
>> #define TCE_RPN_MASK		0xfffffffffful  /* 40-bit RPN (4K pages) */
>> #define TCE_RPN_SHIFT		12
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 8ae09cf..9715351 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -820,9 +820,6 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>> 	int64_t rc;
>> 	void *addr;
>>
>>-	/* 256M DMA window, 4K TCE pages, 8 bytes TCE */
>>-#define TCE32_TABLE_SIZE	((0x10000000 / 0x1000) * 8)
>>-
>> 	/* XXX FIXME: Handle 64-bit only DMA devices */
>> 	/* XXX FIXME: Provide 64-bit DMA facilities & non-4K TCE tables etc.. */
>> 	/* XXX FIXME: Allocate multi-level tables on PHB3 */
>>@@ -834,7 +831,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>> 	/* Grab a 32-bit TCE table */
>> 	pe->tce32_seg = base;
>> 	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
>>-		(base << 28), ((base + segs) << 28) - 1);
>>+		(base << PNV_TCE32_SEG_SHIFT), ((base + segs) << PNV_TCE32_SEG_SHIFT) - 1);
>>
>> 	/* XXX Currently, we allocate one big contiguous table for the
>> 	 * TCEs. We only really need one chunk per 256M of TCE space
>>@@ -842,21 +839,21 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>> 	 * requires some added smarts with our get/put_tce implementation
>> 	 */
>> 	tce_mem = alloc_pages_node(phb->hose->node, GFP_KERNEL,
>>-				   get_order(TCE32_TABLE_SIZE * segs));
>>+				   get_order(PNV_TCE32_TAB_SIZE * segs));
>> 	if (!tce_mem) {
>> 		pe_err(pe, " Failed to allocate a 32-bit TCE memory\n");
>> 		goto fail;
>> 	}
>> 	addr = page_address(tce_mem);
>>-	memset(addr, 0, TCE32_TABLE_SIZE * segs);
>>+	memset(addr, 0, PNV_TCE32_TAB_SIZE * segs);
>>
>> 	/* Configure HW */
>> 	for (i = 0; i < segs; i++) {
>> 		rc = opal_pci_map_pe_dma_window(phb->opal_id,
>> 					      pe->pe_number,
>> 					      base + i, 1,
>>-					      __pa(addr) + TCE32_TABLE_SIZE * i,
>>-					      TCE32_TABLE_SIZE, 0x1000);
>>+					      __pa(addr) + PNV_TCE32_TAB_SIZE * i,
>>+					      PNV_TCE32_TAB_SIZE, TCE_PAGE_SIZE);
>> 		if (rc) {
>> 			pe_err(pe, " Failed to configure 32-bit TCE table,"
>> 			       " err %ld\n", rc);
>>@@ -866,8 +863,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>>
>> 	/* Setup linux iommu table */
>> 	tbl = &pe->tce32_table;
>>-	pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
>>-				  base << 28);
>>+	pnv_pci_setup_iommu_table(tbl, addr, PNV_TCE32_TAB_SIZE * segs,
>>+				  base << PNV_TCE32_SEG_SHIFT);
>>
>> 	/* OPAL variant of P7IOC SW invalidated TCEs */
>> 	swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
>>@@ -898,7 +895,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>> 	if (pe->tce32_seg >= 0)
>> 		pe->tce32_seg = -1;
>> 	if (tce_mem)
>>-		__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
>>+		__free_pages(tce_mem, get_order(PNV_TCE32_TAB_SIZE * segs));
>> }
>>
>> static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
>>@@ -968,7 +965,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>> 	/* The PE will reserve all possible 32-bits space */
>> 	pe->tce32_seg = 0;
>> 	end = (1 << ilog2(phb->ioda.m32_pci_base));
>>-	tce_table_size = (end / 0x1000) * 8;
>>+	tce_table_size = (end / TCE_PAGE_SIZE) * TCE_ENTRY_SIZE;
>> 	pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
>> 		end);
>>
>>@@ -988,7 +985,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>> 	 */
>> 	rc = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
>> 					pe->pe_number << 1, 1, __pa(addr),
>>-					tce_table_size, 0x1000);
>>+					tce_table_size, TCE_PAGE_SIZE);
>> 	if (rc) {
>> 		pe_err(pe, "Failed to configure 32-bit TCE table,"
>> 		       " err %ld\n", rc);
>>@@ -1573,7 +1570,7 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np,
>> 	INIT_LIST_HEAD(&phb->ioda.pe_list);
>>
>> 	/* Calculate how many 32-bit TCE segments we have */
>>-	phb->ioda.tce32_count = phb->ioda.m32_pci_base >> 28;
>>+	phb->ioda.tce32_count = phb->ioda.m32_pci_base >> PNV_TCE32_SEG_SHIFT;
>>
>> #if 0 /* We should really do that ... */
>> 	rc = opal_pci_set_phb_mem_window(opal->phb_id,
>>diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>>index 8518817..687a068 100644
>>--- a/arch/powerpc/platforms/powernv/pci.c
>>+++ b/arch/powerpc/platforms/powernv/pci.c
>>@@ -597,7 +597,7 @@ void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
>> 	tbl->it_page_shift = IOMMU_PAGE_SHIFT_4K;
>> 	tbl->it_offset = dma_offset >> tbl->it_page_shift;
>> 	tbl->it_index = 0;
>>-	tbl->it_size = tce_size >> 3;
>>+	tbl->it_size = tce_size >> TCE_ENTRY_SHIFT;
>> 	tbl->it_busno = 0;
>> 	tbl->it_type = TCE_PCI;
>> }
>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>index 3e5f5a1..90f6da4 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -227,4 +227,9 @@ extern void pnv_pci_init_ioda2_phb(struct device_node *np);
>> extern void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
>> 					__be64 *startp, __be64 *endp, bool rm);
>>
>>+#define PNV_TCE32_SEG_SHIFT     28
>>+#define PNV_TCE32_SEG_SIZE      (1UL << PNV_TCE32_SEG_SHIFT)
>>+/* 256M DMA window, 4K TCE pages, 8 bytes TCE */
>>+#define PNV_TCE32_TAB_SIZE	((PNV_TCE32_SEG_SIZE / TCE_PAGE_SIZE) * TCE_ENTRY_SIZE)
>>+
>> #endif /* __POWERNV_PCI_H */
>
>Thanks,
>Gavin

-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 05/17] ppc/pnv: user macro to define the TCE size
@ 2014-06-23  6:31       ` Wei Yang
  0 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-23  6:31 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Wei Yang, benh, linux-pci, yan, bhelgaas, qiudayu, linuxppc-dev

On Mon, Jun 23, 2014 at 03:12:33PM +1000, Gavin Shan wrote:
>On Tue, Jun 10, 2014 at 09:56:27AM +0800, Wei Yang wrote:
>>During the initialization of the TVT/TCE, it uses digits to specify the TCE IO
>>Page Size, TCE Table Size, TCE Entry Size, etc.
>>
>>This patch replaces those digits with macros, which will be more meaningful and
>>easy to read.
>>
>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>
>It looks conflicting with "dynamic page size support" posted by Alexey:
>
>http://patchwork.ozlabs.org/patch/356718/

Ok, will make some changes in the next version.

>
>>---
>> arch/powerpc/include/asm/tce.h            |    3 ++-
>> arch/powerpc/platforms/powernv/pci-ioda.c |   25 +++++++++++--------------
>> arch/powerpc/platforms/powernv/pci.c      |    2 +-
>> arch/powerpc/platforms/powernv/pci.h      |    5 +++++
>> 4 files changed, 19 insertions(+), 16 deletions(-)
>>
>>diff --git a/arch/powerpc/include/asm/tce.h b/arch/powerpc/include/asm/tce.h
>>index 743f36b..28a1d06 100644
>>--- a/arch/powerpc/include/asm/tce.h
>>+++ b/arch/powerpc/include/asm/tce.h
>>@@ -40,7 +40,8 @@
>> #define TCE_SHIFT	12
>> #define TCE_PAGE_SIZE	(1 << TCE_SHIFT)
>>
>>-#define TCE_ENTRY_SIZE		8		/* each TCE is 64 bits */
>>+#define TCE_ENTRY_SHIFT		3
>>+#define TCE_ENTRY_SIZE		(1 << TCE_ENTRY_SHIFT)	/* each TCE is 64 bits */
>>
>> #define TCE_RPN_MASK		0xfffffffffful  /* 40-bit RPN (4K pages) */
>> #define TCE_RPN_SHIFT		12
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 8ae09cf..9715351 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -820,9 +820,6 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>> 	int64_t rc;
>> 	void *addr;
>>
>>-	/* 256M DMA window, 4K TCE pages, 8 bytes TCE */
>>-#define TCE32_TABLE_SIZE	((0x10000000 / 0x1000) * 8)
>>-
>> 	/* XXX FIXME: Handle 64-bit only DMA devices */
>> 	/* XXX FIXME: Provide 64-bit DMA facilities & non-4K TCE tables etc.. */
>> 	/* XXX FIXME: Allocate multi-level tables on PHB3 */
>>@@ -834,7 +831,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>> 	/* Grab a 32-bit TCE table */
>> 	pe->tce32_seg = base;
>> 	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
>>-		(base << 28), ((base + segs) << 28) - 1);
>>+		(base << PNV_TCE32_SEG_SHIFT), ((base + segs) << PNV_TCE32_SEG_SHIFT) - 1);
>>
>> 	/* XXX Currently, we allocate one big contiguous table for the
>> 	 * TCEs. We only really need one chunk per 256M of TCE space
>>@@ -842,21 +839,21 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>> 	 * requires some added smarts with our get/put_tce implementation
>> 	 */
>> 	tce_mem = alloc_pages_node(phb->hose->node, GFP_KERNEL,
>>-				   get_order(TCE32_TABLE_SIZE * segs));
>>+				   get_order(PNV_TCE32_TAB_SIZE * segs));
>> 	if (!tce_mem) {
>> 		pe_err(pe, " Failed to allocate a 32-bit TCE memory\n");
>> 		goto fail;
>> 	}
>> 	addr = page_address(tce_mem);
>>-	memset(addr, 0, TCE32_TABLE_SIZE * segs);
>>+	memset(addr, 0, PNV_TCE32_TAB_SIZE * segs);
>>
>> 	/* Configure HW */
>> 	for (i = 0; i < segs; i++) {
>> 		rc = opal_pci_map_pe_dma_window(phb->opal_id,
>> 					      pe->pe_number,
>> 					      base + i, 1,
>>-					      __pa(addr) + TCE32_TABLE_SIZE * i,
>>-					      TCE32_TABLE_SIZE, 0x1000);
>>+					      __pa(addr) + PNV_TCE32_TAB_SIZE * i,
>>+					      PNV_TCE32_TAB_SIZE, TCE_PAGE_SIZE);
>> 		if (rc) {
>> 			pe_err(pe, " Failed to configure 32-bit TCE table,"
>> 			       " err %ld\n", rc);
>>@@ -866,8 +863,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>>
>> 	/* Setup linux iommu table */
>> 	tbl = &pe->tce32_table;
>>-	pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
>>-				  base << 28);
>>+	pnv_pci_setup_iommu_table(tbl, addr, PNV_TCE32_TAB_SIZE * segs,
>>+				  base << PNV_TCE32_SEG_SHIFT);
>>
>> 	/* OPAL variant of P7IOC SW invalidated TCEs */
>> 	swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
>>@@ -898,7 +895,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>> 	if (pe->tce32_seg >= 0)
>> 		pe->tce32_seg = -1;
>> 	if (tce_mem)
>>-		__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
>>+		__free_pages(tce_mem, get_order(PNV_TCE32_TAB_SIZE * segs));
>> }
>>
>> static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
>>@@ -968,7 +965,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>> 	/* The PE will reserve all possible 32-bits space */
>> 	pe->tce32_seg = 0;
>> 	end = (1 << ilog2(phb->ioda.m32_pci_base));
>>-	tce_table_size = (end / 0x1000) * 8;
>>+	tce_table_size = (end / TCE_PAGE_SIZE) * TCE_ENTRY_SIZE;
>> 	pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
>> 		end);
>>
>>@@ -988,7 +985,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>> 	 */
>> 	rc = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
>> 					pe->pe_number << 1, 1, __pa(addr),
>>-					tce_table_size, 0x1000);
>>+					tce_table_size, TCE_PAGE_SIZE);
>> 	if (rc) {
>> 		pe_err(pe, "Failed to configure 32-bit TCE table,"
>> 		       " err %ld\n", rc);
>>@@ -1573,7 +1570,7 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np,
>> 	INIT_LIST_HEAD(&phb->ioda.pe_list);
>>
>> 	/* Calculate how many 32-bit TCE segments we have */
>>-	phb->ioda.tce32_count = phb->ioda.m32_pci_base >> 28;
>>+	phb->ioda.tce32_count = phb->ioda.m32_pci_base >> PNV_TCE32_SEG_SHIFT;
>>
>> #if 0 /* We should really do that ... */
>> 	rc = opal_pci_set_phb_mem_window(opal->phb_id,
>>diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>>index 8518817..687a068 100644
>>--- a/arch/powerpc/platforms/powernv/pci.c
>>+++ b/arch/powerpc/platforms/powernv/pci.c
>>@@ -597,7 +597,7 @@ void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
>> 	tbl->it_page_shift = IOMMU_PAGE_SHIFT_4K;
>> 	tbl->it_offset = dma_offset >> tbl->it_page_shift;
>> 	tbl->it_index = 0;
>>-	tbl->it_size = tce_size >> 3;
>>+	tbl->it_size = tce_size >> TCE_ENTRY_SHIFT;
>> 	tbl->it_busno = 0;
>> 	tbl->it_type = TCE_PCI;
>> }
>>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>>index 3e5f5a1..90f6da4 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -227,4 +227,9 @@ extern void pnv_pci_init_ioda2_phb(struct device_node *np);
>> extern void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
>> 					__be64 *startp, __be64 *endp, bool rm);
>>
>>+#define PNV_TCE32_SEG_SHIFT     28
>>+#define PNV_TCE32_SEG_SIZE      (1UL << PNV_TCE32_SEG_SHIFT)
>>+/* 256M DMA window, 4K TCE pages, 8 bytes TCE */
>>+#define PNV_TCE32_TAB_SIZE	((PNV_TCE32_SEG_SIZE / TCE_PAGE_SIZE) * TCE_ENTRY_SIZE)
>>+
>> #endif /* __POWERNV_PCI_H */
>
>Thanks,
>Gavin

-- 
Richard Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 11/17] ppc/pnv: Expand VF resources according to the number of total_pe
  2014-06-23  6:07     ` Gavin Shan
@ 2014-06-23  6:56       ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-23  6:56 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Wei Yang, benh, linuxppc-dev, bhelgaas, linux-pci, yan, qiudayu

On Mon, Jun 23, 2014 at 04:07:07PM +1000, Gavin Shan wrote:
>On Tue, Jun 10, 2014 at 09:56:33AM +0800, Wei Yang wrote:
>>On PHB3, VF resources will be covered by M64 BAR to have better PE isolation.
>>Mostly the total_pe number is different from the total_VFs, which will lead to
>>a conflict between MMIO space and the PE number.
>>
>>This patch expands the VF resource size to reserve total_pe number of VFs'
>>resource, which prevents the conflict.
>>
>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>---
>> arch/powerpc/include/asm/machdep.h        |    6 +++
>> arch/powerpc/include/asm/pci-bridge.h     |    3 ++
>> arch/powerpc/kernel/pci-common.c          |   15 ++++++
>> arch/powerpc/platforms/powernv/pci-ioda.c |   83 +++++++++++++++++++++++++++++
>> 4 files changed, 107 insertions(+)
>>
>>diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
>>index ad3025d..2f2e770 100644
>>--- a/arch/powerpc/include/asm/machdep.h
>>+++ b/arch/powerpc/include/asm/machdep.h
>>@@ -234,9 +234,15 @@ struct machdep_calls {
>>
>> 	/* Called after scan and before resource survey */
>> 	void (*pcibios_fixup_phb)(struct pci_controller *hose);
>>+#ifdef CONFIG_PCI_IOV
>>+	void (*pcibios_fixup_sriov)(struct pci_bus *bus);
>>+#endif /* CONFIG_PCI_IOV */
>>
>> 	/* Called during PCI resource reassignment */
>> 	resource_size_t (*pcibios_window_alignment)(struct pci_bus *, unsigned long type);
>>+#ifdef CONFIG_PCI_IOV
>>+	resource_size_t (*__pci_sriov_resource_size)(struct pci_dev *, int resno);
>
>	resource_size_t (*pcibios_sriov_resource_size)(struct pci_dev *, int resno);
>
>You probably can put all SRIOV related functions together:
>
>#ifdef CONFIG_PCI_IOV
>	func_a;
>	func_b;
>	 :
>#endif
>
>>+#endif /* CONFIG_PCI_IOV */
>>
>> 	/* Called to shutdown machine specific hardware not already controlled
>> 	 * by other drivers.
>>diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
>>index 4ca90a3..8c849d8 100644
>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>@@ -168,6 +168,9 @@ struct pci_dn {
>> #define IODA_INVALID_PE		(-1)
>> #ifdef CONFIG_PPC_POWERNV
>> 	int	pe_number;
>>+#ifdef CONFIG_PCI_IOV
>>+	u16     vfs;
>>+#endif /* CONFIG_PCI_IOV */
>> #endif
>> };
>>
>>diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
>>index c449a26..c4e2e92 100644
>>--- a/arch/powerpc/kernel/pci-common.c
>>+++ b/arch/powerpc/kernel/pci-common.c
>>@@ -120,6 +120,16 @@ resource_size_t pcibios_window_alignment(struct pci_bus *bus,
>> 	return 1;
>> }
>>
>>+#ifdef CONFIG_PCI_IOV
>>+resource_size_t pcibios_sriov_resource_size(struct pci_dev *pdev, int resno)
>>+{
>>+	if (ppc_md.__pci_sriov_resource_size)
>>+		return ppc_md.__pci_sriov_resource_size(pdev, resno);
>>+
>>+	return 0;
>>+}
>>+#endif /* CONFIG_PCI_IOV */
>>+
>> static resource_size_t pcibios_io_size(const struct pci_controller *hose)
>> {
>> #ifdef CONFIG_PPC64
>>@@ -1675,6 +1685,11 @@ void pcibios_scan_phb(struct pci_controller *hose)
>> 	if (ppc_md.pcibios_fixup_phb)
>> 		ppc_md.pcibios_fixup_phb(hose);
>>
>>+#ifdef CONFIG_PCI_IOV
>>+	if (ppc_md.pcibios_fixup_sriov)
>>+		ppc_md.pcibios_fixup_sriov(bus);
>
>One question I probably asked before: why we can't put the logic
>of ppc_md.pcibios_fixup_sriov() to ppc_md.pcibios_fixup_phb()?
>

Yep, you have asked before and I replied before too :-)

During EEH hotplug, if the PF is removed, the IOV BAR will be retrieved from
the device itself again. If I merged this fixup into
ppc_md.pcibios_fixup_phb(), it would not be proper to invoke it at the
hotplug event.

Or would fixing up the PHB during EEH hotplug be reasonable?
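
To spell the flow out (the EEH call site below is hypothetical, only meant
to illustrate why a bus-level hook is easier to reuse):

	/*
	 * Boot time (this patch):
	 *     pcibios_scan_phb(hose)
	 *         -> ppc_md.pcibios_fixup_sriov(bus)
	 *
	 * EEH hotplug (hypothetical call site): the PF is re-probed and its
	 * IOV BARs are re-read from the device, so only that bus needs
	 *     -> ppc_md.pcibios_fixup_sriov(pf->bus)
	 * again, rather than a PHB-wide pcibios_fixup_phb().
	 */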


-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 11/17] ppc/pnv: Expand VF resources according to the number of total_pe
  2014-06-23  6:56       ` Wei Yang
@ 2014-06-23  7:08         ` Gavin Shan
  -1 siblings, 0 replies; 100+ messages in thread
From: Gavin Shan @ 2014-06-23  7:08 UTC (permalink / raw)
  To: Wei Yang
  Cc: Gavin Shan, benh, linuxppc-dev, bhelgaas, linux-pci, yan, qiudayu

On Mon, Jun 23, 2014 at 02:56:52PM +0800, Wei Yang wrote:
>On Mon, Jun 23, 2014 at 04:07:07PM +1000, Gavin Shan wrote:
>>On Tue, Jun 10, 2014 at 09:56:33AM +0800, Wei Yang wrote:
>>>On PHB3, VF resources will be covered by M64 BAR to have better PE isolation.
>>>Mostly the total_pe number is different from the total_VFs, which will lead to
>>>a conflict between MMIO space and the PE number.
>>>
>>>This patch expands the VF resource size to reserve total_pe number of VFs'
>>>resource, which prevents the conflict.
>>>
>>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>>---
>>> arch/powerpc/include/asm/machdep.h        |    6 +++
>>> arch/powerpc/include/asm/pci-bridge.h     |    3 ++
>>> arch/powerpc/kernel/pci-common.c          |   15 ++++++
>>> arch/powerpc/platforms/powernv/pci-ioda.c |   83 +++++++++++++++++++++++++++++
>>> 4 files changed, 107 insertions(+)
>>>
>>>diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
>>>index ad3025d..2f2e770 100644
>>>--- a/arch/powerpc/include/asm/machdep.h
>>>+++ b/arch/powerpc/include/asm/machdep.h
>>>@@ -234,9 +234,15 @@ struct machdep_calls {
>>>
>>> 	/* Called after scan and before resource survey */
>>> 	void (*pcibios_fixup_phb)(struct pci_controller *hose);
>>>+#ifdef CONFIG_PCI_IOV
>>>+	void (*pcibios_fixup_sriov)(struct pci_bus *bus);
>>>+#endif /* CONFIG_PCI_IOV */
>>>
>>> 	/* Called during PCI resource reassignment */
>>> 	resource_size_t (*pcibios_window_alignment)(struct pci_bus *, unsigned long type);
>>>+#ifdef CONFIG_PCI_IOV
>>>+	resource_size_t (*__pci_sriov_resource_size)(struct pci_dev *, int resno);
>>
>>	resource_size_t (*pcibios_sriov_resource_size)(struct pci_dev *, int resno);
>>
>>You probably can put all SRIOV related functions together:
>>
>>#ifdef CONFIG_PCI_IOV
>>	func_a;
>>	func_b;
>>	 :
>>#endif
>>
>>>+#endif /* CONFIG_PCI_IOV */
>>>
>>> 	/* Called to shutdown machine specific hardware not already controlled
>>> 	 * by other drivers.
>>>diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
>>>index 4ca90a3..8c849d8 100644
>>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>>@@ -168,6 +168,9 @@ struct pci_dn {
>>> #define IODA_INVALID_PE		(-1)
>>> #ifdef CONFIG_PPC_POWERNV
>>> 	int	pe_number;
>>>+#ifdef CONFIG_PCI_IOV
>>>+	u16     vfs;
>>>+#endif /* CONFIG_PCI_IOV */
>>> #endif
>>> };
>>>
>>>diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
>>>index c449a26..c4e2e92 100644
>>>--- a/arch/powerpc/kernel/pci-common.c
>>>+++ b/arch/powerpc/kernel/pci-common.c
>>>@@ -120,6 +120,16 @@ resource_size_t pcibios_window_alignment(struct pci_bus *bus,
>>> 	return 1;
>>> }
>>>
>>>+#ifdef CONFIG_PCI_IOV
>>>+resource_size_t pcibios_sriov_resource_size(struct pci_dev *pdev, int resno)
>>>+{
>>>+	if (ppc_md.__pci_sriov_resource_size)
>>>+		return ppc_md.__pci_sriov_resource_size(pdev, resno);
>>>+
>>>+	return 0;
>>>+}
>>>+#endif /* CONFIG_PCI_IOV */
>>>+
>>> static resource_size_t pcibios_io_size(const struct pci_controller *hose)
>>> {
>>> #ifdef CONFIG_PPC64
>>>@@ -1675,6 +1685,11 @@ void pcibios_scan_phb(struct pci_controller *hose)
>>> 	if (ppc_md.pcibios_fixup_phb)
>>> 		ppc_md.pcibios_fixup_phb(hose);
>>>
>>>+#ifdef CONFIG_PCI_IOV
>>>+	if (ppc_md.pcibios_fixup_sriov)
>>>+		ppc_md.pcibios_fixup_sriov(bus);
>>
>>One question I probably asked before: why we can't put the logic
>>of ppc_md.pcibios_fixup_sriov() to ppc_md.pcibios_fixup_phb()?
>>
>
>Yep, you have asked before and I replied before too :-)
>
>During EEH hotplug, if the PF is removed, the IOV BAR will be retrieved from
>the device itself again. If I merged this fixup into
>ppc_md.pcibios_fixup_phb(), it would not be proper to invoke it at the
>hotplug event.
>
>Or would fixing up the PHB during EEH hotplug be reasonable?
>

Yeah. It's not reasonable to apply the fixup to the whole PHB when doing hotplug on a PF.

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 08/17] PCI: Add weak pcibios_sriov_resource_size() interface
  2014-06-23  5:41     ` Gavin Shan
@ 2014-06-23  7:56       ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-23  7:56 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Wei Yang, benh, linuxppc-dev, bhelgaas, linux-pci, yan, qiudayu

On Mon, Jun 23, 2014 at 03:41:28PM +1000, Gavin Shan wrote:
>On Tue, Jun 10, 2014 at 09:56:30AM +0800, Wei Yang wrote:
>>When retrieving sriov resource size in pci_sriov_resource_size(), it will
>>divide the total IOV resource size with the totalVF number. This is true for
>>most cases, while may not be correct on some specific platform.
>>
>>For example on powernv platform, in order to fix the IOV BAR into a hardware
>>alignment, the IOV resource size would be expended. This means the original
>>method couldn't work.
>>
>>This patch introduces a weak pcibios_sriov_resource_size() interface, which
>>gives platform a chance to implement specific method to calculate the sriov
>>resource size.
>>
>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>---
>> drivers/pci/iov.c   |   27 +++++++++++++++++++++++++--
>> include/linux/pci.h |    3 +++
>> 2 files changed, 28 insertions(+), 2 deletions(-)
>>
>>diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>>index cc87773..9fd4648 100644
>>--- a/drivers/pci/iov.c
>>+++ b/drivers/pci/iov.c
>>@@ -45,6 +45,30 @@ static void virtfn_remove_bus(struct pci_bus *physbus, struct pci_bus *virtbus)
>> 		pci_remove_bus(virtbus);
>> }
>>
>>+resource_size_t __weak pcibios_sriov_resource_size(struct pci_dev *dev, int resno)
>>+{
>>+	return 0;
>>+}
>>+
>
>Please define the prototype of weak function in header files (e.g.
>linux/include/pci.h) :-)

Missed, will add it.
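
Presumably just the one-line declaration next to the other pcibios hooks,
something like (sketch):

	/* include/linux/pci.h: prototype for the weak resource-size hook */
	resource_size_t pcibios_sriov_resource_size(struct pci_dev *dev, int resno);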

>
>If you missed doing same thing for the weak functions added in the
>previous patches, you need fix it as well.

Yep.

>
>>+resource_size_t pci_sriov_resource_size(struct pci_dev *dev, int resno)
>>+{
>>+	u64 size;
>
>I guess it'd better to be "resource_size_t". 
>
>>+	struct pci_sriov *iov;
>>+
>>+	if (!dev->is_physfn)
>>+		return 0;
>>+
>>+	size = pcibios_sriov_resource_size(dev, resno);
>>+	if (size != 0)
>>+		return size;
>>+
>>+	iov = dev->sriov;
>>+	size = resource_size(dev->resource + resno);
>>+	do_div(size, iov->total_VFs);
>>+
>>+	return size;
>>+}
>>+
>> static int virtfn_add(struct pci_dev *dev, int id, int reset)
>> {
>> 	int i;
>>@@ -81,8 +105,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
>> 			continue;
>> 		virtfn->resource[i].name = pci_name(virtfn);
>> 		virtfn->resource[i].flags = res->flags;
>>-		size = resource_size(res);
>>-		do_div(size, iov->total_VFs);
>>+		size = pci_sriov_resource_size(dev, i + PCI_IOV_RESOURCES);
>> 		virtfn->resource[i].start = res->start + size * id;
>> 		virtfn->resource[i].end = virtfn->resource[i].start + size - 1;
>> 		rc = request_resource(res, &virtfn->resource[i]);
>>diff --git a/include/linux/pci.h b/include/linux/pci.h
>>index ddb1ca0..315c150 100644
>>--- a/include/linux/pci.h
>>+++ b/include/linux/pci.h
>>@@ -1637,6 +1637,7 @@ int pci_num_vf(struct pci_dev *dev);
>> int pci_vfs_assigned(struct pci_dev *dev);
>> int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
>> int pci_sriov_get_totalvfs(struct pci_dev *dev);
>>+resource_size_t pci_sriov_resource_size(struct pci_dev *dev, int resno);
>> #else
>> static inline int pci_iov_virtfn_bus(struct pci_dev *dev, int id)
>> {
>>@@ -1658,6 +1659,8 @@ static inline int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs)
>> { return 0; }
>> static inline int pci_sriov_get_totalvfs(struct pci_dev *dev)
>> { return 0; }
>>+static inline resource_size_t pci_sriov_resource_size(struct pci_dev *dev, int resno)
>>+{ return -1; }
>> #endif
>>
>> #if defined(CONFIG_HOTPLUG_PCI) || defined(CONFIG_HOTPLUG_PCI_MODULE)
>
>Thanks,
>Gavin
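
As an aside, the generic fallback slicing in virtfn_add() can be modelled
with a small userspace toy (values are made up, this is not kernel code):

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		uint64_t iov_start = 0x3d00000000ULL;	/* made-up IOV BAR base */
		uint64_t iov_size  = 64ULL << 20;	/* made-up 64M IOV BAR  */
		unsigned int total_VFs = 16, id;
		uint64_t vf_size = iov_size / total_VFs;	/* the do_div() step */

		for (id = 0; id < 4; id++)		/* first few VFs */
			printf("VF%u: %#llx..%#llx\n", id,
			       (unsigned long long)(iov_start + vf_size * id),
			       (unsigned long long)(iov_start + vf_size * (id + 1) - 1));
		return 0;
	}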

-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 12/17] powerpc/powernv: implement pcibios_sriov_resource_alignment on powernv
  2014-06-23  6:09     ` Gavin Shan
@ 2014-06-23  8:21       ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-23  8:21 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Wei Yang, benh, linuxppc-dev, bhelgaas, linux-pci, yan, qiudayu

On Mon, Jun 23, 2014 at 04:09:47PM +1000, Gavin Shan wrote:
>On Tue, Jun 10, 2014 at 09:56:34AM +0800, Wei Yang wrote:
>>This patch implements the pcibios_sriov_resource_alignment() on powernv
>>platform.
>>
>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>---
>> arch/powerpc/include/asm/machdep.h        |    1 +
>> arch/powerpc/kernel/pci-common.c          |    8 ++++++++
>> arch/powerpc/platforms/powernv/pci-ioda.c |   17 +++++++++++++++++
>> 3 files changed, 26 insertions(+)
>>
>>diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
>>index 2f2e770..3bbc55f 100644
>>--- a/arch/powerpc/include/asm/machdep.h
>>+++ b/arch/powerpc/include/asm/machdep.h
>>@@ -242,6 +242,7 @@ struct machdep_calls {
>> 	resource_size_t (*pcibios_window_alignment)(struct pci_bus *, unsigned long type);
>> #ifdef CONFIG_PCI_IOV
>> 	resource_size_t (*__pci_sriov_resource_size)(struct pci_dev *, int resno);
>>+	resource_size_t (*__pci_sriov_resource_alignment)(struct pci_dev *, int resno, resource_size_t align);
>> #endif /* CONFIG_PCI_IOV */
>>
>> 	/* Called to shutdown machine specific hardware not already controlled
>>diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
>>index c4e2e92..35345ac 100644
>>--- a/arch/powerpc/kernel/pci-common.c
>>+++ b/arch/powerpc/kernel/pci-common.c
>>@@ -128,6 +128,14 @@ resource_size_t pcibios_sriov_resource_size(struct pci_dev *pdev, int resno)
>>
>> 	return 0;
>> }
>>+
>>+resource_size_t pcibios_sriov_resource_alignment(struct pci_dev *pdev, int resno, resource_size_t align)
>>+{
>>+	if (ppc_md.__pci_sriov_resource_alignment)
>>+		return ppc_md.__pci_sriov_resource_alignment(pdev, resno, align);
>>+
>>+	return 0;
>>+}
>> #endif /* CONFIG_PCI_IOV */
>>
>> static resource_size_t pcibios_io_size(const struct pci_controller *hose)
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 7dfad6a..b0ac851 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -1573,6 +1573,22 @@ static resource_size_t __pnv_pci_sriov_resource_size(struct pci_dev *pdev, int r
>>
>> 	return size;
>> }
>>+
>>+static resource_size_t __pnv_pci_sriov_resource_alignment(struct pci_dev *pdev, int resno,
>>+		resource_size_t align)
>
>The function could be "pcibios_sriov_resource_alignment()", but it's not a big deal.
>If you prefer the original one, then keep it :)

I guess you want it renamed to pnv_pcibios_sriov_resource_alignment()?
pcibios_sriov_resource_alignment() is the general name for this function.

If so, I will change it.

>
>>+{
>>+	struct pci_dn *pdn = pci_get_pdn(pdev);
>>+	resource_size_t iov_align;
>>+
>>+	iov_align = resource_size(&pdev->resource[resno]);
>>+	if (iov_align)
>>+		return iov_align;
>>+
>>+	if (pdn->vfs)
>>+		return pdn->vfs * align;
>>+
>>+	return align;
>>+}
>> #endif /* CONFIG_PCI_IOV */
>>
>> /* Prevent enabling devices for which we couldn't properly
>>@@ -1777,6 +1793,7 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np,
>> 	ppc_md.pcibios_window_alignment = pnv_pci_window_alignment;
>> #ifdef CONFIG_PCI_IOV
>> 	ppc_md.__pci_sriov_resource_size = __pnv_pci_sriov_resource_size;
>>+	ppc_md.__pci_sriov_resource_alignment = __pnv_pci_sriov_resource_alignment;
>> #endif /* CONFIG_PCI_IOV */
>> 	pci_add_flags(PCI_REASSIGN_ALL_RSRC);
>>
>
>Thanks,
>Gavin
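
To make the alignment rule in the hunk above concrete (illustrative numbers
only):

	/* align passed in (per-VF BAR size)        : 0x100000  (1M, example)
	 * pdn->vfs (VF slots reserved in patch 11) : 256       (example)
	 *   -> alignment returned = 256 * 1M = 0x10000000 (256M)
	 * Once the IOV BAR has already been expanded, resource_size() is
	 * non-zero and that full size is returned as the alignment instead. */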

-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 07/17] ppc/pnv: Add function to deconfig a PE
  2014-06-23  5:27     ` Gavin Shan
@ 2014-06-23  9:07       ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-23  9:07 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Wei Yang, benh, linuxppc-dev, bhelgaas, linux-pci, yan, qiudayu

On Mon, Jun 23, 2014 at 03:27:21PM +1000, Gavin Shan wrote:
>On Tue, Jun 10, 2014 at 09:56:29AM +0800, Wei Yang wrote:
>>On PowerNV platform, it will support dynamic PE allocation and deallocation.
>>
>>This patch adds a function to release those resources related to a PE.
>>
>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>---
>> arch/powerpc/platforms/powernv/pci-ioda.c |   77 +++++++++++++++++++++++++++++
>> 1 file changed, 77 insertions(+)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 8ca3926..87cb3089 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -330,6 +330,83 @@ static struct pnv_ioda_pe *pnv_ioda_get_pe(struct pci_dev *dev)
>> }
>> #endif /* CONFIG_PCI_MSI */
>>
>>+static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>>+{
>
>Richard, it seems that the deconfiguration is incomplete. Something seems
>missed: DMA, IO and MMIO, MSI. If I understand correctly, pnv_ioda_deconfigure_pe()
>won't tear down DMA, IO and MMIO, MSI properly. For MSI/MSIx, it wouldn't
>be a problem as the VF driver should disable them before calling this function.
>

Hmm... the deconfiguration function is the counterpart of the configuration
function, so it releases the resources that are allocated in the
configuration function. DMA, IO/MMIO and MSI are not assigned in the
configuration function, so it would not be proper to release those resources
here.
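
Put differently, the teardown only mirrors what pnv_ioda_configure_pe() set
up (a summary of the quoted function, not new behaviour):

	/* pnv_ioda_deconfigure_pe() undoes:
	 *   - the MVE enable on IODA1        (opal_pci_set_mve_enable)
	 *   - the RID reverse map            (phb->ioda.pe_rmap[])
	 *   - the parents' PELT-V entries    (opal_pci_set_peltv)
	 *   - the PELT mapping itself        (opal_pci_set_pe, OPAL_UNMAP_PE)
	 * DMA windows, IO/MMIO segments and MSIs are expected to be released
	 * by whoever set them up, not here. */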

>>+	struct pci_dev *parent;
>>+	uint8_t bcomp, dcomp, fcomp;
>>+	int64_t rc;
>>+	long rid_end, rid;
>
>Blank line needed here to separate variable declaration and logic. And I think
>we won't run into case "if (pe->pbus)" for now. So it's worthy to have some
>comments to explain it for a bit :-)

Ok.

>
>>+	if (pe->pbus) {
>>+		int count;
>>+
>>+		dcomp = OPAL_IGNORE_RID_DEVICE_NUMBER;
>>+		fcomp = OPAL_IGNORE_RID_FUNCTION_NUMBER;
>>+		parent = pe->pbus->self;
>>+		if (pe->flags & PNV_IODA_PE_BUS_ALL)
>>+			count = pe->pbus->busn_res.end - pe->pbus->busn_res.start + 1;
>>+		else
>>+			count = 1;
>>+
>>+		switch(count) {
>>+		case  1: bcomp = OpalPciBusAll;         break;
>>+		case  2: bcomp = OpalPciBus7Bits;       break;
>>+		case  4: bcomp = OpalPciBus6Bits;       break;
>>+		case  8: bcomp = OpalPciBus5Bits;       break;
>>+		case 16: bcomp = OpalPciBus4Bits;       break;
>>+		case 32: bcomp = OpalPciBus3Bits;       break;
>>+		default:
>>+			pr_err("%s: Number of subordinate busses %d"
>>+			       " unsupported\n",
>>+			       pci_name(pe->pbus->self), count);
>
>I guess it's not safe to do "pci_name(pe->pbus->self)" for the root bus.
>

Ok, so there is a bug in the original code, will fix this.

>>+			/* Do an exact match only */
>>+			bcomp = OpalPciBusAll;
>>+		}
>>+		rid_end = pe->rid + (count << 8);
>>+	}else {
>
>	} else {
>
>>+		parent = pe->pdev->bus->self;
>>+		bcomp = OpalPciBusAll;
>>+		dcomp = OPAL_COMPARE_RID_DEVICE_NUMBER;
>>+		fcomp = OPAL_COMPARE_RID_FUNCTION_NUMBER;
>>+		rid_end = pe->rid + 1;
>>+	}
>>+
>>+	/* Disable MVT on IODA1 */
>>+	if (phb->type == PNV_PHB_IODA1) {
>>+		rc = opal_pci_set_mve_enable(phb->opal_id,
>>+					     pe->mve_number, OPAL_DISABLE_MVE);
>>+		if (rc) {
>>+			pe_err(pe, "OPAL error %ld enabling MVE %d\n",
>>+			       rc, pe->mve_number);
>>+			pe->mve_number = -1;
>>+		}
>>+	}
>>+	/* Clear the reverse map */
>>+	for (rid = pe->rid; rid < rid_end; rid++)
>>+		phb->ioda.pe_rmap[rid] = 0;
>>+
>>+	/* Release from all parents PELT-V */
>>+	while (parent) {
>>+		struct pci_dn *pdn = pci_get_pdn(parent);
>>+		if (pdn && pdn->pe_number != IODA_INVALID_PE) {
>>+			rc = opal_pci_set_peltv(phb->opal_id, pdn->pe_number,
>>+						pe->pe_number, OPAL_REMOVE_PE_FROM_DOMAIN);
>>+			/* XXX What to do in case of error ? */
>>+		}
>>+		parent = parent->bus->self;
>>+	}
>
>It seems that you missed removing the PE from its own PELTV, which was
>introduced by commit 631ad69 ("powerpc/powernv: Add PE to its own PELTV").
>

Sounds correct, this is missed.
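
Presumably something along these lines before the parent loop, mirroring
what the configure path does for the PE's own PELTV (sketch only):

	/* remove the PE from its own PELT-V as well */
	rc = opal_pci_set_peltv(phb->opal_id, pe->pe_number,
				pe->pe_number, OPAL_REMOVE_PE_FROM_DOMAIN);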

>>+
>>+	/* Dissociate PE in PELT */
>>+	rc = opal_pci_set_pe(phb->opal_id, pe->pe_number, pe->rid,
>>+			     bcomp, dcomp, fcomp, OPAL_UNMAP_PE);
>>+	if (rc)
>>+		pe_err(pe, "OPAL error %ld trying to setup PELT table\n", rc);
>>+
>>+	pe->pbus = NULL;
>>+	pe->pdev = NULL;
>>+
>>+	return 0;
>>+}
>>+
>> static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>> {
>> 	struct pci_dev *parent;
>
>Thanks,
>Gavin

-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 12/17] powerpc/powernv: implement pcibios_sriov_resource_alignment on powernv
  2014-06-23  8:21       ` Wei Yang
@ 2014-06-23 23:29         ` Gavin Shan
  -1 siblings, 0 replies; 100+ messages in thread
From: Gavin Shan @ 2014-06-23 23:29 UTC (permalink / raw)
  To: Wei Yang
  Cc: Gavin Shan, benh, linuxppc-dev, bhelgaas, linux-pci, yan, qiudayu

On Mon, Jun 23, 2014 at 04:21:42PM +0800, Wei Yang wrote:
>On Mon, Jun 23, 2014 at 04:09:47PM +1000, Gavin Shan wrote:
>>On Tue, Jun 10, 2014 at 09:56:34AM +0800, Wei Yang wrote:
>>>This patch implements the pcibios_sriov_resource_alignment() on powernv
>>>platform.
>>>
>>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>>---
>>> arch/powerpc/include/asm/machdep.h        |    1 +
>>> arch/powerpc/kernel/pci-common.c          |    8 ++++++++
>>> arch/powerpc/platforms/powernv/pci-ioda.c |   17 +++++++++++++++++
>>> 3 files changed, 26 insertions(+)
>>>
>>>diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
>>>index 2f2e770..3bbc55f 100644
>>>--- a/arch/powerpc/include/asm/machdep.h
>>>+++ b/arch/powerpc/include/asm/machdep.h
>>>@@ -242,6 +242,7 @@ struct machdep_calls {
>>> 	resource_size_t (*pcibios_window_alignment)(struct pci_bus *, unsigned long type);
>>> #ifdef CONFIG_PCI_IOV
>>> 	resource_size_t (*__pci_sriov_resource_size)(struct pci_dev *, int resno);
>>>+	resource_size_t (*__pci_sriov_resource_alignment)(struct pci_dev *, int resno, resource_size_t align);

Both lines exceed 80 characters here :)

>>> #endif /* CONFIG_PCI_IOV */
>>>
>>> 	/* Called to shutdown machine specific hardware not already controlled
>>>diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
>>>index c4e2e92..35345ac 100644
>>>--- a/arch/powerpc/kernel/pci-common.c
>>>+++ b/arch/powerpc/kernel/pci-common.c
>>>@@ -128,6 +128,14 @@ resource_size_t pcibios_sriov_resource_size(struct pci_dev *pdev, int resno)
>>>
>>> 	return 0;
>>> }
>>>+
>>>+resource_size_t pcibios_sriov_resource_alignment(struct pci_dev *pdev, int resno, resource_size_t align)
>>>+{
>>>+	if (ppc_md.__pci_sriov_resource_alignment)
>>>+		return ppc_md.__pci_sriov_resource_alignment(pdev, resno, align);
>>>+
>>>+	return 0;
>>>+}
>>> #endif /* CONFIG_PCI_IOV */
>>>
>>> static resource_size_t pcibios_io_size(const struct pci_controller *hose)
>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>index 7dfad6a..b0ac851 100644
>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>@@ -1573,6 +1573,22 @@ static resource_size_t __pnv_pci_sriov_resource_size(struct pci_dev *pdev, int r
>>>
>>> 	return size;
>>> }
>>>+
>>>+static resource_size_t __pnv_pci_sriov_resource_alignment(struct pci_dev *pdev, int resno,
>>>+		resource_size_t align)
>>
>>The function could be "pcibios_sriov_resource_alignment()", but it's not a big deal.
>>If you prefer the original one, then keep it :)
>
>I guess you want it renamed to pnv_pcibios_sriov_resource_alignment()?
>pcibios_sriov_resource_alignment() is the general name for this function.
>
>If so, I will change it.
>

Nope, What I mean is to have something like this:

	struct machdep_calls {
		:
	#ifdef CONFIG_PCI_IOV
	resource_size_t (*pci_sriov_resource_size)(struct pci_dev *dev,
						   int resno);
	resource_size_t (*pci_sriov_resource_alignment)(struct pci_dev *dev,
							int resno,
							resource_size_t align);
	#endif
		:
	}

	ppc_md.pci_sriov_resource_size = pnv_pci_iov_res_size;
	ppc_md.pci_sriov_resource_alignment = pnv_pci_iov_res_alignment;

The point is not to have prefix "__" for callbacks in "struct machdep_calls".
ppc_md.__pci_sriov_resource_size is the first one that has prefix "__"

>>
>>>+{
>>>+	struct pci_dn *pdn = pci_get_pdn(pdev);
>>>+	resource_size_t iov_align;
>>>+
>>>+	iov_align = resource_size(&pdev->resource[resno]);
>>>+	if (iov_align)
>>>+		return iov_align;
>>>+
>>>+	if (pdn->vfs)
>>>+		return pdn->vfs * align;
>>>+
>>>+	return align;
>>>+}
>>> #endif /* CONFIG_PCI_IOV */
>>>
>>> /* Prevent enabling devices for which we couldn't properly
>>>@@ -1777,6 +1793,7 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>> 	ppc_md.pcibios_window_alignment = pnv_pci_window_alignment;
>>> #ifdef CONFIG_PCI_IOV
>>> 	ppc_md.__pci_sriov_resource_size = __pnv_pci_sriov_resource_size;
>>>+	ppc_md.__pci_sriov_resource_alignment = __pnv_pci_sriov_resource_alignment;
>>> #endif /* CONFIG_PCI_IOV */
>>> 	pci_add_flags(PCI_REASSIGN_ALL_RSRC);
>>>

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 12/17] powerpc/powernv: implement pcibios_sriov_resource_alignment on powernv
  2014-06-23 23:29         ` Gavin Shan
@ 2014-06-24  1:24           ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-24  1:24 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Wei Yang, benh, linuxppc-dev, bhelgaas, linux-pci, yan, qiudayu

On Tue, Jun 24, 2014 at 09:29:22AM +1000, Gavin Shan wrote:
>On Mon, Jun 23, 2014 at 04:21:42PM +0800, Wei Yang wrote:
>>On Mon, Jun 23, 2014 at 04:09:47PM +1000, Gavin Shan wrote:
>>>On Tue, Jun 10, 2014 at 09:56:34AM +0800, Wei Yang wrote:
>>>>This patch implements the pcibios_sriov_resource_alignment() on powernv
>>>>platform.
>>>>
>>>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>>>---
>>>> arch/powerpc/include/asm/machdep.h        |    1 +
>>>> arch/powerpc/kernel/pci-common.c          |    8 ++++++++
>>>> arch/powerpc/platforms/powernv/pci-ioda.c |   17 +++++++++++++++++
>>>> 3 files changed, 26 insertions(+)
>>>>
>>>>diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
>>>>index 2f2e770..3bbc55f 100644
>>>>--- a/arch/powerpc/include/asm/machdep.h
>>>>+++ b/arch/powerpc/include/asm/machdep.h
>>>>@@ -242,6 +242,7 @@ struct machdep_calls {
>>>> 	resource_size_t (*pcibios_window_alignment)(struct pci_bus *, unsigned long type);
>>>> #ifdef CONFIG_PCI_IOV
>>>> 	resource_size_t (*__pci_sriov_resource_size)(struct pci_dev *, int resno);
>>>>+	resource_size_t (*__pci_sriov_resource_alignment)(struct pci_dev *, int resno, resource_size_t align);
>
>Both lines exceed 80 characters here :)
>
>>>> #endif /* CONFIG_PCI_IOV */
>>>>
>>>> 	/* Called to shutdown machine specific hardware not already controlled
>>>>diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
>>>>index c4e2e92..35345ac 100644
>>>>--- a/arch/powerpc/kernel/pci-common.c
>>>>+++ b/arch/powerpc/kernel/pci-common.c
>>>>@@ -128,6 +128,14 @@ resource_size_t pcibios_sriov_resource_size(struct pci_dev *pdev, int resno)
>>>>
>>>> 	return 0;
>>>> }
>>>>+
>>>>+resource_size_t pcibios_sriov_resource_alignment(struct pci_dev *pdev, int resno, resource_size_t align)
>>>>+{
>>>>+	if (ppc_md.__pci_sriov_resource_alignment)
>>>>+		return ppc_md.__pci_sriov_resource_alignment(pdev, resno, align);
>>>>+
>>>>+	return 0;
>>>>+}
>>>> #endif /* CONFIG_PCI_IOV */
>>>>
>>>> static resource_size_t pcibios_io_size(const struct pci_controller *hose)
>>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>index 7dfad6a..b0ac851 100644
>>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>@@ -1573,6 +1573,22 @@ static resource_size_t __pnv_pci_sriov_resource_size(struct pci_dev *pdev, int r
>>>>
>>>> 	return size;
>>>> }
>>>>+
>>>>+static resource_size_t __pnv_pci_sriov_resource_alignment(struct pci_dev *pdev, int resno,
>>>>+		resource_size_t align)
>>>
>>>The function could be "pcibios_sriov_resource_alignment()", but it's not a big deal.
>>>If you prefer the original one, then keep it :)
>>
>>I guess you want to name it to pnv_pcibios_sriov_resource_alignment()?
>>pcibios_sriov_resource_alignment() is the general name for this function.
>>
>>If yes, this is changed.
>>
>
>Nope, What I mean is to have something like this:
>
>	struct machdep_calls {
>		:
>	#ifdef CONFIG_PCI_IOV
>	resource_size_t (*pci_sriov_resource_size)(struct pci_dev *dev,
>						   int resno);
>	resource_size_t (*pci_sriov_resource_alignment)(struct pci_dev *dev,
>							int resno,
>							resource_size_t align);
>	#endif
>		:
>	}
>
>	ppc_md.pci_sriov_resource_size = pnv_pci_iov_res_size;
>	ppc_md.pci_sriov_resource_alignment = pnv_pci_iov_res_alignment;
>
>The point is not to have prefix "__" for callbacks in "struct machdep_calls".
>ppc_md.__pci_sriov_resource_size is the first one that has prefix "__"

Yep, will change the name.
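
Something like the following, presumably (the registration below is only a
sketch; the final names may differ):

	#ifdef CONFIG_PCI_IOV
		ppc_md.pci_sriov_resource_size = pnv_pci_sriov_resource_size;
		ppc_md.pci_sriov_resource_alignment = pnv_pci_sriov_resource_alignment;
	#endif /* CONFIG_PCI_IOV */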

>
>>>
>>>>+{
>>>>+	struct pci_dn *pdn = pci_get_pdn(pdev);
>>>>+	resource_size_t iov_align;
>>>>+
>>>>+	iov_align = resource_size(&pdev->resource[resno]);
>>>>+	if (iov_align)
>>>>+		return iov_align;
>>>>+
>>>>+	if (pdn->vfs)
>>>>+		return pdn->vfs * align;
>>>>+
>>>>+	return align;
>>>>+}
>>>> #endif /* CONFIG_PCI_IOV */
>>>>
>>>> /* Prevent enabling devices for which we couldn't properly
>>>>@@ -1777,6 +1793,7 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np,
>>>> 	ppc_md.pcibios_window_alignment = pnv_pci_window_alignment;
>>>> #ifdef CONFIG_PCI_IOV
>>>> 	ppc_md.__pci_sriov_resource_size = __pnv_pci_sriov_resource_size;
>>>>+	ppc_md.__pci_sriov_resource_alignment = __pnv_pci_sriov_resource_alignment;
>>>> #endif /* CONFIG_PCI_IOV */
>>>> 	pci_add_flags(PCI_REASSIGN_ALL_RSRC);
>>>>
>
>Thanks,
>Gavin

-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 06/17] ppc/pnv: allocate pe->iommu_table dynamically
  2014-06-10  1:56   ` Wei Yang
@ 2014-06-24 10:06     ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 100+ messages in thread
From: Alexey Kardashevskiy @ 2014-06-24 10:06 UTC (permalink / raw)
  To: Wei Yang, benh; +Cc: linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu

On 06/10/2014 11:56 AM, Wei Yang wrote:
> Current iommu_table of a PE is a static field. This will have a problem when
> iommu_free_table is called.

What kind of problem? This table is per PE and PE is not going anywhere.


> 
> This patch allocate iommu_table dynamically.
> 
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/iommu.h          |    3 +++
>  arch/powerpc/platforms/powernv/pci-ioda.c |   24 +++++++++++++-----------
>  arch/powerpc/platforms/powernv/pci.h      |    2 +-
>  3 files changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> index 42632c7..0fedacb 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -78,6 +78,9 @@ struct iommu_table {
>  	struct iommu_group *it_group;
>  #endif
>  	void (*set_bypass)(struct iommu_table *tbl, bool enable);
> +#ifdef CONFIG_PPC_POWERNV
> +	void           *data;
> +#endif
>  };
>  
>  /* Pure 2^n version of get_order */
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 9715351..8ca3926 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -608,6 +608,10 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
>  		return;
>  	}
>  
> +	pe->tce32_table = kzalloc_node(sizeof(struct iommu_table),
> +			GFP_KERNEL, hose->node);
> +	pe->tce32_table->data = pe;
> +
>  	/* Associate it with all child devices */
>  	pnv_ioda_setup_same_PE(bus, pe);
>  
> @@ -675,7 +679,7 @@ static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb *phb, struct pci_dev *pdev
>  
>  	pe = &phb->ioda.pe_array[pdn->pe_number];
>  	WARN_ON(get_dma_ops(&pdev->dev) != &dma_iommu_ops);
> -	set_iommu_table_base_and_group(&pdev->dev, &pe->tce32_table);
> +	set_iommu_table_base_and_group(&pdev->dev, pe->tce32_table);
>  }
>  
>  static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
> @@ -702,7 +706,7 @@ static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
>  	} else {
>  		dev_info(&pdev->dev, "Using 32-bit DMA via iommu\n");
>  		set_dma_ops(&pdev->dev, &dma_iommu_ops);
> -		set_iommu_table_base(&pdev->dev, &pe->tce32_table);
> +		set_iommu_table_base(&pdev->dev, pe->tce32_table);
>  	}
>  	return 0;
>  }
> @@ -712,7 +716,7 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, struct pci_bus *bus)
>  	struct pci_dev *dev;
>  
>  	list_for_each_entry(dev, &bus->devices, bus_list) {
> -		set_iommu_table_base_and_group(&dev->dev, &pe->tce32_table);
> +		set_iommu_table_base_and_group(&dev->dev, pe->tce32_table);
>  		if (dev->subordinate)
>  			pnv_ioda_setup_bus_dma(pe, dev->subordinate);
>  	}
> @@ -798,8 +802,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
>  void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
>  				 __be64 *startp, __be64 *endp, bool rm)
>  {
> -	struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
> -					      tce32_table);
> +	struct pnv_ioda_pe *pe = tbl->data;
>  	struct pnv_phb *phb = pe->phb;
>  
>  	if (phb->type == PNV_PHB_IODA1)
> @@ -862,7 +865,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>  	}
>  
>  	/* Setup linux iommu table */
> -	tbl = &pe->tce32_table;
> +	tbl = pe->tce32_table;
>  	pnv_pci_setup_iommu_table(tbl, addr, PNV_TCE32_TAB_SIZE * segs,
>  				  base << PNV_TCE32_SEG_SHIFT);
>  
> @@ -900,8 +903,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
>  
>  static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
>  {
> -	struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
> -					      tce32_table);
> +	struct pnv_ioda_pe *pe = tbl->data;
>  	uint16_t window_id = (pe->pe_number << 1 ) + 1;
>  	int64_t rc;
>  
> @@ -942,10 +944,10 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
>  	pe->tce_bypass_base = 1ull << 59;
>  
>  	/* Install set_bypass callback for VFIO */
> -	pe->tce32_table.set_bypass = pnv_pci_ioda2_set_bypass;
> +	pe->tce32_table->set_bypass = pnv_pci_ioda2_set_bypass;
>  
>  	/* Enable bypass by default */
> -	pnv_pci_ioda2_set_bypass(&pe->tce32_table, true);
> +	pnv_pci_ioda2_set_bypass(pe->tce32_table, true);
>  }
>  
>  static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> @@ -993,7 +995,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>  	}
>  
>  	/* Setup linux iommu table */
> -	tbl = &pe->tce32_table;
> +	tbl = pe->tce32_table;
>  	pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0);
>  
>  	/* OPAL variant of PHB3 invalidated TCEs */
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 90f6da4..9fbf7c0 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -60,7 +60,7 @@ struct pnv_ioda_pe {
>  	/* "Base" iommu table, ie, 4K TCEs, 32-bit DMA */
>  	int			tce32_seg;
>  	int			tce32_segcount;
> -	struct iommu_table	tce32_table;
> +	struct iommu_table	*tce32_table;
>  	phys_addr_t		tce_inval_reg_phys;
>  
>  	/* 64-bit TCE bypass region */
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 06/17] ppc/pnv: allocate pe->iommu_table dynamically
  2014-06-24 10:06     ` Alexey Kardashevskiy
@ 2014-06-25  1:12       ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-25  1:12 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Wei Yang, benh, linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu

On Tue, Jun 24, 2014 at 08:06:32PM +1000, Alexey Kardashevskiy wrote:
>On 06/10/2014 11:56 AM, Wei Yang wrote:
>> Current iommu_table of a PE is a static field. This will have a problem when
>> iommu_free_table is called.
>
>What kind of problem? This table is per PE and PE is not going anywhere.
>

Yes, for Bus PEs, they will always sit in the system. When VF PEs are
introduced, they could be released on the fly. When they are released, so is
the iommu table for the PE.


-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 06/17] ppc/pnv: allocate pe->iommu_table dynamically
  2014-06-25  1:12       ` Wei Yang
@ 2014-06-25  4:12         ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 100+ messages in thread
From: Alexey Kardashevskiy @ 2014-06-25  4:12 UTC (permalink / raw)
  To: Wei Yang; +Cc: benh, linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu

On 06/25/2014 11:12 AM, Wei Yang wrote:
> On Tue, Jun 24, 2014 at 08:06:32PM +1000, Alexey Kardashevskiy wrote:
>> On 06/10/2014 11:56 AM, Wei Yang wrote:
>>> Current iommu_table of a PE is a static field. This will have a problem when
>>> iommu_free_table is called.
>>
>> What kind of problem? This table is per PE and PE is not going anywhere.
>>
> 
> Yes, for Bus PE, they will always sit in the system. When VF PE introduced,
> they could be released on the fly. When they are released, so do the iommu
> table for the PE.

iommu_table is a part of PE struct. When PE is released, iommu_table will
go with it as well. Why make it a pointer? I would understand it if you
added reference counting there but no - iommu_table's lifetime is equal to
PE lifetime.



-- 
Alexey

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 06/17] ppc/pnv: allocate pe->iommu_table dynamically
  2014-06-25  4:12         ` Alexey Kardashevskiy
@ 2014-06-25  5:27           ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-25  5:27 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Wei Yang, benh, linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu

On Wed, Jun 25, 2014 at 02:12:34PM +1000, Alexey Kardashevskiy wrote:
>On 06/25/2014 11:12 AM, Wei Yang wrote:
>> On Tue, Jun 24, 2014 at 08:06:32PM +1000, Alexey Kardashevskiy wrote:
>>> On 06/10/2014 11:56 AM, Wei Yang wrote:
>>>> Current iommu_table of a PE is a static field. This will have a problem when
>>>> iommu_free_table is called.
>>>
>>> What kind of problem? This table is per PE and PE is not going anywhere.
>>>
>> 
>> Yes, for Bus PE, they will always sit in the system. When VF PE introduced,
>> they could be released on the fly. When they are released, so do the iommu
>> table for the PE.
>
>iommu_table is a part of PE struct. When PE is released, iommu_table will
>go with it as well. Why to make is a pointer? I would understand it if you
>added reference counting there but no - iommu_table's lifetime is equal to
>PE lifetime.
>

Yes, the iommu_table's lifetime equals the PE lifetime, so when releasing a PE
we need to release the iommu table. Currently, there is one function to release
the iommu table, iommu_free_table(), which takes a pointer to the iommu_table
and releases it.

If the iommu table is just an embedded part of the PE, it is a problem to
release it with iommu_free_table(). That's why I made it a pointer in the PE
structure.
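
To spell the clash out (a sketch, not code from the patch; the "ioda" node-name
string below is just a placeholder): iommu_free_table() currently ends with a
kfree() of the descriptor itself, which only works if the descriptor was
allocated on its own:

	/* with the pointer, as in this patch: */
	pe->tce32_table = kzalloc_node(sizeof(struct iommu_table),
				       GFP_KERNEL, hose->node);
	/* ... */
	iommu_free_table(pe->tce32_table, "ioda");	/* final kfree(tbl) is fine */

	/* with the table embedded in the PE, the same call would kfree()
	 * memory sitting in the middle of the pnv_ioda_pe structure:
	 */
	iommu_free_table(&pe->tce32_table, "ioda");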

>
>
>-- 
>Alexey

-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 06/17] ppc/pnv: allocate pe->iommu_table dynamically
  2014-06-25  5:27           ` Wei Yang
@ 2014-06-25  7:50             ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 100+ messages in thread
From: Alexey Kardashevskiy @ 2014-06-25  7:50 UTC (permalink / raw)
  To: Wei Yang; +Cc: benh, linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu

On 06/25/2014 03:27 PM, Wei Yang wrote:
> On Wed, Jun 25, 2014 at 02:12:34PM +1000, Alexey Kardashevskiy wrote:
>> On 06/25/2014 11:12 AM, Wei Yang wrote:
>>> On Tue, Jun 24, 2014 at 08:06:32PM +1000, Alexey Kardashevskiy wrote:
>>>> On 06/10/2014 11:56 AM, Wei Yang wrote:
>>>>> Current iommu_table of a PE is a static field. This will have a problem when
>>>>> iommu_free_table is called.
>>>>
>>>> What kind of problem? This table is per PE and PE is not going anywhere.
>>>>
>>>
>>> Yes, for Bus PE, they will always sit in the system. When VF PE introduced,
>>> they could be released on the fly. When they are released, so do the iommu
>>> table for the PE.
>>
>> iommu_table is a part of PE struct. When PE is released, iommu_table will
>> go with it as well. Why to make is a pointer? I would understand it if you
>> added reference counting there but no - iommu_table's lifetime is equal to
>> PE lifetime.
>>
> 
> Yes, iommu_talbe's life time equals to PE lifetime, so when releasing a PE we
> need to release the iommu table. Currently, there is one function to release
> the iommu table, iommu_free_table() which takes a pointer of the iommu_table
> and release it.
> 
> If the iommu table in PE is just a part of PE, it will have some problem to
> release it with iommu_free_table(). That's why I make it a pointer in PE
> structure.

So you are saying that you want to release PE by one kfree() and release
iommu_table by another kfree (embedded into iommu_free_table()). For me
that means that the PE and the iommu_table have different lifetimes.

And I cannot find the exact place in this patchset where you call
iommu_free_table(), what do I miss?




-- 
Alexey

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 06/17] ppc/pnv: allocate pe->iommu_table dynamically
  2014-06-25  7:50             ` Alexey Kardashevskiy
@ 2014-06-25  7:56               ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 100+ messages in thread
From: Benjamin Herrenschmidt @ 2014-06-25  7:56 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Wei Yang, linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu

On Wed, 2014-06-25 at 17:50 +1000, Alexey Kardashevskiy wrote:

> > Yes, iommu_talbe's life time equals to PE lifetime, so when releasing a PE we
> > need to release the iommu table. Currently, there is one function to release
> > the iommu table, iommu_free_table() which takes a pointer of the iommu_table
> > and release it.
> > 
> > If the iommu table in PE is just a part of PE, it will have some problem to
> > release it with iommu_free_table(). That's why I make it a pointer in PE
> > structure.
> 
> So you are saying that you want to release PE by one kfree() and release
> iommu_table by another kfree (embedded into iommu_free_table()). For me
> that means that PE and iommu_table have different lifetime.
> 
> And I cannot find the exact place in this patchset where you call
> iommu_free_table(), what do I miss?

He has a point though... iommu_free_table() does a whole bunch of things
in addition to kfree at the end.

This is a discrepancy in the iommu.c code: we don't allocate the table
(it's allocated by our callers), but we do free it in iommu_free_table().

My gut feeling is that we should fix that in the core by moving the
kfree() out of iommu_free_table() and back into vio.c and
pseries/iommu.c, the only two callers; otherwise we can't wrap the table
structure inside another object if we are ever going to free it.
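
In other words, roughly (a sketch, not a real patch):

	/* arch/powerpc/kernel/iommu.c */
	void iommu_free_table(struct iommu_table *tbl, const char *node_name)
	{
		/* ... existing cleanup, minus the final kfree(tbl) ... */
	}

	/* and the two existing callers (vio.c, pseries/iommu.c) then do: */
	iommu_free_table(tbl, node_name);
	kfree(tbl);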

Cheers,
Ben.





^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 06/17] ppc/pnv: allocate pe->iommu_table dynamically
  2014-06-25  7:50             ` Alexey Kardashevskiy
@ 2014-06-25  9:13               ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-25  9:13 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Wei Yang, benh, linuxppc-dev, bhelgaas, linux-pci, gwshan, yan, qiudayu

On Wed, Jun 25, 2014 at 05:50:08PM +1000, Alexey Kardashevskiy wrote:
>On 06/25/2014 03:27 PM, Wei Yang wrote:
>> On Wed, Jun 25, 2014 at 02:12:34PM +1000, Alexey Kardashevskiy wrote:
>>> On 06/25/2014 11:12 AM, Wei Yang wrote:
>>>> On Tue, Jun 24, 2014 at 08:06:32PM +1000, Alexey Kardashevskiy wrote:
>>>>> On 06/10/2014 11:56 AM, Wei Yang wrote:
>>>>>> Current iommu_table of a PE is a static field. This will have a problem when
>>>>>> iommu_free_table is called.
>>>>>
>>>>> What kind of problem? This table is per PE and PE is not going anywhere.
>>>>>
>>>>
>>>> Yes, for Bus PE, they will always sit in the system. When VF PE introduced,
>>>> they could be released on the fly. When they are released, so do the iommu
>>>> table for the PE.
>>>
>>> iommu_table is a part of PE struct. When PE is released, iommu_table will
>>> go with it as well. Why to make is a pointer? I would understand it if you
>>> added reference counting there but no - iommu_table's lifetime is equal to
>>> PE lifetime.
>>>
>> 
>> Yes, iommu_talbe's life time equals to PE lifetime, so when releasing a PE we
>> need to release the iommu table. Currently, there is one function to release
>> the iommu table, iommu_free_table() which takes a pointer of the iommu_table
>> and release it.
>> 
>> If the iommu table in PE is just a part of PE, it will have some problem to
>> release it with iommu_free_table(). That's why I make it a pointer in PE
>> structure.
>
>So you are saying that you want to release PE by one kfree() and release
>iommu_table by another kfree (embedded into iommu_free_table()). For me
>that means that PE and iommu_table have different lifetime.
>

Hmm... that's right, the lifetimes of these two may differ.

>And I cannot find the exact place in this patchset where you call
>iommu_free_table(), what do I miss?
>

This is called in pnv_pci_release_dev_dma(), which is introduced in commit
cd740988 ("powerpc/powernv: allocate VF PE").
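
Roughly along these lines (an illustrative sketch only; the function signature
and the node-name string here are guesses, not the actual hunk from that
commit):

	static void pnv_pci_release_dev_dma(struct pnv_ioda_pe *pe)
	{
		if (!pe->tce32_table)
			return;
		/* iommu_free_table() also kfree()s the descriptor, which is
		 * why tce32_table needs to be a pointer here */
		iommu_free_table(pe->tce32_table, "pnv-iov");
		pe->tce32_table = NULL;
	}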

>
>
>
>-- 
>Alexey

-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 06/17] ppc/pnv: allocate pe->iommu_table dynamically
  2014-06-25  7:56               ` Benjamin Herrenschmidt
@ 2014-06-25  9:18                 ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-25  9:18 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alexey Kardashevskiy, Wei Yang, linuxppc-dev, bhelgaas,
	linux-pci, gwshan, yan, qiudayu

On Wed, Jun 25, 2014 at 05:56:37PM +1000, Benjamin Herrenschmidt wrote:
>On Wed, 2014-06-25 at 17:50 +1000, Alexey Kardashevskiy wrote:
>
>> > Yes, iommu_talbe's life time equals to PE lifetime, so when releasing a PE we
>> > need to release the iommu table. Currently, there is one function to release
>> > the iommu table, iommu_free_table() which takes a pointer of the iommu_table
>> > and release it.
>> > 
>> > If the iommu table in PE is just a part of PE, it will have some problem to
>> > release it with iommu_free_table(). That's why I make it a pointer in PE
>> > structure.
>> 
>> So you are saying that you want to release PE by one kfree() and release
>> iommu_table by another kfree (embedded into iommu_free_table()). For me
>> that means that PE and iommu_table have different lifetime.
>> 
>> And I cannot find the exact place in this patchset where you call
>> iommu_free_table(), what do I miss?
>
>He has a point though... iommu_free_table() does a whole bunch of things
>in addition to kfree at the end.
>
>This is a discrepancy in the iommu.c code, we don't allocate the table,
>it's allocated by our callers, but we do free it in iommu_free_table().
>
>My gut feeling is that we should fix that in the core by moving the
>kfree() out of iommu_free_table() and back into vio.c and
>pseries/iommu.c, the only two callers, otherwise we can't wrap the table
>structure inside another object if we are going to ever free it.
>

Yes, this is another option. Moving the kfree() outside would keep some of the
logic in the current code, like in pnv_pci_ioda_tce_invalidate(): we could keep
the direct container_of() relationship between the tbl and the PE structure,
instead of adding a field in tbl to point back to the PE structure.
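
That is, pnv_pci_ioda_tce_invalidate() could keep its current form (this is
today's code, quoted for reference):

	struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
					      tce32_table);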

>Cheers,
>Ben.
>
>
>

-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [RFC PATCH V3 06/17] ppc/pnv: allocate pe->iommu_table dynamically
  2014-06-25  5:27           ` Wei Yang
@ 2014-06-25  9:20             ` David Laight
  -1 siblings, 0 replies; 100+ messages in thread
From: David Laight @ 2014-06-25  9:20 UTC (permalink / raw)
  To: 'Wei Yang', Alexey Kardashevskiy
  Cc: benh, linux-pci, gwshan, yan, bhelgaas, qiudayu, linuxppc-dev

From: Wei Yang
> On Wed, Jun 25, 2014 at 02:12:34PM +1000, Alexey Kardashevskiy wrote:
> >On 06/25/2014 11:12 AM, Wei Yang wrote:
> >> On Tue, Jun 24, 2014 at 08:06:32PM +1000, Alexey Kardashevskiy wrote:
> >>> On 06/10/2014 11:56 AM, Wei Yang wrote:
> >>>> Current iommu_table of a PE is a static field. This will have a problem when
> >>>> iommu_free_table is called.
> >>>
> >>> What kind of problem? This table is per PE and PE is not going anywhere.
> >>>
> >>
> >> Yes, for Bus PE, they will always sit in the system. When VF PE introduced,
> >> they could be released on the fly. When they are released, so do the iommu
> >> table for the PE.
> >
> >iommu_table is a part of PE struct. When PE is released, iommu_table will
> >go with it as well. Why to make is a pointer? I would understand it if you
> >added reference counting there but no - iommu_table's lifetime is equal to
> >PE lifetime.
> >
> 
> Yes, iommu_talbe's life time equals to PE lifetime, so when releasing a PE we
> need to release the iommu table. Currently, there is one function to release
> the iommu table, iommu_free_table() which takes a pointer of the iommu_table
> and release it.
> 
> If the iommu table in PE is just a part of PE, it will have some problem to
> release it with iommu_free_table(). That's why I make it a pointer in PE
> structure.

What are the sizes of the iommu table and the PE structure?
If the table is a round number of pages then you probably don't want to
embed it inside the PE structure.

	David

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 06/17] ppc/pnv: allocate pe->iommu_table dynamically
  2014-06-25  9:20             ` David Laight
@ 2014-06-25  9:31               ` Wei Yang
  -1 siblings, 0 replies; 100+ messages in thread
From: Wei Yang @ 2014-06-25  9:31 UTC (permalink / raw)
  To: David Laight
  Cc: 'Wei Yang',
	Alexey Kardashevskiy, benh, linux-pci, gwshan, yan, bhelgaas,
	qiudayu, linuxppc-dev

On Wed, Jun 25, 2014 at 09:20:11AM +0000, David Laight wrote:
>From: Wei Yang
>> On Wed, Jun 25, 2014 at 02:12:34PM +1000, Alexey Kardashevskiy wrote:
>> >On 06/25/2014 11:12 AM, Wei Yang wrote:
>> >> On Tue, Jun 24, 2014 at 08:06:32PM +1000, Alexey Kardashevskiy wrote:
>> >>> On 06/10/2014 11:56 AM, Wei Yang wrote:
>> >>>> Current iommu_table of a PE is a static field. This will have a problem when
>> >>>> iommu_free_table is called.
>> >>>
>> >>> What kind of problem? This table is per PE and PE is not going anywhere.
>> >>>
>> >>
>> >> Yes, for Bus PE, they will always sit in the system. When VF PE introduced,
>> >> they could be released on the fly. When they are released, so do the iommu
>> >> table for the PE.
>> >
>> >iommu_table is a part of PE struct. When PE is released, iommu_table will
>> >go with it as well. Why to make is a pointer? I would understand it if you
>> >added reference counting there but no - iommu_table's lifetime is equal to
>> >PE lifetime.
>> >
>> 
>> Yes, iommu_talbe's life time equals to PE lifetime, so when releasing a PE we
>> need to release the iommu table. Currently, there is one function to release
>> the iommu table, iommu_free_table() which takes a pointer of the iommu_table
>> and release it.
>> 
>> If the iommu table in PE is just a part of PE, it will have some problem to
>> release it with iommu_free_table(). That's why I make it a pointer in PE
>> structure.
>
>What are the sizes of the iommu table and the PE structure?

I calculated it in my head: the size of struct iommu_table, defined in
arch/powerpc/include/asm/iommu.h, is 256 bytes.

>If the table is a round number of pages then you probably don't want to
>embed it inside the PE structure.

If my understanding is correct, the iommu table structure size is not that
big.
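
For what it's worth, this is easy to double-check with a one-liner (sketch):

	pr_info("sizeof(struct iommu_table) = %zu\n",
		sizeof(struct iommu_table));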

>
>	David
>

-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 06/17] ppc/pnv: allocate pe->iommu_table dynamically
  2014-06-25  9:20             ` David Laight
@ 2014-06-25 10:30               ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 100+ messages in thread
From: Alexey Kardashevskiy @ 2014-06-25 10:30 UTC (permalink / raw)
  To: David Laight, 'Wei Yang'
  Cc: benh, linux-pci, gwshan, yan, bhelgaas, qiudayu, linuxppc-dev

On 06/25/2014 07:20 PM, David Laight wrote:
> From: Wei Yang
>> On Wed, Jun 25, 2014 at 02:12:34PM +1000, Alexey Kardashevskiy wrote:
>>> On 06/25/2014 11:12 AM, Wei Yang wrote:
>>>> On Tue, Jun 24, 2014 at 08:06:32PM +1000, Alexey Kardashevskiy wrote:
>>>>> On 06/10/2014 11:56 AM, Wei Yang wrote:
>>>>>> Current iommu_table of a PE is a static field. This will have a problem when
>>>>>> iommu_free_table is called.
>>>>>
>>>>> What kind of problem? This table is per PE and PE is not going anywhere.
>>>>>
>>>>
>>>> Yes, for Bus PE, they will always sit in the system. When VF PE introduced,
>>>> they could be released on the fly. When they are released, so do the iommu
>>>> table for the PE.
>>>
>>> iommu_table is a part of PE struct. When PE is released, iommu_table will
>>> go with it as well. Why to make is a pointer? I would understand it if you
>>> added reference counting there but no - iommu_table's lifetime is equal to
>>> PE lifetime.
>>>
>>
>> Yes, iommu_talbe's life time equals to PE lifetime, so when releasing a PE we
>> need to release the iommu table. Currently, there is one function to release
>> the iommu table, iommu_free_table() which takes a pointer of the iommu_table
>> and release it.
>>
>> If the iommu table in PE is just a part of PE, it will have some problem to
>> release it with iommu_free_table(). That's why I make it a pointer in PE
>> structure.
> 
> What are the sizes of the iommu table and the PE structure?

This is all about the iommu_table struct (which is just a descriptor), not
the IOMMU table per se (which may be megabytes) :)


> If the table is a round number of pages then you probably don't want to
> embed it inside the PE structure.




-- 
Alexey

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [RFC PATCH V3 06/17] ppc/pnv: allocate pe->iommu_table dynamically
  2014-06-25  9:20             ` David Laight
                               ` (2 preceding siblings ...)
  (?)
@ 2014-07-14  3:12             ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 100+ messages in thread
From: Benjamin Herrenschmidt @ 2014-07-14  3:12 UTC (permalink / raw)
  To: David Laight
  Cc: 'Wei Yang',
	Alexey Kardashevskiy, linux-pci, gwshan, qiudayu, bhelgaas, yan,
	linuxppc-dev

On Wed, 2014-06-25 at 09:20 +0000, David Laight wrote:
> What are the sizes of the iommu table and the PE structure? 
> If the table is a round number of pages then you probably don't want
> to embed it inside the PE structure. 

The problem isn't the table itself but the struct iommu_table, which contains
the pointer to the actual table and various other bits of controlling state.
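
From memory, the release path looks roughly like this (abridged, not the
exact code in arch/powerpc/kernel/iommu.c):

	/* Abridged from memory -- illustration only. */
	void iommu_free_table(struct iommu_table *tbl, const char *node_name)
	{
		/* warn if the DMA window still has live TCE mappings */
		if (!bitmap_empty(tbl->it_map, tbl->it_size))
			pr_warn("%s: unexpected TCEs for %s\n", __func__, node_name);

		/* free the allocation bitmap ... */
		free_pages((unsigned long)tbl->it_map,
			   get_order(BITS_TO_LONGS(tbl->it_size) *
				     sizeof(unsigned long)));

		/* ... and then the descriptor itself, which is why the
		 * descriptor needs to be a standalone allocation rather
		 * than a field embedded in the PE. */
		kfree(tbl);
	}

Note it does not touch the memory behind it_base; the TCE table pages
themselves are handled separately by the platform code.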

Cheers,
Ben.




^ permalink raw reply	[flat|nested] 100+ messages in thread

end of thread, other threads:[~2014-07-14  3:12 UTC | newest]

Thread overview: 100+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-10  1:56 [RFC PATCH V3 00/17] Enable SRIOV on POWER8 Wei Yang
2014-06-10  1:56 ` Wei Yang
2014-06-10  1:56 ` [RFC PATCH V3 01/17] pci/iov: Export interface for retrieve VF's BDF Wei Yang
2014-06-10  1:56   ` Wei Yang
2014-06-10  1:56 ` [RFC PATCH V3 02/17] pci/of: Match PCI VFs to dev-tree nodes dynamically Wei Yang
2014-06-10  1:56   ` Wei Yang
2014-06-23  5:07   ` Gavin Shan
2014-06-23  5:07     ` Gavin Shan
2014-06-23  6:29     ` Wei Yang
2014-06-23  6:29       ` Wei Yang
2014-06-10  1:56 ` [RFC PATCH V3 03/17] ppc/pci: don't unset pci resources for VFs Wei Yang
2014-06-10  1:56   ` Wei Yang
2014-06-10  1:56 ` [RFC PATCH V3 04/17] PCI: SRIOV: add VF enable/disable hook Wei Yang
2014-06-10  1:56   ` Wei Yang
2014-06-23  5:03   ` Gavin Shan
2014-06-23  5:03     ` Gavin Shan
2014-06-23  6:29     ` Wei Yang
2014-06-23  6:29       ` Wei Yang
2014-06-10  1:56 ` [RFC PATCH V3 05/17] ppc/pnv: user macro to define the TCE size Wei Yang
2014-06-10  1:56   ` Wei Yang
2014-06-23  5:12   ` Gavin Shan
2014-06-23  5:12     ` Gavin Shan
2014-06-23  6:31     ` Wei Yang
2014-06-23  6:31       ` Wei Yang
2014-06-10  1:56 ` [RFC PATCH V3 06/17] ppc/pnv: allocate pe->iommu_table dynamically Wei Yang
2014-06-10  1:56   ` Wei Yang
2014-06-24 10:06   ` Alexey Kardashevskiy
2014-06-24 10:06     ` Alexey Kardashevskiy
2014-06-25  1:12     ` Wei Yang
2014-06-25  1:12       ` Wei Yang
2014-06-25  4:12       ` Alexey Kardashevskiy
2014-06-25  4:12         ` Alexey Kardashevskiy
2014-06-25  5:27         ` Wei Yang
2014-06-25  5:27           ` Wei Yang
2014-06-25  7:50           ` Alexey Kardashevskiy
2014-06-25  7:50             ` Alexey Kardashevskiy
2014-06-25  7:56             ` Benjamin Herrenschmidt
2014-06-25  7:56               ` Benjamin Herrenschmidt
2014-06-25  9:18               ` Wei Yang
2014-06-25  9:18                 ` Wei Yang
2014-06-25  9:13             ` Wei Yang
2014-06-25  9:13               ` Wei Yang
2014-06-25  9:20           ` David Laight
2014-06-25  9:20             ` David Laight
2014-06-25  9:31             ` Wei Yang
2014-06-25  9:31               ` Wei Yang
2014-06-25 10:30             ` Alexey Kardashevskiy
2014-06-25 10:30               ` Alexey Kardashevskiy
2014-07-14  3:12             ` Benjamin Herrenschmidt
2014-06-10  1:56 ` [RFC PATCH V3 07/17] ppc/pnv: Add function to deconfig a PE Wei Yang
2014-06-10  1:56   ` Wei Yang
2014-06-23  5:27   ` Gavin Shan
2014-06-23  5:27     ` Gavin Shan
2014-06-23  9:07     ` Wei Yang
2014-06-23  9:07       ` Wei Yang
2014-06-10  1:56 ` [RFC PATCH V3 08/17] PCI: Add weak pcibios_sriov_resource_size() interface Wei Yang
2014-06-10  1:56   ` Wei Yang
2014-06-23  5:41   ` Gavin Shan
2014-06-23  5:41     ` Gavin Shan
2014-06-23  7:56     ` Wei Yang
2014-06-23  7:56       ` Wei Yang
2014-06-10  1:56 ` [RFC PATCH V3 09/17] PCI: Add weak pcibios_sriov_resource_alignment() interface Wei Yang
2014-06-10  1:56   ` Wei Yang
2014-06-10  1:56 ` [RFC PATCH V3 10/17] PCI: take additional IOV BAR alignment in sizing and assigning Wei Yang
2014-06-10  1:56   ` Wei Yang
2014-06-10  1:56 ` [RFC PATCH V3 11/17] ppc/pnv: Expand VF resources according to the number of total_pe Wei Yang
2014-06-10  1:56   ` Wei Yang
2014-06-23  6:07   ` Gavin Shan
2014-06-23  6:07     ` Gavin Shan
2014-06-23  6:56     ` Wei Yang
2014-06-23  6:56       ` Wei Yang
2014-06-23  7:08       ` Gavin Shan
2014-06-23  7:08         ` Gavin Shan
2014-06-10  1:56 ` [RFC PATCH V3 12/17] powerpc/powernv: implement pcibios_sriov_resource_alignment on powernv Wei Yang
2014-06-10  1:56   ` Wei Yang
2014-06-23  6:09   ` Gavin Shan
2014-06-23  6:09     ` Gavin Shan
2014-06-23  8:21     ` Wei Yang
2014-06-23  8:21       ` Wei Yang
2014-06-23 23:29       ` Gavin Shan
2014-06-23 23:29         ` Gavin Shan
2014-06-24  1:24         ` Wei Yang
2014-06-24  1:24           ` Wei Yang
2014-06-10  1:56 ` [RFC PATCH V3 13/17] powerpc/powernv: shift VF resource with an offset Wei Yang
2014-06-10  1:56   ` Wei Yang
2014-06-10  1:56 ` [RFC PATCH V3 14/17] ppc/pci: create/release dev-tree node for VFs Wei Yang
2014-06-10  1:56   ` Wei Yang
2014-06-18 18:26   ` Grant Likely
2014-06-18 20:51     ` Benjamin Herrenschmidt
2014-06-18 20:51       ` Benjamin Herrenschmidt
2014-06-19  2:46     ` Wei Yang
2014-06-19  8:30       ` Grant Likely
2014-06-19  9:42         ` Wei Yang
2014-06-20  3:46         ` Wei Yang
2014-06-10  1:56 ` [RFC PATCH V3 15/17] powerpc/powernv: allocate VF PE Wei Yang
2014-06-10  1:56   ` Wei Yang
2014-06-10  1:56 ` [RFC PATCH V3 16/17] ppc/pci: Expanding IOV BAR, with m64_per_iov supported Wei Yang
2014-06-10  1:56   ` Wei Yang
2014-06-10  1:56 ` [RFC PATCH V3 17/17] ppc/pnv: Group VF PE when IOV BAR is big on PHB3 Wei Yang
2014-06-10  1:56   ` Wei Yang
