* [PATCH V5 0/6] Redesign SR-IOV on PowerNV
@ 2015-10-09  2:46 Wei Yang
  2015-10-09  2:46 ` [PATCH V5 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR Wei Yang
                   ` (5 more replies)
  0 siblings, 6 replies; 30+ messages in thread
From: Wei Yang @ 2015-10-09  2:46 UTC (permalink / raw)
  To: gwshan, aik, benh; +Cc: linuxppc-dev, mpe, Wei Yang

In the original design, VFs are grouped together so that more VFs can be
enabled in the system when a VF BAR is bigger than 64MB. This design has a
flaw: an error on one VF will interfere with the other VFs in the same group.

This patch series changes the design by using an M64 BAR in Single PE mode to
cover only one VF BAR. By doing so, it provides absolute isolation between VFs.
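
As a rough sketch of the difference (illustrative helper only, not part of the
series): in the old design one shared M64 BAR covered a whole group of VFs,
while with this series each enabled VF gets its own M64 BAR in Single PE mode,
so the number of M64 BARs consumed per IOV BAR becomes num_vfs.

#include <linux/types.h>

/* Illustrative only: how many M64 BARs one IOV BAR consumes. */
static int m64_bars_needed(bool single_pe_mode, u16 num_vfs)
{
	if (!single_pe_mode)
		return 1;	/* Shared PE mode: one M64 BAR maps the whole IOV BAR */
	return num_vfs;		/* Single PE mode: one M64 BAR per VF, per-VF isolation */
}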

v5:
   * rebase on top of v4.2, with commit 68230242cdb "net/mlx4_core: Add port
     attribute when tracking counters" reverted
   * add more reasoning to the change log of Patch 1
   * make pnv_pci_iov_resource_alignment() easier to read
   * initialize pe_num_map[] just after it is allocated
   * tested ssh from guest to host via a VF, then shut down the guest
v4:
   * rebase the code on top of v4.2-rc7
   * switch back to use the dynamic version of pe_num_map and m64_map
   * split the memory allocation and PE assignment of pe_num_map to make it
     easier to read
   * check the pe_num_map value before freeing a PE
   * add the reason for renaming pe_num_map and m64_map to the change log
v3:
   * return -ENOSPC when a VF has a non-64bit prefetchable BAR
   * rename offset to pe_num_map and define it statically
   * change the commit log based on comments
   * define m64_map statically
v2:
   * clean up iov bar alignment calculation
   * change m64s to m64_bars
   * add a field to indicate whether M64 Single PE mode will be used
   * change m64_wins to m64_map
   * calculate the gate instead of hard coding it
   * dynamically allocate m64_map
   * dynamically allocate PE#
   * add a case to calculate the iov bar alignment when M64 Single PE mode is used
   * when M64 Single PE mode is used, first compare num_vfs with the number of
     M64 BARs available in the system

Wei Yang (6):
  powerpc/powernv: don't enable SRIOV when VF BAR has non
    64bit-prefetchable BAR
  powerpc/powernv: simplify the calculation of iov resource alignment
  powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
  powerpc/powernv: replace the hard coded boundary with gate
  powerpc/powernv: boundary the total VF BAR size instead of the
    individual one
  powerpc/powernv: allocate sparse PE# when using M64 BAR in Single PE
    mode

 arch/powerpc/include/asm/pci-bridge.h     |   7 +-
 arch/powerpc/platforms/powernv/pci-ioda.c | 329 ++++++++++++++++--------------
 2 files changed, 175 insertions(+), 161 deletions(-)

-- 
2.5.0

* [PATCH V5 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR
  2015-10-09  2:46 [PATCH V5 0/6] Redesign SR-IOV on PowerNV Wei Yang
@ 2015-10-09  2:46 ` Wei Yang
  2015-10-09  8:15   ` Benjamin Herrenschmidt
  2015-10-13  0:01   ` Gavin Shan
  2015-10-09  2:46 ` [PATCH V5 2/6] powerpc/powernv: simplify the calculation of iov resource alignment Wei Yang
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 30+ messages in thread
From: Wei Yang @ 2015-10-09  2:46 UTC (permalink / raw)
  To: gwshan, aik, benh; +Cc: linuxppc-dev, mpe, Wei Yang

On PHB_IODA2, we enable SRIOV devices by mapping the IOV BAR with M64 BARs. If
an SRIOV device's IOV BAR is not 64bit-prefetchable, it is not assigned from
the 64bit prefetchable window, which means an M64 BAR can't work on it.

The reason is that PCI bridges support only 2 windows, and the kernel code
programs bridges so that one window is 32bit non-prefetchable and the other
one is 64bit prefetchable. So if a device's IOV BAR is 64bit and
non-prefetchable, it will be mapped into the 32bit space and therefore M64
cannot be used for it.

This patch makes this explicit.
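
For reference, "64bit-prefetchable" above means the IOV resource carries both
the 64-bit flag and the prefetchable flag. A minimal sketch of such a check is
below; the in-tree helper pnv_pci_is_mem_pref_64() may differ in detail:

#include <linux/types.h>
#include <linux/ioport.h>

/* Sketch: true only when a resource is both 64-bit and prefetchable. */
static bool is_mem_pref_64(unsigned long flags)
{
	return (flags & (IORESOURCE_MEM_64 | IORESOURCE_PREFETCH)) ==
	       (IORESOURCE_MEM_64 | IORESOURCE_PREFETCH);
}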

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 25 +++++++++----------------
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 85cbc96..8c031b5 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
 		if (!res->flags || !res->parent)
 			continue;
 
-		if (!pnv_pci_is_mem_pref_64(res->flags))
-			continue;
-
 		/*
 		 * The actual IOV BAR range is determined by the start address
 		 * and the actual size for num_vfs VFs BAR.  This check is to
@@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
 		if (!res->flags || !res->parent)
 			continue;
 
-		if (!pnv_pci_is_mem_pref_64(res->flags))
-			continue;
-
 		size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
 		res2 = *res;
 		res->start += size * offset;
@@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
 		if (!res->flags || !res->parent)
 			continue;
 
-		if (!pnv_pci_is_mem_pref_64(res->flags))
-			continue;
-
 		for (j = 0; j < vf_groups; j++) {
 			do {
 				win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
@@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 	pdn = pci_get_pdn(pdev);
 
 	if (phb->type == PNV_PHB_IODA2) {
+		if (!pdn->vfs_expanded) {
+			dev_info(&pdev->dev, "don't support this SRIOV device"
+				" with non 64bit-prefetchable IOV BAR\n");
+			return -ENOSPC;
+		}
+
 		/* Calculate available PE for required VFs */
 		mutex_lock(&phb->ioda.pe_alloc_mutex);
 		pdn->offset = bitmap_find_next_zero_area(
@@ -2775,9 +2772,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 		if (!res->flags || res->parent)
 			continue;
 		if (!pnv_pci_is_mem_pref_64(res->flags)) {
-			dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
+			dev_warn(&pdev->dev, "Don't support SR-IOV with"
+					" non M64 VF BAR%d: %pR. \n",
 				 i, res);
-			continue;
+			return;
 		}
 
 		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
@@ -2796,11 +2794,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 		res = &pdev->resource[i + PCI_IOV_RESOURCES];
 		if (!res->flags || res->parent)
 			continue;
-		if (!pnv_pci_is_mem_pref_64(res->flags)) {
-			dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: %pR\n",
-				 i, res);
-			continue;
-		}
 
 		dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
 		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
-- 
2.5.0

* [PATCH V5 2/6] powerpc/powernv: simplify the calculation of iov resource alignment
  2015-10-09  2:46 [PATCH V5 0/6] Redesign SR-IOV on PowerNV Wei Yang
  2015-10-09  2:46 ` [PATCH V5 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR Wei Yang
@ 2015-10-09  2:46 ` Wei Yang
  2015-10-13  0:13   ` Gavin Shan
  2015-10-09  2:46 ` [PATCH V5 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR Wei Yang
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 30+ messages in thread
From: Wei Yang @ 2015-10-09  2:46 UTC (permalink / raw)
  To: gwshan, aik, benh; +Cc: linuxppc-dev, mpe, Wei Yang

On the PowerNV platform, the alignment of an IOV BAR is the total size of the
IOV BAR. Whether the IOV BAR is expanded by roundup_pow_of_two(total_vfs) or
by the maximum PE number (256), the total size can be calculated as
(vfs_expanded * VF_BAR_size).

This patch simplifies pnv_pci_iov_resource_alignment() by removing the first
case.
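
A worked example with hypothetical numbers (not taken from any particular
device): with vfs_expanded = 256 and a 1MB per-VF BAR, the returned alignment
is 256 * 1MB = 256MB; without expansion it stays at the per-VF size.

#include <linux/types.h>

/* Sketch of the simplified rule; the values in the comment are illustrative. */
static resource_size_t iov_alignment(resource_size_t vf_bar_size,
				     u16 vfs_expanded)
{
	if (!vfs_expanded)
		return vf_bar_size;
	return vfs_expanded * vf_bar_size;	/* e.g. 256 * 1MB = 256MB */
}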

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 8c031b5..7da476b 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2988,17 +2988,21 @@ static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
 						      int resno)
 {
 	struct pci_dn *pdn = pci_get_pdn(pdev);
-	resource_size_t align, iov_align;
-
-	iov_align = resource_size(&pdev->resource[resno]);
-	if (iov_align)
-		return iov_align;
+	resource_size_t align;
 
+	/*
+	 * On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the
+	 * SR-IOV. While from hardware perspective, the range mapped by M64
+	 * BAR should be size aligned.
+	 *
+	 * This function returns the total IOV BAR size if M64 BAR is in
+	 * Shared PE mode or just the individual size if not.
+	 */
 	align = pci_iov_resource_size(pdev, resno);
-	if (pdn->vfs_expanded)
-		return pdn->vfs_expanded * align;
+	if (!pdn->vfs_expanded)
+		return align;
 
-	return align;
+	return pdn->vfs_expanded * align;
 }
 #endif /* CONFIG_PCI_IOV */
 
-- 
2.5.0

* [PATCH V5 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
  2015-10-09  2:46 [PATCH V5 0/6] Redesign SR-IOV on PowerNV Wei Yang
  2015-10-09  2:46 ` [PATCH V5 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR Wei Yang
  2015-10-09  2:46 ` [PATCH V5 2/6] powerpc/powernv: simplify the calculation of iov resource alignment Wei Yang
@ 2015-10-09  2:46 ` Wei Yang
  2015-10-12 23:55   ` Gavin Shan
  2015-10-09  2:46 ` [PATCH V5 4/6] powerpc/powernv: replace the hard coded boundary with gate Wei Yang
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 30+ messages in thread
From: Wei Yang @ 2015-10-09  2:46 UTC (permalink / raw)
  To: gwshan, aik, benh; +Cc: linuxppc-dev, mpe, Wei Yang

In the current implementation, when a VF BAR is bigger than 64MB, 4 M64 BARs
in Single PE mode are used to cover the number of VFs required to be enabled.
By doing so, several VFs end up in one VF group, and an error on one VF
interferes with the other VFs in the same group.

This patch changes the design by using one M64 BAR in Single PE mode for each
VF BAR. This gives absolute isolation for VFs.

Based on Gavin's comments, m64_wins is also renamed to m64_map, which holds
the index of the M64 BAR used to map the corresponding VF BAR.
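
The new m64_map field is declared as a pointer to arrays of
PCI_SRIOV_NUM_BARS ints, one array per allocated M64 BAR. A standalone sketch
of how such a pointer is allocated and indexed (sizes and values are
illustrative only):

#include <linux/slab.h>
#include <linux/errno.h>

#define NUM_BARS 6			/* stands in for PCI_SRIOV_NUM_BARS */

static int example_alloc_m64_map(int m64_bars)
{
	int (*m64_map)[NUM_BARS];	/* pointer to arrays of NUM_BARS ints */

	/* one NUM_BARS-sized row per M64 BAR (num_vfs rows in Single PE mode) */
	m64_map = kmalloc(sizeof(*m64_map) * m64_bars, GFP_KERNEL);
	if (!m64_map)
		return -ENOMEM;

	m64_map[0][0] = 15;		/* row j = 0, BAR i = 0 gets window #15 */
	kfree(m64_map);
	return 0;
}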

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/pci-bridge.h     |   5 +-
 arch/powerpc/platforms/powernv/pci-ioda.c | 169 ++++++++++++------------------
 2 files changed, 68 insertions(+), 106 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 712add5..8aeba4c 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -214,10 +214,9 @@ struct pci_dn {
 	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
 	u16     num_vfs;		/* number of VFs enabled*/
 	int     offset;			/* PE# for the first VF PE */
-#define M64_PER_IOV 4
-	int     m64_per_iov;
+	bool    m64_single_mode;	/* Use M64 BAR in Single Mode */
 #define IODA_INVALID_M64        (-1)
-	int     m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
+	int     (*m64_map)[PCI_SRIOV_NUM_BARS];
 #endif /* CONFIG_PCI_IOV */
 #endif
 	struct list_head child_list;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 7da476b..2886f90 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1148,29 +1148,36 @@ static void pnv_pci_ioda_setup_PEs(void)
 }
 
 #ifdef CONFIG_PCI_IOV
-static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
+static int pnv_pci_vf_release_m64(struct pci_dev *pdev, u16 num_vfs)
 {
 	struct pci_bus        *bus;
 	struct pci_controller *hose;
 	struct pnv_phb        *phb;
 	struct pci_dn         *pdn;
 	int                    i, j;
+	int                    m64_bars;
 
 	bus = pdev->bus;
 	hose = pci_bus_to_host(bus);
 	phb = hose->private_data;
 	pdn = pci_get_pdn(pdev);
 
+	if (pdn->m64_single_mode)
+		m64_bars = num_vfs;
+	else
+		m64_bars = 1;
+
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
-		for (j = 0; j < M64_PER_IOV; j++) {
-			if (pdn->m64_wins[i][j] == IODA_INVALID_M64)
+		for (j = 0; j < m64_bars; j++) {
+			if (pdn->m64_map[j][i] == IODA_INVALID_M64)
 				continue;
 			opal_pci_phb_mmio_enable(phb->opal_id,
-				OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0);
-			clear_bit(pdn->m64_wins[i][j], &phb->ioda.m64_bar_alloc);
-			pdn->m64_wins[i][j] = IODA_INVALID_M64;
+				OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 0);
+			clear_bit(pdn->m64_map[j][i], &phb->ioda.m64_bar_alloc);
+			pdn->m64_map[j][i] = IODA_INVALID_M64;
 		}
 
+	kfree(pdn->m64_map);
 	return 0;
 }
 
@@ -1187,8 +1194,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
 	int                    total_vfs;
 	resource_size_t        size, start;
 	int                    pe_num;
-	int                    vf_groups;
-	int                    vf_per_group;
+	int                    m64_bars;
 
 	bus = pdev->bus;
 	hose = pci_bus_to_host(bus);
@@ -1196,26 +1202,26 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
 	pdn = pci_get_pdn(pdev);
 	total_vfs = pci_sriov_get_totalvfs(pdev);
 
-	/* Initialize the m64_wins to IODA_INVALID_M64 */
-	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
-		for (j = 0; j < M64_PER_IOV; j++)
-			pdn->m64_wins[i][j] = IODA_INVALID_M64;
+	if (pdn->m64_single_mode)
+		m64_bars = num_vfs;
+	else
+		m64_bars = 1;
+
+	pdn->m64_map = kmalloc(sizeof(*pdn->m64_map) * m64_bars, GFP_KERNEL);
+	if (!pdn->m64_map)
+		return -ENOMEM;
+	/* Initialize the m64_map to IODA_INVALID_M64 */
+	for (i = 0; i < m64_bars ; i++)
+		for (j = 0; j < PCI_SRIOV_NUM_BARS; j++)
+			pdn->m64_map[i][j] = IODA_INVALID_M64;
 
-	if (pdn->m64_per_iov == M64_PER_IOV) {
-		vf_groups = (num_vfs <= M64_PER_IOV) ? num_vfs: M64_PER_IOV;
-		vf_per_group = (num_vfs <= M64_PER_IOV)? 1:
-			roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
-	} else {
-		vf_groups = 1;
-		vf_per_group = 1;
-	}
 
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
 		res = &pdev->resource[i + PCI_IOV_RESOURCES];
 		if (!res->flags || !res->parent)
 			continue;
 
-		for (j = 0; j < vf_groups; j++) {
+		for (j = 0; j < m64_bars; j++) {
 			do {
 				win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
 						phb->ioda.m64_bar_idx + 1, 0);
@@ -1224,12 +1230,11 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
 					goto m64_failed;
 			} while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc));
 
-			pdn->m64_wins[i][j] = win;
+			pdn->m64_map[j][i] = win;
 
-			if (pdn->m64_per_iov == M64_PER_IOV) {
+			if (pdn->m64_single_mode) {
 				size = pci_iov_resource_size(pdev,
 							PCI_IOV_RESOURCES + i);
-				size = size * vf_per_group;
 				start = res->start + size * j;
 			} else {
 				size = resource_size(res);
@@ -1237,16 +1242,16 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
 			}
 
 			/* Map the M64 here */
-			if (pdn->m64_per_iov == M64_PER_IOV) {
+			if (pdn->m64_single_mode) {
 				pe_num = pdn->offset + j;
 				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
 						pe_num, OPAL_M64_WINDOW_TYPE,
-						pdn->m64_wins[i][j], 0);
+						pdn->m64_map[j][i], 0);
 			}
 
 			rc = opal_pci_set_phb_mem_window(phb->opal_id,
 						 OPAL_M64_WINDOW_TYPE,
-						 pdn->m64_wins[i][j],
+						 pdn->m64_map[j][i],
 						 start,
 						 0, /* unused */
 						 size);
@@ -1258,12 +1263,12 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
 				goto m64_failed;
 			}
 
-			if (pdn->m64_per_iov == M64_PER_IOV)
+			if (pdn->m64_single_mode)
 				rc = opal_pci_phb_mmio_enable(phb->opal_id,
-				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 2);
+				     OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 2);
 			else
 				rc = opal_pci_phb_mmio_enable(phb->opal_id,
-				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 1);
+				     OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 1);
 
 			if (rc != OPAL_SUCCESS) {
 				dev_err(&pdev->dev, "Failed to enable M64 window #%d: %llx\n",
@@ -1275,7 +1280,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
 	return 0;
 
 m64_failed:
-	pnv_pci_vf_release_m64(pdev);
+	pnv_pci_vf_release_m64(pdev, num_vfs);
 	return -EBUSY;
 }
 
@@ -1302,15 +1307,13 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
 	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
 }
 
-static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
+static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
 {
 	struct pci_bus        *bus;
 	struct pci_controller *hose;
 	struct pnv_phb        *phb;
 	struct pnv_ioda_pe    *pe, *pe_n;
 	struct pci_dn         *pdn;
-	u16                    vf_index;
-	int64_t                rc;
 
 	bus = pdev->bus;
 	hose = pci_bus_to_host(bus);
@@ -1320,35 +1323,6 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 	if (!pdev->is_physfn)
 		return;
 
-	if (pdn->m64_per_iov == M64_PER_IOV && num_vfs > M64_PER_IOV) {
-		int   vf_group;
-		int   vf_per_group;
-		int   vf_index1;
-
-		vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
-
-		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++)
-			for (vf_index = vf_group * vf_per_group;
-				vf_index < (vf_group + 1) * vf_per_group &&
-				vf_index < num_vfs;
-				vf_index++)
-				for (vf_index1 = vf_group * vf_per_group;
-					vf_index1 < (vf_group + 1) * vf_per_group &&
-					vf_index1 < num_vfs;
-					vf_index1++){
-
-					rc = opal_pci_set_peltv(phb->opal_id,
-						pdn->offset + vf_index,
-						pdn->offset + vf_index1,
-						OPAL_REMOVE_PE_FROM_DOMAIN);
-
-					if (rc)
-					    dev_warn(&pdev->dev, "%s: Failed to unlink same group PE#%d(%lld)\n",
-						__func__,
-						pdn->offset + vf_index1, rc);
-				}
-	}
-
 	list_for_each_entry_safe(pe, pe_n, &phb->ioda.pe_list, list) {
 		if (pe->parent_dev != pdev)
 			continue;
@@ -1383,14 +1357,14 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
 	num_vfs = pdn->num_vfs;
 
 	/* Release VF PEs */
-	pnv_ioda_release_vf_PE(pdev, num_vfs);
+	pnv_ioda_release_vf_PE(pdev);
 
 	if (phb->type == PNV_PHB_IODA2) {
-		if (pdn->m64_per_iov == 1)
+		if (!pdn->m64_single_mode)
 			pnv_pci_vf_resource_shift(pdev, -pdn->offset);
 
 		/* Release M64 windows */
-		pnv_pci_vf_release_m64(pdev);
+		pnv_pci_vf_release_m64(pdev, num_vfs);
 
 		/* Release PE numbers */
 		bitmap_clear(phb->ioda.pe_alloc, pdn->offset, num_vfs);
@@ -1409,7 +1383,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 	int                    pe_num;
 	u16                    vf_index;
 	struct pci_dn         *pdn;
-	int64_t                rc;
 
 	bus = pdev->bus;
 	hose = pci_bus_to_host(bus);
@@ -1454,37 +1427,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 
 		pnv_pci_ioda2_setup_dma_pe(phb, pe);
 	}
-
-	if (pdn->m64_per_iov == M64_PER_IOV && num_vfs > M64_PER_IOV) {
-		int   vf_group;
-		int   vf_per_group;
-		int   vf_index1;
-
-		vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
-
-		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++) {
-			for (vf_index = vf_group * vf_per_group;
-			     vf_index < (vf_group + 1) * vf_per_group &&
-			     vf_index < num_vfs;
-			     vf_index++) {
-				for (vf_index1 = vf_group * vf_per_group;
-				     vf_index1 < (vf_group + 1) * vf_per_group &&
-				     vf_index1 < num_vfs;
-				     vf_index1++) {
-
-					rc = opal_pci_set_peltv(phb->opal_id,
-						pdn->offset + vf_index,
-						pdn->offset + vf_index1,
-						OPAL_ADD_PE_TO_DOMAIN);
-
-					if (rc)
-					    dev_warn(&pdev->dev, "%s: Failed to link same group PE#%d(%lld)\n",
-						__func__,
-						pdn->offset + vf_index1, rc);
-				}
-			}
-		}
-	}
 }
 
 int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
@@ -1507,6 +1449,15 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 			return -ENOSPC;
 		}
 
+		/*
+		 * When M64 BARs functions in Single PE mode, the number of VFs
+		 * could be enabled must be less than the number of M64 BARs.
+		 */
+		if (pdn->m64_single_mode && num_vfs > phb->ioda.m64_bar_idx) {
+			dev_info(&pdev->dev, "Not enough M64 BAR for VFs\n");
+			return -EBUSY;
+		}
+
 		/* Calculate available PE for required VFs */
 		mutex_lock(&phb->ioda.pe_alloc_mutex);
 		pdn->offset = bitmap_find_next_zero_area(
@@ -1534,7 +1485,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 		 * the IOV BAR according to the PE# allocated to the VFs.
 		 * Otherwise, the PE# for the VF will conflict with others.
 		 */
-		if (pdn->m64_per_iov == 1) {
+		if (!pdn->m64_single_mode) {
 			ret = pnv_pci_vf_resource_shift(pdev, pdn->offset);
 			if (ret)
 				goto m64_failed;
@@ -1567,8 +1518,7 @@ int pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 	/* Allocate PCI data */
 	add_dev_pci_data(pdev);
 
-	pnv_pci_sriov_enable(pdev, num_vfs);
-	return 0;
+	return pnv_pci_sriov_enable(pdev, num_vfs);
 }
 #endif /* CONFIG_PCI_IOV */
 
@@ -2762,9 +2712,9 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 
 	pdn = pci_get_pdn(pdev);
 	pdn->vfs_expanded = 0;
+	pdn->m64_single_mode = false;
 
 	total_vfs = pci_sriov_get_totalvfs(pdev);
-	pdn->m64_per_iov = 1;
 	mul = phb->ioda.total_pe;
 
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
@@ -2784,8 +2734,8 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 		if (size > (1 << 26)) {
 			dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size is bigger than 64M, roundup power2\n",
 				 i, res);
-			pdn->m64_per_iov = M64_PER_IOV;
 			mul = roundup_pow_of_two(total_vfs);
+			pdn->m64_single_mode = true;
 			break;
 		}
 	}
@@ -2987,6 +2937,8 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
 static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
 						      int resno)
 {
+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+	struct pnv_phb *phb = hose->private_data;
 	struct pci_dn *pdn = pci_get_pdn(pdev);
 	resource_size_t align;
 
@@ -2995,12 +2947,23 @@ static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
 	 * SR-IOV. While from hardware perspective, the range mapped by M64
 	 * BAR should be size aligned.
 	 *
+	 * When IOV BAR is mapped with M64 BAR in Single PE mode, the extra
+	 * powernv-specific hardware restriction is gone. But if just use the
+	 * VF BAR size as the alignment, PF BAR / VF BAR may be allocated with
+	 * in one segment of M64 #15, which introduces the PE conflict between
+	 * PF and VF. Based on this, the minimum alignment of an IOV BAR is
+	 * m64_segsize.
+	 *
 	 * This function returns the total IOV BAR size if M64 BAR is in
 	 * Shared PE mode or just the individual size if not.
+	 * If the M64 BAR is in Single PE mode, return the VF BAR size or
+	 * M64 segment size if IOV BAR size is less.
 	 */
 	align = pci_iov_resource_size(pdev, resno);
 	if (!pdn->vfs_expanded)
 		return align;
+	if (pdn->m64_single_mode)
+		return max(align, (resource_size_t)phb->ioda.m64_segsize);
 
 	return pdn->vfs_expanded * align;
 }
-- 
2.5.0

* [PATCH V5 4/6] powerpc/powernv: replace the hard coded boundary with gate
  2015-10-09  2:46 [PATCH V5 0/6] Redesign SR-IOV on PowerNV Wei Yang
                   ` (2 preceding siblings ...)
  2015-10-09  2:46 ` [PATCH V5 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR Wei Yang
@ 2015-10-09  2:46 ` Wei Yang
  2015-10-09  2:46 ` [PATCH V5 5/6] powerpc/powernv: boundary the total VF BAR size instead of the individual one Wei Yang
  2015-10-09  2:46 ` [PATCH V5 6/6] powerpc/powernv: allocate sparse PE# when using M64 BAR in Single PE mode Wei Yang
  5 siblings, 0 replies; 30+ messages in thread
From: Wei Yang @ 2015-10-09  2:46 UTC (permalink / raw)
  To: gwshan, aik, benh; +Cc: linuxppc-dev, mpe, Wei Yang

At the moment the 64bit-prefetchable window can be at most 64GB, a value
currently obtained from the device tree. This means that in shared mode the
maximum supported VF BAR size is 64GB/256 = 256MB, and a VF BAR of that size,
once expanded, would exhaust the whole 64bit-prefetchable window. As a design
decision, the boundary on the VF BAR size is set to 64MB, since a 64MB VF BAR
expanded across all 256 PEs occupies only a quarter of the 64bit-prefetchable
window, which is affordable.

This patch replaces the magic 64MB limit with a "gate", which is a quarter of
the M64 segment size (m64_segsize >> 2), and adds a comment to explain the
reasoning behind it.
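
To make the arithmetic concrete (illustrative numbers only; the real window
size comes from the device tree):

/* 64GB window split into 256 segments gives the 64MB gate. */
static void gate_example(void)
{
	unsigned long long m64_window  = 64ULL << 30;		/* 64GB window   */
	unsigned long long m64_segsize = m64_window / 256;	/* 256MB segment */
	unsigned long long gate        = m64_segsize >> 2;	/* 64MB gate     */

	(void)gate;	/* matches the old hard-coded 1 << 26 limit */
}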

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 2886f90..71c5371 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2696,8 +2696,9 @@ static void pnv_pci_init_ioda_msis(struct pnv_phb *phb) { }
 #ifdef CONFIG_PCI_IOV
 static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 {
-	struct pci_controller *hose;
-	struct pnv_phb *phb;
+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	const resource_size_t gate = phb->ioda.m64_segsize >> 2;
 	struct resource *res;
 	int i;
 	resource_size_t size;
@@ -2707,9 +2708,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 	if (!pdev->is_physfn || pdev->is_added)
 		return;
 
-	hose = pci_bus_to_host(pdev->bus);
-	phb = hose->private_data;
-
 	pdn = pci_get_pdn(pdev);
 	pdn->vfs_expanded = 0;
 	pdn->m64_single_mode = false;
@@ -2730,10 +2728,22 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 
 		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
 
-		/* bigger than 64M */
-		if (size > (1 << 26)) {
-			dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size is bigger than 64M, roundup power2\n",
-				 i, res);
+		/*
+		 * If bigger than quarter of M64 segment size, just round up
+		 * power of two.
+		 *
+		 * Generally, one M64 BAR maps one IOV BAR. To avoid conflict
+		 * with other devices, IOV BAR size is expanded to be
+		 * (total_pe * VF_BAR_size).  When VF_BAR_size is half of M64
+		 * segment size , the expanded size would equal to half of the
+		 * whole M64 space size, which will exhaust the M64 Space and
+		 * limit the system flexibility.  This is a design decision to
+		 * set the boundary to quarter of the M64 segment size.
+		 */
+		if (size > gate) {
+			dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size "
+				"is bigger than %lld, roundup power2\n",
+				 i, res, gate);
 			mul = roundup_pow_of_two(total_vfs);
 			pdn->m64_single_mode = true;
 			break;
-- 
2.5.0

* [PATCH V5 5/6] powerpc/powernv: boundary the total VF BAR size instead of the individual one
  2015-10-09  2:46 [PATCH V5 0/6] Redesign SR-IOV on PowerNV Wei Yang
                   ` (3 preceding siblings ...)
  2015-10-09  2:46 ` [PATCH V5 4/6] powerpc/powernv: replace the hard coded boundary with gate Wei Yang
@ 2015-10-09  2:46 ` Wei Yang
  2015-10-09  2:46 ` [PATCH V5 6/6] powerpc/powernv: allocate sparse PE# when using M64 BAR in Single PE mode Wei Yang
  5 siblings, 0 replies; 30+ messages in thread
From: Wei Yang @ 2015-10-09  2:46 UTC (permalink / raw)
  To: gwshan, aik, benh; +Cc: linuxppc-dev, mpe, Wei Yang

Each VF can have at most 6 BARs. When the total VF BAR size exceeds the gate,
the expanded IOV BAR will also exhaust the M64 window, even if each individual
BAR stays below the gate.

This patch therefore applies the boundary check to the total VF BAR size
instead of to each individual BAR.
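
A short sketch of the idea with hypothetical sizes: four 32MB BARs each pass a
per-BAR check against a 64MB gate, but their 128MB total does not.

#include <linux/types.h>

/* Sketch only: decide Single PE mode from the running total of VF BAR sizes. */
static bool needs_single_mode(const resource_size_t *bar_sz, int nbars,
			      resource_size_t gate)
{
	resource_size_t total = 0;
	int i;

	for (i = 0; i < nbars; i++) {
		total += bar_sz[i];
		if (total > gate)	/* e.g. 4 x 32MB = 128MB > 64MB */
			return true;
	}
	return false;
}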

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 71c5371..ba77fc0 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2701,7 +2701,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 	const resource_size_t gate = phb->ioda.m64_segsize >> 2;
 	struct resource *res;
 	int i;
-	resource_size_t size;
+	resource_size_t size, total_vf_bar_sz;
 	struct pci_dn *pdn;
 	int mul, total_vfs;
 
@@ -2714,6 +2714,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 
 	total_vfs = pci_sriov_get_totalvfs(pdev);
 	mul = phb->ioda.total_pe;
+	total_vf_bar_sz = 0;
 
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
 		res = &pdev->resource[i + PCI_IOV_RESOURCES];
@@ -2726,7 +2727,8 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 			return;
 		}
 
-		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
+		total_vf_bar_sz += pci_iov_resource_size(pdev,
+				i + PCI_IOV_RESOURCES);
 
 		/*
 		 * If bigger than quarter of M64 segment size, just round up
@@ -2740,11 +2742,11 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 		 * limit the system flexibility.  This is a design decision to
 		 * set the boundary to quarter of the M64 segment size.
 		 */
-		if (size > gate) {
-			dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size "
-				"is bigger than %lld, roundup power2\n",
-				 i, res, gate);
+		if (total_vf_bar_sz > gate) {
 			mul = roundup_pow_of_two(total_vfs);
+			dev_info(&pdev->dev,
+				"VF BAR Total IOV size %llx > %llx, roundup to %d VFs\n",
+				total_vf_bar_sz, gate, mul);
 			pdn->m64_single_mode = true;
 			break;
 		}
-- 
2.5.0

* [PATCH V5 6/6] powerpc/powernv: allocate sparse PE# when using M64 BAR in Single PE mode
  2015-10-09  2:46 [PATCH V5 0/6] Redesign SR-IOV on PowerNV Wei Yang
                   ` (4 preceding siblings ...)
  2015-10-09  2:46 ` [PATCH V5 5/6] powerpc/powernv: boundary the total VF BAR size instead of the individual one Wei Yang
@ 2015-10-09  2:46 ` Wei Yang
  5 siblings, 0 replies; 30+ messages in thread
From: Wei Yang @ 2015-10-09  2:46 UTC (permalink / raw)
  To: gwshan, aik, benh; +Cc: linuxppc-dev, mpe, Wei Yang

When the M64 BAR is set to Single PE mode, the PE#s assigned to VFs can be
sparse.

This patch restructures the code to allocate sparse PE#s for VFs when the M64
BAR is set to Single PE mode. It also renames 'offset' to 'pe_num_map' to
reflect that its content is PE numbers.
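
A condensed sketch of how a VF's PE number is looked up in the two modes after
this change (the helper name is made up for illustration):

#include <linux/types.h>

/* Single PE mode keeps one, possibly sparse, PE# per VF; shared mode keeps a base. */
static int vf_pe_number(const int *pe_num_map, bool single_mode, u16 vf_index)
{
	if (single_mode)
		return pe_num_map[vf_index];	/* per-VF entry, may be sparse */
	return *pe_num_map + vf_index;		/* contiguous range: base + index */
}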

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/pci-bridge.h     |  2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c | 81 +++++++++++++++++++++++--------
 2 files changed, 63 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 8aeba4c..b3a226b 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -213,7 +213,7 @@ struct pci_dn {
 #ifdef CONFIG_PCI_IOV
 	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
 	u16     num_vfs;		/* number of VFs enabled*/
-	int     offset;			/* PE# for the first VF PE */
+	int     *pe_num_map;		/* PE# for the first VF PE or array */
 	bool    m64_single_mode;	/* Use M64 BAR in Single Mode */
 #define IODA_INVALID_M64        (-1)
 	int     (*m64_map)[PCI_SRIOV_NUM_BARS];
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index ba77fc0..34833b7 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1243,7 +1243,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
 
 			/* Map the M64 here */
 			if (pdn->m64_single_mode) {
-				pe_num = pdn->offset + j;
+				pe_num = pdn->pe_num_map[j];
 				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
 						pe_num, OPAL_M64_WINDOW_TYPE,
 						pdn->m64_map[j][i], 0);
@@ -1347,7 +1347,7 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
 	struct pnv_phb        *phb;
 	struct pci_dn         *pdn;
 	struct pci_sriov      *iov;
-	u16 num_vfs;
+	u16                    num_vfs, i;
 
 	bus = pdev->bus;
 	hose = pci_bus_to_host(bus);
@@ -1361,14 +1361,21 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
 
 	if (phb->type == PNV_PHB_IODA2) {
 		if (!pdn->m64_single_mode)
-			pnv_pci_vf_resource_shift(pdev, -pdn->offset);
+			pnv_pci_vf_resource_shift(pdev, -*pdn->pe_num_map);
 
 		/* Release M64 windows */
 		pnv_pci_vf_release_m64(pdev, num_vfs);
 
 		/* Release PE numbers */
-		bitmap_clear(phb->ioda.pe_alloc, pdn->offset, num_vfs);
-		pdn->offset = 0;
+		if (pdn->m64_single_mode) {
+			for (i = 0; i < num_vfs; i++) {
+				if (pdn->pe_num_map[i] != IODA_INVALID_PE)
+					pnv_ioda_free_pe(phb, pdn->pe_num_map[i]);
+			}
+		} else
+			bitmap_clear(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
+		/* Releasing pe_num_map */
+		kfree(pdn->pe_num_map);
 	}
 }
 
@@ -1394,7 +1401,10 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 
 	/* Reserve PE for each VF */
 	for (vf_index = 0; vf_index < num_vfs; vf_index++) {
-		pe_num = pdn->offset + vf_index;
+		if (pdn->m64_single_mode)
+			pe_num = pdn->pe_num_map[vf_index];
+		else
+			pe_num = *pdn->pe_num_map + vf_index;
 
 		pe = &phb->ioda.pe_array[pe_num];
 		pe->pe_number = pe_num;
@@ -1436,6 +1446,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 	struct pnv_phb        *phb;
 	struct pci_dn         *pdn;
 	int                    ret;
+	u16                    i;
 
 	bus = pdev->bus;
 	hose = pci_bus_to_host(bus);
@@ -1458,20 +1469,44 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 			return -EBUSY;
 		}
 
+		/* Allocating pe_num_map */
+		if (pdn->m64_single_mode)
+			pdn->pe_num_map = kmalloc(sizeof(*pdn->pe_num_map) * num_vfs,
+					GFP_KERNEL);
+		else
+			pdn->pe_num_map = kmalloc(sizeof(*pdn->pe_num_map), GFP_KERNEL);
+
+		if (!pdn->pe_num_map)
+			return -ENOMEM;
+
+		if (pdn->m64_single_mode)
+			for (i = 0; i < num_vfs; i++)
+				pdn->pe_num_map[i] = IODA_INVALID_PE;
+
 		/* Calculate available PE for required VFs */
-		mutex_lock(&phb->ioda.pe_alloc_mutex);
-		pdn->offset = bitmap_find_next_zero_area(
-			phb->ioda.pe_alloc, phb->ioda.total_pe,
-			0, num_vfs, 0);
-		if (pdn->offset >= phb->ioda.total_pe) {
+		if (pdn->m64_single_mode) {
+			for (i = 0; i < num_vfs; i++) {
+				pdn->pe_num_map[i] = pnv_ioda_alloc_pe(phb);
+				if (pdn->pe_num_map[i] == IODA_INVALID_PE) {
+					ret = -EBUSY;
+					goto m64_failed;
+				}
+			}
+		} else {
+			mutex_lock(&phb->ioda.pe_alloc_mutex);
+			*pdn->pe_num_map = bitmap_find_next_zero_area(
+				phb->ioda.pe_alloc, phb->ioda.total_pe,
+				0, num_vfs, 0);
+			if (*pdn->pe_num_map >= phb->ioda.total_pe) {
+				mutex_unlock(&phb->ioda.pe_alloc_mutex);
+				dev_info(&pdev->dev, "Failed to enable VF%d\n", num_vfs);
+				kfree(pdn->pe_num_map);
+				return -EBUSY;
+			}
+			bitmap_set(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
 			mutex_unlock(&phb->ioda.pe_alloc_mutex);
-			dev_info(&pdev->dev, "Failed to enable VF%d\n", num_vfs);
-			pdn->offset = 0;
-			return -EBUSY;
 		}
-		bitmap_set(phb->ioda.pe_alloc, pdn->offset, num_vfs);
 		pdn->num_vfs = num_vfs;
-		mutex_unlock(&phb->ioda.pe_alloc_mutex);
 
 		/* Assign M64 window accordingly */
 		ret = pnv_pci_vf_assign_m64(pdev, num_vfs);
@@ -1486,7 +1521,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 		 * Otherwise, the PE# for the VF will conflict with others.
 		 */
 		if (!pdn->m64_single_mode) {
-			ret = pnv_pci_vf_resource_shift(pdev, pdn->offset);
+			ret = pnv_pci_vf_resource_shift(pdev, *pdn->pe_num_map);
 			if (ret)
 				goto m64_failed;
 		}
@@ -1498,8 +1533,16 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 	return 0;
 
 m64_failed:
-	bitmap_clear(phb->ioda.pe_alloc, pdn->offset, num_vfs);
-	pdn->offset = 0;
+	if (pdn->m64_single_mode) {
+		for (i = 0; i < num_vfs; i++) {
+			if (pdn->pe_num_map[i] != IODA_INVALID_PE)
+				pnv_ioda_free_pe(phb, pdn->pe_num_map[i]);
+		}
+	} else
+		bitmap_clear(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
+
+	/* Releasing pe_num_map */
+	kfree(pdn->pe_num_map);
 
 	return ret;
 }
-- 
2.5.0

* Re: [PATCH V5 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR
  2015-10-09  2:46 ` [PATCH V5 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR Wei Yang
@ 2015-10-09  8:15   ` Benjamin Herrenschmidt
  2015-10-09  9:02     ` David Laight
  2015-10-12  2:58     ` Wei Yang
  2015-10-13  0:01   ` Gavin Shan
  1 sibling, 2 replies; 30+ messages in thread
From: Benjamin Herrenschmidt @ 2015-10-09  8:15 UTC (permalink / raw)
  To: Wei Yang, gwshan, aik; +Cc: linuxppc-dev, mpe

On Fri, 2015-10-09 at 10:46 +0800, Wei Yang wrote:
> On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If
> a SRIOV device's IOV BAR is not 64bit-prefetchable, this is not assigned
> from 64bit prefetchable window, which means M64 BAR can't work on it.

Won't this cause a lot of devices to become unsupported for us ? Or do
all devices we care about have their BARs marked prefetchable ?

> The reason is PCI bridges support only 2 windows and the kernel code
> programs bridges in the way that one window is 32bit-nonprefetchable and
> the other one is 64bit-prefetchable. So if devices' IOV BAR is 64bit and
> non-prefetchable, it will be mapped into 32bit space and therefore M64
> cannot be used for it.
> 
> This patch makes this explicit.

So PCIe allows for non-prefetchable BARs to be put under prefetchable
bridge windows as long as the mapping done by the CPU doesn't prefetch,
I believe. Well it's a natural conclusion of the weird note "Additional
Guidance on the Prefetchable Bit in Memory Space BARs" page 596 of PCIe
spec v3.0... it also says that devices should be pretty much free to
set their prefetchable bit even if they have side effects so.

So maybe we should have that option, rather than just not using the
devices, allow them to be allocate via the prefetchable window...

> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c | 25 +++++++++----------------
>  1 file changed, 9 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 85cbc96..8c031b5 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>  		if (!res->flags || !res->parent)
>  			continue;
>  
> -		if (!pnv_pci_is_mem_pref_64(res->flags))
> -			continue;
> -
>  		/*
>  		 * The actual IOV BAR range is determined by the start address
>  		 * and the actual size for num_vfs VFs BAR.  This check is to
> @@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>  		if (!res->flags || !res->parent)
>  			continue;
>  
> -		if (!pnv_pci_is_mem_pref_64(res->flags))
> -			continue;
> -
>  		size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
>  		res2 = *res;
>  		res->start += size * offset;
> @@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>  		if (!res->flags || !res->parent)
>  			continue;
>  
> -		if (!pnv_pci_is_mem_pref_64(res->flags))
> -			continue;
> -
>  		for (j = 0; j < vf_groups; j++) {
>  			do {
>  				win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
> @@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>  	pdn = pci_get_pdn(pdev);
>  
>  	if (phb->type == PNV_PHB_IODA2) {
> +		if (!pdn->vfs_expanded) {
> +			dev_info(&pdev->dev, "don't support this SRIOV device"
> +				" with non 64bit-prefetchable IOV BAR\n");
> +			return -ENOSPC;
> +		}
> +
>  		/* Calculate available PE for required VFs */
>  		mutex_lock(&phb->ioda.pe_alloc_mutex);
>  		pdn->offset = bitmap_find_next_zero_area(
> @@ -2775,9 +2772,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>  		if (!res->flags || res->parent)
>  			continue;
>  		if (!pnv_pci_is_mem_pref_64(res->flags)) {
> -			dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
> +			dev_warn(&pdev->dev, "Don't support SR-IOV with"
> +					" non M64 VF BAR%d: %pR. \n",
>  				 i, res);
> -			continue;
> +			return;
>  		}
>  
>  		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
> @@ -2796,11 +2794,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>  		res = &pdev->resource[i + PCI_IOV_RESOURCES];
>  		if (!res->flags || res->parent)
>  			continue;
> -		if (!pnv_pci_is_mem_pref_64(res->flags)) {
> -			dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: %pR\n",
> -				 i, res);
> -			continue;
> -		}
>  
>  		dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
>  		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);

* RE: [PATCH V5 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR
  2015-10-09  8:15   ` Benjamin Herrenschmidt
@ 2015-10-09  9:02     ` David Laight
  2015-10-12  3:16       ` Wei Yang
  2015-10-12  2:58     ` Wei Yang
  1 sibling, 1 reply; 30+ messages in thread
From: David Laight @ 2015-10-09  9:02 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Wei Yang, gwshan, aik; +Cc: linuxppc-dev

From: Benjamin Herrenschmidt
> Sent: 09 October 2015 09:15
> On Fri, 2015-10-09 at 10:46 +0800, Wei Yang wrote:
> > On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If
> > a SRIOV device's IOV BAR is not 64bit-prefetchable, this is not assigned
> > from 64bit prefetchable window, which means M64 BAR can't work on it.
> 
> Won't this cause a lot of devices to become unsupported for us ? Or do
> all devices we care about have their BARs marked prefetchable ?
> 
> > The reason is PCI bridges support only 2 windows and the kernel code
> > programs bridges in the way that one window is 32bit-nonprefetchable and
> > the other one is 64bit-prefetchable. So if devices' IOV BAR is 64bit and
> > non-prefetchable, it will be mapped into 32bit space and therefore M64
> > cannot be used for it.
> >
> > This patch makes this explicit.
> 
> So PCIe allows for non-prefetchable BARs to be put under prefetchable
> bridge windows as long as the mapping done by the CPU doesn't prefetch,
> I believe. Well it's a natural conclusion of the weird note "Additional
> Guidance on the Prefetchable Bit in Memory Space BARs" page 596 of PCIe
> spec v3.0... it also says that devices should be pretty much free to
> set their prefetchable bit even if they have side effects so.
> 
> So maybe we should have that option, rather than just not using the
> devices, allow them to be allocate via the prefetchable window...

I'm trying to understand some of this for other reasons...
(Mostly because NetBSD doesn't allow PCIe addresses above 4G.)

AFAICT there are two options '32bit' and '64bit prefetchable'.

I presume the latter allows addresses above 4G (although I've
not looked closely enough to see how that gets written into the BAR).

That seems problematic on architectures like x86 where the bios
initialises the BARs but doesn't (usually) know whether the OS
will be 32 or 64bit.

Whether the cpu maps the addresses prefetchable ought to depend
on the way the driver maps the area - since it is the driver
that knows whether prefetching is sensible.

	David

* Re: [PATCH V5 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR
  2015-10-09  8:15   ` Benjamin Herrenschmidt
  2015-10-09  9:02     ` David Laight
@ 2015-10-12  2:58     ` Wei Yang
  2015-10-12  6:55       ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 30+ messages in thread
From: Wei Yang @ 2015-10-12  2:58 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Wei Yang, gwshan, aik, linuxppc-dev, mpe

On Fri, Oct 09, 2015 at 07:15:19PM +1100, Benjamin Herrenschmidt wrote:
>On Fri, 2015-10-09 at 10:46 +0800, Wei Yang wrote:
>> On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If
>> a SRIOV device's IOV BAR is not 64bit-prefetchable, this is not assigned
>> from 64bit prefetchable window, which means M64 BAR can't work on it.
>
>Won't this cause a lot of devices to become unsupported for us ? Or do
>all devices we care about have their BARs marked prefetchable ?
>

You are right. After doing so, some of the devices will not be supported.

Hmm, I thought you knew this, since this strategy of using an M64 BAR to map
the IOV BAR was proposed by you. This patch doesn't change the behaviour, it
just makes it explicit.

>> The reason is PCI bridges support only 2 windows and the kernel code
>> programs bridges in the way that one window is 32bit-nonprefetchable and
>> the other one is 64bit-prefetchable. So if devices' IOV BAR is 64bit and
>> non-prefetchable, it will be mapped into 32bit space and therefore M64
>> cannot be used for it.
>> 
>> This patch makes this explicit.
>
>So PCIe allows for non-prefetchable BARs to be put under prefetchable
>bridge windows as long as the mapping done by the CPU doesn't prefetch,
>I believe. Well it's a natural conclusion of the weird note "Additional
>Guidance on the Prefetchable Bit in Memory Space BARs" page 596 of PCIe
>spec v3.0... it also says that devices should be pretty much free to
>set their prefetchable bit even if they have side effects so.
>
>So maybe we should have that option, rather than just not using the
>devices, allow them to be allocate via the prefetchable window...
>

Based on current linux kernel pci core resource size/assignment algorithm,
when we have 64bit-prefetchable root window, only 64bit-prefetchable BAR could
be assigned in this region. Refer to commit 5b28541552ef "PCI: Restrict 64-bit
prefetchable bridge windows to 64-bit resources" for more detail.

So we have two facts here:
1. We want to use M64 BAR to map VFs, so that VF could be put into individual
PE. And the M64 BAR just map 64bit window.
2. Current MMIO size/assignment algorithm in linux pci core only put
64bit-prefetchable BAR into 64bit prefetchable bridge window, no other BARs.

Your suggestion here is to put a non-prefetchable BAR in prefetchable bridge
window? This would make the MMIO size/assignment algorithm more confusing.

>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 25 +++++++++----------------
>>  1 file changed, 9 insertions(+), 16 deletions(-)
>> 
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>> index 85cbc96..8c031b5 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>>  		if (!res->flags || !res->parent)
>>  			continue;
>>  
>> -		if (!pnv_pci_is_mem_pref_64(res->flags))
>> -			continue;
>> -
>>  		/*
>>  		 * The actual IOV BAR range is determined by the start address
>>  		 * and the actual size for num_vfs VFs BAR.  This check is to
>> @@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>>  		if (!res->flags || !res->parent)
>>  			continue;
>>  
>> -		if (!pnv_pci_is_mem_pref_64(res->flags))
>> -			continue;
>> -
>>  		size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
>>  		res2 = *res;
>>  		res->start += size * offset;
>> @@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>  		if (!res->flags || !res->parent)
>>  			continue;
>>  
>> -		if (!pnv_pci_is_mem_pref_64(res->flags))
>> -			continue;
>> -
>>  		for (j = 0; j < vf_groups; j++) {
>>  			do {
>>  				win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
>> @@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>  	pdn = pci_get_pdn(pdev);
>>  
>>  	if (phb->type == PNV_PHB_IODA2) {
>> +		if (!pdn->vfs_expanded) {
>> +			dev_info(&pdev->dev, "don't support this SRIOV device"
>> +				" with non 64bit-prefetchable IOV BAR\n");
>> +			return -ENOSPC;
>> +		}
>> +
>>  		/* Calculate available PE for required VFs */
>>  		mutex_lock(&phb->ioda.pe_alloc_mutex);
>>  		pdn->offset = bitmap_find_next_zero_area(
>> @@ -2775,9 +2772,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>  		if (!res->flags || res->parent)
>>  			continue;
>>  		if (!pnv_pci_is_mem_pref_64(res->flags)) {
>> -			dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
>> +			dev_warn(&pdev->dev, "Don't support SR-IOV with"
>> +					" non M64 VF BAR%d: %pR. \n",
>>  				 i, res);
>> -			continue;
>> +			return;
>>  		}
>>  
>>  		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>> @@ -2796,11 +2794,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>  		res = &pdev->resource[i + PCI_IOV_RESOURCES];
>>  		if (!res->flags || res->parent)
>>  			continue;
>> -		if (!pnv_pci_is_mem_pref_64(res->flags)) {
>> -			dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: %pR\n",
>> -				 i, res);
>> -			continue;
>> -		}
>> 
>>  		dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
>>  		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);

-- 
Richard Yang
Help you, Help me

* Re: [PATCH V5 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR
  2015-10-09  9:02     ` David Laight
@ 2015-10-12  3:16       ` Wei Yang
  0 siblings, 0 replies; 30+ messages in thread
From: Wei Yang @ 2015-10-12  3:16 UTC (permalink / raw)
  To: David Laight; +Cc: Benjamin Herrenschmidt, Wei Yang, gwshan, aik, linuxppc-dev

On Fri, Oct 09, 2015 at 09:02:16AM +0000, David Laight wrote:
>From: Benjamin Herrenschmidt
>> Sent: 09 October 2015 09:15
>> On Fri, 2015-10-09 at 10:46 +0800, Wei Yang wrote:
>> > On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If
>> > a SRIOV device's IOV BAR is not 64bit-prefetchable, this is not assigned
>> > from 64bit prefetchable window, which means M64 BAR can't work on it.
>> 
>> Won't this cause a lot of devices to become unsupported for us ? Or do
>> all devices we care about have their BARs marked prefetchable ?
>> 
>> > The reason is PCI bridges support only 2 windows and the kernel code
>> > programs bridges in the way that one window is 32bit-nonprefetchable and
>> > the other one is 64bit-prefetchable. So if devices' IOV BAR is 64bit and
>> > non-prefetchable, it will be mapped into 32bit space and therefore M64
>> > cannot be used for it.
>> >
>> > This patch makes this explicit.
>> 
>> So PCIe allows for non-prefetchable BARs to be put under prefetchable
>> bridge windows as long as the mapping done by the CPU doesn't prefetch,
>> I believe. Well it's a natural conclusion of the weird note "Additional
>> Guidance on the Prefetchable Bit in Memory Space BARs" page 596 of PCIe
>> spec v3.0... it also says that devices should be pretty much free to
>> set their prefetchable bit even if they have side effects so.
>> 
>> So maybe we should have that option, rather than just not using the
>> devices, allow them to be allocate via the prefetchable window...
>
>I'm trying to understand some of this for other reasons...
>(Mostly because NetBSD doesn't allow PCIe addresses above 4G.)

I guess this means the pci root bridge's windows are all below 4G?

>
>AFAICT there are two options '32bit' and '64bit prefetchable'.
>

For a pci bridge, the bridge window could be:
1. 32bit non-prefetchable and 32bit prefetchable
2. 32bit non-prefetchable and 64bit prefetchable

>I presume the latter allows addresses above 4G (although I've
>not looked closely enough to see how that gets written into the BAR).
>

Allowed, and they could also be below 4G.

>That seems problematic on architectures like x86 where the bios
>initialises the BARs but doesn't (usually) know whether the OS
>will be 32 or 64bit.
>

I didn't get the point.

Usually when the bios handles the MMIO assignment, the kernel will mostly
leave it alone, as far as I know.

>Whether the cpu maps the addresses prefetchable ought to depend
>on the way the driver maps the area - since it is the driver
>that knows whether prefetching is sensible.
>
>	David
>
>

-- 
Richard Yang
Help you, Help me

* Re: [PATCH V5 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR
  2015-10-12  2:58     ` Wei Yang
@ 2015-10-12  6:55       ` Benjamin Herrenschmidt
  2015-10-13  2:30         ` Wei Yang
  0 siblings, 1 reply; 30+ messages in thread
From: Benjamin Herrenschmidt @ 2015-10-12  6:55 UTC (permalink / raw)
  To: Wei Yang; +Cc: gwshan, aik, linuxppc-dev, mpe

On Mon, 2015-10-12 at 10:58 +0800, Wei Yang wrote:
> On Fri, Oct 09, 2015 at 07:15:19PM +1100, Benjamin Herrenschmidt wrote:
> > On Fri, 2015-10-09 at 10:46 +0800, Wei Yang wrote:
> > > On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If
> > > a SRIOV device's IOV BAR is not 64bit-prefetchable, this is not assigned
> > > from 64bit prefetchable window, which means M64 BAR can't work on it.
> > 
> > Won't this cause a lot of devices to become unsupported for us ? Or do
> > all devices we care about have their BARs marked prefetchable ?
> > 
> 
> You are right. After doing so, some of the devices will not be supported.
> 
> Hmm, I thought you knew this, since this strategy of using an M64 BAR to map
> the IOV BAR was proposed by you. This patch doesn't change the behaviour, it
> just makes it explicit.

Right but we can make things better by also effectively allowing
non-prefetch BARs to sit in the prefetchable window of a bridge.

> > > The reason is PCI bridges support only 2 windows and the kernel code
> > > programs bridges in the way that one window is 32bit-nonprefetchable and
> > > the other one is 64bit-prefetchable. So if devices' IOV BAR is 64bit and
> > > non-prefetchable, it will be mapped into 32bit space and therefore M64
> > > cannot be used for it.
> > > 
> > > This patch makes this explicit.
> > 
> > So PCIe allows for non-prefetchable BARs to be put under prefetchable
> > bridge windows as long as the mapping done by the CPU doesn't prefetch,
> > I believe. Well it's a natural conclusion of the weird note "Additional
> > Guidance on the Prefetchable Bit in Memory Space BARs" page 596 of PCIe
> > spec v3.0... it also says that devices should be pretty much free to
> > set their prefetchable bit even if they have side effects so.
> > 
> > So maybe we should have that option, rather than just not using the
> > devices, allow them to be allocate via the prefetchable window...
> > 
> 
> Based on current linux kernel pci core resource size/assignment algorithm,
> when we have 64bit-prefetchable root window, only 64bit-prefetchable BAR could
> be assigned in this region. Refer to commit 5b28541552ef "PCI: Restrict 64-bit
> prefetchable bridge windows to 64-bit resources" for more detail.

The above patch was about avoiding 32-bit prefetchable, it's not about
64-bit non-prefetchable.

> So we have two facts here:
> 1. We want to use M64 BAR to map VFs, so that VF could be put into individual
> PE. And the M64 BAR just map 64bit window.
> 2. Current MMIO size/assignment algorithm in linux pci core only put
> 64bit-prefetchable BAR into 64bit prefetchable bridge window, no other BARs.
> 
> Your suggestion here is to put a non-prefetchable BAR in prefetchable bridge
> window? This would make the MMIO size/assignment algorithm more confusing.

Right, in the long run, both us and x86 might need to be able to do
that, though exclusively for PCIe. PCI/PCI-X bridges can prefetch.
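
Something like the sketch below, hand-waving only (the helper is made up,
nothing like it exists today): a 64bit non-prefetchable BAR could be
treated as acceptable for the 64bit prefetchable bridge window when the
device is PCIe, provided the CPU mapping stays non-prefetching (a real
implementation would also have to check the whole upstream path, since a
PCI/PCI-X bridge on the path can prefetch):

	/* hypothetical helper, not in the kernel today */
	static bool pref_window_can_hold(struct pci_dev *dev,
					 struct resource *res)
	{
		/* 64bit prefetchable BARs always qualify */
		if ((res->flags & IORESOURCE_MEM_64) &&
		    (res->flags & IORESOURCE_PREFETCH))
			return true;

		/*
		 * 64bit non-prefetchable BARs could also qualify when the
		 * device itself is PCIe.
		 */
		if ((res->flags & IORESOURCE_MEM_64) && pci_is_pcie(dev))
			return true;

		return false;
	}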

Cheers,
Ben.

> > > Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> > > Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> > > Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > > ---
> > >  arch/powerpc/platforms/powernv/pci-ioda.c | 25 +++++++++----------------
> > >  1 file changed, 9 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> > > index 85cbc96..8c031b5 100644
> > > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> > > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> > > @@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
> > >  		if (!res->flags || !res->parent)
> > >  			continue;
> > > 
> > > -		if (!pnv_pci_is_mem_pref_64(res->flags))
> > > -			continue;
> > > -
> > >  		/*
> > >  		 * The actual IOV BAR range is determined by the start address
> > >  		 * and the actual size for num_vfs VFs BAR.  This check is to
> > > @@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
> > >  		if (!res->flags || !res->parent)
> > >  			continue;
> > > 
> > > -		if (!pnv_pci_is_mem_pref_64(res->flags))
> > > -			continue;
> > > -
> > >  		size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
> > >  		res2 = *res;
> > >  		res->start += size * offset;
> > > @@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
> > >  		if (!res->flags || !res->parent)
> > >  			continue;
> > > 
> > > -		if (!pnv_pci_is_mem_pref_64(res->flags))
> > > -			continue;
> > > -
> > >  		for (j = 0; j < vf_groups; j++) {
> > >  			do {
> > >  				win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
> > > @@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
> > >  	pdn = pci_get_pdn(pdev);
> > > 
> > >  	if (phb->type == PNV_PHB_IODA2) {
> > > +		if (!pdn->vfs_expanded) {
> > > +			dev_info(&pdev->dev, "don't support this SRIOV device"
> > > +				" with non 64bit-prefetchable IOV BAR\n");
> > > +			return -ENOSPC;
> > > +		}
> > > +
> > >  		/* Calculate available PE for required VFs */
> > >  		mutex_lock(&phb->ioda.pe_alloc_mutex);
> > >  		pdn->offset = bitmap_find_next_zero_area(
> > > @@ -2775,9 +2772,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
> > >  		if (!res->flags || res->parent)
> > >  			continue;
> > >  		if (!pnv_pci_is_mem_pref_64(res->flags)) {
> > > -			dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
> > > +			dev_warn(&pdev->dev, "Don't support SR-IOV with"
> > > +					" non M64 VF BAR%d: %pR. \n",
> > >  				 i, res);
> > > -			continue;
> > > +			return;
> > >  		}
> > > 
> > >  		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
> > > @@ -2796,11 +2794,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
> > >  		res = &pdev->resource[i + PCI_IOV_RESOURCES];
> > >  		if (!res->flags || res->parent)
> > >  			continue;
> > > -		if (!pnv_pci_is_mem_pref_64(res->flags)) {
> > > -			dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: %pR\n",
> > > -				 i, res);
> > > -			continue;
> > > -		}
> > > 
> > >  		dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
> > >  		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V5 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
  2015-10-09  2:46 ` [PATCH V5 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR Wei Yang
@ 2015-10-12 23:55   ` Gavin Shan
  2015-10-13  2:50     ` Wei Yang
  0 siblings, 1 reply; 30+ messages in thread
From: Gavin Shan @ 2015-10-12 23:55 UTC (permalink / raw)
  To: Wei Yang; +Cc: gwshan, aik, benh, linuxppc-dev, mpe

On Fri, Oct 09, 2015 at 10:46:53AM +0800, Wei Yang wrote:
>In current implementation, when VF BAR is bigger than 64MB, it uses 4 M64
>BARs in Single PE mode to cover the number of VFs required to be enabled.
>By doing so, several VFs would be in one VF Group and leads to interference
>between VFs in the same group.
>
>And in this patch, m64_wins is renamed to m64_map, which means index number
>of the M64 BAR used to map the VF BAR. Based on Gavin's comments.
>
>This patch changes the design by using one M64 BAR in Single PE mode for
>one VF BAR. This gives absolute isolation for VFs.
>
>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>---
> arch/powerpc/include/asm/pci-bridge.h     |   5 +-
> arch/powerpc/platforms/powernv/pci-ioda.c | 169 ++++++++++++------------------
> 2 files changed, 68 insertions(+), 106 deletions(-)
>
>diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
>index 712add5..8aeba4c 100644
>--- a/arch/powerpc/include/asm/pci-bridge.h
>+++ b/arch/powerpc/include/asm/pci-bridge.h
>@@ -214,10 +214,9 @@ struct pci_dn {
> 	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
> 	u16     num_vfs;		/* number of VFs enabled*/
> 	int     offset;			/* PE# for the first VF PE */
>-#define M64_PER_IOV 4
>-	int     m64_per_iov;
>+	bool    m64_single_mode;	/* Use M64 BAR in Single Mode */
> #define IODA_INVALID_M64        (-1)
>-	int     m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
>+	int     (*m64_map)[PCI_SRIOV_NUM_BARS];
> #endif /* CONFIG_PCI_IOV */
> #endif
> 	struct list_head child_list;
>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>index 7da476b..2886f90 100644
>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>@@ -1148,29 +1148,36 @@ static void pnv_pci_ioda_setup_PEs(void)
> }
>
> #ifdef CONFIG_PCI_IOV
>-static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
>+static int pnv_pci_vf_release_m64(struct pci_dev *pdev, u16 num_vfs)
> {
> 	struct pci_bus        *bus;
> 	struct pci_controller *hose;
> 	struct pnv_phb        *phb;
> 	struct pci_dn         *pdn;
> 	int                    i, j;
>+	int                    m64_bars;
>
> 	bus = pdev->bus;
> 	hose = pci_bus_to_host(bus);
> 	phb = hose->private_data;
> 	pdn = pci_get_pdn(pdev);
>
>+	if (pdn->m64_single_mode)
>+		m64_bars = num_vfs;
>+	else
>+		m64_bars = 1;
>+
> 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
>-		for (j = 0; j < M64_PER_IOV; j++) {
>-			if (pdn->m64_wins[i][j] == IODA_INVALID_M64)
>+		for (j = 0; j < m64_bars; j++) {
>+			if (pdn->m64_map[j][i] == IODA_INVALID_M64)
> 				continue;
> 			opal_pci_phb_mmio_enable(phb->opal_id,
>-				OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0);
>-			clear_bit(pdn->m64_wins[i][j], &phb->ioda.m64_bar_alloc);
>-			pdn->m64_wins[i][j] = IODA_INVALID_M64;
>+				OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 0);
>+			clear_bit(pdn->m64_map[j][i], &phb->ioda.m64_bar_alloc);
>+			pdn->m64_map[j][i] = IODA_INVALID_M64;
> 		}
>
>+	kfree(pdn->m64_map);
> 	return 0;
> }
>
>@@ -1187,8 +1194,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
> 	int                    total_vfs;
> 	resource_size_t        size, start;
> 	int                    pe_num;
>-	int                    vf_groups;
>-	int                    vf_per_group;
>+	int                    m64_bars;
>
> 	bus = pdev->bus;
> 	hose = pci_bus_to_host(bus);
>@@ -1196,26 +1202,26 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
> 	pdn = pci_get_pdn(pdev);
> 	total_vfs = pci_sriov_get_totalvfs(pdev);
>
>-	/* Initialize the m64_wins to IODA_INVALID_M64 */
>-	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
>-		for (j = 0; j < M64_PER_IOV; j++)
>-			pdn->m64_wins[i][j] = IODA_INVALID_M64;
>+	if (pdn->m64_single_mode)
>+		m64_bars = num_vfs;
>+	else
>+		m64_bars = 1;
>+
>+	pdn->m64_map = kmalloc(sizeof(*pdn->m64_map) * m64_bars, GFP_KERNEL);
>+	if (!pdn->m64_map)
>+		return -ENOMEM;
>+	/* Initialize the m64_map to IODA_INVALID_M64 */
>+	for (i = 0; i < m64_bars ; i++)
>+		for (j = 0; j < PCI_SRIOV_NUM_BARS; j++)
>+			pdn->m64_map[i][j] = IODA_INVALID_M64;
>
>-	if (pdn->m64_per_iov == M64_PER_IOV) {
>-		vf_groups = (num_vfs <= M64_PER_IOV) ? num_vfs: M64_PER_IOV;
>-		vf_per_group = (num_vfs <= M64_PER_IOV)? 1:
>-			roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
>-	} else {
>-		vf_groups = 1;
>-		vf_per_group = 1;
>-	}
>
> 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> 		res = &pdev->resource[i + PCI_IOV_RESOURCES];
> 		if (!res->flags || !res->parent)
> 			continue;
>
>-		for (j = 0; j < vf_groups; j++) {
>+		for (j = 0; j < m64_bars; j++) {
> 			do {
> 				win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
> 						phb->ioda.m64_bar_idx + 1, 0);
>@@ -1224,12 +1230,11 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
> 					goto m64_failed;
> 			} while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc));
>
>-			pdn->m64_wins[i][j] = win;
>+			pdn->m64_map[j][i] = win;
>
>-			if (pdn->m64_per_iov == M64_PER_IOV) {
>+			if (pdn->m64_single_mode) {
> 				size = pci_iov_resource_size(pdev,
> 							PCI_IOV_RESOURCES + i);
>-				size = size * vf_per_group;
> 				start = res->start + size * j;
> 			} else {
> 				size = resource_size(res);
>@@ -1237,16 +1242,16 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
> 			}
>
> 			/* Map the M64 here */
>-			if (pdn->m64_per_iov == M64_PER_IOV) {
>+			if (pdn->m64_single_mode) {
> 				pe_num = pdn->offset + j;
> 				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
> 						pe_num, OPAL_M64_WINDOW_TYPE,
>-						pdn->m64_wins[i][j], 0);
>+						pdn->m64_map[j][i], 0);
> 			}
>
> 			rc = opal_pci_set_phb_mem_window(phb->opal_id,
> 						 OPAL_M64_WINDOW_TYPE,
>-						 pdn->m64_wins[i][j],
>+						 pdn->m64_map[j][i],
> 						 start,
> 						 0, /* unused */
> 						 size);
>@@ -1258,12 +1263,12 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
> 				goto m64_failed;
> 			}
>
>-			if (pdn->m64_per_iov == M64_PER_IOV)
>+			if (pdn->m64_single_mode)
> 				rc = opal_pci_phb_mmio_enable(phb->opal_id,
>-				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 2);
>+				     OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 2);
> 			else
> 				rc = opal_pci_phb_mmio_enable(phb->opal_id,
>-				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 1);
>+				     OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 1);
>
> 			if (rc != OPAL_SUCCESS) {
> 				dev_err(&pdev->dev, "Failed to enable M64 window #%d: %llx\n",
>@@ -1275,7 +1280,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
> 	return 0;
>
> m64_failed:
>-	pnv_pci_vf_release_m64(pdev);
>+	pnv_pci_vf_release_m64(pdev, num_vfs);
> 	return -EBUSY;
> }
>
>@@ -1302,15 +1307,13 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
> 	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
> }
>
>-static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>+static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
> {
> 	struct pci_bus        *bus;
> 	struct pci_controller *hose;
> 	struct pnv_phb        *phb;
> 	struct pnv_ioda_pe    *pe, *pe_n;
> 	struct pci_dn         *pdn;
>-	u16                    vf_index;
>-	int64_t                rc;
>
> 	bus = pdev->bus;
> 	hose = pci_bus_to_host(bus);
>@@ -1320,35 +1323,6 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
> 	if (!pdev->is_physfn)
> 		return;
>
>-	if (pdn->m64_per_iov == M64_PER_IOV && num_vfs > M64_PER_IOV) {
>-		int   vf_group;
>-		int   vf_per_group;
>-		int   vf_index1;
>-
>-		vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
>-
>-		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++)
>-			for (vf_index = vf_group * vf_per_group;
>-				vf_index < (vf_group + 1) * vf_per_group &&
>-				vf_index < num_vfs;
>-				vf_index++)
>-				for (vf_index1 = vf_group * vf_per_group;
>-					vf_index1 < (vf_group + 1) * vf_per_group &&
>-					vf_index1 < num_vfs;
>-					vf_index1++){
>-
>-					rc = opal_pci_set_peltv(phb->opal_id,
>-						pdn->offset + vf_index,
>-						pdn->offset + vf_index1,
>-						OPAL_REMOVE_PE_FROM_DOMAIN);
>-
>-					if (rc)
>-					    dev_warn(&pdev->dev, "%s: Failed to unlink same group PE#%d(%lld)\n",
>-						__func__,
>-						pdn->offset + vf_index1, rc);
>-				}
>-	}
>-
> 	list_for_each_entry_safe(pe, pe_n, &phb->ioda.pe_list, list) {
> 		if (pe->parent_dev != pdev)
> 			continue;
>@@ -1383,14 +1357,14 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
> 	num_vfs = pdn->num_vfs;
>
> 	/* Release VF PEs */
>-	pnv_ioda_release_vf_PE(pdev, num_vfs);
>+	pnv_ioda_release_vf_PE(pdev);
>
> 	if (phb->type == PNV_PHB_IODA2) {
>-		if (pdn->m64_per_iov == 1)
>+		if (!pdn->m64_single_mode)
> 			pnv_pci_vf_resource_shift(pdev, -pdn->offset);
>
> 		/* Release M64 windows */
>-		pnv_pci_vf_release_m64(pdev);
>+		pnv_pci_vf_release_m64(pdev, num_vfs);
>
> 		/* Release PE numbers */
> 		bitmap_clear(phb->ioda.pe_alloc, pdn->offset, num_vfs);
>@@ -1409,7 +1383,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
> 	int                    pe_num;
> 	u16                    vf_index;
> 	struct pci_dn         *pdn;
>-	int64_t                rc;
>
> 	bus = pdev->bus;
> 	hose = pci_bus_to_host(bus);
>@@ -1454,37 +1427,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>
> 		pnv_pci_ioda2_setup_dma_pe(phb, pe);
> 	}
>-
>-	if (pdn->m64_per_iov == M64_PER_IOV && num_vfs > M64_PER_IOV) {
>-		int   vf_group;
>-		int   vf_per_group;
>-		int   vf_index1;
>-
>-		vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
>-
>-		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++) {
>-			for (vf_index = vf_group * vf_per_group;
>-			     vf_index < (vf_group + 1) * vf_per_group &&
>-			     vf_index < num_vfs;
>-			     vf_index++) {
>-				for (vf_index1 = vf_group * vf_per_group;
>-				     vf_index1 < (vf_group + 1) * vf_per_group &&
>-				     vf_index1 < num_vfs;
>-				     vf_index1++) {
>-
>-					rc = opal_pci_set_peltv(phb->opal_id,
>-						pdn->offset + vf_index,
>-						pdn->offset + vf_index1,
>-						OPAL_ADD_PE_TO_DOMAIN);
>-
>-					if (rc)
>-					    dev_warn(&pdev->dev, "%s: Failed to link same group PE#%d(%lld)\n",
>-						__func__,
>-						pdn->offset + vf_index1, rc);
>-				}
>-			}
>-		}
>-	}
> }
>
> int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>@@ -1507,6 +1449,15 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
> 			return -ENOSPC;
> 		}
>
>+		/*
>+		 * When M64 BARs functions in Single PE mode, the number of VFs
>+		 * could be enabled must be less than the number of M64 BARs.
>+		 */
>+		if (pdn->m64_single_mode && num_vfs > phb->ioda.m64_bar_idx) {
>+			dev_info(&pdev->dev, "Not enough M64 BAR for VFs\n");
>+			return -EBUSY;
>+		}
>+
> 		/* Calculate available PE for required VFs */
> 		mutex_lock(&phb->ioda.pe_alloc_mutex);
> 		pdn->offset = bitmap_find_next_zero_area(
>@@ -1534,7 +1485,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
> 		 * the IOV BAR according to the PE# allocated to the VFs.
> 		 * Otherwise, the PE# for the VF will conflict with others.
> 		 */
>-		if (pdn->m64_per_iov == 1) {
>+		if (!pdn->m64_single_mode) {
> 			ret = pnv_pci_vf_resource_shift(pdev, pdn->offset);
> 			if (ret)
> 				goto m64_failed;
>@@ -1567,8 +1518,7 @@ int pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
> 	/* Allocate PCI data */
> 	add_dev_pci_data(pdev);
>
>-	pnv_pci_sriov_enable(pdev, num_vfs);
>-	return 0;
>+	return pnv_pci_sriov_enable(pdev, num_vfs);
> }
> #endif /* CONFIG_PCI_IOV */
>
>@@ -2762,9 +2712,9 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>
> 	pdn = pci_get_pdn(pdev);
> 	pdn->vfs_expanded = 0;
>+	pdn->m64_single_mode = false;
>
> 	total_vfs = pci_sriov_get_totalvfs(pdev);
>-	pdn->m64_per_iov = 1;
> 	mul = phb->ioda.total_pe;
>
> 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>@@ -2784,8 +2734,8 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
> 		if (size > (1 << 26)) {
> 			dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size is bigger than 64M, roundup power2\n",
> 				 i, res);
>-			pdn->m64_per_iov = M64_PER_IOV;
> 			mul = roundup_pow_of_two(total_vfs);
>+			pdn->m64_single_mode = true;
> 			break;
> 		}
> 	}
>@@ -2987,6 +2937,8 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
> static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
> 						      int resno)
> {
>+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
>+	struct pnv_phb *phb = hose->private_data;
> 	struct pci_dn *pdn = pci_get_pdn(pdev);
> 	resource_size_t align;
>
>@@ -2995,12 +2947,23 @@ static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
> 	 * SR-IOV. While from hardware perspective, the range mapped by M64
> 	 * BAR should be size aligned.
> 	 *
>+	 * When IOV BAR is mapped with M64 BAR in Single PE mode, the extra
>+	 * powernv-specific hardware restriction is gone. But if just use the
>+	 * VF BAR size as the alignment, PF BAR / VF BAR may be allocated with
>+	 * in one segment of M64 #15, which introduces the PE conflict between
>+	 * PF and VF. Based on this, the minimum alignment of an IOV BAR is
>+	 * m64_segsize.
>+	 *

When M64 BARs run in Single PE mode, they are still required to be 32MB
aligned. Does "the extra powernv-specific hardware restriction" include
that limitation or not?

> 	 * This function returns the total IOV BAR size if M64 BAR is in
> 	 * Shared PE mode or just the individual size if not.
>+	 * If the M64 BAR is in Single PE mode, return the VF BAR size or
>+	 * M64 segment size if IOV BAR size is less.
> 	 */
> 	align = pci_iov_resource_size(pdev, resno);
> 	if (!pdn->vfs_expanded)
> 		return align;
>+	if (pdn->m64_single_mode)
>+		return max(align, (resource_size_t)phb->ioda.m64_segsize);
>
> 	return pdn->vfs_expanded * align;
> }
>-- 
>2.5.0
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V5 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR
  2015-10-09  2:46 ` [PATCH V5 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR Wei Yang
  2015-10-09  8:15   ` Benjamin Herrenschmidt
@ 2015-10-13  0:01   ` Gavin Shan
  2015-10-13  1:49     ` Wei Yang
  1 sibling, 1 reply; 30+ messages in thread
From: Gavin Shan @ 2015-10-13  0:01 UTC (permalink / raw)
  To: Wei Yang; +Cc: gwshan, aik, benh, linuxppc-dev, mpe

On Fri, Oct 09, 2015 at 10:46:51AM +0800, Wei Yang wrote:
>On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If
>a SRIOV device's IOV BAR is not 64bit-prefetchable, this is not assigned
>from 64bit prefetchable window, which means M64 BAR can't work on it.
>
>The reason is PCI bridges support only 2 windows and the kernel code
>programs bridges in the way that one window is 32bit-nonprefetchable and
>the other one is 64bit-prefetchable. So if devices' IOV BAR is 64bit and
>non-prefetchable, it will be mapped into 32bit space and therefore M64
>cannot be used for it.
>
>This patch makes this explicit.
>
>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>---
> arch/powerpc/platforms/powernv/pci-ioda.c | 25 +++++++++----------------
> 1 file changed, 9 insertions(+), 16 deletions(-)
>
>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>index 85cbc96..8c031b5 100644
>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>@@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
> 		if (!res->flags || !res->parent)
> 			continue;
>
>-		if (!pnv_pci_is_mem_pref_64(res->flags))
>-			continue;
>-
> 		/*
> 		 * The actual IOV BAR range is determined by the start address
> 		 * and the actual size for num_vfs VFs BAR.  This check is to
>@@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
> 		if (!res->flags || !res->parent)
> 			continue;
>
>-		if (!pnv_pci_is_mem_pref_64(res->flags))
>-			continue;
>-
> 		size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
> 		res2 = *res;
> 		res->start += size * offset;
>@@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
> 		if (!res->flags || !res->parent)
> 			continue;
>
>-		if (!pnv_pci_is_mem_pref_64(res->flags))
>-			continue;
>-
> 		for (j = 0; j < vf_groups; j++) {
> 			do {
> 				win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
>@@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
> 	pdn = pci_get_pdn(pdev);
>
> 	if (phb->type == PNV_PHB_IODA2) {
>+		if (!pdn->vfs_expanded) {
>+			dev_info(&pdev->dev, "don't support this SRIOV device"
>+				" with non 64bit-prefetchable IOV BAR\n");
>+			return -ENOSPC;
>+		}
>+
> 		/* Calculate available PE for required VFs */
> 		mutex_lock(&phb->ioda.pe_alloc_mutex);
> 		pdn->offset = bitmap_find_next_zero_area(
>@@ -2775,9 +2772,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
> 		if (!res->flags || res->parent)
> 			continue;
> 		if (!pnv_pci_is_mem_pref_64(res->flags)) {
>-			dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
>+			dev_warn(&pdev->dev, "Don't support SR-IOV with"
>+					" non M64 VF BAR%d: %pR. \n",
> 				 i, res);
>-			continue;
>+			return;

When the IOV BAR isn't a 64-bit prefetchable one, it's going to be allocated from
the M32 aperture. However, the IOV BAR won't be used, since the SRIOV capability on
the PF can't be enabled. Occasionally the IOV BAR is huge (e.g. 4GB) and eats up all
the M32 space, so BARs other than 64-bit prefetchable ones on other devices will fail
to be assigned in this case. Is this a problem you have thought about?

> 		}
>
> 		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>@@ -2796,11 +2794,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
> 		res = &pdev->resource[i + PCI_IOV_RESOURCES];
> 		if (!res->flags || res->parent)
> 			continue;
>-		if (!pnv_pci_is_mem_pref_64(res->flags)) {
>-			dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: %pR\n",
>-				 i, res);
>-			continue;
>-		}
>
> 		dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
> 		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>-- 
>2.5.0
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V5 2/6] powerpc/powernv: simplify the calculation of iov resource alignment
  2015-10-09  2:46 ` [PATCH V5 2/6] powerpc/powernv: simplify the calculation of iov resource alignment Wei Yang
@ 2015-10-13  0:13   ` Gavin Shan
  2015-10-13  2:45     ` Wei Yang
  0 siblings, 1 reply; 30+ messages in thread
From: Gavin Shan @ 2015-10-13  0:13 UTC (permalink / raw)
  To: Wei Yang; +Cc: gwshan, aik, benh, linuxppc-dev, mpe

On Fri, Oct 09, 2015 at 10:46:52AM +0800, Wei Yang wrote:
>The alignment of IOV BAR on PowerNV platform is the total size of the IOV
>BAR. No matter whether the IOV BAR is extended with number of
>roundup_pow_of_two(total_vfs) or number of max PE number (256), the total
>size could be calculated by (vfs_expanded * VF_BAR_size).
>
>This patch simplifies the pnv_pci_iov_resource_alignment() by removing the
>first case.
>
>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>---
> arch/powerpc/platforms/powernv/pci-ioda.c | 20 ++++++++++++--------
> 1 file changed, 12 insertions(+), 8 deletions(-)
>
>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>index 8c031b5..7da476b 100644
>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>@@ -2988,17 +2988,21 @@ static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
> 						      int resno)
> {
> 	struct pci_dn *pdn = pci_get_pdn(pdev);
>-	resource_size_t align, iov_align;
>-
>-	iov_align = resource_size(&pdev->resource[resno]);
>-	if (iov_align)
>-		return iov_align;
>+	resource_size_t align;
>
>+	/*
>+	 * On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the
>+	 * SR-IOV. While from hardware perspective, the range mapped by M64
>+	 * BAR should be size aligned.
>+	 *
>+	 * This function returns the total IOV BAR size if M64 BAR is in
>+	 * Shared PE mode or just the individual size if not.
>+	 */

s/the individual size/VF BAR size/

> 	align = pci_iov_resource_size(pdev, resno);
>-	if (pdn->vfs_expanded)
>-		return pdn->vfs_expanded * align;
>+	if (!pdn->vfs_expanded)
>+		return align;
>
>-	return align;
>+	return pdn->vfs_expanded * align;

There is no difference before/after the change. Why is this change needed?

> }
> #endif /* CONFIG_PCI_IOV */
>
>-- 
>2.5.0
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V5 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR
  2015-10-13  0:01   ` Gavin Shan
@ 2015-10-13  1:49     ` Wei Yang
  2015-10-13  3:20       ` Gavin Shan
  0 siblings, 1 reply; 30+ messages in thread
From: Wei Yang @ 2015-10-13  1:49 UTC (permalink / raw)
  To: Gavin Shan; +Cc: Wei Yang, aik, benh, linuxppc-dev, mpe

On Tue, Oct 13, 2015 at 11:01:24AM +1100, Gavin Shan wrote:
>On Fri, Oct 09, 2015 at 10:46:51AM +0800, Wei Yang wrote:
>>On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If
>>a SRIOV device's IOV BAR is not 64bit-prefetchable, this is not assigned
>>from 64bit prefetchable window, which means M64 BAR can't work on it.
>>
>>The reason is PCI bridges support only 2 windows and the kernel code
>>programs bridges in the way that one window is 32bit-nonprefetchable and
>>the other one is 64bit-prefetchable. So if devices' IOV BAR is 64bit and
>>non-prefetchable, it will be mapped into 32bit space and therefore M64
>>cannot be used for it.
>>
>>This patch makes this explicit.
>>
>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>---
>> arch/powerpc/platforms/powernv/pci-ioda.c | 25 +++++++++----------------
>> 1 file changed, 9 insertions(+), 16 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 85cbc96..8c031b5 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>> 		if (!res->flags || !res->parent)
>> 			continue;
>>
>>-		if (!pnv_pci_is_mem_pref_64(res->flags))
>>-			continue;
>>-
>> 		/*
>> 		 * The actual IOV BAR range is determined by the start address
>> 		 * and the actual size for num_vfs VFs BAR.  This check is to
>>@@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>> 		if (!res->flags || !res->parent)
>> 			continue;
>>
>>-		if (!pnv_pci_is_mem_pref_64(res->flags))
>>-			continue;
>>-
>> 		size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
>> 		res2 = *res;
>> 		res->start += size * offset;
>>@@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>> 		if (!res->flags || !res->parent)
>> 			continue;
>>
>>-		if (!pnv_pci_is_mem_pref_64(res->flags))
>>-			continue;
>>-
>> 		for (j = 0; j < vf_groups; j++) {
>> 			do {
>> 				win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
>>@@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>> 	pdn = pci_get_pdn(pdev);
>>
>> 	if (phb->type == PNV_PHB_IODA2) {
>>+		if (!pdn->vfs_expanded) {
>>+			dev_info(&pdev->dev, "don't support this SRIOV device"
>>+				" with non 64bit-prefetchable IOV BAR\n");
>>+			return -ENOSPC;
>>+		}
>>+
>> 		/* Calculate available PE for required VFs */
>> 		mutex_lock(&phb->ioda.pe_alloc_mutex);
>> 		pdn->offset = bitmap_find_next_zero_area(
>>@@ -2775,9 +2772,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>> 		if (!res->flags || res->parent)
>> 			continue;
>> 		if (!pnv_pci_is_mem_pref_64(res->flags)) {
>>-			dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
>>+			dev_warn(&pdev->dev, "Don't support SR-IOV with"
>>+					" non M64 VF BAR%d: %pR. \n",
>> 				 i, res);
>>-			continue;
>>+			return;
>
>When the IOV BAR isn't a 64-bit prefetchable one, it's going to be allocated from
>the M32 aperture. However, the IOV BAR won't be used, since the SRIOV capability on
>the PF can't be enabled. Occasionally the IOV BAR is huge (e.g. 4GB) and eats up all
>the M32 space, so BARs other than 64-bit prefetchable ones on other devices will fail
>to be assigned in this case. Is this a problem you have thought about?
>

IOV BARs are on the optional (additional) resource list during assignment, so an
IOV BAR that cannot be assigned doesn't prevent the mandatory BARs of other
devices from being assigned.
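
Roughly the effect is like the toy model below (illustration only, not the
real drivers/pci/setup-bus.c code): mandatory BARs are placed first, and
IOV BARs only get whatever space is left over, so a huge unassignable IOV
BAR doesn't knock out the mandatory BARs of other devices.

	/* toy model of the idea, not the real setup-bus.c logic */
	struct toy_bar { unsigned long size; int is_iov; int assigned; };

	/* place everything mandatory first, then hand leftovers to IOV BARs */
	static void toy_assign(struct toy_bar *b, int n, unsigned long space)
	{
		unsigned long used = 0;
		int pass, i;

		for (pass = 0; pass < 2; pass++)	/* 0: mandatory, 1: IOV */
			for (i = 0; i < n; i++) {
				if (b[i].is_iov != pass)
					continue;
				if (used + b[i].size <= space) {
					used += b[i].size;
					b[i].assigned = 1;
				}
				/* an IOV BAR that doesn't fit simply stays unassigned */
			}
	}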

>> 		}
>>
>> 		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>>@@ -2796,11 +2794,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>> 		res = &pdev->resource[i + PCI_IOV_RESOURCES];
>> 		if (!res->flags || res->parent)
>> 			continue;
>>-		if (!pnv_pci_is_mem_pref_64(res->flags)) {
>>-			dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: %pR\n",
>>-				 i, res);
>>-			continue;
>>-		}
>>
>> 		dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
>> 		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>>-- 
>>2.5.0
>>

-- 
Richard Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V5 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR
  2015-10-12  6:55       ` Benjamin Herrenschmidt
@ 2015-10-13  2:30         ` Wei Yang
  0 siblings, 0 replies; 30+ messages in thread
From: Wei Yang @ 2015-10-13  2:30 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Wei Yang, gwshan, aik, linuxppc-dev, mpe

On Mon, Oct 12, 2015 at 12:25:24PM +0530, Benjamin Herrenschmidt wrote:
>On Mon, 2015-10-12 at 10:58 +0800, Wei Yang wrote:
>> On Fri, Oct 09, 2015 at 07:15:19PM +1100, Benjamin Herrenschmidt wrote:
>> > On Fri, 2015-10-09 at 10:46 +0800, Wei Yang wrote:
>> > > On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If
>> > > a SRIOV device's IOV BAR is not 64bit-prefetchable, this is not assigned
>> > > from 64bit prefetchable window, which means M64 BAR can't work on it.
>> > 
>> > Won't this cause a lot of devices to become unsupported for us ? Or do
>> > all devices we care about have their BARs marked prefetchable ?
>> > 
>> 
>> You are right. After doing so, some of the devices will not be supported.
>> 
>> Hmm, I thought you know this, since this strategy, use M64 BAR to map IOV
>> BAR, is proposed by you. This patch doesn't change the function, while make it
>> more explicit.
>
>Right but we can make things better by also effectively allowing non
>-prefetch BARs to sit in the prefetchable window of a bridge.
>
>> > > The reason is PCI bridges support only 2 windows and the kernel code
>> > > programs bridges in the way that one window is 32bit-nonprefetchable and
>> > > the other one is 64bit-prefetchable. So if devices' IOV BAR is 64bit and
>> > > non-prefetchable, it will be mapped into 32bit space and therefore M64
>> > > cannot be used for it.
>> > > 
>> > > This patch makes this explicit.
>> > 
>> > So PCIe allows for non-prefetchable BARs to be put under prefetchable
>> > bridge windows as long as the mapping done by the CPU doesn't prefetch,
>> > I believe. Well it's a natural conclusion of the weird note "Additional
>> > Guidance on the Prefetchable Bit in Memory Space BARs" page 596 of PCIe
>> > spec v3.0... it also says that devices should be pretty much free to
>> > set their prefetchable bit even if they have side effects so.
>> > 
>> > So maybe we should have that option, rather than just not using the
>> > devices, allow them to be allocate via the prefetchable window...
>> > 
>> 
>> Based on current linux kernel pci core resource size/assignment algorithm,
>> when we have 64bit-prefetchable root window, only 64bit-prefetchable BAR could
>> be assigned in this region. Refer to commit 5b28541552ef "PCI: Restrict 64-bit
>> prefetchable bridge windows to 64-bit resources" for more detail.
>
>The above patch was about avoiding 32-bit prefetchable, it's not about
>64-bit non-prefetchable.
>
>> So we have two facts here:
>> 1. We want to use M64 BAR to map VFs, so that VF could be put into individual
>> PE. And the M64 BAR just map 64bit window.
>> 2. Current MMIO size/assignment algorithm in linux pci core only put
>> 64bit-prefetchable BAR into 64bit prefetchable bridge window, no other BARs.
>> 
>> Your suggestion here is to put a non-prefetchable BAR in prefetchable bridge
>> window? This would make the MMIO size/assignment algorithm more confusing.
>
>Right, in the long run, both us and x86 might need to be able to do
>that, though exclusively for PCIe. PCI/PCI-X bridges can prefetch.
>

Thanks for the proposal; I believe this would benefit the Linux kernel in the
long run.
However, I have some concerns about doing this in this patch set:
1. I am still not very clear on the cases in which this is allowed, so I am
   not confident I can make the correct change now.
   The device type? PCIe vs PCI/PCI-X.
   The CPU address prefetchable property?
   The arch type? x86, ppc or another arch?
   Last but not least, the rule for choosing the correct bridge window for a
   BAR looks confusing to me after this change. I need some time to understand
   it.
2. Since this is a more generic code change, we need to include the linux-pci
   mailing list.
3. Divide the problem and solve the pieces one by one.
   The main purpose of this patch set is to make each VF independent.
   We may face different problems during the evolution of the Linux kernel, and
   it sounds difficult to address them all at once. How about letting this
   patch set address VF independence first? Then we can improve the PCI core
   MMIO assignment in another thread.
   
>Cheers,
>Ben.
>
>> > > Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>> > > Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> > > Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> > > ---
>> > >  arch/powerpc/platforms/powernv/pci-ioda.c | 25 +++++++++----------------
>> > >  1 file changed, 9 insertions(+), 16 deletions(-)
>> > > 
>> > > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>> > > index 85cbc96..8c031b5 100644
>> > > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> > > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> > > @@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>> > >  		if (!res->flags || !res->parent)
>> > >  			continue;
>> > > 
>> > > -		if (!pnv_pci_is_mem_pref_64(res->flags))
>> > > -			continue;
>> > > -
>> > >  		/*
>> > >  		 * The actual IOV BAR range is determined by the start address
>> > >  		 * and the actual size for num_vfs VFs BAR.  This check is to
>> > > @@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>> > >  		if (!res->flags || !res->parent)
>> > >  			continue;
>> > > 
>> > > -		if (!pnv_pci_is_mem_pref_64(res->flags))
>> > > -			continue;
>> > > -
>> > >  		size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
>> > >  		res2 = *res;
>> > >  		res->start += size * offset;
>> > > @@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>> > >  		if (!res->flags || !res->parent)
>> > >  			continue;
>> > > 
>> > > -		if (!pnv_pci_is_mem_pref_64(res->flags))
>> > > -			continue;
>> > > -
>> > >  		for (j = 0; j < vf_groups; j++) {
>> > >  			do {
>> > >  				win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
>> > > @@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>> > >  	pdn = pci_get_pdn(pdev);
>> > > 
>> > >  	if (phb->type == PNV_PHB_IODA2) {
>> > > +		if (!pdn->vfs_expanded) {
>> > > +			dev_info(&pdev->dev, "don't support this SRIOV device"
>> > > +				" with non 64bit-prefetchable IOV BAR\n");
>> > > +			return -ENOSPC;
>> > > +		}
>> > > +
>> > >  		/* Calculate available PE for required VFs */
>> > >  		mutex_lock(&phb->ioda.pe_alloc_mutex);
>> > >  		pdn->offset = bitmap_find_next_zero_area(
>> > > @@ -2775,9 +2772,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>> > >  		if (!res->flags || res->parent)
>> > >  			continue;
>> > >  		if (!pnv_pci_is_mem_pref_64(res->flags)) {
>> > > -			dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
>> > > +			dev_warn(&pdev->dev, "Don't support SR-IOV with"
>> > > +					" non M64 VF BAR%d: %pR. \n",
>> > >  				 i, res);
>> > > -			continue;
>> > > +			return;
>> > >  		}
>> > > 
>> > >  		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>> > > @@ -2796,11 +2794,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>> > >  		res = &pdev->resource[i + PCI_IOV_RESOURCES];
>> > >  		if (!res->flags || res->parent)
>> > >  			continue;
>> > > -		if (!pnv_pci_is_mem_pref_64(res->flags)) {
>> > > -			dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: %pR\n",
>> > > -				 i, res);
>> > > -			continue;
>> > > -		}
>> > > 
>> > >  		dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
>> > >  		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>> 

-- 
Richard Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V5 2/6] powerpc/powernv: simplify the calculation of iov resource alignment
  2015-10-13  0:13   ` Gavin Shan
@ 2015-10-13  2:45     ` Wei Yang
  2015-10-13  3:27       ` Gavin Shan
  0 siblings, 1 reply; 30+ messages in thread
From: Wei Yang @ 2015-10-13  2:45 UTC (permalink / raw)
  To: Gavin Shan; +Cc: Wei Yang, aik, benh, linuxppc-dev, mpe

On Tue, Oct 13, 2015 at 11:13:50AM +1100, Gavin Shan wrote:
>On Fri, Oct 09, 2015 at 10:46:52AM +0800, Wei Yang wrote:
>>The alignment of IOV BAR on PowerNV platform is the total size of the IOV
>>BAR. No matter whether the IOV BAR is extended with number of
>>roundup_pow_of_two(total_vfs) or number of max PE number (256), the total
>>size could be calculated by (vfs_expanded * VF_BAR_size).
>>
>>This patch simplifies the pnv_pci_iov_resource_alignment() by removing the
>>first case.
>>
>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>---
>> arch/powerpc/platforms/powernv/pci-ioda.c | 20 ++++++++++++--------
>> 1 file changed, 12 insertions(+), 8 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 8c031b5..7da476b 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -2988,17 +2988,21 @@ static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>> 						      int resno)
>> {
>> 	struct pci_dn *pdn = pci_get_pdn(pdev);
>>-	resource_size_t align, iov_align;
>>-
>>-	iov_align = resource_size(&pdev->resource[resno]);
>>-	if (iov_align)
>>-		return iov_align;
>>+	resource_size_t align;
>>
>>+	/*
>>+	 * On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the
>>+	 * SR-IOV. While from hardware perspective, the range mapped by M64
>>+	 * BAR should be size aligned.
>>+	 *
>>+	 * This function returns the total IOV BAR size if M64 BAR is in
>>+	 * Shared PE mode or just the individual size if not.
>>+	 */
>
>s/the individual size/VF BAR size/
>
>> 	align = pci_iov_resource_size(pdev, resno);
>>-	if (pdn->vfs_expanded)
>>-		return pdn->vfs_expanded * align;
>>+	if (!pdn->vfs_expanded)
>>+		return align;
>>
>>-	return align;
>>+	return pdn->vfs_expanded * align;
>
>There is no difference before/after the change. Why is this change needed?
>

After the change the logic is clearer.

The alignment equals the total size when the IOV BAR is expanded, or just the
VF BAR size otherwise. We don't need to check whether the IOV BAR is truncated.
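
For example, with a 4MB VF BAR: if the IOV BAR was expanded (say vfs_expanded
is 256, the max PE number), the alignment returned is 256 * 4MB = 1GB, i.e.
the whole expanded IOV BAR; if it was not expanded (vfs_expanded is 0), the
alignment is just the 4MB VF BAR size.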

>> }
>> #endif /* CONFIG_PCI_IOV */
>>
>>-- 
>>2.5.0
>>

-- 
Richard Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V5 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
  2015-10-12 23:55   ` Gavin Shan
@ 2015-10-13  2:50     ` Wei Yang
  2015-10-13  3:32       ` Gavin Shan
  0 siblings, 1 reply; 30+ messages in thread
From: Wei Yang @ 2015-10-13  2:50 UTC (permalink / raw)
  To: Gavin Shan; +Cc: Wei Yang, aik, benh, linuxppc-dev, mpe

On Tue, Oct 13, 2015 at 10:55:27AM +1100, Gavin Shan wrote:
>On Fri, Oct 09, 2015 at 10:46:53AM +0800, Wei Yang wrote:
>>In current implementation, when VF BAR is bigger than 64MB, it uses 4 M64
>>BARs in Single PE mode to cover the number of VFs required to be enabled.
>>By doing so, several VFs would be in one VF Group and leads to interference
>>between VFs in the same group.
>>
>>And in this patch, m64_wins is renamed to m64_map, which means index number
>>of the M64 BAR used to map the VF BAR. Based on Gavin's comments.
>>
>>This patch changes the design by using one M64 BAR in Single PE mode for
>>one VF BAR. This gives absolute isolation for VFs.
>>
>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>---
>> arch/powerpc/include/asm/pci-bridge.h     |   5 +-
>> arch/powerpc/platforms/powernv/pci-ioda.c | 169 ++++++++++++------------------
>> 2 files changed, 68 insertions(+), 106 deletions(-)
>>
>>diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
>>index 712add5..8aeba4c 100644
>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>@@ -214,10 +214,9 @@ struct pci_dn {
>> 	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
>> 	u16     num_vfs;		/* number of VFs enabled*/
>> 	int     offset;			/* PE# for the first VF PE */
>>-#define M64_PER_IOV 4
>>-	int     m64_per_iov;
>>+	bool    m64_single_mode;	/* Use M64 BAR in Single Mode */
>> #define IODA_INVALID_M64        (-1)
>>-	int     m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
>>+	int     (*m64_map)[PCI_SRIOV_NUM_BARS];
>> #endif /* CONFIG_PCI_IOV */
>> #endif
>> 	struct list_head child_list;
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 7da476b..2886f90 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -1148,29 +1148,36 @@ static void pnv_pci_ioda_setup_PEs(void)
>> }
>>
>> #ifdef CONFIG_PCI_IOV
>>-static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
>>+static int pnv_pci_vf_release_m64(struct pci_dev *pdev, u16 num_vfs)
>> {
>> 	struct pci_bus        *bus;
>> 	struct pci_controller *hose;
>> 	struct pnv_phb        *phb;
>> 	struct pci_dn         *pdn;
>> 	int                    i, j;
>>+	int                    m64_bars;
>>
>> 	bus = pdev->bus;
>> 	hose = pci_bus_to_host(bus);
>> 	phb = hose->private_data;
>> 	pdn = pci_get_pdn(pdev);
>>
>>+	if (pdn->m64_single_mode)
>>+		m64_bars = num_vfs;
>>+	else
>>+		m64_bars = 1;
>>+
>> 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
>>-		for (j = 0; j < M64_PER_IOV; j++) {
>>-			if (pdn->m64_wins[i][j] == IODA_INVALID_M64)
>>+		for (j = 0; j < m64_bars; j++) {
>>+			if (pdn->m64_map[j][i] == IODA_INVALID_M64)
>> 				continue;
>> 			opal_pci_phb_mmio_enable(phb->opal_id,
>>-				OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0);
>>-			clear_bit(pdn->m64_wins[i][j], &phb->ioda.m64_bar_alloc);
>>-			pdn->m64_wins[i][j] = IODA_INVALID_M64;
>>+				OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 0);
>>+			clear_bit(pdn->m64_map[j][i], &phb->ioda.m64_bar_alloc);
>>+			pdn->m64_map[j][i] = IODA_INVALID_M64;
>> 		}
>>
>>+	kfree(pdn->m64_map);
>> 	return 0;
>> }
>>
>>@@ -1187,8 +1194,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>> 	int                    total_vfs;
>> 	resource_size_t        size, start;
>> 	int                    pe_num;
>>-	int                    vf_groups;
>>-	int                    vf_per_group;
>>+	int                    m64_bars;
>>
>> 	bus = pdev->bus;
>> 	hose = pci_bus_to_host(bus);
>>@@ -1196,26 +1202,26 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>> 	pdn = pci_get_pdn(pdev);
>> 	total_vfs = pci_sriov_get_totalvfs(pdev);
>>
>>-	/* Initialize the m64_wins to IODA_INVALID_M64 */
>>-	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
>>-		for (j = 0; j < M64_PER_IOV; j++)
>>-			pdn->m64_wins[i][j] = IODA_INVALID_M64;
>>+	if (pdn->m64_single_mode)
>>+		m64_bars = num_vfs;
>>+	else
>>+		m64_bars = 1;
>>+
>>+	pdn->m64_map = kmalloc(sizeof(*pdn->m64_map) * m64_bars, GFP_KERNEL);
>>+	if (!pdn->m64_map)
>>+		return -ENOMEM;
>>+	/* Initialize the m64_map to IODA_INVALID_M64 */
>>+	for (i = 0; i < m64_bars ; i++)
>>+		for (j = 0; j < PCI_SRIOV_NUM_BARS; j++)
>>+			pdn->m64_map[i][j] = IODA_INVALID_M64;
>>
>>-	if (pdn->m64_per_iov == M64_PER_IOV) {
>>-		vf_groups = (num_vfs <= M64_PER_IOV) ? num_vfs: M64_PER_IOV;
>>-		vf_per_group = (num_vfs <= M64_PER_IOV)? 1:
>>-			roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
>>-	} else {
>>-		vf_groups = 1;
>>-		vf_per_group = 1;
>>-	}
>>
>> 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>> 		res = &pdev->resource[i + PCI_IOV_RESOURCES];
>> 		if (!res->flags || !res->parent)
>> 			continue;
>>
>>-		for (j = 0; j < vf_groups; j++) {
>>+		for (j = 0; j < m64_bars; j++) {
>> 			do {
>> 				win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
>> 						phb->ioda.m64_bar_idx + 1, 0);
>>@@ -1224,12 +1230,11 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>> 					goto m64_failed;
>> 			} while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc));
>>
>>-			pdn->m64_wins[i][j] = win;
>>+			pdn->m64_map[j][i] = win;
>>
>>-			if (pdn->m64_per_iov == M64_PER_IOV) {
>>+			if (pdn->m64_single_mode) {
>> 				size = pci_iov_resource_size(pdev,
>> 							PCI_IOV_RESOURCES + i);
>>-				size = size * vf_per_group;
>> 				start = res->start + size * j;
>> 			} else {
>> 				size = resource_size(res);
>>@@ -1237,16 +1242,16 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>> 			}
>>
>> 			/* Map the M64 here */
>>-			if (pdn->m64_per_iov == M64_PER_IOV) {
>>+			if (pdn->m64_single_mode) {
>> 				pe_num = pdn->offset + j;
>> 				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>> 						pe_num, OPAL_M64_WINDOW_TYPE,
>>-						pdn->m64_wins[i][j], 0);
>>+						pdn->m64_map[j][i], 0);
>> 			}
>>
>> 			rc = opal_pci_set_phb_mem_window(phb->opal_id,
>> 						 OPAL_M64_WINDOW_TYPE,
>>-						 pdn->m64_wins[i][j],
>>+						 pdn->m64_map[j][i],
>> 						 start,
>> 						 0, /* unused */
>> 						 size);
>>@@ -1258,12 +1263,12 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>> 				goto m64_failed;
>> 			}
>>
>>-			if (pdn->m64_per_iov == M64_PER_IOV)
>>+			if (pdn->m64_single_mode)
>> 				rc = opal_pci_phb_mmio_enable(phb->opal_id,
>>-				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 2);
>>+				     OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 2);
>> 			else
>> 				rc = opal_pci_phb_mmio_enable(phb->opal_id,
>>-				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 1);
>>+				     OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 1);
>>
>> 			if (rc != OPAL_SUCCESS) {
>> 				dev_err(&pdev->dev, "Failed to enable M64 window #%d: %llx\n",
>>@@ -1275,7 +1280,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>> 	return 0;
>>
>> m64_failed:
>>-	pnv_pci_vf_release_m64(pdev);
>>+	pnv_pci_vf_release_m64(pdev, num_vfs);
>> 	return -EBUSY;
>> }
>>
>>@@ -1302,15 +1307,13 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
>> 	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
>> }
>>
>>-static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>+static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
>> {
>> 	struct pci_bus        *bus;
>> 	struct pci_controller *hose;
>> 	struct pnv_phb        *phb;
>> 	struct pnv_ioda_pe    *pe, *pe_n;
>> 	struct pci_dn         *pdn;
>>-	u16                    vf_index;
>>-	int64_t                rc;
>>
>> 	bus = pdev->bus;
>> 	hose = pci_bus_to_host(bus);
>>@@ -1320,35 +1323,6 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>> 	if (!pdev->is_physfn)
>> 		return;
>>
>>-	if (pdn->m64_per_iov == M64_PER_IOV && num_vfs > M64_PER_IOV) {
>>-		int   vf_group;
>>-		int   vf_per_group;
>>-		int   vf_index1;
>>-
>>-		vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
>>-
>>-		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++)
>>-			for (vf_index = vf_group * vf_per_group;
>>-				vf_index < (vf_group + 1) * vf_per_group &&
>>-				vf_index < num_vfs;
>>-				vf_index++)
>>-				for (vf_index1 = vf_group * vf_per_group;
>>-					vf_index1 < (vf_group + 1) * vf_per_group &&
>>-					vf_index1 < num_vfs;
>>-					vf_index1++){
>>-
>>-					rc = opal_pci_set_peltv(phb->opal_id,
>>-						pdn->offset + vf_index,
>>-						pdn->offset + vf_index1,
>>-						OPAL_REMOVE_PE_FROM_DOMAIN);
>>-
>>-					if (rc)
>>-					    dev_warn(&pdev->dev, "%s: Failed to unlink same group PE#%d(%lld)\n",
>>-						__func__,
>>-						pdn->offset + vf_index1, rc);
>>-				}
>>-	}
>>-
>> 	list_for_each_entry_safe(pe, pe_n, &phb->ioda.pe_list, list) {
>> 		if (pe->parent_dev != pdev)
>> 			continue;
>>@@ -1383,14 +1357,14 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>> 	num_vfs = pdn->num_vfs;
>>
>> 	/* Release VF PEs */
>>-	pnv_ioda_release_vf_PE(pdev, num_vfs);
>>+	pnv_ioda_release_vf_PE(pdev);
>>
>> 	if (phb->type == PNV_PHB_IODA2) {
>>-		if (pdn->m64_per_iov == 1)
>>+		if (!pdn->m64_single_mode)
>> 			pnv_pci_vf_resource_shift(pdev, -pdn->offset);
>>
>> 		/* Release M64 windows */
>>-		pnv_pci_vf_release_m64(pdev);
>>+		pnv_pci_vf_release_m64(pdev, num_vfs);
>>
>> 		/* Release PE numbers */
>> 		bitmap_clear(phb->ioda.pe_alloc, pdn->offset, num_vfs);
>>@@ -1409,7 +1383,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>> 	int                    pe_num;
>> 	u16                    vf_index;
>> 	struct pci_dn         *pdn;
>>-	int64_t                rc;
>>
>> 	bus = pdev->bus;
>> 	hose = pci_bus_to_host(bus);
>>@@ -1454,37 +1427,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>
>> 		pnv_pci_ioda2_setup_dma_pe(phb, pe);
>> 	}
>>-
>>-	if (pdn->m64_per_iov == M64_PER_IOV && num_vfs > M64_PER_IOV) {
>>-		int   vf_group;
>>-		int   vf_per_group;
>>-		int   vf_index1;
>>-
>>-		vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
>>-
>>-		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++) {
>>-			for (vf_index = vf_group * vf_per_group;
>>-			     vf_index < (vf_group + 1) * vf_per_group &&
>>-			     vf_index < num_vfs;
>>-			     vf_index++) {
>>-				for (vf_index1 = vf_group * vf_per_group;
>>-				     vf_index1 < (vf_group + 1) * vf_per_group &&
>>-				     vf_index1 < num_vfs;
>>-				     vf_index1++) {
>>-
>>-					rc = opal_pci_set_peltv(phb->opal_id,
>>-						pdn->offset + vf_index,
>>-						pdn->offset + vf_index1,
>>-						OPAL_ADD_PE_TO_DOMAIN);
>>-
>>-					if (rc)
>>-					    dev_warn(&pdev->dev, "%s: Failed to link same group PE#%d(%lld)\n",
>>-						__func__,
>>-						pdn->offset + vf_index1, rc);
>>-				}
>>-			}
>>-		}
>>-	}
>> }
>>
>> int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>@@ -1507,6 +1449,15 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>> 			return -ENOSPC;
>> 		}
>>
>>+		/*
>>+		 * When M64 BARs functions in Single PE mode, the number of VFs
>>+		 * could be enabled must be less than the number of M64 BARs.
>>+		 */
>>+		if (pdn->m64_single_mode && num_vfs > phb->ioda.m64_bar_idx) {
>>+			dev_info(&pdev->dev, "Not enough M64 BAR for VFs\n");
>>+			return -EBUSY;
>>+		}
>>+
>> 		/* Calculate available PE for required VFs */
>> 		mutex_lock(&phb->ioda.pe_alloc_mutex);
>> 		pdn->offset = bitmap_find_next_zero_area(
>>@@ -1534,7 +1485,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>> 		 * the IOV BAR according to the PE# allocated to the VFs.
>> 		 * Otherwise, the PE# for the VF will conflict with others.
>> 		 */
>>-		if (pdn->m64_per_iov == 1) {
>>+		if (!pdn->m64_single_mode) {
>> 			ret = pnv_pci_vf_resource_shift(pdev, pdn->offset);
>> 			if (ret)
>> 				goto m64_failed;
>>@@ -1567,8 +1518,7 @@ int pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>> 	/* Allocate PCI data */
>> 	add_dev_pci_data(pdev);
>>
>>-	pnv_pci_sriov_enable(pdev, num_vfs);
>>-	return 0;
>>+	return pnv_pci_sriov_enable(pdev, num_vfs);
>> }
>> #endif /* CONFIG_PCI_IOV */
>>
>>@@ -2762,9 +2712,9 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>
>> 	pdn = pci_get_pdn(pdev);
>> 	pdn->vfs_expanded = 0;
>>+	pdn->m64_single_mode = false;
>>
>> 	total_vfs = pci_sriov_get_totalvfs(pdev);
>>-	pdn->m64_per_iov = 1;
>> 	mul = phb->ioda.total_pe;
>>
>> 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>>@@ -2784,8 +2734,8 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>> 		if (size > (1 << 26)) {
>> 			dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size is bigger than 64M, roundup power2\n",
>> 				 i, res);
>>-			pdn->m64_per_iov = M64_PER_IOV;
>> 			mul = roundup_pow_of_two(total_vfs);
>>+			pdn->m64_single_mode = true;
>> 			break;
>> 		}
>> 	}
>>@@ -2987,6 +2937,8 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
>> static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>> 						      int resno)
>> {
>>+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
>>+	struct pnv_phb *phb = hose->private_data;
>> 	struct pci_dn *pdn = pci_get_pdn(pdev);
>> 	resource_size_t align;
>>
>>@@ -2995,12 +2947,23 @@ static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>> 	 * SR-IOV. While from hardware perspective, the range mapped by M64
>> 	 * BAR should be size aligned.
>> 	 *
>>+	 * When IOV BAR is mapped with M64 BAR in Single PE mode, the extra
>>+	 * powernv-specific hardware restriction is gone. But if just use the
>>+	 * VF BAR size as the alignment, PF BAR / VF BAR may be allocated with
>>+	 * in one segment of M64 #15, which introduces the PE conflict between
>>+	 * PF and VF. Based on this, the minimum alignment of an IOV BAR is
>>+	 * m64_segsize.
>>+	 *
>
>When M64 BARs run in single mode, it still requires 32MB aligned. Does
>"the extra powernv-specific hardware restriction" includes the limitation
>or not?
>

Is the 32MB alignment you mentioned the size of the VF BAR?

The answer is no. "The extra powernv-specific hardware restriction" refers
only to the alignment of the total M64 BAR.

>> 	 * This function returns the total IOV BAR size if M64 BAR is in
>> 	 * Shared PE mode or just the individual size if not.
>>+	 * If the M64 BAR is in Single PE mode, return the VF BAR size or
>>+	 * M64 segment size if IOV BAR size is less.
>> 	 */
>> 	align = pci_iov_resource_size(pdev, resno);
>> 	if (!pdn->vfs_expanded)
>> 		return align;
>>+	if (pdn->m64_single_mode)
>>+		return max(align, (resource_size_t)phb->ioda.m64_segsize);
>>
>> 	return pdn->vfs_expanded * align;
>> }
>>-- 
>>2.5.0
>>

-- 
Richard Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V5 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR
  2015-10-13  1:49     ` Wei Yang
@ 2015-10-13  3:20       ` Gavin Shan
  2015-10-13  3:50         ` Wei Yang
  0 siblings, 1 reply; 30+ messages in thread
From: Gavin Shan @ 2015-10-13  3:20 UTC (permalink / raw)
  To: Wei Yang; +Cc: Gavin Shan, aik, benh, linuxppc-dev, mpe

On Tue, Oct 13, 2015 at 09:49:30AM +0800, Wei Yang wrote:
>On Tue, Oct 13, 2015 at 11:01:24AM +1100, Gavin Shan wrote:
>>On Fri, Oct 09, 2015 at 10:46:51AM +0800, Wei Yang wrote:
>>>On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If
>>>a SRIOV device's IOV BAR is not 64bit-prefetchable, this is not assigned
>>>from 64bit prefetchable window, which means M64 BAR can't work on it.
>>>
>>>The reason is PCI bridges support only 2 windows and the kernel code
>>>programs bridges in the way that one window is 32bit-nonprefetchable and
>>>the other one is 64bit-prefetchable. So if devices' IOV BAR is 64bit and
>>>non-prefetchable, it will be mapped into 32bit space and therefore M64
>>>cannot be used for it.
>>>
>>>This patch makes this explicit.
>>>
>>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>>Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>---
>>> arch/powerpc/platforms/powernv/pci-ioda.c | 25 +++++++++----------------
>>> 1 file changed, 9 insertions(+), 16 deletions(-)
>>>
>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>index 85cbc96..8c031b5 100644
>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>@@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>>> 		if (!res->flags || !res->parent)
>>> 			continue;
>>>
>>>-		if (!pnv_pci_is_mem_pref_64(res->flags))
>>>-			continue;
>>>-
>>> 		/*
>>> 		 * The actual IOV BAR range is determined by the start address
>>> 		 * and the actual size for num_vfs VFs BAR.  This check is to
>>>@@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>>> 		if (!res->flags || !res->parent)
>>> 			continue;
>>>
>>>-		if (!pnv_pci_is_mem_pref_64(res->flags))
>>>-			continue;
>>>-
>>> 		size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
>>> 		res2 = *res;
>>> 		res->start += size * offset;
>>>@@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>> 		if (!res->flags || !res->parent)
>>> 			continue;
>>>
>>>-		if (!pnv_pci_is_mem_pref_64(res->flags))
>>>-			continue;
>>>-
>>> 		for (j = 0; j < vf_groups; j++) {
>>> 			do {
>>> 				win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
>>>@@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>> 	pdn = pci_get_pdn(pdev);
>>>
>>> 	if (phb->type == PNV_PHB_IODA2) {
>>>+		if (!pdn->vfs_expanded) {
>>>+			dev_info(&pdev->dev, "don't support this SRIOV device"
>>>+				" with non 64bit-prefetchable IOV BAR\n");
>>>+			return -ENOSPC;
>>>+		}
>>>+
>>> 		/* Calculate available PE for required VFs */
>>> 		mutex_lock(&phb->ioda.pe_alloc_mutex);
>>> 		pdn->offset = bitmap_find_next_zero_area(
>>>@@ -2775,9 +2772,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>> 		if (!res->flags || res->parent)
>>> 			continue;
>>> 		if (!pnv_pci_is_mem_pref_64(res->flags)) {
>>>-			dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
>>>+			dev_warn(&pdev->dev, "Don't support SR-IOV with"
>>>+					" non M64 VF BAR%d: %pR. \n",
>>> 				 i, res);
>>>-			continue;
>>>+			return;
>>
>>When the IOV BAR isn't 64-bits prefetchable one, it's going to be allocated from
>>M32 aperatus. However, the IOV BAR won't be used as the SRIOV capability on the PF
>>can't be enabled. Occasionally, the IOV BAR is huge (e.g. 4GB) and it eats up all
>>M32 space, BARs other than 64-bits prefetchable BARs on other device will fail in
>>this case. Would it a problem you ever thought of?
>>
>
>IOV BARs are in optional list during assignment.
>

The point isn't that the optional list allows the assignment to fail.
The point is that the system allocates resources without ever using them. Isn't
that wasting resources?

>>> 		}
>>>
>>> 		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>>>@@ -2796,11 +2794,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>> 		res = &pdev->resource[i + PCI_IOV_RESOURCES];
>>> 		if (!res->flags || res->parent)
>>> 			continue;
>>>-		if (!pnv_pci_is_mem_pref_64(res->flags)) {
>>>-			dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: %pR\n",
>>>-				 i, res);
>>>-			continue;
>>>-		}
>>>
>>> 		dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
>>> 		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>>>-- 
>>>2.5.0
>>>
>
>-- 
>Richard Yang
>Help you, Help me

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V5 2/6] powerpc/powernv: simplify the calculation of iov resource alignment
  2015-10-13  2:45     ` Wei Yang
@ 2015-10-13  3:27       ` Gavin Shan
  2015-10-13  3:56         ` Wei Yang
  0 siblings, 1 reply; 30+ messages in thread
From: Gavin Shan @ 2015-10-13  3:27 UTC (permalink / raw)
  To: Wei Yang; +Cc: Gavin Shan, aik, benh, linuxppc-dev, mpe

On Tue, Oct 13, 2015 at 10:45:45AM +0800, Wei Yang wrote:
>On Tue, Oct 13, 2015 at 11:13:50AM +1100, Gavin Shan wrote:
>>On Fri, Oct 09, 2015 at 10:46:52AM +0800, Wei Yang wrote:
>>>The alignment of IOV BAR on PowerNV platform is the total size of the IOV
>>>BAR. No matter whether the IOV BAR is extended with number of
>>>roundup_pow_of_two(total_vfs) or number of max PE number (256), the total
>>>size could be calculated by (vfs_expanded * VF_BAR_size).
>>>
>>>This patch simplifies the pnv_pci_iov_resource_alignment() by removing the
>>>first case.
>>>
>>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>>Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>---
>>> arch/powerpc/platforms/powernv/pci-ioda.c | 20 ++++++++++++--------
>>> 1 file changed, 12 insertions(+), 8 deletions(-)
>>>
>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>index 8c031b5..7da476b 100644
>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>@@ -2988,17 +2988,21 @@ static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>>> 						      int resno)
>>> {
>>> 	struct pci_dn *pdn = pci_get_pdn(pdev);
>>>-	resource_size_t align, iov_align;
>>>-
>>>-	iov_align = resource_size(&pdev->resource[resno]);
>>>-	if (iov_align)
>>>-		return iov_align;
>>>+	resource_size_t align;
>>>
>>>+	/*
>>>+	 * On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the
>>>+	 * SR-IOV. While from hardware perspective, the range mapped by M64
>>>+	 * BAR should be size aligned.
>>>+	 *
>>>+	 * This function returns the total IOV BAR size if M64 BAR is in
>>>+	 * Shared PE mode or just the individual size if not.
>>>+	 */
>>
>>s/the invidial size/VF BAR size
>>
>>> 	align = pci_iov_resource_size(pdev, resno);
>>>-	if (pdn->vfs_expanded)
>>>-		return pdn->vfs_expanded * align;
>>>+	if (!pdn->vfs_expanded)
>>>+		return align;
>>>
>>>-	return align;
>>>+	return pdn->vfs_expanded * align;
>>
>>There is no difference before/after the changes. why this change is needed?
>>
>
>After change the logic is more clear.
>
>Alignment equals to the total size when IOV BAR is expanded or equals to the
>VF BAR size. We don't need to check whether IOV BAR is truncated.
>

I didn't get what you're talking about with "IOV BAR is truncated". I also
didn't see what has really changed in the last 3 lines:

	if (pdn->vfs_expanded)				if (!pdn->vfs_expanded)
		return pdn->vfs_expanded * align;		return align;
	return align;					return pdn->vfs_expanded * align;

>>> }
>>> #endif /* CONFIG_PCI_IOV */
>>>
>>>-- 
>>>2.5.0
>>>
>
>-- 
>Richard Yang
>Help you, Help me

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V5 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
  2015-10-13  2:50     ` Wei Yang
@ 2015-10-13  3:32       ` Gavin Shan
  2015-10-13  5:29         ` Wei Yang
  0 siblings, 1 reply; 30+ messages in thread
From: Gavin Shan @ 2015-10-13  3:32 UTC (permalink / raw)
  To: Wei Yang; +Cc: Gavin Shan, aik, benh, linuxppc-dev, mpe

On Tue, Oct 13, 2015 at 10:50:42AM +0800, Wei Yang wrote:
>On Tue, Oct 13, 2015 at 10:55:27AM +1100, Gavin Shan wrote:
>>On Fri, Oct 09, 2015 at 10:46:53AM +0800, Wei Yang wrote:
>>>In current implementation, when VF BAR is bigger than 64MB, it uses 4 M64
>>>BARs in Single PE mode to cover the number of VFs required to be enabled.
>>>By doing so, several VFs would be in one VF Group and leads to interference
>>>between VFs in the same group.
>>>
>>>And in this patch, m64_wins is renamed to m64_map, which means index number
>>>of the M64 BAR used to map the VF BAR. Based on Gavin's comments.
>>>
>>>This patch changes the design by using one M64 BAR in Single PE mode for
>>>one VF BAR. This gives absolute isolation for VFs.
>>>
>>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>>Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>---
>>> arch/powerpc/include/asm/pci-bridge.h     |   5 +-
>>> arch/powerpc/platforms/powernv/pci-ioda.c | 169 ++++++++++++------------------
>>> 2 files changed, 68 insertions(+), 106 deletions(-)
>>>
>>>diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
>>>index 712add5..8aeba4c 100644
>>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>>@@ -214,10 +214,9 @@ struct pci_dn {
>>> 	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
>>> 	u16     num_vfs;		/* number of VFs enabled*/
>>> 	int     offset;			/* PE# for the first VF PE */
>>>-#define M64_PER_IOV 4
>>>-	int     m64_per_iov;
>>>+	bool    m64_single_mode;	/* Use M64 BAR in Single Mode */
>>> #define IODA_INVALID_M64        (-1)
>>>-	int     m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
>>>+	int     (*m64_map)[PCI_SRIOV_NUM_BARS];
>>> #endif /* CONFIG_PCI_IOV */
>>> #endif
>>> 	struct list_head child_list;
>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>index 7da476b..2886f90 100644
>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>@@ -1148,29 +1148,36 @@ static void pnv_pci_ioda_setup_PEs(void)
>>> }
>>>
>>> #ifdef CONFIG_PCI_IOV
>>>-static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
>>>+static int pnv_pci_vf_release_m64(struct pci_dev *pdev, u16 num_vfs)
>>> {
>>> 	struct pci_bus        *bus;
>>> 	struct pci_controller *hose;
>>> 	struct pnv_phb        *phb;
>>> 	struct pci_dn         *pdn;
>>> 	int                    i, j;
>>>+	int                    m64_bars;
>>>
>>> 	bus = pdev->bus;
>>> 	hose = pci_bus_to_host(bus);
>>> 	phb = hose->private_data;
>>> 	pdn = pci_get_pdn(pdev);
>>>
>>>+	if (pdn->m64_single_mode)
>>>+		m64_bars = num_vfs;
>>>+	else
>>>+		m64_bars = 1;
>>>+
>>> 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
>>>-		for (j = 0; j < M64_PER_IOV; j++) {
>>>-			if (pdn->m64_wins[i][j] == IODA_INVALID_M64)
>>>+		for (j = 0; j < m64_bars; j++) {
>>>+			if (pdn->m64_map[j][i] == IODA_INVALID_M64)
>>> 				continue;
>>> 			opal_pci_phb_mmio_enable(phb->opal_id,
>>>-				OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0);
>>>-			clear_bit(pdn->m64_wins[i][j], &phb->ioda.m64_bar_alloc);
>>>-			pdn->m64_wins[i][j] = IODA_INVALID_M64;
>>>+				OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 0);
>>>+			clear_bit(pdn->m64_map[j][i], &phb->ioda.m64_bar_alloc);
>>>+			pdn->m64_map[j][i] = IODA_INVALID_M64;
>>> 		}
>>>
>>>+	kfree(pdn->m64_map);
>>> 	return 0;
>>> }
>>>
>>>@@ -1187,8 +1194,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>> 	int                    total_vfs;
>>> 	resource_size_t        size, start;
>>> 	int                    pe_num;
>>>-	int                    vf_groups;
>>>-	int                    vf_per_group;
>>>+	int                    m64_bars;
>>>
>>> 	bus = pdev->bus;
>>> 	hose = pci_bus_to_host(bus);
>>>@@ -1196,26 +1202,26 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>> 	pdn = pci_get_pdn(pdev);
>>> 	total_vfs = pci_sriov_get_totalvfs(pdev);
>>>
>>>-	/* Initialize the m64_wins to IODA_INVALID_M64 */
>>>-	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
>>>-		for (j = 0; j < M64_PER_IOV; j++)
>>>-			pdn->m64_wins[i][j] = IODA_INVALID_M64;
>>>+	if (pdn->m64_single_mode)
>>>+		m64_bars = num_vfs;
>>>+	else
>>>+		m64_bars = 1;
>>>+
>>>+	pdn->m64_map = kmalloc(sizeof(*pdn->m64_map) * m64_bars, GFP_KERNEL);
>>>+	if (!pdn->m64_map)
>>>+		return -ENOMEM;
>>>+	/* Initialize the m64_map to IODA_INVALID_M64 */
>>>+	for (i = 0; i < m64_bars ; i++)
>>>+		for (j = 0; j < PCI_SRIOV_NUM_BARS; j++)
>>>+			pdn->m64_map[i][j] = IODA_INVALID_M64;
>>>
>>>-	if (pdn->m64_per_iov == M64_PER_IOV) {
>>>-		vf_groups = (num_vfs <= M64_PER_IOV) ? num_vfs: M64_PER_IOV;
>>>-		vf_per_group = (num_vfs <= M64_PER_IOV)? 1:
>>>-			roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
>>>-	} else {
>>>-		vf_groups = 1;
>>>-		vf_per_group = 1;
>>>-	}
>>>
>>> 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>>> 		res = &pdev->resource[i + PCI_IOV_RESOURCES];
>>> 		if (!res->flags || !res->parent)
>>> 			continue;
>>>
>>>-		for (j = 0; j < vf_groups; j++) {
>>>+		for (j = 0; j < m64_bars; j++) {
>>> 			do {
>>> 				win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
>>> 						phb->ioda.m64_bar_idx + 1, 0);
>>>@@ -1224,12 +1230,11 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>> 					goto m64_failed;
>>> 			} while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc));
>>>
>>>-			pdn->m64_wins[i][j] = win;
>>>+			pdn->m64_map[j][i] = win;
>>>
>>>-			if (pdn->m64_per_iov == M64_PER_IOV) {
>>>+			if (pdn->m64_single_mode) {
>>> 				size = pci_iov_resource_size(pdev,
>>> 							PCI_IOV_RESOURCES + i);
>>>-				size = size * vf_per_group;
>>> 				start = res->start + size * j;
>>> 			} else {
>>> 				size = resource_size(res);
>>>@@ -1237,16 +1242,16 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>> 			}
>>>
>>> 			/* Map the M64 here */
>>>-			if (pdn->m64_per_iov == M64_PER_IOV) {
>>>+			if (pdn->m64_single_mode) {
>>> 				pe_num = pdn->offset + j;
>>> 				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>> 						pe_num, OPAL_M64_WINDOW_TYPE,
>>>-						pdn->m64_wins[i][j], 0);
>>>+						pdn->m64_map[j][i], 0);
>>> 			}
>>>
>>> 			rc = opal_pci_set_phb_mem_window(phb->opal_id,
>>> 						 OPAL_M64_WINDOW_TYPE,
>>>-						 pdn->m64_wins[i][j],
>>>+						 pdn->m64_map[j][i],
>>> 						 start,
>>> 						 0, /* unused */
>>> 						 size);
>>>@@ -1258,12 +1263,12 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>> 				goto m64_failed;
>>> 			}
>>>
>>>-			if (pdn->m64_per_iov == M64_PER_IOV)
>>>+			if (pdn->m64_single_mode)
>>> 				rc = opal_pci_phb_mmio_enable(phb->opal_id,
>>>-				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 2);
>>>+				     OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 2);
>>> 			else
>>> 				rc = opal_pci_phb_mmio_enable(phb->opal_id,
>>>-				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 1);
>>>+				     OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 1);
>>>
>>> 			if (rc != OPAL_SUCCESS) {
>>> 				dev_err(&pdev->dev, "Failed to enable M64 window #%d: %llx\n",
>>>@@ -1275,7 +1280,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>> 	return 0;
>>>
>>> m64_failed:
>>>-	pnv_pci_vf_release_m64(pdev);
>>>+	pnv_pci_vf_release_m64(pdev, num_vfs);
>>> 	return -EBUSY;
>>> }
>>>
>>>@@ -1302,15 +1307,13 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
>>> 	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
>>> }
>>>
>>>-static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>>+static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
>>> {
>>> 	struct pci_bus        *bus;
>>> 	struct pci_controller *hose;
>>> 	struct pnv_phb        *phb;
>>> 	struct pnv_ioda_pe    *pe, *pe_n;
>>> 	struct pci_dn         *pdn;
>>>-	u16                    vf_index;
>>>-	int64_t                rc;
>>>
>>> 	bus = pdev->bus;
>>> 	hose = pci_bus_to_host(bus);
>>>@@ -1320,35 +1323,6 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>> 	if (!pdev->is_physfn)
>>> 		return;
>>>
>>>-	if (pdn->m64_per_iov == M64_PER_IOV && num_vfs > M64_PER_IOV) {
>>>-		int   vf_group;
>>>-		int   vf_per_group;
>>>-		int   vf_index1;
>>>-
>>>-		vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
>>>-
>>>-		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++)
>>>-			for (vf_index = vf_group * vf_per_group;
>>>-				vf_index < (vf_group + 1) * vf_per_group &&
>>>-				vf_index < num_vfs;
>>>-				vf_index++)
>>>-				for (vf_index1 = vf_group * vf_per_group;
>>>-					vf_index1 < (vf_group + 1) * vf_per_group &&
>>>-					vf_index1 < num_vfs;
>>>-					vf_index1++){
>>>-
>>>-					rc = opal_pci_set_peltv(phb->opal_id,
>>>-						pdn->offset + vf_index,
>>>-						pdn->offset + vf_index1,
>>>-						OPAL_REMOVE_PE_FROM_DOMAIN);
>>>-
>>>-					if (rc)
>>>-					    dev_warn(&pdev->dev, "%s: Failed to unlink same group PE#%d(%lld)\n",
>>>-						__func__,
>>>-						pdn->offset + vf_index1, rc);
>>>-				}
>>>-	}
>>>-
>>> 	list_for_each_entry_safe(pe, pe_n, &phb->ioda.pe_list, list) {
>>> 		if (pe->parent_dev != pdev)
>>> 			continue;
>>>@@ -1383,14 +1357,14 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>>> 	num_vfs = pdn->num_vfs;
>>>
>>> 	/* Release VF PEs */
>>>-	pnv_ioda_release_vf_PE(pdev, num_vfs);
>>>+	pnv_ioda_release_vf_PE(pdev);
>>>
>>> 	if (phb->type == PNV_PHB_IODA2) {
>>>-		if (pdn->m64_per_iov == 1)
>>>+		if (!pdn->m64_single_mode)
>>> 			pnv_pci_vf_resource_shift(pdev, -pdn->offset);
>>>
>>> 		/* Release M64 windows */
>>>-		pnv_pci_vf_release_m64(pdev);
>>>+		pnv_pci_vf_release_m64(pdev, num_vfs);
>>>
>>> 		/* Release PE numbers */
>>> 		bitmap_clear(phb->ioda.pe_alloc, pdn->offset, num_vfs);
>>>@@ -1409,7 +1383,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>> 	int                    pe_num;
>>> 	u16                    vf_index;
>>> 	struct pci_dn         *pdn;
>>>-	int64_t                rc;
>>>
>>> 	bus = pdev->bus;
>>> 	hose = pci_bus_to_host(bus);
>>>@@ -1454,37 +1427,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>>
>>> 		pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>> 	}
>>>-
>>>-	if (pdn->m64_per_iov == M64_PER_IOV && num_vfs > M64_PER_IOV) {
>>>-		int   vf_group;
>>>-		int   vf_per_group;
>>>-		int   vf_index1;
>>>-
>>>-		vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
>>>-
>>>-		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++) {
>>>-			for (vf_index = vf_group * vf_per_group;
>>>-			     vf_index < (vf_group + 1) * vf_per_group &&
>>>-			     vf_index < num_vfs;
>>>-			     vf_index++) {
>>>-				for (vf_index1 = vf_group * vf_per_group;
>>>-				     vf_index1 < (vf_group + 1) * vf_per_group &&
>>>-				     vf_index1 < num_vfs;
>>>-				     vf_index1++) {
>>>-
>>>-					rc = opal_pci_set_peltv(phb->opal_id,
>>>-						pdn->offset + vf_index,
>>>-						pdn->offset + vf_index1,
>>>-						OPAL_ADD_PE_TO_DOMAIN);
>>>-
>>>-					if (rc)
>>>-					    dev_warn(&pdev->dev, "%s: Failed to link same group PE#%d(%lld)\n",
>>>-						__func__,
>>>-						pdn->offset + vf_index1, rc);
>>>-				}
>>>-			}
>>>-		}
>>>-	}
>>> }
>>>
>>> int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>>@@ -1507,6 +1449,15 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>> 			return -ENOSPC;
>>> 		}
>>>
>>>+		/*
>>>+		 * When M64 BARs functions in Single PE mode, the number of VFs
>>>+		 * could be enabled must be less than the number of M64 BARs.
>>>+		 */
>>>+		if (pdn->m64_single_mode && num_vfs > phb->ioda.m64_bar_idx) {
>>>+			dev_info(&pdev->dev, "Not enough M64 BAR for VFs\n");
>>>+			return -EBUSY;
>>>+		}
>>>+
>>> 		/* Calculate available PE for required VFs */
>>> 		mutex_lock(&phb->ioda.pe_alloc_mutex);
>>> 		pdn->offset = bitmap_find_next_zero_area(
>>>@@ -1534,7 +1485,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>> 		 * the IOV BAR according to the PE# allocated to the VFs.
>>> 		 * Otherwise, the PE# for the VF will conflict with others.
>>> 		 */
>>>-		if (pdn->m64_per_iov == 1) {
>>>+		if (!pdn->m64_single_mode) {
>>> 			ret = pnv_pci_vf_resource_shift(pdev, pdn->offset);
>>> 			if (ret)
>>> 				goto m64_failed;
>>>@@ -1567,8 +1518,7 @@ int pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>> 	/* Allocate PCI data */
>>> 	add_dev_pci_data(pdev);
>>>
>>>-	pnv_pci_sriov_enable(pdev, num_vfs);
>>>-	return 0;
>>>+	return pnv_pci_sriov_enable(pdev, num_vfs);
>>> }
>>> #endif /* CONFIG_PCI_IOV */
>>>
>>>@@ -2762,9 +2712,9 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>>
>>> 	pdn = pci_get_pdn(pdev);
>>> 	pdn->vfs_expanded = 0;
>>>+	pdn->m64_single_mode = false;
>>>
>>> 	total_vfs = pci_sriov_get_totalvfs(pdev);
>>>-	pdn->m64_per_iov = 1;
>>> 	mul = phb->ioda.total_pe;
>>>
>>> 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>>>@@ -2784,8 +2734,8 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>> 		if (size > (1 << 26)) {
>>> 			dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size is bigger than 64M, roundup power2\n",
>>> 				 i, res);
>>>-			pdn->m64_per_iov = M64_PER_IOV;
>>> 			mul = roundup_pow_of_two(total_vfs);
>>>+			pdn->m64_single_mode = true;
>>> 			break;
>>> 		}
>>> 	}
>>>@@ -2987,6 +2937,8 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
>>> static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>>> 						      int resno)
>>> {
>>>+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
>>>+	struct pnv_phb *phb = hose->private_data;
>>> 	struct pci_dn *pdn = pci_get_pdn(pdev);
>>> 	resource_size_t align;
>>>
>>>@@ -2995,12 +2947,23 @@ static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>>> 	 * SR-IOV. While from hardware perspective, the range mapped by M64
>>> 	 * BAR should be size aligned.
>>> 	 *
>>>+	 * When IOV BAR is mapped with M64 BAR in Single PE mode, the extra
>>>+	 * powernv-specific hardware restriction is gone. But if just use the
>>>+	 * VF BAR size as the alignment, PF BAR / VF BAR may be allocated with
>>>+	 * in one segment of M64 #15, which introduces the PE conflict between
>>>+	 * PF and VF. Based on this, the minimum alignment of an IOV BAR is
>>>+	 * m64_segsize.
>>>+	 *
>>
>>When M64 BARs run in single mode, it still requires 32MB aligned. Does
>>"the extra powernv-specific hardware restriction" includes the limitation
>>or not?
>>
>
>The 32MB aligned you mentioned is the size of VF BAR?
>
>The answer is No. "the extra powernv-specific hardware restriction" only
>mentioned the total M64 BAR alignment.
>

No, it's the PHB's M64 BAR; there are 16 in total. When it's running in Single
PE mode, it requires at least 32MB alignment. Extracted from the PHB3 spec:

When Single PE = 1, Bits 02:26 are the base address to compare
against the MSB 25 AIB address bits, 0:24, out of 0:49 (32MB aligned).
Bits 27:31 are the upper 5 bits of the PE# that owns the entire address
space defined by this BAR.
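
Just to spell out where the 32MB figure comes from (my own arithmetic from the
spec text above, not anything in this series): only the 25 most significant
bits of the 50-bit AIB address are compared, so the low 25 bits are undecoded
and 2^25 bytes = 32MB is the granularity of both the base and the size. A
minimal sketch:

	#include <stdint.h>

	/*
	 * Illustration only: derive the Single PE M64 BAR granularity from
	 * the bit ranges quoted above (50 AIB address bits 0:49, of which
	 * the MSB 25 bits, 0:24, are compared against the BAR base).
	 */
	static inline uint64_t phb3_single_pe_granule(void)
	{
		const unsigned int aib_addr_bits = 50;	/* address bits 0:49 */
		const unsigned int compared_bits = 25;	/* bits 0:24 compared */

		return 1ULL << (aib_addr_bits - compared_bits); /* 0x2000000 = 32MB */
	}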

>>> 	 * This function returns the total IOV BAR size if M64 BAR is in
>>> 	 * Shared PE mode or just the individual size if not.
>>>+	 * If the M64 BAR is in Single PE mode, return the VF BAR size or
>>>+	 * M64 segment size if IOV BAR size is less.
>>> 	 */
>>> 	align = pci_iov_resource_size(pdev, resno);
>>> 	if (!pdn->vfs_expanded)
>>> 		return align;
>>>+	if (pdn->m64_single_mode)
>>>+		return max(align, (resource_size_t)phb->ioda.m64_segsize);
>>>
>>> 	return pdn->vfs_expanded * align;
>>> }
>>>-- 
>>>2.5.0
>>>
>
>-- 
>Richard Yang
>Help you, Help me

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V5 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR
  2015-10-13  3:20       ` Gavin Shan
@ 2015-10-13  3:50         ` Wei Yang
  0 siblings, 0 replies; 30+ messages in thread
From: Wei Yang @ 2015-10-13  3:50 UTC (permalink / raw)
  To: Gavin Shan; +Cc: Wei Yang, aik, benh, linuxppc-dev, mpe

On Tue, Oct 13, 2015 at 02:20:30PM +1100, Gavin Shan wrote:
>On Tue, Oct 13, 2015 at 09:49:30AM +0800, Wei Yang wrote:
>>On Tue, Oct 13, 2015 at 11:01:24AM +1100, Gavin Shan wrote:
>>>On Fri, Oct 09, 2015 at 10:46:51AM +0800, Wei Yang wrote:
>>>>On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If
>>>>a SRIOV device's IOV BAR is not 64bit-prefetchable, this is not assigned
>>>>from 64bit prefetchable window, which means M64 BAR can't work on it.
>>>>
>>>>The reason is PCI bridges support only 2 windows and the kernel code
>>>>programs bridges in the way that one window is 32bit-nonprefetchable and
>>>>the other one is 64bit-prefetchable. So if devices' IOV BAR is 64bit and
>>>>non-prefetchable, it will be mapped into 32bit space and therefore M64
>>>>cannot be used for it.
>>>>
>>>>This patch makes this explicit.
>>>>
>>>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>>>Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>---
>>>> arch/powerpc/platforms/powernv/pci-ioda.c | 25 +++++++++----------------
>>>> 1 file changed, 9 insertions(+), 16 deletions(-)
>>>>
>>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>index 85cbc96..8c031b5 100644
>>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>@@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>>>> 		if (!res->flags || !res->parent)
>>>> 			continue;
>>>>
>>>>-		if (!pnv_pci_is_mem_pref_64(res->flags))
>>>>-			continue;
>>>>-
>>>> 		/*
>>>> 		 * The actual IOV BAR range is determined by the start address
>>>> 		 * and the actual size for num_vfs VFs BAR.  This check is to
>>>>@@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>>>> 		if (!res->flags || !res->parent)
>>>> 			continue;
>>>>
>>>>-		if (!pnv_pci_is_mem_pref_64(res->flags))
>>>>-			continue;
>>>>-
>>>> 		size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
>>>> 		res2 = *res;
>>>> 		res->start += size * offset;
>>>>@@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>>> 		if (!res->flags || !res->parent)
>>>> 			continue;
>>>>
>>>>-		if (!pnv_pci_is_mem_pref_64(res->flags))
>>>>-			continue;
>>>>-
>>>> 		for (j = 0; j < vf_groups; j++) {
>>>> 			do {
>>>> 				win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
>>>>@@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>>> 	pdn = pci_get_pdn(pdev);
>>>>
>>>> 	if (phb->type == PNV_PHB_IODA2) {
>>>>+		if (!pdn->vfs_expanded) {
>>>>+			dev_info(&pdev->dev, "don't support this SRIOV device"
>>>>+				" with non 64bit-prefetchable IOV BAR\n");
>>>>+			return -ENOSPC;
>>>>+		}
>>>>+
>>>> 		/* Calculate available PE for required VFs */
>>>> 		mutex_lock(&phb->ioda.pe_alloc_mutex);
>>>> 		pdn->offset = bitmap_find_next_zero_area(
>>>>@@ -2775,9 +2772,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>>> 		if (!res->flags || res->parent)
>>>> 			continue;
>>>> 		if (!pnv_pci_is_mem_pref_64(res->flags)) {
>>>>-			dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
>>>>+			dev_warn(&pdev->dev, "Don't support SR-IOV with"
>>>>+					" non M64 VF BAR%d: %pR. \n",
>>>> 				 i, res);
>>>>-			continue;
>>>>+			return;
>>>
>>>When the IOV BAR isn't 64-bits prefetchable one, it's going to be allocated from
>>>M32 aperatus. However, the IOV BAR won't be used as the SRIOV capability on the PF
>>>can't be enabled. Occasionally, the IOV BAR is huge (e.g. 4GB) and it eats up all
>>>M32 space, BARs other than 64-bits prefetchable BARs on other device will fail in
>>>this case. Would it a problem you ever thought of?
>>>
>>
>>IOV BARs are in optional list during assignment.
>>
>
>The point isn't that optional list allows to be failed when assigning the resources.
>The point is the system allocating resources without using them. Isn't it wasting
>resources?
>

Looks like you are right. Then the better way is to truncate it.

I will change this in the next version.
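
For illustration, the truncation could look roughly like this in
pnv_pci_ioda_fixup_iov_resources() (a sketch only; the label name and exact
placement are assumptions, not necessarily what the next version will do):

		if (!pnv_pci_is_mem_pref_64(res->flags)) {
			dev_warn(&pdev->dev, "Don't support SR-IOV with"
					" non M64 VF BAR%d: %pR\n", i, res);
			goto truncate_iov;
		}

	/* ... rest of the BAR scan and IOV BAR expansion unchanged ... */

truncate_iov:
	/*
	 * Hand the IOV BARs back instead of leaving them allocated but
	 * unused, so a large non-prefetchable IOV BAR doesn't eat up
	 * M32 space needed by other devices.
	 */
	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
		res = &pdev->resource[i + PCI_IOV_RESOURCES];
		res->flags = 0;
		res->end = res->start - 1;
	}
	return;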

>>>> 		}
>>>>
>>>> 		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>>>>@@ -2796,11 +2794,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>>> 		res = &pdev->resource[i + PCI_IOV_RESOURCES];
>>>> 		if (!res->flags || res->parent)
>>>> 			continue;
>>>>-		if (!pnv_pci_is_mem_pref_64(res->flags)) {
>>>>-			dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: %pR\n",
>>>>-				 i, res);
>>>>-			continue;
>>>>-		}
>>>>
>>>> 		dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
>>>> 		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>>>>-- 
>>>>2.5.0
>>>>
>>
>>-- 
>>Richard Yang
>>Help you, Help me

-- 
Richard Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V5 2/6] powerpc/powernv: simplify the calculation of iov resource alignment
  2015-10-13  3:27       ` Gavin Shan
@ 2015-10-13  3:56         ` Wei Yang
  2015-10-14  1:22           ` Gavin Shan
  0 siblings, 1 reply; 30+ messages in thread
From: Wei Yang @ 2015-10-13  3:56 UTC (permalink / raw)
  To: Gavin Shan; +Cc: Wei Yang, aik, benh, linuxppc-dev, mpe

On Tue, Oct 13, 2015 at 02:27:52PM +1100, Gavin Shan wrote:
>On Tue, Oct 13, 2015 at 10:45:45AM +0800, Wei Yang wrote:
>>On Tue, Oct 13, 2015 at 11:13:50AM +1100, Gavin Shan wrote:
>>>On Fri, Oct 09, 2015 at 10:46:52AM +0800, Wei Yang wrote:
>>>>The alignment of IOV BAR on PowerNV platform is the total size of the IOV
>>>>BAR. No matter whether the IOV BAR is extended with number of
>>>>roundup_pow_of_two(total_vfs) or number of max PE number (256), the total
>>>>size could be calculated by (vfs_expanded * VF_BAR_size).
>>>>
>>>>This patch simplifies the pnv_pci_iov_resource_alignment() by removing the
>>>>first case.
>>>>
>>>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>>>Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>---
>>>> arch/powerpc/platforms/powernv/pci-ioda.c | 20 ++++++++++++--------
>>>> 1 file changed, 12 insertions(+), 8 deletions(-)
>>>>
>>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>index 8c031b5..7da476b 100644
>>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>@@ -2988,17 +2988,21 @@ static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>>>> 						      int resno)
>>>> {
>>>> 	struct pci_dn *pdn = pci_get_pdn(pdev);
>>>>-	resource_size_t align, iov_align;
>>>>-
>>>>-	iov_align = resource_size(&pdev->resource[resno]);
>>>>-	if (iov_align)
>>>>-		return iov_align;
>>>>+	resource_size_t align;
>>>>
>>>>+	/*
>>>>+	 * On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the
>>>>+	 * SR-IOV. While from hardware perspective, the range mapped by M64
>>>>+	 * BAR should be size aligned.
>>>>+	 *
>>>>+	 * This function returns the total IOV BAR size if M64 BAR is in
>>>>+	 * Shared PE mode or just the individual size if not.
>>>>+	 */
>>>
>>>s/the invidial size/VF BAR size
>>>
>>>> 	align = pci_iov_resource_size(pdev, resno);
>>>>-	if (pdn->vfs_expanded)
>>>>-		return pdn->vfs_expanded * align;
>>>>+	if (!pdn->vfs_expanded)
>>>>+		return align;
>>>>
>>>>-	return align;
>>>>+	return pdn->vfs_expanded * align;
>>>
>>>There is no difference before/after the changes. why this change is needed?
>>>
>>
>>After change the logic is more clear.
>>
>>Alignment equals to the total size when IOV BAR is expanded or equals to the
>>VF BAR size. We don't need to check whether IOV BAR is truncated.
>>
>
>I didn't get what you're talking about with "IOV BAR is truncated". I also
>didn't get what has been really changed from last 3 lines changes.
>
>	if (pdn->vfs_expanded)				if (!pdn->vfs_expanded)
>		return pdn->vfs_expanded * align;		return align;
>	return align;					return pdn->vfs_expanded * align;
>

The IOV BAR resource gets truncated during the sizing stage, since the IOV BAR
is optional, yet we still need to return the correct alignment during that
process.

The code before and after the change has the same meaning, but the latter form
is prepared for the single-mode M64 BAR. You could look at the final version
after the Single PE mode M64 BAR is introduced.

>>>> }
>>>> #endif /* CONFIG_PCI_IOV */
>>>>
>>>>-- 
>>>>2.5.0
>>>>
>>
>>-- 
>>Richard Yang
>>Help you, Help me

-- 
Richard Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V5 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
  2015-10-13  3:32       ` Gavin Shan
@ 2015-10-13  5:29         ` Wei Yang
  2015-10-14  1:15           ` Gavin Shan
  0 siblings, 1 reply; 30+ messages in thread
From: Wei Yang @ 2015-10-13  5:29 UTC (permalink / raw)
  To: Gavin Shan; +Cc: Wei Yang, aik, benh, linuxppc-dev, mpe

On Tue, Oct 13, 2015 at 02:32:49PM +1100, Gavin Shan wrote:
>On Tue, Oct 13, 2015 at 10:50:42AM +0800, Wei Yang wrote:
>>On Tue, Oct 13, 2015 at 10:55:27AM +1100, Gavin Shan wrote:
>>>On Fri, Oct 09, 2015 at 10:46:53AM +0800, Wei Yang wrote:
>>>>In current implementation, when VF BAR is bigger than 64MB, it uses 4 M64
>>>>BARs in Single PE mode to cover the number of VFs required to be enabled.
>>>>By doing so, several VFs would be in one VF Group and leads to interference
>>>>between VFs in the same group.
>>>>
>>>>And in this patch, m64_wins is renamed to m64_map, which means index number
>>>>of the M64 BAR used to map the VF BAR. Based on Gavin's comments.
>>>>
>>>>This patch changes the design by using one M64 BAR in Single PE mode for
>>>>one VF BAR. This gives absolute isolation for VFs.
>>>>
>>>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>>>Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>---
>>>> arch/powerpc/include/asm/pci-bridge.h     |   5 +-
>>>> arch/powerpc/platforms/powernv/pci-ioda.c | 169 ++++++++++++------------------
>>>> 2 files changed, 68 insertions(+), 106 deletions(-)
>>>>
>>>>diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
>>>>index 712add5..8aeba4c 100644
>>>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>>>@@ -214,10 +214,9 @@ struct pci_dn {
>>>> 	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
>>>> 	u16     num_vfs;		/* number of VFs enabled*/
>>>> 	int     offset;			/* PE# for the first VF PE */
>>>>-#define M64_PER_IOV 4
>>>>-	int     m64_per_iov;
>>>>+	bool    m64_single_mode;	/* Use M64 BAR in Single Mode */
>>>> #define IODA_INVALID_M64        (-1)
>>>>-	int     m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
>>>>+	int     (*m64_map)[PCI_SRIOV_NUM_BARS];
>>>> #endif /* CONFIG_PCI_IOV */
>>>> #endif
>>>> 	struct list_head child_list;
>>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>index 7da476b..2886f90 100644
>>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>@@ -1148,29 +1148,36 @@ static void pnv_pci_ioda_setup_PEs(void)
>>>> }
>>>>
>>>> #ifdef CONFIG_PCI_IOV
>>>>-static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
>>>>+static int pnv_pci_vf_release_m64(struct pci_dev *pdev, u16 num_vfs)
>>>> {
>>>> 	struct pci_bus        *bus;
>>>> 	struct pci_controller *hose;
>>>> 	struct pnv_phb        *phb;
>>>> 	struct pci_dn         *pdn;
>>>> 	int                    i, j;
>>>>+	int                    m64_bars;
>>>>
>>>> 	bus = pdev->bus;
>>>> 	hose = pci_bus_to_host(bus);
>>>> 	phb = hose->private_data;
>>>> 	pdn = pci_get_pdn(pdev);
>>>>
>>>>+	if (pdn->m64_single_mode)
>>>>+		m64_bars = num_vfs;
>>>>+	else
>>>>+		m64_bars = 1;
>>>>+
>>>> 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
>>>>-		for (j = 0; j < M64_PER_IOV; j++) {
>>>>-			if (pdn->m64_wins[i][j] == IODA_INVALID_M64)
>>>>+		for (j = 0; j < m64_bars; j++) {
>>>>+			if (pdn->m64_map[j][i] == IODA_INVALID_M64)
>>>> 				continue;
>>>> 			opal_pci_phb_mmio_enable(phb->opal_id,
>>>>-				OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0);
>>>>-			clear_bit(pdn->m64_wins[i][j], &phb->ioda.m64_bar_alloc);
>>>>-			pdn->m64_wins[i][j] = IODA_INVALID_M64;
>>>>+				OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 0);
>>>>+			clear_bit(pdn->m64_map[j][i], &phb->ioda.m64_bar_alloc);
>>>>+			pdn->m64_map[j][i] = IODA_INVALID_M64;
>>>> 		}
>>>>
>>>>+	kfree(pdn->m64_map);
>>>> 	return 0;
>>>> }
>>>>
>>>>@@ -1187,8 +1194,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>>> 	int                    total_vfs;
>>>> 	resource_size_t        size, start;
>>>> 	int                    pe_num;
>>>>-	int                    vf_groups;
>>>>-	int                    vf_per_group;
>>>>+	int                    m64_bars;
>>>>
>>>> 	bus = pdev->bus;
>>>> 	hose = pci_bus_to_host(bus);
>>>>@@ -1196,26 +1202,26 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>>> 	pdn = pci_get_pdn(pdev);
>>>> 	total_vfs = pci_sriov_get_totalvfs(pdev);
>>>>
>>>>-	/* Initialize the m64_wins to IODA_INVALID_M64 */
>>>>-	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
>>>>-		for (j = 0; j < M64_PER_IOV; j++)
>>>>-			pdn->m64_wins[i][j] = IODA_INVALID_M64;
>>>>+	if (pdn->m64_single_mode)
>>>>+		m64_bars = num_vfs;
>>>>+	else
>>>>+		m64_bars = 1;
>>>>+
>>>>+	pdn->m64_map = kmalloc(sizeof(*pdn->m64_map) * m64_bars, GFP_KERNEL);
>>>>+	if (!pdn->m64_map)
>>>>+		return -ENOMEM;
>>>>+	/* Initialize the m64_map to IODA_INVALID_M64 */
>>>>+	for (i = 0; i < m64_bars ; i++)
>>>>+		for (j = 0; j < PCI_SRIOV_NUM_BARS; j++)
>>>>+			pdn->m64_map[i][j] = IODA_INVALID_M64;
>>>>
>>>>-	if (pdn->m64_per_iov == M64_PER_IOV) {
>>>>-		vf_groups = (num_vfs <= M64_PER_IOV) ? num_vfs: M64_PER_IOV;
>>>>-		vf_per_group = (num_vfs <= M64_PER_IOV)? 1:
>>>>-			roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
>>>>-	} else {
>>>>-		vf_groups = 1;
>>>>-		vf_per_group = 1;
>>>>-	}
>>>>
>>>> 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>>>> 		res = &pdev->resource[i + PCI_IOV_RESOURCES];
>>>> 		if (!res->flags || !res->parent)
>>>> 			continue;
>>>>
>>>>-		for (j = 0; j < vf_groups; j++) {
>>>>+		for (j = 0; j < m64_bars; j++) {
>>>> 			do {
>>>> 				win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
>>>> 						phb->ioda.m64_bar_idx + 1, 0);
>>>>@@ -1224,12 +1230,11 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>>> 					goto m64_failed;
>>>> 			} while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc));
>>>>
>>>>-			pdn->m64_wins[i][j] = win;
>>>>+			pdn->m64_map[j][i] = win;
>>>>
>>>>-			if (pdn->m64_per_iov == M64_PER_IOV) {
>>>>+			if (pdn->m64_single_mode) {
>>>> 				size = pci_iov_resource_size(pdev,
>>>> 							PCI_IOV_RESOURCES + i);
>>>>-				size = size * vf_per_group;
>>>> 				start = res->start + size * j;
>>>> 			} else {
>>>> 				size = resource_size(res);
>>>>@@ -1237,16 +1242,16 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>>> 			}
>>>>
>>>> 			/* Map the M64 here */
>>>>-			if (pdn->m64_per_iov == M64_PER_IOV) {
>>>>+			if (pdn->m64_single_mode) {
>>>> 				pe_num = pdn->offset + j;
>>>> 				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>>> 						pe_num, OPAL_M64_WINDOW_TYPE,
>>>>-						pdn->m64_wins[i][j], 0);
>>>>+						pdn->m64_map[j][i], 0);
>>>> 			}
>>>>
>>>> 			rc = opal_pci_set_phb_mem_window(phb->opal_id,
>>>> 						 OPAL_M64_WINDOW_TYPE,
>>>>-						 pdn->m64_wins[i][j],
>>>>+						 pdn->m64_map[j][i],
>>>> 						 start,
>>>> 						 0, /* unused */
>>>> 						 size);
>>>>@@ -1258,12 +1263,12 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>>> 				goto m64_failed;
>>>> 			}
>>>>
>>>>-			if (pdn->m64_per_iov == M64_PER_IOV)
>>>>+			if (pdn->m64_single_mode)
>>>> 				rc = opal_pci_phb_mmio_enable(phb->opal_id,
>>>>-				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 2);
>>>>+				     OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 2);
>>>> 			else
>>>> 				rc = opal_pci_phb_mmio_enable(phb->opal_id,
>>>>-				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 1);
>>>>+				     OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 1);
>>>>
>>>> 			if (rc != OPAL_SUCCESS) {
>>>> 				dev_err(&pdev->dev, "Failed to enable M64 window #%d: %llx\n",
>>>>@@ -1275,7 +1280,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>>> 	return 0;
>>>>
>>>> m64_failed:
>>>>-	pnv_pci_vf_release_m64(pdev);
>>>>+	pnv_pci_vf_release_m64(pdev, num_vfs);
>>>> 	return -EBUSY;
>>>> }
>>>>
>>>>@@ -1302,15 +1307,13 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
>>>> 	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
>>>> }
>>>>
>>>>-static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>>>+static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
>>>> {
>>>> 	struct pci_bus        *bus;
>>>> 	struct pci_controller *hose;
>>>> 	struct pnv_phb        *phb;
>>>> 	struct pnv_ioda_pe    *pe, *pe_n;
>>>> 	struct pci_dn         *pdn;
>>>>-	u16                    vf_index;
>>>>-	int64_t                rc;
>>>>
>>>> 	bus = pdev->bus;
>>>> 	hose = pci_bus_to_host(bus);
>>>>@@ -1320,35 +1323,6 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>>> 	if (!pdev->is_physfn)
>>>> 		return;
>>>>
>>>>-	if (pdn->m64_per_iov == M64_PER_IOV && num_vfs > M64_PER_IOV) {
>>>>-		int   vf_group;
>>>>-		int   vf_per_group;
>>>>-		int   vf_index1;
>>>>-
>>>>-		vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
>>>>-
>>>>-		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++)
>>>>-			for (vf_index = vf_group * vf_per_group;
>>>>-				vf_index < (vf_group + 1) * vf_per_group &&
>>>>-				vf_index < num_vfs;
>>>>-				vf_index++)
>>>>-				for (vf_index1 = vf_group * vf_per_group;
>>>>-					vf_index1 < (vf_group + 1) * vf_per_group &&
>>>>-					vf_index1 < num_vfs;
>>>>-					vf_index1++){
>>>>-
>>>>-					rc = opal_pci_set_peltv(phb->opal_id,
>>>>-						pdn->offset + vf_index,
>>>>-						pdn->offset + vf_index1,
>>>>-						OPAL_REMOVE_PE_FROM_DOMAIN);
>>>>-
>>>>-					if (rc)
>>>>-					    dev_warn(&pdev->dev, "%s: Failed to unlink same group PE#%d(%lld)\n",
>>>>-						__func__,
>>>>-						pdn->offset + vf_index1, rc);
>>>>-				}
>>>>-	}
>>>>-
>>>> 	list_for_each_entry_safe(pe, pe_n, &phb->ioda.pe_list, list) {
>>>> 		if (pe->parent_dev != pdev)
>>>> 			continue;
>>>>@@ -1383,14 +1357,14 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>>>> 	num_vfs = pdn->num_vfs;
>>>>
>>>> 	/* Release VF PEs */
>>>>-	pnv_ioda_release_vf_PE(pdev, num_vfs);
>>>>+	pnv_ioda_release_vf_PE(pdev);
>>>>
>>>> 	if (phb->type == PNV_PHB_IODA2) {
>>>>-		if (pdn->m64_per_iov == 1)
>>>>+		if (!pdn->m64_single_mode)
>>>> 			pnv_pci_vf_resource_shift(pdev, -pdn->offset);
>>>>
>>>> 		/* Release M64 windows */
>>>>-		pnv_pci_vf_release_m64(pdev);
>>>>+		pnv_pci_vf_release_m64(pdev, num_vfs);
>>>>
>>>> 		/* Release PE numbers */
>>>> 		bitmap_clear(phb->ioda.pe_alloc, pdn->offset, num_vfs);
>>>>@@ -1409,7 +1383,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>>> 	int                    pe_num;
>>>> 	u16                    vf_index;
>>>> 	struct pci_dn         *pdn;
>>>>-	int64_t                rc;
>>>>
>>>> 	bus = pdev->bus;
>>>> 	hose = pci_bus_to_host(bus);
>>>>@@ -1454,37 +1427,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>>>
>>>> 		pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>>> 	}
>>>>-
>>>>-	if (pdn->m64_per_iov == M64_PER_IOV && num_vfs > M64_PER_IOV) {
>>>>-		int   vf_group;
>>>>-		int   vf_per_group;
>>>>-		int   vf_index1;
>>>>-
>>>>-		vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
>>>>-
>>>>-		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++) {
>>>>-			for (vf_index = vf_group * vf_per_group;
>>>>-			     vf_index < (vf_group + 1) * vf_per_group &&
>>>>-			     vf_index < num_vfs;
>>>>-			     vf_index++) {
>>>>-				for (vf_index1 = vf_group * vf_per_group;
>>>>-				     vf_index1 < (vf_group + 1) * vf_per_group &&
>>>>-				     vf_index1 < num_vfs;
>>>>-				     vf_index1++) {
>>>>-
>>>>-					rc = opal_pci_set_peltv(phb->opal_id,
>>>>-						pdn->offset + vf_index,
>>>>-						pdn->offset + vf_index1,
>>>>-						OPAL_ADD_PE_TO_DOMAIN);
>>>>-
>>>>-					if (rc)
>>>>-					    dev_warn(&pdev->dev, "%s: Failed to link same group PE#%d(%lld)\n",
>>>>-						__func__,
>>>>-						pdn->offset + vf_index1, rc);
>>>>-				}
>>>>-			}
>>>>-		}
>>>>-	}
>>>> }
>>>>
>>>> int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>>>@@ -1507,6 +1449,15 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>>> 			return -ENOSPC;
>>>> 		}
>>>>
>>>>+		/*
>>>>+		 * When M64 BARs functions in Single PE mode, the number of VFs
>>>>+		 * could be enabled must be less than the number of M64 BARs.
>>>>+		 */
>>>>+		if (pdn->m64_single_mode && num_vfs > phb->ioda.m64_bar_idx) {
>>>>+			dev_info(&pdev->dev, "Not enough M64 BAR for VFs\n");
>>>>+			return -EBUSY;
>>>>+		}
>>>>+
>>>> 		/* Calculate available PE for required VFs */
>>>> 		mutex_lock(&phb->ioda.pe_alloc_mutex);
>>>> 		pdn->offset = bitmap_find_next_zero_area(
>>>>@@ -1534,7 +1485,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>>> 		 * the IOV BAR according to the PE# allocated to the VFs.
>>>> 		 * Otherwise, the PE# for the VF will conflict with others.
>>>> 		 */
>>>>-		if (pdn->m64_per_iov == 1) {
>>>>+		if (!pdn->m64_single_mode) {
>>>> 			ret = pnv_pci_vf_resource_shift(pdev, pdn->offset);
>>>> 			if (ret)
>>>> 				goto m64_failed;
>>>>@@ -1567,8 +1518,7 @@ int pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>>> 	/* Allocate PCI data */
>>>> 	add_dev_pci_data(pdev);
>>>>
>>>>-	pnv_pci_sriov_enable(pdev, num_vfs);
>>>>-	return 0;
>>>>+	return pnv_pci_sriov_enable(pdev, num_vfs);
>>>> }
>>>> #endif /* CONFIG_PCI_IOV */
>>>>
>>>>@@ -2762,9 +2712,9 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>>>
>>>> 	pdn = pci_get_pdn(pdev);
>>>> 	pdn->vfs_expanded = 0;
>>>>+	pdn->m64_single_mode = false;
>>>>
>>>> 	total_vfs = pci_sriov_get_totalvfs(pdev);
>>>>-	pdn->m64_per_iov = 1;
>>>> 	mul = phb->ioda.total_pe;
>>>>
>>>> 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>>>>@@ -2784,8 +2734,8 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>>> 		if (size > (1 << 26)) {
>>>> 			dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size is bigger than 64M, roundup power2\n",
>>>> 				 i, res);
>>>>-			pdn->m64_per_iov = M64_PER_IOV;
>>>> 			mul = roundup_pow_of_two(total_vfs);
>>>>+			pdn->m64_single_mode = true;
>>>> 			break;
>>>> 		}
>>>> 	}
>>>>@@ -2987,6 +2937,8 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
>>>> static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>>>> 						      int resno)
>>>> {
>>>>+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
>>>>+	struct pnv_phb *phb = hose->private_data;
>>>> 	struct pci_dn *pdn = pci_get_pdn(pdev);
>>>> 	resource_size_t align;
>>>>
>>>>@@ -2995,12 +2947,23 @@ static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>>>> 	 * SR-IOV. While from hardware perspective, the range mapped by M64
>>>> 	 * BAR should be size aligned.
>>>> 	 *
>>>>+	 * When IOV BAR is mapped with M64 BAR in Single PE mode, the extra
>>>>+	 * powernv-specific hardware restriction is gone. But if just use the
>>>>+	 * VF BAR size as the alignment, PF BAR / VF BAR may be allocated with
>>>>+	 * in one segment of M64 #15, which introduces the PE conflict between
>>>>+	 * PF and VF. Based on this, the minimum alignment of an IOV BAR is
>>>>+	 * m64_segsize.
>>>>+	 *
>>>
>>>When M64 BARs run in single mode, it still requires 32MB aligned. Does
>>>"the extra powernv-specific hardware restriction" includes the limitation
>>>or not?
>>>
>>
>>The 32MB aligned you mentioned is the size of VF BAR?
>>
>>The answer is No. "the extra powernv-specific hardware restriction" only
>>mentioned the total M64 BAR alignment.
>>
>
>No, it's PHB's M64 BAR. there are 16 in total. When it's running in single PE
>mode, it requires 32MB alignment at least. Extracted from PHB3 spec:
>
>When Single PE = 1, Bits 02:26 are the base address to compare
>against the MSB 25 AIB address bits, 0:24, out of 0:49 (32MB aligned).
>Bits 27:31 are the upper 5 bits of the PE# that owns the entire address
>space defined by this BAR.
>

Good, looks like we have another hardware-specific requirement.

However, it is not proper to handle it here, since this requirement affects the
whole PCI resource assignment.

The Linux kernel doesn't assign each VF BAR individually; it assigns the IOV
BAR, and the VF BARs are contiguous pieces inside it. This means that if a VF
BAR is less than 32MB and we want to use M64 BARs in Single PE mode, two VF
BARs would end up in one PE.

My suggestion is to add this check in pnv_pci_ioda_fixup_iov_resources(): when
we decide to use M64 BARs in Single PE mode to cover an IOV BAR, each VF BAR
should be at least 32MB. When this condition is met, the alignment required by
M64 Single PE mode is met too. A rough sketch of the idea follows.
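
A hypothetical helper along those lines (its name and exact placement are
illustrative only, not part of this series), which
pnv_pci_ioda_fixup_iov_resources() could consult before setting
pdn->m64_single_mode:

	/*
	 * Hypothetical helper: M64 BARs in Single PE mode are 32MB granular,
	 * and the VF BARs inside an IOV BAR are contiguous, so Single PE mode
	 * is only safe when every VF BAR is at least 32MB; otherwise two VFs
	 * would land in the same PE.
	 */
	static bool pnv_pci_vf_bars_fit_single_mode(struct pci_dev *pdev)
	{
		struct resource *res;
		resource_size_t size;
		int i;

		for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
			res = &pdev->resource[i + PCI_IOV_RESOURCES];
			if (!res->flags || res->parent)
				continue;

			size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
			if (size < (32ul << 20))	/* below 32MB granularity */
				return false;
		}

		return true;
	}

If this returns false, the fixup could fall back to Shared PE mode (or truncate
the IOV BAR, as discussed for patch 1) instead of enabling Single PE mode.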

>>>> 	 * This function returns the total IOV BAR size if M64 BAR is in
>>>> 	 * Shared PE mode or just the individual size if not.
>>>>+	 * If the M64 BAR is in Single PE mode, return the VF BAR size or
>>>>+	 * M64 segment size if IOV BAR size is less.
>>>> 	 */
>>>> 	align = pci_iov_resource_size(pdev, resno);
>>>> 	if (!pdn->vfs_expanded)
>>>> 		return align;
>>>>+	if (pdn->m64_single_mode)
>>>>+		return max(align, (resource_size_t)phb->ioda.m64_segsize);
>>>>
>>>> 	return pdn->vfs_expanded * align;
>>>> }
>>>>-- 
>>>>2.5.0
>>>>
>>
>>-- 
>>Richard Yang
>>Help you, Help me

-- 
Richard Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V5 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
  2015-10-13  5:29         ` Wei Yang
@ 2015-10-14  1:15           ` Gavin Shan
  2015-10-15  7:29             ` Wei Yang
  0 siblings, 1 reply; 30+ messages in thread
From: Gavin Shan @ 2015-10-14  1:15 UTC (permalink / raw)
  To: Wei Yang; +Cc: Gavin Shan, aik, benh, linuxppc-dev, mpe

On Tue, Oct 13, 2015 at 01:29:44PM +0800, Wei Yang wrote:
>On Tue, Oct 13, 2015 at 02:32:49PM +1100, Gavin Shan wrote:
>>On Tue, Oct 13, 2015 at 10:50:42AM +0800, Wei Yang wrote:
>>>On Tue, Oct 13, 2015 at 10:55:27AM +1100, Gavin Shan wrote:
>>>>On Fri, Oct 09, 2015 at 10:46:53AM +0800, Wei Yang wrote:
>>>>>In current implementation, when VF BAR is bigger than 64MB, it uses 4 M64
>>>>>BARs in Single PE mode to cover the number of VFs required to be enabled.
>>>>>By doing so, several VFs would be in one VF Group and leads to interference
>>>>>between VFs in the same group.
>>>>>
>>>>>And in this patch, m64_wins is renamed to m64_map, which means index number
>>>>>of the M64 BAR used to map the VF BAR. Based on Gavin's comments.
>>>>>
>>>>>This patch changes the design by using one M64 BAR in Single PE mode for
>>>>>one VF BAR. This gives absolute isolation for VFs.
>>>>>
>>>>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>>>>Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>>Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>---
>>>>> arch/powerpc/include/asm/pci-bridge.h     |   5 +-
>>>>> arch/powerpc/platforms/powernv/pci-ioda.c | 169 ++++++++++++------------------
>>>>> 2 files changed, 68 insertions(+), 106 deletions(-)
>>>>>
>>>>>diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
>>>>>index 712add5..8aeba4c 100644
>>>>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>>>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>>>>@@ -214,10 +214,9 @@ struct pci_dn {
>>>>> 	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
>>>>> 	u16     num_vfs;		/* number of VFs enabled*/
>>>>> 	int     offset;			/* PE# for the first VF PE */
>>>>>-#define M64_PER_IOV 4
>>>>>-	int     m64_per_iov;
>>>>>+	bool    m64_single_mode;	/* Use M64 BAR in Single Mode */
>>>>> #define IODA_INVALID_M64        (-1)
>>>>>-	int     m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
>>>>>+	int     (*m64_map)[PCI_SRIOV_NUM_BARS];
>>>>> #endif /* CONFIG_PCI_IOV */
>>>>> #endif
>>>>> 	struct list_head child_list;
>>>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>index 7da476b..2886f90 100644
>>>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>@@ -1148,29 +1148,36 @@ static void pnv_pci_ioda_setup_PEs(void)
>>>>> }
>>>>>
>>>>> #ifdef CONFIG_PCI_IOV
>>>>>-static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
>>>>>+static int pnv_pci_vf_release_m64(struct pci_dev *pdev, u16 num_vfs)
>>>>> {
>>>>> 	struct pci_bus        *bus;
>>>>> 	struct pci_controller *hose;
>>>>> 	struct pnv_phb        *phb;
>>>>> 	struct pci_dn         *pdn;
>>>>> 	int                    i, j;
>>>>>+	int                    m64_bars;
>>>>>
>>>>> 	bus = pdev->bus;
>>>>> 	hose = pci_bus_to_host(bus);
>>>>> 	phb = hose->private_data;
>>>>> 	pdn = pci_get_pdn(pdev);
>>>>>
>>>>>+	if (pdn->m64_single_mode)
>>>>>+		m64_bars = num_vfs;
>>>>>+	else
>>>>>+		m64_bars = 1;
>>>>>+
>>>>> 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
>>>>>-		for (j = 0; j < M64_PER_IOV; j++) {
>>>>>-			if (pdn->m64_wins[i][j] == IODA_INVALID_M64)
>>>>>+		for (j = 0; j < m64_bars; j++) {
>>>>>+			if (pdn->m64_map[j][i] == IODA_INVALID_M64)
>>>>> 				continue;
>>>>> 			opal_pci_phb_mmio_enable(phb->opal_id,
>>>>>-				OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0);
>>>>>-			clear_bit(pdn->m64_wins[i][j], &phb->ioda.m64_bar_alloc);
>>>>>-			pdn->m64_wins[i][j] = IODA_INVALID_M64;
>>>>>+				OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 0);
>>>>>+			clear_bit(pdn->m64_map[j][i], &phb->ioda.m64_bar_alloc);
>>>>>+			pdn->m64_map[j][i] = IODA_INVALID_M64;
>>>>> 		}
>>>>>
>>>>>+	kfree(pdn->m64_map);
>>>>> 	return 0;
>>>>> }
>>>>>
>>>>>@@ -1187,8 +1194,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>>>> 	int                    total_vfs;
>>>>> 	resource_size_t        size, start;
>>>>> 	int                    pe_num;
>>>>>-	int                    vf_groups;
>>>>>-	int                    vf_per_group;
>>>>>+	int                    m64_bars;
>>>>>
>>>>> 	bus = pdev->bus;
>>>>> 	hose = pci_bus_to_host(bus);
>>>>>@@ -1196,26 +1202,26 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>>>> 	pdn = pci_get_pdn(pdev);
>>>>> 	total_vfs = pci_sriov_get_totalvfs(pdev);
>>>>>
>>>>>-	/* Initialize the m64_wins to IODA_INVALID_M64 */
>>>>>-	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
>>>>>-		for (j = 0; j < M64_PER_IOV; j++)
>>>>>-			pdn->m64_wins[i][j] = IODA_INVALID_M64;
>>>>>+	if (pdn->m64_single_mode)
>>>>>+		m64_bars = num_vfs;
>>>>>+	else
>>>>>+		m64_bars = 1;
>>>>>+
>>>>>+	pdn->m64_map = kmalloc(sizeof(*pdn->m64_map) * m64_bars, GFP_KERNEL);
>>>>>+	if (!pdn->m64_map)
>>>>>+		return -ENOMEM;
>>>>>+	/* Initialize the m64_map to IODA_INVALID_M64 */
>>>>>+	for (i = 0; i < m64_bars ; i++)
>>>>>+		for (j = 0; j < PCI_SRIOV_NUM_BARS; j++)
>>>>>+			pdn->m64_map[i][j] = IODA_INVALID_M64;
>>>>>
>>>>>-	if (pdn->m64_per_iov == M64_PER_IOV) {
>>>>>-		vf_groups = (num_vfs <= M64_PER_IOV) ? num_vfs: M64_PER_IOV;
>>>>>-		vf_per_group = (num_vfs <= M64_PER_IOV)? 1:
>>>>>-			roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
>>>>>-	} else {
>>>>>-		vf_groups = 1;
>>>>>-		vf_per_group = 1;
>>>>>-	}
>>>>>
>>>>> 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>>>>> 		res = &pdev->resource[i + PCI_IOV_RESOURCES];
>>>>> 		if (!res->flags || !res->parent)
>>>>> 			continue;
>>>>>
>>>>>-		for (j = 0; j < vf_groups; j++) {
>>>>>+		for (j = 0; j < m64_bars; j++) {
>>>>> 			do {
>>>>> 				win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
>>>>> 						phb->ioda.m64_bar_idx + 1, 0);
>>>>>@@ -1224,12 +1230,11 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>>>> 					goto m64_failed;
>>>>> 			} while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc));
>>>>>
>>>>>-			pdn->m64_wins[i][j] = win;
>>>>>+			pdn->m64_map[j][i] = win;
>>>>>
>>>>>-			if (pdn->m64_per_iov == M64_PER_IOV) {
>>>>>+			if (pdn->m64_single_mode) {
>>>>> 				size = pci_iov_resource_size(pdev,
>>>>> 							PCI_IOV_RESOURCES + i);
>>>>>-				size = size * vf_per_group;
>>>>> 				start = res->start + size * j;
>>>>> 			} else {
>>>>> 				size = resource_size(res);
>>>>>@@ -1237,16 +1242,16 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>>>> 			}
>>>>>
>>>>> 			/* Map the M64 here */
>>>>>-			if (pdn->m64_per_iov == M64_PER_IOV) {
>>>>>+			if (pdn->m64_single_mode) {
>>>>> 				pe_num = pdn->offset + j;
>>>>> 				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>>>> 						pe_num, OPAL_M64_WINDOW_TYPE,
>>>>>-						pdn->m64_wins[i][j], 0);
>>>>>+						pdn->m64_map[j][i], 0);
>>>>> 			}
>>>>>
>>>>> 			rc = opal_pci_set_phb_mem_window(phb->opal_id,
>>>>> 						 OPAL_M64_WINDOW_TYPE,
>>>>>-						 pdn->m64_wins[i][j],
>>>>>+						 pdn->m64_map[j][i],
>>>>> 						 start,
>>>>> 						 0, /* unused */
>>>>> 						 size);
>>>>>@@ -1258,12 +1263,12 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>>>> 				goto m64_failed;
>>>>> 			}
>>>>>
>>>>>-			if (pdn->m64_per_iov == M64_PER_IOV)
>>>>>+			if (pdn->m64_single_mode)
>>>>> 				rc = opal_pci_phb_mmio_enable(phb->opal_id,
>>>>>-				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 2);
>>>>>+				     OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 2);
>>>>> 			else
>>>>> 				rc = opal_pci_phb_mmio_enable(phb->opal_id,
>>>>>-				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 1);
>>>>>+				     OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 1);
>>>>>
>>>>> 			if (rc != OPAL_SUCCESS) {
>>>>> 				dev_err(&pdev->dev, "Failed to enable M64 window #%d: %llx\n",
>>>>>@@ -1275,7 +1280,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>>>> 	return 0;
>>>>>
>>>>> m64_failed:
>>>>>-	pnv_pci_vf_release_m64(pdev);
>>>>>+	pnv_pci_vf_release_m64(pdev, num_vfs);
>>>>> 	return -EBUSY;
>>>>> }
>>>>>
>>>>>@@ -1302,15 +1307,13 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
>>>>> 	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
>>>>> }
>>>>>
>>>>>-static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>>>>+static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
>>>>> {
>>>>> 	struct pci_bus        *bus;
>>>>> 	struct pci_controller *hose;
>>>>> 	struct pnv_phb        *phb;
>>>>> 	struct pnv_ioda_pe    *pe, *pe_n;
>>>>> 	struct pci_dn         *pdn;
>>>>>-	u16                    vf_index;
>>>>>-	int64_t                rc;
>>>>>
>>>>> 	bus = pdev->bus;
>>>>> 	hose = pci_bus_to_host(bus);
>>>>>@@ -1320,35 +1323,6 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>>>> 	if (!pdev->is_physfn)
>>>>> 		return;
>>>>>
>>>>>-	if (pdn->m64_per_iov == M64_PER_IOV && num_vfs > M64_PER_IOV) {
>>>>>-		int   vf_group;
>>>>>-		int   vf_per_group;
>>>>>-		int   vf_index1;
>>>>>-
>>>>>-		vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
>>>>>-
>>>>>-		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++)
>>>>>-			for (vf_index = vf_group * vf_per_group;
>>>>>-				vf_index < (vf_group + 1) * vf_per_group &&
>>>>>-				vf_index < num_vfs;
>>>>>-				vf_index++)
>>>>>-				for (vf_index1 = vf_group * vf_per_group;
>>>>>-					vf_index1 < (vf_group + 1) * vf_per_group &&
>>>>>-					vf_index1 < num_vfs;
>>>>>-					vf_index1++){
>>>>>-
>>>>>-					rc = opal_pci_set_peltv(phb->opal_id,
>>>>>-						pdn->offset + vf_index,
>>>>>-						pdn->offset + vf_index1,
>>>>>-						OPAL_REMOVE_PE_FROM_DOMAIN);
>>>>>-
>>>>>-					if (rc)
>>>>>-					    dev_warn(&pdev->dev, "%s: Failed to unlink same group PE#%d(%lld)\n",
>>>>>-						__func__,
>>>>>-						pdn->offset + vf_index1, rc);
>>>>>-				}
>>>>>-	}
>>>>>-
>>>>> 	list_for_each_entry_safe(pe, pe_n, &phb->ioda.pe_list, list) {
>>>>> 		if (pe->parent_dev != pdev)
>>>>> 			continue;
>>>>>@@ -1383,14 +1357,14 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>>>>> 	num_vfs = pdn->num_vfs;
>>>>>
>>>>> 	/* Release VF PEs */
>>>>>-	pnv_ioda_release_vf_PE(pdev, num_vfs);
>>>>>+	pnv_ioda_release_vf_PE(pdev);
>>>>>
>>>>> 	if (phb->type == PNV_PHB_IODA2) {
>>>>>-		if (pdn->m64_per_iov == 1)
>>>>>+		if (!pdn->m64_single_mode)
>>>>> 			pnv_pci_vf_resource_shift(pdev, -pdn->offset);
>>>>>
>>>>> 		/* Release M64 windows */
>>>>>-		pnv_pci_vf_release_m64(pdev);
>>>>>+		pnv_pci_vf_release_m64(pdev, num_vfs);
>>>>>
>>>>> 		/* Release PE numbers */
>>>>> 		bitmap_clear(phb->ioda.pe_alloc, pdn->offset, num_vfs);
>>>>>@@ -1409,7 +1383,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>>>> 	int                    pe_num;
>>>>> 	u16                    vf_index;
>>>>> 	struct pci_dn         *pdn;
>>>>>-	int64_t                rc;
>>>>>
>>>>> 	bus = pdev->bus;
>>>>> 	hose = pci_bus_to_host(bus);
>>>>>@@ -1454,37 +1427,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>>>>
>>>>> 		pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>>>> 	}
>>>>>-
>>>>>-	if (pdn->m64_per_iov == M64_PER_IOV && num_vfs > M64_PER_IOV) {
>>>>>-		int   vf_group;
>>>>>-		int   vf_per_group;
>>>>>-		int   vf_index1;
>>>>>-
>>>>>-		vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
>>>>>-
>>>>>-		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++) {
>>>>>-			for (vf_index = vf_group * vf_per_group;
>>>>>-			     vf_index < (vf_group + 1) * vf_per_group &&
>>>>>-			     vf_index < num_vfs;
>>>>>-			     vf_index++) {
>>>>>-				for (vf_index1 = vf_group * vf_per_group;
>>>>>-				     vf_index1 < (vf_group + 1) * vf_per_group &&
>>>>>-				     vf_index1 < num_vfs;
>>>>>-				     vf_index1++) {
>>>>>-
>>>>>-					rc = opal_pci_set_peltv(phb->opal_id,
>>>>>-						pdn->offset + vf_index,
>>>>>-						pdn->offset + vf_index1,
>>>>>-						OPAL_ADD_PE_TO_DOMAIN);
>>>>>-
>>>>>-					if (rc)
>>>>>-					    dev_warn(&pdev->dev, "%s: Failed to link same group PE#%d(%lld)\n",
>>>>>-						__func__,
>>>>>-						pdn->offset + vf_index1, rc);
>>>>>-				}
>>>>>-			}
>>>>>-		}
>>>>>-	}
>>>>> }
>>>>>
>>>>> int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>>>>@@ -1507,6 +1449,15 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>>>> 			return -ENOSPC;
>>>>> 		}
>>>>>
>>>>>+		/*
>>>>>+		 * When M64 BARs functions in Single PE mode, the number of VFs
>>>>>+		 * could be enabled must be less than the number of M64 BARs.
>>>>>+		 */
>>>>>+		if (pdn->m64_single_mode && num_vfs > phb->ioda.m64_bar_idx) {
>>>>>+			dev_info(&pdev->dev, "Not enough M64 BAR for VFs\n");
>>>>>+			return -EBUSY;
>>>>>+		}
>>>>>+
>>>>> 		/* Calculate available PE for required VFs */
>>>>> 		mutex_lock(&phb->ioda.pe_alloc_mutex);
>>>>> 		pdn->offset = bitmap_find_next_zero_area(
>>>>>@@ -1534,7 +1485,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>>>> 		 * the IOV BAR according to the PE# allocated to the VFs.
>>>>> 		 * Otherwise, the PE# for the VF will conflict with others.
>>>>> 		 */
>>>>>-		if (pdn->m64_per_iov == 1) {
>>>>>+		if (!pdn->m64_single_mode) {
>>>>> 			ret = pnv_pci_vf_resource_shift(pdev, pdn->offset);
>>>>> 			if (ret)
>>>>> 				goto m64_failed;
>>>>>@@ -1567,8 +1518,7 @@ int pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>>>> 	/* Allocate PCI data */
>>>>> 	add_dev_pci_data(pdev);
>>>>>
>>>>>-	pnv_pci_sriov_enable(pdev, num_vfs);
>>>>>-	return 0;
>>>>>+	return pnv_pci_sriov_enable(pdev, num_vfs);
>>>>> }
>>>>> #endif /* CONFIG_PCI_IOV */
>>>>>
>>>>>@@ -2762,9 +2712,9 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>>>>
>>>>> 	pdn = pci_get_pdn(pdev);
>>>>> 	pdn->vfs_expanded = 0;
>>>>>+	pdn->m64_single_mode = false;
>>>>>
>>>>> 	total_vfs = pci_sriov_get_totalvfs(pdev);
>>>>>-	pdn->m64_per_iov = 1;
>>>>> 	mul = phb->ioda.total_pe;
>>>>>
>>>>> 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>>>>>@@ -2784,8 +2734,8 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>>>> 		if (size > (1 << 26)) {
>>>>> 			dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size is bigger than 64M, roundup power2\n",
>>>>> 				 i, res);
>>>>>-			pdn->m64_per_iov = M64_PER_IOV;
>>>>> 			mul = roundup_pow_of_two(total_vfs);
>>>>>+			pdn->m64_single_mode = true;
>>>>> 			break;
>>>>> 		}
>>>>> 	}
>>>>>@@ -2987,6 +2937,8 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
>>>>> static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>>>>> 						      int resno)
>>>>> {
>>>>>+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
>>>>>+	struct pnv_phb *phb = hose->private_data;
>>>>> 	struct pci_dn *pdn = pci_get_pdn(pdev);
>>>>> 	resource_size_t align;
>>>>>
>>>>>@@ -2995,12 +2947,23 @@ static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>>>>> 	 * SR-IOV. While from hardware perspective, the range mapped by M64
>>>>> 	 * BAR should be size aligned.
>>>>> 	 *
>>>>>+	 * When IOV BAR is mapped with M64 BAR in Single PE mode, the extra
>>>>>+	 * powernv-specific hardware restriction is gone. But if just use the
>>>>>+	 * VF BAR size as the alignment, PF BAR / VF BAR may be allocated with
>>>>>+	 * in one segment of M64 #15, which introduces the PE conflict between
>>>>>+	 * PF and VF. Based on this, the minimum alignment of an IOV BAR is
>>>>>+	 * m64_segsize.
>>>>>+	 *
>>>>
>>>>When M64 BARs run in Single PE mode, they still require 32MB alignment. Does
>>>>"the extra powernv-specific hardware restriction" include that limitation
>>>>or not?
>>>>
>>>
>>>Is the 32MB alignment you mentioned the size of the VF BAR?
>>>
>>>The answer is no. "The extra powernv-specific hardware restriction" only
>>>referred to the alignment of the total M64 BAR.
>>>
>>
>>No, it's the PHB's M64 BAR; there are 16 in total. When one is running in Single PE
>>mode, it requires at least 32MB alignment. Extracted from the PHB3 spec:
>>
>>When Single PE = 1, Bits 02:26 are the base address to compare
>>against the MSB 25 AIB address bits, 0:24, out of 0:49 (32MB aligned).
>>Bits 27:31 are the upper 5 bits of the PE# that owns the entire address
>>space defined by this BAR.
>>
>
>Good, it looks like we have another hardware-specific requirement.
>
>However, it is not proper to handle it here. This requirement affects the whole
>PCI resource assignment.
>

Even when the PHB's M64 BAR runs in Shared mode, it still has a minimal alignment.
I'm surprised you didn't already know that... I never suggested fixing it
here, but the comment needs to be fixed here to avoid confusing others.
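
For reference, a minimal sketch of where the 32MB figure in the PHB3 spec
excerpt above comes from (an illustration only, assuming the 50-bit AIB
address width quoted there; not part of the patch or the spec):

#include <stdio.h>

int main(void)
{
        /*
         * The M64 BAR in Single PE mode compares only the 25 most
         * significant bits (0:24) of a 50-bit AIB address (0:49); the
         * remaining 25 low-order bits are not compared, so the match
         * granularity -- and hence the minimum base alignment -- is
         * 2^25 bytes.
         */
        unsigned long long granularity = 1ULL << (50 - 25);

        printf("Single PE mode granularity: %lluMB\n", granularity >> 20);
        return 0;
}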

>The Linux kernel doesn't assign VF BARs individually; it assigns the IOV BAR, and
>each VF BAR is a contiguous piece of it. So this means that if the VF BAR
>is less than 32MB and we want to use an M64 BAR in Single PE mode, two
>VF BARs would sit in one PE.
>

Hrm, how could a VF BAR be assigned individually? Remember there is one set
of PCI config registers tracking one IOV BAR, which may be made up of multiple VF
BARs. From this perspective, a VF BAR is a "virtual" concept without corresponding
"real" config registers.

>My suggestion is to add this check in pnv_pci_ioda_fixup_iov_resources(): when
>we decide to use M64 BARs in Single PE mode to cover an IOV BAR, each VF BAR
>should be at least 32MB. When this condition is met, the alignment required by
>M64 Single PE mode is met too.
>

That would be the best place to check and add another limitation, as far as I
can see for now.

>>>>> 	 * This function returns the total IOV BAR size if M64 BAR is in
>>>>> 	 * Shared PE mode or just the individual size if not.
>>>>>+	 * If the M64 BAR is in Single PE mode, return the VF BAR size or
>>>>>+	 * M64 segment size if IOV BAR size is less.
>>>>> 	 */
>>>>> 	align = pci_iov_resource_size(pdev, resno);
>>>>> 	if (!pdn->vfs_expanded)
>>>>> 		return align;
>>>>>+	if (pdn->m64_single_mode)
>>>>>+		return max(align, (resource_size_t)phb->ioda.m64_segsize);
>>>>>
>>>>> 	return pdn->vfs_expanded * align;
>>>>> }
>>>>>-- 
>>>>>2.5.0
>>>>>
>>>
>>>-- 
>>>Richard Yang
>>>Help you, Help me
>
>-- 
>Richard Yang
>Help you, Help me

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V5 2/6] powerpc/powernv: simplify the calculation of iov resource alignment
  2015-10-13  3:56         ` Wei Yang
@ 2015-10-14  1:22           ` Gavin Shan
  0 siblings, 0 replies; 30+ messages in thread
From: Gavin Shan @ 2015-10-14  1:22 UTC (permalink / raw)
  To: Wei Yang; +Cc: Gavin Shan, aik, benh, linuxppc-dev, mpe

On Tue, Oct 13, 2015 at 11:56:43AM +0800, Wei Yang wrote:
>On Tue, Oct 13, 2015 at 02:27:52PM +1100, Gavin Shan wrote:
>>On Tue, Oct 13, 2015 at 10:45:45AM +0800, Wei Yang wrote:
>>>On Tue, Oct 13, 2015 at 11:13:50AM +1100, Gavin Shan wrote:
>>>>On Fri, Oct 09, 2015 at 10:46:52AM +0800, Wei Yang wrote:
>>>>>The alignment of IOV BAR on PowerNV platform is the total size of the IOV
>>>>>BAR. No matter whether the IOV BAR is extended with number of
>>>>>roundup_pow_of_two(total_vfs) or number of max PE number (256), the total
>>>>>size could be calculated by (vfs_expanded * VF_BAR_size).
>>>>>
>>>>>This patch simplifies the pnv_pci_iov_resource_alignment() by removing the
>>>>>first case.
>>>>>
>>>>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>>>>Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>>Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>---
>>>>> arch/powerpc/platforms/powernv/pci-ioda.c | 20 ++++++++++++--------
>>>>> 1 file changed, 12 insertions(+), 8 deletions(-)
>>>>>
>>>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>index 8c031b5..7da476b 100644
>>>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>@@ -2988,17 +2988,21 @@ static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>>>>> 						      int resno)
>>>>> {
>>>>> 	struct pci_dn *pdn = pci_get_pdn(pdev);
>>>>>-	resource_size_t align, iov_align;
>>>>>-
>>>>>-	iov_align = resource_size(&pdev->resource[resno]);
>>>>>-	if (iov_align)
>>>>>-		return iov_align;
>>>>>+	resource_size_t align;
>>>>>
>>>>>+	/*
>>>>>+	 * On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the
>>>>>+	 * SR-IOV. While from hardware perspective, the range mapped by M64
>>>>>+	 * BAR should be size aligned.
>>>>>+	 *
>>>>>+	 * This function returns the total IOV BAR size if M64 BAR is in
>>>>>+	 * Shared PE mode or just the individual size if not.
>>>>>+	 */
>>>>
>>>>s/the individual size/the VF BAR size/
>>>>
>>>>> 	align = pci_iov_resource_size(pdev, resno);
>>>>>-	if (pdn->vfs_expanded)
>>>>>-		return pdn->vfs_expanded * align;
>>>>>+	if (!pdn->vfs_expanded)
>>>>>+		return align;
>>>>>
>>>>>-	return align;
>>>>>+	return pdn->vfs_expanded * align;
>>>>
>>>>There is no difference before/after the changes. why this change is needed?
>>>>
>>>
>>>After the change the logic is clearer.
>>>
>>>The alignment equals the total size when the IOV BAR is expanded, or the
>>>VF BAR size otherwise. We don't need to check whether the IOV BAR is truncated.
>>>
>>
>>I didn't get what you're talking about with "IOV BAR is truncated". I also
>>didn't see what has really changed in the last 3 lines of the diff.
>>
>>	if (pdn->vfs_expanded)				if (!pdn->vfs_expanded)
>>		return pdn->vfs_expanded * align;		return align;
>>	return align;					return pdn->vfs_expanded * align;
>>
>
>A truncated IOV BAR resource can happen during the sizing stage, since the IOV
>BAR is optional, while we still need to return the correct alignment during
>this process.
>

I don't see how your claim here is related. And the IOV BAR can be
expanded as well, depending on how many VFs the PF is going to support
virtually based on the IOV BAR size.

>The code before and after the change has the same meaning, but the latter is
>prepared for the single-mode M64 BAR. You could look at the final version
>after single-mode M64 is introduced.
>

I still don't see any strong reason to have this - nothing changed, but
it's not a big deal, I guess...
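
For readers following the thread, the "final version" referred to above ends
up looking roughly like this once patch 3/6 is applied on top -- a consolidated
sketch reconstructed from the quoted hunks, not an authoritative copy of the
merged code:

static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
                                                       int resno)
{
        struct pci_controller *hose = pci_bus_to_host(pdev->bus);
        struct pnv_phb *phb = hose->private_data;
        struct pci_dn *pdn = pci_get_pdn(pdev);
        resource_size_t align = pci_iov_resource_size(pdev, resno);

        /* IOV BAR not expanded: the VF BAR size is the alignment */
        if (!pdn->vfs_expanded)
                return align;

        /* Single PE mode: each VF BAR needs at least one M64 segment */
        if (pdn->m64_single_mode)
                return max(align, (resource_size_t)phb->ioda.m64_segsize);

        /* Shared PE mode: the whole expanded IOV BAR must be size aligned */
        return pdn->vfs_expanded * align;
}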

>>>>> }
>>>>> #endif /* CONFIG_PCI_IOV */
>>>>>
>>>>>-- 
>>>>>2.5.0
>>>>>
>>>
>>>-- 
>>>Richard Yang
>>>Help you, Help me
>
>-- 
>Richard Yang
>Help you, Help me

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V5 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
  2015-10-14  1:15           ` Gavin Shan
@ 2015-10-15  7:29             ` Wei Yang
  0 siblings, 0 replies; 30+ messages in thread
From: Wei Yang @ 2015-10-15  7:29 UTC (permalink / raw)
  To: Gavin Shan; +Cc: Wei Yang, aik, benh, linuxppc-dev, mpe

On Wed, Oct 14, 2015 at 12:15:32PM +1100, Gavin Shan wrote:
>On Tue, Oct 13, 2015 at 01:29:44PM +0800, Wei Yang wrote:
>>On Tue, Oct 13, 2015 at 02:32:49PM +1100, Gavin Shan wrote:
>>>On Tue, Oct 13, 2015 at 10:50:42AM +0800, Wei Yang wrote:
>>>>On Tue, Oct 13, 2015 at 10:55:27AM +1100, Gavin Shan wrote:
>>>>>On Fri, Oct 09, 2015 at 10:46:53AM +0800, Wei Yang wrote:
>>>>>>In current implementation, when VF BAR is bigger than 64MB, it uses 4 M64
>>>>>>BARs in Single PE mode to cover the number of VFs required to be enabled.
>>>>>>By doing so, several VFs would be in one VF Group and leads to interference
>>>>>>between VFs in the same group.
>>>>>>
>>>>>>And in this patch, m64_wins is renamed to m64_map, which means index number
>>>>>>of the M64 BAR used to map the VF BAR. Based on Gavin's comments.
>>>>>>
>>>>>>This patch changes the design by using one M64 BAR in Single PE mode for
>>>>>>one VF BAR. This gives absolute isolation for VFs.
>>>>>>
>>>>>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>>>>>Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>>>Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>---
>>>>>> arch/powerpc/include/asm/pci-bridge.h     |   5 +-
>>>>>> arch/powerpc/platforms/powernv/pci-ioda.c | 169 ++++++++++++------------------
>>>>>> 2 files changed, 68 insertions(+), 106 deletions(-)
>>>>>>
>>>>>>diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
>>>>>>index 712add5..8aeba4c 100644
>>>>>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>>>>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>>>>>@@ -214,10 +214,9 @@ struct pci_dn {
>>>>>> 	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
>>>>>> 	u16     num_vfs;		/* number of VFs enabled*/
>>>>>> 	int     offset;			/* PE# for the first VF PE */
>>>>>>-#define M64_PER_IOV 4
>>>>>>-	int     m64_per_iov;
>>>>>>+	bool    m64_single_mode;	/* Use M64 BAR in Single Mode */
>>>>>> #define IODA_INVALID_M64        (-1)
>>>>>>-	int     m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
>>>>>>+	int     (*m64_map)[PCI_SRIOV_NUM_BARS];
>>>>>> #endif /* CONFIG_PCI_IOV */
>>>>>> #endif
>>>>>> 	struct list_head child_list;
>>>>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>>index 7da476b..2886f90 100644
>>>>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>>@@ -1148,29 +1148,36 @@ static void pnv_pci_ioda_setup_PEs(void)
>>>>>> }
>>>>>>
>>>>>> #ifdef CONFIG_PCI_IOV
>>>>>>-static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
>>>>>>+static int pnv_pci_vf_release_m64(struct pci_dev *pdev, u16 num_vfs)
>>>>>> {
>>>>>> 	struct pci_bus        *bus;
>>>>>> 	struct pci_controller *hose;
>>>>>> 	struct pnv_phb        *phb;
>>>>>> 	struct pci_dn         *pdn;
>>>>>> 	int                    i, j;
>>>>>>+	int                    m64_bars;
>>>>>>
>>>>>> 	bus = pdev->bus;
>>>>>> 	hose = pci_bus_to_host(bus);
>>>>>> 	phb = hose->private_data;
>>>>>> 	pdn = pci_get_pdn(pdev);
>>>>>>
>>>>>>+	if (pdn->m64_single_mode)
>>>>>>+		m64_bars = num_vfs;
>>>>>>+	else
>>>>>>+		m64_bars = 1;
>>>>>>+
>>>>>> 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
>>>>>>-		for (j = 0; j < M64_PER_IOV; j++) {
>>>>>>-			if (pdn->m64_wins[i][j] == IODA_INVALID_M64)
>>>>>>+		for (j = 0; j < m64_bars; j++) {
>>>>>>+			if (pdn->m64_map[j][i] == IODA_INVALID_M64)
>>>>>> 				continue;
>>>>>> 			opal_pci_phb_mmio_enable(phb->opal_id,
>>>>>>-				OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0);
>>>>>>-			clear_bit(pdn->m64_wins[i][j], &phb->ioda.m64_bar_alloc);
>>>>>>-			pdn->m64_wins[i][j] = IODA_INVALID_M64;
>>>>>>+				OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 0);
>>>>>>+			clear_bit(pdn->m64_map[j][i], &phb->ioda.m64_bar_alloc);
>>>>>>+			pdn->m64_map[j][i] = IODA_INVALID_M64;
>>>>>> 		}
>>>>>>
>>>>>>+	kfree(pdn->m64_map);
>>>>>> 	return 0;
>>>>>> }
>>>>>>
>>>>>>@@ -1187,8 +1194,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>>>>> 	int                    total_vfs;
>>>>>> 	resource_size_t        size, start;
>>>>>> 	int                    pe_num;
>>>>>>-	int                    vf_groups;
>>>>>>-	int                    vf_per_group;
>>>>>>+	int                    m64_bars;
>>>>>>
>>>>>> 	bus = pdev->bus;
>>>>>> 	hose = pci_bus_to_host(bus);
>>>>>>@@ -1196,26 +1202,26 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>>>>> 	pdn = pci_get_pdn(pdev);
>>>>>> 	total_vfs = pci_sriov_get_totalvfs(pdev);
>>>>>>
>>>>>>-	/* Initialize the m64_wins to IODA_INVALID_M64 */
>>>>>>-	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
>>>>>>-		for (j = 0; j < M64_PER_IOV; j++)
>>>>>>-			pdn->m64_wins[i][j] = IODA_INVALID_M64;
>>>>>>+	if (pdn->m64_single_mode)
>>>>>>+		m64_bars = num_vfs;
>>>>>>+	else
>>>>>>+		m64_bars = 1;
>>>>>>+
>>>>>>+	pdn->m64_map = kmalloc(sizeof(*pdn->m64_map) * m64_bars, GFP_KERNEL);
>>>>>>+	if (!pdn->m64_map)
>>>>>>+		return -ENOMEM;
>>>>>>+	/* Initialize the m64_map to IODA_INVALID_M64 */
>>>>>>+	for (i = 0; i < m64_bars ; i++)
>>>>>>+		for (j = 0; j < PCI_SRIOV_NUM_BARS; j++)
>>>>>>+			pdn->m64_map[i][j] = IODA_INVALID_M64;
>>>>>>
>>>>>>-	if (pdn->m64_per_iov == M64_PER_IOV) {
>>>>>>-		vf_groups = (num_vfs <= M64_PER_IOV) ? num_vfs: M64_PER_IOV;
>>>>>>-		vf_per_group = (num_vfs <= M64_PER_IOV)? 1:
>>>>>>-			roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
>>>>>>-	} else {
>>>>>>-		vf_groups = 1;
>>>>>>-		vf_per_group = 1;
>>>>>>-	}
>>>>>>
>>>>>> 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>>>>>> 		res = &pdev->resource[i + PCI_IOV_RESOURCES];
>>>>>> 		if (!res->flags || !res->parent)
>>>>>> 			continue;
>>>>>>
>>>>>>-		for (j = 0; j < vf_groups; j++) {
>>>>>>+		for (j = 0; j < m64_bars; j++) {
>>>>>> 			do {
>>>>>> 				win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
>>>>>> 						phb->ioda.m64_bar_idx + 1, 0);
>>>>>>@@ -1224,12 +1230,11 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>>>>> 					goto m64_failed;
>>>>>> 			} while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc));
>>>>>>
>>>>>>-			pdn->m64_wins[i][j] = win;
>>>>>>+			pdn->m64_map[j][i] = win;
>>>>>>
>>>>>>-			if (pdn->m64_per_iov == M64_PER_IOV) {
>>>>>>+			if (pdn->m64_single_mode) {
>>>>>> 				size = pci_iov_resource_size(pdev,
>>>>>> 							PCI_IOV_RESOURCES + i);
>>>>>>-				size = size * vf_per_group;
>>>>>> 				start = res->start + size * j;
>>>>>> 			} else {
>>>>>> 				size = resource_size(res);
>>>>>>@@ -1237,16 +1242,16 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>>>>> 			}
>>>>>>
>>>>>> 			/* Map the M64 here */
>>>>>>-			if (pdn->m64_per_iov == M64_PER_IOV) {
>>>>>>+			if (pdn->m64_single_mode) {
>>>>>> 				pe_num = pdn->offset + j;
>>>>>> 				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>>>>> 						pe_num, OPAL_M64_WINDOW_TYPE,
>>>>>>-						pdn->m64_wins[i][j], 0);
>>>>>>+						pdn->m64_map[j][i], 0);
>>>>>> 			}
>>>>>>
>>>>>> 			rc = opal_pci_set_phb_mem_window(phb->opal_id,
>>>>>> 						 OPAL_M64_WINDOW_TYPE,
>>>>>>-						 pdn->m64_wins[i][j],
>>>>>>+						 pdn->m64_map[j][i],
>>>>>> 						 start,
>>>>>> 						 0, /* unused */
>>>>>> 						 size);
>>>>>>@@ -1258,12 +1263,12 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>>>>> 				goto m64_failed;
>>>>>> 			}
>>>>>>
>>>>>>-			if (pdn->m64_per_iov == M64_PER_IOV)
>>>>>>+			if (pdn->m64_single_mode)
>>>>>> 				rc = opal_pci_phb_mmio_enable(phb->opal_id,
>>>>>>-				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 2);
>>>>>>+				     OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 2);
>>>>>> 			else
>>>>>> 				rc = opal_pci_phb_mmio_enable(phb->opal_id,
>>>>>>-				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 1);
>>>>>>+				     OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 1);
>>>>>>
>>>>>> 			if (rc != OPAL_SUCCESS) {
>>>>>> 				dev_err(&pdev->dev, "Failed to enable M64 window #%d: %llx\n",
>>>>>>@@ -1275,7 +1280,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>>>>> 	return 0;
>>>>>>
>>>>>> m64_failed:
>>>>>>-	pnv_pci_vf_release_m64(pdev);
>>>>>>+	pnv_pci_vf_release_m64(pdev, num_vfs);
>>>>>> 	return -EBUSY;
>>>>>> }
>>>>>>
>>>>>>@@ -1302,15 +1307,13 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
>>>>>> 	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
>>>>>> }
>>>>>>
>>>>>>-static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>>>>>+static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
>>>>>> {
>>>>>> 	struct pci_bus        *bus;
>>>>>> 	struct pci_controller *hose;
>>>>>> 	struct pnv_phb        *phb;
>>>>>> 	struct pnv_ioda_pe    *pe, *pe_n;
>>>>>> 	struct pci_dn         *pdn;
>>>>>>-	u16                    vf_index;
>>>>>>-	int64_t                rc;
>>>>>>
>>>>>> 	bus = pdev->bus;
>>>>>> 	hose = pci_bus_to_host(bus);
>>>>>>@@ -1320,35 +1323,6 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>>>>> 	if (!pdev->is_physfn)
>>>>>> 		return;
>>>>>>
>>>>>>-	if (pdn->m64_per_iov == M64_PER_IOV && num_vfs > M64_PER_IOV) {
>>>>>>-		int   vf_group;
>>>>>>-		int   vf_per_group;
>>>>>>-		int   vf_index1;
>>>>>>-
>>>>>>-		vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
>>>>>>-
>>>>>>-		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++)
>>>>>>-			for (vf_index = vf_group * vf_per_group;
>>>>>>-				vf_index < (vf_group + 1) * vf_per_group &&
>>>>>>-				vf_index < num_vfs;
>>>>>>-				vf_index++)
>>>>>>-				for (vf_index1 = vf_group * vf_per_group;
>>>>>>-					vf_index1 < (vf_group + 1) * vf_per_group &&
>>>>>>-					vf_index1 < num_vfs;
>>>>>>-					vf_index1++){
>>>>>>-
>>>>>>-					rc = opal_pci_set_peltv(phb->opal_id,
>>>>>>-						pdn->offset + vf_index,
>>>>>>-						pdn->offset + vf_index1,
>>>>>>-						OPAL_REMOVE_PE_FROM_DOMAIN);
>>>>>>-
>>>>>>-					if (rc)
>>>>>>-					    dev_warn(&pdev->dev, "%s: Failed to unlink same group PE#%d(%lld)\n",
>>>>>>-						__func__,
>>>>>>-						pdn->offset + vf_index1, rc);
>>>>>>-				}
>>>>>>-	}
>>>>>>-
>>>>>> 	list_for_each_entry_safe(pe, pe_n, &phb->ioda.pe_list, list) {
>>>>>> 		if (pe->parent_dev != pdev)
>>>>>> 			continue;
>>>>>>@@ -1383,14 +1357,14 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>>>>>> 	num_vfs = pdn->num_vfs;
>>>>>>
>>>>>> 	/* Release VF PEs */
>>>>>>-	pnv_ioda_release_vf_PE(pdev, num_vfs);
>>>>>>+	pnv_ioda_release_vf_PE(pdev);
>>>>>>
>>>>>> 	if (phb->type == PNV_PHB_IODA2) {
>>>>>>-		if (pdn->m64_per_iov == 1)
>>>>>>+		if (!pdn->m64_single_mode)
>>>>>> 			pnv_pci_vf_resource_shift(pdev, -pdn->offset);
>>>>>>
>>>>>> 		/* Release M64 windows */
>>>>>>-		pnv_pci_vf_release_m64(pdev);
>>>>>>+		pnv_pci_vf_release_m64(pdev, num_vfs);
>>>>>>
>>>>>> 		/* Release PE numbers */
>>>>>> 		bitmap_clear(phb->ioda.pe_alloc, pdn->offset, num_vfs);
>>>>>>@@ -1409,7 +1383,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>>>>> 	int                    pe_num;
>>>>>> 	u16                    vf_index;
>>>>>> 	struct pci_dn         *pdn;
>>>>>>-	int64_t                rc;
>>>>>>
>>>>>> 	bus = pdev->bus;
>>>>>> 	hose = pci_bus_to_host(bus);
>>>>>>@@ -1454,37 +1427,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>>>>>
>>>>>> 		pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>>>>> 	}
>>>>>>-
>>>>>>-	if (pdn->m64_per_iov == M64_PER_IOV && num_vfs > M64_PER_IOV) {
>>>>>>-		int   vf_group;
>>>>>>-		int   vf_per_group;
>>>>>>-		int   vf_index1;
>>>>>>-
>>>>>>-		vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
>>>>>>-
>>>>>>-		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++) {
>>>>>>-			for (vf_index = vf_group * vf_per_group;
>>>>>>-			     vf_index < (vf_group + 1) * vf_per_group &&
>>>>>>-			     vf_index < num_vfs;
>>>>>>-			     vf_index++) {
>>>>>>-				for (vf_index1 = vf_group * vf_per_group;
>>>>>>-				     vf_index1 < (vf_group + 1) * vf_per_group &&
>>>>>>-				     vf_index1 < num_vfs;
>>>>>>-				     vf_index1++) {
>>>>>>-
>>>>>>-					rc = opal_pci_set_peltv(phb->opal_id,
>>>>>>-						pdn->offset + vf_index,
>>>>>>-						pdn->offset + vf_index1,
>>>>>>-						OPAL_ADD_PE_TO_DOMAIN);
>>>>>>-
>>>>>>-					if (rc)
>>>>>>-					    dev_warn(&pdev->dev, "%s: Failed to link same group PE#%d(%lld)\n",
>>>>>>-						__func__,
>>>>>>-						pdn->offset + vf_index1, rc);
>>>>>>-				}
>>>>>>-			}
>>>>>>-		}
>>>>>>-	}
>>>>>> }
>>>>>>
>>>>>> int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>>>>>@@ -1507,6 +1449,15 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>>>>> 			return -ENOSPC;
>>>>>> 		}
>>>>>>
>>>>>>+		/*
>>>>>>+		 * When M64 BARs functions in Single PE mode, the number of VFs
>>>>>>+		 * could be enabled must be less than the number of M64 BARs.
>>>>>>+		 */
>>>>>>+		if (pdn->m64_single_mode && num_vfs > phb->ioda.m64_bar_idx) {
>>>>>>+			dev_info(&pdev->dev, "Not enough M64 BAR for VFs\n");
>>>>>>+			return -EBUSY;
>>>>>>+		}
>>>>>>+
>>>>>> 		/* Calculate available PE for required VFs */
>>>>>> 		mutex_lock(&phb->ioda.pe_alloc_mutex);
>>>>>> 		pdn->offset = bitmap_find_next_zero_area(
>>>>>>@@ -1534,7 +1485,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>>>>> 		 * the IOV BAR according to the PE# allocated to the VFs.
>>>>>> 		 * Otherwise, the PE# for the VF will conflict with others.
>>>>>> 		 */
>>>>>>-		if (pdn->m64_per_iov == 1) {
>>>>>>+		if (!pdn->m64_single_mode) {
>>>>>> 			ret = pnv_pci_vf_resource_shift(pdev, pdn->offset);
>>>>>> 			if (ret)
>>>>>> 				goto m64_failed;
>>>>>>@@ -1567,8 +1518,7 @@ int pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>>>>> 	/* Allocate PCI data */
>>>>>> 	add_dev_pci_data(pdev);
>>>>>>
>>>>>>-	pnv_pci_sriov_enable(pdev, num_vfs);
>>>>>>-	return 0;
>>>>>>+	return pnv_pci_sriov_enable(pdev, num_vfs);
>>>>>> }
>>>>>> #endif /* CONFIG_PCI_IOV */
>>>>>>
>>>>>>@@ -2762,9 +2712,9 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>>>>>
>>>>>> 	pdn = pci_get_pdn(pdev);
>>>>>> 	pdn->vfs_expanded = 0;
>>>>>>+	pdn->m64_single_mode = false;
>>>>>>
>>>>>> 	total_vfs = pci_sriov_get_totalvfs(pdev);
>>>>>>-	pdn->m64_per_iov = 1;
>>>>>> 	mul = phb->ioda.total_pe;
>>>>>>
>>>>>> 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>>>>>>@@ -2784,8 +2734,8 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>>>>> 		if (size > (1 << 26)) {
>>>>>> 			dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size is bigger than 64M, roundup power2\n",
>>>>>> 				 i, res);
>>>>>>-			pdn->m64_per_iov = M64_PER_IOV;
>>>>>> 			mul = roundup_pow_of_two(total_vfs);
>>>>>>+			pdn->m64_single_mode = true;
>>>>>> 			break;
>>>>>> 		}
>>>>>> 	}
>>>>>>@@ -2987,6 +2937,8 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
>>>>>> static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>>>>>> 						      int resno)
>>>>>> {
>>>>>>+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
>>>>>>+	struct pnv_phb *phb = hose->private_data;
>>>>>> 	struct pci_dn *pdn = pci_get_pdn(pdev);
>>>>>> 	resource_size_t align;
>>>>>>
>>>>>>@@ -2995,12 +2947,23 @@ static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>>>>>> 	 * SR-IOV. While from hardware perspective, the range mapped by M64
>>>>>> 	 * BAR should be size aligned.
>>>>>> 	 *
>>>>>>+	 * When IOV BAR is mapped with M64 BAR in Single PE mode, the extra
>>>>>>+	 * powernv-specific hardware restriction is gone. But if just use the
>>>>>>+	 * VF BAR size as the alignment, PF BAR / VF BAR may be allocated with
>>>>>>+	 * in one segment of M64 #15, which introduces the PE conflict between
>>>>>>+	 * PF and VF. Based on this, the minimum alignment of an IOV BAR is
>>>>>>+	 * m64_segsize.
>>>>>>+	 *
>>>>>
>>>>>When M64 BARs run in Single PE mode, they still require 32MB alignment. Does
>>>>>"the extra powernv-specific hardware restriction" include that limitation
>>>>>or not?
>>>>>
>>>>
>>>>Is the 32MB alignment you mentioned the size of the VF BAR?
>>>>
>>>>The answer is no. "The extra powernv-specific hardware restriction" only
>>>>referred to the alignment of the total M64 BAR.
>>>>
>>>
>>>No, it's the PHB's M64 BAR; there are 16 in total. When one is running in Single PE
>>>mode, it requires at least 32MB alignment. Extracted from the PHB3 spec:
>>>
>>>When Single PE = 1, Bits 02:26 are the base address to compare
>>>against the MSB 25 AIB address bits, 0:24, out of 0:49 (32MB aligned).
>>>Bits 27:31 are the upper 5 bits of the PE# that owns the entire address
>>>space defined by this BAR.
>>>
>>
>>Good, it looks like we have another hardware-specific requirement.
>>
>>However, it is not proper to handle it here. This requirement affects the whole
>>PCI resource assignment.
>>
>
>Even when the PHB's M64 BAR runs in Shared mode, it still has a minimal alignment.
>I'm surprised you didn't already know that... I never suggested fixing it
>here, but the comment needs to be fixed here to avoid confusing others.
>
>>The Linux kernel doesn't assign VF BARs individually; it assigns the IOV BAR, and
>>each VF BAR is a contiguous piece of it. So this means that if the VF BAR
>>is less than 32MB and we want to use an M64 BAR in Single PE mode, two
>>VF BARs would sit in one PE.
>>
>
>Hrm, how could a VF BAR be assigned individually? Remember there is one set
>of PCI config registers tracking one IOV BAR, which may be made up of multiple VF
>BARs. From this perspective, a VF BAR is a "virtual" concept without corresponding
>"real" config registers.
>
>>My suggestion is to add this check in pnv_pci_ioda_fixup_iov_resources(): when
>>we decide to use M64 BARs in Single PE mode to cover an IOV BAR, each VF BAR
>>should be at least 32MB. When this condition is met, the alignment required by
>>M64 Single PE mode is met too.
>>
>
>That would be the best place to check and add another limitation, as far as I
>can see for now.
>

Will add this check in the next version.
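
A minimal sketch of how that check might sit inside
pnv_pci_ioda_fixup_iov_resources() from the quoted patch. The surrounding
variables (pdev, pdn, res, size, i) come from that function; the SZ_32M
constant, the message text and the exact fallback behaviour are assumptions,
not part of the posted series:

        /*
         * M64 BARs in Single PE mode have a 32MB-aligned base, and VF
         * BARs are laid out back to back inside the IOV BAR, so every
         * VF BAR must be at least 32MB for each VF to land in its own
         * window (its own PE).
         */
        if (pdn->m64_single_mode && size < SZ_32M) {
                dev_warn(&pdev->dev,
                         "VF BAR%d: %pR is smaller than 32MB, cannot use M64 Single PE mode\n",
                         i, res);
                pdn->vfs_expanded = 0;
                return;
        }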

>>>>>> 	 * This function returns the total IOV BAR size if M64 BAR is in
>>>>>> 	 * Shared PE mode or just the individual size if not.
>>>>>>+	 * If the M64 BAR is in Single PE mode, return the VF BAR size or
>>>>>>+	 * M64 segment size if IOV BAR size is less.
>>>>>> 	 */
>>>>>> 	align = pci_iov_resource_size(pdev, resno);
>>>>>> 	if (!pdn->vfs_expanded)
>>>>>> 		return align;
>>>>>>+	if (pdn->m64_single_mode)
>>>>>>+		return max(align, (resource_size_t)phb->ioda.m64_segsize);
>>>>>>
>>>>>> 	return pdn->vfs_expanded * align;
>>>>>> }
>>>>>>-- 
>>>>>>2.5.0
>>>>>>
>>>>
>>>>-- 
>>>>Richard Yang
>>>>Help you, Help me
>>
>>-- 
>>Richard Yang
>>Help you, Help me

-- 
Richard Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V5 0/6] Redesign SR-IOV on PowerNV
  2015-09-03  1:29 [PATCH V5 0/6] Redesign SR-IOV on PowerNV Wei Yang
@ 2015-09-03  3:21 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2015-09-03  3:21 UTC (permalink / raw)
  To: Wei Yang, gwshan, mpe, benh; +Cc: linuxppc-dev

On 09/03/2015 11:29 AM, Wei Yang wrote:
> In original design, it tries to group VFs to enable more number of VFs in the
> system, when VF BAR is bigger than 64MB. This design has a flaw in which one
> error on a VF will interfere other VFs in the same group.
>
> This patch series change this design by using M64 BAR in Single PE mode to
> cover only one VF BAR. By doing so, it gives absolute isolation between VFs.
>
> v5:
>     * rebase on top of v4.2, with commit 68230242cdb "net/mlx4_core: Add port
>       attribute when tracking counters" reverted
>     * test ssh from guest to host via VF passed and then shutdown the guest
>     * no code change


This is actually a weak test. When I load it more heavily (described in
another mail thread with the "mlx4" subject, please respond there), the
host crashes quite soon.



> v4:
>     * rebase the code on top of v4.2-rc7
>     * switch back to use the dynamic version of pe_num_map and m64_map
>     * split the memory allocation and PE assignment of pe_num_map to make it
>       more easy to read
>     * check pe_num_map value before free PE.
>     * add the rename reason for pe_num_map and m64_map in change log
> v3:
>     * return -ENOSPC when a VF has non-64bit prefetchable BAR
>     * rename offset to pe_num_map and define it staticly
>     * change commit log based on comments
>     * define m64_map staticly
> v2:
>     * clean up iov bar alignment calculation
>     * change m64s to m64_bars
>     * add a field to represent M64 Single PE mode will be used
>     * change m64_wins to m64_map
>     * calculate the gate instead of hard coded
>     * dynamically allocate m64_map
>     * dynamically allocate PE#
>     * add a case to calculate iov bar alignment when M64 Single PE is used
>     * when M64 Single PE is used, compare num_vfs with M64 BAR available number
>       in system at first
>
> Wei Yang (6):
>    powerpc/powernv: don't enable SRIOV when VF BAR has non
>      64bit-prefetchable BAR
>    powerpc/powernv: simplify the calculation of iov resource alignment
>    powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
>    powerpc/powernv: replace the hard coded boundary with gate
>    powerpc/powernv: boundary the total VF BAR size instead of the
>      individual one
>    powerpc/powernv: allocate sparse PE# when using M64 BAR in Single PE
>      mode
>
>   arch/powerpc/include/asm/pci-bridge.h     |   7 +-
>   arch/powerpc/platforms/powernv/pci-ioda.c | 328 ++++++++++++++++--------------
>   2 files changed, 175 insertions(+), 160 deletions(-)
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH V5 0/6] Redesign SR-IOV on PowerNV
@ 2015-09-03  1:29 Wei Yang
  2015-09-03  3:21 ` Alexey Kardashevskiy
  0 siblings, 1 reply; 30+ messages in thread
From: Wei Yang @ 2015-09-03  1:29 UTC (permalink / raw)
  To: aik, gwshan, mpe, benh; +Cc: linuxppc-dev, Wei Yang

In original design, it tries to group VFs to enable more number of VFs in the
system, when VF BAR is bigger than 64MB. This design has a flaw in which one
error on a VF will interfere other VFs in the same group.

This patch series change this design by using M64 BAR in Single PE mode to
cover only one VF BAR. By doing so, it gives absolute isolation between VFs.

v5:
   * rebase on top of v4.2, with commit 68230242cdb "net/mlx4_core: Add port
     attribute when tracking counters" reverted
   * test ssh from guest to host via VF passed and then shutdown the guest
   * no code change
v4:
   * rebase the code on top of v4.2-rc7
   * switch back to use the dynamic version of pe_num_map and m64_map
   * split the memory allocation and PE assignment of pe_num_map to make it
     more easy to read
   * check pe_num_map value before free PE.
   * add the rename reason for pe_num_map and m64_map in change log
v3:
   * return -ENOSPC when a VF has non-64bit prefetchable BAR
   * rename offset to pe_num_map and define it staticly
   * change commit log based on comments
   * define m64_map staticly
v2:
   * clean up iov bar alignment calculation
   * change m64s to m64_bars
   * add a field to represent M64 Single PE mode will be used
   * change m64_wins to m64_map
   * calculate the gate instead of hard coded
   * dynamically allocate m64_map
   * dynamically allocate PE#
   * add a case to calculate iov bar alignment when M64 Single PE is used
   * when M64 Single PE is used, compare num_vfs with M64 BAR available number 
     in system at first

Wei Yang (6):
  powerpc/powernv: don't enable SRIOV when VF BAR has non
    64bit-prefetchable BAR
  powerpc/powernv: simplify the calculation of iov resource alignment
  powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
  powerpc/powernv: replace the hard coded boundary with gate
  powerpc/powernv: boundary the total VF BAR size instead of the
    individual one
  powerpc/powernv: allocate sparse PE# when using M64 BAR in Single PE
    mode

 arch/powerpc/include/asm/pci-bridge.h     |   7 +-
 arch/powerpc/platforms/powernv/pci-ioda.c | 328 ++++++++++++++++--------------
 2 files changed, 175 insertions(+), 160 deletions(-)

-- 
2.5.0

^ permalink raw reply	[flat|nested] 30+ messages in thread

Thread overview: 30+ messages
2015-10-09  2:46 [PATCH V5 0/6] Redesign SR-IOV on PowerNV Wei Yang
2015-10-09  2:46 ` [PATCH V5 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR Wei Yang
2015-10-09  8:15   ` Benjamin Herrenschmidt
2015-10-09  9:02     ` David Laight
2015-10-12  3:16       ` Wei Yang
2015-10-12  2:58     ` Wei Yang
2015-10-12  6:55       ` Benjamin Herrenschmidt
2015-10-13  2:30         ` Wei Yang
2015-10-13  0:01   ` Gavin Shan
2015-10-13  1:49     ` Wei Yang
2015-10-13  3:20       ` Gavin Shan
2015-10-13  3:50         ` Wei Yang
2015-10-09  2:46 ` [PATCH V5 2/6] powerpc/powernv: simplify the calculation of iov resource alignment Wei Yang
2015-10-13  0:13   ` Gavin Shan
2015-10-13  2:45     ` Wei Yang
2015-10-13  3:27       ` Gavin Shan
2015-10-13  3:56         ` Wei Yang
2015-10-14  1:22           ` Gavin Shan
2015-10-09  2:46 ` [PATCH V5 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR Wei Yang
2015-10-12 23:55   ` Gavin Shan
2015-10-13  2:50     ` Wei Yang
2015-10-13  3:32       ` Gavin Shan
2015-10-13  5:29         ` Wei Yang
2015-10-14  1:15           ` Gavin Shan
2015-10-15  7:29             ` Wei Yang
2015-10-09  2:46 ` [PATCH V5 4/6] powerpc/powernv: replace the hard coded boundary with gate Wei Yang
2015-10-09  2:46 ` [PATCH V5 5/6] powerpc/powernv: boundary the total VF BAR size instead of the individual one Wei Yang
2015-10-09  2:46 ` [PATCH V5 6/6] powerpc/powernv: allocate sparse PE# when using M64 BAR in Single PE mode Wei Yang
2015-09-03  1:29 [PATCH V5 0/6] Redesign SR-IOV on PowerNV Wei Yang
2015-09-03  3:21 ` Alexey Kardashevskiy
