linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH V4 0/6] Redesign SR-IOV on PowerNV
@ 2015-08-19  2:01 Wei Yang
  2015-08-19  2:01 ` [PATCH V4 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR Wei Yang
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Wei Yang @ 2015-08-19  2:01 UTC (permalink / raw)
  To: aik, gwshan, benh; +Cc: linuxppc-dev, Wei Yang

In the original design, when a VF BAR is bigger than 64MB, VFs are grouped
together so that more VFs can be enabled in the system. This design has a
flaw: an error on one VF interferes with the other VFs in the same group.

This patch series changes the design to use one M64 BAR in Single PE mode to
cover a single VF BAR. By doing so, it gives absolute isolation between VFs.
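
Roughly, the policy introduced in patch 3 boils down to the following (a
minimal sketch of the logic, lifted from that patch):

        /* one M64 BAR per VF for each IOV BAR in Single PE mode,
         * one shared M64 BAR per IOV BAR otherwise
         */
        if (pdn->m64_single_mode)
                m64_bars = num_vfs;
        else
                m64_bars = 1;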

v4:
   * rebase the code on top of v4.2-rc7
   * switch back to using the dynamic version of pe_num_map and m64_map
   * split the memory allocation and PE assignment of pe_num_map to make it
     easier to read
   * check the pe_num_map value before freeing a PE
   * add the reason for renaming pe_num_map and m64_map to the change log
v3:
   * return -ENOSPC when a VF has a non-64bit prefetchable BAR
   * rename offset to pe_num_map and define it statically
   * change the commit log based on comments
   * define m64_map statically
v2:
   * clean up iov bar alignment calculation
   * change m64s to m64_bars
   * add a field to indicate that M64 Single PE mode will be used
   * change m64_wins to m64_map
   * calculate the gate instead of hard coding it
   * dynamically allocate m64_map
   * dynamically allocate PE#
   * add a case to calculate iov bar alignment when M64 Single PE is used
   * when M64 Single PE is used, first compare num_vfs with the number of
     M64 BARs available in the system



Wei Yang (6):
  powerpc/powernv: don't enable SRIOV when VF BAR has non
    64bit-prefetchable BAR
  powerpc/powernv: simplify the calculation of iov resource alignment
  powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
  powerpc/powernv: replace the hard coded boundary with gate
  powerpc/powernv: boundary the total VF BAR size instead of the
    individual one
  powerpc/powernv: allocate sparse PE# when using M64 BAR in Single PE
    mode

 arch/powerpc/include/asm/pci-bridge.h     |    7 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |  328 +++++++++++++++--------------
 2 files changed, 175 insertions(+), 160 deletions(-)

-- 
1.7.9.5


* [PATCH V4 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR
  2015-08-19  2:01 [PATCH V4 0/6] Redesign SR-IOV on PowerNV Wei Yang
@ 2015-08-19  2:01 ` Wei Yang
  2015-10-02  8:55   ` Alexey Kardashevskiy
  2015-08-19  2:01 ` [PATCH V4 2/6] powerpc/powernv: simplify the calculation of iov resource alignment Wei Yang
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Wei Yang @ 2015-08-19  2:01 UTC (permalink / raw)
  To: aik, gwshan, benh; +Cc: linuxppc-dev, Wei Yang

On PHB_IODA2, we enable SRIOV devices by mapping the IOV BAR with M64 BARs.
If an SRIOV device's IOV BAR is not 64bit-prefetchable, it is not assigned
from the 64bit prefetchable window, which means an M64 BAR can't map it.

This patch makes this restriction explicit and refuses to enable SRIOV for
such devices.
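
For context, the 64bit-prefetchable test used on this path,
pnv_pci_is_mem_pref_64(), essentially checks both resource flags; a minimal
sketch of the condition being enforced (not part of this patch):

        /* an IOV BAR is only mappable by an M64 BAR when it is both
         * 64-bit and prefetchable
         */
        if ((res->flags & (IORESOURCE_MEM_64 | IORESOURCE_PREFETCH)) !=
            (IORESOURCE_MEM_64 | IORESOURCE_PREFETCH))
                return; /* vfs_expanded stays 0, so SRIOV enable is refused */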

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c |   25 +++++++++----------------
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 85cbc96..8c031b5 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
 		if (!res->flags || !res->parent)
 			continue;
 
-		if (!pnv_pci_is_mem_pref_64(res->flags))
-			continue;
-
 		/*
 		 * The actual IOV BAR range is determined by the start address
 		 * and the actual size for num_vfs VFs BAR.  This check is to
@@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
 		if (!res->flags || !res->parent)
 			continue;
 
-		if (!pnv_pci_is_mem_pref_64(res->flags))
-			continue;
-
 		size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
 		res2 = *res;
 		res->start += size * offset;
@@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
 		if (!res->flags || !res->parent)
 			continue;
 
-		if (!pnv_pci_is_mem_pref_64(res->flags))
-			continue;
-
 		for (j = 0; j < vf_groups; j++) {
 			do {
 				win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
@@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 	pdn = pci_get_pdn(pdev);
 
 	if (phb->type == PNV_PHB_IODA2) {
+		if (!pdn->vfs_expanded) {
+			dev_info(&pdev->dev, "don't support this SRIOV device"
+				" with non 64bit-prefetchable IOV BAR\n");
+			return -ENOSPC;
+		}
+
 		/* Calculate available PE for required VFs */
 		mutex_lock(&phb->ioda.pe_alloc_mutex);
 		pdn->offset = bitmap_find_next_zero_area(
@@ -2775,9 +2772,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 		if (!res->flags || res->parent)
 			continue;
 		if (!pnv_pci_is_mem_pref_64(res->flags)) {
-			dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
+			dev_warn(&pdev->dev, "Don't support SR-IOV with"
+					" non M64 VF BAR%d: %pR. \n",
 				 i, res);
-			continue;
+			return;
 		}
 
 		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
@@ -2796,11 +2794,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 		res = &pdev->resource[i + PCI_IOV_RESOURCES];
 		if (!res->flags || res->parent)
 			continue;
-		if (!pnv_pci_is_mem_pref_64(res->flags)) {
-			dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: %pR\n",
-				 i, res);
-			continue;
-		}
 
 		dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
 		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
-- 
1.7.9.5


* [PATCH V4 2/6] powerpc/powernv: simplify the calculation of iov resource alignment
  2015-08-19  2:01 [PATCH V4 0/6] Redesign SR-IOV on PowerNV Wei Yang
  2015-08-19  2:01 ` [PATCH V4 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR Wei Yang
@ 2015-08-19  2:01 ` Wei Yang
  2015-10-02  8:58   ` Alexey Kardashevskiy
  2015-08-19  2:01 ` [PATCH V4 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR Wei Yang
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Wei Yang @ 2015-08-19  2:01 UTC (permalink / raw)
  To: aik, gwshan, benh; +Cc: linuxppc-dev, Wei Yang

On the PowerNV platform, the alignment of an IOV BAR is the total size of
the IOV BAR. No matter whether the IOV BAR is expanded by
roundup_pow_of_two(total_vfs) or by the maximum PE number (256), the total
size can be calculated as (vfs_expanded * VF_BAR_size).

This patch simplifies pnv_pci_iov_resource_alignment() by removing the
first case.
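
For illustration (the sizes below are hypothetical), whichever expansion was
chosen, the returned alignment is simply:

    alignment = vfs_expanded * VF_BAR_size
        e.g.    256 * 2MB   = 512MB  (IOV BAR expanded by the max PE number)
        or       64 * 128MB = 8GB    (expanded by roundup_pow_of_two(total_vfs) = 64)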

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 8c031b5..e3e0acb 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2988,12 +2988,16 @@ static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
 						      int resno)
 {
 	struct pci_dn *pdn = pci_get_pdn(pdev);
-	resource_size_t align, iov_align;
-
-	iov_align = resource_size(&pdev->resource[resno]);
-	if (iov_align)
-		return iov_align;
+	resource_size_t align;
 
+	/*
+	 * On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the
+	 * SR-IOV. While from hardware perspective, the range mapped by M64
+	 * BAR should be size aligned.
+	 *
+	 * This function returns the total IOV BAR size if expanded or just the
+	 * individual size if not.
+	 */
 	align = pci_iov_resource_size(pdev, resno);
 	if (pdn->vfs_expanded)
 		return pdn->vfs_expanded * align;
-- 
1.7.9.5


* [PATCH V4 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
  2015-08-19  2:01 [PATCH V4 0/6] Redesign SR-IOV on PowerNV Wei Yang
  2015-08-19  2:01 ` [PATCH V4 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR Wei Yang
  2015-08-19  2:01 ` [PATCH V4 2/6] powerpc/powernv: simplify the calculation of iov resource alignment Wei Yang
@ 2015-08-19  2:01 ` Wei Yang
  2015-08-19  2:21   ` Gavin Shan
  2015-10-02  9:29   ` Alexey Kardashevskiy
  2015-08-19  2:01 ` [PATCH V4 4/6] powerpc/powernv: replace the hard coded boundary with gate Wei Yang
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 23+ messages in thread
From: Wei Yang @ 2015-08-19  2:01 UTC (permalink / raw)
  To: aik, gwshan, benh; +Cc: linuxppc-dev, Wei Yang

In the current implementation, when a VF BAR is bigger than 64MB, 4 M64
BARs in Single PE mode are used to cover the number of VFs required to be
enabled. By doing so, several VFs end up in one VF group, and an error on
one VF interferes with the other VFs in the same group.

This patch changes the design by using one M64 BAR in Single PE mode for
one VF BAR. This gives absolute isolation for VFs.

Based on Gavin's comments, m64_wins is also renamed to m64_map, which holds
the index numbers of the M64 BARs used to map the VF BARs.
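
For reference, the reworked per-device bookkeeping introduced here looks
roughly like this (types and sizing taken from the patch below):

        /* in struct pci_dn: one row per M64 BAR, one column per IOV BAR */
        int (*m64_map)[PCI_SRIOV_NUM_BARS];

        /* at SRIOV enable time */
        m64_bars = pdn->m64_single_mode ? num_vfs : 1;
        pdn->m64_map = kmalloc(sizeof(*pdn->m64_map) * m64_bars, GFP_KERNEL);
        /* pdn->m64_map[j][i] is the M64 BAR index used to map IOV BAR i
         * for window j (the j-th VF in Single PE mode), or
         * IODA_INVALID_M64 when unused
         */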

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h     |    5 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |  178 ++++++++++++-----------------
 2 files changed, 74 insertions(+), 109 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 712add5..8aeba4c 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -214,10 +214,9 @@ struct pci_dn {
 	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
 	u16     num_vfs;		/* number of VFs enabled*/
 	int     offset;			/* PE# for the first VF PE */
-#define M64_PER_IOV 4
-	int     m64_per_iov;
+	bool    m64_single_mode;	/* Use M64 BAR in Single Mode */
 #define IODA_INVALID_M64        (-1)
-	int     m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
+	int     (*m64_map)[PCI_SRIOV_NUM_BARS];
 #endif /* CONFIG_PCI_IOV */
 #endif
 	struct list_head child_list;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index e3e0acb..de7db1d 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1148,29 +1148,36 @@ static void pnv_pci_ioda_setup_PEs(void)
 }
 
 #ifdef CONFIG_PCI_IOV
-static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
+static int pnv_pci_vf_release_m64(struct pci_dev *pdev, u16 num_vfs)
 {
 	struct pci_bus        *bus;
 	struct pci_controller *hose;
 	struct pnv_phb        *phb;
 	struct pci_dn         *pdn;
 	int                    i, j;
+	int                    m64_bars;
 
 	bus = pdev->bus;
 	hose = pci_bus_to_host(bus);
 	phb = hose->private_data;
 	pdn = pci_get_pdn(pdev);
 
+	if (pdn->m64_single_mode)
+		m64_bars = num_vfs;
+	else
+		m64_bars = 1;
+
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
-		for (j = 0; j < M64_PER_IOV; j++) {
-			if (pdn->m64_wins[i][j] == IODA_INVALID_M64)
+		for (j = 0; j < m64_bars; j++) {
+			if (pdn->m64_map[j][i] == IODA_INVALID_M64)
 				continue;
 			opal_pci_phb_mmio_enable(phb->opal_id,
-				OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0);
-			clear_bit(pdn->m64_wins[i][j], &phb->ioda.m64_bar_alloc);
-			pdn->m64_wins[i][j] = IODA_INVALID_M64;
+				OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 0);
+			clear_bit(pdn->m64_map[j][i], &phb->ioda.m64_bar_alloc);
+			pdn->m64_map[j][i] = IODA_INVALID_M64;
 		}
 
+	kfree(pdn->m64_map);
 	return 0;
 }
 
@@ -1187,8 +1194,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
 	int                    total_vfs;
 	resource_size_t        size, start;
 	int                    pe_num;
-	int                    vf_groups;
-	int                    vf_per_group;
+	int                    m64_bars;
 
 	bus = pdev->bus;
 	hose = pci_bus_to_host(bus);
@@ -1196,26 +1202,26 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
 	pdn = pci_get_pdn(pdev);
 	total_vfs = pci_sriov_get_totalvfs(pdev);
 
-	/* Initialize the m64_wins to IODA_INVALID_M64 */
-	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
-		for (j = 0; j < M64_PER_IOV; j++)
-			pdn->m64_wins[i][j] = IODA_INVALID_M64;
+	if (pdn->m64_single_mode)
+		m64_bars = num_vfs;
+	else
+		m64_bars = 1;
+
+	pdn->m64_map = kmalloc(sizeof(*pdn->m64_map) * m64_bars, GFP_KERNEL);
+	if (!pdn->m64_map)
+		return -ENOMEM;
+	/* Initialize the m64_map to IODA_INVALID_M64 */
+	for (i = 0; i < m64_bars ; i++)
+		for (j = 0; j < PCI_SRIOV_NUM_BARS; j++)
+			pdn->m64_map[i][j] = IODA_INVALID_M64;
 
-	if (pdn->m64_per_iov == M64_PER_IOV) {
-		vf_groups = (num_vfs <= M64_PER_IOV) ? num_vfs: M64_PER_IOV;
-		vf_per_group = (num_vfs <= M64_PER_IOV)? 1:
-			roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
-	} else {
-		vf_groups = 1;
-		vf_per_group = 1;
-	}
 
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
 		res = &pdev->resource[i + PCI_IOV_RESOURCES];
 		if (!res->flags || !res->parent)
 			continue;
 
-		for (j = 0; j < vf_groups; j++) {
+		for (j = 0; j < m64_bars; j++) {
 			do {
 				win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
 						phb->ioda.m64_bar_idx + 1, 0);
@@ -1224,12 +1230,11 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
 					goto m64_failed;
 			} while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc));
 
-			pdn->m64_wins[i][j] = win;
+			pdn->m64_map[j][i] = win;
 
-			if (pdn->m64_per_iov == M64_PER_IOV) {
+			if (pdn->m64_single_mode) {
 				size = pci_iov_resource_size(pdev,
 							PCI_IOV_RESOURCES + i);
-				size = size * vf_per_group;
 				start = res->start + size * j;
 			} else {
 				size = resource_size(res);
@@ -1237,16 +1242,16 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
 			}
 
 			/* Map the M64 here */
-			if (pdn->m64_per_iov == M64_PER_IOV) {
+			if (pdn->m64_single_mode) {
 				pe_num = pdn->offset + j;
 				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
 						pe_num, OPAL_M64_WINDOW_TYPE,
-						pdn->m64_wins[i][j], 0);
+						pdn->m64_map[j][i], 0);
 			}
 
 			rc = opal_pci_set_phb_mem_window(phb->opal_id,
 						 OPAL_M64_WINDOW_TYPE,
-						 pdn->m64_wins[i][j],
+						 pdn->m64_map[j][i],
 						 start,
 						 0, /* unused */
 						 size);
@@ -1258,12 +1263,12 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
 				goto m64_failed;
 			}
 
-			if (pdn->m64_per_iov == M64_PER_IOV)
+			if (pdn->m64_single_mode)
 				rc = opal_pci_phb_mmio_enable(phb->opal_id,
-				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 2);
+				     OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 2);
 			else
 				rc = opal_pci_phb_mmio_enable(phb->opal_id,
-				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 1);
+				     OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 1);
 
 			if (rc != OPAL_SUCCESS) {
 				dev_err(&pdev->dev, "Failed to enable M64 window #%d: %llx\n",
@@ -1275,7 +1280,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
 	return 0;
 
 m64_failed:
-	pnv_pci_vf_release_m64(pdev);
+	pnv_pci_vf_release_m64(pdev, num_vfs);
 	return -EBUSY;
 }
 
@@ -1302,15 +1307,13 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
 	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
 }
 
-static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
+static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
 {
 	struct pci_bus        *bus;
 	struct pci_controller *hose;
 	struct pnv_phb        *phb;
 	struct pnv_ioda_pe    *pe, *pe_n;
 	struct pci_dn         *pdn;
-	u16                    vf_index;
-	int64_t                rc;
 
 	bus = pdev->bus;
 	hose = pci_bus_to_host(bus);
@@ -1320,35 +1323,6 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 	if (!pdev->is_physfn)
 		return;
 
-	if (pdn->m64_per_iov == M64_PER_IOV && num_vfs > M64_PER_IOV) {
-		int   vf_group;
-		int   vf_per_group;
-		int   vf_index1;
-
-		vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
-
-		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++)
-			for (vf_index = vf_group * vf_per_group;
-				vf_index < (vf_group + 1) * vf_per_group &&
-				vf_index < num_vfs;
-				vf_index++)
-				for (vf_index1 = vf_group * vf_per_group;
-					vf_index1 < (vf_group + 1) * vf_per_group &&
-					vf_index1 < num_vfs;
-					vf_index1++){
-
-					rc = opal_pci_set_peltv(phb->opal_id,
-						pdn->offset + vf_index,
-						pdn->offset + vf_index1,
-						OPAL_REMOVE_PE_FROM_DOMAIN);
-
-					if (rc)
-					    dev_warn(&pdev->dev, "%s: Failed to unlink same group PE#%d(%lld)\n",
-						__func__,
-						pdn->offset + vf_index1, rc);
-				}
-	}
-
 	list_for_each_entry_safe(pe, pe_n, &phb->ioda.pe_list, list) {
 		if (pe->parent_dev != pdev)
 			continue;
@@ -1383,14 +1357,14 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
 	num_vfs = pdn->num_vfs;
 
 	/* Release VF PEs */
-	pnv_ioda_release_vf_PE(pdev, num_vfs);
+	pnv_ioda_release_vf_PE(pdev);
 
 	if (phb->type == PNV_PHB_IODA2) {
-		if (pdn->m64_per_iov == 1)
+		if (!pdn->m64_single_mode)
 			pnv_pci_vf_resource_shift(pdev, -pdn->offset);
 
 		/* Release M64 windows */
-		pnv_pci_vf_release_m64(pdev);
+		pnv_pci_vf_release_m64(pdev, num_vfs);
 
 		/* Release PE numbers */
 		bitmap_clear(phb->ioda.pe_alloc, pdn->offset, num_vfs);
@@ -1409,7 +1383,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 	int                    pe_num;
 	u16                    vf_index;
 	struct pci_dn         *pdn;
-	int64_t                rc;
 
 	bus = pdev->bus;
 	hose = pci_bus_to_host(bus);
@@ -1454,37 +1427,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 
 		pnv_pci_ioda2_setup_dma_pe(phb, pe);
 	}
-
-	if (pdn->m64_per_iov == M64_PER_IOV && num_vfs > M64_PER_IOV) {
-		int   vf_group;
-		int   vf_per_group;
-		int   vf_index1;
-
-		vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
-
-		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++) {
-			for (vf_index = vf_group * vf_per_group;
-			     vf_index < (vf_group + 1) * vf_per_group &&
-			     vf_index < num_vfs;
-			     vf_index++) {
-				for (vf_index1 = vf_group * vf_per_group;
-				     vf_index1 < (vf_group + 1) * vf_per_group &&
-				     vf_index1 < num_vfs;
-				     vf_index1++) {
-
-					rc = opal_pci_set_peltv(phb->opal_id,
-						pdn->offset + vf_index,
-						pdn->offset + vf_index1,
-						OPAL_ADD_PE_TO_DOMAIN);
-
-					if (rc)
-					    dev_warn(&pdev->dev, "%s: Failed to link same group PE#%d(%lld)\n",
-						__func__,
-						pdn->offset + vf_index1, rc);
-				}
-			}
-		}
-	}
 }
 
 int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
@@ -1507,6 +1449,15 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 			return -ENOSPC;
 		}
 
+		/*
+		 * When M64 BARs functions in Single PE mode, the number of VFs
+		 * could be enabled must be less than the number of M64 BARs.
+		 */
+		if (pdn->m64_single_mode && num_vfs > phb->ioda.m64_bar_idx) {
+			dev_info(&pdev->dev, "Not enough M64 BAR for VFs\n");
+			return -EBUSY;
+		}
+
 		/* Calculate available PE for required VFs */
 		mutex_lock(&phb->ioda.pe_alloc_mutex);
 		pdn->offset = bitmap_find_next_zero_area(
@@ -1534,7 +1485,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 		 * the IOV BAR according to the PE# allocated to the VFs.
 		 * Otherwise, the PE# for the VF will conflict with others.
 		 */
-		if (pdn->m64_per_iov == 1) {
+		if (!pdn->m64_single_mode) {
 			ret = pnv_pci_vf_resource_shift(pdev, pdn->offset);
 			if (ret)
 				goto m64_failed;
@@ -1567,8 +1518,7 @@ int pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 	/* Allocate PCI data */
 	add_dev_pci_data(pdev);
 
-	pnv_pci_sriov_enable(pdev, num_vfs);
-	return 0;
+	return pnv_pci_sriov_enable(pdev, num_vfs);
 }
 #endif /* CONFIG_PCI_IOV */
 
@@ -2762,9 +2712,9 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 
 	pdn = pci_get_pdn(pdev);
 	pdn->vfs_expanded = 0;
+	pdn->m64_single_mode = false;
 
 	total_vfs = pci_sriov_get_totalvfs(pdev);
-	pdn->m64_per_iov = 1;
 	mul = phb->ioda.total_pe;
 
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
@@ -2784,8 +2734,8 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 		if (size > (1 << 26)) {
 			dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size is bigger than 64M, roundup power2\n",
 				 i, res);
-			pdn->m64_per_iov = M64_PER_IOV;
 			mul = roundup_pow_of_two(total_vfs);
+			pdn->m64_single_mode = true;
 			break;
 		}
 	}
@@ -2987,6 +2937,8 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
 static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
 						      int resno)
 {
+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+	struct pnv_phb *phb = hose->private_data;
 	struct pci_dn *pdn = pci_get_pdn(pdev);
 	resource_size_t align;
 
@@ -2995,12 +2947,26 @@ static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
 	 * SR-IOV. While from hardware perspective, the range mapped by M64
 	 * BAR should be size aligned.
 	 *
+	 * When IOV BAR is mapped with M64 BAR in Single PE mode, the extra
+	 * powernv-specific hardware restriction is gone. But if just use the
+	 * VF BAR size as the alignment, PF BAR / VF BAR may be allocated with
+	 * in one segment of M64 #15, which introduces the PE conflict between
+	 * PF and VF. Based on this, the minimum alignment of an IOV BAR is
+	 * m64_segsize.
+	 *
 	 * This function returns the total IOV BAR size if expanded or just the
-	 * individual size if not.
+	 * individual size if not, when M64 BAR is in Shared PE mode.
+	 * If the M64 BAR is in Single PE mode, return the VF BAR size or
+	 * M64 segment size if IOV BAR size is less.
 	 */
 	align = pci_iov_resource_size(pdev, resno);
-	if (pdn->vfs_expanded)
-		return pdn->vfs_expanded * align;
+	if (pdn->vfs_expanded) {
+		if (pdn->m64_single_mode)
+			return max(align,
+				(resource_size_t)phb->ioda.m64_segsize);
+		else
+			return pdn->vfs_expanded * align;
+	}
 
 	return align;
 }
-- 
1.7.9.5


* [PATCH V4 4/6] powerpc/powernv: replace the hard coded boundary with gate
  2015-08-19  2:01 [PATCH V4 0/6] Redesign SR-IOV on PowerNV Wei Yang
                   ` (2 preceding siblings ...)
  2015-08-19  2:01 ` [PATCH V4 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR Wei Yang
@ 2015-08-19  2:01 ` Wei Yang
  2015-08-19  2:01 ` [PATCH V4 5/6] powerpc/powernv: boundary the total VF BAR size instead of the individual one Wei Yang
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Wei Yang @ 2015-08-19  2:01 UTC (permalink / raw)
  To: aik, gwshan, benh; +Cc: linuxppc-dev, Wei Yang

At the moment the 64bit-prefetchable window can be at most 64GB, a value
currently taken from the device tree. This means that in shared mode the
maximum supported VF BAR size is 64GB/256 = 256MB, but a VF BAR of that
size would, once expanded, exhaust the whole 64bit-prefetchable window. The
design decision is therefore to bound the VF BAR size at 64MB: an expanded
64MB VF BAR occupies only a quarter of the 64bit-prefetchable window, which
is affordable.

This patch replaces the magic 64MB limit with a "gate", which is 1/4 of the
M64 segment size (m64_segsize >> 2), and adds a comment to explain the
reason for it.
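
As a worked example with the numbers quoted above (a 64GB window split into
256 PEs):

    m64_segsize = 64GB / 256       = 256MB
    gate        = m64_segsize >> 2 =  64MB

so on such systems the computed gate matches the previous hard-coded 64MB
limit.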

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c |   28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index de7db1d..b8bc51f 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2696,8 +2696,9 @@ static void pnv_pci_init_ioda_msis(struct pnv_phb *phb) { }
 #ifdef CONFIG_PCI_IOV
 static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 {
-	struct pci_controller *hose;
-	struct pnv_phb *phb;
+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	const resource_size_t gate = phb->ioda.m64_segsize >> 2;
 	struct resource *res;
 	int i;
 	resource_size_t size;
@@ -2707,9 +2708,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 	if (!pdev->is_physfn || pdev->is_added)
 		return;
 
-	hose = pci_bus_to_host(pdev->bus);
-	phb = hose->private_data;
-
 	pdn = pci_get_pdn(pdev);
 	pdn->vfs_expanded = 0;
 	pdn->m64_single_mode = false;
@@ -2730,10 +2728,22 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 
 		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
 
-		/* bigger than 64M */
-		if (size > (1 << 26)) {
-			dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size is bigger than 64M, roundup power2\n",
-				 i, res);
+		/*
+		 * If bigger than quarter of M64 segment size, just round up
+		 * power of two.
+		 *
+		 * Generally, one M64 BAR maps one IOV BAR. To avoid conflict
+		 * with other devices, IOV BAR size is expanded to be
+		 * (total_pe * VF_BAR_size).  When VF_BAR_size is half of M64
+		 * segment size , the expanded size would equal to half of the
+		 * whole M64 space size, which will exhaust the M64 space and
+		 * limit the system flexibility.  This is a design decision to
+		 * set the boundary to quarter of the M64 segment size.
+		 */
+		if (size > gate) {
+			dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size "
+				"is bigger than %lld, roundup power2\n",
+				 i, res, gate);
 			mul = roundup_pow_of_two(total_vfs);
 			pdn->m64_single_mode = true;
 			break;
-- 
1.7.9.5


* [PATCH V4 5/6] powerpc/powernv: boundary the total VF BAR size instead of the individual one
  2015-08-19  2:01 [PATCH V4 0/6] Redesign SR-IOV on PowerNV Wei Yang
                   ` (3 preceding siblings ...)
  2015-08-19  2:01 ` [PATCH V4 4/6] powerpc/powernv: replace the hard coded boundary with gate Wei Yang
@ 2015-08-19  2:01 ` Wei Yang
  2015-10-02  9:51   ` Alexey Kardashevskiy
  2015-08-19  2:01 ` [PATCH V4 6/6] powerpc/powernv: allocate sparse PE# when using M64 BAR in Single PE mode Wei Yang
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Wei Yang @ 2015-08-19  2:01 UTC (permalink / raw)
  To: aik, gwshan, benh; +Cc: linuxppc-dev, Wei Yang

Each VF can have at most 6 BARs. Even when every individual BAR is below the
gate, a total VF BAR size that exceeds the gate will still exhaust the M64
window after expansion.

This patch therefore applies the boundary check to the total VF BAR size
instead of to each individual BAR.
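
As an illustration (the BAR sizes here are hypothetical):

    per-BAR sizes:      32MB + 32MB + 32MB   (each below a 64MB gate)
    total VF BAR size:  96MB > 64MB gate     -> switch to Single PE mode

The old per-BAR check would have accepted this device for shared mode even
though the expanded IOV BARs would exhaust the M64 window.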

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index b8bc51f..4bc83b8 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2701,7 +2701,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 	const resource_size_t gate = phb->ioda.m64_segsize >> 2;
 	struct resource *res;
 	int i;
-	resource_size_t size;
+	resource_size_t size, total_vf_bar_sz;
 	struct pci_dn *pdn;
 	int mul, total_vfs;
 
@@ -2714,6 +2714,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 
 	total_vfs = pci_sriov_get_totalvfs(pdev);
 	mul = phb->ioda.total_pe;
+	total_vf_bar_sz = 0;
 
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
 		res = &pdev->resource[i + PCI_IOV_RESOURCES];
@@ -2726,7 +2727,8 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 			return;
 		}
 
-		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
+		total_vf_bar_sz += pci_iov_resource_size(pdev,
+				i + PCI_IOV_RESOURCES);
 
 		/*
 		 * If bigger than quarter of M64 segment size, just round up
@@ -2740,11 +2742,11 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 		 * limit the system flexibility.  This is a design decision to
 		 * set the boundary to quarter of the M64 segment size.
 		 */
-		if (size > gate) {
-			dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size "
-				"is bigger than %lld, roundup power2\n",
-				 i, res, gate);
+		if (total_vf_bar_sz > gate) {
 			mul = roundup_pow_of_two(total_vfs);
+			dev_info(&pdev->dev,
+				"VF BAR Total IOV size %llx > %llx, roundup to %d VFs\n",
+				total_vf_bar_sz, gate, mul);
 			pdn->m64_single_mode = true;
 			break;
 		}
-- 
1.7.9.5


* [PATCH V4 6/6] powerpc/powernv: allocate sparse PE# when using M64 BAR in Single PE mode
  2015-08-19  2:01 [PATCH V4 0/6] Redesign SR-IOV on PowerNV Wei Yang
                   ` (4 preceding siblings ...)
  2015-08-19  2:01 ` [PATCH V4 5/6] powerpc/powernv: boundary the total VF BAR size instead of the individual one Wei Yang
@ 2015-08-19  2:01 ` Wei Yang
  2015-08-19  2:21   ` Gavin Shan
  2015-10-02 10:05   ` Alexey Kardashevskiy
  2015-08-26  5:11 ` [PATCH V4 0/6] Redesign SR-IOV on PowerNV Alexey Kardashevskiy
  2015-10-02 10:07 ` Alexey Kardashevskiy
  7 siblings, 2 replies; 23+ messages in thread
From: Wei Yang @ 2015-08-19  2:01 UTC (permalink / raw)
  To: aik, gwshan, benh; +Cc: linuxppc-dev, Wei Yang

When the M64 BAR is set to Single PE mode, the PE#s assigned to VFs can be
sparse.

This patch restructures the code to allocate sparse PE#s for VFs when the
M64 BAR is set to Single PE mode. It also renames offset to pe_num_map to
reflect that it holds the PE numbers.
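
The resulting PE# lookup, as implemented in the patch below, is roughly:

        /* Single PE mode: pe_num_map is an array holding one (possibly
         * sparse) PE# per VF; otherwise it holds the contiguous base PE#.
         */
        if (pdn->m64_single_mode)
                pe_num = pdn->pe_num_map[vf_index];
        else
                pe_num = *pdn->pe_num_map + vf_index;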

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci-bridge.h     |    2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |   79 ++++++++++++++++++++++-------
 2 files changed, 61 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 8aeba4c..b3a226b 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -213,7 +213,7 @@ struct pci_dn {
 #ifdef CONFIG_PCI_IOV
 	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
 	u16     num_vfs;		/* number of VFs enabled*/
-	int     offset;			/* PE# for the first VF PE */
+	int     *pe_num_map;		/* PE# for the first VF PE or array */
 	bool    m64_single_mode;	/* Use M64 BAR in Single Mode */
 #define IODA_INVALID_M64        (-1)
 	int     (*m64_map)[PCI_SRIOV_NUM_BARS];
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 4bc83b8..779f52a 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1243,7 +1243,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
 
 			/* Map the M64 here */
 			if (pdn->m64_single_mode) {
-				pe_num = pdn->offset + j;
+				pe_num = pdn->pe_num_map[j];
 				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
 						pe_num, OPAL_M64_WINDOW_TYPE,
 						pdn->m64_map[j][i], 0);
@@ -1347,7 +1347,7 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
 	struct pnv_phb        *phb;
 	struct pci_dn         *pdn;
 	struct pci_sriov      *iov;
-	u16 num_vfs;
+	u16                    num_vfs, i;
 
 	bus = pdev->bus;
 	hose = pci_bus_to_host(bus);
@@ -1361,14 +1361,21 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
 
 	if (phb->type == PNV_PHB_IODA2) {
 		if (!pdn->m64_single_mode)
-			pnv_pci_vf_resource_shift(pdev, -pdn->offset);
+			pnv_pci_vf_resource_shift(pdev, -*pdn->pe_num_map);
 
 		/* Release M64 windows */
 		pnv_pci_vf_release_m64(pdev, num_vfs);
 
 		/* Release PE numbers */
-		bitmap_clear(phb->ioda.pe_alloc, pdn->offset, num_vfs);
-		pdn->offset = 0;
+		if (pdn->m64_single_mode) {
+			for (i = 0; i < num_vfs; i++) {
+				if (pdn->pe_num_map[i] != IODA_INVALID_PE)
+					pnv_ioda_free_pe(phb, pdn->pe_num_map[i]);
+			}
+		} else
+			bitmap_clear(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
+		/* Releasing pe_num_map */
+		kfree(pdn->pe_num_map);
 	}
 }
 
@@ -1394,7 +1401,10 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 
 	/* Reserve PE for each VF */
 	for (vf_index = 0; vf_index < num_vfs; vf_index++) {
-		pe_num = pdn->offset + vf_index;
+		if (pdn->m64_single_mode)
+			pe_num = pdn->pe_num_map[vf_index];
+		else
+			pe_num = *pdn->pe_num_map + vf_index;
 
 		pe = &phb->ioda.pe_array[pe_num];
 		pe->pe_number = pe_num;
@@ -1436,6 +1446,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 	struct pnv_phb        *phb;
 	struct pci_dn         *pdn;
 	int                    ret;
+	u16                    i;
 
 	bus = pdev->bus;
 	hose = pci_bus_to_host(bus);
@@ -1458,20 +1469,42 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 			return -EBUSY;
 		}
 
+		/* Allocating pe_num_map */
+		if (pdn->m64_single_mode)
+			pdn->pe_num_map = kmalloc(sizeof(*pdn->pe_num_map) * num_vfs,
+					GFP_KERNEL);
+		else
+			pdn->pe_num_map = kmalloc(sizeof(*pdn->pe_num_map), GFP_KERNEL);
+
+		if (!pdn->pe_num_map)
+			return -ENOMEM;
+
 		/* Calculate available PE for required VFs */
-		mutex_lock(&phb->ioda.pe_alloc_mutex);
-		pdn->offset = bitmap_find_next_zero_area(
-			phb->ioda.pe_alloc, phb->ioda.total_pe,
-			0, num_vfs, 0);
-		if (pdn->offset >= phb->ioda.total_pe) {
+		if (pdn->m64_single_mode) {
+			for (i = 0; i < num_vfs; i++)
+				pdn->pe_num_map[i] = IODA_INVALID_PE;
+			for (i = 0; i < num_vfs; i++) {
+				pdn->pe_num_map[i] = pnv_ioda_alloc_pe(phb);
+				if (pdn->pe_num_map[i] == IODA_INVALID_PE) {
+					ret = -EBUSY;
+					goto m64_failed;
+				}
+			}
+		} else {
+			mutex_lock(&phb->ioda.pe_alloc_mutex);
+			*pdn->pe_num_map = bitmap_find_next_zero_area(
+				phb->ioda.pe_alloc, phb->ioda.total_pe,
+				0, num_vfs, 0);
+			if (*pdn->pe_num_map >= phb->ioda.total_pe) {
+				mutex_unlock(&phb->ioda.pe_alloc_mutex);
+				dev_info(&pdev->dev, "Failed to enable VF%d\n", num_vfs);
+				kfree(pdn->pe_num_map);
+				return -EBUSY;
+			}
+			bitmap_set(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
 			mutex_unlock(&phb->ioda.pe_alloc_mutex);
-			dev_info(&pdev->dev, "Failed to enable VF%d\n", num_vfs);
-			pdn->offset = 0;
-			return -EBUSY;
 		}
-		bitmap_set(phb->ioda.pe_alloc, pdn->offset, num_vfs);
 		pdn->num_vfs = num_vfs;
-		mutex_unlock(&phb->ioda.pe_alloc_mutex);
 
 		/* Assign M64 window accordingly */
 		ret = pnv_pci_vf_assign_m64(pdev, num_vfs);
@@ -1486,7 +1519,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 		 * Otherwise, the PE# for the VF will conflict with others.
 		 */
 		if (!pdn->m64_single_mode) {
-			ret = pnv_pci_vf_resource_shift(pdev, pdn->offset);
+			ret = pnv_pci_vf_resource_shift(pdev, *pdn->pe_num_map);
 			if (ret)
 				goto m64_failed;
 		}
@@ -1498,8 +1531,16 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
 	return 0;
 
 m64_failed:
-	bitmap_clear(phb->ioda.pe_alloc, pdn->offset, num_vfs);
-	pdn->offset = 0;
+	if (pdn->m64_single_mode) {
+		for (i = 0; i < num_vfs; i++) {
+			if (pdn->pe_num_map[i] != IODA_INVALID_PE)
+				pnv_ioda_free_pe(phb, pdn->pe_num_map[i]);
+		}
+	} else
+		bitmap_clear(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
+
+	/* Releasing pe_num_map */
+	kfree(pdn->pe_num_map);
 
 	return ret;
 }
-- 
1.7.9.5


* Re: [PATCH V4 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
  2015-08-19  2:01 ` [PATCH V4 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR Wei Yang
@ 2015-08-19  2:21   ` Gavin Shan
  2015-10-02  9:29   ` Alexey Kardashevskiy
  1 sibling, 0 replies; 23+ messages in thread
From: Gavin Shan @ 2015-08-19  2:21 UTC (permalink / raw)
  To: Wei Yang; +Cc: aik, gwshan, benh, linuxppc-dev

On Wed, Aug 19, 2015 at 10:01:41AM +0800, Wei Yang wrote:
>In the current implementation, when a VF BAR is bigger than 64MB, 4 M64
>BARs in Single PE mode are used to cover the number of VFs required to be
>enabled. By doing so, several VFs end up in one VF group, and an error on
>one VF interferes with the other VFs in the same group.
>
>This patch changes the design by using one M64 BAR in Single PE mode for
>one VF BAR. This gives absolute isolation for VFs.
>
>Based on Gavin's comments, m64_wins is also renamed to m64_map, which holds
>the index numbers of the M64 BARs used to map the VF BARs.
>
>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>

Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>

>---
> arch/powerpc/include/asm/pci-bridge.h     |    5 +-
> arch/powerpc/platforms/powernv/pci-ioda.c |  178 ++++++++++++-----------------
> 2 files changed, 74 insertions(+), 109 deletions(-)
>
>diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
>index 712add5..8aeba4c 100644
>--- a/arch/powerpc/include/asm/pci-bridge.h
>+++ b/arch/powerpc/include/asm/pci-bridge.h
>@@ -214,10 +214,9 @@ struct pci_dn {
> 	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
> 	u16     num_vfs;		/* number of VFs enabled*/
> 	int     offset;			/* PE# for the first VF PE */
>-#define M64_PER_IOV 4
>-	int     m64_per_iov;
>+	bool    m64_single_mode;	/* Use M64 BAR in Single Mode */
> #define IODA_INVALID_M64        (-1)
>-	int     m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
>+	int     (*m64_map)[PCI_SRIOV_NUM_BARS];
> #endif /* CONFIG_PCI_IOV */
> #endif
> 	struct list_head child_list;
>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>index e3e0acb..de7db1d 100644
>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>@@ -1148,29 +1148,36 @@ static void pnv_pci_ioda_setup_PEs(void)
> }
>
> #ifdef CONFIG_PCI_IOV
>-static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
>+static int pnv_pci_vf_release_m64(struct pci_dev *pdev, u16 num_vfs)
> {
> 	struct pci_bus        *bus;
> 	struct pci_controller *hose;
> 	struct pnv_phb        *phb;
> 	struct pci_dn         *pdn;
> 	int                    i, j;
>+	int                    m64_bars;
>
> 	bus = pdev->bus;
> 	hose = pci_bus_to_host(bus);
> 	phb = hose->private_data;
> 	pdn = pci_get_pdn(pdev);
>
>+	if (pdn->m64_single_mode)
>+		m64_bars = num_vfs;
>+	else
>+		m64_bars = 1;
>+
> 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
>-		for (j = 0; j < M64_PER_IOV; j++) {
>-			if (pdn->m64_wins[i][j] == IODA_INVALID_M64)
>+		for (j = 0; j < m64_bars; j++) {
>+			if (pdn->m64_map[j][i] == IODA_INVALID_M64)
> 				continue;
> 			opal_pci_phb_mmio_enable(phb->opal_id,
>-				OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0);
>-			clear_bit(pdn->m64_wins[i][j], &phb->ioda.m64_bar_alloc);
>-			pdn->m64_wins[i][j] = IODA_INVALID_M64;
>+				OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 0);
>+			clear_bit(pdn->m64_map[j][i], &phb->ioda.m64_bar_alloc);
>+			pdn->m64_map[j][i] = IODA_INVALID_M64;
> 		}
>
>+	kfree(pdn->m64_map);
> 	return 0;
> }
>
>@@ -1187,8 +1194,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
> 	int                    total_vfs;
> 	resource_size_t        size, start;
> 	int                    pe_num;
>-	int                    vf_groups;
>-	int                    vf_per_group;
>+	int                    m64_bars;
>
> 	bus = pdev->bus;
> 	hose = pci_bus_to_host(bus);
>@@ -1196,26 +1202,26 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
> 	pdn = pci_get_pdn(pdev);
> 	total_vfs = pci_sriov_get_totalvfs(pdev);
>
>-	/* Initialize the m64_wins to IODA_INVALID_M64 */
>-	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
>-		for (j = 0; j < M64_PER_IOV; j++)
>-			pdn->m64_wins[i][j] = IODA_INVALID_M64;
>+	if (pdn->m64_single_mode)
>+		m64_bars = num_vfs;
>+	else
>+		m64_bars = 1;
>+
>+	pdn->m64_map = kmalloc(sizeof(*pdn->m64_map) * m64_bars, GFP_KERNEL);
>+	if (!pdn->m64_map)
>+		return -ENOMEM;
>+	/* Initialize the m64_map to IODA_INVALID_M64 */
>+	for (i = 0; i < m64_bars ; i++)
>+		for (j = 0; j < PCI_SRIOV_NUM_BARS; j++)
>+			pdn->m64_map[i][j] = IODA_INVALID_M64;
>
>-	if (pdn->m64_per_iov == M64_PER_IOV) {
>-		vf_groups = (num_vfs <= M64_PER_IOV) ? num_vfs: M64_PER_IOV;
>-		vf_per_group = (num_vfs <= M64_PER_IOV)? 1:
>-			roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
>-	} else {
>-		vf_groups = 1;
>-		vf_per_group = 1;
>-	}
>
> 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> 		res = &pdev->resource[i + PCI_IOV_RESOURCES];
> 		if (!res->flags || !res->parent)
> 			continue;
>
>-		for (j = 0; j < vf_groups; j++) {
>+		for (j = 0; j < m64_bars; j++) {
> 			do {
> 				win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
> 						phb->ioda.m64_bar_idx + 1, 0);
>@@ -1224,12 +1230,11 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
> 					goto m64_failed;
> 			} while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc));
>
>-			pdn->m64_wins[i][j] = win;
>+			pdn->m64_map[j][i] = win;
>
>-			if (pdn->m64_per_iov == M64_PER_IOV) {
>+			if (pdn->m64_single_mode) {
> 				size = pci_iov_resource_size(pdev,
> 							PCI_IOV_RESOURCES + i);
>-				size = size * vf_per_group;
> 				start = res->start + size * j;
> 			} else {
> 				size = resource_size(res);
>@@ -1237,16 +1242,16 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
> 			}
>
> 			/* Map the M64 here */
>-			if (pdn->m64_per_iov == M64_PER_IOV) {
>+			if (pdn->m64_single_mode) {
> 				pe_num = pdn->offset + j;
> 				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
> 						pe_num, OPAL_M64_WINDOW_TYPE,
>-						pdn->m64_wins[i][j], 0);
>+						pdn->m64_map[j][i], 0);
> 			}
>
> 			rc = opal_pci_set_phb_mem_window(phb->opal_id,
> 						 OPAL_M64_WINDOW_TYPE,
>-						 pdn->m64_wins[i][j],
>+						 pdn->m64_map[j][i],
> 						 start,
> 						 0, /* unused */
> 						 size);
>@@ -1258,12 +1263,12 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
> 				goto m64_failed;
> 			}
>
>-			if (pdn->m64_per_iov == M64_PER_IOV)
>+			if (pdn->m64_single_mode)
> 				rc = opal_pci_phb_mmio_enable(phb->opal_id,
>-				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 2);
>+				     OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 2);
> 			else
> 				rc = opal_pci_phb_mmio_enable(phb->opal_id,
>-				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 1);
>+				     OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 1);
>
> 			if (rc != OPAL_SUCCESS) {
> 				dev_err(&pdev->dev, "Failed to enable M64 window #%d: %llx\n",
>@@ -1275,7 +1280,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
> 	return 0;
>
> m64_failed:
>-	pnv_pci_vf_release_m64(pdev);
>+	pnv_pci_vf_release_m64(pdev, num_vfs);
> 	return -EBUSY;
> }
>
>@@ -1302,15 +1307,13 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
> 	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
> }
>
>-static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>+static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
> {
> 	struct pci_bus        *bus;
> 	struct pci_controller *hose;
> 	struct pnv_phb        *phb;
> 	struct pnv_ioda_pe    *pe, *pe_n;
> 	struct pci_dn         *pdn;
>-	u16                    vf_index;
>-	int64_t                rc;
>
> 	bus = pdev->bus;
> 	hose = pci_bus_to_host(bus);
>@@ -1320,35 +1323,6 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
> 	if (!pdev->is_physfn)
> 		return;
>
>-	if (pdn->m64_per_iov == M64_PER_IOV && num_vfs > M64_PER_IOV) {
>-		int   vf_group;
>-		int   vf_per_group;
>-		int   vf_index1;
>-
>-		vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
>-
>-		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++)
>-			for (vf_index = vf_group * vf_per_group;
>-				vf_index < (vf_group + 1) * vf_per_group &&
>-				vf_index < num_vfs;
>-				vf_index++)
>-				for (vf_index1 = vf_group * vf_per_group;
>-					vf_index1 < (vf_group + 1) * vf_per_group &&
>-					vf_index1 < num_vfs;
>-					vf_index1++){
>-
>-					rc = opal_pci_set_peltv(phb->opal_id,
>-						pdn->offset + vf_index,
>-						pdn->offset + vf_index1,
>-						OPAL_REMOVE_PE_FROM_DOMAIN);
>-
>-					if (rc)
>-					    dev_warn(&pdev->dev, "%s: Failed to unlink same group PE#%d(%lld)\n",
>-						__func__,
>-						pdn->offset + vf_index1, rc);
>-				}
>-	}
>-
> 	list_for_each_entry_safe(pe, pe_n, &phb->ioda.pe_list, list) {
> 		if (pe->parent_dev != pdev)
> 			continue;
>@@ -1383,14 +1357,14 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
> 	num_vfs = pdn->num_vfs;
>
> 	/* Release VF PEs */
>-	pnv_ioda_release_vf_PE(pdev, num_vfs);
>+	pnv_ioda_release_vf_PE(pdev);
>
> 	if (phb->type == PNV_PHB_IODA2) {
>-		if (pdn->m64_per_iov == 1)
>+		if (!pdn->m64_single_mode)
> 			pnv_pci_vf_resource_shift(pdev, -pdn->offset);
>
> 		/* Release M64 windows */
>-		pnv_pci_vf_release_m64(pdev);
>+		pnv_pci_vf_release_m64(pdev, num_vfs);
>
> 		/* Release PE numbers */
> 		bitmap_clear(phb->ioda.pe_alloc, pdn->offset, num_vfs);
>@@ -1409,7 +1383,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
> 	int                    pe_num;
> 	u16                    vf_index;
> 	struct pci_dn         *pdn;
>-	int64_t                rc;
>
> 	bus = pdev->bus;
> 	hose = pci_bus_to_host(bus);
>@@ -1454,37 +1427,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>
> 		pnv_pci_ioda2_setup_dma_pe(phb, pe);
> 	}
>-
>-	if (pdn->m64_per_iov == M64_PER_IOV && num_vfs > M64_PER_IOV) {
>-		int   vf_group;
>-		int   vf_per_group;
>-		int   vf_index1;
>-
>-		vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
>-
>-		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++) {
>-			for (vf_index = vf_group * vf_per_group;
>-			     vf_index < (vf_group + 1) * vf_per_group &&
>-			     vf_index < num_vfs;
>-			     vf_index++) {
>-				for (vf_index1 = vf_group * vf_per_group;
>-				     vf_index1 < (vf_group + 1) * vf_per_group &&
>-				     vf_index1 < num_vfs;
>-				     vf_index1++) {
>-
>-					rc = opal_pci_set_peltv(phb->opal_id,
>-						pdn->offset + vf_index,
>-						pdn->offset + vf_index1,
>-						OPAL_ADD_PE_TO_DOMAIN);
>-
>-					if (rc)
>-					    dev_warn(&pdev->dev, "%s: Failed to link same group PE#%d(%lld)\n",
>-						__func__,
>-						pdn->offset + vf_index1, rc);
>-				}
>-			}
>-		}
>-	}
> }
>
> int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>@@ -1507,6 +1449,15 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
> 			return -ENOSPC;
> 		}
>
>+		/*
>+		 * When M64 BARs functions in Single PE mode, the number of VFs
>+		 * could be enabled must be less than the number of M64 BARs.
>+		 */
>+		if (pdn->m64_single_mode && num_vfs > phb->ioda.m64_bar_idx) {
>+			dev_info(&pdev->dev, "Not enough M64 BAR for VFs\n");
>+			return -EBUSY;
>+		}
>+
> 		/* Calculate available PE for required VFs */
> 		mutex_lock(&phb->ioda.pe_alloc_mutex);
> 		pdn->offset = bitmap_find_next_zero_area(
>@@ -1534,7 +1485,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
> 		 * the IOV BAR according to the PE# allocated to the VFs.
> 		 * Otherwise, the PE# for the VF will conflict with others.
> 		 */
>-		if (pdn->m64_per_iov == 1) {
>+		if (!pdn->m64_single_mode) {
> 			ret = pnv_pci_vf_resource_shift(pdev, pdn->offset);
> 			if (ret)
> 				goto m64_failed;
>@@ -1567,8 +1518,7 @@ int pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
> 	/* Allocate PCI data */
> 	add_dev_pci_data(pdev);
>
>-	pnv_pci_sriov_enable(pdev, num_vfs);
>-	return 0;
>+	return pnv_pci_sriov_enable(pdev, num_vfs);
> }
> #endif /* CONFIG_PCI_IOV */
>
>@@ -2762,9 +2712,9 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>
> 	pdn = pci_get_pdn(pdev);
> 	pdn->vfs_expanded = 0;
>+	pdn->m64_single_mode = false;
>
> 	total_vfs = pci_sriov_get_totalvfs(pdev);
>-	pdn->m64_per_iov = 1;
> 	mul = phb->ioda.total_pe;
>
> 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>@@ -2784,8 +2734,8 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
> 		if (size > (1 << 26)) {
> 			dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size is bigger than 64M, roundup power2\n",
> 				 i, res);
>-			pdn->m64_per_iov = M64_PER_IOV;
> 			mul = roundup_pow_of_two(total_vfs);
>+			pdn->m64_single_mode = true;
> 			break;
> 		}
> 	}
>@@ -2987,6 +2937,8 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
> static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
> 						      int resno)
> {
>+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
>+	struct pnv_phb *phb = hose->private_data;
> 	struct pci_dn *pdn = pci_get_pdn(pdev);
> 	resource_size_t align;
>
>@@ -2995,12 +2947,26 @@ static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
> 	 * SR-IOV. While from hardware perspective, the range mapped by M64
> 	 * BAR should be size aligned.
> 	 *
>+	 * When IOV BAR is mapped with M64 BAR in Single PE mode, the extra
>+	 * powernv-specific hardware restriction is gone. But if just use the
>+	 * VF BAR size as the alignment, PF BAR / VF BAR may be allocated with
>+	 * in one segment of M64 #15, which introduces the PE conflict between
>+	 * PF and VF. Based on this, the minimum alignment of an IOV BAR is
>+	 * m64_segsize.
>+	 *
> 	 * This function returns the total IOV BAR size if expanded or just the
>-	 * individual size if not.
>+	 * individual size if not, when M64 BAR is in Shared PE mode.
>+	 * If the M64 BAR is in Single PE mode, return the VF BAR size or
>+	 * M64 segment size if IOV BAR size is less.
> 	 */
> 	align = pci_iov_resource_size(pdev, resno);
>-	if (pdn->vfs_expanded)
>-		return pdn->vfs_expanded * align;
>+	if (pdn->vfs_expanded) {
>+		if (pdn->m64_single_mode)
>+			return max(align,
>+				(resource_size_t)phb->ioda.m64_segsize);
>+		else
>+			return pdn->vfs_expanded * align;
>+	}
>
> 	return align;
> }
>-- 
>1.7.9.5
>


* Re: [PATCH V4 6/6] powerpc/powernv: allocate sparse PE# when using M64 BAR in Single PE mode
  2015-08-19  2:01 ` [PATCH V4 6/6] powerpc/powernv: allocate sparse PE# when using M64 BAR in Single PE mode Wei Yang
@ 2015-08-19  2:21   ` Gavin Shan
  2015-10-02 10:05   ` Alexey Kardashevskiy
  1 sibling, 0 replies; 23+ messages in thread
From: Gavin Shan @ 2015-08-19  2:21 UTC (permalink / raw)
  To: Wei Yang; +Cc: aik, gwshan, benh, linuxppc-dev

On Wed, Aug 19, 2015 at 10:01:44AM +0800, Wei Yang wrote:
>When the M64 BAR is set to Single PE mode, the PE#s assigned to VFs can be
>sparse.
>
>This patch restructures the code to allocate sparse PE#s for VFs when the
>M64 BAR is set to Single PE mode. It also renames offset to pe_num_map to
>reflect that it holds the PE numbers.
>
>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>

Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>

>---
> arch/powerpc/include/asm/pci-bridge.h     |    2 +-
> arch/powerpc/platforms/powernv/pci-ioda.c |   79 ++++++++++++++++++++++-------
> 2 files changed, 61 insertions(+), 20 deletions(-)
>
>diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
>index 8aeba4c..b3a226b 100644
>--- a/arch/powerpc/include/asm/pci-bridge.h
>+++ b/arch/powerpc/include/asm/pci-bridge.h
>@@ -213,7 +213,7 @@ struct pci_dn {
> #ifdef CONFIG_PCI_IOV
> 	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
> 	u16     num_vfs;		/* number of VFs enabled*/
>-	int     offset;			/* PE# for the first VF PE */
>+	int     *pe_num_map;		/* PE# for the first VF PE or array */
> 	bool    m64_single_mode;	/* Use M64 BAR in Single Mode */
> #define IODA_INVALID_M64        (-1)
> 	int     (*m64_map)[PCI_SRIOV_NUM_BARS];
>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>index 4bc83b8..779f52a 100644
>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>@@ -1243,7 +1243,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>
> 			/* Map the M64 here */
> 			if (pdn->m64_single_mode) {
>-				pe_num = pdn->offset + j;
>+				pe_num = pdn->pe_num_map[j];
> 				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
> 						pe_num, OPAL_M64_WINDOW_TYPE,
> 						pdn->m64_map[j][i], 0);
>@@ -1347,7 +1347,7 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
> 	struct pnv_phb        *phb;
> 	struct pci_dn         *pdn;
> 	struct pci_sriov      *iov;
>-	u16 num_vfs;
>+	u16                    num_vfs, i;
>
> 	bus = pdev->bus;
> 	hose = pci_bus_to_host(bus);
>@@ -1361,14 +1361,21 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>
> 	if (phb->type == PNV_PHB_IODA2) {
> 		if (!pdn->m64_single_mode)
>-			pnv_pci_vf_resource_shift(pdev, -pdn->offset);
>+			pnv_pci_vf_resource_shift(pdev, -*pdn->pe_num_map);
>
> 		/* Release M64 windows */
> 		pnv_pci_vf_release_m64(pdev, num_vfs);
>
> 		/* Release PE numbers */
>-		bitmap_clear(phb->ioda.pe_alloc, pdn->offset, num_vfs);
>-		pdn->offset = 0;
>+		if (pdn->m64_single_mode) {
>+			for (i = 0; i < num_vfs; i++) {
>+				if (pdn->pe_num_map[i] != IODA_INVALID_PE)
>+					pnv_ioda_free_pe(phb, pdn->pe_num_map[i]);
>+			}
>+		} else
>+			bitmap_clear(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
>+		/* Releasing pe_num_map */
>+		kfree(pdn->pe_num_map);
> 	}
> }
>
>@@ -1394,7 +1401,10 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>
> 	/* Reserve PE for each VF */
> 	for (vf_index = 0; vf_index < num_vfs; vf_index++) {
>-		pe_num = pdn->offset + vf_index;
>+		if (pdn->m64_single_mode)
>+			pe_num = pdn->pe_num_map[vf_index];
>+		else
>+			pe_num = *pdn->pe_num_map + vf_index;
>
> 		pe = &phb->ioda.pe_array[pe_num];
> 		pe->pe_number = pe_num;
>@@ -1436,6 +1446,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
> 	struct pnv_phb        *phb;
> 	struct pci_dn         *pdn;
> 	int                    ret;
>+	u16                    i;
>
> 	bus = pdev->bus;
> 	hose = pci_bus_to_host(bus);
>@@ -1458,20 +1469,42 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
> 			return -EBUSY;
> 		}
>
>+		/* Allocating pe_num_map */
>+		if (pdn->m64_single_mode)
>+			pdn->pe_num_map = kmalloc(sizeof(*pdn->pe_num_map) * num_vfs,
>+					GFP_KERNEL);
>+		else
>+			pdn->pe_num_map = kmalloc(sizeof(*pdn->pe_num_map), GFP_KERNEL);
>+
>+		if (!pdn->pe_num_map)
>+			return -ENOMEM;
>+
> 		/* Calculate available PE for required VFs */
>-		mutex_lock(&phb->ioda.pe_alloc_mutex);
>-		pdn->offset = bitmap_find_next_zero_area(
>-			phb->ioda.pe_alloc, phb->ioda.total_pe,
>-			0, num_vfs, 0);
>-		if (pdn->offset >= phb->ioda.total_pe) {
>+		if (pdn->m64_single_mode) {
>+			for (i = 0; i < num_vfs; i++)
>+				pdn->pe_num_map[i] = IODA_INVALID_PE;
>+			for (i = 0; i < num_vfs; i++) {
>+				pdn->pe_num_map[i] = pnv_ioda_alloc_pe(phb);
>+				if (pdn->pe_num_map[i] == IODA_INVALID_PE) {
>+					ret = -EBUSY;
>+					goto m64_failed;
>+				}
>+			}
>+		} else {
>+			mutex_lock(&phb->ioda.pe_alloc_mutex);
>+			*pdn->pe_num_map = bitmap_find_next_zero_area(
>+				phb->ioda.pe_alloc, phb->ioda.total_pe,
>+				0, num_vfs, 0);
>+			if (*pdn->pe_num_map >= phb->ioda.total_pe) {
>+				mutex_unlock(&phb->ioda.pe_alloc_mutex);
>+				dev_info(&pdev->dev, "Failed to enable VF%d\n", num_vfs);
>+				kfree(pdn->pe_num_map);
>+				return -EBUSY;
>+			}
>+			bitmap_set(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
> 			mutex_unlock(&phb->ioda.pe_alloc_mutex);
>-			dev_info(&pdev->dev, "Failed to enable VF%d\n", num_vfs);
>-			pdn->offset = 0;
>-			return -EBUSY;
> 		}
>-		bitmap_set(phb->ioda.pe_alloc, pdn->offset, num_vfs);
> 		pdn->num_vfs = num_vfs;
>-		mutex_unlock(&phb->ioda.pe_alloc_mutex);
>
> 		/* Assign M64 window accordingly */
> 		ret = pnv_pci_vf_assign_m64(pdev, num_vfs);
>@@ -1486,7 +1519,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
> 		 * Otherwise, the PE# for the VF will conflict with others.
> 		 */
> 		if (!pdn->m64_single_mode) {
>-			ret = pnv_pci_vf_resource_shift(pdev, pdn->offset);
>+			ret = pnv_pci_vf_resource_shift(pdev, *pdn->pe_num_map);
> 			if (ret)
> 				goto m64_failed;
> 		}
>@@ -1498,8 +1531,16 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
> 	return 0;
>
> m64_failed:
>-	bitmap_clear(phb->ioda.pe_alloc, pdn->offset, num_vfs);
>-	pdn->offset = 0;
>+	if (pdn->m64_single_mode) {
>+		for (i = 0; i < num_vfs; i++) {
>+			if (pdn->pe_num_map[i] != IODA_INVALID_PE)
>+				pnv_ioda_free_pe(phb, pdn->pe_num_map[i]);
>+		}
>+	} else
>+		bitmap_clear(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
>+
>+	/* Releasing pe_num_map */
>+	kfree(pdn->pe_num_map);
>
> 	return ret;
> }
>-- 
>1.7.9.5
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V4 0/6] Redesign SR-IOV on PowerNV
  2015-08-19  2:01 [PATCH V4 0/6] Redesign SR-IOV on PowerNV Wei Yang
                   ` (5 preceding siblings ...)
  2015-08-19  2:01 ` [PATCH V4 6/6] powerpc/powernv: allocate sparse PE# when using M64 BAR in Single PE mode Wei Yang
@ 2015-08-26  5:11 ` Alexey Kardashevskiy
  2015-08-26  8:06   ` Alexey Kardashevskiy
  2015-10-02 10:07 ` Alexey Kardashevskiy
  7 siblings, 1 reply; 23+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-26  5:11 UTC (permalink / raw)
  To: Wei Yang, gwshan, benh; +Cc: linuxppc-dev

On 08/19/2015 12:01 PM, Wei Yang wrote:
> In original design, it tries to group VFs to enable more number of VFs in the
> system, when VF BAR is bigger than 64MB. This design has a flaw in which one
> error on a VF will interfere other VFs in the same group.
>
> This patch series change this design by using M64 BAR in Single PE mode to
> cover only one VF BAR. By doing so, it gives absolute isolation between VFs.

With or without this patchset, this fails with a horrible loop of EEHs:
rmmod mlx4_en mlx4_ib mlx4_core
modprobe mlx4_core num_vfs=4 probe_vf=4 port_type_array=2,2 debug_level=1

No guest is needed, just boot and do these commands. The EEH error is 
pointing to a broken DMA address. iommu=nobypass fixed it for the 4 VFs case, 
but when I try 16 VFs, none is created.

What is the correct base tree and what hardware did you use for the testing 
_exactly_?

Mine is "Ethernet controller: Mellanox Technologies MT27520 Family 
[ConnectX-3 Pro]" with 128MB BARs and that works (just double checked - can 
create all 16 VFs) with PowerKVM 3.1 so it is not a hardware issue.



>
> v4:
>     * rebase the code on top of v4.2-rc7
>     * switch back to use the dynamic version of pe_num_map and m64_map
>     * split the memory allocation and PE assignment of pe_num_map to make it
>       more easy to read
>     * check pe_num_map value before free PE.
>     * add the rename reason for pe_num_map and m64_map in change log
> v3:
>     * return -ENOSPC when a VF has non-64bit prefetchable BAR
>     * rename offset to pe_num_map and define it staticly
>     * change commit log based on comments
>     * define m64_map staticly
> v2:
>     * clean up iov bar alignment calculation
>     * change m64s to m64_bars
>     * add a field to represent M64 Single PE mode will be used
>     * change m64_wins to m64_map
>     * calculate the gate instead of hard coded
>     * dynamically allocate m64_map
>     * dynamically allocate PE#
>     * add a case to calculate iov bar alignment when M64 Single PE is used
>     * when M64 Single PE is used, compare num_vfs with M64 BAR available number
>       in system at first
>
>
>
> Wei Yang (6):
>    powerpc/powernv: don't enable SRIOV when VF BAR has non
>      64bit-prefetchable BAR
>    powerpc/powernv: simplify the calculation of iov resource alignment
>    powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
>    powerpc/powernv: replace the hard coded boundary with gate
>    powerpc/powernv: boundary the total VF BAR size instead of the
>      individual one
>    powerpc/powernv: allocate sparse PE# when using M64 BAR in Single PE
>      mode
>
>   arch/powerpc/include/asm/pci-bridge.h     |    7 +-
>   arch/powerpc/platforms/powernv/pci-ioda.c |  328 +++++++++++++++--------------
>   2 files changed, 175 insertions(+), 160 deletions(-)
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V4 0/6] Redesign SR-IOV on PowerNV
  2015-08-26  5:11 ` [PATCH V4 0/6] Redesign SR-IOV on PowerNV Alexey Kardashevskiy
@ 2015-08-26  8:06   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 23+ messages in thread
From: Alexey Kardashevskiy @ 2015-08-26  8:06 UTC (permalink / raw)
  To: Wei Yang, gwshan, benh; +Cc: linuxppc-dev, Michael Ellerman

On 08/26/2015 03:11 PM, Alexey Kardashevskiy wrote:
> On 08/19/2015 12:01 PM, Wei Yang wrote:
>> In original design, it tries to group VFs to enable more number of VFs in
>> the
>> system, when VF BAR is bigger than 64MB. This design has a flaw in which one
>> error on a VF will interfere other VFs in the same group.
>>
>> This patch series change this design by using M64 BAR in Single PE mode to
>> cover only one VF BAR. By doing so, it gives absolute isolation between VFs.
>
> With or without this patchset, this fails with a horrible loop of EEHs:
> rmmod mlx4_en mlx4_ib mlx4_core
> modprobe mlx4_core num_vfs=4 probe_vf=4 port_type_array=2,2 debug_level=1
>
> No guest is needed, just boot and do these commands. The EEH error is
> pointing to a broken DMA address. iommu=nobypass fixed it for the 4 VFs case,
> but when I try 16 VFs, none is created.
>
> What is the correct base tree and what hardware did you use for the testing
> _exactly_?
>
> Mine is "Ethernet controller: Mellanox Technologies MT27520 Family
> [ConnectX-3 Pro]" with 128MB BARs and that works (just double checked - can
> create all 16 VFs) with PowerKVM 3.1 so it is not a hardware issue.


This turned out to be a powerpc/next tree problem, not this patch or 
anything related to SRIOV. Debugging now...



-- 
Alexey

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V4 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR
  2015-08-19  2:01 ` [PATCH V4 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR Wei Yang
@ 2015-10-02  8:55   ` Alexey Kardashevskiy
  2015-10-08  6:29     ` Wei Yang
  0 siblings, 1 reply; 23+ messages in thread
From: Alexey Kardashevskiy @ 2015-10-02  8:55 UTC (permalink / raw)
  To: Wei Yang, gwshan, benh; +Cc: linuxppc-dev

On 08/19/2015 12:01 PM, Wei Yang wrote:
> On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If
> a SRIOV device's IOV BAR is not 64bit-prefetchable, this is not assigned
> from 64bit prefetchable window, which means M64 BAR can't work on it.


Please change the commit log to explain what limit came from where.
Something like:

PCI bridges support only 2 windows and the kernel code programs bridges in 
the way that one window is 32bit-nonprefetchable and another one is 
64bit-prefetchable. So if devices' IOV BAR is 64bit and non-prefetchable, 
it will be mapped into 32bit space and therefore M64 cannot be used for it.


>
> This patch makes this explicit.
>
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c |   25 +++++++++----------------
>   1 file changed, 9 insertions(+), 16 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 85cbc96..8c031b5 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>   		if (!res->flags || !res->parent)
>   			continue;
>
> -		if (!pnv_pci_is_mem_pref_64(res->flags))
> -			continue;
> -
>   		/*
>   		 * The actual IOV BAR range is determined by the start address
>   		 * and the actual size for num_vfs VFs BAR.  This check is to
> @@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>   		if (!res->flags || !res->parent)
>   			continue;
>
> -		if (!pnv_pci_is_mem_pref_64(res->flags))
> -			continue;
> -
>   		size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
>   		res2 = *res;
>   		res->start += size * offset;
> @@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>   		if (!res->flags || !res->parent)
>   			continue;
>
> -		if (!pnv_pci_is_mem_pref_64(res->flags))
> -			continue;
> -
>   		for (j = 0; j < vf_groups; j++) {
>   			do {
>   				win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
> @@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>   	pdn = pci_get_pdn(pdev);
>
>   	if (phb->type == PNV_PHB_IODA2) {
> +		if (!pdn->vfs_expanded) {

The patch claims it does make the limitation explicit but it is not clear 
at all how to trace from vfs_expanded==0 to "non 64bit-prefetchable IOV BAR".


> +			dev_info(&pdev->dev, "don't support this SRIOV device"
> +				" with non 64bit-prefetchable IOV BAR\n");
> +			return -ENOSPC;
> +		}
> +
>   		/* Calculate available PE for required VFs */
>   		mutex_lock(&phb->ioda.pe_alloc_mutex);
>   		pdn->offset = bitmap_find_next_zero_area(
> @@ -2775,9 +2772,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>   		if (!res->flags || res->parent)
>   			continue;
>   		if (!pnv_pci_is_mem_pref_64(res->flags)) {
> -			dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
> +			dev_warn(&pdev->dev, "Don't support SR-IOV with"
> +					" non M64 VF BAR%d: %pR. \n",
>   				 i, res);
> -			continue;
> +			return;
>   		}
>
>   		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
> @@ -2796,11 +2794,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>   		res = &pdev->resource[i + PCI_IOV_RESOURCES];
>   		if (!res->flags || res->parent)
>   			continue;
> -		if (!pnv_pci_is_mem_pref_64(res->flags)) {


And this check was quite clear. I'd keep this one.


> -			dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: %pR\n",
> -				 i, res);
> -			continue;
> -		}
>
>   		dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
>   		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V4 2/6] powerpc/powernv: simplify the calculation of iov resource alignment
  2015-08-19  2:01 ` [PATCH V4 2/6] powerpc/powernv: simplify the calculation of iov resource alignment Wei Yang
@ 2015-10-02  8:58   ` Alexey Kardashevskiy
  2015-10-08  6:39     ` Wei Yang
  0 siblings, 1 reply; 23+ messages in thread
From: Alexey Kardashevskiy @ 2015-10-02  8:58 UTC (permalink / raw)
  To: Wei Yang, gwshan, benh; +Cc: linuxppc-dev

On 08/19/2015 12:01 PM, Wei Yang wrote:
> The alignment of IOV BAR on PowerNV platform is the total size of the IOV
> BAR. No matter whether the IOV BAR is extended with number of
> roundup_pow_of_two(total_vfs) or number of max PE number (256), the total
> size could be calculated by (vfs_expanded * VF_BAR_size).
>
> This patch simplifies the pnv_pci_iov_resource_alignment() by removing the
> first case.
>
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c |   14 +++++++++-----
>   1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 8c031b5..e3e0acb 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2988,12 +2988,16 @@ static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>   						      int resno)
>   {
>   	struct pci_dn *pdn = pci_get_pdn(pdev);
> -	resource_size_t align, iov_align;
> -
> -	iov_align = resource_size(&pdev->resource[resno]);
> -	if (iov_align)
> -		return iov_align;
> +	resource_size_t align;
>
> +	/*
> +	 * On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the
> +	 * SR-IOV. While from hardware perspective, the range mapped by M64
> +	 * BAR should be size aligned.


Out of curiosity - IOV BAR does NOT have to be aligned on other platforms?


> +	 *
> +	 * This function returns the total IOV BAR size if expanded or just the
> +	 * individual size if not.

Expanded vs. non-expanded means "using shared M64" (when it is split by 256 
segments) vs. "using entire M64"?


> +	 */
>   	align = pci_iov_resource_size(pdev, resno);
>   	if (pdn->vfs_expanded)
>   		return pdn->vfs_expanded * align;
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V4 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
  2015-08-19  2:01 ` [PATCH V4 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR Wei Yang
  2015-08-19  2:21   ` Gavin Shan
@ 2015-10-02  9:29   ` Alexey Kardashevskiy
  2015-10-08  7:06     ` Wei Yang
  1 sibling, 1 reply; 23+ messages in thread
From: Alexey Kardashevskiy @ 2015-10-02  9:29 UTC (permalink / raw)
  To: Wei Yang, gwshan, benh; +Cc: linuxppc-dev

On 08/19/2015 12:01 PM, Wei Yang wrote:
> In current implementation, when VF BAR is bigger than 64MB, it uses 4 M64
> BARs in Single PE mode to cover the number of VFs required to be enabled.
> By doing so, several VFs would be in one VF Group and leads to interference
> between VFs in the same group.
>
> And in this patch, m64_wins is renamed to m64_map, which means index number
> of the M64 BAR used to map the VF BAR. Based on Gavin's comments.
>
> This patch changes the design by using one M64 BAR in Single PE mode for
> one VF BAR. This gives absolute isolation for VFs.
>
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> ---
>   arch/powerpc/include/asm/pci-bridge.h     |    5 +-
>   arch/powerpc/platforms/powernv/pci-ioda.c |  178 ++++++++++++-----------------
>   2 files changed, 74 insertions(+), 109 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
> index 712add5..8aeba4c 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -214,10 +214,9 @@ struct pci_dn {
>   	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
>   	u16     num_vfs;		/* number of VFs enabled*/
>   	int     offset;			/* PE# for the first VF PE */
> -#define M64_PER_IOV 4
> -	int     m64_per_iov;
> +	bool    m64_single_mode;	/* Use M64 BAR in Single Mode */
>   #define IODA_INVALID_M64        (-1)
> -	int     m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
> +	int     (*m64_map)[PCI_SRIOV_NUM_BARS];
>   #endif /* CONFIG_PCI_IOV */
>   #endif
>   	struct list_head child_list;
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index e3e0acb..de7db1d 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1148,29 +1148,36 @@ static void pnv_pci_ioda_setup_PEs(void)
>   }
>
>   #ifdef CONFIG_PCI_IOV
> -static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
> +static int pnv_pci_vf_release_m64(struct pci_dev *pdev, u16 num_vfs)
>   {
>   	struct pci_bus        *bus;
>   	struct pci_controller *hose;
>   	struct pnv_phb        *phb;
>   	struct pci_dn         *pdn;
>   	int                    i, j;
> +	int                    m64_bars;
>
>   	bus = pdev->bus;
>   	hose = pci_bus_to_host(bus);
>   	phb = hose->private_data;
>   	pdn = pci_get_pdn(pdev);
>
> +	if (pdn->m64_single_mode)
> +		m64_bars = num_vfs;
> +	else
> +		m64_bars = 1;
> +
>   	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
> -		for (j = 0; j < M64_PER_IOV; j++) {
> -			if (pdn->m64_wins[i][j] == IODA_INVALID_M64)
> +		for (j = 0; j < m64_bars; j++) {
> +			if (pdn->m64_map[j][i] == IODA_INVALID_M64)
>   				continue;
>   			opal_pci_phb_mmio_enable(phb->opal_id,
> -				OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0);
> -			clear_bit(pdn->m64_wins[i][j], &phb->ioda.m64_bar_alloc);
> -			pdn->m64_wins[i][j] = IODA_INVALID_M64;
> +				OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 0);
> +			clear_bit(pdn->m64_map[j][i], &phb->ioda.m64_bar_alloc);
> +			pdn->m64_map[j][i] = IODA_INVALID_M64;
>   		}
>
> +	kfree(pdn->m64_map);
>   	return 0;
>   }
>
> @@ -1187,8 +1194,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>   	int                    total_vfs;
>   	resource_size_t        size, start;
>   	int                    pe_num;
> -	int                    vf_groups;
> -	int                    vf_per_group;
> +	int                    m64_bars;
>
>   	bus = pdev->bus;
>   	hose = pci_bus_to_host(bus);
> @@ -1196,26 +1202,26 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>   	pdn = pci_get_pdn(pdev);
>   	total_vfs = pci_sriov_get_totalvfs(pdev);
>
> -	/* Initialize the m64_wins to IODA_INVALID_M64 */
> -	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
> -		for (j = 0; j < M64_PER_IOV; j++)
> -			pdn->m64_wins[i][j] = IODA_INVALID_M64;
> +	if (pdn->m64_single_mode)
> +		m64_bars = num_vfs;
> +	else
> +		m64_bars = 1;
> +
> +	pdn->m64_map = kmalloc(sizeof(*pdn->m64_map) * m64_bars, GFP_KERNEL);
> +	if (!pdn->m64_map)
> +		return -ENOMEM;
> +	/* Initialize the m64_map to IODA_INVALID_M64 */
> +	for (i = 0; i < m64_bars ; i++)
> +		for (j = 0; j < PCI_SRIOV_NUM_BARS; j++)
> +			pdn->m64_map[i][j] = IODA_INVALID_M64;
>
> -	if (pdn->m64_per_iov == M64_PER_IOV) {
> -		vf_groups = (num_vfs <= M64_PER_IOV) ? num_vfs: M64_PER_IOV;
> -		vf_per_group = (num_vfs <= M64_PER_IOV)? 1:
> -			roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
> -	} else {
> -		vf_groups = 1;
> -		vf_per_group = 1;
> -	}
>
>   	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>   		res = &pdev->resource[i + PCI_IOV_RESOURCES];
>   		if (!res->flags || !res->parent)
>   			continue;
>
> -		for (j = 0; j < vf_groups; j++) {
> +		for (j = 0; j < m64_bars; j++) {
>   			do {
>   				win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
>   						phb->ioda.m64_bar_idx + 1, 0);
> @@ -1224,12 +1230,11 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>   					goto m64_failed;
>   			} while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc));
>
> -			pdn->m64_wins[i][j] = win;
> +			pdn->m64_map[j][i] = win;
>
> -			if (pdn->m64_per_iov == M64_PER_IOV) {
> +			if (pdn->m64_single_mode) {
>   				size = pci_iov_resource_size(pdev,
>   							PCI_IOV_RESOURCES + i);
> -				size = size * vf_per_group;
>   				start = res->start + size * j;
>   			} else {
>   				size = resource_size(res);
> @@ -1237,16 +1242,16 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>   			}
>
>   			/* Map the M64 here */
> -			if (pdn->m64_per_iov == M64_PER_IOV) {
> +			if (pdn->m64_single_mode) {
>   				pe_num = pdn->offset + j;
>   				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>   						pe_num, OPAL_M64_WINDOW_TYPE,
> -						pdn->m64_wins[i][j], 0);
> +						pdn->m64_map[j][i], 0);
>   			}
>
>   			rc = opal_pci_set_phb_mem_window(phb->opal_id,
>   						 OPAL_M64_WINDOW_TYPE,
> -						 pdn->m64_wins[i][j],
> +						 pdn->m64_map[j][i],
>   						 start,
>   						 0, /* unused */
>   						 size);
> @@ -1258,12 +1263,12 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>   				goto m64_failed;
>   			}
>
> -			if (pdn->m64_per_iov == M64_PER_IOV)
> +			if (pdn->m64_single_mode)
>   				rc = opal_pci_phb_mmio_enable(phb->opal_id,
> -				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 2);
> +				     OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 2);
>   			else
>   				rc = opal_pci_phb_mmio_enable(phb->opal_id,
> -				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 1);
> +				     OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 1);
>
>   			if (rc != OPAL_SUCCESS) {
>   				dev_err(&pdev->dev, "Failed to enable M64 window #%d: %llx\n",
> @@ -1275,7 +1280,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>   	return 0;
>
>   m64_failed:
> -	pnv_pci_vf_release_m64(pdev);
> +	pnv_pci_vf_release_m64(pdev, num_vfs);
>   	return -EBUSY;
>   }
>
> @@ -1302,15 +1307,13 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
>   	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
>   }
>
> -static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
> +static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
>   {
>   	struct pci_bus        *bus;
>   	struct pci_controller *hose;
>   	struct pnv_phb        *phb;
>   	struct pnv_ioda_pe    *pe, *pe_n;
>   	struct pci_dn         *pdn;
> -	u16                    vf_index;
> -	int64_t                rc;
>
>   	bus = pdev->bus;
>   	hose = pci_bus_to_host(bus);
> @@ -1320,35 +1323,6 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>   	if (!pdev->is_physfn)
>   		return;
>
> -	if (pdn->m64_per_iov == M64_PER_IOV && num_vfs > M64_PER_IOV) {
> -		int   vf_group;
> -		int   vf_per_group;
> -		int   vf_index1;
> -
> -		vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
> -
> -		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++)
> -			for (vf_index = vf_group * vf_per_group;
> -				vf_index < (vf_group + 1) * vf_per_group &&
> -				vf_index < num_vfs;
> -				vf_index++)
> -				for (vf_index1 = vf_group * vf_per_group;
> -					vf_index1 < (vf_group + 1) * vf_per_group &&
> -					vf_index1 < num_vfs;
> -					vf_index1++){
> -
> -					rc = opal_pci_set_peltv(phb->opal_id,
> -						pdn->offset + vf_index,
> -						pdn->offset + vf_index1,
> -						OPAL_REMOVE_PE_FROM_DOMAIN);
> -
> -					if (rc)
> -					    dev_warn(&pdev->dev, "%s: Failed to unlink same group PE#%d(%lld)\n",
> -						__func__,
> -						pdn->offset + vf_index1, rc);
> -				}
> -	}
> -
>   	list_for_each_entry_safe(pe, pe_n, &phb->ioda.pe_list, list) {
>   		if (pe->parent_dev != pdev)
>   			continue;
> @@ -1383,14 +1357,14 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>   	num_vfs = pdn->num_vfs;
>
>   	/* Release VF PEs */
> -	pnv_ioda_release_vf_PE(pdev, num_vfs);
> +	pnv_ioda_release_vf_PE(pdev);
>
>   	if (phb->type == PNV_PHB_IODA2) {
> -		if (pdn->m64_per_iov == 1)
> +		if (!pdn->m64_single_mode)
>   			pnv_pci_vf_resource_shift(pdev, -pdn->offset);
>
>   		/* Release M64 windows */
> -		pnv_pci_vf_release_m64(pdev);
> +		pnv_pci_vf_release_m64(pdev, num_vfs);
>
>   		/* Release PE numbers */
>   		bitmap_clear(phb->ioda.pe_alloc, pdn->offset, num_vfs);
> @@ -1409,7 +1383,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>   	int                    pe_num;
>   	u16                    vf_index;
>   	struct pci_dn         *pdn;
> -	int64_t                rc;
>
>   	bus = pdev->bus;
>   	hose = pci_bus_to_host(bus);
> @@ -1454,37 +1427,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>
>   		pnv_pci_ioda2_setup_dma_pe(phb, pe);
>   	}
> -
> -	if (pdn->m64_per_iov == M64_PER_IOV && num_vfs > M64_PER_IOV) {
> -		int   vf_group;
> -		int   vf_per_group;
> -		int   vf_index1;
> -
> -		vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
> -
> -		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++) {
> -			for (vf_index = vf_group * vf_per_group;
> -			     vf_index < (vf_group + 1) * vf_per_group &&
> -			     vf_index < num_vfs;
> -			     vf_index++) {
> -				for (vf_index1 = vf_group * vf_per_group;
> -				     vf_index1 < (vf_group + 1) * vf_per_group &&
> -				     vf_index1 < num_vfs;
> -				     vf_index1++) {
> -
> -					rc = opal_pci_set_peltv(phb->opal_id,
> -						pdn->offset + vf_index,
> -						pdn->offset + vf_index1,
> -						OPAL_ADD_PE_TO_DOMAIN);
> -
> -					if (rc)
> -					    dev_warn(&pdev->dev, "%s: Failed to link same group PE#%d(%lld)\n",
> -						__func__,
> -						pdn->offset + vf_index1, rc);
> -				}
> -			}
> -		}
> -	}
>   }
>
>   int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
> @@ -1507,6 +1449,15 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>   			return -ENOSPC;
>   		}
>
> +		/*
> +		 * When M64 BARs functions in Single PE mode, the number of VFs
> +		 * could be enabled must be less than the number of M64 BARs.
> +		 */
> +		if (pdn->m64_single_mode && num_vfs > phb->ioda.m64_bar_idx) {


m64_bar_idx is not really the maximum number of M64 BARs; you program it to 
store the last index, but after some future rework or hardware update you 
might want it to no longer be the last M64 BAR, and then you will have to 
update this check as well.

Instead of doing:
arch/powerpc/platforms/powernv/pci-ioda.c|374| phb->ioda.m64_bar_idx = 15;
do
#define PCI_IODA2_M64_BAR_NUM 15
and use the new macro everywhere.
Or read it from OPAL if it reports the limit.
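
A minimal sketch of that suggestion (illustrative only, not a real patch; the
macro name and the value of 15 follow the proposal above and are assumptions):

	#define PCI_IODA2_M64_BAR_NUM	15

	/* where the driver currently hardcodes 15 */
	phb->ioda.m64_bar_idx = PCI_IODA2_M64_BAR_NUM;

	/* pnv_pci_sriov_enable(): the same check, without the magic number */
	if (pdn->m64_single_mode && num_vfs > PCI_IODA2_M64_BAR_NUM) {
		dev_info(&pdev->dev, "Not enough M64 BAR for VFs\n");
		return -EBUSY;
	}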



> +			dev_info(&pdev->dev, "Not enough M64 BAR for VFs\n");
> +			return -EBUSY;
> +		}
> +
>   		/* Calculate available PE for required VFs */
>   		mutex_lock(&phb->ioda.pe_alloc_mutex);
>   		pdn->offset = bitmap_find_next_zero_area(
> @@ -1534,7 +1485,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>   		 * the IOV BAR according to the PE# allocated to the VFs.
>   		 * Otherwise, the PE# for the VF will conflict with others.
>   		 */
> -		if (pdn->m64_per_iov == 1) {
> +		if (!pdn->m64_single_mode) {
>   			ret = pnv_pci_vf_resource_shift(pdev, pdn->offset);
>   			if (ret)
>   				goto m64_failed;
> @@ -1567,8 +1518,7 @@ int pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>   	/* Allocate PCI data */
>   	add_dev_pci_data(pdev);
>
> -	pnv_pci_sriov_enable(pdev, num_vfs);
> -	return 0;
> +	return pnv_pci_sriov_enable(pdev, num_vfs);
>   }
>   #endif /* CONFIG_PCI_IOV */
>
> @@ -2762,9 +2712,9 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>
>   	pdn = pci_get_pdn(pdev);
>   	pdn->vfs_expanded = 0;
> +	pdn->m64_single_mode = false;
>
>   	total_vfs = pci_sriov_get_totalvfs(pdev);
> -	pdn->m64_per_iov = 1;
>   	mul = phb->ioda.total_pe;
>
>   	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> @@ -2784,8 +2734,8 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>   		if (size > (1 << 26)) {
>   			dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size is bigger than 64M, roundup power2\n",
>   				 i, res);
> -			pdn->m64_per_iov = M64_PER_IOV;
>   			mul = roundup_pow_of_two(total_vfs);
> +			pdn->m64_single_mode = true;
>   			break;
>   		}
>   	}
> @@ -2987,6 +2937,8 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
>   static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>   						      int resno)
>   {
> +	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
> +	struct pnv_phb *phb = hose->private_data;
>   	struct pci_dn *pdn = pci_get_pdn(pdev);
>   	resource_size_t align;
>
> @@ -2995,12 +2947,26 @@ static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>   	 * SR-IOV. While from hardware perspective, the range mapped by M64
>   	 * BAR should be size aligned.
>   	 *
> +	 * When IOV BAR is mapped with M64 BAR in Single PE mode, the extra
> +	 * powernv-specific hardware restriction is gone. But if just use the
> +	 * VF BAR size as the alignment, PF BAR / VF BAR may be allocated with
> +	 * in one segment of M64 #15, which introduces the PE conflict between
> +	 * PF and VF. Based on this, the minimum alignment of an IOV BAR is
> +	 * m64_segsize.
> +	 *
>   	 * This function returns the total IOV BAR size if expanded or just the
> -	 * individual size if not.
> +	 * individual size if not, when M64 BAR is in Shared PE mode.
> +	 * If the M64 BAR is in Single PE mode, return the VF BAR size or
> +	 * M64 segment size if IOV BAR size is less.

"Expanded" == "non-shared M64 BAR"? I'd stick to the "non-shared".


>   	 */
>   	align = pci_iov_resource_size(pdev, resno);
> -	if (pdn->vfs_expanded)
> -		return pdn->vfs_expanded * align;
> +	if (pdn->vfs_expanded) {
> +		if (pdn->m64_single_mode)
> +			return max(align,
> +				(resource_size_t)phb->ioda.m64_segsize);
> +		else
> +			return pdn->vfs_expanded * align;
> +	}
>
>   	return align;
>   }
>

	if (!pdn->vfs_expanded)
		return align;

	if (pdn->m64_single_mode)
		return max(align, (resource_size_t)phb->ioda.m64_segsize);

	return pdn->vfs_expanded * align;

Fewer braces/indents/line wraps :)



-- 
Alexey

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V4 5/6] powerpc/powernv: boundary the total VF BAR size instead of the individual one
  2015-08-19  2:01 ` [PATCH V4 5/6] powerpc/powernv: boundary the total VF BAR size instead of the individual one Wei Yang
@ 2015-10-02  9:51   ` Alexey Kardashevskiy
  2015-10-08  7:13     ` Wei Yang
  0 siblings, 1 reply; 23+ messages in thread
From: Alexey Kardashevskiy @ 2015-10-02  9:51 UTC (permalink / raw)
  To: Wei Yang, gwshan, benh; +Cc: linuxppc-dev

On 08/19/2015 12:01 PM, Wei Yang wrote:
> Each VF could have 6 BARs at most. When the total BAR size exceeds the
> gate, after expanding it will also exhaust the M64 Window.
>
> This patch limits the boundary by checking the total VF BAR size instead of
> the individual BAR.

The gate is the biggest segment size in PE in shared mode, right? And this 
is 64MB. Also, BARs with the same number from all VFs of the same physical 
adapter will be mapped contiguously (as one huge IOV BAR); for example, with 2 
VFs, 2 BARs each, the mapping will look like:
VF0-BAR0, VF1-BAR0, VF0-BAR1, VF1-BAR1
but not like this:
VF0-BAR0, VF0-BAR1, VF1-BAR0, VF1-BAR1
Is this correct?
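
(For illustration only: under the first layout, where each of the PF's IOV
resources covers the same-numbered BAR of every VF, the address of one VF BAR
would be computed roughly as below; this is a sketch, not code from the patch.)

	vf_bar_size = pci_iov_resource_size(pdev, PCI_IOV_RESOURCES + bar_no);
	vf_bar_addr = pdev->resource[PCI_IOV_RESOURCES + bar_no].start
			+ vf_index * vf_bar_size;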



>
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c |   14 ++++++++------
>   1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index b8bc51f..4bc83b8 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2701,7 +2701,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>   	const resource_size_t gate = phb->ioda.m64_segsize >> 2;
>   	struct resource *res;
>   	int i;
> -	resource_size_t size;
> +	resource_size_t size, total_vf_bar_sz;
>   	struct pci_dn *pdn;
>   	int mul, total_vfs;
>
> @@ -2714,6 +2714,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>
>   	total_vfs = pci_sriov_get_totalvfs(pdev);
>   	mul = phb->ioda.total_pe;
> +	total_vf_bar_sz = 0;
>
>   	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>   		res = &pdev->resource[i + PCI_IOV_RESOURCES];
> @@ -2726,7 +2727,8 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>   			return;
>   		}
>
> -		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
> +		total_vf_bar_sz += pci_iov_resource_size(pdev,
> +				i + PCI_IOV_RESOURCES);


Is @pdev a physical device in this context? I assume it is, so 
pci_iov_resource_size() returns the entire IOV BAR size.
For example, I have a Mellanox card with 16 VFs, each with a single 32MB BAR, 
so total_vf_bar_sz will be 16*32=512MB; this will exceed the @gate size 
and we end up with m64_single_mode = true. What do I miss here?


>
>   		/*
>   		 * If bigger than quarter of M64 segment size, just round up
> @@ -2740,11 +2742,11 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>   		 * limit the system flexibility.  This is a design decision to
>   		 * set the boundary to quarter of the M64 segment size.
>   		 */
> -		if (size > gate) {
> -			dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size "
> -				"is bigger than %lld, roundup power2\n",
> -				 i, res, gate);
> +		if (total_vf_bar_sz > gate) {
>   			mul = roundup_pow_of_two(total_vfs);
> +			dev_info(&pdev->dev,
> +				"VF BAR Total IOV size %llx > %llx, roundup to %d VFs\n",
> +				total_vf_bar_sz, gate, mul);
>   			pdn->m64_single_mode = true;
>   			break;
>   		}
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V4 6/6] powerpc/powernv: allocate sparse PE# when using M64 BAR in Single PE mode
  2015-08-19  2:01 ` [PATCH V4 6/6] powerpc/powernv: allocate sparse PE# when using M64 BAR in Single PE mode Wei Yang
  2015-08-19  2:21   ` Gavin Shan
@ 2015-10-02 10:05   ` Alexey Kardashevskiy
  2015-10-08  7:19     ` Wei Yang
  1 sibling, 1 reply; 23+ messages in thread
From: Alexey Kardashevskiy @ 2015-10-02 10:05 UTC (permalink / raw)
  To: Wei Yang, gwshan, benh; +Cc: linuxppc-dev

On 08/19/2015 12:01 PM, Wei Yang wrote:
> When M64 BAR is set to Single PE mode, the PE# assigned to VF could be
> sparse.
>
> This patch restructures the patch to allocate sparse PE# for VFs when M64

This patch restructures the code ;)


> BAR is set to Single PE mode. Also it rename the offset to pe_num_map to
> reflect the content is the PE number.
>
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
> ---
>   arch/powerpc/include/asm/pci-bridge.h     |    2 +-
>   arch/powerpc/platforms/powernv/pci-ioda.c |   79 ++++++++++++++++++++++-------
>   2 files changed, 61 insertions(+), 20 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
> index 8aeba4c..b3a226b 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -213,7 +213,7 @@ struct pci_dn {
>   #ifdef CONFIG_PCI_IOV
>   	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
>   	u16     num_vfs;		/* number of VFs enabled*/
> -	int     offset;			/* PE# for the first VF PE */
> +	int     *pe_num_map;		/* PE# for the first VF PE or array */
>   	bool    m64_single_mode;	/* Use M64 BAR in Single Mode */
>   #define IODA_INVALID_M64        (-1)
>   	int     (*m64_map)[PCI_SRIOV_NUM_BARS];
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 4bc83b8..779f52a 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1243,7 +1243,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>
>   			/* Map the M64 here */
>   			if (pdn->m64_single_mode) {
> -				pe_num = pdn->offset + j;
> +				pe_num = pdn->pe_num_map[j];
>   				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>   						pe_num, OPAL_M64_WINDOW_TYPE,
>   						pdn->m64_map[j][i], 0);
> @@ -1347,7 +1347,7 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>   	struct pnv_phb        *phb;
>   	struct pci_dn         *pdn;
>   	struct pci_sriov      *iov;
> -	u16 num_vfs;
> +	u16                    num_vfs, i;
>
>   	bus = pdev->bus;
>   	hose = pci_bus_to_host(bus);
> @@ -1361,14 +1361,21 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>
>   	if (phb->type == PNV_PHB_IODA2) {
>   		if (!pdn->m64_single_mode)
> -			pnv_pci_vf_resource_shift(pdev, -pdn->offset);
> +			pnv_pci_vf_resource_shift(pdev, -*pdn->pe_num_map);
>
>   		/* Release M64 windows */
>   		pnv_pci_vf_release_m64(pdev, num_vfs);
>
>   		/* Release PE numbers */
> -		bitmap_clear(phb->ioda.pe_alloc, pdn->offset, num_vfs);
> -		pdn->offset = 0;
> +		if (pdn->m64_single_mode) {
> +			for (i = 0; i < num_vfs; i++) {
> +				if (pdn->pe_num_map[i] != IODA_INVALID_PE)
> +					pnv_ioda_free_pe(phb, pdn->pe_num_map[i]);
> +			}
> +		} else
> +			bitmap_clear(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
> +		/* Releasing pe_num_map */
> +		kfree(pdn->pe_num_map);
>   	}
>   }
>
> @@ -1394,7 +1401,10 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>
>   	/* Reserve PE for each VF */
>   	for (vf_index = 0; vf_index < num_vfs; vf_index++) {
> -		pe_num = pdn->offset + vf_index;
> +		if (pdn->m64_single_mode)
> +			pe_num = pdn->pe_num_map[vf_index];
> +		else
> +			pe_num = *pdn->pe_num_map + vf_index;
>
>   		pe = &phb->ioda.pe_array[pe_num];
>   		pe->pe_number = pe_num;
> @@ -1436,6 +1446,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>   	struct pnv_phb        *phb;
>   	struct pci_dn         *pdn;
>   	int                    ret;
> +	u16                    i;
>
>   	bus = pdev->bus;
>   	hose = pci_bus_to_host(bus);
> @@ -1458,20 +1469,42 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>   			return -EBUSY;
>   		}
>
> +		/* Allocating pe_num_map */
> +		if (pdn->m64_single_mode)
> +			pdn->pe_num_map = kmalloc(sizeof(*pdn->pe_num_map) * num_vfs,
> +					GFP_KERNEL);
> +		else
> +			pdn->pe_num_map = kmalloc(sizeof(*pdn->pe_num_map), GFP_KERNEL);
> +
> +		if (!pdn->pe_num_map)
> +			return -ENOMEM;

[*]

> +
>   		/* Calculate available PE for required VFs */
> -		mutex_lock(&phb->ioda.pe_alloc_mutex);
> -		pdn->offset = bitmap_find_next_zero_area(
> -			phb->ioda.pe_alloc, phb->ioda.total_pe,
> -			0, num_vfs, 0);
> -		if (pdn->offset >= phb->ioda.total_pe) {
> +		if (pdn->m64_single_mode) {
> +			for (i = 0; i < num_vfs; i++)
> +				pdn->pe_num_map[i] = IODA_INVALID_PE;

It is cleaner to do such initialization right after the check that 
kmalloc() did not fail, i.e. at [*]
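
A minimal sketch of that reordering (illustrative only):

		if (!pdn->pe_num_map)
			return -ENOMEM;

		/* initialize right away so any error path can walk the whole map */
		if (pdn->m64_single_mode)
			for (i = 0; i < num_vfs; i++)
				pdn->pe_num_map[i] = IODA_INVALID_PE;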


> +			for (i = 0; i < num_vfs; i++) {
> +				pdn->pe_num_map[i] = pnv_ioda_alloc_pe(phb);
> +				if (pdn->pe_num_map[i] == IODA_INVALID_PE) {
> +					ret = -EBUSY;
> +					goto m64_failed;
> +				}
> +			}
> +		} else {
> +			mutex_lock(&phb->ioda.pe_alloc_mutex);
> +			*pdn->pe_num_map = bitmap_find_next_zero_area(
> +				phb->ioda.pe_alloc, phb->ioda.total_pe,
> +				0, num_vfs, 0);
> +			if (*pdn->pe_num_map >= phb->ioda.total_pe) {
> +				mutex_unlock(&phb->ioda.pe_alloc_mutex);
> +				dev_info(&pdev->dev, "Failed to enable VF%d\n", num_vfs);
> +				kfree(pdn->pe_num_map);
> +				return -EBUSY;
> +			}
> +			bitmap_set(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
>   			mutex_unlock(&phb->ioda.pe_alloc_mutex);
> -			dev_info(&pdev->dev, "Failed to enable VF%d\n", num_vfs);
> -			pdn->offset = 0;
> -			return -EBUSY;
>   		}
> -		bitmap_set(phb->ioda.pe_alloc, pdn->offset, num_vfs);
>   		pdn->num_vfs = num_vfs;
> -		mutex_unlock(&phb->ioda.pe_alloc_mutex);
>
>   		/* Assign M64 window accordingly */
>   		ret = pnv_pci_vf_assign_m64(pdev, num_vfs);
> @@ -1486,7 +1519,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>   		 * Otherwise, the PE# for the VF will conflict with others.
>   		 */
>   		if (!pdn->m64_single_mode) {
> -			ret = pnv_pci_vf_resource_shift(pdev, pdn->offset);
> +			ret = pnv_pci_vf_resource_shift(pdev, *pdn->pe_num_map);
>   			if (ret)
>   				goto m64_failed;
>   		}
> @@ -1498,8 +1531,16 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>   	return 0;
>
>   m64_failed:
> -	bitmap_clear(phb->ioda.pe_alloc, pdn->offset, num_vfs);
> -	pdn->offset = 0;
> +	if (pdn->m64_single_mode) {
> +		for (i = 0; i < num_vfs; i++) {
> +			if (pdn->pe_num_map[i] != IODA_INVALID_PE)
> +				pnv_ioda_free_pe(phb, pdn->pe_num_map[i]);
> +		}
> +	} else
> +		bitmap_clear(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
> +
> +	/* Releasing pe_num_map */
> +	kfree(pdn->pe_num_map);
>
>   	return ret;
>   }
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V4 0/6] Redesign SR-IOV on PowerNV
  2015-08-19  2:01 [PATCH V4 0/6] Redesign SR-IOV on PowerNV Wei Yang
                   ` (6 preceding siblings ...)
  2015-08-26  5:11 ` [PATCH V4 0/6] Redesign SR-IOV on PowerNV Alexey Kardashevskiy
@ 2015-10-02 10:07 ` Alexey Kardashevskiy
  2015-10-07  2:43   ` Michael Ellerman
  7 siblings, 1 reply; 23+ messages in thread
From: Alexey Kardashevskiy @ 2015-10-02 10:07 UTC (permalink / raw)
  To: Wei Yang, gwshan, benh; +Cc: linuxppc-dev

On 08/19/2015 12:01 PM, Wei Yang wrote:
> In original design, it tries to group VFs to enable more number of VFs in the
> system, when VF BAR is bigger than 64MB. This design has a flaw in which one
> error on a VF will interfere other VFs in the same group.
>
> This patch series change this design by using M64 BAR in Single PE mode to
> cover only one VF BAR. By doing so, it gives absolute isolation between VFs.
>
> v4:
>     * rebase the code on top of v4.2-rc7
>     * switch back to use the dynamic version of pe_num_map and m64_map
>     * split the memory allocation and PE assignment of pe_num_map to make it
>       more easy to read
>     * check pe_num_map value before free PE.
>     * add the rename reason for pe_num_map and m64_map in change log
> v3:
>     * return -ENOSPC when a VF has non-64bit prefetchable BAR
>     * rename offset to pe_num_map and define it staticly
>     * change commit log based on comments
>     * define m64_map staticly
> v2:
>     * clean up iov bar alignment calculation
>     * change m64s to m64_bars
>     * add a field to represent M64 Single PE mode will be used
>     * change m64_wins to m64_map
>     * calculate the gate instead of hard coded
>     * dynamically allocate m64_map
>     * dynamically allocate PE#
>     * add a case to calculate iov bar alignment when M64 Single PE is used
>     * when M64 Single PE is used, compare num_vfs with M64 BAR available number
>       in system at first
>
>
>
> Wei Yang (6):
>    powerpc/powernv: don't enable SRIOV when VF BAR has non
>      64bit-prefetchable BAR
>    powerpc/powernv: simplify the calculation of iov resource alignment
>    powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
>    powerpc/powernv: replace the hard coded boundary with gate
>    powerpc/powernv: boundary the total VF BAR size instead of the
>      individual one
>    powerpc/powernv: allocate sparse PE# when using M64 BAR in Single PE
>      mode
>
>   arch/powerpc/include/asm/pci-bridge.h     |    7 +-
>   arch/powerpc/platforms/powernv/pci-ioda.c |  328 +++++++++++++++--------------
>   2 files changed, 175 insertions(+), 160 deletions(-)
>

I have posted a few comments, but in general the patchset makes things simpler 
by removing a compound PE and does not seem to make things worse, so:

Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>



-- 
Alexey

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V4 0/6] Redesign SR-IOV on PowerNV
  2015-10-02 10:07 ` Alexey Kardashevskiy
@ 2015-10-07  2:43   ` Michael Ellerman
  0 siblings, 0 replies; 23+ messages in thread
From: Michael Ellerman @ 2015-10-07  2:43 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: Wei Yang, gwshan, benh, linuxppc-dev

On Fri, 2015-10-02 at 20:07 +1000, Alexey Kardashevskiy wrote:
> On 08/19/2015 12:01 PM, Wei Yang wrote:
> > In original design, it tries to group VFs to enable more number of VFs in the
> > system, when VF BAR is bigger than 64MB. This design has a flaw in which one
> > error on a VF will interfere other VFs in the same group.
> >
> > This patch series change this design by using M64 BAR in Single PE mode to
> > cover only one VF BAR. By doing so, it gives absolute isolation between VFs.
> >
> > Wei Yang (6):
> >    powerpc/powernv: don't enable SRIOV when VF BAR has non
> >      64bit-prefetchable BAR
> >    powerpc/powernv: simplify the calculation of iov resource alignment
> >    powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
> >    powerpc/powernv: replace the hard coded boundary with gate
> >    powerpc/powernv: boundary the total VF BAR size instead of the
> >      individual one
> >    powerpc/powernv: allocate sparse PE# when using M64 BAR in Single PE
> >      mode
> >
> >   arch/powerpc/include/asm/pci-bridge.h     |    7 +-
> >   arch/powerpc/platforms/powernv/pci-ioda.c |  328 +++++++++++++++--------------
> >   2 files changed, 175 insertions(+), 160 deletions(-)
> 
> I have posted a few comments, but in general the patchset makes things simpler 
> by removing a compound PE and does not seem to make things worse, so:
> 
> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Thanks for reviewing it.

I'll wait for a v5 that incorporates your comments.

cheers

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V4 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR
  2015-10-02  8:55   ` Alexey Kardashevskiy
@ 2015-10-08  6:29     ` Wei Yang
  0 siblings, 0 replies; 23+ messages in thread
From: Wei Yang @ 2015-10-08  6:29 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: Wei Yang, gwshan, benh, linuxppc-dev

On Fri, Oct 02, 2015 at 06:55:10PM +1000, Alexey Kardashevskiy wrote:
>On 08/19/2015 12:01 PM, Wei Yang wrote:
>>On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If
>>a SRIOV device's IOV BAR is not 64bit-prefetchable, this is not assigned
>>from 64bit prefetchable window, which means M64 BAR can't work on it.
>
>
>Please change the commit log to explain what limit came from where.
>Something like:
>
>PCI bridges support only 2 windows and the kernel code programs bridges in
>the way that one window is 32bit-nonprefetchable and another one is
>64bit-prefetchable. So if devices' IOV BAR is 64bit and non-prefetchable, it
                   ^

I suggest adding "on the powernv platform", since other platforms could have a
32bit-prefetchable window.

>will be mapped into 32bit space and therefore M64 cannot be used for it.
>
>
>>
>>This patch makes this explicit.
>>
>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c |   25 +++++++++----------------
>>  1 file changed, 9 insertions(+), 16 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 85cbc96..8c031b5 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>>  		if (!res->flags || !res->parent)
>>  			continue;
>>
>>-		if (!pnv_pci_is_mem_pref_64(res->flags))
>>-			continue;
>>-
>>  		/*
>>  		 * The actual IOV BAR range is determined by the start address
>>  		 * and the actual size for num_vfs VFs BAR.  This check is to
>>@@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>>  		if (!res->flags || !res->parent)
>>  			continue;
>>
>>-		if (!pnv_pci_is_mem_pref_64(res->flags))
>>-			continue;
>>-
>>  		size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
>>  		res2 = *res;
>>  		res->start += size * offset;
>>@@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>  		if (!res->flags || !res->parent)
>>  			continue;
>>
>>-		if (!pnv_pci_is_mem_pref_64(res->flags))
>>-			continue;
>>-
>>  		for (j = 0; j < vf_groups; j++) {
>>  			do {
>>  				win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
>>@@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>  	pdn = pci_get_pdn(pdev);
>>
>>  	if (phb->type == PNV_PHB_IODA2) {
>>+		if (!pdn->vfs_expanded) {
>
>The patch claims it does make the limitation explicit but it is not clear at
>all how to trace from vfs_expanded==0 to "non 64bit-prefetchable IOV BAR".
>

hmm... vfs_expanded is set in pnv_pci_ioda_fixup_iov_resources(), which is
executed before pnv_pci_sriov_enable(). If the PF has no non-64bit-prefetchable
IOV BAR, vfs_expanded will be set to a non-zero integer. Otherwise, it is left 0.

I agree this is not obvious. As you suggested, instead of relying on vfs_expanded,
we would need to check each IOV BAR at this place with pnv_pci_is_mem_pref_64(). If
you think this is better, I will change it in the next version.
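
A rough sketch of that explicit check (illustrative only, reusing the existing
pnv_pci_is_mem_pref_64() helper on the PF's IOV resources):

		for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
			res = &pdev->resource[i + PCI_IOV_RESOURCES];
			if (!res->flags)
				continue;
			if (!pnv_pci_is_mem_pref_64(res->flags)) {
				dev_info(&pdev->dev,
					 "SR-IOV not supported: non 64bit-prefetchable IOV BAR%d\n", i);
				return -ENOSPC;
			}
		}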

>
>>+			dev_info(&pdev->dev, "don't support this SRIOV device"
>>+				" with non 64bit-prefetchable IOV BAR\n");
>>+			return -ENOSPC;
>>+		}
>>+
>>  		/* Calculate available PE for required VFs */
>>  		mutex_lock(&phb->ioda.pe_alloc_mutex);
>>  		pdn->offset = bitmap_find_next_zero_area(
>>@@ -2775,9 +2772,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>  		if (!res->flags || res->parent)
>>  			continue;
>>  		if (!pnv_pci_is_mem_pref_64(res->flags)) {
>>-			dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
>>+			dev_warn(&pdev->dev, "Don't support SR-IOV with"
>>+					" non M64 VF BAR%d: %pR. \n",
>>  				 i, res);
>>-			continue;
>>+			return;
>>  		}
>>
>>  		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>>@@ -2796,11 +2794,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>  		res = &pdev->resource[i + PCI_IOV_RESOURCES];
>>  		if (!res->flags || res->parent)
>>  			continue;
>>-		if (!pnv_pci_is_mem_pref_64(res->flags)) {
>
>
>And this check was quite clear. I'd keep this one.
>
>
>>-			dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: %pR\n",
>>-				 i, res);
>>-			continue;
>>-		}
>>
>>  		dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
>>  		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>>
>
>
>-- 
>Alexey

-- 
Richard Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V4 2/6] powerpc/powernv: simplify the calculation of iov resource alignment
  2015-10-02  8:58   ` Alexey Kardashevskiy
@ 2015-10-08  6:39     ` Wei Yang
  0 siblings, 0 replies; 23+ messages in thread
From: Wei Yang @ 2015-10-08  6:39 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: Wei Yang, gwshan, benh, linuxppc-dev

On Fri, Oct 02, 2015 at 06:58:09PM +1000, Alexey Kardashevskiy wrote:
>On 08/19/2015 12:01 PM, Wei Yang wrote:
>>The alignment of IOV BAR on PowerNV platform is the total size of the IOV
>>BAR. No matter whether the IOV BAR is extended with number of
>>roundup_pow_of_two(total_vfs) or number of max PE number (256), the total
>>size could be calculated by (vfs_expanded * VF_BAR_size).
>>
>>This patch simplifies the pnv_pci_iov_resource_alignment() by removing the
>>first case.
>>
>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c |   14 +++++++++-----
>>  1 file changed, 9 insertions(+), 5 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 8c031b5..e3e0acb 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -2988,12 +2988,16 @@ static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>>  						      int resno)
>>  {
>>  	struct pci_dn *pdn = pci_get_pdn(pdev);
>>-	resource_size_t align, iov_align;
>>-
>>-	iov_align = resource_size(&pdev->resource[resno]);
>>-	if (iov_align)
>>-		return iov_align;
>>+	resource_size_t align;
>>
>>+	/*
>>+	 * On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the
>>+	 * SR-IOV. While from hardware perspective, the range mapped by M64
>>+	 * BAR should be size aligned.
>
>
>Out of curiosity - IOV BAR does NOT have to be aligned on other platforms?
>

Quick answer: no. On other platforms, the alignment requirement applies to the VF BAR size.

An IOV BAR is composed of several VF BARs:
    
    IOV BAR size = VF BAR size * total_vfs

And usually the VF BAR size equals one of the PF's BAR sizes. According to the
SPEC, the IOV BAR should be aligned to the VF BAR size, not to the IOV BAR size.
This requirement is implicitly met, since the PF's BAR alignment is taken
into consideration.

The IOV BAR alignment is based on M64 BAR on powernv.
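
(A purely illustrative example with hypothetical numbers: a PF with an 8MB VF BAR
and total_vfs = 16 has an 8MB * 16 = 128MB IOV BAR. The SR-IOV SPEC only asks for
8MB alignment, while the shared-M64 scheme expands the IOV BAR to total_pe (256)
segments and therefore needs 256 * 8MB = 2GB alignment.)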

>
>>+	 *
>>+	 * This function returns the total IOV BAR size if expanded or just the
>>+	 * individual size if not.
>
>Expanded vs. non-expanded means "using shared M64" (when it is split by 256
>segments) vs. "using entire M64"?
>

Yes, you are right.

When we use "shared M64", IOV BAR should be aligned by its total size. Then we
need to return the total size.

When we use "entire M64", IOV BAR just need to be aligned with VF BAR size.
That's why just return individual size.

>
>>+	 */
>>  	align = pci_iov_resource_size(pdev, resno);
>>  	if (pdn->vfs_expanded)
>>  		return pdn->vfs_expanded * align;
>>
>
>
>-- 
>Alexey

-- 
Richard Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V4 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
  2015-10-02  9:29   ` Alexey Kardashevskiy
@ 2015-10-08  7:06     ` Wei Yang
  0 siblings, 0 replies; 23+ messages in thread
From: Wei Yang @ 2015-10-08  7:06 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: Wei Yang, gwshan, benh, linuxppc-dev

On Fri, Oct 02, 2015 at 07:29:29PM +1000, Alexey Kardashevskiy wrote:
>On 08/19/2015 12:01 PM, Wei Yang wrote:
>>In current implementation, when VF BAR is bigger than 64MB, it uses 4 M64
>>BARs in Single PE mode to cover the number of VFs required to be enabled.
>>By doing so, several VFs would be in one VF Group and leads to interference
>>between VFs in the same group.
>>
>>And in this patch, m64_wins is renamed to m64_map, which means index number
>>of the M64 BAR used to map the VF BAR. Based on Gavin's comments.
>>
>>This patch changes the design by using one M64 BAR in Single PE mode for
>>one VF BAR. This gives absolute isolation for VFs.
>>
>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/include/asm/pci-bridge.h     |    5 +-
>>  arch/powerpc/platforms/powernv/pci-ioda.c |  178 ++++++++++++-----------------
>>  2 files changed, 74 insertions(+), 109 deletions(-)
>>
>>diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
>>index 712add5..8aeba4c 100644
>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>@@ -214,10 +214,9 @@ struct pci_dn {
>>  	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
>>  	u16     num_vfs;		/* number of VFs enabled*/
>>  	int     offset;			/* PE# for the first VF PE */
>>-#define M64_PER_IOV 4
>>-	int     m64_per_iov;
>>+	bool    m64_single_mode;	/* Use M64 BAR in Single Mode */
>>  #define IODA_INVALID_M64        (-1)
>>-	int     m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
>>+	int     (*m64_map)[PCI_SRIOV_NUM_BARS];
>>  #endif /* CONFIG_PCI_IOV */
>>  #endif
>>  	struct list_head child_list;
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index e3e0acb..de7db1d 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -1148,29 +1148,36 @@ static void pnv_pci_ioda_setup_PEs(void)
>>  }
>>
>>  #ifdef CONFIG_PCI_IOV
>>-static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
>>+static int pnv_pci_vf_release_m64(struct pci_dev *pdev, u16 num_vfs)
>>  {
>>  	struct pci_bus        *bus;
>>  	struct pci_controller *hose;
>>  	struct pnv_phb        *phb;
>>  	struct pci_dn         *pdn;
>>  	int                    i, j;
>>+	int                    m64_bars;
>>
>>  	bus = pdev->bus;
>>  	hose = pci_bus_to_host(bus);
>>  	phb = hose->private_data;
>>  	pdn = pci_get_pdn(pdev);
>>
>>+	if (pdn->m64_single_mode)
>>+		m64_bars = num_vfs;
>>+	else
>>+		m64_bars = 1;
>>+
>>  	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
>>-		for (j = 0; j < M64_PER_IOV; j++) {
>>-			if (pdn->m64_wins[i][j] == IODA_INVALID_M64)
>>+		for (j = 0; j < m64_bars; j++) {
>>+			if (pdn->m64_map[j][i] == IODA_INVALID_M64)
>>  				continue;
>>  			opal_pci_phb_mmio_enable(phb->opal_id,
>>-				OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0);
>>-			clear_bit(pdn->m64_wins[i][j], &phb->ioda.m64_bar_alloc);
>>-			pdn->m64_wins[i][j] = IODA_INVALID_M64;
>>+				OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 0);
>>+			clear_bit(pdn->m64_map[j][i], &phb->ioda.m64_bar_alloc);
>>+			pdn->m64_map[j][i] = IODA_INVALID_M64;
>>  		}
>>
>>+	kfree(pdn->m64_map);
>>  	return 0;
>>  }
>>
>>@@ -1187,8 +1194,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>  	int                    total_vfs;
>>  	resource_size_t        size, start;
>>  	int                    pe_num;
>>-	int                    vf_groups;
>>-	int                    vf_per_group;
>>+	int                    m64_bars;
>>
>>  	bus = pdev->bus;
>>  	hose = pci_bus_to_host(bus);
>>@@ -1196,26 +1202,26 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>  	pdn = pci_get_pdn(pdev);
>>  	total_vfs = pci_sriov_get_totalvfs(pdev);
>>
>>-	/* Initialize the m64_wins to IODA_INVALID_M64 */
>>-	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
>>-		for (j = 0; j < M64_PER_IOV; j++)
>>-			pdn->m64_wins[i][j] = IODA_INVALID_M64;
>>+	if (pdn->m64_single_mode)
>>+		m64_bars = num_vfs;
>>+	else
>>+		m64_bars = 1;
>>+
>>+	pdn->m64_map = kmalloc(sizeof(*pdn->m64_map) * m64_bars, GFP_KERNEL);
>>+	if (!pdn->m64_map)
>>+		return -ENOMEM;
>>+	/* Initialize the m64_map to IODA_INVALID_M64 */
>>+	for (i = 0; i < m64_bars ; i++)
>>+		for (j = 0; j < PCI_SRIOV_NUM_BARS; j++)
>>+			pdn->m64_map[i][j] = IODA_INVALID_M64;
>>
>>-	if (pdn->m64_per_iov == M64_PER_IOV) {
>>-		vf_groups = (num_vfs <= M64_PER_IOV) ? num_vfs: M64_PER_IOV;
>>-		vf_per_group = (num_vfs <= M64_PER_IOV)? 1:
>>-			roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
>>-	} else {
>>-		vf_groups = 1;
>>-		vf_per_group = 1;
>>-	}
>>
>>  	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>>  		res = &pdev->resource[i + PCI_IOV_RESOURCES];
>>  		if (!res->flags || !res->parent)
>>  			continue;
>>
>>-		for (j = 0; j < vf_groups; j++) {
>>+		for (j = 0; j < m64_bars; j++) {
>>  			do {
>>  				win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
>>  						phb->ioda.m64_bar_idx + 1, 0);
>>@@ -1224,12 +1230,11 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>  					goto m64_failed;
>>  			} while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc));
>>
>>-			pdn->m64_wins[i][j] = win;
>>+			pdn->m64_map[j][i] = win;
>>
>>-			if (pdn->m64_per_iov == M64_PER_IOV) {
>>+			if (pdn->m64_single_mode) {
>>  				size = pci_iov_resource_size(pdev,
>>  							PCI_IOV_RESOURCES + i);
>>-				size = size * vf_per_group;
>>  				start = res->start + size * j;
>>  			} else {
>>  				size = resource_size(res);
>>@@ -1237,16 +1242,16 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>  			}
>>
>>  			/* Map the M64 here */
>>-			if (pdn->m64_per_iov == M64_PER_IOV) {
>>+			if (pdn->m64_single_mode) {
>>  				pe_num = pdn->offset + j;
>>  				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>  						pe_num, OPAL_M64_WINDOW_TYPE,
>>-						pdn->m64_wins[i][j], 0);
>>+						pdn->m64_map[j][i], 0);
>>  			}
>>
>>  			rc = opal_pci_set_phb_mem_window(phb->opal_id,
>>  						 OPAL_M64_WINDOW_TYPE,
>>-						 pdn->m64_wins[i][j],
>>+						 pdn->m64_map[j][i],
>>  						 start,
>>  						 0, /* unused */
>>  						 size);
>>@@ -1258,12 +1263,12 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>  				goto m64_failed;
>>  			}
>>
>>-			if (pdn->m64_per_iov == M64_PER_IOV)
>>+			if (pdn->m64_single_mode)
>>  				rc = opal_pci_phb_mmio_enable(phb->opal_id,
>>-				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 2);
>>+				     OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 2);
>>  			else
>>  				rc = opal_pci_phb_mmio_enable(phb->opal_id,
>>-				     OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 1);
>>+				     OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 1);
>>
>>  			if (rc != OPAL_SUCCESS) {
>>  				dev_err(&pdev->dev, "Failed to enable M64 window #%d: %llx\n",
>>@@ -1275,7 +1280,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>  	return 0;
>>
>>  m64_failed:
>>-	pnv_pci_vf_release_m64(pdev);
>>+	pnv_pci_vf_release_m64(pdev, num_vfs);
>>  	return -EBUSY;
>>  }
>>
>>@@ -1302,15 +1307,13 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
>>  	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
>>  }
>>
>>-static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>+static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
>>  {
>>  	struct pci_bus        *bus;
>>  	struct pci_controller *hose;
>>  	struct pnv_phb        *phb;
>>  	struct pnv_ioda_pe    *pe, *pe_n;
>>  	struct pci_dn         *pdn;
>>-	u16                    vf_index;
>>-	int64_t                rc;
>>
>>  	bus = pdev->bus;
>>  	hose = pci_bus_to_host(bus);
>>@@ -1320,35 +1323,6 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>  	if (!pdev->is_physfn)
>>  		return;
>>
>>-	if (pdn->m64_per_iov == M64_PER_IOV && num_vfs > M64_PER_IOV) {
>>-		int   vf_group;
>>-		int   vf_per_group;
>>-		int   vf_index1;
>>-
>>-		vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
>>-
>>-		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++)
>>-			for (vf_index = vf_group * vf_per_group;
>>-				vf_index < (vf_group + 1) * vf_per_group &&
>>-				vf_index < num_vfs;
>>-				vf_index++)
>>-				for (vf_index1 = vf_group * vf_per_group;
>>-					vf_index1 < (vf_group + 1) * vf_per_group &&
>>-					vf_index1 < num_vfs;
>>-					vf_index1++){
>>-
>>-					rc = opal_pci_set_peltv(phb->opal_id,
>>-						pdn->offset + vf_index,
>>-						pdn->offset + vf_index1,
>>-						OPAL_REMOVE_PE_FROM_DOMAIN);
>>-
>>-					if (rc)
>>-					    dev_warn(&pdev->dev, "%s: Failed to unlink same group PE#%d(%lld)\n",
>>-						__func__,
>>-						pdn->offset + vf_index1, rc);
>>-				}
>>-	}
>>-
>>  	list_for_each_entry_safe(pe, pe_n, &phb->ioda.pe_list, list) {
>>  		if (pe->parent_dev != pdev)
>>  			continue;
>>@@ -1383,14 +1357,14 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>>  	num_vfs = pdn->num_vfs;
>>
>>  	/* Release VF PEs */
>>-	pnv_ioda_release_vf_PE(pdev, num_vfs);
>>+	pnv_ioda_release_vf_PE(pdev);
>>
>>  	if (phb->type == PNV_PHB_IODA2) {
>>-		if (pdn->m64_per_iov == 1)
>>+		if (!pdn->m64_single_mode)
>>  			pnv_pci_vf_resource_shift(pdev, -pdn->offset);
>>
>>  		/* Release M64 windows */
>>-		pnv_pci_vf_release_m64(pdev);
>>+		pnv_pci_vf_release_m64(pdev, num_vfs);
>>
>>  		/* Release PE numbers */
>>  		bitmap_clear(phb->ioda.pe_alloc, pdn->offset, num_vfs);
>>@@ -1409,7 +1383,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>  	int                    pe_num;
>>  	u16                    vf_index;
>>  	struct pci_dn         *pdn;
>>-	int64_t                rc;
>>
>>  	bus = pdev->bus;
>>  	hose = pci_bus_to_host(bus);
>>@@ -1454,37 +1427,6 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>
>>  		pnv_pci_ioda2_setup_dma_pe(phb, pe);
>>  	}
>>-
>>-	if (pdn->m64_per_iov == M64_PER_IOV && num_vfs > M64_PER_IOV) {
>>-		int   vf_group;
>>-		int   vf_per_group;
>>-		int   vf_index1;
>>-
>>-		vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
>>-
>>-		for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++) {
>>-			for (vf_index = vf_group * vf_per_group;
>>-			     vf_index < (vf_group + 1) * vf_per_group &&
>>-			     vf_index < num_vfs;
>>-			     vf_index++) {
>>-				for (vf_index1 = vf_group * vf_per_group;
>>-				     vf_index1 < (vf_group + 1) * vf_per_group &&
>>-				     vf_index1 < num_vfs;
>>-				     vf_index1++) {
>>-
>>-					rc = opal_pci_set_peltv(phb->opal_id,
>>-						pdn->offset + vf_index,
>>-						pdn->offset + vf_index1,
>>-						OPAL_ADD_PE_TO_DOMAIN);
>>-
>>-					if (rc)
>>-					    dev_warn(&pdev->dev, "%s: Failed to link same group PE#%d(%lld)\n",
>>-						__func__,
>>-						pdn->offset + vf_index1, rc);
>>-				}
>>-			}
>>-		}
>>-	}
>>  }
>>
>>  int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>@@ -1507,6 +1449,15 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>  			return -ENOSPC;
>>  		}
>>
>>+		/*
>>+		 * When M64 BARs functions in Single PE mode, the number of VFs
>>+		 * could be enabled must be less than the number of M64 BARs.
>>+		 */
>>+		if (pdn->m64_single_mode && num_vfs > phb->ioda.m64_bar_idx) {
>
>
>m64_bar_idx is not really the maximum number of M64 BARs; right now it happens
>to hold the index of the last M64 BAR, but after some future rework or hardware
>update it may no longer be the last one, and then this check would have to be
>updated as well.
>
>Instead of doing:
>arch/powerpc/platforms/powernv/pci-ioda.c|374| phb->ioda.m64_bar_idx = 15;
>do
>#define PCI_IODA2_M64_BAR_NUM 15
>and use the new macro everywhere.
>Or read the value from OPAL if it exposes it.
>

I didn't touch this field since it is not directly related to SR-IOV.

And I prefer the OPAL way, or the device tree way, which needs some
modification in skiboot too. If you think that is OK, I will send a separate
patch to fix this in both skiboot and the kernel.
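
For reference, the macro variant you suggest would look roughly like the sketch
below. This is only an illustration; the constant could equally be read from
OPAL or the device tree later:

	#define PCI_IODA2_M64_BAR_NUM	15	/* last M64 BAR index on IODA2 */

	/* at the assignment you quote */
	phb->ioda.m64_bar_idx = PCI_IODA2_M64_BAR_NUM;

	/* and the check in pnv_pci_sriov_enable() then becomes */
	if (pdn->m64_single_mode && num_vfs > PCI_IODA2_M64_BAR_NUM) {
		dev_info(&pdev->dev, "Not enough M64 BAR for VFs\n");
		return -EBUSY;
	}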

>
>
>>+			dev_info(&pdev->dev, "Not enough M64 BAR for VFs\n");
>>+			return -EBUSY;
>>+		}
>>+
>>  		/* Calculate available PE for required VFs */
>>  		mutex_lock(&phb->ioda.pe_alloc_mutex);
>>  		pdn->offset = bitmap_find_next_zero_area(
>>@@ -1534,7 +1485,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>  		 * the IOV BAR according to the PE# allocated to the VFs.
>>  		 * Otherwise, the PE# for the VF will conflict with others.
>>  		 */
>>-		if (pdn->m64_per_iov == 1) {
>>+		if (!pdn->m64_single_mode) {
>>  			ret = pnv_pci_vf_resource_shift(pdev, pdn->offset);
>>  			if (ret)
>>  				goto m64_failed;
>>@@ -1567,8 +1518,7 @@ int pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>  	/* Allocate PCI data */
>>  	add_dev_pci_data(pdev);
>>
>>-	pnv_pci_sriov_enable(pdev, num_vfs);
>>-	return 0;
>>+	return pnv_pci_sriov_enable(pdev, num_vfs);
>>  }
>>  #endif /* CONFIG_PCI_IOV */
>>
>>@@ -2762,9 +2712,9 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>
>>  	pdn = pci_get_pdn(pdev);
>>  	pdn->vfs_expanded = 0;
>>+	pdn->m64_single_mode = false;
>>
>>  	total_vfs = pci_sriov_get_totalvfs(pdev);
>>-	pdn->m64_per_iov = 1;
>>  	mul = phb->ioda.total_pe;
>>
>>  	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>>@@ -2784,8 +2734,8 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>  		if (size > (1 << 26)) {
>>  			dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size is bigger than 64M, roundup power2\n",
>>  				 i, res);
>>-			pdn->m64_per_iov = M64_PER_IOV;
>>  			mul = roundup_pow_of_two(total_vfs);
>>+			pdn->m64_single_mode = true;
>>  			break;
>>  		}
>>  	}
>>@@ -2987,6 +2937,8 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
>>  static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>>  						      int resno)
>>  {
>>+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
>>+	struct pnv_phb *phb = hose->private_data;
>>  	struct pci_dn *pdn = pci_get_pdn(pdev);
>>  	resource_size_t align;
>>
>>@@ -2995,12 +2947,26 @@ static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>>  	 * SR-IOV. While from hardware perspective, the range mapped by M64
>>  	 * BAR should be size aligned.
>>  	 *
>>+	 * When IOV BAR is mapped with M64 BAR in Single PE mode, the extra
>>+	 * powernv-specific hardware restriction is gone. But if just use the
>>+	 * VF BAR size as the alignment, PF BAR / VF BAR may be allocated with
>>+	 * in one segment of M64 #15, which introduces the PE conflict between
>>+	 * PF and VF. Based on this, the minimum alignment of an IOV BAR is
>>+	 * m64_segsize.
>>+	 *
>>  	 * This function returns the total IOV BAR size if expanded or just the
>>-	 * individual size if not.
>>+	 * individual size if not, when M64 BAR is in Shared PE mode.
>>+	 * If the M64 BAR is in Single PE mode, return the VF BAR size or
>>+	 * M64 segment size if IOV BAR size is less.
>
>"Expanded" == "non-shared M64 BAR"? I'd stick to the "non-shared".
>

Yes, your wording is more friendly to the audience.

The current comment is not accurate. A more precise version would be:

 * This function returns the total IOV BAR size if M64 BAR is in Shared PE
 * mode or just the individual size if not.

>
>>  	 */
>>  	align = pci_iov_resource_size(pdev, resno);
>>-	if (pdn->vfs_expanded)
>>-		return pdn->vfs_expanded * align;
>>+	if (pdn->vfs_expanded) {
>>+		if (pdn->m64_single_mode)
>>+			return max(align,
>>+				(resource_size_t)phb->ioda.m64_segsize);
>>+		else
>>+			return pdn->vfs_expanded * align;
>>+	}
>>
>>  	return align;
>>  }
>>
>
>	if (!pdn->vfs_expanded)
>		return align;
>
>	if (pdn->m64_single_mode)
>		return max(align, (resource_size_t)phb->ioda.m64_segsize);
>
>	return pdn->vfs_expanded * align;
>
>Fewer braces/indents/line wraps :)
>

Yep~ Looks nice. Thanks!
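
Folding both suggestions together (the reworded comment plus your early-return
structure), the function would read roughly like this; a sketch only, not
compile-tested:

static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
						      int resno)
{
	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
	struct pnv_phb *phb = hose->private_data;
	struct pci_dn *pdn = pci_get_pdn(pdev);
	resource_size_t align;

	/*
	 * (leading part of the comment unchanged)
	 *
	 * This function returns the total IOV BAR size if M64 BAR is in
	 * Shared PE mode, or just the individual size if not.
	 * If the M64 BAR is in Single PE mode, return the VF BAR size or
	 * the M64 segment size if the IOV BAR size is less than that.
	 */
	align = pci_iov_resource_size(pdev, resno);
	if (!pdn->vfs_expanded)
		return align;

	if (pdn->m64_single_mode)
		return max(align, (resource_size_t)phb->ioda.m64_segsize);

	return pdn->vfs_expanded * align;
}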

>
>
>-- 
>Alexey

-- 
Richard Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V4 5/6] powerpc/powernv: boundary the total VF BAR size instead of the individual one
  2015-10-02  9:51   ` Alexey Kardashevskiy
@ 2015-10-08  7:13     ` Wei Yang
  0 siblings, 0 replies; 23+ messages in thread
From: Wei Yang @ 2015-10-08  7:13 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: Wei Yang, gwshan, benh, linuxppc-dev

On Fri, Oct 02, 2015 at 07:51:17PM +1000, Alexey Kardashevskiy wrote:
>On 08/19/2015 12:01 PM, Wei Yang wrote:
>>Each VF could have 6 BARs at most. When the total BAR size exceeds the
>>gate, after expanding it will also exhaust the M64 Window.
>>
>>This patch limits the boundary by checking the total VF BAR size instead of
>>the individual BAR.
>
>The gate is the biggest segment size in a PE in shared mode, right? And this
>is 64MB. Also, BARs with the same number across all VFs of the same physical
>adapter will be mapped contiguously (as one huge IOV BAR); for example, with
>2 VFs and 2 BARs each, the mapping will look like:
>VF0-BAR0, VF1-BAR0, VF0-BAR1, VF1-BAR1
>but not like this:
>VF0-BAR0, VF0-BAR1, VF1-BAR0, VF1-BAR1
>Is this correct?
>

Yes, your understanding is correct. It will look like:

VF0-BAR0, VF1-BAR0, VF0-BAR1, VF1-BAR1
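
In other words, with S being the per-VF BAR size that pci_iov_resource_size()
returns, VF j's slice of IOV BAR i starts at res->start + S * j, which is
exactly what the Single PE path maps with its own M64 BAR. A fragment lifted
from the hunk in patch 3/6, with the shared-mode branch shown for contrast:

	/* shared mode: one M64 BAR covers the whole (expanded) IOV BAR */
	size  = resource_size(res);
	start = res->start;

	/* Single PE mode: one M64 BAR per VF per IOV BAR, covering VF j's slice */
	size  = pci_iov_resource_size(pdev, PCI_IOV_RESOURCES + i);
	start = res->start + size * j;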

>
>
>>
>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c |   14 ++++++++------
>>  1 file changed, 8 insertions(+), 6 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index b8bc51f..4bc83b8 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -2701,7 +2701,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>  	const resource_size_t gate = phb->ioda.m64_segsize >> 2;
>>  	struct resource *res;
>>  	int i;
>>-	resource_size_t size;
>>+	resource_size_t size, total_vf_bar_sz;
>>  	struct pci_dn *pdn;
>>  	int mul, total_vfs;
>>
>>@@ -2714,6 +2714,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>
>>  	total_vfs = pci_sriov_get_totalvfs(pdev);
>>  	mul = phb->ioda.total_pe;
>>+	total_vf_bar_sz = 0;
>>
>>  	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>>  		res = &pdev->resource[i + PCI_IOV_RESOURCES];
>>@@ -2726,7 +2727,8 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>  			return;
>>  		}
>>
>>-		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>>+		total_vf_bar_sz += pci_iov_resource_size(pdev,
>>+				i + PCI_IOV_RESOURCES);
>
>
>Is @pdev a physical function device in this context? I assume it is, so
>pci_iov_resource_size() returns the entire IOV BAR size.
>For example, I have a Mellanox card with 16 VFs, each with a single 32MB BAR,
>so total_vf_bar_sz will be 16*32 = 512MB, which will exceed the @gate size,
>and we end up with m64_single_mode = true. What am I missing here?
>

@pdev is a PF here.

But pci_iov_resource_size() returns the per-VF BAR size, not the entire IOV
BAR size.

iov->barsz[] is set in sriov_init(); you can take a look at it there.
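
So for your Mellanox example the numbers work out as follows. This is a sketch
that assumes m64_segsize is 256MB on this PHB, so gate = m64_segsize >> 2 = 64MB:

	total_vf_bar_sz = 32MB		/* only one IOV BAR populated, per-VF size */
	gate            = 256MB >> 2	/* = 64MB under the assumption above */

Since 32MB does not exceed the gate, m64_single_mode stays false. The
16 * 32MB = 512MB figure is the size of the whole IOV BAR (total_vfs copies of
the VF BAR), not the value compared against the gate.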

>
>>
>>  		/*
>>  		 * If bigger than quarter of M64 segment size, just round up
>>@@ -2740,11 +2742,11 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>>  		 * limit the system flexibility.  This is a design decision to
>>  		 * set the boundary to quarter of the M64 segment size.
>>  		 */
>>-		if (size > gate) {
>>-			dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size "
>>-				"is bigger than %lld, roundup power2\n",
>>-				 i, res, gate);
>>+		if (total_vf_bar_sz > gate) {
>>  			mul = roundup_pow_of_two(total_vfs);
>>+			dev_info(&pdev->dev,
>>+				"VF BAR Total IOV size %llx > %llx, roundup to %d VFs\n",
>>+				total_vf_bar_sz, gate, mul);
>>  			pdn->m64_single_mode = true;
>>  			break;
>>  		}
>>
>
>
>-- 
>Alexey

-- 
Richard Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V4 6/6] powerpc/powernv: allocate sparse PE# when using M64 BAR in Single PE mode
  2015-10-02 10:05   ` Alexey Kardashevskiy
@ 2015-10-08  7:19     ` Wei Yang
  0 siblings, 0 replies; 23+ messages in thread
From: Wei Yang @ 2015-10-08  7:19 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: Wei Yang, gwshan, benh, linuxppc-dev

On Fri, Oct 02, 2015 at 08:05:47PM +1000, Alexey Kardashevskiy wrote:
>On 08/19/2015 12:01 PM, Wei Yang wrote:
>>When M64 BAR is set to Single PE mode, the PE# assigned to VF could be
>>sparse.
>>
>>This patch restructures the patch to allocate sparse PE# for VFs when M64
>
>This patch restructures the code ;)
>
>
>>BAR is set to Single PE mode. Also it rename the offset to pe_num_map to
>>reflect the content is the PE number.
>>
>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>>---
>>  arch/powerpc/include/asm/pci-bridge.h     |    2 +-
>>  arch/powerpc/platforms/powernv/pci-ioda.c |   79 ++++++++++++++++++++++-------
>>  2 files changed, 61 insertions(+), 20 deletions(-)
>>
>>diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
>>index 8aeba4c..b3a226b 100644
>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>@@ -213,7 +213,7 @@ struct pci_dn {
>>  #ifdef CONFIG_PCI_IOV
>>  	u16     vfs_expanded;		/* number of VFs IOV BAR expanded */
>>  	u16     num_vfs;		/* number of VFs enabled*/
>>-	int     offset;			/* PE# for the first VF PE */
>>+	int     *pe_num_map;		/* PE# for the first VF PE or array */
>>  	bool    m64_single_mode;	/* Use M64 BAR in Single Mode */
>>  #define IODA_INVALID_M64        (-1)
>>  	int     (*m64_map)[PCI_SRIOV_NUM_BARS];
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 4bc83b8..779f52a 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -1243,7 +1243,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>>
>>  			/* Map the M64 here */
>>  			if (pdn->m64_single_mode) {
>>-				pe_num = pdn->offset + j;
>>+				pe_num = pdn->pe_num_map[j];
>>  				rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>  						pe_num, OPAL_M64_WINDOW_TYPE,
>>  						pdn->m64_map[j][i], 0);
>>@@ -1347,7 +1347,7 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>>  	struct pnv_phb        *phb;
>>  	struct pci_dn         *pdn;
>>  	struct pci_sriov      *iov;
>>-	u16 num_vfs;
>>+	u16                    num_vfs, i;
>>
>>  	bus = pdev->bus;
>>  	hose = pci_bus_to_host(bus);
>>@@ -1361,14 +1361,21 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>>
>>  	if (phb->type == PNV_PHB_IODA2) {
>>  		if (!pdn->m64_single_mode)
>>-			pnv_pci_vf_resource_shift(pdev, -pdn->offset);
>>+			pnv_pci_vf_resource_shift(pdev, -*pdn->pe_num_map);
>>
>>  		/* Release M64 windows */
>>  		pnv_pci_vf_release_m64(pdev, num_vfs);
>>
>>  		/* Release PE numbers */
>>-		bitmap_clear(phb->ioda.pe_alloc, pdn->offset, num_vfs);
>>-		pdn->offset = 0;
>>+		if (pdn->m64_single_mode) {
>>+			for (i = 0; i < num_vfs; i++) {
>>+				if (pdn->pe_num_map[i] != IODA_INVALID_PE)
>>+					pnv_ioda_free_pe(phb, pdn->pe_num_map[i]);
>>+			}
>>+		} else
>>+			bitmap_clear(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
>>+		/* Releasing pe_num_map */
>>+		kfree(pdn->pe_num_map);
>>  	}
>>  }
>>
>>@@ -1394,7 +1401,10 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
>>
>>  	/* Reserve PE for each VF */
>>  	for (vf_index = 0; vf_index < num_vfs; vf_index++) {
>>-		pe_num = pdn->offset + vf_index;
>>+		if (pdn->m64_single_mode)
>>+			pe_num = pdn->pe_num_map[vf_index];
>>+		else
>>+			pe_num = *pdn->pe_num_map + vf_index;
>>
>>  		pe = &phb->ioda.pe_array[pe_num];
>>  		pe->pe_number = pe_num;
>>@@ -1436,6 +1446,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>  	struct pnv_phb        *phb;
>>  	struct pci_dn         *pdn;
>>  	int                    ret;
>>+	u16                    i;
>>
>>  	bus = pdev->bus;
>>  	hose = pci_bus_to_host(bus);
>>@@ -1458,20 +1469,42 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>  			return -EBUSY;
>>  		}
>>
>>+		/* Allocating pe_num_map */
>>+		if (pdn->m64_single_mode)
>>+			pdn->pe_num_map = kmalloc(sizeof(*pdn->pe_num_map) * num_vfs,
>>+					GFP_KERNEL);
>>+		else
>>+			pdn->pe_num_map = kmalloc(sizeof(*pdn->pe_num_map), GFP_KERNEL);
>>+
>>+		if (!pdn->pe_num_map)
>>+			return -ENOMEM;
>
>[*]
>
>>+
>>  		/* Calculate available PE for required VFs */
>>-		mutex_lock(&phb->ioda.pe_alloc_mutex);
>>-		pdn->offset = bitmap_find_next_zero_area(
>>-			phb->ioda.pe_alloc, phb->ioda.total_pe,
>>-			0, num_vfs, 0);
>>-		if (pdn->offset >= phb->ioda.total_pe) {
>>+		if (pdn->m64_single_mode) {
>>+			for (i = 0; i < num_vfs; i++)
>>+				pdn->pe_num_map[i] = IODA_INVALID_PE;
>
>It is cleaner to do such initialization right after the check that kmalloc()
>did not fail, i.e. at [*]
>

Yep, agree.
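
Something like this, then; a rough sketch of the reshuffled allocation path,
not compile-tested:

		/* Allocating pe_num_map */
		if (pdn->m64_single_mode)
			pdn->pe_num_map = kmalloc(sizeof(*pdn->pe_num_map) * num_vfs,
					GFP_KERNEL);
		else
			pdn->pe_num_map = kmalloc(sizeof(*pdn->pe_num_map), GFP_KERNEL);

		if (!pdn->pe_num_map)
			return -ENOMEM;

		/* initialize right after the NULL check, before any PE is allocated */
		if (pdn->m64_single_mode)
			for (i = 0; i < num_vfs; i++)
				pdn->pe_num_map[i] = IODA_INVALID_PE;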

>
>>+			for (i = 0; i < num_vfs; i++) {
>>+				pdn->pe_num_map[i] = pnv_ioda_alloc_pe(phb);
>>+				if (pdn->pe_num_map[i] == IODA_INVALID_PE) {
>>+					ret = -EBUSY;
>>+					goto m64_failed;
>>+				}
>>+			}
>>+		} else {
>>+			mutex_lock(&phb->ioda.pe_alloc_mutex);
>>+			*pdn->pe_num_map = bitmap_find_next_zero_area(
>>+				phb->ioda.pe_alloc, phb->ioda.total_pe,
>>+				0, num_vfs, 0);
>>+			if (*pdn->pe_num_map >= phb->ioda.total_pe) {
>>+				mutex_unlock(&phb->ioda.pe_alloc_mutex);
>>+				dev_info(&pdev->dev, "Failed to enable VF%d\n", num_vfs);
>>+				kfree(pdn->pe_num_map);
>>+				return -EBUSY;
>>+			}
>>+			bitmap_set(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
>>  			mutex_unlock(&phb->ioda.pe_alloc_mutex);
>>-			dev_info(&pdev->dev, "Failed to enable VF%d\n", num_vfs);
>>-			pdn->offset = 0;
>>-			return -EBUSY;
>>  		}
>>-		bitmap_set(phb->ioda.pe_alloc, pdn->offset, num_vfs);
>>  		pdn->num_vfs = num_vfs;
>>-		mutex_unlock(&phb->ioda.pe_alloc_mutex);
>>
>>  		/* Assign M64 window accordingly */
>>  		ret = pnv_pci_vf_assign_m64(pdev, num_vfs);
>>@@ -1486,7 +1519,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>  		 * Otherwise, the PE# for the VF will conflict with others.
>>  		 */
>>  		if (!pdn->m64_single_mode) {
>>-			ret = pnv_pci_vf_resource_shift(pdev, pdn->offset);
>>+			ret = pnv_pci_vf_resource_shift(pdev, *pdn->pe_num_map);
>>  			if (ret)
>>  				goto m64_failed;
>>  		}
>>@@ -1498,8 +1531,16 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>>  	return 0;
>>
>>  m64_failed:
>>-	bitmap_clear(phb->ioda.pe_alloc, pdn->offset, num_vfs);
>>-	pdn->offset = 0;
>>+	if (pdn->m64_single_mode) {
>>+		for (i = 0; i < num_vfs; i++) {
>>+			if (pdn->pe_num_map[i] != IODA_INVALID_PE)
>>+				pnv_ioda_free_pe(phb, pdn->pe_num_map[i]);
>>+		}
>>+	} else
>>+		bitmap_clear(phb->ioda.pe_alloc, *pdn->pe_num_map, num_vfs);
>>+
>>+	/* Releasing pe_num_map */
>>+	kfree(pdn->pe_num_map);
>>
>>  	return ret;
>>  }
>>
>
>
>-- 
>Alexey

-- 
Richard Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 23+ messages in thread

