[PATCH kernel RFC 0/3] powerpc/pseries/iommu: GPU coherent memory pass through

* [PATCH kernel RFC 0/3] powerpc/pseries/iommu: GPU coherent memory pass through
@ 2018-07-25  9:50 Alexey Kardashevskiy
  2018-07-25  9:50 ` [PATCH kernel RFC 1/3] powerpc/pseries/iommu: Allow dynamic window to start from zero Alexey Kardashevskiy
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Alexey Kardashevskiy @ 2018-07-25  9:50 UTC
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc,
	Benjamin Herrenschmidt, Michael Ellerman, Paul Mackerras,
	Russell Currey

I am trying to pass through a 3D controller:
[0302]: NVIDIA Corporation GV100GL [Tesla V100 SXM2] [10de:1db1] (rev a1)

which has a fairly unusual feature: coherent memory that is directly
accessible from a POWER9 CPU via an NVLink2 transport.

So in addition to passing a PCI device + accompanying NPU devices,
we will also be passing the host physical address range as it is done
on the bare metal system.

The memory on the host is presented as:

===
[aik@yc02goos ~]$ lsprop /proc/device-tree/memory@42000000000
ibm,chip-id      000000fe (254)
device_type      "memory"
compatible       "ibm,coherent-device-memory"
reg              00000420 00000000 00000020 00000000
linux,usable-memory
                 00000420 00000000 00000000 00000000
phandle          00000726 (1830)
name             "memory"
ibm,associativity
                 00000004 000000fe 000000fe 000000fe 000000fe
===

and the host does not touch it because the second 64bit value of
"linux,usable-memory" (the size) is zero. Later on the NVIDIA driver
trains the NVLink2 link and probes this memory, and this is how it gets
onlined.
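
For illustration only, the "reg" and "linux,usable-memory" cells above can
be decoded as in the sketch below. It assumes two address cells and two
size cells, which matches the four 32bit values shown; the helper name
cells_to_u64() and the accessor functions are hypothetical and are not
kernel code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: combine n 32-bit cells (most significant first,
 * already in CPU byte order) into one 64-bit value. Valid for n <= 2.
 */
static uint64_t cells_to_u64(const uint32_t *cells, int n)
{
	uint64_t v = 0;

	while (n--)
		v = (v << 32) | *cells++;
	return v;
}

/* Values copied from the lsprop output above. */
static const uint32_t reg_cells[4] = {
	0x00000420, 0x00000000, 0x00000020, 0x00000000
};
static const uint32_t usable_cells[4] = {
	0x00000420, 0x00000000, 0x00000000, 0x00000000
};

static uint64_t gpu_mem_base(void)
{
	return cells_to_u64(&reg_cells[0], 2);
}

static uint64_t gpu_mem_size(void)
{
	return cells_to_u64(&reg_cells[2], 2);
}

static uint64_t gpu_mem_usable_size(void)
{
	return cells_to_u64(&usable_cells[2], 2);
}
```

With these values the base is 0x42000000000 (matching the node name),
the size is 0x2000000000 (128 GiB), and the usable size is 0, which is
why the host leaves the range alone at boot.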

In the virtual environment I am planning on doing the same thing;
however, 64bit DMA is handled differently there. The powernv platform
uses a PHB3 bypass mode and that just works, whereas the pseries
platform uses the DDW RTAS API to achieve the same result. The problem
is that we need a huge DMA window which starts from zero (because this
GPU supports less than 50 bits of DMA address space) and covers not
just the present memory but also this new coherent memory.

This is based on sha1
d72e90f3 Linus Torvalds "Linux 4.18-rc6".

Please comment. Thanks.

Alexey Kardashevskiy (3):
  powerpc/pseries/iommu: Allow dynamic window to start from zero
  powerpc/pseries/iommu: Force default DMA window removal
  powerpc/pseries/iommu: Use memory@ nodes in max RAM address
    calculation

 arch/powerpc/platforms/pseries/iommu.c | 77 ++++++++++++++++++++++++++++++----
 1 file changed, 70 insertions(+), 7 deletions(-)

-- 
2.11.0


* [PATCH kernel RFC 1/3] powerpc/pseries/iommu: Allow dynamic window to start from zero
  2018-07-25  9:50 [PATCH kernel RFC 0/3] powerpc/pseries/iommu: GPU coherent memory pass through Alexey Kardashevskiy
@ 2018-07-25  9:50 ` Alexey Kardashevskiy
  2018-07-27  3:42   ` David Gibson
  2018-07-25  9:50 ` [PATCH kernel RFC 2/3] powerpc/pseries/iommu: Force default DMA window removal Alexey Kardashevskiy
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Alexey Kardashevskiy @ 2018-07-25  9:50 UTC
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc,
	Benjamin Herrenschmidt, Michael Ellerman, Paul Mackerras,
	Russell Currey

At the moment the kernel does not expect dynamic windows to ever start
at zero on a PCI bus as PAPR requires the hypervisor to create a 32bit
default window which starts from zero and the pseries kernel only
creates additional windows.

However, PAPR permits removing the default window and creating another
one instead, starting from zero as well. In fact, the kernel used to
remove the default window after sha1 25ebc45b934, but this was
reverted later.

Since there are devices capable of more than 32 bits for DMA but less than
50, and currently available hardware allows the second window only
at 1<<59, we will need to be able to create bigger windows starting from
zero. This does the initial preparation and should not cause any
behavioral changes.
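
As a rough userspace illustration of why zero stops working as the
"no window" marker once a window may legitimately start at offset 0:
the lookup function and the table below are hypothetical stand-ins and
not the kernel's find_existing_ddw(); only the DDW_INVALID_OFFSET idea
is taken from this patch.

```c
#include <stddef.h>
#include <stdint.h>

#define DDW_INVALID_OFFSET	((uint64_t)-1)

struct fake_window {
	int pe;			/* hypothetical PE identifier */
	uint64_t dma_offset;	/* bus address where the window starts */
};

/* Return the window offset for @pe, or DDW_INVALID_OFFSET if none. */
static uint64_t fake_find_window(const struct fake_window *w, size_t n,
				 int pe)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (w[i].pe == pe)
			return w[i].dma_offset;
	return DDW_INVALID_OFFSET;
}
```

A caller comparing the result against 0 would treat a window at
offset 0 as "not found"; comparing against DDW_INVALID_OFFSET keeps the
two cases apart.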

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/pseries/iommu.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 06f0296..9ece42f 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -53,6 +53,8 @@
 
 #include "pseries.h"
 
+#define DDW_INVALID_OFFSET	((uint64_t)-1)
+
 static struct iommu_table_group *iommu_pseries_alloc_group(int node)
 {
 	struct iommu_table_group *table_group;
@@ -844,7 +846,7 @@ static u64 find_existing_ddw(struct device_node *pdn)
 {
 	struct direct_window *window;
 	const struct dynamic_dma_window_prop *direct64;
-	u64 dma_addr = 0;
+	u64 dma_addr = DDW_INVALID_OFFSET;
 
 	spin_lock(&direct_window_list_lock);
 	/* check if we already created a window and dupe that config if so */
@@ -992,7 +994,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	mutex_lock(&direct_window_init_mutex);
 
 	dma_addr = find_existing_ddw(pdn);
-	if (dma_addr != 0)
+	if (dma_addr != DDW_INVALID_OFFSET)
 		goto out_unlock;
 
 	/*
@@ -1228,7 +1230,7 @@ static int dma_set_mask_pSeriesLP(struct device *dev, u64 dma_mask)
 		}
 		if (pdn && PCI_DN(pdn)) {
 			dma_offset = enable_ddw(pdev, pdn);
-			if (dma_offset != 0) {
+			if (dma_offset != DDW_INVALID_OFFSET) {
 				dev_info(dev, "Using 64-bit direct DMA at offset %llx\n", dma_offset);
 				set_dma_offset(dev, dma_offset);
 				set_dma_ops(dev, &dma_nommu_ops);
-- 
2.11.0


* [PATCH kernel RFC 2/3] powerpc/pseries/iommu: Force default DMA window removal
  2018-07-25  9:50 [PATCH kernel RFC 0/3] powerpc/pseries/iommu: GPU coherent memory pass through Alexey Kardashevskiy
  2018-07-25  9:50 ` [PATCH kernel RFC 1/3] powerpc/pseries/iommu: Allow dynamic window to start from zero Alexey Kardashevskiy
@ 2018-07-25  9:50 ` Alexey Kardashevskiy
  2018-07-25  9:50 ` [PATCH kernel RFC 3/3] powerpc/pseries/iommu: Use memory@ nodes in max RAM address calculation Alexey Kardashevskiy
  2018-08-09  4:41 ` [PATCH kernel RFC 0/3] powerpc/pseries/iommu: GPU coherent memory pass through Alexey Kardashevskiy
  3 siblings, 0 replies; 9+ messages in thread
From: Alexey Kardashevskiy @ 2018-07-25  9:50 UTC
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc,
	Benjamin Herrenschmidt, Michael Ellerman, Paul Mackerras,
	Russell Currey

It is quite common for a device to support more than 32bit but less than
64bit for DMA; for example, GPUs often support 42..50bits. However,
the pseries platform only allows a huge DMA window (the one which allows
the use of more than 2GB of DMA space) for 64bit-capable devices, mostly
because:

1. we may have 32bit and >32bit devices on the same IOMMU domain and
we cannot place the new big window where the 32bit one is located;

2. the existing hardware only supports the second window at very high
offset of 1<<59 == 0x0800.0000.0000.0000.

So in order to allow 33..59bit DMA, we have to remove the default DMA
window and place a huge one there instead.
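
A small sketch of the arithmetic behind this, assuming a userspace
build. BIT_MASK_N() has the same shape as the kernel's DMA_BIT_MASK()
but is redefined here, and window_reachable() is a hypothetical helper
used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK(); redefined for userspace. */
#define BIT_MASK_N(n)	(((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * True if a device limited to @dma_mask can address every byte of a
 * window of @size bytes starting at bus address @start.
 */
static bool window_reachable(uint64_t dma_mask, uint64_t start,
			     uint64_t size)
{
	if (size == 0 || start + size - 1 < start)	/* empty or wraps */
		return false;
	return start + size - 1 <= dma_mask;
}
```

For example, a device with a 47bit mask (an illustrative value within
the 42..50bit range mentioned above) can reach a window at 0 but not one
at 1<<59, while a 64bit-capable device can reach both.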

The PAPR spec says that the platform may decide not to use the default
window and remove it using DDW RTAS calls. There are a few possible ways
for the platform to decide:

1. look at the device IDs and decide in advance that such and such
devices are capable of more than 32bit DMA (powernv's sketchy bypass
does something like this - it drops the default window if all devices
on the PE are from the same vendor) - this is not great as it involves
guessing and, unlike sketchy bypass, the GPU case involves 2 vendor
ids and does not scale;

2. advertise 1 available DMA window in the hypervisor via
ibm,query-pe-dma-window so the pseries platform could take it as a clue
that if more bits for DMA are needed, it has to remove the default
window - this is not great as it is an implicit clue rather than a
direct instruction;

3. not creating the default DMA window at all - this is not really an
option as PAPR mandates its presence at guest boot time;

4. make the hypervisor explicitly tell the guest that the default window
had better be removed, so the guest does not have to think hard and can
simply do what is requested.
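
For illustration, the checks which this patch adds to
dma_set_mask_pSeriesLP() can be condensed into the two predicates below.
This is a hypothetical userspace sketch: the function names are made up,
and BIT_MASK_N() is a local copy of the kernel's DMA_BIT_MASK().

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT_MASK_N(n)	(((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Try to set up a huge window for any mask wider than 32 bits. */
static bool want_huge_window(bool disable_ddw, uint64_t dma_mask)
{
	return !disable_ddw && dma_mask > BIT_MASK_N(32);
}

/*
 * Remove the default window only if the hypervisor set the flag, a
 * default window exists, and the device does not advertise full 64-bit
 * DMA (in which case the window at 1<<59 is usable as before).
 */
static bool want_default_window_removed(bool force_remove_flag,
					bool have_default_window,
					uint64_t dma_mask)
{
	return force_remove_flag && have_default_window &&
	       dma_mask != BIT_MASK_N(64);
}
```

With a 47bit mask (an illustrative value) and the flag set, both
predicates are true; with a 64bit mask only the first one is, so
existing 64bit-capable devices keep the current behaviour.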

This patch uses the last approach (4) and relies on a new
"qemu,dma-force-remove-default" flag in a vPHB.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/pseries/iommu.c | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 9ece42f..840afe5 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -54,6 +54,7 @@
 #include "pseries.h"

 #define DDW_INVALID_OFFSET	((uint64_t)-1)
+#define DDW_INVALID_LIOBN	((uint32_t)-1)

 static struct iommu_table_group *iommu_pseries_alloc_group(int node)
 {
@@ -977,7 +978,8 @@ static LIST_HEAD(failed_ddw_pdn_list);
  *
  * returns the dma offset for use by dma_set_mask
  */
-static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
+static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn,
+		u32 default_liobn)
 {
 	int len, ret;
 	struct ddw_query_response query;
@@ -1022,6 +1024,16 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	if (ret)
 		goto out_failed;

+	/*
+	 * The device tree has a request to force remove the default window,
+	 * do this.
+	 */
+	if (default_liobn != DDW_INVALID_LIOBN && (!ddw_avail[2] ||
+			rtas_call(ddw_avail[2], 1, 1, NULL, default_liobn))) {
+		dev_dbg(&dev->dev, "Could not remove window\n");
+		goto out_failed;
+	}
+
        /*
 	 * Query if there is a second window of size to map the
 	 * whole partition.  Query returns number of windows, largest
@@ -1212,7 +1224,7 @@ static int dma_set_mask_pSeriesLP(struct device *dev, u64 dma_mask)
 	pdev = to_pci_dev(dev);

 	/* only attempt to use a new window if 64-bit DMA is requested */
-	if (!disable_ddw && dma_mask == DMA_BIT_MASK(64)) {
+	if (!disable_ddw && dma_mask > DMA_BIT_MASK(32)) {
 		dn = pci_device_to_OF_node(pdev);
 		dev_dbg(dev, "node is %pOF\n", dn);

@@ -1229,7 +1241,15 @@ static int dma_set_mask_pSeriesLP(struct device *dev, u64 dma_mask)
 				break;
 		}
 		if (pdn && PCI_DN(pdn)) {
-			dma_offset = enable_ddw(pdev, pdn);
+			u32 flag = 0, liobn = DDW_INVALID_LIOBN;
+			int ret = of_property_read_u32(pdn,
+					"qemu,dma-force-remove-default", &flag);
+
+			if (!ret && flag && dma_window &&
+					dma_mask != DMA_BIT_MASK(64))
+				liobn = be32_to_cpu(dma_window[0]);
+
+			dma_offset = enable_ddw(pdev, pdn, liobn);
 			if (dma_offset != DDW_INVALID_OFFSET) {
 				dev_info(dev, "Using 64-bit direct DMA at offset %llx\n", dma_offset);
 				set_dma_offset(dev, dma_offset);
-- 
2.11.0


* [PATCH kernel RFC 3/3] powerpc/pseries/iommu: Use memory@ nodes in max RAM address calculation
  2018-07-25  9:50 [PATCH kernel RFC 0/3] powerpc/pseries/iommu: GPU coherent memory pass through Alexey Kardashevskiy
  2018-07-25  9:50 ` [PATCH kernel RFC 1/3] powerpc/pseries/iommu: Allow dynamic window to start from zero Alexey Kardashevskiy
  2018-07-25  9:50 ` [PATCH kernel RFC 2/3] powerpc/pseries/iommu: Force default DMA window removal Alexey Kardashevskiy
@ 2018-07-25  9:50 ` Alexey Kardashevskiy
  2018-08-09  4:41 ` [PATCH kernel RFC 0/3] powerpc/pseries/iommu: GPU coherent memory pass through Alexey Kardashevskiy
  3 siblings, 0 replies; 9+ messages in thread
From: Alexey Kardashevskiy @ 2018-07-25  9:50 UTC
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc,
	Benjamin Herrenschmidt, Michael Ellerman, Paul Mackerras,
	Russell Currey

We might have memory@ nodes with the size in "linux,usable-memory" set
to zero (for example, to replicate powernv's behaviour for GPU coherent
memory), which means that the memory needs extra initialization before
use. Since it can be used afterwards, the pseries platform will try
mapping it for DMA, so the DMA window needs to cover those memory
regions too.

This walks through the memory nodes to find the highest RAM address so
that a huge DMA window covers it too in case this memory gets onlined
later.

The existing memory_hotplug_max() does not do the job as it calls:

1. memblock_end_of_DRAM(), which looks at memory blocks, and GPU RAM
is not there because of size==0 in the linux,usable-memory property
of the memory node;

2. hot_add_drconf_memory_max(), which does not support sparse memory:
if we want to map this memory in the guest where it is mapped
on the host (and it looks like we have to), the drconf chunk
easily gets bigger than a megabyte.
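
A minimal userspace sketch of the calculation, assuming the memory
ranges have already been read out of the device tree. struct mem_range,
max_ram_end() and pages_needed() are hypothetical names; the last one
mirrors the "max_addr >> page_shift" comparison against
largest_available_block in enable_ddw().

```c
#include <stddef.h>
#include <stdint.h>

struct mem_range {
	uint64_t start;
	uint64_t size;
};

/* Highest end address over @base_max and all @n ranges. */
static uint64_t max_ram_end(uint64_t base_max, const struct mem_range *r,
			    size_t n)
{
	uint64_t max_addr = base_max;
	size_t i;

	for (i = 0; i < n; i++)
		if (r[i].start + r[i].size > max_addr)
			max_addr = r[i].start + r[i].size;
	return max_addr;
}

/* Number of IOMMU pages needed to map [0, max_addr) linearly. */
static uint64_t pages_needed(uint64_t max_addr, unsigned int page_shift)
{
	return max_addr >> page_shift;
}
```

Taking the GPU range from the cover letter (start 0x42000000000, size
0x2000000000) and an assumed 4 GiB of ordinary guest RAM, the highest
address becomes 0x44000000000, and a window with 64K pages (page shift
16, also an assumption) needs 0x4400000 entries to cover it.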

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/pseries/iommu.c | 43 +++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 840afe5..74404f8 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -967,6 +967,47 @@ struct failed_ddw_pdn {
 
 static LIST_HEAD(failed_ddw_pdn_list);
 
+static unsigned long read_n_cells(int n, const __be32 **buf)
+{
+	unsigned long result = 0;
+
+	while (n--) {
+		result = (result << 32) | of_read_number(*buf, 1);
+		(*buf)++;
+	}
+	return result;
+}
+
+static phys_addr_t ddw_memory_hotplug_max(void)
+{
+	phys_addr_t max_addr = memory_hotplug_max();
+	struct device_node *memory;
+
+	for_each_node_by_type(memory, "memory") {
+		unsigned long start, size;
+		int ranges, n_mem_addr_cells, n_mem_size_cells, len;
+		const __be32 *memcell_buf;
+
+		memcell_buf = of_get_property(memory, "reg", &len);
+		if (!memcell_buf || len <= 0)
+			continue;
+
+		n_mem_addr_cells = of_n_addr_cells(memory);
+		n_mem_size_cells = of_n_size_cells(memory);
+
+		/* ranges in cell */
+		ranges = (len >> 2) / (n_mem_addr_cells + n_mem_size_cells);
+
+		/* these are order-sensitive, and modify the buffer pointer */
+		start = read_n_cells(n_mem_addr_cells, &memcell_buf);
+		size = read_n_cells(n_mem_size_cells, &memcell_buf);
+
+		max_addr = max_t(phys_addr_t, max_addr, start + size);
+	}
+
+	return max_addr;
+}
+
 /*
  * If the PE supports dynamic dma windows, and there is space for a table
  * that can map all pages in a linear offset, then setup such a table,
@@ -1067,7 +1108,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn,
 	}
 	/* verify the window * number of ptes will map the partition */
 	/* check largest block * page size > max memory hotplug addr */
-	max_addr = memory_hotplug_max();
+	max_addr = ddw_memory_hotplug_max();
 	if (query.largest_available_block < (max_addr >> page_shift)) {
 		dev_dbg(&dev->dev, "can't map partition max 0x%llx with %u "
 			  "%llu-sized pages\n", max_addr,  query.largest_available_block,
-- 
2.11.0


* Re: [PATCH kernel RFC 1/3] powerpc/pseries/iommu: Allow dynamic window to start from zero
  2018-07-25  9:50 ` [PATCH kernel RFC 1/3] powerpc/pseries/iommu: Allow dynamic window to start from zero Alexey Kardashevskiy
@ 2018-07-27  3:42   ` David Gibson
  0 siblings, 0 replies; 9+ messages in thread
From: David Gibson @ 2018-07-27  3:42 UTC
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, kvm-ppc, Benjamin Herrenschmidt, Michael Ellerman,
	Paul Mackerras, Russell Currey

On Wed, Jul 25, 2018 at 07:50:30PM +1000, Alexey Kardashevskiy wrote:
> At the moment the kernel does not expect dynamic windows to ever start
> at zero on a PCI bus as PAPR requires the hypervisor to create a 32bit
> default window which starts from zero and the pseries kernel only
> creates additional windows.
> 
> However PAPR permits removing the default window and creating another
> one instead, starting from zero as well. In fact, the kernel used to
> remove the default window after sha1 25ebc45b934 but this has been
> reverted later.
> 
> Since there are devices capable of more than 32 bits for DMA but less than
> 50, and currently available hardware allows the second window only
> at 1<<59, we will need to be able to create bigger windows starting from
> zero. This does the initial preparation and should not cause any
> behavioral changes.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  arch/powerpc/platforms/pseries/iommu.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 06f0296..9ece42f 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -53,6 +53,8 @@
>  
>  #include "pseries.h"
>  
> +#define DDW_INVALID_OFFSET	((uint64_t)-1)
> +
>  static struct iommu_table_group *iommu_pseries_alloc_group(int node)
>  {
>  	struct iommu_table_group *table_group;
> @@ -844,7 +846,7 @@ static u64 find_existing_ddw(struct device_node *pdn)
>  {
>  	struct direct_window *window;
>  	const struct dynamic_dma_window_prop *direct64;
> -	u64 dma_addr = 0;
> +	u64 dma_addr = DDW_INVALID_OFFSET;
>  
>  	spin_lock(&direct_window_list_lock);
>  	/* check if we already created a window and dupe that config if so */
> @@ -992,7 +994,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>  	mutex_lock(&direct_window_init_mutex);
>  
>  	dma_addr = find_existing_ddw(pdn);
> -	if (dma_addr != 0)
> +	if (dma_addr != DDW_INVALID_OFFSET)
>  		goto out_unlock;
>  
>  	/*
> @@ -1228,7 +1230,7 @@ static int dma_set_mask_pSeriesLP(struct device *dev, u64 dma_mask)
>  		}
>  		if (pdn && PCI_DN(pdn)) {
>  			dma_offset = enable_ddw(pdev, pdn);
> -			if (dma_offset != 0) {
> +			if (dma_offset != DDW_INVALID_OFFSET) {
>  				dev_info(dev, "Using 64-bit direct DMA at offset %llx\n", dma_offset);
>  				set_dma_offset(dev, dma_offset);
>  				set_dma_ops(dev, &dma_nommu_ops);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [PATCH kernel RFC 0/3] powerpc/pseries/iommu: GPU coherent memory pass through
  2018-07-25  9:50 [PATCH kernel RFC 0/3] powerpc/pseries/iommu: GPU coherent memory pass through Alexey Kardashevskiy
                   ` (2 preceding siblings ...)
  2018-07-25  9:50 ` [PATCH kernel RFC 3/3] powerpc/pseries/iommu: Use memory@ nodes in max RAM address calculation Alexey Kardashevskiy
@ 2018-08-09  4:41 ` Alexey Kardashevskiy
  2018-08-24  3:04   ` Alexey Kardashevskiy
  3 siblings, 1 reply; 9+ messages in thread
From: Alexey Kardashevskiy @ 2018-08-09  4:41 UTC
  To: linuxppc-dev
  Cc: David Gibson, kvm-ppc, Benjamin Herrenschmidt, Michael Ellerman,
	Paul Mackerras, Russell Currey



Ping?

-- 
Alexey


* Re: [PATCH kernel RFC 0/3] powerpc/pseries/iommu: GPU coherent memory pass through
  2018-08-09  4:41 ` [PATCH kernel RFC 0/3] powerpc/pseries/iommu: GPU coherent memory pass through Alexey Kardashevskiy
@ 2018-08-24  3:04   ` Alexey Kardashevskiy
  2018-09-17  7:05     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 9+ messages in thread
From: Alexey Kardashevskiy @ 2018-08-24  3:04 UTC
  To: linuxppc-dev
  Cc: David Gibson, kvm-ppc, Benjamin Herrenschmidt, Michael Ellerman,
	Paul Mackerras, Russell Currey



Ping?

-- 
Alexey


* Re: [PATCH kernel RFC 0/3] powerpc/pseries/iommu: GPU coherent memory pass through
  2018-08-24  3:04   ` Alexey Kardashevskiy
@ 2018-09-17  7:05     ` Alexey Kardashevskiy
  2018-10-15  7:29       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 9+ messages in thread
From: Alexey Kardashevskiy @ 2018-09-17  7:05 UTC
  To: linuxppc-dev
  Cc: David Gibson, kvm-ppc, Benjamin Herrenschmidt, Michael Ellerman,
	Paul Mackerras, Russell Currey

Ping?

The problem is still there...


-- 
Alexey


* Re: [PATCH kernel RFC 0/3] powerpc/pseries/iommu: GPU coherent memory pass through
  2018-09-17  7:05     ` Alexey Kardashevskiy
@ 2018-10-15  7:29       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 9+ messages in thread
From: Alexey Kardashevskiy @ 2018-10-15  7:29 UTC
  To: linuxppc-dev; +Cc: kvm-ppc, David Gibson

Ping?


-- 
Alexey

