[PATCH 1/2] powernv/npu: Do a PID GPU TLB flush when invalidating a large address range

linuxppc-dev.lists.ozlabs.org archive mirror

* [PATCH 1/2] powernv/npu: Do a PID GPU TLB flush when invalidating a large address range
@ 2018-04-17  9:11 Alistair Popple
  2018-04-17  9:11 ` [PATCH 2/2] powernv/npu: Add a debugfs setting to change ATSD threshold Alistair Popple
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Alistair Popple @ 2018-04-17  9:11 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: mhairgrove, arbab, bsingharora, Alistair Popple

The NPU has a limited number of address translation shootdown (ATSD)
registers and the GPU has limited bandwidth to process ATSDs. This can
result in contention of ATSD registers leading to soft lockups on some
threads, particularly when invalidating a large address range in
pnv_npu2_mn_invalidate_range().

At some threshold it becomes more efficient to flush the entire GPU TLB for
the given MM context (PID) than individually flushing each address in the
range. This patch will result in ranges greater than 2MB being converted
from 32+ ATSDs into a single ATSD which will flush the TLB for the given
PID on each GPU.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
---
 arch/powerpc/platforms/powernv/npu-dma.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index 94801d8e7894..dc34662e9df9 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -40,6 +40,13 @@
 DEFINE_SPINLOCK(npu_context_lock);
 
 /*
+ * When an address shootdown range exceeds this threshold we invalidate the
+ * entire TLB on the GPU for the given PID rather than each specific address in
+ * the range.
+ */
+#define ATSD_THRESHOLD (2*1024*1024)
+
+/*
  * Other types of TCE cache invalidation are not functional in the
  * hardware.
  */
@@ -675,11 +682,19 @@ static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn,
 	struct npu_context *npu_context = mn_to_npu_context(mn);
 	unsigned long address;
 
-	for (address = start; address < end; address += PAGE_SIZE)
-		mmio_invalidate(npu_context, 1, address, false);
+	if (end - start > ATSD_THRESHOLD) {
+		/*
+		 * Just invalidate the entire PID if the address range is too
+		 * large.
+		 */
+		mmio_invalidate(npu_context, 0, 0, true);
+	} else {
+		for (address = start; address < end; address += PAGE_SIZE)
+			mmio_invalidate(npu_context, 1, address, false);
 
-	/* Do the flush only on the final addess == end */
-	mmio_invalidate(npu_context, 1, address, true);
+		/* Do the flush only on the final address == end */
+		mmio_invalidate(npu_context, 1, address, true);
+	}
 }
 
 static const struct mmu_notifier_ops nv_nmmu_notifier_ops = {
-- 
2.11.0
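
The effect of the threshold on the number of invalidations can be sketched
in plain C. This is a minimal userspace model, not the kernel code: the
function name atsd_count and its page_size and threshold parameters are
hypothetical, and it only counts calls to mmio_invalidate() as they appear
in the diff above, not the register writes the NPU performs per GPU. The
"32+" figure in the commit message presumably assumes 64K pages, since
2MB / 64K = 32.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of pnv_npu2_mn_invalidate_range() after this patch.
 * Returns how many times mmio_invalidate() would be called for the range
 * [start, end). page_size and threshold are passed in for illustration;
 * the kernel uses PAGE_SIZE and ATSD_THRESHOLD.
 */
static uint64_t atsd_count(uint64_t start, uint64_t end,
			   uint64_t page_size, uint64_t threshold)
{
	uint64_t address, count = 0;

	assert(page_size > 0 && start <= end);

	/* Large range: a single PID-wide invalidation that also flushes. */
	if (end - start > threshold)
		return 1;

	/* Small range: one invalidation per page, without a flush... */
	for (address = start; address < end; address += page_size)
		count++;

	/* ...plus the final invalidation at address == end, with the flush. */
	return count + 1;
}
```

Under these assumptions a range of exactly 2MB still takes the per-page
path (the comparison is strictly greater-than) and costs 33 calls, while
anything larger collapses to one call.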


* [PATCH 2/2] powernv/npu: Add a debugfs setting to change ATSD threshold
  2018-04-17  9:11 [PATCH 1/2] powernv/npu: Do a PID GPU TLB flush when invalidating a large address range Alistair Popple
@ 2018-04-17  9:11 ` Alistair Popple
  2018-04-17 21:45   ` Balbir Singh
  2018-07-23 15:11   ` [2/2] " Michael Ellerman
  2018-04-17  9:17 ` [PATCH 1/2] powernv/npu: Do a PID GPU TLB flush when invalidating a large address range Balbir Singh
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 8+ messages in thread
From: Alistair Popple @ 2018-04-17  9:11 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: mhairgrove, arbab, bsingharora, Alistair Popple

The threshold at which it becomes more efficient to coalesce a range of
ATSDs into a single per-PID ATSD is currently not well understood due to a
lack of real-world workloads. This patch adds a debugfs parameter allowing
the threshold to be altered at runtime in order to aid future development
and refinement of the value.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
---
 arch/powerpc/platforms/powernv/npu-dma.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index dc34662e9df9..a765bf576c14 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -17,7 +17,9 @@
 #include <linux/pci.h>
 #include <linux/memblock.h>
 #include <linux/iommu.h>
+#include <linux/debugfs.h>
 
+#include <asm/debugfs.h>
 #include <asm/tlb.h>
 #include <asm/powernv.h>
 #include <asm/reg.h>
@@ -44,7 +46,8 @@ DEFINE_SPINLOCK(npu_context_lock);
  * entire TLB on the GPU for the given PID rather than each specific address in
  * the range.
  */
-#define ATSD_THRESHOLD (2*1024*1024)
+static uint64_t atsd_threshold = 2 * 1024 * 1024;
+static struct dentry *atsd_threshold_dentry;
 
 /*
  * Other types of TCE cache invalidation are not functional in the
@@ -682,7 +685,7 @@ static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn,
 	struct npu_context *npu_context = mn_to_npu_context(mn);
 	unsigned long address;
 
-	if (end - start > ATSD_THRESHOLD) {
+	if (end - start > atsd_threshold) {
 		/*
 		 * Just invalidate the entire PID if the address range is too
 		 * large.
@@ -956,6 +959,11 @@ int pnv_npu2_init(struct pnv_phb *phb)
 	static int npu_index;
 	uint64_t rc = 0;
 
+	if (!atsd_threshold_dentry) {
+		atsd_threshold_dentry = debugfs_create_x64("atsd_threshold",
+				   0600, powerpc_debugfs_root, &atsd_threshold);
+	}
+
 	phb->npu.nmmu_flush =
 		of_property_read_bool(phb->hose->dn, "ibm,nmmu-flush");
 	for_each_child_of_node(phb->hose->dn, dn) {
-- 
2.11.0
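
How the tunable behaves at its extremes follows directly from the
comparison in the diff above (end - start > atsd_threshold). The sketch
below is a userspace illustration only: uses_pid_flush is a hypothetical
helper, and the threshold is passed as a parameter, whereas the patch
reads the static atsd_threshold variable that debugfs exposes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the patched condition. Returns true when
 * the range [start, end) would take the single per-PID flush path.
 */
static bool uses_pid_flush(uint64_t start, uint64_t end, uint64_t threshold)
{
	assert(start <= end);

	/* Strictly greater-than: a range equal to the threshold is not coalesced. */
	return end - start > threshold;
}
```

In this model a threshold of 0 sends every non-empty range down the
per-PID path, and a threshold of UINT64_MAX never does, which gives two
reference points when experimenting with the value.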


* Re: [PATCH 1/2] powernv/npu: Do a PID GPU TLB flush when invalidating a large address range
  2018-04-17  9:11 [PATCH 1/2] powernv/npu: Do a PID GPU TLB flush when invalidating a large address range Alistair Popple
  2018-04-17  9:11 ` [PATCH 2/2] powernv/npu: Add a debugfs setting to change ATSD threshold Alistair Popple
@ 2018-04-17  9:17 ` Balbir Singh
  2018-04-17 22:25   ` Balbir Singh
  2018-04-20  3:51 ` Alistair Popple
  2018-04-24  3:48 ` [1/2] " Michael Ellerman
  3 siblings, 1 reply; 8+ messages in thread
From: Balbir Singh @ 2018-04-17  9:17 UTC (permalink / raw)
  To: Alistair Popple
  Cc: open list:LINUX FOR POWERPC (32-BIT AND 64-BIT),
	Michael Ellerman, Mark Hairgrove, arbab

On Tue, Apr 17, 2018 at 7:11 PM, Alistair Popple <alistair@popple.id.au> wrote:
> The NPU has a limited number of address translation shootdown (ATSD)
> registers and the GPU has limited bandwidth to process ATSDs. This can
> result in contention of ATSD registers leading to soft lockups on some
> threads, particularly when invalidating a large address range in
> pnv_npu2_mn_invalidate_range().
>
> At some threshold it becomes more efficient to flush the entire GPU TLB for
> the given MM context (PID) than individually flushing each address in the
> range. This patch will result in ranges greater than 2MB being converted
> from 32+ ATSDs into a single ATSD which will flush the TLB for the given
> PID on each GPU.
>
> Signed-off-by: Alistair Popple <alistair@popple.id.au>
> ---
>  arch/powerpc/platforms/powernv/npu-dma.c | 23 +++++++++++++++++++----
>  1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> index 94801d8e7894..dc34662e9df9 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -40,6 +40,13 @@
>  DEFINE_SPINLOCK(npu_context_lock);
>
>  /*
> + * When an address shootdown range exceeds this threshold we invalidate the
> + * entire TLB on the GPU for the given PID rather than each specific address in
> + * the range.
> + */
> +#define ATSD_THRESHOLD (2*1024*1024)
> +
> +/*
>   * Other types of TCE cache invalidation are not functional in the
>   * hardware.
>   */
> @@ -675,11 +682,19 @@ static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn,
>         struct npu_context *npu_context = mn_to_npu_context(mn);
>         unsigned long address;
>
> -       for (address = start; address < end; address += PAGE_SIZE)
> -               mmio_invalidate(npu_context, 1, address, false);
> +       if (end - start > ATSD_THRESHOLD) {

I'm nitpicking, but (end - start) > ATSD_THRESHOLD is clearer

> +               /*
> +                * Just invalidate the entire PID if the address range is too
> +                * large.
> +                */
> +               mmio_invalidate(npu_context, 0, 0, true);
> +       } else {
> +               for (address = start; address < end; address += PAGE_SIZE)
> +                       mmio_invalidate(npu_context, 1, address, false);
>
> -       /* Do the flush only on the final addess == end */
> -       mmio_invalidate(npu_context, 1, address, true);
> +               /* Do the flush only on the final address == end */
> +               mmio_invalidate(npu_context, 1, address, true);
> +       }
>  }
>

Acked-by: Balbir Singh <bsingharora@gmail.com>


* Re: [PATCH 2/2] powernv/npu: Add a debugfs setting to change ATSD threshold
  2018-04-17  9:11 ` [PATCH 2/2] powernv/npu: Add a debugfs setting to change ATSD threshold Alistair Popple
@ 2018-04-17 21:45   ` Balbir Singh
  2018-07-23 15:11   ` [2/2] " Michael Ellerman
  1 sibling, 0 replies; 8+ messages in thread
From: Balbir Singh @ 2018-04-17 21:45 UTC (permalink / raw)
  To: Alistair Popple; +Cc: linuxppc-dev, mpe, mhairgrove, arbab

On Tue, 17 Apr 2018 19:11:29 +1000
Alistair Popple <alistair@popple.id.au> wrote:

> The threshold at which it becomes more efficient to coalesce a range of
> ATSDs into a single per-PID ATSD is currently not well understood due to a
> lack of real-world workloads. This patch adds a debugfs parameter allowing
> the threshold to be altered at runtime in order to aid future development
> and refinement of the value.
> 
> Signed-off-by: Alistair Popple <alistair@popple.id.au>
> ---
>  arch/powerpc/platforms/powernv/npu-dma.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> index dc34662e9df9..a765bf576c14 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -17,7 +17,9 @@
>  #include <linux/pci.h>
>  #include <linux/memblock.h>
>  #include <linux/iommu.h>
> +#include <linux/debugfs.h>
>  
> +#include <asm/debugfs.h>
>  #include <asm/tlb.h>
>  #include <asm/powernv.h>
>  #include <asm/reg.h>
> @@ -44,7 +46,8 @@ DEFINE_SPINLOCK(npu_context_lock);
>   * entire TLB on the GPU for the given PID rather than each specific address in
>   * the range.
>   */
> -#define ATSD_THRESHOLD (2*1024*1024)
> +static uint64_t atsd_threshold = 2 * 1024 * 1024;
> +static struct dentry *atsd_threshold_dentry;
>  
>  /*
>   * Other types of TCE cache invalidation are not functional in the
> @@ -682,7 +685,7 @@ static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn,
>  	struct npu_context *npu_context = mn_to_npu_context(mn);
>  	unsigned long address;
>  
> -	if (end - start > ATSD_THRESHOLD) {
> +	if (end - start > atsd_threshold) {
>  		/*
>  		 * Just invalidate the entire PID if the address range is too
>  		 * large.
> @@ -956,6 +959,11 @@ int pnv_npu2_init(struct pnv_phb *phb)
>  	static int npu_index;
>  	uint64_t rc = 0;
>  
> +	if (!atsd_threshold_dentry) {
> +		atsd_threshold_dentry = debugfs_create_x64("atsd_threshold",

Nitpicking: can we call this atsd_threshold_in_bytes?

> +				   0600, powerpc_debugfs_root, &atsd_threshold);
> +	}
> +
>  	phb->npu.nmmu_flush =
>  		of_property_read_bool(phb->hose->dn, "ibm,nmmu-flush");
>  	for_each_child_of_node(phb->hose->dn, dn) {

Acked-by: Balbir Singh <bsingharora@gmail.com>


* Re: [PATCH 1/2] powernv/npu: Do a PID GPU TLB flush when invalidating a large address range
  2018-04-17  9:17 ` [PATCH 1/2] powernv/npu: Do a PID GPU TLB flush when invalidating a large address range Balbir Singh
@ 2018-04-17 22:25   ` Balbir Singh
  0 siblings, 0 replies; 8+ messages in thread
From: Balbir Singh @ 2018-04-17 22:25 UTC (permalink / raw)
  To: Alistair Popple
  Cc: open list:LINUX FOR POWERPC (32-BIT AND 64-BIT),
	Michael Ellerman, Mark Hairgrove, arbab

On Tue, Apr 17, 2018 at 7:17 PM, Balbir Singh <bsingharora@gmail.com> wrote:
> On Tue, Apr 17, 2018 at 7:11 PM, Alistair Popple <alistair@popple.id.au> wrote:
>> The NPU has a limited number of address translation shootdown (ATSD)
>> registers and the GPU has limited bandwidth to process ATSDs. This can
>> result in contention of ATSD registers leading to soft lockups on some
>> threads, particularly when invalidating a large address range in
>> pnv_npu2_mn_invalidate_range().
>>
>> At some threshold it becomes more efficient to flush the entire GPU TLB for
>> the given MM context (PID) than individually flushing each address in the
>> range. This patch will result in ranges greater than 2MB being converted
>> from 32+ ATSDs into a single ATSD which will flush the TLB for the given
>> PID on each GPU.
>>
>> Signed-off-by: Alistair Popple <alistair@popple.id.au>
>> +       }
>>  }
>>
>
> Acked-by: Balbir Singh <bsingharora@gmail.com>
Tested-by: Balbir Singh <bsingharora@gmail.com>


* Re: [PATCH 1/2] powernv/npu: Do a PID GPU TLB flush when invalidating a large address range
  2018-04-17  9:11 [PATCH 1/2] powernv/npu: Do a PID GPU TLB flush when invalidating a large address range Alistair Popple
  2018-04-17  9:11 ` [PATCH 2/2] powernv/npu: Add a debugfs setting to change ATSD threshold Alistair Popple
  2018-04-17  9:17 ` [PATCH 1/2] powernv/npu: Do a PID GPU TLB flush when invalidating a large address range Balbir Singh
@ 2018-04-20  3:51 ` Alistair Popple
  2018-04-24  3:48 ` [1/2] " Michael Ellerman
  3 siblings, 0 replies; 8+ messages in thread
From: Alistair Popple @ 2018-04-20  3:51 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, mhairgrove, arbab

Sorry, forgot to include:

Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2")

Thanks

On Tuesday, 17 April 2018 7:11:28 PM AEST Alistair Popple wrote:
> The NPU has a limited number of address translation shootdown (ATSD)
> registers and the GPU has limited bandwidth to process ATSDs. This can
> result in contention of ATSD registers leading to soft lockups on some
> threads, particularly when invalidating a large address range in
> pnv_npu2_mn_invalidate_range().
> 
> At some threshold it becomes more efficient to flush the entire GPU TLB for
> the given MM context (PID) than individually flushing each address in the
> range. This patch will result in ranges greater than 2MB being converted
> from 32+ ATSDs into a single ATSD which will flush the TLB for the given
> PID on each GPU.
> 
> Signed-off-by: Alistair Popple <alistair@popple.id.au>
> ---
>  arch/powerpc/platforms/powernv/npu-dma.c | 23 +++++++++++++++++++----
>  1 file changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> index 94801d8e7894..dc34662e9df9 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -40,6 +40,13 @@
>  DEFINE_SPINLOCK(npu_context_lock);
>  
>  /*
> + * When an address shootdown range exceeds this threshold we invalidate the
> + * entire TLB on the GPU for the given PID rather than each specific address in
> + * the range.
> + */
> +#define ATSD_THRESHOLD (2*1024*1024)
> +
> +/*
>   * Other types of TCE cache invalidation are not functional in the
>   * hardware.
>   */
> @@ -675,11 +682,19 @@ static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn,
>  	struct npu_context *npu_context = mn_to_npu_context(mn);
>  	unsigned long address;
>  
> -	for (address = start; address < end; address += PAGE_SIZE)
> -		mmio_invalidate(npu_context, 1, address, false);
> +	if (end - start > ATSD_THRESHOLD) {
> +		/*
> +		 * Just invalidate the entire PID if the address range is too
> +		 * large.
> +		 */
> +		mmio_invalidate(npu_context, 0, 0, true);
> +	} else {
> +		for (address = start; address < end; address += PAGE_SIZE)
> +			mmio_invalidate(npu_context, 1, address, false);
>  
> -	/* Do the flush only on the final addess == end */
> -	mmio_invalidate(npu_context, 1, address, true);
> +		/* Do the flush only on the final address == end */
> +		mmio_invalidate(npu_context, 1, address, true);
> +	}
>  }
>  
>  static const struct mmu_notifier_ops nv_nmmu_notifier_ops = {
> 


* Re: [1/2] powernv/npu: Do a PID GPU TLB flush when invalidating a large address range
  2018-04-17  9:11 [PATCH 1/2] powernv/npu: Do a PID GPU TLB flush when invalidating a large address range Alistair Popple
                   ` (2 preceding siblings ...)
  2018-04-20  3:51 ` Alistair Popple
@ 2018-04-24  3:48 ` Michael Ellerman
  3 siblings, 0 replies; 8+ messages in thread
From: Michael Ellerman @ 2018-04-24  3:48 UTC (permalink / raw)
  To: Alistair Popple, linuxppc-dev; +Cc: Alistair Popple, mhairgrove, arbab

On Tue, 2018-04-17 at 09:11:28 UTC, Alistair Popple wrote:
> The NPU has a limited number of address translation shootdown (ATSD)
> registers and the GPU has limited bandwidth to process ATSDs. This can
> result in contention of ATSD registers leading to soft lockups on some
> threads, particularly when invalidating a large address range in
> pnv_npu2_mn_invalidate_range().
> 
> At some threshold it becomes more efficient to flush the entire GPU TLB for
> the given MM context (PID) than individually flushing each address in the
> range. This patch will result in ranges greater than 2MB being converted
> from 32+ ATSDs into a single ATSD which will flush the TLB for the given
> PID on each GPU.
> 
> Signed-off-by: Alistair Popple <alistair@popple.id.au>
> Acked-by: Balbir Singh <bsingharora@gmail.com>
> Tested-by: Balbir Singh <bsingharora@gmail.com>

Patch 1 applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/d0cf9b561ca97d5245bb9e0c4774b7

cheers


* Re: [2/2] powernv/npu: Add a debugfs setting to change ATSD threshold
  2018-04-17  9:11 ` [PATCH 2/2] powernv/npu: Add a debugfs setting to change ATSD threshold Alistair Popple
  2018-04-17 21:45   ` Balbir Singh
@ 2018-07-23 15:11   ` Michael Ellerman
  1 sibling, 0 replies; 8+ messages in thread
From: Michael Ellerman @ 2018-07-23 15:11 UTC (permalink / raw)
  To: Alistair Popple, linuxppc-dev; +Cc: Alistair Popple, mhairgrove, arbab

On Tue, 2018-04-17 at 09:11:29 UTC, Alistair Popple wrote:
> The threshold at which it becomes more efficient to coalesce a range of
> ATSDs into a single per-PID ATSD is currently not well understood due to a
> lack of real-world workloads. This patch adds a debugfs parameter allowing
> the threshold to be altered at runtime in order to aid future development
> and refinement of the value.
> 
> Signed-off-by: Alistair Popple <alistair@popple.id.au>
> Acked-by: Balbir Singh <bsingharora@gmail.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/99c3ce33a00bc40cb218af770ef00c

cheers

