* [PATCH] Don't touch single threaded PTEs which are on the right node
@ 2016-10-13 18:08 ` Andi Kleen
  0 siblings, 0 replies; 16+ messages in thread
From: Andi Kleen @ 2016-10-13 18:08 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, peterz, mgorman, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

We had problems with pages getting unmapped in single-threaded,
affinitized processes. It was tracked down to NUMA scanning.

In this case it doesn't make any sense to unmap pages when the
process is single-threaded and the page is already on the node
the process is running on.

Add a check for this case to the NUMA protection code, and skip
unmapping when it applies.

In theory the process could be migrated later, but in that case
we will eventually rescan, unmap and migrate anyway.

In theory this could be made fancier by remembering this state
per process or even per mm. However, that would need extra
tracking and be more complicated; the simple check works fine
so far.
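
To illustrate the scenario, here is a minimal userspace sketch of a
single-threaded, affinitized workload of the kind described above; the
pinned CPU, buffer size and use of sched_setaffinity() (what taskset or
numactl would do under the hood) are illustrative assumptions, not
details from the original report:

	#define _GNU_SOURCE
	#include <sched.h>
	#include <stdlib.h>
	#include <string.h>

	int main(void)
	{
		cpu_set_t set;
		size_t len = 64UL << 20;	/* 64 MB private anonymous memory */
		char *buf = malloc(len);

		/* Pin the single thread to one CPU, and thus to one node. */
		CPU_ZERO(&set);
		CPU_SET(0, &set);
		sched_setaffinity(0, sizeof(set), &set);

		/* Keep re-touching the working set.  Without the patch, NUMA
		 * scanning periodically makes these PTEs PROT_NONE and takes
		 * hinting faults, even though the pages already sit on the
		 * local node. */
		for (;;)
			memset(buf, 1, len);
	}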

v2: Only do it for private VMAs. Move most of the check out of the
loop.
v3: Minor updates from Mel. Change code layout.
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 mm/mprotect.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index a4830f0325fe..11b8857c3437 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -68,11 +68,17 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 	pte_t *pte, oldpte;
 	spinlock_t *ptl;
 	unsigned long pages = 0;
+	int target_node = NUMA_NO_NODE;
 
 	pte = lock_pte_protection(vma, pmd, addr, prot_numa, &ptl);
 	if (!pte)
 		return 0;
 
+	/* Get target node for single threaded private VMAs */
+	if (prot_numa && !(vma->vm_flags & VM_SHARED) &&
+	    atomic_read(&vma->vm_mm->mm_users) == 1)
+		target_node = numa_node_id();
+
 	arch_enter_lazy_mmu_mode();
 	do {
 		oldpte = *pte;
@@ -94,6 +100,13 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 				/* Avoid TLB flush if possible */
 				if (pte_protnone(oldpte))
 					continue;
+
+				/*
+				 * Don't mess with PTEs if page is already on the node
+				 * a single-threaded process is running on.
+				 */
+				if (target_node == page_to_nid(page))
+					continue;
 			}
 
 			ptent = ptep_modify_prot_start(mm, addr, pte);
-- 
2.5.5
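
For reference, the v3 cleanups above (NUMA_NO_NODE instead of a bare -1,
numa_node_id() instead of an open-coded cpu_to_node(raw_smp_processor_id()))
are purely cosmetic relative to the v2 posting below. A rough sketch of
the generic fallback definitions, assuming a configuration without the
percpu-cached node id (the cached variant differs only in how the lookup
is done):

	/* include/linux/numa.h */
	#define NUMA_NO_NODE	(-1)

	/* include/linux/topology.h, generic fallback */
	static inline int numa_node_id(void)
	{
		return cpu_to_node(raw_smp_processor_id());
	}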

* [PATCH] Don't touch single threaded PTEs which are on the right node
@ 2016-10-12 16:15 ` Andi Kleen
  0 siblings, 0 replies; 16+ messages in thread
From: Andi Kleen @ 2016-10-12 16:15 UTC (permalink / raw)
  To: peterz; +Cc: linux-mm, akpm, mgorman, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

We had problems with pages getting unmapped in single-threaded,
affinitized processes. It was tracked down to NUMA scanning.

In this case it doesn't make any sense to unmap pages when the
process is single-threaded and the page is already on the node
the process is running on.

Add a check for this case to the NUMA protection code, and skip
unmapping when it applies.

In theory the process could be migrated later, but in that case
we will eventually rescan, unmap and migrate anyway.

In theory this could be made fancier by remembering this state
per process or even per mm. However, that would need extra
tracking and be more complicated; the simple check works fine
so far.

v2: Only do it for private VMAs. Move most of the check out of the
loop.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 mm/mprotect.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index a4830f0325fe..e9473e7e1468 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -68,11 +68,17 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 	pte_t *pte, oldpte;
 	spinlock_t *ptl;
 	unsigned long pages = 0;
+	int target_node = -1;
 
 	pte = lock_pte_protection(vma, pmd, addr, prot_numa, &ptl);
 	if (!pte)
 		return 0;
 
+	if (prot_numa &&
+	    !(vma->vm_flags & VM_SHARED) &&
+	    atomic_read(&vma->vm_mm->mm_users) == 1)
+	    target_node = cpu_to_node(raw_smp_processor_id());
+
 	arch_enter_lazy_mmu_mode();
 	do {
 		oldpte = *pte;
@@ -94,6 +100,13 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 				/* Avoid TLB flush if possible */
 				if (pte_protnone(oldpte))
 					continue;
+
+				/*
+				 * Don't mess with PTEs if page is already on the node
+				 * a single-threaded process is running on.
+				 */
+				if (target_node == page_to_nid(page))
+					continue;
 			}
 
 			ptent = ptep_modify_prot_start(mm, addr, pte);
-- 
2.5.5

* [PATCH] Don't touch single threaded PTEs which are on the right node
@ 2016-10-11 20:28 ` Andi Kleen
  0 siblings, 0 replies; 16+ messages in thread
From: Andi Kleen @ 2016-10-11 20:28 UTC (permalink / raw)
  To: peterz; +Cc: linux-mm, akpm, mgorman, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

We had problems with pages getting unmapped in single-threaded,
affinitized processes. It was tracked down to NUMA scanning.

In this case it doesn't make any sense to unmap pages when the
process is single-threaded and the page is already on the node
the process is running on.

Add a check for this case to the NUMA protection code, and skip
unmapping when it applies.

In theory the process could be migrated later, but in that case
we will eventually rescan, unmap and migrate anyway.

In theory this could be made fancier by remembering this state
per process or even per mm. However, that would need extra
tracking and be more complicated; the simple check works fine
so far.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 mm/mprotect.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index a4830f0325fe..e8028658e817 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -94,6 +94,14 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 				/* Avoid TLB flush if possible */
 				if (pte_protnone(oldpte))
 					continue;
+
+				/*
+				 * Don't mess with PTEs if page is already on the node
+				 * a single-threaded process is running on.
+				 */
+				if (atomic_read(&vma->vm_mm->mm_users) == 1 &&
+				    cpu_to_node(raw_smp_processor_id()) == page_to_nid(page))
+					continue;
 			}
 
 			ptent = ptep_modify_prot_start(mm, addr, pte);
-- 
2.5.5

