* [PATCH v3] mm/khugepaged: sched to numa node when collapse huge page
@ 2022-03-17  6:50 Bibo Mao
  2022-04-27 20:48 ` Andrew Morton
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Bibo Mao @ 2022-03-17  6:50 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel, David Hildenbrand, Yang Shi

Collapsing a huge page copies its contents from many small source pages,
and the destination node is chosen as the node holding most of those
source pages; however, the THP daemon is not scheduled on the destination
node. Performance may suffer because the huge page is copied across
nodes and the destination node's cache goes unused. This patch makes the
khugepaged daemon migrate to the same NUMA node as the huge page, saving
copy time and making better use of the local cache.

With this patch, SPECint 2006 base performance improves by 6% on a
Loongson 3C5000L platform with 32 cores and 8 NUMA nodes.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
---
changelog:
V2: remove node record for thp daemon
V3: remove unlikely statement
---
 mm/khugepaged.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 131492fd1148..b3cf0885f5a2 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1066,6 +1066,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 	struct vm_area_struct *vma;
 	struct mmu_notifier_range range;
 	gfp_t gfp;
+	const struct cpumask *cpumask;
 
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
@@ -1079,6 +1080,13 @@ static void collapse_huge_page(struct mm_struct *mm,
 	 * that. We will recheck the vma after taking it again in write mode.
 	 */
 	mmap_read_unlock(mm);
+
+	/* sched to specified node before huage page memory copy */
+	if (task_node(current) != node) {
+		cpumask = cpumask_of_node(node);
+		if (!cpumask_empty(cpumask))
+			set_cpus_allowed_ptr(current, cpumask);
+	}
 	new_page = khugepaged_alloc_page(hpage, gfp, node);
 	if (!new_page) {
 		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
-- 
2.31.1


* Re: [PATCH v3] mm/khugepaged: sched to numa node when collapse huge page
  2022-03-17  6:50 [PATCH v3] mm/khugepaged: sched to numa node when collapse huge page Bibo Mao
@ 2022-04-27 20:48 ` Andrew Morton
  2022-04-27 22:29   ` Yang Shi
  2022-04-28 10:07   ` David Hildenbrand
  2022-04-28 13:50 ` Peter Xu
  2022-04-28 15:17 ` David Hildenbrand
  2 siblings, 2 replies; 13+ messages in thread
From: Andrew Morton @ 2022-04-27 20:48 UTC (permalink / raw)
  To: Bibo Mao; +Cc: linux-mm, linux-kernel, David Hildenbrand, Yang Shi

On Thu, 17 Mar 2022 02:50:24 -0400 Bibo Mao <maobibo@loongson.cn> wrote:

> Collapsing a huge page copies its contents from many small source pages,
> and the destination node is chosen as the node holding most of those
> source pages; however, the THP daemon is not scheduled on the destination
> node. Performance may suffer because the huge page is copied across
> nodes and the destination node's cache goes unused. This patch makes the
> khugepaged daemon migrate to the same NUMA node as the huge page, saving
> copy time and making better use of the local cache.
> 
> With this patch, SPECint 2006 base performance improves by 6% on a
> Loongson 3C5000L platform with 32 cores and 8 NUMA nodes.
> 

Are there any acks for this one please?

> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1066,6 +1066,7 @@ static void collapse_huge_page(struct mm_struct *mm,
>  	struct vm_area_struct *vma;
>  	struct mmu_notifier_range range;
>  	gfp_t gfp;
> +	const struct cpumask *cpumask;
>  
>  	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
>  
> @@ -1079,6 +1080,13 @@ static void collapse_huge_page(struct mm_struct *mm,
>  	 * that. We will recheck the vma after taking it again in write mode.
>  	 */
>  	mmap_read_unlock(mm);
> +
> +	/* sched to specified node before huage page memory copy */
> +	if (task_node(current) != node) {
> +		cpumask = cpumask_of_node(node);
> +		if (!cpumask_empty(cpumask))
> +			set_cpus_allowed_ptr(current, cpumask);
> +	}
>  	new_page = khugepaged_alloc_page(hpage, gfp, node);
>  	if (!new_page) {
>  		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
> -- 
> 2.31.1
> 

* Re: [PATCH v3] mm/khugepaged: sched to numa node when collapse huge page
  2022-04-27 20:48 ` Andrew Morton
@ 2022-04-27 22:29   ` Yang Shi
  2022-04-28 10:07   ` David Hildenbrand
  1 sibling, 0 replies; 13+ messages in thread
From: Yang Shi @ 2022-04-27 22:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Bibo Mao, Linux MM, Linux Kernel Mailing List, David Hildenbrand

On Wed, Apr 27, 2022 at 1:48 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 17 Mar 2022 02:50:24 -0400 Bibo Mao <maobibo@loongson.cn> wrote:
>
> > Collapsing a huge page copies its contents from many small source pages,
> > and the destination node is chosen as the node holding most of those
> > source pages; however, the THP daemon is not scheduled on the destination
> > node. Performance may suffer because the huge page is copied across
> > nodes and the destination node's cache goes unused. This patch makes the
> > khugepaged daemon migrate to the same NUMA node as the huge page, saving
> > copy time and making better use of the local cache.
> >
> > With this patch, SPECint 2006 base performance improves by 6% on a
> > Loongson 3C5000L platform with 32 cores and 8 NUMA nodes.
> >
>
> Are there any acks for this one please?

TBH, I'm a little bit reluctant about this patch. I agree that running
khugepaged on the same node as the source and dest pages could reduce
cross-socket traffic and use the cache more efficiently. But I'm not
sure whether it is really worth it. For example, on a busy system
khugepaged may jump from CPU to CPU, which may interfere with the
scheduler, and khugepaged has to wait to run on the target CPU, which
may take an indefinite amount of time. In addition, the gain also
depends on the locality of the source pages (how many of them are on
the same node), how often khugepaged is woken up on a different node,
etc.

Even if it were proven worth it, I'd prefer that set_cpus_allowed_ptr()
be called between mmap_read_unlock() and mmap_write_lock() in order to
avoid wasted effort on some error paths.
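
A minimal sketch of that ordering, assuming the collapse_huge_page()
structure in this kernel (the revalidation between the two locks is
elided); the placement is illustrative, not a tested patch:

	mmap_read_unlock(mm);

	new_page = khugepaged_alloc_page(hpage, gfp, node);
	if (!new_page) {
		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
		goto out_nolock;
	}

	/*
	 * Best-effort move to the destination node, done only after the
	 * allocation succeeded so that failing paths pay nothing for it.
	 */
	if (task_node(current) != node) {
		const struct cpumask *cpumask = cpumask_of_node(node);

		if (!cpumask_empty(cpumask))
			set_cpus_allowed_ptr(current, cpumask);
	}

	/* ... revalidation under mmap_read_lock() elided ... */
	mmap_write_lock(mm);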

>
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -1066,6 +1066,7 @@ static void collapse_huge_page(struct mm_struct *mm,
> >       struct vm_area_struct *vma;
> >       struct mmu_notifier_range range;
> >       gfp_t gfp;
> > +     const struct cpumask *cpumask;
> >
> >       VM_BUG_ON(address & ~HPAGE_PMD_MASK);
> >
> > @@ -1079,6 +1080,13 @@ static void collapse_huge_page(struct mm_struct *mm,
> >        * that. We will recheck the vma after taking it again in write mode.
> >        */
> >       mmap_read_unlock(mm);
> > +
> > +     /* sched to specified node before huage page memory copy */
> > +     if (task_node(current) != node) {
> > +             cpumask = cpumask_of_node(node);
> > +             if (!cpumask_empty(cpumask))
> > +                     set_cpus_allowed_ptr(current, cpumask);
> > +     }
> >       new_page = khugepaged_alloc_page(hpage, gfp, node);
> >       if (!new_page) {
> >               result = SCAN_ALLOC_HUGE_PAGE_FAIL;
> > --
> > 2.31.1
> >

* Re: [PATCH v3] mm/khugepaged: sched to numa node when collapse huge page
  2022-04-27 20:48 ` Andrew Morton
  2022-04-27 22:29   ` Yang Shi
@ 2022-04-28 10:07   ` David Hildenbrand
  1 sibling, 0 replies; 13+ messages in thread
From: David Hildenbrand @ 2022-04-28 10:07 UTC (permalink / raw)
  To: Andrew Morton, Bibo Mao; +Cc: linux-mm, linux-kernel, Yang Shi

On 27.04.22 22:48, Andrew Morton wrote:
> On Thu, 17 Mar 2022 02:50:24 -0400 Bibo Mao <maobibo@loongson.cn> wrote:
> 
>> Collapsing a huge page copies its contents from many small source pages,
>> and the destination node is chosen as the node holding most of those
>> source pages; however, the THP daemon is not scheduled on the destination
>> node. Performance may suffer because the huge page is copied across
>> nodes and the destination node's cache goes unused. This patch makes the
>> khugepaged daemon migrate to the same NUMA node as the huge page, saving
>> copy time and making better use of the local cache.
>>
>> With this patch, SPECint 2006 base performance improves by 6% on a
>> Loongson 3C5000L platform with 32 cores and 8 NUMA nodes.
>>
> 
> Are there any acks for this one please?

I'll have a look!


-- 
Thanks,

David / dhildenb


* Re: [PATCH v3] mm/khugepaged: sched to numa node when collapse huge page
  2022-03-17  6:50 [PATCH v3] mm/khugepaged: sched to numa node when collapse huge page Bibo Mao
  2022-04-27 20:48 ` Andrew Morton
@ 2022-04-28 13:50 ` Peter Xu
  2022-04-28 15:17 ` David Hildenbrand
  2 siblings, 0 replies; 13+ messages in thread
From: Peter Xu @ 2022-04-28 13:50 UTC (permalink / raw)
  To: Bibo Mao
  Cc: Andrew Morton, linux-mm, linux-kernel, David Hildenbrand, Yang Shi

Hi, Bibo,

On Thu, Mar 17, 2022 at 02:50:24AM -0400, Bibo Mao wrote:
> Collapsing a huge page copies its contents from many small source pages,
> and the destination node is chosen as the node holding most of those
> source pages; however, the THP daemon is not scheduled on the destination
> node. Performance may suffer because the huge page is copied across
> nodes and the destination node's cache goes unused. This patch makes the
> khugepaged daemon migrate to the same NUMA node as the huge page, saving
> copy time and making better use of the local cache.
> 
> With this patch, SPECint 2006 base performance improves by 6% on a
> Loongson 3C5000L platform with 32 cores and 8 NUMA nodes.

Totally not familiar with specint, so a pure question is whether it'll
make a real difference in real-world workloads?  As I assume in the real
world the memory affinity between processors and memory should change
relatively slowly on tuned systems, so even if khugepaged copies a bit
more slowly it won't affect the real workload much once the movement
completes?

The other question is, if it does make sense, whether it's applicable to
file THPs too (collapse_file)?

Thanks,

> 
> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> ---
> changelog:
> V2: remove node record for thp daemon
> V3: remove unlikely statement
> ---
>  mm/khugepaged.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 131492fd1148..b3cf0885f5a2 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1066,6 +1066,7 @@ static void collapse_huge_page(struct mm_struct *mm,
>  	struct vm_area_struct *vma;
>  	struct mmu_notifier_range range;
>  	gfp_t gfp;
> +	const struct cpumask *cpumask;
>  
>  	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
>  
> @@ -1079,6 +1080,13 @@ static void collapse_huge_page(struct mm_struct *mm,
>  	 * that. We will recheck the vma after taking it again in write mode.
>  	 */
>  	mmap_read_unlock(mm);
> +
> +	/* sched to specified node before huage page memory copy */
> +	if (task_node(current) != node) {
> +		cpumask = cpumask_of_node(node);
> +		if (!cpumask_empty(cpumask))
> +			set_cpus_allowed_ptr(current, cpumask);
> +	}
>  	new_page = khugepaged_alloc_page(hpage, gfp, node);
>  	if (!new_page) {
>  		result = SCAN_ALLOC_HUGE_PAGE_FAIL;

-- 
Peter Xu


* Re: [PATCH v3] mm/khugepaged: sched to numa node when collapse huge page
  2022-03-17  6:50 [PATCH v3] mm/khugepaged: sched to numa node when collapse huge page Bibo Mao
  2022-04-27 20:48 ` Andrew Morton
  2022-04-28 13:50 ` Peter Xu
@ 2022-04-28 15:17 ` David Hildenbrand
  2022-04-28 16:34   ` Peter Xu
  2 siblings, 1 reply; 13+ messages in thread
From: David Hildenbrand @ 2022-04-28 15:17 UTC (permalink / raw)
  To: Bibo Mao, Andrew Morton; +Cc: linux-mm, linux-kernel, Yang Shi

On 17.03.22 07:50, Bibo Mao wrote:
> Collapsing a huge page copies its contents from many small source pages,
> and the destination node is chosen as the node holding most of those
> source pages; however, the THP daemon is not scheduled on the destination
> node. Performance may suffer because the huge page is copied across
> nodes and the destination node's cache goes unused. This patch makes the
> khugepaged daemon migrate to the same NUMA node as the huge page, saving
> copy time and making better use of the local cache.
> 
> With this patch, SPECint 2006 base performance improves by 6% on a
> Loongson 3C5000L platform with 32 cores and 8 NUMA nodes.

If it helps, that's nice as long as it doesn't hurt other cases.

> 
> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> ---
> changelog:
> V2: remove node record for thp daemon
> V3: remove unlikely statement
> ---
>  mm/khugepaged.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 131492fd1148..b3cf0885f5a2 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1066,6 +1066,7 @@ static void collapse_huge_page(struct mm_struct *mm,
>  	struct vm_area_struct *vma;
>  	struct mmu_notifier_range range;
>  	gfp_t gfp;
> +	const struct cpumask *cpumask;
>  
>  	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
>  
> @@ -1079,6 +1080,13 @@ static void collapse_huge_page(struct mm_struct *mm,
>  	 * that. We will recheck the vma after taking it again in write mode.
>  	 */
>  	mmap_read_unlock(mm);
> +
> +	/* sched to specified node before huage page memory copy */

huage? I assume "huge"

> +	if (task_node(current) != node) {
> +		cpumask = cpumask_of_node(node);
> +		if (!cpumask_empty(cpumask))
> +			set_cpus_allowed_ptr(current, cpumask);
> +	}

I wonder if that will always be optimized out without NUMA and if we
want to check for IS_ENABLED(CONFIG_NUMA).
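
For illustration, a guard along those lines could look like the following
(a sketch only, reusing the patch's variables); with it, the whole block
can be eliminated as dead code on !CONFIG_NUMA builds:

	if (IS_ENABLED(CONFIG_NUMA) && task_node(current) != node) {
		cpumask = cpumask_of_node(node);
		if (!cpumask_empty(cpumask))
			set_cpus_allowed_ptr(current, cpumask);
	}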


Regarding comments from others, I agree: I think what we'd actually want
is something like "try to reschedule to one of these CPUs immediately;
if they are all busy, just stay here."
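
A sketch of that best-effort variant, assuming the same 'node' as in the
patch; available_idle_cpu() is used purely illustratively here, and the
check is inherently racy:

	const struct cpumask *mask = cpumask_of_node(node);
	int cpu;

	/* Migrate only if the destination node has an idle CPU right now. */
	for_each_cpu(cpu, mask) {
		if (available_idle_cpu(cpu)) {
			set_cpus_allowed_ptr(current, mask);
			break;
		}
	}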


Also, I do wonder if there could already be scenarios where someone
wants to let khugepaged run only on selected housekeeping CPUs (e.g.,
when pinning VCPUs in a VM to physical CPUs). It might even degrade the
VM performance in that case if we schedule something unrelated on these
CPUs. (I don't know which interfaces we might already have to configure
housekeeping CPUs for kthreads).

I can spot in kernel/kthread.c:kthread()

set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_KTHREAD));

Hmmmmm ...
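
If such a policy were adopted, one hedged sketch is to migrate only within
the intersection of the node's CPUs and the housekeeping mask (the helper
name is invented; needs <linux/sched/isolation.h>):

	static void khugepaged_try_migrate_node(int node)
	{
		const struct cpumask *hk = housekeeping_cpumask(HK_TYPE_KTHREAD);
		cpumask_var_t allowed;

		if (!alloc_cpumask_var(&allowed, GFP_KERNEL))
			return;

		/* Empty intersection: no housekeeping CPU on this node, stay put. */
		if (cpumask_and(allowed, cpumask_of_node(node), hk))
			set_cpus_allowed_ptr(current, allowed);

		free_cpumask_var(allowed);
	}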


-- 
Thanks,

David / dhildenb


* Re: [PATCH v3] mm/khugepaged: sched to numa node when collapse huge page
  2022-04-28 15:17 ` David Hildenbrand
@ 2022-04-28 16:34   ` Peter Xu
  2022-05-13  0:36     ` Andrew Morton
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Xu @ 2022-04-28 16:34 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Bibo Mao, Andrew Morton, linux-mm, linux-kernel, Yang Shi

On Thu, Apr 28, 2022 at 05:17:07PM +0200, David Hildenbrand wrote:
> On 17.03.22 07:50, Bibo Mao wrote:
> > Collapsing a huge page copies its contents from many small source pages,
> > and the destination node is chosen as the node holding most of those
> > source pages; however, the THP daemon is not scheduled on the destination
> > node. Performance may suffer because the huge page is copied across
> > nodes and the destination node's cache goes unused. This patch makes the
> > khugepaged daemon migrate to the same NUMA node as the huge page, saving
> > copy time and making better use of the local cache.
> > 
> > With this patch, SPECint 2006 base performance improves by 6% on a
> > Loongson 3C5000L platform with 32 cores and 8 NUMA nodes.
> 
> If it helps, that's nice as long as it doesn't hurt other cases.
> 
> > 
> > Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> > ---
> > changelog:
> > V2: remove node record for thp daemon
> > V3: remove unlikely statement
> > ---
> >  mm/khugepaged.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 131492fd1148..b3cf0885f5a2 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -1066,6 +1066,7 @@ static void collapse_huge_page(struct mm_struct *mm,
> >  	struct vm_area_struct *vma;
> >  	struct mmu_notifier_range range;
> >  	gfp_t gfp;
> > +	const struct cpumask *cpumask;
> >  
> >  	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
> >  
> > @@ -1079,6 +1080,13 @@ static void collapse_huge_page(struct mm_struct *mm,
> >  	 * that. We will recheck the vma after taking it again in write mode.
> >  	 */
> >  	mmap_read_unlock(mm);
> > +
> > +	/* sched to specified node before huage page memory copy */
> 
> huage? I assume "huge"
> 
> > +	if (task_node(current) != node) {
> > +		cpumask = cpumask_of_node(node);
> > +		if (!cpumask_empty(cpumask))
> > +			set_cpus_allowed_ptr(current, cpumask);
> > +	}
> 
> I wonder if that will always be optimized out without NUMA and if we
> want to check for IS_ENABLED(CONFIG_NUMA).
> 
> 
> Regarding comments from others, I agree: I think what we'd actually want
> is something like "try to reschedule to one of these CPUs immediately.
> If they are all busy, just stay here.
> 
> 
> Also, I do wonder if there could already be scenarios where someone
> wants to let khugepaged run only on selected housekeeping CPUs (e.g.,
> when pinning VCPUs in a VM to physical CPUs). It might even degrade the
> VM performance in that case if we schedule something unrelated on these
> CPUs. (I don't know which interfaces we might already have to configure
> housekeeping CPUs for kthreads).
> 
> I can spot in kernel/kthread.c:kthread()
> 
> set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_KTHREAD));
> 
> Hmmmmm ...

Yes, that's a valid point: for RT, AFAIK many users tune the kernel
threads on demand by pinning them.  So I'm not sure whether this new
algorithm could break some existing users, by either (1) trying to pin
khugepaged onto some isolated cores (which can cause spikes?), or (2)
messing with the admin's previous pin settings on the khugepaged kthread.

The other thing is that khugepaged's movement across cores seems quite
random, because the pages it scans can be stored unpredictably on
different NUMA nodes, so it could easily start bouncing on some hosts,
and that does sound questionable.. as with the (pure) question I raised
previously on the 2nd point, independent of the benchmark result.

-- 
Peter Xu


* Re: [PATCH v3] mm/khugepaged: sched to numa node when collapse huge page
  2022-04-28 16:34   ` Peter Xu
@ 2022-05-13  0:36     ` Andrew Morton
  2022-05-13  1:19       ` maobibo
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2022-05-13  0:36 UTC (permalink / raw)
  To: Peter Xu; +Cc: David Hildenbrand, Bibo Mao, linux-mm, linux-kernel, Yang Shi

On Thu, 28 Apr 2022 12:34:07 -0400 Peter Xu <peterx@redhat.com> wrote:

> On Thu, Apr 28, 2022 at 05:17:07PM +0200, David Hildenbrand wrote:
> > On 17.03.22 07:50, Bibo Mao wrote:
> > > Collapsing a huge page copies its contents from many small source pages,
> > > and the destination node is chosen as the node holding most of those
> > > source pages; however, the THP daemon is not scheduled on the destination
> > > node. Performance may suffer because the huge page is copied across
> > > nodes and the destination node's cache goes unused. This patch makes the
> > > khugepaged daemon migrate to the same NUMA node as the huge page, saving
> > > copy time and making better use of the local cache.
> > > 
> > > With this patch, SPECint 2006 base performance improves by 6% on a
> > > Loongson 3C5000L platform with 32 cores and 8 NUMA nodes.
> > 
> > If it helps, that's nice as long as it doesn't hurt other cases.
> > 

Quite a bit of doubtful feedback and we have yet to hear from the
author.  I'll drop the patch.

Bibo, please resend at a later time if you feel the patch remains
desirable.  Please attempt to address the feedback via code changes
and/or changelogging.


* Re: [PATCH v3] mm/khugepaged: sched to numa node when collapse huge page
  2022-05-13  0:36     ` Andrew Morton
@ 2022-05-13  1:19       ` maobibo
  2022-05-13  1:29         ` maobibo
  0 siblings, 1 reply; 13+ messages in thread
From: maobibo @ 2022-05-13  1:19 UTC (permalink / raw)
  To: Andrew Morton, Peter Xu
  Cc: David Hildenbrand, linux-mm, linux-kernel, Yang Shi



On 2022/5/13 08:36, Andrew Morton wrote:
> On Thu, 28 Apr 2022 12:34:07 -0400 Peter Xu <peterx@redhat.com> wrote:
> 
>> On Thu, Apr 28, 2022 at 05:17:07PM +0200, David Hildenbrand wrote:
>>> On 17.03.22 07:50, Bibo Mao wrote:
>>>> Collapsing a huge page copies its contents from many small source pages,
>>>> and the destination node is chosen as the node holding most of those
>>>> source pages; however, the THP daemon is not scheduled on the destination
>>>> node. Performance may suffer because the huge page is copied across
>>>> nodes and the destination node's cache goes unused. This patch makes the
>>>> khugepaged daemon migrate to the same NUMA node as the huge page, saving
>>>> copy time and making better use of the local cache.
>>>>
>>>> With this patch, SPECint 2006 base performance improves by 6% on a
>>>> Loongson 3C5000L platform with 32 cores and 8 NUMA nodes.
>>>
>>> If it helps, that's nice as long as it doesn't hurt other cases.
>>>
> 
> Quite a bit of doubtful feedback and we have yet to hear from the
> author.  I'll drop the patch.
> 
> Bibo, please resend at a later time if you feel the patch remains
> desirable.  Please attempt to address the feedback via code changes
> and/or changelogging.
Sorry for the late response; the mail was filtered and I did not notice
it.  The result is not so obvious once the bandwidth between nodes is
improved: it is only about a 1% improvement for SPECint 2006 with 32
cores on my box.

For now I do not see a negative effect from this patch unless the user
wants to keep some cores isolated from the khugepaged daemon.


regards
bibo,mao


* Re: [PATCH v3] mm/khugepaged: sched to numa node when collapse huge page
  2022-05-13  1:19       ` maobibo
@ 2022-05-13  1:29         ` maobibo
  2022-05-13  1:49           ` Andrew Morton
  0 siblings, 1 reply; 13+ messages in thread
From: maobibo @ 2022-05-13  1:29 UTC (permalink / raw)
  To: Andrew Morton, Peter Xu
  Cc: David Hildenbrand, linux-mm, linux-kernel, Yang Shi



On 2022/5/13 09:19, maobibo wrote:
> 
> 
> On 2022/5/13 08:36, Andrew Morton wrote:
>> On Thu, 28 Apr 2022 12:34:07 -0400 Peter Xu <peterx@redhat.com> wrote:
>>
>>> On Thu, Apr 28, 2022 at 05:17:07PM +0200, David Hildenbrand wrote:
>>>> On 17.03.22 07:50, Bibo Mao wrote:
>>>>> Collapsing a huge page copies its contents from many small source pages,
>>>>> and the destination node is chosen as the node holding most of those
>>>>> source pages; however, the THP daemon is not scheduled on the destination
>>>>> node. Performance may suffer because the huge page is copied across
>>>>> nodes and the destination node's cache goes unused. This patch makes the
>>>>> khugepaged daemon migrate to the same NUMA node as the huge page, saving
>>>>> copy time and making better use of the local cache.
>>>>>
>>>>> With this patch, SPECint 2006 base performance improves by 6% on a
>>>>> Loongson 3C5000L platform with 32 cores and 8 NUMA nodes.
>>>>
>>>> If it helps, that's nice as long as it doesn't hurt other cases.
>>>>
>>
>> Quite a bit of doubtful feedback and we have yet to hear from the
>> author.  I'll drop the patch.
>>
>> Bibo, please resend at a later time if you feel the patch remains
>> desirable.  Please attempt to address the feedback via code changes
>> and/or changelogging.
> Sorry for the late response; the mail was filtered and I did not notice
> it.  The result is not so obvious once the bandwidth between nodes is
> improved: it is only about a 1% improvement for SPECint 2006 with 32
> cores on my box.
> 
> For now I do not see a negative effect from this patch unless the user
> wants to keep some cores isolated from the khugepaged daemon.

Can we provide an extra parameter that lets the khugepaged daemon be
either bound to the node or scheduled freely?  If so, I will provide an
updated patch.
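
For illustration, such a knob could follow the style of khugepaged's
existing sysfs attributes; this is a sketch only, and the attribute name
"sched_to_node" is made up:

	static unsigned int khugepaged_sched_to_node __read_mostly;

	static ssize_t sched_to_node_show(struct kobject *kobj,
					  struct kobj_attribute *attr, char *buf)
	{
		return sysfs_emit(buf, "%u\n", khugepaged_sched_to_node);
	}

	static ssize_t sched_to_node_store(struct kobject *kobj,
					   struct kobj_attribute *attr,
					   const char *buf, size_t count)
	{
		unsigned int val;
		int err = kstrtouint(buf, 10, &val);

		if (err || val > 1)
			return -EINVAL;
		khugepaged_sched_to_node = val;
		return count;
	}

	static struct kobj_attribute sched_to_node_attr =
		__ATTR(sched_to_node, 0644, sched_to_node_show, sched_to_node_store);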

> 
> 
> regards
> bibo,mao


* Re: [PATCH v3] mm/khugepaged: sched to numa node when collapse huge page
  2022-05-13  1:29         ` maobibo
@ 2022-05-13  1:49           ` Andrew Morton
  2022-05-13  1:59             ` maobibo
  2022-05-13  2:40             ` Yang Shi
  0 siblings, 2 replies; 13+ messages in thread
From: Andrew Morton @ 2022-05-13  1:49 UTC (permalink / raw)
  To: maobibo; +Cc: Peter Xu, David Hildenbrand, linux-mm, linux-kernel, Yang Shi

On Fri, 13 May 2022 09:29:07 +0800 maobibo <maobibo@loongson.cn> wrote:

> 
> 
> >> and/or changelogging.
> > Sorry for the late response; the mail was filtered and I did not notice
> > it.  The result is not so obvious once the bandwidth between nodes is
> > improved: it is only about a 1% improvement for SPECint 2006 with 32
> > cores on my box.
> > 
> > For now I do not see a negative effect from this patch unless the user
> > wants to keep some cores isolated from the khugepaged daemon.
> 
> Can we provide an extra parameter that lets the khugepaged daemon be
> either bound to the node or scheduled freely?  If so, I will provide an
> updated patch.

It has always surprised me that we have a single khugepaged thread.  If
we had a thread per node, you'd be all fixed up, yes?

Ditto ksmd.
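
A purely illustrative sketch of that direction (the function is invented;
khugepaged() stands for the existing thread function, and error unwinding
is omitted):

	static int __init start_khugepaged_per_node(void)
	{
		int nid;

		for_each_node_state(nid, N_MEMORY) {
			struct task_struct *t;

			t = kthread_create_on_node(khugepaged, NULL, nid,
						   "khugepaged/%d", nid);
			if (IS_ERR(t))
				return PTR_ERR(t);
			/* Keep each worker on its own node's CPUs. */
			kthread_bind_mask(t, cpumask_of_node(nid));
			wake_up_process(t);
		}
		return 0;
	}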


* Re: [PATCH v3] mm/khugepaged: sched to numa node when collapse huge page
  2022-05-13  1:49           ` Andrew Morton
@ 2022-05-13  1:59             ` maobibo
  2022-05-13  2:40             ` Yang Shi
  1 sibling, 0 replies; 13+ messages in thread
From: maobibo @ 2022-05-13  1:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Xu, David Hildenbrand, linux-mm, linux-kernel, Yang Shi



On 2022/5/13 09:49, Andrew Morton wrote:
> On Fri, 13 May 2022 09:29:07 +0800 maobibo <maobibo@loongson.cn> wrote:
> 
>>
>>
>>>> and/or changelogging.
>>> Sorry for the late response; the mail was filtered and I did not notice
>>> it.  The result is not so obvious once the bandwidth between nodes is
>>> improved: it is only about a 1% improvement for SPECint 2006 with 32
>>> cores on my box.
>>>
>>> For now I do not see a negative effect from this patch unless the user
>>> wants to keep some cores isolated from the khugepaged daemon.
>>
>> Can we provide an extra parameter that lets the khugepaged daemon be
>> either bound to the node or scheduled freely?  If so, I will provide an
>> updated patch.
> 
> It has always surprised me that we have a single khugepaged thread.  If
> we had a thread per node, you'd be all fixed up, yes?
Yes, a thread per node would solve this issue, and it could also speed
up huge page scanning.  That should be useful for short-running
workloads like specint.
> 
> Ditto ksmd.


* Re: [PATCH v3] mm/khugepaged: sched to numa node when collapse huge page
  2022-05-13  1:49           ` Andrew Morton
  2022-05-13  1:59             ` maobibo
@ 2022-05-13  2:40             ` Yang Shi
  1 sibling, 0 replies; 13+ messages in thread
From: Yang Shi @ 2022-05-13  2:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: maobibo, Peter Xu, David Hildenbrand, Linux MM,
	Linux Kernel Mailing List

On Thu, May 12, 2022 at 6:49 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Fri, 13 May 2022 09:29:07 +0800 maobibo <maobibo@loongson.cn> wrote:
>
> >
> >
> > >> and/or changelogging.
> > > Sorry for the late response; the mail was filtered and I did not notice
> > > it.  The result is not so obvious once the bandwidth between nodes is
> > > improved: it is only about a 1% improvement for SPECint 2006 with 32
> > > cores on my box.
> > >
> > > For now I do not see a negative effect from this patch unless the user
> > > wants to keep some cores isolated from the khugepaged daemon.
> >
> > Can we provide an extra parameter that lets the khugepaged daemon be
> > either bound to the node or scheduled freely?  If so, I will provide an
> > updated patch.
>
> It has always surprised me that we have a single khugepaged thread.  If
> we had a thread per node, you'd be all fixed up, yes?

Actually I was thinking about this before, but TBH I didn't see much
benefit from this approach.  khugepaged scans VMAs, and the mapped pages
may be spread across all nodes.  It is not like kswapd, which can scan
the LRUs of a specific node.

>
> Ditto ksmd.
>
