From: "Huang, Ying" <ying.huang@intel.com>
To: Qian Cai <quic_qiancai@quicinc.com>
Cc: <linux-kernel@vger.kernel.org>,
	<linux-tip-commits@vger.kernel.org>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>, <x86@kernel.org>,
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: [tip: sched/core] sched/numa: Avoid migrating task to CPU-less node
Date: Mon, 07 Mar 2022 13:51:55 +0800
Message-ID: <87wnh648ec.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87v8wx1850.fsf@yhuang6-desk2.ccr.corp.intel.com> (Ying Huang's message of "Wed, 02 Mar 2022 08:59:55 +0800")

Hi, Qian,

"Huang, Ying" <ying.huang@intel.com> writes:

> Qian Cai <quic_qiancai@quicinc.com> writes:
>
>> On Thu, Feb 17, 2022 at 06:56:52PM -0000, tip-bot2 for Huang Ying wrote:
>>> The following commit has been merged into the sched/core branch of tip:
>>> 
>>> Commit-ID:     5c7b1aaf139dab5072311853bacc40fc3457d1f9
>>> Gitweb:        https://git.kernel.org/tip/5c7b1aaf139dab5072311853bacc40fc3457d1f9
>>> Author:        Huang Ying <ying.huang@intel.com>
>>> AuthorDate:    Mon, 14 Feb 2022 20:15:53 +08:00
>>> Committer:     Peter Zijlstra <peterz@infradead.org>
>>> CommitterDate: Wed, 16 Feb 2022 15:57:53 +01:00
>>> 
>>> sched/numa: Avoid migrating task to CPU-less node
>>> 
>>> In a typical memory tiering system, there's no CPU in slow (PMEM) NUMA
>>> nodes.  But if the number of the hint page faults on a PMEM node is
>>> the max for a task, the current NUMA balancing policy may try to place
>>> the task on the PMEM node instead of DRAM node.  This is unreasonable,
>>> because there's no CPU in PMEM NUMA nodes.  To fix this, CPU-less
>>> nodes are ignored when searching the migration target node for a task
>>> in this patch.
>>> 
>>> To test the patch, we run a workload that accesses more memory in PMEM
>>> node than memory in DRAM node.  Without the patch, the PMEM node will
>>> be chosen as preferred node in task_numa_placement().  While the DRAM
>>> node will be chosen instead with the patch.
>>> 
>>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>> Link: https://lkml.kernel.org/r/20220214121553.582248-2-ying.huang@intel.com
>>
>> Reverting this commit on the top of today's linux-next fixed a boot crash
>> on arm64 NUMA systems.
>>
>>  Unable to handle kernel paging request at virtual address ffff7a6601694aec
>>  KASAN: maybe wild-memory-access in range [0xffffd3300b4a5760-0xffffd3300b4a5767]
>>  Mem abort info:
>>    ESR = 0x96000005
>>    EC = 0x25: DABT (current EL), IL = 32 bits
>>  mlx5_core 0007:02:00.0: enabling device (0100 -> 0102)
>>    SET = 0, FnV = 0
>>    EA = 0, S1PTW = 0
>>    FSC = 0x05: level 1 translation fault
>>  Data abort info:
>>    ISV = 0, ISS = 0x00000005
>>    CM = 0, WnR = 0
>>  swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000400b3d6c6000
>>  [ffff7a6601694aec] pgd=0000403fc007f003, p4d=0000403fc007f003, pud=0000000000000000
>>  Internal error: Oops: 96000005 [#1] PREEMPT SMP
>>  Modules linked in: nouveau(+) drm_ttm_helper ttm nvme(+) drm_dp_helper drm_kms_helper mlx5_core(+) mpt3sas(+) xhci_pci(+) nvme_core raid_class xhci_pci_renesas drm
>>  CPU: 85 PID: 1308 Comm: udevadm Not tainted 5.17.0-rc6-next-20220301 #1
>>  pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>  pc : task_numa_placement
>>  lr : task_numa_placement
>>  sp : ffff800031047760
>>  x29: ffff800031047760 x28: ffff3fffab916c00 x27: 0000000000000020
>>  x26: 0000000000000001 x25: 0000000000000000 x24: 0000000000000000
>>
>>  x23: ffff07ffe5289a80 x22: ffffd3300b4a5760 x21: 000000000000003f
>>  x20: ffffd32feb4a5768 x19: 0000000000000000 x18: ffff07ffe528ad88
>>  x17: ffffd32fe5693a1c x16: 0000000000000000 x15: ffff8000310478e0
>>
>>  x14: ffff07ffe528ad90 x13: 0000000000000002 x12: dfff80000000000d
>>  x11: 0000000000000001 x10: 000000000000b6be x9 : 0000000000000000
>>  x8 : 00000000ffffffff x7 : ffffd32feb4a5780 x6 : 0000000000000000
>>  x5 : 0000000000000000 x4 : 0000000000000000 x3 : 1ffffa6601694aec
>>  x2 : 0000000000000000 x1 : dfff800000000000 x0 : 000000001ffffff8
>>  Call trace:
>>   task_numa_placement
>>   arch_test_bit at include/asm-generic/bitops/non-atomic.h:118
>>   (inlined by) node_state at include/linux/nodemask.h:416
>>   (inlined by) task_numa_placement at kernel/sched/fair.c:2439
>>   task_numa_fault
>>   do_numa_page
>>   handle_pte_fault
>>   __handle_mm_fault
>>   handle_mm_fault
>>   do_page_fault
>>   do_translation_fault
>>   do_mem_abort
>>   el0_da
>>   el0t_64_sync_handler
>>   el0t_64_sync
>>  Code: 8b000296 d2d00001 f2fbffe1 d343fec3 (38e16861)
>>  ---[ end trace 0000000000000000 ]---
>>  Kernel panic - not syncing: Oops: Fatal exception
>>  SMP: stopping secondary CPUs
>>  Kernel Offset: 0x532fdcf70000 from 0xffff800008000000
>>  PHYS_OFFSET: 0x80000000
>>  CPU features: 0x00,00042c0c,19801c82
>>  Memory Limit: none
>>  ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
>
> Thanks for reporting!  Can you try whether the following debug patch can fix the issue?
>
> Best Regards,
> Huang, Ying
>
> ----------------------------8<-------------------------------------------
> From 176d185426730111e763eb386d0210561f021dbc Mon Sep 17 00:00:00 2001
> From: Huang Ying <ying.huang@intel.com>
> Date: Wed, 2 Mar 2022 08:54:01 +0800
> Subject: [PATCH] dbg KASAN error
>
> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index a3f0ea216ccb..1fe7a4510cca 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2405,7 +2405,7 @@ static void task_numa_placement(struct task_struct *p)
>  	}
>  
>  	/* Cannot migrate task to CPU-less node */
> -	if (!node_state(max_nid, N_CPU)) {
> +	if (max_nid != NUMA_NO_NODE && !node_state(max_nid, N_CPU)) {
>  		int near_nid = max_nid;
>  		int distance, near_distance = INT_MAX;

Do you have time to give this patch a try?
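
For anyone following the thread, here is a minimal user-space sketch of
what the added check guards against.  It is not kernel code:
NUMA_NO_NODE comes from the patch, while MAX_NODES,
unguarded_byte_offset(), node_has_cpu() and needs_cpu_fallback() are
hypothetical stand-ins.  It also assumes that the bit index is treated
as a 32-bit unsigned value before the node mask is indexed.  Under that
assumption a max_nid of -1 reads far past the end of the mask, which
looks consistent with x0 = 000000001ffffff8 and x22 = x20 + x0 in the
register dump above.

```c
#include <limits.h>

/* Hypothetical user-space stand-ins; the real definitions are in the kernel. */
#define NUMA_NO_NODE   (-1)
#define MAX_NODES      64
#define BITS_PER_WORD  (sizeof(unsigned long) * CHAR_BIT)

/*
 * Byte offset into the mask that a bit test would read for node id "nid"
 * if nothing rejected negative values first.  Assumption: the index is
 * converted to unsigned int before being split into word and bit.
 */
unsigned long unguarded_byte_offset(int nid)
{
	unsigned int nr = (unsigned int)nid;

	return (unsigned long)(nr / BITS_PER_WORD) * sizeof(unsigned long);
}

/* Bounds-checked bit test: returns 1 if node "nid" is set in "mask". */
int node_has_cpu(const unsigned long *mask, int nid)
{
	if (nid < 0 || nid >= MAX_NODES)
		return 0;
	return (int)((mask[nid / BITS_PER_WORD] >> (nid % BITS_PER_WORD)) & 1UL);
}

/*
 * Mirrors the shape of the condition in the debug patch: only look for a
 * nearby node with CPUs when a preferred node was actually found and that
 * node has no CPU.
 */
int needs_cpu_fallback(const unsigned long *mask, int max_nid)
{
	return max_nid != NUMA_NO_NODE && !node_has_cpu(mask, max_nid);
}
```

With a mask in which only nodes 0 and 1 have CPUs,
needs_cpu_fallback() returns 0 for NUMA_NO_NODE and for node 0, and 1
for node 2.  The mask is only consulted once max_nid is known to be a
real node id.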

Best Regards,
Huang, Ying
