All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alistair Francis <alistair23@gmail.com>
To: Gavin Shan <gshan@redhat.com>
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	qemu-riscv@nongnu.org,  rad@semihalf.com,
	peter.maydell@linaro.org, quic_llindhol@quicinc.com,
	 eduardo@habkost.net, marcel.apfelbaum@gmail.com,
	philmd@linaro.org,  wangyanan55@huawei.com, palmer@dabbelt.com,
	alistair.francis@wdc.com,  bin.meng@windriver.com,
	thuth@redhat.com, lvivier@redhat.com,  pbonzini@redhat.com,
	imammedo@redhat.com, ajones@ventanamicro.com,
	 berrange@redhat.com, dbarboza@ventanamicro.com,
	yihyu@redhat.com,  shan.gavin@gmail.com
Subject: Re: [PATCH v4 1/3] numa: Validate cluster and NUMA node boundary if required
Date: Tue, 21 Mar 2023 21:39:26 +1000	[thread overview]
Message-ID: <CAKmqyKPDE8OkfbDE0UgVyqodCGjRXAvkaEf2BFCJCVUsJf6cUw@mail.gmail.com> (raw)
In-Reply-To: <20230317062542.61061-2-gshan@redhat.com>

On Fri, Mar 17, 2023 at 4:29 PM Gavin Shan <gshan@redhat.com> wrote:
>
> For some architectures like ARM64, multiple CPUs in one cluster can be
> associated with different NUMA nodes, which is irregular configuration
> because we shouldn't have this in baremetal environment. The irregular
> configuration causes Linux guest to misbehave, as the following warning
> messages indicate.
>
>   -smp 6,maxcpus=6,sockets=2,clusters=1,cores=3,threads=1 \
>   -numa node,nodeid=0,cpus=0-1,memdev=ram0                \
>   -numa node,nodeid=1,cpus=2-3,memdev=ram1                \
>   -numa node,nodeid=2,cpus=4-5,memdev=ram2                \
>
>   ------------[ cut here ]------------
>   WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2271 build_sched_domains+0x284/0x910
>   Modules linked in:
>   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-268.el9.aarch64 #1
>   pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>   pc : build_sched_domains+0x284/0x910
>   lr : build_sched_domains+0x184/0x910
>   sp : ffff80000804bd50
>   x29: ffff80000804bd50 x28: 0000000000000002 x27: 0000000000000000
>   x26: ffff800009cf9a80 x25: 0000000000000000 x24: ffff800009cbf840
>   x23: ffff000080325000 x22: ffff0000005df800 x21: ffff80000a4ce508
>   x20: 0000000000000000 x19: ffff000080324440 x18: 0000000000000014
>   x17: 00000000388925c0 x16: 000000005386a066 x15: 000000009c10cc2e
>   x14: 00000000000001c0 x13: 0000000000000001 x12: ffff00007fffb1a0
>   x11: ffff00007fffb180 x10: ffff80000a4ce508 x9 : 0000000000000041
>   x8 : ffff80000a4ce500 x7 : ffff80000a4cf920 x6 : 0000000000000001
>   x5 : 0000000000000001 x4 : 0000000000000007 x3 : 0000000000000002
>   x2 : 0000000000001000 x1 : ffff80000a4cf928 x0 : 0000000000000001
>   Call trace:
>    build_sched_domains+0x284/0x910
>    sched_init_domains+0xac/0xe0
>    sched_init_smp+0x48/0xc8
>    kernel_init_freeable+0x140/0x1ac
>    kernel_init+0x28/0x140
>    ret_from_fork+0x10/0x20
>
> Improve the situation to warn when multiple CPUs in one cluster have
> been associated with different NUMA nodes. However, one NUMA node is
> allowed to be associated with different clusters.
>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> Acked-by: Philippe Mathieu-Daudé <philmd@linaro.org>

Acked-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

> ---
>  hw/core/machine.c   | 42 ++++++++++++++++++++++++++++++++++++++++++
>  include/hw/boards.h |  1 +
>  2 files changed, 43 insertions(+)
>
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 45e3d24fdc..a2329f975d 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -1255,6 +1255,45 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
>      g_string_free(s, true);
>  }
>
> +static void validate_cpu_cluster_to_numa_boundary(MachineState *ms)
> +{
> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> +    NumaState *state = ms->numa_state;
> +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> +    const CPUArchId *cpus = possible_cpus->cpus;
> +    int i, j;
> +
> +    if (state->num_nodes <= 1 || possible_cpus->len <= 1) {
> +        return;
> +    }
> +
> +    /*
> +     * The Linux scheduling domain can't be parsed when the multiple CPUs
> +     * in one cluster have been associated with different NUMA nodes. However,
> +     * it's fine to associate one NUMA node with CPUs in different clusters.
> +     */
> +    for (i = 0; i < possible_cpus->len; i++) {
> +        for (j = i + 1; j < possible_cpus->len; j++) {
> +            if (cpus[i].props.has_socket_id &&
> +                cpus[i].props.has_cluster_id &&
> +                cpus[i].props.has_node_id &&
> +                cpus[j].props.has_socket_id &&
> +                cpus[j].props.has_cluster_id &&
> +                cpus[j].props.has_node_id &&
> +                cpus[i].props.socket_id == cpus[j].props.socket_id &&
> +                cpus[i].props.cluster_id == cpus[j].props.cluster_id &&
> +                cpus[i].props.node_id != cpus[j].props.node_id) {
> +                warn_report("CPU-%d and CPU-%d in socket-%ld-cluster-%ld "
> +                             "have been associated with node-%ld and node-%ld "
> +                             "respectively. It can cause OSes like Linux to "
> +                             "misbehave", i, j, cpus[i].props.socket_id,
> +                             cpus[i].props.cluster_id, cpus[i].props.node_id,
> +                             cpus[j].props.node_id);
> +            }
> +        }
> +    }
> +}
> +
>  MemoryRegion *machine_consume_memdev(MachineState *machine,
>                                       HostMemoryBackend *backend)
>  {
> @@ -1340,6 +1379,9 @@ void machine_run_board_init(MachineState *machine, const char *mem_path, Error *
>          numa_complete_configuration(machine);
>          if (machine->numa_state->num_nodes) {
>              machine_numa_finish_cpu_init(machine);
> +            if (machine_class->cpu_cluster_has_numa_boundary) {
> +                validate_cpu_cluster_to_numa_boundary(machine);
> +            }
>          }
>      }
>
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 6fbbfd56c8..c9793b2789 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -273,6 +273,7 @@ struct MachineClass {
>      bool nvdimm_supported;
>      bool numa_mem_supported;
>      bool auto_enable_numa;
> +    bool cpu_cluster_has_numa_boundary;
>      SMPCompatProps smp_props;
>      const char *default_ram_id;
>
> --
> 2.23.0
>
>


  reply	other threads:[~2023-03-21 11:40 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-17  6:25 [PATCH v4 0/3] NUMA: Apply cluster-NUMA-node boundary for aarch64 and riscv machines Gavin Shan
2023-03-17  6:25 ` [PATCH v4 1/3] numa: Validate cluster and NUMA node boundary if required Gavin Shan
2023-03-21 11:39   ` Alistair Francis [this message]
2023-03-17  6:25 ` [PATCH v4 2/3] hw/arm: Validate cluster and NUMA node boundary Gavin Shan
2023-03-17  6:25 ` [PATCH v4 3/3] hw/riscv: " Gavin Shan
2023-03-21 11:40   ` Alistair Francis
2023-03-27 13:26 ` [PATCH v4 0/3] NUMA: Apply cluster-NUMA-node boundary for aarch64 and riscv machines Igor Mammedov
2023-04-12  1:07   ` Gavin Shan
2023-04-12 11:42     ` Peter Maydell
2023-04-13  5:50       ` Gavin Shan
2023-04-13 11:21         ` Igor Mammedov
2023-04-18  8:57           ` Gavin Shan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKmqyKPDE8OkfbDE0UgVyqodCGjRXAvkaEf2BFCJCVUsJf6cUw@mail.gmail.com \
    --to=alistair23@gmail.com \
    --cc=ajones@ventanamicro.com \
    --cc=alistair.francis@wdc.com \
    --cc=berrange@redhat.com \
    --cc=bin.meng@windriver.com \
    --cc=dbarboza@ventanamicro.com \
    --cc=eduardo@habkost.net \
    --cc=gshan@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=palmer@dabbelt.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=philmd@linaro.org \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-riscv@nongnu.org \
    --cc=quic_llindhol@quicinc.com \
    --cc=rad@semihalf.com \
    --cc=shan.gavin@gmail.com \
    --cc=thuth@redhat.com \
    --cc=wangyanan55@huawei.com \
    --cc=yihyu@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.