* [PATCH v5 0/5] RISC-V multi-socket support
From: Anup Patel @ 2020-05-29 11:46 UTC
  To: Peter Maydell, Palmer Dabbelt, Alistair Francis, Sagar Karandikar
  Cc: Atish Patra, Anup Patel, qemu-riscv, qemu-devel, Anup Patel

This series adds multi-socket support for the RISC-V virt machine and
the RISC-V spike machine. Multi-socket support will help us improve
various RISC-V operating systems, firmware, and bootloaders so that
they support RISC-V NUMA systems.

These patches can be found in the riscv_multi_socket_v5 branch at:
https://github.com/avpatel/qemu.git

To try these patches, we will need the Linux multi-PLIC improvements,
which can be found in the plic_imp_v2 branch at:
https://github.com/avpatel/linux.git

Changes since v4:
 - Rearranged patches to move the CLINT and PLIC patches before the
   other patches because they are already reviewed
 - Added PATCH3 for common RISC-V multi-socket helpers
 - Added support for the "-numa cpu,node-id" option in PATCH4 and PATCH5

Changes since v3:
 - Used "-numa" QEMU options to populate sockets instead of a custom
   "multi-socket" sub-option in the machine name

Changes since v2:
 - Dropped PATCH1 as it is no longer required
 - Added "multi-socket" sub-option for the Spike and Virt machines,
   which can be used to enable/disable multi-socket support

Changes since v1:
 - Fixed checkpatch errors and warnings
 - Added PATCH1 for knowing whether the "sockets" sub-option was specified
 - Removed SPIKE_CPUS_PER_SOCKET_MIN and SPIKE_CPUS_PER_SOCKET_MAX in PATCH3
 - Removed VIRT_CPUS_PER_SOCKET_MIN and VIRT_CPUS_PER_SOCKET_MAX in PATCH5

Anup Patel (5):
  hw/riscv: Allow creating multiple instances of CLINT
  hw/riscv: Allow creating multiple instances of PLIC
  hw/riscv: Add helpers for RISC-V multi-socket NUMA machines
  hw/riscv: spike: Allow creating multiple NUMA sockets
  hw/riscv: virt: Allow creating multiple NUMA sockets

 hw/riscv/Makefile.objs          |   1 +
 hw/riscv/numa.c                 | 242 +++++++++++++++
 hw/riscv/sifive_clint.c         |  20 +-
 hw/riscv/sifive_e.c             |   4 +-
 hw/riscv/sifive_plic.c          |  24 +-
 hw/riscv/sifive_u.c             |   4 +-
 hw/riscv/spike.c                | 272 ++++++++++------
 hw/riscv/virt.c                 | 530 ++++++++++++++++++--------------
 include/hw/riscv/numa.h         |  51 +++
 include/hw/riscv/sifive_clint.h |   7 +-
 include/hw/riscv/sifive_plic.h  |  12 +-
 include/hw/riscv/spike.h        |  11 +-
 include/hw/riscv/virt.h         |   9 +-
 13 files changed, 831 insertions(+), 356 deletions(-)
 create mode 100644 hw/riscv/numa.c
 create mode 100644 include/hw/riscv/numa.h
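
As a rough illustration of how the pieces fit together, here is a sketch
that assumes each socket (NUMA node) owns a contiguous range of hart IDs,
which is what the hartid_base/num_harts parameters added to the CLINT and
PLIC in PATCH1 and PATCH2 imply. SocketHarts and assign_socket_harts() are
hypothetical names; they are not the helpers added in hw/riscv/numa.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-socket description (not the PATCH3 helper API). */
typedef struct {
    uint32_t hartid_base; /* first global hart ID owned by this socket */
    uint32_t num_harts;   /* number of harts in this socket */
} SocketHarts;

/*
 * Given the CPU count of each socket, give every socket a contiguous,
 * non-overlapping range of hart IDs. Each entry is what a board would
 * pass as hartid_base/num_harts when creating one CLINT and one PLIC
 * per socket.
 */
static void assign_socket_harts(const uint32_t *cpus_per_socket,
                                size_t num_sockets, SocketHarts *out)
{
    uint32_t next = 0;

    for (size_t i = 0; i < num_sockets; i++) {
        assert(cpus_per_socket[i] > 0); /* empty sockets are not allowed here */
        out[i].hartid_base = next;
        out[i].num_harts = cpus_per_socket[i];
        next += cpus_per_socket[i];
    }
}
```

For example, per-socket CPU counts {2, 3, 1} yield the hart ranges 0-1,
2-4 and 5, each of which would get its own CLINT and PLIC instance.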

-- 
2.25.1



* [PATCH v5 1/5] hw/riscv: Allow creating multiple instances of CLINT
From: Anup Patel @ 2020-05-29 11:46 UTC
  To: Peter Maydell, Palmer Dabbelt, Alistair Francis, Sagar Karandikar
  Cc: Palmer Dabbelt, qemu-riscv, Anup Patel, Anup Patel, qemu-devel,
	Atish Patra, Alistair Francis

We extend CLINT emulation to allow multiple instances of CLINT in
a QEMU RISC-V machine. To achieve this, we remove the assumption from
CLINT emulation that the first HART ID is zero.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
---
 hw/riscv/sifive_clint.c         | 20 ++++++++++++--------
 hw/riscv/sifive_e.c             |  2 +-
 hw/riscv/sifive_u.c             |  2 +-
 hw/riscv/spike.c                |  6 +++---
 hw/riscv/virt.c                 |  2 +-
 include/hw/riscv/sifive_clint.h |  7 ++++---
 6 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/hw/riscv/sifive_clint.c b/hw/riscv/sifive_clint.c
index e933d35092..7d713fd743 100644
--- a/hw/riscv/sifive_clint.c
+++ b/hw/riscv/sifive_clint.c
@@ -78,7 +78,7 @@ static uint64_t sifive_clint_read(void *opaque, hwaddr addr, unsigned size)
     SiFiveCLINTState *clint = opaque;
     if (addr >= clint->sip_base &&
         addr < clint->sip_base + (clint->num_harts << 2)) {
-        size_t hartid = (addr - clint->sip_base) >> 2;
+        size_t hartid = clint->hartid_base + ((addr - clint->sip_base) >> 2);
         CPUState *cpu = qemu_get_cpu(hartid);
         CPURISCVState *env = cpu ? cpu->env_ptr : NULL;
         if (!env) {
@@ -91,7 +91,8 @@ static uint64_t sifive_clint_read(void *opaque, hwaddr addr, unsigned size)
         }
     } else if (addr >= clint->timecmp_base &&
         addr < clint->timecmp_base + (clint->num_harts << 3)) {
-        size_t hartid = (addr - clint->timecmp_base) >> 3;
+        size_t hartid = clint->hartid_base +
+            ((addr - clint->timecmp_base) >> 3);
         CPUState *cpu = qemu_get_cpu(hartid);
         CPURISCVState *env = cpu ? cpu->env_ptr : NULL;
         if (!env) {
@@ -128,7 +129,7 @@ static void sifive_clint_write(void *opaque, hwaddr addr, uint64_t value,
 
     if (addr >= clint->sip_base &&
         addr < clint->sip_base + (clint->num_harts << 2)) {
-        size_t hartid = (addr - clint->sip_base) >> 2;
+        size_t hartid = clint->hartid_base + ((addr - clint->sip_base) >> 2);
         CPUState *cpu = qemu_get_cpu(hartid);
         CPURISCVState *env = cpu ? cpu->env_ptr : NULL;
         if (!env) {
@@ -141,7 +142,8 @@ static void sifive_clint_write(void *opaque, hwaddr addr, uint64_t value,
         return;
     } else if (addr >= clint->timecmp_base &&
         addr < clint->timecmp_base + (clint->num_harts << 3)) {
-        size_t hartid = (addr - clint->timecmp_base) >> 3;
+        size_t hartid = clint->hartid_base +
+            ((addr - clint->timecmp_base) >> 3);
         CPUState *cpu = qemu_get_cpu(hartid);
         CPURISCVState *env = cpu ? cpu->env_ptr : NULL;
         if (!env) {
@@ -185,6 +187,7 @@ static const MemoryRegionOps sifive_clint_ops = {
 };
 
 static Property sifive_clint_properties[] = {
+    DEFINE_PROP_UINT32("hartid-base", SiFiveCLINTState, hartid_base, 0),
     DEFINE_PROP_UINT32("num-harts", SiFiveCLINTState, num_harts, 0),
     DEFINE_PROP_UINT32("sip-base", SiFiveCLINTState, sip_base, 0),
     DEFINE_PROP_UINT32("timecmp-base", SiFiveCLINTState, timecmp_base, 0),
@@ -226,13 +229,13 @@ type_init(sifive_clint_register_types)
 /*
  * Create CLINT device.
  */
-DeviceState *sifive_clint_create(hwaddr addr, hwaddr size, uint32_t num_harts,
-    uint32_t sip_base, uint32_t timecmp_base, uint32_t time_base,
-    bool provide_rdtime)
+DeviceState *sifive_clint_create(hwaddr addr, hwaddr size,
+    uint32_t hartid_base, uint32_t num_harts, uint32_t sip_base,
+    uint32_t timecmp_base, uint32_t time_base, bool provide_rdtime)
 {
     int i;
     for (i = 0; i < num_harts; i++) {
-        CPUState *cpu = qemu_get_cpu(i);
+        CPUState *cpu = qemu_get_cpu(hartid_base + i);
         CPURISCVState *env = cpu ? cpu->env_ptr : NULL;
         if (!env) {
             continue;
@@ -246,6 +249,7 @@ DeviceState *sifive_clint_create(hwaddr addr, hwaddr size, uint32_t num_harts,
     }
 
     DeviceState *dev = qdev_create(NULL, TYPE_SIFIVE_CLINT);
+    qdev_prop_set_uint32(dev, "hartid-base", hartid_base);
     qdev_prop_set_uint32(dev, "num-harts", num_harts);
     qdev_prop_set_uint32(dev, "sip-base", sip_base);
     qdev_prop_set_uint32(dev, "timecmp-base", timecmp_base);
diff --git a/hw/riscv/sifive_e.c b/hw/riscv/sifive_e.c
index b53109521e..1c3b37d0ba 100644
--- a/hw/riscv/sifive_e.c
+++ b/hw/riscv/sifive_e.c
@@ -163,7 +163,7 @@ static void riscv_sifive_e_soc_realize(DeviceState *dev, Error **errp)
         SIFIVE_E_PLIC_CONTEXT_STRIDE,
         memmap[SIFIVE_E_PLIC].size);
     sifive_clint_create(memmap[SIFIVE_E_CLINT].base,
-        memmap[SIFIVE_E_CLINT].size, ms->smp.cpus,
+        memmap[SIFIVE_E_CLINT].size, 0, ms->smp.cpus,
         SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE, false);
     create_unimplemented_device("riscv.sifive.e.aon",
         memmap[SIFIVE_E_AON].base, memmap[SIFIVE_E_AON].size);
diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index 4299bdf480..c193761916 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -602,7 +602,7 @@ static void riscv_sifive_u_soc_realize(DeviceState *dev, Error **errp)
     sifive_uart_create(system_memory, memmap[SIFIVE_U_UART1].base,
         serial_hd(1), qdev_get_gpio_in(DEVICE(s->plic), SIFIVE_U_UART1_IRQ));
     sifive_clint_create(memmap[SIFIVE_U_CLINT].base,
-        memmap[SIFIVE_U_CLINT].size, ms->smp.cpus,
+        memmap[SIFIVE_U_CLINT].size, 0, ms->smp.cpus,
         SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE, false);
 
     object_property_set_bool(OBJECT(&s->prci), true, "realized", &err);
diff --git a/hw/riscv/spike.c b/hw/riscv/spike.c
index d0c4843712..d5e0103d89 100644
--- a/hw/riscv/spike.c
+++ b/hw/riscv/spike.c
@@ -253,7 +253,7 @@ static void spike_board_init(MachineState *machine)
 
     /* Core Local Interruptor (timer and IPI) */
     sifive_clint_create(memmap[SPIKE_CLINT].base, memmap[SPIKE_CLINT].size,
-        smp_cpus, SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE,
+        0, smp_cpus, SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE,
         false);
 }
 
@@ -343,7 +343,7 @@ static void spike_v1_10_0_board_init(MachineState *machine)
 
     /* Core Local Interruptor (timer and IPI) */
     sifive_clint_create(memmap[SPIKE_CLINT].base, memmap[SPIKE_CLINT].size,
-        smp_cpus, SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE,
+        0, smp_cpus, SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE,
         false);
 }
 
@@ -452,7 +452,7 @@ static void spike_v1_09_1_board_init(MachineState *machine)
 
     /* Core Local Interruptor (timer and IPI) */
     sifive_clint_create(memmap[SPIKE_CLINT].base, memmap[SPIKE_CLINT].size,
-        smp_cpus, SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE,
+        0, smp_cpus, SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE,
         false);
 
     g_free(config_string);
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index c695a44979..51afe7e23b 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -595,7 +595,7 @@ static void riscv_virt_board_init(MachineState *machine)
         VIRT_PLIC_CONTEXT_STRIDE,
         memmap[VIRT_PLIC].size);
     sifive_clint_create(memmap[VIRT_CLINT].base,
-        memmap[VIRT_CLINT].size, smp_cpus,
+        memmap[VIRT_CLINT].size, 0, smp_cpus,
         SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE, true);
     sifive_test_create(memmap[VIRT_TEST].base);
 
diff --git a/include/hw/riscv/sifive_clint.h b/include/hw/riscv/sifive_clint.h
index 4a720bfece..9f5fb3d31d 100644
--- a/include/hw/riscv/sifive_clint.h
+++ b/include/hw/riscv/sifive_clint.h
@@ -33,6 +33,7 @@ typedef struct SiFiveCLINTState {
 
     /*< public >*/
     MemoryRegion mmio;
+    uint32_t hartid_base;
     uint32_t num_harts;
     uint32_t sip_base;
     uint32_t timecmp_base;
@@ -40,9 +41,9 @@ typedef struct SiFiveCLINTState {
     uint32_t aperture_size;
 } SiFiveCLINTState;
 
-DeviceState *sifive_clint_create(hwaddr addr, hwaddr size, uint32_t num_harts,
-    uint32_t sip_base, uint32_t timecmp_base, uint32_t time_base,
-    bool provide_rdtime);
+DeviceState *sifive_clint_create(hwaddr addr, hwaddr size,
+    uint32_t hartid_base, uint32_t num_harts, uint32_t sip_base,
+    uint32_t timecmp_base, uint32_t time_base, bool provide_rdtime);
 
 enum {
     SIFIVE_SIP_BASE     = 0x0,
-- 
2.25.1



* [PATCH v5 2/5] hw/riscv: Allow creating multiple instances of PLIC
From: Anup Patel @ 2020-05-29 11:46 UTC
  To: Peter Maydell, Palmer Dabbelt, Alistair Francis, Sagar Karandikar
  Cc: Anup Patel, qemu-riscv, Anup Patel, Palmer Dabbelt, qemu-devel,
	Atish Patra, Alistair Francis

We extend PLIC emulation to allow multiple instances of PLIC in
a QEMU RISC-V machine. To achieve this, we remove the assumption from
PLIC emulation that the first HART ID is zero.
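
The hart-config handling can be sketched as follows, assuming a config
string with one comma-separated entry per hart and one letter per
privilege mode (for example "MS,MS"). PlicContext and plic_contexts() are
hypothetical; unlike parse_hart_config() in the patch, this sketch does
no duplicate-mode check and only accepts 'M', 'S' and 'U'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical (hart, mode) pair, one per PLIC context. */
typedef struct {
    uint32_t hartid; /* global hart ID, offset by hartid_base */
    char mode;       /* 'M', 'S' or 'U' */
} PlicContext;

/*
 * Walk the hart-config string and emit one context per hart/mode letter,
 * numbering harts from hartid_base instead of zero. Returns the number of
 * harts; *num_ctx receives the number of contexts written to out.
 */
static uint32_t plic_contexts(const char *hart_config, uint32_t hartid_base,
                              PlicContext *out, size_t max_ctx,
                              size_t *num_ctx)
{
    uint32_t hart = 0;
    size_t n = 0;

    for (const char *p = hart_config; *p; p++) {
        if (*p == ',') {
            hart++; /* a comma starts the next hart's entry */
            continue;
        }
        assert(*p == 'M' || *p == 'S' || *p == 'U');
        assert(n < max_ctx);
        out[n].hartid = hartid_base + hart;
        out[n].mode = *p;
        n++;
    }
    *num_ctx = n;
    return hart + 1; /* the last entry has no trailing comma */
}
```

For "MS,MS" with hartid_base = 2 it produces the contexts (2,M), (2,S),
(3,M), (3,S) and reports two harts, which corresponds to the value the
patch stores in plic->num_harts and uses to claim SEIP on harts 2 and 3.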

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
---
 hw/riscv/sifive_e.c            |  2 +-
 hw/riscv/sifive_plic.c         | 24 +++++++++++++-----------
 hw/riscv/sifive_u.c            |  2 +-
 hw/riscv/virt.c                |  2 +-
 include/hw/riscv/sifive_plic.h | 12 +++++++-----
 5 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/hw/riscv/sifive_e.c b/hw/riscv/sifive_e.c
index 1c3b37d0ba..bd122e71ae 100644
--- a/hw/riscv/sifive_e.c
+++ b/hw/riscv/sifive_e.c
@@ -152,7 +152,7 @@ static void riscv_sifive_e_soc_realize(DeviceState *dev, Error **errp)
 
     /* MMIO */
     s->plic = sifive_plic_create(memmap[SIFIVE_E_PLIC].base,
-        (char *)SIFIVE_E_PLIC_HART_CONFIG,
+        (char *)SIFIVE_E_PLIC_HART_CONFIG, 0,
         SIFIVE_E_PLIC_NUM_SOURCES,
         SIFIVE_E_PLIC_NUM_PRIORITIES,
         SIFIVE_E_PLIC_PRIORITY_BASE,
diff --git a/hw/riscv/sifive_plic.c b/hw/riscv/sifive_plic.c
index c1e04cbb98..f88bb48053 100644
--- a/hw/riscv/sifive_plic.c
+++ b/hw/riscv/sifive_plic.c
@@ -352,6 +352,7 @@ static const MemoryRegionOps sifive_plic_ops = {
 
 static Property sifive_plic_properties[] = {
     DEFINE_PROP_STRING("hart-config", SiFivePLICState, hart_config),
+    DEFINE_PROP_UINT32("hartid-base", SiFivePLICState, hartid_base, 0),
     DEFINE_PROP_UINT32("num-sources", SiFivePLICState, num_sources, 0),
     DEFINE_PROP_UINT32("num-priorities", SiFivePLICState, num_priorities, 0),
     DEFINE_PROP_UINT32("priority-base", SiFivePLICState, priority_base, 0),
@@ -400,10 +401,12 @@ static void parse_hart_config(SiFivePLICState *plic)
     }
     hartid++;
 
-    /* store hart/mode combinations */
     plic->num_addrs = addrid;
+    plic->num_harts = hartid;
+
+    /* store hart/mode combinations */
     plic->addr_config = g_new(PLICAddr, plic->num_addrs);
-    addrid = 0, hartid = 0;
+    addrid = 0, hartid = plic->hartid_base;
     p = plic->hart_config;
     while ((c = *p++)) {
         if (c == ',') {
@@ -429,8 +432,6 @@ static void sifive_plic_irq_request(void *opaque, int irq, int level)
 
 static void sifive_plic_realize(DeviceState *dev, Error **errp)
 {
-    MachineState *ms = MACHINE(qdev_get_machine());
-    unsigned int smp_cpus = ms->smp.cpus;
     SiFivePLICState *plic = SIFIVE_PLIC(dev);
     int i;
 
@@ -451,8 +452,8 @@ static void sifive_plic_realize(DeviceState *dev, Error **errp)
      * lost a interrupt in the case a PLIC is attached. The SEIP bit must be
      * hardware controlled when a PLIC is attached.
      */
-    for (i = 0; i < smp_cpus; i++) {
-        RISCVCPU *cpu = RISCV_CPU(qemu_get_cpu(i));
+    for (i = 0; i < plic->num_harts; i++) {
+        RISCVCPU *cpu = RISCV_CPU(qemu_get_cpu(plic->hartid_base + i));
         if (riscv_cpu_claim_interrupts(cpu, MIP_SEIP) < 0) {
             error_report("SEIP already claimed");
             exit(1);
@@ -488,16 +489,17 @@ type_init(sifive_plic_register_types)
  * Create PLIC device.
  */
 DeviceState *sifive_plic_create(hwaddr addr, char *hart_config,
-    uint32_t num_sources, uint32_t num_priorities,
-    uint32_t priority_base, uint32_t pending_base,
-    uint32_t enable_base, uint32_t enable_stride,
-    uint32_t context_base, uint32_t context_stride,
-    uint32_t aperture_size)
+    uint32_t hartid_base, uint32_t num_sources,
+    uint32_t num_priorities, uint32_t priority_base,
+    uint32_t pending_base, uint32_t enable_base,
+    uint32_t enable_stride, uint32_t context_base,
+    uint32_t context_stride, uint32_t aperture_size)
 {
     DeviceState *dev = qdev_create(NULL, TYPE_SIFIVE_PLIC);
     assert(enable_stride == (enable_stride & -enable_stride));
     assert(context_stride == (context_stride & -context_stride));
     qdev_prop_set_string(dev, "hart-config", hart_config);
+    qdev_prop_set_uint32(dev, "hartid-base", hartid_base);
     qdev_prop_set_uint32(dev, "num-sources", num_sources);
     qdev_prop_set_uint32(dev, "num-priorities", num_priorities);
     qdev_prop_set_uint32(dev, "priority-base", priority_base);
diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index c193761916..53e48e2ff5 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -586,7 +586,7 @@ static void riscv_sifive_u_soc_realize(DeviceState *dev, Error **errp)
 
     /* MMIO */
     s->plic = sifive_plic_create(memmap[SIFIVE_U_PLIC].base,
-        plic_hart_config,
+        plic_hart_config, 0,
         SIFIVE_U_PLIC_NUM_SOURCES,
         SIFIVE_U_PLIC_NUM_PRIORITIES,
         SIFIVE_U_PLIC_PRIORITY_BASE,
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 51afe7e23b..421815081d 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -584,7 +584,7 @@ static void riscv_virt_board_init(MachineState *machine)
 
     /* MMIO */
     s->plic = sifive_plic_create(memmap[VIRT_PLIC].base,
-        plic_hart_config,
+        plic_hart_config, 0,
         VIRT_PLIC_NUM_SOURCES,
         VIRT_PLIC_NUM_PRIORITIES,
         VIRT_PLIC_PRIORITY_BASE,
diff --git a/include/hw/riscv/sifive_plic.h b/include/hw/riscv/sifive_plic.h
index 4421e81249..ace76d0f1b 100644
--- a/include/hw/riscv/sifive_plic.h
+++ b/include/hw/riscv/sifive_plic.h
@@ -48,6 +48,7 @@ typedef struct SiFivePLICState {
     /*< public >*/
     MemoryRegion mmio;
     uint32_t num_addrs;
+    uint32_t num_harts;
     uint32_t bitfield_words;
     PLICAddr *addr_config;
     uint32_t *source_priority;
@@ -58,6 +59,7 @@ typedef struct SiFivePLICState {
 
     /* config */
     char *hart_config;
+    uint32_t hartid_base;
     uint32_t num_sources;
     uint32_t num_priorities;
     uint32_t priority_base;
@@ -70,10 +72,10 @@ typedef struct SiFivePLICState {
 } SiFivePLICState;
 
 DeviceState *sifive_plic_create(hwaddr addr, char *hart_config,
-    uint32_t num_sources, uint32_t num_priorities,
-    uint32_t priority_base, uint32_t pending_base,
-    uint32_t enable_base, uint32_t enable_stride,
-    uint32_t context_base, uint32_t context_stride,
-    uint32_t aperture_size);
+    uint32_t hartid_base, uint32_t num_sources,
+    uint32_t num_priorities, uint32_t priority_base,
+    uint32_t pending_base, uint32_t enable_base,
+    uint32_t enable_stride, uint32_t context_base,
+    uint32_t context_stride, uint32_t aperture_size);
 
 #endif
-- 
2.25.1



-    uint32_t enable_base, uint32_t enable_stride,
-    uint32_t context_base, uint32_t context_stride,
-    uint32_t aperture_size)
+    uint32_t hartid_base, uint32_t num_sources,
+    uint32_t num_priorities, uint32_t priority_base,
+    uint32_t pending_base, uint32_t enable_base,
+    uint32_t enable_stride, uint32_t context_base,
+    uint32_t context_stride, uint32_t aperture_size)
 {
     DeviceState *dev = qdev_create(NULL, TYPE_SIFIVE_PLIC);
     assert(enable_stride == (enable_stride & -enable_stride));
     assert(context_stride == (context_stride & -context_stride));
     qdev_prop_set_string(dev, "hart-config", hart_config);
+    qdev_prop_set_uint32(dev, "hartid-base", hartid_base);
     qdev_prop_set_uint32(dev, "num-sources", num_sources);
     qdev_prop_set_uint32(dev, "num-priorities", num_priorities);
     qdev_prop_set_uint32(dev, "priority-base", priority_base);
diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index c193761916..53e48e2ff5 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -586,7 +586,7 @@ static void riscv_sifive_u_soc_realize(DeviceState *dev, Error **errp)
 
     /* MMIO */
     s->plic = sifive_plic_create(memmap[SIFIVE_U_PLIC].base,
-        plic_hart_config,
+        plic_hart_config, 0,
         SIFIVE_U_PLIC_NUM_SOURCES,
         SIFIVE_U_PLIC_NUM_PRIORITIES,
         SIFIVE_U_PLIC_PRIORITY_BASE,
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 51afe7e23b..421815081d 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -584,7 +584,7 @@ static void riscv_virt_board_init(MachineState *machine)
 
     /* MMIO */
     s->plic = sifive_plic_create(memmap[VIRT_PLIC].base,
-        plic_hart_config,
+        plic_hart_config, 0,
         VIRT_PLIC_NUM_SOURCES,
         VIRT_PLIC_NUM_PRIORITIES,
         VIRT_PLIC_PRIORITY_BASE,
diff --git a/include/hw/riscv/sifive_plic.h b/include/hw/riscv/sifive_plic.h
index 4421e81249..ace76d0f1b 100644
--- a/include/hw/riscv/sifive_plic.h
+++ b/include/hw/riscv/sifive_plic.h
@@ -48,6 +48,7 @@ typedef struct SiFivePLICState {
     /*< public >*/
     MemoryRegion mmio;
     uint32_t num_addrs;
+    uint32_t num_harts;
     uint32_t bitfield_words;
     PLICAddr *addr_config;
     uint32_t *source_priority;
@@ -58,6 +59,7 @@ typedef struct SiFivePLICState {
 
     /* config */
     char *hart_config;
+    uint32_t hartid_base;
     uint32_t num_sources;
     uint32_t num_priorities;
     uint32_t priority_base;
@@ -70,10 +72,10 @@ typedef struct SiFivePLICState {
 } SiFivePLICState;
 
 DeviceState *sifive_plic_create(hwaddr addr, char *hart_config,
-    uint32_t num_sources, uint32_t num_priorities,
-    uint32_t priority_base, uint32_t pending_base,
-    uint32_t enable_base, uint32_t enable_stride,
-    uint32_t context_base, uint32_t context_stride,
-    uint32_t aperture_size);
+    uint32_t hartid_base, uint32_t num_sources,
+    uint32_t num_priorities, uint32_t priority_base,
+    uint32_t pending_base, uint32_t enable_base,
+    uint32_t enable_stride, uint32_t context_base,
+    uint32_t context_stride, uint32_t aperture_size);
 
 #endif
-- 
2.25.1




* [PATCH v5 3/5] hw/riscv: Add helpers for RISC-V multi-socket NUMA machines
  2020-05-29 11:46 ` Anup Patel
@ 2020-05-29 11:46   ` Anup Patel
  -1 siblings, 0 replies; 28+ messages in thread
From: Anup Patel @ 2020-05-29 11:46 UTC (permalink / raw)
  To: Peter Maydell, Palmer Dabbelt, Alistair Francis, Sagar Karandikar
  Cc: Atish Patra, Anup Patel, qemu-riscv, qemu-devel, Anup Patel

We add common helper routines which can be shared by RISC-V
multi-socket NUMA machines.

We have two types of helpers:
1. riscv_socket_xyz() - These helpers assist in managing multiple
   sockets irrespective of whether QEMU NUMA is enabled or disabled
2. riscv_numa_xyz() - These helpers provide the QEMU machine
   callbacks needed for QEMU NUMA emulation
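As a rough illustration of what the riscv_socket_xyz() HART-range helpers compute, the sketch below replays their logic over a plain array. It is a simplified model, not the patch code: node_of[] is a hypothetical stand-in for ms->possible_cpus->cpus[i].props.node_id, the function names are hypothetical, and the NUMA-disabled fallbacks are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* node_of[i] is the NUMA node (socket) that HART i belongs to. */
static int socket_first_hartid(const int *node_of, int ncpus, int socket)
{
    for (int i = 0; i < ncpus; i++) {
        if (node_of[i] == socket) {
            return i;
        }
    }
    return -1;                       /* socket has no HARTs */
}

static int socket_last_hartid(const int *node_of, int ncpus, int socket)
{
    for (int i = ncpus - 1; i >= 0; i--) {
        if (node_of[i] == socket) {
            return i;
        }
    }
    return -1;
}

/* HART count of the range [first, last]; -1 if the socket is empty. */
static int socket_hart_count(const int *node_of, int ncpus, int socket)
{
    int first = socket_first_hartid(node_of, ncpus, socket);
    int last = socket_last_hartid(node_of, ncpus, socket);

    return (first < 0 || last < first) ? -1 : last - first + 1;
}

/* True only if every HART in [first, last] belongs to the socket. */
static bool socket_check_hartids(const int *node_of, int ncpus, int socket)
{
    int first = socket_first_hartid(node_of, ncpus, socket);
    int last = socket_last_hartid(node_of, ncpus, socket);

    if (first < 0 || last < 0) {
        return false;
    }
    for (int i = first; i <= last; i++) {
        if (node_of[i] != socket) {
            return false;
        }
    }
    return true;
}
```

The contiguity check is presumably needed because a PLIC instance (see the previous patch) is described by a single hartid_base plus a HART count, so a socket whose HARTs are interleaved with another socket's cannot be described that way.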

Signed-off-by: Anup Patel <anup.patel@wdc.com>
---
 hw/riscv/Makefile.objs  |   1 +
 hw/riscv/numa.c         | 242 ++++++++++++++++++++++++++++++++++++++++
 include/hw/riscv/numa.h |  51 +++++++++
 3 files changed, 294 insertions(+)
 create mode 100644 hw/riscv/numa.c
 create mode 100644 include/hw/riscv/numa.h

diff --git a/hw/riscv/Makefile.objs b/hw/riscv/Makefile.objs
index fc3c6dd7c8..4483e61879 100644
--- a/hw/riscv/Makefile.objs
+++ b/hw/riscv/Makefile.objs
@@ -1,4 +1,5 @@
 obj-y += boot.o
+obj-y += numa.o
 obj-$(CONFIG_SPIKE) += riscv_htif.o
 obj-$(CONFIG_HART) += riscv_hart.o
 obj-$(CONFIG_SIFIVE_E) += sifive_e.o
diff --git a/hw/riscv/numa.c b/hw/riscv/numa.c
new file mode 100644
index 0000000000..4f92307102
--- /dev/null
+++ b/hw/riscv/numa.c
@@ -0,0 +1,242 @@
+/*
+ * QEMU RISC-V NUMA Helper
+ *
+ * Copyright (c) 2020 Western Digital Corporation or its affiliates.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "qemu/log.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "hw/boards.h"
+#include "hw/qdev-properties.h"
+#include "hw/riscv/numa.h"
+#include "sysemu/device_tree.h"
+
+static bool numa_enabled(const MachineState *ms)
+{
+    return (ms->numa_state && ms->numa_state->num_nodes) ? true : false;
+}
+
+int riscv_socket_count(const MachineState *ms)
+{
+    return (numa_enabled(ms)) ? ms->numa_state->num_nodes : 1;
+}
+
+int riscv_socket_first_hartid(const MachineState *ms, int socket_id)
+{
+    int i, first_hartid = ms->smp.cpus;
+
+    if (!numa_enabled(ms)) {
+        return (!socket_id) ? 0 : -1;
+    }
+
+    for (i = 0; i < ms->smp.cpus; i++) {
+        if (ms->possible_cpus->cpus[i].props.node_id != socket_id) {
+            continue;
+        }
+        if (i < first_hartid) {
+            first_hartid = i;
+        }
+    }
+
+    return (first_hartid < ms->smp.cpus) ? first_hartid : -1;
+}
+
+int riscv_socket_last_hartid(const MachineState *ms, int socket_id)
+{
+    int i, last_hartid = -1;
+
+    if (!numa_enabled(ms)) {
+        return (!socket_id) ? ms->smp.cpus - 1 : -1;
+    }
+
+    for (i = 0; i < ms->smp.cpus; i++) {
+        if (ms->possible_cpus->cpus[i].props.node_id != socket_id) {
+            continue;
+        }
+        if (i > last_hartid) {
+            last_hartid = i;
+        }
+    }
+
+    return (last_hartid < ms->smp.cpus) ? last_hartid : -1;
+}
+
+int riscv_socket_hart_count(const MachineState *ms, int socket_id)
+{
+    int first_hartid, last_hartid;
+
+    if (!numa_enabled(ms)) {
+        return (!socket_id) ? ms->smp.cpus : -1;
+    }
+
+    first_hartid = riscv_socket_first_hartid(ms, socket_id);
+    if (first_hartid < 0) {
+        return -1;
+    }
+
+    last_hartid = riscv_socket_last_hartid(ms, socket_id);
+    if (last_hartid < 0) {
+        return -1;
+    }
+
+    if (first_hartid > last_hartid) {
+        return -1;
+    }
+
+    return last_hartid - first_hartid + 1;
+}
+
+bool riscv_socket_check_hartids(const MachineState *ms, int socket_id)
+{
+    int i, first_hartid, last_hartid;
+
+    if (!numa_enabled(ms)) {
+        return (!socket_id) ? true : false;
+    }
+
+    first_hartid = riscv_socket_first_hartid(ms, socket_id);
+    if (first_hartid < 0) {
+        return false;
+    }
+
+    last_hartid = riscv_socket_last_hartid(ms, socket_id);
+    if (last_hartid < 0) {
+        return false;
+    }
+
+    for (i = first_hartid; i <= last_hartid; i++) {
+        if (ms->possible_cpus->cpus[i].props.node_id != socket_id) {
+            return false;
+        }
+    }
+
+    return true;
+}
+
+uint64_t riscv_socket_mem_offset(const MachineState *ms, int socket_id)
+{
+    int i;
+    uint64_t mem_offset = 0;
+
+    if (!numa_enabled(ms)) {
+        return 0;
+    }
+
+    for (i = 0; i < ms->numa_state->num_nodes; i++) {
+        if (i == socket_id) {
+            break;
+        }
+        mem_offset += ms->numa_state->nodes[i].node_mem;
+    }
+
+    return (i == socket_id) ? mem_offset : 0;
+}
+
+uint64_t riscv_socket_mem_size(const MachineState *ms, int socket_id)
+{
+    if (!numa_enabled(ms)) {
+        return (!socket_id) ? ms->ram_size : 0;
+    }
+
+    return (socket_id < ms->numa_state->num_nodes) ?
+            ms->numa_state->nodes[socket_id].node_mem : 0;
+}
+
+void riscv_socket_fdt_write_id(const MachineState *ms, void *fdt,
+                               const char *node_name, int socket_id)
+{
+    if (numa_enabled(ms)) {
+        qemu_fdt_setprop_cell(fdt, node_name, "numa-node-id", socket_id);
+    }
+}
+
+void riscv_socket_fdt_write_distance_matrix(const MachineState *ms, void *fdt)
+{
+    int i, j, idx;
+    uint32_t *dist_matrix, dist_matrix_size;
+
+    if (numa_enabled(ms) && ms->numa_state->have_numa_distance) {
+        dist_matrix_size = riscv_socket_count(ms) * riscv_socket_count(ms);
+        dist_matrix_size *= (3 * sizeof(uint32_t));
+        dist_matrix = g_malloc0(dist_matrix_size);
+
+        for (i = 0; i < riscv_socket_count(ms); i++) {
+            for (j = 0; j < riscv_socket_count(ms); j++) {
+                idx = (i * riscv_socket_count(ms) + j) * 3;
+                dist_matrix[idx + 0] = cpu_to_be32(i);
+                dist_matrix[idx + 1] = cpu_to_be32(j);
+                dist_matrix[idx + 2] =
+                    cpu_to_be32(ms->numa_state->nodes[i].distance[j]);
+            }
+        }
+
+        qemu_fdt_add_subnode(fdt, "/distance-map");
+        qemu_fdt_setprop_string(fdt, "/distance-map", "compatible",
+                                "numa-distance-map-v1");
+        qemu_fdt_setprop(fdt, "/distance-map", "distance-matrix",
+                         dist_matrix, dist_matrix_size);
+        g_free(dist_matrix);
+    }
+}
+
+CpuInstanceProperties
+riscv_numa_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
+{
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
+    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
+
+    assert(cpu_index < possible_cpus->len);
+    return possible_cpus->cpus[cpu_index].props;
+}
+
+int64_t riscv_numa_get_default_cpu_node_id(const MachineState *ms, int idx)
+{
+    int64_t nidx = 0;
+
+    if (ms->numa_state->num_nodes) {
+        nidx = idx / (ms->smp.cpus / ms->numa_state->num_nodes);
+        if (ms->numa_state->num_nodes <= nidx) {
+            nidx = ms->numa_state->num_nodes - 1;
+        }
+    }
+
+    return nidx;
+}
+
+const CPUArchIdList *riscv_numa_possible_cpu_arch_ids(MachineState *ms)
+{
+    int n;
+    unsigned int max_cpus = ms->smp.max_cpus;
+
+    if (ms->possible_cpus) {
+        assert(ms->possible_cpus->len == max_cpus);
+        return ms->possible_cpus;
+    }
+
+    ms->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
+                                  sizeof(CPUArchId) * max_cpus);
+    ms->possible_cpus->len = max_cpus;
+    for (n = 0; n < ms->possible_cpus->len; n++) {
+        ms->possible_cpus->cpus[n].type = ms->cpu_type;
+        ms->possible_cpus->cpus[n].arch_id = n;
+        ms->possible_cpus->cpus[n].props.has_core_id = true;
+        ms->possible_cpus->cpus[n].props.core_id = n;
+    }
+
+    return ms->possible_cpus;
+}
diff --git a/include/hw/riscv/numa.h b/include/hw/riscv/numa.h
new file mode 100644
index 0000000000..fd9517a315
--- /dev/null
+++ b/include/hw/riscv/numa.h
@@ -0,0 +1,51 @@
+/*
+ * QEMU RISC-V NUMA Helper
+ *
+ * Copyright (c) 2020 Western Digital Corporation or its affiliates.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef RISCV_NUMA_H
+#define RISCV_NUMA_H
+
+#include "hw/sysbus.h"
+#include "sysemu/numa.h"
+
+int riscv_socket_count(const MachineState *ms);
+
+int riscv_socket_first_hartid(const MachineState *ms, int socket_id);
+
+int riscv_socket_last_hartid(const MachineState *ms, int socket_id);
+
+int riscv_socket_hart_count(const MachineState *ms, int socket_id);
+
+uint64_t riscv_socket_mem_offset(const MachineState *ms, int socket_id);
+
+uint64_t riscv_socket_mem_size(const MachineState *ms, int socket_id);
+
+bool riscv_socket_check_hartids(const MachineState *ms, int socket_id);
+
+void riscv_socket_fdt_write_id(const MachineState *ms, void *fdt,
+                               const char *node_name, int socket_id);
+
+void riscv_socket_fdt_write_distance_matrix(const MachineState *ms, void *fdt);
+
+CpuInstanceProperties
+riscv_numa_cpu_index_to_props(MachineState *ms, unsigned cpu_index);
+
+int64_t riscv_numa_get_default_cpu_node_id(const MachineState *ms, int idx);
+
+const CPUArchIdList *riscv_numa_possible_cpu_arch_ids(MachineState *ms);
+
+#endif /* RISCV_NUMA_H */
-- 
2.25.1




* [PATCH v5 4/5] hw/riscv: spike: Allow creating multiple NUMA sockets
  2020-05-29 11:46 ` Anup Patel
@ 2020-05-29 11:46   ` Anup Patel
  -1 siblings, 0 replies; 28+ messages in thread
From: Anup Patel @ 2020-05-29 11:46 UTC (permalink / raw)
  To: Peter Maydell, Palmer Dabbelt, Alistair Francis, Sagar Karandikar
  Cc: Atish Patra, Anup Patel, qemu-riscv, qemu-devel, Anup Patel

We extend the RISC-V spike machine to allow creating a multi-socket
machine. Each socket of the RISC-V spike machine is a NUMA node with
a set of HARTs, a memory instance, and a CLINT instance. All other
devices are shared among the sockets. We also update the generated
device tree accordingly.
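A simplified model of how each socket's memory node is placed, following riscv_socket_mem_offset(): a socket's DRAM begins at the DRAM base plus the sizes of all lower-numbered NUMA nodes. Here node_mem[] stands in for ms->numa_state->nodes[i].node_mem, the function names are hypothetical, and, unlike the patch helper, this sketch returns 0 for every out-of-range socket id.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of a socket's DRAM from the machine's DRAM base. */
static uint64_t socket_mem_offset(const uint64_t *node_mem, int num_nodes,
                                  int socket)
{
    uint64_t off = 0;

    if (socket < 0 || socket >= num_nodes) {
        return 0;
    }
    for (int i = 0; i < socket; i++) {
        off += node_mem[i];          /* skip all lower-numbered nodes */
    }
    return off;
}

/* Start address used for the socket's /memory@... device tree node. */
static uint64_t socket_mem_base(uint64_t dram_base, const uint64_t *node_mem,
                                int num_nodes, int socket)
{
    return dram_base + socket_mem_offset(node_mem, num_nodes, socket);
}
```

For example, assuming a DRAM base of 0x80000000 and two nodes of 1 GiB and 2 GiB, socket 0's memory starts at 0x80000000 and socket 1's at 0xC0000000.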

By default, NUMA multi-socket support is disabled for the RISC-V
spike machine. To enable it, use QEMU's "-numa" command-line
options.

Example1: For two NUMA nodes with 2 CPUs each, append the following
to the command-line options: "-smp 4 -numa node -numa node"

Example2: For two NUMA nodes with 1 and 3 CPUs respectively, append
the following to the command-line options:
"-smp 4 -numa node -numa node -numa cpu,node-id=0,core-id=0 \
-numa cpu,node-id=1,core-id=1 -numa cpu,node-id=1,core-id=2 \
-numa cpu,node-id=1,core-id=3"
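When no "-numa cpu" options are given (Example1), HARTs have to be assigned to nodes by default. The sketch below mirrors the arithmetic of riscv_numa_get_default_cpu_node_id() from PATCH3, assuming the spike machine registers that helper as its default CPU-to-node callback. default_cpu_node_id() is a hypothetical standalone name, and the assert() encodes this sketch's own assumption of at least one HART per node, so that the divisor is non-zero.

```c
#include <assert.h>
#include <stdint.h>

/*
 * HARTs are split into blocks of (cpus / nodes); remainder HARTs are
 * clamped onto the last node.
 */
static int64_t default_cpu_node_id(int idx, int cpus, int nodes)
{
    int64_t nidx = 0;

    if (nodes > 0) {
        assert(cpus >= nodes);       /* sketch assumption: divisor > 0 */
        nidx = idx / (cpus / nodes);
        if (nidx >= nodes) {
            nidx = nodes - 1;        /* remainder HARTs go to last node */
        }
    }
    return nidx;
}
```

With "-smp 4 -numa node -numa node" this puts HARTs 0-1 on node 0 and HARTs 2-3 on node 1; with five HARTs, HART 4 is clamped onto node 1. Example2 bypasses the default by naming a node-id for every core-id.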

The maximum number of sockets in a RISC-V spike machine is 8,
but this limit may be changed in the future.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
---
 hw/riscv/spike.c         | 268 ++++++++++++++++++++++++++-------------
 include/hw/riscv/spike.h |  11 +-
 2 files changed, 187 insertions(+), 92 deletions(-)

diff --git a/hw/riscv/spike.c b/hw/riscv/spike.c
index d5e0103d89..b8373eb1eb 100644
--- a/hw/riscv/spike.c
+++ b/hw/riscv/spike.c
@@ -36,6 +36,7 @@
 #include "hw/riscv/sifive_clint.h"
 #include "hw/riscv/spike.h"
 #include "hw/riscv/boot.h"
+#include "hw/riscv/numa.h"
 #include "chardev/char.h"
 #include "sysemu/arch_init.h"
 #include "sysemu/device_tree.h"
@@ -64,9 +65,14 @@ static void create_fdt(SpikeState *s, const struct MemmapEntry *memmap,
     uint64_t mem_size, const char *cmdline)
 {
     void *fdt;
-    int cpu;
-    uint32_t *cells;
-    char *nodename;
+    uint64_t addr, size;
+    unsigned long clint_addr;
+    int cpu, socket;
+    MachineState *mc = MACHINE(s);
+    uint32_t *clint_cells;
+    uint32_t cpu_phandle, intc_phandle, phandle = 1;
+    char *name, *mem_name, *clint_name, *clust_name;
+    char *core_name, *cpu_name, *intc_name;
 
     fdt = s->fdt = create_device_tree(&s->fdt_size);
     if (!fdt) {
@@ -88,68 +94,91 @@ static void create_fdt(SpikeState *s, const struct MemmapEntry *memmap,
     qemu_fdt_setprop_cell(fdt, "/soc", "#size-cells", 0x2);
     qemu_fdt_setprop_cell(fdt, "/soc", "#address-cells", 0x2);
 
-    nodename = g_strdup_printf("/memory@%lx",
-        (long)memmap[SPIKE_DRAM].base);
-    qemu_fdt_add_subnode(fdt, nodename);
-    qemu_fdt_setprop_cells(fdt, nodename, "reg",
-        memmap[SPIKE_DRAM].base >> 32, memmap[SPIKE_DRAM].base,
-        mem_size >> 32, mem_size);
-    qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
-    g_free(nodename);
-
     qemu_fdt_add_subnode(fdt, "/cpus");
     qemu_fdt_setprop_cell(fdt, "/cpus", "timebase-frequency",
         SIFIVE_CLINT_TIMEBASE_FREQ);
     qemu_fdt_setprop_cell(fdt, "/cpus", "#size-cells", 0x0);
     qemu_fdt_setprop_cell(fdt, "/cpus", "#address-cells", 0x1);
+    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map");
+
+    for (socket = (riscv_socket_count(mc) - 1); socket >= 0; socket--) {
+        clust_name = g_strdup_printf("/cpus/cpu-map/cluster%d", socket);
+        qemu_fdt_add_subnode(fdt, clust_name);
+
+        clint_cells =  g_new0(uint32_t, s->soc[socket].num_harts * 4);
 
-    for (cpu = s->soc.num_harts - 1; cpu >= 0; cpu--) {
-        nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
-        char *intc = g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
-        char *isa = riscv_isa_string(&s->soc.harts[cpu]);
-        qemu_fdt_add_subnode(fdt, nodename);
+        for (cpu = s->soc[socket].num_harts - 1; cpu >= 0; cpu--) {
+            cpu_phandle = phandle++;
+
+            cpu_name = g_strdup_printf("/cpus/cpu@%d",
+                s->soc[socket].hartid_base + cpu);
+            qemu_fdt_add_subnode(fdt, cpu_name);
 #if defined(TARGET_RISCV32)
-        qemu_fdt_setprop_string(fdt, nodename, "mmu-type", "riscv,sv32");
+            qemu_fdt_setprop_string(fdt, cpu_name, "mmu-type", "riscv,sv32");
 #else
-        qemu_fdt_setprop_string(fdt, nodename, "mmu-type", "riscv,sv48");
+            qemu_fdt_setprop_string(fdt, cpu_name, "mmu-type", "riscv,sv48");
 #endif
-        qemu_fdt_setprop_string(fdt, nodename, "riscv,isa", isa);
-        qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv");
-        qemu_fdt_setprop_string(fdt, nodename, "status", "okay");
-        qemu_fdt_setprop_cell(fdt, nodename, "reg", cpu);
-        qemu_fdt_setprop_string(fdt, nodename, "device_type", "cpu");
-        qemu_fdt_add_subnode(fdt, intc);
-        qemu_fdt_setprop_cell(fdt, intc, "phandle", 1);
-        qemu_fdt_setprop_string(fdt, intc, "compatible", "riscv,cpu-intc");
-        qemu_fdt_setprop(fdt, intc, "interrupt-controller", NULL, 0);
-        qemu_fdt_setprop_cell(fdt, intc, "#interrupt-cells", 1);
-        g_free(isa);
-        g_free(intc);
-        g_free(nodename);
-    }
+            name = riscv_isa_string(&s->soc[socket].harts[cpu]);
+            qemu_fdt_setprop_string(fdt, cpu_name, "riscv,isa", name);
+            g_free(name);
+            qemu_fdt_setprop_string(fdt, cpu_name, "compatible", "riscv");
+            qemu_fdt_setprop_string(fdt, cpu_name, "status", "okay");
+            qemu_fdt_setprop_cell(fdt, cpu_name, "reg",
+                s->soc[socket].hartid_base + cpu);
+            qemu_fdt_setprop_string(fdt, cpu_name, "device_type", "cpu");
+            riscv_socket_fdt_write_id(mc, fdt, cpu_name, socket);
+            qemu_fdt_setprop_cell(fdt, cpu_name, "phandle", cpu_phandle);
+
+            intc_name = g_strdup_printf("%s/interrupt-controller", cpu_name);
+            qemu_fdt_add_subnode(fdt, intc_name);
+            intc_phandle = phandle++;
+            qemu_fdt_setprop_cell(fdt, intc_name, "phandle", intc_phandle);
+            qemu_fdt_setprop_string(fdt, intc_name, "compatible",
+                "riscv,cpu-intc");
+            qemu_fdt_setprop(fdt, intc_name, "interrupt-controller", NULL, 0);
+            qemu_fdt_setprop_cell(fdt, intc_name, "#interrupt-cells", 1);
+
+            clint_cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
+            clint_cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_SOFT);
+            clint_cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
+            clint_cells[cpu * 4 + 3] = cpu_to_be32(IRQ_M_TIMER);
+
+            core_name = g_strdup_printf("%s/core%d", clust_name, cpu);
+            qemu_fdt_add_subnode(fdt, core_name);
+            qemu_fdt_setprop_cell(fdt, core_name, "cpu", cpu_phandle);
+
+            g_free(core_name);
+            g_free(intc_name);
+            g_free(cpu_name);
+        }
 
-    cells =  g_new0(uint32_t, s->soc.num_harts * 4);
-    for (cpu = 0; cpu < s->soc.num_harts; cpu++) {
-        nodename =
-            g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
-        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, nodename);
-        cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
-        cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_SOFT);
-        cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
-        cells[cpu * 4 + 3] = cpu_to_be32(IRQ_M_TIMER);
-        g_free(nodename);
+        addr = memmap[SPIKE_DRAM].base + riscv_socket_mem_offset(mc, socket);
+        size = riscv_socket_mem_size(mc, socket);
+        mem_name = g_strdup_printf("/memory@%lx", (long)addr);
+        qemu_fdt_add_subnode(fdt, mem_name);
+        qemu_fdt_setprop_cells(fdt, mem_name, "reg",
+            addr >> 32, addr, size >> 32, size);
+        qemu_fdt_setprop_string(fdt, mem_name, "device_type", "memory");
+        riscv_socket_fdt_write_id(mc, fdt, mem_name, socket);
+        g_free(mem_name);
+
+        clint_addr = memmap[SPIKE_CLINT].base +
+            (memmap[SPIKE_CLINT].size * socket);
+        clint_name = g_strdup_printf("/soc/clint@%lx", clint_addr);
+        qemu_fdt_add_subnode(fdt, clint_name);
+        qemu_fdt_setprop_string(fdt, clint_name, "compatible", "riscv,clint0");
+        qemu_fdt_setprop_cells(fdt, clint_name, "reg",
+            0x0, clint_addr, 0x0, memmap[SPIKE_CLINT].size);
+        qemu_fdt_setprop(fdt, clint_name, "interrupts-extended",
+            clint_cells, s->soc[socket].num_harts * sizeof(uint32_t) * 4);
+        riscv_socket_fdt_write_id(mc, fdt, clint_name, socket);
+
+        g_free(clint_name);
+        g_free(clint_cells);
+        g_free(clust_name);
     }
-    nodename = g_strdup_printf("/soc/clint@%lx",
-        (long)memmap[SPIKE_CLINT].base);
-    qemu_fdt_add_subnode(fdt, nodename);
-    qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv,clint0");
-    qemu_fdt_setprop_cells(fdt, nodename, "reg",
-        0x0, memmap[SPIKE_CLINT].base,
-        0x0, memmap[SPIKE_CLINT].size);
-    qemu_fdt_setprop(fdt, nodename, "interrupts-extended",
-        cells, s->soc.num_harts * sizeof(uint32_t) * 4);
-    g_free(cells);
-    g_free(nodename);
+
+    riscv_socket_fdt_write_distance_matrix(mc, fdt);
 
     if (cmdline) {
         qemu_fdt_add_subnode(fdt, "/chosen");
@@ -160,23 +189,58 @@ static void create_fdt(SpikeState *s, const struct MemmapEntry *memmap,
 static void spike_board_init(MachineState *machine)
 {
     const struct MemmapEntry *memmap = spike_memmap;
-
-    SpikeState *s = g_new0(SpikeState, 1);
+    SpikeState *s = SPIKE_MACHINE(machine);
     MemoryRegion *system_memory = get_system_memory();
     MemoryRegion *main_mem = g_new(MemoryRegion, 1);
     MemoryRegion *mask_rom = g_new(MemoryRegion, 1);
-    int i;
-    unsigned int smp_cpus = machine->smp.cpus;
+    char *soc_name;
+    int i, base_hartid, hart_count;
 
-    /* Initialize SOC */
-    object_initialize_child(OBJECT(machine), "soc", &s->soc, sizeof(s->soc),
-                            TYPE_RISCV_HART_ARRAY, &error_abort, NULL);
-    object_property_set_str(OBJECT(&s->soc), machine->cpu_type, "cpu-type",
-                            &error_abort);
-    object_property_set_int(OBJECT(&s->soc), smp_cpus, "num-harts",
-                            &error_abort);
-    object_property_set_bool(OBJECT(&s->soc), true, "realized",
-                            &error_abort);
+    /* Check socket count limit */
+    if (SPIKE_SOCKETS_MAX < riscv_socket_count(machine)) {
+        error_report("number of sockets/nodes should be less than %d",
+            SPIKE_SOCKETS_MAX);
+        exit(1);
+    }
+
+    /* Initialize sockets */
+    for (i = 0; i < riscv_socket_count(machine); i++) {
+        if (!riscv_socket_check_hartids(machine, i)) {
+            error_report("discontinuous hartids in socket%d", i);
+            exit(1);
+        }
+
+        base_hartid = riscv_socket_first_hartid(machine, i);
+        if (base_hartid < 0) {
+            error_report("can't find hartid base for socket%d", i);
+            exit(1);
+        }
+
+        hart_count = riscv_socket_hart_count(machine, i);
+        if (hart_count < 0) {
+            error_report("can't find hart count for socket%d", i);
+            exit(1);
+        }
+
+        soc_name = g_strdup_printf("soc%d", i);
+        object_initialize_child(OBJECT(machine), soc_name, &s->soc[i],
+            sizeof(s->soc[i]), TYPE_RISCV_HART_ARRAY, &error_abort, NULL);
+        g_free(soc_name);
+        object_property_set_str(OBJECT(&s->soc[i]),
+            machine->cpu_type, "cpu-type", &error_abort);
+        object_property_set_int(OBJECT(&s->soc[i]),
+            base_hartid, "hartid-base", &error_abort);
+        object_property_set_int(OBJECT(&s->soc[i]),
+            hart_count, "num-harts", &error_abort);
+        object_property_set_bool(OBJECT(&s->soc[i]),
+            true, "realized", &error_abort);
+
+        /* Core Local Interruptor (timer and IPI) for each socket */
+        sifive_clint_create(
+            memmap[SPIKE_CLINT].base + i * memmap[SPIKE_CLINT].size,
+            memmap[SPIKE_CLINT].size, base_hartid, hart_count,
+            SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE, false);
+    }
 
     /* register system main memory (actual RAM) */
     memory_region_init_ram(main_mem, NULL, "riscv.spike.ram",
@@ -249,12 +313,8 @@ static void spike_board_init(MachineState *machine)
                           &address_space_memory);
 
     /* initialize HTIF using symbols found in load_kernel */
-    htif_mm_init(system_memory, mask_rom, &s->soc.harts[0].env, serial_hd(0));
-
-    /* Core Local Interruptor (timer and IPI) */
-    sifive_clint_create(memmap[SPIKE_CLINT].base, memmap[SPIKE_CLINT].size,
-        0, smp_cpus, SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE,
-        false);
+    htif_mm_init(system_memory, mask_rom,
+                 &s->soc[0].harts[0].env, serial_hd(0));
 }
 
 static void spike_v1_10_0_board_init(MachineState *machine)
@@ -275,13 +335,14 @@ static void spike_v1_10_0_board_init(MachineState *machine)
     }
 
     /* Initialize SOC */
-    object_initialize_child(OBJECT(machine), "soc", &s->soc, sizeof(s->soc),
+    object_initialize_child(OBJECT(machine), "soc",
+                            &s->soc[0], sizeof(s->soc[0]),
                             TYPE_RISCV_HART_ARRAY, &error_abort, NULL);
-    object_property_set_str(OBJECT(&s->soc), SPIKE_V1_10_0_CPU, "cpu-type",
+    object_property_set_str(OBJECT(&s->soc[0]), SPIKE_V1_10_0_CPU, "cpu-type",
                             &error_abort);
-    object_property_set_int(OBJECT(&s->soc), smp_cpus, "num-harts",
+    object_property_set_int(OBJECT(&s->soc[0]), smp_cpus, "num-harts",
                             &error_abort);
-    object_property_set_bool(OBJECT(&s->soc), true, "realized",
+    object_property_set_bool(OBJECT(&s->soc[0]), true, "realized",
                             &error_abort);
 
     /* register system main memory (actual RAM) */
@@ -339,7 +400,8 @@ static void spike_v1_10_0_board_init(MachineState *machine)
                           &address_space_memory);
 
     /* initialize HTIF using symbols found in load_kernel */
-    htif_mm_init(system_memory, mask_rom, &s->soc.harts[0].env, serial_hd(0));
+    htif_mm_init(system_memory, mask_rom,
+                 &s->soc[0].harts[0].env, serial_hd(0));
 
     /* Core Local Interruptor (timer and IPI) */
     sifive_clint_create(memmap[SPIKE_CLINT].base, memmap[SPIKE_CLINT].size,
@@ -365,13 +427,14 @@ static void spike_v1_09_1_board_init(MachineState *machine)
     }
 
     /* Initialize SOC */
-    object_initialize_child(OBJECT(machine), "soc", &s->soc, sizeof(s->soc),
+    object_initialize_child(OBJECT(machine), "soc",
+                            &s->soc[0], sizeof(s->soc[0]),
                             TYPE_RISCV_HART_ARRAY, &error_abort, NULL);
-    object_property_set_str(OBJECT(&s->soc), SPIKE_V1_09_1_CPU, "cpu-type",
+    object_property_set_str(OBJECT(&s->soc[0]), SPIKE_V1_09_1_CPU, "cpu-type",
                             &error_abort);
-    object_property_set_int(OBJECT(&s->soc), smp_cpus, "num-harts",
+    object_property_set_int(OBJECT(&s->soc[0]), smp_cpus, "num-harts",
                             &error_abort);
-    object_property_set_bool(OBJECT(&s->soc), true, "realized",
+    object_property_set_bool(OBJECT(&s->soc[0]), true, "realized",
                             &error_abort);
 
     /* register system main memory (actual RAM) */
@@ -425,7 +488,7 @@ static void spike_v1_09_1_board_init(MachineState *machine)
         "};\n";
 
     /* build config string with supplied memory size */
-    char *isa = riscv_isa_string(&s->soc.harts[0]);
+    char *isa = riscv_isa_string(&s->soc[0].harts[0]);
     char *config_string = g_strdup_printf(config_string_tmpl,
         (uint64_t)memmap[SPIKE_CLINT].base + SIFIVE_TIME_BASE,
         (uint64_t)memmap[SPIKE_DRAM].base,
@@ -448,7 +511,8 @@ static void spike_v1_09_1_board_init(MachineState *machine)
                           &address_space_memory);
 
     /* initialize HTIF using symbols found in load_kernel */
-    htif_mm_init(system_memory, mask_rom, &s->soc.harts[0].env, serial_hd(0));
+    htif_mm_init(system_memory, mask_rom,
+                 &s->soc[0].harts[0].env, serial_hd(0));
 
     /* Core Local Interruptor (timer and IPI) */
     sifive_clint_create(memmap[SPIKE_CLINT].base, memmap[SPIKE_CLINT].size,
@@ -472,15 +536,39 @@ static void spike_v1_10_0_machine_init(MachineClass *mc)
     mc->max_cpus = 1;
 }
 
-static void spike_machine_init(MachineClass *mc)
+DEFINE_MACHINE("spike_v1.9.1", spike_v1_09_1_machine_init)
+DEFINE_MACHINE("spike_v1.10", spike_v1_10_0_machine_init)
+
+static void spike_machine_instance_init(Object *obj)
+{
+}
+
+static void spike_machine_class_init(ObjectClass *oc, void *data)
 {
-    mc->desc = "RISC-V Spike Board";
+    MachineClass *mc = MACHINE_CLASS(oc);
+
+    mc->desc = "RISC-V Spike board";
     mc->init = spike_board_init;
-    mc->max_cpus = 8;
+    mc->max_cpus = SPIKE_CPUS_MAX;
     mc->is_default = true;
     mc->default_cpu_type = SPIKE_V1_10_0_CPU;
+    mc->possible_cpu_arch_ids = riscv_numa_possible_cpu_arch_ids;
+    mc->cpu_index_to_instance_props = riscv_numa_cpu_index_to_props;
+    mc->get_default_cpu_node_id = riscv_numa_get_default_cpu_node_id;
+    mc->numa_mem_supported = true;
 }
 
-DEFINE_MACHINE("spike_v1.9.1", spike_v1_09_1_machine_init)
-DEFINE_MACHINE("spike_v1.10", spike_v1_10_0_machine_init)
-DEFINE_MACHINE("spike", spike_machine_init)
+static const TypeInfo spike_machine_typeinfo = {
+    .name       = MACHINE_TYPE_NAME("spike"),
+    .parent     = TYPE_MACHINE,
+    .class_init = spike_machine_class_init,
+    .instance_init = spike_machine_instance_init,
+    .instance_size = sizeof(SpikeState),
+};
+
+static void spike_machine_init_register_types(void)
+{
+    type_register_static(&spike_machine_typeinfo);
+}
+
+type_init(spike_machine_init_register_types)
diff --git a/include/hw/riscv/spike.h b/include/hw/riscv/spike.h
index dc770421bc..c55fdf4d24 100644
--- a/include/hw/riscv/spike.h
+++ b/include/hw/riscv/spike.h
@@ -22,12 +22,19 @@
 #include "hw/riscv/riscv_hart.h"
 #include "hw/sysbus.h"
 
+#define SPIKE_CPUS_MAX 8
+#define SPIKE_SOCKETS_MAX 8
+
+#define TYPE_SPIKE_MACHINE MACHINE_TYPE_NAME("spike")
+#define SPIKE_MACHINE(obj) \
+    OBJECT_CHECK(SpikeState, (obj), TYPE_SPIKE_MACHINE)
+
 typedef struct {
     /*< private >*/
-    SysBusDevice parent_obj;
+    MachineState parent;
 
     /*< public >*/
-    RISCVHartArrayState soc;
+    RISCVHartArrayState soc[SPIKE_SOCKETS_MAX];
     void *fdt;
     int fdt_size;
 } SpikeState;
-- 
2.25.1
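Two things change in the rewritten create_fdt() loop above. It now allocates phandles itself, starting at 1 and taking one for each cpu node and one for each interrupt controller. The old code gave every interrupt controller the same phandle value, 1. The loop also builds each socket's CLINT "interrupts-extended" array as it goes. The following is a standalone sketch of that cell layout, not QEMU code. to_be32()/from_be32() are hypothetical stand-ins for QEMU's cpu_to_be32(). The values 3 and 7 are the machine software/timer interrupt bit positions from the RISC-V privileged spec, which QEMU names IRQ_M_SOFT and IRQ_M_TIMER.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Machine-level interrupt numbers (mip bit positions) from the RISC-V
 * privileged spec; QEMU calls these IRQ_M_SOFT and IRQ_M_TIMER. */
#define IRQ_M_SOFT  3
#define IRQ_M_TIMER 7

/* Hypothetical stand-in for QEMU's cpu_to_be32(): store v big-endian. */
static uint32_t to_be32(uint32_t v)
{
    const uint8_t b[4] = { (uint8_t)(v >> 24), (uint8_t)(v >> 16),
                           (uint8_t)(v >> 8), (uint8_t)v };
    uint32_t out;
    memcpy(&out, b, sizeof(out));
    return out;
}

/* Inverse of to_be32(), used only to inspect the generated cells. */
static uint32_t from_be32(uint32_t be)
{
    uint8_t b[4];
    memcpy(b, &be, sizeof(b));
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Fill one socket's CLINT "interrupts-extended" cells, visiting harts in
 * the same descending order as create_fdt(). Each hart consumes two
 * phandles: first one for its cpu node (referenced from cpu-map), then
 * one for its interrupt controller. Returns the number of cells written. */
static size_t fill_clint_cells(uint32_t *cells, int num_harts,
                               uint32_t *next_phandle)
{
    assert(num_harts >= 0);
    for (int cpu = num_harts - 1; cpu >= 0; cpu--) {
        uint32_t cpu_phandle = (*next_phandle)++;
        uint32_t intc_phandle = (*next_phandle)++;
        (void)cpu_phandle; /* would go into /cpus/cpu-map/clusterN/coreM */
        cells[cpu * 4 + 0] = to_be32(intc_phandle);
        cells[cpu * 4 + 1] = to_be32(IRQ_M_SOFT);
        cells[cpu * 4 + 2] = to_be32(intc_phandle);
        cells[cpu * 4 + 3] = to_be32(IRQ_M_TIMER);
    }
    return (size_t)num_harts * 4;
}
```

Take two 2-hart sockets and start the phandle counter at 1. Sockets are also walked in descending order, so socket 1 is filled first. Its hart 1 gets interrupt-controller phandle 2 and its hart 0 gets 4. Socket 0's harts then get 6 and 8.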

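Each socket now gets its own /memory@<addr> node. The address is the DRAM base plus riscv_socket_mem_offset(). The "reg" property is written as high/low 32-bit cells because #address-cells and #size-cells are 2. The sketch below assumes that riscv_socket_mem_offset() returns the sum of the RAM sizes of all lower-numbered sockets. socket_mem_offset() is a hypothetical helper, and 0x80000000 is assumed to be the spike DRAM base.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for riscv_socket_mem_offset(): assumes each
 * socket's RAM starts right after the RAM of all lower-numbered sockets. */
static uint64_t socket_mem_offset(const uint64_t *node_mem, int socket)
{
    assert(socket >= 0);
    uint64_t offset = 0;
    for (int i = 0; i < socket; i++) {
        offset += node_mem[i];
    }
    return offset;
}

/* Encode a 64-bit address and size as the four 32-bit "reg" cells used
 * when #address-cells = #size-cells = 2 (high word first), matching the
 * "addr >> 32, addr, size >> 32, size" arguments in create_fdt(). */
static void reg_cells(uint64_t addr, uint64_t size, uint32_t out[4])
{
    out[0] = (uint32_t)(addr >> 32);
    out[1] = (uint32_t)addr;
    out[2] = (uint32_t)(size >> 32);
    out[3] = (uint32_t)size;
}
```

Suppose there are two 2 GiB nodes and DRAM starts at 0x80000000. Socket 1's memory then starts at 0x100000000, which produces a node named /memory@100000000 with reg cells <1 0 0 0x80000000>.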

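Both spike_board_init() and create_fdt() place socket i's CLINT at base + i * size. The worked example below assumes that spike_memmap[SPIKE_CLINT] uses the same { 0x2000000, 0x10000 } entry as the virt memmap in patch 5. It also uses SPIKE_SOCKETS_MAX = 8 from the header change. The macro names are local to the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CLINT_BASE  0x2000000UL /* assumed spike_memmap[SPIKE_CLINT].base */
#define CLINT_SIZE  0x10000UL   /* assumed spike_memmap[SPIKE_CLINT].size */
#define SOCKETS_MAX 8           /* SPIKE_SOCKETS_MAX */

/* MMIO base of the CLINT belonging to a given socket. */
static unsigned long clint_base(int socket)
{
    assert(socket >= 0 && socket < SOCKETS_MAX);
    return CLINT_BASE + CLINT_SIZE * (unsigned long)socket;
}

/* Same node path format as create_fdt(): "/soc/clint@<hex address>". */
static void clint_node_name(int socket, char *buf, size_t len)
{
    snprintf(buf, len, "/soc/clint@%lx", clint_base(socket));
}
```

Under these assumptions, eight sockets occupy 0x2000000 through 0x207ffff, and socket 1's node is /soc/clint@2010000.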

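The header change makes SpikeState embed MachineState as its first member. The new TypeInfo sets .instance_size = sizeof(SpikeState), which is why spike_board_init() can write SPIKE_MACHINE(machine) instead of allocating a separate SpikeState with g_new0(). Here is a hypothetical plain-C miniature of that first-member embedding pattern. It is not QOM. Machine, Spike, machine_new() and spike_cast() are illustrative names. Also, unlike OBJECT_CHECK, the cast here returns NULL on a mismatch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *type_name;  /* runtime type, like the QOM type name */
} Machine;

typedef struct {
    Machine parent;         /* must be first, like "MachineState parent" */
    int num_sockets;        /* stands in for RISCVHartArrayState soc[] */
} Spike;

/* Allocate instance_size bytes; passing sizeof(Spike) plays the role of
 * ".instance_size = sizeof(SpikeState)" in the TypeInfo. */
static Machine *machine_new(size_t instance_size, const char *type_name)
{
    assert(instance_size >= sizeof(Machine));
    Machine *m = calloc(1, instance_size);
    if (m != NULL) {
        m->type_name = type_name;
    }
    return m;
}

/* Checked downcast in the spirit of SPIKE_MACHINE(): valid only because
 * Spike begins with a Machine. Returns NULL on a type mismatch. */
static Spike *spike_cast(Machine *m)
{
    if (m == NULL || strcmp(m->type_name, "spike-machine") != 0) {
        return NULL;
    }
    return (Spike *)m;
}
```

The string "spike-machine" is what MACHINE_TYPE_NAME("spike") expands to. If the object had been allocated with only sizeof(Machine), writing num_sockets through the cast pointer would overrun the allocation. That is the bug the instance_size field prevents.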


* [PATCH v5 5/5] hw/riscv: virt: Allow creating multiple NUMA sockets
  2020-05-29 11:46 ` Anup Patel
@ 2020-05-29 11:46   ` Anup Patel
  -1 siblings, 0 replies; 28+ messages in thread
From: Anup Patel @ 2020-05-29 11:46 UTC (permalink / raw)
  To: Peter Maydell, Palmer Dabbelt, Alistair Francis, Sagar Karandikar
  Cc: Atish Patra, Anup Patel, qemu-riscv, qemu-devel, Anup Patel

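We extend the RISC-V virt machine to allow creating a multi-socket
machine. Each RISC-V virt machine socket is a NUMA node with its own
set of HARTs, memory instance, CLINT instance, and PLIC instance. All
other devices are shared between sockets. We also update the generated
device tree accordingly: CPUs are grouped into one cpu-map cluster per
socket, each socket gets its own memory, CLINT, and PLIC nodes tagged
with its NUMA node ID, and a NUMA distance matrix is emitted.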

By default, NUMA multi-socket support is disabled for the RISC-V virt
machine. To enable it, use QEMU's "-numa" command-line options.

Example1: For two NUMA nodes with 2 CPUs each, append the following
to the command-line options: "-smp 4 -numa node -numa node"

Example2: For two NUMA nodes with 1 and 3 CPUs respectively, append
the following to the command-line options:
"-smp 4 -numa node -numa node -numa cpu,node-id=0,core-id=0 \
-numa cpu,node-id=1,core-id=1 -numa cpu,node-id=1,core-id=2 \
-numa cpu,node-id=1,core-id=3"

The maximum number of sockets in a RISC-V virt machine is 8, but
this limit may change in the future.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
---
 hw/riscv/virt.c         | 530 +++++++++++++++++++++++-----------------
 include/hw/riscv/virt.h |   9 +-
 2 files changed, 308 insertions(+), 231 deletions(-)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 421815081d..2863b42cea 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -35,6 +35,7 @@
 #include "hw/riscv/sifive_test.h"
 #include "hw/riscv/virt.h"
 #include "hw/riscv/boot.h"
+#include "hw/riscv/numa.h"
 #include "chardev/char.h"
 #include "sysemu/arch_init.h"
 #include "sysemu/device_tree.h"
@@ -60,7 +61,7 @@ static const struct MemmapEntry {
     [VIRT_TEST] =        {   0x100000,        0x1000 },
     [VIRT_RTC] =         {   0x101000,        0x1000 },
     [VIRT_CLINT] =       {  0x2000000,       0x10000 },
-    [VIRT_PLIC] =        {  0xc000000,     0x4000000 },
+    [VIRT_PLIC] =        {  0xc000000, VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2) },
     [VIRT_UART0] =       { 0x10000000,         0x100 },
     [VIRT_VIRTIO] =      { 0x10001000,        0x1000 },
     [VIRT_FLASH] =       { 0x20000000,     0x4000000 },
@@ -182,10 +183,17 @@ static void create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
     uint64_t mem_size, const char *cmdline)
 {
     void *fdt;
-    int cpu, i;
-    uint32_t *cells;
-    char *nodename;
-    uint32_t plic_phandle, test_phandle, phandle = 1;
+    int i, cpu, socket;
+    MachineState *mc = MACHINE(s);
+    uint64_t addr, size;
+    uint32_t *clint_cells, *plic_cells;
+    unsigned long clint_addr, plic_addr;
+    uint32_t plic_phandle[MAX_NODES];
+    uint32_t cpu_phandle, intc_phandle, test_phandle;
+    uint32_t phandle = 1, plic_mmio_phandle = 1;
+    uint32_t plic_pcie_phandle = 1, plic_virtio_phandle = 1;
+    char *mem_name, *cpu_name, *core_name, *intc_name;
+    char *name, *clint_name, *plic_name, *clust_name;
     hwaddr flashsize = virt_memmap[VIRT_FLASH].size / 2;
     hwaddr flashbase = virt_memmap[VIRT_FLASH].base;
 
@@ -206,231 +214,238 @@ static void create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
     qemu_fdt_setprop_cell(fdt, "/soc", "#size-cells", 0x2);
     qemu_fdt_setprop_cell(fdt, "/soc", "#address-cells", 0x2);
 
-    nodename = g_strdup_printf("/memory@%lx",
-        (long)memmap[VIRT_DRAM].base);
-    qemu_fdt_add_subnode(fdt, nodename);
-    qemu_fdt_setprop_cells(fdt, nodename, "reg",
-        memmap[VIRT_DRAM].base >> 32, memmap[VIRT_DRAM].base,
-        mem_size >> 32, mem_size);
-    qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
-    g_free(nodename);
-
     qemu_fdt_add_subnode(fdt, "/cpus");
     qemu_fdt_setprop_cell(fdt, "/cpus", "timebase-frequency",
                           SIFIVE_CLINT_TIMEBASE_FREQ);
     qemu_fdt_setprop_cell(fdt, "/cpus", "#size-cells", 0x0);
     qemu_fdt_setprop_cell(fdt, "/cpus", "#address-cells", 0x1);
+    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map");
+
+    for (socket = (riscv_socket_count(mc) - 1); socket >= 0; socket--) {
+        clust_name = g_strdup_printf("/cpus/cpu-map/cluster%d", socket);
+        qemu_fdt_add_subnode(fdt, clust_name);
+
+        plic_cells = g_new0(uint32_t, s->soc[socket].num_harts * 4);
+        clint_cells = g_new0(uint32_t, s->soc[socket].num_harts * 4);
+
+        for (cpu = s->soc[socket].num_harts - 1; cpu >= 0; cpu--) {
+            cpu_phandle = phandle++;
 
-    for (cpu = s->soc.num_harts - 1; cpu >= 0; cpu--) {
-        int cpu_phandle = phandle++;
-        int intc_phandle;
-        nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
-        char *intc = g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
-        char *isa = riscv_isa_string(&s->soc.harts[cpu]);
-        qemu_fdt_add_subnode(fdt, nodename);
+            cpu_name = g_strdup_printf("/cpus/cpu@%d",
+                s->soc[socket].hartid_base + cpu);
+            qemu_fdt_add_subnode(fdt, cpu_name);
 #if defined(TARGET_RISCV32)
-        qemu_fdt_setprop_string(fdt, nodename, "mmu-type", "riscv,sv32");
+            qemu_fdt_setprop_string(fdt, cpu_name, "mmu-type", "riscv,sv32");
 #else
-        qemu_fdt_setprop_string(fdt, nodename, "mmu-type", "riscv,sv48");
+            qemu_fdt_setprop_string(fdt, cpu_name, "mmu-type", "riscv,sv48");
 #endif
-        qemu_fdt_setprop_string(fdt, nodename, "riscv,isa", isa);
-        qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv");
-        qemu_fdt_setprop_string(fdt, nodename, "status", "okay");
-        qemu_fdt_setprop_cell(fdt, nodename, "reg", cpu);
-        qemu_fdt_setprop_string(fdt, nodename, "device_type", "cpu");
-        qemu_fdt_setprop_cell(fdt, nodename, "phandle", cpu_phandle);
-        intc_phandle = phandle++;
-        qemu_fdt_add_subnode(fdt, intc);
-        qemu_fdt_setprop_cell(fdt, intc, "phandle", intc_phandle);
-        qemu_fdt_setprop_string(fdt, intc, "compatible", "riscv,cpu-intc");
-        qemu_fdt_setprop(fdt, intc, "interrupt-controller", NULL, 0);
-        qemu_fdt_setprop_cell(fdt, intc, "#interrupt-cells", 1);
-        g_free(isa);
-        g_free(intc);
-        g_free(nodename);
-    }
+            name = riscv_isa_string(&s->soc[socket].harts[cpu]);
+            qemu_fdt_setprop_string(fdt, cpu_name, "riscv,isa", name);
+            g_free(name);
+            qemu_fdt_setprop_string(fdt, cpu_name, "compatible", "riscv");
+            qemu_fdt_setprop_string(fdt, cpu_name, "status", "okay");
+            qemu_fdt_setprop_cell(fdt, cpu_name, "reg",
+                s->soc[socket].hartid_base + cpu);
+            qemu_fdt_setprop_string(fdt, cpu_name, "device_type", "cpu");
+            riscv_socket_fdt_write_id(mc, fdt, cpu_name, socket);
+            qemu_fdt_setprop_cell(fdt, cpu_name, "phandle", cpu_phandle);
+
+            intc_name = g_strdup_printf("%s/interrupt-controller", cpu_name);
+            qemu_fdt_add_subnode(fdt, intc_name);
+            intc_phandle = phandle++;
+            qemu_fdt_setprop_cell(fdt, intc_name, "phandle", intc_phandle);
+            qemu_fdt_setprop_string(fdt, intc_name, "compatible",
+                "riscv,cpu-intc");
+            qemu_fdt_setprop(fdt, intc_name, "interrupt-controller", NULL, 0);
+            qemu_fdt_setprop_cell(fdt, intc_name, "#interrupt-cells", 1);
+
+            clint_cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
+            clint_cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_SOFT);
+            clint_cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
+            clint_cells[cpu * 4 + 3] = cpu_to_be32(IRQ_M_TIMER);
+
+            plic_cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
+            plic_cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_EXT);
+            plic_cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
+            plic_cells[cpu * 4 + 3] = cpu_to_be32(IRQ_S_EXT);
+
+            core_name = g_strdup_printf("%s/core%d", clust_name, cpu);
+            qemu_fdt_add_subnode(fdt, core_name);
+            qemu_fdt_setprop_cell(fdt, core_name, "cpu", cpu_phandle);
+
+            g_free(core_name);
+            g_free(intc_name);
+            g_free(cpu_name);
+        }
 
-    /* Add cpu-topology node */
-    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map");
-    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map/cluster0");
-    for (cpu = s->soc.num_harts - 1; cpu >= 0; cpu--) {
-        char *core_nodename = g_strdup_printf("/cpus/cpu-map/cluster0/core%d",
-                                              cpu);
-        char *cpu_nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
-        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, cpu_nodename);
-        qemu_fdt_add_subnode(fdt, core_nodename);
-        qemu_fdt_setprop_cell(fdt, core_nodename, "cpu", intc_phandle);
-        g_free(core_nodename);
-        g_free(cpu_nodename);
+        addr = memmap[VIRT_DRAM].base + riscv_socket_mem_offset(mc, socket);
+        size = riscv_socket_mem_size(mc, socket);
+        mem_name = g_strdup_printf("/memory@%lx", (long)addr);
+        qemu_fdt_add_subnode(fdt, mem_name);
+        qemu_fdt_setprop_cells(fdt, mem_name, "reg",
+            addr >> 32, addr, size >> 32, size);
+        qemu_fdt_setprop_string(fdt, mem_name, "device_type", "memory");
+        riscv_socket_fdt_write_id(mc, fdt, mem_name, socket);
+        g_free(mem_name);
+
+        clint_addr = memmap[VIRT_CLINT].base +
+            (memmap[VIRT_CLINT].size * socket);
+        clint_name = g_strdup_printf("/soc/clint@%lx", clint_addr);
+        qemu_fdt_add_subnode(fdt, clint_name);
+        qemu_fdt_setprop_string(fdt, clint_name, "compatible", "riscv,clint0");
+        qemu_fdt_setprop_cells(fdt, clint_name, "reg",
+            0x0, clint_addr, 0x0, memmap[VIRT_CLINT].size);
+        qemu_fdt_setprop(fdt, clint_name, "interrupts-extended",
+            clint_cells, s->soc[socket].num_harts * sizeof(uint32_t) * 4);
+        riscv_socket_fdt_write_id(mc, fdt, clint_name, socket);
+        g_free(clint_name);
+
+        plic_phandle[socket] = phandle++;
+        plic_addr = memmap[VIRT_PLIC].base + (memmap[VIRT_PLIC].size * socket);
+        plic_name = g_strdup_printf("/soc/plic@%lx", plic_addr);
+        qemu_fdt_add_subnode(fdt, plic_name);
+        qemu_fdt_setprop_cell(fdt, plic_name,
+            "#address-cells", FDT_PLIC_ADDR_CELLS);
+        qemu_fdt_setprop_cell(fdt, plic_name,
+            "#interrupt-cells", FDT_PLIC_INT_CELLS);
+        qemu_fdt_setprop_string(fdt, plic_name, "compatible", "riscv,plic0");
+        qemu_fdt_setprop(fdt, plic_name, "interrupt-controller", NULL, 0);
+        qemu_fdt_setprop(fdt, plic_name, "interrupts-extended",
+            plic_cells, s->soc[socket].num_harts * sizeof(uint32_t) * 4);
+        qemu_fdt_setprop_cells(fdt, plic_name, "reg",
+            0x0, plic_addr, 0x0, memmap[VIRT_PLIC].size);
+        qemu_fdt_setprop_cell(fdt, plic_name, "riscv,ndev", VIRTIO_NDEV);
+        riscv_socket_fdt_write_id(mc, fdt, plic_name, socket);
+        qemu_fdt_setprop_cell(fdt, plic_name, "phandle", plic_phandle[socket]);
+        g_free(plic_name);
+
+        g_free(clint_cells);
+        g_free(plic_cells);
+        g_free(clust_name);
     }
 
-    cells =  g_new0(uint32_t, s->soc.num_harts * 4);
-    for (cpu = 0; cpu < s->soc.num_harts; cpu++) {
-        nodename =
-            g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
-        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, nodename);
-        cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
-        cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_SOFT);
-        cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
-        cells[cpu * 4 + 3] = cpu_to_be32(IRQ_M_TIMER);
-        g_free(nodename);
-    }
-    nodename = g_strdup_printf("/soc/clint@%lx",
-        (long)memmap[VIRT_CLINT].base);
-    qemu_fdt_add_subnode(fdt, nodename);
-    qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv,clint0");
-    qemu_fdt_setprop_cells(fdt, nodename, "reg",
-        0x0, memmap[VIRT_CLINT].base,
-        0x0, memmap[VIRT_CLINT].size);
-    qemu_fdt_setprop(fdt, nodename, "interrupts-extended",
-        cells, s->soc.num_harts * sizeof(uint32_t) * 4);
-    g_free(cells);
-    g_free(nodename);
-
-    plic_phandle = phandle++;
-    cells =  g_new0(uint32_t, s->soc.num_harts * 4);
-    for (cpu = 0; cpu < s->soc.num_harts; cpu++) {
-        nodename =
-            g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
-        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, nodename);
-        cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
-        cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_EXT);
-        cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
-        cells[cpu * 4 + 3] = cpu_to_be32(IRQ_S_EXT);
-        g_free(nodename);
+    for (socket = 0; socket < riscv_socket_count(mc); socket++) {
+        if (socket == 0) {
+            plic_mmio_phandle = plic_phandle[socket];
+            plic_virtio_phandle = plic_phandle[socket];
+            plic_pcie_phandle = plic_phandle[socket];
+        }
+        if (socket == 1) {
+            plic_virtio_phandle = plic_phandle[socket];
+            plic_pcie_phandle = plic_phandle[socket];
+        }
+        if (socket == 2) {
+            plic_pcie_phandle = plic_phandle[socket];
+        }
     }
-    nodename = g_strdup_printf("/soc/interrupt-controller@%lx",
-        (long)memmap[VIRT_PLIC].base);
-    qemu_fdt_add_subnode(fdt, nodename);
-    qemu_fdt_setprop_cell(fdt, nodename, "#address-cells",
-                          FDT_PLIC_ADDR_CELLS);
-    qemu_fdt_setprop_cell(fdt, nodename, "#interrupt-cells",
-                          FDT_PLIC_INT_CELLS);
-    qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv,plic0");
-    qemu_fdt_setprop(fdt, nodename, "interrupt-controller", NULL, 0);
-    qemu_fdt_setprop(fdt, nodename, "interrupts-extended",
-        cells, s->soc.num_harts * sizeof(uint32_t) * 4);
-    qemu_fdt_setprop_cells(fdt, nodename, "reg",
-        0x0, memmap[VIRT_PLIC].base,
-        0x0, memmap[VIRT_PLIC].size);
-    qemu_fdt_setprop_cell(fdt, nodename, "riscv,ndev", VIRTIO_NDEV);
-    qemu_fdt_setprop_cell(fdt, nodename, "phandle", plic_phandle);
-    plic_phandle = qemu_fdt_get_phandle(fdt, nodename);
-    g_free(cells);
-    g_free(nodename);
+
+    riscv_socket_fdt_write_distance_matrix(mc, fdt);
 
     for (i = 0; i < VIRTIO_COUNT; i++) {
-        nodename = g_strdup_printf("/virtio_mmio@%lx",
+        name = g_strdup_printf("/soc/virtio_mmio@%lx",
             (long)(memmap[VIRT_VIRTIO].base + i * memmap[VIRT_VIRTIO].size));
-        qemu_fdt_add_subnode(fdt, nodename);
-        qemu_fdt_setprop_string(fdt, nodename, "compatible", "virtio,mmio");
-        qemu_fdt_setprop_cells(fdt, nodename, "reg",
+        qemu_fdt_add_subnode(fdt, name);
+        qemu_fdt_setprop_string(fdt, name, "compatible", "virtio,mmio");
+        qemu_fdt_setprop_cells(fdt, name, "reg",
             0x0, memmap[VIRT_VIRTIO].base + i * memmap[VIRT_VIRTIO].size,
             0x0, memmap[VIRT_VIRTIO].size);
-        qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
-        qemu_fdt_setprop_cell(fdt, nodename, "interrupts", VIRTIO_IRQ + i);
-        g_free(nodename);
+        qemu_fdt_setprop_cell(fdt, name, "interrupt-parent",
+            plic_virtio_phandle);
+        qemu_fdt_setprop_cell(fdt, name, "interrupts", VIRTIO_IRQ + i);
+        g_free(name);
     }
 
-    nodename = g_strdup_printf("/soc/pci@%lx",
+    name = g_strdup_printf("/soc/pci@%lx",
         (long) memmap[VIRT_PCIE_ECAM].base);
-    qemu_fdt_add_subnode(fdt, nodename);
-    qemu_fdt_setprop_cell(fdt, nodename, "#address-cells",
-                          FDT_PCI_ADDR_CELLS);
-    qemu_fdt_setprop_cell(fdt, nodename, "#interrupt-cells",
-                          FDT_PCI_INT_CELLS);
-    qemu_fdt_setprop_cell(fdt, nodename, "#size-cells", 0x2);
-    qemu_fdt_setprop_string(fdt, nodename, "compatible",
-                            "pci-host-ecam-generic");
-    qemu_fdt_setprop_string(fdt, nodename, "device_type", "pci");
-    qemu_fdt_setprop_cell(fdt, nodename, "linux,pci-domain", 0);
-    qemu_fdt_setprop_cells(fdt, nodename, "bus-range", 0,
-                           memmap[VIRT_PCIE_ECAM].size /
-                               PCIE_MMCFG_SIZE_MIN - 1);
-    qemu_fdt_setprop(fdt, nodename, "dma-coherent", NULL, 0);
-    qemu_fdt_setprop_cells(fdt, nodename, "reg", 0, memmap[VIRT_PCIE_ECAM].base,
-                           0, memmap[VIRT_PCIE_ECAM].size);
-    qemu_fdt_setprop_sized_cells(fdt, nodename, "ranges",
+    qemu_fdt_add_subnode(fdt, name);
+    qemu_fdt_setprop_cell(fdt, name, "#address-cells", FDT_PCI_ADDR_CELLS);
+    qemu_fdt_setprop_cell(fdt, name, "#interrupt-cells", FDT_PCI_INT_CELLS);
+    qemu_fdt_setprop_cell(fdt, name, "#size-cells", 0x2);
+    qemu_fdt_setprop_string(fdt, name, "compatible", "pci-host-ecam-generic");
+    qemu_fdt_setprop_string(fdt, name, "device_type", "pci");
+    qemu_fdt_setprop_cell(fdt, name, "linux,pci-domain", 0);
+    qemu_fdt_setprop_cells(fdt, name, "bus-range", 0,
+        memmap[VIRT_PCIE_ECAM].size / PCIE_MMCFG_SIZE_MIN - 1);
+    qemu_fdt_setprop(fdt, name, "dma-coherent", NULL, 0);
+    qemu_fdt_setprop_cells(fdt, name, "reg", 0,
+        memmap[VIRT_PCIE_ECAM].base, 0, memmap[VIRT_PCIE_ECAM].size);
+    qemu_fdt_setprop_sized_cells(fdt, name, "ranges",
         1, FDT_PCI_RANGE_IOPORT, 2, 0,
         2, memmap[VIRT_PCIE_PIO].base, 2, memmap[VIRT_PCIE_PIO].size,
         1, FDT_PCI_RANGE_MMIO,
         2, memmap[VIRT_PCIE_MMIO].base,
         2, memmap[VIRT_PCIE_MMIO].base, 2, memmap[VIRT_PCIE_MMIO].size);
-    create_pcie_irq_map(fdt, nodename, plic_phandle);
-    g_free(nodename);
+    create_pcie_irq_map(fdt, name, plic_pcie_phandle);
+    g_free(name);
 
     test_phandle = phandle++;
-    nodename = g_strdup_printf("/test@%lx",
+    name = g_strdup_printf("/soc/test@%lx",
         (long)memmap[VIRT_TEST].base);
-    qemu_fdt_add_subnode(fdt, nodename);
+    qemu_fdt_add_subnode(fdt, name);
     {
         const char compat[] = "sifive,test1\0sifive,test0\0syscon";
-        qemu_fdt_setprop(fdt, nodename, "compatible", compat, sizeof(compat));
+        qemu_fdt_setprop(fdt, name, "compatible", compat, sizeof(compat));
     }
-    qemu_fdt_setprop_cells(fdt, nodename, "reg",
+    qemu_fdt_setprop_cells(fdt, name, "reg",
         0x0, memmap[VIRT_TEST].base,
         0x0, memmap[VIRT_TEST].size);
-    qemu_fdt_setprop_cell(fdt, nodename, "phandle", test_phandle);
-    test_phandle = qemu_fdt_get_phandle(fdt, nodename);
-    g_free(nodename);
-
-    nodename = g_strdup_printf("/reboot");
-    qemu_fdt_add_subnode(fdt, nodename);
-    qemu_fdt_setprop_string(fdt, nodename, "compatible", "syscon-reboot");
-    qemu_fdt_setprop_cell(fdt, nodename, "regmap", test_phandle);
-    qemu_fdt_setprop_cell(fdt, nodename, "offset", 0x0);
-    qemu_fdt_setprop_cell(fdt, nodename, "value", FINISHER_RESET);
-    g_free(nodename);
-
-    nodename = g_strdup_printf("/poweroff");
-    qemu_fdt_add_subnode(fdt, nodename);
-    qemu_fdt_setprop_string(fdt, nodename, "compatible", "syscon-poweroff");
-    qemu_fdt_setprop_cell(fdt, nodename, "regmap", test_phandle);
-    qemu_fdt_setprop_cell(fdt, nodename, "offset", 0x0);
-    qemu_fdt_setprop_cell(fdt, nodename, "value", FINISHER_PASS);
-    g_free(nodename);
-
-    nodename = g_strdup_printf("/uart@%lx",
-        (long)memmap[VIRT_UART0].base);
-    qemu_fdt_add_subnode(fdt, nodename);
-    qemu_fdt_setprop_string(fdt, nodename, "compatible", "ns16550a");
-    qemu_fdt_setprop_cells(fdt, nodename, "reg",
+    qemu_fdt_setprop_cell(fdt, name, "phandle", test_phandle);
+    test_phandle = qemu_fdt_get_phandle(fdt, name);
+    g_free(name);
+
+    name = g_strdup_printf("/soc/reboot");
+    qemu_fdt_add_subnode(fdt, name);
+    qemu_fdt_setprop_string(fdt, name, "compatible", "syscon-reboot");
+    qemu_fdt_setprop_cell(fdt, name, "regmap", test_phandle);
+    qemu_fdt_setprop_cell(fdt, name, "offset", 0x0);
+    qemu_fdt_setprop_cell(fdt, name, "value", FINISHER_RESET);
+    g_free(name);
+
+    name = g_strdup_printf("/soc/poweroff");
+    qemu_fdt_add_subnode(fdt, name);
+    qemu_fdt_setprop_string(fdt, name, "compatible", "syscon-poweroff");
+    qemu_fdt_setprop_cell(fdt, name, "regmap", test_phandle);
+    qemu_fdt_setprop_cell(fdt, name, "offset", 0x0);
+    qemu_fdt_setprop_cell(fdt, name, "value", FINISHER_PASS);
+    g_free(name);
+
+    name = g_strdup_printf("/soc/uart@%lx", (long)memmap[VIRT_UART0].base);
+    qemu_fdt_add_subnode(fdt, name);
+    qemu_fdt_setprop_string(fdt, name, "compatible", "ns16550a");
+    qemu_fdt_setprop_cells(fdt, name, "reg",
         0x0, memmap[VIRT_UART0].base,
         0x0, memmap[VIRT_UART0].size);
-    qemu_fdt_setprop_cell(fdt, nodename, "clock-frequency", 3686400);
-    qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
-    qemu_fdt_setprop_cell(fdt, nodename, "interrupts", UART0_IRQ);
+    qemu_fdt_setprop_cell(fdt, name, "clock-frequency", 3686400);
+    qemu_fdt_setprop_cell(fdt, name, "interrupt-parent", plic_mmio_phandle);
+    qemu_fdt_setprop_cell(fdt, name, "interrupts", UART0_IRQ);
 
     qemu_fdt_add_subnode(fdt, "/chosen");
-    qemu_fdt_setprop_string(fdt, "/chosen", "stdout-path", nodename);
+    qemu_fdt_setprop_string(fdt, "/chosen", "stdout-path", name);
     if (cmdline) {
         qemu_fdt_setprop_string(fdt, "/chosen", "bootargs", cmdline);
     }
-    g_free(nodename);
-
-    nodename = g_strdup_printf("/rtc@%lx",
-        (long)memmap[VIRT_RTC].base);
-    qemu_fdt_add_subnode(fdt, nodename);
-    qemu_fdt_setprop_string(fdt, nodename, "compatible",
-        "google,goldfish-rtc");
-    qemu_fdt_setprop_cells(fdt, nodename, "reg",
+    g_free(name);
+
+    name = g_strdup_printf("/soc/rtc@%lx", (long)memmap[VIRT_RTC].base);
+    qemu_fdt_add_subnode(fdt, name);
+    qemu_fdt_setprop_string(fdt, name, "compatible", "google,goldfish-rtc");
+    qemu_fdt_setprop_cells(fdt, name, "reg",
         0x0, memmap[VIRT_RTC].base,
         0x0, memmap[VIRT_RTC].size);
-    qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
-    qemu_fdt_setprop_cell(fdt, nodename, "interrupts", RTC_IRQ);
-    g_free(nodename);
-
-    nodename = g_strdup_printf("/flash@%" PRIx64, flashbase);
-    qemu_fdt_add_subnode(s->fdt, nodename);
-    qemu_fdt_setprop_string(s->fdt, nodename, "compatible", "cfi-flash");
-    qemu_fdt_setprop_sized_cells(s->fdt, nodename, "reg",
+    qemu_fdt_setprop_cell(fdt, name, "interrupt-parent", plic_mmio_phandle);
+    qemu_fdt_setprop_cell(fdt, name, "interrupts", RTC_IRQ);
+    g_free(name);
+
+    name = g_strdup_printf("/soc/flash@%" PRIx64, flashbase);
+    qemu_fdt_add_subnode(s->fdt, name);
+    qemu_fdt_setprop_string(s->fdt, name, "compatible", "cfi-flash");
+    qemu_fdt_setprop_sized_cells(s->fdt, name, "reg",
                                  2, flashbase, 2, flashsize,
                                  2, flashbase + flashsize, 2, flashsize);
-    qemu_fdt_setprop_cell(s->fdt, nodename, "bank-width", 4);
-    g_free(nodename);
+    qemu_fdt_setprop_cell(s->fdt, name, "bank-width", 4);
+    g_free(name);
 }
 
-
 static inline DeviceState *gpex_pcie_init(MemoryRegion *sys_mem,
                                           hwaddr ecam_base, hwaddr ecam_size,
                                           hwaddr mmio_base, hwaddr mmio_size,
@@ -478,21 +493,100 @@ static void riscv_virt_board_init(MachineState *machine)
     MemoryRegion *system_memory = get_system_memory();
     MemoryRegion *main_mem = g_new(MemoryRegion, 1);
     MemoryRegion *mask_rom = g_new(MemoryRegion, 1);
-    char *plic_hart_config;
+    char *plic_hart_config, *soc_name;
     size_t plic_hart_config_len;
     target_ulong start_addr = memmap[VIRT_DRAM].base;
-    int i;
-    unsigned int smp_cpus = machine->smp.cpus;
-
-    /* Initialize SOC */
-    object_initialize_child(OBJECT(machine), "soc", &s->soc, sizeof(s->soc),
-                            TYPE_RISCV_HART_ARRAY, &error_abort, NULL);
-    object_property_set_str(OBJECT(&s->soc), machine->cpu_type, "cpu-type",
-                            &error_abort);
-    object_property_set_int(OBJECT(&s->soc), smp_cpus, "num-harts",
-                            &error_abort);
-    object_property_set_bool(OBJECT(&s->soc), true, "realized",
-                            &error_abort);
+    DeviceState *mmio_plic, *virtio_plic, *pcie_plic;
+    int i, j, base_hartid, hart_count;
+
+    /* Check socket count limit */
+    if (VIRT_SOCKETS_MAX < riscv_socket_count(machine)) {
+        error_report("number of sockets/nodes should not exceed %d",
+            VIRT_SOCKETS_MAX);
+        exit(1);
+    }
+
+    /* Initialize sockets */
+    mmio_plic = virtio_plic = pcie_plic = NULL;
+    for (i = 0; i < riscv_socket_count(machine); i++) {
+        if (!riscv_socket_check_hartids(machine, i)) {
+            error_report("discontinuous hartids in socket%d", i);
+            exit(1);
+        }
+
+        base_hartid = riscv_socket_first_hartid(machine, i);
+        if (base_hartid < 0) {
+            error_report("can't find hartid base for socket%d", i);
+            exit(1);
+        }
+
+        hart_count = riscv_socket_hart_count(machine, i);
+        if (hart_count < 0) {
+            error_report("can't find hart count for socket%d", i);
+            exit(1);
+        }
+
+        soc_name = g_strdup_printf("soc%d", i);
+        object_initialize_child(OBJECT(machine), soc_name, &s->soc[i],
+            sizeof(s->soc[i]), TYPE_RISCV_HART_ARRAY, &error_abort, NULL);
+        g_free(soc_name);
+        object_property_set_str(OBJECT(&s->soc[i]),
+            machine->cpu_type, "cpu-type", &error_abort);
+        object_property_set_int(OBJECT(&s->soc[i]),
+            base_hartid, "hartid-base", &error_abort);
+        object_property_set_int(OBJECT(&s->soc[i]),
+            hart_count, "num-harts", &error_abort);
+        object_property_set_bool(OBJECT(&s->soc[i]),
+            true, "realized", &error_abort);
+
+        /* Per-socket CLINT */
+        sifive_clint_create(
+            memmap[VIRT_CLINT].base + i * memmap[VIRT_CLINT].size,
+            memmap[VIRT_CLINT].size, base_hartid, hart_count,
+            SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE, true);
+
+        /* Per-socket PLIC hart topology configuration string */
+        plic_hart_config_len =
+            (strlen(VIRT_PLIC_HART_CONFIG) + 1) * hart_count;
+        plic_hart_config = g_malloc0(plic_hart_config_len);
+        for (j = 0; j < hart_count; j++) {
+            if (j != 0) {
+                strncat(plic_hart_config, ",", plic_hart_config_len);
+            }
+            strncat(plic_hart_config, VIRT_PLIC_HART_CONFIG,
+                plic_hart_config_len);
+            plic_hart_config_len -= (strlen(VIRT_PLIC_HART_CONFIG) + 1);
+        }
+
+        /* Per-socket PLIC */
+        s->plic[i] = sifive_plic_create(
+            memmap[VIRT_PLIC].base + i * memmap[VIRT_PLIC].size,
+            plic_hart_config, base_hartid,
+            VIRT_PLIC_NUM_SOURCES,
+            VIRT_PLIC_NUM_PRIORITIES,
+            VIRT_PLIC_PRIORITY_BASE,
+            VIRT_PLIC_PENDING_BASE,
+            VIRT_PLIC_ENABLE_BASE,
+            VIRT_PLIC_ENABLE_STRIDE,
+            VIRT_PLIC_CONTEXT_BASE,
+            VIRT_PLIC_CONTEXT_STRIDE,
+            memmap[VIRT_PLIC].size);
+        g_free(plic_hart_config);
+
+        /* Try to use different PLIC instance based device type */
+        if (i == 0) {
+            mmio_plic = s->plic[i];
+            virtio_plic = s->plic[i];
+            pcie_plic = s->plic[i];
+        }
+        if (i == 1) {
+            virtio_plic = s->plic[i];
+            pcie_plic = s->plic[i];
+        }
+        if (i == 2) {
+            pcie_plic = s->plic[i];
+        }
+    }
 
     /* register system main memory (actual RAM) */
     memory_region_init_ram(main_mem, NULL, "riscv_virt_board.ram",
@@ -571,38 +665,14 @@ static void riscv_virt_board_init(MachineState *machine)
                           memmap[VIRT_MROM].base + sizeof(reset_vec),
                           &address_space_memory);
 
-    /* create PLIC hart topology configuration string */
-    plic_hart_config_len = (strlen(VIRT_PLIC_HART_CONFIG) + 1) * smp_cpus;
-    plic_hart_config = g_malloc0(plic_hart_config_len);
-    for (i = 0; i < smp_cpus; i++) {
-        if (i != 0) {
-            strncat(plic_hart_config, ",", plic_hart_config_len);
-        }
-        strncat(plic_hart_config, VIRT_PLIC_HART_CONFIG, plic_hart_config_len);
-        plic_hart_config_len -= (strlen(VIRT_PLIC_HART_CONFIG) + 1);
-    }
-
-    /* MMIO */
-    s->plic = sifive_plic_create(memmap[VIRT_PLIC].base,
-        plic_hart_config, 0,
-        VIRT_PLIC_NUM_SOURCES,
-        VIRT_PLIC_NUM_PRIORITIES,
-        VIRT_PLIC_PRIORITY_BASE,
-        VIRT_PLIC_PENDING_BASE,
-        VIRT_PLIC_ENABLE_BASE,
-        VIRT_PLIC_ENABLE_STRIDE,
-        VIRT_PLIC_CONTEXT_BASE,
-        VIRT_PLIC_CONTEXT_STRIDE,
-        memmap[VIRT_PLIC].size);
-    sifive_clint_create(memmap[VIRT_CLINT].base,
-        memmap[VIRT_CLINT].size, 0, smp_cpus,
-        SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE, true);
+    /* SiFive Test MMIO device */
     sifive_test_create(memmap[VIRT_TEST].base);
 
+    /* VirtIO MMIO devices */
     for (i = 0; i < VIRTIO_COUNT; i++) {
         sysbus_create_simple("virtio-mmio",
             memmap[VIRT_VIRTIO].base + i * memmap[VIRT_VIRTIO].size,
-            qdev_get_gpio_in(DEVICE(s->plic), VIRTIO_IRQ + i));
+            qdev_get_gpio_in(DEVICE(virtio_plic), VIRTIO_IRQ + i));
     }
 
     gpex_pcie_init(system_memory,
@@ -611,14 +681,14 @@ static void riscv_virt_board_init(MachineState *machine)
                          memmap[VIRT_PCIE_MMIO].base,
                          memmap[VIRT_PCIE_MMIO].size,
                          memmap[VIRT_PCIE_PIO].base,
-                         DEVICE(s->plic), true);
+                         DEVICE(pcie_plic), true);
 
     serial_mm_init(system_memory, memmap[VIRT_UART0].base,
-        0, qdev_get_gpio_in(DEVICE(s->plic), UART0_IRQ), 399193,
+        0, qdev_get_gpio_in(DEVICE(mmio_plic), UART0_IRQ), 399193,
         serial_hd(0), DEVICE_LITTLE_ENDIAN);
 
     sysbus_create_simple("goldfish_rtc", memmap[VIRT_RTC].base,
-        qdev_get_gpio_in(DEVICE(s->plic), RTC_IRQ));
+        qdev_get_gpio_in(DEVICE(mmio_plic), RTC_IRQ));
 
     virt_flash_create(s);
 
@@ -628,8 +698,6 @@ static void riscv_virt_board_init(MachineState *machine)
                                   drive_get(IF_PFLASH, 0, i));
     }
     virt_flash_map(s, system_memory);
-
-    g_free(plic_hart_config);
 }
 
 static void riscv_virt_machine_instance_init(Object *obj)
@@ -642,9 +710,13 @@ static void riscv_virt_machine_class_init(ObjectClass *oc, void *data)
 
     mc->desc = "RISC-V VirtIO board";
     mc->init = riscv_virt_board_init;
-    mc->max_cpus = 8;
+    mc->max_cpus = VIRT_CPUS_MAX;
     mc->default_cpu_type = VIRT_CPU;
     mc->pci_allow_0_address = true;
+    mc->possible_cpu_arch_ids = riscv_numa_possible_cpu_arch_ids;
+    mc->cpu_index_to_instance_props = riscv_numa_cpu_index_to_props;
+    mc->get_default_cpu_node_id = riscv_numa_get_default_cpu_node_id;
+    mc->numa_mem_supported = true;
 }
 
 static const TypeInfo riscv_virt_machine_typeinfo = {
diff --git a/include/hw/riscv/virt.h b/include/hw/riscv/virt.h
index e69355efaf..1beacd7666 100644
--- a/include/hw/riscv/virt.h
+++ b/include/hw/riscv/virt.h
@@ -23,6 +23,9 @@
 #include "hw/sysbus.h"
 #include "hw/block/flash.h"
 
+#define VIRT_CPUS_MAX 8
+#define VIRT_SOCKETS_MAX 8
+
 #define TYPE_RISCV_VIRT_MACHINE MACHINE_TYPE_NAME("virt")
 #define RISCV_VIRT_MACHINE(obj) \
     OBJECT_CHECK(RISCVVirtState, (obj), TYPE_RISCV_VIRT_MACHINE)
@@ -32,8 +35,8 @@ typedef struct {
     MachineState parent;
 
     /*< public >*/
-    RISCVHartArrayState soc;
-    DeviceState *plic;
+    RISCVHartArrayState soc[VIRT_SOCKETS_MAX];
+    DeviceState *plic[VIRT_SOCKETS_MAX];
     PFlashCFI01 *flash[2];
 
     void *fdt;
@@ -74,6 +77,8 @@ enum {
 #define VIRT_PLIC_ENABLE_STRIDE 0x80
 #define VIRT_PLIC_CONTEXT_BASE 0x200000
 #define VIRT_PLIC_CONTEXT_STRIDE 0x1000
+#define VIRT_PLIC_SIZE(__num_context) \
+    (VIRT_PLIC_CONTEXT_BASE + (__num_context) * VIRT_PLIC_CONTEXT_STRIDE)
 
 #define FDT_PCI_ADDR_CELLS    3
 #define FDT_PCI_INT_CELLS     1
-- 
2.25.1




* [PATCH v5 5/5] hw/riscv: virt: Allow creating multiple NUMA sockets
@ 2020-05-29 11:46   ` Anup Patel
  0 siblings, 0 replies; 28+ messages in thread
From: Anup Patel @ 2020-05-29 11:46 UTC (permalink / raw)
  To: Peter Maydell, Palmer Dabbelt, Alistair Francis, Sagar Karandikar
  Cc: Atish Patra, Anup Patel, qemu-riscv, qemu-devel, Anup Patel

We extend the RISC-V virt machine to allow creating a multi-socket
machine. Each RISC-V virt machine socket is a NUMA node with its
own set of HARTs, memory instance, CLINT instance, and PLIC
instance. All other devices are shared between the sockets. We
also update the generated device tree accordingly.
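
The per-socket CLINT and PLIC placement in riscv_virt_board_init()
and create_fdt() is base + socket * size. The shared devices pick a
PLIC by socket index: UART/RTC use socket 0, VirtIO uses socket 1,
and PCIe uses socket 2, each falling back to the last socket that
exists. The standalone sketch below restates that arithmetic. The
constants are copied from this patch. The helper and enum names are
hypothetical and exist only for illustration. Reading
VIRT_CPUS_MAX * 2 as "M and S contexts per hart" is an assumption
based on the IRQ_M_EXT/IRQ_S_EXT pairs written into the PLIC node.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from the memmap and include/hw/riscv/virt.h in this patch. */
#define VIRT_CPUS_MAX            8
#define VIRT_SOCKETS_MAX         8
#define VIRT_CLINT_BASE          0x2000000ULL
#define VIRT_CLINT_SIZE          0x10000ULL
#define VIRT_PLIC_BASE           0xc000000ULL
#define VIRT_PLIC_CONTEXT_BASE   0x200000ULL
#define VIRT_PLIC_CONTEXT_STRIDE 0x1000ULL
#define VIRT_PLIC_SIZE(__num_context) \
    (VIRT_PLIC_CONTEXT_BASE + (__num_context) * VIRT_PLIC_CONTEXT_STRIDE)

/* Hypothetical helper: each socket's CLINT follows the previous one. */
static uint64_t socket_clint_base(int socket)
{
    assert(socket >= 0 && socket < VIRT_SOCKETS_MAX);
    return VIRT_CLINT_BASE + (uint64_t)socket * VIRT_CLINT_SIZE;
}

/* Hypothetical helper: PLIC windows are sized for VIRT_CPUS_MAX * 2 contexts. */
static uint64_t socket_plic_base(int socket)
{
    assert(socket >= 0 && socket < VIRT_SOCKETS_MAX);
    return VIRT_PLIC_BASE +
           (uint64_t)socket * VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2);
}

/* Hypothetical names for the three groups of shared devices. */
enum shared_dev { DEV_MMIO, DEV_VIRTIO, DEV_PCIE };

/* Mirrors the i == 0 / i == 1 / i == 2 assignments in the board init. */
static int plic_socket_for(enum shared_dev dev, int num_sockets)
{
    assert(num_sockets >= 1);
    switch (dev) {
    case DEV_MMIO:
        return 0;                                  /* UART0, RTC */
    case DEV_VIRTIO:
        return num_sockets > 1 ? 1 : 0;
    case DEV_PCIE:
        return num_sockets > 2 ? 2 : num_sockets - 1;
    }
    return 0;
}
```

For example, socket 1's PLIC would start at 0xc210000, because
VIRT_PLIC_SIZE(16) is 0x210000.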

By default, NUMA multi-socket support is disabled for the RISC-V
virt machine. To enable it, use QEMU's "-numa" command-line
options.

Example1: For two NUMA nodes with 2 CPUs each, append the following
to the command-line options: "-smp 4 -numa node -numa node"
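
Example1 relies on a default CPU-to-node assignment. This patch
hooks riscv_numa_get_default_cpu_node_id() into
mc->get_default_cpu_node_id. That function comes from the helpers
patch and is not shown here. The sketch below shows one plausible
even-split policy: contiguous blocks of CPU indices per node, with
any remainder going to the last node. The function name and the
guard for more nodes than CPUs are hypothetical. This is not
claimed to be the helper's actual code.

```c
#include <assert.h>

/*
 * Assumed policy: num_cpus / num_nodes CPUs per node, in index order,
 * with leftover CPUs clamped into the last node.
 */
static int default_node_for_cpu(int cpu_index, int num_cpus, int num_nodes)
{
    int per_node, node;

    assert(num_cpus > 0 && cpu_index >= 0 && cpu_index < num_cpus);
    if (num_nodes <= 0) {
        return 0;                 /* no -numa options: single node 0 */
    }
    per_node = num_cpus / num_nodes;
    if (per_node == 0) {
        per_node = 1;             /* hypothetical guard: more nodes than CPUs */
    }
    node = cpu_index / per_node;
    return node < num_nodes ? node : num_nodes - 1;
}
```

Under this policy, "-smp 4 -numa node -numa node" places CPUs 0-1
in node 0 and CPUs 2-3 in node 1, which matches Example1.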

Example2: For two NUMA nodes with 1 and 3 CPUs respectively, append
the following to the command-line options:
"-smp 4 -numa node -numa node -numa cpu,node-id=0,core-id=0 \
-numa cpu,node-id=1,core-id=1 -numa cpu,node-id=1,core-id=2 \
-numa cpu,node-id=1,core-id=3"

The maximum number of sockets in a RISC-V virt machine is
currently 8 (VIRT_SOCKETS_MAX), but this limit may be raised in
the future.
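
The socket limit interacts with the fixed memory map. Each socket
takes one CLINT slot starting at 0x2000000 and one PLIC window of
VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2) bytes starting at 0xc000000.

The arithmetic check below uses the constants from this patch; the
helper names are hypothetical. It shows that 8 sockets end their
PLIC windows at 0xd080000, below VIRT_UART0 at 0x10000000. By this
arithmetic alone up to 31 windows would fit. Raising the limit would
still also involve mc->max_cpus = VIRT_CPUS_MAX, since a socket
needs at least one hart.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from the memmap and include/hw/riscv/virt.h in this patch. */
#define VIRT_CPUS_MAX            8
#define VIRT_SOCKETS_MAX         8
#define VIRT_CLINT_BASE          0x2000000ULL
#define VIRT_CLINT_SIZE          0x10000ULL
#define VIRT_PLIC_BASE           0xc000000ULL
#define VIRT_UART0_BASE          0x10000000ULL
#define VIRT_PLIC_CONTEXT_BASE   0x200000ULL
#define VIRT_PLIC_CONTEXT_STRIDE 0x1000ULL
#define VIRT_PLIC_SIZE(__num_context) \
    (VIRT_PLIC_CONTEXT_BASE + (__num_context) * VIRT_PLIC_CONTEXT_STRIDE)

/* Exclusive end of the last per-socket CLINT slot for n sockets. */
static uint64_t clint_region_end(int sockets)
{
    assert(sockets >= 0);
    return VIRT_CLINT_BASE + (uint64_t)sockets * VIRT_CLINT_SIZE;
}

/* Exclusive end of the last per-socket PLIC window for n sockets. */
static uint64_t plic_region_end(int sockets)
{
    assert(sockets >= 0);
    return VIRT_PLIC_BASE +
           (uint64_t)sockets * VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2);
}

/* Largest socket count whose PLIC windows still end at or below UART0. */
static int max_sockets_below_uart(void)
{
    return (int)((VIRT_UART0_BASE - VIRT_PLIC_BASE) /
                 VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2));
}
```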

Signed-off-by: Anup Patel <anup.patel@wdc.com>
---
 hw/riscv/virt.c         | 530 +++++++++++++++++++++++-----------------
 include/hw/riscv/virt.h |   9 +-
 2 files changed, 308 insertions(+), 231 deletions(-)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 421815081d..2863b42cea 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -35,6 +35,7 @@
 #include "hw/riscv/sifive_test.h"
 #include "hw/riscv/virt.h"
 #include "hw/riscv/boot.h"
+#include "hw/riscv/numa.h"
 #include "chardev/char.h"
 #include "sysemu/arch_init.h"
 #include "sysemu/device_tree.h"
@@ -60,7 +61,7 @@ static const struct MemmapEntry {
     [VIRT_TEST] =        {   0x100000,        0x1000 },
     [VIRT_RTC] =         {   0x101000,        0x1000 },
     [VIRT_CLINT] =       {  0x2000000,       0x10000 },
-    [VIRT_PLIC] =        {  0xc000000,     0x4000000 },
+    [VIRT_PLIC] =        {  0xc000000, VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2) },
     [VIRT_UART0] =       { 0x10000000,         0x100 },
     [VIRT_VIRTIO] =      { 0x10001000,        0x1000 },
     [VIRT_FLASH] =       { 0x20000000,     0x4000000 },
@@ -182,10 +183,17 @@ static void create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
     uint64_t mem_size, const char *cmdline)
 {
     void *fdt;
-    int cpu, i;
-    uint32_t *cells;
-    char *nodename;
-    uint32_t plic_phandle, test_phandle, phandle = 1;
+    int i, cpu, socket;
+    MachineState *mc = MACHINE(s);
+    uint64_t addr, size;
+    uint32_t *clint_cells, *plic_cells;
+    unsigned long clint_addr, plic_addr;
+    uint32_t plic_phandle[MAX_NODES];
+    uint32_t cpu_phandle, intc_phandle, test_phandle;
+    uint32_t phandle = 1, plic_mmio_phandle = 1;
+    uint32_t plic_pcie_phandle = 1, plic_virtio_phandle = 1;
+    char *mem_name, *cpu_name, *core_name, *intc_name;
+    char *name, *clint_name, *plic_name, *clust_name;
     hwaddr flashsize = virt_memmap[VIRT_FLASH].size / 2;
     hwaddr flashbase = virt_memmap[VIRT_FLASH].base;
 
@@ -206,231 +214,238 @@ static void create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
     qemu_fdt_setprop_cell(fdt, "/soc", "#size-cells", 0x2);
     qemu_fdt_setprop_cell(fdt, "/soc", "#address-cells", 0x2);
 
-    nodename = g_strdup_printf("/memory@%lx",
-        (long)memmap[VIRT_DRAM].base);
-    qemu_fdt_add_subnode(fdt, nodename);
-    qemu_fdt_setprop_cells(fdt, nodename, "reg",
-        memmap[VIRT_DRAM].base >> 32, memmap[VIRT_DRAM].base,
-        mem_size >> 32, mem_size);
-    qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
-    g_free(nodename);
-
     qemu_fdt_add_subnode(fdt, "/cpus");
     qemu_fdt_setprop_cell(fdt, "/cpus", "timebase-frequency",
                           SIFIVE_CLINT_TIMEBASE_FREQ);
     qemu_fdt_setprop_cell(fdt, "/cpus", "#size-cells", 0x0);
     qemu_fdt_setprop_cell(fdt, "/cpus", "#address-cells", 0x1);
+    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map");
+
+    for (socket = (riscv_socket_count(mc) - 1); socket >= 0; socket--) {
+        clust_name = g_strdup_printf("/cpus/cpu-map/cluster%d", socket);
+        qemu_fdt_add_subnode(fdt, clust_name);
+
+        plic_cells = g_new0(uint32_t, s->soc[socket].num_harts * 4);
+        clint_cells = g_new0(uint32_t, s->soc[socket].num_harts * 4);
+
+        for (cpu = s->soc[socket].num_harts - 1; cpu >= 0; cpu--) {
+            cpu_phandle = phandle++;
 
-    for (cpu = s->soc.num_harts - 1; cpu >= 0; cpu--) {
-        int cpu_phandle = phandle++;
-        int intc_phandle;
-        nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
-        char *intc = g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
-        char *isa = riscv_isa_string(&s->soc.harts[cpu]);
-        qemu_fdt_add_subnode(fdt, nodename);
+            cpu_name = g_strdup_printf("/cpus/cpu@%d",
+                s->soc[socket].hartid_base + cpu);
+            qemu_fdt_add_subnode(fdt, cpu_name);
 #if defined(TARGET_RISCV32)
-        qemu_fdt_setprop_string(fdt, nodename, "mmu-type", "riscv,sv32");
+            qemu_fdt_setprop_string(fdt, cpu_name, "mmu-type", "riscv,sv32");
 #else
-        qemu_fdt_setprop_string(fdt, nodename, "mmu-type", "riscv,sv48");
+            qemu_fdt_setprop_string(fdt, cpu_name, "mmu-type", "riscv,sv48");
 #endif
-        qemu_fdt_setprop_string(fdt, nodename, "riscv,isa", isa);
-        qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv");
-        qemu_fdt_setprop_string(fdt, nodename, "status", "okay");
-        qemu_fdt_setprop_cell(fdt, nodename, "reg", cpu);
-        qemu_fdt_setprop_string(fdt, nodename, "device_type", "cpu");
-        qemu_fdt_setprop_cell(fdt, nodename, "phandle", cpu_phandle);
-        intc_phandle = phandle++;
-        qemu_fdt_add_subnode(fdt, intc);
-        qemu_fdt_setprop_cell(fdt, intc, "phandle", intc_phandle);
-        qemu_fdt_setprop_string(fdt, intc, "compatible", "riscv,cpu-intc");
-        qemu_fdt_setprop(fdt, intc, "interrupt-controller", NULL, 0);
-        qemu_fdt_setprop_cell(fdt, intc, "#interrupt-cells", 1);
-        g_free(isa);
-        g_free(intc);
-        g_free(nodename);
-    }
+            name = riscv_isa_string(&s->soc[socket].harts[cpu]);
+            qemu_fdt_setprop_string(fdt, cpu_name, "riscv,isa", name);
+            g_free(name);
+            qemu_fdt_setprop_string(fdt, cpu_name, "compatible", "riscv");
+            qemu_fdt_setprop_string(fdt, cpu_name, "status", "okay");
+            qemu_fdt_setprop_cell(fdt, cpu_name, "reg",
+                s->soc[socket].hartid_base + cpu);
+            qemu_fdt_setprop_string(fdt, cpu_name, "device_type", "cpu");
+            riscv_socket_fdt_write_id(mc, fdt, cpu_name, socket);
+            qemu_fdt_setprop_cell(fdt, cpu_name, "phandle", cpu_phandle);
+
+            intc_name = g_strdup_printf("%s/interrupt-controller", cpu_name);
+            qemu_fdt_add_subnode(fdt, intc_name);
+            intc_phandle = phandle++;
+            qemu_fdt_setprop_cell(fdt, intc_name, "phandle", intc_phandle);
+            qemu_fdt_setprop_string(fdt, intc_name, "compatible",
+                "riscv,cpu-intc");
+            qemu_fdt_setprop(fdt, intc_name, "interrupt-controller", NULL, 0);
+            qemu_fdt_setprop_cell(fdt, intc_name, "#interrupt-cells", 1);
+
+            clint_cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
+            clint_cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_SOFT);
+            clint_cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
+            clint_cells[cpu * 4 + 3] = cpu_to_be32(IRQ_M_TIMER);
+
+            plic_cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
+            plic_cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_EXT);
+            plic_cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
+            plic_cells[cpu * 4 + 3] = cpu_to_be32(IRQ_S_EXT);
+
+            core_name = g_strdup_printf("%s/core%d", clust_name, cpu);
+            qemu_fdt_add_subnode(fdt, core_name);
+            qemu_fdt_setprop_cell(fdt, core_name, "cpu", cpu_phandle);
+
+            g_free(core_name);
+            g_free(intc_name);
+            g_free(cpu_name);
+        }
 
-    /* Add cpu-topology node */
-    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map");
-    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map/cluster0");
-    for (cpu = s->soc.num_harts - 1; cpu >= 0; cpu--) {
-        char *core_nodename = g_strdup_printf("/cpus/cpu-map/cluster0/core%d",
-                                              cpu);
-        char *cpu_nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
-        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, cpu_nodename);
-        qemu_fdt_add_subnode(fdt, core_nodename);
-        qemu_fdt_setprop_cell(fdt, core_nodename, "cpu", intc_phandle);
-        g_free(core_nodename);
-        g_free(cpu_nodename);
+        addr = memmap[VIRT_DRAM].base + riscv_socket_mem_offset(mc, socket);
+        size = riscv_socket_mem_size(mc, socket);
+        mem_name = g_strdup_printf("/memory@%lx", (long)addr);
+        qemu_fdt_add_subnode(fdt, mem_name);
+        qemu_fdt_setprop_cells(fdt, mem_name, "reg",
+            addr >> 32, addr, size >> 32, size);
+        qemu_fdt_setprop_string(fdt, mem_name, "device_type", "memory");
+        riscv_socket_fdt_write_id(mc, fdt, mem_name, socket);
+        g_free(mem_name);
+
+        clint_addr = memmap[VIRT_CLINT].base +
+            (memmap[VIRT_CLINT].size * socket);
+        clint_name = g_strdup_printf("/soc/clint@%lx", clint_addr);
+        qemu_fdt_add_subnode(fdt, clint_name);
+        qemu_fdt_setprop_string(fdt, clint_name, "compatible", "riscv,clint0");
+        qemu_fdt_setprop_cells(fdt, clint_name, "reg",
+            0x0, clint_addr, 0x0, memmap[VIRT_CLINT].size);
+        qemu_fdt_setprop(fdt, clint_name, "interrupts-extended",
+            clint_cells, s->soc[socket].num_harts * sizeof(uint32_t) * 4);
+        riscv_socket_fdt_write_id(mc, fdt, clint_name, socket);
+        g_free(clint_name);
+
+        plic_phandle[socket] = phandle++;
+        plic_addr = memmap[VIRT_PLIC].base + (memmap[VIRT_PLIC].size * socket);
+        plic_name = g_strdup_printf("/soc/plic@%lx", plic_addr);
+        qemu_fdt_add_subnode(fdt, plic_name);
+        qemu_fdt_setprop_cell(fdt, plic_name,
+            "#address-cells", FDT_PLIC_ADDR_CELLS);
+        qemu_fdt_setprop_cell(fdt, plic_name,
+            "#interrupt-cells", FDT_PLIC_INT_CELLS);
+        qemu_fdt_setprop_string(fdt, plic_name, "compatible", "riscv,plic0");
+        qemu_fdt_setprop(fdt, plic_name, "interrupt-controller", NULL, 0);
+        qemu_fdt_setprop(fdt, plic_name, "interrupts-extended",
+            plic_cells, s->soc[socket].num_harts * sizeof(uint32_t) * 4);
+        qemu_fdt_setprop_cells(fdt, plic_name, "reg",
+            0x0, plic_addr, 0x0, memmap[VIRT_PLIC].size);
+        qemu_fdt_setprop_cell(fdt, plic_name, "riscv,ndev", VIRTIO_NDEV);
+        riscv_socket_fdt_write_id(mc, fdt, plic_name, socket);
+        qemu_fdt_setprop_cell(fdt, plic_name, "phandle", plic_phandle[socket]);
+        g_free(plic_name);
+
+        g_free(clint_cells);
+        g_free(plic_cells);
+        g_free(clust_name);
     }
 
-    cells =  g_new0(uint32_t, s->soc.num_harts * 4);
-    for (cpu = 0; cpu < s->soc.num_harts; cpu++) {
-        nodename =
-            g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
-        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, nodename);
-        cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
-        cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_SOFT);
-        cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
-        cells[cpu * 4 + 3] = cpu_to_be32(IRQ_M_TIMER);
-        g_free(nodename);
-    }
-    nodename = g_strdup_printf("/soc/clint@%lx",
-        (long)memmap[VIRT_CLINT].base);
-    qemu_fdt_add_subnode(fdt, nodename);
-    qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv,clint0");
-    qemu_fdt_setprop_cells(fdt, nodename, "reg",
-        0x0, memmap[VIRT_CLINT].base,
-        0x0, memmap[VIRT_CLINT].size);
-    qemu_fdt_setprop(fdt, nodename, "interrupts-extended",
-        cells, s->soc.num_harts * sizeof(uint32_t) * 4);
-    g_free(cells);
-    g_free(nodename);
-
-    plic_phandle = phandle++;
-    cells =  g_new0(uint32_t, s->soc.num_harts * 4);
-    for (cpu = 0; cpu < s->soc.num_harts; cpu++) {
-        nodename =
-            g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
-        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, nodename);
-        cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
-        cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_EXT);
-        cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
-        cells[cpu * 4 + 3] = cpu_to_be32(IRQ_S_EXT);
-        g_free(nodename);
+    for (socket = 0; socket < riscv_socket_count(mc); socket++) {
+        if (socket == 0) {
+            plic_mmio_phandle = plic_phandle[socket];
+            plic_virtio_phandle = plic_phandle[socket];
+            plic_pcie_phandle = plic_phandle[socket];
+        }
+        if (socket == 1) {
+            plic_virtio_phandle = plic_phandle[socket];
+            plic_pcie_phandle = plic_phandle[socket];
+        }
+        if (socket == 2) {
+            plic_pcie_phandle = plic_phandle[socket];
+        }
     }
-    nodename = g_strdup_printf("/soc/interrupt-controller@%lx",
-        (long)memmap[VIRT_PLIC].base);
-    qemu_fdt_add_subnode(fdt, nodename);
-    qemu_fdt_setprop_cell(fdt, nodename, "#address-cells",
-                          FDT_PLIC_ADDR_CELLS);
-    qemu_fdt_setprop_cell(fdt, nodename, "#interrupt-cells",
-                          FDT_PLIC_INT_CELLS);
-    qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv,plic0");
-    qemu_fdt_setprop(fdt, nodename, "interrupt-controller", NULL, 0);
-    qemu_fdt_setprop(fdt, nodename, "interrupts-extended",
-        cells, s->soc.num_harts * sizeof(uint32_t) * 4);
-    qemu_fdt_setprop_cells(fdt, nodename, "reg",
-        0x0, memmap[VIRT_PLIC].base,
-        0x0, memmap[VIRT_PLIC].size);
-    qemu_fdt_setprop_cell(fdt, nodename, "riscv,ndev", VIRTIO_NDEV);
-    qemu_fdt_setprop_cell(fdt, nodename, "phandle", plic_phandle);
-    plic_phandle = qemu_fdt_get_phandle(fdt, nodename);
-    g_free(cells);
-    g_free(nodename);
+
+    riscv_socket_fdt_write_distance_matrix(mc, fdt);
 
     for (i = 0; i < VIRTIO_COUNT; i++) {
-        nodename = g_strdup_printf("/virtio_mmio@%lx",
+        name = g_strdup_printf("/soc/virtio_mmio@%lx",
             (long)(memmap[VIRT_VIRTIO].base + i * memmap[VIRT_VIRTIO].size));
-        qemu_fdt_add_subnode(fdt, nodename);
-        qemu_fdt_setprop_string(fdt, nodename, "compatible", "virtio,mmio");
-        qemu_fdt_setprop_cells(fdt, nodename, "reg",
+        qemu_fdt_add_subnode(fdt, name);
+        qemu_fdt_setprop_string(fdt, name, "compatible", "virtio,mmio");
+        qemu_fdt_setprop_cells(fdt, name, "reg",
             0x0, memmap[VIRT_VIRTIO].base + i * memmap[VIRT_VIRTIO].size,
             0x0, memmap[VIRT_VIRTIO].size);
-        qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
-        qemu_fdt_setprop_cell(fdt, nodename, "interrupts", VIRTIO_IRQ + i);
-        g_free(nodename);
+        qemu_fdt_setprop_cell(fdt, name, "interrupt-parent",
+            plic_virtio_phandle);
+        qemu_fdt_setprop_cell(fdt, name, "interrupts", VIRTIO_IRQ + i);
+        g_free(name);
     }
 
-    nodename = g_strdup_printf("/soc/pci@%lx",
+    name = g_strdup_printf("/soc/pci@%lx",
         (long) memmap[VIRT_PCIE_ECAM].base);
-    qemu_fdt_add_subnode(fdt, nodename);
-    qemu_fdt_setprop_cell(fdt, nodename, "#address-cells",
-                          FDT_PCI_ADDR_CELLS);
-    qemu_fdt_setprop_cell(fdt, nodename, "#interrupt-cells",
-                          FDT_PCI_INT_CELLS);
-    qemu_fdt_setprop_cell(fdt, nodename, "#size-cells", 0x2);
-    qemu_fdt_setprop_string(fdt, nodename, "compatible",
-                            "pci-host-ecam-generic");
-    qemu_fdt_setprop_string(fdt, nodename, "device_type", "pci");
-    qemu_fdt_setprop_cell(fdt, nodename, "linux,pci-domain", 0);
-    qemu_fdt_setprop_cells(fdt, nodename, "bus-range", 0,
-                           memmap[VIRT_PCIE_ECAM].size /
-                               PCIE_MMCFG_SIZE_MIN - 1);
-    qemu_fdt_setprop(fdt, nodename, "dma-coherent", NULL, 0);
-    qemu_fdt_setprop_cells(fdt, nodename, "reg", 0, memmap[VIRT_PCIE_ECAM].base,
-                           0, memmap[VIRT_PCIE_ECAM].size);
-    qemu_fdt_setprop_sized_cells(fdt, nodename, "ranges",
+    qemu_fdt_add_subnode(fdt, name);
+    qemu_fdt_setprop_cell(fdt, name, "#address-cells", FDT_PCI_ADDR_CELLS);
+    qemu_fdt_setprop_cell(fdt, name, "#interrupt-cells", FDT_PCI_INT_CELLS);
+    qemu_fdt_setprop_cell(fdt, name, "#size-cells", 0x2);
+    qemu_fdt_setprop_string(fdt, name, "compatible", "pci-host-ecam-generic");
+    qemu_fdt_setprop_string(fdt, name, "device_type", "pci");
+    qemu_fdt_setprop_cell(fdt, name, "linux,pci-domain", 0);
+    qemu_fdt_setprop_cells(fdt, name, "bus-range", 0,
+        memmap[VIRT_PCIE_ECAM].size / PCIE_MMCFG_SIZE_MIN - 1);
+    qemu_fdt_setprop(fdt, name, "dma-coherent", NULL, 0);
+    qemu_fdt_setprop_cells(fdt, name, "reg", 0,
+        memmap[VIRT_PCIE_ECAM].base, 0, memmap[VIRT_PCIE_ECAM].size);
+    qemu_fdt_setprop_sized_cells(fdt, name, "ranges",
         1, FDT_PCI_RANGE_IOPORT, 2, 0,
         2, memmap[VIRT_PCIE_PIO].base, 2, memmap[VIRT_PCIE_PIO].size,
         1, FDT_PCI_RANGE_MMIO,
         2, memmap[VIRT_PCIE_MMIO].base,
         2, memmap[VIRT_PCIE_MMIO].base, 2, memmap[VIRT_PCIE_MMIO].size);
-    create_pcie_irq_map(fdt, nodename, plic_phandle);
-    g_free(nodename);
+    create_pcie_irq_map(fdt, name, plic_pcie_phandle);
+    g_free(name);
 
     test_phandle = phandle++;
-    nodename = g_strdup_printf("/test@%lx",
+    name = g_strdup_printf("/soc/test@%lx",
         (long)memmap[VIRT_TEST].base);
-    qemu_fdt_add_subnode(fdt, nodename);
+    qemu_fdt_add_subnode(fdt, name);
     {
         const char compat[] = "sifive,test1\0sifive,test0\0syscon";
-        qemu_fdt_setprop(fdt, nodename, "compatible", compat, sizeof(compat));
+        qemu_fdt_setprop(fdt, name, "compatible", compat, sizeof(compat));
     }
-    qemu_fdt_setprop_cells(fdt, nodename, "reg",
+    qemu_fdt_setprop_cells(fdt, name, "reg",
         0x0, memmap[VIRT_TEST].base,
         0x0, memmap[VIRT_TEST].size);
-    qemu_fdt_setprop_cell(fdt, nodename, "phandle", test_phandle);
-    test_phandle = qemu_fdt_get_phandle(fdt, nodename);
-    g_free(nodename);
-
-    nodename = g_strdup_printf("/reboot");
-    qemu_fdt_add_subnode(fdt, nodename);
-    qemu_fdt_setprop_string(fdt, nodename, "compatible", "syscon-reboot");
-    qemu_fdt_setprop_cell(fdt, nodename, "regmap", test_phandle);
-    qemu_fdt_setprop_cell(fdt, nodename, "offset", 0x0);
-    qemu_fdt_setprop_cell(fdt, nodename, "value", FINISHER_RESET);
-    g_free(nodename);
-
-    nodename = g_strdup_printf("/poweroff");
-    qemu_fdt_add_subnode(fdt, nodename);
-    qemu_fdt_setprop_string(fdt, nodename, "compatible", "syscon-poweroff");
-    qemu_fdt_setprop_cell(fdt, nodename, "regmap", test_phandle);
-    qemu_fdt_setprop_cell(fdt, nodename, "offset", 0x0);
-    qemu_fdt_setprop_cell(fdt, nodename, "value", FINISHER_PASS);
-    g_free(nodename);
-
-    nodename = g_strdup_printf("/uart@%lx",
-        (long)memmap[VIRT_UART0].base);
-    qemu_fdt_add_subnode(fdt, nodename);
-    qemu_fdt_setprop_string(fdt, nodename, "compatible", "ns16550a");
-    qemu_fdt_setprop_cells(fdt, nodename, "reg",
+    qemu_fdt_setprop_cell(fdt, name, "phandle", test_phandle);
+    test_phandle = qemu_fdt_get_phandle(fdt, name);
+    g_free(name);
+
+    name = g_strdup_printf("/soc/reboot");
+    qemu_fdt_add_subnode(fdt, name);
+    qemu_fdt_setprop_string(fdt, name, "compatible", "syscon-reboot");
+    qemu_fdt_setprop_cell(fdt, name, "regmap", test_phandle);
+    qemu_fdt_setprop_cell(fdt, name, "offset", 0x0);
+    qemu_fdt_setprop_cell(fdt, name, "value", FINISHER_RESET);
+    g_free(name);
+
+    name = g_strdup_printf("/soc/poweroff");
+    qemu_fdt_add_subnode(fdt, name);
+    qemu_fdt_setprop_string(fdt, name, "compatible", "syscon-poweroff");
+    qemu_fdt_setprop_cell(fdt, name, "regmap", test_phandle);
+    qemu_fdt_setprop_cell(fdt, name, "offset", 0x0);
+    qemu_fdt_setprop_cell(fdt, name, "value", FINISHER_PASS);
+    g_free(name);
+
+    name = g_strdup_printf("/soc/uart@%lx", (long)memmap[VIRT_UART0].base);
+    qemu_fdt_add_subnode(fdt, name);
+    qemu_fdt_setprop_string(fdt, name, "compatible", "ns16550a");
+    qemu_fdt_setprop_cells(fdt, name, "reg",
         0x0, memmap[VIRT_UART0].base,
         0x0, memmap[VIRT_UART0].size);
-    qemu_fdt_setprop_cell(fdt, nodename, "clock-frequency", 3686400);
-    qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
-    qemu_fdt_setprop_cell(fdt, nodename, "interrupts", UART0_IRQ);
+    qemu_fdt_setprop_cell(fdt, name, "clock-frequency", 3686400);
+    qemu_fdt_setprop_cell(fdt, name, "interrupt-parent", plic_mmio_phandle);
+    qemu_fdt_setprop_cell(fdt, name, "interrupts", UART0_IRQ);
 
     qemu_fdt_add_subnode(fdt, "/chosen");
-    qemu_fdt_setprop_string(fdt, "/chosen", "stdout-path", nodename);
+    qemu_fdt_setprop_string(fdt, "/chosen", "stdout-path", name);
     if (cmdline) {
         qemu_fdt_setprop_string(fdt, "/chosen", "bootargs", cmdline);
     }
-    g_free(nodename);
-
-    nodename = g_strdup_printf("/rtc@%lx",
-        (long)memmap[VIRT_RTC].base);
-    qemu_fdt_add_subnode(fdt, nodename);
-    qemu_fdt_setprop_string(fdt, nodename, "compatible",
-        "google,goldfish-rtc");
-    qemu_fdt_setprop_cells(fdt, nodename, "reg",
+    g_free(name);
+
+    name = g_strdup_printf("/soc/rtc@%lx", (long)memmap[VIRT_RTC].base);
+    qemu_fdt_add_subnode(fdt, name);
+    qemu_fdt_setprop_string(fdt, name, "compatible", "google,goldfish-rtc");
+    qemu_fdt_setprop_cells(fdt, name, "reg",
         0x0, memmap[VIRT_RTC].base,
         0x0, memmap[VIRT_RTC].size);
-    qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
-    qemu_fdt_setprop_cell(fdt, nodename, "interrupts", RTC_IRQ);
-    g_free(nodename);
-
-    nodename = g_strdup_printf("/flash@%" PRIx64, flashbase);
-    qemu_fdt_add_subnode(s->fdt, nodename);
-    qemu_fdt_setprop_string(s->fdt, nodename, "compatible", "cfi-flash");
-    qemu_fdt_setprop_sized_cells(s->fdt, nodename, "reg",
+    qemu_fdt_setprop_cell(fdt, name, "interrupt-parent", plic_mmio_phandle);
+    qemu_fdt_setprop_cell(fdt, name, "interrupts", RTC_IRQ);
+    g_free(name);
+
+    name = g_strdup_printf("/soc/flash@%" PRIx64, flashbase);
+    qemu_fdt_add_subnode(s->fdt, name);
+    qemu_fdt_setprop_string(s->fdt, name, "compatible", "cfi-flash");
+    qemu_fdt_setprop_sized_cells(s->fdt, name, "reg",
                                  2, flashbase, 2, flashsize,
                                  2, flashbase + flashsize, 2, flashsize);
-    qemu_fdt_setprop_cell(s->fdt, nodename, "bank-width", 4);
-    g_free(nodename);
+    qemu_fdt_setprop_cell(s->fdt, name, "bank-width", 4);
+    g_free(name);
 }
 
-
 static inline DeviceState *gpex_pcie_init(MemoryRegion *sys_mem,
                                           hwaddr ecam_base, hwaddr ecam_size,
                                           hwaddr mmio_base, hwaddr mmio_size,
@@ -478,21 +493,100 @@ static void riscv_virt_board_init(MachineState *machine)
     MemoryRegion *system_memory = get_system_memory();
     MemoryRegion *main_mem = g_new(MemoryRegion, 1);
     MemoryRegion *mask_rom = g_new(MemoryRegion, 1);
-    char *plic_hart_config;
+    char *plic_hart_config, *soc_name;
     size_t plic_hart_config_len;
     target_ulong start_addr = memmap[VIRT_DRAM].base;
-    int i;
-    unsigned int smp_cpus = machine->smp.cpus;
-
-    /* Initialize SOC */
-    object_initialize_child(OBJECT(machine), "soc", &s->soc, sizeof(s->soc),
-                            TYPE_RISCV_HART_ARRAY, &error_abort, NULL);
-    object_property_set_str(OBJECT(&s->soc), machine->cpu_type, "cpu-type",
-                            &error_abort);
-    object_property_set_int(OBJECT(&s->soc), smp_cpus, "num-harts",
-                            &error_abort);
-    object_property_set_bool(OBJECT(&s->soc), true, "realized",
-                            &error_abort);
+    DeviceState *mmio_plic, *virtio_plic, *pcie_plic;
+    int i, j, base_hartid, hart_count;
+
+    /* Check socket count limit */
+    if (VIRT_SOCKETS_MAX < riscv_socket_count(machine)) {
+        error_report("number of sockets/nodes should be less than %d",
+            VIRT_SOCKETS_MAX);
+        exit(1);
+    }
+
+    /* Initialize sockets */
+    mmio_plic = virtio_plic = pcie_plic = NULL;
+    for (i = 0; i < riscv_socket_count(machine); i++) {
+        if (!riscv_socket_check_hartids(machine, i)) {
+            error_report("discontinuous hartids in socket%d", i);
+            exit(1);
+        }
+
+        base_hartid = riscv_socket_first_hartid(machine, i);
+        if (base_hartid < 0) {
+            error_report("can't find hartid base for socket%d", i);
+            exit(1);
+        }
+
+        hart_count = riscv_socket_hart_count(machine, i);
+        if (hart_count < 0) {
+            error_report("can't find hart count for socket%d", i);
+            exit(1);
+        }
+
+        soc_name = g_strdup_printf("soc%d", i);
+        object_initialize_child(OBJECT(machine), soc_name, &s->soc[i],
+            sizeof(s->soc[i]), TYPE_RISCV_HART_ARRAY, &error_abort, NULL);
+        g_free(soc_name);
+        object_property_set_str(OBJECT(&s->soc[i]),
+            machine->cpu_type, "cpu-type", &error_abort);
+        object_property_set_int(OBJECT(&s->soc[i]),
+            base_hartid, "hartid-base", &error_abort);
+        object_property_set_int(OBJECT(&s->soc[i]),
+            hart_count, "num-harts", &error_abort);
+        object_property_set_bool(OBJECT(&s->soc[i]),
+            true, "realized", &error_abort);
+
+        /* Per-socket CLINT */
+        sifive_clint_create(
+            memmap[VIRT_CLINT].base + i * memmap[VIRT_CLINT].size,
+            memmap[VIRT_CLINT].size, base_hartid, hart_count,
+            SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE, true);
+
+        /* Per-socket PLIC hart topology configuration string */
+        plic_hart_config_len =
+            (strlen(VIRT_PLIC_HART_CONFIG) + 1) * hart_count;
+        plic_hart_config = g_malloc0(plic_hart_config_len);
+        for (j = 0; j < hart_count; j++) {
+            if (j != 0) {
+                strncat(plic_hart_config, ",", plic_hart_config_len);
+            }
+            strncat(plic_hart_config, VIRT_PLIC_HART_CONFIG,
+                plic_hart_config_len);
+            plic_hart_config_len -= (strlen(VIRT_PLIC_HART_CONFIG) + 1);
+        }
+
+        /* Per-socket PLIC */
+        s->plic[i] = sifive_plic_create(
+            memmap[VIRT_PLIC].base + i * memmap[VIRT_PLIC].size,
+            plic_hart_config, base_hartid,
+            VIRT_PLIC_NUM_SOURCES,
+            VIRT_PLIC_NUM_PRIORITIES,
+            VIRT_PLIC_PRIORITY_BASE,
+            VIRT_PLIC_PENDING_BASE,
+            VIRT_PLIC_ENABLE_BASE,
+            VIRT_PLIC_ENABLE_STRIDE,
+            VIRT_PLIC_CONTEXT_BASE,
+            VIRT_PLIC_CONTEXT_STRIDE,
+            memmap[VIRT_PLIC].size);
+        g_free(plic_hart_config);
+
+        /* Try to use different PLIC instance based device type */
+        if (i == 0) {
+            mmio_plic = s->plic[i];
+            virtio_plic = s->plic[i];
+            pcie_plic = s->plic[i];
+        }
+        if (i == 1) {
+            virtio_plic = s->plic[i];
+            pcie_plic = s->plic[i];
+        }
+        if (i == 2) {
+            pcie_plic = s->plic[i];
+        }
+    }
 
     /* register system main memory (actual RAM) */
     memory_region_init_ram(main_mem, NULL, "riscv_virt_board.ram",
@@ -571,38 +665,14 @@ static void riscv_virt_board_init(MachineState *machine)
                           memmap[VIRT_MROM].base + sizeof(reset_vec),
                           &address_space_memory);
 
-    /* create PLIC hart topology configuration string */
-    plic_hart_config_len = (strlen(VIRT_PLIC_HART_CONFIG) + 1) * smp_cpus;
-    plic_hart_config = g_malloc0(plic_hart_config_len);
-    for (i = 0; i < smp_cpus; i++) {
-        if (i != 0) {
-            strncat(plic_hart_config, ",", plic_hart_config_len);
-        }
-        strncat(plic_hart_config, VIRT_PLIC_HART_CONFIG, plic_hart_config_len);
-        plic_hart_config_len -= (strlen(VIRT_PLIC_HART_CONFIG) + 1);
-    }
-
-    /* MMIO */
-    s->plic = sifive_plic_create(memmap[VIRT_PLIC].base,
-        plic_hart_config, 0,
-        VIRT_PLIC_NUM_SOURCES,
-        VIRT_PLIC_NUM_PRIORITIES,
-        VIRT_PLIC_PRIORITY_BASE,
-        VIRT_PLIC_PENDING_BASE,
-        VIRT_PLIC_ENABLE_BASE,
-        VIRT_PLIC_ENABLE_STRIDE,
-        VIRT_PLIC_CONTEXT_BASE,
-        VIRT_PLIC_CONTEXT_STRIDE,
-        memmap[VIRT_PLIC].size);
-    sifive_clint_create(memmap[VIRT_CLINT].base,
-        memmap[VIRT_CLINT].size, 0, smp_cpus,
-        SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE, true);
+    /* SiFive Test MMIO device */
     sifive_test_create(memmap[VIRT_TEST].base);
 
+    /* VirtIO MMIO devices */
     for (i = 0; i < VIRTIO_COUNT; i++) {
         sysbus_create_simple("virtio-mmio",
             memmap[VIRT_VIRTIO].base + i * memmap[VIRT_VIRTIO].size,
-            qdev_get_gpio_in(DEVICE(s->plic), VIRTIO_IRQ + i));
+            qdev_get_gpio_in(DEVICE(virtio_plic), VIRTIO_IRQ + i));
     }
 
     gpex_pcie_init(system_memory,
@@ -611,14 +681,14 @@ static void riscv_virt_board_init(MachineState *machine)
                          memmap[VIRT_PCIE_MMIO].base,
                          memmap[VIRT_PCIE_MMIO].size,
                          memmap[VIRT_PCIE_PIO].base,
-                         DEVICE(s->plic), true);
+                         DEVICE(pcie_plic), true);
 
     serial_mm_init(system_memory, memmap[VIRT_UART0].base,
-        0, qdev_get_gpio_in(DEVICE(s->plic), UART0_IRQ), 399193,
+        0, qdev_get_gpio_in(DEVICE(mmio_plic), UART0_IRQ), 399193,
         serial_hd(0), DEVICE_LITTLE_ENDIAN);
 
     sysbus_create_simple("goldfish_rtc", memmap[VIRT_RTC].base,
-        qdev_get_gpio_in(DEVICE(s->plic), RTC_IRQ));
+        qdev_get_gpio_in(DEVICE(mmio_plic), RTC_IRQ));
 
     virt_flash_create(s);
 
@@ -628,8 +698,6 @@ static void riscv_virt_board_init(MachineState *machine)
                                   drive_get(IF_PFLASH, 0, i));
     }
     virt_flash_map(s, system_memory);
-
-    g_free(plic_hart_config);
 }
 
 static void riscv_virt_machine_instance_init(Object *obj)
@@ -642,9 +710,13 @@ static void riscv_virt_machine_class_init(ObjectClass *oc, void *data)
 
     mc->desc = "RISC-V VirtIO board";
     mc->init = riscv_virt_board_init;
-    mc->max_cpus = 8;
+    mc->max_cpus = VIRT_CPUS_MAX;
     mc->default_cpu_type = VIRT_CPU;
     mc->pci_allow_0_address = true;
+    mc->possible_cpu_arch_ids = riscv_numa_possible_cpu_arch_ids;
+    mc->cpu_index_to_instance_props = riscv_numa_cpu_index_to_props;
+    mc->get_default_cpu_node_id = riscv_numa_get_default_cpu_node_id;
+    mc->numa_mem_supported = true;
 }
 
 static const TypeInfo riscv_virt_machine_typeinfo = {
diff --git a/include/hw/riscv/virt.h b/include/hw/riscv/virt.h
index e69355efaf..1beacd7666 100644
--- a/include/hw/riscv/virt.h
+++ b/include/hw/riscv/virt.h
@@ -23,6 +23,9 @@
 #include "hw/sysbus.h"
 #include "hw/block/flash.h"
 
+#define VIRT_CPUS_MAX 8
+#define VIRT_SOCKETS_MAX 8
+
 #define TYPE_RISCV_VIRT_MACHINE MACHINE_TYPE_NAME("virt")
 #define RISCV_VIRT_MACHINE(obj) \
     OBJECT_CHECK(RISCVVirtState, (obj), TYPE_RISCV_VIRT_MACHINE)
@@ -32,8 +35,8 @@ typedef struct {
     MachineState parent;
 
     /*< public >*/
-    RISCVHartArrayState soc;
-    DeviceState *plic;
+    RISCVHartArrayState soc[VIRT_SOCKETS_MAX];
+    DeviceState *plic[VIRT_SOCKETS_MAX];
     PFlashCFI01 *flash[2];
 
     void *fdt;
@@ -74,6 +77,8 @@ enum {
 #define VIRT_PLIC_ENABLE_STRIDE 0x80
 #define VIRT_PLIC_CONTEXT_BASE 0x200000
 #define VIRT_PLIC_CONTEXT_STRIDE 0x1000
+#define VIRT_PLIC_SIZE(__num_context) \
+    (VIRT_PLIC_CONTEXT_BASE + (__num_context) * VIRT_PLIC_CONTEXT_STRIDE)
 
 #define FDT_PCI_ADDR_CELLS    3
 #define FDT_PCI_INT_CELLS     1
-- 
2.25.1
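The per-socket CLINT and PLIC windows are packed back to back from the original base addresses, with each PLIC sized by VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2). Below is a minimal sketch of that address arithmetic. The constants are copied from this patch's memmap and virt.h hunks. CLINT_BASE, PLIC_BASE, CLINT_SIZE, PLIC_SIZE, clint_base() and plic_base() are our own hypothetical names. By this arithmetic, eight 0x210000-byte PLIC windows end at 0xd080000, which is still below UART0 at 0x10000000.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from the virt.h and memmap hunks of this series. */
#define VIRT_CPUS_MAX            8
#define VIRT_SOCKETS_MAX         8
#define VIRT_PLIC_CONTEXT_BASE   0x200000
#define VIRT_PLIC_CONTEXT_STRIDE 0x1000
#define VIRT_PLIC_SIZE(__num_context) \
    (VIRT_PLIC_CONTEXT_BASE + (__num_context) * VIRT_PLIC_CONTEXT_STRIDE)

/* Hypothetical names for the memmap values used by the board code. */
#define CLINT_BASE 0x2000000ULL
#define CLINT_SIZE 0x10000ULL
#define PLIC_BASE  0xc000000ULL
/* Each PLIC window covers two contexts (M and S) per possible hart. */
#define PLIC_SIZE  ((uint64_t)VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2))

/* Base of the CLINT for 'socket'; socket == VIRT_SOCKETS_MAX gives the end. */
uint64_t clint_base(int socket)
{
    assert(socket >= 0 && socket <= VIRT_SOCKETS_MAX);
    return CLINT_BASE + (uint64_t)socket * CLINT_SIZE;
}

/* Base of the PLIC for 'socket'; socket == VIRT_SOCKETS_MAX gives the end. */
uint64_t plic_base(int socket)
{
    assert(socket >= 0 && socket <= VIRT_SOCKETS_MAX);
    return PLIC_BASE + (uint64_t)socket * PLIC_SIZE;
}
```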




* Re: [PATCH v5 5/5] hw/riscv: virt: Allow creating multiple NUMA sockets
  2020-05-29 11:46   ` Anup Patel
@ 2020-06-10 23:24     ` Alistair Francis
From: Alistair Francis @ 2020-06-10 23:24 UTC (permalink / raw)
  To: Anup Patel
  Cc: Peter Maydell, open list:RISC-V, Sagar Karandikar, Anup Patel,
	qemu-devel@nongnu.org Developers, Atish Patra, Alistair Francis,
	Palmer Dabbelt

On Fri, May 29, 2020 at 4:49 AM Anup Patel <anup.patel@wdc.com> wrote:
>
> We extend the RISC-V virt machine to allow creating a multi-socket
> machine. Each RISC-V virt machine socket is a NUMA node having
> a set of HARTs, a memory instance, a CLINT instance, and a PLIC
> instance. Other devices are shared between all sockets. We also
> update the generated device tree accordingly.
>
> By default, NUMA multi-socket support is disabled for the RISC-V
> virt machine. To enable it, users can pass QEMU's "-numa"
> command-line options.
>
> Example1: For two NUMA nodes with 2 CPUs each, append the following
> to the command-line options: "-smp 4 -numa node -numa node"
>
> Example2: For two NUMA nodes with 1 and 3 CPUs, append the following
> to the command-line options:
> "-smp 4 -numa node -numa node -numa cpu,node-id=0,core-id=0 \
> -numa cpu,node-id=1,core-id=1 -numa cpu,node-id=1,core-id=2 \
> -numa cpu,node-id=1,core-id=3"
>
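Example1 relies on QEMU's default CPU-to-node assignment when no "-numa cpu" options are given. The series hooks this through riscv_numa_get_default_cpu_node_id, whose body is not quoted in this message. The sketch below assumes only the even, index-ordered split that Example1 describes, with any remainder placed on the last node. default_node_for_cpu is a hypothetical name, and the sketch is not the series' implementation. Example2 bypasses this default entirely, because it names every CPU's node explicitly.

```c
#include <assert.h>

/*
 * Hypothetical default CPU-to-node mapping: split smp_cpus evenly across
 * num_nodes in index order and put any leftover CPUs on the last node.
 */
int default_node_for_cpu(int cpu_index, int smp_cpus, int num_nodes)
{
    int per_node, node;

    assert(cpu_index >= 0 && cpu_index < smp_cpus);
    if (num_nodes <= 0) {
        return 0;               /* no -numa options: everything on node 0 */
    }
    per_node = smp_cpus / num_nodes;
    if (per_node == 0) {
        per_node = 1;           /* fewer CPUs than nodes */
    }
    node = cpu_index / per_node;
    if (node >= num_nodes) {
        node = num_nodes - 1;   /* leftover CPUs go to the last node */
    }
    return node;
}
```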
> The maximum number of sockets in a RISC-V virt machine is 8,
> but this limit can be changed in the future.
>
> Signed-off-by: Anup Patel <anup.patel@wdc.com>
> ---
>  hw/riscv/virt.c         | 530 +++++++++++++++++++++++-----------------
>  include/hw/riscv/virt.h |   9 +-
>  2 files changed, 308 insertions(+), 231 deletions(-)
>
> diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
> index 421815081d..2863b42cea 100644
> --- a/hw/riscv/virt.c
> +++ b/hw/riscv/virt.c
> @@ -35,6 +35,7 @@
>  #include "hw/riscv/sifive_test.h"
>  #include "hw/riscv/virt.h"
>  #include "hw/riscv/boot.h"
> +#include "hw/riscv/numa.h"
>  #include "chardev/char.h"
>  #include "sysemu/arch_init.h"
>  #include "sysemu/device_tree.h"
> @@ -60,7 +61,7 @@ static const struct MemmapEntry {
>      [VIRT_TEST] =        {   0x100000,        0x1000 },
>      [VIRT_RTC] =         {   0x101000,        0x1000 },
>      [VIRT_CLINT] =       {  0x2000000,       0x10000 },
> -    [VIRT_PLIC] =        {  0xc000000,     0x4000000 },
> +    [VIRT_PLIC] =        {  0xc000000, VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2) },
>      [VIRT_UART0] =       { 0x10000000,         0x100 },
>      [VIRT_VIRTIO] =      { 0x10001000,        0x1000 },
>      [VIRT_FLASH] =       { 0x20000000,     0x4000000 },
> @@ -182,10 +183,17 @@ static void create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
>      uint64_t mem_size, const char *cmdline)
>  {
>      void *fdt;
> -    int cpu, i;
> -    uint32_t *cells;
> -    char *nodename;
> -    uint32_t plic_phandle, test_phandle, phandle = 1;
> +    int i, cpu, socket;
> +    MachineState *mc = MACHINE(s);
> +    uint64_t addr, size;
> +    uint32_t *clint_cells, *plic_cells;
> +    unsigned long clint_addr, plic_addr;
> +    uint32_t plic_phandle[MAX_NODES];
> +    uint32_t cpu_phandle, intc_phandle, test_phandle;
> +    uint32_t phandle = 1, plic_mmio_phandle = 1;
> +    uint32_t plic_pcie_phandle = 1, plic_virtio_phandle = 1;
> +    char *mem_name, *cpu_name, *core_name, *intc_name;
> +    char *name, *clint_name, *plic_name, *clust_name;
>      hwaddr flashsize = virt_memmap[VIRT_FLASH].size / 2;
>      hwaddr flashbase = virt_memmap[VIRT_FLASH].base;
>
> @@ -206,231 +214,238 @@ static void create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
>      qemu_fdt_setprop_cell(fdt, "/soc", "#size-cells", 0x2);
>      qemu_fdt_setprop_cell(fdt, "/soc", "#address-cells", 0x2);
>
> -    nodename = g_strdup_printf("/memory@%lx",
> -        (long)memmap[VIRT_DRAM].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> -        memmap[VIRT_DRAM].base >> 32, memmap[VIRT_DRAM].base,
> -        mem_size >> 32, mem_size);
> -    qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
> -    g_free(nodename);
> -
>      qemu_fdt_add_subnode(fdt, "/cpus");
>      qemu_fdt_setprop_cell(fdt, "/cpus", "timebase-frequency",
>                            SIFIVE_CLINT_TIMEBASE_FREQ);
>      qemu_fdt_setprop_cell(fdt, "/cpus", "#size-cells", 0x0);
>      qemu_fdt_setprop_cell(fdt, "/cpus", "#address-cells", 0x1);
> +    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map");

I'm no expert with cpu-map. Do you mind CCing Atish on the next
version and seeing whether he can Ack these DT changes?
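For reference while reviewing the cpu-map change, here is what the new loop generates. It creates one /cpus/cpu-map/clusterN node per socket. Core numbers restart at 0 in every cluster, while the /cpus/cpu@N node that each core's "cpu" property points to uses the global hartid (hartid_base + cpu). The new code sets "cpu" to cpu_phandle. The removed code stored the same value, fetched with qemu_fdt_get_phandle() on /cpus/cpu@N under the misleading name intc_phandle. The sketch below reproduces only the path naming. struct socket_desc and cpu_map_paths are hypothetical, and the test data uses Example2's topology.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one socket: first hartid and hart count. */
struct socket_desc {
    int hartid_base;
    int num_harts;
};

/*
 * Format the cpu-map core node for hart 'cpu' of 'socket' and the
 * /cpus/cpu@N node that its "cpu" phandle refers to, following the naming
 * used in create_fdt(): core index is per cluster, cpu@N is global.
 */
void cpu_map_paths(const struct socket_desc *sockets, int socket, int cpu,
                   char *core, size_t core_len, char *node, size_t node_len)
{
    assert(cpu >= 0 && cpu < sockets[socket].num_harts);
    snprintf(core, core_len, "/cpus/cpu-map/cluster%d/core%d", socket, cpu);
    snprintf(node, node_len, "/cpus/cpu@%d",
             sockets[socket].hartid_base + cpu);
}
```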

> +
> +    for (socket = (riscv_socket_count(mc) - 1); socket >= 0; socket--) {
> +        clust_name = g_strdup_printf("/cpus/cpu-map/cluster%d", socket);
> +        qemu_fdt_add_subnode(fdt, clust_name);
> +
> +        plic_cells = g_new0(uint32_t, s->soc[socket].num_harts * 4);
> +        clint_cells = g_new0(uint32_t, s->soc[socket].num_harts * 4);
> +
> +        for (cpu = s->soc[socket].num_harts - 1; cpu >= 0; cpu--) {
> +            cpu_phandle = phandle++;
>
> -    for (cpu = s->soc.num_harts - 1; cpu >= 0; cpu--) {
> -        int cpu_phandle = phandle++;
> -        int intc_phandle;
> -        nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
> -        char *intc = g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
> -        char *isa = riscv_isa_string(&s->soc.harts[cpu]);
> -        qemu_fdt_add_subnode(fdt, nodename);
> +            cpu_name = g_strdup_printf("/cpus/cpu@%d",
> +                s->soc[socket].hartid_base + cpu);
> +            qemu_fdt_add_subnode(fdt, cpu_name);
>  #if defined(TARGET_RISCV32)
> -        qemu_fdt_setprop_string(fdt, nodename, "mmu-type", "riscv,sv32");
> +            qemu_fdt_setprop_string(fdt, cpu_name, "mmu-type", "riscv,sv32");
>  #else
> -        qemu_fdt_setprop_string(fdt, nodename, "mmu-type", "riscv,sv48");
> +            qemu_fdt_setprop_string(fdt, cpu_name, "mmu-type", "riscv,sv48");
>  #endif
> -        qemu_fdt_setprop_string(fdt, nodename, "riscv,isa", isa);
> -        qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv");
> -        qemu_fdt_setprop_string(fdt, nodename, "status", "okay");
> -        qemu_fdt_setprop_cell(fdt, nodename, "reg", cpu);
> -        qemu_fdt_setprop_string(fdt, nodename, "device_type", "cpu");
> -        qemu_fdt_setprop_cell(fdt, nodename, "phandle", cpu_phandle);
> -        intc_phandle = phandle++;
> -        qemu_fdt_add_subnode(fdt, intc);
> -        qemu_fdt_setprop_cell(fdt, intc, "phandle", intc_phandle);
> -        qemu_fdt_setprop_string(fdt, intc, "compatible", "riscv,cpu-intc");
> -        qemu_fdt_setprop(fdt, intc, "interrupt-controller", NULL, 0);
> -        qemu_fdt_setprop_cell(fdt, intc, "#interrupt-cells", 1);
> -        g_free(isa);
> -        g_free(intc);
> -        g_free(nodename);
> -    }
> +            name = riscv_isa_string(&s->soc[socket].harts[cpu]);
> +            qemu_fdt_setprop_string(fdt, cpu_name, "riscv,isa", name);
> +            g_free(name);
> +            qemu_fdt_setprop_string(fdt, cpu_name, "compatible", "riscv");
> +            qemu_fdt_setprop_string(fdt, cpu_name, "status", "okay");
> +            qemu_fdt_setprop_cell(fdt, cpu_name, "reg",
> +                s->soc[socket].hartid_base + cpu);
> +            qemu_fdt_setprop_string(fdt, cpu_name, "device_type", "cpu");
> +            riscv_socket_fdt_write_id(mc, fdt, cpu_name, socket);
> +            qemu_fdt_setprop_cell(fdt, cpu_name, "phandle", cpu_phandle);
> +
> +            intc_name = g_strdup_printf("%s/interrupt-controller", cpu_name);
> +            qemu_fdt_add_subnode(fdt, intc_name);
> +            intc_phandle = phandle++;
> +            qemu_fdt_setprop_cell(fdt, intc_name, "phandle", intc_phandle);
> +            qemu_fdt_setprop_string(fdt, intc_name, "compatible",
> +                "riscv,cpu-intc");
> +            qemu_fdt_setprop(fdt, intc_name, "interrupt-controller", NULL, 0);
> +            qemu_fdt_setprop_cell(fdt, intc_name, "#interrupt-cells", 1);
> +
> +            clint_cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> +            clint_cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_SOFT);
> +            clint_cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> +            clint_cells[cpu * 4 + 3] = cpu_to_be32(IRQ_M_TIMER);
> +
> +            plic_cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> +            plic_cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_EXT);
> +            plic_cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> +            plic_cells[cpu * 4 + 3] = cpu_to_be32(IRQ_S_EXT);
> +
> +            core_name = g_strdup_printf("%s/core%d", clust_name, cpu);
> +            qemu_fdt_add_subnode(fdt, core_name);
> +            qemu_fdt_setprop_cell(fdt, core_name, "cpu", cpu_phandle);
> +
> +            g_free(core_name);
> +            g_free(intc_name);
> +            g_free(cpu_name);
> +        }
>
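Building clint_cells and plic_cells inside the per-hart loop means each socket's CLINT and PLIC list only that socket's harts. Each hart contributes four cells: two (intc phandle, irq) pairs, stored big-endian. The following POSIX C sketch shows that layout. htonl() stands in for QEMU's cpu_to_be32(), and build_irq_cells is a hypothetical helper. The IRQ numbers are the RISC-V mip/mie bit positions, which we assume match QEMU's IRQ_* definitions.

```c
#include <arpa/inet.h>  /* htonl(): host to big-endian, like cpu_to_be32() */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* RISC-V interrupt numbers (mip/mie bit positions). */
#define IRQ_M_SOFT   3
#define IRQ_M_TIMER  7
#define IRQ_S_EXT    9
#define IRQ_M_EXT   11

/*
 * Build an "interrupts-extended" value: per hart, (phandle, irq_a) then
 * (phandle, irq_b), as big-endian FDT cells. The caller frees the result;
 * *len_bytes receives the property length in bytes.
 */
uint32_t *build_irq_cells(const uint32_t *intc_phandle, int num_harts,
                          uint32_t irq_a, uint32_t irq_b, size_t *len_bytes)
{
    uint32_t *cells = calloc((size_t)num_harts * 4, sizeof(uint32_t));

    assert(cells != NULL);
    for (int cpu = 0; cpu < num_harts; cpu++) {
        cells[cpu * 4 + 0] = htonl(intc_phandle[cpu]);
        cells[cpu * 4 + 1] = htonl(irq_a);
        cells[cpu * 4 + 2] = htonl(intc_phandle[cpu]);
        cells[cpu * 4 + 3] = htonl(irq_b);
    }
    *len_bytes = (size_t)num_harts * sizeof(uint32_t) * 4;
    return cells;
}
```

With irq_a = IRQ_M_SOFT and irq_b = IRQ_M_TIMER this matches the CLINT cells. With IRQ_M_EXT and IRQ_S_EXT it matches the PLIC cells.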
> -    /* Add cpu-topology node */
> -    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map");
> -    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map/cluster0");
> -    for (cpu = s->soc.num_harts - 1; cpu >= 0; cpu--) {
> -        char *core_nodename = g_strdup_printf("/cpus/cpu-map/cluster0/core%d",
> -                                              cpu);
> -        char *cpu_nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
> -        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, cpu_nodename);
> -        qemu_fdt_add_subnode(fdt, core_nodename);
> -        qemu_fdt_setprop_cell(fdt, core_nodename, "cpu", intc_phandle);
> -        g_free(core_nodename);
> -        g_free(cpu_nodename);
> +        addr = memmap[VIRT_DRAM].base + riscv_socket_mem_offset(mc, socket);
> +        size = riscv_socket_mem_size(mc, socket);
> +        mem_name = g_strdup_printf("/memory@%lx", (long)addr);
> +        qemu_fdt_add_subnode(fdt, mem_name);
> +        qemu_fdt_setprop_cells(fdt, mem_name, "reg",
> +            addr >> 32, addr, size >> 32, size);
> +        qemu_fdt_setprop_string(fdt, mem_name, "device_type", "memory");
> +        riscv_socket_fdt_write_id(mc, fdt, mem_name, socket);
> +        g_free(mem_name);
> +
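Each socket now gets its own /memory@... node at memmap[VIRT_DRAM].base + riscv_socket_mem_offset(). Both that helper and riscv_socket_mem_size() come from another patch in the series and are not quoted here. The sketch below assumes the offset is the running sum of the preceding nodes' sizes, so that the nodes tile DRAM contiguously. node_mem_addr is a hypothetical name, and the 0x80000000 base in the test is only an example value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical per-node memory layout: node 'node' starts at dram_base plus
 * the sizes of nodes 0..node-1, so consecutive nodes never overlap.
 */
uint64_t node_mem_addr(uint64_t dram_base, const uint64_t *node_size,
                       int num_nodes, int node)
{
    uint64_t offset = 0;

    assert(node >= 0 && node < num_nodes);
    for (int i = 0; i < node; i++) {
        offset += node_size[i];
    }
    return dram_base + offset;
}
```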
> +        clint_addr = memmap[VIRT_CLINT].base +
> +            (memmap[VIRT_CLINT].size * socket);
> +        clint_name = g_strdup_printf("/soc/clint@%lx", clint_addr);
> +        qemu_fdt_add_subnode(fdt, clint_name);
> +        qemu_fdt_setprop_string(fdt, clint_name, "compatible", "riscv,clint0");
> +        qemu_fdt_setprop_cells(fdt, clint_name, "reg",
> +            0x0, clint_addr, 0x0, memmap[VIRT_CLINT].size);
> +        qemu_fdt_setprop(fdt, clint_name, "interrupts-extended",
> +            clint_cells, s->soc[socket].num_harts * sizeof(uint32_t) * 4);
> +        riscv_socket_fdt_write_id(mc, fdt, clint_name, socket);
> +        g_free(clint_name);
> +
> +        plic_phandle[socket] = phandle++;
> +        plic_addr = memmap[VIRT_PLIC].base + (memmap[VIRT_PLIC].size * socket);
> +        plic_name = g_strdup_printf("/soc/plic@%lx", plic_addr);
> +        qemu_fdt_add_subnode(fdt, plic_name);
> +        qemu_fdt_setprop_cell(fdt, plic_name,
> +            "#address-cells", FDT_PLIC_ADDR_CELLS);
> +        qemu_fdt_setprop_cell(fdt, plic_name,
> +            "#interrupt-cells", FDT_PLIC_INT_CELLS);
> +        qemu_fdt_setprop_string(fdt, plic_name, "compatible", "riscv,plic0");
> +        qemu_fdt_setprop(fdt, plic_name, "interrupt-controller", NULL, 0);
> +        qemu_fdt_setprop(fdt, plic_name, "interrupts-extended",
> +            plic_cells, s->soc[socket].num_harts * sizeof(uint32_t) * 4);
> +        qemu_fdt_setprop_cells(fdt, plic_name, "reg",
> +            0x0, plic_addr, 0x0, memmap[VIRT_PLIC].size);
> +        qemu_fdt_setprop_cell(fdt, plic_name, "riscv,ndev", VIRTIO_NDEV);
> +        riscv_socket_fdt_write_id(mc, fdt, plic_name, socket);
> +        qemu_fdt_setprop_cell(fdt, plic_name, "phandle", plic_phandle[socket]);
> +        g_free(plic_name);
> +
> +        g_free(clint_cells);
> +        g_free(plic_cells);
> +        g_free(clust_name);
>      }
>
> -    cells =  g_new0(uint32_t, s->soc.num_harts * 4);
> -    for (cpu = 0; cpu < s->soc.num_harts; cpu++) {
> -        nodename =
> -            g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
> -        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, nodename);
> -        cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> -        cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_SOFT);
> -        cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> -        cells[cpu * 4 + 3] = cpu_to_be32(IRQ_M_TIMER);
> -        g_free(nodename);
> -    }
> -    nodename = g_strdup_printf("/soc/clint@%lx",
> -        (long)memmap[VIRT_CLINT].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv,clint0");
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> -        0x0, memmap[VIRT_CLINT].base,
> -        0x0, memmap[VIRT_CLINT].size);
> -    qemu_fdt_setprop(fdt, nodename, "interrupts-extended",
> -        cells, s->soc.num_harts * sizeof(uint32_t) * 4);
> -    g_free(cells);
> -    g_free(nodename);
> -
> -    plic_phandle = phandle++;
> -    cells =  g_new0(uint32_t, s->soc.num_harts * 4);
> -    for (cpu = 0; cpu < s->soc.num_harts; cpu++) {
> -        nodename =
> -            g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
> -        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, nodename);
> -        cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> -        cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_EXT);
> -        cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> -        cells[cpu * 4 + 3] = cpu_to_be32(IRQ_S_EXT);
> -        g_free(nodename);
> +    for (socket = 0; socket < riscv_socket_count(mc); socket++) {
> +        if (socket == 0) {
> +            plic_mmio_phandle = plic_phandle[socket];
> +            plic_virtio_phandle = plic_phandle[socket];
> +            plic_pcie_phandle = plic_phandle[socket];
> +        }
> +        if (socket == 1) {
> +            plic_virtio_phandle = plic_phandle[socket];
> +            plic_pcie_phandle = plic_phandle[socket];
> +        }
> +        if (socket == 2) {
> +            plic_pcie_phandle = plic_phandle[socket];
> +        }
>      }
> -    nodename = g_strdup_printf("/soc/interrupt-controller@%lx",
> -        (long)memmap[VIRT_PLIC].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_cell(fdt, nodename, "#address-cells",
> -                          FDT_PLIC_ADDR_CELLS);
> -    qemu_fdt_setprop_cell(fdt, nodename, "#interrupt-cells",
> -                          FDT_PLIC_INT_CELLS);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv,plic0");
> -    qemu_fdt_setprop(fdt, nodename, "interrupt-controller", NULL, 0);
> -    qemu_fdt_setprop(fdt, nodename, "interrupts-extended",
> -        cells, s->soc.num_harts * sizeof(uint32_t) * 4);
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> -        0x0, memmap[VIRT_PLIC].base,
> -        0x0, memmap[VIRT_PLIC].size);
> -    qemu_fdt_setprop_cell(fdt, nodename, "riscv,ndev", VIRTIO_NDEV);
> -    qemu_fdt_setprop_cell(fdt, nodename, "phandle", plic_phandle);
> -    plic_phandle = qemu_fdt_get_phandle(fdt, nodename);
> -    g_free(cells);
> -    g_free(nodename);
> +
> +    riscv_socket_fdt_write_distance_matrix(mc, fdt);
>
>      for (i = 0; i < VIRTIO_COUNT; i++) {
> -        nodename = g_strdup_printf("/virtio_mmio@%lx",
> +        name = g_strdup_printf("/soc/virtio_mmio@%lx",
>              (long)(memmap[VIRT_VIRTIO].base + i * memmap[VIRT_VIRTIO].size));
> -        qemu_fdt_add_subnode(fdt, nodename);
> -        qemu_fdt_setprop_string(fdt, nodename, "compatible", "virtio,mmio");
> -        qemu_fdt_setprop_cells(fdt, nodename, "reg",
> +        qemu_fdt_add_subnode(fdt, name);
> +        qemu_fdt_setprop_string(fdt, name, "compatible", "virtio,mmio");
> +        qemu_fdt_setprop_cells(fdt, name, "reg",
>              0x0, memmap[VIRT_VIRTIO].base + i * memmap[VIRT_VIRTIO].size,
>              0x0, memmap[VIRT_VIRTIO].size);
> -        qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
> -        qemu_fdt_setprop_cell(fdt, nodename, "interrupts", VIRTIO_IRQ + i);
> -        g_free(nodename);
> +        qemu_fdt_setprop_cell(fdt, name, "interrupt-parent",
> +            plic_virtio_phandle);
> +        qemu_fdt_setprop_cell(fdt, name, "interrupts", VIRTIO_IRQ + i);
> +        g_free(name);
>      }
>
> -    nodename = g_strdup_printf("/soc/pci@%lx",
> +    name = g_strdup_printf("/soc/pci@%lx",
>          (long) memmap[VIRT_PCIE_ECAM].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_cell(fdt, nodename, "#address-cells",
> -                          FDT_PCI_ADDR_CELLS);
> -    qemu_fdt_setprop_cell(fdt, nodename, "#interrupt-cells",
> -                          FDT_PCI_INT_CELLS);
> -    qemu_fdt_setprop_cell(fdt, nodename, "#size-cells", 0x2);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible",
> -                            "pci-host-ecam-generic");
> -    qemu_fdt_setprop_string(fdt, nodename, "device_type", "pci");
> -    qemu_fdt_setprop_cell(fdt, nodename, "linux,pci-domain", 0);
> -    qemu_fdt_setprop_cells(fdt, nodename, "bus-range", 0,
> -                           memmap[VIRT_PCIE_ECAM].size /
> -                               PCIE_MMCFG_SIZE_MIN - 1);
> -    qemu_fdt_setprop(fdt, nodename, "dma-coherent", NULL, 0);
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg", 0, memmap[VIRT_PCIE_ECAM].base,
> -                           0, memmap[VIRT_PCIE_ECAM].size);
> -    qemu_fdt_setprop_sized_cells(fdt, nodename, "ranges",
> +    qemu_fdt_add_subnode(fdt, name);
> +    qemu_fdt_setprop_cell(fdt, name, "#address-cells", FDT_PCI_ADDR_CELLS);
> +    qemu_fdt_setprop_cell(fdt, name, "#interrupt-cells", FDT_PCI_INT_CELLS);
> +    qemu_fdt_setprop_cell(fdt, name, "#size-cells", 0x2);
> +    qemu_fdt_setprop_string(fdt, name, "compatible", "pci-host-ecam-generic");
> +    qemu_fdt_setprop_string(fdt, name, "device_type", "pci");
> +    qemu_fdt_setprop_cell(fdt, name, "linux,pci-domain", 0);
> +    qemu_fdt_setprop_cells(fdt, name, "bus-range", 0,
> +        memmap[VIRT_PCIE_ECAM].size / PCIE_MMCFG_SIZE_MIN - 1);
> +    qemu_fdt_setprop(fdt, name, "dma-coherent", NULL, 0);
> +    qemu_fdt_setprop_cells(fdt, name, "reg", 0,
> +        memmap[VIRT_PCIE_ECAM].base, 0, memmap[VIRT_PCIE_ECAM].size);
> +    qemu_fdt_setprop_sized_cells(fdt, name, "ranges",
>          1, FDT_PCI_RANGE_IOPORT, 2, 0,
>          2, memmap[VIRT_PCIE_PIO].base, 2, memmap[VIRT_PCIE_PIO].size,
>          1, FDT_PCI_RANGE_MMIO,
>          2, memmap[VIRT_PCIE_MMIO].base,
>          2, memmap[VIRT_PCIE_MMIO].base, 2, memmap[VIRT_PCIE_MMIO].size);
> -    create_pcie_irq_map(fdt, nodename, plic_phandle);
> -    g_free(nodename);
> +    create_pcie_irq_map(fdt, name, plic_pcie_phandle);
> +    g_free(name);
>
>      test_phandle = phandle++;
> -    nodename = g_strdup_printf("/test@%lx",
> +    name = g_strdup_printf("/soc/test@%lx",
>          (long)memmap[VIRT_TEST].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> +    qemu_fdt_add_subnode(fdt, name);
>      {
>          const char compat[] = "sifive,test1\0sifive,test0\0syscon";
> -        qemu_fdt_setprop(fdt, nodename, "compatible", compat, sizeof(compat));
> +        qemu_fdt_setprop(fdt, name, "compatible", compat, sizeof(compat));
>      }
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> +    qemu_fdt_setprop_cells(fdt, name, "reg",
>          0x0, memmap[VIRT_TEST].base,
>          0x0, memmap[VIRT_TEST].size);
> -    qemu_fdt_setprop_cell(fdt, nodename, "phandle", test_phandle);
> -    test_phandle = qemu_fdt_get_phandle(fdt, nodename);
> -    g_free(nodename);
> -
> -    nodename = g_strdup_printf("/reboot");
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "syscon-reboot");
> -    qemu_fdt_setprop_cell(fdt, nodename, "regmap", test_phandle);
> -    qemu_fdt_setprop_cell(fdt, nodename, "offset", 0x0);
> -    qemu_fdt_setprop_cell(fdt, nodename, "value", FINISHER_RESET);
> -    g_free(nodename);
> -
> -    nodename = g_strdup_printf("/poweroff");
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "syscon-poweroff");
> -    qemu_fdt_setprop_cell(fdt, nodename, "regmap", test_phandle);
> -    qemu_fdt_setprop_cell(fdt, nodename, "offset", 0x0);
> -    qemu_fdt_setprop_cell(fdt, nodename, "value", FINISHER_PASS);
> -    g_free(nodename);
> -
> -    nodename = g_strdup_printf("/uart@%lx",
> -        (long)memmap[VIRT_UART0].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "ns16550a");
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> +    qemu_fdt_setprop_cell(fdt, name, "phandle", test_phandle);
> +    test_phandle = qemu_fdt_get_phandle(fdt, name);
> +    g_free(name);
> +
> +    name = g_strdup_printf("/soc/reboot");
> +    qemu_fdt_add_subnode(fdt, name);
> +    qemu_fdt_setprop_string(fdt, name, "compatible", "syscon-reboot");
> +    qemu_fdt_setprop_cell(fdt, name, "regmap", test_phandle);
> +    qemu_fdt_setprop_cell(fdt, name, "offset", 0x0);
> +    qemu_fdt_setprop_cell(fdt, name, "value", FINISHER_RESET);
> +    g_free(name);
> +
> +    name = g_strdup_printf("/soc/poweroff");
> +    qemu_fdt_add_subnode(fdt, name);
> +    qemu_fdt_setprop_string(fdt, name, "compatible", "syscon-poweroff");
> +    qemu_fdt_setprop_cell(fdt, name, "regmap", test_phandle);
> +    qemu_fdt_setprop_cell(fdt, name, "offset", 0x0);
> +    qemu_fdt_setprop_cell(fdt, name, "value", FINISHER_PASS);
> +    g_free(name);
> +
> +    name = g_strdup_printf("/soc/uart@%lx", (long)memmap[VIRT_UART0].base);
> +    qemu_fdt_add_subnode(fdt, name);
> +    qemu_fdt_setprop_string(fdt, name, "compatible", "ns16550a");
> +    qemu_fdt_setprop_cells(fdt, name, "reg",
>          0x0, memmap[VIRT_UART0].base,
>          0x0, memmap[VIRT_UART0].size);
> -    qemu_fdt_setprop_cell(fdt, nodename, "clock-frequency", 3686400);
> -    qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
> -    qemu_fdt_setprop_cell(fdt, nodename, "interrupts", UART0_IRQ);
> +    qemu_fdt_setprop_cell(fdt, name, "clock-frequency", 3686400);
> +    qemu_fdt_setprop_cell(fdt, name, "interrupt-parent", plic_mmio_phandle);
> +    qemu_fdt_setprop_cell(fdt, name, "interrupts", UART0_IRQ);
>
>      qemu_fdt_add_subnode(fdt, "/chosen");
> -    qemu_fdt_setprop_string(fdt, "/chosen", "stdout-path", nodename);
> +    qemu_fdt_setprop_string(fdt, "/chosen", "stdout-path", name);
>      if (cmdline) {
>          qemu_fdt_setprop_string(fdt, "/chosen", "bootargs", cmdline);
>      }
> -    g_free(nodename);
> -
> -    nodename = g_strdup_printf("/rtc@%lx",
> -        (long)memmap[VIRT_RTC].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible",
> -        "google,goldfish-rtc");
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> +    g_free(name);
> +
> +    name = g_strdup_printf("/soc/rtc@%lx", (long)memmap[VIRT_RTC].base);
> +    qemu_fdt_add_subnode(fdt, name);
> +    qemu_fdt_setprop_string(fdt, name, "compatible", "google,goldfish-rtc");
> +    qemu_fdt_setprop_cells(fdt, name, "reg",
>          0x0, memmap[VIRT_RTC].base,
>          0x0, memmap[VIRT_RTC].size);
> -    qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
> -    qemu_fdt_setprop_cell(fdt, nodename, "interrupts", RTC_IRQ);
> -    g_free(nodename);
> -
> -    nodename = g_strdup_printf("/flash@%" PRIx64, flashbase);
> -    qemu_fdt_add_subnode(s->fdt, nodename);
> -    qemu_fdt_setprop_string(s->fdt, nodename, "compatible", "cfi-flash");
> -    qemu_fdt_setprop_sized_cells(s->fdt, nodename, "reg",
> +    qemu_fdt_setprop_cell(fdt, name, "interrupt-parent", plic_mmio_phandle);
> +    qemu_fdt_setprop_cell(fdt, name, "interrupts", RTC_IRQ);
> +    g_free(name);
> +
> +    name = g_strdup_printf("/soc/flash@%" PRIx64, flashbase);
> +    qemu_fdt_add_subnode(s->fdt, name);
> +    qemu_fdt_setprop_string(s->fdt, name, "compatible", "cfi-flash");
> +    qemu_fdt_setprop_sized_cells(s->fdt, name, "reg",
>                                   2, flashbase, 2, flashsize,
>                                   2, flashbase + flashsize, 2, flashsize);
> -    qemu_fdt_setprop_cell(s->fdt, nodename, "bank-width", 4);
> -    g_free(nodename);
> +    qemu_fdt_setprop_cell(s->fdt, name, "bank-width", 4);
> +    g_free(name);
>  }
>
> -
>  static inline DeviceState *gpex_pcie_init(MemoryRegion *sys_mem,
>                                            hwaddr ecam_base, hwaddr ecam_size,
>                                            hwaddr mmio_base, hwaddr mmio_size,
> @@ -478,21 +493,100 @@ static void riscv_virt_board_init(MachineState *machine)
>      MemoryRegion *system_memory = get_system_memory();
>      MemoryRegion *main_mem = g_new(MemoryRegion, 1);
>      MemoryRegion *mask_rom = g_new(MemoryRegion, 1);
> -    char *plic_hart_config;
> +    char *plic_hart_config, *soc_name;
>      size_t plic_hart_config_len;
>      target_ulong start_addr = memmap[VIRT_DRAM].base;
> -    int i;
> -    unsigned int smp_cpus = machine->smp.cpus;
> -
> -    /* Initialize SOC */
> -    object_initialize_child(OBJECT(machine), "soc", &s->soc, sizeof(s->soc),
> -                            TYPE_RISCV_HART_ARRAY, &error_abort, NULL);
> -    object_property_set_str(OBJECT(&s->soc), machine->cpu_type, "cpu-type",
> -                            &error_abort);
> -    object_property_set_int(OBJECT(&s->soc), smp_cpus, "num-harts",
> -                            &error_abort);
> -    object_property_set_bool(OBJECT(&s->soc), true, "realized",
> -                            &error_abort);
> +    DeviceState *mmio_plic, *virtio_plic, *pcie_plic;
> +    int i, j, base_hartid, hart_count;
> +
> +    /* Check socket count limit */
> +    if (VIRT_SOCKETS_MAX < riscv_socket_count(machine)) {
> +        error_report("number of sockets/nodes should be less than %d",
> +            VIRT_SOCKETS_MAX);
> +        exit(1);
> +    }
> +
> +    /* Initialize sockets */
> +    mmio_plic = virtio_plic = pcie_plic = NULL;
> +    for (i = 0; i < riscv_socket_count(machine); i++) {
> +        if (!riscv_socket_check_hartids(machine, i)) {
> +            error_report("discontinuous hartids in socket%d", i);
> +            exit(1);
> +        }
> +
> +        base_hartid = riscv_socket_first_hartid(machine, i);
> +        if (base_hartid < 0) {
> +            error_report("can't find hartid base for socket%d", i);
> +            exit(1);
> +        }
> +
> +        hart_count = riscv_socket_hart_count(machine, i);
> +        if (hart_count < 0) {
> +            error_report("can't find hart count for socket%d", i);
> +            exit(1);
> +        }
> +
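Each socket is realized as a single RISCVHartArrayState described only by "hartid-base" and "num-harts". A socket's harts must therefore occupy consecutive hartids, which is why the loop rejects discontinuous hartids. The riscv_socket_* helpers are defined elsewhere in the series. The following is a hypothetical array-based sketch of the same three checks, where node_of_cpu[i] is the NUMA node of hart i. The test data is Example2's mapping plus an interleaved mapping that the check should reject.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for riscv_socket_first_hartid(). */
int first_hartid(const int *node_of_cpu, int ncpus, int node)
{
    for (int i = 0; i < ncpus; i++) {
        if (node_of_cpu[i] == node) {
            return i;
        }
    }
    return -1;                      /* node has no harts */
}

/* Hypothetical stand-in for riscv_socket_hart_count(). */
int hart_count(const int *node_of_cpu, int ncpus, int node)
{
    int count = 0;

    for (int i = 0; i < ncpus; i++) {
        count += (node_of_cpu[i] == node);
    }
    return count > 0 ? count : -1;
}

/* Hypothetical stand-in for riscv_socket_check_hartids(). */
bool hartids_contiguous(const int *node_of_cpu, int ncpus, int node)
{
    int first = first_hartid(node_of_cpu, ncpus, node);
    int count = hart_count(node_of_cpu, ncpus, node);

    if (first < 0 || count < 0) {
        return false;
    }
    /* Every hart in [first, first + count) must belong to this node. */
    for (int i = first; i < first + count; i++) {
        if (node_of_cpu[i] != node) {
            return false;
        }
    }
    return true;
}
```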
> +        soc_name = g_strdup_printf("soc%d", i);
> +        object_initialize_child(OBJECT(machine), soc_name, &s->soc[i],
> +            sizeof(s->soc[i]), TYPE_RISCV_HART_ARRAY, &error_abort, NULL);
> +        g_free(soc_name);
> +        object_property_set_str(OBJECT(&s->soc[i]),
> +            machine->cpu_type, "cpu-type", &error_abort);
> +        object_property_set_int(OBJECT(&s->soc[i]),
> +            base_hartid, "hartid-base", &error_abort);
> +        object_property_set_int(OBJECT(&s->soc[i]),
> +            hart_count, "num-harts", &error_abort);
> +        object_property_set_bool(OBJECT(&s->soc[i]),
> +            true, "realized", &error_abort);
> +
> +        /* Per-socket CLINT */
> +        sifive_clint_create(
> +            memmap[VIRT_CLINT].base + i * memmap[VIRT_CLINT].size,
> +            memmap[VIRT_CLINT].size, base_hartid, hart_count,
> +            SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE, true);
> +
> +        /* Per-socket PLIC hart topology configuration string */
> +        plic_hart_config_len =
> +            (strlen(VIRT_PLIC_HART_CONFIG) + 1) * hart_count;
> +        plic_hart_config = g_malloc0(plic_hart_config_len);
> +        for (j = 0; j < hart_count; j++) {
> +            if (j != 0) {
> +                strncat(plic_hart_config, ",", plic_hart_config_len);
> +            }
> +            strncat(plic_hart_config, VIRT_PLIC_HART_CONFIG,
> +                plic_hart_config_len);
> +            plic_hart_config_len -= (strlen(VIRT_PLIC_HART_CONFIG) + 1);
> +        }
> +
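The loop above builds a PLIC hart configuration string with one comma-separated entry per hart of the current socket. The buffer size (strlen(VIRT_PLIC_HART_CONFIG) + 1) * hart_count is exact: each entry needs one byte for its separator, and the last entry's separator byte holds the terminating NUL. The sketch below performs the same construction with explicit pointer arithmetic instead of strncat(), whose third argument bounds the bytes copied from the source rather than the space left in the destination. It assumes VIRT_PLIC_HART_CONFIG is "MS", consistent with the two (M and S) contexts per hart used elsewhere in this patch. build_plic_hart_config is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed value; the real VIRT_PLIC_HART_CONFIG is defined in virt.h. */
#define HART_CONFIG "MS"

/* Build "MS,MS,...,MS" with one entry per hart; the caller frees it. */
char *build_plic_hart_config(int hart_count)
{
    size_t entry = strlen(HART_CONFIG);
    size_t len = (entry + 1) * (size_t)hart_count;
    char *buf;
    char *p;

    assert(hart_count > 0);
    buf = malloc(len);
    assert(buf != NULL);
    p = buf;
    for (int j = 0; j < hart_count; j++) {
        if (j != 0) {
            *p++ = ',';             /* separator before every entry but the first */
        }
        memcpy(p, HART_CONFIG, entry);
        p += entry;
    }
    *p = '\0';                      /* occupies the final spare byte */
    return buf;
}
```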
> +        /* Per-socket PLIC */
> +        s->plic[i] = sifive_plic_create(
> +            memmap[VIRT_PLIC].base + i * memmap[VIRT_PLIC].size,
> +            plic_hart_config, base_hartid,
> +            VIRT_PLIC_NUM_SOURCES,
> +            VIRT_PLIC_NUM_PRIORITIES,
> +            VIRT_PLIC_PRIORITY_BASE,
> +            VIRT_PLIC_PENDING_BASE,
> +            VIRT_PLIC_ENABLE_BASE,
> +            VIRT_PLIC_ENABLE_STRIDE,
> +            VIRT_PLIC_CONTEXT_BASE,
> +            VIRT_PLIC_CONTEXT_STRIDE,
> +            memmap[VIRT_PLIC].size);
> +        g_free(plic_hart_config);
> +
> +        /* Try to use different PLIC instance based device type */

Why do we have different types of PLICs?
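The cascade quoted below does not seem to create different PLIC types. Every socket gets an identical PLIC, and the quoted comment appears to mean "use a different PLIC instance for each device type". Under that reading, the cascade only decides which socket's PLIC each device class is wired to. MMIO devices (UART0, RTC) stay on socket 0. VirtIO moves to socket 1 and PCIe to socket 2 when those sockets exist; otherwise both fall back to the highest socket present. The same selection is restated below as a closed-form sketch. choose_plics and struct plic_choice are hypothetical names.

```c
#include <assert.h>

/* Which socket's PLIC each device class is wired to. */
struct plic_choice {
    int mmio;    /* UART0, RTC */
    int virtio;  /* virtio-mmio transports */
    int pcie;    /* GPEX PCIe host */
};

/* Closed-form equivalent of the i == 0 / i == 1 / i == 2 cascade. */
struct plic_choice choose_plics(int socket_count)
{
    struct plic_choice c;
    int last = socket_count - 1;

    assert(socket_count >= 1);
    c.mmio = 0;
    c.virtio = last < 1 ? last : 1;
    c.pcie = last < 2 ? last : 2;
    return c;
}
```

The FDT side (plic_mmio_phandle, plic_virtio_phandle, plic_pcie_phandle in create_fdt()) follows the same pattern, so the two must be kept in sync.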

> +        if (i == 0) {
> +            mmio_plic = s->plic[i];
> +            virtio_plic = s->plic[i];
> +            pcie_plic = s->plic[i];
> +        }
> +        if (i == 1) {
> +            virtio_plic = s->plic[i];
> +            pcie_plic = s->plic[i];
> +        }
> +        if (i == 2) {
> +            pcie_plic = s->plic[i];
> +        }
> +    }
>
>      /* register system main memory (actual RAM) */
>      memory_region_init_ram(main_mem, NULL, "riscv_virt_board.ram",
> @@ -571,38 +665,14 @@ static void riscv_virt_board_init(MachineState *machine)
>                            memmap[VIRT_MROM].base + sizeof(reset_vec),
>                            &address_space_memory);
>
> -    /* create PLIC hart topology configuration string */
> -    plic_hart_config_len = (strlen(VIRT_PLIC_HART_CONFIG) + 1) * smp_cpus;
> -    plic_hart_config = g_malloc0(plic_hart_config_len);
> -    for (i = 0; i < smp_cpus; i++) {
> -        if (i != 0) {
> -            strncat(plic_hart_config, ",", plic_hart_config_len);
> -        }
> -        strncat(plic_hart_config, VIRT_PLIC_HART_CONFIG, plic_hart_config_len);
> -        plic_hart_config_len -= (strlen(VIRT_PLIC_HART_CONFIG) + 1);
> -    }
> -
> -    /* MMIO */
> -    s->plic = sifive_plic_create(memmap[VIRT_PLIC].base,
> -        plic_hart_config, 0,
> -        VIRT_PLIC_NUM_SOURCES,
> -        VIRT_PLIC_NUM_PRIORITIES,
> -        VIRT_PLIC_PRIORITY_BASE,
> -        VIRT_PLIC_PENDING_BASE,
> -        VIRT_PLIC_ENABLE_BASE,
> -        VIRT_PLIC_ENABLE_STRIDE,
> -        VIRT_PLIC_CONTEXT_BASE,
> -        VIRT_PLIC_CONTEXT_STRIDE,
> -        memmap[VIRT_PLIC].size);
> -    sifive_clint_create(memmap[VIRT_CLINT].base,
> -        memmap[VIRT_CLINT].size, 0, smp_cpus,
> -        SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE, true);
> +    /* SiFive Test MMIO device */
>      sifive_test_create(memmap[VIRT_TEST].base);
>
> +    /* VirtIO MMIO devices */
>      for (i = 0; i < VIRTIO_COUNT; i++) {
>          sysbus_create_simple("virtio-mmio",
>              memmap[VIRT_VIRTIO].base + i * memmap[VIRT_VIRTIO].size,
> -            qdev_get_gpio_in(DEVICE(s->plic), VIRTIO_IRQ + i));
> +            qdev_get_gpio_in(DEVICE(virtio_plic), VIRTIO_IRQ + i));
>      }
>
>      gpex_pcie_init(system_memory,
> @@ -611,14 +681,14 @@ static void riscv_virt_board_init(MachineState *machine)
>                           memmap[VIRT_PCIE_MMIO].base,
>                           memmap[VIRT_PCIE_MMIO].size,
>                           memmap[VIRT_PCIE_PIO].base,
> -                         DEVICE(s->plic), true);
> +                         DEVICE(pcie_plic), true);
>
>      serial_mm_init(system_memory, memmap[VIRT_UART0].base,
> -        0, qdev_get_gpio_in(DEVICE(s->plic), UART0_IRQ), 399193,
> +        0, qdev_get_gpio_in(DEVICE(mmio_plic), UART0_IRQ), 399193,
>          serial_hd(0), DEVICE_LITTLE_ENDIAN);
>
>      sysbus_create_simple("goldfish_rtc", memmap[VIRT_RTC].base,
> -        qdev_get_gpio_in(DEVICE(s->plic), RTC_IRQ));
> +        qdev_get_gpio_in(DEVICE(mmio_plic), RTC_IRQ));
>
>      virt_flash_create(s);
>
> @@ -628,8 +698,6 @@ static void riscv_virt_board_init(MachineState *machine)
>                                    drive_get(IF_PFLASH, 0, i));
>      }
>      virt_flash_map(s, system_memory);
> -
> -    g_free(plic_hart_config);
>  }
>
>  static void riscv_virt_machine_instance_init(Object *obj)
> @@ -642,9 +710,13 @@ static void riscv_virt_machine_class_init(ObjectClass *oc, void *data)
>
>      mc->desc = "RISC-V VirtIO board";
>      mc->init = riscv_virt_board_init;
> -    mc->max_cpus = 8;
> +    mc->max_cpus = VIRT_CPUS_MAX;
>      mc->default_cpu_type = VIRT_CPU;
>      mc->pci_allow_0_address = true;
> +    mc->possible_cpu_arch_ids = riscv_numa_possible_cpu_arch_ids;
> +    mc->cpu_index_to_instance_props = riscv_numa_cpu_index_to_props;
> +    mc->get_default_cpu_node_id = riscv_numa_get_default_cpu_node_id;
> +    mc->numa_mem_supported = true;
>  }
>
>  static const TypeInfo riscv_virt_machine_typeinfo = {
> diff --git a/include/hw/riscv/virt.h b/include/hw/riscv/virt.h
> index e69355efaf..1beacd7666 100644
> --- a/include/hw/riscv/virt.h
> +++ b/include/hw/riscv/virt.h
> @@ -23,6 +23,9 @@
>  #include "hw/sysbus.h"
>  #include "hw/block/flash.h"
>
> +#define VIRT_CPUS_MAX 8
> +#define VIRT_SOCKETS_MAX 8
> +
>  #define TYPE_RISCV_VIRT_MACHINE MACHINE_TYPE_NAME("virt")
>  #define RISCV_VIRT_MACHINE(obj) \
>      OBJECT_CHECK(RISCVVirtState, (obj), TYPE_RISCV_VIRT_MACHINE)
> @@ -32,8 +35,8 @@ typedef struct {
>      MachineState parent;
>
>      /*< public >*/
> -    RISCVHartArrayState soc;
> -    DeviceState *plic;
> +    RISCVHartArrayState soc[VIRT_SOCKETS_MAX];
> +    DeviceState *plic[VIRT_SOCKETS_MAX];
>      PFlashCFI01 *flash[2];
>
>      void *fdt;
> @@ -74,6 +77,8 @@ enum {
>  #define VIRT_PLIC_ENABLE_STRIDE 0x80
>  #define VIRT_PLIC_CONTEXT_BASE 0x200000
>  #define VIRT_PLIC_CONTEXT_STRIDE 0x1000
> +#define VIRT_PLIC_SIZE(__num_context) \
> +    (VIRT_PLIC_CONTEXT_BASE + (__num_context) * VIRT_PLIC_CONTEXT_STRIDE)
>
>  #define FDT_PCI_ADDR_CELLS    3
>  #define FDT_PCI_INT_CELLS     1
> --
> 2.25.1
>
>



* Re: [PATCH v5 5/5] hw/riscv: virt: Allow creating multiple NUMA sockets
@ 2020-06-10 23:24     ` Alistair Francis
From: Alistair Francis @ 2020-06-10 23:24 UTC
  To: Anup Patel
  Cc: Peter Maydell, Palmer Dabbelt, Alistair Francis,
	Sagar Karandikar, Atish Patra, open list:RISC-V,
	qemu-devel@nongnu.org Developers, Anup Patel

On Fri, May 29, 2020 at 4:49 AM Anup Patel <anup.patel@wdc.com> wrote:
>
> We extend the RISC-V virt machine to allow creating a multi-socket
> machine. Each RISC-V virt machine socket is a NUMA node having
> a set of HARTs, a memory instance, a CLINT instance, and a PLIC
> instance. Other devices are shared between all sockets. We also
> update the generated device tree accordingly.
>
> By default, NUMA multi-socket support is disabled for the RISC-V
> virt machine. To enable it, users can use QEMU's "-numa"
> command-line options.
>
> Example1: For two NUMA nodes with 2 CPUs each, append the following
> to the command-line options: "-smp 4 -numa node -numa node"
>
> Example2: For two NUMA nodes with 1 and 3 CPUs, append the following
> to the command-line options:
> "-smp 4 -numa node -numa node -numa cpu,node-id=0,core-id=0 \
> -numa cpu,node-id=1,core-id=1 -numa cpu,node-id=1,core-id=2 \
> -numa cpu,node-id=1,core-id=3"
>
> The maximum number of sockets in a RISC-V virt machine is 8,
> but this limit can be changed in the future.
>
> Signed-off-by: Anup Patel <anup.patel@wdc.com>
> ---
>  hw/riscv/virt.c         | 530 +++++++++++++++++++++++-----------------
>  include/hw/riscv/virt.h |   9 +-
>  2 files changed, 308 insertions(+), 231 deletions(-)
>
> diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
> index 421815081d..2863b42cea 100644
> --- a/hw/riscv/virt.c
> +++ b/hw/riscv/virt.c
> @@ -35,6 +35,7 @@
>  #include "hw/riscv/sifive_test.h"
>  #include "hw/riscv/virt.h"
>  #include "hw/riscv/boot.h"
> +#include "hw/riscv/numa.h"
>  #include "chardev/char.h"
>  #include "sysemu/arch_init.h"
>  #include "sysemu/device_tree.h"
> @@ -60,7 +61,7 @@ static const struct MemmapEntry {
>      [VIRT_TEST] =        {   0x100000,        0x1000 },
>      [VIRT_RTC] =         {   0x101000,        0x1000 },
>      [VIRT_CLINT] =       {  0x2000000,       0x10000 },
> -    [VIRT_PLIC] =        {  0xc000000,     0x4000000 },
> +    [VIRT_PLIC] =        {  0xc000000, VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2) },
>      [VIRT_UART0] =       { 0x10000000,         0x100 },
>      [VIRT_VIRTIO] =      { 0x10001000,        0x1000 },
>      [VIRT_FLASH] =       { 0x20000000,     0x4000000 },
> @@ -182,10 +183,17 @@ static void create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
>      uint64_t mem_size, const char *cmdline)
>  {
>      void *fdt;
> -    int cpu, i;
> -    uint32_t *cells;
> -    char *nodename;
> -    uint32_t plic_phandle, test_phandle, phandle = 1;
> +    int i, cpu, socket;
> +    MachineState *mc = MACHINE(s);
> +    uint64_t addr, size;
> +    uint32_t *clint_cells, *plic_cells;
> +    unsigned long clint_addr, plic_addr;
> +    uint32_t plic_phandle[MAX_NODES];
> +    uint32_t cpu_phandle, intc_phandle, test_phandle;
> +    uint32_t phandle = 1, plic_mmio_phandle = 1;
> +    uint32_t plic_pcie_phandle = 1, plic_virtio_phandle = 1;
> +    char *mem_name, *cpu_name, *core_name, *intc_name;
> +    char *name, *clint_name, *plic_name, *clust_name;
>      hwaddr flashsize = virt_memmap[VIRT_FLASH].size / 2;
>      hwaddr flashbase = virt_memmap[VIRT_FLASH].base;
>
> @@ -206,231 +214,238 @@ static void create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
>      qemu_fdt_setprop_cell(fdt, "/soc", "#size-cells", 0x2);
>      qemu_fdt_setprop_cell(fdt, "/soc", "#address-cells", 0x2);
>
> -    nodename = g_strdup_printf("/memory@%lx",
> -        (long)memmap[VIRT_DRAM].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> -        memmap[VIRT_DRAM].base >> 32, memmap[VIRT_DRAM].base,
> -        mem_size >> 32, mem_size);
> -    qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
> -    g_free(nodename);
> -
>      qemu_fdt_add_subnode(fdt, "/cpus");
>      qemu_fdt_setprop_cell(fdt, "/cpus", "timebase-frequency",
>                            SIFIVE_CLINT_TIMEBASE_FREQ);
>      qemu_fdt_setprop_cell(fdt, "/cpus", "#size-cells", 0x0);
>      qemu_fdt_setprop_cell(fdt, "/cpus", "#address-cells", 0x1);
> +    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map");

I'm no expert on cpu-map. Do you mind CCing Atish on the next
version to see if he can ack these DT changes?

> +
> +    for (socket = (riscv_socket_count(mc) - 1); socket >= 0; socket--) {
> +        clust_name = g_strdup_printf("/cpus/cpu-map/cluster%d", socket);
> +        qemu_fdt_add_subnode(fdt, clust_name);
> +
> +        plic_cells = g_new0(uint32_t, s->soc[socket].num_harts * 4);
> +        clint_cells = g_new0(uint32_t, s->soc[socket].num_harts * 4);
> +
> +        for (cpu = s->soc[socket].num_harts - 1; cpu >= 0; cpu--) {
> +            cpu_phandle = phandle++;
>
> -    for (cpu = s->soc.num_harts - 1; cpu >= 0; cpu--) {
> -        int cpu_phandle = phandle++;
> -        int intc_phandle;
> -        nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
> -        char *intc = g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
> -        char *isa = riscv_isa_string(&s->soc.harts[cpu]);
> -        qemu_fdt_add_subnode(fdt, nodename);
> +            cpu_name = g_strdup_printf("/cpus/cpu@%d",
> +                s->soc[socket].hartid_base + cpu);
> +            qemu_fdt_add_subnode(fdt, cpu_name);
>  #if defined(TARGET_RISCV32)
> -        qemu_fdt_setprop_string(fdt, nodename, "mmu-type", "riscv,sv32");
> +            qemu_fdt_setprop_string(fdt, cpu_name, "mmu-type", "riscv,sv32");
>  #else
> -        qemu_fdt_setprop_string(fdt, nodename, "mmu-type", "riscv,sv48");
> +            qemu_fdt_setprop_string(fdt, cpu_name, "mmu-type", "riscv,sv48");
>  #endif
> -        qemu_fdt_setprop_string(fdt, nodename, "riscv,isa", isa);
> -        qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv");
> -        qemu_fdt_setprop_string(fdt, nodename, "status", "okay");
> -        qemu_fdt_setprop_cell(fdt, nodename, "reg", cpu);
> -        qemu_fdt_setprop_string(fdt, nodename, "device_type", "cpu");
> -        qemu_fdt_setprop_cell(fdt, nodename, "phandle", cpu_phandle);
> -        intc_phandle = phandle++;
> -        qemu_fdt_add_subnode(fdt, intc);
> -        qemu_fdt_setprop_cell(fdt, intc, "phandle", intc_phandle);
> -        qemu_fdt_setprop_string(fdt, intc, "compatible", "riscv,cpu-intc");
> -        qemu_fdt_setprop(fdt, intc, "interrupt-controller", NULL, 0);
> -        qemu_fdt_setprop_cell(fdt, intc, "#interrupt-cells", 1);
> -        g_free(isa);
> -        g_free(intc);
> -        g_free(nodename);
> -    }
> +            name = riscv_isa_string(&s->soc[socket].harts[cpu]);
> +            qemu_fdt_setprop_string(fdt, cpu_name, "riscv,isa", name);
> +            g_free(name);
> +            qemu_fdt_setprop_string(fdt, cpu_name, "compatible", "riscv");
> +            qemu_fdt_setprop_string(fdt, cpu_name, "status", "okay");
> +            qemu_fdt_setprop_cell(fdt, cpu_name, "reg",
> +                s->soc[socket].hartid_base + cpu);
> +            qemu_fdt_setprop_string(fdt, cpu_name, "device_type", "cpu");
> +            riscv_socket_fdt_write_id(mc, fdt, cpu_name, socket);
> +            qemu_fdt_setprop_cell(fdt, cpu_name, "phandle", cpu_phandle);
> +
> +            intc_name = g_strdup_printf("%s/interrupt-controller", cpu_name);
> +            qemu_fdt_add_subnode(fdt, intc_name);
> +            intc_phandle = phandle++;
> +            qemu_fdt_setprop_cell(fdt, intc_name, "phandle", intc_phandle);
> +            qemu_fdt_setprop_string(fdt, intc_name, "compatible",
> +                "riscv,cpu-intc");
> +            qemu_fdt_setprop(fdt, intc_name, "interrupt-controller", NULL, 0);
> +            qemu_fdt_setprop_cell(fdt, intc_name, "#interrupt-cells", 1);
> +
> +            clint_cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> +            clint_cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_SOFT);
> +            clint_cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> +            clint_cells[cpu * 4 + 3] = cpu_to_be32(IRQ_M_TIMER);
> +
> +            plic_cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> +            plic_cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_EXT);
> +            plic_cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> +            plic_cells[cpu * 4 + 3] = cpu_to_be32(IRQ_S_EXT);
> +
> +            core_name = g_strdup_printf("%s/core%d", clust_name, cpu);
> +            qemu_fdt_add_subnode(fdt, core_name);
> +            qemu_fdt_setprop_cell(fdt, core_name, "cpu", cpu_phandle);
> +
> +            g_free(core_name);
> +            g_free(intc_name);
> +            g_free(cpu_name);
> +        }
>
> -    /* Add cpu-topology node */
> -    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map");
> -    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map/cluster0");
> -    for (cpu = s->soc.num_harts - 1; cpu >= 0; cpu--) {
> -        char *core_nodename = g_strdup_printf("/cpus/cpu-map/cluster0/core%d",
> -                                              cpu);
> -        char *cpu_nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
> -        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, cpu_nodename);
> -        qemu_fdt_add_subnode(fdt, core_nodename);
> -        qemu_fdt_setprop_cell(fdt, core_nodename, "cpu", intc_phandle);
> -        g_free(core_nodename);
> -        g_free(cpu_nodename);
> +        addr = memmap[VIRT_DRAM].base + riscv_socket_mem_offset(mc, socket);
> +        size = riscv_socket_mem_size(mc, socket);
> +        mem_name = g_strdup_printf("/memory@%lx", (long)addr);
> +        qemu_fdt_add_subnode(fdt, mem_name);
> +        qemu_fdt_setprop_cells(fdt, mem_name, "reg",
> +            addr >> 32, addr, size >> 32, size);
> +        qemu_fdt_setprop_string(fdt, mem_name, "device_type", "memory");
> +        riscv_socket_fdt_write_id(mc, fdt, mem_name, socket);
> +        g_free(mem_name);
> +
> +        clint_addr = memmap[VIRT_CLINT].base +
> +            (memmap[VIRT_CLINT].size * socket);
> +        clint_name = g_strdup_printf("/soc/clint@%lx", clint_addr);
> +        qemu_fdt_add_subnode(fdt, clint_name);
> +        qemu_fdt_setprop_string(fdt, clint_name, "compatible", "riscv,clint0");
> +        qemu_fdt_setprop_cells(fdt, clint_name, "reg",
> +            0x0, clint_addr, 0x0, memmap[VIRT_CLINT].size);
> +        qemu_fdt_setprop(fdt, clint_name, "interrupts-extended",
> +            clint_cells, s->soc[socket].num_harts * sizeof(uint32_t) * 4);
> +        riscv_socket_fdt_write_id(mc, fdt, clint_name, socket);
> +        g_free(clint_name);
> +
> +        plic_phandle[socket] = phandle++;
> +        plic_addr = memmap[VIRT_PLIC].base + (memmap[VIRT_PLIC].size * socket);
> +        plic_name = g_strdup_printf("/soc/plic@%lx", plic_addr);
> +        qemu_fdt_add_subnode(fdt, plic_name);
> +        qemu_fdt_setprop_cell(fdt, plic_name,
> +            "#address-cells", FDT_PLIC_ADDR_CELLS);
> +        qemu_fdt_setprop_cell(fdt, plic_name,
> +            "#interrupt-cells", FDT_PLIC_INT_CELLS);
> +        qemu_fdt_setprop_string(fdt, plic_name, "compatible", "riscv,plic0");
> +        qemu_fdt_setprop(fdt, plic_name, "interrupt-controller", NULL, 0);
> +        qemu_fdt_setprop(fdt, plic_name, "interrupts-extended",
> +            plic_cells, s->soc[socket].num_harts * sizeof(uint32_t) * 4);
> +        qemu_fdt_setprop_cells(fdt, plic_name, "reg",
> +            0x0, plic_addr, 0x0, memmap[VIRT_PLIC].size);
> +        qemu_fdt_setprop_cell(fdt, plic_name, "riscv,ndev", VIRTIO_NDEV);
> +        riscv_socket_fdt_write_id(mc, fdt, plic_name, socket);
> +        qemu_fdt_setprop_cell(fdt, plic_name, "phandle", plic_phandle[socket]);
> +        g_free(plic_name);
> +
> +        g_free(clint_cells);
> +        g_free(plic_cells);
> +        g_free(clust_name);
>      }
>
> -    cells =  g_new0(uint32_t, s->soc.num_harts * 4);
> -    for (cpu = 0; cpu < s->soc.num_harts; cpu++) {
> -        nodename =
> -            g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
> -        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, nodename);
> -        cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> -        cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_SOFT);
> -        cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> -        cells[cpu * 4 + 3] = cpu_to_be32(IRQ_M_TIMER);
> -        g_free(nodename);
> -    }
> -    nodename = g_strdup_printf("/soc/clint@%lx",
> -        (long)memmap[VIRT_CLINT].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv,clint0");
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> -        0x0, memmap[VIRT_CLINT].base,
> -        0x0, memmap[VIRT_CLINT].size);
> -    qemu_fdt_setprop(fdt, nodename, "interrupts-extended",
> -        cells, s->soc.num_harts * sizeof(uint32_t) * 4);
> -    g_free(cells);
> -    g_free(nodename);
> -
> -    plic_phandle = phandle++;
> -    cells =  g_new0(uint32_t, s->soc.num_harts * 4);
> -    for (cpu = 0; cpu < s->soc.num_harts; cpu++) {
> -        nodename =
> -            g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
> -        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, nodename);
> -        cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> -        cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_EXT);
> -        cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> -        cells[cpu * 4 + 3] = cpu_to_be32(IRQ_S_EXT);
> -        g_free(nodename);
> +    for (socket = 0; socket < riscv_socket_count(mc); socket++) {
> +        if (socket == 0) {
> +            plic_mmio_phandle = plic_phandle[socket];
> +            plic_virtio_phandle = plic_phandle[socket];
> +            plic_pcie_phandle = plic_phandle[socket];
> +        }
> +        if (socket == 1) {
> +            plic_virtio_phandle = plic_phandle[socket];
> +            plic_pcie_phandle = plic_phandle[socket];
> +        }
> +        if (socket == 2) {
> +            plic_pcie_phandle = plic_phandle[socket];
> +        }
>      }
> -    nodename = g_strdup_printf("/soc/interrupt-controller@%lx",
> -        (long)memmap[VIRT_PLIC].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_cell(fdt, nodename, "#address-cells",
> -                          FDT_PLIC_ADDR_CELLS);
> -    qemu_fdt_setprop_cell(fdt, nodename, "#interrupt-cells",
> -                          FDT_PLIC_INT_CELLS);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv,plic0");
> -    qemu_fdt_setprop(fdt, nodename, "interrupt-controller", NULL, 0);
> -    qemu_fdt_setprop(fdt, nodename, "interrupts-extended",
> -        cells, s->soc.num_harts * sizeof(uint32_t) * 4);
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> -        0x0, memmap[VIRT_PLIC].base,
> -        0x0, memmap[VIRT_PLIC].size);
> -    qemu_fdt_setprop_cell(fdt, nodename, "riscv,ndev", VIRTIO_NDEV);
> -    qemu_fdt_setprop_cell(fdt, nodename, "phandle", plic_phandle);
> -    plic_phandle = qemu_fdt_get_phandle(fdt, nodename);
> -    g_free(cells);
> -    g_free(nodename);
> +
> +    riscv_socket_fdt_write_distance_matrix(mc, fdt);
>
>      for (i = 0; i < VIRTIO_COUNT; i++) {
> -        nodename = g_strdup_printf("/virtio_mmio@%lx",
> +        name = g_strdup_printf("/soc/virtio_mmio@%lx",
>              (long)(memmap[VIRT_VIRTIO].base + i * memmap[VIRT_VIRTIO].size));
> -        qemu_fdt_add_subnode(fdt, nodename);
> -        qemu_fdt_setprop_string(fdt, nodename, "compatible", "virtio,mmio");
> -        qemu_fdt_setprop_cells(fdt, nodename, "reg",
> +        qemu_fdt_add_subnode(fdt, name);
> +        qemu_fdt_setprop_string(fdt, name, "compatible", "virtio,mmio");
> +        qemu_fdt_setprop_cells(fdt, name, "reg",
>              0x0, memmap[VIRT_VIRTIO].base + i * memmap[VIRT_VIRTIO].size,
>              0x0, memmap[VIRT_VIRTIO].size);
> -        qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
> -        qemu_fdt_setprop_cell(fdt, nodename, "interrupts", VIRTIO_IRQ + i);
> -        g_free(nodename);
> +        qemu_fdt_setprop_cell(fdt, name, "interrupt-parent",
> +            plic_virtio_phandle);
> +        qemu_fdt_setprop_cell(fdt, name, "interrupts", VIRTIO_IRQ + i);
> +        g_free(name);
>      }
>
> -    nodename = g_strdup_printf("/soc/pci@%lx",
> +    name = g_strdup_printf("/soc/pci@%lx",
>          (long) memmap[VIRT_PCIE_ECAM].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_cell(fdt, nodename, "#address-cells",
> -                          FDT_PCI_ADDR_CELLS);
> -    qemu_fdt_setprop_cell(fdt, nodename, "#interrupt-cells",
> -                          FDT_PCI_INT_CELLS);
> -    qemu_fdt_setprop_cell(fdt, nodename, "#size-cells", 0x2);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible",
> -                            "pci-host-ecam-generic");
> -    qemu_fdt_setprop_string(fdt, nodename, "device_type", "pci");
> -    qemu_fdt_setprop_cell(fdt, nodename, "linux,pci-domain", 0);
> -    qemu_fdt_setprop_cells(fdt, nodename, "bus-range", 0,
> -                           memmap[VIRT_PCIE_ECAM].size /
> -                               PCIE_MMCFG_SIZE_MIN - 1);
> -    qemu_fdt_setprop(fdt, nodename, "dma-coherent", NULL, 0);
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg", 0, memmap[VIRT_PCIE_ECAM].base,
> -                           0, memmap[VIRT_PCIE_ECAM].size);
> -    qemu_fdt_setprop_sized_cells(fdt, nodename, "ranges",
> +    qemu_fdt_add_subnode(fdt, name);
> +    qemu_fdt_setprop_cell(fdt, name, "#address-cells", FDT_PCI_ADDR_CELLS);
> +    qemu_fdt_setprop_cell(fdt, name, "#interrupt-cells", FDT_PCI_INT_CELLS);
> +    qemu_fdt_setprop_cell(fdt, name, "#size-cells", 0x2);
> +    qemu_fdt_setprop_string(fdt, name, "compatible", "pci-host-ecam-generic");
> +    qemu_fdt_setprop_string(fdt, name, "device_type", "pci");
> +    qemu_fdt_setprop_cell(fdt, name, "linux,pci-domain", 0);
> +    qemu_fdt_setprop_cells(fdt, name, "bus-range", 0,
> +        memmap[VIRT_PCIE_ECAM].size / PCIE_MMCFG_SIZE_MIN - 1);
> +    qemu_fdt_setprop(fdt, name, "dma-coherent", NULL, 0);
> +    qemu_fdt_setprop_cells(fdt, name, "reg", 0,
> +        memmap[VIRT_PCIE_ECAM].base, 0, memmap[VIRT_PCIE_ECAM].size);
> +    qemu_fdt_setprop_sized_cells(fdt, name, "ranges",
>          1, FDT_PCI_RANGE_IOPORT, 2, 0,
>          2, memmap[VIRT_PCIE_PIO].base, 2, memmap[VIRT_PCIE_PIO].size,
>          1, FDT_PCI_RANGE_MMIO,
>          2, memmap[VIRT_PCIE_MMIO].base,
>          2, memmap[VIRT_PCIE_MMIO].base, 2, memmap[VIRT_PCIE_MMIO].size);
> -    create_pcie_irq_map(fdt, nodename, plic_phandle);
> -    g_free(nodename);
> +    create_pcie_irq_map(fdt, name, plic_pcie_phandle);
> +    g_free(name);
>
>      test_phandle = phandle++;
> -    nodename = g_strdup_printf("/test@%lx",
> +    name = g_strdup_printf("/soc/test@%lx",
>          (long)memmap[VIRT_TEST].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> +    qemu_fdt_add_subnode(fdt, name);
>      {
>          const char compat[] = "sifive,test1\0sifive,test0\0syscon";
> -        qemu_fdt_setprop(fdt, nodename, "compatible", compat, sizeof(compat));
> +        qemu_fdt_setprop(fdt, name, "compatible", compat, sizeof(compat));
>      }
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> +    qemu_fdt_setprop_cells(fdt, name, "reg",
>          0x0, memmap[VIRT_TEST].base,
>          0x0, memmap[VIRT_TEST].size);
> -    qemu_fdt_setprop_cell(fdt, nodename, "phandle", test_phandle);
> -    test_phandle = qemu_fdt_get_phandle(fdt, nodename);
> -    g_free(nodename);
> -
> -    nodename = g_strdup_printf("/reboot");
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "syscon-reboot");
> -    qemu_fdt_setprop_cell(fdt, nodename, "regmap", test_phandle);
> -    qemu_fdt_setprop_cell(fdt, nodename, "offset", 0x0);
> -    qemu_fdt_setprop_cell(fdt, nodename, "value", FINISHER_RESET);
> -    g_free(nodename);
> -
> -    nodename = g_strdup_printf("/poweroff");
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "syscon-poweroff");
> -    qemu_fdt_setprop_cell(fdt, nodename, "regmap", test_phandle);
> -    qemu_fdt_setprop_cell(fdt, nodename, "offset", 0x0);
> -    qemu_fdt_setprop_cell(fdt, nodename, "value", FINISHER_PASS);
> -    g_free(nodename);
> -
> -    nodename = g_strdup_printf("/uart@%lx",
> -        (long)memmap[VIRT_UART0].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "ns16550a");
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> +    qemu_fdt_setprop_cell(fdt, name, "phandle", test_phandle);
> +    test_phandle = qemu_fdt_get_phandle(fdt, name);
> +    g_free(name);
> +
> +    name = g_strdup_printf("/soc/reboot");
> +    qemu_fdt_add_subnode(fdt, name);
> +    qemu_fdt_setprop_string(fdt, name, "compatible", "syscon-reboot");
> +    qemu_fdt_setprop_cell(fdt, name, "regmap", test_phandle);
> +    qemu_fdt_setprop_cell(fdt, name, "offset", 0x0);
> +    qemu_fdt_setprop_cell(fdt, name, "value", FINISHER_RESET);
> +    g_free(name);
> +
> +    name = g_strdup_printf("/soc/poweroff");
> +    qemu_fdt_add_subnode(fdt, name);
> +    qemu_fdt_setprop_string(fdt, name, "compatible", "syscon-poweroff");
> +    qemu_fdt_setprop_cell(fdt, name, "regmap", test_phandle);
> +    qemu_fdt_setprop_cell(fdt, name, "offset", 0x0);
> +    qemu_fdt_setprop_cell(fdt, name, "value", FINISHER_PASS);
> +    g_free(name);
> +
> +    name = g_strdup_printf("/soc/uart@%lx", (long)memmap[VIRT_UART0].base);
> +    qemu_fdt_add_subnode(fdt, name);
> +    qemu_fdt_setprop_string(fdt, name, "compatible", "ns16550a");
> +    qemu_fdt_setprop_cells(fdt, name, "reg",
>          0x0, memmap[VIRT_UART0].base,
>          0x0, memmap[VIRT_UART0].size);
> -    qemu_fdt_setprop_cell(fdt, nodename, "clock-frequency", 3686400);
> -    qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
> -    qemu_fdt_setprop_cell(fdt, nodename, "interrupts", UART0_IRQ);
> +    qemu_fdt_setprop_cell(fdt, name, "clock-frequency", 3686400);
> +    qemu_fdt_setprop_cell(fdt, name, "interrupt-parent", plic_mmio_phandle);
> +    qemu_fdt_setprop_cell(fdt, name, "interrupts", UART0_IRQ);
>
>      qemu_fdt_add_subnode(fdt, "/chosen");
> -    qemu_fdt_setprop_string(fdt, "/chosen", "stdout-path", nodename);
> +    qemu_fdt_setprop_string(fdt, "/chosen", "stdout-path", name);
>      if (cmdline) {
>          qemu_fdt_setprop_string(fdt, "/chosen", "bootargs", cmdline);
>      }
> -    g_free(nodename);
> -
> -    nodename = g_strdup_printf("/rtc@%lx",
> -        (long)memmap[VIRT_RTC].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible",
> -        "google,goldfish-rtc");
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> +    g_free(name);
> +
> +    name = g_strdup_printf("/soc/rtc@%lx", (long)memmap[VIRT_RTC].base);
> +    qemu_fdt_add_subnode(fdt, name);
> +    qemu_fdt_setprop_string(fdt, name, "compatible", "google,goldfish-rtc");
> +    qemu_fdt_setprop_cells(fdt, name, "reg",
>          0x0, memmap[VIRT_RTC].base,
>          0x0, memmap[VIRT_RTC].size);
> -    qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
> -    qemu_fdt_setprop_cell(fdt, nodename, "interrupts", RTC_IRQ);
> -    g_free(nodename);
> -
> -    nodename = g_strdup_printf("/flash@%" PRIx64, flashbase);
> -    qemu_fdt_add_subnode(s->fdt, nodename);
> -    qemu_fdt_setprop_string(s->fdt, nodename, "compatible", "cfi-flash");
> -    qemu_fdt_setprop_sized_cells(s->fdt, nodename, "reg",
> +    qemu_fdt_setprop_cell(fdt, name, "interrupt-parent", plic_mmio_phandle);
> +    qemu_fdt_setprop_cell(fdt, name, "interrupts", RTC_IRQ);
> +    g_free(name);
> +
> +    name = g_strdup_printf("/soc/flash@%" PRIx64, flashbase);
> +    qemu_fdt_add_subnode(s->fdt, name);
> +    qemu_fdt_setprop_string(s->fdt, name, "compatible", "cfi-flash");
> +    qemu_fdt_setprop_sized_cells(s->fdt, name, "reg",
>                                   2, flashbase, 2, flashsize,
>                                   2, flashbase + flashsize, 2, flashsize);
> -    qemu_fdt_setprop_cell(s->fdt, nodename, "bank-width", 4);
> -    g_free(nodename);
> +    qemu_fdt_setprop_cell(s->fdt, name, "bank-width", 4);
> +    g_free(name);
>  }
>
> -
>  static inline DeviceState *gpex_pcie_init(MemoryRegion *sys_mem,
>                                            hwaddr ecam_base, hwaddr ecam_size,
>                                            hwaddr mmio_base, hwaddr mmio_size,
> @@ -478,21 +493,100 @@ static void riscv_virt_board_init(MachineState *machine)
>      MemoryRegion *system_memory = get_system_memory();
>      MemoryRegion *main_mem = g_new(MemoryRegion, 1);
>      MemoryRegion *mask_rom = g_new(MemoryRegion, 1);
> -    char *plic_hart_config;
> +    char *plic_hart_config, *soc_name;
>      size_t plic_hart_config_len;
>      target_ulong start_addr = memmap[VIRT_DRAM].base;
> -    int i;
> -    unsigned int smp_cpus = machine->smp.cpus;
> -
> -    /* Initialize SOC */
> -    object_initialize_child(OBJECT(machine), "soc", &s->soc, sizeof(s->soc),
> -                            TYPE_RISCV_HART_ARRAY, &error_abort, NULL);
> -    object_property_set_str(OBJECT(&s->soc), machine->cpu_type, "cpu-type",
> -                            &error_abort);
> -    object_property_set_int(OBJECT(&s->soc), smp_cpus, "num-harts",
> -                            &error_abort);
> -    object_property_set_bool(OBJECT(&s->soc), true, "realized",
> -                            &error_abort);
> +    DeviceState *mmio_plic, *virtio_plic, *pcie_plic;
> +    int i, j, base_hartid, hart_count;
> +
> +    /* Check socket count limit */
> +    if (VIRT_SOCKETS_MAX < riscv_socket_count(machine)) {
> +        error_report("number of sockets/nodes should be at most %d",
> +            VIRT_SOCKETS_MAX);
> +        exit(1);
> +    }
> +
> +    /* Initialize sockets */
> +    mmio_plic = virtio_plic = pcie_plic = NULL;
> +    for (i = 0; i < riscv_socket_count(machine); i++) {
> +        if (!riscv_socket_check_hartids(machine, i)) {
> +            error_report("discontinuous hartids in socket%d", i);
> +            exit(1);
> +        }
> +
> +        base_hartid = riscv_socket_first_hartid(machine, i);
> +        if (base_hartid < 0) {
> +            error_report("can't find hartid base for socket%d", i);
> +            exit(1);
> +        }
> +
> +        hart_count = riscv_socket_hart_count(machine, i);
> +        if (hart_count < 0) {
> +            error_report("can't find hart count for socket%d", i);
> +            exit(1);
> +        }
> +
> +        soc_name = g_strdup_printf("soc%d", i);
> +        object_initialize_child(OBJECT(machine), soc_name, &s->soc[i],
> +            sizeof(s->soc[i]), TYPE_RISCV_HART_ARRAY, &error_abort, NULL);
> +        g_free(soc_name);
> +        object_property_set_str(OBJECT(&s->soc[i]),
> +            machine->cpu_type, "cpu-type", &error_abort);
> +        object_property_set_int(OBJECT(&s->soc[i]),
> +            base_hartid, "hartid-base", &error_abort);
> +        object_property_set_int(OBJECT(&s->soc[i]),
> +            hart_count, "num-harts", &error_abort);
> +        object_property_set_bool(OBJECT(&s->soc[i]),
> +            true, "realized", &error_abort);
> +
> +        /* Per-socket CLINT */
> +        sifive_clint_create(
> +            memmap[VIRT_CLINT].base + i * memmap[VIRT_CLINT].size,
> +            memmap[VIRT_CLINT].size, base_hartid, hart_count,
> +            SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE, true);
> +
> +        /* Per-socket PLIC hart topology configuration string */
> +        plic_hart_config_len =
> +            (strlen(VIRT_PLIC_HART_CONFIG) + 1) * hart_count;
> +        plic_hart_config = g_malloc0(plic_hart_config_len);
> +        for (j = 0; j < hart_count; j++) {
> +            if (j != 0) {
> +                strncat(plic_hart_config, ",", plic_hart_config_len);
> +            }
> +            strncat(plic_hart_config, VIRT_PLIC_HART_CONFIG,
> +                plic_hart_config_len);
> +            plic_hart_config_len -= (strlen(VIRT_PLIC_HART_CONFIG) + 1);
> +        }
> +
> +        /* Per-socket PLIC */
> +        s->plic[i] = sifive_plic_create(
> +            memmap[VIRT_PLIC].base + i * memmap[VIRT_PLIC].size,
> +            plic_hart_config, base_hartid,
> +            VIRT_PLIC_NUM_SOURCES,
> +            VIRT_PLIC_NUM_PRIORITIES,
> +            VIRT_PLIC_PRIORITY_BASE,
> +            VIRT_PLIC_PENDING_BASE,
> +            VIRT_PLIC_ENABLE_BASE,
> +            VIRT_PLIC_ENABLE_STRIDE,
> +            VIRT_PLIC_CONTEXT_BASE,
> +            VIRT_PLIC_CONTEXT_STRIDE,
> +            memmap[VIRT_PLIC].size);
> +        g_free(plic_hart_config);
> +
> +        /* Try to use different PLIC instance based device type */

Why do we have different types of PLICs?
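For reference, the `if (i == 0)` / `(i == 1)` / `(i == 2)` cascade reduces to a clamp:
- MMIO devices (UART0, RTC) always use socket 0's PLIC.
- VirtIO MMIO devices use socket 1's PLIC.
- PCIe uses socket 2's PLIC.

When fewer sockets exist, each class falls back to the last socket. The minimal standalone sketch below shows that rule. The enum and function are hypothetical illustrations, not QEMU code.

```c
#include <assert.h>

/* Hypothetical device classes standing in for mmio_plic, virtio_plic, pcie_plic. */
enum plic_user {
    PLIC_USER_MMIO = 0,    /* UART0, RTC */
    PLIC_USER_VIRTIO = 1,  /* VirtIO MMIO transports */
    PLIC_USER_PCIE = 2,    /* GPEX PCIe host */
};

/*
 * Socket whose PLIC handles interrupts for @user on a machine with
 * @num_sockets sockets: the preferred socket index, clamped to the last
 * socket that exists. Equivalent to the i == 0/1/2 cascade in the loop.
 */
int plic_socket_for(enum plic_user user, int num_sockets)
{
    int preferred = (int)user;

    assert(num_sockets >= 1);
    return (preferred < num_sockets) ? preferred : num_sockets - 1;
}
```

On a two-socket machine, for example, VirtIO and PCIe interrupts both land on socket 1's PLIC, while the UART and RTC stay on socket 0's.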

> +        if (i == 0) {
> +            mmio_plic = s->plic[i];
> +            virtio_plic = s->plic[i];
> +            pcie_plic = s->plic[i];
> +        }
> +        if (i == 1) {
> +            virtio_plic = s->plic[i];
> +            pcie_plic = s->plic[i];
> +        }
> +        if (i == 2) {
> +            pcie_plic = s->plic[i];
> +        }
> +    }
>
>      /* register system main memory (actual RAM) */
>      memory_region_init_ram(main_mem, NULL, "riscv_virt_board.ram",
> @@ -571,38 +665,14 @@ static void riscv_virt_board_init(MachineState *machine)
>                            memmap[VIRT_MROM].base + sizeof(reset_vec),
>                            &address_space_memory);
>
> -    /* create PLIC hart topology configuration string */
> -    plic_hart_config_len = (strlen(VIRT_PLIC_HART_CONFIG) + 1) * smp_cpus;
> -    plic_hart_config = g_malloc0(plic_hart_config_len);
> -    for (i = 0; i < smp_cpus; i++) {
> -        if (i != 0) {
> -            strncat(plic_hart_config, ",", plic_hart_config_len);
> -        }
> -        strncat(plic_hart_config, VIRT_PLIC_HART_CONFIG, plic_hart_config_len);
> -        plic_hart_config_len -= (strlen(VIRT_PLIC_HART_CONFIG) + 1);
> -    }
> -
> -    /* MMIO */
> -    s->plic = sifive_plic_create(memmap[VIRT_PLIC].base,
> -        plic_hart_config, 0,
> -        VIRT_PLIC_NUM_SOURCES,
> -        VIRT_PLIC_NUM_PRIORITIES,
> -        VIRT_PLIC_PRIORITY_BASE,
> -        VIRT_PLIC_PENDING_BASE,
> -        VIRT_PLIC_ENABLE_BASE,
> -        VIRT_PLIC_ENABLE_STRIDE,
> -        VIRT_PLIC_CONTEXT_BASE,
> -        VIRT_PLIC_CONTEXT_STRIDE,
> -        memmap[VIRT_PLIC].size);
> -    sifive_clint_create(memmap[VIRT_CLINT].base,
> -        memmap[VIRT_CLINT].size, 0, smp_cpus,
> -        SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE, true);
> +    /* SiFive Test MMIO device */
>      sifive_test_create(memmap[VIRT_TEST].base);
>
> +    /* VirtIO MMIO devices */
>      for (i = 0; i < VIRTIO_COUNT; i++) {
>          sysbus_create_simple("virtio-mmio",
>              memmap[VIRT_VIRTIO].base + i * memmap[VIRT_VIRTIO].size,
> -            qdev_get_gpio_in(DEVICE(s->plic), VIRTIO_IRQ + i));
> +            qdev_get_gpio_in(DEVICE(virtio_plic), VIRTIO_IRQ + i));
>      }
>
>      gpex_pcie_init(system_memory,
> @@ -611,14 +681,14 @@ static void riscv_virt_board_init(MachineState *machine)
>                           memmap[VIRT_PCIE_MMIO].base,
>                           memmap[VIRT_PCIE_MMIO].size,
>                           memmap[VIRT_PCIE_PIO].base,
> -                         DEVICE(s->plic), true);
> +                         DEVICE(pcie_plic), true);
>
>      serial_mm_init(system_memory, memmap[VIRT_UART0].base,
> -        0, qdev_get_gpio_in(DEVICE(s->plic), UART0_IRQ), 399193,
> +        0, qdev_get_gpio_in(DEVICE(mmio_plic), UART0_IRQ), 399193,
>          serial_hd(0), DEVICE_LITTLE_ENDIAN);
>
>      sysbus_create_simple("goldfish_rtc", memmap[VIRT_RTC].base,
> -        qdev_get_gpio_in(DEVICE(s->plic), RTC_IRQ));
> +        qdev_get_gpio_in(DEVICE(mmio_plic), RTC_IRQ));
>
>      virt_flash_create(s);
>
> @@ -628,8 +698,6 @@ static void riscv_virt_board_init(MachineState *machine)
>                                    drive_get(IF_PFLASH, 0, i));
>      }
>      virt_flash_map(s, system_memory);
> -
> -    g_free(plic_hart_config);
>  }
>
>  static void riscv_virt_machine_instance_init(Object *obj)
> @@ -642,9 +710,13 @@ static void riscv_virt_machine_class_init(ObjectClass *oc, void *data)
>
>      mc->desc = "RISC-V VirtIO board";
>      mc->init = riscv_virt_board_init;
> -    mc->max_cpus = 8;
> +    mc->max_cpus = VIRT_CPUS_MAX;
>      mc->default_cpu_type = VIRT_CPU;
>      mc->pci_allow_0_address = true;
> +    mc->possible_cpu_arch_ids = riscv_numa_possible_cpu_arch_ids;
> +    mc->cpu_index_to_instance_props = riscv_numa_cpu_index_to_props;
> +    mc->get_default_cpu_node_id = riscv_numa_get_default_cpu_node_id;
> +    mc->numa_mem_supported = true;
>  }
>
>  static const TypeInfo riscv_virt_machine_typeinfo = {
> diff --git a/include/hw/riscv/virt.h b/include/hw/riscv/virt.h
> index e69355efaf..1beacd7666 100644
> --- a/include/hw/riscv/virt.h
> +++ b/include/hw/riscv/virt.h
> @@ -23,6 +23,9 @@
>  #include "hw/sysbus.h"
>  #include "hw/block/flash.h"
>
> +#define VIRT_CPUS_MAX 8
> +#define VIRT_SOCKETS_MAX 8
> +
>  #define TYPE_RISCV_VIRT_MACHINE MACHINE_TYPE_NAME("virt")
>  #define RISCV_VIRT_MACHINE(obj) \
>      OBJECT_CHECK(RISCVVirtState, (obj), TYPE_RISCV_VIRT_MACHINE)
> @@ -32,8 +35,8 @@ typedef struct {
>      MachineState parent;
>
>      /*< public >*/
> -    RISCVHartArrayState soc;
> -    DeviceState *plic;
> +    RISCVHartArrayState soc[VIRT_SOCKETS_MAX];
> +    DeviceState *plic[VIRT_SOCKETS_MAX];
>      PFlashCFI01 *flash[2];
>
>      void *fdt;
> @@ -74,6 +77,8 @@ enum {
>  #define VIRT_PLIC_ENABLE_STRIDE 0x80
>  #define VIRT_PLIC_CONTEXT_BASE 0x200000
>  #define VIRT_PLIC_CONTEXT_STRIDE 0x1000
> +#define VIRT_PLIC_SIZE(__num_context) \
> +    (VIRT_PLIC_CONTEXT_BASE + (__num_context) * VIRT_PLIC_CONTEXT_STRIDE)
>
>  #define FDT_PCI_ADDR_CELLS    3
>  #define FDT_PCI_INT_CELLS     1
> --
> 2.25.1
>
>
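The per-socket CLINT and PLIC instances are placed back to back:
- Socket i's CLINT is at `memmap[VIRT_CLINT].base + i * memmap[VIRT_CLINT].size`.
- Socket i's PLIC is at `memmap[VIRT_PLIC].base + i * memmap[VIRT_PLIC].size`, where the PLIC size is now `VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2)`.

The sketch below works through that arithmetic using the constants from this patch. The helper names are hypothetical. Reading `* 2` as two PLIC contexts per hart is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Constants as they appear in this patch. */
#define VIRT_CPUS_MAX            8
#define VIRT_PLIC_CONTEXT_BASE   0x200000
#define VIRT_PLIC_CONTEXT_STRIDE 0x1000
#define VIRT_PLIC_SIZE(__num_context) \
    (VIRT_PLIC_CONTEXT_BASE + (__num_context) * VIRT_PLIC_CONTEXT_STRIDE)

/* memmap[] entries for VIRT_CLINT and VIRT_PLIC after this patch. */
#define CLINT_BASE  0x2000000ULL
#define CLINT_SIZE  0x10000ULL
#define PLIC_BASE   0xc000000ULL
/* Assumption: "* 2" reserves two PLIC contexts (M and S mode) per hart. */
#define PLIC_WINDOW ((uint64_t)VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2))

/* Base address of socket @socket's CLINT (hypothetical helper). */
uint64_t socket_clint_base(int socket)
{
    assert(socket >= 0);
    return CLINT_BASE + (uint64_t)socket * CLINT_SIZE;
}

/* Base address of socket @socket's PLIC (hypothetical helper). */
uint64_t socket_plic_base(int socket)
{
    assert(socket >= 0);
    return PLIC_BASE + (uint64_t)socket * PLIC_WINDOW;
}
```

`VIRT_PLIC_SIZE(16)` evaluates to 0x200000 + 16 * 0x1000 = 0x210000. With VIRT_SOCKETS_MAX = 8, the last PLIC therefore ends at 0xd080000. That is inside the 0xc000000 to 0x10000000 range the single 0x4000000-byte PLIC used to occupy.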

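A note on the per-socket PLIC hart configuration string. strncat()'s third argument is the maximum number of characters to append from the source, not the space left in the destination. The shrinking plic_hart_config_len passed in the loop is therefore not what keeps the writes in bounds; the exact-size allocation is.

The sketch below builds the same "MS,MS,..." string, with snprintf() bounding each write by the space actually remaining. It makes three substitutions:
- It uses libc calloc()/free() instead of glib, to stay self-contained.
- `build_plic_hart_config()` is a hypothetical name.
- "MS" is an assumed value for VIRT_PLIC_HART_CONFIG.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build "<per_hart>,<per_hart>,..." with one entry per hart.
 * Returns a heap buffer the caller must free(), or NULL on failure.
 */
char *build_plic_hart_config(const char *per_hart, int hart_count)
{
    size_t entry_len = strlen(per_hart) + 1;  /* entry plus ',' or final NUL */
    size_t buf_len, used = 0;
    char *buf;
    int j;

    assert(hart_count > 0);
    buf_len = entry_len * (size_t)hart_count;
    buf = calloc(1, buf_len);
    if (!buf) {
        return NULL;
    }
    for (j = 0; j < hart_count; j++) {
        /* Bound each write by the bytes still free in buf. */
        int n = snprintf(buf + used, buf_len - used, "%s%s",
                         j ? "," : "", per_hart);
        assert(n >= 0 && (size_t)n < buf_len - used);
        used += (size_t)n;
    }
    return buf;
}
```

For example, `build_plic_hart_config("MS", 3)` returns "MS,MS,MS" in a 9-byte buffer. This matches the `(strlen(VIRT_PLIC_HART_CONFIG) + 1) * hart_count` sizing used in the patch.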

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/5] hw/riscv: Add helpers for RISC-V multi-socket NUMA machines
  2020-05-29 11:46   ` Anup Patel
@ 2020-06-10 23:28     ` Alistair Francis
  -1 siblings, 0 replies; 28+ messages in thread
From: Alistair Francis @ 2020-06-10 23:28 UTC (permalink / raw)
  To: Anup Patel
  Cc: Peter Maydell, open list:RISC-V, Sagar Karandikar, Anup Patel,
	qemu-devel@nongnu.org Developers, Atish Patra, Alistair Francis,
	Palmer Dabbelt

On Fri, May 29, 2020 at 4:48 AM Anup Patel <anup.patel@wdc.com> wrote:
>
> We add common helper routines which can be shared by RISC-V
> multi-socket NUMA machines.
>
> We have two types of helpers:
> 1. riscv_socket_xyz() - These helpers assist in managing multiple
>    sockets irrespective of whether QEMU NUMA is enabled or disabled
> 2. riscv_numa_xyz() - These helpers assist in providing the
>    necessary QEMU machine callbacks for QEMU NUMA emulation
>
> Signed-off-by: Anup Patel <anup.patel@wdc.com>
> ---
>  hw/riscv/Makefile.objs  |   1 +
>  hw/riscv/numa.c         | 242 ++++++++++++++++++++++++++++++++++++++++
>  include/hw/riscv/numa.h |  51 +++++++++
>  3 files changed, 294 insertions(+)
>  create mode 100644 hw/riscv/numa.c
>  create mode 100644 include/hw/riscv/numa.h

I don't love that we have an entire file of functions to help with
NUMA when no other arch seems to have anything this complex.

What about RISC-V requires extra complexity?
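Part of the answer may be that each socket's RISCVHartArrayState is configured with a single "hartid-base" and "num-harts". Each NUMA node therefore has to map onto one contiguous range of hart IDs. The riscv_socket_* helpers reduce to scanning the per-CPU node IDs for that range.

The minimal sketch below works over a plain array standing in for ms->possible_cpus. All names are hypothetical, and no QEMU types are used.

```c
#include <assert.h>
#include <stdbool.h>

/* Lowest hart index whose node ID is @socket, or -1 if none. */
int socket_first_hart(const int *node_of_hart, int nharts, int socket)
{
    assert(nharts >= 0);
    for (int i = 0; i < nharts; i++) {
        if (node_of_hart[i] == socket) {
            return i;
        }
    }
    return -1;
}

/* Highest hart index whose node ID is @socket, or -1 if none. */
int socket_last_hart(const int *node_of_hart, int nharts, int socket)
{
    assert(nharts >= 0);
    for (int i = nharts - 1; i >= 0; i--) {
        if (node_of_hart[i] == socket) {
            return i;
        }
    }
    return -1;
}

/* Size of the span first..last (inclusive), or -1 if the socket has no harts. */
int socket_hart_count(const int *node_of_hart, int nharts, int socket)
{
    int first = socket_first_hart(node_of_hart, nharts, socket);
    int last = socket_last_hart(node_of_hart, nharts, socket);

    return (first < 0 || last < 0) ? -1 : last - first + 1;
}

/* True only if every hart in first..last belongs to @socket. */
bool socket_harts_contiguous(const int *node_of_hart, int nharts, int socket)
{
    int first = socket_first_hart(node_of_hart, nharts, socket);
    int last = socket_last_hart(node_of_hart, nharts, socket);

    if (first < 0 || last < 0) {
        return false;
    }
    for (int i = first; i <= last; i++) {
        if (node_of_hart[i] != socket) {
            return false;
        }
    }
    return true;
}
```

For a 1 + 3 split (node IDs {0, 1, 1, 1}), socket 1 spans harts 1 to 3. For an interleaved layout such as {0, 1, 0, 1}, the span-based count for socket 0 is 3 even though only two harts belong to it. This is the case riscv_socket_check_hartids() is there to reject.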

>
> diff --git a/hw/riscv/Makefile.objs b/hw/riscv/Makefile.objs
> index fc3c6dd7c8..4483e61879 100644
> --- a/hw/riscv/Makefile.objs
> +++ b/hw/riscv/Makefile.objs
> @@ -1,4 +1,5 @@
>  obj-y += boot.o
> +obj-y += numa.o
>  obj-$(CONFIG_SPIKE) += riscv_htif.o
>  obj-$(CONFIG_HART) += riscv_hart.o
>  obj-$(CONFIG_SIFIVE_E) += sifive_e.o
> diff --git a/hw/riscv/numa.c b/hw/riscv/numa.c
> new file mode 100644
> index 0000000000..4f92307102
> --- /dev/null
> +++ b/hw/riscv/numa.c
> @@ -0,0 +1,242 @@
> +/*
> + * QEMU RISC-V NUMA Helper
> + *
> + * Copyright (c) 2020 Western Digital Corporation or its affiliates.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/units.h"
> +#include "qemu/log.h"
> +#include "qemu/error-report.h"
> +#include "qapi/error.h"
> +#include "hw/boards.h"
> +#include "hw/qdev-properties.h"
> +#include "hw/riscv/numa.h"
> +#include "sysemu/device_tree.h"
> +
> +static bool numa_enabled(const MachineState *ms)
> +{
> +    return (ms->numa_state && ms->numa_state->num_nodes) ? true : false;
> +}
> +
> +int riscv_socket_count(const MachineState *ms)
> +{
> +    return (numa_enabled(ms)) ? ms->numa_state->num_nodes : 1;
> +}
> +
> +int riscv_socket_first_hartid(const MachineState *ms, int socket_id)
> +{
> +    int i, first_hartid = ms->smp.cpus;
> +
> +    if (!numa_enabled(ms)) {
> +        return (!socket_id) ? 0 : -1;
> +    }
> +
> +    for (i = 0; i < ms->smp.cpus; i++) {
> +        if (ms->possible_cpus->cpus[i].props.node_id != socket_id) {
> +            continue;
> +        }
> +        if (i < first_hartid) {
> +            first_hartid = i;
> +        }
> +    }
> +
> +    return (first_hartid < ms->smp.cpus) ? first_hartid : -1;
> +}
> +
> +int riscv_socket_last_hartid(const MachineState *ms, int socket_id)
> +{
> +    int i, last_hartid = -1;
> +
> +    if (!numa_enabled(ms)) {
> +        return (!socket_id) ? ms->smp.cpus - 1 : -1;
> +    }
> +
> +    for (i = 0; i < ms->smp.cpus; i++) {
> +        if (ms->possible_cpus->cpus[i].props.node_id != socket_id) {
> +            continue;
> +        }
> +        if (i > last_hartid) {
> +            last_hartid = i;
> +        }
> +    }
> +
> +    return (last_hartid < ms->smp.cpus) ? last_hartid : -1;
> +}
> +
> +int riscv_socket_hart_count(const MachineState *ms, int socket_id)
> +{
> +    int first_hartid, last_hartid;
> +
> +    if (!numa_enabled(ms)) {
> +        return (!socket_id) ? ms->smp.cpus : -1;
> +    }
> +
> +    first_hartid = riscv_socket_first_hartid(ms, socket_id);
> +    if (first_hartid < 0) {
> +        return -1;
> +    }
> +
> +    last_hartid = riscv_socket_last_hartid(ms, socket_id);
> +    if (last_hartid < 0) {
> +        return -1;
> +    }
> +
> +    if (first_hartid > last_hartid) {
> +        return -1;
> +    }
> +
> +    return last_hartid - first_hartid + 1;
> +}
> +
> +bool riscv_socket_check_hartids(const MachineState *ms, int socket_id)
> +{
> +    int i, first_hartid, last_hartid;
> +
> +    if (!numa_enabled(ms)) {
> +        return (!socket_id) ? true : false;
> +    }
> +
> +    first_hartid = riscv_socket_first_hartid(ms, socket_id);
> +    if (first_hartid < 0) {
> +        return false;
> +    }
> +
> +    last_hartid = riscv_socket_last_hartid(ms, socket_id);
> +    if (last_hartid < 0) {
> +        return false;
> +    }
> +
> +    for (i = first_hartid; i <= last_hartid; i++) {
> +        if (ms->possible_cpus->cpus[i].props.node_id != socket_id) {
> +            return false;
> +        }
> +    }
> +
> +    return true;
> +}
> +
> +uint64_t riscv_socket_mem_offset(const MachineState *ms, int socket_id)
> +{
> +    int i;
> +    uint64_t mem_offset = 0;
> +
> +    if (!numa_enabled(ms)) {
> +        return 0;
> +    }
> +
> +    for (i = 0; i < ms->numa_state->num_nodes; i++) {
> +        if (i == socket_id) {
> +            break;
> +        }
> +        mem_offset += ms->numa_state->nodes[i].node_mem;
> +    }
> +
> +    return (i == socket_id) ? mem_offset : 0;
> +}
> +
> +uint64_t riscv_socket_mem_size(const MachineState *ms, int socket_id)
> +{
> +    if (!numa_enabled(ms)) {
> +        return (!socket_id) ? ms->ram_size : 0;
> +    }
> +
> +    return (socket_id < ms->numa_state->num_nodes) ?
> +            ms->numa_state->nodes[socket_id].node_mem : 0;
> +}
> +
> +void riscv_socket_fdt_write_id(const MachineState *ms, void *fdt,
> +                               const char *node_name, int socket_id)
> +{
> +    if (numa_enabled(ms)) {
> +        qemu_fdt_setprop_cell(fdt, node_name, "numa-node-id", socket_id);
> +    }
> +}
> +
> +void riscv_socket_fdt_write_distance_matrix(const MachineState *ms, void *fdt)
> +{
> +    int i, j, idx;
> +    uint32_t *dist_matrix, dist_matrix_size;
> +
> +    if (numa_enabled(ms) && ms->numa_state->have_numa_distance) {
> +        dist_matrix_size = riscv_socket_count(ms) * riscv_socket_count(ms);
> +        dist_matrix_size *= (3 * sizeof(uint32_t));
> +        dist_matrix = g_malloc0(dist_matrix_size);
> +
> +        for (i = 0; i < riscv_socket_count(ms); i++) {
> +            for (j = 0; j < riscv_socket_count(ms); j++) {
> +                idx = (i * riscv_socket_count(ms) + j) * 3;
> +                dist_matrix[idx + 0] = cpu_to_be32(i);
> +                dist_matrix[idx + 1] = cpu_to_be32(j);
> +                dist_matrix[idx + 2] =
> +                    cpu_to_be32(ms->numa_state->nodes[i].distance[j]);
> +            }
> +        }
> +
> +        qemu_fdt_add_subnode(fdt, "/distance-map");
> +        qemu_fdt_setprop_string(fdt, "/distance-map", "compatible",
> +                                "numa-distance-map-v1");
> +        qemu_fdt_setprop(fdt, "/distance-map", "distance-matrix",
> +                         dist_matrix, dist_matrix_size);
> +        g_free(dist_matrix);
> +    }
> +}
> +
> +CpuInstanceProperties
> +riscv_numa_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> +{
> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> +
> +    assert(cpu_index < possible_cpus->len);
> +    return possible_cpus->cpus[cpu_index].props;
> +}
> +
> +int64_t riscv_numa_get_default_cpu_node_id(const MachineState *ms, int idx)
> +{
> +    int64_t nidx = 0;
> +
> +    if (ms->numa_state->num_nodes) {
> +        nidx = idx / (ms->smp.cpus / ms->numa_state->num_nodes);
> +        if (ms->numa_state->num_nodes <= nidx) {
> +            nidx = ms->numa_state->num_nodes - 1;
> +        }
> +    }
> +
> +    return nidx;
> +}
> +
> +const CPUArchIdList *riscv_numa_possible_cpu_arch_ids(MachineState *ms)
> +{
> +    int n;
> +    unsigned int max_cpus = ms->smp.max_cpus;
> +
> +    if (ms->possible_cpus) {
> +        assert(ms->possible_cpus->len == max_cpus);
> +        return ms->possible_cpus;
> +    }
> +
> +    ms->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
> +                                  sizeof(CPUArchId) * max_cpus);
> +    ms->possible_cpus->len = max_cpus;
> +    for (n = 0; n < ms->possible_cpus->len; n++) {
> +        ms->possible_cpus->cpus[n].type = ms->cpu_type;
> +        ms->possible_cpus->cpus[n].arch_id = n;
> +        ms->possible_cpus->cpus[n].props.has_core_id = true;
> +        ms->possible_cpus->cpus[n].props.core_id = n;
> +    }
> +
> +    return ms->possible_cpus;
> +}
> diff --git a/include/hw/riscv/numa.h b/include/hw/riscv/numa.h
> new file mode 100644
> index 0000000000..fd9517a315
> --- /dev/null
> +++ b/include/hw/riscv/numa.h
> @@ -0,0 +1,51 @@
> +/*
> + * QEMU RISC-V NUMA Helper
> + *
> + * Copyright (c) 2020 Western Digital Corporation or its affiliates.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef RISCV_NUMA_H
> +#define RISCV_NUMA_H
> +
> +#include "hw/sysbus.h"
> +#include "sysemu/numa.h"
> +
> +int riscv_socket_count(const MachineState *ms);
> +
> +int riscv_socket_first_hartid(const MachineState *ms, int socket_id);
> +
> +int riscv_socket_last_hartid(const MachineState *ms, int socket_id);
> +
> +int riscv_socket_hart_count(const MachineState *ms, int socket_id);
> +
> +uint64_t riscv_socket_mem_offset(const MachineState *ms, int socket_id);
> +
> +uint64_t riscv_socket_mem_size(const MachineState *ms, int socket_id);
> +
> +bool riscv_socket_check_hartids(const MachineState *ms, int socket_id);
> +
> +void riscv_socket_fdt_write_id(const MachineState *ms, void *fdt,
> +                               const char *node_name, int socket_id);
> +
> +void riscv_socket_fdt_write_distance_matrix(const MachineState *ms, void *fdt);
> +
> +CpuInstanceProperties
> +riscv_numa_cpu_index_to_props(MachineState *ms, unsigned cpu_index);
> +
> +int64_t riscv_numa_get_default_cpu_node_id(const MachineState *ms, int idx);
> +
> +const CPUArchIdList *riscv_numa_possible_cpu_arch_ids(MachineState *ms);

Can we add some comments to these functions describing what they are
expected to return (and noting that -1 indicates an error)?
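One possible shape for such comments is shown below, on simplified array-based stand-ins for riscv_socket_mem_offset() and riscv_socket_mem_size(). The names and signatures are hypothetical.

These two helpers signal an unknown socket_id with 0 rather than -1. That convention is worth documenting, because 0 is also the valid offset of socket 0. The sketch additionally rejects negative IDs; the posted riscv_socket_mem_size() only checks `socket_id < num_nodes` before indexing nodes[].

```c
#include <assert.h>
#include <stdint.h>

/**
 * socket_mem_offset - byte offset of @socket_id's RAM from the start of DRAM
 * @node_mem: per-node RAM sizes, in bytes
 * @num_nodes: number of entries in @node_mem
 * @socket_id: socket to look up
 *
 * Returns the sum of the sizes of all lower-numbered nodes, or 0 if
 * @socket_id is out of range. Note that 0 is also the valid result for
 * socket 0, so callers must range-check @socket_id themselves.
 */
uint64_t socket_mem_offset(const uint64_t *node_mem, int num_nodes,
                           int socket_id)
{
    uint64_t offset = 0;

    assert(num_nodes >= 0);
    if (socket_id < 0 || socket_id >= num_nodes) {
        return 0;
    }
    for (int i = 0; i < socket_id; i++) {
        offset += node_mem[i];
    }
    return offset;
}

/**
 * socket_mem_size - RAM size of @socket_id, in bytes
 *
 * Returns 0 if @socket_id is negative or not less than @num_nodes.
 */
uint64_t socket_mem_size(const uint64_t *node_mem, int num_nodes,
                         int socket_id)
{
    assert(num_nodes >= 0);
    if (socket_id < 0 || socket_id >= num_nodes) {
        return 0;
    }
    return node_mem[socket_id];
}
```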

Alistair

> +
> +#endif /* RISCV_NUMA_H */
> --
> 2.25.1
>
>
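riscv_socket_fdt_write_distance_matrix() emits the "distance-matrix" property of a "numa-distance-map-v1" node. The property is a flat list of <from to distance> triplets, one per ordered pair of sockets, with every cell stored big-endian as device-tree cells require.

The standalone sketch below does the same packing. It makes two substitutions:
- An explicit byte-order conversion replaces QEMU's cpu_to_be32().
- A caller-supplied uint8_t distance table replaces ms->numa_state.

The function names and the table type are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Store @v big-endian regardless of host byte order (stand-in for cpu_to_be32). */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Pack an n x n row-major distance table into <i j distance> triplets.
 * Returns a malloc'd buffer of n * n * 3 * 4 bytes (length in *out_len),
 * or NULL on allocation failure.
 */
uint8_t *pack_distance_matrix(const uint8_t *dist, int n, size_t *out_len)
{
    size_t len, cell_bytes = 3 * sizeof(uint32_t);
    uint8_t *buf;

    assert(dist != NULL && n > 0 && out_len != NULL);
    len = (size_t)n * (size_t)n * cell_bytes;
    buf = malloc(len);
    if (!buf) {
        return NULL;
    }
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            /* Triplet (i, j) starts at cell (i * n + j) * 3, as in the patch. */
            uint8_t *t = buf + ((size_t)i * n + j) * cell_bytes;

            put_be32(t + 0, (uint32_t)i);
            put_be32(t + 4, (uint32_t)j);
            put_be32(t + 8, dist[i * n + j]);
        }
    }
    *out_len = len;
    return buf;
}
```

With two sockets and example distances of 10 (local) and 20 (remote), the property carries four triplets: <0 0 10> <0 1 20> <1 0 20> <1 1 10>.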



* RE: [PATCH v5 5/5] hw/riscv: virt: Allow creating multiple NUMA sockets
  2020-06-10 23:24     ` Alistair Francis
@ 2020-06-11 13:01       ` Anup Patel
  -1 siblings, 0 replies; 28+ messages in thread
From: Anup Patel @ 2020-06-11 13:01 UTC (permalink / raw)
  To: Alistair Francis
  Cc: Peter Maydell, open list:RISC-V, Sagar Karandikar, Anup Patel,
	qemu-devel@nongnu.org Developers, Atish Patra, Alistair Francis,
	Palmer Dabbelt



> -----Original Message-----
> From: Alistair Francis <alistair23@gmail.com>
> Sent: 11 June 2020 04:55
> To: Anup Patel <Anup.Patel@wdc.com>
> Cc: Peter Maydell <peter.maydell@linaro.org>; Palmer Dabbelt
> <palmer@dabbelt.com>; Alistair Francis <Alistair.Francis@wdc.com>; Sagar
> Karandikar <sagark@eecs.berkeley.edu>; Atish Patra
> <Atish.Patra@wdc.com>; open list:RISC-V <qemu-riscv@nongnu.org>;
> qemu-devel@nongnu.org Developers <qemu-devel@nongnu.org>; Anup
> Patel <anup@brainfault.org>
> Subject: Re: [PATCH v5 5/5] hw/riscv: virt: Allow creating multiple NUMA
> sockets
> 
> On Fri, May 29, 2020 at 4:49 AM Anup Patel <anup.patel@wdc.com> wrote:
> >
> > We extend the RISC-V virt machine to allow creating a multi-socket
> > machine. Each RISC-V virt machine socket is a NUMA node having a set
> > of HARTs, a memory instance, a CLINT instance, and a PLIC instance.
> > Other devices are shared between all sockets. We also update the
> > generated device tree accordingly.
> >
> > By default, NUMA multi-socket support is disabled for the RISC-V virt
> > machine. To enable it, users can use QEMU's "-numa" command-line
> > options.
> >
> > Example1: For two NUMA nodes with 2 CPUs each, append the following to
> > the command-line options: "-smp 4 -numa node -numa node"
> >
> > Example2: For two NUMA nodes with 1 and 3 CPUs, append the following to
> > the command-line options:
> > "-smp 4 -numa node -numa node -numa cpu,node-id=0,core-id=0 \
> >  -numa cpu,node-id=1,core-id=1 -numa cpu,node-id=1,core-id=2 \
> >  -numa cpu,node-id=1,core-id=3"
> >
> > The maximum number of sockets in a RISC-V virt machine is 8, but this
> > limit can be changed in the future.
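The 2 + 2 split in Example1 comes from riscv_numa_get_default_cpu_node_id() in patch 3. CPUs without an explicit node-id are assigned in blocks of smp.cpus / num_nodes, and any remainder goes to the last node.

The standalone sketch below implements that rule. `default_cpu_node_id()` is a hypothetical name. The guard for fewer CPUs than nodes is an addition: the posted expression divides by `smp.cpus / num_nodes`, which is 0 in that case.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Default node for CPU @idx when no "-numa cpu,node-id=..." is given:
 * blocks of cpus / num_nodes CPUs per node, remainder on the last node.
 * Mirrors riscv_numa_get_default_cpu_node_id(), plus a guard (not in
 * the patch) for cpus < num_nodes.
 */
int64_t default_cpu_node_id(int cpus, int num_nodes, int idx)
{
    int64_t per_node, nidx;

    assert(cpus > 0 && idx >= 0);
    if (num_nodes <= 0) {
        return 0;                /* NUMA not configured */
    }
    per_node = cpus / num_nodes;
    if (per_node == 0) {
        per_node = 1;            /* avoid dividing by zero */
    }
    nidx = idx / per_node;
    return (nidx < num_nodes) ? nidx : num_nodes - 1;
}
```

With "-smp 4 -numa node -numa node", CPUs 0 to 3 map to nodes 0, 0, 1, 1. With "-smp 5", the fifth CPU also lands on node 1.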
> >
> > Signed-off-by: Anup Patel <anup.patel@wdc.com>
> > ---
> >  hw/riscv/virt.c         | 530 +++++++++++++++++++++++-----------------
> >  include/hw/riscv/virt.h |   9 +-
> >  2 files changed, 308 insertions(+), 231 deletions(-)
> >
> > diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
> > index 421815081d..2863b42cea 100644
> > --- a/hw/riscv/virt.c
> > +++ b/hw/riscv/virt.c
> > @@ -35,6 +35,7 @@
> >  #include "hw/riscv/sifive_test.h"
> >  #include "hw/riscv/virt.h"
> >  #include "hw/riscv/boot.h"
> > +#include "hw/riscv/numa.h"
> >  #include "chardev/char.h"
> >  #include "sysemu/arch_init.h"
> >  #include "sysemu/device_tree.h"
> > @@ -60,7 +61,7 @@ static const struct MemmapEntry {
> >      [VIRT_TEST] =        {   0x100000,        0x1000 },
> >      [VIRT_RTC] =         {   0x101000,        0x1000 },
> >      [VIRT_CLINT] =       {  0x2000000,       0x10000 },
> > -    [VIRT_PLIC] =        {  0xc000000,     0x4000000 },
> > +    [VIRT_PLIC] =        {  0xc000000, VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2) },
> >      [VIRT_UART0] =       { 0x10000000,         0x100 },
> >      [VIRT_VIRTIO] =      { 0x10001000,        0x1000 },
> >      [VIRT_FLASH] =       { 0x20000000,     0x4000000 },
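The new VIRT_PLIC size reserves room for VIRT_CPUS_MAX * 2 contexts per instance. That is two contexts per hart, matching the IRQ_M_EXT and IRQ_S_EXT entries each hart gets in the PLIC interrupts-extended list further down. Each socket's CLINT and PLIC are then placed back to back at base + socket * size. Below is a small sketch for checking that arithmetic. It uses the constants from this patch, and the helper names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from this patch (include/hw/riscv/virt.h, virt_memmap[]). */
#define VIRT_CPUS_MAX            8
#define VIRT_SOCKETS_MAX         8
#define VIRT_PLIC_CONTEXT_BASE   0x200000
#define VIRT_PLIC_CONTEXT_STRIDE 0x1000
#define VIRT_PLIC_SIZE(__num_context) \
    (VIRT_PLIC_CONTEXT_BASE + (__num_context) * VIRT_PLIC_CONTEXT_STRIDE)

#define PLIC_BASE  0xc000000ULL
#define PLIC_SIZE  ((uint64_t)VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2))
#define CLINT_BASE 0x2000000ULL
#define CLINT_SIZE 0x10000ULL

/* MMIO base of the PLIC instance that belongs to 'socket'. */
static uint64_t plic_base_for_socket(int socket)
{
    assert(socket >= 0 && socket < VIRT_SOCKETS_MAX);
    return PLIC_BASE + (uint64_t)socket * PLIC_SIZE;
}

/* MMIO base of the CLINT instance that belongs to 'socket'. */
static uint64_t clint_base_for_socket(int socket)
{
    assert(socket >= 0 && socket < VIRT_SOCKETS_MAX);
    return CLINT_BASE + (uint64_t)socket * CLINT_SIZE;
}
```

Each PLIC instance is 0x210000 bytes. With all eight sockets populated, the last PLIC ends at 0xd080000, which is still below VIRT_UART0 at 0x10000000. The last CLINT ends at 0x2080000.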
> > @@ -182,10 +183,17 @@ static void create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
> >      uint64_t mem_size, const char *cmdline)
> >  {
> >      void *fdt;
> > -    int cpu, i;
> > -    uint32_t *cells;
> > -    char *nodename;
> > -    uint32_t plic_phandle, test_phandle, phandle = 1;
> > +    int i, cpu, socket;
> > +    MachineState *mc = MACHINE(s);
> > +    uint64_t addr, size;
> > +    uint32_t *clint_cells, *plic_cells;
> > +    unsigned long clint_addr, plic_addr;
> > +    uint32_t plic_phandle[MAX_NODES];
> > +    uint32_t cpu_phandle, intc_phandle, test_phandle;
> > +    uint32_t phandle = 1, plic_mmio_phandle = 1;
> > +    uint32_t plic_pcie_phandle = 1, plic_virtio_phandle = 1;
> > +    char *mem_name, *cpu_name, *core_name, *intc_name;
> > +    char *name, *clint_name, *plic_name, *clust_name;
> >      hwaddr flashsize = virt_memmap[VIRT_FLASH].size / 2;
> >      hwaddr flashbase = virt_memmap[VIRT_FLASH].base;
> >
> > @@ -206,231 +214,238 @@ static void create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
> >      qemu_fdt_setprop_cell(fdt, "/soc", "#size-cells", 0x2);
> >      qemu_fdt_setprop_cell(fdt, "/soc", "#address-cells", 0x2);
> >
> > -    nodename = g_strdup_printf("/memory@%lx",
> > -        (long)memmap[VIRT_DRAM].base);
> > -    qemu_fdt_add_subnode(fdt, nodename);
> > -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> > -        memmap[VIRT_DRAM].base >> 32, memmap[VIRT_DRAM].base,
> > -        mem_size >> 32, mem_size);
> > -    qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
> > -    g_free(nodename);
> > -
> >      qemu_fdt_add_subnode(fdt, "/cpus");
> >      qemu_fdt_setprop_cell(fdt, "/cpus", "timebase-frequency",
> >                            SIFIVE_CLINT_TIMEBASE_FREQ);
> >      qemu_fdt_setprop_cell(fdt, "/cpus", "#size-cells", 0x0);
> >      qemu_fdt_setprop_cell(fdt, "/cpus", "#address-cells", 0x1);
> > +    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map");
> 
> I'm no expert with cpu-map. Do you mind CCing Atish in the next version and
> see if he can Ack these DT changes?

Sure, he is already in CC.

By default, you and Atish are always in CC for all my patches. I apologize
if I missed CCing either of you on any of the patches.

> 
> > +
> > +    for (socket = (riscv_socket_count(mc) - 1); socket >= 0; socket--) {
> > +        clust_name = g_strdup_printf("/cpus/cpu-map/cluster%d", socket);
> > +        qemu_fdt_add_subnode(fdt, clust_name);
> > +
> > +        plic_cells = g_new0(uint32_t, s->soc[socket].num_harts * 4);
> > +        clint_cells = g_new0(uint32_t, s->soc[socket].num_harts * 4);
> > +
> > +        for (cpu = s->soc[socket].num_harts - 1; cpu >= 0; cpu--) {
> > +            cpu_phandle = phandle++;
> >
> > -    for (cpu = s->soc.num_harts - 1; cpu >= 0; cpu--) {
> > -        int cpu_phandle = phandle++;
> > -        int intc_phandle;
> > -        nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
> > -        char *intc = g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
> > -        char *isa = riscv_isa_string(&s->soc.harts[cpu]);
> > -        qemu_fdt_add_subnode(fdt, nodename);
> > +            cpu_name = g_strdup_printf("/cpus/cpu@%d",
> > +                s->soc[socket].hartid_base + cpu);
> > +            qemu_fdt_add_subnode(fdt, cpu_name);
> >  #if defined(TARGET_RISCV32)
> > -        qemu_fdt_setprop_string(fdt, nodename, "mmu-type", "riscv,sv32");
> > +            qemu_fdt_setprop_string(fdt, cpu_name, "mmu-type", "riscv,sv32");
> >  #else
> > -        qemu_fdt_setprop_string(fdt, nodename, "mmu-type", "riscv,sv48");
> > +            qemu_fdt_setprop_string(fdt, cpu_name, "mmu-type", "riscv,sv48");
> >  #endif
> > -        qemu_fdt_setprop_string(fdt, nodename, "riscv,isa", isa);
> > -        qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv");
> > -        qemu_fdt_setprop_string(fdt, nodename, "status", "okay");
> > -        qemu_fdt_setprop_cell(fdt, nodename, "reg", cpu);
> > -        qemu_fdt_setprop_string(fdt, nodename, "device_type", "cpu");
> > -        qemu_fdt_setprop_cell(fdt, nodename, "phandle", cpu_phandle);
> > -        intc_phandle = phandle++;
> > -        qemu_fdt_add_subnode(fdt, intc);
> > -        qemu_fdt_setprop_cell(fdt, intc, "phandle", intc_phandle);
> > -        qemu_fdt_setprop_string(fdt, intc, "compatible", "riscv,cpu-intc");
> > -        qemu_fdt_setprop(fdt, intc, "interrupt-controller", NULL, 0);
> > -        qemu_fdt_setprop_cell(fdt, intc, "#interrupt-cells", 1);
> > -        g_free(isa);
> > -        g_free(intc);
> > -        g_free(nodename);
> > -    }
> > +            name = riscv_isa_string(&s->soc[socket].harts[cpu]);
> > +            qemu_fdt_setprop_string(fdt, cpu_name, "riscv,isa", name);
> > +            g_free(name);
> > +            qemu_fdt_setprop_string(fdt, cpu_name, "compatible", "riscv");
> > +            qemu_fdt_setprop_string(fdt, cpu_name, "status", "okay");
> > +            qemu_fdt_setprop_cell(fdt, cpu_name, "reg",
> > +                s->soc[socket].hartid_base + cpu);
> > +            qemu_fdt_setprop_string(fdt, cpu_name, "device_type", "cpu");
> > +            riscv_socket_fdt_write_id(mc, fdt, cpu_name, socket);
> > +            qemu_fdt_setprop_cell(fdt, cpu_name, "phandle", cpu_phandle);
> > +
> > +            intc_name = g_strdup_printf("%s/interrupt-controller", cpu_name);
> > +            qemu_fdt_add_subnode(fdt, intc_name);
> > +            intc_phandle = phandle++;
> > +            qemu_fdt_setprop_cell(fdt, intc_name, "phandle", intc_phandle);
> > +            qemu_fdt_setprop_string(fdt, intc_name, "compatible",
> > +                "riscv,cpu-intc");
> > +            qemu_fdt_setprop(fdt, intc_name, "interrupt-controller", NULL, 0);
> > +            qemu_fdt_setprop_cell(fdt, intc_name, "#interrupt-cells", 1);
> > +
> > +            clint_cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> > +            clint_cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_SOFT);
> > +            clint_cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> > +            clint_cells[cpu * 4 + 3] = cpu_to_be32(IRQ_M_TIMER);
> > +
> > +            plic_cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> > +            plic_cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_EXT);
> > +            plic_cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> > +            plic_cells[cpu * 4 + 3] = cpu_to_be32(IRQ_S_EXT);
> > +
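Each hart contributes four 32-bit cells to its socket's CLINT and PLIC interrupts-extended properties. These are two <intc-phandle irq> pairs, stored big-endian as the FDT requires. Below is a portable sketch of that layout outside QEMU, with a hand-written stand-in for cpu_to_be32(). It assumes QEMU's IRQ_* constants are the RISC-V privileged-spec interrupt cause numbers (3, 7, 9, 11). build_irq_cells() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Interrupt cause numbers from the RISC-V privileged specification
 * (assumed to match QEMU's IRQ_* constants used in this patch). */
enum {
    IRQ_M_SOFT  = 3,
    IRQ_M_TIMER = 7,
    IRQ_S_EXT   = 9,
    IRQ_M_EXT   = 11,
};

/* Portable stand-in for QEMU's cpu_to_be32(): FDT cells are big-endian. */
static uint32_t to_be32(uint32_t v)
{
    const uint8_t b[4] = {
        (uint8_t)(v >> 24), (uint8_t)(v >> 16), (uint8_t)(v >> 8), (uint8_t)v
    };
    uint32_t out;

    memcpy(&out, b, sizeof(out));
    return out;
}

/* Fill 4 cells per hart, <intc irq_a intc irq_b>, as the patch does for
 * clint_cells (M_SOFT, M_TIMER) and plic_cells (M_EXT, S_EXT).
 * The caller frees the result. */
static uint32_t *build_irq_cells(const uint32_t *intc_phandle, int num_harts,
                                 uint32_t irq_a, uint32_t irq_b)
{
    uint32_t *cells;

    assert(num_harts >= 0);
    cells = calloc((size_t)num_harts * 4, sizeof(*cells));
    if (!cells) {
        return NULL;
    }
    for (int cpu = 0; cpu < num_harts; cpu++) {
        cells[cpu * 4 + 0] = to_be32(intc_phandle[cpu]);
        cells[cpu * 4 + 1] = to_be32(irq_a);
        cells[cpu * 4 + 2] = to_be32(intc_phandle[cpu]);
        cells[cpu * 4 + 3] = to_be32(irq_b);
    }
    return cells;
}
```

build_irq_cells(ph, n, IRQ_M_SOFT, IRQ_M_TIMER) gives the CLINT list, and build_irq_cells(ph, n, IRQ_M_EXT, IRQ_S_EXT) gives the PLIC list. In both cases the property length is n * 4 * sizeof(uint32_t), as in the patch.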
> > +            core_name = g_strdup_printf("%s/core%d", clust_name, cpu);
> > +            qemu_fdt_add_subnode(fdt, core_name);
> > +            qemu_fdt_setprop_cell(fdt, core_name, "cpu", cpu_phandle);
> > +
> > +            g_free(core_name);
> > +            g_free(intc_name);
> > +            g_free(cpu_name);
> > +        }
> >
> > -    /* Add cpu-topology node */
> > -    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map");
> > -    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map/cluster0");
> > -    for (cpu = s->soc.num_harts - 1; cpu >= 0; cpu--) {
> > -        char *core_nodename = g_strdup_printf("/cpus/cpu-map/cluster0/core%d",
> > -                                              cpu);
> > -        char *cpu_nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
> > -        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, cpu_nodename);
> > -        qemu_fdt_add_subnode(fdt, core_nodename);
> > -        qemu_fdt_setprop_cell(fdt, core_nodename, "cpu", intc_phandle);
> > -        g_free(core_nodename);
> > -        g_free(cpu_nodename);
> > +        addr = memmap[VIRT_DRAM].base + riscv_socket_mem_offset(mc, socket);
> > +        size = riscv_socket_mem_size(mc, socket);
> > +        mem_name = g_strdup_printf("/memory@%lx", (long)addr);
> > +        qemu_fdt_add_subnode(fdt, mem_name);
> > +        qemu_fdt_setprop_cells(fdt, mem_name, "reg",
> > +            addr >> 32, addr, size >> 32, size);
> > +        qemu_fdt_setprop_string(fdt, mem_name, "device_type", "memory");
> > +        riscv_socket_fdt_write_id(mc, fdt, mem_name, socket);
> > +        g_free(mem_name);
> > +
> > +        clint_addr = memmap[VIRT_CLINT].base +
> > +            (memmap[VIRT_CLINT].size * socket);
> > +        clint_name = g_strdup_printf("/soc/clint@%lx", clint_addr);
> > +        qemu_fdt_add_subnode(fdt, clint_name);
> > +        qemu_fdt_setprop_string(fdt, clint_name, "compatible", "riscv,clint0");
> > +        qemu_fdt_setprop_cells(fdt, clint_name, "reg",
> > +            0x0, clint_addr, 0x0, memmap[VIRT_CLINT].size);
> > +        qemu_fdt_setprop(fdt, clint_name, "interrupts-extended",
> > +            clint_cells, s->soc[socket].num_harts * sizeof(uint32_t) * 4);
> > +        riscv_socket_fdt_write_id(mc, fdt, clint_name, socket);
> > +        g_free(clint_name);
> > +
> > +        plic_phandle[socket] = phandle++;
> > +        plic_addr = memmap[VIRT_PLIC].base + (memmap[VIRT_PLIC].size * socket);
> > +        plic_name = g_strdup_printf("/soc/plic@%lx", plic_addr);
> > +        qemu_fdt_add_subnode(fdt, plic_name);
> > +        qemu_fdt_setprop_cell(fdt, plic_name,
> > +            "#address-cells", FDT_PLIC_ADDR_CELLS);
> > +        qemu_fdt_setprop_cell(fdt, plic_name,
> > +            "#interrupt-cells", FDT_PLIC_INT_CELLS);
> > +        qemu_fdt_setprop_string(fdt, plic_name, "compatible", "riscv,plic0");
> > +        qemu_fdt_setprop(fdt, plic_name, "interrupt-controller", NULL, 0);
> > +        qemu_fdt_setprop(fdt, plic_name, "interrupts-extended",
> > +            plic_cells, s->soc[socket].num_harts * sizeof(uint32_t) * 4);
> > +        qemu_fdt_setprop_cells(fdt, plic_name, "reg",
> > +            0x0, plic_addr, 0x0, memmap[VIRT_PLIC].size);
> > +        qemu_fdt_setprop_cell(fdt, plic_name, "riscv,ndev", VIRTIO_NDEV);
> > +        riscv_socket_fdt_write_id(mc, fdt, plic_name, socket);
> > +        qemu_fdt_setprop_cell(fdt, plic_name, "phandle", plic_phandle[socket]);
> > +        g_free(plic_name);
> > +
> > +        g_free(clint_cells);
> > +        g_free(plic_cells);
> > +        g_free(clust_name);
> >      }
> >
> > -    cells =  g_new0(uint32_t, s->soc.num_harts * 4);
> > -    for (cpu = 0; cpu < s->soc.num_harts; cpu++) {
> > -        nodename =
> > -            g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
> > -        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, nodename);
> > -        cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> > -        cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_SOFT);
> > -        cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> > -        cells[cpu * 4 + 3] = cpu_to_be32(IRQ_M_TIMER);
> > -        g_free(nodename);
> > -    }
> > -    nodename = g_strdup_printf("/soc/clint@%lx",
> > -        (long)memmap[VIRT_CLINT].base);
> > -    qemu_fdt_add_subnode(fdt, nodename);
> > -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv,clint0");
> > -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> > -        0x0, memmap[VIRT_CLINT].base,
> > -        0x0, memmap[VIRT_CLINT].size);
> > -    qemu_fdt_setprop(fdt, nodename, "interrupts-extended",
> > -        cells, s->soc.num_harts * sizeof(uint32_t) * 4);
> > -    g_free(cells);
> > -    g_free(nodename);
> > -
> > -    plic_phandle = phandle++;
> > -    cells =  g_new0(uint32_t, s->soc.num_harts * 4);
> > -    for (cpu = 0; cpu < s->soc.num_harts; cpu++) {
> > -        nodename =
> > -            g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
> > -        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, nodename);
> > -        cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> > -        cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_EXT);
> > -        cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> > -        cells[cpu * 4 + 3] = cpu_to_be32(IRQ_S_EXT);
> > -        g_free(nodename);
> > +    for (socket = 0; socket < riscv_socket_count(mc); socket++) {
> > +        if (socket == 0) {
> > +            plic_mmio_phandle = plic_phandle[socket];
> > +            plic_virtio_phandle = plic_phandle[socket];
> > +            plic_pcie_phandle = plic_phandle[socket];
> > +        }
> > +        if (socket == 1) {
> > +            plic_virtio_phandle = plic_phandle[socket];
> > +            plic_pcie_phandle = plic_phandle[socket];
> > +        }
> > +        if (socket == 2) {
> > +            plic_pcie_phandle = plic_phandle[socket];
> > +        }
> >      }
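The if (socket == 0/1/2) chain above, and the matching one in riscv_virt_board_init(), decides which socket's PLIC receives each group of device interrupts. UART0 and RTC ("mmio") always use socket 0. VirtIO MMIO uses socket 1 and PCIe uses socket 2 when those sockets exist; otherwise each falls back to the highest-numbered socket available. Below is a sketch of the same rule as a pure function. The enum and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical names for the three device groups used in the patch. */
enum plic_user { PLIC_USER_MMIO, PLIC_USER_VIRTIO, PLIC_USER_PCIE };

/* Socket whose PLIC serves 'user', given 'num_sockets' (>= 1) sockets.
 * Equivalent to the if (socket == 0/1/2) chains in the patch. */
static int plic_socket_for(enum plic_user user, int num_sockets)
{
    int preferred;

    assert(num_sockets >= 1);
    switch (user) {
    case PLIC_USER_VIRTIO:
        preferred = 1;
        break;
    case PLIC_USER_PCIE:
        preferred = 2;
        break;
    case PLIC_USER_MMIO:
    default:
        preferred = 0;
        break;
    }
    /* Fall back to the last socket when the preferred one does not exist. */
    return preferred < num_sockets ? preferred : num_sockets - 1;
}
```

With a single helper like this, create_fdt() and riscv_virt_board_init() could share the rule rather than each repeating the chain.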
> > -    nodename = g_strdup_printf("/soc/interrupt-controller@%lx",
> > -        (long)memmap[VIRT_PLIC].base);
> > -    qemu_fdt_add_subnode(fdt, nodename);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "#address-cells",
> > -                          FDT_PLIC_ADDR_CELLS);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "#interrupt-cells",
> > -                          FDT_PLIC_INT_CELLS);
> > -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv,plic0");
> > -    qemu_fdt_setprop(fdt, nodename, "interrupt-controller", NULL, 0);
> > -    qemu_fdt_setprop(fdt, nodename, "interrupts-extended",
> > -        cells, s->soc.num_harts * sizeof(uint32_t) * 4);
> > -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> > -        0x0, memmap[VIRT_PLIC].base,
> > -        0x0, memmap[VIRT_PLIC].size);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "riscv,ndev", VIRTIO_NDEV);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "phandle", plic_phandle);
> > -    plic_phandle = qemu_fdt_get_phandle(fdt, nodename);
> > -    g_free(cells);
> > -    g_free(nodename);
> > +
> > +    riscv_socket_fdt_write_distance_matrix(mc, fdt);
> >
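riscv_socket_fdt_write_distance_matrix() comes from PATCH3, and its body is not shown here. Assuming it follows the Linux numa-distance-map-v1 binding, the "distance-matrix" property is a flat list of <from to distance> triplets, and a node's distance to itself is 10. Below is a sketch that builds such a list for n nodes with one uniform remote distance. The remote value passed in (for example 20) is an assumed example, not a QEMU default.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Build a flat <from to distance> triplet list for 'n' NUMA nodes, in the
 * layout of the numa-distance-map-v1 "distance-matrix" property.
 * Cells are left in host byte order; an FDT writer would byte-swap them.
 * The number of cells written is returned through 'ncells'. */
static uint32_t *build_distance_triplets(int n, uint32_t remote, size_t *ncells)
{
    uint32_t *cells;
    size_t k = 0;

    /* The binding requires internode distances greater than 10. */
    assert(n >= 1 && remote > 10);
    cells = calloc((size_t)n * n * 3, sizeof(*cells));
    if (!cells) {
        return NULL;
    }
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            cells[k++] = (uint32_t)i;
            cells[k++] = (uint32_t)j;
            cells[k++] = (i == j) ? 10 : remote; /* 10 = local distance */
        }
    }
    *ncells = k;
    return cells;
}
```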
> >      for (i = 0; i < VIRTIO_COUNT; i++) {
> > -        nodename = g_strdup_printf("/virtio_mmio@%lx",
> > +        name = g_strdup_printf("/soc/virtio_mmio@%lx",
> >              (long)(memmap[VIRT_VIRTIO].base + i * memmap[VIRT_VIRTIO].size));
> > -        qemu_fdt_add_subnode(fdt, nodename);
> > -        qemu_fdt_setprop_string(fdt, nodename, "compatible", "virtio,mmio");
> > -        qemu_fdt_setprop_cells(fdt, nodename, "reg",
> > +        qemu_fdt_add_subnode(fdt, name);
> > +        qemu_fdt_setprop_string(fdt, name, "compatible", "virtio,mmio");
> > +        qemu_fdt_setprop_cells(fdt, name, "reg",
> >              0x0, memmap[VIRT_VIRTIO].base + i * memmap[VIRT_VIRTIO].size,
> >              0x0, memmap[VIRT_VIRTIO].size);
> > -        qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
> > -        qemu_fdt_setprop_cell(fdt, nodename, "interrupts", VIRTIO_IRQ + i);
> > -        g_free(nodename);
> > +        qemu_fdt_setprop_cell(fdt, name, "interrupt-parent",
> > +            plic_virtio_phandle);
> > +        qemu_fdt_setprop_cell(fdt, name, "interrupts", VIRTIO_IRQ + i);
> > +        g_free(name);
> >      }
> >
> > -    nodename = g_strdup_printf("/soc/pci@%lx",
> > +    name = g_strdup_printf("/soc/pci@%lx",
> >          (long) memmap[VIRT_PCIE_ECAM].base);
> > -    qemu_fdt_add_subnode(fdt, nodename);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "#address-cells",
> > -                          FDT_PCI_ADDR_CELLS);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "#interrupt-cells",
> > -                          FDT_PCI_INT_CELLS);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "#size-cells", 0x2);
> > -    qemu_fdt_setprop_string(fdt, nodename, "compatible",
> > -                            "pci-host-ecam-generic");
> > -    qemu_fdt_setprop_string(fdt, nodename, "device_type", "pci");
> > -    qemu_fdt_setprop_cell(fdt, nodename, "linux,pci-domain", 0);
> > -    qemu_fdt_setprop_cells(fdt, nodename, "bus-range", 0,
> > -                           memmap[VIRT_PCIE_ECAM].size /
> > -                               PCIE_MMCFG_SIZE_MIN - 1);
> > -    qemu_fdt_setprop(fdt, nodename, "dma-coherent", NULL, 0);
> > -    qemu_fdt_setprop_cells(fdt, nodename, "reg", 0, memmap[VIRT_PCIE_ECAM].base,
> > -                           0, memmap[VIRT_PCIE_ECAM].size);
> > -    qemu_fdt_setprop_sized_cells(fdt, nodename, "ranges",
> > +    qemu_fdt_add_subnode(fdt, name);
> > +    qemu_fdt_setprop_cell(fdt, name, "#address-cells", FDT_PCI_ADDR_CELLS);
> > +    qemu_fdt_setprop_cell(fdt, name, "#interrupt-cells", FDT_PCI_INT_CELLS);
> > +    qemu_fdt_setprop_cell(fdt, name, "#size-cells", 0x2);
> > +    qemu_fdt_setprop_string(fdt, name, "compatible", "pci-host-ecam-generic");
> > +    qemu_fdt_setprop_string(fdt, name, "device_type", "pci");
> > +    qemu_fdt_setprop_cell(fdt, name, "linux,pci-domain", 0);
> > +    qemu_fdt_setprop_cells(fdt, name, "bus-range", 0,
> > +        memmap[VIRT_PCIE_ECAM].size / PCIE_MMCFG_SIZE_MIN - 1);
> > +    qemu_fdt_setprop(fdt, name, "dma-coherent", NULL, 0);
> > +    qemu_fdt_setprop_cells(fdt, name, "reg", 0,
> > +        memmap[VIRT_PCIE_ECAM].base, 0, memmap[VIRT_PCIE_ECAM].size);
> > +    qemu_fdt_setprop_sized_cells(fdt, name, "ranges",
> >          1, FDT_PCI_RANGE_IOPORT, 2, 0,
> >          2, memmap[VIRT_PCIE_PIO].base, 2, memmap[VIRT_PCIE_PIO].size,
> >          1, FDT_PCI_RANGE_MMIO,
> >          2, memmap[VIRT_PCIE_MMIO].base,
> >          2, memmap[VIRT_PCIE_MMIO].base, 2, memmap[VIRT_PCIE_MMIO].size);
> > -    create_pcie_irq_map(fdt, nodename, plic_phandle);
> > -    g_free(nodename);
> > +    create_pcie_irq_map(fdt, name, plic_pcie_phandle);
> > +    g_free(name);
> >
> >      test_phandle = phandle++;
> > -    nodename = g_strdup_printf("/test@%lx",
> > +    name = g_strdup_printf("/soc/test@%lx",
> >          (long)memmap[VIRT_TEST].base);
> > -    qemu_fdt_add_subnode(fdt, nodename);
> > +    qemu_fdt_add_subnode(fdt, name);
> >      {
> >          const char compat[] = "sifive,test1\0sifive,test0\0syscon";
> > -        qemu_fdt_setprop(fdt, nodename, "compatible", compat, sizeof(compat));
> > +        qemu_fdt_setprop(fdt, name, "compatible", compat, sizeof(compat));
> >      }
> > -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> > +    qemu_fdt_setprop_cells(fdt, name, "reg",
> >          0x0, memmap[VIRT_TEST].base,
> >          0x0, memmap[VIRT_TEST].size);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "phandle", test_phandle);
> > -    test_phandle = qemu_fdt_get_phandle(fdt, nodename);
> > -    g_free(nodename);
> > -
> > -    nodename = g_strdup_printf("/reboot");
> > -    qemu_fdt_add_subnode(fdt, nodename);
> > -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "syscon-reboot");
> > -    qemu_fdt_setprop_cell(fdt, nodename, "regmap", test_phandle);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "offset", 0x0);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "value", FINISHER_RESET);
> > -    g_free(nodename);
> > -
> > -    nodename = g_strdup_printf("/poweroff");
> > -    qemu_fdt_add_subnode(fdt, nodename);
> > -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "syscon-poweroff");
> > -    qemu_fdt_setprop_cell(fdt, nodename, "regmap", test_phandle);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "offset", 0x0);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "value", FINISHER_PASS);
> > -    g_free(nodename);
> > -
> > -    nodename = g_strdup_printf("/uart@%lx",
> > -        (long)memmap[VIRT_UART0].base);
> > -    qemu_fdt_add_subnode(fdt, nodename);
> > -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "ns16550a");
> > -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> > +    qemu_fdt_setprop_cell(fdt, name, "phandle", test_phandle);
> > +    test_phandle = qemu_fdt_get_phandle(fdt, name);
> > +    g_free(name);
> > +
> > +    name = g_strdup_printf("/soc/reboot");
> > +    qemu_fdt_add_subnode(fdt, name);
> > +    qemu_fdt_setprop_string(fdt, name, "compatible", "syscon-reboot");
> > +    qemu_fdt_setprop_cell(fdt, name, "regmap", test_phandle);
> > +    qemu_fdt_setprop_cell(fdt, name, "offset", 0x0);
> > +    qemu_fdt_setprop_cell(fdt, name, "value", FINISHER_RESET);
> > +    g_free(name);
> > +
> > +    name = g_strdup_printf("/soc/poweroff");
> > +    qemu_fdt_add_subnode(fdt, name);
> > +    qemu_fdt_setprop_string(fdt, name, "compatible", "syscon-poweroff");
> > +    qemu_fdt_setprop_cell(fdt, name, "regmap", test_phandle);
> > +    qemu_fdt_setprop_cell(fdt, name, "offset", 0x0);
> > +    qemu_fdt_setprop_cell(fdt, name, "value", FINISHER_PASS);
> > +    g_free(name);
> > +
> > +    name = g_strdup_printf("/soc/uart@%lx", (long)memmap[VIRT_UART0].base);
> > +    qemu_fdt_add_subnode(fdt, name);
> > +    qemu_fdt_setprop_string(fdt, name, "compatible", "ns16550a");
> > +    qemu_fdt_setprop_cells(fdt, name, "reg",
> >          0x0, memmap[VIRT_UART0].base,
> >          0x0, memmap[VIRT_UART0].size);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "clock-frequency", 3686400);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "interrupts", UART0_IRQ);
> > +    qemu_fdt_setprop_cell(fdt, name, "clock-frequency", 3686400);
> > +    qemu_fdt_setprop_cell(fdt, name, "interrupt-parent", plic_mmio_phandle);
> > +    qemu_fdt_setprop_cell(fdt, name, "interrupts", UART0_IRQ);
> >
> >      qemu_fdt_add_subnode(fdt, "/chosen");
> > -    qemu_fdt_setprop_string(fdt, "/chosen", "stdout-path", nodename);
> > +    qemu_fdt_setprop_string(fdt, "/chosen", "stdout-path", name);
> >      if (cmdline) {
> >          qemu_fdt_setprop_string(fdt, "/chosen", "bootargs", cmdline);
> >      }
> > -    g_free(nodename);
> > -
> > -    nodename = g_strdup_printf("/rtc@%lx",
> > -        (long)memmap[VIRT_RTC].base);
> > -    qemu_fdt_add_subnode(fdt, nodename);
> > -    qemu_fdt_setprop_string(fdt, nodename, "compatible",
> > -        "google,goldfish-rtc");
> > -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> > +    g_free(name);
> > +
> > +    name = g_strdup_printf("/soc/rtc@%lx", (long)memmap[VIRT_RTC].base);
> > +    qemu_fdt_add_subnode(fdt, name);
> > +    qemu_fdt_setprop_string(fdt, name, "compatible", "google,goldfish-rtc");
> > +    qemu_fdt_setprop_cells(fdt, name, "reg",
> >          0x0, memmap[VIRT_RTC].base,
> >          0x0, memmap[VIRT_RTC].size);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "interrupts", RTC_IRQ);
> > -    g_free(nodename);
> > -
> > -    nodename = g_strdup_printf("/flash@%" PRIx64, flashbase);
> > -    qemu_fdt_add_subnode(s->fdt, nodename);
> > -    qemu_fdt_setprop_string(s->fdt, nodename, "compatible", "cfi-flash");
> > -    qemu_fdt_setprop_sized_cells(s->fdt, nodename, "reg",
> > +    qemu_fdt_setprop_cell(fdt, name, "interrupt-parent", plic_mmio_phandle);
> > +    qemu_fdt_setprop_cell(fdt, name, "interrupts", RTC_IRQ);
> > +    g_free(name);
> > +
> > +    name = g_strdup_printf("/soc/flash@%" PRIx64, flashbase);
> > +    qemu_fdt_add_subnode(s->fdt, name);
> > +    qemu_fdt_setprop_string(s->fdt, name, "compatible", "cfi-flash");
> > +    qemu_fdt_setprop_sized_cells(s->fdt, name, "reg",
> >                                   2, flashbase, 2, flashsize,
> >                                   2, flashbase + flashsize, 2, flashsize);
> > -    qemu_fdt_setprop_cell(s->fdt, nodename, "bank-width", 4);
> > -    g_free(nodename);
> > +    qemu_fdt_setprop_cell(s->fdt, name, "bank-width", 4);
> > +    g_free(name);
> >  }
> >
> > -
> >  static inline DeviceState *gpex_pcie_init(MemoryRegion *sys_mem,
> >                                            hwaddr ecam_base, hwaddr ecam_size,
> >                                            hwaddr mmio_base, hwaddr mmio_size,
> > @@ -478,21 +493,100 @@ static void riscv_virt_board_init(MachineState *machine)
> >      MemoryRegion *system_memory = get_system_memory();
> >      MemoryRegion *main_mem = g_new(MemoryRegion, 1);
> >      MemoryRegion *mask_rom = g_new(MemoryRegion, 1);
> > -    char *plic_hart_config;
> > +    char *plic_hart_config, *soc_name;
> >      size_t plic_hart_config_len;
> >      target_ulong start_addr = memmap[VIRT_DRAM].base;
> > -    int i;
> > -    unsigned int smp_cpus = machine->smp.cpus;
> > -
> > -    /* Initialize SOC */
> > -    object_initialize_child(OBJECT(machine), "soc", &s->soc, sizeof(s->soc),
> > -                            TYPE_RISCV_HART_ARRAY, &error_abort, NULL);
> > -    object_property_set_str(OBJECT(&s->soc), machine->cpu_type, "cpu-type",
> > -                            &error_abort);
> > -    object_property_set_int(OBJECT(&s->soc), smp_cpus, "num-harts",
> > -                            &error_abort);
> > -    object_property_set_bool(OBJECT(&s->soc), true, "realized",
> > -                            &error_abort);
> > +    DeviceState *mmio_plic, *virtio_plic, *pcie_plic;
> > +    int i, j, base_hartid, hart_count;
> > +
> > +    /* Check socket count limit */
> > +    if (VIRT_SOCKETS_MAX < riscv_socket_count(machine)) {
> > +        error_report("number of sockets/nodes should be less than %d",
> > +            VIRT_SOCKETS_MAX);
> > +        exit(1);
> > +    }
> > +
> > +    /* Initialize sockets */
> > +    mmio_plic = virtio_plic = pcie_plic = NULL;
> > +    for (i = 0; i < riscv_socket_count(machine); i++) {
> > +        if (!riscv_socket_check_hartids(machine, i)) {
> > +            error_report("discontinuous hartids in socket%d", i);
> > +            exit(1);
> > +        }
> > +
> > +        base_hartid = riscv_socket_first_hartid(machine, i);
> > +        if (base_hartid < 0) {
> > +            error_report("can't find hartid base for socket%d", i);
> > +            exit(1);
> > +        }
> > +
> > +        hart_count = riscv_socket_hart_count(machine, i);
> > +        if (hart_count < 0) {
> > +            error_report("can't find hart count for socket%d", i);
> > +            exit(1);
> > +        }
> > +
> > +        soc_name = g_strdup_printf("soc%d", i);
> > +        object_initialize_child(OBJECT(machine), soc_name, &s->soc[i],
> > +            sizeof(s->soc[i]), TYPE_RISCV_HART_ARRAY, &error_abort, NULL);
> > +        g_free(soc_name);
> > +        object_property_set_str(OBJECT(&s->soc[i]),
> > +            machine->cpu_type, "cpu-type", &error_abort);
> > +        object_property_set_int(OBJECT(&s->soc[i]),
> > +            base_hartid, "hartid-base", &error_abort);
> > +        object_property_set_int(OBJECT(&s->soc[i]),
> > +            hart_count, "num-harts", &error_abort);
> > +        object_property_set_bool(OBJECT(&s->soc[i]),
> > +            true, "realized", &error_abort);
> > +
> > +        /* Per-socket CLINT */
> > +        sifive_clint_create(
> > +            memmap[VIRT_CLINT].base + i * memmap[VIRT_CLINT].size,
> > +            memmap[VIRT_CLINT].size, base_hartid, hart_count,
> > +            SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE, true);
> > +
> > +        /* Per-socket PLIC hart topology configuration string */
> > +        plic_hart_config_len =
> > +            (strlen(VIRT_PLIC_HART_CONFIG) + 1) * hart_count;
> > +        plic_hart_config = g_malloc0(plic_hart_config_len);
> > +        for (j = 0; j < hart_count; j++) {
> > +            if (j != 0) {
> > +                strncat(plic_hart_config, ",", plic_hart_config_len);
> > +            }
> > +            strncat(plic_hart_config, VIRT_PLIC_HART_CONFIG,
> > +                plic_hart_config_len);
> > +            plic_hart_config_len -= (strlen(VIRT_PLIC_HART_CONFIG) + 1);
> > +        }
> > +
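Each socket's PLIC is configured with a comma-separated per-hart context string, one VIRT_PLIC_HART_CONFIG entry per hart. The buffer is sized at exactly (strlen + 1) * hart_count bytes. Below is a sketch of the same construction using snprintf() with an explicit remaining-length count. It assumes VIRT_PLIC_HART_CONFIG is "MS" (one M-mode and one S-mode context per hart). build_plic_hart_config() is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define VIRT_PLIC_HART_CONFIG "MS" /* assumed value */

/* Return a heap string "MS,MS,...,MS" with one entry per hart, or NULL.
 * The caller frees the result. */
static char *build_plic_hart_config(int hart_count)
{
    size_t entry = strlen(VIRT_PLIC_HART_CONFIG);
    /* hart_count entries + (hart_count - 1) commas + NUL */
    size_t len;
    char *buf, *p;

    assert(hart_count >= 1);
    len = (entry + 1) * (size_t)hart_count;
    buf = malloc(len);
    if (!buf) {
        return NULL;
    }
    p = buf;
    for (int j = 0; j < hart_count; j++) {
        /* The buffer is sized exactly, so this never truncates. */
        int n = snprintf(p, len - (size_t)(p - buf), "%s%s",
                         j ? "," : "", VIRT_PLIC_HART_CONFIG);
        p += n;
    }
    return buf;
}
```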
> > +        /* Per-socket PLIC */
> > +        s->plic[i] = sifive_plic_create(
> > +            memmap[VIRT_PLIC].base + i * memmap[VIRT_PLIC].size,
> > +            plic_hart_config, base_hartid,
> > +            VIRT_PLIC_NUM_SOURCES,
> > +            VIRT_PLIC_NUM_PRIORITIES,
> > +            VIRT_PLIC_PRIORITY_BASE,
> > +            VIRT_PLIC_PENDING_BASE,
> > +            VIRT_PLIC_ENABLE_BASE,
> > +            VIRT_PLIC_ENABLE_STRIDE,
> > +            VIRT_PLIC_CONTEXT_BASE,
> > +            VIRT_PLIC_CONTEXT_STRIDE,
> > +            memmap[VIRT_PLIC].size);
> > +        g_free(plic_hart_config);
> > +
> > +        /* Try to use different PLIC instance based device type */
> 
> Why do we have different types of PLICs?
> 
> > +        if (i == 0) {
> > +            mmio_plic = s->plic[i];
> > +            virtio_plic = s->plic[i];
> > +            pcie_plic = s->plic[i];
> > +        }
> > +        if (i == 1) {
> > +            virtio_plic = s->plic[i];
> > +            pcie_plic = s->plic[i];
> > +        }
> > +        if (i == 2) {
> > +            pcie_plic = s->plic[i];
> > +        }
> > +    }
> >
> >      /* register system main memory (actual RAM) */
> >      memory_region_init_ram(main_mem, NULL, "riscv_virt_board.ram",
> > @@ -571,38 +665,14 @@ static void riscv_virt_board_init(MachineState *machine)
> >                            memmap[VIRT_MROM].base + sizeof(reset_vec),
> >                            &address_space_memory);
> >
> > -    /* create PLIC hart topology configuration string */
> > -    plic_hart_config_len = (strlen(VIRT_PLIC_HART_CONFIG) + 1) * smp_cpus;
> > -    plic_hart_config = g_malloc0(plic_hart_config_len);
> > -    for (i = 0; i < smp_cpus; i++) {
> > -        if (i != 0) {
> > -            strncat(plic_hart_config, ",", plic_hart_config_len);
> > -        }
> > -        strncat(plic_hart_config, VIRT_PLIC_HART_CONFIG, plic_hart_config_len);
> > -        plic_hart_config_len -= (strlen(VIRT_PLIC_HART_CONFIG) + 1);
> > -    }
> > -
> > -    /* MMIO */
> > -    s->plic = sifive_plic_create(memmap[VIRT_PLIC].base,
> > -        plic_hart_config, 0,
> > -        VIRT_PLIC_NUM_SOURCES,
> > -        VIRT_PLIC_NUM_PRIORITIES,
> > -        VIRT_PLIC_PRIORITY_BASE,
> > -        VIRT_PLIC_PENDING_BASE,
> > -        VIRT_PLIC_ENABLE_BASE,
> > -        VIRT_PLIC_ENABLE_STRIDE,
> > -        VIRT_PLIC_CONTEXT_BASE,
> > -        VIRT_PLIC_CONTEXT_STRIDE,
> > -        memmap[VIRT_PLIC].size);
> > -    sifive_clint_create(memmap[VIRT_CLINT].base,
> > -        memmap[VIRT_CLINT].size, 0, smp_cpus,
> > -        SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE, true);
> > +    /* SiFive Test MMIO device */
> >      sifive_test_create(memmap[VIRT_TEST].base);
> >
> > +    /* VirtIO MMIO devices */
> >      for (i = 0; i < VIRTIO_COUNT; i++) {
> >          sysbus_create_simple("virtio-mmio",
> >              memmap[VIRT_VIRTIO].base + i * memmap[VIRT_VIRTIO].size,
> > -            qdev_get_gpio_in(DEVICE(s->plic), VIRTIO_IRQ + i));
> > +            qdev_get_gpio_in(DEVICE(virtio_plic), VIRTIO_IRQ + i));
> >      }
> >
> >      gpex_pcie_init(system_memory,
> > @@ -611,14 +681,14 @@ static void riscv_virt_board_init(MachineState *machine)
> >                           memmap[VIRT_PCIE_MMIO].base,
> >                           memmap[VIRT_PCIE_MMIO].size,
> >                           memmap[VIRT_PCIE_PIO].base,
> > -                         DEVICE(s->plic), true);
> > +                         DEVICE(pcie_plic), true);
> >
> >      serial_mm_init(system_memory, memmap[VIRT_UART0].base,
> > -        0, qdev_get_gpio_in(DEVICE(s->plic), UART0_IRQ), 399193,
> > +        0, qdev_get_gpio_in(DEVICE(mmio_plic), UART0_IRQ), 399193,
> >          serial_hd(0), DEVICE_LITTLE_ENDIAN);
> >
> >      sysbus_create_simple("goldfish_rtc", memmap[VIRT_RTC].base,
> > -        qdev_get_gpio_in(DEVICE(s->plic), RTC_IRQ));
> > +        qdev_get_gpio_in(DEVICE(mmio_plic), RTC_IRQ));
> >
> >      virt_flash_create(s);
> >
> > @@ -628,8 +698,6 @@ static void riscv_virt_board_init(MachineState *machine)
> >                                    drive_get(IF_PFLASH, 0, i));
> >      }
> >      virt_flash_map(s, system_memory);
> > -
> > -    g_free(plic_hart_config);
> >  }
> >
> >  static void riscv_virt_machine_instance_init(Object *obj)
> > @@ -642,9 +710,13 @@ static void riscv_virt_machine_class_init(ObjectClass *oc, void *data)
> >
> >      mc->desc = "RISC-V VirtIO board";
> >      mc->init = riscv_virt_board_init;
> > -    mc->max_cpus = 8;
> > +    mc->max_cpus = VIRT_CPUS_MAX;
> >      mc->default_cpu_type = VIRT_CPU;
> >      mc->pci_allow_0_address = true;
> > +    mc->possible_cpu_arch_ids = riscv_numa_possible_cpu_arch_ids;
> > +    mc->cpu_index_to_instance_props = riscv_numa_cpu_index_to_props;
> > +    mc->get_default_cpu_node_id = riscv_numa_get_default_cpu_node_id;
> > +    mc->numa_mem_supported = true;
> >  }
> >
> >  static const TypeInfo riscv_virt_machine_typeinfo = {
> > diff --git a/include/hw/riscv/virt.h b/include/hw/riscv/virt.h
> > index e69355efaf..1beacd7666 100644
> > --- a/include/hw/riscv/virt.h
> > +++ b/include/hw/riscv/virt.h
> > @@ -23,6 +23,9 @@
> >  #include "hw/sysbus.h"
> >  #include "hw/block/flash.h"
> >
> > +#define VIRT_CPUS_MAX 8
> > +#define VIRT_SOCKETS_MAX 8
> > +
> >  #define TYPE_RISCV_VIRT_MACHINE MACHINE_TYPE_NAME("virt")
> >  #define RISCV_VIRT_MACHINE(obj) \
> >      OBJECT_CHECK(RISCVVirtState, (obj), TYPE_RISCV_VIRT_MACHINE)
> > @@ -32,8 +35,8 @@ typedef struct {
> >      MachineState parent;
> >
> >      /*< public >*/
> > -    RISCVHartArrayState soc;
> > -    DeviceState *plic;
> > +    RISCVHartArrayState soc[VIRT_SOCKETS_MAX];
> > +    DeviceState *plic[VIRT_SOCKETS_MAX];
> >      PFlashCFI01 *flash[2];
> >
> >      void *fdt;
> > @@ -74,6 +77,8 @@ enum {
> >  #define VIRT_PLIC_ENABLE_STRIDE 0x80
> >  #define VIRT_PLIC_CONTEXT_BASE 0x200000
> >  #define VIRT_PLIC_CONTEXT_STRIDE 0x1000
> > +#define VIRT_PLIC_SIZE(__num_context) \
> > +    (VIRT_PLIC_CONTEXT_BASE + (__num_context) * VIRT_PLIC_CONTEXT_STRIDE)
> >
> >  #define FDT_PCI_ADDR_CELLS    3
> >  #define FDT_PCI_INT_CELLS     1
> > --
> > 2.25.1
> >
> >
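For reference, the new VIRT_PLIC_SIZE() macro with VIRT_CPUS_MAX * 2 contexts (one M-mode and one S-mode context per hart) evaluates to 0x200000 + 16 * 0x1000 = 0x210000 bytes per PLIC instance, and the per-socket PLIC and CLINT instances are placed back to back from their memmap base. The sketch below redoes that arithmetic with the constants from the quoted hunks; plic_base() and clint_base() are illustrative helper names that do not exist in the patch.

```c
#include <stdint.h>

/* Constants copied from the virt.h / virt.c hunks quoted in this thread. */
#define VIRT_CPUS_MAX            8
#define VIRT_PLIC_CONTEXT_BASE   0x200000
#define VIRT_PLIC_CONTEXT_STRIDE 0x1000
#define VIRT_PLIC_SIZE(__num_context) \
    (VIRT_PLIC_CONTEXT_BASE + (__num_context) * VIRT_PLIC_CONTEXT_STRIDE)

#define VIRT_PLIC_BASE  0xc000000UL   /* memmap[VIRT_PLIC].base */
#define VIRT_CLINT_BASE 0x2000000UL   /* memmap[VIRT_CLINT].base */
#define VIRT_CLINT_SIZE 0x10000UL     /* memmap[VIRT_CLINT].size */

/* Size of one PLIC instance, as in the new memmap entry. */
static const uint64_t plic_size = VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2);

/* Base of socket i's PLIC: base + i * size, as in the board init code. */
static uint64_t plic_base(int socket)
{
    return VIRT_PLIC_BASE + (uint64_t)socket * plic_size;
}

/* Base of socket i's CLINT, using the same stride scheme. */
static uint64_t clint_base(int socket)
{
    return VIRT_CLINT_BASE + (uint64_t)socket * VIRT_CLINT_SIZE;
}
```

With eight sockets the last PLIC instance ends at 0xd080000, which still lies inside the 0xc000000 + 0x4000000 window that the single PLIC occupied before this patch.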

Regards,
Anup


* RE: [PATCH v5 5/5] hw/riscv: virt: Allow creating multiple NUMA sockets
@ 2020-06-11 13:01       ` Anup Patel
  0 siblings, 0 replies; 28+ messages in thread
From: Anup Patel @ 2020-06-11 13:01 UTC (permalink / raw)
  To: Alistair Francis
  Cc: Peter Maydell, Palmer Dabbelt, Alistair Francis,
	Sagar Karandikar, Atish Patra, open list:RISC-V,
	qemu-devel@nongnu.org Developers, Anup Patel



> -----Original Message-----
> From: Alistair Francis <alistair23@gmail.com>
> Sent: 11 June 2020 04:55
> To: Anup Patel <Anup.Patel@wdc.com>
> Cc: Peter Maydell <peter.maydell@linaro.org>; Palmer Dabbelt
> <palmer@dabbelt.com>; Alistair Francis <Alistair.Francis@wdc.com>; Sagar
> Karandikar <sagark@eecs.berkeley.edu>; Atish Patra
> <Atish.Patra@wdc.com>; open list:RISC-V <qemu-riscv@nongnu.org>;
> qemu-devel@nongnu.org Developers <qemu-devel@nongnu.org>; Anup
> Patel <anup@brainfault.org>
> Subject: Re: [PATCH v5 5/5] hw/riscv: virt: Allow creating multiple NUMA
> sockets
> 
> On Fri, May 29, 2020 at 4:49 AM Anup Patel <anup.patel@wdc.com> wrote:
> >
> > We extend the RISC-V virt machine to allow creating a multi-socket
> > machine. Each RISC-V virt machine socket is a NUMA node having a set
> > of HARTs, a memory instance, a CLINT instance, and a PLIC instance.
> > Other devices are shared between all sockets. We also update the
> > generated device tree accordingly.
> >
> > By default, NUMA multi-socket support is disabled for the RISC-V virt
> > machine. To enable it, users can use the "-numa" command-line options
> > of QEMU.
> >
> > Example1: For two NUMA nodes with 2 CPUs each, append the following
> > to the command-line options: "-smp 4 -numa node -numa node"
> >
> > Example2: For two NUMA nodes with 1 and 3 CPUs, append the following
> > to the command-line options:
> > "-smp 4 -numa node -numa node -numa cpu,node-id=0,core-id=0 \
> >  -numa cpu,node-id=1,core-id=1 -numa cpu,node-id=1,core-id=2 \
> >  -numa cpu,node-id=1,core-id=3"
> >
> > The maximum number of sockets in a RISC-V virt machine is 8, but this
> > limit can be changed in the future.
> >
> > Signed-off-by: Anup Patel <anup.patel@wdc.com>
> > ---
> >  hw/riscv/virt.c         | 530 +++++++++++++++++++++++-----------------
> >  include/hw/riscv/virt.h |   9 +-
> >  2 files changed, 308 insertions(+), 231 deletions(-)
> >
> > diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
> > index 421815081d..2863b42cea 100644
> > --- a/hw/riscv/virt.c
> > +++ b/hw/riscv/virt.c
> > @@ -35,6 +35,7 @@
> >  #include "hw/riscv/sifive_test.h"
> >  #include "hw/riscv/virt.h"
> >  #include "hw/riscv/boot.h"
> > +#include "hw/riscv/numa.h"
> >  #include "chardev/char.h"
> >  #include "sysemu/arch_init.h"
> >  #include "sysemu/device_tree.h"
> > @@ -60,7 +61,7 @@ static const struct MemmapEntry {
> >      [VIRT_TEST] =        {   0x100000,        0x1000 },
> >      [VIRT_RTC] =         {   0x101000,        0x1000 },
> >      [VIRT_CLINT] =       {  0x2000000,       0x10000 },
> > -    [VIRT_PLIC] =        {  0xc000000,     0x4000000 },
> > +    [VIRT_PLIC] =        {  0xc000000, VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2) },
> >      [VIRT_UART0] =       { 0x10000000,         0x100 },
> >      [VIRT_VIRTIO] =      { 0x10001000,        0x1000 },
> >      [VIRT_FLASH] =       { 0x20000000,     0x4000000 },
> > @@ -182,10 +183,17 @@ static void create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
> >      uint64_t mem_size, const char *cmdline)
> >  {
> >      void *fdt;
> > -    int cpu, i;
> > -    uint32_t *cells;
> > -    char *nodename;
> > -    uint32_t plic_phandle, test_phandle, phandle = 1;
> > +    int i, cpu, socket;
> > +    MachineState *mc = MACHINE(s);
> > +    uint64_t addr, size;
> > +    uint32_t *clint_cells, *plic_cells;
> > +    unsigned long clint_addr, plic_addr;
> > +    uint32_t plic_phandle[MAX_NODES];
> > +    uint32_t cpu_phandle, intc_phandle, test_phandle;
> > +    uint32_t phandle = 1, plic_mmio_phandle = 1;
> > +    uint32_t plic_pcie_phandle = 1, plic_virtio_phandle = 1;
> > +    char *mem_name, *cpu_name, *core_name, *intc_name;
> > +    char *name, *clint_name, *plic_name, *clust_name;
> >      hwaddr flashsize = virt_memmap[VIRT_FLASH].size / 2;
> >      hwaddr flashbase = virt_memmap[VIRT_FLASH].base;
> >
> > @@ -206,231 +214,238 @@ static void create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
> >      qemu_fdt_setprop_cell(fdt, "/soc", "#size-cells", 0x2);
> >      qemu_fdt_setprop_cell(fdt, "/soc", "#address-cells", 0x2);
> >
> > -    nodename = g_strdup_printf("/memory@%lx",
> > -        (long)memmap[VIRT_DRAM].base);
> > -    qemu_fdt_add_subnode(fdt, nodename);
> > -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> > -        memmap[VIRT_DRAM].base >> 32, memmap[VIRT_DRAM].base,
> > -        mem_size >> 32, mem_size);
> > -    qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
> > -    g_free(nodename);
> > -
> >      qemu_fdt_add_subnode(fdt, "/cpus");
> >      qemu_fdt_setprop_cell(fdt, "/cpus", "timebase-frequency",
> >                            SIFIVE_CLINT_TIMEBASE_FREQ);
> >      qemu_fdt_setprop_cell(fdt, "/cpus", "#size-cells", 0x0);
> >      qemu_fdt_setprop_cell(fdt, "/cpus", "#address-cells", 0x1);
> > +    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map");
> 
> I'm no expert with cpu-map. Do you mind CCing Atish in the next version and
> see if he can Ack these DT changes?

Sure, he is already in CC.

By default, you and Atish are always in CC for all my patches. I apologize
if I missed CCing either of you on any of the patches.

> 
> > +
> > +    for (socket = (riscv_socket_count(mc) - 1); socket >= 0; socket--) {
> > +        clust_name = g_strdup_printf("/cpus/cpu-map/cluster%d", socket);
> > +        qemu_fdt_add_subnode(fdt, clust_name);
> > +
> > +        plic_cells = g_new0(uint32_t, s->soc[socket].num_harts * 4);
> > +        clint_cells = g_new0(uint32_t, s->soc[socket].num_harts * 4);
> > +
> > +        for (cpu = s->soc[socket].num_harts - 1; cpu >= 0; cpu--) {
> > +            cpu_phandle = phandle++;
> >
> > -    for (cpu = s->soc.num_harts - 1; cpu >= 0; cpu--) {
> > -        int cpu_phandle = phandle++;
> > -        int intc_phandle;
> > -        nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
> > -        char *intc = g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
> > -        char *isa = riscv_isa_string(&s->soc.harts[cpu]);
> > -        qemu_fdt_add_subnode(fdt, nodename);
> > +            cpu_name = g_strdup_printf("/cpus/cpu@%d",
> > +                s->soc[socket].hartid_base + cpu);
> > +            qemu_fdt_add_subnode(fdt, cpu_name);
> >  #if defined(TARGET_RISCV32)
> > -        qemu_fdt_setprop_string(fdt, nodename, "mmu-type", "riscv,sv32");
> > +            qemu_fdt_setprop_string(fdt, cpu_name, "mmu-type", "riscv,sv32");
> >  #else
> > -        qemu_fdt_setprop_string(fdt, nodename, "mmu-type", "riscv,sv48");
> > +            qemu_fdt_setprop_string(fdt, cpu_name, "mmu-type", "riscv,sv48");
> >  #endif
> > -        qemu_fdt_setprop_string(fdt, nodename, "riscv,isa", isa);
> > -        qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv");
> > -        qemu_fdt_setprop_string(fdt, nodename, "status", "okay");
> > -        qemu_fdt_setprop_cell(fdt, nodename, "reg", cpu);
> > -        qemu_fdt_setprop_string(fdt, nodename, "device_type", "cpu");
> > -        qemu_fdt_setprop_cell(fdt, nodename, "phandle", cpu_phandle);
> > -        intc_phandle = phandle++;
> > -        qemu_fdt_add_subnode(fdt, intc);
> > -        qemu_fdt_setprop_cell(fdt, intc, "phandle", intc_phandle);
> > -        qemu_fdt_setprop_string(fdt, intc, "compatible", "riscv,cpu-intc");
> > -        qemu_fdt_setprop(fdt, intc, "interrupt-controller", NULL, 0);
> > -        qemu_fdt_setprop_cell(fdt, intc, "#interrupt-cells", 1);
> > -        g_free(isa);
> > -        g_free(intc);
> > -        g_free(nodename);
> > -    }
> > +            name = riscv_isa_string(&s->soc[socket].harts[cpu]);
> > +            qemu_fdt_setprop_string(fdt, cpu_name, "riscv,isa", name);
> > +            g_free(name);
> > +            qemu_fdt_setprop_string(fdt, cpu_name, "compatible", "riscv");
> > +            qemu_fdt_setprop_string(fdt, cpu_name, "status", "okay");
> > +            qemu_fdt_setprop_cell(fdt, cpu_name, "reg",
> > +                s->soc[socket].hartid_base + cpu);
> > +            qemu_fdt_setprop_string(fdt, cpu_name, "device_type", "cpu");
> > +            riscv_socket_fdt_write_id(mc, fdt, cpu_name, socket);
> > +            qemu_fdt_setprop_cell(fdt, cpu_name, "phandle", cpu_phandle);
> > +
> > +            intc_name = g_strdup_printf("%s/interrupt-controller", cpu_name);
> > +            qemu_fdt_add_subnode(fdt, intc_name);
> > +            intc_phandle = phandle++;
> > +            qemu_fdt_setprop_cell(fdt, intc_name, "phandle", intc_phandle);
> > +            qemu_fdt_setprop_string(fdt, intc_name, "compatible",
> > +                "riscv,cpu-intc");
> > +            qemu_fdt_setprop(fdt, intc_name, "interrupt-controller", NULL, 0);
> > +            qemu_fdt_setprop_cell(fdt, intc_name, "#interrupt-cells", 1);
> > +
> > +            clint_cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> > +            clint_cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_SOFT);
> > +            clint_cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> > +            clint_cells[cpu * 4 + 3] = cpu_to_be32(IRQ_M_TIMER);
> > +
> > +            plic_cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> > +            plic_cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_EXT);
> > +            plic_cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> > +            plic_cells[cpu * 4 + 3] = cpu_to_be32(IRQ_S_EXT);
> > +
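Each hart contributes four cells to both interrupts-extended properties: one (phandle, irq) pair per interrupt line it receives from that controller. A minimal sketch of the layout, assuming the IRQ_* constants carry the standard RISC-V interrupt numbers (machine software 3, machine timer 7, supervisor external 9, machine external 11); fill_irq_cells() and irq_cells_bytes() are hypothetical helpers, and the cpu_to_be32() conversion is left out so the values stay readable.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard RISC-V interrupt numbers (bit positions in mip/mie). */
enum {
    IRQ_M_SOFT  = 3,
    IRQ_M_TIMER = 7,
    IRQ_S_EXT   = 9,
    IRQ_M_EXT   = 11,
};

/*
 * Fill the CLINT and PLIC "interrupts-extended" arrays for one socket.
 * intc_phandle[cpu] is the phandle of that hart's interrupt-controller
 * node. Each array needs num_harts * 4 entries. Values stay in host
 * order here; the patch also converts each one with cpu_to_be32().
 */
static void fill_irq_cells(const uint32_t *intc_phandle, int num_harts,
                           uint32_t *clint_cells, uint32_t *plic_cells)
{
    for (int cpu = 0; cpu < num_harts; cpu++) {
        /* CLINT: machine software and machine timer interrupts. */
        clint_cells[cpu * 4 + 0] = intc_phandle[cpu];
        clint_cells[cpu * 4 + 1] = IRQ_M_SOFT;
        clint_cells[cpu * 4 + 2] = intc_phandle[cpu];
        clint_cells[cpu * 4 + 3] = IRQ_M_TIMER;

        /* PLIC: machine and supervisor external interrupts. */
        plic_cells[cpu * 4 + 0] = intc_phandle[cpu];
        plic_cells[cpu * 4 + 1] = IRQ_M_EXT;
        plic_cells[cpu * 4 + 2] = intc_phandle[cpu];
        plic_cells[cpu * 4 + 3] = IRQ_S_EXT;
    }
}

/* Property length in bytes, matching num_harts * sizeof(uint32_t) * 4. */
static size_t irq_cells_bytes(int num_harts)
{
    return (size_t)num_harts * sizeof(uint32_t) * 4;
}
```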
> > +            core_name = g_strdup_printf("%s/core%d", clust_name, cpu);
> > +            qemu_fdt_add_subnode(fdt, core_name);
> > +            qemu_fdt_setprop_cell(fdt, core_name, "cpu", cpu_phandle);
> > +
> > +            g_free(core_name);
> > +            g_free(intc_name);
> > +            g_free(cpu_name);
> > +        }
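The two node names created per hart use different indices: the cpu@ unit address is the global hart ID (hartid_base + cpu), while coreN in the cpu-map restarts at 0 inside each clusterN. A small sketch using the same format strings as the patch; cpu_node_paths() is a hypothetical helper, not part of the patch.

```c
#include <stdio.h>

/*
 * Build the two node paths written for local hart `cpu` of `socket`:
 * the cpu@ node uses the global hart ID, the cpu-map core node uses
 * the socket-local index.
 */
static void cpu_node_paths(int socket, int hartid_base, int cpu,
                           char *cpu_path, size_t cpu_len,
                           char *core_path, size_t core_len)
{
    snprintf(cpu_path, cpu_len, "/cpus/cpu@%d", hartid_base + cpu);
    snprintf(core_path, core_len, "/cpus/cpu-map/cluster%d/core%d",
             socket, cpu);
}
```

For Example2 in the commit message, the third hart of socket 1 (hartid_base 1, local index 2) becomes /cpus/cpu@3 and /cpus/cpu-map/cluster1/core2.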
> >
> > -    /* Add cpu-topology node */
> > -    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map");
> > -    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map/cluster0");
> > -    for (cpu = s->soc.num_harts - 1; cpu >= 0; cpu--) {
> > -        char *core_nodename = g_strdup_printf("/cpus/cpu-map/cluster0/core%d",
> > -                                              cpu);
> > -        char *cpu_nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
> > -        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, cpu_nodename);
> > -        qemu_fdt_add_subnode(fdt, core_nodename);
> > -        qemu_fdt_setprop_cell(fdt, core_nodename, "cpu", intc_phandle);
> > -        g_free(core_nodename);
> > -        g_free(cpu_nodename);
> > +        addr = memmap[VIRT_DRAM].base + riscv_socket_mem_offset(mc, socket);
> > +        size = riscv_socket_mem_size(mc, socket);
> > +        mem_name = g_strdup_printf("/memory@%lx", (long)addr);
> > +        qemu_fdt_add_subnode(fdt, mem_name);
> > +        qemu_fdt_setprop_cells(fdt, mem_name, "reg",
> > +            addr >> 32, addr, size >> 32, size);
> > +        qemu_fdt_setprop_string(fdt, mem_name, "device_type", "memory");
> > +        riscv_socket_fdt_write_id(mc, fdt, mem_name, socket);
> > +        g_free(mem_name);
> > +
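The reg value written for each per-socket memory node is four 32-bit cells: the high and low halves of the base address, then the high and low halves of the size. This assumes the root node uses #address-cells = #size-cells = 2 (set outside the quoted hunk). The sketch makes the split explicit; reg_cells() is a hypothetical helper, and the truncation of each argument to 32 bits is what qemu_fdt_setprop_cells() is assumed to do with addr >> 32, addr, size >> 32, size.

```c
#include <stdint.h>

/*
 * Split a 64-bit base and size into the four cells of a "reg" entry
 * for #address-cells = #size-cells = 2.
 */
static void reg_cells(uint64_t addr, uint64_t size, uint32_t cells[4])
{
    cells[0] = (uint32_t)(addr >> 32);  /* address, high 32 bits */
    cells[1] = (uint32_t)addr;          /* address, low 32 bits */
    cells[2] = (uint32_t)(size >> 32);  /* size, high 32 bits */
    cells[3] = (uint32_t)size;          /* size, low 32 bits */
}
```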
> > +        clint_addr = memmap[VIRT_CLINT].base +
> > +            (memmap[VIRT_CLINT].size * socket);
> > +        clint_name = g_strdup_printf("/soc/clint@%lx", clint_addr);
> > +        qemu_fdt_add_subnode(fdt, clint_name);
> > +        qemu_fdt_setprop_string(fdt, clint_name, "compatible", "riscv,clint0");
> > +        qemu_fdt_setprop_cells(fdt, clint_name, "reg",
> > +            0x0, clint_addr, 0x0, memmap[VIRT_CLINT].size);
> > +        qemu_fdt_setprop(fdt, clint_name, "interrupts-extended",
> > +            clint_cells, s->soc[socket].num_harts * sizeof(uint32_t) * 4);
> > +        riscv_socket_fdt_write_id(mc, fdt, clint_name, socket);
> > +        g_free(clint_name);
> > +
> > +        plic_phandle[socket] = phandle++;
> > +        plic_addr = memmap[VIRT_PLIC].base + (memmap[VIRT_PLIC].size * socket);
> > +        plic_name = g_strdup_printf("/soc/plic@%lx", plic_addr);
> > +        qemu_fdt_add_subnode(fdt, plic_name);
> > +        qemu_fdt_setprop_cell(fdt, plic_name,
> > +            "#address-cells", FDT_PLIC_ADDR_CELLS);
> > +        qemu_fdt_setprop_cell(fdt, plic_name,
> > +            "#interrupt-cells", FDT_PLIC_INT_CELLS);
> > +        qemu_fdt_setprop_string(fdt, plic_name, "compatible", "riscv,plic0");
> > +        qemu_fdt_setprop(fdt, plic_name, "interrupt-controller", NULL, 0);
> > +        qemu_fdt_setprop(fdt, plic_name, "interrupts-extended",
> > +            plic_cells, s->soc[socket].num_harts * sizeof(uint32_t) * 4);
> > +        qemu_fdt_setprop_cells(fdt, plic_name, "reg",
> > +            0x0, plic_addr, 0x0, memmap[VIRT_PLIC].size);
> > +        qemu_fdt_setprop_cell(fdt, plic_name, "riscv,ndev", VIRTIO_NDEV);
> > +        riscv_socket_fdt_write_id(mc, fdt, plic_name, socket);
> > +        qemu_fdt_setprop_cell(fdt, plic_name, "phandle", plic_phandle[socket]);
> > +        g_free(plic_name);
> > +
> > +        g_free(clint_cells);
> > +        g_free(plic_cells);
> > +        g_free(clust_name);
> >      }
> >
> > -    cells =  g_new0(uint32_t, s->soc.num_harts * 4);
> > -    for (cpu = 0; cpu < s->soc.num_harts; cpu++) {
> > -        nodename =
> > -            g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
> > -        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, nodename);
> > -        cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> > -        cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_SOFT);
> > -        cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> > -        cells[cpu * 4 + 3] = cpu_to_be32(IRQ_M_TIMER);
> > -        g_free(nodename);
> > -    }
> > -    nodename = g_strdup_printf("/soc/clint@%lx",
> > -        (long)memmap[VIRT_CLINT].base);
> > -    qemu_fdt_add_subnode(fdt, nodename);
> > -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv,clint0");
> > -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> > -        0x0, memmap[VIRT_CLINT].base,
> > -        0x0, memmap[VIRT_CLINT].size);
> > -    qemu_fdt_setprop(fdt, nodename, "interrupts-extended",
> > -        cells, s->soc.num_harts * sizeof(uint32_t) * 4);
> > -    g_free(cells);
> > -    g_free(nodename);
> > -
> > -    plic_phandle = phandle++;
> > -    cells =  g_new0(uint32_t, s->soc.num_harts * 4);
> > -    for (cpu = 0; cpu < s->soc.num_harts; cpu++) {
> > -        nodename =
> > -            g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
> > -        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, nodename);
> > -        cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> > -        cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_EXT);
> > -        cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> > -        cells[cpu * 4 + 3] = cpu_to_be32(IRQ_S_EXT);
> > -        g_free(nodename);
> > +    for (socket = 0; socket < riscv_socket_count(mc); socket++) {
> > +        if (socket == 0) {
> > +            plic_mmio_phandle = plic_phandle[socket];
> > +            plic_virtio_phandle = plic_phandle[socket];
> > +            plic_pcie_phandle = plic_phandle[socket];
> > +        }
> > +        if (socket == 1) {
> > +            plic_virtio_phandle = plic_phandle[socket];
> > +            plic_pcie_phandle = plic_phandle[socket];
> > +        }
> > +        if (socket == 2) {
> > +            plic_pcie_phandle = plic_phandle[socket];
> > +        }
> >      }
> > -    nodename = g_strdup_printf("/soc/interrupt-controller@%lx",
> > -        (long)memmap[VIRT_PLIC].base);
> > -    qemu_fdt_add_subnode(fdt, nodename);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "#address-cells",
> > -                          FDT_PLIC_ADDR_CELLS);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "#interrupt-cells",
> > -                          FDT_PLIC_INT_CELLS);
> > -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv,plic0");
> > -    qemu_fdt_setprop(fdt, nodename, "interrupt-controller", NULL, 0);
> > -    qemu_fdt_setprop(fdt, nodename, "interrupts-extended",
> > -        cells, s->soc.num_harts * sizeof(uint32_t) * 4);
> > -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> > -        0x0, memmap[VIRT_PLIC].base,
> > -        0x0, memmap[VIRT_PLIC].size);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "riscv,ndev", VIRTIO_NDEV);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "phandle", plic_phandle);
> > -    plic_phandle = qemu_fdt_get_phandle(fdt, nodename);
> > -    g_free(cells);
> > -    g_free(nodename);
> > +
> > +    riscv_socket_fdt_write_distance_matrix(mc, fdt);
> >
> >      for (i = 0; i < VIRTIO_COUNT; i++) {
> > -        nodename = g_strdup_printf("/virtio_mmio@%lx",
> > +        name = g_strdup_printf("/soc/virtio_mmio@%lx",
> >              (long)(memmap[VIRT_VIRTIO].base + i * memmap[VIRT_VIRTIO].size));
> > -        qemu_fdt_add_subnode(fdt, nodename);
> > -        qemu_fdt_setprop_string(fdt, nodename, "compatible", "virtio,mmio");
> > -        qemu_fdt_setprop_cells(fdt, nodename, "reg",
> > +        qemu_fdt_add_subnode(fdt, name);
> > +        qemu_fdt_setprop_string(fdt, name, "compatible", "virtio,mmio");
> > +        qemu_fdt_setprop_cells(fdt, name, "reg",
> >              0x0, memmap[VIRT_VIRTIO].base + i * memmap[VIRT_VIRTIO].size,
> >              0x0, memmap[VIRT_VIRTIO].size);
> > -        qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
> > -        qemu_fdt_setprop_cell(fdt, nodename, "interrupts", VIRTIO_IRQ + i);
> > -        g_free(nodename);
> > +        qemu_fdt_setprop_cell(fdt, name, "interrupt-parent",
> > +            plic_virtio_phandle);
> > +        qemu_fdt_setprop_cell(fdt, name, "interrupts", VIRTIO_IRQ + i);
> > +        g_free(name);
> >      }
> >
> > -    nodename = g_strdup_printf("/soc/pci@%lx",
> > +    name = g_strdup_printf("/soc/pci@%lx",
> >          (long) memmap[VIRT_PCIE_ECAM].base);
> > -    qemu_fdt_add_subnode(fdt, nodename);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "#address-cells",
> > -                          FDT_PCI_ADDR_CELLS);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "#interrupt-cells",
> > -                          FDT_PCI_INT_CELLS);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "#size-cells", 0x2);
> > -    qemu_fdt_setprop_string(fdt, nodename, "compatible",
> > -                            "pci-host-ecam-generic");
> > -    qemu_fdt_setprop_string(fdt, nodename, "device_type", "pci");
> > -    qemu_fdt_setprop_cell(fdt, nodename, "linux,pci-domain", 0);
> > -    qemu_fdt_setprop_cells(fdt, nodename, "bus-range", 0,
> > -                           memmap[VIRT_PCIE_ECAM].size /
> > -                               PCIE_MMCFG_SIZE_MIN - 1);
> > -    qemu_fdt_setprop(fdt, nodename, "dma-coherent", NULL, 0);
> > -    qemu_fdt_setprop_cells(fdt, nodename, "reg", 0, memmap[VIRT_PCIE_ECAM].base,
> > -                           0, memmap[VIRT_PCIE_ECAM].size);
> > -    qemu_fdt_setprop_sized_cells(fdt, nodename, "ranges",
> > +    qemu_fdt_add_subnode(fdt, name);
> > +    qemu_fdt_setprop_cell(fdt, name, "#address-cells", FDT_PCI_ADDR_CELLS);
> > +    qemu_fdt_setprop_cell(fdt, name, "#interrupt-cells", FDT_PCI_INT_CELLS);
> > +    qemu_fdt_setprop_cell(fdt, name, "#size-cells", 0x2);
> > +    qemu_fdt_setprop_string(fdt, name, "compatible", "pci-host-ecam-generic");
> > +    qemu_fdt_setprop_string(fdt, name, "device_type", "pci");
> > +    qemu_fdt_setprop_cell(fdt, name, "linux,pci-domain", 0);
> > +    qemu_fdt_setprop_cells(fdt, name, "bus-range", 0,
> > +        memmap[VIRT_PCIE_ECAM].size / PCIE_MMCFG_SIZE_MIN - 1);
> > +    qemu_fdt_setprop(fdt, name, "dma-coherent", NULL, 0);
> > +    qemu_fdt_setprop_cells(fdt, name, "reg", 0,
> > +        memmap[VIRT_PCIE_ECAM].base, 0, memmap[VIRT_PCIE_ECAM].size);
> > +    qemu_fdt_setprop_sized_cells(fdt, name, "ranges",
> >          1, FDT_PCI_RANGE_IOPORT, 2, 0,
> >          2, memmap[VIRT_PCIE_PIO].base, 2, memmap[VIRT_PCIE_PIO].size,
> >          1, FDT_PCI_RANGE_MMIO,
> >          2, memmap[VIRT_PCIE_MMIO].base,
> >          2, memmap[VIRT_PCIE_MMIO].base, 2, memmap[VIRT_PCIE_MMIO].size);
> > -    create_pcie_irq_map(fdt, nodename, plic_phandle);
> > -    g_free(nodename);
> > +    create_pcie_irq_map(fdt, name, plic_pcie_phandle);
> > +    g_free(name);
> >
> >      test_phandle = phandle++;
> > -    nodename = g_strdup_printf("/test@%lx",
> > +    name = g_strdup_printf("/soc/test@%lx",
> >          (long)memmap[VIRT_TEST].base);
> > -    qemu_fdt_add_subnode(fdt, nodename);
> > +    qemu_fdt_add_subnode(fdt, name);
> >      {
> >          const char compat[] = "sifive,test1\0sifive,test0\0syscon";
> > -        qemu_fdt_setprop(fdt, nodename, "compatible", compat, sizeof(compat));
> > +        qemu_fdt_setprop(fdt, name, "compatible", compat, sizeof(compat));
> >      }
> > -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> > +    qemu_fdt_setprop_cells(fdt, name, "reg",
> >          0x0, memmap[VIRT_TEST].base,
> >          0x0, memmap[VIRT_TEST].size);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "phandle", test_phandle);
> > -    test_phandle = qemu_fdt_get_phandle(fdt, nodename);
> > -    g_free(nodename);
> > -
> > -    nodename = g_strdup_printf("/reboot");
> > -    qemu_fdt_add_subnode(fdt, nodename);
> > -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "syscon-reboot");
> > -    qemu_fdt_setprop_cell(fdt, nodename, "regmap", test_phandle);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "offset", 0x0);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "value", FINISHER_RESET);
> > -    g_free(nodename);
> > -
> > -    nodename = g_strdup_printf("/poweroff");
> > -    qemu_fdt_add_subnode(fdt, nodename);
> > -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "syscon-poweroff");
> > -    qemu_fdt_setprop_cell(fdt, nodename, "regmap", test_phandle);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "offset", 0x0);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "value", FINISHER_PASS);
> > -    g_free(nodename);
> > -
> > -    nodename = g_strdup_printf("/uart@%lx",
> > -        (long)memmap[VIRT_UART0].base);
> > -    qemu_fdt_add_subnode(fdt, nodename);
> > -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "ns16550a");
> > -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> > +    qemu_fdt_setprop_cell(fdt, name, "phandle", test_phandle);
> > +    test_phandle = qemu_fdt_get_phandle(fdt, name);
> > +    g_free(name);
> > +
> > +    name = g_strdup_printf("/soc/reboot");
> > +    qemu_fdt_add_subnode(fdt, name);
> > +    qemu_fdt_setprop_string(fdt, name, "compatible", "syscon-reboot");
> > +    qemu_fdt_setprop_cell(fdt, name, "regmap", test_phandle);
> > +    qemu_fdt_setprop_cell(fdt, name, "offset", 0x0);
> > +    qemu_fdt_setprop_cell(fdt, name, "value", FINISHER_RESET);
> > +    g_free(name);
> > +
> > +    name = g_strdup_printf("/soc/poweroff");
> > +    qemu_fdt_add_subnode(fdt, name);
> > +    qemu_fdt_setprop_string(fdt, name, "compatible", "syscon-poweroff");
> > +    qemu_fdt_setprop_cell(fdt, name, "regmap", test_phandle);
> > +    qemu_fdt_setprop_cell(fdt, name, "offset", 0x0);
> > +    qemu_fdt_setprop_cell(fdt, name, "value", FINISHER_PASS);
> > +    g_free(name);
> > +
> > +    name = g_strdup_printf("/soc/uart@%lx", (long)memmap[VIRT_UART0].base);
> > +    qemu_fdt_add_subnode(fdt, name);
> > +    qemu_fdt_setprop_string(fdt, name, "compatible", "ns16550a");
> > +    qemu_fdt_setprop_cells(fdt, name, "reg",
> >          0x0, memmap[VIRT_UART0].base,
> >          0x0, memmap[VIRT_UART0].size);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "clock-frequency", 3686400);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "interrupts", UART0_IRQ);
> > +    qemu_fdt_setprop_cell(fdt, name, "clock-frequency", 3686400);
> > +    qemu_fdt_setprop_cell(fdt, name, "interrupt-parent", plic_mmio_phandle);
> > +    qemu_fdt_setprop_cell(fdt, name, "interrupts", UART0_IRQ);
> >
> >      qemu_fdt_add_subnode(fdt, "/chosen");
> > -    qemu_fdt_setprop_string(fdt, "/chosen", "stdout-path", nodename);
> > +    qemu_fdt_setprop_string(fdt, "/chosen", "stdout-path", name);
> >      if (cmdline) {
> >          qemu_fdt_setprop_string(fdt, "/chosen", "bootargs", cmdline);
> >      }
> > -    g_free(nodename);
> > -
> > -    nodename = g_strdup_printf("/rtc@%lx",
> > -        (long)memmap[VIRT_RTC].base);
> > -    qemu_fdt_add_subnode(fdt, nodename);
> > -    qemu_fdt_setprop_string(fdt, nodename, "compatible",
> > -        "google,goldfish-rtc");
> > -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> > +    g_free(name);
> > +
> > +    name = g_strdup_printf("/soc/rtc@%lx", (long)memmap[VIRT_RTC].base);
> > +    qemu_fdt_add_subnode(fdt, name);
> > +    qemu_fdt_setprop_string(fdt, name, "compatible", "google,goldfish-rtc");
> > +    qemu_fdt_setprop_cells(fdt, name, "reg",
> >          0x0, memmap[VIRT_RTC].base,
> >          0x0, memmap[VIRT_RTC].size);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
> > -    qemu_fdt_setprop_cell(fdt, nodename, "interrupts", RTC_IRQ);
> > -    g_free(nodename);
> > -
> > -    nodename = g_strdup_printf("/flash@%" PRIx64, flashbase);
> > -    qemu_fdt_add_subnode(s->fdt, nodename);
> > -    qemu_fdt_setprop_string(s->fdt, nodename, "compatible", "cfi-flash");
> > -    qemu_fdt_setprop_sized_cells(s->fdt, nodename, "reg",
> > +    qemu_fdt_setprop_cell(fdt, name, "interrupt-parent", plic_mmio_phandle);
> > +    qemu_fdt_setprop_cell(fdt, name, "interrupts", RTC_IRQ);
> > +    g_free(name);
> > +
> > +    name = g_strdup_printf("/soc/flash@%" PRIx64, flashbase);
> > +    qemu_fdt_add_subnode(s->fdt, name);
> > +    qemu_fdt_setprop_string(s->fdt, name, "compatible", "cfi-flash");
> > +    qemu_fdt_setprop_sized_cells(s->fdt, name, "reg",
> >                                   2, flashbase, 2, flashsize,
> >                                   2, flashbase + flashsize, 2, flashsize);
> > -    qemu_fdt_setprop_cell(s->fdt, nodename, "bank-width", 4);
> > -    g_free(nodename);
> > +    qemu_fdt_setprop_cell(s->fdt, name, "bank-width", 4);
> > +    g_free(name);
> >  }
> >
> > -
> >  static inline DeviceState *gpex_pcie_init(MemoryRegion *sys_mem,
> >                                            hwaddr ecam_base, hwaddr ecam_size,
> >                                            hwaddr mmio_base, hwaddr mmio_size,
> > @@ -478,21 +493,100 @@ static void riscv_virt_board_init(MachineState *machine)
> >      MemoryRegion *system_memory = get_system_memory();
> >      MemoryRegion *main_mem = g_new(MemoryRegion, 1);
> >      MemoryRegion *mask_rom = g_new(MemoryRegion, 1);
> > -    char *plic_hart_config;
> > +    char *plic_hart_config, *soc_name;
> >      size_t plic_hart_config_len;
> >      target_ulong start_addr = memmap[VIRT_DRAM].base;
> > -    int i;
> > -    unsigned int smp_cpus = machine->smp.cpus;
> > -
> > -    /* Initialize SOC */
> > -    object_initialize_child(OBJECT(machine), "soc", &s->soc, sizeof(s->soc),
> > -                            TYPE_RISCV_HART_ARRAY, &error_abort, NULL);
> > -    object_property_set_str(OBJECT(&s->soc), machine->cpu_type, "cpu-type",
> > -                            &error_abort);
> > -    object_property_set_int(OBJECT(&s->soc), smp_cpus, "num-harts",
> > -                            &error_abort);
> > -    object_property_set_bool(OBJECT(&s->soc), true, "realized",
> > -                            &error_abort);
> > +    DeviceState *mmio_plic, *virtio_plic, *pcie_plic;
> > +    int i, j, base_hartid, hart_count;
> > +
> > +    /* Check socket count limit */
> > +    if (VIRT_SOCKETS_MAX < riscv_socket_count(machine)) {
> > +        error_report("number of sockets/nodes should not exceed %d",
> > +            VIRT_SOCKETS_MAX);
> > +        exit(1);
> > +    }
> > +
> > +    /* Initialize sockets */
> > +    mmio_plic = virtio_plic = pcie_plic = NULL;
> > +    for (i = 0; i < riscv_socket_count(machine); i++) {
> > +        if (!riscv_socket_check_hartids(machine, i)) {
> > +            error_report("discontinuous hartids in socket%d", i);
> > +            exit(1);
> > +        }
> > +
> > +        base_hartid = riscv_socket_first_hartid(machine, i);
> > +        if (base_hartid < 0) {
> > +            error_report("can't find hartid base for socket%d", i);
> > +            exit(1);
> > +        }
> > +
> > +        hart_count = riscv_socket_hart_count(machine, i);
> > +        if (hart_count < 0) {
> > +            error_report("can't find hart count for socket%d", i);
> > +            exit(1);
> > +        }
> > +
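The three checks above require each socket's harts to form one contiguous run of hart IDs, so that a socket can be described by a single hartid-base and num-harts pair. The real helpers come from PATCH3 (hw/riscv/numa.h), which is not quoted here; the sketch below is a hypothetical stand-in that assumes hart ID equals CPU index and that node_of[i] holds the NUMA node of hart i.

```c
#include <stdbool.h>

/* Hypothetical stand-in for riscv_socket_first_hartid(). */
static int socket_first_hartid(const int *node_of, int ncpus, int socket)
{
    for (int i = 0; i < ncpus; i++) {
        if (node_of[i] == socket) {
            return i;
        }
    }
    return -1;                      /* socket has no harts */
}

/* Hypothetical stand-in for riscv_socket_hart_count(). */
static int socket_hart_count(const int *node_of, int ncpus, int socket)
{
    int count = 0;

    for (int i = 0; i < ncpus; i++) {
        if (node_of[i] == socket) {
            count++;
        }
    }
    return count ? count : -1;      /* negative, like the "< 0" checks */
}

/* Hypothetical stand-in for riscv_socket_check_hartids(). */
static bool socket_hartids_contiguous(const int *node_of, int ncpus,
                                      int socket)
{
    int first = socket_first_hartid(node_of, ncpus, socket);
    int count = socket_hart_count(node_of, ncpus, socket);

    if (first < 0 || count < 0) {
        return false;
    }
    /* first + count <= ncpus always holds, so this stays in bounds. */
    for (int i = first; i < first + count; i++) {
        if (node_of[i] != socket) {
            return false;
        }
    }
    return true;
}
```

Example2 from the commit message corresponds to node_of = {0, 1, 1, 1}: socket 1 gets a first hart ID of 1 and a count of 3, and both sockets pass the contiguity check.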
> > +        soc_name = g_strdup_printf("soc%d", i);
> > +        object_initialize_child(OBJECT(machine), soc_name, &s->soc[i],
> > +            sizeof(s->soc[i]), TYPE_RISCV_HART_ARRAY, &error_abort, NULL);
> > +        g_free(soc_name);
> > +        object_property_set_str(OBJECT(&s->soc[i]),
> > +            machine->cpu_type, "cpu-type", &error_abort);
> > +        object_property_set_int(OBJECT(&s->soc[i]),
> > +            base_hartid, "hartid-base", &error_abort);
> > +        object_property_set_int(OBJECT(&s->soc[i]),
> > +            hart_count, "num-harts", &error_abort);
> > +        object_property_set_bool(OBJECT(&s->soc[i]),
> > +            true, "realized", &error_abort);
> > +
> > +        /* Per-socket CLINT */
> > +        sifive_clint_create(
> > +            memmap[VIRT_CLINT].base + i * memmap[VIRT_CLINT].size,
> > +            memmap[VIRT_CLINT].size, base_hartid, hart_count,
> > +            SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE, true);
> > +
> > +        /* Per-socket PLIC hart topology configuration string */
> > +        plic_hart_config_len =
> > +            (strlen(VIRT_PLIC_HART_CONFIG) + 1) * hart_count;
> > +        plic_hart_config = g_malloc0(plic_hart_config_len);
> > +        for (j = 0; j < hart_count; j++) {
> > +            if (j != 0) {
> > +                strncat(plic_hart_config, ",", plic_hart_config_len);
> > +            }
> > +            strncat(plic_hart_config, VIRT_PLIC_HART_CONFIG,
> > +                plic_hart_config_len);
> > +            plic_hart_config_len -= (strlen(VIRT_PLIC_HART_CONFIG) + 1);
> > +        }
> > +
> > +        /* Per-socket PLIC */
> > +        s->plic[i] = sifive_plic_create(
> > +            memmap[VIRT_PLIC].base + i * memmap[VIRT_PLIC].size,
> > +            plic_hart_config, base_hartid,
> > +            VIRT_PLIC_NUM_SOURCES,
> > +            VIRT_PLIC_NUM_PRIORITIES,
> > +            VIRT_PLIC_PRIORITY_BASE,
> > +            VIRT_PLIC_PENDING_BASE,
> > +            VIRT_PLIC_ENABLE_BASE,
> > +            VIRT_PLIC_ENABLE_STRIDE,
> > +            VIRT_PLIC_CONTEXT_BASE,
> > +            VIRT_PLIC_CONTEXT_STRIDE,
> > +            memmap[VIRT_PLIC].size);
> > +        g_free(plic_hart_config);
> > +
> > +        /* Try to use different PLIC instance based device type */
> 
> Why do we have different types of PLICs?
> 
> > +        if (i == 0) {
> > +            mmio_plic = s->plic[i];
> > +            virtio_plic = s->plic[i];
> > +            pcie_plic = s->plic[i];
> > +        }
> > +        if (i == 1) {
> > +            virtio_plic = s->plic[i];
> > +            pcie_plic = s->plic[i];
> > +        }
> > +        if (i == 2) {
> > +            pcie_plic = s->plic[i];
> > +        }
> > +    }
> >
> >      /* register system main memory (actual RAM) */
> >      memory_region_init_ram(main_mem, NULL, "riscv_virt_board.ram",
> > @@ -571,38 +665,14 @@ static void riscv_virt_board_init(MachineState *machine)
> >                            memmap[VIRT_MROM].base + sizeof(reset_vec),
> >                            &address_space_memory);
> >
> > -    /* create PLIC hart topology configuration string */
> > -    plic_hart_config_len = (strlen(VIRT_PLIC_HART_CONFIG) + 1) * smp_cpus;
> > -    plic_hart_config = g_malloc0(plic_hart_config_len);
> > -    for (i = 0; i < smp_cpus; i++) {
> > -        if (i != 0) {
> > -            strncat(plic_hart_config, ",", plic_hart_config_len);
> > -        }
> > -        strncat(plic_hart_config, VIRT_PLIC_HART_CONFIG, plic_hart_config_len);
> > -        plic_hart_config_len -= (strlen(VIRT_PLIC_HART_CONFIG) + 1);
> > -    }
> > -
> > -    /* MMIO */
> > -    s->plic = sifive_plic_create(memmap[VIRT_PLIC].base,
> > -        plic_hart_config, 0,
> > -        VIRT_PLIC_NUM_SOURCES,
> > -        VIRT_PLIC_NUM_PRIORITIES,
> > -        VIRT_PLIC_PRIORITY_BASE,
> > -        VIRT_PLIC_PENDING_BASE,
> > -        VIRT_PLIC_ENABLE_BASE,
> > -        VIRT_PLIC_ENABLE_STRIDE,
> > -        VIRT_PLIC_CONTEXT_BASE,
> > -        VIRT_PLIC_CONTEXT_STRIDE,
> > -        memmap[VIRT_PLIC].size);
> > -    sifive_clint_create(memmap[VIRT_CLINT].base,
> > -        memmap[VIRT_CLINT].size, 0, smp_cpus,
> > -        SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE, true);
> > +    /* SiFive Test MMIO device */
> >      sifive_test_create(memmap[VIRT_TEST].base);
> >
> > +    /* VirtIO MMIO devices */
> >      for (i = 0; i < VIRTIO_COUNT; i++) {
> >          sysbus_create_simple("virtio-mmio",
> >              memmap[VIRT_VIRTIO].base + i * memmap[VIRT_VIRTIO].size,
> > -            qdev_get_gpio_in(DEVICE(s->plic), VIRTIO_IRQ + i));
> > +            qdev_get_gpio_in(DEVICE(virtio_plic), VIRTIO_IRQ + i));
> >      }
> >
> >      gpex_pcie_init(system_memory,
> > @@ -611,14 +681,14 @@ static void riscv_virt_board_init(MachineState *machine)
> >                           memmap[VIRT_PCIE_MMIO].base,
> >                           memmap[VIRT_PCIE_MMIO].size,
> >                           memmap[VIRT_PCIE_PIO].base,
> > -                         DEVICE(s->plic), true);
> > +                         DEVICE(pcie_plic), true);
> >
> >      serial_mm_init(system_memory, memmap[VIRT_UART0].base,
> > -        0, qdev_get_gpio_in(DEVICE(s->plic), UART0_IRQ), 399193,
> > +        0, qdev_get_gpio_in(DEVICE(mmio_plic), UART0_IRQ), 399193,
> >          serial_hd(0), DEVICE_LITTLE_ENDIAN);
> >
> >      sysbus_create_simple("goldfish_rtc", memmap[VIRT_RTC].base,
> > -        qdev_get_gpio_in(DEVICE(s->plic), RTC_IRQ));
> > +        qdev_get_gpio_in(DEVICE(mmio_plic), RTC_IRQ));
> >
> >      virt_flash_create(s);
> >
> > @@ -628,8 +698,6 @@ static void riscv_virt_board_init(MachineState *machine)
> >                                    drive_get(IF_PFLASH, 0, i));
> >      }
> >      virt_flash_map(s, system_memory);
> > -
> > -    g_free(plic_hart_config);
> >  }
> >
> >  static void riscv_virt_machine_instance_init(Object *obj)
> > @@ -642,9 +710,13 @@ static void riscv_virt_machine_class_init(ObjectClass *oc, void *data)
> >
> >      mc->desc = "RISC-V VirtIO board";
> >      mc->init = riscv_virt_board_init;
> > -    mc->max_cpus = 8;
> > +    mc->max_cpus = VIRT_CPUS_MAX;
> >      mc->default_cpu_type = VIRT_CPU;
> >      mc->pci_allow_0_address = true;
> > +    mc->possible_cpu_arch_ids = riscv_numa_possible_cpu_arch_ids;
> > +    mc->cpu_index_to_instance_props = riscv_numa_cpu_index_to_props;
> > +    mc->get_default_cpu_node_id = riscv_numa_get_default_cpu_node_id;
> > +    mc->numa_mem_supported = true;
> >  }
> >
> >  static const TypeInfo riscv_virt_machine_typeinfo = {
> > diff --git a/include/hw/riscv/virt.h b/include/hw/riscv/virt.h
> > index e69355efaf..1beacd7666 100644
> > --- a/include/hw/riscv/virt.h
> > +++ b/include/hw/riscv/virt.h
> > @@ -23,6 +23,9 @@
> >  #include "hw/sysbus.h"
> >  #include "hw/block/flash.h"
> >
> > +#define VIRT_CPUS_MAX 8
> > +#define VIRT_SOCKETS_MAX 8
> > +
> >  #define TYPE_RISCV_VIRT_MACHINE MACHINE_TYPE_NAME("virt")
> >  #define RISCV_VIRT_MACHINE(obj) \
> >      OBJECT_CHECK(RISCVVirtState, (obj), TYPE_RISCV_VIRT_MACHINE)
> > @@ -32,8 +35,8 @@ typedef struct {
> >      MachineState parent;
> >
> >      /*< public >*/
> > -    RISCVHartArrayState soc;
> > -    DeviceState *plic;
> > +    RISCVHartArrayState soc[VIRT_SOCKETS_MAX];
> > +    DeviceState *plic[VIRT_SOCKETS_MAX];
> >      PFlashCFI01 *flash[2];
> >
> >      void *fdt;
> > @@ -74,6 +77,8 @@ enum {
> >  #define VIRT_PLIC_ENABLE_STRIDE 0x80
> >  #define VIRT_PLIC_CONTEXT_BASE 0x200000
> >  #define VIRT_PLIC_CONTEXT_STRIDE 0x1000
> > +#define VIRT_PLIC_SIZE(__num_context) \
> > +    (VIRT_PLIC_CONTEXT_BASE + (__num_context) * VIRT_PLIC_CONTEXT_STRIDE)
> >
> >  #define FDT_PCI_ADDR_CELLS    3
> >  #define FDT_PCI_INT_CELLS     1
> > --
> > 2.25.1
> >
> >
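For reference, the per-socket PLIC topology string built in the loop above is just one per-hart entry repeated hart_count times, separated by commas. Below is a minimal plain-C sketch (not the patch's code) that builds it by direct copying so the (strlen + 1) * hart_count sizing is easy to check. HART_CONFIG and build_plic_hart_config() are hypothetical names, and the "MS" value is an assumption about VIRT_PLIC_HART_CONFIG, which is not shown in these hunks.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: mirrors VIRT_PLIC_HART_CONFIG, taken to be "MS"
 * (one M-mode and one S-mode PLIC context per hart). */
#define HART_CONFIG "MS"

/*
 * Hypothetical helper: return a malloc()ed string "MS,MS,...,MS" with one
 * HART_CONFIG entry per hart, or NULL if hart_count < 1 or allocation fails.
 * The caller must free() the result.
 */
static char *build_plic_hart_config(int hart_count)
{
    size_t entry = strlen(HART_CONFIG);
    size_t len;
    char *buf, *p;

    if (hart_count < 1) {
        return NULL;
    }

    /* n entries + (n - 1) commas + 1 NUL == (entry + 1) * n bytes */
    len = (entry + 1) * (size_t)hart_count;
    buf = malloc(len);
    if (!buf) {
        return NULL;
    }

    p = buf;
    for (int j = 0; j < hart_count; j++) {
        if (j != 0) {
            *p++ = ',';
        }
        memcpy(p, HART_CONFIG, entry);
        p += entry;
    }
    *p = '\0';

    /* The string plus its terminator fills the buffer exactly. */
    assert((size_t)(p - buf) + 1 == len);
    return buf;
}
```

Note that strncat()'s third argument limits how many characters are copied from the source, not the space left in the destination, so in the quoted loop it is the exact buffer sizing, not the decrementing length, that keeps the writes in bounds.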

Regards,
Anup


* RE: [PATCH v5 3/5] hw/riscv: Add helpers for RISC-V multi-socket NUMA machines
  2020-06-10 23:28     ` Alistair Francis
  (?)
@ 2020-06-11 13:11     ` Anup Patel
  2020-06-13  0:52       ` Alistair Francis
  -1 siblings, 1 reply; 28+ messages in thread
From: Anup Patel @ 2020-06-11 13:11 UTC (permalink / raw)
  To: Alistair Francis
  Cc: Peter Maydell, open list:RISC-V, Sagar Karandikar, Anup Patel,
	qemu-devel@nongnu.org Developers, Atish Patra, Alistair Francis,
	Palmer Dabbelt



> -----Original Message-----
> From: Qemu-riscv <qemu-riscv-bounces+anup.patel=wdc.com@nongnu.org> On Behalf Of Alistair Francis
> Sent: 11 June 2020 04:59
> To: Anup Patel <Anup.Patel@wdc.com>
> Cc: Peter Maydell <peter.maydell@linaro.org>; open list:RISC-V <qemu-riscv@nongnu.org>;
>  Sagar Karandikar <sagark@eecs.berkeley.edu>; Anup Patel <anup@brainfault.org>;
>  qemu-devel@nongnu.org Developers <qemu-devel@nongnu.org>; Atish Patra <Atish.Patra@wdc.com>;
>  Alistair Francis <Alistair.Francis@wdc.com>; Palmer Dabbelt <palmer@dabbelt.com>
> Subject: Re: [PATCH v5 3/5] hw/riscv: Add helpers for RISC-V multi-socket NUMA machines
> 
> On Fri, May 29, 2020 at 4:48 AM Anup Patel <anup.patel@wdc.com> wrote:
> >
> > We add common helper routines which can be shared by RISC-V
> > multi-socket NUMA machines.
> >
> > We have two types of helpers:
> > 1. riscv_socket_xyz() - These helpers assist in managing multiple
> >    sockets irrespective of whether QEMU NUMA is enabled or disabled
> > 2. riscv_numa_xyz() - These helpers assist in providing the
> >    necessary QEMU machine callbacks for QEMU NUMA emulation
> >
> > Signed-off-by: Anup Patel <anup.patel@wdc.com>
> > ---
> >  hw/riscv/Makefile.objs  |   1 +
> >  hw/riscv/numa.c         | 242 ++++++++++++++++++++++++++++++++++++++++
> >  include/hw/riscv/numa.h |  51 +++++++++
> >  3 files changed, 294 insertions(+)
> >  create mode 100644 hw/riscv/numa.c
> >  create mode 100644 include/hw/riscv/numa.h
> 
> I don't love that we have an entire file of functions to help with NUMA when
> no other arch seems to have anything this complex.
> 
> What about RISC-V requires extra complexity?

Other architectures generally have one machine supporting NUMA.

In QEMU RISC-V, we are supporting NUMA in two machines (i.e. Virt
and Spike). Both of these are synthetic machines that don't
match real-world hardware. The Spike machine is even more unusual
because it has a minimal number of devices and no interrupt controller.

In the future, we might have a few more machines in QEMU RISC-V with
NUMA/multi-socket support.

Compared to other architectures, the riscv_numa_xyz() callbacks
defined here do:
1. Linear mapping of CPU arch_id to CPU logical idx
2. Linear assignment of node_id to CPU idx

Requirement 2) above exists because the CLINT and PLIC device
emulation require contiguous hart IDs within a socket.
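The two rules above can be illustrated with a small standalone C sketch (hypothetical helpers, not the code in this patch). default_node_id() follows the same "idx / (cpus / nodes), clamped to the last node" policy as riscv_numa_get_default_cpu_node_id(), and socket_hartids_contiguous() does the same first-to-last scan as riscv_socket_check_hartids(). One deliberate difference: if cpus were smaller than the number of nodes, cpus / nodes would evaluate to 0 in the quoted code and the following division would be undefined, so the sketch clamps the per-node count to at least 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the linear default policy: CPU idx goes to node
 * idx / (cpus / nodes), clamped to the last node. Returns 0 when NUMA is
 * not configured (nodes <= 0). Unlike the quoted code, a zero per-node
 * count (cpus < nodes) is clamped to 1 to avoid dividing by zero.
 */
static int64_t default_node_id(int idx, int cpus, int nodes)
{
    int per_node;
    int64_t nidx;

    assert(idx >= 0);
    if (nodes <= 0) {
        return 0;
    }
    per_node = cpus / nodes;
    if (per_node == 0) {
        per_node = 1;
    }
    nidx = idx / per_node;
    return (nidx >= nodes) ? nodes - 1 : nidx;
}

/*
 * Return true if every hart between the first and last hart of `socket`
 * also belongs to `socket`, i.e. its hart IDs form one contiguous range.
 * Returns false if the socket has no harts at all.
 */
static bool socket_hartids_contiguous(const int64_t *node_of, int cpus,
                                      int64_t socket)
{
    int first = -1, last = -1;

    for (int i = 0; i < cpus; i++) {
        if (node_of[i] == socket) {
            if (first < 0) {
                first = i;
            }
            last = i;
        }
    }
    if (first < 0) {
        return false;
    }
    for (int i = first; i <= last; i++) {
        if (node_of[i] != socket) {
            return false;
        }
    }
    return true;
}
```

With 8 harts and 3 nodes, for example, 8 / 3 = 2 harts per node, so harts 0-1 land on node 0, harts 2-3 on node 1, and harts 4-7 (the remainder included) on node 2. Every socket is then contiguous, which is what the CLINT and PLIC emulation need.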

Regards,
Anup

> 
> >
> > diff --git a/hw/riscv/Makefile.objs b/hw/riscv/Makefile.objs
> > index fc3c6dd7c8..4483e61879 100644
> > --- a/hw/riscv/Makefile.objs
> > +++ b/hw/riscv/Makefile.objs
> > @@ -1,4 +1,5 @@
> >  obj-y += boot.o
> > +obj-y += numa.o
> >  obj-$(CONFIG_SPIKE) += riscv_htif.o
> >  obj-$(CONFIG_HART) += riscv_hart.o
> >  obj-$(CONFIG_SIFIVE_E) += sifive_e.o
> > diff --git a/hw/riscv/numa.c b/hw/riscv/numa.c
> > new file mode 100644
> > index 0000000000..4f92307102
> > --- /dev/null
> > +++ b/hw/riscv/numa.c
> > @@ -0,0 +1,242 @@
> > +/*
> > + * QEMU RISC-V NUMA Helper
> > + *
> > + * Copyright (c) 2020 Western Digital Corporation or its affiliates.
> > + *
> > + * This program is free software; you can redistribute it and/or modify it
> > + * under the terms and conditions of the GNU General Public License,
> > + * version 2 or later, as published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope it will be useful, but WITHOUT
> > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> > + * more details.
> > + *
> > + * You should have received a copy of the GNU General Public License along with
> > + * this program.  If not, see <http://www.gnu.org/licenses/>.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qemu/units.h"
> > +#include "qemu/log.h"
> > +#include "qemu/error-report.h"
> > +#include "qapi/error.h"
> > +#include "hw/boards.h"
> > +#include "hw/qdev-properties.h"
> > +#include "hw/riscv/numa.h"
> > +#include "sysemu/device_tree.h"
> > +
> > +static bool numa_enabled(const MachineState *ms)
> > +{
> > +    return (ms->numa_state && ms->numa_state->num_nodes) ? true : false;
> > +}
> > +
> > +int riscv_socket_count(const MachineState *ms)
> > +{
> > +    return (numa_enabled(ms)) ? ms->numa_state->num_nodes : 1;
> > +}
> > +
> > +int riscv_socket_first_hartid(const MachineState *ms, int socket_id)
> > +{
> > +    int i, first_hartid = ms->smp.cpus;
> > +
> > +    if (!numa_enabled(ms)) {
> > +        return (!socket_id) ? 0 : -1;
> > +    }
> > +
> > +    for (i = 0; i < ms->smp.cpus; i++) {
> > +        if (ms->possible_cpus->cpus[i].props.node_id != socket_id) {
> > +            continue;
> > +        }
> > +        if (i < first_hartid) {
> > +            first_hartid = i;
> > +        }
> > +    }
> > +
> > +    return (first_hartid < ms->smp.cpus) ? first_hartid : -1;
> > +}
> > +
> > +int riscv_socket_last_hartid(const MachineState *ms, int socket_id)
> > +{
> > +    int i, last_hartid = -1;
> > +
> > +    if (!numa_enabled(ms)) {
> > +        return (!socket_id) ? ms->smp.cpus - 1 : -1;
> > +    }
> > +
> > +    for (i = 0; i < ms->smp.cpus; i++) {
> > +        if (ms->possible_cpus->cpus[i].props.node_id != socket_id) {
> > +            continue;
> > +        }
> > +        if (i > last_hartid) {
> > +            last_hartid = i;
> > +        }
> > +    }
> > +
> > +    return (last_hartid < ms->smp.cpus) ? last_hartid : -1;
> > +}
> > +
> > +int riscv_socket_hart_count(const MachineState *ms, int socket_id)
> > +{
> > +    int first_hartid, last_hartid;
> > +
> > +    if (!numa_enabled(ms)) {
> > +        return (!socket_id) ? ms->smp.cpus : -1;
> > +    }
> > +
> > +    first_hartid = riscv_socket_first_hartid(ms, socket_id);
> > +    if (first_hartid < 0) {
> > +        return -1;
> > +    }
> > +
> > +    last_hartid = riscv_socket_last_hartid(ms, socket_id);
> > +    if (last_hartid < 0) {
> > +        return -1;
> > +    }
> > +
> > +    if (first_hartid > last_hartid) {
> > +        return -1;
> > +    }
> > +
> > +    return last_hartid - first_hartid + 1;
> > +}
> > +
> > +bool riscv_socket_check_hartids(const MachineState *ms, int socket_id)
> > +{
> > +    int i, first_hartid, last_hartid;
> > +
> > +    if (!numa_enabled(ms)) {
> > +        return (!socket_id) ? true : false;
> > +    }
> > +
> > +    first_hartid = riscv_socket_first_hartid(ms, socket_id);
> > +    if (first_hartid < 0) {
> > +        return false;
> > +    }
> > +
> > +    last_hartid = riscv_socket_last_hartid(ms, socket_id);
> > +    if (last_hartid < 0) {
> > +        return false;
> > +    }
> > +
> > +    for (i = first_hartid; i <= last_hartid; i++) {
> > +        if (ms->possible_cpus->cpus[i].props.node_id != socket_id) {
> > +            return false;
> > +        }
> > +    }
> > +
> > +    return true;
> > +}
> > +
> > +uint64_t riscv_socket_mem_offset(const MachineState *ms, int socket_id)
> > +{
> > +    int i;
> > +    uint64_t mem_offset = 0;
> > +
> > +    if (!numa_enabled(ms)) {
> > +        return 0;
> > +    }
> > +
> > +    for (i = 0; i < ms->numa_state->num_nodes; i++) {
> > +        if (i == socket_id) {
> > +            break;
> > +        }
> > +        mem_offset += ms->numa_state->nodes[i].node_mem;
> > +    }
> > +
> > +    return (i == socket_id) ? mem_offset : 0;
> > +}
> > +
> > +uint64_t riscv_socket_mem_size(const MachineState *ms, int socket_id)
> > +{
> > +    if (!numa_enabled(ms)) {
> > +        return (!socket_id) ? ms->ram_size : 0;
> > +    }
> > +
> > +    return (socket_id < ms->numa_state->num_nodes) ?
> > +            ms->numa_state->nodes[socket_id].node_mem : 0;
> > +}
> > +
> > +void riscv_socket_fdt_write_id(const MachineState *ms, void *fdt,
> > +                               const char *node_name, int socket_id)
> > +{
> > +    if (numa_enabled(ms)) {
> > +        qemu_fdt_setprop_cell(fdt, node_name, "numa-node-id", socket_id);
> > +    }
> > +}
> > +
> > +void riscv_socket_fdt_write_distance_matrix(const MachineState *ms, void *fdt)
> > +{
> > +    int i, j, idx;
> > +    uint32_t *dist_matrix, dist_matrix_size;
> > +
> > +    if (numa_enabled(ms) && ms->numa_state->have_numa_distance) {
> > +        dist_matrix_size = riscv_socket_count(ms) * riscv_socket_count(ms);
> > +        dist_matrix_size *= (3 * sizeof(uint32_t));
> > +        dist_matrix = g_malloc0(dist_matrix_size);
> > +
> > +        for (i = 0; i < riscv_socket_count(ms); i++) {
> > +            for (j = 0; j < riscv_socket_count(ms); j++) {
> > +                idx = (i * riscv_socket_count(ms) + j) * 3;
> > +                dist_matrix[idx + 0] = cpu_to_be32(i);
> > +                dist_matrix[idx + 1] = cpu_to_be32(j);
> > +                dist_matrix[idx + 2] =
> > +                    cpu_to_be32(ms->numa_state->nodes[i].distance[j]);
> > +            }
> > +        }
> > +
> > +        qemu_fdt_add_subnode(fdt, "/distance-map");
> > +        qemu_fdt_setprop_string(fdt, "/distance-map", "compatible",
> > +                                "numa-distance-map-v1");
> > +        qemu_fdt_setprop(fdt, "/distance-map", "distance-matrix",
> > +                         dist_matrix, dist_matrix_size);
> > +        g_free(dist_matrix);
> > +    }
> > +}
> > +
> > +CpuInstanceProperties
> > +riscv_numa_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> > +{
> > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> > +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> > +
> > +    assert(cpu_index < possible_cpus->len);
> > +    return possible_cpus->cpus[cpu_index].props;
> > +}
> > +
> > +int64_t riscv_numa_get_default_cpu_node_id(const MachineState *ms, int idx)
> > +{
> > +    int64_t nidx = 0;
> > +
> > +    if (ms->numa_state->num_nodes) {
> > +        nidx = idx / (ms->smp.cpus / ms->numa_state->num_nodes);
> > +        if (ms->numa_state->num_nodes <= nidx) {
> > +            nidx = ms->numa_state->num_nodes - 1;
> > +        }
> > +    }
> > +
> > +    return nidx;
> > +}
> > +
> > +const CPUArchIdList *riscv_numa_possible_cpu_arch_ids(MachineState *ms)
> > +{
> > +    int n;
> > +    unsigned int max_cpus = ms->smp.max_cpus;
> > +
> > +    if (ms->possible_cpus) {
> > +        assert(ms->possible_cpus->len == max_cpus);
> > +        return ms->possible_cpus;
> > +    }
> > +
> > +    ms->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
> > +                                  sizeof(CPUArchId) * max_cpus);
> > +    ms->possible_cpus->len = max_cpus;
> > +    for (n = 0; n < ms->possible_cpus->len; n++) {
> > +        ms->possible_cpus->cpus[n].type = ms->cpu_type;
> > +        ms->possible_cpus->cpus[n].arch_id = n;
> > +        ms->possible_cpus->cpus[n].props.has_core_id = true;
> > +        ms->possible_cpus->cpus[n].props.core_id = n;
> > +    }
> > +
> > +    return ms->possible_cpus;
> > +}
> > diff --git a/include/hw/riscv/numa.h b/include/hw/riscv/numa.h
> > new file mode 100644
> > index 0000000000..fd9517a315
> > --- /dev/null
> > +++ b/include/hw/riscv/numa.h
> > @@ -0,0 +1,51 @@
> > +/*
> > + * QEMU RISC-V NUMA Helper
> > + *
> > + * Copyright (c) 2020 Western Digital Corporation or its affiliates.
> > + *
> > + * This program is free software; you can redistribute it and/or modify it
> > + * under the terms and conditions of the GNU General Public License,
> > + * version 2 or later, as published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope it will be useful, but WITHOUT
> > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> > + * more details.
> > + *
> > + * You should have received a copy of the GNU General Public License along with
> > + * this program.  If not, see <http://www.gnu.org/licenses/>.
> > + */
> > +
> > +#ifndef RISCV_NUMA_H
> > +#define RISCV_NUMA_H
> > +
> > +#include "hw/sysbus.h"
> > +#include "sysemu/numa.h"
> > +
> > +int riscv_socket_count(const MachineState *ms);
> > +
> > +int riscv_socket_first_hartid(const MachineState *ms, int socket_id);
> > +
> > +int riscv_socket_last_hartid(const MachineState *ms, int socket_id);
> > +
> > +int riscv_socket_hart_count(const MachineState *ms, int socket_id);
> > +
> > +uint64_t riscv_socket_mem_offset(const MachineState *ms, int socket_id);
> > +
> > +uint64_t riscv_socket_mem_size(const MachineState *ms, int socket_id);
> > +
> > +bool riscv_socket_check_hartids(const MachineState *ms, int socket_id);
> > +
> > +void riscv_socket_fdt_write_id(const MachineState *ms, void *fdt,
> > +                               const char *node_name, int socket_id);
> > +
> > +void riscv_socket_fdt_write_distance_matrix(const MachineState *ms, void *fdt);
> > +
> > +CpuInstanceProperties
> > +riscv_numa_cpu_index_to_props(MachineState *ms, unsigned cpu_index);
> > +
> > +int64_t riscv_numa_get_default_cpu_node_id(const MachineState *ms, int idx);
> > +
> > +const CPUArchIdList *riscv_numa_possible_cpu_arch_ids(MachineState *ms);
> 
> Can we add some comments for the functions of what they are expected to
> return (and that -1 is an error)?
> 
> Alistair
> 
> > +
> > +#endif /* RISCV_NUMA_H */
> > --
> > 2.25.1
> >
> >
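For readers less familiar with the numa-distance-map-v1 binding: riscv_socket_fdt_write_distance_matrix() flattens the N x N distance table into N * N (from, to, distance) triples, 3 * N * N cells in total, in row-major order. A hedged standalone sketch of that layout follows. flatten_distance_matrix() is a hypothetical name, and it keeps values in host byte order, whereas the patch converts each cell with cpu_to_be32() because device tree cells are big-endian.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: flatten an n x n distance table (row-major) into
 * n * n (from, to, distance) triples, using the same nested i/j order as
 * the patch. Values stay in host byte order. Returns a calloc()ed array
 * (caller frees) and stores its size in bytes in *size_bytes.
 */
static uint32_t *flatten_distance_matrix(const uint32_t *dist, int n,
                                         size_t *size_bytes)
{
    size_t cells = (size_t)n * n * 3;
    uint32_t *m;

    assert(dist != NULL && size_bytes != NULL && n > 0);
    m = calloc(cells, sizeof(uint32_t));
    if (!m) {
        return NULL;
    }
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            size_t idx = ((size_t)i * n + j) * 3;

            m[idx + 0] = (uint32_t)i;           /* from node */
            m[idx + 1] = (uint32_t)j;           /* to node */
            m[idx + 2] = dist[i * n + j];       /* distance */
        }
    }
    *size_bytes = cells * sizeof(uint32_t);
    return m;
}
```

For two nodes with local distance 10 and remote distance 20, this yields the cells 0 0 10, 0 1 20, 1 0 20, 1 1 10 (48 bytes).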



* Re: [PATCH v5 3/5] hw/riscv: Add helpers for RISC-V multi-socket NUMA machines
  2020-06-11 13:11     ` Anup Patel
@ 2020-06-13  0:52       ` Alistair Francis
  2020-06-13  1:12           ` Atish Patra
  0 siblings, 1 reply; 28+ messages in thread
From: Alistair Francis @ 2020-06-13  0:52 UTC (permalink / raw)
  To: Anup Patel
  Cc: Peter Maydell, open list:RISC-V, Sagar Karandikar, Anup Patel,
	qemu-devel@nongnu.org Developers, Atish Patra, Alistair Francis,
	Palmer Dabbelt

On Thu, Jun 11, 2020 at 6:11 AM Anup Patel <Anup.Patel@wdc.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Qemu-riscv <qemu-riscv-bounces+anup.patel=wdc.com@nongnu.org> On Behalf Of Alistair Francis
> > Sent: 11 June 2020 04:59
> > To: Anup Patel <Anup.Patel@wdc.com>
> > Cc: Peter Maydell <peter.maydell@linaro.org>; open list:RISC-V <qemu-riscv@nongnu.org>;
> >  Sagar Karandikar <sagark@eecs.berkeley.edu>; Anup Patel <anup@brainfault.org>;
> >  qemu-devel@nongnu.org Developers <qemu-devel@nongnu.org>; Atish Patra <Atish.Patra@wdc.com>;
> >  Alistair Francis <Alistair.Francis@wdc.com>; Palmer Dabbelt <palmer@dabbelt.com>
> > Subject: Re: [PATCH v5 3/5] hw/riscv: Add helpers for RISC-V multi-socket NUMA machines
> >
> > On Fri, May 29, 2020 at 4:48 AM Anup Patel <anup.patel@wdc.com> wrote:
> > >
> > > We add common helper routines which can be shared by RISC-V
> > > multi-socket NUMA machines.
> > >
> > > We have two types of helpers:
> > > 1. riscv_socket_xyz() - These helpers assist in managing multiple
> > >    sockets irrespective of whether QEMU NUMA is enabled or disabled
> > > 2. riscv_numa_xyz() - These helpers assist in providing the
> > >    necessary QEMU machine callbacks for QEMU NUMA emulation
> > >
> > > Signed-off-by: Anup Patel <anup.patel@wdc.com>
> > > ---
> > >  hw/riscv/Makefile.objs  |   1 +
> > >  hw/riscv/numa.c         | 242 ++++++++++++++++++++++++++++++++++++++++
> > >  include/hw/riscv/numa.h |  51 +++++++++
> > >  3 files changed, 294 insertions(+)
> > >  create mode 100644 hw/riscv/numa.c
> > >  create mode 100644 include/hw/riscv/numa.h
> >
> > I don't love that we have an entire file of functions to help with NUMA when
> > no other arch seems to have anything this complex.
> >
> > What about RISC-V requires extra complexity?
>
> Other architectures generally have one machine supporting NUMA.
>
> In QEMU RISC-V, we are supporting NUMA in two machines (i.e. Virt
> and Spike). Both of these are synthetic machines that don't
> match real-world hardware. The Spike machine is even more unusual
> because it has a minimal number of devices and no interrupt controller.
>
> In the future, we might have a few more machines in QEMU RISC-V with
> NUMA/multi-socket support.
>
> Compared to other architectures, the riscv_numa_xyz() callbacks
> defined here do:
> 1. Linear mapping of CPU arch_id to CPU logical idx
> 2. Linear assignment of node_id to CPU idx
>
> Requirement 2) above exists because the CLINT and PLIC device
> emulation require contiguous hart IDs within a socket.

Ok, fair enough :)

Do you mind sending a new version? I think the Spike part will need to
be rebased on top of the Spike machine changes.

Then just pressure Atish to ack the DT changes :P

Alistair

>
> Regards,
> Anup
>
> >
> > >
> > > diff --git a/hw/riscv/Makefile.objs b/hw/riscv/Makefile.objs index
> > > fc3c6dd7c8..4483e61879 100644
> > > --- a/hw/riscv/Makefile.objs
> > > +++ b/hw/riscv/Makefile.objs
> > > @@ -1,4 +1,5 @@
> > >  obj-y += boot.o
> > > +obj-y += numa.o
> > >  obj-$(CONFIG_SPIKE) += riscv_htif.o
> > >  obj-$(CONFIG_HART) += riscv_hart.o
> > >  obj-$(CONFIG_SIFIVE_E) += sifive_e.o
> > > diff --git a/hw/riscv/numa.c b/hw/riscv/numa.c new file mode 100644
> > > index 0000000000..4f92307102
> > > --- /dev/null
> > > +++ b/hw/riscv/numa.c
> > > @@ -0,0 +1,242 @@
> > > +/*
> > > + * QEMU RISC-V NUMA Helper
> > > + *
> > > + * Copyright (c) 2020 Western Digital Corporation or its affiliates.
> > > + *
> > > + * This program is free software; you can redistribute it and/or
> > > +modify it
> > > + * under the terms and conditions of the GNU General Public License,
> > > + * version 2 or later, as published by the Free Software Foundation.
> > > + *
> > > + * This program is distributed in the hope it will be useful, but
> > > +WITHOUT
> > > + * ANY WARRANTY; without even the implied warranty of
> > MERCHANTABILITY
> > > +or
> > > + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> > > +License for
> > > + * more details.
> > > + *
> > > + * You should have received a copy of the GNU General Public License
> > > +along with
> > > + * this program.  If not, see <http://www.gnu.org/licenses/>.
> > > + */
> > > +
> > > +#include "qemu/osdep.h"
> > > +#include "qemu/units.h"
> > > +#include "qemu/log.h"
> > > +#include "qemu/error-report.h"
> > > +#include "qapi/error.h"
> > > +#include "hw/boards.h"
> > > +#include "hw/qdev-properties.h"
> > > +#include "hw/riscv/numa.h"
> > > +#include "sysemu/device_tree.h"
> > > +
> > > +static bool numa_enabled(const MachineState *ms) {
> > > +    return (ms->numa_state && ms->numa_state->num_nodes) ? true :
> > > +false; }
> > > +
> > > +int riscv_socket_count(const MachineState *ms) {
> > > +    return (numa_enabled(ms)) ? ms->numa_state->num_nodes : 1; }
> > > +
> > > +int riscv_socket_first_hartid(const MachineState *ms, int socket_id)
> > > +{
> > > +    int i, first_hartid = ms->smp.cpus;
> > > +
> > > +    if (!numa_enabled(ms)) {
> > > +        return (!socket_id) ? 0 : -1;
> > > +    }
> > > +
> > > +    for (i = 0; i < ms->smp.cpus; i++) {
> > > +        if (ms->possible_cpus->cpus[i].props.node_id != socket_id) {
> > > +            continue;
> > > +        }
> > > +        if (i < first_hartid) {
> > > +            first_hartid = i;
> > > +        }
> > > +    }
> > > +
> > > +    return (first_hartid < ms->smp.cpus) ? first_hartid : -1; }
> > > +
> > > +int riscv_socket_last_hartid(const MachineState *ms, int socket_id) {
> > > +    int i, last_hartid = -1;
> > > +
> > > +    if (!numa_enabled(ms)) {
> > > +        return (!socket_id) ? ms->smp.cpus - 1 : -1;
> > > +    }
> > > +
> > > +    for (i = 0; i < ms->smp.cpus; i++) {
> > > +        if (ms->possible_cpus->cpus[i].props.node_id != socket_id) {
> > > +            continue;
> > > +        }
> > > +        if (i > last_hartid) {
> > > +            last_hartid = i;
> > > +        }
> > > +    }
> > > +
> > > +    return (last_hartid < ms->smp.cpus) ? last_hartid : -1; }
> > > +
> > > +int riscv_socket_hart_count(const MachineState *ms, int socket_id) {
> > > +    int first_hartid, last_hartid;
> > > +
> > > +    if (!numa_enabled(ms)) {
> > > +        return (!socket_id) ? ms->smp.cpus : -1;
> > > +    }
> > > +
> > > +    first_hartid = riscv_socket_first_hartid(ms, socket_id);
> > > +    if (first_hartid < 0) {
> > > +        return -1;
> > > +    }
> > > +
> > > +    last_hartid = riscv_socket_last_hartid(ms, socket_id);
> > > +    if (last_hartid < 0) {
> > > +        return -1;
> > > +    }
> > > +
> > > +    if (first_hartid > last_hartid) {
> > > +        return -1;
> > > +    }
> > > +
> > > +    return last_hartid - first_hartid + 1; }
> > > +
> > > +bool riscv_socket_check_hartids(const MachineState *ms, int
> > > +socket_id) {
> > > +    int i, first_hartid, last_hartid;
> > > +
> > > +    if (!numa_enabled(ms)) {
> > > +        return (!socket_id) ? true : false;
> > > +    }
> > > +
> > > +    first_hartid = riscv_socket_first_hartid(ms, socket_id);
> > > +    if (first_hartid < 0) {
> > > +        return false;
> > > +    }
> > > +
> > > +    last_hartid = riscv_socket_last_hartid(ms, socket_id);
> > > +    if (last_hartid < 0) {
> > > +        return false;
> > > +    }
> > > +
> > > +    for (i = first_hartid; i <= last_hartid; i++) {
> > > +        if (ms->possible_cpus->cpus[i].props.node_id != socket_id) {
> > > +            return false;
> > > +        }
> > > +    }
> > > +
> > > +    return true;
> > > +}
> > > +
> > > +uint64_t riscv_socket_mem_offset(const MachineState *ms, int
> > > +socket_id) {
> > > +    int i;
> > > +    uint64_t mem_offset = 0;
> > > +
> > > +    if (!numa_enabled(ms)) {
> > > +        return 0;
> > > +    }
> > > +
> > > +    for (i = 0; i < ms->numa_state->num_nodes; i++) {
> > > +        if (i == socket_id) {
> > > +            break;
> > > +        }
> > > +        mem_offset += ms->numa_state->nodes[i].node_mem;
> > > +    }
> > > +
> > > +    return (i == socket_id) ? mem_offset : 0; }
> > > +
> > > +uint64_t riscv_socket_mem_size(const MachineState *ms, int socket_id)
> > > +{
> > > +    if (!numa_enabled(ms)) {
> > > +        return (!socket_id) ? ms->ram_size : 0;
> > > +    }
> > > +
> > > +    return (socket_id < ms->numa_state->num_nodes) ?
> > > +            ms->numa_state->nodes[socket_id].node_mem : 0; }
> > > +
> > > +void riscv_socket_fdt_write_id(const MachineState *ms, void *fdt,
> > > +                               const char *node_name, int socket_id)
> > > +{
> > > +    if (numa_enabled(ms)) {
> > > +        qemu_fdt_setprop_cell(fdt, node_name, "numa-node-id",
> > socket_id);
> > > +    }
> > > +}
> > > +
> > > +void riscv_socket_fdt_write_distance_matrix(const MachineState *ms,
> > > +void *fdt) {
> > > +    int i, j, idx;
> > > +    uint32_t *dist_matrix, dist_matrix_size;
> > > +
> > > +    if (numa_enabled(ms) && ms->numa_state->have_numa_distance) {
> > > +        dist_matrix_size = riscv_socket_count(ms) * riscv_socket_count(ms);
> > > +        dist_matrix_size *= (3 * sizeof(uint32_t));
> > > +        dist_matrix = g_malloc0(dist_matrix_size);
> > > +
> > > +        for (i = 0; i < riscv_socket_count(ms); i++) {
> > > +            for (j = 0; j < riscv_socket_count(ms); j++) {
> > > +                idx = (i * riscv_socket_count(ms) + j) * 3;
> > > +                dist_matrix[idx + 0] = cpu_to_be32(i);
> > > +                dist_matrix[idx + 1] = cpu_to_be32(j);
> > > +                dist_matrix[idx + 2] =
> > > +                    cpu_to_be32(ms->numa_state->nodes[i].distance[j]);
> > > +            }
> > > +        }
> > > +
> > > +        qemu_fdt_add_subnode(fdt, "/distance-map");
> > > +        qemu_fdt_setprop_string(fdt, "/distance-map", "compatible",
> > > +                                "numa-distance-map-v1");
> > > +        qemu_fdt_setprop(fdt, "/distance-map", "distance-matrix",
> > > +                         dist_matrix, dist_matrix_size);
> > > +        g_free(dist_matrix);
> > > +    }
> > > +}
> > > +
> > > +CpuInstanceProperties
> > > +riscv_numa_cpu_index_to_props(MachineState *ms, unsigned
> > cpu_index) {
> > > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> > > +    const CPUArchIdList *possible_cpus =
> > > +mc->possible_cpu_arch_ids(ms);
> > > +
> > > +    assert(cpu_index < possible_cpus->len);
> > > +    return possible_cpus->cpus[cpu_index].props;
> > > +}
> > > +
> > > +int64_t riscv_numa_get_default_cpu_node_id(const MachineState *ms,
> > > +int idx) {
> > > +    int64_t nidx = 0;
> > > +
> > > +    if (ms->numa_state->num_nodes) {
> > > +        nidx = idx / (ms->smp.cpus / ms->numa_state->num_nodes);
> > > +        if (ms->numa_state->num_nodes <= nidx) {
> > > +            nidx = ms->numa_state->num_nodes - 1;
> > > +        }
> > > +    }
> > > +
> > > +    return nidx;
> > > +}
> > > +
> > > +const CPUArchIdList *riscv_numa_possible_cpu_arch_ids(MachineState
> > > +*ms) {
> > > +    int n;
> > > +    unsigned int max_cpus = ms->smp.max_cpus;
> > > +
> > > +    if (ms->possible_cpus) {
> > > +        assert(ms->possible_cpus->len == max_cpus);
> > > +        return ms->possible_cpus;
> > > +    }
> > > +
> > > +    ms->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
> > > +                                  sizeof(CPUArchId) * max_cpus);
> > > +    ms->possible_cpus->len = max_cpus;
> > > +    for (n = 0; n < ms->possible_cpus->len; n++) {
> > > +        ms->possible_cpus->cpus[n].type = ms->cpu_type;
> > > +        ms->possible_cpus->cpus[n].arch_id = n;
> > > +        ms->possible_cpus->cpus[n].props.has_core_id = true;
> > > +        ms->possible_cpus->cpus[n].props.core_id = n;
> > > +    }
> > > +
> > > +    return ms->possible_cpus;
> > > +}
> > > diff --git a/include/hw/riscv/numa.h b/include/hw/riscv/numa.h
> > > new file mode 100644
> > > index 0000000000..fd9517a315
> > > --- /dev/null
> > > +++ b/include/hw/riscv/numa.h
> > > @@ -0,0 +1,51 @@
> > > +/*
> > > + * QEMU RISC-V NUMA Helper
> > > + *
> > > + * Copyright (c) 2020 Western Digital Corporation or its affiliates.
> > > + *
> > > + * This program is free software; you can redistribute it and/or modify it
> > > + * under the terms and conditions of the GNU General Public License,
> > > + * version 2 or later, as published by the Free Software Foundation.
> > > + *
> > > + * This program is distributed in the hope it will be useful, but WITHOUT
> > > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > > + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> > > + * more details.
> > > + *
> > > + * You should have received a copy of the GNU General Public License along with
> > > + * this program.  If not, see <http://www.gnu.org/licenses/>.
> > > + */
> > > +
> > > +#ifndef RISCV_NUMA_H
> > > +#define RISCV_NUMA_H
> > > +
> > > +#include "hw/sysbus.h"
> > > +#include "sysemu/numa.h"
> > > +
> > > +int riscv_socket_count(const MachineState *ms);
> > > +
> > > +int riscv_socket_first_hartid(const MachineState *ms, int socket_id);
> > > +
> > > +int riscv_socket_last_hartid(const MachineState *ms, int socket_id);
> > > +
> > > +int riscv_socket_hart_count(const MachineState *ms, int socket_id);
> > > +
> > > +uint64_t riscv_socket_mem_offset(const MachineState *ms, int socket_id);
> > > +
> > > +uint64_t riscv_socket_mem_size(const MachineState *ms, int socket_id);
> > > +
> > > +bool riscv_socket_check_hartids(const MachineState *ms, int socket_id);
> > > +
> > > +void riscv_socket_fdt_write_id(const MachineState *ms, void *fdt,
> > > +                               const char *node_name, int socket_id);
> > > +
> > > +void riscv_socket_fdt_write_distance_matrix(const MachineState *ms, void *fdt);
> > > +
> > > +CpuInstanceProperties
> > > +riscv_numa_cpu_index_to_props(MachineState *ms, unsigned cpu_index);
> > > +
> > > +int64_t riscv_numa_get_default_cpu_node_id(const MachineState *ms, int idx);
> > > +
> > > +const CPUArchIdList *riscv_numa_possible_cpu_arch_ids(MachineState *ms);
> >
> > Can we add some comments to the functions describing what they are
> > expected to return (and that -1 is an error)?
> >
> > Alistair
> >
> > > +
> > > +#endif /* RISCV_NUMA_H */
> > > --
> > > 2.25.1
> > >
> > >
>
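Alistair's request for return-value comments could be met along the following lines. This is a minimal, self-contained sketch, not the patch's code: it models the per-hart socket assignment as a plain array (`node_of_hart`, a hypothetical stand-in for `ms->possible_cpus->cpus[i].props.node_id`) and spells out the -1 error convention used by riscv_socket_first_hartid(), riscv_socket_last_hartid() and riscv_socket_hart_count(). All `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: node_of_hart[i] is the socket (NUMA node) that
 * hart i belongs to, for 0 <= i < nr_harts.
 */

/* Return the lowest hart ID in @socket_id, or -1 if it has no harts. */
int sketch_first_hartid(const int *node_of_hart, int nr_harts, int socket_id)
{
    assert(nr_harts >= 0);
    for (int i = 0; i < nr_harts; i++) {
        if (node_of_hart[i] == socket_id) {
            return i;
        }
    }
    return -1;
}

/* Return the highest hart ID in @socket_id, or -1 if it has no harts. */
int sketch_last_hartid(const int *node_of_hart, int nr_harts, int socket_id)
{
    assert(nr_harts >= 0);
    for (int i = nr_harts - 1; i >= 0; i--) {
        if (node_of_hart[i] == socket_id) {
            return i;
        }
    }
    return -1;
}

/*
 * Return last - first + 1 for @socket_id, or -1 if it has no harts.
 * This is the span of hart IDs, so it equals the number of harts in the
 * socket only when sketch_check_hartids() also returns true.
 */
int sketch_hart_count(const int *node_of_hart, int nr_harts, int socket_id)
{
    int first = sketch_first_hartid(node_of_hart, nr_harts, socket_id);
    int last = sketch_last_hartid(node_of_hart, nr_harts, socket_id);

    if (first < 0 || last < 0) {
        return -1;
    }
    return last - first + 1;
}

/*
 * Return true if @socket_id has at least one hart and every hart ID
 * between its first and last hart belongs to it (i.e. they are contiguous).
 */
bool sketch_check_hartids(const int *node_of_hart, int nr_harts, int socket_id)
{
    int first = sketch_first_hartid(node_of_hart, nr_harts, socket_id);
    int last = sketch_last_hartid(node_of_hart, nr_harts, socket_id);

    if (first < 0 || last < 0) {
        return false;
    }
    for (int i = first; i <= last; i++) {
        if (node_of_hart[i] != socket_id) {
            return false;
        }
    }
    return true;
}
```

For node_of_hart = {0, 0, 1, 1, 0, 2}, socket 0 spans hart IDs 0 to 4, so sketch_hart_count() returns 5 while sketch_check_hartids() returns false because harts 2 and 3 belong to socket 1.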


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/5] hw/riscv: Add helpers for RISC-V multi-socket NUMA machines
  2020-06-13  0:52       ` Alistair Francis
@ 2020-06-13  1:12           ` Atish Patra
  0 siblings, 0 replies; 28+ messages in thread
From: Atish Patra @ 2020-06-13  1:12 UTC (permalink / raw)
  To: alistair23, Anup Patel
  Cc: peter.maydell, qemu-riscv, sagark, anup, qemu-devel,
	Alistair Francis, palmer

On Fri, 2020-06-12 at 17:52 -0700, Alistair Francis wrote:
> On Thu, Jun 11, 2020 at 6:11 AM Anup Patel <Anup.Patel@wdc.com> wrote:
> > 
> > 
> > > -----Original Message-----
> > > From: Qemu-riscv <qemu-riscv-bounces+anup.patel=wdc.com@nongnu.org> On Behalf Of Alistair Francis
> > > Sent: 11 June 2020 04:59
> > > To: Anup Patel <Anup.Patel@wdc.com>
> > > Cc: Peter Maydell <peter.maydell@linaro.org>; open list:RISC-V <qemu-riscv@nongnu.org>;
> > >  Sagar Karandikar <sagark@eecs.berkeley.edu>; Anup Patel <anup@brainfault.org>;
> > >  qemu-devel@nongnu.org Developers <qemu-devel@nongnu.org>; Atish Patra <Atish.Patra@wdc.com>;
> > >  Alistair Francis <Alistair.Francis@wdc.com>; Palmer Dabbelt <palmer@dabbelt.com>
> > > Subject: Re: [PATCH v5 3/5] hw/riscv: Add helpers for RISC-V multi-socket NUMA machines
> > > 
> > > On Fri, May 29, 2020 at 4:48 AM Anup Patel <anup.patel@wdc.com> wrote:
> > > > We add common helper routines which can be shared by RISC-V
> > > > multi-socket NUMA machines.
> > > > 
> > > > We have two types of helpers:
> > > > 1. riscv_socket_xyz() - These helpers assist in managing multiple
> > > >    sockets irrespective of whether QEMU NUMA is enabled or disabled
> > > > 2. riscv_numa_xyz() - These helpers assist in providing the
> > > >    necessary QEMU machine callbacks for QEMU NUMA emulation
> > > > 
> > > > Signed-off-by: Anup Patel <anup.patel@wdc.com>
> > > > ---
> > > >  hw/riscv/Makefile.objs  |   1 +
> > > >  hw/riscv/numa.c         | 242 ++++++++++++++++++++++++++++++++++++++++
> > > >  include/hw/riscv/numa.h |  51 +++++++++
> > > >  3 files changed, 294 insertions(+)
> > > >  create mode 100644 hw/riscv/numa.c
> > > >  create mode 100644 include/hw/riscv/numa.h
> > > 
> > > I don't love that we have an entire file of functions to help with
> > > NUMA when no other arch seems to have anything this complex.
> > > 
> > > What about RISC-V requires extra complexity?
> > 
> > Other architectures generally have one machine supporting NUMA.
> >
> > In QEMU RISC-V, we are supporting NUMA in two machines (i.e. Virt
> > and Spike). Both of these are synthetic machines and don't match
> > real-world hardware. The Spike machine is even more unusual
> > because it has a minimal number of devices and no interrupt
> > controller.
> >
> > In the future, we might have a few more machines in QEMU RISC-V with
> > NUMA/multi-socket support.
> >
> > Compared to other architectures, the riscv_numa_xyz() callbacks
> > defined here do:
> > 1. Linear mapping of CPU arch_id to CPU logical idx
> > 2. Linear assignment of node_id to CPU idx
> >
> > Requirement 2) above is needed because the CLINT and PLIC
> > device emulation requires contiguous hart IDs in a socket.
> 
> Ok, fair enough :)
> 
> Do you mind sending a new version? I think the Spike part will need
> to be rebased on top of the Spike machine changes.
> 
> Then just pressure Atish to ack the DT changes :P
> 

Since you had comments on v5, I thought I would just defer the review
to the next version ;)

Jokes apart, I will do it tonight on either v5 or v6, whichever is the
latest. I don't think the topology mapping will change between v5 and
v6.
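The topology mapping in question is the linear node assignment Anup describes above: riscv_numa_get_default_cpu_node_id() puts hart idx on node idx / (smp.cpus / num_nodes), clamped to the last node. The standalone sketch below reproduces that arithmetic and adds a check that each node ends up with one contiguous range of hart IDs, which is the property the CLINT and PLIC emulation rely on. The `sketch_*` names are hypothetical, and the `num_nodes <= cpus` assertion is an assumption of this sketch only: with more nodes than harts, cpus / num_nodes would be zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Mirror of the default mapping arithmetic: harts are split into blocks
 * of (cpus / num_nodes) harts, and leftover harts are clamped onto the
 * last node. num_nodes == 0 means NUMA is not configured, so every hart
 * is on node 0.
 */
int64_t sketch_default_node_id(int idx, int cpus, int num_nodes)
{
    int64_t nidx = 0;

    if (num_nodes) {
        /* Assumption of this sketch: keeps the divisor below non-zero. */
        assert(num_nodes > 0 && num_nodes <= cpus);
        nidx = idx / (cpus / num_nodes);
        if (nidx >= num_nodes) {
            nidx = num_nodes - 1;
        }
    }
    return nidx;
}

/*
 * Return true if the node ID never decreases as the hart index grows;
 * that is what makes each node's harts a single contiguous range.
 */
bool sketch_mapping_is_contiguous(int cpus, int num_nodes)
{
    for (int i = 1; i < cpus; i++) {
        if (sketch_default_node_id(i, cpus, num_nodes) <
            sketch_default_node_id(i - 1, cpus, num_nodes)) {
            return false;
        }
    }
    return true;
}
```

With 8 harts and 3 nodes the block size is 8 / 3 = 2, so harts 0-1 go to node 0, harts 2-3 to node 1, and harts 4-7 to node 2 (harts 6 and 7 compute node 3 and are clamped). The last node therefore absorbs the remainder, but no node is ever split.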

> Alistair
> 
> > Regards,
> > Anup
> > 
> > > > diff --git a/hw/riscv/Makefile.objs b/hw/riscv/Makefile.objs
> > > > index fc3c6dd7c8..4483e61879 100644
> > > > --- a/hw/riscv/Makefile.objs
> > > > +++ b/hw/riscv/Makefile.objs
> > > > @@ -1,4 +1,5 @@
> > > >  obj-y += boot.o
> > > > +obj-y += numa.o
> > > >  obj-$(CONFIG_SPIKE) += riscv_htif.o
> > > >  obj-$(CONFIG_HART) += riscv_hart.o
> > > >  obj-$(CONFIG_SIFIVE_E) += sifive_e.o
> > > > diff --git a/hw/riscv/numa.c b/hw/riscv/numa.c
> > > > new file mode 100644
> > > > index 0000000000..4f92307102
> > > > --- /dev/null
> > > > +++ b/hw/riscv/numa.c
> > > > @@ -0,0 +1,242 @@
> > > > +/*
> > > > + * QEMU RISC-V NUMA Helper
> > > > + *
> > > > + * Copyright (c) 2020 Western Digital Corporation or its affiliates.
> > > > + *
> > > > + * This program is free software; you can redistribute it and/or modify it
> > > > + * under the terms and conditions of the GNU General Public License,
> > > > + * version 2 or later, as published by the Free Software Foundation.
> > > > + *
> > > > + * This program is distributed in the hope it will be useful, but WITHOUT
> > > > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > > > + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> > > > + * more details.
> > > > + *
> > > > + * You should have received a copy of the GNU General Public License along with
> > > > + * this program.  If not, see <http://www.gnu.org/licenses/>.
> > > > + */
> > > > +
> > > > +#include "qemu/osdep.h"
> > > > +#include "qemu/units.h"
> > > > +#include "qemu/log.h"
> > > > +#include "qemu/error-report.h"
> > > > +#include "qapi/error.h"
> > > > +#include "hw/boards.h"
> > > > +#include "hw/qdev-properties.h"
> > > > +#include "hw/riscv/numa.h"
> > > > +#include "sysemu/device_tree.h"
> > > > +
> > > > +static bool numa_enabled(const MachineState *ms)
> > > > +{
> > > > +    return (ms->numa_state && ms->numa_state->num_nodes) ? true : false;
> > > > +}
> > > > +
> > > > +int riscv_socket_count(const MachineState *ms)
> > > > +{
> > > > +    return (numa_enabled(ms)) ? ms->numa_state->num_nodes : 1;
> > > > +}
> > > > +
> > > > +int riscv_socket_first_hartid(const MachineState *ms, int socket_id)
> > > > +{
> > > > +    int i, first_hartid = ms->smp.cpus;
> > > > +
> > > > +    if (!numa_enabled(ms)) {
> > > > +        return (!socket_id) ? 0 : -1;
> > > > +    }
> > > > +
> > > > +    for (i = 0; i < ms->smp.cpus; i++) {
> > > > +        if (ms->possible_cpus->cpus[i].props.node_id != socket_id) {
> > > > +            continue;
> > > > +        }
> > > > +        if (i < first_hartid) {
> > > > +            first_hartid = i;
> > > > +        }
> > > > +    }
> > > > +
> > > > +    return (first_hartid < ms->smp.cpus) ? first_hartid : -1;
> > > > +}
> > > > +
> > > > +int riscv_socket_last_hartid(const MachineState *ms, int socket_id)
> > > > +{
> > > > +    int i, last_hartid = -1;
> > > > +
> > > > +    if (!numa_enabled(ms)) {
> > > > +        return (!socket_id) ? ms->smp.cpus - 1 : -1;
> > > > +    }
> > > > +
> > > > +    for (i = 0; i < ms->smp.cpus; i++) {
> > > > +        if (ms->possible_cpus->cpus[i].props.node_id != socket_id) {
> > > > +            continue;
> > > > +        }
> > > > +        if (i > last_hartid) {
> > > > +            last_hartid = i;
> > > > +        }
> > > > +    }
> > > > +
> > > > +    return (last_hartid < ms->smp.cpus) ? last_hartid : -1;
> > > > +}
> > > > +
> > > > +int riscv_socket_hart_count(const MachineState *ms, int socket_id)
> > > > +{
> > > > +    int first_hartid, last_hartid;
> > > > +
> > > > +    if (!numa_enabled(ms)) {
> > > > +        return (!socket_id) ? ms->smp.cpus : -1;
> > > > +    }
> > > > +
> > > > +    first_hartid = riscv_socket_first_hartid(ms, socket_id);
> > > > +    if (first_hartid < 0) {
> > > > +        return -1;
> > > > +    }
> > > > +
> > > > +    last_hartid = riscv_socket_last_hartid(ms, socket_id);
> > > > +    if (last_hartid < 0) {
> > > > +        return -1;
> > > > +    }
> > > > +
> > > > +    if (first_hartid > last_hartid) {
> > > > +        return -1;
> > > > +    }
> > > > +
> > > > +    return last_hartid - first_hartid + 1;
> > > > +}
> > > > +
> > > > +bool riscv_socket_check_hartids(const MachineState *ms, int socket_id)
> > > > +{
> > > > +    int i, first_hartid, last_hartid;
> > > > +
> > > > +    if (!numa_enabled(ms)) {
> > > > +        return (!socket_id) ? true : false;
> > > > +    }
> > > > +
> > > > +    first_hartid = riscv_socket_first_hartid(ms, socket_id);
> > > > +    if (first_hartid < 0) {
> > > > +        return false;
> > > > +    }
> > > > +
> > > > +    last_hartid = riscv_socket_last_hartid(ms, socket_id);
> > > > +    if (last_hartid < 0) {
> > > > +        return false;
> > > > +    }
> > > > +
> > > > +    for (i = first_hartid; i <= last_hartid; i++) {
> > > > +        if (ms->possible_cpus->cpus[i].props.node_id != socket_id) {
> > > > +            return false;
> > > > +        }
> > > > +    }
> > > > +
> > > > +    return true;
> > > > +}
> > > > +
> > > > +uint64_t riscv_socket_mem_offset(const MachineState *ms, int socket_id)
> > > > +{
> > > > +    int i;
> > > > +    uint64_t mem_offset = 0;
> > > > +
> > > > +    if (!numa_enabled(ms)) {
> > > > +        return 0;
> > > > +    }
> > > > +
> > > > +    for (i = 0; i < ms->numa_state->num_nodes; i++) {
> > > > +        if (i == socket_id) {
> > > > +            break;
> > > > +        }
> > > > +        mem_offset += ms->numa_state->nodes[i].node_mem;
> > > > +    }
> > > > +
> > > > +    return (i == socket_id) ? mem_offset : 0;
> > > > +}
> > > > +
> > > > +uint64_t riscv_socket_mem_size(const MachineState *ms, int socket_id)
> > > > +{
> > > > +    if (!numa_enabled(ms)) {
> > > > +        return (!socket_id) ? ms->ram_size : 0;
> > > > +    }
> > > > +
> > > > +    return (socket_id < ms->numa_state->num_nodes) ?
> > > > +            ms->numa_state->nodes[socket_id].node_mem : 0;
> > > > +}
> > > > +
> > > > +void riscv_socket_fdt_write_id(const MachineState *ms, void *fdt,
> > > > +                               const char *node_name, int socket_id)
> > > > +{
> > > > +    if (numa_enabled(ms)) {
> > > > +        qemu_fdt_setprop_cell(fdt, node_name, "numa-node-id", socket_id);
> > > > +    }
> > > > +}
> > > > +
> > > > +void riscv_socket_fdt_write_distance_matrix(const MachineState *ms, void *fdt)
> > > > +{
> > > > +    int i, j, idx;
> > > > +    uint32_t *dist_matrix, dist_matrix_size;
> > > > +
> > > > +    if (numa_enabled(ms) && ms->numa_state->have_numa_distance) {
> > > > +        dist_matrix_size = riscv_socket_count(ms) * riscv_socket_count(ms);
> > > > +        dist_matrix_size *= (3 * sizeof(uint32_t));
> > > > +        dist_matrix = g_malloc0(dist_matrix_size);
> > > > +
> > > > +        for (i = 0; i < riscv_socket_count(ms); i++) {
> > > > +            for (j = 0; j < riscv_socket_count(ms); j++) {
> > > > +                idx = (i * riscv_socket_count(ms) + j) * 3;
> > > > +                dist_matrix[idx + 0] = cpu_to_be32(i);
> > > > +                dist_matrix[idx + 1] = cpu_to_be32(j);
> > > > +                dist_matrix[idx + 2] =
> > > > +                    cpu_to_be32(ms->numa_state->nodes[i].distance[j]);
> > > > +            }
> > > > +        }
> > > > +
> > > > +        qemu_fdt_add_subnode(fdt, "/distance-map");
> > > > +        qemu_fdt_setprop_string(fdt, "/distance-map", "compatible",
> > > > +                                "numa-distance-map-v1");
> > > > +        qemu_fdt_setprop(fdt, "/distance-map", "distance-matrix",
> > > > +                         dist_matrix, dist_matrix_size);
> > > > +        g_free(dist_matrix);
> > > > +    }
> > > > +}
> > > > +
> > > > +CpuInstanceProperties
> > > > +riscv_numa_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> > > > +{
> > > > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> > > > +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> > > > +
> > > > +    assert(cpu_index < possible_cpus->len);
> > > > +    return possible_cpus->cpus[cpu_index].props;
> > > > +}
> > > > +
> > > > +int64_t riscv_numa_get_default_cpu_node_id(const MachineState *ms, int idx)
> > > > +{
> > > > +    int64_t nidx = 0;
> > > > +
> > > > +    if (ms->numa_state->num_nodes) {
> > > > +        nidx = idx / (ms->smp.cpus / ms->numa_state->num_nodes);
> > > > +        if (ms->numa_state->num_nodes <= nidx) {
> > > > +            nidx = ms->numa_state->num_nodes - 1;
> > > > +        }
> > > > +    }
> > > > +
> > > > +    return nidx;
> > > > +}
> > > > +
> > > > +const CPUArchIdList *riscv_numa_possible_cpu_arch_ids(MachineState *ms)
> > > > +{
> > > > +    int n;
> > > > +    unsigned int max_cpus = ms->smp.max_cpus;
> > > > +
> > > > +    if (ms->possible_cpus) {
> > > > +        assert(ms->possible_cpus->len == max_cpus);
> > > > +        return ms->possible_cpus;
> > > > +    }
> > > > +
> > > > +    ms->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
> > > > +                                  sizeof(CPUArchId) * max_cpus);
> > > > +    ms->possible_cpus->len = max_cpus;
> > > > +    for (n = 0; n < ms->possible_cpus->len; n++) {
> > > > +        ms->possible_cpus->cpus[n].type = ms->cpu_type;
> > > > +        ms->possible_cpus->cpus[n].arch_id = n;
> > > > +        ms->possible_cpus->cpus[n].props.has_core_id = true;
> > > > +        ms->possible_cpus->cpus[n].props.core_id = n;
> > > > +    }
> > > > +
> > > > +    return ms->possible_cpus;
> > > > +}
> > > > diff --git a/include/hw/riscv/numa.h b/include/hw/riscv/numa.h
> > > > new file mode 100644
> > > > index 0000000000..fd9517a315
> > > > --- /dev/null
> > > > +++ b/include/hw/riscv/numa.h
> > > > @@ -0,0 +1,51 @@
> > > > +/*
> > > > + * QEMU RISC-V NUMA Helper
> > > > + *
> > > > + * Copyright (c) 2020 Western Digital Corporation or its affiliates.
> > > > + *
> > > > + * This program is free software; you can redistribute it and/or modify it
> > > > + * under the terms and conditions of the GNU General Public License,
> > > > + * version 2 or later, as published by the Free Software Foundation.
> > > > + *
> > > > + * This program is distributed in the hope it will be useful, but WITHOUT
> > > > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > > > + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> > > > + * more details.
> > > > + *
> > > > + * You should have received a copy of the GNU General Public License along with
> > > > + * this program.  If not, see <http://www.gnu.org/licenses/>.
> > > > + */
> > > > +
> > > > +#ifndef RISCV_NUMA_H
> > > > +#define RISCV_NUMA_H
> > > > +
> > > > +#include "hw/sysbus.h"
> > > > +#include "sysemu/numa.h"
> > > > +
> > > > +int riscv_socket_count(const MachineState *ms);
> > > > +
> > > > +int riscv_socket_first_hartid(const MachineState *ms, int socket_id);
> > > > +
> > > > +int riscv_socket_last_hartid(const MachineState *ms, int socket_id);
> > > > +
> > > > +int riscv_socket_hart_count(const MachineState *ms, int socket_id);
> > > > +
> > > > +uint64_t riscv_socket_mem_offset(const MachineState *ms, int socket_id);
> > > > +
> > > > +uint64_t riscv_socket_mem_size(const MachineState *ms, int socket_id);
> > > > +
> > > > +bool riscv_socket_check_hartids(const MachineState *ms, int socket_id);
> > > > +
> > > > +void riscv_socket_fdt_write_id(const MachineState *ms, void *fdt,
> > > > +                               const char *node_name, int socket_id);
> > > > +
> > > > +void riscv_socket_fdt_write_distance_matrix(const MachineState *ms, void *fdt);
> > > > +
> > > > +CpuInstanceProperties
> > > > +riscv_numa_cpu_index_to_props(MachineState *ms, unsigned cpu_index);
> > > > +
> > > > +int64_t riscv_numa_get_default_cpu_node_id(const MachineState *ms, int idx);
> > > > +
> > > > +const CPUArchIdList *riscv_numa_possible_cpu_arch_ids(MachineState *ms);
> > > 
> > > Can we add some comments to the functions describing what they are
> > > expected to return (and that -1 is an error)?
> > > 
> > > Alistair
> > > 
> > > > +
> > > > +#endif /* RISCV_NUMA_H */
> > > > --
> > > > 2.25.1
> > > > 
> > > > 
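riscv_socket_fdt_write_distance_matrix() in the quoted patch flattens the NUMA distance table into the "distance-matrix" property of a numa-distance-map-v1 node: one (i, j, distance) triplet per pair of sockets, each cell stored as a big-endian 32-bit value. The following is a standalone sketch of that layout, assuming the distances arrive as a flat row-major n x n array (in the patch they come from ms->numa_state->nodes[i].distance[j]) and using a hypothetical byte-wise helper in place of QEMU's cpu_to_be32().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Lay out v in big-endian byte order, as cpu_to_be32() would. */
static uint32_t sketch_to_be32(uint32_t v)
{
    uint8_t b[4] = {
        (uint8_t)(v >> 24), (uint8_t)(v >> 16), (uint8_t)(v >> 8), (uint8_t)v
    };
    uint32_t out;

    memcpy(&out, b, sizeof(out));
    return out;
}

/*
 * Build the flattened matrix for n sockets from a row-major table
 * dist[i * n + j]. Returns a malloc()ed buffer of n * n triplets
 * <i j distance> and sets *size_out to its size in bytes, or returns
 * NULL if allocation fails. The caller frees the buffer.
 */
uint32_t *sketch_distance_matrix(const uint8_t *dist, int n, size_t *size_out)
{
    size_t size;
    uint32_t *m;

    assert(n >= 0);
    size = (size_t)n * n * 3 * sizeof(uint32_t);
    m = malloc(size);
    if (!m) {
        return NULL;
    }
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            size_t idx = ((size_t)i * n + j) * 3;

            m[idx + 0] = sketch_to_be32((uint32_t)i);
            m[idx + 1] = sketch_to_be32((uint32_t)j);
            m[idx + 2] = sketch_to_be32(dist[i * n + j]);
        }
    }
    *size_out = size;
    return m;
}
```

For two sockets with a local distance of 10 and a remote distance of 20, the property holds four triplets: <0 0 10>, <0 1 20>, <1 0 20>, <1 1 10>, i.e. 48 bytes.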

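Per-socket RAM placement in riscv_socket_mem_offset() is a running sum: socket k starts after the node_mem of sockets 0 to k-1, and a socket ID that is not found yields offset 0. A minimal sketch of that computation follows, with a hypothetical plain array standing in for ms->numa_state->nodes[i].node_mem; `sketch_mem_offset` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Offset of socket_id's memory from the start of RAM: the sum of the
 * sizes of all lower-numbered sockets. Returns 0 for an out-of-range
 * socket_id, as the quoted helper does.
 */
uint64_t sketch_mem_offset(const uint64_t *node_mem, int num_nodes,
                           int socket_id)
{
    uint64_t offset = 0;

    assert(num_nodes >= 0);
    if (socket_id < 0 || socket_id >= num_nodes) {
        return 0;
    }
    for (int i = 0; i < socket_id; i++) {
        offset += node_mem[i];
    }
    return offset;
}
```

With node sizes of 1 GiB, 2 GiB and 1 GiB, the offsets come out as 0, 1 GiB and 3 GiB. A machine would presumably add its own DRAM base address on top; that step is outside this sketch.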
-- 
Regards,
Atish

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/5] hw/riscv: Add helpers for RISC-V multi-socket NUMA machines
@ 2020-06-13  1:12           ` Atish Patra
  0 siblings, 0 replies; 28+ messages in thread
From: Atish Patra @ 2020-06-13  1:12 UTC (permalink / raw)
  To: alistair23, Anup Patel
  Cc: anup, qemu-riscv, peter.maydell, sagark, Alistair Francis,
	qemu-devel, palmer

On Fri, 2020-06-12 at 17:52 -0700, Alistair Francis wrote:
> On Thu, Jun 11, 2020 at 6:11 AM Anup Patel <Anup.Patel@wdc.com>
> wrote:
> > 
> > 
> > > -----Original Message-----
> > > From: Qemu-riscv <qemu-riscv-
> > > bounces+anup.patel=wdc.com@nongnu.org> On Behalf Of Alistair
> > > Francis
> > > Sent: 11 June 2020 04:59
> > > To: Anup Patel <Anup.Patel@wdc.com>
> > > Cc: Peter Maydell <peter.maydell@linaro.org>; open list:RISC-V
> > > <qemu-
> > > riscv@nongnu.org>; Sagar Karandikar <sagark@eecs.berkeley.edu>;
> > > Anup
> > > Patel <anup@brainfault.org>; qemu-devel@nongnu.org Developers
> > > <qemu-
> > > devel@nongnu.org>; Atish Patra <Atish.Patra@wdc.com>; Alistair
> > > Francis
> > > <Alistair.Francis@wdc.com>; Palmer Dabbelt <palmer@dabbelt.com>
> > > Subject: Re: [PATCH v5 3/5] hw/riscv: Add helpers for RISC-V
> > > multi-socket
> > > NUMA machines
> > > 
> > > On Fri, May 29, 2020 at 4:48 AM Anup Patel <anup.patel@wdc.com>
> > > wrote:
> > > > We add common helper routines which can be shared by RISC-V
> > > > multi-socket NUMA machines.
> > > > 
> > > > We have two types of helpers:
> > > > 1. riscv_socket_xyz() - These helper assist managing multiple
> > > >    sockets irrespective whether QEMU NUMA is enabled/disabled
> > > > 2.
> > > > riscv_numa_xyz() - These helpers assist in providing
> > > >    necessary QEMU machine callbacks for QEMU NUMA emulation
> > > > 
> > > > Signed-off-by: Anup Patel <anup.patel@wdc.com>
> > > > ---
> > > >  hw/riscv/Makefile.objs  |   1 +
> > > >  hw/riscv/numa.c         | 242
> > > ++++++++++++++++++++++++++++++++++++++++
> > > >  include/hw/riscv/numa.h |  51 +++++++++
> > > >  3 files changed, 294 insertions(+)
> > > >  create mode 100644 hw/riscv/numa.c
> > > >  create mode 100644 include/hw/riscv/numa.h
> > > 
> > > I don't love that we have an entire file of functions to help
> > > with NUMA when
> > > no other arch seems to have anything this complex.
> > > 
> > > What about RISC-V requires extra complexity?
> > 
> > Other architectures, generally have one machine supporting NUMA.
> > 
> > In QEMU RISC-V, we are supporting NUMA in two machines (i.e Virt
> > and Spike). Both these machines, are synthetic machines and don't
> > match real-world hardware. The Spike machine is even more unique
> > because it has minimum number of devices and no interrupt
> > controller.
> > 
> > In future, we might have few more machines in QEMU RISC-V having
> > NUMA/multi-socket support.
> > 
> > Comparted to other architectures, the riscv_numa_xyz() callbacks
> > defined here do:
> > 1. Linear mapping of CPU arch_id to CPU logical idx
> > 2. Linear assignment of node_id to CPU idx
> > 
> > The requirement 2) mentioned above is because CLINT and PLIC
> > device emulation require contiguous hard IDs in a socket.
> 
> Ok, fair enough :)
> 
> Do you mind sending a new version, I think the Spike part will need
> to
> be rebased on top of the Spike machine changes.
> 
> Then just pressure Atish to ack the DT changes :P
> 

Since you had comments on v5, I thought I would just defer the review
to the next version ;)

Jokes apart, I will do it tonight either on v5 or v6 whatever is the
latest. I don't think the topology mapping would change in between v5 &
v6.

> Alistair
> 
> > Regards,
> > Anup
> > 
> > > > diff --git a/hw/riscv/Makefile.objs b/hw/riscv/Makefile.objs
> > > > index
> > > > fc3c6dd7c8..4483e61879 100644
> > > > --- a/hw/riscv/Makefile.objs
> > > > +++ b/hw/riscv/Makefile.objs
> > > > @@ -1,4 +1,5 @@
> > > >  obj-y += boot.o
> > > > +obj-y += numa.o
> > > >  obj-$(CONFIG_SPIKE) += riscv_htif.o
> > > >  obj-$(CONFIG_HART) += riscv_hart.o
> > > >  obj-$(CONFIG_SIFIVE_E) += sifive_e.o
> > > > diff --git a/hw/riscv/numa.c b/hw/riscv/numa.c new file mode
> > > > 100644
> > > > index 0000000000..4f92307102
> > > > --- /dev/null
> > > > +++ b/hw/riscv/numa.c
> > > > @@ -0,0 +1,242 @@
> > > > +/*
> > > > + * QEMU RISC-V NUMA Helper
> > > > + *
> > > > + * Copyright (c) 2020 Western Digital Corporation or its
> > > > affiliates.
> > > > + *
> > > > + * This program is free software; you can redistribute it
> > > > and/or
> > > > +modify it
> > > > + * under the terms and conditions of the GNU General Public
> > > > License,
> > > > + * version 2 or later, as published by the Free Software
> > > > Foundation.
> > > > + *
> > > > + * This program is distributed in the hope it will be useful,
> > > > but
> > > > +WITHOUT
> > > > + * ANY WARRANTY; without even the implied warranty of
> > > MERCHANTABILITY
> > > > +or
> > > > + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General
> > > > Public
> > > > +License for
> > > > + * more details.
> > > > + *
> > > > + * You should have received a copy of the GNU General Public
> > > > License
> > > > +along with
> > > > + * this program.  If not, see <http://www.gnu.org/licenses/>;.
> > > > + */
> > > > +
> > > > +#include "qemu/osdep.h"
> > > > +#include "qemu/units.h"
> > > > +#include "qemu/log.h"
> > > > +#include "qemu/error-report.h"
> > > > +#include "qapi/error.h"
> > > > +#include "hw/boards.h"
> > > > +#include "hw/qdev-properties.h"
> > > > +#include "hw/riscv/numa.h"
> > > > +#include "sysemu/device_tree.h"
> > > > +
> > > > +static bool numa_enabled(const MachineState *ms) {
> > > > +    return (ms->numa_state && ms->numa_state->num_nodes) ?
> > > > true :
> > > > +false; }
> > > > +
> > > > +int riscv_socket_count(const MachineState *ms) {
> > > > +    return (numa_enabled(ms)) ? ms->numa_state->num_nodes : 1;
> > > > }
> > > > +
> > > > +int riscv_socket_first_hartid(const MachineState *ms, int
> > > > socket_id)
> > > > +{
> > > > +    int i, first_hartid = ms->smp.cpus;
> > > > +
> > > > +    if (!numa_enabled(ms)) {
> > > > +        return (!socket_id) ? 0 : -1;
> > > > +    }
> > > > +
> > > > +    for (i = 0; i < ms->smp.cpus; i++) {
> > > > +        if (ms->possible_cpus->cpus[i].props.node_id !=
> > > > socket_id) {
> > > > +            continue;
> > > > +        }
> > > > +        if (i < first_hartid) {
> > > > +            first_hartid = i;
> > > > +        }
> > > > +    }
> > > > +
> > > > +    return (first_hartid < ms->smp.cpus) ? first_hartid : -1;
> > > > }
> > > > +
> > > > +int riscv_socket_last_hartid(const MachineState *ms, int
> > > > socket_id) {
> > > > +    int i, last_hartid = -1;
> > > > +
> > > > +    if (!numa_enabled(ms)) {
> > > > +        return (!socket_id) ? ms->smp.cpus - 1 : -1;
> > > > +    }
> > > > +
> > > > +    for (i = 0; i < ms->smp.cpus; i++) {
> > > > +        if (ms->possible_cpus->cpus[i].props.node_id !=
> > > > socket_id) {
> > > > +            continue;
> > > > +        }
> > > > +        if (i > last_hartid) {
> > > > +            last_hartid = i;
> > > > +        }
> > > > +    }
> > > > +
> > > > +    return (last_hartid < ms->smp.cpus) ? last_hartid : -1; }
> > > > +
> > > > +int riscv_socket_hart_count(const MachineState *ms, int
> > > > socket_id) {
> > > > +    int first_hartid, last_hartid;
> > > > +
> > > > +    if (!numa_enabled(ms)) {
> > > > +        return (!socket_id) ? ms->smp.cpus : -1;
> > > > +    }
> > > > +
> > > > +    first_hartid = riscv_socket_first_hartid(ms, socket_id);
> > > > +    if (first_hartid < 0) {
> > > > +        return -1;
> > > > +    }
> > > > +
> > > > +    last_hartid = riscv_socket_last_hartid(ms, socket_id);
> > > > +    if (last_hartid < 0) {
> > > > +        return -1;
> > > > +    }
> > > > +
> > > > +    if (first_hartid > last_hartid) {
> > > > +        return -1;
> > > > +    }
> > > > +
> > > > +    return last_hartid - first_hartid + 1; }
> > > > +
> > > > +bool riscv_socket_check_hartids(const MachineState *ms, int
> > > > +socket_id) {
> > > > +    int i, first_hartid, last_hartid;
> > > > +
> > > > +    if (!numa_enabled(ms)) {
> > > > +        return (!socket_id) ? true : false;
> > > > +    }
> > > > +
> > > > +    first_hartid = riscv_socket_first_hartid(ms, socket_id);
> > > > +    if (first_hartid < 0) {
> > > > +        return false;
> > > > +    }
> > > > +
> > > > +    last_hartid = riscv_socket_last_hartid(ms, socket_id);
> > > > +    if (last_hartid < 0) {
> > > > +        return false;
> > > > +    }
> > > > +
> > > > +    for (i = first_hartid; i <= last_hartid; i++) {
> > > > +        if (ms->possible_cpus->cpus[i].props.node_id !=
> > > > socket_id) {
> > > > +            return false;
> > > > +        }
> > > > +    }
> > > > +
> > > > +    return true;
> > > > +}
> > > > +
> > > > +uint64_t riscv_socket_mem_offset(const MachineState *ms, int
> > > > +socket_id) {
> > > > +    int i;
> > > > +    uint64_t mem_offset = 0;
> > > > +
> > > > +    if (!numa_enabled(ms)) {
> > > > +        return 0;
> > > > +    }
> > > > +
> > > > +    for (i = 0; i < ms->numa_state->num_nodes; i++) {
> > > > +        if (i == socket_id) {
> > > > +            break;
> > > > +        }
> > > > +        mem_offset += ms->numa_state->nodes[i].node_mem;
> > > > +    }
> > > > +
> > > > +    return (i == socket_id) ? mem_offset : 0; }
> > > > +
> > > > +uint64_t riscv_socket_mem_size(const MachineState *ms, int
> > > > socket_id)
> > > > +{
> > > > +    if (!numa_enabled(ms)) {
> > > > +        return (!socket_id) ? ms->ram_size : 0;
> > > > +    }
> > > > +
> > > > +    return (socket_id < ms->numa_state->num_nodes) ?
> > > > +            ms->numa_state->nodes[socket_id].node_mem : 0; }
> > > > +
> > > > +void riscv_socket_fdt_write_id(const MachineState *ms, void
> > > > *fdt,
> > > > +                               const char *node_name, int
> > > > socket_id)
> > > > +{
> > > > +    if (numa_enabled(ms)) {
> > > > +        qemu_fdt_setprop_cell(fdt, node_name, "numa-node-id",
> > > socket_id);
> > > > +    }
> > > > +}
> > > > +
> > > > +void riscv_socket_fdt_write_distance_matrix(const MachineState
> > > > *ms,
> > > > +void *fdt) {
> > > > +    int i, j, idx;
> > > > +    uint32_t *dist_matrix, dist_matrix_size;
> > > > +
> > > > +    if (numa_enabled(ms) && ms->numa_state-
> > > > >have_numa_distance) {
> > > > +        dist_matrix_size = riscv_socket_count(ms) *
> > > > riscv_socket_count(ms);
> > > > +        dist_matrix_size *= (3 * sizeof(uint32_t));
> > > > +        dist_matrix = g_malloc0(dist_matrix_size);
> > > > +
> > > > +        for (i = 0; i < riscv_socket_count(ms); i++) {
> > > > +            for (j = 0; j < riscv_socket_count(ms); j++) {
> > > > +                idx = (i * riscv_socket_count(ms) + j) * 3;
> > > > +                dist_matrix[idx + 0] = cpu_to_be32(i);
> > > > +                dist_matrix[idx + 1] = cpu_to_be32(j);
> > > > +                dist_matrix[idx + 2] =
> > > > +                    cpu_to_be32(ms->numa_state-
> > > > >nodes[i].distance[j]);
> > > > +            }
> > > > +        }
> > > > +
> > > > +        qemu_fdt_add_subnode(fdt, "/distance-map");
> > > > +        qemu_fdt_setprop_string(fdt, "/distance-map",
> > > > "compatible",
> > > > +                                "numa-distance-map-v1");
> > > > +        qemu_fdt_setprop(fdt, "/distance-map", "distance-
> > > > matrix",
> > > > +                         dist_matrix, dist_matrix_size);
> > > > +        g_free(dist_matrix);
> > > > +    }
> > > > +}
> > > > +
> > > > +CpuInstanceProperties
> > > > +riscv_numa_cpu_index_to_props(MachineState *ms, unsigned
> > > cpu_index) {
> > > > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> > > > +    const CPUArchIdList *possible_cpus =
> > > > +mc->possible_cpu_arch_ids(ms);
> > > > +
> > > > +    assert(cpu_index < possible_cpus->len);
> > > > +    return possible_cpus->cpus[cpu_index].props;
> > > > +}
> > > > +
> > > > +int64_t riscv_numa_get_default_cpu_node_id(const MachineState
> > > > *ms,
> > > > +int idx) {
> > > > +    int64_t nidx = 0;
> > > > +
> > > > +    if (ms->numa_state->num_nodes) {
> > > > +        nidx = idx / (ms->smp.cpus / ms->numa_state-
> > > > >num_nodes);
> > > > +        if (ms->numa_state->num_nodes <= nidx) {
> > > > +            nidx = ms->numa_state->num_nodes - 1;
> > > > +        }
> > > > +    }
> > > > +
> > > > +    return nidx;
> > > > +}
> > > > +
> > > > +const CPUArchIdList
> > > > *riscv_numa_possible_cpu_arch_ids(MachineState
> > > > +*ms) {
> > > > +    int n;
> > > > +    unsigned int max_cpus = ms->smp.max_cpus;
> > > > +
> > > > +    if (ms->possible_cpus) {
> > > > +        assert(ms->possible_cpus->len == max_cpus);
> > > > +        return ms->possible_cpus;
> > > > +    }
> > > > +
> > > > +    ms->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
> > > > +                                  sizeof(CPUArchId) * max_cpus);
> > > > +    ms->possible_cpus->len = max_cpus;
> > > > +    for (n = 0; n < ms->possible_cpus->len; n++) {
> > > > +        ms->possible_cpus->cpus[n].type = ms->cpu_type;
> > > > +        ms->possible_cpus->cpus[n].arch_id = n;
> > > > +        ms->possible_cpus->cpus[n].props.has_core_id = true;
> > > > +        ms->possible_cpus->cpus[n].props.core_id = n;
> > > > +    }
> > > > +
> > > > +    return ms->possible_cpus;
> > > > +}
> > > > diff --git a/include/hw/riscv/numa.h b/include/hw/riscv/numa.h
> > > > new file mode 100644
> > > > index 0000000000..fd9517a315
> > > > --- /dev/null
> > > > +++ b/include/hw/riscv/numa.h
> > > > @@ -0,0 +1,51 @@
> > > > +/*
> > > > + * QEMU RISC-V NUMA Helper
> > > > + *
> > > > + * Copyright (c) 2020 Western Digital Corporation or its affiliates.
> > > > + *
> > > > + * This program is free software; you can redistribute it and/or modify it
> > > > + * under the terms and conditions of the GNU General Public License,
> > > > + * version 2 or later, as published by the Free Software Foundation.
> > > > + *
> > > > + * This program is distributed in the hope it will be useful, but WITHOUT
> > > > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > > > + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> > > > + * more details.
> > > > + *
> > > > + * You should have received a copy of the GNU General Public License along with
> > > > + * this program.  If not, see <http://www.gnu.org/licenses/>.
> > > > + */
> > > > +
> > > > +#ifndef RISCV_NUMA_H
> > > > +#define RISCV_NUMA_H
> > > > +
> > > > +#include "hw/sysbus.h"
> > > > +#include "sysemu/numa.h"
> > > > +
> > > > +int riscv_socket_count(const MachineState *ms);
> > > > +
> > > > +int riscv_socket_first_hartid(const MachineState *ms, int socket_id);
> > > > +
> > > > +int riscv_socket_last_hartid(const MachineState *ms, int socket_id);
> > > > +
> > > > +int riscv_socket_hart_count(const MachineState *ms, int socket_id);
> > > > +
> > > > +uint64_t riscv_socket_mem_offset(const MachineState *ms, int socket_id);
> > > > +
> > > > +uint64_t riscv_socket_mem_size(const MachineState *ms, int socket_id);
> > > > +
> > > > +bool riscv_socket_check_hartids(const MachineState *ms, int socket_id);
> > > > +
> > > > +void riscv_socket_fdt_write_id(const MachineState *ms, void *fdt,
> > > > +                               const char *node_name, int socket_id);
> > > > +
> > > > +void riscv_socket_fdt_write_distance_matrix(const MachineState *ms, void *fdt);
> > > > +
> > > > +CpuInstanceProperties
> > > > +riscv_numa_cpu_index_to_props(MachineState *ms, unsigned cpu_index);
> > > > +
> > > > +int64_t riscv_numa_get_default_cpu_node_id(const MachineState *ms, int idx);
> > > > +
> > > > +const CPUArchIdList *riscv_numa_possible_cpu_arch_ids(MachineState *ms);
> > > 
> > > Can we add some comments to the functions describing what they are
> > > expected to return (and that -1 is an error)?
> > > 
> > > Alistair
> > > 
> > > > +
> > > > +#endif /* RISCV_NUMA_H */
> > > > --
> > > > 2.25.1
> > > > 
> > > > 

-- 
Regards,
Atish

^ permalink raw reply	[flat|nested] 28+ messages in thread
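Two of the quoted helpers, `riscv_numa_get_default_cpu_node_id()` and `riscv_socket_fdt_write_distance_matrix()`, can be illustrated outside QEMU. The standalone sketch below is an illustration under stated assumptions, not the patch's code: the names `default_node_id()` and `build_distance_matrix()` are hypothetical, a plain row-major `uint8_t` array stands in for `ms->numa_state->nodes[i].distance[j]`, and POSIX `htonl()` stands in for QEMU's `cpu_to_be32()`. It shows the default split of harts into equal blocks of `cpus / nodes`, with leftover harts clamped onto the last node, and the `distance-matrix` layout of one big-endian `(from, to, distance)` triple per ordered node pair.

```c
#include <arpa/inet.h> /* htonl(), ntohl() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: default hart -> node assignment using the same
 * arithmetic as riscv_numa_get_default_cpu_node_id(). Harts are split
 * into blocks of (cpus / nodes); leftover harts are clamped onto the
 * last node. Returns 0 when NUMA is off (nodes == 0) and, unlike the
 * quoted helper, returns -1 instead of dividing by zero when cpus < nodes.
 */
int64_t default_node_id(int cpus, int nodes, int idx)
{
    int64_t nidx = 0;

    if (nodes) {
        if (cpus < nodes) {
            return -1;
        }
        nidx = idx / (cpus / nodes);
        if (nidx >= nodes) {
            nidx = nodes - 1;
        }
    }
    return nidx;
}

/*
 * Hypothetical helper: build "distance-matrix" cells, one big-endian
 * (from, to, distance) triple per ordered node pair. dist is a
 * row-major nodes x nodes array. Stores the size in bytes in *size_out;
 * the caller frees the result. Returns NULL if allocation fails.
 */
uint32_t *build_distance_matrix(const uint8_t *dist, int nodes,
                                size_t *size_out)
{
    assert(dist != NULL && nodes > 0);

    size_t size = (size_t)nodes * nodes * 3 * sizeof(uint32_t);
    uint32_t *cells = calloc(1, size);

    if (!cells) {
        return NULL;
    }
    for (int i = 0; i < nodes; i++) {
        for (int j = 0; j < nodes; j++) {
            size_t idx = ((size_t)i * nodes + j) * 3;

            cells[idx + 0] = htonl((uint32_t)i);
            cells[idx + 1] = htonl((uint32_t)j);
            cells[idx + 2] = htonl(dist[i * nodes + j]);
        }
    }
    *size_out = size;
    return cells;
}
```

With 4 CPUs and 2 nodes, `default_node_id()` places harts 0 and 1 on node 0 and harts 2 and 3 on node 1, which is the split that the `-smp 4 -numa node -numa node` example in this series describes. With 5 CPUs, the extra hart 4 is clamped onto node 1. The guard for fewer CPUs than nodes is an addition made by the sketch: in that case `cpus / nodes` is 0, so the division in the quoted helper would divide by zero.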

* Re: [PATCH v5 3/5] hw/riscv: Add helpers for RISC-V multi-socket NUMA machines
  2020-05-29 11:46   ` Anup Patel
@ 2020-06-13  5:18     ` Atish Patra
  -1 siblings, 0 replies; 28+ messages in thread
From: Atish Patra @ 2020-06-13  5:18 UTC (permalink / raw)
  To: Anup Patel
  Cc: Peter Maydell, qemu-riscv, Sagar Karandikar, Anup Patel,
	qemu-devel, Atish Patra, Alistair Francis, Palmer Dabbelt

On Fri, May 29, 2020 at 4:48 AM Anup Patel <anup.patel@wdc.com> wrote:
>
> We add common helper routines which can be shared by RISC-V
> multi-socket NUMA machines.
>
> We have two types of helpers:
> 1. riscv_socket_xyz() - These helpers assist in managing multiple
>    sockets irrespective of whether QEMU NUMA is enabled or disabled
> 2. riscv_numa_xyz() - These helpers assist in providing the
>    necessary QEMU machine callbacks for QEMU NUMA emulation
>
> Signed-off-by: Anup Patel <anup.patel@wdc.com>
> ---
>  hw/riscv/Makefile.objs  |   1 +
>  hw/riscv/numa.c         | 242 ++++++++++++++++++++++++++++++++++++++++
>  include/hw/riscv/numa.h |  51 +++++++++
>  3 files changed, 294 insertions(+)
>  create mode 100644 hw/riscv/numa.c
>  create mode 100644 include/hw/riscv/numa.h
>
> diff --git a/hw/riscv/Makefile.objs b/hw/riscv/Makefile.objs
> index fc3c6dd7c8..4483e61879 100644
> --- a/hw/riscv/Makefile.objs
> +++ b/hw/riscv/Makefile.objs
> @@ -1,4 +1,5 @@
>  obj-y += boot.o
> +obj-y += numa.o
>  obj-$(CONFIG_SPIKE) += riscv_htif.o
>  obj-$(CONFIG_HART) += riscv_hart.o
>  obj-$(CONFIG_SIFIVE_E) += sifive_e.o
> diff --git a/hw/riscv/numa.c b/hw/riscv/numa.c
> new file mode 100644
> index 0000000000..4f92307102
> --- /dev/null
> +++ b/hw/riscv/numa.c
> @@ -0,0 +1,242 @@
> +/*
> + * QEMU RISC-V NUMA Helper
> + *
> + * Copyright (c) 2020 Western Digital Corporation or its affiliates.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/units.h"
> +#include "qemu/log.h"
> +#include "qemu/error-report.h"
> +#include "qapi/error.h"
> +#include "hw/boards.h"
> +#include "hw/qdev-properties.h"
> +#include "hw/riscv/numa.h"
> +#include "sysemu/device_tree.h"
> +
> +static bool numa_enabled(const MachineState *ms)
> +{
> +    return (ms->numa_state && ms->numa_state->num_nodes) ? true : false;
> +}
> +
> +int riscv_socket_count(const MachineState *ms)
> +{
> +    return (numa_enabled(ms)) ? ms->numa_state->num_nodes : 1;
> +}
> +
> +int riscv_socket_first_hartid(const MachineState *ms, int socket_id)
> +{
> +    int i, first_hartid = ms->smp.cpus;
> +
> +    if (!numa_enabled(ms)) {
> +        return (!socket_id) ? 0 : -1;
> +    }
> +
> +    for (i = 0; i < ms->smp.cpus; i++) {
> +        if (ms->possible_cpus->cpus[i].props.node_id != socket_id) {
> +            continue;
> +        }
> +        if (i < first_hartid) {
> +            first_hartid = i;
> +        }
> +    }
> +
> +    return (first_hartid < ms->smp.cpus) ? first_hartid : -1;
> +}
> +
> +int riscv_socket_last_hartid(const MachineState *ms, int socket_id)
> +{
> +    int i, last_hartid = -1;
> +
> +    if (!numa_enabled(ms)) {
> +        return (!socket_id) ? ms->smp.cpus - 1 : -1;
> +    }
> +
> +    for (i = 0; i < ms->smp.cpus; i++) {
> +        if (ms->possible_cpus->cpus[i].props.node_id != socket_id) {
> +            continue;
> +        }
> +        if (i > last_hartid) {
> +            last_hartid = i;
> +        }
> +    }
> +
> +    return (last_hartid < ms->smp.cpus) ? last_hartid : -1;
> +}
> +
> +int riscv_socket_hart_count(const MachineState *ms, int socket_id)
> +{
> +    int first_hartid, last_hartid;
> +
> +    if (!numa_enabled(ms)) {
> +        return (!socket_id) ? ms->smp.cpus : -1;
> +    }
> +
> +    first_hartid = riscv_socket_first_hartid(ms, socket_id);
> +    if (first_hartid < 0) {
> +        return -1;
> +    }
> +
> +    last_hartid = riscv_socket_last_hartid(ms, socket_id);
> +    if (last_hartid < 0) {
> +        return -1;
> +    }
> +
> +    if (first_hartid > last_hartid) {
> +        return -1;
> +    }
> +
> +    return last_hartid - first_hartid + 1;
> +}
> +
> +bool riscv_socket_check_hartids(const MachineState *ms, int socket_id)
> +{
> +    int i, first_hartid, last_hartid;
> +
> +    if (!numa_enabled(ms)) {
> +        return (!socket_id) ? true : false;
> +    }
> +
> +    first_hartid = riscv_socket_first_hartid(ms, socket_id);
> +    if (first_hartid < 0) {
> +        return false;
> +    }
> +
> +    last_hartid = riscv_socket_last_hartid(ms, socket_id);
> +    if (last_hartid < 0) {
> +        return false;
> +    }
> +
> +    for (i = first_hartid; i <= last_hartid; i++) {
> +        if (ms->possible_cpus->cpus[i].props.node_id != socket_id) {
> +            return false;
> +        }
> +    }
> +
> +    return true;
> +}
> +
> +uint64_t riscv_socket_mem_offset(const MachineState *ms, int socket_id)
> +{
> +    int i;
> +    uint64_t mem_offset = 0;
> +
> +    if (!numa_enabled(ms)) {
> +        return 0;
> +    }
> +
> +    for (i = 0; i < ms->numa_state->num_nodes; i++) {
> +        if (i == socket_id) {
> +            break;
> +        }
> +        mem_offset += ms->numa_state->nodes[i].node_mem;
> +    }
> +
> +    return (i == socket_id) ? mem_offset : 0;
> +}
> +
> +uint64_t riscv_socket_mem_size(const MachineState *ms, int socket_id)
> +{
> +    if (!numa_enabled(ms)) {
> +        return (!socket_id) ? ms->ram_size : 0;
> +    }
> +
> +    return (socket_id < ms->numa_state->num_nodes) ?
> +            ms->numa_state->nodes[socket_id].node_mem : 0;
> +}
> +
> +void riscv_socket_fdt_write_id(const MachineState *ms, void *fdt,
> +                               const char *node_name, int socket_id)
> +{
> +    if (numa_enabled(ms)) {
> +        qemu_fdt_setprop_cell(fdt, node_name, "numa-node-id", socket_id);
> +    }
> +}
> +
> +void riscv_socket_fdt_write_distance_matrix(const MachineState *ms, void *fdt)
> +{
> +    int i, j, idx;
> +    uint32_t *dist_matrix, dist_matrix_size;
> +
> +    if (numa_enabled(ms) && ms->numa_state->have_numa_distance) {
> +        dist_matrix_size = riscv_socket_count(ms) * riscv_socket_count(ms);
> +        dist_matrix_size *= (3 * sizeof(uint32_t));
> +        dist_matrix = g_malloc0(dist_matrix_size);
> +
> +        for (i = 0; i < riscv_socket_count(ms); i++) {
> +            for (j = 0; j < riscv_socket_count(ms); j++) {
> +                idx = (i * riscv_socket_count(ms) + j) * 3;
> +                dist_matrix[idx + 0] = cpu_to_be32(i);
> +                dist_matrix[idx + 1] = cpu_to_be32(j);
> +                dist_matrix[idx + 2] =
> +                    cpu_to_be32(ms->numa_state->nodes[i].distance[j]);
> +            }
> +        }
> +
> +        qemu_fdt_add_subnode(fdt, "/distance-map");
> +        qemu_fdt_setprop_string(fdt, "/distance-map", "compatible",
> +                                "numa-distance-map-v1");
> +        qemu_fdt_setprop(fdt, "/distance-map", "distance-matrix",
> +                         dist_matrix, dist_matrix_size);
> +        g_free(dist_matrix);
> +    }
> +}
> +
> +CpuInstanceProperties
> +riscv_numa_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> +{
> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> +
> +    assert(cpu_index < possible_cpus->len);
> +    return possible_cpus->cpus[cpu_index].props;
> +}
> +
> +int64_t riscv_numa_get_default_cpu_node_id(const MachineState *ms, int idx)
> +{
> +    int64_t nidx = 0;
> +
> +    if (ms->numa_state->num_nodes) {
> +        nidx = idx / (ms->smp.cpus / ms->numa_state->num_nodes);
> +        if (ms->numa_state->num_nodes <= nidx) {
> +            nidx = ms->numa_state->num_nodes - 1;
> +        }
> +    }
> +
> +    return nidx;
> +}
> +
> +const CPUArchIdList *riscv_numa_possible_cpu_arch_ids(MachineState *ms)
> +{
> +    int n;
> +    unsigned int max_cpus = ms->smp.max_cpus;
> +
> +    if (ms->possible_cpus) {
> +        assert(ms->possible_cpus->len == max_cpus);
> +        return ms->possible_cpus;
> +    }
> +
> +    ms->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
> +                                  sizeof(CPUArchId) * max_cpus);
> +    ms->possible_cpus->len = max_cpus;
> +    for (n = 0; n < ms->possible_cpus->len; n++) {
> +        ms->possible_cpus->cpus[n].type = ms->cpu_type;
> +        ms->possible_cpus->cpus[n].arch_id = n;
> +        ms->possible_cpus->cpus[n].props.has_core_id = true;
> +        ms->possible_cpus->cpus[n].props.core_id = n;
> +    }
> +
> +    return ms->possible_cpus;
> +}
> diff --git a/include/hw/riscv/numa.h b/include/hw/riscv/numa.h
> new file mode 100644
> index 0000000000..fd9517a315
> --- /dev/null
> +++ b/include/hw/riscv/numa.h
> @@ -0,0 +1,51 @@
> +/*
> + * QEMU RISC-V NUMA Helper
> + *
> + * Copyright (c) 2020 Western Digital Corporation or its affiliates.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef RISCV_NUMA_H
> +#define RISCV_NUMA_H
> +
> +#include "hw/sysbus.h"
> +#include "sysemu/numa.h"
> +
> +int riscv_socket_count(const MachineState *ms);
> +
> +int riscv_socket_first_hartid(const MachineState *ms, int socket_id);
> +
> +int riscv_socket_last_hartid(const MachineState *ms, int socket_id);
> +
> +int riscv_socket_hart_count(const MachineState *ms, int socket_id);
> +
> +uint64_t riscv_socket_mem_offset(const MachineState *ms, int socket_id);
> +
> +uint64_t riscv_socket_mem_size(const MachineState *ms, int socket_id);
> +
> +bool riscv_socket_check_hartids(const MachineState *ms, int socket_id);
> +
> +void riscv_socket_fdt_write_id(const MachineState *ms, void *fdt,
> +                               const char *node_name, int socket_id);
> +
> +void riscv_socket_fdt_write_distance_matrix(const MachineState *ms, void *fdt);
> +
> +CpuInstanceProperties
> +riscv_numa_cpu_index_to_props(MachineState *ms, unsigned cpu_index);
> +
> +int64_t riscv_numa_get_default_cpu_node_id(const MachineState *ms, int idx);
> +
> +const CPUArchIdList *riscv_numa_possible_cpu_arch_ids(MachineState *ms);
> +
> +#endif /* RISCV_NUMA_H */
> --
> 2.25.1
>
>

LGTM.

Reviewed-by: Atish Patra <atish.patra@wdc.com>

-- 
Regards,
Atish


^ permalink raw reply	[flat|nested] 28+ messages in thread
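The `riscv_socket_*()` hart-range helpers reviewed in the message above can also be sketched standalone. This sketch assumes a plain `node_of_hart[]` array in place of `ms->possible_cpus->cpus[i].props.node_id`, and all function names in it are hypothetical. It makes one point explicit: the hart count is the width of the `[first, last]` range, so it is only meaningful when a contiguity check like `riscv_socket_check_hartids()` confirms that every hart in that range belongs to the socket. Unlike the patch, which scans every hart, the sketch scans from each end and stops at the first match. The results are the same.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * node_of_hart[i] is the NUMA node (socket) of hart i; a hypothetical
 * stand-in for ms->possible_cpus->cpus[i].props.node_id.
 */

/* Lowest hart ID on socket_id, or -1 if the socket has no harts. */
int socket_first_hartid(const int *node_of_hart, int cpus, int socket_id)
{
    assert(node_of_hart != NULL || cpus == 0);
    for (int i = 0; i < cpus; i++) {
        if (node_of_hart[i] == socket_id) {
            return i;
        }
    }
    return -1;
}

/* Highest hart ID on socket_id, or -1 if the socket has no harts. */
int socket_last_hartid(const int *node_of_hart, int cpus, int socket_id)
{
    assert(node_of_hart != NULL || cpus == 0);
    for (int i = cpus - 1; i >= 0; i--) {
        if (node_of_hart[i] == socket_id) {
            return i;
        }
    }
    return -1;
}

/* Width of the [first, last] hart range, or -1 if the socket is empty. */
int socket_hart_count(const int *node_of_hart, int cpus, int socket_id)
{
    int first = socket_first_hartid(node_of_hart, cpus, socket_id);
    int last = socket_last_hartid(node_of_hart, cpus, socket_id);

    if (first < 0 || last < 0 || first > last) {
        return -1;
    }
    return last - first + 1;
}

/* True only if every hart in [first, last] belongs to socket_id. */
bool socket_check_hartids(const int *node_of_hart, int cpus, int socket_id)
{
    int first = socket_first_hartid(node_of_hart, cpus, socket_id);
    int last = socket_last_hartid(node_of_hart, cpus, socket_id);

    if (first < 0 || last < 0) {
        return false;
    }
    for (int i = first; i <= last; i++) {
        if (node_of_hart[i] != socket_id) {
            return false;
        }
    }
    return true;
}
```

Consider the `-numa cpu,node-id=...` example in this series, which places hart 0 on node 0 and harts 1 to 3 on node 1. Socket 1 spans harts 1 to 3 and passes the contiguity check. By contrast, an interleaved assignment such as `{0, 1, 0, 1}` gives socket 0 a range width of 3 for only 2 harts, and the check fails.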

* Re: [PATCH v5 5/5] hw/riscv: virt: Allow creating multiple NUMA sockets
  2020-05-29 11:46   ` Anup Patel
@ 2020-06-13  5:21     ` Atish Patra
  -1 siblings, 0 replies; 28+ messages in thread
From: Atish Patra @ 2020-06-13  5:21 UTC (permalink / raw)
  To: Anup Patel
  Cc: Peter Maydell, qemu-riscv, Sagar Karandikar, Anup Patel,
	qemu-devel, Atish Patra, Alistair Francis, Palmer Dabbelt

On Fri, May 29, 2020 at 4:50 AM Anup Patel <anup.patel@wdc.com> wrote:
>
> We extend the RISC-V virt machine to allow creating a multi-socket
> machine. Each RISC-V virt machine socket is a NUMA node having
> a set of HARTs, a memory instance, a CLINT instance, and a PLIC
> instance. Other devices are shared between all sockets. We also
> update the generated device tree accordingly.
>
> By default, NUMA multi-socket support is disabled for the RISC-V virt
> machine. To enable it, users can use the "-numa" command-line options
> of QEMU.
>
> Example1: For two NUMA nodes with 2 CPUs each, append the following
> to the command-line options: "-smp 4 -numa node -numa node"
>
> Example2: For two NUMA nodes with 1 and 3 CPUs, append the following
> to the command-line options:
> "-smp 4 -numa node -numa node -numa cpu,node-id=0,core-id=0 \
> -numa cpu,node-id=1,core-id=1 -numa cpu,node-id=1,core-id=2 \
> -numa cpu,node-id=1,core-id=3"
>
> The maximum number of sockets in a RISC-V virt machine is 8,
> but this limit can be changed in the future.
>
> Signed-off-by: Anup Patel <anup.patel@wdc.com>
> ---
>  hw/riscv/virt.c         | 530 +++++++++++++++++++++++-----------------
>  include/hw/riscv/virt.h |   9 +-
>  2 files changed, 308 insertions(+), 231 deletions(-)
>
> diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
> index 421815081d..2863b42cea 100644
> --- a/hw/riscv/virt.c
> +++ b/hw/riscv/virt.c
> @@ -35,6 +35,7 @@
>  #include "hw/riscv/sifive_test.h"
>  #include "hw/riscv/virt.h"
>  #include "hw/riscv/boot.h"
> +#include "hw/riscv/numa.h"
>  #include "chardev/char.h"
>  #include "sysemu/arch_init.h"
>  #include "sysemu/device_tree.h"
> @@ -60,7 +61,7 @@ static const struct MemmapEntry {
>      [VIRT_TEST] =        {   0x100000,        0x1000 },
>      [VIRT_RTC] =         {   0x101000,        0x1000 },
>      [VIRT_CLINT] =       {  0x2000000,       0x10000 },
> -    [VIRT_PLIC] =        {  0xc000000,     0x4000000 },
> +    [VIRT_PLIC] =        {  0xc000000, VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2) },
>      [VIRT_UART0] =       { 0x10000000,         0x100 },
>      [VIRT_VIRTIO] =      { 0x10001000,        0x1000 },
>      [VIRT_FLASH] =       { 0x20000000,     0x4000000 },
> @@ -182,10 +183,17 @@ static void create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
>      uint64_t mem_size, const char *cmdline)
>  {
>      void *fdt;
> -    int cpu, i;
> -    uint32_t *cells;
> -    char *nodename;
> -    uint32_t plic_phandle, test_phandle, phandle = 1;
> +    int i, cpu, socket;
> +    MachineState *mc = MACHINE(s);
> +    uint64_t addr, size;
> +    uint32_t *clint_cells, *plic_cells;
> +    unsigned long clint_addr, plic_addr;
> +    uint32_t plic_phandle[MAX_NODES];
> +    uint32_t cpu_phandle, intc_phandle, test_phandle;
> +    uint32_t phandle = 1, plic_mmio_phandle = 1;
> +    uint32_t plic_pcie_phandle = 1, plic_virtio_phandle = 1;
> +    char *mem_name, *cpu_name, *core_name, *intc_name;
> +    char *name, *clint_name, *plic_name, *clust_name;
>      hwaddr flashsize = virt_memmap[VIRT_FLASH].size / 2;
>      hwaddr flashbase = virt_memmap[VIRT_FLASH].base;
>
> @@ -206,231 +214,238 @@ static void create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
>      qemu_fdt_setprop_cell(fdt, "/soc", "#size-cells", 0x2);
>      qemu_fdt_setprop_cell(fdt, "/soc", "#address-cells", 0x2);
>
> -    nodename = g_strdup_printf("/memory@%lx",
> -        (long)memmap[VIRT_DRAM].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> -        memmap[VIRT_DRAM].base >> 32, memmap[VIRT_DRAM].base,
> -        mem_size >> 32, mem_size);
> -    qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
> -    g_free(nodename);
> -
>      qemu_fdt_add_subnode(fdt, "/cpus");
>      qemu_fdt_setprop_cell(fdt, "/cpus", "timebase-frequency",
>                            SIFIVE_CLINT_TIMEBASE_FREQ);
>      qemu_fdt_setprop_cell(fdt, "/cpus", "#size-cells", 0x0);
>      qemu_fdt_setprop_cell(fdt, "/cpus", "#address-cells", 0x1);
> +    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map");
> +
> +    for (socket = (riscv_socket_count(mc) - 1); socket >= 0; socket--) {
> +        clust_name = g_strdup_printf("/cpus/cpu-map/cluster%d", socket);
> +        qemu_fdt_add_subnode(fdt, clust_name);
> +
> +        plic_cells = g_new0(uint32_t, s->soc[socket].num_harts * 4);
> +        clint_cells = g_new0(uint32_t, s->soc[socket].num_harts * 4);
> +
> +        for (cpu = s->soc[socket].num_harts - 1; cpu >= 0; cpu--) {
> +            cpu_phandle = phandle++;
>
> -    for (cpu = s->soc.num_harts - 1; cpu >= 0; cpu--) {
> -        int cpu_phandle = phandle++;
> -        int intc_phandle;
> -        nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
> -        char *intc = g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
> -        char *isa = riscv_isa_string(&s->soc.harts[cpu]);
> -        qemu_fdt_add_subnode(fdt, nodename);
> +            cpu_name = g_strdup_printf("/cpus/cpu@%d",
> +                s->soc[socket].hartid_base + cpu);
> +            qemu_fdt_add_subnode(fdt, cpu_name);
>  #if defined(TARGET_RISCV32)
> -        qemu_fdt_setprop_string(fdt, nodename, "mmu-type", "riscv,sv32");
> +            qemu_fdt_setprop_string(fdt, cpu_name, "mmu-type", "riscv,sv32");
>  #else
> -        qemu_fdt_setprop_string(fdt, nodename, "mmu-type", "riscv,sv48");
> +            qemu_fdt_setprop_string(fdt, cpu_name, "mmu-type", "riscv,sv48");
>  #endif
> -        qemu_fdt_setprop_string(fdt, nodename, "riscv,isa", isa);
> -        qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv");
> -        qemu_fdt_setprop_string(fdt, nodename, "status", "okay");
> -        qemu_fdt_setprop_cell(fdt, nodename, "reg", cpu);
> -        qemu_fdt_setprop_string(fdt, nodename, "device_type", "cpu");
> -        qemu_fdt_setprop_cell(fdt, nodename, "phandle", cpu_phandle);
> -        intc_phandle = phandle++;
> -        qemu_fdt_add_subnode(fdt, intc);
> -        qemu_fdt_setprop_cell(fdt, intc, "phandle", intc_phandle);
> -        qemu_fdt_setprop_string(fdt, intc, "compatible", "riscv,cpu-intc");
> -        qemu_fdt_setprop(fdt, intc, "interrupt-controller", NULL, 0);
> -        qemu_fdt_setprop_cell(fdt, intc, "#interrupt-cells", 1);
> -        g_free(isa);
> -        g_free(intc);
> -        g_free(nodename);
> -    }
> +            name = riscv_isa_string(&s->soc[socket].harts[cpu]);
> +            qemu_fdt_setprop_string(fdt, cpu_name, "riscv,isa", name);
> +            g_free(name);
> +            qemu_fdt_setprop_string(fdt, cpu_name, "compatible", "riscv");
> +            qemu_fdt_setprop_string(fdt, cpu_name, "status", "okay");
> +            qemu_fdt_setprop_cell(fdt, cpu_name, "reg",
> +                s->soc[socket].hartid_base + cpu);
> +            qemu_fdt_setprop_string(fdt, cpu_name, "device_type", "cpu");
> +            riscv_socket_fdt_write_id(mc, fdt, cpu_name, socket);
> +            qemu_fdt_setprop_cell(fdt, cpu_name, "phandle", cpu_phandle);
> +
> +            intc_name = g_strdup_printf("%s/interrupt-controller", cpu_name);
> +            qemu_fdt_add_subnode(fdt, intc_name);
> +            intc_phandle = phandle++;
> +            qemu_fdt_setprop_cell(fdt, intc_name, "phandle", intc_phandle);
> +            qemu_fdt_setprop_string(fdt, intc_name, "compatible",
> +                "riscv,cpu-intc");
> +            qemu_fdt_setprop(fdt, intc_name, "interrupt-controller", NULL, 0);
> +            qemu_fdt_setprop_cell(fdt, intc_name, "#interrupt-cells", 1);
> +
> +            clint_cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> +            clint_cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_SOFT);
> +            clint_cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> +            clint_cells[cpu * 4 + 3] = cpu_to_be32(IRQ_M_TIMER);
> +
> +            plic_cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> +            plic_cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_EXT);
> +            plic_cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> +            plic_cells[cpu * 4 + 3] = cpu_to_be32(IRQ_S_EXT);
> +
> +            core_name = g_strdup_printf("%s/core%d", clust_name, cpu);
> +            qemu_fdt_add_subnode(fdt, core_name);
> +            qemu_fdt_setprop_cell(fdt, core_name, "cpu", cpu_phandle);
> +
> +            g_free(core_name);
> +            g_free(intc_name);
> +            g_free(cpu_name);
> +        }
>
> -    /* Add cpu-topology node */
> -    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map");
> -    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map/cluster0");
> -    for (cpu = s->soc.num_harts - 1; cpu >= 0; cpu--) {
> -        char *core_nodename = g_strdup_printf("/cpus/cpu-map/cluster0/core%d",
> -                                              cpu);
> -        char *cpu_nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
> -        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, cpu_nodename);
> -        qemu_fdt_add_subnode(fdt, core_nodename);
> -        qemu_fdt_setprop_cell(fdt, core_nodename, "cpu", intc_phandle);
> -        g_free(core_nodename);
> -        g_free(cpu_nodename);
> +        addr = memmap[VIRT_DRAM].base + riscv_socket_mem_offset(mc, socket);
> +        size = riscv_socket_mem_size(mc, socket);
> +        mem_name = g_strdup_printf("/memory@%lx", (long)addr);
> +        qemu_fdt_add_subnode(fdt, mem_name);
> +        qemu_fdt_setprop_cells(fdt, mem_name, "reg",
> +            addr >> 32, addr, size >> 32, size);
> +        qemu_fdt_setprop_string(fdt, mem_name, "device_type", "memory");
> +        riscv_socket_fdt_write_id(mc, fdt, mem_name, socket);
> +        g_free(mem_name);
> +
> +        clint_addr = memmap[VIRT_CLINT].base +
> +            (memmap[VIRT_CLINT].size * socket);
> +        clint_name = g_strdup_printf("/soc/clint@%lx", clint_addr);
> +        qemu_fdt_add_subnode(fdt, clint_name);
> +        qemu_fdt_setprop_string(fdt, clint_name, "compatible", "riscv,clint0");
> +        qemu_fdt_setprop_cells(fdt, clint_name, "reg",
> +            0x0, clint_addr, 0x0, memmap[VIRT_CLINT].size);
> +        qemu_fdt_setprop(fdt, clint_name, "interrupts-extended",
> +            clint_cells, s->soc[socket].num_harts * sizeof(uint32_t) * 4);
> +        riscv_socket_fdt_write_id(mc, fdt, clint_name, socket);
> +        g_free(clint_name);
> +
> +        plic_phandle[socket] = phandle++;
> +        plic_addr = memmap[VIRT_PLIC].base + (memmap[VIRT_PLIC].size * socket);
> +        plic_name = g_strdup_printf("/soc/plic@%lx", plic_addr);
> +        qemu_fdt_add_subnode(fdt, plic_name);
> +        qemu_fdt_setprop_cell(fdt, plic_name,
> +            "#address-cells", FDT_PLIC_ADDR_CELLS);
> +        qemu_fdt_setprop_cell(fdt, plic_name,
> +            "#interrupt-cells", FDT_PLIC_INT_CELLS);
> +        qemu_fdt_setprop_string(fdt, plic_name, "compatible", "riscv,plic0");
> +        qemu_fdt_setprop(fdt, plic_name, "interrupt-controller", NULL, 0);
> +        qemu_fdt_setprop(fdt, plic_name, "interrupts-extended",
> +            plic_cells, s->soc[socket].num_harts * sizeof(uint32_t) * 4);
> +        qemu_fdt_setprop_cells(fdt, plic_name, "reg",
> +            0x0, plic_addr, 0x0, memmap[VIRT_PLIC].size);
> +        qemu_fdt_setprop_cell(fdt, plic_name, "riscv,ndev", VIRTIO_NDEV);
> +        riscv_socket_fdt_write_id(mc, fdt, plic_name, socket);
> +        qemu_fdt_setprop_cell(fdt, plic_name, "phandle", plic_phandle[socket]);
> +        g_free(plic_name);
> +
> +        g_free(clint_cells);
> +        g_free(plic_cells);
> +        g_free(clust_name);
>      }
>
> -    cells =  g_new0(uint32_t, s->soc.num_harts * 4);
> -    for (cpu = 0; cpu < s->soc.num_harts; cpu++) {
> -        nodename =
> -            g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
> -        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, nodename);
> -        cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> -        cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_SOFT);
> -        cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> -        cells[cpu * 4 + 3] = cpu_to_be32(IRQ_M_TIMER);
> -        g_free(nodename);
> -    }
> -    nodename = g_strdup_printf("/soc/clint@%lx",
> -        (long)memmap[VIRT_CLINT].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv,clint0");
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> -        0x0, memmap[VIRT_CLINT].base,
> -        0x0, memmap[VIRT_CLINT].size);
> -    qemu_fdt_setprop(fdt, nodename, "interrupts-extended",
> -        cells, s->soc.num_harts * sizeof(uint32_t) * 4);
> -    g_free(cells);
> -    g_free(nodename);
> -
> -    plic_phandle = phandle++;
> -    cells =  g_new0(uint32_t, s->soc.num_harts * 4);
> -    for (cpu = 0; cpu < s->soc.num_harts; cpu++) {
> -        nodename =
> -            g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
> -        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, nodename);
> -        cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> -        cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_EXT);
> -        cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> -        cells[cpu * 4 + 3] = cpu_to_be32(IRQ_S_EXT);
> -        g_free(nodename);
> +    for (socket = 0; socket < riscv_socket_count(mc); socket++) {
> +        if (socket == 0) {
> +            plic_mmio_phandle = plic_phandle[socket];
> +            plic_virtio_phandle = plic_phandle[socket];
> +            plic_pcie_phandle = plic_phandle[socket];
> +        }
> +        if (socket == 1) {
> +            plic_virtio_phandle = plic_phandle[socket];
> +            plic_pcie_phandle = plic_phandle[socket];
> +        }
> +        if (socket == 2) {
> +            plic_pcie_phandle = plic_phandle[socket];
> +        }
>      }
> -    nodename = g_strdup_printf("/soc/interrupt-controller@%lx",
> -        (long)memmap[VIRT_PLIC].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_cell(fdt, nodename, "#address-cells",
> -                          FDT_PLIC_ADDR_CELLS);
> -    qemu_fdt_setprop_cell(fdt, nodename, "#interrupt-cells",
> -                          FDT_PLIC_INT_CELLS);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv,plic0");
> -    qemu_fdt_setprop(fdt, nodename, "interrupt-controller", NULL, 0);
> -    qemu_fdt_setprop(fdt, nodename, "interrupts-extended",
> -        cells, s->soc.num_harts * sizeof(uint32_t) * 4);
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> -        0x0, memmap[VIRT_PLIC].base,
> -        0x0, memmap[VIRT_PLIC].size);
> -    qemu_fdt_setprop_cell(fdt, nodename, "riscv,ndev", VIRTIO_NDEV);
> -    qemu_fdt_setprop_cell(fdt, nodename, "phandle", plic_phandle);
> -    plic_phandle = qemu_fdt_get_phandle(fdt, nodename);
> -    g_free(cells);
> -    g_free(nodename);
> +
> +    riscv_socket_fdt_write_distance_matrix(mc, fdt);
>
>      for (i = 0; i < VIRTIO_COUNT; i++) {
> -        nodename = g_strdup_printf("/virtio_mmio@%lx",
> +        name = g_strdup_printf("/soc/virtio_mmio@%lx",
>              (long)(memmap[VIRT_VIRTIO].base + i * memmap[VIRT_VIRTIO].size));
> -        qemu_fdt_add_subnode(fdt, nodename);
> -        qemu_fdt_setprop_string(fdt, nodename, "compatible", "virtio,mmio");
> -        qemu_fdt_setprop_cells(fdt, nodename, "reg",
> +        qemu_fdt_add_subnode(fdt, name);
> +        qemu_fdt_setprop_string(fdt, name, "compatible", "virtio,mmio");
> +        qemu_fdt_setprop_cells(fdt, name, "reg",
>              0x0, memmap[VIRT_VIRTIO].base + i * memmap[VIRT_VIRTIO].size,
>              0x0, memmap[VIRT_VIRTIO].size);
> -        qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
> -        qemu_fdt_setprop_cell(fdt, nodename, "interrupts", VIRTIO_IRQ + i);
> -        g_free(nodename);
> +        qemu_fdt_setprop_cell(fdt, name, "interrupt-parent",
> +            plic_virtio_phandle);
> +        qemu_fdt_setprop_cell(fdt, name, "interrupts", VIRTIO_IRQ + i);
> +        g_free(name);
>      }
>
> -    nodename = g_strdup_printf("/soc/pci@%lx",
> +    name = g_strdup_printf("/soc/pci@%lx",
>          (long) memmap[VIRT_PCIE_ECAM].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_cell(fdt, nodename, "#address-cells",
> -                          FDT_PCI_ADDR_CELLS);
> -    qemu_fdt_setprop_cell(fdt, nodename, "#interrupt-cells",
> -                          FDT_PCI_INT_CELLS);
> -    qemu_fdt_setprop_cell(fdt, nodename, "#size-cells", 0x2);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible",
> -                            "pci-host-ecam-generic");
> -    qemu_fdt_setprop_string(fdt, nodename, "device_type", "pci");
> -    qemu_fdt_setprop_cell(fdt, nodename, "linux,pci-domain", 0);
> -    qemu_fdt_setprop_cells(fdt, nodename, "bus-range", 0,
> -                           memmap[VIRT_PCIE_ECAM].size /
> -                               PCIE_MMCFG_SIZE_MIN - 1);
> -    qemu_fdt_setprop(fdt, nodename, "dma-coherent", NULL, 0);
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg", 0, memmap[VIRT_PCIE_ECAM].base,
> -                           0, memmap[VIRT_PCIE_ECAM].size);
> -    qemu_fdt_setprop_sized_cells(fdt, nodename, "ranges",
> +    qemu_fdt_add_subnode(fdt, name);
> +    qemu_fdt_setprop_cell(fdt, name, "#address-cells", FDT_PCI_ADDR_CELLS);
> +    qemu_fdt_setprop_cell(fdt, name, "#interrupt-cells", FDT_PCI_INT_CELLS);
> +    qemu_fdt_setprop_cell(fdt, name, "#size-cells", 0x2);
> +    qemu_fdt_setprop_string(fdt, name, "compatible", "pci-host-ecam-generic");
> +    qemu_fdt_setprop_string(fdt, name, "device_type", "pci");
> +    qemu_fdt_setprop_cell(fdt, name, "linux,pci-domain", 0);
> +    qemu_fdt_setprop_cells(fdt, name, "bus-range", 0,
> +        memmap[VIRT_PCIE_ECAM].size / PCIE_MMCFG_SIZE_MIN - 1);
> +    qemu_fdt_setprop(fdt, name, "dma-coherent", NULL, 0);
> +    qemu_fdt_setprop_cells(fdt, name, "reg", 0,
> +        memmap[VIRT_PCIE_ECAM].base, 0, memmap[VIRT_PCIE_ECAM].size);
> +    qemu_fdt_setprop_sized_cells(fdt, name, "ranges",
>          1, FDT_PCI_RANGE_IOPORT, 2, 0,
>          2, memmap[VIRT_PCIE_PIO].base, 2, memmap[VIRT_PCIE_PIO].size,
>          1, FDT_PCI_RANGE_MMIO,
>          2, memmap[VIRT_PCIE_MMIO].base,
>          2, memmap[VIRT_PCIE_MMIO].base, 2, memmap[VIRT_PCIE_MMIO].size);
> -    create_pcie_irq_map(fdt, nodename, plic_phandle);
> -    g_free(nodename);
> +    create_pcie_irq_map(fdt, name, plic_pcie_phandle);
> +    g_free(name);
>
>      test_phandle = phandle++;
> -    nodename = g_strdup_printf("/test@%lx",
> +    name = g_strdup_printf("/soc/test@%lx",
>          (long)memmap[VIRT_TEST].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> +    qemu_fdt_add_subnode(fdt, name);
>      {
>          const char compat[] = "sifive,test1\0sifive,test0\0syscon";
> -        qemu_fdt_setprop(fdt, nodename, "compatible", compat, sizeof(compat));
> +        qemu_fdt_setprop(fdt, name, "compatible", compat, sizeof(compat));
>      }
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> +    qemu_fdt_setprop_cells(fdt, name, "reg",
>          0x0, memmap[VIRT_TEST].base,
>          0x0, memmap[VIRT_TEST].size);
> -    qemu_fdt_setprop_cell(fdt, nodename, "phandle", test_phandle);
> -    test_phandle = qemu_fdt_get_phandle(fdt, nodename);
> -    g_free(nodename);
> -
> -    nodename = g_strdup_printf("/reboot");
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "syscon-reboot");
> -    qemu_fdt_setprop_cell(fdt, nodename, "regmap", test_phandle);
> -    qemu_fdt_setprop_cell(fdt, nodename, "offset", 0x0);
> -    qemu_fdt_setprop_cell(fdt, nodename, "value", FINISHER_RESET);
> -    g_free(nodename);
> -
> -    nodename = g_strdup_printf("/poweroff");
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "syscon-poweroff");
> -    qemu_fdt_setprop_cell(fdt, nodename, "regmap", test_phandle);
> -    qemu_fdt_setprop_cell(fdt, nodename, "offset", 0x0);
> -    qemu_fdt_setprop_cell(fdt, nodename, "value", FINISHER_PASS);
> -    g_free(nodename);
> -
> -    nodename = g_strdup_printf("/uart@%lx",
> -        (long)memmap[VIRT_UART0].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "ns16550a");
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> +    qemu_fdt_setprop_cell(fdt, name, "phandle", test_phandle);
> +    test_phandle = qemu_fdt_get_phandle(fdt, name);
> +    g_free(name);
> +
> +    name = g_strdup_printf("/soc/reboot");
> +    qemu_fdt_add_subnode(fdt, name);
> +    qemu_fdt_setprop_string(fdt, name, "compatible", "syscon-reboot");
> +    qemu_fdt_setprop_cell(fdt, name, "regmap", test_phandle);
> +    qemu_fdt_setprop_cell(fdt, name, "offset", 0x0);
> +    qemu_fdt_setprop_cell(fdt, name, "value", FINISHER_RESET);
> +    g_free(name);
> +
> +    name = g_strdup_printf("/soc/poweroff");
> +    qemu_fdt_add_subnode(fdt, name);
> +    qemu_fdt_setprop_string(fdt, name, "compatible", "syscon-poweroff");
> +    qemu_fdt_setprop_cell(fdt, name, "regmap", test_phandle);
> +    qemu_fdt_setprop_cell(fdt, name, "offset", 0x0);
> +    qemu_fdt_setprop_cell(fdt, name, "value", FINISHER_PASS);
> +    g_free(name);
> +
> +    name = g_strdup_printf("/soc/uart@%lx", (long)memmap[VIRT_UART0].base);
> +    qemu_fdt_add_subnode(fdt, name);
> +    qemu_fdt_setprop_string(fdt, name, "compatible", "ns16550a");
> +    qemu_fdt_setprop_cells(fdt, name, "reg",
>          0x0, memmap[VIRT_UART0].base,
>          0x0, memmap[VIRT_UART0].size);
> -    qemu_fdt_setprop_cell(fdt, nodename, "clock-frequency", 3686400);
> -    qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
> -    qemu_fdt_setprop_cell(fdt, nodename, "interrupts", UART0_IRQ);
> +    qemu_fdt_setprop_cell(fdt, name, "clock-frequency", 3686400);
> +    qemu_fdt_setprop_cell(fdt, name, "interrupt-parent", plic_mmio_phandle);
> +    qemu_fdt_setprop_cell(fdt, name, "interrupts", UART0_IRQ);
>
>      qemu_fdt_add_subnode(fdt, "/chosen");
> -    qemu_fdt_setprop_string(fdt, "/chosen", "stdout-path", nodename);
> +    qemu_fdt_setprop_string(fdt, "/chosen", "stdout-path", name);
>      if (cmdline) {
>          qemu_fdt_setprop_string(fdt, "/chosen", "bootargs", cmdline);
>      }
> -    g_free(nodename);
> -
> -    nodename = g_strdup_printf("/rtc@%lx",
> -        (long)memmap[VIRT_RTC].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible",
> -        "google,goldfish-rtc");
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> +    g_free(name);
> +
> +    name = g_strdup_printf("/soc/rtc@%lx", (long)memmap[VIRT_RTC].base);
> +    qemu_fdt_add_subnode(fdt, name);
> +    qemu_fdt_setprop_string(fdt, name, "compatible", "google,goldfish-rtc");
> +    qemu_fdt_setprop_cells(fdt, name, "reg",
>          0x0, memmap[VIRT_RTC].base,
>          0x0, memmap[VIRT_RTC].size);
> -    qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
> -    qemu_fdt_setprop_cell(fdt, nodename, "interrupts", RTC_IRQ);
> -    g_free(nodename);
> -
> -    nodename = g_strdup_printf("/flash@%" PRIx64, flashbase);
> -    qemu_fdt_add_subnode(s->fdt, nodename);
> -    qemu_fdt_setprop_string(s->fdt, nodename, "compatible", "cfi-flash");
> -    qemu_fdt_setprop_sized_cells(s->fdt, nodename, "reg",
> +    qemu_fdt_setprop_cell(fdt, name, "interrupt-parent", plic_mmio_phandle);
> +    qemu_fdt_setprop_cell(fdt, name, "interrupts", RTC_IRQ);
> +    g_free(name);
> +
> +    name = g_strdup_printf("/soc/flash@%" PRIx64, flashbase);
> +    qemu_fdt_add_subnode(s->fdt, name);
> +    qemu_fdt_setprop_string(s->fdt, name, "compatible", "cfi-flash");
> +    qemu_fdt_setprop_sized_cells(s->fdt, name, "reg",
>                                   2, flashbase, 2, flashsize,
>                                   2, flashbase + flashsize, 2, flashsize);
> -    qemu_fdt_setprop_cell(s->fdt, nodename, "bank-width", 4);
> -    g_free(nodename);
> +    qemu_fdt_setprop_cell(s->fdt, name, "bank-width", 4);
> +    g_free(name);
>  }
>
> -
>  static inline DeviceState *gpex_pcie_init(MemoryRegion *sys_mem,
>                                            hwaddr ecam_base, hwaddr ecam_size,
>                                            hwaddr mmio_base, hwaddr mmio_size,
> @@ -478,21 +493,100 @@ static void riscv_virt_board_init(MachineState *machine)
>      MemoryRegion *system_memory = get_system_memory();
>      MemoryRegion *main_mem = g_new(MemoryRegion, 1);
>      MemoryRegion *mask_rom = g_new(MemoryRegion, 1);
> -    char *plic_hart_config;
> +    char *plic_hart_config, *soc_name;
>      size_t plic_hart_config_len;
>      target_ulong start_addr = memmap[VIRT_DRAM].base;
> -    int i;
> -    unsigned int smp_cpus = machine->smp.cpus;
> -
> -    /* Initialize SOC */
> -    object_initialize_child(OBJECT(machine), "soc", &s->soc, sizeof(s->soc),
> -                            TYPE_RISCV_HART_ARRAY, &error_abort, NULL);
> -    object_property_set_str(OBJECT(&s->soc), machine->cpu_type, "cpu-type",
> -                            &error_abort);
> -    object_property_set_int(OBJECT(&s->soc), smp_cpus, "num-harts",
> -                            &error_abort);
> -    object_property_set_bool(OBJECT(&s->soc), true, "realized",
> -                            &error_abort);
> +    DeviceState *mmio_plic, *virtio_plic, *pcie_plic;
> +    int i, j, base_hartid, hart_count;
> +
> +    /* Check socket count limit */
> +    if (VIRT_SOCKETS_MAX < riscv_socket_count(machine)) {
> +        error_report("number of sockets/nodes should be less than %d",
> +            VIRT_SOCKETS_MAX);
> +        exit(1);
> +    }
> +
> +    /* Initialize sockets */
> +    mmio_plic = virtio_plic = pcie_plic = NULL;
> +    for (i = 0; i < riscv_socket_count(machine); i++) {
> +        if (!riscv_socket_check_hartids(machine, i)) {
> +            error_report("discontinuous hartids in socket%d", i);
> +            exit(1);
> +        }
> +
> +        base_hartid = riscv_socket_first_hartid(machine, i);
> +        if (base_hartid < 0) {
> +            error_report("can't find hartid base for socket%d", i);
> +            exit(1);
> +        }
> +
> +        hart_count = riscv_socket_hart_count(machine, i);
> +        if (hart_count < 0) {
> +            error_report("can't find hart count for socket%d", i);
> +            exit(1);
> +        }
> +
> +        soc_name = g_strdup_printf("soc%d", i);
> +        object_initialize_child(OBJECT(machine), soc_name, &s->soc[i],
> +            sizeof(s->soc[i]), TYPE_RISCV_HART_ARRAY, &error_abort, NULL);
> +        g_free(soc_name);
> +        object_property_set_str(OBJECT(&s->soc[i]),
> +            machine->cpu_type, "cpu-type", &error_abort);
> +        object_property_set_int(OBJECT(&s->soc[i]),
> +            base_hartid, "hartid-base", &error_abort);
> +        object_property_set_int(OBJECT(&s->soc[i]),
> +            hart_count, "num-harts", &error_abort);
> +        object_property_set_bool(OBJECT(&s->soc[i]),
> +            true, "realized", &error_abort);
> +
> +        /* Per-socket CLINT */
> +        sifive_clint_create(
> +            memmap[VIRT_CLINT].base + i * memmap[VIRT_CLINT].size,
> +            memmap[VIRT_CLINT].size, base_hartid, hart_count,
> +            SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE, true);
> +
> +        /* Per-socket PLIC hart topology configuration string */
> +        plic_hart_config_len =
> +            (strlen(VIRT_PLIC_HART_CONFIG) + 1) * hart_count;
> +        plic_hart_config = g_malloc0(plic_hart_config_len);
> +        for (j = 0; j < hart_count; j++) {
> +            if (j != 0) {
> +                strncat(plic_hart_config, ",", plic_hart_config_len);
> +            }
> +            strncat(plic_hart_config, VIRT_PLIC_HART_CONFIG,
> +                plic_hart_config_len);
> +            plic_hart_config_len -= (strlen(VIRT_PLIC_HART_CONFIG) + 1);
> +        }
> +
> +        /* Per-socket PLIC */
> +        s->plic[i] = sifive_plic_create(
> +            memmap[VIRT_PLIC].base + i * memmap[VIRT_PLIC].size,
> +            plic_hart_config, base_hartid,
> +            VIRT_PLIC_NUM_SOURCES,
> +            VIRT_PLIC_NUM_PRIORITIES,
> +            VIRT_PLIC_PRIORITY_BASE,
> +            VIRT_PLIC_PENDING_BASE,
> +            VIRT_PLIC_ENABLE_BASE,
> +            VIRT_PLIC_ENABLE_STRIDE,
> +            VIRT_PLIC_CONTEXT_BASE,
> +            VIRT_PLIC_CONTEXT_STRIDE,
> +            memmap[VIRT_PLIC].size);
> +        g_free(plic_hart_config);
> +
> +        /* Try to use different PLIC instance based device type */
> +        if (i == 0) {
> +            mmio_plic = s->plic[i];
> +            virtio_plic = s->plic[i];
> +            pcie_plic = s->plic[i];
> +        }
> +        if (i == 1) {
> +            virtio_plic = s->plic[i];
> +            pcie_plic = s->plic[i];
> +        }
> +        if (i == 2) {
> +            pcie_plic = s->plic[i];
> +        }
> +    }
>
>      /* register system main memory (actual RAM) */
>      memory_region_init_ram(main_mem, NULL, "riscv_virt_board.ram",
> @@ -571,38 +665,14 @@ static void riscv_virt_board_init(MachineState *machine)
>                            memmap[VIRT_MROM].base + sizeof(reset_vec),
>                            &address_space_memory);
>
> -    /* create PLIC hart topology configuration string */
> -    plic_hart_config_len = (strlen(VIRT_PLIC_HART_CONFIG) + 1) * smp_cpus;
> -    plic_hart_config = g_malloc0(plic_hart_config_len);
> -    for (i = 0; i < smp_cpus; i++) {
> -        if (i != 0) {
> -            strncat(plic_hart_config, ",", plic_hart_config_len);
> -        }
> -        strncat(plic_hart_config, VIRT_PLIC_HART_CONFIG, plic_hart_config_len);
> -        plic_hart_config_len -= (strlen(VIRT_PLIC_HART_CONFIG) + 1);
> -    }
> -
> -    /* MMIO */
> -    s->plic = sifive_plic_create(memmap[VIRT_PLIC].base,
> -        plic_hart_config, 0,
> -        VIRT_PLIC_NUM_SOURCES,
> -        VIRT_PLIC_NUM_PRIORITIES,
> -        VIRT_PLIC_PRIORITY_BASE,
> -        VIRT_PLIC_PENDING_BASE,
> -        VIRT_PLIC_ENABLE_BASE,
> -        VIRT_PLIC_ENABLE_STRIDE,
> -        VIRT_PLIC_CONTEXT_BASE,
> -        VIRT_PLIC_CONTEXT_STRIDE,
> -        memmap[VIRT_PLIC].size);
> -    sifive_clint_create(memmap[VIRT_CLINT].base,
> -        memmap[VIRT_CLINT].size, 0, smp_cpus,
> -        SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE, true);
> +    /* SiFive Test MMIO device */
>      sifive_test_create(memmap[VIRT_TEST].base);
>
> +    /* VirtIO MMIO devices */
>      for (i = 0; i < VIRTIO_COUNT; i++) {
>          sysbus_create_simple("virtio-mmio",
>              memmap[VIRT_VIRTIO].base + i * memmap[VIRT_VIRTIO].size,
> -            qdev_get_gpio_in(DEVICE(s->plic), VIRTIO_IRQ + i));
> +            qdev_get_gpio_in(DEVICE(virtio_plic), VIRTIO_IRQ + i));
>      }
>
>      gpex_pcie_init(system_memory,
> @@ -611,14 +681,14 @@ static void riscv_virt_board_init(MachineState *machine)
>                           memmap[VIRT_PCIE_MMIO].base,
>                           memmap[VIRT_PCIE_MMIO].size,
>                           memmap[VIRT_PCIE_PIO].base,
> -                         DEVICE(s->plic), true);
> +                         DEVICE(pcie_plic), true);
>
>      serial_mm_init(system_memory, memmap[VIRT_UART0].base,
> -        0, qdev_get_gpio_in(DEVICE(s->plic), UART0_IRQ), 399193,
> +        0, qdev_get_gpio_in(DEVICE(mmio_plic), UART0_IRQ), 399193,
>          serial_hd(0), DEVICE_LITTLE_ENDIAN);
>
>      sysbus_create_simple("goldfish_rtc", memmap[VIRT_RTC].base,
> -        qdev_get_gpio_in(DEVICE(s->plic), RTC_IRQ));
> +        qdev_get_gpio_in(DEVICE(mmio_plic), RTC_IRQ));
>
>      virt_flash_create(s);
>
> @@ -628,8 +698,6 @@ static void riscv_virt_board_init(MachineState *machine)
>                                    drive_get(IF_PFLASH, 0, i));
>      }
>      virt_flash_map(s, system_memory);
> -
> -    g_free(plic_hart_config);
>  }
>
>  static void riscv_virt_machine_instance_init(Object *obj)
> @@ -642,9 +710,13 @@ static void riscv_virt_machine_class_init(ObjectClass *oc, void *data)
>
>      mc->desc = "RISC-V VirtIO board";
>      mc->init = riscv_virt_board_init;
> -    mc->max_cpus = 8;
> +    mc->max_cpus = VIRT_CPUS_MAX;
>      mc->default_cpu_type = VIRT_CPU;
>      mc->pci_allow_0_address = true;
> +    mc->possible_cpu_arch_ids = riscv_numa_possible_cpu_arch_ids;
> +    mc->cpu_index_to_instance_props = riscv_numa_cpu_index_to_props;
> +    mc->get_default_cpu_node_id = riscv_numa_get_default_cpu_node_id;
> +    mc->numa_mem_supported = true;
>  }
>
>  static const TypeInfo riscv_virt_machine_typeinfo = {
> diff --git a/include/hw/riscv/virt.h b/include/hw/riscv/virt.h
> index e69355efaf..1beacd7666 100644
> --- a/include/hw/riscv/virt.h
> +++ b/include/hw/riscv/virt.h
> @@ -23,6 +23,9 @@
>  #include "hw/sysbus.h"
>  #include "hw/block/flash.h"
>
> +#define VIRT_CPUS_MAX 8
> +#define VIRT_SOCKETS_MAX 8
> +
>  #define TYPE_RISCV_VIRT_MACHINE MACHINE_TYPE_NAME("virt")
>  #define RISCV_VIRT_MACHINE(obj) \
>      OBJECT_CHECK(RISCVVirtState, (obj), TYPE_RISCV_VIRT_MACHINE)
> @@ -32,8 +35,8 @@ typedef struct {
>      MachineState parent;
>
>      /*< public >*/
> -    RISCVHartArrayState soc;
> -    DeviceState *plic;
> +    RISCVHartArrayState soc[VIRT_SOCKETS_MAX];
> +    DeviceState *plic[VIRT_SOCKETS_MAX];
>      PFlashCFI01 *flash[2];
>
>      void *fdt;
> @@ -74,6 +77,8 @@ enum {
>  #define VIRT_PLIC_ENABLE_STRIDE 0x80
>  #define VIRT_PLIC_CONTEXT_BASE 0x200000
>  #define VIRT_PLIC_CONTEXT_STRIDE 0x1000
> +#define VIRT_PLIC_SIZE(__num_context) \
> +    (VIRT_PLIC_CONTEXT_BASE + (__num_context) * VIRT_PLIC_CONTEXT_STRIDE)
>
>  #define FDT_PCI_ADDR_CELLS    3
>  #define FDT_PCI_INT_CELLS     1
> --
> 2.25.1
>
>

LGTM.

Reviewed-by: Atish Patra <atish.patra@wdc.com>

-- 
Regards,
Atish
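In the quoted create_fdt(), each per-socket CLINT and PLIC node lists its target harts through interrupts-extended. Every hart contributes two (phandle, irq) pairs that point at its riscv,cpu-intc node: (IRQ_M_SOFT, IRQ_M_TIMER) for a CLINT and (IRQ_M_EXT, IRQ_S_EXT) for a PLIC. The sketch below shows that layout. The helper name is hypothetical, and the IRQ numbers are the standard RISC-V interrupt cause numbers. Values stay in host byte order here, whereas the patch applies cpu_to_be32() to each cell.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Standard RISC-V interrupt cause numbers. */
#define IRQ_M_SOFT   3
#define IRQ_M_TIMER  7
#define IRQ_S_EXT    9
#define IRQ_M_EXT   11

/*
 * Hypothetical helper: build the "interrupts-extended" cells for one
 * socket. Hart i contributes (intc_phandle[i], irq_a) and
 * (intc_phandle[i], irq_b). Cells stay in host byte order here; the
 * patch converts each one with cpu_to_be32() before handing the array
 * to qemu_fdt_setprop(). Returns NULL on allocation failure; the
 * caller frees the array.
 */
static uint32_t *build_irq_cells(const uint32_t *intc_phandle, int num_harts,
                                 uint32_t irq_a, uint32_t irq_b)
{
    uint32_t *cells;

    assert(intc_phandle != NULL && num_harts > 0);
    cells = calloc((size_t)num_harts * 4, sizeof(*cells));
    if (cells == NULL) {
        return NULL;
    }
    for (int cpu = 0; cpu < num_harts; cpu++) {
        cells[cpu * 4 + 0] = intc_phandle[cpu];
        cells[cpu * 4 + 1] = irq_a;
        cells[cpu * 4 + 2] = intc_phandle[cpu];
        cells[cpu * 4 + 3] = irq_b;
    }
    return cells;
}
```

A CLINT would call it with (IRQ_M_SOFT, IRQ_M_TIMER) and a PLIC with (IRQ_M_EXT, IRQ_S_EXT). The resulting property is num_harts * 4 * sizeof(uint32_t) bytes long, which is the length the patch passes to qemu_fdt_setprop().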



* Re: [PATCH v5 5/5] hw/riscv: virt: Allow creating multiple NUMA sockets
@ 2020-06-13  5:21     ` Atish Patra
From: Atish Patra @ 2020-06-13  5:21 UTC
  To: Anup Patel
  Cc: Peter Maydell, Palmer Dabbelt, Alistair Francis,
	Sagar Karandikar, Atish Patra, qemu-riscv, qemu-devel,
	Anup Patel

On Fri, May 29, 2020 at 4:50 AM Anup Patel <anup.patel@wdc.com> wrote:
>
> We extend the RISC-V virt machine to allow creating a multi-socket
> machine. Each socket of the RISC-V virt machine is a NUMA node with
> its own set of HARTs, memory instance, CLINT instance, and PLIC
> instance. All other devices are shared between the sockets. We also
> update the generated device tree accordingly.
>
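The patch places socket N's CLINT at the CLINT base plus N times the CLINT size. It places socket N's PLIC at the PLIC base plus N times VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2), which leaves room for two contexts per hart. The C sketch below reproduces that arithmetic using the constants from the patch's memmap and virt.h. The macro names VIRT_CLINT_BASE, VIRT_CLINT_SIZE, VIRT_PLIC_BASE and VIRT_PLIC_SZ, and the two helper functions, are hypothetical stand-ins for the memmap[] lookups in hw/riscv/virt.c.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from the patch (virt.h and the virt memmap). */
#define VIRT_CPUS_MAX            8
#define VIRT_SOCKETS_MAX         8
#define VIRT_PLIC_CONTEXT_BASE   0x200000
#define VIRT_PLIC_CONTEXT_STRIDE 0x1000
#define VIRT_PLIC_SIZE(__num_context) \
    (VIRT_PLIC_CONTEXT_BASE + (__num_context) * VIRT_PLIC_CONTEXT_STRIDE)

/* Hypothetical names for memmap[VIRT_CLINT] and memmap[VIRT_PLIC]. */
#define VIRT_CLINT_BASE 0x2000000ULL
#define VIRT_CLINT_SIZE 0x10000ULL
#define VIRT_PLIC_BASE  0xc000000ULL
/* Two PLIC contexts (M-mode and S-mode) for each of up to VIRT_CPUS_MAX harts */
#define VIRT_PLIC_SZ    ((uint64_t)VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2))

/* Base address of the CLINT owned by the given socket. */
static uint64_t socket_clint_base(int socket)
{
    assert(socket >= 0 && socket < VIRT_SOCKETS_MAX);
    return VIRT_CLINT_BASE + (uint64_t)socket * VIRT_CLINT_SIZE;
}

/* Base address of the PLIC owned by the given socket. */
static uint64_t socket_plic_base(int socket)
{
    assert(socket >= 0 && socket < VIRT_SOCKETS_MAX);
    return VIRT_PLIC_BASE + (uint64_t)socket * VIRT_PLIC_SZ;
}
```

With these values each PLIC window is 0x210000 bytes. Eight sockets therefore occupy 0xc000000 to 0xd080000, which stays below the UART at 0x10000000, and the eight CLINTs end at 0x2080000.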
> By default, NUMA multi-socket support is disabled for the RISC-V
> virt machine. To enable it, use QEMU's "-numa" command-line
> options.
>
> Example1: For two NUMA nodes with 2 CPUs each, append the following
> to the command-line options: "-smp 4 -numa node -numa node"
>
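Example1 relies on the default CPU-to-node assignment, which the patch hooks up through mc->get_default_cpu_node_id. The sketch below shows one such policy. It assumes CPUs are handed out in contiguous blocks of smp.cpus / nodes, with any remainder going to the last node. Contiguity matters because the board rejects any socket with discontinuous hartids. The function name and the guard for more nodes than CPUs are illustrative assumptions, and the sketch is not necessarily what riscv_numa_get_default_cpu_node_id() does.

```c
#include <assert.h>

/*
 * Hypothetical default mapping: CPUs go to nodes in contiguous blocks
 * of (cpus / nodes); CPUs past the last full block stay on the last
 * node. With no NUMA nodes configured, everything is node 0.
 */
static int default_cpu_node_id(int cpu_index, int cpus, int nodes)
{
    int per_node, node;

    assert(cpu_index >= 0 && cpu_index < cpus);
    if (nodes <= 0) {
        return 0;               /* NUMA not configured: single socket */
    }
    per_node = cpus / nodes;
    if (per_node == 0) {
        per_node = 1;           /* more nodes than CPUs */
    }
    node = cpu_index / per_node;
    return node < nodes ? node : nodes - 1;
}
```

With -smp 4 and two nodes, this yields nodes 0, 0, 1, 1, matching Example1. With -smp 5, the fifth CPU also lands on node 1.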
> Example2: For two NUMA nodes with 1 and 3 CPUs respectively, append
> the following to the command-line options:
> "-smp 4 -numa node -numa node -numa cpu,node-id=0,core-id=0 \
> -numa cpu,node-id=1,core-id=1 -numa cpu,node-id=1,core-id=2 \
> -numa cpu,node-id=1,core-id=3"
>
> The maximum number of sockets in a RISC-V virt machine is 8,
> but this limit may be raised in the future.
>
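The non-socket devices are shared, but the patch spreads their interrupt lines across PLICs when more sockets exist. UART0 and the RTC always use socket 0's PLIC, the VirtIO MMIO transports move to socket 1's PLIC, and PCIe moves to socket 2's PLIC. The same rule appears twice, once in riscv_virt_board_init() and once in create_fdt(). The sketch below captures it in a single function; the struct and function names are hypothetical.

```c
#include <assert.h>

/* Socket whose PLIC receives each class of shared device's interrupts. */
struct plic_choice {
    int mmio;   /* UART0 and goldfish RTC */
    int virtio; /* VirtIO MMIO transports */
    int pcie;   /* GPEX PCIe host bridge */
};

/*
 * Same rule as the i == 0 / 1 / 2 checks in riscv_virt_board_init()
 * and the socket == 0 / 1 / 2 checks in create_fdt().
 */
static struct plic_choice choose_plics(int sockets)
{
    struct plic_choice c = { 0, 0, 0 };

    assert(sockets >= 1);
    if (sockets > 1) {
        c.virtio = 1;
        c.pcie = 1;
    }
    if (sockets > 2) {
        c.pcie = 2;
    }
    return c;
}
```

Keeping the rule in one helper would let the machine init code and the device tree generator agree by construction. For example, with two sockets it returns mmio 0, virtio 1, pcie 1.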
> Signed-off-by: Anup Patel <anup.patel@wdc.com>
> ---
>  hw/riscv/virt.c         | 530 +++++++++++++++++++++++-----------------
>  include/hw/riscv/virt.h |   9 +-
>  2 files changed, 308 insertions(+), 231 deletions(-)
>
> diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
> index 421815081d..2863b42cea 100644
> --- a/hw/riscv/virt.c
> +++ b/hw/riscv/virt.c
> @@ -35,6 +35,7 @@
>  #include "hw/riscv/sifive_test.h"
>  #include "hw/riscv/virt.h"
>  #include "hw/riscv/boot.h"
> +#include "hw/riscv/numa.h"
>  #include "chardev/char.h"
>  #include "sysemu/arch_init.h"
>  #include "sysemu/device_tree.h"
> @@ -60,7 +61,7 @@ static const struct MemmapEntry {
>      [VIRT_TEST] =        {   0x100000,        0x1000 },
>      [VIRT_RTC] =         {   0x101000,        0x1000 },
>      [VIRT_CLINT] =       {  0x2000000,       0x10000 },
> -    [VIRT_PLIC] =        {  0xc000000,     0x4000000 },
> +    [VIRT_PLIC] =        {  0xc000000, VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2) },
>      [VIRT_UART0] =       { 0x10000000,         0x100 },
>      [VIRT_VIRTIO] =      { 0x10001000,        0x1000 },
>      [VIRT_FLASH] =       { 0x20000000,     0x4000000 },
> @@ -182,10 +183,17 @@ static void create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
>      uint64_t mem_size, const char *cmdline)
>  {
>      void *fdt;
> -    int cpu, i;
> -    uint32_t *cells;
> -    char *nodename;
> -    uint32_t plic_phandle, test_phandle, phandle = 1;
> +    int i, cpu, socket;
> +    MachineState *mc = MACHINE(s);
> +    uint64_t addr, size;
> +    uint32_t *clint_cells, *plic_cells;
> +    unsigned long clint_addr, plic_addr;
> +    uint32_t plic_phandle[MAX_NODES];
> +    uint32_t cpu_phandle, intc_phandle, test_phandle;
> +    uint32_t phandle = 1, plic_mmio_phandle = 1;
> +    uint32_t plic_pcie_phandle = 1, plic_virtio_phandle = 1;
> +    char *mem_name, *cpu_name, *core_name, *intc_name;
> +    char *name, *clint_name, *plic_name, *clust_name;
>      hwaddr flashsize = virt_memmap[VIRT_FLASH].size / 2;
>      hwaddr flashbase = virt_memmap[VIRT_FLASH].base;
>
> @@ -206,231 +214,238 @@ static void create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
>      qemu_fdt_setprop_cell(fdt, "/soc", "#size-cells", 0x2);
>      qemu_fdt_setprop_cell(fdt, "/soc", "#address-cells", 0x2);
>
> -    nodename = g_strdup_printf("/memory@%lx",
> -        (long)memmap[VIRT_DRAM].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> -        memmap[VIRT_DRAM].base >> 32, memmap[VIRT_DRAM].base,
> -        mem_size >> 32, mem_size);
> -    qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
> -    g_free(nodename);
> -
>      qemu_fdt_add_subnode(fdt, "/cpus");
>      qemu_fdt_setprop_cell(fdt, "/cpus", "timebase-frequency",
>                            SIFIVE_CLINT_TIMEBASE_FREQ);
>      qemu_fdt_setprop_cell(fdt, "/cpus", "#size-cells", 0x0);
>      qemu_fdt_setprop_cell(fdt, "/cpus", "#address-cells", 0x1);
> +    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map");
> +
> +    for (socket = (riscv_socket_count(mc) - 1); socket >= 0; socket--) {
> +        clust_name = g_strdup_printf("/cpus/cpu-map/cluster%d", socket);
> +        qemu_fdt_add_subnode(fdt, clust_name);
> +
> +        plic_cells = g_new0(uint32_t, s->soc[socket].num_harts * 4);
> +        clint_cells = g_new0(uint32_t, s->soc[socket].num_harts * 4);
> +
> +        for (cpu = s->soc[socket].num_harts - 1; cpu >= 0; cpu--) {
> +            cpu_phandle = phandle++;
>
> -    for (cpu = s->soc.num_harts - 1; cpu >= 0; cpu--) {
> -        int cpu_phandle = phandle++;
> -        int intc_phandle;
> -        nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
> -        char *intc = g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
> -        char *isa = riscv_isa_string(&s->soc.harts[cpu]);
> -        qemu_fdt_add_subnode(fdt, nodename);
> +            cpu_name = g_strdup_printf("/cpus/cpu@%d",
> +                s->soc[socket].hartid_base + cpu);
> +            qemu_fdt_add_subnode(fdt, cpu_name);
>  #if defined(TARGET_RISCV32)
> -        qemu_fdt_setprop_string(fdt, nodename, "mmu-type", "riscv,sv32");
> +            qemu_fdt_setprop_string(fdt, cpu_name, "mmu-type", "riscv,sv32");
>  #else
> -        qemu_fdt_setprop_string(fdt, nodename, "mmu-type", "riscv,sv48");
> +            qemu_fdt_setprop_string(fdt, cpu_name, "mmu-type", "riscv,sv48");
>  #endif
> -        qemu_fdt_setprop_string(fdt, nodename, "riscv,isa", isa);
> -        qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv");
> -        qemu_fdt_setprop_string(fdt, nodename, "status", "okay");
> -        qemu_fdt_setprop_cell(fdt, nodename, "reg", cpu);
> -        qemu_fdt_setprop_string(fdt, nodename, "device_type", "cpu");
> -        qemu_fdt_setprop_cell(fdt, nodename, "phandle", cpu_phandle);
> -        intc_phandle = phandle++;
> -        qemu_fdt_add_subnode(fdt, intc);
> -        qemu_fdt_setprop_cell(fdt, intc, "phandle", intc_phandle);
> -        qemu_fdt_setprop_string(fdt, intc, "compatible", "riscv,cpu-intc");
> -        qemu_fdt_setprop(fdt, intc, "interrupt-controller", NULL, 0);
> -        qemu_fdt_setprop_cell(fdt, intc, "#interrupt-cells", 1);
> -        g_free(isa);
> -        g_free(intc);
> -        g_free(nodename);
> -    }
> +            name = riscv_isa_string(&s->soc[socket].harts[cpu]);
> +            qemu_fdt_setprop_string(fdt, cpu_name, "riscv,isa", name);
> +            g_free(name);
> +            qemu_fdt_setprop_string(fdt, cpu_name, "compatible", "riscv");
> +            qemu_fdt_setprop_string(fdt, cpu_name, "status", "okay");
> +            qemu_fdt_setprop_cell(fdt, cpu_name, "reg",
> +                s->soc[socket].hartid_base + cpu);
> +            qemu_fdt_setprop_string(fdt, cpu_name, "device_type", "cpu");
> +            riscv_socket_fdt_write_id(mc, fdt, cpu_name, socket);
> +            qemu_fdt_setprop_cell(fdt, cpu_name, "phandle", cpu_phandle);
> +
> +            intc_name = g_strdup_printf("%s/interrupt-controller", cpu_name);
> +            qemu_fdt_add_subnode(fdt, intc_name);
> +            intc_phandle = phandle++;
> +            qemu_fdt_setprop_cell(fdt, intc_name, "phandle", intc_phandle);
> +            qemu_fdt_setprop_string(fdt, intc_name, "compatible",
> +                "riscv,cpu-intc");
> +            qemu_fdt_setprop(fdt, intc_name, "interrupt-controller", NULL, 0);
> +            qemu_fdt_setprop_cell(fdt, intc_name, "#interrupt-cells", 1);
> +
> +            clint_cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> +            clint_cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_SOFT);
> +            clint_cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> +            clint_cells[cpu * 4 + 3] = cpu_to_be32(IRQ_M_TIMER);
> +
> +            plic_cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> +            plic_cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_EXT);
> +            plic_cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> +            plic_cells[cpu * 4 + 3] = cpu_to_be32(IRQ_S_EXT);
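Each hart contributes four cells to its socket's "interrupts-extended" property: two <phandle cause> pairs, both pointing at that hart's "riscv,cpu-intc" node. For the CLINT the causes are M-mode software (3) and M-mode timer (7). For the PLIC they are M-mode external (11) and S-mode external (9). The following is a minimal standalone sketch of the same packing. It assumes htonl() from <arpa/inet.h> as a stand-in for QEMU's cpu_to_be32(), local copies of the RISC-V interrupt cause numbers, and a hypothetical helper name, build_irq_cells():

```c
#include <arpa/inet.h> /* htonl(), ntohl(): stand-ins for cpu_to_be32() */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* RISC-V interrupt cause numbers (mip/mie bit positions). */
enum {
    IRQ_M_SOFT  = 3,
    IRQ_M_TIMER = 7,
    IRQ_S_EXT   = 9,
    IRQ_M_EXT   = 11,
};

/*
 * Build a big-endian "interrupts-extended" value with two <phandle cause>
 * pairs (4 cells) per hart, indexed by the hart's position in the socket.
 * The caller frees the returned buffer.
 */
static uint32_t *build_irq_cells(const uint32_t *intc_phandles, int num_harts,
                                 uint32_t cause_a, uint32_t cause_b)
{
    uint32_t *cells;

    assert(num_harts > 0);
    cells = calloc((size_t)num_harts * 4, sizeof(uint32_t));
    assert(cells != NULL);
    for (int cpu = 0; cpu < num_harts; cpu++) {
        cells[cpu * 4 + 0] = htonl(intc_phandles[cpu]);
        cells[cpu * 4 + 1] = htonl(cause_a);
        cells[cpu * 4 + 2] = htonl(intc_phandles[cpu]);
        cells[cpu * 4 + 3] = htonl(cause_b);
    }
    return cells;
}
```

The property length passed to qemu_fdt_setprop() is then num_harts * 4 * sizeof(uint32_t), matching the size used for both the CLINT and PLIC nodes in this hunk.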
> +
> +            core_name = g_strdup_printf("%s/core%d", clust_name, cpu);
> +            qemu_fdt_add_subnode(fdt, core_name);
> +            qemu_fdt_setprop_cell(fdt, core_name, "cpu", cpu_phandle);
> +
> +            g_free(core_name);
> +            g_free(intc_name);
> +            g_free(cpu_name);
> +        }
>
> -    /* Add cpu-topology node */
> -    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map");
> -    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map/cluster0");
> -    for (cpu = s->soc.num_harts - 1; cpu >= 0; cpu--) {
> -        char *core_nodename = g_strdup_printf("/cpus/cpu-map/cluster0/core%d",
> -                                              cpu);
> -        char *cpu_nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
> -        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, cpu_nodename);
> -        qemu_fdt_add_subnode(fdt, core_nodename);
> -        qemu_fdt_setprop_cell(fdt, core_nodename, "cpu", intc_phandle);
> -        g_free(core_nodename);
> -        g_free(cpu_nodename);
> +        addr = memmap[VIRT_DRAM].base + riscv_socket_mem_offset(mc, socket);
> +        size = riscv_socket_mem_size(mc, socket);
> +        mem_name = g_strdup_printf("/memory@%lx", (long)addr);
> +        qemu_fdt_add_subnode(fdt, mem_name);
> +        qemu_fdt_setprop_cells(fdt, mem_name, "reg",
> +            addr >> 32, addr, size >> 32, size);
> +        qemu_fdt_setprop_string(fdt, mem_name, "device_type", "memory");
> +        riscv_socket_fdt_write_id(mc, fdt, mem_name, socket);
> +        g_free(mem_name);
> +
> +        clint_addr = memmap[VIRT_CLINT].base +
> +            (memmap[VIRT_CLINT].size * socket);
> +        clint_name = g_strdup_printf("/soc/clint@%lx", clint_addr);
> +        qemu_fdt_add_subnode(fdt, clint_name);
> +        qemu_fdt_setprop_string(fdt, clint_name, "compatible", "riscv,clint0");
> +        qemu_fdt_setprop_cells(fdt, clint_name, "reg",
> +            0x0, clint_addr, 0x0, memmap[VIRT_CLINT].size);
> +        qemu_fdt_setprop(fdt, clint_name, "interrupts-extended",
> +            clint_cells, s->soc[socket].num_harts * sizeof(uint32_t) * 4);
> +        riscv_socket_fdt_write_id(mc, fdt, clint_name, socket);
> +        g_free(clint_name);
> +
> +        plic_phandle[socket] = phandle++;
> +        plic_addr = memmap[VIRT_PLIC].base + (memmap[VIRT_PLIC].size * socket);
> +        plic_name = g_strdup_printf("/soc/plic@%lx", plic_addr);
> +        qemu_fdt_add_subnode(fdt, plic_name);
> +        qemu_fdt_setprop_cell(fdt, plic_name,
> +            "#address-cells", FDT_PLIC_ADDR_CELLS);
> +        qemu_fdt_setprop_cell(fdt, plic_name,
> +            "#interrupt-cells", FDT_PLIC_INT_CELLS);
> +        qemu_fdt_setprop_string(fdt, plic_name, "compatible", "riscv,plic0");
> +        qemu_fdt_setprop(fdt, plic_name, "interrupt-controller", NULL, 0);
> +        qemu_fdt_setprop(fdt, plic_name, "interrupts-extended",
> +            plic_cells, s->soc[socket].num_harts * sizeof(uint32_t) * 4);
> +        qemu_fdt_setprop_cells(fdt, plic_name, "reg",
> +            0x0, plic_addr, 0x0, memmap[VIRT_PLIC].size);
> +        qemu_fdt_setprop_cell(fdt, plic_name, "riscv,ndev", VIRTIO_NDEV);
> +        riscv_socket_fdt_write_id(mc, fdt, plic_name, socket);
> +        qemu_fdt_setprop_cell(fdt, plic_name, "phandle", plic_phandle[socket]);
> +        g_free(plic_name);
> +
> +        g_free(clint_cells);
> +        g_free(plic_cells);
> +        g_free(clust_name);
>      }
>
> -    cells =  g_new0(uint32_t, s->soc.num_harts * 4);
> -    for (cpu = 0; cpu < s->soc.num_harts; cpu++) {
> -        nodename =
> -            g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
> -        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, nodename);
> -        cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> -        cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_SOFT);
> -        cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> -        cells[cpu * 4 + 3] = cpu_to_be32(IRQ_M_TIMER);
> -        g_free(nodename);
> -    }
> -    nodename = g_strdup_printf("/soc/clint@%lx",
> -        (long)memmap[VIRT_CLINT].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv,clint0");
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> -        0x0, memmap[VIRT_CLINT].base,
> -        0x0, memmap[VIRT_CLINT].size);
> -    qemu_fdt_setprop(fdt, nodename, "interrupts-extended",
> -        cells, s->soc.num_harts * sizeof(uint32_t) * 4);
> -    g_free(cells);
> -    g_free(nodename);
> -
> -    plic_phandle = phandle++;
> -    cells =  g_new0(uint32_t, s->soc.num_harts * 4);
> -    for (cpu = 0; cpu < s->soc.num_harts; cpu++) {
> -        nodename =
> -            g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
> -        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, nodename);
> -        cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> -        cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_EXT);
> -        cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> -        cells[cpu * 4 + 3] = cpu_to_be32(IRQ_S_EXT);
> -        g_free(nodename);
> +    for (socket = 0; socket < riscv_socket_count(mc); socket++) {
> +        if (socket == 0) {
> +            plic_mmio_phandle = plic_phandle[socket];
> +            plic_virtio_phandle = plic_phandle[socket];
> +            plic_pcie_phandle = plic_phandle[socket];
> +        }
> +        if (socket == 1) {
> +            plic_virtio_phandle = plic_phandle[socket];
> +            plic_pcie_phandle = plic_phandle[socket];
> +        }
> +        if (socket == 2) {
> +            plic_pcie_phandle = plic_phandle[socket];
> +        }
>      }
> -    nodename = g_strdup_printf("/soc/interrupt-controller@%lx",
> -        (long)memmap[VIRT_PLIC].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_cell(fdt, nodename, "#address-cells",
> -                          FDT_PLIC_ADDR_CELLS);
> -    qemu_fdt_setprop_cell(fdt, nodename, "#interrupt-cells",
> -                          FDT_PLIC_INT_CELLS);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv,plic0");
> -    qemu_fdt_setprop(fdt, nodename, "interrupt-controller", NULL, 0);
> -    qemu_fdt_setprop(fdt, nodename, "interrupts-extended",
> -        cells, s->soc.num_harts * sizeof(uint32_t) * 4);
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> -        0x0, memmap[VIRT_PLIC].base,
> -        0x0, memmap[VIRT_PLIC].size);
> -    qemu_fdt_setprop_cell(fdt, nodename, "riscv,ndev", VIRTIO_NDEV);
> -    qemu_fdt_setprop_cell(fdt, nodename, "phandle", plic_phandle);
> -    plic_phandle = qemu_fdt_get_phandle(fdt, nodename);
> -    g_free(cells);
> -    g_free(nodename);
> +
> +    riscv_socket_fdt_write_distance_matrix(mc, fdt);
>
>      for (i = 0; i < VIRTIO_COUNT; i++) {
> -        nodename = g_strdup_printf("/virtio_mmio@%lx",
> +        name = g_strdup_printf("/soc/virtio_mmio@%lx",
>              (long)(memmap[VIRT_VIRTIO].base + i * memmap[VIRT_VIRTIO].size));
> -        qemu_fdt_add_subnode(fdt, nodename);
> -        qemu_fdt_setprop_string(fdt, nodename, "compatible", "virtio,mmio");
> -        qemu_fdt_setprop_cells(fdt, nodename, "reg",
> +        qemu_fdt_add_subnode(fdt, name);
> +        qemu_fdt_setprop_string(fdt, name, "compatible", "virtio,mmio");
> +        qemu_fdt_setprop_cells(fdt, name, "reg",
>              0x0, memmap[VIRT_VIRTIO].base + i * memmap[VIRT_VIRTIO].size,
>              0x0, memmap[VIRT_VIRTIO].size);
> -        qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
> -        qemu_fdt_setprop_cell(fdt, nodename, "interrupts", VIRTIO_IRQ + i);
> -        g_free(nodename);
> +        qemu_fdt_setprop_cell(fdt, name, "interrupt-parent",
> +            plic_virtio_phandle);
> +        qemu_fdt_setprop_cell(fdt, name, "interrupts", VIRTIO_IRQ + i);
> +        g_free(name);
>      }
>
> -    nodename = g_strdup_printf("/soc/pci@%lx",
> +    name = g_strdup_printf("/soc/pci@%lx",
>          (long) memmap[VIRT_PCIE_ECAM].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_cell(fdt, nodename, "#address-cells",
> -                          FDT_PCI_ADDR_CELLS);
> -    qemu_fdt_setprop_cell(fdt, nodename, "#interrupt-cells",
> -                          FDT_PCI_INT_CELLS);
> -    qemu_fdt_setprop_cell(fdt, nodename, "#size-cells", 0x2);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible",
> -                            "pci-host-ecam-generic");
> -    qemu_fdt_setprop_string(fdt, nodename, "device_type", "pci");
> -    qemu_fdt_setprop_cell(fdt, nodename, "linux,pci-domain", 0);
> -    qemu_fdt_setprop_cells(fdt, nodename, "bus-range", 0,
> -                           memmap[VIRT_PCIE_ECAM].size /
> -                               PCIE_MMCFG_SIZE_MIN - 1);
> -    qemu_fdt_setprop(fdt, nodename, "dma-coherent", NULL, 0);
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg", 0, memmap[VIRT_PCIE_ECAM].base,
> -                           0, memmap[VIRT_PCIE_ECAM].size);
> -    qemu_fdt_setprop_sized_cells(fdt, nodename, "ranges",
> +    qemu_fdt_add_subnode(fdt, name);
> +    qemu_fdt_setprop_cell(fdt, name, "#address-cells", FDT_PCI_ADDR_CELLS);
> +    qemu_fdt_setprop_cell(fdt, name, "#interrupt-cells", FDT_PCI_INT_CELLS);
> +    qemu_fdt_setprop_cell(fdt, name, "#size-cells", 0x2);
> +    qemu_fdt_setprop_string(fdt, name, "compatible", "pci-host-ecam-generic");
> +    qemu_fdt_setprop_string(fdt, name, "device_type", "pci");
> +    qemu_fdt_setprop_cell(fdt, name, "linux,pci-domain", 0);
> +    qemu_fdt_setprop_cells(fdt, name, "bus-range", 0,
> +        memmap[VIRT_PCIE_ECAM].size / PCIE_MMCFG_SIZE_MIN - 1);
> +    qemu_fdt_setprop(fdt, name, "dma-coherent", NULL, 0);
> +    qemu_fdt_setprop_cells(fdt, name, "reg", 0,
> +        memmap[VIRT_PCIE_ECAM].base, 0, memmap[VIRT_PCIE_ECAM].size);
> +    qemu_fdt_setprop_sized_cells(fdt, name, "ranges",
>          1, FDT_PCI_RANGE_IOPORT, 2, 0,
>          2, memmap[VIRT_PCIE_PIO].base, 2, memmap[VIRT_PCIE_PIO].size,
>          1, FDT_PCI_RANGE_MMIO,
>          2, memmap[VIRT_PCIE_MMIO].base,
>          2, memmap[VIRT_PCIE_MMIO].base, 2, memmap[VIRT_PCIE_MMIO].size);
> -    create_pcie_irq_map(fdt, nodename, plic_phandle);
> -    g_free(nodename);
> +    create_pcie_irq_map(fdt, name, plic_pcie_phandle);
> +    g_free(name);
>
>      test_phandle = phandle++;
> -    nodename = g_strdup_printf("/test@%lx",
> +    name = g_strdup_printf("/soc/test@%lx",
>          (long)memmap[VIRT_TEST].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> +    qemu_fdt_add_subnode(fdt, name);
>      {
>          const char compat[] = "sifive,test1\0sifive,test0\0syscon";
> -        qemu_fdt_setprop(fdt, nodename, "compatible", compat, sizeof(compat));
> +        qemu_fdt_setprop(fdt, name, "compatible", compat, sizeof(compat));
>      }
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> +    qemu_fdt_setprop_cells(fdt, name, "reg",
>          0x0, memmap[VIRT_TEST].base,
>          0x0, memmap[VIRT_TEST].size);
> -    qemu_fdt_setprop_cell(fdt, nodename, "phandle", test_phandle);
> -    test_phandle = qemu_fdt_get_phandle(fdt, nodename);
> -    g_free(nodename);
> -
> -    nodename = g_strdup_printf("/reboot");
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "syscon-reboot");
> -    qemu_fdt_setprop_cell(fdt, nodename, "regmap", test_phandle);
> -    qemu_fdt_setprop_cell(fdt, nodename, "offset", 0x0);
> -    qemu_fdt_setprop_cell(fdt, nodename, "value", FINISHER_RESET);
> -    g_free(nodename);
> -
> -    nodename = g_strdup_printf("/poweroff");
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "syscon-poweroff");
> -    qemu_fdt_setprop_cell(fdt, nodename, "regmap", test_phandle);
> -    qemu_fdt_setprop_cell(fdt, nodename, "offset", 0x0);
> -    qemu_fdt_setprop_cell(fdt, nodename, "value", FINISHER_PASS);
> -    g_free(nodename);
> -
> -    nodename = g_strdup_printf("/uart@%lx",
> -        (long)memmap[VIRT_UART0].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "ns16550a");
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> +    qemu_fdt_setprop_cell(fdt, name, "phandle", test_phandle);
> +    test_phandle = qemu_fdt_get_phandle(fdt, name);
> +    g_free(name);
> +
> +    name = g_strdup_printf("/soc/reboot");
> +    qemu_fdt_add_subnode(fdt, name);
> +    qemu_fdt_setprop_string(fdt, name, "compatible", "syscon-reboot");
> +    qemu_fdt_setprop_cell(fdt, name, "regmap", test_phandle);
> +    qemu_fdt_setprop_cell(fdt, name, "offset", 0x0);
> +    qemu_fdt_setprop_cell(fdt, name, "value", FINISHER_RESET);
> +    g_free(name);
> +
> +    name = g_strdup_printf("/soc/poweroff");
> +    qemu_fdt_add_subnode(fdt, name);
> +    qemu_fdt_setprop_string(fdt, name, "compatible", "syscon-poweroff");
> +    qemu_fdt_setprop_cell(fdt, name, "regmap", test_phandle);
> +    qemu_fdt_setprop_cell(fdt, name, "offset", 0x0);
> +    qemu_fdt_setprop_cell(fdt, name, "value", FINISHER_PASS);
> +    g_free(name);
> +
> +    name = g_strdup_printf("/soc/uart@%lx", (long)memmap[VIRT_UART0].base);
> +    qemu_fdt_add_subnode(fdt, name);
> +    qemu_fdt_setprop_string(fdt, name, "compatible", "ns16550a");
> +    qemu_fdt_setprop_cells(fdt, name, "reg",
>          0x0, memmap[VIRT_UART0].base,
>          0x0, memmap[VIRT_UART0].size);
> -    qemu_fdt_setprop_cell(fdt, nodename, "clock-frequency", 3686400);
> -    qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
> -    qemu_fdt_setprop_cell(fdt, nodename, "interrupts", UART0_IRQ);
> +    qemu_fdt_setprop_cell(fdt, name, "clock-frequency", 3686400);
> +    qemu_fdt_setprop_cell(fdt, name, "interrupt-parent", plic_mmio_phandle);
> +    qemu_fdt_setprop_cell(fdt, name, "interrupts", UART0_IRQ);
>
>      qemu_fdt_add_subnode(fdt, "/chosen");
> -    qemu_fdt_setprop_string(fdt, "/chosen", "stdout-path", nodename);
> +    qemu_fdt_setprop_string(fdt, "/chosen", "stdout-path", name);
>      if (cmdline) {
>          qemu_fdt_setprop_string(fdt, "/chosen", "bootargs", cmdline);
>      }
> -    g_free(nodename);
> -
> -    nodename = g_strdup_printf("/rtc@%lx",
> -        (long)memmap[VIRT_RTC].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible",
> -        "google,goldfish-rtc");
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> +    g_free(name);
> +
> +    name = g_strdup_printf("/soc/rtc@%lx", (long)memmap[VIRT_RTC].base);
> +    qemu_fdt_add_subnode(fdt, name);
> +    qemu_fdt_setprop_string(fdt, name, "compatible", "google,goldfish-rtc");
> +    qemu_fdt_setprop_cells(fdt, name, "reg",
>          0x0, memmap[VIRT_RTC].base,
>          0x0, memmap[VIRT_RTC].size);
> -    qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
> -    qemu_fdt_setprop_cell(fdt, nodename, "interrupts", RTC_IRQ);
> -    g_free(nodename);
> -
> -    nodename = g_strdup_printf("/flash@%" PRIx64, flashbase);
> -    qemu_fdt_add_subnode(s->fdt, nodename);
> -    qemu_fdt_setprop_string(s->fdt, nodename, "compatible", "cfi-flash");
> -    qemu_fdt_setprop_sized_cells(s->fdt, nodename, "reg",
> +    qemu_fdt_setprop_cell(fdt, name, "interrupt-parent", plic_mmio_phandle);
> +    qemu_fdt_setprop_cell(fdt, name, "interrupts", RTC_IRQ);
> +    g_free(name);
> +
> +    name = g_strdup_printf("/soc/flash@%" PRIx64, flashbase);
> +    qemu_fdt_add_subnode(s->fdt, name);
> +    qemu_fdt_setprop_string(s->fdt, name, "compatible", "cfi-flash");
> +    qemu_fdt_setprop_sized_cells(s->fdt, name, "reg",
>                                   2, flashbase, 2, flashsize,
>                                   2, flashbase + flashsize, 2, flashsize);
> -    qemu_fdt_setprop_cell(s->fdt, nodename, "bank-width", 4);
> -    g_free(nodename);
> +    qemu_fdt_setprop_cell(s->fdt, name, "bank-width", 4);
> +    g_free(name);
>  }
>
> -
>  static inline DeviceState *gpex_pcie_init(MemoryRegion *sys_mem,
>                                            hwaddr ecam_base, hwaddr ecam_size,
>                                            hwaddr mmio_base, hwaddr mmio_size,
> @@ -478,21 +493,100 @@ static void riscv_virt_board_init(MachineState *machine)
>      MemoryRegion *system_memory = get_system_memory();
>      MemoryRegion *main_mem = g_new(MemoryRegion, 1);
>      MemoryRegion *mask_rom = g_new(MemoryRegion, 1);
> -    char *plic_hart_config;
> +    char *plic_hart_config, *soc_name;
>      size_t plic_hart_config_len;
>      target_ulong start_addr = memmap[VIRT_DRAM].base;
> -    int i;
> -    unsigned int smp_cpus = machine->smp.cpus;
> -
> -    /* Initialize SOC */
> -    object_initialize_child(OBJECT(machine), "soc", &s->soc, sizeof(s->soc),
> -                            TYPE_RISCV_HART_ARRAY, &error_abort, NULL);
> -    object_property_set_str(OBJECT(&s->soc), machine->cpu_type, "cpu-type",
> -                            &error_abort);
> -    object_property_set_int(OBJECT(&s->soc), smp_cpus, "num-harts",
> -                            &error_abort);
> -    object_property_set_bool(OBJECT(&s->soc), true, "realized",
> -                            &error_abort);
> +    DeviceState *mmio_plic, *virtio_plic, *pcie_plic;
> +    int i, j, base_hartid, hart_count;
> +
> +    /* Check socket count limit */
> +    if (VIRT_SOCKETS_MAX < riscv_socket_count(machine)) {
> +        error_report("number of sockets/nodes should not exceed %d",
> +            VIRT_SOCKETS_MAX);
> +        exit(1);
> +    }
> +
> +    /* Initialize sockets */
> +    mmio_plic = virtio_plic = pcie_plic = NULL;
> +    for (i = 0; i < riscv_socket_count(machine); i++) {
> +        if (!riscv_socket_check_hartids(machine, i)) {
> +            error_report("discontinuous hartids in socket%d", i);
> +            exit(1);
> +        }
> +
> +        base_hartid = riscv_socket_first_hartid(machine, i);
> +        if (base_hartid < 0) {
> +            error_report("can't find hartid base for socket%d", i);
> +            exit(1);
> +        }
> +
> +        hart_count = riscv_socket_hart_count(machine, i);
> +        if (hart_count < 0) {
> +            error_report("can't find hart count for socket%d", i);
> +            exit(1);
> +        }
> +
> +        soc_name = g_strdup_printf("soc%d", i);
> +        object_initialize_child(OBJECT(machine), soc_name, &s->soc[i],
> +            sizeof(s->soc[i]), TYPE_RISCV_HART_ARRAY, &error_abort, NULL);
> +        g_free(soc_name);
> +        object_property_set_str(OBJECT(&s->soc[i]),
> +            machine->cpu_type, "cpu-type", &error_abort);
> +        object_property_set_int(OBJECT(&s->soc[i]),
> +            base_hartid, "hartid-base", &error_abort);
> +        object_property_set_int(OBJECT(&s->soc[i]),
> +            hart_count, "num-harts", &error_abort);
> +        object_property_set_bool(OBJECT(&s->soc[i]),
> +            true, "realized", &error_abort);
> +
> +        /* Per-socket CLINT */
> +        sifive_clint_create(
> +            memmap[VIRT_CLINT].base + i * memmap[VIRT_CLINT].size,
> +            memmap[VIRT_CLINT].size, base_hartid, hart_count,
> +            SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE, true);
> +
> +        /* Per-socket PLIC hart topology configuration string */
> +        plic_hart_config_len =
> +            (strlen(VIRT_PLIC_HART_CONFIG) + 1) * hart_count;
> +        plic_hart_config = g_malloc0(plic_hart_config_len);
> +        for (j = 0; j < hart_count; j++) {
> +            if (j != 0) {
> +                strncat(plic_hart_config, ",", plic_hart_config_len);
> +            }
> +            strncat(plic_hart_config, VIRT_PLIC_HART_CONFIG,
> +                plic_hart_config_len);
> +            plic_hart_config_len -= (strlen(VIRT_PLIC_HART_CONFIG) + 1);
> +        }
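The hart config string built above holds one VIRT_PLIC_HART_CONFIG entry per hart, separated by commas. This means (strlen + 1) * hart_count bytes hold the entries, the separators, and the terminating NUL. The third argument of strncat() bounds the number of bytes copied from the source, not the destination capacity, so the loop stays in bounds only because the buffer is sized exactly. The sketch below writes the string directly instead. It assumes VIRT_PLIC_HART_CONFIG is "MS" (one M-mode and one S-mode context per hart), and build_plic_hart_config() is a hypothetical name:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: per-hart PLIC context config, as in include/hw/riscv/virt.h. */
#define VIRT_PLIC_HART_CONFIG "MS"

/*
 * Build "MS,MS,...,MS" with one entry per hart. Each entry needs
 * strlen(VIRT_PLIC_HART_CONFIG) bytes plus one byte for either the ','
 * separator or, after the last entry, the NUL terminator.
 */
static char *build_plic_hart_config(int hart_count)
{
    size_t entry = strlen(VIRT_PLIC_HART_CONFIG);
    size_t len = (entry + 1) * (size_t)hart_count;
    char *config = calloc(len > 0 ? len : 1, 1);
    char *p = config;

    assert(config != NULL);
    for (int j = 0; j < hart_count; j++) {
        if (j != 0) {
            *p++ = ',';
        }
        memcpy(p, VIRT_PLIC_HART_CONFIG, entry);
        p += entry;
    }
    *p = '\0';
    return config;
}
```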
> +
> +        /* Per-socket PLIC */
> +        s->plic[i] = sifive_plic_create(
> +            memmap[VIRT_PLIC].base + i * memmap[VIRT_PLIC].size,
> +            plic_hart_config, base_hartid,
> +            VIRT_PLIC_NUM_SOURCES,
> +            VIRT_PLIC_NUM_PRIORITIES,
> +            VIRT_PLIC_PRIORITY_BASE,
> +            VIRT_PLIC_PENDING_BASE,
> +            VIRT_PLIC_ENABLE_BASE,
> +            VIRT_PLIC_ENABLE_STRIDE,
> +            VIRT_PLIC_CONTEXT_BASE,
> +            VIRT_PLIC_CONTEXT_STRIDE,
> +            memmap[VIRT_PLIC].size);
> +        g_free(plic_hart_config);
> +
> +        /* Use a different PLIC instance per device type when possible */
> +        if (i == 0) {
> +            mmio_plic = s->plic[i];
> +            virtio_plic = s->plic[i];
> +            pcie_plic = s->plic[i];
> +        }
> +        if (i == 1) {
> +            virtio_plic = s->plic[i];
> +            pcie_plic = s->plic[i];
> +        }
> +        if (i == 2) {
> +            pcie_plic = s->plic[i];
> +        }
> +    }
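The if-chain above works as follows:

* UART and RTC always use socket 0's PLIC.
* VirtIO uses socket 1's PLIC when that socket exists.
* PCIe uses socket 2's PLIC when that socket exists.
* Otherwise, each falls back to the highest-numbered socket.

The same rule drives the phandle selection in create_fdt(). The compact sketch below implements that selection and returns socket indices rather than DeviceState pointers. The names choose_plics() and struct plic_choice are hypothetical:

```c
#include <assert.h>

/* Socket index whose PLIC each device class is wired to. */
struct plic_choice {
    int mmio;   /* UART0, RTC */
    int virtio; /* virtio-mmio transports */
    int pcie;   /* GPEX PCIe host */
};

static int min_int(int a, int b)
{
    return a < b ? a : b;
}

static struct plic_choice choose_plics(int socket_count)
{
    struct plic_choice c;

    assert(socket_count > 0);
    c.mmio = 0;
    c.virtio = min_int(1, socket_count - 1);
    c.pcie = min_int(2, socket_count - 1);
    return c;
}
```

With a single socket everything lands on PLIC 0. With two sockets, VirtIO and PCIe share PLIC 1. From three sockets up, the three device classes are spread across PLICs 0, 1 and 2.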
>
>      /* register system main memory (actual RAM) */
>      memory_region_init_ram(main_mem, NULL, "riscv_virt_board.ram",
> @@ -571,38 +665,14 @@ static void riscv_virt_board_init(MachineState *machine)
>                            memmap[VIRT_MROM].base + sizeof(reset_vec),
>                            &address_space_memory);
>
> -    /* create PLIC hart topology configuration string */
> -    plic_hart_config_len = (strlen(VIRT_PLIC_HART_CONFIG) + 1) * smp_cpus;
> -    plic_hart_config = g_malloc0(plic_hart_config_len);
> -    for (i = 0; i < smp_cpus; i++) {
> -        if (i != 0) {
> -            strncat(plic_hart_config, ",", plic_hart_config_len);
> -        }
> -        strncat(plic_hart_config, VIRT_PLIC_HART_CONFIG, plic_hart_config_len);
> -        plic_hart_config_len -= (strlen(VIRT_PLIC_HART_CONFIG) + 1);
> -    }
> -
> -    /* MMIO */
> -    s->plic = sifive_plic_create(memmap[VIRT_PLIC].base,
> -        plic_hart_config, 0,
> -        VIRT_PLIC_NUM_SOURCES,
> -        VIRT_PLIC_NUM_PRIORITIES,
> -        VIRT_PLIC_PRIORITY_BASE,
> -        VIRT_PLIC_PENDING_BASE,
> -        VIRT_PLIC_ENABLE_BASE,
> -        VIRT_PLIC_ENABLE_STRIDE,
> -        VIRT_PLIC_CONTEXT_BASE,
> -        VIRT_PLIC_CONTEXT_STRIDE,
> -        memmap[VIRT_PLIC].size);
> -    sifive_clint_create(memmap[VIRT_CLINT].base,
> -        memmap[VIRT_CLINT].size, 0, smp_cpus,
> -        SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE, true);
> +    /* SiFive Test MMIO device */
>      sifive_test_create(memmap[VIRT_TEST].base);
>
> +    /* VirtIO MMIO devices */
>      for (i = 0; i < VIRTIO_COUNT; i++) {
>          sysbus_create_simple("virtio-mmio",
>              memmap[VIRT_VIRTIO].base + i * memmap[VIRT_VIRTIO].size,
> -            qdev_get_gpio_in(DEVICE(s->plic), VIRTIO_IRQ + i));
> +            qdev_get_gpio_in(DEVICE(virtio_plic), VIRTIO_IRQ + i));
>      }
>
>      gpex_pcie_init(system_memory,
> @@ -611,14 +681,14 @@ static void riscv_virt_board_init(MachineState *machine)
>                           memmap[VIRT_PCIE_MMIO].base,
>                           memmap[VIRT_PCIE_MMIO].size,
>                           memmap[VIRT_PCIE_PIO].base,
> -                         DEVICE(s->plic), true);
> +                         DEVICE(pcie_plic), true);
>
>      serial_mm_init(system_memory, memmap[VIRT_UART0].base,
> -        0, qdev_get_gpio_in(DEVICE(s->plic), UART0_IRQ), 399193,
> +        0, qdev_get_gpio_in(DEVICE(mmio_plic), UART0_IRQ), 399193,
>          serial_hd(0), DEVICE_LITTLE_ENDIAN);
>
>      sysbus_create_simple("goldfish_rtc", memmap[VIRT_RTC].base,
> -        qdev_get_gpio_in(DEVICE(s->plic), RTC_IRQ));
> +        qdev_get_gpio_in(DEVICE(mmio_plic), RTC_IRQ));
>
>      virt_flash_create(s);
>
> @@ -628,8 +698,6 @@ static void riscv_virt_board_init(MachineState *machine)
>                                    drive_get(IF_PFLASH, 0, i));
>      }
>      virt_flash_map(s, system_memory);
> -
> -    g_free(plic_hart_config);
>  }
>
>  static void riscv_virt_machine_instance_init(Object *obj)
> @@ -642,9 +710,13 @@ static void riscv_virt_machine_class_init(ObjectClass *oc, void *data)
>
>      mc->desc = "RISC-V VirtIO board";
>      mc->init = riscv_virt_board_init;
> -    mc->max_cpus = 8;
> +    mc->max_cpus = VIRT_CPUS_MAX;
>      mc->default_cpu_type = VIRT_CPU;
>      mc->pci_allow_0_address = true;
> +    mc->possible_cpu_arch_ids = riscv_numa_possible_cpu_arch_ids;
> +    mc->cpu_index_to_instance_props = riscv_numa_cpu_index_to_props;
> +    mc->get_default_cpu_node_id = riscv_numa_get_default_cpu_node_id;
> +    mc->numa_mem_supported = true;
>  }
>
>  static const TypeInfo riscv_virt_machine_typeinfo = {
> diff --git a/include/hw/riscv/virt.h b/include/hw/riscv/virt.h
> index e69355efaf..1beacd7666 100644
> --- a/include/hw/riscv/virt.h
> +++ b/include/hw/riscv/virt.h
> @@ -23,6 +23,9 @@
>  #include "hw/sysbus.h"
>  #include "hw/block/flash.h"
>
> +#define VIRT_CPUS_MAX 8
> +#define VIRT_SOCKETS_MAX 8
> +
>  #define TYPE_RISCV_VIRT_MACHINE MACHINE_TYPE_NAME("virt")
>  #define RISCV_VIRT_MACHINE(obj) \
>      OBJECT_CHECK(RISCVVirtState, (obj), TYPE_RISCV_VIRT_MACHINE)
> @@ -32,8 +35,8 @@ typedef struct {
>      MachineState parent;
>
>      /*< public >*/
> -    RISCVHartArrayState soc;
> -    DeviceState *plic;
> +    RISCVHartArrayState soc[VIRT_SOCKETS_MAX];
> +    DeviceState *plic[VIRT_SOCKETS_MAX];
>      PFlashCFI01 *flash[2];
>
>      void *fdt;
> @@ -74,6 +77,8 @@ enum {
>  #define VIRT_PLIC_ENABLE_STRIDE 0x80
>  #define VIRT_PLIC_CONTEXT_BASE 0x200000
>  #define VIRT_PLIC_CONTEXT_STRIDE 0x1000
> +#define VIRT_PLIC_SIZE(__num_context) \
> +    (VIRT_PLIC_CONTEXT_BASE + (__num_context) * VIRT_PLIC_CONTEXT_STRIDE)
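Each hart has two PLIC contexts (M and S), so VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2) is 0x200000 + 16 * 0x1000 = 0x210000 bytes. Socket N's PLIC therefore sits at 0xc000000 + N * 0x210000. Eight such windows end at 0xd080000, which is below VIRT_UART0 at 0x10000000. The per-socket CLINTs are packed similarly, at 0x10000-byte steps from 0x2000000. The small sketch below reproduces this arithmetic, using constants copied from the hunks above and hypothetical helpers plic_base() and clint_base():

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the virt.h and virt.c hunks in this patch. */
#define VIRT_CPUS_MAX            8
#define VIRT_SOCKETS_MAX         8
#define VIRT_PLIC_CONTEXT_BASE   0x200000
#define VIRT_PLIC_CONTEXT_STRIDE 0x1000
#define VIRT_PLIC_SIZE(__num_context) \
    (VIRT_PLIC_CONTEXT_BASE + (__num_context) * VIRT_PLIC_CONTEXT_STRIDE)

#define VIRT_CLINT_BASE 0x2000000ULL
#define VIRT_CLINT_SIZE 0x10000ULL
#define VIRT_PLIC_BASE  0xc000000ULL
#define VIRT_UART0_BASE 0x10000000ULL

/* Start of a socket's PLIC window; socket == VIRT_SOCKETS_MAX gives the end. */
static uint64_t plic_base(int socket)
{
    assert(socket >= 0 && socket <= VIRT_SOCKETS_MAX);
    return VIRT_PLIC_BASE +
           (uint64_t)VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2) * (uint64_t)socket;
}

/* Start of a socket's CLINT window. */
static uint64_t clint_base(int socket)
{
    assert(socket >= 0 && socket <= VIRT_SOCKETS_MAX);
    return VIRT_CLINT_BASE + VIRT_CLINT_SIZE * (uint64_t)socket;
}
```

Every socket's PLIC window is sized for the whole machine's maximum context count. This makes the windows generous, but it keeps the per-socket base a simple multiple.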
>
>  #define FDT_PCI_ADDR_CELLS    3
>  #define FDT_PCI_INT_CELLS     1
> --
> 2.25.1
>
>

LGTM.

Reviewed-by: Atish Patra <atish.patra@wdc.com>

-- 
Regards,
Atish



* Re: [PATCH v5 4/5] hw/riscv: spike: Allow creating multiple NUMA sockets
  2020-05-29 11:46   ` Anup Patel
@ 2020-06-13  5:34     ` Atish Patra
From: Atish Patra @ 2020-06-13  5:34 UTC
  To: Anup Patel
  Cc: Peter Maydell, qemu-riscv, Sagar Karandikar, Anup Patel,
	qemu-devel, Atish Patra, Alistair Francis, Palmer Dabbelt

On Fri, May 29, 2020 at 4:48 AM Anup Patel <anup.patel@wdc.com> wrote:
>
> We extend the RISC-V spike machine to allow creating a multi-socket
> machine. Each RISC-V spike machine socket is a NUMA node with a set
> of HARTs, a memory instance, and a CLINT instance. Other devices are
> shared among all sockets. We also update the generated device tree
> accordingly.
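Each socket's memory instance is presumably laid out back to back from the DRAM base. The virt hunk computes the per-socket "/memory@" address as memmap[VIRT_DRAM].base + riscv_socket_mem_offset(mc, socket). The sketch below computes that offset, assuming riscv_socket_mem_offset() sums the memory sizes of the lower-numbered NUMA nodes. socket_mem_offset() is a hypothetical stand-in, not the numa.c helper:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for riscv_socket_mem_offset(): byte offset of a
 * socket's memory from the DRAM base, i.e. the total size of all
 * lower-numbered nodes.
 */
static uint64_t socket_mem_offset(const uint64_t *node_mem, int nodes,
                                  int socket)
{
    uint64_t offset = 0;

    assert(socket >= 0 && socket < nodes);
    for (int i = 0; i < socket; i++) {
        offset += node_mem[i];
    }
    return offset;
}
```

For example, assuming a DRAM base of 0x80000000 and two 1 GiB nodes, socket 1's memory node would be /memory@c0000000.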
>
> By default, NUMA multi-socket support is disabled for the RISC-V spike
> machine. To enable it, use QEMU's "-numa" command-line options.
>
> Example1: For two NUMA nodes with 2 CPUs each, append the following
> to the command-line options: "-smp 4 -numa node -numa node"
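With "-smp 4 -numa node -numa node" and no "-numa cpu" options, the four CPUs are presumably split into equal contiguous blocks: CPUs 0-1 on node 0 and CPUs 2-3 on node 1. The sketch below implements one such default policy, assuming leftover CPUs go to the last node. default_cpu_node() is a hypothetical name, and this is not necessarily how riscv_numa_get_default_cpu_node_id() is written:

```c
#include <assert.h>

/* Hypothetical default CPU-to-node mapping when no "-numa cpu" is given. */
static int default_cpu_node(int cpu_index, int cpus, int nodes)
{
    int per_node;
    int node;

    assert(cpu_index >= 0 && cpu_index < cpus);
    if (nodes <= 1) {
        return 0;               /* NUMA disabled or a single node */
    }
    per_node = cpus / nodes;
    if (per_node == 0) {
        per_node = 1;           /* fewer CPUs than nodes: one CPU per node */
    }
    node = cpu_index / per_node;
    return node < nodes ? node : nodes - 1; /* remainder goes to last node */
}
```

Example2 overrides this default: the explicit "-numa cpu,node-id=...,core-id=..." options place core 0 on node 0 and cores 1-3 on node 1.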
>
> Example2: For two NUMA nodes with 1 and 3 CPUs, append the following
> to the command-line options:
> "-smp 4 -numa node -numa node -numa cpu,node-id=0,core-id=0 \
> -numa cpu,node-id=1,core-id=1 -numa cpu,node-id=1,core-id=2 \
> -numa cpu,node-id=1,core-id=3"
>
> The maximum number of sockets in a RISC-V spike machine is 8,
> but this limit may change in the future.
>
> Signed-off-by: Anup Patel <anup.patel@wdc.com>
> ---
>  hw/riscv/spike.c         | 268 ++++++++++++++++++++++++++-------------
>  include/hw/riscv/spike.h |  11 +-
>  2 files changed, 187 insertions(+), 92 deletions(-)
>
> diff --git a/hw/riscv/spike.c b/hw/riscv/spike.c
> index d5e0103d89..b8373eb1eb 100644
> --- a/hw/riscv/spike.c
> +++ b/hw/riscv/spike.c
> @@ -36,6 +36,7 @@
>  #include "hw/riscv/sifive_clint.h"
>  #include "hw/riscv/spike.h"
>  #include "hw/riscv/boot.h"
> +#include "hw/riscv/numa.h"
>  #include "chardev/char.h"
>  #include "sysemu/arch_init.h"
>  #include "sysemu/device_tree.h"
> @@ -64,9 +65,14 @@ static void create_fdt(SpikeState *s, const struct MemmapEntry *memmap,
>      uint64_t mem_size, const char *cmdline)
>  {
>      void *fdt;
> -    int cpu;
> -    uint32_t *cells;
> -    char *nodename;
> +    uint64_t addr, size;
> +    unsigned long clint_addr;
> +    int cpu, socket;
> +    MachineState *mc = MACHINE(s);
> +    uint32_t *clint_cells;
> +    uint32_t cpu_phandle, intc_phandle, phandle = 1;
> +    char *name, *mem_name, *clint_name, *clust_name;
> +    char *core_name, *cpu_name, *intc_name;
>
>      fdt = s->fdt = create_device_tree(&s->fdt_size);
>      if (!fdt) {
> @@ -88,68 +94,91 @@ static void create_fdt(SpikeState *s, const struct MemmapEntry *memmap,
>      qemu_fdt_setprop_cell(fdt, "/soc", "#size-cells", 0x2);
>      qemu_fdt_setprop_cell(fdt, "/soc", "#address-cells", 0x2);
>
> -    nodename = g_strdup_printf("/memory@%lx",
> -        (long)memmap[SPIKE_DRAM].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> -        memmap[SPIKE_DRAM].base >> 32, memmap[SPIKE_DRAM].base,
> -        mem_size >> 32, mem_size);
> -    qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
> -    g_free(nodename);
> -
>      qemu_fdt_add_subnode(fdt, "/cpus");
>      qemu_fdt_setprop_cell(fdt, "/cpus", "timebase-frequency",
>          SIFIVE_CLINT_TIMEBASE_FREQ);
>      qemu_fdt_setprop_cell(fdt, "/cpus", "#size-cells", 0x0);
>      qemu_fdt_setprop_cell(fdt, "/cpus", "#address-cells", 0x1);
> +    qemu_fdt_add_subnode(fdt, "/cpus/cpu-map");
> +
> +    for (socket = (riscv_socket_count(mc) - 1); socket >= 0; socket--) {
> +        clust_name = g_strdup_printf("/cpus/cpu-map/cluster%d", socket);
> +        qemu_fdt_add_subnode(fdt, clust_name);
> +
> +        clint_cells =  g_new0(uint32_t, s->soc[socket].num_harts * 4);
>
> -    for (cpu = s->soc.num_harts - 1; cpu >= 0; cpu--) {
> -        nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
> -        char *intc = g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
> -        char *isa = riscv_isa_string(&s->soc.harts[cpu]);
> -        qemu_fdt_add_subnode(fdt, nodename);
> +        for (cpu = s->soc[socket].num_harts - 1; cpu >= 0; cpu--) {
> +            cpu_phandle = phandle++;
> +
> +            cpu_name = g_strdup_printf("/cpus/cpu@%d",
> +                s->soc[socket].hartid_base + cpu);
> +            qemu_fdt_add_subnode(fdt, cpu_name);
>  #if defined(TARGET_RISCV32)
> -        qemu_fdt_setprop_string(fdt, nodename, "mmu-type", "riscv,sv32");
> +            qemu_fdt_setprop_string(fdt, cpu_name, "mmu-type", "riscv,sv32");
>  #else
> -        qemu_fdt_setprop_string(fdt, nodename, "mmu-type", "riscv,sv48");
> +            qemu_fdt_setprop_string(fdt, cpu_name, "mmu-type", "riscv,sv48");
>  #endif
> -        qemu_fdt_setprop_string(fdt, nodename, "riscv,isa", isa);
> -        qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv");
> -        qemu_fdt_setprop_string(fdt, nodename, "status", "okay");
> -        qemu_fdt_setprop_cell(fdt, nodename, "reg", cpu);
> -        qemu_fdt_setprop_string(fdt, nodename, "device_type", "cpu");
> -        qemu_fdt_add_subnode(fdt, intc);
> -        qemu_fdt_setprop_cell(fdt, intc, "phandle", 1);
> -        qemu_fdt_setprop_string(fdt, intc, "compatible", "riscv,cpu-intc");
> -        qemu_fdt_setprop(fdt, intc, "interrupt-controller", NULL, 0);
> -        qemu_fdt_setprop_cell(fdt, intc, "#interrupt-cells", 1);
> -        g_free(isa);
> -        g_free(intc);
> -        g_free(nodename);
> -    }
> +            name = riscv_isa_string(&s->soc[socket].harts[cpu]);
> +            qemu_fdt_setprop_string(fdt, cpu_name, "riscv,isa", name);
> +            g_free(name);
> +            qemu_fdt_setprop_string(fdt, cpu_name, "compatible", "riscv");
> +            qemu_fdt_setprop_string(fdt, cpu_name, "status", "okay");
> +            qemu_fdt_setprop_cell(fdt, cpu_name, "reg",
> +                s->soc[socket].hartid_base + cpu);
> +            qemu_fdt_setprop_string(fdt, cpu_name, "device_type", "cpu");
> +            riscv_socket_fdt_write_id(mc, fdt, cpu_name, socket);
> +            qemu_fdt_setprop_cell(fdt, cpu_name, "phandle", cpu_phandle);
> +
> +            intc_name = g_strdup_printf("%s/interrupt-controller", cpu_name);
> +            qemu_fdt_add_subnode(fdt, intc_name);
> +            intc_phandle = phandle++;
> +            qemu_fdt_setprop_cell(fdt, intc_name, "phandle", intc_phandle);
> +            qemu_fdt_setprop_string(fdt, intc_name, "compatible",
> +                "riscv,cpu-intc");
> +            qemu_fdt_setprop(fdt, intc_name, "interrupt-controller", NULL, 0);
> +            qemu_fdt_setprop_cell(fdt, intc_name, "#interrupt-cells", 1);
> +
> +            clint_cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> +            clint_cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_SOFT);
> +            clint_cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> +            clint_cells[cpu * 4 + 3] = cpu_to_be32(IRQ_M_TIMER);
> +
> +            core_name = g_strdup_printf("%s/core%d", clust_name, cpu);
> +            qemu_fdt_add_subnode(fdt, core_name);
> +            qemu_fdt_setprop_cell(fdt, core_name, "cpu", cpu_phandle);
> +
> +            g_free(core_name);
> +            g_free(intc_name);
> +            g_free(cpu_name);
> +        }
>
> -    cells =  g_new0(uint32_t, s->soc.num_harts * 4);
> -    for (cpu = 0; cpu < s->soc.num_harts; cpu++) {
> -        nodename =
> -            g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
> -        uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, nodename);
> -        cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
> -        cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_SOFT);
> -        cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
> -        cells[cpu * 4 + 3] = cpu_to_be32(IRQ_M_TIMER);
> -        g_free(nodename);
> +        addr = memmap[SPIKE_DRAM].base + riscv_socket_mem_offset(mc, socket);
> +        size = riscv_socket_mem_size(mc, socket);
> +        mem_name = g_strdup_printf("/memory@%lx", (long)addr);
> +        qemu_fdt_add_subnode(fdt, mem_name);
> +        qemu_fdt_setprop_cells(fdt, mem_name, "reg",
> +            addr >> 32, addr, size >> 32, size);
> +        qemu_fdt_setprop_string(fdt, mem_name, "device_type", "memory");
> +        riscv_socket_fdt_write_id(mc, fdt, mem_name, socket);
> +        g_free(mem_name);
> +
> +        clint_addr = memmap[SPIKE_CLINT].base +
> +            (memmap[SPIKE_CLINT].size * socket);
> +        clint_name = g_strdup_printf("/soc/clint@%lx", clint_addr);
> +        qemu_fdt_add_subnode(fdt, clint_name);
> +        qemu_fdt_setprop_string(fdt, clint_name, "compatible", "riscv,clint0");
> +        qemu_fdt_setprop_cells(fdt, clint_name, "reg",
> +            0x0, clint_addr, 0x0, memmap[SPIKE_CLINT].size);
> +        qemu_fdt_setprop(fdt, clint_name, "interrupts-extended",
> +            clint_cells, s->soc[socket].num_harts * sizeof(uint32_t) * 4);
> +        riscv_socket_fdt_write_id(mc, fdt, clint_name, socket);
> +
> +        g_free(clint_name);
> +        g_free(clint_cells);
> +        g_free(clust_name);
>      }
> -    nodename = g_strdup_printf("/soc/clint@%lx",
> -        (long)memmap[SPIKE_CLINT].base);
> -    qemu_fdt_add_subnode(fdt, nodename);
> -    qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv,clint0");
> -    qemu_fdt_setprop_cells(fdt, nodename, "reg",
> -        0x0, memmap[SPIKE_CLINT].base,
> -        0x0, memmap[SPIKE_CLINT].size);
> -    qemu_fdt_setprop(fdt, nodename, "interrupts-extended",
> -        cells, s->soc.num_harts * sizeof(uint32_t) * 4);
> -    g_free(cells);
> -    g_free(nodename);
> +
> +    riscv_socket_fdt_write_distance_matrix(mc, fdt);
>
>      if (cmdline) {
>          qemu_fdt_add_subnode(fdt, "/chosen");
> @@ -160,23 +189,58 @@ static void create_fdt(SpikeState *s, const struct MemmapEntry *memmap,
>  static void spike_board_init(MachineState *machine)
>  {
>      const struct MemmapEntry *memmap = spike_memmap;
> -
> -    SpikeState *s = g_new0(SpikeState, 1);
> +    SpikeState *s = SPIKE_MACHINE(machine);
>      MemoryRegion *system_memory = get_system_memory();
>      MemoryRegion *main_mem = g_new(MemoryRegion, 1);
>      MemoryRegion *mask_rom = g_new(MemoryRegion, 1);
> -    int i;
> -    unsigned int smp_cpus = machine->smp.cpus;
> +    char *soc_name;
> +    int i, base_hartid, hart_count;
>
> -    /* Initialize SOC */
> -    object_initialize_child(OBJECT(machine), "soc", &s->soc, sizeof(s->soc),
> -                            TYPE_RISCV_HART_ARRAY, &error_abort, NULL);
> -    object_property_set_str(OBJECT(&s->soc), machine->cpu_type, "cpu-type",
> -                            &error_abort);
> -    object_property_set_int(OBJECT(&s->soc), smp_cpus, "num-harts",
> -                            &error_abort);
> -    object_property_set_bool(OBJECT(&s->soc), true, "realized",
> -                            &error_abort);
> +    /* Check socket count limit */
> +    if (SPIKE_SOCKETS_MAX < riscv_socket_count(machine)) {
> +        error_report("number of sockets/nodes should be less than %d",
> +            SPIKE_SOCKETS_MAX);
> +        exit(1);
> +    }
> +
> +    /* Initialize sockets */
> +    for (i = 0; i < riscv_socket_count(machine); i++) {
> +        if (!riscv_socket_check_hartids(machine, i)) {
> +            error_report("discontinuous hartids in socket%d", i);
> +            exit(1);
> +        }
> +
> +        base_hartid = riscv_socket_first_hartid(machine, i);
> +        if (base_hartid < 0) {
> +            error_report("can't find hartid base for socket%d", i);
> +            exit(1);
> +        }
> +
> +        hart_count = riscv_socket_hart_count(machine, i);
> +        if (hart_count < 0) {
> +            error_report("can't find hart count for socket%d", i);
> +            exit(1);
> +        }
> +
> +        soc_name = g_strdup_printf("soc%d", i);
> +        object_initialize_child(OBJECT(machine), soc_name, &s->soc[i],
> +            sizeof(s->soc[i]), TYPE_RISCV_HART_ARRAY, &error_abort, NULL);
> +        g_free(soc_name);
> +        object_property_set_str(OBJECT(&s->soc[i]),
> +            machine->cpu_type, "cpu-type", &error_abort);
> +        object_property_set_int(OBJECT(&s->soc[i]),
> +            base_hartid, "hartid-base", &error_abort);
> +        object_property_set_int(OBJECT(&s->soc[i]),
> +            hart_count, "num-harts", &error_abort);
> +        object_property_set_bool(OBJECT(&s->soc[i]),
> +            true, "realized", &error_abort);
> +
> +        /* Core Local Interruptor (timer and IPI) for each socket */
> +        sifive_clint_create(
> +            memmap[SPIKE_CLINT].base + i * memmap[SPIKE_CLINT].size,
> +            memmap[SPIKE_CLINT].size, base_hartid, hart_count,
> +            SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE, false);
> +    }
>
>      /* register system main memory (actual RAM) */
>      memory_region_init_ram(main_mem, NULL, "riscv.spike.ram",
> @@ -249,12 +313,8 @@ static void spike_board_init(MachineState *machine)
>                            &address_space_memory);
>
>      /* initialize HTIF using symbols found in load_kernel */
> -    htif_mm_init(system_memory, mask_rom, &s->soc.harts[0].env, serial_hd(0));
> -
> -    /* Core Local Interruptor (timer and IPI) */
> -    sifive_clint_create(memmap[SPIKE_CLINT].base, memmap[SPIKE_CLINT].size,
> -        0, smp_cpus, SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE,
> -        false);
> +    htif_mm_init(system_memory, mask_rom,
> +                 &s->soc[0].harts[0].env, serial_hd(0));
>  }
>
>  static void spike_v1_10_0_board_init(MachineState *machine)
> @@ -275,13 +335,14 @@ static void spike_v1_10_0_board_init(MachineState *machine)
>      }
>
>      /* Initialize SOC */
> -    object_initialize_child(OBJECT(machine), "soc", &s->soc, sizeof(s->soc),
> +    object_initialize_child(OBJECT(machine), "soc",
> +                            &s->soc[0], sizeof(s->soc[0]),
>                              TYPE_RISCV_HART_ARRAY, &error_abort, NULL);
> -    object_property_set_str(OBJECT(&s->soc), SPIKE_V1_10_0_CPU, "cpu-type",
> +    object_property_set_str(OBJECT(&s->soc[0]), SPIKE_V1_10_0_CPU, "cpu-type",
>                              &error_abort);
> -    object_property_set_int(OBJECT(&s->soc), smp_cpus, "num-harts",
> +    object_property_set_int(OBJECT(&s->soc[0]), smp_cpus, "num-harts",
>                              &error_abort);
> -    object_property_set_bool(OBJECT(&s->soc), true, "realized",
> +    object_property_set_bool(OBJECT(&s->soc[0]), true, "realized",
>                              &error_abort);
>
>      /* register system main memory (actual RAM) */
> @@ -339,7 +400,8 @@ static void spike_v1_10_0_board_init(MachineState *machine)
>                            &address_space_memory);
>
>      /* initialize HTIF using symbols found in load_kernel */
> -    htif_mm_init(system_memory, mask_rom, &s->soc.harts[0].env, serial_hd(0));
> +    htif_mm_init(system_memory, mask_rom,
> +                 &s->soc[0].harts[0].env, serial_hd(0));
>
>      /* Core Local Interruptor (timer and IPI) */
>      sifive_clint_create(memmap[SPIKE_CLINT].base, memmap[SPIKE_CLINT].size,
> @@ -365,13 +427,14 @@ static void spike_v1_09_1_board_init(MachineState *machine)
>      }
>
>      /* Initialize SOC */
> -    object_initialize_child(OBJECT(machine), "soc", &s->soc, sizeof(s->soc),
> +    object_initialize_child(OBJECT(machine), "soc",
> +                            &s->soc[0], sizeof(s->soc[0]),
>                              TYPE_RISCV_HART_ARRAY, &error_abort, NULL);
> -    object_property_set_str(OBJECT(&s->soc), SPIKE_V1_09_1_CPU, "cpu-type",
> +    object_property_set_str(OBJECT(&s->soc[0]), SPIKE_V1_09_1_CPU, "cpu-type",
>                              &error_abort);
> -    object_property_set_int(OBJECT(&s->soc), smp_cpus, "num-harts",
> +    object_property_set_int(OBJECT(&s->soc[0]), smp_cpus, "num-harts",
>                              &error_abort);
> -    object_property_set_bool(OBJECT(&s->soc), true, "realized",
> +    object_property_set_bool(OBJECT(&s->soc[0]), true, "realized",
>                              &error_abort);
>
>      /* register system main memory (actual RAM) */
> @@ -425,7 +488,7 @@ static void spike_v1_09_1_board_init(MachineState *machine)
>          "};\n";
>
>      /* build config string with supplied memory size */
> -    char *isa = riscv_isa_string(&s->soc.harts[0]);
> +    char *isa = riscv_isa_string(&s->soc[0].harts[0]);
>      char *config_string = g_strdup_printf(config_string_tmpl,
>          (uint64_t)memmap[SPIKE_CLINT].base + SIFIVE_TIME_BASE,
>          (uint64_t)memmap[SPIKE_DRAM].base,
> @@ -448,7 +511,8 @@ static void spike_v1_09_1_board_init(MachineState *machine)
>                            &address_space_memory);
>
>      /* initialize HTIF using symbols found in load_kernel */
> -    htif_mm_init(system_memory, mask_rom, &s->soc.harts[0].env, serial_hd(0));
> +    htif_mm_init(system_memory, mask_rom,
> +                 &s->soc[0].harts[0].env, serial_hd(0));
>
>      /* Core Local Interruptor (timer and IPI) */
>      sifive_clint_create(memmap[SPIKE_CLINT].base, memmap[SPIKE_CLINT].size,
> @@ -472,15 +536,39 @@ static void spike_v1_10_0_machine_init(MachineClass *mc)
>      mc->max_cpus = 1;
>  }
>
> -static void spike_machine_init(MachineClass *mc)
> +DEFINE_MACHINE("spike_v1.9.1", spike_v1_09_1_machine_init)
> +DEFINE_MACHINE("spike_v1.10", spike_v1_10_0_machine_init)
> +
> +static void spike_machine_instance_init(Object *obj)
> +{
> +}
> +
> +static void spike_machine_class_init(ObjectClass *oc, void *data)
>  {
> -    mc->desc = "RISC-V Spike Board";
> +    MachineClass *mc = MACHINE_CLASS(oc);
> +
> +    mc->desc = "RISC-V Spike board";
>      mc->init = spike_board_init;
> -    mc->max_cpus = 8;
> +    mc->max_cpus = SPIKE_CPUS_MAX;
>      mc->is_default = true;
>      mc->default_cpu_type = SPIKE_V1_10_0_CPU;
> +    mc->possible_cpu_arch_ids = riscv_numa_possible_cpu_arch_ids;
> +    mc->cpu_index_to_instance_props = riscv_numa_cpu_index_to_props;
> +    mc->get_default_cpu_node_id = riscv_numa_get_default_cpu_node_id;
> +    mc->numa_mem_supported = true;
>  }
>
> -DEFINE_MACHINE("spike_v1.9.1", spike_v1_09_1_machine_init)
> -DEFINE_MACHINE("spike_v1.10", spike_v1_10_0_machine_init)
> -DEFINE_MACHINE("spike", spike_machine_init)
> +static const TypeInfo spike_machine_typeinfo = {
> +    .name       = MACHINE_TYPE_NAME("spike"),
> +    .parent     = TYPE_MACHINE,
> +    .class_init = spike_machine_class_init,
> +    .instance_init = spike_machine_instance_init,
> +    .instance_size = sizeof(SpikeState),
> +};
> +
> +static void spike_machine_init_register_types(void)
> +{
> +    type_register_static(&spike_machine_typeinfo);
> +}
> +
> +type_init(spike_machine_init_register_types)
> diff --git a/include/hw/riscv/spike.h b/include/hw/riscv/spike.h
> index dc770421bc..c55fdf4d24 100644
> --- a/include/hw/riscv/spike.h
> +++ b/include/hw/riscv/spike.h
> @@ -22,12 +22,19 @@
>  #include "hw/riscv/riscv_hart.h"
>  #include "hw/sysbus.h"
>
> +#define SPIKE_CPUS_MAX 8
> +#define SPIKE_SOCKETS_MAX 8
> +
> +#define TYPE_SPIKE_MACHINE MACHINE_TYPE_NAME("spike")
> +#define SPIKE_MACHINE(obj) \
> +    OBJECT_CHECK(SpikeState, (obj), TYPE_SPIKE_MACHINE)
> +
>  typedef struct {
>      /*< private >*/
> -    SysBusDevice parent_obj;
> +    MachineState parent;
>
>      /*< public >*/
> -    RISCVHartArrayState soc;
> +    RISCVHartArrayState soc[SPIKE_SOCKETS_MAX];
>      void *fdt;
>      int fdt_size;
>  } SpikeState;
> --
> 2.25.1
>
>
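The two "-numa" examples in the commit message assign harts to sockets by node-id, and spike_board_init() then needs, per socket, the first hartid, the hart count, and a check that the hartids are contiguous. A minimal model of that bookkeeping is sketched below. The array-based helpers are hypothetical stand-ins for riscv_socket_first_hartid(), riscv_socket_hart_count() and riscv_socket_check_hartids(), which read QEMU's own CPU topology rather than a plain array:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: node_of_hart[h] is the NUMA node (socket) that
 * hart h was assigned to, e.g. via "-numa cpu,node-id=N,core-id=h".
 */

/* Lowest hartid in `socket`, or -1 if the socket has no harts. */
static int socket_first_hartid(const int *node_of_hart, int nharts, int socket)
{
    for (int h = 0; h < nharts; h++) {
        if (node_of_hart[h] == socket) {
            return h;
        }
    }
    return -1;
}

/* Number of harts in `socket`, or -1 if it has none. */
static int socket_hart_count(const int *node_of_hart, int nharts, int socket)
{
    int count = 0;

    for (int h = 0; h < nharts; h++) {
        if (node_of_hart[h] == socket) {
            count++;
        }
    }
    return count ? count : -1;
}

/* True if the socket's harts form one contiguous hartid range. */
static bool socket_hartids_contiguous(const int *node_of_hart, int nharts,
                                      int socket)
{
    int first = socket_first_hartid(node_of_hart, nharts, socket);
    int count = socket_hart_count(node_of_hart, nharts, socket);

    if (first < 0 || count < 0) {
        return false;
    }
    for (int h = first; h < first + count; h++) {
        if (h >= nharts || node_of_hart[h] != socket) {
            return false;
        }
    }
    return true;
}
```

For Example 2 the assignment is {0, 1, 1, 1}: socket 0 starts at hartid 0 with 1 hart, and socket 1 starts at hartid 1 with 3 harts. An assignment such as {0, 1, 0, 1} fails the contiguity test, which is the case the patch reports as "discontinuous hartids". Note also that the quoted guard `if (SPIKE_SOCKETS_MAX < riscv_socket_count(machine))` rejects only counts above 8, so exactly 8 sockets are accepted even though the error text says "should be less than %d".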

As the upstream version of spike has removed the deprecated ISA-specific
machines, the rebased patch will be a bit different from this version.
But I don't think there will be any change in functionality.

With that assumption:

Reviewed-by: Atish Patra <atish.patra@wdc.com>

-- 
Regards,
Atish
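In the rewritten create_fdt(), each socket gets its own /memory@... node at memmap[SPIKE_DRAM].base + riscv_socket_mem_offset(mc, socket), with a "reg" made of two address cells and two size cells. The sketch below models that layout under two assumptions: a node's offset is the sum of the sizes of all lower-numbered nodes, and the spike DRAM base is 0x80000000. socket_mem_offset() and socket_memory_node() are hypothetical, not QEMU's helpers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed spike DRAM base; the real value comes from spike_memmap. */
#define DRAM_BASE 0x80000000ULL

/* Hypothetical: offset of node `socket` = sum of sizes of earlier nodes. */
static uint64_t socket_mem_offset(const uint64_t *node_mem, int socket)
{
    uint64_t offset = 0;

    for (int i = 0; i < socket; i++) {
        offset += node_mem[i];
    }
    return offset;
}

/*
 * Build the node name and the four 32-bit "reg" cells for one socket's
 * memory node. Returns the length of the formatted name.
 */
static int socket_memory_node(const uint64_t *node_mem, int socket,
                              char *name, size_t len, uint32_t reg[4])
{
    uint64_t addr = DRAM_BASE + socket_mem_offset(node_mem, socket);
    uint64_t size = node_mem[socket];

    reg[0] = (uint32_t)(addr >> 32);  /* address, high word */
    reg[1] = (uint32_t)addr;          /* address, low word */
    reg[2] = (uint32_t)(size >> 32);  /* size, high word */
    reg[3] = (uint32_t)size;          /* size, low word */
    return snprintf(name, len, "/memory@%llx", (unsigned long long)addr);
}
```

With two 1 GiB nodes, socket 0 maps to /memory@80000000 and socket 1 to /memory@c0000000, each with a size of 0x40000000.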

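Each socket also gets its own CLINT at memmap[SPIKE_CLINT].base + socket * memmap[SPIKE_CLINT].size. Its "interrupts-extended" property lists (intc phandle, IRQ_M_SOFT, intc phandle, IRQ_M_TIMER) for every hart in the socket. The sketch below models the address and the phandle numbering. It assumes a CLINT base of 0x2000000 and size of 0x10000, uses the RISC-V interrupt numbers 3 (M-mode software) and 7 (M-mode timer), and follows the reverse iteration order of create_fdt(). The helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; the real ones come from spike_memmap and cpu_bits.h. */
#define CLINT_BASE  0x2000000UL
#define CLINT_SIZE  0x10000UL
#define IRQ_M_SOFT  3
#define IRQ_M_TIMER 7

/* Per-socket CLINT base, as computed in spike_board_init()/create_fdt(). */
static unsigned long clint_addr(int socket)
{
    return CLINT_BASE + CLINT_SIZE * (unsigned long)socket;
}

/*
 * Model of create_fdt(): sockets are walked from last to first, harts in
 * a socket from last to first, and each hart takes a CPU phandle and then
 * an interrupt-controller phandle. Fills 4 cells per hart of `socket`
 * (host byte order here; QEMU stores them with cpu_to_be32()).
 */
static void clint_cells_for_socket(const int *harts_per_socket, int nsockets,
                                   int socket, uint32_t *cells)
{
    uint32_t phandle = 1;

    for (int s = nsockets - 1; s >= 0; s--) {
        for (int cpu = harts_per_socket[s] - 1; cpu >= 0; cpu--) {
            uint32_t cpu_phandle = phandle++;
            uint32_t intc_phandle = phandle++;

            (void)cpu_phandle; /* used by the cpu-map core nodes */
            if (s == socket) {
                cells[cpu * 4 + 0] = intc_phandle;
                cells[cpu * 4 + 1] = IRQ_M_SOFT;
                cells[cpu * 4 + 2] = intc_phandle;
                cells[cpu * 4 + 3] = IRQ_M_TIMER;
            }
        }
    }
}
```

For Example 2 (1 and 3 harts), socket 1 is visited first, so its harts 2, 1 and 0 receive interrupt-controller phandles 2, 4 and 6, and the single hart of socket 0 receives 8. Socket 1's CLINT sits at 0x2010000 under these assumptions.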


> @@ -365,13 +427,14 @@ static void spike_v1_09_1_board_init(MachineState *machine)
>      }
>
>      /* Initialize SOC */
> -    object_initialize_child(OBJECT(machine), "soc", &s->soc, sizeof(s->soc),
> +    object_initialize_child(OBJECT(machine), "soc",
> +                            &s->soc[0], sizeof(s->soc[0]),
>                              TYPE_RISCV_HART_ARRAY, &error_abort, NULL);
> -    object_property_set_str(OBJECT(&s->soc), SPIKE_V1_09_1_CPU, "cpu-type",
> +    object_property_set_str(OBJECT(&s->soc[0]), SPIKE_V1_09_1_CPU, "cpu-type",
>                              &error_abort);
> -    object_property_set_int(OBJECT(&s->soc), smp_cpus, "num-harts",
> +    object_property_set_int(OBJECT(&s->soc[0]), smp_cpus, "num-harts",
>                              &error_abort);
> -    object_property_set_bool(OBJECT(&s->soc), true, "realized",
> +    object_property_set_bool(OBJECT(&s->soc[0]), true, "realized",
>                              &error_abort);
>
>      /* register system main memory (actual RAM) */
> @@ -425,7 +488,7 @@ static void spike_v1_09_1_board_init(MachineState *machine)
>          "};\n";
>
>      /* build config string with supplied memory size */
> -    char *isa = riscv_isa_string(&s->soc.harts[0]);
> +    char *isa = riscv_isa_string(&s->soc[0].harts[0]);
>      char *config_string = g_strdup_printf(config_string_tmpl,
>          (uint64_t)memmap[SPIKE_CLINT].base + SIFIVE_TIME_BASE,
>          (uint64_t)memmap[SPIKE_DRAM].base,
> @@ -448,7 +511,8 @@ static void spike_v1_09_1_board_init(MachineState *machine)
>                            &address_space_memory);
>
>      /* initialize HTIF using symbols found in load_kernel */
> -    htif_mm_init(system_memory, mask_rom, &s->soc.harts[0].env, serial_hd(0));
> +    htif_mm_init(system_memory, mask_rom,
> +                 &s->soc[0].harts[0].env, serial_hd(0));
>
>      /* Core Local Interruptor (timer and IPI) */
>      sifive_clint_create(memmap[SPIKE_CLINT].base, memmap[SPIKE_CLINT].size,
> @@ -472,15 +536,39 @@ static void spike_v1_10_0_machine_init(MachineClass *mc)
>      mc->max_cpus = 1;
>  }
>
> -static void spike_machine_init(MachineClass *mc)
> +DEFINE_MACHINE("spike_v1.9.1", spike_v1_09_1_machine_init)
> +DEFINE_MACHINE("spike_v1.10", spike_v1_10_0_machine_init)
> +
> +static void spike_machine_instance_init(Object *obj)
> +{
> +}
> +
> +static void spike_machine_class_init(ObjectClass *oc, void *data)
>  {
> -    mc->desc = "RISC-V Spike Board";
> +    MachineClass *mc = MACHINE_CLASS(oc);
> +
> +    mc->desc = "RISC-V Spike board";
>      mc->init = spike_board_init;
> -    mc->max_cpus = 8;
> +    mc->max_cpus = SPIKE_CPUS_MAX;
>      mc->is_default = true;
>      mc->default_cpu_type = SPIKE_V1_10_0_CPU;
> +    mc->possible_cpu_arch_ids = riscv_numa_possible_cpu_arch_ids;
> +    mc->cpu_index_to_instance_props = riscv_numa_cpu_index_to_props;
> +    mc->get_default_cpu_node_id = riscv_numa_get_default_cpu_node_id;
> +    mc->numa_mem_supported = true;
>  }
>
> -DEFINE_MACHINE("spike_v1.9.1", spike_v1_09_1_machine_init)
> -DEFINE_MACHINE("spike_v1.10", spike_v1_10_0_machine_init)
> -DEFINE_MACHINE("spike", spike_machine_init)
> +static const TypeInfo spike_machine_typeinfo = {
> +    .name       = MACHINE_TYPE_NAME("spike"),
> +    .parent     = TYPE_MACHINE,
> +    .class_init = spike_machine_class_init,
> +    .instance_init = spike_machine_instance_init,
> +    .instance_size = sizeof(SpikeState),
> +};
> +
> +static void spike_machine_init_register_types(void)
> +{
> +    type_register_static(&spike_machine_typeinfo);
> +}
> +
> +type_init(spike_machine_init_register_types)
> diff --git a/include/hw/riscv/spike.h b/include/hw/riscv/spike.h
> index dc770421bc..c55fdf4d24 100644
> --- a/include/hw/riscv/spike.h
> +++ b/include/hw/riscv/spike.h
> @@ -22,12 +22,19 @@
>  #include "hw/riscv/riscv_hart.h"
>  #include "hw/sysbus.h"
>
> +#define SPIKE_CPUS_MAX 8
> +#define SPIKE_SOCKETS_MAX 8
> +
> +#define TYPE_SPIKE_MACHINE MACHINE_TYPE_NAME("spike")
> +#define SPIKE_MACHINE(obj) \
> +    OBJECT_CHECK(SpikeState, (obj), TYPE_SPIKE_MACHINE)
> +
>  typedef struct {
>      /*< private >*/
> -    SysBusDevice parent_obj;
> +    MachineState parent;
>
>      /*< public >*/
> -    RISCVHartArrayState soc;
> +    RISCVHartArrayState soc[SPIKE_SOCKETS_MAX];
>      void *fdt;
>      int fdt_size;
>  } SpikeState;
> --
> 2.25.1
>
>

As the upstream version of spike has removed the deprecated ISA-specific
machines (spike_v1.9.1 and spike_v1.10), the rebased patch will be a bit
different from this version. However, I don't expect any change in
functionality.

With that assumption:

Reviewed-by: Atish Patra <atish.patra@wdc.com>

-- 
Regards,
Atish


Thread overview: 28+ messages
2020-05-29 11:46 [PATCH v5 0/5] RISC-V multi-socket support Anup Patel
2020-05-29 11:46 ` [PATCH v5 1/5] hw/riscv: Allow creating multiple instances of CLINT Anup Patel
2020-05-29 11:46 ` [PATCH v5 2/5] hw/riscv: Allow creating multiple instances of PLIC Anup Patel
2020-05-29 11:46 ` [PATCH v5 3/5] hw/riscv: Add helpers for RISC-V multi-socket NUMA machines Anup Patel
2020-06-10 23:28   ` Alistair Francis
2020-06-11 13:11     ` Anup Patel
2020-06-13  0:52       ` Alistair Francis
2020-06-13  1:12         ` Atish Patra
2020-06-13  5:18   ` Atish Patra
2020-05-29 11:46 ` [PATCH v5 4/5] hw/riscv: spike: Allow creating multiple NUMA sockets Anup Patel
2020-06-13  5:34   ` Atish Patra
2020-05-29 11:46 ` [PATCH v5 5/5] hw/riscv: virt: " Anup Patel
2020-06-10 23:24   ` Alistair Francis
2020-06-11 13:01     ` Anup Patel
2020-06-13  5:21   ` Atish Patra