* [PATCH v4] Fixes related to processing of qemu's -numa option
@ 2012-07-17 4:31 ` Chegu Vinod
0 siblings, 0 replies; 10+ messages in thread
From: Chegu Vinod @ 2012-07-17 4:31 UTC (permalink / raw)
To: qemu-devel, aliguori; +Cc: kvm, ehabkost, Chegu Vinod
Changes since v3:
- using bitmap_set() instead of set_bit() in numa_add() routine.
- removed call to bitmak_zero() since bitmap_new() also zeros' the bitmap.
- Rebased to the latest qemu.
Changes since v2:
- Using "unsigned long *" for the node_cpumask[].
- Use bitmap_new() instead of g_malloc0() for allocation.
- Don't rely on "max_cpus" since it may not be initialized
before the numa related qemu options are parsed & processed.
Note: Continuing to use a new constant for allocation of
the mask (This constant is currently set to 255 since
with an 8bit APIC ID VCPUs can range from 0-254 in a
guest. The APIC ID 255 (0xFF) is reserved for broadcast).
Changes since v1:
- Use bitmap functions that are already in qemu (instead
of cpu_set_t macro's from sched.h)
- Added a check for endvalue >= max_cpus.
- Fix to address the round-robbing assignment when
cpu's are not explicitly specified.
-----------------------------------------------
v1:
--
The -numa option to qemu is used to create [fake] numa nodes
and expose them to the guest OS instance.
There are a couple of issues with the -numa option:
a) Max VCPU's that can be specified for a guest while using
the qemu's -numa option is 64. Due to a typecasting issue
when the number of VCPUs is > 32 the VCPUs don't show up
under the specified [fake] numa nodes.
b) KVM currently has support for 160VCPUs per guest. The
qemu's -numa option has only support for upto 64VCPUs
per guest.
This patch addresses these two issues.
Below are examples of (a) and (b)
a) >32 VCPUs are specified with the -numa option:
/usr/local/bin/qemu-system-x86_64 \
-enable-kvm \
71:01:01 \
-net tap,ifname=tap0,script=no,downscript=no \
-vnc :4
...
Upstream qemu :
--------------
QEMU 1.1.50 monitor - type 'help' for more information
(qemu) info numa
6 nodes
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 32 33 34 35 36 37 38 39 40 41
node 0 size: 131072 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51
node 1 size: 131072 MB
node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59
node 2 size: 131072 MB
node 3 cpus: 30
node 3 size: 131072 MB
node 4 cpus:
node 4 size: 131072 MB
node 5 cpus: 31
node 5 size: 131072 MB
With the patch applied :
-----------------------
QEMU 1.1.50 monitor - type 'help' for more information
(qemu) info numa
6 nodes
node 0 cpus: 0 1 2 3 4 5 6 7 8 9
node 0 size: 131072 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19
node 1 size: 131072 MB
node 2 cpus: 20 21 22 23 24 25 26 27 28 29
node 2 size: 131072 MB
node 3 cpus: 30 31 32 33 34 35 36 37 38 39
node 3 size: 131072 MB
node 4 cpus: 40 41 42 43 44 45 46 47 48 49
node 4 size: 131072 MB
node 5 cpus: 50 51 52 53 54 55 56 57 58 59
node 5 size: 131072 MB
b) >64 VCPUs specified with -numa option:
/usr/local/bin/qemu-system-x86_64 \
-enable-kvm \
-cpu Westmere,+rdtscp,+pdpe1gb,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+d-vnc :4
...
Upstream qemu :
--------------
only 63 CPUs in NUMA mode supported.
only 64 CPUs in NUMA mode supported.
QEMU 1.1.50 monitor - type 'help' for more information
(qemu) info numa
8 nodes
node 0 cpus: 6 7 8 9 38 39 40 41 70 71 72 73
node 0 size: 65536 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51 74 75 76 77 78 79
node 1 size: 65536 MB
node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59 60 61
node 2 size: 65536 MB
node 3 cpus: 30 62
node 3 size: 65536 MB
node 4 cpus:
node 4 size: 65536 MB
node 5 cpus:
node 5 size: 65536 MB
node 6 cpus: 31 63
node 6 size: 65536 MB
node 7 cpus: 0 1 2 3 4 5 32 33 34 35 36 37 64 65 66 67 68 69
node 7 size: 65536 MB
With the patch applied :
-----------------------
QEMU 1.1.50 monitor - type 'help' for more information
(qemu) info numa
8 nodes
node 0 cpus: 0 1 2 3 4 5 6 7 8 9
node 0 size: 65536 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19
node 1 size: 65536 MB
node 2 cpus: 20 21 22 23 24 25 26 27 28 29
node 2 size: 65536 MB
node 3 cpus: 30 31 32 33 34 35 36 37 38 39
node 3 size: 65536 MB
node 4 cpus: 40 41 42 43 44 45 46 47 48 49
node 4 size: 65536 MB
node 5 cpus: 50 51 52 53 54 55 56 57 58 59
node 5 size: 65536 MB
node 6 cpus: 60 61 62 63 64 65 66 67 68 69
node 6 size: 65536 MB
node 7 cpus: 70 71 72 73 74 75 76 77 78 79
Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>, Jim Hull <jim.hull@hp.com>, Craig Hada <craig.hada@hp.com>
---
cpus.c | 3 ++-
hw/pc.c | 3 ++-
sysemu.h | 3 ++-
vl.c | 43 +++++++++++++++++++++----------------------
4 files changed, 27 insertions(+), 25 deletions(-)
diff --git a/cpus.c b/cpus.c
index b182b3d..acccd08 100644
--- a/cpus.c
+++ b/cpus.c
@@ -36,6 +36,7 @@
#include "cpus.h"
#include "qtest.h"
#include "main-loop.h"
+#include "bitmap.h"
#ifndef _WIN32
#include "compatfd.h"
@@ -1145,7 +1146,7 @@ void set_numa_modes(void)
for (env = first_cpu; env != NULL; env = env->next_cpu) {
for (i = 0; i < nb_numa_nodes; i++) {
- if (node_cpumask[i] & (1 << env->cpu_index)) {
+ if (test_bit(env->cpu_index, node_cpumask[i])) {
env->numa_node = i;
}
}
diff --git a/hw/pc.c b/hw/pc.c
index c7e9ab3..2edcc07 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -48,6 +48,7 @@
#include "memory.h"
#include "exec-memory.h"
#include "arch_init.h"
+#include "bitmap.h"
/* output Bochs bios info messages */
//#define DEBUG_BIOS
@@ -639,7 +640,7 @@ static void *bochs_bios_init(void)
numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
for (i = 0; i < max_cpus; i++) {
for (j = 0; j < nb_numa_nodes; j++) {
- if (node_cpumask[j] & (1 << i)) {
+ if (test_bit(i, node_cpumask[j])) {
numa_fw_cfg[i + 1] = cpu_to_le64(j);
break;
}
diff --git a/sysemu.h b/sysemu.h
index 6540c79..4669348 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -134,9 +134,10 @@ extern uint8_t qemu_extra_params_fw[2];
extern QEMUClock *rtc_clock;
#define MAX_NODES 64
+#define MAX_CPUMASK_BITS 255
extern int nb_numa_nodes;
extern uint64_t node_mem[MAX_NODES];
-extern uint64_t node_cpumask[MAX_NODES];
+extern unsigned long *node_cpumask[MAX_NODES];
#define MAX_OPTION_ROMS 16
typedef struct QEMUOptionRom {
diff --git a/vl.c b/vl.c
index 46248b9..35a641b 100644
--- a/vl.c
+++ b/vl.c
@@ -28,6 +28,7 @@
#include <errno.h>
#include <sys/time.h>
#include <zlib.h>
+#include "bitmap.h"
/* Needed early for CONFIG_BSD etc. */
#include "config-host.h"
@@ -240,7 +241,7 @@ QTAILQ_HEAD(, FWBootEntry) fw_boot_order = QTAILQ_HEAD_INITIALIZER(fw_boot_order
int nb_numa_nodes;
uint64_t node_mem[MAX_NODES];
-uint64_t node_cpumask[MAX_NODES];
+unsigned long *node_cpumask[MAX_NODES];
uint8_t qemu_uuid[16];
@@ -951,6 +952,8 @@ static void numa_add(const char *optarg)
unsigned long long value, endvalue;
int nodenr;
+ value = endvalue = 0ULL;
+
optarg = get_opt_name(option, 128, optarg, ',') + 1;
if (!strcmp(option, "node")) {
if (get_param_value(option, 128, "nodeid", optarg) == 0) {
@@ -970,27 +973,22 @@ static void numa_add(const char *optarg)
}
node_mem[nodenr] = sval;
}
- if (get_param_value(option, 128, "cpus", optarg) == 0) {
- node_cpumask[nodenr] = 0;
- } else {
+ if (get_param_value(option, 128, "cpus", optarg) != 0) {
value = strtoull(option, &endptr, 10);
- if (value >= 64) {
- value = 63;
- fprintf(stderr, "only 64 CPUs in NUMA mode supported.\n");
+ if (*endptr == '-') {
+ endvalue = strtoull(endptr+1, &endptr, 10);
} else {
- if (*endptr == '-') {
- endvalue = strtoull(endptr+1, &endptr, 10);
- if (endvalue >= 63) {
- endvalue = 62;
- fprintf(stderr,
- "only 63 CPUs in NUMA mode supported.\n");
- }
- value = (2ULL << endvalue) - (1ULL << value);
- } else {
- value = 1ULL << value;
- }
+ endvalue = value;
+ }
+
+ if (!(endvalue < MAX_CPUMASK_BITS)) {
+ endvalue = MAX_CPUMASK_BITS - 1;
+ fprintf(stderr,
+ "A max of %d CPUs are supported in a guest\n",
+ MAX_CPUMASK_BITS);
}
- node_cpumask[nodenr] = value;
+
+ bitmap_set(node_cpumask[nodenr], value, endvalue-value+1);
}
nb_numa_nodes++;
}
@@ -2330,7 +2328,7 @@ int main(int argc, char **argv, char **envp)
for (i = 0; i < MAX_NODES; i++) {
node_mem[i] = 0;
- node_cpumask[i] = 0;
+ node_cpumask[i] = bitmap_new(MAX_CPUMASK_BITS);
}
nb_numa_nodes = 0;
@@ -3468,8 +3466,9 @@ int main(int argc, char **argv, char **envp)
}
for (i = 0; i < nb_numa_nodes; i++) {
- if (node_cpumask[i] != 0)
+ if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
break;
+ }
}
/* assigning the VCPUs round-robin is easier to implement, guest OSes
* must cope with this anyway, because there are BIOSes out there in
@@ -3477,7 +3476,7 @@ int main(int argc, char **argv, char **envp)
*/
if (i == nb_numa_nodes) {
for (i = 0; i < max_cpus; i++) {
- node_cpumask[i % nb_numa_nodes] |= 1 << i;
+ set_bit(i, node_cpumask[i % nb_numa_nodes]);
}
}
}
--
1.7.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [Qemu-devel] [PATCH v4] Fixes related to processing of qemu's -numa option
@ 2012-07-17 4:31 ` Chegu Vinod
0 siblings, 0 replies; 10+ messages in thread
From: Chegu Vinod @ 2012-07-17 4:31 UTC (permalink / raw)
To: qemu-devel, aliguori; +Cc: Chegu Vinod, ehabkost, kvm
Changes since v3:
- using bitmap_set() instead of set_bit() in numa_add() routine.
- removed call to bitmak_zero() since bitmap_new() also zeros' the bitmap.
- Rebased to the latest qemu.
Changes since v2:
- Using "unsigned long *" for the node_cpumask[].
- Use bitmap_new() instead of g_malloc0() for allocation.
- Don't rely on "max_cpus" since it may not be initialized
before the numa related qemu options are parsed & processed.
Note: Continuing to use a new constant for allocation of
the mask (This constant is currently set to 255 since
with an 8bit APIC ID VCPUs can range from 0-254 in a
guest. The APIC ID 255 (0xFF) is reserved for broadcast).
Changes since v1:
- Use bitmap functions that are already in qemu (instead
of cpu_set_t macro's from sched.h)
- Added a check for endvalue >= max_cpus.
- Fix to address the round-robbing assignment when
cpu's are not explicitly specified.
-----------------------------------------------
v1:
--
The -numa option to qemu is used to create [fake] numa nodes
and expose them to the guest OS instance.
There are a couple of issues with the -numa option:
a) Max VCPU's that can be specified for a guest while using
the qemu's -numa option is 64. Due to a typecasting issue
when the number of VCPUs is > 32 the VCPUs don't show up
under the specified [fake] numa nodes.
b) KVM currently has support for 160VCPUs per guest. The
qemu's -numa option has only support for upto 64VCPUs
per guest.
This patch addresses these two issues.
Below are examples of (a) and (b)
a) >32 VCPUs are specified with the -numa option:
/usr/local/bin/qemu-system-x86_64 \
-enable-kvm \
71:01:01 \
-net tap,ifname=tap0,script=no,downscript=no \
-vnc :4
...
Upstream qemu :
--------------
QEMU 1.1.50 monitor - type 'help' for more information
(qemu) info numa
6 nodes
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 32 33 34 35 36 37 38 39 40 41
node 0 size: 131072 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51
node 1 size: 131072 MB
node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59
node 2 size: 131072 MB
node 3 cpus: 30
node 3 size: 131072 MB
node 4 cpus:
node 4 size: 131072 MB
node 5 cpus: 31
node 5 size: 131072 MB
With the patch applied :
-----------------------
QEMU 1.1.50 monitor - type 'help' for more information
(qemu) info numa
6 nodes
node 0 cpus: 0 1 2 3 4 5 6 7 8 9
node 0 size: 131072 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19
node 1 size: 131072 MB
node 2 cpus: 20 21 22 23 24 25 26 27 28 29
node 2 size: 131072 MB
node 3 cpus: 30 31 32 33 34 35 36 37 38 39
node 3 size: 131072 MB
node 4 cpus: 40 41 42 43 44 45 46 47 48 49
node 4 size: 131072 MB
node 5 cpus: 50 51 52 53 54 55 56 57 58 59
node 5 size: 131072 MB
b) >64 VCPUs specified with -numa option:
/usr/local/bin/qemu-system-x86_64 \
-enable-kvm \
-cpu Westmere,+rdtscp,+pdpe1gb,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+d-vnc :4
...
Upstream qemu :
--------------
only 63 CPUs in NUMA mode supported.
only 64 CPUs in NUMA mode supported.
QEMU 1.1.50 monitor - type 'help' for more information
(qemu) info numa
8 nodes
node 0 cpus: 6 7 8 9 38 39 40 41 70 71 72 73
node 0 size: 65536 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51 74 75 76 77 78 79
node 1 size: 65536 MB
node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59 60 61
node 2 size: 65536 MB
node 3 cpus: 30 62
node 3 size: 65536 MB
node 4 cpus:
node 4 size: 65536 MB
node 5 cpus:
node 5 size: 65536 MB
node 6 cpus: 31 63
node 6 size: 65536 MB
node 7 cpus: 0 1 2 3 4 5 32 33 34 35 36 37 64 65 66 67 68 69
node 7 size: 65536 MB
With the patch applied :
-----------------------
QEMU 1.1.50 monitor - type 'help' for more information
(qemu) info numa
8 nodes
node 0 cpus: 0 1 2 3 4 5 6 7 8 9
node 0 size: 65536 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19
node 1 size: 65536 MB
node 2 cpus: 20 21 22 23 24 25 26 27 28 29
node 2 size: 65536 MB
node 3 cpus: 30 31 32 33 34 35 36 37 38 39
node 3 size: 65536 MB
node 4 cpus: 40 41 42 43 44 45 46 47 48 49
node 4 size: 65536 MB
node 5 cpus: 50 51 52 53 54 55 56 57 58 59
node 5 size: 65536 MB
node 6 cpus: 60 61 62 63 64 65 66 67 68 69
node 6 size: 65536 MB
node 7 cpus: 70 71 72 73 74 75 76 77 78 79
Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>, Jim Hull <jim.hull@hp.com>, Craig Hada <craig.hada@hp.com>
---
cpus.c | 3 ++-
hw/pc.c | 3 ++-
sysemu.h | 3 ++-
vl.c | 43 +++++++++++++++++++++----------------------
4 files changed, 27 insertions(+), 25 deletions(-)
diff --git a/cpus.c b/cpus.c
index b182b3d..acccd08 100644
--- a/cpus.c
+++ b/cpus.c
@@ -36,6 +36,7 @@
#include "cpus.h"
#include "qtest.h"
#include "main-loop.h"
+#include "bitmap.h"
#ifndef _WIN32
#include "compatfd.h"
@@ -1145,7 +1146,7 @@ void set_numa_modes(void)
for (env = first_cpu; env != NULL; env = env->next_cpu) {
for (i = 0; i < nb_numa_nodes; i++) {
- if (node_cpumask[i] & (1 << env->cpu_index)) {
+ if (test_bit(env->cpu_index, node_cpumask[i])) {
env->numa_node = i;
}
}
diff --git a/hw/pc.c b/hw/pc.c
index c7e9ab3..2edcc07 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -48,6 +48,7 @@
#include "memory.h"
#include "exec-memory.h"
#include "arch_init.h"
+#include "bitmap.h"
/* output Bochs bios info messages */
//#define DEBUG_BIOS
@@ -639,7 +640,7 @@ static void *bochs_bios_init(void)
numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
for (i = 0; i < max_cpus; i++) {
for (j = 0; j < nb_numa_nodes; j++) {
- if (node_cpumask[j] & (1 << i)) {
+ if (test_bit(i, node_cpumask[j])) {
numa_fw_cfg[i + 1] = cpu_to_le64(j);
break;
}
diff --git a/sysemu.h b/sysemu.h
index 6540c79..4669348 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -134,9 +134,10 @@ extern uint8_t qemu_extra_params_fw[2];
extern QEMUClock *rtc_clock;
#define MAX_NODES 64
+#define MAX_CPUMASK_BITS 255
extern int nb_numa_nodes;
extern uint64_t node_mem[MAX_NODES];
-extern uint64_t node_cpumask[MAX_NODES];
+extern unsigned long *node_cpumask[MAX_NODES];
#define MAX_OPTION_ROMS 16
typedef struct QEMUOptionRom {
diff --git a/vl.c b/vl.c
index 46248b9..35a641b 100644
--- a/vl.c
+++ b/vl.c
@@ -28,6 +28,7 @@
#include <errno.h>
#include <sys/time.h>
#include <zlib.h>
+#include "bitmap.h"
/* Needed early for CONFIG_BSD etc. */
#include "config-host.h"
@@ -240,7 +241,7 @@ QTAILQ_HEAD(, FWBootEntry) fw_boot_order = QTAILQ_HEAD_INITIALIZER(fw_boot_order
int nb_numa_nodes;
uint64_t node_mem[MAX_NODES];
-uint64_t node_cpumask[MAX_NODES];
+unsigned long *node_cpumask[MAX_NODES];
uint8_t qemu_uuid[16];
@@ -951,6 +952,8 @@ static void numa_add(const char *optarg)
unsigned long long value, endvalue;
int nodenr;
+ value = endvalue = 0ULL;
+
optarg = get_opt_name(option, 128, optarg, ',') + 1;
if (!strcmp(option, "node")) {
if (get_param_value(option, 128, "nodeid", optarg) == 0) {
@@ -970,27 +973,22 @@ static void numa_add(const char *optarg)
}
node_mem[nodenr] = sval;
}
- if (get_param_value(option, 128, "cpus", optarg) == 0) {
- node_cpumask[nodenr] = 0;
- } else {
+ if (get_param_value(option, 128, "cpus", optarg) != 0) {
value = strtoull(option, &endptr, 10);
- if (value >= 64) {
- value = 63;
- fprintf(stderr, "only 64 CPUs in NUMA mode supported.\n");
+ if (*endptr == '-') {
+ endvalue = strtoull(endptr+1, &endptr, 10);
} else {
- if (*endptr == '-') {
- endvalue = strtoull(endptr+1, &endptr, 10);
- if (endvalue >= 63) {
- endvalue = 62;
- fprintf(stderr,
- "only 63 CPUs in NUMA mode supported.\n");
- }
- value = (2ULL << endvalue) - (1ULL << value);
- } else {
- value = 1ULL << value;
- }
+ endvalue = value;
+ }
+
+ if (!(endvalue < MAX_CPUMASK_BITS)) {
+ endvalue = MAX_CPUMASK_BITS - 1;
+ fprintf(stderr,
+ "A max of %d CPUs are supported in a guest\n",
+ MAX_CPUMASK_BITS);
}
- node_cpumask[nodenr] = value;
+
+ bitmap_set(node_cpumask[nodenr], value, endvalue-value+1);
}
nb_numa_nodes++;
}
@@ -2330,7 +2328,7 @@ int main(int argc, char **argv, char **envp)
for (i = 0; i < MAX_NODES; i++) {
node_mem[i] = 0;
- node_cpumask[i] = 0;
+ node_cpumask[i] = bitmap_new(MAX_CPUMASK_BITS);
}
nb_numa_nodes = 0;
@@ -3468,8 +3466,9 @@ int main(int argc, char **argv, char **envp)
}
for (i = 0; i < nb_numa_nodes; i++) {
- if (node_cpumask[i] != 0)
+ if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
break;
+ }
}
/* assigning the VCPUs round-robin is easier to implement, guest OSes
* must cope with this anyway, because there are BIOSes out there in
@@ -3477,7 +3476,7 @@ int main(int argc, char **argv, char **envp)
*/
if (i == nb_numa_nodes) {
for (i = 0; i < max_cpus; i++) {
- node_cpumask[i % nb_numa_nodes] |= 1 << i;
+ set_bit(i, node_cpumask[i % nb_numa_nodes]);
}
}
}
--
1.7.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v4] Fixes related to processing of qemu's -numa option
2012-07-17 4:31 ` [Qemu-devel] " Chegu Vinod
@ 2012-07-18 17:15 ` Eduardo Habkost
-1 siblings, 0 replies; 10+ messages in thread
From: Eduardo Habkost @ 2012-07-18 17:15 UTC (permalink / raw)
To: Chegu Vinod; +Cc: aliguori, qemu-devel, kvm
On Mon, Jul 16, 2012 at 09:31:30PM -0700, Chegu Vinod wrote:
> Changes since v3:
> - using bitmap_set() instead of set_bit() in numa_add() routine.
> - removed call to bitmak_zero() since bitmap_new() also zeros' the bitmap.
> - Rebased to the latest qemu.
Tested-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>
> Changes since v2:
> - Using "unsigned long *" for the node_cpumask[].
> - Use bitmap_new() instead of g_malloc0() for allocation.
> - Don't rely on "max_cpus" since it may not be initialized
> before the numa related qemu options are parsed & processed.
>
> Note: Continuing to use a new constant for allocation of
> the mask (This constant is currently set to 255 since
> with an 8bit APIC ID VCPUs can range from 0-254 in a
> guest. The APIC ID 255 (0xFF) is reserved for broadcast).
>
> Changes since v1:
>
> - Use bitmap functions that are already in qemu (instead
> of cpu_set_t macro's from sched.h)
> - Added a check for endvalue >= max_cpus.
> - Fix to address the round-robbing assignment when
> cpu's are not explicitly specified.
> -----------------------------------------------
>
> v1:
> --
>
> The -numa option to qemu is used to create [fake] numa nodes
> and expose them to the guest OS instance.
>
> There are a couple of issues with the -numa option:
>
> a) Max VCPU's that can be specified for a guest while using
> the qemu's -numa option is 64. Due to a typecasting issue
> when the number of VCPUs is > 32 the VCPUs don't show up
> under the specified [fake] numa nodes.
>
> b) KVM currently has support for 160VCPUs per guest. The
> qemu's -numa option has only support for upto 64VCPUs
> per guest.
> This patch addresses these two issues.
>
> Below are examples of (a) and (b)
>
> a) >32 VCPUs are specified with the -numa option:
>
> /usr/local/bin/qemu-system-x86_64 \
> -enable-kvm \
> 71:01:01 \
> -net tap,ifname=tap0,script=no,downscript=no \
> -vnc :4
>
> ...
> Upstream qemu :
> --------------
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 32 33 34 35 36 37 38 39 40 41
> node 0 size: 131072 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51
> node 1 size: 131072 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59
> node 2 size: 131072 MB
> node 3 cpus: 30
> node 3 size: 131072 MB
> node 4 cpus:
> node 4 size: 131072 MB
> node 5 cpus: 31
> node 5 size: 131072 MB
>
> With the patch applied :
> -----------------------
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> node 0 size: 131072 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19
> node 1 size: 131072 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29
> node 2 size: 131072 MB
> node 3 cpus: 30 31 32 33 34 35 36 37 38 39
> node 3 size: 131072 MB
> node 4 cpus: 40 41 42 43 44 45 46 47 48 49
> node 4 size: 131072 MB
> node 5 cpus: 50 51 52 53 54 55 56 57 58 59
> node 5 size: 131072 MB
>
> b) >64 VCPUs specified with -numa option:
>
> /usr/local/bin/qemu-system-x86_64 \
> -enable-kvm \
> -cpu Westmere,+rdtscp,+pdpe1gb,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+d-vnc :4
>
> ...
>
> Upstream qemu :
> --------------
>
> only 63 CPUs in NUMA mode supported.
> only 64 CPUs in NUMA mode supported.
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 8 nodes
> node 0 cpus: 6 7 8 9 38 39 40 41 70 71 72 73
> node 0 size: 65536 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51 74 75 76 77 78 79
> node 1 size: 65536 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59 60 61
> node 2 size: 65536 MB
> node 3 cpus: 30 62
> node 3 size: 65536 MB
> node 4 cpus:
> node 4 size: 65536 MB
> node 5 cpus:
> node 5 size: 65536 MB
> node 6 cpus: 31 63
> node 6 size: 65536 MB
> node 7 cpus: 0 1 2 3 4 5 32 33 34 35 36 37 64 65 66 67 68 69
> node 7 size: 65536 MB
>
> With the patch applied :
> -----------------------
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 8 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> node 0 size: 65536 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19
> node 1 size: 65536 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29
> node 2 size: 65536 MB
> node 3 cpus: 30 31 32 33 34 35 36 37 38 39
> node 3 size: 65536 MB
> node 4 cpus: 40 41 42 43 44 45 46 47 48 49
> node 4 size: 65536 MB
> node 5 cpus: 50 51 52 53 54 55 56 57 58 59
> node 5 size: 65536 MB
> node 6 cpus: 60 61 62 63 64 65 66 67 68 69
> node 6 size: 65536 MB
> node 7 cpus: 70 71 72 73 74 75 76 77 78 79
>
> Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>, Jim Hull <jim.hull@hp.com>, Craig Hada <craig.hada@hp.com>
> ---
> cpus.c | 3 ++-
> hw/pc.c | 3 ++-
> sysemu.h | 3 ++-
> vl.c | 43 +++++++++++++++++++++----------------------
> 4 files changed, 27 insertions(+), 25 deletions(-)
>
> diff --git a/cpus.c b/cpus.c
> index b182b3d..acccd08 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -36,6 +36,7 @@
> #include "cpus.h"
> #include "qtest.h"
> #include "main-loop.h"
> +#include "bitmap.h"
>
> #ifndef _WIN32
> #include "compatfd.h"
> @@ -1145,7 +1146,7 @@ void set_numa_modes(void)
>
> for (env = first_cpu; env != NULL; env = env->next_cpu) {
> for (i = 0; i < nb_numa_nodes; i++) {
> - if (node_cpumask[i] & (1 << env->cpu_index)) {
> + if (test_bit(env->cpu_index, node_cpumask[i])) {
> env->numa_node = i;
> }
> }
> diff --git a/hw/pc.c b/hw/pc.c
> index c7e9ab3..2edcc07 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -48,6 +48,7 @@
> #include "memory.h"
> #include "exec-memory.h"
> #include "arch_init.h"
> +#include "bitmap.h"
>
> /* output Bochs bios info messages */
> //#define DEBUG_BIOS
> @@ -639,7 +640,7 @@ static void *bochs_bios_init(void)
> numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
> for (i = 0; i < max_cpus; i++) {
> for (j = 0; j < nb_numa_nodes; j++) {
> - if (node_cpumask[j] & (1 << i)) {
> + if (test_bit(i, node_cpumask[j])) {
> numa_fw_cfg[i + 1] = cpu_to_le64(j);
> break;
> }
> diff --git a/sysemu.h b/sysemu.h
> index 6540c79..4669348 100644
> --- a/sysemu.h
> +++ b/sysemu.h
> @@ -134,9 +134,10 @@ extern uint8_t qemu_extra_params_fw[2];
> extern QEMUClock *rtc_clock;
>
> #define MAX_NODES 64
> +#define MAX_CPUMASK_BITS 255
> extern int nb_numa_nodes;
> extern uint64_t node_mem[MAX_NODES];
> -extern uint64_t node_cpumask[MAX_NODES];
> +extern unsigned long *node_cpumask[MAX_NODES];
>
> #define MAX_OPTION_ROMS 16
> typedef struct QEMUOptionRom {
> diff --git a/vl.c b/vl.c
> index 46248b9..35a641b 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -28,6 +28,7 @@
> #include <errno.h>
> #include <sys/time.h>
> #include <zlib.h>
> +#include "bitmap.h"
>
> /* Needed early for CONFIG_BSD etc. */
> #include "config-host.h"
> @@ -240,7 +241,7 @@ QTAILQ_HEAD(, FWBootEntry) fw_boot_order = QTAILQ_HEAD_INITIALIZER(fw_boot_order
>
> int nb_numa_nodes;
> uint64_t node_mem[MAX_NODES];
> -uint64_t node_cpumask[MAX_NODES];
> +unsigned long *node_cpumask[MAX_NODES];
>
> uint8_t qemu_uuid[16];
>
> @@ -951,6 +952,8 @@ static void numa_add(const char *optarg)
> unsigned long long value, endvalue;
> int nodenr;
>
> + value = endvalue = 0ULL;
> +
> optarg = get_opt_name(option, 128, optarg, ',') + 1;
> if (!strcmp(option, "node")) {
> if (get_param_value(option, 128, "nodeid", optarg) == 0) {
> @@ -970,27 +973,22 @@ static void numa_add(const char *optarg)
> }
> node_mem[nodenr] = sval;
> }
> - if (get_param_value(option, 128, "cpus", optarg) == 0) {
> - node_cpumask[nodenr] = 0;
> - } else {
> + if (get_param_value(option, 128, "cpus", optarg) != 0) {
> value = strtoull(option, &endptr, 10);
> - if (value >= 64) {
> - value = 63;
> - fprintf(stderr, "only 64 CPUs in NUMA mode supported.\n");
> + if (*endptr == '-') {
> + endvalue = strtoull(endptr+1, &endptr, 10);
> } else {
> - if (*endptr == '-') {
> - endvalue = strtoull(endptr+1, &endptr, 10);
> - if (endvalue >= 63) {
> - endvalue = 62;
> - fprintf(stderr,
> - "only 63 CPUs in NUMA mode supported.\n");
> - }
> - value = (2ULL << endvalue) - (1ULL << value);
> - } else {
> - value = 1ULL << value;
> - }
> + endvalue = value;
> + }
> +
> + if (!(endvalue < MAX_CPUMASK_BITS)) {
> + endvalue = MAX_CPUMASK_BITS - 1;
> + fprintf(stderr,
> + "A max of %d CPUs are supported in a guest\n",
> + MAX_CPUMASK_BITS);
> }
> - node_cpumask[nodenr] = value;
> +
> + bitmap_set(node_cpumask[nodenr], value, endvalue-value+1);
> }
> nb_numa_nodes++;
> }
> @@ -2330,7 +2328,7 @@ int main(int argc, char **argv, char **envp)
>
> for (i = 0; i < MAX_NODES; i++) {
> node_mem[i] = 0;
> - node_cpumask[i] = 0;
> + node_cpumask[i] = bitmap_new(MAX_CPUMASK_BITS);
> }
>
> nb_numa_nodes = 0;
> @@ -3468,8 +3466,9 @@ int main(int argc, char **argv, char **envp)
> }
>
> for (i = 0; i < nb_numa_nodes; i++) {
> - if (node_cpumask[i] != 0)
> + if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
> break;
> + }
> }
> /* assigning the VCPUs round-robin is easier to implement, guest OSes
> * must cope with this anyway, because there are BIOSes out there in
> @@ -3477,7 +3476,7 @@ int main(int argc, char **argv, char **envp)
> */
> if (i == nb_numa_nodes) {
> for (i = 0; i < max_cpus; i++) {
> - node_cpumask[i % nb_numa_nodes] |= 1 << i;
> + set_bit(i, node_cpumask[i % nb_numa_nodes]);
> }
> }
> }
> --
> 1.7.1
>
--
Eduardo
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Qemu-devel] [PATCH v4] Fixes related to processing of qemu's -numa option
@ 2012-07-18 17:15 ` Eduardo Habkost
0 siblings, 0 replies; 10+ messages in thread
From: Eduardo Habkost @ 2012-07-18 17:15 UTC (permalink / raw)
To: Chegu Vinod; +Cc: aliguori, qemu-devel, kvm
On Mon, Jul 16, 2012 at 09:31:30PM -0700, Chegu Vinod wrote:
> Changes since v3:
> - using bitmap_set() instead of set_bit() in numa_add() routine.
> - removed call to bitmak_zero() since bitmap_new() also zeros' the bitmap.
> - Rebased to the latest qemu.
Tested-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>
> Changes since v2:
> - Using "unsigned long *" for the node_cpumask[].
> - Use bitmap_new() instead of g_malloc0() for allocation.
> - Don't rely on "max_cpus" since it may not be initialized
> before the numa related qemu options are parsed & processed.
>
> Note: Continuing to use a new constant for allocation of
> the mask (This constant is currently set to 255 since
> with an 8bit APIC ID VCPUs can range from 0-254 in a
> guest. The APIC ID 255 (0xFF) is reserved for broadcast).
>
> Changes since v1:
>
> - Use bitmap functions that are already in qemu (instead
> of cpu_set_t macro's from sched.h)
> - Added a check for endvalue >= max_cpus.
> - Fix to address the round-robbing assignment when
> cpu's are not explicitly specified.
> -----------------------------------------------
>
> v1:
> --
>
> The -numa option to qemu is used to create [fake] numa nodes
> and expose them to the guest OS instance.
>
> There are a couple of issues with the -numa option:
>
> a) Max VCPU's that can be specified for a guest while using
> the qemu's -numa option is 64. Due to a typecasting issue
> when the number of VCPUs is > 32 the VCPUs don't show up
> under the specified [fake] numa nodes.
>
> b) KVM currently has support for 160VCPUs per guest. The
> qemu's -numa option has only support for upto 64VCPUs
> per guest.
> This patch addresses these two issues.
>
> Below are examples of (a) and (b)
>
> a) >32 VCPUs are specified with the -numa option:
>
> /usr/local/bin/qemu-system-x86_64 \
> -enable-kvm \
> 71:01:01 \
> -net tap,ifname=tap0,script=no,downscript=no \
> -vnc :4
>
> ...
> Upstream qemu :
> --------------
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 32 33 34 35 36 37 38 39 40 41
> node 0 size: 131072 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51
> node 1 size: 131072 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59
> node 2 size: 131072 MB
> node 3 cpus: 30
> node 3 size: 131072 MB
> node 4 cpus:
> node 4 size: 131072 MB
> node 5 cpus: 31
> node 5 size: 131072 MB
>
> With the patch applied :
> -----------------------
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> node 0 size: 131072 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19
> node 1 size: 131072 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29
> node 2 size: 131072 MB
> node 3 cpus: 30 31 32 33 34 35 36 37 38 39
> node 3 size: 131072 MB
> node 4 cpus: 40 41 42 43 44 45 46 47 48 49
> node 4 size: 131072 MB
> node 5 cpus: 50 51 52 53 54 55 56 57 58 59
> node 5 size: 131072 MB
>
> b) >64 VCPUs specified with -numa option:
>
> /usr/local/bin/qemu-system-x86_64 \
> -enable-kvm \
> -cpu Westmere,+rdtscp,+pdpe1gb,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+d-vnc :4
>
> ...
>
> Upstream qemu :
> --------------
>
> only 63 CPUs in NUMA mode supported.
> only 64 CPUs in NUMA mode supported.
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 8 nodes
> node 0 cpus: 6 7 8 9 38 39 40 41 70 71 72 73
> node 0 size: 65536 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51 74 75 76 77 78 79
> node 1 size: 65536 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59 60 61
> node 2 size: 65536 MB
> node 3 cpus: 30 62
> node 3 size: 65536 MB
> node 4 cpus:
> node 4 size: 65536 MB
> node 5 cpus:
> node 5 size: 65536 MB
> node 6 cpus: 31 63
> node 6 size: 65536 MB
> node 7 cpus: 0 1 2 3 4 5 32 33 34 35 36 37 64 65 66 67 68 69
> node 7 size: 65536 MB
>
> With the patch applied :
> -----------------------
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 8 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> node 0 size: 65536 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19
> node 1 size: 65536 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29
> node 2 size: 65536 MB
> node 3 cpus: 30 31 32 33 34 35 36 37 38 39
> node 3 size: 65536 MB
> node 4 cpus: 40 41 42 43 44 45 46 47 48 49
> node 4 size: 65536 MB
> node 5 cpus: 50 51 52 53 54 55 56 57 58 59
> node 5 size: 65536 MB
> node 6 cpus: 60 61 62 63 64 65 66 67 68 69
> node 6 size: 65536 MB
> node 7 cpus: 70 71 72 73 74 75 76 77 78 79
>
> Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>, Jim Hull <jim.hull@hp.com>, Craig Hada <craig.hada@hp.com>
> ---
> cpus.c | 3 ++-
> hw/pc.c | 3 ++-
> sysemu.h | 3 ++-
> vl.c | 43 +++++++++++++++++++++----------------------
> 4 files changed, 27 insertions(+), 25 deletions(-)
>
> diff --git a/cpus.c b/cpus.c
> index b182b3d..acccd08 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -36,6 +36,7 @@
> #include "cpus.h"
> #include "qtest.h"
> #include "main-loop.h"
> +#include "bitmap.h"
>
> #ifndef _WIN32
> #include "compatfd.h"
> @@ -1145,7 +1146,7 @@ void set_numa_modes(void)
>
> for (env = first_cpu; env != NULL; env = env->next_cpu) {
> for (i = 0; i < nb_numa_nodes; i++) {
> - if (node_cpumask[i] & (1 << env->cpu_index)) {
> + if (test_bit(env->cpu_index, node_cpumask[i])) {
> env->numa_node = i;
> }
> }
> diff --git a/hw/pc.c b/hw/pc.c
> index c7e9ab3..2edcc07 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -48,6 +48,7 @@
> #include "memory.h"
> #include "exec-memory.h"
> #include "arch_init.h"
> +#include "bitmap.h"
>
> /* output Bochs bios info messages */
> //#define DEBUG_BIOS
> @@ -639,7 +640,7 @@ static void *bochs_bios_init(void)
> numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
> for (i = 0; i < max_cpus; i++) {
> for (j = 0; j < nb_numa_nodes; j++) {
> - if (node_cpumask[j] & (1 << i)) {
> + if (test_bit(i, node_cpumask[j])) {
> numa_fw_cfg[i + 1] = cpu_to_le64(j);
> break;
> }
> diff --git a/sysemu.h b/sysemu.h
> index 6540c79..4669348 100644
> --- a/sysemu.h
> +++ b/sysemu.h
> @@ -134,9 +134,10 @@ extern uint8_t qemu_extra_params_fw[2];
> extern QEMUClock *rtc_clock;
>
> #define MAX_NODES 64
> +#define MAX_CPUMASK_BITS 255
> extern int nb_numa_nodes;
> extern uint64_t node_mem[MAX_NODES];
> -extern uint64_t node_cpumask[MAX_NODES];
> +extern unsigned long *node_cpumask[MAX_NODES];
>
> #define MAX_OPTION_ROMS 16
> typedef struct QEMUOptionRom {
> diff --git a/vl.c b/vl.c
> index 46248b9..35a641b 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -28,6 +28,7 @@
> #include <errno.h>
> #include <sys/time.h>
> #include <zlib.h>
> +#include "bitmap.h"
>
> /* Needed early for CONFIG_BSD etc. */
> #include "config-host.h"
> @@ -240,7 +241,7 @@ QTAILQ_HEAD(, FWBootEntry) fw_boot_order = QTAILQ_HEAD_INITIALIZER(fw_boot_order
>
> int nb_numa_nodes;
> uint64_t node_mem[MAX_NODES];
> -uint64_t node_cpumask[MAX_NODES];
> +unsigned long *node_cpumask[MAX_NODES];
>
> uint8_t qemu_uuid[16];
>
> @@ -951,6 +952,8 @@ static void numa_add(const char *optarg)
> unsigned long long value, endvalue;
> int nodenr;
>
> + value = endvalue = 0ULL;
> +
> optarg = get_opt_name(option, 128, optarg, ',') + 1;
> if (!strcmp(option, "node")) {
> if (get_param_value(option, 128, "nodeid", optarg) == 0) {
> @@ -970,27 +973,22 @@ static void numa_add(const char *optarg)
> }
> node_mem[nodenr] = sval;
> }
> - if (get_param_value(option, 128, "cpus", optarg) == 0) {
> - node_cpumask[nodenr] = 0;
> - } else {
> + if (get_param_value(option, 128, "cpus", optarg) != 0) {
> value = strtoull(option, &endptr, 10);
> - if (value >= 64) {
> - value = 63;
> - fprintf(stderr, "only 64 CPUs in NUMA mode supported.\n");
> + if (*endptr == '-') {
> + endvalue = strtoull(endptr+1, &endptr, 10);
> } else {
> - if (*endptr == '-') {
> - endvalue = strtoull(endptr+1, &endptr, 10);
> - if (endvalue >= 63) {
> - endvalue = 62;
> - fprintf(stderr,
> - "only 63 CPUs in NUMA mode supported.\n");
> - }
> - value = (2ULL << endvalue) - (1ULL << value);
> - } else {
> - value = 1ULL << value;
> - }
> + endvalue = value;
> + }
> +
> + if (!(endvalue < MAX_CPUMASK_BITS)) {
> + endvalue = MAX_CPUMASK_BITS - 1;
> + fprintf(stderr,
> + "A max of %d CPUs are supported in a guest\n",
> + MAX_CPUMASK_BITS);
> }
> - node_cpumask[nodenr] = value;
> +
> + bitmap_set(node_cpumask[nodenr], value, endvalue-value+1);
> }
> nb_numa_nodes++;
> }
> @@ -2330,7 +2328,7 @@ int main(int argc, char **argv, char **envp)
>
> for (i = 0; i < MAX_NODES; i++) {
> node_mem[i] = 0;
> - node_cpumask[i] = 0;
> + node_cpumask[i] = bitmap_new(MAX_CPUMASK_BITS);
> }
>
> nb_numa_nodes = 0;
> @@ -3468,8 +3466,9 @@ int main(int argc, char **argv, char **envp)
> }
>
> for (i = 0; i < nb_numa_nodes; i++) {
> - if (node_cpumask[i] != 0)
> + if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
> break;
> + }
> }
> /* assigning the VCPUs round-robin is easier to implement, guest OSes
> * must cope with this anyway, because there are BIOSes out there in
> @@ -3477,7 +3476,7 @@ int main(int argc, char **argv, char **envp)
> */
> if (i == nb_numa_nodes) {
> for (i = 0; i < max_cpus; i++) {
> - node_cpumask[i % nb_numa_nodes] |= 1 << i;
> + set_bit(i, node_cpumask[i % nb_numa_nodes]);
> }
> }
> }
> --
> 1.7.1
>
--
Eduardo
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: [Qemu-devel] [PATCH v4] Fixes related to processing of qemu's -numa option
2012-07-18 17:15 ` [Qemu-devel] " Eduardo Habkost
@ 2012-07-18 18:26 ` Vinod, Chegu
-1 siblings, 0 replies; 10+ messages in thread
From: Vinod, Chegu @ 2012-07-18 18:26 UTC (permalink / raw)
To: Eduardo Habkost, aliguori; +Cc: qemu-devel, kvm
Thanks Eduardo !
Hi Anthony, If you are ok with this patch...could you pl pull these changes into upstream (or)
suggest who I should talk to get these changes in ?
Thanks!
Vinod
-----Original Message-----
From: Eduardo Habkost [mailto:ehabkost@redhat.com]
Sent: Wednesday, July 18, 2012 10:15 AM
To: Vinod, Chegu
Cc: qemu-devel@nongnu.org; aliguori@us.ibm.com; kvm@vger.kernel.org
Subject: Re: [Qemu-devel] [PATCH v4] Fixes related to processing of qemu's -numa option
On Mon, Jul 16, 2012 at 09:31:30PM -0700, Chegu Vinod wrote:
> Changes since v3:
> - using bitmap_set() instead of set_bit() in numa_add() routine.
> - removed call to bitmak_zero() since bitmap_new() also zeros' the bitmap.
> - Rebased to the latest qemu.
Tested-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>
> Changes since v2:
> - Using "unsigned long *" for the node_cpumask[].
> - Use bitmap_new() instead of g_malloc0() for allocation.
> - Don't rely on "max_cpus" since it may not be initialized
> before the numa related qemu options are parsed & processed.
>
> Note: Continuing to use a new constant for allocation of
> the mask (This constant is currently set to 255 since
> with an 8bit APIC ID VCPUs can range from 0-254 in a
> guest. The APIC ID 255 (0xFF) is reserved for broadcast).
>
> Changes since v1:
>
> - Use bitmap functions that are already in qemu (instead
> of cpu_set_t macro's from sched.h)
> - Added a check for endvalue >= max_cpus.
> - Fix to address the round-robbing assignment when
> cpu's are not explicitly specified.
> -----------------------------------------------
>
> v1:
> --
>
> The -numa option to qemu is used to create [fake] numa nodes and
> expose them to the guest OS instance.
>
> There are a couple of issues with the -numa option:
>
> a) Max VCPU's that can be specified for a guest while using
> the qemu's -numa option is 64. Due to a typecasting issue
> when the number of VCPUs is > 32 the VCPUs don't show up
> under the specified [fake] numa nodes.
>
> b) KVM currently has support for 160VCPUs per guest. The
> qemu's -numa option has only support for upto 64VCPUs
> per guest.
> This patch addresses these two issues.
>
> Below are examples of (a) and (b)
>
> a) >32 VCPUs are specified with the -numa option:
>
> /usr/local/bin/qemu-system-x86_64 \
> -enable-kvm \
> 71:01:01 \
> -net tap,ifname=tap0,script=no,downscript=no \ -vnc :4
>
> ...
> Upstream qemu :
> --------------
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 32 33 34 35 36 37 38 39 40 41 node 0
> size: 131072 MB node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45
> 46 47 48 49 50 51 node 1 size: 131072 MB node 2 cpus: 20 21 22 23 24
> 25 26 27 28 29 52 53 54 55 56 57 58 59 node 2 size: 131072 MB node 3
> cpus: 30 node 3 size: 131072 MB node 4 cpus:
> node 4 size: 131072 MB
> node 5 cpus: 31
> node 5 size: 131072 MB
>
> With the patch applied :
> -----------------------
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> node 0 size: 131072 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 node 1 size: 131072 MB node
> 2 cpus: 20 21 22 23 24 25 26 27 28 29 node 2 size: 131072 MB node 3
> cpus: 30 31 32 33 34 35 36 37 38 39 node 3 size: 131072 MB node 4
> cpus: 40 41 42 43 44 45 46 47 48 49 node 4 size: 131072 MB node 5
> cpus: 50 51 52 53 54 55 56 57 58 59 node 5 size: 131072 MB
>
> b) >64 VCPUs specified with -numa option:
>
> /usr/local/bin/qemu-system-x86_64 \
> -enable-kvm \
> -cpu
> Westmere,+rdtscp,+pdpe1gb,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl
> ,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+d-vnc :4
>
> ...
>
> Upstream qemu :
> --------------
>
> only 63 CPUs in NUMA mode supported.
> only 64 CPUs in NUMA mode supported.
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 8 nodes
> node 0 cpus: 6 7 8 9 38 39 40 41 70 71 72 73 node 0 size: 65536 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50
> 51 74 75 76 77 78 79 node 1 size: 65536 MB node 2 cpus: 20 21 22 23 24
> 25 26 27 28 29 52 53 54 55 56 57 58 59 60 61 node 2 size: 65536 MB
> node 3 cpus: 30 62 node 3 size: 65536 MB node 4 cpus:
> node 4 size: 65536 MB
> node 5 cpus:
> node 5 size: 65536 MB
> node 6 cpus: 31 63
> node 6 size: 65536 MB
> node 7 cpus: 0 1 2 3 4 5 32 33 34 35 36 37 64 65 66 67 68 69 node 7
> size: 65536 MB
>
> With the patch applied :
> -----------------------
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 8 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> node 0 size: 65536 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 node 1 size: 65536 MB node
> 2 cpus: 20 21 22 23 24 25 26 27 28 29 node 2 size: 65536 MB node 3
> cpus: 30 31 32 33 34 35 36 37 38 39 node 3 size: 65536 MB node 4 cpus:
> 40 41 42 43 44 45 46 47 48 49 node 4 size: 65536 MB node 5 cpus: 50 51
> 52 53 54 55 56 57 58 59 node 5 size: 65536 MB node 6 cpus: 60 61 62 63
> 64 65 66 67 68 69 node 6 size: 65536 MB node 7 cpus: 70 71 72 73 74 75
> 76 77 78 79
>
> Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>, Jim Hull
> <jim.hull@hp.com>, Craig Hada <craig.hada@hp.com>
> ---
> cpus.c | 3 ++-
> hw/pc.c | 3 ++-
> sysemu.h | 3 ++-
> vl.c | 43 +++++++++++++++++++++----------------------
> 4 files changed, 27 insertions(+), 25 deletions(-)
>
> diff --git a/cpus.c b/cpus.c
> index b182b3d..acccd08 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -36,6 +36,7 @@
> #include "cpus.h"
> #include "qtest.h"
> #include "main-loop.h"
> +#include "bitmap.h"
>
> #ifndef _WIN32
> #include "compatfd.h"
> @@ -1145,7 +1146,7 @@ void set_numa_modes(void)
>
> for (env = first_cpu; env != NULL; env = env->next_cpu) {
> for (i = 0; i < nb_numa_nodes; i++) {
> - if (node_cpumask[i] & (1 << env->cpu_index)) {
> + if (test_bit(env->cpu_index, node_cpumask[i])) {
> env->numa_node = i;
> }
> }
> diff --git a/hw/pc.c b/hw/pc.c
> index c7e9ab3..2edcc07 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -48,6 +48,7 @@
> #include "memory.h"
> #include "exec-memory.h"
> #include "arch_init.h"
> +#include "bitmap.h"
>
> /* output Bochs bios info messages */ //#define DEBUG_BIOS @@ -639,7
> +640,7 @@ static void *bochs_bios_init(void)
> numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
> for (i = 0; i < max_cpus; i++) {
> for (j = 0; j < nb_numa_nodes; j++) {
> - if (node_cpumask[j] & (1 << i)) {
> + if (test_bit(i, node_cpumask[j])) {
> numa_fw_cfg[i + 1] = cpu_to_le64(j);
> break;
> }
> diff --git a/sysemu.h b/sysemu.h
> index 6540c79..4669348 100644
> --- a/sysemu.h
> +++ b/sysemu.h
> @@ -134,9 +134,10 @@ extern uint8_t qemu_extra_params_fw[2]; extern
> QEMUClock *rtc_clock;
>
> #define MAX_NODES 64
> +#define MAX_CPUMASK_BITS 255
> extern int nb_numa_nodes;
> extern uint64_t node_mem[MAX_NODES];
> -extern uint64_t node_cpumask[MAX_NODES];
> +extern unsigned long *node_cpumask[MAX_NODES];
>
> #define MAX_OPTION_ROMS 16
> typedef struct QEMUOptionRom {
> diff --git a/vl.c b/vl.c
> index 46248b9..35a641b 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -28,6 +28,7 @@
> #include <errno.h>
> #include <sys/time.h>
> #include <zlib.h>
> +#include "bitmap.h"
>
> /* Needed early for CONFIG_BSD etc. */ #include "config-host.h"
> @@ -240,7 +241,7 @@ QTAILQ_HEAD(, FWBootEntry) fw_boot_order =
> QTAILQ_HEAD_INITIALIZER(fw_boot_order
>
> int nb_numa_nodes;
> uint64_t node_mem[MAX_NODES];
> -uint64_t node_cpumask[MAX_NODES];
> +unsigned long *node_cpumask[MAX_NODES];
>
> uint8_t qemu_uuid[16];
>
> @@ -951,6 +952,8 @@ static void numa_add(const char *optarg)
> unsigned long long value, endvalue;
> int nodenr;
>
> + value = endvalue = 0ULL;
> +
> optarg = get_opt_name(option, 128, optarg, ',') + 1;
> if (!strcmp(option, "node")) {
> if (get_param_value(option, 128, "nodeid", optarg) == 0) { @@
> -970,27 +973,22 @@ static void numa_add(const char *optarg)
> }
> node_mem[nodenr] = sval;
> }
> - if (get_param_value(option, 128, "cpus", optarg) == 0) {
> - node_cpumask[nodenr] = 0;
> - } else {
> + if (get_param_value(option, 128, "cpus", optarg) != 0) {
> value = strtoull(option, &endptr, 10);
> - if (value >= 64) {
> - value = 63;
> - fprintf(stderr, "only 64 CPUs in NUMA mode supported.\n");
> + if (*endptr == '-') {
> + endvalue = strtoull(endptr+1, &endptr, 10);
> } else {
> - if (*endptr == '-') {
> - endvalue = strtoull(endptr+1, &endptr, 10);
> - if (endvalue >= 63) {
> - endvalue = 62;
> - fprintf(stderr,
> - "only 63 CPUs in NUMA mode supported.\n");
> - }
> - value = (2ULL << endvalue) - (1ULL << value);
> - } else {
> - value = 1ULL << value;
> - }
> + endvalue = value;
> + }
> +
> + if (!(endvalue < MAX_CPUMASK_BITS)) {
> + endvalue = MAX_CPUMASK_BITS - 1;
> + fprintf(stderr,
> + "A max of %d CPUs are supported in a guest\n",
> + MAX_CPUMASK_BITS);
> }
> - node_cpumask[nodenr] = value;
> +
> + bitmap_set(node_cpumask[nodenr], value,
> + endvalue-value+1);
> }
> nb_numa_nodes++;
> }
> @@ -2330,7 +2328,7 @@ int main(int argc, char **argv, char **envp)
>
> for (i = 0; i < MAX_NODES; i++) {
> node_mem[i] = 0;
> - node_cpumask[i] = 0;
> + node_cpumask[i] = bitmap_new(MAX_CPUMASK_BITS);
> }
>
> nb_numa_nodes = 0;
> @@ -3468,8 +3466,9 @@ int main(int argc, char **argv, char **envp)
> }
>
> for (i = 0; i < nb_numa_nodes; i++) {
> - if (node_cpumask[i] != 0)
> + if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
> break;
> + }
> }
> /* assigning the VCPUs round-robin is easier to implement, guest OSes
> * must cope with this anyway, because there are BIOSes out
> there in @@ -3477,7 +3476,7 @@ int main(int argc, char **argv, char **envp)
> */
> if (i == nb_numa_nodes) {
> for (i = 0; i < max_cpus; i++) {
> - node_cpumask[i % nb_numa_nodes] |= 1 << i;
> + set_bit(i, node_cpumask[i % nb_numa_nodes]);
> }
> }
> }
> --
> 1.7.1
>
--
Eduardo
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Qemu-devel] [PATCH v4] Fixes related to processing of qemu's -numa option
@ 2012-07-18 18:26 ` Vinod, Chegu
0 siblings, 0 replies; 10+ messages in thread
From: Vinod, Chegu @ 2012-07-18 18:26 UTC (permalink / raw)
To: Eduardo Habkost, aliguori; +Cc: qemu-devel, kvm
Thanks Eduardo !
Hi Anthony, If you are ok with this patch...could you pl pull these changes into upstream (or)
suggest who I should talk to get these changes in ?
Thanks!
Vinod
-----Original Message-----
From: Eduardo Habkost [mailto:ehabkost@redhat.com]
Sent: Wednesday, July 18, 2012 10:15 AM
To: Vinod, Chegu
Cc: qemu-devel@nongnu.org; aliguori@us.ibm.com; kvm@vger.kernel.org
Subject: Re: [Qemu-devel] [PATCH v4] Fixes related to processing of qemu's -numa option
On Mon, Jul 16, 2012 at 09:31:30PM -0700, Chegu Vinod wrote:
> Changes since v3:
> - using bitmap_set() instead of set_bit() in numa_add() routine.
> - removed call to bitmak_zero() since bitmap_new() also zeros' the bitmap.
> - Rebased to the latest qemu.
Tested-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>
> Changes since v2:
> - Using "unsigned long *" for the node_cpumask[].
> - Use bitmap_new() instead of g_malloc0() for allocation.
> - Don't rely on "max_cpus" since it may not be initialized
> before the numa related qemu options are parsed & processed.
>
> Note: Continuing to use a new constant for allocation of
> the mask (This constant is currently set to 255 since
> with an 8bit APIC ID VCPUs can range from 0-254 in a
> guest. The APIC ID 255 (0xFF) is reserved for broadcast).
>
> Changes since v1:
>
> - Use bitmap functions that are already in qemu (instead
> of cpu_set_t macro's from sched.h)
> - Added a check for endvalue >= max_cpus.
> - Fix to address the round-robbing assignment when
> cpu's are not explicitly specified.
> -----------------------------------------------
>
> v1:
> --
>
> The -numa option to qemu is used to create [fake] numa nodes and
> expose them to the guest OS instance.
>
> There are a couple of issues with the -numa option:
>
> a) Max VCPU's that can be specified for a guest while using
> the qemu's -numa option is 64. Due to a typecasting issue
> when the number of VCPUs is > 32 the VCPUs don't show up
> under the specified [fake] numa nodes.
>
> b) KVM currently has support for 160VCPUs per guest. The
> qemu's -numa option has only support for upto 64VCPUs
> per guest.
> This patch addresses these two issues.
>
> Below are examples of (a) and (b)
>
> a) >32 VCPUs are specified with the -numa option:
>
> /usr/local/bin/qemu-system-x86_64 \
> -enable-kvm \
> 71:01:01 \
> -net tap,ifname=tap0,script=no,downscript=no \ -vnc :4
>
> ...
> Upstream qemu :
> --------------
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 32 33 34 35 36 37 38 39 40 41 node 0
> size: 131072 MB node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45
> 46 47 48 49 50 51 node 1 size: 131072 MB node 2 cpus: 20 21 22 23 24
> 25 26 27 28 29 52 53 54 55 56 57 58 59 node 2 size: 131072 MB node 3
> cpus: 30 node 3 size: 131072 MB node 4 cpus:
> node 4 size: 131072 MB
> node 5 cpus: 31
> node 5 size: 131072 MB
>
> With the patch applied :
> -----------------------
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> node 0 size: 131072 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 node 1 size: 131072 MB node
> 2 cpus: 20 21 22 23 24 25 26 27 28 29 node 2 size: 131072 MB node 3
> cpus: 30 31 32 33 34 35 36 37 38 39 node 3 size: 131072 MB node 4
> cpus: 40 41 42 43 44 45 46 47 48 49 node 4 size: 131072 MB node 5
> cpus: 50 51 52 53 54 55 56 57 58 59 node 5 size: 131072 MB
>
> b) >64 VCPUs specified with -numa option:
>
> /usr/local/bin/qemu-system-x86_64 \
> -enable-kvm \
> -cpu
> Westmere,+rdtscp,+pdpe1gb,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl
> ,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+d-vnc :4
>
> ...
>
> Upstream qemu :
> --------------
>
> only 63 CPUs in NUMA mode supported.
> only 64 CPUs in NUMA mode supported.
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 8 nodes
> node 0 cpus: 6 7 8 9 38 39 40 41 70 71 72 73 node 0 size: 65536 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50
> 51 74 75 76 77 78 79 node 1 size: 65536 MB node 2 cpus: 20 21 22 23 24
> 25 26 27 28 29 52 53 54 55 56 57 58 59 60 61 node 2 size: 65536 MB
> node 3 cpus: 30 62 node 3 size: 65536 MB node 4 cpus:
> node 4 size: 65536 MB
> node 5 cpus:
> node 5 size: 65536 MB
> node 6 cpus: 31 63
> node 6 size: 65536 MB
> node 7 cpus: 0 1 2 3 4 5 32 33 34 35 36 37 64 65 66 67 68 69 node 7
> size: 65536 MB
>
> With the patch applied :
> -----------------------
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 8 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> node 0 size: 65536 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 node 1 size: 65536 MB node
> 2 cpus: 20 21 22 23 24 25 26 27 28 29 node 2 size: 65536 MB node 3
> cpus: 30 31 32 33 34 35 36 37 38 39 node 3 size: 65536 MB node 4 cpus:
> 40 41 42 43 44 45 46 47 48 49 node 4 size: 65536 MB node 5 cpus: 50 51
> 52 53 54 55 56 57 58 59 node 5 size: 65536 MB node 6 cpus: 60 61 62 63
> 64 65 66 67 68 69 node 6 size: 65536 MB node 7 cpus: 70 71 72 73 74 75
> 76 77 78 79
>
> Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>, Jim Hull
> <jim.hull@hp.com>, Craig Hada <craig.hada@hp.com>
> ---
> cpus.c | 3 ++-
> hw/pc.c | 3 ++-
> sysemu.h | 3 ++-
> vl.c | 43 +++++++++++++++++++++----------------------
> 4 files changed, 27 insertions(+), 25 deletions(-)
>
> diff --git a/cpus.c b/cpus.c
> index b182b3d..acccd08 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -36,6 +36,7 @@
> #include "cpus.h"
> #include "qtest.h"
> #include "main-loop.h"
> +#include "bitmap.h"
>
> #ifndef _WIN32
> #include "compatfd.h"
> @@ -1145,7 +1146,7 @@ void set_numa_modes(void)
>
> for (env = first_cpu; env != NULL; env = env->next_cpu) {
> for (i = 0; i < nb_numa_nodes; i++) {
> - if (node_cpumask[i] & (1 << env->cpu_index)) {
> + if (test_bit(env->cpu_index, node_cpumask[i])) {
> env->numa_node = i;
> }
> }
> diff --git a/hw/pc.c b/hw/pc.c
> index c7e9ab3..2edcc07 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -48,6 +48,7 @@
> #include "memory.h"
> #include "exec-memory.h"
> #include "arch_init.h"
> +#include "bitmap.h"
>
> /* output Bochs bios info messages */ //#define DEBUG_BIOS @@ -639,7
> +640,7 @@ static void *bochs_bios_init(void)
> numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
> for (i = 0; i < max_cpus; i++) {
> for (j = 0; j < nb_numa_nodes; j++) {
> - if (node_cpumask[j] & (1 << i)) {
> + if (test_bit(i, node_cpumask[j])) {
> numa_fw_cfg[i + 1] = cpu_to_le64(j);
> break;
> }
> diff --git a/sysemu.h b/sysemu.h
> index 6540c79..4669348 100644
> --- a/sysemu.h
> +++ b/sysemu.h
> @@ -134,9 +134,10 @@ extern uint8_t qemu_extra_params_fw[2]; extern
> QEMUClock *rtc_clock;
>
> #define MAX_NODES 64
> +#define MAX_CPUMASK_BITS 255
> extern int nb_numa_nodes;
> extern uint64_t node_mem[MAX_NODES];
> -extern uint64_t node_cpumask[MAX_NODES];
> +extern unsigned long *node_cpumask[MAX_NODES];
>
> #define MAX_OPTION_ROMS 16
> typedef struct QEMUOptionRom {
> diff --git a/vl.c b/vl.c
> index 46248b9..35a641b 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -28,6 +28,7 @@
> #include <errno.h>
> #include <sys/time.h>
> #include <zlib.h>
> +#include "bitmap.h"
>
> /* Needed early for CONFIG_BSD etc. */ #include "config-host.h"
> @@ -240,7 +241,7 @@ QTAILQ_HEAD(, FWBootEntry) fw_boot_order =
> QTAILQ_HEAD_INITIALIZER(fw_boot_order
>
> int nb_numa_nodes;
> uint64_t node_mem[MAX_NODES];
> -uint64_t node_cpumask[MAX_NODES];
> +unsigned long *node_cpumask[MAX_NODES];
>
> uint8_t qemu_uuid[16];
>
> @@ -951,6 +952,8 @@ static void numa_add(const char *optarg)
> unsigned long long value, endvalue;
> int nodenr;
>
> + value = endvalue = 0ULL;
> +
> optarg = get_opt_name(option, 128, optarg, ',') + 1;
> if (!strcmp(option, "node")) {
> if (get_param_value(option, 128, "nodeid", optarg) == 0) { @@
> -970,27 +973,22 @@ static void numa_add(const char *optarg)
> }
> node_mem[nodenr] = sval;
> }
> - if (get_param_value(option, 128, "cpus", optarg) == 0) {
> - node_cpumask[nodenr] = 0;
> - } else {
> + if (get_param_value(option, 128, "cpus", optarg) != 0) {
> value = strtoull(option, &endptr, 10);
> - if (value >= 64) {
> - value = 63;
> - fprintf(stderr, "only 64 CPUs in NUMA mode supported.\n");
> + if (*endptr == '-') {
> + endvalue = strtoull(endptr+1, &endptr, 10);
> } else {
> - if (*endptr == '-') {
> - endvalue = strtoull(endptr+1, &endptr, 10);
> - if (endvalue >= 63) {
> - endvalue = 62;
> - fprintf(stderr,
> - "only 63 CPUs in NUMA mode supported.\n");
> - }
> - value = (2ULL << endvalue) - (1ULL << value);
> - } else {
> - value = 1ULL << value;
> - }
> + endvalue = value;
> + }
> +
> + if (!(endvalue < MAX_CPUMASK_BITS)) {
> + endvalue = MAX_CPUMASK_BITS - 1;
> + fprintf(stderr,
> + "A max of %d CPUs are supported in a guest\n",
> + MAX_CPUMASK_BITS);
> }
> - node_cpumask[nodenr] = value;
> +
> + bitmap_set(node_cpumask[nodenr], value,
> + endvalue-value+1);
> }
> nb_numa_nodes++;
> }
> @@ -2330,7 +2328,7 @@ int main(int argc, char **argv, char **envp)
>
> for (i = 0; i < MAX_NODES; i++) {
> node_mem[i] = 0;
> - node_cpumask[i] = 0;
> + node_cpumask[i] = bitmap_new(MAX_CPUMASK_BITS);
> }
>
> nb_numa_nodes = 0;
> @@ -3468,8 +3466,9 @@ int main(int argc, char **argv, char **envp)
> }
>
> for (i = 0; i < nb_numa_nodes; i++) {
> - if (node_cpumask[i] != 0)
> + if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
> break;
> + }
> }
> /* assigning the VCPUs round-robin is easier to implement, guest OSes
> * must cope with this anyway, because there are BIOSes out
> there in @@ -3477,7 +3476,7 @@ int main(int argc, char **argv, char **envp)
> */
> if (i == nb_numa_nodes) {
> for (i = 0; i < max_cpus; i++) {
> - node_cpumask[i % nb_numa_nodes] |= 1 << i;
> + set_bit(i, node_cpumask[i % nb_numa_nodes]);
> }
> }
> }
> --
> 1.7.1
>
--
Eduardo
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4] Fixes related to processing of qemu's -numa option
2012-07-18 18:26 ` Vinod, Chegu
@ 2012-07-31 13:28 ` Eduardo Habkost
-1 siblings, 0 replies; 10+ messages in thread
From: Eduardo Habkost @ 2012-07-31 13:28 UTC (permalink / raw)
To: Vinod, Chegu; +Cc: aliguori, qemu-devel, kvm
Ping? Can anybody help Vinod, here? How can we get the attention of some
maintainer, to get this pulled in?
On Wed, Jul 18, 2012 at 06:26:52PM +0000, Vinod, Chegu wrote:
> Thanks Eduardo !
>
> Hi Anthony, If you are ok with this patch...could you pl pull these changes into upstream (or)
> suggest who I should talk to get these changes in ?
>
> Thanks!
> Vinod
>
> -----Original Message-----
> From: Eduardo Habkost [mailto:ehabkost@redhat.com]
> Sent: Wednesday, July 18, 2012 10:15 AM
> To: Vinod, Chegu
> Cc: qemu-devel@nongnu.org; aliguori@us.ibm.com; kvm@vger.kernel.org
> Subject: Re: [Qemu-devel] [PATCH v4] Fixes related to processing of qemu's -numa option
>
> On Mon, Jul 16, 2012 at 09:31:30PM -0700, Chegu Vinod wrote:
> > Changes since v3:
> > - using bitmap_set() instead of set_bit() in numa_add() routine.
> > - removed call to bitmak_zero() since bitmap_new() also zeros' the bitmap.
> > - Rebased to the latest qemu.
>
> Tested-by: Eduardo Habkost <ehabkost@redhat.com>
> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>
>
> >
> > Changes since v2:
> > - Using "unsigned long *" for the node_cpumask[].
> > - Use bitmap_new() instead of g_malloc0() for allocation.
> > - Don't rely on "max_cpus" since it may not be initialized
> > before the numa related qemu options are parsed & processed.
> >
> > Note: Continuing to use a new constant for allocation of
> > the mask (This constant is currently set to 255 since
> > with an 8bit APIC ID VCPUs can range from 0-254 in a
> > guest. The APIC ID 255 (0xFF) is reserved for broadcast).
> >
> > Changes since v1:
> >
> > - Use bitmap functions that are already in qemu (instead
> > of cpu_set_t macro's from sched.h)
> > - Added a check for endvalue >= max_cpus.
> > - Fix to address the round-robbing assignment when
> > cpu's are not explicitly specified.
> > -----------------------------------------------
> >
> > v1:
> > --
> >
> > The -numa option to qemu is used to create [fake] numa nodes and
> > expose them to the guest OS instance.
> >
> > There are a couple of issues with the -numa option:
> >
> > a) Max VCPU's that can be specified for a guest while using
> > the qemu's -numa option is 64. Due to a typecasting issue
> > when the number of VCPUs is > 32 the VCPUs don't show up
> > under the specified [fake] numa nodes.
> >
> > b) KVM currently has support for 160VCPUs per guest. The
> > qemu's -numa option has only support for upto 64VCPUs
> > per guest.
> > This patch addresses these two issues.
> >
> > Below are examples of (a) and (b)
> >
> > a) >32 VCPUs are specified with the -numa option:
> >
> > /usr/local/bin/qemu-system-x86_64 \
> > -enable-kvm \
> > 71:01:01 \
> > -net tap,ifname=tap0,script=no,downscript=no \ -vnc :4
> >
> > ...
> > Upstream qemu :
> > --------------
> >
> > QEMU 1.1.50 monitor - type 'help' for more information
> > (qemu) info numa
> > 6 nodes
> > node 0 cpus: 0 1 2 3 4 5 6 7 8 9 32 33 34 35 36 37 38 39 40 41 node 0
> > size: 131072 MB node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45
> > 46 47 48 49 50 51 node 1 size: 131072 MB node 2 cpus: 20 21 22 23 24
> > 25 26 27 28 29 52 53 54 55 56 57 58 59 node 2 size: 131072 MB node 3
> > cpus: 30 node 3 size: 131072 MB node 4 cpus:
> > node 4 size: 131072 MB
> > node 5 cpus: 31
> > node 5 size: 131072 MB
> >
> > With the patch applied :
> > -----------------------
> >
> > QEMU 1.1.50 monitor - type 'help' for more information
> > (qemu) info numa
> > 6 nodes
> > node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> > node 0 size: 131072 MB
> > node 1 cpus: 10 11 12 13 14 15 16 17 18 19 node 1 size: 131072 MB node
> > 2 cpus: 20 21 22 23 24 25 26 27 28 29 node 2 size: 131072 MB node 3
> > cpus: 30 31 32 33 34 35 36 37 38 39 node 3 size: 131072 MB node 4
> > cpus: 40 41 42 43 44 45 46 47 48 49 node 4 size: 131072 MB node 5
> > cpus: 50 51 52 53 54 55 56 57 58 59 node 5 size: 131072 MB
> >
> > b) >64 VCPUs specified with -numa option:
> >
> > /usr/local/bin/qemu-system-x86_64 \
> > -enable-kvm \
> > -cpu
> > Westmere,+rdtscp,+pdpe1gb,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl
> > ,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+d-vnc :4
> >
> > ...
> >
> > Upstream qemu :
> > --------------
> >
> > only 63 CPUs in NUMA mode supported.
> > only 64 CPUs in NUMA mode supported.
> > QEMU 1.1.50 monitor - type 'help' for more information
> > (qemu) info numa
> > 8 nodes
> > node 0 cpus: 6 7 8 9 38 39 40 41 70 71 72 73 node 0 size: 65536 MB
> > node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50
> > 51 74 75 76 77 78 79 node 1 size: 65536 MB node 2 cpus: 20 21 22 23 24
> > 25 26 27 28 29 52 53 54 55 56 57 58 59 60 61 node 2 size: 65536 MB
> > node 3 cpus: 30 62 node 3 size: 65536 MB node 4 cpus:
> > node 4 size: 65536 MB
> > node 5 cpus:
> > node 5 size: 65536 MB
> > node 6 cpus: 31 63
> > node 6 size: 65536 MB
> > node 7 cpus: 0 1 2 3 4 5 32 33 34 35 36 37 64 65 66 67 68 69 node 7
> > size: 65536 MB
> >
> > With the patch applied :
> > -----------------------
> >
> > QEMU 1.1.50 monitor - type 'help' for more information
> > (qemu) info numa
> > 8 nodes
> > node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> > node 0 size: 65536 MB
> > node 1 cpus: 10 11 12 13 14 15 16 17 18 19 node 1 size: 65536 MB node
> > 2 cpus: 20 21 22 23 24 25 26 27 28 29 node 2 size: 65536 MB node 3
> > cpus: 30 31 32 33 34 35 36 37 38 39 node 3 size: 65536 MB node 4 cpus:
> > 40 41 42 43 44 45 46 47 48 49 node 4 size: 65536 MB node 5 cpus: 50 51
> > 52 53 54 55 56 57 58 59 node 5 size: 65536 MB node 6 cpus: 60 61 62 63
> > 64 65 66 67 68 69 node 6 size: 65536 MB node 7 cpus: 70 71 72 73 74 75
> > 76 77 78 79
> >
> > Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>, Jim Hull
> > <jim.hull@hp.com>, Craig Hada <craig.hada@hp.com>
> > ---
> > cpus.c | 3 ++-
> > hw/pc.c | 3 ++-
> > sysemu.h | 3 ++-
> > vl.c | 43 +++++++++++++++++++++----------------------
> > 4 files changed, 27 insertions(+), 25 deletions(-)
> >
> > diff --git a/cpus.c b/cpus.c
> > index b182b3d..acccd08 100644
> > --- a/cpus.c
> > +++ b/cpus.c
> > @@ -36,6 +36,7 @@
> > #include "cpus.h"
> > #include "qtest.h"
> > #include "main-loop.h"
> > +#include "bitmap.h"
> >
> > #ifndef _WIN32
> > #include "compatfd.h"
> > @@ -1145,7 +1146,7 @@ void set_numa_modes(void)
> >
> > for (env = first_cpu; env != NULL; env = env->next_cpu) {
> > for (i = 0; i < nb_numa_nodes; i++) {
> > - if (node_cpumask[i] & (1 << env->cpu_index)) {
> > + if (test_bit(env->cpu_index, node_cpumask[i])) {
> > env->numa_node = i;
> > }
> > }
> > diff --git a/hw/pc.c b/hw/pc.c
> > index c7e9ab3..2edcc07 100644
> > --- a/hw/pc.c
> > +++ b/hw/pc.c
> > @@ -48,6 +48,7 @@
> > #include "memory.h"
> > #include "exec-memory.h"
> > #include "arch_init.h"
> > +#include "bitmap.h"
> >
> > /* output Bochs bios info messages */ //#define DEBUG_BIOS @@ -639,7
> > +640,7 @@ static void *bochs_bios_init(void)
> > numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
> > for (i = 0; i < max_cpus; i++) {
> > for (j = 0; j < nb_numa_nodes; j++) {
> > - if (node_cpumask[j] & (1 << i)) {
> > + if (test_bit(i, node_cpumask[j])) {
> > numa_fw_cfg[i + 1] = cpu_to_le64(j);
> > break;
> > }
> > diff --git a/sysemu.h b/sysemu.h
> > index 6540c79..4669348 100644
> > --- a/sysemu.h
> > +++ b/sysemu.h
> > @@ -134,9 +134,10 @@ extern uint8_t qemu_extra_params_fw[2]; extern
> > QEMUClock *rtc_clock;
> >
> > #define MAX_NODES 64
> > +#define MAX_CPUMASK_BITS 255
> > extern int nb_numa_nodes;
> > extern uint64_t node_mem[MAX_NODES];
> > -extern uint64_t node_cpumask[MAX_NODES];
> > +extern unsigned long *node_cpumask[MAX_NODES];
> >
> > #define MAX_OPTION_ROMS 16
> > typedef struct QEMUOptionRom {
> > diff --git a/vl.c b/vl.c
> > index 46248b9..35a641b 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -28,6 +28,7 @@
> > #include <errno.h>
> > #include <sys/time.h>
> > #include <zlib.h>
> > +#include "bitmap.h"
> >
> > /* Needed early for CONFIG_BSD etc. */ #include "config-host.h"
> > @@ -240,7 +241,7 @@ QTAILQ_HEAD(, FWBootEntry) fw_boot_order =
> > QTAILQ_HEAD_INITIALIZER(fw_boot_order
> >
> > int nb_numa_nodes;
> > uint64_t node_mem[MAX_NODES];
> > -uint64_t node_cpumask[MAX_NODES];
> > +unsigned long *node_cpumask[MAX_NODES];
> >
> > uint8_t qemu_uuid[16];
> >
> > @@ -951,6 +952,8 @@ static void numa_add(const char *optarg)
> > unsigned long long value, endvalue;
> > int nodenr;
> >
> > + value = endvalue = 0ULL;
> > +
> > optarg = get_opt_name(option, 128, optarg, ',') + 1;
> > if (!strcmp(option, "node")) {
> > if (get_param_value(option, 128, "nodeid", optarg) == 0) { @@
> > -970,27 +973,22 @@ static void numa_add(const char *optarg)
> > }
> > node_mem[nodenr] = sval;
> > }
> > - if (get_param_value(option, 128, "cpus", optarg) == 0) {
> > - node_cpumask[nodenr] = 0;
> > - } else {
> > + if (get_param_value(option, 128, "cpus", optarg) != 0) {
> > value = strtoull(option, &endptr, 10);
> > - if (value >= 64) {
> > - value = 63;
> > - fprintf(stderr, "only 64 CPUs in NUMA mode supported.\n");
> > + if (*endptr == '-') {
> > + endvalue = strtoull(endptr+1, &endptr, 10);
> > } else {
> > - if (*endptr == '-') {
> > - endvalue = strtoull(endptr+1, &endptr, 10);
> > - if (endvalue >= 63) {
> > - endvalue = 62;
> > - fprintf(stderr,
> > - "only 63 CPUs in NUMA mode supported.\n");
> > - }
> > - value = (2ULL << endvalue) - (1ULL << value);
> > - } else {
> > - value = 1ULL << value;
> > - }
> > + endvalue = value;
> > + }
> > +
> > + if (!(endvalue < MAX_CPUMASK_BITS)) {
> > + endvalue = MAX_CPUMASK_BITS - 1;
> > + fprintf(stderr,
> > + "A max of %d CPUs are supported in a guest\n",
> > + MAX_CPUMASK_BITS);
> > }
> > - node_cpumask[nodenr] = value;
> > +
> > + bitmap_set(node_cpumask[nodenr], value,
> > + endvalue-value+1);
> > }
> > nb_numa_nodes++;
> > }
> > @@ -2330,7 +2328,7 @@ int main(int argc, char **argv, char **envp)
> >
> > for (i = 0; i < MAX_NODES; i++) {
> > node_mem[i] = 0;
> > - node_cpumask[i] = 0;
> > + node_cpumask[i] = bitmap_new(MAX_CPUMASK_BITS);
> > }
> >
> > nb_numa_nodes = 0;
> > @@ -3468,8 +3466,9 @@ int main(int argc, char **argv, char **envp)
> > }
> >
> > for (i = 0; i < nb_numa_nodes; i++) {
> > - if (node_cpumask[i] != 0)
> > + if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
> > break;
> > + }
> > }
> > /* assigning the VCPUs round-robin is easier to implement, guest OSes
> > * must cope with this anyway, because there are BIOSes out
> > there in @@ -3477,7 +3476,7 @@ int main(int argc, char **argv, char **envp)
> > */
> > if (i == nb_numa_nodes) {
> > for (i = 0; i < max_cpus; i++) {
> > - node_cpumask[i % nb_numa_nodes] |= 1 << i;
> > + set_bit(i, node_cpumask[i % nb_numa_nodes]);
> > }
> > }
> > }
> > --
> > 1.7.1
> >
>
> --
> Eduardo
--
Eduardo
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Qemu-devel] [PATCH v4] Fixes related to processing of qemu's -numa option
@ 2012-07-31 13:28 ` Eduardo Habkost
0 siblings, 0 replies; 10+ messages in thread
From: Eduardo Habkost @ 2012-07-31 13:28 UTC (permalink / raw)
To: Vinod, Chegu; +Cc: aliguori, qemu-devel, kvm
Ping? Can anybody help Vinod, here? How can we get the attention of some
maintainer, to get this pulled in?
On Wed, Jul 18, 2012 at 06:26:52PM +0000, Vinod, Chegu wrote:
> Thanks Eduardo !
>
> Hi Anthony, If you are ok with this patch...could you pl pull these changes into upstream (or)
> suggest who I should talk to get these changes in ?
>
> Thanks!
> Vinod
>
> -----Original Message-----
> From: Eduardo Habkost [mailto:ehabkost@redhat.com]
> Sent: Wednesday, July 18, 2012 10:15 AM
> To: Vinod, Chegu
> Cc: qemu-devel@nongnu.org; aliguori@us.ibm.com; kvm@vger.kernel.org
> Subject: Re: [Qemu-devel] [PATCH v4] Fixes related to processing of qemu's -numa option
>
> On Mon, Jul 16, 2012 at 09:31:30PM -0700, Chegu Vinod wrote:
> > Changes since v3:
> > - using bitmap_set() instead of set_bit() in numa_add() routine.
> > - removed call to bitmak_zero() since bitmap_new() also zeros' the bitmap.
> > - Rebased to the latest qemu.
>
> Tested-by: Eduardo Habkost <ehabkost@redhat.com>
> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>
>
> >
> > Changes since v2:
> > - Using "unsigned long *" for the node_cpumask[].
> > - Use bitmap_new() instead of g_malloc0() for allocation.
> > - Don't rely on "max_cpus" since it may not be initialized
> > before the numa related qemu options are parsed & processed.
> >
> > Note: Continuing to use a new constant for allocation of
> > the mask (This constant is currently set to 255 since
> > with an 8bit APIC ID VCPUs can range from 0-254 in a
> > guest. The APIC ID 255 (0xFF) is reserved for broadcast).
> >
> > Changes since v1:
> >
> > - Use bitmap functions that are already in qemu (instead
> > of cpu_set_t macro's from sched.h)
> > - Added a check for endvalue >= max_cpus.
> > - Fix to address the round-robbing assignment when
> > cpu's are not explicitly specified.
> > -----------------------------------------------
> >
> > v1:
> > --
> >
> > The -numa option to qemu is used to create [fake] numa nodes and
> > expose them to the guest OS instance.
> >
> > There are a couple of issues with the -numa option:
> >
> > a) Max VCPU's that can be specified for a guest while using
> > the qemu's -numa option is 64. Due to a typecasting issue
> > when the number of VCPUs is > 32 the VCPUs don't show up
> > under the specified [fake] numa nodes.
> >
> > b) KVM currently has support for 160VCPUs per guest. The
> > qemu's -numa option has only support for upto 64VCPUs
> > per guest.
> > This patch addresses these two issues.
> >
> > Below are examples of (a) and (b)
> >
> > a) >32 VCPUs are specified with the -numa option:
> >
> > /usr/local/bin/qemu-system-x86_64 \
> > -enable-kvm \
> > 71:01:01 \
> > -net tap,ifname=tap0,script=no,downscript=no \ -vnc :4
> >
> > ...
> > Upstream qemu :
> > --------------
> >
> > QEMU 1.1.50 monitor - type 'help' for more information
> > (qemu) info numa
> > 6 nodes
> > node 0 cpus: 0 1 2 3 4 5 6 7 8 9 32 33 34 35 36 37 38 39 40 41 node 0
> > size: 131072 MB node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45
> > 46 47 48 49 50 51 node 1 size: 131072 MB node 2 cpus: 20 21 22 23 24
> > 25 26 27 28 29 52 53 54 55 56 57 58 59 node 2 size: 131072 MB node 3
> > cpus: 30 node 3 size: 131072 MB node 4 cpus:
> > node 4 size: 131072 MB
> > node 5 cpus: 31
> > node 5 size: 131072 MB
> >
> > With the patch applied :
> > -----------------------
> >
> > QEMU 1.1.50 monitor - type 'help' for more information
> > (qemu) info numa
> > 6 nodes
> > node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> > node 0 size: 131072 MB
> > node 1 cpus: 10 11 12 13 14 15 16 17 18 19 node 1 size: 131072 MB node
> > 2 cpus: 20 21 22 23 24 25 26 27 28 29 node 2 size: 131072 MB node 3
> > cpus: 30 31 32 33 34 35 36 37 38 39 node 3 size: 131072 MB node 4
> > cpus: 40 41 42 43 44 45 46 47 48 49 node 4 size: 131072 MB node 5
> > cpus: 50 51 52 53 54 55 56 57 58 59 node 5 size: 131072 MB
> >
> > b) >64 VCPUs specified with -numa option:
> >
> > /usr/local/bin/qemu-system-x86_64 \
> > -enable-kvm \
> > -cpu
> > Westmere,+rdtscp,+pdpe1gb,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl
> > ,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+d-vnc :4
> >
> > ...
> >
> > Upstream qemu :
> > --------------
> >
> > only 63 CPUs in NUMA mode supported.
> > only 64 CPUs in NUMA mode supported.
> > QEMU 1.1.50 monitor - type 'help' for more information
> > (qemu) info numa
> > 8 nodes
> > node 0 cpus: 6 7 8 9 38 39 40 41 70 71 72 73 node 0 size: 65536 MB
> > node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50
> > 51 74 75 76 77 78 79 node 1 size: 65536 MB node 2 cpus: 20 21 22 23 24
> > 25 26 27 28 29 52 53 54 55 56 57 58 59 60 61 node 2 size: 65536 MB
> > node 3 cpus: 30 62 node 3 size: 65536 MB node 4 cpus:
> > node 4 size: 65536 MB
> > node 5 cpus:
> > node 5 size: 65536 MB
> > node 6 cpus: 31 63
> > node 6 size: 65536 MB
> > node 7 cpus: 0 1 2 3 4 5 32 33 34 35 36 37 64 65 66 67 68 69 node 7
> > size: 65536 MB
> >
> > With the patch applied :
> > -----------------------
> >
> > QEMU 1.1.50 monitor - type 'help' for more information
> > (qemu) info numa
> > 8 nodes
> > node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> > node 0 size: 65536 MB
> > node 1 cpus: 10 11 12 13 14 15 16 17 18 19 node 1 size: 65536 MB node
> > 2 cpus: 20 21 22 23 24 25 26 27 28 29 node 2 size: 65536 MB node 3
> > cpus: 30 31 32 33 34 35 36 37 38 39 node 3 size: 65536 MB node 4 cpus:
> > 40 41 42 43 44 45 46 47 48 49 node 4 size: 65536 MB node 5 cpus: 50 51
> > 52 53 54 55 56 57 58 59 node 5 size: 65536 MB node 6 cpus: 60 61 62 63
> > 64 65 66 67 68 69 node 6 size: 65536 MB node 7 cpus: 70 71 72 73 74 75
> > 76 77 78 79
> >
> > Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>, Jim Hull
> > <jim.hull@hp.com>, Craig Hada <craig.hada@hp.com>
> > ---
> > cpus.c | 3 ++-
> > hw/pc.c | 3 ++-
> > sysemu.h | 3 ++-
> > vl.c | 43 +++++++++++++++++++++----------------------
> > 4 files changed, 27 insertions(+), 25 deletions(-)
> >
> > diff --git a/cpus.c b/cpus.c
> > index b182b3d..acccd08 100644
> > --- a/cpus.c
> > +++ b/cpus.c
> > @@ -36,6 +36,7 @@
> > #include "cpus.h"
> > #include "qtest.h"
> > #include "main-loop.h"
> > +#include "bitmap.h"
> >
> > #ifndef _WIN32
> > #include "compatfd.h"
> > @@ -1145,7 +1146,7 @@ void set_numa_modes(void)
> >
> > for (env = first_cpu; env != NULL; env = env->next_cpu) {
> > for (i = 0; i < nb_numa_nodes; i++) {
> > - if (node_cpumask[i] & (1 << env->cpu_index)) {
> > + if (test_bit(env->cpu_index, node_cpumask[i])) {
> > env->numa_node = i;
> > }
> > }
> > diff --git a/hw/pc.c b/hw/pc.c
> > index c7e9ab3..2edcc07 100644
> > --- a/hw/pc.c
> > +++ b/hw/pc.c
> > @@ -48,6 +48,7 @@
> > #include "memory.h"
> > #include "exec-memory.h"
> > #include "arch_init.h"
> > +#include "bitmap.h"
> >
> > /* output Bochs bios info messages */ //#define DEBUG_BIOS @@ -639,7
> > +640,7 @@ static void *bochs_bios_init(void)
> > numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
> > for (i = 0; i < max_cpus; i++) {
> > for (j = 0; j < nb_numa_nodes; j++) {
> > - if (node_cpumask[j] & (1 << i)) {
> > + if (test_bit(i, node_cpumask[j])) {
> > numa_fw_cfg[i + 1] = cpu_to_le64(j);
> > break;
> > }
> > diff --git a/sysemu.h b/sysemu.h
> > index 6540c79..4669348 100644
> > --- a/sysemu.h
> > +++ b/sysemu.h
> > @@ -134,9 +134,10 @@ extern uint8_t qemu_extra_params_fw[2]; extern
> > QEMUClock *rtc_clock;
> >
> > #define MAX_NODES 64
> > +#define MAX_CPUMASK_BITS 255
> > extern int nb_numa_nodes;
> > extern uint64_t node_mem[MAX_NODES];
> > -extern uint64_t node_cpumask[MAX_NODES];
> > +extern unsigned long *node_cpumask[MAX_NODES];
> >
> > #define MAX_OPTION_ROMS 16
> > typedef struct QEMUOptionRom {
> > diff --git a/vl.c b/vl.c
> > index 46248b9..35a641b 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -28,6 +28,7 @@
> > #include <errno.h>
> > #include <sys/time.h>
> > #include <zlib.h>
> > +#include "bitmap.h"
> >
> > /* Needed early for CONFIG_BSD etc. */ #include "config-host.h"
> > @@ -240,7 +241,7 @@ QTAILQ_HEAD(, FWBootEntry) fw_boot_order =
> > QTAILQ_HEAD_INITIALIZER(fw_boot_order
> >
> > int nb_numa_nodes;
> > uint64_t node_mem[MAX_NODES];
> > -uint64_t node_cpumask[MAX_NODES];
> > +unsigned long *node_cpumask[MAX_NODES];
> >
> > uint8_t qemu_uuid[16];
> >
> > @@ -951,6 +952,8 @@ static void numa_add(const char *optarg)
> > unsigned long long value, endvalue;
> > int nodenr;
> >
> > + value = endvalue = 0ULL;
> > +
> > optarg = get_opt_name(option, 128, optarg, ',') + 1;
> > if (!strcmp(option, "node")) {
> > if (get_param_value(option, 128, "nodeid", optarg) == 0) { @@
> > -970,27 +973,22 @@ static void numa_add(const char *optarg)
> > }
> > node_mem[nodenr] = sval;
> > }
> > - if (get_param_value(option, 128, "cpus", optarg) == 0) {
> > - node_cpumask[nodenr] = 0;
> > - } else {
> > + if (get_param_value(option, 128, "cpus", optarg) != 0) {
> > value = strtoull(option, &endptr, 10);
> > - if (value >= 64) {
> > - value = 63;
> > - fprintf(stderr, "only 64 CPUs in NUMA mode supported.\n");
> > + if (*endptr == '-') {
> > + endvalue = strtoull(endptr+1, &endptr, 10);
> > } else {
> > - if (*endptr == '-') {
> > - endvalue = strtoull(endptr+1, &endptr, 10);
> > - if (endvalue >= 63) {
> > - endvalue = 62;
> > - fprintf(stderr,
> > - "only 63 CPUs in NUMA mode supported.\n");
> > - }
> > - value = (2ULL << endvalue) - (1ULL << value);
> > - } else {
> > - value = 1ULL << value;
> > - }
> > + endvalue = value;
> > + }
> > +
> > + if (!(endvalue < MAX_CPUMASK_BITS)) {
> > + endvalue = MAX_CPUMASK_BITS - 1;
> > + fprintf(stderr,
> > + "A max of %d CPUs are supported in a guest\n",
> > + MAX_CPUMASK_BITS);
> > }
> > - node_cpumask[nodenr] = value;
> > +
> > + bitmap_set(node_cpumask[nodenr], value,
> > + endvalue-value+1);
> > }
> > nb_numa_nodes++;
> > }
> > @@ -2330,7 +2328,7 @@ int main(int argc, char **argv, char **envp)
> >
> > for (i = 0; i < MAX_NODES; i++) {
> > node_mem[i] = 0;
> > - node_cpumask[i] = 0;
> > + node_cpumask[i] = bitmap_new(MAX_CPUMASK_BITS);
> > }
> >
> > nb_numa_nodes = 0;
> > @@ -3468,8 +3466,9 @@ int main(int argc, char **argv, char **envp)
> > }
> >
> > for (i = 0; i < nb_numa_nodes; i++) {
> > - if (node_cpumask[i] != 0)
> > + if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
> > break;
> > + }
> > }
> > /* assigning the VCPUs round-robin is easier to implement, guest OSes
> > * must cope with this anyway, because there are BIOSes out
> > there in @@ -3477,7 +3476,7 @@ int main(int argc, char **argv, char **envp)
> > */
> > if (i == nb_numa_nodes) {
> > for (i = 0; i < max_cpus; i++) {
> > - node_cpumask[i % nb_numa_nodes] |= 1 << i;
> > + set_bit(i, node_cpumask[i % nb_numa_nodes]);
> > }
> > }
> > }
> > --
> > 1.7.1
> >
>
> --
> Eduardo
--
Eduardo
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4] Fixes related to processing of qemu's -numa option
2012-07-17 4:31 ` [Qemu-devel] " Chegu Vinod
@ 2012-08-04 16:13 ` Blue Swirl
-1 siblings, 0 replies; 10+ messages in thread
From: Blue Swirl @ 2012-08-04 16:13 UTC (permalink / raw)
To: Chegu Vinod; +Cc: aliguori, qemu-devel, kvm, ehabkost
Thanks, applied.
On Tue, Jul 17, 2012 at 4:31 AM, Chegu Vinod <chegu_vinod@hp.com> wrote:
> Changes since v3:
> - using bitmap_set() instead of set_bit() in numa_add() routine.
> - removed call to bitmak_zero() since bitmap_new() also zeros' the bitmap.
> - Rebased to the latest qemu.
>
> Changes since v2:
> - Using "unsigned long *" for the node_cpumask[].
> - Use bitmap_new() instead of g_malloc0() for allocation.
> - Don't rely on "max_cpus" since it may not be initialized
> before the numa related qemu options are parsed & processed.
>
> Note: Continuing to use a new constant for allocation of
> the mask (This constant is currently set to 255 since
> with an 8bit APIC ID VCPUs can range from 0-254 in a
> guest. The APIC ID 255 (0xFF) is reserved for broadcast).
>
> Changes since v1:
>
> - Use bitmap functions that are already in qemu (instead
> of cpu_set_t macro's from sched.h)
> - Added a check for endvalue >= max_cpus.
> - Fix to address the round-robbing assignment when
> cpu's are not explicitly specified.
> -----------------------------------------------
>
> v1:
> --
>
> The -numa option to qemu is used to create [fake] numa nodes
> and expose them to the guest OS instance.
>
> There are a couple of issues with the -numa option:
>
> a) Max VCPU's that can be specified for a guest while using
> the qemu's -numa option is 64. Due to a typecasting issue
> when the number of VCPUs is > 32 the VCPUs don't show up
> under the specified [fake] numa nodes.
>
> b) KVM currently has support for 160VCPUs per guest. The
> qemu's -numa option has only support for upto 64VCPUs
> per guest.
> This patch addresses these two issues.
>
> Below are examples of (a) and (b)
>
> a) >32 VCPUs are specified with the -numa option:
>
> /usr/local/bin/qemu-system-x86_64 \
> -enable-kvm \
> 71:01:01 \
> -net tap,ifname=tap0,script=no,downscript=no \
> -vnc :4
>
> ...
> Upstream qemu :
> --------------
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 32 33 34 35 36 37 38 39 40 41
> node 0 size: 131072 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51
> node 1 size: 131072 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59
> node 2 size: 131072 MB
> node 3 cpus: 30
> node 3 size: 131072 MB
> node 4 cpus:
> node 4 size: 131072 MB
> node 5 cpus: 31
> node 5 size: 131072 MB
>
> With the patch applied :
> -----------------------
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> node 0 size: 131072 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19
> node 1 size: 131072 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29
> node 2 size: 131072 MB
> node 3 cpus: 30 31 32 33 34 35 36 37 38 39
> node 3 size: 131072 MB
> node 4 cpus: 40 41 42 43 44 45 46 47 48 49
> node 4 size: 131072 MB
> node 5 cpus: 50 51 52 53 54 55 56 57 58 59
> node 5 size: 131072 MB
>
> b) >64 VCPUs specified with -numa option:
>
> /usr/local/bin/qemu-system-x86_64 \
> -enable-kvm \
> -cpu Westmere,+rdtscp,+pdpe1gb,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+d-vnc :4
>
> ...
>
> Upstream qemu :
> --------------
>
> only 63 CPUs in NUMA mode supported.
> only 64 CPUs in NUMA mode supported.
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 8 nodes
> node 0 cpus: 6 7 8 9 38 39 40 41 70 71 72 73
> node 0 size: 65536 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51 74 75 76 77 78 79
> node 1 size: 65536 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59 60 61
> node 2 size: 65536 MB
> node 3 cpus: 30 62
> node 3 size: 65536 MB
> node 4 cpus:
> node 4 size: 65536 MB
> node 5 cpus:
> node 5 size: 65536 MB
> node 6 cpus: 31 63
> node 6 size: 65536 MB
> node 7 cpus: 0 1 2 3 4 5 32 33 34 35 36 37 64 65 66 67 68 69
> node 7 size: 65536 MB
>
> With the patch applied :
> -----------------------
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 8 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> node 0 size: 65536 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19
> node 1 size: 65536 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29
> node 2 size: 65536 MB
> node 3 cpus: 30 31 32 33 34 35 36 37 38 39
> node 3 size: 65536 MB
> node 4 cpus: 40 41 42 43 44 45 46 47 48 49
> node 4 size: 65536 MB
> node 5 cpus: 50 51 52 53 54 55 56 57 58 59
> node 5 size: 65536 MB
> node 6 cpus: 60 61 62 63 64 65 66 67 68 69
> node 6 size: 65536 MB
> node 7 cpus: 70 71 72 73 74 75 76 77 78 79
>
> Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>, Jim Hull <jim.hull@hp.com>, Craig Hada <craig.hada@hp.com>
> ---
> cpus.c | 3 ++-
> hw/pc.c | 3 ++-
> sysemu.h | 3 ++-
> vl.c | 43 +++++++++++++++++++++----------------------
> 4 files changed, 27 insertions(+), 25 deletions(-)
>
> diff --git a/cpus.c b/cpus.c
> index b182b3d..acccd08 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -36,6 +36,7 @@
> #include "cpus.h"
> #include "qtest.h"
> #include "main-loop.h"
> +#include "bitmap.h"
>
> #ifndef _WIN32
> #include "compatfd.h"
> @@ -1145,7 +1146,7 @@ void set_numa_modes(void)
>
> for (env = first_cpu; env != NULL; env = env->next_cpu) {
> for (i = 0; i < nb_numa_nodes; i++) {
> - if (node_cpumask[i] & (1 << env->cpu_index)) {
> + if (test_bit(env->cpu_index, node_cpumask[i])) {
> env->numa_node = i;
> }
> }
> diff --git a/hw/pc.c b/hw/pc.c
> index c7e9ab3..2edcc07 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -48,6 +48,7 @@
> #include "memory.h"
> #include "exec-memory.h"
> #include "arch_init.h"
> +#include "bitmap.h"
>
> /* output Bochs bios info messages */
> //#define DEBUG_BIOS
> @@ -639,7 +640,7 @@ static void *bochs_bios_init(void)
> numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
> for (i = 0; i < max_cpus; i++) {
> for (j = 0; j < nb_numa_nodes; j++) {
> - if (node_cpumask[j] & (1 << i)) {
> + if (test_bit(i, node_cpumask[j])) {
> numa_fw_cfg[i + 1] = cpu_to_le64(j);
> break;
> }
> diff --git a/sysemu.h b/sysemu.h
> index 6540c79..4669348 100644
> --- a/sysemu.h
> +++ b/sysemu.h
> @@ -134,9 +134,10 @@ extern uint8_t qemu_extra_params_fw[2];
> extern QEMUClock *rtc_clock;
>
> #define MAX_NODES 64
> +#define MAX_CPUMASK_BITS 255
> extern int nb_numa_nodes;
> extern uint64_t node_mem[MAX_NODES];
> -extern uint64_t node_cpumask[MAX_NODES];
> +extern unsigned long *node_cpumask[MAX_NODES];
>
> #define MAX_OPTION_ROMS 16
> typedef struct QEMUOptionRom {
> diff --git a/vl.c b/vl.c
> index 46248b9..35a641b 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -28,6 +28,7 @@
> #include <errno.h>
> #include <sys/time.h>
> #include <zlib.h>
> +#include "bitmap.h"
>
> /* Needed early for CONFIG_BSD etc. */
> #include "config-host.h"
> @@ -240,7 +241,7 @@ QTAILQ_HEAD(, FWBootEntry) fw_boot_order = QTAILQ_HEAD_INITIALIZER(fw_boot_order
>
> int nb_numa_nodes;
> uint64_t node_mem[MAX_NODES];
> -uint64_t node_cpumask[MAX_NODES];
> +unsigned long *node_cpumask[MAX_NODES];
>
> uint8_t qemu_uuid[16];
>
> @@ -951,6 +952,8 @@ static void numa_add(const char *optarg)
> unsigned long long value, endvalue;
> int nodenr;
>
> + value = endvalue = 0ULL;
> +
> optarg = get_opt_name(option, 128, optarg, ',') + 1;
> if (!strcmp(option, "node")) {
> if (get_param_value(option, 128, "nodeid", optarg) == 0) {
> @@ -970,27 +973,22 @@ static void numa_add(const char *optarg)
> }
> node_mem[nodenr] = sval;
> }
> - if (get_param_value(option, 128, "cpus", optarg) == 0) {
> - node_cpumask[nodenr] = 0;
> - } else {
> + if (get_param_value(option, 128, "cpus", optarg) != 0) {
> value = strtoull(option, &endptr, 10);
> - if (value >= 64) {
> - value = 63;
> - fprintf(stderr, "only 64 CPUs in NUMA mode supported.\n");
> + if (*endptr == '-') {
> + endvalue = strtoull(endptr+1, &endptr, 10);
> } else {
> - if (*endptr == '-') {
> - endvalue = strtoull(endptr+1, &endptr, 10);
> - if (endvalue >= 63) {
> - endvalue = 62;
> - fprintf(stderr,
> - "only 63 CPUs in NUMA mode supported.\n");
> - }
> - value = (2ULL << endvalue) - (1ULL << value);
> - } else {
> - value = 1ULL << value;
> - }
> + endvalue = value;
> + }
> +
> + if (!(endvalue < MAX_CPUMASK_BITS)) {
> + endvalue = MAX_CPUMASK_BITS - 1;
> + fprintf(stderr,
> + "A max of %d CPUs are supported in a guest\n",
> + MAX_CPUMASK_BITS);
> }
> - node_cpumask[nodenr] = value;
> +
> + bitmap_set(node_cpumask[nodenr], value, endvalue-value+1);
> }
> nb_numa_nodes++;
> }
> @@ -2330,7 +2328,7 @@ int main(int argc, char **argv, char **envp)
>
> for (i = 0; i < MAX_NODES; i++) {
> node_mem[i] = 0;
> - node_cpumask[i] = 0;
> + node_cpumask[i] = bitmap_new(MAX_CPUMASK_BITS);
> }
>
> nb_numa_nodes = 0;
> @@ -3468,8 +3466,9 @@ int main(int argc, char **argv, char **envp)
> }
>
> for (i = 0; i < nb_numa_nodes; i++) {
> - if (node_cpumask[i] != 0)
> + if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
> break;
> + }
> }
> /* assigning the VCPUs round-robin is easier to implement, guest OSes
> * must cope with this anyway, because there are BIOSes out there in
> @@ -3477,7 +3476,7 @@ int main(int argc, char **argv, char **envp)
> */
> if (i == nb_numa_nodes) {
> for (i = 0; i < max_cpus; i++) {
> - node_cpumask[i % nb_numa_nodes] |= 1 << i;
> + set_bit(i, node_cpumask[i % nb_numa_nodes]);
> }
> }
> }
> --
> 1.7.1
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Qemu-devel] [PATCH v4] Fixes related to processing of qemu's -numa option
@ 2012-08-04 16:13 ` Blue Swirl
0 siblings, 0 replies; 10+ messages in thread
From: Blue Swirl @ 2012-08-04 16:13 UTC (permalink / raw)
To: Chegu Vinod; +Cc: aliguori, qemu-devel, kvm, ehabkost
Thanks, applied.
On Tue, Jul 17, 2012 at 4:31 AM, Chegu Vinod <chegu_vinod@hp.com> wrote:
> Changes since v3:
> - using bitmap_set() instead of set_bit() in numa_add() routine.
> - removed call to bitmak_zero() since bitmap_new() also zeros' the bitmap.
> - Rebased to the latest qemu.
>
> Changes since v2:
> - Using "unsigned long *" for the node_cpumask[].
> - Use bitmap_new() instead of g_malloc0() for allocation.
> - Don't rely on "max_cpus" since it may not be initialized
> before the numa related qemu options are parsed & processed.
>
> Note: Continuing to use a new constant for allocation of
> the mask (This constant is currently set to 255 since
> with an 8bit APIC ID VCPUs can range from 0-254 in a
> guest. The APIC ID 255 (0xFF) is reserved for broadcast).
>
> Changes since v1:
>
> - Use bitmap functions that are already in qemu (instead
> of cpu_set_t macro's from sched.h)
> - Added a check for endvalue >= max_cpus.
> - Fix to address the round-robbing assignment when
> cpu's are not explicitly specified.
> -----------------------------------------------
>
> v1:
> --
>
> The -numa option to qemu is used to create [fake] numa nodes
> and expose them to the guest OS instance.
>
> There are a couple of issues with the -numa option:
>
> a) Max VCPU's that can be specified for a guest while using
> the qemu's -numa option is 64. Due to a typecasting issue
> when the number of VCPUs is > 32 the VCPUs don't show up
> under the specified [fake] numa nodes.
>
> b) KVM currently has support for 160VCPUs per guest. The
> qemu's -numa option has only support for upto 64VCPUs
> per guest.
> This patch addresses these two issues.
>
> Below are examples of (a) and (b)
>
> a) >32 VCPUs are specified with the -numa option:
>
> /usr/local/bin/qemu-system-x86_64 \
> -enable-kvm \
> 71:01:01 \
> -net tap,ifname=tap0,script=no,downscript=no \
> -vnc :4
>
> ...
> Upstream qemu :
> --------------
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 32 33 34 35 36 37 38 39 40 41
> node 0 size: 131072 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51
> node 1 size: 131072 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59
> node 2 size: 131072 MB
> node 3 cpus: 30
> node 3 size: 131072 MB
> node 4 cpus:
> node 4 size: 131072 MB
> node 5 cpus: 31
> node 5 size: 131072 MB
>
> With the patch applied :
> -----------------------
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> node 0 size: 131072 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19
> node 1 size: 131072 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29
> node 2 size: 131072 MB
> node 3 cpus: 30 31 32 33 34 35 36 37 38 39
> node 3 size: 131072 MB
> node 4 cpus: 40 41 42 43 44 45 46 47 48 49
> node 4 size: 131072 MB
> node 5 cpus: 50 51 52 53 54 55 56 57 58 59
> node 5 size: 131072 MB
>
> b) >64 VCPUs specified with -numa option:
>
> /usr/local/bin/qemu-system-x86_64 \
> -enable-kvm \
> -cpu Westmere,+rdtscp,+pdpe1gb,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+d-vnc :4
>
> ...
>
> Upstream qemu :
> --------------
>
> only 63 CPUs in NUMA mode supported.
> only 64 CPUs in NUMA mode supported.
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 8 nodes
> node 0 cpus: 6 7 8 9 38 39 40 41 70 71 72 73
> node 0 size: 65536 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51 74 75 76 77 78 79
> node 1 size: 65536 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59 60 61
> node 2 size: 65536 MB
> node 3 cpus: 30 62
> node 3 size: 65536 MB
> node 4 cpus:
> node 4 size: 65536 MB
> node 5 cpus:
> node 5 size: 65536 MB
> node 6 cpus: 31 63
> node 6 size: 65536 MB
> node 7 cpus: 0 1 2 3 4 5 32 33 34 35 36 37 64 65 66 67 68 69
> node 7 size: 65536 MB
>
> With the patch applied :
> -----------------------
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 8 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> node 0 size: 65536 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19
> node 1 size: 65536 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29
> node 2 size: 65536 MB
> node 3 cpus: 30 31 32 33 34 35 36 37 38 39
> node 3 size: 65536 MB
> node 4 cpus: 40 41 42 43 44 45 46 47 48 49
> node 4 size: 65536 MB
> node 5 cpus: 50 51 52 53 54 55 56 57 58 59
> node 5 size: 65536 MB
> node 6 cpus: 60 61 62 63 64 65 66 67 68 69
> node 6 size: 65536 MB
> node 7 cpus: 70 71 72 73 74 75 76 77 78 79
>
> Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>, Jim Hull <jim.hull@hp.com>, Craig Hada <craig.hada@hp.com>
> ---
> cpus.c | 3 ++-
> hw/pc.c | 3 ++-
> sysemu.h | 3 ++-
> vl.c | 43 +++++++++++++++++++++----------------------
> 4 files changed, 27 insertions(+), 25 deletions(-)
>
> diff --git a/cpus.c b/cpus.c
> index b182b3d..acccd08 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -36,6 +36,7 @@
> #include "cpus.h"
> #include "qtest.h"
> #include "main-loop.h"
> +#include "bitmap.h"
>
> #ifndef _WIN32
> #include "compatfd.h"
> @@ -1145,7 +1146,7 @@ void set_numa_modes(void)
>
> for (env = first_cpu; env != NULL; env = env->next_cpu) {
> for (i = 0; i < nb_numa_nodes; i++) {
> - if (node_cpumask[i] & (1 << env->cpu_index)) {
> + if (test_bit(env->cpu_index, node_cpumask[i])) {
> env->numa_node = i;
> }
> }
> diff --git a/hw/pc.c b/hw/pc.c
> index c7e9ab3..2edcc07 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -48,6 +48,7 @@
> #include "memory.h"
> #include "exec-memory.h"
> #include "arch_init.h"
> +#include "bitmap.h"
>
> /* output Bochs bios info messages */
> //#define DEBUG_BIOS
> @@ -639,7 +640,7 @@ static void *bochs_bios_init(void)
> numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
> for (i = 0; i < max_cpus; i++) {
> for (j = 0; j < nb_numa_nodes; j++) {
> - if (node_cpumask[j] & (1 << i)) {
> + if (test_bit(i, node_cpumask[j])) {
> numa_fw_cfg[i + 1] = cpu_to_le64(j);
> break;
> }
> diff --git a/sysemu.h b/sysemu.h
> index 6540c79..4669348 100644
> --- a/sysemu.h
> +++ b/sysemu.h
> @@ -134,9 +134,10 @@ extern uint8_t qemu_extra_params_fw[2];
> extern QEMUClock *rtc_clock;
>
> #define MAX_NODES 64
> +#define MAX_CPUMASK_BITS 255
> extern int nb_numa_nodes;
> extern uint64_t node_mem[MAX_NODES];
> -extern uint64_t node_cpumask[MAX_NODES];
> +extern unsigned long *node_cpumask[MAX_NODES];
>
> #define MAX_OPTION_ROMS 16
> typedef struct QEMUOptionRom {
> diff --git a/vl.c b/vl.c
> index 46248b9..35a641b 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -28,6 +28,7 @@
> #include <errno.h>
> #include <sys/time.h>
> #include <zlib.h>
> +#include "bitmap.h"
>
> /* Needed early for CONFIG_BSD etc. */
> #include "config-host.h"
> @@ -240,7 +241,7 @@ QTAILQ_HEAD(, FWBootEntry) fw_boot_order = QTAILQ_HEAD_INITIALIZER(fw_boot_order
>
> int nb_numa_nodes;
> uint64_t node_mem[MAX_NODES];
> -uint64_t node_cpumask[MAX_NODES];
> +unsigned long *node_cpumask[MAX_NODES];
>
> uint8_t qemu_uuid[16];
>
> @@ -951,6 +952,8 @@ static void numa_add(const char *optarg)
> unsigned long long value, endvalue;
> int nodenr;
>
> + value = endvalue = 0ULL;
> +
> optarg = get_opt_name(option, 128, optarg, ',') + 1;
> if (!strcmp(option, "node")) {
> if (get_param_value(option, 128, "nodeid", optarg) == 0) {
> @@ -970,27 +973,22 @@ static void numa_add(const char *optarg)
> }
> node_mem[nodenr] = sval;
> }
> - if (get_param_value(option, 128, "cpus", optarg) == 0) {
> - node_cpumask[nodenr] = 0;
> - } else {
> + if (get_param_value(option, 128, "cpus", optarg) != 0) {
> value = strtoull(option, &endptr, 10);
> - if (value >= 64) {
> - value = 63;
> - fprintf(stderr, "only 64 CPUs in NUMA mode supported.\n");
> + if (*endptr == '-') {
> + endvalue = strtoull(endptr+1, &endptr, 10);
> } else {
> - if (*endptr == '-') {
> - endvalue = strtoull(endptr+1, &endptr, 10);
> - if (endvalue >= 63) {
> - endvalue = 62;
> - fprintf(stderr,
> - "only 63 CPUs in NUMA mode supported.\n");
> - }
> - value = (2ULL << endvalue) - (1ULL << value);
> - } else {
> - value = 1ULL << value;
> - }
> + endvalue = value;
> + }
> +
> + if (!(endvalue < MAX_CPUMASK_BITS)) {
> + endvalue = MAX_CPUMASK_BITS - 1;
> + fprintf(stderr,
> + "A max of %d CPUs are supported in a guest\n",
> + MAX_CPUMASK_BITS);
> }
> - node_cpumask[nodenr] = value;
> +
> + bitmap_set(node_cpumask[nodenr], value, endvalue-value+1);
> }
> nb_numa_nodes++;
> }
> @@ -2330,7 +2328,7 @@ int main(int argc, char **argv, char **envp)
>
> for (i = 0; i < MAX_NODES; i++) {
> node_mem[i] = 0;
> - node_cpumask[i] = 0;
> + node_cpumask[i] = bitmap_new(MAX_CPUMASK_BITS);
> }
>
> nb_numa_nodes = 0;
> @@ -3468,8 +3466,9 @@ int main(int argc, char **argv, char **envp)
> }
>
> for (i = 0; i < nb_numa_nodes; i++) {
> - if (node_cpumask[i] != 0)
> + if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
> break;
> + }
> }
> /* assigning the VCPUs round-robin is easier to implement, guest OSes
> * must cope with this anyway, because there are BIOSes out there in
> @@ -3477,7 +3476,7 @@ int main(int argc, char **argv, char **envp)
> */
> if (i == nb_numa_nodes) {
> for (i = 0; i < max_cpus; i++) {
> - node_cpumask[i % nb_numa_nodes] |= 1 << i;
> + set_bit(i, node_cpumask[i % nb_numa_nodes]);
> }
> }
> }
> --
> 1.7.1
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2012-08-04 16:13 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-17 4:31 [PATCH v4] Fixes related to processing of qemu's -numa option Chegu Vinod
2012-07-17 4:31 ` [Qemu-devel] " Chegu Vinod
2012-07-18 17:15 ` Eduardo Habkost
2012-07-18 17:15 ` [Qemu-devel] " Eduardo Habkost
2012-07-18 18:26 ` Vinod, Chegu
2012-07-18 18:26 ` Vinod, Chegu
2012-07-31 13:28 ` Eduardo Habkost
2012-07-31 13:28 ` [Qemu-devel] " Eduardo Habkost
2012-08-04 16:13 ` Blue Swirl
2012-08-04 16:13 ` [Qemu-devel] " Blue Swirl
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.