With the development of the Arm GIC architecture, we think it will be useful
to add some performance tests to kvm-unit-tests (kut) to measure the cost of
interrupts. In this series, we add GICv4.1 support to the IPI latency test
and implement LPI/vtimer latency tests.

This series has been tested on hardware that supports GICv4.1.

Note:
Based on patch "arm/arm64: timer: Extract irqs at setup time",
https://www.spinics.net/lists/kvm-arm/msg41425.html

* From v2:
  - Code and commit message cleanup
  - Clear nr_ipi_received before ipi_exec(), thanks to Tao Zeng's review
  - Rebase the patch "Add vtimer latency test" on Andrew's patch
  - Add test->post() to get actual PPI latency

* From v1:
  - Fix spelling mistake
  - Use the existing interface to inject hw sgi to simplify the logic
  - Add two separate patches to limit the running times and time cost
    of each individual micro-bench test

Jingyi Wang (10):
  arm64: microbench: get correct ipi received num
  arm64: microbench: Generalize ipi test names
  arm64: microbench: gic: Add ipi latency test for gicv4.1 support kvm
  arm64: its: Handle its command queue wrapping
  arm64: microbench: its: Add LPI latency test
  arm64: microbench: Allow each test to specify its running times
  arm64: microbench: Add time limit for each individual test
  arm64: microbench: Add vtimer latency test
  arm64: microbench: Add test->post() to further process test results
  arm64: microbench: Add timer_post() to get actual PPI latency

 arm/micro-bench.c          | 256 ++++++++++++++++++++++++++++++-------
 lib/arm/asm/gic-v3.h       |   3 +
 lib/arm/asm/gic.h          |   1 +
 lib/arm64/gic-v3-its-cmd.c |   3 +-
 4 files changed, 219 insertions(+), 44 deletions(-)

-- 
2.19.1
If ipi_exec() fails because of a timeout, we shouldn't increase the
number of IPIs received.

Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
 arm/micro-bench.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arm/micro-bench.c b/arm/micro-bench.c
index 4612f41..794dfac 100644
--- a/arm/micro-bench.c
+++ b/arm/micro-bench.c
@@ -103,7 +103,9 @@ static void ipi_exec(void)
 	while (!ipi_received && tries--)
 		cpu_relax();
 
-	++received;
+	if (ipi_received)
+		++received;
+
 	assert_msg(ipi_received, "failed to receive IPI in time, but received %d successfully\n", received);
 }
 
-- 
2.19.1
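The effect of this fix can be illustrated outside the test harness. The following is a minimal sketch, not the kvm-unit-tests code: `poll_flag`, `exec_once`, `irq_flag` and `nr_received` are hypothetical names, and a plain `volatile bool` stands in for the flag that the interrupt handler would set.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the flag set by the IRQ handler. */
static volatile bool irq_flag;
/* Number of interrupts that were actually observed. */
static int nr_received;

/* Spin until the flag is set or the retry budget runs out. */
bool poll_flag(unsigned tries)
{
	while (!irq_flag && tries--)
		;
	return irq_flag;
}

/* One iteration: count the interrupt only if it really arrived. */
bool exec_once(bool deliver, unsigned tries)
{
	irq_flag = deliver;	/* simulate delivery, or a lost interrupt */
	if (!poll_flag(tries))
		return false;	/* timed out: nr_received stays untouched */
	++nr_received;
	return true;
}
```

With this shape, the count reported in a timeout message reflects only the iterations that completed, which is what the patch aims for.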
Later patches will use these functions for gic(ipi/lpi/timer) tests. Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> --- arm/micro-bench.c | 39 ++++++++++++++++++++++----------------- 1 file changed, 22 insertions(+), 17 deletions(-) diff --git a/arm/micro-bench.c b/arm/micro-bench.c index 794dfac..fc4d356 100644 --- a/arm/micro-bench.c +++ b/arm/micro-bench.c @@ -25,24 +25,24 @@ static u32 cntfrq; -static volatile bool ipi_ready, ipi_received; +static volatile bool irq_ready, irq_received; static void *vgic_dist_base; static void (*write_eoir)(u32 irqstat); -static void ipi_irq_handler(struct pt_regs *regs) +static void gic_irq_handler(struct pt_regs *regs) { - ipi_ready = false; - ipi_received = true; + irq_ready = false; + irq_received = true; gic_write_eoir(gic_read_iar()); - ipi_ready = true; + irq_ready = true; } -static void ipi_secondary_entry(void *data) +static void gic_secondary_entry(void *data) { - install_irq_handler(EL1H_IRQ, ipi_irq_handler); + install_irq_handler(EL1H_IRQ, gic_irq_handler); gic_enable_defaults(); local_irq_enable(); - ipi_ready = true; + irq_ready = true; while (true) cpu_relax(); } @@ -72,9 +72,9 @@ static bool test_init(void) break; } - ipi_ready = false; + irq_ready = false; gic_enable_defaults(); - on_cpu_async(1, ipi_secondary_entry, NULL); + on_cpu_async(1, gic_secondary_entry, NULL); cntfrq = get_cntfrq(); printf("Timer Frequency %d Hz (Output in microseconds)\n", cntfrq); @@ -82,13 +82,18 @@ static bool test_init(void) return true; } -static void ipi_prep(void) +static void gic_prep_common(void) { unsigned tries = 1 << 28; - while (!ipi_ready && tries--) + while (!irq_ready && tries--) cpu_relax(); - assert(ipi_ready); + assert(irq_ready); +} + +static void ipi_prep(void) +{ + gic_prep_common(); } static void ipi_exec(void) @@ -96,17 +101,17 @@ static void ipi_exec(void) unsigned tries = 1 << 28; static int received = 0; - ipi_received = false; + irq_received = false; 
gic_ipi_send_single(1, 1); - while (!ipi_received && tries--) + while (!irq_received && tries--) cpu_relax(); - if (ipi_received) + if (irq_received) ++received; - assert_msg(ipi_received, "failed to receive IPI in time, but received %d successfully\n", received); + assert_msg(irq_received, "failed to receive IPI in time, but received %d successfully\n", received); } static void hvc_exec(void) -- 2.19.1
If gicv4.1(sgi hardware injection) is supported in kvm, we test ipi injection via hw/sw way separately. Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com> --- arm/micro-bench.c | 62 ++++++++++++++++++++++++++++++++++++++------ lib/arm/asm/gic-v3.h | 3 +++ lib/arm/asm/gic.h | 1 + 3 files changed, 58 insertions(+), 8 deletions(-) diff --git a/arm/micro-bench.c b/arm/micro-bench.c index fc4d356..f8314db 100644 --- a/arm/micro-bench.c +++ b/arm/micro-bench.c @@ -26,6 +26,8 @@ static u32 cntfrq; static volatile bool irq_ready, irq_received; +static int nr_ipi_received; + static void *vgic_dist_base; static void (*write_eoir)(u32 irqstat); @@ -91,15 +93,55 @@ static void gic_prep_common(void) assert(irq_ready); } -static void ipi_prep(void) +static bool ipi_prep(void) +{ + u32 val; + + val = readl(vgic_dist_base + GICD_CTLR); + if (readl(vgic_dist_base + GICD_TYPER2) & GICD_TYPER2_nASSGIcap) { + /* nASSGIreq can be changed only when GICD is disabled */ + val &= ~GICD_CTLR_ENABLE_G1A; + val &= ~GICD_CTLR_nASSGIreq; + writel(val, vgic_dist_base + GICD_CTLR); + gicv3_dist_wait_for_rwp(); + + val |= GICD_CTLR_ENABLE_G1A; + writel(val, vgic_dist_base + GICD_CTLR); + gicv3_dist_wait_for_rwp(); + } + + nr_ipi_received = 0; + gic_prep_common(); + return true; +} + +static bool ipi_hw_prep(void) { + u32 val; + + val = readl(vgic_dist_base + GICD_CTLR); + if (readl(vgic_dist_base + GICD_TYPER2) & GICD_TYPER2_nASSGIcap) { + /* nASSGIreq can be changed only when GICD is disabled */ + val &= ~GICD_CTLR_ENABLE_G1A; + val |= GICD_CTLR_nASSGIreq; + writel(val, vgic_dist_base + GICD_CTLR); + gicv3_dist_wait_for_rwp(); + + val |= GICD_CTLR_ENABLE_G1A; + writel(val, vgic_dist_base + GICD_CTLR); + gicv3_dist_wait_for_rwp(); + } else { + return false; + } + + nr_ipi_received = 0; gic_prep_common(); + return true; } static void ipi_exec(void) { unsigned tries = 1 << 28; - static int received = 0; irq_received = false; @@ -109,9 +151,9 @@ static void ipi_exec(void) cpu_relax(); if 
(irq_received) - ++received; + ++nr_ipi_received; - assert_msg(irq_received, "failed to receive IPI in time, but received %d successfully\n", received); + assert_msg(irq_received, "failed to receive IPI in time, but received %d successfully\n", nr_ipi_received); } static void hvc_exec(void) @@ -147,7 +189,7 @@ static void eoi_exec(void) struct exit_test { const char *name; - void (*prep)(void); + bool (*prep)(void); void (*exec)(void); bool run; }; @@ -158,6 +200,7 @@ static struct exit_test tests[] = { {"mmio_read_vgic", NULL, mmio_read_vgic_exec, true}, {"eoi", NULL, eoi_exec, true}, {"ipi", ipi_prep, ipi_exec, true}, + {"ipi_hw", ipi_hw_prep, ipi_exec, true}, }; struct ns_time { @@ -181,9 +224,12 @@ static void loop_test(struct exit_test *test) uint64_t start, end, total_ticks, ntimes = NTIMES; struct ns_time total_ns, avg_ns; - if (test->prep) - test->prep(); - + if (test->prep) { + if(!test->prep()) { + printf("%s test skipped\n", test->name); + return; + } + } isb(); start = read_sysreg(cntpct_el0); while (ntimes--) diff --git a/lib/arm/asm/gic-v3.h b/lib/arm/asm/gic-v3.h index cb72922..b4ce130 100644 --- a/lib/arm/asm/gic-v3.h +++ b/lib/arm/asm/gic-v3.h @@ -20,10 +20,13 @@ */ #define GICD_CTLR 0x0000 #define GICD_CTLR_RWP (1U << 31) +#define GICD_CTLR_nASSGIreq (1U << 8) #define GICD_CTLR_ARE_NS (1U << 4) #define GICD_CTLR_ENABLE_G1A (1U << 1) #define GICD_CTLR_ENABLE_G1 (1U << 0) +#define GICD_TYPER2_nASSGIcap (1U << 8) + /* Re-Distributor registers, offsets from RD_base */ #define GICR_TYPER 0x0008 diff --git a/lib/arm/asm/gic.h b/lib/arm/asm/gic.h index 38e79b2..1898400 100644 --- a/lib/arm/asm/gic.h +++ b/lib/arm/asm/gic.h @@ -13,6 +13,7 @@ #define GICD_CTLR 0x0000 #define GICD_TYPER 0x0004 #define GICD_IIDR 0x0008 +#define GICD_TYPER2 0x000C #define GICD_IGROUPR 0x0080 #define GICD_ISENABLER 0x0100 #define GICD_ICENABLER 0x0180 -- 2.19.1
Because micro-bench may send a large number of ITS commands, we should
handle ITS command queue wrapping as the kernel does instead of just
failing the test.

Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
 lib/arm64/gic-v3-its-cmd.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/arm64/gic-v3-its-cmd.c b/lib/arm64/gic-v3-its-cmd.c
index 2c208d1..34574f7 100644
--- a/lib/arm64/gic-v3-its-cmd.c
+++ b/lib/arm64/gic-v3-its-cmd.c
@@ -164,8 +164,9 @@ static struct its_cmd_block *its_allocate_entry(void)
 {
 	struct its_cmd_block *cmd;
 
-	assert((u64)its_data.cmd_write < (u64)its_data.cmd_base + SZ_64K);
 	cmd = its_data.cmd_write++;
+	if ((u64)its_data.cmd_write == (u64)its_data.cmd_base + SZ_64K)
+		its_data.cmd_write = its_data.cmd_base;
 	return cmd;
 }
 
-- 
2.19.1
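The wrapping logic can be sketched as a standalone ring allocator. This is an illustration under stated assumptions, not the library code: `cmd_block`, `cmd_queue` and `queue_allocate_entry` are hypothetical names, and the 32-byte slot is modelled as four opaque 64-bit words.

```c
#include <stddef.h>
#include <stdint.h>

#define QUEUE_BYTES	(64 * 1024)	/* mirrors SZ_64K in the patch */

/* Hypothetical 32-byte command slot; the field layout is not modelled. */
struct cmd_block {
	uint64_t raw[4];
};

#define QUEUE_SLOTS	(QUEUE_BYTES / sizeof(struct cmd_block))

struct cmd_queue {
	struct cmd_block *base;		/* first slot of the ring */
	struct cmd_block *write;	/* next slot to hand out */
};

/* Return the current slot, then advance and wrap the write pointer. */
struct cmd_block *queue_allocate_entry(struct cmd_queue *q)
{
	struct cmd_block *cmd = q->write++;

	if (q->write == q->base + QUEUE_SLOTS)
		q->write = q->base;
	return cmd;
}
```

The sketch does not model the consumer's read pointer, so it says nothing about whether a reused slot has already been processed.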
Trigger LPIs through the INT command and test the latency.
Mostly inherited from commit 0ef02cd6cbaa (arm/arm64: ITS: INT
functional tests).

Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
 arm/micro-bench.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/arm/micro-bench.c b/arm/micro-bench.c
index f8314db..82f3c07 100644
--- a/arm/micro-bench.c
+++ b/arm/micro-bench.c
@@ -20,6 +20,7 @@
  */
 #include <libcflat.h>
 #include <asm/gic.h>
+#include <asm/gic-v3-its.h>
 
 #define NTIMES (1U << 16)
 
@@ -156,6 +157,48 @@ static void ipi_exec(void)
 	assert_msg(irq_received, "failed to receive IPI in time, but received %d successfully\n", nr_ipi_received);
 }
 
+static bool lpi_prep(void)
+{
+	struct its_collection *col1;
+	struct its_device *dev2;
+
+	if (!gicv3_its_base())
+		return false;
+
+	its_enable_defaults();
+	dev2 = its_create_device(2 /* dev id */, 8 /* nb_ites */);
+	col1 = its_create_collection(1 /* col id */, 1 /* target PE */);
+	gicv3_lpi_set_config(8199, LPI_PROP_DEFAULT);
+
+	its_send_mapd_nv(dev2, true);
+	its_send_mapc_nv(col1, true);
+	its_send_invall_nv(col1);
+	its_send_mapti_nv(dev2, 8199 /* lpi id */, 20 /* event id */, col1);
+
+	gic_prep_common();
+	return true;
+}
+
+static void lpi_exec(void)
+{
+	struct its_device *dev2;
+	unsigned tries = 1 << 28;
+	static int received = 0;
+
+	irq_received = false;
+
+	dev2 = its_get_device(2);
+	its_send_int_nv(dev2, 20);
+
+	while (!irq_received && tries--)
+		cpu_relax();
+
+	if (irq_received)
+		++received;
+
+	assert_msg(irq_received, "failed to receive LPI in time, but received %d successfully\n", received);
+}
+
 static void hvc_exec(void)
 {
 	asm volatile("mov w0, #0x4b000000; hvc #0" ::: "w0");
@@ -201,6 +244,7 @@ static struct exit_test tests[] = {
 	{"eoi", NULL, eoi_exec, true},
 	{"ipi", ipi_prep, ipi_exec, true},
 	{"ipi_hw", ipi_hw_prep, ipi_exec, true},
+	{"lpi", lpi_prep, lpi_exec, true},
 };
 
 struct ns_time {
-- 
2.19.1
Because some tests in micro-bench can be time-consuming, we add a
per-test parameter that allows each individual test to specify how
many times it runs.

Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
 arm/micro-bench.c | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/arm/micro-bench.c b/arm/micro-bench.c
index 82f3c07..93bd855 100644
--- a/arm/micro-bench.c
+++ b/arm/micro-bench.c
@@ -22,8 +22,6 @@
 #include <asm/gic.h>
 #include <asm/gic-v3-its.h>
 
-#define NTIMES (1U << 16)
-
 static u32 cntfrq;
 
 static volatile bool irq_ready, irq_received;
@@ -234,17 +232,18 @@ struct exit_test {
 	const char *name;
 	bool (*prep)(void);
 	void (*exec)(void);
+	u32 times;
 	bool run;
 };
 
 static struct exit_test tests[] = {
-	{"hvc", NULL, hvc_exec, true},
-	{"mmio_read_user", NULL, mmio_read_user_exec, true},
-	{"mmio_read_vgic", NULL, mmio_read_vgic_exec, true},
-	{"eoi", NULL, eoi_exec, true},
-	{"ipi", ipi_prep, ipi_exec, true},
-	{"ipi_hw", ipi_hw_prep, ipi_exec, true},
-	{"lpi", lpi_prep, lpi_exec, true},
+	{"hvc", NULL, hvc_exec, 65536, true},
+	{"mmio_read_user", NULL, mmio_read_user_exec, 65536, true},
+	{"mmio_read_vgic", NULL, mmio_read_vgic_exec, 65536, true},
+	{"eoi", NULL, eoi_exec, 65536, true},
+	{"ipi", ipi_prep, ipi_exec, 65536, true},
+	{"ipi_hw", ipi_hw_prep, ipi_exec, 65536, true},
+	{"lpi", lpi_prep, lpi_exec, 65536, true},
 };
 
 struct ns_time {
@@ -265,7 +264,7 @@ static void ticks_to_ns_time(uint64_t ticks, struct ns_time *ns_time)
 
 static void loop_test(struct exit_test *test)
 {
-	uint64_t start, end, total_ticks, ntimes = NTIMES;
+	uint64_t start, end, total_ticks, ntimes = 0;
 	struct ns_time total_ns, avg_ns;
 
 	if (test->prep) {
@@ -276,15 +275,17 @@ static void loop_test(struct exit_test *test)
 	}
 	isb();
 	start = read_sysreg(cntpct_el0);
-	while (ntimes--)
+	while (ntimes < test->times) {
 		test->exec();
+		ntimes++;
+	}
 	isb();
 	end = read_sysreg(cntpct_el0);
 
 	total_ticks = end - start;
 	ticks_to_ns_time(total_ticks, &total_ns);
-	avg_ns.ns = total_ns.ns / NTIMES;
-	avg_ns.ns_frac = total_ns.ns_frac / NTIMES;
+	avg_ns.ns = total_ns.ns / ntimes;
+	avg_ns.ns_frac = total_ns.ns_frac / ntimes;
 
 	printf("%-30s%15" PRId64 ".%-15" PRId64 "%15" PRId64 ".%-15" PRId64 "\n",
 		test->name, total_ns.ns, total_ns.ns_frac, avg_ns.ns, avg_ns.ns_frac);
-- 
2.19.1
In addition to the per-test iteration count, we add a time limit to
loop_test() to make sure each test finishes within a certain time
(5 seconds here).

Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
 arm/micro-bench.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/arm/micro-bench.c b/arm/micro-bench.c
index 93bd855..09d9d53 100644
--- a/arm/micro-bench.c
+++ b/arm/micro-bench.c
@@ -22,6 +22,7 @@
 #include <asm/gic.h>
 #include <asm/gic-v3-its.h>
 
+#define NS_5_SECONDS (5 * 1000 * 1000 * 1000UL)
 static u32 cntfrq;
 
 static volatile bool irq_ready, irq_received;
@@ -267,23 +268,26 @@ static void loop_test(struct exit_test *test)
 	uint64_t start, end, total_ticks, ntimes = 0;
 	struct ns_time total_ns, avg_ns;
 
+	total_ticks = 0;
 	if (test->prep) {
 		if(!test->prep()) {
 			printf("%s test skipped\n", test->name);
 			return;
 		}
 	}
-	isb();
-	start = read_sysreg(cntpct_el0);
-	while (ntimes < test->times) {
+
+	while (ntimes < test->times && total_ns.ns < NS_5_SECONDS) {
+		isb();
+		start = read_sysreg(cntpct_el0);
 		test->exec();
+		isb();
+		end = read_sysreg(cntpct_el0);
+
 		ntimes++;
+		total_ticks += (end - start);
+		ticks_to_ns_time(total_ticks, &total_ns);
 	}
-	isb();
-	end = read_sysreg(cntpct_el0);
 
-	total_ticks = end - start;
-	ticks_to_ns_time(total_ticks, &total_ns);
 	avg_ns.ns = total_ns.ns / ntimes;
 	avg_ns.ns_frac = total_ns.ns_frac / ntimes;
 
-- 
2.19.1
Trigger PPIs by setting up a 10msec timer and test the latency. Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com> --- arm/micro-bench.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 52 insertions(+), 1 deletion(-) diff --git a/arm/micro-bench.c b/arm/micro-bench.c index 09d9d53..1e1bde5 100644 --- a/arm/micro-bench.c +++ b/arm/micro-bench.c @@ -21,8 +21,10 @@ #include <libcflat.h> #include <asm/gic.h> #include <asm/gic-v3-its.h> +#include <asm/timer.h> #define NS_5_SECONDS (5 * 1000 * 1000 * 1000UL) + static u32 cntfrq; static volatile bool irq_ready, irq_received; @@ -33,9 +35,16 @@ static void (*write_eoir)(u32 irqstat); static void gic_irq_handler(struct pt_regs *regs) { + u32 irqstat = gic_read_iar(); irq_ready = false; irq_received = true; - gic_write_eoir(gic_read_iar()); + gic_write_eoir(irqstat); + + if (irqstat == PPI(TIMER_VTIMER_IRQ)) { + write_sysreg((ARCH_TIMER_CTL_IMASK | ARCH_TIMER_CTL_ENABLE), + cntv_ctl_el0); + isb(); + } irq_ready = true; } @@ -198,6 +207,47 @@ static void lpi_exec(void) assert_msg(irq_received, "failed to receive LPI in time, but received %d successfully\n", received); } +static bool timer_prep(void) +{ + static void *gic_isenabler; + + gic_enable_defaults(); + install_irq_handler(EL1H_IRQ, gic_irq_handler); + local_irq_enable(); + + gic_isenabler = gicv3_sgi_base() + GICR_ISENABLER0; + writel(1 << PPI(TIMER_VTIMER_IRQ), gic_isenabler); + write_sysreg(ARCH_TIMER_CTL_ENABLE, cntv_ctl_el0); + isb(); + + gic_prep_common(); + return true; +} + +static void timer_exec(void) +{ + u64 before_timer; + u64 timer_10ms; + unsigned tries = 1 << 28; + static int received = 0; + + irq_received = false; + + before_timer = read_sysreg(cntvct_el0); + timer_10ms = cntfrq / 100; + write_sysreg(before_timer + timer_10ms, cntv_cval_el0); + write_sysreg(ARCH_TIMER_CTL_ENABLE, cntv_ctl_el0); + isb(); + + while (!irq_received && tries--) + cpu_relax(); + + if (irq_received) + ++received; + + assert_msg(irq_received, "failed to 
receive PPI in time, but received %d successfully\n", received); +} + static void hvc_exec(void) { asm volatile("mov w0, #0x4b000000; hvc #0" ::: "w0"); @@ -245,6 +295,7 @@ static struct exit_test tests[] = { {"ipi", ipi_prep, ipi_exec, 65536, true}, {"ipi_hw", ipi_hw_prep, ipi_exec, 65536, true}, {"lpi", lpi_prep, lpi_exec, 65536, true}, + {"timer_10ms", timer_prep, timer_exec, 256, true}, }; struct ns_time { -- 2.19.1
Under certain circumstances, we need to further process microbench test results, so we add test->post() in the microbench framework, later patch will use that. Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com> --- arm/micro-bench.c | 31 ++++++++++++++++++------------- 1 file changed, 18 insertions(+), 13 deletions(-) diff --git a/arm/micro-bench.c b/arm/micro-bench.c index 1e1bde5..4680ba4 100644 --- a/arm/micro-bench.c +++ b/arm/micro-bench.c @@ -33,6 +33,12 @@ static int nr_ipi_received; static void *vgic_dist_base; static void (*write_eoir)(u32 irqstat); +struct ns_time { + uint64_t ns; + uint64_t ns_frac; +}; +static void ticks_to_ns_time(uint64_t ticks, struct ns_time *ns_time); + static void gic_irq_handler(struct pt_regs *regs) { u32 irqstat = gic_read_iar(); @@ -283,24 +289,20 @@ struct exit_test { const char *name; bool (*prep)(void); void (*exec)(void); + void (*post)(uint64_t, uint64_t, struct ns_time*); u32 times; bool run; }; static struct exit_test tests[] = { - {"hvc", NULL, hvc_exec, 65536, true}, - {"mmio_read_user", NULL, mmio_read_user_exec, 65536, true}, - {"mmio_read_vgic", NULL, mmio_read_vgic_exec, 65536, true}, - {"eoi", NULL, eoi_exec, 65536, true}, - {"ipi", ipi_prep, ipi_exec, 65536, true}, - {"ipi_hw", ipi_hw_prep, ipi_exec, 65536, true}, - {"lpi", lpi_prep, lpi_exec, 65536, true}, - {"timer_10ms", timer_prep, timer_exec, 256, true}, -}; - -struct ns_time { - uint64_t ns; - uint64_t ns_frac; + {"hvc", NULL, hvc_exec, NULL, 65536, true}, + {"mmio_read_user", NULL, mmio_read_user_exec, NULL, 65536, true}, + {"mmio_read_vgic", NULL, mmio_read_vgic_exec, NULL, 65536, true}, + {"eoi", NULL, eoi_exec, NULL, 65536, true}, + {"ipi", ipi_prep, ipi_exec, NULL, 65536, true}, + {"ipi_hw", ipi_hw_prep, ipi_exec, NULL, 65536, true}, + {"lpi", lpi_prep, lpi_exec, NULL, 65536, true}, + {"timer_10ms", timer_prep, timer_exec, NULL, 256, true}, }; #define PS_PER_SEC (1000 * 1000 * 1000 * 1000UL) @@ -339,6 +341,9 @@ static void loop_test(struct exit_test 
*test) ticks_to_ns_time(total_ticks, &total_ns); } + if (test->post) + test->post(total_ticks, ntimes, &total_ns); + avg_ns.ns = total_ns.ns / ntimes; avg_ns.ns_frac = total_ns.ns_frac / ntimes; -- 2.19.1
Because timer_exec() measures the duration of (10msec timer + injection
latency), we subtract the 10msec in timer_post() to get the actual
latency.

Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
---
 arm/micro-bench.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arm/micro-bench.c b/arm/micro-bench.c
index 4680ba4..315fc7c 100644
--- a/arm/micro-bench.c
+++ b/arm/micro-bench.c
@@ -254,6 +254,18 @@ static void timer_exec(void)
 	assert_msg(irq_received, "failed to receive PPI in time, but received %d successfully\n", received);
 }
 
+static void timer_post(uint64_t total_ticks, uint64_t ntimes, struct ns_time *total_ns)
+{
+	/*
+	 * We use a 10msec timer to test the latency of PPI,
+	 * so we subtract the ticks of 10msec to get the
+	 * actual latency
+	 */
+
+	total_ticks -= ntimes * (cntfrq / 100);
+	ticks_to_ns_time(total_ticks, total_ns);
+}
+
 static void hvc_exec(void)
 {
 	asm volatile("mov w0, #0x4b000000; hvc #0" ::: "w0");
@@ -302,7 +314,7 @@ static struct exit_test tests[] = {
 	{"ipi", ipi_prep, ipi_exec, NULL, 65536, true},
 	{"ipi_hw", ipi_hw_prep, ipi_exec, NULL, 65536, true},
 	{"lpi", lpi_prep, lpi_exec, NULL, 65536, true},
-	{"timer_10ms", timer_prep, timer_exec, NULL, 256, true},
+	{"timer_10ms", timer_prep, timer_exec, timer_post, 256, true},
 };
 
 #define PS_PER_SEC (1000 * 1000 * 1000 * 1000UL)
-- 
2.19.1
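The correction is plain arithmetic on counter ticks, and a worked example may help. The sketch below is illustrative only: `latency_ticks` and `ticks_to_ns` are hypothetical helpers (the latter is a simplified whole-nanosecond conversion, not the patch's ticks_to_ns_time()), and the 100 MHz counter frequency used in the example is an assumption.

```c
#include <stdint.h>

/*
 * Subtract the programmed 10 ms delay from the accumulated ticks.
 * freq_hz is the counter frequency (cntfrq in the patch).
 */
uint64_t latency_ticks(uint64_t total_ticks, uint64_t ntimes, uint32_t freq_hz)
{
	uint64_t delay_ticks = freq_hz / 100;	/* ticks in 10 ms */

	return total_ticks - ntimes * delay_ticks;
}

/* Convert ticks to whole nanoseconds; the fractional part is dropped. */
uint64_t ticks_to_ns(uint64_t ticks, uint32_t freq_hz)
{
	return ticks * 1000000000ULL / freq_hz;
}
```

For example, with an assumed 100 MHz counter, 10 ms is 1,000,000 ticks. If 256 iterations accumulate 256,512,000 ticks, then 256,000,000 of them are the programmed delay and the remaining 512,000 ticks (5,120,000 ns in total, 20,000 ns per iteration) are the injection latency.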
On Fri, Jul 31, 2020 at 03:42:34PM +0800, Jingyi Wang wrote:
> With the development of the Arm GIC architecture, we think it will be useful
> to add some performance tests to kvm-unit-tests (kut) to measure the cost of
> interrupts. In this series, we add GICv4.1 support to the IPI latency test
> and implement LPI/vtimer latency tests.
>
> This series has been tested on hardware that supports GICv4.1.
>
> Note:
> Based on patch "arm/arm64: timer: Extract irqs at setup time",
> https://www.spinics.net/lists/kvm-arm/msg41425.html
>
> * From v2:
>   - Code and commit message cleanup
>   - Clear nr_ipi_received before ipi_exec(), thanks to Tao Zeng's review
>   - Rebase the patch "Add vtimer latency test" on Andrew's patch

It'd be good if you'd reposted my patch along with this series, since we
didn't merge mine yet either. Don't worry about it now, though, I'll pick it
up the same time I pick up this series, which I plan to do later today
or tomorrow.

Getting this series applied will allow me to try out our new and shiny
gitlab repo :-)

Thanks,
drew

>   - Add test->post() to get actual PPI latency
>
> * From v1:
>   - Fix spelling mistake
>   - Use the existing interface to inject hw sgi to simplify the logic
>   - Add two separate patches to limit the running times and time cost
>     of each individual micro-bench test
>
> Jingyi Wang (10):
>   arm64: microbench: get correct ipi received num
>   arm64: microbench: Generalize ipi test names
>   arm64: microbench: gic: Add ipi latency test for gicv4.1 support kvm
>   arm64: its: Handle its command queue wrapping
>   arm64: microbench: its: Add LPI latency test
>   arm64: microbench: Allow each test to specify its running times
>   arm64: microbench: Add time limit for each individual test
>   arm64: microbench: Add vtimer latency test
>   arm64: microbench: Add test->post() to further process test results
>   arm64: microbench: Add timer_post() to get actual PPI latency
>
>  arm/micro-bench.c          | 256 ++++++++++++++++++++++++++++++-------
>  lib/arm/asm/gic-v3.h       |   3 +
>  lib/arm/asm/gic.h          |   1 +
>  lib/arm64/gic-v3-its-cmd.c |   3 +-
>  4 files changed, 219 insertions(+), 44 deletions(-)
>
> -- 
> 2.19.1
>
On Fri, Jul 31, 2020 at 03:42:41PM +0800, Jingyi Wang wrote:
> In addition to the per-test iteration count, we add a time limit to
> loop_test() to make sure each test finishes within a certain time
> (5 seconds here).
>
> Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> ---
>  arm/micro-bench.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/arm/micro-bench.c b/arm/micro-bench.c
> index 93bd855..09d9d53 100644
> --- a/arm/micro-bench.c
> +++ b/arm/micro-bench.c
> @@ -22,6 +22,7 @@
>  #include <asm/gic.h>
>  #include <asm/gic-v3-its.h>
>
> +#define NS_5_SECONDS (5 * 1000 * 1000 * 1000UL)
>  static u32 cntfrq;
>
>  static volatile bool irq_ready, irq_received;
> @@ -267,23 +268,26 @@ static void loop_test(struct exit_test *test)
>  	uint64_t start, end, total_ticks, ntimes = 0;
>  	struct ns_time total_ns, avg_ns;
>
> +	total_ticks = 0;
>  	if (test->prep) {
>  		if(!test->prep()) {
>  			printf("%s test skipped\n", test->name);
>  			return;
>  		}
>  	}
> -	isb();
> -	start = read_sysreg(cntpct_el0);
> -	while (ntimes < test->times) {
> +
> +	while (ntimes < test->times && total_ns.ns < NS_5_SECONDS) {

total_ns.ns is now being used uninitialized here. It needs to be
initialized to zero above with total_ns = {}.

I'll do this fixup myself.

Thanks,
drew

> +		isb();
> +		start = read_sysreg(cntpct_el0);
>  		test->exec();
> +		isb();
> +		end = read_sysreg(cntpct_el0);
> +
>  		ntimes++;
> +		total_ticks += (end - start);
> +		ticks_to_ns_time(total_ticks, &total_ns);
>  	}
> -	isb();
> -	end = read_sysreg(cntpct_el0);
>
> -	total_ticks = end - start;
> -	ticks_to_ns_time(total_ticks, &total_ns);
>  	avg_ns.ns = total_ns.ns / ntimes;
>  	avg_ns.ns_frac = total_ns.ns_frac / ntimes;
>
> -- 
> 2.19.1
>
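The point about the uninitialized accumulator can be shown in isolation. This is a minimal sketch, assuming a caller-supplied clock instead of the architected counter; `bench_result`, `run_bounded`, `fake_now_ns` and `fake_exec` are hypothetical names, and the fake clock exists only so the loop can be exercised without hardware.

```c
#include <stdint.h>

#define NS_LIMIT	(5 * 1000 * 1000 * 1000ULL)	/* 5 seconds */

struct bench_result {
	uint64_t iterations;
	uint64_t total_ns;
};

/*
 * Run exec() until either max_iters or the time budget is reached.
 * now_ns is a hypothetical monotonic clock supplied by the caller.
 */
struct bench_result run_bounded(void (*exec)(void), uint64_t (*now_ns)(void),
				uint64_t max_iters)
{
	/* Zeroed up front: the loop condition reads total_ns before any write. */
	struct bench_result r = { 0, 0 };

	while (r.iterations < max_iters && r.total_ns < NS_LIMIT) {
		uint64_t start = now_ns();

		exec();
		r.total_ns += now_ns() - start;
		r.iterations++;
	}
	return r;
}

/* Fake clock for demonstration: every read advances time by one second. */
static uint64_t fake_time;

uint64_t fake_now_ns(void)
{
	fake_time += 1000000000ULL;
	return fake_time;
}

/* Empty workload for demonstration. */
void fake_exec(void)
{
}
```

With the fake clock each iteration appears to take one second, so a run that asks for 100 iterations stops after 5 because of the time budget, while a run that asks for 3 stops on the iteration count.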
On Fri, Jul 31, 2020 at 03:42:42PM +0800, Jingyi Wang wrote: > Trigger PPIs by setting up a 10msec timer and test the latency. > > Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com> > --- > arm/micro-bench.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 52 insertions(+), 1 deletion(-) > > diff --git a/arm/micro-bench.c b/arm/micro-bench.c > index 09d9d53..1e1bde5 100644 > --- a/arm/micro-bench.c > +++ b/arm/micro-bench.c > @@ -21,8 +21,10 @@ > #include <libcflat.h> > #include <asm/gic.h> > #include <asm/gic-v3-its.h> > +#include <asm/timer.h> > > #define NS_5_SECONDS (5 * 1000 * 1000 * 1000UL) > + > static u32 cntfrq; > > static volatile bool irq_ready, irq_received; > @@ -33,9 +35,16 @@ static void (*write_eoir)(u32 irqstat); > > static void gic_irq_handler(struct pt_regs *regs) > { > + u32 irqstat = gic_read_iar(); > irq_ready = false; > irq_received = true; > - gic_write_eoir(gic_read_iar()); > + gic_write_eoir(irqstat); > + > + if (irqstat == PPI(TIMER_VTIMER_IRQ)) { > + write_sysreg((ARCH_TIMER_CTL_IMASK | ARCH_TIMER_CTL_ENABLE), > + cntv_ctl_el0); > + isb(); > + } > irq_ready = true; > } > > @@ -198,6 +207,47 @@ static void lpi_exec(void) > assert_msg(irq_received, "failed to receive LPI in time, but received %d successfully\n", received); > } > > +static bool timer_prep(void) > +{ > + static void *gic_isenabler; This doesn't need to be static. > + > + gic_enable_defaults(); > + install_irq_handler(EL1H_IRQ, gic_irq_handler); > + local_irq_enable(); > + > + gic_isenabler = gicv3_sgi_base() + GICR_ISENABLER0; We can't assume GICv3. This test also runs with GICv2. I'll fix this up myself. 
Thanks, drew > + writel(1 << PPI(TIMER_VTIMER_IRQ), gic_isenabler); > + write_sysreg(ARCH_TIMER_CTL_ENABLE, cntv_ctl_el0); > + isb(); > + > + gic_prep_common(); > + return true; > +} > + > +static void timer_exec(void) > +{ > + u64 before_timer; > + u64 timer_10ms; > + unsigned tries = 1 << 28; > + static int received = 0; > + > + irq_received = false; > + > + before_timer = read_sysreg(cntvct_el0); > + timer_10ms = cntfrq / 100; > + write_sysreg(before_timer + timer_10ms, cntv_cval_el0); > + write_sysreg(ARCH_TIMER_CTL_ENABLE, cntv_ctl_el0); > + isb(); > + > + while (!irq_received && tries--) > + cpu_relax(); > + > + if (irq_received) > + ++received; > + > + assert_msg(irq_received, "failed to receive PPI in time, but received %d successfully\n", received); > +} > + > static void hvc_exec(void) > { > asm volatile("mov w0, #0x4b000000; hvc #0" ::: "w0"); > @@ -245,6 +295,7 @@ static struct exit_test tests[] = { > {"ipi", ipi_prep, ipi_exec, 65536, true}, > {"ipi_hw", ipi_hw_prep, ipi_exec, 65536, true}, > {"lpi", lpi_prep, lpi_exec, 65536, true}, > + {"timer_10ms", timer_prep, timer_exec, 256, true}, > }; > > struct ns_time { > -- > 2.19.1 > >
On Fri, Jul 31, 2020 at 03:42:43PM +0800, Jingyi Wang wrote: > Under certain circumstances, we need to further process microbench > test results, so we add test->post() in the microbench framework, > later patch will use that. > > Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com> > --- > arm/micro-bench.c | 31 ++++++++++++++++++------------- > 1 file changed, 18 insertions(+), 13 deletions(-) > > diff --git a/arm/micro-bench.c b/arm/micro-bench.c > index 1e1bde5..4680ba4 100644 > --- a/arm/micro-bench.c > +++ b/arm/micro-bench.c > @@ -33,6 +33,12 @@ static int nr_ipi_received; > static void *vgic_dist_base; > static void (*write_eoir)(u32 irqstat); > > +struct ns_time { > + uint64_t ns; > + uint64_t ns_frac; > +}; Missing blank line > +static void ticks_to_ns_time(uint64_t ticks, struct ns_time *ns_time); You could have moved the whole function up here. > + > static void gic_irq_handler(struct pt_regs *regs) > { > u32 irqstat = gic_read_iar(); > @@ -283,24 +289,20 @@ struct exit_test { > const char *name; > bool (*prep)(void); > void (*exec)(void); > + void (*post)(uint64_t, uint64_t, struct ns_time*); > u32 times; > bool run; > }; > > static struct exit_test tests[] = { > - {"hvc", NULL, hvc_exec, 65536, true}, > - {"mmio_read_user", NULL, mmio_read_user_exec, 65536, true}, > - {"mmio_read_vgic", NULL, mmio_read_vgic_exec, 65536, true}, > - {"eoi", NULL, eoi_exec, 65536, true}, > - {"ipi", ipi_prep, ipi_exec, 65536, true}, > - {"ipi_hw", ipi_hw_prep, ipi_exec, 65536, true}, > - {"lpi", lpi_prep, lpi_exec, 65536, true}, > - {"timer_10ms", timer_prep, timer_exec, 256, true}, > -}; > - > -struct ns_time { > - uint64_t ns; > - uint64_t ns_frac; > + {"hvc", NULL, hvc_exec, NULL, 65536, true}, > + {"mmio_read_user", NULL, mmio_read_user_exec, NULL, 65536, true}, > + {"mmio_read_vgic", NULL, mmio_read_vgic_exec, NULL, 65536, true}, > + {"eoi", NULL, eoi_exec, NULL, 65536, true}, > + {"ipi", ipi_prep, ipi_exec, NULL, 65536, true}, > + {"ipi_hw", ipi_hw_prep, ipi_exec, 
NULL, 65536, true}, > + {"lpi", lpi_prep, lpi_exec, NULL, 65536, true}, > + {"timer_10ms", timer_prep, timer_exec, NULL, 256, true}, > }; > > #define PS_PER_SEC (1000 * 1000 * 1000 * 1000UL) > @@ -339,6 +341,9 @@ static void loop_test(struct exit_test *test) > ticks_to_ns_time(total_ticks, &total_ns); > } > > + if (test->post) > + test->post(total_ticks, ntimes, &total_ns); > + We can drop the ns_time structure and pass total_ticks by reference. > avg_ns.ns = total_ns.ns / ntimes; > avg_ns.ns_frac = total_ns.ns_frac / ntimes; > > -- > 2.19.1 > > I can do these changes myself. Thanks, drew
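The suggestion to drop the ns_time argument and pass total_ticks by reference might look roughly like the following. This is only a guess at the shape of such a change, not the fixup that was actually applied: `bench_test`, `timer_post_ticks`, `finish_test` and the 100 MHz `counter_hz` value are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced test descriptor: only the fields needed here. */
struct bench_test {
	const char *name;
	void (*post)(uint64_t ntimes, uint64_t *total_ticks);
};

static uint32_t counter_hz = 100000000;	/* assumed 100 MHz counter */

/* Remove the programmed 10 ms delay of every iteration, in place. */
void timer_post_ticks(uint64_t ntimes, uint64_t *total_ticks)
{
	*total_ticks -= ntimes * (counter_hz / 100);
}

/* Apply the optional hook, then convert ticks to nanoseconds once. */
uint64_t finish_test(const struct bench_test *test, uint64_t ntimes,
		     uint64_t total_ticks)
{
	if (test->post)
		test->post(ntimes, &total_ticks);
	return total_ticks * 1000000000ULL / counter_hz;
}
```

One possible benefit of this shape is that the hook only adjusts ticks, and the conversion to nanoseconds happens in a single place after the hook has run.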
On Fri, Jul 31, 2020 at 03:42:34PM +0800, Jingyi Wang wrote:
> With the development of the Arm GIC architecture, we think it will be useful
> to add some performance tests to kvm-unit-tests (kut) to measure the cost of
> interrupts. In this series, we add GICv4.1 support to the IPI latency test
> and implement LPI/vtimer latency tests.
>
> This series has been tested on hardware that supports GICv4.1.
>
> Note:
> Based on patch "arm/arm64: timer: Extract irqs at setup time",
> https://www.spinics.net/lists/kvm-arm/msg41425.html
>
> * From v2:
>   - Code and commit message cleanup
>   - Clear nr_ipi_received before ipi_exec(), thanks to Tao Zeng's review
>   - Rebase the patch "Add vtimer latency test" on Andrew's patch
>   - Add test->post() to get actual PPI latency
>
> * From v1:
>   - Fix spelling mistake
>   - Use the existing interface to inject hw sgi to simplify the logic
>   - Add two separate patches to limit the running times and time cost
>     of each individual micro-bench test
>
> Jingyi Wang (10):
>   arm64: microbench: get correct ipi received num
>   arm64: microbench: Generalize ipi test names
>   arm64: microbench: gic: Add ipi latency test for gicv4.1 support kvm
>   arm64: its: Handle its command queue wrapping
>   arm64: microbench: its: Add LPI latency test
>   arm64: microbench: Allow each test to specify its running times
>   arm64: microbench: Add time limit for each individual test
>   arm64: microbench: Add vtimer latency test
>   arm64: microbench: Add test->post() to further process test results
>   arm64: microbench: Add timer_post() to get actual PPI latency
>
>  arm/micro-bench.c          | 256 ++++++++++++++++++++++++++++++-------
>  lib/arm/asm/gic-v3.h       |   3 +
>  lib/arm/asm/gic.h          |   1 +
>  lib/arm64/gic-v3-its-cmd.c |   3 +-
>  4 files changed, 219 insertions(+), 44 deletions(-)
>
> -- 
> 2.19.1
>

Pushed (to the new repo at https://gitlab.com/kvm-unit-tests/kvm-unit-tests.git)

Thanks,
drew
On 7/31/2020 8:01 PM, Andrew Jones wrote:
> On Fri, Jul 31, 2020 at 03:42:34PM +0800, Jingyi Wang wrote:
>> With the development of arm gic architecture, we think it will be useful
>> to add some performance test in kut to measure the cost of interrupts.
>> In this series, we add GICv4.1 support for ipi latency test and
>> implement LPI/vtimer latency test.
>>
>> This series of patches has been tested on GICv4.1 supported hardware.
>>
>> Note:
>> Based on patch "arm/arm64: timer: Extract irqs at setup time",
>> https://www.spinics.net/lists/kvm-arm/msg41425.html
>>
>> * From v2:
>>   - Code and commit message cleanup
>>   - Clear nr_ipi_received before ipi_exec() thanks for Tao Zeng's review
>>   - rebase the patch "Add vtimer latency test" on Andrew's patch
>
> It'd be good if you'd reposted my patch along with this series, since we
> didn't merge mine yet either. Don't worry about now, though, I'll pick it
> up the same time I pick up this series, which I plan to do later today
> or tomorrow.
>
> Getting this series applied will allow me to try out our new and shiny
> gitlab repo :-)
>
> Thanks,
> drew
>

Thanks for your review and the fix.

>>   - Add test->post() to get actual PPI latency
>>
>> * From v1:
>>   - Fix spelling mistake
>>   - Use the existing interface to inject hw sgi to simply the logic
>>   - Add two separate patches to limit the running times and time cost
>>     of each individual micro-bench test
>>
>> Jingyi Wang (10):
>>   arm64: microbench: get correct ipi received num
>>   arm64: microbench: Generalize ipi test names
>>   arm64: microbench: gic: Add ipi latency test for gicv4.1 support kvm
>>   arm64: its: Handle its command queue wrapping
>>   arm64: microbench: its: Add LPI latency test
>>   arm64: microbench: Allow each test to specify its running times
>>   arm64: microbench: Add time limit for each individual test
>>   arm64: microbench: Add vtimer latency test
>>   arm64: microbench: Add test->post() to further process test results
>>   arm64: microbench: Add timer_post() to get actual PPI latency
>>
>>  arm/micro-bench.c          | 256 ++++++++++++++++++++++++++++++-------
>>  lib/arm/asm/gic-v3.h       |   3 +
>>  lib/arm/asm/gic.h          |   1 +
>>  lib/arm64/gic-v3-its-cmd.c |   3 +-
>>  4 files changed, 219 insertions(+), 44 deletions(-)
>>
>> --
>> 2.19.1
>>
Hi all,

Currently, kvm-unit-tests only supports GICv3 vLPI injection. May I ask
whether there is any plan or suggestion for constructing an irq bypass
mechanism to test vLPI direct injection in kvm-unit-tests?

Thanks,
Jingyi
On 2020-08-05 12:54, Jingyi Wang wrote:
> Hi all,
>
> Currently, kvm-unit-tests only support GICv3 vLPI injection. May I ask
> is there any plan or suggestion on constructing irq bypass mechanism
> to test vLPI direct injection in kvm-unit-tests?

I'm not sure what you are asking for here. VLPIs are only delivered
from a HW device, and the offloading mechanism isn't visible from
userspace (you either have an enabled GICv4 implementation, or
you don't).

There are ways to *trigger* device MSIs from userspace and inject
them in a guest, but that's only a debug feature, which shouldn't
be enabled on a production system.

M.
--
Jazz is not dead. It just smells funny...
Hi Marc,
On 8/5/2020 8:13 PM, Marc Zyngier wrote:
> On 2020-08-05 12:54, Jingyi Wang wrote:
>> Hi all,
>>
>> Currently, kvm-unit-tests only support GICv3 vLPI injection. May I ask
>> is there any plan or suggestion on constructing irq bypass mechanism
>> to test vLPI direct injection in kvm-unit-tests?
>
> I'm not sure what you are asking for here. VLPIs are only delivered
> from a HW device, and the offloading mechanism isn't visible from
> userspace (you either have an enabled GICv4 implementation, or
> you don't).
>
> There are ways to *trigger* device MSIs from userspace and inject
> them in a guest, but that's only a debug feature, which shouldn't
> be enabled on a production system.
>
> M.

Sorry for the late reply.

As I mentioned before, we want to add a vLPI direct injection test
to KUT and, at the same time, measure the latency of hardware vLPI
injection.

Sure, a vLPI is triggered by hardware. Since the kernel supports
sending an ITS INT command in the guest to trigger a vLPI, I wonder
whether it is possible to add an extra interface that makes a vLPI
hardware-offloaded (just as kvm_vgic_v4_set_forwarding() does). If so,
vgic_its_trigger_msi() could inject the vLPI directly instead of using
an LR.

Thanks,
Jingyi
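
For readers unfamiliar with the "ITS INT command" mentioned above, here is a rough sketch of how such a command could be encoded. It assumes the GICv3 ITS layout of four 64-bit words per command, with the command number in bits [7:0] and the DeviceID in bits [63:32] of the first word, and the EventID in bits [31:0] of the second. The type and helper names are hypothetical, and the sketch only builds the command; placing it in the command queue and advancing the write pointer is left out.

```c
#include <assert.h>
#include <stdint.h>

/* An ITS command occupies four 64-bit words (32 bytes). */
struct its_cmd_sketch {
	uint64_t raw[4];
};

/* INT command number, per the GICv3 ITS command set (assumed here). */
#define ITS_CMD_INT_SKETCH 0x03ULL

/* Place val into bits [hi:lo] of *word, leaving the other bits untouched. */
static void mask_encode(uint64_t *word, uint64_t val, int hi, int lo)
{
	uint64_t mask = (hi - lo == 63) ? ~0ULL : ((1ULL << (hi - lo + 1)) - 1);

	*word &= ~(mask << lo);
	*word |= (val & mask) << lo;
}

/* Build an INT command asking the ITS to raise event_id for device_id. */
static struct its_cmd_sketch encode_int(uint32_t device_id, uint32_t event_id)
{
	struct its_cmd_sketch cmd = { { 0, 0, 0, 0 } };

	mask_encode(&cmd.raw[0], ITS_CMD_INT_SKETCH, 7, 0); /* command number */
	mask_encode(&cmd.raw[0], device_id, 63, 32);        /* DeviceID */
	mask_encode(&cmd.raw[1], event_id, 31, 0);          /* EventID */
	return cmd;
}
```

The field positions should be checked against the GIC architecture specification before relying on them.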
On 2020-08-11 02:48, Jingyi Wang wrote:
> Hi Marc,
>
> On 8/5/2020 8:13 PM, Marc Zyngier wrote:
>> On 2020-08-05 12:54, Jingyi Wang wrote:
>>> Hi all,
>>>
>>> Currently, kvm-unit-tests only support GICv3 vLPI injection. May I ask
>>> is there any plan or suggestion on constructing irq bypass mechanism
>>> to test vLPI direct injection in kvm-unit-tests?
>>
>> I'm not sure what you are asking for here. VLPIs are only delivered
>> from a HW device, and the offloading mechanism isn't visible from
>> userspace (you either have an enabled GICv4 implementation, or
>> you don't).
>>
>> There are ways to *trigger* device MSIs from userspace and inject
>> them in a guest, but that's only a debug feature, which shouldn't
>> be enabled on a production system.
>>
>> M.
>
> Sorry for the late reply.
>
> As I mentioned before, we want to add vLPI direct injection test
> in KUT, meanwhile measure the latency of hardware vLPI injection.
>
> Sure, vLPI is triggered by hardware. Since kernel supports sending
> ITS INT command in guest to trigger vLPI, I wonder if it is possible

So can the host.

> to add an extra interface to make a vLPI hardware-offload(just as
> kvm_vgic_v4_set_forwarding() does). If so, vgic_its_trigger_msi()
> can inject vLPI directly instead of using LR.

The interface exists, it is in debugfs. But it mandates that the
device exists. And no, I am not willing to add an extra KVM userspace
API for this.

The whole concept of injecting an INT to measure the performance
of GICv4 is slightly bonkers, actually. Most of the cost is paid
on the injection path (queuing a pair of commands, waiting until
the ITS wakes up and generates the signal...).

What you really want to measure is the time from generation of
the LPI by a device until the guest acknowledges the interrupt
to the device itself, and this can only be implemented in the
device.

M.
--
Jazz is not dead. It just smells funny...
On 8/11/2020 3:49 PM, Marc Zyngier wrote:
> On 2020-08-11 02:48, Jingyi Wang wrote:
>> Hi Marc,
>>
>> On 8/5/2020 8:13 PM, Marc Zyngier wrote:
>>> On 2020-08-05 12:54, Jingyi Wang wrote:
>>>> Hi all,
>>>>
>>>> Currently, kvm-unit-tests only support GICv3 vLPI injection. May I ask
>>>> is there any plan or suggestion on constructing irq bypass mechanism
>>>> to test vLPI direct injection in kvm-unit-tests?
>>>
>>> I'm not sure what you are asking for here. VLPIs are only delivered
>>> from a HW device, and the offloading mechanism isn't visible from
>>> userspace (you either have an enabled GICv4 implementation, or
>>> you don't).
>>>
>>> There are ways to *trigger* device MSIs from userspace and inject
>>> them in a guest, but that's only a debug feature, which shouldn't
>>> be enabled on a production system.
>>>
>>> M.
>>
>> Sorry for the late reply.
>>
>> As I mentioned before, we want to add vLPI direct injection test
>> in KUT, meanwhile measure the latency of hardware vLPI injection.
>>
>> Sure, vLPI is triggered by hardware. Since kernel supports sending
>> ITS INT command in guest to trigger vLPI, I wonder if it is possible
>
> So can the host.
>
>> to add an extra interface to make a vLPI hardware-offload(just as
>> kvm_vgic_v4_set_forwarding() does). If so, vgic_its_trigger_msi()
>> can inject vLPI directly instead of using LR.
>
> The interface exists, it is in debugfs. But it mandates that the
> device exists. And no, I am not willing to add an extra KVM userspace
> API for this.
>
> The whole concept of injecting an INT to measure the performance
> of GICv4 is slightly bonkers, actually. Most of the cost is paid
> on the injection path (queuing a pair of command, waiting until
> the ITS wakes up and generate the signal...).
>
> What you really want to measure is the time from generation of
> the LPI by a device until the guest acknowledges the interrupt
> to the device itself. and this can only be implemented in the
> device.
>
> M.

OK, understood. I just thought that measuring the latency of the
kvm->guest path could be useful.

Thanks,
Jingyi
On 2020-08-17 02:46, Jingyi Wang wrote:
> On 8/11/2020 3:49 PM, Marc Zyngier wrote:
>> On 2020-08-11 02:48, Jingyi Wang wrote:

[...]

>>> As I mentioned before, we want to add vLPI direct injection test
>>> in KUT, meanwhile measure the latency of hardware vLPI injection.
>>>
>>> Sure, vLPI is triggered by hardware. Since kernel supports sending
>>> ITS INT command in guest to trigger vLPI, I wonder if it is possible
>>
>> So can the host.
>>
>>> to add an extra interface to make a vLPI hardware-offload(just as
>>> kvm_vgic_v4_set_forwarding() does). If so, vgic_its_trigger_msi()
>>> can inject vLPI directly instead of using LR.
>>
>> The interface exists, it is in debugfs. But it mandates that the
>> device exists. And no, I am not willing to add an extra KVM userspace
>> API for this.
>>
>> The whole concept of injecting an INT to measure the performance
>> of GICv4 is slightly bonkers, actually. Most of the cost is paid
>> on the injection path (queuing a pair of command, waiting until
>> the ITS wakes up and generate the signal...).
>>
>> What you really want to measure is the time from generation of
>> the LPI by a device until the guest acknowledges the interrupt
>> to the device itself. and this can only be implemented in the
>> device.
>>
>> M.
>
> OK understood. I just thought measuring the latency of the path
> kvm->guest can be useful.

That's the problem. There is no way you can implement this, because
you cannot distinguish injection latency from the delivery latency.
And frankly, it doesn't matter, because the hypervisor is not on
that path at all (if it is slow, that's because the HW is slow, and
you can't change anything in KVM to make it better).

On the other hand, measuring the latency of a guest being scheduled
back in when blocked on WFI would be much more relevant, as this is
exactly what would happen on delivery of a doorbell.

M.
--
Jazz is not dead. It just smells funny...
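
The point about injection versus delivery latency can be illustrated with a small sketch. It assumes a test that can only timestamp the moment it issues the INT and the moment the guest handler runs. The struct, the function names and any tick values fed to them are made up for illustration and are not measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical breakdown of one INT-triggered vLPI, in timer ticks. */
struct lpi_path {
	uint64_t inject_ticks;  /* queue the commands, wait for the ITS to act */
	uint64_t deliver_ticks; /* signal generated until the guest handler runs */
};

/* The test timestamps only the two ends, so it only ever sees the sum. */
static uint64_t observed_ticks(struct lpi_path p)
{
	assert(p.inject_ticks <= UINT64_MAX - p.deliver_ticks);
	return p.inject_ticks + p.deliver_ticks;
}

/* Two different splits cannot be told apart when their sums match. */
static int indistinguishable(struct lpi_path a, struct lpi_path b)
{
	return observed_ticks(a) == observed_ticks(b);
}
```

A path dominated by injection cost and one dominated by delivery cost produce the same observed number, which is why the measurement says little about delivery on its own.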