[Qemu-devel] [PATCH v2 0/5] AArch64 TLB performance improvements

* [Qemu-devel] [PATCH v2 0/5] AArch64 TLB performance improvements
@ 2014-07-30 15:20 Alex Bennée
  2014-07-30 15:20 ` [Qemu-devel] [PATCH v2 1/5] target-arm: don't hardcode mask values in arm_cpu_handle_mmu_fault Alex Bennée
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Alex Bennée @ 2014-07-30 15:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Alex Bennée

Hi,

Not too much has changed:

  * added a review tag
  * fixed up review comments
  * added some notes about benchmark results
  * added a patch to disable ARMv5 in AArch64 build

The most important thing is I've measured a 25-30% improvement in
kernel and android boot time.

Alex Bennée (5):
  target-arm: don't hardcode mask values in arm_cpu_handle_mmu_fault
  target-arm: A64: fix TLB flush instructions
  target-arm: A64: fix use 12 bit page tables for AArch64
  scripts/make_device_config.sh: inline includes
  target-arm: A64: disable a bunch of ARMv5 machines

 default-configs/aarch64-softmmu.mak |  5 ++++-
 default-configs/arm-softmmu.mak     |  1 +
 hw/arm/Makefile.objs                | 19 ++++++++++++++----
 hw/arm/realview.c                   |  6 ++++++
 scripts/make_device_config.sh       | 39 ++++++++++++++++++++++---------------
 target-arm/cpu.c                    | 36 +++++++++++++++++++++-------------
 target-arm/cpu.h                    | 13 ++++++++++---
 target-arm/helper.c                 | 14 +++++++++----
 8 files changed, 92 insertions(+), 41 deletions(-)

-- 
2.0.3

* [Qemu-devel] [PATCH v2 1/5] target-arm: don't hardcode mask values in arm_cpu_handle_mmu_fault
  2014-07-30 15:20 [Qemu-devel] [PATCH v2 0/5] AArch64 TLB performance improvements Alex Bennée
@ 2014-07-30 15:20 ` Alex Bennée
  2014-07-30 15:20 ` [Qemu-devel] [PATCH v2 2/5] target-arm: A64: fix TLB flush instructions Alex Bennée
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Alex Bennée @ 2014-07-30 15:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Alex Bennée

The hardcoded 0x3ff masks assume 1k pages, so we break quickly when we
change TARGET_PAGE_SIZE.
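
A minimal sketch of why the literal mask is fragile. The helper names
(page_mask, align_hardcoded, align_to_page) are made up for illustration and
are not QEMU functions; page_mask only stands in for what a page-size-derived
mask such as TARGET_PAGE_MASK would compute.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a mask derived from the configured page size,
 * parameterised on the number of page bits so 10 and 12 can be compared. */
static uint64_t page_mask(unsigned page_bits)
{
    assert(page_bits > 0 && page_bits < 64);
    return ~((UINT64_C(1) << page_bits) - 1);
}

/* Old behaviour: always clears the low 10 bits, i.e. assumes 1k pages. */
static uint64_t align_hardcoded(uint64_t addr)
{
    return addr & ~UINT64_C(0x3ff);
}

/* New behaviour: clears as many low bits as the configured page size has. */
static uint64_t align_to_page(uint64_t addr, unsigned page_bits)
{
    return addr & page_mask(page_bits);
}
```

With 10 page bits the two helpers agree. With 12 page bits the hardcoded
version leaves bits 10 and 11 of the address set, so the result is no longer
page aligned.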

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

diff --git a/target-arm/helper.c b/target-arm/helper.c
index a0e57cd..aa5d267 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -4029,8 +4029,8 @@ int arm_cpu_handle_mmu_fault(CPUState *cs, vaddr address,
                         &page_size);
     if (ret == 0) {
         /* Map a single [sub]page.  */
-        phys_addr &= ~(hwaddr)0x3ff;
-        address &= ~(target_ulong)0x3ff;
+        phys_addr &= TARGET_PAGE_MASK;
+        address &= TARGET_PAGE_MASK;
         tlb_set_page(cs, address, phys_addr, prot, mmu_idx, page_size);
         return 0;
     }
-- 
2.0.3

* [Qemu-devel] [PATCH v2 2/5] target-arm: A64: fix TLB flush instructions
  2014-07-30 15:20 [Qemu-devel] [PATCH v2 0/5] AArch64 TLB performance improvements Alex Bennée
  2014-07-30 15:20 ` [Qemu-devel] [PATCH v2 1/5] target-arm: don't hardcode mask values in arm_cpu_handle_mmu_fault Alex Bennée
@ 2014-07-30 15:20 ` Alex Bennée
  2014-07-30 15:20 ` [Qemu-devel] [PATCH v2 3/5] target-arm: A64: fix use 12 bit page tables for AArch64 Alex Bennée
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Alex Bennée @ 2014-07-30 15:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Alex Bennée

According to the ARM ARM we weren't correctly flushing the TLB entries
where bits 63:56 didn't match bit 55 of the virtual address. This
exposed a problem when we switched QEMU's internal TARGET_PAGE_BITS to
12 for aarch64.
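
A rough sketch of the address reconstruction, for readers without the QEMU
tree to hand. sign_extract64 below is an illustrative re-implementation and
not QEMU's own sextract64, and tlbi_page_addr is a hypothetical wrapper. It
assumes the operand layout I understand the ARM ARM to describe, with
VA[55:12] in the low 44 bits and the ASID in the top 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Take `length` bits of `value` starting at bit `start` and sign-extend
 * them to 64 bits. Relies on arithmetic right shift of negative values,
 * which is implementation-defined in C but is what mainstream compilers do. */
static int64_t sign_extract64(uint64_t value, unsigned start, unsigned length)
{
    assert(start < 64 && length > 0 && length <= 64 - start);
    return (int64_t)(value << (64 - length - start)) >> (64 - length);
}

/* Rebuild the page address to flush from a TLBI-by-VA operand: shift the
 * page number up by 12, then copy bit 55 into bits 63:56. */
static uint64_t tlbi_page_addr(uint64_t operand)
{
    return (uint64_t)sign_extract64(operand << 12, 0, 56);
}
```

A plain `operand << 12` leaves bits 63:56 clear (or holding shifted ASID
bits), so it would not match a kernel address such as 0xffffffc000080000,
whereas the sign-extended form does.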

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

---

v2:
  - remove mangled ARM ARM quote

diff --git a/target-arm/helper.c b/target-arm/helper.c
index aa5d267..906940d 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -1766,12 +1766,17 @@ static CPAccessResult aa64_cacheop_access(CPUARMState *env,
     return CP_ACCESS_OK;
 }
 
+/* See: D4.7.2 TLB maintenance requirements and the TLB maintenance instructions
+ * Page D4-1736 (DDI0487A.b)
+ */
+
 static void tlbi_aa64_va_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                uint64_t value)
 {
     /* Invalidate by VA (AArch64 version) */
     ARMCPU *cpu = arm_env_get_cpu(env);
-    uint64_t pageaddr = value << 12;
+    uint64_t pageaddr = sextract64(value << 12, 0, 56);
+
     tlb_flush_page(CPU(cpu), pageaddr);
 }
 
@@ -1780,7 +1785,8 @@ static void tlbi_aa64_vaa_write(CPUARMState *env, const ARMCPRegInfo *ri,
 {
     /* Invalidate by VA, all ASIDs (AArch64 version) */
     ARMCPU *cpu = arm_env_get_cpu(env);
-    uint64_t pageaddr = value << 12;
+    uint64_t pageaddr = sextract64(value << 12, 0, 56);
+
     tlb_flush_page(CPU(cpu), pageaddr);
 }
 
-- 
2.0.3

* [Qemu-devel] [PATCH v2 3/5] target-arm: A64: fix use 12 bit page tables for AArch64
  2014-07-30 15:20 [Qemu-devel] [PATCH v2 0/5] AArch64 TLB performance improvements Alex Bennée
  2014-07-30 15:20 ` [Qemu-devel] [PATCH v2 1/5] target-arm: don't hardcode mask values in arm_cpu_handle_mmu_fault Alex Bennée
  2014-07-30 15:20 ` [Qemu-devel] [PATCH v2 2/5] target-arm: A64: fix TLB flush instructions Alex Bennée
@ 2014-07-30 15:20 ` Alex Bennée
  2014-07-30 15:20 ` [Qemu-devel] [PATCH v2 4/5] scripts/make_device_config.sh: inline includes Alex Bennée
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Alex Bennée @ 2014-07-30 15:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Alex Bennée

The AArch64 architecture only supports 4k+ pages so using a smaller value
for QEMU's internal page table handling only makes us less efficient. I
ran some simple benchmarks and measured a 25-30% speed improvement for
CPU bound tasks like booting the kernel or compressing a section of a
file-system.
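
To put a number on "less efficient", here is a back-of-the-envelope sketch.
It assumes each QEMU TLB entry covers exactly one page of
(1 << TARGET_PAGE_BITS) bytes, which is how I read the "Map a single
[sub]page" comment in arm_cpu_handle_mmu_fault. entries_per_guest_page is a
hypothetical helper, not an existing function.

```c
#include <assert.h>

/* Number of QEMU TLB entries needed to cover one guest page, assuming each
 * entry maps exactly (1 << target_page_bits) bytes. Guest pages smaller
 * than the QEMU page size cannot be represented and are rejected. */
static unsigned entries_per_guest_page(unsigned guest_page_bits,
                                       unsigned target_page_bits)
{
    assert(guest_page_bits >= target_page_bits);
    assert(guest_page_bits - target_page_bits < 32);
    return 1u << (guest_page_bits - target_page_bits);
}
```

Under that assumption a 4k guest page needs four fills with
TARGET_PAGE_BITS at 10 and a single fill with TARGET_PAGE_BITS at 12.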

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---

v2:
  - fix AArch64 references
  - add benchmark notes to commit msg

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index c83f249..83df513 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -1051,11 +1051,18 @@ bool write_cpustate_to_list(ARMCPU *cpu);
 #if defined(CONFIG_USER_ONLY)
 #define TARGET_PAGE_BITS 12
 #else
-/* The ARM MMU allows 1k pages.  */
-/* ??? Linux doesn't actually use these, and they're deprecated in recent
-   architecture revisions.  Maybe a configure option to disable them.  */
+#if defined(TARGET_AARCH64)
+/* You can't configure 1k pages on AArch64 hardware */
+#define TARGET_PAGE_BITS 12
+#else
+/* The ARM MMU allows 1k pages - although they are not used by Linux
+ * FIXME?: they're deprecated in recent architecture revisions and
+ * this does create a performance hit. Maybe a configure option to
+ * disable them?
+ */
 #define TARGET_PAGE_BITS 10
 #endif
+#endif
 
 #if defined(TARGET_AARCH64)
 #  define TARGET_PHYS_ADDR_SPACE_BITS 48
-- 
2.0.3

* [Qemu-devel] [PATCH v2 4/5] scripts/make_device_config.sh: inline includes
  2014-07-30 15:20 [Qemu-devel] [PATCH v2 0/5] AArch64 TLB performance improvements Alex Bennée
                   ` (2 preceding siblings ...)
  2014-07-30 15:20 ` [Qemu-devel] [PATCH v2 3/5] target-arm: A64: fix use 12 bit page tables for AArch64 Alex Bennée
@ 2014-07-30 15:20 ` Alex Bennée
  2014-07-30 15:20 ` [Qemu-devel] [PATCH v2 5/5] target-arm: A64: disable a bunch of ARMv5 machines Alex Bennée
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Alex Bennée @ 2014-07-30 15:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Alex Bennée

Previously we processed all the includes first and then just
concatenated them to the end of the eventual file. This meant you
couldn't include a set of configs and then turn off only some of them.

Now you can do (for example):

    # We support most of the 32 bit boards so need all their config
    include arm-softmmu.mak
    # we explicitly disable ones that require old ARMv5 support
    CONFIG_ARMV5_BOARDS=n

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

diff --git a/scripts/make_device_config.sh b/scripts/make_device_config.sh
index 7242707..b0d0b51 100644
--- a/scripts/make_device_config.sh
+++ b/scripts/make_device_config.sh
@@ -1,6 +1,8 @@
 #! /bin/sh
 # Construct a target device config file from a default, pulling in any
-# files from include directives.
+# files from include directives. The include files are inlined in the order
+# they are found in the source file so you can reverse a sub-set of
+# settings afterwards.
 
 dest=$1.tmp
 dep=`dirname $1`-`basename $1`.d
@@ -8,21 +10,26 @@ src=$2
 src_dir=`dirname $src`
 all_includes=
 
-process_includes () {
-  cat $1 | grep '^include' | \
-  while read include file ; do
-    all_includes="$all_includes $src_dir/$file"
-    process_includes $src_dir/$file
-  done
+process_file () {
+    local in=$1
+    local out=$2
+    while read first second; do
+        if [ "$first" = "include" ]; then
+            local inc="$src_dir/$second"
+            all_includes="$all_includes $inc"
+            echo "# include file: $inc" >> $out
+            process_file $inc $out
+            echo "# end of include: $inc" >> $out
+        else
+            if [ "x$second" = "x" ]; then
+                echo $first >> $out
+            else
+                echo "$first $second" >> $out
+            fi
+        fi
+    done < $in
 }
 
-f=$src
-while [ -n "$f" ] ; do
-  f=`cat $f | tr -d '\r' | awk '/^include / {printf "'$src_dir'/%s ", $2}'`
-  [ $? = 0 ] || exit 1
-  all_includes="$all_includes $f"
-done
-process_includes $src > $dest
-
-cat $src $all_includes | grep -v '^include' > $dest
+echo "# This file is auto-generated by make_device_config.sh" > $dest
+process_file $src $dest
 echo "$1: $all_includes" > $dep
-- 
2.0.3

* [Qemu-devel] [PATCH v2 5/5] target-arm: A64: disable a bunch of ARMv5 machines
  2014-07-30 15:20 [Qemu-devel] [PATCH v2 0/5] AArch64 TLB performance improvements Alex Bennée
                   ` (3 preceding siblings ...)
  2014-07-30 15:20 ` [Qemu-devel] [PATCH v2 4/5] scripts/make_device_config.sh: inline includes Alex Bennée
@ 2014-07-30 15:20 ` Alex Bennée
  2014-08-01 16:45   ` Christopher Covington
  2014-08-01 16:06 ` [Qemu-devel] [PATCH v2 0/5] AArch64 TLB performance improvements Peter Maydell
  2014-08-01 19:35 ` Paolo Bonzini
  6 siblings, 1 reply; 17+ messages in thread
From: Alex Bennée @ 2014-07-30 15:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Alex Bennée

If you attempt to run a system image which uses 1k pages in the
qemu-system-aarch64 build it will fail thanks to the change to 12 bit
pages. The boards are still available for the qemu-system-arm build.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

diff --git a/default-configs/aarch64-softmmu.mak b/default-configs/aarch64-softmmu.mak
index 6d3b5c7..2bf26a0 100644
--- a/default-configs/aarch64-softmmu.mak
+++ b/default-configs/aarch64-softmmu.mak
@@ -1,6 +1,9 @@
 # Default configuration for aarch64-softmmu
 
-# We support all the 32 bit boards so need all their config
+# We support most of the 32 bit boards so need all their config
 include arm-softmmu.mak
 
+# we explicitly disable ones that require old ARMv5 support
+CONFIG_ARMV5_BOARDS=n
+
 # Currently no 64-bit specific config requirements
diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index f10cc69..1e5656e 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -63,6 +63,7 @@ CONFIG_BITBANG_I2C=y
 CONFIG_FRAMEBUFFER=y
 CONFIG_XILINX_SPIPS=y
 
+CONFIG_ARMV5_BOARDS=y
 CONFIG_ARM11SCU=y
 CONFIG_A9SCU=y
 CONFIG_DIGIC=y
diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
index 5899ed6..3dd87c6 100644
--- a/hw/arm/Makefile.objs
+++ b/hw/arm/Makefile.objs
@@ -1,8 +1,19 @@
-obj-y += boot.o collie.o exynos4_boards.o gumstix.o highbank.o
+obj-y += boot.o collie.o exynos4_boards.o
+obj-$(CONFIG_ARMV5_BOARDS) += gumstix.o
+obj-y += highbank.o
 obj-$(CONFIG_DIGIC) += digic_boards.o
-obj-y += integratorcp.o kzm.o mainstone.o musicpal.o nseries.o
-obj-y += omap_sx1.o palm.o ranchu.o realview.o spitz.o stellaris.o
-obj-y += tosa.o versatilepb.o vexpress.o virt.o xilinx_zynq.o z2.o
+obj-$(CONFIG_ARMV5_BOARDS) += integratorcp.o
+obj-y += kzm.o
+obj-$(CONFIG_ARMV5_BOARDS) += mainstone.o
+obj-$(CONFIG_ARMV5_BOARDS) += musicpal.o
+obj-y += nseries.o
+obj-y += omap_sx1.o palm.o ranchu.o realview.o
+obj-$(CONFIG_ARMV5_BOARDS) += spitz.o
+obj-y += stellaris.o
+obj-$(CONFIG_ARMV5_BOARDS) += tosa.o
+obj-$(CONFIG_ARMV5_BOARDS) +=versatilepb.o
+obj-y += vexpress.o virt.o xilinx_zynq.o
+obj-$(CONFIG_ARMV5_BOARDS) +=z2.o
 obj-y += lionhead.o
 
 obj-y += armv7m.o exynos4210.o pxa2xx.o pxa2xx_gpio.o pxa2xx_pic.o
diff --git a/hw/arm/realview.c b/hw/arm/realview.c
index 7e04e50..6152927 100644
--- a/hw/arm/realview.c
+++ b/hw/arm/realview.c
@@ -351,6 +351,7 @@ static void realview_init(QEMUMachineInitArgs *args,
     arm_load_kernel(ARM_CPU(first_cpu), &realview_binfo);
 }
 
+#ifndef TARGET_AARCH64
 static void realview_eb_init(QEMUMachineInitArgs *args)
 {
     if (!args->cpu_model) {
@@ -358,6 +359,7 @@ static void realview_eb_init(QEMUMachineInitArgs *args)
     }
     realview_init(args, BOARD_EB);
 }
+#endif
 
 static void realview_eb_mpcore_init(QEMUMachineInitArgs *args)
 {
@@ -383,12 +385,14 @@ static void realview_pbx_a9_init(QEMUMachineInitArgs *args)
     realview_init(args, BOARD_PBX_A9);
 }
 
+#ifndef TARGET_AARCH64
 static QEMUMachine realview_eb_machine = {
     .name = "realview-eb",
     .desc = "ARM RealView Emulation Baseboard (ARM926EJ-S)",
     .init = realview_eb_init,
     .block_default_type = IF_SCSI,
 };
+#endif
 
 static QEMUMachine realview_eb_mpcore_machine = {
     .name = "realview-eb-mpcore",
@@ -414,7 +418,9 @@ static QEMUMachine realview_pbx_a9_machine = {
 
 static void realview_machine_init(void)
 {
+#ifndef TARGET_AARCH64
     qemu_register_machine(&realview_eb_machine);
+#endif
     qemu_register_machine(&realview_eb_mpcore_machine);
     qemu_register_machine(&realview_pb_a8_machine);
     qemu_register_machine(&realview_pbx_a9_machine);
diff --git a/target-arm/cpu.c b/target-arm/cpu.c
index 6c6f2b3..3c0ad9a 100644
--- a/target-arm/cpu.c
+++ b/target-arm/cpu.c
@@ -398,6 +398,7 @@ static ObjectClass *arm_cpu_class_by_name(const char *cpu_model)
 /* CPU models. These are not needed for the AArch64 linux-user build. */
 #if !defined(CONFIG_USER_ONLY) || !defined(TARGET_AARCH64)
 
+#ifndef TARGET_AARCH64
 static void arm926_initfn(Object *obj)
 {
     ARMCPU *cpu = ARM_CPU(obj);
@@ -452,6 +453,7 @@ static void arm1026_initfn(Object *obj)
         define_one_arm_cp_reg(cpu, &ifar);
     }
 }
+#endif /* TARGET_AARCH64 */
 
 static void arm1136_r2_initfn(Object *obj)
 {
@@ -780,6 +782,7 @@ static void cortex_a15_initfn(Object *obj)
     define_arm_cp_regs(cpu, cortexa15_cp_reginfo);
 }
 
+#ifndef TARGET_AARCH64
 static void ti925t_initfn(Object *obj)
 {
     ARMCPU *cpu = ARM_CPU(obj);
@@ -947,6 +950,7 @@ static void pxa270c5_initfn(Object *obj)
     cpu->ctr = 0xd172172;
     cpu->reset_sctlr = 0x00000078;
 }
+#endif /* TARGET_AARCH64 */
 
 #ifdef CONFIG_USER_ONLY
 static void arm_any_initfn(Object *obj)
@@ -975,24 +979,16 @@ typedef struct ARMCPUInfo {
     void (*class_init)(ObjectClass *oc, void *data);
 } ARMCPUInfo;
 
+/* ARMv5 CPU models are disabled for the TARGET_AARCH64 build as they
+ * could potentially use the smaller 1k pages which we don't support
+ * for aarch64
+ */
 static const ARMCPUInfo arm_cpus[] = {
 #if !defined(CONFIG_USER_ONLY) || !defined(TARGET_AARCH64)
+#ifndef TARGET_AARCH64
     { .name = "arm926",      .initfn = arm926_initfn },
     { .name = "arm946",      .initfn = arm946_initfn },
     { .name = "arm1026",     .initfn = arm1026_initfn },
-    /* What QEMU calls "arm1136-r2" is actually the 1136 r0p2, i.e. an
-     * older core than plain "arm1136". In particular this does not
-     * have the v6K features.
-     */
-    { .name = "arm1136-r2",  .initfn = arm1136_r2_initfn },
-    { .name = "arm1136",     .initfn = arm1136_initfn },
-    { .name = "arm1176",     .initfn = arm1176_initfn },
-    { .name = "arm11mpcore", .initfn = arm11mpcore_initfn },
-    { .name = "cortex-m3",   .initfn = cortex_m3_initfn,
-                             .class_init = arm_v7m_class_init },
-    { .name = "cortex-a8",   .initfn = cortex_a8_initfn },
-    { .name = "cortex-a9",   .initfn = cortex_a9_initfn },
-    { .name = "cortex-a15",  .initfn = cortex_a15_initfn },
     { .name = "ti925t",      .initfn = ti925t_initfn },
     { .name = "sa1100",      .initfn = sa1100_initfn },
     { .name = "sa1110",      .initfn = sa1110_initfn },
@@ -1009,6 +1005,20 @@ static const ARMCPUInfo arm_cpus[] = {
     { .name = "pxa270-b1",   .initfn = pxa270b1_initfn },
     { .name = "pxa270-c0",   .initfn = pxa270c0_initfn },
     { .name = "pxa270-c5",   .initfn = pxa270c5_initfn },
+#endif
+    /* What QEMU calls "arm1136-r2" is actually the 1136 r0p2, i.e. an
+     * older core than plain "arm1136". In particular this does not
+     * have the v6K features.
+     */
+    { .name = "arm1136-r2",  .initfn = arm1136_r2_initfn },
+    { .name = "arm1136",     .initfn = arm1136_initfn },
+    { .name = "arm1176",     .initfn = arm1176_initfn },
+    { .name = "arm11mpcore", .initfn = arm11mpcore_initfn },
+    { .name = "cortex-m3",   .initfn = cortex_m3_initfn,
+                             .class_init = arm_v7m_class_init },
+    { .name = "cortex-a8",   .initfn = cortex_a8_initfn },
+    { .name = "cortex-a9",   .initfn = cortex_a9_initfn },
+    { .name = "cortex-a15",  .initfn = cortex_a15_initfn },
 #ifdef CONFIG_USER_ONLY
     { .name = "any",         .initfn = arm_any_initfn },
 #endif
-- 
2.0.3

* Re: [Qemu-devel] [PATCH v2 0/5] AArch64 TLB performance improvements
  2014-07-30 15:20 [Qemu-devel] [PATCH v2 0/5] AArch64 TLB performance improvements Alex Bennée
                   ` (4 preceding siblings ...)
  2014-07-30 15:20 ` [Qemu-devel] [PATCH v2 5/5] target-arm: A64: disable a bunch of ARMv5 machines Alex Bennée
@ 2014-08-01 16:06 ` Peter Maydell
  2014-08-01 22:26   ` Peter Maydell
  2014-08-01 19:35 ` Paolo Bonzini
  6 siblings, 1 reply; 17+ messages in thread
From: Peter Maydell @ 2014-08-01 16:06 UTC (permalink / raw)
  To: Alex Bennée; +Cc: QEMU Developers

On 30 July 2014 16:20, Alex Bennée <alex.bennee@linaro.org> wrote:
> Hi,
>
> Not too much has changed:
>
>   * added a review tag
>   * fixed up review comments
>   * added some notes about benchmark results
>   * added a patch to disable ARMv5 in AArch64 build
>
> The most important thing is I've measured a 25-30% improvement in
> kernel and android boot time.
>
> Alex Bennée (5):
>   target-arm: don't hardcode mask values in arm_cpu_handle_mmu_fault
>   target-arm: A64: fix TLB flush instructions
>   target-arm: A64: fix use 12 bit page tables for AArch64
>   scripts/make_device_config.sh: inline includes
>   target-arm: A64: disable a bunch of ARMv5 machines

I'm taking the first two of these into target-arm.next because
they're obvious standalone bugfixes. I need to think about the
last three a bit more: I dislike just dropping the ARMv5 CPUs
from qemu-system-aarch64, it's kind of arbitrary.

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 5/5] target-arm: A64: disable a bunch of ARMv5 machines
  2014-07-30 15:20 ` [Qemu-devel] [PATCH v2 5/5] target-arm: A64: disable a bunch of ARMv5 machines Alex Bennée
@ 2014-08-01 16:45   ` Christopher Covington
  2014-08-01 17:32     ` Peter Maydell
  0 siblings, 1 reply; 17+ messages in thread
From: Christopher Covington @ 2014-08-01 16:45 UTC (permalink / raw)
  To: Alex Bennée; +Cc: peter.maydell, qemu-devel

On 07/30/2014 11:20 AM, Alex Bennée wrote:
> If you attempt to run a system image which uses 1k pages in the
> qemu-system-aarch64 build it will fail thanks to the change to 12 bit
> pages. The boards are still available for the qemu-system-arm build.

I fail to understand the correlation between ARMv5 machines and software use
of 1M sections. Are AArch32, the short descriptor translation table format,
and 1M sections optional and unimplemented in newer machines like those using
the Cortex A15 or Cortex A57?

Thanks,
Christopher

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.

* Re: [Qemu-devel] [PATCH v2 5/5] target-arm: A64: disable a bunch of ARMv5 machines
  2014-08-01 16:45   ` Christopher Covington
@ 2014-08-01 17:32     ` Peter Maydell
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Maydell @ 2014-08-01 17:32 UTC (permalink / raw)
  To: Christopher Covington; +Cc: Alex Bennée, QEMU Developers

On 1 August 2014 17:45, Christopher Covington <cov@codeaurora.org> wrote:
> On 07/30/2014 11:20 AM, Alex Bennée wrote:
>> If you attempt to run a system image which uses 1k pages in the
>> qemu-system-aarch64 build it will fail thanks to the change to 12 bit
>> pages. The boards are still available for the qemu-system-arm build.
>
> I fail to understand the correlation between ARMv5 machines and software use
> of 1M sections. Are AArch32, the short descriptor translation table format,
> and 1M sections optional and unimplemented in newer machines like those using
> the Cortex A15 or Cortex A57?

The commit message says 1K, not 1M, and it means it. These
"tiny pages" were supported by v4 and v5 MMUs, but not by
v6 onwards, where the smallest possible pagesize is 4K.

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 0/5] AArch64 TLB performance improvements
  2014-07-30 15:20 [Qemu-devel] [PATCH v2 0/5] AArch64 TLB performance improvements Alex Bennée
                   ` (5 preceding siblings ...)
  2014-08-01 16:06 ` [Qemu-devel] [PATCH v2 0/5] AArch64 TLB performance improvements Peter Maydell
@ 2014-08-01 19:35 ` Paolo Bonzini
  2014-08-04 10:29   ` Alex Bennée
  6 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2014-08-01 19:35 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel; +Cc: peter.maydell

On 30/07/2014 17:20, Alex Bennée wrote:
> Hi,
> 
> Not too much has changed:
> 
>   * added a review tag
>   * fixed up review comments
>   * added some notes about benchmark results
>   * added a patch to disable ARMv5 in AArch64 build
> 
> The most important thing is I've measured a 25-30% improvement in
> kernel and android boot time.
> 
> Alex Bennée (5):
>   target-arm: don't hardcode mask values in arm_cpu_handle_mmu_fault
>   target-arm: A64: fix TLB flush instructions
>   target-arm: A64: fix use 12 bit page tables for AArch64
>   scripts/make_device_config.sh: inline includes
>   target-arm: A64: disable a bunch of ARMv5 machines
> 
>  default-configs/aarch64-softmmu.mak |  5 ++++-
>  default-configs/arm-softmmu.mak     |  1 +
>  hw/arm/Makefile.objs                | 19 ++++++++++++++----
>  hw/arm/realview.c                   |  6 ++++++
>  scripts/make_device_config.sh       | 39 ++++++++++++++++++++++---------------
>  target-arm/cpu.c                    | 36 +++++++++++++++++++++-------------
>  target-arm/cpu.h                    | 13 ++++++++++---
>  target-arm/helper.c                 | 14 +++++++++----
>  8 files changed, 92 insertions(+), 41 deletions(-)
> 

Hi Alex, have you seen this patch?  Perhaps you're interested in
reviving it.

http://article.gmane.org/gmane.comp.emulators.qemu/253864

Paolo

* Re: [Qemu-devel] [PATCH v2 0/5] AArch64 TLB performance improvements
  2014-08-01 16:06 ` [Qemu-devel] [PATCH v2 0/5] AArch64 TLB performance improvements Peter Maydell
@ 2014-08-01 22:26   ` Peter Maydell
  2014-08-04 10:23     ` Alex Bennée
  2014-08-06 20:32     ` Richard Henderson
  0 siblings, 2 replies; 17+ messages in thread
From: Peter Maydell @ 2014-08-01 22:26 UTC (permalink / raw)
  To: Alex Bennée; +Cc: QEMU Developers

On 1 August 2014 17:06, Peter Maydell <peter.maydell@linaro.org> wrote:
> I'm taking the first two of these into target-arm.next because
> they're obvious standalone bugfixes. I need to think about the
> last three a bit more: I dislike just dropping the ARMv5 CPUs
> from qemu-system-aarch64, it's kind of arbitrary.

So:
 * there's clearly a big perf win to be had here
 * this patchset gives us that for 4K pages on AArch64
 * but it doesn't help for 4K pages on AArch32 (really
    common)
 * and it's not going to be good for 64K pages on AArch64
   either (which I suspect will not be a rare choice)

So I think it would be good if we investigated the degree
of difficulty in improving QEMU's TLB code so it isn't just
"one TLB entry size with larger pages a bolt-on which we
hope people don't actually use" first, before we just disable
all the v5 CPUs.

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 0/5] AArch64 TLB performance improvements
  2014-08-01 22:26   ` Peter Maydell
@ 2014-08-04 10:23     ` Alex Bennée
  2014-08-04 10:32       ` Peter Maydell
  2014-08-06 20:32     ` Richard Henderson
  1 sibling, 1 reply; 17+ messages in thread
From: Alex Bennée @ 2014-08-04 10:23 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers

Peter Maydell writes:

> On 1 August 2014 17:06, Peter Maydell <peter.maydell@linaro.org> wrote:
>> I'm taking the first two of these into target-arm.next because
>> they're obvious standalone bugfixes. I need to think about the
>> last three a bit more: I dislike just dropping the ARMv5 CPUs
>> from qemu-system-aarch64, it's kind of arbitrary.
>
> So:
>  * there's clearly a big perf win to be had here
>  * this patchset gives us that for 4K pages on AArch64
>  * but it doesn't help for 4K pages on AArch32 (really
>     common)

Well for the AArch32 profile if you ran under qemu-system-aarch64 you
would be OK surely?

>  * and it's not going to be good for 64K pages on AArch64
>    either (which I suspect will not be a rare choice)

Does the kernel already use 64k pages for its code?

>
> So I think it would be good if we investigated the degree
> of difficulty in improving QEMU's TLB code so it isn't just
> "one TLB entry size with larger pages a bolt-on which we
> hope people don't actually use" first, before we just disable
> all the v5 CPUs.

Given that guests using multiple page sizes are likely to become more
common, we probably do want to look at cleaning up the TLB code to handle
these cases gracefully.

Another option we could look at is keeping track of cross-page TB links
and then invalidating them if we need to. We might want to do that based
on heuristics to avoid excessive cleaning up. However you would expect,
for example, the kernel to sit in its own set of bigger pages which
never get invalidated, where we could happily chain more TBs together.

>
> thanks
> -- PMM

-- 
Alex Bennée

* Re: [Qemu-devel] [PATCH v2 0/5] AArch64 TLB performance improvements
  2014-08-01 19:35 ` Paolo Bonzini
@ 2014-08-04 10:29   ` Alex Bennée
  2014-08-04 11:34     ` Alex Bennée
  0 siblings, 1 reply; 17+ messages in thread
From: Alex Bennée @ 2014-08-04 10:29 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: peter.maydell, qemu-devel


Paolo Bonzini writes:

> On 30/07/2014 17:20, Alex Bennée wrote:
>> Hi,
>> 
<snip>
>> The most important thing is I've measured a 25-30% improvement in
>> kernel and android boot time.
>> 
<snip>
> Hi Alex, have you seen this patch?  Perhaps you're interested in
> reviving it.
>
> http://article.gmane.org/gmane.comp.emulators.qemu/253864

I saw it when it first came out but I didn't quite follow what it was
doing as I hadn't looked at the TLB code. I'll have another look and see
what difference it can make.

>
> Paolo

-- 
Alex Bennée

* Re: [Qemu-devel] [PATCH v2 0/5] AArch64 TLB performance improvements
  2014-08-04 10:23     ` Alex Bennée
@ 2014-08-04 10:32       ` Peter Maydell
  2014-08-04 13:11         ` Christopher Covington
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Maydell @ 2014-08-04 10:32 UTC (permalink / raw)
  To: Alex Bennée; +Cc: QEMU Developers

On 4 August 2014 11:23, Alex Bennée <alex.bennee@linaro.org> wrote:
> Peter Maydell writes:
>> So:
>>  * there's clearly a big perf win to be had here
>>  * this patchset gives us that for 4K pages on AArch64
>>  * but it doesn't help for 4K pages on AArch32 (really
>>     common)
>
> Well for the AArch32 profile if you ran under qemu-system-aarch64 you
> would be OK surely?

Yes, but that's pretty non-obvious, and also it doesn't
make much sense to the user to say "these 32 bit
CPUs should be run under qemu-system-aarch64 but
these other ones should be under qemu-system-arm".

>>  * and it's not going to be good for 64K pages on AArch64
>>    either (which I suspect will not be a rare choice)
>
> Does the kernel already use 64k pages for its code?

There's a config option, which will cause it to use 64K
pages for everything including userspace.
(There's also 16K pages but I forget if Linux has support
for those.) I think the kernel can also use 64K pages in
some cases even in a 4K page config, but I don't know the
details.

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 0/5] AArch64 TLB performance improvements
  2014-08-04 10:29   ` Alex Bennée
@ 2014-08-04 11:34     ` Alex Bennée
  0 siblings, 0 replies; 17+ messages in thread
From: Alex Bennée @ 2014-08-04 11:34 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: peter.maydell, Xin Tong, qemu-devel


Alex Bennée writes:

> Paolo Bonzini writes:
>
>> On 30/07/2014 17:20, Alex Bennée wrote:
>>> Hi,
>>> 
> <snip>
>>> The most important thing is I've measured a 25-30% improvement in
>>> kernel and android boot time.
>>> 
> <snip>
>> Hi Alex, have you seen this patch?  Perhaps you're interested in
>> reviving it.
>>
>> http://article.gmane.org/gmane.comp.emulators.qemu/253864
>
> I saw it when it first came out but I didn't quite follow what it was
> doing as I hadn't looked at the TLB code. I'll have another look and see
> what difference it can make.

A quick and dirty benchmark:

**** Comparing 10bit/12bit tables with and without [[http://article.gmane.org/gmane.comp.emulators.qemu/253864][victim cache]]

#+BEGIN_NOTES
Time in seconds, smaller is better
Percentage is amount of time compared to run to the left
#+END_NOTES

| Code  |   10 bit | 10 bit + victim |    12 bit | 12 bit + victim |
|-------+----------+-----------------+-----------+-----------------|
|       |   12.783 |          11.664 |    10.348 |           9.527 |
| Runs  |   13.046 |          11.971 |    10.123 |           9.326 |
|       |   12.929 |          11.673 |    11.130 |           9.858 |
|       |   12.981 |          11.941 |    10.223 |           9.673 |
|-------+----------+-----------------+-----------+-----------------|
| Avgs  | 12.93475 |        11.81225 |    10.456 |           9.596 |
|-------+----------+-----------------+-----------+-----------------|
| %prev |     100% |       91.321827 | 88.518276 |       91.775057 |
#+TBLFM: $2=vmean(@I..II)::$3=(@II$3/@II$2)*100::$4=vmean(@I..II)::$5=vmean(@I..II)

As you would expect, this shows that the page table size gives the
greater performance improvement, but the victim cache improves the run
time further on top of that.

I say "as you would expect" because any time you need to exit translated
code there is a bunch of overhead in doing so.
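
To illustrate why a victim cache can cut down on those exits, here is a toy model. It is not the linked patch and not QEMU's softmmu code: the class, its fields and the sizes used are all assumptions made for the example. It models a direct-mapped TLB whose evicted entries go into a small fully associative victim cache, and counts the lookups that would need a full refill.

```python
from collections import OrderedDict


class ToyTLB:
    """Direct-mapped TLB with an optional small victim cache (toy model)."""

    def __init__(self, page_bits, index_bits, victim_entries=0):
        self.page_bits = page_bits
        self.size = 1 << index_bits
        self.main = [None] * self.size   # one page number per slot
        self.victim = OrderedDict()      # evicted page numbers, oldest first
        self.victim_entries = victim_entries
        self.slow_fills = 0              # lookups that needed a full refill

    def access(self, addr):
        page = addr >> self.page_bits
        slot = page & (self.size - 1)
        if self.main[slot] == page:
            return "hit"
        evicted = self.main[slot]
        if page in self.victim:
            # Found among recently evicted entries: no full refill needed.
            del self.victim[page]
            result = "victim"
        else:
            self.slow_fills += 1
            result = "fill"
        self.main[slot] = page
        if evicted is not None and self.victim_entries:
            self.victim[evicted] = True
            if len(self.victim) > self.victim_entries:
                self.victim.popitem(last=False)  # drop the oldest victim
        return result


def count_slow_fills(tlb, addresses):
    for addr in addresses:
        tlb.access(addr)
    return tlb.slow_fills


# Two addresses whose pages map to the same slot, accessed alternately.
conflicting = [0x0, 0x100000] * 4
plain = count_slow_fills(ToyTLB(page_bits=12, index_bits=8), conflicting)
with_victim = count_slow_fills(
    ToyTLB(page_bits=12, index_bits=8, victim_entries=8), conflicting)
print(plain, with_victim)
```

In this contrived access pattern every lookup in the plain TLB is a refill, while with the victim cache only the first touch of each page is. Real workloads will sit somewhere in between.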

-- 
Alex Bennée


* Re: [Qemu-devel] [PATCH v2 0/5] AArch64 TLB performance improvements
  2014-08-04 10:32       ` Peter Maydell
@ 2014-08-04 13:11         ` Christopher Covington
  0 siblings, 0 replies; 17+ messages in thread
From: Christopher Covington @ 2014-08-04 13:11 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Alex Bennée, QEMU Developers

On 08/04/2014 06:32 AM, Peter Maydell wrote:
> On 4 August 2014 11:23, Alex Bennée <alex.bennee@linaro.org> wrote:
>> Peter Maydell writes:
>>> So:
>>>  * there's clearly a big perf win to be had here
>>>  * this patchset gives us that for 4K pages on AArch64
>>>  * but it doesn't help for 4K pages on AArch32 (really
>>>     common)
>>
>> Well for the AArch32 profile if you ran under qemu-system-aarch64 you
>> would be OK surely?
> 
> Yes, but that's pretty non-obvious, and also it doesn't
> make much sense to the user to say "these 32 bit
> CPUs should be run under qemu-system-aarch64 but
> these other ones should be under qemu-system-arm".
> 
>>>  * and it's not going to be good for 64K pages on AArch64
>>>    either (which I suspect will not be a rare choice)
>>
>> Does the kernel already use 64k pages for its code?
> 
> There's a config option, which will cause it to use 64K
> pages for everything including userspace.
> (There's also 16K pages but I forget if Linux has support
> for those.) I think the kernel can also use 64K pages in
> some cases even in a 4K page config, but I don't know the
> details.

Linux support for the 16K granule has not been merged nor have I seen any
patches for it.

With a 4K granule one can early out to 1GiB blocks ("gigabyte kernel logical
mappings") [1] or 2MiB blocks ("huge pages") [2].

With a 64K granule, one can early out to 512MiB blocks ("huge pages") [2].

1. http://permalink.gmane.org/gmane.linux.ports.arm.kernel/322436
2. http://comments.gmane.org/gmane.linux.kernel.mm/100651
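
The block sizes above follow from the translation table geometry: each table occupies one granule and holds 8-byte descriptors, so every level above the page level multiplies the mapped size by the number of entries per table. A small sketch of that arithmetic (the function name and its parameters are only illustrative):

```python
def block_size(granule_bits, levels_above_page):
    """Bytes mapped by one block descriptor.

    granule_bits: log2 of the granule (12 for 4K, 16 for 64K).
    levels_above_page: 1 for the level just above pages, 2 for the next.
    """
    # A table fills one granule and each descriptor is 8 bytes,
    # so a table indexes granule_bits - 3 address bits.
    index_bits = granule_bits - 3
    return 1 << (granule_bits + levels_above_page * index_bits)


MIB = 1 << 20
GIB = 1 << 30

print(block_size(12, 1) // MIB, "MiB block with a 4K granule")
print(block_size(12, 2) // GIB, "GiB block with a 4K granule")
print(block_size(16, 1) // MIB, "MiB block with a 64K granule")
```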

Regards,
Christopher

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.


* Re: [Qemu-devel] [PATCH v2 0/5] AArch64 TLB performance improvements
  2014-08-01 22:26   ` Peter Maydell
  2014-08-04 10:23     ` Alex Bennée
@ 2014-08-06 20:32     ` Richard Henderson
  1 sibling, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2014-08-06 20:32 UTC (permalink / raw)
  To: Peter Maydell, Alex Bennée; +Cc: QEMU Developers

On 08/01/2014 12:26 PM, Peter Maydell wrote:
> So I think it would be good if we investigated the degree
> of difficulty in improving QEMU's TLB code so it isn't just
> "one TLB entry size with larger pages a bolt-on which we
> hope people don't actually use" first, before we just disable
> all the v5 CPUs.

I suspect the overhead of making the guest page size variable
will be less than the improvement that can be had in doing so.

Supporting multiple page sizes simultaneously, as with "huge pages", is
probably unfeasible.
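
As a rough illustration of what the guest page size buys, the sketch below computes how much address space a fixed-size TLB can cover and how many entries a 1 MiB region needs at different page sizes. The 256-entry TLB size is an assumption for the example, as are the function names; this is arithmetic only, not QEMU code.

```python
def tlb_reach(entries, page_bits):
    """Bytes of address space covered when every entry maps one page."""
    return entries << page_bits


def pages_needed(region_bytes, page_bits):
    """Number of pages (and so TLB entries) needed to map a region."""
    page_size = 1 << page_bits
    return (region_bytes + page_size - 1) // page_size


ASSUMED_TLB_ENTRIES = 256  # assumption for illustration only

for page_bits in (10, 12, 16):  # 1K, 4K and 64K pages
    reach_kib = tlb_reach(ASSUMED_TLB_ENTRIES, page_bits) // 1024
    per_mib = pages_needed(1 << 20, page_bits)
    print(f"{page_bits}-bit pages: reach {reach_kib} KiB, "
          f"{per_mib} entries per MiB")
```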


r~


Thread overview: 17+ messages
2014-07-30 15:20 [Qemu-devel] [PATCH v2 0/5] AArch64 TLB performance improvements Alex Bennée
2014-07-30 15:20 ` [Qemu-devel] [PATCH v2 1/5] target-arm: don't hardcode mask values in arm_cpu_handle_mmu_fault Alex Bennée
2014-07-30 15:20 ` [Qemu-devel] [PATCH v2 2/5] target-arm: A64: fix TLB flush instructions Alex Bennée
2014-07-30 15:20 ` [Qemu-devel] [PATCH v2 3/5] target-arm: A64: fix use 12 bit page tables for AArch64 Alex Bennée
2014-07-30 15:20 ` [Qemu-devel] [PATCH v2 4/5] scripts/make_device_config.sh: inline includes Alex Bennée
2014-07-30 15:20 ` [Qemu-devel] [PATCH v2 5/5] target-arm: A64: disable a bunch of ARMv5 machines Alex Bennée
2014-08-01 16:45   ` Christopher Covington
2014-08-01 17:32     ` Peter Maydell
2014-08-01 16:06 ` [Qemu-devel] [PATCH v2 0/5] AArch64 TLB performance improvements Peter Maydell
2014-08-01 22:26   ` Peter Maydell
2014-08-04 10:23     ` Alex Bennée
2014-08-04 10:32       ` Peter Maydell
2014-08-04 13:11         ` Christopher Covington
2014-08-06 20:32     ` Richard Henderson
2014-08-01 19:35 ` Paolo Bonzini
2014-08-04 10:29   ` Alex Bennée
2014-08-04 11:34     ` Alex Bennée
