* [PATCH 0/8] KVM/ARM: Guest Entry/Exit optimizations
@ 2016-02-08 11:40 ` Marc Zyngier
  0 siblings, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-02-08 11:40 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

I've recently been looking at our entry/exit costs, and the profiling
figures showed some very low-hanging fruit.

The most obvious cost is that accessing the GIC HW is slow. As in
"deadly slow", especially when GICv2 is involved. So not hammering the
HW when there is nothing to write is immediately beneficial, as this
is the most common case (whatever people seem to think, interrupts
are a *rare* event).
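
To make that concrete, here is a stand-alone sketch of the idea
behind patches 7 and 8. It is not the kernel code itself: the
mmio_write() helper below is a made-up stand-in for writel_relaxed(),
and everything is reduced to the bare minimum. The point is simply
that the restore path only touches the GIC when at least one list
register carries state:

/*
 * Illustration only: count how many GIC "writes" a restore would do.
 * An idle guest (no pending/active interrupt in any LR) does zero.
 */
#include <stdint.h>
#include <stdio.h>

#define LR_STATE	(3U << 28)	/* pending/active bits, as in GICH_LR_STATE */

static int mmio_writes;

static void mmio_write(uint32_t val, int reg)	/* stand-in for writel_relaxed() */
{
	mmio_writes++;
	(void)val;
	(void)reg;
}

static void restore_lrs(const uint32_t *lr, int nr_lr)
{
	uint64_t live_lrs = 0;
	int i;

	for (i = 0; i < nr_lr; i++)
		if (lr[i] & LR_STATE)
			live_lrs |= 1ULL << i;

	if (!live_lrs)
		return;			/* nothing in flight: skip the GIC entirely */

	for (i = 0; i < nr_lr; i++)
		mmio_write((live_lrs & (1ULL << i)) ? lr[i] : 0, i);
}

int main(void)
{
	uint32_t empty[4] = { 0, 0, 0, 0 };

	restore_lrs(empty, 4);
	printf("MMIO writes for an idle guest: %d\n", mmio_writes);	/* prints 0 */
	return 0;
}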

Another easy thing to fix is the way we handle trapped system
registers. We insist on (mostly) sorting them, but still perform a
linear search on each trap. We can switch to a binary search for
free, and get immediate benefits (the PMU code, being extremely
trap-happy, benefits the most).
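
For the curious, the search boils down to packing the encoding fields
into a single integer and handing it to bsearch(). The userspace
sketch below mirrors the packing used in patch 1, but everything else
(the two-entry table, the names) is purely illustrative; the kernel
side uses linux/bsearch.h, which has the same calling convention:

#include <stdio.h>
#include <stdlib.h>

struct reg_desc {
	unsigned long Op0, Op1, CRn, CRm, Op2;
	const char *name;
};

/* Same packing as patch 1: one value encodes all five fields. */
#define reg_to_key(r)						\
	(((r)->Op0 << 14) | ((r)->Op1 << 11) |			\
	 ((r)->CRn << 7)  | ((r)->CRm << 3)  | (r)->Op2)

static int match_reg(const void *key, const void *elt)
{
	unsigned long pval = (unsigned long)key;
	const struct reg_desc *r = elt;

	return pval - reg_to_key(r);
}

/* Must be sorted by reg_to_key(), just like the kernel tables. */
static const struct reg_desc table[] = {
	{ 3, 0, 0,  0, 0, "MIDR_EL1"   },
	{ 3, 3, 9, 12, 5, "PMSELR_EL0" },
};

int main(void)
{
	/* Encoding as it would be decoded from ESR_EL2 on a trap. */
	struct reg_desc trap = { 3, 3, 9, 12, 5, NULL };
	const struct reg_desc *r;

	r = bsearch((void *)reg_to_key(&trap), table,
		    sizeof(table) / sizeof(table[0]), sizeof(table[0]),
		    match_reg);
	printf("%s\n", r ? r->name : "not found");	/* prints PMSELR_EL0 */
	return 0;
}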

With these in place, I see an improvement of 20 to 30% (depending on
the platform) on our world-switch cycle count when running a set of
hand-crafted guests that are designed to only perform traps.

Methodology:

* NULL-hypercall guest: Perform 65536 PSCI_0_2_FN_PSCI_VERSION calls,
and then a power-off:

__start:
	mov	x19, #(1 << 16)		// 65536 iterations
1:	mov	x0, #0x84000000		// PSCI_0_2_FN_PSCI_VERSION
	hvc	#0
	sub	x19, x19, #1
	cbnz	x19, 1b
	mov	x0, #0x84000000
	add	x0, x0, #9		// 0x84000009: PSCI call terminating the guest
	hvc	#0
	b	.

* sysreg trap guest: Perform 2^20 PMSELR_EL0 accesses, and power-off:

__start:
	mov	x19, #(1 << 20)		// 2^20 iterations
1:	mrs	x0, PMSELR_EL0		// trapped PMU sysreg access
	sub	x19, x19, #1
	cbnz	x19, 1b
	mov	x0, #0x84000000
	add	x0, x0, #9		// 0x84000009: PSCI call terminating the guest
	hvc	#0
	b	.

* These guests are profiled using perf and kvmtool:

taskset -c 1 perf stat -e cycles:kh lkvm run -c1 --kernel do_sysreg.bin 2>&1 >/dev/null | grep cycles

The result is then divided by the number of iterations (2^16 or 2^20).

These tests have been run on Seattle, Mustang, and LS2085, and have
shown significant improvements in all cases. I've only touched the
arm64 GIC code, but obviously the 32bit code should use it as well
once we've migrated it to C.

I've pushed out a branch (kvm-arm64/suck-less) to the usual location.

Thanks,

	M.

Marc Zyngier (8):
  arm64: KVM: Switch the sys_reg search to be a binary search
  ARM: KVM: Properly sort the invariant table
  ARM: KVM: Enforce sorting of all CP tables
  ARM: KVM: Rename struct coproc_reg::is_64 to is_64bit
  ARM: KVM: Switch the CP reg search to be a binary search
  KVM: arm/arm64: timer: Add active state caching
  KVM: arm/arm64: Avoid accessing GICH registers
  KVM: arm64: Avoid accessing ICH registers

 arch/arm/kvm/arm.c              |   1 +
 arch/arm/kvm/coproc.c           |  74 ++++++-----
 arch/arm/kvm/coproc.h           |   8 +-
 arch/arm64/kvm/hyp/vgic-v2-sr.c |  71 +++++++---
 arch/arm64/kvm/hyp/vgic-v3-sr.c | 288 ++++++++++++++++++++++++----------------
 arch/arm64/kvm/sys_regs.c       |  40 +++---
 include/kvm/arm_arch_timer.h    |   5 +
 include/kvm/arm_vgic.h          |   8 +-
 virt/kvm/arm/arch_timer.c       |  31 +++++
 virt/kvm/arm/vgic-v3.c          |   4 +-
 10 files changed, 334 insertions(+), 196 deletions(-)

-- 
2.1.4

* [PATCH 1/8] arm64: KVM: Switch the sys_reg search to be a binary search
  2016-02-08 11:40 ` Marc Zyngier
@ 2016-02-08 11:40   ` Marc Zyngier
  -1 siblings, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-02-08 11:40 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: linux-arm-kernel, kvm, kvmarm

Our 64bit sys_reg table is about 90 entries long (so far, and the
PMU support is likely to increase this). This means that on average,
it takes 45 comparisons to find the right entry (and the full 90 if
we also have to search the invariant table).

Not the most efficient thing, especially when you consider that the
table is already sorted. Switching to a binary search reduces the
search to about 7 comparisons. Slightly better!

As an added bonus, the comparison is now done on all the fields at
once, instead of one at a time.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/sys_regs.c | 40 ++++++++++++++++++++++------------------
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index eec3598..0035869 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -20,6 +20,7 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/bsearch.h>
 #include <linux/kvm_host.h>
 #include <linux/mm.h>
 #include <linux/uaccess.h>
@@ -942,29 +943,32 @@ static const struct sys_reg_desc *get_target_table(unsigned target,
 	}
 }
 
+#define reg_to_match_value(x)						\
+	({								\
+		unsigned long val;					\
+		val  = (x)->Op0 << 14;					\
+		val |= (x)->Op1 << 11;					\
+		val |= (x)->CRn << 7;					\
+		val |= (x)->CRm << 3;					\
+		val |= (x)->Op2;					\
+		val;							\
+	 })
+
+static int match_sys_reg(const void *key, const void *elt)
+{
+	const unsigned long pval = (unsigned long)key;
+	const struct sys_reg_desc *r = elt;
+
+	return pval - reg_to_match_value(r);
+}
+
 static const struct sys_reg_desc *find_reg(const struct sys_reg_params *params,
 					 const struct sys_reg_desc table[],
 					 unsigned int num)
 {
-	unsigned int i;
-
-	for (i = 0; i < num; i++) {
-		const struct sys_reg_desc *r = &table[i];
+	unsigned long pval = reg_to_match_value(params);
 
-		if (params->Op0 != r->Op0)
-			continue;
-		if (params->Op1 != r->Op1)
-			continue;
-		if (params->CRn != r->CRn)
-			continue;
-		if (params->CRm != r->CRm)
-			continue;
-		if (params->Op2 != r->Op2)
-			continue;
-
-		return r;
-	}
-	return NULL;
+	return bsearch((void *)pval, table, num, sizeof(table[0]), match_sys_reg);
 }
 
 int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run)
-- 
2.1.4


* [PATCH 2/8] ARM: KVM: Properly sort the invariant table
  2016-02-08 11:40 ` Marc Zyngier
@ 2016-02-08 11:40   ` Marc Zyngier
  -1 siblings, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-02-08 11:40 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

Not having the invariant table properly sorted is an oddity, and
may get in the way of future optimisations. Let's fix it.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/kvm/coproc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index f3d88dc..16c74f8 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -645,6 +645,9 @@ static struct coproc_reg invariant_cp15[] = {
 	{ CRn( 0), CRm( 0), Op1( 0), Op2( 3), is32, NULL, get_TLBTR },
 	{ CRn( 0), CRm( 0), Op1( 0), Op2( 6), is32, NULL, get_REVIDR },
 
+	{ CRn( 0), CRm( 0), Op1( 1), Op2( 1), is32, NULL, get_CLIDR },
+	{ CRn( 0), CRm( 0), Op1( 1), Op2( 7), is32, NULL, get_AIDR },
+
 	{ CRn( 0), CRm( 1), Op1( 0), Op2( 0), is32, NULL, get_ID_PFR0 },
 	{ CRn( 0), CRm( 1), Op1( 0), Op2( 1), is32, NULL, get_ID_PFR1 },
 	{ CRn( 0), CRm( 1), Op1( 0), Op2( 2), is32, NULL, get_ID_DFR0 },
@@ -660,9 +663,6 @@ static struct coproc_reg invariant_cp15[] = {
 	{ CRn( 0), CRm( 2), Op1( 0), Op2( 3), is32, NULL, get_ID_ISAR3 },
 	{ CRn( 0), CRm( 2), Op1( 0), Op2( 4), is32, NULL, get_ID_ISAR4 },
 	{ CRn( 0), CRm( 2), Op1( 0), Op2( 5), is32, NULL, get_ID_ISAR5 },
-
-	{ CRn( 0), CRm( 0), Op1( 1), Op2( 1), is32, NULL, get_CLIDR },
-	{ CRn( 0), CRm( 0), Op1( 1), Op2( 7), is32, NULL, get_AIDR },
 };
 
 /*
-- 
2.1.4

* [PATCH 3/8] ARM: KVM: Enforce sorting of all CP tables
  2016-02-08 11:40 ` Marc Zyngier
@ 2016-02-08 11:40   ` Marc Zyngier
  -1 siblings, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-02-08 11:40 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

Since we're obviously terrible at sorting the CP tables, make sure
we're going to do it properly (or fail to boot). arm64 has had the
same mechanism for a while, and nobody ever broke it...

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/kvm/coproc.c | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 16c74f8..03f5d14 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -381,17 +381,26 @@ static const struct coproc_reg cp15_regs[] = {
 	{ CRn(15), CRm( 0), Op1( 4), Op2( 0), is32, access_cbar},
 };
 
+static int check_reg_table(const struct coproc_reg *table, unsigned int n)
+{
+	unsigned int i;
+
+	for (i = 1; i < n; i++) {
+		if (cmp_reg(&table[i-1], &table[i]) >= 0) {
+			kvm_err("reg table %p out of order (%d)\n", table, i - 1);
+			return 1;
+		}
+	}
+
+	return 0;
+}
+
 /* Target specific emulation tables */
 static struct kvm_coproc_target_table *target_tables[KVM_ARM_NUM_TARGETS];
 
 void kvm_register_target_coproc_table(struct kvm_coproc_target_table *table)
 {
-	unsigned int i;
-
-	for (i = 1; i < table->num; i++)
-		BUG_ON(cmp_reg(&table->table[i-1],
-			       &table->table[i]) >= 0);
-
+	BUG_ON(check_reg_table(table->table, table->num));
 	target_tables[table->target] = table;
 }
 
@@ -1210,8 +1219,8 @@ void kvm_coproc_table_init(void)
 	unsigned int i;
 
 	/* Make sure tables are unique and in order. */
-	for (i = 1; i < ARRAY_SIZE(cp15_regs); i++)
-		BUG_ON(cmp_reg(&cp15_regs[i-1], &cp15_regs[i]) >= 0);
+	BUG_ON(check_reg_table(cp15_regs, ARRAY_SIZE(cp15_regs)));
+	BUG_ON(check_reg_table(invariant_cp15, ARRAY_SIZE(invariant_cp15)));
 
 	/* We abuse the reset function to overwrite the table itself. */
 	for (i = 0; i < ARRAY_SIZE(invariant_cp15); i++)
-- 
2.1.4

* [PATCH 4/8] ARM: KVM: Rename struct coproc_reg::is_64 to is_64bit
  2016-02-08 11:40 ` Marc Zyngier
@ 2016-02-08 11:40   ` Marc Zyngier
  -1 siblings, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-02-08 11:40 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: linux-arm-kernel, kvm, kvmarm

As we're going to play some tricks on the struct coproc_reg,
make sure its 64bit indicator field matches that of coproc_params.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/kvm/coproc.c | 4 ++--
 arch/arm/kvm/coproc.h | 8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 03f5d14..2a67f00 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -423,7 +423,7 @@ static const struct coproc_reg *find_reg(const struct coproc_params *params,
 	for (i = 0; i < num; i++) {
 		const struct coproc_reg *r = &table[i];
 
-		if (params->is_64bit != r->is_64)
+		if (params->is_64bit != r->is_64bit)
 			continue;
 		if (params->CRn != r->CRn)
 			continue;
@@ -1105,7 +1105,7 @@ static int write_demux_regids(u64 __user *uindices)
 static u64 cp15_to_index(const struct coproc_reg *reg)
 {
 	u64 val = KVM_REG_ARM | (15 << KVM_REG_ARM_COPROC_SHIFT);
-	if (reg->is_64) {
+	if (reg->is_64bit) {
 		val |= KVM_REG_SIZE_U64;
 		val |= (reg->Op1 << KVM_REG_ARM_OPC1_SHIFT);
 		/*
diff --git a/arch/arm/kvm/coproc.h b/arch/arm/kvm/coproc.h
index 88d24a3..5acd097 100644
--- a/arch/arm/kvm/coproc.h
+++ b/arch/arm/kvm/coproc.h
@@ -37,7 +37,7 @@ struct coproc_reg {
 	unsigned long Op1;
 	unsigned long Op2;
 
-	bool is_64;
+	bool is_64bit;
 
 	/* Trapped access from guest, if non-NULL. */
 	bool (*access)(struct kvm_vcpu *,
@@ -141,7 +141,7 @@ static inline int cmp_reg(const struct coproc_reg *i1,
 		return i1->Op1 - i2->Op1;
 	if (i1->Op2 != i2->Op2)
 		return i1->Op2 - i2->Op2;
-	return i2->is_64 - i1->is_64;
+	return i2->is_64bit - i1->is_64bit;
 }
 
 
@@ -150,8 +150,8 @@ static inline int cmp_reg(const struct coproc_reg *i1,
 #define CRm64(_x)       .CRn = _x, .CRm = 0
 #define Op1(_x) 	.Op1 = _x
 #define Op2(_x) 	.Op2 = _x
-#define is64		.is_64 = true
-#define is32		.is_64 = false
+#define is64		.is_64bit = true
+#define is32		.is_64bit = false
 
 bool access_vm_reg(struct kvm_vcpu *vcpu,
 		   const struct coproc_params *p,
-- 
2.1.4


* [PATCH 5/8] ARM: KVM: Switch the CP reg search to be a binary search
  2016-02-08 11:40 ` Marc Zyngier
@ 2016-02-08 11:40   ` Marc Zyngier
  -1 siblings, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-02-08 11:40 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

Doing a linear search is a bit silly when we can do a binary search.
Not that we trap so many things that it has become a burden yet, but
it makes sense to align this with the arm64 code.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/kvm/coproc.c | 41 +++++++++++++++++++++++------------------
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 2a67f00..4f1c869 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -16,6 +16,8 @@
  * along with this program; if not, write to the Free Software
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
+
+#include <linux/bsearch.h>
 #include <linux/mm.h>
 #include <linux/kvm_host.h>
 #include <linux/uaccess.h>
@@ -414,29 +416,32 @@ static const struct coproc_reg *get_target_table(unsigned target, size_t *num)
 	return table->table;
 }
 
+#define reg_to_match_value(x)						\
+	({								\
+		unsigned long val;					\
+		val  = (x)->CRn << 11;					\
+		val |= (x)->CRm << 7;					\
+		val |= (x)->Op1 << 4;					\
+		val |= (x)->Op2 << 1;					\
+		val |= !(x)->is_64bit;					\
+		val;							\
+	 })
+
+static int match_reg(const void *key, const void *elt)
+{
+	const unsigned long pval = (unsigned long)key;
+	const struct coproc_reg *r = elt;
+
+	return pval - reg_to_match_value(r);
+}
+
 static const struct coproc_reg *find_reg(const struct coproc_params *params,
 					 const struct coproc_reg table[],
 					 unsigned int num)
 {
-	unsigned int i;
-
-	for (i = 0; i < num; i++) {
-		const struct coproc_reg *r = &table[i];
+	unsigned long pval = reg_to_match_value(params);
 
-		if (params->is_64bit != r->is_64bit)
-			continue;
-		if (params->CRn != r->CRn)
-			continue;
-		if (params->CRm != r->CRm)
-			continue;
-		if (params->Op1 != r->Op1)
-			continue;
-		if (params->Op2 != r->Op2)
-			continue;
-
-		return r;
-	}
-	return NULL;
+	return bsearch((void *)pval, table, num, sizeof(table[0]), match_reg);
 }
 
 static int emulate_cp15(struct kvm_vcpu *vcpu,
-- 
2.1.4

* [PATCH 6/8] KVM: arm/arm64: timer: Add active state caching
  2016-02-08 11:40 ` Marc Zyngier
@ 2016-02-08 11:40   ` Marc Zyngier
  -1 siblings, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-02-08 11:40 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

Programming the active state in the (re)distributor can be an
expensive operation so it makes some sense to try and reduce
the number of accesses as much as possible. So far, we
program the active state on each VM entry, but there is some
opportunity to do less.

An obvious solution is to cache the active state in memory,
and only program it in the HW when conditions change. But
because the HW can also change things under our feet (the active
state can transition from 1 to 0 when the guest does an EOI),
some precautions have to be taken, which amount to only caching
an "inactive" state, and always programing it otherwise.

With this in place, we observe a reduction of around 700 cycles
on a 2GHz GICv2 platform for a NULL hypercall.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/kvm/arm.c           |  1 +
 include/kvm/arm_arch_timer.h |  5 +++++
 virt/kvm/arm/arch_timer.c    | 31 +++++++++++++++++++++++++++++++
 3 files changed, 37 insertions(+)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index dda1959..af7c1a3 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -320,6 +320,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 	vcpu->cpu = -1;
 
 	kvm_arm_set_running_vcpu(NULL);
+	kvm_timer_vcpu_put(vcpu);
 }
 
 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index 1800227..b651aed 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -55,6 +55,9 @@ struct arch_timer_cpu {
 
 	/* VGIC mapping */
 	struct irq_phys_map		*map;
+
+	/* Active IRQ state caching */
+	bool				active_cleared_last;
 };
 
 int kvm_timer_hyp_init(void);
@@ -74,4 +77,6 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu);
 void kvm_timer_schedule(struct kvm_vcpu *vcpu);
 void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
 
+void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu);
+
 #endif
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 69bca18..bfec447 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -34,6 +34,11 @@ static struct timecounter *timecounter;
 static struct workqueue_struct *wqueue;
 static unsigned int host_vtimer_irq;
 
+void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.timer_cpu.active_cleared_last = false;
+}
+
 static cycle_t kvm_phys_timer_read(void)
 {
 	return timecounter->cc->read(timecounter->cc);
@@ -130,6 +135,7 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level)
 
 	BUG_ON(!vgic_initialized(vcpu->kvm));
 
+	timer->active_cleared_last = false;
 	timer->irq.level = new_level;
 	trace_kvm_timer_update_irq(vcpu->vcpu_id, timer->map->virt_irq,
 				   timer->irq.level);
@@ -242,10 +248,35 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
 	else
 		phys_active = false;
 
+	/*
+	 * We want to avoid hitting the (re)distributor as much as
+	 * possible, as this is a potentially expensive MMIO access
+	 * (not to mention locks in the irq layer), and a solution for
+	 * this is to cache the "active" state in memory.
+	 *
+	 * Things to consider: we cannot cache an "active set" state,
+	 * because the HW can change this behind our back (it becomes
+	 * "clear" in the HW). We must then restrict the caching to
+	 * the "clear" state.
+	 *
+	 * The cache is invalidated on:
+	 * - vcpu put, indicating that the HW cannot be trusted to be
+	 *   in a sane state on the next vcpu load,
+	 * - any change in the interrupt state
+	 *
+	 * Usage conditions:
+	 * - cached value is "active clear"
+	 * - value to be programmed is "active clear"
+	 */
+	if (timer->active_cleared_last && !phys_active)
+		return;
+
 	ret = irq_set_irqchip_state(timer->map->irq,
 				    IRQCHIP_STATE_ACTIVE,
 				    phys_active);
 	WARN_ON(ret);
+
+	timer->active_cleared_last = !phys_active;
 }
 
 /**
-- 
2.1.4

* [PATCH 7/8] KVM: arm/arm64: Avoid accessing GICH registers
  2016-02-08 11:40 ` Marc Zyngier
@ 2016-02-08 11:40   ` Marc Zyngier
  -1 siblings, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-02-08 11:40 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

GICv2 registers are *slow*. As in "terrifyingly slow". Which is bad.
But we're equally bad, as we make a point of accessing them even if
we don't have any interrupt in flight.

A good solution is to first find out if we have anything useful to
write into the GIC, and if we don't, to simply not do it. This
involves tracking which LRs actually have something valid there.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/hyp/vgic-v2-sr.c | 71 ++++++++++++++++++++++++++++-------------
 include/kvm/arm_vgic.h          |  2 ++
 2 files changed, 51 insertions(+), 22 deletions(-)

diff --git a/arch/arm64/kvm/hyp/vgic-v2-sr.c b/arch/arm64/kvm/hyp/vgic-v2-sr.c
index e717612..874a08d 100644
--- a/arch/arm64/kvm/hyp/vgic-v2-sr.c
+++ b/arch/arm64/kvm/hyp/vgic-v2-sr.c
@@ -38,28 +38,40 @@ void __hyp_text __vgic_v2_save_state(struct kvm_vcpu *vcpu)
 
 	nr_lr = vcpu->arch.vgic_cpu.nr_lr;
 	cpu_if->vgic_vmcr = readl_relaxed(base + GICH_VMCR);
-	cpu_if->vgic_misr = readl_relaxed(base + GICH_MISR);
-	eisr0  = readl_relaxed(base + GICH_EISR0);
-	elrsr0 = readl_relaxed(base + GICH_ELRSR0);
-	if (unlikely(nr_lr > 32)) {
-		eisr1  = readl_relaxed(base + GICH_EISR1);
-		elrsr1 = readl_relaxed(base + GICH_ELRSR1);
-	} else {
-		eisr1 = elrsr1 = 0;
-	}
+
+	if (vcpu->arch.vgic_cpu.live_lrs) {
+		eisr0  = readl_relaxed(base + GICH_EISR0);
+		elrsr0 = readl_relaxed(base + GICH_ELRSR0);
+		cpu_if->vgic_misr = readl_relaxed(base + GICH_MISR);
+		cpu_if->vgic_apr    = readl_relaxed(base + GICH_APR);
+
+		if (unlikely(nr_lr > 32)) {
+			eisr1  = readl_relaxed(base + GICH_EISR1);
+			elrsr1 = readl_relaxed(base + GICH_ELRSR1);
+		} else {
+			eisr1 = elrsr1 = 0;
+		}
+
 #ifdef CONFIG_CPU_BIG_ENDIAN
-	cpu_if->vgic_eisr  = ((u64)eisr0 << 32) | eisr1;
-	cpu_if->vgic_elrsr = ((u64)elrsr0 << 32) | elrsr1;
+		cpu_if->vgic_eisr  = ((u64)eisr0 << 32) | eisr1;
+		cpu_if->vgic_elrsr = ((u64)elrsr0 << 32) | elrsr1;
 #else
-	cpu_if->vgic_eisr  = ((u64)eisr1 << 32) | eisr0;
-	cpu_if->vgic_elrsr = ((u64)elrsr1 << 32) | elrsr0;
+		cpu_if->vgic_eisr  = ((u64)eisr1 << 32) | eisr0;
+		cpu_if->vgic_elrsr = ((u64)elrsr1 << 32) | elrsr0;
 #endif
-	cpu_if->vgic_apr    = readl_relaxed(base + GICH_APR);
 
-	writel_relaxed(0, base + GICH_HCR);
+		for (i = 0; i < nr_lr; i++)
+			if (vcpu->arch.vgic_cpu.live_lrs & (1UL << i))
+				cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4));
 
-	for (i = 0; i < nr_lr; i++)
-		cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4));
+		writel_relaxed(0, base + GICH_HCR);
+
+		vcpu->arch.vgic_cpu.live_lrs = 0;
+	} else {
+		cpu_if->vgic_eisr = 0;
+		cpu_if->vgic_elrsr = ~0UL;
+		cpu_if->vgic_misr = 0;
+	}
 }
 
 /* vcpu is already in the HYP VA space */
@@ -70,15 +82,30 @@ void __hyp_text __vgic_v2_restore_state(struct kvm_vcpu *vcpu)
 	struct vgic_dist *vgic = &kvm->arch.vgic;
 	void __iomem *base = kern_hyp_va(vgic->vctrl_base);
 	int i, nr_lr;
+	u64 live_lrs = 0;
 
 	if (!base)
 		return;
 
-	writel_relaxed(cpu_if->vgic_hcr, base + GICH_HCR);
-	writel_relaxed(cpu_if->vgic_vmcr, base + GICH_VMCR);
-	writel_relaxed(cpu_if->vgic_apr, base + GICH_APR);
-
 	nr_lr = vcpu->arch.vgic_cpu.nr_lr;
+
 	for (i = 0; i < nr_lr; i++)
-		writel_relaxed(cpu_if->vgic_lr[i], base + GICH_LR0 + (i * 4));
+		if (cpu_if->vgic_lr[i] & GICH_LR_STATE)
+			live_lrs |= 1UL << i;
+
+	if (live_lrs) {
+		writel_relaxed(cpu_if->vgic_hcr, base + GICH_HCR);
+		writel_relaxed(cpu_if->vgic_apr, base + GICH_APR);
+		for (i = 0; i < nr_lr; i++) {
+			u32 val = 0;
+
+			if (live_lrs & (1UL << i))
+				val = cpu_if->vgic_lr[i];
+
+			writel_relaxed(val, base + GICH_LR0 + (i * 4));
+		}
+	}
+
+	writel_relaxed(cpu_if->vgic_vmcr, base + GICH_VMCR);
+	vcpu->arch.vgic_cpu.live_lrs = live_lrs;
 }
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 13a3d53..f473fd6 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -321,6 +321,8 @@ struct vgic_cpu {
 
 	/* Protected by the distributor's irq_phys_map_lock */
 	struct list_head	irq_phys_map_list;
+
+	u64		live_lrs;
 };
 
 #define LR_EMPTY	0xff
-- 
2.1.4

* [PATCH 8/8] KVM: arm64: Avoid accessing ICH registers
  2016-02-08 11:40 ` Marc Zyngier
@ 2016-02-08 11:40   ` Marc Zyngier
  -1 siblings, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-02-08 11:40 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

Just like on GICv2, we're a bit hammer-happy with GICv3, and access
its registers more often than we should.

Adopt a policy similar to what we do for GICv2, only saving/restoring
the minimal set of registers. As we don't access the registers
linearly anymore (we may skip some), the convoluted accessors become
slightly simpler, and we can drop the ugly indexing macro that
tended to confuse the reviewers.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/hyp/vgic-v3-sr.c | 288 ++++++++++++++++++++++++----------------
 include/kvm/arm_vgic.h          |   6 -
 virt/kvm/arm/vgic-v3.c          |   4 +-
 3 files changed, 176 insertions(+), 122 deletions(-)

diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c b/arch/arm64/kvm/hyp/vgic-v3-sr.c
index 9142e082..d3813f5 100644
--- a/arch/arm64/kvm/hyp/vgic-v3-sr.c
+++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
@@ -39,12 +39,104 @@
 		asm volatile("msr_s " __stringify(r) ", %0" : : "r" (__val));\
 	} while (0)
 
-/* vcpu is already in the HYP VA space */
+static u64 __hyp_text __gic_v3_get_lr(unsigned int lr)
+{
+	switch (lr & 0xf) {
+	case 0:
+		return read_gicreg(ICH_LR0_EL2);
+	case 1:
+		return read_gicreg(ICH_LR1_EL2);
+	case 2:
+		return read_gicreg(ICH_LR2_EL2);
+	case 3:
+		return read_gicreg(ICH_LR3_EL2);
+	case 4:
+		return read_gicreg(ICH_LR4_EL2);
+	case 5:
+		return read_gicreg(ICH_LR5_EL2);
+	case 6:
+		return read_gicreg(ICH_LR6_EL2);
+	case 7:
+		return read_gicreg(ICH_LR7_EL2);
+	case 8:
+		return read_gicreg(ICH_LR8_EL2);
+	case 9:
+		return read_gicreg(ICH_LR9_EL2);
+	case 10:
+		return read_gicreg(ICH_LR10_EL2);
+	case 11:
+		return read_gicreg(ICH_LR11_EL2);
+	case 12:
+		return read_gicreg(ICH_LR12_EL2);
+	case 13:
+		return read_gicreg(ICH_LR13_EL2);
+	case 14:
+		return read_gicreg(ICH_LR14_EL2);
+	case 15:
+		return read_gicreg(ICH_LR15_EL2);
+	}
+
+	unreachable();
+}
+
+static void __hyp_text __gic_v3_set_lr(u64 val, int lr)
+{
+	switch (lr & 0xf) {
+	case 0:
+		write_gicreg(val, ICH_LR0_EL2);
+		break;
+	case 1:
+		write_gicreg(val, ICH_LR1_EL2);
+		break;
+	case 2:
+		write_gicreg(val, ICH_LR2_EL2);
+		break;
+	case 3:
+		write_gicreg(val, ICH_LR3_EL2);
+		break;
+	case 4:
+		write_gicreg(val, ICH_LR4_EL2);
+		break;
+	case 5:
+		write_gicreg(val, ICH_LR5_EL2);
+		break;
+	case 6:
+		write_gicreg(val, ICH_LR6_EL2);
+		break;
+	case 7:
+		write_gicreg(val, ICH_LR7_EL2);
+		break;
+	case 8:
+		write_gicreg(val, ICH_LR8_EL2);
+		break;
+	case 9:
+		write_gicreg(val, ICH_LR9_EL2);
+		break;
+	case 10:
+		write_gicreg(val, ICH_LR10_EL2);
+		break;
+	case 11:
+		write_gicreg(val, ICH_LR11_EL2);
+		break;
+	case 12:
+		write_gicreg(val, ICH_LR12_EL2);
+		break;
+	case 13:
+		write_gicreg(val, ICH_LR13_EL2);
+		break;
+	case 14:
+		write_gicreg(val, ICH_LR14_EL2);
+		break;
+	case 15:
+		write_gicreg(val, ICH_LR15_EL2);
+		break;
+	}
+}
+
 void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 	u64 val;
-	u32 max_lr_idx, nr_pri_bits;
 
 	/*
 	 * Make sure stores to the GIC via the memory mapped interface
@@ -53,68 +145,50 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
 	dsb(st);
 
 	cpu_if->vgic_vmcr  = read_gicreg(ICH_VMCR_EL2);
-	cpu_if->vgic_misr  = read_gicreg(ICH_MISR_EL2);
-	cpu_if->vgic_eisr  = read_gicreg(ICH_EISR_EL2);
-	cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
 
-	write_gicreg(0, ICH_HCR_EL2);
-	val = read_gicreg(ICH_VTR_EL2);
-	max_lr_idx = vtr_to_max_lr_idx(val);
-	nr_pri_bits = vtr_to_nr_pri_bits(val);
+	if (vcpu->arch.vgic_cpu.live_lrs) {
+		int i;
+		u32 max_lr_idx, nr_pri_bits;
 
-	switch (max_lr_idx) {
-	case 15:
-		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(15)] = read_gicreg(ICH_LR15_EL2);
-	case 14:
-		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(14)] = read_gicreg(ICH_LR14_EL2);
-	case 13:
-		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(13)] = read_gicreg(ICH_LR13_EL2);
-	case 12:
-		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(12)] = read_gicreg(ICH_LR12_EL2);
-	case 11:
-		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(11)] = read_gicreg(ICH_LR11_EL2);
-	case 10:
-		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(10)] = read_gicreg(ICH_LR10_EL2);
-	case 9:
-		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(9)] = read_gicreg(ICH_LR9_EL2);
-	case 8:
-		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(8)] = read_gicreg(ICH_LR8_EL2);
-	case 7:
-		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(7)] = read_gicreg(ICH_LR7_EL2);
-	case 6:
-		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(6)] = read_gicreg(ICH_LR6_EL2);
-	case 5:
-		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(5)] = read_gicreg(ICH_LR5_EL2);
-	case 4:
-		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(4)] = read_gicreg(ICH_LR4_EL2);
-	case 3:
-		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(3)] = read_gicreg(ICH_LR3_EL2);
-	case 2:
-		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(2)] = read_gicreg(ICH_LR2_EL2);
-	case 1:
-		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(1)] = read_gicreg(ICH_LR1_EL2);
-	case 0:
-		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(0)] = read_gicreg(ICH_LR0_EL2);
-	}
+		cpu_if->vgic_misr  = read_gicreg(ICH_MISR_EL2);
+		cpu_if->vgic_eisr  = read_gicreg(ICH_EISR_EL2);
+		cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
 
-	switch (nr_pri_bits) {
-	case 7:
-		cpu_if->vgic_ap0r[3] = read_gicreg(ICH_AP0R3_EL2);
-		cpu_if->vgic_ap0r[2] = read_gicreg(ICH_AP0R2_EL2);
-	case 6:
-		cpu_if->vgic_ap0r[1] = read_gicreg(ICH_AP0R1_EL2);
-	default:
-		cpu_if->vgic_ap0r[0] = read_gicreg(ICH_AP0R0_EL2);
-	}
+		write_gicreg(0, ICH_HCR_EL2);
+		val = read_gicreg(ICH_VTR_EL2);
+		max_lr_idx = vtr_to_max_lr_idx(val);
+		nr_pri_bits = vtr_to_nr_pri_bits(val);
 
-	switch (nr_pri_bits) {
-	case 7:
-		cpu_if->vgic_ap1r[3] = read_gicreg(ICH_AP1R3_EL2);
-		cpu_if->vgic_ap1r[2] = read_gicreg(ICH_AP1R2_EL2);
-	case 6:
-		cpu_if->vgic_ap1r[1] = read_gicreg(ICH_AP1R1_EL2);
-	default:
-		cpu_if->vgic_ap1r[0] = read_gicreg(ICH_AP1R0_EL2);
+		for (i = 0; i <= max_lr_idx; i++) {
+			if (vcpu->arch.vgic_cpu.live_lrs & (1UL << i))
+				cpu_if->vgic_lr[i] = __gic_v3_get_lr(i);
+		}
+
+		switch (nr_pri_bits) {
+		case 7:
+			cpu_if->vgic_ap0r[3] = read_gicreg(ICH_AP0R3_EL2);
+			cpu_if->vgic_ap0r[2] = read_gicreg(ICH_AP0R2_EL2);
+		case 6:
+			cpu_if->vgic_ap0r[1] = read_gicreg(ICH_AP0R1_EL2);
+		default:
+			cpu_if->vgic_ap0r[0] = read_gicreg(ICH_AP0R0_EL2);
+		}
+
+		switch (nr_pri_bits) {
+		case 7:
+			cpu_if->vgic_ap1r[3] = read_gicreg(ICH_AP1R3_EL2);
+			cpu_if->vgic_ap1r[2] = read_gicreg(ICH_AP1R2_EL2);
+		case 6:
+			cpu_if->vgic_ap1r[1] = read_gicreg(ICH_AP1R1_EL2);
+		default:
+			cpu_if->vgic_ap1r[0] = read_gicreg(ICH_AP1R0_EL2);
+		}
+
+		vcpu->arch.vgic_cpu.live_lrs = 0;
+	} else {
+		cpu_if->vgic_misr  = 0;
+		cpu_if->vgic_eisr  = 0;
+		cpu_if->vgic_elrsr = 0xffff;
 	}
 
 	val = read_gicreg(ICC_SRE_EL2);
@@ -128,6 +202,8 @@ void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu)
 	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 	u64 val;
 	u32 max_lr_idx, nr_pri_bits;
+	u16 live_lrs = 0;
+	int i;
 
 	/*
 	 * VFIQEn is RES1 if ICC_SRE_EL1.SRE is 1. This causes a
@@ -140,68 +216,51 @@ void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu)
 	write_gicreg(cpu_if->vgic_sre, ICC_SRE_EL1);
 	isb();
 
-	write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2);
-	write_gicreg(cpu_if->vgic_vmcr, ICH_VMCR_EL2);
-
 	val = read_gicreg(ICH_VTR_EL2);
 	max_lr_idx = vtr_to_max_lr_idx(val);
 	nr_pri_bits = vtr_to_nr_pri_bits(val);
 
-	switch (nr_pri_bits) {
-	case 7:
-		 write_gicreg(cpu_if->vgic_ap1r[3], ICH_AP1R3_EL2);
-		 write_gicreg(cpu_if->vgic_ap1r[2], ICH_AP1R2_EL2);
-	case 6:
-		 write_gicreg(cpu_if->vgic_ap1r[1], ICH_AP1R1_EL2);
-	default:
-		 write_gicreg(cpu_if->vgic_ap1r[0], ICH_AP1R0_EL2);
-	}	 	                           
-		 	                           
-	switch (nr_pri_bits) {
-	case 7:
-		 write_gicreg(cpu_if->vgic_ap0r[3], ICH_AP0R3_EL2);
-		 write_gicreg(cpu_if->vgic_ap0r[2], ICH_AP0R2_EL2);
-	case 6:
-		 write_gicreg(cpu_if->vgic_ap0r[1], ICH_AP0R1_EL2);
-	default:
-		 write_gicreg(cpu_if->vgic_ap0r[0], ICH_AP0R0_EL2);
+	for (i = 0; i <= max_lr_idx; i++) {
+		if (cpu_if->vgic_lr[i] & ICH_LR_STATE)
+			live_lrs |= (1 << i);
 	}
 
-	switch (max_lr_idx) {
-	case 15:
-		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(15)], ICH_LR15_EL2);
-	case 14:
-		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(14)], ICH_LR14_EL2);
-	case 13:
-		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(13)], ICH_LR13_EL2);
-	case 12:
-		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(12)], ICH_LR12_EL2);
-	case 11:
-		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(11)], ICH_LR11_EL2);
-	case 10:
-		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(10)], ICH_LR10_EL2);
-	case 9:
-		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(9)], ICH_LR9_EL2);
-	case 8:
-		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(8)], ICH_LR8_EL2);
-	case 7:
-		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(7)], ICH_LR7_EL2);
-	case 6:
-		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(6)], ICH_LR6_EL2);
-	case 5:
-		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(5)], ICH_LR5_EL2);
-	case 4:
-		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(4)], ICH_LR4_EL2);
-	case 3:
-		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(3)], ICH_LR3_EL2);
-	case 2:
-		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(2)], ICH_LR2_EL2);
-	case 1:
-		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(1)], ICH_LR1_EL2);
-	case 0:
-		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(0)], ICH_LR0_EL2);
+	write_gicreg(cpu_if->vgic_vmcr, ICH_VMCR_EL2);
+
+	if (live_lrs) {
+		write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2);
+
+		switch (nr_pri_bits) {
+		case 7:
+			write_gicreg(cpu_if->vgic_ap1r[3], ICH_AP1R3_EL2);
+			write_gicreg(cpu_if->vgic_ap1r[2], ICH_AP1R2_EL2);
+		case 6:
+			write_gicreg(cpu_if->vgic_ap1r[1], ICH_AP1R1_EL2);
+		default:
+			write_gicreg(cpu_if->vgic_ap1r[0], ICH_AP1R0_EL2);
+		}
+		 	                           
+		switch (nr_pri_bits) {
+		case 7:
+			write_gicreg(cpu_if->vgic_ap0r[3], ICH_AP0R3_EL2);
+			write_gicreg(cpu_if->vgic_ap0r[2], ICH_AP0R2_EL2);
+		case 6:
+			write_gicreg(cpu_if->vgic_ap0r[1], ICH_AP0R1_EL2);
+		default:
+			write_gicreg(cpu_if->vgic_ap0r[0], ICH_AP0R0_EL2);
+		}
+
+		for (i = 0; i <= max_lr_idx; i++) {
+			val = 0;
+
+			if (live_lrs & (1 << i))
+				val = cpu_if->vgic_lr[i];
+
+			__gic_v3_set_lr(val, i);
+		}
 	}
 
+
 	/*
 	 * Ensures that the above will have reached the
 	 * (re)distributors. This ensure the guest will read the
@@ -209,6 +268,7 @@ void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu)
 	 */
 	isb();
 	dsb(sy);
+	vcpu->arch.vgic_cpu.live_lrs = live_lrs;
 
 	/*
 	 * Prevent the guest from touching the GIC system registers if
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index f473fd6..281caf8 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -279,12 +279,6 @@ struct vgic_v2_cpu_if {
 	u32		vgic_lr[VGIC_V2_MAX_LRS];
 };
 
-/*
- * LRs are stored in reverse order in memory. make sure we index them
- * correctly.
- */
-#define VGIC_V3_LR_INDEX(lr)		(VGIC_V3_MAX_LRS - 1 - lr)
-
 struct vgic_v3_cpu_if {
 #ifdef CONFIG_KVM_ARM_VGIC_V3
 	u32		vgic_hcr;
diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
index 453eafd..11b5ff6 100644
--- a/virt/kvm/arm/vgic-v3.c
+++ b/virt/kvm/arm/vgic-v3.c
@@ -42,7 +42,7 @@ static u32 ich_vtr_el2;
 static struct vgic_lr vgic_v3_get_lr(const struct kvm_vcpu *vcpu, int lr)
 {
 	struct vgic_lr lr_desc;
-	u64 val = vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[VGIC_V3_LR_INDEX(lr)];
+	u64 val = vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[lr];
 
 	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
 		lr_desc.irq = val & ICH_LR_VIRTUALID_MASK;
@@ -106,7 +106,7 @@ static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
 		lr_val |= ((u64)lr_desc.hwirq) << ICH_LR_PHYS_ID_SHIFT;
 	}
 
-	vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[VGIC_V3_LR_INDEX(lr)] = lr_val;
+	vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[lr] = lr_val;
 
 	if (!(lr_desc.state & LR_STATE_MASK))
 		vcpu->arch.vgic_cpu.vgic_v3.vgic_elrsr |= (1U << lr);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/8] KVM/ARM: Guest Entry/Exit optimizations
  2016-02-08 11:40 ` Marc Zyngier
@ 2016-02-09 20:59   ` Christoffer Dall
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2016-02-09 20:59 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, linux-arm-kernel, kvmarm

On Mon, Feb 08, 2016 at 11:40:14AM +0000, Marc Zyngier wrote:
> I've recently been looking at our entry/exit costs, and profiling
> figures did show some very low hanging fruits.
> 
> The most obvious cost is that accessing the GIC HW is slow. As in
> "deadly slow", specially when GICv2 is involved. So not hammering the
> HW when there is nothing to write is immediately beneficial, as this
> is the most common cases (whatever people seem to think, interrupts
> are a *rare* event).
> 
> Another easy thing to fix is the way we handle trapped system
> registers. We do insist on (mostly) sorting them, but we do perform a
> linear search on trap. We can switch to a binary search for free, and
> get immediate benefits (the PMU code, being extremely trap-happy,
> benefits immediately from this).
> 
> With these in place, I see an improvement of 20 to 30% (depending on
> the platform) on our world-switch cycle count when running a set of
> hand-crafted guests that are designed to only perform traps.

I'm curious about the weight of these two?  My guess based on the
measurement work I did is that the GIC is by far the worst sinner, but
that was exacerbated on X-Gene compared to Seattle.

> 
> Methodology:
> 
> * NULL-hypercall guest: Perform 65536 PSCI_0_2_FN_PSCI_VERSION calls,
> and then a power-off:
> 
> __start:
> 	mov	x19, #(1 << 16)
> 1:	mov	x0, #0x84000000
> 	hvc	#0
> 	sub	x19, x19, #1
> 	cbnz	x19, 1b
> 	mov	x0, #0x84000000
> 	add	x0, x0, #9
> 	hvc	#0
> 	b	.
> 
> * sysreg trap guest: Perform 2^20 PMSELR_EL0 accesses, and power-off:
> 
> __start:
> 	mov	x19, #(1 << 20)
> 1:	mrs	x0, PMSELR_EL0
> 	sub	x19, x19, #1
> 	cbnz	x19, 1b
> 	mov	x0, #0x84000000
> 	add	x0, x0, #9
> 	hvc	#0
> 	b	.
> 
> * These guests are profiled using perf and kvmtool:
> 
> taskset -c 1 perf stat -e cycles:kh lkvm run -c1 --kernel do_sysreg.bin 2>&1 >/dev/null| grep cycles

these would be good to add to kvm-unit-tests so we can keep an eye on
this sort of thing...


> 
> The result is then divided by the number of iterations (2^16 or 2^20).
> 
> These tests have been run on Seattle, Mustang, and LS2085, and shown
> significant improvements in all cases. I've only touched the arm64
> GIC code, but obviously the 32bit code should use it as well once
> we've migrated it to C.
> 
> I've pushed out a branch (kvm-arm64/suck-less) to the usual location.
> 

Looks promising!

-Christoffer

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/8] KVM/ARM: Guest Entry/Exit optimizations
  2016-02-09 20:59   ` Christoffer Dall
@ 2016-02-10  8:34     ` Marc Zyngier
  -1 siblings, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-02-10  8:34 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

On 09/02/16 20:59, Christoffer Dall wrote:
> On Mon, Feb 08, 2016 at 11:40:14AM +0000, Marc Zyngier wrote:
>> I've recently been looking at our entry/exit costs, and profiling
>> figures did show some very low hanging fruits.
>>
>> The most obvious cost is that accessing the GIC HW is slow. As in
>> "deadly slow", specially when GICv2 is involved. So not hammering the
>> HW when there is nothing to write is immediately beneficial, as this
>> is the most common cases (whatever people seem to think, interrupts
>> are a *rare* event).
>>
>> Another easy thing to fix is the way we handle trapped system
>> registers. We do insist on (mostly) sorting them, but we do perform a
>> linear search on trap. We can switch to a binary search for free, and
>> get immediate benefits (the PMU code, being extremely trap-happy,
>> benefits immediately from this).
>>
>> With these in place, I see an improvement of 20 to 30% (depending on
>> the platform) on our world-switch cycle count when running a set of
>> hand-crafted guests that are designed to only perform traps.
> 
> I'm curious about the weight of these two?  My guess based on the
> measurement work I did is that the GIC is by far the worst sinner, but
> that was exacerbated on X-Gene compared to Seattle.

Indeed, the GIC is the real pig. 80% of the benefit is provided by not
accessing it when not absolutely required. The sysreg access is only
visible for workloads that are extremely trap-happy, but that's what
happens as soon as you start exercising the PMU code.

>>
>> Methodology:
>>
>> * NULL-hypercall guest: Perform 65536 PSCI_0_2_FN_PSCI_VERSION calls,
>> and then a power-off:
>>
>> __start:
>> 	mov	x19, #(1 << 16)
>> 1:	mov	x0, #0x84000000
>> 	hvc	#0
>> 	sub	x19, x19, #1
>> 	cbnz	x19, 1b
>> 	mov	x0, #0x84000000
>> 	add	x0, x0, #9
>> 	hvc	#0
>> 	b	.
>>
>> * sysreg trap guest: Perform 2^20 PMSELR_EL0 accesses, and power-off:
>>
>> __start:
>> 	mov	x19, #(1 << 20)
>> 1:	mrs	x0, PMSELR_EL0
>> 	sub	x19, x19, #1
>> 	cbnz	x19, 1b
>> 	mov	x0, #0x84000000
>> 	add	x0, x0, #9
>> 	hvc	#0
>> 	b	.
>>
>> * These guests are profiled using perf and kvmtool:
>>
>> taskset -c 1 perf stat -e cycles:kh lkvm run -c1 --kernel do_sysreg.bin 2>&1 >/dev/null| grep cycles
> 
> these would be good to add to kvm-unit-tests so we can keep an eye on
> this sort of thing...

Yeah, I was thinking of that too. In the meantime, I've also created a
GICv2 self-IPI test case, which has led to further improvement (a 10%
reduction in the number of cycles on Seattle). The ugly thing about that
test is that it knows where kvmtool places the GIC (I didn't fancy
parsing the DT in assembly code). Hopefully there is a way to abstract this.

We definitely run that kind of thing on a regular basis and track the
evolution...

> 
>>
>> The result is then divided by the number of iterations (2^16 or 2^20).
>>
>> These tests have been run on Seattle, Mustang, and LS2085, and shown
>> significant improvements in all cases. I've only touched the arm64
>> GIC code, but obviously the 32bit code should use it as well once
>> we've migrated it to C.
>>
>> I've pushed out a branch (kvm-arm64/suck-less) to the usual location.
>>
> 
> Looks promising!

I thought as much. I'll keep on updating this branch, as it looks like
there are a few more low-hanging fruits around there...

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/8] KVM/ARM: Guest Entry/Exit optimizations
  2016-02-10  8:34     ` Marc Zyngier
@ 2016-02-10 12:02       ` Andrew Jones
  -1 siblings, 0 replies; 60+ messages in thread
From: Andrew Jones @ 2016-02-10 12:02 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Christoffer Dall, linux-arm-kernel, kvm, kvmarm, andre.przywara

On Wed, Feb 10, 2016 at 08:34:21AM +0000, Marc Zyngier wrote:
> On 09/02/16 20:59, Christoffer Dall wrote:
> > On Mon, Feb 08, 2016 at 11:40:14AM +0000, Marc Zyngier wrote:
> >> I've recently been looking at our entry/exit costs, and profiling
> >> figures did show some very low hanging fruits.
> >>
> >> The most obvious cost is that accessing the GIC HW is slow. As in
> >> "deadly slow", specially when GICv2 is involved. So not hammering the
> >> HW when there is nothing to write is immediately beneficial, as this
> >> is the most common cases (whatever people seem to think, interrupts
> >> are a *rare* event).
> >>
> >> Another easy thing to fix is the way we handle trapped system
> >> registers. We do insist on (mostly) sorting them, but we do perform a
> >> linear search on trap. We can switch to a binary search for free, and
> >> get immediate benefits (the PMU code, being extremely trap-happy,
> >> benefits immediately from this).
> >>
> >> With these in place, I see an improvement of 20 to 30% (depending on
> >> the platform) on our world-switch cycle count when running a set of
> >> hand-crafted guests that are designed to only perform traps.
> > 
> > I'm curious about the weight of these two?  My guess based on the
> > measurement work I did is that the GIC is by far the worst sinner, but
> > that was exacerbated on X-Gene compared to Seattle.
> 
> Indeed, the GIC is the real pig. 80% of the benefit is provided by not
> accessing it when not absolutely required. The sysreg access is only
> visible for workloads that are extremely trap-happy, but that's what
> happens with as soon as you start exercising the PMU code.
> 
> >>
> >> Methodology:
> >>
> >> * NULL-hypercall guest: Perform 65536 PSCI_0_2_FN_PSCI_VERSION calls,
> >> and then a power-off:
> >>
> >> __start:
> >> 	mov	x19, #(1 << 16)
> >> 1:	mov	x0, #0x84000000
> >> 	hvc	#0
> >> 	sub	x19, x19, #1
> >> 	cbnz	x19, 1b
> >> 	mov	x0, #0x84000000
> >> 	add	x0, x0, #9
> >> 	hvc	#0
> >> 	b	.
> >>
> >> * sysreg trap guest: Perform 2^20 PMSELR_EL0 accesses, and power-off:
> >>
> >> __start:
> >> 	mov	x19, #(1 << 20)
> >> 1:	mrs	x0, PMSELR_EL0
> >> 	sub	x19, x19, #1
> >> 	cbnz	x19, 1b
> >> 	mov	x0, #0x84000000
> >> 	add	x0, x0, #9
> >> 	hvc	#0
> >> 	b	.
> >>
> >> * These guests are profiled using perf and kvmtool:
> >>
> >> taskset -c 1 perf stat -e cycles:kh lkvm run -c1 --kernel do_sysreg.bin 2>&1 >/dev/null| grep cycles
> > 
> > these would be good to add to kvm-unit-tests so we can keep an eye on
> > this sort of thing...

I can work on that. (Actually I had already put this on my TODO when I
saw this series. Your interest in it just bumped it up in priority :-)

> 
> Yeah, I was thinking of that too. In the meantime, I've also created a
> GICv2 self-IPI test case, which has led to further improvement (a 10%
> reduction in the number of cycles on Seattle). The ugly thing about that
> test is that it knows where kvmtool places the GIC (I didn't fancy
> parsing the DT in assembly code). Hopefully there is a way to abstract this.

I have a simple IPI test written for kvm-unit-tests already[*], but it's
been lying around for a while. I can dust it off and make a self-IPI
test out of it later today though. I've been hesitating to post any GIC
related stuff to kvm-unit-tests, because I know Andre has been looking
into it (and he has the GIC expertise to do it more cleanly than I). I'll
go ahead and post my little thing now though, as he can always review it
and/or clean it up later :-)

[*] https://github.com/rhdrjones/kvm-unit-tests/commit/05af9b0361ac5eab58f46e5451e585c9625c3b75

Thanks,
drew

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/8] KVM/ARM: Guest Entry/Exit optimizations
  2016-02-10 12:02       ` Andrew Jones
@ 2016-02-10 12:24         ` Marc Zyngier
  -1 siblings, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-02-10 12:24 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Christoffer Dall, linux-arm-kernel, kvm, kvmarm, andre.przywara

On 10/02/16 12:02, Andrew Jones wrote:
> On Wed, Feb 10, 2016 at 08:34:21AM +0000, Marc Zyngier wrote:
>> On 09/02/16 20:59, Christoffer Dall wrote:
>>> On Mon, Feb 08, 2016 at 11:40:14AM +0000, Marc Zyngier wrote:
>>>> I've recently been looking at our entry/exit costs, and profiling
>>>> figures did show some very low hanging fruits.
>>>>
>>>> The most obvious cost is that accessing the GIC HW is slow. As in
>>>> "deadly slow", specially when GICv2 is involved. So not hammering the
>>>> HW when there is nothing to write is immediately beneficial, as this
>>>> is the most common cases (whatever people seem to think, interrupts
>>>> are a *rare* event).
>>>>
>>>> Another easy thing to fix is the way we handle trapped system
>>>> registers. We do insist on (mostly) sorting them, but we do perform a
>>>> linear search on trap. We can switch to a binary search for free, and
>>>> get immediate benefits (the PMU code, being extremely trap-happy,
>>>> benefits immediately from this).
>>>>
>>>> With these in place, I see an improvement of 20 to 30% (depending on
>>>> the platform) on our world-switch cycle count when running a set of
>>>> hand-crafted guests that are designed to only perform traps.
>>>
>>> I'm curious about the weight of these two?  My guess based on the
>>> measurement work I did is that the GIC is by far the worst sinner, but
>>> that was exacerbated on X-Gene compared to Seattle.
>>
>> Indeed, the GIC is the real pig. 80% of the benefit is provided by not
>> accessing it when not absolutely required. The sysreg access is only
>> visible for workloads that are extremely trap-happy, but that's what
>> happens with as soon as you start exercising the PMU code.
>>
>>>>
>>>> Methodology:
>>>>
>>>> * NULL-hypercall guest: Perform 65536 PSCI_0_2_FN_PSCI_VERSION calls,
>>>> and then a power-off:
>>>>
>>>> __start:
>>>> 	mov	x19, #(1 << 16)
>>>> 1:	mov	x0, #0x84000000
>>>> 	hvc	#0
>>>> 	sub	x19, x19, #1
>>>> 	cbnz	x19, 1b
>>>> 	mov	x0, #0x84000000
>>>> 	add	x0, x0, #9
>>>> 	hvc	#0
>>>> 	b	.
>>>>
>>>> * sysreg trap guest: Perform 2^20 PMSELR_EL0 accesses, and power-off:
>>>>
>>>> __start:
>>>> 	mov	x19, #(1 << 20)
>>>> 1:	mrs	x0, PMSELR_EL0
>>>> 	sub	x19, x19, #1
>>>> 	cbnz	x19, 1b
>>>> 	mov	x0, #0x84000000
>>>> 	add	x0, x0, #9
>>>> 	hvc	#0
>>>> 	b	.
>>>>
>>>> * These guests are profiled using perf and kvmtool:
>>>>
>>>> taskset -c 1 perf stat -e cycles:kh lkvm run -c1 --kernel do_sysreg.bin 2>&1 >/dev/null| grep cycles
>>>
>>> these would be good to add to kvm-unit-tests so we can keep an eye on
>>> this sort of thing...
> 
> I can work on that. (Actually I already had put this on my TODO when I
> saw this series. Your interest in it just bumped it up in priority :-)

Ah! You're in charge, then! ;-)

>>
>> Yeah, I was thinking of that too. In the meantime, I've also created a
>> GICv2 self-IPI test case, which has led to further improvement (a 10%
>> reduction in the number of cycles on Seattle). The ugly thing about that
>> test is that it knows where kvmtool places the GIC (I didn't fancy
>> parsing the DT in assembly code). Hopefully there is a way to abstract this.
> 
> I have a simple IPI test written for kvm-unit-tests already[*], but it's
> been laying around for a while. I can dust it off and make a self-IPI
> test out of it yet today though. I've been hesitating to post any gic
> related stuff to kvm-unit-tests, because I know Andre has been looking
> into it (and he has the gic expertise to do it more cleanly than I). I'll
> go ahead and post my little thing now though, as he can always review it
> and/or clean it up later :-)
> 
> [*] https://github.com/rhdrjones/kvm-unit-tests/commit/05af9b0361ac5eab58f46e5451e585c9625c3b75

For the record, the test case I've been running is this:

__start:
	mov	x19, #(1 << 20)

	mov	x0, #0x3fff0000		// Dist
	mov	x1, #0x3ffd0000		// CPU
	mov	w2, #1
	str	w2, [x0]		// Enable Group0
	ldr	w2, =0xa0a0a0a0
	str	w2, [x0, 0x400]		// A0 priority for SGI0-3
	mov	w2, #0x0f
	str	w2, [x0, #0x100]	// Enable SGI0-3
	mov	w2, #0xf0
	str	w2, [x1, #4]		// PMR
	mov	w2, #1
	str	w2, [x1]		// Enable CPU interface
	
1:
	mov	w2, #(2 << 24)		// Interrupt self with SGI0
	str	w2, [x0, #0xf00]

2:	ldr	w2, [x1, #0x0c]		// GICC_IAR
	cmp	w2, #0x3ff
	b.ne	3f

	wfi
	b	2b

3:	str	w2, [x1, #0x10]		// EOI

	sub	x19, x19, #1
	cbnz	x19, 1b

// Die
	mov	x0, #0x84000000
	add	x0, x0, #9
	hvc	#0
	b	.

Feel free to adapt it so it fits in your framework if you find it useful
(but I guess you'll be inclined to rewrite it in C).
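
If it helps, this is roughly what the same loop looks like in C. It is
only a rough, untested sketch: the mmio helpers and the function name are
placeholders, it still hard-codes the kvmtool GICv2 addresses, and it
leaves out the final PSCI power-off:

#include <stdint.h>

#define GICD_BASE	0x3fff0000UL	/* where kvmtool maps the distributor */
#define GICC_BASE	0x3ffd0000UL	/* where kvmtool maps the CPU interface */

static inline void mmio_write32(unsigned long addr, uint32_t val)
{
	*(volatile uint32_t *)addr = val;
}

static inline uint32_t mmio_read32(unsigned long addr)
{
	return *(volatile uint32_t *)addr;
}

void self_ipi_loop(void)
{
	int i;

	mmio_write32(GICD_BASE + 0x000, 1);		/* enable Group0 */
	mmio_write32(GICD_BASE + 0x400, 0xa0a0a0a0);	/* A0 priority for SGI0-3 */
	mmio_write32(GICD_BASE + 0x100, 0x0f);		/* enable SGI0-3 */
	mmio_write32(GICC_BASE + 0x04, 0xf0);		/* PMR */
	mmio_write32(GICC_BASE + 0x00, 1);		/* enable CPU interface */

	for (i = 0; i < (1 << 20); i++) {
		uint32_t irq;

		mmio_write32(GICD_BASE + 0xf00, 2 << 24);	/* SGI0 to self */

		/* spin (with wfi) until the SGI shows up in GICC_IAR */
		while ((irq = mmio_read32(GICC_BASE + 0x0c)) == 0x3ff)
			asm volatile("wfi" : : : "memory");

		mmio_write32(GICC_BASE + 0x10, irq);		/* EOI */
	}
}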

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/8] arm64: KVM: Switch the sys_reg search to be a binary search
  2016-02-08 11:40   ` Marc Zyngier
@ 2016-02-10 12:44     ` Christoffer Dall
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2016-02-10 12:44 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, linux-arm-kernel, kvmarm

On Mon, Feb 08, 2016 at 11:40:15AM +0000, Marc Zyngier wrote:
> Our 64bit sys_reg table is about 90 entries long (so far, and the
> PMU support is likely to increase this). This means that on average,
> it takes 45 comparaisons to find the right entry (and actually the
> full 90 if we have to search the invariant table).
> 
> Not the most efficient thing. Specially when you think that this
> table is already sorted. Switching to a binary search effectively
> reduces the search to about 7 comparaisons. Slightly better!
> 
> As an added bonus, the comparaison is done by comparing all the

s/comparaison/comparison/

> fields at once, instead of one at a time.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 2/8] ARM: KVM: Properly sort the invariant table
  2016-02-08 11:40   ` Marc Zyngier
@ 2016-02-10 12:44     ` Christoffer Dall
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2016-02-10 12:44 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, linux-arm-kernel, kvmarm

On Mon, Feb 08, 2016 at 11:40:16AM +0000, Marc Zyngier wrote:
> Not having the invariant table properly sorted is an oddity, and
> may get in the way of future optimisations. Let's fix it.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 3/8] ARM: KVM: Enforce sorting of all CP tables
  2016-02-08 11:40   ` Marc Zyngier
@ 2016-02-10 12:44     ` Christoffer Dall
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2016-02-10 12:44 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, linux-arm-kernel, kvmarm

On Mon, Feb 08, 2016 at 11:40:17AM +0000, Marc Zyngier wrote:
> Since we're obviously terrible at sorting the CP tables, make sure
> we're going to do it properly (or fail to boot). arm64 has had the
> same mechanism for a while, and nobody ever broke it...
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5/8] ARM: KVM: Switch the CP reg search to be a binary search
  2016-02-08 11:40   ` Marc Zyngier
@ 2016-02-10 12:44     ` Christoffer Dall
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2016-02-10 12:44 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, linux-arm-kernel, kvmarm

On Mon, Feb 08, 2016 at 11:40:19AM +0000, Marc Zyngier wrote:
> Doing a linear search is a bit silly when we can do a binary search.
> Not that we trap so many things that it has become a burden yet,
> but it makes sense to align it with the arm64 code.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 4/8] ARM: KVM: Rename struct coproc_reg::is_64 to is_64bit
  2016-02-08 11:40   ` Marc Zyngier
@ 2016-02-10 12:44     ` Christoffer Dall
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2016-02-10 12:44 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, linux-arm-kernel, kvmarm

On Mon, Feb 08, 2016 at 11:40:18AM +0000, Marc Zyngier wrote:
> As we're going to play some tricks on the struct coproc_reg,
> make sure its 64bit indicator field matches that of coproc_params.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 6/8] KVM: arm/arm64: timer: Add active state caching
  2016-02-08 11:40   ` Marc Zyngier
@ 2016-02-10 12:44     ` Christoffer Dall
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2016-02-10 12:44 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: linux-arm-kernel, kvm, kvmarm

On Mon, Feb 08, 2016 at 11:40:20AM +0000, Marc Zyngier wrote:
> Programming the active state in the (re)distributor can be an
> expensive operation so it makes some sense to try and reduce
> the number of accesses as much as possible. So far, we
> program the active state on each VM entry, but there is some
> opportunity to do less.
> 
> An obvious solution is to cache the active state in memory,
> and only program it in the HW when conditions change. But
> because the HW can also change things under our feet (the active
> state can transition from 1 to 0 when the guest does an EOI),
> some precautions have to be taken, which amount to only caching
> an "inactive" state, and always programing it otherwise.
> 
> With this in place, we observe a reduction of around 700 cycles
> on a 2GHz GICv2 platform for a NULL hypercall.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 7/8] KVM: arm/arm64: Avoid accessing GICH registers
  2016-02-08 11:40   ` Marc Zyngier
@ 2016-02-10 12:45     ` Christoffer Dall
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2016-02-10 12:45 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: linux-arm-kernel, kvm, kvmarm

On Mon, Feb 08, 2016 at 11:40:21AM +0000, Marc Zyngier wrote:
> GICv2 registers are *slow*. As in "terrifyingly slow". Which is bad.
> But we're equally bad, as we make a point of accessing them even if
> we don't have any interrupt in flight.
> 
> A good solution is to first find out if we have anything useful to
> write into the GIC, and if we don't, to simply not do it. This
> involves tracking which LRs actually have something valid there.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/hyp/vgic-v2-sr.c | 71 ++++++++++++++++++++++++++++-------------
>  include/kvm/arm_vgic.h          |  2 ++
>  2 files changed, 51 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/vgic-v2-sr.c b/arch/arm64/kvm/hyp/vgic-v2-sr.c
> index e717612..874a08d 100644
> --- a/arch/arm64/kvm/hyp/vgic-v2-sr.c
> +++ b/arch/arm64/kvm/hyp/vgic-v2-sr.c
> @@ -38,28 +38,40 @@ void __hyp_text __vgic_v2_save_state(struct kvm_vcpu *vcpu)
>  
>  	nr_lr = vcpu->arch.vgic_cpu.nr_lr;
>  	cpu_if->vgic_vmcr = readl_relaxed(base + GICH_VMCR);
> -	cpu_if->vgic_misr = readl_relaxed(base + GICH_MISR);
> -	eisr0  = readl_relaxed(base + GICH_EISR0);
> -	elrsr0 = readl_relaxed(base + GICH_ELRSR0);
> -	if (unlikely(nr_lr > 32)) {
> -		eisr1  = readl_relaxed(base + GICH_EISR1);
> -		elrsr1 = readl_relaxed(base + GICH_ELRSR1);
> -	} else {
> -		eisr1 = elrsr1 = 0;
> -	}
> +
> +	if (vcpu->arch.vgic_cpu.live_lrs) {
> +		eisr0  = readl_relaxed(base + GICH_EISR0);
> +		elrsr0 = readl_relaxed(base + GICH_ELRSR0);
> +		cpu_if->vgic_misr = readl_relaxed(base + GICH_MISR);
> +		cpu_if->vgic_apr    = readl_relaxed(base + GICH_APR);
> +
> +		if (unlikely(nr_lr > 32)) {
> +			eisr1  = readl_relaxed(base + GICH_EISR1);
> +			elrsr1 = readl_relaxed(base + GICH_ELRSR1);
> +		} else {
> +			eisr1 = elrsr1 = 0;
> +		}
> +
>  #ifdef CONFIG_CPU_BIG_ENDIAN
> -	cpu_if->vgic_eisr  = ((u64)eisr0 << 32) | eisr1;
> -	cpu_if->vgic_elrsr = ((u64)elrsr0 << 32) | elrsr1;
> +		cpu_if->vgic_eisr  = ((u64)eisr0 << 32) | eisr1;
> +		cpu_if->vgic_elrsr = ((u64)elrsr0 << 32) | elrsr1;
>  #else
> -	cpu_if->vgic_eisr  = ((u64)eisr1 << 32) | eisr0;
> -	cpu_if->vgic_elrsr = ((u64)elrsr1 << 32) | elrsr0;
> +		cpu_if->vgic_eisr  = ((u64)eisr1 << 32) | eisr0;
> +		cpu_if->vgic_elrsr = ((u64)elrsr1 << 32) | elrsr0;
>  #endif
> -	cpu_if->vgic_apr    = readl_relaxed(base + GICH_APR);
>  
> -	writel_relaxed(0, base + GICH_HCR);
> +		for (i = 0; i < nr_lr; i++)
> +			if (vcpu->arch.vgic_cpu.live_lrs & (1UL << i))
> +				cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4));
>  
> -	for (i = 0; i < nr_lr; i++)
> -		cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4));
> +		writel_relaxed(0, base + GICH_HCR);
> +
> +		vcpu->arch.vgic_cpu.live_lrs = 0;
> +	} else {
> +		cpu_if->vgic_eisr = 0;
> +		cpu_if->vgic_elrsr = ~0UL;
> +		cpu_if->vgic_misr = 0;
> +	}
>  }
>  
>  /* vcpu is already in the HYP VA space */
> @@ -70,15 +82,30 @@ void __hyp_text __vgic_v2_restore_state(struct kvm_vcpu *vcpu)
>  	struct vgic_dist *vgic = &kvm->arch.vgic;
>  	void __iomem *base = kern_hyp_va(vgic->vctrl_base);
>  	int i, nr_lr;
> +	u64 live_lrs = 0;
>  
>  	if (!base)
>  		return;
>  
> -	writel_relaxed(cpu_if->vgic_hcr, base + GICH_HCR);
> -	writel_relaxed(cpu_if->vgic_vmcr, base + GICH_VMCR);
> -	writel_relaxed(cpu_if->vgic_apr, base + GICH_APR);
> -
>  	nr_lr = vcpu->arch.vgic_cpu.nr_lr;
> +
>  	for (i = 0; i < nr_lr; i++)
> -		writel_relaxed(cpu_if->vgic_lr[i], base + GICH_LR0 + (i * 4));
> +		if (cpu_if->vgic_lr[i] & GICH_LR_STATE)
> +			live_lrs |= 1UL << i;
> +
> +	if (live_lrs) {
> +		writel_relaxed(cpu_if->vgic_hcr, base + GICH_HCR);
> +		writel_relaxed(cpu_if->vgic_apr, base + GICH_APR);
> +		for (i = 0; i < nr_lr; i++) {
> +			u32 val = 0;
> +
> +			if (live_lrs & (1UL << i))
> +				val = cpu_if->vgic_lr[i];
> +
> +			writel_relaxed(val, base + GICH_LR0 + (i * 4));
> +		}
> +	}
> +
> +	writel_relaxed(cpu_if->vgic_vmcr, base + GICH_VMCR);

couldn't you optimize this out by storing the last read value and
comparing whether anything changed? (you'd have to invalidate the cached
value on vcpu_put, obviously).

> +	vcpu->arch.vgic_cpu.live_lrs = live_lrs;
>  }
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 13a3d53..f473fd6 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -321,6 +321,8 @@ struct vgic_cpu {
>  
>  	/* Protected by the distributor's irq_phys_map_lock */
>  	struct list_head	irq_phys_map_list;
> +
> +	u64		live_lrs;
>  };
>  
>  #define LR_EMPTY	0xff
> -- 
> 2.1.4
> 

Otherwise:

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
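
A rough userspace sketch of the write-caching suggested above for
GICH_VMCR: remember the last value written and skip the MMIO access when
it hasn't changed, invalidating the cache on vcpu_put. All names below
are assumptions for illustration, not the eventual kernel code:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static bool last_vmcr_valid;
static uint32_t last_vmcr;

/* stand-in for writel_relaxed(val, base + GICH_VMCR) */
static void gich_write_vmcr(uint32_t val)
{
        printf("MMIO write: GICH_VMCR = %#x\n", val);
}

static void restore_vmcr(uint32_t vmcr)
{
        if (last_vmcr_valid && last_vmcr == vmcr)
                return;         /* unchanged since last restore, skip the slow access */

        gich_write_vmcr(vmcr);
        last_vmcr = vmcr;
        last_vmcr_valid = true;
}

static void vcpu_put(void)
{
        /* another vCPU (or the host) may touch the GIC, so drop the cache */
        last_vmcr_valid = false;
}

int main(void)
{
        restore_vmcr(0x1234);   /* first restore: programs the HW */
        restore_vmcr(0x1234);   /* unchanged: skipped */
        vcpu_put();
        restore_vmcr(0x1234);   /* cache invalidated: programs again */
        return 0;
}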

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 7/8] KVM: arm/arm64: Avoid accessing GICH registers
@ 2016-02-10 12:45     ` Christoffer Dall
  0 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2016-02-10 12:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 08, 2016 at 11:40:21AM +0000, Marc Zyngier wrote:
> GICv2 registers are *slow*. As in "terrifyingly slow". Which is bad.
> But we're equally bad, as we make a point of accessing them even if
> we don't have any interrupt in flight.
> 
> A good solution is to first find out if we have anything useful to
> write into the GIC, and if we don't, to simply not do it. This
> involves tracking which LRs actually have something valid there.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/hyp/vgic-v2-sr.c | 71 ++++++++++++++++++++++++++++-------------
>  include/kvm/arm_vgic.h          |  2 ++
>  2 files changed, 51 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/vgic-v2-sr.c b/arch/arm64/kvm/hyp/vgic-v2-sr.c
> index e717612..874a08d 100644
> --- a/arch/arm64/kvm/hyp/vgic-v2-sr.c
> +++ b/arch/arm64/kvm/hyp/vgic-v2-sr.c
> @@ -38,28 +38,40 @@ void __hyp_text __vgic_v2_save_state(struct kvm_vcpu *vcpu)
>  
>  	nr_lr = vcpu->arch.vgic_cpu.nr_lr;
>  	cpu_if->vgic_vmcr = readl_relaxed(base + GICH_VMCR);
> -	cpu_if->vgic_misr = readl_relaxed(base + GICH_MISR);
> -	eisr0  = readl_relaxed(base + GICH_EISR0);
> -	elrsr0 = readl_relaxed(base + GICH_ELRSR0);
> -	if (unlikely(nr_lr > 32)) {
> -		eisr1  = readl_relaxed(base + GICH_EISR1);
> -		elrsr1 = readl_relaxed(base + GICH_ELRSR1);
> -	} else {
> -		eisr1 = elrsr1 = 0;
> -	}
> +
> +	if (vcpu->arch.vgic_cpu.live_lrs) {
> +		eisr0  = readl_relaxed(base + GICH_EISR0);
> +		elrsr0 = readl_relaxed(base + GICH_ELRSR0);
> +		cpu_if->vgic_misr = readl_relaxed(base + GICH_MISR);
> +		cpu_if->vgic_apr    = readl_relaxed(base + GICH_APR);
> +
> +		if (unlikely(nr_lr > 32)) {
> +			eisr1  = readl_relaxed(base + GICH_EISR1);
> +			elrsr1 = readl_relaxed(base + GICH_ELRSR1);
> +		} else {
> +			eisr1 = elrsr1 = 0;
> +		}
> +
>  #ifdef CONFIG_CPU_BIG_ENDIAN
> -	cpu_if->vgic_eisr  = ((u64)eisr0 << 32) | eisr1;
> -	cpu_if->vgic_elrsr = ((u64)elrsr0 << 32) | elrsr1;
> +		cpu_if->vgic_eisr  = ((u64)eisr0 << 32) | eisr1;
> +		cpu_if->vgic_elrsr = ((u64)elrsr0 << 32) | elrsr1;
>  #else
> -	cpu_if->vgic_eisr  = ((u64)eisr1 << 32) | eisr0;
> -	cpu_if->vgic_elrsr = ((u64)elrsr1 << 32) | elrsr0;
> +		cpu_if->vgic_eisr  = ((u64)eisr1 << 32) | eisr0;
> +		cpu_if->vgic_elrsr = ((u64)elrsr1 << 32) | elrsr0;
>  #endif
> -	cpu_if->vgic_apr    = readl_relaxed(base + GICH_APR);
>  
> -	writel_relaxed(0, base + GICH_HCR);
> +		for (i = 0; i < nr_lr; i++)
> +			if (vcpu->arch.vgic_cpu.live_lrs & (1UL << i))
> +				cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4));
>  
> -	for (i = 0; i < nr_lr; i++)
> -		cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4));
> +		writel_relaxed(0, base + GICH_HCR);
> +
> +		vcpu->arch.vgic_cpu.live_lrs = 0;
> +	} else {
> +		cpu_if->vgic_eisr = 0;
> +		cpu_if->vgic_elrsr = ~0UL;
> +		cpu_if->vgic_misr = 0;
> +	}
>  }
>  
>  /* vcpu is already in the HYP VA space */
> @@ -70,15 +82,30 @@ void __hyp_text __vgic_v2_restore_state(struct kvm_vcpu *vcpu)
>  	struct vgic_dist *vgic = &kvm->arch.vgic;
>  	void __iomem *base = kern_hyp_va(vgic->vctrl_base);
>  	int i, nr_lr;
> +	u64 live_lrs = 0;
>  
>  	if (!base)
>  		return;
>  
> -	writel_relaxed(cpu_if->vgic_hcr, base + GICH_HCR);
> -	writel_relaxed(cpu_if->vgic_vmcr, base + GICH_VMCR);
> -	writel_relaxed(cpu_if->vgic_apr, base + GICH_APR);
> -
>  	nr_lr = vcpu->arch.vgic_cpu.nr_lr;
> +
>  	for (i = 0; i < nr_lr; i++)
> -		writel_relaxed(cpu_if->vgic_lr[i], base + GICH_LR0 + (i * 4));
> +		if (cpu_if->vgic_lr[i] & GICH_LR_STATE)
> +			live_lrs |= 1UL << i;
> +
> +	if (live_lrs) {
> +		writel_relaxed(cpu_if->vgic_hcr, base + GICH_HCR);
> +		writel_relaxed(cpu_if->vgic_apr, base + GICH_APR);
> +		for (i = 0; i < nr_lr; i++) {
> +			u32 val = 0;
> +
> +			if (live_lrs & (1UL << i))
> +				val = cpu_if->vgic_lr[i];
> +
> +			writel_relaxed(val, base + GICH_LR0 + (i * 4));
> +		}
> +	}
> +
> +	writel_relaxed(cpu_if->vgic_vmcr, base + GICH_VMCR);

couldn't you optimize this out by storing the last read value and
comparing whether anything changed? (you'd have to invalidate the cached
value on vcpu_put, obviously).

> +	vcpu->arch.vgic_cpu.live_lrs = live_lrs;
>  }
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 13a3d53..f473fd6 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -321,6 +321,8 @@ struct vgic_cpu {
>  
>  	/* Protected by the distributor's irq_phys_map_lock */
>  	struct list_head	irq_phys_map_list;
> +
> +	u64		live_lrs;
>  };
>  
>  #define LR_EMPTY	0xff
> -- 
> 2.1.4
> 

Otherwise:

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 8/8] KVM: arm64: Avoid accessing ICH registers
  2016-02-08 11:40   ` Marc Zyngier
@ 2016-02-10 12:45     ` Christoffer Dall
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2016-02-10 12:45 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: linux-arm-kernel, kvm, kvmarm

On Mon, Feb 08, 2016 at 11:40:22AM +0000, Marc Zyngier wrote:
> Just like on GICv2, we're a bit hammer-happy with the GICv3 registers,
> and access them more often than we should.
> 
> Adopt a policy similar to what we do for GICv2, only saving/restoring
> the minimal set of registers. As we don't access the registers
> linearly anymore (we may skip some), the convoluted accessors become
> slightly simpler, and we can drop the ugly indexing macro that
> tended to confuse the reviewers.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/hyp/vgic-v3-sr.c | 288 ++++++++++++++++++++++++----------------
>  include/kvm/arm_vgic.h          |   6 -
>  virt/kvm/arm/vgic-v3.c          |   4 +-
>  3 files changed, 176 insertions(+), 122 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> index 9142e082..d3813f5 100644
> --- a/arch/arm64/kvm/hyp/vgic-v3-sr.c
> +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> @@ -39,12 +39,104 @@
>  		asm volatile("msr_s " __stringify(r) ", %0" : : "r" (__val));\
>  	} while (0)
>  
> -/* vcpu is already in the HYP VA space */
> +static u64 __hyp_text __gic_v3_get_lr(unsigned int lr)
> +{
> +	switch (lr & 0xf) {
> +	case 0:
> +		return read_gicreg(ICH_LR0_EL2);
> +	case 1:
> +		return read_gicreg(ICH_LR1_EL2);
> +	case 2:
> +		return read_gicreg(ICH_LR2_EL2);
> +	case 3:
> +		return read_gicreg(ICH_LR3_EL2);
> +	case 4:
> +		return read_gicreg(ICH_LR4_EL2);
> +	case 5:
> +		return read_gicreg(ICH_LR5_EL2);
> +	case 6:
> +		return read_gicreg(ICH_LR6_EL2);
> +	case 7:
> +		return read_gicreg(ICH_LR7_EL2);
> +	case 8:
> +		return read_gicreg(ICH_LR8_EL2);
> +	case 9:
> +		return read_gicreg(ICH_LR9_EL2);
> +	case 10:
> +		return read_gicreg(ICH_LR10_EL2);
> +	case 11:
> +		return read_gicreg(ICH_LR11_EL2);
> +	case 12:
> +		return read_gicreg(ICH_LR12_EL2);
> +	case 13:
> +		return read_gicreg(ICH_LR13_EL2);
> +	case 14:
> +		return read_gicreg(ICH_LR14_EL2);
> +	case 15:
> +		return read_gicreg(ICH_LR15_EL2);
> +	}
> +
> +	unreachable();
> +}
> +
> +static void __hyp_text __gic_v3_set_lr(u64 val, int lr)
> +{
> +	switch (lr & 0xf) {
> +	case 0:
> +		write_gicreg(val, ICH_LR0_EL2);
> +		break;
> +	case 1:
> +		write_gicreg(val, ICH_LR1_EL2);
> +		break;
> +	case 2:
> +		write_gicreg(val, ICH_LR2_EL2);
> +		break;
> +	case 3:
> +		write_gicreg(val, ICH_LR3_EL2);
> +		break;
> +	case 4:
> +		write_gicreg(val, ICH_LR4_EL2);
> +		break;
> +	case 5:
> +		write_gicreg(val, ICH_LR5_EL2);
> +		break;
> +	case 6:
> +		write_gicreg(val, ICH_LR6_EL2);
> +		break;
> +	case 7:
> +		write_gicreg(val, ICH_LR7_EL2);
> +		break;
> +	case 8:
> +		write_gicreg(val, ICH_LR8_EL2);
> +		break;
> +	case 9:
> +		write_gicreg(val, ICH_LR9_EL2);
> +		break;
> +	case 10:
> +		write_gicreg(val, ICH_LR10_EL2);
> +		break;
> +	case 11:
> +		write_gicreg(val, ICH_LR11_EL2);
> +		break;
> +	case 12:
> +		write_gicreg(val, ICH_LR12_EL2);
> +		break;
> +	case 13:
> +		write_gicreg(val, ICH_LR13_EL2);
> +		break;
> +	case 14:
> +		write_gicreg(val, ICH_LR14_EL2);
> +		break;
> +	case 15:
> +		write_gicreg(val, ICH_LR15_EL2);
> +		break;
> +	}
> +}
> +
>  void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
>  {
>  	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
>  	u64 val;
> -	u32 max_lr_idx, nr_pri_bits;
>  
>  	/*
>  	 * Make sure stores to the GIC via the memory mapped interface
> @@ -53,68 +145,50 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
>  	dsb(st);
>  
>  	cpu_if->vgic_vmcr  = read_gicreg(ICH_VMCR_EL2);
> -	cpu_if->vgic_misr  = read_gicreg(ICH_MISR_EL2);
> -	cpu_if->vgic_eisr  = read_gicreg(ICH_EISR_EL2);
> -	cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
>  
> -	write_gicreg(0, ICH_HCR_EL2);
> -	val = read_gicreg(ICH_VTR_EL2);
> -	max_lr_idx = vtr_to_max_lr_idx(val);
> -	nr_pri_bits = vtr_to_nr_pri_bits(val);
> +	if (vcpu->arch.vgic_cpu.live_lrs) {
> +		int i;
> +		u32 max_lr_idx, nr_pri_bits;
>  
> -	switch (max_lr_idx) {
> -	case 15:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(15)] = read_gicreg(ICH_LR15_EL2);
> -	case 14:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(14)] = read_gicreg(ICH_LR14_EL2);
> -	case 13:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(13)] = read_gicreg(ICH_LR13_EL2);
> -	case 12:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(12)] = read_gicreg(ICH_LR12_EL2);
> -	case 11:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(11)] = read_gicreg(ICH_LR11_EL2);
> -	case 10:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(10)] = read_gicreg(ICH_LR10_EL2);
> -	case 9:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(9)] = read_gicreg(ICH_LR9_EL2);
> -	case 8:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(8)] = read_gicreg(ICH_LR8_EL2);
> -	case 7:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(7)] = read_gicreg(ICH_LR7_EL2);
> -	case 6:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(6)] = read_gicreg(ICH_LR6_EL2);
> -	case 5:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(5)] = read_gicreg(ICH_LR5_EL2);
> -	case 4:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(4)] = read_gicreg(ICH_LR4_EL2);
> -	case 3:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(3)] = read_gicreg(ICH_LR3_EL2);
> -	case 2:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(2)] = read_gicreg(ICH_LR2_EL2);
> -	case 1:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(1)] = read_gicreg(ICH_LR1_EL2);
> -	case 0:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(0)] = read_gicreg(ICH_LR0_EL2);
> -	}
> +		cpu_if->vgic_misr  = read_gicreg(ICH_MISR_EL2);
> +		cpu_if->vgic_eisr  = read_gicreg(ICH_EISR_EL2);
> +		cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
>  
> -	switch (nr_pri_bits) {
> -	case 7:
> -		cpu_if->vgic_ap0r[3] = read_gicreg(ICH_AP0R3_EL2);
> -		cpu_if->vgic_ap0r[2] = read_gicreg(ICH_AP0R2_EL2);
> -	case 6:
> -		cpu_if->vgic_ap0r[1] = read_gicreg(ICH_AP0R1_EL2);
> -	default:
> -		cpu_if->vgic_ap0r[0] = read_gicreg(ICH_AP0R0_EL2);
> -	}
> +		write_gicreg(0, ICH_HCR_EL2);
> +		val = read_gicreg(ICH_VTR_EL2);

can't we cache the read of ICH_VTR_EL2 then?

> +		max_lr_idx = vtr_to_max_lr_idx(val);
> +		nr_pri_bits = vtr_to_nr_pri_bits(val);
>  
> -	switch (nr_pri_bits) {
> -	case 7:
> -		cpu_if->vgic_ap1r[3] = read_gicreg(ICH_AP1R3_EL2);
> -		cpu_if->vgic_ap1r[2] = read_gicreg(ICH_AP1R2_EL2);
> -	case 6:
> -		cpu_if->vgic_ap1r[1] = read_gicreg(ICH_AP1R1_EL2);
> -	default:
> -		cpu_if->vgic_ap1r[0] = read_gicreg(ICH_AP1R0_EL2);
> +		for (i = 0; i <= max_lr_idx; i++) {
> +			if (vcpu->arch.vgic_cpu.live_lrs & (1UL << i))
> +				cpu_if->vgic_lr[i] = __gic_v3_get_lr(i);
> +		}
> +
> +		switch (nr_pri_bits) {
> +		case 7:
> +			cpu_if->vgic_ap0r[3] = read_gicreg(ICH_AP0R3_EL2);
> +			cpu_if->vgic_ap0r[2] = read_gicreg(ICH_AP0R2_EL2);
> +		case 6:
> +			cpu_if->vgic_ap0r[1] = read_gicreg(ICH_AP0R1_EL2);
> +		default:
> +			cpu_if->vgic_ap0r[0] = read_gicreg(ICH_AP0R0_EL2);
> +		}
> +
> +		switch (nr_pri_bits) {
> +		case 7:
> +			cpu_if->vgic_ap1r[3] = read_gicreg(ICH_AP1R3_EL2);
> +			cpu_if->vgic_ap1r[2] = read_gicreg(ICH_AP1R2_EL2);
> +		case 6:
> +			cpu_if->vgic_ap1r[1] = read_gicreg(ICH_AP1R1_EL2);
> +		default:
> +			cpu_if->vgic_ap1r[0] = read_gicreg(ICH_AP1R0_EL2);
> +		}
> +
> +		vcpu->arch.vgic_cpu.live_lrs = 0;
> +	} else {
> +		cpu_if->vgic_misr  = 0;
> +		cpu_if->vgic_eisr  = 0;
> +		cpu_if->vgic_elrsr = 0xffff;
>  	}
>  
>  	val = read_gicreg(ICC_SRE_EL2);
> @@ -128,6 +202,8 @@ void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu)
>  	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
>  	u64 val;
>  	u32 max_lr_idx, nr_pri_bits;
> +	u16 live_lrs = 0;
> +	int i;
>  
>  	/*
>  	 * VFIQEn is RES1 if ICC_SRE_EL1.SRE is 1. This causes a
> @@ -140,68 +216,51 @@ void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu)
>  	write_gicreg(cpu_if->vgic_sre, ICC_SRE_EL1);
>  	isb();
>  
> -	write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2);
> -	write_gicreg(cpu_if->vgic_vmcr, ICH_VMCR_EL2);
> -
>  	val = read_gicreg(ICH_VTR_EL2);

same as above

>  	max_lr_idx = vtr_to_max_lr_idx(val);
>  	nr_pri_bits = vtr_to_nr_pri_bits(val);
>  
> -	switch (nr_pri_bits) {
> -	case 7:
> -		 write_gicreg(cpu_if->vgic_ap1r[3], ICH_AP1R3_EL2);
> -		 write_gicreg(cpu_if->vgic_ap1r[2], ICH_AP1R2_EL2);
> -	case 6:
> -		 write_gicreg(cpu_if->vgic_ap1r[1], ICH_AP1R1_EL2);
> -	default:
> -		 write_gicreg(cpu_if->vgic_ap1r[0], ICH_AP1R0_EL2);
> -	}	 	                           
> -		 	                           
> -	switch (nr_pri_bits) {
> -	case 7:
> -		 write_gicreg(cpu_if->vgic_ap0r[3], ICH_AP0R3_EL2);
> -		 write_gicreg(cpu_if->vgic_ap0r[2], ICH_AP0R2_EL2);
> -	case 6:
> -		 write_gicreg(cpu_if->vgic_ap0r[1], ICH_AP0R1_EL2);
> -	default:
> -		 write_gicreg(cpu_if->vgic_ap0r[0], ICH_AP0R0_EL2);
> +	for (i = 0; i <= max_lr_idx; i++) {
> +		if (cpu_if->vgic_lr[i] & ICH_LR_STATE)
> +			live_lrs |= (1 << i);
>  	}
>  
> -	switch (max_lr_idx) {
> -	case 15:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(15)], ICH_LR15_EL2);
> -	case 14:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(14)], ICH_LR14_EL2);
> -	case 13:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(13)], ICH_LR13_EL2);
> -	case 12:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(12)], ICH_LR12_EL2);
> -	case 11:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(11)], ICH_LR11_EL2);
> -	case 10:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(10)], ICH_LR10_EL2);
> -	case 9:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(9)], ICH_LR9_EL2);
> -	case 8:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(8)], ICH_LR8_EL2);
> -	case 7:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(7)], ICH_LR7_EL2);
> -	case 6:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(6)], ICH_LR6_EL2);
> -	case 5:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(5)], ICH_LR5_EL2);
> -	case 4:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(4)], ICH_LR4_EL2);
> -	case 3:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(3)], ICH_LR3_EL2);
> -	case 2:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(2)], ICH_LR2_EL2);
> -	case 1:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(1)], ICH_LR1_EL2);
> -	case 0:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(0)], ICH_LR0_EL2);
> +	write_gicreg(cpu_if->vgic_vmcr, ICH_VMCR_EL2);

here too you may be able to optimize by caching the last seen in-hardware VMCR.

> +
> +	if (live_lrs) {
> +		write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2);
> +
> +		switch (nr_pri_bits) {
> +		case 7:
> +			write_gicreg(cpu_if->vgic_ap1r[3], ICH_AP1R3_EL2);
> +			write_gicreg(cpu_if->vgic_ap1r[2], ICH_AP1R2_EL2);
> +		case 6:
> +			write_gicreg(cpu_if->vgic_ap1r[1], ICH_AP1R1_EL2);
> +		default:
> +			write_gicreg(cpu_if->vgic_ap1r[0], ICH_AP1R0_EL2);
> +		}
> +		 	                           

nit: trailing white space

> +		switch (nr_pri_bits) {
> +		case 7:
> +			write_gicreg(cpu_if->vgic_ap0r[3], ICH_AP0R3_EL2);
> +			write_gicreg(cpu_if->vgic_ap0r[2], ICH_AP0R2_EL2);
> +		case 6:
> +			write_gicreg(cpu_if->vgic_ap0r[1], ICH_AP0R1_EL2);
> +		default:
> +			write_gicreg(cpu_if->vgic_ap0r[0], ICH_AP0R0_EL2);
> +		}
> +
> +		for (i = 0; i <= max_lr_idx; i++) {
> +			val = 0;
> +
> +			if (live_lrs & (1 << i))
> +				val = cpu_if->vgic_lr[i];
> +
> +			__gic_v3_set_lr(val, i);
> +		}
>  	}
>  
> +
>  	/*
>  	 * Ensures that the above will have reached the
>  	 * (re)distributors. This ensure the guest will read the
> @@ -209,6 +268,7 @@ void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu)
>  	 */
>  	isb();
>  	dsb(sy);
> +	vcpu->arch.vgic_cpu.live_lrs = live_lrs;
>  
>  	/*
>  	 * Prevent the guest from touching the GIC system registers if
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index f473fd6..281caf8 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -279,12 +279,6 @@ struct vgic_v2_cpu_if {
>  	u32		vgic_lr[VGIC_V2_MAX_LRS];
>  };
>  
> -/*
> - * LRs are stored in reverse order in memory. make sure we index them
> - * correctly.
> - */
> -#define VGIC_V3_LR_INDEX(lr)		(VGIC_V3_MAX_LRS - 1 - lr)
> -
>  struct vgic_v3_cpu_if {
>  #ifdef CONFIG_KVM_ARM_VGIC_V3
>  	u32		vgic_hcr;
> diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
> index 453eafd..11b5ff6 100644
> --- a/virt/kvm/arm/vgic-v3.c
> +++ b/virt/kvm/arm/vgic-v3.c
> @@ -42,7 +42,7 @@ static u32 ich_vtr_el2;
>  static struct vgic_lr vgic_v3_get_lr(const struct kvm_vcpu *vcpu, int lr)
>  {
>  	struct vgic_lr lr_desc;
> -	u64 val = vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[VGIC_V3_LR_INDEX(lr)];
> +	u64 val = vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[lr];
>  
>  	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
>  		lr_desc.irq = val & ICH_LR_VIRTUALID_MASK;
> @@ -106,7 +106,7 @@ static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
>  		lr_val |= ((u64)lr_desc.hwirq) << ICH_LR_PHYS_ID_SHIFT;
>  	}
>  
> -	vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[VGIC_V3_LR_INDEX(lr)] = lr_val;
> +	vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[lr] = lr_val;
>  
>  	if (!(lr_desc.state & LR_STATE_MASK))
>  		vcpu->arch.vgic_cpu.vgic_v3.vgic_elrsr |= (1U << lr);
> -- 
> 2.1.4
> 

Ignoring potential further optimizations:

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 8/8] KVM: arm64: Avoid accessing ICH registers
@ 2016-02-10 12:45     ` Christoffer Dall
  0 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2016-02-10 12:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 08, 2016 at 11:40:22AM +0000, Marc Zyngier wrote:
> Just like on GICv2, we're a bit hammer-happy with the GICv3 registers,
> and access them more often than we should.
> 
> Adopt a policy similar to what we do for GICv2, only saving/restoring
> the minimal set of registers. As we don't access the registers
> linearly anymore (we may skip some), the convoluted accessors become
> slightly simpler, and we can drop the ugly indexing macro that
> tended to confuse the reviewers.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/hyp/vgic-v3-sr.c | 288 ++++++++++++++++++++++++----------------
>  include/kvm/arm_vgic.h          |   6 -
>  virt/kvm/arm/vgic-v3.c          |   4 +-
>  3 files changed, 176 insertions(+), 122 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> index 9142e082..d3813f5 100644
> --- a/arch/arm64/kvm/hyp/vgic-v3-sr.c
> +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> @@ -39,12 +39,104 @@
>  		asm volatile("msr_s " __stringify(r) ", %0" : : "r" (__val));\
>  	} while (0)
>  
> -/* vcpu is already in the HYP VA space */
> +static u64 __hyp_text __gic_v3_get_lr(unsigned int lr)
> +{
> +	switch (lr & 0xf) {
> +	case 0:
> +		return read_gicreg(ICH_LR0_EL2);
> +	case 1:
> +		return read_gicreg(ICH_LR1_EL2);
> +	case 2:
> +		return read_gicreg(ICH_LR2_EL2);
> +	case 3:
> +		return read_gicreg(ICH_LR3_EL2);
> +	case 4:
> +		return read_gicreg(ICH_LR4_EL2);
> +	case 5:
> +		return read_gicreg(ICH_LR5_EL2);
> +	case 6:
> +		return read_gicreg(ICH_LR6_EL2);
> +	case 7:
> +		return read_gicreg(ICH_LR7_EL2);
> +	case 8:
> +		return read_gicreg(ICH_LR8_EL2);
> +	case 9:
> +		return read_gicreg(ICH_LR9_EL2);
> +	case 10:
> +		return read_gicreg(ICH_LR10_EL2);
> +	case 11:
> +		return read_gicreg(ICH_LR11_EL2);
> +	case 12:
> +		return read_gicreg(ICH_LR12_EL2);
> +	case 13:
> +		return read_gicreg(ICH_LR13_EL2);
> +	case 14:
> +		return read_gicreg(ICH_LR14_EL2);
> +	case 15:
> +		return read_gicreg(ICH_LR15_EL2);
> +	}
> +
> +	unreachable();
> +}
> +
> +static void __hyp_text __gic_v3_set_lr(u64 val, int lr)
> +{
> +	switch (lr & 0xf) {
> +	case 0:
> +		write_gicreg(val, ICH_LR0_EL2);
> +		break;
> +	case 1:
> +		write_gicreg(val, ICH_LR1_EL2);
> +		break;
> +	case 2:
> +		write_gicreg(val, ICH_LR2_EL2);
> +		break;
> +	case 3:
> +		write_gicreg(val, ICH_LR3_EL2);
> +		break;
> +	case 4:
> +		write_gicreg(val, ICH_LR4_EL2);
> +		break;
> +	case 5:
> +		write_gicreg(val, ICH_LR5_EL2);
> +		break;
> +	case 6:
> +		write_gicreg(val, ICH_LR6_EL2);
> +		break;
> +	case 7:
> +		write_gicreg(val, ICH_LR7_EL2);
> +		break;
> +	case 8:
> +		write_gicreg(val, ICH_LR8_EL2);
> +		break;
> +	case 9:
> +		write_gicreg(val, ICH_LR9_EL2);
> +		break;
> +	case 10:
> +		write_gicreg(val, ICH_LR10_EL2);
> +		break;
> +	case 11:
> +		write_gicreg(val, ICH_LR11_EL2);
> +		break;
> +	case 12:
> +		write_gicreg(val, ICH_LR12_EL2);
> +		break;
> +	case 13:
> +		write_gicreg(val, ICH_LR13_EL2);
> +		break;
> +	case 14:
> +		write_gicreg(val, ICH_LR14_EL2);
> +		break;
> +	case 15:
> +		write_gicreg(val, ICH_LR15_EL2);
> +		break;
> +	}
> +}
> +
>  void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
>  {
>  	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
>  	u64 val;
> -	u32 max_lr_idx, nr_pri_bits;
>  
>  	/*
>  	 * Make sure stores to the GIC via the memory mapped interface
> @@ -53,68 +145,50 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
>  	dsb(st);
>  
>  	cpu_if->vgic_vmcr  = read_gicreg(ICH_VMCR_EL2);
> -	cpu_if->vgic_misr  = read_gicreg(ICH_MISR_EL2);
> -	cpu_if->vgic_eisr  = read_gicreg(ICH_EISR_EL2);
> -	cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
>  
> -	write_gicreg(0, ICH_HCR_EL2);
> -	val = read_gicreg(ICH_VTR_EL2);
> -	max_lr_idx = vtr_to_max_lr_idx(val);
> -	nr_pri_bits = vtr_to_nr_pri_bits(val);
> +	if (vcpu->arch.vgic_cpu.live_lrs) {
> +		int i;
> +		u32 max_lr_idx, nr_pri_bits;
>  
> -	switch (max_lr_idx) {
> -	case 15:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(15)] = read_gicreg(ICH_LR15_EL2);
> -	case 14:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(14)] = read_gicreg(ICH_LR14_EL2);
> -	case 13:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(13)] = read_gicreg(ICH_LR13_EL2);
> -	case 12:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(12)] = read_gicreg(ICH_LR12_EL2);
> -	case 11:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(11)] = read_gicreg(ICH_LR11_EL2);
> -	case 10:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(10)] = read_gicreg(ICH_LR10_EL2);
> -	case 9:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(9)] = read_gicreg(ICH_LR9_EL2);
> -	case 8:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(8)] = read_gicreg(ICH_LR8_EL2);
> -	case 7:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(7)] = read_gicreg(ICH_LR7_EL2);
> -	case 6:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(6)] = read_gicreg(ICH_LR6_EL2);
> -	case 5:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(5)] = read_gicreg(ICH_LR5_EL2);
> -	case 4:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(4)] = read_gicreg(ICH_LR4_EL2);
> -	case 3:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(3)] = read_gicreg(ICH_LR3_EL2);
> -	case 2:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(2)] = read_gicreg(ICH_LR2_EL2);
> -	case 1:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(1)] = read_gicreg(ICH_LR1_EL2);
> -	case 0:
> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(0)] = read_gicreg(ICH_LR0_EL2);
> -	}
> +		cpu_if->vgic_misr  = read_gicreg(ICH_MISR_EL2);
> +		cpu_if->vgic_eisr  = read_gicreg(ICH_EISR_EL2);
> +		cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
>  
> -	switch (nr_pri_bits) {
> -	case 7:
> -		cpu_if->vgic_ap0r[3] = read_gicreg(ICH_AP0R3_EL2);
> -		cpu_if->vgic_ap0r[2] = read_gicreg(ICH_AP0R2_EL2);
> -	case 6:
> -		cpu_if->vgic_ap0r[1] = read_gicreg(ICH_AP0R1_EL2);
> -	default:
> -		cpu_if->vgic_ap0r[0] = read_gicreg(ICH_AP0R0_EL2);
> -	}
> +		write_gicreg(0, ICH_HCR_EL2);
> +		val = read_gicreg(ICH_VTR_EL2);

can't we cache the read of ICH_VTR_EL2 then?

> +		max_lr_idx = vtr_to_max_lr_idx(val);
> +		nr_pri_bits = vtr_to_nr_pri_bits(val);
>  
> -	switch (nr_pri_bits) {
> -	case 7:
> -		cpu_if->vgic_ap1r[3] = read_gicreg(ICH_AP1R3_EL2);
> -		cpu_if->vgic_ap1r[2] = read_gicreg(ICH_AP1R2_EL2);
> -	case 6:
> -		cpu_if->vgic_ap1r[1] = read_gicreg(ICH_AP1R1_EL2);
> -	default:
> -		cpu_if->vgic_ap1r[0] = read_gicreg(ICH_AP1R0_EL2);
> +		for (i = 0; i <= max_lr_idx; i++) {
> +			if (vcpu->arch.vgic_cpu.live_lrs & (1UL << i))
> +				cpu_if->vgic_lr[i] = __gic_v3_get_lr(i);
> +		}
> +
> +		switch (nr_pri_bits) {
> +		case 7:
> +			cpu_if->vgic_ap0r[3] = read_gicreg(ICH_AP0R3_EL2);
> +			cpu_if->vgic_ap0r[2] = read_gicreg(ICH_AP0R2_EL2);
> +		case 6:
> +			cpu_if->vgic_ap0r[1] = read_gicreg(ICH_AP0R1_EL2);
> +		default:
> +			cpu_if->vgic_ap0r[0] = read_gicreg(ICH_AP0R0_EL2);
> +		}
> +
> +		switch (nr_pri_bits) {
> +		case 7:
> +			cpu_if->vgic_ap1r[3] = read_gicreg(ICH_AP1R3_EL2);
> +			cpu_if->vgic_ap1r[2] = read_gicreg(ICH_AP1R2_EL2);
> +		case 6:
> +			cpu_if->vgic_ap1r[1] = read_gicreg(ICH_AP1R1_EL2);
> +		default:
> +			cpu_if->vgic_ap1r[0] = read_gicreg(ICH_AP1R0_EL2);
> +		}
> +
> +		vcpu->arch.vgic_cpu.live_lrs = 0;
> +	} else {
> +		cpu_if->vgic_misr  = 0;
> +		cpu_if->vgic_eisr  = 0;
> +		cpu_if->vgic_elrsr = 0xffff;
>  	}
>  
>  	val = read_gicreg(ICC_SRE_EL2);
> @@ -128,6 +202,8 @@ void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu)
>  	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
>  	u64 val;
>  	u32 max_lr_idx, nr_pri_bits;
> +	u16 live_lrs = 0;
> +	int i;
>  
>  	/*
>  	 * VFIQEn is RES1 if ICC_SRE_EL1.SRE is 1. This causes a
> @@ -140,68 +216,51 @@ void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu)
>  	write_gicreg(cpu_if->vgic_sre, ICC_SRE_EL1);
>  	isb();
>  
> -	write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2);
> -	write_gicreg(cpu_if->vgic_vmcr, ICH_VMCR_EL2);
> -
>  	val = read_gicreg(ICH_VTR_EL2);

same as above

>  	max_lr_idx = vtr_to_max_lr_idx(val);
>  	nr_pri_bits = vtr_to_nr_pri_bits(val);
>  
> -	switch (nr_pri_bits) {
> -	case 7:
> -		 write_gicreg(cpu_if->vgic_ap1r[3], ICH_AP1R3_EL2);
> -		 write_gicreg(cpu_if->vgic_ap1r[2], ICH_AP1R2_EL2);
> -	case 6:
> -		 write_gicreg(cpu_if->vgic_ap1r[1], ICH_AP1R1_EL2);
> -	default:
> -		 write_gicreg(cpu_if->vgic_ap1r[0], ICH_AP1R0_EL2);
> -	}	 	                           
> -		 	                           
> -	switch (nr_pri_bits) {
> -	case 7:
> -		 write_gicreg(cpu_if->vgic_ap0r[3], ICH_AP0R3_EL2);
> -		 write_gicreg(cpu_if->vgic_ap0r[2], ICH_AP0R2_EL2);
> -	case 6:
> -		 write_gicreg(cpu_if->vgic_ap0r[1], ICH_AP0R1_EL2);
> -	default:
> -		 write_gicreg(cpu_if->vgic_ap0r[0], ICH_AP0R0_EL2);
> +	for (i = 0; i <= max_lr_idx; i++) {
> +		if (cpu_if->vgic_lr[i] & ICH_LR_STATE)
> +			live_lrs |= (1 << i);
>  	}
>  
> -	switch (max_lr_idx) {
> -	case 15:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(15)], ICH_LR15_EL2);
> -	case 14:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(14)], ICH_LR14_EL2);
> -	case 13:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(13)], ICH_LR13_EL2);
> -	case 12:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(12)], ICH_LR12_EL2);
> -	case 11:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(11)], ICH_LR11_EL2);
> -	case 10:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(10)], ICH_LR10_EL2);
> -	case 9:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(9)], ICH_LR9_EL2);
> -	case 8:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(8)], ICH_LR8_EL2);
> -	case 7:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(7)], ICH_LR7_EL2);
> -	case 6:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(6)], ICH_LR6_EL2);
> -	case 5:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(5)], ICH_LR5_EL2);
> -	case 4:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(4)], ICH_LR4_EL2);
> -	case 3:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(3)], ICH_LR3_EL2);
> -	case 2:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(2)], ICH_LR2_EL2);
> -	case 1:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(1)], ICH_LR1_EL2);
> -	case 0:
> -		write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(0)], ICH_LR0_EL2);
> +	write_gicreg(cpu_if->vgic_vmcr, ICH_VMCR_EL2);

here too you may be able to optimize by caching the last seen in-hardware VMCR.

> +
> +	if (live_lrs) {
> +		write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2);
> +
> +		switch (nr_pri_bits) {
> +		case 7:
> +			write_gicreg(cpu_if->vgic_ap1r[3], ICH_AP1R3_EL2);
> +			write_gicreg(cpu_if->vgic_ap1r[2], ICH_AP1R2_EL2);
> +		case 6:
> +			write_gicreg(cpu_if->vgic_ap1r[1], ICH_AP1R1_EL2);
> +		default:
> +			write_gicreg(cpu_if->vgic_ap1r[0], ICH_AP1R0_EL2);
> +		}
> +		 	                           

nit: trailing white space

> +		switch (nr_pri_bits) {
> +		case 7:
> +			write_gicreg(cpu_if->vgic_ap0r[3], ICH_AP0R3_EL2);
> +			write_gicreg(cpu_if->vgic_ap0r[2], ICH_AP0R2_EL2);
> +		case 6:
> +			write_gicreg(cpu_if->vgic_ap0r[1], ICH_AP0R1_EL2);
> +		default:
> +			write_gicreg(cpu_if->vgic_ap0r[0], ICH_AP0R0_EL2);
> +		}
> +
> +		for (i = 0; i <= max_lr_idx; i++) {
> +			val = 0;
> +
> +			if (live_lrs & (1 << i))
> +				val = cpu_if->vgic_lr[i];
> +
> +			__gic_v3_set_lr(val, i);
> +		}
>  	}
>  
> +
>  	/*
>  	 * Ensures that the above will have reached the
>  	 * (re)distributors. This ensure the guest will read the
> @@ -209,6 +268,7 @@ void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu)
>  	 */
>  	isb();
>  	dsb(sy);
> +	vcpu->arch.vgic_cpu.live_lrs = live_lrs;
>  
>  	/*
>  	 * Prevent the guest from touching the GIC system registers if
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index f473fd6..281caf8 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -279,12 +279,6 @@ struct vgic_v2_cpu_if {
>  	u32		vgic_lr[VGIC_V2_MAX_LRS];
>  };
>  
> -/*
> - * LRs are stored in reverse order in memory. make sure we index them
> - * correctly.
> - */
> -#define VGIC_V3_LR_INDEX(lr)		(VGIC_V3_MAX_LRS - 1 - lr)
> -
>  struct vgic_v3_cpu_if {
>  #ifdef CONFIG_KVM_ARM_VGIC_V3
>  	u32		vgic_hcr;
> diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
> index 453eafd..11b5ff6 100644
> --- a/virt/kvm/arm/vgic-v3.c
> +++ b/virt/kvm/arm/vgic-v3.c
> @@ -42,7 +42,7 @@ static u32 ich_vtr_el2;
>  static struct vgic_lr vgic_v3_get_lr(const struct kvm_vcpu *vcpu, int lr)
>  {
>  	struct vgic_lr lr_desc;
> -	u64 val = vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[VGIC_V3_LR_INDEX(lr)];
> +	u64 val = vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[lr];
>  
>  	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
>  		lr_desc.irq = val & ICH_LR_VIRTUALID_MASK;
> @@ -106,7 +106,7 @@ static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
>  		lr_val |= ((u64)lr_desc.hwirq) << ICH_LR_PHYS_ID_SHIFT;
>  	}
>  
> -	vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[VGIC_V3_LR_INDEX(lr)] = lr_val;
> +	vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[lr] = lr_val;
>  
>  	if (!(lr_desc.state & LR_STATE_MASK))
>  		vcpu->arch.vgic_cpu.vgic_v3.vgic_elrsr |= (1U << lr);
> -- 
> 2.1.4
> 

Ignoring potential further optimizations:

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 7/8] KVM: arm/arm64: Avoid accessing GICH registers
  2016-02-10 12:45     ` Christoffer Dall
@ 2016-02-10 13:34       ` Marc Zyngier
  -1 siblings, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-02-10 13:34 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

On 10/02/16 12:45, Christoffer Dall wrote:
> On Mon, Feb 08, 2016 at 11:40:21AM +0000, Marc Zyngier wrote:
>> GICv2 registers are *slow*. As in "terrifyingly slow". Which is bad.
>> But we're equally bad, as we make a point of accessing them even if
>> we don't have any interrupt in flight.
>>
>> A good solution is to first find out if we have anything useful to
>> write into the GIC, and if we don't, to simply not do it. This
>> involves tracking which LRs actually have something valid there.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/kvm/hyp/vgic-v2-sr.c | 71 ++++++++++++++++++++++++++++-------------
>>  include/kvm/arm_vgic.h          |  2 ++
>>  2 files changed, 51 insertions(+), 22 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/hyp/vgic-v2-sr.c b/arch/arm64/kvm/hyp/vgic-v2-sr.c
>> index e717612..874a08d 100644
>> --- a/arch/arm64/kvm/hyp/vgic-v2-sr.c
>> +++ b/arch/arm64/kvm/hyp/vgic-v2-sr.c
>> @@ -38,28 +38,40 @@ void __hyp_text __vgic_v2_save_state(struct kvm_vcpu *vcpu)
>>  
>>  	nr_lr = vcpu->arch.vgic_cpu.nr_lr;
>>  	cpu_if->vgic_vmcr = readl_relaxed(base + GICH_VMCR);
>> -	cpu_if->vgic_misr = readl_relaxed(base + GICH_MISR);
>> -	eisr0  = readl_relaxed(base + GICH_EISR0);
>> -	elrsr0 = readl_relaxed(base + GICH_ELRSR0);
>> -	if (unlikely(nr_lr > 32)) {
>> -		eisr1  = readl_relaxed(base + GICH_EISR1);
>> -		elrsr1 = readl_relaxed(base + GICH_ELRSR1);
>> -	} else {
>> -		eisr1 = elrsr1 = 0;
>> -	}
>> +
>> +	if (vcpu->arch.vgic_cpu.live_lrs) {
>> +		eisr0  = readl_relaxed(base + GICH_EISR0);
>> +		elrsr0 = readl_relaxed(base + GICH_ELRSR0);
>> +		cpu_if->vgic_misr = readl_relaxed(base + GICH_MISR);
>> +		cpu_if->vgic_apr    = readl_relaxed(base + GICH_APR);
>> +
>> +		if (unlikely(nr_lr > 32)) {
>> +			eisr1  = readl_relaxed(base + GICH_EISR1);
>> +			elrsr1 = readl_relaxed(base + GICH_ELRSR1);
>> +		} else {
>> +			eisr1 = elrsr1 = 0;
>> +		}
>> +
>>  #ifdef CONFIG_CPU_BIG_ENDIAN
>> -	cpu_if->vgic_eisr  = ((u64)eisr0 << 32) | eisr1;
>> -	cpu_if->vgic_elrsr = ((u64)elrsr0 << 32) | elrsr1;
>> +		cpu_if->vgic_eisr  = ((u64)eisr0 << 32) | eisr1;
>> +		cpu_if->vgic_elrsr = ((u64)elrsr0 << 32) | elrsr1;
>>  #else
>> -	cpu_if->vgic_eisr  = ((u64)eisr1 << 32) | eisr0;
>> -	cpu_if->vgic_elrsr = ((u64)elrsr1 << 32) | elrsr0;
>> +		cpu_if->vgic_eisr  = ((u64)eisr1 << 32) | eisr0;
>> +		cpu_if->vgic_elrsr = ((u64)elrsr1 << 32) | elrsr0;
>>  #endif
>> -	cpu_if->vgic_apr    = readl_relaxed(base + GICH_APR);
>>  
>> -	writel_relaxed(0, base + GICH_HCR);
>> +		for (i = 0; i < nr_lr; i++)
>> +			if (vcpu->arch.vgic_cpu.live_lrs & (1UL << i))
>> +				cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4));
>>  
>> -	for (i = 0; i < nr_lr; i++)
>> -		cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4));
>> +		writel_relaxed(0, base + GICH_HCR);
>> +
>> +		vcpu->arch.vgic_cpu.live_lrs = 0;
>> +	} else {
>> +		cpu_if->vgic_eisr = 0;
>> +		cpu_if->vgic_elrsr = ~0UL;
>> +		cpu_if->vgic_misr = 0;
>> +	}
>>  }
>>  
>>  /* vcpu is already in the HYP VA space */
>> @@ -70,15 +82,30 @@ void __hyp_text __vgic_v2_restore_state(struct kvm_vcpu *vcpu)
>>  	struct vgic_dist *vgic = &kvm->arch.vgic;
>>  	void __iomem *base = kern_hyp_va(vgic->vctrl_base);
>>  	int i, nr_lr;
>> +	u64 live_lrs = 0;
>>  
>>  	if (!base)
>>  		return;
>>  
>> -	writel_relaxed(cpu_if->vgic_hcr, base + GICH_HCR);
>> -	writel_relaxed(cpu_if->vgic_vmcr, base + GICH_VMCR);
>> -	writel_relaxed(cpu_if->vgic_apr, base + GICH_APR);
>> -
>>  	nr_lr = vcpu->arch.vgic_cpu.nr_lr;
>> +
>>  	for (i = 0; i < nr_lr; i++)
>> -		writel_relaxed(cpu_if->vgic_lr[i], base + GICH_LR0 + (i * 4));
>> +		if (cpu_if->vgic_lr[i] & GICH_LR_STATE)
>> +			live_lrs |= 1UL << i;
>> +
>> +	if (live_lrs) {
>> +		writel_relaxed(cpu_if->vgic_hcr, base + GICH_HCR);
>> +		writel_relaxed(cpu_if->vgic_apr, base + GICH_APR);
>> +		for (i = 0; i < nr_lr; i++) {
>> +			u32 val = 0;
>> +
>> +			if (live_lrs & (1UL << i))
>> +				val = cpu_if->vgic_lr[i];
>> +
>> +			writel_relaxed(val, base + GICH_LR0 + (i * 4));
>> +		}
>> +	}
>> +
>> +	writel_relaxed(cpu_if->vgic_vmcr, base + GICH_VMCR);
> 
> couldn't you optimize this out by storing the last read value and
> comparing whether anything changed? (you'd have to invalidate the cached
> value on vcpu_put, obviously).

Yeah, very good point. Only the guest can update this, so we could even
move it to vcpu_load/vcpu_put entirely, and never save/restore it inside
the run loop.

I'll keep that for a further patch, as it requires a bit of infrastructure.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
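
A sketch of the follow-up idea mentioned here: since only the guest
updates VMCR, it could be written once in vcpu_load and read back once in
vcpu_put, leaving the world-switch hot path alone. The structures and
helpers below are simplified stand-ins, not the real
kvm_arch_vcpu_load()/put() code:

#include <stdint.h>
#include <stdio.h>

static uint32_t hw_vmcr;                        /* pretend GICH_VMCR register */

static uint32_t gich_read_vmcr(void)            { return hw_vmcr; }
static void gich_write_vmcr(uint32_t v)         { hw_vmcr = v; printf("VMCR <- %#x\n", v); }

struct vcpu { uint32_t vgic_vmcr; };

static void vcpu_load(struct vcpu *v)
{
        gich_write_vmcr(v->vgic_vmcr);          /* once, when the vCPU lands on a pCPU */
}

static void world_switch(struct vcpu *v)
{
        (void)v;                                /* no VMCR access in the hot path anymore */
}

static void vcpu_put(struct vcpu *v)
{
        v->vgic_vmcr = gich_read_vmcr();        /* once, when the vCPU is scheduled out */
}

int main(void)
{
        struct vcpu v = { .vgic_vmcr = 0x1234 };
        int i;

        vcpu_load(&v);
        for (i = 0; i < 3; i++)
                world_switch(&v);               /* run loop: VMCR untouched */
        vcpu_put(&v);
        return 0;
}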

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 7/8] KVM: arm/arm64: Avoid accessing GICH registers
@ 2016-02-10 13:34       ` Marc Zyngier
  0 siblings, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-02-10 13:34 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/02/16 12:45, Christoffer Dall wrote:
> On Mon, Feb 08, 2016 at 11:40:21AM +0000, Marc Zyngier wrote:
>> GICv2 registers are *slow*. As in "terrifyingly slow". Which is bad.
>> But we're equally bad, as we make a point of accessing them even if
>> we don't have any interrupt in flight.
>>
>> A good solution is to first find out if we have anything useful to
>> write into the GIC, and if we don't, to simply not do it. This
>> involves tracking which LRs actually have something valid there.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/kvm/hyp/vgic-v2-sr.c | 71 ++++++++++++++++++++++++++++-------------
>>  include/kvm/arm_vgic.h          |  2 ++
>>  2 files changed, 51 insertions(+), 22 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/hyp/vgic-v2-sr.c b/arch/arm64/kvm/hyp/vgic-v2-sr.c
>> index e717612..874a08d 100644
>> --- a/arch/arm64/kvm/hyp/vgic-v2-sr.c
>> +++ b/arch/arm64/kvm/hyp/vgic-v2-sr.c
>> @@ -38,28 +38,40 @@ void __hyp_text __vgic_v2_save_state(struct kvm_vcpu *vcpu)
>>  
>>  	nr_lr = vcpu->arch.vgic_cpu.nr_lr;
>>  	cpu_if->vgic_vmcr = readl_relaxed(base + GICH_VMCR);
>> -	cpu_if->vgic_misr = readl_relaxed(base + GICH_MISR);
>> -	eisr0  = readl_relaxed(base + GICH_EISR0);
>> -	elrsr0 = readl_relaxed(base + GICH_ELRSR0);
>> -	if (unlikely(nr_lr > 32)) {
>> -		eisr1  = readl_relaxed(base + GICH_EISR1);
>> -		elrsr1 = readl_relaxed(base + GICH_ELRSR1);
>> -	} else {
>> -		eisr1 = elrsr1 = 0;
>> -	}
>> +
>> +	if (vcpu->arch.vgic_cpu.live_lrs) {
>> +		eisr0  = readl_relaxed(base + GICH_EISR0);
>> +		elrsr0 = readl_relaxed(base + GICH_ELRSR0);
>> +		cpu_if->vgic_misr = readl_relaxed(base + GICH_MISR);
>> +		cpu_if->vgic_apr    = readl_relaxed(base + GICH_APR);
>> +
>> +		if (unlikely(nr_lr > 32)) {
>> +			eisr1  = readl_relaxed(base + GICH_EISR1);
>> +			elrsr1 = readl_relaxed(base + GICH_ELRSR1);
>> +		} else {
>> +			eisr1 = elrsr1 = 0;
>> +		}
>> +
>>  #ifdef CONFIG_CPU_BIG_ENDIAN
>> -	cpu_if->vgic_eisr  = ((u64)eisr0 << 32) | eisr1;
>> -	cpu_if->vgic_elrsr = ((u64)elrsr0 << 32) | elrsr1;
>> +		cpu_if->vgic_eisr  = ((u64)eisr0 << 32) | eisr1;
>> +		cpu_if->vgic_elrsr = ((u64)elrsr0 << 32) | elrsr1;
>>  #else
>> -	cpu_if->vgic_eisr  = ((u64)eisr1 << 32) | eisr0;
>> -	cpu_if->vgic_elrsr = ((u64)elrsr1 << 32) | elrsr0;
>> +		cpu_if->vgic_eisr  = ((u64)eisr1 << 32) | eisr0;
>> +		cpu_if->vgic_elrsr = ((u64)elrsr1 << 32) | elrsr0;
>>  #endif
>> -	cpu_if->vgic_apr    = readl_relaxed(base + GICH_APR);
>>  
>> -	writel_relaxed(0, base + GICH_HCR);
>> +		for (i = 0; i < nr_lr; i++)
>> +			if (vcpu->arch.vgic_cpu.live_lrs & (1UL << i))
>> +				cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4));
>>  
>> -	for (i = 0; i < nr_lr; i++)
>> -		cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4));
>> +		writel_relaxed(0, base + GICH_HCR);
>> +
>> +		vcpu->arch.vgic_cpu.live_lrs = 0;
>> +	} else {
>> +		cpu_if->vgic_eisr = 0;
>> +		cpu_if->vgic_elrsr = ~0UL;
>> +		cpu_if->vgic_misr = 0;
>> +	}
>>  }
>>  
>>  /* vcpu is already in the HYP VA space */
>> @@ -70,15 +82,30 @@ void __hyp_text __vgic_v2_restore_state(struct kvm_vcpu *vcpu)
>>  	struct vgic_dist *vgic = &kvm->arch.vgic;
>>  	void __iomem *base = kern_hyp_va(vgic->vctrl_base);
>>  	int i, nr_lr;
>> +	u64 live_lrs = 0;
>>  
>>  	if (!base)
>>  		return;
>>  
>> -	writel_relaxed(cpu_if->vgic_hcr, base + GICH_HCR);
>> -	writel_relaxed(cpu_if->vgic_vmcr, base + GICH_VMCR);
>> -	writel_relaxed(cpu_if->vgic_apr, base + GICH_APR);
>> -
>>  	nr_lr = vcpu->arch.vgic_cpu.nr_lr;
>> +
>>  	for (i = 0; i < nr_lr; i++)
>> -		writel_relaxed(cpu_if->vgic_lr[i], base + GICH_LR0 + (i * 4));
>> +		if (cpu_if->vgic_lr[i] & GICH_LR_STATE)
>> +			live_lrs |= 1UL << i;
>> +
>> +	if (live_lrs) {
>> +		writel_relaxed(cpu_if->vgic_hcr, base + GICH_HCR);
>> +		writel_relaxed(cpu_if->vgic_apr, base + GICH_APR);
>> +		for (i = 0; i < nr_lr; i++) {
>> +			u32 val = 0;
>> +
>> +			if (live_lrs & (1UL << i))
>> +				val = cpu_if->vgic_lr[i];
>> +
>> +			writel_relaxed(val, base + GICH_LR0 + (i * 4));
>> +		}
>> +	}
>> +
>> +	writel_relaxed(cpu_if->vgic_vmcr, base + GICH_VMCR);
> 
> couldn't you optimize this out by storing the last read value and
> comparing whether anything changed? (you'd have to invalidate the cached
> value on vcpu_put, obviously).

Yeah, very good point. Only the guest can update this, so we could even
move it to vcpu_load/vcpu_put entirely, and never save/restore it inside
the run loop.

I'll keep that for a further patch, as it requires a bit of infrastructure.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/8] arm64: KVM: Switch the sys_reg search to be a binary search
  2016-02-08 11:40   ` Marc Zyngier
@ 2016-02-10 13:49     ` Alex Bennée
  -1 siblings, 0 replies; 60+ messages in thread
From: Alex Bennée @ 2016-02-10 13:49 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: linux-arm-kernel, kvm, kvmarm


Marc Zyngier <marc.zyngier@arm.com> writes:

> Our 64bit sys_reg table is about 90 entries long (so far, and the
> PMU support is likely to increase this). This means that on average,
> it takes 45 comparisons to find the right entry (and actually the
> full 90 if we have to search the invariant table).
>
> Not the most efficient thing. Especially when you consider that this
> table is already sorted. Switching to a binary search effectively
> reduces the search to about 7 comparisons. Slightly better!

Is there an argument for making this a hash table instead, or is that not
possible as you would have to use dynamic allocation?

--
Alex Bennée
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 1/8] arm64: KVM: Switch the sys_reg search to be a binary search
@ 2016-02-10 13:49     ` Alex Bennée
  0 siblings, 0 replies; 60+ messages in thread
From: Alex Bennée @ 2016-02-10 13:49 UTC (permalink / raw)
  To: linux-arm-kernel


Marc Zyngier <marc.zyngier@arm.com> writes:

> Our 64bit sys_reg table is about 90 entries long (so far, and the
> PMU support is likely to increase this). This means that on average,
> it takes 45 comparisons to find the right entry (and actually the
> full 90 if we have to search the invariant table).
>
> Not the most efficient thing. Especially when you consider that this
> table is already sorted. Switching to a binary search effectively
> reduces the search to about 7 comparisons. Slightly better!

Is there an argument for making this a hash table instead, or is that not
possible as you would have to use dynamic allocation?

--
Alex Bennée

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/8] arm64: KVM: Switch the sys_reg search to be a binary search
  2016-02-10 13:49     ` Alex Bennée
@ 2016-02-10 14:00       ` Marc Zyngier
  -1 siblings, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-02-10 14:00 UTC (permalink / raw)
  To: Alex Bennée; +Cc: Christoffer Dall, kvm, linux-arm-kernel, kvmarm

On 10/02/16 13:49, Alex Bennée wrote:
> 
> Marc Zyngier <marc.zyngier@arm.com> writes:
> 
>> Our 64bit sys_reg table is about 90 entries long (so far, and the
>> PMU support is likely to increase this). This means that on average,
>> it takes 45 comparisons to find the right entry (and actually the
>> full 90 if we have to search the invariant table).
>>
>> Not the most efficient thing. Especially when you consider that this
>> table is already sorted. Switching to a binary search effectively
>> reduces the search to about 7 comparisons. Slightly better!
> 
> Is there an argument for making this a hash table instead, or is that not
> possible as you would have to use dynamic allocation?

I believe it would be possible, assuming we have the right hash. Another
alternative would be a radix tree, which would always give us the right
sysreg in four memory accesses. It has some impact on the memory side,
but that shouldn't be a blocker.

As I said, the binary search was a very low hanging fruit, so it made
some sense to implement it and see how we fared. Finding the perfect
data structure is left as an exercise for the reader! ;-)

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
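
For reference, a small standalone illustration of the kind of binary
search being discussed, over a table kept sorted by encoding. The struct
layout and key values here are simplified assumptions; the real
sys_regs.c code keys on the full Op0/Op1/CRn/CRm/Op2 encoding:

#include <stdio.h>
#include <stdlib.h>

struct sys_reg_desc {
        unsigned long key;              /* encoded Op0/Op1/CRn/CRm/Op2 */
        const char *name;
};

/* must be kept sorted by key, as the existing tables already are */
static const struct sys_reg_desc table[] = {
        { 0x100, "SCTLR_EL1" },
        { 0x205, "TTBR0_EL1" },
        { 0x310, "PMSELR_EL0" },
        { 0x420, "CNTKCTL_EL1" },
};

static int cmp_desc(const void *key, const void *elt)
{
        unsigned long k = *(const unsigned long *)key;
        const struct sys_reg_desc *d = elt;

        if (k < d->key)
                return -1;
        if (k > d->key)
                return 1;
        return 0;
}

static const struct sys_reg_desc *find_reg(unsigned long key)
{
        return bsearch(&key, table, sizeof(table) / sizeof(table[0]),
                       sizeof(table[0]), cmp_desc);
}

int main(void)
{
        const struct sys_reg_desc *r = find_reg(0x310);

        /* ~log2(N) comparisons instead of N/2 on average */
        printf("%s\n", r ? r->name : "not found");
        return 0;
}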

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 1/8] arm64: KVM: Switch the sys_reg search to be a binary search
@ 2016-02-10 14:00       ` Marc Zyngier
  0 siblings, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-02-10 14:00 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/02/16 13:49, Alex Bennée wrote:
> 
> Marc Zyngier <marc.zyngier@arm.com> writes:
> 
>> Our 64bit sys_reg table is about 90 entries long (so far, and the
>> PMU support is likely to increase this). This means that on average,
>> it takes 45 comparisons to find the right entry (and actually the
>> full 90 if we have to search the invariant table).
>>
>> Not the most efficient thing. Especially when you consider that this
>> table is already sorted. Switching to a binary search effectively
>> reduces the search to about 7 comparisons. Slightly better!
> 
> Is there an argument for making this a hash table instead, or is that not
> possible as you would have to use dynamic allocation?

I believe it would be possible, assuming we have the right hash. Another
alternative would be a radix tree, which would always give us the right
sysreg in four memory accesses. It has some impact on the memory side,
but that shouldn't be a blocker.

As I said, the binary search was a very low hanging fruit, so it made
some sense to implement it and see how we fared. Finding the perfect
data structure is left as an exercise for the reader! ;-)

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 8/8] KVM: arm64: Avoid accessing ICH registers
  2016-02-10 12:45     ` Christoffer Dall
@ 2016-02-10 16:47       ` Marc Zyngier
  -1 siblings, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-02-10 16:47 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: linux-arm-kernel, kvm, kvmarm

On 10/02/16 12:45, Christoffer Dall wrote:
> On Mon, Feb 08, 2016 at 11:40:22AM +0000, Marc Zyngier wrote:
>> Just like on GICv2, we're a bit hammer-happy with the GICv3 registers,
>> and access them more often than we should.
>>
>> Adopt a policy similar to what we do for GICv2, only saving/restoring
>> the minimal set of registers. As we don't access the registers
>> linearly anymore (we may skip some), the convoluted accessors become
>> slightly simpler, and we can drop the ugly indexing macro that
>> tended to confuse the reviewers.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/kvm/hyp/vgic-v3-sr.c | 288 ++++++++++++++++++++++++----------------
>>  include/kvm/arm_vgic.h          |   6 -
>>  virt/kvm/arm/vgic-v3.c          |   4 +-
>>  3 files changed, 176 insertions(+), 122 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c b/arch/arm64/kvm/hyp/vgic-v3-sr.c
>> index 9142e082..d3813f5 100644
>> --- a/arch/arm64/kvm/hyp/vgic-v3-sr.c
>> +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
>> @@ -39,12 +39,104 @@
>>  		asm volatile("msr_s " __stringify(r) ", %0" : : "r" (__val));\
>>  	} while (0)
>>  
>> -/* vcpu is already in the HYP VA space */
>> +static u64 __hyp_text __gic_v3_get_lr(unsigned int lr)
>> +{
>> +	switch (lr & 0xf) {
>> +	case 0:
>> +		return read_gicreg(ICH_LR0_EL2);
>> +	case 1:
>> +		return read_gicreg(ICH_LR1_EL2);
>> +	case 2:
>> +		return read_gicreg(ICH_LR2_EL2);
>> +	case 3:
>> +		return read_gicreg(ICH_LR3_EL2);
>> +	case 4:
>> +		return read_gicreg(ICH_LR4_EL2);
>> +	case 5:
>> +		return read_gicreg(ICH_LR5_EL2);
>> +	case 6:
>> +		return read_gicreg(ICH_LR6_EL2);
>> +	case 7:
>> +		return read_gicreg(ICH_LR7_EL2);
>> +	case 8:
>> +		return read_gicreg(ICH_LR8_EL2);
>> +	case 9:
>> +		return read_gicreg(ICH_LR9_EL2);
>> +	case 10:
>> +		return read_gicreg(ICH_LR10_EL2);
>> +	case 11:
>> +		return read_gicreg(ICH_LR11_EL2);
>> +	case 12:
>> +		return read_gicreg(ICH_LR12_EL2);
>> +	case 13:
>> +		return read_gicreg(ICH_LR13_EL2);
>> +	case 14:
>> +		return read_gicreg(ICH_LR14_EL2);
>> +	case 15:
>> +		return read_gicreg(ICH_LR15_EL2);
>> +	}
>> +
>> +	unreachable();
>> +}
>> +
>> +static void __hyp_text __gic_v3_set_lr(u64 val, int lr)
>> +{
>> +	switch (lr & 0xf) {
>> +	case 0:
>> +		write_gicreg(val, ICH_LR0_EL2);
>> +		break;
>> +	case 1:
>> +		write_gicreg(val, ICH_LR1_EL2);
>> +		break;
>> +	case 2:
>> +		write_gicreg(val, ICH_LR2_EL2);
>> +		break;
>> +	case 3:
>> +		write_gicreg(val, ICH_LR3_EL2);
>> +		break;
>> +	case 4:
>> +		write_gicreg(val, ICH_LR4_EL2);
>> +		break;
>> +	case 5:
>> +		write_gicreg(val, ICH_LR5_EL2);
>> +		break;
>> +	case 6:
>> +		write_gicreg(val, ICH_LR6_EL2);
>> +		break;
>> +	case 7:
>> +		write_gicreg(val, ICH_LR7_EL2);
>> +		break;
>> +	case 8:
>> +		write_gicreg(val, ICH_LR8_EL2);
>> +		break;
>> +	case 9:
>> +		write_gicreg(val, ICH_LR9_EL2);
>> +		break;
>> +	case 10:
>> +		write_gicreg(val, ICH_LR10_EL2);
>> +		break;
>> +	case 11:
>> +		write_gicreg(val, ICH_LR11_EL2);
>> +		break;
>> +	case 12:
>> +		write_gicreg(val, ICH_LR12_EL2);
>> +		break;
>> +	case 13:
>> +		write_gicreg(val, ICH_LR13_EL2);
>> +		break;
>> +	case 14:
>> +		write_gicreg(val, ICH_LR14_EL2);
>> +		break;
>> +	case 15:
>> +		write_gicreg(val, ICH_LR15_EL2);
>> +		break;
>> +	}
>> +}
>> +
>>  void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
>>  {
>>  	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
>>  	u64 val;
>> -	u32 max_lr_idx, nr_pri_bits;
>>  
>>  	/*
>>  	 * Make sure stores to the GIC via the memory mapped interface
>> @@ -53,68 +145,50 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
>>  	dsb(st);
>>  
>>  	cpu_if->vgic_vmcr  = read_gicreg(ICH_VMCR_EL2);
>> -	cpu_if->vgic_misr  = read_gicreg(ICH_MISR_EL2);
>> -	cpu_if->vgic_eisr  = read_gicreg(ICH_EISR_EL2);
>> -	cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
>>  
>> -	write_gicreg(0, ICH_HCR_EL2);
>> -	val = read_gicreg(ICH_VTR_EL2);
>> -	max_lr_idx = vtr_to_max_lr_idx(val);
>> -	nr_pri_bits = vtr_to_nr_pri_bits(val);
>> +	if (vcpu->arch.vgic_cpu.live_lrs) {
>> +		int i;
>> +		u32 max_lr_idx, nr_pri_bits;
>>  
>> -	switch (max_lr_idx) {
>> -	case 15:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(15)] = read_gicreg(ICH_LR15_EL2);
>> -	case 14:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(14)] = read_gicreg(ICH_LR14_EL2);
>> -	case 13:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(13)] = read_gicreg(ICH_LR13_EL2);
>> -	case 12:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(12)] = read_gicreg(ICH_LR12_EL2);
>> -	case 11:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(11)] = read_gicreg(ICH_LR11_EL2);
>> -	case 10:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(10)] = read_gicreg(ICH_LR10_EL2);
>> -	case 9:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(9)] = read_gicreg(ICH_LR9_EL2);
>> -	case 8:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(8)] = read_gicreg(ICH_LR8_EL2);
>> -	case 7:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(7)] = read_gicreg(ICH_LR7_EL2);
>> -	case 6:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(6)] = read_gicreg(ICH_LR6_EL2);
>> -	case 5:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(5)] = read_gicreg(ICH_LR5_EL2);
>> -	case 4:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(4)] = read_gicreg(ICH_LR4_EL2);
>> -	case 3:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(3)] = read_gicreg(ICH_LR3_EL2);
>> -	case 2:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(2)] = read_gicreg(ICH_LR2_EL2);
>> -	case 1:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(1)] = read_gicreg(ICH_LR1_EL2);
>> -	case 0:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(0)] = read_gicreg(ICH_LR0_EL2);
>> -	}
>> +		cpu_if->vgic_misr  = read_gicreg(ICH_MISR_EL2);
>> +		cpu_if->vgic_eisr  = read_gicreg(ICH_EISR_EL2);
>> +		cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
>>  
>> -	switch (nr_pri_bits) {
>> -	case 7:
>> -		cpu_if->vgic_ap0r[3] = read_gicreg(ICH_AP0R3_EL2);
>> -		cpu_if->vgic_ap0r[2] = read_gicreg(ICH_AP0R2_EL2);
>> -	case 6:
>> -		cpu_if->vgic_ap0r[1] = read_gicreg(ICH_AP0R1_EL2);
>> -	default:
>> -		cpu_if->vgic_ap0r[0] = read_gicreg(ICH_AP0R0_EL2);
>> -	}
>> +		write_gicreg(0, ICH_HCR_EL2);
>> +		val = read_gicreg(ICH_VTR_EL2);
> 
> can't we cache the read of ICH_VTR_EL2 then?

We can (this is an invariant anyway). I don't expect it to be slow
though, as this doesn't have to go deep into the GIC, and stays at the
CPU level. I'll benchmark it anyway.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
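
A small sketch of caching an invariant register such as ICH_VTR_EL2, as
discussed above: read it once, reuse the value on every world switch.
The helper names and the pretend register value are assumptions for
illustration; the field extraction mirrors the vtr_to_max_lr_idx/
vtr_to_nr_pri_bits helpers used in the patch:

#include <stdint.h>
#include <stdio.h>

/* stand-in for read_gicreg(ICH_VTR_EL2) */
static uint64_t read_ich_vtr_el2(void)
{
        return 0x90000003ULL;   /* pretend: 4 LRs, 5 priority bits */
}

static uint64_t cached_vtr;
static int vtr_cached;

static uint64_t get_vtr(void)
{
        if (!vtr_cached) {
                cached_vtr = read_ich_vtr_el2();        /* invariant: one read is enough */
                vtr_cached = 1;
        }
        return cached_vtr;
}

int main(void)
{
        uint64_t vtr = get_vtr();
        unsigned int max_lr_idx = vtr & 0xf;                    /* ListRegs field */
        unsigned int nr_pri_bits = ((uint32_t)vtr >> 29) + 1;   /* PRIbits field */

        printf("max_lr_idx=%u nr_pri_bits=%u\n", max_lr_idx, nr_pri_bits);
        return 0;
}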

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 8/8] KVM: arm64: Avoid accessing ICH registers
@ 2016-02-10 16:47       ` Marc Zyngier
  0 siblings, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-02-10 16:47 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/02/16 12:45, Christoffer Dall wrote:
> On Mon, Feb 08, 2016 at 11:40:22AM +0000, Marc Zyngier wrote:
>> Just like on GICv2, we're a bit hammer-happy with the GICv3 registers,
>> and access them more often than we should.
>>
>> Adopt a policy similar to what we do for GICv2, only saving/restoring
>> the minimal set of registers. As we don't access the registers
>> linearly anymore (we may skip some), the convoluted accessors become
>> slightly simpler, and we can drop the ugly indexing macro that
>> tended to confuse the reviewers.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/kvm/hyp/vgic-v3-sr.c | 288 ++++++++++++++++++++++++----------------
>>  include/kvm/arm_vgic.h          |   6 -
>>  virt/kvm/arm/vgic-v3.c          |   4 +-
>>  3 files changed, 176 insertions(+), 122 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c b/arch/arm64/kvm/hyp/vgic-v3-sr.c
>> index 9142e082..d3813f5 100644
>> --- a/arch/arm64/kvm/hyp/vgic-v3-sr.c
>> +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
>> @@ -39,12 +39,104 @@
>>  		asm volatile("msr_s " __stringify(r) ", %0" : : "r" (__val));\
>>  	} while (0)
>>  
>> -/* vcpu is already in the HYP VA space */
>> +static u64 __hyp_text __gic_v3_get_lr(unsigned int lr)
>> +{
>> +	switch (lr & 0xf) {
>> +	case 0:
>> +		return read_gicreg(ICH_LR0_EL2);
>> +	case 1:
>> +		return read_gicreg(ICH_LR1_EL2);
>> +	case 2:
>> +		return read_gicreg(ICH_LR2_EL2);
>> +	case 3:
>> +		return read_gicreg(ICH_LR3_EL2);
>> +	case 4:
>> +		return read_gicreg(ICH_LR4_EL2);
>> +	case 5:
>> +		return read_gicreg(ICH_LR5_EL2);
>> +	case 6:
>> +		return read_gicreg(ICH_LR6_EL2);
>> +	case 7:
>> +		return read_gicreg(ICH_LR7_EL2);
>> +	case 8:
>> +		return read_gicreg(ICH_LR8_EL2);
>> +	case 9:
>> +		return read_gicreg(ICH_LR9_EL2);
>> +	case 10:
>> +		return read_gicreg(ICH_LR10_EL2);
>> +	case 11:
>> +		return read_gicreg(ICH_LR11_EL2);
>> +	case 12:
>> +		return read_gicreg(ICH_LR12_EL2);
>> +	case 13:
>> +		return read_gicreg(ICH_LR13_EL2);
>> +	case 14:
>> +		return read_gicreg(ICH_LR14_EL2);
>> +	case 15:
>> +		return read_gicreg(ICH_LR15_EL2);
>> +	}
>> +
>> +	unreachable();
>> +}
>> +
>> +static void __hyp_text __gic_v3_set_lr(u64 val, int lr)
>> +{
>> +	switch (lr & 0xf) {
>> +	case 0:
>> +		write_gicreg(val, ICH_LR0_EL2);
>> +		break;
>> +	case 1:
>> +		write_gicreg(val, ICH_LR1_EL2);
>> +		break;
>> +	case 2:
>> +		write_gicreg(val, ICH_LR2_EL2);
>> +		break;
>> +	case 3:
>> +		write_gicreg(val, ICH_LR3_EL2);
>> +		break;
>> +	case 4:
>> +		write_gicreg(val, ICH_LR4_EL2);
>> +		break;
>> +	case 5:
>> +		write_gicreg(val, ICH_LR5_EL2);
>> +		break;
>> +	case 6:
>> +		write_gicreg(val, ICH_LR6_EL2);
>> +		break;
>> +	case 7:
>> +		write_gicreg(val, ICH_LR7_EL2);
>> +		break;
>> +	case 8:
>> +		write_gicreg(val, ICH_LR8_EL2);
>> +		break;
>> +	case 9:
>> +		write_gicreg(val, ICH_LR9_EL2);
>> +		break;
>> +	case 10:
>> +		write_gicreg(val, ICH_LR10_EL2);
>> +		break;
>> +	case 11:
>> +		write_gicreg(val, ICH_LR11_EL2);
>> +		break;
>> +	case 12:
>> +		write_gicreg(val, ICH_LR12_EL2);
>> +		break;
>> +	case 13:
>> +		write_gicreg(val, ICH_LR13_EL2);
>> +		break;
>> +	case 14:
>> +		write_gicreg(val, ICH_LR14_EL2);
>> +		break;
>> +	case 15:
>> +		write_gicreg(val, ICH_LR15_EL2);
>> +		break;
>> +	}
>> +}
>> +
>>  void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
>>  {
>>  	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
>>  	u64 val;
>> -	u32 max_lr_idx, nr_pri_bits;
>>  
>>  	/*
>>  	 * Make sure stores to the GIC via the memory mapped interface
>> @@ -53,68 +145,50 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
>>  	dsb(st);
>>  
>>  	cpu_if->vgic_vmcr  = read_gicreg(ICH_VMCR_EL2);
>> -	cpu_if->vgic_misr  = read_gicreg(ICH_MISR_EL2);
>> -	cpu_if->vgic_eisr  = read_gicreg(ICH_EISR_EL2);
>> -	cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
>>  
>> -	write_gicreg(0, ICH_HCR_EL2);
>> -	val = read_gicreg(ICH_VTR_EL2);
>> -	max_lr_idx = vtr_to_max_lr_idx(val);
>> -	nr_pri_bits = vtr_to_nr_pri_bits(val);
>> +	if (vcpu->arch.vgic_cpu.live_lrs) {
>> +		int i;
>> +		u32 max_lr_idx, nr_pri_bits;
>>  
>> -	switch (max_lr_idx) {
>> -	case 15:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(15)] = read_gicreg(ICH_LR15_EL2);
>> -	case 14:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(14)] = read_gicreg(ICH_LR14_EL2);
>> -	case 13:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(13)] = read_gicreg(ICH_LR13_EL2);
>> -	case 12:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(12)] = read_gicreg(ICH_LR12_EL2);
>> -	case 11:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(11)] = read_gicreg(ICH_LR11_EL2);
>> -	case 10:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(10)] = read_gicreg(ICH_LR10_EL2);
>> -	case 9:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(9)] = read_gicreg(ICH_LR9_EL2);
>> -	case 8:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(8)] = read_gicreg(ICH_LR8_EL2);
>> -	case 7:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(7)] = read_gicreg(ICH_LR7_EL2);
>> -	case 6:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(6)] = read_gicreg(ICH_LR6_EL2);
>> -	case 5:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(5)] = read_gicreg(ICH_LR5_EL2);
>> -	case 4:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(4)] = read_gicreg(ICH_LR4_EL2);
>> -	case 3:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(3)] = read_gicreg(ICH_LR3_EL2);
>> -	case 2:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(2)] = read_gicreg(ICH_LR2_EL2);
>> -	case 1:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(1)] = read_gicreg(ICH_LR1_EL2);
>> -	case 0:
>> -		cpu_if->vgic_lr[VGIC_V3_LR_INDEX(0)] = read_gicreg(ICH_LR0_EL2);
>> -	}
>> +		cpu_if->vgic_misr  = read_gicreg(ICH_MISR_EL2);
>> +		cpu_if->vgic_eisr  = read_gicreg(ICH_EISR_EL2);
>> +		cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
>>  
>> -	switch (nr_pri_bits) {
>> -	case 7:
>> -		cpu_if->vgic_ap0r[3] = read_gicreg(ICH_AP0R3_EL2);
>> -		cpu_if->vgic_ap0r[2] = read_gicreg(ICH_AP0R2_EL2);
>> -	case 6:
>> -		cpu_if->vgic_ap0r[1] = read_gicreg(ICH_AP0R1_EL2);
>> -	default:
>> -		cpu_if->vgic_ap0r[0] = read_gicreg(ICH_AP0R0_EL2);
>> -	}
>> +		write_gicreg(0, ICH_HCR_EL2);
>> +		val = read_gicreg(ICH_VTR_EL2);
> 
> can't we cache the read of ICH_VTR_EL2 then?

We can (this is an invariant anyway). I don't expect it to be slow
though, as this doesn't have to go deep into the GIC, and stays at the
CPU level. I'll benchmark it anyway.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 7/8] KVM: arm/arm64: Avoid accessing GICH registers
  2016-02-10 13:34       ` Marc Zyngier
@ 2016-02-10 17:30         ` Christoffer Dall
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2016-02-10 17:30 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, linux-arm-kernel, kvmarm

On Wed, Feb 10, 2016 at 01:34:44PM +0000, Marc Zyngier wrote:
> On 10/02/16 12:45, Christoffer Dall wrote:
> > On Mon, Feb 08, 2016 at 11:40:21AM +0000, Marc Zyngier wrote:
> >> GICv2 registers are *slow*. As in "terrifyingly slow". Which is bad.
> >> But we're equally bad, as we make a point of accessing them even if
> >> we don't have any interrupt in flight.
> >>
> >> A good solution is to first find out if we have anything useful to
> >> write into the GIC, and if we don't, to simply not do it. This
> >> involves tracking which LRs actually have something valid there.
> >>
> >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >> ---
> >>  arch/arm64/kvm/hyp/vgic-v2-sr.c | 71 ++++++++++++++++++++++++++++-------------
> >>  include/kvm/arm_vgic.h          |  2 ++
> >>  2 files changed, 51 insertions(+), 22 deletions(-)
> >>
> >> diff --git a/arch/arm64/kvm/hyp/vgic-v2-sr.c b/arch/arm64/kvm/hyp/vgic-v2-sr.c
> >> index e717612..874a08d 100644
> >> --- a/arch/arm64/kvm/hyp/vgic-v2-sr.c
> >> +++ b/arch/arm64/kvm/hyp/vgic-v2-sr.c
> >> @@ -38,28 +38,40 @@ void __hyp_text __vgic_v2_save_state(struct kvm_vcpu *vcpu)
> >>  
> >>  	nr_lr = vcpu->arch.vgic_cpu.nr_lr;
> >>  	cpu_if->vgic_vmcr = readl_relaxed(base + GICH_VMCR);
> >> -	cpu_if->vgic_misr = readl_relaxed(base + GICH_MISR);
> >> -	eisr0  = readl_relaxed(base + GICH_EISR0);
> >> -	elrsr0 = readl_relaxed(base + GICH_ELRSR0);
> >> -	if (unlikely(nr_lr > 32)) {
> >> -		eisr1  = readl_relaxed(base + GICH_EISR1);
> >> -		elrsr1 = readl_relaxed(base + GICH_ELRSR1);
> >> -	} else {
> >> -		eisr1 = elrsr1 = 0;
> >> -	}
> >> +
> >> +	if (vcpu->arch.vgic_cpu.live_lrs) {
> >> +		eisr0  = readl_relaxed(base + GICH_EISR0);
> >> +		elrsr0 = readl_relaxed(base + GICH_ELRSR0);
> >> +		cpu_if->vgic_misr = readl_relaxed(base + GICH_MISR);
> >> +		cpu_if->vgic_apr    = readl_relaxed(base + GICH_APR);
> >> +
> >> +		if (unlikely(nr_lr > 32)) {
> >> +			eisr1  = readl_relaxed(base + GICH_EISR1);
> >> +			elrsr1 = readl_relaxed(base + GICH_ELRSR1);
> >> +		} else {
> >> +			eisr1 = elrsr1 = 0;
> >> +		}
> >> +
> >>  #ifdef CONFIG_CPU_BIG_ENDIAN
> >> -	cpu_if->vgic_eisr  = ((u64)eisr0 << 32) | eisr1;
> >> -	cpu_if->vgic_elrsr = ((u64)elrsr0 << 32) | elrsr1;
> >> +		cpu_if->vgic_eisr  = ((u64)eisr0 << 32) | eisr1;
> >> +		cpu_if->vgic_elrsr = ((u64)elrsr0 << 32) | elrsr1;
> >>  #else
> >> -	cpu_if->vgic_eisr  = ((u64)eisr1 << 32) | eisr0;
> >> -	cpu_if->vgic_elrsr = ((u64)elrsr1 << 32) | elrsr0;
> >> +		cpu_if->vgic_eisr  = ((u64)eisr1 << 32) | eisr0;
> >> +		cpu_if->vgic_elrsr = ((u64)elrsr1 << 32) | elrsr0;
> >>  #endif
> >> -	cpu_if->vgic_apr    = readl_relaxed(base + GICH_APR);
> >>  
> >> -	writel_relaxed(0, base + GICH_HCR);
> >> +		for (i = 0; i < nr_lr; i++)
> >> +			if (vcpu->arch.vgic_cpu.live_lrs & (1UL << i))
> >> +				cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4));
> >>  
> >> -	for (i = 0; i < nr_lr; i++)
> >> -		cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4));
> >> +		writel_relaxed(0, base + GICH_HCR);
> >> +
> >> +		vcpu->arch.vgic_cpu.live_lrs = 0;
> >> +	} else {
> >> +		cpu_if->vgic_eisr = 0;
> >> +		cpu_if->vgic_elrsr = ~0UL;
> >> +		cpu_if->vgic_misr = 0;
> >> +	}
> >>  }
> >>  
> >>  /* vcpu is already in the HYP VA space */
> >> @@ -70,15 +82,30 @@ void __hyp_text __vgic_v2_restore_state(struct kvm_vcpu *vcpu)
> >>  	struct vgic_dist *vgic = &kvm->arch.vgic;
> >>  	void __iomem *base = kern_hyp_va(vgic->vctrl_base);
> >>  	int i, nr_lr;
> >> +	u64 live_lrs = 0;
> >>  
> >>  	if (!base)
> >>  		return;
> >>  
> >> -	writel_relaxed(cpu_if->vgic_hcr, base + GICH_HCR);
> >> -	writel_relaxed(cpu_if->vgic_vmcr, base + GICH_VMCR);
> >> -	writel_relaxed(cpu_if->vgic_apr, base + GICH_APR);
> >> -
> >>  	nr_lr = vcpu->arch.vgic_cpu.nr_lr;
> >> +
> >>  	for (i = 0; i < nr_lr; i++)
> >> -		writel_relaxed(cpu_if->vgic_lr[i], base + GICH_LR0 + (i * 4));
> >> +		if (cpu_if->vgic_lr[i] & GICH_LR_STATE)
> >> +			live_lrs |= 1UL << i;
> >> +
> >> +	if (live_lrs) {
> >> +		writel_relaxed(cpu_if->vgic_hcr, base + GICH_HCR);
> >> +		writel_relaxed(cpu_if->vgic_apr, base + GICH_APR);
> >> +		for (i = 0; i < nr_lr; i++) {
> >> +			u32 val = 0;
> >> +
> >> +			if (live_lrs & (1UL << i))
> >> +				val = cpu_if->vgic_lr[i];
> >> +
> >> +			writel_relaxed(val, base + GICH_LR0 + (i * 4));
> >> +		}
> >> +	}
> >> +
> >> +	writel_relaxed(cpu_if->vgic_vmcr, base + GICH_VMCR);
> > 
> > couldn't you optimize this out by storing the last read value and
> > compare if anything changed?  (you'd have to invalidate the cached value
> > on vcpu_put obviously).
> 
> Yeah, very good point. Only the guest can update this, so we could even
> move it to vcpu_load/vcpu_put entirely, and never save/restore it inside
> the run loop.

If vcpu_load is called *after* loading incoming state on migration, this
should work, yes.
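
For reference, the simpler compare-before-write variant I had in mind
is only a couple of lines in the restore path; vgic_vmcr_last is a
made-up cached field that vcpu_put would have to invalidate:

	/* Sketch only: skip the GICH_VMCR write when nothing changed */
	if (cpu_if->vgic_vmcr != cpu_if->vgic_vmcr_last) {
		writel_relaxed(cpu_if->vgic_vmcr, base + GICH_VMCR);
		cpu_if->vgic_vmcr_last = cpu_if->vgic_vmcr;
	}

But moving it out of the run loop entirely is even better, of course.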

> 
> I'll keep that for a further patch, as it requires a bit of infrastructure.
> 
Sounds good.

We can probably also optimize the writing of the LRs further, but I
figure it's not worth it as the interrupt delivery path is the slow path
anyway and we should care about optimizing the common case.

I wouldn't think saving 2-3 writes to some LRs would be measurable for
interrupt delivery anyhow.

-Christoffer

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 7/8] KVM: arm/arm64: Avoid accessing GICH registers
  2016-02-10 17:30         ` Christoffer Dall
@ 2016-02-10 17:43           ` Marc Zyngier
  -1 siblings, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-02-10 17:43 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: linux-arm-kernel, kvm, kvmarm

On 10/02/16 17:30, Christoffer Dall wrote:
> On Wed, Feb 10, 2016 at 01:34:44PM +0000, Marc Zyngier wrote:
>> On 10/02/16 12:45, Christoffer Dall wrote:
>>> On Mon, Feb 08, 2016 at 11:40:21AM +0000, Marc Zyngier wrote:
>>>> GICv2 registers are *slow*. As in "terrifyingly slow". Which is bad.
>>>> But we're equally bad, as we make a point of accessing them even if
>>>> we don't have any interrupt in flight.
>>>>
>>>> A good solution is to first find out if we have anything useful to
>>>> write into the GIC, and if we don't, to simply not do it. This
>>>> involves tracking which LRs actually have something valid there.
>>>>
>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>> ---
>>>>  arch/arm64/kvm/hyp/vgic-v2-sr.c | 71 ++++++++++++++++++++++++++++-------------
>>>>  include/kvm/arm_vgic.h          |  2 ++
>>>>  2 files changed, 51 insertions(+), 22 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/kvm/hyp/vgic-v2-sr.c b/arch/arm64/kvm/hyp/vgic-v2-sr.c
>>>> index e717612..874a08d 100644
>>>> --- a/arch/arm64/kvm/hyp/vgic-v2-sr.c
>>>> +++ b/arch/arm64/kvm/hyp/vgic-v2-sr.c
>>>> @@ -38,28 +38,40 @@ void __hyp_text __vgic_v2_save_state(struct kvm_vcpu *vcpu)
>>>>  
>>>>  	nr_lr = vcpu->arch.vgic_cpu.nr_lr;
>>>>  	cpu_if->vgic_vmcr = readl_relaxed(base + GICH_VMCR);
>>>> -	cpu_if->vgic_misr = readl_relaxed(base + GICH_MISR);
>>>> -	eisr0  = readl_relaxed(base + GICH_EISR0);
>>>> -	elrsr0 = readl_relaxed(base + GICH_ELRSR0);
>>>> -	if (unlikely(nr_lr > 32)) {
>>>> -		eisr1  = readl_relaxed(base + GICH_EISR1);
>>>> -		elrsr1 = readl_relaxed(base + GICH_ELRSR1);
>>>> -	} else {
>>>> -		eisr1 = elrsr1 = 0;
>>>> -	}
>>>> +
>>>> +	if (vcpu->arch.vgic_cpu.live_lrs) {
>>>> +		eisr0  = readl_relaxed(base + GICH_EISR0);
>>>> +		elrsr0 = readl_relaxed(base + GICH_ELRSR0);
>>>> +		cpu_if->vgic_misr = readl_relaxed(base + GICH_MISR);
>>>> +		cpu_if->vgic_apr    = readl_relaxed(base + GICH_APR);
>>>> +
>>>> +		if (unlikely(nr_lr > 32)) {
>>>> +			eisr1  = readl_relaxed(base + GICH_EISR1);
>>>> +			elrsr1 = readl_relaxed(base + GICH_ELRSR1);
>>>> +		} else {
>>>> +			eisr1 = elrsr1 = 0;
>>>> +		}
>>>> +
>>>>  #ifdef CONFIG_CPU_BIG_ENDIAN
>>>> -	cpu_if->vgic_eisr  = ((u64)eisr0 << 32) | eisr1;
>>>> -	cpu_if->vgic_elrsr = ((u64)elrsr0 << 32) | elrsr1;
>>>> +		cpu_if->vgic_eisr  = ((u64)eisr0 << 32) | eisr1;
>>>> +		cpu_if->vgic_elrsr = ((u64)elrsr0 << 32) | elrsr1;
>>>>  #else
>>>> -	cpu_if->vgic_eisr  = ((u64)eisr1 << 32) | eisr0;
>>>> -	cpu_if->vgic_elrsr = ((u64)elrsr1 << 32) | elrsr0;
>>>> +		cpu_if->vgic_eisr  = ((u64)eisr1 << 32) | eisr0;
>>>> +		cpu_if->vgic_elrsr = ((u64)elrsr1 << 32) | elrsr0;
>>>>  #endif
>>>> -	cpu_if->vgic_apr    = readl_relaxed(base + GICH_APR);
>>>>  
>>>> -	writel_relaxed(0, base + GICH_HCR);
>>>> +		for (i = 0; i < nr_lr; i++)
>>>> +			if (vcpu->arch.vgic_cpu.live_lrs & (1UL << i))
>>>> +				cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4));
>>>>  
>>>> -	for (i = 0; i < nr_lr; i++)
>>>> -		cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4));
>>>> +		writel_relaxed(0, base + GICH_HCR);
>>>> +
>>>> +		vcpu->arch.vgic_cpu.live_lrs = 0;
>>>> +	} else {
>>>> +		cpu_if->vgic_eisr = 0;
>>>> +		cpu_if->vgic_elrsr = ~0UL;
>>>> +		cpu_if->vgic_misr = 0;
>>>> +	}
>>>>  }
>>>>  
>>>>  /* vcpu is already in the HYP VA space */
>>>> @@ -70,15 +82,30 @@ void __hyp_text __vgic_v2_restore_state(struct kvm_vcpu *vcpu)
>>>>  	struct vgic_dist *vgic = &kvm->arch.vgic;
>>>>  	void __iomem *base = kern_hyp_va(vgic->vctrl_base);
>>>>  	int i, nr_lr;
>>>> +	u64 live_lrs = 0;
>>>>  
>>>>  	if (!base)
>>>>  		return;
>>>>  
>>>> -	writel_relaxed(cpu_if->vgic_hcr, base + GICH_HCR);
>>>> -	writel_relaxed(cpu_if->vgic_vmcr, base + GICH_VMCR);
>>>> -	writel_relaxed(cpu_if->vgic_apr, base + GICH_APR);
>>>> -
>>>>  	nr_lr = vcpu->arch.vgic_cpu.nr_lr;
>>>> +
>>>>  	for (i = 0; i < nr_lr; i++)
>>>> -		writel_relaxed(cpu_if->vgic_lr[i], base + GICH_LR0 + (i * 4));
>>>> +		if (cpu_if->vgic_lr[i] & GICH_LR_STATE)
>>>> +			live_lrs |= 1UL << i;
>>>> +
>>>> +	if (live_lrs) {
>>>> +		writel_relaxed(cpu_if->vgic_hcr, base + GICH_HCR);
>>>> +		writel_relaxed(cpu_if->vgic_apr, base + GICH_APR);
>>>> +		for (i = 0; i < nr_lr; i++) {
>>>> +			u32 val = 0;
>>>> +
>>>> +			if (live_lrs & (1UL << i))
>>>> +				val = cpu_if->vgic_lr[i];
>>>> +
>>>> +			writel_relaxed(val, base + GICH_LR0 + (i * 4));
>>>> +		}
>>>> +	}
>>>> +
>>>> +	writel_relaxed(cpu_if->vgic_vmcr, base + GICH_VMCR);
>>>
>>> couldn't you optimize this out by storing the last read value and
>>> compare if anything changed?  (you'd have to invalidate the cached value
>>> on vcpu_put obviously).
>>
>> Yeah, very good point. Only the guest can update this, so we could even
>> move it to vcpu_load/vcpu_put entirely, and never save/restore it inside
>> the run loop.
> 
> If vcpu_load is called *after* loading incoming state on migration, this
> should work, yes.

Hmmm. That could be an issue, actually. I need to check if we do a
vcpu_load on SET_ONE_REG access. If we do, then vcpu_put will overwrite
the value we've written in the shadow copies by reading back the old
value from the HW.

I'll investigate when I get the time.

>>
>> I'll keep that for a further patch, as it requires a bit of infrastructure.
>>
> Sounds good.
> 
> We can probably also optimize the writing of the LRs further, but I
> figure it's not worth it as the interrupt delivery path is the slow path
> anyway and we should care about optimizing the common case.
> 
> I wouldn't think saving 2-3 writes to some LRs would be measurable for
> interrupt delivery anyhow.

I found out that some other simple optimizations did save about 800
cycles for a single interrupt injection, which is about 10% of the
complete exit/enter path.

Guilty ones are the maintenance interrupt status registers (MISR, EISR),
and zeroing of LRs. That's with GICv2 though, and GICv3 seems less
sensitive to that kind of thing...

But I agree with you: this is a fairly slow path overall, and we'll
quickly approach the point of diminishing returns.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/8] KVM/ARM: Guest Entry/Exit optimizations
  2016-02-08 11:40 ` Marc Zyngier
@ 2016-02-10 20:40   ` Christoffer Dall
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2016-02-10 20:40 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: linux-arm-kernel, kvm, kvmarm

On Mon, Feb 08, 2016 at 11:40:14AM +0000, Marc Zyngier wrote:
> I've recently been looking at our entry/exit costs, and profiling
> figures did show some very low hanging fruits.
> 
> The most obvious cost is that accessing the GIC HW is slow. As in
> "deadly slow", specially when GICv2 is involved. So not hammering the
> HW when there is nothing to write is immediately beneficial, as this
> is the most common cases (whatever people seem to think, interrupts
> are a *rare* event).
> 
> Another easy thing to fix is the way we handle trapped system
> registers. We do insist on (mostly) sorting them, but we do perform a
> linear search on trap. We can switch to a binary search for free, and
> get immediate benefits (the PMU code, being extremely trap-happy,
> benefits immediately from this).
> 
> With these in place, I see an improvement of 20 to 30% (depending on
> the platform) on our world-switch cycle count when running a set of
> hand-crafted guests that are designed to only perform traps.
> 

By the way, I took this whole stack of changes (wsinc, vhe, and
optimizations), ran it on Mustang, fired up UEFI, and did a reboot, and
things seem to work, so that's a small, shallow
'tested-by-something-else-than-a-linux-guest' statement from me.

-Christoffer

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/8] KVM/ARM: Guest Entry/Exit optimizations
  2016-02-10 20:40   ` Christoffer Dall
@ 2016-02-16 20:05     ` Marc Zyngier
  -1 siblings, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-02-16 20:05 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: linux-arm-kernel, kvm, kvmarm

On 10/02/16 20:40, Christoffer Dall wrote:
> On Mon, Feb 08, 2016 at 11:40:14AM +0000, Marc Zyngier wrote:
>> I've recently been looking at our entry/exit costs, and profiling
>> figures did show some very low hanging fruits.
>>
>> The most obvious cost is that accessing the GIC HW is slow. As in
>> "deadly slow", specially when GICv2 is involved. So not hammering the
>> HW when there is nothing to write is immediately beneficial, as this
>> is the most common cases (whatever people seem to think, interrupts
>> are a *rare* event).
>>
>> Another easy thing to fix is the way we handle trapped system
>> registers. We do insist on (mostly) sorting them, but we do perform a
>> linear search on trap. We can switch to a binary search for free, and
>> get immediate benefits (the PMU code, being extremely trap-happy,
>> benefits immediately from this).
>>
>> With these in place, I see an improvement of 20 to 30% (depending on
>> the platform) on our world-switch cycle count when running a set of
>> hand-crafted guests that are designed to only perform traps.
>>
> 
> By the way, I took this whole stack of changes (wsinc, vhe, and
> optimizations) and ran it on Mustang and fired up UEFI and did a reboot
> and things seem to work, so that's a small shallow
> 'tested-by-something-else-than-a-linux-guest' statement from me.

I've run a slightly heavier set of tests, and the infamous reboot loop
broke, thanks to patch #7.

Notice how we fail to wipe the vgic_apr copy on the "light" exit path?
If you're unlucky (and odds are that you will be), you will inject an
interrupt while its active priority bit is set, and the new interrupt
won't be delivered. Bah.
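
The fix is simply to wipe the shadow APR together with the other shadow
values on that path, i.e. something along the lines of:

	/* In __vgic_v2_save_state(), on the "no live LRs" path: */
	} else {
		cpu_if->vgic_eisr  = 0;
		cpu_if->vgic_elrsr = ~0UL;
		cpu_if->vgic_misr  = 0;
		cpu_if->vgic_apr   = 0;	/* no stale active priority */
	}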

With that fixed, the reboot loop has been going strong for a few hours.
I'll leave my Seattle cooking overnight and if everything looks good in
the morning, I'll repost a new set of patches.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/8] KVM/ARM: Guest Entry/Exit optimizations
  2016-02-16 20:05     ` Marc Zyngier
@ 2016-02-17  9:15       ` Christoffer Dall
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2016-02-17  9:15 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, linux-arm-kernel, kvmarm

On Tue, Feb 16, 2016 at 08:05:29PM +0000, Marc Zyngier wrote:
> On 10/02/16 20:40, Christoffer Dall wrote:
> > On Mon, Feb 08, 2016 at 11:40:14AM +0000, Marc Zyngier wrote:
> >> I've recently been looking at our entry/exit costs, and profiling
> >> figures did show some very low hanging fruits.
> >>
> >> The most obvious cost is that accessing the GIC HW is slow. As in
> >> "deadly slow", specially when GICv2 is involved. So not hammering the
> >> HW when there is nothing to write is immediately beneficial, as this
> >> is the most common cases (whatever people seem to think, interrupts
> >> are a *rare* event).
> >>
> >> Another easy thing to fix is the way we handle trapped system
> >> registers. We do insist on (mostly) sorting them, but we do perform a
> >> linear search on trap. We can switch to a binary search for free, and
> >> get immediate benefits (the PMU code, being extremely trap-happy,
> >> benefits immediately from this).
> >>
> >> With these in place, I see an improvement of 20 to 30% (depending on
> >> the platform) on our world-switch cycle count when running a set of
> >> hand-crafted guests that are designed to only perform traps.
> >>
> > 
> > By the way, I took this whole stack of changes (wsinc, vhe, and
> > optimizations) and ran it on Mustang and fired up UEFI and did a reboot
> > and things seem to work, so that's a small shallow
> > 'tested-by-something-else-than-a-linux-guest' statement from me.
> 
> I've run a slightly heavier set of tests, and the infamous reboot loop
> broke, thanks to patch #7.
> 
> Notice how we fail to wipe the vgic_apr copy on the "light" exit path?
> If you're unlucky (and odds are that you will be), you will inject an
> interrupt while its active priority bit is set, and the new interrupt
> won't be delivered. Bah.
> 
> With that fixed, the reboot loop has been going strong for a few hours.
> I'll leave my Seattle cooking overnight and if everything looks good in
> the morning, I'll repost a new set of patches.
> 
Good that you caught this one then!

-Christoffer

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread

Thread overview: 60+ messages
2016-02-08 11:40 [PATCH 0/8] KVM/ARM: Guest Entry/Exit optimizations Marc Zyngier
2016-02-08 11:40 ` Marc Zyngier
2016-02-08 11:40 ` [PATCH 1/8] arm64: KVM: Switch the sys_reg search to be a binary search Marc Zyngier
2016-02-08 11:40   ` Marc Zyngier
2016-02-10 12:44   ` Christoffer Dall
2016-02-10 12:44     ` Christoffer Dall
2016-02-10 13:49   ` Alex Bennée
2016-02-10 13:49     ` Alex Bennée
2016-02-10 14:00     ` Marc Zyngier
2016-02-10 14:00       ` Marc Zyngier
2016-02-08 11:40 ` [PATCH 2/8] ARM: KVM: Properly sort the invariant table Marc Zyngier
2016-02-08 11:40   ` Marc Zyngier
2016-02-10 12:44   ` Christoffer Dall
2016-02-10 12:44     ` Christoffer Dall
2016-02-08 11:40 ` [PATCH 3/8] ARM: KVM: Enforce sorting of all CP tables Marc Zyngier
2016-02-08 11:40   ` Marc Zyngier
2016-02-10 12:44   ` Christoffer Dall
2016-02-10 12:44     ` Christoffer Dall
2016-02-08 11:40 ` [PATCH 4/8] ARM: KVM: Rename struct coproc_reg::is_64 to is_64bit Marc Zyngier
2016-02-08 11:40   ` Marc Zyngier
2016-02-10 12:44   ` Christoffer Dall
2016-02-10 12:44     ` Christoffer Dall
2016-02-08 11:40 ` [PATCH 5/8] ARM: KVM: Switch the CP reg search to be a binary search Marc Zyngier
2016-02-08 11:40   ` Marc Zyngier
2016-02-10 12:44   ` Christoffer Dall
2016-02-10 12:44     ` Christoffer Dall
2016-02-08 11:40 ` [PATCH 6/8] KVM: arm/arm64: timer: Add active state caching Marc Zyngier
2016-02-08 11:40   ` Marc Zyngier
2016-02-10 12:44   ` Christoffer Dall
2016-02-10 12:44     ` Christoffer Dall
2016-02-08 11:40 ` [PATCH 7/8] KVM: arm/arm64: Avoid accessing GICH registers Marc Zyngier
2016-02-08 11:40   ` Marc Zyngier
2016-02-10 12:45   ` Christoffer Dall
2016-02-10 12:45     ` Christoffer Dall
2016-02-10 13:34     ` Marc Zyngier
2016-02-10 13:34       ` Marc Zyngier
2016-02-10 17:30       ` Christoffer Dall
2016-02-10 17:30         ` Christoffer Dall
2016-02-10 17:43         ` Marc Zyngier
2016-02-10 17:43           ` Marc Zyngier
2016-02-08 11:40 ` [PATCH 8/8] KVM: arm64: Avoid accessing ICH registers Marc Zyngier
2016-02-08 11:40   ` Marc Zyngier
2016-02-10 12:45   ` Christoffer Dall
2016-02-10 12:45     ` Christoffer Dall
2016-02-10 16:47     ` Marc Zyngier
2016-02-10 16:47       ` Marc Zyngier
2016-02-09 20:59 ` [PATCH 0/8] KVM/ARM: Guest Entry/Exit optimizations Christoffer Dall
2016-02-09 20:59   ` Christoffer Dall
2016-02-10  8:34   ` Marc Zyngier
2016-02-10  8:34     ` Marc Zyngier
2016-02-10 12:02     ` Andrew Jones
2016-02-10 12:02       ` Andrew Jones
2016-02-10 12:24       ` Marc Zyngier
2016-02-10 12:24         ` Marc Zyngier
2016-02-10 20:40 ` Christoffer Dall
2016-02-10 20:40   ` Christoffer Dall
2016-02-16 20:05   ` Marc Zyngier
2016-02-16 20:05     ` Marc Zyngier
2016-02-17  9:15     ` Christoffer Dall
2016-02-17  9:15       ` Christoffer Dall
