KVM Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v5 00/20] KVM RISC-V Support
@ 2019-08-22  8:42 Anup Patel
  2019-08-22  8:42 ` [PATCH v5 01/20] KVM: RISC-V: Add KVM_REG_RISCV for ONE_REG interface Anup Patel
                   ` (20 more replies)
  0 siblings, 21 replies; 61+ messages in thread
From: Anup Patel @ 2019-08-22  8:42 UTC (permalink / raw)
  To: Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel, Anup Patel

This series adds initial KVM RISC-V support. Currently, we are able to boot
RISC-V 64bit Linux Guests with multiple VCPUs.

Few key aspects of KVM RISC-V added by this series are:
1. Minimal possible KVM world-switch which touches only GPRs and few CSRs.
2. Full Guest/VM switch is done via vcpu_get/vcpu_put infrastructure.
3. KVM ONE_REG interface for VCPU register access from user-space.
4. PLIC emulation is done in user-space. In-kernel PLIC emulation, will
   be added in future.
5. Timer and IPI emuation is done in-kernel.
6. MMU notifiers supported.
7. FP lazy save/restore supported.
8. SBI v0.1 emulation for KVM Guest available.

Here's a brief TODO list which we will work upon after this series:
1. Handle trap from unpriv access in reading Guest instruction
2. Handle trap from unpriv access in SBI v0.1 emulation
3. Implement recursive stage2 page table programing
4. SBI v0.2 emulation in-kernel
5. SBI v0.2 hart hotplug emulation in-kernel
6. In-kernel PLIC emulation
7. ..... and more .....

This series can be found in riscv_kvm_v5 branch at:
https//github.com/avpatel/linux.git

Our work-in-progress KVMTOOL RISC-V port can be found in riscv_v1 branch at:
https//github.com/avpatel/kvmtool.git

We need OpenSBI with RISC-V hypervisor extension support which can be
found in hyp_ext_changes_v1 branch at:
https://github.com/riscv/opensbi.git

The QEMU RISC-V hypervisor emulation is done by Alistair and is available
in riscv-hyp-work.next branch at:
https://github.com/alistair23/qemu.git

To play around with KVM RISC-V, here are few reference commands:
1) To cross-compile KVMTOOL:
   $ make lkvm-static
2) To launch RISC-V Host Linux:
   $ qemu-system-riscv64 -monitor null -cpu rv64,h=true -M virt \
   -m 512M -display none -serial mon:stdio \
   -kernel opensbi/build/platform/qemu/virt/firmware/fw_jump.elf \
   -device loader,file=build-riscv64/arch/riscv/boot/Image,addr=0x80200000 \
   -initrd ./rootfs_kvm_riscv64.img \
   -append "root=/dev/ram rw console=ttyS0 earlycon=sbi"
3) To launch RISC-V Guest Linux with 9P rootfs:
   $ ./apps/lkvm-static run -m 128 -c2 --console serial \
   -p "console=ttyS0 earlycon=uart8250,mmio,0x3f8" -k ./apps/Image --debug
4) To launch RISC-V Guest Linux with initrd:
   $ ./apps/lkvm-static run -m 128 -c2 --console serial \
   -p "console=ttyS0 earlycon=uart8250,mmio,0x3f8" -k ./apps/Image \
   -i ./apps/rootfs.img --debug

Changes since v4:
- Rebased patches on Linux-5.3-rc5
- Added Paolo's Acked-by and Reviewed-by
- Updated mailing list in MAINTAINERS entry

Changes since v3:
- Moved patch for ISA bitmap from KVM prep series to this series
- Make vsip_shadow as run-time percpu variable instead of compile-time
- Flush Guest TLBs on all Host CPUs whenever we run-out of VMIDs

Changes since v2:
- Removed references of KVM_REQ_IRQ_PENDING from all patches
- Use kvm->srcu within in-kernel KVM run loop
- Added percpu vsip_shadow to track last value programmed in VSIP CSR
- Added comments about irqs_pending and irqs_pending_mask
- Used kvm_arch_vcpu_runnable() in-place-of kvm_riscv_vcpu_has_interrupt()
  in system_opcode_insn()
- Removed unwanted smp_wmb() in kvm_riscv_stage2_vmid_update()
- Use kvm_flush_remote_tlbs() in kvm_riscv_stage2_vmid_update()
- Use READ_ONCE() in kvm_riscv_stage2_update_hgatp() for vmid

Changes since v1:
- Fixed compile errors in building KVM RISC-V as module
- Removed unused kvm_riscv_halt_guest() and kvm_riscv_resume_guest()
- Set KVM_CAP_SYNC_MMU capability only after MMU notifiers are implemented
- Made vmid_version as unsigned long instead of atomic
- Renamed KVM_REQ_UPDATE_PGTBL to KVM_REQ_UPDATE_HGATP
- Renamed kvm_riscv_stage2_update_pgtbl() to kvm_riscv_stage2_update_hgatp()
- Configure HIDELEG and HEDELEG in kvm_arch_hardware_enable()
- Updated ONE_REG interface for CSR access to user-space
- Removed irqs_pending_lock and use atomic bitops instead
- Added separate patch for FP ONE_REG interface
- Added separate patch for updating MAINTAINERS file

Anup Patel (15):
  KVM: RISC-V: Add KVM_REG_RISCV for ONE_REG interface
  RISC-V: Add bitmap reprensenting ISA features common across CPUs
  RISC-V: Add hypervisor extension related CSR defines
  RISC-V: Add initial skeletal KVM support
  RISC-V: KVM: Implement VCPU create, init and destroy functions
  RISC-V: KVM: Implement VCPU interrupts and requests handling
  RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls
  RISC-V: KVM: Implement VCPU world-switch
  RISC-V: KVM: Handle MMIO exits for VCPU
  RISC-V: KVM: Handle WFI exits for VCPU
  RISC-V: KVM: Implement VMID allocator
  RISC-V: KVM: Implement stage2 page table programming
  RISC-V: KVM: Implement MMU notifiers
  RISC-V: Enable VIRTIO drivers in RV64 and RV32 defconfig
  RISC-V: KVM: Add MAINTAINERS entry

Atish Patra (5):
  RISC-V: Export few kernel symbols
  RISC-V: KVM: Add timer functionality
  RISC-V: KVM: FP lazy save/restore
  RISC-V: KVM: Implement ONE REG interface for FP registers
  RISC-V: KVM: Add SBI v0.1 support

 MAINTAINERS                             |  10 +
 arch/riscv/Kconfig                      |   2 +
 arch/riscv/Makefile                     |   2 +
 arch/riscv/configs/defconfig            |  11 +
 arch/riscv/configs/rv32_defconfig       |  11 +
 arch/riscv/include/asm/csr.h            |  58 ++
 arch/riscv/include/asm/hwcap.h          |  26 +
 arch/riscv/include/asm/kvm_host.h       | 246 ++++++
 arch/riscv/include/asm/kvm_vcpu_timer.h |  32 +
 arch/riscv/include/asm/pgtable-bits.h   |   1 +
 arch/riscv/include/uapi/asm/kvm.h       |  98 +++
 arch/riscv/kernel/asm-offsets.c         | 148 ++++
 arch/riscv/kernel/cpufeature.c          |  79 +-
 arch/riscv/kernel/smp.c                 |   2 +-
 arch/riscv/kernel/time.c                |   1 +
 arch/riscv/kvm/Kconfig                  |  34 +
 arch/riscv/kvm/Makefile                 |  14 +
 arch/riscv/kvm/main.c                   |  92 +++
 arch/riscv/kvm/mmu.c                    | 905 ++++++++++++++++++++++
 arch/riscv/kvm/tlb.S                    |  43 ++
 arch/riscv/kvm/vcpu.c                   | 989 ++++++++++++++++++++++++
 arch/riscv/kvm/vcpu_exit.c              | 556 +++++++++++++
 arch/riscv/kvm/vcpu_sbi.c               | 119 +++
 arch/riscv/kvm/vcpu_switch.S            | 368 +++++++++
 arch/riscv/kvm/vcpu_timer.c             | 106 +++
 arch/riscv/kvm/vm.c                     |  86 +++
 arch/riscv/kvm/vmid.c                   | 123 +++
 drivers/clocksource/timer-riscv.c       |   8 +
 include/clocksource/timer-riscv.h       |  16 +
 include/uapi/linux/kvm.h                |   1 +
 30 files changed, 4183 insertions(+), 4 deletions(-)
 create mode 100644 arch/riscv/include/asm/kvm_host.h
 create mode 100644 arch/riscv/include/asm/kvm_vcpu_timer.h
 create mode 100644 arch/riscv/include/uapi/asm/kvm.h
 create mode 100644 arch/riscv/kvm/Kconfig
 create mode 100644 arch/riscv/kvm/Makefile
 create mode 100644 arch/riscv/kvm/main.c
 create mode 100644 arch/riscv/kvm/mmu.c
 create mode 100644 arch/riscv/kvm/tlb.S
 create mode 100644 arch/riscv/kvm/vcpu.c
 create mode 100644 arch/riscv/kvm/vcpu_exit.c
 create mode 100644 arch/riscv/kvm/vcpu_sbi.c
 create mode 100644 arch/riscv/kvm/vcpu_switch.S
 create mode 100644 arch/riscv/kvm/vcpu_timer.c
 create mode 100644 arch/riscv/kvm/vm.c
 create mode 100644 arch/riscv/kvm/vmid.c
 create mode 100644 include/clocksource/timer-riscv.h

--
2.17.1

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH v5 01/20] KVM: RISC-V: Add KVM_REG_RISCV for ONE_REG interface
  2019-08-22  8:42 [PATCH v5 00/20] KVM RISC-V Support Anup Patel
@ 2019-08-22  8:42 ` Anup Patel
  2019-08-22  8:42 ` [PATCH v5 02/20] RISC-V: Add bitmap reprensenting ISA features common across CPUs Anup Patel
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 61+ messages in thread
From: Anup Patel @ 2019-08-22  8:42 UTC (permalink / raw)
  To: Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel, Anup Patel

We will be using ONE_REG interface accessing VCPU registers from
user-space hence we add KVM_REG_RISCV for RISC-V VCPU registers.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/uapi/linux/kvm.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 5e3f12d5359e..fcaea3c2fc7e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1142,6 +1142,7 @@ struct kvm_dirty_tlb {
 #define KVM_REG_S390		0x5000000000000000ULL
 #define KVM_REG_ARM64		0x6000000000000000ULL
 #define KVM_REG_MIPS		0x7000000000000000ULL
+#define KVM_REG_RISCV		0x8000000000000000ULL
 
 #define KVM_REG_SIZE_SHIFT	52
 #define KVM_REG_SIZE_MASK	0x00f0000000000000ULL
-- 
2.17.1


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH v5 02/20] RISC-V: Add bitmap reprensenting ISA features common across CPUs
  2019-08-22  8:42 [PATCH v5 00/20] KVM RISC-V Support Anup Patel
  2019-08-22  8:42 ` [PATCH v5 01/20] KVM: RISC-V: Add KVM_REG_RISCV for ONE_REG interface Anup Patel
@ 2019-08-22  8:42 ` Anup Patel
  2019-08-22  8:43 ` [PATCH v5 03/20] RISC-V: Export few kernel symbols Anup Patel
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 61+ messages in thread
From: Anup Patel @ 2019-08-22  8:42 UTC (permalink / raw)
  To: Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel, Anup Patel

This patch adds riscv_isa bitmap which represents Host ISA features
common across all Host CPUs. The riscv_isa is not same as elf_hwcap
because elf_hwcap will only have ISA features relevant for user-space
apps whereas riscv_isa will have ISA features relevant to both kernel
and user-space apps.

One of the use-case for riscv_isa bitmap is in KVM hypervisor where
we will use it to do following operations:

1. Check whether hypervisor extension is available
2. Find ISA features that need to be virtualized (e.g. floating
   point support, vector extension, etc.)

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
---
 arch/riscv/include/asm/hwcap.h | 26 +++++++++++
 arch/riscv/kernel/cpufeature.c | 79 ++++++++++++++++++++++++++++++++--
 2 files changed, 102 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index 7ecb7c6a57b1..9b657375aa51 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -8,6 +8,7 @@
 #ifndef __ASM_HWCAP_H
 #define __ASM_HWCAP_H
 
+#include <linux/bits.h>
 #include <uapi/asm/hwcap.h>
 
 #ifndef __ASSEMBLY__
@@ -22,5 +23,30 @@ enum {
 };
 
 extern unsigned long elf_hwcap;
+
+#define RISCV_ISA_EXT_a		('a' - 'a')
+#define RISCV_ISA_EXT_c		('c' - 'a')
+#define RISCV_ISA_EXT_d		('d' - 'a')
+#define RISCV_ISA_EXT_f		('f' - 'a')
+#define RISCV_ISA_EXT_h		('h' - 'a')
+#define RISCV_ISA_EXT_i		('i' - 'a')
+#define RISCV_ISA_EXT_m		('m' - 'a')
+#define RISCV_ISA_EXT_s		('s' - 'a')
+#define RISCV_ISA_EXT_u		('u' - 'a')
+#define RISCV_ISA_EXT_zicsr	(('z' - 'a') + 1)
+#define RISCV_ISA_EXT_zifencei	(('z' - 'a') + 2)
+#define RISCV_ISA_EXT_zam	(('z' - 'a') + 3)
+#define RISCV_ISA_EXT_ztso	(('z' - 'a') + 4)
+
+#define RISCV_ISA_EXT_MAX	256
+
+unsigned long riscv_isa_extension_base(const unsigned long *isa_bitmap);
+
+#define riscv_isa_extension_mask(ext) BIT_MASK(RISCV_ISA_EXT_##ext)
+
+bool __riscv_isa_extension_available(const unsigned long *isa_bitmap, int bit);
+#define riscv_isa_extension_available(isa_bitmap, ext)	\
+	__riscv_isa_extension_available(isa_bitmap, RISCV_ISA_EXT_##ext)
+
 #endif
 #endif
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index b1ade9a49347..4ce71ce5e290 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -6,21 +6,64 @@
  * Copyright (C) 2017 SiFive
  */
 
+#include <linux/bitmap.h>
 #include <linux/of.h>
 #include <asm/processor.h>
 #include <asm/hwcap.h>
 #include <asm/smp.h>
 
 unsigned long elf_hwcap __read_mostly;
+
+/* Host ISA bitmap */
+static DECLARE_BITMAP(riscv_isa, RISCV_ISA_EXT_MAX) __read_mostly;
+
 #ifdef CONFIG_FPU
 bool has_fpu __read_mostly;
 #endif
 
+/**
+ * riscv_isa_extension_base - Get base extension word
+ *
+ * @isa_bitmap ISA bitmap to use
+ * @returns base extension word as unsigned long value
+ *
+ * NOTE: If isa_bitmap is NULL then Host ISA bitmap will be used.
+ */
+unsigned long riscv_isa_extension_base(const unsigned long *isa_bitmap)
+{
+	if (!isa_bitmap)
+		return riscv_isa[0];
+	return isa_bitmap[0];
+}
+EXPORT_SYMBOL_GPL(riscv_isa_extension_base);
+
+/**
+ * __riscv_isa_extension_available - Check whether given extension
+ * is available or not
+ *
+ * @isa_bitmap ISA bitmap to use
+ * @bit bit position of the desired extension
+ * @returns true or false
+ *
+ * NOTE: If isa_bitmap is NULL then Host ISA bitmap will be used.
+ */
+bool __riscv_isa_extension_available(const unsigned long *isa_bitmap, int bit)
+{
+	const unsigned long *bmap = (isa_bitmap) ? isa_bitmap : riscv_isa;
+
+	if (bit >= RISCV_ISA_EXT_MAX)
+		return false;
+
+	return test_bit(bit, bmap) ? true : false;
+}
+EXPORT_SYMBOL_GPL(__riscv_isa_extension_available);
+
 void riscv_fill_hwcap(void)
 {
 	struct device_node *node;
 	const char *isa;
-	size_t i;
+	char print_str[BITS_PER_LONG+1];
+	size_t i, j, isa_len;
 	static unsigned long isa2hwcap[256] = {0};
 
 	isa2hwcap['i'] = isa2hwcap['I'] = COMPAT_HWCAP_ISA_I;
@@ -32,8 +75,11 @@ void riscv_fill_hwcap(void)
 
 	elf_hwcap = 0;
 
+	bitmap_zero(riscv_isa, RISCV_ISA_EXT_MAX);
+
 	for_each_of_cpu_node(node) {
 		unsigned long this_hwcap = 0;
+		unsigned long this_isa = 0;
 
 		if (riscv_of_processor_hartid(node) < 0)
 			continue;
@@ -43,8 +89,20 @@ void riscv_fill_hwcap(void)
 			continue;
 		}
 
-		for (i = 0; i < strlen(isa); ++i)
+		i = 0;
+		isa_len = strlen(isa);
+#if defined(CONFIG_32BIT)
+		if (!strncmp(isa, "rv32", 4))
+			i += 4;
+#elif defined(CONFIG_64BIT)
+		if (!strncmp(isa, "rv64", 4))
+			i += 4;
+#endif
+		for (; i < isa_len; ++i) {
 			this_hwcap |= isa2hwcap[(unsigned char)(isa[i])];
+			if ('a' <= isa[i] && isa[i] <= 'z')
+				this_isa |= (1UL << (isa[i] - 'a'));
+		}
 
 		/*
 		 * All "okay" hart should have same isa. Set HWCAP based on
@@ -55,6 +113,11 @@ void riscv_fill_hwcap(void)
 			elf_hwcap &= this_hwcap;
 		else
 			elf_hwcap = this_hwcap;
+
+		if (riscv_isa[0])
+			riscv_isa[0] &= this_isa;
+		else
+			riscv_isa[0] = this_isa;
 	}
 
 	/* We don't support systems with F but without D, so mask those out
@@ -64,7 +127,17 @@ void riscv_fill_hwcap(void)
 		elf_hwcap &= ~COMPAT_HWCAP_ISA_F;
 	}
 
-	pr_info("elf_hwcap is 0x%lx\n", elf_hwcap);
+	memset(print_str, 0, sizeof(print_str));
+	for (i = 0, j = 0; i < BITS_PER_LONG; i++)
+		if (riscv_isa[0] & BIT_MASK(i))
+			print_str[j++] = (char)('a' + i);
+	pr_info("riscv: ISA extensions %s\n", print_str);
+
+	memset(print_str, 0, sizeof(print_str));
+	for (i = 0, j = 0; i < BITS_PER_LONG; i++)
+		if (elf_hwcap & BIT_MASK(i))
+			print_str[j++] = (char)('a' + i);
+	pr_info("riscv: ELF capabilities %s\n", print_str);
 
 #ifdef CONFIG_FPU
 	if (elf_hwcap & (COMPAT_HWCAP_ISA_F | COMPAT_HWCAP_ISA_D))
-- 
2.17.1


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH v5 03/20] RISC-V: Export few kernel symbols
  2019-08-22  8:42 [PATCH v5 00/20] KVM RISC-V Support Anup Patel
  2019-08-22  8:42 ` [PATCH v5 01/20] KVM: RISC-V: Add KVM_REG_RISCV for ONE_REG interface Anup Patel
  2019-08-22  8:42 ` [PATCH v5 02/20] RISC-V: Add bitmap reprensenting ISA features common across CPUs Anup Patel
@ 2019-08-22  8:43 ` Anup Patel
  2019-08-22  8:43 ` [PATCH v5 04/20] RISC-V: Add hypervisor extension related CSR defines Anup Patel
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 61+ messages in thread
From: Anup Patel @ 2019-08-22  8:43 UTC (permalink / raw)
  To: Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel, Anup Patel

From: Atish Patra <atish.patra@wdc.com>

Export few symbols used by kvm module. Without this, kvm cannot
be compiled as a module.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/riscv/kernel/smp.c  | 2 +-
 arch/riscv/kernel/time.c | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
index 5a9834503a2f..402979f575de 100644
--- a/arch/riscv/kernel/smp.c
+++ b/arch/riscv/kernel/smp.c
@@ -193,4 +193,4 @@ void smp_send_reschedule(int cpu)
 {
 	send_ipi_message(cpumask_of(cpu), IPI_RESCHEDULE);
 }
-
+EXPORT_SYMBOL_GPL(smp_send_reschedule);
diff --git a/arch/riscv/kernel/time.c b/arch/riscv/kernel/time.c
index 541a2b885814..9dd1f2e64db1 100644
--- a/arch/riscv/kernel/time.c
+++ b/arch/riscv/kernel/time.c
@@ -9,6 +9,7 @@
 #include <asm/sbi.h>
 
 unsigned long riscv_timebase;
+EXPORT_SYMBOL_GPL(riscv_timebase);
 
 void __init time_init(void)
 {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH v5 04/20] RISC-V: Add hypervisor extension related CSR defines
  2019-08-22  8:42 [PATCH v5 00/20] KVM RISC-V Support Anup Patel
                   ` (2 preceding siblings ...)
  2019-08-22  8:43 ` [PATCH v5 03/20] RISC-V: Export few kernel symbols Anup Patel
@ 2019-08-22  8:43 ` Anup Patel
  2019-08-22  8:43 ` [PATCH v5 05/20] RISC-V: Add initial skeletal KVM support Anup Patel
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 61+ messages in thread
From: Anup Patel @ 2019-08-22  8:43 UTC (permalink / raw)
  To: Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel, Anup Patel

This patch extends asm/csr.h by adding RISC-V hypervisor extension
related defines.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/riscv/include/asm/csr.h | 58 ++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index a18923fa23c8..059c5cb22aaf 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -27,6 +27,8 @@
 #define SR_XS_CLEAN	_AC(0x00010000, UL)
 #define SR_XS_DIRTY	_AC(0x00018000, UL)
 
+#define SR_MXR		_AC(0x00080000, UL)
+
 #ifndef CONFIG_64BIT
 #define SR_SD		_AC(0x80000000, UL) /* FS/XS dirty */
 #else
@@ -59,10 +61,13 @@
 
 #define EXC_INST_MISALIGNED	0
 #define EXC_INST_ACCESS		1
+#define EXC_INST_ILLEGAL	2
 #define EXC_BREAKPOINT		3
 #define EXC_LOAD_ACCESS		5
 #define EXC_STORE_ACCESS	7
 #define EXC_SYSCALL		8
+#define EXC_HYPERVISOR_SYSCALL	9
+#define EXC_SUPERVISOR_SYSCALL	10
 #define EXC_INST_PAGE_FAULT	12
 #define EXC_LOAD_PAGE_FAULT	13
 #define EXC_STORE_PAGE_FAULT	15
@@ -72,6 +77,43 @@
 #define SIE_STIE		(_AC(0x1, UL) << IRQ_S_TIMER)
 #define SIE_SEIE		(_AC(0x1, UL) << IRQ_S_EXT)
 
+/* HSTATUS flags */
+#define HSTATUS_VTSR		_AC(0x00400000, UL)
+#define HSTATUS_VTVM		_AC(0x00100000, UL)
+#define HSTATUS_SP2V		_AC(0x00000200, UL)
+#define HSTATUS_SP2P		_AC(0x00000100, UL)
+#define HSTATUS_SPV		_AC(0x00000080, UL)
+#define HSTATUS_STL		_AC(0x00000040, UL)
+#define HSTATUS_SPRV		_AC(0x00000001, UL)
+
+/* HGATP flags */
+#define HGATP_MODE_OFF		_AC(0, UL)
+#define HGATP_MODE_SV32X4	_AC(1, UL)
+#define HGATP_MODE_SV39X4	_AC(8, UL)
+#define HGATP_MODE_SV48X4	_AC(9, UL)
+
+#define HGATP32_MODE_SHIFT	31
+#define HGATP32_VMID_SHIFT	22
+#define HGATP32_VMID_MASK	_AC(0x1FC00000, UL)
+#define HGATP32_PPN		_AC(0x003FFFFF, UL)
+
+#define HGATP64_MODE_SHIFT	60
+#define HGATP64_VMID_SHIFT	44
+#define HGATP64_VMID_MASK	_AC(0x03FFF00000000000, UL)
+#define HGATP64_PPN		_AC(0x00000FFFFFFFFFFF, UL)
+
+#ifdef CONFIG_64BIT
+#define HGATP_PPN		HGATP64_PPN
+#define HGATP_VMID_SHIFT	HGATP64_VMID_SHIFT
+#define HGATP_VMID_MASK		HGATP64_VMID_MASK
+#define HGATP_MODE		(HGATP_MODE_SV39X4 << HGATP64_MODE_SHIFT)
+#else
+#define HGATP_PPN		HGATP32_PPN
+#define HGATP_VMID_SHIFT	HGATP32_VMID_SHIFT
+#define HGATP_VMID_MASK		HGATP32_VMID_MASK
+#define HGATP_MODE		(HGATP_MODE_SV32X4 << HGATP32_MODE_SHIFT)
+#endif
+
 #define CSR_CYCLE		0xc00
 #define CSR_TIME		0xc01
 #define CSR_INSTRET		0xc02
@@ -85,6 +127,22 @@
 #define CSR_STVAL		0x143
 #define CSR_SIP			0x144
 #define CSR_SATP		0x180
+
+#define CSR_VSSTATUS		0x200
+#define CSR_VSIE		0x204
+#define CSR_VSTVEC		0x205
+#define CSR_VSSCRATCH		0x240
+#define CSR_VSEPC		0x241
+#define CSR_VSCAUSE		0x242
+#define CSR_VSTVAL		0x243
+#define CSR_VSIP		0x244
+#define CSR_VSATP		0x280
+
+#define CSR_HSTATUS		0x600
+#define CSR_HEDELEG		0x602
+#define CSR_HIDELEG		0x603
+#define CSR_HGATP		0x680
+
 #define CSR_CYCLEH		0xc80
 #define CSR_TIMEH		0xc81
 #define CSR_INSTRETH		0xc82
-- 
2.17.1


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH v5 05/20] RISC-V: Add initial skeletal KVM support
  2019-08-22  8:42 [PATCH v5 00/20] KVM RISC-V Support Anup Patel
                   ` (3 preceding siblings ...)
  2019-08-22  8:43 ` [PATCH v5 04/20] RISC-V: Add hypervisor extension related CSR defines Anup Patel
@ 2019-08-22  8:43 ` Anup Patel
  2019-08-22  8:43 ` [PATCH v5 06/20] RISC-V: KVM: Implement VCPU create, init and destroy functions Anup Patel
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 61+ messages in thread
From: Anup Patel @ 2019-08-22  8:43 UTC (permalink / raw)
  To: Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel, Anup Patel

This patch adds initial skeletal KVM RISC-V support which has:
1. A simple implementation of arch specific VM functions
   except kvm_vm_ioctl_get_dirty_log() which will implemeted
   in-future as part of stage2 page loging.
2. Stubs of required arch specific VCPU functions except
   kvm_arch_vcpu_ioctl_run() which is semi-complete and
   extended by subsequent patches.
3. Stubs for required arch specific stage2 MMU functions.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/riscv/Kconfig                |   2 +
 arch/riscv/Makefile               |   2 +
 arch/riscv/include/asm/kvm_host.h |  81 ++++++++
 arch/riscv/include/uapi/asm/kvm.h |  47 +++++
 arch/riscv/kvm/Kconfig            |  33 ++++
 arch/riscv/kvm/Makefile           |  13 ++
 arch/riscv/kvm/main.c             |  80 ++++++++
 arch/riscv/kvm/mmu.c              |  83 ++++++++
 arch/riscv/kvm/vcpu.c             | 312 ++++++++++++++++++++++++++++++
 arch/riscv/kvm/vcpu_exit.c        |  35 ++++
 arch/riscv/kvm/vm.c               |  79 ++++++++
 11 files changed, 767 insertions(+)
 create mode 100644 arch/riscv/include/asm/kvm_host.h
 create mode 100644 arch/riscv/include/uapi/asm/kvm.h
 create mode 100644 arch/riscv/kvm/Kconfig
 create mode 100644 arch/riscv/kvm/Makefile
 create mode 100644 arch/riscv/kvm/main.c
 create mode 100644 arch/riscv/kvm/mmu.c
 create mode 100644 arch/riscv/kvm/vcpu.c
 create mode 100644 arch/riscv/kvm/vcpu_exit.c
 create mode 100644 arch/riscv/kvm/vm.c

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 59a4727ecd6c..906104b8dc74 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -289,3 +289,5 @@ menu "Power management options"
 source "kernel/power/Kconfig"
 
 endmenu
+
+source "arch/riscv/kvm/Kconfig"
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 7a117be8297c..9f4f418978b1 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -74,6 +74,8 @@ head-y := arch/riscv/kernel/head.o
 
 core-y += arch/riscv/kernel/ arch/riscv/mm/ arch/riscv/net/
 
+core-$(CONFIG_KVM) += arch/riscv/kvm/
+
 libs-y += arch/riscv/lib/
 
 PHONY += vdso_install
diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
new file mode 100644
index 000000000000..9459709656be
--- /dev/null
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -0,0 +1,81 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ *     Anup Patel <anup.patel@wdc.com>
+ */
+
+#ifndef __RISCV_KVM_HOST_H__
+#define __RISCV_KVM_HOST_H__
+
+#include <linux/types.h>
+#include <linux/kvm.h>
+#include <linux/kvm_types.h>
+
+#ifdef CONFIG_64BIT
+#define KVM_MAX_VCPUS			(1U << 16)
+#else
+#define KVM_MAX_VCPUS			(1U << 9)
+#endif
+
+#define KVM_USER_MEM_SLOTS		512
+#define KVM_HALT_POLL_NS_DEFAULT	500000
+
+#define KVM_VCPU_MAX_FEATURES		0
+
+#define KVM_REQ_SLEEP \
+	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_VCPU_RESET		KVM_ARCH_REQ(1)
+
+struct kvm_vm_stat {
+	ulong remote_tlb_flush;
+};
+
+struct kvm_vcpu_stat {
+	u64 halt_successful_poll;
+	u64 halt_attempted_poll;
+	u64 halt_poll_invalid;
+	u64 halt_wakeup;
+	u64 ecall_exit_stat;
+	u64 wfi_exit_stat;
+	u64 mmio_exit_user;
+	u64 mmio_exit_kernel;
+	u64 exits;
+};
+
+struct kvm_arch_memory_slot {
+};
+
+struct kvm_arch {
+	/* stage2 page table */
+	pgd_t *pgd;
+	phys_addr_t pgd_phys;
+};
+
+struct kvm_vcpu_arch {
+	/* Don't run the VCPU (blocked) */
+	bool pause;
+
+	/* SRCU lock index for in-kernel run loop */
+	int srcu_idx;
+};
+
+static inline void kvm_arch_hardware_unsetup(void) {}
+static inline void kvm_arch_sync_events(struct kvm *kvm) {}
+static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
+
+void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu);
+int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm);
+void kvm_riscv_stage2_free_pgd(struct kvm *kvm);
+void kvm_riscv_stage2_update_hgatp(struct kvm_vcpu *vcpu);
+
+int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
+			unsigned long scause, unsigned long stval);
+
+static inline void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch) {}
+
+#endif /* __RISCV_KVM_HOST_H__ */
diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
new file mode 100644
index 000000000000..d15875818b6e
--- /dev/null
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ *     Anup Patel <anup.patel@wdc.com>
+ */
+
+#ifndef __LINUX_KVM_RISCV_H
+#define __LINUX_KVM_RISCV_H
+
+#ifndef __ASSEMBLY__
+
+#include <linux/types.h>
+#include <asm/ptrace.h>
+
+#define __KVM_HAVE_READONLY_MEM
+
+#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+
+/* for KVM_GET_REGS and KVM_SET_REGS */
+struct kvm_regs {
+};
+
+/* for KVM_GET_FPU and KVM_SET_FPU */
+struct kvm_fpu {
+};
+
+/* KVM Debug exit structure */
+struct kvm_debug_exit_arch {
+};
+
+/* for KVM_SET_GUEST_DEBUG */
+struct kvm_guest_debug_arch {
+};
+
+/* definition of registers in kvm_run */
+struct kvm_sync_regs {
+};
+
+/* dummy definition */
+struct kvm_sregs {
+};
+
+#endif
+
+#endif /* __LINUX_KVM_RISCV_H */
diff --git a/arch/riscv/kvm/Kconfig b/arch/riscv/kvm/Kconfig
new file mode 100644
index 000000000000..35fd30d0e432
--- /dev/null
+++ b/arch/riscv/kvm/Kconfig
@@ -0,0 +1,33 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# KVM configuration
+#
+
+source "virt/kvm/Kconfig"
+
+menuconfig VIRTUALIZATION
+	bool "Virtualization"
+	help
+	  Say Y here to get to see options for using your Linux host to run
+	  other operating systems inside virtual machines (guests).
+	  This option alone does not add any kernel code.
+
+	  If you say N, all options in this submenu will be skipped and
+	  disabled.
+
+if VIRTUALIZATION
+
+config KVM
+	tristate "Kernel-based Virtual Machine (KVM) support"
+	depends on OF
+	select PREEMPT_NOTIFIERS
+	select ANON_INODES
+	select KVM_MMIO
+	select HAVE_KVM_VCPU_ASYNC_IOCTL
+	select SRCU
+	help
+	  Support hosting virtualized guest machines.
+
+	  If unsure, say N.
+
+endif # VIRTUALIZATION
diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
new file mode 100644
index 000000000000..37b5a59d4f4f
--- /dev/null
+++ b/arch/riscv/kvm/Makefile
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0
+# Makefile for RISC-V KVM support
+#
+
+common-objs-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
+
+ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
+
+kvm-objs := $(common-objs-y)
+
+kvm-objs += main.o vm.o mmu.o vcpu.o vcpu_exit.o
+
+obj-$(CONFIG_KVM)	+= kvm.o
diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
new file mode 100644
index 000000000000..e1ffe6d42f39
--- /dev/null
+++ b/arch/riscv/kvm/main.c
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ *     Anup Patel <anup.patel@wdc.com>
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/kvm_host.h>
+#include <asm/csr.h>
+#include <asm/hwcap.h>
+
+long kvm_arch_dev_ioctl(struct file *filp,
+			unsigned int ioctl, unsigned long arg)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_check_processor_compat(void)
+{
+	return 0;
+}
+
+int kvm_arch_hardware_setup(void)
+{
+	return 0;
+}
+
+int kvm_arch_hardware_enable(void)
+{
+	unsigned long hideleg, hedeleg;
+
+	hedeleg = 0;
+	hedeleg |= (1UL << EXC_INST_MISALIGNED);
+	hedeleg |= (1UL << EXC_BREAKPOINT);
+	hedeleg |= (1UL << EXC_SYSCALL);
+	hedeleg |= (1UL << EXC_INST_PAGE_FAULT);
+	hedeleg |= (1UL << EXC_LOAD_PAGE_FAULT);
+	hedeleg |= (1UL << EXC_STORE_PAGE_FAULT);
+	csr_write(CSR_HEDELEG, hedeleg);
+
+	hideleg = 0;
+	hideleg |= SIE_SSIE;
+	hideleg |= SIE_STIE;
+	hideleg |= SIE_SEIE;
+	csr_write(CSR_HIDELEG, hideleg);
+
+	return 0;
+}
+
+void kvm_arch_hardware_disable(void)
+{
+	csr_write(CSR_HEDELEG, 0);
+	csr_write(CSR_HIDELEG, 0);
+}
+
+int kvm_arch_init(void *opaque)
+{
+	if (!riscv_isa_extension_available(NULL, h)) {
+		kvm_info("hypervisor extension not available\n");
+		return -ENODEV;
+	}
+
+	kvm_info("hypervisor extension available\n");
+
+	return 0;
+}
+
+void kvm_arch_exit(void)
+{
+}
+
+static int riscv_kvm_init(void)
+{
+	return kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
+}
+module_init(riscv_kvm_init);
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
new file mode 100644
index 000000000000..04dd089b86ff
--- /dev/null
+++ b/arch/riscv/kvm/mmu.c
@@ -0,0 +1,83 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ *     Anup Patel <anup.patel@wdc.com>
+ */
+
+#include <linux/bitops.h>
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/hugetlb.h>
+#include <linux/module.h>
+#include <linux/uaccess.h>
+#include <linux/vmalloc.h>
+#include <linux/kvm_host.h>
+#include <linux/sched/signal.h>
+#include <asm/page.h>
+#include <asm/pgtable.h>
+
+void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
+			   struct kvm_memory_slot *dont)
+{
+}
+
+int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
+			    unsigned long npages)
+{
+	return 0;
+}
+
+void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
+{
+}
+
+void kvm_arch_flush_shadow_all(struct kvm *kvm)
+{
+	/* TODO: */
+}
+
+void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
+				   struct kvm_memory_slot *slot)
+{
+}
+
+void kvm_arch_commit_memory_region(struct kvm *kvm,
+				const struct kvm_userspace_memory_region *mem,
+				const struct kvm_memory_slot *old,
+				const struct kvm_memory_slot *new,
+				enum kvm_mr_change change)
+{
+	/* TODO: */
+}
+
+int kvm_arch_prepare_memory_region(struct kvm *kvm,
+				struct kvm_memory_slot *memslot,
+				const struct kvm_userspace_memory_region *mem,
+				enum kvm_mr_change change)
+{
+	/* TODO: */
+	return 0;
+}
+
+void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu)
+{
+	/* TODO: */
+}
+
+int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm)
+{
+	/* TODO: */
+	return 0;
+}
+
+void kvm_riscv_stage2_free_pgd(struct kvm *kvm)
+{
+	/* TODO: */
+}
+
+void kvm_riscv_stage2_update_hgatp(struct kvm_vcpu *vcpu)
+{
+	/* TODO: */
+}
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
new file mode 100644
index 000000000000..48536cb0c8e7
--- /dev/null
+++ b/arch/riscv/kvm/vcpu.c
@@ -0,0 +1,312 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ *     Anup Patel <anup.patel@wdc.com>
+ */
+
+#include <linux/bitops.h>
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kdebug.h>
+#include <linux/module.h>
+#include <linux/uaccess.h>
+#include <linux/vmalloc.h>
+#include <linux/sched/signal.h>
+#include <linux/fs.h>
+#include <linux/kvm_host.h>
+#include <asm/csr.h>
+#include <asm/delay.h>
+#include <asm/hwcap.h>
+
+#define VCPU_STAT(x) { #x, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU }
+
+struct kvm_stats_debugfs_item debugfs_entries[] = {
+	VCPU_STAT(ecall_exit_stat),
+	VCPU_STAT(wfi_exit_stat),
+	VCPU_STAT(mmio_exit_user),
+	VCPU_STAT(mmio_exit_kernel),
+	VCPU_STAT(exits),
+	{ NULL }
+};
+
+struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
+{
+	/* TODO: */
+	return NULL;
+}
+
+int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
+{
+}
+
+int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
+{
+	/* TODO: */
+	return 0;
+}
+
+void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
+{
+	/* TODO: */
+}
+
+int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
+{
+	/* TODO: */
+	return 0;
+}
+
+void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
+{
+}
+
+void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
+{
+}
+
+int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
+{
+	/* TODO: */
+	return 0;
+}
+
+int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
+{
+	/* TODO: */
+	return 0;
+}
+
+bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
+{
+	/* TODO: */
+	return false;
+}
+
+bool kvm_arch_has_vcpu_debugfs(void)
+{
+	return false;
+}
+
+int kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
+{
+	return VM_FAULT_SIGBUS;
+}
+
+long kvm_arch_vcpu_async_ioctl(struct file *filp,
+			       unsigned int ioctl, unsigned long arg)
+{
+	/* TODO; */
+	return -ENOIOCTLCMD;
+}
+
+long kvm_arch_vcpu_ioctl(struct file *filp,
+			 unsigned int ioctl, unsigned long arg)
+{
+	/* TODO: */
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
+				  struct kvm_sregs *sregs)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
+				  struct kvm_sregs *sregs)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
+				  struct kvm_translation *tr)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
+				    struct kvm_mp_state *mp_state)
+{
+	/* TODO: */
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
+				    struct kvm_mp_state *mp_state)
+{
+	/* TODO: */
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
+					struct kvm_guest_debug *dbg)
+{
+	/* TODO; To be implemented later. */
+	return -EINVAL;
+}
+
+void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+	/* TODO: */
+
+	kvm_riscv_stage2_update_hgatp(vcpu);
+}
+
+void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
+{
+	/* TODO: */
+}
+
+static void kvm_riscv_check_vcpu_requests(struct kvm_vcpu *vcpu)
+{
+	/* TODO: */
+}
+
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	int ret;
+	unsigned long scause, stval;
+
+	vcpu->arch.srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+
+	/* Process MMIO value returned from user-space */
+	if (run->exit_reason == KVM_EXIT_MMIO) {
+		ret = kvm_riscv_vcpu_mmio_return(vcpu, vcpu->run);
+		if (ret) {
+			srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx);
+			return ret;
+		}
+	}
+
+	if (run->immediate_exit) {
+		srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx);
+		return -EINTR;
+	}
+
+	vcpu_load(vcpu);
+
+	kvm_sigset_activate(vcpu);
+
+	ret = 1;
+	run->exit_reason = KVM_EXIT_UNKNOWN;
+	while (ret > 0) {
+		/* Check conditions before entering the guest */
+		cond_resched();
+
+		kvm_riscv_check_vcpu_requests(vcpu);
+
+		preempt_disable();
+
+		local_irq_disable();
+
+		/*
+		 * Exit if we have a signal pending so that we can deliver
+		 * the signal to user space.
+		 */
+		if (signal_pending(current)) {
+			ret = -EINTR;
+			run->exit_reason = KVM_EXIT_INTR;
+		}
+
+		/*
+		 * Ensure we set mode to IN_GUEST_MODE after we disable
+		 * interrupts and before the final VCPU requests check.
+		 * See the comment in kvm_vcpu_exiting_guest_mode() and
+		 * Documentation/virtual/kvm/vcpu-requests.rst
+		 */
+		vcpu->mode = IN_GUEST_MODE;
+
+		srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx);
+		smp_mb__after_srcu_read_unlock();
+
+		if (ret <= 0 ||
+		    kvm_request_pending(vcpu)) {
+			vcpu->mode = OUTSIDE_GUEST_MODE;
+			local_irq_enable();
+			preempt_enable();
+			vcpu->arch.srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+			continue;
+		}
+
+		guest_enter_irqoff();
+
+		__kvm_riscv_switch_to(&vcpu->arch);
+
+		vcpu->mode = OUTSIDE_GUEST_MODE;
+		vcpu->stat.exits++;
+
+		/* Save SCAUSE and STVAL because we might get an interrupt
+		 * between __kvm_riscv_switch_to() and local_irq_enable()
+		 * which can potentially overwrite SCAUSE and STVAL.
+		 */
+		scause = csr_read(CSR_SCAUSE);
+		stval = csr_read(CSR_STVAL);
+
+		/*
+		 * We may have taken a host interrupt in VS/VU-mode (i.e.
+		 * while executing the guest). This interrupt is still
+		 * pending, as we haven't serviced it yet!
+		 *
+		 * We're now back in HS-mode with interrupts disabled
+		 * so enabling the interrupts now will have the effect
+		 * of taking the interrupt again, in HS-mode this time.
+		 */
+		local_irq_enable();
+
+		/*
+		 * We do local_irq_enable() before calling guest_exit() so
+		 * that if a timer interrupt hits while running the guest
+		 * we account that tick as being spent in the guest. We
+		 * enable preemption after calling guest_exit() so that if
+		 * we get preempted we make sure ticks after that is not
+		 * counted as guest time.
+		 */
+		guest_exit();
+
+		preempt_enable();
+
+		vcpu->arch.srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+
+		ret = kvm_riscv_vcpu_exit(vcpu, run, scause, stval);
+	}
+
+	kvm_sigset_deactivate(vcpu);
+
+	vcpu_put(vcpu);
+
+	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx);
+
+	return ret;
+}
diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
new file mode 100644
index 000000000000..e4d7c8f0807a
--- /dev/null
+++ b/arch/riscv/kvm/vcpu_exit.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ *     Anup Patel <anup.patel@wdc.com>
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+
+/**
+ * kvm_riscv_vcpu_mmio_return -- Handle MMIO loads after user space emulation
+ *			     or in-kernel IO emulation
+ *
+ * @vcpu: The VCPU pointer
+ * @run:  The VCPU run struct containing the mmio data
+ */
+int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/* TODO: */
+	return 0;
+}
+
+/*
+ * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
+ * proper exit to userspace.
+ */
+int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
+			unsigned long scause, unsigned long stval)
+{
+	/* TODO: */
+	return 0;
+}
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
new file mode 100644
index 000000000000..ac0211820521
--- /dev/null
+++ b/arch/riscv/kvm/vm.c
@@ -0,0 +1,79 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ *     Anup Patel <anup.patel@wdc.com>
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/uaccess.h>
+#include <linux/kvm_host.h>
+
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+{
+	/* TODO: To be added later. */
+	return -ENOTSUPP;
+}
+
+int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
+{
+	int r;
+
+	r = kvm_riscv_stage2_alloc_pgd(kvm);
+	if (r)
+		return r;
+
+	return 0;
+}
+
+void kvm_arch_destroy_vm(struct kvm *kvm)
+{
+	int i;
+
+	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+		if (kvm->vcpus[i]) {
+			kvm_arch_vcpu_destroy(kvm->vcpus[i]);
+			kvm->vcpus[i] = NULL;
+		}
+	}
+}
+
+int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
+{
+	int r;
+
+	switch (ext) {
+	case KVM_CAP_DEVICE_CTRL:
+	case KVM_CAP_USER_MEMORY:
+	case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
+	case KVM_CAP_ONE_REG:
+	case KVM_CAP_READONLY_MEM:
+	case KVM_CAP_MP_STATE:
+	case KVM_CAP_IMMEDIATE_EXIT:
+		r = 1;
+		break;
+	case KVM_CAP_NR_VCPUS:
+		r = num_online_cpus();
+		break;
+	case KVM_CAP_MAX_VCPUS:
+		r = KVM_MAX_VCPUS;
+		break;
+	case KVM_CAP_NR_MEMSLOTS:
+		r = KVM_USER_MEM_SLOTS;
+		break;
+	default:
+		r = 0;
+		break;
+	}
+
+	return r;
+}
+
+long kvm_arch_vm_ioctl(struct file *filp,
+		       unsigned int ioctl, unsigned long arg)
+{
+	return -EINVAL;
+}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH v5 06/20] RISC-V: KVM: Implement VCPU create, init and destroy functions
  2019-08-22  8:42 [PATCH v5 00/20] KVM RISC-V Support Anup Patel
                   ` (4 preceding siblings ...)
  2019-08-22  8:43 ` [PATCH v5 05/20] RISC-V: Add initial skeletal KVM support Anup Patel
@ 2019-08-22  8:43 ` Anup Patel
  2019-08-22  8:44 ` [PATCH v5 07/20] RISC-V: KVM: Implement VCPU interrupts and requests handling Anup Patel
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 61+ messages in thread
From: Anup Patel @ 2019-08-22  8:43 UTC (permalink / raw)
  To: Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel, Anup Patel

This patch implements VCPU create, init and destroy functions
required by generic KVM module. We don't have much dynamic
resources in struct kvm_vcpu_arch so thest functions are quite
simple for KVM RISC-V.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/riscv/include/asm/kvm_host.h | 68 +++++++++++++++++++++++++++++++
 arch/riscv/kvm/vcpu.c             | 68 +++++++++++++++++++++++++++++--
 2 files changed, 132 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 9459709656be..dab32c9c3470 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -53,7 +53,75 @@ struct kvm_arch {
 	phys_addr_t pgd_phys;
 };
 
+struct kvm_cpu_context {
+	unsigned long zero;
+	unsigned long ra;
+	unsigned long sp;
+	unsigned long gp;
+	unsigned long tp;
+	unsigned long t0;
+	unsigned long t1;
+	unsigned long t2;
+	unsigned long s0;
+	unsigned long s1;
+	unsigned long a0;
+	unsigned long a1;
+	unsigned long a2;
+	unsigned long a3;
+	unsigned long a4;
+	unsigned long a5;
+	unsigned long a6;
+	unsigned long a7;
+	unsigned long s2;
+	unsigned long s3;
+	unsigned long s4;
+	unsigned long s5;
+	unsigned long s6;
+	unsigned long s7;
+	unsigned long s8;
+	unsigned long s9;
+	unsigned long s10;
+	unsigned long s11;
+	unsigned long t3;
+	unsigned long t4;
+	unsigned long t5;
+	unsigned long t6;
+	unsigned long sepc;
+	unsigned long sstatus;
+	unsigned long hstatus;
+};
+
+struct kvm_vcpu_csr {
+	unsigned long vsstatus;
+	unsigned long vsie;
+	unsigned long vstvec;
+	unsigned long vsscratch;
+	unsigned long vsepc;
+	unsigned long vscause;
+	unsigned long vstval;
+	unsigned long vsip;
+	unsigned long vsatp;
+};
+
 struct kvm_vcpu_arch {
+	/* VCPU ran atleast once */
+	bool ran_atleast_once;
+
+	/* ISA feature bits (similar to MISA) */
+	unsigned long isa;
+
+	/* CPU context of Guest VCPU */
+	struct kvm_cpu_context guest_context;
+
+	/* CPU CSR context of Guest VCPU */
+	struct kvm_vcpu_csr guest_csr;
+
+	/* CPU context upon Guest VCPU reset */
+	struct kvm_cpu_context guest_reset_context;
+
+	/* CPU CSR context upon Guest VCPU reset */
+	struct kvm_vcpu_csr guest_reset_csr;
+
 	/* Don't run the VCPU (blocked) */
 	bool pause;
 
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 48536cb0c8e7..8272b05d6ce4 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -31,10 +31,48 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	{ NULL }
 };
 
+#define KVM_RISCV_ISA_ALLOWED	(riscv_isa_extension_mask(a) | \
+				 riscv_isa_extension_mask(c) | \
+				 riscv_isa_extension_mask(d) | \
+				 riscv_isa_extension_mask(f) | \
+				 riscv_isa_extension_mask(i) | \
+				 riscv_isa_extension_mask(m) | \
+				 riscv_isa_extension_mask(s) | \
+				 riscv_isa_extension_mask(u))
+
+static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
+{
+	struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
+	struct kvm_vcpu_csr *reset_csr = &vcpu->arch.guest_reset_csr;
+	struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
+	struct kvm_cpu_context *reset_cntx = &vcpu->arch.guest_reset_context;
+
+	memcpy(csr, reset_csr, sizeof(*csr));
+
+	memcpy(cntx, reset_cntx, sizeof(*cntx));
+}
+
 struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
 {
-	/* TODO: */
-	return NULL;
+	int err;
+	struct kvm_vcpu *vcpu;
+
+	vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
+	if (!vcpu) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	err = kvm_vcpu_init(vcpu, kvm, id);
+	if (err)
+		goto free_vcpu;
+
+	return vcpu;
+
+free_vcpu:
+	kmem_cache_free(kvm_vcpu_cache, vcpu);
+out:
+	return ERR_PTR(err);
 }
 
 int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
@@ -48,13 +86,32 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
 
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
-	/* TODO: */
+	struct kvm_cpu_context *cntx;
+
+	/* Mark this VCPU never ran */
+	vcpu->arch.ran_atleast_once = false;
+
+	/* Setup ISA features available to VCPU */
+	vcpu->arch.isa = riscv_isa_extension_base(NULL) & KVM_RISCV_ISA_ALLOWED;
+
+	/* Setup reset state of shadow SSTATUS and HSTATUS CSRs */
+	cntx = &vcpu->arch.guest_reset_context;
+	cntx->sstatus = SR_SPP | SR_SPIE;
+	cntx->hstatus = 0;
+	cntx->hstatus |= HSTATUS_SP2V;
+	cntx->hstatus |= HSTATUS_SP2P;
+	cntx->hstatus |= HSTATUS_SPV;
+
+	/* Reset VCPU */
+	kvm_riscv_reset_vcpu(vcpu);
+
 	return 0;
 }
 
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
-	/* TODO: */
+	kvm_riscv_stage2_flush_cache(vcpu);
+	kmem_cache_free(kvm_vcpu_cache, vcpu);
 }
 
 int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
@@ -199,6 +256,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	int ret;
 	unsigned long scause, stval;
 
+	/* Mark this VCPU ran atleast once */
+	vcpu->arch.ran_atleast_once = true;
+
 	vcpu->arch.srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
 
 	/* Process MMIO value returned from user-space */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH v5 07/20] RISC-V: KVM: Implement VCPU interrupts and requests handling
  2019-08-22  8:42 [PATCH v5 00/20] KVM RISC-V Support Anup Patel
                   ` (5 preceding siblings ...)
  2019-08-22  8:43 ` [PATCH v5 06/20] RISC-V: KVM: Implement VCPU create, init and destroy functions Anup Patel
@ 2019-08-22  8:44 ` Anup Patel
  2019-08-22  8:44 ` [PATCH v5 08/20] RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls Anup Patel
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 61+ messages in thread
From: Anup Patel @ 2019-08-22  8:44 UTC (permalink / raw)
  To: Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel, Anup Patel

This patch implements VCPU interrupts and requests which are both
asynchronous events.

The VCPU interrupts can be set/unset using KVM_INTERRUPT ioctl from
user-space. In future, the in-kernel IRQCHIP emulation will use
kvm_riscv_vcpu_set_interrupt() and kvm_riscv_vcpu_unset_interrupt()
functions to set/unset VCPU interrupts.

Important VCPU requests implemented by this patch are:
KVM_REQ_SLEEP       - set whenever VCPU itself goes to sleep state
KVM_REQ_VCPU_RESET  - set whenever VCPU reset is requested

The WFI trap-n-emulate (added later) will use KVM_REQ_SLEEP request
and kvm_riscv_vcpu_has_interrupt() function.

The KVM_REQ_VCPU_RESET request will be used by SBI emulation (added
later) to power-up a VCPU in power-off state. The user-space can use
the GET_MPSTATE/SET_MPSTATE ioctls to get/set power state of a VCPU.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/riscv/include/asm/kvm_host.h |  26 +++++
 arch/riscv/include/uapi/asm/kvm.h |   3 +
 arch/riscv/kvm/main.c             |   8 ++
 arch/riscv/kvm/vcpu.c             | 184 +++++++++++++++++++++++++++---
 4 files changed, 208 insertions(+), 13 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index dab32c9c3470..d801216da6d0 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -122,6 +122,21 @@ struct kvm_vcpu_arch {
 	/* CPU CSR context upon Guest VCPU reset */
 	struct kvm_vcpu_csr guest_reset_csr;
 
+	/*
+	 * VCPU interrupts
+	 *
+	 * We have a lockless approach for tracking pending VCPU interrupts
+	 * implemented using atomic bitops. The irqs_pending bitmap represent
+	 * pending interrupts whereas irqs_pending_mask represent bits changed
+	 * in irqs_pending. Our approach is modeled around multiple producer
+	 * and single consumer problem where the consumer is the VCPU itself.
+	 */
+	unsigned long irqs_pending;
+	unsigned long irqs_pending_mask;
+
+	/* VCPU power-off state */
+	bool power_off;
+
 	/* Don't run the VCPU (blocked) */
 	bool pause;
 
@@ -135,6 +150,9 @@ static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
 
+int kvm_riscv_setup_vsip(void);
+void kvm_riscv_cleanup_vsip(void);
+
 void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu);
 int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm);
 void kvm_riscv_stage2_free_pgd(struct kvm *kvm);
@@ -146,4 +164,12 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 
 static inline void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch) {}
 
+int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
+int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
+void kvm_riscv_vcpu_flush_interrupts(struct kvm_vcpu *vcpu);
+void kvm_riscv_vcpu_sync_interrupts(struct kvm_vcpu *vcpu);
+bool kvm_riscv_vcpu_has_interrupt(struct kvm_vcpu *vcpu);
+void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu);
+void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu);
+
 #endif /* __RISCV_KVM_HOST_H__ */
diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
index d15875818b6e..6dbc056d58ba 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -18,6 +18,9 @@
 
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 
+#define KVM_INTERRUPT_SET	-1U
+#define KVM_INTERRUPT_UNSET	-2U
+
 /* for KVM_GET_REGS and KVM_SET_REGS */
 struct kvm_regs {
 };
diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
index e1ffe6d42f39..d088247843c5 100644
--- a/arch/riscv/kvm/main.c
+++ b/arch/riscv/kvm/main.c
@@ -48,6 +48,8 @@ int kvm_arch_hardware_enable(void)
 	hideleg |= SIE_SEIE;
 	csr_write(CSR_HIDELEG, hideleg);
 
+	csr_write(CSR_VSIP, 0);
+
 	return 0;
 }
 
@@ -59,11 +61,17 @@ void kvm_arch_hardware_disable(void)
 
 int kvm_arch_init(void *opaque)
 {
+	int ret;
+
 	if (!riscv_isa_extension_available(NULL, h)) {
 		kvm_info("hypervisor extension not available\n");
 		return -ENODEV;
 	}
 
+	ret = kvm_riscv_setup_vsip();
+	if (ret)
+		return ret;
+
 	kvm_info("hypervisor extension available\n");
 
 	return 0;
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 8272b05d6ce4..7f59e85c6af8 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -11,6 +11,7 @@
 #include <linux/err.h>
 #include <linux/kdebug.h>
 #include <linux/module.h>
+#include <linux/percpu.h>
 #include <linux/uaccess.h>
 #include <linux/vmalloc.h>
 #include <linux/sched/signal.h>
@@ -40,6 +41,8 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 				 riscv_isa_extension_mask(s) | \
 				 riscv_isa_extension_mask(u))
 
+static unsigned long __percpu *vsip_shadow;
+
 static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
 {
 	struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
@@ -50,6 +53,9 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
 	memcpy(csr, reset_csr, sizeof(*csr));
 
 	memcpy(cntx, reset_cntx, sizeof(*cntx));
+
+	WRITE_ONCE(vcpu->arch.irqs_pending, 0);
+	WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
 }
 
 struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
@@ -116,8 +122,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 
 int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
 {
-	/* TODO: */
-	return 0;
+	return READ_ONCE(vcpu->arch.irqs_pending) & (1UL << IRQ_S_TIMER);
 }
 
 void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
@@ -130,20 +135,18 @@ void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
 
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 {
-	/* TODO: */
-	return 0;
+	return (kvm_riscv_vcpu_has_interrupt(vcpu) &&
+		!vcpu->arch.power_off && !vcpu->arch.pause);
 }
 
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
 {
-	/* TODO: */
-	return 0;
+	return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
 }
 
 bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
 {
-	/* TODO: */
-	return false;
+	return (vcpu->arch.guest_context.sstatus & SR_SPP) ? true : false;
 }
 
 bool kvm_arch_has_vcpu_debugfs(void)
@@ -164,7 +167,21 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
 long kvm_arch_vcpu_async_ioctl(struct file *filp,
 			       unsigned int ioctl, unsigned long arg)
 {
-	/* TODO; */
+	struct kvm_vcpu *vcpu = filp->private_data;
+	void __user *argp = (void __user *)arg;
+
+	if (ioctl == KVM_INTERRUPT) {
+		struct kvm_interrupt irq;
+
+		if (copy_from_user(&irq, argp, sizeof(irq)))
+			return -EFAULT;
+
+		if (irq.irq == KVM_INTERRUPT_SET)
+			return kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_S_EXT);
+		else
+			return kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_S_EXT);
+	}
+
 	return -ENOIOCTLCMD;
 }
 
@@ -213,18 +230,103 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 	return -EINVAL;
 }
 
+void kvm_riscv_vcpu_flush_interrupts(struct kvm_vcpu *vcpu)
+{
+	struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
+	unsigned long mask, val;
+
+	if (READ_ONCE(vcpu->arch.irqs_pending_mask)) {
+		mask = xchg_acquire(&vcpu->arch.irqs_pending_mask, 0);
+		val = READ_ONCE(vcpu->arch.irqs_pending) & mask;
+
+		csr->vsip &= ~mask;
+		csr->vsip |= val;
+	}
+}
+
+void kvm_riscv_vcpu_sync_interrupts(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.guest_csr.vsip = csr_read(CSR_VSIP);
+	vcpu->arch.guest_csr.vsie = csr_read(CSR_VSIE);
+}
+
+int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
+{
+	if (irq != IRQ_S_SOFT &&
+	    irq != IRQ_S_TIMER &&
+	    irq != IRQ_S_EXT)
+		return -EINVAL;
+
+	set_bit(irq, &vcpu->arch.irqs_pending);
+	smp_mb__before_atomic();
+	set_bit(irq, &vcpu->arch.irqs_pending_mask);
+
+	kvm_vcpu_kick(vcpu);
+
+	return 0;
+}
+
+int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
+{
+	if (irq != IRQ_S_SOFT &&
+	    irq != IRQ_S_TIMER &&
+	    irq != IRQ_S_EXT)
+		return -EINVAL;
+
+	clear_bit(irq, &vcpu->arch.irqs_pending);
+	smp_mb__before_atomic();
+	set_bit(irq, &vcpu->arch.irqs_pending_mask);
+
+	return 0;
+}
+
+bool kvm_riscv_vcpu_has_interrupt(struct kvm_vcpu *vcpu)
+{
+	return (READ_ONCE(vcpu->arch.irqs_pending) &
+		vcpu->arch.guest_csr.vsie) ? true : false;
+}
+
+void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.power_off = true;
+	kvm_make_request(KVM_REQ_SLEEP, vcpu);
+	kvm_vcpu_kick(vcpu);
+}
+
+void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.power_off = false;
+	kvm_vcpu_wake_up(vcpu);
+}
+
 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
 				    struct kvm_mp_state *mp_state)
 {
-	/* TODO: */
+	if (vcpu->arch.power_off)
+		mp_state->mp_state = KVM_MP_STATE_STOPPED;
+	else
+		mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
+
 	return 0;
 }
 
 int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 				    struct kvm_mp_state *mp_state)
 {
-	/* TODO: */
-	return 0;
+	int ret = 0;
+
+	switch (mp_state->mp_state) {
+	case KVM_MP_STATE_RUNNABLE:
+		vcpu->arch.power_off = false;
+		break;
+	case KVM_MP_STATE_STOPPED:
+		kvm_riscv_vcpu_power_off(vcpu);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	return ret;
 }
 
 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
@@ -248,7 +350,51 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 
 static void kvm_riscv_check_vcpu_requests(struct kvm_vcpu *vcpu)
 {
-	/* TODO: */
+	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
+
+	if (kvm_request_pending(vcpu)) {
+		if (kvm_check_request(KVM_REQ_SLEEP, vcpu)) {
+			swait_event_interruptible_exclusive(*wq,
+						((!vcpu->arch.power_off) &&
+						(!vcpu->arch.pause)));
+
+			if (vcpu->arch.power_off || vcpu->arch.pause) {
+				/*
+				 * Awaken to handle a signal, request to
+				 * sleep again later.
+				 */
+				kvm_make_request(KVM_REQ_SLEEP, vcpu);
+			}
+		}
+
+		if (kvm_check_request(KVM_REQ_VCPU_RESET, vcpu))
+			kvm_riscv_reset_vcpu(vcpu);
+	}
+}
+
+static void kvm_riscv_update_vsip(struct kvm_vcpu *vcpu)
+{
+	struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
+	unsigned long *vsip = raw_cpu_ptr(vsip_shadow);
+
+	if (*vsip != csr->vsip) {
+		csr_write(CSR_VSIP, csr->vsip);
+		*vsip = csr->vsip;
+	}
+}
+
+int kvm_riscv_setup_vsip(void)
+{
+	vsip_shadow = alloc_percpu(unsigned long);
+	if (!vsip_shadow)
+		return -ENOMEM;
+
+	return 0;
+}
+
+void kvm_riscv_cleanup_vsip(void)
+{
+	free_percpu(vsip_shadow);
 }
 
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
@@ -311,6 +457,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx);
 		smp_mb__after_srcu_read_unlock();
 
+		/*
+		 * We might have got VCPU interrupts updated asynchronously
+		 * so update it in HW.
+		 */
+		kvm_riscv_vcpu_flush_interrupts(vcpu);
+
+		/* Update VSIP CSR for current CPU */
+		kvm_riscv_update_vsip(vcpu);
+
 		if (ret <= 0 ||
 		    kvm_request_pending(vcpu)) {
 			vcpu->mode = OUTSIDE_GUEST_MODE;
@@ -334,6 +489,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		scause = csr_read(CSR_SCAUSE);
 		stval = csr_read(CSR_STVAL);
 
+		/* Syncup interrupts state with HW */
+		kvm_riscv_vcpu_sync_interrupts(vcpu);
+
 		/*
 		 * We may have taken a host interrupt in VS/VU-mode (i.e.
 		 * while executing the guest). This interrupt is still
-- 
2.17.1


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH v5 08/20] RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls
  2019-08-22  8:42 [PATCH v5 00/20] KVM RISC-V Support Anup Patel
                   ` (6 preceding siblings ...)
  2019-08-22  8:44 ` [PATCH v5 07/20] RISC-V: KVM: Implement VCPU interrupts and requests handling Anup Patel
@ 2019-08-22  8:44 ` Anup Patel
  2019-08-22 12:01   ` Alexander Graf
  2019-08-22  8:44 ` [PATCH v5 09/20] RISC-V: KVM: Implement VCPU world-switch Anup Patel
                   ` (12 subsequent siblings)
  20 siblings, 1 reply; 61+ messages in thread
From: Anup Patel @ 2019-08-22  8:44 UTC (permalink / raw)
  To: Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel, Anup Patel

For KVM RISC-V, we use KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls to access
VCPU config and registers from user-space.

We have three types of VCPU registers:
1. CONFIG - these are VCPU config and capabilities
2. CORE   - these are VCPU general purpose registers
3. CSR    - these are VCPU control and status registers

The CONFIG registers available to user-space are ISA and TIMEBASE. Out
of these, TIMEBASE is a read-only register which inform user-space about
VCPU timer base frequency. The ISA register is a read and write register
where user-space can only write the desired VCPU ISA capabilities before
running the VCPU.

The CORE registers available to user-space are PC, RA, SP, GP, TP, A0-A7,
T0-T6, S0-S11 and MODE. Most of these are RISC-V general registers except
PC and MODE. The PC register represents program counter whereas the MODE
register represent VCPU privilege mode (i.e. S/U-mode).

The CSRs available to user-space are SSTATUS, SIE, STVEC, SSCRATCH, SEPC,
SCAUSE, STVAL, SIP, and SATP. All of these are read/write registers.

In future, more VCPU register types will be added (such as FP) for the
KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/riscv/include/uapi/asm/kvm.h |  40 ++++-
 arch/riscv/kvm/vcpu.c             | 235 +++++++++++++++++++++++++++++-
 2 files changed, 272 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
index 6dbc056d58ba..024f220eb17e 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -23,8 +23,15 @@
 
 /* for KVM_GET_REGS and KVM_SET_REGS */
 struct kvm_regs {
+	/* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
+	struct user_regs_struct regs;
+	unsigned long mode;
 };
 
+/* Possible privilege modes for kvm_regs */
+#define KVM_RISCV_MODE_S	1
+#define KVM_RISCV_MODE_U	0
+
 /* for KVM_GET_FPU and KVM_SET_FPU */
 struct kvm_fpu {
 };
@@ -41,10 +48,41 @@ struct kvm_guest_debug_arch {
 struct kvm_sync_regs {
 };
 
-/* dummy definition */
+/* for KVM_GET_SREGS and KVM_SET_SREGS */
 struct kvm_sregs {
+	unsigned long sstatus;
+	unsigned long sie;
+	unsigned long stvec;
+	unsigned long sscratch;
+	unsigned long sepc;
+	unsigned long scause;
+	unsigned long stval;
+	unsigned long sip;
+	unsigned long satp;
 };
 
+#define KVM_REG_SIZE(id)		\
+	(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
+
+/* If you need to interpret the index values, here is the key: */
+#define KVM_REG_RISCV_TYPE_MASK		0x00000000FF000000
+#define KVM_REG_RISCV_TYPE_SHIFT	24
+
+/* Config registers are mapped as type 1 */
+#define KVM_REG_RISCV_CONFIG		(0x01 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_CONFIG_ISA	0x0
+#define KVM_REG_RISCV_CONFIG_TIMEBASE	0x1
+
+/* Core registers are mapped as type 2 */
+#define KVM_REG_RISCV_CORE		(0x02 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_CORE_REG(name)	\
+		(offsetof(struct kvm_regs, name) / sizeof(unsigned long))
+
+/* Control and status registers are mapped as type 3 */
+#define KVM_REG_RISCV_CSR		(0x03 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_CSR_REG(name)	\
+		(offsetof(struct kvm_sregs, name) / sizeof(unsigned long))
+
 #endif
 
 #endif /* __LINUX_KVM_RISCV_H */
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 7f59e85c6af8..9396a83c0611 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -164,6 +164,215 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
 	return VM_FAULT_SIGBUS;
 }
 
+static int kvm_riscv_vcpu_get_reg_config(struct kvm_vcpu *vcpu,
+					 const struct kvm_one_reg *reg)
+{
+	unsigned long __user *uaddr =
+			(unsigned long __user *)(unsigned long)reg->addr;
+	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
+					    KVM_REG_SIZE_MASK |
+					    KVM_REG_RISCV_CONFIG);
+	unsigned long reg_val;
+
+	if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
+		return -EINVAL;
+
+	switch (reg_num) {
+	case KVM_REG_RISCV_CONFIG_ISA:
+		reg_val = vcpu->arch.isa;
+		break;
+	case KVM_REG_RISCV_CONFIG_TIMEBASE:
+		reg_val = riscv_timebase;
+		break;
+	default:
+		return -EINVAL;
+	};
+
+	if (copy_to_user(uaddr, &reg_val, KVM_REG_SIZE(reg->id)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int kvm_riscv_vcpu_set_reg_config(struct kvm_vcpu *vcpu,
+					 const struct kvm_one_reg *reg)
+{
+	unsigned long __user *uaddr =
+			(unsigned long __user *)(unsigned long)reg->addr;
+	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
+					    KVM_REG_SIZE_MASK |
+					    KVM_REG_RISCV_CONFIG);
+	unsigned long reg_val;
+
+	if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
+		return -EINVAL;
+
+	if (copy_from_user(&reg_val, uaddr, KVM_REG_SIZE(reg->id)))
+		return -EFAULT;
+
+	switch (reg_num) {
+	case KVM_REG_RISCV_CONFIG_ISA:
+		if (!vcpu->arch.ran_atleast_once) {
+			vcpu->arch.isa = reg_val;
+			vcpu->arch.isa &= riscv_isa_extension_base(NULL);
+			vcpu->arch.isa &= KVM_RISCV_ISA_ALLOWED;
+		} else {
+			return -ENOTSUPP;
+		}
+		break;
+	case KVM_REG_RISCV_CONFIG_TIMEBASE:
+		return -ENOTSUPP;
+	default:
+		return -EINVAL;
+	};
+
+	return 0;
+}
+
+static int kvm_riscv_vcpu_get_reg_core(struct kvm_vcpu *vcpu,
+				       const struct kvm_one_reg *reg)
+{
+	struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
+	unsigned long __user *uaddr =
+			(unsigned long __user *)(unsigned long)reg->addr;
+	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
+					    KVM_REG_SIZE_MASK |
+					    KVM_REG_RISCV_CORE);
+	unsigned long reg_val;
+
+	if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
+		return -EINVAL;
+
+	if (reg_num == KVM_REG_RISCV_CORE_REG(regs.pc))
+		reg_val = cntx->sepc;
+	else if (KVM_REG_RISCV_CORE_REG(regs.pc) < reg_num &&
+		 reg_num <= KVM_REG_RISCV_CORE_REG(regs.t6))
+		reg_val = ((unsigned long *)cntx)[reg_num];
+	else if (reg_num == KVM_REG_RISCV_CORE_REG(mode))
+		reg_val = (cntx->sstatus & SR_SPP) ?
+				KVM_RISCV_MODE_S : KVM_RISCV_MODE_U;
+	else
+		return -EINVAL;
+
+	if (copy_to_user(uaddr, &reg_val, KVM_REG_SIZE(reg->id)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int kvm_riscv_vcpu_set_reg_core(struct kvm_vcpu *vcpu,
+				       const struct kvm_one_reg *reg)
+{
+	struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
+	unsigned long __user *uaddr =
+			(unsigned long __user *)(unsigned long)reg->addr;
+	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
+					    KVM_REG_SIZE_MASK |
+					    KVM_REG_RISCV_CORE);
+	unsigned long reg_val;
+
+	if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
+		return -EINVAL;
+
+	if (copy_from_user(&reg_val, uaddr, KVM_REG_SIZE(reg->id)))
+		return -EFAULT;
+
+	if (reg_num == KVM_REG_RISCV_CORE_REG(regs.pc))
+		cntx->sepc = reg_val;
+	else if (KVM_REG_RISCV_CORE_REG(regs.pc) < reg_num &&
+		 reg_num <= KVM_REG_RISCV_CORE_REG(regs.t6))
+		((unsigned long *)cntx)[reg_num] = reg_val;
+	else if (reg_num == KVM_REG_RISCV_CORE_REG(mode)) {
+		if (reg_val == KVM_RISCV_MODE_S)
+			cntx->sstatus |= SR_SPP;
+		else
+			cntx->sstatus &= ~SR_SPP;
+	} else
+		return -EINVAL;
+
+	return 0;
+}
+
+static int kvm_riscv_vcpu_get_reg_csr(struct kvm_vcpu *vcpu,
+				      const struct kvm_one_reg *reg)
+{
+	struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
+	unsigned long __user *uaddr =
+			(unsigned long __user *)(unsigned long)reg->addr;
+	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
+					    KVM_REG_SIZE_MASK |
+					    KVM_REG_RISCV_CSR);
+	unsigned long reg_val;
+
+	if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
+		return -EINVAL;
+	if (reg_num >= sizeof(struct kvm_sregs) / sizeof(unsigned long))
+		return -EINVAL;
+
+	if (reg_num == KVM_REG_RISCV_CSR_REG(sip))
+		kvm_riscv_vcpu_flush_interrupts(vcpu);
+
+	reg_val = ((unsigned long *)csr)[reg_num];
+
+	if (copy_to_user(uaddr, &reg_val, KVM_REG_SIZE(reg->id)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int kvm_riscv_vcpu_set_reg_csr(struct kvm_vcpu *vcpu,
+				      const struct kvm_one_reg *reg)
+{
+	struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
+	unsigned long __user *uaddr =
+			(unsigned long __user *)(unsigned long)reg->addr;
+	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
+					    KVM_REG_SIZE_MASK |
+					    KVM_REG_RISCV_CSR);
+	unsigned long reg_val;
+
+	if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
+		return -EINVAL;
+	if (reg_num >= sizeof(struct kvm_sregs) / sizeof(unsigned long))
+		return -EINVAL;
+
+	if (copy_from_user(&reg_val, uaddr, KVM_REG_SIZE(reg->id)))
+		return -EFAULT;
+
+	((unsigned long *)csr)[reg_num] = reg_val;
+
+	if (reg_num == KVM_REG_RISCV_CSR_REG(sip))
+		WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
+
+	return 0;
+}
+
+static int kvm_riscv_vcpu_set_reg(struct kvm_vcpu *vcpu,
+				  const struct kvm_one_reg *reg)
+{
+	if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CONFIG)
+		return kvm_riscv_vcpu_set_reg_config(vcpu, reg);
+	else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CORE)
+		return kvm_riscv_vcpu_set_reg_core(vcpu, reg);
+	else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CSR)
+		return kvm_riscv_vcpu_set_reg_csr(vcpu, reg);
+
+	return -EINVAL;
+}
+
+static int kvm_riscv_vcpu_get_reg(struct kvm_vcpu *vcpu,
+				  const struct kvm_one_reg *reg)
+{
+	if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CONFIG)
+		return kvm_riscv_vcpu_get_reg_config(vcpu, reg);
+	else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CORE)
+		return kvm_riscv_vcpu_get_reg_core(vcpu, reg);
+	else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CSR)
+		return kvm_riscv_vcpu_get_reg_csr(vcpu, reg);
+
+	return -EINVAL;
+}
+
 long kvm_arch_vcpu_async_ioctl(struct file *filp,
 			       unsigned int ioctl, unsigned long arg)
 {
@@ -188,8 +397,30 @@ long kvm_arch_vcpu_async_ioctl(struct file *filp,
 long kvm_arch_vcpu_ioctl(struct file *filp,
 			 unsigned int ioctl, unsigned long arg)
 {
-	/* TODO: */
-	return -EINVAL;
+	struct kvm_vcpu *vcpu = filp->private_data;
+	void __user *argp = (void __user *)arg;
+	long r = -EINVAL;
+
+	switch (ioctl) {
+	case KVM_SET_ONE_REG:
+	case KVM_GET_ONE_REG: {
+		struct kvm_one_reg reg;
+
+		r = -EFAULT;
+		if (copy_from_user(&reg, argp, sizeof(reg)))
+			break;
+
+		if (ioctl == KVM_SET_ONE_REG)
+			r = kvm_riscv_vcpu_set_reg(vcpu, &reg);
+		else
+			r = kvm_riscv_vcpu_get_reg(vcpu, &reg);
+		break;
+	}
+	default:
+		break;
+	}
+
+	return r;
 }
 
 int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH v5 09/20] RISC-V: KVM: Implement VCPU world-switch
  2019-08-22  8:42 [PATCH v5 00/20] KVM RISC-V Support Anup Patel
                   ` (7 preceding siblings ...)
  2019-08-22  8:44 ` [PATCH v5 08/20] RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls Anup Patel
@ 2019-08-22  8:44 ` Anup Patel
  2019-08-22  8:44 ` [PATCH v5 10/20] RISC-V: KVM: Handle MMIO exits for VCPU Anup Patel
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 61+ messages in thread
From: Anup Patel @ 2019-08-22  8:44 UTC (permalink / raw)
  To: Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel, Anup Patel

This patch implements the VCPU world-switch for KVM RISC-V.

The KVM RISC-V world-switch (i.e. __kvm_riscv_switch_to()) mostly
switches general purpose registers, SSTATUS, STVEC, SSCRATCH and
HSTATUS CSRs. Other CSRs are switched via vcpu_load() and vcpu_put()
interface in kvm_arch_vcpu_load() and kvm_arch_vcpu_put() functions
respectively.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/riscv/include/asm/kvm_host.h |   9 +-
 arch/riscv/kernel/asm-offsets.c   |  76 ++++++++++++
 arch/riscv/kvm/Makefile           |   2 +-
 arch/riscv/kvm/vcpu.c             |  32 ++++-
 arch/riscv/kvm/vcpu_switch.S      | 194 ++++++++++++++++++++++++++++++
 5 files changed, 309 insertions(+), 4 deletions(-)
 create mode 100644 arch/riscv/kvm/vcpu_switch.S

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index d801216da6d0..18f1097f1d8d 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -110,6 +110,13 @@ struct kvm_vcpu_arch {
 	/* ISA feature bits (similar to MISA) */
 	unsigned long isa;
 
+	/* SSCRATCH and STVEC of Host */
+	unsigned long host_sscratch;
+	unsigned long host_stvec;
+
+	/* CPU context of Host */
+	struct kvm_cpu_context host_context;
+
 	/* CPU context of Guest VCPU */
 	struct kvm_cpu_context guest_context;
 
@@ -162,7 +169,7 @@ int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 			unsigned long scause, unsigned long stval);
 
-static inline void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch) {}
+void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch);
 
 int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
 int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index 9f5628c38ac9..711656710190 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -7,7 +7,9 @@
 #define GENERATING_ASM_OFFSETS
 
 #include <linux/kbuild.h>
+#include <linux/mm.h>
 #include <linux/sched.h>
+#include <asm/kvm_host.h>
 #include <asm/thread_info.h>
 #include <asm/ptrace.h>
 
@@ -109,6 +111,80 @@ void asm_offsets(void)
 	OFFSET(PT_SBADADDR, pt_regs, sbadaddr);
 	OFFSET(PT_SCAUSE, pt_regs, scause);
 
+	OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
+	OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
+	OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
+	OFFSET(KVM_ARCH_GUEST_GP, kvm_vcpu_arch, guest_context.gp);
+	OFFSET(KVM_ARCH_GUEST_TP, kvm_vcpu_arch, guest_context.tp);
+	OFFSET(KVM_ARCH_GUEST_T0, kvm_vcpu_arch, guest_context.t0);
+	OFFSET(KVM_ARCH_GUEST_T1, kvm_vcpu_arch, guest_context.t1);
+	OFFSET(KVM_ARCH_GUEST_T2, kvm_vcpu_arch, guest_context.t2);
+	OFFSET(KVM_ARCH_GUEST_S0, kvm_vcpu_arch, guest_context.s0);
+	OFFSET(KVM_ARCH_GUEST_S1, kvm_vcpu_arch, guest_context.s1);
+	OFFSET(KVM_ARCH_GUEST_A0, kvm_vcpu_arch, guest_context.a0);
+	OFFSET(KVM_ARCH_GUEST_A1, kvm_vcpu_arch, guest_context.a1);
+	OFFSET(KVM_ARCH_GUEST_A2, kvm_vcpu_arch, guest_context.a2);
+	OFFSET(KVM_ARCH_GUEST_A3, kvm_vcpu_arch, guest_context.a3);
+	OFFSET(KVM_ARCH_GUEST_A4, kvm_vcpu_arch, guest_context.a4);
+	OFFSET(KVM_ARCH_GUEST_A5, kvm_vcpu_arch, guest_context.a5);
+	OFFSET(KVM_ARCH_GUEST_A6, kvm_vcpu_arch, guest_context.a6);
+	OFFSET(KVM_ARCH_GUEST_A7, kvm_vcpu_arch, guest_context.a7);
+	OFFSET(KVM_ARCH_GUEST_S2, kvm_vcpu_arch, guest_context.s2);
+	OFFSET(KVM_ARCH_GUEST_S3, kvm_vcpu_arch, guest_context.s3);
+	OFFSET(KVM_ARCH_GUEST_S4, kvm_vcpu_arch, guest_context.s4);
+	OFFSET(KVM_ARCH_GUEST_S5, kvm_vcpu_arch, guest_context.s5);
+	OFFSET(KVM_ARCH_GUEST_S6, kvm_vcpu_arch, guest_context.s6);
+	OFFSET(KVM_ARCH_GUEST_S7, kvm_vcpu_arch, guest_context.s7);
+	OFFSET(KVM_ARCH_GUEST_S8, kvm_vcpu_arch, guest_context.s8);
+	OFFSET(KVM_ARCH_GUEST_S9, kvm_vcpu_arch, guest_context.s9);
+	OFFSET(KVM_ARCH_GUEST_S10, kvm_vcpu_arch, guest_context.s10);
+	OFFSET(KVM_ARCH_GUEST_S11, kvm_vcpu_arch, guest_context.s11);
+	OFFSET(KVM_ARCH_GUEST_T3, kvm_vcpu_arch, guest_context.t3);
+	OFFSET(KVM_ARCH_GUEST_T4, kvm_vcpu_arch, guest_context.t4);
+	OFFSET(KVM_ARCH_GUEST_T5, kvm_vcpu_arch, guest_context.t5);
+	OFFSET(KVM_ARCH_GUEST_T6, kvm_vcpu_arch, guest_context.t6);
+	OFFSET(KVM_ARCH_GUEST_SEPC, kvm_vcpu_arch, guest_context.sepc);
+	OFFSET(KVM_ARCH_GUEST_SSTATUS, kvm_vcpu_arch, guest_context.sstatus);
+	OFFSET(KVM_ARCH_GUEST_HSTATUS, kvm_vcpu_arch, guest_context.hstatus);
+
+	OFFSET(KVM_ARCH_HOST_ZERO, kvm_vcpu_arch, host_context.zero);
+	OFFSET(KVM_ARCH_HOST_RA, kvm_vcpu_arch, host_context.ra);
+	OFFSET(KVM_ARCH_HOST_SP, kvm_vcpu_arch, host_context.sp);
+	OFFSET(KVM_ARCH_HOST_GP, kvm_vcpu_arch, host_context.gp);
+	OFFSET(KVM_ARCH_HOST_TP, kvm_vcpu_arch, host_context.tp);
+	OFFSET(KVM_ARCH_HOST_T0, kvm_vcpu_arch, host_context.t0);
+	OFFSET(KVM_ARCH_HOST_T1, kvm_vcpu_arch, host_context.t1);
+	OFFSET(KVM_ARCH_HOST_T2, kvm_vcpu_arch, host_context.t2);
+	OFFSET(KVM_ARCH_HOST_S0, kvm_vcpu_arch, host_context.s0);
+	OFFSET(KVM_ARCH_HOST_S1, kvm_vcpu_arch, host_context.s1);
+	OFFSET(KVM_ARCH_HOST_A0, kvm_vcpu_arch, host_context.a0);
+	OFFSET(KVM_ARCH_HOST_A1, kvm_vcpu_arch, host_context.a1);
+	OFFSET(KVM_ARCH_HOST_A2, kvm_vcpu_arch, host_context.a2);
+	OFFSET(KVM_ARCH_HOST_A3, kvm_vcpu_arch, host_context.a3);
+	OFFSET(KVM_ARCH_HOST_A4, kvm_vcpu_arch, host_context.a4);
+	OFFSET(KVM_ARCH_HOST_A5, kvm_vcpu_arch, host_context.a5);
+	OFFSET(KVM_ARCH_HOST_A6, kvm_vcpu_arch, host_context.a6);
+	OFFSET(KVM_ARCH_HOST_A7, kvm_vcpu_arch, host_context.a7);
+	OFFSET(KVM_ARCH_HOST_S2, kvm_vcpu_arch, host_context.s2);
+	OFFSET(KVM_ARCH_HOST_S3, kvm_vcpu_arch, host_context.s3);
+	OFFSET(KVM_ARCH_HOST_S4, kvm_vcpu_arch, host_context.s4);
+	OFFSET(KVM_ARCH_HOST_S5, kvm_vcpu_arch, host_context.s5);
+	OFFSET(KVM_ARCH_HOST_S6, kvm_vcpu_arch, host_context.s6);
+	OFFSET(KVM_ARCH_HOST_S7, kvm_vcpu_arch, host_context.s7);
+	OFFSET(KVM_ARCH_HOST_S8, kvm_vcpu_arch, host_context.s8);
+	OFFSET(KVM_ARCH_HOST_S9, kvm_vcpu_arch, host_context.s9);
+	OFFSET(KVM_ARCH_HOST_S10, kvm_vcpu_arch, host_context.s10);
+	OFFSET(KVM_ARCH_HOST_S11, kvm_vcpu_arch, host_context.s11);
+	OFFSET(KVM_ARCH_HOST_T3, kvm_vcpu_arch, host_context.t3);
+	OFFSET(KVM_ARCH_HOST_T4, kvm_vcpu_arch, host_context.t4);
+	OFFSET(KVM_ARCH_HOST_T5, kvm_vcpu_arch, host_context.t5);
+	OFFSET(KVM_ARCH_HOST_T6, kvm_vcpu_arch, host_context.t6);
+	OFFSET(KVM_ARCH_HOST_SEPC, kvm_vcpu_arch, host_context.sepc);
+	OFFSET(KVM_ARCH_HOST_SSTATUS, kvm_vcpu_arch, host_context.sstatus);
+	OFFSET(KVM_ARCH_HOST_HSTATUS, kvm_vcpu_arch, host_context.hstatus);
+	OFFSET(KVM_ARCH_HOST_SSCRATCH, kvm_vcpu_arch, host_sscratch);
+	OFFSET(KVM_ARCH_HOST_STVEC, kvm_vcpu_arch, host_stvec);
+
 	/*
 	 * THREAD_{F,X}* might be larger than a S-type offset can handle, but
 	 * these are used in performance-sensitive assembly so we can't resort
diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
index 37b5a59d4f4f..845579273727 100644
--- a/arch/riscv/kvm/Makefile
+++ b/arch/riscv/kvm/Makefile
@@ -8,6 +8,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
 
 kvm-objs := $(common-objs-y)
 
-kvm-objs += main.o vm.o mmu.o vcpu.o vcpu_exit.o
+kvm-objs += main.o vm.o mmu.o vcpu.o vcpu_exit.o vcpu_switch.o
 
 obj-$(CONFIG_KVM)	+= kvm.o
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 9396a83c0611..e6d74a9a2fdf 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -569,14 +569,42 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
 
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
-	/* TODO: */
+	struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
+	unsigned long *vsip = raw_cpu_ptr(vsip_shadow);
+
+	csr_write(CSR_VSSTATUS, csr->vsstatus);
+	csr_write(CSR_VSIE, csr->vsie);
+	csr_write(CSR_VSTVEC, csr->vstvec);
+	csr_write(CSR_VSSCRATCH, csr->vsscratch);
+	csr_write(CSR_VSEPC, csr->vsepc);
+	csr_write(CSR_VSCAUSE, csr->vscause);
+	csr_write(CSR_VSTVAL, csr->vstval);
+	csr_write(CSR_VSIP, csr->vsip);
+	*vsip = csr->vsip;
+	csr_write(CSR_VSATP, csr->vsatp);
 
 	kvm_riscv_stage2_update_hgatp(vcpu);
+
+	vcpu->cpu = cpu;
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
-	/* TODO: */
+	struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
+
+	vcpu->cpu = -1;
+
+	csr_write(CSR_HGATP, 0);
+
+	csr->vsstatus = csr_read(CSR_VSSTATUS);
+	csr->vsie = csr_read(CSR_VSIE);
+	csr->vstvec = csr_read(CSR_VSTVEC);
+	csr->vsscratch = csr_read(CSR_VSSCRATCH);
+	csr->vsepc = csr_read(CSR_VSEPC);
+	csr->vscause = csr_read(CSR_VSCAUSE);
+	csr->vstval = csr_read(CSR_VSTVAL);
+	csr->vsip = csr_read(CSR_VSIP);
+	csr->vsatp = csr_read(CSR_VSATP);
 }
 
 static void kvm_riscv_check_vcpu_requests(struct kvm_vcpu *vcpu)
diff --git a/arch/riscv/kvm/vcpu_switch.S b/arch/riscv/kvm/vcpu_switch.S
new file mode 100644
index 000000000000..e1a17df1b379
--- /dev/null
+++ b/arch/riscv/kvm/vcpu_switch.S
@@ -0,0 +1,194 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ *     Anup Patel <anup.patel@wdc.com>
+ */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+#include <asm/asm-offsets.h>
+#include <asm/csr.h>
+
+	.text
+	.altmacro
+	.option norelax
+
+ENTRY(__kvm_riscv_switch_to)
+	/* Save Host GPRs (except A0 and T0-T6) */
+	REG_S	ra, (KVM_ARCH_HOST_RA)(a0)
+	REG_S	sp, (KVM_ARCH_HOST_SP)(a0)
+	REG_S	gp, (KVM_ARCH_HOST_GP)(a0)
+	REG_S	tp, (KVM_ARCH_HOST_TP)(a0)
+	REG_S	s0, (KVM_ARCH_HOST_S0)(a0)
+	REG_S	s1, (KVM_ARCH_HOST_S1)(a0)
+	REG_S	a1, (KVM_ARCH_HOST_A1)(a0)
+	REG_S	a2, (KVM_ARCH_HOST_A2)(a0)
+	REG_S	a3, (KVM_ARCH_HOST_A3)(a0)
+	REG_S	a4, (KVM_ARCH_HOST_A4)(a0)
+	REG_S	a5, (KVM_ARCH_HOST_A5)(a0)
+	REG_S	a6, (KVM_ARCH_HOST_A6)(a0)
+	REG_S	a7, (KVM_ARCH_HOST_A7)(a0)
+	REG_S	s2, (KVM_ARCH_HOST_S2)(a0)
+	REG_S	s3, (KVM_ARCH_HOST_S3)(a0)
+	REG_S	s4, (KVM_ARCH_HOST_S4)(a0)
+	REG_S	s5, (KVM_ARCH_HOST_S5)(a0)
+	REG_S	s6, (KVM_ARCH_HOST_S6)(a0)
+	REG_S	s7, (KVM_ARCH_HOST_S7)(a0)
+	REG_S	s8, (KVM_ARCH_HOST_S8)(a0)
+	REG_S	s9, (KVM_ARCH_HOST_S9)(a0)
+	REG_S	s10, (KVM_ARCH_HOST_S10)(a0)
+	REG_S	s11, (KVM_ARCH_HOST_S11)(a0)
+
+	/* Save Host SSTATUS, HSTATUS, SCRATCH and STVEC */
+	csrr	t0, CSR_SSTATUS
+	REG_S	t0, (KVM_ARCH_HOST_SSTATUS)(a0)
+	csrr	t1, CSR_HSTATUS
+	REG_S	t1, (KVM_ARCH_HOST_HSTATUS)(a0)
+	csrr	t2, CSR_SSCRATCH
+	REG_S	t2, (KVM_ARCH_HOST_SSCRATCH)(a0)
+	csrr	t3, CSR_STVEC
+	REG_S	t3, (KVM_ARCH_HOST_STVEC)(a0)
+
+	/* Change Host exception vector to return path */
+	la	t4, __kvm_switch_return
+	csrw	CSR_STVEC, t4
+
+	/* Restore Guest HSTATUS, SSTATUS and SEPC */
+	REG_L	t4, (KVM_ARCH_GUEST_SEPC)(a0)
+	csrw	CSR_SEPC, t4
+	REG_L	t5, (KVM_ARCH_GUEST_SSTATUS)(a0)
+	csrw	CSR_SSTATUS, t5
+	REG_L	t6, (KVM_ARCH_GUEST_HSTATUS)(a0)
+	csrw	CSR_HSTATUS, t6
+
+	/* Restore Guest GPRs (except A0) */
+	REG_L	ra, (KVM_ARCH_GUEST_RA)(a0)
+	REG_L	sp, (KVM_ARCH_GUEST_SP)(a0)
+	REG_L	gp, (KVM_ARCH_GUEST_GP)(a0)
+	REG_L	tp, (KVM_ARCH_GUEST_TP)(a0)
+	REG_L	t0, (KVM_ARCH_GUEST_T0)(a0)
+	REG_L	t1, (KVM_ARCH_GUEST_T1)(a0)
+	REG_L	t2, (KVM_ARCH_GUEST_T2)(a0)
+	REG_L	s0, (KVM_ARCH_GUEST_S0)(a0)
+	REG_L	s1, (KVM_ARCH_GUEST_S1)(a0)
+	REG_L	a1, (KVM_ARCH_GUEST_A1)(a0)
+	REG_L	a2, (KVM_ARCH_GUEST_A2)(a0)
+	REG_L	a3, (KVM_ARCH_GUEST_A3)(a0)
+	REG_L	a4, (KVM_ARCH_GUEST_A4)(a0)
+	REG_L	a5, (KVM_ARCH_GUEST_A5)(a0)
+	REG_L	a6, (KVM_ARCH_GUEST_A6)(a0)
+	REG_L	a7, (KVM_ARCH_GUEST_A7)(a0)
+	REG_L	s2, (KVM_ARCH_GUEST_S2)(a0)
+	REG_L	s3, (KVM_ARCH_GUEST_S3)(a0)
+	REG_L	s4, (KVM_ARCH_GUEST_S4)(a0)
+	REG_L	s5, (KVM_ARCH_GUEST_S5)(a0)
+	REG_L	s6, (KVM_ARCH_GUEST_S6)(a0)
+	REG_L	s7, (KVM_ARCH_GUEST_S7)(a0)
+	REG_L	s8, (KVM_ARCH_GUEST_S8)(a0)
+	REG_L	s9, (KVM_ARCH_GUEST_S9)(a0)
+	REG_L	s10, (KVM_ARCH_GUEST_S10)(a0)
+	REG_L	s11, (KVM_ARCH_GUEST_S11)(a0)
+	REG_L	t3, (KVM_ARCH_GUEST_T3)(a0)
+	REG_L	t4, (KVM_ARCH_GUEST_T4)(a0)
+	REG_L	t5, (KVM_ARCH_GUEST_T5)(a0)
+	REG_L	t6, (KVM_ARCH_GUEST_T6)(a0)
+
+	/* Save Host A0 in SSCRATCH */
+	csrw	CSR_SSCRATCH, a0
+
+	/* Restore Guest A0 */
+	REG_L	a0, (KVM_ARCH_GUEST_A0)(a0)
+
+	/* Resume Guest */
+	sret
+
+	/* Back to Host */
+	.align 2
+__kvm_switch_return:
+	/* Swap Guest A0 with SSCRATCH */
+	csrrw	a0, CSR_SSCRATCH, a0
+
+	/* Save Guest GPRs (except A0) */
+	REG_S	ra, (KVM_ARCH_GUEST_RA)(a0)
+	REG_S	sp, (KVM_ARCH_GUEST_SP)(a0)
+	REG_S	gp, (KVM_ARCH_GUEST_GP)(a0)
+	REG_S	tp, (KVM_ARCH_GUEST_TP)(a0)
+	REG_S	t0, (KVM_ARCH_GUEST_T0)(a0)
+	REG_S	t1, (KVM_ARCH_GUEST_T1)(a0)
+	REG_S	t2, (KVM_ARCH_GUEST_T2)(a0)
+	REG_S	s0, (KVM_ARCH_GUEST_S0)(a0)
+	REG_S	s1, (KVM_ARCH_GUEST_S1)(a0)
+	REG_S	a1, (KVM_ARCH_GUEST_A1)(a0)
+	REG_S	a2, (KVM_ARCH_GUEST_A2)(a0)
+	REG_S	a3, (KVM_ARCH_GUEST_A3)(a0)
+	REG_S	a4, (KVM_ARCH_GUEST_A4)(a0)
+	REG_S	a5, (KVM_ARCH_GUEST_A5)(a0)
+	REG_S	a6, (KVM_ARCH_GUEST_A6)(a0)
+	REG_S	a7, (KVM_ARCH_GUEST_A7)(a0)
+	REG_S	s2, (KVM_ARCH_GUEST_S2)(a0)
+	REG_S	s3, (KVM_ARCH_GUEST_S3)(a0)
+	REG_S	s4, (KVM_ARCH_GUEST_S4)(a0)
+	REG_S	s5, (KVM_ARCH_GUEST_S5)(a0)
+	REG_S	s6, (KVM_ARCH_GUEST_S6)(a0)
+	REG_S	s7, (KVM_ARCH_GUEST_S7)(a0)
+	REG_S	s8, (KVM_ARCH_GUEST_S8)(a0)
+	REG_S	s9, (KVM_ARCH_GUEST_S9)(a0)
+	REG_S	s10, (KVM_ARCH_GUEST_S10)(a0)
+	REG_S	s11, (KVM_ARCH_GUEST_S11)(a0)
+	REG_S	t3, (KVM_ARCH_GUEST_T3)(a0)
+	REG_S	t4, (KVM_ARCH_GUEST_T4)(a0)
+	REG_S	t5, (KVM_ARCH_GUEST_T5)(a0)
+	REG_S	t6, (KVM_ARCH_GUEST_T6)(a0)
+
+	/* Save Guest A0 */
+	csrr	t0, CSR_SSCRATCH
+	REG_S	t0, (KVM_ARCH_GUEST_A0)(a0)
+
+	/* Save Guest HSTATUS, SSTATUS, and SEPC */
+	csrr	t0, CSR_SEPC
+	REG_S	t0, (KVM_ARCH_GUEST_SEPC)(a0)
+	csrr	t1, CSR_SSTATUS
+	REG_S	t1, (KVM_ARCH_GUEST_SSTATUS)(a0)
+	csrr	t2, CSR_HSTATUS
+	REG_S	t2, (KVM_ARCH_GUEST_HSTATUS)(a0)
+
+	/* Restore Host SSTATUS, HSTATUS, SCRATCH and STVEC */
+	REG_L	t3, (KVM_ARCH_HOST_SSTATUS)(a0)
+	csrw	CSR_SSTATUS, t3
+	REG_L	t4, (KVM_ARCH_HOST_HSTATUS)(a0)
+	csrw	CSR_HSTATUS, t4
+	REG_L	t5, (KVM_ARCH_HOST_SSCRATCH)(a0)
+	csrw	CSR_SSCRATCH, t5
+	REG_L	t6, (KVM_ARCH_HOST_STVEC)(a0)
+	csrw	CSR_STVEC, t6
+
+	/* Restore Host GPRs (except A0 and T0-T6) */
+	REG_L	ra, (KVM_ARCH_HOST_RA)(a0)
+	REG_L	sp, (KVM_ARCH_HOST_SP)(a0)
+	REG_L	gp, (KVM_ARCH_HOST_GP)(a0)
+	REG_L	tp, (KVM_ARCH_HOST_TP)(a0)
+	REG_L	s0, (KVM_ARCH_HOST_S0)(a0)
+	REG_L	s1, (KVM_ARCH_HOST_S1)(a0)
+	REG_L	a1, (KVM_ARCH_HOST_A1)(a0)
+	REG_L	a2, (KVM_ARCH_HOST_A2)(a0)
+	REG_L	a3, (KVM_ARCH_HOST_A3)(a0)
+	REG_L	a4, (KVM_ARCH_HOST_A4)(a0)
+	REG_L	a5, (KVM_ARCH_HOST_A5)(a0)
+	REG_L	a6, (KVM_ARCH_HOST_A6)(a0)
+	REG_L	a7, (KVM_ARCH_HOST_A7)(a0)
+	REG_L	s2, (KVM_ARCH_HOST_S2)(a0)
+	REG_L	s3, (KVM_ARCH_HOST_S3)(a0)
+	REG_L	s4, (KVM_ARCH_HOST_S4)(a0)
+	REG_L	s5, (KVM_ARCH_HOST_S5)(a0)
+	REG_L	s6, (KVM_ARCH_HOST_S6)(a0)
+	REG_L	s7, (KVM_ARCH_HOST_S7)(a0)
+	REG_L	s8, (KVM_ARCH_HOST_S8)(a0)
+	REG_L	s9, (KVM_ARCH_HOST_S9)(a0)
+	REG_L	s10, (KVM_ARCH_HOST_S10)(a0)
+	REG_L	s11, (KVM_ARCH_HOST_S11)(a0)
+
+	/* Return to C code */
+	ret
+ENDPROC(__kvm_riscv_switch_to)
-- 
2.17.1


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH v5 10/20] RISC-V: KVM: Handle MMIO exits for VCPU
  2019-08-22  8:42 [PATCH v5 00/20] KVM RISC-V Support Anup Patel
                   ` (8 preceding siblings ...)
  2019-08-22  8:44 ` [PATCH v5 09/20] RISC-V: KVM: Implement VCPU world-switch Anup Patel
@ 2019-08-22  8:44 ` Anup Patel
  2019-08-22 12:10   ` Alexander Graf
  2019-08-22 12:14   ` Alexander Graf
  2019-08-22  8:45 ` [PATCH v5 11/20] RISC-V: KVM: Handle WFI " Anup Patel
                   ` (10 subsequent siblings)
  20 siblings, 2 replies; 61+ messages in thread
From: Anup Patel @ 2019-08-22  8:44 UTC (permalink / raw)
  To: Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel, Anup Patel

We will get stage2 page faults whenever Guest/VM access SW emulated
MMIO device or unmapped Guest RAM.

This patch implements MMIO read/write emulation by extracting MMIO
details from the trapped load/store instruction and forwarding the
MMIO read/write to user-space. The actual MMIO emulation will happen
in user-space and KVM kernel module will only take care of register
updates before resuming the trapped VCPU.

The handling for stage2 page faults for unmapped Guest RAM will be
implemeted by a separate patch later.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/riscv/include/asm/kvm_host.h |  11 +
 arch/riscv/kvm/mmu.c              |   7 +
 arch/riscv/kvm/vcpu_exit.c        | 436 +++++++++++++++++++++++++++++-
 3 files changed, 451 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 18f1097f1d8d..4388bace6d70 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -53,6 +53,12 @@ struct kvm_arch {
 	phys_addr_t pgd_phys;
 };
 
+struct kvm_mmio_decode {
+	unsigned long insn;
+	int len;
+	int shift;
+};
+
 struct kvm_cpu_context {
 	unsigned long zero;
 	unsigned long ra;
@@ -141,6 +147,9 @@ struct kvm_vcpu_arch {
 	unsigned long irqs_pending;
 	unsigned long irqs_pending_mask;
 
+	/* MMIO instruction details */
+	struct kvm_mmio_decode mmio_decode;
+
 	/* VCPU power-off state */
 	bool power_off;
 
@@ -160,6 +169,8 @@ static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
 int kvm_riscv_setup_vsip(void);
 void kvm_riscv_cleanup_vsip(void);
 
+int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
+			 bool is_write);
 void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu);
 int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm);
 void kvm_riscv_stage2_free_pgd(struct kvm *kvm);
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 04dd089b86ff..2b965f9aac07 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -61,6 +61,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	return 0;
 }
 
+int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
+			 bool is_write)
+{
+	/* TODO: */
+	return 0;
+}
+
 void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu)
 {
 	/* TODO: */
diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
index e4d7c8f0807a..efc06198c259 100644
--- a/arch/riscv/kvm/vcpu_exit.c
+++ b/arch/riscv/kvm/vcpu_exit.c
@@ -6,9 +6,371 @@
  *     Anup Patel <anup.patel@wdc.com>
  */
 
+#include <linux/bitops.h>
 #include <linux/errno.h>
 #include <linux/err.h>
 #include <linux/kvm_host.h>
+#include <asm/csr.h>
+
+#define INSN_MATCH_LB		0x3
+#define INSN_MASK_LB		0x707f
+#define INSN_MATCH_LH		0x1003
+#define INSN_MASK_LH		0x707f
+#define INSN_MATCH_LW		0x2003
+#define INSN_MASK_LW		0x707f
+#define INSN_MATCH_LD		0x3003
+#define INSN_MASK_LD		0x707f
+#define INSN_MATCH_LBU		0x4003
+#define INSN_MASK_LBU		0x707f
+#define INSN_MATCH_LHU		0x5003
+#define INSN_MASK_LHU		0x707f
+#define INSN_MATCH_LWU		0x6003
+#define INSN_MASK_LWU		0x707f
+#define INSN_MATCH_SB		0x23
+#define INSN_MASK_SB		0x707f
+#define INSN_MATCH_SH		0x1023
+#define INSN_MASK_SH		0x707f
+#define INSN_MATCH_SW		0x2023
+#define INSN_MASK_SW		0x707f
+#define INSN_MATCH_SD		0x3023
+#define INSN_MASK_SD		0x707f
+
+#define INSN_MATCH_C_LD		0x6000
+#define INSN_MASK_C_LD		0xe003
+#define INSN_MATCH_C_SD		0xe000
+#define INSN_MASK_C_SD		0xe003
+#define INSN_MATCH_C_LW		0x4000
+#define INSN_MASK_C_LW		0xe003
+#define INSN_MATCH_C_SW		0xc000
+#define INSN_MASK_C_SW		0xe003
+#define INSN_MATCH_C_LDSP	0x6002
+#define INSN_MASK_C_LDSP	0xe003
+#define INSN_MATCH_C_SDSP	0xe002
+#define INSN_MASK_C_SDSP	0xe003
+#define INSN_MATCH_C_LWSP	0x4002
+#define INSN_MASK_C_LWSP	0xe003
+#define INSN_MATCH_C_SWSP	0xc002
+#define INSN_MASK_C_SWSP	0xe003
+
+#define INSN_LEN(insn)		((((insn) & 0x3) < 0x3) ? 2 : 4)
+
+#ifdef CONFIG_64BIT
+#define LOG_REGBYTES		3
+#else
+#define LOG_REGBYTES		2
+#endif
+#define REGBYTES		(1 << LOG_REGBYTES)
+
+#define SH_RD			7
+#define SH_RS1			15
+#define SH_RS2			20
+#define SH_RS2C			2
+
+#define RV_X(x, s, n)		(((x) >> (s)) & ((1 << (n)) - 1))
+#define RVC_LW_IMM(x)		((RV_X(x, 6, 1) << 2) | \
+				 (RV_X(x, 10, 3) << 3) | \
+				 (RV_X(x, 5, 1) << 6))
+#define RVC_LD_IMM(x)		((RV_X(x, 10, 3) << 3) | \
+				 (RV_X(x, 5, 2) << 6))
+#define RVC_LWSP_IMM(x)		((RV_X(x, 4, 3) << 2) | \
+				 (RV_X(x, 12, 1) << 5) | \
+				 (RV_X(x, 2, 2) << 6))
+#define RVC_LDSP_IMM(x)		((RV_X(x, 5, 2) << 3) | \
+				 (RV_X(x, 12, 1) << 5) | \
+				 (RV_X(x, 2, 3) << 6))
+#define RVC_SWSP_IMM(x)		((RV_X(x, 9, 4) << 2) | \
+				 (RV_X(x, 7, 2) << 6))
+#define RVC_SDSP_IMM(x)		((RV_X(x, 10, 3) << 3) | \
+				 (RV_X(x, 7, 3) << 6))
+#define RVC_RS1S(insn)		(8 + RV_X(insn, SH_RD, 3))
+#define RVC_RS2S(insn)		(8 + RV_X(insn, SH_RS2C, 3))
+#define RVC_RS2(insn)		RV_X(insn, SH_RS2C, 5)
+
+#define SHIFT_RIGHT(x, y)		\
+	((y) < 0 ? ((x) << -(y)) : ((x) >> (y)))
+
+#define REG_MASK			\
+	((1 << (5 + LOG_REGBYTES)) - (1 << LOG_REGBYTES))
+
+#define REG_OFFSET(insn, pos)		\
+	(SHIFT_RIGHT((insn), (pos) - LOG_REGBYTES) & REG_MASK)
+
+#define REG_PTR(insn, pos, regs)	\
+	(ulong *)((ulong)(regs) + REG_OFFSET(insn, pos))
+
+#define GET_RM(insn)		(((insn) >> 12) & 7)
+
+#define GET_RS1(insn, regs)	(*REG_PTR(insn, SH_RS1, regs))
+#define GET_RS2(insn, regs)	(*REG_PTR(insn, SH_RS2, regs))
+#define GET_RS1S(insn, regs)	(*REG_PTR(RVC_RS1S(insn), 0, regs))
+#define GET_RS2S(insn, regs)	(*REG_PTR(RVC_RS2S(insn), 0, regs))
+#define GET_RS2C(insn, regs)	(*REG_PTR(insn, SH_RS2C, regs))
+#define GET_SP(regs)		(*REG_PTR(2, 0, regs))
+#define SET_RD(insn, regs, val)	(*REG_PTR(insn, SH_RD, regs) = (val))
+#define IMM_I(insn)		((s32)(insn) >> 20)
+#define IMM_S(insn)		(((s32)(insn) >> 25 << 5) | \
+				 (s32)(((insn) >> 7) & 0x1f))
+#define MASK_FUNCT3		0x7000
+
+#define STR(x)			XSTR(x)
+#define XSTR(x)			#x
+
+/* TODO: Handle traps due to unpriv load and redirect it back to VS-mode */
+static ulong get_insn(struct kvm_vcpu *vcpu)
+{
+	ulong __sepc = vcpu->arch.guest_context.sepc;
+	ulong __hstatus, __sstatus, __vsstatus;
+#ifdef CONFIG_RISCV_ISA_C
+	ulong rvc_mask = 3, tmp;
+#endif
+	ulong flags, val;
+
+	local_irq_save(flags);
+
+	__vsstatus = csr_read(CSR_VSSTATUS);
+	__sstatus = csr_read(CSR_SSTATUS);
+	__hstatus = csr_read(CSR_HSTATUS);
+
+	csr_write(CSR_VSSTATUS, __vsstatus | SR_MXR);
+	csr_write(CSR_SSTATUS, vcpu->arch.guest_context.sstatus | SR_MXR);
+	csr_write(CSR_HSTATUS, vcpu->arch.guest_context.hstatus | HSTATUS_SPRV);
+
+#ifndef CONFIG_RISCV_ISA_C
+	asm ("\n"
+#ifdef CONFIG_64BIT
+		STR(LWU) " %[insn], (%[addr])\n"
+#else
+		STR(LW) " %[insn], (%[addr])\n"
+#endif
+		: [insn] "=&r" (val) : [addr] "r" (__sepc));
+#else
+	asm ("and %[tmp], %[addr], 2\n"
+		"bnez %[tmp], 1f\n"
+#ifdef CONFIG_64BIT
+		STR(LWU) " %[insn], (%[addr])\n"
+#else
+		STR(LW) " %[insn], (%[addr])\n"
+#endif
+		"and %[tmp], %[insn], %[rvc_mask]\n"
+		"beq %[tmp], %[rvc_mask], 2f\n"
+		"sll %[insn], %[insn], %[xlen_minus_16]\n"
+		"srl %[insn], %[insn], %[xlen_minus_16]\n"
+		"j 2f\n"
+		"1:\n"
+		"lhu %[insn], (%[addr])\n"
+		"and %[tmp], %[insn], %[rvc_mask]\n"
+		"bne %[tmp], %[rvc_mask], 2f\n"
+		"lhu %[tmp], 2(%[addr])\n"
+		"sll %[tmp], %[tmp], 16\n"
+		"add %[insn], %[insn], %[tmp]\n"
+		"2:"
+	: [vsstatus] "+&r" (__vsstatus), [insn] "=&r" (val),
+	  [tmp] "=&r" (tmp)
+	: [addr] "r" (__sepc), [rvc_mask] "r" (rvc_mask),
+	  [xlen_minus_16] "i" (__riscv_xlen - 16));
+#endif
+
+	csr_write(CSR_HSTATUS, __hstatus);
+	csr_write(CSR_SSTATUS, __sstatus);
+	csr_write(CSR_VSSTATUS, __vsstatus);
+
+	local_irq_restore(flags);
+
+	return val;
+}
+
+static int emulate_load(struct kvm_vcpu *vcpu, struct kvm_run *run,
+			unsigned long fault_addr)
+{
+	int shift = 0, len = 0;
+	ulong insn = get_insn(vcpu);
+
+	/* Decode length of MMIO and shift */
+	if ((insn & INSN_MASK_LW) == INSN_MATCH_LW) {
+		len = 4;
+		shift = 8 * (sizeof(ulong) - len);
+	} else if ((insn & INSN_MASK_LB) == INSN_MATCH_LB) {
+		len = 1;
+		shift = 8 * (sizeof(ulong) - len);
+	} else if ((insn & INSN_MASK_LBU) == INSN_MATCH_LBU) {
+		len = 1;
+		shift = 8 * (sizeof(ulong) - len);
+#ifdef CONFIG_64BIT
+	} else if ((insn & INSN_MASK_LD) == INSN_MATCH_LD) {
+		len = 8;
+		shift = 8 * (sizeof(ulong) - len);
+	} else if ((insn & INSN_MASK_LWU) == INSN_MATCH_LWU) {
+		len = 4;
+#endif
+	} else if ((insn & INSN_MASK_LH) == INSN_MATCH_LH) {
+		len = 2;
+		shift = 8 * (sizeof(ulong) - len);
+	} else if ((insn & INSN_MASK_LHU) == INSN_MATCH_LHU) {
+		len = 2;
+#ifdef CONFIG_RISCV_ISA_C
+#ifdef CONFIG_64BIT
+	} else if ((insn & INSN_MASK_C_LD) == INSN_MATCH_C_LD) {
+		len = 8;
+		shift = 8 * (sizeof(ulong) - len);
+		insn = RVC_RS2S(insn) << SH_RD;
+	} else if ((insn & INSN_MASK_C_LDSP) == INSN_MATCH_C_LDSP &&
+		   ((insn >> SH_RD) & 0x1f)) {
+		len = 8;
+		shift = 8 * (sizeof(ulong) - len);
+#endif
+	} else if ((insn & INSN_MASK_C_LW) == INSN_MATCH_C_LW) {
+		len = 4;
+		shift = 8 * (sizeof(ulong) - len);
+		insn = RVC_RS2S(insn) << SH_RD;
+	} else if ((insn & INSN_MASK_C_LWSP) == INSN_MATCH_C_LWSP &&
+		   ((insn >> SH_RD) & 0x1f)) {
+		len = 4;
+		shift = 8 * (sizeof(ulong) - len);
+#endif
+	} else {
+		return -ENOTSUPP;
+	}
+
+	/* Fault address should be aligned to length of MMIO */
+	if (fault_addr & (len - 1))
+		return -EIO;
+
+	/* Save instruction decode info */
+	vcpu->arch.mmio_decode.insn = insn;
+	vcpu->arch.mmio_decode.shift = shift;
+	vcpu->arch.mmio_decode.len = len;
+
+	/* Exit to userspace for MMIO emulation */
+	vcpu->stat.mmio_exit_user++;
+	run->exit_reason = KVM_EXIT_MMIO;
+	run->mmio.is_write = false;
+	run->mmio.phys_addr = fault_addr;
+	run->mmio.len = len;
+
+	/* Move to next instruction */
+	vcpu->arch.guest_context.sepc += INSN_LEN(insn);
+
+	return 0;
+}
+
+static int emulate_store(struct kvm_vcpu *vcpu, struct kvm_run *run,
+			 unsigned long fault_addr)
+{
+	u8 data8;
+	u16 data16;
+	u32 data32;
+	u64 data64;
+	ulong data;
+	int len = 0;
+	ulong insn = get_insn(vcpu);
+
+	data = GET_RS2(insn, &vcpu->arch.guest_context);
+	data8 = data16 = data32 = data64 = data;
+
+	if ((insn & INSN_MASK_SW) == INSN_MATCH_SW) {
+		len = 4;
+	} else if ((insn & INSN_MASK_SB) == INSN_MATCH_SB) {
+		len = 1;
+#ifdef CONFIG_64BIT
+	} else if ((insn & INSN_MASK_SD) == INSN_MATCH_SD) {
+		len = 8;
+#endif
+	} else if ((insn & INSN_MASK_SH) == INSN_MATCH_SH) {
+		len = 2;
+#ifdef CONFIG_RISCV_ISA_C
+#ifdef CONFIG_64BIT
+	} else if ((insn & INSN_MASK_C_SD) == INSN_MATCH_C_SD) {
+		len = 8;
+		data64 = GET_RS2S(insn, &vcpu->arch.guest_context);
+	} else if ((insn & INSN_MASK_C_SDSP) == INSN_MATCH_C_SDSP &&
+		   ((insn >> SH_RD) & 0x1f)) {
+		len = 8;
+		data64 = GET_RS2C(insn, &vcpu->arch.guest_context);
+#endif
+	} else if ((insn & INSN_MASK_C_SW) == INSN_MATCH_C_SW) {
+		len = 4;
+		data32 = GET_RS2S(insn, &vcpu->arch.guest_context);
+	} else if ((insn & INSN_MASK_C_SWSP) == INSN_MATCH_C_SWSP &&
+		   ((insn >> SH_RD) & 0x1f)) {
+		len = 4;
+		data32 = GET_RS2C(insn, &vcpu->arch.guest_context);
+#endif
+	} else {
+		return -ENOTSUPP;
+	}
+
+	/* Fault address should be aligned to length of MMIO */
+	if (fault_addr & (len - 1))
+		return -EIO;
+
+	/* Clear instruction decode info */
+	vcpu->arch.mmio_decode.insn = 0;
+	vcpu->arch.mmio_decode.shift = 0;
+	vcpu->arch.mmio_decode.len = 0;
+
+	/* Copy data to kvm_run instance */
+	switch (len) {
+	case 1:
+		*((u8 *)run->mmio.data) = data8;
+		break;
+	case 2:
+		*((u16 *)run->mmio.data) = data16;
+		break;
+	case 4:
+		*((u32 *)run->mmio.data) = data32;
+		break;
+	case 8:
+		*((u64 *)run->mmio.data) = data64;
+		break;
+	default:
+		return -ENOTSUPP;
+	};
+
+	/* Exit to userspace for MMIO emulation */
+	vcpu->stat.mmio_exit_user++;
+	run->exit_reason = KVM_EXIT_MMIO;
+	run->mmio.is_write = true;
+	run->mmio.phys_addr = fault_addr;
+	run->mmio.len = len;
+
+	/* Move to next instruction */
+	vcpu->arch.guest_context.sepc += INSN_LEN(insn);
+
+	return 0;
+}
+
+static int stage2_page_fault(struct kvm_vcpu *vcpu, struct kvm_run *run,
+			     unsigned long scause, unsigned long stval)
+{
+	struct kvm_memory_slot *memslot;
+	unsigned long hva;
+	bool writable;
+	gfn_t gfn;
+	int ret;
+
+	gfn = stval >> PAGE_SHIFT;
+	memslot = gfn_to_memslot(vcpu->kvm, gfn);
+	hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
+
+	if (kvm_is_error_hva(hva) ||
+	    (scause == EXC_STORE_PAGE_FAULT && !writable)) {
+		switch (scause) {
+		case EXC_LOAD_PAGE_FAULT:
+			return emulate_load(vcpu, run, stval);
+		case EXC_STORE_PAGE_FAULT:
+			return emulate_store(vcpu, run, stval);
+		default:
+			return -ENOTSUPP;
+		};
+	}
+
+	ret = kvm_riscv_stage2_map(vcpu, stval, hva,
+			(scause == EXC_STORE_PAGE_FAULT) ? true : false);
+	if (ret < 0)
+		return ret;
+
+	return 1;
+}
 
 /**
  * kvm_riscv_vcpu_mmio_return -- Handle MMIO loads after user space emulation
@@ -19,7 +381,44 @@
  */
 int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	/* TODO: */
+	u8 data8;
+	u16 data16;
+	u32 data32;
+	u64 data64;
+	ulong insn;
+	int len, shift;
+
+	if (run->mmio.is_write)
+		return 0;
+
+	insn = vcpu->arch.mmio_decode.insn;
+	len = vcpu->arch.mmio_decode.len;
+	shift = vcpu->arch.mmio_decode.shift;
+	switch (len) {
+	case 1:
+		data8 = *((u8 *)run->mmio.data);
+		SET_RD(insn, &vcpu->arch.guest_context,
+			(ulong)data8 << shift >> shift);
+		break;
+	case 2:
+		data16 = *((u16 *)run->mmio.data);
+		SET_RD(insn, &vcpu->arch.guest_context,
+			(ulong)data16 << shift >> shift);
+		break;
+	case 4:
+		data32 = *((u32 *)run->mmio.data);
+		SET_RD(insn, &vcpu->arch.guest_context,
+			(ulong)data32 << shift >> shift);
+		break;
+	case 8:
+		data64 = *((u64 *)run->mmio.data);
+		SET_RD(insn, &vcpu->arch.guest_context,
+			(ulong)data64 << shift >> shift);
+		break;
+	default:
+		return -ENOTSUPP;
+	};
+
 	return 0;
 }
 
@@ -30,6 +429,37 @@ int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
 int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 			unsigned long scause, unsigned long stval)
 {
-	/* TODO: */
-	return 0;
+	int ret;
+
+	/* If we got host interrupt then do nothing */
+	if (scause & SCAUSE_IRQ_FLAG)
+		return 1;
+
+	/* Handle guest traps */
+	ret = -EFAULT;
+	run->exit_reason = KVM_EXIT_UNKNOWN;
+	switch (scause) {
+	case EXC_INST_PAGE_FAULT:
+	case EXC_LOAD_PAGE_FAULT:
+	case EXC_STORE_PAGE_FAULT:
+		if ((vcpu->arch.guest_context.hstatus & HSTATUS_SPV) &&
+		    (vcpu->arch.guest_context.hstatus & HSTATUS_STL))
+			ret = stage2_page_fault(vcpu, run, scause, stval);
+		break;
+	default:
+		break;
+	};
+
+	/* Print details in-case of error */
+	if (ret < 0) {
+		kvm_err("VCPU exit error %d\n", ret);
+		kvm_err("SEPC=0x%lx SSTATUS=0x%lx HSTATUS=0x%lx\n",
+			vcpu->arch.guest_context.sepc,
+			vcpu->arch.guest_context.sstatus,
+			vcpu->arch.guest_context.hstatus);
+		kvm_err("SCAUSE=0x%lx STVAL=0x%lx\n",
+			scause, stval);
+	}
+
+	return ret;
 }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH v5 11/20] RISC-V: KVM: Handle WFI exits for VCPU
  2019-08-22  8:42 [PATCH v5 00/20] KVM RISC-V Support Anup Patel
                   ` (9 preceding siblings ...)
  2019-08-22  8:44 ` [PATCH v5 10/20] RISC-V: KVM: Handle MMIO exits for VCPU Anup Patel
@ 2019-08-22  8:45 ` " Anup Patel
  2019-08-22 12:19   ` Alexander Graf
  2019-08-22  8:45 ` [PATCH v5 12/20] RISC-V: KVM: Implement VMID allocator Anup Patel
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 61+ messages in thread
From: Anup Patel @ 2019-08-22  8:45 UTC (permalink / raw)
  To: Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel, Anup Patel

We get illegal instruction trap whenever Guest/VM executes WFI
instruction.

This patch handles WFI trap by blocking the trapped VCPU using
kvm_vcpu_block() API. The blocked VCPU will be automatically
resumed whenever a VCPU interrupt is injected from user-space
or from in-kernel IRQCHIP emulation.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/riscv/kvm/vcpu_exit.c | 88 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)

diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
index efc06198c259..fbc04fe335ad 100644
--- a/arch/riscv/kvm/vcpu_exit.c
+++ b/arch/riscv/kvm/vcpu_exit.c
@@ -12,6 +12,9 @@
 #include <linux/kvm_host.h>
 #include <asm/csr.h>
 
+#define INSN_MASK_WFI		0xffffff00
+#define INSN_MATCH_WFI		0x10500000
+
 #define INSN_MATCH_LB		0x3
 #define INSN_MASK_LB		0x707f
 #define INSN_MATCH_LH		0x1003
@@ -179,6 +182,87 @@ static ulong get_insn(struct kvm_vcpu *vcpu)
 	return val;
 }
 
+typedef int (*illegal_insn_func)(struct kvm_vcpu *vcpu,
+				 struct kvm_run *run,
+				 ulong insn);
+
+static int truly_illegal_insn(struct kvm_vcpu *vcpu,
+			      struct kvm_run *run,
+			      ulong insn)
+{
+	/* TODO: Redirect trap to Guest VCPU */
+	return -ENOTSUPP;
+}
+
+static int system_opcode_insn(struct kvm_vcpu *vcpu,
+			      struct kvm_run *run,
+			      ulong insn)
+{
+	if ((insn & INSN_MASK_WFI) == INSN_MATCH_WFI) {
+		vcpu->stat.wfi_exit_stat++;
+		if (!kvm_arch_vcpu_runnable(vcpu)) {
+			srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx);
+			kvm_vcpu_block(vcpu);
+			vcpu->arch.srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+			kvm_clear_request(KVM_REQ_UNHALT, vcpu);
+		}
+		vcpu->arch.guest_context.sepc += INSN_LEN(insn);
+		return 1;
+	}
+
+	return truly_illegal_insn(vcpu, run, insn);
+}
+
+static illegal_insn_func illegal_insn_table[32] = {
+	truly_illegal_insn, /* 0 */
+	truly_illegal_insn, /* 1 */
+	truly_illegal_insn, /* 2 */
+	truly_illegal_insn, /* 3 */
+	truly_illegal_insn, /* 4 */
+	truly_illegal_insn, /* 5 */
+	truly_illegal_insn, /* 6 */
+	truly_illegal_insn, /* 7 */
+	truly_illegal_insn, /* 8 */
+	truly_illegal_insn, /* 9 */
+	truly_illegal_insn, /* 10 */
+	truly_illegal_insn, /* 11 */
+	truly_illegal_insn, /* 12 */
+	truly_illegal_insn, /* 13 */
+	truly_illegal_insn, /* 14 */
+	truly_illegal_insn, /* 15 */
+	truly_illegal_insn, /* 16 */
+	truly_illegal_insn, /* 17 */
+	truly_illegal_insn, /* 18 */
+	truly_illegal_insn, /* 19 */
+	truly_illegal_insn, /* 20 */
+	truly_illegal_insn, /* 21 */
+	truly_illegal_insn, /* 22 */
+	truly_illegal_insn, /* 23 */
+	truly_illegal_insn, /* 24 */
+	truly_illegal_insn, /* 25 */
+	truly_illegal_insn, /* 26 */
+	truly_illegal_insn, /* 27 */
+	system_opcode_insn, /* 28 */
+	truly_illegal_insn, /* 29 */
+	truly_illegal_insn, /* 30 */
+	truly_illegal_insn  /* 31 */
+};
+
+static int illegal_inst_fault(struct kvm_vcpu *vcpu, struct kvm_run *run,
+			      unsigned long stval)
+{
+	ulong insn = stval;
+
+	if (unlikely((insn & 3) != 3)) {
+		if (insn == 0)
+			insn = get_insn(vcpu);
+		if ((insn & 3) != 3)
+			return truly_illegal_insn(vcpu, run, insn);
+	}
+
+	return illegal_insn_table[(insn & 0x7c) >> 2](vcpu, run, insn);
+}
+
 static int emulate_load(struct kvm_vcpu *vcpu, struct kvm_run *run,
 			unsigned long fault_addr)
 {
@@ -439,6 +523,10 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	ret = -EFAULT;
 	run->exit_reason = KVM_EXIT_UNKNOWN;
 	switch (scause) {
+	case EXC_INST_ILLEGAL:
+		if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV)
+			ret = illegal_inst_fault(vcpu, run, stval);
+		break;
 	case EXC_INST_PAGE_FAULT:
 	case EXC_LOAD_PAGE_FAULT:
 	case EXC_STORE_PAGE_FAULT:
-- 
2.17.1


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH v5 12/20] RISC-V: KVM: Implement VMID allocator
  2019-08-22  8:42 [PATCH v5 00/20] KVM RISC-V Support Anup Patel
                   ` (10 preceding siblings ...)
  2019-08-22  8:45 ` [PATCH v5 11/20] RISC-V: KVM: Handle WFI " Anup Patel
@ 2019-08-22  8:45 ` Anup Patel
  2019-08-22  8:45 ` [PATCH v5 13/20] RISC-V: KVM: Implement stage2 page table programming Anup Patel
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 61+ messages in thread
From: Anup Patel @ 2019-08-22  8:45 UTC (permalink / raw)
  To: Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel, Anup Patel

We implement a simple VMID allocator for Guests/VMs which:
1. Detects number of VMID bits at boot-time
2. Uses atomic number to track VMID version and increments
   VMID version whenever we run-out of VMIDs
3. Flushes Guest TLBs on all host CPUs whenever we run-out
   of VMIDs
4. Force updates HW Stage2 VMID for each Guest VCPU whenever
   VMID changes using VCPU request KVM_REQ_UPDATE_HGATP

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/riscv/include/asm/kvm_host.h |  25 ++++++
 arch/riscv/kvm/Makefile           |   3 +-
 arch/riscv/kvm/main.c             |   4 +
 arch/riscv/kvm/tlb.S              |  43 +++++++++++
 arch/riscv/kvm/vcpu.c             |   9 +++
 arch/riscv/kvm/vm.c               |   6 ++
 arch/riscv/kvm/vmid.c             | 123 ++++++++++++++++++++++++++++++
 7 files changed, 212 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/kvm/tlb.S
 create mode 100644 arch/riscv/kvm/vmid.c

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 4388bace6d70..3b09158f80f2 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -27,6 +27,7 @@
 #define KVM_REQ_SLEEP \
 	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_VCPU_RESET		KVM_ARCH_REQ(1)
+#define KVM_REQ_UPDATE_HGATP		KVM_ARCH_REQ(2)
 
 struct kvm_vm_stat {
 	ulong remote_tlb_flush;
@@ -47,7 +48,19 @@ struct kvm_vcpu_stat {
 struct kvm_arch_memory_slot {
 };
 
+struct kvm_vmid {
+	/*
+	 * Writes to vmid_version and vmid happen with vmid_lock held
+	 * whereas reads happen without any lock held.
+	 */
+	unsigned long vmid_version;
+	unsigned long vmid;
+};
+
 struct kvm_arch {
+	/* stage2 vmid */
+	struct kvm_vmid vmid;
+
 	/* stage2 page table */
 	pgd_t *pgd;
 	phys_addr_t pgd_phys;
@@ -169,6 +182,12 @@ static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
 int kvm_riscv_setup_vsip(void);
 void kvm_riscv_cleanup_vsip(void);
 
+extern void __kvm_riscv_hfence_gvma_vmid_gpa(unsigned long vmid,
+					     unsigned long gpa);
+extern void __kvm_riscv_hfence_gvma_vmid(unsigned long vmid);
+extern void __kvm_riscv_hfence_gvma_gpa(unsigned long gpa);
+extern void __kvm_riscv_hfence_gvma_all(void);
+
 int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
 			 bool is_write);
 void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu);
@@ -176,6 +195,12 @@ int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm);
 void kvm_riscv_stage2_free_pgd(struct kvm *kvm);
 void kvm_riscv_stage2_update_hgatp(struct kvm_vcpu *vcpu);
 
+void kvm_riscv_stage2_vmid_detect(void);
+unsigned long kvm_riscv_stage2_vmid_bits(void);
+int kvm_riscv_stage2_vmid_init(struct kvm *kvm);
+bool kvm_riscv_stage2_vmid_ver_changed(struct kvm_vmid *vmid);
+void kvm_riscv_stage2_vmid_update(struct kvm_vcpu *vcpu);
+
 int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 			unsigned long scause, unsigned long stval);
diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
index 845579273727..c0f57f26c13d 100644
--- a/arch/riscv/kvm/Makefile
+++ b/arch/riscv/kvm/Makefile
@@ -8,6 +8,7 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
 
 kvm-objs := $(common-objs-y)
 
-kvm-objs += main.o vm.o mmu.o vcpu.o vcpu_exit.o vcpu_switch.o
+kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
+kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o
 
 obj-$(CONFIG_KVM)	+= kvm.o
diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
index d088247843c5..55df85184241 100644
--- a/arch/riscv/kvm/main.c
+++ b/arch/riscv/kvm/main.c
@@ -72,8 +72,12 @@ int kvm_arch_init(void *opaque)
 	if (ret)
 		return ret;
 
+	kvm_riscv_stage2_vmid_detect();
+
 	kvm_info("hypervisor extension available\n");
 
+	kvm_info("host has %ld VMID bits\n", kvm_riscv_stage2_vmid_bits());
+
 	return 0;
 }
 
diff --git a/arch/riscv/kvm/tlb.S b/arch/riscv/kvm/tlb.S
new file mode 100644
index 000000000000..453fca8d7940
--- /dev/null
+++ b/arch/riscv/kvm/tlb.S
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ *     Anup Patel <anup.patel@wdc.com>
+ */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+
+	.text
+	.altmacro
+	.option norelax
+
+	/*
+	 * Instruction encoding of hfence.gvma is:
+	 * 0110001 rs2(5) rs1(5) 000 00000 1110011
+	 */
+
+ENTRY(__kvm_riscv_hfence_gvma_vmid_gpa)
+	/* hfence.gvma a1, a0 */
+	.word 0x62a60073
+	ret
+ENDPROC(__kvm_riscv_hfence_gvma_vmid_gpa)
+
+ENTRY(__kvm_riscv_hfence_gvma_vmid)
+	/* hfence.gvma zero, a0 */
+	.word 0x62a00073
+	ret
+ENDPROC(__kvm_riscv_hfence_gvma_vmid)
+
+ENTRY(__kvm_riscv_hfence_gvma_gpa)
+	/* hfence.gvma a0 */
+	.word 0x62050073
+	ret
+ENDPROC(__kvm_riscv_hfence_gvma_gpa)
+
+ENTRY(__kvm_riscv_hfence_gvma_all)
+	/* hfence.gvma */
+	.word 0x62000073
+	ret
+ENDPROC(__kvm_riscv_hfence_gvma_all)
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index e6d74a9a2fdf..6124077d154f 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -628,6 +628,12 @@ static void kvm_riscv_check_vcpu_requests(struct kvm_vcpu *vcpu)
 
 		if (kvm_check_request(KVM_REQ_VCPU_RESET, vcpu))
 			kvm_riscv_reset_vcpu(vcpu);
+
+		if (kvm_check_request(KVM_REQ_UPDATE_HGATP, vcpu))
+			kvm_riscv_stage2_update_hgatp(vcpu);
+
+		if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
+			__kvm_riscv_hfence_gvma_all();
 	}
 }
 
@@ -690,6 +696,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		/* Check conditions before entering the guest */
 		cond_resched();
 
+		kvm_riscv_stage2_vmid_update(vcpu);
+
 		kvm_riscv_check_vcpu_requests(vcpu);
 
 		preempt_disable();
@@ -726,6 +734,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		kvm_riscv_update_vsip(vcpu);
 
 		if (ret <= 0 ||
+		    kvm_riscv_stage2_vmid_ver_changed(&vcpu->kvm->arch.vmid) ||
 		    kvm_request_pending(vcpu)) {
 			vcpu->mode = OUTSIDE_GUEST_MODE;
 			local_irq_enable();
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
index ac0211820521..c5aab5478c38 100644
--- a/arch/riscv/kvm/vm.c
+++ b/arch/riscv/kvm/vm.c
@@ -26,6 +26,12 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	if (r)
 		return r;
 
+	r = kvm_riscv_stage2_vmid_init(kvm);
+	if (r) {
+		kvm_riscv_stage2_free_pgd(kvm);
+		return r;
+	}
+
 	return 0;
 }
 
diff --git a/arch/riscv/kvm/vmid.c b/arch/riscv/kvm/vmid.c
new file mode 100644
index 000000000000..8154feea12d5
--- /dev/null
+++ b/arch/riscv/kvm/vmid.c
@@ -0,0 +1,123 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ *     Anup Patel <anup.patel@wdc.com>
+ */
+
+#include <linux/bitops.h>
+#include <linux/cpumask.h>
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/kvm_host.h>
+#include <asm/csr.h>
+
+static unsigned long vmid_version = 1;
+static unsigned long vmid_next;
+static unsigned long vmid_bits;
+static DEFINE_SPINLOCK(vmid_lock);
+
+void kvm_riscv_stage2_vmid_detect(void)
+{
+	unsigned long old;
+
+	/* Figure-out number of VMID bits in HW */
+	old = csr_read(CSR_HGATP);
+	csr_write(CSR_HGATP, old | HGATP_VMID_MASK);
+	vmid_bits = csr_read(CSR_HGATP);
+	vmid_bits = (vmid_bits & HGATP_VMID_MASK) >> HGATP_VMID_SHIFT;
+	vmid_bits = fls_long(vmid_bits);
+	csr_write(CSR_HGATP, old);
+
+	/* We polluted local TLB so flush all guest TLB */
+	__kvm_riscv_hfence_gvma_all();
+
+	/* We don't use VMID bits if they are not sufficient */
+	if ((1UL << vmid_bits) < num_possible_cpus())
+		vmid_bits = 0;
+}
+
+unsigned long kvm_riscv_stage2_vmid_bits(void)
+{
+	return vmid_bits;
+}
+
+int kvm_riscv_stage2_vmid_init(struct kvm *kvm)
+{
+	/* Mark the initial VMID and VMID version invalid */
+	kvm->arch.vmid.vmid_version = 0;
+	kvm->arch.vmid.vmid = 0;
+
+	return 0;
+}
+
+static void local_guest_tlb_flush_all(void *info)
+{
+	__kvm_riscv_hfence_gvma_all();
+}
+
+bool kvm_riscv_stage2_vmid_ver_changed(struct kvm_vmid *vmid)
+{
+	if (!vmid_bits)
+		return false;
+
+	return unlikely(READ_ONCE(vmid->vmid_version) !=
+			READ_ONCE(vmid_version));
+}
+
+void kvm_riscv_stage2_vmid_update(struct kvm_vcpu *vcpu)
+{
+	int i;
+	struct kvm_vcpu *v;
+	struct kvm_vmid *vmid = &vcpu->kvm->arch.vmid;
+
+	if (!kvm_riscv_stage2_vmid_ver_changed(vmid))
+		return;
+
+	spin_lock(&vmid_lock);
+
+	/*
+	 * We need to re-check the vmid_version here to ensure that if
+	 * another vcpu already allocated a valid vmid for this vm.
+	 */
+	if (!kvm_riscv_stage2_vmid_ver_changed(vmid)) {
+		spin_unlock(&vmid_lock);
+		return;
+	}
+
+	/* First user of a new VMID version? */
+	if (unlikely(vmid_next == 0)) {
+		WRITE_ONCE(vmid_version, READ_ONCE(vmid_version) + 1);
+		vmid_next = 1;
+
+		/*
+		 * We ran out of VMIDs so we increment vmid_version and
+		 * start assigning VMIDs from 1.
+		 *
+		 * This also means existing VMIDs assignement to all Guest
+		 * instances is invalid and we have force VMID re-assignement
+		 * for all Guest instances. The Guest instances that were not
+		 * running will automatically pick-up new VMIDs because will
+		 * call kvm_riscv_stage2_vmid_update() whenever they enter
+		 * in-kernel run loop. For Guest instances that are already
+		 * running, we force VM exits on all host CPUs using IPI and
+		 * flush all Guest TLBs.
+		 */
+		smp_call_function_many(cpu_all_mask, local_guest_tlb_flush_all,
+				       NULL, true);
+	}
+
+	vmid->vmid = vmid_next;
+	vmid_next++;
+	vmid_next &= (1 << vmid_bits) - 1;
+
+	WRITE_ONCE(vmid->vmid_version, READ_ONCE(vmid_version));
+
+	spin_unlock(&vmid_lock);
+
+	/* Request stage2 page table update for all VCPUs */
+	kvm_for_each_vcpu(i, v, vcpu->kvm)
+		kvm_make_request(KVM_REQ_UPDATE_HGATP, v);
+}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH v5 13/20] RISC-V: KVM: Implement stage2 page table programming
  2019-08-22  8:42 [PATCH v5 00/20] KVM RISC-V Support Anup Patel
                   ` (11 preceding siblings ...)
  2019-08-22  8:45 ` [PATCH v5 12/20] RISC-V: KVM: Implement VMID allocator Anup Patel
@ 2019-08-22  8:45 ` Anup Patel
  2019-08-22 12:28   ` Alexander Graf
  2019-08-22  8:45 ` [PATCH v5 14/20] RISC-V: KVM: Implement MMU notifiers Anup Patel
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 61+ messages in thread
From: Anup Patel @ 2019-08-22  8:45 UTC (permalink / raw)
  To: Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel, Anup Patel

This patch implements all required functions for programming
the stage2 page table for each Guest/VM.

At high-level, the flow of stage2 related functions is similar
from KVM ARM/ARM64 implementation but the stage2 page table
format is quite different for KVM RISC-V.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/riscv/include/asm/kvm_host.h     |  10 +
 arch/riscv/include/asm/pgtable-bits.h |   1 +
 arch/riscv/kvm/mmu.c                  | 637 +++++++++++++++++++++++++-
 3 files changed, 638 insertions(+), 10 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 3b09158f80f2..a37775c92586 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -72,6 +72,13 @@ struct kvm_mmio_decode {
 	int shift;
 };
 
+#define KVM_MMU_PAGE_CACHE_NR_OBJS	32
+
+struct kvm_mmu_page_cache {
+	int nobjs;
+	void *objects[KVM_MMU_PAGE_CACHE_NR_OBJS];
+};
+
 struct kvm_cpu_context {
 	unsigned long zero;
 	unsigned long ra;
@@ -163,6 +170,9 @@ struct kvm_vcpu_arch {
 	/* MMIO instruction details */
 	struct kvm_mmio_decode mmio_decode;
 
+	/* Cache pages needed to program page tables with spinlock held */
+	struct kvm_mmu_page_cache mmu_page_cache;
+
 	/* VCPU power-off state */
 	bool power_off;
 
diff --git a/arch/riscv/include/asm/pgtable-bits.h b/arch/riscv/include/asm/pgtable-bits.h
index bbaeb5d35842..be49d62fcc2b 100644
--- a/arch/riscv/include/asm/pgtable-bits.h
+++ b/arch/riscv/include/asm/pgtable-bits.h
@@ -26,6 +26,7 @@
 
 #define _PAGE_SPECIAL   _PAGE_SOFT
 #define _PAGE_TABLE     _PAGE_PRESENT
+#define _PAGE_LEAF      (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
 
 /*
  * _PAGE_PROT_NONE is set on not-present pages (and ignored by the hardware) to
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 2b965f9aac07..9e95ab6769f6 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -18,6 +18,432 @@
 #include <asm/page.h>
 #include <asm/pgtable.h>
 
+#ifdef CONFIG_64BIT
+#define stage2_have_pmd		true
+#define stage2_gpa_size		((phys_addr_t)(1ULL << 39))
+#define stage2_cache_min_pages	2
+#else
+#define pmd_index(x)		0
+#define pfn_pmd(x, y)		({ pmd_t __x = { 0 }; __x; })
+#define stage2_have_pmd		false
+#define stage2_gpa_size		((phys_addr_t)(1ULL << 32))
+#define stage2_cache_min_pages	1
+#endif
+
+static int stage2_cache_topup(struct kvm_mmu_page_cache *pcache,
+			      int min, int max)
+{
+	void *page;
+
+	BUG_ON(max > KVM_MMU_PAGE_CACHE_NR_OBJS);
+	if (pcache->nobjs >= min)
+		return 0;
+	while (pcache->nobjs < max) {
+		page = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		if (!page)
+			return -ENOMEM;
+		pcache->objects[pcache->nobjs++] = page;
+	}
+
+	return 0;
+}
+
+static void stage2_cache_flush(struct kvm_mmu_page_cache *pcache)
+{
+	while (pcache && pcache->nobjs)
+		free_page((unsigned long)pcache->objects[--pcache->nobjs]);
+}
+
+static void *stage2_cache_alloc(struct kvm_mmu_page_cache *pcache)
+{
+	void *p;
+
+	if (!pcache)
+		return NULL;
+
+	BUG_ON(!pcache->nobjs);
+	p = pcache->objects[--pcache->nobjs];
+
+	return p;
+}
+
+struct local_guest_tlb_info {
+	struct kvm_vmid *vmid;
+	gpa_t addr;
+};
+
+static void local_guest_tlb_flush_vmid_gpa(void *info)
+{
+	struct local_guest_tlb_info *infop = info;
+
+	__kvm_riscv_hfence_gvma_vmid_gpa(READ_ONCE(infop->vmid->vmid_version),
+					 infop->addr);
+}
+
+static void stage2_remote_tlb_flush(struct kvm *kvm, gpa_t addr)
+{
+	struct local_guest_tlb_info info;
+	struct kvm_vmid *vmid = &kvm->arch.vmid;
+
+	/* TODO: This should be SBI call */
+	info.vmid = vmid;
+	info.addr = addr;
+	preempt_disable();
+	smp_call_function_many(cpu_all_mask, local_guest_tlb_flush_vmid_gpa,
+			       &info, true);
+	preempt_enable();
+}
+
+static int stage2_set_pgd(struct kvm *kvm, gpa_t addr, const pgd_t *new_pgd)
+{
+	pgd_t *pgdp = &kvm->arch.pgd[pgd_index(addr)];
+
+	*pgdp = *new_pgd;
+	if (pgd_val(*pgdp) & _PAGE_LEAF)
+		stage2_remote_tlb_flush(kvm, addr);
+
+	return 0;
+}
+
+static int stage2_set_pmd(struct kvm *kvm, struct kvm_mmu_page_cache *pcache,
+			  gpa_t addr, const pmd_t *new_pmd)
+{
+	int rc;
+	pmd_t *pmdp;
+	pgd_t new_pgd;
+	pgd_t *pgdp = &kvm->arch.pgd[pgd_index(addr)];
+
+	if (!pgd_val(*pgdp)) {
+		pmdp = stage2_cache_alloc(pcache);
+		if (!pmdp)
+			return -ENOMEM;
+		new_pgd = pfn_pgd(PFN_DOWN(__pa(pmdp)), __pgprot(_PAGE_TABLE));
+		rc = stage2_set_pgd(kvm, addr, &new_pgd);
+		if (rc)
+			return rc;
+	}
+
+	if (pgd_val(*pgdp) & _PAGE_LEAF)
+		return -EEXIST;
+
+	pmdp = (void *)pgd_page_vaddr(*pgdp);
+	pmdp = &pmdp[pmd_index(addr)];
+
+	*pmdp = *new_pmd;
+	if (pmd_val(*pmdp) & _PAGE_LEAF)
+		stage2_remote_tlb_flush(kvm, addr);
+
+	return 0;
+}
+
+static int stage2_set_pte(struct kvm *kvm,
+			  struct kvm_mmu_page_cache *pcache,
+			  gpa_t addr, const pte_t *new_pte)
+{
+	int rc;
+	pte_t *ptep;
+	pmd_t new_pmd;
+	pmd_t *pmdp;
+	pgd_t new_pgd;
+	pgd_t *pgdp = &kvm->arch.pgd[pgd_index(addr)];
+
+	if (!pgd_val(*pgdp)) {
+		pmdp = stage2_cache_alloc(pcache);
+		if (!pmdp)
+			return -ENOMEM;
+		new_pgd = pfn_pgd(PFN_DOWN(__pa(pmdp)), __pgprot(_PAGE_TABLE));
+		rc = stage2_set_pgd(kvm, addr, &new_pgd);
+		if (rc)
+			return rc;
+	}
+
+	if (pgd_val(*pgdp) & _PAGE_LEAF)
+		return -EEXIST;
+
+	if (stage2_have_pmd) {
+		pmdp = (void *)pgd_page_vaddr(*pgdp);
+		pmdp = &pmdp[pmd_index(addr)];
+		if (!pmd_present(*pmdp)) {
+			ptep = stage2_cache_alloc(pcache);
+			if (!ptep)
+				return -ENOMEM;
+			new_pmd = pfn_pmd(PFN_DOWN(__pa(ptep)),
+					  __pgprot(_PAGE_TABLE));
+			rc = stage2_set_pmd(kvm, pcache, addr, &new_pmd);
+			if (rc)
+				return rc;
+		}
+
+		if (pmd_val(*pmdp) & _PAGE_LEAF)
+			return -EEXIST;
+
+		ptep = (void *)pmd_page_vaddr(*pmdp);
+	} else {
+		ptep = (void *)pgd_page_vaddr(*pgdp);
+	}
+
+	ptep = &ptep[pte_index(addr)];
+
+	*ptep = *new_pte;
+	if (pte_val(*ptep) & _PAGE_LEAF)
+		stage2_remote_tlb_flush(kvm, addr);
+
+	return 0;
+}
+
+static int stage2_map_page(struct kvm *kvm,
+			   struct kvm_mmu_page_cache *pcache,
+			   gpa_t gpa, phys_addr_t hpa,
+			   unsigned long page_size, pgprot_t prot)
+{
+	pte_t new_pte;
+	pmd_t new_pmd;
+	pgd_t new_pgd;
+
+	if (page_size == PAGE_SIZE) {
+		new_pte = pfn_pte(PFN_DOWN(hpa), prot);
+		return stage2_set_pte(kvm, pcache, gpa, &new_pte);
+	}
+
+	if (stage2_have_pmd && page_size == PMD_SIZE) {
+		new_pmd = pfn_pmd(PFN_DOWN(hpa), prot);
+		return stage2_set_pmd(kvm, pcache, gpa, &new_pmd);
+	}
+
+	if (page_size == PGDIR_SIZE) {
+		new_pgd = pfn_pgd(PFN_DOWN(hpa), prot);
+		return stage2_set_pgd(kvm, gpa, &new_pgd);
+	}
+
+	return -EINVAL;
+}
+
+enum stage2_op {
+	STAGE2_OP_NOP = 0,	/* Nothing */
+	STAGE2_OP_CLEAR,	/* Clear/Unmap */
+	STAGE2_OP_WP,		/* Write-protect */
+};
+
+static void stage2_op_pte(struct kvm *kvm, gpa_t addr, pte_t *ptep,
+			  enum stage2_op op)
+{
+	BUG_ON(addr & (PAGE_SIZE - 1));
+
+	if (!pte_present(*ptep))
+		return;
+
+	if (op == STAGE2_OP_CLEAR)
+		set_pte(ptep, __pte(0));
+	else if (op == STAGE2_OP_WP)
+		set_pte(ptep, __pte(pte_val(*ptep) & ~_PAGE_WRITE));
+	stage2_remote_tlb_flush(kvm, addr);
+}
+
+static void stage2_op_pmd(struct kvm *kvm, gpa_t addr, pmd_t *pmdp,
+			  enum stage2_op op)
+{
+	int i;
+	pte_t *ptep;
+
+	BUG_ON(addr & (PMD_SIZE - 1));
+
+	if (!pmd_present(*pmdp))
+		return;
+
+	if (pmd_val(*pmdp) & _PAGE_LEAF)
+		ptep = NULL;
+	else
+		ptep = (pte_t *)pmd_page_vaddr(*pmdp);
+
+	if (op == STAGE2_OP_CLEAR)
+		set_pmd(pmdp, __pmd(0));
+
+	if (ptep) {
+		for (i = 0; i < PTRS_PER_PTE; i++)
+			stage2_op_pte(kvm, addr + i * PAGE_SIZE, &ptep[i], op);
+		if (op == STAGE2_OP_CLEAR)
+			put_page(virt_to_page(ptep));
+	} else {
+		if (op == STAGE2_OP_WP)
+			set_pmd(pmdp, __pmd(pmd_val(*pmdp) & ~_PAGE_WRITE));
+		stage2_remote_tlb_flush(kvm, addr);
+	}
+}
+
+static void stage2_op_pgd(struct kvm *kvm, gpa_t addr, pgd_t *pgdp,
+			  enum stage2_op op)
+{
+	int i;
+	pte_t *ptep;
+	pmd_t *pmdp;
+
+	BUG_ON(addr & (PGDIR_SIZE - 1));
+
+	if (!pgd_val(*pgdp))
+		return;
+
+	ptep = NULL;
+	pmdp = NULL;
+	if (!(pgd_val(*pgdp) & _PAGE_LEAF)) {
+		if (stage2_have_pmd)
+			pmdp = (pmd_t *)pgd_page_vaddr(*pgdp);
+		else
+			ptep = (pte_t *)pgd_page_vaddr(*pgdp);
+	}
+
+	if (op == STAGE2_OP_CLEAR)
+		set_pgd(pgdp, __pgd(0));
+
+	if (pmdp) {
+		for (i = 0; i < PTRS_PER_PMD; i++)
+			stage2_op_pmd(kvm, addr + i * PMD_SIZE, &pmdp[i], op);
+		if (op == STAGE2_OP_CLEAR)
+			put_page(virt_to_page(pmdp));
+	} else if (ptep) {
+		for (i = 0; i < PTRS_PER_PTE; i++)
+			stage2_op_pte(kvm, addr + i * PAGE_SIZE, &ptep[i], op);
+		if (op == STAGE2_OP_CLEAR)
+			put_page(virt_to_page(ptep));
+	} else {
+		if (op == STAGE2_OP_WP)
+			set_pgd(pgdp, __pgd(pgd_val(*pgdp) & ~_PAGE_WRITE));
+		stage2_remote_tlb_flush(kvm, addr);
+	}
+}
+
+static void stage2_unmap_range(struct kvm *kvm, gpa_t start, gpa_t size)
+{
+	pmd_t *pmdp;
+	pte_t *ptep;
+	pgd_t *pgdp;
+	gpa_t addr = start, end = start + size;
+
+	while (addr < end) {
+		pgdp = &kvm->arch.pgd[pgd_index(addr)];
+		if (!pgd_val(*pgdp)) {
+			addr += PGDIR_SIZE;
+			continue;
+		} else if (!(addr & (PGDIR_SIZE - 1)) &&
+			  ((end - addr) >= PGDIR_SIZE)) {
+			stage2_op_pgd(kvm, addr, pgdp, STAGE2_OP_CLEAR);
+			addr += PGDIR_SIZE;
+			continue;
+		}
+
+		if (stage2_have_pmd) {
+			pmdp = (pmd_t *)pgd_page_vaddr(*pgdp);
+			if (!pmd_present(*pmdp)) {
+				addr += PMD_SIZE;
+				continue;
+			} else if (!(addr & (PMD_SIZE - 1)) &&
+				   ((end - addr) >= PMD_SIZE)) {
+				stage2_op_pmd(kvm, addr, pmdp,
+					      STAGE2_OP_CLEAR);
+				addr += PMD_SIZE;
+				continue;
+			}
+			ptep = (pte_t *)pmd_page_vaddr(*pmdp);
+		} else {
+			ptep = (pte_t *)pgd_page_vaddr(*pgdp);
+		}
+
+		stage2_op_pte(kvm, addr, ptep, STAGE2_OP_CLEAR);
+		addr += PAGE_SIZE;
+	}
+}
+
+static void stage2_wp_range(struct kvm *kvm, gpa_t start, gpa_t end)
+{
+	pmd_t *pmdp;
+	pte_t *ptep;
+	pgd_t *pgdp;
+	gpa_t addr = start;
+
+	while (addr < end) {
+		pgdp = &kvm->arch.pgd[pgd_index(addr)];
+		if (!pgd_val(*pgdp)) {
+			addr += PGDIR_SIZE;
+			continue;
+		} else if (!(addr & (PGDIR_SIZE - 1)) &&
+			   ((end - addr) >= PGDIR_SIZE)) {
+			stage2_op_pgd(kvm, addr, pgdp, STAGE2_OP_WP);
+			addr += PGDIR_SIZE;
+			continue;
+		}
+
+		if (stage2_have_pmd) {
+			pmdp = (pmd_t *)pgd_page_vaddr(*pgdp);
+			if (!pmd_present(*pmdp)) {
+				addr += PMD_SIZE;
+				continue;
+			} else if (!(addr & (PMD_SIZE - 1)) &&
+				   ((end - addr) >= PMD_SIZE)) {
+				stage2_op_pmd(kvm, addr, pmdp, STAGE2_OP_WP);
+				addr += PMD_SIZE;
+				continue;
+			}
+			ptep = (pte_t *)pmd_page_vaddr(*pmdp);
+		} else {
+			ptep = (pte_t *)pgd_page_vaddr(*pgdp);
+		}
+
+		stage2_op_pte(kvm, addr, ptep, STAGE2_OP_WP);
+		addr += PAGE_SIZE;
+	}
+}
+
+void stage2_wp_memory_region(struct kvm *kvm, int slot)
+{
+	struct kvm_memslots *slots = kvm_memslots(kvm);
+	struct kvm_memory_slot *memslot = id_to_memslot(slots, slot);
+	phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
+	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
+
+	spin_lock(&kvm->mmu_lock);
+	stage2_wp_range(kvm, start, end);
+	spin_unlock(&kvm->mmu_lock);
+	kvm_flush_remote_tlbs(kvm);
+}
+
+int stage2_ioremap(struct kvm *kvm, gpa_t gpa, phys_addr_t hpa,
+		   unsigned long size, bool writable)
+{
+	pte_t pte;
+	int ret = 0;
+	unsigned long pfn;
+	phys_addr_t addr, end;
+	struct kvm_mmu_page_cache pcache = { 0, };
+
+	end = (gpa + size + PAGE_SIZE - 1) & PAGE_MASK;
+	pfn = __phys_to_pfn(hpa);
+
+	for (addr = gpa; addr < end; addr += PAGE_SIZE) {
+		pte = pfn_pte(pfn, PAGE_KERNEL);
+
+		if (!writable)
+			pte = pte_wrprotect(pte);
+
+		ret = stage2_cache_topup(&pcache,
+					 stage2_cache_min_pages,
+					 KVM_MMU_PAGE_CACHE_NR_OBJS);
+		if (ret)
+			goto out;
+
+		spin_lock(&kvm->mmu_lock);
+		ret = stage2_set_pte(kvm, &pcache, addr, &pte);
+		spin_unlock(&kvm->mmu_lock);
+		if (ret)
+			goto out;
+
+		pfn++;
+	}
+
+out:
+	stage2_cache_flush(&pcache);
+	return ret;
+
+}
+
 void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
 			   struct kvm_memory_slot *dont)
 {
@@ -35,7 +461,7 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
 
 void kvm_arch_flush_shadow_all(struct kvm *kvm)
 {
-	/* TODO: */
+	kvm_riscv_stage2_free_pgd(kvm);
 }
 
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
@@ -49,7 +475,13 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 				const struct kvm_memory_slot *new,
 				enum kvm_mr_change change)
 {
-	/* TODO: */
+	/*
+	 * At this point memslot has been committed and there is an
+	 * allocated dirty_bitmap[], dirty pages will be be tracked while the
+	 * memory slot is write protected.
+	 */
+	if (change != KVM_MR_DELETE && mem->flags & KVM_MEM_LOG_DIRTY_PAGES)
+		stage2_wp_memory_region(kvm, mem->slot);
 }
 
 int kvm_arch_prepare_memory_region(struct kvm *kvm,
@@ -57,34 +489,219 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				const struct kvm_userspace_memory_region *mem,
 				enum kvm_mr_change change)
 {
-	/* TODO: */
-	return 0;
+	hva_t hva = mem->userspace_addr;
+	hva_t reg_end = hva + mem->memory_size;
+	bool writable = !(mem->flags & KVM_MEM_READONLY);
+	int ret = 0;
+
+	if (change != KVM_MR_CREATE && change != KVM_MR_MOVE &&
+			change != KVM_MR_FLAGS_ONLY)
+		return 0;
+
+	/*
+	 * Prevent userspace from creating a memory region outside of the GPA
+	 * space addressable by the KVM guest GPA space.
+	 */
+	if ((memslot->base_gfn + memslot->npages) >=
+	    (stage2_gpa_size >> PAGE_SHIFT))
+		return -EFAULT;
+
+	down_read(&current->mm->mmap_sem);
+
+	/*
+	 * A memory region could potentially cover multiple VMAs, and
+	 * any holes between them, so iterate over all of them to find
+	 * out if we can map any of them right now.
+	 *
+	 *     +--------------------------------------------+
+	 * +---------------+----------------+   +----------------+
+	 * |   : VMA 1     |      VMA 2     |   |    VMA 3  :    |
+	 * +---------------+----------------+   +----------------+
+	 *     |               memory region                |
+	 *     +--------------------------------------------+
+	 */
+	do {
+		struct vm_area_struct *vma = find_vma(current->mm, hva);
+		hva_t vm_start, vm_end;
+
+		if (!vma || vma->vm_start >= reg_end)
+			break;
+
+		/*
+		 * Mapping a read-only VMA is only allowed if the
+		 * memory region is configured as read-only.
+		 */
+		if (writable && !(vma->vm_flags & VM_WRITE)) {
+			ret = -EPERM;
+			break;
+		}
+
+		/* Take the intersection of this VMA with the memory region */
+		vm_start = max(hva, vma->vm_start);
+		vm_end = min(reg_end, vma->vm_end);
+
+		if (vma->vm_flags & VM_PFNMAP) {
+			gpa_t gpa = mem->guest_phys_addr +
+				    (vm_start - mem->userspace_addr);
+			phys_addr_t pa;
+
+			pa = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT;
+			pa += vm_start - vma->vm_start;
+
+			/* IO region dirty page logging not allowed */
+			if (memslot->flags & KVM_MEM_LOG_DIRTY_PAGES) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			ret = stage2_ioremap(kvm, gpa, pa,
+					     vm_end - vm_start, writable);
+			if (ret)
+				break;
+		}
+		hva = vm_end;
+	} while (hva < reg_end);
+
+	if (change == KVM_MR_FLAGS_ONLY)
+		goto out;
+
+	spin_lock(&kvm->mmu_lock);
+	if (ret)
+		stage2_unmap_range(kvm, mem->guest_phys_addr,
+				   mem->memory_size);
+	spin_unlock(&kvm->mmu_lock);
+
+out:
+	up_read(&current->mm->mmap_sem);
+	return ret;
 }
 
 int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
 			 bool is_write)
 {
-	/* TODO: */
-	return 0;
+	int ret;
+	short lsb;
+	kvm_pfn_t hfn;
+	bool writeable;
+	gfn_t gfn = gpa >> PAGE_SHIFT;
+	struct vm_area_struct *vma;
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_mmu_page_cache *pcache = &vcpu->arch.mmu_page_cache;
+	unsigned long vma_pagesize;
+
+	down_read(&current->mm->mmap_sem);
+
+	vma = find_vma_intersection(current->mm, hva, hva + 1);
+	if (unlikely(!vma)) {
+		kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
+		up_read(&current->mm->mmap_sem);
+		return -EFAULT;
+	}
+
+	vma_pagesize = vma_kernel_pagesize(vma);
+
+	up_read(&current->mm->mmap_sem);
+
+	if (vma_pagesize != PGDIR_SIZE &&
+	    vma_pagesize != PMD_SIZE &&
+	    vma_pagesize != PAGE_SIZE) {
+		kvm_err("Invalid VMA page size 0x%lx\n", vma_pagesize);
+		return -EFAULT;
+	}
+
+	/* We need minimum second+third level pages */
+	ret = stage2_cache_topup(pcache, stage2_cache_min_pages,
+				 KVM_MMU_PAGE_CACHE_NR_OBJS);
+	if (ret) {
+		kvm_err("Failed to topup stage2 cache\n");
+		return ret;
+	}
+
+	hfn = gfn_to_pfn_prot(kvm, gfn, is_write, &writeable);
+	if (hfn == KVM_PFN_ERR_HWPOISON) {
+		if (is_vm_hugetlb_page(vma))
+			lsb = huge_page_shift(hstate_vma(vma));
+		else
+			lsb = PAGE_SHIFT;
+
+		send_sig_mceerr(BUS_MCEERR_AR, (void __user *)hva,
+				lsb, current);
+		return 0;
+	}
+	if (is_error_noslot_pfn(hfn))
+		return -EFAULT;
+	if (!writeable && is_write)
+		return -EPERM;
+
+	spin_lock(&kvm->mmu_lock);
+
+	if (writeable) {
+		kvm_set_pfn_dirty(hfn);
+		ret = stage2_map_page(kvm, pcache, gpa, hfn << PAGE_SHIFT,
+				      vma_pagesize, PAGE_WRITE_EXEC);
+	} else {
+		ret = stage2_map_page(kvm, pcache, gpa, hfn << PAGE_SHIFT,
+				      vma_pagesize, PAGE_READ_EXEC);
+	}
+
+	if (ret)
+		kvm_err("Failed to map in stage2\n");
+
+	spin_unlock(&kvm->mmu_lock);
+	kvm_set_pfn_accessed(hfn);
+	kvm_release_pfn_clean(hfn);
+	return ret;
 }
 
 void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu)
 {
-	/* TODO: */
+	stage2_cache_flush(&vcpu->arch.mmu_page_cache);
 }
 
 int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm)
 {
-	/* TODO: */
+	if (kvm->arch.pgd != NULL) {
+		kvm_err("kvm_arch already initialized?\n");
+		return -EINVAL;
+	}
+
+	kvm->arch.pgd = alloc_pages_exact(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO);
+	if (!kvm->arch.pgd)
+		return -ENOMEM;
+	kvm->arch.pgd_phys = virt_to_phys(kvm->arch.pgd);
+
 	return 0;
 }
 
 void kvm_riscv_stage2_free_pgd(struct kvm *kvm)
 {
-	/* TODO: */
+	void *pgd = NULL;
+
+	spin_lock(&kvm->mmu_lock);
+	if (kvm->arch.pgd) {
+		stage2_unmap_range(kvm, 0UL, stage2_gpa_size);
+		pgd = READ_ONCE(kvm->arch.pgd);
+		kvm->arch.pgd = NULL;
+		kvm->arch.pgd_phys = 0;
+	}
+	spin_unlock(&kvm->mmu_lock);
+
+	/* Free the HW pgd, one page at a time */
+	if (pgd)
+		free_pages_exact(pgd, PAGE_SIZE);
 }
 
 void kvm_riscv_stage2_update_hgatp(struct kvm_vcpu *vcpu)
 {
-	/* TODO: */
+	unsigned long hgatp = HGATP_MODE;
+	struct kvm_arch *k = &vcpu->kvm->arch;
+
+	hgatp |= (READ_ONCE(k->vmid.vmid) << HGATP_VMID_SHIFT) &
+		 HGATP_VMID_MASK;
+	hgatp |= (k->pgd_phys >> PAGE_SHIFT) & HGATP_PPN;
+
+	csr_write(CSR_HGATP, hgatp);
+
+	if (!kvm_riscv_stage2_vmid_bits())
+		__kvm_riscv_hfence_gvma_all();
 }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH v5 14/20] RISC-V: KVM: Implement MMU notifiers
  2019-08-22  8:42 [PATCH v5 00/20] KVM RISC-V Support Anup Patel
                   ` (12 preceding siblings ...)
  2019-08-22  8:45 ` [PATCH v5 13/20] RISC-V: KVM: Implement stage2 page table programming Anup Patel
@ 2019-08-22  8:45 ` Anup Patel
  2019-08-22  8:46 ` [PATCH v5 15/20] RISC-V: KVM: Add timer functionality Anup Patel
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 61+ messages in thread
From: Anup Patel @ 2019-08-22  8:45 UTC (permalink / raw)
  To: Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel, Anup Patel

This patch implements MMU notifiers for KVM RISC-V so that Guest
physical address space is in-sync with Host physical address space.

This will allow swapping, page migration, etc to work transparently
with KVM RISC-V.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/riscv/include/asm/kvm_host.h |   7 ++
 arch/riscv/kvm/Kconfig            |   1 +
 arch/riscv/kvm/mmu.c              | 200 +++++++++++++++++++++++++++++-
 arch/riscv/kvm/vm.c               |   1 +
 4 files changed, 208 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index a37775c92586..ab33e59a3d88 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -192,6 +192,13 @@ static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
 int kvm_riscv_setup_vsip(void);
 void kvm_riscv_cleanup_vsip(void);
 
+#define KVM_ARCH_WANT_MMU_NOTIFIER
+int kvm_unmap_hva_range(struct kvm *kvm,
+			unsigned long start, unsigned long end);
+int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
+int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
+
 extern void __kvm_riscv_hfence_gvma_vmid_gpa(unsigned long vmid,
 					     unsigned long gpa);
 extern void __kvm_riscv_hfence_gvma_vmid(unsigned long vmid);
diff --git a/arch/riscv/kvm/Kconfig b/arch/riscv/kvm/Kconfig
index 35fd30d0e432..002e14ee37f6 100644
--- a/arch/riscv/kvm/Kconfig
+++ b/arch/riscv/kvm/Kconfig
@@ -20,6 +20,7 @@ if VIRTUALIZATION
 config KVM
 	tristate "Kernel-based Virtual Machine (KVM) support"
 	depends on OF
+	select MMU_NOTIFIER
 	select PREEMPT_NOTIFIERS
 	select ANON_INODES
 	select KVM_MMIO
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 9e95ab6769f6..0b8e46aebb02 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -67,6 +67,66 @@ static void *stage2_cache_alloc(struct kvm_mmu_page_cache *pcache)
 	return p;
 }
 
+static int stage2_pgdp_test_and_clear_young(pgd_t *pgd)
+{
+	return ptep_test_and_clear_young(NULL, 0, (pte_t *)pgd);
+}
+
+static int stage2_pmdp_test_and_clear_young(pmd_t *pmd)
+{
+	return ptep_test_and_clear_young(NULL, 0, (pte_t *)pmd);
+}
+
+static int stage2_ptep_test_and_clear_young(pte_t *pte)
+{
+	return ptep_test_and_clear_young(NULL, 0, pte);
+}
+
+static bool stage2_get_leaf_entry(struct kvm *kvm, gpa_t addr,
+				  pgd_t **pgdpp, pmd_t **pmdpp, pte_t **ptepp)
+{
+	pgd_t *pgdp;
+	pmd_t *pmdp;
+	pte_t *ptep;
+
+	*pgdpp = NULL;
+	*pmdpp = NULL;
+	*ptepp = NULL;
+
+	pgdp = &kvm->arch.pgd[pgd_index(addr)];
+	if (!pgd_val(*pgdp))
+		return false;
+	if (pgd_val(*pgdp) & _PAGE_LEAF) {
+		*pgdpp = pgdp;
+		return true;
+	}
+
+	if (stage2_have_pmd) {
+		pmdp = (void *)pgd_page_vaddr(*pgdp);
+		pmdp = &pmdp[pmd_index(addr)];
+		if (!pmd_present(*pmdp))
+			return false;
+		if (pmd_val(*pmdp) & _PAGE_LEAF) {
+			*pmdpp = pmdp;
+			return true;
+		}
+
+		ptep = (void *)pmd_page_vaddr(*pmdp);
+	} else {
+		ptep = (void *)pgd_page_vaddr(*pgdp);
+	}
+
+	ptep = &ptep[pte_index(addr)];
+	if (!pte_present(*ptep))
+		return false;
+	if (pte_val(*ptep) & _PAGE_LEAF) {
+		*ptepp = ptep;
+		return true;
+	}
+
+	return false;
+}
+
 struct local_guest_tlb_info {
 	struct kvm_vmid *vmid;
 	gpa_t addr;
@@ -444,6 +504,38 @@ int stage2_ioremap(struct kvm *kvm, gpa_t gpa, phys_addr_t hpa,
 
 }
 
+static int handle_hva_to_gpa(struct kvm *kvm,
+			     unsigned long start,
+			     unsigned long end,
+			     int (*handler)(struct kvm *kvm,
+					    gpa_t gpa, u64 size,
+					    void *data),
+			     void *data)
+{
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *memslot;
+	int ret = 0;
+
+	slots = kvm_memslots(kvm);
+
+	/* we only care about the pages that the guest sees */
+	kvm_for_each_memslot(memslot, slots) {
+		unsigned long hva_start, hva_end;
+		gfn_t gpa;
+
+		hva_start = max(start, memslot->userspace_addr);
+		hva_end = min(end, memslot->userspace_addr +
+					(memslot->npages << PAGE_SHIFT));
+		if (hva_start >= hva_end)
+			continue;
+
+		gpa = hva_to_gfn_memslot(hva_start, memslot) << PAGE_SHIFT;
+		ret |= handler(kvm, gpa, (u64)(hva_end - hva_start), data);
+	}
+
+	return ret;
+}
+
 void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
 			   struct kvm_memory_slot *dont)
 {
@@ -576,6 +668,106 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	return ret;
 }
 
+static int kvm_unmap_hva_handler(struct kvm *kvm,
+				 gpa_t gpa, u64 size, void *data)
+{
+	stage2_unmap_range(kvm, gpa, size);
+	return 0;
+}
+
+int kvm_unmap_hva_range(struct kvm *kvm,
+			unsigned long start, unsigned long end)
+{
+	if (!kvm->arch.pgd)
+		return 0;
+
+	handle_hva_to_gpa(kvm, start, end,
+			  &kvm_unmap_hva_handler, NULL);
+	return 0;
+}
+
+static int kvm_set_spte_handler(struct kvm *kvm,
+				gpa_t gpa, u64 size, void *data)
+{
+	pte_t *pte = (pte_t *)data;
+
+	WARN_ON(size != PAGE_SIZE);
+	stage2_set_pte(kvm, NULL, gpa, pte);
+
+	return 0;
+}
+
+int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
+{
+	unsigned long end = hva + PAGE_SIZE;
+	kvm_pfn_t pfn = pte_pfn(pte);
+	pte_t stage2_pte;
+
+	if (!kvm->arch.pgd)
+		return 0;
+
+	stage2_pte = pfn_pte(pfn, PAGE_WRITE_EXEC);
+	handle_hva_to_gpa(kvm, hva, end,
+			  &kvm_set_spte_handler, &stage2_pte);
+
+	return 0;
+}
+
+static int kvm_age_hva_handler(struct kvm *kvm,
+				gpa_t gpa, u64 size, void *data)
+{
+	pgd_t *pgd;
+	pmd_t *pmd;
+	pte_t *pte;
+
+	WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PGDIR_SIZE);
+	if (!stage2_get_leaf_entry(kvm, gpa, &pgd, &pmd, &pte))
+		return 0;
+
+	if (pgd)
+		return stage2_pgdp_test_and_clear_young(pgd);
+	else if (pmd)
+		return stage2_pmdp_test_and_clear_young(pmd);
+	else
+		return stage2_ptep_test_and_clear_young(pte);
+}
+
+int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
+{
+	if (!kvm->arch.pgd)
+		return 0;
+
+	return handle_hva_to_gpa(kvm, start, end, kvm_age_hva_handler, NULL);
+}
+
+static int kvm_test_age_hva_handler(struct kvm *kvm,
+				    gpa_t gpa, u64 size, void *data)
+{
+	pgd_t *pgd;
+	pmd_t *pmd;
+	pte_t *pte;
+
+	WARN_ON(size != PAGE_SIZE && size != PMD_SIZE);
+	if (!stage2_get_leaf_entry(kvm, gpa, &pgd, &pmd, &pte))
+		return 0;
+
+	if (pgd)
+		return pte_young(*((pte_t *)pgd));
+	else if (pmd)
+		return pte_young(*((pte_t *)pmd));
+	else
+		return pte_young(*pte);
+}
+
+int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
+{
+	if (!kvm->arch.pgd)
+		return 0;
+
+	return handle_hva_to_gpa(kvm, hva, hva,
+				 kvm_test_age_hva_handler, NULL);
+}
+
 int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
 			 bool is_write)
 {
@@ -587,7 +779,7 @@ int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
 	struct vm_area_struct *vma;
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_mmu_page_cache *pcache = &vcpu->arch.mmu_page_cache;
-	unsigned long vma_pagesize;
+	unsigned long vma_pagesize, mmu_seq;
 
 	down_read(&current->mm->mmap_sem);
 
@@ -617,6 +809,8 @@ int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
 		return ret;
 	}
 
+	mmu_seq = kvm->mmu_notifier_seq;
+
 	hfn = gfn_to_pfn_prot(kvm, gfn, is_write, &writeable);
 	if (hfn == KVM_PFN_ERR_HWPOISON) {
 		if (is_vm_hugetlb_page(vma))
@@ -635,6 +829,9 @@ int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
 
 	spin_lock(&kvm->mmu_lock);
 
+	if (mmu_notifier_retry(kvm, mmu_seq))
+		goto out_unlock;
+
 	if (writeable) {
 		kvm_set_pfn_dirty(hfn);
 		ret = stage2_map_page(kvm, pcache, gpa, hfn << PAGE_SHIFT,
@@ -647,6 +844,7 @@ int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
 	if (ret)
 		kvm_err("Failed to map in stage2\n");
 
+out_unlock:
 	spin_unlock(&kvm->mmu_lock);
 	kvm_set_pfn_accessed(hfn);
 	kvm_release_pfn_clean(hfn);
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
index c5aab5478c38..fd84b4d914dc 100644
--- a/arch/riscv/kvm/vm.c
+++ b/arch/riscv/kvm/vm.c
@@ -54,6 +54,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	switch (ext) {
 	case KVM_CAP_DEVICE_CTRL:
 	case KVM_CAP_USER_MEMORY:
+	case KVM_CAP_SYNC_MMU:
 	case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
 	case KVM_CAP_ONE_REG:
 	case KVM_CAP_READONLY_MEM:
-- 
2.17.1


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH v5 15/20] RISC-V: KVM: Add timer functionality
  2019-08-22  8:42 [PATCH v5 00/20] KVM RISC-V Support Anup Patel
                   ` (13 preceding siblings ...)
  2019-08-22  8:45 ` [PATCH v5 14/20] RISC-V: KVM: Implement MMU notifiers Anup Patel
@ 2019-08-22  8:46 ` Anup Patel
  2019-08-23  7:52   ` Alexander Graf
  2019-08-22  8:46 ` [PATCH v5 16/20] RISC-V: KVM: FP lazy save/restore Anup Patel
                   ` (5 subsequent siblings)
  20 siblings, 1 reply; 61+ messages in thread
From: Anup Patel @ 2019-08-22  8:46 UTC (permalink / raw)
  To: Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel, Anup Patel

From: Atish Patra <atish.patra@wdc.com>

The RISC-V hypervisor specification doesn't have any virtual timer
feature.

Due to this, the guest VCPU timer will be programmed via SBI calls.
The host will use a separate hrtimer event for each guest VCPU to
provide timer functionality. We inject a virtual timer interrupt to
the guest VCPU whenever the guest VCPU hrtimer event expires.

The following features are not supported yet and will be added in
future:
1. A time offset to adjust guest time from host time
2. A saved next event in guest vcpu for vm migration

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/riscv/include/asm/kvm_host.h       |   4 +
 arch/riscv/include/asm/kvm_vcpu_timer.h |  32 +++++++
 arch/riscv/kvm/Makefile                 |   2 +-
 arch/riscv/kvm/vcpu.c                   |   6 ++
 arch/riscv/kvm/vcpu_timer.c             | 106 ++++++++++++++++++++++++
 drivers/clocksource/timer-riscv.c       |   8 ++
 include/clocksource/timer-riscv.h       |  16 ++++
 7 files changed, 173 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/include/asm/kvm_vcpu_timer.h
 create mode 100644 arch/riscv/kvm/vcpu_timer.c
 create mode 100644 include/clocksource/timer-riscv.h

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index ab33e59a3d88..d2a2e45eefc0 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -12,6 +12,7 @@
 #include <linux/types.h>
 #include <linux/kvm.h>
 #include <linux/kvm_types.h>
+#include <asm/kvm_vcpu_timer.h>
 
 #ifdef CONFIG_64BIT
 #define KVM_MAX_VCPUS			(1U << 16)
@@ -167,6 +168,9 @@ struct kvm_vcpu_arch {
 	unsigned long irqs_pending;
 	unsigned long irqs_pending_mask;
 
+	/* VCPU Timer */
+	struct kvm_vcpu_timer timer;
+
 	/* MMIO instruction details */
 	struct kvm_mmio_decode mmio_decode;
 
diff --git a/arch/riscv/include/asm/kvm_vcpu_timer.h b/arch/riscv/include/asm/kvm_vcpu_timer.h
new file mode 100644
index 000000000000..df67ea86988e
--- /dev/null
+++ b/arch/riscv/include/asm/kvm_vcpu_timer.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ *	Atish Patra <atish.patra@wdc.com>
+ */
+
+#ifndef __KVM_VCPU_RISCV_TIMER_H
+#define __KVM_VCPU_RISCV_TIMER_H
+
+#include <linux/hrtimer.h>
+
+#define VCPU_TIMER_PROGRAM_THRESHOLD_NS		1000
+
+struct kvm_vcpu_timer {
+	bool init_done;
+	/* Check if the timer is programmed */
+	bool is_set;
+	struct hrtimer hrt;
+	/* Mult & Shift values to get nanosec from cycles */
+	u32 mult;
+	u32 shift;
+};
+
+int kvm_riscv_vcpu_timer_init(struct kvm_vcpu *vcpu);
+int kvm_riscv_vcpu_timer_deinit(struct kvm_vcpu *vcpu);
+int kvm_riscv_vcpu_timer_reset(struct kvm_vcpu *vcpu);
+int kvm_riscv_vcpu_timer_next_event(struct kvm_vcpu *vcpu,
+				    unsigned long ncycles);
+
+#endif
diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
index c0f57f26c13d..3e0c7558320d 100644
--- a/arch/riscv/kvm/Makefile
+++ b/arch/riscv/kvm/Makefile
@@ -9,6 +9,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
 kvm-objs := $(common-objs-y)
 
 kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
-kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o
+kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o
 
 obj-$(CONFIG_KVM)	+= kvm.o
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 6124077d154f..018fca436776 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -54,6 +54,8 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
 
 	memcpy(cntx, reset_cntx, sizeof(*cntx));
 
+	kvm_riscv_vcpu_timer_reset(vcpu);
+
 	WRITE_ONCE(vcpu->arch.irqs_pending, 0);
 	WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
 }
@@ -108,6 +110,9 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 	cntx->hstatus |= HSTATUS_SP2P;
 	cntx->hstatus |= HSTATUS_SPV;
 
+	/* Setup VCPU timer */
+	kvm_riscv_vcpu_timer_init(vcpu);
+
 	/* Reset VCPU */
 	kvm_riscv_reset_vcpu(vcpu);
 
@@ -116,6 +121,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
+	kvm_riscv_vcpu_timer_deinit(vcpu);
 	kvm_riscv_stage2_flush_cache(vcpu);
 	kmem_cache_free(kvm_vcpu_cache, vcpu);
 }
diff --git a/arch/riscv/kvm/vcpu_timer.c b/arch/riscv/kvm/vcpu_timer.c
new file mode 100644
index 000000000000..a45ca06e1aa6
--- /dev/null
+++ b/arch/riscv/kvm/vcpu_timer.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ *     Atish Patra <atish.patra@wdc.com>
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <clocksource/timer-riscv.h>
+#include <asm/csr.h>
+#include <asm/kvm_vcpu_timer.h>
+
+static enum hrtimer_restart kvm_riscv_vcpu_hrtimer_expired(struct hrtimer *h)
+{
+	struct kvm_vcpu_timer *t = container_of(h, struct kvm_vcpu_timer, hrt);
+	struct kvm_vcpu *vcpu = container_of(t, struct kvm_vcpu, arch.timer);
+
+	t->is_set = false;
+	kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_S_TIMER);
+
+	return HRTIMER_NORESTART;
+}
+
+static u64 kvm_riscv_delta_cycles2ns(u64 cycles, struct kvm_vcpu_timer *t)
+{
+	unsigned long flags;
+	u64 cycles_now, cycles_delta, delta_ns;
+
+	local_irq_save(flags);
+	cycles_now = get_cycles64();
+	if (cycles_now < cycles)
+		cycles_delta = cycles - cycles_now;
+	else
+		cycles_delta = 0;
+	delta_ns = (cycles_delta * t->mult) >> t->shift;
+	local_irq_restore(flags);
+
+	return delta_ns;
+}
+
+static int kvm_riscv_vcpu_timer_cancel(struct kvm_vcpu_timer *t)
+{
+	if (!t->init_done || !t->is_set)
+		return -EINVAL;
+
+	hrtimer_cancel(&t->hrt);
+	t->is_set = false;
+
+	return 0;
+}
+
+int kvm_riscv_vcpu_timer_next_event(struct kvm_vcpu *vcpu,
+				    unsigned long ncycles)
+{
+	struct kvm_vcpu_timer *t = &vcpu->arch.timer;
+	u64 delta_ns = kvm_riscv_delta_cycles2ns(ncycles, t);
+
+	if (!t->init_done)
+		return -EINVAL;
+
+	kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_S_TIMER);
+
+	if (delta_ns > VCPU_TIMER_PROGRAM_THRESHOLD_NS) {
+		hrtimer_start(&t->hrt, ktime_add_ns(ktime_get(), delta_ns),
+				HRTIMER_MODE_ABS);
+		t->is_set = true;
+	} else
+		kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_S_TIMER);
+
+	return 0;
+}
+
+int kvm_riscv_vcpu_timer_init(struct kvm_vcpu *vcpu)
+{
+	struct kvm_vcpu_timer *t = &vcpu->arch.timer;
+
+	if (t->init_done)
+		return -EINVAL;
+
+	hrtimer_init(&t->hrt, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+	t->hrt.function = kvm_riscv_vcpu_hrtimer_expired;
+	t->init_done = true;
+	t->is_set = false;
+
+	riscv_cs_get_mult_shift(&t->mult, &t->shift);
+
+	return 0;
+}
+
+int kvm_riscv_vcpu_timer_deinit(struct kvm_vcpu *vcpu)
+{
+	int ret;
+
+	ret = kvm_riscv_vcpu_timer_cancel(&vcpu->arch.timer);
+	vcpu->arch.timer.init_done = false;
+
+	return ret;
+}
+
+int kvm_riscv_vcpu_timer_reset(struct kvm_vcpu *vcpu)
+{
+	return kvm_riscv_vcpu_timer_cancel(&vcpu->arch.timer);
+}
diff --git a/drivers/clocksource/timer-riscv.c b/drivers/clocksource/timer-riscv.c
index 09e031176bc6..7c595203aa5c 100644
--- a/drivers/clocksource/timer-riscv.c
+++ b/drivers/clocksource/timer-riscv.c
@@ -8,6 +8,7 @@
 #include <linux/cpu.h>
 #include <linux/delay.h>
 #include <linux/irq.h>
+#include <linux/module.h>
 #include <linux/sched_clock.h>
 #include <asm/smp.h>
 #include <asm/sbi.h>
@@ -80,6 +81,13 @@ static int riscv_timer_dying_cpu(unsigned int cpu)
 	return 0;
 }
 
+void riscv_cs_get_mult_shift(u32 *mult, u32 *shift)
+{
+	*mult = riscv_clocksource.mult;
+	*shift = riscv_clocksource.shift;
+}
+EXPORT_SYMBOL_GPL(riscv_cs_get_mult_shift);
+
 /* called directly from the low-level interrupt handler */
 void riscv_timer_interrupt(void)
 {
diff --git a/include/clocksource/timer-riscv.h b/include/clocksource/timer-riscv.h
new file mode 100644
index 000000000000..e94e4feecbe8
--- /dev/null
+++ b/include/clocksource/timer-riscv.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ *	Atish Patra <atish.patra@wdc.com>
+ */
+
+#ifndef __TIMER_RISCV_H
+#define __TIMER_RISCV_H
+
+#include <linux/types.h>
+
+void riscv_cs_get_mult_shift(u32 *mult, u32 *shift);
+
+#endif
-- 
2.17.1


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH v5 16/20] RISC-V: KVM: FP lazy save/restore
  2019-08-22  8:42 [PATCH v5 00/20] KVM RISC-V Support Anup Patel
                   ` (14 preceding siblings ...)
  2019-08-22  8:46 ` [PATCH v5 15/20] RISC-V: KVM: Add timer functionality Anup Patel
@ 2019-08-22  8:46 ` Anup Patel
  2019-08-22  8:46 ` [PATCH v5 17/20] RISC-V: KVM: Implement ONE REG interface for FP registers Anup Patel
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 61+ messages in thread
From: Anup Patel @ 2019-08-22  8:46 UTC (permalink / raw)
  To: Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel, Anup Patel

From: Atish Patra <atish.patra@wdc.com>

This patch adds floating point (F and D extension) context save/restore
for guest VCPUs. The FP context is saved and restored lazily only when
kernel enter/exits the in-kernel run loop and not during the KVM world
switch. This way FP save/restore has minimal impact on KVM performance.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/riscv/include/asm/kvm_host.h |   5 +
 arch/riscv/kernel/asm-offsets.c   |  72 +++++++++++++
 arch/riscv/kvm/vcpu.c             |  81 ++++++++++++++
 arch/riscv/kvm/vcpu_switch.S      | 174 ++++++++++++++++++++++++++++++
 4 files changed, 332 insertions(+)

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index d2a2e45eefc0..2af3a179c08e 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -116,6 +116,7 @@ struct kvm_cpu_context {
 	unsigned long sepc;
 	unsigned long sstatus;
 	unsigned long hstatus;
+	union __riscv_fp_state fp;
 };
 
 struct kvm_vcpu_csr {
@@ -227,6 +228,10 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 			unsigned long scause, unsigned long stval);
 
 void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch);
+void __kvm_riscv_vcpu_fp_f_save(struct kvm_cpu_context *context);
+void __kvm_riscv_vcpu_fp_f_restore(struct kvm_cpu_context *context);
+void __kvm_riscv_vcpu_fp_d_save(struct kvm_cpu_context *context);
+void __kvm_riscv_vcpu_fp_d_restore(struct kvm_cpu_context *context);
 
 int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
 int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index 711656710190..9980069a1acf 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -185,6 +185,78 @@ void asm_offsets(void)
 	OFFSET(KVM_ARCH_HOST_SSCRATCH, kvm_vcpu_arch, host_sscratch);
 	OFFSET(KVM_ARCH_HOST_STVEC, kvm_vcpu_arch, host_stvec);
 
+	/* F extension */
+
+	OFFSET(KVM_ARCH_FP_F_F0, kvm_cpu_context, fp.f.f[0]);
+	OFFSET(KVM_ARCH_FP_F_F1, kvm_cpu_context, fp.f.f[1]);
+	OFFSET(KVM_ARCH_FP_F_F2, kvm_cpu_context, fp.f.f[2]);
+	OFFSET(KVM_ARCH_FP_F_F3, kvm_cpu_context, fp.f.f[3]);
+	OFFSET(KVM_ARCH_FP_F_F4, kvm_cpu_context, fp.f.f[4]);
+	OFFSET(KVM_ARCH_FP_F_F5, kvm_cpu_context, fp.f.f[5]);
+	OFFSET(KVM_ARCH_FP_F_F6, kvm_cpu_context, fp.f.f[6]);
+	OFFSET(KVM_ARCH_FP_F_F7, kvm_cpu_context, fp.f.f[7]);
+	OFFSET(KVM_ARCH_FP_F_F8, kvm_cpu_context, fp.f.f[8]);
+	OFFSET(KVM_ARCH_FP_F_F9, kvm_cpu_context, fp.f.f[9]);
+	OFFSET(KVM_ARCH_FP_F_F10, kvm_cpu_context, fp.f.f[10]);
+	OFFSET(KVM_ARCH_FP_F_F11, kvm_cpu_context, fp.f.f[11]);
+	OFFSET(KVM_ARCH_FP_F_F12, kvm_cpu_context, fp.f.f[12]);
+	OFFSET(KVM_ARCH_FP_F_F13, kvm_cpu_context, fp.f.f[13]);
+	OFFSET(KVM_ARCH_FP_F_F14, kvm_cpu_context, fp.f.f[14]);
+	OFFSET(KVM_ARCH_FP_F_F15, kvm_cpu_context, fp.f.f[15]);
+	OFFSET(KVM_ARCH_FP_F_F16, kvm_cpu_context, fp.f.f[16]);
+	OFFSET(KVM_ARCH_FP_F_F17, kvm_cpu_context, fp.f.f[17]);
+	OFFSET(KVM_ARCH_FP_F_F18, kvm_cpu_context, fp.f.f[18]);
+	OFFSET(KVM_ARCH_FP_F_F19, kvm_cpu_context, fp.f.f[19]);
+	OFFSET(KVM_ARCH_FP_F_F20, kvm_cpu_context, fp.f.f[20]);
+	OFFSET(KVM_ARCH_FP_F_F21, kvm_cpu_context, fp.f.f[21]);
+	OFFSET(KVM_ARCH_FP_F_F22, kvm_cpu_context, fp.f.f[22]);
+	OFFSET(KVM_ARCH_FP_F_F23, kvm_cpu_context, fp.f.f[23]);
+	OFFSET(KVM_ARCH_FP_F_F24, kvm_cpu_context, fp.f.f[24]);
+	OFFSET(KVM_ARCH_FP_F_F25, kvm_cpu_context, fp.f.f[25]);
+	OFFSET(KVM_ARCH_FP_F_F26, kvm_cpu_context, fp.f.f[26]);
+	OFFSET(KVM_ARCH_FP_F_F27, kvm_cpu_context, fp.f.f[27]);
+	OFFSET(KVM_ARCH_FP_F_F28, kvm_cpu_context, fp.f.f[28]);
+	OFFSET(KVM_ARCH_FP_F_F29, kvm_cpu_context, fp.f.f[29]);
+	OFFSET(KVM_ARCH_FP_F_F30, kvm_cpu_context, fp.f.f[30]);
+	OFFSET(KVM_ARCH_FP_F_F31, kvm_cpu_context, fp.f.f[31]);
+	OFFSET(KVM_ARCH_FP_F_FCSR, kvm_cpu_context, fp.f.fcsr);
+
+	/* D extension */
+
+	OFFSET(KVM_ARCH_FP_D_F0, kvm_cpu_context, fp.d.f[0]);
+	OFFSET(KVM_ARCH_FP_D_F1, kvm_cpu_context, fp.d.f[1]);
+	OFFSET(KVM_ARCH_FP_D_F2, kvm_cpu_context, fp.d.f[2]);
+	OFFSET(KVM_ARCH_FP_D_F3, kvm_cpu_context, fp.d.f[3]);
+	OFFSET(KVM_ARCH_FP_D_F4, kvm_cpu_context, fp.d.f[4]);
+	OFFSET(KVM_ARCH_FP_D_F5, kvm_cpu_context, fp.d.f[5]);
+	OFFSET(KVM_ARCH_FP_D_F6, kvm_cpu_context, fp.d.f[6]);
+	OFFSET(KVM_ARCH_FP_D_F7, kvm_cpu_context, fp.d.f[7]);
+	OFFSET(KVM_ARCH_FP_D_F8, kvm_cpu_context, fp.d.f[8]);
+	OFFSET(KVM_ARCH_FP_D_F9, kvm_cpu_context, fp.d.f[9]);
+	OFFSET(KVM_ARCH_FP_D_F10, kvm_cpu_context, fp.d.f[10]);
+	OFFSET(KVM_ARCH_FP_D_F11, kvm_cpu_context, fp.d.f[11]);
+	OFFSET(KVM_ARCH_FP_D_F12, kvm_cpu_context, fp.d.f[12]);
+	OFFSET(KVM_ARCH_FP_D_F13, kvm_cpu_context, fp.d.f[13]);
+	OFFSET(KVM_ARCH_FP_D_F14, kvm_cpu_context, fp.d.f[14]);
+	OFFSET(KVM_ARCH_FP_D_F15, kvm_cpu_context, fp.d.f[15]);
+	OFFSET(KVM_ARCH_FP_D_F16, kvm_cpu_context, fp.d.f[16]);
+	OFFSET(KVM_ARCH_FP_D_F17, kvm_cpu_context, fp.d.f[17]);
+	OFFSET(KVM_ARCH_FP_D_F18, kvm_cpu_context, fp.d.f[18]);
+	OFFSET(KVM_ARCH_FP_D_F19, kvm_cpu_context, fp.d.f[19]);
+	OFFSET(KVM_ARCH_FP_D_F20, kvm_cpu_context, fp.d.f[20]);
+	OFFSET(KVM_ARCH_FP_D_F21, kvm_cpu_context, fp.d.f[21]);
+	OFFSET(KVM_ARCH_FP_D_F22, kvm_cpu_context, fp.d.f[22]);
+	OFFSET(KVM_ARCH_FP_D_F23, kvm_cpu_context, fp.d.f[23]);
+	OFFSET(KVM_ARCH_FP_D_F24, kvm_cpu_context, fp.d.f[24]);
+	OFFSET(KVM_ARCH_FP_D_F25, kvm_cpu_context, fp.d.f[25]);
+	OFFSET(KVM_ARCH_FP_D_F26, kvm_cpu_context, fp.d.f[26]);
+	OFFSET(KVM_ARCH_FP_D_F27, kvm_cpu_context, fp.d.f[27]);
+	OFFSET(KVM_ARCH_FP_D_F28, kvm_cpu_context, fp.d.f[28]);
+	OFFSET(KVM_ARCH_FP_D_F29, kvm_cpu_context, fp.d.f[29]);
+	OFFSET(KVM_ARCH_FP_D_F30, kvm_cpu_context, fp.d.f[30]);
+	OFFSET(KVM_ARCH_FP_D_F31, kvm_cpu_context, fp.d.f[31]);
+	OFFSET(KVM_ARCH_FP_D_FCSR, kvm_cpu_context, fp.d.fcsr);
+
 	/*
 	 * THREAD_{F,X}* might be larger than a S-type offset can handle, but
 	 * these are used in performance-sensitive assembly so we can't resort
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 018fca436776..e7c5fe09c3bc 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -32,6 +32,76 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	{ NULL }
 };
 
+#ifdef CONFIG_FPU
+static void kvm_riscv_vcpu_fp_reset(struct kvm_vcpu *vcpu)
+{
+	unsigned long isa = vcpu->arch.isa;
+	struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
+
+	cntx->sstatus &= ~SR_FS;
+	if (riscv_isa_extension_available(&isa, f) ||
+	    riscv_isa_extension_available(&isa, d))
+		cntx->sstatus |= SR_FS_INITIAL;
+	else
+		cntx->sstatus |= SR_FS_OFF;
+}
+
+static void kvm_riscv_vcpu_fp_clean(struct kvm_cpu_context *cntx)
+{
+	cntx->sstatus &= ~SR_FS;
+	cntx->sstatus |= SR_FS_CLEAN;
+}
+
+static void kvm_riscv_vcpu_guest_fp_save(struct kvm_cpu_context *cntx,
+					 unsigned long isa)
+{
+	if ((cntx->sstatus & SR_FS) == SR_FS_DIRTY) {
+		if (riscv_isa_extension_available(&isa, d))
+			__kvm_riscv_vcpu_fp_d_save(cntx);
+		else if (riscv_isa_extension_available(&isa, f))
+			__kvm_riscv_vcpu_fp_f_save(cntx);
+		kvm_riscv_vcpu_fp_clean(cntx);
+	}
+}
+
+static void kvm_riscv_vcpu_guest_fp_restore(struct kvm_cpu_context *cntx,
+					    unsigned long isa)
+{
+	if ((cntx->sstatus & SR_FS) != SR_FS_OFF) {
+		if (riscv_isa_extension_available(&isa, d))
+			__kvm_riscv_vcpu_fp_d_restore(cntx);
+		else if (riscv_isa_extension_available(&isa, f))
+			__kvm_riscv_vcpu_fp_f_restore(cntx);
+		kvm_riscv_vcpu_fp_clean(cntx);
+	}
+}
+
+static void kvm_riscv_vcpu_host_fp_save(struct kvm_cpu_context *cntx)
+{
+	/* No need to check host sstatus as it can be modified outside */
+	if (riscv_isa_extension_available(NULL, d))
+		__kvm_riscv_vcpu_fp_d_save(cntx);
+	else if (riscv_isa_extension_available(NULL, f))
+		__kvm_riscv_vcpu_fp_f_save(cntx);
+}
+
+static void kvm_riscv_vcpu_host_fp_restore(struct kvm_cpu_context *cntx)
+{
+	if (riscv_isa_extension_available(NULL, d))
+		__kvm_riscv_vcpu_fp_d_restore(cntx);
+	else if (riscv_isa_extension_available(NULL, f))
+		__kvm_riscv_vcpu_fp_f_restore(cntx);
+}
+#else
+static void kvm_riscv_vcpu_fp_reset(struct kvm_vcpu *vcpu) {}
+static void kvm_riscv_vcpu_guest_fp_save(struct kvm_cpu_context *cntx,
+					 unsigned long isa) {}
+static void kvm_riscv_vcpu_guest_fp_restore(struct kvm_cpu_context *cntx,
+					    unsigned long isa) {}
+static void kvm_riscv_vcpu_host_fp_save(struct kvm_cpu_context *cntx) {}
+static void kvm_riscv_vcpu_host_fp_restore(struct kvm_cpu_context *cntx) {}
+#endif
+
 #define KVM_RISCV_ISA_ALLOWED	(riscv_isa_extension_mask(a) | \
 				 riscv_isa_extension_mask(c) | \
 				 riscv_isa_extension_mask(d) | \
@@ -54,6 +124,8 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
 
 	memcpy(cntx, reset_cntx, sizeof(*cntx));
 
+	kvm_riscv_vcpu_fp_reset(vcpu);
+
 	kvm_riscv_vcpu_timer_reset(vcpu);
 
 	WRITE_ONCE(vcpu->arch.irqs_pending, 0);
@@ -222,6 +294,7 @@ static int kvm_riscv_vcpu_set_reg_config(struct kvm_vcpu *vcpu,
 			vcpu->arch.isa = reg_val;
 			vcpu->arch.isa &= riscv_isa_extension_base(NULL);
 			vcpu->arch.isa &= KVM_RISCV_ISA_ALLOWED;
+			kvm_riscv_vcpu_fp_reset(vcpu);
 		} else {
 			return -ENOTSUPP;
 		}
@@ -591,6 +664,10 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 	kvm_riscv_stage2_update_hgatp(vcpu);
 
+	kvm_riscv_vcpu_host_fp_save(&vcpu->arch.host_context);
+	kvm_riscv_vcpu_guest_fp_restore(&vcpu->arch.guest_context,
+					vcpu->arch.isa);
+
 	vcpu->cpu = cpu;
 }
 
@@ -600,6 +677,10 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 
 	vcpu->cpu = -1;
 
+	kvm_riscv_vcpu_guest_fp_save(&vcpu->arch.guest_context,
+				     vcpu->arch.isa);
+	kvm_riscv_vcpu_host_fp_restore(&vcpu->arch.host_context);
+
 	csr_write(CSR_HGATP, 0);
 
 	csr->vsstatus = csr_read(CSR_VSSTATUS);
diff --git a/arch/riscv/kvm/vcpu_switch.S b/arch/riscv/kvm/vcpu_switch.S
index e1a17df1b379..d7e237d1004c 100644
--- a/arch/riscv/kvm/vcpu_switch.S
+++ b/arch/riscv/kvm/vcpu_switch.S
@@ -192,3 +192,177 @@ __kvm_switch_return:
 	/* Return to C code */
 	ret
 ENDPROC(__kvm_riscv_switch_to)
+
+#ifdef	CONFIG_FPU
+	.align 3
+	.global __kvm_riscv_vcpu_fp_f_save
+__kvm_riscv_vcpu_fp_f_save:
+	csrr t2, CSR_SSTATUS
+	li t1, SR_FS
+	csrs CSR_SSTATUS, t1
+	frcsr t0
+	fsw f0,  KVM_ARCH_FP_F_F0(a0)
+	fsw f1,  KVM_ARCH_FP_F_F1(a0)
+	fsw f2,  KVM_ARCH_FP_F_F2(a0)
+	fsw f3,  KVM_ARCH_FP_F_F3(a0)
+	fsw f4,  KVM_ARCH_FP_F_F4(a0)
+	fsw f5,  KVM_ARCH_FP_F_F5(a0)
+	fsw f6,  KVM_ARCH_FP_F_F6(a0)
+	fsw f7,  KVM_ARCH_FP_F_F7(a0)
+	fsw f8,  KVM_ARCH_FP_F_F8(a0)
+	fsw f9,  KVM_ARCH_FP_F_F9(a0)
+	fsw f10, KVM_ARCH_FP_F_F10(a0)
+	fsw f11, KVM_ARCH_FP_F_F11(a0)
+	fsw f12, KVM_ARCH_FP_F_F12(a0)
+	fsw f13, KVM_ARCH_FP_F_F13(a0)
+	fsw f14, KVM_ARCH_FP_F_F14(a0)
+	fsw f15, KVM_ARCH_FP_F_F15(a0)
+	fsw f16, KVM_ARCH_FP_F_F16(a0)
+	fsw f17, KVM_ARCH_FP_F_F17(a0)
+	fsw f18, KVM_ARCH_FP_F_F18(a0)
+	fsw f19, KVM_ARCH_FP_F_F19(a0)
+	fsw f20, KVM_ARCH_FP_F_F20(a0)
+	fsw f21, KVM_ARCH_FP_F_F21(a0)
+	fsw f22, KVM_ARCH_FP_F_F22(a0)
+	fsw f23, KVM_ARCH_FP_F_F23(a0)
+	fsw f24, KVM_ARCH_FP_F_F24(a0)
+	fsw f25, KVM_ARCH_FP_F_F25(a0)
+	fsw f26, KVM_ARCH_FP_F_F26(a0)
+	fsw f27, KVM_ARCH_FP_F_F27(a0)
+	fsw f28, KVM_ARCH_FP_F_F28(a0)
+	fsw f29, KVM_ARCH_FP_F_F29(a0)
+	fsw f30, KVM_ARCH_FP_F_F30(a0)
+	fsw f31, KVM_ARCH_FP_F_F31(a0)
+	sw t0, KVM_ARCH_FP_F_FCSR(a0)
+	csrw CSR_SSTATUS, t2
+	ret
+
+	.align 3
+	.global __kvm_riscv_vcpu_fp_d_save
+__kvm_riscv_vcpu_fp_d_save:
+	csrr t2, CSR_SSTATUS
+	li t1, SR_FS
+	csrs CSR_SSTATUS, t1
+	frcsr t0
+	fsd f0,  KVM_ARCH_FP_D_F0(a0)
+	fsd f1,  KVM_ARCH_FP_D_F1(a0)
+	fsd f2,  KVM_ARCH_FP_D_F2(a0)
+	fsd f3,  KVM_ARCH_FP_D_F3(a0)
+	fsd f4,  KVM_ARCH_FP_D_F4(a0)
+	fsd f5,  KVM_ARCH_FP_D_F5(a0)
+	fsd f6,  KVM_ARCH_FP_D_F6(a0)
+	fsd f7,  KVM_ARCH_FP_D_F7(a0)
+	fsd f8,  KVM_ARCH_FP_D_F8(a0)
+	fsd f9,  KVM_ARCH_FP_D_F9(a0)
+	fsd f10, KVM_ARCH_FP_D_F10(a0)
+	fsd f11, KVM_ARCH_FP_D_F11(a0)
+	fsd f12, KVM_ARCH_FP_D_F12(a0)
+	fsd f13, KVM_ARCH_FP_D_F13(a0)
+	fsd f14, KVM_ARCH_FP_D_F14(a0)
+	fsd f15, KVM_ARCH_FP_D_F15(a0)
+	fsd f16, KVM_ARCH_FP_D_F16(a0)
+	fsd f17, KVM_ARCH_FP_D_F17(a0)
+	fsd f18, KVM_ARCH_FP_D_F18(a0)
+	fsd f19, KVM_ARCH_FP_D_F19(a0)
+	fsd f20, KVM_ARCH_FP_D_F20(a0)
+	fsd f21, KVM_ARCH_FP_D_F21(a0)
+	fsd f22, KVM_ARCH_FP_D_F22(a0)
+	fsd f23, KVM_ARCH_FP_D_F23(a0)
+	fsd f24, KVM_ARCH_FP_D_F24(a0)
+	fsd f25, KVM_ARCH_FP_D_F25(a0)
+	fsd f26, KVM_ARCH_FP_D_F26(a0)
+	fsd f27, KVM_ARCH_FP_D_F27(a0)
+	fsd f28, KVM_ARCH_FP_D_F28(a0)
+	fsd f29, KVM_ARCH_FP_D_F29(a0)
+	fsd f30, KVM_ARCH_FP_D_F30(a0)
+	fsd f31, KVM_ARCH_FP_D_F31(a0)
+	sw t0, KVM_ARCH_FP_D_FCSR(a0)
+	csrw CSR_SSTATUS, t2
+	ret
+
+	.align 3
+	.global __kvm_riscv_vcpu_fp_f_restore
+__kvm_riscv_vcpu_fp_f_restore:
+	csrr t2, CSR_SSTATUS
+	li t1, SR_FS
+	lw t0, KVM_ARCH_FP_F_FCSR(a0)
+	csrs CSR_SSTATUS, t1
+	flw f0,  KVM_ARCH_FP_F_F0(a0)
+	flw f1,  KVM_ARCH_FP_F_F1(a0)
+	flw f2,  KVM_ARCH_FP_F_F2(a0)
+	flw f3,  KVM_ARCH_FP_F_F3(a0)
+	flw f4,  KVM_ARCH_FP_F_F4(a0)
+	flw f5,  KVM_ARCH_FP_F_F5(a0)
+	flw f6,  KVM_ARCH_FP_F_F6(a0)
+	flw f7,  KVM_ARCH_FP_F_F7(a0)
+	flw f8,  KVM_ARCH_FP_F_F8(a0)
+	flw f9,  KVM_ARCH_FP_F_F9(a0)
+	flw f10, KVM_ARCH_FP_F_F10(a0)
+	flw f11, KVM_ARCH_FP_F_F11(a0)
+	flw f12, KVM_ARCH_FP_F_F12(a0)
+	flw f13, KVM_ARCH_FP_F_F13(a0)
+	flw f14, KVM_ARCH_FP_F_F14(a0)
+	flw f15, KVM_ARCH_FP_F_F15(a0)
+	flw f16, KVM_ARCH_FP_F_F16(a0)
+	flw f17, KVM_ARCH_FP_F_F17(a0)
+	flw f18, KVM_ARCH_FP_F_F18(a0)
+	flw f19, KVM_ARCH_FP_F_F19(a0)
+	flw f20, KVM_ARCH_FP_F_F20(a0)
+	flw f21, KVM_ARCH_FP_F_F21(a0)
+	flw f22, KVM_ARCH_FP_F_F22(a0)
+	flw f23, KVM_ARCH_FP_F_F23(a0)
+	flw f24, KVM_ARCH_FP_F_F24(a0)
+	flw f25, KVM_ARCH_FP_F_F25(a0)
+	flw f26, KVM_ARCH_FP_F_F26(a0)
+	flw f27, KVM_ARCH_FP_F_F27(a0)
+	flw f28, KVM_ARCH_FP_F_F28(a0)
+	flw f29, KVM_ARCH_FP_F_F29(a0)
+	flw f30, KVM_ARCH_FP_F_F30(a0)
+	flw f31, KVM_ARCH_FP_F_F31(a0)
+	fscsr t0
+	csrw CSR_SSTATUS, t2
+	ret
+
+	.align 3
+	.global __kvm_riscv_vcpu_fp_d_restore
+__kvm_riscv_vcpu_fp_d_restore:
+	csrr t2, CSR_SSTATUS
+	li t1, SR_FS
+	lw t0, KVM_ARCH_FP_D_FCSR(a0)
+	csrs CSR_SSTATUS, t1
+	fld f0,  KVM_ARCH_FP_D_F0(a0)
+	fld f1,  KVM_ARCH_FP_D_F1(a0)
+	fld f2,  KVM_ARCH_FP_D_F2(a0)
+	fld f3,  KVM_ARCH_FP_D_F3(a0)
+	fld f4,  KVM_ARCH_FP_D_F4(a0)
+	fld f5,  KVM_ARCH_FP_D_F5(a0)
+	fld f6,  KVM_ARCH_FP_D_F6(a0)
+	fld f7,  KVM_ARCH_FP_D_F7(a0)
+	fld f8,  KVM_ARCH_FP_D_F8(a0)
+	fld f9,  KVM_ARCH_FP_D_F9(a0)
+	fld f10, KVM_ARCH_FP_D_F10(a0)
+	fld f11, KVM_ARCH_FP_D_F11(a0)
+	fld f12, KVM_ARCH_FP_D_F12(a0)
+	fld f13, KVM_ARCH_FP_D_F13(a0)
+	fld f14, KVM_ARCH_FP_D_F14(a0)
+	fld f15, KVM_ARCH_FP_D_F15(a0)
+	fld f16, KVM_ARCH_FP_D_F16(a0)
+	fld f17, KVM_ARCH_FP_D_F17(a0)
+	fld f18, KVM_ARCH_FP_D_F18(a0)
+	fld f19, KVM_ARCH_FP_D_F19(a0)
+	fld f20, KVM_ARCH_FP_D_F20(a0)
+	fld f21, KVM_ARCH_FP_D_F21(a0)
+	fld f22, KVM_ARCH_FP_D_F22(a0)
+	fld f23, KVM_ARCH_FP_D_F23(a0)
+	fld f24, KVM_ARCH_FP_D_F24(a0)
+	fld f25, KVM_ARCH_FP_D_F25(a0)
+	fld f26, KVM_ARCH_FP_D_F26(a0)
+	fld f27, KVM_ARCH_FP_D_F27(a0)
+	fld f28, KVM_ARCH_FP_D_F28(a0)
+	fld f29, KVM_ARCH_FP_D_F29(a0)
+	fld f30, KVM_ARCH_FP_D_F30(a0)
+	fld f31, KVM_ARCH_FP_D_F31(a0)
+	fscsr t0
+	csrw CSR_SSTATUS, t2
+	ret
+#endif
-- 
2.17.1


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH v5 17/20] RISC-V: KVM: Implement ONE REG interface for FP registers
  2019-08-22  8:42 [PATCH v5 00/20] KVM RISC-V Support Anup Patel
                   ` (15 preceding siblings ...)
  2019-08-22  8:46 ` [PATCH v5 16/20] RISC-V: KVM: FP lazy save/restore Anup Patel
@ 2019-08-22  8:46 ` Anup Patel
  2019-08-22  8:46 ` [PATCH v5 18/20] RISC-V: KVM: Add SBI v0.1 support Anup Patel
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 61+ messages in thread
From: Anup Patel @ 2019-08-22  8:46 UTC (permalink / raw)
  To: Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel, Anup Patel

From: Atish Patra <atish.patra@wdc.com>

Add a KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctl interface for floating
point registers such as F0-F31 and FCSR. This support is added for
both 'F' and 'D' extensions.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/riscv/include/uapi/asm/kvm.h |  10 +++
 arch/riscv/kvm/vcpu.c             | 104 ++++++++++++++++++++++++++++++
 2 files changed, 114 insertions(+)

diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
index 024f220eb17e..c9f03363bb28 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -83,6 +83,16 @@ struct kvm_sregs {
 #define KVM_REG_RISCV_CSR_REG(name)	\
 		(offsetof(struct kvm_sregs, name) / sizeof(unsigned long))
 
+/* F extension registers are mapped as type4 */
+#define KVM_REG_RISCV_FP_F		(0x04 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_FP_F_REG(name)	\
+		(offsetof(struct __riscv_f_ext_state, name) / sizeof(u32))
+
+/* D extension registers are mapped as type 5 */
+#define KVM_REG_RISCV_FP_D		(0x05 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_FP_D_REG(name)	\
+		(offsetof(struct __riscv_d_ext_state, name) / sizeof(u64))
+
 #endif
 
 #endif /* __LINUX_KVM_RISCV_H */
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index e7c5fe09c3bc..ad7b67dc80aa 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -426,6 +426,98 @@ static int kvm_riscv_vcpu_set_reg_csr(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
+static int kvm_riscv_vcpu_get_reg_fp(struct kvm_vcpu *vcpu,
+				     const struct kvm_one_reg *reg,
+				     unsigned long rtype)
+{
+	struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
+	unsigned long isa = vcpu->arch.isa;
+	unsigned long __user *uaddr =
+			(unsigned long __user *)(unsigned long)reg->addr;
+	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
+					    KVM_REG_SIZE_MASK |
+					    rtype);
+	void *reg_val;
+
+	if ((rtype == KVM_REG_RISCV_FP_F) &&
+	    riscv_isa_extension_available(&isa, f)) {
+		if (KVM_REG_SIZE(reg->id) != sizeof(u32))
+			return -EINVAL;
+		if (reg_num == KVM_REG_RISCV_FP_F_REG(fcsr))
+			reg_val = &cntx->fp.f.fcsr;
+		else if ((KVM_REG_RISCV_FP_F_REG(f[0]) <= reg_num) &&
+			  reg_num <= KVM_REG_RISCV_FP_F_REG(f[31]))
+			reg_val = &cntx->fp.f.f[reg_num];
+		else
+			return -EINVAL;
+	} else if ((rtype == KVM_REG_RISCV_FP_D) &&
+		   riscv_isa_extension_available(&isa, d)) {
+		if (reg_num == KVM_REG_RISCV_FP_D_REG(fcsr)) {
+			if (KVM_REG_SIZE(reg->id) != sizeof(u32))
+				return -EINVAL;
+			reg_val = &cntx->fp.d.fcsr;
+		} else if ((KVM_REG_RISCV_FP_D_REG(f[0]) <= reg_num) &&
+			   reg_num <= KVM_REG_RISCV_FP_D_REG(f[31])) {
+			if (KVM_REG_SIZE(reg->id) != sizeof(u64))
+				return -EINVAL;
+			reg_val = &cntx->fp.d.f[reg_num];
+		} else
+			return -EINVAL;
+	} else
+		return -EINVAL;
+
+	if (copy_to_user(uaddr, reg_val, KVM_REG_SIZE(reg->id)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int kvm_riscv_vcpu_set_reg_fp(struct kvm_vcpu *vcpu,
+				     const struct kvm_one_reg *reg,
+				     unsigned long rtype)
+{
+	struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
+	unsigned long isa = vcpu->arch.isa;
+	unsigned long __user *uaddr =
+			(unsigned long __user *)(unsigned long)reg->addr;
+	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
+					    KVM_REG_SIZE_MASK |
+					    rtype);
+	void *reg_val;
+
+	if ((rtype == KVM_REG_RISCV_FP_F) &&
+	    riscv_isa_extension_available(&isa, f)) {
+		if (KVM_REG_SIZE(reg->id) != sizeof(u32))
+			return -EINVAL;
+		if (reg_num == KVM_REG_RISCV_FP_F_REG(fcsr))
+			reg_val = &cntx->fp.f.fcsr;
+		else if ((KVM_REG_RISCV_FP_F_REG(f[0]) <= reg_num) &&
+			  reg_num <= KVM_REG_RISCV_FP_F_REG(f[31]))
+			reg_val = &cntx->fp.f.f[reg_num];
+		else
+			return -EINVAL;
+	} else if ((rtype == KVM_REG_RISCV_FP_D) &&
+		   riscv_isa_extension_available(&isa, d)) {
+		if (reg_num == KVM_REG_RISCV_FP_D_REG(fcsr)) {
+			if (KVM_REG_SIZE(reg->id) != sizeof(u32))
+				return -EINVAL;
+			reg_val = &cntx->fp.d.fcsr;
+		} else if ((KVM_REG_RISCV_FP_D_REG(f[0]) <= reg_num) &&
+			   reg_num <= KVM_REG_RISCV_FP_D_REG(f[31])) {
+			if (KVM_REG_SIZE(reg->id) != sizeof(u64))
+				return -EINVAL;
+			reg_val = &cntx->fp.d.f[reg_num];
+		} else
+			return -EINVAL;
+	} else
+		return -EINVAL;
+
+	if (copy_from_user(reg_val, uaddr, KVM_REG_SIZE(reg->id)))
+		return -EFAULT;
+
+	return 0;
+}
+
 static int kvm_riscv_vcpu_set_reg(struct kvm_vcpu *vcpu,
 				  const struct kvm_one_reg *reg)
 {
@@ -435,6 +527,12 @@ static int kvm_riscv_vcpu_set_reg(struct kvm_vcpu *vcpu,
 		return kvm_riscv_vcpu_set_reg_core(vcpu, reg);
 	else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CSR)
 		return kvm_riscv_vcpu_set_reg_csr(vcpu, reg);
+	else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_FP_F)
+		return kvm_riscv_vcpu_set_reg_fp(vcpu, reg,
+						 KVM_REG_RISCV_FP_F);
+	else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_FP_D)
+		return kvm_riscv_vcpu_set_reg_fp(vcpu, reg,
+						 KVM_REG_RISCV_FP_D);
 
 	return -EINVAL;
 }
@@ -448,6 +546,12 @@ static int kvm_riscv_vcpu_get_reg(struct kvm_vcpu *vcpu,
 		return kvm_riscv_vcpu_get_reg_core(vcpu, reg);
 	else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CSR)
 		return kvm_riscv_vcpu_get_reg_csr(vcpu, reg);
+	else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_FP_F)
+		return kvm_riscv_vcpu_get_reg_fp(vcpu, reg,
+						 KVM_REG_RISCV_FP_F);
+	else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_FP_D)
+		return kvm_riscv_vcpu_get_reg_fp(vcpu, reg,
+						 KVM_REG_RISCV_FP_D);
 
 	return -EINVAL;
 }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH v5 18/20] RISC-V: KVM: Add SBI v0.1 support
  2019-08-22  8:42 [PATCH v5 00/20] KVM RISC-V Support Anup Patel
                   ` (16 preceding siblings ...)
  2019-08-22  8:46 ` [PATCH v5 17/20] RISC-V: KVM: Implement ONE REG interface for FP registers Anup Patel
@ 2019-08-22  8:46 ` Anup Patel
  2019-08-23  8:04   ` Alexander Graf
  2019-08-22  8:47 ` [PATCH v5 19/20] RISC-V: Enable VIRTIO drivers in RV64 and RV32 defconfig Anup Patel
                   ` (2 subsequent siblings)
  20 siblings, 1 reply; 61+ messages in thread
From: Anup Patel @ 2019-08-22  8:46 UTC (permalink / raw)
  To: Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel, Anup Patel

From: Atish Patra <atish.patra@wdc.com>

The KVM host kernel running in HS-mode needs to handle SBI calls coming
from guest kernel running in VS-mode.

This patch adds SBI v0.1 support in KVM RISC-V. All the SBI calls are
implemented correctly except remote tlb flushes. For remote TLB flushes,
we are doing full TLB flush and this will be optimized in future.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/riscv/include/asm/kvm_host.h |   2 +
 arch/riscv/kvm/Makefile           |   2 +-
 arch/riscv/kvm/vcpu_exit.c        |   3 +
 arch/riscv/kvm/vcpu_sbi.c         | 119 ++++++++++++++++++++++++++++++
 4 files changed, 125 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/kvm/vcpu_sbi.c

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 2af3a179c08e..0b1eceaef59f 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -241,4 +241,6 @@ bool kvm_riscv_vcpu_has_interrupt(struct kvm_vcpu *vcpu);
 void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu);
 void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu);
 
+int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu);
+
 #endif /* __RISCV_KVM_HOST_H__ */
diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
index 3e0c7558320d..b56dc1650d2c 100644
--- a/arch/riscv/kvm/Makefile
+++ b/arch/riscv/kvm/Makefile
@@ -9,6 +9,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
 kvm-objs := $(common-objs-y)
 
 kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
-kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o
+kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o vcpu_sbi.o
 
 obj-$(CONFIG_KVM)	+= kvm.o
diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
index fbc04fe335ad..87b83fcf9a14 100644
--- a/arch/riscv/kvm/vcpu_exit.c
+++ b/arch/riscv/kvm/vcpu_exit.c
@@ -534,6 +534,9 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		    (vcpu->arch.guest_context.hstatus & HSTATUS_STL))
 			ret = stage2_page_fault(vcpu, run, scause, stval);
 		break;
+	case EXC_SUPERVISOR_SYSCALL:
+		if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV)
+			ret = kvm_riscv_vcpu_sbi_ecall(vcpu);
 	default:
 		break;
 	};
diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
new file mode 100644
index 000000000000..5793202eb514
--- /dev/null
+++ b/arch/riscv/kvm/vcpu_sbi.c
@@ -0,0 +1,119 @@
+// SPDX-License-Identifier: GPL-2.0
+/**
+ * Copyright (c) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ *     Atish Patra <atish.patra@wdc.com>
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <asm/csr.h>
+#include <asm/kvm_vcpu_timer.h>
+
+#define SBI_VERSION_MAJOR			0
+#define SBI_VERSION_MINOR			1
+
+/* TODO: Handle traps due to unpriv load and redirect it back to VS-mode */
+static unsigned long kvm_sbi_unpriv_load(const unsigned long *addr,
+					 struct kvm_vcpu *vcpu)
+{
+	unsigned long flags, val;
+	unsigned long __hstatus, __sstatus;
+
+	local_irq_save(flags);
+	__hstatus = csr_read(CSR_HSTATUS);
+	__sstatus = csr_read(CSR_SSTATUS);
+	csr_write(CSR_HSTATUS, vcpu->arch.guest_context.hstatus | HSTATUS_SPRV);
+	csr_write(CSR_SSTATUS, vcpu->arch.guest_context.sstatus);
+	val = *addr;
+	csr_write(CSR_HSTATUS, __hstatus);
+	csr_write(CSR_SSTATUS, __sstatus);
+	local_irq_restore(flags);
+
+	return val;
+}
+
+static void kvm_sbi_system_shutdown(struct kvm_vcpu *vcpu, u32 type)
+{
+	int i;
+	struct kvm_vcpu *tmp;
+
+	kvm_for_each_vcpu(i, tmp, vcpu->kvm)
+		tmp->arch.power_off = true;
+	kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
+
+	memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
+	vcpu->run->system_event.type = type;
+	vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+}
+
+int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu)
+{
+	int ret = 1;
+	u64 next_cycle;
+	int vcpuid;
+	struct kvm_vcpu *remote_vcpu;
+	ulong dhart_mask;
+	struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
+
+	if (!cp)
+		return -EINVAL;
+	switch (cp->a7) {
+	case SBI_SET_TIMER:
+#if __riscv_xlen == 32
+		next_cycle = ((u64)cp->a1 << 32) | (u64)cp->a0;
+#else
+		next_cycle = (u64)cp->a0;
+#endif
+		kvm_riscv_vcpu_timer_next_event(vcpu, next_cycle);
+		break;
+	case SBI_CONSOLE_PUTCHAR:
+		/* Not implemented */
+		cp->a0 = -ENOTSUPP;
+		break;
+	case SBI_CONSOLE_GETCHAR:
+		/* Not implemented */
+		cp->a0 = -ENOTSUPP;
+		break;
+	case SBI_CLEAR_IPI:
+		kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_S_SOFT);
+		break;
+	case SBI_SEND_IPI:
+		dhart_mask = kvm_sbi_unpriv_load((unsigned long *)cp->a0, vcpu);
+		for_each_set_bit(vcpuid, &dhart_mask, BITS_PER_LONG) {
+			remote_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, vcpuid);
+			kvm_riscv_vcpu_set_interrupt(remote_vcpu, IRQ_S_SOFT);
+		}
+		break;
+	case SBI_SHUTDOWN:
+		kvm_sbi_system_shutdown(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN);
+		ret = 0;
+		break;
+	case SBI_REMOTE_FENCE_I:
+		sbi_remote_fence_i(NULL);
+		break;
+	/*
+	 * TODO: There should be a way to call remote hfence.bvma.
+	 * Preferred method is now a SBI call. Until then, just flush
+	 * all tlbs.
+	 */
+	case SBI_REMOTE_SFENCE_VMA:
+		/*TODO: Parse vma range.*/
+		sbi_remote_sfence_vma(NULL, 0, 0);
+		break;
+	case SBI_REMOTE_SFENCE_VMA_ASID:
+		/*TODO: Parse vma range for given ASID */
+		sbi_remote_sfence_vma(NULL, 0, 0);
+		break;
+	default:
+		cp->a0 = ENOTSUPP;
+		break;
+	};
+
+	if (ret >= 0)
+		cp->sepc += 4;
+
+	return ret;
+}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH v5 19/20] RISC-V: Enable VIRTIO drivers in RV64 and RV32 defconfig
  2019-08-22  8:42 [PATCH v5 00/20] KVM RISC-V Support Anup Patel
                   ` (17 preceding siblings ...)
  2019-08-22  8:46 ` [PATCH v5 18/20] RISC-V: KVM: Add SBI v0.1 support Anup Patel
@ 2019-08-22  8:47 ` Anup Patel
  2019-08-22  8:47 ` [PATCH v5 20/20] RISC-V: KVM: Add MAINTAINERS entry Anup Patel
  2019-08-23  8:08 ` [PATCH v5 00/20] KVM RISC-V Support Alexander Graf
  20 siblings, 0 replies; 61+ messages in thread
From: Anup Patel @ 2019-08-22  8:47 UTC (permalink / raw)
  To: Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel, Anup Patel

This patch enables more VIRTIO drivers (such as console, rpmsg, 9p,
rng, etc.) which are usable on KVM RISC-V Guest and Xvisor RISC-V
Guest.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/riscv/configs/defconfig      | 11 +++++++++++
 arch/riscv/configs/rv32_defconfig | 11 +++++++++++
 2 files changed, 22 insertions(+)

diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig
index 3efff552a261..420a0dbef386 100644
--- a/arch/riscv/configs/defconfig
+++ b/arch/riscv/configs/defconfig
@@ -29,6 +29,8 @@ CONFIG_IP_PNP_DHCP=y
 CONFIG_IP_PNP_BOOTP=y
 CONFIG_IP_PNP_RARP=y
 CONFIG_NETLINK_DIAG=y
+CONFIG_NET_9P=y
+CONFIG_NET_9P_VIRTIO=y
 CONFIG_PCI=y
 CONFIG_PCIEPORTBUS=y
 CONFIG_PCI_HOST_GENERIC=y
@@ -39,6 +41,7 @@ CONFIG_BLK_DEV_LOOP=y
 CONFIG_VIRTIO_BLK=y
 CONFIG_BLK_DEV_SD=y
 CONFIG_BLK_DEV_SR=y
+CONFIG_SCSI_VIRTIO=y
 CONFIG_ATA=y
 CONFIG_SATA_AHCI=y
 CONFIG_SATA_AHCI_PLATFORM=y
@@ -54,6 +57,7 @@ CONFIG_SERIAL_8250_CONSOLE=y
 CONFIG_SERIAL_OF_PLATFORM=y
 CONFIG_SERIAL_EARLYCON_RISCV_SBI=y
 CONFIG_HVC_RISCV_SBI=y
+CONFIG_VIRTIO_CONSOLE=y
 CONFIG_HW_RANDOM=y
 CONFIG_HW_RANDOM_VIRTIO=y
 CONFIG_SPI=y
@@ -61,6 +65,7 @@ CONFIG_SPI_SIFIVE=y
 # CONFIG_PTP_1588_CLOCK is not set
 CONFIG_DRM=y
 CONFIG_DRM_RADEON=y
+CONFIG_DRM_VIRTIO_GPU=y
 CONFIG_FRAMEBUFFER_CONSOLE=y
 CONFIG_USB=y
 CONFIG_USB_XHCI_HCD=y
@@ -73,7 +78,12 @@ CONFIG_USB_STORAGE=y
 CONFIG_USB_UAS=y
 CONFIG_MMC=y
 CONFIG_MMC_SPI=y
+CONFIG_VIRTIO_PCI=y
+CONFIG_VIRTIO_BALLOON=y
+CONFIG_VIRTIO_INPUT=y
 CONFIG_VIRTIO_MMIO=y
+CONFIG_RPMSG_CHAR=y
+CONFIG_RPMSG_VIRTIO=y
 CONFIG_EXT4_FS=y
 CONFIG_EXT4_FS_POSIX_ACL=y
 CONFIG_AUTOFS4_FS=y
@@ -86,6 +96,7 @@ CONFIG_NFS_V4=y
 CONFIG_NFS_V4_1=y
 CONFIG_NFS_V4_2=y
 CONFIG_ROOT_NFS=y
+CONFIG_9P_FS=y
 CONFIG_CRYPTO_USER_API_HASH=y
 CONFIG_CRYPTO_DEV_VIRTIO=y
 CONFIG_PRINTK_TIME=y
diff --git a/arch/riscv/configs/rv32_defconfig b/arch/riscv/configs/rv32_defconfig
index 7da93e494445..87ee6e62b64b 100644
--- a/arch/riscv/configs/rv32_defconfig
+++ b/arch/riscv/configs/rv32_defconfig
@@ -29,6 +29,8 @@ CONFIG_IP_PNP_DHCP=y
 CONFIG_IP_PNP_BOOTP=y
 CONFIG_IP_PNP_RARP=y
 CONFIG_NETLINK_DIAG=y
+CONFIG_NET_9P=y
+CONFIG_NET_9P_VIRTIO=y
 CONFIG_PCI=y
 CONFIG_PCIEPORTBUS=y
 CONFIG_PCI_HOST_GENERIC=y
@@ -39,6 +41,7 @@ CONFIG_BLK_DEV_LOOP=y
 CONFIG_VIRTIO_BLK=y
 CONFIG_BLK_DEV_SD=y
 CONFIG_BLK_DEV_SR=y
+CONFIG_SCSI_VIRTIO=y
 CONFIG_ATA=y
 CONFIG_SATA_AHCI=y
 CONFIG_SATA_AHCI_PLATFORM=y
@@ -54,11 +57,13 @@ CONFIG_SERIAL_8250_CONSOLE=y
 CONFIG_SERIAL_OF_PLATFORM=y
 CONFIG_SERIAL_EARLYCON_RISCV_SBI=y
 CONFIG_HVC_RISCV_SBI=y
+CONFIG_VIRTIO_CONSOLE=y
 CONFIG_HW_RANDOM=y
 CONFIG_HW_RANDOM_VIRTIO=y
 # CONFIG_PTP_1588_CLOCK is not set
 CONFIG_DRM=y
 CONFIG_DRM_RADEON=y
+CONFIG_DRM_VIRTIO_GPU=y
 CONFIG_FRAMEBUFFER_CONSOLE=y
 CONFIG_USB=y
 CONFIG_USB_XHCI_HCD=y
@@ -69,7 +74,12 @@ CONFIG_USB_OHCI_HCD=y
 CONFIG_USB_OHCI_HCD_PLATFORM=y
 CONFIG_USB_STORAGE=y
 CONFIG_USB_UAS=y
+CONFIG_VIRTIO_PCI=y
+CONFIG_VIRTIO_BALLOON=y
+CONFIG_VIRTIO_INPUT=y
 CONFIG_VIRTIO_MMIO=y
+CONFIG_RPMSG_CHAR=y
+CONFIG_RPMSG_VIRTIO=y
 CONFIG_SIFIVE_PLIC=y
 CONFIG_EXT4_FS=y
 CONFIG_EXT4_FS_POSIX_ACL=y
@@ -83,6 +93,7 @@ CONFIG_NFS_V4=y
 CONFIG_NFS_V4_1=y
 CONFIG_NFS_V4_2=y
 CONFIG_ROOT_NFS=y
+CONFIG_9P_FS=y
 CONFIG_CRYPTO_USER_API_HASH=y
 CONFIG_CRYPTO_DEV_VIRTIO=y
 CONFIG_PRINTK_TIME=y
-- 
2.17.1


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH v5 20/20] RISC-V: KVM: Add MAINTAINERS entry
  2019-08-22  8:42 [PATCH v5 00/20] KVM RISC-V Support Anup Patel
                   ` (18 preceding siblings ...)
  2019-08-22  8:47 ` [PATCH v5 19/20] RISC-V: Enable VIRTIO drivers in RV64 and RV32 defconfig Anup Patel
@ 2019-08-22  8:47 ` Anup Patel
  2019-08-23  8:08 ` [PATCH v5 00/20] KVM RISC-V Support Alexander Graf
  20 siblings, 0 replies; 61+ messages in thread
From: Anup Patel @ 2019-08-22  8:47 UTC (permalink / raw)
  To: Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel, Anup Patel

Add myself as maintainer for KVM RISC-V as Atish as designated reviewer.

For time being, we use my GitHub repo as KVM RISC-V gitrepo. We will
update this once we have common KVM RISC-V gitrepo under kernel.org.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 MAINTAINERS | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 43604d6ab96c..85c4e273fc72 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8877,6 +8877,16 @@ F:	arch/powerpc/include/asm/kvm*
 F:	arch/powerpc/kvm/
 F:	arch/powerpc/kernel/kvm*
 
+KERNEL VIRTUAL MACHINE FOR RISC-V (KVM/riscv)
+M:	Anup Patel <anup.patel@wdc.com>
+R:	Atish Patra <atish.patra@wdc.com>
+L:	kvm@vger.kernel.org
+T:	git git://github.com/avpatel/linux.git
+S:	Maintained
+F:	arch/riscv/include/uapi/asm/kvm*
+F:	arch/riscv/include/asm/kvm*
+F:	arch/riscv/kvm/
+
 KERNEL VIRTUAL MACHINE for s390 (KVM/s390)
 M:	Christian Borntraeger <borntraeger@de.ibm.com>
 M:	Janosch Frank <frankja@linux.ibm.com>
-- 
2.17.1


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 08/20] RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls
  2019-08-22  8:44 ` [PATCH v5 08/20] RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls Anup Patel
@ 2019-08-22 12:01   ` Alexander Graf
  2019-08-22 14:00     ` Anup Patel
  2019-08-22 14:05     ` Anup Patel
  0 siblings, 2 replies; 61+ messages in thread
From: Alexander Graf @ 2019-08-22 12:01 UTC (permalink / raw)
  To: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel

On 22.08.19 10:44, Anup Patel wrote:
> For KVM RISC-V, we use KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls to access
> VCPU config and registers from user-space.
> 
> We have three types of VCPU registers:
> 1. CONFIG - these are VCPU config and capabilities
> 2. CORE   - these are VCPU general purpose registers
> 3. CSR    - these are VCPU control and status registers
> 
> The CONFIG registers available to user-space are ISA and TIMEBASE. Out
> of these, TIMEBASE is a read-only register which inform user-space about
> VCPU timer base frequency. The ISA register is a read and write register
> where user-space can only write the desired VCPU ISA capabilities before
> running the VCPU.
> 
> The CORE registers available to user-space are PC, RA, SP, GP, TP, A0-A7,
> T0-T6, S0-S11 and MODE. Most of these are RISC-V general registers except
> PC and MODE. The PC register represents program counter whereas the MODE
> register represent VCPU privilege mode (i.e. S/U-mode).
> 
> The CSRs available to user-space are SSTATUS, SIE, STVEC, SSCRATCH, SEPC,
> SCAUSE, STVAL, SIP, and SATP. All of these are read/write registers.
> 
> In future, more VCPU register types will be added (such as FP) for the
> KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls.
> 
> Signed-off-by: Anup Patel <anup.patel@wdc.com>
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   arch/riscv/include/uapi/asm/kvm.h |  40 ++++-
>   arch/riscv/kvm/vcpu.c             | 235 +++++++++++++++++++++++++++++-
>   2 files changed, 272 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
> index 6dbc056d58ba..024f220eb17e 100644
> --- a/arch/riscv/include/uapi/asm/kvm.h
> +++ b/arch/riscv/include/uapi/asm/kvm.h
> @@ -23,8 +23,15 @@
>   
>   /* for KVM_GET_REGS and KVM_SET_REGS */
>   struct kvm_regs {
> +	/* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
> +	struct user_regs_struct regs;
> +	unsigned long mode;

Is there any particular reason you're reusing kvm_regs and don't invent 
your own struct? kvm_regs is explicitly meant for the get_regs and 
set_regs ioctls.

>   };
>   
> +/* Possible privilege modes for kvm_regs */
> +#define KVM_RISCV_MODE_S	1
> +#define KVM_RISCV_MODE_U	0
> +
>   /* for KVM_GET_FPU and KVM_SET_FPU */
>   struct kvm_fpu {
>   };
> @@ -41,10 +48,41 @@ struct kvm_guest_debug_arch {
>   struct kvm_sync_regs {
>   };
>   
> -/* dummy definition */
> +/* for KVM_GET_SREGS and KVM_SET_SREGS */
>   struct kvm_sregs {
> +	unsigned long sstatus;
> +	unsigned long sie;
> +	unsigned long stvec;
> +	unsigned long sscratch;
> +	unsigned long sepc;
> +	unsigned long scause;
> +	unsigned long stval;
> +	unsigned long sip;
> +	unsigned long satp;

Same comment here.

>   };
>   
> +#define KVM_REG_SIZE(id)		\
> +	(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
> +
> +/* If you need to interpret the index values, here is the key: */
> +#define KVM_REG_RISCV_TYPE_MASK		0x00000000FF000000
> +#define KVM_REG_RISCV_TYPE_SHIFT	24
> +
> +/* Config registers are mapped as type 1 */
> +#define KVM_REG_RISCV_CONFIG		(0x01 << KVM_REG_RISCV_TYPE_SHIFT)
> +#define KVM_REG_RISCV_CONFIG_ISA	0x0
> +#define KVM_REG_RISCV_CONFIG_TIMEBASE	0x1
> +
> +/* Core registers are mapped as type 2 */
> +#define KVM_REG_RISCV_CORE		(0x02 << KVM_REG_RISCV_TYPE_SHIFT)
> +#define KVM_REG_RISCV_CORE_REG(name)	\
> +		(offsetof(struct kvm_regs, name) / sizeof(unsigned long))

I see, you're trying to implicitly use the struct offsets as index.

I'm not a really big fan of it, but I can't pinpoint exactly why just 
yet. It just seems too magical (read: potentially breaking down the 
road) for me.

> +
> +/* Control and status registers are mapped as type 3 */
> +#define KVM_REG_RISCV_CSR		(0x03 << KVM_REG_RISCV_TYPE_SHIFT)
> +#define KVM_REG_RISCV_CSR_REG(name)	\
> +		(offsetof(struct kvm_sregs, name) / sizeof(unsigned long))
> +
>   #endif
>   
>   #endif /* __LINUX_KVM_RISCV_H */
> diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
> index 7f59e85c6af8..9396a83c0611 100644
> --- a/arch/riscv/kvm/vcpu.c
> +++ b/arch/riscv/kvm/vcpu.c
> @@ -164,6 +164,215 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
>   	return VM_FAULT_SIGBUS;
>   }
>   
> +static int kvm_riscv_vcpu_get_reg_config(struct kvm_vcpu *vcpu,
> +					 const struct kvm_one_reg *reg)
> +{
> +	unsigned long __user *uaddr =
> +			(unsigned long __user *)(unsigned long)reg->addr;
> +	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
> +					    KVM_REG_SIZE_MASK |
> +					    KVM_REG_RISCV_CONFIG);
> +	unsigned long reg_val;
> +
> +	if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
> +		return -EINVAL;
> +
> +	switch (reg_num) {
> +	case KVM_REG_RISCV_CONFIG_ISA:
> +		reg_val = vcpu->arch.isa;
> +		break;
> +	case KVM_REG_RISCV_CONFIG_TIMEBASE:
> +		reg_val = riscv_timebase;

What does this reflect? The current guest time hopefully not? An offset? 
Related to what?

All ONE_REG registers should be documented in 
Documentation/virtual/kvm/api.txt. Please add them there.

> +		break;
> +	default:
> +		return -EINVAL;
> +	};
> +
> +	if (copy_to_user(uaddr, &reg_val, KVM_REG_SIZE(reg->id)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
> +static int kvm_riscv_vcpu_set_reg_config(struct kvm_vcpu *vcpu,
> +					 const struct kvm_one_reg *reg)
> +{
> +	unsigned long __user *uaddr =
> +			(unsigned long __user *)(unsigned long)reg->addr;
> +	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
> +					    KVM_REG_SIZE_MASK |
> +					    KVM_REG_RISCV_CONFIG);
> +	unsigned long reg_val;
> +
> +	if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
> +		return -EINVAL;
> +
> +	if (copy_from_user(&reg_val, uaddr, KVM_REG_SIZE(reg->id)))
> +		return -EFAULT;
> +
> +	switch (reg_num) {
> +	case KVM_REG_RISCV_CONFIG_ISA:
> +		if (!vcpu->arch.ran_atleast_once) {
> +			vcpu->arch.isa = reg_val;
> +			vcpu->arch.isa &= riscv_isa_extension_base(NULL);
> +			vcpu->arch.isa &= KVM_RISCV_ISA_ALLOWED;

This register definitely needs proper documentation too ;). You may want 
to reconsider to put a few of the helper bits from patch 02/20 into 
uapi, so that user space can directly use them.

> +		} else {
> +			return -ENOTSUPP;
> +		}
> +		break;
> +	case KVM_REG_RISCV_CONFIG_TIMEBASE:
> +		return -ENOTSUPP;
> +	default:
> +		return -EINVAL;
> +	};
> +
> +	return 0;
> +}
> +
> +static int kvm_riscv_vcpu_get_reg_core(struct kvm_vcpu *vcpu,
> +				       const struct kvm_one_reg *reg)
> +{
> +	struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
> +	unsigned long __user *uaddr =
> +			(unsigned long __user *)(unsigned long)reg->addr;
> +	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
> +					    KVM_REG_SIZE_MASK |
> +					    KVM_REG_RISCV_CORE);
> +	unsigned long reg_val;
> +
> +	if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
> +		return -EINVAL;
> +
> +	if (reg_num == KVM_REG_RISCV_CORE_REG(regs.pc))
> +		reg_val = cntx->sepc;
> +	else if (KVM_REG_RISCV_CORE_REG(regs.pc) < reg_num &&
> +		 reg_num <= KVM_REG_RISCV_CORE_REG(regs.t6))
> +		reg_val = ((unsigned long *)cntx)[reg_num];
> +	else if (reg_num == KVM_REG_RISCV_CORE_REG(mode))
> +		reg_val = (cntx->sstatus & SR_SPP) ?
> +				KVM_RISCV_MODE_S : KVM_RISCV_MODE_U;
> +	else
> +		return -EINVAL;
> +
> +	if (copy_to_user(uaddr, &reg_val, KVM_REG_SIZE(reg->id)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
> +static int kvm_riscv_vcpu_set_reg_core(struct kvm_vcpu *vcpu,
> +				       const struct kvm_one_reg *reg)
> +{
> +	struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
> +	unsigned long __user *uaddr =
> +			(unsigned long __user *)(unsigned long)reg->addr;
> +	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
> +					    KVM_REG_SIZE_MASK |
> +					    KVM_REG_RISCV_CORE);
> +	unsigned long reg_val;
> +
> +	if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
> +		return -EINVAL;
> +
> +	if (copy_from_user(&reg_val, uaddr, KVM_REG_SIZE(reg->id)))
> +		return -EFAULT;
> +
> +	if (reg_num == KVM_REG_RISCV_CORE_REG(regs.pc))
> +		cntx->sepc = reg_val;
> +	else if (KVM_REG_RISCV_CORE_REG(regs.pc) < reg_num &&
> +		 reg_num <= KVM_REG_RISCV_CORE_REG(regs.t6))
> +		((unsigned long *)cntx)[reg_num] = reg_val;
> +	else if (reg_num == KVM_REG_RISCV_CORE_REG(mode)) {
> +		if (reg_val == KVM_RISCV_MODE_S)
> +			cntx->sstatus |= SR_SPP;
> +		else
> +			cntx->sstatus &= ~SR_SPP;
> +	} else
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
> +static int kvm_riscv_vcpu_get_reg_csr(struct kvm_vcpu *vcpu,
> +				      const struct kvm_one_reg *reg)
> +{
> +	struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
> +	unsigned long __user *uaddr =
> +			(unsigned long __user *)(unsigned long)reg->addr;
> +	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
> +					    KVM_REG_SIZE_MASK |
> +					    KVM_REG_RISCV_CSR);
> +	unsigned long reg_val;
> +
> +	if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
> +		return -EINVAL;
> +	if (reg_num >= sizeof(struct kvm_sregs) / sizeof(unsigned long))
> +		return -EINVAL;
> +
> +	if (reg_num == KVM_REG_RISCV_CSR_REG(sip))
> +		kvm_riscv_vcpu_flush_interrupts(vcpu);
> +
> +	reg_val = ((unsigned long *)csr)[reg_num];
> +
> +	if (copy_to_user(uaddr, &reg_val, KVM_REG_SIZE(reg->id)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
> +static int kvm_riscv_vcpu_set_reg_csr(struct kvm_vcpu *vcpu,
> +				      const struct kvm_one_reg *reg)
> +{
> +	struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
> +	unsigned long __user *uaddr =
> +			(unsigned long __user *)(unsigned long)reg->addr;
> +	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
> +					    KVM_REG_SIZE_MASK |
> +					    KVM_REG_RISCV_CSR);
> +	unsigned long reg_val;
> +
> +	if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
> +		return -EINVAL;
> +	if (reg_num >= sizeof(struct kvm_sregs) / sizeof(unsigned long))
> +		return -EINVAL;
> +
> +	if (copy_from_user(&reg_val, uaddr, KVM_REG_SIZE(reg->id)))
> +		return -EFAULT;
> +
> +	((unsigned long *)csr)[reg_num] = reg_val;
> +
> +	if (reg_num == KVM_REG_RISCV_CSR_REG(sip))
> +		WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);

Why does writing SIP clear all pending interrupts?


Alex

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 10/20] RISC-V: KVM: Handle MMIO exits for VCPU
  2019-08-22  8:44 ` [PATCH v5 10/20] RISC-V: KVM: Handle MMIO exits for VCPU Anup Patel
@ 2019-08-22 12:10   ` Alexander Graf
  2019-08-22 12:21     ` Andrew Jones
  2019-08-22 12:27     ` Anup Patel
  2019-08-22 12:14   ` Alexander Graf
  1 sibling, 2 replies; 61+ messages in thread
From: Alexander Graf @ 2019-08-22 12:10 UTC (permalink / raw)
  To: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel

On 22.08.19 10:44, Anup Patel wrote:
> We will get stage2 page faults whenever Guest/VM access SW emulated
> MMIO device or unmapped Guest RAM.
> 
> This patch implements MMIO read/write emulation by extracting MMIO
> details from the trapped load/store instruction and forwarding the
> MMIO read/write to user-space. The actual MMIO emulation will happen
> in user-space and KVM kernel module will only take care of register
> updates before resuming the trapped VCPU.
> 
> The handling for stage2 page faults for unmapped Guest RAM will be
> implemeted by a separate patch later.
> 
> Signed-off-by: Anup Patel <anup.patel@wdc.com>
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   arch/riscv/include/asm/kvm_host.h |  11 +
>   arch/riscv/kvm/mmu.c              |   7 +
>   arch/riscv/kvm/vcpu_exit.c        | 436 +++++++++++++++++++++++++++++-
>   3 files changed, 451 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> index 18f1097f1d8d..4388bace6d70 100644
> --- a/arch/riscv/include/asm/kvm_host.h
> +++ b/arch/riscv/include/asm/kvm_host.h
> @@ -53,6 +53,12 @@ struct kvm_arch {
>   	phys_addr_t pgd_phys;
>   };
>   
> +struct kvm_mmio_decode {
> +	unsigned long insn;
> +	int len;
> +	int shift;
> +};
> +
>   struct kvm_cpu_context {
>   	unsigned long zero;
>   	unsigned long ra;
> @@ -141,6 +147,9 @@ struct kvm_vcpu_arch {
>   	unsigned long irqs_pending;
>   	unsigned long irqs_pending_mask;
>   
> +	/* MMIO instruction details */
> +	struct kvm_mmio_decode mmio_decode;
> +
>   	/* VCPU power-off state */
>   	bool power_off;
>   
> @@ -160,6 +169,8 @@ static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
>   int kvm_riscv_setup_vsip(void);
>   void kvm_riscv_cleanup_vsip(void);
>   
> +int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
> +			 bool is_write);
>   void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu);
>   int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm);
>   void kvm_riscv_stage2_free_pgd(struct kvm *kvm);
> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index 04dd089b86ff..2b965f9aac07 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -61,6 +61,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>   	return 0;
>   }
>   
> +int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
> +			 bool is_write)
> +{
> +	/* TODO: */
> +	return 0;
> +}
> +
>   void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu)
>   {
>   	/* TODO: */
> diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
> index e4d7c8f0807a..efc06198c259 100644
> --- a/arch/riscv/kvm/vcpu_exit.c
> +++ b/arch/riscv/kvm/vcpu_exit.c
> @@ -6,9 +6,371 @@
>    *     Anup Patel <anup.patel@wdc.com>
>    */
>   
> +#include <linux/bitops.h>
>   #include <linux/errno.h>
>   #include <linux/err.h>
>   #include <linux/kvm_host.h>
> +#include <asm/csr.h>
> +
> +#define INSN_MATCH_LB		0x3
> +#define INSN_MASK_LB		0x707f
> +#define INSN_MATCH_LH		0x1003
> +#define INSN_MASK_LH		0x707f
> +#define INSN_MATCH_LW		0x2003
> +#define INSN_MASK_LW		0x707f
> +#define INSN_MATCH_LD		0x3003
> +#define INSN_MASK_LD		0x707f
> +#define INSN_MATCH_LBU		0x4003
> +#define INSN_MASK_LBU		0x707f
> +#define INSN_MATCH_LHU		0x5003
> +#define INSN_MASK_LHU		0x707f
> +#define INSN_MATCH_LWU		0x6003
> +#define INSN_MASK_LWU		0x707f
> +#define INSN_MATCH_SB		0x23
> +#define INSN_MASK_SB		0x707f
> +#define INSN_MATCH_SH		0x1023
> +#define INSN_MASK_SH		0x707f
> +#define INSN_MATCH_SW		0x2023
> +#define INSN_MASK_SW		0x707f
> +#define INSN_MATCH_SD		0x3023
> +#define INSN_MASK_SD		0x707f
> +
> +#define INSN_MATCH_C_LD		0x6000
> +#define INSN_MASK_C_LD		0xe003
> +#define INSN_MATCH_C_SD		0xe000
> +#define INSN_MASK_C_SD		0xe003
> +#define INSN_MATCH_C_LW		0x4000
> +#define INSN_MASK_C_LW		0xe003
> +#define INSN_MATCH_C_SW		0xc000
> +#define INSN_MASK_C_SW		0xe003
> +#define INSN_MATCH_C_LDSP	0x6002
> +#define INSN_MASK_C_LDSP	0xe003
> +#define INSN_MATCH_C_SDSP	0xe002
> +#define INSN_MASK_C_SDSP	0xe003
> +#define INSN_MATCH_C_LWSP	0x4002
> +#define INSN_MASK_C_LWSP	0xe003
> +#define INSN_MATCH_C_SWSP	0xc002
> +#define INSN_MASK_C_SWSP	0xe003
> +
> +#define INSN_LEN(insn)		((((insn) & 0x3) < 0x3) ? 2 : 4)
> +
> +#ifdef CONFIG_64BIT
> +#define LOG_REGBYTES		3
> +#else
> +#define LOG_REGBYTES		2
> +#endif
> +#define REGBYTES		(1 << LOG_REGBYTES)
> +
> +#define SH_RD			7
> +#define SH_RS1			15
> +#define SH_RS2			20
> +#define SH_RS2C			2
> +
> +#define RV_X(x, s, n)		(((x) >> (s)) & ((1 << (n)) - 1))
> +#define RVC_LW_IMM(x)		((RV_X(x, 6, 1) << 2) | \
> +				 (RV_X(x, 10, 3) << 3) | \
> +				 (RV_X(x, 5, 1) << 6))
> +#define RVC_LD_IMM(x)		((RV_X(x, 10, 3) << 3) | \
> +				 (RV_X(x, 5, 2) << 6))
> +#define RVC_LWSP_IMM(x)		((RV_X(x, 4, 3) << 2) | \
> +				 (RV_X(x, 12, 1) << 5) | \
> +				 (RV_X(x, 2, 2) << 6))
> +#define RVC_LDSP_IMM(x)		((RV_X(x, 5, 2) << 3) | \
> +				 (RV_X(x, 12, 1) << 5) | \
> +				 (RV_X(x, 2, 3) << 6))
> +#define RVC_SWSP_IMM(x)		((RV_X(x, 9, 4) << 2) | \
> +				 (RV_X(x, 7, 2) << 6))
> +#define RVC_SDSP_IMM(x)		((RV_X(x, 10, 3) << 3) | \
> +				 (RV_X(x, 7, 3) << 6))
> +#define RVC_RS1S(insn)		(8 + RV_X(insn, SH_RD, 3))
> +#define RVC_RS2S(insn)		(8 + RV_X(insn, SH_RS2C, 3))
> +#define RVC_RS2(insn)		RV_X(insn, SH_RS2C, 5)
> +
> +#define SHIFT_RIGHT(x, y)		\
> +	((y) < 0 ? ((x) << -(y)) : ((x) >> (y)))
> +
> +#define REG_MASK			\
> +	((1 << (5 + LOG_REGBYTES)) - (1 << LOG_REGBYTES))
> +
> +#define REG_OFFSET(insn, pos)		\
> +	(SHIFT_RIGHT((insn), (pos) - LOG_REGBYTES) & REG_MASK)
> +
> +#define REG_PTR(insn, pos, regs)	\
> +	(ulong *)((ulong)(regs) + REG_OFFSET(insn, pos))
> +
> +#define GET_RM(insn)		(((insn) >> 12) & 7)
> +
> +#define GET_RS1(insn, regs)	(*REG_PTR(insn, SH_RS1, regs))
> +#define GET_RS2(insn, regs)	(*REG_PTR(insn, SH_RS2, regs))
> +#define GET_RS1S(insn, regs)	(*REG_PTR(RVC_RS1S(insn), 0, regs))
> +#define GET_RS2S(insn, regs)	(*REG_PTR(RVC_RS2S(insn), 0, regs))
> +#define GET_RS2C(insn, regs)	(*REG_PTR(insn, SH_RS2C, regs))
> +#define GET_SP(regs)		(*REG_PTR(2, 0, regs))
> +#define SET_RD(insn, regs, val)	(*REG_PTR(insn, SH_RD, regs) = (val))
> +#define IMM_I(insn)		((s32)(insn) >> 20)
> +#define IMM_S(insn)		(((s32)(insn) >> 25 << 5) | \
> +				 (s32)(((insn) >> 7) & 0x1f))
> +#define MASK_FUNCT3		0x7000
> +
> +#define STR(x)			XSTR(x)
> +#define XSTR(x)			#x
> +
> +/* TODO: Handle traps due to unpriv load and redirect it back to VS-mode */
> +static ulong get_insn(struct kvm_vcpu *vcpu)
> +{
> +	ulong __sepc = vcpu->arch.guest_context.sepc;
> +	ulong __hstatus, __sstatus, __vsstatus;
> +#ifdef CONFIG_RISCV_ISA_C
> +	ulong rvc_mask = 3, tmp;
> +#endif
> +	ulong flags, val;
> +
> +	local_irq_save(flags);
> +
> +	__vsstatus = csr_read(CSR_VSSTATUS);
> +	__sstatus = csr_read(CSR_SSTATUS);
> +	__hstatus = csr_read(CSR_HSTATUS);
> +
> +	csr_write(CSR_VSSTATUS, __vsstatus | SR_MXR);
> +	csr_write(CSR_SSTATUS, vcpu->arch.guest_context.sstatus | SR_MXR);
> +	csr_write(CSR_HSTATUS, vcpu->arch.guest_context.hstatus | HSTATUS_SPRV);
> +
> +#ifndef CONFIG_RISCV_ISA_C
> +	asm ("\n"
> +#ifdef CONFIG_64BIT
> +		STR(LWU) " %[insn], (%[addr])\n"
> +#else
> +		STR(LW) " %[insn], (%[addr])\n"
> +#endif
> +		: [insn] "=&r" (val) : [addr] "r" (__sepc));
> +#else
> +	asm ("and %[tmp], %[addr], 2\n"
> +		"bnez %[tmp], 1f\n"
> +#ifdef CONFIG_64BIT
> +		STR(LWU) " %[insn], (%[addr])\n"
> +#else
> +		STR(LW) " %[insn], (%[addr])\n"
> +#endif
> +		"and %[tmp], %[insn], %[rvc_mask]\n"
> +		"beq %[tmp], %[rvc_mask], 2f\n"
> +		"sll %[insn], %[insn], %[xlen_minus_16]\n"
> +		"srl %[insn], %[insn], %[xlen_minus_16]\n"
> +		"j 2f\n"
> +		"1:\n"
> +		"lhu %[insn], (%[addr])\n"
> +		"and %[tmp], %[insn], %[rvc_mask]\n"
> +		"bne %[tmp], %[rvc_mask], 2f\n"
> +		"lhu %[tmp], 2(%[addr])\n"
> +		"sll %[tmp], %[tmp], 16\n"
> +		"add %[insn], %[insn], %[tmp]\n"
> +		"2:"
> +	: [vsstatus] "+&r" (__vsstatus), [insn] "=&r" (val),
> +	  [tmp] "=&r" (tmp)
> +	: [addr] "r" (__sepc), [rvc_mask] "r" (rvc_mask),
> +	  [xlen_minus_16] "i" (__riscv_xlen - 16));
> +#endif
> +
> +	csr_write(CSR_HSTATUS, __hstatus);
> +	csr_write(CSR_SSTATUS, __sstatus);
> +	csr_write(CSR_VSSTATUS, __vsstatus);
> +
> +	local_irq_restore(flags);
> +
> +	return val;
> +}
> +
> +static int emulate_load(struct kvm_vcpu *vcpu, struct kvm_run *run,
> +			unsigned long fault_addr)
> +{
> +	int shift = 0, len = 0;
> +	ulong insn = get_insn(vcpu);
> +
> +	/* Decode length of MMIO and shift */
> +	if ((insn & INSN_MASK_LW) == INSN_MATCH_LW) {
> +		len = 4;
> +		shift = 8 * (sizeof(ulong) - len);
> +	} else if ((insn & INSN_MASK_LB) == INSN_MATCH_LB) {
> +		len = 1;
> +		shift = 8 * (sizeof(ulong) - len);
> +	} else if ((insn & INSN_MASK_LBU) == INSN_MATCH_LBU) {
> +		len = 1;
> +		shift = 8 * (sizeof(ulong) - len);
> +#ifdef CONFIG_64BIT
> +	} else if ((insn & INSN_MASK_LD) == INSN_MATCH_LD) {
> +		len = 8;
> +		shift = 8 * (sizeof(ulong) - len);
> +	} else if ((insn & INSN_MASK_LWU) == INSN_MATCH_LWU) {
> +		len = 4;
> +#endif
> +	} else if ((insn & INSN_MASK_LH) == INSN_MATCH_LH) {
> +		len = 2;
> +		shift = 8 * (sizeof(ulong) - len);
> +	} else if ((insn & INSN_MASK_LHU) == INSN_MATCH_LHU) {
> +		len = 2;
> +#ifdef CONFIG_RISCV_ISA_C
> +#ifdef CONFIG_64BIT
> +	} else if ((insn & INSN_MASK_C_LD) == INSN_MATCH_C_LD) {
> +		len = 8;
> +		shift = 8 * (sizeof(ulong) - len);
> +		insn = RVC_RS2S(insn) << SH_RD;
> +	} else if ((insn & INSN_MASK_C_LDSP) == INSN_MATCH_C_LDSP &&
> +		   ((insn >> SH_RD) & 0x1f)) {
> +		len = 8;
> +		shift = 8 * (sizeof(ulong) - len);
> +#endif
> +	} else if ((insn & INSN_MASK_C_LW) == INSN_MATCH_C_LW) {
> +		len = 4;
> +		shift = 8 * (sizeof(ulong) - len);
> +		insn = RVC_RS2S(insn) << SH_RD;
> +	} else if ((insn & INSN_MASK_C_LWSP) == INSN_MATCH_C_LWSP &&
> +		   ((insn >> SH_RD) & 0x1f)) {
> +		len = 4;
> +		shift = 8 * (sizeof(ulong) - len);
> +#endif
> +	} else {
> +		return -ENOTSUPP;
> +	}
> +
> +	/* Fault address should be aligned to length of MMIO */
> +	if (fault_addr & (len - 1))
> +		return -EIO;
> +
> +	/* Save instruction decode info */
> +	vcpu->arch.mmio_decode.insn = insn;
> +	vcpu->arch.mmio_decode.shift = shift;
> +	vcpu->arch.mmio_decode.len = len;
> +
> +	/* Exit to userspace for MMIO emulation */
> +	vcpu->stat.mmio_exit_user++;
> +	run->exit_reason = KVM_EXIT_MMIO;
> +	run->mmio.is_write = false;
> +	run->mmio.phys_addr = fault_addr;
> +	run->mmio.len = len;
> +
> +	/* Move to next instruction */
> +	vcpu->arch.guest_context.sepc += INSN_LEN(insn);

Doesn't that make more sense on the reentry path? What if you want to 
inject an MCE on access to unmapped addresses from user space?

> +
> +	return 0;
> +}
> +
> +static int emulate_store(struct kvm_vcpu *vcpu, struct kvm_run *run,
> +			 unsigned long fault_addr)
> +{
> +	u8 data8;
> +	u16 data16;
> +	u32 data32;
> +	u64 data64;
> +	ulong data;
> +	int len = 0;
> +	ulong insn = get_insn(vcpu);
> +
> +	data = GET_RS2(insn, &vcpu->arch.guest_context);
> +	data8 = data16 = data32 = data64 = data;
> +
> +	if ((insn & INSN_MASK_SW) == INSN_MATCH_SW) {
> +		len = 4;
> +	} else if ((insn & INSN_MASK_SB) == INSN_MATCH_SB) {
> +		len = 1;
> +#ifdef CONFIG_64BIT
> +	} else if ((insn & INSN_MASK_SD) == INSN_MATCH_SD) {
> +		len = 8;
> +#endif
> +	} else if ((insn & INSN_MASK_SH) == INSN_MATCH_SH) {
> +		len = 2;
> +#ifdef CONFIG_RISCV_ISA_C
> +#ifdef CONFIG_64BIT
> +	} else if ((insn & INSN_MASK_C_SD) == INSN_MATCH_C_SD) {
> +		len = 8;
> +		data64 = GET_RS2S(insn, &vcpu->arch.guest_context);
> +	} else if ((insn & INSN_MASK_C_SDSP) == INSN_MATCH_C_SDSP &&
> +		   ((insn >> SH_RD) & 0x1f)) {
> +		len = 8;
> +		data64 = GET_RS2C(insn, &vcpu->arch.guest_context);
> +#endif
> +	} else if ((insn & INSN_MASK_C_SW) == INSN_MATCH_C_SW) {
> +		len = 4;
> +		data32 = GET_RS2S(insn, &vcpu->arch.guest_context);
> +	} else if ((insn & INSN_MASK_C_SWSP) == INSN_MATCH_C_SWSP &&
> +		   ((insn >> SH_RD) & 0x1f)) {
> +		len = 4;
> +		data32 = GET_RS2C(insn, &vcpu->arch.guest_context);
> +#endif
> +	} else {
> +		return -ENOTSUPP;
> +	}
> +
> +	/* Fault address should be aligned to length of MMIO */
> +	if (fault_addr & (len - 1))
> +		return -EIO;
> +
> +	/* Clear instruction decode info */
> +	vcpu->arch.mmio_decode.insn = 0;
> +	vcpu->arch.mmio_decode.shift = 0;
> +	vcpu->arch.mmio_decode.len = 0;
> +
> +	/* Copy data to kvm_run instance */
> +	switch (len) {
> +	case 1:
> +		*((u8 *)run->mmio.data) = data8;
> +		break;
> +	case 2:
> +		*((u16 *)run->mmio.data) = data16;
> +		break;
> +	case 4:
> +		*((u32 *)run->mmio.data) = data32;
> +		break;
> +	case 8:
> +		*((u64 *)run->mmio.data) = data64;
> +		break;
> +	default:
> +		return -ENOTSUPP;
> +	};
> +
> +	/* Exit to userspace for MMIO emulation */
> +	vcpu->stat.mmio_exit_user++;
> +	run->exit_reason = KVM_EXIT_MMIO;
> +	run->mmio.is_write = true;
> +	run->mmio.phys_addr = fault_addr;
> +	run->mmio.len = len;
> +
> +	/* Move to next instruction */
> +	vcpu->arch.guest_context.sepc += INSN_LEN(insn);

Same comment here.


Alex

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 10/20] RISC-V: KVM: Handle MMIO exits for VCPU
  2019-08-22  8:44 ` [PATCH v5 10/20] RISC-V: KVM: Handle MMIO exits for VCPU Anup Patel
  2019-08-22 12:10   ` Alexander Graf
@ 2019-08-22 12:14   ` Alexander Graf
  2019-08-22 12:33     ` Anup Patel
  1 sibling, 1 reply; 61+ messages in thread
From: Alexander Graf @ 2019-08-22 12:14 UTC (permalink / raw)
  To: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel

On 22.08.19 10:44, Anup Patel wrote:
> We will get stage2 page faults whenever Guest/VM access SW emulated
> MMIO device or unmapped Guest RAM.
> 
> This patch implements MMIO read/write emulation by extracting MMIO
> details from the trapped load/store instruction and forwarding the
> MMIO read/write to user-space. The actual MMIO emulation will happen
> in user-space and KVM kernel module will only take care of register
> updates before resuming the trapped VCPU.
> 
> The handling for stage2 page faults for unmapped Guest RAM will be
> implemeted by a separate patch later.
> 
> Signed-off-by: Anup Patel <anup.patel@wdc.com>
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   arch/riscv/include/asm/kvm_host.h |  11 +
>   arch/riscv/kvm/mmu.c              |   7 +
>   arch/riscv/kvm/vcpu_exit.c        | 436 +++++++++++++++++++++++++++++-
>   3 files changed, 451 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> index 18f1097f1d8d..4388bace6d70 100644
> --- a/arch/riscv/include/asm/kvm_host.h
> +++ b/arch/riscv/include/asm/kvm_host.h
> @@ -53,6 +53,12 @@ struct kvm_arch {
>   	phys_addr_t pgd_phys;
>   };
>   
> +struct kvm_mmio_decode {
> +	unsigned long insn;
> +	int len;
> +	int shift;
> +};
> +
>   struct kvm_cpu_context {
>   	unsigned long zero;
>   	unsigned long ra;
> @@ -141,6 +147,9 @@ struct kvm_vcpu_arch {
>   	unsigned long irqs_pending;
>   	unsigned long irqs_pending_mask;
>   
> +	/* MMIO instruction details */
> +	struct kvm_mmio_decode mmio_decode;
> +
>   	/* VCPU power-off state */
>   	bool power_off;
>   
> @@ -160,6 +169,8 @@ static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
>   int kvm_riscv_setup_vsip(void);
>   void kvm_riscv_cleanup_vsip(void);
>   
> +int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
> +			 bool is_write);
>   void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu);
>   int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm);
>   void kvm_riscv_stage2_free_pgd(struct kvm *kvm);
> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index 04dd089b86ff..2b965f9aac07 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -61,6 +61,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>   	return 0;
>   }
>   
> +int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
> +			 bool is_write)
> +{
> +	/* TODO: */
> +	return 0;
> +}
> +
>   void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu)
>   {
>   	/* TODO: */
> diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
> index e4d7c8f0807a..efc06198c259 100644
> --- a/arch/riscv/kvm/vcpu_exit.c
> +++ b/arch/riscv/kvm/vcpu_exit.c
> @@ -6,9 +6,371 @@
>    *     Anup Patel <anup.patel@wdc.com>
>    */
>   
> +#include <linux/bitops.h>
>   #include <linux/errno.h>
>   #include <linux/err.h>
>   #include <linux/kvm_host.h>
> +#include <asm/csr.h>
> +
> +#define INSN_MATCH_LB		0x3
> +#define INSN_MASK_LB		0x707f
> +#define INSN_MATCH_LH		0x1003
> +#define INSN_MASK_LH		0x707f
> +#define INSN_MATCH_LW		0x2003
> +#define INSN_MASK_LW		0x707f
> +#define INSN_MATCH_LD		0x3003
> +#define INSN_MASK_LD		0x707f
> +#define INSN_MATCH_LBU		0x4003
> +#define INSN_MASK_LBU		0x707f
> +#define INSN_MATCH_LHU		0x5003
> +#define INSN_MASK_LHU		0x707f
> +#define INSN_MATCH_LWU		0x6003
> +#define INSN_MASK_LWU		0x707f
> +#define INSN_MATCH_SB		0x23
> +#define INSN_MASK_SB		0x707f
> +#define INSN_MATCH_SH		0x1023
> +#define INSN_MASK_SH		0x707f
> +#define INSN_MATCH_SW		0x2023
> +#define INSN_MASK_SW		0x707f
> +#define INSN_MATCH_SD		0x3023
> +#define INSN_MASK_SD		0x707f
> +
> +#define INSN_MATCH_C_LD		0x6000
> +#define INSN_MASK_C_LD		0xe003
> +#define INSN_MATCH_C_SD		0xe000
> +#define INSN_MASK_C_SD		0xe003
> +#define INSN_MATCH_C_LW		0x4000
> +#define INSN_MASK_C_LW		0xe003
> +#define INSN_MATCH_C_SW		0xc000
> +#define INSN_MASK_C_SW		0xe003
> +#define INSN_MATCH_C_LDSP	0x6002
> +#define INSN_MASK_C_LDSP	0xe003
> +#define INSN_MATCH_C_SDSP	0xe002
> +#define INSN_MASK_C_SDSP	0xe003
> +#define INSN_MATCH_C_LWSP	0x4002
> +#define INSN_MASK_C_LWSP	0xe003
> +#define INSN_MATCH_C_SWSP	0xc002
> +#define INSN_MASK_C_SWSP	0xe003
> +
> +#define INSN_LEN(insn)		((((insn) & 0x3) < 0x3) ? 2 : 4)
> +
> +#ifdef CONFIG_64BIT
> +#define LOG_REGBYTES		3
> +#else
> +#define LOG_REGBYTES		2
> +#endif
> +#define REGBYTES		(1 << LOG_REGBYTES)
> +
> +#define SH_RD			7
> +#define SH_RS1			15
> +#define SH_RS2			20
> +#define SH_RS2C			2
> +
> +#define RV_X(x, s, n)		(((x) >> (s)) & ((1 << (n)) - 1))
> +#define RVC_LW_IMM(x)		((RV_X(x, 6, 1) << 2) | \
> +				 (RV_X(x, 10, 3) << 3) | \
> +				 (RV_X(x, 5, 1) << 6))
> +#define RVC_LD_IMM(x)		((RV_X(x, 10, 3) << 3) | \
> +				 (RV_X(x, 5, 2) << 6))
> +#define RVC_LWSP_IMM(x)		((RV_X(x, 4, 3) << 2) | \
> +				 (RV_X(x, 12, 1) << 5) | \
> +				 (RV_X(x, 2, 2) << 6))
> +#define RVC_LDSP_IMM(x)		((RV_X(x, 5, 2) << 3) | \
> +				 (RV_X(x, 12, 1) << 5) | \
> +				 (RV_X(x, 2, 3) << 6))
> +#define RVC_SWSP_IMM(x)		((RV_X(x, 9, 4) << 2) | \
> +				 (RV_X(x, 7, 2) << 6))
> +#define RVC_SDSP_IMM(x)		((RV_X(x, 10, 3) << 3) | \
> +				 (RV_X(x, 7, 3) << 6))
> +#define RVC_RS1S(insn)		(8 + RV_X(insn, SH_RD, 3))
> +#define RVC_RS2S(insn)		(8 + RV_X(insn, SH_RS2C, 3))
> +#define RVC_RS2(insn)		RV_X(insn, SH_RS2C, 5)
> +
> +#define SHIFT_RIGHT(x, y)		\
> +	((y) < 0 ? ((x) << -(y)) : ((x) >> (y)))
> +
> +#define REG_MASK			\
> +	((1 << (5 + LOG_REGBYTES)) - (1 << LOG_REGBYTES))
> +
> +#define REG_OFFSET(insn, pos)		\
> +	(SHIFT_RIGHT((insn), (pos) - LOG_REGBYTES) & REG_MASK)
> +
> +#define REG_PTR(insn, pos, regs)	\
> +	(ulong *)((ulong)(regs) + REG_OFFSET(insn, pos))
> +
> +#define GET_RM(insn)		(((insn) >> 12) & 7)
> +
> +#define GET_RS1(insn, regs)	(*REG_PTR(insn, SH_RS1, regs))
> +#define GET_RS2(insn, regs)	(*REG_PTR(insn, SH_RS2, regs))
> +#define GET_RS1S(insn, regs)	(*REG_PTR(RVC_RS1S(insn), 0, regs))
> +#define GET_RS2S(insn, regs)	(*REG_PTR(RVC_RS2S(insn), 0, regs))
> +#define GET_RS2C(insn, regs)	(*REG_PTR(insn, SH_RS2C, regs))
> +#define GET_SP(regs)		(*REG_PTR(2, 0, regs))
> +#define SET_RD(insn, regs, val)	(*REG_PTR(insn, SH_RD, regs) = (val))
> +#define IMM_I(insn)		((s32)(insn) >> 20)
> +#define IMM_S(insn)		(((s32)(insn) >> 25 << 5) | \
> +				 (s32)(((insn) >> 7) & 0x1f))
> +#define MASK_FUNCT3		0x7000
> +
> +#define STR(x)			XSTR(x)
> +#define XSTR(x)			#x
> +
> +/* TODO: Handle traps due to unpriv load and redirect it back to VS-mode */
> +static ulong get_insn(struct kvm_vcpu *vcpu)
> +{
> +	ulong __sepc = vcpu->arch.guest_context.sepc;
> +	ulong __hstatus, __sstatus, __vsstatus;
> +#ifdef CONFIG_RISCV_ISA_C
> +	ulong rvc_mask = 3, tmp;
> +#endif
> +	ulong flags, val;
> +
> +	local_irq_save(flags);
> +
> +	__vsstatus = csr_read(CSR_VSSTATUS);
> +	__sstatus = csr_read(CSR_SSTATUS);
> +	__hstatus = csr_read(CSR_HSTATUS);
> +
> +	csr_write(CSR_VSSTATUS, __vsstatus | SR_MXR);
> +	csr_write(CSR_SSTATUS, vcpu->arch.guest_context.sstatus | SR_MXR);
> +	csr_write(CSR_HSTATUS, vcpu->arch.guest_context.hstatus | HSTATUS_SPRV);

What happens when the insn load triggers a page fault, maybe because the 
guest was malicious and did

   1) Run on page 0x1000
   2) Remove map for 0x1000, do *not* flush TLB
   3) Trigger MMIO

That would DOS the host here, as the host kernel would continue running 
in guest address space, right?


Alex

> +
> +#ifndef CONFIG_RISCV_ISA_C
> +	asm ("\n"
> +#ifdef CONFIG_64BIT
> +		STR(LWU) " %[insn], (%[addr])\n"
> +#else
> +		STR(LW) " %[insn], (%[addr])\n"
> +#endif
> +		: [insn] "=&r" (val) : [addr] "r" (__sepc));
> +#else
> +	asm ("and %[tmp], %[addr], 2\n"
> +		"bnez %[tmp], 1f\n"
> +#ifdef CONFIG_64BIT
> +		STR(LWU) " %[insn], (%[addr])\n"
> +#else
> +		STR(LW) " %[insn], (%[addr])\n"
> +#endif
> +		"and %[tmp], %[insn], %[rvc_mask]\n"
> +		"beq %[tmp], %[rvc_mask], 2f\n"
> +		"sll %[insn], %[insn], %[xlen_minus_16]\n"
> +		"srl %[insn], %[insn], %[xlen_minus_16]\n"
> +		"j 2f\n"
> +		"1:\n"
> +		"lhu %[insn], (%[addr])\n"
> +		"and %[tmp], %[insn], %[rvc_mask]\n"
> +		"bne %[tmp], %[rvc_mask], 2f\n"
> +		"lhu %[tmp], 2(%[addr])\n"
> +		"sll %[tmp], %[tmp], 16\n"
> +		"add %[insn], %[insn], %[tmp]\n"
> +		"2:"
> +	: [vsstatus] "+&r" (__vsstatus), [insn] "=&r" (val),
> +	  [tmp] "=&r" (tmp)
> +	: [addr] "r" (__sepc), [rvc_mask] "r" (rvc_mask),
> +	  [xlen_minus_16] "i" (__riscv_xlen - 16));
> +#endif
> +
> +	csr_write(CSR_HSTATUS, __hstatus);
> +	csr_write(CSR_SSTATUS, __sstatus);
> +	csr_write(CSR_VSSTATUS, __vsstatus);
> +
> +	local_irq_restore(flags);
> +
> +	return val;
> +}


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 11/20] RISC-V: KVM: Handle WFI exits for VCPU
  2019-08-22  8:45 ` [PATCH v5 11/20] RISC-V: KVM: Handle WFI " Anup Patel
@ 2019-08-22 12:19   ` Alexander Graf
  2019-08-22 12:50     ` Anup Patel
  0 siblings, 1 reply; 61+ messages in thread
From: Alexander Graf @ 2019-08-22 12:19 UTC (permalink / raw)
  To: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel

On 22.08.19 10:45, Anup Patel wrote:
> We get illegal instruction trap whenever Guest/VM executes WFI
> instruction.
> 
> This patch handles WFI trap by blocking the trapped VCPU using
> kvm_vcpu_block() API. The blocked VCPU will be automatically
> resumed whenever a VCPU interrupt is injected from user-space
> or from in-kernel IRQCHIP emulation.
> 
> Signed-off-by: Anup Patel <anup.patel@wdc.com>
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   arch/riscv/kvm/vcpu_exit.c | 88 ++++++++++++++++++++++++++++++++++++++
>   1 file changed, 88 insertions(+)
> 
> diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
> index efc06198c259..fbc04fe335ad 100644
> --- a/arch/riscv/kvm/vcpu_exit.c
> +++ b/arch/riscv/kvm/vcpu_exit.c
> @@ -12,6 +12,9 @@
>   #include <linux/kvm_host.h>
>   #include <asm/csr.h>
>   
> +#define INSN_MASK_WFI		0xffffff00
> +#define INSN_MATCH_WFI		0x10500000
> +
>   #define INSN_MATCH_LB		0x3
>   #define INSN_MASK_LB		0x707f
>   #define INSN_MATCH_LH		0x1003
> @@ -179,6 +182,87 @@ static ulong get_insn(struct kvm_vcpu *vcpu)
>   	return val;
>   }
>   
> +typedef int (*illegal_insn_func)(struct kvm_vcpu *vcpu,
> +				 struct kvm_run *run,
> +				 ulong insn);
> +
> +static int truly_illegal_insn(struct kvm_vcpu *vcpu,
> +			      struct kvm_run *run,
> +			      ulong insn)
> +{
> +	/* TODO: Redirect trap to Guest VCPU */
> +	return -ENOTSUPP;
> +}
> +
> +static int system_opcode_insn(struct kvm_vcpu *vcpu,
> +			      struct kvm_run *run,
> +			      ulong insn)
> +{
> +	if ((insn & INSN_MASK_WFI) == INSN_MATCH_WFI) {
> +		vcpu->stat.wfi_exit_stat++;
> +		if (!kvm_arch_vcpu_runnable(vcpu)) {
> +			srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx);
> +			kvm_vcpu_block(vcpu);
> +			vcpu->arch.srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
> +			kvm_clear_request(KVM_REQ_UNHALT, vcpu);
> +		}
> +		vcpu->arch.guest_context.sepc += INSN_LEN(insn);
> +		return 1;
> +	}
> +
> +	return truly_illegal_insn(vcpu, run, insn);
> +}
> +
> +static illegal_insn_func illegal_insn_table[32] = {

Every time I did experiments on PowerPC with indirect tables like this 
over switch() in C, the switch() code won. CPUs are pretty good at 
predicting branches. Predicting indirect jumps however, they are 
terrible at.

So unless you consider the jump table more readable / maintainable, I 
would suggest to use a simple switch() statement. It will be faster and 
smaller.


Alex


> +	truly_illegal_insn, /* 0 */
> +	truly_illegal_insn, /* 1 */
> +	truly_illegal_insn, /* 2 */
> +	truly_illegal_insn, /* 3 */
> +	truly_illegal_insn, /* 4 */
> +	truly_illegal_insn, /* 5 */
> +	truly_illegal_insn, /* 6 */
> +	truly_illegal_insn, /* 7 */
> +	truly_illegal_insn, /* 8 */
> +	truly_illegal_insn, /* 9 */
> +	truly_illegal_insn, /* 10 */
> +	truly_illegal_insn, /* 11 */
> +	truly_illegal_insn, /* 12 */
> +	truly_illegal_insn, /* 13 */
> +	truly_illegal_insn, /* 14 */
> +	truly_illegal_insn, /* 15 */
> +	truly_illegal_insn, /* 16 */
> +	truly_illegal_insn, /* 17 */
> +	truly_illegal_insn, /* 18 */
> +	truly_illegal_insn, /* 19 */
> +	truly_illegal_insn, /* 20 */
> +	truly_illegal_insn, /* 21 */
> +	truly_illegal_insn, /* 22 */
> +	truly_illegal_insn, /* 23 */
> +	truly_illegal_insn, /* 24 */
> +	truly_illegal_insn, /* 25 */
> +	truly_illegal_insn, /* 26 */
> +	truly_illegal_insn, /* 27 */
> +	system_opcode_insn, /* 28 */
> +	truly_illegal_insn, /* 29 */
> +	truly_illegal_insn, /* 30 */
> +	truly_illegal_insn  /* 31 */
> +};
> +
> +static int illegal_inst_fault(struct kvm_vcpu *vcpu, struct kvm_run *run,
> +			      unsigned long stval)
> +{
> +	ulong insn = stval;
> +
> +	if (unlikely((insn & 3) != 3)) {
> +		if (insn == 0)
> +			insn = get_insn(vcpu);
> +		if ((insn & 3) != 3)
> +			return truly_illegal_insn(vcpu, run, insn);
> +	}
> +
> +	return illegal_insn_table[(insn & 0x7c) >> 2](vcpu, run, insn);
> +}
> +
>   static int emulate_load(struct kvm_vcpu *vcpu, struct kvm_run *run,
>   			unsigned long fault_addr)
>   {
> @@ -439,6 +523,10 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>   	ret = -EFAULT;
>   	run->exit_reason = KVM_EXIT_UNKNOWN;
>   	switch (scause) {
> +	case EXC_INST_ILLEGAL:
> +		if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV)
> +			ret = illegal_inst_fault(vcpu, run, stval);
> +		break;
>   	case EXC_INST_PAGE_FAULT:
>   	case EXC_LOAD_PAGE_FAULT:
>   	case EXC_STORE_PAGE_FAULT:
> 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 10/20] RISC-V: KVM: Handle MMIO exits for VCPU
  2019-08-22 12:10   ` Alexander Graf
@ 2019-08-22 12:21     ` Andrew Jones
  2019-08-22 12:27     ` Anup Patel
  1 sibling, 0 replies; 61+ messages in thread
From: Andrew Jones @ 2019-08-22 12:21 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Daniel Lezcano, Thomas Gleixner, Atish Patra,
	Alistair Francis, Damien Le Moal, Christoph Hellwig, Anup Patel,
	kvm, linux-riscv, linux-kernel

On Thu, Aug 22, 2019 at 02:10:48PM +0200, Alexander Graf wrote:
> On 22.08.19 10:44, Anup Patel wrote:
...
> > +static int emulate_load(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > +			unsigned long fault_addr)
...
> > +	/* Exit to userspace for MMIO emulation */
> > +	vcpu->stat.mmio_exit_user++;
> > +	run->exit_reason = KVM_EXIT_MMIO;
> > +	run->mmio.is_write = false;
> > +	run->mmio.phys_addr = fault_addr;
> > +	run->mmio.len = len;
> > +
> > +	/* Move to next instruction */
> > +	vcpu->arch.guest_context.sepc += INSN_LEN(insn);
> 
> Doesn't that make more sense on the reentry path? What if you want to inject
> an MCE on access to unmapped addresses from user space?
>

I agree. See commit 0d640732dbeb for arm's justification for moving
the instruction skip. But also see

https://patchwork.kernel.org/patch/11109063/

for a needed fix to avoid skipping the instructions multiple times.
It looks like riscv's KVM_RUN ioctl would be vulnerable to that as
well.

Thanks,
drew

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 10/20] RISC-V: KVM: Handle MMIO exits for VCPU
  2019-08-22 12:10   ` Alexander Graf
  2019-08-22 12:21     ` Andrew Jones
@ 2019-08-22 12:27     ` Anup Patel
  1 sibling, 0 replies; 61+ messages in thread
From: Anup Patel @ 2019-08-22 12:27 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Daniel Lezcano, Thomas Gleixner, Atish Patra,
	Alistair Francis, Damien Le Moal, Christoph Hellwig, kvm,
	linux-riscv, linux-kernel

On Thu, Aug 22, 2019 at 5:40 PM Alexander Graf <graf@amazon.com> wrote:
>
> On 22.08.19 10:44, Anup Patel wrote:
> > We will get stage2 page faults whenever Guest/VM access SW emulated
> > MMIO device or unmapped Guest RAM.
> >
> > This patch implements MMIO read/write emulation by extracting MMIO
> > details from the trapped load/store instruction and forwarding the
> > MMIO read/write to user-space. The actual MMIO emulation will happen
> > in user-space and KVM kernel module will only take care of register
> > updates before resuming the trapped VCPU.
> >
> > The handling for stage2 page faults for unmapped Guest RAM will be
> > implemeted by a separate patch later.
> >
> > Signed-off-by: Anup Patel <anup.patel@wdc.com>
> > Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >   arch/riscv/include/asm/kvm_host.h |  11 +
> >   arch/riscv/kvm/mmu.c              |   7 +
> >   arch/riscv/kvm/vcpu_exit.c        | 436 +++++++++++++++++++++++++++++-
> >   3 files changed, 451 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> > index 18f1097f1d8d..4388bace6d70 100644
> > --- a/arch/riscv/include/asm/kvm_host.h
> > +++ b/arch/riscv/include/asm/kvm_host.h
> > @@ -53,6 +53,12 @@ struct kvm_arch {
> >       phys_addr_t pgd_phys;
> >   };
> >
> > +struct kvm_mmio_decode {
> > +     unsigned long insn;
> > +     int len;
> > +     int shift;
> > +};
> > +
> >   struct kvm_cpu_context {
> >       unsigned long zero;
> >       unsigned long ra;
> > @@ -141,6 +147,9 @@ struct kvm_vcpu_arch {
> >       unsigned long irqs_pending;
> >       unsigned long irqs_pending_mask;
> >
> > +     /* MMIO instruction details */
> > +     struct kvm_mmio_decode mmio_decode;
> > +
> >       /* VCPU power-off state */
> >       bool power_off;
> >
> > @@ -160,6 +169,8 @@ static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
> >   int kvm_riscv_setup_vsip(void);
> >   void kvm_riscv_cleanup_vsip(void);
> >
> > +int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
> > +                      bool is_write);
> >   void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu);
> >   int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm);
> >   void kvm_riscv_stage2_free_pgd(struct kvm *kvm);
> > diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> > index 04dd089b86ff..2b965f9aac07 100644
> > --- a/arch/riscv/kvm/mmu.c
> > +++ b/arch/riscv/kvm/mmu.c
> > @@ -61,6 +61,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> >       return 0;
> >   }
> >
> > +int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
> > +                      bool is_write)
> > +{
> > +     /* TODO: */
> > +     return 0;
> > +}
> > +
> >   void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu)
> >   {
> >       /* TODO: */
> > diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
> > index e4d7c8f0807a..efc06198c259 100644
> > --- a/arch/riscv/kvm/vcpu_exit.c
> > +++ b/arch/riscv/kvm/vcpu_exit.c
> > @@ -6,9 +6,371 @@
> >    *     Anup Patel <anup.patel@wdc.com>
> >    */
> >
> > +#include <linux/bitops.h>
> >   #include <linux/errno.h>
> >   #include <linux/err.h>
> >   #include <linux/kvm_host.h>
> > +#include <asm/csr.h>
> > +
> > +#define INSN_MATCH_LB                0x3
> > +#define INSN_MASK_LB         0x707f
> > +#define INSN_MATCH_LH                0x1003
> > +#define INSN_MASK_LH         0x707f
> > +#define INSN_MATCH_LW                0x2003
> > +#define INSN_MASK_LW         0x707f
> > +#define INSN_MATCH_LD                0x3003
> > +#define INSN_MASK_LD         0x707f
> > +#define INSN_MATCH_LBU               0x4003
> > +#define INSN_MASK_LBU                0x707f
> > +#define INSN_MATCH_LHU               0x5003
> > +#define INSN_MASK_LHU                0x707f
> > +#define INSN_MATCH_LWU               0x6003
> > +#define INSN_MASK_LWU                0x707f
> > +#define INSN_MATCH_SB                0x23
> > +#define INSN_MASK_SB         0x707f
> > +#define INSN_MATCH_SH                0x1023
> > +#define INSN_MASK_SH         0x707f
> > +#define INSN_MATCH_SW                0x2023
> > +#define INSN_MASK_SW         0x707f
> > +#define INSN_MATCH_SD                0x3023
> > +#define INSN_MASK_SD         0x707f
> > +
> > +#define INSN_MATCH_C_LD              0x6000
> > +#define INSN_MASK_C_LD               0xe003
> > +#define INSN_MATCH_C_SD              0xe000
> > +#define INSN_MASK_C_SD               0xe003
> > +#define INSN_MATCH_C_LW              0x4000
> > +#define INSN_MASK_C_LW               0xe003
> > +#define INSN_MATCH_C_SW              0xc000
> > +#define INSN_MASK_C_SW               0xe003
> > +#define INSN_MATCH_C_LDSP    0x6002
> > +#define INSN_MASK_C_LDSP     0xe003
> > +#define INSN_MATCH_C_SDSP    0xe002
> > +#define INSN_MASK_C_SDSP     0xe003
> > +#define INSN_MATCH_C_LWSP    0x4002
> > +#define INSN_MASK_C_LWSP     0xe003
> > +#define INSN_MATCH_C_SWSP    0xc002
> > +#define INSN_MASK_C_SWSP     0xe003
> > +
> > +#define INSN_LEN(insn)               ((((insn) & 0x3) < 0x3) ? 2 : 4)
> > +
> > +#ifdef CONFIG_64BIT
> > +#define LOG_REGBYTES         3
> > +#else
> > +#define LOG_REGBYTES         2
> > +#endif
> > +#define REGBYTES             (1 << LOG_REGBYTES)
> > +
> > +#define SH_RD                        7
> > +#define SH_RS1                       15
> > +#define SH_RS2                       20
> > +#define SH_RS2C                      2
> > +
> > +#define RV_X(x, s, n)                (((x) >> (s)) & ((1 << (n)) - 1))
> > +#define RVC_LW_IMM(x)                ((RV_X(x, 6, 1) << 2) | \
> > +                              (RV_X(x, 10, 3) << 3) | \
> > +                              (RV_X(x, 5, 1) << 6))
> > +#define RVC_LD_IMM(x)                ((RV_X(x, 10, 3) << 3) | \
> > +                              (RV_X(x, 5, 2) << 6))
> > +#define RVC_LWSP_IMM(x)              ((RV_X(x, 4, 3) << 2) | \
> > +                              (RV_X(x, 12, 1) << 5) | \
> > +                              (RV_X(x, 2, 2) << 6))
> > +#define RVC_LDSP_IMM(x)              ((RV_X(x, 5, 2) << 3) | \
> > +                              (RV_X(x, 12, 1) << 5) | \
> > +                              (RV_X(x, 2, 3) << 6))
> > +#define RVC_SWSP_IMM(x)              ((RV_X(x, 9, 4) << 2) | \
> > +                              (RV_X(x, 7, 2) << 6))
> > +#define RVC_SDSP_IMM(x)              ((RV_X(x, 10, 3) << 3) | \
> > +                              (RV_X(x, 7, 3) << 6))
> > +#define RVC_RS1S(insn)               (8 + RV_X(insn, SH_RD, 3))
> > +#define RVC_RS2S(insn)               (8 + RV_X(insn, SH_RS2C, 3))
> > +#define RVC_RS2(insn)                RV_X(insn, SH_RS2C, 5)
> > +
> > +#define SHIFT_RIGHT(x, y)            \
> > +     ((y) < 0 ? ((x) << -(y)) : ((x) >> (y)))
> > +
> > +#define REG_MASK                     \
> > +     ((1 << (5 + LOG_REGBYTES)) - (1 << LOG_REGBYTES))
> > +
> > +#define REG_OFFSET(insn, pos)                \
> > +     (SHIFT_RIGHT((insn), (pos) - LOG_REGBYTES) & REG_MASK)
> > +
> > +#define REG_PTR(insn, pos, regs)     \
> > +     (ulong *)((ulong)(regs) + REG_OFFSET(insn, pos))
> > +
> > +#define GET_RM(insn)         (((insn) >> 12) & 7)
> > +
> > +#define GET_RS1(insn, regs)  (*REG_PTR(insn, SH_RS1, regs))
> > +#define GET_RS2(insn, regs)  (*REG_PTR(insn, SH_RS2, regs))
> > +#define GET_RS1S(insn, regs) (*REG_PTR(RVC_RS1S(insn), 0, regs))
> > +#define GET_RS2S(insn, regs) (*REG_PTR(RVC_RS2S(insn), 0, regs))
> > +#define GET_RS2C(insn, regs) (*REG_PTR(insn, SH_RS2C, regs))
> > +#define GET_SP(regs)         (*REG_PTR(2, 0, regs))
> > +#define SET_RD(insn, regs, val)      (*REG_PTR(insn, SH_RD, regs) = (val))
> > +#define IMM_I(insn)          ((s32)(insn) >> 20)
> > +#define IMM_S(insn)          (((s32)(insn) >> 25 << 5) | \
> > +                              (s32)(((insn) >> 7) & 0x1f))
> > +#define MASK_FUNCT3          0x7000
> > +
> > +#define STR(x)                       XSTR(x)
> > +#define XSTR(x)                      #x
> > +
> > +/* TODO: Handle traps due to unpriv load and redirect it back to VS-mode */
> > +static ulong get_insn(struct kvm_vcpu *vcpu)
> > +{
> > +     ulong __sepc = vcpu->arch.guest_context.sepc;
> > +     ulong __hstatus, __sstatus, __vsstatus;
> > +#ifdef CONFIG_RISCV_ISA_C
> > +     ulong rvc_mask = 3, tmp;
> > +#endif
> > +     ulong flags, val;
> > +
> > +     local_irq_save(flags);
> > +
> > +     __vsstatus = csr_read(CSR_VSSTATUS);
> > +     __sstatus = csr_read(CSR_SSTATUS);
> > +     __hstatus = csr_read(CSR_HSTATUS);
> > +
> > +     csr_write(CSR_VSSTATUS, __vsstatus | SR_MXR);
> > +     csr_write(CSR_SSTATUS, vcpu->arch.guest_context.sstatus | SR_MXR);
> > +     csr_write(CSR_HSTATUS, vcpu->arch.guest_context.hstatus | HSTATUS_SPRV);
> > +
> > +#ifndef CONFIG_RISCV_ISA_C
> > +     asm ("\n"
> > +#ifdef CONFIG_64BIT
> > +             STR(LWU) " %[insn], (%[addr])\n"
> > +#else
> > +             STR(LW) " %[insn], (%[addr])\n"
> > +#endif
> > +             : [insn] "=&r" (val) : [addr] "r" (__sepc));
> > +#else
> > +     asm ("and %[tmp], %[addr], 2\n"
> > +             "bnez %[tmp], 1f\n"
> > +#ifdef CONFIG_64BIT
> > +             STR(LWU) " %[insn], (%[addr])\n"
> > +#else
> > +             STR(LW) " %[insn], (%[addr])\n"
> > +#endif
> > +             "and %[tmp], %[insn], %[rvc_mask]\n"
> > +             "beq %[tmp], %[rvc_mask], 2f\n"
> > +             "sll %[insn], %[insn], %[xlen_minus_16]\n"
> > +             "srl %[insn], %[insn], %[xlen_minus_16]\n"
> > +             "j 2f\n"
> > +             "1:\n"
> > +             "lhu %[insn], (%[addr])\n"
> > +             "and %[tmp], %[insn], %[rvc_mask]\n"
> > +             "bne %[tmp], %[rvc_mask], 2f\n"
> > +             "lhu %[tmp], 2(%[addr])\n"
> > +             "sll %[tmp], %[tmp], 16\n"
> > +             "add %[insn], %[insn], %[tmp]\n"
> > +             "2:"
> > +     : [vsstatus] "+&r" (__vsstatus), [insn] "=&r" (val),
> > +       [tmp] "=&r" (tmp)
> > +     : [addr] "r" (__sepc), [rvc_mask] "r" (rvc_mask),
> > +       [xlen_minus_16] "i" (__riscv_xlen - 16));
> > +#endif
> > +
> > +     csr_write(CSR_HSTATUS, __hstatus);
> > +     csr_write(CSR_SSTATUS, __sstatus);
> > +     csr_write(CSR_VSSTATUS, __vsstatus);
> > +
> > +     local_irq_restore(flags);
> > +
> > +     return val;
> > +}
> > +
> > +static int emulate_load(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > +                     unsigned long fault_addr)
> > +{
> > +     int shift = 0, len = 0;
> > +     ulong insn = get_insn(vcpu);
> > +
> > +     /* Decode length of MMIO and shift */
> > +     if ((insn & INSN_MASK_LW) == INSN_MATCH_LW) {
> > +             len = 4;
> > +             shift = 8 * (sizeof(ulong) - len);
> > +     } else if ((insn & INSN_MASK_LB) == INSN_MATCH_LB) {
> > +             len = 1;
> > +             shift = 8 * (sizeof(ulong) - len);
> > +     } else if ((insn & INSN_MASK_LBU) == INSN_MATCH_LBU) {
> > +             len = 1;
> > +             shift = 8 * (sizeof(ulong) - len);
> > +#ifdef CONFIG_64BIT
> > +     } else if ((insn & INSN_MASK_LD) == INSN_MATCH_LD) {
> > +             len = 8;
> > +             shift = 8 * (sizeof(ulong) - len);
> > +     } else if ((insn & INSN_MASK_LWU) == INSN_MATCH_LWU) {
> > +             len = 4;
> > +#endif
> > +     } else if ((insn & INSN_MASK_LH) == INSN_MATCH_LH) {
> > +             len = 2;
> > +             shift = 8 * (sizeof(ulong) - len);
> > +     } else if ((insn & INSN_MASK_LHU) == INSN_MATCH_LHU) {
> > +             len = 2;
> > +#ifdef CONFIG_RISCV_ISA_C
> > +#ifdef CONFIG_64BIT
> > +     } else if ((insn & INSN_MASK_C_LD) == INSN_MATCH_C_LD) {
> > +             len = 8;
> > +             shift = 8 * (sizeof(ulong) - len);
> > +             insn = RVC_RS2S(insn) << SH_RD;
> > +     } else if ((insn & INSN_MASK_C_LDSP) == INSN_MATCH_C_LDSP &&
> > +                ((insn >> SH_RD) & 0x1f)) {
> > +             len = 8;
> > +             shift = 8 * (sizeof(ulong) - len);
> > +#endif
> > +     } else if ((insn & INSN_MASK_C_LW) == INSN_MATCH_C_LW) {
> > +             len = 4;
> > +             shift = 8 * (sizeof(ulong) - len);
> > +             insn = RVC_RS2S(insn) << SH_RD;
> > +     } else if ((insn & INSN_MASK_C_LWSP) == INSN_MATCH_C_LWSP &&
> > +                ((insn >> SH_RD) & 0x1f)) {
> > +             len = 4;
> > +             shift = 8 * (sizeof(ulong) - len);
> > +#endif
> > +     } else {
> > +             return -ENOTSUPP;
> > +     }
> > +
> > +     /* Fault address should be aligned to length of MMIO */
> > +     if (fault_addr & (len - 1))
> > +             return -EIO;
> > +
> > +     /* Save instruction decode info */
> > +     vcpu->arch.mmio_decode.insn = insn;
> > +     vcpu->arch.mmio_decode.shift = shift;
> > +     vcpu->arch.mmio_decode.len = len;
> > +
> > +     /* Exit to userspace for MMIO emulation */
> > +     vcpu->stat.mmio_exit_user++;
> > +     run->exit_reason = KVM_EXIT_MMIO;
> > +     run->mmio.is_write = false;
> > +     run->mmio.phys_addr = fault_addr;
> > +     run->mmio.len = len;
> > +
> > +     /* Move to next instruction */
> > +     vcpu->arch.guest_context.sepc += INSN_LEN(insn);
>
> Doesn't that make more sense on the reentry path? What if you want to
> inject an MCE on access to unmapped addresses from user space?

This a good suggestion. I did not think about debugging aspect.

I will update this patch accordingly.

>
> > +
> > +     return 0;
> > +}
> > +
> > +static int emulate_store(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > +                      unsigned long fault_addr)
> > +{
> > +     u8 data8;
> > +     u16 data16;
> > +     u32 data32;
> > +     u64 data64;
> > +     ulong data;
> > +     int len = 0;
> > +     ulong insn = get_insn(vcpu);
> > +
> > +     data = GET_RS2(insn, &vcpu->arch.guest_context);
> > +     data8 = data16 = data32 = data64 = data;
> > +
> > +     if ((insn & INSN_MASK_SW) == INSN_MATCH_SW) {
> > +             len = 4;
> > +     } else if ((insn & INSN_MASK_SB) == INSN_MATCH_SB) {
> > +             len = 1;
> > +#ifdef CONFIG_64BIT
> > +     } else if ((insn & INSN_MASK_SD) == INSN_MATCH_SD) {
> > +             len = 8;
> > +#endif
> > +     } else if ((insn & INSN_MASK_SH) == INSN_MATCH_SH) {
> > +             len = 2;
> > +#ifdef CONFIG_RISCV_ISA_C
> > +#ifdef CONFIG_64BIT
> > +     } else if ((insn & INSN_MASK_C_SD) == INSN_MATCH_C_SD) {
> > +             len = 8;
> > +             data64 = GET_RS2S(insn, &vcpu->arch.guest_context);
> > +     } else if ((insn & INSN_MASK_C_SDSP) == INSN_MATCH_C_SDSP &&
> > +                ((insn >> SH_RD) & 0x1f)) {
> > +             len = 8;
> > +             data64 = GET_RS2C(insn, &vcpu->arch.guest_context);
> > +#endif
> > +     } else if ((insn & INSN_MASK_C_SW) == INSN_MATCH_C_SW) {
> > +             len = 4;
> > +             data32 = GET_RS2S(insn, &vcpu->arch.guest_context);
> > +     } else if ((insn & INSN_MASK_C_SWSP) == INSN_MATCH_C_SWSP &&
> > +                ((insn >> SH_RD) & 0x1f)) {
> > +             len = 4;
> > +             data32 = GET_RS2C(insn, &vcpu->arch.guest_context);
> > +#endif
> > +     } else {
> > +             return -ENOTSUPP;
> > +     }
> > +
> > +     /* Fault address should be aligned to length of MMIO */
> > +     if (fault_addr & (len - 1))
> > +             return -EIO;
> > +
> > +     /* Clear instruction decode info */
> > +     vcpu->arch.mmio_decode.insn = 0;
> > +     vcpu->arch.mmio_decode.shift = 0;
> > +     vcpu->arch.mmio_decode.len = 0;
> > +
> > +     /* Copy data to kvm_run instance */
> > +     switch (len) {
> > +     case 1:
> > +             *((u8 *)run->mmio.data) = data8;
> > +             break;
> > +     case 2:
> > +             *((u16 *)run->mmio.data) = data16;
> > +             break;
> > +     case 4:
> > +             *((u32 *)run->mmio.data) = data32;
> > +             break;
> > +     case 8:
> > +             *((u64 *)run->mmio.data) = data64;
> > +             break;
> > +     default:
> > +             return -ENOTSUPP;
> > +     };
> > +
> > +     /* Exit to userspace for MMIO emulation */
> > +     vcpu->stat.mmio_exit_user++;
> > +     run->exit_reason = KVM_EXIT_MMIO;
> > +     run->mmio.is_write = true;
> > +     run->mmio.phys_addr = fault_addr;
> > +     run->mmio.len = len;
> > +
> > +     /* Move to next instruction */
> > +     vcpu->arch.guest_context.sepc += INSN_LEN(insn);
>
> Same comment here.

Sure, I will update.

>
>
> Alex

Regards,
Anup

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 13/20] RISC-V: KVM: Implement stage2 page table programming
  2019-08-22  8:45 ` [PATCH v5 13/20] RISC-V: KVM: Implement stage2 page table programming Anup Patel
@ 2019-08-22 12:28   ` Alexander Graf
  2019-08-22 12:38     ` Anup Patel
  0 siblings, 1 reply; 61+ messages in thread
From: Alexander Graf @ 2019-08-22 12:28 UTC (permalink / raw)
  To: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel

On 22.08.19 10:45, Anup Patel wrote:
> This patch implements all required functions for programming
> the stage2 page table for each Guest/VM.
> 
> At high-level, the flow of stage2 related functions is similar
> from KVM ARM/ARM64 implementation but the stage2 page table
> format is quite different for KVM RISC-V.
> 
> Signed-off-by: Anup Patel <anup.patel@wdc.com>
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   arch/riscv/include/asm/kvm_host.h     |  10 +
>   arch/riscv/include/asm/pgtable-bits.h |   1 +
>   arch/riscv/kvm/mmu.c                  | 637 +++++++++++++++++++++++++-
>   3 files changed, 638 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> index 3b09158f80f2..a37775c92586 100644
> --- a/arch/riscv/include/asm/kvm_host.h
> +++ b/arch/riscv/include/asm/kvm_host.h
> @@ -72,6 +72,13 @@ struct kvm_mmio_decode {
>   	int shift;
>   };
>   
> +#define KVM_MMU_PAGE_CACHE_NR_OBJS	32
> +
> +struct kvm_mmu_page_cache {
> +	int nobjs;
> +	void *objects[KVM_MMU_PAGE_CACHE_NR_OBJS];
> +};
> +
>   struct kvm_cpu_context {
>   	unsigned long zero;
>   	unsigned long ra;
> @@ -163,6 +170,9 @@ struct kvm_vcpu_arch {
>   	/* MMIO instruction details */
>   	struct kvm_mmio_decode mmio_decode;
>   
> +	/* Cache pages needed to program page tables with spinlock held */
> +	struct kvm_mmu_page_cache mmu_page_cache;
> +
>   	/* VCPU power-off state */
>   	bool power_off;
>   
> diff --git a/arch/riscv/include/asm/pgtable-bits.h b/arch/riscv/include/asm/pgtable-bits.h
> index bbaeb5d35842..be49d62fcc2b 100644
> --- a/arch/riscv/include/asm/pgtable-bits.h
> +++ b/arch/riscv/include/asm/pgtable-bits.h
> @@ -26,6 +26,7 @@
>   
>   #define _PAGE_SPECIAL   _PAGE_SOFT
>   #define _PAGE_TABLE     _PAGE_PRESENT
> +#define _PAGE_LEAF      (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
>   
>   /*
>    * _PAGE_PROT_NONE is set on not-present pages (and ignored by the hardware) to
> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index 2b965f9aac07..9e95ab6769f6 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -18,6 +18,432 @@
>   #include <asm/page.h>
>   #include <asm/pgtable.h>
>   
> +#ifdef CONFIG_64BIT
> +#define stage2_have_pmd		true
> +#define stage2_gpa_size		((phys_addr_t)(1ULL << 39))
> +#define stage2_cache_min_pages	2
> +#else
> +#define pmd_index(x)		0
> +#define pfn_pmd(x, y)		({ pmd_t __x = { 0 }; __x; })
> +#define stage2_have_pmd		false
> +#define stage2_gpa_size		((phys_addr_t)(1ULL << 32))
> +#define stage2_cache_min_pages	1
> +#endif
> +
> +static int stage2_cache_topup(struct kvm_mmu_page_cache *pcache,
> +			      int min, int max)
> +{
> +	void *page;
> +
> +	BUG_ON(max > KVM_MMU_PAGE_CACHE_NR_OBJS);
> +	if (pcache->nobjs >= min)
> +		return 0;
> +	while (pcache->nobjs < max) {
> +		page = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
> +		if (!page)
> +			return -ENOMEM;
> +		pcache->objects[pcache->nobjs++] = page;
> +	}
> +
> +	return 0;
> +}
> +
> +static void stage2_cache_flush(struct kvm_mmu_page_cache *pcache)
> +{
> +	while (pcache && pcache->nobjs)
> +		free_page((unsigned long)pcache->objects[--pcache->nobjs]);
> +}
> +
> +static void *stage2_cache_alloc(struct kvm_mmu_page_cache *pcache)
> +{
> +	void *p;
> +
> +	if (!pcache)
> +		return NULL;
> +
> +	BUG_ON(!pcache->nobjs);
> +	p = pcache->objects[--pcache->nobjs];
> +
> +	return p;
> +}
> +
> +struct local_guest_tlb_info {
> +	struct kvm_vmid *vmid;
> +	gpa_t addr;
> +};
> +
> +static void local_guest_tlb_flush_vmid_gpa(void *info)
> +{
> +	struct local_guest_tlb_info *infop = info;
> +
> +	__kvm_riscv_hfence_gvma_vmid_gpa(READ_ONCE(infop->vmid->vmid_version),
> +					 infop->addr);
> +}
> +
> +static void stage2_remote_tlb_flush(struct kvm *kvm, gpa_t addr)
> +{
> +	struct local_guest_tlb_info info;
> +	struct kvm_vmid *vmid = &kvm->arch.vmid;
> +
> +	/* TODO: This should be SBI call */
> +	info.vmid = vmid;
> +	info.addr = addr;
> +	preempt_disable();
> +	smp_call_function_many(cpu_all_mask, local_guest_tlb_flush_vmid_gpa,
> +			       &info, true);

This is all nice and dandy on the toy 4 core systems we have today, but 
it will become a bottleneck further down the road.

How many VMIDs do you have? Could you just allocate a new one every time 
you switch host CPUs? Then you know exactly which CPUs to flush by 
looking at all your vcpu structs and a local field that tells you which 
pCPU they're on at this moment.

Either way, it's nothing that should block inclusion. For today, we're fine.


Alex

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 10/20] RISC-V: KVM: Handle MMIO exits for VCPU
  2019-08-22 12:14   ` Alexander Graf
@ 2019-08-22 12:33     ` Anup Patel
  2019-08-22 13:25       ` Alexander Graf
  0 siblings, 1 reply; 61+ messages in thread
From: Anup Patel @ 2019-08-22 12:33 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Daniel Lezcano, Thomas Gleixner, Atish Patra,
	Alistair Francis, Damien Le Moal, Christoph Hellwig, kvm,
	linux-riscv, linux-kernel

On Thu, Aug 22, 2019 at 5:44 PM Alexander Graf <graf@amazon.com> wrote:
>
> On 22.08.19 10:44, Anup Patel wrote:
> > We will get stage2 page faults whenever Guest/VM access SW emulated
> > MMIO device or unmapped Guest RAM.
> >
> > This patch implements MMIO read/write emulation by extracting MMIO
> > details from the trapped load/store instruction and forwarding the
> > MMIO read/write to user-space. The actual MMIO emulation will happen
> > in user-space and KVM kernel module will only take care of register
> > updates before resuming the trapped VCPU.
> >
> > The handling for stage2 page faults for unmapped Guest RAM will be
> > implemeted by a separate patch later.
> >
> > Signed-off-by: Anup Patel <anup.patel@wdc.com>
> > Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >   arch/riscv/include/asm/kvm_host.h |  11 +
> >   arch/riscv/kvm/mmu.c              |   7 +
> >   arch/riscv/kvm/vcpu_exit.c        | 436 +++++++++++++++++++++++++++++-
> >   3 files changed, 451 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> > index 18f1097f1d8d..4388bace6d70 100644
> > --- a/arch/riscv/include/asm/kvm_host.h
> > +++ b/arch/riscv/include/asm/kvm_host.h
> > @@ -53,6 +53,12 @@ struct kvm_arch {
> >       phys_addr_t pgd_phys;
> >   };
> >
> > +struct kvm_mmio_decode {
> > +     unsigned long insn;
> > +     int len;
> > +     int shift;
> > +};
> > +
> >   struct kvm_cpu_context {
> >       unsigned long zero;
> >       unsigned long ra;
> > @@ -141,6 +147,9 @@ struct kvm_vcpu_arch {
> >       unsigned long irqs_pending;
> >       unsigned long irqs_pending_mask;
> >
> > +     /* MMIO instruction details */
> > +     struct kvm_mmio_decode mmio_decode;
> > +
> >       /* VCPU power-off state */
> >       bool power_off;
> >
> > @@ -160,6 +169,8 @@ static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
> >   int kvm_riscv_setup_vsip(void);
> >   void kvm_riscv_cleanup_vsip(void);
> >
> > +int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
> > +                      bool is_write);
> >   void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu);
> >   int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm);
> >   void kvm_riscv_stage2_free_pgd(struct kvm *kvm);
> > diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> > index 04dd089b86ff..2b965f9aac07 100644
> > --- a/arch/riscv/kvm/mmu.c
> > +++ b/arch/riscv/kvm/mmu.c
> > @@ -61,6 +61,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> >       return 0;
> >   }
> >
> > +int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
> > +                      bool is_write)
> > +{
> > +     /* TODO: */
> > +     return 0;
> > +}
> > +
> >   void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu)
> >   {
> >       /* TODO: */
> > diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
> > index e4d7c8f0807a..efc06198c259 100644
> > --- a/arch/riscv/kvm/vcpu_exit.c
> > +++ b/arch/riscv/kvm/vcpu_exit.c
> > @@ -6,9 +6,371 @@
> >    *     Anup Patel <anup.patel@wdc.com>
> >    */
> >
> > +#include <linux/bitops.h>
> >   #include <linux/errno.h>
> >   #include <linux/err.h>
> >   #include <linux/kvm_host.h>
> > +#include <asm/csr.h>
> > +
> > +#define INSN_MATCH_LB                0x3
> > +#define INSN_MASK_LB         0x707f
> > +#define INSN_MATCH_LH                0x1003
> > +#define INSN_MASK_LH         0x707f
> > +#define INSN_MATCH_LW                0x2003
> > +#define INSN_MASK_LW         0x707f
> > +#define INSN_MATCH_LD                0x3003
> > +#define INSN_MASK_LD         0x707f
> > +#define INSN_MATCH_LBU               0x4003
> > +#define INSN_MASK_LBU                0x707f
> > +#define INSN_MATCH_LHU               0x5003
> > +#define INSN_MASK_LHU                0x707f
> > +#define INSN_MATCH_LWU               0x6003
> > +#define INSN_MASK_LWU                0x707f
> > +#define INSN_MATCH_SB                0x23
> > +#define INSN_MASK_SB         0x707f
> > +#define INSN_MATCH_SH                0x1023
> > +#define INSN_MASK_SH         0x707f
> > +#define INSN_MATCH_SW                0x2023
> > +#define INSN_MASK_SW         0x707f
> > +#define INSN_MATCH_SD                0x3023
> > +#define INSN_MASK_SD         0x707f
> > +
> > +#define INSN_MATCH_C_LD              0x6000
> > +#define INSN_MASK_C_LD               0xe003
> > +#define INSN_MATCH_C_SD              0xe000
> > +#define INSN_MASK_C_SD               0xe003
> > +#define INSN_MATCH_C_LW              0x4000
> > +#define INSN_MASK_C_LW               0xe003
> > +#define INSN_MATCH_C_SW              0xc000
> > +#define INSN_MASK_C_SW               0xe003
> > +#define INSN_MATCH_C_LDSP    0x6002
> > +#define INSN_MASK_C_LDSP     0xe003
> > +#define INSN_MATCH_C_SDSP    0xe002
> > +#define INSN_MASK_C_SDSP     0xe003
> > +#define INSN_MATCH_C_LWSP    0x4002
> > +#define INSN_MASK_C_LWSP     0xe003
> > +#define INSN_MATCH_C_SWSP    0xc002
> > +#define INSN_MASK_C_SWSP     0xe003
> > +
> > +#define INSN_LEN(insn)               ((((insn) & 0x3) < 0x3) ? 2 : 4)
> > +
> > +#ifdef CONFIG_64BIT
> > +#define LOG_REGBYTES         3
> > +#else
> > +#define LOG_REGBYTES         2
> > +#endif
> > +#define REGBYTES             (1 << LOG_REGBYTES)
> > +
> > +#define SH_RD                        7
> > +#define SH_RS1                       15
> > +#define SH_RS2                       20
> > +#define SH_RS2C                      2
> > +
> > +#define RV_X(x, s, n)                (((x) >> (s)) & ((1 << (n)) - 1))
> > +#define RVC_LW_IMM(x)                ((RV_X(x, 6, 1) << 2) | \
> > +                              (RV_X(x, 10, 3) << 3) | \
> > +                              (RV_X(x, 5, 1) << 6))
> > +#define RVC_LD_IMM(x)                ((RV_X(x, 10, 3) << 3) | \
> > +                              (RV_X(x, 5, 2) << 6))
> > +#define RVC_LWSP_IMM(x)              ((RV_X(x, 4, 3) << 2) | \
> > +                              (RV_X(x, 12, 1) << 5) | \
> > +                              (RV_X(x, 2, 2) << 6))
> > +#define RVC_LDSP_IMM(x)              ((RV_X(x, 5, 2) << 3) | \
> > +                              (RV_X(x, 12, 1) << 5) | \
> > +                              (RV_X(x, 2, 3) << 6))
> > +#define RVC_SWSP_IMM(x)              ((RV_X(x, 9, 4) << 2) | \
> > +                              (RV_X(x, 7, 2) << 6))
> > +#define RVC_SDSP_IMM(x)              ((RV_X(x, 10, 3) << 3) | \
> > +                              (RV_X(x, 7, 3) << 6))
> > +#define RVC_RS1S(insn)               (8 + RV_X(insn, SH_RD, 3))
> > +#define RVC_RS2S(insn)               (8 + RV_X(insn, SH_RS2C, 3))
> > +#define RVC_RS2(insn)                RV_X(insn, SH_RS2C, 5)
> > +
> > +#define SHIFT_RIGHT(x, y)            \
> > +     ((y) < 0 ? ((x) << -(y)) : ((x) >> (y)))
> > +
> > +#define REG_MASK                     \
> > +     ((1 << (5 + LOG_REGBYTES)) - (1 << LOG_REGBYTES))
> > +
> > +#define REG_OFFSET(insn, pos)                \
> > +     (SHIFT_RIGHT((insn), (pos) - LOG_REGBYTES) & REG_MASK)
> > +
> > +#define REG_PTR(insn, pos, regs)     \
> > +     (ulong *)((ulong)(regs) + REG_OFFSET(insn, pos))
> > +
> > +#define GET_RM(insn)         (((insn) >> 12) & 7)
> > +
> > +#define GET_RS1(insn, regs)  (*REG_PTR(insn, SH_RS1, regs))
> > +#define GET_RS2(insn, regs)  (*REG_PTR(insn, SH_RS2, regs))
> > +#define GET_RS1S(insn, regs) (*REG_PTR(RVC_RS1S(insn), 0, regs))
> > +#define GET_RS2S(insn, regs) (*REG_PTR(RVC_RS2S(insn), 0, regs))
> > +#define GET_RS2C(insn, regs) (*REG_PTR(insn, SH_RS2C, regs))
> > +#define GET_SP(regs)         (*REG_PTR(2, 0, regs))
> > +#define SET_RD(insn, regs, val)      (*REG_PTR(insn, SH_RD, regs) = (val))
> > +#define IMM_I(insn)          ((s32)(insn) >> 20)
> > +#define IMM_S(insn)          (((s32)(insn) >> 25 << 5) | \
> > +                              (s32)(((insn) >> 7) & 0x1f))
> > +#define MASK_FUNCT3          0x7000
> > +
> > +#define STR(x)                       XSTR(x)
> > +#define XSTR(x)                      #x
> > +
> > +/* TODO: Handle traps due to unpriv load and redirect it back to VS-mode */
> > +static ulong get_insn(struct kvm_vcpu *vcpu)
> > +{
> > +     ulong __sepc = vcpu->arch.guest_context.sepc;
> > +     ulong __hstatus, __sstatus, __vsstatus;
> > +#ifdef CONFIG_RISCV_ISA_C
> > +     ulong rvc_mask = 3, tmp;
> > +#endif
> > +     ulong flags, val;
> > +
> > +     local_irq_save(flags);
> > +
> > +     __vsstatus = csr_read(CSR_VSSTATUS);
> > +     __sstatus = csr_read(CSR_SSTATUS);
> > +     __hstatus = csr_read(CSR_HSTATUS);
> > +
> > +     csr_write(CSR_VSSTATUS, __vsstatus | SR_MXR);
> > +     csr_write(CSR_SSTATUS, vcpu->arch.guest_context.sstatus | SR_MXR);
> > +     csr_write(CSR_HSTATUS, vcpu->arch.guest_context.hstatus | HSTATUS_SPRV);
>
> What happens when the insn load triggers a page fault, maybe because the
> guest was malicious and did
>
>    1) Run on page 0x1000
>    2) Remove map for 0x1000, do *not* flush TLB
>    3) Trigger MMIO
>
> That would DOS the host here, as the host kernel would continue running
> in guest address space, right?

Yes, we can certainly fault while accessing Guest instruction. We will
be fixing this issue in a followup series. We have mentioned this in cover
letter as well.

BTW, RISC-V spec is going to further improve to provide easy
access of faulting instruction to Hypervisor.
(Refer, https://github.com/riscv/riscv-isa-manual/issues/431)

Regards,
Anup

>
>
> Alex
>
> > +
> > +#ifndef CONFIG_RISCV_ISA_C
> > +     asm ("\n"
> > +#ifdef CONFIG_64BIT
> > +             STR(LWU) " %[insn], (%[addr])\n"
> > +#else
> > +             STR(LW) " %[insn], (%[addr])\n"
> > +#endif
> > +             : [insn] "=&r" (val) : [addr] "r" (__sepc));
> > +#else
> > +     asm ("and %[tmp], %[addr], 2\n"
> > +             "bnez %[tmp], 1f\n"
> > +#ifdef CONFIG_64BIT
> > +             STR(LWU) " %[insn], (%[addr])\n"
> > +#else
> > +             STR(LW) " %[insn], (%[addr])\n"
> > +#endif
> > +             "and %[tmp], %[insn], %[rvc_mask]\n"
> > +             "beq %[tmp], %[rvc_mask], 2f\n"
> > +             "sll %[insn], %[insn], %[xlen_minus_16]\n"
> > +             "srl %[insn], %[insn], %[xlen_minus_16]\n"
> > +             "j 2f\n"
> > +             "1:\n"
> > +             "lhu %[insn], (%[addr])\n"
> > +             "and %[tmp], %[insn], %[rvc_mask]\n"
> > +             "bne %[tmp], %[rvc_mask], 2f\n"
> > +             "lhu %[tmp], 2(%[addr])\n"
> > +             "sll %[tmp], %[tmp], 16\n"
> > +             "add %[insn], %[insn], %[tmp]\n"
> > +             "2:"
> > +     : [vsstatus] "+&r" (__vsstatus), [insn] "=&r" (val),
> > +       [tmp] "=&r" (tmp)
> > +     : [addr] "r" (__sepc), [rvc_mask] "r" (rvc_mask),
> > +       [xlen_minus_16] "i" (__riscv_xlen - 16));
> > +#endif
> > +
> > +     csr_write(CSR_HSTATUS, __hstatus);
> > +     csr_write(CSR_SSTATUS, __sstatus);
> > +     csr_write(CSR_VSSTATUS, __vsstatus);
> > +
> > +     local_irq_restore(flags);
> > +
> > +     return val;
> > +}
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 13/20] RISC-V: KVM: Implement stage2 page table programming
  2019-08-22 12:28   ` Alexander Graf
@ 2019-08-22 12:38     ` Anup Patel
  2019-08-22 13:27       ` Alexander Graf
  0 siblings, 1 reply; 61+ messages in thread
From: Anup Patel @ 2019-08-22 12:38 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Daniel Lezcano, Thomas Gleixner, Atish Patra,
	Alistair Francis, Damien Le Moal, Christoph Hellwig, kvm,
	linux-riscv, linux-kernel

On Thu, Aug 22, 2019 at 5:58 PM Alexander Graf <graf@amazon.com> wrote:
>
> On 22.08.19 10:45, Anup Patel wrote:
> > This patch implements all required functions for programming
> > the stage2 page table for each Guest/VM.
> >
> > At high-level, the flow of stage2 related functions is similar
> > from KVM ARM/ARM64 implementation but the stage2 page table
> > format is quite different for KVM RISC-V.
> >
> > Signed-off-by: Anup Patel <anup.patel@wdc.com>
> > Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >   arch/riscv/include/asm/kvm_host.h     |  10 +
> >   arch/riscv/include/asm/pgtable-bits.h |   1 +
> >   arch/riscv/kvm/mmu.c                  | 637 +++++++++++++++++++++++++-
> >   3 files changed, 638 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> > index 3b09158f80f2..a37775c92586 100644
> > --- a/arch/riscv/include/asm/kvm_host.h
> > +++ b/arch/riscv/include/asm/kvm_host.h
> > @@ -72,6 +72,13 @@ struct kvm_mmio_decode {
> >       int shift;
> >   };
> >
> > +#define KVM_MMU_PAGE_CACHE_NR_OBJS   32
> > +
> > +struct kvm_mmu_page_cache {
> > +     int nobjs;
> > +     void *objects[KVM_MMU_PAGE_CACHE_NR_OBJS];
> > +};
> > +
> >   struct kvm_cpu_context {
> >       unsigned long zero;
> >       unsigned long ra;
> > @@ -163,6 +170,9 @@ struct kvm_vcpu_arch {
> >       /* MMIO instruction details */
> >       struct kvm_mmio_decode mmio_decode;
> >
> > +     /* Cache pages needed to program page tables with spinlock held */
> > +     struct kvm_mmu_page_cache mmu_page_cache;
> > +
> >       /* VCPU power-off state */
> >       bool power_off;
> >
> > diff --git a/arch/riscv/include/asm/pgtable-bits.h b/arch/riscv/include/asm/pgtable-bits.h
> > index bbaeb5d35842..be49d62fcc2b 100644
> > --- a/arch/riscv/include/asm/pgtable-bits.h
> > +++ b/arch/riscv/include/asm/pgtable-bits.h
> > @@ -26,6 +26,7 @@
> >
> >   #define _PAGE_SPECIAL   _PAGE_SOFT
> >   #define _PAGE_TABLE     _PAGE_PRESENT
> > +#define _PAGE_LEAF      (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
> >
> >   /*
> >    * _PAGE_PROT_NONE is set on not-present pages (and ignored by the hardware) to
> > diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> > index 2b965f9aac07..9e95ab6769f6 100644
> > --- a/arch/riscv/kvm/mmu.c
> > +++ b/arch/riscv/kvm/mmu.c
> > @@ -18,6 +18,432 @@
> >   #include <asm/page.h>
> >   #include <asm/pgtable.h>
> >
> > +#ifdef CONFIG_64BIT
> > +#define stage2_have_pmd              true
> > +#define stage2_gpa_size              ((phys_addr_t)(1ULL << 39))
> > +#define stage2_cache_min_pages       2
> > +#else
> > +#define pmd_index(x)         0
> > +#define pfn_pmd(x, y)                ({ pmd_t __x = { 0 }; __x; })
> > +#define stage2_have_pmd              false
> > +#define stage2_gpa_size              ((phys_addr_t)(1ULL << 32))
> > +#define stage2_cache_min_pages       1
> > +#endif
> > +
> > +static int stage2_cache_topup(struct kvm_mmu_page_cache *pcache,
> > +                           int min, int max)
> > +{
> > +     void *page;
> > +
> > +     BUG_ON(max > KVM_MMU_PAGE_CACHE_NR_OBJS);
> > +     if (pcache->nobjs >= min)
> > +             return 0;
> > +     while (pcache->nobjs < max) {
> > +             page = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
> > +             if (!page)
> > +                     return -ENOMEM;
> > +             pcache->objects[pcache->nobjs++] = page;
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> > +static void stage2_cache_flush(struct kvm_mmu_page_cache *pcache)
> > +{
> > +     while (pcache && pcache->nobjs)
> > +             free_page((unsigned long)pcache->objects[--pcache->nobjs]);
> > +}
> > +
> > +static void *stage2_cache_alloc(struct kvm_mmu_page_cache *pcache)
> > +{
> > +     void *p;
> > +
> > +     if (!pcache)
> > +             return NULL;
> > +
> > +     BUG_ON(!pcache->nobjs);
> > +     p = pcache->objects[--pcache->nobjs];
> > +
> > +     return p;
> > +}
> > +
> > +struct local_guest_tlb_info {
> > +     struct kvm_vmid *vmid;
> > +     gpa_t addr;
> > +};
> > +
> > +static void local_guest_tlb_flush_vmid_gpa(void *info)
> > +{
> > +     struct local_guest_tlb_info *infop = info;
> > +
> > +     __kvm_riscv_hfence_gvma_vmid_gpa(READ_ONCE(infop->vmid->vmid_version),
> > +                                      infop->addr);
> > +}
> > +
> > +static void stage2_remote_tlb_flush(struct kvm *kvm, gpa_t addr)
> > +{
> > +     struct local_guest_tlb_info info;
> > +     struct kvm_vmid *vmid = &kvm->arch.vmid;
> > +
> > +     /* TODO: This should be SBI call */
> > +     info.vmid = vmid;
> > +     info.addr = addr;
> > +     preempt_disable();
> > +     smp_call_function_many(cpu_all_mask, local_guest_tlb_flush_vmid_gpa,
> > +                            &info, true);
>
> This is all nice and dandy on the toy 4 core systems we have today, but
> it will become a bottleneck further down the road.
>
> How many VMIDs do you have? Could you just allocate a new one every time
> you switch host CPUs? Then you know exactly which CPUs to flush by
> looking at all your vcpu structs and a local field that tells you which
> pCPU they're on at this moment.
>
> Either way, it's nothing that should block inclusion. For today, we're fine.

We are not happy about this either.

Other two options, we have are:
1. Have SBI calls for remote HFENCEs
2. Propose RISC-V ISA extension for remote FENCEs

Option1 is mostly extending SBI spec and implementing it in runtime
firmware.

Option2 is ideal solution but requires consensus among wider audience
in RISC-V foundation.

At this point, we are fine with a simple solution.

Regards,
Anup

>
>
> Alex

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 11/20] RISC-V: KVM: Handle WFI exits for VCPU
  2019-08-22 12:19   ` Alexander Graf
@ 2019-08-22 12:50     ` Anup Patel
  0 siblings, 0 replies; 61+ messages in thread
From: Anup Patel @ 2019-08-22 12:50 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Daniel Lezcano, Thomas Gleixner, Atish Patra,
	Alistair Francis, Damien Le Moal, Christoph Hellwig, kvm,
	linux-riscv, linux-kernel

On Thu, Aug 22, 2019 at 5:49 PM Alexander Graf <graf@amazon.com> wrote:
>
> On 22.08.19 10:45, Anup Patel wrote:
> > We get illegal instruction trap whenever Guest/VM executes WFI
> > instruction.
> >
> > This patch handles WFI trap by blocking the trapped VCPU using
> > kvm_vcpu_block() API. The blocked VCPU will be automatically
> > resumed whenever a VCPU interrupt is injected from user-space
> > or from in-kernel IRQCHIP emulation.
> >
> > Signed-off-by: Anup Patel <anup.patel@wdc.com>
> > Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >   arch/riscv/kvm/vcpu_exit.c | 88 ++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 88 insertions(+)
> >
> > diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
> > index efc06198c259..fbc04fe335ad 100644
> > --- a/arch/riscv/kvm/vcpu_exit.c
> > +++ b/arch/riscv/kvm/vcpu_exit.c
> > @@ -12,6 +12,9 @@
> >   #include <linux/kvm_host.h>
> >   #include <asm/csr.h>
> >
> > +#define INSN_MASK_WFI                0xffffff00
> > +#define INSN_MATCH_WFI               0x10500000
> > +
> >   #define INSN_MATCH_LB               0x3
> >   #define INSN_MASK_LB                0x707f
> >   #define INSN_MATCH_LH               0x1003
> > @@ -179,6 +182,87 @@ static ulong get_insn(struct kvm_vcpu *vcpu)
> >       return val;
> >   }
> >
> > +typedef int (*illegal_insn_func)(struct kvm_vcpu *vcpu,
> > +                              struct kvm_run *run,
> > +                              ulong insn);
> > +
> > +static int truly_illegal_insn(struct kvm_vcpu *vcpu,
> > +                           struct kvm_run *run,
> > +                           ulong insn)
> > +{
> > +     /* TODO: Redirect trap to Guest VCPU */
> > +     return -ENOTSUPP;
> > +}
> > +
> > +static int system_opcode_insn(struct kvm_vcpu *vcpu,
> > +                           struct kvm_run *run,
> > +                           ulong insn)
> > +{
> > +     if ((insn & INSN_MASK_WFI) == INSN_MATCH_WFI) {
> > +             vcpu->stat.wfi_exit_stat++;
> > +             if (!kvm_arch_vcpu_runnable(vcpu)) {
> > +                     srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx);
> > +                     kvm_vcpu_block(vcpu);
> > +                     vcpu->arch.srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
> > +                     kvm_clear_request(KVM_REQ_UNHALT, vcpu);
> > +             }
> > +             vcpu->arch.guest_context.sepc += INSN_LEN(insn);
> > +             return 1;
> > +     }
> > +
> > +     return truly_illegal_insn(vcpu, run, insn);
> > +}
> > +
> > +static illegal_insn_func illegal_insn_table[32] = {
>
> Every time I did experiments on PowerPC with indirect tables like this
> over switch() in C, the switch() code won. CPUs are pretty good at
> predicting branches. Predicting indirect jumps however, they are
> terrible at.
>
> So unless you consider the jump table more readable / maintainable, I
> would suggest to use a simple switch() statement. It will be faster and
> smaller.

Yes, readability was the reason why we choose jump table but
I see your point. Most of the entries in jump table point to
truly_illegal_insn() so I guess switch case will be quite simple
here.

I will update this in next revision.

Regards,
Anup

>
>
> Alex
>
>
> > +     truly_illegal_insn, /* 0 */
> > +     truly_illegal_insn, /* 1 */
> > +     truly_illegal_insn, /* 2 */
> > +     truly_illegal_insn, /* 3 */
> > +     truly_illegal_insn, /* 4 */
> > +     truly_illegal_insn, /* 5 */
> > +     truly_illegal_insn, /* 6 */
> > +     truly_illegal_insn, /* 7 */
> > +     truly_illegal_insn, /* 8 */
> > +     truly_illegal_insn, /* 9 */
> > +     truly_illegal_insn, /* 10 */
> > +     truly_illegal_insn, /* 11 */
> > +     truly_illegal_insn, /* 12 */
> > +     truly_illegal_insn, /* 13 */
> > +     truly_illegal_insn, /* 14 */
> > +     truly_illegal_insn, /* 15 */
> > +     truly_illegal_insn, /* 16 */
> > +     truly_illegal_insn, /* 17 */
> > +     truly_illegal_insn, /* 18 */
> > +     truly_illegal_insn, /* 19 */
> > +     truly_illegal_insn, /* 20 */
> > +     truly_illegal_insn, /* 21 */
> > +     truly_illegal_insn, /* 22 */
> > +     truly_illegal_insn, /* 23 */
> > +     truly_illegal_insn, /* 24 */
> > +     truly_illegal_insn, /* 25 */
> > +     truly_illegal_insn, /* 26 */
> > +     truly_illegal_insn, /* 27 */
> > +     system_opcode_insn, /* 28 */
> > +     truly_illegal_insn, /* 29 */
> > +     truly_illegal_insn, /* 30 */
> > +     truly_illegal_insn  /* 31 */
> > +};
> > +
> > +static int illegal_inst_fault(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > +                           unsigned long stval)
> > +{
> > +     ulong insn = stval;
> > +
> > +     if (unlikely((insn & 3) != 3)) {
> > +             if (insn == 0)
> > +                     insn = get_insn(vcpu);
> > +             if ((insn & 3) != 3)
> > +                     return truly_illegal_insn(vcpu, run, insn);
> > +     }
> > +
> > +     return illegal_insn_table[(insn & 0x7c) >> 2](vcpu, run, insn);
> > +}
> > +
> >   static int emulate_load(struct kvm_vcpu *vcpu, struct kvm_run *run,
> >                       unsigned long fault_addr)
> >   {
> > @@ -439,6 +523,10 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
> >       ret = -EFAULT;
> >       run->exit_reason = KVM_EXIT_UNKNOWN;
> >       switch (scause) {
> > +     case EXC_INST_ILLEGAL:
> > +             if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV)
> > +                     ret = illegal_inst_fault(vcpu, run, stval);
> > +             break;
> >       case EXC_INST_PAGE_FAULT:
> >       case EXC_LOAD_PAGE_FAULT:
> >       case EXC_STORE_PAGE_FAULT:
> >
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 10/20] RISC-V: KVM: Handle MMIO exits for VCPU
  2019-08-22 12:33     ` Anup Patel
@ 2019-08-22 13:25       ` Alexander Graf
  2019-08-22 13:55         ` Anup Patel
  0 siblings, 1 reply; 61+ messages in thread
From: Alexander Graf @ 2019-08-22 13:25 UTC (permalink / raw)
  To: Anup Patel
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Daniel Lezcano, Thomas Gleixner, Atish Patra,
	Alistair Francis, Damien Le Moal, Christoph Hellwig, kvm,
	linux-riscv, linux-kernel



On 22.08.19 14:33, Anup Patel wrote:
> On Thu, Aug 22, 2019 at 5:44 PM Alexander Graf <graf@amazon.com> wrote:
>>
>> On 22.08.19 10:44, Anup Patel wrote:
>>> We will get stage2 page faults whenever Guest/VM access SW emulated
>>> MMIO device or unmapped Guest RAM.
>>>
>>> This patch implements MMIO read/write emulation by extracting MMIO
>>> details from the trapped load/store instruction and forwarding the
>>> MMIO read/write to user-space. The actual MMIO emulation will happen
>>> in user-space and KVM kernel module will only take care of register
>>> updates before resuming the trapped VCPU.
>>>
>>> The handling for stage2 page faults for unmapped Guest RAM will be
>>> implemeted by a separate patch later.
>>>
>>> Signed-off-by: Anup Patel <anup.patel@wdc.com>
>>> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
>>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>>    arch/riscv/include/asm/kvm_host.h |  11 +
>>>    arch/riscv/kvm/mmu.c              |   7 +
>>>    arch/riscv/kvm/vcpu_exit.c        | 436 +++++++++++++++++++++++++++++-
>>>    3 files changed, 451 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
>>> index 18f1097f1d8d..4388bace6d70 100644
>>> --- a/arch/riscv/include/asm/kvm_host.h
>>> +++ b/arch/riscv/include/asm/kvm_host.h
>>> @@ -53,6 +53,12 @@ struct kvm_arch {
>>>        phys_addr_t pgd_phys;
>>>    };
>>>
>>> +struct kvm_mmio_decode {
>>> +     unsigned long insn;
>>> +     int len;
>>> +     int shift;
>>> +};
>>> +
>>>    struct kvm_cpu_context {
>>>        unsigned long zero;
>>>        unsigned long ra;
>>> @@ -141,6 +147,9 @@ struct kvm_vcpu_arch {
>>>        unsigned long irqs_pending;
>>>        unsigned long irqs_pending_mask;
>>>
>>> +     /* MMIO instruction details */
>>> +     struct kvm_mmio_decode mmio_decode;
>>> +
>>>        /* VCPU power-off state */
>>>        bool power_off;
>>>
>>> @@ -160,6 +169,8 @@ static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
>>>    int kvm_riscv_setup_vsip(void);
>>>    void kvm_riscv_cleanup_vsip(void);
>>>
>>> +int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
>>> +                      bool is_write);
>>>    void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu);
>>>    int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm);
>>>    void kvm_riscv_stage2_free_pgd(struct kvm *kvm);
>>> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
>>> index 04dd089b86ff..2b965f9aac07 100644
>>> --- a/arch/riscv/kvm/mmu.c
>>> +++ b/arch/riscv/kvm/mmu.c
>>> @@ -61,6 +61,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>>>        return 0;
>>>    }
>>>
>>> +int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
>>> +                      bool is_write)
>>> +{
>>> +     /* TODO: */
>>> +     return 0;
>>> +}
>>> +
>>>    void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu)
>>>    {
>>>        /* TODO: */
>>> diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
>>> index e4d7c8f0807a..efc06198c259 100644
>>> --- a/arch/riscv/kvm/vcpu_exit.c
>>> +++ b/arch/riscv/kvm/vcpu_exit.c
>>> @@ -6,9 +6,371 @@
>>>     *     Anup Patel <anup.patel@wdc.com>
>>>     */
>>>
>>> +#include <linux/bitops.h>
>>>    #include <linux/errno.h>
>>>    #include <linux/err.h>
>>>    #include <linux/kvm_host.h>
>>> +#include <asm/csr.h>
>>> +
>>> +#define INSN_MATCH_LB                0x3
>>> +#define INSN_MASK_LB         0x707f
>>> +#define INSN_MATCH_LH                0x1003
>>> +#define INSN_MASK_LH         0x707f
>>> +#define INSN_MATCH_LW                0x2003
>>> +#define INSN_MASK_LW         0x707f
>>> +#define INSN_MATCH_LD                0x3003
>>> +#define INSN_MASK_LD         0x707f
>>> +#define INSN_MATCH_LBU               0x4003
>>> +#define INSN_MASK_LBU                0x707f
>>> +#define INSN_MATCH_LHU               0x5003
>>> +#define INSN_MASK_LHU                0x707f
>>> +#define INSN_MATCH_LWU               0x6003
>>> +#define INSN_MASK_LWU                0x707f
>>> +#define INSN_MATCH_SB                0x23
>>> +#define INSN_MASK_SB         0x707f
>>> +#define INSN_MATCH_SH                0x1023
>>> +#define INSN_MASK_SH         0x707f
>>> +#define INSN_MATCH_SW                0x2023
>>> +#define INSN_MASK_SW         0x707f
>>> +#define INSN_MATCH_SD                0x3023
>>> +#define INSN_MASK_SD         0x707f
>>> +
>>> +#define INSN_MATCH_C_LD              0x6000
>>> +#define INSN_MASK_C_LD               0xe003
>>> +#define INSN_MATCH_C_SD              0xe000
>>> +#define INSN_MASK_C_SD               0xe003
>>> +#define INSN_MATCH_C_LW              0x4000
>>> +#define INSN_MASK_C_LW               0xe003
>>> +#define INSN_MATCH_C_SW              0xc000
>>> +#define INSN_MASK_C_SW               0xe003
>>> +#define INSN_MATCH_C_LDSP    0x6002
>>> +#define INSN_MASK_C_LDSP     0xe003
>>> +#define INSN_MATCH_C_SDSP    0xe002
>>> +#define INSN_MASK_C_SDSP     0xe003
>>> +#define INSN_MATCH_C_LWSP    0x4002
>>> +#define INSN_MASK_C_LWSP     0xe003
>>> +#define INSN_MATCH_C_SWSP    0xc002
>>> +#define INSN_MASK_C_SWSP     0xe003
>>> +
>>> +#define INSN_LEN(insn)               ((((insn) & 0x3) < 0x3) ? 2 : 4)
>>> +
>>> +#ifdef CONFIG_64BIT
>>> +#define LOG_REGBYTES         3
>>> +#else
>>> +#define LOG_REGBYTES         2
>>> +#endif
>>> +#define REGBYTES             (1 << LOG_REGBYTES)
>>> +
>>> +#define SH_RD                        7
>>> +#define SH_RS1                       15
>>> +#define SH_RS2                       20
>>> +#define SH_RS2C                      2
>>> +
>>> +#define RV_X(x, s, n)                (((x) >> (s)) & ((1 << (n)) - 1))
>>> +#define RVC_LW_IMM(x)                ((RV_X(x, 6, 1) << 2) | \
>>> +                              (RV_X(x, 10, 3) << 3) | \
>>> +                              (RV_X(x, 5, 1) << 6))
>>> +#define RVC_LD_IMM(x)                ((RV_X(x, 10, 3) << 3) | \
>>> +                              (RV_X(x, 5, 2) << 6))
>>> +#define RVC_LWSP_IMM(x)              ((RV_X(x, 4, 3) << 2) | \
>>> +                              (RV_X(x, 12, 1) << 5) | \
>>> +                              (RV_X(x, 2, 2) << 6))
>>> +#define RVC_LDSP_IMM(x)              ((RV_X(x, 5, 2) << 3) | \
>>> +                              (RV_X(x, 12, 1) << 5) | \
>>> +                              (RV_X(x, 2, 3) << 6))
>>> +#define RVC_SWSP_IMM(x)              ((RV_X(x, 9, 4) << 2) | \
>>> +                              (RV_X(x, 7, 2) << 6))
>>> +#define RVC_SDSP_IMM(x)              ((RV_X(x, 10, 3) << 3) | \
>>> +                              (RV_X(x, 7, 3) << 6))
>>> +#define RVC_RS1S(insn)               (8 + RV_X(insn, SH_RD, 3))
>>> +#define RVC_RS2S(insn)               (8 + RV_X(insn, SH_RS2C, 3))
>>> +#define RVC_RS2(insn)                RV_X(insn, SH_RS2C, 5)
>>> +
>>> +#define SHIFT_RIGHT(x, y)            \
>>> +     ((y) < 0 ? ((x) << -(y)) : ((x) >> (y)))
>>> +
>>> +#define REG_MASK                     \
>>> +     ((1 << (5 + LOG_REGBYTES)) - (1 << LOG_REGBYTES))
>>> +
>>> +#define REG_OFFSET(insn, pos)                \
>>> +     (SHIFT_RIGHT((insn), (pos) - LOG_REGBYTES) & REG_MASK)
>>> +
>>> +#define REG_PTR(insn, pos, regs)     \
>>> +     (ulong *)((ulong)(regs) + REG_OFFSET(insn, pos))
>>> +
>>> +#define GET_RM(insn)         (((insn) >> 12) & 7)
>>> +
>>> +#define GET_RS1(insn, regs)  (*REG_PTR(insn, SH_RS1, regs))
>>> +#define GET_RS2(insn, regs)  (*REG_PTR(insn, SH_RS2, regs))
>>> +#define GET_RS1S(insn, regs) (*REG_PTR(RVC_RS1S(insn), 0, regs))
>>> +#define GET_RS2S(insn, regs) (*REG_PTR(RVC_RS2S(insn), 0, regs))
>>> +#define GET_RS2C(insn, regs) (*REG_PTR(insn, SH_RS2C, regs))
>>> +#define GET_SP(regs)         (*REG_PTR(2, 0, regs))
>>> +#define SET_RD(insn, regs, val)      (*REG_PTR(insn, SH_RD, regs) = (val))
>>> +#define IMM_I(insn)          ((s32)(insn) >> 20)
>>> +#define IMM_S(insn)          (((s32)(insn) >> 25 << 5) | \
>>> +                              (s32)(((insn) >> 7) & 0x1f))
>>> +#define MASK_FUNCT3          0x7000
>>> +
>>> +#define STR(x)                       XSTR(x)
>>> +#define XSTR(x)                      #x
>>> +
>>> +/* TODO: Handle traps due to unpriv load and redirect it back to VS-mode */
>>> +static ulong get_insn(struct kvm_vcpu *vcpu)
>>> +{
>>> +     ulong __sepc = vcpu->arch.guest_context.sepc;
>>> +     ulong __hstatus, __sstatus, __vsstatus;
>>> +#ifdef CONFIG_RISCV_ISA_C
>>> +     ulong rvc_mask = 3, tmp;
>>> +#endif
>>> +     ulong flags, val;
>>> +
>>> +     local_irq_save(flags);
>>> +
>>> +     __vsstatus = csr_read(CSR_VSSTATUS);
>>> +     __sstatus = csr_read(CSR_SSTATUS);
>>> +     __hstatus = csr_read(CSR_HSTATUS);
>>> +
>>> +     csr_write(CSR_VSSTATUS, __vsstatus | SR_MXR);
>>> +     csr_write(CSR_SSTATUS, vcpu->arch.guest_context.sstatus | SR_MXR);
>>> +     csr_write(CSR_HSTATUS, vcpu->arch.guest_context.hstatus | HSTATUS_SPRV);
>>
>> What happens when the insn load triggers a page fault, maybe because the
>> guest was malicious and did
>>
>>     1) Run on page 0x1000
>>     2) Remove map for 0x1000, do *not* flush TLB
>>     3) Trigger MMIO
>>
>> That would DOS the host here, as the host kernel would continue running
>> in guest address space, right?
> 
> Yes, we can certainly fault while accessing Guest instruction. We will
> be fixing this issue in a followup series. We have mentioned this in cover
> letter as well.

I don't think the cover letter is the right place for such a comment. 
Please definitely put it into the code as well, pointing out that this 
is a known bug. Or even better yet: Fix it up properly :).

In fact, with a bug that dramatic, I'm not even sure we can safely 
include the code. We're consciously allowing user space to DOS the kernel.

> 
> BTW, RISC-V spec is going to further improve to provide easy
> access of faulting instruction to Hypervisor.
> (Refer, https://github.com/riscv/riscv-isa-manual/issues/431)

Yes, we have similar extensions on other archs. Is this going to be an 
optional addition or a mandatory bit of the hypervisor spec? If it's not 
mandatory, we can not rely on it, so the current path has to be safe.


Alex

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 13/20] RISC-V: KVM: Implement stage2 page table programming
  2019-08-22 12:38     ` Anup Patel
@ 2019-08-22 13:27       ` Alexander Graf
  2019-08-22 13:58         ` Anup Patel
  0 siblings, 1 reply; 61+ messages in thread
From: Alexander Graf @ 2019-08-22 13:27 UTC (permalink / raw)
  To: Anup Patel
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Daniel Lezcano, Thomas Gleixner, Atish Patra,
	Alistair Francis, Damien Le Moal, Christoph Hellwig, kvm,
	linux-riscv, linux-kernel



On 22.08.19 14:38, Anup Patel wrote:
> On Thu, Aug 22, 2019 at 5:58 PM Alexander Graf <graf@amazon.com> wrote:
>>
>> On 22.08.19 10:45, Anup Patel wrote:
>>> This patch implements all required functions for programming
>>> the stage2 page table for each Guest/VM.
>>>
>>> At high-level, the flow of stage2 related functions is similar
>>> from KVM ARM/ARM64 implementation but the stage2 page table
>>> format is quite different for KVM RISC-V.
>>>
>>> Signed-off-by: Anup Patel <anup.patel@wdc.com>
>>> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
>>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>>    arch/riscv/include/asm/kvm_host.h     |  10 +
>>>    arch/riscv/include/asm/pgtable-bits.h |   1 +
>>>    arch/riscv/kvm/mmu.c                  | 637 +++++++++++++++++++++++++-
>>>    3 files changed, 638 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
>>> index 3b09158f80f2..a37775c92586 100644
>>> --- a/arch/riscv/include/asm/kvm_host.h
>>> +++ b/arch/riscv/include/asm/kvm_host.h
>>> @@ -72,6 +72,13 @@ struct kvm_mmio_decode {
>>>        int shift;
>>>    };
>>>
>>> +#define KVM_MMU_PAGE_CACHE_NR_OBJS   32
>>> +
>>> +struct kvm_mmu_page_cache {
>>> +     int nobjs;
>>> +     void *objects[KVM_MMU_PAGE_CACHE_NR_OBJS];
>>> +};
>>> +
>>>    struct kvm_cpu_context {
>>>        unsigned long zero;
>>>        unsigned long ra;
>>> @@ -163,6 +170,9 @@ struct kvm_vcpu_arch {
>>>        /* MMIO instruction details */
>>>        struct kvm_mmio_decode mmio_decode;
>>>
>>> +     /* Cache pages needed to program page tables with spinlock held */
>>> +     struct kvm_mmu_page_cache mmu_page_cache;
>>> +
>>>        /* VCPU power-off state */
>>>        bool power_off;
>>>
>>> diff --git a/arch/riscv/include/asm/pgtable-bits.h b/arch/riscv/include/asm/pgtable-bits.h
>>> index bbaeb5d35842..be49d62fcc2b 100644
>>> --- a/arch/riscv/include/asm/pgtable-bits.h
>>> +++ b/arch/riscv/include/asm/pgtable-bits.h
>>> @@ -26,6 +26,7 @@
>>>
>>>    #define _PAGE_SPECIAL   _PAGE_SOFT
>>>    #define _PAGE_TABLE     _PAGE_PRESENT
>>> +#define _PAGE_LEAF      (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
>>>
>>>    /*
>>>     * _PAGE_PROT_NONE is set on not-present pages (and ignored by the hardware) to
>>> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
>>> index 2b965f9aac07..9e95ab6769f6 100644
>>> --- a/arch/riscv/kvm/mmu.c
>>> +++ b/arch/riscv/kvm/mmu.c
>>> @@ -18,6 +18,432 @@
>>>    #include <asm/page.h>
>>>    #include <asm/pgtable.h>
>>>
>>> +#ifdef CONFIG_64BIT
>>> +#define stage2_have_pmd              true
>>> +#define stage2_gpa_size              ((phys_addr_t)(1ULL << 39))
>>> +#define stage2_cache_min_pages       2
>>> +#else
>>> +#define pmd_index(x)         0
>>> +#define pfn_pmd(x, y)                ({ pmd_t __x = { 0 }; __x; })
>>> +#define stage2_have_pmd              false
>>> +#define stage2_gpa_size              ((phys_addr_t)(1ULL << 32))
>>> +#define stage2_cache_min_pages       1
>>> +#endif
>>> +
>>> +static int stage2_cache_topup(struct kvm_mmu_page_cache *pcache,
>>> +                           int min, int max)
>>> +{
>>> +     void *page;
>>> +
>>> +     BUG_ON(max > KVM_MMU_PAGE_CACHE_NR_OBJS);
>>> +     if (pcache->nobjs >= min)
>>> +             return 0;
>>> +     while (pcache->nobjs < max) {
>>> +             page = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
>>> +             if (!page)
>>> +                     return -ENOMEM;
>>> +             pcache->objects[pcache->nobjs++] = page;
>>> +     }
>>> +
>>> +     return 0;
>>> +}
>>> +
>>> +static void stage2_cache_flush(struct kvm_mmu_page_cache *pcache)
>>> +{
>>> +     while (pcache && pcache->nobjs)
>>> +             free_page((unsigned long)pcache->objects[--pcache->nobjs]);
>>> +}
>>> +
>>> +static void *stage2_cache_alloc(struct kvm_mmu_page_cache *pcache)
>>> +{
>>> +     void *p;
>>> +
>>> +     if (!pcache)
>>> +             return NULL;
>>> +
>>> +     BUG_ON(!pcache->nobjs);
>>> +     p = pcache->objects[--pcache->nobjs];
>>> +
>>> +     return p;
>>> +}
>>> +
>>> +struct local_guest_tlb_info {
>>> +     struct kvm_vmid *vmid;
>>> +     gpa_t addr;
>>> +};
>>> +
>>> +static void local_guest_tlb_flush_vmid_gpa(void *info)
>>> +{
>>> +     struct local_guest_tlb_info *infop = info;
>>> +
>>> +     __kvm_riscv_hfence_gvma_vmid_gpa(READ_ONCE(infop->vmid->vmid_version),
>>> +                                      infop->addr);
>>> +}
>>> +
>>> +static void stage2_remote_tlb_flush(struct kvm *kvm, gpa_t addr)
>>> +{
>>> +     struct local_guest_tlb_info info;
>>> +     struct kvm_vmid *vmid = &kvm->arch.vmid;
>>> +
>>> +     /* TODO: This should be SBI call */
>>> +     info.vmid = vmid;
>>> +     info.addr = addr;
>>> +     preempt_disable();
>>> +     smp_call_function_many(cpu_all_mask, local_guest_tlb_flush_vmid_gpa,
>>> +                            &info, true);
>>
>> This is all nice and dandy on the toy 4 core systems we have today, but
>> it will become a bottleneck further down the road.
>>
>> How many VMIDs do you have? Could you just allocate a new one every time
>> you switch host CPUs? Then you know exactly which CPUs to flush by
>> looking at all your vcpu structs and a local field that tells you which
>> pCPU they're on at this moment.
>>
>> Either way, it's nothing that should block inclusion. For today, we're fine.
> 
> We are not happy about this either.
> 
> Other two options, we have are:
> 1. Have SBI calls for remote HFENCEs
> 2. Propose RISC-V ISA extension for remote FENCEs
> 
> Option1 is mostly extending SBI spec and implementing it in runtime
> firmware.
> 
> Option2 is ideal solution but requires consensus among wider audience
> in RISC-V foundation.
> 
> At this point, we are fine with a simple solution.

It's fine to explicitly IPI other CPUs to flush their TLBs. What is not 
fine is to IPI *all* CPUs to flush their TLBs.


Alex

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 10/20] RISC-V: KVM: Handle MMIO exits for VCPU
  2019-08-22 13:25       ` Alexander Graf
@ 2019-08-22 13:55         ` Anup Patel
  0 siblings, 0 replies; 61+ messages in thread
From: Anup Patel @ 2019-08-22 13:55 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Daniel Lezcano, Thomas Gleixner, Atish Patra,
	Alistair Francis, Damien Le Moal, Christoph Hellwig, kvm,
	linux-riscv, linux-kernel

On Thu, Aug 22, 2019 at 6:55 PM Alexander Graf <graf@amazon.com> wrote:
>
>
>
> On 22.08.19 14:33, Anup Patel wrote:
> > On Thu, Aug 22, 2019 at 5:44 PM Alexander Graf <graf@amazon.com> wrote:
> >>
> >> On 22.08.19 10:44, Anup Patel wrote:
> >>> We will get stage2 page faults whenever Guest/VM access SW emulated
> >>> MMIO device or unmapped Guest RAM.
> >>>
> >>> This patch implements MMIO read/write emulation by extracting MMIO
> >>> details from the trapped load/store instruction and forwarding the
> >>> MMIO read/write to user-space. The actual MMIO emulation will happen
> >>> in user-space and KVM kernel module will only take care of register
> >>> updates before resuming the trapped VCPU.
> >>>
> >>> The handling for stage2 page faults for unmapped Guest RAM will be
> >>> implemeted by a separate patch later.
> >>>
> >>> Signed-off-by: Anup Patel <anup.patel@wdc.com>
> >>> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> >>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> >>> ---
> >>>    arch/riscv/include/asm/kvm_host.h |  11 +
> >>>    arch/riscv/kvm/mmu.c              |   7 +
> >>>    arch/riscv/kvm/vcpu_exit.c        | 436 +++++++++++++++++++++++++++++-
> >>>    3 files changed, 451 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> >>> index 18f1097f1d8d..4388bace6d70 100644
> >>> --- a/arch/riscv/include/asm/kvm_host.h
> >>> +++ b/arch/riscv/include/asm/kvm_host.h
> >>> @@ -53,6 +53,12 @@ struct kvm_arch {
> >>>        phys_addr_t pgd_phys;
> >>>    };
> >>>
> >>> +struct kvm_mmio_decode {
> >>> +     unsigned long insn;
> >>> +     int len;
> >>> +     int shift;
> >>> +};
> >>> +
> >>>    struct kvm_cpu_context {
> >>>        unsigned long zero;
> >>>        unsigned long ra;
> >>> @@ -141,6 +147,9 @@ struct kvm_vcpu_arch {
> >>>        unsigned long irqs_pending;
> >>>        unsigned long irqs_pending_mask;
> >>>
> >>> +     /* MMIO instruction details */
> >>> +     struct kvm_mmio_decode mmio_decode;
> >>> +
> >>>        /* VCPU power-off state */
> >>>        bool power_off;
> >>>
> >>> @@ -160,6 +169,8 @@ static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
> >>>    int kvm_riscv_setup_vsip(void);
> >>>    void kvm_riscv_cleanup_vsip(void);
> >>>
> >>> +int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
> >>> +                      bool is_write);
> >>>    void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu);
> >>>    int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm);
> >>>    void kvm_riscv_stage2_free_pgd(struct kvm *kvm);
> >>> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> >>> index 04dd089b86ff..2b965f9aac07 100644
> >>> --- a/arch/riscv/kvm/mmu.c
> >>> +++ b/arch/riscv/kvm/mmu.c
> >>> @@ -61,6 +61,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> >>>        return 0;
> >>>    }
> >>>
> >>> +int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
> >>> +                      bool is_write)
> >>> +{
> >>> +     /* TODO: */
> >>> +     return 0;
> >>> +}
> >>> +
> >>>    void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu)
> >>>    {
> >>>        /* TODO: */
> >>> diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
> >>> index e4d7c8f0807a..efc06198c259 100644
> >>> --- a/arch/riscv/kvm/vcpu_exit.c
> >>> +++ b/arch/riscv/kvm/vcpu_exit.c
> >>> @@ -6,9 +6,371 @@
> >>>     *     Anup Patel <anup.patel@wdc.com>
> >>>     */
> >>>
> >>> +#include <linux/bitops.h>
> >>>    #include <linux/errno.h>
> >>>    #include <linux/err.h>
> >>>    #include <linux/kvm_host.h>
> >>> +#include <asm/csr.h>
> >>> +
> >>> +#define INSN_MATCH_LB                0x3
> >>> +#define INSN_MASK_LB         0x707f
> >>> +#define INSN_MATCH_LH                0x1003
> >>> +#define INSN_MASK_LH         0x707f
> >>> +#define INSN_MATCH_LW                0x2003
> >>> +#define INSN_MASK_LW         0x707f
> >>> +#define INSN_MATCH_LD                0x3003
> >>> +#define INSN_MASK_LD         0x707f
> >>> +#define INSN_MATCH_LBU               0x4003
> >>> +#define INSN_MASK_LBU                0x707f
> >>> +#define INSN_MATCH_LHU               0x5003
> >>> +#define INSN_MASK_LHU                0x707f
> >>> +#define INSN_MATCH_LWU               0x6003
> >>> +#define INSN_MASK_LWU                0x707f
> >>> +#define INSN_MATCH_SB                0x23
> >>> +#define INSN_MASK_SB         0x707f
> >>> +#define INSN_MATCH_SH                0x1023
> >>> +#define INSN_MASK_SH         0x707f
> >>> +#define INSN_MATCH_SW                0x2023
> >>> +#define INSN_MASK_SW         0x707f
> >>> +#define INSN_MATCH_SD                0x3023
> >>> +#define INSN_MASK_SD         0x707f
> >>> +
> >>> +#define INSN_MATCH_C_LD              0x6000
> >>> +#define INSN_MASK_C_LD               0xe003
> >>> +#define INSN_MATCH_C_SD              0xe000
> >>> +#define INSN_MASK_C_SD               0xe003
> >>> +#define INSN_MATCH_C_LW              0x4000
> >>> +#define INSN_MASK_C_LW               0xe003
> >>> +#define INSN_MATCH_C_SW              0xc000
> >>> +#define INSN_MASK_C_SW               0xe003
> >>> +#define INSN_MATCH_C_LDSP    0x6002
> >>> +#define INSN_MASK_C_LDSP     0xe003
> >>> +#define INSN_MATCH_C_SDSP    0xe002
> >>> +#define INSN_MASK_C_SDSP     0xe003
> >>> +#define INSN_MATCH_C_LWSP    0x4002
> >>> +#define INSN_MASK_C_LWSP     0xe003
> >>> +#define INSN_MATCH_C_SWSP    0xc002
> >>> +#define INSN_MASK_C_SWSP     0xe003
> >>> +
> >>> +#define INSN_LEN(insn)               ((((insn) & 0x3) < 0x3) ? 2 : 4)
> >>> +
> >>> +#ifdef CONFIG_64BIT
> >>> +#define LOG_REGBYTES         3
> >>> +#else
> >>> +#define LOG_REGBYTES         2
> >>> +#endif
> >>> +#define REGBYTES             (1 << LOG_REGBYTES)
> >>> +
> >>> +#define SH_RD                        7
> >>> +#define SH_RS1                       15
> >>> +#define SH_RS2                       20
> >>> +#define SH_RS2C                      2
> >>> +
> >>> +#define RV_X(x, s, n)                (((x) >> (s)) & ((1 << (n)) - 1))
> >>> +#define RVC_LW_IMM(x)                ((RV_X(x, 6, 1) << 2) | \
> >>> +                              (RV_X(x, 10, 3) << 3) | \
> >>> +                              (RV_X(x, 5, 1) << 6))
> >>> +#define RVC_LD_IMM(x)                ((RV_X(x, 10, 3) << 3) | \
> >>> +                              (RV_X(x, 5, 2) << 6))
> >>> +#define RVC_LWSP_IMM(x)              ((RV_X(x, 4, 3) << 2) | \
> >>> +                              (RV_X(x, 12, 1) << 5) | \
> >>> +                              (RV_X(x, 2, 2) << 6))
> >>> +#define RVC_LDSP_IMM(x)              ((RV_X(x, 5, 2) << 3) | \
> >>> +                              (RV_X(x, 12, 1) << 5) | \
> >>> +                              (RV_X(x, 2, 3) << 6))
> >>> +#define RVC_SWSP_IMM(x)              ((RV_X(x, 9, 4) << 2) | \
> >>> +                              (RV_X(x, 7, 2) << 6))
> >>> +#define RVC_SDSP_IMM(x)              ((RV_X(x, 10, 3) << 3) | \
> >>> +                              (RV_X(x, 7, 3) << 6))
> >>> +#define RVC_RS1S(insn)               (8 + RV_X(insn, SH_RD, 3))
> >>> +#define RVC_RS2S(insn)               (8 + RV_X(insn, SH_RS2C, 3))
> >>> +#define RVC_RS2(insn)                RV_X(insn, SH_RS2C, 5)
> >>> +
> >>> +#define SHIFT_RIGHT(x, y)            \
> >>> +     ((y) < 0 ? ((x) << -(y)) : ((x) >> (y)))
> >>> +
> >>> +#define REG_MASK                     \
> >>> +     ((1 << (5 + LOG_REGBYTES)) - (1 << LOG_REGBYTES))
> >>> +
> >>> +#define REG_OFFSET(insn, pos)                \
> >>> +     (SHIFT_RIGHT((insn), (pos) - LOG_REGBYTES) & REG_MASK)
> >>> +
> >>> +#define REG_PTR(insn, pos, regs)     \
> >>> +     (ulong *)((ulong)(regs) + REG_OFFSET(insn, pos))
> >>> +
> >>> +#define GET_RM(insn)         (((insn) >> 12) & 7)
> >>> +
> >>> +#define GET_RS1(insn, regs)  (*REG_PTR(insn, SH_RS1, regs))
> >>> +#define GET_RS2(insn, regs)  (*REG_PTR(insn, SH_RS2, regs))
> >>> +#define GET_RS1S(insn, regs) (*REG_PTR(RVC_RS1S(insn), 0, regs))
> >>> +#define GET_RS2S(insn, regs) (*REG_PTR(RVC_RS2S(insn), 0, regs))
> >>> +#define GET_RS2C(insn, regs) (*REG_PTR(insn, SH_RS2C, regs))
> >>> +#define GET_SP(regs)         (*REG_PTR(2, 0, regs))
> >>> +#define SET_RD(insn, regs, val)      (*REG_PTR(insn, SH_RD, regs) = (val))
> >>> +#define IMM_I(insn)          ((s32)(insn) >> 20)
> >>> +#define IMM_S(insn)          (((s32)(insn) >> 25 << 5) | \
> >>> +                              (s32)(((insn) >> 7) & 0x1f))
> >>> +#define MASK_FUNCT3          0x7000
> >>> +
> >>> +#define STR(x)                       XSTR(x)
> >>> +#define XSTR(x)                      #x
> >>> +
> >>> +/* TODO: Handle traps due to unpriv load and redirect it back to VS-mode */
> >>> +static ulong get_insn(struct kvm_vcpu *vcpu)
> >>> +{
> >>> +     ulong __sepc = vcpu->arch.guest_context.sepc;
> >>> +     ulong __hstatus, __sstatus, __vsstatus;
> >>> +#ifdef CONFIG_RISCV_ISA_C
> >>> +     ulong rvc_mask = 3, tmp;
> >>> +#endif
> >>> +     ulong flags, val;
> >>> +
> >>> +     local_irq_save(flags);
> >>> +
> >>> +     __vsstatus = csr_read(CSR_VSSTATUS);
> >>> +     __sstatus = csr_read(CSR_SSTATUS);
> >>> +     __hstatus = csr_read(CSR_HSTATUS);
> >>> +
> >>> +     csr_write(CSR_VSSTATUS, __vsstatus | SR_MXR);
> >>> +     csr_write(CSR_SSTATUS, vcpu->arch.guest_context.sstatus | SR_MXR);
> >>> +     csr_write(CSR_HSTATUS, vcpu->arch.guest_context.hstatus | HSTATUS_SPRV);
> >>
> >> What happens when the insn load triggers a page fault, maybe because the
> >> guest was malicious and did
> >>
> >>     1) Run on page 0x1000
> >>     2) Remove map for 0x1000, do *not* flush TLB
> >>     3) Trigger MMIO
> >>
> >> That would DOS the host here, as the host kernel would continue running
> >> in guest address space, right?
> >
> > Yes, we can certainly fault while accessing Guest instruction. We will
> > be fixing this issue in a followup series. We have mentioned this in cover
> > letter as well.
>
> I don't think the cover letter is the right place for such a comment.
> Please definitely put it into the code as well, pointing out that this
> is a known bug. Or even better yet: Fix it up properly :).
>
> In fact, with a bug that dramatic, I'm not even sure we can safely
> include the code. We're consciously allowing user space to DOS the kernel.

There is already a TODO comment above get_insn() function.

>
> >
> > BTW, RISC-V spec is going to further improve to provide easy
> > access of faulting instruction to Hypervisor.
> > (Refer, https://github.com/riscv/riscv-isa-manual/issues/431)
>
> Yes, we have similar extensions on other archs. Is this going to be an
> optional addition or a mandatory bit of the hypervisor spec? If it's not
> mandatory, we can not rely on it, so the current path has to be safe.

Yes, it's going to be optional so we are certainly going to fix this issue
here.

This issue discussed in previous patch reviews. We have already
agreed to fix this in next revision.

Regards,
Anup

>
>
> Alex

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 13/20] RISC-V: KVM: Implement stage2 page table programming
  2019-08-22 13:27       ` Alexander Graf
@ 2019-08-22 13:58         ` Anup Patel
  2019-08-22 14:09           ` Alexander Graf
  0 siblings, 1 reply; 61+ messages in thread
From: Anup Patel @ 2019-08-22 13:58 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Daniel Lezcano, Thomas Gleixner, Atish Patra,
	Alistair Francis, Damien Le Moal, Christoph Hellwig, kvm,
	linux-riscv, linux-kernel

On Thu, Aug 22, 2019 at 6:57 PM Alexander Graf <graf@amazon.com> wrote:
>
>
>
> On 22.08.19 14:38, Anup Patel wrote:
> > On Thu, Aug 22, 2019 at 5:58 PM Alexander Graf <graf@amazon.com> wrote:
> >>
> >> On 22.08.19 10:45, Anup Patel wrote:
> >>> This patch implements all required functions for programming
> >>> the stage2 page table for each Guest/VM.
> >>>
> >>> At high-level, the flow of stage2 related functions is similar
> >>> from KVM ARM/ARM64 implementation but the stage2 page table
> >>> format is quite different for KVM RISC-V.
> >>>
> >>> Signed-off-by: Anup Patel <anup.patel@wdc.com>
> >>> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> >>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> >>> ---
> >>>    arch/riscv/include/asm/kvm_host.h     |  10 +
> >>>    arch/riscv/include/asm/pgtable-bits.h |   1 +
> >>>    arch/riscv/kvm/mmu.c                  | 637 +++++++++++++++++++++++++-
> >>>    3 files changed, 638 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> >>> index 3b09158f80f2..a37775c92586 100644
> >>> --- a/arch/riscv/include/asm/kvm_host.h
> >>> +++ b/arch/riscv/include/asm/kvm_host.h
> >>> @@ -72,6 +72,13 @@ struct kvm_mmio_decode {
> >>>        int shift;
> >>>    };
> >>>
> >>> +#define KVM_MMU_PAGE_CACHE_NR_OBJS   32
> >>> +
> >>> +struct kvm_mmu_page_cache {
> >>> +     int nobjs;
> >>> +     void *objects[KVM_MMU_PAGE_CACHE_NR_OBJS];
> >>> +};
> >>> +
> >>>    struct kvm_cpu_context {
> >>>        unsigned long zero;
> >>>        unsigned long ra;
> >>> @@ -163,6 +170,9 @@ struct kvm_vcpu_arch {
> >>>        /* MMIO instruction details */
> >>>        struct kvm_mmio_decode mmio_decode;
> >>>
> >>> +     /* Cache pages needed to program page tables with spinlock held */
> >>> +     struct kvm_mmu_page_cache mmu_page_cache;
> >>> +
> >>>        /* VCPU power-off state */
> >>>        bool power_off;
> >>>
> >>> diff --git a/arch/riscv/include/asm/pgtable-bits.h b/arch/riscv/include/asm/pgtable-bits.h
> >>> index bbaeb5d35842..be49d62fcc2b 100644
> >>> --- a/arch/riscv/include/asm/pgtable-bits.h
> >>> +++ b/arch/riscv/include/asm/pgtable-bits.h
> >>> @@ -26,6 +26,7 @@
> >>>
> >>>    #define _PAGE_SPECIAL   _PAGE_SOFT
> >>>    #define _PAGE_TABLE     _PAGE_PRESENT
> >>> +#define _PAGE_LEAF      (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
> >>>
> >>>    /*
> >>>     * _PAGE_PROT_NONE is set on not-present pages (and ignored by the hardware) to
> >>> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> >>> index 2b965f9aac07..9e95ab6769f6 100644
> >>> --- a/arch/riscv/kvm/mmu.c
> >>> +++ b/arch/riscv/kvm/mmu.c
> >>> @@ -18,6 +18,432 @@
> >>>    #include <asm/page.h>
> >>>    #include <asm/pgtable.h>
> >>>
> >>> +#ifdef CONFIG_64BIT
> >>> +#define stage2_have_pmd              true
> >>> +#define stage2_gpa_size              ((phys_addr_t)(1ULL << 39))
> >>> +#define stage2_cache_min_pages       2
> >>> +#else
> >>> +#define pmd_index(x)         0
> >>> +#define pfn_pmd(x, y)                ({ pmd_t __x = { 0 }; __x; })
> >>> +#define stage2_have_pmd              false
> >>> +#define stage2_gpa_size              ((phys_addr_t)(1ULL << 32))
> >>> +#define stage2_cache_min_pages       1
> >>> +#endif
> >>> +
> >>> +static int stage2_cache_topup(struct kvm_mmu_page_cache *pcache,
> >>> +                           int min, int max)
> >>> +{
> >>> +     void *page;
> >>> +
> >>> +     BUG_ON(max > KVM_MMU_PAGE_CACHE_NR_OBJS);
> >>> +     if (pcache->nobjs >= min)
> >>> +             return 0;
> >>> +     while (pcache->nobjs < max) {
> >>> +             page = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
> >>> +             if (!page)
> >>> +                     return -ENOMEM;
> >>> +             pcache->objects[pcache->nobjs++] = page;
> >>> +     }
> >>> +
> >>> +     return 0;
> >>> +}
> >>> +
> >>> +static void stage2_cache_flush(struct kvm_mmu_page_cache *pcache)
> >>> +{
> >>> +     while (pcache && pcache->nobjs)
> >>> +             free_page((unsigned long)pcache->objects[--pcache->nobjs]);
> >>> +}
> >>> +
> >>> +static void *stage2_cache_alloc(struct kvm_mmu_page_cache *pcache)
> >>> +{
> >>> +     void *p;
> >>> +
> >>> +     if (!pcache)
> >>> +             return NULL;
> >>> +
> >>> +     BUG_ON(!pcache->nobjs);
> >>> +     p = pcache->objects[--pcache->nobjs];
> >>> +
> >>> +     return p;
> >>> +}
> >>> +
> >>> +struct local_guest_tlb_info {
> >>> +     struct kvm_vmid *vmid;
> >>> +     gpa_t addr;
> >>> +};
> >>> +
> >>> +static void local_guest_tlb_flush_vmid_gpa(void *info)
> >>> +{
> >>> +     struct local_guest_tlb_info *infop = info;
> >>> +
> >>> +     __kvm_riscv_hfence_gvma_vmid_gpa(READ_ONCE(infop->vmid->vmid_version),
> >>> +                                      infop->addr);
> >>> +}
> >>> +
> >>> +static void stage2_remote_tlb_flush(struct kvm *kvm, gpa_t addr)
> >>> +{
> >>> +     struct local_guest_tlb_info info;
> >>> +     struct kvm_vmid *vmid = &kvm->arch.vmid;
> >>> +
> >>> +     /* TODO: This should be SBI call */
> >>> +     info.vmid = vmid;
> >>> +     info.addr = addr;
> >>> +     preempt_disable();
> >>> +     smp_call_function_many(cpu_all_mask, local_guest_tlb_flush_vmid_gpa,
> >>> +                            &info, true);
> >>
> >> This is all nice and dandy on the toy 4 core systems we have today, but
> >> it will become a bottleneck further down the road.
> >>
> >> How many VMIDs do you have? Could you just allocate a new one every time
> >> you switch host CPUs? Then you know exactly which CPUs to flush by
> >> looking at all your vcpu structs and a local field that tells you which
> >> pCPU they're on at this moment.
> >>
> >> Either way, it's nothing that should block inclusion. For today, we're fine.
> >
> > We are not happy about this either.
> >
> > Other two options, we have are:
> > 1. Have SBI calls for remote HFENCEs
> > 2. Propose RISC-V ISA extension for remote FENCEs
> >
> > Option1 is mostly extending SBI spec and implementing it in runtime
> > firmware.
> >
> > Option2 is ideal solution but requires consensus among wider audience
> > in RISC-V foundation.
> >
> > At this point, we are fine with a simple solution.
>
> It's fine to explicitly IPI other CPUs to flush their TLBs. What is not
> fine is to IPI *all* CPUs to flush their TLBs.

Ahh, this should have been cpu_online_mask instead of cpu_all_mask

I will update this in next revision.

Regards,
Anup

>
>
> Alex

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 08/20] RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls
  2019-08-22 12:01   ` Alexander Graf
@ 2019-08-22 14:00     ` Anup Patel
  2019-08-22 14:12       ` Alexander Graf
  2019-08-22 14:05     ` Anup Patel
  1 sibling, 1 reply; 61+ messages in thread
From: Anup Patel @ 2019-08-22 14:00 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Daniel Lezcano, Thomas Gleixner, Atish Patra,
	Alistair Francis, Damien Le Moal, Christoph Hellwig, kvm,
	linux-riscv, linux-kernel

On Thu, Aug 22, 2019 at 5:31 PM Alexander Graf <graf@amazon.com> wrote:
>
> On 22.08.19 10:44, Anup Patel wrote:
> > For KVM RISC-V, we use KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls to access
> > VCPU config and registers from user-space.
> >
> > We have three types of VCPU registers:
> > 1. CONFIG - these are VCPU config and capabilities
> > 2. CORE   - these are VCPU general purpose registers
> > 3. CSR    - these are VCPU control and status registers
> >
> > The CONFIG registers available to user-space are ISA and TIMEBASE. Out
> > of these, TIMEBASE is a read-only register which inform user-space about
> > VCPU timer base frequency. The ISA register is a read and write register
> > where user-space can only write the desired VCPU ISA capabilities before
> > running the VCPU.
> >
> > The CORE registers available to user-space are PC, RA, SP, GP, TP, A0-A7,
> > T0-T6, S0-S11 and MODE. Most of these are RISC-V general registers except
> > PC and MODE. The PC register represents program counter whereas the MODE
> > register represent VCPU privilege mode (i.e. S/U-mode).
> >
> > The CSRs available to user-space are SSTATUS, SIE, STVEC, SSCRATCH, SEPC,
> > SCAUSE, STVAL, SIP, and SATP. All of these are read/write registers.
> >
> > In future, more VCPU register types will be added (such as FP) for the
> > KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls.
> >
> > Signed-off-by: Anup Patel <anup.patel@wdc.com>
> > Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >   arch/riscv/include/uapi/asm/kvm.h |  40 ++++-
> >   arch/riscv/kvm/vcpu.c             | 235 +++++++++++++++++++++++++++++-
> >   2 files changed, 272 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
> > index 6dbc056d58ba..024f220eb17e 100644
> > --- a/arch/riscv/include/uapi/asm/kvm.h
> > +++ b/arch/riscv/include/uapi/asm/kvm.h
> > @@ -23,8 +23,15 @@
> >
> >   /* for KVM_GET_REGS and KVM_SET_REGS */
> >   struct kvm_regs {
> > +     /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
> > +     struct user_regs_struct regs;
> > +     unsigned long mode;
>
> Is there any particular reason you're reusing kvm_regs and don't invent
> your own struct? kvm_regs is explicitly meant for the get_regs and
> set_regs ioctls.

We are implementing only ONE_REG interface so most of these
structs are unused hence we tried to reuse these struct instead
of introducing new structs. (Similar to KVM ARM64)

>
> >   };
> >
> > +/* Possible privilege modes for kvm_regs */
> > +#define KVM_RISCV_MODE_S     1
> > +#define KVM_RISCV_MODE_U     0
> > +
> >   /* for KVM_GET_FPU and KVM_SET_FPU */
> >   struct kvm_fpu {
> >   };
> > @@ -41,10 +48,41 @@ struct kvm_guest_debug_arch {
> >   struct kvm_sync_regs {
> >   };
> >
> > -/* dummy definition */
> > +/* for KVM_GET_SREGS and KVM_SET_SREGS */
> >   struct kvm_sregs {
> > +     unsigned long sstatus;
> > +     unsigned long sie;
> > +     unsigned long stvec;
> > +     unsigned long sscratch;
> > +     unsigned long sepc;
> > +     unsigned long scause;
> > +     unsigned long stval;
> > +     unsigned long sip;
> > +     unsigned long satp;
>
> Same comment here.

Same as above, we are trying to use unused struct.

>
> >   };
> >
> > +#define KVM_REG_SIZE(id)             \
> > +     (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
> > +
> > +/* If you need to interpret the index values, here is the key: */
> > +#define KVM_REG_RISCV_TYPE_MASK              0x00000000FF000000
> > +#define KVM_REG_RISCV_TYPE_SHIFT     24
> > +
> > +/* Config registers are mapped as type 1 */
> > +#define KVM_REG_RISCV_CONFIG         (0x01 << KVM_REG_RISCV_TYPE_SHIFT)
> > +#define KVM_REG_RISCV_CONFIG_ISA     0x0
> > +#define KVM_REG_RISCV_CONFIG_TIMEBASE        0x1
> > +
> > +/* Core registers are mapped as type 2 */
> > +#define KVM_REG_RISCV_CORE           (0x02 << KVM_REG_RISCV_TYPE_SHIFT)
> > +#define KVM_REG_RISCV_CORE_REG(name) \
> > +             (offsetof(struct kvm_regs, name) / sizeof(unsigned long))
>
> I see, you're trying to implicitly use the struct offsets as index.
>
> I'm not a really big fan of it, but I can't pinpoint exactly why just
> yet. It just seems too magical (read: potentially breaking down the
> road) for me.
>
> > +
> > +/* Control and status registers are mapped as type 3 */
> > +#define KVM_REG_RISCV_CSR            (0x03 << KVM_REG_RISCV_TYPE_SHIFT)
> > +#define KVM_REG_RISCV_CSR_REG(name)  \
> > +             (offsetof(struct kvm_sregs, name) / sizeof(unsigned long))
> > +
> >   #endif
> >
> >   #endif /* __LINUX_KVM_RISCV_H */
> > diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
> > index 7f59e85c6af8..9396a83c0611 100644
> > --- a/arch/riscv/kvm/vcpu.c
> > +++ b/arch/riscv/kvm/vcpu.c
> > @@ -164,6 +164,215 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
> >       return VM_FAULT_SIGBUS;
> >   }
> >
> > +static int kvm_riscv_vcpu_get_reg_config(struct kvm_vcpu *vcpu,
> > +                                      const struct kvm_one_reg *reg)
> > +{
> > +     unsigned long __user *uaddr =
> > +                     (unsigned long __user *)(unsigned long)reg->addr;
> > +     unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
> > +                                         KVM_REG_SIZE_MASK |
> > +                                         KVM_REG_RISCV_CONFIG);
> > +     unsigned long reg_val;
> > +
> > +     if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
> > +             return -EINVAL;
> > +
> > +     switch (reg_num) {
> > +     case KVM_REG_RISCV_CONFIG_ISA:
> > +             reg_val = vcpu->arch.isa;
> > +             break;
> > +     case KVM_REG_RISCV_CONFIG_TIMEBASE:
> > +             reg_val = riscv_timebase;
>
> What does this reflect? The current guest time hopefully not? An offset?
> Related to what?

riscv_timebase is the frequency in HZ of the system timer.

The name "timebase" is not appropriate but we have been
carrying it since quite some time now.

>
> All ONE_REG registers should be documented in
> Documentation/virtual/kvm/api.txt. Please add them there.

Sure, I will update in next revision.

>
> > +             break;
> > +     default:
> > +             return -EINVAL;
> > +     };
> > +
> > +     if (copy_to_user(uaddr, &reg_val, KVM_REG_SIZE(reg->id)))
> > +             return -EFAULT;
> > +
> > +     return 0;
> > +}
> > +
> > +static int kvm_riscv_vcpu_set_reg_config(struct kvm_vcpu *vcpu,
> > +                                      const struct kvm_one_reg *reg)
> > +{
> > +     unsigned long __user *uaddr =
> > +                     (unsigned long __user *)(unsigned long)reg->addr;
> > +     unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
> > +                                         KVM_REG_SIZE_MASK |
> > +                                         KVM_REG_RISCV_CONFIG);
> > +     unsigned long reg_val;
> > +
> > +     if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
> > +             return -EINVAL;
> > +
> > +     if (copy_from_user(&reg_val, uaddr, KVM_REG_SIZE(reg->id)))
> > +             return -EFAULT;
> > +
> > +     switch (reg_num) {
> > +     case KVM_REG_RISCV_CONFIG_ISA:
> > +             if (!vcpu->arch.ran_atleast_once) {
> > +                     vcpu->arch.isa = reg_val;
> > +                     vcpu->arch.isa &= riscv_isa_extension_base(NULL);
> > +                     vcpu->arch.isa &= KVM_RISCV_ISA_ALLOWED;
>
> This register definitely needs proper documentation too ;). You may want
> to reconsider to put a few of the helper bits from patch 02/20 into
> uapi, so that user space can directly use them.

Sure, I will add details about ISA register in Documentation/virt/kvm/api.txt

Regards,
Anup


>
> > +             } else {
> > +                     return -ENOTSUPP;
> > +             }
> > +             break;
> > +     case KVM_REG_RISCV_CONFIG_TIMEBASE:
> > +             return -ENOTSUPP;
> > +     default:
> > +             return -EINVAL;
> > +     };
> > +
> > +     return 0;
> > +}
> > +
> > +static int kvm_riscv_vcpu_get_reg_core(struct kvm_vcpu *vcpu,
> > +                                    const struct kvm_one_reg *reg)
> > +{
> > +     struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
> > +     unsigned long __user *uaddr =
> > +                     (unsigned long __user *)(unsigned long)reg->addr;
> > +     unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
> > +                                         KVM_REG_SIZE_MASK |
> > +                                         KVM_REG_RISCV_CORE);
> > +     unsigned long reg_val;
> > +
> > +     if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
> > +             return -EINVAL;
> > +
> > +     if (reg_num == KVM_REG_RISCV_CORE_REG(regs.pc))
> > +             reg_val = cntx->sepc;
> > +     else if (KVM_REG_RISCV_CORE_REG(regs.pc) < reg_num &&
> > +              reg_num <= KVM_REG_RISCV_CORE_REG(regs.t6))
> > +             reg_val = ((unsigned long *)cntx)[reg_num];
> > +     else if (reg_num == KVM_REG_RISCV_CORE_REG(mode))
> > +             reg_val = (cntx->sstatus & SR_SPP) ?
> > +                             KVM_RISCV_MODE_S : KVM_RISCV_MODE_U;
> > +     else
> > +             return -EINVAL;
> > +
> > +     if (copy_to_user(uaddr, &reg_val, KVM_REG_SIZE(reg->id)))
> > +             return -EFAULT;
> > +
> > +     return 0;
> > +}
> > +
> > +static int kvm_riscv_vcpu_set_reg_core(struct kvm_vcpu *vcpu,
> > +                                    const struct kvm_one_reg *reg)
> > +{
> > +     struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
> > +     unsigned long __user *uaddr =
> > +                     (unsigned long __user *)(unsigned long)reg->addr;
> > +     unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
> > +                                         KVM_REG_SIZE_MASK |
> > +                                         KVM_REG_RISCV_CORE);
> > +     unsigned long reg_val;
> > +
> > +     if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
> > +             return -EINVAL;
> > +
> > +     if (copy_from_user(&reg_val, uaddr, KVM_REG_SIZE(reg->id)))
> > +             return -EFAULT;
> > +
> > +     if (reg_num == KVM_REG_RISCV_CORE_REG(regs.pc))
> > +             cntx->sepc = reg_val;
> > +     else if (KVM_REG_RISCV_CORE_REG(regs.pc) < reg_num &&
> > +              reg_num <= KVM_REG_RISCV_CORE_REG(regs.t6))
> > +             ((unsigned long *)cntx)[reg_num] = reg_val;
> > +     else if (reg_num == KVM_REG_RISCV_CORE_REG(mode)) {
> > +             if (reg_val == KVM_RISCV_MODE_S)
> > +                     cntx->sstatus |= SR_SPP;
> > +             else
> > +                     cntx->sstatus &= ~SR_SPP;
> > +     } else
> > +             return -EINVAL;
> > +
> > +     return 0;
> > +}
> > +
> > +static int kvm_riscv_vcpu_get_reg_csr(struct kvm_vcpu *vcpu,
> > +                                   const struct kvm_one_reg *reg)
> > +{
> > +     struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
> > +     unsigned long __user *uaddr =
> > +                     (unsigned long __user *)(unsigned long)reg->addr;
> > +     unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
> > +                                         KVM_REG_SIZE_MASK |
> > +                                         KVM_REG_RISCV_CSR);
> > +     unsigned long reg_val;
> > +
> > +     if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
> > +             return -EINVAL;
> > +     if (reg_num >= sizeof(struct kvm_sregs) / sizeof(unsigned long))
> > +             return -EINVAL;
> > +
> > +     if (reg_num == KVM_REG_RISCV_CSR_REG(sip))
> > +             kvm_riscv_vcpu_flush_interrupts(vcpu);
> > +
> > +     reg_val = ((unsigned long *)csr)[reg_num];
> > +
> > +     if (copy_to_user(uaddr, &reg_val, KVM_REG_SIZE(reg->id)))
> > +             return -EFAULT;
> > +
> > +     return 0;
> > +}
> > +
> > +static int kvm_riscv_vcpu_set_reg_csr(struct kvm_vcpu *vcpu,
> > +                                   const struct kvm_one_reg *reg)
> > +{
> > +     struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
> > +     unsigned long __user *uaddr =
> > +                     (unsigned long __user *)(unsigned long)reg->addr;
> > +     unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
> > +                                         KVM_REG_SIZE_MASK |
> > +                                         KVM_REG_RISCV_CSR);
> > +     unsigned long reg_val;
> > +
> > +     if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
> > +             return -EINVAL;
> > +     if (reg_num >= sizeof(struct kvm_sregs) / sizeof(unsigned long))
> > +             return -EINVAL;
> > +
> > +     if (copy_from_user(&reg_val, uaddr, KVM_REG_SIZE(reg->id)))
> > +             return -EFAULT;
> > +
> > +     ((unsigned long *)csr)[reg_num] = reg_val;
> > +
> > +     if (reg_num == KVM_REG_RISCV_CSR_REG(sip))
> > +             WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
>
> Why does writing SIP clear all pending interrupts?
>
>
> Alex

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 08/20] RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls
  2019-08-22 12:01   ` Alexander Graf
  2019-08-22 14:00     ` Anup Patel
@ 2019-08-22 14:05     ` Anup Patel
  1 sibling, 0 replies; 61+ messages in thread
From: Anup Patel @ 2019-08-22 14:05 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Daniel Lezcano, Thomas Gleixner, Atish Patra,
	Alistair Francis, Damien Le Moal, Christoph Hellwig, kvm,
	linux-riscv, linux-kernel

On Thu, Aug 22, 2019 at 5:31 PM Alexander Graf <graf@amazon.com> wrote:
>
> On 22.08.19 10:44, Anup Patel wrote:
> > For KVM RISC-V, we use KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls to access
> > VCPU config and registers from user-space.
> >
> > We have three types of VCPU registers:
> > 1. CONFIG - these are VCPU config and capabilities
> > 2. CORE   - these are VCPU general purpose registers
> > 3. CSR    - these are VCPU control and status registers
> >
> > The CONFIG registers available to user-space are ISA and TIMEBASE. Out
> > of these, TIMEBASE is a read-only register which inform user-space about
> > VCPU timer base frequency. The ISA register is a read and write register
> > where user-space can only write the desired VCPU ISA capabilities before
> > running the VCPU.
> >
> > The CORE registers available to user-space are PC, RA, SP, GP, TP, A0-A7,
> > T0-T6, S0-S11 and MODE. Most of these are RISC-V general registers except
> > PC and MODE. The PC register represents program counter whereas the MODE
> > register represent VCPU privilege mode (i.e. S/U-mode).
> >
> > The CSRs available to user-space are SSTATUS, SIE, STVEC, SSCRATCH, SEPC,
> > SCAUSE, STVAL, SIP, and SATP. All of these are read/write registers.
> >
> > In future, more VCPU register types will be added (such as FP) for the
> > KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls.
> >
> > Signed-off-by: Anup Patel <anup.patel@wdc.com>
> > Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >   arch/riscv/include/uapi/asm/kvm.h |  40 ++++-
> >   arch/riscv/kvm/vcpu.c             | 235 +++++++++++++++++++++++++++++-
> >   2 files changed, 272 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
> > index 6dbc056d58ba..024f220eb17e 100644
> > --- a/arch/riscv/include/uapi/asm/kvm.h
> > +++ b/arch/riscv/include/uapi/asm/kvm.h
> > @@ -23,8 +23,15 @@
> >
> >   /* for KVM_GET_REGS and KVM_SET_REGS */
> >   struct kvm_regs {
> > +     /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
> > +     struct user_regs_struct regs;
> > +     unsigned long mode;
>
> Is there any particular reason you're reusing kvm_regs and don't invent
> your own struct? kvm_regs is explicitly meant for the get_regs and
> set_regs ioctls.
>
> >   };
> >
> > +/* Possible privilege modes for kvm_regs */
> > +#define KVM_RISCV_MODE_S     1
> > +#define KVM_RISCV_MODE_U     0
> > +
> >   /* for KVM_GET_FPU and KVM_SET_FPU */
> >   struct kvm_fpu {
> >   };
> > @@ -41,10 +48,41 @@ struct kvm_guest_debug_arch {
> >   struct kvm_sync_regs {
> >   };
> >
> > -/* dummy definition */
> > +/* for KVM_GET_SREGS and KVM_SET_SREGS */
> >   struct kvm_sregs {
> > +     unsigned long sstatus;
> > +     unsigned long sie;
> > +     unsigned long stvec;
> > +     unsigned long sscratch;
> > +     unsigned long sepc;
> > +     unsigned long scause;
> > +     unsigned long stval;
> > +     unsigned long sip;
> > +     unsigned long satp;
>
> Same comment here.
>
> >   };
> >
> > +#define KVM_REG_SIZE(id)             \
> > +     (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
> > +
> > +/* If you need to interpret the index values, here is the key: */
> > +#define KVM_REG_RISCV_TYPE_MASK              0x00000000FF000000
> > +#define KVM_REG_RISCV_TYPE_SHIFT     24
> > +
> > +/* Config registers are mapped as type 1 */
> > +#define KVM_REG_RISCV_CONFIG         (0x01 << KVM_REG_RISCV_TYPE_SHIFT)
> > +#define KVM_REG_RISCV_CONFIG_ISA     0x0
> > +#define KVM_REG_RISCV_CONFIG_TIMEBASE        0x1
> > +
> > +/* Core registers are mapped as type 2 */
> > +#define KVM_REG_RISCV_CORE           (0x02 << KVM_REG_RISCV_TYPE_SHIFT)
> > +#define KVM_REG_RISCV_CORE_REG(name) \
> > +             (offsetof(struct kvm_regs, name) / sizeof(unsigned long))
>
> I see, you're trying to implicitly use the struct offsets as index.
>
> I'm not a really big fan of it, but I can't pinpoint exactly why just
> yet. It just seems too magical (read: potentially breaking down the
> road) for me.
>
> > +
> > +/* Control and status registers are mapped as type 3 */
> > +#define KVM_REG_RISCV_CSR            (0x03 << KVM_REG_RISCV_TYPE_SHIFT)
> > +#define KVM_REG_RISCV_CSR_REG(name)  \
> > +             (offsetof(struct kvm_sregs, name) / sizeof(unsigned long))
> > +
> >   #endif
> >
> >   #endif /* __LINUX_KVM_RISCV_H */
> > diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
> > index 7f59e85c6af8..9396a83c0611 100644
> > --- a/arch/riscv/kvm/vcpu.c
> > +++ b/arch/riscv/kvm/vcpu.c
> > @@ -164,6 +164,215 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
> >       return VM_FAULT_SIGBUS;
> >   }
> >
> > +static int kvm_riscv_vcpu_get_reg_config(struct kvm_vcpu *vcpu,
> > +                                      const struct kvm_one_reg *reg)
> > +{
> > +     unsigned long __user *uaddr =
> > +                     (unsigned long __user *)(unsigned long)reg->addr;
> > +     unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
> > +                                         KVM_REG_SIZE_MASK |
> > +                                         KVM_REG_RISCV_CONFIG);
> > +     unsigned long reg_val;
> > +
> > +     if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
> > +             return -EINVAL;
> > +
> > +     switch (reg_num) {
> > +     case KVM_REG_RISCV_CONFIG_ISA:
> > +             reg_val = vcpu->arch.isa;
> > +             break;
> > +     case KVM_REG_RISCV_CONFIG_TIMEBASE:
> > +             reg_val = riscv_timebase;
>
> What does this reflect? The current guest time hopefully not? An offset?
> Related to what?
>
> All ONE_REG registers should be documented in
> Documentation/virtual/kvm/api.txt. Please add them there.
>
> > +             break;
> > +     default:
> > +             return -EINVAL;
> > +     };
> > +
> > +     if (copy_to_user(uaddr, &reg_val, KVM_REG_SIZE(reg->id)))
> > +             return -EFAULT;
> > +
> > +     return 0;
> > +}
> > +
> > +static int kvm_riscv_vcpu_set_reg_config(struct kvm_vcpu *vcpu,
> > +                                      const struct kvm_one_reg *reg)
> > +{
> > +     unsigned long __user *uaddr =
> > +                     (unsigned long __user *)(unsigned long)reg->addr;
> > +     unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
> > +                                         KVM_REG_SIZE_MASK |
> > +                                         KVM_REG_RISCV_CONFIG);
> > +     unsigned long reg_val;
> > +
> > +     if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
> > +             return -EINVAL;
> > +
> > +     if (copy_from_user(&reg_val, uaddr, KVM_REG_SIZE(reg->id)))
> > +             return -EFAULT;
> > +
> > +     switch (reg_num) {
> > +     case KVM_REG_RISCV_CONFIG_ISA:
> > +             if (!vcpu->arch.ran_atleast_once) {
> > +                     vcpu->arch.isa = reg_val;
> > +                     vcpu->arch.isa &= riscv_isa_extension_base(NULL);
> > +                     vcpu->arch.isa &= KVM_RISCV_ISA_ALLOWED;
>
> This register definitely needs proper documentation too ;). You may want
> to reconsider to put a few of the helper bits from patch 02/20 into
> uapi, so that user space can directly use them.
>
> > +             } else {
> > +                     return -ENOTSUPP;
> > +             }
> > +             break;
> > +     case KVM_REG_RISCV_CONFIG_TIMEBASE:
> > +             return -ENOTSUPP;
> > +     default:
> > +             return -EINVAL;
> > +     };
> > +
> > +     return 0;
> > +}
> > +
> > +static int kvm_riscv_vcpu_get_reg_core(struct kvm_vcpu *vcpu,
> > +                                    const struct kvm_one_reg *reg)
> > +{
> > +     struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
> > +     unsigned long __user *uaddr =
> > +                     (unsigned long __user *)(unsigned long)reg->addr;
> > +     unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
> > +                                         KVM_REG_SIZE_MASK |
> > +                                         KVM_REG_RISCV_CORE);
> > +     unsigned long reg_val;
> > +
> > +     if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
> > +             return -EINVAL;
> > +
> > +     if (reg_num == KVM_REG_RISCV_CORE_REG(regs.pc))
> > +             reg_val = cntx->sepc;
> > +     else if (KVM_REG_RISCV_CORE_REG(regs.pc) < reg_num &&
> > +              reg_num <= KVM_REG_RISCV_CORE_REG(regs.t6))
> > +             reg_val = ((unsigned long *)cntx)[reg_num];
> > +     else if (reg_num == KVM_REG_RISCV_CORE_REG(mode))
> > +             reg_val = (cntx->sstatus & SR_SPP) ?
> > +                             KVM_RISCV_MODE_S : KVM_RISCV_MODE_U;
> > +     else
> > +             return -EINVAL;
> > +
> > +     if (copy_to_user(uaddr, &reg_val, KVM_REG_SIZE(reg->id)))
> > +             return -EFAULT;
> > +
> > +     return 0;
> > +}
> > +
> > +static int kvm_riscv_vcpu_set_reg_core(struct kvm_vcpu *vcpu,
> > +                                    const struct kvm_one_reg *reg)
> > +{
> > +     struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
> > +     unsigned long __user *uaddr =
> > +                     (unsigned long __user *)(unsigned long)reg->addr;
> > +     unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
> > +                                         KVM_REG_SIZE_MASK |
> > +                                         KVM_REG_RISCV_CORE);
> > +     unsigned long reg_val;
> > +
> > +     if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
> > +             return -EINVAL;
> > +
> > +     if (copy_from_user(&reg_val, uaddr, KVM_REG_SIZE(reg->id)))
> > +             return -EFAULT;
> > +
> > +     if (reg_num == KVM_REG_RISCV_CORE_REG(regs.pc))
> > +             cntx->sepc = reg_val;
> > +     else if (KVM_REG_RISCV_CORE_REG(regs.pc) < reg_num &&
> > +              reg_num <= KVM_REG_RISCV_CORE_REG(regs.t6))
> > +             ((unsigned long *)cntx)[reg_num] = reg_val;
> > +     else if (reg_num == KVM_REG_RISCV_CORE_REG(mode)) {
> > +             if (reg_val == KVM_RISCV_MODE_S)
> > +                     cntx->sstatus |= SR_SPP;
> > +             else
> > +                     cntx->sstatus &= ~SR_SPP;
> > +     } else
> > +             return -EINVAL;
> > +
> > +     return 0;
> > +}
> > +
> > +static int kvm_riscv_vcpu_get_reg_csr(struct kvm_vcpu *vcpu,
> > +                                   const struct kvm_one_reg *reg)
> > +{
> > +     struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
> > +     unsigned long __user *uaddr =
> > +                     (unsigned long __user *)(unsigned long)reg->addr;
> > +     unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
> > +                                         KVM_REG_SIZE_MASK |
> > +                                         KVM_REG_RISCV_CSR);
> > +     unsigned long reg_val;
> > +
> > +     if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
> > +             return -EINVAL;
> > +     if (reg_num >= sizeof(struct kvm_sregs) / sizeof(unsigned long))
> > +             return -EINVAL;
> > +
> > +     if (reg_num == KVM_REG_RISCV_CSR_REG(sip))
> > +             kvm_riscv_vcpu_flush_interrupts(vcpu);
> > +
> > +     reg_val = ((unsigned long *)csr)[reg_num];
> > +
> > +     if (copy_to_user(uaddr, &reg_val, KVM_REG_SIZE(reg->id)))
> > +             return -EFAULT;
> > +
> > +     return 0;
> > +}
> > +
> > +static int kvm_riscv_vcpu_set_reg_csr(struct kvm_vcpu *vcpu,
> > +                                   const struct kvm_one_reg *reg)
> > +{
> > +     struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
> > +     unsigned long __user *uaddr =
> > +                     (unsigned long __user *)(unsigned long)reg->addr;
> > +     unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
> > +                                         KVM_REG_SIZE_MASK |
> > +                                         KVM_REG_RISCV_CSR);
> > +     unsigned long reg_val;
> > +
> > +     if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
> > +             return -EINVAL;
> > +     if (reg_num >= sizeof(struct kvm_sregs) / sizeof(unsigned long))
> > +             return -EINVAL;
> > +
> > +     if (copy_from_user(&reg_val, uaddr, KVM_REG_SIZE(reg->id)))
> > +             return -EFAULT;
> > +
> > +     ((unsigned long *)csr)[reg_num] = reg_val;
> > +
> > +     if (reg_num == KVM_REG_RISCV_CSR_REG(sip))
> > +             WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
>
> Why does writing SIP clear all pending interrupts?

irqs_pending_mask represents bits changes in irqs_pending.

Once the SIP CSR is updated by user-space, the changes to
irqs_pending are no longer valid so we clear irqs_pending_mask.

If we don't clear irqs_pending_mask then value programmed by
user-space can get overwritten if there were interrupts after
we saved SIP CSR and before we restored it.

Regards,
Anup

>
>
> Alex

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 13/20] RISC-V: KVM: Implement stage2 page table programming
  2019-08-22 13:58         ` Anup Patel
@ 2019-08-22 14:09           ` Alexander Graf
  2019-08-23 11:21             ` Anup Patel
  0 siblings, 1 reply; 61+ messages in thread
From: Alexander Graf @ 2019-08-22 14:09 UTC (permalink / raw)
  To: Anup Patel
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Daniel Lezcano, Thomas Gleixner, Atish Patra,
	Alistair Francis, Damien Le Moal, Christoph Hellwig, kvm,
	linux-riscv, linux-kernel



On 22.08.19 15:58, Anup Patel wrote:
> On Thu, Aug 22, 2019 at 6:57 PM Alexander Graf <graf@amazon.com> wrote:
>>
>>
>>
>> On 22.08.19 14:38, Anup Patel wrote:
>>> On Thu, Aug 22, 2019 at 5:58 PM Alexander Graf <graf@amazon.com> wrote:
>>>>
>>>> On 22.08.19 10:45, Anup Patel wrote:
>>>>> This patch implements all required functions for programming
>>>>> the stage2 page table for each Guest/VM.
>>>>>
>>>>> At high-level, the flow of stage2 related functions is similar
>>>>> from KVM ARM/ARM64 implementation but the stage2 page table
>>>>> format is quite different for KVM RISC-V.
>>>>>
>>>>> Signed-off-by: Anup Patel <anup.patel@wdc.com>
>>>>> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
>>>>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>>>>> ---
>>>>>     arch/riscv/include/asm/kvm_host.h     |  10 +
>>>>>     arch/riscv/include/asm/pgtable-bits.h |   1 +
>>>>>     arch/riscv/kvm/mmu.c                  | 637 +++++++++++++++++++++++++-
>>>>>     3 files changed, 638 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
>>>>> index 3b09158f80f2..a37775c92586 100644
>>>>> --- a/arch/riscv/include/asm/kvm_host.h
>>>>> +++ b/arch/riscv/include/asm/kvm_host.h
>>>>> @@ -72,6 +72,13 @@ struct kvm_mmio_decode {
>>>>>         int shift;
>>>>>     };
>>>>>
>>>>> +#define KVM_MMU_PAGE_CACHE_NR_OBJS   32
>>>>> +
>>>>> +struct kvm_mmu_page_cache {
>>>>> +     int nobjs;
>>>>> +     void *objects[KVM_MMU_PAGE_CACHE_NR_OBJS];
>>>>> +};
>>>>> +
>>>>>     struct kvm_cpu_context {
>>>>>         unsigned long zero;
>>>>>         unsigned long ra;
>>>>> @@ -163,6 +170,9 @@ struct kvm_vcpu_arch {
>>>>>         /* MMIO instruction details */
>>>>>         struct kvm_mmio_decode mmio_decode;
>>>>>
>>>>> +     /* Cache pages needed to program page tables with spinlock held */
>>>>> +     struct kvm_mmu_page_cache mmu_page_cache;
>>>>> +
>>>>>         /* VCPU power-off state */
>>>>>         bool power_off;
>>>>>
>>>>> diff --git a/arch/riscv/include/asm/pgtable-bits.h b/arch/riscv/include/asm/pgtable-bits.h
>>>>> index bbaeb5d35842..be49d62fcc2b 100644
>>>>> --- a/arch/riscv/include/asm/pgtable-bits.h
>>>>> +++ b/arch/riscv/include/asm/pgtable-bits.h
>>>>> @@ -26,6 +26,7 @@
>>>>>
>>>>>     #define _PAGE_SPECIAL   _PAGE_SOFT
>>>>>     #define _PAGE_TABLE     _PAGE_PRESENT
>>>>> +#define _PAGE_LEAF      (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
>>>>>
>>>>>     /*
>>>>>      * _PAGE_PROT_NONE is set on not-present pages (and ignored by the hardware) to
>>>>> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
>>>>> index 2b965f9aac07..9e95ab6769f6 100644
>>>>> --- a/arch/riscv/kvm/mmu.c
>>>>> +++ b/arch/riscv/kvm/mmu.c
>>>>> @@ -18,6 +18,432 @@
>>>>>     #include <asm/page.h>
>>>>>     #include <asm/pgtable.h>
>>>>>
>>>>> +#ifdef CONFIG_64BIT
>>>>> +#define stage2_have_pmd              true
>>>>> +#define stage2_gpa_size              ((phys_addr_t)(1ULL << 39))
>>>>> +#define stage2_cache_min_pages       2
>>>>> +#else
>>>>> +#define pmd_index(x)         0
>>>>> +#define pfn_pmd(x, y)                ({ pmd_t __x = { 0 }; __x; })
>>>>> +#define stage2_have_pmd              false
>>>>> +#define stage2_gpa_size              ((phys_addr_t)(1ULL << 32))
>>>>> +#define stage2_cache_min_pages       1
>>>>> +#endif
>>>>> +
>>>>> +static int stage2_cache_topup(struct kvm_mmu_page_cache *pcache,
>>>>> +                           int min, int max)
>>>>> +{
>>>>> +     void *page;
>>>>> +
>>>>> +     BUG_ON(max > KVM_MMU_PAGE_CACHE_NR_OBJS);
>>>>> +     if (pcache->nobjs >= min)
>>>>> +             return 0;
>>>>> +     while (pcache->nobjs < max) {
>>>>> +             page = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
>>>>> +             if (!page)
>>>>> +                     return -ENOMEM;
>>>>> +             pcache->objects[pcache->nobjs++] = page;
>>>>> +     }
>>>>> +
>>>>> +     return 0;
>>>>> +}
>>>>> +
>>>>> +static void stage2_cache_flush(struct kvm_mmu_page_cache *pcache)
>>>>> +{
>>>>> +     while (pcache && pcache->nobjs)
>>>>> +             free_page((unsigned long)pcache->objects[--pcache->nobjs]);
>>>>> +}
>>>>> +
>>>>> +static void *stage2_cache_alloc(struct kvm_mmu_page_cache *pcache)
>>>>> +{
>>>>> +     void *p;
>>>>> +
>>>>> +     if (!pcache)
>>>>> +             return NULL;
>>>>> +
>>>>> +     BUG_ON(!pcache->nobjs);
>>>>> +     p = pcache->objects[--pcache->nobjs];
>>>>> +
>>>>> +     return p;
>>>>> +}
>>>>> +
>>>>> +struct local_guest_tlb_info {
>>>>> +     struct kvm_vmid *vmid;
>>>>> +     gpa_t addr;
>>>>> +};
>>>>> +
>>>>> +static void local_guest_tlb_flush_vmid_gpa(void *info)
>>>>> +{
>>>>> +     struct local_guest_tlb_info *infop = info;
>>>>> +
>>>>> +     __kvm_riscv_hfence_gvma_vmid_gpa(READ_ONCE(infop->vmid->vmid_version),
>>>>> +                                      infop->addr);
>>>>> +}
>>>>> +
>>>>> +static void stage2_remote_tlb_flush(struct kvm *kvm, gpa_t addr)
>>>>> +{
>>>>> +     struct local_guest_tlb_info info;
>>>>> +     struct kvm_vmid *vmid = &kvm->arch.vmid;
>>>>> +
>>>>> +     /* TODO: This should be SBI call */
>>>>> +     info.vmid = vmid;
>>>>> +     info.addr = addr;
>>>>> +     preempt_disable();
>>>>> +     smp_call_function_many(cpu_all_mask, local_guest_tlb_flush_vmid_gpa,
>>>>> +                            &info, true);
>>>>
>>>> This is all nice and dandy on the toy 4 core systems we have today, but
>>>> it will become a bottleneck further down the road.
>>>>
>>>> How many VMIDs do you have? Could you just allocate a new one every time
>>>> you switch host CPUs? Then you know exactly which CPUs to flush by
>>>> looking at all your vcpu structs and a local field that tells you which
>>>> pCPU they're on at this moment.
>>>>
>>>> Either way, it's nothing that should block inclusion. For today, we're fine.
>>>
>>> We are not happy about this either.
>>>
>>> Other two options, we have are:
>>> 1. Have SBI calls for remote HFENCEs
>>> 2. Propose RISC-V ISA extension for remote FENCEs
>>>
>>> Option1 is mostly extending SBI spec and implementing it in runtime
>>> firmware.
>>>
>>> Option2 is ideal solution but requires consensus among wider audience
>>> in RISC-V foundation.
>>>
>>> At this point, we are fine with a simple solution.
>>
>> It's fine to explicitly IPI other CPUs to flush their TLBs. What is not
>> fine is to IPI *all* CPUs to flush their TLBs.
> 
> Ahh, this should have been cpu_online_mask instead of cpu_all_mask
> 
> I will update this in next revision.

What I was trying to say is that you only want to flush currently 
running other vcpus and add a hint for all the others saying "please 
flush the next time you come up".

I think we had a mechanism for that somewhere in the EVENT magic.

But as I said, this is a performance optimization - that's something I'm 
happy to delay. Security and user space ABI are the bits I'm worried 
about at this stage.


Alex

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 08/20] RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls
  2019-08-22 14:00     ` Anup Patel
@ 2019-08-22 14:12       ` Alexander Graf
  2019-08-23 11:20         ` Anup Patel
  0 siblings, 1 reply; 61+ messages in thread
From: Alexander Graf @ 2019-08-22 14:12 UTC (permalink / raw)
  To: Anup Patel
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Daniel Lezcano, Thomas Gleixner, Atish Patra,
	Alistair Francis, Damien Le Moal, Christoph Hellwig, kvm,
	linux-riscv, linux-kernel



On 22.08.19 16:00, Anup Patel wrote:
> On Thu, Aug 22, 2019 at 5:31 PM Alexander Graf <graf@amazon.com> wrote:
>>
>> On 22.08.19 10:44, Anup Patel wrote:
>>> For KVM RISC-V, we use KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls to access
>>> VCPU config and registers from user-space.
>>>
>>> We have three types of VCPU registers:
>>> 1. CONFIG - these are VCPU config and capabilities
>>> 2. CORE   - these are VCPU general purpose registers
>>> 3. CSR    - these are VCPU control and status registers
>>>
>>> The CONFIG registers available to user-space are ISA and TIMEBASE. Out
>>> of these, TIMEBASE is a read-only register which inform user-space about
>>> VCPU timer base frequency. The ISA register is a read and write register
>>> where user-space can only write the desired VCPU ISA capabilities before
>>> running the VCPU.
>>>
>>> The CORE registers available to user-space are PC, RA, SP, GP, TP, A0-A7,
>>> T0-T6, S0-S11 and MODE. Most of these are RISC-V general registers except
>>> PC and MODE. The PC register represents program counter whereas the MODE
>>> register represent VCPU privilege mode (i.e. S/U-mode).
>>>
>>> The CSRs available to user-space are SSTATUS, SIE, STVEC, SSCRATCH, SEPC,
>>> SCAUSE, STVAL, SIP, and SATP. All of these are read/write registers.
>>>
>>> In future, more VCPU register types will be added (such as FP) for the
>>> KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls.
>>>
>>> Signed-off-by: Anup Patel <anup.patel@wdc.com>
>>> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
>>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>>    arch/riscv/include/uapi/asm/kvm.h |  40 ++++-
>>>    arch/riscv/kvm/vcpu.c             | 235 +++++++++++++++++++++++++++++-
>>>    2 files changed, 272 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
>>> index 6dbc056d58ba..024f220eb17e 100644
>>> --- a/arch/riscv/include/uapi/asm/kvm.h
>>> +++ b/arch/riscv/include/uapi/asm/kvm.h
>>> @@ -23,8 +23,15 @@
>>>
>>>    /* for KVM_GET_REGS and KVM_SET_REGS */
>>>    struct kvm_regs {
>>> +     /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
>>> +     struct user_regs_struct regs;
>>> +     unsigned long mode;
>>
>> Is there any particular reason you're reusing kvm_regs and don't invent
>> your own struct? kvm_regs is explicitly meant for the get_regs and
>> set_regs ioctls.
> 
> We are implementing only ONE_REG interface so most of these
> structs are unused hence we tried to reuse these struct instead
> of introducing new structs. (Similar to KVM ARM64)
> 
>>
>>>    };
>>>
>>> +/* Possible privilege modes for kvm_regs */
>>> +#define KVM_RISCV_MODE_S     1
>>> +#define KVM_RISCV_MODE_U     0
>>> +
>>>    /* for KVM_GET_FPU and KVM_SET_FPU */
>>>    struct kvm_fpu {
>>>    };
>>> @@ -41,10 +48,41 @@ struct kvm_guest_debug_arch {
>>>    struct kvm_sync_regs {
>>>    };
>>>
>>> -/* dummy definition */
>>> +/* for KVM_GET_SREGS and KVM_SET_SREGS */
>>>    struct kvm_sregs {
>>> +     unsigned long sstatus;
>>> +     unsigned long sie;
>>> +     unsigned long stvec;
>>> +     unsigned long sscratch;
>>> +     unsigned long sepc;
>>> +     unsigned long scause;
>>> +     unsigned long stval;
>>> +     unsigned long sip;
>>> +     unsigned long satp;
>>
>> Same comment here.
> 
> Same as above, we are trying to use unused struct.
> 
>>
>>>    };
>>>
>>> +#define KVM_REG_SIZE(id)             \
>>> +     (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
>>> +
>>> +/* If you need to interpret the index values, here is the key: */
>>> +#define KVM_REG_RISCV_TYPE_MASK              0x00000000FF000000
>>> +#define KVM_REG_RISCV_TYPE_SHIFT     24
>>> +
>>> +/* Config registers are mapped as type 1 */
>>> +#define KVM_REG_RISCV_CONFIG         (0x01 << KVM_REG_RISCV_TYPE_SHIFT)
>>> +#define KVM_REG_RISCV_CONFIG_ISA     0x0
>>> +#define KVM_REG_RISCV_CONFIG_TIMEBASE        0x1
>>> +
>>> +/* Core registers are mapped as type 2 */
>>> +#define KVM_REG_RISCV_CORE           (0x02 << KVM_REG_RISCV_TYPE_SHIFT)
>>> +#define KVM_REG_RISCV_CORE_REG(name) \
>>> +             (offsetof(struct kvm_regs, name) / sizeof(unsigned long))
>>
>> I see, you're trying to implicitly use the struct offsets as index.
>>
>> I'm not a really big fan of it, but I can't pinpoint exactly why just
>> yet. It just seems too magical (read: potentially breaking down the
>> road) for me.
>>
>>> +
>>> +/* Control and status registers are mapped as type 3 */
>>> +#define KVM_REG_RISCV_CSR            (0x03 << KVM_REG_RISCV_TYPE_SHIFT)
>>> +#define KVM_REG_RISCV_CSR_REG(name)  \
>>> +             (offsetof(struct kvm_sregs, name) / sizeof(unsigned long))
>>> +
>>>    #endif
>>>
>>>    #endif /* __LINUX_KVM_RISCV_H */
>>> diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
>>> index 7f59e85c6af8..9396a83c0611 100644
>>> --- a/arch/riscv/kvm/vcpu.c
>>> +++ b/arch/riscv/kvm/vcpu.c
>>> @@ -164,6 +164,215 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
>>>        return VM_FAULT_SIGBUS;
>>>    }
>>>
>>> +static int kvm_riscv_vcpu_get_reg_config(struct kvm_vcpu *vcpu,
>>> +                                      const struct kvm_one_reg *reg)
>>> +{
>>> +     unsigned long __user *uaddr =
>>> +                     (unsigned long __user *)(unsigned long)reg->addr;
>>> +     unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
>>> +                                         KVM_REG_SIZE_MASK |
>>> +                                         KVM_REG_RISCV_CONFIG);
>>> +     unsigned long reg_val;
>>> +
>>> +     if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
>>> +             return -EINVAL;
>>> +
>>> +     switch (reg_num) {
>>> +     case KVM_REG_RISCV_CONFIG_ISA:
>>> +             reg_val = vcpu->arch.isa;
>>> +             break;
>>> +     case KVM_REG_RISCV_CONFIG_TIMEBASE:
>>> +             reg_val = riscv_timebase;
>>
>> What does this reflect? The current guest time hopefully not? An offset?
>> Related to what?
> 
> riscv_timebase is the frequency in HZ of the system timer.
> 
> The name "timebase" is not appropriate but we have been
> carrying it since quite some time now.

What do you mean by "some time"? So far I only see a kernel internal 
variable named after it. That's dramatically different from something 
exposed via uapi.

Just name it tbfreq.

So if this is the frequency, where is the offset? You will need it on 
save/restore. If you're saying that's out of scope for now, that's fine 
with me too :).


Alex

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 15/20] RISC-V: KVM: Add timer functionality
  2019-08-22  8:46 ` [PATCH v5 15/20] RISC-V: KVM: Add timer functionality Anup Patel
@ 2019-08-23  7:52   ` Alexander Graf
  2019-08-23 11:04     ` Anup Patel
  0 siblings, 1 reply; 61+ messages in thread
From: Alexander Graf @ 2019-08-23  7:52 UTC (permalink / raw)
  To: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel

On 22.08.19 10:46, Anup Patel wrote:
> From: Atish Patra <atish.patra@wdc.com>
> 
> The RISC-V hypervisor specification doesn't have any virtual timer
> feature.
> 
> Due to this, the guest VCPU timer will be programmed via SBI calls.
> The host will use a separate hrtimer event for each guest VCPU to
> provide timer functionality. We inject a virtual timer interrupt to
> the guest VCPU whenever the guest VCPU hrtimer event expires.
> 
> The following features are not supported yet and will be added in
> future:
> 1. A time offset to adjust guest time from host time
> 2. A saved next event in guest vcpu for vm migration

Implementing these 2 bits right now should be trivial. Why wait?

> 
> Signed-off-by: Atish Patra <atish.patra@wdc.com>
> Signed-off-by: Anup Patel <anup.patel@wdc.com>
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   arch/riscv/include/asm/kvm_host.h       |   4 +
>   arch/riscv/include/asm/kvm_vcpu_timer.h |  32 +++++++
>   arch/riscv/kvm/Makefile                 |   2 +-
>   arch/riscv/kvm/vcpu.c                   |   6 ++
>   arch/riscv/kvm/vcpu_timer.c             | 106 ++++++++++++++++++++++++
>   drivers/clocksource/timer-riscv.c       |   8 ++
>   include/clocksource/timer-riscv.h       |  16 ++++
>   7 files changed, 173 insertions(+), 1 deletion(-)
>   create mode 100644 arch/riscv/include/asm/kvm_vcpu_timer.h
>   create mode 100644 arch/riscv/kvm/vcpu_timer.c
>   create mode 100644 include/clocksource/timer-riscv.h
> 
> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> index ab33e59a3d88..d2a2e45eefc0 100644
> --- a/arch/riscv/include/asm/kvm_host.h
> +++ b/arch/riscv/include/asm/kvm_host.h
> @@ -12,6 +12,7 @@
>   #include <linux/types.h>
>   #include <linux/kvm.h>
>   #include <linux/kvm_types.h>
> +#include <asm/kvm_vcpu_timer.h>
>   
>   #ifdef CONFIG_64BIT
>   #define KVM_MAX_VCPUS			(1U << 16)
> @@ -167,6 +168,9 @@ struct kvm_vcpu_arch {
>   	unsigned long irqs_pending;
>   	unsigned long irqs_pending_mask;
>   
> +	/* VCPU Timer */
> +	struct kvm_vcpu_timer timer;
> +
>   	/* MMIO instruction details */
>   	struct kvm_mmio_decode mmio_decode;
>   
> diff --git a/arch/riscv/include/asm/kvm_vcpu_timer.h b/arch/riscv/include/asm/kvm_vcpu_timer.h
> new file mode 100644
> index 000000000000..df67ea86988e
> --- /dev/null
> +++ b/arch/riscv/include/asm/kvm_vcpu_timer.h
> @@ -0,0 +1,32 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2019 Western Digital Corporation or its affiliates.
> + *
> + * Authors:
> + *	Atish Patra <atish.patra@wdc.com>
> + */
> +
> +#ifndef __KVM_VCPU_RISCV_TIMER_H
> +#define __KVM_VCPU_RISCV_TIMER_H
> +
> +#include <linux/hrtimer.h>
> +
> +#define VCPU_TIMER_PROGRAM_THRESHOLD_NS		1000
> +
> +struct kvm_vcpu_timer {
> +	bool init_done;
> +	/* Check if the timer is programmed */
> +	bool is_set;
> +	struct hrtimer hrt;
> +	/* Mult & Shift values to get nanosec from cycles */
> +	u32 mult;
> +	u32 shift;
> +};
> +
> +int kvm_riscv_vcpu_timer_init(struct kvm_vcpu *vcpu);
> +int kvm_riscv_vcpu_timer_deinit(struct kvm_vcpu *vcpu);
> +int kvm_riscv_vcpu_timer_reset(struct kvm_vcpu *vcpu);
> +int kvm_riscv_vcpu_timer_next_event(struct kvm_vcpu *vcpu,
> +				    unsigned long ncycles);

This function never gets called?

> +
> +#endif
> diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
> index c0f57f26c13d..3e0c7558320d 100644
> --- a/arch/riscv/kvm/Makefile
> +++ b/arch/riscv/kvm/Makefile
> @@ -9,6 +9,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
>   kvm-objs := $(common-objs-y)
>   
>   kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
> -kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o
> +kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o
>   
>   obj-$(CONFIG_KVM)	+= kvm.o
> diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
> index 6124077d154f..018fca436776 100644
> --- a/arch/riscv/kvm/vcpu.c
> +++ b/arch/riscv/kvm/vcpu.c
> @@ -54,6 +54,8 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
>   
>   	memcpy(cntx, reset_cntx, sizeof(*cntx));
>   
> +	kvm_riscv_vcpu_timer_reset(vcpu);
> +
>   	WRITE_ONCE(vcpu->arch.irqs_pending, 0);
>   	WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
>   }
> @@ -108,6 +110,9 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>   	cntx->hstatus |= HSTATUS_SP2P;
>   	cntx->hstatus |= HSTATUS_SPV;
>   
> +	/* Setup VCPU timer */
> +	kvm_riscv_vcpu_timer_init(vcpu);
> +
>   	/* Reset VCPU */
>   	kvm_riscv_reset_vcpu(vcpu);
>   
> @@ -116,6 +121,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>   
>   void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>   {
> +	kvm_riscv_vcpu_timer_deinit(vcpu);
>   	kvm_riscv_stage2_flush_cache(vcpu);
>   	kmem_cache_free(kvm_vcpu_cache, vcpu);
>   }
> diff --git a/arch/riscv/kvm/vcpu_timer.c b/arch/riscv/kvm/vcpu_timer.c
> new file mode 100644
> index 000000000000..a45ca06e1aa6
> --- /dev/null
> +++ b/arch/riscv/kvm/vcpu_timer.c
> @@ -0,0 +1,106 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Western Digital Corporation or its affiliates.
> + *
> + * Authors:
> + *     Atish Patra <atish.patra@wdc.com>
> + */
> +
> +#include <linux/errno.h>
> +#include <linux/err.h>
> +#include <linux/kvm_host.h>
> +#include <clocksource/timer-riscv.h>
> +#include <asm/csr.h>
> +#include <asm/kvm_vcpu_timer.h>
> +
> +static enum hrtimer_restart kvm_riscv_vcpu_hrtimer_expired(struct hrtimer *h)
> +{
> +	struct kvm_vcpu_timer *t = container_of(h, struct kvm_vcpu_timer, hrt);
> +	struct kvm_vcpu *vcpu = container_of(t, struct kvm_vcpu, arch.timer);
> +
> +	t->is_set = false;
> +	kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_S_TIMER);
> +
> +	return HRTIMER_NORESTART;
> +}
> +
> +static u64 kvm_riscv_delta_cycles2ns(u64 cycles, struct kvm_vcpu_timer *t)
> +{
> +	unsigned long flags;
> +	u64 cycles_now, cycles_delta, delta_ns;
> +
> +	local_irq_save(flags);
> +	cycles_now = get_cycles64();
> +	if (cycles_now < cycles)
> +		cycles_delta = cycles - cycles_now;
> +	else
> +		cycles_delta = 0;
> +	delta_ns = (cycles_delta * t->mult) >> t->shift;
> +	local_irq_restore(flags);
> +
> +	return delta_ns;
> +}
> +
> +static int kvm_riscv_vcpu_timer_cancel(struct kvm_vcpu_timer *t)
> +{
> +	if (!t->init_done || !t->is_set)
> +		return -EINVAL;
> +
> +	hrtimer_cancel(&t->hrt);
> +	t->is_set = false;
> +
> +	return 0;
> +}
> +
> +int kvm_riscv_vcpu_timer_next_event(struct kvm_vcpu *vcpu,
> +				    unsigned long ncycles)
> +{
> +	struct kvm_vcpu_timer *t = &vcpu->arch.timer;
> +	u64 delta_ns = kvm_riscv_delta_cycles2ns(ncycles, t);

... in fact, I feel like I'm missing something obvious here. How does 
the guest trigger the timer event? What is the argument it uses for that 
and how does that play with the tbfreq in the earlier patch?


Alex


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 18/20] RISC-V: KVM: Add SBI v0.1 support
  2019-08-22  8:46 ` [PATCH v5 18/20] RISC-V: KVM: Add SBI v0.1 support Anup Patel
@ 2019-08-23  8:04   ` Alexander Graf
  2019-08-23 11:17     ` Anup Patel
  0 siblings, 1 reply; 61+ messages in thread
From: Alexander Graf @ 2019-08-23  8:04 UTC (permalink / raw)
  To: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Damien Le Moal, kvm, Anup Patel, Daniel Lezcano, linux-kernel,
	Christoph Hellwig, Atish Patra, Alistair Francis,
	Thomas Gleixner, linux-riscv

On 22.08.19 10:46, Anup Patel wrote:
> From: Atish Patra <atish.patra@wdc.com>
> 
> The KVM host kernel running in HS-mode needs to handle SBI calls coming
> from guest kernel running in VS-mode.
> 
> This patch adds SBI v0.1 support in KVM RISC-V. All the SBI calls are
> implemented correctly except remote tlb flushes. For remote TLB flushes,
> we are doing full TLB flush and this will be optimized in future.
> 
> Signed-off-by: Atish Patra <atish.patra@wdc.com>
> Signed-off-by: Anup Patel <anup.patel@wdc.com>
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   arch/riscv/include/asm/kvm_host.h |   2 +
>   arch/riscv/kvm/Makefile           |   2 +-
>   arch/riscv/kvm/vcpu_exit.c        |   3 +
>   arch/riscv/kvm/vcpu_sbi.c         | 119 ++++++++++++++++++++++++++++++
>   4 files changed, 125 insertions(+), 1 deletion(-)
>   create mode 100644 arch/riscv/kvm/vcpu_sbi.c
> 
> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> index 2af3a179c08e..0b1eceaef59f 100644
> --- a/arch/riscv/include/asm/kvm_host.h
> +++ b/arch/riscv/include/asm/kvm_host.h
> @@ -241,4 +241,6 @@ bool kvm_riscv_vcpu_has_interrupt(struct kvm_vcpu *vcpu);
>   void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu);
>   void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu);
>   
> +int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu);
> +
>   #endif /* __RISCV_KVM_HOST_H__ */
> diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
> index 3e0c7558320d..b56dc1650d2c 100644
> --- a/arch/riscv/kvm/Makefile
> +++ b/arch/riscv/kvm/Makefile
> @@ -9,6 +9,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
>   kvm-objs := $(common-objs-y)
>   
>   kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
> -kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o
> +kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o vcpu_sbi.o
>   
>   obj-$(CONFIG_KVM)	+= kvm.o
> diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
> index fbc04fe335ad..87b83fcf9a14 100644
> --- a/arch/riscv/kvm/vcpu_exit.c
> +++ b/arch/riscv/kvm/vcpu_exit.c
> @@ -534,6 +534,9 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>   		    (vcpu->arch.guest_context.hstatus & HSTATUS_STL))
>   			ret = stage2_page_fault(vcpu, run, scause, stval);
>   		break;
> +	case EXC_SUPERVISOR_SYSCALL:
> +		if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV)
> +			ret = kvm_riscv_vcpu_sbi_ecall(vcpu);
>   	default:
>   		break;
>   	};
> diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
> new file mode 100644
> index 000000000000..5793202eb514
> --- /dev/null
> +++ b/arch/riscv/kvm/vcpu_sbi.c
> @@ -0,0 +1,119 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/**
> + * Copyright (c) 2019 Western Digital Corporation or its affiliates.
> + *
> + * Authors:
> + *     Atish Patra <atish.patra@wdc.com>
> + */
> +
> +#include <linux/errno.h>
> +#include <linux/err.h>
> +#include <linux/kvm_host.h>
> +#include <asm/csr.h>
> +#include <asm/kvm_vcpu_timer.h>
> +
> +#define SBI_VERSION_MAJOR			0
> +#define SBI_VERSION_MINOR			1
> +
> +/* TODO: Handle traps due to unpriv load and redirect it back to VS-mode */

Ugh, another one of those? Can't you just figure out a way to recover 
from the page fault? Also, you want to combine this with the instruction 
load logic, so that we have a single place that guest address space 
reads go through.

> +static unsigned long kvm_sbi_unpriv_load(const unsigned long *addr,
> +					 struct kvm_vcpu *vcpu)
> +{
> +	unsigned long flags, val;
> +	unsigned long __hstatus, __sstatus;
> +
> +	local_irq_save(flags);
> +	__hstatus = csr_read(CSR_HSTATUS);
> +	__sstatus = csr_read(CSR_SSTATUS);
> +	csr_write(CSR_HSTATUS, vcpu->arch.guest_context.hstatus | HSTATUS_SPRV);
> +	csr_write(CSR_SSTATUS, vcpu->arch.guest_context.sstatus);
> +	val = *addr;
> +	csr_write(CSR_HSTATUS, __hstatus);
> +	csr_write(CSR_SSTATUS, __sstatus);
> +	local_irq_restore(flags);
> +
> +	return val;
> +}
> +
> +static void kvm_sbi_system_shutdown(struct kvm_vcpu *vcpu, u32 type)
> +{
> +	int i;
> +	struct kvm_vcpu *tmp;
> +
> +	kvm_for_each_vcpu(i, tmp, vcpu->kvm)
> +		tmp->arch.power_off = true;
> +	kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
> +
> +	memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
> +	vcpu->run->system_event.type = type;
> +	vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
> +}
> +
> +int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu)
> +{
> +	int ret = 1;
> +	u64 next_cycle;
> +	int vcpuid;
> +	struct kvm_vcpu *remote_vcpu;
> +	ulong dhart_mask;
> +	struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
> +
> +	if (!cp)
> +		return -EINVAL;
> +	switch (cp->a7) {
> +	case SBI_SET_TIMER:
> +#if __riscv_xlen == 32
> +		next_cycle = ((u64)cp->a1 << 32) | (u64)cp->a0;
> +#else
> +		next_cycle = (u64)cp->a0;
> +#endif
> +		kvm_riscv_vcpu_timer_next_event(vcpu, next_cycle);

Ah, this is where the timer set happens. I still don't understand how 
this takes the frequency bit into account?

> +		break;
> +	case SBI_CONSOLE_PUTCHAR:
> +		/* Not implemented */
> +		cp->a0 = -ENOTSUPP;
> +		break;
> +	case SBI_CONSOLE_GETCHAR:
> +		/* Not implemented */
> +		cp->a0 = -ENOTSUPP;
> +		break;

These two should be covered by the default case.

> +	case SBI_CLEAR_IPI:
> +		kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_S_SOFT);
> +		break;
> +	case SBI_SEND_IPI:
> +		dhart_mask = kvm_sbi_unpriv_load((unsigned long *)cp->a0, vcpu);
> +		for_each_set_bit(vcpuid, &dhart_mask, BITS_PER_LONG) {
> +			remote_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, vcpuid);
> +			kvm_riscv_vcpu_set_interrupt(remote_vcpu, IRQ_S_SOFT);
> +		}
> +		break;
> +	case SBI_SHUTDOWN:
> +		kvm_sbi_system_shutdown(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN);
> +		ret = 0;
> +		break;
> +	case SBI_REMOTE_FENCE_I:
> +		sbi_remote_fence_i(NULL);
> +		break;
> +	/*
> +	 * TODO: There should be a way to call remote hfence.bvma.
> +	 * Preferred method is now a SBI call. Until then, just flush
> +	 * all tlbs.
> +	 */
> +	case SBI_REMOTE_SFENCE_VMA:
> +		/*TODO: Parse vma range.*/
> +		sbi_remote_sfence_vma(NULL, 0, 0);
> +		break;
> +	case SBI_REMOTE_SFENCE_VMA_ASID:
> +		/*TODO: Parse vma range for given ASID */
> +		sbi_remote_sfence_vma(NULL, 0, 0);
> +		break;
> +	default:
> +		cp->a0 = ENOTSUPP;
> +		break;

Please just send unsupported SBI events into user space.

Alex

> +	};
> +
> +	if (ret >= 0)
> +		cp->sepc += 4;
> +
> +	return ret;
> +}
> 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 00/20] KVM RISC-V Support
  2019-08-22  8:42 [PATCH v5 00/20] KVM RISC-V Support Anup Patel
                   ` (19 preceding siblings ...)
  2019-08-22  8:47 ` [PATCH v5 20/20] RISC-V: KVM: Add MAINTAINERS entry Anup Patel
@ 2019-08-23  8:08 ` Alexander Graf
  2019-08-23 11:25   ` Anup Patel
  20 siblings, 1 reply; 61+ messages in thread
From: Alexander Graf @ 2019-08-23  8:08 UTC (permalink / raw)
  To: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K
  Cc: Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, Anup Patel, kvm, linux-riscv,
	linux-kernel

On 22.08.19 10:42, Anup Patel wrote:
> This series adds initial KVM RISC-V support. Currently, we are able to boot
> RISC-V 64bit Linux Guests with multiple VCPUs.
> 
> Few key aspects of KVM RISC-V added by this series are:
> 1. Minimal possible KVM world-switch which touches only GPRs and few CSRs.
> 2. Full Guest/VM switch is done via vcpu_get/vcpu_put infrastructure.
> 3. KVM ONE_REG interface for VCPU register access from user-space.
> 4. PLIC emulation is done in user-space. In-kernel PLIC emulation, will
>     be added in future.
> 5. Timer and IPI emuation is done in-kernel.
> 6. MMU notifiers supported.
> 7. FP lazy save/restore supported.
> 8. SBI v0.1 emulation for KVM Guest available.
> 
> Here's a brief TODO list which we will work upon after this series:
> 1. Handle trap from unpriv access in reading Guest instruction
> 2. Handle trap from unpriv access in SBI v0.1 emulation
> 3. Implement recursive stage2 page table programing
> 4. SBI v0.2 emulation in-kernel
> 5. SBI v0.2 hart hotplug emulation in-kernel
> 6. In-kernel PLIC emulation
> 7. ..... and more .....

Please consider patches I did not comment on as

Reviewed-by: Alexander Graf <graf@amazon.com>

Overall, I'm quite happy with the code. It's a very clean implementation 
of a KVM target.

The only major nit I have is the guest address space read: I don't think 
we should pull in code that we know allows user space to DOS the kernel. 
For that, we need to find an alternative. Either you implement a 
software page table walker and resolve VAs manually or you find a way to 
ensure that *any* exception taken during the read does not affect 
general code execution.


Thanks,

Alex

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 15/20] RISC-V: KVM: Add timer functionality
  2019-08-23  7:52   ` Alexander Graf
@ 2019-08-23 11:04     ` Anup Patel
  2019-08-23 11:33       ` Graf (AWS), Alexander
  0 siblings, 1 reply; 61+ messages in thread
From: Anup Patel @ 2019-08-23 11:04 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Daniel Lezcano, Thomas Gleixner, Atish Patra,
	Alistair Francis, Damien Le Moal, Christoph Hellwig, kvm,
	linux-riscv, linux-kernel

On Fri, Aug 23, 2019 at 1:23 PM Alexander Graf <graf@amazon.com> wrote:
>
> On 22.08.19 10:46, Anup Patel wrote:
> > From: Atish Patra <atish.patra@wdc.com>
> >
> > The RISC-V hypervisor specification doesn't have any virtual timer
> > feature.
> >
> > Due to this, the guest VCPU timer will be programmed via SBI calls.
> > The host will use a separate hrtimer event for each guest VCPU to
> > provide timer functionality. We inject a virtual timer interrupt to
> > the guest VCPU whenever the guest VCPU hrtimer event expires.
> >
> > The following features are not supported yet and will be added in
> > future:
> > 1. A time offset to adjust guest time from host time
> > 2. A saved next event in guest vcpu for vm migration
>
> Implementing these 2 bits right now should be trivial. Why wait?

We were waiting for HTIMEDELTA CSR to be merged so we
deferred this items.

>
> >
> > Signed-off-by: Atish Patra <atish.patra@wdc.com>
> > Signed-off-by: Anup Patel <anup.patel@wdc.com>
> > Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >   arch/riscv/include/asm/kvm_host.h       |   4 +
> >   arch/riscv/include/asm/kvm_vcpu_timer.h |  32 +++++++
> >   arch/riscv/kvm/Makefile                 |   2 +-
> >   arch/riscv/kvm/vcpu.c                   |   6 ++
> >   arch/riscv/kvm/vcpu_timer.c             | 106 ++++++++++++++++++++++++
> >   drivers/clocksource/timer-riscv.c       |   8 ++
> >   include/clocksource/timer-riscv.h       |  16 ++++
> >   7 files changed, 173 insertions(+), 1 deletion(-)
> >   create mode 100644 arch/riscv/include/asm/kvm_vcpu_timer.h
> >   create mode 100644 arch/riscv/kvm/vcpu_timer.c
> >   create mode 100644 include/clocksource/timer-riscv.h
> >
> > diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> > index ab33e59a3d88..d2a2e45eefc0 100644
> > --- a/arch/riscv/include/asm/kvm_host.h
> > +++ b/arch/riscv/include/asm/kvm_host.h
> > @@ -12,6 +12,7 @@
> >   #include <linux/types.h>
> >   #include <linux/kvm.h>
> >   #include <linux/kvm_types.h>
> > +#include <asm/kvm_vcpu_timer.h>
> >
> >   #ifdef CONFIG_64BIT
> >   #define KVM_MAX_VCPUS                       (1U << 16)
> > @@ -167,6 +168,9 @@ struct kvm_vcpu_arch {
> >       unsigned long irqs_pending;
> >       unsigned long irqs_pending_mask;
> >
> > +     /* VCPU Timer */
> > +     struct kvm_vcpu_timer timer;
> > +
> >       /* MMIO instruction details */
> >       struct kvm_mmio_decode mmio_decode;
> >
> > diff --git a/arch/riscv/include/asm/kvm_vcpu_timer.h b/arch/riscv/include/asm/kvm_vcpu_timer.h
> > new file mode 100644
> > index 000000000000..df67ea86988e
> > --- /dev/null
> > +++ b/arch/riscv/include/asm/kvm_vcpu_timer.h
> > @@ -0,0 +1,32 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (C) 2019 Western Digital Corporation or its affiliates.
> > + *
> > + * Authors:
> > + *   Atish Patra <atish.patra@wdc.com>
> > + */
> > +
> > +#ifndef __KVM_VCPU_RISCV_TIMER_H
> > +#define __KVM_VCPU_RISCV_TIMER_H
> > +
> > +#include <linux/hrtimer.h>
> > +
> > +#define VCPU_TIMER_PROGRAM_THRESHOLD_NS              1000
> > +
> > +struct kvm_vcpu_timer {
> > +     bool init_done;
> > +     /* Check if the timer is programmed */
> > +     bool is_set;
> > +     struct hrtimer hrt;
> > +     /* Mult & Shift values to get nanosec from cycles */
> > +     u32 mult;
> > +     u32 shift;
> > +};
> > +
> > +int kvm_riscv_vcpu_timer_init(struct kvm_vcpu *vcpu);
> > +int kvm_riscv_vcpu_timer_deinit(struct kvm_vcpu *vcpu);
> > +int kvm_riscv_vcpu_timer_reset(struct kvm_vcpu *vcpu);
> > +int kvm_riscv_vcpu_timer_next_event(struct kvm_vcpu *vcpu,
> > +                                 unsigned long ncycles);
>
> This function never gets called?

It's called from SBI emulation.

>
> > +
> > +#endif
> > diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
> > index c0f57f26c13d..3e0c7558320d 100644
> > --- a/arch/riscv/kvm/Makefile
> > +++ b/arch/riscv/kvm/Makefile
> > @@ -9,6 +9,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
> >   kvm-objs := $(common-objs-y)
> >
> >   kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
> > -kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o
> > +kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o
> >
> >   obj-$(CONFIG_KVM)   += kvm.o
> > diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
> > index 6124077d154f..018fca436776 100644
> > --- a/arch/riscv/kvm/vcpu.c
> > +++ b/arch/riscv/kvm/vcpu.c
> > @@ -54,6 +54,8 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
> >
> >       memcpy(cntx, reset_cntx, sizeof(*cntx));
> >
> > +     kvm_riscv_vcpu_timer_reset(vcpu);
> > +
> >       WRITE_ONCE(vcpu->arch.irqs_pending, 0);
> >       WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
> >   }
> > @@ -108,6 +110,9 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
> >       cntx->hstatus |= HSTATUS_SP2P;
> >       cntx->hstatus |= HSTATUS_SPV;
> >
> > +     /* Setup VCPU timer */
> > +     kvm_riscv_vcpu_timer_init(vcpu);
> > +
> >       /* Reset VCPU */
> >       kvm_riscv_reset_vcpu(vcpu);
> >
> > @@ -116,6 +121,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
> >
> >   void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> >   {
> > +     kvm_riscv_vcpu_timer_deinit(vcpu);
> >       kvm_riscv_stage2_flush_cache(vcpu);
> >       kmem_cache_free(kvm_vcpu_cache, vcpu);
> >   }
> > diff --git a/arch/riscv/kvm/vcpu_timer.c b/arch/riscv/kvm/vcpu_timer.c
> > new file mode 100644
> > index 000000000000..a45ca06e1aa6
> > --- /dev/null
> > +++ b/arch/riscv/kvm/vcpu_timer.c
> > @@ -0,0 +1,106 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2019 Western Digital Corporation or its affiliates.
> > + *
> > + * Authors:
> > + *     Atish Patra <atish.patra@wdc.com>
> > + */
> > +
> > +#include <linux/errno.h>
> > +#include <linux/err.h>
> > +#include <linux/kvm_host.h>
> > +#include <clocksource/timer-riscv.h>
> > +#include <asm/csr.h>
> > +#include <asm/kvm_vcpu_timer.h>
> > +
> > +static enum hrtimer_restart kvm_riscv_vcpu_hrtimer_expired(struct hrtimer *h)
> > +{
> > +     struct kvm_vcpu_timer *t = container_of(h, struct kvm_vcpu_timer, hrt);
> > +     struct kvm_vcpu *vcpu = container_of(t, struct kvm_vcpu, arch.timer);
> > +
> > +     t->is_set = false;
> > +     kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_S_TIMER);
> > +
> > +     return HRTIMER_NORESTART;
> > +}
> > +
> > +static u64 kvm_riscv_delta_cycles2ns(u64 cycles, struct kvm_vcpu_timer *t)
> > +{
> > +     unsigned long flags;
> > +     u64 cycles_now, cycles_delta, delta_ns;
> > +
> > +     local_irq_save(flags);
> > +     cycles_now = get_cycles64();
> > +     if (cycles_now < cycles)
> > +             cycles_delta = cycles - cycles_now;
> > +     else
> > +             cycles_delta = 0;
> > +     delta_ns = (cycles_delta * t->mult) >> t->shift;
> > +     local_irq_restore(flags);
> > +
> > +     return delta_ns;
> > +}
> > +
> > +static int kvm_riscv_vcpu_timer_cancel(struct kvm_vcpu_timer *t)
> > +{
> > +     if (!t->init_done || !t->is_set)
> > +             return -EINVAL;
> > +
> > +     hrtimer_cancel(&t->hrt);
> > +     t->is_set = false;
> > +
> > +     return 0;
> > +}
> > +
> > +int kvm_riscv_vcpu_timer_next_event(struct kvm_vcpu *vcpu,
> > +                                 unsigned long ncycles)
> > +{
> > +     struct kvm_vcpu_timer *t = &vcpu->arch.timer;
> > +     u64 delta_ns = kvm_riscv_delta_cycles2ns(ncycles, t);
>
> ... in fact, I feel like I'm missing something obvious here. How does
> the guest trigger the timer event? What is the argument it uses for that
> and how does that play with the tbfreq in the earlier patch?

We have SBI call inferface between Hypervisor and Guest. One of the
SBI call allows Guest to program time event. The next event is specified
as absolute cycles. The Guest can read time using TIME CSR which
returns system timer value (@ tbfreq freqency).

Guest Linux will know the tbfreq from DTB passed by QEMU/KVMTOOL
and it has to be same as Host tbfreq.

The TBFREQ config register visible to user-space is a read-only CONFIG
register which tells user-space tools (QEMU/KVMTOOL) about Host tbfreq.

Regards,
Anup

>
>
> Alex
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 18/20] RISC-V: KVM: Add SBI v0.1 support
  2019-08-23  8:04   ` Alexander Graf
@ 2019-08-23 11:17     ` Anup Patel
  2019-08-23 11:38       ` Graf (AWS), Alexander
  0 siblings, 1 reply; 61+ messages in thread
From: Anup Patel @ 2019-08-23 11:17 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Damien Le Moal, kvm, Daniel Lezcano, linux-kernel,
	Christoph Hellwig, Atish Patra, Alistair Francis,
	Thomas Gleixner, linux-riscv

On Fri, Aug 23, 2019 at 1:34 PM Alexander Graf <graf@amazon.com> wrote:
>
> On 22.08.19 10:46, Anup Patel wrote:
> > From: Atish Patra <atish.patra@wdc.com>
> >
> > The KVM host kernel running in HS-mode needs to handle SBI calls coming
> > from guest kernel running in VS-mode.
> >
> > This patch adds SBI v0.1 support in KVM RISC-V. All the SBI calls are
> > implemented correctly except remote tlb flushes. For remote TLB flushes,
> > we are doing full TLB flush and this will be optimized in future.
> >
> > Signed-off-by: Atish Patra <atish.patra@wdc.com>
> > Signed-off-by: Anup Patel <anup.patel@wdc.com>
> > Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >   arch/riscv/include/asm/kvm_host.h |   2 +
> >   arch/riscv/kvm/Makefile           |   2 +-
> >   arch/riscv/kvm/vcpu_exit.c        |   3 +
> >   arch/riscv/kvm/vcpu_sbi.c         | 119 ++++++++++++++++++++++++++++++
> >   4 files changed, 125 insertions(+), 1 deletion(-)
> >   create mode 100644 arch/riscv/kvm/vcpu_sbi.c
> >
> > diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> > index 2af3a179c08e..0b1eceaef59f 100644
> > --- a/arch/riscv/include/asm/kvm_host.h
> > +++ b/arch/riscv/include/asm/kvm_host.h
> > @@ -241,4 +241,6 @@ bool kvm_riscv_vcpu_has_interrupt(struct kvm_vcpu *vcpu);
> >   void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu);
> >   void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu);
> >
> > +int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu);
> > +
> >   #endif /* __RISCV_KVM_HOST_H__ */
> > diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
> > index 3e0c7558320d..b56dc1650d2c 100644
> > --- a/arch/riscv/kvm/Makefile
> > +++ b/arch/riscv/kvm/Makefile
> > @@ -9,6 +9,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
> >   kvm-objs := $(common-objs-y)
> >
> >   kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
> > -kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o
> > +kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o vcpu_sbi.o
> >
> >   obj-$(CONFIG_KVM)   += kvm.o
> > diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
> > index fbc04fe335ad..87b83fcf9a14 100644
> > --- a/arch/riscv/kvm/vcpu_exit.c
> > +++ b/arch/riscv/kvm/vcpu_exit.c
> > @@ -534,6 +534,9 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
> >                   (vcpu->arch.guest_context.hstatus & HSTATUS_STL))
> >                       ret = stage2_page_fault(vcpu, run, scause, stval);
> >               break;
> > +     case EXC_SUPERVISOR_SYSCALL:
> > +             if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV)
> > +                     ret = kvm_riscv_vcpu_sbi_ecall(vcpu);
> >       default:
> >               break;
> >       };
> > diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
> > new file mode 100644
> > index 000000000000..5793202eb514
> > --- /dev/null
> > +++ b/arch/riscv/kvm/vcpu_sbi.c
> > @@ -0,0 +1,119 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/**
> > + * Copyright (c) 2019 Western Digital Corporation or its affiliates.
> > + *
> > + * Authors:
> > + *     Atish Patra <atish.patra@wdc.com>
> > + */
> > +
> > +#include <linux/errno.h>
> > +#include <linux/err.h>
> > +#include <linux/kvm_host.h>
> > +#include <asm/csr.h>
> > +#include <asm/kvm_vcpu_timer.h>
> > +
> > +#define SBI_VERSION_MAJOR                    0
> > +#define SBI_VERSION_MINOR                    1
> > +
> > +/* TODO: Handle traps due to unpriv load and redirect it back to VS-mode */
>
> Ugh, another one of those? Can't you just figure out a way to recover
> from the page fault? Also, you want to combine this with the instruction
> load logic, so that we have a single place that guest address space
> reads go through.

Walking Guest page table would be more expensive compared to implementing
a trap handling mechanism.

We will be adding trap handling mechanism for reading instruction and reading
load.

Both these operations are different in following ways:
1. RISC-V instructions are variable length. We get to know exact instruction
    length only after reading first 16bits
2. We need to set VSSTATUS.MXR bit when reading instruction for
    execute-only Guest pages.

>
> > +static unsigned long kvm_sbi_unpriv_load(const unsigned long *addr,
> > +                                      struct kvm_vcpu *vcpu)
> > +{
> > +     unsigned long flags, val;
> > +     unsigned long __hstatus, __sstatus;
> > +
> > +     local_irq_save(flags);
> > +     __hstatus = csr_read(CSR_HSTATUS);
> > +     __sstatus = csr_read(CSR_SSTATUS);
> > +     csr_write(CSR_HSTATUS, vcpu->arch.guest_context.hstatus | HSTATUS_SPRV);
> > +     csr_write(CSR_SSTATUS, vcpu->arch.guest_context.sstatus);
> > +     val = *addr;
> > +     csr_write(CSR_HSTATUS, __hstatus);
> > +     csr_write(CSR_SSTATUS, __sstatus);
> > +     local_irq_restore(flags);
> > +
> > +     return val;
> > +}
> > +
> > +static void kvm_sbi_system_shutdown(struct kvm_vcpu *vcpu, u32 type)
> > +{
> > +     int i;
> > +     struct kvm_vcpu *tmp;
> > +
> > +     kvm_for_each_vcpu(i, tmp, vcpu->kvm)
> > +             tmp->arch.power_off = true;
> > +     kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
> > +
> > +     memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
> > +     vcpu->run->system_event.type = type;
> > +     vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
> > +}
> > +
> > +int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu)
> > +{
> > +     int ret = 1;
> > +     u64 next_cycle;
> > +     int vcpuid;
> > +     struct kvm_vcpu *remote_vcpu;
> > +     ulong dhart_mask;
> > +     struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
> > +
> > +     if (!cp)
> > +             return -EINVAL;
> > +     switch (cp->a7) {
> > +     case SBI_SET_TIMER:
> > +#if __riscv_xlen == 32
> > +             next_cycle = ((u64)cp->a1 << 32) | (u64)cp->a0;
> > +#else
> > +             next_cycle = (u64)cp->a0;
> > +#endif
> > +             kvm_riscv_vcpu_timer_next_event(vcpu, next_cycle);
>
> Ah, this is where the timer set happens. I still don't understand how
> this takes the frequency bit into account?

Explained it in PATCH17 comments.

>
> > +             break;
> > +     case SBI_CONSOLE_PUTCHAR:
> > +             /* Not implemented */
> > +             cp->a0 = -ENOTSUPP;
> > +             break;
> > +     case SBI_CONSOLE_GETCHAR:
> > +             /* Not implemented */
> > +             cp->a0 = -ENOTSUPP;
> > +             break;
>
> These two should be covered by the default case.

Sure, I will update.

>
> > +     case SBI_CLEAR_IPI:
> > +             kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_S_SOFT);
> > +             break;
> > +     case SBI_SEND_IPI:
> > +             dhart_mask = kvm_sbi_unpriv_load((unsigned long *)cp->a0, vcpu);
> > +             for_each_set_bit(vcpuid, &dhart_mask, BITS_PER_LONG) {
> > +                     remote_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, vcpuid);
> > +                     kvm_riscv_vcpu_set_interrupt(remote_vcpu, IRQ_S_SOFT);
> > +             }
> > +             break;
> > +     case SBI_SHUTDOWN:
> > +             kvm_sbi_system_shutdown(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN);
> > +             ret = 0;
> > +             break;
> > +     case SBI_REMOTE_FENCE_I:
> > +             sbi_remote_fence_i(NULL);
> > +             break;
> > +     /*
> > +      * TODO: There should be a way to call remote hfence.bvma.
> > +      * Preferred method is now a SBI call. Until then, just flush
> > +      * all tlbs.
> > +      */
> > +     case SBI_REMOTE_SFENCE_VMA:
> > +             /*TODO: Parse vma range.*/
> > +             sbi_remote_sfence_vma(NULL, 0, 0);
> > +             break;
> > +     case SBI_REMOTE_SFENCE_VMA_ASID:
> > +             /*TODO: Parse vma range for given ASID */
> > +             sbi_remote_sfence_vma(NULL, 0, 0);
> > +             break;
> > +     default:
> > +             cp->a0 = ENOTSUPP;
> > +             break;
>
> Please just send unsupported SBI events into user space.

For unsupported SBI calls, we should be returning error to the
Guest Linux so that do something about it. This is in accordance
with the SBI spec.

The SBI v0.1 is quite primitive in design but we have SBI v0.2
base spec now available. The SBI v0.2 is extensible and people
can easily come-up with new set of SBI v0.2 calls (i.e. SBI v0.2
extensions).

We also have SBI v0.2 implementation coming-up in next
series.

Regards,
Anup

>
> Alex
>
> > +     };
> > +
> > +     if (ret >= 0)
> > +             cp->sepc += 4;
> > +
> > +     return ret;
> > +}
> >
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 08/20] RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls
  2019-08-22 14:12       ` Alexander Graf
@ 2019-08-23 11:20         ` Anup Patel
  2019-08-23 11:42           ` Graf (AWS), Alexander
  0 siblings, 1 reply; 61+ messages in thread
From: Anup Patel @ 2019-08-23 11:20 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Daniel Lezcano, Thomas Gleixner, Atish Patra,
	Alistair Francis, Damien Le Moal, Christoph Hellwig, kvm,
	linux-riscv, linux-kernel

On Thu, Aug 22, 2019 at 7:42 PM Alexander Graf <graf@amazon.com> wrote:
>
>
>
> On 22.08.19 16:00, Anup Patel wrote:
> > On Thu, Aug 22, 2019 at 5:31 PM Alexander Graf <graf@amazon.com> wrote:
> >>
> >> On 22.08.19 10:44, Anup Patel wrote:
> >>> For KVM RISC-V, we use KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls to access
> >>> VCPU config and registers from user-space.
> >>>
> >>> We have three types of VCPU registers:
> >>> 1. CONFIG - these are VCPU config and capabilities
> >>> 2. CORE   - these are VCPU general purpose registers
> >>> 3. CSR    - these are VCPU control and status registers
> >>>
> >>> The CONFIG registers available to user-space are ISA and TIMEBASE. Out
> >>> of these, TIMEBASE is a read-only register which inform user-space about
> >>> VCPU timer base frequency. The ISA register is a read and write register
> >>> where user-space can only write the desired VCPU ISA capabilities before
> >>> running the VCPU.
> >>>
> >>> The CORE registers available to user-space are PC, RA, SP, GP, TP, A0-A7,
> >>> T0-T6, S0-S11 and MODE. Most of these are RISC-V general registers except
> >>> PC and MODE. The PC register represents program counter whereas the MODE
> >>> register represent VCPU privilege mode (i.e. S/U-mode).
> >>>
> >>> The CSRs available to user-space are SSTATUS, SIE, STVEC, SSCRATCH, SEPC,
> >>> SCAUSE, STVAL, SIP, and SATP. All of these are read/write registers.
> >>>
> >>> In future, more VCPU register types will be added (such as FP) for the
> >>> KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls.
> >>>
> >>> Signed-off-by: Anup Patel <anup.patel@wdc.com>
> >>> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> >>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> >>> ---
> >>>    arch/riscv/include/uapi/asm/kvm.h |  40 ++++-
> >>>    arch/riscv/kvm/vcpu.c             | 235 +++++++++++++++++++++++++++++-
> >>>    2 files changed, 272 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
> >>> index 6dbc056d58ba..024f220eb17e 100644
> >>> --- a/arch/riscv/include/uapi/asm/kvm.h
> >>> +++ b/arch/riscv/include/uapi/asm/kvm.h
> >>> @@ -23,8 +23,15 @@
> >>>
> >>>    /* for KVM_GET_REGS and KVM_SET_REGS */
> >>>    struct kvm_regs {
> >>> +     /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
> >>> +     struct user_regs_struct regs;
> >>> +     unsigned long mode;
> >>
> >> Is there any particular reason you're reusing kvm_regs and don't invent
> >> your own struct? kvm_regs is explicitly meant for the get_regs and
> >> set_regs ioctls.
> >
> > We are implementing only ONE_REG interface so most of these
> > structs are unused hence we tried to reuse these struct instead
> > of introducing new structs. (Similar to KVM ARM64)
> >
> >>
> >>>    };
> >>>
> >>> +/* Possible privilege modes for kvm_regs */
> >>> +#define KVM_RISCV_MODE_S     1
> >>> +#define KVM_RISCV_MODE_U     0
> >>> +
> >>>    /* for KVM_GET_FPU and KVM_SET_FPU */
> >>>    struct kvm_fpu {
> >>>    };
> >>> @@ -41,10 +48,41 @@ struct kvm_guest_debug_arch {
> >>>    struct kvm_sync_regs {
> >>>    };
> >>>
> >>> -/* dummy definition */
> >>> +/* for KVM_GET_SREGS and KVM_SET_SREGS */
> >>>    struct kvm_sregs {
> >>> +     unsigned long sstatus;
> >>> +     unsigned long sie;
> >>> +     unsigned long stvec;
> >>> +     unsigned long sscratch;
> >>> +     unsigned long sepc;
> >>> +     unsigned long scause;
> >>> +     unsigned long stval;
> >>> +     unsigned long sip;
> >>> +     unsigned long satp;
> >>
> >> Same comment here.
> >
> > Same as above, we are trying to use unused struct.
> >
> >>
> >>>    };
> >>>
> >>> +#define KVM_REG_SIZE(id)             \
> >>> +     (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
> >>> +
> >>> +/* If you need to interpret the index values, here is the key: */
> >>> +#define KVM_REG_RISCV_TYPE_MASK              0x00000000FF000000
> >>> +#define KVM_REG_RISCV_TYPE_SHIFT     24
> >>> +
> >>> +/* Config registers are mapped as type 1 */
> >>> +#define KVM_REG_RISCV_CONFIG         (0x01 << KVM_REG_RISCV_TYPE_SHIFT)
> >>> +#define KVM_REG_RISCV_CONFIG_ISA     0x0
> >>> +#define KVM_REG_RISCV_CONFIG_TIMEBASE        0x1
> >>> +
> >>> +/* Core registers are mapped as type 2 */
> >>> +#define KVM_REG_RISCV_CORE           (0x02 << KVM_REG_RISCV_TYPE_SHIFT)
> >>> +#define KVM_REG_RISCV_CORE_REG(name) \
> >>> +             (offsetof(struct kvm_regs, name) / sizeof(unsigned long))
> >>
> >> I see, you're trying to implicitly use the struct offsets as index.
> >>
> >> I'm not a really big fan of it, but I can't pinpoint exactly why just
> >> yet. It just seems too magical (read: potentially breaking down the
> >> road) for me.
> >>
> >>> +
> >>> +/* Control and status registers are mapped as type 3 */
> >>> +#define KVM_REG_RISCV_CSR            (0x03 << KVM_REG_RISCV_TYPE_SHIFT)
> >>> +#define KVM_REG_RISCV_CSR_REG(name)  \
> >>> +             (offsetof(struct kvm_sregs, name) / sizeof(unsigned long))
> >>> +
> >>>    #endif
> >>>
> >>>    #endif /* __LINUX_KVM_RISCV_H */
> >>> diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
> >>> index 7f59e85c6af8..9396a83c0611 100644
> >>> --- a/arch/riscv/kvm/vcpu.c
> >>> +++ b/arch/riscv/kvm/vcpu.c
> >>> @@ -164,6 +164,215 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
> >>>        return VM_FAULT_SIGBUS;
> >>>    }
> >>>
> >>> +static int kvm_riscv_vcpu_get_reg_config(struct kvm_vcpu *vcpu,
> >>> +                                      const struct kvm_one_reg *reg)
> >>> +{
> >>> +     unsigned long __user *uaddr =
> >>> +                     (unsigned long __user *)(unsigned long)reg->addr;
> >>> +     unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
> >>> +                                         KVM_REG_SIZE_MASK |
> >>> +                                         KVM_REG_RISCV_CONFIG);
> >>> +     unsigned long reg_val;
> >>> +
> >>> +     if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
> >>> +             return -EINVAL;
> >>> +
> >>> +     switch (reg_num) {
> >>> +     case KVM_REG_RISCV_CONFIG_ISA:
> >>> +             reg_val = vcpu->arch.isa;
> >>> +             break;
> >>> +     case KVM_REG_RISCV_CONFIG_TIMEBASE:
> >>> +             reg_val = riscv_timebase;
> >>
> >> What does this reflect? The current guest time hopefully not? An offset?
> >> Related to what?
> >
> > riscv_timebase is the frequency in HZ of the system timer.
> >
> > The name "timebase" is not appropriate but we have been
> > carrying it since quite some time now.
>
> What do you mean by "some time"? So far I only see a kernel internal
> variable named after it. That's dramatically different from something
> exposed via uapi.
>
> Just name it tbfreq.

Sure, I will use TBFREQ name.

>
> So if this is the frequency, where is the offset? You will need it on
> save/restore. If you're saying that's out of scope for now, that's fine
> with me too :).

tbfreq is read-only and fixed.

The Guest tbfreq has to be same as Host tbfreq. This means we
can only migrate Guest from Host A to Host B only if:
1. They have matching ISA capabilities
2. They have matching tbfreq

Regards,
Anup

>
>
> Alex

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 13/20] RISC-V: KVM: Implement stage2 page table programming
  2019-08-22 14:09           ` Alexander Graf
@ 2019-08-23 11:21             ` Anup Patel
  0 siblings, 0 replies; 61+ messages in thread
From: Anup Patel @ 2019-08-23 11:21 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Daniel Lezcano, Thomas Gleixner, Atish Patra,
	Alistair Francis, Damien Le Moal, Christoph Hellwig, kvm,
	linux-riscv, linux-kernel

On Thu, Aug 22, 2019 at 7:39 PM Alexander Graf <graf@amazon.com> wrote:
>
>
>
> On 22.08.19 15:58, Anup Patel wrote:
> > On Thu, Aug 22, 2019 at 6:57 PM Alexander Graf <graf@amazon.com> wrote:
> >>
> >>
> >>
> >> On 22.08.19 14:38, Anup Patel wrote:
> >>> On Thu, Aug 22, 2019 at 5:58 PM Alexander Graf <graf@amazon.com> wrote:
> >>>>
> >>>> On 22.08.19 10:45, Anup Patel wrote:
> >>>>> This patch implements all required functions for programming
> >>>>> the stage2 page table for each Guest/VM.
> >>>>>
> >>>>> At high-level, the flow of stage2 related functions is similar
> >>>>> from KVM ARM/ARM64 implementation but the stage2 page table
> >>>>> format is quite different for KVM RISC-V.
> >>>>>
> >>>>> Signed-off-by: Anup Patel <anup.patel@wdc.com>
> >>>>> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> >>>>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> >>>>> ---
> >>>>>     arch/riscv/include/asm/kvm_host.h     |  10 +
> >>>>>     arch/riscv/include/asm/pgtable-bits.h |   1 +
> >>>>>     arch/riscv/kvm/mmu.c                  | 637 +++++++++++++++++++++++++-
> >>>>>     3 files changed, 638 insertions(+), 10 deletions(-)
> >>>>>
> >>>>> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> >>>>> index 3b09158f80f2..a37775c92586 100644
> >>>>> --- a/arch/riscv/include/asm/kvm_host.h
> >>>>> +++ b/arch/riscv/include/asm/kvm_host.h
> >>>>> @@ -72,6 +72,13 @@ struct kvm_mmio_decode {
> >>>>>         int shift;
> >>>>>     };
> >>>>>
> >>>>> +#define KVM_MMU_PAGE_CACHE_NR_OBJS   32
> >>>>> +
> >>>>> +struct kvm_mmu_page_cache {
> >>>>> +     int nobjs;
> >>>>> +     void *objects[KVM_MMU_PAGE_CACHE_NR_OBJS];
> >>>>> +};
> >>>>> +
> >>>>>     struct kvm_cpu_context {
> >>>>>         unsigned long zero;
> >>>>>         unsigned long ra;
> >>>>> @@ -163,6 +170,9 @@ struct kvm_vcpu_arch {
> >>>>>         /* MMIO instruction details */
> >>>>>         struct kvm_mmio_decode mmio_decode;
> >>>>>
> >>>>> +     /* Cache pages needed to program page tables with spinlock held */
> >>>>> +     struct kvm_mmu_page_cache mmu_page_cache;
> >>>>> +
> >>>>>         /* VCPU power-off state */
> >>>>>         bool power_off;
> >>>>>
> >>>>> diff --git a/arch/riscv/include/asm/pgtable-bits.h b/arch/riscv/include/asm/pgtable-bits.h
> >>>>> index bbaeb5d35842..be49d62fcc2b 100644
> >>>>> --- a/arch/riscv/include/asm/pgtable-bits.h
> >>>>> +++ b/arch/riscv/include/asm/pgtable-bits.h
> >>>>> @@ -26,6 +26,7 @@
> >>>>>
> >>>>>     #define _PAGE_SPECIAL   _PAGE_SOFT
> >>>>>     #define _PAGE_TABLE     _PAGE_PRESENT
> >>>>> +#define _PAGE_LEAF      (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
> >>>>>
> >>>>>     /*
> >>>>>      * _PAGE_PROT_NONE is set on not-present pages (and ignored by the hardware) to
> >>>>> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> >>>>> index 2b965f9aac07..9e95ab6769f6 100644
> >>>>> --- a/arch/riscv/kvm/mmu.c
> >>>>> +++ b/arch/riscv/kvm/mmu.c
> >>>>> @@ -18,6 +18,432 @@
> >>>>>     #include <asm/page.h>
> >>>>>     #include <asm/pgtable.h>
> >>>>>
> >>>>> +#ifdef CONFIG_64BIT
> >>>>> +#define stage2_have_pmd              true
> >>>>> +#define stage2_gpa_size              ((phys_addr_t)(1ULL << 39))
> >>>>> +#define stage2_cache_min_pages       2
> >>>>> +#else
> >>>>> +#define pmd_index(x)         0
> >>>>> +#define pfn_pmd(x, y)                ({ pmd_t __x = { 0 }; __x; })
> >>>>> +#define stage2_have_pmd              false
> >>>>> +#define stage2_gpa_size              ((phys_addr_t)(1ULL << 32))
> >>>>> +#define stage2_cache_min_pages       1
> >>>>> +#endif
> >>>>> +
> >>>>> +static int stage2_cache_topup(struct kvm_mmu_page_cache *pcache,
> >>>>> +                           int min, int max)
> >>>>> +{
> >>>>> +     void *page;
> >>>>> +
> >>>>> +     BUG_ON(max > KVM_MMU_PAGE_CACHE_NR_OBJS);
> >>>>> +     if (pcache->nobjs >= min)
> >>>>> +             return 0;
> >>>>> +     while (pcache->nobjs < max) {
> >>>>> +             page = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
> >>>>> +             if (!page)
> >>>>> +                     return -ENOMEM;
> >>>>> +             pcache->objects[pcache->nobjs++] = page;
> >>>>> +     }
> >>>>> +
> >>>>> +     return 0;
> >>>>> +}
> >>>>> +
> >>>>> +static void stage2_cache_flush(struct kvm_mmu_page_cache *pcache)
> >>>>> +{
> >>>>> +     while (pcache && pcache->nobjs)
> >>>>> +             free_page((unsigned long)pcache->objects[--pcache->nobjs]);
> >>>>> +}
> >>>>> +
> >>>>> +static void *stage2_cache_alloc(struct kvm_mmu_page_cache *pcache)
> >>>>> +{
> >>>>> +     void *p;
> >>>>> +
> >>>>> +     if (!pcache)
> >>>>> +             return NULL;
> >>>>> +
> >>>>> +     BUG_ON(!pcache->nobjs);
> >>>>> +     p = pcache->objects[--pcache->nobjs];
> >>>>> +
> >>>>> +     return p;
> >>>>> +}
> >>>>> +
> >>>>> +struct local_guest_tlb_info {
> >>>>> +     struct kvm_vmid *vmid;
> >>>>> +     gpa_t addr;
> >>>>> +};
> >>>>> +
> >>>>> +static void local_guest_tlb_flush_vmid_gpa(void *info)
> >>>>> +{
> >>>>> +     struct local_guest_tlb_info *infop = info;
> >>>>> +
> >>>>> +     __kvm_riscv_hfence_gvma_vmid_gpa(READ_ONCE(infop->vmid->vmid_version),
> >>>>> +                                      infop->addr);
> >>>>> +}
> >>>>> +
> >>>>> +static void stage2_remote_tlb_flush(struct kvm *kvm, gpa_t addr)
> >>>>> +{
> >>>>> +     struct local_guest_tlb_info info;
> >>>>> +     struct kvm_vmid *vmid = &kvm->arch.vmid;
> >>>>> +
> >>>>> +     /* TODO: This should be SBI call */
> >>>>> +     info.vmid = vmid;
> >>>>> +     info.addr = addr;
> >>>>> +     preempt_disable();
> >>>>> +     smp_call_function_many(cpu_all_mask, local_guest_tlb_flush_vmid_gpa,
> >>>>> +                            &info, true);
> >>>>
> >>>> This is all nice and dandy on the toy 4 core systems we have today, but
> >>>> it will become a bottleneck further down the road.
> >>>>
> >>>> How many VMIDs do you have? Could you just allocate a new one every time
> >>>> you switch host CPUs? Then you know exactly which CPUs to flush by
> >>>> looking at all your vcpu structs and a local field that tells you which
> >>>> pCPU they're on at this moment.
> >>>>
> >>>> Either way, it's nothing that should block inclusion. For today, we're fine.
> >>>
> >>> We are not happy about this either.
> >>>
> >>> Other two options, we have are:
> >>> 1. Have SBI calls for remote HFENCEs
> >>> 2. Propose RISC-V ISA extension for remote FENCEs
> >>>
> >>> Option1 is mostly extending SBI spec and implementing it in runtime
> >>> firmware.
> >>>
> >>> Option2 is ideal solution but requires consensus among wider audience
> >>> in RISC-V foundation.
> >>>
> >>> At this point, we are fine with a simple solution.
> >>
> >> It's fine to explicitly IPI other CPUs to flush their TLBs. What is not
> >> fine is to IPI *all* CPUs to flush their TLBs.
> >
> > Ahh, this should have been cpu_online_mask instead of cpu_all_mask
> >
> > I will update this in next revision.
>
> What I was trying to say is that you only want to flush currently
> running other vcpus and add a hint for all the others saying "please
> flush the next time you come up".
>
> I think we had a mechanism for that somewhere in the EVENT magic.
>
> But as I said, this is a performance optimization - that's something I'm
> happy to delay. Security and user space ABI are the bits I'm worried
> about at this stage.

I had thought about this previously. This will be certainly a good
optimization. Let me add a TODO comment here so that we don't
forget it.

Regards,
Anup

>
>
> Alex

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 00/20] KVM RISC-V Support
  2019-08-23  8:08 ` [PATCH v5 00/20] KVM RISC-V Support Alexander Graf
@ 2019-08-23 11:25   ` Anup Patel
  2019-08-23 11:44     ` Graf (AWS), Alexander
  0 siblings, 1 reply; 61+ messages in thread
From: Anup Patel @ 2019-08-23 11:25 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Daniel Lezcano, Thomas Gleixner, Atish Patra,
	Alistair Francis, Damien Le Moal, Christoph Hellwig, kvm,
	linux-riscv, linux-kernel

On Fri, Aug 23, 2019 at 1:39 PM Alexander Graf <graf@amazon.com> wrote:
>
> On 22.08.19 10:42, Anup Patel wrote:
> > This series adds initial KVM RISC-V support. Currently, we are able to boot
> > RISC-V 64bit Linux Guests with multiple VCPUs.
> >
> > Few key aspects of KVM RISC-V added by this series are:
> > 1. Minimal possible KVM world-switch which touches only GPRs and few CSRs.
> > 2. Full Guest/VM switch is done via vcpu_get/vcpu_put infrastructure.
> > 3. KVM ONE_REG interface for VCPU register access from user-space.
> > 4. PLIC emulation is done in user-space. In-kernel PLIC emulation, will
> >     be added in future.
> > 5. Timer and IPI emuation is done in-kernel.
> > 6. MMU notifiers supported.
> > 7. FP lazy save/restore supported.
> > 8. SBI v0.1 emulation for KVM Guest available.
> >
> > Here's a brief TODO list which we will work upon after this series:
> > 1. Handle trap from unpriv access in reading Guest instruction
> > 2. Handle trap from unpriv access in SBI v0.1 emulation
> > 3. Implement recursive stage2 page table programing
> > 4. SBI v0.2 emulation in-kernel
> > 5. SBI v0.2 hart hotplug emulation in-kernel
> > 6. In-kernel PLIC emulation
> > 7. ..... and more .....
>
> Please consider patches I did not comment on as
>
> Reviewed-by: Alexander Graf <graf@amazon.com>
>
> Overall, I'm quite happy with the code. It's a very clean implementation
> of a KVM target.

Thanks Alex.

>
> The only major nit I have is the guest address space read: I don't think
> we should pull in code that we know allows user space to DOS the kernel.
> For that, we need to find an alternative. Either you implement a
> software page table walker and resolve VAs manually or you find a way to
> ensure that *any* exception taken during the read does not affect
> general code execution.

I will send v6 next week. I will try my best to implement unpriv trap
handling in v6 itself.

Regards,
Anup

>
>
> Thanks,
>
> Alex

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 15/20] RISC-V: KVM: Add timer functionality
  2019-08-23 11:04     ` Anup Patel
@ 2019-08-23 11:33       ` Graf (AWS), Alexander
  2019-08-23 11:46         ` Anup Patel
  0 siblings, 1 reply; 61+ messages in thread
From: Graf (AWS), Alexander @ 2019-08-23 11:33 UTC (permalink / raw)
  To: Anup Patel
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Daniel Lezcano, Thomas Gleixner, Atish Patra,
	Alistair Francis, Damien Le Moal, Christoph Hellwig, kvm,
	linux-riscv, linux-kernel



> Am 23.08.2019 um 13:05 schrieb Anup Patel <anup@brainfault.org>:
> 
>> On Fri, Aug 23, 2019 at 1:23 PM Alexander Graf <graf@amazon.com> wrote:
>> 
>>> On 22.08.19 10:46, Anup Patel wrote:
>>> From: Atish Patra <atish.patra@wdc.com>
>>> 
>>> The RISC-V hypervisor specification doesn't have any virtual timer
>>> feature.
>>> 
>>> Due to this, the guest VCPU timer will be programmed via SBI calls.
>>> The host will use a separate hrtimer event for each guest VCPU to
>>> provide timer functionality. We inject a virtual timer interrupt to
>>> the guest VCPU whenever the guest VCPU hrtimer event expires.
>>> 
>>> The following features are not supported yet and will be added in
>>> future:
>>> 1. A time offset to adjust guest time from host time
>>> 2. A saved next event in guest vcpu for vm migration
>> 
>> Implementing these 2 bits right now should be trivial. Why wait?
> 
> We were waiting for HTIMEDELTA CSR to be merged so we
> deferred this items.
> 
>> 
>>> 
>>> Signed-off-by: Atish Patra <atish.patra@wdc.com>
>>> Signed-off-by: Anup Patel <anup.patel@wdc.com>
>>> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
>>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>>  arch/riscv/include/asm/kvm_host.h       |   4 +
>>>  arch/riscv/include/asm/kvm_vcpu_timer.h |  32 +++++++
>>>  arch/riscv/kvm/Makefile                 |   2 +-
>>>  arch/riscv/kvm/vcpu.c                   |   6 ++
>>>  arch/riscv/kvm/vcpu_timer.c             | 106 ++++++++++++++++++++++++
>>>  drivers/clocksource/timer-riscv.c       |   8 ++
>>>  include/clocksource/timer-riscv.h       |  16 ++++
>>>  7 files changed, 173 insertions(+), 1 deletion(-)
>>>  create mode 100644 arch/riscv/include/asm/kvm_vcpu_timer.h
>>>  create mode 100644 arch/riscv/kvm/vcpu_timer.c
>>>  create mode 100644 include/clocksource/timer-riscv.h
>>> 
>>> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
>>> index ab33e59a3d88..d2a2e45eefc0 100644
>>> --- a/arch/riscv/include/asm/kvm_host.h
>>> +++ b/arch/riscv/include/asm/kvm_host.h
>>> @@ -12,6 +12,7 @@
>>>  #include <linux/types.h>
>>>  #include <linux/kvm.h>
>>>  #include <linux/kvm_types.h>
>>> +#include <asm/kvm_vcpu_timer.h>
>>> 
>>>  #ifdef CONFIG_64BIT
>>>  #define KVM_MAX_VCPUS                       (1U << 16)
>>> @@ -167,6 +168,9 @@ struct kvm_vcpu_arch {
>>>      unsigned long irqs_pending;
>>>      unsigned long irqs_pending_mask;
>>> 
>>> +     /* VCPU Timer */
>>> +     struct kvm_vcpu_timer timer;
>>> +
>>>      /* MMIO instruction details */
>>>      struct kvm_mmio_decode mmio_decode;
>>> 
>>> diff --git a/arch/riscv/include/asm/kvm_vcpu_timer.h b/arch/riscv/include/asm/kvm_vcpu_timer.h
>>> new file mode 100644
>>> index 000000000000..df67ea86988e
>>> --- /dev/null
>>> +++ b/arch/riscv/include/asm/kvm_vcpu_timer.h
>>> @@ -0,0 +1,32 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>> +/*
>>> + * Copyright (C) 2019 Western Digital Corporation or its affiliates.
>>> + *
>>> + * Authors:
>>> + *   Atish Patra <atish.patra@wdc.com>
>>> + */
>>> +
>>> +#ifndef __KVM_VCPU_RISCV_TIMER_H
>>> +#define __KVM_VCPU_RISCV_TIMER_H
>>> +
>>> +#include <linux/hrtimer.h>
>>> +
>>> +#define VCPU_TIMER_PROGRAM_THRESHOLD_NS 1000
>>> +
>>> +struct kvm_vcpu_timer {
>>> +     bool init_done;
>>> +     /* Check if the timer is programmed */
>>> +     bool is_set;
>>> +     struct hrtimer hrt;
>>> +     /* Mult & Shift values to get nanosec from cycles */
>>> +     u32 mult;
>>> +     u32 shift;
>>> +};
>>> +
>>> +int kvm_riscv_vcpu_timer_init(struct kvm_vcpu *vcpu);
>>> +int kvm_riscv_vcpu_timer_deinit(struct kvm_vcpu *vcpu);
>>> +int kvm_riscv_vcpu_timer_reset(struct kvm_vcpu *vcpu);
>>> +int kvm_riscv_vcpu_timer_next_event(struct kvm_vcpu *vcpu,
>>> +                                 unsigned long ncycles);
>> 
>> This function never gets called?
> 
> It's called from SBI emulation.
> 
>> 
>>> +
>>> +#endif
>>> diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
>>> index c0f57f26c13d..3e0c7558320d 100644
>>> --- a/arch/riscv/kvm/Makefile
>>> +++ b/arch/riscv/kvm/Makefile
>>> @@ -9,6 +9,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
>>>  kvm-objs := $(common-objs-y)
>>> 
>>>  kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
>>> -kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o
>>> +kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o
>>> 
>>>  obj-$(CONFIG_KVM)   += kvm.o
>>> diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
>>> index 6124077d154f..018fca436776 100644
>>> --- a/arch/riscv/kvm/vcpu.c
>>> +++ b/arch/riscv/kvm/vcpu.c
>>> @@ -54,6 +54,8 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
>>> 
>>>      memcpy(cntx, reset_cntx, sizeof(*cntx));
>>> 
>>> +     kvm_riscv_vcpu_timer_reset(vcpu);
>>> +
>>>      WRITE_ONCE(vcpu->arch.irqs_pending, 0);
>>>      WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
>>>  }
>>> @@ -108,6 +110,9 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>>>      cntx->hstatus |= HSTATUS_SP2P;
>>>      cntx->hstatus |= HSTATUS_SPV;
>>> 
>>> +     /* Setup VCPU timer */
>>> +     kvm_riscv_vcpu_timer_init(vcpu);
>>> +
>>>      /* Reset VCPU */
>>>      kvm_riscv_reset_vcpu(vcpu);
>>> 
>>> @@ -116,6 +121,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>>> 
>>>  void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>>>  {
>>> +     kvm_riscv_vcpu_timer_deinit(vcpu);
>>>      kvm_riscv_stage2_flush_cache(vcpu);
>>>      kmem_cache_free(kvm_vcpu_cache, vcpu);
>>>  }
>>> diff --git a/arch/riscv/kvm/vcpu_timer.c b/arch/riscv/kvm/vcpu_timer.c
>>> new file mode 100644
>>> index 000000000000..a45ca06e1aa6
>>> --- /dev/null
>>> +++ b/arch/riscv/kvm/vcpu_timer.c
>>> @@ -0,0 +1,106 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/*
>>> + * Copyright (C) 2019 Western Digital Corporation or its affiliates.
>>> + *
>>> + * Authors:
>>> + *     Atish Patra <atish.patra@wdc.com>
>>> + */
>>> +
>>> +#include <linux/errno.h>
>>> +#include <linux/err.h>
>>> +#include <linux/kvm_host.h>
>>> +#include <clocksource/timer-riscv.h>
>>> +#include <asm/csr.h>
>>> +#include <asm/kvm_vcpu_timer.h>
>>> +
>>> +static enum hrtimer_restart kvm_riscv_vcpu_hrtimer_expired(struct hrtimer *h)
>>> +{
>>> +     struct kvm_vcpu_timer *t = container_of(h, struct kvm_vcpu_timer, hrt);
>>> +     struct kvm_vcpu *vcpu = container_of(t, struct kvm_vcpu, arch.timer);
>>> +
>>> +     t->is_set = false;
>>> +     kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_S_TIMER);
>>> +
>>> +     return HRTIMER_NORESTART;
>>> +}
>>> +
>>> +static u64 kvm_riscv_delta_cycles2ns(u64 cycles, struct kvm_vcpu_timer *t)
>>> +{
>>> +     unsigned long flags;
>>> +     u64 cycles_now, cycles_delta, delta_ns;
>>> +
>>> +     local_irq_save(flags);
>>> +     cycles_now = get_cycles64();
>>> +     if (cycles_now < cycles)
>>> +             cycles_delta = cycles - cycles_now;
>>> +     else
>>> +             cycles_delta = 0;
>>> +     delta_ns = (cycles_delta * t->mult) >> t->shift;
>>> +     local_irq_restore(flags);
>>> +
>>> +     return delta_ns;
>>> +}
>>> +
>>> +static int kvm_riscv_vcpu_timer_cancel(struct kvm_vcpu_timer *t)
>>> +{
>>> +     if (!t->init_done || !t->is_set)
>>> +             return -EINVAL;
>>> +
>>> +     hrtimer_cancel(&t->hrt);
>>> +     t->is_set = false;
>>> +
>>> +     return 0;
>>> +}
>>> +
>>> +int kvm_riscv_vcpu_timer_next_event(struct kvm_vcpu *vcpu,
>>> +                                 unsigned long ncycles)
>>> +{
>>> +     struct kvm_vcpu_timer *t = &vcpu->arch.timer;
>>> +     u64 delta_ns = kvm_riscv_delta_cycles2ns(ncycles, t);
>> 
>> ... in fact, I feel like I'm missing something obvious here. How does
>> the guest trigger the timer event? What is the argument it uses for that
>> and how does that play with the tbfreq in the earlier patch?
> 
> We have SBI call inferface between Hypervisor and Guest. One of the
> SBI call allows Guest to program time event. The next event is specified
> as absolute cycles. The Guest can read time using TIME CSR which
> returns system timer value (@ tbfreq freqency).
> 
> Guest Linux will know the tbfreq from DTB passed by QEMU/KVMTOOL
> and it has to be same as Host tbfreq.
> 
> The TBFREQ config register visible to user-space is a read-only CONFIG
> register which tells user-space tools (QEMU/KVMTOOL) about Host tbfreq.

And it's read-only because you can not trap on TB reads?

Alex

> 
> Regards,
> Anup
> 
>> 
>> 
>> Alex
>> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 18/20] RISC-V: KVM: Add SBI v0.1 support
  2019-08-23 11:17     ` Anup Patel
@ 2019-08-23 11:38       ` Graf (AWS), Alexander
  2019-08-23 12:00         ` Anup Patel
  0 siblings, 1 reply; 61+ messages in thread
From: Graf (AWS), Alexander @ 2019-08-23 11:38 UTC (permalink / raw)
  To: Anup Patel
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Damien Le Moal, kvm, Daniel Lezcano, linux-kernel,
	Christoph Hellwig, Atish Patra, Alistair Francis,
	Thomas Gleixner, linux-riscv



> Am 23.08.2019 um 13:18 schrieb Anup Patel <anup@brainfault.org>:
> 
>> On Fri, Aug 23, 2019 at 1:34 PM Alexander Graf <graf@amazon.com> wrote:
>> 
>>> On 22.08.19 10:46, Anup Patel wrote:
>>> From: Atish Patra <atish.patra@wdc.com>
>>> 
>>> The KVM host kernel running in HS-mode needs to handle SBI calls coming
>>> from guest kernel running in VS-mode.
>>> 
>>> This patch adds SBI v0.1 support in KVM RISC-V. All the SBI calls are
>>> implemented correctly except remote tlb flushes. For remote TLB flushes,
>>> we are doing full TLB flush and this will be optimized in future.
>>> 
>>> Signed-off-by: Atish Patra <atish.patra@wdc.com>
>>> Signed-off-by: Anup Patel <anup.patel@wdc.com>
>>> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
>>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>>  arch/riscv/include/asm/kvm_host.h |   2 +
>>>  arch/riscv/kvm/Makefile           |   2 +-
>>>  arch/riscv/kvm/vcpu_exit.c        |   3 +
>>>  arch/riscv/kvm/vcpu_sbi.c         | 119 ++++++++++++++++++++++++++++++
>>>  4 files changed, 125 insertions(+), 1 deletion(-)
>>>  create mode 100644 arch/riscv/kvm/vcpu_sbi.c
>>> 
>>> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
>>> index 2af3a179c08e..0b1eceaef59f 100644
>>> --- a/arch/riscv/include/asm/kvm_host.h
>>> +++ b/arch/riscv/include/asm/kvm_host.h
>>> @@ -241,4 +241,6 @@ bool kvm_riscv_vcpu_has_interrupt(struct kvm_vcpu *vcpu);
>>>  void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu);
>>>  void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu);
>>> 
>>> +int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu);
>>> +
>>>  #endif /* __RISCV_KVM_HOST_H__ */
>>> diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
>>> index 3e0c7558320d..b56dc1650d2c 100644
>>> --- a/arch/riscv/kvm/Makefile
>>> +++ b/arch/riscv/kvm/Makefile
>>> @@ -9,6 +9,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
>>>  kvm-objs := $(common-objs-y)
>>> 
>>>  kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
>>> -kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o
>>> +kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o vcpu_sbi.o
>>> 
>>>  obj-$(CONFIG_KVM)   += kvm.o
>>> diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
>>> index fbc04fe335ad..87b83fcf9a14 100644
>>> --- a/arch/riscv/kvm/vcpu_exit.c
>>> +++ b/arch/riscv/kvm/vcpu_exit.c
>>> @@ -534,6 +534,9 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>>>                  (vcpu->arch.guest_context.hstatus & HSTATUS_STL))
>>>                      ret = stage2_page_fault(vcpu, run, scause, stval);
>>>              break;
>>> +     case EXC_SUPERVISOR_SYSCALL:
>>> +             if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV)
>>> +                     ret = kvm_riscv_vcpu_sbi_ecall(vcpu);
>>>      default:
>>>              break;
>>>      };
>>> diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
>>> new file mode 100644
>>> index 000000000000..5793202eb514
>>> --- /dev/null
>>> +++ b/arch/riscv/kvm/vcpu_sbi.c
>>> @@ -0,0 +1,119 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/**
>>> + * Copyright (c) 2019 Western Digital Corporation or its affiliates.
>>> + *
>>> + * Authors:
>>> + *     Atish Patra <atish.patra@wdc.com>
>>> + */
>>> +
>>> +#include <linux/errno.h>
>>> +#include <linux/err.h>
>>> +#include <linux/kvm_host.h>
>>> +#include <asm/csr.h>
>>> +#include <asm/kvm_vcpu_timer.h>
>>> +
>>> +#define SBI_VERSION_MAJOR                    0
>>> +#define SBI_VERSION_MINOR                    1
>>> +
>>> +/* TODO: Handle traps due to unpriv load and redirect it back to VS-mode */
>> 
>> Ugh, another one of those? Can't you just figure out a way to recover
>> from the page fault? Also, you want to combine this with the instruction
>> load logic, so that we have a single place that guest address space
>> reads go through.
> 
> Walking Guest page table would be more expensive compared to implementing
> a trap handling mechanism.
> 
> We will be adding trap handling mechanism for reading instruction and reading
> load.
> 
> Both these operations are different in following ways:
> 1. RISC-V instructions are variable length. We get to know exact instruction
>    length only after reading first 16bits
> 2. We need to set VSSTATUS.MXR bit when reading instruction for
>    execute-only Guest pages.

Yup, sounds like you could solve that with a trivial if() based on "read instruction" or not, no? If you want to, feel free to provide short versions that do only read ins/data, but I would really like to see the whole "data reads become guest reads" magic to be funneled through a single function (in C, can be inline unrolled in asm of course)

> 
>> 
>>> +static unsigned long kvm_sbi_unpriv_load(const unsigned long *addr,
>>> +                                      struct kvm_vcpu *vcpu)
>>> +{
>>> +     unsigned long flags, val;
>>> +     unsigned long __hstatus, __sstatus;
>>> +
>>> +     local_irq_save(flags);
>>> +     __hstatus = csr_read(CSR_HSTATUS);
>>> +     __sstatus = csr_read(CSR_SSTATUS);
>>> +     csr_write(CSR_HSTATUS, vcpu->arch.guest_context.hstatus | HSTATUS_SPRV);
>>> +     csr_write(CSR_SSTATUS, vcpu->arch.guest_context.sstatus);
>>> +     val = *addr;
>>> +     csr_write(CSR_HSTATUS, __hstatus);
>>> +     csr_write(CSR_SSTATUS, __sstatus);
>>> +     local_irq_restore(flags);
>>> +
>>> +     return val;
>>> +}
>>> +
>>> +static void kvm_sbi_system_shutdown(struct kvm_vcpu *vcpu, u32 type)
>>> +{
>>> +     int i;
>>> +     struct kvm_vcpu *tmp;
>>> +
>>> +     kvm_for_each_vcpu(i, tmp, vcpu->kvm)
>>> +             tmp->arch.power_off = true;
>>> +     kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
>>> +
>>> +     memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
>>> +     vcpu->run->system_event.type = type;
>>> +     vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
>>> +}
>>> +
>>> +int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu)
>>> +{
>>> +     int ret = 1;
>>> +     u64 next_cycle;
>>> +     int vcpuid;
>>> +     struct kvm_vcpu *remote_vcpu;
>>> +     ulong dhart_mask;
>>> +     struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
>>> +
>>> +     if (!cp)
>>> +             return -EINVAL;
>>> +     switch (cp->a7) {
>>> +     case SBI_SET_TIMER:
>>> +#if __riscv_xlen == 32
>>> +             next_cycle = ((u64)cp->a1 << 32) | (u64)cp->a0;
>>> +#else
>>> +             next_cycle = (u64)cp->a0;
>>> +#endif
>>> +             kvm_riscv_vcpu_timer_next_event(vcpu, next_cycle);
>> 
>> Ah, this is where the timer set happens. I still don't understand how
>> this takes the frequency bit into account?
> 
> Explained it in PATCH17 comments.
> 
>> 
>>> +             break;
>>> +     case SBI_CONSOLE_PUTCHAR:
>>> +             /* Not implemented */
>>> +             cp->a0 = -ENOTSUPP;
>>> +             break;
>>> +     case SBI_CONSOLE_GETCHAR:
>>> +             /* Not implemented */
>>> +             cp->a0 = -ENOTSUPP;
>>> +             break;
>> 
>> These two should be covered by the default case.
> 
> Sure, I will update.
> 
>> 
>>> +     case SBI_CLEAR_IPI:
>>> +             kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_S_SOFT);
>>> +             break;
>>> +     case SBI_SEND_IPI:
>>> +             dhart_mask = kvm_sbi_unpriv_load((unsigned long *)cp->a0, vcpu);
>>> +             for_each_set_bit(vcpuid, &dhart_mask, BITS_PER_LONG) {
>>> +                     remote_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, vcpuid);
>>> +                     kvm_riscv_vcpu_set_interrupt(remote_vcpu, IRQ_S_SOFT);
>>> +             }
>>> +             break;
>>> +     case SBI_SHUTDOWN:
>>> +             kvm_sbi_system_shutdown(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN);
>>> +             ret = 0;
>>> +             break;
>>> +     case SBI_REMOTE_FENCE_I:
>>> +             sbi_remote_fence_i(NULL);
>>> +             break;
>>> +     /*
>>> +      * TODO: There should be a way to call remote hfence.bvma.
>>> +      * Preferred method is now a SBI call. Until then, just flush
>>> +      * all tlbs.
>>> +      */
>>> +     case SBI_REMOTE_SFENCE_VMA:
>>> +             /*TODO: Parse vma range.*/
>>> +             sbi_remote_sfence_vma(NULL, 0, 0);
>>> +             break;
>>> +     case SBI_REMOTE_SFENCE_VMA_ASID:
>>> +             /*TODO: Parse vma range for given ASID */
>>> +             sbi_remote_sfence_vma(NULL, 0, 0);
>>> +             break;
>>> +     default:
>>> +             cp->a0 = ENOTSUPP;
>>> +             break;
>> 
>> Please just send unsupported SBI events into user space.
> 
> For unsupported SBI calls, we should be returning error to the
> Guest Linux so that do something about it. This is in accordance
> with the SBI spec.

That's up to user space (QEMU / kvmtool) to decide. If user space wants to implement the  console functions (like we do on s390), it should have the chance to do so.

Alex


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 08/20] RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls
  2019-08-23 11:20         ` Anup Patel
@ 2019-08-23 11:42           ` Graf (AWS), Alexander
  0 siblings, 0 replies; 61+ messages in thread
From: Graf (AWS), Alexander @ 2019-08-23 11:42 UTC (permalink / raw)
  To: Anup Patel
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Daniel Lezcano, Thomas Gleixner, Atish Patra,
	Alistair Francis, Damien Le Moal, Christoph Hellwig, kvm,
	linux-riscv, linux-kernel



> Am 23.08.2019 um 13:21 schrieb Anup Patel <anup@brainfault.org>:
> 
>> On Thu, Aug 22, 2019 at 7:42 PM Alexander Graf <graf@amazon.com> wrote:
>> 
>> 
>> 
>>> On 22.08.19 16:00, Anup Patel wrote:
>>>> On Thu, Aug 22, 2019 at 5:31 PM Alexander Graf <graf@amazon.com> wrote:
>>>> 
>>>>> On 22.08.19 10:44, Anup Patel wrote:
>>>>> For KVM RISC-V, we use KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls to access
>>>>> VCPU config and registers from user-space.
>>>>> 
>>>>> We have three types of VCPU registers:
>>>>> 1. CONFIG - these are VCPU config and capabilities
>>>>> 2. CORE   - these are VCPU general purpose registers
>>>>> 3. CSR    - these are VCPU control and status registers
>>>>> 
>>>>> The CONFIG registers available to user-space are ISA and TIMEBASE. Out
>>>>> of these, TIMEBASE is a read-only register which inform user-space about
>>>>> VCPU timer base frequency. The ISA register is a read and write register
>>>>> where user-space can only write the desired VCPU ISA capabilities before
>>>>> running the VCPU.
>>>>> 
>>>>> The CORE registers available to user-space are PC, RA, SP, GP, TP, A0-A7,
>>>>> T0-T6, S0-S11 and MODE. Most of these are RISC-V general registers except
>>>>> PC and MODE. The PC register represents program counter whereas the MODE
>>>>> register represent VCPU privilege mode (i.e. S/U-mode).
>>>>> 
>>>>> The CSRs available to user-space are SSTATUS, SIE, STVEC, SSCRATCH, SEPC,
>>>>> SCAUSE, STVAL, SIP, and SATP. All of these are read/write registers.
>>>>> 
>>>>> In future, more VCPU register types will be added (such as FP) for the
>>>>> KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls.
>>>>> 
>>>>> Signed-off-by: Anup Patel <anup.patel@wdc.com>
>>>>> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
>>>>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>>>>> ---
>>>>>   arch/riscv/include/uapi/asm/kvm.h |  40 ++++-
>>>>>   arch/riscv/kvm/vcpu.c             | 235 +++++++++++++++++++++++++++++-
>>>>>   2 files changed, 272 insertions(+), 3 deletions(-)
>>>>> 
>>>>> diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
>>>>> index 6dbc056d58ba..024f220eb17e 100644
>>>>> --- a/arch/riscv/include/uapi/asm/kvm.h
>>>>> +++ b/arch/riscv/include/uapi/asm/kvm.h
>>>>> @@ -23,8 +23,15 @@
>>>>> 
>>>>>   /* for KVM_GET_REGS and KVM_SET_REGS */
>>>>>   struct kvm_regs {
>>>>> +     /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
>>>>> +     struct user_regs_struct regs;
>>>>> +     unsigned long mode;
>>>> 
>>>> Is there any particular reason you're reusing kvm_regs and don't invent
>>>> your own struct? kvm_regs is explicitly meant for the get_regs and
>>>> set_regs ioctls.
>>> 
>>> We are implementing only ONE_REG interface so most of these
>>> structs are unused hence we tried to reuse these struct instead
>>> of introducing new structs. (Similar to KVM ARM64)
>>> 
>>>> 
>>>>>   };
>>>>> 
>>>>> +/* Possible privilege modes for kvm_regs */
>>>>> +#define KVM_RISCV_MODE_S     1
>>>>> +#define KVM_RISCV_MODE_U     0
>>>>> +
>>>>>   /* for KVM_GET_FPU and KVM_SET_FPU */
>>>>>   struct kvm_fpu {
>>>>>   };
>>>>> @@ -41,10 +48,41 @@ struct kvm_guest_debug_arch {
>>>>>   struct kvm_sync_regs {
>>>>>   };
>>>>> 
>>>>> -/* dummy definition */
>>>>> +/* for KVM_GET_SREGS and KVM_SET_SREGS */
>>>>>   struct kvm_sregs {
>>>>> +     unsigned long sstatus;
>>>>> +     unsigned long sie;
>>>>> +     unsigned long stvec;
>>>>> +     unsigned long sscratch;
>>>>> +     unsigned long sepc;
>>>>> +     unsigned long scause;
>>>>> +     unsigned long stval;
>>>>> +     unsigned long sip;
>>>>> +     unsigned long satp;
>>>> 
>>>> Same comment here.
>>> 
>>> Same as above, we are trying to use unused struct.
>>> 
>>>> 
>>>>>   };
>>>>> 
>>>>> +#define KVM_REG_SIZE(id)             \
>>>>> +     (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
>>>>> +
>>>>> +/* If you need to interpret the index values, here is the key: */
>>>>> +#define KVM_REG_RISCV_TYPE_MASK              0x00000000FF000000
>>>>> +#define KVM_REG_RISCV_TYPE_SHIFT     24
>>>>> +
>>>>> +/* Config registers are mapped as type 1 */
>>>>> +#define KVM_REG_RISCV_CONFIG         (0x01 << KVM_REG_RISCV_TYPE_SHIFT)
>>>>> +#define KVM_REG_RISCV_CONFIG_ISA     0x0
>>>>> +#define KVM_REG_RISCV_CONFIG_TIMEBASE        0x1
>>>>> +
>>>>> +/* Core registers are mapped as type 2 */
>>>>> +#define KVM_REG_RISCV_CORE           (0x02 << KVM_REG_RISCV_TYPE_SHIFT)
>>>>> +#define KVM_REG_RISCV_CORE_REG(name) \
>>>>> +             (offsetof(struct kvm_regs, name) / sizeof(unsigned long))
>>>> 
>>>> I see, you're trying to implicitly use the struct offsets as index.
>>>> 
>>>> I'm not a really big fan of it, but I can't pinpoint exactly why just
>>>> yet. It just seems too magical (read: potentially breaking down the
>>>> road) for me.
>>>> 
>>>>> +
>>>>> +/* Control and status registers are mapped as type 3 */
>>>>> +#define KVM_REG_RISCV_CSR            (0x03 << KVM_REG_RISCV_TYPE_SHIFT)
>>>>> +#define KVM_REG_RISCV_CSR_REG(name)  \
>>>>> +             (offsetof(struct kvm_sregs, name) / sizeof(unsigned long))
>>>>> +
>>>>>   #endif
>>>>> 
>>>>>   #endif /* __LINUX_KVM_RISCV_H */
>>>>> diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
>>>>> index 7f59e85c6af8..9396a83c0611 100644
>>>>> --- a/arch/riscv/kvm/vcpu.c
>>>>> +++ b/arch/riscv/kvm/vcpu.c
>>>>> @@ -164,6 +164,215 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
>>>>>       return VM_FAULT_SIGBUS;
>>>>>   }
>>>>> 
>>>>> +static int kvm_riscv_vcpu_get_reg_config(struct kvm_vcpu *vcpu,
>>>>> +                                      const struct kvm_one_reg *reg)
>>>>> +{
>>>>> +     unsigned long __user *uaddr =
>>>>> +                     (unsigned long __user *)(unsigned long)reg->addr;
>>>>> +     unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
>>>>> +                                         KVM_REG_SIZE_MASK |
>>>>> +                                         KVM_REG_RISCV_CONFIG);
>>>>> +     unsigned long reg_val;
>>>>> +
>>>>> +     if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
>>>>> +             return -EINVAL;
>>>>> +
>>>>> +     switch (reg_num) {
>>>>> +     case KVM_REG_RISCV_CONFIG_ISA:
>>>>> +             reg_val = vcpu->arch.isa;
>>>>> +             break;
>>>>> +     case KVM_REG_RISCV_CONFIG_TIMEBASE:
>>>>> +             reg_val = riscv_timebase;
>>>> 
>>>> What does this reflect? The current guest time hopefully not? An offset?
>>>> Related to what?
>>> 
>>> riscv_timebase is the frequency in HZ of the system timer.
>>> 
>>> The name "timebase" is not appropriate but we have been
>>> carrying it since quite some time now.
>> 
>> What do you mean by "some time"? So far I only see a kernel internal
>> variable named after it. That's dramatically different from something
>> exposed via uapi.
>> 
>> Just name it tbfreq.
> 
> Sure, I will use TBFREQ name.
> 
>> 
>> So if this is the frequency, where is the offset? You will need it on
>> save/restore. If you're saying that's out of scope for now, that's fine
>> with me too :).
> 
> tbfreq is read-only and fixed.
> 
> The Guest tbfreq has to be same as Host tbfreq. This means we
> can only migrate Guest from Host A to Host B only if:
> 1. They have matching ISA capabilities

That's what we have on almost all archs, it's a fair statement.

> 2. They have matching tbfreq

This was true for most archs in the early virtualization days, but CPU vendors learned since then. It really makes people upset if they can not move their guests to a new CPU.

If you see bits in the spec that are missing (tb freq scaling / trapping on tb reads), please work with the ISA people to resolve them going forward.

Alex

> 
> Regards,
> Anup
> 
>> 
>> 
>> Alex

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 00/20] KVM RISC-V Support
  2019-08-23 11:25   ` Anup Patel
@ 2019-08-23 11:44     ` Graf (AWS), Alexander
  2019-08-23 12:10       ` Paolo Bonzini
  0 siblings, 1 reply; 61+ messages in thread
From: Graf (AWS), Alexander @ 2019-08-23 11:44 UTC (permalink / raw)
  To: Anup Patel
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Daniel Lezcano, Thomas Gleixner, Atish Patra,
	Alistair Francis, Damien Le Moal, Christoph Hellwig, kvm,
	linux-riscv, linux-kernel



> Am 23.08.2019 um 13:26 schrieb Anup Patel <anup@brainfault.org>:
> 
>> On Fri, Aug 23, 2019 at 1:39 PM Alexander Graf <graf@amazon.com> wrote:
>> 
>>> On 22.08.19 10:42, Anup Patel wrote:
>>> This series adds initial KVM RISC-V support. Currently, we are able to boot
>>> RISC-V 64bit Linux Guests with multiple VCPUs.
>>> 
>>> Few key aspects of KVM RISC-V added by this series are:
>>> 1. Minimal possible KVM world-switch which touches only GPRs and few CSRs.
>>> 2. Full Guest/VM switch is done via vcpu_get/vcpu_put infrastructure.
>>> 3. KVM ONE_REG interface for VCPU register access from user-space.
>>> 4. PLIC emulation is done in user-space. In-kernel PLIC emulation, will
>>>    be added in future.
>>> 5. Timer and IPI emuation is done in-kernel.
>>> 6. MMU notifiers supported.
>>> 7. FP lazy save/restore supported.
>>> 8. SBI v0.1 emulation for KVM Guest available.
>>> 
>>> Here's a brief TODO list which we will work upon after this series:
>>> 1. Handle trap from unpriv access in reading Guest instruction
>>> 2. Handle trap from unpriv access in SBI v0.1 emulation
>>> 3. Implement recursive stage2 page table programing
>>> 4. SBI v0.2 emulation in-kernel
>>> 5. SBI v0.2 hart hotplug emulation in-kernel
>>> 6. In-kernel PLIC emulation
>>> 7. ..... and more .....
>> 
>> Please consider patches I did not comment on as
>> 
>> Reviewed-by: Alexander Graf <graf@amazon.com>
>> 
>> Overall, I'm quite happy with the code. It's a very clean implementation
>> of a KVM target.
> 
> Thanks Alex.
> 
>> 
>> The only major nit I have is the guest address space read: I don't think
>> we should pull in code that we know allows user space to DOS the kernel.
>> For that, we need to find an alternative. Either you implement a
>> software page table walker and resolve VAs manually or you find a way to
>> ensure that *any* exception taken during the read does not affect
>> general code execution.
> 
> I will send v6 next week. I will try my best to implement unpriv trap
> handling in v6 itself.

Are you sure unpriv is the only exception that can hit there? What about NMIs? Do you have #MCs yet (ECC errors)? Do you have something like ARM's #SError which can asynchronously hit at any time because of external bus (PCI) errors?

Alex

> 
> Regards,
> Anup
> 
>> 
>> 
>> Thanks,
>> 
>> Alex

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 15/20] RISC-V: KVM: Add timer functionality
  2019-08-23 11:33       ` Graf (AWS), Alexander
@ 2019-08-23 11:46         ` Anup Patel
  2019-08-23 11:49           ` Alexander Graf
  0 siblings, 1 reply; 61+ messages in thread
From: Anup Patel @ 2019-08-23 11:46 UTC (permalink / raw)
  To: Graf (AWS), Alexander
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Daniel Lezcano, Thomas Gleixner, Atish Patra,
	Alistair Francis, Damien Le Moal, Christoph Hellwig, kvm,
	linux-riscv, linux-kernel

On Fri, Aug 23, 2019 at 5:03 PM Graf (AWS), Alexander <graf@amazon.com> wrote:
>
>
>
> > Am 23.08.2019 um 13:05 schrieb Anup Patel <anup@brainfault.org>:
> >
> >> On Fri, Aug 23, 2019 at 1:23 PM Alexander Graf <graf@amazon.com> wrote:
> >>
> >>> On 22.08.19 10:46, Anup Patel wrote:
> >>> From: Atish Patra <atish.patra@wdc.com>
> >>>
> >>> The RISC-V hypervisor specification doesn't have any virtual timer
> >>> feature.
> >>>
> >>> Due to this, the guest VCPU timer will be programmed via SBI calls.
> >>> The host will use a separate hrtimer event for each guest VCPU to
> >>> provide timer functionality. We inject a virtual timer interrupt to
> >>> the guest VCPU whenever the guest VCPU hrtimer event expires.
> >>>
> >>> The following features are not supported yet and will be added in
> >>> future:
> >>> 1. A time offset to adjust guest time from host time
> >>> 2. A saved next event in guest vcpu for vm migration
> >>
> >> Implementing these 2 bits right now should be trivial. Why wait?
> >
> > We were waiting for HTIMEDELTA CSR to be merged so we
> > deferred this items.
> >
> >>
> >>>
> >>> Signed-off-by: Atish Patra <atish.patra@wdc.com>
> >>> Signed-off-by: Anup Patel <anup.patel@wdc.com>
> >>> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> >>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> >>> ---
> >>>  arch/riscv/include/asm/kvm_host.h       |   4 +
> >>>  arch/riscv/include/asm/kvm_vcpu_timer.h |  32 +++++++
> >>>  arch/riscv/kvm/Makefile                 |   2 +-
> >>>  arch/riscv/kvm/vcpu.c                   |   6 ++
> >>>  arch/riscv/kvm/vcpu_timer.c             | 106 ++++++++++++++++++++++++
> >>>  drivers/clocksource/timer-riscv.c       |   8 ++
> >>>  include/clocksource/timer-riscv.h       |  16 ++++
> >>>  7 files changed, 173 insertions(+), 1 deletion(-)
> >>>  create mode 100644 arch/riscv/include/asm/kvm_vcpu_timer.h
> >>>  create mode 100644 arch/riscv/kvm/vcpu_timer.c
> >>>  create mode 100644 include/clocksource/timer-riscv.h
> >>>
> >>> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> >>> index ab33e59a3d88..d2a2e45eefc0 100644
> >>> --- a/arch/riscv/include/asm/kvm_host.h
> >>> +++ b/arch/riscv/include/asm/kvm_host.h
> >>> @@ -12,6 +12,7 @@
> >>>  #include <linux/types.h>
> >>>  #include <linux/kvm.h>
> >>>  #include <linux/kvm_types.h>
> >>> +#include <asm/kvm_vcpu_timer.h>
> >>>
> >>>  #ifdef CONFIG_64BIT
> >>>  #define KVM_MAX_VCPUS                       (1U << 16)
> >>> @@ -167,6 +168,9 @@ struct kvm_vcpu_arch {
> >>>      unsigned long irqs_pending;
> >>>      unsigned long irqs_pending_mask;
> >>>
> >>> +     /* VCPU Timer */
> >>> +     struct kvm_vcpu_timer timer;
> >>> +
> >>>      /* MMIO instruction details */
> >>>      struct kvm_mmio_decode mmio_decode;
> >>>
> >>> diff --git a/arch/riscv/include/asm/kvm_vcpu_timer.h b/arch/riscv/include/asm/kvm_vcpu_timer.h
> >>> new file mode 100644
> >>> index 000000000000..df67ea86988e
> >>> --- /dev/null
> >>> +++ b/arch/riscv/include/asm/kvm_vcpu_timer.h
> >>> @@ -0,0 +1,32 @@
> >>> +/* SPDX-License-Identifier: GPL-2.0-only */
> >>> +/*
> >>> + * Copyright (C) 2019 Western Digital Corporation or its affiliates.
> >>> + *
> >>> + * Authors:
> >>> + *   Atish Patra <atish.patra@wdc.com>
> >>> + */
> >>> +
> >>> +#ifndef __KVM_VCPU_RISCV_TIMER_H
> >>> +#define __KVM_VCPU_RISCV_TIMER_H
> >>> +
> >>> +#include <linux/hrtimer.h>
> >>> +
> >>> +#define VCPU_TIMER_PROGRAM_THRESHOLD_NS 1000
> >>> +
> >>> +struct kvm_vcpu_timer {
> >>> +     bool init_done;
> >>> +     /* Check if the timer is programmed */
> >>> +     bool is_set;
> >>> +     struct hrtimer hrt;
> >>> +     /* Mult & Shift values to get nanosec from cycles */
> >>> +     u32 mult;
> >>> +     u32 shift;
> >>> +};
> >>> +
> >>> +int kvm_riscv_vcpu_timer_init(struct kvm_vcpu *vcpu);
> >>> +int kvm_riscv_vcpu_timer_deinit(struct kvm_vcpu *vcpu);
> >>> +int kvm_riscv_vcpu_timer_reset(struct kvm_vcpu *vcpu);
> >>> +int kvm_riscv_vcpu_timer_next_event(struct kvm_vcpu *vcpu,
> >>> +                                 unsigned long ncycles);
> >>
> >> This function never gets called?
> >
> > It's called from SBI emulation.
> >
> >>
> >>> +
> >>> +#endif
> >>> diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
> >>> index c0f57f26c13d..3e0c7558320d 100644
> >>> --- a/arch/riscv/kvm/Makefile
> >>> +++ b/arch/riscv/kvm/Makefile
> >>> @@ -9,6 +9,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
> >>>  kvm-objs := $(common-objs-y)
> >>>
> >>>  kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
> >>> -kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o
> >>> +kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o
> >>>
> >>>  obj-$(CONFIG_KVM)   += kvm.o
> >>> diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
> >>> index 6124077d154f..018fca436776 100644
> >>> --- a/arch/riscv/kvm/vcpu.c
> >>> +++ b/arch/riscv/kvm/vcpu.c
> >>> @@ -54,6 +54,8 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
> >>>
> >>>      memcpy(cntx, reset_cntx, sizeof(*cntx));
> >>>
> >>> +     kvm_riscv_vcpu_timer_reset(vcpu);
> >>> +
> >>>      WRITE_ONCE(vcpu->arch.irqs_pending, 0);
> >>>      WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
> >>>  }
> >>> @@ -108,6 +110,9 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
> >>>      cntx->hstatus |= HSTATUS_SP2P;
> >>>      cntx->hstatus |= HSTATUS_SPV;
> >>>
> >>> +     /* Setup VCPU timer */
> >>> +     kvm_riscv_vcpu_timer_init(vcpu);
> >>> +
> >>>      /* Reset VCPU */
> >>>      kvm_riscv_reset_vcpu(vcpu);
> >>>
> >>> @@ -116,6 +121,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
> >>>
> >>>  void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> >>>  {
> >>> +     kvm_riscv_vcpu_timer_deinit(vcpu);
> >>>      kvm_riscv_stage2_flush_cache(vcpu);
> >>>      kmem_cache_free(kvm_vcpu_cache, vcpu);
> >>>  }
> >>> diff --git a/arch/riscv/kvm/vcpu_timer.c b/arch/riscv/kvm/vcpu_timer.c
> >>> new file mode 100644
> >>> index 000000000000..a45ca06e1aa6
> >>> --- /dev/null
> >>> +++ b/arch/riscv/kvm/vcpu_timer.c
> >>> @@ -0,0 +1,106 @@
> >>> +// SPDX-License-Identifier: GPL-2.0
> >>> +/*
> >>> + * Copyright (C) 2019 Western Digital Corporation or its affiliates.
> >>> + *
> >>> + * Authors:
> >>> + *     Atish Patra <atish.patra@wdc.com>
> >>> + */
> >>> +
> >>> +#include <linux/errno.h>
> >>> +#include <linux/err.h>
> >>> +#include <linux/kvm_host.h>
> >>> +#include <clocksource/timer-riscv.h>
> >>> +#include <asm/csr.h>
> >>> +#include <asm/kvm_vcpu_timer.h>
> >>> +
> >>> +static enum hrtimer_restart kvm_riscv_vcpu_hrtimer_expired(struct hrtimer *h)
> >>> +{
> >>> +     struct kvm_vcpu_timer *t = container_of(h, struct kvm_vcpu_timer, hrt);
> >>> +     struct kvm_vcpu *vcpu = container_of(t, struct kvm_vcpu, arch.timer);
> >>> +
> >>> +     t->is_set = false;
> >>> +     kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_S_TIMER);
> >>> +
> >>> +     return HRTIMER_NORESTART;
> >>> +}
> >>> +
> >>> +static u64 kvm_riscv_delta_cycles2ns(u64 cycles, struct kvm_vcpu_timer *t)
> >>> +{
> >>> +     unsigned long flags;
> >>> +     u64 cycles_now, cycles_delta, delta_ns;
> >>> +
> >>> +     local_irq_save(flags);
> >>> +     cycles_now = get_cycles64();
> >>> +     if (cycles_now < cycles)
> >>> +             cycles_delta = cycles - cycles_now;
> >>> +     else
> >>> +             cycles_delta = 0;
> >>> +     delta_ns = (cycles_delta * t->mult) >> t->shift;
> >>> +     local_irq_restore(flags);
> >>> +
> >>> +     return delta_ns;
> >>> +}
> >>> +
> >>> +static int kvm_riscv_vcpu_timer_cancel(struct kvm_vcpu_timer *t)
> >>> +{
> >>> +     if (!t->init_done || !t->is_set)
> >>> +             return -EINVAL;
> >>> +
> >>> +     hrtimer_cancel(&t->hrt);
> >>> +     t->is_set = false;
> >>> +
> >>> +     return 0;
> >>> +}
> >>> +
> >>> +int kvm_riscv_vcpu_timer_next_event(struct kvm_vcpu *vcpu,
> >>> +                                 unsigned long ncycles)
> >>> +{
> >>> +     struct kvm_vcpu_timer *t = &vcpu->arch.timer;
> >>> +     u64 delta_ns = kvm_riscv_delta_cycles2ns(ncycles, t);
> >>
> >> ... in fact, I feel like I'm missing something obvious here. How does
> >> the guest trigger the timer event? What is the argument it uses for that
> >> and how does that play with the tbfreq in the earlier patch?
> >
> > We have SBI call inferface between Hypervisor and Guest. One of the
> > SBI call allows Guest to program time event. The next event is specified
> > as absolute cycles. The Guest can read time using TIME CSR which
> > returns system timer value (@ tbfreq freqency).
> >
> > Guest Linux will know the tbfreq from DTB passed by QEMU/KVMTOOL
> > and it has to be same as Host tbfreq.
> >
> > The TBFREQ config register visible to user-space is a read-only CONFIG
> > register which tells user-space tools (QEMU/KVMTOOL) about Host tbfreq.
>
> And it's read-only because you can not trap on TB reads?

There is no TB registers.

The tbfreq can only be know through DT/ACPI kind-of HW description
for both Host and Guest.

The KVM user-space tool needs to know TBFREQ so that it can set correct
value in generated DT for Guest Linux.

Regards,
Anup

>
> Alex
>
> >
> > Regards,
> > Anup
> >
> >>
> >>
> >> Alex
> >>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 15/20] RISC-V: KVM: Add timer functionality
  2019-08-23 11:46         ` Anup Patel
@ 2019-08-23 11:49           ` Alexander Graf
  2019-08-23 12:11             ` Anup Patel
  0 siblings, 1 reply; 61+ messages in thread
From: Alexander Graf @ 2019-08-23 11:49 UTC (permalink / raw)
  To: Anup Patel
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Daniel Lezcano, Thomas Gleixner, Atish Patra,
	Alistair Francis, Damien Le Moal, Christoph Hellwig, kvm,
	linux-riscv, linux-kernel



On 23.08.19 13:46, Anup Patel wrote:
> On Fri, Aug 23, 2019 at 5:03 PM Graf (AWS), Alexander <graf@amazon.com> wrote:
>>
>>
>>
>>> Am 23.08.2019 um 13:05 schrieb Anup Patel <anup@brainfault.org>:
>>>
>>>> On Fri, Aug 23, 2019 at 1:23 PM Alexander Graf <graf@amazon.com> wrote:
>>>>
>>>>> On 22.08.19 10:46, Anup Patel wrote:
>>>>> From: Atish Patra <atish.patra@wdc.com>
>>>>>
>>>>> The RISC-V hypervisor specification doesn't have any virtual timer
>>>>> feature.
>>>>>
>>>>> Due to this, the guest VCPU timer will be programmed via SBI calls.
>>>>> The host will use a separate hrtimer event for each guest VCPU to
>>>>> provide timer functionality. We inject a virtual timer interrupt to
>>>>> the guest VCPU whenever the guest VCPU hrtimer event expires.
>>>>>
>>>>> The following features are not supported yet and will be added in
>>>>> future:
>>>>> 1. A time offset to adjust guest time from host time
>>>>> 2. A saved next event in guest vcpu for vm migration
>>>>
>>>> Implementing these 2 bits right now should be trivial. Why wait?
>>>

[...]

>>>> ... in fact, I feel like I'm missing something obvious here. How does
>>>> the guest trigger the timer event? What is the argument it uses for that
>>>> and how does that play with the tbfreq in the earlier patch?
>>>
>>> We have SBI call inferface between Hypervisor and Guest. One of the
>>> SBI call allows Guest to program time event. The next event is specified
>>> as absolute cycles. The Guest can read time using TIME CSR which
>>> returns system timer value (@ tbfreq freqency).
>>>
>>> Guest Linux will know the tbfreq from DTB passed by QEMU/KVMTOOL
>>> and it has to be same as Host tbfreq.
>>>
>>> The TBFREQ config register visible to user-space is a read-only CONFIG
>>> register which tells user-space tools (QEMU/KVMTOOL) about Host tbfreq.
>>
>> And it's read-only because you can not trap on TB reads?
> 
> There is no TB registers.
> 
> The tbfreq can only be know through DT/ACPI kind-of HW description
> for both Host and Guest.
> 
> The KVM user-space tool needs to know TBFREQ so that it can set correct
> value in generated DT for Guest Linux.

So what access methods do get influenced by TBFREQ? If it's only the SBI 
timer, we can control the frequency, which means we can make TBFREQ 
read/write.


Alex

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 18/20] RISC-V: KVM: Add SBI v0.1 support
  2019-08-23 11:38       ` Graf (AWS), Alexander
@ 2019-08-23 12:00         ` Anup Patel
  2019-08-23 12:19           ` Alexander Graf
  0 siblings, 1 reply; 61+ messages in thread
From: Anup Patel @ 2019-08-23 12:00 UTC (permalink / raw)
  To: Graf (AWS), Alexander
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Damien Le Moal, kvm, Daniel Lezcano, linux-kernel,
	Christoph Hellwig, Atish Patra, Alistair Francis,
	Thomas Gleixner, linux-riscv

On Fri, Aug 23, 2019 at 5:09 PM Graf (AWS), Alexander <graf@amazon.com> wrote:
>
>
>
> > Am 23.08.2019 um 13:18 schrieb Anup Patel <anup@brainfault.org>:
> >
> >> On Fri, Aug 23, 2019 at 1:34 PM Alexander Graf <graf@amazon.com> wrote:
> >>
> >>> On 22.08.19 10:46, Anup Patel wrote:
> >>> From: Atish Patra <atish.patra@wdc.com>
> >>>
> >>> The KVM host kernel running in HS-mode needs to handle SBI calls coming
> >>> from guest kernel running in VS-mode.
> >>>
> >>> This patch adds SBI v0.1 support in KVM RISC-V. All the SBI calls are
> >>> implemented correctly except remote tlb flushes. For remote TLB flushes,
> >>> we are doing full TLB flush and this will be optimized in future.
> >>>
> >>> Signed-off-by: Atish Patra <atish.patra@wdc.com>
> >>> Signed-off-by: Anup Patel <anup.patel@wdc.com>
> >>> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> >>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> >>> ---
> >>>  arch/riscv/include/asm/kvm_host.h |   2 +
> >>>  arch/riscv/kvm/Makefile           |   2 +-
> >>>  arch/riscv/kvm/vcpu_exit.c        |   3 +
> >>>  arch/riscv/kvm/vcpu_sbi.c         | 119 ++++++++++++++++++++++++++++++
> >>>  4 files changed, 125 insertions(+), 1 deletion(-)
> >>>  create mode 100644 arch/riscv/kvm/vcpu_sbi.c
> >>>
> >>> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> >>> index 2af3a179c08e..0b1eceaef59f 100644
> >>> --- a/arch/riscv/include/asm/kvm_host.h
> >>> +++ b/arch/riscv/include/asm/kvm_host.h
> >>> @@ -241,4 +241,6 @@ bool kvm_riscv_vcpu_has_interrupt(struct kvm_vcpu *vcpu);
> >>>  void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu);
> >>>  void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu);
> >>>
> >>> +int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu);
> >>> +
> >>>  #endif /* __RISCV_KVM_HOST_H__ */
> >>> diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
> >>> index 3e0c7558320d..b56dc1650d2c 100644
> >>> --- a/arch/riscv/kvm/Makefile
> >>> +++ b/arch/riscv/kvm/Makefile
> >>> @@ -9,6 +9,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
> >>>  kvm-objs := $(common-objs-y)
> >>>
> >>>  kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
> >>> -kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o
> >>> +kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o vcpu_sbi.o
> >>>
> >>>  obj-$(CONFIG_KVM)   += kvm.o
> >>> diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
> >>> index fbc04fe335ad..87b83fcf9a14 100644
> >>> --- a/arch/riscv/kvm/vcpu_exit.c
> >>> +++ b/arch/riscv/kvm/vcpu_exit.c
> >>> @@ -534,6 +534,9 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
> >>>                  (vcpu->arch.guest_context.hstatus & HSTATUS_STL))
> >>>                      ret = stage2_page_fault(vcpu, run, scause, stval);
> >>>              break;
> >>> +     case EXC_SUPERVISOR_SYSCALL:
> >>> +             if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV)
> >>> +                     ret = kvm_riscv_vcpu_sbi_ecall(vcpu);
> >>>      default:
> >>>              break;
> >>>      };
> >>> diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
> >>> new file mode 100644
> >>> index 000000000000..5793202eb514
> >>> --- /dev/null
> >>> +++ b/arch/riscv/kvm/vcpu_sbi.c
> >>> @@ -0,0 +1,119 @@
> >>> +// SPDX-License-Identifier: GPL-2.0
> >>> +/**
> >>> + * Copyright (c) 2019 Western Digital Corporation or its affiliates.
> >>> + *
> >>> + * Authors:
> >>> + *     Atish Patra <atish.patra@wdc.com>
> >>> + */
> >>> +
> >>> +#include <linux/errno.h>
> >>> +#include <linux/err.h>
> >>> +#include <linux/kvm_host.h>
> >>> +#include <asm/csr.h>
> >>> +#include <asm/kvm_vcpu_timer.h>
> >>> +
> >>> +#define SBI_VERSION_MAJOR                    0
> >>> +#define SBI_VERSION_MINOR                    1
> >>> +
> >>> +/* TODO: Handle traps due to unpriv load and redirect it back to VS-mode */
> >>
> >> Ugh, another one of those? Can't you just figure out a way to recover
> >> from the page fault? Also, you want to combine this with the instruction
> >> load logic, so that we have a single place that guest address space
> >> reads go through.
> >
> > Walking Guest page table would be more expensive compared to implementing
> > a trap handling mechanism.
> >
> > We will be adding trap handling mechanism for reading instruction and reading
> > load.
> >
> > Both these operations are different in following ways:
> > 1. RISC-V instructions are variable length. We get to know exact instruction
> >    length only after reading first 16bits
> > 2. We need to set VSSTATUS.MXR bit when reading instruction for
> >    execute-only Guest pages.
>
> Yup, sounds like you could solve that with a trivial if() based on "read instruction" or not, no? If you want to, feel free to provide short versions that do only read ins/data, but I would really like to see the whole "data reads become guest reads" magic to be funneled through a single function (in C, can be inline unrolled in asm of course)
>
> >
> >>
> >>> +static unsigned long kvm_sbi_unpriv_load(const unsigned long *addr,
> >>> +                                      struct kvm_vcpu *vcpu)
> >>> +{
> >>> +     unsigned long flags, val;
> >>> +     unsigned long __hstatus, __sstatus;
> >>> +
> >>> +     local_irq_save(flags);
> >>> +     __hstatus = csr_read(CSR_HSTATUS);
> >>> +     __sstatus = csr_read(CSR_SSTATUS);
> >>> +     csr_write(CSR_HSTATUS, vcpu->arch.guest_context.hstatus | HSTATUS_SPRV);
> >>> +     csr_write(CSR_SSTATUS, vcpu->arch.guest_context.sstatus);
> >>> +     val = *addr;
> >>> +     csr_write(CSR_HSTATUS, __hstatus);
> >>> +     csr_write(CSR_SSTATUS, __sstatus);
> >>> +     local_irq_restore(flags);
> >>> +
> >>> +     return val;
> >>> +}
> >>> +
> >>> +static void kvm_sbi_system_shutdown(struct kvm_vcpu *vcpu, u32 type)
> >>> +{
> >>> +     int i;
> >>> +     struct kvm_vcpu *tmp;
> >>> +
> >>> +     kvm_for_each_vcpu(i, tmp, vcpu->kvm)
> >>> +             tmp->arch.power_off = true;
> >>> +     kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
> >>> +
> >>> +     memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
> >>> +     vcpu->run->system_event.type = type;
> >>> +     vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
> >>> +}
> >>> +
> >>> +int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu)
> >>> +{
> >>> +     int ret = 1;
> >>> +     u64 next_cycle;
> >>> +     int vcpuid;
> >>> +     struct kvm_vcpu *remote_vcpu;
> >>> +     ulong dhart_mask;
> >>> +     struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
> >>> +
> >>> +     if (!cp)
> >>> +             return -EINVAL;
> >>> +     switch (cp->a7) {
> >>> +     case SBI_SET_TIMER:
> >>> +#if __riscv_xlen == 32
> >>> +             next_cycle = ((u64)cp->a1 << 32) | (u64)cp->a0;
> >>> +#else
> >>> +             next_cycle = (u64)cp->a0;
> >>> +#endif
> >>> +             kvm_riscv_vcpu_timer_next_event(vcpu, next_cycle);
> >>
> >> Ah, this is where the timer set happens. I still don't understand how
> >> this takes the frequency bit into account?
> >
> > Explained it in PATCH17 comments.
> >
> >>
> >>> +             break;
> >>> +     case SBI_CONSOLE_PUTCHAR:
> >>> +             /* Not implemented */
> >>> +             cp->a0 = -ENOTSUPP;
> >>> +             break;
> >>> +     case SBI_CONSOLE_GETCHAR:
> >>> +             /* Not implemented */
> >>> +             cp->a0 = -ENOTSUPP;
> >>> +             break;
> >>
> >> These two should be covered by the default case.
> >
> > Sure, I will update.
> >
> >>
> >>> +     case SBI_CLEAR_IPI:
> >>> +             kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_S_SOFT);
> >>> +             break;
> >>> +     case SBI_SEND_IPI:
> >>> +             dhart_mask = kvm_sbi_unpriv_load((unsigned long *)cp->a0, vcpu);
> >>> +             for_each_set_bit(vcpuid, &dhart_mask, BITS_PER_LONG) {
> >>> +                     remote_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, vcpuid);
> >>> +                     kvm_riscv_vcpu_set_interrupt(remote_vcpu, IRQ_S_SOFT);
> >>> +             }
> >>> +             break;
> >>> +     case SBI_SHUTDOWN:
> >>> +             kvm_sbi_system_shutdown(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN);
> >>> +             ret = 0;
> >>> +             break;
> >>> +     case SBI_REMOTE_FENCE_I:
> >>> +             sbi_remote_fence_i(NULL);
> >>> +             break;
> >>> +     /*
> >>> +      * TODO: There should be a way to call remote hfence.bvma.
> >>> +      * Preferred method is now a SBI call. Until then, just flush
> >>> +      * all tlbs.
> >>> +      */
> >>> +     case SBI_REMOTE_SFENCE_VMA:
> >>> +             /*TODO: Parse vma range.*/
> >>> +             sbi_remote_sfence_vma(NULL, 0, 0);
> >>> +             break;
> >>> +     case SBI_REMOTE_SFENCE_VMA_ASID:
> >>> +             /*TODO: Parse vma range for given ASID */
> >>> +             sbi_remote_sfence_vma(NULL, 0, 0);
> >>> +             break;
> >>> +     default:
> >>> +             cp->a0 = ENOTSUPP;
> >>> +             break;
> >>
> >> Please just send unsupported SBI events into user space.
> >
> > For unsupported SBI calls, we should be returning error to the
> > Guest Linux so that do something about it. This is in accordance
> > with the SBI spec.
>
> That's up to user space (QEMU / kvmtool) to decide. If user space wants to implement the  console functions (like we do on s390), it should have the chance to do so.

The SBI_CONSOLE_PUTCHAR and SBI_CONSOLE_GETCHAR are
for debugging only. These calls are deprecated in SBI v0.2 onwards
because we now have earlycon for early prints in Linux RISC-V.

The RISC-V Guest will generally have it's own MMIO based UART
which will be the default console.

Due to these reasons, we have not implemented these SBI calls.

If we still want user-space to implement this then we will require
separate exit reasons and we are trying to avoid adding RISC-V
specific exit reasons/ioctls in KVM user-space ABI.

The absence of SBI_CONSOLE_PUTCHAR/GETCHAR certainly
does not block anyone in debugging Guest Linux because we have
earlycon support in Linux RISC-V.

Regards,
Anup

>
> Alex
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 00/20] KVM RISC-V Support
  2019-08-23 11:44     ` Graf (AWS), Alexander
@ 2019-08-23 12:10       ` Paolo Bonzini
  2019-08-23 12:19         ` Anup Patel
  0 siblings, 1 reply; 61+ messages in thread
From: Paolo Bonzini @ 2019-08-23 12:10 UTC (permalink / raw)
  To: Graf (AWS), Alexander, Anup Patel
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Radim K,
	Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, kvm, linux-riscv,
	linux-kernel

On 23/08/19 13:44, Graf (AWS), Alexander wrote:
>> Overall, I'm quite happy with the code. It's a very clean implementation
>> of a KVM target.

Yup, I said the same even for v1 (I prefer recursive implementation of
page table walking but that's all I can say).

>> I will send v6 next week. I will try my best to implement unpriv
>> trap handling in v6 itself.
> Are you sure unpriv is the only exception that can hit there? What
> about NMIs? Do you have #MCs yet (ECC errors)? Do you have something
> like ARM's #SError which can asynchronously hit at any time because
> of external bus (PCI) errors?

As far as I know, all interrupts on RISC-V are disabled by
local_irq_disable()/local_irq_enable().

Paolo

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 15/20] RISC-V: KVM: Add timer functionality
  2019-08-23 11:49           ` Alexander Graf
@ 2019-08-23 12:11             ` Anup Patel
  2019-08-23 12:25               ` Alexander Graf
  0 siblings, 1 reply; 61+ messages in thread
From: Anup Patel @ 2019-08-23 12:11 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Daniel Lezcano, Thomas Gleixner, Atish Patra,
	Alistair Francis, Damien Le Moal, Christoph Hellwig, kvm,
	linux-riscv, linux-kernel

On Fri, Aug 23, 2019 at 5:19 PM Alexander Graf <graf@amazon.com> wrote:
>
>
>
> On 23.08.19 13:46, Anup Patel wrote:
> > On Fri, Aug 23, 2019 at 5:03 PM Graf (AWS), Alexander <graf@amazon.com> wrote:
> >>
> >>
> >>
> >>> Am 23.08.2019 um 13:05 schrieb Anup Patel <anup@brainfault.org>:
> >>>
> >>>> On Fri, Aug 23, 2019 at 1:23 PM Alexander Graf <graf@amazon.com> wrote:
> >>>>
> >>>>> On 22.08.19 10:46, Anup Patel wrote:
> >>>>> From: Atish Patra <atish.patra@wdc.com>
> >>>>>
> >>>>> The RISC-V hypervisor specification doesn't have any virtual timer
> >>>>> feature.
> >>>>>
> >>>>> Due to this, the guest VCPU timer will be programmed via SBI calls.
> >>>>> The host will use a separate hrtimer event for each guest VCPU to
> >>>>> provide timer functionality. We inject a virtual timer interrupt to
> >>>>> the guest VCPU whenever the guest VCPU hrtimer event expires.
> >>>>>
> >>>>> The following features are not supported yet and will be added in
> >>>>> future:
> >>>>> 1. A time offset to adjust guest time from host time
> >>>>> 2. A saved next event in guest vcpu for vm migration
> >>>>
> >>>> Implementing these 2 bits right now should be trivial. Why wait?
> >>>
>
> [...]
>
> >>>> ... in fact, I feel like I'm missing something obvious here. How does
> >>>> the guest trigger the timer event? What is the argument it uses for that
> >>>> and how does that play with the tbfreq in the earlier patch?
> >>>
> >>> We have SBI call inferface between Hypervisor and Guest. One of the
> >>> SBI call allows Guest to program time event. The next event is specified
> >>> as absolute cycles. The Guest can read time using TIME CSR which
> >>> returns system timer value (@ tbfreq freqency).
> >>>
> >>> Guest Linux will know the tbfreq from DTB passed by QEMU/KVMTOOL
> >>> and it has to be same as Host tbfreq.
> >>>
> >>> The TBFREQ config register visible to user-space is a read-only CONFIG
> >>> register which tells user-space tools (QEMU/KVMTOOL) about Host tbfreq.
> >>
> >> And it's read-only because you can not trap on TB reads?
> >
> > There is no TB registers.
> >
> > The tbfreq can only be know through DT/ACPI kind-of HW description
> > for both Host and Guest.
> >
> > The KVM user-space tool needs to know TBFREQ so that it can set correct
> > value in generated DT for Guest Linux.
>
> So what access methods do get influenced by TBFREQ? If it's only the SBI
> timer, we can control the frequency, which means we can make TBFREQ
> read/write.

There are two things influenced by TBFREQ:
1. TIME CSR which is a free running counter
2. SBI calls for programming next timer event

The Guest TIME CSR will be at same rate as Host TIME CSR so
we cannot show different TBFREQ to Guest Linux.

In future, we will be having a dedicated RISC-V timer extension which
will have all programming done via CSRs but until then we are stuck
with TIME CSR + SBI call combination.

Regards,
Anup

>
>
> Alex

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 00/20] KVM RISC-V Support
  2019-08-23 12:10       ` Paolo Bonzini
@ 2019-08-23 12:19         ` Anup Patel
  2019-08-23 12:28           ` Alexander Graf
  0 siblings, 1 reply; 61+ messages in thread
From: Anup Patel @ 2019-08-23 12:19 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Graf (AWS),
	Alexander, Anup Patel, Palmer Dabbelt, Paul Walmsley, Radim K,
	Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, kvm, linux-riscv,
	linux-kernel

On Fri, Aug 23, 2019 at 5:40 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 23/08/19 13:44, Graf (AWS), Alexander wrote:
> >> Overall, I'm quite happy with the code. It's a very clean implementation
> >> of a KVM target.
>
> Yup, I said the same even for v1 (I prefer recursive implementation of
> page table walking but that's all I can say).
>
> >> I will send v6 next week. I will try my best to implement unpriv
> >> trap handling in v6 itself.
> > Are you sure unpriv is the only exception that can hit there? What
> > about NMIs? Do you have #MCs yet (ECC errors)? Do you have something
> > like ARM's #SError which can asynchronously hit at any time because
> > of external bus (PCI) errors?
>
> As far as I know, all interrupts on RISC-V are disabled by
> local_irq_disable()/local_irq_enable().

Yes, we don't have per-CPU interrupts for async bus errors or
non-maskable interrupts. The local_irq_disable() and local_irq_enable()
affect all interrupts (excepts traps).

Although, the async bus errors can certainly be routed to Linux
via PLIC (interrupt-controller) as regular peripheral interrupts.

Regards,
Anup

>
> Paolo

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 18/20] RISC-V: KVM: Add SBI v0.1 support
  2019-08-23 12:00         ` Anup Patel
@ 2019-08-23 12:19           ` Alexander Graf
  2019-08-23 12:28             ` Anup Patel
  0 siblings, 1 reply; 61+ messages in thread
From: Alexander Graf @ 2019-08-23 12:19 UTC (permalink / raw)
  To: Anup Patel
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Damien Le Moal, kvm, Daniel Lezcano, linux-kernel,
	Christoph Hellwig, Atish Patra, Alistair Francis,
	Thomas Gleixner, linux-riscv



On 23.08.19 14:00, Anup Patel wrote:
> On Fri, Aug 23, 2019 at 5:09 PM Graf (AWS), Alexander <graf@amazon.com> wrote:
>>
>>
>>
>>> Am 23.08.2019 um 13:18 schrieb Anup Patel <anup@brainfault.org>:
>>>
>>>> On Fri, Aug 23, 2019 at 1:34 PM Alexander Graf <graf@amazon.com> wrote:
>>>>
>>>>> On 22.08.19 10:46, Anup Patel wrote:
>>>>> From: Atish Patra <atish.patra@wdc.com>
>>>>>
>>>>> The KVM host kernel running in HS-mode needs to handle SBI calls coming
>>>>> from guest kernel running in VS-mode.
>>>>>
>>>>> This patch adds SBI v0.1 support in KVM RISC-V. All the SBI calls are
>>>>> implemented correctly except remote tlb flushes. For remote TLB flushes,
>>>>> we are doing full TLB flush and this will be optimized in future.
>>>>>
>>>>> Signed-off-by: Atish Patra <atish.patra@wdc.com>
>>>>> Signed-off-by: Anup Patel <anup.patel@wdc.com>
>>>>> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
>>>>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>>>>> ---
>>>>>   arch/riscv/include/asm/kvm_host.h |   2 +
>>>>>   arch/riscv/kvm/Makefile           |   2 +-
>>>>>   arch/riscv/kvm/vcpu_exit.c        |   3 +
>>>>>   arch/riscv/kvm/vcpu_sbi.c         | 119 ++++++++++++++++++++++++++++++
>>>>>   4 files changed, 125 insertions(+), 1 deletion(-)
>>>>>   create mode 100644 arch/riscv/kvm/vcpu_sbi.c
>>>>>
>>>>> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
>>>>> index 2af3a179c08e..0b1eceaef59f 100644
>>>>> --- a/arch/riscv/include/asm/kvm_host.h
>>>>> +++ b/arch/riscv/include/asm/kvm_host.h
>>>>> @@ -241,4 +241,6 @@ bool kvm_riscv_vcpu_has_interrupt(struct kvm_vcpu *vcpu);
>>>>>   void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu);
>>>>>   void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu);
>>>>>
>>>>> +int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu);
>>>>> +
>>>>>   #endif /* __RISCV_KVM_HOST_H__ */
>>>>> diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
>>>>> index 3e0c7558320d..b56dc1650d2c 100644
>>>>> --- a/arch/riscv/kvm/Makefile
>>>>> +++ b/arch/riscv/kvm/Makefile
>>>>> @@ -9,6 +9,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
>>>>>   kvm-objs := $(common-objs-y)
>>>>>
>>>>>   kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
>>>>> -kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o
>>>>> +kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o vcpu_sbi.o
>>>>>
>>>>>   obj-$(CONFIG_KVM)   += kvm.o
>>>>> diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
>>>>> index fbc04fe335ad..87b83fcf9a14 100644
>>>>> --- a/arch/riscv/kvm/vcpu_exit.c
>>>>> +++ b/arch/riscv/kvm/vcpu_exit.c
>>>>> @@ -534,6 +534,9 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>>>>>                   (vcpu->arch.guest_context.hstatus & HSTATUS_STL))
>>>>>                       ret = stage2_page_fault(vcpu, run, scause, stval);
>>>>>               break;
>>>>> +     case EXC_SUPERVISOR_SYSCALL:
>>>>> +             if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV)
>>>>> +                     ret = kvm_riscv_vcpu_sbi_ecall(vcpu);
>>>>>       default:
>>>>>               break;
>>>>>       };
>>>>> diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
>>>>> new file mode 100644
>>>>> index 000000000000..5793202eb514
>>>>> --- /dev/null
>>>>> +++ b/arch/riscv/kvm/vcpu_sbi.c
>>>>> @@ -0,0 +1,119 @@
>>>>> +// SPDX-License-Identifier: GPL-2.0
>>>>> +/**
>>>>> + * Copyright (c) 2019 Western Digital Corporation or its affiliates.
>>>>> + *
>>>>> + * Authors:
>>>>> + *     Atish Patra <atish.patra@wdc.com>
>>>>> + */
>>>>> +
>>>>> +#include <linux/errno.h>
>>>>> +#include <linux/err.h>
>>>>> +#include <linux/kvm_host.h>
>>>>> +#include <asm/csr.h>
>>>>> +#include <asm/kvm_vcpu_timer.h>
>>>>> +
>>>>> +#define SBI_VERSION_MAJOR                    0
>>>>> +#define SBI_VERSION_MINOR                    1
>>>>> +
>>>>> +/* TODO: Handle traps due to unpriv load and redirect it back to VS-mode */
>>>>
>>>> Ugh, another one of those? Can't you just figure out a way to recover
>>>> from the page fault? Also, you want to combine this with the instruction
>>>> load logic, so that we have a single place that guest address space
>>>> reads go through.
>>>
>>> Walking Guest page table would be more expensive compared to implementing
>>> a trap handling mechanism.
>>>
>>> We will be adding trap handling mechanism for reading instruction and reading
>>> load.
>>>
>>> Both these operations are different in following ways:
>>> 1. RISC-V instructions are variable length. We get to know exact instruction
>>>     length only after reading first 16bits
>>> 2. We need to set VSSTATUS.MXR bit when reading instruction for
>>>     execute-only Guest pages.
>>
>> Yup, sounds like you could solve that with a trivial if() based on "read instruction" or not, no? If you want to, feel free to provide short versions that do only read ins/data, but I would really like to see the whole "data reads become guest reads" magic to be funneled through a single function (in C, can be inline unrolled in asm of course)
>>
>>>
>>>>
>>>>> +static unsigned long kvm_sbi_unpriv_load(const unsigned long *addr,
>>>>> +                                      struct kvm_vcpu *vcpu)
>>>>> +{
>>>>> +     unsigned long flags, val;
>>>>> +     unsigned long __hstatus, __sstatus;
>>>>> +
>>>>> +     local_irq_save(flags);
>>>>> +     __hstatus = csr_read(CSR_HSTATUS);
>>>>> +     __sstatus = csr_read(CSR_SSTATUS);
>>>>> +     csr_write(CSR_HSTATUS, vcpu->arch.guest_context.hstatus | HSTATUS_SPRV);
>>>>> +     csr_write(CSR_SSTATUS, vcpu->arch.guest_context.sstatus);
>>>>> +     val = *addr;
>>>>> +     csr_write(CSR_HSTATUS, __hstatus);
>>>>> +     csr_write(CSR_SSTATUS, __sstatus);
>>>>> +     local_irq_restore(flags);
>>>>> +
>>>>> +     return val;
>>>>> +}
>>>>> +
>>>>> +static void kvm_sbi_system_shutdown(struct kvm_vcpu *vcpu, u32 type)
>>>>> +{
>>>>> +     int i;
>>>>> +     struct kvm_vcpu *tmp;
>>>>> +
>>>>> +     kvm_for_each_vcpu(i, tmp, vcpu->kvm)
>>>>> +             tmp->arch.power_off = true;
>>>>> +     kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
>>>>> +
>>>>> +     memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
>>>>> +     vcpu->run->system_event.type = type;
>>>>> +     vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
>>>>> +}
>>>>> +
>>>>> +int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu)
>>>>> +{
>>>>> +     int ret = 1;
>>>>> +     u64 next_cycle;
>>>>> +     int vcpuid;
>>>>> +     struct kvm_vcpu *remote_vcpu;
>>>>> +     ulong dhart_mask;
>>>>> +     struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
>>>>> +
>>>>> +     if (!cp)
>>>>> +             return -EINVAL;
>>>>> +     switch (cp->a7) {
>>>>> +     case SBI_SET_TIMER:
>>>>> +#if __riscv_xlen == 32
>>>>> +             next_cycle = ((u64)cp->a1 << 32) | (u64)cp->a0;
>>>>> +#else
>>>>> +             next_cycle = (u64)cp->a0;
>>>>> +#endif
>>>>> +             kvm_riscv_vcpu_timer_next_event(vcpu, next_cycle);
>>>>
>>>> Ah, this is where the timer set happens. I still don't understand how
>>>> this takes the frequency bit into account?
>>>
>>> Explained it in PATCH17 comments.
>>>
>>>>
>>>>> +             break;
>>>>> +     case SBI_CONSOLE_PUTCHAR:
>>>>> +             /* Not implemented */
>>>>> +             cp->a0 = -ENOTSUPP;
>>>>> +             break;
>>>>> +     case SBI_CONSOLE_GETCHAR:
>>>>> +             /* Not implemented */
>>>>> +             cp->a0 = -ENOTSUPP;
>>>>> +             break;
>>>>
>>>> These two should be covered by the default case.
>>>
>>> Sure, I will update.
>>>
>>>>
>>>>> +     case SBI_CLEAR_IPI:
>>>>> +             kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_S_SOFT);
>>>>> +             break;
>>>>> +     case SBI_SEND_IPI:
>>>>> +             dhart_mask = kvm_sbi_unpriv_load((unsigned long *)cp->a0, vcpu);
>>>>> +             for_each_set_bit(vcpuid, &dhart_mask, BITS_PER_LONG) {
>>>>> +                     remote_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, vcpuid);
>>>>> +                     kvm_riscv_vcpu_set_interrupt(remote_vcpu, IRQ_S_SOFT);
>>>>> +             }
>>>>> +             break;
>>>>> +     case SBI_SHUTDOWN:
>>>>> +             kvm_sbi_system_shutdown(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN);
>>>>> +             ret = 0;
>>>>> +             break;
>>>>> +     case SBI_REMOTE_FENCE_I:
>>>>> +             sbi_remote_fence_i(NULL);
>>>>> +             break;
>>>>> +     /*
>>>>> +      * TODO: There should be a way to call remote hfence.bvma.
>>>>> +      * Preferred method is now a SBI call. Until then, just flush
>>>>> +      * all tlbs.
>>>>> +      */
>>>>> +     case SBI_REMOTE_SFENCE_VMA:
>>>>> +             /*TODO: Parse vma range.*/
>>>>> +             sbi_remote_sfence_vma(NULL, 0, 0);
>>>>> +             break;
>>>>> +     case SBI_REMOTE_SFENCE_VMA_ASID:
>>>>> +             /*TODO: Parse vma range for given ASID */
>>>>> +             sbi_remote_sfence_vma(NULL, 0, 0);
>>>>> +             break;
>>>>> +     default:
>>>>> +             cp->a0 = ENOTSUPP;
>>>>> +             break;
>>>>
>>>> Please just send unsupported SBI events into user space.
>>>
>>> For unsupported SBI calls, we should be returning error to the
>>> Guest Linux so that do something about it. This is in accordance
>>> with the SBI spec.
>>
>> That's up to user space (QEMU / kvmtool) to decide. If user space wants to implement the  console functions (like we do on s390), it should have the chance to do so.
> 
> The SBI_CONSOLE_PUTCHAR and SBI_CONSOLE_GETCHAR are
> for debugging only. These calls are deprecated in SBI v0.2 onwards
> because we now have earlycon for early prints in Linux RISC-V.
> 
> The RISC-V Guest will generally have it's own MMIO based UART
> which will be the default console.
> 
> Due to these reasons, we have not implemented these SBI calls.

I'm not saying we should implement them. I'm saying we should leave a 
policy decision like that up to user space. By terminating the SBI in 
kernel space, you can not quickly debug something going wrong.

> If we still want user-space to implement this then we will require
> separate exit reasons and we are trying to avoid adding RISC-V
> specific exit reasons/ioctls in KVM user-space ABI.

Why?

I had so many occasions where I would have loved to have user space 
exits for MSR access, SPR access, hypercalls, etc etc. It really makes 
life so much easier when you can quickly hack something up in user space 
rather than modify the kernel.

> The absence of SBI_CONSOLE_PUTCHAR/GETCHAR certainly
> does not block anyone in debugging Guest Linux because we have
> earlycon support in Linux RISC-V.

I'm not hung on on the console. What I'm trying to express is a general 
sentiment that terminating extensible hypervisor <-> guest interfaces in 
kvm is not a great idea. Some times we can't get around it (like on page 
tables), but some times we do. And this is a case where we could.

At the end of the day this is your call though :).


Alex

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 15/20] RISC-V: KVM: Add timer functionality
  2019-08-23 12:11             ` Anup Patel
@ 2019-08-23 12:25               ` Alexander Graf
  0 siblings, 0 replies; 61+ messages in thread
From: Alexander Graf @ 2019-08-23 12:25 UTC (permalink / raw)
  To: Anup Patel
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Daniel Lezcano, Thomas Gleixner, Atish Patra,
	Alistair Francis, Damien Le Moal, Christoph Hellwig, kvm,
	linux-riscv, linux-kernel



On 23.08.19 14:11, Anup Patel wrote:
> On Fri, Aug 23, 2019 at 5:19 PM Alexander Graf <graf@amazon.com> wrote:
>>
>>
>>
>> On 23.08.19 13:46, Anup Patel wrote:
>>> On Fri, Aug 23, 2019 at 5:03 PM Graf (AWS), Alexander <graf@amazon.com> wrote:
>>>>
>>>>
>>>>
>>>>> Am 23.08.2019 um 13:05 schrieb Anup Patel <anup@brainfault.org>:
>>>>>
>>>>>> On Fri, Aug 23, 2019 at 1:23 PM Alexander Graf <graf@amazon.com> wrote:
>>>>>>
>>>>>>> On 22.08.19 10:46, Anup Patel wrote:
>>>>>>> From: Atish Patra <atish.patra@wdc.com>
>>>>>>>
>>>>>>> The RISC-V hypervisor specification doesn't have any virtual timer
>>>>>>> feature.
>>>>>>>
>>>>>>> Due to this, the guest VCPU timer will be programmed via SBI calls.
>>>>>>> The host will use a separate hrtimer event for each guest VCPU to
>>>>>>> provide timer functionality. We inject a virtual timer interrupt to
>>>>>>> the guest VCPU whenever the guest VCPU hrtimer event expires.
>>>>>>>
>>>>>>> The following features are not supported yet and will be added in
>>>>>>> future:
>>>>>>> 1. A time offset to adjust guest time from host time
>>>>>>> 2. A saved next event in guest vcpu for vm migration
>>>>>>
>>>>>> Implementing these 2 bits right now should be trivial. Why wait?
>>>>>
>>
>> [...]
>>
>>>>>> ... in fact, I feel like I'm missing something obvious here. How does
>>>>>> the guest trigger the timer event? What is the argument it uses for that
>>>>>> and how does that play with the tbfreq in the earlier patch?
>>>>>
>>>>> We have SBI call inferface between Hypervisor and Guest. One of the
>>>>> SBI call allows Guest to program time event. The next event is specified
>>>>> as absolute cycles. The Guest can read time using TIME CSR which
>>>>> returns system timer value (@ tbfreq freqency).
>>>>>
>>>>> Guest Linux will know the tbfreq from DTB passed by QEMU/KVMTOOL
>>>>> and it has to be same as Host tbfreq.
>>>>>
>>>>> The TBFREQ config register visible to user-space is a read-only CONFIG
>>>>> register which tells user-space tools (QEMU/KVMTOOL) about Host tbfreq.
>>>>
>>>> And it's read-only because you can not trap on TB reads?
>>>
>>> There is no TB registers.
>>>
>>> The tbfreq can only be know through DT/ACPI kind-of HW description
>>> for both Host and Guest.
>>>
>>> The KVM user-space tool needs to know TBFREQ so that it can set correct
>>> value in generated DT for Guest Linux.
>>
>> So what access methods do get influenced by TBFREQ? If it's only the SBI
>> timer, we can control the frequency, which means we can make TBFREQ
>> read/write.
> 
> There are two things influenced by TBFREQ:
> 1. TIME CSR which is a free running counter
> 2. SBI calls for programming next timer event
> 
> The Guest TIME CSR will be at same rate as Host TIME CSR so
> we cannot show different TBFREQ to Guest Linux.
> 
> In future, we will be having a dedicated RISC-V timer extension which
> will have all programming done via CSRs but until then we are stuck
> with TIME CSR + SBI call combination.

Please make sure that in a future revision of the spec either

   a) TIME CSR can be trapped or
   b) TIME CSR can be virtualized (virtual TIME READ has offset and 
multiplier on phys TIME READ applied)

and the same goes for the timer extension - either make it all trappable 
or all propery adjustable. You need to be double cautious there that 
people don't design something that breaks live migration between hosts 
that have a different TBFREQ.


Thanks,

Alex

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 00/20] KVM RISC-V Support
  2019-08-23 12:19         ` Anup Patel
@ 2019-08-23 12:28           ` Alexander Graf
  0 siblings, 0 replies; 61+ messages in thread
From: Alexander Graf @ 2019-08-23 12:28 UTC (permalink / raw)
  To: Anup Patel, Paolo Bonzini
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Radim K,
	Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
	Damien Le Moal, Christoph Hellwig, kvm, linux-riscv,
	linux-kernel



On 23.08.19 14:19, Anup Patel wrote:
> On Fri, Aug 23, 2019 at 5:40 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 23/08/19 13:44, Graf (AWS), Alexander wrote:
>>>> Overall, I'm quite happy with the code. It's a very clean implementation
>>>> of a KVM target.
>>
>> Yup, I said the same even for v1 (I prefer recursive implementation of
>> page table walking but that's all I can say).
>>
>>>> I will send v6 next week. I will try my best to implement unpriv
>>>> trap handling in v6 itself.
>>> Are you sure unpriv is the only exception that can hit there? What
>>> about NMIs? Do you have #MCs yet (ECC errors)? Do you have something
>>> like ARM's #SError which can asynchronously hit at any time because
>>> of external bus (PCI) errors?
>>
>> As far as I know, all interrupts on RISC-V are disabled by
>> local_irq_disable()/local_irq_enable().
> 
> Yes, we don't have per-CPU interrupts for async bus errors or
> non-maskable interrupts. The local_irq_disable() and local_irq_enable()
> affect all interrupts (excepts traps).

Awesome, so that means you really only need to worry about traps. Even 
easier then! :)

Also, you want to look out for a future extension that adds any of the 
above (NMI, MCE, SError on local bus), as that would then break the 
function ;)


Alex

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v5 18/20] RISC-V: KVM: Add SBI v0.1 support
  2019-08-23 12:19           ` Alexander Graf
@ 2019-08-23 12:28             ` Anup Patel
  0 siblings, 0 replies; 61+ messages in thread
From: Anup Patel @ 2019-08-23 12:28 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini,
	Radim K, Damien Le Moal, kvm, Daniel Lezcano, linux-kernel,
	Christoph Hellwig, Atish Patra, Alistair Francis,
	Thomas Gleixner, linux-riscv

On Fri, Aug 23, 2019 at 5:50 PM Alexander Graf <graf@amazon.com> wrote:
>
>
>
> On 23.08.19 14:00, Anup Patel wrote:
> > On Fri, Aug 23, 2019 at 5:09 PM Graf (AWS), Alexander <graf@amazon.com> wrote:
> >>
> >>
> >>
> >>> Am 23.08.2019 um 13:18 schrieb Anup Patel <anup@brainfault.org>:
> >>>
> >>>> On Fri, Aug 23, 2019 at 1:34 PM Alexander Graf <graf@amazon.com> wrote:
> >>>>
> >>>>> On 22.08.19 10:46, Anup Patel wrote:
> >>>>> From: Atish Patra <atish.patra@wdc.com>
> >>>>>
> >>>>> The KVM host kernel running in HS-mode needs to handle SBI calls coming
> >>>>> from guest kernel running in VS-mode.
> >>>>>
> >>>>> This patch adds SBI v0.1 support in KVM RISC-V. All the SBI calls are
> >>>>> implemented correctly except remote tlb flushes. For remote TLB flushes,
> >>>>> we are doing full TLB flush and this will be optimized in future.
> >>>>>
> >>>>> Signed-off-by: Atish Patra <atish.patra@wdc.com>
> >>>>> Signed-off-by: Anup Patel <anup.patel@wdc.com>
> >>>>> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> >>>>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> >>>>> ---
> >>>>>   arch/riscv/include/asm/kvm_host.h |   2 +
> >>>>>   arch/riscv/kvm/Makefile           |   2 +-
> >>>>>   arch/riscv/kvm/vcpu_exit.c        |   3 +
> >>>>>   arch/riscv/kvm/vcpu_sbi.c         | 119 ++++++++++++++++++++++++++++++
> >>>>>   4 files changed, 125 insertions(+), 1 deletion(-)
> >>>>>   create mode 100644 arch/riscv/kvm/vcpu_sbi.c
> >>>>>
> >>>>> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> >>>>> index 2af3a179c08e..0b1eceaef59f 100644
> >>>>> --- a/arch/riscv/include/asm/kvm_host.h
> >>>>> +++ b/arch/riscv/include/asm/kvm_host.h
> >>>>> @@ -241,4 +241,6 @@ bool kvm_riscv_vcpu_has_interrupt(struct kvm_vcpu *vcpu);
> >>>>>   void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu);
> >>>>>   void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu);
> >>>>>
> >>>>> +int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu);
> >>>>> +
> >>>>>   #endif /* __RISCV_KVM_HOST_H__ */
> >>>>> diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
> >>>>> index 3e0c7558320d..b56dc1650d2c 100644
> >>>>> --- a/arch/riscv/kvm/Makefile
> >>>>> +++ b/arch/riscv/kvm/Makefile
> >>>>> @@ -9,6 +9,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
> >>>>>   kvm-objs := $(common-objs-y)
> >>>>>
> >>>>>   kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
> >>>>> -kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o
> >>>>> +kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o vcpu_sbi.o
> >>>>>
> >>>>>   obj-$(CONFIG_KVM)   += kvm.o
> >>>>> diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
> >>>>> index fbc04fe335ad..87b83fcf9a14 100644
> >>>>> --- a/arch/riscv/kvm/vcpu_exit.c
> >>>>> +++ b/arch/riscv/kvm/vcpu_exit.c
> >>>>> @@ -534,6 +534,9 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
> >>>>>                   (vcpu->arch.guest_context.hstatus & HSTATUS_STL))
> >>>>>                       ret = stage2_page_fault(vcpu, run, scause, stval);
> >>>>>               break;
> >>>>> +     case EXC_SUPERVISOR_SYSCALL:
> >>>>> +             if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV)
> >>>>> +                     ret = kvm_riscv_vcpu_sbi_ecall(vcpu);
> >>>>>       default:
> >>>>>               break;
> >>>>>       };
> >>>>> diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
> >>>>> new file mode 100644
> >>>>> index 000000000000..5793202eb514
> >>>>> --- /dev/null
> >>>>> +++ b/arch/riscv/kvm/vcpu_sbi.c
> >>>>> @@ -0,0 +1,119 @@
> >>>>> +// SPDX-License-Identifier: GPL-2.0
> >>>>> +/**
> >>>>> + * Copyright (c) 2019 Western Digital Corporation or its affiliates.
> >>>>> + *
> >>>>> + * Authors:
> >>>>> + *     Atish Patra <atish.patra@wdc.com>
> >>>>> + */
> >>>>> +
> >>>>> +#include <linux/errno.h>
> >>>>> +#include <linux/err.h>
> >>>>> +#include <linux/kvm_host.h>
> >>>>> +#include <asm/csr.h>
> >>>>> +#include <asm/kvm_vcpu_timer.h>
> >>>>> +
> >>>>> +#define SBI_VERSION_MAJOR                    0
> >>>>> +#define SBI_VERSION_MINOR                    1
> >>>>> +
> >>>>> +/* TODO: Handle traps due to unpriv load and redirect it back to VS-mode */
> >>>>
> >>>> Ugh, another one of those? Can't you just figure out a way to recover
> >>>> from the page fault? Also, you want to combine this with the instruction
> >>>> load logic, so that we have a single place that guest address space
> >>>> reads go through.
> >>>
> >>> Walking Guest page table would be more expensive compared to implementing
> >>> a trap handling mechanism.
> >>>
> >>> We will be adding trap handling mechanism for reading instruction and reading
> >>> load.
> >>>
> >>> Both these operations are different in following ways:
> >>> 1. RISC-V instructions are variable length. We get to know exact instruction
> >>>     length only after reading first 16bits
> >>> 2. We need to set VSSTATUS.MXR bit when reading instruction for
> >>>     execute-only Guest pages.
> >>
> >> Yup, sounds like you could solve that with a trivial if() based on "read instruction" or not, no? If you want to, feel free to provide short versions that do only read ins/data, but I would really like to see the whole "data reads become guest reads" magic to be funneled through a single function (in C, can be inline unrolled in asm of course)
> >>
> >>>
> >>>>
> >>>>> +static unsigned long kvm_sbi_unpriv_load(const unsigned long *addr,
> >>>>> +                                      struct kvm_vcpu *vcpu)
> >>>>> +{
> >>>>> +     unsigned long flags, val;
> >>>>> +     unsigned long __hstatus, __sstatus;
> >>>>> +
> >>>>> +     local_irq_save(flags);
> >>>>> +     __hstatus = csr_read(CSR_HSTATUS);
> >>>>> +     __sstatus = csr_read(CSR_SSTATUS);
> >>>>> +     csr_write(CSR_HSTATUS, vcpu->arch.guest_context.hstatus | HSTATUS_SPRV);
> >>>>> +     csr_write(CSR_SSTATUS, vcpu->arch.guest_context.sstatus);
> >>>>> +     val = *addr;
> >>>>> +     csr_write(CSR_HSTATUS, __hstatus);
> >>>>> +     csr_write(CSR_SSTATUS, __sstatus);
> >>>>> +     local_irq_restore(flags);
> >>>>> +
> >>>>> +     return val;
> >>>>> +}
> >>>>> +
> >>>>> +static void kvm_sbi_system_shutdown(struct kvm_vcpu *vcpu, u32 type)
> >>>>> +{
> >>>>> +     int i;
> >>>>> +     struct kvm_vcpu *tmp;
> >>>>> +
> >>>>> +     kvm_for_each_vcpu(i, tmp, vcpu->kvm)
> >>>>> +             tmp->arch.power_off = true;
> >>>>> +     kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
> >>>>> +
> >>>>> +     memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
> >>>>> +     vcpu->run->system_event.type = type;
> >>>>> +     vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
> >>>>> +}
> >>>>> +
> >>>>> +int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu)
> >>>>> +{
> >>>>> +     int ret = 1;
> >>>>> +     u64 next_cycle;
> >>>>> +     int vcpuid;
> >>>>> +     struct kvm_vcpu *remote_vcpu;
> >>>>> +     ulong dhart_mask;
> >>>>> +     struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
> >>>>> +
> >>>>> +     if (!cp)
> >>>>> +             return -EINVAL;
> >>>>> +     switch (cp->a7) {
> >>>>> +     case SBI_SET_TIMER:
> >>>>> +#if __riscv_xlen == 32
> >>>>> +             next_cycle = ((u64)cp->a1 << 32) | (u64)cp->a0;
> >>>>> +#else
> >>>>> +             next_cycle = (u64)cp->a0;
> >>>>> +#endif
> >>>>> +             kvm_riscv_vcpu_timer_next_event(vcpu, next_cycle);
> >>>>
> >>>> Ah, this is where the timer set happens. I still don't understand how
> >>>> this takes the frequency bit into account?
> >>>
> >>> Explained it in PATCH17 comments.
> >>>
> >>>>
> >>>>> +             break;
> >>>>> +     case SBI_CONSOLE_PUTCHAR:
> >>>>> +             /* Not implemented */
> >>>>> +             cp->a0 = -ENOTSUPP;
> >>>>> +             break;
> >>>>> +     case SBI_CONSOLE_GETCHAR:
> >>>>> +             /* Not implemented */
> >>>>> +             cp->a0 = -ENOTSUPP;
> >>>>> +             break;
> >>>>
> >>>> These two should be covered by the default case.
> >>>
> >>> Sure, I will update.
> >>>
> >>>>
> >>>>> +     case SBI_CLEAR_IPI:
> >>>>> +             kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_S_SOFT);
> >>>>> +             break;
> >>>>> +     case SBI_SEND_IPI:
> >>>>> +             dhart_mask = kvm_sbi_unpriv_load((unsigned long *)cp->a0, vcpu);
> >>>>> +             for_each_set_bit(vcpuid, &dhart_mask, BITS_PER_LONG) {
> >>>>> +                     remote_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, vcpuid);
> >>>>> +                     kvm_riscv_vcpu_set_interrupt(remote_vcpu, IRQ_S_SOFT);
> >>>>> +             }
> >>>>> +             break;
> >>>>> +     case SBI_SHUTDOWN:
> >>>>> +             kvm_sbi_system_shutdown(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN);
> >>>>> +             ret = 0;
> >>>>> +             break;
> >>>>> +     case SBI_REMOTE_FENCE_I:
> >>>>> +             sbi_remote_fence_i(NULL);
> >>>>> +             break;
> >>>>> +     /*
> >>>>> +      * TODO: There should be a way to call remote hfence.bvma.
> >>>>> +      * Preferred method is now a SBI call. Until then, just flush
> >>>>> +      * all tlbs.
> >>>>> +      */
> >>>>> +     case SBI_REMOTE_SFENCE_VMA:
> >>>>> +             /*TODO: Parse vma range.*/
> >>>>> +             sbi_remote_sfence_vma(NULL, 0, 0);
> >>>>> +             break;
> >>>>> +     case SBI_REMOTE_SFENCE_VMA_ASID:
> >>>>> +             /*TODO: Parse vma range for given ASID */
> >>>>> +             sbi_remote_sfence_vma(NULL, 0, 0);
> >>>>> +             break;
> >>>>> +     default:
> >>>>> +             cp->a0 = ENOTSUPP;
> >>>>> +             break;
> >>>>
> >>>> Please just send unsupported SBI events into user space.
> >>>
> >>> For unsupported SBI calls, we should be returning error to the
> >>> Guest Linux so that do something about it. This is in accordance
> >>> with the SBI spec.
> >>
> >> That's up to user space (QEMU / kvmtool) to decide. If user space wants to implement the  console functions (like we do on s390), it should have the chance to do so.
> >
> > The SBI_CONSOLE_PUTCHAR and SBI_CONSOLE_GETCHAR are
> > for debugging only. These calls are deprecated in SBI v0.2 onwards
> > because we now have earlycon for early prints in Linux RISC-V.
> >
> > The RISC-V Guest will generally have it's own MMIO based UART
> > which will be the default console.
> >
> > Due to these reasons, we have not implemented these SBI calls.
>
> I'm not saying we should implement them. I'm saying we should leave a
> policy decision like that up to user space. By terminating the SBI in
> kernel space, you can not quickly debug something going wrong.
>
> > If we still want user-space to implement this then we will require
> > separate exit reasons and we are trying to avoid adding RISC-V
> > specific exit reasons/ioctls in KVM user-space ABI.
>
> Why?
>
> I had so many occasions where I would have loved to have user space
> exits for MSR access, SPR access, hypercalls, etc etc. It really makes
> life so much easier when you can quickly hack something up in user space
> rather than modify the kernel.
>
> > The absence of SBI_CONSOLE_PUTCHAR/GETCHAR certainly
> > does not block anyone in debugging Guest Linux because we have
> > earlycon support in Linux RISC-V.
>
> I'm not hung on on the console. What I'm trying to express is a general
> sentiment that terminating extensible hypervisor <-> guest interfaces in
> kvm is not a great idea. Some times we can't get around it (like on page
> tables), but some times we do. And this is a case where we could.
>
> At the end of the day this is your call though :).

I am not sure about user-space CSRs but having ability to route
unsupported SBI calls to user-space can be very useful.

For this series, we will continue to return error to Guest Linux for
unsupported SBI calls.

We will add unsupported SBI routing to user-space in next series.

Regards,
Anup

>
>
> Alex

^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, back to index

Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-22  8:42 [PATCH v5 00/20] KVM RISC-V Support Anup Patel
2019-08-22  8:42 ` [PATCH v5 01/20] KVM: RISC-V: Add KVM_REG_RISCV for ONE_REG interface Anup Patel
2019-08-22  8:42 ` [PATCH v5 02/20] RISC-V: Add bitmap reprensenting ISA features common across CPUs Anup Patel
2019-08-22  8:43 ` [PATCH v5 03/20] RISC-V: Export few kernel symbols Anup Patel
2019-08-22  8:43 ` [PATCH v5 04/20] RISC-V: Add hypervisor extension related CSR defines Anup Patel
2019-08-22  8:43 ` [PATCH v5 05/20] RISC-V: Add initial skeletal KVM support Anup Patel
2019-08-22  8:43 ` [PATCH v5 06/20] RISC-V: KVM: Implement VCPU create, init and destroy functions Anup Patel
2019-08-22  8:44 ` [PATCH v5 07/20] RISC-V: KVM: Implement VCPU interrupts and requests handling Anup Patel
2019-08-22  8:44 ` [PATCH v5 08/20] RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls Anup Patel
2019-08-22 12:01   ` Alexander Graf
2019-08-22 14:00     ` Anup Patel
2019-08-22 14:12       ` Alexander Graf
2019-08-23 11:20         ` Anup Patel
2019-08-23 11:42           ` Graf (AWS), Alexander
2019-08-22 14:05     ` Anup Patel
2019-08-22  8:44 ` [PATCH v5 09/20] RISC-V: KVM: Implement VCPU world-switch Anup Patel
2019-08-22  8:44 ` [PATCH v5 10/20] RISC-V: KVM: Handle MMIO exits for VCPU Anup Patel
2019-08-22 12:10   ` Alexander Graf
2019-08-22 12:21     ` Andrew Jones
2019-08-22 12:27     ` Anup Patel
2019-08-22 12:14   ` Alexander Graf
2019-08-22 12:33     ` Anup Patel
2019-08-22 13:25       ` Alexander Graf
2019-08-22 13:55         ` Anup Patel
2019-08-22  8:45 ` [PATCH v5 11/20] RISC-V: KVM: Handle WFI " Anup Patel
2019-08-22 12:19   ` Alexander Graf
2019-08-22 12:50     ` Anup Patel
2019-08-22  8:45 ` [PATCH v5 12/20] RISC-V: KVM: Implement VMID allocator Anup Patel
2019-08-22  8:45 ` [PATCH v5 13/20] RISC-V: KVM: Implement stage2 page table programming Anup Patel
2019-08-22 12:28   ` Alexander Graf
2019-08-22 12:38     ` Anup Patel
2019-08-22 13:27       ` Alexander Graf
2019-08-22 13:58         ` Anup Patel
2019-08-22 14:09           ` Alexander Graf
2019-08-23 11:21             ` Anup Patel
2019-08-22  8:45 ` [PATCH v5 14/20] RISC-V: KVM: Implement MMU notifiers Anup Patel
2019-08-22  8:46 ` [PATCH v5 15/20] RISC-V: KVM: Add timer functionality Anup Patel
2019-08-23  7:52   ` Alexander Graf
2019-08-23 11:04     ` Anup Patel
2019-08-23 11:33       ` Graf (AWS), Alexander
2019-08-23 11:46         ` Anup Patel
2019-08-23 11:49           ` Alexander Graf
2019-08-23 12:11             ` Anup Patel
2019-08-23 12:25               ` Alexander Graf
2019-08-22  8:46 ` [PATCH v5 16/20] RISC-V: KVM: FP lazy save/restore Anup Patel
2019-08-22  8:46 ` [PATCH v5 17/20] RISC-V: KVM: Implement ONE REG interface for FP registers Anup Patel
2019-08-22  8:46 ` [PATCH v5 18/20] RISC-V: KVM: Add SBI v0.1 support Anup Patel
2019-08-23  8:04   ` Alexander Graf
2019-08-23 11:17     ` Anup Patel
2019-08-23 11:38       ` Graf (AWS), Alexander
2019-08-23 12:00         ` Anup Patel
2019-08-23 12:19           ` Alexander Graf
2019-08-23 12:28             ` Anup Patel
2019-08-22  8:47 ` [PATCH v5 19/20] RISC-V: Enable VIRTIO drivers in RV64 and RV32 defconfig Anup Patel
2019-08-22  8:47 ` [PATCH v5 20/20] RISC-V: KVM: Add MAINTAINERS entry Anup Patel
2019-08-23  8:08 ` [PATCH v5 00/20] KVM RISC-V Support Alexander Graf
2019-08-23 11:25   ` Anup Patel
2019-08-23 11:44     ` Graf (AWS), Alexander
2019-08-23 12:10       ` Paolo Bonzini
2019-08-23 12:19         ` Anup Patel
2019-08-23 12:28           ` Alexander Graf

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org kvm@archiver.kernel.org
	public-inbox-index kvm


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/ public-inbox