* [PATCH v4 00/15] RISC-V SBI v2.0 PMU improvements and Perf sampling in KVM guest
@ 2024-02-29  1:01 Atish Patra
  2024-02-29  1:01 ` [PATCH v4 01/15] RISC-V: Fix the typo in Scountovf CSR name Atish Patra
                   ` (14 more replies)
  0 siblings, 15 replies; 56+ messages in thread
From: Atish Patra @ 2024-02-29  1:01 UTC
  To: linux-kernel
  Cc: Atish Patra, Albert Ou, Alexandre Ghiti, Andrew Jones,
	Anup Patel, Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng,
	kvm-riscv, kvm, linux-kselftest, linux-riscv, Mark Rutland,
	Palmer Dabbelt, Paolo Bonzini, Paul Walmsley, Shuah Khan,
	Will Deacon

This series implements the SBI PMU improvements introduced in SBI v2.0 [1], i.e.
the PMU snapshot and fw_read_hi() functions.

SBI v2.0 introduced the PMU snapshot feature, which allows the SBI implementation
to provide counter information (i.e. values/overflow status) via memory shared
between the SBI implementation and the supervisor OS. This minimizes the number
of traps when perf is used inside a KVM guest, as it relies on
SBI PMU + trap/emulation of the counters.

The current set of ratified RISC-V specifications also doesn't allow scountovf
to be trapped and emulated by the hypervisor. The SBI PMU snapshot bridges that
gap in the ISA as well and enables perf sampling in the guest. However, LCOFI in
the guest only works via IRQ filtering in the AIA specification. That is why AIA
has to be enabled in the hardware (at least the Ssaia extension) in order to
use the sampling support in perf.

Here are the patch-wise implementation details.

PATCH 1,6,7: Generic cleanups/improvements
PATCH 2,3,10: FW_READ_HI function implementation
PATCH 4-5: Add PMU snapshot feature in SBI PMU driver
PATCH 8-9: KVM implementation for snapshot and sampling in KVM guests
PATCH 11-15: KVM selftests for SBI PMU extension

The series is based on kvm-next and is available at:

https://github.com/atishp04/linux/tree/kvm_pmu_snapshot_v4

The series is based on kvm-riscv/queue branch + fixes suggested on the following
series
https://patchwork.kernel.org/project/kvm/cover/cover.1705916069.git.haibo1.xu@intel.com/

The kvmtool patch is also available at:
https://github.com/atishp04/kvmtool/tree/sscofpmf

It also requires the Ssaia ISA extension to be present in the hardware in order
to get perf sampling support in the guest. In the Qemu virt machine, it can be
enabled with the following config.

```
-cpu rv64,sscofpmf=true,x-ssaia=true
```

There are no other dependencies on AIA apart from that. Thus, Ssaia must be disabled
for the guest if AIA patches are not available. Here is an example command.

```
./lkvm-static run -m 256 -c2 --console serial -p "console=ttyS0 earlycon" --disable-ssaia -k ./Image --debug 
```

The series has been tested only in Qemu.
Here is a snippet of perf running inside a KVM guest.

===================================================
$ perf record -e cycles -e instructions perf bench sched messaging -g 5
...
$ Running 'sched/messaging' benchmark:
...
[   45.928723] perf_duration_warn: 2 callbacks suppressed
[   45.929000] perf: interrupt took too long (484426 > 483186), lowering kernel.perf_event_max_sample_rate to 250
$ 20 sender and receiver processes per group
$ 5 groups == 200 processes run

     Total time: 14.220 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.117 MB perf.data (1942 samples) ]
$ perf report --stdio
$ To display the perf.data header info, please use --header/--header-only optio>
$
$
$ Total Lost Samples: 0
$
$ Samples: 943  of event 'cycles'
$ Event count (approx.): 5128976844
$
$ Overhead  Command          Shared Object                Symbol               >
$ ........  ...............  ...........................  .....................>
$
     7.59%  sched-messaging  [kernel.kallsyms]            [k] memcpy
     5.48%  sched-messaging  [kernel.kallsyms]            [k] percpu_counter_ad>
     5.24%  sched-messaging  [kernel.kallsyms]            [k] __sbi_rfence_v02_>
     4.00%  sched-messaging  [kernel.kallsyms]            [k] _raw_spin_unlock_>
     3.79%  sched-messaging  [kernel.kallsyms]            [k] set_pte_range
     3.72%  sched-messaging  [kernel.kallsyms]            [k] next_uptodate_fol>
     3.46%  sched-messaging  [kernel.kallsyms]            [k] filemap_map_pages
     3.31%  sched-messaging  [kernel.kallsyms]            [k] handle_mm_fault
     3.20%  sched-messaging  [kernel.kallsyms]            [k] finish_task_switc>
     3.16%  sched-messaging  [kernel.kallsyms]            [k] clear_page
     3.03%  sched-messaging  [kernel.kallsyms]            [k] mtree_range_walk
     2.42%  sched-messaging  [kernel.kallsyms]            [k] flush_icache_pte

===================================================

[1] https://github.com/riscv-non-isa/riscv-sbi-doc

Changes from v3->v4:
1. Added selftests.
2. Fixed an issue to clear the interrupt pending bits.
3. Fixed the counter index in snapshot memory start function.

Changes from v2->v3:
1. Fixed a patchwork warning on patch6.
2. Fixed comment formatting and a nit in PATCH 3 & 5.
3. Moved the hvien update and sscofpmf enabling to PATCH 9 from PATCH 8.

Changes from v1->v2:
1. Fixed warning/errors from patchwork CI.
2. Rebased on top of kvm-next.
3. Added Acked-by tags.

Changes from RFC->v1:
1. Addressed all the comments on RFC series.
2. Removed PATCH2 and merged into later patches.
3. Added 2 more patches for minor fixes.
4. Fixed KVM boot issue without Ssaia and made sscofpmf in guest dependent on
   Ssaia in the host.

Atish Patra (15):
RISC-V: Fix the typo in Scountovf CSR name
RISC-V: Add FIRMWARE_READ_HI definition
drivers/perf: riscv: Read upper bits of a firmware counter
RISC-V: Add SBI PMU snapshot definitions
drivers/perf: riscv: Implement SBI PMU snapshot function
RISC-V: KVM: No need to update the counter value during reset
RISC-V: KVM: No need to exit to the user space if perf event failed
RISC-V: KVM: Implement SBI PMU Snapshot feature
RISC-V: KVM: Add perf sampling support for guests
RISC-V: KVM: Support 64 bit firmware counters on RV32
KVM: riscv: selftests: Add Sscofpmf to get-reg-list test
KVM: riscv: selftests: Add SBI PMU extension definitions
KVM: riscv: selftests: Add SBI PMU selftest
KVM: riscv: selftests: Add a test for PMU snapshot functionality
KVM: riscv: selftests: Add a test for counter overflow

arch/riscv/include/asm/csr.h                  |   5 +-
arch/riscv/include/asm/errata_list.h          |   2 +-
arch/riscv/include/asm/kvm_vcpu_pmu.h         |  14 +-
arch/riscv/include/asm/sbi.h                  |  12 +
arch/riscv/include/uapi/asm/kvm.h             |   1 +
arch/riscv/kvm/aia.c                          |   5 +
arch/riscv/kvm/vcpu.c                         |  14 +-
arch/riscv/kvm/vcpu_onereg.c                  |   9 +-
arch/riscv/kvm/vcpu_pmu.c                     | 247 +++++++-
arch/riscv/kvm/vcpu_sbi_pmu.c                 |  15 +-
drivers/perf/riscv_pmu.c                      |   1 +
drivers/perf/riscv_pmu_sbi.c                  | 229 ++++++-
include/linux/perf/riscv_pmu.h                |   6 +
tools/testing/selftests/kvm/Makefile          |   1 +
.../selftests/kvm/include/riscv/processor.h   |  92 +++
.../selftests/kvm/lib/riscv/processor.c       |  12 +
.../selftests/kvm/riscv/get-reg-list.c        |   4 +
tools/testing/selftests/kvm/riscv/sbi_pmu.c   | 588 ++++++++++++++++++
18 files changed, 1212 insertions(+), 45 deletions(-)
create mode 100644 tools/testing/selftests/kvm/riscv/sbi_pmu.c

--
2.34.1



* [PATCH v4 01/15] RISC-V: Fix the typo in Scountovf CSR name
  2024-02-29  1:01 [PATCH v4 00/15] RISC-V SBI v2.0 PMU improvements and Perf sampling in KVM guest Atish Patra
@ 2024-02-29  1:01 ` Atish Patra
  2024-03-01  8:25   ` Clément Léger
  2024-02-29  1:01 ` [PATCH v4 02/15] RISC-V: Add FIRMWARE_READ_HI definition Atish Patra
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Atish Patra @ 2024-02-29  1:01 UTC
  To: linux-kernel
  Cc: Atish Patra, Conor Dooley, Anup Patel, Albert Ou,
	Alexandre Ghiti, Andrew Jones, Atish Patra, Guo Ren,
	Icenowy Zheng, kvm-riscv, kvm, linux-kselftest, linux-riscv,
	Mark Rutland, Palmer Dabbelt, Paolo Bonzini, Paul Walmsley,
	Shuah Khan, Will Deacon

The counter overflow CSR name is "scountovf" not "sscountovf".

Fix the csr name.

Fixes: 4905ec2fb7e6 ("RISC-V: Add sscofpmf extension support")
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
 arch/riscv/include/asm/csr.h         | 2 +-
 arch/riscv/include/asm/errata_list.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index 510014051f5d..603e5a3c61f9 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -281,7 +281,7 @@
 #define CSR_HPMCOUNTER30H	0xc9e
 #define CSR_HPMCOUNTER31H	0xc9f
 
-#define CSR_SSCOUNTOVF		0xda0
+#define CSR_SCOUNTOVF		0xda0
 
 #define CSR_SSTATUS		0x100
 #define CSR_SIE			0x104
diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
index ea33288f8a25..cd49eb025ddf 100644
--- a/arch/riscv/include/asm/errata_list.h
+++ b/arch/riscv/include/asm/errata_list.h
@@ -114,7 +114,7 @@ asm volatile(ALTERNATIVE(						\
 
 #define ALT_SBI_PMU_OVERFLOW(__ovl)					\
 asm volatile(ALTERNATIVE(						\
-	"csrr %0, " __stringify(CSR_SSCOUNTOVF),			\
+	"csrr %0, " __stringify(CSR_SCOUNTOVF),				\
 	"csrr %0, " __stringify(THEAD_C9XX_CSR_SCOUNTEROF),		\
 		THEAD_VENDOR_ID, ERRATA_THEAD_PMU,			\
 		CONFIG_ERRATA_THEAD_PMU)				\
-- 
2.34.1



* [PATCH v4 02/15] RISC-V: Add FIRMWARE_READ_HI definition
  2024-02-29  1:01 [PATCH v4 00/15] RISC-V SBI v2.0 PMU improvements and Perf sampling in KVM guest Atish Patra
  2024-02-29  1:01 ` [PATCH v4 01/15] RISC-V: Fix the typo in Scountovf CSR name Atish Patra
@ 2024-02-29  1:01 ` Atish Patra
  2024-03-01  8:27   ` Clément Léger
  2024-02-29  1:01 ` [PATCH v4 03/15] drivers/perf: riscv: Read upper bits of a firmware counter Atish Patra
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Atish Patra @ 2024-02-29  1:01 UTC
  To: linux-kernel
  Cc: Atish Patra, Conor Dooley, Anup Patel, Albert Ou,
	Alexandre Ghiti, Andrew Jones, Atish Patra, Guo Ren,
	Icenowy Zheng, kvm-riscv, kvm, linux-kselftest, linux-riscv,
	Mark Rutland, Palmer Dabbelt, Paolo Bonzini, Paul Walmsley,
	Shuah Khan, Will Deacon

SBI v2.0 added another function to the SBI PMU extension to read
the upper bits of a counter whose width is larger than XLEN.

Add the definition for that function.

Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
 arch/riscv/include/asm/sbi.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 6e68f8dff76b..ef8311dafb91 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -131,6 +131,7 @@ enum sbi_ext_pmu_fid {
 	SBI_EXT_PMU_COUNTER_START,
 	SBI_EXT_PMU_COUNTER_STOP,
 	SBI_EXT_PMU_COUNTER_FW_READ,
+	SBI_EXT_PMU_COUNTER_FW_READ_HI,
 };
 
 union sbi_pmu_ctr_info {
-- 
2.34.1



* [PATCH v4 03/15] drivers/perf: riscv: Read upper bits of a firmware counter
  2024-02-29  1:01 [PATCH v4 00/15] RISC-V SBI v2.0 PMU improvements and Perf sampling in KVM guest Atish Patra
  2024-02-29  1:01 ` [PATCH v4 01/15] RISC-V: Fix the typo in Scountovf CSR name Atish Patra
  2024-02-29  1:01 ` [PATCH v4 02/15] RISC-V: Add FIRMWARE_READ_HI definition Atish Patra
@ 2024-02-29  1:01 ` Atish Patra
  2024-03-01  9:52   ` Andrew Jones
  2024-02-29  1:01 ` [PATCH v4 04/15] RISC-V: Add SBI PMU snapshot definitions Atish Patra
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Atish Patra @ 2024-02-29  1:01 UTC
  To: linux-kernel
  Cc: Atish Patra, Palmer Dabbelt, Conor Dooley, Anup Patel, Albert Ou,
	Alexandre Ghiti, Andrew Jones, Atish Patra, Guo Ren,
	Icenowy Zheng, kvm-riscv, kvm, linux-kselftest, linux-riscv,
	Mark Rutland, Palmer Dabbelt, Paolo Bonzini, Paul Walmsley,
	Shuah Khan, Will Deacon

SBI v2.0 introduced an explicit function to read the upper 32 bits
of any firmware counter that is wider than 32 bits.
This is only applicable to RV32, where a firmware counter can be
64 bits wide.
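
As a rough illustration (not part of the patch), the two halves could be
merged as in the sketch below. The names fw_counter_combine() and
fw_counter_combine_demo() are hypothetical; `lo` stands for the value returned
by the FW_READ call and `hi` for the value returned by FW_READ_HI on RV32.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: merge the two 32-bit halves
 * of a 64-bit firmware counter as they would be seen on RV32.
 *   lo: bits 31:0  (result of the FW_READ call)
 *   hi: bits 63:32 (result of the FW_READ_HI call)
 */
uint64_t fw_counter_combine(uint32_t lo, uint32_t hi)
{
	/* Widen before shifting; shifting a 32-bit value by 32 is undefined. */
	return ((uint64_t)hi << 32) | lo;
}

/* Small self-check showing the intended behaviour. */
void fw_counter_combine_demo(void)
{
	assert(fw_counter_combine(0xdeadbeefu, 0x1u) == 0x1deadbeefULL);
}
```

On RV64 a single FW_READ call already returns the full 64-bit value, so no
merge step is needed there.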

Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
 drivers/perf/riscv_pmu_sbi.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 16acd4dcdb96..ea0fdb589f0d 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -35,6 +35,8 @@
 PMU_FORMAT_ATTR(event, "config:0-47");
 PMU_FORMAT_ATTR(firmware, "config:63");
 
+static bool sbi_v2_available;
+
 static struct attribute *riscv_arch_formats_attr[] = {
 	&format_attr_event.attr,
 	&format_attr_firmware.attr,
@@ -488,16 +490,23 @@ static u64 pmu_sbi_ctr_read(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
 	struct sbiret ret;
-	union sbi_pmu_ctr_info info;
 	u64 val = 0;
+	union sbi_pmu_ctr_info info = pmu_ctr_list[idx];
 
 	if (pmu_sbi_is_fw_event(event)) {
 		ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_FW_READ,
 				hwc->idx, 0, 0, 0, 0, 0);
-		if (!ret.error)
-			val = ret.value;
+		if (ret.error)
+			return 0;
+
+		val = ret.value;
+		if (IS_ENABLED(CONFIG_32BIT) && sbi_v2_available && info.width >= 32) {
+			ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_FW_READ_HI,
+					hwc->idx, 0, 0, 0, 0, 0);
+			if (!ret.error)
+				val |= ((u64)ret.value << 32);
+		}
 	} else {
-		info = pmu_ctr_list[idx];
 		val = riscv_pmu_ctr_read_csr(info.csr);
 		if (IS_ENABLED(CONFIG_32BIT))
 			val = ((u64)riscv_pmu_ctr_read_csr(info.csr + 0x80)) << 31 | val;
@@ -1108,6 +1117,9 @@ static int __init pmu_sbi_devinit(void)
 		return 0;
 	}
 
+	if (sbi_spec_version >= sbi_mk_version(2, 0))
+		sbi_v2_available = true;
+
 	ret = cpuhp_setup_state_multi(CPUHP_AP_PERF_RISCV_STARTING,
 				      "perf/riscv/pmu:starting",
 				      pmu_sbi_starting_cpu, pmu_sbi_dying_cpu);
-- 
2.34.1



* [PATCH v4 04/15] RISC-V: Add SBI PMU snapshot definitions
  2024-02-29  1:01 [PATCH v4 00/15] RISC-V SBI v2.0 PMU improvements and Perf sampling in KVM guest Atish Patra
                   ` (2 preceding siblings ...)
  2024-02-29  1:01 ` [PATCH v4 03/15] drivers/perf: riscv: Read upper bits of a firmware counter Atish Patra
@ 2024-02-29  1:01 ` Atish Patra
  2024-03-01 11:14   ` Andrew Jones
  2024-02-29  1:01 ` [PATCH v4 05/15] drivers/perf: riscv: Implement SBI PMU snapshot function Atish Patra
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Atish Patra @ 2024-02-29  1:01 UTC
  To: linux-kernel
  Cc: Atish Patra, Anup Patel, Palmer Dabbelt, Albert Ou,
	Alexandre Ghiti, Andrew Jones, Atish Patra, Conor Dooley,
	Guo Ren, Icenowy Zheng, kvm-riscv, kvm, linux-kselftest,
	linux-riscv, Mark Rutland, Palmer Dabbelt, Paolo Bonzini,
	Paul Walmsley, Shuah Khan, Will Deacon

The SBI PMU snapshot function reduces the number of traps to the
higher privilege mode by leveraging memory shared between S/VS-mode
and M/HS-mode. Add the definitions for that extension and the new error
code.
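
For readers who want to check the layout, here is a user-space sketch of the
structure added below (u64 replaced by uint64_t). The names
struct pmu_snapshot_sketch and snapshot_ctr_overflowed() are hypothetical and
exist only for this illustration. The field sizes add up to 4096 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of the snapshot layout, for illustration only. */
struct pmu_snapshot_sketch {
	uint64_t ctr_overflow_mask;	/* one overflow bit per counter index */
	uint64_t ctr_values[64];	/* one value slot per counter index */
	uint64_t reserved[447];		/* padding up to the full area size */
};

/* 1 + 64 + 447 = 512 eight-byte words = 4096 bytes. */
_Static_assert(sizeof(struct pmu_snapshot_sketch) == 4096,
	       "snapshot area is expected to be 4096 bytes");
_Static_assert(offsetof(struct pmu_snapshot_sketch, ctr_values) == 8,
	       "counter values start right after the overflow mask");

/* Hypothetical reader: returns 1 if counter idx is flagged as overflowed. */
int snapshot_ctr_overflowed(const struct pmu_snapshot_sketch *s,
			    unsigned int idx)
{
	return idx < 64 && ((s->ctr_overflow_mask >> idx) & 1);
}
```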

Reviewed-by: Anup Patel <anup@brainfault.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
 arch/riscv/include/asm/sbi.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index ef8311dafb91..dfa830f7d54b 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -132,6 +132,7 @@ enum sbi_ext_pmu_fid {
 	SBI_EXT_PMU_COUNTER_STOP,
 	SBI_EXT_PMU_COUNTER_FW_READ,
 	SBI_EXT_PMU_COUNTER_FW_READ_HI,
+	SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
 };
 
 union sbi_pmu_ctr_info {
@@ -148,6 +149,13 @@ union sbi_pmu_ctr_info {
 	};
 };
 
+/* Data structure to contain the pmu snapshot data */
+struct riscv_pmu_snapshot_data {
+	u64 ctr_overflow_mask;
+	u64 ctr_values[64];
+	u64 reserved[447];
+};
+
 #define RISCV_PMU_RAW_EVENT_MASK GENMASK_ULL(47, 0)
 #define RISCV_PMU_RAW_EVENT_IDX 0x20000
 
@@ -244,9 +252,11 @@ enum sbi_pmu_ctr_type {
 
 /* Flags defined for counter start function */
 #define SBI_PMU_START_FLAG_SET_INIT_VALUE (1 << 0)
+#define SBI_PMU_START_FLAG_INIT_FROM_SNAPSHOT BIT(1)
 
 /* Flags defined for counter stop function */
 #define SBI_PMU_STOP_FLAG_RESET (1 << 0)
+#define SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT BIT(1)
 
 enum sbi_ext_dbcn_fid {
 	SBI_EXT_DBCN_CONSOLE_WRITE = 0,
@@ -285,6 +295,7 @@ struct sbi_sta_struct {
 #define SBI_ERR_ALREADY_AVAILABLE -6
 #define SBI_ERR_ALREADY_STARTED -7
 #define SBI_ERR_ALREADY_STOPPED -8
+#define SBI_ERR_NO_SHMEM	-9
 
 extern unsigned long sbi_spec_version;
 struct sbiret {
-- 
2.34.1



* [PATCH v4 05/15] drivers/perf: riscv: Implement SBI PMU snapshot function
  2024-02-29  1:01 [PATCH v4 00/15] RISC-V SBI v2.0 PMU improvements and Perf sampling in KVM guest Atish Patra
                   ` (3 preceding siblings ...)
  2024-02-29  1:01 ` [PATCH v4 04/15] RISC-V: Add SBI PMU snapshot definitions Atish Patra
@ 2024-02-29  1:01 ` Atish Patra
  2024-03-01 14:40   ` Andrew Jones
  2024-02-29  1:01 ` [PATCH v4 06/15] RISC-V: KVM: No need to update the counter value during reset Atish Patra
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Atish Patra @ 2024-02-29  1:01 UTC
  To: linux-kernel
  Cc: Atish Patra, Palmer Dabbelt, Anup Patel, Conor Dooley, Albert Ou,
	Alexandre Ghiti, Andrew Jones, Atish Patra, Guo Ren,
	Icenowy Zheng, kvm-riscv, kvm, linux-kselftest, linux-riscv,
	Mark Rutland, Palmer Dabbelt, Paolo Bonzini, Paul Walmsley,
	Shuah Khan, Will Deacon

SBI v2.0 introduced the PMU snapshot feature, which adds the following
features.

1. Read counter values directly from the shared memory instead of
a CSR read.
2. Start multiple counters with initial values with one SBI call.

These features reduce the number of traps to the higher
privilege mode. If the kernel is in VS-mode while the hypervisor
deploys the trap & emulate method, this would minimize the hpmcounter
CSR read traps. If the kernel is running in S-mode, the benefit is
reduced to CSR latency vs DRAM/cache latency, as there is no trap
involved while accessing the hpmcounter CSRs.

In both modes, it saves ecalls when starting
multiple counters together with initial values. This is a likely
scenario if multiple counters overflow at the same time.
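
The batched restart can be modelled in user space roughly as follows. This is
only a sketch under simplifying assumptions: struct snapshot_sketch,
prepare_snapshot_restart() and the demo function are hypothetical names, and
the actual SBI start call is replaced by returning the mask that would be
passed to it.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_COUNTERS 64

/* Simplified stand-in for the shared snapshot area. */
struct snapshot_sketch {
	uint64_t ctr_overflow_mask;
	uint64_t ctr_values[SKETCH_MAX_COUNTERS];
};

/*
 * Hypothetical model of the batched restart: for every counter that is in
 * use and has overflowed, store its new initial value in the shared area.
 * Returns the mask that a single "start from snapshot" call would cover.
 */
uint64_t prepare_snapshot_restart(struct snapshot_sketch *s,
				  uint64_t used_mask, uint64_t ovf_mask,
				  const uint64_t *init_vals)
{
	for (unsigned int idx = 0; idx < SKETCH_MAX_COUNTERS; idx++) {
		uint64_t bit = 1ULL << idx;

		if ((used_mask & bit) && (ovf_mask & bit))
			s->ctr_values[idx] = init_vals[idx];
		/* Other counters keep the value saved when they were stopped. */
	}

	return used_mask;
}

/* Two counters in use (3 and 4); only counter 3 overflowed. */
void prepare_snapshot_restart_demo(void)
{
	struct snapshot_sketch s = { 0 };
	uint64_t init[SKETCH_MAX_COUNTERS] = { 0 };

	s.ctr_values[4] = 222;
	init[3] = 1000;
	assert(prepare_snapshot_restart(&s, 0x18, 0x08, init) == 0x18);
	assert(s.ctr_values[3] == 1000 && s.ctr_values[4] == 222);
}
```

Without the snapshot, each overflowed counter needs its own start call
carrying its initial value, so the saving grows with the number of counters
that overflow together.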

Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
 drivers/perf/riscv_pmu.c       |   1 +
 drivers/perf/riscv_pmu_sbi.c   | 209 +++++++++++++++++++++++++++++++--
 include/linux/perf/riscv_pmu.h |   6 +
 3 files changed, 204 insertions(+), 12 deletions(-)

diff --git a/drivers/perf/riscv_pmu.c b/drivers/perf/riscv_pmu.c
index 0dda70e1ef90..5b57acb770d3 100644
--- a/drivers/perf/riscv_pmu.c
+++ b/drivers/perf/riscv_pmu.c
@@ -412,6 +412,7 @@ struct riscv_pmu *riscv_pmu_alloc(void)
 		cpuc->n_events = 0;
 		for (i = 0; i < RISCV_MAX_COUNTERS; i++)
 			cpuc->events[i] = NULL;
+		cpuc->snapshot_addr = NULL;
 	}
 	pmu->pmu = (struct pmu) {
 		.event_init	= riscv_pmu_event_init,
diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index ea0fdb589f0d..8de5721e8019 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -36,6 +36,9 @@ PMU_FORMAT_ATTR(event, "config:0-47");
 PMU_FORMAT_ATTR(firmware, "config:63");
 
 static bool sbi_v2_available;
+static DEFINE_STATIC_KEY_FALSE(sbi_pmu_snapshot_available);
+#define sbi_pmu_snapshot_available() \
+	static_branch_unlikely(&sbi_pmu_snapshot_available)
 
 static struct attribute *riscv_arch_formats_attr[] = {
 	&format_attr_event.attr,
@@ -485,14 +488,100 @@ static int pmu_sbi_event_map(struct perf_event *event, u64 *econfig)
 	return ret;
 }
 
+static void pmu_sbi_snapshot_free(struct riscv_pmu *pmu)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		struct cpu_hw_events *cpu_hw_evt = per_cpu_ptr(pmu->hw_events, cpu);
+
+		if (!cpu_hw_evt->snapshot_addr)
+			continue;
+
+		free_page((unsigned long)cpu_hw_evt->snapshot_addr);
+		cpu_hw_evt->snapshot_addr = NULL;
+		cpu_hw_evt->snapshot_addr_phys = 0;
+	}
+}
+
+static int pmu_sbi_snapshot_alloc(struct riscv_pmu *pmu)
+{
+	int cpu;
+	struct page *snapshot_page;
+
+	for_each_possible_cpu(cpu) {
+		struct cpu_hw_events *cpu_hw_evt = per_cpu_ptr(pmu->hw_events, cpu);
+
+		if (cpu_hw_evt->snapshot_addr)
+			continue;
+
+		snapshot_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
+		if (!snapshot_page) {
+			pmu_sbi_snapshot_free(pmu);
+			return -ENOMEM;
+		}
+		cpu_hw_evt->snapshot_addr = page_to_virt(snapshot_page);
+		cpu_hw_evt->snapshot_addr_phys = page_to_phys(snapshot_page);
+	}
+
+	return 0;
+}
+
+static void pmu_sbi_snapshot_disable(void)
+{
+	sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM, -1,
+		  -1, 0, 0, 0, 0);
+}
+
+static int pmu_sbi_snapshot_setup(struct riscv_pmu *pmu, int cpu)
+{
+	struct cpu_hw_events *cpu_hw_evt;
+	struct sbiret ret = {0};
+
+	cpu_hw_evt = per_cpu_ptr(pmu->hw_events, cpu);
+	if (!cpu_hw_evt->snapshot_addr_phys)
+		return -EINVAL;
+
+	if (cpu_hw_evt->snapshot_set_done)
+		return 0;
+
+	if (IS_ENABLED(CONFIG_32BIT))
+		ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
+				cpu_hw_evt->snapshot_addr_phys,
+				(u64)(cpu_hw_evt->snapshot_addr_phys) >> 32, 0, 0, 0, 0);
+	else
+		ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
+				cpu_hw_evt->snapshot_addr_phys, 0, 0, 0, 0, 0);
+
+	/* Free up the snapshot area memory and fall back to SBI PMU calls without snapshot */
+	if (ret.error) {
+		if (ret.error != SBI_ERR_NOT_SUPPORTED)
+			pr_warn("pmu snapshot setup failed with error %ld\n", ret.error);
+		return sbi_err_map_linux_errno(ret.error);
+	}
+
+	cpu_hw_evt->snapshot_set_done = true;
+
+	return 0;
+}
+
 static u64 pmu_sbi_ctr_read(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
 	struct sbiret ret;
 	u64 val = 0;
+	struct riscv_pmu *pmu = to_riscv_pmu(event->pmu);
+	struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
+	struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
 	union sbi_pmu_ctr_info info = pmu_ctr_list[idx];
 
+	/* Read the value from the shared memory directly */
+	if (sbi_pmu_snapshot_available()) {
+		val = sdata->ctr_values[idx];
+		return val;
+	}
+
 	if (pmu_sbi_is_fw_event(event)) {
 		ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_FW_READ,
 				hwc->idx, 0, 0, 0, 0, 0);
@@ -539,6 +628,7 @@ static void pmu_sbi_ctr_start(struct perf_event *event, u64 ival)
 	struct hw_perf_event *hwc = &event->hw;
 	unsigned long flag = SBI_PMU_START_FLAG_SET_INIT_VALUE;
 
+	/* There is no benefit setting SNAPSHOT FLAG for a single counter */
 #if defined(CONFIG_32BIT)
 	ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, hwc->idx,
 			1, flag, ival, ival >> 32, 0);
@@ -559,16 +649,36 @@ static void pmu_sbi_ctr_stop(struct perf_event *event, unsigned long flag)
 {
 	struct sbiret ret;
 	struct hw_perf_event *hwc = &event->hw;
+	struct riscv_pmu *pmu = to_riscv_pmu(event->pmu);
+	struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
+	struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
 
 	if ((hwc->flags & PERF_EVENT_FLAG_USER_ACCESS) &&
 	    (hwc->flags & PERF_EVENT_FLAG_USER_READ_CNT))
 		pmu_sbi_reset_scounteren((void *)event);
 
+	if (sbi_pmu_snapshot_available())
+		flag |= SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT;
+
 	ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, hwc->idx, 1, flag, 0, 0, 0);
-	if (ret.error && (ret.error != SBI_ERR_ALREADY_STOPPED) &&
-		flag != SBI_PMU_STOP_FLAG_RESET)
+	if (!ret.error && sbi_pmu_snapshot_available()) {
+		/*
+		 * The counter snapshot is based on the index base specified by hwc->idx.
+		 * The actual counter value is updated in shared memory at index 0 when counter
+		 * mask is 0x01. To ensure accurate counter values, it's necessary to transfer
+		 * the counter value to shared memory. However, if hwc->idx is zero, the counter
+		 * value is already correctly updated in shared memory, requiring no further
+		 * adjustment.
+		 */
+		if (hwc->idx > 0) {
+			sdata->ctr_values[hwc->idx] = sdata->ctr_values[0];
+			sdata->ctr_values[0] = 0;
+		}
+	} else if (ret.error && (ret.error != SBI_ERR_ALREADY_STOPPED) &&
+		flag != SBI_PMU_STOP_FLAG_RESET) {
 		pr_err("Stopping counter idx %d failed with error %d\n",
 			hwc->idx, sbi_err_map_linux_errno(ret.error));
+	}
 }
 
 static int pmu_sbi_find_num_ctrs(void)
@@ -626,10 +736,14 @@ static inline void pmu_sbi_stop_all(struct riscv_pmu *pmu)
 static inline void pmu_sbi_stop_hw_ctrs(struct riscv_pmu *pmu)
 {
 	struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
+	unsigned long flag = 0;
+
+	if (sbi_pmu_snapshot_available())
+		flag = SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT;
 
 	/* No need to check the error here as we can't do anything about the error */
 	sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, 0,
-		  cpu_hw_evt->used_hw_ctrs[0], 0, 0, 0, 0);
+		  cpu_hw_evt->used_hw_ctrs[0], flag, 0, 0, 0);
 }
 
 /*
@@ -638,11 +752,10 @@ static inline void pmu_sbi_stop_hw_ctrs(struct riscv_pmu *pmu)
  * while the overflowed counters need to be started with updated initialization
  * value.
  */
-static inline void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
-					       unsigned long ctr_ovf_mask)
+static noinline void pmu_sbi_start_ovf_ctrs_sbi(struct cpu_hw_events *cpu_hw_evt,
+						unsigned long ctr_ovf_mask)
 {
 	int idx = 0;
-	struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
 	struct perf_event *event;
 	unsigned long flag = SBI_PMU_START_FLAG_SET_INIT_VALUE;
 	unsigned long ctr_start_mask = 0;
@@ -677,6 +790,49 @@ static inline void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
 	}
 }
 
+static noinline void pmu_sbi_start_ovf_ctrs_snapshot(struct cpu_hw_events *cpu_hw_evt,
+						     unsigned long ctr_ovf_mask)
+{
+	int idx = 0;
+	struct perf_event *event;
+	unsigned long flag = SBI_PMU_START_FLAG_INIT_FROM_SNAPSHOT;
+	u64 max_period, init_val = 0;
+	struct hw_perf_event *hwc;
+	unsigned long ctr_start_mask = 0;
+	struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
+
+	for_each_set_bit(idx, cpu_hw_evt->used_hw_ctrs, RISCV_MAX_COUNTERS) {
+		if (ctr_ovf_mask & BIT(idx)) {
+			event = cpu_hw_evt->events[idx];
+			hwc = &event->hw;
+			max_period = riscv_pmu_ctr_get_width_mask(event);
+			init_val = local64_read(&hwc->prev_count) & max_period;
+			sdata->ctr_values[idx] = init_val;
+		}
+		/*
+		 * We do not need to update the non-overflowed counters; the previous
+		 * value should already be there.
+		 */
+	}
+
+	ctr_start_mask = cpu_hw_evt->used_hw_ctrs[0];
+
+	/* Start all the counters in a single shot */
+	sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, 0, ctr_start_mask,
+		  flag, 0, 0, 0);
+}
+
+static void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
+					unsigned long ctr_ovf_mask)
+{
+	struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
+
+	if (sbi_pmu_snapshot_available())
+		pmu_sbi_start_ovf_ctrs_snapshot(cpu_hw_evt, ctr_ovf_mask);
+	else
+		pmu_sbi_start_ovf_ctrs_sbi(cpu_hw_evt, ctr_ovf_mask);
+}
+
 static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
 {
 	struct perf_sample_data data;
@@ -690,6 +846,7 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
 	unsigned long overflowed_ctrs = 0;
 	struct cpu_hw_events *cpu_hw_evt = dev;
 	u64 start_clock = sched_clock();
+	struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
 
 	if (WARN_ON_ONCE(!cpu_hw_evt))
 		return IRQ_NONE;
@@ -711,8 +868,10 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
 	pmu_sbi_stop_hw_ctrs(pmu);
 
 	/* Overflow status register should only be read after counter are stopped */
-	ALT_SBI_PMU_OVERFLOW(overflow);
-
+	if (sbi_pmu_snapshot_available())
+		overflow = sdata->ctr_overflow_mask;
+	else
+		ALT_SBI_PMU_OVERFLOW(overflow);
 	/*
 	 * Overflow interrupt pending bit should only be cleared after stopping
 	 * all the counters to avoid any race condition.
@@ -794,6 +953,9 @@ static int pmu_sbi_starting_cpu(unsigned int cpu, struct hlist_node *node)
 		enable_percpu_irq(riscv_pmu_irq, IRQ_TYPE_NONE);
 	}
 
+	if (sbi_pmu_snapshot_available())
+		return pmu_sbi_snapshot_setup(pmu, cpu);
+
 	return 0;
 }
 
@@ -807,6 +969,9 @@ static int pmu_sbi_dying_cpu(unsigned int cpu, struct hlist_node *node)
 	/* Disable all counters access for user mode now */
 	csr_write(CSR_SCOUNTEREN, 0x0);
 
+	if (sbi_pmu_snapshot_available())
+		pmu_sbi_snapshot_disable();
+
 	return 0;
 }
 
@@ -1076,10 +1241,6 @@ static int pmu_sbi_device_probe(struct platform_device *pdev)
 	pmu->event_unmapped = pmu_sbi_event_unmapped;
 	pmu->csr_index = pmu_sbi_csr_index;
 
-	ret = cpuhp_state_add_instance(CPUHP_AP_PERF_RISCV_STARTING, &pmu->node);
-	if (ret)
-		return ret;
-
 	ret = riscv_pm_pmu_register(pmu);
 	if (ret)
 		goto out_unregister;
@@ -1088,8 +1249,32 @@ static int pmu_sbi_device_probe(struct platform_device *pdev)
 	if (ret)
 		goto out_unregister;
 
+	/* SBI PMU Snapshot is only available in SBI v2.0 */
+	if (sbi_v2_available) {
+		ret = pmu_sbi_snapshot_alloc(pmu);
+		if (ret)
+			goto out_unregister;
+
+		ret = pmu_sbi_snapshot_setup(pmu, smp_processor_id());
+		if (!ret) {
+			pr_info("SBI PMU snapshot detected\n");
+			/*
+			 * We enable it once here for the boot cpu. If snapshot shmem setup
+			 * fails during cpu hotplug process, it will fail to start the cpu
+			 * as we can not handle heterogeneous PMUs with different snapshot
+			 * capability.
+			 */
+			static_branch_enable(&sbi_pmu_snapshot_available);
+		}
+		/* Snapshot is an optional feature. Continue if not available */
+	}
+
 	register_sysctl("kernel", sbi_pmu_sysctl_table);
 
+	ret = cpuhp_state_add_instance(CPUHP_AP_PERF_RISCV_STARTING, &pmu->node);
+	if (ret)
+		return ret;
+
 	return 0;
 
 out_unregister:
diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h
index 43282e22ebe1..c3fa90970042 100644
--- a/include/linux/perf/riscv_pmu.h
+++ b/include/linux/perf/riscv_pmu.h
@@ -39,6 +39,12 @@ struct cpu_hw_events {
 	DECLARE_BITMAP(used_hw_ctrs, RISCV_MAX_COUNTERS);
 	/* currently enabled firmware counters */
 	DECLARE_BITMAP(used_fw_ctrs, RISCV_MAX_COUNTERS);
+	/* The virtual address of the shared memory where counter snapshot will be taken */
+	void *snapshot_addr;
+	/* The physical address of the shared memory where counter snapshot will be taken */
+	phys_addr_t snapshot_addr_phys;
+	/* Boolean flag to indicate setup is already done */
+	bool snapshot_set_done;
 };
 
 struct riscv_pmu {
-- 
2.34.1



* [PATCH v4 06/15] RISC-V: KVM: No need to update the counter value during reset
  2024-02-29  1:01 [PATCH v4 00/15] RISC-V SBI v2.0 PMU improvements and Perf sampling in KVM guest Atish Patra
                   ` (4 preceding siblings ...)
  2024-02-29  1:01 ` [PATCH v4 05/15] drivers/perf: riscv: Implement SBI PMU snapshot function Atish Patra
@ 2024-02-29  1:01 ` Atish Patra
  2024-03-02  7:47   ` Andrew Jones
  2024-02-29  1:01 ` [PATCH v4 07/15] RISC-V: KVM: No need to exit to the user space if perf event failed Atish Patra
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Atish Patra @ 2024-02-29  1:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Atish Patra, Anup Patel, Albert Ou, Alexandre Ghiti,
	Andrew Jones, Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng,
	kvm-riscv, kvm, linux-kselftest, linux-riscv, Mark Rutland,
	Palmer Dabbelt, Paolo Bonzini, Paul Walmsley, Shuah Khan,
	Will Deacon

The virtual counter value is updated during pmu_ctr_read. There is no need
to update it in the reset case. Otherwise, it will be counted twice, which
is incorrect.

Fixes: 0cb74b65d2e5 ("RISC-V: KVM: Implement perf support without sampling")
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
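A minimal sketch of the double counting, for illustration only. The toy_*
names and the struct are hypothetical stand-ins, not kernel code. The
assumption is that the read path already folds the perf event count into
the virtual counter value, as the commit message describes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a virtual counter; not the kernel's struct kvm_pmc. */
struct toy_pmc {
	uint64_t counter_val;	/* accumulated virtual counter value */
	uint64_t event_count;	/* count reported by the backing perf event */
};

/* Models the read path: the event count is folded into the virtual value here. */
uint64_t toy_ctr_read(struct toy_pmc *pmc)
{
	pmc->counter_val += pmc->event_count;
	return pmc->counter_val;
}

/* Old reset path: folds the same event count in again, then releases the event. */
void toy_ctr_reset_old(struct toy_pmc *pmc)
{
	pmc->counter_val += pmc->event_count;
	pmc->event_count = 0;
}

/* New reset path: only releases the event. */
void toy_ctr_reset_new(struct toy_pmc *pmc)
{
	pmc->event_count = 0;
}

/* Read followed by reset, once per variant. */
void toy_reset_demo(void)
{
	struct toy_pmc a = { 0, 100 }, b = { 0, 100 };

	toy_ctr_read(&a);
	toy_ctr_reset_old(&a);
	assert(a.counter_val == 200);	/* the 100 events were added twice */

	toy_ctr_read(&b);
	toy_ctr_reset_new(&b);
	assert(b.counter_val == 100);
}
```

Calling toy_reset_demo() from a test program exercises both variants.
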
 arch/riscv/kvm/vcpu_pmu.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
index 86391a5061dd..b1574c043f77 100644
--- a/arch/riscv/kvm/vcpu_pmu.c
+++ b/arch/riscv/kvm/vcpu_pmu.c
@@ -397,7 +397,6 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
 {
 	struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
 	int i, pmc_index, sbiret = 0;
-	u64 enabled, running;
 	struct kvm_pmc *pmc;
 	int fevent_code;
 
@@ -432,12 +431,9 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
 				sbiret = SBI_ERR_ALREADY_STOPPED;
 			}
 
-			if (flags & SBI_PMU_STOP_FLAG_RESET) {
-				/* Relase the counter if this is a reset request */
-				pmc->counter_val += perf_event_read_value(pmc->perf_event,
-									  &enabled, &running);
+			if (flags & SBI_PMU_STOP_FLAG_RESET)
+				/* Release the counter if this is a reset request */
 				kvm_pmu_release_perf_event(pmc);
-			}
 		} else {
 			sbiret = SBI_ERR_INVALID_PARAM;
 		}
-- 
2.34.1



* [PATCH v4 07/15] RISC-V: KVM: No need to exit to the user space if perf event failed
  2024-02-29  1:01 [PATCH v4 00/15] RISC-V SBI v2.0 PMU improvements and Perf sampling in KVM guest Atish Patra
                   ` (5 preceding siblings ...)
  2024-02-29  1:01 ` [PATCH v4 06/15] RISC-V: KVM: No need to update the counter value during reset Atish Patra
@ 2024-02-29  1:01 ` Atish Patra
  2024-03-02  8:15   ` Andrew Jones
  2024-02-29  1:01 ` [PATCH v4 08/15] RISC-V: KVM: Implement SBI PMU Snapshot feature Atish Patra
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Atish Patra @ 2024-02-29  1:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Atish Patra, Anup Patel, Albert Ou, Alexandre Ghiti,
	Andrew Jones, Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng,
	kvm-riscv, kvm, linux-kselftest, linux-riscv, Mark Rutland,
	Palmer Dabbelt, Paolo Bonzini, Paul Walmsley, Shuah Khan,
	Will Deacon

Currently, we return a Linux error code if creating a perf event fails
in KVM. That shouldn't be necessary, as the guest can continue to operate
without perf profiling or with profiling limited to firmware counters.

Return an appropriate SBI error code to indicate that PMU configuration
failed. An error message in KVM already describes the reason for the failure.

Fixes: 0cb74b65d2e5 ("RISC-V: KVM: Implement perf support without sampling")
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
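A minimal sketch of the error handling described above, for illustration
only. The toy_* names are hypothetical; the SBI_* values are assumed to
match the standard SBI error codes. The handler reports the failure to the
guest through the SBI return structure and returns 0 so that the vcpu
keeps running instead of exiting to userspace.

```c
#include <assert.h>
#include <errno.h>

#define SBI_SUCCESS		0
#define SBI_ERR_NOT_SUPPORTED	(-2)

/* Hypothetical stand-in for struct kvm_vcpu_sbi_return. */
struct toy_sbi_return {
	long err_val;
	long out_val;
};

/*
 * host_err models the result of creating the perf event: 0 on success,
 * a negative errno value on failure.
 */
int toy_cfg_match(long host_err, struct toy_sbi_return *retdata)
{
	long sbiret = SBI_SUCCESS;

	if (host_err)
		sbiret = SBI_ERR_NOT_SUPPORTED;

	retdata->err_val = sbiret;

	/* 0 means "resume the guest"; the host errno is not propagated. */
	return 0;
}

void toy_cfg_match_demo(void)
{
	struct toy_sbi_return r = { 0, 0 };

	assert(toy_cfg_match(-ENOENT, &r) == 0);
	assert(r.err_val == SBI_ERR_NOT_SUPPORTED);
}
```
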
 arch/riscv/kvm/vcpu_pmu.c     | 14 +++++++++-----
 arch/riscv/kvm/vcpu_sbi_pmu.c |  6 +++---
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
index b1574c043f77..29bf4ca798cb 100644
--- a/arch/riscv/kvm/vcpu_pmu.c
+++ b/arch/riscv/kvm/vcpu_pmu.c
@@ -229,8 +229,9 @@ static int kvm_pmu_validate_counter_mask(struct kvm_pmu *kvpmu, unsigned long ct
 	return 0;
 }
 
-static int kvm_pmu_create_perf_event(struct kvm_pmc *pmc, struct perf_event_attr *attr,
-				     unsigned long flags, unsigned long eidx, unsigned long evtdata)
+static long kvm_pmu_create_perf_event(struct kvm_pmc *pmc, struct perf_event_attr *attr,
+				      unsigned long flags, unsigned long eidx,
+				      unsigned long evtdata)
 {
 	struct perf_event *event;
 
@@ -454,7 +455,8 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
 				     unsigned long eidx, u64 evtdata,
 				     struct kvm_vcpu_sbi_return *retdata)
 {
-	int ctr_idx, ret, sbiret = 0;
+	int ctr_idx, sbiret = 0;
+	long ret;
 	bool is_fevent;
 	unsigned long event_code;
 	u32 etype = kvm_pmu_get_perf_event_type(eidx);
@@ -513,8 +515,10 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
 			kvpmu->fw_event[event_code].started = true;
 	} else {
 		ret = kvm_pmu_create_perf_event(pmc, &attr, flags, eidx, evtdata);
-		if (ret)
-			return ret;
+		if (ret) {
+			sbiret = SBI_ERR_NOT_SUPPORTED;
+			goto out;
+		}
 	}
 
 	set_bit(ctr_idx, kvpmu->pmc_in_use);
diff --git a/arch/riscv/kvm/vcpu_sbi_pmu.c b/arch/riscv/kvm/vcpu_sbi_pmu.c
index 7eca72df2cbd..b70179e9e875 100644
--- a/arch/riscv/kvm/vcpu_sbi_pmu.c
+++ b/arch/riscv/kvm/vcpu_sbi_pmu.c
@@ -42,9 +42,9 @@ static int kvm_sbi_ext_pmu_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
 #endif
 		/*
 		 * This can fail if perf core framework fails to create an event.
-		 * Forward the error to userspace because it's an error which
-		 * happened within the host kernel. The other option would be
-		 * to convert to an SBI error and forward to the guest.
+		 * No need to forward the error to userspace and exit the guest.
+		 * The operation can continue without profiling. Forward the
+		 * appropriate SBI error to the guest.
 		 */
 		ret = kvm_riscv_vcpu_pmu_ctr_cfg_match(vcpu, cp->a0, cp->a1,
 						       cp->a2, cp->a3, temp, retdata);
-- 
2.34.1



* [PATCH v4 08/15] RISC-V: KVM: Implement SBI PMU Snapshot feature
  2024-02-29  1:01 [PATCH v4 00/15] RISC-V SBI v2.0 PMU improvements and Perf sampling in KVM guest Atish Patra
                   ` (6 preceding siblings ...)
  2024-02-29  1:01 ` [PATCH v4 07/15] RISC-V: KVM: No need to exit to the user space if perf event failed Atish Patra
@ 2024-02-29  1:01 ` Atish Patra
  2024-03-02  9:49   ` Andrew Jones
  2024-02-29  1:01 ` [PATCH v4 09/15] RISC-V: KVM: Add perf sampling support for guests Atish Patra
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Atish Patra @ 2024-02-29  1:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Atish Patra, Anup Patel, Albert Ou, Alexandre Ghiti,
	Andrew Jones, Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng,
	kvm-riscv, kvm, linux-kselftest, linux-riscv, Mark Rutland,
	Palmer Dabbelt, Paolo Bonzini, Paul Walmsley, Shuah Khan,
	Will Deacon

The PMU snapshot function minimizes the number of traps when the guest
configures or accesses the hpmcounters. If the snapshot feature is
enabled, the hypervisor updates the shared memory with counter data and
the state of overflowed counters. The guest can just read the shared
memory instead of relying on trap & emulate by the hypervisor.

This patch doesn't implement counter overflow yet.

Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
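A minimal sketch of the shared memory layout and of how the two address
arguments are meant to be interpreted, for illustration only. The toy_*
names are hypothetical. The layout is assumed from the snapshot structure
used in this series (an overflow mask followed by 64 counter values, 4096
bytes in total), and the address decoding models an RV32 guest with
32-bit registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed snapshot layout: 8 + 512 + 3576 = 4096 bytes. */
struct toy_snapshot_data {
	uint64_t ctr_overflow_mask;	/* one bit per counter, relative to the base */
	uint64_t ctr_values[64];	/* counter values, relative to the base */
	uint64_t reserved[447];
};

/*
 * Decode the low/high address arguments as an RV32 guest would pass them.
 * Returns false when both are all-ones, which requests disabling the
 * snapshot; otherwise stores the combined guest physical address.
 */
bool toy_shmem_addr(uint32_t lo, uint32_t hi, uint64_t *addr)
{
	if (lo == UINT32_MAX && hi == UINT32_MAX)
		return false;

	*addr = ((uint64_t)hi << 32) | lo;
	return true;
}

/* Read a counter by absolute index from a snapshot taken with base ctr_base. */
uint64_t toy_snapshot_read(const struct toy_snapshot_data *sdata,
			   unsigned int ctr_base, unsigned int ctr_idx)
{
	return sdata->ctr_values[ctr_idx - ctr_base];
}

void toy_snapshot_demo(void)
{
	uint64_t addr = 0;

	assert(sizeof(struct toy_snapshot_data) == 4096);
	assert(toy_shmem_addr(0x80001000u, 0x1u, &addr));
	assert(addr == 0x180001000ULL);
	assert(!toy_shmem_addr(UINT32_MAX, UINT32_MAX, &addr));
}
```

The guest-side benefit is that toy_snapshot_read() is a plain memory load,
so no trap to the hypervisor is needed to obtain the value.
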
 arch/riscv/include/asm/kvm_vcpu_pmu.h |   7 ++
 arch/riscv/kvm/vcpu_pmu.c             | 120 +++++++++++++++++++++++++-
 arch/riscv/kvm/vcpu_sbi_pmu.c         |   3 +
 drivers/perf/riscv_pmu_sbi.c          |   2 +-
 4 files changed, 129 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
index 395518a1664e..586bab84be35 100644
--- a/arch/riscv/include/asm/kvm_vcpu_pmu.h
+++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
@@ -50,6 +50,10 @@ struct kvm_pmu {
 	bool init_done;
 	/* Bit map of all the virtual counter used */
 	DECLARE_BITMAP(pmc_in_use, RISCV_KVM_MAX_COUNTERS);
+	/* The address of the counter snapshot area (guest physical address) */
+	gpa_t snapshot_addr;
+	/* The actual data of the snapshot */
+	struct riscv_pmu_snapshot_data *sdata;
 };
 
 #define vcpu_to_pmu(vcpu) (&(vcpu)->arch.pmu_context)
@@ -85,6 +89,9 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
 int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
 				struct kvm_vcpu_sbi_return *retdata);
 void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu);
+int kvm_riscv_vcpu_pmu_setup_snapshot(struct kvm_vcpu *vcpu, unsigned long saddr_low,
+				      unsigned long saddr_high, unsigned long flags,
+				      struct kvm_vcpu_sbi_return *retdata);
 void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu);
 void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu);
 
diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
index 29bf4ca798cb..74865e6050a1 100644
--- a/arch/riscv/kvm/vcpu_pmu.c
+++ b/arch/riscv/kvm/vcpu_pmu.c
@@ -311,6 +311,81 @@ int kvm_riscv_vcpu_pmu_read_hpm(struct kvm_vcpu *vcpu, unsigned int csr_num,
 	return ret;
 }
 
+static void kvm_pmu_clear_snapshot_area(struct kvm_vcpu *vcpu)
+{
+	struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+	int snapshot_area_size = sizeof(struct riscv_pmu_snapshot_data);
+
+	if (kvpmu->sdata) {
+		memset(kvpmu->sdata, 0, snapshot_area_size);
+		if (kvpmu->snapshot_addr != INVALID_GPA)
+			kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr,
+					     kvpmu->sdata, snapshot_area_size);
+		kfree(kvpmu->sdata);
+		kvpmu->sdata = NULL;
+	}
+	kvpmu->snapshot_addr = INVALID_GPA;
+}
+
+int kvm_riscv_vcpu_pmu_setup_snapshot(struct kvm_vcpu *vcpu, unsigned long saddr_low,
+				      unsigned long saddr_high, unsigned long flags,
+				      struct kvm_vcpu_sbi_return *retdata)
+{
+	struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+	int snapshot_area_size = sizeof(struct riscv_pmu_snapshot_data);
+	int sbiret = 0;
+	gpa_t saddr;
+	unsigned long hva;
+	bool writable;
+
+	if (!kvpmu) {
+		sbiret = SBI_ERR_INVALID_PARAM;
+		goto out;
+	}
+
+	if (saddr_low == -1 && saddr_high == -1) {
+		kvm_pmu_clear_snapshot_area(vcpu);
+		return 0;
+	}
+
+	saddr = saddr_low;
+
+	if (saddr_high != 0) {
+		if (!IS_ENABLED(CONFIG_32BIT)) {
+			sbiret = SBI_ERR_INVALID_ADDRESS;
+			goto out;
+		}
+		saddr |= ((gpa_t)saddr_high << 32);
+	}
+
+	if (kvm_is_error_gpa(vcpu->kvm, saddr)) {
+		sbiret = SBI_ERR_INVALID_PARAM;
+		goto out;
+	}
+
+	hva = kvm_vcpu_gfn_to_hva_prot(vcpu, saddr >> PAGE_SHIFT, &writable);
+	if (kvm_is_error_hva(hva) || !writable) {
+		sbiret = SBI_ERR_INVALID_ADDRESS;
+		goto out;
+	}
+
+	kvpmu->snapshot_addr = saddr;
+	kvpmu->sdata = kzalloc(snapshot_area_size, GFP_ATOMIC);
+	if (!kvpmu->sdata)
+		return -ENOMEM;
+
+	if (kvm_vcpu_write_guest(vcpu, saddr, kvpmu->sdata, snapshot_area_size)) {
+		kfree(kvpmu->sdata);
+		kvpmu->snapshot_addr = INVALID_GPA;
+		sbiret = SBI_ERR_FAILURE;
+	}
+
+out:
+	retdata->err_val = sbiret;
+
+	return 0;
+}
+
 int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu,
 				struct kvm_vcpu_sbi_return *retdata)
 {
@@ -344,20 +419,33 @@ int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
 	int i, pmc_index, sbiret = 0;
 	struct kvm_pmc *pmc;
 	int fevent_code;
+	bool snap_flag_set = flags & SBI_PMU_START_FLAG_INIT_FROM_SNAPSHOT;
 
 	if (kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0) {
 		sbiret = SBI_ERR_INVALID_PARAM;
 		goto out;
 	}
 
+	if (snap_flag_set && kvpmu->snapshot_addr == INVALID_GPA) {
+		sbiret = SBI_ERR_NO_SHMEM;
+		goto out;
+	}
+
 	/* Start the counters that have been configured and requested by the guest */
 	for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
 		pmc_index = i + ctr_base;
 		if (!test_bit(pmc_index, kvpmu->pmc_in_use))
 			continue;
 		pmc = &kvpmu->pmc[pmc_index];
-		if (flags & SBI_PMU_START_FLAG_SET_INIT_VALUE)
+		if (flags & SBI_PMU_START_FLAG_SET_INIT_VALUE) {
 			pmc->counter_val = ival;
+		} else if (snap_flag_set) {
+			kvm_vcpu_read_guest(vcpu, kvpmu->snapshot_addr, kvpmu->sdata,
+					    sizeof(struct riscv_pmu_snapshot_data));
+			/* The counter indices in the snapshot are relative to the counter base */
+			pmc->counter_val = kvpmu->sdata->ctr_values[i];
+		}
+
 		if (pmc->cinfo.type == SBI_PMU_CTR_TYPE_FW) {
 			fevent_code = get_event_code(pmc->event_idx);
 			if (fevent_code >= SBI_PMU_FW_MAX) {
@@ -398,14 +486,21 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
 {
 	struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
 	int i, pmc_index, sbiret = 0;
+	u64 enabled, running;
 	struct kvm_pmc *pmc;
 	int fevent_code;
+	bool snap_flag_set = flags & SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT;
 
-	if (kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0) {
+	if ((kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0)) {
 		sbiret = SBI_ERR_INVALID_PARAM;
 		goto out;
 	}
 
+	if (snap_flag_set && kvpmu->snapshot_addr == INVALID_GPA) {
+		sbiret = SBI_ERR_NO_SHMEM;
+		goto out;
+	}
+
 	/* Stop the counters that have been configured and requested by the guest */
 	for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
 		pmc_index = i + ctr_base;
@@ -438,9 +533,28 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
 		} else {
 			sbiret = SBI_ERR_INVALID_PARAM;
 		}
+
+		if (snap_flag_set && !sbiret) {
+			if (pmc->cinfo.type == SBI_PMU_CTR_TYPE_FW)
+				pmc->counter_val = kvpmu->fw_event[fevent_code].value;
+			else if (pmc->perf_event)
+				pmc->counter_val += perf_event_read_value(pmc->perf_event,
+									  &enabled, &running);
+			/* TODO: Add counter overflow support when sscofpmf support is added */
+			kvpmu->sdata->ctr_values[i] = pmc->counter_val;
+			kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr, kvpmu->sdata,
+					     sizeof(struct riscv_pmu_snapshot_data));
+		}
+
 		if (flags & SBI_PMU_STOP_FLAG_RESET) {
 			pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
 			clear_bit(pmc_index, kvpmu->pmc_in_use);
+			if (snap_flag_set) {
+				/* Clear the snapshot area for the upcoming deletion event */
+				kvpmu->sdata->ctr_values[i] = 0;
+				kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr, kvpmu->sdata,
+						     sizeof(struct riscv_pmu_snapshot_data));
+			}
 		}
 	}
 
@@ -566,6 +680,7 @@ void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
 	kvpmu->num_hw_ctrs = num_hw_ctrs + 1;
 	kvpmu->num_fw_ctrs = SBI_PMU_FW_MAX;
 	memset(&kvpmu->fw_event, 0, SBI_PMU_FW_MAX * sizeof(struct kvm_fw_event));
+	kvpmu->snapshot_addr = INVALID_GPA;
 
 	if (kvpmu->num_hw_ctrs > RISCV_KVM_MAX_HW_CTRS) {
 		pr_warn_once("Limiting the hardware counters to 32 as specified by the ISA");
@@ -625,6 +740,7 @@ void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
 	}
 	bitmap_zero(kvpmu->pmc_in_use, RISCV_MAX_COUNTERS);
 	memset(&kvpmu->fw_event, 0, SBI_PMU_FW_MAX * sizeof(struct kvm_fw_event));
+	kvm_pmu_clear_snapshot_area(vcpu);
 }
 
 void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu)
diff --git a/arch/riscv/kvm/vcpu_sbi_pmu.c b/arch/riscv/kvm/vcpu_sbi_pmu.c
index b70179e9e875..9f61136e4bb1 100644
--- a/arch/riscv/kvm/vcpu_sbi_pmu.c
+++ b/arch/riscv/kvm/vcpu_sbi_pmu.c
@@ -64,6 +64,9 @@ static int kvm_sbi_ext_pmu_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	case SBI_EXT_PMU_COUNTER_FW_READ:
 		ret = kvm_riscv_vcpu_pmu_ctr_read(vcpu, cp->a0, retdata);
 		break;
+	case SBI_EXT_PMU_SNAPSHOT_SET_SHMEM:
+		ret = kvm_riscv_vcpu_pmu_setup_snapshot(vcpu, cp->a0, cp->a1, cp->a2, retdata);
+		break;
 	default:
 		retdata->err_val = SBI_ERR_NOT_SUPPORTED;
 	}
diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 8de5721e8019..1a22ce1ff8c8 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -802,7 +802,7 @@ static noinline void pmu_sbi_start_ovf_ctrs_snapshot(struct cpu_hw_events *cpu_h
 	struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
 
 	for_each_set_bit(idx, cpu_hw_evt->used_hw_ctrs, RISCV_MAX_COUNTERS) {
-		if (ctr_ovf_mask & (1 << idx)) {
+		if (ctr_ovf_mask & (BIT(idx))) {
 			event = cpu_hw_evt->events[idx];
 			hwc = &event->hw;
 			max_period = riscv_pmu_ctr_get_width_mask(event);
-- 
2.34.1



* [PATCH v4 09/15] RISC-V: KVM: Add perf sampling support for guests
  2024-02-29  1:01 [PATCH v4 00/15] RISC-V SBI v2.0 PMU improvements and Perf sampling in KVM guest Atish Patra
                   ` (7 preceding siblings ...)
  2024-02-29  1:01 ` [PATCH v4 08/15] RISC-V: KVM: Implement SBI PMU Snapshot feature Atish Patra
@ 2024-02-29  1:01 ` Atish Patra
  2024-03-02 10:33   ` Andrew Jones
  2024-02-29  1:01 ` [PATCH v4 10/15] RISC-V: KVM: Support 64 bit firmware counters on RV32 Atish Patra
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Atish Patra @ 2024-02-29  1:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Atish Patra, Anup Patel, Albert Ou, Alexandre Ghiti,
	Andrew Jones, Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng,
	kvm-riscv, kvm, linux-kselftest, linux-riscv, Mark Rutland,
	Palmer Dabbelt, Paolo Bonzini, Paul Walmsley, Shuah Khan,
	Will Deacon

KVM enables perf for the guest via counter virtualization. However,
sampling cannot be supported, as there is no mechanism in the ISA yet to
enable trap/emulate of scountovf. Rely on the SBI PMU snapshot
to provide the counter overflow data via the shared memory.

In case of a sampling event, the host first gets the LCOFI interrupt
and injects it into the guest via the IRQ filtering mechanism defined in
the AIA specification. Thus, Ssaia must be enabled in the host in order to
use perf sampling in the guest. No other AIA dependency w.r.t. the kernel
is required.

Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
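A minimal sketch of two calculations used by the sampling path, for
illustration only. The toy_* names are hypothetical. The first function
mirrors the sample period choice in this patch (events left until a
counter of the given width wraps). The other two show how an absolute
counter index maps to a bit in the snapshot overflow mask, which is
indexed relative to the counter base.

```c
#include <assert.h>
#include <stdint.h>

/* Events remaining before a counter with the given width mask overflows. */
uint64_t toy_sample_period(uint64_t counter_val, uint64_t width_mask)
{
	if (!counter_val)
		return width_mask;

	/* Unsigned negation is well defined: (2^64 - counter_val) & mask. */
	return (-counter_val) & width_mask;
}

/* Mark absolute counter pmc_index as overflowed in a base-relative mask. */
uint64_t toy_mark_overflow(uint64_t ovf_mask, unsigned int ctr_base,
			   unsigned int pmc_index)
{
	return ovf_mask | (UINT64_C(1) << (pmc_index - ctr_base));
}

/* Clear only the given counter's bit, leaving the others untouched. */
uint64_t toy_clear_overflow(uint64_t ovf_mask, unsigned int ctr_base,
			    unsigned int pmc_index)
{
	return ovf_mask & ~(UINT64_C(1) << (pmc_index - ctr_base));
}

void toy_overflow_demo(void)
{
	/* A 16-bit counter at 0xFFF0 overflows after 0x10 more events. */
	assert(toy_sample_period(0xFFF0, 0xFFFF) == 0x10);
	/* Counter 5 with base 3 is bit 2 of the mask. */
	assert(toy_mark_overflow(0, 3, 5) == 0x4);
}
```
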
 arch/riscv/include/asm/csr.h          |  3 +-
 arch/riscv/include/asm/kvm_vcpu_pmu.h |  3 ++
 arch/riscv/include/uapi/asm/kvm.h     |  1 +
 arch/riscv/kvm/aia.c                  |  5 ++
 arch/riscv/kvm/vcpu.c                 | 14 ++++--
 arch/riscv/kvm/vcpu_onereg.c          |  9 +++-
 arch/riscv/kvm/vcpu_pmu.c             | 72 ++++++++++++++++++++++++---
 7 files changed, 96 insertions(+), 11 deletions(-)

diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index 603e5a3c61f9..c0de2fd6c564 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -168,7 +168,8 @@
 #define VSIP_TO_HVIP_SHIFT	(IRQ_VS_SOFT - IRQ_S_SOFT)
 #define VSIP_VALID_MASK		((_AC(1, UL) << IRQ_S_SOFT) | \
 				 (_AC(1, UL) << IRQ_S_TIMER) | \
-				 (_AC(1, UL) << IRQ_S_EXT))
+				 (_AC(1, UL) << IRQ_S_EXT) | \
+				 (_AC(1, UL) << IRQ_PMU_OVF))
 
 /* AIA CSR bits */
 #define TOPI_IID_SHIFT		16
diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
index 586bab84be35..8cb21a4f862c 100644
--- a/arch/riscv/include/asm/kvm_vcpu_pmu.h
+++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
@@ -36,6 +36,7 @@ struct kvm_pmc {
 	bool started;
 	/* Monitoring event ID */
 	unsigned long event_idx;
+	struct kvm_vcpu *vcpu;
 };
 
 /* PMU data structure per vcpu */
@@ -50,6 +51,8 @@ struct kvm_pmu {
 	bool init_done;
 	/* Bit map of all the virtual counter used */
 	DECLARE_BITMAP(pmc_in_use, RISCV_KVM_MAX_COUNTERS);
+	/* Bit map of all the virtual counter overflown */
+	DECLARE_BITMAP(pmc_overflown, RISCV_KVM_MAX_COUNTERS);
 	/* The address of the counter snapshot area (guest physical address) */
 	gpa_t snapshot_addr;
 	/* The actual data of the snapshot */
diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
index 7499e88a947c..e8b7545f1803 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -166,6 +166,7 @@ enum KVM_RISCV_ISA_EXT_ID {
 	KVM_RISCV_ISA_EXT_ZVFH,
 	KVM_RISCV_ISA_EXT_ZVFHMIN,
 	KVM_RISCV_ISA_EXT_ZFA,
+	KVM_RISCV_ISA_EXT_SSCOFPMF,
 	KVM_RISCV_ISA_EXT_MAX,
 };
 
diff --git a/arch/riscv/kvm/aia.c b/arch/riscv/kvm/aia.c
index a944294f6f23..0f0a9d11bb5f 100644
--- a/arch/riscv/kvm/aia.c
+++ b/arch/riscv/kvm/aia.c
@@ -545,6 +545,9 @@ void kvm_riscv_aia_enable(void)
 	enable_percpu_irq(hgei_parent_irq,
 			  irq_get_trigger_type(hgei_parent_irq));
 	csr_set(CSR_HIE, BIT(IRQ_S_GEXT));
+	/* Enable IRQ filtering for overflow interrupt only if sscofpmf is present */
+	if (__riscv_isa_extension_available(NULL, RISCV_ISA_EXT_SSCOFPMF))
+		csr_write(CSR_HVIEN, BIT(IRQ_PMU_OVF));
 }
 
 void kvm_riscv_aia_disable(void)
@@ -558,6 +561,8 @@ void kvm_riscv_aia_disable(void)
 		return;
 	hgctrl = get_cpu_ptr(&aia_hgei);
 
+	if (__riscv_isa_extension_available(NULL, RISCV_ISA_EXT_SSCOFPMF))
+		csr_clear(CSR_HVIEN, BIT(IRQ_PMU_OVF));
 	/* Disable per-CPU SGEI interrupt */
 	csr_clear(CSR_HIE, BIT(IRQ_S_GEXT));
 	disable_percpu_irq(hgei_parent_irq);
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index b5ca9f2e98ac..fcd8ad4de4d2 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -365,6 +365,12 @@ void kvm_riscv_vcpu_sync_interrupts(struct kvm_vcpu *vcpu)
 		}
 	}
 
+	/* Sync up the HVIP.LCOFIP bit changes (only clear) by the guest */
+	if ((csr->hvip ^ hvip) & (1UL << IRQ_PMU_OVF)) {
+		if (!test_and_set_bit(IRQ_PMU_OVF, v->irqs_pending_mask))
+			clear_bit(IRQ_PMU_OVF, v->irqs_pending);
+	}
+
 	/* Sync-up AIA high interrupts */
 	kvm_riscv_vcpu_aia_sync_interrupts(vcpu);
 
@@ -382,7 +388,8 @@ int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
 	if (irq < IRQ_LOCAL_MAX &&
 	    irq != IRQ_VS_SOFT &&
 	    irq != IRQ_VS_TIMER &&
-	    irq != IRQ_VS_EXT)
+	    irq != IRQ_VS_EXT &&
+	    irq != IRQ_PMU_OVF)
 		return -EINVAL;
 
 	set_bit(irq, vcpu->arch.irqs_pending);
@@ -397,14 +404,15 @@ int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
 int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
 {
 	/*
-	 * We only allow VS-mode software, timer, and external
+	 * We only allow VS-mode software, timer, counter overflow and external
 	 * interrupts when irq is one of the local interrupts
 	 * defined by RISC-V privilege specification.
 	 */
 	if (irq < IRQ_LOCAL_MAX &&
 	    irq != IRQ_VS_SOFT &&
 	    irq != IRQ_VS_TIMER &&
-	    irq != IRQ_VS_EXT)
+	    irq != IRQ_VS_EXT &&
+	    irq != IRQ_PMU_OVF)
 		return -EINVAL;
 
 	clear_bit(irq, vcpu->arch.irqs_pending);
diff --git a/arch/riscv/kvm/vcpu_onereg.c b/arch/riscv/kvm/vcpu_onereg.c
index 5f7355e96008..a072910820c2 100644
--- a/arch/riscv/kvm/vcpu_onereg.c
+++ b/arch/riscv/kvm/vcpu_onereg.c
@@ -36,6 +36,7 @@ static const unsigned long kvm_isa_ext_arr[] = {
 	/* Multi letter extensions (alphabetically sorted) */
 	KVM_ISA_EXT_ARR(SMSTATEEN),
 	KVM_ISA_EXT_ARR(SSAIA),
+	KVM_ISA_EXT_ARR(SSCOFPMF),
 	KVM_ISA_EXT_ARR(SSTC),
 	KVM_ISA_EXT_ARR(SVINVAL),
 	KVM_ISA_EXT_ARR(SVNAPOT),
@@ -115,6 +116,7 @@ static bool kvm_riscv_vcpu_isa_disable_allowed(unsigned long ext)
 	case KVM_RISCV_ISA_EXT_I:
 	case KVM_RISCV_ISA_EXT_M:
 	case KVM_RISCV_ISA_EXT_SSTC:
+	case KVM_RISCV_ISA_EXT_SSCOFPMF:
 	case KVM_RISCV_ISA_EXT_SVINVAL:
 	case KVM_RISCV_ISA_EXT_SVNAPOT:
 	case KVM_RISCV_ISA_EXT_ZBA:
@@ -171,8 +173,13 @@ void kvm_riscv_vcpu_setup_isa(struct kvm_vcpu *vcpu)
 	for (i = 0; i < ARRAY_SIZE(kvm_isa_ext_arr); i++) {
 		host_isa = kvm_isa_ext_arr[i];
 		if (__riscv_isa_extension_available(NULL, host_isa) &&
-		    kvm_riscv_vcpu_isa_enable_allowed(i))
+		    kvm_riscv_vcpu_isa_enable_allowed(i)) {
+			/* Sscofpmf depends on interrupt filtering defined in ssaia */
+			if (host_isa == RISCV_ISA_EXT_SSCOFPMF &&
+			    !__riscv_isa_extension_available(NULL, RISCV_ISA_EXT_SSAIA))
+				continue;
 			set_bit(host_isa, vcpu->arch.isa);
+		}
 	}
 }
 
diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
index 74865e6050a1..a02f7b981005 100644
--- a/arch/riscv/kvm/vcpu_pmu.c
+++ b/arch/riscv/kvm/vcpu_pmu.c
@@ -39,7 +39,7 @@ static u64 kvm_pmu_get_sample_period(struct kvm_pmc *pmc)
 	u64 sample_period;
 
 	if (!pmc->counter_val)
-		sample_period = counter_val_mask + 1;
+		sample_period = counter_val_mask;
 	else
 		sample_period = (-pmc->counter_val) & counter_val_mask;
 
@@ -229,6 +229,47 @@ static int kvm_pmu_validate_counter_mask(struct kvm_pmu *kvpmu, unsigned long ct
 	return 0;
 }
 
+static void kvm_riscv_pmu_overflow(struct perf_event *perf_event,
+				   struct perf_sample_data *data,
+				   struct pt_regs *regs)
+{
+	struct kvm_pmc *pmc = perf_event->overflow_handler_context;
+	struct kvm_vcpu *vcpu = pmc->vcpu;
+	struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+	struct riscv_pmu *rpmu = to_riscv_pmu(perf_event->pmu);
+	u64 period;
+
+	/*
+	 * Stop the event counting by directly accessing the perf_event.
+	 * Otherwise, this needs to be deferred via a workqueue.
+	 * That will introduce skew in the counter value because the actual
+	 * physical counter would start after returning from this function.
+	 * It will be stopped again once the workqueue is scheduled
+	 */
+	rpmu->pmu.stop(perf_event, PERF_EF_UPDATE);
+
+	/*
+	 * The hw counter would start automatically when this function returns.
+	 * Thus, the host may continue to interrupt and inject it to the guest
+	 * even without the guest configuring the next event. Depending on the hardware
+	 * the host may have some sluggishness only if privilege mode filtering is not
+	 * available. In an ideal world, where qemu is not the only capable hardware,
+	 * this can be removed.
+	 * FYI: ARM64 does it this way while x86 doesn't do anything as such.
+	 * TODO: Should we keep it for RISC-V ?
+	 */
+	period = -(local64_read(&perf_event->count));
+
+	local64_set(&perf_event->hw.period_left, 0);
+	perf_event->attr.sample_period = period;
+	perf_event->hw.sample_period = period;
+
+	set_bit(pmc->idx, kvpmu->pmc_overflown);
+	kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_PMU_OVF);
+
+	rpmu->pmu.start(perf_event, PERF_EF_RELOAD);
+}
+
 static long kvm_pmu_create_perf_event(struct kvm_pmc *pmc, struct perf_event_attr *attr,
 				      unsigned long flags, unsigned long eidx,
 				      unsigned long evtdata)
@@ -248,7 +289,7 @@ static long kvm_pmu_create_perf_event(struct kvm_pmc *pmc, struct perf_event_att
 	 */
 	attr->sample_period = kvm_pmu_get_sample_period(pmc);
 
-	event = perf_event_create_kernel_counter(attr, -1, current, NULL, pmc);
+	event = perf_event_create_kernel_counter(attr, -1, current, kvm_riscv_pmu_overflow, pmc);
 	if (IS_ERR(event)) {
 		pr_err("kvm pmu event creation failed for eidx %lx: %ld\n", eidx, PTR_ERR(event));
 		return PTR_ERR(event);
@@ -436,6 +477,8 @@ int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
 		pmc_index = i + ctr_base;
 		if (!test_bit(pmc_index, kvpmu->pmc_in_use))
 			continue;
+		/* The guest started the counter again. Reset the overflow status */
+		clear_bit(pmc_index, kvpmu->pmc_overflown);
 		pmc = &kvpmu->pmc[pmc_index];
 		if (flags & SBI_PMU_START_FLAG_SET_INIT_VALUE) {
 			pmc->counter_val = ival;
@@ -474,6 +517,10 @@ int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
 		}
 	}
 
+	/* The guest has serviced the interrupt and is starting the counter again */
+	if (test_bit(IRQ_PMU_OVF, vcpu->arch.irqs_pending))
+		kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_PMU_OVF);
+
 out:
 	retdata->err_val = sbiret;
 
@@ -540,7 +587,13 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
 			else if (pmc->perf_event)
 				pmc->counter_val += perf_event_read_value(pmc->perf_event,
 									  &enabled, &running);
-			/* TODO: Add counter overflow support when sscofpmf support is added */
+			/*
+			 * The counter and overflow indices in the snapshot region are w.r.t.
+			 * cbase. Modify the set bit in the counter mask instead of the pmc_index
+			 * which indicates the absolute counter index.
+			 */
+			if (test_bit(pmc_index, kvpmu->pmc_overflown))
+				kvpmu->sdata->ctr_overflow_mask |= (1UL << i);
 			kvpmu->sdata->ctr_values[i] = pmc->counter_val;
 			kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr, kvpmu->sdata,
 					     sizeof(struct riscv_pmu_snapshot_data));
@@ -549,15 +602,20 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
 		if (flags & SBI_PMU_STOP_FLAG_RESET) {
 			pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
 			clear_bit(pmc_index, kvpmu->pmc_in_use);
+			clear_bit(pmc_index, kvpmu->pmc_overflown);
 			if (snap_flag_set) {
 				/* Clear the snapshot area for the upcoming deletion event */
 				kvpmu->sdata->ctr_values[i] = 0;
+				/*
+				 * Only clear the given counter as the caller is responsible to
+				 * validate both the overflow mask and configured counters.
+				 */
+				kvpmu->sdata->ctr_overflow_mask &= ~(1UL << i);
 				kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr, kvpmu->sdata,
 						     sizeof(struct riscv_pmu_snapshot_data));
 			}
 		}
 	}
-
 out:
 	retdata->err_val = sbiret;
 
@@ -700,6 +758,7 @@ void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
 		pmc = &kvpmu->pmc[i];
 		pmc->idx = i;
 		pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
+		pmc->vcpu = vcpu;
 		if (i < kvpmu->num_hw_ctrs) {
 			pmc->cinfo.type = SBI_PMU_CTR_TYPE_HW;
 			if (i < 3)
@@ -732,13 +791,14 @@ void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
 	if (!kvpmu)
 		return;
 
-	for_each_set_bit(i, kvpmu->pmc_in_use, RISCV_MAX_COUNTERS) {
+	for_each_set_bit(i, kvpmu->pmc_in_use, RISCV_KVM_MAX_COUNTERS) {
 		pmc = &kvpmu->pmc[i];
 		pmc->counter_val = 0;
 		kvm_pmu_release_perf_event(pmc);
 		pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
 	}
-	bitmap_zero(kvpmu->pmc_in_use, RISCV_MAX_COUNTERS);
+	bitmap_zero(kvpmu->pmc_in_use, RISCV_KVM_MAX_COUNTERS);
+	bitmap_zero(kvpmu->pmc_overflown, RISCV_KVM_MAX_COUNTERS);
 	memset(&kvpmu->fw_event, 0, SBI_PMU_FW_MAX * sizeof(struct kvm_fw_event));
 	kvm_pmu_clear_snapshot_area(vcpu);
 }
-- 
2.34.1



* [PATCH v4 10/15] RISC-V: KVM: Support 64 bit firmware counters on RV32
  2024-02-29  1:01 [PATCH v4 00/15] RISC-V SBI v2.0 PMU improvements and Perf sampling in KVM guest Atish Patra
                   ` (8 preceding siblings ...)
  2024-02-29  1:01 ` [PATCH v4 09/15] RISC-V: KVM: Add perf sampling support for guests Atish Patra
@ 2024-02-29  1:01 ` Atish Patra
  2024-03-02 10:52   ` Andrew Jones
  2024-02-29  1:01 ` [PATCH v4 11/15] KVM: riscv: selftests: Add Sscofpmf to get-reg-list test Atish Patra
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Atish Patra @ 2024-02-29  1:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Atish Patra, Anup Patel, Albert Ou, Alexandre Ghiti,
	Andrew Jones, Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng,
	kvm-riscv, kvm, linux-kselftest, linux-riscv, Mark Rutland,
	Palmer Dabbelt, Paolo Bonzini, Paul Walmsley, Shuah Khan,
	Will Deacon

SBI v2.0 introduced a fw_read_hi function to read 64-bit firmware
counters on RV32-based systems.

Add infrastructure to support that.

Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
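A minimal sketch of the split that fw_read_hi makes possible, for
illustration only. The toy_* names are hypothetical. On RV32 a single SBI
return value is 32 bits wide, so a 64-bit firmware counter has to be
fetched as a low half and a high half and recombined by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits, as an ordinary firmware counter read would return on RV32. */
uint32_t toy_fw_read_lo(uint64_t value)
{
	return (uint32_t)value;
}

/* High 32 bits, as the fw_read_hi call would return on RV32. */
uint32_t toy_fw_read_hi(uint64_t value)
{
	return (uint32_t)(value >> 32);
}

/* Caller side: rebuild the 64-bit value from the two halves. */
uint64_t toy_fw_combine(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}

void toy_fw_read_demo(void)
{
	uint64_t v = 0x1234567890ABCDEFULL;

	assert(toy_fw_combine(toy_fw_read_lo(v), toy_fw_read_hi(v)) == v);
}
```
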
 arch/riscv/include/asm/kvm_vcpu_pmu.h |  4 ++-
 arch/riscv/kvm/vcpu_pmu.c             | 37 ++++++++++++++++++++++++++-
 arch/riscv/kvm/vcpu_sbi_pmu.c         |  6 +++++
 3 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
index 8cb21a4f862c..e0ad27dea46c 100644
--- a/arch/riscv/include/asm/kvm_vcpu_pmu.h
+++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
@@ -20,7 +20,7 @@ static_assert(RISCV_KVM_MAX_COUNTERS <= 64);
 
 struct kvm_fw_event {
 	/* Current value of the event */
-	unsigned long value;
+	u64 value;
 
 	/* Event monitoring status */
 	bool started;
@@ -91,6 +91,8 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
 				     struct kvm_vcpu_sbi_return *retdata);
 int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
 				struct kvm_vcpu_sbi_return *retdata);
+int kvm_riscv_vcpu_pmu_fw_ctr_read_hi(struct kvm_vcpu *vcpu, unsigned long cidx,
+				      struct kvm_vcpu_sbi_return *retdata);
 void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu);
 int kvm_riscv_vcpu_pmu_setup_snapshot(struct kvm_vcpu *vcpu, unsigned long saddr_low,
 				      unsigned long saddr_high, unsigned long flags,
diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
index a02f7b981005..469bb430cf97 100644
--- a/arch/riscv/kvm/vcpu_pmu.c
+++ b/arch/riscv/kvm/vcpu_pmu.c
@@ -196,6 +196,29 @@ static int pmu_get_pmc_index(struct kvm_pmu *pmu, unsigned long eidx,
 	return kvm_pmu_get_programmable_pmc_index(pmu, eidx, cbase, cmask);
 }
 
+static int pmu_fw_ctr_read_hi(struct kvm_vcpu *vcpu, unsigned long cidx,
+			      unsigned long *out_val)
+{
+	struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+	struct kvm_pmc *pmc;
+	int fevent_code;
+
+	if (!IS_ENABLED(CONFIG_32BIT))
+		return -EINVAL;
+
+	pmc = &kvpmu->pmc[cidx];
+
+	if (pmc->cinfo.type != SBI_PMU_CTR_TYPE_FW)
+		return -EINVAL;
+
+	fevent_code = get_event_code(pmc->event_idx);
+	pmc->counter_val = kvpmu->fw_event[fevent_code].value;
+
+	*out_val = pmc->counter_val >> 32;
+
+	return 0;
+}
+
 static int pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
 			unsigned long *out_val)
 {
@@ -702,6 +725,18 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
 	return 0;
 }
 
+int kvm_riscv_vcpu_pmu_fw_ctr_read_hi(struct kvm_vcpu *vcpu, unsigned long cidx,
+				      struct kvm_vcpu_sbi_return *retdata)
+{
+	int ret;
+
+	ret = pmu_fw_ctr_read_hi(vcpu, cidx, &retdata->out_val);
+	if (ret == -EINVAL)
+		retdata->err_val = SBI_ERR_INVALID_PARAM;
+
+	return 0;
+}
+
 int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
 				struct kvm_vcpu_sbi_return *retdata)
 {
@@ -775,7 +810,7 @@ void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
 			pmc->cinfo.csr = CSR_CYCLE + i;
 		} else {
 			pmc->cinfo.type = SBI_PMU_CTR_TYPE_FW;
-			pmc->cinfo.width = BITS_PER_LONG - 1;
+			pmc->cinfo.width = 63;
 		}
 	}
 
diff --git a/arch/riscv/kvm/vcpu_sbi_pmu.c b/arch/riscv/kvm/vcpu_sbi_pmu.c
index 9f61136e4bb1..58a0e5587e2a 100644
--- a/arch/riscv/kvm/vcpu_sbi_pmu.c
+++ b/arch/riscv/kvm/vcpu_sbi_pmu.c
@@ -64,6 +64,12 @@ static int kvm_sbi_ext_pmu_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	case SBI_EXT_PMU_COUNTER_FW_READ:
 		ret = kvm_riscv_vcpu_pmu_ctr_read(vcpu, cp->a0, retdata);
 		break;
+	case SBI_EXT_PMU_COUNTER_FW_READ_HI:
+		if (IS_ENABLED(CONFIG_32BIT))
+			ret = kvm_riscv_vcpu_pmu_fw_ctr_read_hi(vcpu, cp->a0, retdata);
+		else
+			retdata->out_val = 0;
+		break;
 	case SBI_EXT_PMU_SNAPSHOT_SET_SHMEM:
 		ret = kvm_riscv_vcpu_pmu_setup_snapshot(vcpu, cp->a0, cp->a1, cp->a2, retdata);
 		break;
-- 
2.34.1



* [PATCH v4 11/15] KVM: riscv: selftests: Add Sscofpmf to get-reg-list test
  2024-02-29  1:01 [PATCH v4 00/15] RISC-V SBI v2.0 PMU improvements and Perf sampling in KVM guest Atish Patra
                   ` (9 preceding siblings ...)
  2024-02-29  1:01 ` [PATCH v4 10/15] RISC-V: KVM: Support 64 bit firmware counters on RV32 Atish Patra
@ 2024-02-29  1:01 ` Atish Patra
  2024-03-01  4:42   ` Anup Patel
  2024-03-02 10:52   ` Andrew Jones
  2024-02-29  1:01 ` [PATCH v4 12/15] KVM: riscv: selftests: Add SBI PMU extension definitions Atish Patra
                   ` (3 subsequent siblings)
  14 siblings, 2 replies; 56+ messages in thread
From: Atish Patra @ 2024-02-29  1:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Atish Patra, Albert Ou, Alexandre Ghiti, Andrew Jones,
	Anup Patel, Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng,
	kvm-riscv, kvm, linux-kselftest, linux-riscv, Mark Rutland,
	Palmer Dabbelt, Paolo Bonzini, Paul Walmsley, Shuah Khan,
	Will Deacon

KVM RISC-V allows the Sscofpmf extension for Guest/VM, so add this
extension to the get-reg-list test.

Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
 tools/testing/selftests/kvm/riscv/get-reg-list.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/testing/selftests/kvm/riscv/get-reg-list.c b/tools/testing/selftests/kvm/riscv/get-reg-list.c
index 8cece02ca23a..ca6d98a5dce5 100644
--- a/tools/testing/selftests/kvm/riscv/get-reg-list.c
+++ b/tools/testing/selftests/kvm/riscv/get-reg-list.c
@@ -43,6 +43,7 @@ bool filter_reg(__u64 reg)
 	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_V:
 	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SMSTATEEN:
 	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SSAIA:
+	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SSCOFPMF:
 	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SSTC:
 	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SVINVAL:
 	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SVNAPOT:
@@ -406,6 +407,7 @@ static const char *isa_ext_single_id_to_str(__u64 reg_off)
 		KVM_ISA_EXT_ARR(V),
 		KVM_ISA_EXT_ARR(SMSTATEEN),
 		KVM_ISA_EXT_ARR(SSAIA),
+		KVM_ISA_EXT_ARR(SSCOFPMF),
 		KVM_ISA_EXT_ARR(SSTC),
 		KVM_ISA_EXT_ARR(SVINVAL),
 		KVM_ISA_EXT_ARR(SVNAPOT),
@@ -927,6 +929,7 @@ KVM_ISA_EXT_SUBLIST_CONFIG(fp_f, FP_F);
 KVM_ISA_EXT_SUBLIST_CONFIG(fp_d, FP_D);
 KVM_ISA_EXT_SIMPLE_CONFIG(h, H);
 KVM_ISA_EXT_SUBLIST_CONFIG(smstateen, SMSTATEEN);
+KVM_ISA_EXT_SIMPLE_CONFIG(sscofpmf, SSCOFPMF);
 KVM_ISA_EXT_SIMPLE_CONFIG(sstc, SSTC);
 KVM_ISA_EXT_SIMPLE_CONFIG(svinval, SVINVAL);
 KVM_ISA_EXT_SIMPLE_CONFIG(svnapot, SVNAPOT);
@@ -980,6 +983,7 @@ struct vcpu_reg_list *vcpu_configs[] = {
 	&config_fp_d,
 	&config_h,
 	&config_smstateen,
+	&config_sscofpmf,
 	&config_sstc,
 	&config_svinval,
 	&config_svnapot,
-- 
2.34.1



* [PATCH v4 12/15] KVM: riscv: selftests: Add SBI PMU extension definitions
  2024-02-29  1:01 [PATCH v4 00/15] RISC-V SBI v2.0 PMU improvements and Perf sampling in KVM guest Atish Patra
                   ` (10 preceding siblings ...)
  2024-02-29  1:01 ` [PATCH v4 11/15] KVM: riscv: selftests: Add Sscofpmf to get-reg-list test Atish Patra
@ 2024-02-29  1:01 ` Atish Patra
  2024-03-01  4:43   ` Anup Patel
  2024-03-02 11:00   ` Andrew Jones
  2024-02-29  1:01 ` [PATCH v4 13/15] KVM: riscv: selftests: Add SBI PMU selftest Atish Patra
                   ` (2 subsequent siblings)
  14 siblings, 2 replies; 56+ messages in thread
From: Atish Patra @ 2024-02-29  1:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Atish Patra, Albert Ou, Alexandre Ghiti, Andrew Jones,
	Anup Patel, Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng,
	kvm-riscv, kvm, linux-kselftest, linux-riscv, Mark Rutland,
	Palmer Dabbelt, Paolo Bonzini, Paul Walmsley, Shuah Khan,
	Will Deacon

The SBI PMU extension definitions are required for the upcoming SBI PMU
selftests.

Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
 .../selftests/kvm/include/riscv/processor.h   | 67 +++++++++++++++++++
 1 file changed, 67 insertions(+)
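
For illustration only (not part of this patch): the counter_info word
returned by COUNTER_GET_INFO packs the CSR number, the width and the
counter type. The sketch below decodes it with explicit shifts instead
of the bitfield union, assuming XLEN = 64; struct ctr_info_fields and
decode_ctr_info() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded view of a counter_info word (XLEN = 64 assumed). */
struct ctr_info_fields {
	unsigned int csr;	/* bits [11:0]: CSR number */
	unsigned int width;	/* bits [17:12]: counter width minus one */
	unsigned int type;	/* bit 63: 0 = hardware, 1 = firmware */
};

static struct ctr_info_fields decode_ctr_info(uint64_t raw)
{
	struct ctr_info_fields f;

	f.csr = (unsigned int)(raw & 0xfff);
	f.width = (unsigned int)((raw >> 12) & 0x3f);
	f.type = (unsigned int)(raw >> 63);

	return f;
}
```

For example, a raw value of 0x3fc00 would decode to CSR 0xc00 (cycle),
width 63 and type 0 (hardware).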

diff --git a/tools/testing/selftests/kvm/include/riscv/processor.h b/tools/testing/selftests/kvm/include/riscv/processor.h
index f75c381fa35a..a49a39c8e8d4 100644
--- a/tools/testing/selftests/kvm/include/riscv/processor.h
+++ b/tools/testing/selftests/kvm/include/riscv/processor.h
@@ -169,17 +169,84 @@ void vm_install_exception_handler(struct kvm_vm *vm, int vector, exception_handl
 enum sbi_ext_id {
 	SBI_EXT_BASE = 0x10,
 	SBI_EXT_STA = 0x535441,
+	SBI_EXT_PMU = 0x504D55,
 };
 
 enum sbi_ext_base_fid {
 	SBI_EXT_BASE_PROBE_EXT = 3,
 };
 
+enum sbi_ext_pmu_fid {
+	SBI_EXT_PMU_NUM_COUNTERS = 0,
+	SBI_EXT_PMU_COUNTER_GET_INFO,
+	SBI_EXT_PMU_COUNTER_CFG_MATCH,
+	SBI_EXT_PMU_COUNTER_START,
+	SBI_EXT_PMU_COUNTER_STOP,
+	SBI_EXT_PMU_COUNTER_FW_READ,
+	SBI_EXT_PMU_COUNTER_FW_READ_HI,
+	SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
+};
+
+union sbi_pmu_ctr_info {
+	unsigned long value;
+	struct {
+		unsigned long csr:12;
+		unsigned long width:6;
+#if __riscv_xlen == 32
+		unsigned long reserved:13;
+#else
+		unsigned long reserved:45;
+#endif
+		unsigned long type:1;
+	};
+};
+
 struct sbiret {
 	long error;
 	long value;
 };
 
+/* General PMU event codes specified in the SBI PMU extension */
+enum sbi_pmu_hw_generic_events_t {
+	SBI_PMU_HW_NO_EVENT			= 0,
+	SBI_PMU_HW_CPU_CYCLES			= 1,
+	SBI_PMU_HW_INSTRUCTIONS			= 2,
+	SBI_PMU_HW_CACHE_REFERENCES		= 3,
+	SBI_PMU_HW_CACHE_MISSES			= 4,
+	SBI_PMU_HW_BRANCH_INSTRUCTIONS		= 5,
+	SBI_PMU_HW_BRANCH_MISSES		= 6,
+	SBI_PMU_HW_BUS_CYCLES			= 7,
+	SBI_PMU_HW_STALLED_CYCLES_FRONTEND	= 8,
+	SBI_PMU_HW_STALLED_CYCLES_BACKEND	= 9,
+	SBI_PMU_HW_REF_CPU_CYCLES		= 10,
+
+	SBI_PMU_HW_GENERAL_MAX,
+};
+
+/* SBI PMU counter types */
+enum sbi_pmu_ctr_type {
+	SBI_PMU_CTR_TYPE_HW = 0x0,
+	SBI_PMU_CTR_TYPE_FW,
+};
+
+/* Flags defined for config matching function */
+#define SBI_PMU_CFG_FLAG_SKIP_MATCH	(1 << 0)
+#define SBI_PMU_CFG_FLAG_CLEAR_VALUE	(1 << 1)
+#define SBI_PMU_CFG_FLAG_AUTO_START	(1 << 2)
+#define SBI_PMU_CFG_FLAG_SET_VUINH	(1 << 3)
+#define SBI_PMU_CFG_FLAG_SET_VSINH	(1 << 4)
+#define SBI_PMU_CFG_FLAG_SET_UINH	(1 << 5)
+#define SBI_PMU_CFG_FLAG_SET_SINH	(1 << 6)
+#define SBI_PMU_CFG_FLAG_SET_MINH	(1 << 7)
+
+/* Flags defined for counter start function */
+#define SBI_PMU_START_FLAG_SET_INIT_VALUE (1 << 0)
+#define SBI_PMU_START_FLAG_INIT_FROM_SNAPSHOT BIT(1)
+
+/* Flags defined for counter stop function */
+#define SBI_PMU_STOP_FLAG_RESET (1 << 0)
+#define SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT BIT(1)
+
 struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
 			unsigned long arg1, unsigned long arg2,
 			unsigned long arg3, unsigned long arg4,
-- 
2.34.1



* [PATCH v4 13/15] KVM: riscv: selftests: Add SBI PMU selftest
  2024-02-29  1:01 [PATCH v4 00/15] RISC-V SBI v2.0 PMU improvements and Perf sampling in KVM guest Atish Patra
                   ` (11 preceding siblings ...)
  2024-02-29  1:01 ` [PATCH v4 12/15] KVM: riscv: selftests: Add SBI PMU extension definitions Atish Patra
@ 2024-02-29  1:01 ` Atish Patra
  2024-03-01  4:47   ` Anup Patel
  2024-03-02 11:52   ` Andrew Jones
  2024-02-29  1:01 ` [PATCH v4 14/15] KVM: riscv: selftests: Add a test for PMU snapshot functionality Atish Patra
  2024-02-29  1:01 ` [PATCH v4 15/15] KVM: riscv: selftests: Add a test for counter overflow Atish Patra
  14 siblings, 2 replies; 56+ messages in thread
From: Atish Patra @ 2024-02-29  1:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Atish Patra, Albert Ou, Alexandre Ghiti, Andrew Jones,
	Anup Patel, Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng,
	kvm-riscv, kvm, linux-kselftest, linux-riscv, Mark Rutland,
	Palmer Dabbelt, Paolo Bonzini, Paul Walmsley, Shuah Khan,
	Will Deacon

This test implements a basic sanity test and cycle/instret event
counting tests.

Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
 tools/testing/selftests/kvm/Makefile        |   1 +
 tools/testing/selftests/kvm/riscv/sbi_pmu.c | 340 ++++++++++++++++++++
 2 files changed, 341 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/riscv/sbi_pmu.c
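
For illustration only (not part of this patch): pmu_csr_read_num() uses
nested macros because a CSR read encodes the CSR number in the
instruction itself, so every counter needs its own case label. The
sketch below shows the same doubling technique against a hypothetical
array (fake_csr) instead of real CSRs; all names are made up for the
example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file standing in for 32 consecutive counter CSRs. */
static uint64_t fake_csr[32];

/* Each level doubles the number of case labels generated by the level below. */
#define CASE_READ(n, val)	case (n): (val) = fake_csr[(n)]; break;
#define CASE_READ_2(n, val)	CASE_READ((n) + 0, val) CASE_READ((n) + 1, val)
#define CASE_READ_4(n, val)	CASE_READ_2((n) + 0, val) CASE_READ_2((n) + 2, val)
#define CASE_READ_8(n, val)	CASE_READ_4((n) + 0, val) CASE_READ_4((n) + 4, val)
#define CASE_READ_16(n, val)	CASE_READ_8((n) + 0, val) CASE_READ_8((n) + 8, val)
#define CASE_READ_32(n, val)	CASE_READ_16((n) + 0, val) CASE_READ_16((n) + 16, val)

/* Returns the register value for indices 0..31 and 0 for anything else. */
static uint64_t fake_csr_read_num(int n)
{
	uint64_t ret = 0;

	switch (n) {
	CASE_READ_32(0, ret)
	default:
		break;
	}

	return ret;
}
```

An out-of-range index falls through to the default case and yields 0,
which mirrors how the test helper treats unknown CSR numbers.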

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 426f85798aea..b2dce6843b9e 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -195,6 +195,7 @@ TEST_GEN_PROGS_riscv += kvm_create_max_vcpus
 TEST_GEN_PROGS_riscv += kvm_page_table_test
 TEST_GEN_PROGS_riscv += set_memory_region_test
 TEST_GEN_PROGS_riscv += steal_time
+TEST_GEN_PROGS_riscv += riscv/sbi_pmu
 
 SPLIT_TESTS += arch_timer
 SPLIT_TESTS += get-reg-list
diff --git a/tools/testing/selftests/kvm/riscv/sbi_pmu.c b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
new file mode 100644
index 000000000000..fc1fc5eea99e
--- /dev/null
+++ b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
@@ -0,0 +1,340 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * sbi_pmu.c - Tests the RISC-V SBI PMU functionality in a KVM guest
+ *
+ * The test validates the SBI PMU extension: basic sanity checks and
+ * cycle/instret event counting.
+ *
+ * Copyright (c) 2024, Rivos Inc.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include "kvm_util.h"
+#include "test_util.h"
+#include "processor.h"
+
+/* Maximum counters (firmware + hardware)*/
+#define RISCV_MAX_PMU_COUNTERS 64
+union sbi_pmu_ctr_info ctrinfo_arr[RISCV_MAX_PMU_COUNTERS];
+
+/* Cache the available counters in a bitmask */
+static unsigned long counter_mask_available;
+
+unsigned long pmu_csr_read_num(int csr_num)
+{
+#define switchcase_csr_read(__csr_num, __val)		{\
+	case __csr_num:					\
+		__val = csr_read(__csr_num);		\
+		break; }
+#define switchcase_csr_read_2(__csr_num, __val)		{\
+	switchcase_csr_read(__csr_num + 0, __val)	 \
+	switchcase_csr_read(__csr_num + 1, __val)}
+#define switchcase_csr_read_4(__csr_num, __val)		{\
+	switchcase_csr_read_2(__csr_num + 0, __val)	 \
+	switchcase_csr_read_2(__csr_num + 2, __val)}
+#define switchcase_csr_read_8(__csr_num, __val)		{\
+	switchcase_csr_read_4(__csr_num + 0, __val)	 \
+	switchcase_csr_read_4(__csr_num + 4, __val)}
+#define switchcase_csr_read_16(__csr_num, __val)	{\
+	switchcase_csr_read_8(__csr_num + 0, __val)	 \
+	switchcase_csr_read_8(__csr_num + 8, __val)}
+#define switchcase_csr_read_32(__csr_num, __val)	{\
+	switchcase_csr_read_16(__csr_num + 0, __val)	 \
+	switchcase_csr_read_16(__csr_num + 16, __val)}
+
+	unsigned long ret = 0;
+
+	switch (csr_num) {
+	switchcase_csr_read_32(CSR_CYCLE, ret)
+	switchcase_csr_read_32(CSR_CYCLEH, ret)
+	default :
+		break;
+	}
+
+	return ret;
+#undef switchcase_csr_read_32
+#undef switchcase_csr_read_16
+#undef switchcase_csr_read_8
+#undef switchcase_csr_read_4
+#undef switchcase_csr_read_2
+#undef switchcase_csr_read
+}
+
+static inline void dummy_func_loop(int iter)
+{
+	int i = 0;
+
+	while (i < iter) {
+		asm volatile("nop");
+		i++;
+	}
+}
+
+static void guest_illegal_exception_handler(struct ex_regs *regs)
+{
+	__GUEST_ASSERT(regs->cause == EXC_INST_ILLEGAL,
+		       "Unexpected exception handler %lx\n", regs->cause);
+
+	/* skip the trapping instruction */
+	regs->epc += 4;
+}
+
+static unsigned long get_counter_index(unsigned long cbase, unsigned long cmask,
+				       unsigned long cflags,
+				       unsigned long event)
+{
+	struct sbiret ret;
+
+	ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH, cbase, cmask,
+			cflags, event, 0, 0);
+	__GUEST_ASSERT(ret.error == 0, "config matching failed %ld\n", ret.error);
+	GUEST_ASSERT((ret.value < RISCV_MAX_PMU_COUNTERS) &&
+		    ((1UL << ret.value) & counter_mask_available));
+
+	return ret.value;
+}
+
+static unsigned long get_num_counters(void)
+{
+	struct sbiret ret;
+
+	ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_NUM_COUNTERS, 0, 0, 0, 0, 0, 0);
+
+	__GUEST_ASSERT(ret.error == 0, "Unable to retrieve number of counters from SBI PMU");
+
+	__GUEST_ASSERT(ret.value < RISCV_MAX_PMU_COUNTERS,
+		       "Invalid number of counters %ld\n", ret.value);
+
+	return ret.value;
+}
+
+static void update_counter_info(int num_counters)
+{
+	int i = 0;
+	struct sbiret ret;
+
+	for (i = 0; i < num_counters; i++) {
+		ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_GET_INFO, i, 0, 0, 0, 0, 0);
+
+		/* There can be gaps in logical counter indices */
+		if (ret.error)
+			continue;
+		GUEST_ASSERT_NE(ret.value, 0);
+		ctrinfo_arr[i].value = ret.value;
+		counter_mask_available |= BIT(i);
+	}
+
+	GUEST_ASSERT(counter_mask_available > 0);
+}
+
+static unsigned long read_counter(int idx, union sbi_pmu_ctr_info ctrinfo)
+{
+	unsigned long counter_val = 0;
+	struct sbiret ret;
+
+	__GUEST_ASSERT(ctrinfo.type < 2, "Invalid counter type %d", ctrinfo.type);
+
+	if (ctrinfo.type == SBI_PMU_CTR_TYPE_HW) {
+		counter_val = pmu_csr_read_num(ctrinfo.csr);
+	} else if (ctrinfo.type == SBI_PMU_CTR_TYPE_FW) {
+		ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_FW_READ, idx, 0, 0, 0, 0, 0);
+		GUEST_ASSERT(ret.error == 0);
+		counter_val = ret.value;
+	}
+
+	return counter_val;
+}
+
+static void start_counter(unsigned long counter, unsigned long start_flags,
+			  unsigned long ival)
+{
+	struct sbiret ret;
+
+	ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, counter, 1, start_flags,
+			ival, 0, 0);
+	__GUEST_ASSERT(ret.error == 0, "Unable to start counter %ld\n", counter);
+}
+
+static void stop_counter(unsigned long counter, unsigned long stop_flags)
+{
+	struct sbiret ret;
+
+	ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, counter, 1, stop_flags,
+			0, 0, 0);
+	if (stop_flags & SBI_PMU_STOP_FLAG_RESET)
+		__GUEST_ASSERT(ret.error == SBI_ERR_ALREADY_STOPPED,
+			       "Unable to stop counter %ld\n", counter);
+	else
+		__GUEST_ASSERT(ret.error == 0, "Unable to stop counter %ld error %ld\n",
+			       counter, ret.error);
+}
+
+static void test_pmu_event(unsigned long event)
+{
+	unsigned long counter;
+	unsigned long counter_value_pre, counter_value_post;
+	unsigned long counter_init_value = 100;
+
+	counter = get_counter_index(0, counter_mask_available, 0, event);
+	counter_value_pre = read_counter(counter, ctrinfo_arr[counter]);
+
+	/* Do not set the initial value */
+	start_counter(counter, 0, counter_init_value);
+	dummy_func_loop(10000);
+
+	stop_counter(counter, 0);
+
+	counter_value_post = read_counter(counter, ctrinfo_arr[counter]);
+	__GUEST_ASSERT(counter_value_post > counter_value_pre,
+		       "counter_value_post %lx counter_value_pre %lx\n",
+		       counter_value_post, counter_value_pre);
+
+	/* Now set the initial value and compare */
+	start_counter(counter, SBI_PMU_START_FLAG_SET_INIT_VALUE, counter_init_value);
+	dummy_func_loop(10000);
+
+	stop_counter(counter, 0);
+
+	counter_value_post = read_counter(counter, ctrinfo_arr[counter]);
+	__GUEST_ASSERT(counter_value_post > counter_init_value,
+		       "counter_value_post %lx counter_init_value %lx\n",
+		       counter_value_post, counter_init_value);
+
+	stop_counter(counter, SBI_PMU_STOP_FLAG_RESET);
+}
+
+static void test_invalid_event(void)
+{
+	struct sbiret ret;
+	unsigned long event = 0x1234; /* A random event */
+
+	ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH, 0,
+			counter_mask_available, 0, event, 0, 0);
+	GUEST_ASSERT_EQ(ret.error, SBI_ERR_NOT_SUPPORTED);
+}
+
+static void test_pmu_events(int cpu)
+{
+	int num_counters = 0;
+
+	/* Get the counter details */
+	num_counters = get_num_counters();
+	update_counter_info(num_counters);
+
+	/* Sanity testing for any random invalid event */
+	test_invalid_event();
+
+	/* Only these two events are guaranteed to be present */
+	test_pmu_event(SBI_PMU_HW_CPU_CYCLES);
+	test_pmu_event(SBI_PMU_HW_INSTRUCTIONS);
+
+	GUEST_DONE();
+}
+
+static void test_pmu_basic_sanity(int cpu)
+{
+	long out_val = 0;
+	bool probe;
+	struct sbiret ret;
+	int num_counters = 0, i;
+	unsigned long counter_val = -1;
+	union sbi_pmu_ctr_info ctrinfo;
+
+	probe = guest_sbi_probe_extension(SBI_EXT_PMU, &out_val);
+	GUEST_ASSERT(probe && out_val == 1);
+
+	num_counters = get_num_counters();
+
+	for (i = 0; i < num_counters; i++) {
+		ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_GET_INFO, i,
+				0, 0, 0, 0, 0);
+
+		/* There can be gaps in logical counter indices */
+		if (!ret.error)
+			GUEST_ASSERT_NE(ret.value, 0);
+		else
+			continue;
+
+		ctrinfo.value = ret.value;
+
+		/* Accessibility check of hardware and read capability of firmware counters */
+		counter_val = read_counter(i, ctrinfo);
+		/* The spec doesn't mandate any initial value. Verify that the value is sane */
+		GUEST_ASSERT_NE(counter_val, -1);
+	}
+
+	GUEST_DONE();
+}
+
+static void run_vcpu(struct kvm_vcpu *vcpu)
+{
+	struct ucall uc;
+
+	vcpu_run(vcpu);
+	switch (get_ucall(vcpu, &uc)) {
+	case UCALL_ABORT:
+		REPORT_GUEST_ASSERT(uc);
+		break;
+	case UCALL_DONE:
+	case UCALL_SYNC:
+		break;
+	default:
+		TEST_FAIL("Unknown ucall %lu", uc.cmd);
+		break;
+	}
+}
+
+void test_vm_destroy(struct kvm_vm *vm)
+{
+	memset(ctrinfo_arr, 0, sizeof(union sbi_pmu_ctr_info) * RISCV_MAX_PMU_COUNTERS);
+	counter_mask_available = 0;
+	kvm_vm_free(vm);
+}
+
+static void test_vm_basic_test(void *guest_code)
+{
+	struct kvm_vm *vm;
+	struct kvm_vcpu *vcpu;
+
+	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
+	__TEST_REQUIRE(__vcpu_has_ext(vcpu, RISCV_ISA_EXT_REG(KVM_RISCV_SBI_EXT_PMU)),
+				   "SBI PMU not available, skipping test");
+	vm_init_vector_tables(vm);
+	/* Illegal instruction handler is required to verify read access without configuration */
+	vm_install_exception_handler(vm, EXC_INST_ILLEGAL, guest_illegal_exception_handler);
+
+	vcpu_init_vector_tables(vcpu);
+	vcpu_args_set(vcpu, 1, 0);
+	run_vcpu(vcpu);
+
+	test_vm_destroy(vm);
+}
+
+static void test_vm_events_test(void *guest_code)
+{
+	struct kvm_vm *vm = NULL;
+	struct kvm_vcpu *vcpu = NULL;
+
+	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
+	__TEST_REQUIRE(__vcpu_has_ext(vcpu, RISCV_ISA_EXT_REG(KVM_RISCV_SBI_EXT_PMU)),
+				   "SBI PMU not available, skipping test");
+	vcpu_args_set(vcpu, 1, 0);
+	run_vcpu(vcpu);
+
+	test_vm_destroy(vm);
+}
+
+int main(void)
+{
+	test_vm_basic_test(test_pmu_basic_sanity);
+	pr_info("SBI PMU basic test : PASS\n");
+
+	test_vm_events_test(test_pmu_events);
+	pr_info("SBI PMU event verification test : PASS\n");
+
+	return 0;
+}
-- 
2.34.1



* [PATCH v4 14/15] KVM: riscv: selftests: Add a test for PMU snapshot functionality
  2024-02-29  1:01 [PATCH v4 00/15] RISC-V SBI v2.0 PMU improvements and Perf sampling in KVM guest Atish Patra
                   ` (12 preceding siblings ...)
  2024-02-29  1:01 ` [PATCH v4 13/15] KVM: riscv: selftests: Add SBI PMU selftest Atish Patra
@ 2024-02-29  1:01 ` Atish Patra
  2024-03-01  4:50   ` Anup Patel
  2024-03-02 12:13   ` Andrew Jones
  2024-02-29  1:01 ` [PATCH v4 15/15] KVM: riscv: selftests: Add a test for counter overflow Atish Patra
  14 siblings, 2 replies; 56+ messages in thread
From: Atish Patra @ 2024-02-29  1:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Atish Patra, Albert Ou, Alexandre Ghiti, Andrew Jones,
	Anup Patel, Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng,
	kvm-riscv, kvm, linux-kselftest, linux-riscv, Mark Rutland,
	Palmer Dabbelt, Paolo Bonzini, Paul Walmsley, Shuah Khan,
	Will Deacon

Verify PMU snapshot functionality by setting up the shared memory
correctly and reading the counter values from the shared memory
instead of the CSR.

Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
 .../selftests/kvm/include/riscv/processor.h   |  25 ++++
 .../selftests/kvm/lib/riscv/processor.c       |  12 ++
 tools/testing/selftests/kvm/riscv/sbi_pmu.c   | 124 ++++++++++++++++++
 3 files changed, 161 insertions(+)
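
For illustration only (not part of this patch): the snapshot structure
added here is 8 + 64 * 8 + 447 * 8 = 4096 bytes, which is why a single
page of shared memory is enough. The sketch below restates the layout
with a compile-time size check and a hypothetical helper
(snapshot_ctr_overflowed()) that tests one bit of the overflow mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same field layout as struct riscv_pmu_snapshot_data in this patch. */
struct pmu_snapshot {
	uint64_t ctr_overflow_mask;
	uint64_t ctr_values[64];
	uint64_t reserved[447];
};

/* 8 + 64 * 8 + 447 * 8 = 4096 bytes */
_Static_assert(sizeof(struct pmu_snapshot) == 4096,
	       "snapshot must fill exactly one 4 KiB page");

/* True if bit idx is set in the overflow mask; idx >= 64 is never set. */
static bool snapshot_ctr_overflowed(const struct pmu_snapshot *s,
				    unsigned int idx)
{
	return idx < 64 && ((s->ctr_overflow_mask >> idx) & 1);
}
```

A real guest would read the mask with READ_ONCE(), as the selftest does,
because the SBI implementation updates the page behind the guest's back.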

diff --git a/tools/testing/selftests/kvm/include/riscv/processor.h b/tools/testing/selftests/kvm/include/riscv/processor.h
index a49a39c8e8d4..e114d039e87b 100644
--- a/tools/testing/selftests/kvm/include/riscv/processor.h
+++ b/tools/testing/selftests/kvm/include/riscv/processor.h
@@ -173,6 +173,7 @@ enum sbi_ext_id {
 };
 
 enum sbi_ext_base_fid {
+	SBI_EXT_BASE_GET_IMP_VERSION = 2,
 	SBI_EXT_BASE_PROBE_EXT = 3,
 };
 
@@ -201,6 +202,12 @@ union sbi_pmu_ctr_info {
 	};
 };
 
+struct riscv_pmu_snapshot_data {
+	u64 ctr_overflow_mask;
+	u64 ctr_values[64];
+	u64 reserved[447];
+};
+
 struct sbiret {
 	long error;
 	long value;
@@ -247,6 +254,14 @@ enum sbi_pmu_ctr_type {
 #define SBI_PMU_STOP_FLAG_RESET (1 << 0)
 #define SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT BIT(1)
 
+#define SBI_STA_SHMEM_DISABLE		-1
+
+/* SBI spec version fields */
+#define SBI_SPEC_VERSION_DEFAULT	0x1
+#define SBI_SPEC_VERSION_MAJOR_SHIFT	24
+#define SBI_SPEC_VERSION_MAJOR_MASK	0x7f
+#define SBI_SPEC_VERSION_MINOR_MASK	0xffffff
+
 struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
 			unsigned long arg1, unsigned long arg2,
 			unsigned long arg3, unsigned long arg4,
@@ -254,6 +269,16 @@ struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
 
 bool guest_sbi_probe_extension(int extid, long *out_val);
 
+/* Make SBI version */
+static inline unsigned long sbi_mk_version(unsigned long major,
+					    unsigned long minor)
+{
+	return ((major & SBI_SPEC_VERSION_MAJOR_MASK) <<
+		SBI_SPEC_VERSION_MAJOR_SHIFT) | minor;
+}
+
+unsigned long get_host_sbi_impl_version(void);
+
 static inline void local_irq_enable(void)
 {
 	csr_set(CSR_SSTATUS, SR_SIE);
diff --git a/tools/testing/selftests/kvm/lib/riscv/processor.c b/tools/testing/selftests/kvm/lib/riscv/processor.c
index ec66d331a127..b0162d923e38 100644
--- a/tools/testing/selftests/kvm/lib/riscv/processor.c
+++ b/tools/testing/selftests/kvm/lib/riscv/processor.c
@@ -499,3 +499,15 @@ bool guest_sbi_probe_extension(int extid, long *out_val)
 
 	return true;
 }
+
+unsigned long get_host_sbi_impl_version(void)
+{
+	struct sbiret ret;
+
+	ret = sbi_ecall(SBI_EXT_BASE, SBI_EXT_BASE_GET_IMP_VERSION, 0,
+		       0, 0, 0, 0, 0);
+
+	GUEST_ASSERT(!ret.error);
+
+	return ret.value;
+}
diff --git a/tools/testing/selftests/kvm/riscv/sbi_pmu.c b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
index fc1fc5eea99e..8ea2a6db6610 100644
--- a/tools/testing/selftests/kvm/riscv/sbi_pmu.c
+++ b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
@@ -21,6 +21,11 @@
 #define RISCV_MAX_PMU_COUNTERS 64
 union sbi_pmu_ctr_info ctrinfo_arr[RISCV_MAX_PMU_COUNTERS];
 
+/* Snapshot shared memory data */
+#define PMU_SNAPSHOT_GPA_BASE		(1 << 30)
+static void *snapshot_gva;
+static vm_paddr_t snapshot_gpa;
+
 /* Cache the available counters in a bitmask */
 static unsigned long counter_mask_available;
 
@@ -173,6 +178,20 @@ static void stop_counter(unsigned long counter, unsigned long stop_flags)
 			       counter, ret.error);
 }
 
+static void snapshot_set_shmem(vm_paddr_t gpa, unsigned long flags)
+{
+	unsigned long lo = (unsigned long)gpa;
+#if __riscv_xlen == 32
+	unsigned long hi = (unsigned long)(gpa >> 32);
+#else
+	unsigned long hi = gpa == -1 ? -1 : 0;
+#endif
+	struct sbiret ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
+				      lo, hi, flags, 0, 0, 0);
+
+	GUEST_ASSERT(ret.value == 0 && ret.error == 0);
+}
+
 static void test_pmu_event(unsigned long event)
 {
 	unsigned long counter;
@@ -207,6 +226,43 @@ static void test_pmu_event(unsigned long event)
 	stop_counter(counter, SBI_PMU_STOP_FLAG_RESET);
 }
 
+static void test_pmu_event_snapshot(unsigned long event)
+{
+	unsigned long counter;
+	unsigned long counter_value_pre, counter_value_post;
+	unsigned long counter_init_value = 100;
+	struct riscv_pmu_snapshot_data *snapshot_data = snapshot_gva;
+
+	counter = get_counter_index(0, counter_mask_available, 0, event);
+	counter_value_pre = read_counter(counter, ctrinfo_arr[counter]);
+
+	/* Do not set the initial value */
+	start_counter(counter, 0, 0);
+	dummy_func_loop(10000);
+
+	stop_counter(counter, SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT);
+
+	/* The counter value is stored at the index relative to cbase */
+	counter_value_post = READ_ONCE(snapshot_data->ctr_values[0]);
+	__GUEST_ASSERT(counter_value_post > counter_value_pre,
+		       "counter_value_post %lx counter_value_pre %lx\n",
+		       counter_value_post, counter_value_pre);
+
+	/* Now set the initial value and compare */
+	WRITE_ONCE(snapshot_data->ctr_values[0], counter_init_value);
+	start_counter(counter, SBI_PMU_START_FLAG_INIT_FROM_SNAPSHOT, 0);
+	dummy_func_loop(10000);
+
+	stop_counter(counter, SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT);
+
+	counter_value_post = READ_ONCE(snapshot_data->ctr_values[0]);
+	__GUEST_ASSERT(counter_value_post > counter_init_value,
+		       "counter_value_post %lx counter_init_value %lx\n",
+		       counter_value_post, counter_init_value);
+
+	stop_counter(counter, SBI_PMU_STOP_FLAG_RESET);
+}
+
 static void test_invalid_event(void)
 {
 	struct sbiret ret;
@@ -270,6 +326,41 @@ static void test_pmu_basic_sanity(int cpu)
 	GUEST_DONE();
 }
 
+static void test_pmu_events_snaphost(int cpu)
+{
+	long out_val = 0;
+	bool probe;
+	int num_counters = 0;
+	unsigned long sbi_impl_version;
+	struct riscv_pmu_snapshot_data *snapshot_data = snapshot_gva;
+	int i;
+
+	probe = guest_sbi_probe_extension(SBI_EXT_PMU, &out_val);
+	GUEST_ASSERT(probe && out_val == 1);
+
+	sbi_impl_version = get_host_sbi_impl_version();
+	if (sbi_impl_version >= sbi_mk_version(2, 0))
+		__GUEST_ASSERT(0, "SBI implementation version doesn't support PMU Snapshot");
+
+	snapshot_set_shmem(snapshot_gpa, 0);
+
+	/* Get the counter details */
+	num_counters = get_num_counters();
+	update_counter_info(num_counters);
+
+	/* Validate shared memory access */
+	GUEST_ASSERT_EQ(READ_ONCE(snapshot_data->ctr_overflow_mask), 0);
+	for (i = 0; i < num_counters; i++) {
+		if (counter_mask_available & (1UL << i))
+			GUEST_ASSERT_EQ(READ_ONCE(snapshot_data->ctr_values[i]), 0);
+	}
+	/* Only these two events are guaranteed to be present */
+	test_pmu_event_snapshot(SBI_PMU_HW_CPU_CYCLES);
+	test_pmu_event_snapshot(SBI_PMU_HW_INSTRUCTIONS);
+
+	GUEST_DONE();
+}
+
 static void run_vcpu(struct kvm_vcpu *vcpu)
 {
 	struct ucall uc;
@@ -328,6 +419,36 @@ static void test_vm_events_test(void *guest_code)
 	test_vm_destroy(vm);
 }
 
+static void test_vm_setup_snapshot_mem(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
+{
+	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, PMU_SNAPSHOT_GPA_BASE, 1, 1, 0);
+	/* PMU Snapshot requires a single page only */
+	virt_map(vm, PMU_SNAPSHOT_GPA_BASE, PMU_SNAPSHOT_GPA_BASE, 1);
+
+	/* PMU_SNAPSHOT_GPA_BASE is identity mapped */
+	snapshot_gva = (void *)(PMU_SNAPSHOT_GPA_BASE);
+	snapshot_gpa = addr_gva2gpa(vcpu->vm, (vm_vaddr_t)snapshot_gva);
+	sync_global_to_guest(vcpu->vm, snapshot_gva);
+	sync_global_to_guest(vcpu->vm, snapshot_gpa);
+}
+
+static void test_vm_events_snapshot_test(void *guest_code)
+{
+	struct kvm_vm *vm = NULL;
+	struct kvm_vcpu *vcpu = NULL;
+
+	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
+	__TEST_REQUIRE(__vcpu_has_ext(vcpu, RISCV_ISA_EXT_REG(KVM_RISCV_SBI_EXT_PMU)),
+				   "SBI PMU not available, skipping test");
+
+	test_vm_setup_snapshot_mem(vm, vcpu);
+
+	vcpu_args_set(vcpu, 1, 0);
+	run_vcpu(vcpu);
+
+	test_vm_destroy(vm);
+}
+
 int main(void)
 {
 	test_vm_basic_test(test_pmu_basic_sanity);
@@ -336,5 +457,8 @@ int main(void)
 	test_vm_events_test(test_pmu_events);
 	pr_info("SBI PMU event verification test : PASS\n");
 
+	test_vm_events_snapshot_test(test_pmu_events_snaphost);
+	pr_info("SBI PMU event verification with snapshot test : PASS\n");
+
 	return 0;
 }
-- 
2.34.1



* [PATCH v4 15/15] KVM: riscv: selftests: Add a test for counter overflow
  2024-02-29  1:01 [PATCH v4 00/15] RISC-V SBI v2.0 PMU improvements and Perf sampling in KVM guest Atish Patra
                   ` (13 preceding siblings ...)
  2024-02-29  1:01 ` [PATCH v4 14/15] KVM: riscv: selftests: Add a test for PMU snapshot functionality Atish Patra
@ 2024-02-29  1:01 ` Atish Patra
  2024-03-01  4:53   ` Anup Patel
  2024-03-02 12:35   ` Andrew Jones
  14 siblings, 2 replies; 56+ messages in thread
From: Atish Patra @ 2024-02-29  1:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Atish Patra, Albert Ou, Alexandre Ghiti, Andrew Jones,
	Anup Patel, Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng,
	kvm-riscv, kvm, linux-kselftest, linux-riscv, Mark Rutland,
	Palmer Dabbelt, Paolo Bonzini, Paul Walmsley, Shuah Khan,
	Will Deacon

Add a test that verifies the counter overflow interrupt. Currently, it
relies on overflow support for the cycle/instret events, so it only works
on platforms where these events support sampling via hpmcounters. There
is no ISA extension to detect whether a platform supports that. Thus,
this test will fail on a platform that supports virtualization but
doesn't support overflow on these two events.

Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
 tools/testing/selftests/kvm/riscv/sbi_pmu.c | 126 +++++++++++++++++++-
 1 file changed, 125 insertions(+), 1 deletion(-)
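
For illustration only (not part of this patch): the IRQ handler strips
the interrupt flag from the cause value and compares the remainder with
the PMU overflow interrupt number. The sketch below shows that decode
step in isolation, assuming XLEN = 64 (interrupt flag in bit 63) and
the Sscofpmf local counter overflow interrupt number 13; the EX_*
macros and is_pmu_overflow_irq() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed XLEN = 64: the top bit of the cause value marks an interrupt. */
#define EX_CAUSE_IRQ_FLAG	(1ULL << 63)
/* Local counter overflow interrupt number defined by Sscofpmf. */
#define EX_IRQ_PMU_OVF		13

/* True only for an interrupt (not an exception) with the PMU overflow number. */
static bool is_pmu_overflow_irq(uint64_t cause)
{
	if (!(cause & EX_CAUSE_IRQ_FLAG))
		return false;

	return (cause & ~EX_CAUSE_IRQ_FLAG) == EX_IRQ_PMU_OVF;
}
```

Checking the flag first keeps exception code 13 (a load page fault) from
being mistaken for the overflow interrupt.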

diff --git a/tools/testing/selftests/kvm/riscv/sbi_pmu.c b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
index 8ea2a6db6610..c0264c636054 100644
--- a/tools/testing/selftests/kvm/riscv/sbi_pmu.c
+++ b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
@@ -8,6 +8,7 @@
  * Copyright (c) 2024, Rivos Inc.
  */
 
+#include "asm/csr.h"
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
@@ -16,6 +17,7 @@
 #include "kvm_util.h"
 #include "test_util.h"
 #include "processor.h"
+#include "arch_timer.h"
 
 /* Maximum counters (firmware + hardware)*/
 #define RISCV_MAX_PMU_COUNTERS 64
@@ -26,6 +28,11 @@ union sbi_pmu_ctr_info ctrinfo_arr[RISCV_MAX_PMU_COUNTERS];
 static void *snapshot_gva;
 static vm_paddr_t snapshot_gpa;
 
+static int pmu_irq = IRQ_PMU_OVF;
+
+static int vcpu_shared_irq_count;
+static int counter_in_use;
+
 /* Cache the available counters in a bitmask */
 static unsigned long counter_mask_available;
 
@@ -69,7 +76,9 @@ unsigned long pmu_csr_read_num(int csr_num)
 #undef switchcase_csr_read
 }
 
-static inline void dummy_func_loop(int iter)
+static void stop_counter(unsigned long counter, unsigned long stop_flags);
+
+static inline void dummy_func_loop(uint64_t iter)
 {
 	int i = 0;
 
@@ -88,6 +97,26 @@ static void guest_illegal_exception_handler(struct ex_regs *regs)
 	regs->epc += 4;
 }
 
+static void guest_irq_handler(struct ex_regs *regs)
+{
+	unsigned int irq_num = regs->cause & ~CAUSE_IRQ_FLAG;
+	struct riscv_pmu_snapshot_data *snapshot_data = snapshot_gva;
+	unsigned long overflown_mask;
+
+	/* Stop the counter in use first to avoid further interrupts */
+	sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, 0, 1UL << counter_in_use,
+		  SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT, 0, 0, 0);
+
+	csr_clear(CSR_SIP, BIT(pmu_irq));
+
+	overflown_mask = READ_ONCE(snapshot_data->ctr_overflow_mask);
+	GUEST_ASSERT(overflown_mask & (1UL << counter_in_use));
+
+	/* Validate that we are in the correct irq handler */
+	GUEST_ASSERT_EQ(irq_num, pmu_irq);
+	WRITE_ONCE(vcpu_shared_irq_count, vcpu_shared_irq_count + 1);
+}
+
 static unsigned long get_counter_index(unsigned long cbase, unsigned long cmask,
 				       unsigned long cflags,
 				       unsigned long event)
@@ -263,6 +292,32 @@ static void test_pmu_event_snapshot(unsigned long event)
 	stop_counter(counter, SBI_PMU_STOP_FLAG_RESET);
 }
 
+static void test_pmu_event_overflow(unsigned long event)
+{
+	unsigned long counter;
+	unsigned long counter_value_post;
+	unsigned long counter_init_value = ULONG_MAX - 10000;
+	struct riscv_pmu_snapshot_data *snapshot_data = snapshot_gva;
+
+	counter = get_counter_index(0, counter_mask_available, 0, event);
+	counter_in_use = counter;
+
+	/* The counter value is updated w.r.t relative index of cbase passed to start/stop */
+	WRITE_ONCE(snapshot_data->ctr_values[0], counter_init_value);
+	start_counter(counter, SBI_PMU_START_FLAG_INIT_FROM_SNAPSHOT, 0);
+	dummy_func_loop(10000);
+	udelay(msecs_to_usecs(2000));
+	/* irq handler should have stopped the counter */
+
+	counter_value_post = READ_ONCE(snapshot_data->ctr_values[counter_in_use]);
+	/* The counter value after stopping should be less than the init value due to overflow */
+	__GUEST_ASSERT(counter_value_post < counter_init_value,
+		       "counter_value_post %lx counter_init_value %lx for counter\n",
+		       counter_value_post, counter_init_value);
+
+	stop_counter(counter, SBI_PMU_STOP_FLAG_RESET);
+}
+
 static void test_invalid_event(void)
 {
 	struct sbiret ret;
@@ -361,6 +416,43 @@ static void test_pmu_events_snaphost(int cpu)
 	GUEST_DONE();
 }
 
+static void test_pmu_events_overflow(int cpu)
+{
+	long out_val = 0;
+	bool probe;
+	int num_counters = 0;
+	unsigned long sbi_impl_version;
+
+	probe = guest_sbi_probe_extension(SBI_EXT_PMU, &out_val);
+	GUEST_ASSERT(probe && out_val == 1);
+
+	sbi_impl_version = get_host_sbi_impl_version();
+	if (sbi_impl_version >= sbi_mk_version(2, 0))
+		__GUEST_ASSERT(0, "SBI implementation version doesn't support PMU Snapshot");
+
+	snapshot_set_shmem(snapshot_gpa, 0);
+	csr_set(CSR_IE, BIT(pmu_irq));
+	local_irq_enable();
+
+	/* Get the counter details */
+	num_counters = get_num_counters();
+	update_counter_info(num_counters);
+
+	/*
+	 * QEMU supports overflow for cycle/instruction.
+	 * This test may fail on any platform that does not support overflow for these two events.
+	 */
+	test_pmu_event_overflow(SBI_PMU_HW_CPU_CYCLES);
+	GUEST_ASSERT_EQ(vcpu_shared_irq_count, 1);
+
+	/* Re-enable the interrupt for another event */
+	csr_set(CSR_IE, BIT(pmu_irq));
+	test_pmu_event_overflow(SBI_PMU_HW_INSTRUCTIONS);
+	GUEST_ASSERT_EQ(vcpu_shared_irq_count, 2);
+
+	GUEST_DONE();
+}
+
 static void run_vcpu(struct kvm_vcpu *vcpu)
 {
 	struct ucall uc;
@@ -449,6 +541,35 @@ static void test_vm_events_snapshot_test(void *guest_code)
 	test_vm_destroy(vm);
 }
 
+static void test_vm_events_overflow(void *guest_code)
+{
+	struct kvm_vm *vm = NULL;
+	struct kvm_vcpu *vcpu = NULL;
+
+	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
+	__TEST_REQUIRE(__vcpu_has_ext(vcpu, RISCV_SBI_EXT_REG(KVM_RISCV_SBI_EXT_PMU)),
+				   "SBI PMU not available, skipping test");
+
+	__TEST_REQUIRE(__vcpu_has_ext(vcpu, RISCV_ISA_EXT_REG(KVM_RISCV_ISA_EXT_SSCOFPMF)),
+				   "Sscofpmf is not available, skipping overflow test");
+
+
+	test_vm_setup_snapshot_mem(vm, vcpu);
+	vm_init_vector_tables(vm);
+	vm_install_interrupt_handler(vm, guest_irq_handler);
+
+	vcpu_init_vector_tables(vcpu);
+	/* Initialize guest timer frequency. */
+	vcpu_get_reg(vcpu, RISCV_TIMER_REG(frequency), &timer_freq);
+	sync_global_to_guest(vm, timer_freq);
+
+	vcpu_args_set(vcpu, 1, 0);
+
+	run_vcpu(vcpu);
+
+	test_vm_destroy(vm);
+}
+
 int main(void)
 {
 	test_vm_basic_test(test_pmu_basic_sanity);
@@ -460,5 +581,8 @@ int main(void)
 	test_vm_events_snapshot_test(test_pmu_events_snaphost);
 	pr_info("SBI PMU event verification with snapshot test : PASS\n");
 
+	test_vm_events_overflow(test_pmu_events_overflow);
+	pr_info("SBI PMU event verification with overflow test : PASS\n");
+
 	return 0;
 }
-- 
2.34.1


* Re: [PATCH v4 11/15] KVM: riscv: selftests: Add Sscofpmf to get-reg-list test
  2024-02-29  1:01 ` [PATCH v4 11/15] KVM: riscv: selftests: Add Sscofpmf to get-reg-list test Atish Patra
@ 2024-03-01  4:42   ` Anup Patel
  2024-03-02 10:52   ` Andrew Jones
  1 sibling, 0 replies; 56+ messages in thread
From: Anup Patel @ 2024-03-01  4:42 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel, Albert Ou, Alexandre Ghiti, Andrew Jones,
	Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Thu, Feb 29, 2024 at 6:32 AM Atish Patra <atishp@rivosinc.com> wrote:
>
> The KVM RISC-V allows Sscofpmf extension for Guest/VM so let us
> add this extension to get-reg-list test.
>
> Signed-off-by: Atish Patra <atishp@rivosinc.com>

LGTM.

Reviewed-by: Anup Patel <anup@brainfault.org>

Regards,
Anup

> ---
>  tools/testing/selftests/kvm/riscv/get-reg-list.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/riscv/get-reg-list.c b/tools/testing/selftests/kvm/riscv/get-reg-list.c
> index 8cece02ca23a..ca6d98a5dce5 100644
> --- a/tools/testing/selftests/kvm/riscv/get-reg-list.c
> +++ b/tools/testing/selftests/kvm/riscv/get-reg-list.c
> @@ -43,6 +43,7 @@ bool filter_reg(__u64 reg)
>         case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_V:
>         case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SMSTATEEN:
>         case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SSAIA:
> +       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SSCOFPMF:
>         case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SSTC:
>         case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SVINVAL:
>         case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SVNAPOT:
> @@ -406,6 +407,7 @@ static const char *isa_ext_single_id_to_str(__u64 reg_off)
>                 KVM_ISA_EXT_ARR(V),
>                 KVM_ISA_EXT_ARR(SMSTATEEN),
>                 KVM_ISA_EXT_ARR(SSAIA),
> +               KVM_ISA_EXT_ARR(SSCOFPMF),
>                 KVM_ISA_EXT_ARR(SSTC),
>                 KVM_ISA_EXT_ARR(SVINVAL),
>                 KVM_ISA_EXT_ARR(SVNAPOT),
> @@ -927,6 +929,7 @@ KVM_ISA_EXT_SUBLIST_CONFIG(fp_f, FP_F);
>  KVM_ISA_EXT_SUBLIST_CONFIG(fp_d, FP_D);
>  KVM_ISA_EXT_SIMPLE_CONFIG(h, H);
>  KVM_ISA_EXT_SUBLIST_CONFIG(smstateen, SMSTATEEN);
> +KVM_ISA_EXT_SIMPLE_CONFIG(sscofpmf, SSCOFPMF);
>  KVM_ISA_EXT_SIMPLE_CONFIG(sstc, SSTC);
>  KVM_ISA_EXT_SIMPLE_CONFIG(svinval, SVINVAL);
>  KVM_ISA_EXT_SIMPLE_CONFIG(svnapot, SVNAPOT);
> @@ -980,6 +983,7 @@ struct vcpu_reg_list *vcpu_configs[] = {
>         &config_fp_d,
>         &config_h,
>         &config_smstateen,
> +       &config_sscofpmf,
>         &config_sstc,
>         &config_svinval,
>         &config_svnapot,
> --
> 2.34.1
>

* Re: [PATCH v4 12/15] KVM: riscv: selftests: Add SBI PMU extension definitions
  2024-02-29  1:01 ` [PATCH v4 12/15] KVM: riscv: selftests: Add SBI PMU extension definitions Atish Patra
@ 2024-03-01  4:43   ` Anup Patel
  2024-03-02 11:00   ` Andrew Jones
  1 sibling, 0 replies; 56+ messages in thread
From: Anup Patel @ 2024-03-01  4:43 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel, Albert Ou, Alexandre Ghiti, Andrew Jones,
	Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Thu, Feb 29, 2024 at 6:32 AM Atish Patra <atishp@rivosinc.com> wrote:
>
> The SBI PMU extension definition is required for upcoming SBI PMU
> selftests.
>
> Signed-off-by: Atish Patra <atishp@rivosinc.com>

LGTM.

Reviewed-by: Anup Patel <anup@brainfault.org>

Regards,
Anup

> ---
>  .../selftests/kvm/include/riscv/processor.h   | 67 +++++++++++++++++++
>  1 file changed, 67 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/include/riscv/processor.h b/tools/testing/selftests/kvm/include/riscv/processor.h
> index f75c381fa35a..a49a39c8e8d4 100644
> --- a/tools/testing/selftests/kvm/include/riscv/processor.h
> +++ b/tools/testing/selftests/kvm/include/riscv/processor.h
> @@ -169,17 +169,84 @@ void vm_install_exception_handler(struct kvm_vm *vm, int vector, exception_handl
>  enum sbi_ext_id {
>         SBI_EXT_BASE = 0x10,
>         SBI_EXT_STA = 0x535441,
> +       SBI_EXT_PMU = 0x504D55,
>  };
>
>  enum sbi_ext_base_fid {
>         SBI_EXT_BASE_PROBE_EXT = 3,
>  };
>
> +enum sbi_ext_pmu_fid {
> +       SBI_EXT_PMU_NUM_COUNTERS = 0,
> +       SBI_EXT_PMU_COUNTER_GET_INFO,
> +       SBI_EXT_PMU_COUNTER_CFG_MATCH,
> +       SBI_EXT_PMU_COUNTER_START,
> +       SBI_EXT_PMU_COUNTER_STOP,
> +       SBI_EXT_PMU_COUNTER_FW_READ,
> +       SBI_EXT_PMU_COUNTER_FW_READ_HI,
> +       SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
> +};
> +
> +union sbi_pmu_ctr_info {
> +       unsigned long value;
> +       struct {
> +               unsigned long csr:12;
> +               unsigned long width:6;
> +#if __riscv_xlen == 32
> +               unsigned long reserved:13;
> +#else
> +               unsigned long reserved:45;
> +#endif
> +               unsigned long type:1;
> +       };
> +};
> +
>  struct sbiret {
>         long error;
>         long value;
>  };
>
> +/** General pmu event codes specified in SBI PMU extension */
> +enum sbi_pmu_hw_generic_events_t {
> +       SBI_PMU_HW_NO_EVENT                     = 0,
> +       SBI_PMU_HW_CPU_CYCLES                   = 1,
> +       SBI_PMU_HW_INSTRUCTIONS                 = 2,
> +       SBI_PMU_HW_CACHE_REFERENCES             = 3,
> +       SBI_PMU_HW_CACHE_MISSES                 = 4,
> +       SBI_PMU_HW_BRANCH_INSTRUCTIONS          = 5,
> +       SBI_PMU_HW_BRANCH_MISSES                = 6,
> +       SBI_PMU_HW_BUS_CYCLES                   = 7,
> +       SBI_PMU_HW_STALLED_CYCLES_FRONTEND      = 8,
> +       SBI_PMU_HW_STALLED_CYCLES_BACKEND       = 9,
> +       SBI_PMU_HW_REF_CPU_CYCLES               = 10,
> +
> +       SBI_PMU_HW_GENERAL_MAX,
> +};
> +
> +/* SBI PMU counter types */
> +enum sbi_pmu_ctr_type {
> +       SBI_PMU_CTR_TYPE_HW = 0x0,
> +       SBI_PMU_CTR_TYPE_FW,
> +};
> +
> +/* Flags defined for config matching function */
> +#define SBI_PMU_CFG_FLAG_SKIP_MATCH    (1 << 0)
> +#define SBI_PMU_CFG_FLAG_CLEAR_VALUE   (1 << 1)
> +#define SBI_PMU_CFG_FLAG_AUTO_START    (1 << 2)
> +#define SBI_PMU_CFG_FLAG_SET_VUINH     (1 << 3)
> +#define SBI_PMU_CFG_FLAG_SET_VSINH     (1 << 4)
> +#define SBI_PMU_CFG_FLAG_SET_UINH      (1 << 5)
> +#define SBI_PMU_CFG_FLAG_SET_SINH      (1 << 6)
> +#define SBI_PMU_CFG_FLAG_SET_MINH      (1 << 7)
> +
> +/* Flags defined for counter start function */
> +#define SBI_PMU_START_FLAG_SET_INIT_VALUE (1 << 0)
> +#define SBI_PMU_START_FLAG_INIT_FROM_SNAPSHOT BIT(1)
> +
> +/* Flags defined for counter stop function */
> +#define SBI_PMU_STOP_FLAG_RESET (1 << 0)
> +#define SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT BIT(1)
> +
>  struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
>                         unsigned long arg1, unsigned long arg2,
>                         unsigned long arg3, unsigned long arg4,
> --
> 2.34.1
>

* Re: [PATCH v4 13/15] KVM: riscv: selftests: Add SBI PMU selftest
  2024-02-29  1:01 ` [PATCH v4 13/15] KVM: riscv: selftests: Add SBI PMU selftest Atish Patra
@ 2024-03-01  4:47   ` Anup Patel
  2024-03-02  1:01     ` Atish Kumar Patra
  2024-03-02 11:52   ` Andrew Jones
  1 sibling, 1 reply; 56+ messages in thread
From: Anup Patel @ 2024-03-01  4:47 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel, Albert Ou, Alexandre Ghiti, Andrew Jones,
	Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Thu, Feb 29, 2024 at 6:32 AM Atish Patra <atishp@rivosinc.com> wrote:
>
> This test implements basic sanity test and cycle/instret event
> counting tests.
>
> Signed-off-by: Atish Patra <atishp@rivosinc.com>

I feel the test should have been called sbi_pmu_test but no need to
revise this series. I will take care of it at the time of merging.

Reviewed-by: Anup Patel <anup@brainfault.org>

Regards,
Anup

> ---
>  tools/testing/selftests/kvm/Makefile        |   1 +
>  tools/testing/selftests/kvm/riscv/sbi_pmu.c | 340 ++++++++++++++++++++
>  2 files changed, 341 insertions(+)
>  create mode 100644 tools/testing/selftests/kvm/riscv/sbi_pmu.c
>
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index 426f85798aea..b2dce6843b9e 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -195,6 +195,7 @@ TEST_GEN_PROGS_riscv += kvm_create_max_vcpus
>  TEST_GEN_PROGS_riscv += kvm_page_table_test
>  TEST_GEN_PROGS_riscv += set_memory_region_test
>  TEST_GEN_PROGS_riscv += steal_time
> +TEST_GEN_PROGS_riscv += riscv/sbi_pmu
>
>  SPLIT_TESTS += arch_timer
>  SPLIT_TESTS += get-reg-list
> diff --git a/tools/testing/selftests/kvm/riscv/sbi_pmu.c b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
> new file mode 100644
> index 000000000000..fc1fc5eea99e
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
> @@ -0,0 +1,340 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * sbi_pmu.c - Tests the riscv64 SBI PMU functionality
> + *
> + * The test validates the SBI PMU extension with a basic sanity test
> + * and cycle/instret event counting tests.
> + *
> + * Copyright (c) 2024, Rivos Inc.
> + */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <sys/types.h>
> +#include "kvm_util.h"
> +#include "test_util.h"
> +#include "processor.h"
> +
> +/* Maximum counters (firmware + hardware)*/
> +#define RISCV_MAX_PMU_COUNTERS 64
> +union sbi_pmu_ctr_info ctrinfo_arr[RISCV_MAX_PMU_COUNTERS];
> +
> +/* Cache the available counters in a bitmask */
> +static unsigned long counter_mask_available;
> +
> +unsigned long pmu_csr_read_num(int csr_num)
> +{
> +#define switchcase_csr_read(__csr_num, __val)          {\
> +       case __csr_num:                                 \
> +               __val = csr_read(__csr_num);            \
> +               break; }
> +#define switchcase_csr_read_2(__csr_num, __val)                {\
> +       switchcase_csr_read(__csr_num + 0, __val)        \
> +       switchcase_csr_read(__csr_num + 1, __val)}
> +#define switchcase_csr_read_4(__csr_num, __val)                {\
> +       switchcase_csr_read_2(__csr_num + 0, __val)      \
> +       switchcase_csr_read_2(__csr_num + 2, __val)}
> +#define switchcase_csr_read_8(__csr_num, __val)                {\
> +       switchcase_csr_read_4(__csr_num + 0, __val)      \
> +       switchcase_csr_read_4(__csr_num + 4, __val)}
> +#define switchcase_csr_read_16(__csr_num, __val)       {\
> +       switchcase_csr_read_8(__csr_num + 0, __val)      \
> +       switchcase_csr_read_8(__csr_num + 8, __val)}
> +#define switchcase_csr_read_32(__csr_num, __val)       {\
> +       switchcase_csr_read_16(__csr_num + 0, __val)     \
> +       switchcase_csr_read_16(__csr_num + 16, __val)}
> +
> +       unsigned long ret = 0;
> +
> +       switch (csr_num) {
> +       switchcase_csr_read_32(CSR_CYCLE, ret)
> +       switchcase_csr_read_32(CSR_CYCLEH, ret)
> +       default :
> +               break;
> +       }
> +
> +       return ret;
> +#undef switchcase_csr_read_32
> +#undef switchcase_csr_read_16
> +#undef switchcase_csr_read_8
> +#undef switchcase_csr_read_4
> +#undef switchcase_csr_read_2
> +#undef switchcase_csr_read
> +}
> +
> +static inline void dummy_func_loop(int iter)
> +{
> +       int i = 0;
> +
> +       while (i < iter) {
> +               asm volatile("nop");
> +               i++;
> +       }
> +}
> +
> +static void guest_illegal_exception_handler(struct ex_regs *regs)
> +{
> +       __GUEST_ASSERT(regs->cause == EXC_INST_ILLEGAL,
> +                      "Unexpected exception handler %lx\n", regs->cause);
> +
> +       /* skip the trapping instruction */
> +       regs->epc += 4;
> +}
> +
> +static unsigned long get_counter_index(unsigned long cbase, unsigned long cmask,
> +                                      unsigned long cflags,
> +                                      unsigned long event)
> +{
> +       struct sbiret ret;
> +
> +       ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH, cbase, cmask,
> +                       cflags, event, 0, 0);
> +       __GUEST_ASSERT(ret.error == 0, "config matching failed %ld\n", ret.error);
> +       GUEST_ASSERT((ret.value < RISCV_MAX_PMU_COUNTERS) &&
> +                   ((1UL << ret.value) & counter_mask_available));
> +
> +       return ret.value;
> +}
> +
> +static unsigned long get_num_counters(void)
> +{
> +       struct sbiret ret;
> +
> +       ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_NUM_COUNTERS, 0, 0, 0, 0, 0, 0);
> +
> +       __GUEST_ASSERT(ret.error == 0, "Unable to retrieve number of counters from SBI PMU");
> +
> +       __GUEST_ASSERT(ret.value < RISCV_MAX_PMU_COUNTERS,
> +                      "Invalid number of counters %ld\n", ret.value);
> +
> +       return ret.value;
> +}
> +
> +static void update_counter_info(int num_counters)
> +{
> +       int i = 0;
> +       struct sbiret ret;
> +
> +       for (i = 0; i < num_counters; i++) {
> +               ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_GET_INFO, i, 0, 0, 0, 0, 0);
> +
> +               /* There can be gaps in logical counter indices */
> +               if (!ret.error)
> +                       GUEST_ASSERT_NE(ret.value, 0);
> +
> +               ctrinfo_arr[i].value = ret.value;
> +               counter_mask_available |= BIT(i);
> +       }
> +
> +       GUEST_ASSERT(counter_mask_available > 0);
> +}
> +
> +static unsigned long read_counter(int idx, union sbi_pmu_ctr_info ctrinfo)
> +{
> +       unsigned long counter_val = 0;
> +       struct sbiret ret;
> +
> +       __GUEST_ASSERT(ctrinfo.type < 2, "Invalid counter type %d", ctrinfo.type);
> +
> +       if (ctrinfo.type == SBI_PMU_CTR_TYPE_HW) {
> +               counter_val = pmu_csr_read_num(ctrinfo.csr);
> +       } else if (ctrinfo.type == SBI_PMU_CTR_TYPE_FW) {
> +               ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_FW_READ, idx, 0, 0, 0, 0, 0);
> +               GUEST_ASSERT(ret.error == 0);
> +               counter_val = ret.value;
> +       }
> +
> +       return counter_val;
> +}
> +
> +static void start_counter(unsigned long counter, unsigned long start_flags,
> +                         unsigned long ival)
> +{
> +       struct sbiret ret;
> +
> +       ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, counter, 1, start_flags,
> +                       ival, 0, 0);
> +       __GUEST_ASSERT(ret.error == 0, "Unable to start counter %ld\n", counter);
> +}
> +
> +static void stop_counter(unsigned long counter, unsigned long stop_flags)
> +{
> +       struct sbiret ret;
> +
> +       ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, counter, 1, stop_flags,
> +                       0, 0, 0);
> +       if (stop_flags & SBI_PMU_STOP_FLAG_RESET)
> +               __GUEST_ASSERT(ret.error == SBI_ERR_ALREADY_STOPPED,
> +                              "Unable to stop counter %ld\n", counter);
> +       else
> +               __GUEST_ASSERT(ret.error == 0, "Unable to stop counter %ld error %ld\n",
> +                              counter, ret.error);
> +}
> +
> +static void test_pmu_event(unsigned long event)
> +{
> +       unsigned long counter;
> +       unsigned long counter_value_pre, counter_value_post;
> +       unsigned long counter_init_value = 100;
> +
> +       counter = get_counter_index(0, counter_mask_available, 0, event);
> +       counter_value_pre = read_counter(counter, ctrinfo_arr[counter]);
> +
> +       /* Do not set the initial value */
> +       start_counter(counter, 0, counter_init_value);
> +       dummy_func_loop(10000);
> +
> +       stop_counter(counter, 0);
> +
> +       counter_value_post = read_counter(counter, ctrinfo_arr[counter]);
> +       __GUEST_ASSERT(counter_value_post > counter_value_pre,
> +                      "counter_value_post %lx counter_value_pre %lx\n",
> +                      counter_value_post, counter_value_pre);
> +
> +       /* Now set the initial value and compare */
> +       start_counter(counter, SBI_PMU_START_FLAG_SET_INIT_VALUE, counter_init_value);
> +       dummy_func_loop(10000);
> +
> +       stop_counter(counter, 0);
> +
> +       counter_value_post = read_counter(counter, ctrinfo_arr[counter]);
> +       __GUEST_ASSERT(counter_value_post > counter_init_value,
> +                      "counter_value_post %lx counter_init_value %lx\n",
> +                      counter_value_post, counter_init_value);
> +
> +       stop_counter(counter, SBI_PMU_STOP_FLAG_RESET);
> +}
> +
> +static void test_invalid_event(void)
> +{
> +       struct sbiret ret;
> +       unsigned long event = 0x1234; /* A random event */
> +
> +       ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH, 0,
> +                       counter_mask_available, 0, event, 0, 0);
> +       GUEST_ASSERT_EQ(ret.error, SBI_ERR_NOT_SUPPORTED);
> +}
> +
> +static void test_pmu_events(int cpu)
> +{
> +       int num_counters = 0;
> +
> +       /* Get the counter details */
> +       num_counters = get_num_counters();
> +       update_counter_info(num_counters);
> +
> +       /* Sanity testing for any random invalid event */
> +       test_invalid_event();
> +
> +       /* Only these two events are guaranteed to be present */
> +       test_pmu_event(SBI_PMU_HW_CPU_CYCLES);
> +       test_pmu_event(SBI_PMU_HW_INSTRUCTIONS);
> +
> +       GUEST_DONE();
> +}
> +
> +static void test_pmu_basic_sanity(int cpu)
> +{
> +       long out_val = 0;
> +       bool probe;
> +       struct sbiret ret;
> +       int num_counters = 0, i;
> +       unsigned long counter_val = -1;
> +       union sbi_pmu_ctr_info ctrinfo;
> +
> +       probe = guest_sbi_probe_extension(SBI_EXT_PMU, &out_val);
> +       GUEST_ASSERT(probe && out_val == 1);
> +
> +       num_counters = get_num_counters();
> +
> +       for (i = 0; i < num_counters; i++) {
> +               ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_GET_INFO, i,
> +                               0, 0, 0, 0, 0);
> +
> +               /* There can be gaps in logical counter indices */
> +               if (!ret.error)
> +                       GUEST_ASSERT_NE(ret.value, 0);
> +               else
> +                       continue;
> +
> +               ctrinfo.value = ret.value;
> +
> +               /* Accessibility check of hardware and read capability of firmware counters */
> +               counter_val = read_counter(i, ctrinfo);
> +               /* The spec doesn't mandate any initial value. Verify that the value is sane */
> +               GUEST_ASSERT_NE(counter_val, -1);
> +       }
> +
> +       GUEST_DONE();
> +}
> +
> +static void run_vcpu(struct kvm_vcpu *vcpu)
> +{
> +       struct ucall uc;
> +
> +       vcpu_run(vcpu);
> +       switch (get_ucall(vcpu, &uc)) {
> +       case UCALL_ABORT:
> +               REPORT_GUEST_ASSERT(uc);
> +               break;
> +       case UCALL_DONE:
> +       case UCALL_SYNC:
> +               break;
> +       default:
> +               TEST_FAIL("Unknown ucall %lu", uc.cmd);
> +               break;
> +       }
> +}
> +
> +void test_vm_destroy(struct kvm_vm *vm)
> +{
> +       memset(ctrinfo_arr, 0, sizeof(union sbi_pmu_ctr_info) * RISCV_MAX_PMU_COUNTERS);
> +       counter_mask_available = 0;
> +       kvm_vm_free(vm);
> +}
> +
> +static void test_vm_basic_test(void *guest_code)
> +{
> +       struct kvm_vm *vm;
> +       struct kvm_vcpu *vcpu;
> +
> +       vm = vm_create_with_one_vcpu(&vcpu, guest_code);
> +       __TEST_REQUIRE(__vcpu_has_ext(vcpu, RISCV_SBI_EXT_REG(KVM_RISCV_SBI_EXT_PMU)),
> +                                  "SBI PMU not available, skipping test");
> +       vm_init_vector_tables(vm);
> +       /* Illegal instruction handler is required to verify read access without configuration */
> +       vm_install_exception_handler(vm, EXC_INST_ILLEGAL, guest_illegal_exception_handler);
> +
> +       vcpu_init_vector_tables(vcpu);
> +       vcpu_args_set(vcpu, 1, 0);
> +       run_vcpu(vcpu);
> +
> +       test_vm_destroy(vm);
> +}
> +
> +static void test_vm_events_test(void *guest_code)
> +{
> +       struct kvm_vm *vm = NULL;
> +       struct kvm_vcpu *vcpu = NULL;
> +
> +       vm = vm_create_with_one_vcpu(&vcpu, guest_code);
> +       __TEST_REQUIRE(__vcpu_has_ext(vcpu, RISCV_SBI_EXT_REG(KVM_RISCV_SBI_EXT_PMU)),
> +                                  "SBI PMU not available, skipping test");
> +       vcpu_args_set(vcpu, 1, 0);
> +       run_vcpu(vcpu);
> +
> +       test_vm_destroy(vm);
> +}
> +
> +int main(void)
> +{
> +       test_vm_basic_test(test_pmu_basic_sanity);
> +       pr_info("SBI PMU basic test : PASS\n");
> +
> +       test_vm_events_test(test_pmu_events);
> +       pr_info("SBI PMU event verification test : PASS\n");
> +
> +       return 0;
> +}
> --
> 2.34.1
>

* Re: [PATCH v4 14/15] KVM: riscv: selftests: Add a test for PMU snapshot functionality
  2024-02-29  1:01 ` [PATCH v4 14/15] KVM: riscv: selftests: Add a test for PMU snapshot functionality Atish Patra
@ 2024-03-01  4:50   ` Anup Patel
  2024-03-02 12:13   ` Andrew Jones
  1 sibling, 0 replies; 56+ messages in thread
From: Anup Patel @ 2024-03-01  4:50 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel, Albert Ou, Alexandre Ghiti, Andrew Jones,
	Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Thu, Feb 29, 2024 at 6:32 AM Atish Patra <atishp@rivosinc.com> wrote:
>
> Verify PMU snapshot functionality by setting up the shared memory
> correctly and reading the counter values from the shared memory
> instead of the CSR.
>
> Signed-off-by: Atish Patra <atishp@rivosinc.com>

LGTM.

Reviewed-by: Anup Patel <anup@brainfault.org>

Regards,
Anup

> ---
>  .../selftests/kvm/include/riscv/processor.h   |  25 ++++
>  .../selftests/kvm/lib/riscv/processor.c       |  12 ++
>  tools/testing/selftests/kvm/riscv/sbi_pmu.c   | 124 ++++++++++++++++++
>  3 files changed, 161 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/include/riscv/processor.h b/tools/testing/selftests/kvm/include/riscv/processor.h
> index a49a39c8e8d4..e114d039e87b 100644
> --- a/tools/testing/selftests/kvm/include/riscv/processor.h
> +++ b/tools/testing/selftests/kvm/include/riscv/processor.h
> @@ -173,6 +173,7 @@ enum sbi_ext_id {
>  };
>
>  enum sbi_ext_base_fid {
> +       SBI_EXT_BASE_GET_IMP_VERSION = 2,
>         SBI_EXT_BASE_PROBE_EXT = 3,
>  };
>
> @@ -201,6 +202,12 @@ union sbi_pmu_ctr_info {
>         };
>  };
>
> +struct riscv_pmu_snapshot_data {
> +       u64 ctr_overflow_mask;
> +       u64 ctr_values[64];
> +       u64 reserved[447];
> +};
> +
>  struct sbiret {
>         long error;
>         long value;
> @@ -247,6 +254,14 @@ enum sbi_pmu_ctr_type {
>  #define SBI_PMU_STOP_FLAG_RESET (1 << 0)
>  #define SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT BIT(1)
>
> +#define SBI_STA_SHMEM_DISABLE          -1
> +
> +/* SBI spec version fields */
> +#define SBI_SPEC_VERSION_DEFAULT       0x1
> +#define SBI_SPEC_VERSION_MAJOR_SHIFT   24
> +#define SBI_SPEC_VERSION_MAJOR_MASK    0x7f
> +#define SBI_SPEC_VERSION_MINOR_MASK    0xffffff
> +
>  struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
>                         unsigned long arg1, unsigned long arg2,
>                         unsigned long arg3, unsigned long arg4,
> @@ -254,6 +269,16 @@ struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
>
>  bool guest_sbi_probe_extension(int extid, long *out_val);
>
> +/* Make SBI version */
> +static inline unsigned long sbi_mk_version(unsigned long major,
> +                                           unsigned long minor)
> +{
> +       return ((major & SBI_SPEC_VERSION_MAJOR_MASK) <<
> +               SBI_SPEC_VERSION_MAJOR_SHIFT) | minor;
> +}
> +
> +unsigned long get_host_sbi_impl_version(void);
> +
>  static inline void local_irq_enable(void)
>  {
>         csr_set(CSR_SSTATUS, SR_SIE);
> diff --git a/tools/testing/selftests/kvm/lib/riscv/processor.c b/tools/testing/selftests/kvm/lib/riscv/processor.c
> index ec66d331a127..b0162d923e38 100644
> --- a/tools/testing/selftests/kvm/lib/riscv/processor.c
> +++ b/tools/testing/selftests/kvm/lib/riscv/processor.c
> @@ -499,3 +499,15 @@ bool guest_sbi_probe_extension(int extid, long *out_val)
>
>         return true;
>  }
> +
> +unsigned long get_host_sbi_impl_version(void)
> +{
> +       struct sbiret ret;
> +
> +       ret = sbi_ecall(SBI_EXT_BASE, SBI_EXT_BASE_GET_IMP_VERSION, 0,
> +                      0, 0, 0, 0, 0);
> +
> +       GUEST_ASSERT(!ret.error);
> +
> +       return ret.value;
> +}
> diff --git a/tools/testing/selftests/kvm/riscv/sbi_pmu.c b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
> index fc1fc5eea99e..8ea2a6db6610 100644
> --- a/tools/testing/selftests/kvm/riscv/sbi_pmu.c
> +++ b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
> @@ -21,6 +21,11 @@
>  #define RISCV_MAX_PMU_COUNTERS 64
>  union sbi_pmu_ctr_info ctrinfo_arr[RISCV_MAX_PMU_COUNTERS];
>
> +/* Snapshot shared memory data */
> +#define PMU_SNAPSHOT_GPA_BASE          (1 << 30)
> +static void *snapshot_gva;
> +static vm_paddr_t snapshot_gpa;
> +
>  /* Cache the available counters in a bitmask */
>  static unsigned long counter_mask_available;
>
> @@ -173,6 +178,20 @@ static void stop_counter(unsigned long counter, unsigned long stop_flags)
>                                counter, ret.error);
>  }
>
> +static void snapshot_set_shmem(vm_paddr_t gpa, unsigned long flags)
> +{
> +       unsigned long lo = (unsigned long)gpa;
> +#if __riscv_xlen == 32
> +       unsigned long hi = (unsigned long)(gpa >> 32);
> +#else
> +       unsigned long hi = gpa == -1 ? -1 : 0;
> +#endif
> +       struct sbiret ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
> +                                     lo, hi, flags, 0, 0, 0);
> +
> +       GUEST_ASSERT(ret.value == 0 && ret.error == 0);
> +}
> +
>  static void test_pmu_event(unsigned long event)
>  {
>         unsigned long counter;
> @@ -207,6 +226,43 @@ static void test_pmu_event(unsigned long event)
>         stop_counter(counter, SBI_PMU_STOP_FLAG_RESET);
>  }
>
> +static void test_pmu_event_snapshot(unsigned long event)
> +{
> +       unsigned long counter;
> +       unsigned long counter_value_pre, counter_value_post;
> +       unsigned long counter_init_value = 100;
> +       struct riscv_pmu_snapshot_data *snapshot_data = snapshot_gva;
> +
> +       counter = get_counter_index(0, counter_mask_available, 0, event);
> +       counter_value_pre = read_counter(counter, ctrinfo_arr[counter]);
> +
> +       /* Do not set the initial value */
> +       start_counter(counter, 0, 0);
> +       dummy_func_loop(10000);
> +
> +       stop_counter(counter, SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT);
> +
> +       /* The counter value is updated w.r.t relative index of cbase */
> +       counter_value_post = READ_ONCE(snapshot_data->ctr_values[0]);
> +       __GUEST_ASSERT(counter_value_post > counter_value_pre,
> +                      "counter_value_post %lx counter_value_pre %lx\n",
> +                      counter_value_post, counter_value_pre);
> +
> +       /* Now set the initial value and compare */
> +       WRITE_ONCE(snapshot_data->ctr_values[0], counter_init_value);
> +       start_counter(counter, SBI_PMU_START_FLAG_INIT_FROM_SNAPSHOT, 0);
> +       dummy_func_loop(10000);
> +
> +       stop_counter(counter, SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT);
> +
> +       counter_value_post = READ_ONCE(snapshot_data->ctr_values[0]);
> +       __GUEST_ASSERT(counter_value_post > counter_init_value,
> +                      "counter_value_post %lx counter_init_value %lx for counter\n",
> +                      counter_value_post, counter_init_value);
> +
> +       stop_counter(counter, SBI_PMU_STOP_FLAG_RESET);
> +}
> +
>  static void test_invalid_event(void)
>  {
>         struct sbiret ret;
> @@ -270,6 +326,41 @@ static void test_pmu_basic_sanity(int cpu)
>         GUEST_DONE();
>  }
>
> +static void test_pmu_events_snaphost(int cpu)
> +{
> +       long out_val = 0;
> +       bool probe;
> +       int num_counters = 0;
> +       unsigned long sbi_impl_version;
> +       struct riscv_pmu_snapshot_data *snapshot_data = snapshot_gva;
> +       int i;
> +
> +       probe = guest_sbi_probe_extension(SBI_EXT_PMU, &out_val);
> +       GUEST_ASSERT(probe && out_val == 1);
> +
> +       sbi_impl_version = get_host_sbi_impl_version();
> +       if (sbi_impl_version >= sbi_mk_version(2, 0))
> +               __GUEST_ASSERT(0, "SBI implementation version doesn't support PMU Snapshot");
> +
> +       snapshot_set_shmem(snapshot_gpa, 0);
> +
> +       /* Get the counter details */
> +       num_counters = get_num_counters();
> +       update_counter_info(num_counters);
> +
> +       /* Validate shared memory access */
> +       GUEST_ASSERT_EQ(READ_ONCE(snapshot_data->ctr_overflow_mask), 0);
> +       for (i = 0; i < num_counters; i++) {
> +               if (counter_mask_available & (1UL << i))
> +                       GUEST_ASSERT_EQ(READ_ONCE(snapshot_data->ctr_values[i]), 0);
> +       }
> +       /* Only these two events are guaranteed to be present */
> +       test_pmu_event_snapshot(SBI_PMU_HW_CPU_CYCLES);
> +       test_pmu_event_snapshot(SBI_PMU_HW_INSTRUCTIONS);
> +
> +       GUEST_DONE();
> +}
> +
>  static void run_vcpu(struct kvm_vcpu *vcpu)
>  {
>         struct ucall uc;
> @@ -328,6 +419,36 @@ static void test_vm_events_test(void *guest_code)
>         test_vm_destroy(vm);
>  }
>
> +static void test_vm_setup_snapshot_mem(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
> +{
> +       vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, PMU_SNAPSHOT_GPA_BASE, 1, 1, 0);
> +       /* PMU Snapshot requires single page only */
> +       virt_map(vm, PMU_SNAPSHOT_GPA_BASE, PMU_SNAPSHOT_GPA_BASE, 1);
> +
> +       /* PMU_SNAPSHOT_GPA_BASE is identity mapped */
> +       snapshot_gva = (void *)(PMU_SNAPSHOT_GPA_BASE);
> +       snapshot_gpa = addr_gva2gpa(vcpu->vm, (vm_vaddr_t)snapshot_gva);
> +       sync_global_to_guest(vcpu->vm, snapshot_gva);
> +       sync_global_to_guest(vcpu->vm, snapshot_gpa);
> +}
> +
> +static void test_vm_events_snapshot_test(void *guest_code)
> +{
> +       struct kvm_vm *vm = NULL;
> +       struct kvm_vcpu *vcpu = NULL;
> +
> +       vm = vm_create_with_one_vcpu(&vcpu, guest_code);
> +       __TEST_REQUIRE(__vcpu_has_ext(vcpu, RISCV_ISA_EXT_REG(KVM_RISCV_SBI_EXT_PMU)),
> +                                  "SBI PMU not available, skipping test");
> +
> +       test_vm_setup_snapshot_mem(vm, vcpu);
> +
> +       vcpu_args_set(vcpu, 1, 0);
> +       run_vcpu(vcpu);
> +
> +       test_vm_destroy(vm);
> +}
> +
>  int main(void)
>  {
>         test_vm_basic_test(test_pmu_basic_sanity);
> @@ -336,5 +457,8 @@ int main(void)
>         test_vm_events_test(test_pmu_events);
>         pr_info("SBI PMU event verification test : PASS\n");
>
> +       test_vm_events_snapshot_test(test_pmu_events_snaphost);
> +       pr_info("SBI PMU event verification with snapshot test : PASS\n");
> +
>         return 0;
>  }
> --
> 2.34.1
>

* Re: [PATCH v4 15/15] KVM: riscv: selftests: Add a test for counter overflow
  2024-02-29  1:01 ` [PATCH v4 15/15] KVM: riscv: selftests: Add a test for counter overflow Atish Patra
@ 2024-03-01  4:53   ` Anup Patel
  2024-03-02 12:35   ` Andrew Jones
  1 sibling, 0 replies; 56+ messages in thread
From: Anup Patel @ 2024-03-01  4:53 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel, Albert Ou, Alexandre Ghiti, Andrew Jones,
	Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Thu, Feb 29, 2024 at 6:32 AM Atish Patra <atishp@rivosinc.com> wrote:
>
> Add a test that verifies the counter overflow interrupt. Currently, it
> relies on overflow support for the cycle/instret events, so it only works
> on platforms where these events support sampling via hpmcounters.
> There is no ISA extension to detect whether a platform supports that. Thus,
> this test will fail on platforms that support virtualization but not
> overflow on these two events.
>
> Signed-off-by: Atish Patra <atishp@rivosinc.com>

LGTM.

Reviewed-by: Anup Patel <anup@brainfault.org>

Regards,
Anup

> ---
>  tools/testing/selftests/kvm/riscv/sbi_pmu.c | 126 +++++++++++++++++++-
>  1 file changed, 125 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/kvm/riscv/sbi_pmu.c b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
> index 8ea2a6db6610..c0264c636054 100644
> --- a/tools/testing/selftests/kvm/riscv/sbi_pmu.c
> +++ b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
> @@ -8,6 +8,7 @@
>   * Copyright (c) 2024, Rivos Inc.
>   */
>
> +#include "asm/csr.h"
>  #include <stdio.h>
>  #include <stdlib.h>
>  #include <string.h>
> @@ -16,6 +17,7 @@
>  #include "kvm_util.h"
>  #include "test_util.h"
>  #include "processor.h"
> +#include "arch_timer.h"
>
>  /* Maximum counters (firmware + hardware)*/
>  #define RISCV_MAX_PMU_COUNTERS 64
> @@ -26,6 +28,11 @@ union sbi_pmu_ctr_info ctrinfo_arr[RISCV_MAX_PMU_COUNTERS];
>  static void *snapshot_gva;
>  static vm_paddr_t snapshot_gpa;
>
> +static int pmu_irq = IRQ_PMU_OVF;
> +
> +static int vcpu_shared_irq_count;
> +static int counter_in_use;
> +
>  /* Cache the available counters in a bitmask */
>  static unsigned long counter_mask_available;
>
> @@ -69,7 +76,9 @@ unsigned long pmu_csr_read_num(int csr_num)
>  #undef switchcase_csr_read
>  }
>
> -static inline void dummy_func_loop(int iter)
> +static void stop_counter(unsigned long counter, unsigned long stop_flags);
> +
> +static inline void dummy_func_loop(uint64_t iter)
>  {
>         int i = 0;
>
> @@ -88,6 +97,26 @@ static void guest_illegal_exception_handler(struct ex_regs *regs)
>         regs->epc += 4;
>  }
>
> +static void guest_irq_handler(struct ex_regs *regs)
> +{
> +       unsigned int irq_num = regs->cause & ~CAUSE_IRQ_FLAG;
> +       struct riscv_pmu_snapshot_data *snapshot_data = snapshot_gva;
> +       unsigned long overflown_mask;
> +
> +       /* Stop all counters first to avoid further interrupts */
> +       sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, 0, 1UL << counter_in_use,
> +                 SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT, 0, 0, 0);
> +
> +       csr_clear(CSR_SIP, BIT(pmu_irq));
> +
> +       overflown_mask = READ_ONCE(snapshot_data->ctr_overflow_mask);
> +       GUEST_ASSERT(overflown_mask & (1UL << counter_in_use));
> +
> +       /* Validate that we are in the correct irq handler */
> +       GUEST_ASSERT_EQ(irq_num, pmu_irq);
> +       WRITE_ONCE(vcpu_shared_irq_count, vcpu_shared_irq_count+1);
> +}
> +
>  static unsigned long get_counter_index(unsigned long cbase, unsigned long cmask,
>                                        unsigned long cflags,
>                                        unsigned long event)
> @@ -263,6 +292,32 @@ static void test_pmu_event_snapshot(unsigned long event)
>         stop_counter(counter, SBI_PMU_STOP_FLAG_RESET);
>  }
>
> +static void test_pmu_event_overflow(unsigned long event)
> +{
> +       unsigned long counter;
> +       unsigned long counter_value_post;
> +       unsigned long counter_init_value = ULONG_MAX - 10000;
> +       struct riscv_pmu_snapshot_data *snapshot_data = snapshot_gva;
> +
> +       counter = get_counter_index(0, counter_mask_available, 0, event);
> +       counter_in_use = counter;
> +
> +       /* The counter value is updated w.r.t relative index of cbase passed to start/stop */
> +       WRITE_ONCE(snapshot_data->ctr_values[0], counter_init_value);
> +       start_counter(counter, SBI_PMU_START_FLAG_INIT_FROM_SNAPSHOT, 0);
> +       dummy_func_loop(10000);
> +       udelay(msecs_to_usecs(2000));
> +       /* irq handler should have stopped the counter */
> +
> +       counter_value_post = READ_ONCE(snapshot_data->ctr_values[counter_in_use]);
> +       /* The counter value after stopping should be less than the init value due to overflow */
> +       __GUEST_ASSERT(counter_value_post < counter_init_value,
> +                      "counter_value_post %lx counter_init_value %lx for counter\n",
> +                      counter_value_post, counter_init_value);
> +
> +       stop_counter(counter, SBI_PMU_STOP_FLAG_RESET);
> +}
> +
>  static void test_invalid_event(void)
>  {
>         struct sbiret ret;
> @@ -361,6 +416,43 @@ static void test_pmu_events_snaphost(int cpu)
>         GUEST_DONE();
>  }
>
> +static void test_pmu_events_overflow(int cpu)
> +{
> +       long out_val = 0;
> +       bool probe;
> +       int num_counters = 0;
> +       unsigned long sbi_impl_version;
> +
> +       probe = guest_sbi_probe_extension(SBI_EXT_PMU, &out_val);
> +       GUEST_ASSERT(probe && out_val == 1);
> +
> +       sbi_impl_version = get_host_sbi_impl_version();
> +       if (sbi_impl_version >= sbi_mk_version(2, 0))
> +               __GUEST_ASSERT(0, "SBI implementation version doesn't support PMU Snapshot");
> +
> +       snapshot_set_shmem(snapshot_gpa, 0);
> +       csr_set(CSR_IE, BIT(pmu_irq));
> +       local_irq_enable();
> +
> +       /* Get the counter details */
> +       num_counters = get_num_counters();
> +       update_counter_info(num_counters);
> +
> +       /*
> +        * QEMU supports overflow for cycle/instruction.
> +        * This test may fail on any platform that does not support overflow for these two events.
> +        */
> +       test_pmu_event_overflow(SBI_PMU_HW_CPU_CYCLES);
> +       GUEST_ASSERT_EQ(vcpu_shared_irq_count, 1);
> +
> +       /* Re-enable the interrupt for another event */
> +       csr_set(CSR_IE, BIT(pmu_irq));
> +       test_pmu_event_overflow(SBI_PMU_HW_INSTRUCTIONS);
> +       GUEST_ASSERT_EQ(vcpu_shared_irq_count, 2);
> +
> +       GUEST_DONE();
> +}
> +
>  static void run_vcpu(struct kvm_vcpu *vcpu)
>  {
>         struct ucall uc;
> @@ -449,6 +541,35 @@ static void test_vm_events_snapshot_test(void *guest_code)
>         test_vm_destroy(vm);
>  }
>
> +static void test_vm_events_overflow(void *guest_code)
> +{
> +       struct kvm_vm *vm = NULL;
> +       struct kvm_vcpu *vcpu = NULL;
> +
> +       vm = vm_create_with_one_vcpu(&vcpu, guest_code);
> +       __TEST_REQUIRE(__vcpu_has_ext(vcpu, RISCV_SBI_EXT_REG(KVM_RISCV_SBI_EXT_PMU)),
> +                                  "SBI PMU not available, skipping test");
> +
> +       __TEST_REQUIRE(__vcpu_has_ext(vcpu, RISCV_ISA_EXT_REG(KVM_RISCV_ISA_EXT_SSCOFPMF)),
> +                                  "Sscofpmf is not available, skipping overflow test");
> +
> +
> +       test_vm_setup_snapshot_mem(vm, vcpu);
> +       vm_init_vector_tables(vm);
> +       vm_install_interrupt_handler(vm, guest_irq_handler);
> +
> +       vcpu_init_vector_tables(vcpu);
> +       /* Initialize guest timer frequency. */
> +       vcpu_get_reg(vcpu, RISCV_TIMER_REG(frequency), &timer_freq);
> +       sync_global_to_guest(vm, timer_freq);
> +
> +       vcpu_args_set(vcpu, 1, 0);
> +
> +       run_vcpu(vcpu);
> +
> +       test_vm_destroy(vm);
> +}
> +
>  int main(void)
>  {
>         test_vm_basic_test(test_pmu_basic_sanity);
> @@ -460,5 +581,8 @@ int main(void)
>         test_vm_events_snapshot_test(test_pmu_events_snaphost);
>         pr_info("SBI PMU event verification with snapshot test : PASS\n");
>
> +       test_vm_events_overflow(test_pmu_events_overflow);
> +       pr_info("SBI PMU event verification with overflow test : PASS\n");
> +
>         return 0;
>  }
> --
> 2.34.1
>
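
The overflow check in test_pmu_event_overflow() relies on unsigned wraparound: a counter seeded 10000 below ULONG_MAX wraps past zero after enough events, so the value read back is smaller than the seed. A minimal userspace sketch of that arithmetic follows. The advance_counter() helper is hypothetical and only models a free-running counter; it is not part of the selftest.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical model of a free-running counter (not from the patch):
 * add 'events' to 'start' and report whether the counter wrapped.
 */
static unsigned long advance_counter(unsigned long start, unsigned long events,
				     bool *overflowed)
{
	/* Unsigned arithmetic wraps modulo 2^N, like a hardware counter. */
	unsigned long end = start + events;

	/* A wrapped result is smaller than the starting value. */
	*overflowed = end < start;
	return end;
}
```

Seeding with ULONG_MAX - 10000 and counting 20000 events wraps the counter, which is the condition the guest asserts on (post value below the initial value).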

* Re: [PATCH v4 01/15] RISC-V: Fix the typo in Scountovf CSR name
  2024-02-29  1:01 ` [PATCH v4 01/15] RISC-V: Fix the typo in Scountovf CSR name Atish Patra
@ 2024-03-01  8:25   ` Clément Léger
  0 siblings, 0 replies; 56+ messages in thread
From: Clément Léger @ 2024-03-01  8:25 UTC (permalink / raw)
  To: Atish Patra, linux-kernel
  Cc: Mark Rutland, linux-kselftest, Albert Ou, Alexandre Ghiti, kvm,
	Will Deacon, Anup Patel, Paul Walmsley, Conor Dooley,
	Paolo Bonzini, Guo Ren, kvm-riscv, Atish Patra, Palmer Dabbelt,
	linux-riscv, Shuah Khan, Andrew Jones



On 29/02/2024 02:01, Atish Patra wrote:
> The counter overflow CSR name is "scountovf" not "sscountovf".
> 
> Fix the csr name.
> 
> Fixes: 4905ec2fb7e6 ("RISC-V: Add sscofpmf extension support")
> Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
> Reviewed-by: Anup Patel <anup@brainfault.org>
> Signed-off-by: Atish Patra <atishp@rivosinc.com>
> ---
>  arch/riscv/include/asm/csr.h         | 2 +-
>  arch/riscv/include/asm/errata_list.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
> index 510014051f5d..603e5a3c61f9 100644
> --- a/arch/riscv/include/asm/csr.h
> +++ b/arch/riscv/include/asm/csr.h
> @@ -281,7 +281,7 @@
>  #define CSR_HPMCOUNTER30H	0xc9e
>  #define CSR_HPMCOUNTER31H	0xc9f
>  
> -#define CSR_SSCOUNTOVF		0xda0
> +#define CSR_SCOUNTOVF		0xda0
>  
>  #define CSR_SSTATUS		0x100
>  #define CSR_SIE			0x104
> diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
> index ea33288f8a25..cd49eb025ddf 100644
> --- a/arch/riscv/include/asm/errata_list.h
> +++ b/arch/riscv/include/asm/errata_list.h
> @@ -114,7 +114,7 @@ asm volatile(ALTERNATIVE(						\
>  
>  #define ALT_SBI_PMU_OVERFLOW(__ovl)					\
>  asm volatile(ALTERNATIVE(						\
> -	"csrr %0, " __stringify(CSR_SSCOUNTOVF),			\
> +	"csrr %0, " __stringify(CSR_SCOUNTOVF),				\
>  	"csrr %0, " __stringify(THEAD_C9XX_CSR_SCOUNTEROF),		\
>  		THEAD_VENDOR_ID, ERRATA_THEAD_PMU,			\
>  		CONFIG_ERRATA_THEAD_PMU)				\


Reviewed-by: Clément Léger <cleger@rivosinc.com>

Thanks,

Clément

* Re: [PATCH v4 02/15] RISC-V: Add FIRMWARE_READ_HI definition
  2024-02-29  1:01 ` [PATCH v4 02/15] RISC-V: Add FIRMWARE_READ_HI definition Atish Patra
@ 2024-03-01  8:27   ` Clément Léger
  0 siblings, 0 replies; 56+ messages in thread
From: Clément Léger @ 2024-03-01  8:27 UTC (permalink / raw)
  To: Atish Patra, linux-kernel
  Cc: Mark Rutland, linux-kselftest, Albert Ou, Alexandre Ghiti, kvm,
	Will Deacon, Anup Patel, Paul Walmsley, Conor Dooley,
	Paolo Bonzini, Guo Ren, kvm-riscv, Atish Patra, Palmer Dabbelt,
	linux-riscv, Shuah Khan, Andrew Jones



On 29/02/2024 02:01, Atish Patra wrote:
> SBI v2.0 added another function to SBI PMU extension to read
> the upper bits of a counter with width larger than XLEN.
> 
> Add the definition for that function.
> 
> Acked-by: Conor Dooley <conor.dooley@microchip.com>
> Reviewed-by: Anup Patel <anup@brainfault.org>
> Signed-off-by: Atish Patra <atishp@rivosinc.com>
> ---
>  arch/riscv/include/asm/sbi.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
> index 6e68f8dff76b..ef8311dafb91 100644
> --- a/arch/riscv/include/asm/sbi.h
> +++ b/arch/riscv/include/asm/sbi.h
> @@ -131,6 +131,7 @@ enum sbi_ext_pmu_fid {
>  	SBI_EXT_PMU_COUNTER_START,
>  	SBI_EXT_PMU_COUNTER_STOP,
>  	SBI_EXT_PMU_COUNTER_FW_READ,
> +	SBI_EXT_PMU_COUNTER_FW_READ_HI,
>  };
>  
>  union sbi_pmu_ctr_info {

Reviewed-by: Clément Léger <cleger@rivosinc.com>

Thanks,

Clément

* Re: [PATCH v4 03/15] drivers/perf: riscv: Read upper bits of a firmware counter
  2024-02-29  1:01 ` [PATCH v4 03/15] drivers/perf: riscv: Read upper bits of a firmware counter Atish Patra
@ 2024-03-01  9:52   ` Andrew Jones
  0 siblings, 0 replies; 56+ messages in thread
From: Andrew Jones @ 2024-03-01  9:52 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel, Palmer Dabbelt, Conor Dooley, Anup Patel,
	Albert Ou, Alexandre Ghiti, Atish Patra, Guo Ren, Icenowy Zheng,
	kvm-riscv, kvm, linux-kselftest, linux-riscv, Mark Rutland,
	Palmer Dabbelt, Paolo Bonzini, Paul Walmsley, Shuah Khan,
	Will Deacon

On Wed, Feb 28, 2024 at 05:01:18PM -0800, Atish Patra wrote:
> SBI v2.0 introduced an explicit function to read the upper 32 bits
> for any firmwar counter width that is longer than 32bits.

firmware

> This is only applicable to RV32, where a firmware counter can be
> 64 bits wide.
> 
> Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
> Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
> Reviewed-by: Anup Patel <anup@brainfault.org>
> Signed-off-by: Atish Patra <atishp@rivosinc.com>
> ---
>  drivers/perf/riscv_pmu_sbi.c | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
> index 16acd4dcdb96..ea0fdb589f0d 100644
> --- a/drivers/perf/riscv_pmu_sbi.c
> +++ b/drivers/perf/riscv_pmu_sbi.c
> @@ -35,6 +35,8 @@
>  PMU_FORMAT_ATTR(event, "config:0-47");
>  PMU_FORMAT_ATTR(firmware, "config:63");
>  
> +static bool sbi_v2_available;
> +
>  static struct attribute *riscv_arch_formats_attr[] = {
>  	&format_attr_event.attr,
>  	&format_attr_firmware.attr,
> @@ -488,16 +490,23 @@ static u64 pmu_sbi_ctr_read(struct perf_event *event)
>  	struct hw_perf_event *hwc = &event->hw;
>  	int idx = hwc->idx;
>  	struct sbiret ret;
> -	union sbi_pmu_ctr_info info;
>  	u64 val = 0;
> +	union sbi_pmu_ctr_info info = pmu_ctr_list[idx];
>  
>  	if (pmu_sbi_is_fw_event(event)) {
>  		ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_FW_READ,
>  				hwc->idx, 0, 0, 0, 0, 0);
> -		if (!ret.error)
> -			val = ret.value;
> +		if (ret.error)
> +			return 0;
> +
> +		val = ret.value;
> +		if (IS_ENABLED(CONFIG_32BIT) && sbi_v2_available && info.width >= 32) {
> +			ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_FW_READ_HI,
> +					hwc->idx, 0, 0, 0, 0, 0);
> +			if (!ret.error)

Getting an error here indicates a buggy SBI. Maybe we should have a
warn-once?
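
A minimal userspace sketch of the warn-once idea, for illustration only. In the driver itself the kernel's pr_warn_once() would be the natural choice; here warn_once(), warn_count, fake_fw_read_hi() and read_fw_counter() are all hypothetical stand-ins, and the fake high-half read always fails so the warning path is exercised.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Counts how many times the warning was actually printed. */
static int warn_count;

/* Hypothetical userspace stand-in for the kernel's pr_warn_once(). */
#define warn_once(msg)						\
	do {							\
		static bool warned;				\
		if (!warned) {					\
			warned = true;				\
			warn_count++;				\
			fprintf(stderr, "%s\n", msg);		\
		}						\
	} while (0)

/* Hypothetical stand-in for the FW_READ_HI ecall; always fails here. */
static int fake_fw_read_hi(uint32_t *hi)
{
	(void)hi;
	return -1;
}

/* Combine the low half with the high half; warn once if the high read fails. */
static uint64_t read_fw_counter(uint32_t lo)
{
	uint64_t val = lo;
	uint32_t hi = 0;

	if (fake_fw_read_hi(&hi))
		warn_once("FW_READ_HI failed: buggy SBI implementation?");
	else
		val |= (uint64_t)hi << 32;

	return val;
}
```

Repeated calls to read_fw_counter() print the message a single time while still returning the low 32 bits, which matches the existing behaviour of silently ignoring the failed high read.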

> +				val |= ((u64)ret.value << 32);
> +		}
>  	} else {
> -		info = pmu_ctr_list[idx];
>  		val = riscv_pmu_ctr_read_csr(info.csr);
>  		if (IS_ENABLED(CONFIG_32BIT))
>  			val = ((u64)riscv_pmu_ctr_read_csr(info.csr + 0x80)) << 31 | val;
> @@ -1108,6 +1117,9 @@ static int __init pmu_sbi_devinit(void)
>  		return 0;
>  	}
>  
> +	if (sbi_spec_version >= sbi_mk_version(2, 0))
> +		sbi_v2_available = true;
> +
>  	ret = cpuhp_setup_state_multi(CPUHP_AP_PERF_RISCV_STARTING,
>  				      "perf/riscv/pmu:starting",
>  				      pmu_sbi_starting_cpu, pmu_sbi_dying_cpu);
> -- 
> 2.34.1
>

Either way,

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>


* Re: [PATCH v4 04/15] RISC-V: Add SBI PMU snapshot definitions
  2024-02-29  1:01 ` [PATCH v4 04/15] RISC-V: Add SBI PMU snapshot definitions Atish Patra
@ 2024-03-01 11:14   ` Andrew Jones
  2024-03-01 19:30     ` Atish Kumar Patra
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Jones @ 2024-03-01 11:14 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel, Anup Patel, Palmer Dabbelt, Albert Ou,
	Alexandre Ghiti, Atish Patra, Conor Dooley, Guo Ren,
	Icenowy Zheng, kvm-riscv, kvm, linux-kselftest, linux-riscv,
	Mark Rutland, Palmer Dabbelt, Paolo Bonzini, Paul Walmsley,
	Shuah Khan, Will Deacon

On Wed, Feb 28, 2024 at 05:01:19PM -0800, Atish Patra wrote:
> The SBI PMU snapshot function reduces the number of traps to the
> higher privilege mode by using memory shared between S/VS-mode
> and M/HS-mode. Add the definitions for that function and the new error
> code.
> 
> Reviewed-by: Anup Patel <anup@brainfault.org>
> Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
> Signed-off-by: Atish Patra <atishp@rivosinc.com>
> ---
>  arch/riscv/include/asm/sbi.h | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
> index ef8311dafb91..dfa830f7d54b 100644
> --- a/arch/riscv/include/asm/sbi.h
> +++ b/arch/riscv/include/asm/sbi.h
> @@ -132,6 +132,7 @@ enum sbi_ext_pmu_fid {
>  	SBI_EXT_PMU_COUNTER_STOP,
>  	SBI_EXT_PMU_COUNTER_FW_READ,
>  	SBI_EXT_PMU_COUNTER_FW_READ_HI,
> +	SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
>  };
>  
>  union sbi_pmu_ctr_info {
> @@ -148,6 +149,13 @@ union sbi_pmu_ctr_info {
>  	};
>  };
>  
> +/* Data structure to contain the pmu snapshot data */
> +struct riscv_pmu_snapshot_data {
> +	u64 ctr_overflow_mask;
> +	u64 ctr_values[64];
> +	u64 reserved[447];
> +};
> +
>  #define RISCV_PMU_RAW_EVENT_MASK GENMASK_ULL(47, 0)
>  #define RISCV_PMU_RAW_EVENT_IDX 0x20000
>  
> @@ -244,9 +252,11 @@ enum sbi_pmu_ctr_type {
>  
>  /* Flags defined for counter start function */
>  #define SBI_PMU_START_FLAG_SET_INIT_VALUE (1 << 0)

A patch before this which changes all flags to use BIT() instead of shifts
would be good, since otherwise the new flags are inconsistent.
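
To illustrate the consistency point, a small sketch assuming a userspace re-definition of BIT() (the kernel provides its own). The values are identical either way; only the spelling differs. The shortened flag names with _OLD/_NEW suffixes are hypothetical and exist only for this comparison.

```c
/* Userspace stand-in for the kernel's BIT() macro. */
#define BIT(nr) (1UL << (nr))

/* Existing style in sbi.h: open-coded shifts. */
#define START_FLAG_SET_INIT_VALUE_OLD	(1 << 0)
#define STOP_FLAG_RESET_OLD		(1 << 0)

/* The same flags spelled with BIT(), matching the new snapshot flags. */
#define START_FLAG_SET_INIT_VALUE_NEW	BIT(0)
#define STOP_FLAG_RESET_NEW		BIT(0)
#define START_FLAG_INIT_SNAPSHOT	BIT(1)
#define STOP_FLAG_TAKE_SNAPSHOT		BIT(1)
```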

> +#define SBI_PMU_START_FLAG_INIT_FROM_SNAPSHOT BIT(1)

This is named SBI_PMU_START_FLAG_INIT_SNAPSHOT in the spec.

>  
>  /* Flags defined for counter stop function */
>  #define SBI_PMU_STOP_FLAG_RESET (1 << 0)
> +#define SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT BIT(1)
>  
>  enum sbi_ext_dbcn_fid {
>  	SBI_EXT_DBCN_CONSOLE_WRITE = 0,
> @@ -285,6 +295,7 @@ struct sbi_sta_struct {
>  #define SBI_ERR_ALREADY_AVAILABLE -6
>  #define SBI_ERR_ALREADY_STARTED -7
>  #define SBI_ERR_ALREADY_STOPPED -8
> +#define SBI_ERR_NO_SHMEM	-9
>  
>  extern unsigned long sbi_spec_version;
>  struct sbiret {
> -- 
> 2.34.1
>
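
A quick sanity check on the snapshot layout in this patch: one overflow mask, 64 counter values and 447 reserved words add up to 512 eight-byte slots, i.e. 4096 bytes. The sketch below mirrors the struct in userspace with uint64_t in place of u64 and assumes a 4 KiB page; the SNAPSHOT_PAGE_SIZE name is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SNAPSHOT_PAGE_SIZE 4096 /* assumption: 4 KiB pages */

/* Userspace mirror of struct riscv_pmu_snapshot_data from the patch. */
struct riscv_pmu_snapshot_data {
	uint64_t ctr_overflow_mask;
	uint64_t ctr_values[64];
	uint64_t reserved[447];
};

/* (1 + 64 + 447) * 8 = 4096: the structure fills one page exactly. */
_Static_assert(sizeof(struct riscv_pmu_snapshot_data) == SNAPSHOT_PAGE_SIZE,
	       "snapshot data must occupy exactly one 4 KiB page");
```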

Thanks,
drew

* Re: [PATCH v4 05/15] drivers/perf: riscv: Implement SBI PMU snapshot function
  2024-02-29  1:01 ` [PATCH v4 05/15] drivers/perf: riscv: Implement SBI PMU snapshot function Atish Patra
@ 2024-03-01 14:40   ` Andrew Jones
  2024-03-01 15:55     ` Alexandre Ghiti
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Jones @ 2024-03-01 14:40 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel, Palmer Dabbelt, Anup Patel, Conor Dooley,
	Albert Ou, Alexandre Ghiti, Atish Patra, Guo Ren, Icenowy Zheng,
	kvm-riscv, kvm, linux-kselftest, linux-riscv, Mark Rutland,
	Palmer Dabbelt, Paolo Bonzini, Paul Walmsley, Shuah Khan,
	Will Deacon

On Wed, Feb 28, 2024 at 05:01:20PM -0800, Atish Patra wrote:
> SBI v2.0 introduced the PMU snapshot feature, which adds the following
> capabilities.
> 
> 1. Read counter values directly from the shared memory instead of
> reading the CSRs.
> 2. Start multiple counters with initial values with one SBI call.
> 
> These capabilities reduce the number of traps to the higher
> privilege mode. If the kernel is in VS-mode while the hypervisor
> uses the trap & emulate method, this avoids the hpmcounter
> CSR read traps. If the kernel is running in S-mode, the benefit
> reduces to CSR latency vs DRAM/cache latency, as there is no trap
> involved while accessing the hpmcounter CSRs.
> 
> In both modes, it saves ecalls when starting
> multiple counters together with initial values. This is a likely
> scenario if multiple counters overflow at the same time.
> 
> Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
> Reviewed-by: Anup Patel <anup@brainfault.org>
> Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
> Signed-off-by: Atish Patra <atishp@rivosinc.com>
> ---
>  drivers/perf/riscv_pmu.c       |   1 +
>  drivers/perf/riscv_pmu_sbi.c   | 209 +++++++++++++++++++++++++++++++--
>  include/linux/perf/riscv_pmu.h |   6 +
>  3 files changed, 204 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/perf/riscv_pmu.c b/drivers/perf/riscv_pmu.c
> index 0dda70e1ef90..5b57acb770d3 100644
> --- a/drivers/perf/riscv_pmu.c
> +++ b/drivers/perf/riscv_pmu.c
> @@ -412,6 +412,7 @@ struct riscv_pmu *riscv_pmu_alloc(void)
>  		cpuc->n_events = 0;
>  		for (i = 0; i < RISCV_MAX_COUNTERS; i++)
>  			cpuc->events[i] = NULL;
> +		cpuc->snapshot_addr = NULL;
>  	}
>  	pmu->pmu = (struct pmu) {
>  		.event_init	= riscv_pmu_event_init,
> diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
> index ea0fdb589f0d..8de5721e8019 100644
> --- a/drivers/perf/riscv_pmu_sbi.c
> +++ b/drivers/perf/riscv_pmu_sbi.c
> @@ -36,6 +36,9 @@ PMU_FORMAT_ATTR(event, "config:0-47");
>  PMU_FORMAT_ATTR(firmware, "config:63");
>  
>  static bool sbi_v2_available;
> +static DEFINE_STATIC_KEY_FALSE(sbi_pmu_snapshot_available);
> +#define sbi_pmu_snapshot_available() \
> +	static_branch_unlikely(&sbi_pmu_snapshot_available)
>  
>  static struct attribute *riscv_arch_formats_attr[] = {
>  	&format_attr_event.attr,
> @@ -485,14 +488,100 @@ static int pmu_sbi_event_map(struct perf_event *event, u64 *econfig)
>  	return ret;
>  }
>  
> +static void pmu_sbi_snapshot_free(struct riscv_pmu *pmu)
> +{
> +	int cpu;
> +
> +	for_each_possible_cpu(cpu) {
> +		struct cpu_hw_events *cpu_hw_evt = per_cpu_ptr(pmu->hw_events, cpu);
> +
> +		if (!cpu_hw_evt->snapshot_addr)
> +			continue;
> +
> +		free_page((unsigned long)cpu_hw_evt->snapshot_addr);
> +		cpu_hw_evt->snapshot_addr = NULL;
> +		cpu_hw_evt->snapshot_addr_phys = 0;
> +	}
> +}
> +
> +static int pmu_sbi_snapshot_alloc(struct riscv_pmu *pmu)
> +{
> +	int cpu;
> +	struct page *snapshot_page;
> +
> +	for_each_possible_cpu(cpu) {
> +		struct cpu_hw_events *cpu_hw_evt = per_cpu_ptr(pmu->hw_events, cpu);
> +
> +		if (cpu_hw_evt->snapshot_addr)
> +			continue;
> +
> +		snapshot_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
> +		if (!snapshot_page) {
> +			pmu_sbi_snapshot_free(pmu);
> +			return -ENOMEM;
> +		}
> +		cpu_hw_evt->snapshot_addr = page_to_virt(snapshot_page);
> +		cpu_hw_evt->snapshot_addr_phys = page_to_phys(snapshot_page);
> +	}
> +
> +	return 0;
> +}
> +
> +static void pmu_sbi_snapshot_disable(void)
> +{
> +	sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM, -1,
> +		  -1, 0, 0, 0, 0);
> +}
> +
> +static int pmu_sbi_snapshot_setup(struct riscv_pmu *pmu, int cpu)
> +{
> +	struct cpu_hw_events *cpu_hw_evt;
> +	struct sbiret ret = {0};
> +
> +	cpu_hw_evt = per_cpu_ptr(pmu->hw_events, cpu);
> +	if (!cpu_hw_evt->snapshot_addr_phys)
> +		return -EINVAL;
> +
> +	if (cpu_hw_evt->snapshot_set_done)
> +		return 0;
> +
> +	if (IS_ENABLED(CONFIG_32BIT))
> +		ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
> +				cpu_hw_evt->snapshot_addr_phys,
> +				(u64)(cpu_hw_evt->snapshot_addr_phys) >> 32, 0, 0, 0, 0);
> +	else
> +		ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
> +				cpu_hw_evt->snapshot_addr_phys, 0, 0, 0, 0, 0);
> +
> +	/* Free up the snapshot area memory and fall back to SBI PMU calls without snapshot */
> +	if (ret.error) {
> +		if (ret.error != SBI_ERR_NOT_SUPPORTED)
> +			pr_warn("pmu snapshot setup failed with error %ld\n", ret.error);
> +		return sbi_err_map_linux_errno(ret.error);
> +	}
> +
> +	cpu_hw_evt->snapshot_set_done = true;
> +
> +	return 0;
> +}
> +
>  static u64 pmu_sbi_ctr_read(struct perf_event *event)
>  {
>  	struct hw_perf_event *hwc = &event->hw;
>  	int idx = hwc->idx;
>  	struct sbiret ret;
>  	u64 val = 0;
> +	struct riscv_pmu *pmu = to_riscv_pmu(event->pmu);
> +	struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
> +	struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
>  	union sbi_pmu_ctr_info info = pmu_ctr_list[idx];
>  
> +	/* Read the value from the shared memory directly */
> +	if (sbi_pmu_snapshot_available()) {
> +		val = sdata->ctr_values[idx];
> +		return val;
> +	}
> +
>  	if (pmu_sbi_is_fw_event(event)) {
>  		ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_FW_READ,
>  				hwc->idx, 0, 0, 0, 0, 0);
> @@ -539,6 +628,7 @@ static void pmu_sbi_ctr_start(struct perf_event *event, u64 ival)
>  	struct hw_perf_event *hwc = &event->hw;
>  	unsigned long flag = SBI_PMU_START_FLAG_SET_INIT_VALUE;
>  
> +	/* There is no benefit setting SNAPSHOT FLAG for a single counter */
>  #if defined(CONFIG_32BIT)
>  	ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, hwc->idx,
>  			1, flag, ival, ival >> 32, 0);
> @@ -559,16 +649,36 @@ static void pmu_sbi_ctr_stop(struct perf_event *event, unsigned long flag)
>  {
>  	struct sbiret ret;
>  	struct hw_perf_event *hwc = &event->hw;
> +	struct riscv_pmu *pmu = to_riscv_pmu(event->pmu);
> +	struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
> +	struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
>  
>  	if ((hwc->flags & PERF_EVENT_FLAG_USER_ACCESS) &&
>  	    (hwc->flags & PERF_EVENT_FLAG_USER_READ_CNT))
>  		pmu_sbi_reset_scounteren((void *)event);
>  
> +	if (sbi_pmu_snapshot_available())
> +		flag |= SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT;
> +
>  	ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, hwc->idx, 1, flag, 0, 0, 0);
> -	if (ret.error && (ret.error != SBI_ERR_ALREADY_STOPPED) &&
> -		flag != SBI_PMU_STOP_FLAG_RESET)
> +	if (!ret.error && sbi_pmu_snapshot_available()) {
> +		/*
> +		 * The counter snapshot is based on the index base specified by hwc->idx.
> +		 * The actual counter value is updated in shared memory at index 0 when counter
> +		 * mask is 0x01. To ensure accurate counter values, it's necessary to transfer
> +		 * the counter value to shared memory. However, if hwc->idx is zero, the counter
> +		 * value is already correctly updated in shared memory, requiring no further
> +		 * adjustment.
> +		 */
> +		if (hwc->idx > 0) {
> +			sdata->ctr_values[hwc->idx] = sdata->ctr_values[0];
> +			sdata->ctr_values[0] = 0;
> +		}
> +	} else if (ret.error && (ret.error != SBI_ERR_ALREADY_STOPPED) &&
> +		flag != SBI_PMU_STOP_FLAG_RESET) {
>  		pr_err("Stopping counter idx %d failed with error %d\n",
>  			hwc->idx, sbi_err_map_linux_errno(ret.error));
> +	}
>  }
>  
>  static int pmu_sbi_find_num_ctrs(void)
> @@ -626,10 +736,14 @@ static inline void pmu_sbi_stop_all(struct riscv_pmu *pmu)
>  static inline void pmu_sbi_stop_hw_ctrs(struct riscv_pmu *pmu)
>  {
>  	struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
> +	unsigned long flag = 0;
> +
> +	if (sbi_pmu_snapshot_available())
> +		flag = SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT;
>  
>  	/* No need to check the error here as we can't do anything about the error */
>  	sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, 0,
> -		  cpu_hw_evt->used_hw_ctrs[0], 0, 0, 0, 0);
> +		  cpu_hw_evt->used_hw_ctrs[0], flag, 0, 0, 0);
>  }
>  
>  /*
> @@ -638,11 +752,10 @@ static inline void pmu_sbi_stop_hw_ctrs(struct riscv_pmu *pmu)
>   * while the overflowed counters need to be started with updated initialization
>   * value.
>   */
> -static inline void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
> -					       unsigned long ctr_ovf_mask)
> +static noinline void pmu_sbi_start_ovf_ctrs_sbi(struct cpu_hw_events *cpu_hw_evt,
> +						unsigned long ctr_ovf_mask)
>  {
>  	int idx = 0;
> -	struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
>  	struct perf_event *event;
>  	unsigned long flag = SBI_PMU_START_FLAG_SET_INIT_VALUE;
>  	unsigned long ctr_start_mask = 0;
> @@ -677,6 +790,49 @@ static inline void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
>  	}
>  }
>  
> +static noinline void pmu_sbi_start_ovf_ctrs_snapshot(struct cpu_hw_events *cpu_hw_evt,
> +						     unsigned long ctr_ovf_mask)
> +{
> +	int idx = 0;
> +	struct perf_event *event;
> +	unsigned long flag = SBI_PMU_START_FLAG_INIT_FROM_SNAPSHOT;
> +	u64 max_period, init_val = 0;
> +	struct hw_perf_event *hwc;
> +	unsigned long ctr_start_mask = 0;
> +	struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
> +
> +	for_each_set_bit(idx, cpu_hw_evt->used_hw_ctrs, RISCV_MAX_COUNTERS) {
> +		if (ctr_ovf_mask & (1 << idx)) {

nit: BIT(idx)

> +			event = cpu_hw_evt->events[idx];
> +			hwc = &event->hw;
> +			max_period = riscv_pmu_ctr_get_width_mask(event);
> +			init_val = local64_read(&hwc->prev_count) & max_period;
> +			sdata->ctr_values[idx] = init_val;
> +		}
> +		/*
> +		 * We donot need to update the non-overflow counters the previous

do not

> +		 * value should have been there already.
> +		 */
> +	}
> +
> +	ctr_start_mask = cpu_hw_evt->used_hw_ctrs[0];
> +
> +	/* Start all the counters in a single shot */
> +	sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, 0, ctr_start_mask,
> +		  flag, 0, 0, 0);

I think we should always loop over all words of used_hw_ctrs[] since it'll
have more than one for riscv32. Hmm, it seems like there are several
places where we don't expect riscv32's second word to be used...
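
To illustrate the concern, here is a minimal, untested userspace sketch of
walking every word of the bitmap. It assumes each COUNTER_START call takes a
counter base plus one word-sized mask, as the calls in this patch do.
start_all_words(), record_start(), struct start_log and the SKETCH_* macros
are hypothetical names made up for the example, not anything in the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for RISCV_MAX_COUNTERS and BITS_PER_LONG (assumptions). */
#define SKETCH_MAX_COUNTERS 64
#define SKETCH_BITS_PER_LONG (8 * sizeof(unsigned long))
#define SKETCH_BITMAP_WORDS \
	((SKETCH_MAX_COUNTERS + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Hypothetical callback standing in for the COUNTER_START ecall. */
typedef void (*start_fn)(unsigned long ctr_base, unsigned long ctr_mask,
			 void *ctx);

/* Records each (base, mask) pair so the behaviour can be inspected. */
struct start_log {
	unsigned long base[SKETCH_BITMAP_WORDS];
	unsigned long mask[SKETCH_BITMAP_WORDS];
	size_t n;
};

static void record_start(unsigned long ctr_base, unsigned long ctr_mask,
			 void *ctx)
{
	struct start_log *log = ctx;

	if (log->n < SKETCH_BITMAP_WORDS) {
		log->base[log->n] = ctr_base;
		log->mask[log->n] = ctr_mask;
		log->n++;
	}
}

/*
 * Issue one start call per non-empty bitmap word. The counter base is the
 * index of the first bit in that word, so a second word on a 32-bit build
 * would be started with base 32. Returns the number of calls made.
 */
static size_t start_all_words(const unsigned long *used_hw_ctrs,
			      start_fn start, void *ctx)
{
	size_t i, calls = 0;

	assert(used_hw_ctrs && start);

	for (i = 0; i < SKETCH_BITMAP_WORDS; i++) {
		if (!used_hw_ctrs[i])
			continue;
		start(i * SKETCH_BITS_PER_LONG, used_hw_ctrs[i], ctx);
		calls++;
	}

	return calls;
}
```

On a 64-bit build the loop runs once, so it costs nothing there. On a 32-bit
build it would issue a second call only when the upper word is in use.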

> +}
> +
> +static void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
> +					unsigned long ctr_ovf_mask)
> +{
> +	struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
> +
> +	if (sbi_pmu_snapshot_available())
> +		pmu_sbi_start_ovf_ctrs_snapshot(cpu_hw_evt, ctr_ovf_mask);
> +	else
> +		pmu_sbi_start_ovf_ctrs_sbi(cpu_hw_evt, ctr_ovf_mask);
> +}
> +
>  static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
>  {
>  	struct perf_sample_data data;
> @@ -690,6 +846,7 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
>  	unsigned long overflowed_ctrs = 0;
>  	struct cpu_hw_events *cpu_hw_evt = dev;
>  	u64 start_clock = sched_clock();
> +	struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
>  
>  	if (WARN_ON_ONCE(!cpu_hw_evt))
>  		return IRQ_NONE;
> @@ -711,8 +868,10 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
>  	pmu_sbi_stop_hw_ctrs(pmu);
>  
>  	/* Overflow status register should only be read after counter are stopped */
> -	ALT_SBI_PMU_OVERFLOW(overflow);
> -
> +	if (sbi_pmu_snapshot_available())
> +		overflow = sdata->ctr_overflow_mask;
> +	else
> +		ALT_SBI_PMU_OVERFLOW(overflow);
>  	/*
>  	 * Overflow interrupt pending bit should only be cleared after stopping
>  	 * all the counters to avoid any race condition.
> @@ -794,6 +953,9 @@ static int pmu_sbi_starting_cpu(unsigned int cpu, struct hlist_node *node)
>  		enable_percpu_irq(riscv_pmu_irq, IRQ_TYPE_NONE);
>  	}
>  
> +	if (sbi_pmu_snapshot_available())
> +		return pmu_sbi_snapshot_setup(pmu, cpu);
> +
>  	return 0;
>  }
>  
> @@ -807,6 +969,9 @@ static int pmu_sbi_dying_cpu(unsigned int cpu, struct hlist_node *node)
>  	/* Disable all counters access for user mode now */
>  	csr_write(CSR_SCOUNTEREN, 0x0);
>  
> +	if (sbi_pmu_snapshot_available())
> +		pmu_sbi_snapshot_disable();
> +
>  	return 0;
>  }
>  
> @@ -1076,10 +1241,6 @@ static int pmu_sbi_device_probe(struct platform_device *pdev)
>  	pmu->event_unmapped = pmu_sbi_event_unmapped;
>  	pmu->csr_index = pmu_sbi_csr_index;
>  
> -	ret = cpuhp_state_add_instance(CPUHP_AP_PERF_RISCV_STARTING, &pmu->node);
> -	if (ret)
> -		return ret;
> -
>  	ret = riscv_pm_pmu_register(pmu);
>  	if (ret)
>  		goto out_unregister;
> @@ -1088,8 +1249,32 @@ static int pmu_sbi_device_probe(struct platform_device *pdev)
>  	if (ret)
>  		goto out_unregister;
>  
> +	/* SBI PMU Snapshot is only available in SBI v2.0 */
> +	if (sbi_v2_available) {
> +		ret = pmu_sbi_snapshot_alloc(pmu);
> +		if (ret)
> +			goto out_unregister;
> +
> +		ret = pmu_sbi_snapshot_setup(pmu, smp_processor_id());
> +		if (!ret) {
> +			pr_info("SBI PMU snapshot detected\n");
> +			/*
> +			 * We enable it once here for the boot cpu. If snapshot shmem setup
> +			 * fails during cpu hotplug process, it will fail to start the cpu
> +			 * as we cannot handle heterogeneous PMUs with different snapshot
> +			 * capability.
> +			 */
> +			static_branch_enable(&sbi_pmu_snapshot_available);
> +		}
> +		/* Snapshot is an optional feature. Continue if not available */
> +	}
> +
>  	register_sysctl("kernel", sbi_pmu_sysctl_table);
>  
> +	ret = cpuhp_state_add_instance(CPUHP_AP_PERF_RISCV_STARTING, &pmu->node);
> +	if (ret)
> +		return ret;
> +
>  	return 0;
>  
>  out_unregister:
> diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h
> index 43282e22ebe1..c3fa90970042 100644
> --- a/include/linux/perf/riscv_pmu.h
> +++ b/include/linux/perf/riscv_pmu.h
> @@ -39,6 +39,12 @@ struct cpu_hw_events {
>  	DECLARE_BITMAP(used_hw_ctrs, RISCV_MAX_COUNTERS);
>  	/* currently enabled firmware counters */
>  	DECLARE_BITMAP(used_fw_ctrs, RISCV_MAX_COUNTERS);
> +	/* The virtual address of the shared memory where counter snapshot will be taken */
> +	void *snapshot_addr;
> +	/* The physical address of the shared memory where counter snapshot will be taken */
> +	phys_addr_t snapshot_addr_phys;
> +	/* Boolean flag to indicate setup is already done */
> +	bool snapshot_set_done;

Instead of the 'snapshot_set_done' boolean, we can just use
snapshot_addr, which can't be NULL after setup.

>  };
>  
>  struct riscv_pmu {
> -- 
> 2.34.1
> 

Thanks,
drew

* Re: [PATCH v4 05/15] drivers/perf: riscv: Implement SBI PMU snapshot function
  2024-03-01 14:40   ` Andrew Jones
@ 2024-03-01 15:55     ` Alexandre Ghiti
  0 siblings, 0 replies; 56+ messages in thread
From: Alexandre Ghiti @ 2024-03-01 15:55 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Atish Patra, linux-kernel, Palmer Dabbelt, Anup Patel,
	Conor Dooley, Albert Ou, Atish Patra, Guo Ren, Icenowy Zheng,
	kvm-riscv, kvm, linux-kselftest, linux-riscv, Mark Rutland,
	Palmer Dabbelt, Paolo Bonzini, Paul Walmsley, Shuah Khan,
	Will Deacon

On Fri, Mar 1, 2024 at 3:40 PM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Wed, Feb 28, 2024 at 05:01:20PM -0800, Atish Patra wrote:
> > SBI v2.0 introduced the PMU snapshot feature, which adds the following
> > features.
> >
> > 1. Read counter values directly from the shared memory instead of
> > a CSR read.
> > 2. Start multiple counters with initial values with one SBI call.
> >
> > These functionalities reduce the number of traps to the higher
> > privilege mode. If the kernel is in VS mode while the hypervisor
> > deploys the trap & emulate method, this would minimize the hpmcounter
> > CSR read traps. If the kernel is running in S-mode, the benefit is
> > reduced to CSR latency vs DRAM/cache latency as there is no trap
> > involved while accessing the hpmcounter CSRs.
> >
> > In both modes, it saves ecalls when starting
> > multiple counters together with initial values. This is a likely
> > scenario if multiple counters overflow at the same time.
> >
> > Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
> > Reviewed-by: Anup Patel <anup@brainfault.org>
> > Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
> > Signed-off-by: Atish Patra <atishp@rivosinc.com>
> > ---
> >  drivers/perf/riscv_pmu.c       |   1 +
> >  drivers/perf/riscv_pmu_sbi.c   | 209 +++++++++++++++++++++++++++++++--
> >  include/linux/perf/riscv_pmu.h |   6 +
> >  3 files changed, 204 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/perf/riscv_pmu.c b/drivers/perf/riscv_pmu.c
> > index 0dda70e1ef90..5b57acb770d3 100644
> > --- a/drivers/perf/riscv_pmu.c
> > +++ b/drivers/perf/riscv_pmu.c
> > @@ -412,6 +412,7 @@ struct riscv_pmu *riscv_pmu_alloc(void)
> >               cpuc->n_events = 0;
> >               for (i = 0; i < RISCV_MAX_COUNTERS; i++)
> >                       cpuc->events[i] = NULL;
> > +             cpuc->snapshot_addr = NULL;
> >       }
> >       pmu->pmu = (struct pmu) {
> >               .event_init     = riscv_pmu_event_init,
> > diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
> > index ea0fdb589f0d..8de5721e8019 100644
> > --- a/drivers/perf/riscv_pmu_sbi.c
> > +++ b/drivers/perf/riscv_pmu_sbi.c
> > @@ -36,6 +36,9 @@ PMU_FORMAT_ATTR(event, "config:0-47");
> >  PMU_FORMAT_ATTR(firmware, "config:63");
> >
> >  static bool sbi_v2_available;
> > +static DEFINE_STATIC_KEY_FALSE(sbi_pmu_snapshot_available);
> > +#define sbi_pmu_snapshot_available() \
> > +     static_branch_unlikely(&sbi_pmu_snapshot_available)
> >
> >  static struct attribute *riscv_arch_formats_attr[] = {
> >       &format_attr_event.attr,
> > @@ -485,14 +488,100 @@ static int pmu_sbi_event_map(struct perf_event *event, u64 *econfig)
> >       return ret;
> >  }
> >
> > +static void pmu_sbi_snapshot_free(struct riscv_pmu *pmu)
> > +{
> > +     int cpu;
> > +
> > +     for_each_possible_cpu(cpu) {
> > +             struct cpu_hw_events *cpu_hw_evt = per_cpu_ptr(pmu->hw_events, cpu);
> > +
> > +             if (!cpu_hw_evt->snapshot_addr)
> > +                     continue;
> > +
> > +             free_page((unsigned long)cpu_hw_evt->snapshot_addr);
> > +             cpu_hw_evt->snapshot_addr = NULL;
> > +             cpu_hw_evt->snapshot_addr_phys = 0;
> > +     }
> > +}
> > +
> > +static int pmu_sbi_snapshot_alloc(struct riscv_pmu *pmu)
> > +{
> > +     int cpu;
> > +     struct page *snapshot_page;
> > +
> > +     for_each_possible_cpu(cpu) {
> > +             struct cpu_hw_events *cpu_hw_evt = per_cpu_ptr(pmu->hw_events, cpu);
> > +
> > +             if (cpu_hw_evt->snapshot_addr)
> > +                     continue;
> > +
> > +             snapshot_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
> > +             if (!snapshot_page) {
> > +                     pmu_sbi_snapshot_free(pmu);
> > +                     return -ENOMEM;
> > +             }
> > +             cpu_hw_evt->snapshot_addr = page_to_virt(snapshot_page);
> > +             cpu_hw_evt->snapshot_addr_phys = page_to_phys(snapshot_page);
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> > +static void pmu_sbi_snapshot_disable(void)
> > +{
> > +     sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM, -1,
> > +               -1, 0, 0, 0, 0);
> > +}
> > +
> > +static int pmu_sbi_snapshot_setup(struct riscv_pmu *pmu, int cpu)
> > +{
> > +     struct cpu_hw_events *cpu_hw_evt;
> > +     struct sbiret ret = {0};
> > +
> > +     cpu_hw_evt = per_cpu_ptr(pmu->hw_events, cpu);
> > +     if (!cpu_hw_evt->snapshot_addr_phys)
> > +             return -EINVAL;
> > +
> > +     if (cpu_hw_evt->snapshot_set_done)
> > +             return 0;
> > +
> > +     if (IS_ENABLED(CONFIG_32BIT))
> > +             ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
> > +                             cpu_hw_evt->snapshot_addr_phys,
> > +                             (u64)(cpu_hw_evt->snapshot_addr_phys) >> 32, 0, 0, 0, 0);
> > +     else
> > +             ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
> > +                             cpu_hw_evt->snapshot_addr_phys, 0, 0, 0, 0, 0);
> > +
> > +     /* Free up the snapshot area memory and fall back to SBI PMU calls without snapshot */
> > +     if (ret.error) {
> > +             if (ret.error != SBI_ERR_NOT_SUPPORTED)
> > +                     pr_warn("pmu snapshot setup failed with error %ld\n", ret.error);
> > +             return sbi_err_map_linux_errno(ret.error);
> > +     }
> > +
> > +     cpu_hw_evt->snapshot_set_done = true;
> > +
> > +     return 0;
> > +}
> > +
> >  static u64 pmu_sbi_ctr_read(struct perf_event *event)
> >  {
> >       struct hw_perf_event *hwc = &event->hw;
> >       int idx = hwc->idx;
> >       struct sbiret ret;
> >       u64 val = 0;
> > +     struct riscv_pmu *pmu = to_riscv_pmu(event->pmu);
> > +     struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
> > +     struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
> >       union sbi_pmu_ctr_info info = pmu_ctr_list[idx];
> >
> > +     /* Read the value from the shared memory directly */
> > +     if (sbi_pmu_snapshot_available()) {
> > +             val = sdata->ctr_values[idx];
> > +             return val;
> > +     }
> > +
> >       if (pmu_sbi_is_fw_event(event)) {
> >               ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_FW_READ,
> >                               hwc->idx, 0, 0, 0, 0, 0);
> > @@ -539,6 +628,7 @@ static void pmu_sbi_ctr_start(struct perf_event *event, u64 ival)
> >       struct hw_perf_event *hwc = &event->hw;
> >       unsigned long flag = SBI_PMU_START_FLAG_SET_INIT_VALUE;
> >
> > +     /* There is no benefit setting SNAPSHOT FLAG for a single counter */
> >  #if defined(CONFIG_32BIT)
> >       ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, hwc->idx,
> >                       1, flag, ival, ival >> 32, 0);
> > @@ -559,16 +649,36 @@ static void pmu_sbi_ctr_stop(struct perf_event *event, unsigned long flag)
> >  {
> >       struct sbiret ret;
> >       struct hw_perf_event *hwc = &event->hw;
> > +     struct riscv_pmu *pmu = to_riscv_pmu(event->pmu);
> > +     struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
> > +     struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
> >
> >       if ((hwc->flags & PERF_EVENT_FLAG_USER_ACCESS) &&
> >           (hwc->flags & PERF_EVENT_FLAG_USER_READ_CNT))
> >               pmu_sbi_reset_scounteren((void *)event);
> >
> > +     if (sbi_pmu_snapshot_available())
> > +             flag |= SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT;
> > +
> >       ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, hwc->idx, 1, flag, 0, 0, 0);
> > -     if (ret.error && (ret.error != SBI_ERR_ALREADY_STOPPED) &&
> > -             flag != SBI_PMU_STOP_FLAG_RESET)
> > +     if (!ret.error && sbi_pmu_snapshot_available()) {
> > +             /*
> > +              * The counter snapshot is based on the index base specified by hwc->idx.
> > +              * The actual counter value is updated in shared memory at index 0 when counter
> > +              * mask is 0x01. To ensure accurate counter values, it's necessary to transfer
> > +              * the counter value to shared memory. However, if hwc->idx is zero, the counter
> > +              * value is already correctly updated in shared memory, requiring no further
> > +              * adjustment.
> > +              */
> > +             if (hwc->idx > 0) {
> > +                     sdata->ctr_values[hwc->idx] = sdata->ctr_values[0];
> > +                     sdata->ctr_values[0] = 0;
> > +             }
> > +     } else if (ret.error && (ret.error != SBI_ERR_ALREADY_STOPPED) &&
> > +             flag != SBI_PMU_STOP_FLAG_RESET) {
> >               pr_err("Stopping counter idx %d failed with error %d\n",
> >                       hwc->idx, sbi_err_map_linux_errno(ret.error));
> > +     }
> >  }
> >
> >  static int pmu_sbi_find_num_ctrs(void)
> > @@ -626,10 +736,14 @@ static inline void pmu_sbi_stop_all(struct riscv_pmu *pmu)
> >  static inline void pmu_sbi_stop_hw_ctrs(struct riscv_pmu *pmu)
> >  {
> >       struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
> > +     unsigned long flag = 0;
> > +
> > +     if (sbi_pmu_snapshot_available())
> > +             flag = SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT;
> >
> >       /* No need to check the error here as we can't do anything about the error */
> >       sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, 0,
> > -               cpu_hw_evt->used_hw_ctrs[0], 0, 0, 0, 0);
> > +               cpu_hw_evt->used_hw_ctrs[0], flag, 0, 0, 0);
> >  }
> >
> >  /*
> > @@ -638,11 +752,10 @@ static inline void pmu_sbi_stop_hw_ctrs(struct riscv_pmu *pmu)
> >   * while the overflowed counters need to be started with updated initialization
> >   * value.
> >   */
> > -static inline void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
> > -                                            unsigned long ctr_ovf_mask)
> > +static noinline void pmu_sbi_start_ovf_ctrs_sbi(struct cpu_hw_events *cpu_hw_evt,
> > +                                             unsigned long ctr_ovf_mask)
> >  {
> >       int idx = 0;
> > -     struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
> >       struct perf_event *event;
> >       unsigned long flag = SBI_PMU_START_FLAG_SET_INIT_VALUE;
> >       unsigned long ctr_start_mask = 0;
> > @@ -677,6 +790,49 @@ static inline void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
> >       }
> >  }
> >
> > +static noinline void pmu_sbi_start_ovf_ctrs_snapshot(struct cpu_hw_events *cpu_hw_evt,
> > +                                                  unsigned long ctr_ovf_mask)
> > +{
> > +     int idx = 0;
> > +     struct perf_event *event;
> > +     unsigned long flag = SBI_PMU_START_FLAG_INIT_FROM_SNAPSHOT;
> > +     u64 max_period, init_val = 0;
> > +     struct hw_perf_event *hwc;
> > +     unsigned long ctr_start_mask = 0;
> > +     struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
> > +
> > +     for_each_set_bit(idx, cpu_hw_evt->used_hw_ctrs, RISCV_MAX_COUNTERS) {
> > +             if (ctr_ovf_mask & (1 << idx)) {
>
> nit: BIT(idx)

Maybe more than a nit? It looks like the recent bug that Fei fixed
here https://lore.kernel.org/linux-riscv/20240228115425.2613856-1-fei2.wu@intel.com/
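
For reference, a small userspace sketch of why the shift type matters.
`1 << idx` is an int shift, so with a 32-bit int it is undefined once idx
reaches 31, while the mask being tested is an unsigned long. SKETCH_BIT()
below is a local stand-in for the kernel's BIT() macro and ctr_overflowed()
is a hypothetical helper; both are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the kernel's BITS_PER_LONG and BIT() (assumptions). */
#define SKETCH_BITS_PER_LONG (8 * sizeof(unsigned long))
#define SKETCH_BIT(nr) (1UL << (nr))

/*
 * Test whether counter idx is set in the overflow mask. The shift is done
 * on an unsigned long, so every bit position of the mask is reachable.
 */
static bool ctr_overflowed(unsigned long ctr_ovf_mask, unsigned int idx)
{
	/* Shifting by the full width or more would be undefined as well. */
	assert(idx < SKETCH_BITS_PER_LONG);

	return (ctr_ovf_mask & SKETCH_BIT(idx)) != 0;
}
```

The same test written with `1 << idx` only behaves for the low counter
indices, which is why it may deserve more than a nit.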

>
> > +                     event = cpu_hw_evt->events[idx];
> > +                     hwc = &event->hw;
> > +                     max_period = riscv_pmu_ctr_get_width_mask(event);
> > +                     init_val = local64_read(&hwc->prev_count) & max_period;
> > +                     sdata->ctr_values[idx] = init_val;
> > +             }
> > +             /*
> > +              * We donot need to update the non-overflow counters the previous
>
> do not
>
> > +              * value should have been there already.
> > +              */
> > +     }
> > +
> > +     ctr_start_mask = cpu_hw_evt->used_hw_ctrs[0];
> > +
> > +     /* Start all the counters in a single shot */
> > +     sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, 0, ctr_start_mask,
> > +               flag, 0, 0, 0);
>
> I think we should always loop over all words of used_hw_ctrs[] since it'll
> have more than one for riscv32. Hmm, it seems like there are several
> places where we don't expect riscv32's second word to be used...
>
> > +}
> > +
> > +static void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
> > +                                     unsigned long ctr_ovf_mask)
> > +{
> > +     struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
> > +
> > +     if (sbi_pmu_snapshot_available())
> > +             pmu_sbi_start_ovf_ctrs_snapshot(cpu_hw_evt, ctr_ovf_mask);
> > +     else
> > +             pmu_sbi_start_ovf_ctrs_sbi(cpu_hw_evt, ctr_ovf_mask);
> > +}
> > +
> >  static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
> >  {
> >       struct perf_sample_data data;
> > @@ -690,6 +846,7 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
> >       unsigned long overflowed_ctrs = 0;
> >       struct cpu_hw_events *cpu_hw_evt = dev;
> >       u64 start_clock = sched_clock();
> > +     struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
> >
> >       if (WARN_ON_ONCE(!cpu_hw_evt))
> >               return IRQ_NONE;
> > @@ -711,8 +868,10 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
> >       pmu_sbi_stop_hw_ctrs(pmu);
> >
> >       /* Overflow status register should only be read after counter are stopped */
> > -     ALT_SBI_PMU_OVERFLOW(overflow);
> > -
> > +     if (sbi_pmu_snapshot_available())
> > +             overflow = sdata->ctr_overflow_mask;
> > +     else
> > +             ALT_SBI_PMU_OVERFLOW(overflow);
> >       /*
> >        * Overflow interrupt pending bit should only be cleared after stopping
> >        * all the counters to avoid any race condition.
> > @@ -794,6 +953,9 @@ static int pmu_sbi_starting_cpu(unsigned int cpu, struct hlist_node *node)
> >               enable_percpu_irq(riscv_pmu_irq, IRQ_TYPE_NONE);
> >       }
> >
> > +     if (sbi_pmu_snapshot_available())
> > +             return pmu_sbi_snapshot_setup(pmu, cpu);
> > +
> >       return 0;
> >  }
> >
> > @@ -807,6 +969,9 @@ static int pmu_sbi_dying_cpu(unsigned int cpu, struct hlist_node *node)
> >       /* Disable all counters access for user mode now */
> >       csr_write(CSR_SCOUNTEREN, 0x0);
> >
> > +     if (sbi_pmu_snapshot_available())
> > +             pmu_sbi_snapshot_disable();
> > +
> >       return 0;
> >  }
> >
> > @@ -1076,10 +1241,6 @@ static int pmu_sbi_device_probe(struct platform_device *pdev)
> >       pmu->event_unmapped = pmu_sbi_event_unmapped;
> >       pmu->csr_index = pmu_sbi_csr_index;
> >
> > -     ret = cpuhp_state_add_instance(CPUHP_AP_PERF_RISCV_STARTING, &pmu->node);
> > -     if (ret)
> > -             return ret;
> > -
> >       ret = riscv_pm_pmu_register(pmu);
> >       if (ret)
> >               goto out_unregister;
> > @@ -1088,8 +1249,32 @@ static int pmu_sbi_device_probe(struct platform_device *pdev)
> >       if (ret)
> >               goto out_unregister;
> >
> > +     /* SBI PMU Snapshot is only available in SBI v2.0 */
> > +     if (sbi_v2_available) {
> > +             ret = pmu_sbi_snapshot_alloc(pmu);
> > +             if (ret)
> > +                     goto out_unregister;
> > +
> > +             ret = pmu_sbi_snapshot_setup(pmu, smp_processor_id());
> > +             if (!ret) {
> > +                     pr_info("SBI PMU snapshot detected\n");
> > +                     /*
> > +                      * We enable it once here for the boot cpu. If snapshot shmem setup
> > +                      * fails during cpu hotplug process, it will fail to start the cpu
> > +                      * as we cannot handle heterogeneous PMUs with different snapshot
> > +                      * capability.
> > +                      */
> > +                     static_branch_enable(&sbi_pmu_snapshot_available);
> > +             }
> > +             /* Snapshot is an optional feature. Continue if not available */
> > +     }
> > +
> >       register_sysctl("kernel", sbi_pmu_sysctl_table);
> >
> > +     ret = cpuhp_state_add_instance(CPUHP_AP_PERF_RISCV_STARTING, &pmu->node);
> > +     if (ret)
> > +             return ret;
> > +
> >       return 0;
> >
> >  out_unregister:
> > diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h
> > index 43282e22ebe1..c3fa90970042 100644
> > --- a/include/linux/perf/riscv_pmu.h
> > +++ b/include/linux/perf/riscv_pmu.h
> > @@ -39,6 +39,12 @@ struct cpu_hw_events {
> >       DECLARE_BITMAP(used_hw_ctrs, RISCV_MAX_COUNTERS);
> >       /* currently enabled firmware counters */
> >       DECLARE_BITMAP(used_fw_ctrs, RISCV_MAX_COUNTERS);
> > +     /* The virtual address of the shared memory where counter snapshot will be taken */
> > +     void *snapshot_addr;
> > +     /* The physical address of the shared memory where counter snapshot will be taken */
> > +     phys_addr_t snapshot_addr_phys;
> > +     /* Boolean flag to indicate setup is already done */
> > +     bool snapshot_set_done;
>
> Instead of the 'snapshot_set_done' boolean, we can just use
> snapshot_addr, which can't be NULL after setup.
>
> >  };
> >
> >  struct riscv_pmu {
> > --
> > 2.34.1
> >
>
> Thanks,
> drew

* Re: [PATCH v4 04/15] RISC-V: Add SBI PMU snapshot definitions
  2024-03-01 11:14   ` Andrew Jones
@ 2024-03-01 19:30     ` Atish Kumar Patra
  0 siblings, 0 replies; 56+ messages in thread
From: Atish Kumar Patra @ 2024-03-01 19:30 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-kernel, Anup Patel, Palmer Dabbelt, Albert Ou,
	Alexandre Ghiti, Atish Patra, Conor Dooley, Guo Ren,
	Icenowy Zheng, kvm-riscv, kvm, linux-kselftest, linux-riscv,
	Mark Rutland, Palmer Dabbelt, Paolo Bonzini, Paul Walmsley,
	Shuah Khan, Will Deacon

On Fri, Mar 1, 2024 at 3:14 AM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Wed, Feb 28, 2024 at 05:01:19PM -0800, Atish Patra wrote:
> > The SBI PMU snapshot function reduces the number of traps to
> > higher privilege mode by using shared memory between S/VS-mode
> > and M/HS-mode. Add the definitions for that extension and the new error
> > codes.
> >
> > Reviewed-by: Anup Patel <anup@brainfault.org>
> > Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
> > Signed-off-by: Atish Patra <atishp@rivosinc.com>
> > ---
> >  arch/riscv/include/asm/sbi.h | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> >
> > diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
> > index ef8311dafb91..dfa830f7d54b 100644
> > --- a/arch/riscv/include/asm/sbi.h
> > +++ b/arch/riscv/include/asm/sbi.h
> > @@ -132,6 +132,7 @@ enum sbi_ext_pmu_fid {
> >       SBI_EXT_PMU_COUNTER_STOP,
> >       SBI_EXT_PMU_COUNTER_FW_READ,
> >       SBI_EXT_PMU_COUNTER_FW_READ_HI,
> > +     SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
> >  };
> >
> >  union sbi_pmu_ctr_info {
> > @@ -148,6 +149,13 @@ union sbi_pmu_ctr_info {
> >       };
> >  };
> >
> > +/* Data structure to contain the pmu snapshot data */
> > +struct riscv_pmu_snapshot_data {
> > +     u64 ctr_overflow_mask;
> > +     u64 ctr_values[64];
> > +     u64 reserved[447];
> > +};
> > +
> >  #define RISCV_PMU_RAW_EVENT_MASK GENMASK_ULL(47, 0)
> >  #define RISCV_PMU_RAW_EVENT_IDX 0x20000
> >
> > @@ -244,9 +252,11 @@ enum sbi_pmu_ctr_type {
> >
> >  /* Flags defined for counter start function */
> >  #define SBI_PMU_START_FLAG_SET_INIT_VALUE (1 << 0)
>
> A patch before this which changes all flags to use BIT() instead of shifts
> would be good, since otherwise the new flags are inconsistent.
>

Done.

>
> > +#define SBI_PMU_START_FLAG_INIT_FROM_SNAPSHOT BIT(1)
>
> This is named SBI_PMU_START_FLAG_INIT_SNAPSHOT in the spec.
>

Fixed.

>
> >
> >  /* Flags defined for counter stop function */
> >  #define SBI_PMU_STOP_FLAG_RESET (1 << 0)
> > +#define SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT BIT(1)
> >
> >  enum sbi_ext_dbcn_fid {
> >       SBI_EXT_DBCN_CONSOLE_WRITE = 0,
> > @@ -285,6 +295,7 @@ struct sbi_sta_struct {
> >  #define SBI_ERR_ALREADY_AVAILABLE -6
> >  #define SBI_ERR_ALREADY_STARTED -7
> >  #define SBI_ERR_ALREADY_STOPPED -8
> > +#define SBI_ERR_NO_SHMEM     -9
> >
> >  extern unsigned long sbi_spec_version;
> >  struct sbiret {
> > --
> > 2.34.1
> >
>
> Thanks,
> drew

* Re: [PATCH v4 13/15] KVM: riscv: selftests: Add SBI PMU selftest
  2024-03-01  4:47   ` Anup Patel
@ 2024-03-02  1:01     ` Atish Kumar Patra
  0 siblings, 0 replies; 56+ messages in thread
From: Atish Kumar Patra @ 2024-03-02  1:01 UTC (permalink / raw)
  To: Anup Patel
  Cc: linux-kernel, Albert Ou, Alexandre Ghiti, Andrew Jones,
	Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Thu, Feb 29, 2024 at 8:47 PM Anup Patel <anup@brainfault.org> wrote:
>
> On Thu, Feb 29, 2024 at 6:32 AM Atish Patra <atishp@rivosinc.com> wrote:
> >
> > This test implements a basic sanity test and cycle/instret event
> > counting tests.
> >
> > Signed-off-by: Atish Patra <atishp@rivosinc.com>
>
> I feel the test should have been called sbi_pmu_test but no need to
> revise this series. I will take care of it at the time of merging.
>

Sure. I am going to send v5 anyway. I have changed the name there.

> Reviewed-by: Anup Patel <anup@brainfault.org>
>
> Regards,
> Anup
>
> > ---
> >  tools/testing/selftests/kvm/Makefile        |   1 +
> >  tools/testing/selftests/kvm/riscv/sbi_pmu.c | 340 ++++++++++++++++++++
> >  2 files changed, 341 insertions(+)
> >  create mode 100644 tools/testing/selftests/kvm/riscv/sbi_pmu.c
> >
> > diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> > index 426f85798aea..b2dce6843b9e 100644
> > --- a/tools/testing/selftests/kvm/Makefile
> > +++ b/tools/testing/selftests/kvm/Makefile
> > @@ -195,6 +195,7 @@ TEST_GEN_PROGS_riscv += kvm_create_max_vcpus
> >  TEST_GEN_PROGS_riscv += kvm_page_table_test
> >  TEST_GEN_PROGS_riscv += set_memory_region_test
> >  TEST_GEN_PROGS_riscv += steal_time
> > +TEST_GEN_PROGS_riscv += riscv/sbi_pmu
> >
> >  SPLIT_TESTS += arch_timer
> >  SPLIT_TESTS += get-reg-list
> > diff --git a/tools/testing/selftests/kvm/riscv/sbi_pmu.c b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
> > new file mode 100644
> > index 000000000000..fc1fc5eea99e
> > --- /dev/null
> > +++ b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
> > @@ -0,0 +1,340 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * arch_timer.c - Tests the riscv64 sstc timer IRQ functionality
> > + *
> > + * The test validates the sstc timer IRQs using vstimecmp registers.
> > + * It's ported from the aarch64 arch_timer test.
> > + *
> > + * Copyright (c) 2024, Rivos Inc.
> > + */
> > +
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <unistd.h>
> > +#include <sys/types.h>
> > +#include "kvm_util.h"
> > +#include "test_util.h"
> > +#include "processor.h"
> > +
> > +/* Maximum counters (firmware + hardware)*/
> > +#define RISCV_MAX_PMU_COUNTERS 64
> > +union sbi_pmu_ctr_info ctrinfo_arr[RISCV_MAX_PMU_COUNTERS];
> > +
> > +/* Cache the available counters in a bitmask */
> > +static unsigned long counter_mask_available;
> > +
> > +unsigned long pmu_csr_read_num(int csr_num)
> > +{
> > +#define switchcase_csr_read(__csr_num, __val)          {\
> > +       case __csr_num:                                 \
> > +               __val = csr_read(__csr_num);            \
> > +               break; }
> > +#define switchcase_csr_read_2(__csr_num, __val)                {\
> > +       switchcase_csr_read(__csr_num + 0, __val)        \
> > +       switchcase_csr_read(__csr_num + 1, __val)}
> > +#define switchcase_csr_read_4(__csr_num, __val)                {\
> > +       switchcase_csr_read_2(__csr_num + 0, __val)      \
> > +       switchcase_csr_read_2(__csr_num + 2, __val)}
> > +#define switchcase_csr_read_8(__csr_num, __val)                {\
> > +       switchcase_csr_read_4(__csr_num + 0, __val)      \
> > +       switchcase_csr_read_4(__csr_num + 4, __val)}
> > +#define switchcase_csr_read_16(__csr_num, __val)       {\
> > +       switchcase_csr_read_8(__csr_num + 0, __val)      \
> > +       switchcase_csr_read_8(__csr_num + 8, __val)}
> > +#define switchcase_csr_read_32(__csr_num, __val)       {\
> > +       switchcase_csr_read_16(__csr_num + 0, __val)     \
> > +       switchcase_csr_read_16(__csr_num + 16, __val)}
> > +
> > +       unsigned long ret = 0;
> > +
> > +       switch (csr_num) {
> > +       switchcase_csr_read_32(CSR_CYCLE, ret)
> > +       switchcase_csr_read_32(CSR_CYCLEH, ret)
> > +       default :
> > +               break;
> > +       }
> > +
> > +       return ret;
> > +#undef switchcase_csr_read_32
> > +#undef switchcase_csr_read_16
> > +#undef switchcase_csr_read_8
> > +#undef switchcase_csr_read_4
> > +#undef switchcase_csr_read_2
> > +#undef switchcase_csr_read
> > +}
> > +
> > +static inline void dummy_func_loop(int iter)
> > +{
> > +       int i = 0;
> > +
> > +       while (i < iter) {
> > +               asm volatile("nop");
> > +               i++;
> > +       }
> > +}
> > +
> > +static void guest_illegal_exception_handler(struct ex_regs *regs)
> > +{
> > +       __GUEST_ASSERT(regs->cause == EXC_INST_ILLEGAL,
> > +                      "Unexpected exception handler %lx\n", regs->cause);
> > +
> > +       /* skip the trapping instruction */
> > +       regs->epc += 4;
> > +}
> > +
> > +static unsigned long get_counter_index(unsigned long cbase, unsigned long cmask,
> > +                                      unsigned long cflags,
> > +                                      unsigned long event)
> > +{
> > +       struct sbiret ret;
> > +
> > +       ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH, cbase, cmask,
> > +                       cflags, event, 0, 0);
> > +       __GUEST_ASSERT(ret.error == 0, "config matching failed %ld\n", ret.error);
> > +       GUEST_ASSERT((ret.value < RISCV_MAX_PMU_COUNTERS) &&
> > +                   ((1UL << ret.value) & counter_mask_available));
> > +
> > +       return ret.value;
> > +}
> > +
> > +static unsigned long get_num_counters(void)
> > +{
> > +       struct sbiret ret;
> > +
> > +       ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_NUM_COUNTERS, 0, 0, 0, 0, 0, 0);
> > +
> > +       __GUEST_ASSERT(ret.error == 0, "Unable to retrieve number of counters from SBI PMU");
> > +
> > +       __GUEST_ASSERT(ret.value < RISCV_MAX_PMU_COUNTERS,
> > +                      "Invalid number of counters %ld\n", ret.value);
> > +
> > +       return ret.value;
> > +}
> > +
> > +static void update_counter_info(int num_counters)
> > +{
> > +       int i = 0;
> > +       struct sbiret ret;
> > +
> > +       for (i = 0; i < num_counters; i++) {
> > +               ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_GET_INFO, i, 0, 0, 0, 0, 0);
> > +
> > +               /* There can be gaps in logical counter indicies*/
> > +               if (!ret.error)
> > +                       GUEST_ASSERT_NE(ret.value, 0);
> > +
> > +               ctrinfo_arr[i].value = ret.value;
> > +               counter_mask_available |= BIT(i);
> > +       }
> > +
> > +       GUEST_ASSERT(counter_mask_available > 0);
> > +}
> > +
> > +static unsigned long read_counter(int idx, union sbi_pmu_ctr_info ctrinfo)
> > +{
> > +       unsigned long counter_val = 0;
> > +       struct sbiret ret;
> > +
> > +       __GUEST_ASSERT(ctrinfo.type < 2, "Invalid counter type %d", ctrinfo.type);
> > +
> > +       if (ctrinfo.type == SBI_PMU_CTR_TYPE_HW) {
> > +               counter_val = pmu_csr_read_num(ctrinfo.csr);
> > +       } else if (ctrinfo.type == SBI_PMU_CTR_TYPE_FW) {
> > +               ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_FW_READ, idx, 0, 0, 0, 0, 0);
> > +               GUEST_ASSERT(ret.error == 0);
> > +               counter_val = ret.value;
> > +       }
> > +
> > +       return counter_val;
> > +}
> > +
> > +static void start_counter(unsigned long counter, unsigned long start_flags,
> > +                         unsigned long ival)
> > +{
> > +       struct sbiret ret;
> > +
> > +       ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, counter, 1, start_flags,
> > +                       ival, 0, 0);
> > +       __GUEST_ASSERT(ret.error == 0, "Unable to start counter %ld\n", counter);
> > +}
> > +
> > +static void stop_counter(unsigned long counter, unsigned long stop_flags)
> > +{
> > +       struct sbiret ret;
> > +
> > +       ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, counter, 1, stop_flags,
> > +                       0, 0, 0);
> > +       if (stop_flags & SBI_PMU_STOP_FLAG_RESET)
> > +               __GUEST_ASSERT(ret.error == SBI_ERR_ALREADY_STOPPED,
> > +                              "Unable to stop counter %ld\n", counter);
> > +       else
> > +               __GUEST_ASSERT(ret.error == 0, "Unable to stop counter %ld error %ld\n",
> > +                              counter, ret.error);
> > +}
> > +
> > +static void test_pmu_event(unsigned long event)
> > +{
> > +       unsigned long counter;
> > +       unsigned long counter_value_pre, counter_value_post;
> > +       unsigned long counter_init_value = 100;
> > +
> > +       counter = get_counter_index(0, counter_mask_available, 0, event);
> > +       counter_value_pre = read_counter(counter, ctrinfo_arr[counter]);
> > +
> > +       /* Do not set the initial value */
> > +       start_counter(counter, 0, counter_init_value);
> > +       dummy_func_loop(10000);
> > +
> > +       stop_counter(counter, 0);
> > +
> > +       counter_value_post = read_counter(counter, ctrinfo_arr[counter]);
> > +       __GUEST_ASSERT(counter_value_post > counter_value_pre,
> > +                      "counter_value_post %lx counter_value_pre %lx\n",
> > +                      counter_value_post, counter_value_pre);
> > +
> > +       /* Now set the initial value and compare */
> > +       start_counter(counter, SBI_PMU_START_FLAG_SET_INIT_VALUE, counter_init_value);
> > +       dummy_func_loop(10000);
> > +
> > +       stop_counter(counter, 0);
> > +
> > +       counter_value_post = read_counter(counter, ctrinfo_arr[counter]);
> > +       __GUEST_ASSERT(counter_value_post > counter_init_value,
> > +                      "counter_value_post %lx counter_init_value %lx\n",
> > +                      counter_value_post, counter_init_value);
> > +
> > +       stop_counter(counter, SBI_PMU_STOP_FLAG_RESET);
> > +}
> > +
> > +static void test_invalid_event(void)
> > +{
> > +       struct sbiret ret;
> > +       unsigned long event = 0x1234; /* A random event */
> > +
> > +       ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH, 0,
> > +                       counter_mask_available, 0, event, 0, 0);
> > +       GUEST_ASSERT_EQ(ret.error, SBI_ERR_NOT_SUPPORTED);
> > +}
> > +
> > +static void test_pmu_events(int cpu)
> > +{
> > +       int num_counters = 0;
> > +
> > +       /* Get the counter details */
> > +       num_counters = get_num_counters();
> > +       update_counter_info(num_counters);
> > +
> > +       /* Sanity testing for any random invalid event */
> > +       test_invalid_event();
> > +
> > +       /* Only these two events are guranteed to be present */
> > +       test_pmu_event(SBI_PMU_HW_CPU_CYCLES);
> > +       test_pmu_event(SBI_PMU_HW_INSTRUCTIONS);
> > +
> > +       GUEST_DONE();
> > +}
> > +
> > +static void test_pmu_basic_sanity(int cpu)
> > +{
> > +       long out_val = 0;
> > +       bool probe;
> > +       struct sbiret ret;
> > +       int num_counters = 0, i;
> > +       unsigned long counter_val = -1;
> > +       union sbi_pmu_ctr_info ctrinfo;
> > +
> > +       probe = guest_sbi_probe_extension(SBI_EXT_PMU, &out_val);
> > +       GUEST_ASSERT(probe && out_val == 1);
> > +
> > +       num_counters = get_num_counters();
> > +
> > +       for (i = 0; i < num_counters; i++) {
> > +               ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_GET_INFO, i,
> > +                               0, 0, 0, 0, 0);
> > +
> > +               /* There can be gaps in logical counter indicies*/
> > +               if (!ret.error)
> > +                       GUEST_ASSERT_NE(ret.value, 0);
> > +               else
> > +                       continue;
> > +
> > +               ctrinfo.value = ret.value;
> > +
> > +               /* Accesibility check of hardware and read capability of firmware counters */
> > +               counter_val = read_counter(i, ctrinfo);
> > +               /* The spec doesn't mandate any initial value. Verify if a sane value */
> > +               GUEST_ASSERT_NE(counter_val, -1);
> > +       }
> > +
> > +       GUEST_DONE();
> > +}
> > +
> > +static void run_vcpu(struct kvm_vcpu *vcpu)
> > +{
> > +       struct ucall uc;
> > +
> > +       vcpu_run(vcpu);
> > +       switch (get_ucall(vcpu, &uc)) {
> > +       case UCALL_ABORT:
> > +               REPORT_GUEST_ASSERT(uc);
> > +               break;
> > +       case UCALL_DONE:
> > +       case UCALL_SYNC:
> > +               break;
> > +       default:
> > +               TEST_FAIL("Unknown ucall %lu", uc.cmd);
> > +               break;
> > +       }
> > +}
> > +
> > +void test_vm_destroy(struct kvm_vm *vm)
> > +{
> > +       memset(ctrinfo_arr, 0, sizeof(union sbi_pmu_ctr_info) * RISCV_MAX_PMU_COUNTERS);
> > +       counter_mask_available = 0;
> > +       kvm_vm_free(vm);
> > +}
> > +
> > +static void test_vm_basic_test(void *guest_code)
> > +{
> > +       struct kvm_vm *vm;
> > +       struct kvm_vcpu *vcpu;
> > +
> > +       vm = vm_create_with_one_vcpu(&vcpu, guest_code);
> > +       __TEST_REQUIRE(__vcpu_has_ext(vcpu, RISCV_ISA_EXT_REG(KVM_RISCV_SBI_EXT_PMU)),
> > +                                  "SBI PMU not available, skipping test");
> > +       vm_init_vector_tables(vm);
> > +       /* Illegal instruction handler is required to verify read access without configuration */
> > +       vm_install_exception_handler(vm, EXC_INST_ILLEGAL, guest_illegal_exception_handler);
> > +
> > +       vcpu_init_vector_tables(vcpu);
> > +       vcpu_args_set(vcpu, 1, 0);
> > +       run_vcpu(vcpu);
> > +
> > +       test_vm_destroy(vm);
> > +}
> > +
> > +static void test_vm_events_test(void *guest_code)
> > +{
> > +       struct kvm_vm *vm = NULL;
> > +       struct kvm_vcpu *vcpu = NULL;
> > +
> > +       vm = vm_create_with_one_vcpu(&vcpu, guest_code);
> > +       __TEST_REQUIRE(__vcpu_has_ext(vcpu, RISCV_ISA_EXT_REG(KVM_RISCV_SBI_EXT_PMU)),
> > +                                  "SBI PMU not available, skipping test");
> > +       vcpu_args_set(vcpu, 1, 0);
> > +       run_vcpu(vcpu);
> > +
> > +       test_vm_destroy(vm);
> > +}
> > +
> > +int main(void)
> > +{
> > +       test_vm_basic_test(test_pmu_basic_sanity);
> > +       pr_info("SBI PMU basic test : PASS\n");
> > +
> > +       test_vm_events_test(test_pmu_events);
> > +       pr_info("SBI PMU event verification test : PASS\n");
> > +
> > +       return 0;
> > +}
> > --
> > 2.34.1
> >


* Re: [PATCH v4 06/15] RISC-V: KVM: No need to update the counter value during reset
  2024-02-29  1:01 ` [PATCH v4 06/15] RISC-V: KVM: No need to update the counter value during reset Atish Patra
@ 2024-03-02  7:47   ` Andrew Jones
  0 siblings, 0 replies; 56+ messages in thread
From: Andrew Jones @ 2024-03-02  7:47 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel, Anup Patel, Albert Ou, Alexandre Ghiti,
	Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Wed, Feb 28, 2024 at 05:01:21PM -0800, Atish Patra wrote:
> The virtual counter value is updated during pmu_ctr_read. There is no need
> to update it in reset case. Otherwise, it will be counted twice which is
> incorrect.
> 
> Fixes: 0cb74b65d2e5 ("RISC-V: KVM: Implement perf support without sampling")
> Reviewed-by: Anup Patel <anup@brainfault.org>
> Signed-off-by: Atish Patra <atishp@rivosinc.com>
> ---
>  arch/riscv/kvm/vcpu_pmu.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> index 86391a5061dd..b1574c043f77 100644
> --- a/arch/riscv/kvm/vcpu_pmu.c
> +++ b/arch/riscv/kvm/vcpu_pmu.c
> @@ -397,7 +397,6 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
>  {
>  	struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
>  	int i, pmc_index, sbiret = 0;
> -	u64 enabled, running;
>  	struct kvm_pmc *pmc;
>  	int fevent_code;
>  
> @@ -432,12 +431,9 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
>  				sbiret = SBI_ERR_ALREADY_STOPPED;
>  			}
>  
> -			if (flags & SBI_PMU_STOP_FLAG_RESET) {
> -				/* Relase the counter if this is a reset request */
> -				pmc->counter_val += perf_event_read_value(pmc->perf_event,
> -									  &enabled, &running);
> +			if (flags & SBI_PMU_STOP_FLAG_RESET)
> +				/* Release the counter if this is a reset request */
>  				kvm_pmu_release_perf_event(pmc);
> -			}
>  		} else {
>  			sbiret = SBI_ERR_INVALID_PARAM;
>  		}
> -- 
> 2.34.1
>

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>


* Re: [PATCH v4 07/15] RISC-V: KVM: No need to exit to the user space if perf event failed
  2024-02-29  1:01 ` [PATCH v4 07/15] RISC-V: KVM: No need to exit to the user space if perf event failed Atish Patra
@ 2024-03-02  8:15   ` Andrew Jones
  2024-04-01 22:37     ` Atish Patra
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Jones @ 2024-03-02  8:15 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel, Anup Patel, Albert Ou, Alexandre Ghiti,
	Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Wed, Feb 28, 2024 at 05:01:22PM -0800, Atish Patra wrote:
> Currently, we return a linux error code if creating a perf event failed
> in kvm. That shouldn't be necessary as guest can continue to operate
> without perf profiling or profiling with firmware counters.
> 
> Return appropriate SBI error code to indicate that PMU configuration
> failed. An error message in kvm already describes the reason for failure.

I don't know enough about the perf subsystem to know if there may be
a concern that resources are temporarily unavailable. If so, then this
patch would make it possible for a guest to do the exact same thing,
but sometimes succeed and sometimes get SBI_ERR_NOT_SUPPORTED.
sbi_pmu_counter_config_matching doesn't currently have any error types
specified that say "unsupported at the moment, maybe try again", which
would be more appropriate in that case. I do see
perf_event_create_kernel_counter() can return ENOMEM when memory isn't
available, but if the kernel isn't able to allocate a small amount of
memory, then we're in bigger trouble anyway, so the concern would be
if there are perf resource pools which may temporarily be exhausted at
the time the guest makes this request.

One comment below.

> 
> Fixes: 0cb74b65d2e5 ("RISC-V: KVM: Implement perf support without sampling")
> Reviewed-by: Anup Patel <anup@brainfault.org>
> Signed-off-by: Atish Patra <atishp@rivosinc.com>
> ---
>  arch/riscv/kvm/vcpu_pmu.c     | 14 +++++++++-----
>  arch/riscv/kvm/vcpu_sbi_pmu.c |  6 +++---
>  2 files changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> index b1574c043f77..29bf4ca798cb 100644
> --- a/arch/riscv/kvm/vcpu_pmu.c
> +++ b/arch/riscv/kvm/vcpu_pmu.c
> @@ -229,8 +229,9 @@ static int kvm_pmu_validate_counter_mask(struct kvm_pmu *kvpmu, unsigned long ct
>  	return 0;
>  }
>  
> -static int kvm_pmu_create_perf_event(struct kvm_pmc *pmc, struct perf_event_attr *attr,
> -				     unsigned long flags, unsigned long eidx, unsigned long evtdata)
> +static long kvm_pmu_create_perf_event(struct kvm_pmc *pmc, struct perf_event_attr *attr,
> +				      unsigned long flags, unsigned long eidx,
> +				      unsigned long evtdata)
>  {
>  	struct perf_event *event;
>  
> @@ -454,7 +455,8 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
>  				     unsigned long eidx, u64 evtdata,
>  				     struct kvm_vcpu_sbi_return *retdata)
>  {
> -	int ctr_idx, ret, sbiret = 0;
> +	int ctr_idx, sbiret = 0;
> +	long ret;
>  	bool is_fevent;
>  	unsigned long event_code;
>  	u32 etype = kvm_pmu_get_perf_event_type(eidx);
> @@ -513,8 +515,10 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
>  			kvpmu->fw_event[event_code].started = true;
>  	} else {
>  		ret = kvm_pmu_create_perf_event(pmc, &attr, flags, eidx, evtdata);
> -		if (ret)
> -			return ret;
> +		if (ret) {
> +			sbiret = SBI_ERR_NOT_SUPPORTED;
> +			goto out;
> +		}
>  	}
>  
>  	set_bit(ctr_idx, kvpmu->pmc_in_use);
> diff --git a/arch/riscv/kvm/vcpu_sbi_pmu.c b/arch/riscv/kvm/vcpu_sbi_pmu.c
> index 7eca72df2cbd..b70179e9e875 100644
> --- a/arch/riscv/kvm/vcpu_sbi_pmu.c
> +++ b/arch/riscv/kvm/vcpu_sbi_pmu.c
> @@ -42,9 +42,9 @@ static int kvm_sbi_ext_pmu_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
>  #endif
>  		/*
>  		 * This can fail if perf core framework fails to create an event.
> -		 * Forward the error to userspace because it's an error which
> -		 * happened within the host kernel. The other option would be
> -		 * to convert to an SBI error and forward to the guest.
> +		 * No need to forward the error to userspace and exit the guest

Period after guest


> +		 * operation can continue without profiling. Forward the

The operation

> +		 * appropriate SBI error to the guest.
>  		 */
>  		ret = kvm_riscv_vcpu_pmu_ctr_cfg_match(vcpu, cp->a0, cp->a1,
>  						       cp->a2, cp->a3, temp, retdata);
> -- 
> 2.34.1
>

Thanks,
drew


* Re: [PATCH v4 08/15] RISC-V: KVM: Implement SBI PMU Snapshot feature
  2024-02-29  1:01 ` [PATCH v4 08/15] RISC-V: KVM: Implement SBI PMU Snapshot feature Atish Patra
@ 2024-03-02  9:49   ` Andrew Jones
  2024-04-01 22:36     ` Atish Patra
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Jones @ 2024-03-02  9:49 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel, Anup Patel, Albert Ou, Alexandre Ghiti,
	Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Wed, Feb 28, 2024 at 05:01:23PM -0800, Atish Patra wrote:
> PMU Snapshot function allows to minimize the number of traps when the
> guest access configures/access the hpmcounters. If the snapshot feature
> is enabled, the hypervisor updates the shared memory with counter
> data and state of overflown counters. The guest can just read the
> shared memory instead of trap & emulate done by the hypervisor.
> 
> This patch doesn't implement the counter overflow yet.
> 
> Reviewed-by: Anup Patel <anup@brainfault.org>
> Signed-off-by: Atish Patra <atishp@rivosinc.com>
> ---
>  arch/riscv/include/asm/kvm_vcpu_pmu.h |   7 ++
>  arch/riscv/kvm/vcpu_pmu.c             | 120 +++++++++++++++++++++++++-
>  arch/riscv/kvm/vcpu_sbi_pmu.c         |   3 +
>  drivers/perf/riscv_pmu_sbi.c          |   2 +-
>  4 files changed, 129 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> index 395518a1664e..586bab84be35 100644
> --- a/arch/riscv/include/asm/kvm_vcpu_pmu.h
> +++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> @@ -50,6 +50,10 @@ struct kvm_pmu {
>  	bool init_done;
>  	/* Bit map of all the virtual counter used */
>  	DECLARE_BITMAP(pmc_in_use, RISCV_KVM_MAX_COUNTERS);
> +	/* The address of the counter snapshot area (guest physical address) */
> +	gpa_t snapshot_addr;
> +	/* The actual data of the snapshot */
> +	struct riscv_pmu_snapshot_data *sdata;
>  };
>  
>  #define vcpu_to_pmu(vcpu) (&(vcpu)->arch.pmu_context)
> @@ -85,6 +89,9 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
>  int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
>  				struct kvm_vcpu_sbi_return *retdata);
>  void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu);
> +int kvm_riscv_vcpu_pmu_setup_snapshot(struct kvm_vcpu *vcpu, unsigned long saddr_low,
> +				      unsigned long saddr_high, unsigned long flags,
> +				      struct kvm_vcpu_sbi_return *retdata);

I prefer to name this function

kvm_riscv_vcpu_pmu_snapshot_set_shmem

>  void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu);
>  void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu);
>  
> diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> index 29bf4ca798cb..74865e6050a1 100644
> --- a/arch/riscv/kvm/vcpu_pmu.c
> +++ b/arch/riscv/kvm/vcpu_pmu.c
> @@ -311,6 +311,81 @@ int kvm_riscv_vcpu_pmu_read_hpm(struct kvm_vcpu *vcpu, unsigned int csr_num,
>  	return ret;
>  }
>  
> +static void kvm_pmu_clear_snapshot_area(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> +	int snapshot_area_size = sizeof(struct riscv_pmu_snapshot_data);
> +
> +	if (kvpmu->sdata) {
> +		memset(kvpmu->sdata, 0, snapshot_area_size);
> +		if (kvpmu->snapshot_addr != INVALID_GPA)

It's a KVM bug if we have non-null sdata but snapshot_addr is INVALID_GPA,
right? Maybe we should warn if we see that. We can also move the memset
inside the if block.

> +			kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr,
> +					     kvpmu->sdata, snapshot_area_size);
> +		kfree(kvpmu->sdata);
> +		kvpmu->sdata = NULL;
> +	}
> +	kvpmu->snapshot_addr = INVALID_GPA;
> +}
> +
> +int kvm_riscv_vcpu_pmu_setup_snapshot(struct kvm_vcpu *vcpu, unsigned long saddr_low,
> +				      unsigned long saddr_high, unsigned long flags,
> +				      struct kvm_vcpu_sbi_return *retdata)
> +{
> +	struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> +	int snapshot_area_size = sizeof(struct riscv_pmu_snapshot_data);
> +	int sbiret = 0;
> +	gpa_t saddr;
> +	unsigned long hva;
> +	bool writable;
> +
> +	if (!kvpmu) {
> +		sbiret = SBI_ERR_INVALID_PARAM;
> +		goto out;
> +	}

Need to check that flags is zero or return SBI_ERR_INVALID_PARAM.
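
Something along these lines, as a standalone sketch rather than a patch hunk. The helper name is hypothetical and the error constant is a local stand-in using the value from the SBI spec:

```c
/* Local stand-in for the kernel's SBI error code (value per the SBI spec). */
#define SBI_ERR_INVALID_PARAM	(-3)

/*
 * Hypothetical helper: the snapshot set_shmem call defines no flag bits,
 * so any nonzero flags value is rejected.
 */
static long snapshot_shmem_check_flags(unsigned long flags)
{
	return flags ? SBI_ERR_INVALID_PARAM : 0;
}
```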

> +
> +	if (saddr_low == -1 && saddr_high == -1) {

We introduced SBI_STA_SHMEM_DISABLE for these magic -1's for STA. Since
SBI is using the -1 approach for all its shmem, then maybe we should
rename SBI_STA_SHMEM_DISABLE to SBI_SHMEM_DISABLE and then use them here
too.
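
For illustration only, assuming the generic define keeps the all-ones value; the helper name is hypothetical and SBI_SHMEM_DISABLE does not exist until the rename is done:

```c
#include <stdbool.h>

/* Proposed generic name; assumed to be all-ones like the STA define. */
#define SBI_SHMEM_DISABLE	(-1UL)

/* Hypothetical helper: true when both address halves request "disable". */
static bool sbi_shmem_is_disable_request(unsigned long lo, unsigned long hi)
{
	return lo == SBI_SHMEM_DISABLE && hi == SBI_SHMEM_DISABLE;
}
```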

> +		kvm_pmu_clear_snapshot_area(vcpu);
> +		return 0;
> +	}
> +
> +	saddr = saddr_low;
> +
> +	if (saddr_high != 0) {
> +		if (IS_ENABLED(CONFIG_32BIT))
> +			saddr |= ((gpa_t)saddr << 32);
> +		else
> +			sbiret = SBI_ERR_INVALID_ADDRESS;
> +		goto out;
> +	}
> +
> +	if (kvm_is_error_gpa(vcpu->kvm, saddr)) {
> +		sbiret = SBI_ERR_INVALID_PARAM;
> +		goto out;
> +	}

Does the check above provide anything more than what the check below does?

> +
> +	hva = kvm_vcpu_gfn_to_hva_prot(vcpu, saddr >> PAGE_SHIFT, &writable);
> +	if (kvm_is_error_hva(hva) || !writable) {
> +		sbiret = SBI_ERR_INVALID_ADDRESS;
> +		goto out;
> +	}
> +
> +	kvpmu->snapshot_addr = saddr;
> +	kvpmu->sdata = kzalloc(snapshot_area_size, GFP_ATOMIC);
> +	if (!kvpmu->sdata)

Should reset snapshot_addr to INVALID_GPA here on error. Or maybe we
should just set snapshot_addr to saddr at the bottom of this function if
we make it.
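
A userspace model of the second option, where the address is only committed once everything else has succeeded. The struct, the function name and the write_fails knob (simulating a failed guest write) are all made up for the sketch:

```c
#include <stdint.h>
#include <stdlib.h>

#define INVALID_GPA	(~(uint64_t)0)

/* Simplified stand-in for the snapshot fields of struct kvm_pmu. */
struct pmu {
	uint64_t snapshot_addr;
	void *sdata;
};

/*
 * Hypothetical model: allocate and initialize first, and only publish
 * snapshot_addr and sdata at the bottom, so no error path has to undo them.
 */
static int setup_snapshot(struct pmu *pmu, uint64_t saddr, size_t size,
			  int write_fails)
{
	void *sdata = calloc(1, size);

	if (!sdata)
		return -1;

	if (write_fails) {
		/* Simulated guest write failure: pmu is left untouched. */
		free(sdata);
		return -1;
	}

	pmu->sdata = sdata;
	pmu->snapshot_addr = saddr;
	return 0;
}
```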

> +		return -ENOMEM;
> +
> +	if (kvm_vcpu_write_guest(vcpu, saddr, kvpmu->sdata, snapshot_area_size)) {
> +		kfree(kvpmu->sdata);
> +		kvpmu->snapshot_addr = INVALID_GPA;
> +		sbiret = SBI_ERR_FAILURE;

I agree we should return this SBI error for this case, but unfortunately
the spec is missing the

 SBI_ERR_FAILED - The request failed for unspecified or unknown other reasons.

that we have for other SBI functions. I guess we should keep the code like
this and open a PR to the spec.

> +	}
> +
> +out:
> +	retdata->err_val = sbiret;
> +
> +	return 0;
> +}
> +
>  int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu,
>  				struct kvm_vcpu_sbi_return *retdata)
>  {
> @@ -344,20 +419,33 @@ int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
>  	int i, pmc_index, sbiret = 0;
>  	struct kvm_pmc *pmc;
>  	int fevent_code;
> +	bool snap_flag_set = flags & SBI_PMU_START_FLAG_INIT_FROM_SNAPSHOT;

This function should confirm no undefined bits are set in flags and the
spec should specify that the reserved flags must be zero otherwise an
invalid param will be returned.

Also here we should confirm that only one of the two flags is set,
otherwise return invalid param, as they're specified to be mutually
exclusive.

Regarding the spec, the note about the counter value not being modified
unless SBI_PMU_START_SET_INIT_VALUE is set should be modified to state
unless either of the two flags is set (so I think we need another spec
PR).

(The same flags checking/specifying comments apply to the other functions
with flags too.)
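
Roughly, for the start flags, as a standalone sketch. The flag and error names are local stand-ins, with bit positions assumed to match the kernel's SBI_PMU_START_FLAG_* defines:

```c
/* Local stand-ins; values assumed to match the SBI spec / kernel headers. */
#define SBI_ERR_INVALID_PARAM		(-3)
#define START_FLAG_SET_INIT_VALUE	(1UL << 0)
#define START_FLAG_INIT_FROM_SNAPSHOT	(1UL << 1)
#define START_FLAGS_VALID \
	(START_FLAG_SET_INIT_VALUE | START_FLAG_INIT_FROM_SNAPSHOT)

/* Hypothetical helper: reject reserved bits and the exclusive pair. */
static long check_start_flags(unsigned long flags)
{
	/* Reserved bits must be zero. */
	if (flags & ~START_FLAGS_VALID)
		return SBI_ERR_INVALID_PARAM;

	/* The two initialization flags are mutually exclusive. */
	if ((flags & START_FLAGS_VALID) == START_FLAGS_VALID)
		return SBI_ERR_INVALID_PARAM;

	return 0;
}
```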

>  
>  	if (kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0) {
>  		sbiret = SBI_ERR_INVALID_PARAM;
>  		goto out;
>  	}
>  
> +	if (snap_flag_set && kvpmu->snapshot_addr == INVALID_GPA) {
> +		sbiret = SBI_ERR_NO_SHMEM;
> +		goto out;
> +	}
> +
>  	/* Start the counters that have been configured and requested by the guest */
>  	for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
>  		pmc_index = i + ctr_base;
>  		if (!test_bit(pmc_index, kvpmu->pmc_in_use))
>  			continue;
>  		pmc = &kvpmu->pmc[pmc_index];
> -		if (flags & SBI_PMU_START_FLAG_SET_INIT_VALUE)
> +		if (flags & SBI_PMU_START_FLAG_SET_INIT_VALUE) {
>  			pmc->counter_val = ival;
> +		} else if (snap_flag_set) {
> +			kvm_vcpu_read_guest(vcpu, kvpmu->snapshot_addr, kvpmu->sdata,
> +					    sizeof(struct riscv_pmu_snapshot_data));

The snapshot read should be outside the for_each_set_bit() loop and we
should warn and abort the counter starting if the read fails.
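
As a userspace model of what I mean. The snapshot struct is simplified, read_guest() is a stand-in for kvm_vcpu_read_guest(), and the function names are made up:

```c
#include <stdint.h>
#include <string.h>

#define NCTR	64

/* Simplified stand-in for struct riscv_pmu_snapshot_data. */
struct snapshot {
	uint64_t ctr_overflow_mask;
	uint64_t ctr_values[NCTR];
};

/* Stand-in for kvm_vcpu_read_guest(): returns nonzero on failure. */
static int read_guest(const struct snapshot *guest, struct snapshot *out)
{
	if (!guest)
		return -1;
	memcpy(out, guest, sizeof(*out));
	return 0;
}

/*
 * Read the shared memory once, before the loop. If the read fails,
 * return before any counter has been modified.
 */
static int start_from_snapshot(const struct snapshot *guest, uint64_t mask,
			       uint64_t *counter_val)
{
	struct snapshot local;
	int i;

	if (read_guest(guest, &local))
		return -1;

	for (i = 0; i < NCTR; i++) {
		if (mask & (UINT64_C(1) << i))
			counter_val[i] = local.ctr_values[i];
	}

	return 0;
}
```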

> +			/* The counter index in the snapshot are relative to the counter base */
> +			pmc->counter_val = kvpmu->sdata->ctr_values[i];
> +		}
> +
>  		if (pmc->cinfo.type == SBI_PMU_CTR_TYPE_FW) {
>  			fevent_code = get_event_code(pmc->event_idx);
>  			if (fevent_code >= SBI_PMU_FW_MAX) {
> @@ -398,14 +486,21 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
>  {
>  	struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
>  	int i, pmc_index, sbiret = 0;
> +	u64 enabled, running;
>  	struct kvm_pmc *pmc;
>  	int fevent_code;
> +	bool snap_flag_set = flags & SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT;
>  
> -	if (kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0) {
> +	if ((kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0)) {

Added unnecessary () here.

>  		sbiret = SBI_ERR_INVALID_PARAM;
>  		goto out;
>  	}
>  
> +	if (snap_flag_set && kvpmu->snapshot_addr == INVALID_GPA) {
> +		sbiret = SBI_ERR_NO_SHMEM;
> +		goto out;
> +	}
> +
>  	/* Stop the counters that have been configured and requested by the guest */
>  	for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
>  		pmc_index = i + ctr_base;
> @@ -438,9 +533,28 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
>  		} else {
>  			sbiret = SBI_ERR_INVALID_PARAM;
>  		}
> +
> +		if (snap_flag_set && !sbiret) {
> +			if (pmc->cinfo.type == SBI_PMU_CTR_TYPE_FW)
> +				pmc->counter_val = kvpmu->fw_event[fevent_code].value;
> +			else if (pmc->perf_event)
> +				pmc->counter_val += perf_event_read_value(pmc->perf_event,
> +									  &enabled, &running);
> +			/* TODO: Add counter overflow support when sscofpmf support is added */
> +			kvpmu->sdata->ctr_values[i] = pmc->counter_val;
> +			kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr, kvpmu->sdata,
> +					     sizeof(struct riscv_pmu_snapshot_data));

Should just set a boolean here saying that the snapshot needs an update
and then do the update outside the for_each_set_bit loop.
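
Again as a userspace model with a simplified snapshot struct. write_guest() stands in for kvm_vcpu_write_guest() and counts its calls so the single write is visible; all names are made up:

```c
#include <stdbool.h>
#include <stdint.h>

#define NCTR	64

/* Simplified stand-in for struct riscv_pmu_snapshot_data. */
struct snapshot {
	uint64_t ctr_overflow_mask;
	uint64_t ctr_values[NCTR];
};

/* Counts simulated guest writes. */
static int guest_writes;

/* Stand-in for kvm_vcpu_write_guest(). */
static void write_guest(struct snapshot *guest, const struct snapshot *src)
{
	*guest = *src;
	guest_writes++;
}

/*
 * Update the local copy inside the loop, remember that something changed,
 * and write the shared memory once after the loop.
 */
static void stop_take_snapshot(struct snapshot *guest, struct snapshot *sdata,
			       uint64_t mask, const uint64_t *counter_val)
{
	bool snapshot_dirty = false;
	int i;

	for (i = 0; i < NCTR; i++) {
		if (!(mask & (UINT64_C(1) << i)))
			continue;
		sdata->ctr_values[i] = counter_val[i];
		snapshot_dirty = true;
	}

	if (snapshot_dirty)
		write_guest(guest, sdata);
}
```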

> +		}
> +
>  		if (flags & SBI_PMU_STOP_FLAG_RESET) {
>  			pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
>  			clear_bit(pmc_index, kvpmu->pmc_in_use);
> +			if (snap_flag_set) {
> +				/* Clear the snapshot area for the upcoming deletion event */
> +				kvpmu->sdata->ctr_values[i] = 0;
> +				kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr, kvpmu->sdata,
> +						     sizeof(struct riscv_pmu_snapshot_data));

The spec isn't clear on this (so we should clarify it), but I'd expect
that a caller who set both the reset and the snapshot flag would want
the snapshot from before the reset when this call completes and then
assume that when they start counting again, and look at the snapshot
again, that those new counts would be from the reset values. Or maybe
not :-) Maybe they want to do a reset and take a snapshot in order to
look at the snapshot and confirm the reset happened? Either way, it
seems we should only do one of the two here. Either update the snapshot
before resetting, and not again after reset, or reset and then update
the snapshot (with no need to update before).

> +			}
>  		}
>  	}
>  
> @@ -566,6 +680,7 @@ void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
>  	kvpmu->num_hw_ctrs = num_hw_ctrs + 1;
>  	kvpmu->num_fw_ctrs = SBI_PMU_FW_MAX;
>  	memset(&kvpmu->fw_event, 0, SBI_PMU_FW_MAX * sizeof(struct kvm_fw_event));
> +	kvpmu->snapshot_addr = INVALID_GPA;
>  
>  	if (kvpmu->num_hw_ctrs > RISCV_KVM_MAX_HW_CTRS) {
>  		pr_warn_once("Limiting the hardware counters to 32 as specified by the ISA");
> @@ -625,6 +740,7 @@ void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
>  	}
>  	bitmap_zero(kvpmu->pmc_in_use, RISCV_MAX_COUNTERS);
>  	memset(&kvpmu->fw_event, 0, SBI_PMU_FW_MAX * sizeof(struct kvm_fw_event));
> +	kvm_pmu_clear_snapshot_area(vcpu);
>  }
>  
>  void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu)
> diff --git a/arch/riscv/kvm/vcpu_sbi_pmu.c b/arch/riscv/kvm/vcpu_sbi_pmu.c
> index b70179e9e875..9f61136e4bb1 100644
> --- a/arch/riscv/kvm/vcpu_sbi_pmu.c
> +++ b/arch/riscv/kvm/vcpu_sbi_pmu.c
> @@ -64,6 +64,9 @@ static int kvm_sbi_ext_pmu_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
>  	case SBI_EXT_PMU_COUNTER_FW_READ:
>  		ret = kvm_riscv_vcpu_pmu_ctr_read(vcpu, cp->a0, retdata);
>  		break;
> +	case SBI_EXT_PMU_SNAPSHOT_SET_SHMEM:
> +		ret = kvm_riscv_vcpu_pmu_setup_snapshot(vcpu, cp->a0, cp->a1, cp->a2, retdata);
> +		break;
>  	default:
>  		retdata->err_val = SBI_ERR_NOT_SUPPORTED;
>  	}
> diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
> index 8de5721e8019..1a22ce1ff8c8 100644
> --- a/drivers/perf/riscv_pmu_sbi.c
> +++ b/drivers/perf/riscv_pmu_sbi.c
> @@ -802,7 +802,7 @@ static noinline void pmu_sbi_start_ovf_ctrs_snapshot(struct cpu_hw_events *cpu_h
>  	struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
>  
>  	for_each_set_bit(idx, cpu_hw_evt->used_hw_ctrs, RISCV_MAX_COUNTERS) {
> -		if (ctr_ovf_mask & (1 << idx)) {
> +		if (ctr_ovf_mask & (BIT(idx))) {
>  			event = cpu_hw_evt->events[idx];
>  			hwc = &event->hw;
>  			max_period = riscv_pmu_ctr_get_width_mask(event);
> -- 
> 2.34.1
>

Thanks,
drew


* Re: [PATCH v4 09/15] RISC-V: KVM: Add perf sampling support for guests
  2024-02-29  1:01 ` [PATCH v4 09/15] RISC-V: KVM: Add perf sampling support for guests Atish Patra
@ 2024-03-02 10:33   ` Andrew Jones
  2024-04-02  8:33     ` Atish Patra
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Jones @ 2024-03-02 10:33 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel, Anup Patel, Albert Ou, Alexandre Ghiti,
	Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Wed, Feb 28, 2024 at 05:01:24PM -0800, Atish Patra wrote:
> KVM enables perf for guest via counter virtualization. However, the
> sampling can not be supported as there is no mechanism to enabled
> trap/emulate scountovf in ISA yet. Rely on the SBI PMU snapshot
> to provide the counter overflow data via the shared memory.
> 
> In case of sampling event, the host first guest the LCOFI interrupt
       
s/guest the LCOFI/sets the guest's LCOFI/

> and injects to the guest via irq filtering mechanism defined in AIA
> specification. Thus, ssaia must be enabled in the host in order to
> use perf sampling in the guest. No other AIA dpeendancy w.r.t kernel

dependency

> is required.
> 
> Reviewed-by: Anup Patel <anup@brainfault.org>
> Signed-off-by: Atish Patra <atishp@rivosinc.com>
> ---
>  arch/riscv/include/asm/csr.h          |  3 +-
>  arch/riscv/include/asm/kvm_vcpu_pmu.h |  3 ++
>  arch/riscv/include/uapi/asm/kvm.h     |  1 +
>  arch/riscv/kvm/aia.c                  |  5 ++
>  arch/riscv/kvm/vcpu.c                 | 14 ++++--
>  arch/riscv/kvm/vcpu_onereg.c          |  9 +++-
>  arch/riscv/kvm/vcpu_pmu.c             | 72 ++++++++++++++++++++++++---
>  7 files changed, 96 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
> index 603e5a3c61f9..c0de2fd6c564 100644
> --- a/arch/riscv/include/asm/csr.h
> +++ b/arch/riscv/include/asm/csr.h
> @@ -168,7 +168,8 @@
>  #define VSIP_TO_HVIP_SHIFT	(IRQ_VS_SOFT - IRQ_S_SOFT)
>  #define VSIP_VALID_MASK		((_AC(1, UL) << IRQ_S_SOFT) | \
>  				 (_AC(1, UL) << IRQ_S_TIMER) | \
> -				 (_AC(1, UL) << IRQ_S_EXT))
> +				 (_AC(1, UL) << IRQ_S_EXT) | \
> +				 (_AC(1, UL) << IRQ_PMU_OVF))
>  
>  /* AIA CSR bits */
>  #define TOPI_IID_SHIFT		16
> diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> index 586bab84be35..8cb21a4f862c 100644
> --- a/arch/riscv/include/asm/kvm_vcpu_pmu.h
> +++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> @@ -36,6 +36,7 @@ struct kvm_pmc {
>  	bool started;
>  	/* Monitoring event ID */
>  	unsigned long event_idx;
> +	struct kvm_vcpu *vcpu;
>  };
>  
>  /* PMU data structure per vcpu */
> @@ -50,6 +51,8 @@ struct kvm_pmu {
>  	bool init_done;
>  	/* Bit map of all the virtual counter used */
>  	DECLARE_BITMAP(pmc_in_use, RISCV_KVM_MAX_COUNTERS);
> +	/* Bit map of all the virtual counter overflown */
> +	DECLARE_BITMAP(pmc_overflown, RISCV_KVM_MAX_COUNTERS);
>  	/* The address of the counter snapshot area (guest physical address) */
>  	gpa_t snapshot_addr;
>  	/* The actual data of the snapshot */
> diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
> index 7499e88a947c..e8b7545f1803 100644
> --- a/arch/riscv/include/uapi/asm/kvm.h
> +++ b/arch/riscv/include/uapi/asm/kvm.h
> @@ -166,6 +166,7 @@ enum KVM_RISCV_ISA_EXT_ID {
>  	KVM_RISCV_ISA_EXT_ZVFH,
>  	KVM_RISCV_ISA_EXT_ZVFHMIN,
>  	KVM_RISCV_ISA_EXT_ZFA,
> +	KVM_RISCV_ISA_EXT_SSCOFPMF,
>  	KVM_RISCV_ISA_EXT_MAX,
>  };
>  
> diff --git a/arch/riscv/kvm/aia.c b/arch/riscv/kvm/aia.c
> index a944294f6f23..0f0a9d11bb5f 100644
> --- a/arch/riscv/kvm/aia.c
> +++ b/arch/riscv/kvm/aia.c
> @@ -545,6 +545,9 @@ void kvm_riscv_aia_enable(void)
>  	enable_percpu_irq(hgei_parent_irq,
>  			  irq_get_trigger_type(hgei_parent_irq));
>  	csr_set(CSR_HIE, BIT(IRQ_S_GEXT));
> +	/* Enable IRQ filtering for overflow interrupt only if sscofpmf is present */
> +	if (__riscv_isa_extension_available(NULL, RISCV_ISA_EXT_SSCOFPMF))
> +		csr_write(CSR_HVIEN, BIT(IRQ_PMU_OVF));
>  }
>  
>  void kvm_riscv_aia_disable(void)
> @@ -558,6 +561,8 @@ void kvm_riscv_aia_disable(void)
>  		return;
>  	hgctrl = get_cpu_ptr(&aia_hgei);
>  
> +	if (__riscv_isa_extension_available(NULL, RISCV_ISA_EXT_SSCOFPMF))
> +		csr_clear(CSR_HVIEN, BIT(IRQ_PMU_OVF));
>  	/* Disable per-CPU SGEI interrupt */
>  	csr_clear(CSR_HIE, BIT(IRQ_S_GEXT));
>  	disable_percpu_irq(hgei_parent_irq);
> diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
> index b5ca9f2e98ac..fcd8ad4de4d2 100644
> --- a/arch/riscv/kvm/vcpu.c
> +++ b/arch/riscv/kvm/vcpu.c
> @@ -365,6 +365,12 @@ void kvm_riscv_vcpu_sync_interrupts(struct kvm_vcpu *vcpu)
>  		}
>  	}
>  
> +	/* Sync up the HVIP.LCOFIP bit changes (only clear) by the guest */
> +	if ((csr->hvip ^ hvip) & (1UL << IRQ_PMU_OVF)) {
> +		if (!test_and_set_bit(IRQ_PMU_OVF, v->irqs_pending_mask))
> +			clear_bit(IRQ_PMU_OVF, v->irqs_pending);
> +	}
> +
>  	/* Sync-up AIA high interrupts */
>  	kvm_riscv_vcpu_aia_sync_interrupts(vcpu);
>  
> @@ -382,7 +388,8 @@ int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
>  	if (irq < IRQ_LOCAL_MAX &&
>  	    irq != IRQ_VS_SOFT &&
>  	    irq != IRQ_VS_TIMER &&
> -	    irq != IRQ_VS_EXT)
> +	    irq != IRQ_VS_EXT &&
> +	    irq != IRQ_PMU_OVF)
>  		return -EINVAL;
>  
>  	set_bit(irq, vcpu->arch.irqs_pending);
> @@ -397,14 +404,15 @@ int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
>  int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
>  {
>  	/*
> -	 * We only allow VS-mode software, timer, and external
> +	 * We only allow VS-mode software, timer, counter overflow and external
>  	 * interrupts when irq is one of the local interrupts
>  	 * defined by RISC-V privilege specification.
>  	 */
>  	if (irq < IRQ_LOCAL_MAX &&
>  	    irq != IRQ_VS_SOFT &&
>  	    irq != IRQ_VS_TIMER &&
> -	    irq != IRQ_VS_EXT)
> +	    irq != IRQ_VS_EXT &&
> +	    irq != IRQ_PMU_OVF)
>  		return -EINVAL;
>  
>  	clear_bit(irq, vcpu->arch.irqs_pending);
> diff --git a/arch/riscv/kvm/vcpu_onereg.c b/arch/riscv/kvm/vcpu_onereg.c
> index 5f7355e96008..a072910820c2 100644
> --- a/arch/riscv/kvm/vcpu_onereg.c
> +++ b/arch/riscv/kvm/vcpu_onereg.c
> @@ -36,6 +36,7 @@ static const unsigned long kvm_isa_ext_arr[] = {
>  	/* Multi letter extensions (alphabetically sorted) */
>  	KVM_ISA_EXT_ARR(SMSTATEEN),
>  	KVM_ISA_EXT_ARR(SSAIA),
> +	KVM_ISA_EXT_ARR(SSCOFPMF),
>  	KVM_ISA_EXT_ARR(SSTC),
>  	KVM_ISA_EXT_ARR(SVINVAL),
>  	KVM_ISA_EXT_ARR(SVNAPOT),
> @@ -115,6 +116,7 @@ static bool kvm_riscv_vcpu_isa_disable_allowed(unsigned long ext)
>  	case KVM_RISCV_ISA_EXT_I:
>  	case KVM_RISCV_ISA_EXT_M:
>  	case KVM_RISCV_ISA_EXT_SSTC:
> +	case KVM_RISCV_ISA_EXT_SSCOFPMF:

It should go above SSTC to keep the list alphabetical.

That said, it should be possible for the VMM to disable this extension in
the guest. We just need to change all the checks in KVM of the host's ISA
for RISCV_ISA_EXT_SSCOFPMF to check the guest's ISA instead. Maybe it's
not worth it, though, if the guest PMU isn't useful without overflow.
But sometimes it's nice to be able to disable stuff for debugging and
workarounds.
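
To make the host-vs-guest distinction concrete, here is a minimal userspace
sketch. Everything in it (the enum, host_isa, struct vcpu_model and the
helpers) is a hypothetical stand-in rather than KVM's actual API. The only
point is that gating behavior on the per-vCPU bitmap lets a VMM turn
Sscofpmf off for one guest while the host still has it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical extension IDs; real KVM uses enum KVM_RISCV_ISA_EXT_ID. */
enum ext_id { EXT_SSAIA, EXT_SSCOFPMF, EXT_SSTC, EXT_MAX };

/* Stand-in for the host's ISA bitmap (assumed to have all three). */
static uint64_t host_isa = (1ULL << EXT_SSAIA) |
			   (1ULL << EXT_SSCOFPMF) |
			   (1ULL << EXT_SSTC);

/* Per-guest state: an ISA bitmap that is seeded from the host's. */
struct vcpu_model {
	uint64_t isa;
};

static bool host_has_ext(enum ext_id ext)
{
	return (host_isa & (1ULL << ext)) != 0;
}

static bool guest_has_ext(const struct vcpu_model *vcpu, enum ext_id ext)
{
	return (vcpu->isa & (1ULL << ext)) != 0;
}

/* What a VMM-initiated disable amounts to in this model. */
static void guest_disable_ext(struct vcpu_model *vcpu, enum ext_id ext)
{
	vcpu->isa &= ~(1ULL << ext);
}

/* Overflow delivery is gated on the guest's view, not the host's. */
static bool should_inject_overflow(const struct vcpu_model *vcpu)
{
	return guest_has_ext(vcpu, EXT_SSCOFPMF);
}
```

With the check written against the guest bitmap, disabling the extension
for one vCPU changes should_inject_overflow() for that vCPU only, and
host_has_ext() keeps returning true.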

>  	case KVM_RISCV_ISA_EXT_SVINVAL:
>  	case KVM_RISCV_ISA_EXT_SVNAPOT:
>  	case KVM_RISCV_ISA_EXT_ZBA:
> @@ -171,8 +173,13 @@ void kvm_riscv_vcpu_setup_isa(struct kvm_vcpu *vcpu)
>  	for (i = 0; i < ARRAY_SIZE(kvm_isa_ext_arr); i++) {
>  		host_isa = kvm_isa_ext_arr[i];
>  		if (__riscv_isa_extension_available(NULL, host_isa) &&
> -		    kvm_riscv_vcpu_isa_enable_allowed(i))
> +		    kvm_riscv_vcpu_isa_enable_allowed(i)) {
> +			/* Sscofpmf depends on interrupt filtering defined in ssaia */
> +			if (host_isa == RISCV_ISA_EXT_SSCOFPMF &&
> +			    !__riscv_isa_extension_available(NULL, RISCV_ISA_EXT_SSAIA))
> +				continue;

We shouldn't need to change kvm_riscv_vcpu_setup_isa(). We just need to
add a case for KVM_RISCV_ISA_EXT_SSCOFPMF to
kvm_riscv_vcpu_isa_enable_allowed().
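
Roughly what I have in mind, as a standalone model rather than a patch.
The enum values and the host_has_ssaia flag are made up for illustration;
in KVM the flag would be the __riscv_isa_extension_available() check for
RISCV_ISA_EXT_SSAIA that the hunk above currently open-codes.

```c
#include <stdbool.h>

/* Hypothetical IDs standing in for enum KVM_RISCV_ISA_EXT_ID values. */
enum isa_ext { ISA_EXT_SSAIA, ISA_EXT_SSCOFPMF, ISA_EXT_SSTC };

/* Stand-in for the host Ssaia availability check. */
static bool host_has_ssaia = true;

/* Model of an enable_allowed() hook: one case per special extension. */
static bool isa_enable_allowed(enum isa_ext ext)
{
	switch (ext) {
	case ISA_EXT_SSCOFPMF:
		/* Sscofpmf depends on interrupt filtering defined in Ssaia. */
		return host_has_ssaia;
	default:
		break;
	}

	/* Everything else may be enabled in this model. */
	return true;
}
```

That keeps the dependency in one place, and the setup loop stays a plain
"available on host && enable allowed" test.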

>  			set_bit(host_isa, vcpu->arch.isa);
> +		}
>  	}
>  }
>  
> diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> index 74865e6050a1..a02f7b981005 100644
> --- a/arch/riscv/kvm/vcpu_pmu.c
> +++ b/arch/riscv/kvm/vcpu_pmu.c
> @@ -39,7 +39,7 @@ static u64 kvm_pmu_get_sample_period(struct kvm_pmc *pmc)
>  	u64 sample_period;
>  
>  	if (!pmc->counter_val)
> -		sample_period = counter_val_mask + 1;
> +		sample_period = counter_val_mask;

This change looks unrelated.
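
For anyone trying to judge what the hunk does, here is a small userspace
sketch of the arithmetic. It assumes the mask is a contiguous low-bit mask
for a counter of a given width; width_mask(), period_old() and
period_new() are names invented for this example. It only shows the
numeric effect and says nothing about why the change was made.

```c
#include <stdint.h>

/* Mask with the low `bits` bits set; bits is expected to be in 1..64. */
static uint64_t width_mask(unsigned int bits)
{
	return bits >= 64 ? UINT64_MAX : (1ULL << bits) - 1;
}

/* Behaviour of the '-' line: mask + 1 when the counter value is zero. */
static uint64_t period_old(uint64_t counter_val, uint64_t mask)
{
	if (!counter_val)
		return mask + 1;	/* wraps to 0 for a full 64-bit mask */
	return (-counter_val) & mask;
}

/* Behaviour of the '+' line: the mask itself when the counter value is zero. */
static uint64_t period_new(uint64_t counter_val, uint64_t mask)
{
	if (!counter_val)
		return mask;
	return (-counter_val) & mask;
}
```

The two only differ for a zero counter value. With a 48-bit mask the old
form gives 2^48 and the new one 2^48 - 1; with a full 64-bit mask the old
form wraps around to 0.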

>  	else
>  		sample_period = (-pmc->counter_val) & counter_val_mask;
>  
> @@ -229,6 +229,47 @@ static int kvm_pmu_validate_counter_mask(struct kvm_pmu *kvpmu, unsigned long ct
>  	return 0;
>  }
>  
> +static void kvm_riscv_pmu_overflow(struct perf_event *perf_event,
> +				   struct perf_sample_data *data,
> +				   struct pt_regs *regs)
> +{
> +	struct kvm_pmc *pmc = perf_event->overflow_handler_context;
> +	struct kvm_vcpu *vcpu = pmc->vcpu;
> +	struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> +	struct riscv_pmu *rpmu = to_riscv_pmu(perf_event->pmu);
> +	u64 period;
> +
> +	/*
> +	 * Stop the event counting by directly accessing the perf_event.
> +	 * Otherwise, this needs to be deferred via a workqueue.
> +	 * That will introduce skew in the counter value because the actual
> +	 * physical counter would start after returning from this function.
> +	 * It will be stopped again once the workqueue is scheduled
> +	 */
> +	rpmu->pmu.stop(perf_event, PERF_EF_UPDATE);
> +
> +	/*
> +	 * The hw counter would start automatically when this function returns.
> +	 * Thus, the host may continue to interrupt and inject it to the guest
> +	 * even without the guest configuring the next event. Depending on the hardware
> +	 * the host may have some sluggishness only if privilege mode filtering is not
> +	 * available. In an ideal world, where qemu is not the only capable hardware,
> +	 * this can be removed.
> +	 * FYI: ARM64 does this way while x86 doesn't do anything as such.
> +	 * TODO: Should we keep it for RISC-V ?
> +	 */
> +	period = -(local64_read(&perf_event->count));
> +
> +	local64_set(&perf_event->hw.period_left, 0);
> +	perf_event->attr.sample_period = period;
> +	perf_event->hw.sample_period = period;
> +
> +	set_bit(pmc->idx, kvpmu->pmc_overflown);
> +	kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_PMU_OVF);
> +
> +	rpmu->pmu.start(perf_event, PERF_EF_RELOAD);
> +}
> +
>  static long kvm_pmu_create_perf_event(struct kvm_pmc *pmc, struct perf_event_attr *attr,
>  				      unsigned long flags, unsigned long eidx,
>  				      unsigned long evtdata)
> @@ -248,7 +289,7 @@ static long kvm_pmu_create_perf_event(struct kvm_pmc *pmc, struct perf_event_att
>  	 */
>  	attr->sample_period = kvm_pmu_get_sample_period(pmc);
>  
> -	event = perf_event_create_kernel_counter(attr, -1, current, NULL, pmc);
> +	event = perf_event_create_kernel_counter(attr, -1, current, kvm_riscv_pmu_overflow, pmc);
>  	if (IS_ERR(event)) {
>  		pr_err("kvm pmu event creation failed for eidx %lx: %ld\n", eidx, PTR_ERR(event));
>  		return PTR_ERR(event);
> @@ -436,6 +477,8 @@ int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
>  		pmc_index = i + ctr_base;
>  		if (!test_bit(pmc_index, kvpmu->pmc_in_use))
>  			continue;
> +		/* The guest started the counter again. Reset the overflow status */
> +		clear_bit(pmc_index, kvpmu->pmc_overflown);
>  		pmc = &kvpmu->pmc[pmc_index];
>  		if (flags & SBI_PMU_START_FLAG_SET_INIT_VALUE) {
>  			pmc->counter_val = ival;
> @@ -474,6 +517,10 @@ int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
>  		}
>  	}
>  
> +	/* The guest has serviced the interrupt and is starting the counter again */
> +	if (test_bit(IRQ_PMU_OVF, vcpu->arch.irqs_pending))
> +		kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_PMU_OVF);
> +
>  out:
>  	retdata->err_val = sbiret;
>  
> @@ -540,7 +587,13 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
>  			else if (pmc->perf_event)
>  				pmc->counter_val += perf_event_read_value(pmc->perf_event,
>  									  &enabled, &running);
> -			/* TODO: Add counter overflow support when sscofpmf support is added */
> +			/*
> +			 * The counter and overflow indices in the snapshot region are w.r.t.
> +			 * cbase. Modify the set bit in the counter mask instead of the pmc_index
> +			 * which indicates the absolute counter index.
> +			 */
> +			if (test_bit(pmc_index, kvpmu->pmc_overflown))
> +				kvpmu->sdata->ctr_overflow_mask |= (1UL << i);

Just in case you missed this one; BIT()
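
A quick userspace sketch of the idiom, for reference. BIT64() below is a
stand-in defined for this example, not the kernel macro. The hunks quoted
here already use 1UL, so for them BIT() is mostly about consistency; the
type matters for the int-typed "1 << idx" form replaced earlier in the
thread, which stops being well-defined for high counter indices.

```c
#include <stdint.h>

/* Userspace stand-in for a BIT()-style helper with a 64-bit operand. */
#define BIT64(nr)	(UINT64_C(1) << (nr))

/* Set the overflow bit for counter idx (idx is expected to be 0..63). */
static uint64_t mark_overflow(uint64_t mask, unsigned int idx)
{
	return mask | BIT64(idx);
}

/* Clear the overflow bit for counter idx, leaving the others untouched. */
static uint64_t clear_overflow(uint64_t mask, unsigned int idx)
{
	return mask & ~BIT64(idx);
}
```

Because the constant is 64 bits wide before the shift, index 40 or 63
behaves the same as index 3.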

>  			kvpmu->sdata->ctr_values[i] = pmc->counter_val;
>  			kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr, kvpmu->sdata,
>  					     sizeof(struct riscv_pmu_snapshot_data));
> @@ -549,15 +602,20 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
>  		if (flags & SBI_PMU_STOP_FLAG_RESET) {
>  			pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
>  			clear_bit(pmc_index, kvpmu->pmc_in_use);
> +			clear_bit(pmc_index, kvpmu->pmc_overflown);
>  			if (snap_flag_set) {
>  				/* Clear the snapshot area for the upcoming deletion event */
>  				kvpmu->sdata->ctr_values[i] = 0;
> +				/*
> +				 * Only clear the given counter as the caller is responsible to
> +				 * validate both the overflow mask and configured counters.
> +				 */
> +				kvpmu->sdata->ctr_overflow_mask &= ~(1UL << i);

And another BIT()

>  				kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr, kvpmu->sdata,
>  						     sizeof(struct riscv_pmu_snapshot_data));
>  			}
>  		}
>  	}
> -
>  out:
>  	retdata->err_val = sbiret;
>  
> @@ -700,6 +758,7 @@ void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
>  		pmc = &kvpmu->pmc[i];
>  		pmc->idx = i;
>  		pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
> +		pmc->vcpu = vcpu;
>  		if (i < kvpmu->num_hw_ctrs) {
>  			pmc->cinfo.type = SBI_PMU_CTR_TYPE_HW;
>  			if (i < 3)
> @@ -732,13 +791,14 @@ void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
>  	if (!kvpmu)
>  		return;
>  
> -	for_each_set_bit(i, kvpmu->pmc_in_use, RISCV_MAX_COUNTERS) {
> +	for_each_set_bit(i, kvpmu->pmc_in_use, RISCV_KVM_MAX_COUNTERS) {
>  		pmc = &kvpmu->pmc[i];
>  		pmc->counter_val = 0;
>  		kvm_pmu_release_perf_event(pmc);
>  		pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
>  	}
> -	bitmap_zero(kvpmu->pmc_in_use, RISCV_MAX_COUNTERS);
> +	bitmap_zero(kvpmu->pmc_in_use, RISCV_KVM_MAX_COUNTERS);

Ideally the RISCV_MAX_COUNTERS change would go in a separate patch,
but 64 == 64, so OK.

> +	bitmap_zero(kvpmu->pmc_overflown, RISCV_KVM_MAX_COUNTERS);
>  	memset(&kvpmu->fw_event, 0, SBI_PMU_FW_MAX * sizeof(struct kvm_fw_event));
>  	kvm_pmu_clear_snapshot_area(vcpu);
>  }
> -- 
> 2.34.1
> 

Thanks,
drew

* Re: [PATCH v4 10/15] RISC-V: KVM: Support 64 bit firmware counters on RV32
  2024-02-29  1:01 ` [PATCH v4 10/15] RISC-V: KVM: Support 64 bit firmware counters on RV32 Atish Patra
@ 2024-03-02 10:52   ` Andrew Jones
  2024-04-02  0:03     ` Atish Patra
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Jones @ 2024-03-02 10:52 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel, Anup Patel, Albert Ou, Alexandre Ghiti,
	Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Wed, Feb 28, 2024 at 05:01:25PM -0800, Atish Patra wrote:
> The SBI v2.0 introduced a fw_read_hi function to read 64 bit firmware
> counters for RV32 based systems.
> 
> Add infrastructure to support that.
> 
> Reviewed-by: Anup Patel <anup@brainfault.org>
> Signed-off-by: Atish Patra <atishp@rivosinc.com>
> ---
>  arch/riscv/include/asm/kvm_vcpu_pmu.h |  4 ++-
>  arch/riscv/kvm/vcpu_pmu.c             | 37 ++++++++++++++++++++++++++-
>  arch/riscv/kvm/vcpu_sbi_pmu.c         |  6 +++++
>  3 files changed, 45 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> index 8cb21a4f862c..e0ad27dea46c 100644
> --- a/arch/riscv/include/asm/kvm_vcpu_pmu.h
> +++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> @@ -20,7 +20,7 @@ static_assert(RISCV_KVM_MAX_COUNTERS <= 64);
>  
>  struct kvm_fw_event {
>  	/* Current value of the event */
> -	unsigned long value;
> +	u64 value;
>  
>  	/* Event monitoring status */
>  	bool started;
> @@ -91,6 +91,8 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
>  				     struct kvm_vcpu_sbi_return *retdata);
>  int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
>  				struct kvm_vcpu_sbi_return *retdata);
> +int kvm_riscv_vcpu_pmu_fw_ctr_read_hi(struct kvm_vcpu *vcpu, unsigned long cidx,
> +				      struct kvm_vcpu_sbi_return *retdata);
>  void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu);
>  int kvm_riscv_vcpu_pmu_setup_snapshot(struct kvm_vcpu *vcpu, unsigned long saddr_low,
>  				      unsigned long saddr_high, unsigned long flags,
> diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> index a02f7b981005..469bb430cf97 100644
> --- a/arch/riscv/kvm/vcpu_pmu.c
> +++ b/arch/riscv/kvm/vcpu_pmu.c
> @@ -196,6 +196,29 @@ static int pmu_get_pmc_index(struct kvm_pmu *pmu, unsigned long eidx,
>  	return kvm_pmu_get_programmable_pmc_index(pmu, eidx, cbase, cmask);
>  }
>  
> +static int pmu_fw_ctr_read_hi(struct kvm_vcpu *vcpu, unsigned long cidx,
> +			      unsigned long *out_val)
> +{
> +	struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> +	struct kvm_pmc *pmc;
> +	int fevent_code;
> +
> +	if (!IS_ENABLED(CONFIG_32BIT))

Let's remove the CONFIG_32BIT check in kvm_sbi_ext_pmu_handler() and then
set *out_val to zero here and return success. Either that, or we should
WARN or something here since it's a KVM bug to get here with
!CONFIG_32BIT.
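
The first option would look something like this standalone model. The
rv32 parameter stands in for IS_ENABLED(CONFIG_32BIT) and the function
name is made up; the real function would of course look the value up from
the vcpu rather than take it as an argument.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of a "read high half" helper that is safe to call on any build.
 * Returns 0 on success and stores the upper 32 bits of the 64-bit
 * firmware counter in *out_val.
 */
static int fw_ctr_read_hi_model(bool rv32, uint64_t counter_val,
				unsigned long *out_val)
{
	if (!rv32) {
		/* The regular read already returned the full value. */
		*out_val = 0;
		return 0;
	}

	*out_val = (unsigned long)(counter_val >> 32);
	return 0;
}
```

With that shape the caller no longer needs its own CONFIG_32BIT branch,
since the non-RV32 case is handled in exactly one place.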

> +		return -EINVAL;
> +
> +	pmc = &kvpmu->pmc[cidx];

Uh oh! We're missing range validation of cidx! And I see we're missing it
in pmu_ctr_read() too. We need the same check we have in
kvm_riscv_vcpu_pmu_ctr_info(). I think the other SBI functions are OK,
but it's worth a triple check.
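
As a sketch of the kind of check I mean (a userspace model, not the
actual kvm_riscv_vcpu_pmu_ctr_info() code, whose exact bound is not
reproduced here): MODEL_MAX_COUNTERS stands in for
RISCV_KVM_MAX_COUNTERS, and the struct and function names are
hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_MAX_COUNTERS 64	/* stand-in for RISCV_KVM_MAX_COUNTERS */

struct pmc_model {
	uint64_t counter_val;
};

struct pmu_model {
	struct pmc_model pmc[MODEL_MAX_COUNTERS];
};

/* Read a counter by index, rejecting indices outside the array. */
static int ctr_read_checked(const struct pmu_model *pmu, unsigned long cidx,
			    uint64_t *out_val)
{
	/* cidx comes straight from a guest register: bound it before indexing. */
	if (cidx >= MODEL_MAX_COUNTERS)
		return -EINVAL;

	*out_val = pmu->pmc[cidx].counter_val;
	return 0;
}
```

The -EINVAL then maps to SBI_ERR_INVALID_PARAM in the wrapper the same
way the existing error path does.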

> +
> +	if (pmc->cinfo.type != SBI_PMU_CTR_TYPE_FW)
> +		return -EINVAL;
> +
> +	fevent_code = get_event_code(pmc->event_idx);
> +	pmc->counter_val = kvpmu->fw_event[fevent_code].value;
> +
> +	*out_val = pmc->counter_val >> 32;
> +
> +	return 0;
> +}
> +
>  static int pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
>  			unsigned long *out_val)
>  {
> @@ -702,6 +725,18 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
>  	return 0;
>  }
>  
> +int kvm_riscv_vcpu_pmu_fw_ctr_read_hi(struct kvm_vcpu *vcpu, unsigned long cidx,
> +				      struct kvm_vcpu_sbi_return *retdata)
> +{
> +	int ret;
> +
> +	ret = pmu_fw_ctr_read_hi(vcpu, cidx, &retdata->out_val);
> +	if (ret == -EINVAL)
> +		retdata->err_val = SBI_ERR_INVALID_PARAM;
> +
> +	return 0;

I see this follows the pattern we have with kvm_riscv_vcpu_pmu_ctr_read
and pmu_ctr_read, but I wonder if we really need the
kvm_riscv_vcpu_pmu_ctr_read() and kvm_riscv_vcpu_pmu_fw_ctr_read_hi()
wrapper functions?

> +}
> +
>  int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
>  				struct kvm_vcpu_sbi_return *retdata)
>  {
> @@ -775,7 +810,7 @@ void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
>  			pmc->cinfo.csr = CSR_CYCLE + i;
>  		} else {
>  			pmc->cinfo.type = SBI_PMU_CTR_TYPE_FW;
> -			pmc->cinfo.width = BITS_PER_LONG - 1;
> +			pmc->cinfo.width = 63;
>  		}
>  	}
>  
> diff --git a/arch/riscv/kvm/vcpu_sbi_pmu.c b/arch/riscv/kvm/vcpu_sbi_pmu.c
> index 9f61136e4bb1..58a0e5587e2a 100644
> --- a/arch/riscv/kvm/vcpu_sbi_pmu.c
> +++ b/arch/riscv/kvm/vcpu_sbi_pmu.c
> @@ -64,6 +64,12 @@ static int kvm_sbi_ext_pmu_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
>  	case SBI_EXT_PMU_COUNTER_FW_READ:
>  		ret = kvm_riscv_vcpu_pmu_ctr_read(vcpu, cp->a0, retdata);
>  		break;
> +	case SBI_EXT_PMU_COUNTER_FW_READ_HI:
> +		if (IS_ENABLED(CONFIG_32BIT))
> +			ret = kvm_riscv_vcpu_pmu_fw_ctr_read_hi(vcpu, cp->a0, retdata);
> +		else
> +			retdata->out_val = 0;
> +		break;
>  	case SBI_EXT_PMU_SNAPSHOT_SET_SHMEM:
>  		ret = kvm_riscv_vcpu_pmu_setup_snapshot(vcpu, cp->a0, cp->a1, cp->a2, retdata);
>  		break;
> -- 
> 2.34.1
> 

Thanks,
drew

* Re: [PATCH v4 11/15] KVM: riscv: selftests: Add Sscofpmf to get-reg-list test
  2024-02-29  1:01 ` [PATCH v4 11/15] KVM: riscv: selftests: Add Sscofpmf to get-reg-list test Atish Patra
  2024-03-01  4:42   ` Anup Patel
@ 2024-03-02 10:52   ` Andrew Jones
  1 sibling, 0 replies; 56+ messages in thread
From: Andrew Jones @ 2024-03-02 10:52 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel, Albert Ou, Alexandre Ghiti, Anup Patel,
	Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Wed, Feb 28, 2024 at 05:01:26PM -0800, Atish Patra wrote:
> The KVM RISC-V allows Sscofpmf extension for Guest/VM so let us
> add this extension to get-reg-list test.
> 
> Signed-off-by: Atish Patra <atishp@rivosinc.com>
> ---
>  tools/testing/selftests/kvm/riscv/get-reg-list.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/tools/testing/selftests/kvm/riscv/get-reg-list.c b/tools/testing/selftests/kvm/riscv/get-reg-list.c
> index 8cece02ca23a..ca6d98a5dce5 100644
> --- a/tools/testing/selftests/kvm/riscv/get-reg-list.c
> +++ b/tools/testing/selftests/kvm/riscv/get-reg-list.c
> @@ -43,6 +43,7 @@ bool filter_reg(__u64 reg)
>  	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_V:
>  	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SMSTATEEN:
>  	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SSAIA:
> +	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SSCOFPMF:
>  	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SSTC:
>  	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SVINVAL:
>  	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SVNAPOT:
> @@ -406,6 +407,7 @@ static const char *isa_ext_single_id_to_str(__u64 reg_off)
>  		KVM_ISA_EXT_ARR(V),
>  		KVM_ISA_EXT_ARR(SMSTATEEN),
>  		KVM_ISA_EXT_ARR(SSAIA),
> +		KVM_ISA_EXT_ARR(SSCOFPMF),
>  		KVM_ISA_EXT_ARR(SSTC),
>  		KVM_ISA_EXT_ARR(SVINVAL),
>  		KVM_ISA_EXT_ARR(SVNAPOT),
> @@ -927,6 +929,7 @@ KVM_ISA_EXT_SUBLIST_CONFIG(fp_f, FP_F);
>  KVM_ISA_EXT_SUBLIST_CONFIG(fp_d, FP_D);
>  KVM_ISA_EXT_SIMPLE_CONFIG(h, H);
>  KVM_ISA_EXT_SUBLIST_CONFIG(smstateen, SMSTATEEN);
> +KVM_ISA_EXT_SIMPLE_CONFIG(sscofpmf, SSCOFPMF);
>  KVM_ISA_EXT_SIMPLE_CONFIG(sstc, SSTC);
>  KVM_ISA_EXT_SIMPLE_CONFIG(svinval, SVINVAL);
>  KVM_ISA_EXT_SIMPLE_CONFIG(svnapot, SVNAPOT);
> @@ -980,6 +983,7 @@ struct vcpu_reg_list *vcpu_configs[] = {
>  	&config_fp_d,
>  	&config_h,
>  	&config_smstateen,
> +	&config_sscofpmf,
>  	&config_sstc,
>  	&config_svinval,
>  	&config_svnapot,
> -- 
> 2.34.1
>

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>

* Re: [PATCH v4 12/15] KVM: riscv: selftests: Add SBI PMU extension definitions
  2024-02-29  1:01 ` [PATCH v4 12/15] KVM: riscv: selftests: Add SBI PMU extension definitions Atish Patra
  2024-03-01  4:43   ` Anup Patel
@ 2024-03-02 11:00   ` Andrew Jones
  2024-04-02  8:43     ` Atish Patra
  1 sibling, 1 reply; 56+ messages in thread
From: Andrew Jones @ 2024-03-02 11:00 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel, Albert Ou, Alexandre Ghiti, Anup Patel,
	Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Wed, Feb 28, 2024 at 05:01:27PM -0800, Atish Patra wrote:
> The SBI PMU extension definition is required for upcoming SBI PMU
> selftests.
> 
> Signed-off-by: Atish Patra <atishp@rivosinc.com>
> ---
>  .../selftests/kvm/include/riscv/processor.h   | 67 +++++++++++++++++++
>  1 file changed, 67 insertions(+)
> 
> diff --git a/tools/testing/selftests/kvm/include/riscv/processor.h b/tools/testing/selftests/kvm/include/riscv/processor.h
> index f75c381fa35a..a49a39c8e8d4 100644
> --- a/tools/testing/selftests/kvm/include/riscv/processor.h
> +++ b/tools/testing/selftests/kvm/include/riscv/processor.h

We should probably create a new header (include/riscv/sbi.h), since
otherwise processor.h is very quickly going to look like an SBI
header with a few non-SBI things in it. Can we add a patch prior to
this one that moves the SBI stuff we currently have in processor.h
out to an sbi.h? Or, we could start synchronizing a copy of
arch/riscv/include/asm/sbi.h in tools/arch/riscv/include/asm, like
we've done for csr.h.

> @@ -169,17 +169,84 @@ void vm_install_exception_handler(struct kvm_vm *vm, int vector, exception_handl
>  enum sbi_ext_id {
>  	SBI_EXT_BASE = 0x10,
>  	SBI_EXT_STA = 0x535441,
> +	SBI_EXT_PMU = 0x504D55,
>  };
>  
>  enum sbi_ext_base_fid {
>  	SBI_EXT_BASE_PROBE_EXT = 3,
>  };
>  
> +enum sbi_ext_pmu_fid {
> +	SBI_EXT_PMU_NUM_COUNTERS = 0,
> +	SBI_EXT_PMU_COUNTER_GET_INFO,
> +	SBI_EXT_PMU_COUNTER_CFG_MATCH,
> +	SBI_EXT_PMU_COUNTER_START,
> +	SBI_EXT_PMU_COUNTER_STOP,
> +	SBI_EXT_PMU_COUNTER_FW_READ,
> +	SBI_EXT_PMU_COUNTER_FW_READ_HI,
> +	SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
> +};
> +
> +union sbi_pmu_ctr_info {
> +	unsigned long value;
> +	struct {
> +		unsigned long csr:12;
> +		unsigned long width:6;
> +#if __riscv_xlen == 32
> +		unsigned long reserved:13;
> +#else
> +		unsigned long reserved:45;
> +#endif
> +		unsigned long type:1;
> +	};
> +};
> +
>  struct sbiret {
>  	long error;
>  	long value;
>  };
>  
> +/** General pmu event codes specified in SBI PMU extension */
> +enum sbi_pmu_hw_generic_events_t {
> +	SBI_PMU_HW_NO_EVENT			= 0,
> +	SBI_PMU_HW_CPU_CYCLES			= 1,
> +	SBI_PMU_HW_INSTRUCTIONS			= 2,
> +	SBI_PMU_HW_CACHE_REFERENCES		= 3,
> +	SBI_PMU_HW_CACHE_MISSES			= 4,
> +	SBI_PMU_HW_BRANCH_INSTRUCTIONS		= 5,
> +	SBI_PMU_HW_BRANCH_MISSES		= 6,
> +	SBI_PMU_HW_BUS_CYCLES			= 7,
> +	SBI_PMU_HW_STALLED_CYCLES_FRONTEND	= 8,
> +	SBI_PMU_HW_STALLED_CYCLES_BACKEND	= 9,
> +	SBI_PMU_HW_REF_CPU_CYCLES		= 10,
> +
> +	SBI_PMU_HW_GENERAL_MAX,
> +};
> +
> +/* SBI PMU counter types */
> +enum sbi_pmu_ctr_type {
> +	SBI_PMU_CTR_TYPE_HW = 0x0,
> +	SBI_PMU_CTR_TYPE_FW,
> +};
> +
> +/* Flags defined for config matching function */
> +#define SBI_PMU_CFG_FLAG_SKIP_MATCH	(1 << 0)
> +#define SBI_PMU_CFG_FLAG_CLEAR_VALUE	(1 << 1)
> +#define SBI_PMU_CFG_FLAG_AUTO_START	(1 << 2)
> +#define SBI_PMU_CFG_FLAG_SET_VUINH	(1 << 3)
> +#define SBI_PMU_CFG_FLAG_SET_VSINH	(1 << 4)
> +#define SBI_PMU_CFG_FLAG_SET_UINH	(1 << 5)
> +#define SBI_PMU_CFG_FLAG_SET_SINH	(1 << 6)
> +#define SBI_PMU_CFG_FLAG_SET_MINH	(1 << 7)
> +
> +/* Flags defined for counter start function */
> +#define SBI_PMU_START_FLAG_SET_INIT_VALUE (1 << 0)
> +#define SBI_PMU_START_FLAG_INIT_FROM_SNAPSHOT BIT(1)
> +
> +/* Flags defined for counter stop function */
> +#define SBI_PMU_STOP_FLAG_RESET (1 << 0)
> +#define SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT BIT(1)

When changing shifts to BIT()s, don't forget these (easy not to forget
if we go with the approach of syncing sbi.h to tools).

> +
>  struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
>  			unsigned long arg1, unsigned long arg2,
>  			unsigned long arg3, unsigned long arg4,
> -- 
> 2.34.1
>

Thanks,
drew

* Re: [PATCH v4 13/15] KVM: riscv: selftests: Add SBI PMU selftest
  2024-02-29  1:01 ` [PATCH v4 13/15] KVM: riscv: selftests: Add SBI PMU selftest Atish Patra
  2024-03-01  4:47   ` Anup Patel
@ 2024-03-02 11:52   ` Andrew Jones
  2024-04-02  8:34     ` Atish Patra
  1 sibling, 1 reply; 56+ messages in thread
From: Andrew Jones @ 2024-03-02 11:52 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel, Albert Ou, Alexandre Ghiti, Anup Patel,
	Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Wed, Feb 28, 2024 at 05:01:28PM -0800, Atish Patra wrote:
> This test implements basic sanity test and cycle/instret event
> counting tests.
> 
> Signed-off-by: Atish Patra <atishp@rivosinc.com>
> ---
>  tools/testing/selftests/kvm/Makefile        |   1 +
>  tools/testing/selftests/kvm/riscv/sbi_pmu.c | 340 ++++++++++++++++++++
>  2 files changed, 341 insertions(+)
>  create mode 100644 tools/testing/selftests/kvm/riscv/sbi_pmu.c
> 
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index 426f85798aea..b2dce6843b9e 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -195,6 +195,7 @@ TEST_GEN_PROGS_riscv += kvm_create_max_vcpus
>  TEST_GEN_PROGS_riscv += kvm_page_table_test
>  TEST_GEN_PROGS_riscv += set_memory_region_test
>  TEST_GEN_PROGS_riscv += steal_time
> +TEST_GEN_PROGS_riscv += riscv/sbi_pmu

We put the 

 TEST_GEN_PROGS_riscv += riscv/...

lines at the top of the

 TEST_GEN_PROGS_riscv += ...

set

>  
>  SPLIT_TESTS += arch_timer
>  SPLIT_TESTS += get-reg-list
> diff --git a/tools/testing/selftests/kvm/riscv/sbi_pmu.c b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
> new file mode 100644
> index 000000000000..fc1fc5eea99e
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
> @@ -0,0 +1,340 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * arch_timer.c - Tests the riscv64 sstc timer IRQ functionality
> + *
> + * The test validates the sstc timer IRQs using vstimecmp registers.
> + * It's ported from the aarch64 arch_timer test.

The header (apparently borrowed from arch_timer.c) needs to be updated
to talk about the pmu instead of the timer.

> + *
> + * Copyright (c) 2024, Rivos Inc.
> + */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <sys/types.h>
> +#include "kvm_util.h"
> +#include "test_util.h"
> +#include "processor.h"
> +
> +/* Maximum counters (firmware + hardware)*/
                                            ^ space

> +#define RISCV_MAX_PMU_COUNTERS 64
> +union sbi_pmu_ctr_info ctrinfo_arr[RISCV_MAX_PMU_COUNTERS];
> +
> +/* Cache the available counters in a bitmask */
> +static unsigned long counter_mask_available;
> +
> +unsigned long pmu_csr_read_num(int csr_num)
> +{
> +#define switchcase_csr_read(__csr_num, __val)		{\
> +	case __csr_num:					\
> +		__val = csr_read(__csr_num);		\
> +		break; }
> +#define switchcase_csr_read_2(__csr_num, __val)		{\
> +	switchcase_csr_read(__csr_num + 0, __val)	 \
> +	switchcase_csr_read(__csr_num + 1, __val)}
> +#define switchcase_csr_read_4(__csr_num, __val)		{\
> +	switchcase_csr_read_2(__csr_num + 0, __val)	 \
> +	switchcase_csr_read_2(__csr_num + 2, __val)}
> +#define switchcase_csr_read_8(__csr_num, __val)		{\
> +	switchcase_csr_read_4(__csr_num + 0, __val)	 \
> +	switchcase_csr_read_4(__csr_num + 4, __val)}
> +#define switchcase_csr_read_16(__csr_num, __val)	{\
> +	switchcase_csr_read_8(__csr_num + 0, __val)	 \
> +	switchcase_csr_read_8(__csr_num + 8, __val)}
> +#define switchcase_csr_read_32(__csr_num, __val)	{\
> +	switchcase_csr_read_16(__csr_num + 0, __val)	 \
> +	switchcase_csr_read_16(__csr_num + 16, __val)}
> +
> +	unsigned long ret = 0;
> +
> +	switch (csr_num) {
> +	switchcase_csr_read_32(CSR_CYCLE, ret)
> +	switchcase_csr_read_32(CSR_CYCLEH, ret)
> +	default :
> +		break;
> +	}
> +
> +	return ret;
> +#undef switchcase_csr_read_32
> +#undef switchcase_csr_read_16
> +#undef switchcase_csr_read_8
> +#undef switchcase_csr_read_4
> +#undef switchcase_csr_read_2
> +#undef switchcase_csr_read
> +}
> +
> +static inline void dummy_func_loop(int iter)
> +{
> +	int i = 0;
> +
> +	while (i < iter) {
> +		asm volatile("nop");
> +		i++;
> +	}
> +}
> +
> +static void guest_illegal_exception_handler(struct ex_regs *regs)
> +{
> +	__GUEST_ASSERT(regs->cause == EXC_INST_ILLEGAL,
> +		       "Unexpected exception handler %lx\n", regs->cause);

Shouldn't we be reporting somehow that we were here? We seem to be using
this handler to skip instructions which don't work, which is fine, if
we have some knowledge we skipped them and then do something else.
Otherwise I don't understand.
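
Something along these lines would do, shown here as a userspace model.
struct ex_regs_model, the flag and the cause constant are stand-ins
defined for this example; the real handler would keep its __GUEST_ASSERT
on the cause instead of returning early.

```c
#include <stdbool.h>

#define MODEL_EXC_INST_ILLEGAL 2	/* illustrative cause code */

/* Minimal stand-in for the selftest's struct ex_regs. */
struct ex_regs_model {
	unsigned long epc;
	unsigned long cause;
};

/* Set by the handler so the test body can tell that a trap was taken. */
static volatile bool illegal_handler_invoked;

static void illegal_handler_model(struct ex_regs_model *regs)
{
	if (regs->cause != MODEL_EXC_INST_ILLEGAL)
		return;		/* the real test would assert here */

	illegal_handler_invoked = true;

	/* skip the trapping instruction */
	regs->epc += 4;
}
```

The test body would then clear the flag before the access it expects to
trap and assert that the flag is set afterwards, so a silently skipped
instruction can't pass for a successful read.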

> +
> +	/* skip the trapping instruction */
> +	regs->epc += 4;
> +}
> +
> +static unsigned long get_counter_index(unsigned long cbase, unsigned long cmask,
> +				       unsigned long cflags,
> +				       unsigned long event)
> +{
> +	struct sbiret ret;
> +
> +	ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH, cbase, cmask,
> +			cflags, event, 0, 0);
> +	__GUEST_ASSERT(ret.error == 0, "config matching failed %ld\n", ret.error);
> +	GUEST_ASSERT((ret.value < RISCV_MAX_PMU_COUNTERS) &&
> +		    ((1UL << ret.value) & counter_mask_available));

I'd prefer to break these apart so it's clear which one failed.

   GUEST_ASSERT(ret.value < RISCV_MAX_PMU_COUNTERS);
   GUEST_ASSERT(BIT(ret.value) & counter_mask_available);

> +
> +	return ret.value;
> +}
> +
> +static unsigned long get_num_counters(void)
> +{
> +	struct sbiret ret;
> +
> +	ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_NUM_COUNTERS, 0, 0, 0, 0, 0, 0);
> +
> +	__GUEST_ASSERT(ret.error == 0, "Unable to retrieve number of counters from SBI PMU");
> +

nit: drop this blank line

> +	__GUEST_ASSERT(ret.value < RISCV_MAX_PMU_COUNTERS,
> +		       "Invalid number of counters %ld\n", ret.value);
> +
> +	return ret.value;
> +}
> +
> +static void update_counter_info(int num_counters)
> +{
> +	int i = 0;
> +	struct sbiret ret;
> +
> +	for (i = 0; i < num_counters; i++) {
> +		ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_GET_INFO, i, 0, 0, 0, 0, 0);
> +
> +		/* There can be gaps in logical counter indicies*/
> +		if (!ret.error)
> +			GUEST_ASSERT_NE(ret.value, 0);

I guess this should be

  if (ret.error)
    continue;
  GUEST_ASSERT_NE(ret.value, 0);

> +
> +		ctrinfo_arr[i].value = ret.value;
> +		counter_mask_available |= BIT(i);
> +	}
> +
> +	GUEST_ASSERT(counter_mask_available > 0);
> +}
> +
> +static unsigned long read_counter(int idx, union sbi_pmu_ctr_info ctrinfo)
> +{
> +	unsigned long counter_val = 0;
> +	struct sbiret ret;
> +
> +	__GUEST_ASSERT(ctrinfo.type < 2, "Invalid counter type %d", ctrinfo.type);
> +
> +	if (ctrinfo.type == SBI_PMU_CTR_TYPE_HW) {
> +		counter_val = pmu_csr_read_num(ctrinfo.csr);
> +	} else if (ctrinfo.type == SBI_PMU_CTR_TYPE_FW) {
> +		ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_FW_READ, idx, 0, 0, 0, 0, 0);
> +		GUEST_ASSERT(ret.error == 0);
> +		counter_val = ret.value;
> +	}
> +
> +	return counter_val;
> +}
> +
> +static void start_counter(unsigned long counter, unsigned long start_flags,
> +			  unsigned long ival)
> +{
> +	struct sbiret ret;
> +
> +	ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, counter, 1, start_flags,
> +			ival, 0, 0);
> +	__GUEST_ASSERT(ret.error == 0, "Unable to start counter %ld\n", counter);
> +}
> +
> +static void stop_counter(unsigned long counter, unsigned long stop_flags)
> +{
> +	struct sbiret ret;
> +
> +	ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, counter, 1, stop_flags,
> +			0, 0, 0);
> +	if (stop_flags & SBI_PMU_STOP_FLAG_RESET)
> +		__GUEST_ASSERT(ret.error == SBI_ERR_ALREADY_STOPPED,
> +			       "Unable to stop counter %ld\n", counter);

This looks like we're abusing the SBI_PMU_STOP_FLAG_RESET flag to do the
already-stopped test. I'd rather helper functions work generally and do
stuff like this in test code with comments pointing it out. Or just
cleanly and separately set up an already-stopped test, so it's clear.
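
A minimal sketch of that separation, using a mock in place of the real
SBI call so it stands alone: the helper only reports the error, and the
already-stopped expectation lives in an explicit test. The mock_* names,
stop_counter_raw() and test_stop_twice() are assumptions, not part of the
patch.

```c
#include <stdbool.h>

/* SBI error codes as defined by the SBI specification. */
#define SBI_SUCCESS			0L
#define SBI_ERR_ALREADY_STOPPED		(-8L)

/* Mock state standing in for a real PMU counter (assumption). */
static bool mock_counter_running;

/* Mock of the SBI COUNTER_STOP call for a single counter. */
static long mock_sbi_counter_stop(void)
{
	if (!mock_counter_running)
		return SBI_ERR_ALREADY_STOPPED;

	mock_counter_running = false;
	return SBI_SUCCESS;
}

/* General helper: no flag-dependent expectations, just report the error. */
static long stop_counter_raw(void)
{
	return mock_sbi_counter_stop();
}

/* The already-stopped case becomes an explicit, commented test. */
static bool test_stop_twice(void)
{
	mock_counter_running = true;

	/* First stop must succeed. */
	if (stop_counter_raw() != SBI_SUCCESS)
		return false;

	/* Second stop targets a stopped counter and must say so. */
	return stop_counter_raw() == SBI_ERR_ALREADY_STOPPED;
}
```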

> +	else
> +		__GUEST_ASSERT(ret.error == 0, "Unable to stop counter %ld error %ld\n",
> +			       counter, ret.error);
> +}
> +
> +static void test_pmu_event(unsigned long event)
> +{
> +	unsigned long counter;
> +	unsigned long counter_value_pre, counter_value_post;
> +	unsigned long counter_init_value = 100;
> +
> +	counter = get_counter_index(0, counter_mask_available, 0, event);
> +	counter_value_pre = read_counter(counter, ctrinfo_arr[counter]);
> +
> +	/* Do not set the initial value */
> +	start_counter(counter, 0, counter_init_value);
> +	dummy_func_loop(10000);
> +

nit: I'd remove this blank line so we have start/dummy/stop all together
in a group. Same comment below.

> +	stop_counter(counter, 0);
> +
> +	counter_value_post = read_counter(counter, ctrinfo_arr[counter]);
> +	__GUEST_ASSERT(counter_value_post > counter_value_pre,
> +		       "counter_value_post %lx counter_value_pre %lx\n",
> +		       counter_value_post, counter_value_pre);
> +
> +	/* Now set the initial value and compare */
> +	start_counter(counter, SBI_PMU_START_FLAG_SET_INIT_VALUE, counter_init_value);

We should try to confirm that we reset the counter, otherwise the check
below only proves that the value we read is greater than 100, which is
possible even if the reset doesn't work.
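
One possible shape for such a check, modelled with a mock counter rather
than real SBI calls: load a large initial value first, then a small one,
and require the second reading to fall between the two. This only works
when the workload is short compared with the gap between the two values.
All names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mock counter standing in for an SBI PMU counter (assumption). */
struct mock_counter {
	uint64_t value;
};

/* Start: optionally load an initial value, as SET_INIT_VALUE would. */
static void mock_start(struct mock_counter *c, bool set_init, uint64_t ival)
{
	if (set_init)
		c->value = ival;
}

/* Simulate the workload advancing the counter by 'ticks' events. */
static void mock_run(struct mock_counter *c, uint64_t ticks)
{
	c->value += ticks;
}

/*
 * Returns true only if the second start really reloaded the counter:
 * the final reading must lie strictly between 'small' and 'big'.
 */
static bool init_value_was_applied(struct mock_counter *c, uint64_t big,
				   uint64_t small, uint64_t ticks)
{
	uint64_t post;

	mock_start(c, true, big);
	mock_run(c, ticks);
	if (c->value <= big)
		return false;

	mock_start(c, true, small);
	mock_run(c, ticks);
	post = c->value;

	return post > small && post < big;
}
```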

> +	dummy_func_loop(10000);
> +
> +	stop_counter(counter, 0);
> +
> +	counter_value_post = read_counter(counter, ctrinfo_arr[counter]);
> +	__GUEST_ASSERT(counter_value_post > counter_init_value,
> +		       "counter_value_post %lx counter_init_value %lx\n",
> +		       counter_value_post, counter_init_value);
> +
> +	stop_counter(counter, SBI_PMU_STOP_FLAG_RESET);
> +}
> +
> +static void test_invalid_event(void)
> +{
> +	struct sbiret ret;
> +	unsigned long event = 0x1234; /* A random event */
> +
> +	ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH, 0,
> +			counter_mask_available, 0, event, 0, 0);
> +	GUEST_ASSERT_EQ(ret.error, SBI_ERR_NOT_SUPPORTED);
> +}
> +
> +static void test_pmu_events(int cpu)

cpu is unused so the parameter list can be void. Same comment for
test_pmu_basic_sanity().

> +{
> +	int num_counters = 0;
> +
> +	/* Get the counter details */
> +	num_counters = get_num_counters();
> +	update_counter_info(num_counters);
> +
> +	/* Sanity testing for any random invalid event */
> +	test_invalid_event();
> +
> +	/* Only these two events are guranteed to be present */

guaranteed

> +	test_pmu_event(SBI_PMU_HW_CPU_CYCLES);
> +	test_pmu_event(SBI_PMU_HW_INSTRUCTIONS);
> +
> +	GUEST_DONE();
> +}
> +
> +static void test_pmu_basic_sanity(int cpu)
> +{
> +	long out_val = 0;
> +	bool probe;
> +	struct sbiret ret;
> +	int num_counters = 0, i;
> +	unsigned long counter_val = -1;
> +	union sbi_pmu_ctr_info ctrinfo;
> +
> +	probe = guest_sbi_probe_extension(SBI_EXT_PMU, &out_val);
> +	GUEST_ASSERT(probe && out_val == 1);
> +
> +	num_counters = get_num_counters();
> +
> +	for (i = 0; i < num_counters; i++) {
> +		ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_GET_INFO, i,
> +				0, 0, 0, 0, 0);
> +
> +		/* There can be gaps in logical counter indicies*/
> +		if (!ret.error)
> +			GUEST_ASSERT_NE(ret.value, 0);
> +		else
> +			continue;

nit:

  if (ret.error)
    continue;
  GUEST_ASSERT_NE(ret.value, 0);

> +
> +		ctrinfo.value = ret.value;
> +
> +		/* Accesibility check of hardware and read capability of firmware counters */

Accessibility

> +		counter_val = read_counter(i, ctrinfo);
> +		/* The spec doesn't mandate any initial value. Verify if a sane value */
> +		GUEST_ASSERT_NE(counter_val, -1);

Hmm, does -1 have any special meaning? Otherwise it's a member of the set
of 'any', so there's nothing we can test. Or, maybe we can test that bits
higher than the ctrinfo bitwidth are zero. Although those bits might also
be unspecified, which means there's nothing we can test.
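
If the upper bits turn out to be specified as zero, the width-based check
could look roughly like this. It assumes the SBI counter_info width field
holds one less than the number of implemented bits;
counter_value_fits_width() is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when no bit above the counter's implemented width is set.
 * 'width_field' is the raw width field, i.e. number of bits minus one.
 */
static bool counter_value_fits_width(uint64_t val, unsigned int width_field)
{
	unsigned int bits = width_field + 1;

	/* A full-width counter has no upper bits left to check. */
	if (bits >= 64)
		return true;

	return (val >> bits) == 0;
}
```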

> +	}
> +
> +	GUEST_DONE();
> +}
> +
> +static void run_vcpu(struct kvm_vcpu *vcpu)
> +{
> +	struct ucall uc;
> +
> +	vcpu_run(vcpu);
> +	switch (get_ucall(vcpu, &uc)) {
> +	case UCALL_ABORT:
> +		REPORT_GUEST_ASSERT(uc);
> +		break;
> +	case UCALL_DONE:
> +	case UCALL_SYNC:
> +		break;
> +	default:
> +		TEST_FAIL("Unknown ucall %lu", uc.cmd);
> +		break;
> +	}
> +}
> +
> +void test_vm_destroy(struct kvm_vm *vm)
> +{
> +	memset(ctrinfo_arr, 0, sizeof(union sbi_pmu_ctr_info) * RISCV_MAX_PMU_COUNTERS);
> +	counter_mask_available = 0;
> +	kvm_vm_free(vm);
> +}
> +
> +static void test_vm_basic_test(void *guest_code)
> +{
> +	struct kvm_vm *vm;
> +	struct kvm_vcpu *vcpu;
> +
> +	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
> +	__TEST_REQUIRE(__vcpu_has_ext(vcpu, RISCV_ISA_EXT_REG(KVM_RISCV_SBI_EXT_PMU)),

Shouldn't this be checking RISCV_SBI_EXT_REG(KVM_RISCV_SBI_EXT_PMU)?

We should probably create two more helpers

 bool __vcpu_has_isa_ext(struct kvm_vcpu *vcpu, uint64_t isa_ext)
 {
    return __vcpu_has_ext(vcpu, RISCV_ISA_EXT_REG(isa_ext));
 }
 bool __vcpu_has_sbi_ext(struct kvm_vcpu *vcpu, uint64_t sbi_ext)
 {
    return __vcpu_has_ext(vcpu, RISCV_SBI_EXT_REG(sbi_ext));
 }

to make the extension checks less verbose and error prone.

> +				   "SBI PMU not available, skipping test");
> +	vm_init_vector_tables(vm);
> +	/* Illegal instruction handler is required to verify read access without configuration */
> +	vm_install_exception_handler(vm, EXC_INST_ILLEGAL, guest_illegal_exception_handler);
> +
> +	vcpu_init_vector_tables(vcpu);
> +	vcpu_args_set(vcpu, 1, 0);

We don't use the arguments in the guest code functions so we don't need
this call to vcpu_args_set()

> +	run_vcpu(vcpu);
> +
> +	test_vm_destroy(vm);
> +}
> +
> +static void test_vm_events_test(void *guest_code)
> +{
> +	struct kvm_vm *vm = NULL;
> +	struct kvm_vcpu *vcpu = NULL;
> +
> +	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
> +	__TEST_REQUIRE(__vcpu_has_ext(vcpu, RISCV_ISA_EXT_REG(KVM_RISCV_SBI_EXT_PMU)),

Same comment as above.

> +				   "SBI PMU not available, skipping test");
> +	vcpu_args_set(vcpu, 1, 0);

Same comment as above.

> +	run_vcpu(vcpu);
> +
> +	test_vm_destroy(vm);
> +}
> +
> +int main(void)
> +{
> +	test_vm_basic_test(test_pmu_basic_sanity);
> +	pr_info("SBI PMU basic test : PASS\n");
> +
> +	test_vm_events_test(test_pmu_events);
> +	pr_info("SBI PMU event verification test : PASS\n");
> +
> +	return 0;
> +}
> -- 
> 2.34.1
>

Thanks,
drew

* Re: [PATCH v4 14/15] KVM: riscv: selftests: Add a test for PMU snapshot functionality
  2024-02-29  1:01 ` [PATCH v4 14/15] KVM: riscv: selftests: Add a test for PMU snapshot functionality Atish Patra
  2024-03-01  4:50   ` Anup Patel
@ 2024-03-02 12:13   ` Andrew Jones
  2024-04-02  8:35     ` Atish Patra
  1 sibling, 1 reply; 56+ messages in thread
From: Andrew Jones @ 2024-03-02 12:13 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel, Albert Ou, Alexandre Ghiti, Anup Patel,
	Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Wed, Feb 28, 2024 at 05:01:29PM -0800, Atish Patra wrote:
> Verify PMU snapshot functionality by setting up the shared memory
> correctly and reading the counter values from the shared memory
> instead of the CSR.
> 
> Signed-off-by: Atish Patra <atishp@rivosinc.com>
> ---
>  .../selftests/kvm/include/riscv/processor.h   |  25 ++++
>  .../selftests/kvm/lib/riscv/processor.c       |  12 ++
>  tools/testing/selftests/kvm/riscv/sbi_pmu.c   | 124 ++++++++++++++++++
>  3 files changed, 161 insertions(+)
> 
> diff --git a/tools/testing/selftests/kvm/include/riscv/processor.h b/tools/testing/selftests/kvm/include/riscv/processor.h
> index a49a39c8e8d4..e114d039e87b 100644
> --- a/tools/testing/selftests/kvm/include/riscv/processor.h
> +++ b/tools/testing/selftests/kvm/include/riscv/processor.h
> @@ -173,6 +173,7 @@ enum sbi_ext_id {
>  };
>  
>  enum sbi_ext_base_fid {
> +	SBI_EXT_BASE_GET_IMP_VERSION = 2,
>  	SBI_EXT_BASE_PROBE_EXT = 3,
>  };
>  
> @@ -201,6 +202,12 @@ union sbi_pmu_ctr_info {
>  	};
>  };
>  
> +struct riscv_pmu_snapshot_data {
> +	u64 ctr_overflow_mask;
> +	u64 ctr_values[64];
> +	u64 reserved[447];
> +};
> +
>  struct sbiret {
>  	long error;
>  	long value;
> @@ -247,6 +254,14 @@ enum sbi_pmu_ctr_type {
>  #define SBI_PMU_STOP_FLAG_RESET (1 << 0)
>  #define SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT BIT(1)
>  
> +#define SBI_STA_SHMEM_DISABLE		-1

unrelated change

> +
> +/* SBI spec version fields */
> +#define SBI_SPEC_VERSION_DEFAULT	0x1
> +#define SBI_SPEC_VERSION_MAJOR_SHIFT	24
> +#define SBI_SPEC_VERSION_MAJOR_MASK	0x7f
> +#define SBI_SPEC_VERSION_MINOR_MASK	0xffffff
> +
>  struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
>  			unsigned long arg1, unsigned long arg2,
>  			unsigned long arg3, unsigned long arg4,
> @@ -254,6 +269,16 @@ struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
>  
>  bool guest_sbi_probe_extension(int extid, long *out_val);
>  
> +/* Make SBI version */
> +static inline unsigned long sbi_mk_version(unsigned long major,
> +					    unsigned long minor)
> +{
> +	return ((major & SBI_SPEC_VERSION_MAJOR_MASK) <<
> +		SBI_SPEC_VERSION_MAJOR_SHIFT) | minor;
> +}

We should probably just sync sbi.h into tools, since we need plenty
from it.
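
For reference, the encoding used by the quoted sbi_mk_version() helper
can be modelled standalone as below. The constants are the ones in the
hunk above; the *_model function names are hypothetical and only there
to keep the sketch from clashing with real headers.

```c
/* Constants copied from the quoted hunk. */
#define SBI_SPEC_VERSION_MAJOR_SHIFT	24
#define SBI_SPEC_VERSION_MAJOR_MASK	0x7f
#define SBI_SPEC_VERSION_MINOR_MASK	0xffffff

/* Mirrors the quoted helper: major in bits [30:24], minor in bits [23:0]. */
static inline unsigned long sbi_mk_version_model(unsigned long major,
						 unsigned long minor)
{
	return ((major & SBI_SPEC_VERSION_MAJOR_MASK) <<
		SBI_SPEC_VERSION_MAJOR_SHIFT) | minor;
}

/* Extract the major number from an encoded version. */
static inline unsigned long sbi_major_model(unsigned long version)
{
	return (version >> SBI_SPEC_VERSION_MAJOR_SHIFT) &
	       SBI_SPEC_VERSION_MAJOR_MASK;
}

/* Extract the minor number from an encoded version. */
static inline unsigned long sbi_minor_model(unsigned long version)
{
	return version & SBI_SPEC_VERSION_MINOR_MASK;
}
```

Because the major number sits in the higher bits, encoded versions can be
compared directly with the usual relational operators.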

> +
> +unsigned long get_host_sbi_impl_version(void);
> +
>  static inline void local_irq_enable(void)
>  {
>  	csr_set(CSR_SSTATUS, SR_SIE);
> diff --git a/tools/testing/selftests/kvm/lib/riscv/processor.c b/tools/testing/selftests/kvm/lib/riscv/processor.c
> index ec66d331a127..b0162d923e38 100644
> --- a/tools/testing/selftests/kvm/lib/riscv/processor.c
> +++ b/tools/testing/selftests/kvm/lib/riscv/processor.c
> @@ -499,3 +499,15 @@ bool guest_sbi_probe_extension(int extid, long *out_val)
>  
>  	return true;
>  }
> +
> +unsigned long get_host_sbi_impl_version(void)
> +{
> +	struct sbiret ret;
> +
> +	ret = sbi_ecall(SBI_EXT_BASE, SBI_EXT_BASE_GET_IMP_VERSION, 0,
> +		       0, 0, 0, 0, 0);
> +
> +	GUEST_ASSERT(!ret.error);
> +
> +	return ret.value;
> +}
> diff --git a/tools/testing/selftests/kvm/riscv/sbi_pmu.c b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
> index fc1fc5eea99e..8ea2a6db6610 100644
> --- a/tools/testing/selftests/kvm/riscv/sbi_pmu.c
> +++ b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
> @@ -21,6 +21,11 @@
>  #define RISCV_MAX_PMU_COUNTERS 64
>  union sbi_pmu_ctr_info ctrinfo_arr[RISCV_MAX_PMU_COUNTERS];
>  
> +/* Snapshot shared memory data */
> +#define PMU_SNAPSHOT_GPA_BASE		(1 << 30)
> +static void *snapshot_gva;
> +static vm_paddr_t snapshot_gpa;
> +
>  /* Cache the available counters in a bitmask */
>  static unsigned long counter_mask_available;
>  
> @@ -173,6 +178,20 @@ static void stop_counter(unsigned long counter, unsigned long stop_flags)
>  			       counter, ret.error);
>  }
>  
> +static void snapshot_set_shmem(vm_paddr_t gpa, unsigned long flags)
> +{
> +	unsigned long lo = (unsigned long)gpa;
> +#if __riscv_xlen == 32
> +	unsigned long hi = (unsigned long)(gpa >> 32);
> +#else
> +	unsigned long hi = gpa == -1 ? -1 : 0;
> +#endif
> +	struct sbiret ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
> +				      lo, hi, flags, 0, 0, 0);
> +
> +	GUEST_ASSERT(ret.value == 0 && ret.error == 0);
> +}
> +
>  static void test_pmu_event(unsigned long event)
>  {
>  	unsigned long counter;
> @@ -207,6 +226,43 @@ static void test_pmu_event(unsigned long event)
>  	stop_counter(counter, SBI_PMU_STOP_FLAG_RESET);
>  }
>  
> +static void test_pmu_event_snapshot(unsigned long event)
> +{
> +	unsigned long counter;
> +	unsigned long counter_value_pre, counter_value_post;
> +	unsigned long counter_init_value = 100;
> +	struct riscv_pmu_snapshot_data *snapshot_data = snapshot_gva;
> +
> +	counter = get_counter_index(0, counter_mask_available, 0, event);
> +	counter_value_pre = read_counter(counter, ctrinfo_arr[counter]);
> +
> +	/* Do not set the initial value */
> +	start_counter(counter, 0, 0);
> +	dummy_func_loop(10000);
> +
> +	stop_counter(counter, SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT);
> +
> +	/* The counter value is updated w.r.t relative index of cbase */
> +	counter_value_post = READ_ONCE(snapshot_data->ctr_values[0]);
> +	__GUEST_ASSERT(counter_value_post > counter_value_pre,
> +		       "counter_value_post %lx counter_value_pre %lx\n",
> +		       counter_value_post, counter_value_pre);
> +
> +	/* Now set the initial value and compare */
> +	WRITE_ONCE(snapshot_data->ctr_values[0], counter_init_value);
> +	start_counter(counter, SBI_PMU_START_FLAG_INIT_FROM_SNAPSHOT, 0);
> +	dummy_func_loop(10000);
> +
> +	stop_counter(counter, SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT);
> +
> +	counter_value_post = READ_ONCE(snapshot_data->ctr_values[0]);
> +	__GUEST_ASSERT(counter_value_post > counter_init_value,
> +		       "counter_value_post %lx counter_init_value %lx for counter\n",
> +		       counter_value_post, counter_init_value);
> +
> +	stop_counter(counter, SBI_PMU_STOP_FLAG_RESET);

This function is almost identical to test_pmu_event(). If we change one,
we'll likely have to change the other. We should have a single function
which can be used by both tests. We can do that by passing a function
pointer for the read which is different for non-snapshot and snapshot.
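
Roughly, and only as a standalone model with mock storage in place of
the CSRs and the shared memory: the common body takes a read callback,
and each test passes its own. read_counter_fn, the mock arrays and
counter_advanced() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Callback type: how a given test obtains the current counter value. */
typedef uint64_t (*read_counter_fn)(unsigned long idx);

/* Mock backing stores standing in for CSRs and the snapshot area. */
static uint64_t mock_csr[4];
static uint64_t mock_snapshot[4];

/* Non-snapshot flavour: read the counter directly. */
static uint64_t read_from_csr(unsigned long idx)
{
	return mock_csr[idx];
}

/* Snapshot flavour: read the value the SBI implementation wrote out. */
static uint64_t read_from_snapshot(unsigned long idx)
{
	return mock_snapshot[idx];
}

/* Shared check used by both tests; only the read path differs. */
static bool counter_advanced(unsigned long idx, uint64_t pre,
			     read_counter_fn read_fn)
{
	return read_fn(idx) > pre;
}
```

The start and stop flags would still differ between the two callers, so
they would likely need to be parameters of the shared function as well.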

> +}
> +
>  static void test_invalid_event(void)
>  {
>  	struct sbiret ret;
> @@ -270,6 +326,41 @@ static void test_pmu_basic_sanity(int cpu)
>  	GUEST_DONE();
>  }
>  
> +static void test_pmu_events_snaphost(int cpu)

unnecessary cpu parameter

> +{
> +	long out_val = 0;
> +	bool probe;
> +	int num_counters = 0;
> +	unsigned long sbi_impl_version;
> +	struct riscv_pmu_snapshot_data *snapshot_data = snapshot_gva;
> +	int i;
> +
> +	probe = guest_sbi_probe_extension(SBI_EXT_PMU, &out_val);
> +	GUEST_ASSERT(probe && out_val == 1);
> +
> +	sbi_impl_version = get_host_sbi_impl_version();
> +	if (sbi_impl_version >= sbi_mk_version(2, 0))
> +		__GUEST_ASSERT(0, "SBI implementation version doesn't support PMU Snapshot");
> +
> +	snapshot_set_shmem(snapshot_gpa, 0);
> +
> +	/* Get the counter details */
> +	num_counters = get_num_counters();
> +	update_counter_info(num_counters);
> +
> +	/* Validate shared memory access */
> +	GUEST_ASSERT_EQ(READ_ONCE(snapshot_data->ctr_overflow_mask), 0);
> +	for (i = 0; i < num_counters; i++) {
> +		if (counter_mask_available & (1UL << i))

BIT()

> +			GUEST_ASSERT_EQ(READ_ONCE(snapshot_data->ctr_values[i]), 0);
> +	}
> +	/* Only these two events are guranteed to be present */
> +	test_pmu_event_snapshot(SBI_PMU_HW_CPU_CYCLES);
> +	test_pmu_event_snapshot(SBI_PMU_HW_INSTRUCTIONS);
> +
> +	GUEST_DONE();
> +}
> +
>  static void run_vcpu(struct kvm_vcpu *vcpu)
>  {
>  	struct ucall uc;
> @@ -328,6 +419,36 @@ static void test_vm_events_test(void *guest_code)
>  	test_vm_destroy(vm);
>  }
>  
> +static void test_vm_setup_snapshot_mem(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
> +{
> +	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, PMU_SNAPSHOT_GPA_BASE, 1, 1, 0);
> +	/* PMU Snapshot requires single page only */

This comment should go above the memory region add

> +	virt_map(vm, PMU_SNAPSHOT_GPA_BASE, PMU_SNAPSHOT_GPA_BASE, 1);
> +
> +	/* PMU_SNAPSHOT_GPA_BASE is identity mapped */

This comment should go above the virt_map

> +	snapshot_gva = (void *)(PMU_SNAPSHOT_GPA_BASE);
> +	snapshot_gpa = addr_gva2gpa(vcpu->vm, (vm_vaddr_t)snapshot_gva);
> +	sync_global_to_guest(vcpu->vm, snapshot_gva);
> +	sync_global_to_guest(vcpu->vm, snapshot_gpa);
> +}
> +
> +static void test_vm_events_snapshot_test(void *guest_code)
> +{
> +	struct kvm_vm *vm = NULL;
> +	struct kvm_vcpu *vcpu = NULL;

nit: no need to set to NULL

> +
> +	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
> +	__TEST_REQUIRE(__vcpu_has_ext(vcpu, RISCV_ISA_EXT_REG(KVM_RISCV_SBI_EXT_PMU)),

RISCV_SBI_EXT_REG

> +				   "SBI PMU not available, skipping test");
> +
> +	test_vm_setup_snapshot_mem(vm, vcpu);
> +
> +	vcpu_args_set(vcpu, 1, 0);

no need to set args

> +	run_vcpu(vcpu);
> +
> +	test_vm_destroy(vm);
> +}
> +
>  int main(void)
>  {
>  	test_vm_basic_test(test_pmu_basic_sanity);
> @@ -336,5 +457,8 @@ int main(void)
>  	test_vm_events_test(test_pmu_events);
>  	pr_info("SBI PMU event verification test : PASS\n");
>  
> +	test_vm_events_snapshot_test(test_pmu_events_snaphost);
> +	pr_info("SBI PMU event verification with snapshot test : PASS\n");
> +
>  	return 0;
>  }
> -- 
> 2.34.1
>

Thanks,
drew

* Re: [PATCH v4 15/15] KVM: riscv: selftests: Add a test for counter overflow
  2024-02-29  1:01 ` [PATCH v4 15/15] KVM: riscv: selftests: Add a test for counter overflow Atish Patra
  2024-03-01  4:53   ` Anup Patel
@ 2024-03-02 12:35   ` Andrew Jones
  2024-04-02  8:42     ` Atish Patra
  1 sibling, 1 reply; 56+ messages in thread
From: Andrew Jones @ 2024-03-02 12:35 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel, Albert Ou, Alexandre Ghiti, Anup Patel,
	Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Wed, Feb 28, 2024 at 05:01:30PM -0800, Atish Patra wrote:
> Add a test for verifying overflow interrupt. Currently, it relies on
> overflow support on cycle/instret events. This test works for cycle/
> instret events which support sampling via hpmcounters on the platform.
> There are no ISA extensions to detect if a platform supports that. Thus,

Ouch. Are there discussions/proposals as to how we can do better with
discoverability here? This sounds like the kind of thing that gets a new
extension name defined for it as part of the profile spec work.

> this test will fail on platform with virtualization but doesn't
> support overflow on these two events.
> 
> Signed-off-by: Atish Patra <atishp@rivosinc.com>
> ---
>  tools/testing/selftests/kvm/riscv/sbi_pmu.c | 126 +++++++++++++++++++-
>  1 file changed, 125 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/kvm/riscv/sbi_pmu.c b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
> index 8ea2a6db6610..c0264c636054 100644
> --- a/tools/testing/selftests/kvm/riscv/sbi_pmu.c
> +++ b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
> @@ -8,6 +8,7 @@
>   * Copyright (c) 2024, Rivos Inc.
>   */
>  
> +#include "asm/csr.h"
>  #include <stdio.h>
>  #include <stdlib.h>
>  #include <string.h>
> @@ -16,6 +17,7 @@
>  #include "kvm_util.h"
>  #include "test_util.h"
>  #include "processor.h"
> +#include "arch_timer.h"
>  
>  /* Maximum counters (firmware + hardware)*/
>  #define RISCV_MAX_PMU_COUNTERS 64
> @@ -26,6 +28,11 @@ union sbi_pmu_ctr_info ctrinfo_arr[RISCV_MAX_PMU_COUNTERS];
>  static void *snapshot_gva;
>  static vm_paddr_t snapshot_gpa;
>  
> +static int pmu_irq = IRQ_PMU_OVF;
> +
> +static int vcpu_shared_irq_count;
> +static int counter_in_use;
> +
>  /* Cache the available counters in a bitmask */
>  static unsigned long counter_mask_available;
>  
> @@ -69,7 +76,9 @@ unsigned long pmu_csr_read_num(int csr_num)
>  #undef switchcase_csr_read
>  }
>  
> -static inline void dummy_func_loop(int iter)
> +static void stop_counter(unsigned long counter, unsigned long stop_flags);
> +
> +static inline void dummy_func_loop(uint64_t iter)
>  {
>  	int i = 0;
>  
> @@ -88,6 +97,26 @@ static void guest_illegal_exception_handler(struct ex_regs *regs)
>  	regs->epc += 4;
>  }
>  
> +static void guest_irq_handler(struct ex_regs *regs)
> +{
> +	unsigned int irq_num = regs->cause & ~CAUSE_IRQ_FLAG;
> +	struct riscv_pmu_snapshot_data *snapshot_data = snapshot_gva;
> +	unsigned long overflown_mask;
> +
> +	/* Stop all counters first to avoid further interrupts */
> +	sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, 0, 1UL << counter_in_use,
> +		  SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT, 0, 0, 0);
> +
> +	csr_clear(CSR_SIP, BIT(pmu_irq));
> +
> +	overflown_mask = READ_ONCE(snapshot_data->ctr_overflow_mask);
> +	GUEST_ASSERT(overflown_mask & (1UL << counter_in_use));
> +
> +	/* Validate that we are in the correct irq handler */
> +	GUEST_ASSERT_EQ(irq_num, pmu_irq);

Should probably do this irq handler assert first.

> +	WRITE_ONCE(vcpu_shared_irq_count, vcpu_shared_irq_count+1);
> +}
> +
>  static unsigned long get_counter_index(unsigned long cbase, unsigned long cmask,
>  				       unsigned long cflags,
>  				       unsigned long event)
> @@ -263,6 +292,32 @@ static void test_pmu_event_snapshot(unsigned long event)
>  	stop_counter(counter, SBI_PMU_STOP_FLAG_RESET);
>  }
>  
> +static void test_pmu_event_overflow(unsigned long event)
> +{
> +	unsigned long counter;
> +	unsigned long counter_value_post;
> +	unsigned long counter_init_value = ULONG_MAX - 10000;
> +	struct riscv_pmu_snapshot_data *snapshot_data = snapshot_gva;
> +
> +	counter = get_counter_index(0, counter_mask_available, 0, event);
> +	counter_in_use = counter;
> +
> +	/* The counter value is updated w.r.t relative index of cbase passed to start/stop */
> +	WRITE_ONCE(snapshot_data->ctr_values[0], counter_init_value);
> +	start_counter(counter, SBI_PMU_START_FLAG_INIT_FROM_SNAPSHOT, 0);
> +	dummy_func_loop(10000);
> +	udelay(msecs_to_usecs(2000));
> +	/* irq handler should have stopped the counter */
> +
> +	counter_value_post = READ_ONCE(snapshot_data->ctr_values[counter_in_use]);
> +	/* The counter value after stopping should be less the init value due to overflow */
> +	__GUEST_ASSERT(counter_value_post < counter_init_value,
> +		       "counter_value_post %lx counter_init_value %lx for counter\n",
> +		       counter_value_post, counter_init_value);
> +
> +	stop_counter(counter, SBI_PMU_STOP_FLAG_RESET);
> +}
> +
>  static void test_invalid_event(void)
>  {
>  	struct sbiret ret;
> @@ -361,6 +416,43 @@ static void test_pmu_events_snaphost(int cpu)
>  	GUEST_DONE();
>  }
>  
> +static void test_pmu_events_overflow(int cpu)

no need for cpu

> +{
> +	long out_val = 0;
> +	bool probe;
> +	int num_counters = 0;
> +	unsigned long sbi_impl_version;
> +
> +	probe = guest_sbi_probe_extension(SBI_EXT_PMU, &out_val);
> +	GUEST_ASSERT(probe && out_val == 1);
> +
> +	sbi_impl_version = get_host_sbi_impl_version();
> +	if (sbi_impl_version >= sbi_mk_version(2, 0))
> +		__GUEST_ASSERT(0, "SBI implementation version doesn't support PMU Snapshot");

Identical probe and version check as test_pmu_events_snaphost(). Can
factor out.

> +
> +	snapshot_set_shmem(snapshot_gpa, 0);
> +	csr_set(CSR_IE, BIT(pmu_irq));
> +	local_irq_enable();
> +
> +	/* Get the counter details */
> +	num_counters = get_num_counters();
> +	update_counter_info(num_counters);
> +
> +	/*
> +	 * Qemu supports overflow for cycle/instruction.
> +	 * This test may fail on any platform that do not support overflow for these two events.
> +	 */
> +	test_pmu_event_overflow(SBI_PMU_HW_CPU_CYCLES);
> +	GUEST_ASSERT_EQ(vcpu_shared_irq_count, 1);
> +
> +	/* Renable the interrupt again for another event */
> +	csr_set(CSR_IE, BIT(pmu_irq));
> +	test_pmu_event_overflow(SBI_PMU_HW_INSTRUCTIONS);
> +	GUEST_ASSERT_EQ(vcpu_shared_irq_count, 2);
> +
> +	GUEST_DONE();
> +}
> +
>  static void run_vcpu(struct kvm_vcpu *vcpu)
>  {
>  	struct ucall uc;
> @@ -449,6 +541,35 @@ static void test_vm_events_snapshot_test(void *guest_code)
>  	test_vm_destroy(vm);
>  }
>  
> +static void test_vm_events_overflow(void *guest_code)
> +{
> +	struct kvm_vm *vm = NULL;
> +	struct kvm_vcpu *vcpu = NULL;

nit: no need for NULL

> +
> +	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
> +	__TEST_REQUIRE(__vcpu_has_ext(vcpu, RISCV_SBI_EXT_REG(KVM_RISCV_SBI_EXT_PMU)),
> +				   "SBI PMU not available, skipping test");
> +
> +	__TEST_REQUIRE(__vcpu_has_ext(vcpu, RISCV_ISA_EXT_REG(KVM_RISCV_ISA_EXT_SSCOFPMF)),
> +				   "Sscofpmf is not available, skipping overflow test");
> +
> +
> +	test_vm_setup_snapshot_mem(vm, vcpu);
> +	vm_init_vector_tables(vm);
> +	vm_install_interrupt_handler(vm, guest_irq_handler);
> +
> +	vcpu_init_vector_tables(vcpu);
> +	/* Initialize guest timer frequency. */
> +	vcpu_get_reg(vcpu, RISCV_TIMER_REG(frequency), &timer_freq);
> +	sync_global_to_guest(vm, timer_freq);

I just noticed that timer_freq is in arch_timer.h and isn't an extern...
Fixing that is out of scope for this series though.

> +
> +	vcpu_args_set(vcpu, 1, 0);

no need for args

> +
> +	run_vcpu(vcpu);
> +
> +	test_vm_destroy(vm);
> +}
> +
>  int main(void)
>  {
>  	test_vm_basic_test(test_pmu_basic_sanity);
> @@ -460,5 +581,8 @@ int main(void)
>  	test_vm_events_snapshot_test(test_pmu_events_snaphost);
>  	pr_info("SBI PMU event verification with snapshot test : PASS\n");
>  
> +	test_vm_events_overflow(test_pmu_events_overflow);
> +	pr_info("SBI PMU event verification with overflow test : PASS\n");
> +
>  	return 0;
>  }
> -- 
> 2.34.1
>

Thanks,
drew

* Re: [PATCH v4 08/15] RISC-V: KVM: Implement SBI PMU Snapshot feature
  2024-03-02  9:49   ` Andrew Jones
@ 2024-04-01 22:36     ` Atish Patra
  2024-04-03  7:36       ` Atish Patra
  0 siblings, 1 reply; 56+ messages in thread
From: Atish Patra @ 2024-04-01 22:36 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Atish Patra, linux-kernel, Anup Patel, Albert Ou,
	Alexandre Ghiti, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Sat, Mar 2, 2024 at 1:49 AM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Wed, Feb 28, 2024 at 05:01:23PM -0800, Atish Patra wrote:
> > PMU Snapshot function allows to minimize the number of traps when the
> > guest access configures/access the hpmcounters. If the snapshot feature
> > is enabled, the hypervisor updates the shared memory with counter
> > data and state of overflown counters. The guest can just read the
> > shared memory instead of trap & emulate done by the hypervisor.
> >
> > This patch doesn't implement the counter overflow yet.
> >
> > Reviewed-by: Anup Patel <anup@brainfault.org>
> > Signed-off-by: Atish Patra <atishp@rivosinc.com>
> > ---
> >  arch/riscv/include/asm/kvm_vcpu_pmu.h |   7 ++
> >  arch/riscv/kvm/vcpu_pmu.c             | 120 +++++++++++++++++++++++++-
> >  arch/riscv/kvm/vcpu_sbi_pmu.c         |   3 +
> >  drivers/perf/riscv_pmu_sbi.c          |   2 +-
> >  4 files changed, 129 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> > index 395518a1664e..586bab84be35 100644
> > --- a/arch/riscv/include/asm/kvm_vcpu_pmu.h
> > +++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> > @@ -50,6 +50,10 @@ struct kvm_pmu {
> >       bool init_done;
> >       /* Bit map of all the virtual counter used */
> >       DECLARE_BITMAP(pmc_in_use, RISCV_KVM_MAX_COUNTERS);
> > +     /* The address of the counter snapshot area (guest physical address) */
> > +     gpa_t snapshot_addr;
> > +     /* The actual data of the snapshot */
> > +     struct riscv_pmu_snapshot_data *sdata;
> >  };
> >
> >  #define vcpu_to_pmu(vcpu) (&(vcpu)->arch.pmu_context)
> > @@ -85,6 +89,9 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
> >  int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> >                               struct kvm_vcpu_sbi_return *retdata);
> >  void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu);
> > +int kvm_riscv_vcpu_pmu_setup_snapshot(struct kvm_vcpu *vcpu, unsigned long saddr_low,
> > +                                   unsigned long saddr_high, unsigned long flags,
> > +                                   struct kvm_vcpu_sbi_return *retdata);
>
> I prefer to name this function
>
> kvm_riscv_vcpu_pmu_snapshot_set_shmem
>

Sure.

> >  void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu);
> >  void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu);
> >
> > diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> > index 29bf4ca798cb..74865e6050a1 100644
> > --- a/arch/riscv/kvm/vcpu_pmu.c
> > +++ b/arch/riscv/kvm/vcpu_pmu.c
> > @@ -311,6 +311,81 @@ int kvm_riscv_vcpu_pmu_read_hpm(struct kvm_vcpu *vcpu, unsigned int csr_num,
> >       return ret;
> >  }
> >
> > +static void kvm_pmu_clear_snapshot_area(struct kvm_vcpu *vcpu)
> > +{
> > +     struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > +     int snapshot_area_size = sizeof(struct riscv_pmu_snapshot_data);
> > +
> > +     if (kvpmu->sdata) {
> > +             memset(kvpmu->sdata, 0, snapshot_area_size);
> > +             if (kvpmu->snapshot_addr != INVALID_GPA)
>
> It's a KVM bug if we have non-null sdata but snapshot_addr is INVALID_GPA,
> right? Maybe we should warn if we see that. We can also move the memset
> inside the if block.
>

Added a warning.

> > +                     kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr,
> > +                                          kvpmu->sdata, snapshot_area_size);
> > +             kfree(kvpmu->sdata);
> > +             kvpmu->sdata = NULL;
> > +     }
> > +     kvpmu->snapshot_addr = INVALID_GPA;
> > +}
> > +
> > +int kvm_riscv_vcpu_pmu_setup_snapshot(struct kvm_vcpu *vcpu, unsigned long saddr_low,
> > +                                   unsigned long saddr_high, unsigned long flags,
> > +                                   struct kvm_vcpu_sbi_return *retdata)
> > +{
> > +     struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > +     int snapshot_area_size = sizeof(struct riscv_pmu_snapshot_data);
> > +     int sbiret = 0;
> > +     gpa_t saddr;
> > +     unsigned long hva;
> > +     bool writable;
> > +
> > +     if (!kvpmu) {
> > +             sbiret = SBI_ERR_INVALID_PARAM;
> > +             goto out;
> > +     }
>
> Need to check that flags is zero or return SBI_ERR_INVALID_PARAM.
>

Fixed.

> > +
> > +     if (saddr_low == -1 && saddr_high == -1) {
>
> We introduced SBI_STA_SHMEM_DISABLE for these magic -1's for STA. Since
> SBI is using the -1 approach for all its shmem, then maybe we should
> rename SBI_STA_SHMEM_DISABLE to SBI_SHMEM_DISABLE and then use them here
> too.
>

Fixed

> > +             kvm_pmu_clear_snapshot_area(vcpu);
> > +             return 0;
> > +     }
> > +
> > +     saddr = saddr_low;
> > +
> > +     if (saddr_high != 0) {
> > +             if (IS_ENABLED(CONFIG_32BIT))
> > +                     saddr |= ((gpa_t)saddr << 32);
> > +             else
> > +                     sbiret = SBI_ERR_INVALID_ADDRESS;
> > +             goto out;
> > +     }
> > +
> > +     if (kvm_is_error_gpa(vcpu->kvm, saddr)) {
> > +             sbiret = SBI_ERR_INVALID_PARAM;
> > +             goto out;
> > +     }
>
> Does the check above provide anything more than what the check below does?
>
You are correct. I have removed the check.
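
For reference, the address handling discussed above (the -1/-1 disable
sentinel and combining the low/high halves) can be sketched in plain C as
below. This is only an illustration, not the code from the next revision:
decode_shmem_addr(), struct shmem_req, SHMEM_DISABLE and the is_rv32
parameter (standing in for IS_ENABLED(CONFIG_32BIT)) are hypothetical
names, and the error value is the one the SBI spec assigns to
SBI_ERR_INVALID_ADDRESS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel definitions. */
#define SHMEM_DISABLE        ((unsigned long)-1)  /* role of SBI_SHMEM_DISABLE */
#define ERR_INVALID_ADDRESS  (-5)                 /* SBI_ERR_INVALID_ADDRESS */

struct shmem_req {
	int err;        /* 0 or an SBI-style error code */
	bool disable;   /* caller asked to tear down the shared memory */
	uint64_t gpa;   /* combined guest physical address */
};

/* is_rv32 stands in for IS_ENABLED(CONFIG_32BIT). */
static struct shmem_req decode_shmem_addr(unsigned long lo, unsigned long hi,
					  bool is_rv32)
{
	struct shmem_req req = { .err = 0, .disable = false, .gpa = 0 };

	/* Both halves set to all-ones means "disable the shared memory". */
	if (lo == SHMEM_DISABLE && hi == SHMEM_DISABLE) {
		req.disable = true;
		return req;
	}

	req.gpa = lo;
	if (hi != 0) {
		/* A non-zero high half is only meaningful on RV32. */
		if (!is_rv32) {
			req.err = ERR_INVALID_ADDRESS;
			return req;
		}
		/* Shift the high half (not the low half) into bits 63:32. */
		req.gpa |= (uint64_t)hi << 32;
	}

	return req;
}
```

With this shape only the error branch bails out early; the RV32 branch
falls through to the remaining validation with the full address.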

> > +
> > +     hva = kvm_vcpu_gfn_to_hva_prot(vcpu, saddr >> PAGE_SHIFT, &writable);
> > +     if (kvm_is_error_hva(hva) || !writable) {
> > +             sbiret = SBI_ERR_INVALID_ADDRESS;
> > +             goto out;
> > +     }
> > +
> > +     kvpmu->snapshot_addr = saddr;
> > +     kvpmu->sdata = kzalloc(snapshot_area_size, GFP_ATOMIC);
> > +     if (!kvpmu->sdata)
>
> Should reset snapshot_addr to INVALID_GPA here on error. Or maybe we
> should just set snapshot_addr to saddr at the bottom of this function if
> we make it.
>

Done.

> > +             return -ENOMEM;
> > +
> > +     if (kvm_vcpu_write_guest(vcpu, saddr, kvpmu->sdata, snapshot_area_size)) {
> > +             kfree(kvpmu->sdata);
> > +             kvpmu->snapshot_addr = INVALID_GPA;
> > +             sbiret = SBI_ERR_FAILURE;
>
> I agree we should return this SBI error for this case, but unfortunately
> the spec is missing the
>
>  SBI_ERR_FAILED - The request failed for unspecified or unknown other reasons.
>
> that we have for other SBI functions. I guess we should keep the code like
> this and open a PR to the spec.
>

I have created a blanket GitHub issue for now. I will send a PR.

> > +     }
> > +
> > +out:
> > +     retdata->err_val = sbiret;
> > +
> > +     return 0;
> > +}
> > +
> >  int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu,
> >                               struct kvm_vcpu_sbi_return *retdata)
> >  {
> > @@ -344,20 +419,33 @@ int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> >       int i, pmc_index, sbiret = 0;
> >       struct kvm_pmc *pmc;
> >       int fevent_code;
> > +     bool snap_flag_set = flags & SBI_PMU_START_FLAG_INIT_FROM_SNAPSHOT;
>
> This function should confirm no undefined bits are set in flags and the
> spec should specify that the reserved flags must be zero otherwise an
> invalid param will be returned.
>
> Also here we should confirm that only one of the two flags is set,
> otherwise return invalid param, as they're specified to be mutually
> exclusive.
>

That makes sense. Updated the same GitHub issue
(https://github.com/riscv-non-isa/riscv-sbi-doc/issues/145).

I will make the necessary changes in a separate series after the spec is merged.
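
To make the requested check concrete, here is a rough sketch of what the
flag validation could look like once the spec wording is settled. It is
not part of this series. check_start_flags() and the START_* macros are
hypothetical names, and the bit positions are assumed for illustration
only; the error value is the one the SBI spec assigns to
SBI_ERR_INVALID_PARAM.

```c
/* Hypothetical flag layout, assumed for this sketch only. */
#define START_FLAG_SET_INIT_VALUE      (1UL << 0)
#define START_FLAG_INIT_FROM_SNAPSHOT  (1UL << 1)
#define START_FLAGS_MASK \
	(START_FLAG_SET_INIT_VALUE | START_FLAG_INIT_FROM_SNAPSHOT)

#define ERR_INVALID_PARAM  (-3)  /* SBI_ERR_INVALID_PARAM */

/* Returns 0 if flags are acceptable, an SBI-style error otherwise. */
static int check_start_flags(unsigned long flags)
{
	/* Reserved bits must be zero. */
	if (flags & ~START_FLAGS_MASK)
		return ERR_INVALID_PARAM;

	/* The two initialisation sources are mutually exclusive. */
	if ((flags & START_FLAGS_MASK) == START_FLAGS_MASK)
		return ERR_INVALID_PARAM;

	return 0;
}
```

The same pattern (mask off the defined bits, then reject conflicting
combinations) would apply to the other SBI PMU functions that take flags.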

> Regarding the spec, the note about the counter value not being modified
> unless SBI_PMU_START_SET_INIT_VALUE is set should be modified to state
> unless either of the two flags are set (so I think we need another spec
> PR).
>
> (The same flags checking/specifying comments apply to the other functions
> with flags too.)
>

Noted (https://github.com/riscv-non-isa/riscv-sbi-doc/issues/146).

> >
> >       if (kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0) {
> >               sbiret = SBI_ERR_INVALID_PARAM;
> >               goto out;
> >       }
> >
> > +     if (snap_flag_set && kvpmu->snapshot_addr == INVALID_GPA) {
> > +             sbiret = SBI_ERR_NO_SHMEM;
> > +             goto out;
> > +     }
> > +
> >       /* Start the counters that have been configured and requested by the guest */
> >       for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
> >               pmc_index = i + ctr_base;
> >               if (!test_bit(pmc_index, kvpmu->pmc_in_use))
> >                       continue;
> >               pmc = &kvpmu->pmc[pmc_index];
> > -             if (flags & SBI_PMU_START_FLAG_SET_INIT_VALUE)
> > +             if (flags & SBI_PMU_START_FLAG_SET_INIT_VALUE) {
> >                       pmc->counter_val = ival;
> > +             } else if (snap_flag_set) {
> > +                     kvm_vcpu_read_guest(vcpu, kvpmu->snapshot_addr, kvpmu->sdata,
> > +                                         sizeof(struct riscv_pmu_snapshot_data));
>
> The snapshot read should be outside the for_each_set_bit() loop and we
> should warn and abort the counter starting if the read fails.
>

Fixed. This should also fall under the SBI_ERR_FAILURE category.

> > +                     /* The counter index in the snapshot are relative to the counter base */
> > +                     pmc->counter_val = kvpmu->sdata->ctr_values[i];
> > +             }
> > +
> >               if (pmc->cinfo.type == SBI_PMU_CTR_TYPE_FW) {
> >                       fevent_code = get_event_code(pmc->event_idx);
> >                       if (fevent_code >= SBI_PMU_FW_MAX) {
> > @@ -398,14 +486,21 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> >  {
> >       struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> >       int i, pmc_index, sbiret = 0;
> > +     u64 enabled, running;
> >       struct kvm_pmc *pmc;
> >       int fevent_code;
> > +     bool snap_flag_set = flags & SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT;
> >
> > -     if (kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0) {
> > +     if ((kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0)) {
>
> Added unnecessary () here.
>

Fixed.

> >               sbiret = SBI_ERR_INVALID_PARAM;
> >               goto out;
> >       }
> >
> > +     if (snap_flag_set && kvpmu->snapshot_addr == INVALID_GPA) {
> > +             sbiret = SBI_ERR_NO_SHMEM;
> > +             goto out;
> > +     }
> > +
> >       /* Stop the counters that have been configured and requested by the guest */
> >       for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
> >               pmc_index = i + ctr_base;
> > @@ -438,9 +533,28 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> >               } else {
> >                       sbiret = SBI_ERR_INVALID_PARAM;
> >               }
> > +
> > +             if (snap_flag_set && !sbiret) {
> > +                     if (pmc->cinfo.type == SBI_PMU_CTR_TYPE_FW)
> > +                             pmc->counter_val = kvpmu->fw_event[fevent_code].value;
> > +                     else if (pmc->perf_event)
> > +                             pmc->counter_val += perf_event_read_value(pmc->perf_event,
> > +                                                                       &enabled, &running);
> > +                     /* TODO: Add counter overflow support when sscofpmf support is added */
> > +                     kvpmu->sdata->ctr_values[i] = pmc->counter_val;
> > +                     kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr, kvpmu->sdata,
> > +                                          sizeof(struct riscv_pmu_snapshot_data));
>
> Should just set a boolean here saying that the snapshot needs an update
> and then do the update outside the for_each_set_bit loop.
>

Done.
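
The idea, roughly, is to record inside the loop that the snapshot changed
and copy it to the guest once afterwards. A self-contained sketch follows;
the arrays, write_snapshot_to_guest() and stop_counters() are simplified
stand-ins invented for illustration and are not the KVM code.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_COUNTERS 8

/* Simplified stand-ins for pmc->counter_val and kvpmu->sdata->ctr_values. */
static uint64_t counter_val[MAX_COUNTERS] = { 10, 20, 30 };
static uint64_t snapshot[MAX_COUNTERS];

/* Counts simulated copies into guest memory. */
static int guest_writes;

static void write_snapshot_to_guest(void)
{
	guest_writes++;
}

/* Returns how many guest writes this call performed. */
static int stop_counters(unsigned long ctr_mask)
{
	bool snapshot_dirty = false;
	int writes_before = guest_writes;
	int i;

	for (i = 0; i < MAX_COUNTERS; i++) {
		if (!(ctr_mask & (1UL << i)))
			continue;
		/* Only update the local copy inside the loop. */
		snapshot[i] = counter_val[i];
		snapshot_dirty = true;
	}

	/* One copy to the guest, no matter how many counters were stopped. */
	if (snapshot_dirty)
		write_snapshot_to_guest();

	return guest_writes - writes_before;
}
```

Stopping three counters then costs a single guest write instead of three,
and an empty mask costs none.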

> > +             }
> > +
> >               if (flags & SBI_PMU_STOP_FLAG_RESET) {
> >                       pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
> >                       clear_bit(pmc_index, kvpmu->pmc_in_use);
> > +                     if (snap_flag_set) {
> > +                             /* Clear the snapshot area for the upcoming deletion event */
> > +                             kvpmu->sdata->ctr_values[i] = 0;
> > +                             kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr, kvpmu->sdata,
> > +                                                  sizeof(struct riscv_pmu_snapshot_data));
>
> The spec isn't clear on this (so we should clarify it), but I'd expect
> that a caller who set both the reset and the snapshot flag would want
> the snapshot from before the reset when this call completes and then
> assume that when they start counting again, and look at the snapshot
> again, that those new counts would be from the reset values. Or maybe
> not :-) Maybe they want to do a reset and take a snapshot in order to
> look at the snapshot and confirm the reset happened? Either way, it
> seems we should only do one of the two here. Either update the snapshot
> before resetting, and not again after reset, or reset and then update
> the snapshot (with no need to update before).
>

The reset call should happen when the event is deleted by the perf
framework in the supervisor. If we don't clear the values, the shared
memory may contain stale data from the last read of the counters, which
is not ideal. That's why I am clearing it upon reset. The actual counter
value should be read while stopping the counters.

I thought the current description was clear enough, as it says

"SBI_PMU_STOP_FLAG_RESET - Reset the counter to event mapping."

Do you feel we should be more explicit about this?

> > +                     }
> >               }
> >       }
> >
> > @@ -566,6 +680,7 @@ void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> >       kvpmu->num_hw_ctrs = num_hw_ctrs + 1;
> >       kvpmu->num_fw_ctrs = SBI_PMU_FW_MAX;
> >       memset(&kvpmu->fw_event, 0, SBI_PMU_FW_MAX * sizeof(struct kvm_fw_event));
> > +     kvpmu->snapshot_addr = INVALID_GPA;
> >
> >       if (kvpmu->num_hw_ctrs > RISCV_KVM_MAX_HW_CTRS) {
> >               pr_warn_once("Limiting the hardware counters to 32 as specified by the ISA");
> > @@ -625,6 +740,7 @@ void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
> >       }
> >       bitmap_zero(kvpmu->pmc_in_use, RISCV_MAX_COUNTERS);
> >       memset(&kvpmu->fw_event, 0, SBI_PMU_FW_MAX * sizeof(struct kvm_fw_event));
> > +     kvm_pmu_clear_snapshot_area(vcpu);
> >  }
> >
> >  void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu)
> > diff --git a/arch/riscv/kvm/vcpu_sbi_pmu.c b/arch/riscv/kvm/vcpu_sbi_pmu.c
> > index b70179e9e875..9f61136e4bb1 100644
> > --- a/arch/riscv/kvm/vcpu_sbi_pmu.c
> > +++ b/arch/riscv/kvm/vcpu_sbi_pmu.c
> > @@ -64,6 +64,9 @@ static int kvm_sbi_ext_pmu_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> >       case SBI_EXT_PMU_COUNTER_FW_READ:
> >               ret = kvm_riscv_vcpu_pmu_ctr_read(vcpu, cp->a0, retdata);
> >               break;
> > +     case SBI_EXT_PMU_SNAPSHOT_SET_SHMEM:
> > +             ret = kvm_riscv_vcpu_pmu_setup_snapshot(vcpu, cp->a0, cp->a1, cp->a2, retdata);
> > +             break;
> >       default:
> >               retdata->err_val = SBI_ERR_NOT_SUPPORTED;
> >       }
> > diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
> > index 8de5721e8019..1a22ce1ff8c8 100644
> > --- a/drivers/perf/riscv_pmu_sbi.c
> > +++ b/drivers/perf/riscv_pmu_sbi.c
> > @@ -802,7 +802,7 @@ static noinline void pmu_sbi_start_ovf_ctrs_snapshot(struct cpu_hw_events *cpu_h
> >       struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
> >
> >       for_each_set_bit(idx, cpu_hw_evt->used_hw_ctrs, RISCV_MAX_COUNTERS) {
> > -             if (ctr_ovf_mask & (1 << idx)) {
> > +             if (ctr_ovf_mask & (BIT(idx))) {
> >                       event = cpu_hw_evt->events[idx];
> >                       hwc = &event->hw;
> >                       max_period = riscv_pmu_ctr_get_width_mask(event);
> > --
> > 2.34.1
> >
>
> Thanks,
> drew



-- 
Regards,
Atish


* Re: [PATCH v4 07/15] RISC-V: KVM: No need to exit to the user space if perf event failed
  2024-03-02  8:15   ` Andrew Jones
@ 2024-04-01 22:37     ` Atish Patra
  2024-04-04 12:16       ` Andrew Jones
  0 siblings, 1 reply; 56+ messages in thread
From: Atish Patra @ 2024-04-01 22:37 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Atish Patra, linux-kernel, Anup Patel, Albert Ou,
	Alexandre Ghiti, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Sat, Mar 2, 2024 at 12:16 AM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Wed, Feb 28, 2024 at 05:01:22PM -0800, Atish Patra wrote:
> > Currently, we return a linux error code if creating a perf event failed
> > in kvm. That shouldn't be necessary as guest can continue to operate
> > without perf profiling or profiling with firmware counters.
> >
> > Return appropriate SBI error code to indicate that PMU configuration
> > failed. An error message in kvm already describes the reason for failure.
>
> I don't know enough about the perf subsystem to know if there may be
> a concern that resources are temporarily unavailable. If so, then this

Do you mean hardware resources being unavailable because the host is using them?

> patch would make it possible for a guest to do the exact same thing,
> but sometimes succeed and sometimes get SBI_ERR_NOT_SUPPORTED.
> sbi_pmu_counter_config_matching doesn't currently have any error types
> specified that say "unsupported at the moment, maybe try again", which
> would be more appropriate in that case. I do see
> perf_event_create_kernel_counter() can return ENOMEM when memory isn't
> available, but if the kernel isn't able to allocate a small amount of
> memory, then we're in bigger trouble anyway, so the concern would be
> if there are perf resource pools which may temporarily be exhausted at
> the time the guest makes this request.
>

For other cases, this patch ensures that the guest continues to run without
failure, which allows the user in the guest to try again if the request
failed due to temporary resource unavailability.
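
In other words, the handler keeps returning 0 to the KVM run loop and
reports the failure to the guest through the SBI error value only. A
stripped-down sketch of that control flow is below; cfg_match(), struct
cfg_result and its fields are made-up names for illustration, and -2 is
the value the SBI spec assigns to SBI_ERR_NOT_SUPPORTED.

```c
#define ERR_NOT_SUPPORTED  (-2)  /* SBI_ERR_NOT_SUPPORTED */

struct cfg_result {
	int host_ret;   /* what the handler returns to KVM; non-zero exits to userspace */
	long sbi_err;   /* what the guest sees as the SBI error */
};

/* create_ret stands in for the result of creating the perf event. */
static struct cfg_result cfg_match(long create_ret)
{
	struct cfg_result res = { .host_ret = 0, .sbi_err = 0 };

	/*
	 * Any host-side failure is reported to the guest as an SBI error;
	 * the Linux error code is not propagated to userspace.
	 */
	if (create_ret)
		res.sbi_err = ERR_NOT_SUPPORTED;

	return res;
}
```

A guest that gets the SBI error back can simply retry the configuration
later.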

> One comment below.
>
> >
> > Fixes: 0cb74b65d2e5 ("RISC-V: KVM: Implement perf support without sampling")
> > Reviewed-by: Anup Patel <anup@brainfault.org>
> > Signed-off-by: Atish Patra <atishp@rivosinc.com>
> > ---
> >  arch/riscv/kvm/vcpu_pmu.c     | 14 +++++++++-----
> >  arch/riscv/kvm/vcpu_sbi_pmu.c |  6 +++---
> >  2 files changed, 12 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> > index b1574c043f77..29bf4ca798cb 100644
> > --- a/arch/riscv/kvm/vcpu_pmu.c
> > +++ b/arch/riscv/kvm/vcpu_pmu.c
> > @@ -229,8 +229,9 @@ static int kvm_pmu_validate_counter_mask(struct kvm_pmu *kvpmu, unsigned long ct
> >       return 0;
> >  }
> >
> > -static int kvm_pmu_create_perf_event(struct kvm_pmc *pmc, struct perf_event_attr *attr,
> > -                                  unsigned long flags, unsigned long eidx, unsigned long evtdata)
> > +static long kvm_pmu_create_perf_event(struct kvm_pmc *pmc, struct perf_event_attr *attr,
> > +                                   unsigned long flags, unsigned long eidx,
> > +                                   unsigned long evtdata)
> >  {
> >       struct perf_event *event;
> >
> > @@ -454,7 +455,8 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
> >                                    unsigned long eidx, u64 evtdata,
> >                                    struct kvm_vcpu_sbi_return *retdata)
> >  {
> > -     int ctr_idx, ret, sbiret = 0;
> > +     int ctr_idx, sbiret = 0;
> > +     long ret;
> >       bool is_fevent;
> >       unsigned long event_code;
> >       u32 etype = kvm_pmu_get_perf_event_type(eidx);
> > @@ -513,8 +515,10 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
> >                       kvpmu->fw_event[event_code].started = true;
> >       } else {
> >               ret = kvm_pmu_create_perf_event(pmc, &attr, flags, eidx, evtdata);
> > -             if (ret)
> > -                     return ret;
> > +             if (ret) {
> > +                     sbiret = SBI_ERR_NOT_SUPPORTED;
> > +                     goto out;
> > +             }
> >       }
> >
> >       set_bit(ctr_idx, kvpmu->pmc_in_use);
> > diff --git a/arch/riscv/kvm/vcpu_sbi_pmu.c b/arch/riscv/kvm/vcpu_sbi_pmu.c
> > index 7eca72df2cbd..b70179e9e875 100644
> > --- a/arch/riscv/kvm/vcpu_sbi_pmu.c
> > +++ b/arch/riscv/kvm/vcpu_sbi_pmu.c
> > @@ -42,9 +42,9 @@ static int kvm_sbi_ext_pmu_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> >  #endif
> >               /*
> >                * This can fail if perf core framework fails to create an event.
> > -              * Forward the error to userspace because it's an error which
> > -              * happened within the host kernel. The other option would be
> > -              * to convert to an SBI error and forward to the guest.
> > +              * No need to forward the error to userspace and exit the guest
>
> Period after guest
>
>
> > +              * operation can continue without profiling. Forward the
>
> The operation
>

Fixed the above two.


> > +              * appropriate SBI error to the guest.
> >                */
> >               ret = kvm_riscv_vcpu_pmu_ctr_cfg_match(vcpu, cp->a0, cp->a1,
> >                                                      cp->a2, cp->a3, temp, retdata);
> > --
> > 2.34.1
> >
>
> Thanks,
> drew



--
Regards,
Atish


* Re: [PATCH v4 10/15] RISC-V: KVM: Support 64 bit firmware counters on RV32
  2024-03-02 10:52   ` Andrew Jones
@ 2024-04-02  0:03     ` Atish Patra
  0 siblings, 0 replies; 56+ messages in thread
From: Atish Patra @ 2024-04-02  0:03 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Atish Patra, linux-kernel, Anup Patel, Albert Ou,
	Alexandre Ghiti, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Sat, Mar 2, 2024 at 2:52 AM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Wed, Feb 28, 2024 at 05:01:25PM -0800, Atish Patra wrote:
> > The SBI v2.0 introduced a fw_read_hi function to read 64 bit firmware
> > counters for RV32 based systems.
> >
> > Add infrastructure to support that.
> >
> > Reviewed-by: Anup Patel <anup@brainfault.org>
> > Signed-off-by: Atish Patra <atishp@rivosinc.com>
> > ---
> >  arch/riscv/include/asm/kvm_vcpu_pmu.h |  4 ++-
> >  arch/riscv/kvm/vcpu_pmu.c             | 37 ++++++++++++++++++++++++++-
> >  arch/riscv/kvm/vcpu_sbi_pmu.c         |  6 +++++
> >  3 files changed, 45 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> > index 8cb21a4f862c..e0ad27dea46c 100644
> > --- a/arch/riscv/include/asm/kvm_vcpu_pmu.h
> > +++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> > @@ -20,7 +20,7 @@ static_assert(RISCV_KVM_MAX_COUNTERS <= 64);
> >
> >  struct kvm_fw_event {
> >       /* Current value of the event */
> > -     unsigned long value;
> > +     u64 value;
> >
> >       /* Event monitoring status */
> >       bool started;
> > @@ -91,6 +91,8 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
> >                                    struct kvm_vcpu_sbi_return *retdata);
> >  int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> >                               struct kvm_vcpu_sbi_return *retdata);
> > +int kvm_riscv_vcpu_pmu_fw_ctr_read_hi(struct kvm_vcpu *vcpu, unsigned long cidx,
> > +                                   struct kvm_vcpu_sbi_return *retdata);
> >  void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu);
> >  int kvm_riscv_vcpu_pmu_setup_snapshot(struct kvm_vcpu *vcpu, unsigned long saddr_low,
> >                                     unsigned long saddr_high, unsigned long flags,
> > diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> > index a02f7b981005..469bb430cf97 100644
> > --- a/arch/riscv/kvm/vcpu_pmu.c
> > +++ b/arch/riscv/kvm/vcpu_pmu.c
> > @@ -196,6 +196,29 @@ static int pmu_get_pmc_index(struct kvm_pmu *pmu, unsigned long eidx,
> >       return kvm_pmu_get_programmable_pmc_index(pmu, eidx, cbase, cmask);
> >  }
> >
> > +static int pmu_fw_ctr_read_hi(struct kvm_vcpu *vcpu, unsigned long cidx,
> > +                           unsigned long *out_val)
> > +{
> > +     struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > +     struct kvm_pmc *pmc;
> > +     int fevent_code;
> > +
> > +     if (!IS_ENABLED(CONFIG_32BIT))
>
> Let's remove the CONFIG_32BIT check in kvm_sbi_ext_pmu_handler() and then
> set *out_val to zero here and return success. Either that, or we should
> WARN or something here since it's a KVM bug to get here with
> !CONFIG_32BIT.
>

I added a warning here to catch any such KVM bug. Silently returning
with out_val set to zero may hide it.

The CONFIG_32BIT check in kvm_sbi_ext_pmu_handler() also avoids executing
unnecessary code (even though it is only a few lines) in case the lower
privilege mode software invokes read_hi by mistake on non-RV32.
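
As a reminder of what read_hi is for: on RV32 a 64-bit firmware counter
has to be fetched in two 32-bit halves. A small sketch of the split and
the recombination on the caller side is below; the helper names are
hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Lower 32 bits, i.e. what a plain firmware counter read returns on RV32. */
static uint32_t fw_ctr_lo(uint64_t counter_val)
{
	return (uint32_t)counter_val;
}

/* Upper 32 bits, i.e. what the read_hi call is meant to return on RV32. */
static uint32_t fw_ctr_hi(uint64_t counter_val)
{
	return (uint32_t)(counter_val >> 32);
}

/* How a caller could rebuild the 64-bit value from the two reads. */
static uint64_t fw_ctr_combine(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}
```

On RV64 a single read already returns the full value, so the upper half
carries no extra information there.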


> > +             return -EINVAL;
> > +
> > +     pmc = &kvpmu->pmc[cidx];
>
> Uh oh! We're missing range validation of cidx! And I see we're missing it
> in pmu_ctr_read() too. We need the same check we have in
> kvm_riscv_vcpu_pmu_ctr_info(). I think the other SBI functions are OK,
> but it's worth a triple check.
>

Good catch. Thanks. Fixed it.
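
For illustration, the kind of bounds check meant here looks roughly like
the sketch below. The exact condition used in
kvm_riscv_vcpu_pmu_ctr_info() is not shown in this thread, so the simple
"index below the counter count" test, the counters[] array and ctr_read()
are assumptions made for this example. The struct mirrors the SBI
error/value return pair.

```c
#define NUM_COUNTERS       8
#define ERR_INVALID_PARAM  (-3)  /* SBI_ERR_INVALID_PARAM */

/* Error/value pair as returned by SBI calls. */
struct sbiret {
	long error;
	long value;
};

/* Stand-in for the per-vcpu counter array. */
static const long counters[NUM_COUNTERS] = { 10, 20, 30, 40, 50, 60, 70, 80 };

static struct sbiret ctr_read(unsigned long cidx)
{
	struct sbiret ret = { .error = 0, .value = 0 };

	/* Reject out-of-range indices before touching the array. */
	if (cidx >= NUM_COUNTERS) {
		ret.error = ERR_INVALID_PARAM;
		return ret;
	}

	ret.value = counters[cidx];
	return ret;
}
```

Because cidx comes straight from a guest register, the check has to
happen before any array access.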

> > +
> > +     if (pmc->cinfo.type != SBI_PMU_CTR_TYPE_FW)
> > +             return -EINVAL;
> > +
> > +     fevent_code = get_event_code(pmc->event_idx);
> > +     pmc->counter_val = kvpmu->fw_event[fevent_code].value;
> > +
> > +     *out_val = pmc->counter_val >> 32;
> > +
> > +     return 0;
> > +}
> > +
> >  static int pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> >                       unsigned long *out_val)
> >  {
> > @@ -702,6 +725,18 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
> >       return 0;
> >  }
> >
> > +int kvm_riscv_vcpu_pmu_fw_ctr_read_hi(struct kvm_vcpu *vcpu, unsigned long cidx,
> > +                                   struct kvm_vcpu_sbi_return *retdata)
> > +{
> > +     int ret;
> > +
> > +     ret = pmu_fw_ctr_read_hi(vcpu, cidx, &retdata->out_val);
> > +     if (ret == -EINVAL)
> > +             retdata->err_val = SBI_ERR_INVALID_PARAM;
> > +
> > +     return 0;
>
> I see this follows the pattern we have with kvm_riscv_vcpu_pmu_ctr_read
> and pmu_ctr_read, but I wonder if we really need the
> kvm_riscv_vcpu_pmu_ctr_read() and kvm_riscv_vcpu_pmu_fw_ctr_read_hi()
> wrapper functions?
>

pmu_ctr_read is invoked from kvm_riscv_vcpu_pmu_read_hpm as well.
That's why I have a wrapper, kvm_riscv_vcpu_pmu_ctr_read, to read the
counters in the SBI path.

kvm_riscv_vcpu_pmu_fw_ctr_read_hi just followed the pattern.

If we refactor the firmware counter read and the hpmcounter read into
separate functions, we won't need the wrapper. But I am not sure
it will actually improve code readability.

If you think it's better that way, I will modify it.

Looking at this code, we should definitely rename
kvm_riscv_vcpu_pmu_ctr_read to kvm_riscv_vcpu_pmu_fw_ctr_read
to reflect its real purpose.

> > +}
> > +
> >  int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> >                               struct kvm_vcpu_sbi_return *retdata)
> >  {
> > @@ -775,7 +810,7 @@ void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> >                       pmc->cinfo.csr = CSR_CYCLE + i;
> >               } else {
> >                       pmc->cinfo.type = SBI_PMU_CTR_TYPE_FW;
> > -                     pmc->cinfo.width = BITS_PER_LONG - 1;
> > +                     pmc->cinfo.width = 63;
> >               }
> >       }
> >
> > diff --git a/arch/riscv/kvm/vcpu_sbi_pmu.c b/arch/riscv/kvm/vcpu_sbi_pmu.c
> > index 9f61136e4bb1..58a0e5587e2a 100644
> > --- a/arch/riscv/kvm/vcpu_sbi_pmu.c
> > +++ b/arch/riscv/kvm/vcpu_sbi_pmu.c
> > @@ -64,6 +64,12 @@ static int kvm_sbi_ext_pmu_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> >       case SBI_EXT_PMU_COUNTER_FW_READ:
> >               ret = kvm_riscv_vcpu_pmu_ctr_read(vcpu, cp->a0, retdata);
> >               break;
> > +     case SBI_EXT_PMU_COUNTER_FW_READ_HI:
> > +             if (IS_ENABLED(CONFIG_32BIT))
> > +                     ret = kvm_riscv_vcpu_pmu_fw_ctr_read_hi(vcpu, cp->a0, retdata);
> > +             else
> > +                     retdata->out_val = 0;
> > +             break;
> >       case SBI_EXT_PMU_SNAPSHOT_SET_SHMEM:
> >               ret = kvm_riscv_vcpu_pmu_setup_snapshot(vcpu, cp->a0, cp->a1, cp->a2, retdata);
> >               break;
> > --
> > 2.34.1
> >
>
> Thanks,
> drew



-- 
Regards,
Atish


* Re: [PATCH v4 09/15] RISC-V: KVM: Add perf sampling support for guests
  2024-03-02 10:33   ` Andrew Jones
@ 2024-04-02  8:33     ` Atish Patra
  2024-04-05 12:05       ` Andrew Jones
  0 siblings, 1 reply; 56+ messages in thread
From: Atish Patra @ 2024-04-02  8:33 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-kernel, Anup Patel, Albert Ou, Alexandre Ghiti,
	Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On 3/2/24 02:33, Andrew Jones wrote:
> On Wed, Feb 28, 2024 at 05:01:24PM -0800, Atish Patra wrote:
>> KVM enables perf for guest via counter virtualization. However, the
>> sampling can not be supported as there is no mechanism to enabled
>> trap/emulate scountovf in ISA yet. Rely on the SBI PMU snapshot
>> to provide the counter overflow data via the shared memory.
>>
>> In case of sampling event, the host first guest the LCOFI interrupt
>         
> s/guest the LCOFI/sets the guest's LCOFI/
> 
>> and injects to the guest via irq filtering mechanism defined in AIA
>> specification. Thus, ssaia must be enabled in the host in order to
>> use perf sampling in the guest. No other AIA dpeendancy w.r.t kernel
> 
> dependency
> 

Fixed both.

>> is required.
>>
>> Reviewed-by: Anup Patel <anup@brainfault.org>
>> Signed-off-by: Atish Patra <atishp@rivosinc.com>
>> ---
>>   arch/riscv/include/asm/csr.h          |  3 +-
>>   arch/riscv/include/asm/kvm_vcpu_pmu.h |  3 ++
>>   arch/riscv/include/uapi/asm/kvm.h     |  1 +
>>   arch/riscv/kvm/aia.c                  |  5 ++
>>   arch/riscv/kvm/vcpu.c                 | 14 ++++--
>>   arch/riscv/kvm/vcpu_onereg.c          |  9 +++-
>>   arch/riscv/kvm/vcpu_pmu.c             | 72 ++++++++++++++++++++++++---
>>   7 files changed, 96 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
>> index 603e5a3c61f9..c0de2fd6c564 100644
>> --- a/arch/riscv/include/asm/csr.h
>> +++ b/arch/riscv/include/asm/csr.h
>> @@ -168,7 +168,8 @@
>>   #define VSIP_TO_HVIP_SHIFT	(IRQ_VS_SOFT - IRQ_S_SOFT)
>>   #define VSIP_VALID_MASK		((_AC(1, UL) << IRQ_S_SOFT) | \
>>   				 (_AC(1, UL) << IRQ_S_TIMER) | \
>> -				 (_AC(1, UL) << IRQ_S_EXT))
>> +				 (_AC(1, UL) << IRQ_S_EXT) | \
>> +				 (_AC(1, UL) << IRQ_PMU_OVF))
>>   
>>   /* AIA CSR bits */
>>   #define TOPI_IID_SHIFT		16
>> diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
>> index 586bab84be35..8cb21a4f862c 100644
>> --- a/arch/riscv/include/asm/kvm_vcpu_pmu.h
>> +++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
>> @@ -36,6 +36,7 @@ struct kvm_pmc {
>>   	bool started;
>>   	/* Monitoring event ID */
>>   	unsigned long event_idx;
>> +	struct kvm_vcpu *vcpu;
>>   };
>>   
>>   /* PMU data structure per vcpu */
>> @@ -50,6 +51,8 @@ struct kvm_pmu {
>>   	bool init_done;
>>   	/* Bit map of all the virtual counter used */
>>   	DECLARE_BITMAP(pmc_in_use, RISCV_KVM_MAX_COUNTERS);
>> +	/* Bit map of all the virtual counter overflown */
>> +	DECLARE_BITMAP(pmc_overflown, RISCV_KVM_MAX_COUNTERS);
>>   	/* The address of the counter snapshot area (guest physical address) */
>>   	gpa_t snapshot_addr;
>>   	/* The actual data of the snapshot */
>> diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
>> index 7499e88a947c..e8b7545f1803 100644
>> --- a/arch/riscv/include/uapi/asm/kvm.h
>> +++ b/arch/riscv/include/uapi/asm/kvm.h
>> @@ -166,6 +166,7 @@ enum KVM_RISCV_ISA_EXT_ID {
>>   	KVM_RISCV_ISA_EXT_ZVFH,
>>   	KVM_RISCV_ISA_EXT_ZVFHMIN,
>>   	KVM_RISCV_ISA_EXT_ZFA,
>> +	KVM_RISCV_ISA_EXT_SSCOFPMF,
>>   	KVM_RISCV_ISA_EXT_MAX,
>>   };
>>   
>> diff --git a/arch/riscv/kvm/aia.c b/arch/riscv/kvm/aia.c
>> index a944294f6f23..0f0a9d11bb5f 100644
>> --- a/arch/riscv/kvm/aia.c
>> +++ b/arch/riscv/kvm/aia.c
>> @@ -545,6 +545,9 @@ void kvm_riscv_aia_enable(void)
>>   	enable_percpu_irq(hgei_parent_irq,
>>   			  irq_get_trigger_type(hgei_parent_irq));
>>   	csr_set(CSR_HIE, BIT(IRQ_S_GEXT));
>> +	/* Enable IRQ filtering for overflow interrupt only if sscofpmf is present */
>> +	if (__riscv_isa_extension_available(NULL, RISCV_ISA_EXT_SSCOFPMF))
>> +		csr_write(CSR_HVIEN, BIT(IRQ_PMU_OVF));
>>   }
>>   
>>   void kvm_riscv_aia_disable(void)
>> @@ -558,6 +561,8 @@ void kvm_riscv_aia_disable(void)
>>   		return;
>>   	hgctrl = get_cpu_ptr(&aia_hgei);
>>   
>> +	if (__riscv_isa_extension_available(NULL, RISCV_ISA_EXT_SSCOFPMF))
>> +		csr_clear(CSR_HVIEN, BIT(IRQ_PMU_OVF));
>>   	/* Disable per-CPU SGEI interrupt */
>>   	csr_clear(CSR_HIE, BIT(IRQ_S_GEXT));
>>   	disable_percpu_irq(hgei_parent_irq);
>> diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
>> index b5ca9f2e98ac..fcd8ad4de4d2 100644
>> --- a/arch/riscv/kvm/vcpu.c
>> +++ b/arch/riscv/kvm/vcpu.c
>> @@ -365,6 +365,12 @@ void kvm_riscv_vcpu_sync_interrupts(struct kvm_vcpu *vcpu)
>>   		}
>>   	}
>>   
>> +	/* Sync up the HVIP.LCOFIP bit changes (only clear) by the guest */
>> +	if ((csr->hvip ^ hvip) & (1UL << IRQ_PMU_OVF)) {
>> +		if (!test_and_set_bit(IRQ_PMU_OVF, v->irqs_pending_mask))
>> +			clear_bit(IRQ_PMU_OVF, v->irqs_pending);
>> +	}
>> +
>>   	/* Sync-up AIA high interrupts */
>>   	kvm_riscv_vcpu_aia_sync_interrupts(vcpu);
>>   
>> @@ -382,7 +388,8 @@ int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
>>   	if (irq < IRQ_LOCAL_MAX &&
>>   	    irq != IRQ_VS_SOFT &&
>>   	    irq != IRQ_VS_TIMER &&
>> -	    irq != IRQ_VS_EXT)
>> +	    irq != IRQ_VS_EXT &&
>> +	    irq != IRQ_PMU_OVF)
>>   		return -EINVAL;
>>   
>>   	set_bit(irq, vcpu->arch.irqs_pending);
>> @@ -397,14 +404,15 @@ int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
>>   int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
>>   {
>>   	/*
>> -	 * We only allow VS-mode software, timer, and external
>> +	 * We only allow VS-mode software, timer, counter overflow and external
>>   	 * interrupts when irq is one of the local interrupts
>>   	 * defined by RISC-V privilege specification.
>>   	 */
>>   	if (irq < IRQ_LOCAL_MAX &&
>>   	    irq != IRQ_VS_SOFT &&
>>   	    irq != IRQ_VS_TIMER &&
>> -	    irq != IRQ_VS_EXT)
>> +	    irq != IRQ_VS_EXT &&
>> +	    irq != IRQ_PMU_OVF)
>>   		return -EINVAL;
>>   
>>   	clear_bit(irq, vcpu->arch.irqs_pending);
>> diff --git a/arch/riscv/kvm/vcpu_onereg.c b/arch/riscv/kvm/vcpu_onereg.c
>> index 5f7355e96008..a072910820c2 100644
>> --- a/arch/riscv/kvm/vcpu_onereg.c
>> +++ b/arch/riscv/kvm/vcpu_onereg.c
>> @@ -36,6 +36,7 @@ static const unsigned long kvm_isa_ext_arr[] = {
>>   	/* Multi letter extensions (alphabetically sorted) */
>>   	KVM_ISA_EXT_ARR(SMSTATEEN),
>>   	KVM_ISA_EXT_ARR(SSAIA),
>> +	KVM_ISA_EXT_ARR(SSCOFPMF),
>>   	KVM_ISA_EXT_ARR(SSTC),
>>   	KVM_ISA_EXT_ARR(SVINVAL),
>>   	KVM_ISA_EXT_ARR(SVNAPOT),
>> @@ -115,6 +116,7 @@ static bool kvm_riscv_vcpu_isa_disable_allowed(unsigned long ext)
>>   	case KVM_RISCV_ISA_EXT_I:
>>   	case KVM_RISCV_ISA_EXT_M:
>>   	case KVM_RISCV_ISA_EXT_SSTC:
>> +	case KVM_RISCV_ISA_EXT_SSCOFPMF:
> 
> It should go above SSTC to keep the alphabet happy,
> 

Fixed.

> but it should be possible for the VMM to disable this extension in the
> guest. We just need to change all the checks in KVM of the host's ISA
> for RISCV_ISA_EXT_SSCOFPMF to checking the guest's ISA instead. Maybe
> it's not worth it, though, if the guest PMU isn't useful without overflow.
> But, sometimes it's nice to be able to disable stuff for debug and
> workarounds.
> 

As per my understanding, kvm_riscv_vcpu_isa_disable_allowed only returns
true for those extensions which can be disabled architecturally.

The VMM can still disable any extension by not adding it to the device tree.
In fact, that's how kvmtool disables sstc or sscofpmf with
--disable-<isa-ext command>.

The warning is a bit confusing, though.

For example: if you run kvmtool with --disable-sstc

"Warning: Failed to disable sstc ISA exension"

But sstc is disabled: Here is the cpuinfo output.
# cat /proc/cpuinfo
processor       : 0
hart            : 0
isa             : rv64imafdc_zicbom_zicboz_zicntr_zicsr_zifencei_zihintntl_zihintpause_zihpm_zfa_zba_zbb_zbc_zbs_smstateen_sscofpmf
mmu             : sv57
mvendorid       : 0x0
marchid         : 0x0
mimpid          : 0x0
hart isa        : rv64imafdc_zicbom_zicboz_zicntr_zicsr_zifencei_zihintntl_zihintpause_zihpm_zfa_zba_zbb_zbc_zbs_smstateen_sscofpmf


Let me know if I misunderstood your comment.

>>   	case KVM_RISCV_ISA_EXT_SVINVAL:
>>   	case KVM_RISCV_ISA_EXT_SVNAPOT:
>>   	case KVM_RISCV_ISA_EXT_ZBA:
>> @@ -171,8 +173,13 @@ void kvm_riscv_vcpu_setup_isa(struct kvm_vcpu *vcpu)
>>   	for (i = 0; i < ARRAY_SIZE(kvm_isa_ext_arr); i++) {
>>   		host_isa = kvm_isa_ext_arr[i];
>>   		if (__riscv_isa_extension_available(NULL, host_isa) &&
>> -		    kvm_riscv_vcpu_isa_enable_allowed(i))
>> +		    kvm_riscv_vcpu_isa_enable_allowed(i)) {
>> +			/* Sscofpmf depends on interrupt filtering defined in ssaia */
>> +			if (host_isa == RISCV_ISA_EXT_SSCOFPMF &&
>> +			    !__riscv_isa_extension_available(NULL, RISCV_ISA_EXT_SSAIA))
>> +				continue;
> 
> We shouldn't need to change kvm_riscv_vcpu_setup_isa(). We just need to
> add a case for KVM_RISCV_ISA_EXT_SSCOFPMF to
> kvm_riscv_vcpu_isa_enable_allowed().
> 

Good point. Done.

>>   			set_bit(host_isa, vcpu->arch.isa);
>> +		}
>>   	}
>>   }
>>   
>> diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
>> index 74865e6050a1..a02f7b981005 100644
>> --- a/arch/riscv/kvm/vcpu_pmu.c
>> +++ b/arch/riscv/kvm/vcpu_pmu.c
>> @@ -39,7 +39,7 @@ static u64 kvm_pmu_get_sample_period(struct kvm_pmc *pmc)
>>   	u64 sample_period;
>>   
>>   	if (!pmc->counter_val)
>> -		sample_period = counter_val_mask + 1;
>> +		sample_period = counter_val_mask;
> 
> This change looks unrelated.
> 
Technically, it is related: once counter overflow is enabled, the old
code is problematic because counter_val_mask + 1 wraps to 0 when the
mask is 0xFFFFFFFFFFFFFFFF. That gives a sample_period of 0 and
generates spurious interrupts.

I can create a separate patch with the above explanation.
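
To illustrate the wrap-around, here is a minimal standalone sketch (not
the kernel code). The names old_sample_period() and new_sample_period()
are hypothetical; they only mirror the two variants of the
!pmc->counter_val branch in the hunk above, with uint64_t standing in
for u64.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the pre-patch logic: with a full 64-bit
 * counter, mask + 1 wraps to 0 in unsigned arithmetic.
 */
static uint64_t old_sample_period(uint64_t counter_val, uint64_t counter_val_mask)
{
	if (!counter_val)
		return counter_val_mask + 1;
	return (-counter_val) & counter_val_mask;
}

/*
 * Hypothetical stand-in for the patched logic: a zero counter value
 * yields the mask itself, which is non-zero for any non-zero mask.
 */
static uint64_t new_sample_period(uint64_t counter_val, uint64_t counter_val_mask)
{
	if (!counter_val)
		return counter_val_mask;
	return (-counter_val) & counter_val_mask;
}
```

For narrower counters (for example a 32-bit mask) the old expression
does not wrap, so only the full-width case is affected.
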


>>   	else
>>   		sample_period = (-pmc->counter_val) & counter_val_mask;
>>   
>> @@ -229,6 +229,47 @@ static int kvm_pmu_validate_counter_mask(struct kvm_pmu *kvpmu, unsigned long ct
>>   	return 0;
>>   }
>>   
>> +static void kvm_riscv_pmu_overflow(struct perf_event *perf_event,
>> +				   struct perf_sample_data *data,
>> +				   struct pt_regs *regs)
>> +{
>> +	struct kvm_pmc *pmc = perf_event->overflow_handler_context;
>> +	struct kvm_vcpu *vcpu = pmc->vcpu;
>> +	struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
>> +	struct riscv_pmu *rpmu = to_riscv_pmu(perf_event->pmu);
>> +	u64 period;
>> +
>> +	/*
>> +	 * Stop the event counting by directly accessing the perf_event.
>> +	 * Otherwise, this needs to deferred via a workqueue.
>> +	 * That will introduce skew in the counter value because the actual
>> +	 * physical counter would start after returning from this function.
>> +	 * It will be stopped again once the workqueue is scheduled
>> +	 */
>> +	rpmu->pmu.stop(perf_event, PERF_EF_UPDATE);
>> +
>> +	/*
>> +	 * The hw counter would start automatically when this function returns.
>> +	 * Thus, the host may continue to interrupt and inject it to the guest
>> +	 * even without the guest configuring the next event. Depending on the hardware
>> +	 * the host may have some sluggishness only if privilege mode filtering is not
>> +	 * available. In an ideal world, where qemu is not the only capable hardware,
>> +	 * this can be removed.
>> +	 * FYI: ARM64 does this way while x86 doesn't do anything as such.
>> +	 * TODO: Should we keep it for RISC-V ?
>> +	 */
>> +	period = -(local64_read(&perf_event->count));
>> +
>> +	local64_set(&perf_event->hw.period_left, 0);
>> +	perf_event->attr.sample_period = period;
>> +	perf_event->hw.sample_period = period;
>> +
>> +	set_bit(pmc->idx, kvpmu->pmc_overflown);
>> +	kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_PMU_OVF);
>> +
>> +	rpmu->pmu.start(perf_event, PERF_EF_RELOAD);
>> +}
>> +
>>   static long kvm_pmu_create_perf_event(struct kvm_pmc *pmc, struct perf_event_attr *attr,
>>   				      unsigned long flags, unsigned long eidx,
>>   				      unsigned long evtdata)
>> @@ -248,7 +289,7 @@ static long kvm_pmu_create_perf_event(struct kvm_pmc *pmc, struct perf_event_att
>>   	 */
>>   	attr->sample_period = kvm_pmu_get_sample_period(pmc);
>>   
>> -	event = perf_event_create_kernel_counter(attr, -1, current, NULL, pmc);
>> +	event = perf_event_create_kernel_counter(attr, -1, current, kvm_riscv_pmu_overflow, pmc);
>>   	if (IS_ERR(event)) {
>>   		pr_err("kvm pmu event creation failed for eidx %lx: %ld\n", eidx, PTR_ERR(event));
>>   		return PTR_ERR(event);
>> @@ -436,6 +477,8 @@ int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
>>   		pmc_index = i + ctr_base;
>>   		if (!test_bit(pmc_index, kvpmu->pmc_in_use))
>>   			continue;
>> +		/* The guest started the counter again. Reset the overflow status */
>> +		clear_bit(pmc_index, kvpmu->pmc_overflown);
>>   		pmc = &kvpmu->pmc[pmc_index];
>>   		if (flags & SBI_PMU_START_FLAG_SET_INIT_VALUE) {
>>   			pmc->counter_val = ival;
>> @@ -474,6 +517,10 @@ int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
>>   		}
>>   	}
>>   
>> +	/* The guest have serviced the interrupt and starting the counter again */
>> +	if (test_bit(IRQ_PMU_OVF, vcpu->arch.irqs_pending))
>> +		kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_PMU_OVF);
>> +
>>   out:
>>   	retdata->err_val = sbiret;
>>   
>> @@ -540,7 +587,13 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
>>   			else if (pmc->perf_event)
>>   				pmc->counter_val += perf_event_read_value(pmc->perf_event,
>>   									  &enabled, &running);
>> -			/* TODO: Add counter overflow support when sscofpmf support is added */
>> +			/*
>> +			 * The counter and overflow indicies in the snapshot region are w.r.to
>> +			 * cbase. Modify the set bit in the counter mask instead of the pmc_index
>> +			 * which indicates the absolute counter index.
>> +			 */
>> +			if (test_bit(pmc_index, kvpmu->pmc_overflown))
>> +				kvpmu->sdata->ctr_overflow_mask |= (1UL << i);
> 
> Just in case you missed this one; BIT()
> 
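
For reference, the relative-index bookkeeping described in the comment
above could be written with BIT() roughly as follows. This is only a
sketch under simplifying assumptions: snapshot_overflow_mask() is a
hypothetical helper, the overflow state is a plain 64-bit word rather
than the kernel bitmap, and BIT() is defined locally.

```c
#include <stdint.h>

/* Local definition for this sketch; the kernel provides its own BIT(). */
#define BIT(nr) (1ULL << (nr))

/*
 * Hypothetical helper: translate absolute overflow bits (indexed by
 * pmc_index) into a mask indexed relative to ctr_base, which is how
 * the snapshot area is laid out.
 */
static uint64_t snapshot_overflow_mask(uint64_t pmc_overflown,
				       unsigned int ctr_base,
				       uint64_t ctr_mask)
{
	uint64_t mask = 0;
	unsigned int i;

	for (i = 0; i < 64; i++) {
		unsigned int pmc_index = i + ctr_base;

		/* Skip counters not selected by the mask or out of range. */
		if (!(ctr_mask & BIT(i)) || pmc_index >= 64)
			continue;
		if (pmc_overflown & BIT(pmc_index))
			mask |= BIT(i);
	}

	return mask;
}
```

With ctr_base = 3 and absolute counter 5 overflown, the bit that ends
up set in the snapshot mask is bit 2, not bit 5.
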
>>   			kvpmu->sdata->ctr_values[i] = pmc->counter_val;
>>   			kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr, kvpmu->sdata,
>>   					     sizeof(struct riscv_pmu_snapshot_data));
>> @@ -549,15 +602,20 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
>>   		if (flags & SBI_PMU_STOP_FLAG_RESET) {
>>   			pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
>>   			clear_bit(pmc_index, kvpmu->pmc_in_use);
>> +			clear_bit(pmc_index, kvpmu->pmc_overflown);
>>   			if (snap_flag_set) {
>>   				/* Clear the snapshot area for the upcoming deletion event */
>>   				kvpmu->sdata->ctr_values[i] = 0;
>> +				/*
>> +				 * Only clear the given counter as the caller is responsible to
>> +				 * validate both the overflow mask and configured counters.
>> +				 */
>> +				kvpmu->sdata->ctr_overflow_mask &= ~(1UL << i);
> 
> And another BIT()
> 
>>   				kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr, kvpmu->sdata,
>>   						     sizeof(struct riscv_pmu_snapshot_data));
>>   			}
>>   		}
>>   	}
>> -
>>   out:
>>   	retdata->err_val = sbiret;
>>   
>> @@ -700,6 +758,7 @@ void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
>>   		pmc = &kvpmu->pmc[i];
>>   		pmc->idx = i;
>>   		pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
>> +		pmc->vcpu = vcpu;
>>   		if (i < kvpmu->num_hw_ctrs) {
>>   			pmc->cinfo.type = SBI_PMU_CTR_TYPE_HW;
>>   			if (i < 3)
>> @@ -732,13 +791,14 @@ void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
>>   	if (!kvpmu)
>>   		return;
>>   
>> -	for_each_set_bit(i, kvpmu->pmc_in_use, RISCV_MAX_COUNTERS) {
>> +	for_each_set_bit(i, kvpmu->pmc_in_use, RISCV_KVM_MAX_COUNTERS) {
>>   		pmc = &kvpmu->pmc[i];
>>   		pmc->counter_val = 0;
>>   		kvm_pmu_release_perf_event(pmc);
>>   		pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
>>   	}
>> -	bitmap_zero(kvpmu->pmc_in_use, RISCV_MAX_COUNTERS);
>> +	bitmap_zero(kvpmu->pmc_in_use, RISCV_KVM_MAX_COUNTERS);
> 
> Ideally the RISCV_MAX_COUNTERS change would go in a separate patch,
> but 64 == 64, so OK.
> 
>> +	bitmap_zero(kvpmu->pmc_overflown, RISCV_KVM_MAX_COUNTERS);
>>   	memset(&kvpmu->fw_event, 0, SBI_PMU_FW_MAX * sizeof(struct kvm_fw_event));
>>   	kvm_pmu_clear_snapshot_area(vcpu);
>>   }
>> -- 
>> 2.34.1
>>
> 
> Thanks,
> drew



* Re: [PATCH v4 13/15] KVM: riscv: selftests: Add SBI PMU selftest
  2024-03-02 11:52   ` Andrew Jones
@ 2024-04-02  8:34     ` Atish Patra
  2024-04-05 12:48       ` Andrew Jones
  0 siblings, 1 reply; 56+ messages in thread
From: Atish Patra @ 2024-04-02  8:34 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Atish Patra, linux-kernel, Albert Ou, Alexandre Ghiti,
	Anup Patel, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv, kvm,
	linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Sat, Mar 2, 2024 at 3:52 AM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Wed, Feb 28, 2024 at 05:01:28PM -0800, Atish Patra wrote:
> > This test implements basic sanity test and cycle/instret event
> > counting tests.
> >
> > Signed-off-by: Atish Patra <atishp@rivosinc.com>
> > ---
> >  tools/testing/selftests/kvm/Makefile        |   1 +
> >  tools/testing/selftests/kvm/riscv/sbi_pmu.c | 340 ++++++++++++++++++++
> >  2 files changed, 341 insertions(+)
> >  create mode 100644 tools/testing/selftests/kvm/riscv/sbi_pmu.c
> >
> > diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> > index 426f85798aea..b2dce6843b9e 100644
> > --- a/tools/testing/selftests/kvm/Makefile
> > +++ b/tools/testing/selftests/kvm/Makefile
> > @@ -195,6 +195,7 @@ TEST_GEN_PROGS_riscv += kvm_create_max_vcpus
> >  TEST_GEN_PROGS_riscv += kvm_page_table_test
> >  TEST_GEN_PROGS_riscv += set_memory_region_test
> >  TEST_GEN_PROGS_riscv += steal_time
> > +TEST_GEN_PROGS_riscv += riscv/sbi_pmu
>
> We put the
>
>  TEST_GEN_PROGS_riscv += riscv/...
>
> lines at the top of the
>
>  TEST_GEN_PROGS_riscv += ...
>

Done.

> set
>
> >
> >  SPLIT_TESTS += arch_timer
> >  SPLIT_TESTS += get-reg-list
> > diff --git a/tools/testing/selftests/kvm/riscv/sbi_pmu.c b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
> > new file mode 100644
> > index 000000000000..fc1fc5eea99e
> > --- /dev/null
> > +++ b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
> > @@ -0,0 +1,340 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * arch_timer.c - Tests the riscv64 sstc timer IRQ functionality
> > + *
> > + * The test validates the sstc timer IRQs using vstimecmp registers.
> > + * It's ported from the aarch64 arch_timer test.
>
> The header (apparently borrowed from arch_timer.c) needs to be updated
> to talk about the pmu instead of the timer.
>

Oops. Thanks for catching it. Fixed it.

> > + *
> > + * Copyright (c) 2024, Rivos Inc.
> > + */
> > +
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <unistd.h>
> > +#include <sys/types.h>
> > +#include "kvm_util.h"
> > +#include "test_util.h"
> > +#include "processor.h"
> > +
> > +/* Maximum counters (firmware + hardware)*/
>                                             ^ space
>
> > +#define RISCV_MAX_PMU_COUNTERS 64
> > +union sbi_pmu_ctr_info ctrinfo_arr[RISCV_MAX_PMU_COUNTERS];
> > +
> > +/* Cache the available counters in a bitmask */
> > +static unsigned long counter_mask_available;
> > +
> > +unsigned long pmu_csr_read_num(int csr_num)
> > +{
> > +#define switchcase_csr_read(__csr_num, __val)                {\
> > +     case __csr_num:                                 \
> > +             __val = csr_read(__csr_num);            \
> > +             break; }
> > +#define switchcase_csr_read_2(__csr_num, __val)              {\
> > +     switchcase_csr_read(__csr_num + 0, __val)        \
> > +     switchcase_csr_read(__csr_num + 1, __val)}
> > +#define switchcase_csr_read_4(__csr_num, __val)              {\
> > +     switchcase_csr_read_2(__csr_num + 0, __val)      \
> > +     switchcase_csr_read_2(__csr_num + 2, __val)}
> > +#define switchcase_csr_read_8(__csr_num, __val)              {\
> > +     switchcase_csr_read_4(__csr_num + 0, __val)      \
> > +     switchcase_csr_read_4(__csr_num + 4, __val)}
> > +#define switchcase_csr_read_16(__csr_num, __val)     {\
> > +     switchcase_csr_read_8(__csr_num + 0, __val)      \
> > +     switchcase_csr_read_8(__csr_num + 8, __val)}
> > +#define switchcase_csr_read_32(__csr_num, __val)     {\
> > +     switchcase_csr_read_16(__csr_num + 0, __val)     \
> > +     switchcase_csr_read_16(__csr_num + 16, __val)}
> > +
> > +     unsigned long ret = 0;
> > +
> > +     switch (csr_num) {
> > +     switchcase_csr_read_32(CSR_CYCLE, ret)
> > +     switchcase_csr_read_32(CSR_CYCLEH, ret)
> > +     default :
> > +             break;
> > +     }
> > +
> > +     return ret;
> > +#undef switchcase_csr_read_32
> > +#undef switchcase_csr_read_16
> > +#undef switchcase_csr_read_8
> > +#undef switchcase_csr_read_4
> > +#undef switchcase_csr_read_2
> > +#undef switchcase_csr_read
> > +}
> > +
> > +static inline void dummy_func_loop(int iter)
> > +{
> > +     int i = 0;
> > +
> > +     while (i < iter) {
> > +             asm volatile("nop");
> > +             i++;
> > +     }
> > +}
> > +
> > +static void guest_illegal_exception_handler(struct ex_regs *regs)
> > +{
> > +     __GUEST_ASSERT(regs->cause == EXC_INST_ILLEGAL,
> > +                    "Unexpected exception handler %lx\n", regs->cause);
>
> Shouldn't we be reporting somehow that we were here? We seem to be using
> this handler to skip instructions which don't work, which is fine, if
> we have some knowledge we skipped them and then do something else.
> Otherwise I don't understand.
>

This is only used in test_vm_basic_test to validate that the guest
gets an illegal instruction exception if it tries to read a counter
without configuring it first.

Any other test that validates the functionality will not use it.

> > +
> > +     /* skip the trapping instruction */
> > +     regs->epc += 4;
> > +}
> > +
> > +static unsigned long get_counter_index(unsigned long cbase, unsigned long cmask,
> > +                                    unsigned long cflags,
> > +                                    unsigned long event)
> > +{
> > +     struct sbiret ret;
> > +
> > +     ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH, cbase, cmask,
> > +                     cflags, event, 0, 0);
> > +     __GUEST_ASSERT(ret.error == 0, "config matching failed %ld\n", ret.error);
> > +     GUEST_ASSERT((ret.value < RISCV_MAX_PMU_COUNTERS) &&
> > +                 ((1UL << ret.value) & counter_mask_available));
>
> I'd prefer to break these apart so it's more clear which one fails, if one
> fails.
>
>    GUEST_ASSERT(ret.value < RISCV_MAX_PMU_COUNTERS);
>    GUEST_ASSERT(BIT(ret.value) & counter_mask_available);
>

Done.

> > +
> > +     return ret.value;
> > +}
> > +
> > +static unsigned long get_num_counters(void)
> > +{
> > +     struct sbiret ret;
> > +
> > +     ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_NUM_COUNTERS, 0, 0, 0, 0, 0, 0);
> > +
> > +     __GUEST_ASSERT(ret.error == 0, "Unable to retrieve number of counters from SBI PMU");
> > +
>
> nit: drop this blank line
>
> > +     __GUEST_ASSERT(ret.value < RISCV_MAX_PMU_COUNTERS,
> > +                    "Invalid number of counters %ld\n", ret.value);
> > +
> > +     return ret.value;
> > +}
> > +
> > +static void update_counter_info(int num_counters)
> > +{
> > +     int i = 0;
> > +     struct sbiret ret;
> > +
> > +     for (i = 0; i < num_counters; i++) {
> > +             ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_GET_INFO, i, 0, 0, 0, 0, 0);
> > +
> > +             /* There can be gaps in logical counter indicies*/
> > +             if (!ret.error)
> > +                     GUEST_ASSERT_NE(ret.value, 0);
>
> I guess this should be
>
>   if (ret.error)
>     continue;
>   GUEST_ASSERT_NE(ret.value, 0);
>

Fixed it.

> > +
> > +             ctrinfo_arr[i].value = ret.value;
> > +             counter_mask_available |= BIT(i);
> > +     }
> > +
> > +     GUEST_ASSERT(counter_mask_available > 0);
> > +}
> > +
> > +static unsigned long read_counter(int idx, union sbi_pmu_ctr_info ctrinfo)
> > +{
> > +     unsigned long counter_val = 0;
> > +     struct sbiret ret;
> > +
> > +     __GUEST_ASSERT(ctrinfo.type < 2, "Invalid counter type %d", ctrinfo.type);
> > +
> > +     if (ctrinfo.type == SBI_PMU_CTR_TYPE_HW) {
> > +             counter_val = pmu_csr_read_num(ctrinfo.csr);
> > +     } else if (ctrinfo.type == SBI_PMU_CTR_TYPE_FW) {
> > +             ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_FW_READ, idx, 0, 0, 0, 0, 0);
> > +             GUEST_ASSERT(ret.error == 0);
> > +             counter_val = ret.value;
> > +     }
> > +
> > +     return counter_val;
> > +}
> > +
> > +static void start_counter(unsigned long counter, unsigned long start_flags,
> > +                       unsigned long ival)
> > +{
> > +     struct sbiret ret;
> > +
> > +     ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, counter, 1, start_flags,
> > +                     ival, 0, 0);
> > +     __GUEST_ASSERT(ret.error == 0, "Unable to start counter %ld\n", counter);
> > +}
> > +
> > +static void stop_counter(unsigned long counter, unsigned long stop_flags)
> > +{
> > +     struct sbiret ret;
> > +
> > +     ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, counter, 1, stop_flags,
> > +                     0, 0, 0);
> > +     if (stop_flags & SBI_PMU_STOP_FLAG_RESET)
> > +             __GUEST_ASSERT(ret.error == SBI_ERR_ALREADY_STOPPED,
> > +                            "Unable to stop counter %ld\n", counter);
>
> This looks like we're abusing the SBI_PMU_STOP_FLAG_RESET flag to do the
> already-stopped test. I'd rather helper functions work generally and do
> stuff like this in test code with comments pointing it out. Or just
> cleanly and separately set up an already-stopped test, so it's clear.
>

Doing it in test code adds redundancy. I will create two separate functions.

> > +     else
> > +             __GUEST_ASSERT(ret.error == 0, "Unable to stop counter %ld error %ld\n",
> > +                            counter, ret.error);
> > +}
> > +
> > +static void test_pmu_event(unsigned long event)
> > +{
> > +     unsigned long counter;
> > +     unsigned long counter_value_pre, counter_value_post;
> > +     unsigned long counter_init_value = 100;
> > +
> > +     counter = get_counter_index(0, counter_mask_available, 0, event);
> > +     counter_value_pre = read_counter(counter, ctrinfo_arr[counter]);
> > +
> > +     /* Do not set the initial value */
> > +     start_counter(counter, 0, counter_init_value);
> > +     dummy_func_loop(10000);
> > +
>
> nit: I'd remove this blank line so we have start/dummy/stop all together
> in a group. Same comment below.
>

Fixed it.

> > +     stop_counter(counter, 0);
> > +
> > +     counter_value_post = read_counter(counter, ctrinfo_arr[counter]);
> > +     __GUEST_ASSERT(counter_value_post > counter_value_pre,
> > +                    "counter_value_post %lx counter_value_pre %lx\n",
> > +                    counter_value_post, counter_value_pre);
> > +
> > +     /* Now set the initial value and compare */
> > +     start_counter(counter, SBI_PMU_START_FLAG_SET_INIT_VALUE, counter_init_value);
>
> We should try to confirm that we reset the counter, otherwise the check
> below only proves that the value we read is greater than 100, which it
> is possible even if the reset doesn't work.
>

Hmm. There is no way to just update the counter value without starting
it, and reading it without stopping is not reliable.
Maybe we can do this:

1. Reset it to 100. Stop it immediately after and read it. Let's say
the value is X.
2. Now reset it to X + 1000.
3. Do the validation against the reset value from #2.

Wdyt?
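
Roughly, the sequence would look like the sketch below. It runs against
a toy in-memory counter, not the selftest code: struct toy_counter and
the toy_*() helpers are hypothetical stand-ins for the SBI start/stop
calls and the counter read in sbi_pmu.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy counter; stands in for a real PMU counter. */
struct toy_counter {
	uint64_t value;
	bool running;
};

/* Start counting, optionally loading an initial value first. */
static void toy_start(struct toy_counter *c, bool set_init, uint64_t ival)
{
	if (set_init)
		c->value = ival;
	c->running = true;
}

static void toy_stop(struct toy_counter *c)
{
	c->running = false;
}

/* Simulate work: a running counter advances by one per iteration. */
static void toy_work(struct toy_counter *c, uint64_t iters)
{
	if (c->running)
		c->value += iters;
}

/* Returns true when the final read exceeds the value set in step 2. */
static bool toy_reset_check(struct toy_counter *c)
{
	uint64_t x, reset_val;

	/* 1. Reset to 100, stop immediately, read back X. */
	toy_start(c, true, 100);
	toy_stop(c);
	x = c->value;

	/* 2. Reset to X + 1000 and let the counter run. */
	reset_val = x + 1000;
	toy_start(c, true, reset_val);
	toy_work(c, 10000);
	toy_stop(c);

	/* 3. Validate against the reset value from step 2. */
	return c->value > reset_val;
}
```

On real hardware X will not be exactly 100, since some events are
counted between start and stop; that is why step 2 is expressed
relative to X.
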

> > +     dummy_func_loop(10000);
> > +
> > +     stop_counter(counter, 0);
> > +
> > +     counter_value_post = read_counter(counter, ctrinfo_arr[counter]);
> > +     __GUEST_ASSERT(counter_value_post > counter_init_value,
> > +                    "counter_value_post %lx counter_init_value %lx\n",
> > +                    counter_value_post, counter_init_value);
> > +
> > +     stop_counter(counter, SBI_PMU_STOP_FLAG_RESET);
> > +}
> > +
> > +static void test_invalid_event(void)
> > +{
> > +     struct sbiret ret;
> > +     unsigned long event = 0x1234; /* A random event */
> > +
> > +     ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH, 0,
> > +                     counter_mask_available, 0, event, 0, 0);
> > +     GUEST_ASSERT_EQ(ret.error, SBI_ERR_NOT_SUPPORTED);
> > +}
> > +
> > +static void test_pmu_events(int cpu)
>
> cpu is unused so the parameter list can be void. Same comment for
> test_pmu_basic_sanity()
>

Fixed.


> > +{
> > +     int num_counters = 0;
> > +
> > +     /* Get the counter details */
> > +     num_counters = get_num_counters();
> > +     update_counter_info(num_counters);
> > +
> > +     /* Sanity testing for any random invalid event */
> > +     test_invalid_event();
> > +
> > +     /* Only these two events are guranteed to be present */
>
> guaranteed
>
> > +     test_pmu_event(SBI_PMU_HW_CPU_CYCLES);
> > +     test_pmu_event(SBI_PMU_HW_INSTRUCTIONS);
> > +
> > +     GUEST_DONE();
> > +}
> > +
> > +static void test_pmu_basic_sanity(int cpu)
> > +{
> > +     long out_val = 0;
> > +     bool probe;
> > +     struct sbiret ret;
> > +     int num_counters = 0, i;
> > +     unsigned long counter_val = -1;
> > +     union sbi_pmu_ctr_info ctrinfo;
> > +
> > +     probe = guest_sbi_probe_extension(SBI_EXT_PMU, &out_val);
> > +     GUEST_ASSERT(probe && out_val == 1);
> > +
> > +     num_counters = get_num_counters();
> > +
> > +     for (i = 0; i < num_counters; i++) {
> > +             ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_GET_INFO, i,
> > +                             0, 0, 0, 0, 0);
> > +
> > +             /* There can be gaps in logical counter indicies*/
> > +             if (!ret.error)
> > +                     GUEST_ASSERT_NE(ret.value, 0);
> > +             else
> > +                     continue;
>
> nit:
>
>  if (ret.error)
>     continue;
>   GUEST_ASSERT_NE(ret.value, 0);
>

Done.

> > +
> > +             ctrinfo.value = ret.value;
> > +
> > +             /* Accesibility check of hardware and read capability of firmware counters */
>
> Accessibility
>

Fixed.

> > +             counter_val = read_counter(i, ctrinfo);
> > +             /* The spec doesn't mandate any initial value. Verify if a sane value */
> > +             GUEST_ASSERT_NE(counter_val, -1);
>
> Hmm, does -1 have any special meaning? Otherwise it's a member of the set
> of 'any', so there's nothing we can test. Or, maybe we can test that bits
> higher than the ctrinfo bitwidth are zero. Although those bits might also
> be unspecified, which means there's nothing we can test.
>

Yeah. I have removed the validation with a clarification.

> > +     }
> > +
> > +     GUEST_DONE();
> > +}
> > +
> > +static void run_vcpu(struct kvm_vcpu *vcpu)
> > +{
> > +     struct ucall uc;
> > +
> > +     vcpu_run(vcpu);
> > +     switch (get_ucall(vcpu, &uc)) {
> > +     case UCALL_ABORT:
> > +             REPORT_GUEST_ASSERT(uc);
> > +             break;
> > +     case UCALL_DONE:
> > +     case UCALL_SYNC:
> > +             break;
> > +     default:
> > +             TEST_FAIL("Unknown ucall %lu", uc.cmd);
> > +             break;
> > +     }
> > +}
> > +
> > +void test_vm_destroy(struct kvm_vm *vm)
> > +{
> > +     memset(ctrinfo_arr, 0, sizeof(union sbi_pmu_ctr_info) * RISCV_MAX_PMU_COUNTERS);
> > +     counter_mask_available = 0;
> > +     kvm_vm_free(vm);
> > +}
> > +
> > +static void test_vm_basic_test(void *guest_code)
> > +{
> > +     struct kvm_vm *vm;
> > +     struct kvm_vcpu *vcpu;
> > +
> > +     vm = vm_create_with_one_vcpu(&vcpu, guest_code);
> > +     __TEST_REQUIRE(__vcpu_has_ext(vcpu, RISCV_ISA_EXT_REG(KVM_RISCV_SBI_EXT_PMU)),
>
> Shouldn't this be checking RISCV_SBI_EXT_REG(KVM_RISCV_SBI_EXT_PMU)?
>

Oops. Fat fingers. Fixed it.

> We should probably create two more helpers
>
>  bool __vcpu_has_isa_ext(struct kvm_vcpu *vcpu, uint64_t isa_ext)
>  {
>     return __vcpu_has_ext(vcpu, RISCV_ISA_EXT_REG(isa_ext));
>  }
>  bool __vcpu_has_sbi_ext(struct kvm_vcpu *vcpu, uint64_t sbi_ext)
>  {
>     return __vcpu_has_ext(vcpu, RISCV_SBI_EXT_REG(sbi_ext));
>  }
>
> to make the extension checks less verbose and error prone.
>

Good idea. Added the patch.

> > +                                "SBI PMU not available, skipping test");
> > +     vm_init_vector_tables(vm);
> > +     /* Illegal instruction handler is required to verify read access without configuration */
> > +     vm_install_exception_handler(vm, EXC_INST_ILLEGAL, guest_illegal_exception_handler);
> > +
> > +     vcpu_init_vector_tables(vcpu);
> > +     vcpu_args_set(vcpu, 1, 0);
>
> We don't use the arguments in the guest code functions so we don't need
> this call to vcpu_args_set()
>

Done.

> > +     run_vcpu(vcpu);
> > +
> > +     test_vm_destroy(vm);
> > +}
> > +
> > +static void test_vm_events_test(void *guest_code)
> > +{
> > +     struct kvm_vm *vm = NULL;
> > +     struct kvm_vcpu *vcpu = NULL;
> > +
> > +     vm = vm_create_with_one_vcpu(&vcpu, guest_code);
> > +     __TEST_REQUIRE(__vcpu_has_ext(vcpu, RISCV_ISA_EXT_REG(KVM_RISCV_SBI_EXT_PMU)),
>
> Same comment as above.
>
> > +                                "SBI PMU not available, skipping test");
> > +     vcpu_args_set(vcpu, 1, 0);
>
> Same comment as above.
>
> > +     run_vcpu(vcpu);
> > +
> > +     test_vm_destroy(vm);
> > +}
> > +
> > +int main(void)
> > +{
> > +     test_vm_basic_test(test_pmu_basic_sanity);
> > +     pr_info("SBI PMU basic test : PASS\n");
> > +
> > +     test_vm_events_test(test_pmu_events);
> > +     pr_info("SBI PMU event verification test : PASS\n");
> > +
> > +     return 0;
> > +}
> > --
> > 2.34.1
> >
>
> Thanks,
> drew



--
Regards,
Atish


* Re: [PATCH v4 14/15] KVM: riscv: selftests: Add a test for PMU snapshot functionality
  2024-03-02 12:13   ` Andrew Jones
@ 2024-04-02  8:35     ` Atish Patra
  0 siblings, 0 replies; 56+ messages in thread
From: Atish Patra @ 2024-04-02  8:35 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-kernel, Albert Ou, Alexandre Ghiti, Anup Patel,
	Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On 3/2/24 04:13, Andrew Jones wrote:
> On Wed, Feb 28, 2024 at 05:01:29PM -0800, Atish Patra wrote:
>> Verify PMU snapshot functionality by setting up the shared memory
>> correctly and reading the counter values from the shared memory
>> instead of the CSR.
>>
>> Signed-off-by: Atish Patra <atishp@rivosinc.com>
>> ---
>>   .../selftests/kvm/include/riscv/processor.h   |  25 ++++
>>   .../selftests/kvm/lib/riscv/processor.c       |  12 ++
>>   tools/testing/selftests/kvm/riscv/sbi_pmu.c   | 124 ++++++++++++++++++
>>   3 files changed, 161 insertions(+)
>>
>> diff --git a/tools/testing/selftests/kvm/include/riscv/processor.h b/tools/testing/selftests/kvm/include/riscv/processor.h
>> index a49a39c8e8d4..e114d039e87b 100644
>> --- a/tools/testing/selftests/kvm/include/riscv/processor.h
>> +++ b/tools/testing/selftests/kvm/include/riscv/processor.h
>> @@ -173,6 +173,7 @@ enum sbi_ext_id {
>>   };
>>   
>>   enum sbi_ext_base_fid {
>> +	SBI_EXT_BASE_GET_IMP_VERSION = 2,
>>   	SBI_EXT_BASE_PROBE_EXT = 3,
>>   };
>>   
>> @@ -201,6 +202,12 @@ union sbi_pmu_ctr_info {
>>   	};
>>   };
>>   
>> +struct riscv_pmu_snapshot_data {
>> +	u64 ctr_overflow_mask;
>> +	u64 ctr_values[64];
>> +	u64 reserved[447];
>> +};
>> +
>>   struct sbiret {
>>   	long error;
>>   	long value;
>> @@ -247,6 +254,14 @@ enum sbi_pmu_ctr_type {
>>   #define SBI_PMU_STOP_FLAG_RESET (1 << 0)
>>   #define SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT BIT(1)
>>   
>> +#define SBI_STA_SHMEM_DISABLE		-1
> 
> unrelated change
> 

Dropped it.

>> +
>> +/* SBI spec version fields */
>> +#define SBI_SPEC_VERSION_DEFAULT	0x1
>> +#define SBI_SPEC_VERSION_MAJOR_SHIFT	24
>> +#define SBI_SPEC_VERSION_MAJOR_MASK	0x7f
>> +#define SBI_SPEC_VERSION_MINOR_MASK	0xffffff
>> +
>>   struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
>>   			unsigned long arg1, unsigned long arg2,
>>   			unsigned long arg3, unsigned long arg4,
>> @@ -254,6 +269,16 @@ struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
>>   
>>   bool guest_sbi_probe_extension(int extid, long *out_val);
>>   
>> +/* Make SBI version */
>> +static inline unsigned long sbi_mk_version(unsigned long major,
>> +					    unsigned long minor)
>> +{
>> +	return ((major & SBI_SPEC_VERSION_MAJOR_MASK) <<
>> +		SBI_SPEC_VERSION_MAJOR_SHIFT) | minor;
>> +}
> 
> We should probably just synch sbi.h into tools, since we need plenty
> from it.
> 

As of now I have created sbi.h and moved all the definitions there.
There is still a lot of difference between the two sbi.h files. Do we
really want to bring everything in? Should we adopt a kvmtool-like
policy to sync sbi.h, or just do it as new test cases need it?

I can send another version with syncing sbi.h if you still think that's 
better.
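
Whichever way sbi.h ends up being populated, the version helpers the
test needs are small. A standalone sketch, using the field definitions
from the hunk above: the two decode helpers (sbi_version_major() and
sbi_version_minor()) are names chosen here for illustration, and unlike
the patch this sketch also masks the minor field.

```c
/* Field layout taken from the definitions in the quoted hunk. */
#define SBI_SPEC_VERSION_MAJOR_SHIFT	24
#define SBI_SPEC_VERSION_MAJOR_MASK	0x7f
#define SBI_SPEC_VERSION_MINOR_MASK	0xffffff

/* Pack major/minor into one word (minor masked here as an extra guard). */
static inline unsigned long sbi_mk_version(unsigned long major,
					   unsigned long minor)
{
	return ((major & SBI_SPEC_VERSION_MAJOR_MASK) <<
		SBI_SPEC_VERSION_MAJOR_SHIFT) |
	       (minor & SBI_SPEC_VERSION_MINOR_MASK);
}

/* Illustrative decode helper: extract the major field. */
static inline unsigned long sbi_version_major(unsigned long version)
{
	return (version >> SBI_SPEC_VERSION_MAJOR_SHIFT) &
	       SBI_SPEC_VERSION_MAJOR_MASK;
}

/* Illustrative decode helper: extract the minor field. */
static inline unsigned long sbi_version_minor(unsigned long version)
{
	return version & SBI_SPEC_VERSION_MINOR_MASK;
}
```

A test could then compare a queried version against
sbi_mk_version(2, 0) before exercising the snapshot calls.
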


>> +
>> +unsigned long get_host_sbi_impl_version(void);
>> +
>>   static inline void local_irq_enable(void)
>>   {
>>   	csr_set(CSR_SSTATUS, SR_SIE);
>> diff --git a/tools/testing/selftests/kvm/lib/riscv/processor.c b/tools/testing/selftests/kvm/lib/riscv/processor.c
>> index ec66d331a127..b0162d923e38 100644
>> --- a/tools/testing/selftests/kvm/lib/riscv/processor.c
>> +++ b/tools/testing/selftests/kvm/lib/riscv/processor.c
>> @@ -499,3 +499,15 @@ bool guest_sbi_probe_extension(int extid, long *out_val)
>>   
>>   	return true;
>>   }
>> +
>> +unsigned long get_host_sbi_impl_version(void)
>> +{
>> +	struct sbiret ret;
>> +
>> +	ret = sbi_ecall(SBI_EXT_BASE, SBI_EXT_BASE_GET_IMP_VERSION, 0,
>> +		       0, 0, 0, 0, 0);
>> +
>> +	GUEST_ASSERT(!ret.error);
>> +
>> +	return ret.value;
>> +}
>> diff --git a/tools/testing/selftests/kvm/riscv/sbi_pmu.c b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
>> index fc1fc5eea99e..8ea2a6db6610 100644
>> --- a/tools/testing/selftests/kvm/riscv/sbi_pmu.c
>> +++ b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
>> @@ -21,6 +21,11 @@
>>   #define RISCV_MAX_PMU_COUNTERS 64
>>   union sbi_pmu_ctr_info ctrinfo_arr[RISCV_MAX_PMU_COUNTERS];
>>   
>> +/* Snapshot shared memory data */
>> +#define PMU_SNAPSHOT_GPA_BASE		(1 << 30)
>> +static void *snapshot_gva;
>> +static vm_paddr_t snapshot_gpa;
>> +
>>   /* Cache the available counters in a bitmask */
>>   static unsigned long counter_mask_available;
>>   
>> @@ -173,6 +178,20 @@ static void stop_counter(unsigned long counter, unsigned long stop_flags)
>>   			       counter, ret.error);
>>   }
>>   
>> +static void snapshot_set_shmem(vm_paddr_t gpa, unsigned long flags)
>> +{
>> +	unsigned long lo = (unsigned long)gpa;
>> +#if __riscv_xlen == 32
>> +	unsigned long hi = (unsigned long)(gpa >> 32);
>> +#else
>> +	unsigned long hi = gpa == -1 ? -1 : 0;
>> +#endif
>> +	struct sbiret ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
>> +				      lo, hi, flags, 0, 0, 0);
>> +
>> +	GUEST_ASSERT(ret.value == 0 && ret.error == 0);
>> +}
>> +
>>   static void test_pmu_event(unsigned long event)
>>   {
>>   	unsigned long counter;
>> @@ -207,6 +226,43 @@ static void test_pmu_event(unsigned long event)
>>   	stop_counter(counter, SBI_PMU_STOP_FLAG_RESET);
>>   }
>>   
>> +static void test_pmu_event_snapshot(unsigned long event)
>> +{
>> +	unsigned long counter;
>> +	unsigned long counter_value_pre, counter_value_post;
>> +	unsigned long counter_init_value = 100;
>> +	struct riscv_pmu_snapshot_data *snapshot_data = snapshot_gva;
>> +
>> +	counter = get_counter_index(0, counter_mask_available, 0, event);
>> +	counter_value_pre = read_counter(counter, ctrinfo_arr[counter]);
>> +
>> +	/* Do not set the initial value */
>> +	start_counter(counter, 0, 0);
>> +	dummy_func_loop(10000);
>> +
>> +	stop_counter(counter, SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT);
>> +
>> +	/* The counter value is updated w.r.t relative index of cbase */
>> +	counter_value_post = READ_ONCE(snapshot_data->ctr_values[0]);
>> +	__GUEST_ASSERT(counter_value_post > counter_value_pre,
>> +		       "counter_value_post %lx counter_value_pre %lx\n",
>> +		       counter_value_post, counter_value_pre);
>> +
>> +	/* Now set the initial value and compare */
>> +	WRITE_ONCE(snapshot_data->ctr_values[0], counter_init_value);
>> +	start_counter(counter, SBI_PMU_START_FLAG_INIT_FROM_SNAPSHOT, 0);
>> +	dummy_func_loop(10000);
>> +
>> +	stop_counter(counter, SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT);
>> +
>> +	counter_value_post = READ_ONCE(snapshot_data->ctr_values[0]);
>> +	__GUEST_ASSERT(counter_value_post > counter_init_value,
>> +		       "counter_value_post %lx counter_init_value %lx for counter\n",
>> +		       counter_value_post, counter_init_value);
>> +
>> +	stop_counter(counter, SBI_PMU_STOP_FLAG_RESET);
> 
> This function is almost identical to test_pmu_event(). If we change one,
> we'll likely have to change the other. We should have a single function
> which can be used by both tests. We can do that by passing a function
> pointer for the read which is different for non-snapshot and snapshot.
> 

There are more differences than just the read function. Stop/start take
snapshot-specific flags. We also have to update the counter in the shared
memory. If we combine the two functions into a single one, we will end up
with a bunch of if/else conditions, which I don't like.

I am okay modifying it if you feel strongly about it though.
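
For reference, one way to combine the two tests without scattering if/else
branches is to describe each mode in a small descriptor that carries the
read callback, the start/stop flags and an optional seeding hook. The
sketch below is a stand-alone model with a mock counter, not selftest
code: `struct pmu_test_mode`, `run_event_test`, the `mock_*` helpers and
`MOCK_FLAG_SNAPSHOT` are all hypothetical names invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the snapshot start/stop flags; the value is arbitrary here. */
#define MOCK_FLAG_SNAPSHOT	(1UL << 1)

/* Mock backend: one hardware counter and one shared-memory slot. */
static unsigned long mock_hw_counter;
static unsigned long mock_shmem_value;
static bool mock_running;

static void mock_start(unsigned long flags)
{
	if (flags & MOCK_FLAG_SNAPSHOT)
		mock_hw_counter = mock_shmem_value;	/* init from snapshot */
	mock_running = true;
}

static void mock_work(unsigned long n)
{
	if (mock_running)
		mock_hw_counter += n;
}

static void mock_stop(unsigned long flags)
{
	mock_running = false;
	if (flags & MOCK_FLAG_SNAPSHOT)
		mock_shmem_value = mock_hw_counter;	/* take snapshot */
}

static unsigned long read_csr(void) { return mock_hw_counter; }
static unsigned long read_shmem(void) { return mock_shmem_value; }
static void seed_shmem(unsigned long v) { mock_shmem_value = v; }

/* Hypothetical descriptor: everything that differs between the two tests. */
struct pmu_test_mode {
	unsigned long start_flags;
	unsigned long stop_flags;
	unsigned long (*read)(void);
	void (*seed)(unsigned long value);	/* NULL if nothing to seed */
};

static const struct pmu_test_mode mode_plain = {
	.read = read_csr,
};

static const struct pmu_test_mode mode_snapshot = {
	.start_flags = MOCK_FLAG_SNAPSHOT,
	.stop_flags = MOCK_FLAG_SNAPSHOT,
	.read = read_shmem,
	.seed = seed_shmem,
};

/* Single test body shared by both modes; true if the counter advanced. */
static bool run_event_test(const struct pmu_test_mode *m, unsigned long init)
{
	unsigned long before, after;

	if (m->seed)
		m->seed(init);
	before = m->read();

	mock_start(m->start_flags);
	mock_work(10000);
	mock_stop(m->stop_flags);

	after = m->read();
	return after > before;
}
```

The trade-off is the one raised above: the descriptor removes most of the
branching from the test body, at the cost of an extra layer of indirection
for only two callers.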

>> +}
>> +
>>   static void test_invalid_event(void)
>>   {
>>   	struct sbiret ret;
>> @@ -270,6 +326,41 @@ static void test_pmu_basic_sanity(int cpu)
>>   	GUEST_DONE();
>>   }
>>   
>> +static void test_pmu_events_snaphost(int cpu)
> 
> unnecessary cpu parameter
> 

Removed.

>> +{
>> +	long out_val = 0;
>> +	bool probe;
>> +	int num_counters = 0;
>> +	unsigned long sbi_impl_version;
>> +	struct riscv_pmu_snapshot_data *snapshot_data = snapshot_gva;
>> +	int i;
>> +
>> +	probe = guest_sbi_probe_extension(SBI_EXT_PMU, &out_val);
>> +	GUEST_ASSERT(probe && out_val == 1);
>> +
>> +	sbi_impl_version = get_host_sbi_impl_version();
>> +	if (sbi_impl_version >= sbi_mk_version(2, 0))
>> +		__GUEST_ASSERT(0, "SBI implementation version doesn't support PMU Snapshot");
>> +
>> +	snapshot_set_shmem(snapshot_gpa, 0);
>> +
>> +	/* Get the counter details */
>> +	num_counters = get_num_counters();
>> +	update_counter_info(num_counters);
>> +
>> +	/* Validate shared memory access */
>> +	GUEST_ASSERT_EQ(READ_ONCE(snapshot_data->ctr_overflow_mask), 0);
>> +	for (i = 0; i < num_counters; i++) {
>> +		if (counter_mask_available & (1UL << i))
> 
> BIT()
> 

Done.

>> +			GUEST_ASSERT_EQ(READ_ONCE(snapshot_data->ctr_values[i]), 0);
>> +	}
>> +	/* Only these two events are guranteed to be present */
>> +	test_pmu_event_snapshot(SBI_PMU_HW_CPU_CYCLES);
>> +	test_pmu_event_snapshot(SBI_PMU_HW_INSTRUCTIONS);
>> +
>> +	GUEST_DONE();
>> +}
>> +
>>   static void run_vcpu(struct kvm_vcpu *vcpu)
>>   {
>>   	struct ucall uc;
>> @@ -328,6 +419,36 @@ static void test_vm_events_test(void *guest_code)
>>   	test_vm_destroy(vm);
>>   }
>>   
>> +static void test_vm_setup_snapshot_mem(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
>> +{
>> +	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, PMU_SNAPSHOT_GPA_BASE, 1, 1, 0);
>> +	/* PMU Snapshot requires single page only */
> 
> This comment should go above the memory region add
> 
>> +	virt_map(vm, PMU_SNAPSHOT_GPA_BASE, PMU_SNAPSHOT_GPA_BASE, 1);
>> +
>> +	/* PMU_SNAPSHOT_GPA_BASE is identity mapped */
> 
> This comment should go above the virt_map
> 

Fixed.

>> +	snapshot_gva = (void *)(PMU_SNAPSHOT_GPA_BASE);
>> +	snapshot_gpa = addr_gva2gpa(vcpu->vm, (vm_vaddr_t)snapshot_gva);
>> +	sync_global_to_guest(vcpu->vm, snapshot_gva);
>> +	sync_global_to_guest(vcpu->vm, snapshot_gpa);
>> +}
>> +
>> +static void test_vm_events_snapshot_test(void *guest_code)
>> +{
>> +	struct kvm_vm *vm = NULL;
>> +	struct kvm_vcpu *vcpu = NULL;
> 
> nit: no need to set to NULL
> 
>> +
>> +	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
>> +	__TEST_REQUIRE(__vcpu_has_ext(vcpu, RISCV_ISA_EXT_REG(KVM_RISCV_SBI_EXT_PMU)),
> 
> RISCV_SBI_EXT_REG
> 

Updated to use the new helper functions as suggested in the earlier patch.

>> +				   "SBI PMU not available, skipping test");
>> +
>> +	test_vm_setup_snapshot_mem(vm, vcpu);
>> +
>> +	vcpu_args_set(vcpu, 1, 0);
> 
> no need to set args
> 

Fixed.

>> +	run_vcpu(vcpu);
>> +
>> +	test_vm_destroy(vm);
>> +}
>> +
>>   int main(void)
>>   {
>>   	test_vm_basic_test(test_pmu_basic_sanity);
>> @@ -336,5 +457,8 @@ int main(void)
>>   	test_vm_events_test(test_pmu_events);
>>   	pr_info("SBI PMU event verification test : PASS\n");
>>   
>> +	test_vm_events_snapshot_test(test_pmu_events_snaphost);
>> +	pr_info("SBI PMU event verification with snapshot test : PASS\n");
>> +
>>   	return 0;
>>   }
>> -- 
>> 2.34.1
>>
> 
> Thanks,
> drew


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v4 15/15] KVM: riscv: selftests: Add a test for counter overflow
  2024-03-02 12:35   ` Andrew Jones
@ 2024-04-02  8:42     ` Atish Patra
  0 siblings, 0 replies; 56+ messages in thread
From: Atish Patra @ 2024-04-02  8:42 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-kernel, Albert Ou, Alexandre Ghiti, Anup Patel,
	Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On 3/2/24 04:35, Andrew Jones wrote:
> On Wed, Feb 28, 2024 at 05:01:30PM -0800, Atish Patra wrote:
>> Add a test for verifying overflow interrupt. Currently, it relies on
>> overflow support on cycle/instret events. This test works for cycle/
>> instret events which support sampling via hpmcounters on the platform.
>> There are no ISA extensions to detect if a platform supports that. Thus,
> 
> Ouch. Are there discussions/proposals as to how we can do better with
> discoverability here? This type of thing sounds like the types of things
> that get new extension names defined for them as part of the profile spec
> work.
> 

There is a Perf events TG, started last month, which will work on
standardizing perf events for RISC-V.

The perf tool can also rely on the json file to figure out whether
sampling is supported or not. But for kselftests we don't have any of
this infrastructure.

>> this test will fail on platform with virtualization but doesn't
>> support overflow on these two events.
>>
>> Signed-off-by: Atish Patra <atishp@rivosinc.com>
>> ---
>>   tools/testing/selftests/kvm/riscv/sbi_pmu.c | 126 +++++++++++++++++++-
>>   1 file changed, 125 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/testing/selftests/kvm/riscv/sbi_pmu.c b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
>> index 8ea2a6db6610..c0264c636054 100644
>> --- a/tools/testing/selftests/kvm/riscv/sbi_pmu.c
>> +++ b/tools/testing/selftests/kvm/riscv/sbi_pmu.c
>> @@ -8,6 +8,7 @@
>>    * Copyright (c) 2024, Rivos Inc.
>>    */
>>   
>> +#include "asm/csr.h"
>>   #include <stdio.h>
>>   #include <stdlib.h>
>>   #include <string.h>
>> @@ -16,6 +17,7 @@
>>   #include "kvm_util.h"
>>   #include "test_util.h"
>>   #include "processor.h"
>> +#include "arch_timer.h"
>>   
>>   /* Maximum counters (firmware + hardware)*/
>>   #define RISCV_MAX_PMU_COUNTERS 64
>> @@ -26,6 +28,11 @@ union sbi_pmu_ctr_info ctrinfo_arr[RISCV_MAX_PMU_COUNTERS];
>>   static void *snapshot_gva;
>>   static vm_paddr_t snapshot_gpa;
>>   
>> +static int pmu_irq = IRQ_PMU_OVF;
>> +
>> +static int vcpu_shared_irq_count;
>> +static int counter_in_use;
>> +
>>   /* Cache the available counters in a bitmask */
>>   static unsigned long counter_mask_available;
>>   
>> @@ -69,7 +76,9 @@ unsigned long pmu_csr_read_num(int csr_num)
>>   #undef switchcase_csr_read
>>   }
>>   
>> -static inline void dummy_func_loop(int iter)
>> +static void stop_counter(unsigned long counter, unsigned long stop_flags);
>> +
>> +static inline void dummy_func_loop(uint64_t iter)
>>   {
>>   	int i = 0;
>>   
>> @@ -88,6 +97,26 @@ static void guest_illegal_exception_handler(struct ex_regs *regs)
>>   	regs->epc += 4;
>>   }
>>   
>> +static void guest_irq_handler(struct ex_regs *regs)
>> +{
>> +	unsigned int irq_num = regs->cause & ~CAUSE_IRQ_FLAG;
>> +	struct riscv_pmu_snapshot_data *snapshot_data = snapshot_gva;
>> +	unsigned long overflown_mask;
>> +
>> +	/* Stop all counters first to avoid further interrupts */
>> +	sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, 0, 1UL << counter_in_use,
>> +		  SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT, 0, 0, 0);
>> +
>> +	csr_clear(CSR_SIP, BIT(pmu_irq));
>> +
>> +	overflown_mask = READ_ONCE(snapshot_data->ctr_overflow_mask);
>> +	GUEST_ASSERT(overflown_mask & (1UL << counter_in_use));
>> +
>> +	/* Validate that we are in the correct irq handler */
>> +	GUEST_ASSERT_EQ(irq_num, pmu_irq);
> 
> Should probably do this irq handler assert first.
> 

Done.
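
As an illustration of why the check belongs first: the handler can decide
from the cause value alone whether it was entered for the PMU overflow
interrupt, before it touches any counter state. The sketch below is a
stand-alone model that assumes a 64-bit hart (interrupt flag in bit 63);
`cause_is_pmu_overflow` is a hypothetical helper, not something in the
patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumes a 64-bit hart: the top bit of scause marks an interrupt. */
#define CAUSE_IRQ_FLAG	(UINT64_C(1) << 63)
#define IRQ_PMU_OVF	13

/* Hypothetical helper: true only for the PMU overflow interrupt. */
static bool cause_is_pmu_overflow(uint64_t cause)
{
	if (!(cause & CAUSE_IRQ_FLAG))
		return false;	/* synchronous exception, not an interrupt */

	return (cause & ~CAUSE_IRQ_FLAG) == IRQ_PMU_OVF;
}
```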

>> +	WRITE_ONCE(vcpu_shared_irq_count, vcpu_shared_irq_count+1);
>> +}
>> +
>>   static unsigned long get_counter_index(unsigned long cbase, unsigned long cmask,
>>   				       unsigned long cflags,
>>   				       unsigned long event)
>> @@ -263,6 +292,32 @@ static void test_pmu_event_snapshot(unsigned long event)
>>   	stop_counter(counter, SBI_PMU_STOP_FLAG_RESET);
>>   }
>>   
>> +static void test_pmu_event_overflow(unsigned long event)
>> +{
>> +	unsigned long counter;
>> +	unsigned long counter_value_post;
>> +	unsigned long counter_init_value = ULONG_MAX - 10000;
>> +	struct riscv_pmu_snapshot_data *snapshot_data = snapshot_gva;
>> +
>> +	counter = get_counter_index(0, counter_mask_available, 0, event);
>> +	counter_in_use = counter;
>> +
>> +	/* The counter value is updated w.r.t relative index of cbase passed to start/stop */
>> +	WRITE_ONCE(snapshot_data->ctr_values[0], counter_init_value);
>> +	start_counter(counter, SBI_PMU_START_FLAG_INIT_FROM_SNAPSHOT, 0);
>> +	dummy_func_loop(10000);
>> +	udelay(msecs_to_usecs(2000));
>> +	/* irq handler should have stopped the counter */
>> +
>> +	counter_value_post = READ_ONCE(snapshot_data->ctr_values[counter_in_use]);
>> +	/* The counter value after stopping should be less the init value due to overflow */
>> +	__GUEST_ASSERT(counter_value_post < counter_init_value,
>> +		       "counter_value_post %lx counter_init_value %lx for counter\n",
>> +		       counter_value_post, counter_init_value);
>> +
>> +	stop_counter(counter, SBI_PMU_STOP_FLAG_RESET);
>> +}
>> +
>>   static void test_invalid_event(void)
>>   {
>>   	struct sbiret ret;
>> @@ -361,6 +416,43 @@ static void test_pmu_events_snaphost(int cpu)
>>   	GUEST_DONE();
>>   }
>>   
>> +static void test_pmu_events_overflow(int cpu)
> 
> no need for cpu
> 

Fixed.

>> +{
>> +	long out_val = 0;
>> +	bool probe;
>> +	int num_counters = 0;
>> +	unsigned long sbi_impl_version;
>> +
>> +	probe = guest_sbi_probe_extension(SBI_EXT_PMU, &out_val);
>> +	GUEST_ASSERT(probe && out_val == 1);
>> +
>> +	sbi_impl_version = get_host_sbi_impl_version();
>> +	if (sbi_impl_version >= sbi_mk_version(2, 0))
>> +		__GUEST_ASSERT(0, "SBI implementation version doesn't support PMU Snapshot");
> 
> Identical probe and version check as test_pmu_events_snaphost(). Can
> factor out.
> 

Done.
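
A rough sketch of the version part of such a shared helper follows. It is
a stand-alone model rather than the selftest code: `sbi_version_at_least`
is a hypothetical name, the probe call is left out, and the sketch assumes
the intent is to require at least v2.0, i.e. to bail out when the reported
version is lower than that. The encoding (major in bits 30:24, minor in
bits 23:0) follows the SBI version layout.

```c
#include <stdbool.h>

#define SBI_SPEC_VERSION_MAJOR_SHIFT	24
#define SBI_SPEC_VERSION_MAJOR_MASK	0x7fUL
#define SBI_SPEC_VERSION_MINOR_MASK	0xffffffUL

/* Pack a major/minor pair into the SBI version encoding. */
static unsigned long sbi_mk_version(unsigned long major, unsigned long minor)
{
	return ((major & SBI_SPEC_VERSION_MAJOR_MASK) << SBI_SPEC_VERSION_MAJOR_SHIFT) |
	       (minor & SBI_SPEC_VERSION_MINOR_MASK);
}

/* Hypothetical check both guest entry points could share. */
static bool sbi_version_at_least(unsigned long version,
				 unsigned long major, unsigned long minor)
{
	return version >= sbi_mk_version(major, minor);
}
```

A caller would then assert on `sbi_version_at_least(version, 2, 0)` once,
instead of repeating the comparison in every guest function.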

>> +
>> +	snapshot_set_shmem(snapshot_gpa, 0);
>> +	csr_set(CSR_IE, BIT(pmu_irq));
>> +	local_irq_enable();
>> +
>> +	/* Get the counter details */
>> +	num_counters = get_num_counters();
>> +	update_counter_info(num_counters);
>> +
>> +	/*
>> +	 * Qemu supports overflow for cycle/instruction.
>> +	 * This test may fail on any platform that do not support overflow for these two events.
>> +	 */
>> +	test_pmu_event_overflow(SBI_PMU_HW_CPU_CYCLES);
>> +	GUEST_ASSERT_EQ(vcpu_shared_irq_count, 1);
>> +
>> +	/* Renable the interrupt again for another event */
>> +	csr_set(CSR_IE, BIT(pmu_irq));
>> +	test_pmu_event_overflow(SBI_PMU_HW_INSTRUCTIONS);
>> +	GUEST_ASSERT_EQ(vcpu_shared_irq_count, 2);
>> +
>> +	GUEST_DONE();
>> +}
>> +
>>   static void run_vcpu(struct kvm_vcpu *vcpu)
>>   {
>>   	struct ucall uc;
>> @@ -449,6 +541,35 @@ static void test_vm_events_snapshot_test(void *guest_code)
>>   	test_vm_destroy(vm);
>>   }
>>   
>> +static void test_vm_events_overflow(void *guest_code)
>> +{
>> +	struct kvm_vm *vm = NULL;
>> +	struct kvm_vcpu *vcpu = NULL;
> 
> nit: no need for NULL
> 
>> +
>> +	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
>> +	__TEST_REQUIRE(__vcpu_has_ext(vcpu, RISCV_SBI_EXT_REG(KVM_RISCV_SBI_EXT_PMU)),
>> +				   "SBI PMU not available, skipping test");
>> +
>> +	__TEST_REQUIRE(__vcpu_has_ext(vcpu, RISCV_ISA_EXT_REG(KVM_RISCV_ISA_EXT_SSCOFPMF)),
>> +				   "Sscofpmf is not available, skipping overflow test");
>> +
>> +
>> +	test_vm_setup_snapshot_mem(vm, vcpu);
>> +	vm_init_vector_tables(vm);
>> +	vm_install_interrupt_handler(vm, guest_irq_handler);
>> +
>> +	vcpu_init_vector_tables(vcpu);
>> +	/* Initialize guest timer frequency. */
>> +	vcpu_get_reg(vcpu, RISCV_TIMER_REG(frequency), &timer_freq);
>> +	sync_global_to_guest(vm, timer_freq);
> 
> I just noticed that timer_freq is in arch_timer.h and isn't an extern...
> Fixing that is out of scope for this series though.
> 

Yeah. I can add a patch for that.
But declaring it as an extern requires the definition to be either in a
common source file (i.e. processor.c) or in each test that uses
timer_freq.

The first approach is a bit odd, with the declaration and the definition
residing in two unrelated header/source files.
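
To make the first option concrete, here is a minimal single-file sketch of
the split. The comments mark which file each piece would live in; the
`unsigned long` type and the `usecs_to_cycles` helper are assumptions made
for illustration, not copied from arch_timer.h.

```c
/* arch_timer.h (sketch): declaration only, no storage is allocated. */
extern unsigned long timer_freq;

/* processor.c (sketch): the single definition shared by all tests. */
unsigned long timer_freq;

/* Any test that includes the header can then use the variable directly. */
static unsigned long usecs_to_cycles(unsigned long usecs)
{
	return usecs * timer_freq / 1000000UL;
}
```

With this layout every test sees the same variable, so a single
sync_global_to_guest() of timer_freq is enough.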

Any preference?

>> +
>> +	vcpu_args_set(vcpu, 1, 0);
> 
> no need for args
> 

Fixed.

>> +
>> +	run_vcpu(vcpu);
>> +
>> +	test_vm_destroy(vm);
>> +}
>> +
>>   int main(void)
>>   {
>>   	test_vm_basic_test(test_pmu_basic_sanity);
>> @@ -460,5 +581,8 @@ int main(void)
>>   	test_vm_events_snapshot_test(test_pmu_events_snaphost);
>>   	pr_info("SBI PMU event verification with snapshot test : PASS\n");
>>   
>> +	test_vm_events_overflow(test_pmu_events_overflow);
>> +	pr_info("SBI PMU event verification with overflow test : PASS\n");
>> +
>>   	return 0;
>>   }
>> -- 
>> 2.34.1
>>
> 
> Thanks,
> drew


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v4 12/15] KVM: riscv: selftests: Add SBI PMU extension definitions
  2024-03-02 11:00   ` Andrew Jones
@ 2024-04-02  8:43     ` Atish Patra
  0 siblings, 0 replies; 56+ messages in thread
From: Atish Patra @ 2024-04-02  8:43 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-kernel, Albert Ou, Alexandre Ghiti, Anup Patel,
	Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On 3/2/24 03:00, Andrew Jones wrote:
> On Wed, Feb 28, 2024 at 05:01:27PM -0800, Atish Patra wrote:
>> The SBI PMU extension definition is required for upcoming SBI PMU
>> selftests.
>>
>> Signed-off-by: Atish Patra <atishp@rivosinc.com>
>> ---
>>   .../selftests/kvm/include/riscv/processor.h   | 67 +++++++++++++++++++
>>   1 file changed, 67 insertions(+)
>>
>> diff --git a/tools/testing/selftests/kvm/include/riscv/processor.h b/tools/testing/selftests/kvm/include/riscv/processor.h
>> index f75c381fa35a..a49a39c8e8d4 100644
>> --- a/tools/testing/selftests/kvm/include/riscv/processor.h
>> +++ b/tools/testing/selftests/kvm/include/riscv/processor.h
> 
> We should probably create a new header (include/riscv/sbi.h) since
> otherwise processor.h is very quickly going to look like an SBI
> header with a few non-sbi things in it. Can we add a patch prior to
> this one that moves the SBI stuff we currently have in processor.h
> out to an sbi.h? Or, we could start synchronizing a copy of
> arch/riscv/include/asm/sbi.h in tools/arch/riscv/include/asm like
> we've done for csr.h
> 
A separate sbi.h makes sense. I have moved the definitions to sbi.h for
now.

There are still a lot more definitions in the kernel's sbi.h which are
not required for selftests even after this patch, but I am okay with
syncing sbi.h. I am just not sure what the synchronization policy for
sbi.h should be.

As needed, or a regular sync after every release? The csr.h is already
out of date even though it was created in the last merge window (one
change is part of this series).

Let me know if you have any thoughts about that. I can send another
version with that.

>> @@ -169,17 +169,84 @@ void vm_install_exception_handler(struct kvm_vm *vm, int vector, exception_handl
>>   enum sbi_ext_id {
>>   	SBI_EXT_BASE = 0x10,
>>   	SBI_EXT_STA = 0x535441,
>> +	SBI_EXT_PMU = 0x504D55,
>>   };
>>   
>>   enum sbi_ext_base_fid {
>>   	SBI_EXT_BASE_PROBE_EXT = 3,
>>   };
>>   
>> +enum sbi_ext_pmu_fid {
>> +	SBI_EXT_PMU_NUM_COUNTERS = 0,
>> +	SBI_EXT_PMU_COUNTER_GET_INFO,
>> +	SBI_EXT_PMU_COUNTER_CFG_MATCH,
>> +	SBI_EXT_PMU_COUNTER_START,
>> +	SBI_EXT_PMU_COUNTER_STOP,
>> +	SBI_EXT_PMU_COUNTER_FW_READ,
>> +	SBI_EXT_PMU_COUNTER_FW_READ_HI,
>> +	SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
>> +};
>> +
>> +union sbi_pmu_ctr_info {
>> +	unsigned long value;
>> +	struct {
>> +		unsigned long csr:12;
>> +		unsigned long width:6;
>> +#if __riscv_xlen == 32
>> +		unsigned long reserved:13;
>> +#else
>> +		unsigned long reserved:45;
>> +#endif
>> +		unsigned long type:1;
>> +	};
>> +};
>> +
>>   struct sbiret {
>>   	long error;
>>   	long value;
>>   };
>>   
>> +/** General pmu event codes specified in SBI PMU extension */
>> +enum sbi_pmu_hw_generic_events_t {
>> +	SBI_PMU_HW_NO_EVENT			= 0,
>> +	SBI_PMU_HW_CPU_CYCLES			= 1,
>> +	SBI_PMU_HW_INSTRUCTIONS			= 2,
>> +	SBI_PMU_HW_CACHE_REFERENCES		= 3,
>> +	SBI_PMU_HW_CACHE_MISSES			= 4,
>> +	SBI_PMU_HW_BRANCH_INSTRUCTIONS		= 5,
>> +	SBI_PMU_HW_BRANCH_MISSES		= 6,
>> +	SBI_PMU_HW_BUS_CYCLES			= 7,
>> +	SBI_PMU_HW_STALLED_CYCLES_FRONTEND	= 8,
>> +	SBI_PMU_HW_STALLED_CYCLES_BACKEND	= 9,
>> +	SBI_PMU_HW_REF_CPU_CYCLES		= 10,
>> +
>> +	SBI_PMU_HW_GENERAL_MAX,
>> +};
>> +
>> +/* SBI PMU counter types */
>> +enum sbi_pmu_ctr_type {
>> +	SBI_PMU_CTR_TYPE_HW = 0x0,
>> +	SBI_PMU_CTR_TYPE_FW,
>> +};
>> +
>> +/* Flags defined for config matching function */
>> +#define SBI_PMU_CFG_FLAG_SKIP_MATCH	(1 << 0)
>> +#define SBI_PMU_CFG_FLAG_CLEAR_VALUE	(1 << 1)
>> +#define SBI_PMU_CFG_FLAG_AUTO_START	(1 << 2)
>> +#define SBI_PMU_CFG_FLAG_SET_VUINH	(1 << 3)
>> +#define SBI_PMU_CFG_FLAG_SET_VSINH	(1 << 4)
>> +#define SBI_PMU_CFG_FLAG_SET_UINH	(1 << 5)
>> +#define SBI_PMU_CFG_FLAG_SET_SINH	(1 << 6)
>> +#define SBI_PMU_CFG_FLAG_SET_MINH	(1 << 7)
>> +
>> +/* Flags defined for counter start function */
>> +#define SBI_PMU_START_FLAG_SET_INIT_VALUE (1 << 0)
>> +#define SBI_PMU_START_FLAG_INIT_FROM_SNAPSHOT BIT(1)
>> +
>> +/* Flags defined for counter stop function */
>> +#define SBI_PMU_STOP_FLAG_RESET (1 << 0)
>> +#define SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT BIT(1)
> 
> When changing shifts to BIT()'s, don't forget these (easy not to forget
> if we go with the synch sbi.h to tools approach)
> 
>> +
>>   struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
>>   			unsigned long arg1, unsigned long arg2,
>>   			unsigned long arg3, unsigned long arg4,
>> -- 
>> 2.34.1
>>
> 
> Thanks,
> drew


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v4 08/15] RISC-V: KVM: Implement SBI PMU Snapshot feature
  2024-04-01 22:36     ` Atish Patra
@ 2024-04-03  7:36       ` Atish Patra
  2024-04-04 13:19         ` Andrew Jones
  0 siblings, 1 reply; 56+ messages in thread
From: Atish Patra @ 2024-04-03  7:36 UTC (permalink / raw)
  To: Atish Patra, Andrew Jones
  Cc: linux-kernel, Anup Patel, Albert Ou, Alexandre Ghiti,
	Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv, kvm,
	linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On 4/1/24 15:36, Atish Patra wrote:
> On Sat, Mar 2, 2024 at 1:49 AM Andrew Jones <ajones@ventanamicro.com> wrote:
>>
>> On Wed, Feb 28, 2024 at 05:01:23PM -0800, Atish Patra wrote:
>>> PMU Snapshot function allows to minimize the number of traps when the
>>> guest access configures/access the hpmcounters. If the snapshot feature
>>> is enabled, the hypervisor updates the shared memory with counter
>>> data and state of overflown counters. The guest can just read the
>>> shared memory instead of trap & emulate done by the hypervisor.
>>>
>>> This patch doesn't implement the counter overflow yet.
>>>
>>> Reviewed-by: Anup Patel <anup@brainfault.org>
>>> Signed-off-by: Atish Patra <atishp@rivosinc.com>
>>> ---
>>>   arch/riscv/include/asm/kvm_vcpu_pmu.h |   7 ++
>>>   arch/riscv/kvm/vcpu_pmu.c             | 120 +++++++++++++++++++++++++-
>>>   arch/riscv/kvm/vcpu_sbi_pmu.c         |   3 +
>>>   drivers/perf/riscv_pmu_sbi.c          |   2 +-
>>>   4 files changed, 129 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
>>> index 395518a1664e..586bab84be35 100644
>>> --- a/arch/riscv/include/asm/kvm_vcpu_pmu.h
>>> +++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
>>> @@ -50,6 +50,10 @@ struct kvm_pmu {
>>>        bool init_done;
>>>        /* Bit map of all the virtual counter used */
>>>        DECLARE_BITMAP(pmc_in_use, RISCV_KVM_MAX_COUNTERS);
>>> +     /* The address of the counter snapshot area (guest physical address) */
>>> +     gpa_t snapshot_addr;
>>> +     /* The actual data of the snapshot */
>>> +     struct riscv_pmu_snapshot_data *sdata;
>>>   };
>>>
>>>   #define vcpu_to_pmu(vcpu) (&(vcpu)->arch.pmu_context)
>>> @@ -85,6 +89,9 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
>>>   int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
>>>                                struct kvm_vcpu_sbi_return *retdata);
>>>   void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu);
>>> +int kvm_riscv_vcpu_pmu_setup_snapshot(struct kvm_vcpu *vcpu, unsigned long saddr_low,
>>> +                                   unsigned long saddr_high, unsigned long flags,
>>> +                                   struct kvm_vcpu_sbi_return *retdata);
>>
>> I prefer to name this function
>>
>> kvm_riscv_vcpu_pmu_snapshot_set_shmem
>>
> 
> Sure.
> 
>>>   void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu);
>>>   void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu);
>>>
>>> diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
>>> index 29bf4ca798cb..74865e6050a1 100644
>>> --- a/arch/riscv/kvm/vcpu_pmu.c
>>> +++ b/arch/riscv/kvm/vcpu_pmu.c
>>> @@ -311,6 +311,81 @@ int kvm_riscv_vcpu_pmu_read_hpm(struct kvm_vcpu *vcpu, unsigned int csr_num,
>>>        return ret;
>>>   }
>>>
>>> +static void kvm_pmu_clear_snapshot_area(struct kvm_vcpu *vcpu)
>>> +{
>>> +     struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
>>> +     int snapshot_area_size = sizeof(struct riscv_pmu_snapshot_data);
>>> +
>>> +     if (kvpmu->sdata) {
>>> +             memset(kvpmu->sdata, 0, snapshot_area_size);
>>> +             if (kvpmu->snapshot_addr != INVALID_GPA)
>>
>> It's a KVM bug if we have non-null sdata but snapshot_addr is INVALID_GPA,
>> right? Maybe we should warn if we see that. We can also move the memset
>> inside the if block.
>>
> 
> Added a warning.
> 
>>> +                     kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr,
>>> +                                          kvpmu->sdata, snapshot_area_size);
>>> +             kfree(kvpmu->sdata);
>>> +             kvpmu->sdata = NULL;
>>> +     }
>>> +     kvpmu->snapshot_addr = INVALID_GPA;
>>> +}
>>> +
>>> +int kvm_riscv_vcpu_pmu_setup_snapshot(struct kvm_vcpu *vcpu, unsigned long saddr_low,
>>> +                                   unsigned long saddr_high, unsigned long flags,
>>> +                                   struct kvm_vcpu_sbi_return *retdata)
>>> +{
>>> +     struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
>>> +     int snapshot_area_size = sizeof(struct riscv_pmu_snapshot_data);
>>> +     int sbiret = 0;
>>> +     gpa_t saddr;
>>> +     unsigned long hva;
>>> +     bool writable;
>>> +
>>> +     if (!kvpmu) {
>>> +             sbiret = SBI_ERR_INVALID_PARAM;
>>> +             goto out;
>>> +     }
>>
>> Need to check that flags is zero or return SBI_ERR_INVALID_PARAM.
>>
> 
> Fixed.
> 
>>> +
>>> +     if (saddr_low == -1 && saddr_high == -1) {
>>
>> We introduced SBI_STA_SHMEM_DISABLE for these magic -1's for STA. Since
>> SBI is using the -1 approach for all its shmem, then maybe we should
>> rename SBI_STA_SHMEM_DISABLE to SBI_SHMEM_DISABLE and then use them here
>> too.
>>
> 
> Fixed
> 
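The two special cases in this function (all-ones in both halves to disable
the shared memory, and a non-zero high half on a 64-bit host) can be
modelled in isolation. The sketch below is a stand-alone model, not the KVM
code: `decode_shmem_addr`, `struct shmem_req` and the `xlen32` parameter
are made up for illustration, and it assumes the upper half of the address
is taken from the high argument.

```c
#include <stdbool.h>
#include <stdint.h>

enum shmem_kind { SHMEM_DISABLE, SHMEM_SET, SHMEM_INVALID };

struct shmem_req {
	enum shmem_kind kind;
	uint64_t gpa;	/* only meaningful when kind == SHMEM_SET */
};

/*
 * lo/hi model the two XLEN-sized SBI arguments; xlen32 selects whether
 * the caller is a 32-bit or a 64-bit guest.
 */
static struct shmem_req decode_shmem_addr(uint64_t lo, uint64_t hi, bool xlen32)
{
	const uint64_t all_ones = xlen32 ? UINT32_MAX : UINT64_MAX;
	struct shmem_req req = { .kind = SHMEM_INVALID, .gpa = 0 };

	if (lo == all_ones && hi == all_ones) {
		/* All-ones in both halves: disable the shared memory. */
		req.kind = SHMEM_DISABLE;
	} else if (xlen32) {
		/* 32-bit: the high argument supplies address bits 63:32. */
		req.kind = SHMEM_SET;
		req.gpa = (hi << 32) | lo;
	} else if (hi == 0) {
		/* 64-bit: the low argument already holds the full address. */
		req.kind = SHMEM_SET;
		req.gpa = lo;
	}

	return req;
}
```

Keeping the decoding separate from the allocation and guest write makes
the error paths easier to check one at a time.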
>>> +             kvm_pmu_clear_snapshot_area(vcpu);
>>> +             return 0;
>>> +     }
>>> +
>>> +     saddr = saddr_low;
>>> +
>>> +     if (saddr_high != 0) {
>>> +             if (IS_ENABLED(CONFIG_32BIT))
>>> +                     saddr |= ((gpa_t)saddr << 32);
>>> +             else
>>> +                     sbiret = SBI_ERR_INVALID_ADDRESS;
>>> +             goto out;
>>> +     }
>>> +
>>> +     if (kvm_is_error_gpa(vcpu->kvm, saddr)) {
>>> +             sbiret = SBI_ERR_INVALID_PARAM;
>>> +             goto out;
>>> +     }
>>
>> Does the check above provide anything more than what the check below does?
>>
> You are correct. I have removed the check
> 
>>> +
>>> +     hva = kvm_vcpu_gfn_to_hva_prot(vcpu, saddr >> PAGE_SHIFT, &writable);
>>> +     if (kvm_is_error_hva(hva) || !writable) {
>>> +             sbiret = SBI_ERR_INVALID_ADDRESS;
>>> +             goto out;
>>> +     }
>>> +
>>> +     kvpmu->snapshot_addr = saddr;
>>> +     kvpmu->sdata = kzalloc(snapshot_area_size, GFP_ATOMIC);
>>> +     if (!kvpmu->sdata)
>>
>> Should reset snapshot_addr to INVALID_GPA here on error. Or maybe we
>> should just set snapshot_addr to saddr at the bottom of this function if
>> we make it.
>>
> 
> Done.
> 
>>> +             return -ENOMEM;
>>> +
>>> +     if (kvm_vcpu_write_guest(vcpu, saddr, kvpmu->sdata, snapshot_area_size)) {
>>> +             kfree(kvpmu->sdata);
>>> +             kvpmu->snapshot_addr = INVALID_GPA;
>>> +             sbiret = SBI_ERR_FAILURE;
>>
>> I agree we should return this SBI error for this case, but unfortunately
>> the spec is missing the
>>
>>   SBI_ERR_FAILED - The request failed for unspecified or unknown other reasons.
>>
>> that we have for other SBI functions. I guess we should keep the code like
>> this and open a PR to the spec.
>>
> 
> I have created a blanket github issue for now. I will send a PR.
> 
>>> +     }
>>> +
>>> +out:
>>> +     retdata->err_val = sbiret;
>>> +
>>> +     return 0;
>>> +}
>>> +
>>>   int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu,
>>>                                struct kvm_vcpu_sbi_return *retdata)
>>>   {
>>> @@ -344,20 +419,33 @@ int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
>>>        int i, pmc_index, sbiret = 0;
>>>        struct kvm_pmc *pmc;
>>>        int fevent_code;
>>> +     bool snap_flag_set = flags & SBI_PMU_START_FLAG_INIT_FROM_SNAPSHOT;
>>
>> This function should confirm no undefined bits are set in flags and the
>> spec should specify that the reserved flags must be zero otherwise an
>> invalid param will be returned.
>>
>> Also here would should confirm that only one of the two flags is set,
>> otherwise return invalid param, as they've specified to be mutually
>> exclusive.
>>
> 
> That makes sense. Update the same github issue.
> (https://github.com/riscv-non-isa/riscv-sbi-doc/issues/145)
> 
> I will make the necessary changes in a separate series after the spec is merged.
> 
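A sketch of what that validation might look like once the spec wording is
settled is shown below. The two flag values match the definitions used in
this series; `SBI_PMU_START_FLAGS_DEFINED` and `start_flags_valid` are
hypothetical names, and the rejection rules are only the ones proposed in
the review above.

```c
#include <stdbool.h>

#define SBI_PMU_START_FLAG_SET_INIT_VALUE	(1UL << 0)
#define SBI_PMU_START_FLAG_INIT_FROM_SNAPSHOT	(1UL << 1)

/* Hypothetical mask of every flag bit the start function defines. */
#define SBI_PMU_START_FLAGS_DEFINED \
	(SBI_PMU_START_FLAG_SET_INIT_VALUE | SBI_PMU_START_FLAG_INIT_FROM_SNAPSHOT)

/* Returns false where the caller would report an invalid parameter. */
static bool start_flags_valid(unsigned long flags)
{
	/* Reserved bits must be zero. */
	if (flags & ~SBI_PMU_START_FLAGS_DEFINED)
		return false;

	/* The two initialisation sources are mutually exclusive. */
	if ((flags & SBI_PMU_START_FLAGS_DEFINED) == SBI_PMU_START_FLAGS_DEFINED)
		return false;

	return true;
}
```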
>> Regarding the spec, the note about the counter value not being modified
>> unless SBI_PMU_START_SET_INIT_VALUE is set should be modified to state
>> unless either of the two flags are set (so I think we need another spec
>> PR).
>>
>> (The same flags checking/specifying comments apply to the other functions
>> with flags too.)
>>
> 
> Noted (https://github.com/riscv-non-isa/riscv-sbi-doc/issues/146).
> 
>>>
>>>        if (kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0) {
>>>                sbiret = SBI_ERR_INVALID_PARAM;
>>>                goto out;
>>>        }
>>>
>>> +     if (snap_flag_set && kvpmu->snapshot_addr == INVALID_GPA) {
>>> +             sbiret = SBI_ERR_NO_SHMEM;
>>> +             goto out;
>>> +     }
>>> +
>>>        /* Start the counters that have been configured and requested by the guest */
>>>        for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
>>>                pmc_index = i + ctr_base;
>>>                if (!test_bit(pmc_index, kvpmu->pmc_in_use))
>>>                        continue;
>>>                pmc = &kvpmu->pmc[pmc_index];
>>> -             if (flags & SBI_PMU_START_FLAG_SET_INIT_VALUE)
>>> +             if (flags & SBI_PMU_START_FLAG_SET_INIT_VALUE) {
>>>                        pmc->counter_val = ival;
>>> +             } else if (snap_flag_set) {
>>> +                     kvm_vcpu_read_guest(vcpu, kvpmu->snapshot_addr, kvpmu->sdata,
>>> +                                         sizeof(struct riscv_pmu_snapshot_data));
>>
>> The snapshot read should be outside the for_each_set_bit() loop and we
>> should warn and abort the counter starting if the read fails.
>>
> 
> Fixed. This should also fall under the SBI_ERR_FAILURE category.
> 
>>> +                     /* The counter index in the snapshot are relative to the counter base */
>>> +                     pmc->counter_val = kvpmu->sdata->ctr_values[i];
>>> +             }
>>> +
>>>                if (pmc->cinfo.type == SBI_PMU_CTR_TYPE_FW) {
>>>                        fevent_code = get_event_code(pmc->event_idx);
>>>                        if (fevent_code >= SBI_PMU_FW_MAX) {
>>> @@ -398,14 +486,21 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
>>>   {
>>>        struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
>>>        int i, pmc_index, sbiret = 0;
>>> +     u64 enabled, running;
>>>        struct kvm_pmc *pmc;
>>>        int fevent_code;
>>> +     bool snap_flag_set = flags & SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT;
>>>
>>> -     if (kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0) {
>>> +     if ((kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0)) {
>>
>> Added unnecessary () here.
>>
> 
> Fixed.
> 
>>>                sbiret = SBI_ERR_INVALID_PARAM;
>>>                goto out;
>>>        }
>>>
>>> +     if (snap_flag_set && kvpmu->snapshot_addr == INVALID_GPA) {
>>> +             sbiret = SBI_ERR_NO_SHMEM;
>>> +             goto out;
>>> +     }
>>> +
>>>        /* Stop the counters that have been configured and requested by the guest */
>>>        for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
>>>                pmc_index = i + ctr_base;
>>> @@ -438,9 +533,28 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
>>>                } else {
>>>                        sbiret = SBI_ERR_INVALID_PARAM;
>>>                }
>>> +
>>> +             if (snap_flag_set && !sbiret) {
>>> +                     if (pmc->cinfo.type == SBI_PMU_CTR_TYPE_FW)
>>> +                             pmc->counter_val = kvpmu->fw_event[fevent_code].value;
>>> +                     else if (pmc->perf_event)
>>> +                             pmc->counter_val += perf_event_read_value(pmc->perf_event,
>>> +                                                                       &enabled, &running);
>>> +                     /* TODO: Add counter overflow support when sscofpmf support is added */
>>> +                     kvpmu->sdata->ctr_values[i] = pmc->counter_val;
>>> +                     kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr, kvpmu->sdata,
>>> +                                          sizeof(struct riscv_pmu_snapshot_data));
>>
>> Should just set a boolean here saying that the snapshot needs an update
>> and then do the update outside the for_each_set_bit loop.
>>
> 
> Done.
> 
>>> +             }
>>> +
>>>                if (flags & SBI_PMU_STOP_FLAG_RESET) {
>>>                        pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
>>>                        clear_bit(pmc_index, kvpmu->pmc_in_use);
>>> +                     if (snap_flag_set) {
>>> +                             /* Clear the snapshot area for the upcoming deletion event */
>>> +                             kvpmu->sdata->ctr_values[i] = 0;
>>> +                             kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr, kvpmu->sdata,
>>> +                                                  sizeof(struct riscv_pmu_snapshot_data));
>>
>> The spec isn't clear on this (so we should clarify it), but I'd expect
>> that a caller who set both the reset and the snapshot flag would want
>> the snapshot from before the reset when this call completes and then
>> assume that when they start counting again, and look at the snapshot
>> again, that those new counts would be from the reset values. Or maybe
>> not :-) Maybe they want to do a reset and take a snapshot in order to
>> look at the snapshot and confirm the reset happened? Either way, it
>> seems we should only do one of the two here. Either update the snapshot
>> before resetting, and not again after reset, or reset and then update
>> the snapshot (with no need to update before).
>>
> 
> The reset call should happen when the event is deleted by the perf
> framework in supervisor.
> If we don't clear the values, the shared memory may have stale data of
> last read counters
> which is not ideal. That's why, I am clearing it upon resetting.

Thinking about it more, I think leaving stale values in the shared memory
would be similar to the expected behavior of hardware counters after reset.
We don't need to clear the shared memory during the reset.

If both SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT and SBI_PMU_STOP_FLAG_RESET are
set, maybe we should just write it to the shared memory again without
assuming the intention of the caller?
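
For illustration only, here is a minimal user-space sketch of the stop path
discussed above: each stopped counter's value is recorded in a host-side copy
of the snapshot, a flag notes that an update is pending, and the shared area
is written once after the loop, so the pre-reset values stay visible. This is
not the KVM code; the toy_pmu struct, the helper name, the counter limit and
the flag values are assumptions made for the example, and the shared memory
is modelled as a plain struct.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_CTRS 8                              /* assumption: toy counter count */
#define STOP_FLAG_RESET         (1UL << 0)      /* illustrative flag values */
#define STOP_FLAG_TAKE_SNAPSHOT (1UL << 1)

struct snapshot {
	uint64_t ctr_values[MAX_CTRS];
};

struct toy_pmu {
	uint64_t counter_val[MAX_CTRS];
	bool in_use[MAX_CTRS];
	struct snapshot sdata;  /* host-side copy of the snapshot */
	struct snapshot shmem;  /* stands in for the guest shared memory */
	int shmem_writes;       /* counts writes to the shared area */
};

static void toy_ctr_stop(struct toy_pmu *pmu, unsigned long ctr_mask,
			 unsigned long flags)
{
	bool snap = flags & STOP_FLAG_TAKE_SNAPSHOT;
	bool update = false;

	for (int i = 0; i < MAX_CTRS; i++) {
		if (!(ctr_mask & (1UL << i)) || !pmu->in_use[i])
			continue;

		/* Record the value at stop time; do not clear it on reset. */
		if (snap) {
			pmu->sdata.ctr_values[i] = pmu->counter_val[i];
			update = true;
		}

		/* Reset only drops the counter to event mapping. */
		if (flags & STOP_FLAG_RESET)
			pmu->in_use[i] = false;
	}

	/* Single write to the shared area, outside the loop. */
	if (update) {
		memcpy(&pmu->shmem, &pmu->sdata, sizeof(pmu->shmem));
		pmu->shmem_writes++;
	}
}
```

With both flags set, the shared area ends up holding the values read at stop
time and is written exactly once per call.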

> The actual counter value should be read while stopping the counters.
> 
> I thought the current description is clear enough as it says
> 
> "SBI_PMU_STOP_FLAG_RESET - Reset the counter to event mapping."
> 
> Do you feel we should be more explicit about this ?
> 
>>> +                     }
>>>                }
>>>        }
>>>
>>> @@ -566,6 +680,7 @@ void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
>>>        kvpmu->num_hw_ctrs = num_hw_ctrs + 1;
>>>        kvpmu->num_fw_ctrs = SBI_PMU_FW_MAX;
>>>        memset(&kvpmu->fw_event, 0, SBI_PMU_FW_MAX * sizeof(struct kvm_fw_event));
>>> +     kvpmu->snapshot_addr = INVALID_GPA;
>>>
>>>        if (kvpmu->num_hw_ctrs > RISCV_KVM_MAX_HW_CTRS) {
>>>                pr_warn_once("Limiting the hardware counters to 32 as specified by the ISA");
>>> @@ -625,6 +740,7 @@ void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
>>>        }
>>>        bitmap_zero(kvpmu->pmc_in_use, RISCV_MAX_COUNTERS);
>>>        memset(&kvpmu->fw_event, 0, SBI_PMU_FW_MAX * sizeof(struct kvm_fw_event));
>>> +     kvm_pmu_clear_snapshot_area(vcpu);
>>>   }
>>>
>>>   void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu)
>>> diff --git a/arch/riscv/kvm/vcpu_sbi_pmu.c b/arch/riscv/kvm/vcpu_sbi_pmu.c
>>> index b70179e9e875..9f61136e4bb1 100644
>>> --- a/arch/riscv/kvm/vcpu_sbi_pmu.c
>>> +++ b/arch/riscv/kvm/vcpu_sbi_pmu.c
>>> @@ -64,6 +64,9 @@ static int kvm_sbi_ext_pmu_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
>>>        case SBI_EXT_PMU_COUNTER_FW_READ:
>>>                ret = kvm_riscv_vcpu_pmu_ctr_read(vcpu, cp->a0, retdata);
>>>                break;
>>> +     case SBI_EXT_PMU_SNAPSHOT_SET_SHMEM:
>>> +             ret = kvm_riscv_vcpu_pmu_setup_snapshot(vcpu, cp->a0, cp->a1, cp->a2, retdata);
>>> +             break;
>>>        default:
>>>                retdata->err_val = SBI_ERR_NOT_SUPPORTED;
>>>        }
>>> diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
>>> index 8de5721e8019..1a22ce1ff8c8 100644
>>> --- a/drivers/perf/riscv_pmu_sbi.c
>>> +++ b/drivers/perf/riscv_pmu_sbi.c
>>> @@ -802,7 +802,7 @@ static noinline void pmu_sbi_start_ovf_ctrs_snapshot(struct cpu_hw_events *cpu_h
>>>        struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
>>>
>>>        for_each_set_bit(idx, cpu_hw_evt->used_hw_ctrs, RISCV_MAX_COUNTERS) {
>>> -             if (ctr_ovf_mask & (1 << idx)) {
>>> +             if (ctr_ovf_mask & (BIT(idx))) {
>>>                        event = cpu_hw_evt->events[idx];
>>>                        hwc = &event->hw;
>>>                        max_period = riscv_pmu_ctr_get_width_mask(event);
>>> --
>>> 2.34.1
>>>
>>
>> Thanks,
>> drew
> 
> 
> 



* Re: [PATCH v4 07/15] RISC-V: KVM: No need to exit to the user space if perf event failed
  2024-04-01 22:37     ` Atish Patra
@ 2024-04-04 12:16       ` Andrew Jones
  2024-04-10 22:44         ` Atish Patra
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Jones @ 2024-04-04 12:16 UTC (permalink / raw)
  To: Atish Patra
  Cc: Atish Patra, linux-kernel, Anup Patel, Albert Ou,
	Alexandre Ghiti, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Mon, Apr 01, 2024 at 03:37:01PM -0700, Atish Patra wrote:
> On Sat, Mar 2, 2024 at 12:16 AM Andrew Jones <ajones@ventanamicro.com> wrote:
> >
> > On Wed, Feb 28, 2024 at 05:01:22PM -0800, Atish Patra wrote:
> > > Currently, we return a linux error code if creating a perf event failed
> > > in kvm. That shouldn't be necessary as guest can continue to operate
> > > without perf profiling or profiling with firmware counters.
> > >
> > > Return appropriate SBI error code to indicate that PMU configuration
> > > failed. An error message in kvm already describes the reason for failure.
> >
> > I don't know enough about the perf subsystem to know if there may be
> > a concern that resources are temporarily unavailable. If so, then this
> 
> Do you mean the hardware resources unavailable because the host is using it ?

Yes (I think). The issue I'm thinking of is if kvm_pmu_create_perf_event
(perf_event_create_kernel_counter) returns something like EBUSY and then
we translate that to SBI_ERR_NOT_SUPPORTED. I'm not sure guests would
interpret not-supported as an error which means they can retry. Or, if
they retry and get something other than not-supported, whether they'd be
confused.
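
To make the concern concrete, here is a small sketch of the translation the
patch performs: any non-zero return from event creation collapses into
SBI_ERR_NOT_SUPPORTED, so a transient -EBUSY and a permanent -ENOENT look
identical to the guest. The helper name is hypothetical; the two SBI error
values are the ones defined by the SBI specification.

```c
#include <errno.h>

#define SBI_SUCCESS             0
#define SBI_ERR_NOT_SUPPORTED   (-2)

/*
 * Hypothetical helper mirroring the patch's behavior: every host-side
 * failure is reported to the guest as "not supported".
 */
static long toy_cfg_match_sbiret(long perf_ret)
{
	return perf_ret ? SBI_ERR_NOT_SUPPORTED : SBI_SUCCESS;
}
```

A guest therefore cannot tell "try again later" apart from "never going to
work" by looking at the SBI return value alone.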

Thanks,
drew
  

> 
> > patch would make it possible for a guest to do the exact same thing,
> > but sometimes succeed and sometimes get SBI_ERR_NOT_SUPPORTED.
> > sbi_pmu_counter_config_matching doesn't currently have any error types
> > specified that say "unsupported at the moment, maybe try again", which
> > would be more appropriate in that case. I do see
> > perf_event_create_kernel_counter() can return ENOMEM when memory isn't
> > available, but if the kernel isn't able to allocate a small amount of
> > memory, then we're in bigger trouble anyway, so the concern would be
> > if there are perf resource pools which may temporarily be exhausted at
> > the time the guest makes this request.
> >
> 
> For other cases, this patch ensures that guests continue to run without failure
> which allows the user in the guest to try again if this fails due to a temporary
> resource availability.
> 
> > One comment below.
> >
> > >
> > > Fixes: 0cb74b65d2e5 ("RISC-V: KVM: Implement perf support without sampling")
> > > Reviewed-by: Anup Patel <anup@brainfault.org>
> > > Signed-off-by: Atish Patra <atishp@rivosinc.com>
> > > ---
> > >  arch/riscv/kvm/vcpu_pmu.c     | 14 +++++++++-----
> > >  arch/riscv/kvm/vcpu_sbi_pmu.c |  6 +++---
> > >  2 files changed, 12 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> > > index b1574c043f77..29bf4ca798cb 100644
> > > --- a/arch/riscv/kvm/vcpu_pmu.c
> > > +++ b/arch/riscv/kvm/vcpu_pmu.c
> > > @@ -229,8 +229,9 @@ static int kvm_pmu_validate_counter_mask(struct kvm_pmu *kvpmu, unsigned long ct
> > >       return 0;
> > >  }
> > >
> > > -static int kvm_pmu_create_perf_event(struct kvm_pmc *pmc, struct perf_event_attr *attr,
> > > -                                  unsigned long flags, unsigned long eidx, unsigned long evtdata)
> > > +static long kvm_pmu_create_perf_event(struct kvm_pmc *pmc, struct perf_event_attr *attr,
> > > +                                   unsigned long flags, unsigned long eidx,
> > > +                                   unsigned long evtdata)
> > >  {
> > >       struct perf_event *event;
> > >
> > > @@ -454,7 +455,8 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
> > >                                    unsigned long eidx, u64 evtdata,
> > >                                    struct kvm_vcpu_sbi_return *retdata)
> > >  {
> > > -     int ctr_idx, ret, sbiret = 0;
> > > +     int ctr_idx, sbiret = 0;
> > > +     long ret;
> > >       bool is_fevent;
> > >       unsigned long event_code;
> > >       u32 etype = kvm_pmu_get_perf_event_type(eidx);
> > > @@ -513,8 +515,10 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
> > >                       kvpmu->fw_event[event_code].started = true;
> > >       } else {
> > >               ret = kvm_pmu_create_perf_event(pmc, &attr, flags, eidx, evtdata);
> > > -             if (ret)
> > > -                     return ret;
> > > +             if (ret) {
> > > +                     sbiret = SBI_ERR_NOT_SUPPORTED;
> > > +                     goto out;
> > > +             }
> > >       }
> > >
> > >       set_bit(ctr_idx, kvpmu->pmc_in_use);
> > > diff --git a/arch/riscv/kvm/vcpu_sbi_pmu.c b/arch/riscv/kvm/vcpu_sbi_pmu.c
> > > index 7eca72df2cbd..b70179e9e875 100644
> > > --- a/arch/riscv/kvm/vcpu_sbi_pmu.c
> > > +++ b/arch/riscv/kvm/vcpu_sbi_pmu.c
> > > @@ -42,9 +42,9 @@ static int kvm_sbi_ext_pmu_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > >  #endif
> > >               /*
> > >                * This can fail if perf core framework fails to create an event.
> > > -              * Forward the error to userspace because it's an error which
> > > -              * happened within the host kernel. The other option would be
> > > -              * to convert to an SBI error and forward to the guest.
> > > +              * No need to forward the error to userspace and exit the guest
> >
> > Period after guest
> >
> >
> > > +              * operation can continue without profiling. Forward the
> >
> > The operation
> >
> 
> Fixed the above two.
> 
> 
> > > +              * appropriate SBI error to the guest.
> > >                */
> > >               ret = kvm_riscv_vcpu_pmu_ctr_cfg_match(vcpu, cp->a0, cp->a1,
> > >                                                      cp->a2, cp->a3, temp, retdata);
> > > --
> > > 2.34.1
> > >
> >
> > Thanks,
> > drew
> 
> 
> 
> --
> Regards,
> Atish


* Re: [PATCH v4 08/15] RISC-V: KVM: Implement SBI PMU Snapshot feature
  2024-04-03  7:36       ` Atish Patra
@ 2024-04-04 13:19         ` Andrew Jones
  0 siblings, 0 replies; 56+ messages in thread
From: Andrew Jones @ 2024-04-04 13:19 UTC (permalink / raw)
  To: Atish Patra
  Cc: Atish Patra, linux-kernel, Anup Patel, Albert Ou,
	Alexandre Ghiti, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Wed, Apr 03, 2024 at 12:36:41AM -0700, Atish Patra wrote:
> On 4/1/24 15:36, Atish Patra wrote:
> > On Sat, Mar 2, 2024 at 1:49 AM Andrew Jones <ajones@ventanamicro.com> wrote:
> > > 
> > > On Wed, Feb 28, 2024 at 05:01:23PM -0800, Atish Patra wrote:
...
> > > > +
> > > >                if (flags & SBI_PMU_STOP_FLAG_RESET) {
> > > >                        pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
> > > >                        clear_bit(pmc_index, kvpmu->pmc_in_use);
> > > > +                     if (snap_flag_set) {
> > > > +                             /* Clear the snapshot area for the upcoming deletion event */
> > > > +                             kvpmu->sdata->ctr_values[i] = 0;
> > > > +                             kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr, kvpmu->sdata,
> > > > +                                                  sizeof(struct riscv_pmu_snapshot_data));
> > > 
> > > The spec isn't clear on this (so we should clarify it), but I'd expect
> > > that a caller who set both the reset and the snapshot flag would want
> > > the snapshot from before the reset when this call completes and then
> > > assume that when they start counting again, and look at the snapshot
> > > again, that those new counts would be from the reset values. Or maybe
> > > not :-) Maybe they want to do a reset and take a snapshot in order to
> > > look at the snapshot and confirm the reset happened? Either way, it
> > > seems we should only do one of the two here. Either update the snapshot
> > > before resetting, and not again after reset, or reset and then update
> > > the snapshot (with no need to update before).
> > > 
> > 
> > The reset call should happen when the event is deleted by the perf
> > framework in supervisor.
> > If we don't clear the values, the shared memory may have stale data of
> > last read counters
> > which is not ideal. That's why, I am clearing it upon resetting.
> 
> Thinking about it more, I think having stale values in the shared memory
> would be similar expected behavior to hardware counters after reset. We
> don't need to clear the shared memory during the reset.
> 
> If both SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT and SBI_PMU_STOP_FLAG_RESET are set,
> may be we should just write it to the shared memory again without assuming
> the intention of the caller ?
>

Either way, we just need to ensure it's clear in the spec.

Thanks,
drew


* Re: [PATCH v4 09/15] RISC-V: KVM: Add perf sampling support for guests
  2024-04-02  8:33     ` Atish Patra
@ 2024-04-05 12:05       ` Andrew Jones
  2024-04-10  0:11         ` Atish Patra
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Jones @ 2024-04-05 12:05 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel, Anup Patel, Albert Ou, Alexandre Ghiti,
	Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Tue, Apr 02, 2024 at 01:33:10AM -0700, Atish Patra wrote:
...
> > but it should be possible for the VMM to disable this extension in the
> > guest. We just need to change all the checks in KVM of the host's ISA
> > for RISCV_ISA_EXT_SSCOFPMF to checking the guest's ISA instead. Maybe
> > it's not worth it, though, if the guest PMU isn't useful without overflow.
> > But, sometimes it's nice to be able to disable stuff for debug and
> > workarounds.
> > 
> 
> As per my understanding, kvm_riscv_vcpu_isa_disable_allowed only returns
> true for those extensions which can be disabled architecturally.

I think kvm_riscv_vcpu_isa_disable_allowed can return true for any
extensions that KVM can guarantee won't be exposed in any way to the
guest. Extensions that cannot be disabled architecturally must return
false, since their instructions will still be present in the guest, even
if KVM doesn't want to expose them, but extensions which KVM emulates
can return true because KVM can choose not to emulate them. IIUC, sscofpmf
falls in this latter category.

> 
> VMM can still disable any extension by not adding to the device tree.
> In fact, that's how kvmtool can disable sstc or sscofpmf with
> --disable-<isa-ext command>.
> 
> The warning is bit confused though.
> 
> For example: if you run kvmtool with --disable-sstc
> 
> "Warning: Failed to disable sstc ISA exension"

I think Sstc should allow disabling since it has a corresponding henvcfg
bit which KVM could leave unset in order to force accesses to the Sstc CSRs
to raise ILL exceptions. So, let's put Sstc aside, since it's not a good
example. An extension like Zihintpause, OTOH, cannot be disabled since
the 'pause' instruction will be present even if KVM does not put
Zihintpause in the guest's ISA string. If a kvmtool user uses
--disable-zihintpause, then I think this warning about failing to disable
the extension is appropriate.

> 
> But sstc is disabled: Here is the cpuinfo output.
> # cat /proc/cpuinfo
> processor       : 0
> hart            : 0
> isa             : rv64imafdc_zicbom_zicboz_zicntr_zicsr_zifencei_zihintntl_zihintpause_zihpm_zfa_zba_zbb_zbc_zbs_smstateen_sscofpmf
> mmu             : sv57
> mvendorid       : 0x0
> marchid         : 0x0
> mimpid          : 0x0
> hart isa        : rv64imafdc_zicbom_zicboz_zicntr_zicsr_zifencei_zihintntl_zihintpause_zihpm_zfa_zba_zbb_zbc_zbs_smstateen_sscofpmf

Removing from the ISA string is the best we can do in cases like
Zihintpause, and is likely good enough for well-behaved guests, but the
VMM's warning to the user is good for these cases too, since not all
guests are well-behaved.

Thanks,
drew


* Re: [PATCH v4 13/15] KVM: riscv: selftests: Add SBI PMU selftest
  2024-04-02  8:34     ` Atish Patra
@ 2024-04-05 12:48       ` Andrew Jones
  0 siblings, 0 replies; 56+ messages in thread
From: Andrew Jones @ 2024-04-05 12:48 UTC (permalink / raw)
  To: Atish Patra
  Cc: Atish Patra, linux-kernel, Albert Ou, Alexandre Ghiti,
	Anup Patel, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv, kvm,
	linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Tue, Apr 02, 2024 at 01:34:54AM -0700, Atish Patra wrote:
...
> > > +static void guest_illegal_exception_handler(struct ex_regs *regs)
> > > +{
> > > +     __GUEST_ASSERT(regs->cause == EXC_INST_ILLEGAL,
> > > +                    "Unexpected exception handler %lx\n", regs->cause);
> >
> > Shouldn't we be reporting somehow that we were here? We seem to be using
> > this handler to skip instructions which don't work, which is fine, if
> > we have some knowledge we skipped them and then do something else.
> > Otherwise I don't understand.
> >
> 
> This is only used in test_vm_basic_test to validate that the guest
> will get an illegal
> exception if they try to access without configuring first.

Yeah, that's good. I just don't see how we know we were ever here. We
either got the exception and then stepped over the CSR read or we did
the CSR read. Either way, the test progresses the same. Shouldn't this
induce a test skip or something instead?
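
One way to know we were there, sketched below outside the selftest
framework: the handler sets a flag before stepping over the faulting
instruction, and the test body checks the flag after the CSR read. The
toy_ex_regs struct, the flag and the handler name are stand-ins invented for
this example; only the cause value 2 (illegal instruction) comes from the
RISC-V privileged spec.

```c
#include <stdbool.h>

#define EXC_INST_ILLEGAL 2      /* RISC-V illegal-instruction cause */

/* Stand-in for the selftest register frame (assumption). */
struct toy_ex_regs {
	unsigned long epc;
	unsigned long cause;
};

/* Set by the handler so the test body can tell the trap happened. */
static volatile bool illegal_trap_seen;

static void toy_illegal_exception_handler(struct toy_ex_regs *regs)
{
	if (regs->cause != EXC_INST_ILLEGAL)
		return;

	illegal_trap_seen = true;
	regs->epc += 4;         /* step over the 4-byte CSR instruction */
}
```

The test body would then assert that the flag is set after the CSR read, so
a read that unexpectedly succeeds fails the test instead of passing silently.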

> > > +
> > > +     counter_value_post = read_counter(counter, ctrinfo_arr[counter]);
> > > +     __GUEST_ASSERT(counter_value_post > counter_value_pre,
> > > +                    "counter_value_post %lx counter_value_pre %lx\n",
> > > +                    counter_value_post, counter_value_pre);
> > > +
> > > +     /* Now set the initial value and compare */
> > > +     start_counter(counter, SBI_PMU_START_FLAG_SET_INIT_VALUE, counter_init_value);
> >
> > We should try to confirm that we reset the counter, otherwise the check
> > below only proves that the value we read is greater than 100, which it
> > is possible even if the reset doesn't work.
> >
> 
> Hmm. There is no way to just update the counter value without starting
> it. Reading it without stopping is not reliable.
> Maybe we can do this.
> 
> 1. Reset it to 100. Stop it immediately after and read it. Let's say
> the value is X
> 2. Now reset it to counter  X + 1000.
> 3. Do the validation with the above reset value in #2.
> 
> Wdyt ?

OK
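
For reference, the three steps can be sketched against a simulated counter.
Everything here (the toy_counter struct and the start/tick/stop helpers) is a
stand-in for the SBI calls in the selftest, not the selftest code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated counter; the real test would go through SBI PMU calls. */
struct toy_counter {
	uint64_t value;
	bool running;
};

static void toy_start(struct toy_counter *c, uint64_t init)
{
	c->value = init;        /* models SBI_PMU_START_FLAG_SET_INIT_VALUE */
	c->running = true;
}

static void toy_tick(struct toy_counter *c, uint64_t n)
{
	if (c->running)
		c->value += n;
}

static uint64_t toy_stop_and_read(struct toy_counter *c)
{
	c->running = false;
	return c->value;
}

/* Returns true when both initial values visibly took effect. */
static bool toy_check_init_value(struct toy_counter *c)
{
	/* Step 1: start at 100, stop immediately, read X. */
	toy_start(c, 100);
	toy_tick(c, 7);
	uint64_t x = toy_stop_and_read(c);

	/* Step 2: restart with X + 1000 as the initial value. */
	toy_start(c, x + 1000);
	toy_tick(c, 3);

	/* Step 3: validate against the value set in step 2. */
	uint64_t y = toy_stop_and_read(c);

	return x >= 100 && y >= x + 1000;
}
```

Note that the final check only distinguishes a working restart from a broken
one if fewer than 1000 events are counted between the second start and stop.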

Thanks,
drew


* Re: [PATCH v4 09/15] RISC-V: KVM: Add perf sampling support for guests
  2024-04-05 12:05       ` Andrew Jones
@ 2024-04-10  0:11         ` Atish Patra
  2024-04-10  7:20           ` Andrew Jones
  0 siblings, 1 reply; 56+ messages in thread
From: Atish Patra @ 2024-04-10  0:11 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-kernel, Anup Patel, Albert Ou, Alexandre Ghiti,
	Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On 4/5/24 05:05, Andrew Jones wrote:
> On Tue, Apr 02, 2024 at 01:33:10AM -0700, Atish Patra wrote:
> ...
>>> but it should be possible for the VMM to disable this extension in the
>>> guest. We just need to change all the checks in KVM of the host's ISA
>>> for RISCV_ISA_EXT_SSCOFPMF to checking the guest's ISA instead. Maybe
>>> it's not worth it, though, if the guest PMU isn't useful without overflow.
>>> But, sometimes it's nice to be able to disable stuff for debug and
>>> workarounds.
>>>
>>
>> As per my understanding, kvm_riscv_vcpu_isa_disable_allowed only returns
>> true for those extensions which can be disabled architecturally.
> 
> I think kvm_riscv_vcpu_isa_disable_allowed can return true for any
> extensions that KVM can guarantee won't be exposed in any way to the
> guest. Extensions that cannot be disabled architecturally must return
> false, since their instructions will still be present in the guest, even
> if KVM doesn't want to expose them, but extensions which KVM emulates
> can return true because KVM can choose not to emulate them. IIUC, sscofpmf
> falls in this latter category.
> 

Hmm. Sscofpmf depends on interrupt filtering via hvien and on SBI PMU.
So you are suggesting that we toggle off the CSR_HVIEN bit for the overflow
interrupt, or do more granular disabling for privilege mode filtering in
SBI PMU as well?

Beyond that, we can't disable SBI PMU. Is that okay? A guest can still
cause a counter overflow and interrupt the host. However, the guest won't
get any interrupt as hvien is not set.

It can also still filter the events, as that is tied to SBI PMU.

We can put more granular checks in SBI PMU, but I am just wondering if that
provides anything beyond just disabling sscofpmf in the
device tree.


>>
>> VMM can still disable any extension by not adding to the device tree.
>> In fact, that's how kvmtool can disable sstc or sscofpmf with
>> --disable-<isa-ext command>.
>>
>> The warning is bit confused though.
>>
>> For example: if you run kvmtool with --disable-sstc
>>
>> "Warning: Failed to disable sstc ISA exension"
> 
> I think Sstc should allow disabling since it has a corresponding henvcfg
> bit which KVM could not set in order to force accesses to the Sstc CSRs
> to raise ILL exceptions. So, let's put Sstc aside, since it's not a good

Agreed. I will send a separate patch for that.

> example. An extension like Zihintpause, OTOH, cannot be disabled since
> the 'pause' instruction will be present even if KVM does not put
> Zihintpause in the guest's ISA string. If a kvmtool user uses
> --disable-zihintpause, then I think this warning about failing to disable
> the extension is appropriate.
> 
>>
>> But sstc is disabled: Here is the cpuinfo output.
>> # cat /proc/cpuinfo
>> processor       : 0
>> hart            : 0
>> isa             : rv64imafdc_zicbom_zicboz_zicntr_zicsr_zifencei_zihintntl_zihintpause_zihpm_zfa_zba_zbb_zbc_zbs_smstateen_sscofpmf
>> mmu             : sv57
>> mvendorid       : 0x0
>> marchid         : 0x0
>> mimpid          : 0x0
>> hart isa        : rv64imafdc_zicbom_zicboz_zicntr_zicsr_zifencei_zihintntl_zihintpause_zihpm_zfa_zba_zbb_zbc_zbs_smstateen_sscofpmf
> 
> Removing from the ISA string is the best we can do in cases like
> Zihintpause, and is likely good enough for well-behaved guests, but the
> VMM's warning to the user is good for these cases too, since not all
> guests are well-behaved.
> 
> Thanks,
> drew



* Re: [PATCH v4 09/15] RISC-V: KVM: Add perf sampling support for guests
  2024-04-10  0:11         ` Atish Patra
@ 2024-04-10  7:20           ` Andrew Jones
  0 siblings, 0 replies; 56+ messages in thread
From: Andrew Jones @ 2024-04-10  7:20 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel, Anup Patel, Albert Ou, Alexandre Ghiti,
	Atish Patra, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Tue, Apr 09, 2024 at 05:11:31PM -0700, Atish Patra wrote:
> On 4/5/24 05:05, Andrew Jones wrote:
> > On Tue, Apr 02, 2024 at 01:33:10AM -0700, Atish Patra wrote:
> > ...
> > > > but it should be possible for the VMM to disable this extension in the
> > > > guest. We just need to change all the checks in KVM of the host's ISA
> > > > for RISCV_ISA_EXT_SSCOFPMF to checking the guest's ISA instead. Maybe
> > > > it's not worth it, though, if the guest PMU isn't useful without overflow.
> > > > But, sometimes it's nice to be able to disable stuff for debug and
> > > > workarounds.
> > > > 
> > > 
> > > As per my understanding, kvm_riscv_vcpu_isa_disable_allowed only returns
> > > true for those extensions which can be disabled architecturally.
> > 
> > I think kvm_riscv_vcpu_isa_disable_allowed can return true for any
> > extensions that KVM can guarantee won't be exposed in any way to the
> > guest. Extensions that cannot be disabled architecturally must return
> > false, since their instructions will still be present in the guest, even
> > if KVM doesn't want to expose them, but extensions which KVM emulates
> > can return true because KVM can choose not to emulate them. IIUC, sscofpmf
> > falls in this latter category.
> > 
> 
> hmm. The Sscofpmf is dependent on interrupt filtering via hvien and SBI PMU.
> So you are suggesting to toggle off the CSR_HVIEN bit for overflow interrupt

Yeah, this is what I was thinking.

> or do more granular disabling for privilege mode filtering in SBI PMU as
> well.
> 
> Beyond that we can't disable SBI PMU. Is that okay ? A guest can still cause
> counter overflow and interrupt the host. However, the guest won't get any
> interrupt as hvien is not set.
> 
> It can also still filter the events as that is tied with SBI PMU.
> 
> We can put more granular checks in SBI pmu but I am just wondering if it
> provides anything additional beyond just disabling the sscofpmf in device
> tree.

If it's too much of a code burden for something we're unlikely going to
want to do for anything other than debug (where removing the extension
from the device tree is likely sufficient), then that's another reason to
not allow disabling. Maybe we should write a comment above
kvm_riscv_vcpu_isa_disable_allowed which points out how extensions end up
there, i.e. either KVM is powerless to completely hide it or we don't
want to maintain KVM code to completely hide it.
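
A rough sketch of what such a comment could look like, attached to a toy
version of the function. The enum, the function name and the per-extension
results are illustrative assumptions based on this thread (Sstc as proposed
above, Sscofpmf as one possible outcome), not the current KVM code.

```c
#include <stdbool.h>

/* Hypothetical extension IDs, for illustration only. */
enum toy_isa_ext {
	TOY_EXT_SSTC,
	TOY_EXT_ZIHINTPAUSE,
	TOY_EXT_SSCOFPMF,
};

/*
 * An extension returns false here for one of two reasons:
 *  1) KVM is powerless to completely hide it from the guest, e.g. its
 *     instructions execute regardless of the guest's ISA string, or
 *  2) KVM could hide it, but we don't want to maintain the code needed
 *     to do so.
 */
static bool toy_isa_disable_allowed(enum toy_isa_ext ext)
{
	switch (ext) {
	case TOY_EXT_ZIHINTPAUSE:       /* reason 1: 'pause' is always present */
		return false;
	case TOY_EXT_SSCOFPMF:          /* reason 2: one possible outcome */
		return false;
	case TOY_EXT_SSTC:              /* a henvcfg bit can gate the CSRs */
	default:
		return true;
	}
}
```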

Thanks,
drew


* Re: [PATCH v4 07/15] RISC-V: KVM: No need to exit to the user space if perf event failed
  2024-04-04 12:16       ` Andrew Jones
@ 2024-04-10 22:44         ` Atish Patra
  2024-04-11  7:38           ` Andrew Jones
  0 siblings, 1 reply; 56+ messages in thread
From: Atish Patra @ 2024-04-10 22:44 UTC (permalink / raw)
  To: Andrew Jones, Atish Patra
  Cc: linux-kernel, Anup Patel, Albert Ou, Alexandre Ghiti,
	Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv, kvm,
	linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On 4/4/24 05:16, Andrew Jones wrote:
> On Mon, Apr 01, 2024 at 03:37:01PM -0700, Atish Patra wrote:
>> On Sat, Mar 2, 2024 at 12:16 AM Andrew Jones <ajones@ventanamicro.com> wrote:
>>>
>>> On Wed, Feb 28, 2024 at 05:01:22PM -0800, Atish Patra wrote:
>>>> Currently, we return a linux error code if creating a perf event failed
>>>> in kvm. That shouldn't be necessary as guest can continue to operate
>>>> without perf profiling or profiling with firmware counters.
>>>>
>>>> Return appropriate SBI error code to indicate that PMU configuration
>>>> failed. An error message in kvm already describes the reason for failure.
>>>
>>> I don't know enough about the perf subsystem to know if there may be
>>> a concern that resources are temporarily unavailable. If so, then this
>>
>> Do you mean the hardware resources unavailable because the host is using it ?
> 
> Yes (I think). The issue I'm thinking of is if kvm_pmu_create_perf_event
> (perf_event_create_kernel_counter) returns something like EBUSY and then
> we translate that to SBI_ERR_NOT_SUPPORTED. I'm not sure guests would
> interpret not-supported as an error which means they can retry. Or if
> they retry and get something other than not-supported if they'd be
> confused.
> 

At least the Linux driver treats it as -ENOTSUPP and just fails. Other
guest OS implementations may interpret it differently, but they should
fail at that point as well. I don't see how they could interpret it as a
retry.

The perf user can retry on the assumption that maybe not enough
counters were available at that moment. But that's different from
returning a retry from driver code.

Even if we supported a retry error code, when would the caller retry?
The driver doesn't know how long the user is going to run the perf
command and keep the hardware resources occupied.

I feel the perf user is the best entity to know that and should retry if
it knows the previous run, which might have released the hardware
resources, is over.
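
As a sketch of the guest-side behavior described above: the driver maps the
SBI error to an errno once and fails the configuration, with no retry loop.
The helper names are hypothetical and the mapping is only loosely modelled on
what a guest driver might do; the real kernel helper may differ. ENOTSUPP
(524) is a kernel-internal errno, so it is defined locally here.

```c
#define SBI_SUCCESS             0
#define SBI_ERR_FAILURE         (-1)
#define SBI_ERR_NOT_SUPPORTED   (-2)
#define SBI_ERR_INVALID_PARAM   (-3)

#define TOY_EINVAL      22
#define TOY_ENOTSUPP    524     /* kernel-internal value, not in <errno.h> */

/* Hypothetical mapping from SBI error codes to Linux-style errnos. */
static int toy_sbi_err_to_errno(long sbi_err)
{
	switch (sbi_err) {
	case SBI_SUCCESS:
		return 0;
	case SBI_ERR_INVALID_PARAM:
		return -TOY_EINVAL;
	case SBI_ERR_NOT_SUPPORTED:
	case SBI_ERR_FAILURE:
	default:
		return -TOY_ENOTSUPP;
	}
}

/* Driver-level configure: a single attempt, the error goes to the caller. */
static int toy_driver_cfg_match(long sbi_err, int *attempts)
{
	(*attempts)++;
	return toy_sbi_err_to_errno(sbi_err);
}
```

Any retry would then come from the perf user rerunning the command, not from
the driver.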

> Thanks,
> drew
>    
> 
>>
>>> patch would make it possible for a guest to do the exact same thing,
>>> but sometimes succeed and sometimes get SBI_ERR_NOT_SUPPORTED.
>>> sbi_pmu_counter_config_matching doesn't currently have any error types
>>> specified that say "unsupported at the moment, maybe try again", which
>>> would be more appropriate in that case. I do see
>>> perf_event_create_kernel_counter() can return ENOMEM when memory isn't
>>> available, but if the kernel isn't able to allocate a small amount of
>>> memory, then we're in bigger trouble anyway, so the concern would be
>>> if there are perf resource pools which may temporarily be exhausted at
>>> the time the guest makes this request.
>>>
>>
>> For other cases, this patch ensures that guests continue to run without failure
>> which allows the user in the guest to try again if this fails due to a temporary
>> resource availability.
>>
>>> One comment below.
>>>
>>>>
>>>> Fixes: 0cb74b65d2e5 ("RISC-V: KVM: Implement perf support without sampling")
>>>> Reviewed-by: Anup Patel <anup@brainfault.org>
>>>> Signed-off-by: Atish Patra <atishp@rivosinc.com>
>>>> ---
>>>>   arch/riscv/kvm/vcpu_pmu.c     | 14 +++++++++-----
>>>>   arch/riscv/kvm/vcpu_sbi_pmu.c |  6 +++---
>>>>   2 files changed, 12 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
>>>> index b1574c043f77..29bf4ca798cb 100644
>>>> --- a/arch/riscv/kvm/vcpu_pmu.c
>>>> +++ b/arch/riscv/kvm/vcpu_pmu.c
>>>> @@ -229,8 +229,9 @@ static int kvm_pmu_validate_counter_mask(struct kvm_pmu *kvpmu, unsigned long ct
>>>>        return 0;
>>>>   }
>>>>
>>>> -static int kvm_pmu_create_perf_event(struct kvm_pmc *pmc, struct perf_event_attr *attr,
>>>> -                                  unsigned long flags, unsigned long eidx, unsigned long evtdata)
>>>> +static long kvm_pmu_create_perf_event(struct kvm_pmc *pmc, struct perf_event_attr *attr,
>>>> +                                   unsigned long flags, unsigned long eidx,
>>>> +                                   unsigned long evtdata)
>>>>   {
>>>>        struct perf_event *event;
>>>>
>>>> @@ -454,7 +455,8 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
>>>>                                     unsigned long eidx, u64 evtdata,
>>>>                                     struct kvm_vcpu_sbi_return *retdata)
>>>>   {
>>>> -     int ctr_idx, ret, sbiret = 0;
>>>> +     int ctr_idx, sbiret = 0;
>>>> +     long ret;
>>>>        bool is_fevent;
>>>>        unsigned long event_code;
>>>>        u32 etype = kvm_pmu_get_perf_event_type(eidx);
>>>> @@ -513,8 +515,10 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
>>>>                        kvpmu->fw_event[event_code].started = true;
>>>>        } else {
>>>>                ret = kvm_pmu_create_perf_event(pmc, &attr, flags, eidx, evtdata);
>>>> -             if (ret)
>>>> -                     return ret;
>>>> +             if (ret) {
>>>> +                     sbiret = SBI_ERR_NOT_SUPPORTED;
>>>> +                     goto out;
>>>> +             }
>>>>        }
>>>>
>>>>        set_bit(ctr_idx, kvpmu->pmc_in_use);
>>>> diff --git a/arch/riscv/kvm/vcpu_sbi_pmu.c b/arch/riscv/kvm/vcpu_sbi_pmu.c
>>>> index 7eca72df2cbd..b70179e9e875 100644
>>>> --- a/arch/riscv/kvm/vcpu_sbi_pmu.c
>>>> +++ b/arch/riscv/kvm/vcpu_sbi_pmu.c
>>>> @@ -42,9 +42,9 @@ static int kvm_sbi_ext_pmu_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
>>>>   #endif
>>>>                /*
>>>>                 * This can fail if perf core framework fails to create an event.
>>>> -              * Forward the error to userspace because it's an error which
>>>> -              * happened within the host kernel. The other option would be
>>>> -              * to convert to an SBI error and forward to the guest.
>>>> +              * No need to forward the error to userspace and exit the guest
>>>
>>> Period after guest
>>>
>>>
>>>> +              * operation can continue without profiling. Forward the
>>>
>>> The operation
>>>
>>
>> Fixed the above two.
>>
>>
>>>> +              * appropriate SBI error to the guest.
>>>>                 */
>>>>                ret = kvm_riscv_vcpu_pmu_ctr_cfg_match(vcpu, cp->a0, cp->a1,
>>>>                                                       cp->a2, cp->a3, temp, retdata);
>>>> --
>>>> 2.34.1
>>>>
>>>
>>> Thanks,
>>> drew
>>
>>
>>
>> --
>> Regards,
>> Atish



* Re: [PATCH v4 07/15] RISC-V: KVM: No need to exit to the user space if perf event failed
  2024-04-10 22:44         ` Atish Patra
@ 2024-04-11  7:38           ` Andrew Jones
  0 siblings, 0 replies; 56+ messages in thread
From: Andrew Jones @ 2024-04-11  7:38 UTC (permalink / raw)
  To: Atish Patra
  Cc: Atish Patra, linux-kernel, Anup Patel, Albert Ou,
	Alexandre Ghiti, Conor Dooley, Guo Ren, Icenowy Zheng, kvm-riscv,
	kvm, linux-kselftest, linux-riscv, Mark Rutland, Palmer Dabbelt,
	Paolo Bonzini, Paul Walmsley, Shuah Khan, Will Deacon

On Wed, Apr 10, 2024 at 03:44:32PM -0700, Atish Patra wrote:
> On 4/4/24 05:16, Andrew Jones wrote:
> > On Mon, Apr 01, 2024 at 03:37:01PM -0700, Atish Patra wrote:
> > > On Sat, Mar 2, 2024 at 12:16 AM Andrew Jones <ajones@ventanamicro.com> wrote:
> > > > 
> > > > On Wed, Feb 28, 2024 at 05:01:22PM -0800, Atish Patra wrote:
> > > > > Currently, we return a linux error code if creating a perf event failed
> > > > > in kvm. That shouldn't be necessary as guest can continue to operate
> > > > > without perf profiling or profiling with firmware counters.
> > > > > 
> > > > > Return appropriate SBI error code to indicate that PMU configuration
> > > > > failed. An error message in kvm already describes the reason for failure.
> > > > 
> > > > I don't know enough about the perf subsystem to know if there may be
> > > > a concern that resources are temporarily unavailable. If so, then this
> > > 
> > > Do you mean the hardware resources are unavailable because the host is using them?
> > 
> > Yes (I think). The issue I'm thinking of is if kvm_pmu_create_perf_event
> > (perf_event_create_kernel_counter) returns something like EBUSY and then
> > we translate that to SBI_ERR_NOT_SUPPORTED. I'm not sure guests would
> > interpret not-supported as an error which means they can retry. Nor am
> > I sure whether they'd be confused if they retried and got something
> > other than not-supported.
> > 
> 
> At least the Linux driver treats it as -ENOTSUPP and just fails. Other guest
> OS implementations may interpret it differently, but they should fail at that
> point as well. I don't see how they could interpret it as a retry.
> 
> The perf user can retry on the assumption that maybe not enough counters
> are available at this moment. But that's different from returning a retry
> from driver code.
> 
> Even if we support a retry error code, when does the caller retry?
> The driver doesn't know how long the user is going to keep the perf command
> running and the hardware resources occupied.
> 
> I feel the perf user is in the best position to know that and should retry
> once it knows the previous run is over, which might have released the
> hardware resources.

I agree, but how does the user know that retrying makes sense? I presume
-ENOTSUPP will get propagated all the way to the user in a form that
means "not supported". Or, can the user list all resources and then
when they see "not supported" know that means "not supported at the
moment", as they've already seen that the resources exist?

Anyway, as I said, I don't know enough about the perf subsystem to know
if this is a real concern or not, but it sort of looks like we have the
potential to tell users that something isn't supported when in fact it
is supported, but is only temporarily unavailable.
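
To make the concern concrete, here is a minimal sketch of what separating
"temporarily unavailable" from "not supported" could look like when the
errno from perf event creation is converted for the guest. This is not
what the patch does (the patch reports SBI_ERR_NOT_SUPPORTED for every
failure). The helper name pmu_errno_to_sbi_err() is hypothetical, the
choice of errno values treated as transient is an assumption, and since
sbi_pmu_counter_config_matching has no error type meaning "try again",
using SBI_ERR_FAILED for that purpose is an assumption as well.

```c
#include <errno.h>

/* SBI error codes, with values as defined by the SBI specification. */
#define SBI_SUCCESS		0
#define SBI_ERR_FAILED		(-1)
#define SBI_ERR_NOT_SUPPORTED	(-2)
#define SBI_ERR_INVALID_PARAM	(-3)

/*
 * Hypothetical helper, not part of the patch: convert a negative errno
 * from perf event creation into an SBI error code for the guest.
 */
long pmu_errno_to_sbi_err(long err)
{
	switch (err) {
	case 0:
		return SBI_SUCCESS;
	case -EBUSY:
	case -ENOMEM:
	case -ENOSPC:
		/* Assumed transient: resources exist but are in use. */
		return SBI_ERR_FAILED;
	case -EINVAL:
		return SBI_ERR_INVALID_PARAM;
	default:
		/* Everything else is reported as not supported. */
		return SBI_ERR_NOT_SUPPORTED;
	}
}
```

Even with such a split, the guest would still need some way to learn
when a retry is worthwhile, which is the open question above.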

Thanks,
drew



Thread overview: 56+ messages
2024-02-29  1:01 [PATCH v4 00/15] RISC-V SBI v2.0 PMU improvements and Perf sampling in KVM guest Atish Patra
2024-02-29  1:01 ` [PATCH v4 01/15] RISC-V: Fix the typo in Scountovf CSR name Atish Patra
2024-03-01  8:25   ` Clément Léger
2024-02-29  1:01 ` [PATCH v4 02/15] RISC-V: Add FIRMWARE_READ_HI definition Atish Patra
2024-03-01  8:27   ` Clément Léger
2024-02-29  1:01 ` [PATCH v4 03/15] drivers/perf: riscv: Read upper bits of a firmware counter Atish Patra
2024-03-01  9:52   ` Andrew Jones
2024-02-29  1:01 ` [PATCH v4 04/15] RISC-V: Add SBI PMU snapshot definitions Atish Patra
2024-03-01 11:14   ` Andrew Jones
2024-03-01 19:30     ` Atish Kumar Patra
2024-02-29  1:01 ` [PATCH v4 05/15] drivers/perf: riscv: Implement SBI PMU snapshot function Atish Patra
2024-03-01 14:40   ` Andrew Jones
2024-03-01 15:55     ` Alexandre Ghiti
2024-02-29  1:01 ` [PATCH v4 06/15] RISC-V: KVM: No need to update the counter value during reset Atish Patra
2024-03-02  7:47   ` Andrew Jones
2024-02-29  1:01 ` [PATCH v4 07/15] RISC-V: KVM: No need to exit to the user space if perf event failed Atish Patra
2024-03-02  8:15   ` Andrew Jones
2024-04-01 22:37     ` Atish Patra
2024-04-04 12:16       ` Andrew Jones
2024-04-10 22:44         ` Atish Patra
2024-04-11  7:38           ` Andrew Jones
2024-02-29  1:01 ` [PATCH v4 08/15] RISC-V: KVM: Implement SBI PMU Snapshot feature Atish Patra
2024-03-02  9:49   ` Andrew Jones
2024-04-01 22:36     ` Atish Patra
2024-04-03  7:36       ` Atish Patra
2024-04-04 13:19         ` Andrew Jones
2024-02-29  1:01 ` [PATCH v4 09/15] RISC-V: KVM: Add perf sampling support for guests Atish Patra
2024-03-02 10:33   ` Andrew Jones
2024-04-02  8:33     ` Atish Patra
2024-04-05 12:05       ` Andrew Jones
2024-04-10  0:11         ` Atish Patra
2024-04-10  7:20           ` Andrew Jones
2024-02-29  1:01 ` [PATCH v4 10/15] RISC-V: KVM: Support 64 bit firmware counters on RV32 Atish Patra
2024-03-02 10:52   ` Andrew Jones
2024-04-02  0:03     ` Atish Patra
2024-02-29  1:01 ` [PATCH v4 11/15] KVM: riscv: selftests: Add Sscofpmf to get-reg-list test Atish Patra
2024-03-01  4:42   ` Anup Patel
2024-03-02 10:52   ` Andrew Jones
2024-02-29  1:01 ` [PATCH v4 12/15] KVM: riscv: selftests: Add SBI PMU extension definitions Atish Patra
2024-03-01  4:43   ` Anup Patel
2024-03-02 11:00   ` Andrew Jones
2024-04-02  8:43     ` Atish Patra
2024-02-29  1:01 ` [PATCH v4 13/15] KVM: riscv: selftests: Add SBI PMU selftest Atish Patra
2024-03-01  4:47   ` Anup Patel
2024-03-02  1:01     ` Atish Kumar Patra
2024-03-02 11:52   ` Andrew Jones
2024-04-02  8:34     ` Atish Patra
2024-04-05 12:48       ` Andrew Jones
2024-02-29  1:01 ` [PATCH v4 14/15] KVM: riscv: selftests: Add a test for PMU snapshot functionality Atish Patra
2024-03-01  4:50   ` Anup Patel
2024-03-02 12:13   ` Andrew Jones
2024-04-02  8:35     ` Atish Patra
2024-02-29  1:01 ` [PATCH v4 15/15] KVM: riscv: selftests: Add a test for counter overflow Atish Patra
2024-03-01  4:53   ` Anup Patel
2024-03-02 12:35   ` Andrew Jones
2024-04-02  8:42     ` Atish Patra
