* [PATCH 2/2] kvm: Add ioctl for gathering debug counters
@ 2020-01-15 13:43 Milan Pandurov
2020-01-15 14:04 ` Alexander Graf
0 siblings, 1 reply; 18+ messages in thread
From: Milan Pandurov @ 2020-01-15 13:43 UTC (permalink / raw)
To: kvm; +Cc: pbonzini, rkrcmar, graf, borntraeger
KVM exposes debug counters through individual debugfs files.
Monitoring these counters requires debugfs to be enabled/accessible for
the application, which might not be always the case.
Additionally, periodic monitoring multiple debugfs files from
userspace requires multiple file open/read/close + atoi conversion
operations, which is not very efficient.
Let's expose new interface to userspace for garhering these
statistics with one ioctl.
Two new ioctl methods are added:
- KVM_GET_SUPPORTED_DEBUGFS_STAT : Returns list of available counter
names. Names correspond to the debugfs file names
- KVM_GET_DEBUGFS_VALUES : Returns list of u64 values each
corresponding to a value described in KVM_GET_SUPPORTED_DEBUGFS_STAT.
Userspace application can read counter description once using
KVM_GET_SUPPORTED_DEBUGFS_STAT and periodically invoke the
KVM_GET_DEBUGFS_VALUES to get value update.
Signed-off-by: Milan Pandurov <milanpa@amazon.de>
---
Current approach returns all available counters to userspace which might
be an overkill. This can be further extended with an interface in which
userspace provides indicies of counters it is interested in counters
will be filled accordingly.
NOTE: This patch is placed on top of:
https://www.spinics.net/lists/kvm/msg202599.html
---
include/uapi/linux/kvm.h | 21 ++++++++++++
virt/kvm/kvm_main.c | 70 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 91 insertions(+)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f0a16b4adbbd..07ad35ddc14f 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1473,6 +1473,27 @@ struct kvm_enc_region {
/* Available with KVM_CAP_ARM_SVE */
#define KVM_ARM_VCPU_FINALIZE _IOW(KVMIO, 0xc2, int)
+#define KVM_DBG_DESCR_NAME_MAX_SIZE 30
+struct kvm_debugfs_entry_description {
+ char name[KVM_DBG_DESCR_NAME_MAX_SIZE + 1];
+};
+
+struct kvm_debugfs_entries_description {
+ __u32 nentries;
+ struct kvm_debugfs_entry_description entry[0];
+};
+
+struct kvm_debug_stats {
+ __u32 nentries;
+ __u64 values[0];
+};
+
+/* Get description of available debugfs counters */
+#define KVM_GET_SUPPORTED_DEBUGFS_STATS \
+ _IOWR(KVMIO, 0xc2, struct kvm_debugfs_entries_description)
+/* Get values from debugfs */
+#define KVM_GET_DEBUGFS_VALUES _IOWR(KVMIO, 0xc3, struct kvm_debug_stats)
+
/* Secure Encrypted Virtualization command */
enum sev_cmd_id {
/* Guest initialization commands */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9eb6e081da3a..66b36b7e347e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -146,6 +146,10 @@ static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
static void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, gfn_t gfn);
+static long kvm_get_debugfs_entry_description(struct kvm *kvm,
+ void __user *argp);
+static long kvm_get_debugfs_values(struct kvm *kvm, void __user *argp);
+
__visible bool kvm_rebooting;
EXPORT_SYMBOL_GPL(kvm_rebooting);
@@ -3452,6 +3456,12 @@ static long kvm_vm_ioctl(struct file *filp,
case KVM_CHECK_EXTENSION:
r = kvm_vm_ioctl_check_extension_generic(kvm, arg);
break;
+ case KVM_GET_SUPPORTED_DEBUGFS_STATS:
+ r = kvm_get_debugfs_entry_description(kvm, argp);
+ break;
+ case KVM_GET_DEBUGFS_VALUES:
+ r = kvm_get_debugfs_values(kvm, argp);
+ break;
default:
r = kvm_arch_vm_ioctl(filp, ioctl, arg);
}
@@ -4202,6 +4212,66 @@ static const struct file_operations *stat_fops[] = {
[KVM_STAT_VM] = &vm_stat_fops,
};
+static long kvm_get_debugfs_entry_description(struct kvm *kvm,
+ void __user *argp)
+{
+ struct kvm_debugfs_entries_description *description = argp;
+ struct kvm_stats_debugfs_item *dbgfs_item = debugfs_entries;
+ bool should_copy = true;
+ size_t name_length = 0;
+ __u32 i = 0;
+
+ for (; dbgfs_item->name != NULL; dbgfs_item++, i++) {
+ if (i >= description->nentries)
+ should_copy = false;
+
+ if (should_copy) {
+ name_length = strlen(dbgfs_item->name);
+ name_length =
+ (name_length > KVM_DBG_DESCR_NAME_MAX_SIZE) ?
+ KVM_DBG_DESCR_NAME_MAX_SIZE :
+ name_length;
+
+ copy_to_user(description->entry[i].name,
+ dbgfs_item->name, name_length);
+ put_user('\0',
+ description->entry[i].name + name_length);
+ }
+ }
+ put_user(i, &description->nentries);
+ return (should_copy) ? 0 : -ENOMEM;
+}
+
+static long kvm_get_debugfs_values(struct kvm *kvm, void __user *argp)
+{
+ struct kvm_debug_stats *stats = argp;
+ struct kvm_stats_debugfs_item *dbgfs_item = debugfs_entries;
+ bool should_copy = true;
+ __u32 i = 0;
+ __u64 tmp = 0;
+
+ for (; dbgfs_item->name != NULL; dbgfs_item++, i++) {
+ if (i >= stats->nentries)
+ should_copy = false;
+
+ if (should_copy) {
+ switch (dbgfs_item->kind) {
+ case KVM_STAT_VM:
+ kvm_get_stat_per_vm(kvm, dbgfs_item->offset,
+ &tmp);
+ break;
+ case KVM_STAT_VCPU:
+ kvm_get_stat_per_vcpu(kvm, dbgfs_item->offset,
+ &tmp);
+ break;
+ }
+ put_user(tmp, stats->values + i);
+ }
+ }
+ put_user(i, &stats->nentries);
+ return (should_copy) ? 0 : -ENOMEM;
+}
+
static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm)
{
struct kobj_uevent_env *env;
--
2.17.1
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] kvm: Add ioctl for gathering debug counters
2020-01-15 13:43 [PATCH 2/2] kvm: Add ioctl for gathering debug counters Milan Pandurov
@ 2020-01-15 14:04 ` Alexander Graf
2020-01-15 14:43 ` milanpa
0 siblings, 1 reply; 18+ messages in thread
From: Alexander Graf @ 2020-01-15 14:04 UTC (permalink / raw)
To: Milan Pandurov, kvm; +Cc: pbonzini, rkrcmar, borntraeger
On 15.01.20 14:43, Milan Pandurov wrote:
> KVM exposes debug counters through individual debugfs files.
> Monitoring these counters requires debugfs to be enabled/accessible for
> the application, which might not be always the case.
> Additionally, periodic monitoring multiple debugfs files from
> userspace requires multiple file open/read/close + atoi conversion
> operations, which is not very efficient.
>
> Let's expose new interface to userspace for garhering these
> statistics with one ioctl.
>
> Two new ioctl methods are added:
> - KVM_GET_SUPPORTED_DEBUGFS_STAT : Returns list of available counter
> names. Names correspond to the debugfs file names
> - KVM_GET_DEBUGFS_VALUES : Returns list of u64 values each
> corresponding to a value described in KVM_GET_SUPPORTED_DEBUGFS_STAT.
>
> Userspace application can read counter description once using
> KVM_GET_SUPPORTED_DEBUGFS_STAT and periodically invoke the
> KVM_GET_DEBUGFS_VALUES to get value update.
>
> Signed-off-by: Milan Pandurov <milanpa@amazon.de>
Thanks a lot! I really love the idea to get stats directly from the kvm
vm fd owner. This is so much better than poking at files from a random
unrelated debug or stat file system.
I have a few comments overall though:
1)
This is an interface that requires a lot of logic and buffers from user
space to retrieve individual, explicit counters. What if I just wanted
to monitor the number of exits on every user space exit?
Also, we're suddenly making the debugfs names a full ABI, because
through this interface we only identify the individual stats through
their names. That means we can not remove stats or change their names,
because people may rely on them, no? Thining about this again, maybe
they already are an ABI because people rely on them in debugfs though?
I see two alternatives to this approach here:
a) ONE_REG
We can just add a new DEBUG arch in ONE_REG and expose ONE_REG per VM as
well (if we really have to?). That gives us explicit identifiers for
each stat with an explicit path to introduce new ones with very unique
identifiers.
That would give us a very easily structured approach to retrieve
individual counters.
b) part of the mmap'ed vcpu struct
We could simply move the counters into the shared struct that we expose
to user space via mmap. IIRC we only have that per-vcpu, but then again
most counters are per-vcpu anyway, so we would push the aggregation to
user space.
For per-vm ones, maybe we can just add another mmap'ed shared page for
the vm fd?
2) vcpu counters
Most of the counters count on vcpu granularity, but debugfs only gives
us a full VM view. Whatever we do to improve the situation, we should
definitely try to build something that allows us to get the counters per
vcpu (as well).
The main purpose of these counters is monitoring. It can be quite
important to know that only a single vCPU is going wild, compared to all
of them for example.
Thanks,
Alex
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] kvm: Add ioctl for gathering debug counters
2020-01-15 14:04 ` Alexander Graf
@ 2020-01-15 14:43 ` milanpa
2020-01-15 14:59 ` Alexander Graf
0 siblings, 1 reply; 18+ messages in thread
From: milanpa @ 2020-01-15 14:43 UTC (permalink / raw)
To: Alexander Graf, Milan Pandurov, kvm; +Cc: pbonzini, rkrcmar, borntraeger
On 1/15/20 3:04 PM, Alexander Graf wrote:
>
>
> On 15.01.20 14:43, Milan Pandurov wrote:
>> KVM exposes debug counters through individual debugfs files.
>> Monitoring these counters requires debugfs to be enabled/accessible for
>> the application, which might not be always the case.
>> Additionally, periodic monitoring multiple debugfs files from
>> userspace requires multiple file open/read/close + atoi conversion
>> operations, which is not very efficient.
>>
>> Let's expose new interface to userspace for garhering these
>> statistics with one ioctl.
>>
>> Two new ioctl methods are added:
>> - KVM_GET_SUPPORTED_DEBUGFS_STAT : Returns list of available counter
>> names. Names correspond to the debugfs file names
>> - KVM_GET_DEBUGFS_VALUES : Returns list of u64 values each
>> corresponding to a value described in KVM_GET_SUPPORTED_DEBUGFS_STAT.
>>
>> Userspace application can read counter description once using
>> KVM_GET_SUPPORTED_DEBUGFS_STAT and periodically invoke the
>> KVM_GET_DEBUGFS_VALUES to get value update.
>>
>> Signed-off-by: Milan Pandurov <milanpa@amazon.de>
>
> Thanks a lot! I really love the idea to get stats directly from the
> kvm vm fd owner. This is so much better than poking at files from a
> random unrelated debug or stat file system.
>
> I have a few comments overall though:
>
>
> 1)
>
> This is an interface that requires a lot of logic and buffers from
> user space to retrieve individual, explicit counters. What if I just
> wanted to monitor the number of exits on every user space exit?
In case we want to cover such latency sensitive use cases solution b),
with mmap'ed structs you suggested, would be a way to go, IMO.
>
> Also, we're suddenly making the debugfs names a full ABI, because
> through this interface we only identify the individual stats through
> their names. That means we can not remove stats or change their names,
> because people may rely on them, no? Thining about this again, maybe
> they already are an ABI because people rely on them in debugfs though?
>
> I see two alternatives to this approach here:
>
> a) ONE_REG
>
> We can just add a new DEBUG arch in ONE_REG and expose ONE_REG per VM
> as well (if we really have to?). That gives us explicit identifiers
> for each stat with an explicit path to introduce new ones with very
> unique identifiers.
>
> That would give us a very easily structured approach to retrieve
> individual counters.
>
> b) part of the mmap'ed vcpu struct
>
> We could simply move the counters into the shared struct that we
> expose to user space via mmap. IIRC we only have that per-vcpu, but
> then again most counters are per-vcpu anyway, so we would push the
> aggregation to user space.
>
> For per-vm ones, maybe we can just add another mmap'ed shared page for
> the vm fd?
>
>
> 2) vcpu counters
>
> Most of the counters count on vcpu granularity, but debugfs only gives
> us a full VM view. Whatever we do to improve the situation, we should
> definitely try to build something that allows us to get the counters
> per vcpu (as well).
>
> The main purpose of these counters is monitoring. It can be quite
> important to know that only a single vCPU is going wild, compared to
> all of them for example.
I agree, exposing per vcpu counters can be useful. I guess it didn't
make much sense exposing them through debugfs so aggregation was done in
kernel. However if we chose to go with approach 1-b) mmap counters
struct in userspace, we could do this.
Best,
Milan
>
>
> Thanks,
>
> Alex
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] kvm: Add ioctl for gathering debug counters
2020-01-15 14:43 ` milanpa
@ 2020-01-15 14:59 ` Alexander Graf
2020-01-17 23:38 ` Paolo Bonzini
0 siblings, 1 reply; 18+ messages in thread
From: Alexander Graf @ 2020-01-15 14:59 UTC (permalink / raw)
To: milanpa, Milan Pandurov, kvm; +Cc: pbonzini, rkrcmar, borntraeger
On 15.01.20 15:43, milanpa@amazon.com wrote:
> On 1/15/20 3:04 PM, Alexander Graf wrote:
>>
>>
>> On 15.01.20 14:43, Milan Pandurov wrote:
>>> KVM exposes debug counters through individual debugfs files.
>>> Monitoring these counters requires debugfs to be enabled/accessible for
>>> the application, which might not be always the case.
>>> Additionally, periodic monitoring multiple debugfs files from
>>> userspace requires multiple file open/read/close + atoi conversion
>>> operations, which is not very efficient.
>>>
>>> Let's expose new interface to userspace for garhering these
>>> statistics with one ioctl.
>>>
>>> Two new ioctl methods are added:
>>> - KVM_GET_SUPPORTED_DEBUGFS_STAT : Returns list of available counter
>>> names. Names correspond to the debugfs file names
>>> - KVM_GET_DEBUGFS_VALUES : Returns list of u64 values each
>>> corresponding to a value described in KVM_GET_SUPPORTED_DEBUGFS_STAT.
>>>
>>> Userspace application can read counter description once using
>>> KVM_GET_SUPPORTED_DEBUGFS_STAT and periodically invoke the
>>> KVM_GET_DEBUGFS_VALUES to get value update.
>>>
>>> Signed-off-by: Milan Pandurov <milanpa@amazon.de>
>>
>> Thanks a lot! I really love the idea to get stats directly from the
>> kvm vm fd owner. This is so much better than poking at files from a
>> random unrelated debug or stat file system.
>>
>> I have a few comments overall though:
>>
>>
>> 1)
>>
>> This is an interface that requires a lot of logic and buffers from
>> user space to retrieve individual, explicit counters. What if I just
>> wanted to monitor the number of exits on every user space exit?
>
> In case we want to cover such latency sensitive use cases solution b),
> with mmap'ed structs you suggested, would be a way to go, IMO.
>
>>
>> Also, we're suddenly making the debugfs names a full ABI, because
>> through this interface we only identify the individual stats through
>> their names. That means we can not remove stats or change their names,
>> because people may rely on them, no? Thining about this again, maybe
>> they already are an ABI because people rely on them in debugfs though?
>>
>> I see two alternatives to this approach here:
>>
>> a) ONE_REG
>>
>> We can just add a new DEBUG arch in ONE_REG and expose ONE_REG per VM
>> as well (if we really have to?). That gives us explicit identifiers
>> for each stat with an explicit path to introduce new ones with very
>> unique identifiers.
>>
>> That would give us a very easily structured approach to retrieve
>> individual counters.
>>
>> b) part of the mmap'ed vcpu struct
>>
>> We could simply move the counters into the shared struct that we
>> expose to user space via mmap. IIRC we only have that per-vcpu, but
>> then again most counters are per-vcpu anyway, so we would push the
>> aggregation to user space.
>>
>> For per-vm ones, maybe we can just add another mmap'ed shared page for
>> the vm fd?
>>
>>
>> 2) vcpu counters
>>
>> Most of the counters count on vcpu granularity, but debugfs only gives
>> us a full VM view. Whatever we do to improve the situation, we should
>> definitely try to build something that allows us to get the counters
>> per vcpu (as well).
>>
>> The main purpose of these counters is monitoring. It can be quite
>> important to know that only a single vCPU is going wild, compared to
>> all of them for example.
>
> I agree, exposing per vcpu counters can be useful. I guess it didn't
> make much sense exposing them through debugfs so aggregation was done in
> kernel. However if we chose to go with approach 1-b) mmap counters
> struct in userspace, we could do this.
We could do it in any approach, even with statfs if we do directories
per vcpu :).
The reason I dislike the debugfs/statfs approach is that it generates a
completely separate permission and access paths to the stats. That's
great for full system monitoring, but really bad when you have multiple
individual tenants on a single host.
Alex
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] kvm: Add ioctl for gathering debug counters
2020-01-15 14:59 ` Alexander Graf
@ 2020-01-17 23:38 ` Paolo Bonzini
2020-01-20 17:53 ` Alexander Graf
0 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2020-01-17 23:38 UTC (permalink / raw)
To: Alexander Graf, milanpa, Milan Pandurov, kvm; +Cc: rkrcmar, borntraeger
On 15/01/20 15:59, Alexander Graf wrote:
> On 15.01.20 15:43, milanpa@amazon.com wrote:
>>>> Let's expose new interface to userspace for garhering these
>>>> statistics with one ioctl.
>>>>
>>>> Userspace application can read counter description once using
>>>> KVM_GET_SUPPORTED_DEBUGFS_STAT and periodically invoke the
>>>> KVM_GET_DEBUGFS_VALUES to get value update.
>>>
>>> This is an interface that requires a lot of logic and buffers from
>>> user space to retrieve individual, explicit counters. What if I just
>>> wanted to monitor the number of exits on every user space exit?
>>
>> In case we want to cover such latency sensitive use cases solution b),
>> with mmap'ed structs you suggested, would be a way to go, IMO.
>>
>>> Also, we're suddenly making the debugfs names a full ABI, because
>>> through this interface we only identify the individual stats through
>>> their names. That means we can not remove stats or change their
>>> names, because people may rely on them, no? Thining about this again,
>>> maybe they already are an ABI because people rely on them in debugfs
>>> though?
In theory not, in practice I have treated them as a kind of "soft" ABI:
if the meaning changes you should rename them, and removing them is
fine, but you shouldn't for example change the unit of measure (which is
not hard since they are all counters :) but perhaps you could have
nanoseconds vs TSC cycles in the future for some counters).
>>> I see two alternatives to this approach here:
>>>
>>> a) ONE_REG
>>>
>>> We can just add a new DEBUG arch in ONE_REG and expose ONE_REG per VM
>>> as well (if we really have to?). That gives us explicit identifiers
>>> for each stat with an explicit path to introduce new ones with very
>>> unique identifiers.
ONE_REG would force us to define constants for each counter, and would
make it hard to retire them. I don't like this.
>>> b) part of the mmap'ed vcpu struct
Same here. Even if we say the semantics of the struct would be exposed
to userspace via KVM_GET_SUPPORTED_DEBUGFS_STAT, someone might end up
getting this wrong and expecting a particular layout. Milan's proposed
API has the big advantage of being hard to get wrong for userspace. And
pushing the aggregation to userspace is not a huge chore, but it's still
a chore.
So unless someone has a usecase for latency-sensitive monitoring I'd
prefer to keep it simple (usually these kind of stats can even make
sense if you gather them over relatively large period of time, because
then you'll probably use something else like tracepoints to actually
pinpoint what's going on).
>>> 2) vcpu counters
>>>
>>> Most of the counters count on vcpu granularity, but debugfs only
>>> gives us a full VM view. Whatever we do to improve the situation, we
>>> should definitely try to build something that allows us to get the
>>> counters per vcpu (as well).
>>>
>>> The main purpose of these counters is monitoring. It can be quite
>>> important to know that only a single vCPU is going wild, compared to
>>> all of them for example.
>>
>> I agree, exposing per vcpu counters can be useful. I guess it didn't
>> make much sense exposing them through debugfs so aggregation was done
>> in kernel. However if we chose to go with approach 1-b) mmap counters
>> struct in userspace, we could do this.
>
> The reason I dislike the debugfs/statfs approach is that it generates a
> completely separate permission and access paths to the stats. That's
> great for full system monitoring, but really bad when you have multiple
> individual tenants on a single host.
I agree, anything in sysfs is complementary to vmfd/vcpufd access.
Paolo
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] kvm: Add ioctl for gathering debug counters
2020-01-17 23:38 ` Paolo Bonzini
@ 2020-01-20 17:53 ` Alexander Graf
2020-01-20 18:57 ` milanpa
0 siblings, 1 reply; 18+ messages in thread
From: Alexander Graf @ 2020-01-20 17:53 UTC (permalink / raw)
To: Paolo Bonzini, milanpa, Milan Pandurov, kvm; +Cc: rkrcmar, borntraeger
On 18.01.20 00:38, Paolo Bonzini wrote:
> On 15/01/20 15:59, Alexander Graf wrote:
>> On 15.01.20 15:43, milanpa@amazon.com wrote:
>>>>> Let's expose new interface to userspace for garhering these
>>>>> statistics with one ioctl.
>>>>>
>>>>> Userspace application can read counter description once using
>>>>> KVM_GET_SUPPORTED_DEBUGFS_STAT and periodically invoke the
>>>>> KVM_GET_DEBUGFS_VALUES to get value update.
>>>>
>>>> This is an interface that requires a lot of logic and buffers from
>>>> user space to retrieve individual, explicit counters. What if I just
>>>> wanted to monitor the number of exits on every user space exit?
>>>
>>> In case we want to cover such latency sensitive use cases solution b),
>>> with mmap'ed structs you suggested, would be a way to go, IMO.
>>>
>>>> Also, we're suddenly making the debugfs names a full ABI, because
>>>> through this interface we only identify the individual stats through
>>>> their names. That means we can not remove stats or change their
>>>> names, because people may rely on them, no? Thining about this again,
>>>> maybe they already are an ABI because people rely on them in debugfs
>>>> though?
>
> In theory not, in practice I have treated them as a kind of "soft" ABI:
> if the meaning changes you should rename them, and removing them is
> fine, but you shouldn't for example change the unit of measure (which is
> not hard since they are all counters :) but perhaps you could have
> nanoseconds vs TSC cycles in the future for some counters).
>
>>>> I see two alternatives to this approach here:
>>>>
>>>> a) ONE_REG
>>>>
>>>> We can just add a new DEBUG arch in ONE_REG and expose ONE_REG per VM
>>>> as well (if we really have to?). That gives us explicit identifiers
>>>> for each stat with an explicit path to introduce new ones with very
>>>> unique identifiers.
> ONE_REG would force us to define constants for each counter, and would
> make it hard to retire them. I don't like this.
Why does it make it hard to retire them? We would just return -EINVAL on
retrieval, like we do for any other non-supported ONE_REG.
It's the same as a file not existing in debugfs/statfs. Or an entry in
the array of this patch to disappear.
>
>>>> b) part of the mmap'ed vcpu struct
>
> Same here. Even if we say the semantics of the struct would be exposed
> to userspace via KVM_GET_SUPPORTED_DEBUGFS_STAT, someone might end up
> getting this wrong and expecting a particular layout. Milan's proposed
> API has the big advantage of being hard to get wrong for userspace. And
> pushing the aggregation to userspace is not a huge chore, but it's still
> a chore.
>
> So unless someone has a usecase for latency-sensitive monitoring I'd
> prefer to keep it simple (usually these kind of stats can even make
> sense if you gather them over relatively large period of time, because
> then you'll probably use something else like tracepoints to actually
> pinpoint what's going on).
I tend to agree. Fetching them via an explicit call into the kernel is
definitely the safer route.
>
>>>> 2) vcpu counters
>>>>
>>>> Most of the counters count on vcpu granularity, but debugfs only
>>>> gives us a full VM view. Whatever we do to improve the situation, we
>>>> should definitely try to build something that allows us to get the
>>>> counters per vcpu (as well).
>>>>
>>>> The main purpose of these counters is monitoring. It can be quite
>>>> important to know that only a single vCPU is going wild, compared to
>>>> all of them for example.
>>>
>>> I agree, exposing per vcpu counters can be useful. I guess it didn't
>>> make much sense exposing them through debugfs so aggregation was done
>>> in kernel. However if we chose to go with approach 1-b) mmap counters
>>> struct in userspace, we could do this.
>>
>> The reason I dislike the debugfs/statfs approach is that it generates a
>> completely separate permission and access paths to the stats. That's
>> great for full system monitoring, but really bad when you have multiple
>> individual tenants on a single host.
>
> I agree, anything in sysfs is complementary to vmfd/vcpufd access.
Cool :).
So we only need to agree on ONE_REG vs. this patch mostly?
Alex
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] kvm: Add ioctl for gathering debug counters
2020-01-20 17:53 ` Alexander Graf
@ 2020-01-20 18:57 ` milanpa
2020-01-21 15:38 ` Alexander Graf
0 siblings, 1 reply; 18+ messages in thread
From: milanpa @ 2020-01-20 18:57 UTC (permalink / raw)
To: Alexander Graf, Paolo Bonzini, Milan Pandurov, kvm; +Cc: rkrcmar, borntraeger
On 1/20/20 6:53 PM, Alexander Graf wrote:
>
>
> On 18.01.20 00:38, Paolo Bonzini wrote:
>> On 15/01/20 15:59, Alexander Graf wrote:
>>> On 15.01.20 15:43, milanpa@amazon.com wrote:
>>>>>> Let's expose new interface to userspace for garhering these
>>>>>> statistics with one ioctl.
>>>>>>
>>>>>> Userspace application can read counter description once using
>>>>>> KVM_GET_SUPPORTED_DEBUGFS_STAT and periodically invoke the
>>>>>> KVM_GET_DEBUGFS_VALUES to get value update.
>>>>>
>>>>> This is an interface that requires a lot of logic and buffers from
>>>>> user space to retrieve individual, explicit counters. What if I just
>>>>> wanted to monitor the number of exits on every user space exit?
>>>>
>>>> In case we want to cover such latency sensitive use cases solution b),
>>>> with mmap'ed structs you suggested, would be a way to go, IMO.
>>>>
>>>>> Also, we're suddenly making the debugfs names a full ABI, because
>>>>> through this interface we only identify the individual stats through
>>>>> their names. That means we can not remove stats or change their
>>>>> names, because people may rely on them, no? Thining about this again,
>>>>> maybe they already are an ABI because people rely on them in debugfs
>>>>> though?
>>
>> In theory not, in practice I have treated them as a kind of "soft" ABI:
>> if the meaning changes you should rename them, and removing them is
>> fine, but you shouldn't for example change the unit of measure (which is
>> not hard since they are all counters :) but perhaps you could have
>> nanoseconds vs TSC cycles in the future for some counters).
>>
>>>>> I see two alternatives to this approach here:
>>>>>
>>>>> a) ONE_REG
>>>>>
>>>>> We can just add a new DEBUG arch in ONE_REG and expose ONE_REG per VM
>>>>> as well (if we really have to?). That gives us explicit identifiers
>>>>> for each stat with an explicit path to introduce new ones with very
>>>>> unique identifiers.
>> ONE_REG would force us to define constants for each counter, and would
>> make it hard to retire them. I don't like this.
>
> Why does it make it hard to retire them? We would just return -EINVAL
> on retrieval, like we do for any other non-supported ONE_REG.
>
> It's the same as a file not existing in debugfs/statfs. Or an entry in
> the array of this patch to disappear.
>
>>
>>>>> b) part of the mmap'ed vcpu struct
>>
>> Same here. Even if we say the semantics of the struct would be exposed
>> to userspace via KVM_GET_SUPPORTED_DEBUGFS_STAT, someone might end up
>> getting this wrong and expecting a particular layout. Milan's proposed
>> API has the big advantage of being hard to get wrong for userspace. And
>> pushing the aggregation to userspace is not a huge chore, but it's still
>> a chore.
>>
>> So unless someone has a usecase for latency-sensitive monitoring I'd
>> prefer to keep it simple (usually these kind of stats can even make
>> sense if you gather them over relatively large period of time, because
>> then you'll probably use something else like tracepoints to actually
>> pinpoint what's going on).
>
> I tend to agree. Fetching them via an explicit call into the kernel is
> definitely the safer route.
>
>>
>>>>> 2) vcpu counters
>>>>>
>>>>> Most of the counters count on vcpu granularity, but debugfs only
>>>>> gives us a full VM view. Whatever we do to improve the situation, we
>>>>> should definitely try to build something that allows us to get the
>>>>> counters per vcpu (as well).
>>>>>
>>>>> The main purpose of these counters is monitoring. It can be quite
>>>>> important to know that only a single vCPU is going wild, compared to
>>>>> all of them for example.
>>>>
>>>> I agree, exposing per vcpu counters can be useful. I guess it didn't
>>>> make much sense exposing them through debugfs so aggregation was done
>>>> in kernel. However if we chose to go with approach 1-b) mmap counters
>>>> struct in userspace, we could do this.
>>>
>>> The reason I dislike the debugfs/statfs approach is that it generates a
>>> completely separate permission and access paths to the stats. That's
>>> great for full system monitoring, but really bad when you have multiple
>>> individual tenants on a single host.
>>
>> I agree, anything in sysfs is complementary to vmfd/vcpufd access.
>
> Cool :).
>
> So we only need to agree on ONE_REG vs. this patch mostly?
What about extending KVM_GET_SUPPORTED_DEBUGFS_STAT with some additional
information like the data type and unit? ONE_REG exposes additional
semantics about data.
>
> Alex
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] kvm: Add ioctl for gathering debug counters
2020-01-20 18:57 ` milanpa
@ 2020-01-21 15:38 ` Alexander Graf
2020-01-23 12:08 ` Paolo Bonzini
0 siblings, 1 reply; 18+ messages in thread
From: Alexander Graf @ 2020-01-21 15:38 UTC (permalink / raw)
To: milanpa, Paolo Bonzini, Milan Pandurov, kvm; +Cc: rkrcmar, borntraeger
On 20.01.20 19:57, milanpa@amazon.com wrote:
> On 1/20/20 6:53 PM, Alexander Graf wrote:
>>
>>
>> On 18.01.20 00:38, Paolo Bonzini wrote:
>>> On 15/01/20 15:59, Alexander Graf wrote:
>>>> On 15.01.20 15:43, milanpa@amazon.com wrote:
>>>>>>> Let's expose new interface to userspace for garhering these
>>>>>>> statistics with one ioctl.
>>>>>>>
>>>>>>> Userspace application can read counter description once using
>>>>>>> KVM_GET_SUPPORTED_DEBUGFS_STAT and periodically invoke the
>>>>>>> KVM_GET_DEBUGFS_VALUES to get value update.
>>>>>>
>>>>>> This is an interface that requires a lot of logic and buffers from
>>>>>> user space to retrieve individual, explicit counters. What if I just
>>>>>> wanted to monitor the number of exits on every user space exit?
>>>>>
>>>>> In case we want to cover such latency sensitive use cases solution b),
>>>>> with mmap'ed structs you suggested, would be a way to go, IMO.
>>>>>
>>>>>> Also, we're suddenly making the debugfs names a full ABI, because
>>>>>> through this interface we only identify the individual stats through
>>>>>> their names. That means we can not remove stats or change their
>>>>>> names, because people may rely on them, no? Thining about this again,
>>>>>> maybe they already are an ABI because people rely on them in debugfs
>>>>>> though?
>>>
>>> In theory not, in practice I have treated them as a kind of "soft" ABI:
>>> if the meaning changes you should rename them, and removing them is
>>> fine, but you shouldn't for example change the unit of measure (which is
>>> not hard since they are all counters :) but perhaps you could have
>>> nanoseconds vs TSC cycles in the future for some counters).
>>>
>>>>>> I see two alternatives to this approach here:
>>>>>>
>>>>>> a) ONE_REG
>>>>>>
>>>>>> We can just add a new DEBUG arch in ONE_REG and expose ONE_REG per VM
>>>>>> as well (if we really have to?). That gives us explicit identifiers
>>>>>> for each stat with an explicit path to introduce new ones with very
>>>>>> unique identifiers.
>>> ONE_REG would force us to define constants for each counter, and would
>>> make it hard to retire them. I don't like this.
>>
>> Why does it make it hard to retire them? We would just return -EINVAL
>> on retrieval, like we do for any other non-supported ONE_REG.
>>
>> It's the same as a file not existing in debugfs/statfs. Or an entry in
>> the array of this patch to disappear.
>>
>>>
>>>>>> b) part of the mmap'ed vcpu struct
>>>
>>> Same here. Even if we say the semantics of the struct would be exposed
>>> to userspace via KVM_GET_SUPPORTED_DEBUGFS_STAT, someone might end up
>>> getting this wrong and expecting a particular layout. Milan's proposed
>>> API has the big advantage of being hard to get wrong for userspace. And
>>> pushing the aggregation to userspace is not a huge chore, but it's still
>>> a chore.
>>>
>>> So unless someone has a usecase for latency-sensitive monitoring I'd
>>> prefer to keep it simple (usually these kind of stats can even make
>>> sense if you gather them over relatively large period of time, because
>>> then you'll probably use something else like tracepoints to actually
>>> pinpoint what's going on).
>>
>> I tend to agree. Fetching them via an explicit call into the kernel is
>> definitely the safer route.
>>
>>>
>>>>>> 2) vcpu counters
>>>>>>
>>>>>> Most of the counters count on vcpu granularity, but debugfs only
>>>>>> gives us a full VM view. Whatever we do to improve the situation, we
>>>>>> should definitely try to build something that allows us to get the
>>>>>> counters per vcpu (as well).
>>>>>>
>>>>>> The main purpose of these counters is monitoring. It can be quite
>>>>>> important to know that only a single vCPU is going wild, compared to
>>>>>> all of them for example.
>>>>>
>>>>> I agree, exposing per vcpu counters can be useful. I guess it didn't
>>>>> make much sense exposing them through debugfs so aggregation was done
>>>>> in kernel. However if we chose to go with approach 1-b) mmap counters
>>>>> struct in userspace, we could do this.
>>>>
>>>> The reason I dislike the debugfs/statfs approach is that it generates a
>>>> completely separate permission and access paths to the stats. That's
>>>> great for full system monitoring, but really bad when you have multiple
>>>> individual tenants on a single host.
>>>
>>> I agree, anything in sysfs is complementary to vmfd/vcpufd access.
>>
>> Cool :).
>>
>> So we only need to agree on ONE_REG vs. this patch mostly?
>
> What about extending KVM_GET_SUPPORTED_DEBUGFS_STAT with some additional
> information like the data type and unit? ONE_REG exposes additional
> semantics about data.
It's not a problem of exposing the type information - we have that today
by implicitly saying "every counter is 64bit".
The thing I'm worried about is that we keep inventing these special
purpose interfaces that really do nothing but transfer numbers in one
way or another. ONE_REG's purpose was to unify them. Debug counters
really are the same story.
Alex
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] kvm: Add ioctl for gathering debug counters
2020-01-21 15:38 ` Alexander Graf
@ 2020-01-23 12:08 ` Paolo Bonzini
2020-01-23 12:32 ` Alexander Graf
0 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2020-01-23 12:08 UTC (permalink / raw)
To: Alexander Graf, milanpa, Milan Pandurov, kvm; +Cc: rkrcmar, borntraeger
On 21/01/20 16:38, Alexander Graf wrote:
>>>> ONE_REG would force us to define constants for each counter, and would
>>>> make it hard to retire them. I don't like this.
>>>
>>> Why does it make it hard to retire them? We would just return -EINVAL
>>> on retrieval, like we do for any other non-supported ONE_REG.
>>>
>>> It's the same as a file not existing in debugfs/statfs. Or an entry
>>> in the array of this patch to disappear.
The devil is in the details. For example, would you retire uapi/
constants and cause programs to fail compilation? Or do you keep the
obsolete constants forever? Also, fixing the mapping from ONE_REG
number to stat would mean a switch statement (or loop of some kind---a
switch statement is basically an unrolled binary search) to access the
stats. Instead returning the id in KVM_GET_SUPPORTED_DEBUGFS_STAT would
simplify returning the stats to a simple copy_to_user.
Of course, some of the complexity would be punted to userspace. But
userspace is much closer to the humans that ultimately look at the
stats, so the question is: does userspace really care about knowing
which stat is which, or do they just care about having a name that will
ultimately be consumed by humans down the pipe? If the latter (which is
also my gut feeling), that is also a point against ONE_REG.
> It's not a problem of exposing the type information - we have that today
> by implicitly saying "every counter is 64bit".
>
> The thing I'm worried about is that we keep inventing these special
> purpose interfaces that really do nothing but transfer numbers in one
> way or another. ONE_REG's purpose was to unify them. Debug counters
> really are the same story.
See above: I am not sure they are the same story because their consumers
might be very different from registers. Registers are generally
consumed by programs (to migrate VMs, for example) and only occasionally
by humans, while stats are meant to be consumed by humans. We may
disagree on whether this justifies a completely different API...
Paolo
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] kvm: Add ioctl for gathering debug counters
2020-01-23 12:08 ` Paolo Bonzini
@ 2020-01-23 12:32 ` Alexander Graf
2020-01-23 14:19 ` Paolo Bonzini
0 siblings, 1 reply; 18+ messages in thread
From: Alexander Graf @ 2020-01-23 12:32 UTC (permalink / raw)
To: Paolo Bonzini, milanpa, Milan Pandurov, kvm; +Cc: rkrcmar, borntraeger
On 23.01.20 13:08, Paolo Bonzini wrote:
> On 21/01/20 16:38, Alexander Graf wrote:
>>>>> ONE_REG would force us to define constants for each counter, and would
>>>>> make it hard to retire them. I don't like this.
>>>>
>>>> Why does it make it hard to retire them? We would just return -EINVAL
>>>> on retrieval, like we do for any other non-supported ONE_REG.
>>>>
>>>> It's the same as a file not existing in debugfs/statfs. Or an entry
>>>> in the array of this patch to disappear.
>
> The devil is in the details. For example, would you retire uapi/
> constants and cause programs to fail compilation? Or do you keep the
> obsolete constants forever? Also, fixing the mapping from ONE_REG
> number to stat would mean a switch statement (or loop of some kind---a
> switch statement is basically an unrolled binary search) to access the
> stats. Instead returning the id in KVM_GET_SUPPORTED_DEBUGFS_STAT would
> simplify returning the stats to a simple copy_to_user.
If you look at the way RISC-V implemented ONE_REG, I think we can agree
that it's possible with constant identifiers as well :). The only
downside is of course that you may potentially end up with an identifier
array to map from "ONE_REG id" to "offset in vcpu/vm struct". I fail to
see how that's worse than the struct kvm_stats_debugfs_item[] we have today.
> Of course, some of the complexity would be punted to userspace. But
> userspace is much closer to the humans that ultimately look at the
> stats, so the question is: does userspace really care about knowing
> which stat is which, or do they just care about having a name that will
> ultimately be consumed by humans down the pipe? If the latter (which is
> also my gut feeling), that is also a point against ONE_REG.
>
>> It's not a problem of exposing the type information - we have that today
>> by implicitly saying "every counter is 64bit".
>>
>> The thing I'm worried about is that we keep inventing these special
>> purpose interfaces that really do nothing but transfer numbers in one
>> way or another. ONE_REG's purpose was to unify them. Debug counters
>> really are the same story.
>
> See above: I am not sure they are the same story because their consumers
> might be very different from registers. Registers are generally
> consumed by programs (to migrate VMs, for example) and only occasionally
> by humans, while stats are meant to be consumed by humans. We may
> disagree on whether this justifies a completely different API...
I don't fully agree on the "human" part here. At the end of the day, you
want stats because you want to act on stats. Ideally, you want to do
that fully automatically. Let me give you a few examples:
1) insn_emulation_fail triggers
You may want to feed all the failures into a database to check whether
there is something wrong in the emulator.
2) (remote_)tlb_flush beyond certain threshold
If you see that you're constantly flushing remote TLBs, there's a good
chance that you found a workload that may need tuning in KVM. You want
to gather those stats across your full fleet of hosts, so that for the
few occasions when you hit it, you can work with the actual VM owners to
potentially improve their performance
3) exits beyond certain threshold
You know roughly how many exits your fleet would usually see, so you can
configure an upper threshold on that. When you now have an automated way
to notify you when the threshold is exceeded, you can check what that
particular guest did to raise so many exits.
... and I'm sure there's room for a lot more potential stats that could
be useful to gather to determine the health of a KVM environment, such
as a "vcpu steal time" one or a "maximum time between two VMENTERS while
the guest was in running state".
All of these should eventually feed into something bigger that collects
the numbers across your full VM fleet, so that a human can take actions
based on them. However, that means the values are no longer directly
impacting a human, they need to feed into machines. And for that, exact,
constant identifiers make much more sense :)
Alex
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] kvm: Add ioctl for gathering debug counters
2020-01-23 12:32 ` Alexander Graf
@ 2020-01-23 14:19 ` Paolo Bonzini
2020-01-23 14:45 ` Alexander Graf
0 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2020-01-23 14:19 UTC (permalink / raw)
To: Alexander Graf, milanpa, Milan Pandurov, kvm; +Cc: rkrcmar, borntraeger
On 23/01/20 13:32, Alexander Graf wrote:
>> See above: I am not sure they are the same story because their consumers
>> might be very different from registers. Registers are generally
>> consumed by programs (to migrate VMs, for example) and only occasionally
>> by humans, while stats are meant to be consumed by humans. We may
>> disagree on whether this justifies a completely different API...
>
> I don't fully agree on the "human" part here.
I agree it's not entirely about humans, but in general it's going to be
rules and queries on monitoring tools, where 1) the monitoring tools'
output is generally not KVM-specific, 2) the rules and queries will be
written by humans.
So if the kernel produces insn_emulation_fail, the plugin for the
monitoring tool will just log kvm.insn_emulation_fail. If the kernel
produces 0x10042, the plugin will have to convert it and then log it.
This is why I'm not sure that providing strings is actually less work
for userspace.
Paolo
> At the end of the day, you
> want stats because you want to act on stats. Ideally, you want to do
> that fully automatically. Let me give you a few examples:
>
> 1) insn_emulation_fail triggers
>
> You may want to feed all the failures into a database to check whether
> there is something wrong in the emulator.
>
> 2) (remote_)tlb_flush beyond certain threshold
>
> If you see that you're constantly flushing remote TLBs, there's a good
> chance that you found a workload that may need tuning in KVM. You want
> to gather those stats across your full fleet of hosts, so that for the
> few occasions when you hit it, you can work with the actual VM owners to
> potentially improve their performance
>
> 3) exits beyond certain threshold
>
> You know roughly how many exits your fleet would usually see, so you can
> configure an upper threshold on that. When you now have an automated way
> to notify you when the threshold is exceeded, you can check what that
> particular guest did to raise so many exits.
>
>
> ... and I'm sure there's room for a lot more potential stats that could
> be useful to gather to determine the health of a KVM environment, such
> as a "vcpu steal time" one or a "maximum time between two VMENTERS while
> the guest was in running state".
>
> All of these should eventually feed into something bigger that collects
> the numbers across your full VM fleet, so that a human can take actions
> based on them. However, that means the values are no longer directly
> impacting a human, they need to feed into machines. And for that, exact,
> constant identifiers make much more sense
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] kvm: Add ioctl for gathering debug counters
2020-01-23 14:19 ` Paolo Bonzini
@ 2020-01-23 14:45 ` Alexander Graf
2020-01-23 14:50 ` Paolo Bonzini
0 siblings, 1 reply; 18+ messages in thread
From: Alexander Graf @ 2020-01-23 14:45 UTC (permalink / raw)
To: Paolo Bonzini, milanpa, Milan Pandurov, kvm; +Cc: rkrcmar, borntraeger
On 23.01.20 15:19, Paolo Bonzini wrote:
> On 23/01/20 13:32, Alexander Graf wrote:
>>> See above: I am not sure they are the same story because their consumers
>>> might be very different from registers. Registers are generally
>>> consumed by programs (to migrate VMs, for example) and only occasionally
>>> by humans, while stats are meant to be consumed by humans. We may
>>> disagree on whether this justifies a completely different API...
>>
>> I don't fully agree on the "human" part here.
>
> I agree it's not entirely about humans, but in general it's going to be
> rules and queries on monitoring tools, where 1) the monitoring tools'
> output is generally not KVM-specific, 2) the rules and queries will be
> written by humans.
>
> So if the kernel produces insn_emulation_fail, the plugin for the
> monitoring tool will just log kvm.insn_emulation_fail. If the kernel
> produces 0x10042, the plugin will have to convert it and then log it.
> This is why I'm not sure that providing strings is actually less work
> for userspace.
I think we're in agreement then, just leaning onto the other side of the
same fence. My take is that if I don't know whether a string is
necessary, I'd rather not have a string :).
I guess as long as we do get stat information out per-vm as well as
per-vcpu through vmfd and vcpufd, I'm happy overall.
So how strongly do you feel about the string based approach?
Alex
PS: You could btw easily add a "give me the string for a ONE_REG id"
interface in KVM to translate from 0x10042 to "insn_emulation_fail" :).
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] kvm: Add ioctl for gathering debug counters
2020-01-23 14:45 ` Alexander Graf
@ 2020-01-23 14:50 ` Paolo Bonzini
2020-01-23 14:58 ` Alexander Graf
0 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2020-01-23 14:50 UTC (permalink / raw)
To: Alexander Graf, milanpa, Milan Pandurov, kvm; +Cc: rkrcmar, borntraeger
On 23/01/20 15:45, Alexander Graf wrote:
> I think we're in agreement then, just leaning onto the other side of the
> same fence. My take is that if I don't know whether a string is
> necessary, I'd rather not have a string :).
And for me it's if I don't know whether a #define is necessary, I'd
rather not have a #define. So yeah we agree on everything except the
userspace API (which is no small thing, but it's a start).
> I guess as long as we do get stat information out per-vm as well as
> per-vcpu through vmfd and vcpufd, I'm happy overall.
>
> So how strongly do you feel about the string based approach?
I like it, of course.
> PS: You could btw easily add a "give me the string for a ONE_REG id"
> interface in KVM to translate from 0x10042 to "insn_emulation_fail" :).
That could actually be somewhat useful for VCPU registers as well (give
me the string and type, and a list of valid ONE_REG ids). If that was
the case, of course it would be fine for me to use ONE_REG on a VM. The
part which I don't like is having to make all ONE_REG part of the
userspace ABI/API.
Paolo
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] kvm: Add ioctl for gathering debug counters
2020-01-23 14:50 ` Paolo Bonzini
@ 2020-01-23 14:58 ` Alexander Graf
2020-01-23 15:05 ` Paolo Bonzini
0 siblings, 1 reply; 18+ messages in thread
From: Alexander Graf @ 2020-01-23 14:58 UTC (permalink / raw)
To: Paolo Bonzini, milanpa, Milan Pandurov, kvm; +Cc: rkrcmar, borntraeger
On 23.01.20 15:50, Paolo Bonzini wrote:
> On 23/01/20 15:45, Alexander Graf wrote:
>> I think we're in agreement then, just leaning onto the other side of the
>> same fence. My take is that if I don't know whether a string is
>> necessary, I'd rather not have a string :).
>
> And for me it's if I don't know whether a #define is necessary, I'd
> rather not have a #define. So yeah we agree on everything except the
> userspace API (which is no small thing, but it's a start).
>
>> I guess as long as we do get stat information out per-vm as well as
>> per-vcpu through vmfd and vcpufd, I'm happy overall.
>>
>> So how strongly do you feel about the string based approach?
>
> I like it, of course.
>
>> PS: You could btw easily add a "give me the string for a ONE_REG id"
>> interface in KVM to translate from 0x10042 to "insn_emulation_fail" :).
>
> That could actually be somewhat useful for VCPU registers as well (give
> me the string and type, and a list of valid ONE_REG ids). If that was
You don't need the type explicitly, it's part of the ONE_REG identifier :).
> the case, of course it would be fine for me to use ONE_REG on a VM. The
> part which I don't like is having to make all ONE_REG part of the
> userspace ABI/API.
We don't have all of ONE_REG as part of the user space ABI today. On
Book3S PPC you can not access BookE ONE_REG variables for obvious
reasons. Neither can you access ARM ONE_REGs. Not implementing a ONE_REG
is perfectly ok.
But I like the idea of a ONE_REG query interface that gives you the list
of available registers and a string representation of them. It would
make programming kvm from Python so easy!
Alex
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] kvm: Add ioctl for gathering debug counters
2020-01-23 14:58 ` Alexander Graf
@ 2020-01-23 15:05 ` Paolo Bonzini
2020-01-23 15:27 ` milanpa
0 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2020-01-23 15:05 UTC (permalink / raw)
To: Alexander Graf, milanpa, Milan Pandurov, kvm; +Cc: rkrcmar, borntraeger
On 23/01/20 15:58, Alexander Graf wrote:
>> the case, of course it would be fine for me to use ONE_REG on a VM. The
>> part which I don't like is having to make all ONE_REG part of the
>> userspace ABI/API.
>
> We don't have all of ONE_REG as part of the user space ABI today.
Still those that exist cannot change their id. This makes them a part
of the userspace ABI.
> But I like the idea of a ONE_REG query interface that gives you the list
> of available registers and a string representation of them. It would
> make programming kvm from Python so easy!
Yeah, wouldn't it? Milan, what do you think about it?
Paolo
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] kvm: Add ioctl for gathering debug counters
2020-01-23 15:05 ` Paolo Bonzini
@ 2020-01-23 15:27 ` milanpa
2020-01-23 16:15 ` Paolo Bonzini
0 siblings, 1 reply; 18+ messages in thread
From: milanpa @ 2020-01-23 15:27 UTC (permalink / raw)
To: Paolo Bonzini, Alexander Graf, Milan Pandurov, kvm; +Cc: rkrcmar, borntraeger
On 1/23/20 4:05 PM, Paolo Bonzini wrote:
> On 23/01/20 15:58, Alexander Graf wrote:
>>> the case, of course it would be fine for me to use ONE_REG on a VM. The
>>> part which I don't like is having to make all ONE_REG part of the
>>> userspace ABI/API.
>> We don't have all of ONE_REG as part of the user space ABI today.
> Still those that exist cannot change their id. This makes them a part
> of the userspace ABI.
>
>> But I like the idea of a ONE_REG query interface that gives you the list
>> of available registers and a string representation of them. It would
>> make programming kvm from Python so easy!
> Yeah, wouldn't it? Milan, what do you think about it?
>
> Paolo
>
I agree, extending the API with GET_AVAILABLE_ONE_REGS (and possibly a
bitmask argument to narrow down which type of registers userspace is
interested in) is a clean solution. We won't require userspace to rely
on constants in compile time if it doesn't need to.
Only concern is that now we need to have some kind of datastructure for
keeping the mappings between all available ONE_REG IDs and their
strings/descriptions. Additionally enforcing that newly added ONE_REGs
always get added to that mapping, is also necessary.
Milan
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] kvm: Add ioctl for gathering debug counters
2020-01-23 15:27 ` milanpa
@ 2020-01-23 16:15 ` Paolo Bonzini
2020-01-23 18:31 ` milanpa
0 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2020-01-23 16:15 UTC (permalink / raw)
To: milanpa, Alexander Graf, Milan Pandurov, kvm; +Cc: rkrcmar, borntraeger
On 23/01/20 16:27, milanpa@amazon.com wrote:
>>
>>
>> Paolo
>>
> I agree, extending the API with GET_AVAILABLE_ONE_REGS (and possibly a
> bitmask argument to narrow down which type of registers userspace is
> interested in) is a clean solution. We won't require userspace to rely
> on constants in compile time if it doesn't need to.
>
> Only concern is that now we need to have some kind of datastructure for
> keeping the mappings between all available ONE_REG IDs and their
> strings/descriptions. Additionally enforcing that newly added ONE_REGs
> always get added to that mapping, is also necessary.
For now just do the implementation for VM ONE_REGs. We'll worry about
the existing VCPU registers later.
Paolo
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] kvm: Add ioctl for gathering debug counters
2020-01-23 16:15 ` Paolo Bonzini
@ 2020-01-23 18:31 ` milanpa
0 siblings, 0 replies; 18+ messages in thread
From: milanpa @ 2020-01-23 18:31 UTC (permalink / raw)
To: Paolo Bonzini, Alexander Graf, Milan Pandurov, kvm; +Cc: rkrcmar, borntraeger
On 1/23/20 5:15 PM, Paolo Bonzini wrote:
> On 23/01/20 16:27, milanpa@amazon.com wrote:
>>>
>>> Paolo
>>>
>> I agree, extending the API with GET_AVAILABLE_ONE_REGS (and possibly a
>> bitmask argument to narrow down which type of registers userspace is
>> interested in) is a clean solution. We won't require userspace to rely
>> on constants in compile time if it doesn't need to.
>>
>> Only concern is that now we need to have some kind of datastructure for
>> keeping the mappings between all available ONE_REG IDs and their
>> strings/descriptions. Additionally enforcing that newly added ONE_REGs
>> always get added to that mapping, is also necessary.
> For now just do the implementation for VM ONE_REGs. We'll worry about
> the existing VCPU registers later.
>
> Paolo
>
Sounds good.
Thanks for the help, I will update the patch soon.
Milan
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2020-01-23 18:32 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-15 13:43 [PATCH 2/2] kvm: Add ioctl for gathering debug counters Milan Pandurov
2020-01-15 14:04 ` Alexander Graf
2020-01-15 14:43 ` milanpa
2020-01-15 14:59 ` Alexander Graf
2020-01-17 23:38 ` Paolo Bonzini
2020-01-20 17:53 ` Alexander Graf
2020-01-20 18:57 ` milanpa
2020-01-21 15:38 ` Alexander Graf
2020-01-23 12:08 ` Paolo Bonzini
2020-01-23 12:32 ` Alexander Graf
2020-01-23 14:19 ` Paolo Bonzini
2020-01-23 14:45 ` Alexander Graf
2020-01-23 14:50 ` Paolo Bonzini
2020-01-23 14:58 ` Alexander Graf
2020-01-23 15:05 ` Paolo Bonzini
2020-01-23 15:27 ` milanpa
2020-01-23 16:15 ` Paolo Bonzini
2020-01-23 18:31 ` milanpa
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).