* [PATCH 0/4] kvm: export TSC offset to user-space
@ 2016-08-31 17:05 Luiz Capitulino
2016-08-31 17:05 ` [PATCH 1/4] kvm: kvm_destroy_vm_debugfs(): check debugs_stat_data pointer Luiz Capitulino
` (3 more replies)
0 siblings, 4 replies; 20+ messages in thread
From: Luiz Capitulino @ 2016-08-31 17:05 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel, pbonzini, rkrcmar, rostedt, mhiramat, mtosatti
Hi,
We need to retrieve a VM's TSC offset in order to use
the host's TSC to merge host and guest traces. This is
explained in detail in this thread:
[Qemu-devel] [RFC] host and guest kernel trace merging
https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg00887.html
Today, the only way to retrieve a VM's TSC offset is
by using the kvm_write_tsc_offset tracepoint. This has
a few problems. First, the tracepoint is only emitted
when the VM boots, which requires a reboot to get it if
the VM is already running. Second, tracepoints are not
supposed to be ABIs in case they need to be consumed by
user-space tools.
This series exports a VM's TSC offset to user-space via
debugfs. A new file called "tsc-offset" is created in
the VM's debugfs directory. For example:
/sys/kernel/debug/kvm/51696-10/tsc-offset
This file contains one TSC offset per line, for each
vCPU. For example:
vcpu0: 18446742405270834952
vcpu1: 18446742405270834952
vcpu2: 18446742405270834952
vcpu3: 18446742405270834952
Please, see patch 4/4 for additional details.
Luiz Capitulino (4):
kvm: kvm_destroy_vm_debugfs(): check debugs_stat_data pointer
kvm: kvm_create_vm_debugfs(): cleanup on error
kvm: add stub for arch specific debugfs support
kvm: x86: export TSC offset to user-space
arch/arm/kvm/arm.c | 5 +++++
arch/mips/kvm/mips.c | 5 +++++
arch/powerpc/kvm/powerpc.c | 5 +++++
arch/s390/kvm/kvm-s390.c | 5 +++++
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/svm.c | 1 +
arch/x86/kvm/vmx.c | 8 ++++++++
arch/x86/kvm/x86.c | 35 +++++++++++++++++++++++++++++++++++
include/linux/kvm_host.h | 2 ++
virt/kvm/kvm_main.c | 22 ++++++++++++++++------
10 files changed, 83 insertions(+), 6 deletions(-)
--
2.5.5
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 1/4] kvm: kvm_destroy_vm_debugfs(): check debugs_stat_data pointer
2016-08-31 17:05 [PATCH 0/4] kvm: export TSC offset to user-space Luiz Capitulino
@ 2016-08-31 17:05 ` Luiz Capitulino
2016-09-02 13:51 ` Paolo Bonzini
2016-08-31 17:05 ` [PATCH 2/4] kvm: kvm_create_vm_debugfs(): cleanup on error Luiz Capitulino
` (2 subsequent siblings)
3 siblings, 1 reply; 20+ messages in thread
From: Luiz Capitulino @ 2016-08-31 17:05 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel, pbonzini, rkrcmar, rostedt, mhiramat, mtosatti
This make it possible to call kvm_destroy_vm_debugfs() from
kvm_create_vm_debugfs() in error conditions.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
---
virt/kvm/kvm_main.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1950782..c1dc45e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -559,9 +559,11 @@ static void kvm_destroy_vm_debugfs(struct kvm *kvm)
debugfs_remove_recursive(kvm->debugfs_dentry);
- for (i = 0; i < kvm_debugfs_num_entries; i++)
- kfree(kvm->debugfs_stat_data[i]);
- kfree(kvm->debugfs_stat_data);
+ if (kvm->debugfs_stat_data) {
+ for (i = 0; i < kvm_debugfs_num_entries; i++)
+ kfree(kvm->debugfs_stat_data[i]);
+ kfree(kvm->debugfs_stat_data);
+ }
}
static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
--
2.5.5
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 2/4] kvm: kvm_create_vm_debugfs(): cleanup on error
2016-08-31 17:05 [PATCH 0/4] kvm: export TSC offset to user-space Luiz Capitulino
2016-08-31 17:05 ` [PATCH 1/4] kvm: kvm_destroy_vm_debugfs(): check debugs_stat_data pointer Luiz Capitulino
@ 2016-08-31 17:05 ` Luiz Capitulino
2016-09-02 13:53 ` Paolo Bonzini
2016-08-31 17:05 ` [PATCH 3/4] kvm: add stub for arch specific debugfs support Luiz Capitulino
2016-08-31 17:05 ` [PATCH 4/4] kvm: x86: export TSC offset to user-space Luiz Capitulino
3 siblings, 1 reply; 20+ messages in thread
From: Luiz Capitulino @ 2016-08-31 17:05 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel, pbonzini, rkrcmar, rostedt, mhiramat, mtosatti
Memory and debugfs entries are leaked on error. Fix it.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
---
virt/kvm/kvm_main.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c1dc45e..9293285 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -585,12 +585,12 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
sizeof(*kvm->debugfs_stat_data),
GFP_KERNEL);
if (!kvm->debugfs_stat_data)
- return -ENOMEM;
+ goto out_err;
for (p = debugfs_entries; p->name; p++) {
stat_data = kzalloc(sizeof(*stat_data), GFP_KERNEL);
if (!stat_data)
- return -ENOMEM;
+ goto out_err;
stat_data->kvm = kvm;
stat_data->offset = p->offset;
@@ -599,9 +599,13 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
kvm->debugfs_dentry,
stat_data,
stat_fops_per_vm[p->kind]))
- return -ENOMEM;
+ goto out_err;
}
return 0;
+
+out_err:
+ kvm_destroy_vm_debugfs(kvm);
+ return -ENOMEM;
}
static struct kvm *kvm_create_vm(unsigned long type)
--
2.5.5
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 3/4] kvm: add stub for arch specific debugfs support
2016-08-31 17:05 [PATCH 0/4] kvm: export TSC offset to user-space Luiz Capitulino
2016-08-31 17:05 ` [PATCH 1/4] kvm: kvm_destroy_vm_debugfs(): check debugs_stat_data pointer Luiz Capitulino
2016-08-31 17:05 ` [PATCH 2/4] kvm: kvm_create_vm_debugfs(): cleanup on error Luiz Capitulino
@ 2016-08-31 17:05 ` Luiz Capitulino
2016-09-02 13:53 ` Paolo Bonzini
2016-09-03 3:34 ` Masami Hiramatsu
2016-08-31 17:05 ` [PATCH 4/4] kvm: x86: export TSC offset to user-space Luiz Capitulino
3 siblings, 2 replies; 20+ messages in thread
From: Luiz Capitulino @ 2016-08-31 17:05 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel, pbonzini, rkrcmar, rostedt, mhiramat, mtosatti
kvm_arch_create_vm_debugfs() allows arch specific code to
create entries in the VM's directory in debugfs. x86 will
implement support for this in the next commit.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
---
arch/arm/kvm/arm.c | 5 +++++
arch/mips/kvm/mips.c | 5 +++++
arch/powerpc/kvm/powerpc.c | 5 +++++
arch/s390/kvm/kvm-s390.c | 5 +++++
arch/x86/kvm/x86.c | 5 +++++
include/linux/kvm_host.h | 2 ++
virt/kvm/kvm_main.c | 4 ++++
7 files changed, 31 insertions(+)
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 75f130e..491211f 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -144,6 +144,11 @@ out_fail_alloc:
return ret;
}
+int kvm_arch_create_vm_debugfs(struct kvm *kvm)
+{
+ return 0;
+}
+
int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
{
return VM_FAULT_SIGBUS;
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index a6ea084..a854b8b 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -140,6 +140,11 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
return 0;
}
+int kvm_arch_create_vm_debugfs(struct kvm *kvm)
+{
+ return 0;
+}
+
void kvm_mips_free_vcpus(struct kvm *kvm)
{
unsigned int i;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 6ce40dd..d8867e5 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -436,6 +436,11 @@ err_out:
return -EINVAL;
}
+int kvm_arch_create_vm_debugfs(struct kvm *kvm)
+{
+ return 0;
+}
+
void kvm_arch_destroy_vm(struct kvm *kvm)
{
unsigned int i;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index f142215..406324c 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1490,6 +1490,11 @@ out_err:
return rc;
}
+int kvm_arch_create_vm_debugfs(struct kvm *kvm)
+{
+ return 0;
+}
+
void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
{
VCPU_EVENT(vcpu, 3, "%s", "free cpu");
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 19f9f9e..18dfbac 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7779,6 +7779,11 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
return 0;
}
+int kvm_arch_create_vm_debugfs(struct kvm *kvm)
+{
+ return 0;
+}
+
static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
{
int r;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9c28b4d..d3810bf 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -835,6 +835,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type);
void kvm_arch_destroy_vm(struct kvm *kvm);
void kvm_arch_sync_events(struct kvm *kvm);
+int kvm_arch_create_vm_debugfs(struct kvm *kvm);
+
int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9293285..2db53a8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -601,6 +601,10 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
stat_fops_per_vm[p->kind]))
goto out_err;
}
+
+ if (kvm_arch_create_vm_debugfs(kvm) < 0)
+ goto out_err;
+
return 0;
out_err:
--
2.5.5
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 4/4] kvm: x86: export TSC offset to user-space
2016-08-31 17:05 [PATCH 0/4] kvm: export TSC offset to user-space Luiz Capitulino
` (2 preceding siblings ...)
2016-08-31 17:05 ` [PATCH 3/4] kvm: add stub for arch specific debugfs support Luiz Capitulino
@ 2016-08-31 17:05 ` Luiz Capitulino
2016-09-02 13:43 ` Stefan Hajnoczi
2016-09-02 17:00 ` Paolo Bonzini
3 siblings, 2 replies; 20+ messages in thread
From: Luiz Capitulino @ 2016-08-31 17:05 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel, pbonzini, rkrcmar, rostedt, mhiramat, mtosatti
We need to retrieve a VM's TSC offset in order to use
the host's TSC to merge host and guest traces. This is
explained in detail in this thread:
[Qemu-devel] [RFC] host and guest kernel trace merging
https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg00887.html
Today, the only way to retrieve a VM's TSC offset is
by using the kvm_write_tsc_offset tracepoint. This has
a few problems. First, the tracepoint is only emitted
when the VM boots, which requires a reboot to get it if
the VM is already running. Second, tracepoints are not
supposed to be ABIs in case they need to be consumed by
user-space tools.
This commit exports a VM's TSC offset to user-space via
debugfs. A new file called "tsc-offset" is created in
the VM's debugfs directory. For example:
/sys/kernel/debug/kvm/51696-10/tsc-offset
This file contains one TSC offset per line, for each
vCPU. For example:
vcpu0: 18446742405270834952
vcpu1: 18446742405270834952
vcpu2: 18446742405270834952
vcpu3: 18446742405270834952
There are some important observations about this
solution:
- While all vCPUs TSC offsets should be equal for the
cases we care about (ie. stable TSC and no write to
the TSC MSR), I chose to follow the spec and export
each vCPU's TSC offset (might also be helpful for
debugging)
- The TSC offset is only useful after the VM has booted
- We'll probably need to export the TSC multiplier too.
However, I've been using only the TSC offset for now.
So, let's get this merged first and do the TSC multiplier
as a second step
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/svm.c | 1 +
arch/x86/kvm/vmx.c | 8 ++++++++
arch/x86/kvm/x86.c | 30 ++++++++++++++++++++++++++++++
4 files changed, 40 insertions(+)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 33ae3a4..5714bbd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -952,6 +952,7 @@ struct kvm_x86_ops {
bool (*has_wbinvd_exit)(void);
u64 (*read_tsc_offset)(struct kvm_vcpu *vcpu);
+ u64 (*read_cached_tsc_offset)(struct kvm_vcpu *vcpu);
void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
u64 (*read_l1_tsc)(struct kvm_vcpu *vcpu, u64 host_tsc);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index af523d8..c851477 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5065,6 +5065,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.has_wbinvd_exit = svm_has_wbinvd_exit,
.read_tsc_offset = svm_read_tsc_offset,
+ .read_cached_tsc_offset = svm_read_tsc_offset,
.write_tsc_offset = svm_write_tsc_offset,
.adjust_tsc_offset_guest = svm_adjust_tsc_offset_guest,
.read_l1_tsc = svm_read_l1_tsc,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5cede40..82dfe42 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -616,6 +616,7 @@ struct vcpu_vmx {
u64 hv_deadline_tsc;
u64 current_tsc_ratio;
+ u64 cached_tsc_offset;
bool guest_pkru_valid;
u32 guest_pkru;
@@ -2608,6 +2609,11 @@ static u64 vmx_read_tsc_offset(struct kvm_vcpu *vcpu)
return vmcs_read64(TSC_OFFSET);
}
+static u64 vmx_read_cached_tsc_offset(struct kvm_vcpu *vcpu)
+{
+ return to_vmx(vcpu)->cached_tsc_offset;
+}
+
/*
* writes 'offset' into guest's timestamp counter offset register
*/
@@ -2632,6 +2638,7 @@ static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
vmcs_read64(TSC_OFFSET), offset);
vmcs_write64(TSC_OFFSET, offset);
}
+ to_vmx(vcpu)->cached_tsc_offset = offset;
}
static void vmx_adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, s64 adjustment)
@@ -11275,6 +11282,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.has_wbinvd_exit = cpu_has_vmx_wbinvd_exit,
.read_tsc_offset = vmx_read_tsc_offset,
+ .read_cached_tsc_offset = vmx_read_cached_tsc_offset,
.write_tsc_offset = vmx_write_tsc_offset,
.adjust_tsc_offset_guest = vmx_adjust_tsc_offset_guest,
.read_l1_tsc = vmx_read_l1_tsc,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 18dfbac..75a8e23 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -54,6 +54,7 @@
#include <linux/pvclock_gtod.h>
#include <linux/kvm_irqfd.h>
#include <linux/irqbypass.h>
+#include <linux/debugfs.h>
#include <trace/events/kvm.h>
#include <asm/debugreg.h>
@@ -7779,8 +7780,37 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
return 0;
}
+static int tsc_offset_show(struct seq_file *m, void *data)
+{
+ struct kvm *kvm = m->private;
+ struct kvm_vcpu *vcpu;
+ int i;
+
+ kvm_for_each_vcpu(i, vcpu, kvm)
+ seq_printf(m, "vcpu%d: %llu\n",
+ vcpu->vcpu_id, kvm_x86_ops->read_cached_tsc_offset(vcpu));
+
+ return 0;
+}
+
+static int tsc_offset_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, tsc_offset_show, inode->i_private);
+}
+
+static const struct file_operations tsc_offset_fops = {
+ .owner = THIS_MODULE,
+ .open = tsc_offset_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
int kvm_arch_create_vm_debugfs(struct kvm *kvm)
{
+ if (!debugfs_create_file("tsc-offset", 0444,
+ kvm->debugfs_dentry, kvm, &tsc_offset_fops))
+ return -ENOMEM;
return 0;
}
--
2.5.5
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 4/4] kvm: x86: export TSC offset to user-space
2016-08-31 17:05 ` [PATCH 4/4] kvm: x86: export TSC offset to user-space Luiz Capitulino
@ 2016-09-02 13:43 ` Stefan Hajnoczi
2016-09-02 14:15 ` Steven Rostedt
` (2 more replies)
2016-09-02 17:00 ` Paolo Bonzini
1 sibling, 3 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2016-09-02 13:43 UTC (permalink / raw)
To: Luiz Capitulino
Cc: kvm, linux-kernel, pbonzini, rkrcmar, rostedt, mhiramat, mtosatti
[-- Attachment #1: Type: text/plain, Size: 2123 bytes --]
On Wed, Aug 31, 2016 at 01:05:45PM -0400, Luiz Capitulino wrote:
> We need to retrieve a VM's TSC offset in order to use
> the host's TSC to merge host and guest traces. This is
> explained in detail in this thread:
>
> [Qemu-devel] [RFC] host and guest kernel trace merging
> https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg00887.html
>
> Today, the only way to retrieve a VM's TSC offset is
> by using the kvm_write_tsc_offset tracepoint. This has
> a few problems. First, the tracepoint is only emitted
> when the VM boots, which requires a reboot to get it if
> the VM is already running. Second, tracepoints are not
> supposed to be ABIs in case they need to be consumed by
> user-space tools.
>
> This commit exports a VM's TSC offset to user-space via
> debugfs. A new file called "tsc-offset" is created in
> the VM's debugfs directory. For example:
>
> /sys/kernel/debug/kvm/51696-10/tsc-offset
>
> This file contains one TSC offset per line, for each
> vCPU. For example:
>
> vcpu0: 18446742405270834952
> vcpu1: 18446742405270834952
> vcpu2: 18446742405270834952
> vcpu3: 18446742405270834952
>
> There are some important observations about this
> solution:
>
> - While all vCPUs TSC offsets should be equal for the
> cases we care about (ie. stable TSC and no write to
> the TSC MSR), I chose to follow the spec and export
> each vCPU's TSC offset (might also be helpful for
> debugging)
>
> - The TSC offset is only useful after the VM has booted
>
> - We'll probably need to export the TSC multiplier too.
> However, I've been using only the TSC offset for now.
> So, let's get this merged first and do the TSC multiplier
> as a second step
Can TSC offset changes occur at runtime?
One example is vcpu hotplug where the tracing tool would need to fetch
the new vcpu's TSC offset after tracing has already started.
Another example is if QEMU or the guest change the TSC offset while
running. If the tracing tool doesn't notice this then trace events will have
incorrect timestamps.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/4] kvm: kvm_destroy_vm_debugfs(): check debugs_stat_data pointer
2016-08-31 17:05 ` [PATCH 1/4] kvm: kvm_destroy_vm_debugfs(): check debugs_stat_data pointer Luiz Capitulino
@ 2016-09-02 13:51 ` Paolo Bonzini
0 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-09-02 13:51 UTC (permalink / raw)
To: Luiz Capitulino, kvm; +Cc: linux-kernel, rkrcmar, rostedt, mhiramat, mtosatti
On 31/08/2016 19:05, Luiz Capitulino wrote:
> This make it possible to call kvm_destroy_vm_debugfs() from
> kvm_create_vm_debugfs() in error conditions.
>
> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> ---
> virt/kvm/kvm_main.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 1950782..c1dc45e 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -559,9 +559,11 @@ static void kvm_destroy_vm_debugfs(struct kvm *kvm)
>
> debugfs_remove_recursive(kvm->debugfs_dentry);
>
> - for (i = 0; i < kvm_debugfs_num_entries; i++)
> - kfree(kvm->debugfs_stat_data[i]);
> - kfree(kvm->debugfs_stat_data);
> + if (kvm->debugfs_stat_data) {
> + for (i = 0; i < kvm_debugfs_num_entries; i++)
> + kfree(kvm->debugfs_stat_data[i]);
> + kfree(kvm->debugfs_stat_data);
> + }
> }
>
> static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/4] kvm: kvm_create_vm_debugfs(): cleanup on error
2016-08-31 17:05 ` [PATCH 2/4] kvm: kvm_create_vm_debugfs(): cleanup on error Luiz Capitulino
@ 2016-09-02 13:53 ` Paolo Bonzini
0 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-09-02 13:53 UTC (permalink / raw)
To: Luiz Capitulino, kvm; +Cc: linux-kernel, rkrcmar, rostedt, mhiramat, mtosatti
On 31/08/2016 19:05, Luiz Capitulino wrote:
> Memory and debugfs entries are leaked on error. Fix it.
>
> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> ---
> virt/kvm/kvm_main.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index c1dc45e..9293285 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -585,12 +585,12 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
> sizeof(*kvm->debugfs_stat_data),
> GFP_KERNEL);
> if (!kvm->debugfs_stat_data)
> - return -ENOMEM;
> + goto out_err;
>
> for (p = debugfs_entries; p->name; p++) {
> stat_data = kzalloc(sizeof(*stat_data), GFP_KERNEL);
> if (!stat_data)
> - return -ENOMEM;
> + goto out_err;
>
> stat_data->kvm = kvm;
> stat_data->offset = p->offset;
> @@ -599,9 +599,13 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
> kvm->debugfs_dentry,
> stat_data,
> stat_fops_per_vm[p->kind]))
> - return -ENOMEM;
> + goto out_err;
> }
> return 0;
> +
> +out_err:
> + kvm_destroy_vm_debugfs(kvm);
> + return -ENOMEM;
> }
>
> static struct kvm *kvm_create_vm(unsigned long type)
>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/4] kvm: add stub for arch specific debugfs support
2016-08-31 17:05 ` [PATCH 3/4] kvm: add stub for arch specific debugfs support Luiz Capitulino
@ 2016-09-02 13:53 ` Paolo Bonzini
2016-09-03 3:34 ` Masami Hiramatsu
1 sibling, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-09-02 13:53 UTC (permalink / raw)
To: Luiz Capitulino, kvm; +Cc: linux-kernel, rkrcmar, rostedt, mhiramat, mtosatti
On 31/08/2016 19:05, Luiz Capitulino wrote:
> kvm_arch_create_vm_debugfs() allows arch specific code to
> create entries in the VM's directory in debugfs. x86 will
> implement support for this in the next commit.
>
> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> ---
> arch/arm/kvm/arm.c | 5 +++++
> arch/mips/kvm/mips.c | 5 +++++
> arch/powerpc/kvm/powerpc.c | 5 +++++
> arch/s390/kvm/kvm-s390.c | 5 +++++
> arch/x86/kvm/x86.c | 5 +++++
> include/linux/kvm_host.h | 2 ++
> virt/kvm/kvm_main.c | 4 ++++
> 7 files changed, 31 insertions(+)
>
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 75f130e..491211f 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -144,6 +144,11 @@ out_fail_alloc:
> return ret;
> }
>
> +int kvm_arch_create_vm_debugfs(struct kvm *kvm)
> +{
> + return 0;
> +}
> +
> int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
> {
> return VM_FAULT_SIGBUS;
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index a6ea084..a854b8b 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -140,6 +140,11 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> return 0;
> }
>
> +int kvm_arch_create_vm_debugfs(struct kvm *kvm)
> +{
> + return 0;
> +}
> +
> void kvm_mips_free_vcpus(struct kvm *kvm)
> {
> unsigned int i;
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 6ce40dd..d8867e5 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -436,6 +436,11 @@ err_out:
> return -EINVAL;
> }
>
> +int kvm_arch_create_vm_debugfs(struct kvm *kvm)
> +{
> + return 0;
> +}
> +
> void kvm_arch_destroy_vm(struct kvm *kvm)
> {
> unsigned int i;
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index f142215..406324c 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -1490,6 +1490,11 @@ out_err:
> return rc;
> }
>
> +int kvm_arch_create_vm_debugfs(struct kvm *kvm)
> +{
> + return 0;
> +}
> +
> void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> {
> VCPU_EVENT(vcpu, 3, "%s", "free cpu");
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 19f9f9e..18dfbac 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7779,6 +7779,11 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> return 0;
> }
>
> +int kvm_arch_create_vm_debugfs(struct kvm *kvm)
> +{
> + return 0;
> +}
> +
> static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
> {
> int r;
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 9c28b4d..d3810bf 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -835,6 +835,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type);
> void kvm_arch_destroy_vm(struct kvm *kvm);
> void kvm_arch_sync_events(struct kvm *kvm);
>
> +int kvm_arch_create_vm_debugfs(struct kvm *kvm);
> +
> int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
> void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 9293285..2db53a8 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -601,6 +601,10 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
> stat_fops_per_vm[p->kind]))
> goto out_err;
> }
> +
> + if (kvm_arch_create_vm_debugfs(kvm) < 0)
> + goto out_err;
> +
> return 0;
>
> out_err:
>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/4] kvm: x86: export TSC offset to user-space
2016-09-02 13:43 ` Stefan Hajnoczi
@ 2016-09-02 14:15 ` Steven Rostedt
2016-09-03 0:23 ` Marcelo Tosatti
2016-09-02 16:26 ` Luiz Capitulino
2016-09-02 23:49 ` Marcelo Tosatti
2 siblings, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2016-09-02 14:15 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Luiz Capitulino, kvm, linux-kernel, pbonzini, rkrcmar, mhiramat,
mtosatti
On Fri, 2 Sep 2016 09:43:01 -0400
Stefan Hajnoczi <stefanha@gmail.com> wrote:
> Can TSC offset changes occur at runtime?
>
> One example is vcpu hotplug where the tracing tool would need to fetch
> the new vcpu's TSC offset after tracing has already started.
>
> Another example is if QEMU or the guest change the TSC offset while
> running. If the tracing tool doesn't notice this then trace events will have
> incorrect timestamps.
I believe there are tracepoints for these events. They would obviously
need to be enabled for the tracer to catch them.
I would also recommend that they go into their own instance to make
sure other events do not overwrite them.
-- Steve
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/4] kvm: x86: export TSC offset to user-space
2016-09-02 13:43 ` Stefan Hajnoczi
2016-09-02 14:15 ` Steven Rostedt
@ 2016-09-02 16:26 ` Luiz Capitulino
2016-09-02 16:29 ` Luiz Capitulino
2016-09-02 23:49 ` Marcelo Tosatti
2 siblings, 1 reply; 20+ messages in thread
From: Luiz Capitulino @ 2016-09-02 16:26 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: kvm, linux-kernel, pbonzini, rkrcmar, rostedt, mhiramat, mtosatti
On Fri, 2 Sep 2016 09:43:01 -0400
Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Wed, Aug 31, 2016 at 01:05:45PM -0400, Luiz Capitulino wrote:
> > We need to retrieve a VM's TSC offset in order to use
> > the host's TSC to merge host and guest traces. This is
> > explained in detail in this thread:
> >
> > [Qemu-devel] [RFC] host and guest kernel trace merging
> > https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg00887.html
> >
> > Today, the only way to retrieve a VM's TSC offset is
> > by using the kvm_write_tsc_offset tracepoint. This has
> > a few problems. First, the tracepoint is only emitted
> > when the VM boots, which requires a reboot to get it if
> > the VM is already running. Second, tracepoints are not
> > supposed to be ABIs in case they need to be consumed by
> > user-space tools.
> >
> > This commit exports a VM's TSC offset to user-space via
> > debugfs. A new file called "tsc-offset" is created in
> > the VM's debugfs directory. For example:
> >
> > /sys/kernel/debug/kvm/51696-10/tsc-offset
> >
> > This file contains one TSC offset per line, for each
> > vCPU. For example:
> >
> > vcpu0: 18446742405270834952
> > vcpu1: 18446742405270834952
> > vcpu2: 18446742405270834952
> > vcpu3: 18446742405270834952
> >
> > There are some important observations about this
> > solution:
> >
> > - While all vCPUs TSC offsets should be equal for the
> > cases we care about (ie. stable TSC and no write to
> > the TSC MSR), I chose to follow the spec and export
> > each vCPU's TSC offset (might also be helpful for
> > debugging)
> >
> > - The TSC offset is only useful after the VM has booted
> >
> > - We'll probably need to export the TSC multiplier too.
> > However, I've been using only the TSC offset for now.
> > So, let's get this merged first and do the TSC multiplier
> > as a second step
>
> Can TSC offset changes occur at runtime?
Yes. IIRC, if the system has an unstable TSC, KVM will adjust the
TSC offset when migrating the vCPU to other cores (although
tracing with unstable TSC is not supported). Also, the guest can
write to the TSC MSR at any time (although I don't know if Linux
ever does this).
> One example is vcpu hotplug where the tracing tool would need to fetch
> the new vcpu's TSC offset after tracing has already started.
>
> Another example is if QEMU or the guest change the TSC offset while
> running. If the tracing tool doesn't notice this then trace events will have
> incorrect timestamps.
I guess that what tools like trace-cmd want to do in those cases
is to warn the user and discard the trace. A simple way of doing
this would be to re-check that the TSC offset are the same after
tracing is done. It could also use inotify, in case it works
for debugfs (never tried it myself).
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/4] kvm: x86: export TSC offset to user-space
2016-09-02 16:26 ` Luiz Capitulino
@ 2016-09-02 16:29 ` Luiz Capitulino
0 siblings, 0 replies; 20+ messages in thread
From: Luiz Capitulino @ 2016-09-02 16:29 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: kvm, linux-kernel, pbonzini, rkrcmar, rostedt, mhiramat, mtosatti
On Fri, 2 Sep 2016 12:26:55 -0400
Luiz Capitulino <lcapitulino@redhat.com> wrote:
> I guess that what tools like trace-cmd want to do in those cases
> is to warn the user and discard the trace. A simple way of doing
> this would be to re-check that the TSC offset are the same after
> tracing is done. It could also use inotify, in case it works
> for debugfs (never tried it myself).
The second idea was probably stupid, never mind me :)
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/4] kvm: x86: export TSC offset to user-space
2016-08-31 17:05 ` [PATCH 4/4] kvm: x86: export TSC offset to user-space Luiz Capitulino
2016-09-02 13:43 ` Stefan Hajnoczi
@ 2016-09-02 17:00 ` Paolo Bonzini
2016-09-02 17:31 ` Luiz Capitulino
1 sibling, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2016-09-02 17:00 UTC (permalink / raw)
To: Luiz Capitulino, kvm; +Cc: linux-kernel, rkrcmar, rostedt, mhiramat, mtosatti
On 31/08/2016 19:05, Luiz Capitulino wrote:
> vcpu0: 18446742405270834952
> vcpu1: 18446742405270834952
> vcpu2: 18446742405270834952
> vcpu3: 18446742405270834952
>
> - We'll probably need to export the TSC multiplier too.
> However, I've been using only the TSC offset for now.
> So, let's get this merged first and do the TSC multiplier
> as a second step
You'll need to export the number of fractional bits in the multiplier,
too. It's going to be a very simple patch, so please do everything now.
arch/x86/kvm/x86.c is huge; please create a new file arch/x86/kvm/debugfs.c.
> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/svm.c | 1 +
> arch/x86/kvm/vmx.c | 8 ++++++++
> arch/x86/kvm/x86.c | 30 ++++++++++++++++++++++++++++++
> 4 files changed, 40 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 33ae3a4..5714bbd 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -952,6 +952,7 @@ struct kvm_x86_ops {
> bool (*has_wbinvd_exit)(void);
>
> u64 (*read_tsc_offset)(struct kvm_vcpu *vcpu);
> + u64 (*read_cached_tsc_offset)(struct kvm_vcpu *vcpu);
> void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
>
> u64 (*read_l1_tsc)(struct kvm_vcpu *vcpu, u64 host_tsc);
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index af523d8..c851477 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -5065,6 +5065,7 @@ static struct kvm_x86_ops svm_x86_ops = {
> .has_wbinvd_exit = svm_has_wbinvd_exit,
>
> .read_tsc_offset = svm_read_tsc_offset,
> + .read_cached_tsc_offset = svm_read_tsc_offset,
> .write_tsc_offset = svm_write_tsc_offset,
> .adjust_tsc_offset_guest = svm_adjust_tsc_offset_guest,
> .read_l1_tsc = svm_read_l1_tsc,
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 5cede40..82dfe42 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -616,6 +616,7 @@ struct vcpu_vmx {
> u64 hv_deadline_tsc;
>
> u64 current_tsc_ratio;
> + u64 cached_tsc_offset;
>
> bool guest_pkru_valid;
> u32 guest_pkru;
> @@ -2608,6 +2609,11 @@ static u64 vmx_read_tsc_offset(struct kvm_vcpu *vcpu)
> return vmcs_read64(TSC_OFFSET);
> }
>
> +static u64 vmx_read_cached_tsc_offset(struct kvm_vcpu *vcpu)
> +{
> + return to_vmx(vcpu)->cached_tsc_offset;
> +}
> +
> /*
> * writes 'offset' into guest's timestamp counter offset register
> */
> @@ -2632,6 +2638,7 @@ static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
> vmcs_read64(TSC_OFFSET), offset);
> vmcs_write64(TSC_OFFSET, offset);
> }
> + to_vmx(vcpu)->cached_tsc_offset = offset;
> }
>
> static void vmx_adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, s64 adjustment)
> @@ -11275,6 +11282,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
> .has_wbinvd_exit = cpu_has_vmx_wbinvd_exit,
>
> .read_tsc_offset = vmx_read_tsc_offset,
> + .read_cached_tsc_offset = vmx_read_cached_tsc_offset,
> .write_tsc_offset = vmx_write_tsc_offset,
> .adjust_tsc_offset_guest = vmx_adjust_tsc_offset_guest,
> .read_l1_tsc = vmx_read_l1_tsc,
You need to handle SVM as well. So you might as well simplify the code:
- add a kvm_vcpu_write_tsc_offset wrapper for kvm_x86_ops->write_tsc_offset
- add a tsc_offset field in struct kvm_vcpu_arch
- replace kvm_x86_ops->read_tsc_offset with accesses to the new field
Then in a fifth patch export the TSC offset (and multiplier ;)) to
userspace.
I'm not very happy about having a single file for all TSC offsets.
Creating subdirectories under the PID-FD per-VM directory would be nicer
in the long run.
Paolo
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/4] kvm: x86: export TSC offset to user-space
2016-09-02 17:00 ` Paolo Bonzini
@ 2016-09-02 17:31 ` Luiz Capitulino
2016-09-05 8:10 ` Paolo Bonzini
0 siblings, 1 reply; 20+ messages in thread
From: Luiz Capitulino @ 2016-09-02 17:31 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kvm, linux-kernel, rkrcmar, rostedt, mhiramat, mtosatti
On Fri, 2 Sep 2016 19:00:41 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 31/08/2016 19:05, Luiz Capitulino wrote:
> > vcpu0: 18446742405270834952
> > vcpu1: 18446742405270834952
> > vcpu2: 18446742405270834952
> > vcpu3: 18446742405270834952
> >
> > - We'll probably need to export the TSC multiplier too.
> > However, I've been using only the TSC offset for now.
> > So, let's get this merged first and do the TSC multiplier
> > as a second step
>
> You'll need to export the number of fractional bits in the multiplier,
> too. It's going to be a very simple patch, so please do everything now.
I didn't want to expose the multiplier before testing our tracing
procedure with it. So far we've been only using the TSC offset (and
it works great). I don't even know if I have a machine around to
test it, so it could take a bit.
> arch/x86/kvm/x86.c is huge; please create a new file arch/x86/kvm/debugfs.c.
Will do.
> > Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> > ---
> > arch/x86/include/asm/kvm_host.h | 1 +
> > arch/x86/kvm/svm.c | 1 +
> > arch/x86/kvm/vmx.c | 8 ++++++++
> > arch/x86/kvm/x86.c | 30 ++++++++++++++++++++++++++++++
> > 4 files changed, 40 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 33ae3a4..5714bbd 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -952,6 +952,7 @@ struct kvm_x86_ops {
> > bool (*has_wbinvd_exit)(void);
> >
> > u64 (*read_tsc_offset)(struct kvm_vcpu *vcpu);
> > + u64 (*read_cached_tsc_offset)(struct kvm_vcpu *vcpu);
> > void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
> >
> > u64 (*read_l1_tsc)(struct kvm_vcpu *vcpu, u64 host_tsc);
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index af523d8..c851477 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -5065,6 +5065,7 @@ static struct kvm_x86_ops svm_x86_ops = {
> > .has_wbinvd_exit = svm_has_wbinvd_exit,
> >
> > .read_tsc_offset = svm_read_tsc_offset,
> > + .read_cached_tsc_offset = svm_read_tsc_offset,
> > .write_tsc_offset = svm_write_tsc_offset,
> > .adjust_tsc_offset_guest = svm_adjust_tsc_offset_guest,
> > .read_l1_tsc = svm_read_l1_tsc,
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 5cede40..82dfe42 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -616,6 +616,7 @@ struct vcpu_vmx {
> > u64 hv_deadline_tsc;
> >
> > u64 current_tsc_ratio;
> > + u64 cached_tsc_offset;
> >
> > bool guest_pkru_valid;
> > u32 guest_pkru;
> > @@ -2608,6 +2609,11 @@ static u64 vmx_read_tsc_offset(struct kvm_vcpu *vcpu)
> > return vmcs_read64(TSC_OFFSET);
> > }
> >
> > +static u64 vmx_read_cached_tsc_offset(struct kvm_vcpu *vcpu)
> > +{
> > + return to_vmx(vcpu)->cached_tsc_offset;
> > +}
> > +
> > /*
> > * writes 'offset' into guest's timestamp counter offset register
> > */
> > @@ -2632,6 +2638,7 @@ static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
> > vmcs_read64(TSC_OFFSET), offset);
> > vmcs_write64(TSC_OFFSET, offset);
> > }
> > + to_vmx(vcpu)->cached_tsc_offset = offset;
> > }
> >
> > static void vmx_adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, s64 adjustment)
> > @@ -11275,6 +11282,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
> > .has_wbinvd_exit = cpu_has_vmx_wbinvd_exit,
> >
> > .read_tsc_offset = vmx_read_tsc_offset,
> > + .read_cached_tsc_offset = vmx_read_cached_tsc_offset,
> > .write_tsc_offset = vmx_write_tsc_offset,
> > .adjust_tsc_offset_guest = vmx_adjust_tsc_offset_guest,
> > .read_l1_tsc = vmx_read_l1_tsc,
>
> You need to handle SVM as well. So you might as well simplify the code:
SVM is handled:
+ .read_cached_tsc_offset = svm_read_tsc_offset,
> - add a kvm_vcpu_write_tsc_offset wrapper for kvm_x86_ops->write_tsc_offset
>
> - add a tsc_offset field in struct kvm_vcpu_arch
>
> - replace kvm_x86_ops->read_tsc_offset with accesses to the new field
Given that SVM is handled, you still want me to do this?
> Then in a fifth patch export the TSC offset (and multiplier ;)) to
> userspace.
>
> I'm not very happy about having a single file for all TSC offsets.
> Creating subdirectories under the PID-FD per-VM directory would be nicer
> in the long run.
I think Steven would also prefer that, but some people raised the
concern at KVM Forum that creating per vcpu dirs in debugfs may
consume considerable memory for a system running several dozen
if not hundreds of VMs. This concern seems valid to me, but I
can do either way.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/4] kvm: x86: export TSC offset to user-space
2016-09-02 13:43 ` Stefan Hajnoczi
2016-09-02 14:15 ` Steven Rostedt
2016-09-02 16:26 ` Luiz Capitulino
@ 2016-09-02 23:49 ` Marcelo Tosatti
2016-09-03 1:29 ` Luiz Capitulino
2 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2016-09-02 23:49 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Luiz Capitulino, kvm, linux-kernel, pbonzini, rkrcmar, rostedt, mhiramat
On Fri, Sep 02, 2016 at 09:43:01AM -0400, Stefan Hajnoczi wrote:
> On Wed, Aug 31, 2016 at 01:05:45PM -0400, Luiz Capitulino wrote:
> > We need to retrieve a VM's TSC offset in order to use
> > the host's TSC to merge host and guest traces. This is
> > explained in detail in this thread:
> >
> > [Qemu-devel] [RFC] host and guest kernel trace merging
> > https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg00887.html
> >
> > Today, the only way to retrieve a VM's TSC offset is
> > by using the kvm_write_tsc_offset tracepoint. This has
> > a few problems. First, the tracepoint is only emitted
> > when the VM boots, which requires a reboot to get it if
> > the VM is already running. Second, tracepoints are not
> > supposed to be ABIs in case they need to be consumed by
> > user-space tools.
> >
> > This commit exports a VM's TSC offset to user-space via
> > debugfs. A new file called "tsc-offset" is created in
> > the VM's debugfs directory. For example:
> >
> > /sys/kernel/debug/kvm/51696-10/tsc-offset
> >
> > This file contains one TSC offset per line, for each
> > vCPU. For example:
> >
> > vcpu0: 18446742405270834952
> > vcpu1: 18446742405270834952
> > vcpu2: 18446742405270834952
> > vcpu3: 18446742405270834952
> >
> > There are some important observations about this
> > solution:
> >
> > - While all vCPUs TSC offsets should be equal for the
> > cases we care about (ie. stable TSC and no write to
> > the TSC MSR), I chose to follow the spec and export
> > each vCPU's TSC offset (might also be helpful for
> > debugging)
> >
> > - The TSC offset is only useful after the VM has booted
> >
> > - We'll probably need to export the TSC multiplier too.
> > However, I've been using only the TSC offset for now.
> > So, let's get this merged first and do the TSC multiplier
> > as a second step
>
> Can TSC offset changes occur at runtime?
>
> One example is vcpu hotplug where the tracing tool would need to fetch
> the new vcpu's TSC offset after tracing has already started.
>
> Another example is if QEMU or the guest change the TSC offset while
> running. If the tracing tool doesn't notice this then trace events will have
> incorrect timestamps.
>
> Stefan
Yes they can, and the interface should mention that "the user is
responsible for handling races of execution" (IMO).
So the workflow is:
1) User boots VM and knows the state of the VM.
2) User runs trace-cmd on the host.
Is there a need to automate gathering of traces? (that is to know the
state of reboots and so forth). I don't see one. In that case, the above
workflow is functional.
Can you add such comments to the interface Luiz (that the value
read is potentially stale).
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/4] kvm: x86: export TSC offset to user-space
2016-09-02 14:15 ` Steven Rostedt
@ 2016-09-03 0:23 ` Marcelo Tosatti
2016-09-03 4:04 ` Masami Hiramatsu
0 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2016-09-03 0:23 UTC (permalink / raw)
To: Steven Rostedt
Cc: Stefan Hajnoczi, Luiz Capitulino, kvm, linux-kernel, pbonzini,
rkrcmar, mhiramat
On Fri, Sep 02, 2016 at 10:15:41AM -0400, Steven Rostedt wrote:
> On Fri, 2 Sep 2016 09:43:01 -0400
> Stefan Hajnoczi <stefanha@gmail.com> wrote:
>
> > Can TSC offset changes occur at runtime?
Yes, but Linux guests don't write to the TSC offset
after booting and unless user does manual TSC writes.
> > One example is vcpu hotplug where the tracing tool would need to fetch
> > the new vcpu's TSC offset after tracing has already started.
> >
> > Another example is if QEMU or the guest change the TSC offset while
> > running. If the tracing tool doesn't notice this then trace events will have
> > incorrect timestamps.
So what happens is this:
HostTSC (a variable).
GuestTSC (variable) = GuestTSCOffset (fixed) + HostTSC (variable)
Then the algorithm processes the trace as follows:
line = for each line(guest_trace)
line = line - GuestTSCOffset (only the timestamp of course)
So from the moment the guest writes a new TSC offset, the host
should use the new TSC offset to subtract from the trace entries.
The trace entries are in fact:
HostTSC + GuestTSCOffset
So the guest trace should contain entries for "USE NEW TSC OFFSET,
VALUE: xxx", which can be done (hum not sure if guest entries
or host entries).
However, correct me if i am wrong, the usecase seems to be:
1) Boot guest.
2) run trace-cmd
3) run workload
4) read traces on host.
Another option is to have notifications as follows: record on a buffer
the following:
[ EVENT: TSC offset write, VAL: [ host tsc value, guest tsc offset ] ]
[ EVENT: TSC offset write, VAL: [ host tsc value, guest tsc offset ] ]
Then when merging the trace entries, you do:
line = for each line(host trace)
write_to_merged_trace(line)
if (contains_tsc_offset_event(line)) {
GuestTSCOffset = line.GuestTSCOffset
if (!guest_tsc_offset_initialized) {
process_all_guest_lines(
line = line - GuestTSCOffset (only the timestamp of course)
}
}
Aha, fail: the traces on the host are not sufficient to know when
to use which offset to subtract on the guest trace.
So the only possibility is to have the guest inform the occurrence
of the events: however the guest does not have access to the TSC offset.
So the host needs to inform the new tsc offset value and the guest needs
to inform when the event occurred on its side. So the algorithm can use
information on both traces to know which value to subtract on the
algorithm above.
Is this necessary? Or people do:
1) Boot guest.
2) run trace-cmd
3) run workload
4) read traces on host.
> I believe there are tracepoints for these events. They would obviously
> need to be enabled for the tracer to catch them.
>
> I would also recommend that they go into their own instance to make
> sure other events do not overwrite them.
>
> -- Steve
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/4] kvm: x86: export TSC offset to user-space
2016-09-02 23:49 ` Marcelo Tosatti
@ 2016-09-03 1:29 ` Luiz Capitulino
0 siblings, 0 replies; 20+ messages in thread
From: Luiz Capitulino @ 2016-09-03 1:29 UTC (permalink / raw)
To: Marcelo Tosatti
Cc: Stefan Hajnoczi, kvm, linux-kernel, pbonzini, rkrcmar, rostedt, mhiramat
On Fri, 2 Sep 2016 20:49:37 -0300
Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Fri, Sep 02, 2016 at 09:43:01AM -0400, Stefan Hajnoczi wrote:
> > On Wed, Aug 31, 2016 at 01:05:45PM -0400, Luiz Capitulino wrote:
> > > We need to retrieve a VM's TSC offset in order to use
> > > the host's TSC to merge host and guest traces. This is
> > > explained in detail in this thread:
> > >
> > > [Qemu-devel] [RFC] host and guest kernel trace merging
> > > https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg00887.html
> > >
> > > Today, the only way to retrieve a VM's TSC offset is
> > > by using the kvm_write_tsc_offset tracepoint. This has
> > > a few problems. First, the tracepoint is only emitted
> > > when the VM boots, which requires a reboot to get it if
> > > the VM is already running. Second, tracepoints are not
> > > supposed to be ABIs in case they need to be consumed by
> > > user-space tools.
> > >
> > > This commit exports a VM's TSC offset to user-space via
> > > debugfs. A new file called "tsc-offset" is created in
> > > the VM's debugfs directory. For example:
> > >
> > > /sys/kernel/debug/kvm/51696-10/tsc-offset
> > >
> > > This file contains one TSC offset per line, for each
> > > vCPU. For example:
> > >
> > > vcpu0: 18446742405270834952
> > > vcpu1: 18446742405270834952
> > > vcpu2: 18446742405270834952
> > > vcpu3: 18446742405270834952
> > >
> > > There are some important observations about this
> > > solution:
> > >
> > > - While all vCPUs TSC offsets should be equal for the
> > > cases we care about (ie. stable TSC and no write to
> > > the TSC MSR), I chose to follow the spec and export
> > > each vCPU's TSC offset (might also be helpful for
> > > debugging)
> > >
> > > - The TSC offset is only useful after the VM has booted
> > >
> > > - We'll probably need to export the TSC multiplier too.
> > > However, I've been using only the TSC offset for now.
> > > So, let's get this merged first and do the TSC multiplier
> > > as a second step
> >
> > Can TSC offset changes occur at runtime?
> >
> > One example is vcpu hotplug where the tracing tool would need to fetch
> > the new vcpu's TSC offset after tracing has already started.
> >
> > Another example is if QEMU or the guest change the TSC offset while
> > running. If the tracing tool doesn't notice this then trace events will have
> > incorrect timestamps.
> >
> > Stefan
>
> Yes they can, and the interface should mention that "the user is
> responsible for handling races of execution" (IMO).
>
> So the workflow is:
>
> 1) User boots VM and knows the state of the VM.
> 2) User runs trace-cmd on the host.
>
> Is there a need to automate gathering of traces? (that is to know the
> state of reboots and so forth). I don't see one. In that case, the above
> workflow is functional.
>
> Can you add such comments to the interface Luiz (that the value
> read is potentially stale).
Sure, no problem.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/4] kvm: add stub for arch specific debugfs support
2016-08-31 17:05 ` [PATCH 3/4] kvm: add stub for arch specific debugfs support Luiz Capitulino
2016-09-02 13:53 ` Paolo Bonzini
@ 2016-09-03 3:34 ` Masami Hiramatsu
1 sibling, 0 replies; 20+ messages in thread
From: Masami Hiramatsu @ 2016-09-03 3:34 UTC (permalink / raw)
To: Luiz Capitulino
Cc: kvm, linux-kernel, pbonzini, rkrcmar, rostedt, mhiramat, mtosatti
On Wed, 31 Aug 2016 13:05:44 -0400
Luiz Capitulino <lcapitulino@redhat.com> wrote:
> kvm_arch_create_vm_debugfs() allows arch specific code to
> create entries in the VM's directory in debugfs. x86 will
> implement support for this in the next commit.
I think we can use __weak symbol for other arch.
Thank you,
>
> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> ---
> arch/arm/kvm/arm.c | 5 +++++
> arch/mips/kvm/mips.c | 5 +++++
> arch/powerpc/kvm/powerpc.c | 5 +++++
> arch/s390/kvm/kvm-s390.c | 5 +++++
> arch/x86/kvm/x86.c | 5 +++++
> include/linux/kvm_host.h | 2 ++
> virt/kvm/kvm_main.c | 4 ++++
> 7 files changed, 31 insertions(+)
>
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 75f130e..491211f 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -144,6 +144,11 @@ out_fail_alloc:
> return ret;
> }
>
> +int kvm_arch_create_vm_debugfs(struct kvm *kvm)
> +{
> + return 0;
> +}
> +
> int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
> {
> return VM_FAULT_SIGBUS;
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index a6ea084..a854b8b 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -140,6 +140,11 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> return 0;
> }
>
> +int kvm_arch_create_vm_debugfs(struct kvm *kvm)
> +{
> + return 0;
> +}
> +
> void kvm_mips_free_vcpus(struct kvm *kvm)
> {
> unsigned int i;
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 6ce40dd..d8867e5 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -436,6 +436,11 @@ err_out:
> return -EINVAL;
> }
>
> +int kvm_arch_create_vm_debugfs(struct kvm *kvm)
> +{
> + return 0;
> +}
> +
> void kvm_arch_destroy_vm(struct kvm *kvm)
> {
> unsigned int i;
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index f142215..406324c 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -1490,6 +1490,11 @@ out_err:
> return rc;
> }
>
> +int kvm_arch_create_vm_debugfs(struct kvm *kvm)
> +{
> + return 0;
> +}
> +
> void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> {
> VCPU_EVENT(vcpu, 3, "%s", "free cpu");
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 19f9f9e..18dfbac 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7779,6 +7779,11 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> return 0;
> }
>
> +int kvm_arch_create_vm_debugfs(struct kvm *kvm)
> +{
> + return 0;
> +}
> +
> static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
> {
> int r;
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 9c28b4d..d3810bf 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -835,6 +835,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type);
> void kvm_arch_destroy_vm(struct kvm *kvm);
> void kvm_arch_sync_events(struct kvm *kvm);
>
> +int kvm_arch_create_vm_debugfs(struct kvm *kvm);
> +
> int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
> void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 9293285..2db53a8 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -601,6 +601,10 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
> stat_fops_per_vm[p->kind]))
> goto out_err;
> }
> +
> + if (kvm_arch_create_vm_debugfs(kvm) < 0)
> + goto out_err;
> +
> return 0;
>
> out_err:
> --
> 2.5.5
>
--
Masami Hiramatsu <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/4] kvm: x86: export TSC offset to user-space
2016-09-03 0:23 ` Marcelo Tosatti
@ 2016-09-03 4:04 ` Masami Hiramatsu
0 siblings, 0 replies; 20+ messages in thread
From: Masami Hiramatsu @ 2016-09-03 4:04 UTC (permalink / raw)
To: Marcelo Tosatti
Cc: Steven Rostedt, Stefan Hajnoczi, Luiz Capitulino, kvm,
linux-kernel, pbonzini, rkrcmar, mhiramat
On Fri, 2 Sep 2016 21:23:11 -0300
Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Fri, Sep 02, 2016 at 10:15:41AM -0400, Steven Rostedt wrote:
> > On Fri, 2 Sep 2016 09:43:01 -0400
> > Stefan Hajnoczi <stefanha@gmail.com> wrote:
> >
> > > Can TSC offset changes occur at runtime?
>
> Yes, but Linux guests don't write to the TSC offset
> after booting and unless user does manual TSC writes.
>
> > > One example is vcpu hotplug where the tracing tool would need to fetch
> > > the new vcpu's TSC offset after tracing has already started.
> > >
> > > Another example is if QEMU or the guest change the TSC offset while
> > > running. If the tracing tool doesn't notice this then trace events will have
> > > incorrect timestamps.
>
> So what happens is this:
>
> HostTSC (a variable).
> GuestTSC (variable) = GuestTSCOffset (fixed) + HostTSC (variable)
The same idea has been done by Yoshihiro
http://events.linuxfoundation.org/sites/events/files/cojp13_yunomae.pdf
> Then the algorithm processes the trace as follows:
> line = for each line(guest_trace)
> line = line - GuestTSCOffset (only the timestamp of course)
>
> So from the moment the guest writes a new TSC offset, the host
> should use the new TSC offset to subtract from the trace entries.
> The trace entries are in fact:
>
> HostTSC + GuestTSCOffset
>
> So the guest trace should contain entries for "USE NEW TSC OFFSET,
> VALUE: xxx", which can be done (hum not sure if guest entries
> or host entries).
>
> However, correct me if i am wrong, the usecase seems to be:
>
> 1) Boot guest.
> 2) run trace-cmd
> 3) run workload
> 4) read traces on host.
IIRC, previous (current?) method is to run trace-cmd at first (before
boot the guest) so that it can get tsc-offset event and
can wait on a special unix domain socket.
For above usecase, we have to have an interface to get the current
tsc offset like Luis suggested.
>
> Another option is to have notifications as follows: record on a buffer
> the following:
>
> [ EVENT: TSC offset write, VAL: [ host tsc value, guest tsc offset ] ]
> [ EVENT: TSC offset write, VAL: [ host tsc value, guest tsc offset ] ]
>
> Then when merging the trace entries, you do:
>
> line = for each line(host trace)
> write_to_merged_trace(line)
> if (contains_tsc_offset_event(line)) {
> GuestTSCOffset = line.GuestTSCOffset
> if (!guest_tsc_offset_initialized) {
> process_all_guest_lines(
> line = line - GuestTSCOffset (only the timestamp of course)
> }
> }
>
> Aha, fail: the traces on the host are not sufficient to know when
> to use which offset to subtract on the guest trace.
>
> So the only possibility is to have the guest inform the occurrence
> of the events: however the guest does not have access to the TSC offset.
>
> So the host needs to inform the new tsc offset value and the guest needs
> to inform when the event occurred on its side. So the algorithm can use
> information on both traces to know which value to subtract on the
> algorithm above.
>
> Is this necessary? Or people do:
> 1) Boot guest.
> 2) run trace-cmd
> 3) run workload
> 4) read traces on host.
>
> > I believe there are tracepoints for these events. They would obviously
> > need to be enabled for the tracer to catch them.
Yes, Yoshihiro introduced a tracepoint for that.
http://lkml.iu.edu/hypermail/linux/kernel/1306.1/01741.html
So, we have to trace this event in host side.
> >
> > I would also recommend that they go into their own instance to make
> > sure other events do not overwrite them.
> >
> > -- Steve
Thanks,
--
Masami Hiramatsu <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/4] kvm: x86: export TSC offset to user-space
2016-09-02 17:31 ` Luiz Capitulino
@ 2016-09-05 8:10 ` Paolo Bonzini
0 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-09-05 8:10 UTC (permalink / raw)
To: Luiz Capitulino; +Cc: kvm, linux-kernel, rkrcmar, rostedt, mhiramat, mtosatti
On 02/09/2016 19:31, Luiz Capitulino wrote:
> On Fri, 2 Sep 2016 19:00:41 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>> On 31/08/2016 19:05, Luiz Capitulino wrote:
>>> vcpu0: 18446742405270834952
>>> vcpu1: 18446742405270834952
>>> vcpu2: 18446742405270834952
>>> vcpu3: 18446742405270834952
>>>
>>> - We'll probably need to export the TSC multiplier too.
>>> However, I've been using only the TSC offset for now.
>>> So, let's get this merged first and do the TSC multiplier
>>> as a second step
>>
>> You'll need to export the number of fractional bits in the multiplier,
>> too. It's going to be a very simple patch, so please do everything now.
>
> I didn't want to expose the multiplier before testing our tracing
> procedure with it.
Exposing the multiplier should be independent of tracing. I can test it
for you.
Paolo
> So far we've been only using the TSC offset (and
> it works great). I don't even know if I have a machine around to
> test it, so it could take a bit.
>
>> arch/x86/kvm/x86.c is huge; please create a new file arch/x86/kvm/debugfs.c.
>
> Will do.
>
>>> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
>>> ---
>>> arch/x86/include/asm/kvm_host.h | 1 +
>>> arch/x86/kvm/svm.c | 1 +
>>> arch/x86/kvm/vmx.c | 8 ++++++++
>>> arch/x86/kvm/x86.c | 30 ++++++++++++++++++++++++++++++
>>> 4 files changed, 40 insertions(+)
>>>
>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>> index 33ae3a4..5714bbd 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>> @@ -952,6 +952,7 @@ struct kvm_x86_ops {
>>> bool (*has_wbinvd_exit)(void);
>>>
>>> u64 (*read_tsc_offset)(struct kvm_vcpu *vcpu);
>>> + u64 (*read_cached_tsc_offset)(struct kvm_vcpu *vcpu);
>>> void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
>>>
>>> u64 (*read_l1_tsc)(struct kvm_vcpu *vcpu, u64 host_tsc);
>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>> index af523d8..c851477 100644
>>> --- a/arch/x86/kvm/svm.c
>>> +++ b/arch/x86/kvm/svm.c
>>> @@ -5065,6 +5065,7 @@ static struct kvm_x86_ops svm_x86_ops = {
>>> .has_wbinvd_exit = svm_has_wbinvd_exit,
>>>
>>> .read_tsc_offset = svm_read_tsc_offset,
>>> + .read_cached_tsc_offset = svm_read_tsc_offset,
>>> .write_tsc_offset = svm_write_tsc_offset,
>>> .adjust_tsc_offset_guest = svm_adjust_tsc_offset_guest,
>>> .read_l1_tsc = svm_read_l1_tsc,
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index 5cede40..82dfe42 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -616,6 +616,7 @@ struct vcpu_vmx {
>>> u64 hv_deadline_tsc;
>>>
>>> u64 current_tsc_ratio;
>>> + u64 cached_tsc_offset;
>>>
>>> bool guest_pkru_valid;
>>> u32 guest_pkru;
>>> @@ -2608,6 +2609,11 @@ static u64 vmx_read_tsc_offset(struct kvm_vcpu *vcpu)
>>> return vmcs_read64(TSC_OFFSET);
>>> }
>>>
>>> +static u64 vmx_read_cached_tsc_offset(struct kvm_vcpu *vcpu)
>>> +{
>>> + return to_vmx(vcpu)->cached_tsc_offset;
>>> +}
>>> +
>>> /*
>>> * writes 'offset' into guest's timestamp counter offset register
>>> */
>>> @@ -2632,6 +2638,7 @@ static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
>>> vmcs_read64(TSC_OFFSET), offset);
>>> vmcs_write64(TSC_OFFSET, offset);
>>> }
>>> + to_vmx(vcpu)->cached_tsc_offset = offset;
>>> }
>>>
>>> static void vmx_adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, s64 adjustment)
>>> @@ -11275,6 +11282,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
>>> .has_wbinvd_exit = cpu_has_vmx_wbinvd_exit,
>>>
>>> .read_tsc_offset = vmx_read_tsc_offset,
>>> + .read_cached_tsc_offset = vmx_read_cached_tsc_offset,
>>> .write_tsc_offset = vmx_write_tsc_offset,
>>> .adjust_tsc_offset_guest = vmx_adjust_tsc_offset_guest,
>>> .read_l1_tsc = vmx_read_l1_tsc,
>>
>> You need to handle SVM as well. So you might as well simplify the code:
>
> SVM is handled:
>
> + .read_cached_tsc_offset = svm_read_tsc_offset,
>
>> - add a kvm_vcpu_write_tsc_offset wrapper for kvm_x86_ops->write_tsc_offset
>>
>> - add a tsc_offset field in struct kvm_vcpu_arch
>>
>> - replace kvm_x86_ops->read_tsc_offset with accesses to the new field
>
> Given that SVM is handled, you still want me to do this?
>
>> Then in a fifth patch export the TSC offset (and multiplier ;)) to
>> userspace.
>>
>> I'm not very happy about having a single file for all TSC offsets.
>> Creating subdirectories under the PID-FD per-VM directory would be nicer
>> in the long run.
>
> I think Steven would also prefer that, but some people raised the
> concern at KVM Forum that creating per vcpu dirs in debugfs may
> consume considerable memory for a system running several dozen
> if not hundreds of VMs. This concern seems valid to me, but I
> can do either way.
>
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2016-09-05 8:10 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-31 17:05 [PATCH 0/4] kvm: export TSC offset to user-space Luiz Capitulino
2016-08-31 17:05 ` [PATCH 1/4] kvm: kvm_destroy_vm_debugfs(): check debugs_stat_data pointer Luiz Capitulino
2016-09-02 13:51 ` Paolo Bonzini
2016-08-31 17:05 ` [PATCH 2/4] kvm: kvm_create_vm_debugfs(): cleanup on error Luiz Capitulino
2016-09-02 13:53 ` Paolo Bonzini
2016-08-31 17:05 ` [PATCH 3/4] kvm: add stub for arch specific debugfs support Luiz Capitulino
2016-09-02 13:53 ` Paolo Bonzini
2016-09-03 3:34 ` Masami Hiramatsu
2016-08-31 17:05 ` [PATCH 4/4] kvm: x86: export TSC offset to user-space Luiz Capitulino
2016-09-02 13:43 ` Stefan Hajnoczi
2016-09-02 14:15 ` Steven Rostedt
2016-09-03 0:23 ` Marcelo Tosatti
2016-09-03 4:04 ` Masami Hiramatsu
2016-09-02 16:26 ` Luiz Capitulino
2016-09-02 16:29 ` Luiz Capitulino
2016-09-02 23:49 ` Marcelo Tosatti
2016-09-03 1:29 ` Luiz Capitulino
2016-09-02 17:00 ` Paolo Bonzini
2016-09-02 17:31 ` Luiz Capitulino
2016-09-05 8:10 ` Paolo Bonzini
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.