All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] kvm: export TSC offset to user-space
@ 2016-08-31 17:05 Luiz Capitulino
  2016-08-31 17:05 ` [PATCH 1/4] kvm: kvm_destroy_vm_debugfs(): check debugs_stat_data pointer Luiz Capitulino
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Luiz Capitulino @ 2016-08-31 17:05 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, pbonzini, rkrcmar, rostedt, mhiramat, mtosatti

Hi,

We need to retrieve a VM's TSC offset in order to use
the host's TSC to merge host and guest traces. This is
explained in detail in this thread:

  [Qemu-devel] [RFC] host and guest kernel trace merging
  https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg00887.html

Today, the only way to retrieve a VM's TSC offset is
by using the kvm_write_tsc_offset tracepoint. This has
a few problems. First, the tracepoint is only emitted
when the VM boots, which requires a reboot to get it if
the VM is already running. Second, tracepoints are not
supposed to be ABIs in case they need to be consumed by
user-space tools.

This series exports a VM's TSC offset to user-space via
debugfs. A new file called "tsc-offset" is created in
the VM's debugfs directory. For example:

  /sys/kernel/debug/kvm/51696-10/tsc-offset

This file contains one TSC offset per line, for each
vCPU. For example:

  vcpu0: 18446742405270834952
  vcpu1: 18446742405270834952
  vcpu2: 18446742405270834952
  vcpu3: 18446742405270834952

Please, see patch 4/4 for additional details.

Luiz Capitulino (4):
  kvm: kvm_destroy_vm_debugfs(): check debugs_stat_data pointer
  kvm: kvm_create_vm_debugfs(): cleanup on error
  kvm: add stub for arch specific debugfs support
  kvm: x86: export TSC offset to user-space

 arch/arm/kvm/arm.c              |  5 +++++
 arch/mips/kvm/mips.c            |  5 +++++
 arch/powerpc/kvm/powerpc.c      |  5 +++++
 arch/s390/kvm/kvm-s390.c        |  5 +++++
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c              |  1 +
 arch/x86/kvm/vmx.c              |  8 ++++++++
 arch/x86/kvm/x86.c              | 35 +++++++++++++++++++++++++++++++++++
 include/linux/kvm_host.h        |  2 ++
 virt/kvm/kvm_main.c             | 22 ++++++++++++++++------
 10 files changed, 83 insertions(+), 6 deletions(-)

-- 
2.5.5

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/4] kvm: kvm_destroy_vm_debugfs(): check debugs_stat_data pointer
  2016-08-31 17:05 [PATCH 0/4] kvm: export TSC offset to user-space Luiz Capitulino
@ 2016-08-31 17:05 ` Luiz Capitulino
  2016-09-02 13:51   ` Paolo Bonzini
  2016-08-31 17:05 ` [PATCH 2/4] kvm: kvm_create_vm_debugfs(): cleanup on error Luiz Capitulino
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 20+ messages in thread
From: Luiz Capitulino @ 2016-08-31 17:05 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, pbonzini, rkrcmar, rostedt, mhiramat, mtosatti

This make it possible to call kvm_destroy_vm_debugfs() from
kvm_create_vm_debugfs() in error conditions.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
---
 virt/kvm/kvm_main.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1950782..c1dc45e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -559,9 +559,11 @@ static void kvm_destroy_vm_debugfs(struct kvm *kvm)
 
 	debugfs_remove_recursive(kvm->debugfs_dentry);
 
-	for (i = 0; i < kvm_debugfs_num_entries; i++)
-		kfree(kvm->debugfs_stat_data[i]);
-	kfree(kvm->debugfs_stat_data);
+	if (kvm->debugfs_stat_data) {
+		for (i = 0; i < kvm_debugfs_num_entries; i++)
+			kfree(kvm->debugfs_stat_data[i]);
+		kfree(kvm->debugfs_stat_data);
+	}
 }
 
 static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/4] kvm: kvm_create_vm_debugfs(): cleanup on error
  2016-08-31 17:05 [PATCH 0/4] kvm: export TSC offset to user-space Luiz Capitulino
  2016-08-31 17:05 ` [PATCH 1/4] kvm: kvm_destroy_vm_debugfs(): check debugs_stat_data pointer Luiz Capitulino
@ 2016-08-31 17:05 ` Luiz Capitulino
  2016-09-02 13:53   ` Paolo Bonzini
  2016-08-31 17:05 ` [PATCH 3/4] kvm: add stub for arch specific debugfs support Luiz Capitulino
  2016-08-31 17:05 ` [PATCH 4/4] kvm: x86: export TSC offset to user-space Luiz Capitulino
  3 siblings, 1 reply; 20+ messages in thread
From: Luiz Capitulino @ 2016-08-31 17:05 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, pbonzini, rkrcmar, rostedt, mhiramat, mtosatti

Memory and debugfs entries are leaked on error. Fix it.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
---
 virt/kvm/kvm_main.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c1dc45e..9293285 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -585,12 +585,12 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
 					 sizeof(*kvm->debugfs_stat_data),
 					 GFP_KERNEL);
 	if (!kvm->debugfs_stat_data)
-		return -ENOMEM;
+		goto out_err;
 
 	for (p = debugfs_entries; p->name; p++) {
 		stat_data = kzalloc(sizeof(*stat_data), GFP_KERNEL);
 		if (!stat_data)
-			return -ENOMEM;
+			goto out_err;
 
 		stat_data->kvm = kvm;
 		stat_data->offset = p->offset;
@@ -599,9 +599,13 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
 					 kvm->debugfs_dentry,
 					 stat_data,
 					 stat_fops_per_vm[p->kind]))
-			return -ENOMEM;
+			goto out_err;
 	}
 	return 0;
+
+out_err:
+	kvm_destroy_vm_debugfs(kvm);
+	return -ENOMEM;
 }
 
 static struct kvm *kvm_create_vm(unsigned long type)
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/4] kvm: add stub for arch specific debugfs support
  2016-08-31 17:05 [PATCH 0/4] kvm: export TSC offset to user-space Luiz Capitulino
  2016-08-31 17:05 ` [PATCH 1/4] kvm: kvm_destroy_vm_debugfs(): check debugs_stat_data pointer Luiz Capitulino
  2016-08-31 17:05 ` [PATCH 2/4] kvm: kvm_create_vm_debugfs(): cleanup on error Luiz Capitulino
@ 2016-08-31 17:05 ` Luiz Capitulino
  2016-09-02 13:53   ` Paolo Bonzini
  2016-09-03  3:34   ` Masami Hiramatsu
  2016-08-31 17:05 ` [PATCH 4/4] kvm: x86: export TSC offset to user-space Luiz Capitulino
  3 siblings, 2 replies; 20+ messages in thread
From: Luiz Capitulino @ 2016-08-31 17:05 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, pbonzini, rkrcmar, rostedt, mhiramat, mtosatti

kvm_arch_create_vm_debugfs() allows arch specific code to
create entries in the VM's directory in debugfs. x86 will
implement support for this in the next commit.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
---
 arch/arm/kvm/arm.c         | 5 +++++
 arch/mips/kvm/mips.c       | 5 +++++
 arch/powerpc/kvm/powerpc.c | 5 +++++
 arch/s390/kvm/kvm-s390.c   | 5 +++++
 arch/x86/kvm/x86.c         | 5 +++++
 include/linux/kvm_host.h   | 2 ++
 virt/kvm/kvm_main.c        | 4 ++++
 7 files changed, 31 insertions(+)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 75f130e..491211f 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -144,6 +144,11 @@ out_fail_alloc:
 	return ret;
 }
 
+int kvm_arch_create_vm_debugfs(struct kvm *kvm)
+{
+	return 0;
+}
+
 int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
 {
 	return VM_FAULT_SIGBUS;
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index a6ea084..a854b8b 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -140,6 +140,11 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	return 0;
 }
 
+int kvm_arch_create_vm_debugfs(struct kvm *kvm)
+{
+	return 0;
+}
+
 void kvm_mips_free_vcpus(struct kvm *kvm)
 {
 	unsigned int i;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 6ce40dd..d8867e5 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -436,6 +436,11 @@ err_out:
 	return -EINVAL;
 }
 
+int kvm_arch_create_vm_debugfs(struct kvm *kvm)
+{
+	return 0;
+}
+
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
 	unsigned int i;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index f142215..406324c 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1490,6 +1490,11 @@ out_err:
 	return rc;
 }
 
+int kvm_arch_create_vm_debugfs(struct kvm *kvm)
+{
+	return 0;
+}
+
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
 	VCPU_EVENT(vcpu, 3, "%s", "free cpu");
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 19f9f9e..18dfbac 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7779,6 +7779,11 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	return 0;
 }
 
+int kvm_arch_create_vm_debugfs(struct kvm *kvm)
+{
+	return 0;
+}
+
 static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
 {
 	int r;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9c28b4d..d3810bf 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -835,6 +835,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type);
 void kvm_arch_destroy_vm(struct kvm *kvm);
 void kvm_arch_sync_events(struct kvm *kvm);
 
+int kvm_arch_create_vm_debugfs(struct kvm *kvm);
+
 int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9293285..2db53a8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -601,6 +601,10 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
 					 stat_fops_per_vm[p->kind]))
 			goto out_err;
 	}
+
+	if (kvm_arch_create_vm_debugfs(kvm) < 0)
+		goto out_err;
+
 	return 0;
 
 out_err:
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/4] kvm: x86: export TSC offset to user-space
  2016-08-31 17:05 [PATCH 0/4] kvm: export TSC offset to user-space Luiz Capitulino
                   ` (2 preceding siblings ...)
  2016-08-31 17:05 ` [PATCH 3/4] kvm: add stub for arch specific debugfs support Luiz Capitulino
@ 2016-08-31 17:05 ` Luiz Capitulino
  2016-09-02 13:43   ` Stefan Hajnoczi
  2016-09-02 17:00   ` Paolo Bonzini
  3 siblings, 2 replies; 20+ messages in thread
From: Luiz Capitulino @ 2016-08-31 17:05 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, pbonzini, rkrcmar, rostedt, mhiramat, mtosatti

We need to retrieve a VM's TSC offset in order to use
the host's TSC to merge host and guest traces. This is
explained in detail in this thread:

  [Qemu-devel] [RFC] host and guest kernel trace merging
  https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg00887.html

Today, the only way to retrieve a VM's TSC offset is
by using the kvm_write_tsc_offset tracepoint. This has
a few problems. First, the tracepoint is only emitted
when the VM boots, which requires a reboot to get it if
the VM is already running. Second, tracepoints are not
supposed to be ABIs in case they need to be consumed by
user-space tools.

This commit exports a VM's TSC offset to user-space via
debugfs. A new file called "tsc-offset" is created in
the VM's debugfs directory. For example:

  /sys/kernel/debug/kvm/51696-10/tsc-offset

This file contains one TSC offset per line, for each
vCPU. For example:

  vcpu0: 18446742405270834952
  vcpu1: 18446742405270834952
  vcpu2: 18446742405270834952
  vcpu3: 18446742405270834952

There are some important observations about this
solution:

 - While all vCPUs TSC offsets should be equal for the
   cases we care about (ie. stable TSC and no write to
   the TSC MSR), I chose to follow the spec and export
   each vCPU's TSC offset (might also be helpful for
   debugging)

 - The TSC offset is only useful after the VM has booted

 - We'll probably need to export the TSC multiplier too.
   However, I've been using only the TSC offset for now.
   So, let's get this merged first and do the TSC multiplier
   as a second step

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c              |  1 +
 arch/x86/kvm/vmx.c              |  8 ++++++++
 arch/x86/kvm/x86.c              | 30 ++++++++++++++++++++++++++++++
 4 files changed, 40 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 33ae3a4..5714bbd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -952,6 +952,7 @@ struct kvm_x86_ops {
 	bool (*has_wbinvd_exit)(void);
 
 	u64 (*read_tsc_offset)(struct kvm_vcpu *vcpu);
+	u64 (*read_cached_tsc_offset)(struct kvm_vcpu *vcpu);
 	void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
 
 	u64 (*read_l1_tsc)(struct kvm_vcpu *vcpu, u64 host_tsc);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index af523d8..c851477 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5065,6 +5065,7 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.has_wbinvd_exit = svm_has_wbinvd_exit,
 
 	.read_tsc_offset = svm_read_tsc_offset,
+	.read_cached_tsc_offset = svm_read_tsc_offset,
 	.write_tsc_offset = svm_write_tsc_offset,
 	.adjust_tsc_offset_guest = svm_adjust_tsc_offset_guest,
 	.read_l1_tsc = svm_read_l1_tsc,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5cede40..82dfe42 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -616,6 +616,7 @@ struct vcpu_vmx {
 	u64 hv_deadline_tsc;
 
 	u64 current_tsc_ratio;
+	u64 cached_tsc_offset;
 
 	bool guest_pkru_valid;
 	u32 guest_pkru;
@@ -2608,6 +2609,11 @@ static u64 vmx_read_tsc_offset(struct kvm_vcpu *vcpu)
 	return vmcs_read64(TSC_OFFSET);
 }
 
+static u64 vmx_read_cached_tsc_offset(struct kvm_vcpu *vcpu)
+{
+	return to_vmx(vcpu)->cached_tsc_offset;
+}
+
 /*
  * writes 'offset' into guest's timestamp counter offset register
  */
@@ -2632,6 +2638,7 @@ static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 					   vmcs_read64(TSC_OFFSET), offset);
 		vmcs_write64(TSC_OFFSET, offset);
 	}
+	to_vmx(vcpu)->cached_tsc_offset = offset;
 }
 
 static void vmx_adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, s64 adjustment)
@@ -11275,6 +11282,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.has_wbinvd_exit = cpu_has_vmx_wbinvd_exit,
 
 	.read_tsc_offset = vmx_read_tsc_offset,
+	.read_cached_tsc_offset = vmx_read_cached_tsc_offset,
 	.write_tsc_offset = vmx_write_tsc_offset,
 	.adjust_tsc_offset_guest = vmx_adjust_tsc_offset_guest,
 	.read_l1_tsc = vmx_read_l1_tsc,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 18dfbac..75a8e23 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -54,6 +54,7 @@
 #include <linux/pvclock_gtod.h>
 #include <linux/kvm_irqfd.h>
 #include <linux/irqbypass.h>
+#include <linux/debugfs.h>
 #include <trace/events/kvm.h>
 
 #include <asm/debugreg.h>
@@ -7779,8 +7780,37 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	return 0;
 }
 
+static int tsc_offset_show(struct seq_file *m, void *data)
+{
+	struct kvm *kvm = m->private;
+	struct kvm_vcpu *vcpu;
+	int i;
+
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		seq_printf(m, "vcpu%d: %llu\n",
+				vcpu->vcpu_id, kvm_x86_ops->read_cached_tsc_offset(vcpu));
+
+	return 0;
+}
+
+static int tsc_offset_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, tsc_offset_show, inode->i_private);
+}
+
+static const struct file_operations tsc_offset_fops = {
+	.owner		=	THIS_MODULE,
+	.open		=	tsc_offset_open,
+	.read		=	seq_read,
+	.llseek		=	seq_lseek,
+	.release	=	single_release,
+};
+
 int kvm_arch_create_vm_debugfs(struct kvm *kvm)
 {
+	if (!debugfs_create_file("tsc-offset", 0444,
+				kvm->debugfs_dentry, kvm, &tsc_offset_fops))
+		return -ENOMEM;
 	return 0;
 }
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] kvm: x86: export TSC offset to user-space
  2016-08-31 17:05 ` [PATCH 4/4] kvm: x86: export TSC offset to user-space Luiz Capitulino
@ 2016-09-02 13:43   ` Stefan Hajnoczi
  2016-09-02 14:15     ` Steven Rostedt
                       ` (2 more replies)
  2016-09-02 17:00   ` Paolo Bonzini
  1 sibling, 3 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2016-09-02 13:43 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: kvm, linux-kernel, pbonzini, rkrcmar, rostedt, mhiramat, mtosatti

[-- Attachment #1: Type: text/plain, Size: 2123 bytes --]

On Wed, Aug 31, 2016 at 01:05:45PM -0400, Luiz Capitulino wrote:
> We need to retrieve a VM's TSC offset in order to use
> the host's TSC to merge host and guest traces. This is
> explained in detail in this thread:
> 
>   [Qemu-devel] [RFC] host and guest kernel trace merging
>   https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg00887.html
> 
> Today, the only way to retrieve a VM's TSC offset is
> by using the kvm_write_tsc_offset tracepoint. This has
> a few problems. First, the tracepoint is only emitted
> when the VM boots, which requires a reboot to get it if
> the VM is already running. Second, tracepoints are not
> supposed to be ABIs in case they need to be consumed by
> user-space tools.
> 
> This commit exports a VM's TSC offset to user-space via
> debugfs. A new file called "tsc-offset" is created in
> the VM's debugfs directory. For example:
> 
>   /sys/kernel/debug/kvm/51696-10/tsc-offset
> 
> This file contains one TSC offset per line, for each
> vCPU. For example:
> 
>   vcpu0: 18446742405270834952
>   vcpu1: 18446742405270834952
>   vcpu2: 18446742405270834952
>   vcpu3: 18446742405270834952
> 
> There are some important observations about this
> solution:
> 
>  - While all vCPUs TSC offsets should be equal for the
>    cases we care about (ie. stable TSC and no write to
>    the TSC MSR), I chose to follow the spec and export
>    each vCPU's TSC offset (might also be helpful for
>    debugging)
> 
>  - The TSC offset is only useful after the VM has booted
> 
>  - We'll probably need to export the TSC multiplier too.
>    However, I've been using only the TSC offset for now.
>    So, let's get this merged first and do the TSC multiplier
>    as a second step

Can TSC offset changes occur at runtime?

One example is vcpu hotplug where the tracing tool would need to fetch
the new vcpu's TSC offset after tracing has already started.

Another example is if QEMU or the guest change the TSC offset while
running.  If the tracing tool doesn't notice this then trace events will have
incorrect timestamps.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] kvm: kvm_destroy_vm_debugfs(): check debugs_stat_data pointer
  2016-08-31 17:05 ` [PATCH 1/4] kvm: kvm_destroy_vm_debugfs(): check debugs_stat_data pointer Luiz Capitulino
@ 2016-09-02 13:51   ` Paolo Bonzini
  0 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-09-02 13:51 UTC (permalink / raw)
  To: Luiz Capitulino, kvm; +Cc: linux-kernel, rkrcmar, rostedt, mhiramat, mtosatti



On 31/08/2016 19:05, Luiz Capitulino wrote:
> This make it possible to call kvm_destroy_vm_debugfs() from
> kvm_create_vm_debugfs() in error conditions.
> 
> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> ---
>  virt/kvm/kvm_main.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 1950782..c1dc45e 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -559,9 +559,11 @@ static void kvm_destroy_vm_debugfs(struct kvm *kvm)
>  
>  	debugfs_remove_recursive(kvm->debugfs_dentry);
>  
> -	for (i = 0; i < kvm_debugfs_num_entries; i++)
> -		kfree(kvm->debugfs_stat_data[i]);
> -	kfree(kvm->debugfs_stat_data);
> +	if (kvm->debugfs_stat_data) {
> +		for (i = 0; i < kvm_debugfs_num_entries; i++)
> +			kfree(kvm->debugfs_stat_data[i]);
> +		kfree(kvm->debugfs_stat_data);
> +	}
>  }
>  
>  static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/4] kvm: kvm_create_vm_debugfs(): cleanup on error
  2016-08-31 17:05 ` [PATCH 2/4] kvm: kvm_create_vm_debugfs(): cleanup on error Luiz Capitulino
@ 2016-09-02 13:53   ` Paolo Bonzini
  0 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-09-02 13:53 UTC (permalink / raw)
  To: Luiz Capitulino, kvm; +Cc: linux-kernel, rkrcmar, rostedt, mhiramat, mtosatti



On 31/08/2016 19:05, Luiz Capitulino wrote:
> Memory and debugfs entries are leaked on error. Fix it.
> 
> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> ---
>  virt/kvm/kvm_main.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index c1dc45e..9293285 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -585,12 +585,12 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
>  					 sizeof(*kvm->debugfs_stat_data),
>  					 GFP_KERNEL);
>  	if (!kvm->debugfs_stat_data)
> -		return -ENOMEM;
> +		goto out_err;
>  
>  	for (p = debugfs_entries; p->name; p++) {
>  		stat_data = kzalloc(sizeof(*stat_data), GFP_KERNEL);
>  		if (!stat_data)
> -			return -ENOMEM;
> +			goto out_err;
>  
>  		stat_data->kvm = kvm;
>  		stat_data->offset = p->offset;
> @@ -599,9 +599,13 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
>  					 kvm->debugfs_dentry,
>  					 stat_data,
>  					 stat_fops_per_vm[p->kind]))
> -			return -ENOMEM;
> +			goto out_err;
>  	}
>  	return 0;
> +
> +out_err:
> +	kvm_destroy_vm_debugfs(kvm);
> +	return -ENOMEM;
>  }
>  
>  static struct kvm *kvm_create_vm(unsigned long type)
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] kvm: add stub for arch specific debugfs support
  2016-08-31 17:05 ` [PATCH 3/4] kvm: add stub for arch specific debugfs support Luiz Capitulino
@ 2016-09-02 13:53   ` Paolo Bonzini
  2016-09-03  3:34   ` Masami Hiramatsu
  1 sibling, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-09-02 13:53 UTC (permalink / raw)
  To: Luiz Capitulino, kvm; +Cc: linux-kernel, rkrcmar, rostedt, mhiramat, mtosatti



On 31/08/2016 19:05, Luiz Capitulino wrote:
> kvm_arch_create_vm_debugfs() allows arch specific code to
> create entries in the VM's directory in debugfs. x86 will
> implement support for this in the next commit.
> 
> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> ---
>  arch/arm/kvm/arm.c         | 5 +++++
>  arch/mips/kvm/mips.c       | 5 +++++
>  arch/powerpc/kvm/powerpc.c | 5 +++++
>  arch/s390/kvm/kvm-s390.c   | 5 +++++
>  arch/x86/kvm/x86.c         | 5 +++++
>  include/linux/kvm_host.h   | 2 ++
>  virt/kvm/kvm_main.c        | 4 ++++
>  7 files changed, 31 insertions(+)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 75f130e..491211f 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -144,6 +144,11 @@ out_fail_alloc:
>  	return ret;
>  }
>  
> +int kvm_arch_create_vm_debugfs(struct kvm *kvm)
> +{
> +	return 0;
> +}
> +
>  int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
>  {
>  	return VM_FAULT_SIGBUS;
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index a6ea084..a854b8b 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -140,6 +140,11 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	return 0;
>  }
>  
> +int kvm_arch_create_vm_debugfs(struct kvm *kvm)
> +{
> +	return 0;
> +}
> +
>  void kvm_mips_free_vcpus(struct kvm *kvm)
>  {
>  	unsigned int i;
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 6ce40dd..d8867e5 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -436,6 +436,11 @@ err_out:
>  	return -EINVAL;
>  }
>  
> +int kvm_arch_create_vm_debugfs(struct kvm *kvm)
> +{
> +	return 0;
> +}
> +
>  void kvm_arch_destroy_vm(struct kvm *kvm)
>  {
>  	unsigned int i;
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index f142215..406324c 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -1490,6 +1490,11 @@ out_err:
>  	return rc;
>  }
>  
> +int kvm_arch_create_vm_debugfs(struct kvm *kvm)
> +{
> +	return 0;
> +}
> +
>  void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>  {
>  	VCPU_EVENT(vcpu, 3, "%s", "free cpu");
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 19f9f9e..18dfbac 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7779,6 +7779,11 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	return 0;
>  }
>  
> +int kvm_arch_create_vm_debugfs(struct kvm *kvm)
> +{
> +	return 0;
> +}
> +
>  static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
>  {
>  	int r;
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 9c28b4d..d3810bf 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -835,6 +835,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type);
>  void kvm_arch_destroy_vm(struct kvm *kvm);
>  void kvm_arch_sync_events(struct kvm *kvm);
>  
> +int kvm_arch_create_vm_debugfs(struct kvm *kvm);
> +
>  int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
>  void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
>  
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 9293285..2db53a8 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -601,6 +601,10 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
>  					 stat_fops_per_vm[p->kind]))
>  			goto out_err;
>  	}
> +
> +	if (kvm_arch_create_vm_debugfs(kvm) < 0)
> +		goto out_err;
> +
>  	return 0;
>  
>  out_err:
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] kvm: x86: export TSC offset to user-space
  2016-09-02 13:43   ` Stefan Hajnoczi
@ 2016-09-02 14:15     ` Steven Rostedt
  2016-09-03  0:23       ` Marcelo Tosatti
  2016-09-02 16:26     ` Luiz Capitulino
  2016-09-02 23:49     ` Marcelo Tosatti
  2 siblings, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2016-09-02 14:15 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Luiz Capitulino, kvm, linux-kernel, pbonzini, rkrcmar, mhiramat,
	mtosatti

On Fri, 2 Sep 2016 09:43:01 -0400
Stefan Hajnoczi <stefanha@gmail.com> wrote:

> Can TSC offset changes occur at runtime?
> 
> One example is vcpu hotplug where the tracing tool would need to fetch
> the new vcpu's TSC offset after tracing has already started.
> 
> Another example is if QEMU or the guest change the TSC offset while
> running.  If the tracing tool doesn't notice this then trace events will have
> incorrect timestamps.

I believe there are tracepoints for these events. They would obviously
need to be enabled for the tracer to catch them.

I would also recommend that they go into their own instance to make
sure other events do not overwrite them.

-- Steve

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] kvm: x86: export TSC offset to user-space
  2016-09-02 13:43   ` Stefan Hajnoczi
  2016-09-02 14:15     ` Steven Rostedt
@ 2016-09-02 16:26     ` Luiz Capitulino
  2016-09-02 16:29       ` Luiz Capitulino
  2016-09-02 23:49     ` Marcelo Tosatti
  2 siblings, 1 reply; 20+ messages in thread
From: Luiz Capitulino @ 2016-09-02 16:26 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kvm, linux-kernel, pbonzini, rkrcmar, rostedt, mhiramat, mtosatti

On Fri, 2 Sep 2016 09:43:01 -0400
Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Wed, Aug 31, 2016 at 01:05:45PM -0400, Luiz Capitulino wrote:
> > We need to retrieve a VM's TSC offset in order to use
> > the host's TSC to merge host and guest traces. This is
> > explained in detail in this thread:
> > 
> >   [Qemu-devel] [RFC] host and guest kernel trace merging
> >   https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg00887.html
> > 
> > Today, the only way to retrieve a VM's TSC offset is
> > by using the kvm_write_tsc_offset tracepoint. This has
> > a few problems. First, the tracepoint is only emitted
> > when the VM boots, which requires a reboot to get it if
> > the VM is already running. Second, tracepoints are not
> > supposed to be ABIs in case they need to be consumed by
> > user-space tools.
> > 
> > This commit exports a VM's TSC offset to user-space via
> > debugfs. A new file called "tsc-offset" is created in
> > the VM's debugfs directory. For example:
> > 
> >   /sys/kernel/debug/kvm/51696-10/tsc-offset
> > 
> > This file contains one TSC offset per line, for each
> > vCPU. For example:
> > 
> >   vcpu0: 18446742405270834952
> >   vcpu1: 18446742405270834952
> >   vcpu2: 18446742405270834952
> >   vcpu3: 18446742405270834952
> > 
> > There are some important observations about this
> > solution:
> > 
> >  - While all vCPUs TSC offsets should be equal for the
> >    cases we care about (ie. stable TSC and no write to
> >    the TSC MSR), I chose to follow the spec and export
> >    each vCPU's TSC offset (might also be helpful for
> >    debugging)
> > 
> >  - The TSC offset is only useful after the VM has booted
> > 
> >  - We'll probably need to export the TSC multiplier too.
> >    However, I've been using only the TSC offset for now.
> >    So, let's get this merged first and do the TSC multiplier
> >    as a second step  
> 
> Can TSC offset changes occur at runtime?

Yes. IIRC, if the system has an unstable TSC, KVM will adjust the
TSC offset when migrating the vCPU to other cores (although
tracing with unstable TSC is not supported). Also, the guest can
write to the TSC MSR at any time (although I don't know if Linux
ever does this).

> One example is vcpu hotplug where the tracing tool would need to fetch
> the new vcpu's TSC offset after tracing has already started.
> 
> Another example is if QEMU or the guest change the TSC offset while
> running.  If the tracing tool doesn't notice this then trace events will have
> incorrect timestamps.

I guess that what tools like trace-cmd want to do in those cases
is to warn the user and discard the trace. A simple way of doing
this would be to re-check that the TSC offset are the same after
tracing is done. It could also use inotify, in case it works
for debugfs (never tried it myself).

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] kvm: x86: export TSC offset to user-space
  2016-09-02 16:26     ` Luiz Capitulino
@ 2016-09-02 16:29       ` Luiz Capitulino
  0 siblings, 0 replies; 20+ messages in thread
From: Luiz Capitulino @ 2016-09-02 16:29 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kvm, linux-kernel, pbonzini, rkrcmar, rostedt, mhiramat, mtosatti

On Fri, 2 Sep 2016 12:26:55 -0400
Luiz Capitulino <lcapitulino@redhat.com> wrote:

> I guess that what tools like trace-cmd want to do in those cases
> is to warn the user and discard the trace. A simple way of doing
> this would be to re-check that the TSC offset are the same after
> tracing is done. It could also use inotify, in case it works
> for debugfs (never tried it myself).

The second idea was probably stupid, never mind me :)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] kvm: x86: export TSC offset to user-space
  2016-08-31 17:05 ` [PATCH 4/4] kvm: x86: export TSC offset to user-space Luiz Capitulino
  2016-09-02 13:43   ` Stefan Hajnoczi
@ 2016-09-02 17:00   ` Paolo Bonzini
  2016-09-02 17:31     ` Luiz Capitulino
  1 sibling, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2016-09-02 17:00 UTC (permalink / raw)
  To: Luiz Capitulino, kvm; +Cc: linux-kernel, rkrcmar, rostedt, mhiramat, mtosatti



On 31/08/2016 19:05, Luiz Capitulino wrote:
>   vcpu0: 18446742405270834952
>   vcpu1: 18446742405270834952
>   vcpu2: 18446742405270834952
>   vcpu3: 18446742405270834952
> 
>  - We'll probably need to export the TSC multiplier too.
>    However, I've been using only the TSC offset for now.
>    So, let's get this merged first and do the TSC multiplier
>    as a second step

You'll need to export the number of fractional bits in the multiplier,
too.  It's going to be a very simple patch, so please do everything now.

arch/x86/kvm/x86.c is huge; please create a new file arch/x86/kvm/debugfs.c.

> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/svm.c              |  1 +
>  arch/x86/kvm/vmx.c              |  8 ++++++++
>  arch/x86/kvm/x86.c              | 30 ++++++++++++++++++++++++++++++
>  4 files changed, 40 insertions(+)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 33ae3a4..5714bbd 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -952,6 +952,7 @@ struct kvm_x86_ops {
>  	bool (*has_wbinvd_exit)(void);
>  
>  	u64 (*read_tsc_offset)(struct kvm_vcpu *vcpu);
> +	u64 (*read_cached_tsc_offset)(struct kvm_vcpu *vcpu);
>  	void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
>  
>  	u64 (*read_l1_tsc)(struct kvm_vcpu *vcpu, u64 host_tsc);
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index af523d8..c851477 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -5065,6 +5065,7 @@ static struct kvm_x86_ops svm_x86_ops = {
>  	.has_wbinvd_exit = svm_has_wbinvd_exit,
>  
>  	.read_tsc_offset = svm_read_tsc_offset,
> +	.read_cached_tsc_offset = svm_read_tsc_offset,
>  	.write_tsc_offset = svm_write_tsc_offset,
>  	.adjust_tsc_offset_guest = svm_adjust_tsc_offset_guest,
>  	.read_l1_tsc = svm_read_l1_tsc,
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 5cede40..82dfe42 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -616,6 +616,7 @@ struct vcpu_vmx {
>  	u64 hv_deadline_tsc;
>  
>  	u64 current_tsc_ratio;
> +	u64 cached_tsc_offset;
>  
>  	bool guest_pkru_valid;
>  	u32 guest_pkru;
> @@ -2608,6 +2609,11 @@ static u64 vmx_read_tsc_offset(struct kvm_vcpu *vcpu)
>  	return vmcs_read64(TSC_OFFSET);
>  }
>  
> +static u64 vmx_read_cached_tsc_offset(struct kvm_vcpu *vcpu)
> +{
> +	return to_vmx(vcpu)->cached_tsc_offset;
> +}
> +
>  /*
>   * writes 'offset' into guest's timestamp counter offset register
>   */
> @@ -2632,6 +2638,7 @@ static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
>  					   vmcs_read64(TSC_OFFSET), offset);
>  		vmcs_write64(TSC_OFFSET, offset);
>  	}
> +	to_vmx(vcpu)->cached_tsc_offset = offset;
>  }
>  
>  static void vmx_adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, s64 adjustment)
> @@ -11275,6 +11282,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
>  	.has_wbinvd_exit = cpu_has_vmx_wbinvd_exit,
>  
>  	.read_tsc_offset = vmx_read_tsc_offset,
> +	.read_cached_tsc_offset = vmx_read_cached_tsc_offset,
>  	.write_tsc_offset = vmx_write_tsc_offset,
>  	.adjust_tsc_offset_guest = vmx_adjust_tsc_offset_guest,
>  	.read_l1_tsc = vmx_read_l1_tsc,

You need to handle SVM as well.  So you might as well simplify the code:

- add a kvm_vcpu_write_tsc_offset wrapper for kvm_x86_ops->write_tsc_offset

- add a tsc_offset field in struct kvm_vcpu_arch

- replace kvm_x86_ops->read_tsc_offset with accesses to the new field

Then in a fifth patch export the TSC offset (and multiplier ;)) to
userspace.

I'm not very happy about having a single file for all TSC offsets.
Creating subdirectories under the PID-FD per-VM directory would be nicer
in the long run.

Paolo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] kvm: x86: export TSC offset to user-space
  2016-09-02 17:00   ` Paolo Bonzini
@ 2016-09-02 17:31     ` Luiz Capitulino
  2016-09-05  8:10       ` Paolo Bonzini
  0 siblings, 1 reply; 20+ messages in thread
From: Luiz Capitulino @ 2016-09-02 17:31 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, linux-kernel, rkrcmar, rostedt, mhiramat, mtosatti

On Fri, 2 Sep 2016 19:00:41 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 31/08/2016 19:05, Luiz Capitulino wrote:
> >   vcpu0: 18446742405270834952
> >   vcpu1: 18446742405270834952
> >   vcpu2: 18446742405270834952
> >   vcpu3: 18446742405270834952
> > 
> >  - We'll probably need to export the TSC multiplier too.
> >    However, I've been using only the TSC offset for now.
> >    So, let's get this merged first and do the TSC multiplier
> >    as a second step  
> 
> You'll need to export the number of fractional bits in the multiplier,
> too.  It's going to be a very simple patch, so please do everything now.

I didn't want to expose the multiplier before testing our tracing
procedure with it. So far we've been only using the TSC offset (and
it works great). I don't even know if I have a machine around to
test it, so it could take a bit.

> arch/x86/kvm/x86.c is huge; please create a new file arch/x86/kvm/debugfs.c.

Will do.

> > Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h |  1 +
> >  arch/x86/kvm/svm.c              |  1 +
> >  arch/x86/kvm/vmx.c              |  8 ++++++++
> >  arch/x86/kvm/x86.c              | 30 ++++++++++++++++++++++++++++++
> >  4 files changed, 40 insertions(+)
> > 
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 33ae3a4..5714bbd 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -952,6 +952,7 @@ struct kvm_x86_ops {
> >  	bool (*has_wbinvd_exit)(void);
> >  
> >  	u64 (*read_tsc_offset)(struct kvm_vcpu *vcpu);
> > +	u64 (*read_cached_tsc_offset)(struct kvm_vcpu *vcpu);
> >  	void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
> >  
> >  	u64 (*read_l1_tsc)(struct kvm_vcpu *vcpu, u64 host_tsc);
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index af523d8..c851477 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -5065,6 +5065,7 @@ static struct kvm_x86_ops svm_x86_ops = {
> >  	.has_wbinvd_exit = svm_has_wbinvd_exit,
> >  
> >  	.read_tsc_offset = svm_read_tsc_offset,
> > +	.read_cached_tsc_offset = svm_read_tsc_offset,
> >  	.write_tsc_offset = svm_write_tsc_offset,
> >  	.adjust_tsc_offset_guest = svm_adjust_tsc_offset_guest,
> >  	.read_l1_tsc = svm_read_l1_tsc,
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 5cede40..82dfe42 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -616,6 +616,7 @@ struct vcpu_vmx {
> >  	u64 hv_deadline_tsc;
> >  
> >  	u64 current_tsc_ratio;
> > +	u64 cached_tsc_offset;
> >  
> >  	bool guest_pkru_valid;
> >  	u32 guest_pkru;
> > @@ -2608,6 +2609,11 @@ static u64 vmx_read_tsc_offset(struct kvm_vcpu *vcpu)
> >  	return vmcs_read64(TSC_OFFSET);
> >  }
> >  
> > +static u64 vmx_read_cached_tsc_offset(struct kvm_vcpu *vcpu)
> > +{
> > +	return to_vmx(vcpu)->cached_tsc_offset;
> > +}
> > +
> >  /*
> >   * writes 'offset' into guest's timestamp counter offset register
> >   */
> > @@ -2632,6 +2638,7 @@ static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
> >  					   vmcs_read64(TSC_OFFSET), offset);
> >  		vmcs_write64(TSC_OFFSET, offset);
> >  	}
> > +	to_vmx(vcpu)->cached_tsc_offset = offset;
> >  }
> >  
> >  static void vmx_adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, s64 adjustment)
> > @@ -11275,6 +11282,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
> >  	.has_wbinvd_exit = cpu_has_vmx_wbinvd_exit,
> >  
> >  	.read_tsc_offset = vmx_read_tsc_offset,
> > +	.read_cached_tsc_offset = vmx_read_cached_tsc_offset,
> >  	.write_tsc_offset = vmx_write_tsc_offset,
> >  	.adjust_tsc_offset_guest = vmx_adjust_tsc_offset_guest,
> >  	.read_l1_tsc = vmx_read_l1_tsc,  
> 
> You need to handle SVM as well.  So you might as well simplify the code:

SVM is handled:

	+	.read_cached_tsc_offset = svm_read_tsc_offset,

> - add a kvm_vcpu_write_tsc_offset wrapper for kvm_x86_ops->write_tsc_offset
> 
> - add a tsc_offset field in struct kvm_vcpu_arch
> 
> - replace kvm_x86_ops->read_tsc_offset with accesses to the new field

Given that SVM is handled, you still want me to do this?

> Then in a fifth patch export the TSC offset (and multiplier ;)) to
> userspace.
> 
> I'm not very happy about having a single file for all TSC offsets.
> Creating subdirectories under the PID-FD per-VM directory would be nicer
> in the long run.

I think Steven would also prefer that, but some people raised the
concern at KVM Forum that creating per vcpu dirs in debugfs may
consume considerable memory for a system running several dozen
if not hundreds of VMs. This concern seems valid to me, but I
can do either way.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] kvm: x86: export TSC offset to user-space
  2016-09-02 13:43   ` Stefan Hajnoczi
  2016-09-02 14:15     ` Steven Rostedt
  2016-09-02 16:26     ` Luiz Capitulino
@ 2016-09-02 23:49     ` Marcelo Tosatti
  2016-09-03  1:29       ` Luiz Capitulino
  2 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2016-09-02 23:49 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Luiz Capitulino, kvm, linux-kernel, pbonzini, rkrcmar, rostedt, mhiramat

On Fri, Sep 02, 2016 at 09:43:01AM -0400, Stefan Hajnoczi wrote:
> On Wed, Aug 31, 2016 at 01:05:45PM -0400, Luiz Capitulino wrote:
> > We need to retrieve a VM's TSC offset in order to use
> > the host's TSC to merge host and guest traces. This is
> > explained in detail in this thread:
> > 
> >   [Qemu-devel] [RFC] host and guest kernel trace merging
> >   https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg00887.html
> > 
> > Today, the only way to retrieve a VM's TSC offset is
> > by using the kvm_write_tsc_offset tracepoint. This has
> > a few problems. First, the tracepoint is only emitted
> > when the VM boots, which requires a reboot to get it if
> > the VM is already running. Second, tracepoints are not
> > supposed to be ABIs in case they need to be consumed by
> > user-space tools.
> > 
> > This commit exports a VM's TSC offset to user-space via
> > debugfs. A new file called "tsc-offset" is created in
> > the VM's debugfs directory. For example:
> > 
> >   /sys/kernel/debug/kvm/51696-10/tsc-offset
> > 
> > This file contains one TSC offset per line, for each
> > vCPU. For example:
> > 
> >   vcpu0: 18446742405270834952
> >   vcpu1: 18446742405270834952
> >   vcpu2: 18446742405270834952
> >   vcpu3: 18446742405270834952
> > 
> > There are some important observations about this
> > solution:
> > 
> >  - While all vCPUs TSC offsets should be equal for the
> >    cases we care about (ie. stable TSC and no write to
> >    the TSC MSR), I chose to follow the spec and export
> >    each vCPU's TSC offset (might also be helpful for
> >    debugging)
> > 
> >  - The TSC offset is only useful after the VM has booted
> > 
> >  - We'll probably need to export the TSC multiplier too.
> >    However, I've been using only the TSC offset for now.
> >    So, let's get this merged first and do the TSC multiplier
> >    as a second step
> 
> Can TSC offset changes occur at runtime?
> 
> One example is vcpu hotplug where the tracing tool would need to fetch
> the new vcpu's TSC offset after tracing has already started.
> 
> Another example is if QEMU or the guest change the TSC offset while
> running.  If the tracing tool doesn't notice this then trace events will have
> incorrect timestamps.
> 
> Stefan

Yes they can, and the interface should mention that "the user is
responsible for handling races of execution" (IMO).

So the workflow is:

1) User boots VM and knows the state of the VM.
2) User runs trace-cmd on the host.

Is there a need to automate gathering of traces? (that is to know the
state of reboots and so forth). I don't see one. In that case, the above
workflow is functional.

Can you add such comments to the interface Luiz (that the value
read is potentially stale).

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] kvm: x86: export TSC offset to user-space
  2016-09-02 14:15     ` Steven Rostedt
@ 2016-09-03  0:23       ` Marcelo Tosatti
  2016-09-03  4:04         ` Masami Hiramatsu
  0 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2016-09-03  0:23 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Stefan Hajnoczi, Luiz Capitulino, kvm, linux-kernel, pbonzini,
	rkrcmar, mhiramat

On Fri, Sep 02, 2016 at 10:15:41AM -0400, Steven Rostedt wrote:
> On Fri, 2 Sep 2016 09:43:01 -0400
> Stefan Hajnoczi <stefanha@gmail.com> wrote:
> 
> > Can TSC offset changes occur at runtime?

Yes, but Linux guests don't write to the TSC offset
after booting and unless user does manual TSC writes.

> > One example is vcpu hotplug where the tracing tool would need to fetch
> > the new vcpu's TSC offset after tracing has already started.
> > 
> > Another example is if QEMU or the guest change the TSC offset while
> > running.  If the tracing tool doesn't notice this then trace events will have
> > incorrect timestamps.

So what happens is this:

HostTSC (a variable). 
GuestTSC (variable) = GuestTSCOffset (fixed) + HostTSC (variable)

Then the algorithm processes the trace as follows:
line = for each line(guest_trace)
    line = line - GuestTSCOffset (only the timestamp of course)

So from the moment the guest writes a new TSC offset, the host 
should use the new TSC offset to subtract from the trace entries.
The trace entries are in fact:

HostTSC + GuestTSCOffset

So the guest trace should contain entries for "USE NEW TSC OFFSET,
VALUE: xxx", which can be done (hum not sure if guest entries
or host entries).

However, correct me if i am wrong, the usecase seems to be:

1) Boot guest.
2) run trace-cmd
3) run workload
4) read traces on host.

Another option is to have notifications as follows: record on a buffer 
the following:

    [ EVENT: TSC offset write, VAL: [ host tsc value, guest tsc offset ] ]
    [ EVENT: TSC offset write, VAL: [ host tsc value, guest tsc offset ] ]

Then when merging the trace entries, you do:

line = for each line(host trace)
    write_to_merged_trace(line)
    if (contains_tsc_offset_event(line)) {
        GuestTSCOffset = line.GuestTSCOffset
        if (!guest_tsc_offset_initialized) {
            process_all_guest_lines(
            line = line - GuestTSCOffset (only the timestamp of course)
        }
    }

Aha, fail: the traces on the host are not sufficient to know when 
to use which offset to subtract on the guest trace.

So the only possibility is to have the guest inform the occurrence
of the events: however the guest does not have access to the TSC offset.

So the host needs to inform the new tsc offset value and the guest needs
to inform when the event occurred on its side. So the algorithm can use
information on both traces to know which value to subtract on the
algorithm above.

Is this necessary? Or people do:
1) Boot guest.
2) run trace-cmd
3) run workload
4) read traces on host.

> I believe there are tracepoints for these events. They would obviously
> need to be enabled for the tracer to catch them.
> 
> I would also recommend that they go into their own instance to make
> sure other events do not overwrite them.
> 
> -- Steve

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] kvm: x86: export TSC offset to user-space
  2016-09-02 23:49     ` Marcelo Tosatti
@ 2016-09-03  1:29       ` Luiz Capitulino
  0 siblings, 0 replies; 20+ messages in thread
From: Luiz Capitulino @ 2016-09-03  1:29 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Stefan Hajnoczi, kvm, linux-kernel, pbonzini, rkrcmar, rostedt, mhiramat

On Fri, 2 Sep 2016 20:49:37 -0300
Marcelo Tosatti <mtosatti@redhat.com> wrote:

> On Fri, Sep 02, 2016 at 09:43:01AM -0400, Stefan Hajnoczi wrote:
> > On Wed, Aug 31, 2016 at 01:05:45PM -0400, Luiz Capitulino wrote:  
> > > We need to retrieve a VM's TSC offset in order to use
> > > the host's TSC to merge host and guest traces. This is
> > > explained in detail in this thread:
> > > 
> > >   [Qemu-devel] [RFC] host and guest kernel trace merging
> > >   https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg00887.html
> > > 
> > > Today, the only way to retrieve a VM's TSC offset is
> > > by using the kvm_write_tsc_offset tracepoint. This has
> > > a few problems. First, the tracepoint is only emitted
> > > when the VM boots, which requires a reboot to get it if
> > > the VM is already running. Second, tracepoints are not
> > > supposed to be ABIs in case they need to be consumed by
> > > user-space tools.
> > > 
> > > This commit exports a VM's TSC offset to user-space via
> > > debugfs. A new file called "tsc-offset" is created in
> > > the VM's debugfs directory. For example:
> > > 
> > >   /sys/kernel/debug/kvm/51696-10/tsc-offset
> > > 
> > > This file contains one TSC offset per line, for each
> > > vCPU. For example:
> > > 
> > >   vcpu0: 18446742405270834952
> > >   vcpu1: 18446742405270834952
> > >   vcpu2: 18446742405270834952
> > >   vcpu3: 18446742405270834952
> > > 
> > > There are some important observations about this
> > > solution:
> > > 
> > >  - While all vCPUs TSC offsets should be equal for the
> > >    cases we care about (ie. stable TSC and no write to
> > >    the TSC MSR), I chose to follow the spec and export
> > >    each vCPU's TSC offset (might also be helpful for
> > >    debugging)
> > > 
> > >  - The TSC offset is only useful after the VM has booted
> > > 
> > >  - We'll probably need to export the TSC multiplier too.
> > >    However, I've been using only the TSC offset for now.
> > >    So, let's get this merged first and do the TSC multiplier
> > >    as a second step  
> > 
> > Can TSC offset changes occur at runtime?
> > 
> > One example is vcpu hotplug where the tracing tool would need to fetch
> > the new vcpu's TSC offset after tracing has already started.
> > 
> > Another example is if QEMU or the guest change the TSC offset while
> > running.  If the tracing tool doesn't notice this then trace events will have
> > incorrect timestamps.
> > 
> > Stefan  
> 
> Yes they can, and the interface should mention that "the user is
> responsible for handling races of execution" (IMO).
> 
> So the workflow is:
> 
> 1) User boots VM and knows the state of the VM.
> 2) User runs trace-cmd on the host.
> 
> Is there a need to automate gathering of traces? (that is to know the
> state of reboots and so forth). I don't see one. In that case, the above
> workflow is functional.
> 
> Can you add such comments to the interface Luiz (that the value
> read is potentially stale).

Sure, no problem.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] kvm: add stub for arch specific debugfs support
  2016-08-31 17:05 ` [PATCH 3/4] kvm: add stub for arch specific debugfs support Luiz Capitulino
  2016-09-02 13:53   ` Paolo Bonzini
@ 2016-09-03  3:34   ` Masami Hiramatsu
  1 sibling, 0 replies; 20+ messages in thread
From: Masami Hiramatsu @ 2016-09-03  3:34 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: kvm, linux-kernel, pbonzini, rkrcmar, rostedt, mhiramat, mtosatti

On Wed, 31 Aug 2016 13:05:44 -0400
Luiz Capitulino <lcapitulino@redhat.com> wrote:

> kvm_arch_create_vm_debugfs() allows arch specific code to
> create entries in the VM's directory in debugfs. x86 will
> implement support for this in the next commit.

I think we can use __weak symbol for other arch.

Thank  you,

> 
> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> ---
>  arch/arm/kvm/arm.c         | 5 +++++
>  arch/mips/kvm/mips.c       | 5 +++++
>  arch/powerpc/kvm/powerpc.c | 5 +++++
>  arch/s390/kvm/kvm-s390.c   | 5 +++++
>  arch/x86/kvm/x86.c         | 5 +++++
>  include/linux/kvm_host.h   | 2 ++
>  virt/kvm/kvm_main.c        | 4 ++++
>  7 files changed, 31 insertions(+)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 75f130e..491211f 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -144,6 +144,11 @@ out_fail_alloc:
>  	return ret;
>  }
>  
> +int kvm_arch_create_vm_debugfs(struct kvm *kvm)
> +{
> +	return 0;
> +}
> +
>  int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
>  {
>  	return VM_FAULT_SIGBUS;
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index a6ea084..a854b8b 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -140,6 +140,11 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	return 0;
>  }
>  
> +int kvm_arch_create_vm_debugfs(struct kvm *kvm)
> +{
> +	return 0;
> +}
> +
>  void kvm_mips_free_vcpus(struct kvm *kvm)
>  {
>  	unsigned int i;
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 6ce40dd..d8867e5 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -436,6 +436,11 @@ err_out:
>  	return -EINVAL;
>  }
>  
> +int kvm_arch_create_vm_debugfs(struct kvm *kvm)
> +{
> +	return 0;
> +}
> +
>  void kvm_arch_destroy_vm(struct kvm *kvm)
>  {
>  	unsigned int i;
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index f142215..406324c 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -1490,6 +1490,11 @@ out_err:
>  	return rc;
>  }
>  
> +int kvm_arch_create_vm_debugfs(struct kvm *kvm)
> +{
> +	return 0;
> +}
> +
>  void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>  {
>  	VCPU_EVENT(vcpu, 3, "%s", "free cpu");
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 19f9f9e..18dfbac 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7779,6 +7779,11 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	return 0;
>  }
>  
> +int kvm_arch_create_vm_debugfs(struct kvm *kvm)
> +{
> +	return 0;
> +}
> +
>  static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
>  {
>  	int r;
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 9c28b4d..d3810bf 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -835,6 +835,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type);
>  void kvm_arch_destroy_vm(struct kvm *kvm);
>  void kvm_arch_sync_events(struct kvm *kvm);
>  
> +int kvm_arch_create_vm_debugfs(struct kvm *kvm);
> +
>  int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
>  void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
>  
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 9293285..2db53a8 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -601,6 +601,10 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
>  					 stat_fops_per_vm[p->kind]))
>  			goto out_err;
>  	}
> +
> +	if (kvm_arch_create_vm_debugfs(kvm) < 0)
> +		goto out_err;
> +
>  	return 0;
>  
>  out_err:
> -- 
> 2.5.5
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] kvm: x86: export TSC offset to user-space
  2016-09-03  0:23       ` Marcelo Tosatti
@ 2016-09-03  4:04         ` Masami Hiramatsu
  0 siblings, 0 replies; 20+ messages in thread
From: Masami Hiramatsu @ 2016-09-03  4:04 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Steven Rostedt, Stefan Hajnoczi, Luiz Capitulino, kvm,
	linux-kernel, pbonzini, rkrcmar, mhiramat

On Fri, 2 Sep 2016 21:23:11 -0300
Marcelo Tosatti <mtosatti@redhat.com> wrote:

> On Fri, Sep 02, 2016 at 10:15:41AM -0400, Steven Rostedt wrote:
> > On Fri, 2 Sep 2016 09:43:01 -0400
> > Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > 
> > > Can TSC offset changes occur at runtime?
> 
> Yes, but Linux guests don't write to the TSC offset
> after booting and unless user does manual TSC writes.
> 
> > > One example is vcpu hotplug where the tracing tool would need to fetch
> > > the new vcpu's TSC offset after tracing has already started.
> > > 
> > > Another example is if QEMU or the guest change the TSC offset while
> > > running.  If the tracing tool doesn't notice this then trace events will have
> > > incorrect timestamps.
> 
> So what happens is this:
> 
> HostTSC (a variable). 
> GuestTSC (variable) = GuestTSCOffset (fixed) + HostTSC (variable)

The same idea has been done by Yoshihiro
http://events.linuxfoundation.org/sites/events/files/cojp13_yunomae.pdf

 
> Then the algorithm processes the trace as follows:
> line = for each line(guest_trace)
>     line = line - GuestTSCOffset (only the timestamp of course)
> 
> So from the moment the guest writes a new TSC offset, the host 
> should use the new TSC offset to subtract from the trace entries.
> The trace entries are in fact:
> 
> HostTSC + GuestTSCOffset
> 
> So the guest trace should contain entries for "USE NEW TSC OFFSET,
> VALUE: xxx", which can be done (hum not sure if guest entries
> or host entries).
> 
> However, correct me if i am wrong, the usecase seems to be:
> 
> 1) Boot guest.
> 2) run trace-cmd
> 3) run workload
> 4) read traces on host.

IIRC, previous (current?) method is to run trace-cmd at first (before
boot the guest) so that it can get tsc-offset event and
can wait on a special unix domain socket.

For above usecase, we have to have an interface to get the current
tsc offset like Luis suggested.

> 
> Another option is to have notifications as follows: record on a buffer 
> the following:
> 
>     [ EVENT: TSC offset write, VAL: [ host tsc value, guest tsc offset ] ]
>     [ EVENT: TSC offset write, VAL: [ host tsc value, guest tsc offset ] ]
> 
> Then when merging the trace entries, you do:
> 
> line = for each line(host trace)
>     write_to_merged_trace(line)
>     if (contains_tsc_offset_event(line)) {
>         GuestTSCOffset = line.GuestTSCOffset
>         if (!guest_tsc_offset_initialized) {
>             process_all_guest_lines(
>             line = line - GuestTSCOffset (only the timestamp of course)
>         }
>     }
> 
> Aha, fail: the traces on the host are not sufficient to know when 
> to use which offset to subtract on the guest trace.
> 
> So the only possibility is to have the guest inform the occurrence
> of the events: however the guest does not have access to the TSC offset.
> 
> So the host needs to inform the new tsc offset value and the guest needs
> to inform when the event occurred on its side. So the algorithm can use
> information on both traces to know which value to subtract on the
> algorithm above.
> 
> Is this necessary? Or people do:
> 1) Boot guest.
> 2) run trace-cmd
> 3) run workload
> 4) read traces on host.
> 
> > I believe there are tracepoints for these events. They would obviously
> > need to be enabled for the tracer to catch them.

Yes, Yoshihiro introduced a tracepoint for that.
http://lkml.iu.edu/hypermail/linux/kernel/1306.1/01741.html

So, we have to trace this event in host side.

> > 
> > I would also recommend that they go into their own instance to make
> > sure other events do not overwrite them.
> > 
> > -- Steve

Thanks,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] kvm: x86: export TSC offset to user-space
  2016-09-02 17:31     ` Luiz Capitulino
@ 2016-09-05  8:10       ` Paolo Bonzini
  0 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-09-05  8:10 UTC (permalink / raw)
  To: Luiz Capitulino; +Cc: kvm, linux-kernel, rkrcmar, rostedt, mhiramat, mtosatti



On 02/09/2016 19:31, Luiz Capitulino wrote:
> On Fri, 2 Sep 2016 19:00:41 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>> On 31/08/2016 19:05, Luiz Capitulino wrote:
>>>   vcpu0: 18446742405270834952
>>>   vcpu1: 18446742405270834952
>>>   vcpu2: 18446742405270834952
>>>   vcpu3: 18446742405270834952
>>>
>>>  - We'll probably need to export the TSC multiplier too.
>>>    However, I've been using only the TSC offset for now.
>>>    So, let's get this merged first and do the TSC multiplier
>>>    as a second step  
>>
>> You'll need to export the number of fractional bits in the multiplier,
>> too.  It's going to be a very simple patch, so please do everything now.
> 
> I didn't want to expose the multiplier before testing our tracing
> procedure with it.

Exposing the multiplier should be independent of tracing.  I can test it
for you.

Paolo

> So far we've been only using the TSC offset (and
> it works great). I don't even know if I have a machine around to
> test it, so it could take a bit.
> 
>> arch/x86/kvm/x86.c is huge; please create a new file arch/x86/kvm/debugfs.c.
> 
> Will do.
> 
>>> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
>>> ---
>>>  arch/x86/include/asm/kvm_host.h |  1 +
>>>  arch/x86/kvm/svm.c              |  1 +
>>>  arch/x86/kvm/vmx.c              |  8 ++++++++
>>>  arch/x86/kvm/x86.c              | 30 ++++++++++++++++++++++++++++++
>>>  4 files changed, 40 insertions(+)
>>>
>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>> index 33ae3a4..5714bbd 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>> @@ -952,6 +952,7 @@ struct kvm_x86_ops {
>>>  	bool (*has_wbinvd_exit)(void);
>>>  
>>>  	u64 (*read_tsc_offset)(struct kvm_vcpu *vcpu);
>>> +	u64 (*read_cached_tsc_offset)(struct kvm_vcpu *vcpu);
>>>  	void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
>>>  
>>>  	u64 (*read_l1_tsc)(struct kvm_vcpu *vcpu, u64 host_tsc);
>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>> index af523d8..c851477 100644
>>> --- a/arch/x86/kvm/svm.c
>>> +++ b/arch/x86/kvm/svm.c
>>> @@ -5065,6 +5065,7 @@ static struct kvm_x86_ops svm_x86_ops = {
>>>  	.has_wbinvd_exit = svm_has_wbinvd_exit,
>>>  
>>>  	.read_tsc_offset = svm_read_tsc_offset,
>>> +	.read_cached_tsc_offset = svm_read_tsc_offset,
>>>  	.write_tsc_offset = svm_write_tsc_offset,
>>>  	.adjust_tsc_offset_guest = svm_adjust_tsc_offset_guest,
>>>  	.read_l1_tsc = svm_read_l1_tsc,
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index 5cede40..82dfe42 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -616,6 +616,7 @@ struct vcpu_vmx {
>>>  	u64 hv_deadline_tsc;
>>>  
>>>  	u64 current_tsc_ratio;
>>> +	u64 cached_tsc_offset;
>>>  
>>>  	bool guest_pkru_valid;
>>>  	u32 guest_pkru;
>>> @@ -2608,6 +2609,11 @@ static u64 vmx_read_tsc_offset(struct kvm_vcpu *vcpu)
>>>  	return vmcs_read64(TSC_OFFSET);
>>>  }
>>>  
>>> +static u64 vmx_read_cached_tsc_offset(struct kvm_vcpu *vcpu)
>>> +{
>>> +	return to_vmx(vcpu)->cached_tsc_offset;
>>> +}
>>> +
>>>  /*
>>>   * writes 'offset' into guest's timestamp counter offset register
>>>   */
>>> @@ -2632,6 +2638,7 @@ static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
>>>  					   vmcs_read64(TSC_OFFSET), offset);
>>>  		vmcs_write64(TSC_OFFSET, offset);
>>>  	}
>>> +	to_vmx(vcpu)->cached_tsc_offset = offset;
>>>  }
>>>  
>>>  static void vmx_adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, s64 adjustment)
>>> @@ -11275,6 +11282,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
>>>  	.has_wbinvd_exit = cpu_has_vmx_wbinvd_exit,
>>>  
>>>  	.read_tsc_offset = vmx_read_tsc_offset,
>>> +	.read_cached_tsc_offset = vmx_read_cached_tsc_offset,
>>>  	.write_tsc_offset = vmx_write_tsc_offset,
>>>  	.adjust_tsc_offset_guest = vmx_adjust_tsc_offset_guest,
>>>  	.read_l1_tsc = vmx_read_l1_tsc,  
>>
>> You need to handle SVM as well.  So you might as well simplify the code:
> 
> SVM is handled:
> 
> 	+	.read_cached_tsc_offset = svm_read_tsc_offset,
> 
>> - add a kvm_vcpu_write_tsc_offset wrapper for kvm_x86_ops->write_tsc_offset
>>
>> - add a tsc_offset field in struct kvm_vcpu_arch
>>
>> - replace kvm_x86_ops->read_tsc_offset with accesses to the new field
> 
> Given that SVM is handled, you still want me to do this?
> 
>> Then in a fifth patch export the TSC offset (and multiplier ;)) to
>> userspace.
>>
>> I'm not very happy about having a single file for all TSC offsets.
>> Creating subdirectories under the PID-FD per-VM directory would be nicer
>> in the long run.
> 
> I think Steven would also prefer that, but some people raised the
> concern at KVM Forum that creating per vcpu dirs in debugfs may
> consume considerable memory for a system running several dozen
> if not hundreds of VMs. This concern seems valid to me, but I
> can do either way.
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-09-05  8:10 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-31 17:05 [PATCH 0/4] kvm: export TSC offset to user-space Luiz Capitulino
2016-08-31 17:05 ` [PATCH 1/4] kvm: kvm_destroy_vm_debugfs(): check debugs_stat_data pointer Luiz Capitulino
2016-09-02 13:51   ` Paolo Bonzini
2016-08-31 17:05 ` [PATCH 2/4] kvm: kvm_create_vm_debugfs(): cleanup on error Luiz Capitulino
2016-09-02 13:53   ` Paolo Bonzini
2016-08-31 17:05 ` [PATCH 3/4] kvm: add stub for arch specific debugfs support Luiz Capitulino
2016-09-02 13:53   ` Paolo Bonzini
2016-09-03  3:34   ` Masami Hiramatsu
2016-08-31 17:05 ` [PATCH 4/4] kvm: x86: export TSC offset to user-space Luiz Capitulino
2016-09-02 13:43   ` Stefan Hajnoczi
2016-09-02 14:15     ` Steven Rostedt
2016-09-03  0:23       ` Marcelo Tosatti
2016-09-03  4:04         ` Masami Hiramatsu
2016-09-02 16:26     ` Luiz Capitulino
2016-09-02 16:29       ` Luiz Capitulino
2016-09-02 23:49     ` Marcelo Tosatti
2016-09-03  1:29       ` Luiz Capitulino
2016-09-02 17:00   ` Paolo Bonzini
2016-09-02 17:31     ` Luiz Capitulino
2016-09-05  8:10       ` Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.