* [PATCH RFC 0/3] kvm add ioeventfd pf capability
@ 2015-08-30  9:12 Michael S. Tsirkin
  2015-08-30  9:12 ` [PATCH RFC 1/3] vmx: allow ioeventfd for EPT violations Michael S. Tsirkin
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2015-08-30  9:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Paolo Bonzini

One of the reasons MMIO is slower than port IO is that
it requires a page-table lookup.
For normal memory accesses, this is solved by using the TLB
cache - but MMIO entries are either not present or reserved
and so are never cached.

To fix this, allow installing an ioeventfd on top of a read-only
memory region, which lets the CPU cache the translations.

Warning: svm patch is untested.

Michael S. Tsirkin (3):
  vmx: allow ioeventfd for EPT violations
  svm: allow ioeventfd for NPT page faults
  kvm: add KVM_CAP_IOEVENTFD_PF capability

 include/uapi/linux/kvm.h          | 1 +
 arch/x86/kvm/svm.c                | 5 +++++
 arch/x86/kvm/vmx.c                | 5 +++++
 arch/x86/kvm/x86.c                | 1 +
 Documentation/virtual/kvm/api.txt | 7 +++++++
 5 files changed, 19 insertions(+)

-- 
MST


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH RFC 1/3] vmx: allow ioeventfd for EPT violations
  2015-08-30  9:12 [PATCH RFC 0/3] kvm add ioeventfd pf capability Michael S. Tsirkin
@ 2015-08-30  9:12 ` Michael S. Tsirkin
  2015-08-31  2:53   ` Xiao Guangrong
  2015-09-01  3:37   ` Jason Wang
  2015-08-30  9:12 ` [PATCH RFC 2/3] svm: allow ioeventfd for NPT page faults Michael S. Tsirkin
  2015-08-30  9:12 ` [PATCH RFC 3/3] kvm: add KVM_CAP_IOEVENTFD_PF capability Michael S. Tsirkin
  2 siblings, 2 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2015-08-30  9:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Paolo Bonzini

Even when we skip data decoding, MMIO is slightly slower
than port IO because it uses the page-tables, so the CPU
must do a pagewalk on each access.

This overhead is normally masked by using the TLB cache:
but not so for KVM MMIO, where PTEs are marked as reserved
and so are never cached.

As ioeventfd memory is never read, make it possible to use
RO pages on the host for ioeventfds, instead.
The result is that TLBs are cached, which finally makes MMIO
as fast as port IO.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/x86/kvm/vmx.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9d1bfd3..ed44026 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5745,6 +5745,11 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI);
 
 	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
+	if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
+		skip_emulated_instruction(vcpu);
+		return 1;
+	}
+
 	trace_kvm_page_fault(gpa, exit_qualification);
 
 	/* It is a write fault? */
-- 
MST



* [PATCH RFC 2/3] svm: allow ioeventfd for NPT page faults
  2015-08-30  9:12 [PATCH RFC 0/3] kvm add ioeventfd pf capability Michael S. Tsirkin
  2015-08-30  9:12 ` [PATCH RFC 1/3] vmx: allow ioeventfd for EPT violations Michael S. Tsirkin
@ 2015-08-30  9:12 ` Michael S. Tsirkin
  2015-08-30  9:12 ` [PATCH RFC 3/3] kvm: add KVM_CAP_IOEVENTFD_PF capability Michael S. Tsirkin
  2 siblings, 0 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2015-08-30  9:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Paolo Bonzini

MMIO is slightly slower than port IO because it uses the page-tables, so
the CPU must do a pagewalk on each access.

This overhead is normally masked by using the TLB cache:
but not so for KVM MMIO, where PTEs are marked as reserved
and so are never cached.

As ioeventfd memory is never read, make it possible to use
RO pages on the host for ioeventfds, instead.
The result is that TLBs are cached, which finally makes MMIO
as fast as port IO.

Warning: untested.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/x86/kvm/svm.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 8e0c084..6422fac 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1812,6 +1812,11 @@ static int pf_interception(struct vcpu_svm *svm)
 	switch (svm->apf_reason) {
 	default:
 		error_code = svm->vmcb->control.exit_info_1;
+		if (!kvm_io_bus_write(&svm->vcpu, KVM_FAST_MMIO_BUS,
+				      fault_address, 0, NULL)) {
+			skip_emulated_instruction(&svm->vcpu);
+			return 1;
+		}
 
 		trace_kvm_page_fault(fault_address, error_code);
 		if (!npt_enabled && kvm_event_needs_reinjection(&svm->vcpu))
-- 
MST



* [PATCH RFC 3/3] kvm: add KVM_CAP_IOEVENTFD_PF capability
  2015-08-30  9:12 [PATCH RFC 0/3] kvm add ioeventfd pf capability Michael S. Tsirkin
  2015-08-30  9:12 ` [PATCH RFC 1/3] vmx: allow ioeventfd for EPT violations Michael S. Tsirkin
  2015-08-30  9:12 ` [PATCH RFC 2/3] svm: allow ioeventfd for NPT page faults Michael S. Tsirkin
@ 2015-08-30  9:12 ` Michael S. Tsirkin
  2 siblings, 0 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2015-08-30  9:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Paolo Bonzini

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/uapi/linux/kvm.h          | 1 +
 arch/x86/kvm/x86.c                | 1 +
 Documentation/virtual/kvm/api.txt | 7 +++++++
 3 files changed, 9 insertions(+)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 716ad4a..4509aa3 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -817,6 +817,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_DISABLE_QUIRKS 116
 #define KVM_CAP_X86_SMM 117
 #define KVM_CAP_MULTI_ADDRESS_SPACE 118
+#define KVM_CAP_IOEVENTFD_PF 119
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c8015fa..f989453 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2629,6 +2629,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_IRQ_INJECT_STATUS:
 	case KVM_CAP_IOEVENTFD:
 	case KVM_CAP_IOEVENTFD_NO_LENGTH:
+	case KVM_CAP_IOEVENTFD_PF:
 	case KVM_CAP_PIT2:
 	case KVM_CAP_PIT_STATE2:
 	case KVM_CAP_SET_IDENTITY_MAP_ADDR:
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index a7926a9..85a76ad 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1618,6 +1618,13 @@ The following flags are defined:
 If datamatch flag is set, the event will be signaled only if the written value
 to the registered address is equal to datamatch in struct kvm_ioeventfd.
 
+If KVM_CAP_IOEVENTFD_NO_LENGTH is present, and when DATAMATCH flag
+is clear, len can be set to 0 to match access of any length.
+
+If KVM_CAP_IOEVENTFD_PF is present, and when DATAMATCH flag
+is clear and len is set to 0, the specified address can overlap
+a read-only memory region (as opposed to an MMIO region).
+
 For virtio-ccw devices, addr contains the subchannel id and datamatch the
 virtqueue index.
 
-- 
MST



* Re: [PATCH RFC 1/3] vmx: allow ioeventfd for EPT violations
  2015-08-30  9:12 ` [PATCH RFC 1/3] vmx: allow ioeventfd for EPT violations Michael S. Tsirkin
@ 2015-08-31  2:53   ` Xiao Guangrong
  2015-08-31  7:46     ` Michael S. Tsirkin
  2015-09-01  3:37   ` Jason Wang
  1 sibling, 1 reply; 14+ messages in thread
From: Xiao Guangrong @ 2015-08-31  2:53 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel; +Cc: kvm, Paolo Bonzini



On 08/30/2015 05:12 PM, Michael S. Tsirkin wrote:
> Even when we skip data decoding, MMIO is slightly slower
> than port IO because it uses the page-tables, so the CPU
> must do a pagewalk on each access.
>
> This overhead is normally masked by using the TLB cache:
> but not so for KVM MMIO, where PTEs are marked as reserved
> and so are never cached.
>
> As ioeventfd memory is never read, make it possible to use
> RO pages on the host for ioeventfds, instead.

I like this idea.

> The result is that TLBs are cached, which finally makes MMIO
> as fast as port IO.

What does "TLBs are cached" mean? Even after applying the patch
no new TLB type can be cached.

>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>   arch/x86/kvm/vmx.c | 5 +++++
>   1 file changed, 5 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 9d1bfd3..ed44026 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5745,6 +5745,11 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>   		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI);
>
>   	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> +	if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
> +		skip_emulated_instruction(vcpu);
> +		return 1;
> +	}
> +

I am afraid that the common page fault entry point is not a good place to do this
work. Would you move it to kvm_handle_bad_page()? The difference is that the workload
of fast_page_fault() is included, but that is light enough, and MMIO exits should not
be very frequent, so I think it's okay.



* Re: [PATCH RFC 1/3] vmx: allow ioeventfd for EPT violations
  2015-08-31  2:53   ` Xiao Guangrong
@ 2015-08-31  7:46     ` Michael S. Tsirkin
  2015-08-31  8:32       ` Xiao Guangrong
  0 siblings, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2015-08-31  7:46 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: linux-kernel, kvm, Paolo Bonzini

On Mon, Aug 31, 2015 at 10:53:58AM +0800, Xiao Guangrong wrote:
> 
> 
> On 08/30/2015 05:12 PM, Michael S. Tsirkin wrote:
> >Even when we skip data decoding, MMIO is slightly slower
> >than port IO because it uses the page-tables, so the CPU
> >must do a pagewalk on each access.
> >
> >This overhead is normally masked by using the TLB cache:
> >but not so for KVM MMIO, where PTEs are marked as reserved
> >and so are never cached.
> >
> >As ioeventfd memory is never read, make it possible to use
> >RO pages on the host for ioeventfds, instead.
> 
> I like this idea.
> 
> >The result is that TLBs are cached, which finally makes MMIO
> >as fast as port IO.
> 
> What does "TLBs are cached" mean? Even after applying the patch
> no new TLB type can be cached.

The Intel manual says:
	No guest-physical mappings or combined mappings are created with
	information derived from EPT paging-structure entries that are not present
	(bits 2:0 are all 0) or that are misconfigured (see Section 28.2.3.1).

	No combined mappings are created with information derived from guest
	paging-structure entries that are not present or that set reserved bits.

Thus, mappings that result in an EPT violation are created; this makes
EPT violations preferable to EPT misconfigurations.


> >
> >Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >---
> >  arch/x86/kvm/vmx.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> >diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >index 9d1bfd3..ed44026 100644
> >--- a/arch/x86/kvm/vmx.c
> >+++ b/arch/x86/kvm/vmx.c
> >@@ -5745,6 +5745,11 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
> >  		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI);
> >
> >  	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> >+	if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
> >+		skip_emulated_instruction(vcpu);
> >+		return 1;
> >+	}
> >+
> 
> I am afraid that the common page fault entry point is not a good place to do the
> work.

Why isn't it?

> Would move it to kvm_handle_bad_page()? The different is the workload of
> fast_page_fault() is included but it's light enough and MMIO-exit should not be
> very frequent, so i think it's okay.

That was supposed to be a slow path; I doubt it'll work well without
major code restructuring.
IIUC, by design, everything that's not going through fast_page_fault
is supposed to be a slow path that only happens rarely.

But in this case the page stays read-only, so we need a new fast path
through the code.

-- 
MST


* Re: [PATCH RFC 1/3] vmx: allow ioeventfd for EPT violations
  2015-08-31  7:46     ` Michael S. Tsirkin
@ 2015-08-31  8:32       ` Xiao Guangrong
  2015-08-31 11:27         ` Michael S. Tsirkin
  0 siblings, 1 reply; 14+ messages in thread
From: Xiao Guangrong @ 2015-08-31  8:32 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-kernel, kvm, Paolo Bonzini



On 08/31/2015 03:46 PM, Michael S. Tsirkin wrote:
> On Mon, Aug 31, 2015 at 10:53:58AM +0800, Xiao Guangrong wrote:
>>
>>
>> On 08/30/2015 05:12 PM, Michael S. Tsirkin wrote:
>>> Even when we skip data decoding, MMIO is slightly slower
>>> than port IO because it uses the page-tables, so the CPU
>>> must do a pagewalk on each access.
>>>
>>> This overhead is normally masked by using the TLB cache:
>>> but not so for KVM MMIO, where PTEs are marked as reserved
>>> and so are never cached.
>>>
>>> As ioeventfd memory is never read, make it possible to use
>>> RO pages on the host for ioeventfds, instead.
>>
>> I like this idea.
>>
>>> The result is that TLBs are cached, which finally makes MMIO
>>> as fast as port IO.
>>
>> What does "TLBs are cached" mean? Even after applying the patch
>> no new TLB type can be cached.
>
> The Intel manual says:
> 	No guest-physical mappings or combined mappings are created with
> 	information derived from EPT paging-structure entries that are not present
> 	(bits 2:0 are all 0) or that are misconfigured (see Section 28.2.3.1).
>
> 	No combined mappings are created with information derived from guest
> 	paging-structure entries that are not present or that set reserved bits.
>
> Thus mappings that result in EPT violation are created, this makes
> EPT violation preferable to EPT misconfiguration.

Hmm... but your logic completely bypasses page-table installation; the page
table entry stays non-present forever for eventfd memory.

>
>
>>>
>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>> ---
>>>   arch/x86/kvm/vmx.c | 5 +++++
>>>   1 file changed, 5 insertions(+)
>>>
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index 9d1bfd3..ed44026 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -5745,6 +5745,11 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>>>   		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI);
>>>
>>>   	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
>>> +	if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
>>> +		skip_emulated_instruction(vcpu);
>>> +		return 1;
>>> +	}
>>> +
>>
>> I am afraid that the common page fault entry point is not a good place to do the
>> work.
>
> Why isn't it?

1) You always do bus_write even if it is a read access. You cannot assume that the
    memory region can't be read by the guest.

2) The workload of _bus_write is added for all kinds of page faults; a normal #PF is
    far more frequent than a #PF on RO memory.

3) It completely bypasses the logic of handling RO memslots.

>
>> Would move it to kvm_handle_bad_page()? The different is the workload of
>> fast_page_fault() is included but it's light enough and MMIO-exit should not be
>> very frequent, so i think it's okay.
>
> That was supposed to be a slow path, I doubt it'll work well without
> major code restructuring.
> IIUC by design everything that's not going through fast_page_fault
> is supposed to be slow path that only happens rarely.
>

Do you have performance numbers comparing this patch with the approach I suggested?

> But in this case, the page stays read-only, we need a new fast path
> through the code.
>

Another solution is making the MMU recognise an RO region that is write-mostly,
then marking the page table entry as reserved rather than read-only.



* Re: [PATCH RFC 1/3] vmx: allow ioeventfd for EPT violations
  2015-08-31  8:32       ` Xiao Guangrong
@ 2015-08-31 11:27         ` Michael S. Tsirkin
  2015-08-31 13:23           ` Xiao Guangrong
  0 siblings, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2015-08-31 11:27 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: linux-kernel, kvm, Paolo Bonzini

On Mon, Aug 31, 2015 at 04:32:52PM +0800, Xiao Guangrong wrote:
> 
> 
> On 08/31/2015 03:46 PM, Michael S. Tsirkin wrote:
> >On Mon, Aug 31, 2015 at 10:53:58AM +0800, Xiao Guangrong wrote:
> >>
> >>
> >>On 08/30/2015 05:12 PM, Michael S. Tsirkin wrote:
> >>>Even when we skip data decoding, MMIO is slightly slower
> >>>than port IO because it uses the page-tables, so the CPU
> >>>must do a pagewalk on each access.
> >>>
> >>>This overhead is normally masked by using the TLB cache:
> >>>but not so for KVM MMIO, where PTEs are marked as reserved
> >>>and so are never cached.
> >>>
> >>>As ioeventfd memory is never read, make it possible to use
> >>>RO pages on the host for ioeventfds, instead.
> >>
> >>I like this idea.
> >>
> >>>The result is that TLBs are cached, which finally makes MMIO
> >>>as fast as port IO.
> >>
> >>What does "TLBs are cached" mean? Even after applying the patch
> >>no new TLB type can be cached.
> >
> >The Intel manual says:
> >	No guest-physical mappings or combined mappings are created with
> >	information derived from EPT paging-structure entries that are not present
> >	(bits 2:0 are all 0) or that are misconfigured (see Section 28.2.3.1).
> >
> >	No combined mappings are created with information derived from guest
> >	paging-structure entries that are not present or that set reserved bits.
> >
> >Thus mappings that result in EPT violation are created, this makes
> >EPT violation preferable to EPT misconfiguration.
> 
> Hmm... but your logic completely bypasses page-table-installation, the page
> table entry is nonpresent forever for eventfd memory.

As far as I can tell, not really: a non-present page is not an EPT violation,
so on the first write the regular logic will trigger and install the PTE;
the instruction is then re-executed and triggers an EPT violation.


> >
> >
> >>>
> >>>Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >>>---
> >>>  arch/x86/kvm/vmx.c | 5 +++++
> >>>  1 file changed, 5 insertions(+)
> >>>
> >>>diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >>>index 9d1bfd3..ed44026 100644
> >>>--- a/arch/x86/kvm/vmx.c
> >>>+++ b/arch/x86/kvm/vmx.c
> >>>@@ -5745,6 +5745,11 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
> >>>  		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI);
> >>>
> >>>  	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> >>>+	if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
> >>>+		skip_emulated_instruction(vcpu);
> >>>+		return 1;
> >>>+	}
> >>>+
> >>
> >>I am afraid that the common page fault entry point is not a good place to do the
> >>work.
> >
> >Why isn't it?
> 
> 1) You always do bus_write even if it is a read access. You can not assume that the
>    memory region can't be read by guest.
> 
> 2) The workload of _bus_write is added for all kinds of page fault, normal #PF is fair
>    frequent than #PF happens on RO memory.

A normal #PF is a slow path: you may need to hit the disk to pull memory
from swap, etc. More importantly, it installs the PTE, so the next access
does not cause an exit at all.

At some level that is the whole point of the patch: we are adding a
fast-path option to what would normally be slow path only, so we aren't
slowing down anything important.

> 3) It completely bypasses the logic of handing RO memslot.
> 
> >
> >>Would move it to kvm_handle_bad_page()? The different is the workload of
> >>fast_page_fault() is included but it's light enough and MMIO-exit should not be
> >>very frequent, so i think it's okay.
> >
> >That was supposed to be a slow path, I doubt it'll work well without
> >major code restructuring.
> >IIUC by design everything that's not going through fast_page_fault
> >is supposed to be slow path that only happens rarely.
> >
> 
> Do you have performance numbers which compare this patch and the way i figured out?

Not yet.

> >But in this case, the page stays read-only, we need a new fast path
> >through the code.
> >
> 
> Another solution is making MMU recognise the RO region which is write-mostly, then
> make the page table entry be reserved other than readonly.

Reserved results in EPT misconfiguration, so it's not cached.

-- 
MST


* Re: [PATCH RFC 1/3] vmx: allow ioeventfd for EPT violations
  2015-08-31 11:27         ` Michael S. Tsirkin
@ 2015-08-31 13:23           ` Xiao Guangrong
  2015-08-31 14:57             ` Michael S. Tsirkin
  0 siblings, 1 reply; 14+ messages in thread
From: Xiao Guangrong @ 2015-08-31 13:23 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-kernel, kvm, Paolo Bonzini



On 08/31/2015 07:27 PM, Michael S. Tsirkin wrote:
> On Mon, Aug 31, 2015 at 04:32:52PM +0800, Xiao Guangrong wrote:
>>
>>
>> On 08/31/2015 03:46 PM, Michael S. Tsirkin wrote:
>>> On Mon, Aug 31, 2015 at 10:53:58AM +0800, Xiao Guangrong wrote:
>>>>
>>>>
>>>> On 08/30/2015 05:12 PM, Michael S. Tsirkin wrote:
>>>>> Even when we skip data decoding, MMIO is slightly slower
>>>>> than port IO because it uses the page-tables, so the CPU
>>>>> must do a pagewalk on each access.
>>>>>
>>>>> This overhead is normally masked by using the TLB cache:
>>>>> but not so for KVM MMIO, where PTEs are marked as reserved
>>>>> and so are never cached.
>>>>>
>>>>> As ioeventfd memory is never read, make it possible to use
>>>>> RO pages on the host for ioeventfds, instead.
>>>>
>>>> I like this idea.
>>>>
>>>>> The result is that TLBs are cached, which finally makes MMIO
>>>>> as fast as port IO.
>>>>
>>>> What does "TLBs are cached" mean? Even after applying the patch
>>>> no new TLB type can be cached.
>>>
>>> The Intel manual says:
>>> 	No guest-physical mappings or combined mappings are created with
>>> 	information derived from EPT paging-structure entries that are not present
>>> 	(bits 2:0 are all 0) or that are misconfigured (see Section 28.2.3.1).
>>>
>>> 	No combined mappings are created with information derived from guest
>>> 	paging-structure entries that are not present or that set reserved bits.
>>>
>>> Thus mappings that result in EPT violation are created, this makes
>>> EPT violation preferable to EPT misconfiguration.
>>
>> Hmm... but your logic completely bypasses page-table-installation, the page
>> table entry is nonpresent forever for eventfd memory.
>
> As far as I can tell, not really: a non present page is not an EPT violation,

Actually, no.

There are two kinds of EPT VM-exit: one is the EPT violation, which is caused if
the EPT entry is not present or the access permissions are insufficient. The other
is the EPT misconfig, which is caused if EPT reserved bits are set.

> so at the first write, the regular logic will trigger and install the PTE,
> then instruction is re-executed and trigger an EPT violation.
>
>
>>>
>>>
>>>>>
>>>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>>>> ---
>>>>>   arch/x86/kvm/vmx.c | 5 +++++
>>>>>   1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>> index 9d1bfd3..ed44026 100644
>>>>> --- a/arch/x86/kvm/vmx.c
>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>> @@ -5745,6 +5745,11 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>>>>>   		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI);
>>>>>
>>>>>   	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
>>>>> +	if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
>>>>> +		skip_emulated_instruction(vcpu);
>>>>> +		return 1;
>>>>> +	}
>>>>> +
>>>>
>>>> I am afraid that the common page fault entry point is not a good place to do the
>>>> work.
>>>
>>> Why isn't it?
>>
>> 1) You always do bus_write even if it is a read access. You can not assume that the
>>     memory region can't be read by guest.
>>
>> 2) The workload of _bus_write is added for all kinds of page fault, normal #PF is fair
>>     frequent than #PF happens on RO memory.
>
> Normal PF is slow path: you need to hit disk to pull memory from swap,
> etc etc. More importantly, it installs the PTE so the next access
> does not cause an exit at all.
>
> At some level that is the whole point of the patch: we are adding a fast
> path option to what would normally be slow path only, so we aren't
> slowing down anything important.

I have another question: the eventfd memory is never read by the guest, and it
always causes a write MMIO VM-exit, so why do you build it on an RO memslot? Why
not just use a normal MMIO page instead?

>
>> 3) It completely bypasses the logic of handing RO memslot.
>>
>>>
>>>> Would move it to kvm_handle_bad_page()? The different is the workload of
>>>> fast_page_fault() is included but it's light enough and MMIO-exit should not be
>>>> very frequent, so i think it's okay.
>>>
>>> That was supposed to be a slow path, I doubt it'll work well without
>>> major code restructuring.
>>> IIUC by design everything that's not going through fast_page_fault
>>> is supposed to be slow path that only happens rarely.
>>>
>>
>> Do you have performance numbers which compare this patch and the way i figured out?
>
> Not yet.
>
>>> But in this case, the page stays read-only, we need a new fast path
>>> through the code.
>>>
>>
>> Another solution is making MMU recognise the RO region which is write-mostly, then
>> make the page table entry be reserved other than readonly.
>
> Reserved results in EPT misconfiguration, so it's not cached.
>

See my comments above.

One more point: even if the EPT entry is cached, it only grants read-only
permission in the TLB entry, which is useless for speeding up write access.



* Re: [PATCH RFC 1/3] vmx: allow ioeventfd for EPT violations
  2015-08-31 13:23           ` Xiao Guangrong
@ 2015-08-31 14:57             ` Michael S. Tsirkin
  0 siblings, 0 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2015-08-31 14:57 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: linux-kernel, kvm, Paolo Bonzini

On Mon, Aug 31, 2015 at 09:23:13PM +0800, Xiao Guangrong wrote:
> I have another question, the eventfd memory is never read by guest and it's always
> a write MMIO VM-exit, why you build it on RO memslot? Why not just use normal MMIO page
> instead?

We do that at the moment; it's slower than PIO.

-- 
MST


* Re: [PATCH RFC 1/3] vmx: allow ioeventfd for EPT violations
  2015-08-30  9:12 ` [PATCH RFC 1/3] vmx: allow ioeventfd for EPT violations Michael S. Tsirkin
  2015-08-31  2:53   ` Xiao Guangrong
@ 2015-09-01  3:37   ` Jason Wang
  2015-09-01  4:36     ` Michael S. Tsirkin
  1 sibling, 1 reply; 14+ messages in thread
From: Jason Wang @ 2015-09-01  3:37 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel; +Cc: kvm, Paolo Bonzini



On 08/30/2015 05:12 PM, Michael S. Tsirkin wrote:
> Even when we skip data decoding, MMIO is slightly slower
> than port IO because it uses the page-tables, so the CPU
> must do a pagewalk on each access.
>
> This overhead is normally masked by using the TLB cache:
> but not so for KVM MMIO, where PTEs are marked as reserved
> and so are never cached.
>
> As ioeventfd memory is never read, make it possible to use
> RO pages on the host for ioeventfds, instead.
> The result is that TLBs are cached, which finally makes MMIO
> as fast as port IO.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  arch/x86/kvm/vmx.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 9d1bfd3..ed44026 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5745,6 +5745,11 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>  		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI);
>  
>  	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> +	if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
> +		skip_emulated_instruction(vcpu);
> +		return 1;
> +	}
> +
>  	trace_kvm_page_fault(gpa, exit_qualification);
>  
>  	/* It is a write fault? */

Just noticed that vcpu_mmio_write() tries the lapic first. Should we do the
same here? Otherwise we may slow down apic access, considering we may have
hundreds of eventfds.


* Re: [PATCH RFC 1/3] vmx: allow ioeventfd for EPT violations
  2015-09-01  3:37   ` Jason Wang
@ 2015-09-01  4:36     ` Michael S. Tsirkin
  2015-09-01  4:49       ` Jason Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2015-09-01  4:36 UTC (permalink / raw)
  To: Jason Wang; +Cc: linux-kernel, kvm, Paolo Bonzini

On Tue, Sep 01, 2015 at 11:37:13AM +0800, Jason Wang wrote:
> 
> 
> On 08/30/2015 05:12 PM, Michael S. Tsirkin wrote:
> > Even when we skip data decoding, MMIO is slightly slower
> > than port IO because it uses the page-tables, so the CPU
> > must do a pagewalk on each access.
> >
> > This overhead is normally masked by using the TLB cache:
> > but not so for KVM MMIO, where PTEs are marked as reserved
> > and so are never cached.
> >
> > As ioeventfd memory is never read, make it possible to use
> > RO pages on the host for ioeventfds, instead.
> > The result is that TLBs are cached, which finally makes MMIO
> > as fast as port IO.
> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  arch/x86/kvm/vmx.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 9d1bfd3..ed44026 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -5745,6 +5745,11 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
> >  		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI);
> >  
> >  	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> > +	if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
> > +		skip_emulated_instruction(vcpu);
> > +		return 1;
> > +	}
> > +
> >  	trace_kvm_page_fault(gpa, exit_qualification);
> >  
> >  	/* It is a write fault? */
> 
> Just notice that vcpu_mmio_write() tries lapic first. Should we do the
> same here? Otherwise we may slow down apic access consider we may have
> hundreds of eventfds.

IIUC this does not affect mmio at all: for mmio we set the reserved page
flag, so those accesses trigger an EPT misconfiguration, not an EPT
violation.


* Re: [PATCH RFC 1/3] vmx: allow ioeventfd for EPT violations
  2015-09-01  4:36     ` Michael S. Tsirkin
@ 2015-09-01  4:49       ` Jason Wang
  2015-09-01  6:55         ` Michael S. Tsirkin
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Wang @ 2015-09-01  4:49 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-kernel, kvm, Paolo Bonzini



On 09/01/2015 12:36 PM, Michael S. Tsirkin wrote:
> On Tue, Sep 01, 2015 at 11:37:13AM +0800, Jason Wang wrote:
>> > 
>> > 
>> > On 08/30/2015 05:12 PM, Michael S. Tsirkin wrote:
>>> > > Even when we skip data decoding, MMIO is slightly slower
>>> > > than port IO because it uses the page-tables, so the CPU
>>> > > must do a pagewalk on each access.
>>> > >
>>> > > This overhead is normally masked by using the TLB cache:
>>> > > but not so for KVM MMIO, where PTEs are marked as reserved
>>> > > and so are never cached.
>>> > >
>>> > > As ioeventfd memory is never read, make it possible to use
>>> > > RO pages on the host for ioeventfds, instead.
>>> > > The result is that TLBs are cached, which finally makes MMIO
>>> > > as fast as port IO.
>>> > >
>>> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>> > > ---
>>> > >  arch/x86/kvm/vmx.c | 5 +++++
>>> > >  1 file changed, 5 insertions(+)
>>> > >
>>> > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> > > index 9d1bfd3..ed44026 100644
>>> > > --- a/arch/x86/kvm/vmx.c
>>> > > +++ b/arch/x86/kvm/vmx.c
>>> > > @@ -5745,6 +5745,11 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>>> > >  		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI);
>>> > >  
>>> > >  	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
>>> > > +	if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
>>> > > +		skip_emulated_instruction(vcpu);
>>> > > +		return 1;
>>> > > +	}
>>> > > +
>>> > >  	trace_kvm_page_fault(gpa, exit_qualification);
>>> > >  
>>> > >  	/* It is a write fault? */
>> > 
>> > Just noticed that vcpu_mmio_write() tries the lapic first. Should we do
>> > the same here? Otherwise we may slow down apic access, considering we
>> > may have hundreds of eventfds.
> IIUC this does not affect mmio at all: for mmio we set the
> reserved page flag, so such accesses trigger an EPT misconfiguration,
> not an EPT violation.

I see, so the same question could be asked about the current
misconfiguration handler instead?


* Re: [PATCH RFC 1/3] vmx: allow ioeventfd for EPT violations
  2015-09-01  4:49       ` Jason Wang
@ 2015-09-01  6:55         ` Michael S. Tsirkin
  0 siblings, 0 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2015-09-01  6:55 UTC (permalink / raw)
  To: Jason Wang; +Cc: linux-kernel, kvm, Paolo Bonzini

On Tue, Sep 01, 2015 at 12:49:19PM +0800, Jason Wang wrote:
> 
> 
> On 09/01/2015 12:36 PM, Michael S. Tsirkin wrote:
> > On Tue, Sep 01, 2015 at 11:37:13AM +0800, Jason Wang wrote:
> >> > 
> >> > 
> >> > On 08/30/2015 05:12 PM, Michael S. Tsirkin wrote:
> >>> > > [... commit message and patch quoted verbatim upthread, snipped ...]
> >> > 
> >> > Just noticed that vcpu_mmio_write() tries the lapic first. Should we do
> >> > the same here? Otherwise we may slow down apic access, considering we
> >> > may have hundreds of eventfds.
> > IIUC this does not affect mmio at all: for mmio we set the
> > reserved page flag, so such accesses trigger an EPT misconfiguration,
> > not an EPT violation.
> 
> I see, so the same question could be asked about the current
> misconfiguration handler instead?

I don't think there's an issue: that one's only handling slow-path events ATM.

-- 
MST



end of thread

Thread overview: 14+ messages
2015-08-30  9:12 [PATCH RFC 0/3] kvm add ioeventfd pf capability Michael S. Tsirkin
2015-08-30  9:12 ` [PATCH RFC 1/3] vmx: allow ioeventfd for EPT violations Michael S. Tsirkin
2015-08-31  2:53   ` Xiao Guangrong
2015-08-31  7:46     ` Michael S. Tsirkin
2015-08-31  8:32       ` Xiao Guangrong
2015-08-31 11:27         ` Michael S. Tsirkin
2015-08-31 13:23           ` Xiao Guangrong
2015-08-31 14:57             ` Michael S. Tsirkin
2015-09-01  3:37   ` Jason Wang
2015-09-01  4:36     ` Michael S. Tsirkin
2015-09-01  4:49       ` Jason Wang
2015-09-01  6:55         ` Michael S. Tsirkin
2015-08-30  9:12 ` [PATCH RFC 2/3] svm: allow ioeventfd for NPT page faults Michael S. Tsirkin
2015-08-30  9:12 ` [PATCH RFC 3/3] kvm: add KVM_CAP_IOEVENTFD_PF capability Michael S. Tsirkin
