* [PATCH v4 00/12] KVM: Add host swap event notifications for PV guest
@ 2010-07-06 16:24 ` Gleb Natapov
  0 siblings, 0 replies; 67+ messages in thread
From: Gleb Natapov @ 2010-07-06 16:24 UTC (permalink / raw)
  To: kvm
  Cc: linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	riel, cl, mtosatti

KVM virtualizes guest memory by means of shadow pages or HW assistance
like NPT/EPT. Not all memory used by a guest is mapped into the guest
address space or even present in host memory at any given time. When a
vcpu tries to access a memory page that is not mapped into the guest
address space, KVM is notified about it. KVM maps the page into the
guest address space and resumes vcpu execution. If the page is swapped
out of host memory, vcpu execution is suspended until the page has been
swapped back in. This is inefficient, since the vcpu could do other work
(run another task or serve interrupts) while the page is brought in.

To overcome this inefficiency this patch series implements "asynchronous
page fault" for paravirtualized KVM guests. If a page that a vcpu is
trying to access is swapped out, KVM sends an async PF to the vcpu and
continues vcpu execution. The requested page is swapped in by another
thread in parallel. When the vcpu gets the async PF it puts the faulting
task to sleep until a "wake up" interrupt is delivered. When the page has
been brought back into host memory, KVM sends the "wake up" interrupt and
the guest task resumes execution.
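
As an illustration only (this is not code from the series), the guest side
of that protocol boils down to "sleep on a token, wake on the same token":
patch 03 registers the shared per-cpu area through an MSR, and patch 04
keeps sleepers in a small hash keyed by a token read from CR2. A minimal
user-space analogue, with a pthread condvar standing in for the kernel
waitqueue and hash, could look roughly like this (function names are
borrowed from patch 04):

/* apf_wakeup_sketch.c - illustrative only; build with: gcc -pthread */
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static unsigned int ready_token;	/* 0 means "no PAGE_READY seen yet" */

/* "PAGE_NOT_PRESENT" for @token: put the faulting task to sleep */
static void apf_task_wait(unsigned int token)
{
	pthread_mutex_lock(&lock);
	while (ready_token != token)	/* also covers wake arriving first */
		pthread_cond_wait(&cond, &lock);
	ready_token = 0;
	pthread_mutex_unlock(&lock);
}

/* "PAGE_READY" for @token: the host swapped the page in, wake the sleeper */
static void apf_task_wake(unsigned int token)
{
	pthread_mutex_lock(&lock);
	ready_token = token;
	pthread_cond_broadcast(&cond);
	pthread_mutex_unlock(&lock);
}

static void *faulting_task(void *unused)
{
	printf("task: PAGE_NOT_PRESENT, token 42, going to sleep\n");
	apf_task_wait(42);
	printf("task: PAGE_READY for token 42, resuming\n");
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, faulting_task, NULL);
	sleep(1);		/* the "host" swaps the page in meanwhile */
	apf_task_wake(42);
	pthread_join(t, NULL);
	return 0;
}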

To measure the performance benefit I use a simple benchmark program
(below) that starts a number of threads. Some of them do work (increment
a counter), others access a huge array at random locations, trying to
generate host page faults. The size of the array is smaller than guest
memory but bigger than host memory, so we are guaranteed that the host
will swap out part of the array.
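
(The benchmark source appears at the end of this mail; assuming it is
saved as benchmark.c, a build line along the lines of
"gcc -O2 -pthread -o bm benchmark.c" should be enough to produce the "bm"
binary used below.)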

Running the benchmark inside a guest with 4 cpus and 2G of memory,
confined to a 512M memory container on the host, with the command line
"./bm -f 4 -w 4 -t 60" (run 4 faulting threads and 4 working threads for
a minute), I get these results:

With async pf:
start
worker 0: 63972141051
worker 1: 65149033299
worker 2: 66301967246
worker 3: 63423000989
total: 258846142585


Without async pf:
start
worker 0: 30619912622
worker 1: 33951339266
worker 2: 31577780093
worker 3: 33603607972
total: 129752639953

As expected, without async PF the workers achieve only about 50% of the
async PF throughput (129,752,639,953 vs. 258,846,142,585 total
increments); async PF roughly doubles the useful work done.

Perf data look like this:
With async pf:
    97.93%       bm  bm                    [.] work_thread
     1.74%       bm  [kernel.kallsyms]     [k] retint_careful
     0.10%       bm  [kernel.kallsyms]     [k] _raw_spin_unlock_irq
     0.08%       bm  bm                    [.] fault_thread
     0.05%       bm  [kernel.kallsyms]     [k] _raw_spin_unlock_irqrestore
     0.02%       bm  [kernel.kallsyms]     [k] __do_softirq
     0.02%       bm  [kernel.kallsyms]     [k] rcu_process_gp_end

Without async pf:
    63.42%       bm  bm                    [.] work_thread
    13.64%       bm  [kernel.kallsyms]     [k] __do_softirq
     8.95%       bm  bm                    [.] fault_thread
     5.27%       bm  [kernel.kallsyms]     [k] _raw_spin_unlock_irq
     2.79%       bm  [kernel.kallsyms]     [k] hrtimer_run_pending
     2.35%       bm  [kernel.kallsyms]     [k] run_timer_softirq
     1.28%       bm  [kernel.kallsyms]     [k] _raw_spin_lock_irq
     1.16%       bm  [kernel.kallsyms]     [k] debug_smp_processor_id
     0.23%       bm  libc-2.10.2.so        [.] random_r
     0.18%       bm  [kernel.kallsyms]     [k] rcu_bh_qs
     0.18%       bm  [kernel.kallsyms]     [k] find_busiest_group
     0.14%       bm  [kernel.kallsyms]     [k] retint_careful
     0.14%       bm  libc-2.10.2.so        [.] random

Changes:
 v1->v2
   Use an MSR instead of a hypercall.
   Move most of the code into an arch-independent place.
   Halt inside the guest instead of doing a "wait for page" hypercall if
    preemption is disabled.
 v2->v3
   Use an MSR from the 0x4b564dxx range.
   Add slot version tracking.
   Support migration by restarting all guest processes after migration.
   Drop the patch that tracked preemptability for non-preemptable kernels
    due to performance concerns. Send async PF to non-preemptable
    guests only while the vcpu is executing userspace code.
 v3->v4
   Provide an alternative page fault handler in the PV guest instead of
    adding a hook to the standard page fault handler and patching it out
    on non-PV guests.
   Allow only a limited number of outstanding async page faults per vcpu.
   Unify gfn_to_pfn and gfn_to_pfn_async code.
   Cancel outstanding slow work on reset.
 
Gleb Natapov (12):
  Move kvm_smp_prepare_boot_cpu() from kvmclock.c to kvm.c.
  Add PV MSR to enable asynchronous page faults delivery.
  Add async PF initialization to PV guest.
  Provide special async page fault handler when async PF capability is
    detected
  Export __get_user_pages_fast.
  Add get_user_pages() variant that fails if major fault is required.
  Maintain memslot version number
  Inject asynchronous page fault into a guest if page is swapped out.
  Retry fault before vmentry
  Handle async PF in non preemptable context
  Let host know whether the guest can handle async PF in non-userspace
    context.
  Send async PF when guest is not in userspace too.

 arch/x86/include/asm/kvm_host.h |   27 ++++-
 arch/x86/include/asm/kvm_para.h |   14 ++
 arch/x86/include/asm/traps.h    |    1 +
 arch/x86/kernel/entry_32.S      |   10 ++
 arch/x86/kernel/entry_64.S      |    3 +
 arch/x86/kernel/kvm.c           |  261 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/kvmclock.c      |   13 +--
 arch/x86/kernel/smpboot.c       |    3 +
 arch/x86/kvm/Kconfig            |    2 +
 arch/x86/kvm/mmu.c              |   62 +++++++++-
 arch/x86/kvm/paging_tmpl.h      |   50 +++++++-
 arch/x86/kvm/x86.c              |  122 +++++++++++++++++-
 arch/x86/mm/gup.c               |    2 +
 fs/ncpfs/mmap.c                 |    2 +
 include/linux/kvm.h             |    1 +
 include/linux/kvm_host.h        |   32 +++++
 include/linux/kvm_para.h        |    2 +
 include/linux/mm.h              |    5 +
 include/trace/events/kvm.h      |   60 +++++++++
 mm/filemap.c                    |    3 +
 mm/memory.c                     |   31 ++++-
 mm/shmem.c                      |    8 +-
 virt/kvm/Kconfig                |    3 +
 virt/kvm/kvm_main.c             |  266 ++++++++++++++++++++++++++++++++++++++-
 24 files changed, 949 insertions(+), 34 deletions(-)

=== benchmark.c ===

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>

#define FAULTING_THREADS 1
#define WORKING_THREADS 1
#define TIMEOUT 5
#define MEMORY (1024UL*1024*1024)

pthread_barrier_t barrier;
volatile int stop;
size_t pages;

void *fault_thread(void* p)
{
	char *mem = p;

	pthread_barrier_wait(&barrier);

	while (!stop)
		mem[(random() % pages) << 12] = 10;

	pthread_barrier_wait(&barrier);

	return NULL;
}

void *work_thread(void* p)
{
	unsigned long *i = p;

	pthread_barrier_wait(&barrier);

	while (!stop)
		(*i)++;

	pthread_barrier_wait(&barrier);

	return NULL;
}

int main(int argc, char **argv)
{
	int ft = FAULTING_THREADS, wt = WORKING_THREADS;
	unsigned int timeout = TIMEOUT;
	size_t mem = MEMORY;
	void *buf;
	int i, opt, verbose = 0;
	pthread_t t;
	pthread_attr_t pattr;
	unsigned long *res, sum = 0;

	while((opt = getopt(argc, argv, "f:w:m:t:v")) != -1) {
		switch (opt) {
		case 'f':
			ft = atoi(optarg);
			break;
		case 'w':
			wt = atoi(optarg);
			break;
		case 'm':
			mem = atol(optarg);
			break;
		case 't':
			timeout = atoi(optarg);
			break;
		case 'v':
			verbose++;
			break;
		default:
			fprintf(stderr, "Usage %s [-f num] [-w num] [-m byte] [-t secs]\n", argv[0]);
			exit(1);
		}
	}

	if (verbose)
		printf("fault=%d work=%d mem=%lu timeout=%d\n", ft, wt, mem, timeout);

	pages = mem >> 12;
	if (posix_memalign(&buf, 4096, pages << 12)) {
		perror("posix_memalign");
		exit(1);
	}
	res = malloc(sizeof (unsigned long) * wt);
	memset(res, 0, sizeof (unsigned long) * wt);

	pthread_attr_init(&pattr);
	pthread_barrier_init(&barrier, NULL, ft + wt + 1);

	for (i = 0; i < ft; i++) {
		pthread_create(&t, &pattr, fault_thread, buf);
		pthread_detach(t);
	}

	for (i = 0; i < wt; i++) {
		pthread_create(&t, &pattr, work_thread, &res[i]);
		pthread_detach(t);
	}

	/* prefault memory */
	memset(buf, 0, pages << 12);
	printf("start\n");

	pthread_barrier_wait(&barrier);

	pthread_barrier_destroy(&barrier);
	pthread_barrier_init(&barrier, NULL, ft + wt + 1);

	sleep(timeout);
	stop = 1;

	pthread_barrier_wait(&barrier);

	for (i = 0; i < wt; i++) {
		sum += res[i];
		printf("worker %d: %lu\n", i, res[i]);
	}
	printf("total: %lu\n", sum);

	return 0;
}

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH v4 01/12] Move kvm_smp_prepare_boot_cpu() from kvmclock.c to kvm.c.
  2010-07-06 16:24 ` Gleb Natapov
@ 2010-07-06 16:24   ` Gleb Natapov
  -1 siblings, 0 replies; 67+ messages in thread
From: Gleb Natapov @ 2010-07-06 16:24 UTC (permalink / raw)
  To: kvm
  Cc: linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	riel, cl, mtosatti

Async PF also needs to hook into smp_prepare_boot_cpu(), so move the hook
into generic code.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/include/asm/kvm_para.h |    1 +
 arch/x86/kernel/kvm.c           |   11 +++++++++++
 arch/x86/kernel/kvmclock.c      |   13 +------------
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 05eba5e..42298ab 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -65,6 +65,7 @@ struct kvm_mmu_op_release_pt {
 #include <asm/processor.h>
 
 extern void kvmclock_init(void);
+extern int kvm_register_clock(char *txt);
 
 
 /* This instruction is vmcall.  On non-VT architectures, it will generate a
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 63b0ec8..e6db179 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -231,10 +231,21 @@ static void __init paravirt_ops_setup(void)
 #endif
 }
 
+#ifdef CONFIG_SMP
+static void __init kvm_smp_prepare_boot_cpu(void)
+{
+	WARN_ON(kvm_register_clock("primary cpu clock"));
+	native_smp_prepare_boot_cpu();
+}
+#endif
+
 void __init kvm_guest_init(void)
 {
 	if (!kvm_para_available())
 		return;
 
 	paravirt_ops_setup();
+#ifdef CONFIG_SMP
+	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
+#endif
 }
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index eb9b76c..67a5f46 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -125,7 +125,7 @@ static struct clocksource kvm_clock = {
 	.flags = CLOCK_SOURCE_IS_CONTINUOUS,
 };
 
-static int kvm_register_clock(char *txt)
+int kvm_register_clock(char *txt)
 {
 	int cpu = smp_processor_id();
 	int low, high;
@@ -150,14 +150,6 @@ static void __cpuinit kvm_setup_secondary_clock(void)
 }
 #endif
 
-#ifdef CONFIG_SMP
-static void __init kvm_smp_prepare_boot_cpu(void)
-{
-	WARN_ON(kvm_register_clock("primary cpu clock"));
-	native_smp_prepare_boot_cpu();
-}
-#endif
-
 /*
  * After the clock is registered, the host will keep writing to the
  * registered memory location. If the guest happens to shutdown, this memory
@@ -204,9 +196,6 @@ void __init kvmclock_init(void)
 	x86_cpuinit.setup_percpu_clockev =
 		kvm_setup_secondary_clock;
 #endif
-#ifdef CONFIG_SMP
-	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
-#endif
 	machine_ops.shutdown  = kvm_shutdown;
 #ifdef CONFIG_KEXEC
 	machine_ops.crash_shutdown  = kvm_crash_shutdown;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 02/12] Add PV MSR to enable asynchronous page faults delivery.
  2010-07-06 16:24 ` Gleb Natapov
@ 2010-07-06 16:24   ` Gleb Natapov
  -1 siblings, 0 replies; 67+ messages in thread
From: Gleb Natapov @ 2010-07-06 16:24 UTC (permalink / raw)
  To: kvm
  Cc: linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	riel, cl, mtosatti


Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |    3 ++
 arch/x86/include/asm/kvm_para.h |    4 +++
 arch/x86/kvm/x86.c              |   49 +++++++++++++++++++++++++++++++++++++-
 include/linux/kvm.h             |    1 +
 4 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 502e53f..245831a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -364,6 +364,9 @@ struct kvm_vcpu_arch {
 	u64 hv_vapic;
 
 	cpumask_var_t wbinvd_dirty_mask;
+
+	u32 __user *apf_data;
+	u64 apf_msr_val;
 };
 
 struct kvm_arch {
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 42298ab..5b05e9f 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -20,6 +20,7 @@
  * are available. The use of 0x11 and 0x12 is deprecated
  */
 #define KVM_FEATURE_CLOCKSOURCE2        3
+#define KVM_FEATURE_ASYNC_PF		4
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -32,9 +33,12 @@
 /* Custom MSRs falls in the range 0x4b564d00-0x4b564dff */
 #define MSR_KVM_WALL_CLOCK_NEW  0x4b564d00
 #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
+#define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 
 #define KVM_MAX_MMU_OP_BATCH           32
 
+#define KVM_ASYNC_PF_ENABLED			(1 << 0)
+
 /* Operations for KVM_HC_MMU_OP */
 #define KVM_MMU_OP_WRITE_PTE            1
 #define KVM_MMU_OP_FLUSH_TLB	        2
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7070b41..744f8c1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -726,12 +726,12 @@ EXPORT_SYMBOL_GPL(kvm_get_dr);
  * kvm-specific. Those are put in the beginning of the list.
  */
 
-#define KVM_SAVE_MSRS_BEGIN	7
+#define KVM_SAVE_MSRS_BEGIN	8
 static u32 msrs_to_save[] = {
 	MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
 	MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
 	HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
-	HV_X64_MSR_APIC_ASSIST_PAGE,
+	HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN,
 	MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
 	MSR_K6_STAR,
 #ifdef CONFIG_X86_64
@@ -1212,6 +1212,37 @@ static int set_msr_hyperv(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 	return 0;
 }
 
+static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
+{
+	u64 gpa = data & ~0x3f;
+	int offset = offset_in_page(gpa);
+	unsigned long addr;
+
+	/* Bits 1:5 are reserved, should be zero */
+	if (data & 0x3e)
+		return 1;
+
+	vcpu->arch.apf_msr_val = data;
+
+	if (!(data & KVM_ASYNC_PF_ENABLED)) {
+		vcpu->arch.apf_data = NULL;
+		return 0;
+	}
+
+	addr = gfn_to_hva(vcpu->kvm, gpa >> PAGE_SHIFT);
+	if (kvm_is_error_hva(addr))
+		return 1;
+
+	vcpu->arch.apf_data = (u32 __user*)(addr + offset);
+
+	/* check if address is mapped */
+	if (get_user(offset, vcpu->arch.apf_data)) {
+		vcpu->arch.apf_data = NULL;
+		return 1;
+	}
+	return 0;
+}
+
 int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 {
 	switch (msr) {
@@ -1294,6 +1325,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 		kvm_request_guest_time_update(vcpu);
 		break;
 	}
+	case MSR_KVM_ASYNC_PF_EN:
+		if (kvm_pv_enable_async_pf(vcpu, data))
+			return 1;
+		break;
 	case MSR_IA32_MCG_CTL:
 	case MSR_IA32_MCG_STATUS:
 	case MSR_IA32_MC0_CTL ... MSR_IA32_MC0_CTL + 4 * KVM_MAX_MCE_BANKS - 1:
@@ -1546,6 +1581,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	case MSR_KVM_SYSTEM_TIME_NEW:
 		data = vcpu->arch.time;
 		break;
+	case MSR_KVM_ASYNC_PF_EN:
+		data = vcpu->arch.apf_msr_val;
+		break;
 	case MSR_IA32_P5_MC_ADDR:
 	case MSR_IA32_P5_MC_TYPE:
 	case MSR_IA32_MCG_CAP:
@@ -1681,6 +1719,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_DEBUGREGS:
 	case KVM_CAP_X86_ROBUST_SINGLESTEP:
 	case KVM_CAP_XSAVE:
+	case KVM_CAP_ASYNC_PF:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
@@ -5330,6 +5369,9 @@ free_vcpu:
 
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
+	vcpu->arch.apf_data = NULL;
+	vcpu->arch.apf_msr_val = 0;
+
 	vcpu_load(vcpu);
 	kvm_mmu_unload(vcpu);
 	vcpu_put(vcpu);
@@ -5348,6 +5390,9 @@ int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu)
 	vcpu->arch.dr6 = DR6_FIXED_1;
 	vcpu->arch.dr7 = DR7_FIXED_1;
 
+	vcpu->arch.apf_data = NULL;
+	vcpu->arch.apf_msr_val = 0;
+
 	return kvm_x86_ops->vcpu_reset(vcpu);
 }
 
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 636fc38..bab7ef0 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -530,6 +530,7 @@ struct kvm_enable_cap {
 #ifdef __KVM_HAVE_XCRS
 #define KVM_CAP_XCRS 56
 #endif
+#define KVM_CAP_ASYNC_PF 57
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 03/12] Add async PF initialization to PV guest.
  2010-07-06 16:24 ` Gleb Natapov
@ 2010-07-06 16:24   ` Gleb Natapov
  -1 siblings, 0 replies; 67+ messages in thread
From: Gleb Natapov @ 2010-07-06 16:24 UTC (permalink / raw)
  To: kvm
  Cc: linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	riel, cl, mtosatti


Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/include/asm/kvm_para.h |    5 ++++
 arch/x86/kernel/kvm.c           |   49 +++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/smpboot.c       |    3 ++
 include/linux/kvm_para.h        |    2 +
 4 files changed, 59 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 5b05e9f..f1662d7 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -65,6 +65,11 @@ struct kvm_mmu_op_release_pt {
 	__u64 pt_phys;
 };
 
+struct kvm_vcpu_pv_apf_data {
+	__u32 reason;
+	__u32 enabled;
+};
+
 #ifdef __KERNEL__
 #include <asm/processor.h>
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index e6db179..be1966a 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -27,7 +27,10 @@
 #include <linux/mm.h>
 #include <linux/highmem.h>
 #include <linux/hardirq.h>
+#include <linux/notifier.h>
+#include <linux/reboot.h>
 #include <asm/timer.h>
+#include <asm/cpu.h>
 
 #define MMU_QUEUE_SIZE 1024
 
@@ -37,6 +40,7 @@ struct kvm_para_state {
 };
 
 static DEFINE_PER_CPU(struct kvm_para_state, para_state);
+static DEFINE_PER_CPU(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
 
 static struct kvm_para_state *kvm_para_state(void)
 {
@@ -231,10 +235,35 @@ static void __init paravirt_ops_setup(void)
 #endif
 }
 
+static void kvm_pv_disable_apf(void *unused)
+{
+	if (!__get_cpu_var(apf_reason).enabled)
+		return;
+
+	wrmsrl(MSR_KVM_ASYNC_PF_EN, 0);
+	__get_cpu_var(apf_reason).enabled = 0;
+
+	printk(KERN_INFO"Unregister pv shared memory for cpu %d\n",
+	       smp_processor_id());
+}
+
+static int kvm_pv_reboot_notify(struct notifier_block *nb,
+				unsigned long code, void *unused)
+{
+	if (code == SYS_RESTART)
+		on_each_cpu(kvm_pv_disable_apf, NULL, 1);
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block kvm_pv_reboot_nb = {
+	.notifier_call = kvm_pv_reboot_notify,
+};
+
 #ifdef CONFIG_SMP
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
 	WARN_ON(kvm_register_clock("primary cpu clock"));
+	kvm_guest_cpu_init();
 	native_smp_prepare_boot_cpu();
 }
 #endif
@@ -245,7 +274,27 @@ void __init kvm_guest_init(void)
 		return;
 
 	paravirt_ops_setup();
+	register_reboot_notifier(&kvm_pv_reboot_nb);
 #ifdef CONFIG_SMP
 	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
+#else
+	kvm_guest_cpu_init();
 #endif
 }
+
+void __cpuinit kvm_guest_cpu_init(void)
+{
+	if (!kvm_para_available())
+		return;
+
+	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF)) {
+		u64 pa = __pa(&__get_cpu_var(apf_reason));
+
+		if (native_write_msr_safe(MSR_KVM_ASYNC_PF_EN,
+					  pa | KVM_ASYNC_PF_ENABLED, pa >> 32))
+			return;
+		__get_cpu_var(apf_reason).enabled = 1;
+		printk(KERN_INFO"Setup pv shared memory for cpu %d\n",
+		       smp_processor_id());
+	}
+}
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c4f33b2..b70e872 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -67,6 +67,7 @@
 #include <asm/setup.h>
 #include <asm/uv/uv.h>
 #include <linux/mc146818rtc.h>
+#include <linux/kvm_para.h>
 
 #include <asm/smpboot_hooks.h>
 #include <asm/i8259.h>
@@ -329,6 +330,8 @@ notrace static void __cpuinit start_secondary(void *unused)
 	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
 	x86_platform.nmi_init();
 
+	kvm_guest_cpu_init();
+
 	/* enable local interrupts */
 	local_irq_enable();
 
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index d731092..4c8a2e6 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -26,8 +26,10 @@
 #ifdef __KERNEL__
 #ifdef CONFIG_KVM_GUEST
 void __init kvm_guest_init(void);
+void __cpuinit kvm_guest_cpu_init(void);
 #else
 #define kvm_guest_init() do { } while (0)
+#define kvm_guest_cpu_init() do { } while (0)
 #endif
 
 static inline int kvm_para_has_feature(unsigned int feature)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 04/12] Provide special async page fault handler when async PF capability is detected
  2010-07-06 16:24 ` Gleb Natapov
@ 2010-07-06 16:24   ` Gleb Natapov
  -1 siblings, 0 replies; 67+ messages in thread
From: Gleb Natapov @ 2010-07-06 16:24 UTC (permalink / raw)
  To: kvm
  Cc: linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	riel, cl, mtosatti

When the async PF capability is detected, hook up a special page fault
handler that will handle async page fault events and pass all other page
faults on to the regular page fault handler.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/include/asm/kvm_para.h |    3 +
 arch/x86/include/asm/traps.h    |    1 +
 arch/x86/kernel/entry_32.S      |   10 +++
 arch/x86/kernel/entry_64.S      |    3 +
 arch/x86/kernel/kvm.c           |  170 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 187 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index f1662d7..edf07cf 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -65,6 +65,9 @@ struct kvm_mmu_op_release_pt {
 	__u64 pt_phys;
 };
 
+#define KVM_PV_REASON_PAGE_NOT_PRESENT 1
+#define KVM_PV_REASON_PAGE_READY 2
+
 struct kvm_vcpu_pv_apf_data {
 	__u32 reason;
 	__u32 enabled;
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index f66cda5..0310da6 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -30,6 +30,7 @@ asmlinkage void segment_not_present(void);
 asmlinkage void stack_segment(void);
 asmlinkage void general_protection(void);
 asmlinkage void page_fault(void);
+asmlinkage void async_page_fault(void);
 asmlinkage void spurious_interrupt_bug(void);
 asmlinkage void coprocessor_error(void);
 asmlinkage void alignment_check(void);
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index cd49141..95e13da 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -1494,6 +1494,16 @@ ENTRY(general_protection)
 	CFI_ENDPROC
 END(general_protection)
 
+#ifdef CONFIG_KVM_GUEST
+ENTRY(async_page_fault)
+	RING0_EC_FRAME
+	pushl $do_async_page_fault
+	CFI_ADJUST_CFA_OFFSET 4
+	jmp error_code
+	CFI_ENDPROC
+END(async_page_fault)
+#endif
+
 /*
  * End of kprobes section
  */
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 0697ff1..65c3eb6 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1346,6 +1346,9 @@ errorentry xen_stack_segment do_stack_segment
 #endif
 errorentry general_protection do_general_protection
 errorentry page_fault do_page_fault
+#ifdef CONFIG_KVM_GUEST
+errorentry async_page_fault do_async_page_fault
+#endif
 #ifdef CONFIG_X86_MCE
 paranoidzeroentry machine_check *machine_check_vector(%rip)
 #endif
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index be1966a..fa9f520 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -29,8 +29,14 @@
 #include <linux/hardirq.h>
 #include <linux/notifier.h>
 #include <linux/reboot.h>
+#include <linux/hash.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/kprobes.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
+#include <asm/traps.h>
+#include <asm/desc.h>
 
 #define MMU_QUEUE_SIZE 1024
 
@@ -54,6 +60,158 @@ static void kvm_io_delay(void)
 {
 }
 
+#define KVM_TASK_SLEEP_HASHBITS 8
+#define KVM_TASK_SLEEP_HASHSIZE (1<<KVM_TASK_SLEEP_HASHBITS)
+
+struct kvm_task_sleep_node {
+	struct hlist_node link;
+	wait_queue_head_t wq;
+	u32 token;
+	int cpu;
+};
+
+static struct kvm_task_sleep_head {
+	spinlock_t lock;
+	struct hlist_head list;
+} async_pf_sleepers[KVM_TASK_SLEEP_HASHSIZE];
+
+static struct kvm_task_sleep_node *_find_apf_task(struct kvm_task_sleep_head *b,
+						  u64 token)
+{
+	struct hlist_node *p;
+
+	hlist_for_each(p, &b->list) {
+		struct kvm_task_sleep_node *n =
+			hlist_entry(p, typeof(*n), link);
+		if (n->token == token)
+			return n;
+	}
+
+	return NULL;
+}
+
+static void apf_task_wait(struct task_struct *tsk, u32 token)
+{
+	u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
+	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
+	struct kvm_task_sleep_node n, *e;
+	DEFINE_WAIT(wait);
+
+	spin_lock(&b->lock);
+	e = _find_apf_task(b, token);
+	if (e) {
+		/* dummy entry exist -> wake up was delivered ahead of PF */
+		hlist_del(&e->link);
+		kfree(e);
+		spin_unlock(&b->lock);
+		return;
+	}
+
+	n.token = token;
+	n.cpu = smp_processor_id();
+	init_waitqueue_head(&n.wq);
+	hlist_add_head(&n.link, &b->list);
+	spin_unlock(&b->lock);
+
+	for (;;) {
+		prepare_to_wait(&n.wq, &wait, TASK_UNINTERRUPTIBLE);
+		if (hlist_unhashed(&n.link))
+			break;
+		schedule();
+	}
+	finish_wait(&n.wq, &wait);
+
+	return;
+}
+
+static void apf_task_wake_one(struct kvm_task_sleep_node *n)
+{
+	hlist_del_init(&n->link);
+	if (waitqueue_active(&n->wq))
+		wake_up(&n->wq);
+}
+
+static void apf_task_wake(u32 token)
+{
+	u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
+	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
+	struct kvm_task_sleep_node *n;
+
+again:
+	spin_lock(&b->lock);
+	n = _find_apf_task(b, token);
+	if (!n) {
+		/*
+		 * async PF was not yet handled.
+		 * Add dummy entry for the token.
+		 */
+		n = kmalloc(sizeof(*n), GFP_ATOMIC);
+		if (!n) {
+			/*
+			 * Allocation failed! Busy wait while other vcpu
+			 * handles async PF.
+			 */
+			spin_unlock(&b->lock);
+			cpu_relax();
+			goto again;
+		}
+		n->token = token;
+		n->cpu = smp_processor_id();
+		init_waitqueue_head(&n->wq);
+		hlist_add_head(&n->link, &b->list);
+	} else
+		apf_task_wake_one(n);
+	spin_unlock(&b->lock);
+	return;
+}
+
+static void apf_task_wake_all(void)
+{
+	int i;
+
+	for (i = 0; i < KVM_TASK_SLEEP_HASHSIZE; i++) {
+		struct hlist_node *p, *next;
+		struct kvm_task_sleep_head *b = &async_pf_sleepers[i];
+		spin_lock(&b->lock);
+		hlist_for_each_safe(p, next, &b->list) {
+			struct kvm_task_sleep_node *n =
+				hlist_entry(p, typeof(*n), link);
+			if (n->cpu == smp_processor_id())
+				apf_task_wake_one(n);
+		}
+		spin_unlock(&b->lock);
+	}
+}
+
+dotraplinkage void __kprobes
+do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
+{
+	u32 reason = 0, token;
+
+	if (__get_cpu_var(apf_reason).enabled) {
+		reason = __get_cpu_var(apf_reason).reason;
+		__get_cpu_var(apf_reason).reason = 0;
+
+		token = (u32)read_cr2();
+	}
+
+	switch (reason) {
+	default:
+		do_page_fault(regs, error_code);
+		break;
+	case KVM_PV_REASON_PAGE_NOT_PRESENT:
+		/* page is swapped out by the host. */
+		apf_task_wait(current, token);
+		break;
+	case KVM_PV_REASON_PAGE_READY:
+		if (unlikely(token == ~0))
+			apf_task_wake_all();
+		else
+			apf_task_wake(token);
+		break;
+	}
+}
+
 static void kvm_mmu_op(void *buffer, unsigned len)
 {
 	int r;
@@ -268,13 +426,25 @@ static void __init kvm_smp_prepare_boot_cpu(void)
 }
 #endif
 
+static void __init kvm_apf_trap_init(void)
+{
+	set_intr_gate(14, &async_page_fault);
+}
+
 void __init kvm_guest_init(void)
 {
+	int i;
+
 	if (!kvm_para_available())
 		return;
 
 	paravirt_ops_setup();
 	register_reboot_notifier(&kvm_pv_reboot_nb);
+	for (i = 0; i < KVM_TASK_SLEEP_HASHSIZE; i++)
+		spin_lock_init(&async_pf_sleepers[i].lock);
+	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF))
+		x86_init.irqs.trap_init = kvm_apf_trap_init;
+
 #ifdef CONFIG_SMP
 	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
 #else
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 04/12] Provide special async page fault handler when async PF capability is detected
@ 2010-07-06 16:24   ` Gleb Natapov
  0 siblings, 0 replies; 67+ messages in thread
From: Gleb Natapov @ 2010-07-06 16:24 UTC (permalink / raw)
  To: kvm
  Cc: linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	riel, cl, mtosatti

When async PF capability is detected hook up special page fault handler
that will handle async page fault events and bypass other page faults to
regular page fault handler.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/include/asm/kvm_para.h |    3 +
 arch/x86/include/asm/traps.h    |    1 +
 arch/x86/kernel/entry_32.S      |   10 +++
 arch/x86/kernel/entry_64.S      |    3 +
 arch/x86/kernel/kvm.c           |  170 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 187 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index f1662d7..edf07cf 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -65,6 +65,9 @@ struct kvm_mmu_op_release_pt {
 	__u64 pt_phys;
 };
 
+#define KVM_PV_REASON_PAGE_NOT_PRESENT 1
+#define KVM_PV_REASON_PAGE_READY 2
+
 struct kvm_vcpu_pv_apf_data {
 	__u32 reason;
 	__u32 enabled;
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index f66cda5..0310da6 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -30,6 +30,7 @@ asmlinkage void segment_not_present(void);
 asmlinkage void stack_segment(void);
 asmlinkage void general_protection(void);
 asmlinkage void page_fault(void);
+asmlinkage void async_page_fault(void);
 asmlinkage void spurious_interrupt_bug(void);
 asmlinkage void coprocessor_error(void);
 asmlinkage void alignment_check(void);
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index cd49141..95e13da 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -1494,6 +1494,16 @@ ENTRY(general_protection)
 	CFI_ENDPROC
 END(general_protection)
 
+#ifdef CONFIG_KVM_GUEST
+ENTRY(async_page_fault)
+	RING0_EC_FRAME
+	pushl $do_async_page_fault
+	CFI_ADJUST_CFA_OFFSET 4
+	jmp error_code
+	CFI_ENDPROC
+END(apf_page_fault)
+#endif
+
 /*
  * End of kprobes section
  */
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 0697ff1..65c3eb6 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1346,6 +1346,9 @@ errorentry xen_stack_segment do_stack_segment
 #endif
 errorentry general_protection do_general_protection
 errorentry page_fault do_page_fault
+#ifdef CONFIG_KVM_GUEST
+errorentry async_page_fault do_async_page_fault
+#endif
 #ifdef CONFIG_X86_MCE
 paranoidzeroentry machine_check *machine_check_vector(%rip)
 #endif
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index be1966a..fa9f520 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -29,8 +29,14 @@
 #include <linux/hardirq.h>
 #include <linux/notifier.h>
 #include <linux/reboot.h>
+#include <linux/hash.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/kprobes.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
+#include <asm/traps.h>
+#include <asm/desc.h>
 
 #define MMU_QUEUE_SIZE 1024
 
@@ -54,6 +60,158 @@ static void kvm_io_delay(void)
 {
 }
 
+#define KVM_TASK_SLEEP_HASHBITS 8
+#define KVM_TASK_SLEEP_HASHSIZE (1<<KVM_TASK_SLEEP_HASHBITS)
+
+struct kvm_task_sleep_node {
+	struct hlist_node link;
+	wait_queue_head_t wq;
+	u32 token;
+	int cpu;
+};
+
+static struct kvm_task_sleep_head {
+	spinlock_t lock;
+	struct hlist_head list;
+} async_pf_sleepers[KVM_TASK_SLEEP_HASHSIZE];
+
+static struct kvm_task_sleep_node *_find_apf_task(struct kvm_task_sleep_head *b,
+						  u64 token)
+{
+	struct hlist_node *p;
+
+	hlist_for_each(p, &b->list) {
+		struct kvm_task_sleep_node *n =
+			hlist_entry(p, typeof(*n), link);
+		if (n->token == token)
+			return n;
+	}
+
+	return NULL;
+}
+
+static void apf_task_wait(struct task_struct *tsk, u32 token)
+{
+	u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
+	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
+	struct kvm_task_sleep_node n, *e;
+	DEFINE_WAIT(wait);
+
+	spin_lock(&b->lock);
+	e = _find_apf_task(b, token);
+	if (e) {
+		/* dummy entry exist -> wake up was delivered ahead of PF */
+		hlist_del(&e->link);
+		kfree(e);
+		spin_unlock(&b->lock);
+		return;
+	}
+
+	n.token = token;
+	n.cpu = smp_processor_id();
+	init_waitqueue_head(&n.wq);
+	hlist_add_head(&n.link, &b->list);
+	spin_unlock(&b->lock);
+
+	for (;;) {
+		prepare_to_wait(&n.wq, &wait, TASK_UNINTERRUPTIBLE);
+		if (hlist_unhashed(&n.link))
+			break;
+		schedule();
+	}
+	finish_wait(&n.wq, &wait);
+
+	return;
+}
+
+static void apf_task_wake_one(struct kvm_task_sleep_node *n)
+{
+	hlist_del_init(&n->link);
+	if (waitqueue_active(&n->wq))
+		wake_up(&n->wq);
+}
+
+static void apf_task_wake(u32 token)
+{
+	u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
+	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
+	struct kvm_task_sleep_node *n;
+
+again:
+	spin_lock(&b->lock);
+	n = _find_apf_task(b, token);
+	if (!n) {
+		/*
+		 * async PF was not yet handled.
+		 * Add dummy entry for the token.
+		 */
+		n = kmalloc(sizeof(*n), GFP_ATOMIC);
+		if (!n) {
+			/*
+			 * Allocation failed! Busy wait while other vcpu
+			 * handles async PF.
+			 */
+			spin_unlock(&b->lock);
+			cpu_relax();
+			goto again;
+		}
+		n->token = token;
+		n->cpu = smp_processor_id();
+		init_waitqueue_head(&n->wq);
+		hlist_add_head(&n->link, &b->list);
+	} else
+		apf_task_wake_one(n);
+	spin_unlock(&b->lock);
+	return;
+}
+
+static void apf_task_wake_all(void)
+{
+	int i;
+
+	for (i = 0; i < KVM_TASK_SLEEP_HASHSIZE; i++) {
+		struct hlist_node *p, *next;
+		struct kvm_task_sleep_head *b = &async_pf_sleepers[i];
+		spin_lock(&b->lock);
+		hlist_for_each_safe(p, next, &b->list) {
+			struct kvm_task_sleep_node *n =
+				hlist_entry(p, typeof(*n), link);
+			if (n->cpu == smp_processor_id())
+				apf_task_wake_one(n);
+		}
+		spin_unlock(&b->lock);
+	}
+}
+
+dotraplinkage void __kprobes
+do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
+{
+	u32 reason = 0, token;
+
+	if (__get_cpu_var(apf_reason).enabled) {
+		reason = __get_cpu_var(apf_reason).reason;
+		__get_cpu_var(apf_reason).reason = 0;
+
+		token = (u32)read_cr2();
+	}
+
+	switch (reason) {
+	default:
+		do_page_fault(regs, error_code);
+		break;
+	case KVM_PV_REASON_PAGE_NOT_PRESENT:
+		/* page is swapped out by the host. */
+		apf_task_wait(current, token);
+		break;
+	case KVM_PV_REASON_PAGE_READY:
+		if (unlikely(token == ~0))
+			apf_task_wake_all();
+		else
+			apf_task_wake(token);
+		break;
+	}
+}
+
 static void kvm_mmu_op(void *buffer, unsigned len)
 {
 	int r;
@@ -268,13 +426,25 @@ static void __init kvm_smp_prepare_boot_cpu(void)
 }
 #endif
 
+static void __init kvm_apf_trap_init(void)
+{
+	set_intr_gate(14, &async_page_fault);
+}
+
 void __init kvm_guest_init(void)
 {
+	int i;
+
 	if (!kvm_para_available())
 		return;
 
 	paravirt_ops_setup();
 	register_reboot_notifier(&kvm_pv_reboot_nb);
+	for (i = 0; i < KVM_TASK_SLEEP_HASHSIZE; i++)
+		spin_lock_init(&async_pf_sleepers[i].lock);
+	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF))
+		x86_init.irqs.trap_init = kvm_apf_trap_init;
+
 #ifdef CONFIG_SMP
 	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
 #else
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 05/12] Export __get_user_pages_fast.
  2010-07-06 16:24 ` Gleb Natapov
@ 2010-07-06 16:24   ` Gleb Natapov
  -1 siblings, 0 replies; 67+ messages in thread
From: Gleb Natapov @ 2010-07-06 16:24 UTC (permalink / raw)
  To: kvm
  Cc: linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	riel, cl, mtosatti

KVM will use it to try to find a page without falling back to slow gup.
That is why get_user_pages_fast() alone is not enough.
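
For reference, a minimal sketch of the intended calling pattern (illustrative
only; the helper name is made up, and the real caller is the async page fault
code added later in this series):

	/*
	 * Sketch: try the non-sleeping fast walk and let the caller decide
	 * what to do on a miss, instead of falling back to slow gup here.
	 */
	static int try_fast_only(unsigned long addr, struct page **page)
	{
		/* never takes mmap_sem and never sleeps */
		return __get_user_pages_fast(addr, 1, 1, page) == 1;
	}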

Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/mm/gup.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index 738e659..a4ce19f 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -8,6 +8,7 @@
 #include <linux/mm.h>
 #include <linux/vmstat.h>
 #include <linux/highmem.h>
+#include <linux/module.h>
 
 #include <asm/pgtable.h>
 
@@ -274,6 +275,7 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
 
 	return nr;
 }
+EXPORT_SYMBOL_GPL(__get_user_pages_fast);
 
 /**
  * get_user_pages_fast() - pin user pages in memory
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 06/12] Add get_user_pages() variant that fails if major fault is required.
  2010-07-06 16:24 ` Gleb Natapov
@ 2010-07-06 16:24   ` Gleb Natapov
  -1 siblings, 0 replies; 67+ messages in thread
From: Gleb Natapov @ 2010-07-06 16:24 UTC (permalink / raw)
  To: kvm
  Cc: linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	riel, cl, mtosatti

This patch adds a get_user_pages() variant that only succeeds if getting
a reference to a page doesn't require a major fault.
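
As a rough illustration of the intended calling pattern (the concrete user is
the async page fault code later in this series; the helper name here is made
up):

	/* Sketch: pin one page only if no major fault (i.e. no I/O) is needed. */
	static struct page *pin_page_noio(unsigned long addr, int write)
	{
		struct page *page;
		int npages;

		down_read(&current->mm->mmap_sem);
		npages = get_user_pages_noio(current, current->mm, addr, 1,
					     write, 0, &page, NULL);
		up_read(&current->mm->mmap_sem);

		return npages == 1 ? page : NULL;	/* NULL: would need I/O */
	}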

Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 fs/ncpfs/mmap.c    |    2 ++
 include/linux/mm.h |    5 +++++
 mm/filemap.c       |    3 +++
 mm/memory.c        |   31 ++++++++++++++++++++++++++++---
 mm/shmem.c         |    8 +++++++-
 5 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/fs/ncpfs/mmap.c b/fs/ncpfs/mmap.c
index 56f5b3a..b9c4f36 100644
--- a/fs/ncpfs/mmap.c
+++ b/fs/ncpfs/mmap.c
@@ -39,6 +39,8 @@ static int ncp_file_mmap_fault(struct vm_area_struct *area,
 	int bufsize;
 	int pos; /* XXX: loff_t ? */
 
+	if (vmf->flags & FAULT_FLAG_MINOR)
+		return VM_FAULT_MAJOR | VM_FAULT_ERROR;
 	/*
 	 * ncpfs has nothing against high pages as long
 	 * as recvmsg and memset works on it
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 4238a9c..2bfc85a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -140,6 +140,7 @@ extern pgprot_t protection_map[16];
 #define FAULT_FLAG_WRITE	0x01	/* Fault was a write access */
 #define FAULT_FLAG_NONLINEAR	0x02	/* Fault was via a nonlinear mapping */
 #define FAULT_FLAG_MKWRITE	0x04	/* Fault was mkwrite of existing pte */
+#define FAULT_FLAG_MINOR	0x08	/* Do only minor fault */
 
 /*
  * This interface is used by x86 PAT code to identify a pfn mapping that is
@@ -843,6 +844,9 @@ extern int access_process_vm(struct task_struct *tsk, unsigned long addr, void *
 int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 			unsigned long start, int nr_pages, int write, int force,
 			struct page **pages, struct vm_area_struct **vmas);
+int get_user_pages_noio(struct task_struct *tsk, struct mm_struct *mm,
+			unsigned long start, int nr_pages, int write, int force,
+			struct page **pages, struct vm_area_struct **vmas);
 int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 			struct page **pages);
 struct page *get_dump_page(unsigned long addr);
@@ -1373,6 +1377,7 @@ struct page *follow_page(struct vm_area_struct *, unsigned long address,
 #define FOLL_GET	0x04	/* do get_page on page */
 #define FOLL_DUMP	0x08	/* give error on hole if it would be zero */
 #define FOLL_FORCE	0x10	/* get_user_pages read/write w/o permission */
+#define FOLL_MINOR	0x20	/* do only minor page faults */
 
 typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
 			void *data);
diff --git a/mm/filemap.c b/mm/filemap.c
index 20e5642..1186338 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1548,6 +1548,9 @@ int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 			goto no_cached_page;
 		}
 	} else {
+		if (vmf->flags & FAULT_FLAG_MINOR)
+			return VM_FAULT_MAJOR | VM_FAULT_ERROR;
+
 		/* No page in the page cache at all */
 		do_sync_mmap_readahead(vma, ra, file, offset);
 		count_vm_event(PGMAJFAULT);
diff --git a/mm/memory.c b/mm/memory.c
index 119b7cc..7dfaba2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1433,10 +1433,13 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 			cond_resched();
 			while (!(page = follow_page(vma, start, foll_flags))) {
 				int ret;
+				unsigned int fault_fl =
+					((foll_flags & FOLL_WRITE) ?
+					FAULT_FLAG_WRITE : 0) |
+					((foll_flags & FOLL_MINOR) ?
+					FAULT_FLAG_MINOR : 0);
 
-				ret = handle_mm_fault(mm, vma, start,
-					(foll_flags & FOLL_WRITE) ?
-					FAULT_FLAG_WRITE : 0);
+				ret = handle_mm_fault(mm, vma, start, fault_fl);
 
 				if (ret & VM_FAULT_ERROR) {
 					if (ret & VM_FAULT_OOM)
@@ -1444,6 +1447,8 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 					if (ret &
 					    (VM_FAULT_HWPOISON|VM_FAULT_SIGBUS))
 						return i ? i : -EFAULT;
+					else if (ret & VM_FAULT_MAJOR)
+						return i ? i : -EFAULT;
 					BUG();
 				}
 				if (ret & VM_FAULT_MAJOR)
@@ -1554,6 +1559,23 @@ int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 }
 EXPORT_SYMBOL(get_user_pages);
 
+int get_user_pages_noio(struct task_struct *tsk, struct mm_struct *mm,
+		unsigned long start, int nr_pages, int write, int force,
+		struct page **pages, struct vm_area_struct **vmas)
+{
+	int flags = FOLL_TOUCH | FOLL_MINOR;
+
+	if (pages)
+		flags |= FOLL_GET;
+	if (write)
+		flags |= FOLL_WRITE;
+	if (force)
+		flags |= FOLL_FORCE;
+
+	return __get_user_pages(tsk, mm, start, nr_pages, flags, pages, vmas);
+}
+EXPORT_SYMBOL(get_user_pages_noio);
+
 /**
  * get_dump_page() - pin user page in memory while writing it to core dump
  * @addr: user address
@@ -2640,6 +2662,9 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	delayacct_set_flag(DELAYACCT_PF_SWAPIN);
 	page = lookup_swap_cache(entry);
 	if (!page) {
+		if (flags & FAULT_FLAG_MINOR)
+			return VM_FAULT_MAJOR | VM_FAULT_ERROR;
+
 		grab_swap_token(mm); /* Contend for token _before_ read-in */
 		page = swapin_readahead(entry,
 					GFP_HIGHUSER_MOVABLE, vma, address);
diff --git a/mm/shmem.c b/mm/shmem.c
index f65f840..acc8958 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1227,6 +1227,7 @@ static int shmem_getpage(struct inode *inode, unsigned long idx,
 	swp_entry_t swap;
 	gfp_t gfp;
 	int error;
+	int flags = type ? *type : 0;
 
 	if (idx >= SHMEM_MAX_INDEX)
 		return -EFBIG;
@@ -1275,6 +1276,11 @@ repeat:
 		swappage = lookup_swap_cache(swap);
 		if (!swappage) {
 			shmem_swp_unmap(entry);
+			if (flags & FAULT_FLAG_MINOR) {
+				spin_unlock(&info->lock);
+				*type = VM_FAULT_MAJOR | VM_FAULT_ERROR;
+				goto failed;
+			}
 			/* here we actually do the io */
 			if (type && !(*type & VM_FAULT_MAJOR)) {
 				__count_vm_event(PGMAJFAULT);
@@ -1483,7 +1489,7 @@ static int shmem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	struct inode *inode = vma->vm_file->f_path.dentry->d_inode;
 	int error;
-	int ret;
+	int ret = (int)vmf->flags;
 
 	if (((loff_t)vmf->pgoff << PAGE_CACHE_SHIFT) >= i_size_read(inode))
 		return VM_FAULT_SIGBUS;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 07/12] Maintain memslot version number
  2010-07-06 16:24 ` Gleb Natapov
@ 2010-07-06 16:24   ` Gleb Natapov
  -1 siblings, 0 replies; 67+ messages in thread
From: Gleb Natapov @ 2010-07-06 16:24 UTC (permalink / raw)
  To: kvm
  Cc: linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	riel, cl, mtosatti

Code that depends on a particular memslot layout can track changes and
adjust to the new layout.
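
For example, a user that caches a translation derived from the memslot layout
can revalidate it roughly like this (sketch only; the cached_* names and the
helper are illustrative, patch 08 uses this pattern for the per-vcpu apf_data
pointer):

	static int revalidate_cached_hva(struct kvm *kvm, gfn_t gfn,
					 u32 *cached_ver, unsigned long *cached_hva)
	{
		if (unlikely(*cached_ver != kvm->memslot_version)) {
			unsigned long hva = gfn_to_hva(kvm, gfn);

			if (kvm_is_error_hva(hva))
				return -EFAULT;	/* slot is gone, bail out */
			*cached_hva = hva;
			*cached_ver = kvm->memslot_version;
		}
		return 0;
	}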

Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 include/linux/kvm_host.h |    1 +
 virt/kvm/kvm_main.c      |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e796326..64f62f1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -177,6 +177,7 @@ struct kvm {
 	raw_spinlock_t requests_lock;
 	struct mutex slots_lock;
 	struct mm_struct *mm; /* userspace tied to this vm */
+	u32 memslot_version;
 	struct kvm_memslots *memslots;
 	struct srcu_struct srcu;
 #ifdef CONFIG_KVM_APIC_ARCHITECTURE
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a60b6b0..733558c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -730,6 +730,7 @@ skip_lpage:
 	slots->memslots[mem->slot] = new;
 	old_memslots = kvm->memslots;
 	rcu_assign_pointer(kvm->memslots, slots);
+	kvm->memslot_version++;
 	synchronize_srcu_expedited(&kvm->srcu);
 
 	kvm_arch_commit_memory_region(kvm, mem, old, user_alloc);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 08/12] Inject asynchronous page fault into a guest if page is swapped out.
  2010-07-06 16:24 ` Gleb Natapov
@ 2010-07-06 16:24   ` Gleb Natapov
  -1 siblings, 0 replies; 67+ messages in thread
From: Gleb Natapov @ 2010-07-06 16:24 UTC (permalink / raw)
  To: kvm
  Cc: linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	riel, cl, mtosatti

If the guest accesses swapped-out memory, do not swap it in from the vcpu
thread context. Set up a slow work item to do the swapping and send an
async page fault to the guest.

Allow async page fault injection only when the guest is in user mode, since
otherwise the guest may be in a non-sleepable context and will not be able
to reschedule.
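
Condensed sketch of the resulting fault path (the real changes are in the
mmu.c and paging_tmpl.h hunks below; the wrapper function here is
illustrative, the callees are the ones added by this patch):

	static pfn_t fault_in_or_defer(struct kvm_vcpu *vcpu, gva_t gva,
				       gfn_t gfn, bool *deferred)
	{
		bool async = false;
		pfn_t pfn;

		*deferred = false;

		if (can_do_async_pf(vcpu)) {
			pfn = gfn_to_pfn_async(vcpu->kvm, gfn, &async);
			if (!async)
				return pfn;	/* page is already resident */
			if (kvm_arch_setup_async_pf(vcpu, gva, gfn)) {
				/* guest gets a "page not present" async PF */
				*deferred = true;
				return pfn;
			}
			/* could not queue the slow work item, fall back */
		}
		return gfn_to_pfn(vcpu->kvm, gfn);	/* may swap in synchronously */
	}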

Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |   16 +++
 arch/x86/kvm/Kconfig            |    2 +
 arch/x86/kvm/mmu.c              |   35 +++++-
 arch/x86/kvm/paging_tmpl.h      |   17 +++-
 arch/x86/kvm/x86.c              |   63 +++++++++-
 include/linux/kvm_host.h        |   31 +++++
 include/trace/events/kvm.h      |   60 +++++++++
 virt/kvm/Kconfig                |    3 +
 virt/kvm/kvm_main.c             |  263 ++++++++++++++++++++++++++++++++++++++-
 9 files changed, 481 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 245831a..db514ea 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -366,7 +366,9 @@ struct kvm_vcpu_arch {
 	cpumask_var_t wbinvd_dirty_mask;
 
 	u32 __user *apf_data;
+	u32 apf_memslot_ver;
 	u64 apf_msr_val;
+	u32 async_pf_id;
 };
 
 struct kvm_arch {
@@ -444,6 +446,8 @@ struct kvm_vcpu_stat {
 	u32 hypercalls;
 	u32 irq_injections;
 	u32 nmi_injections;
+	u32 apf_not_present;
+	u32 apf_present;
 };
 
 struct kvm_x86_ops {
@@ -528,6 +532,10 @@ struct kvm_x86_ops {
 	const struct trace_print_flags *exit_reasons_str;
 };
 
+struct kvm_arch_async_pf {
+	u32 token;
+};
+
 extern struct kvm_x86_ops *kvm_x86_ops;
 
 int kvm_mmu_module_init(void);
@@ -763,4 +771,12 @@ void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
 
 bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
 
+struct kvm_async_pf;
+
+void kvm_arch_inject_async_page_not_present(struct kvm_vcpu *vcpu,
+					    struct kvm_async_pf *work);
+void kvm_arch_inject_async_page_present(struct kvm_vcpu *vcpu,
+					struct kvm_async_pf *work);
+bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu);
 #endif /* _ASM_X86_KVM_HOST_H */
+
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 970bbd4..2461284 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -28,6 +28,8 @@ config KVM
 	select HAVE_KVM_IRQCHIP
 	select HAVE_KVM_EVENTFD
 	select KVM_APIC_ARCHITECTURE
+	select KVM_ASYNC_PF
+	select SLOW_WORK
 	select USER_RETURN_NOTIFIER
 	select KVM_MMIO
 	---help---
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c515753..a49565b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -21,6 +21,7 @@
 #include "mmu.h"
 #include "x86.h"
 #include "kvm_cache_regs.h"
+#include "x86.h"
 
 #include <linux/kvm_host.h>
 #include <linux/types.h>
@@ -2264,6 +2265,21 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
 			     error_code & PFERR_WRITE_MASK, gfn);
 }
 
+int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
+{
+	struct kvm_arch_async_pf arch;
+	arch.token = (vcpu->arch.async_pf_id++ << 12) | vcpu->vcpu_id;
+	return kvm_setup_async_pf(vcpu, gva, gfn, &arch);
+}
+
+static bool can_do_async_pf(struct kvm_vcpu *vcpu)
+{
+	if (!vcpu->arch.apf_data || kvm_event_needs_reinjection(vcpu))
+		return false;
+
+	return !!kvm_x86_ops->get_cpl(vcpu);
+}
+
 static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
 				u32 error_code)
 {
@@ -2272,6 +2288,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
 	int level;
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	unsigned long mmu_seq;
+	bool async;
 
 	ASSERT(vcpu);
 	ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));
@@ -2286,7 +2303,23 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
 
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
-	pfn = gfn_to_pfn(vcpu->kvm, gfn);
+
+	if (can_do_async_pf(vcpu)) {
+		pfn = gfn_to_pfn_async(vcpu->kvm, gfn, &async);
+		trace_kvm_try_async_get_page(async, pfn);
+	} else {
+do_sync:
+		async = false;
+		pfn = gfn_to_pfn(vcpu->kvm, gfn);
+	}
+
+	if (async) {
+		if (!kvm_arch_setup_async_pf(vcpu, gpa, gfn))
+			goto do_sync;
+		return 0;
+	}
+
+	/* mmio */
 	if (is_error_pfn(pfn))
 		return kvm_handle_bad_page(vcpu->kvm, gfn, pfn);
 	spin_lock(&vcpu->kvm->mmu_lock);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 3350c02..26d6b74 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -423,6 +423,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
 	pfn_t pfn;
 	int level = PT_PAGE_TABLE_LEVEL;
 	unsigned long mmu_seq;
+	bool async;
 
 	pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
 	kvm_mmu_audit(vcpu, "pre page fault");
@@ -454,7 +455,21 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
 
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
-	pfn = gfn_to_pfn(vcpu->kvm, walker.gfn);
+
+	if (can_do_async_pf(vcpu)) {
+		pfn = gfn_to_pfn_async(vcpu->kvm, walker.gfn, &async);
+		trace_kvm_try_async_get_page(async, pfn);
+	} else {
+do_sync:
+		async = false;
+		pfn = gfn_to_pfn(vcpu->kvm, walker.gfn);
+	}
+
+	if (async) {
+		if (!kvm_arch_setup_async_pf(vcpu, addr, walker.gfn))
+			goto do_sync;
+		return 0;
+	}
 
 	/* mmio */
 	if (is_error_pfn(pfn))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 744f8c1..6b7542f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -118,6 +118,8 @@ static DEFINE_PER_CPU(struct kvm_shared_msrs, shared_msrs);
 struct kvm_stats_debugfs_item debugfs_entries[] = {
 	{ "pf_fixed", VCPU_STAT(pf_fixed) },
 	{ "pf_guest", VCPU_STAT(pf_guest) },
+	{ "apf_not_present", VCPU_STAT(apf_not_present) },
+	{ "apf_present", VCPU_STAT(apf_present) },
 	{ "tlb_flush", VCPU_STAT(tlb_flush) },
 	{ "invlpg", VCPU_STAT(invlpg) },
 	{ "exits", VCPU_STAT(exits) },
@@ -1226,6 +1228,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 
 	if (!(data & KVM_ASYNC_PF_ENABLED)) {
 		vcpu->arch.apf_data = NULL;
+		kvm_clear_async_pf_completion_queue(vcpu);
 		return 0;
 	}
 
@@ -1240,6 +1243,8 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 		vcpu->arch.apf_data = NULL;
 		return 1;
 	}
+	vcpu->arch.apf_memslot_ver = vcpu->kvm->memslot_version;
+	kvm_async_pf_wakeup_all(vcpu);
 	return 0;
 }
 
@@ -4721,6 +4726,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	if (unlikely(r))
 		goto out;
 
+	kvm_check_async_pf_completion(vcpu);
+
 	preempt_disable();
 
 	kvm_x86_ops->prepare_guest_switch(vcpu);
@@ -5393,6 +5400,8 @@ int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu)
 	vcpu->arch.apf_data = NULL;
 	vcpu->arch.apf_msr_val = 0;
 
+	kvm_clear_async_pf_completion_queue(vcpu);
+
 	return kvm_x86_ops->vcpu_reset(vcpu);
 }
 
@@ -5534,8 +5543,10 @@ static void kvm_free_vcpus(struct kvm *kvm)
 	/*
 	 * Unpin any mmu pages first.
 	 */
-	kvm_for_each_vcpu(i, vcpu, kvm)
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		kvm_clear_async_pf_completion_queue(vcpu);
 		kvm_unload_vcpu_mmu(vcpu);
+	}
 	kvm_for_each_vcpu(i, vcpu, kvm)
 		kvm_arch_vcpu_free(vcpu);
 
@@ -5647,6 +5658,7 @@ void kvm_arch_flush_shadow(struct kvm *kvm)
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE
+		|| !list_empty_careful(&vcpu->async_pf_done)
 		|| vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
 		|| vcpu->arch.nmi_pending ||
 		(kvm_arch_interrupt_allowed(vcpu) &&
@@ -5704,6 +5716,55 @@ void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 }
 EXPORT_SYMBOL_GPL(kvm_set_rflags);
 
+static int apf_put_user(struct kvm_vcpu *vcpu, u32 val)
+{
+	if (unlikely(vcpu->arch.apf_memslot_ver !=
+		     vcpu->kvm->memslot_version)) {
+		u64 gpa = vcpu->arch.apf_msr_val & ~0x3f;
+		unsigned long addr;
+		int offset = offset_in_page(gpa);
+
+		addr = gfn_to_hva(vcpu->kvm, gpa >> PAGE_SHIFT);
+		vcpu->arch.apf_data = (u32 __user*)(addr + offset);
+		if (kvm_is_error_hva(addr)) {
+			vcpu->arch.apf_data = NULL;
+			return -EFAULT;
+		}
+	}
+
+	return put_user(val, vcpu->arch.apf_data);
+}
+
+void kvm_arch_inject_async_page_not_present(struct kvm_vcpu *vcpu,
+					    struct kvm_async_pf *work)
+{
+	if (!apf_put_user(vcpu, KVM_PV_REASON_PAGE_NOT_PRESENT)) {
+		kvm_inject_page_fault(vcpu, work->arch.token, 0);
+		++vcpu->stat.apf_not_present;
+		trace_kvm_send_async_pf(work->arch.token, work->gva,
+					KVM_PV_REASON_PAGE_NOT_PRESENT);
+	}
+}
+
+void kvm_arch_inject_async_page_present(struct kvm_vcpu *vcpu,
+					struct kvm_async_pf *work)
+{
+	if (is_error_page(work->page))
+		work->arch.token = ~0; /* broadcast wakeup */
+	if (!apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
+		kvm_inject_page_fault(vcpu, work->arch.token, 0);
+		++vcpu->stat.apf_present;
+		trace_kvm_send_async_pf(work->arch.token, work->gva,
+					KVM_PV_REASON_PAGE_READY);
+	}
+}
+
+bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu)
+{
+	return !kvm_event_needs_reinjection(vcpu) &&
+		kvm_x86_ops->interrupt_allowed(vcpu);
+}
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 64f62f1..09c646f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -16,6 +16,7 @@
 #include <linux/mm.h>
 #include <linux/preempt.h>
 #include <linux/msi.h>
+#include <linux/slow-work.h>
 #include <asm/signal.h>
 
 #include <linux/kvm.h>
@@ -73,6 +74,27 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx,
 int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
 			      struct kvm_io_device *dev);
 
+#ifdef CONFIG_KVM_ASYNC_PF
+struct kvm_async_pf {
+	struct slow_work work;
+	struct list_head link;
+	struct list_head queue;
+	struct kvm_vcpu *vcpu;
+	struct mm_struct *mm;
+	gva_t gva;
+	unsigned long addr;
+	struct kvm_arch_async_pf arch;
+	struct page *page;
+	atomic_t used;
+};
+
+void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
+void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
+int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
+		       struct kvm_arch_async_pf *arch);
+int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
+#endif
+
 struct kvm_vcpu {
 	struct kvm *kvm;
 #ifdef CONFIG_PREEMPT_NOTIFIERS
@@ -103,6 +125,14 @@ struct kvm_vcpu {
 	gpa_t mmio_phys_addr;
 #endif
 
+#ifdef CONFIG_KVM_ASYNC_PF
+	u32 async_pf_queued;
+	struct list_head async_pf_queue;
+	struct list_head async_pf_done;
+	spinlock_t async_pf_lock;
+	struct kvm_async_pf *async_pf_work;
+#endif
+
 	struct kvm_vcpu_arch arch;
 };
 
@@ -296,6 +326,7 @@ void kvm_release_page_dirty(struct page *page);
 void kvm_set_page_dirty(struct page *page);
 void kvm_set_page_accessed(struct page *page);
 
+pfn_t gfn_to_pfn_async(struct kvm *kvm, gfn_t gfn, bool *async);
 pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
 pfn_t gfn_to_pfn_memslot(struct kvm *kvm,
 			 struct kvm_memory_slot *slot, gfn_t gfn);
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index 6dd3a51..6d9f0c2 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -185,6 +185,66 @@ TRACE_EVENT(kvm_age_page,
 		  __entry->referenced ? "YOUNG" : "OLD")
 );
 
+#ifdef CONFIG_KVM_ASYNC_PF
+TRACE_EVENT(
+	kvm_try_async_get_page,
+	TP_PROTO(bool async, u64 pfn),
+	TP_ARGS(async, pfn),
+
+	TP_STRUCT__entry(
+		__field(__u64, pfn)
+		),
+
+	TP_fast_assign(
+		__entry->pfn = (!async) ? pfn : (u64)-1;
+		),
+
+	TP_printk("pfn %#llx", __entry->pfn)
+);
+
+TRACE_EVENT(
+	kvm_send_async_pf,
+	TP_PROTO(u64 token, u64 gva, u64 reason),
+	TP_ARGS(token, gva, reason),
+
+	TP_STRUCT__entry(
+		__field(__u64, token)
+		__field(__u64, gva)
+		__field(bool, np)
+		),
+
+	TP_fast_assign(
+		__entry->token = token;
+		__entry->gva = gva;
+		__entry->np = (reason == KVM_PV_REASON_PAGE_NOT_PRESENT);
+		),
+
+	TP_printk("token %#llx gva %#llx %s", __entry->token, __entry->gva,
+		  __entry->np ? "not present" : "ready")
+);
+
+TRACE_EVENT(
+	kvm_async_pf_completed,
+	TP_PROTO(unsigned long address, struct page *page, u64 gva),
+	TP_ARGS(address, page, gva),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, address)
+		__field(struct page*, page)
+		__field(u64, gva)
+		),
+
+	TP_fast_assign(
+		__entry->address = address;
+		__entry->page = page;
+		__entry->gva = gva;
+		),
+
+	TP_printk("gva %#llx address %#lx pfn %lx",  __entry->gva,
+		  __entry->address, page_to_pfn(__entry->page))
+);
+#endif
+
 #endif /* _TRACE_KVM_MAIN_H */
 
 /* This part must be outside protection */
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 7f1178f..f63ccb0 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -15,3 +15,6 @@ config KVM_APIC_ARCHITECTURE
 
 config KVM_MMIO
        bool
+
+config KVM_ASYNC_PF
+       bool
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 733558c..0656054 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -78,6 +78,11 @@ static atomic_t hardware_enable_failed;
 struct kmem_cache *kvm_vcpu_cache;
 EXPORT_SYMBOL_GPL(kvm_vcpu_cache);
 
+#ifdef CONFIG_KVM_ASYNC_PF
+#define ASYNC_PF_PER_VCPU 100
+static struct kmem_cache *async_pf_cache;
+#endif
+
 static __read_mostly struct preempt_ops kvm_preempt_ops;
 
 struct dentry *kvm_debugfs_dir;
@@ -183,6 +188,11 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 	vcpu->kvm = kvm;
 	vcpu->vcpu_id = id;
 	init_waitqueue_head(&vcpu->wq);
+#ifdef CONFIG_KVM_ASYNC_PF
+	INIT_LIST_HEAD(&vcpu->async_pf_done);
+	INIT_LIST_HEAD(&vcpu->async_pf_queue);
+	spin_lock_init(&vcpu->async_pf_lock);
+#endif
 
 	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
 	if (!page) {
@@ -935,7 +945,7 @@ unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
 }
 EXPORT_SYMBOL_GPL(gfn_to_hva);
 
-static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr)
+static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr, bool *async)
 {
 	struct page *page[1];
 	int npages;
@@ -943,7 +953,19 @@ static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr)
 
 	might_sleep();
 
-	npages = get_user_pages_fast(addr, 1, 1, page);
+	if (async) {
+#ifdef CONFIG_X86
+		npages = __get_user_pages_fast(addr, 1, 1, page);
+#endif
+		if (unlikely(npages != 1)) {
+			down_read(&current->mm->mmap_sem);
+			npages = get_user_pages_noio(current, current->mm,
+						     addr, 1, 1, 0, page,
+						     NULL);
+			up_read(&current->mm->mmap_sem);
+		}
+	} else
+		npages = get_user_pages_fast(addr, 1, 1, page);
 
 	if (unlikely(npages != 1)) {
 		struct vm_area_struct *vma;
@@ -959,6 +981,9 @@ static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr)
 
 		if (vma == NULL || addr < vma->vm_start ||
 		    !(vma->vm_flags & VM_PFNMAP)) {
+			if (async && !(vma->vm_flags & VM_PFNMAP) &&
+			    (vma->vm_flags & VM_WRITE))
+				*async = true;
 			up_read(&current->mm->mmap_sem);
 			get_page(bad_page);
 			return page_to_pfn(bad_page);
@@ -973,25 +998,37 @@ static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr)
 	return pfn;
 }
 
-pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
+static inline pfn_t __gfn_to_pfn(struct kvm *kvm, gfn_t gfn, bool *async)
 {
 	unsigned long addr;
 
+	if (async)
+		*async = false;
 	addr = gfn_to_hva(kvm, gfn);
 	if (kvm_is_error_hva(addr)) {
 		get_page(bad_page);
 		return page_to_pfn(bad_page);
 	}
 
-	return hva_to_pfn(kvm, addr);
+	return hva_to_pfn(kvm, addr, async);
+}
+
+pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
+{
+	return __gfn_to_pfn(kvm, gfn, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn);
 
+pfn_t gfn_to_pfn_async(struct kvm *kvm, gfn_t gfn, bool *async)
+{
+	return __gfn_to_pfn(kvm, gfn, async);
+}
+
 pfn_t gfn_to_pfn_memslot(struct kvm *kvm,
 			 struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	unsigned long addr = gfn_to_hva_memslot(slot, gfn);
-	return hva_to_pfn(kvm, addr);
+	return hva_to_pfn(kvm, addr, NULL);
 }
 
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
@@ -1204,6 +1241,196 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
 	}
 }
 
+#ifdef CONFIG_KVM_ASYNC_PF
+static void async_pf_work_free(struct kvm_async_pf *apf)
+{
+	if (atomic_dec_and_test(&apf->used))
+		kmem_cache_free(async_pf_cache, apf);
+}
+
+static int async_pf_get_ref(struct slow_work *work)
+{
+	struct kvm_async_pf *apf =
+		container_of(work, struct kvm_async_pf, work);
+
+	atomic_inc(&apf->used);
+	return 0;
+}
+
+static void async_pf_put_ref(struct slow_work *work)
+{
+	struct kvm_async_pf *apf =
+		container_of(work, struct kvm_async_pf, work);
+
+	kvm_put_kvm(apf->vcpu->kvm);
+	async_pf_work_free(apf);
+}
+
+static void async_pf_execute(struct slow_work *work)
+{
+	struct page *page;
+	struct kvm_async_pf *apf =
+		container_of(work, struct kvm_async_pf, work);
+	wait_queue_head_t *q = &apf->vcpu->wq;
+
+	might_sleep();
+
+	down_read(&apf->mm->mmap_sem);
+	get_user_pages(current, apf->mm, apf->addr, 1, 1, 0, &page, NULL);
+	up_read(&apf->mm->mmap_sem);
+
+	spin_lock(&apf->vcpu->async_pf_lock);
+	list_add_tail(&apf->link, &apf->vcpu->async_pf_done);
+	apf->page = page;
+	spin_unlock(&apf->vcpu->async_pf_lock);
+
+	trace_kvm_async_pf_completed(apf->addr, apf->page, apf->gva);
+
+	if (waitqueue_active(q))
+		wake_up_interruptible(q);
+
+	mmdrop(apf->mm);
+}
+
+struct slow_work_ops async_pf_ops = {
+	.get_ref = async_pf_get_ref,
+	.put_ref = async_pf_put_ref,
+	.execute = async_pf_execute
+};
+
+void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
+{
+	/* cancel outstanding slow work item */
+	while (!list_empty(&vcpu->async_pf_queue)) {
+		struct kvm_async_pf *work =
+			list_entry(vcpu->async_pf_queue.next,
+				   typeof(*work), queue);
+		slow_work_cancel(&work->work);
+		list_del(&work->queue);
+		if (!work->page) /* work was canceled */
+			kmem_cache_free(async_pf_cache, work);
+	}
+
+	spin_lock(&vcpu->async_pf_lock);
+	while (!list_empty(&vcpu->async_pf_done)) {
+		struct kvm_async_pf *work =
+			list_entry(vcpu->async_pf_done.next,
+				   typeof(*work), link);
+		list_del(&work->link);
+		put_page(work->page);
+		kmem_cache_free(async_pf_cache, work);
+	}
+	spin_unlock(&vcpu->async_pf_lock);
+
+	vcpu->async_pf_queued = 0;
+	vcpu->async_pf_work = NULL;
+}
+
+void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
+{
+	struct kvm_async_pf *work = vcpu->async_pf_work;
+
+	if (work) {
+		vcpu->async_pf_work = NULL;
+		if (work->page == NULL) {
+			kvm_arch_inject_async_page_not_present(vcpu, work);
+			return;
+		} else {
+			spin_lock(&vcpu->async_pf_lock);
+			list_del(&work->link);
+			spin_unlock(&vcpu->async_pf_lock);
+			put_page(work->page);
+			async_pf_work_free(work);
+			list_del(&work->queue);
+			vcpu->async_pf_queued--;
+		}
+	}
+
+	if (list_empty_careful(&vcpu->async_pf_done) ||
+	    !kvm_arch_can_inject_async_page_present(vcpu))
+		return;
+
+	spin_lock(&vcpu->async_pf_lock);
+	work = list_first_entry(&vcpu->async_pf_done, typeof(*work), link);
+	list_del(&work->link);
+	spin_unlock(&vcpu->async_pf_lock);
+	list_del(&work->queue);
+	vcpu->async_pf_queued--;
+
+	kvm_arch_inject_async_page_present(vcpu, work);
+
+	put_page(work->page);
+	async_pf_work_free(work);
+}
+
+int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
+		       struct kvm_arch_async_pf *arch)
+{
+	struct kvm_async_pf *work;
+
+	if (vcpu->async_pf_queued >= ASYNC_PF_PER_VCPU)
+		return 0;
+
+	/* setup slow work */
+
+	/* do alloc atomic since if we are going to sleep anyway we
+	   may as well sleep faulting in page */
+	work = kmem_cache_zalloc(async_pf_cache, GFP_ATOMIC);
+	if (!work)
+		return 0;
+
+	atomic_set(&work->used, 1);
+	work->page = NULL;
+	work->vcpu = vcpu;
+	work->gva = gva;
+	work->addr = gfn_to_hva(vcpu->kvm, gfn);
+	work->arch = *arch;
+	work->mm = current->mm;
+	atomic_inc(&work->mm->mm_count);
+	kvm_get_kvm(work->vcpu->kvm);
+
+	/* this can't really happen otherwise gfn_to_pfn_async
+	   would succeed */
+	if (unlikely(kvm_is_error_hva(work->addr)))
+		goto retry_sync;
+
+	slow_work_init(&work->work, &async_pf_ops);
+	if (slow_work_enqueue(&work->work) != 0)
+		goto retry_sync;
+
+	vcpu->async_pf_work = work;
+	list_add_tail(&work->queue, &vcpu->async_pf_queue);
+	vcpu->async_pf_queued++;
+	return 1;
+retry_sync:
+	kvm_put_kvm(work->vcpu->kvm);
+	mmdrop(work->mm);
+	kmem_cache_free(async_pf_cache, work);
+	return 0;
+}
+
+int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu)
+{
+	struct kvm_async_pf *work;
+
+	if (!list_empty(&vcpu->async_pf_done))
+		return 0;
+
+	work = kmem_cache_zalloc(async_pf_cache, GFP_ATOMIC);
+	if (!work)
+		return -ENOMEM;
+
+	atomic_set(&work->used, 1);
+	work->page = bad_page;
+	get_page(bad_page);
+	INIT_LIST_HEAD(&work->queue); /* for list_del to work */
+
+	list_add_tail(&work->link, &vcpu->async_pf_done);
+	vcpu->async_pf_queued++;
+	return 0;
+}
+#endif
+
 /*
  * The vCPU has executed a HLT instruction with in-kernel mode enabled.
  */
@@ -2267,6 +2494,19 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
 		goto out_free_5;
 	}
 
+#ifdef CONFIG_KVM_ASYNC_PF
+	async_pf_cache = KMEM_CACHE(kvm_async_pf, 0);
+
+	if (!async_pf_cache) {
+		r = -ENOMEM;
+		goto out_free_6;
+	}
+
+	r = slow_work_register_user(THIS_MODULE);
+	if (r)
+		goto out_free;
+#endif
+
 	kvm_chardev_ops.owner = module;
 	kvm_vm_fops.owner = module;
 	kvm_vcpu_fops.owner = module;
@@ -2274,7 +2514,7 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
 	r = misc_register(&kvm_dev);
 	if (r) {
 		printk(KERN_ERR "kvm: misc device register failed\n");
-		goto out_free;
+		goto out_unreg;
 	}
 
 	kvm_preempt_ops.sched_in = kvm_sched_in;
@@ -2284,7 +2524,13 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
 
 	return 0;
 
+out_unreg:
+#ifdef CONFIG_KVM_ASYNC_PF
+	slow_work_unregister_user(THIS_MODULE);
 out_free:
+	kmem_cache_destroy(async_pf_cache);
+out_free_6:
+#endif
 	kmem_cache_destroy(kvm_vcpu_cache);
 out_free_5:
 	sysdev_unregister(&kvm_sysdev);
@@ -2314,6 +2560,11 @@ void kvm_exit(void)
 	kvm_exit_debug();
 	misc_deregister(&kvm_dev);
 	kmem_cache_destroy(kvm_vcpu_cache);
+#ifdef CONFIG_KVM_ASYNC_PF
+	if (async_pf_cache)
+		kmem_cache_destroy(async_pf_cache);
+	slow_work_unregister_user(THIS_MODULE);
+#endif
 	sysdev_unregister(&kvm_sysdev);
 	sysdev_class_unregister(&kvm_sysdev_class);
 	unregister_reboot_notifier(&kvm_reboot_notifier);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 08/12] Inject asynchronous page fault into a guest if page is swapped out.
@ 2010-07-06 16:24   ` Gleb Natapov
  0 siblings, 0 replies; 67+ messages in thread
From: Gleb Natapov @ 2010-07-06 16:24 UTC (permalink / raw)
  To: kvm
  Cc: linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	riel, cl, mtosatti

If the guest accesses swapped-out memory, do not swap it in from the vcpu
thread context. Set up a slow work item to do the swapping and send an
async page fault to the guest.

Allow async page fault injection only when the guest is in user mode, since
otherwise the guest may be in a non-sleepable context and will not be able
to reschedule.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |   16 +++
 arch/x86/kvm/Kconfig            |    2 +
 arch/x86/kvm/mmu.c              |   35 +++++-
 arch/x86/kvm/paging_tmpl.h      |   17 +++-
 arch/x86/kvm/x86.c              |   63 +++++++++-
 include/linux/kvm_host.h        |   31 +++++
 include/trace/events/kvm.h      |   60 +++++++++
 virt/kvm/Kconfig                |    3 +
 virt/kvm/kvm_main.c             |  263 ++++++++++++++++++++++++++++++++++++++-
 9 files changed, 481 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 245831a..db514ea 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -366,7 +366,9 @@ struct kvm_vcpu_arch {
 	cpumask_var_t wbinvd_dirty_mask;
 
 	u32 __user *apf_data;
+	u32 apf_memslot_ver;
 	u64 apf_msr_val;
+	u32 async_pf_id;
 };
 
 struct kvm_arch {
@@ -444,6 +446,8 @@ struct kvm_vcpu_stat {
 	u32 hypercalls;
 	u32 irq_injections;
 	u32 nmi_injections;
+	u32 apf_not_present;
+	u32 apf_present;
 };
 
 struct kvm_x86_ops {
@@ -528,6 +532,10 @@ struct kvm_x86_ops {
 	const struct trace_print_flags *exit_reasons_str;
 };
 
+struct kvm_arch_async_pf {
+	u32 token;
+};
+
 extern struct kvm_x86_ops *kvm_x86_ops;
 
 int kvm_mmu_module_init(void);
@@ -763,4 +771,12 @@ void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
 
 bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
 
+struct kvm_async_pf;
+
+void kvm_arch_inject_async_page_not_present(struct kvm_vcpu *vcpu,
+					    struct kvm_async_pf *work);
+void kvm_arch_inject_async_page_present(struct kvm_vcpu *vcpu,
+					struct kvm_async_pf *work);
+bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu);
 #endif /* _ASM_X86_KVM_HOST_H */
+
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 970bbd4..2461284 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -28,6 +28,8 @@ config KVM
 	select HAVE_KVM_IRQCHIP
 	select HAVE_KVM_EVENTFD
 	select KVM_APIC_ARCHITECTURE
+	select KVM_ASYNC_PF
+	select SLOW_WORK
 	select USER_RETURN_NOTIFIER
 	select KVM_MMIO
 	---help---
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c515753..a49565b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -21,6 +21,7 @@
 #include "mmu.h"
 #include "x86.h"
 #include "kvm_cache_regs.h"
+#include "x86.h"
 
 #include <linux/kvm_host.h>
 #include <linux/types.h>
@@ -2264,6 +2265,21 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
 			     error_code & PFERR_WRITE_MASK, gfn);
 }
 
+int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
+{
+	struct kvm_arch_async_pf arch;
+	arch.token = (vcpu->arch.async_pf_id++ << 12) | vcpu->vcpu_id;
+	return kvm_setup_async_pf(vcpu, gva, gfn, &arch);
+}
+
+static bool can_do_async_pf(struct kvm_vcpu *vcpu)
+{
+	if (!vcpu->arch.apf_data || kvm_event_needs_reinjection(vcpu))
+		return false;
+
+	return !!kvm_x86_ops->get_cpl(vcpu);
+}
+
 static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
 				u32 error_code)
 {
@@ -2272,6 +2288,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
 	int level;
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	unsigned long mmu_seq;
+	bool async;
 
 	ASSERT(vcpu);
 	ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));
@@ -2286,7 +2303,23 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
 
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
-	pfn = gfn_to_pfn(vcpu->kvm, gfn);
+
+	if (can_do_async_pf(vcpu)) {
+		pfn = gfn_to_pfn_async(vcpu->kvm, gfn, &async);
+		trace_kvm_try_async_get_page(async, pfn);
+	} else {
+do_sync:
+		async = false;
+		pfn = gfn_to_pfn(vcpu->kvm, gfn);
+	}
+
+	if (async) {
+		if (!kvm_arch_setup_async_pf(vcpu, gpa, gfn))
+			goto do_sync;
+		return 0;
+	}
+
+	/* mmio */
 	if (is_error_pfn(pfn))
 		return kvm_handle_bad_page(vcpu->kvm, gfn, pfn);
 	spin_lock(&vcpu->kvm->mmu_lock);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 3350c02..26d6b74 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -423,6 +423,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
 	pfn_t pfn;
 	int level = PT_PAGE_TABLE_LEVEL;
 	unsigned long mmu_seq;
+	bool async;
 
 	pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
 	kvm_mmu_audit(vcpu, "pre page fault");
@@ -454,7 +455,21 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
 
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
-	pfn = gfn_to_pfn(vcpu->kvm, walker.gfn);
+
+	if (can_do_async_pf(vcpu)) {
+		pfn = gfn_to_pfn_async(vcpu->kvm, walker.gfn, &async);
+		trace_kvm_try_async_get_page(async, pfn);
+	} else {
+do_sync:
+		async = false;
+		pfn = gfn_to_pfn(vcpu->kvm, walker.gfn);
+	}
+
+	if (async) {
+		if (!kvm_arch_setup_async_pf(vcpu, addr, walker.gfn))
+			goto do_sync;
+		return 0;
+	}
 
 	/* mmio */
 	if (is_error_pfn(pfn))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 744f8c1..6b7542f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -118,6 +118,8 @@ static DEFINE_PER_CPU(struct kvm_shared_msrs, shared_msrs);
 struct kvm_stats_debugfs_item debugfs_entries[] = {
 	{ "pf_fixed", VCPU_STAT(pf_fixed) },
 	{ "pf_guest", VCPU_STAT(pf_guest) },
+	{ "apf_not_present", VCPU_STAT(apf_not_present) },
+	{ "apf_present", VCPU_STAT(apf_present) },
 	{ "tlb_flush", VCPU_STAT(tlb_flush) },
 	{ "invlpg", VCPU_STAT(invlpg) },
 	{ "exits", VCPU_STAT(exits) },
@@ -1226,6 +1228,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 
 	if (!(data & KVM_ASYNC_PF_ENABLED)) {
 		vcpu->arch.apf_data = NULL;
+		kvm_clear_async_pf_completion_queue(vcpu);
 		return 0;
 	}
 
@@ -1240,6 +1243,8 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 		vcpu->arch.apf_data = NULL;
 		return 1;
 	}
+	vcpu->arch.apf_memslot_ver = vcpu->kvm->memslot_version;
+	kvm_async_pf_wakeup_all(vcpu);
 	return 0;
 }
 
@@ -4721,6 +4726,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	if (unlikely(r))
 		goto out;
 
+	kvm_check_async_pf_completion(vcpu);
+
 	preempt_disable();
 
 	kvm_x86_ops->prepare_guest_switch(vcpu);
@@ -5393,6 +5400,8 @@ int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu)
 	vcpu->arch.apf_data = NULL;
 	vcpu->arch.apf_msr_val = 0;
 
+	kvm_clear_async_pf_completion_queue(vcpu);
+
 	return kvm_x86_ops->vcpu_reset(vcpu);
 }
 
@@ -5534,8 +5543,10 @@ static void kvm_free_vcpus(struct kvm *kvm)
 	/*
 	 * Unpin any mmu pages first.
 	 */
-	kvm_for_each_vcpu(i, vcpu, kvm)
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		kvm_clear_async_pf_completion_queue(vcpu);
 		kvm_unload_vcpu_mmu(vcpu);
+	}
 	kvm_for_each_vcpu(i, vcpu, kvm)
 		kvm_arch_vcpu_free(vcpu);
 
@@ -5647,6 +5658,7 @@ void kvm_arch_flush_shadow(struct kvm *kvm)
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE
+		|| !list_empty_careful(&vcpu->async_pf_done)
 		|| vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
 		|| vcpu->arch.nmi_pending ||
 		(kvm_arch_interrupt_allowed(vcpu) &&
@@ -5704,6 +5716,55 @@ void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 }
 EXPORT_SYMBOL_GPL(kvm_set_rflags);
 
+static int apf_put_user(struct kvm_vcpu *vcpu, u32 val)
+{
+	if (unlikely(vcpu->arch.apf_memslot_ver !=
+		     vcpu->kvm->memslot_version)) {
+		u64 gpa = vcpu->arch.apf_msr_val & ~0x3f;
+		unsigned long addr;
+		int offset = offset_in_page(gpa);
+
+		addr = gfn_to_hva(vcpu->kvm, gpa >> PAGE_SHIFT);
+		vcpu->arch.apf_data = (u32 __user*)(addr + offset);
+		if (kvm_is_error_hva(addr)) {
+			vcpu->arch.apf_data = NULL;
+			return -EFAULT;
+		}
+	}
+
+	return put_user(val, vcpu->arch.apf_data);
+}
+
+void kvm_arch_inject_async_page_not_present(struct kvm_vcpu *vcpu,
+					    struct kvm_async_pf *work)
+{
+	if (!apf_put_user(vcpu, KVM_PV_REASON_PAGE_NOT_PRESENT)) {
+		kvm_inject_page_fault(vcpu, work->arch.token, 0);
+		++vcpu->stat.apf_not_present;
+		trace_kvm_send_async_pf(work->arch.token, work->gva,
+					KVM_PV_REASON_PAGE_NOT_PRESENT);
+	}
+}
+
+void kvm_arch_inject_async_page_present(struct kvm_vcpu *vcpu,
+					struct kvm_async_pf *work)
+{
+	if (is_error_page(work->page))
+		work->arch.token = ~0; /* broadcast wakeup */
+	if (!apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
+		kvm_inject_page_fault(vcpu, work->arch.token, 0);
+		++vcpu->stat.apf_present;
+		trace_kvm_send_async_pf(work->arch.token, work->gva,
+					KVM_PV_REASON_PAGE_READY);
+	}
+}
+
+bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu)
+{
+	return !kvm_event_needs_reinjection(vcpu) &&
+		kvm_x86_ops->interrupt_allowed(vcpu);
+}
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 64f62f1..09c646f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -16,6 +16,7 @@
 #include <linux/mm.h>
 #include <linux/preempt.h>
 #include <linux/msi.h>
+#include <linux/slow-work.h>
 #include <asm/signal.h>
 
 #include <linux/kvm.h>
@@ -73,6 +74,27 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx,
 int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
 			      struct kvm_io_device *dev);
 
+#ifdef CONFIG_KVM_ASYNC_PF
+struct kvm_async_pf {
+	struct slow_work work;
+	struct list_head link;
+	struct list_head queue;
+	struct kvm_vcpu *vcpu;
+	struct mm_struct *mm;
+	gva_t gva;
+	unsigned long addr;
+	struct kvm_arch_async_pf arch;
+	struct page *page;
+	atomic_t used;
+};
+
+void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
+void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
+int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
+		       struct kvm_arch_async_pf *arch);
+int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
+#endif
+
 struct kvm_vcpu {
 	struct kvm *kvm;
 #ifdef CONFIG_PREEMPT_NOTIFIERS
@@ -103,6 +125,14 @@ struct kvm_vcpu {
 	gpa_t mmio_phys_addr;
 #endif
 
+#ifdef CONFIG_KVM_ASYNC_PF
+	u32 async_pf_queued;
+	struct list_head async_pf_queue;
+	struct list_head async_pf_done;
+	spinlock_t async_pf_lock;
+	struct kvm_async_pf *async_pf_work;
+#endif
+
 	struct kvm_vcpu_arch arch;
 };
 
@@ -296,6 +326,7 @@ void kvm_release_page_dirty(struct page *page);
 void kvm_set_page_dirty(struct page *page);
 void kvm_set_page_accessed(struct page *page);
 
+pfn_t gfn_to_pfn_async(struct kvm *kvm, gfn_t gfn, bool *async);
 pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
 pfn_t gfn_to_pfn_memslot(struct kvm *kvm,
 			 struct kvm_memory_slot *slot, gfn_t gfn);
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index 6dd3a51..6d9f0c2 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -185,6 +185,66 @@ TRACE_EVENT(kvm_age_page,
 		  __entry->referenced ? "YOUNG" : "OLD")
 );
 
+#ifdef CONFIG_KVM_ASYNC_PF
+TRACE_EVENT(
+	kvm_try_async_get_page,
+	TP_PROTO(bool async, u64 pfn),
+	TP_ARGS(async, pfn),
+
+	TP_STRUCT__entry(
+		__field(__u64, pfn)
+		),
+
+	TP_fast_assign(
+		__entry->pfn = (!async) ? pfn : (u64)-1;
+		),
+
+	TP_printk("pfn %#llx", __entry->pfn)
+);
+
+TRACE_EVENT(
+	kvm_send_async_pf,
+	TP_PROTO(u64 token, u64 gva, u64 reason),
+	TP_ARGS(token, gva, reason),
+
+	TP_STRUCT__entry(
+		__field(__u64, token)
+		__field(__u64, gva)
+		__field(bool, np)
+		),
+
+	TP_fast_assign(
+		__entry->token = token;
+		__entry->gva = gva;
+		__entry->np = (reason == KVM_PV_REASON_PAGE_NOT_PRESENT);
+		),
+
+	TP_printk("token %#llx gva %#llx %s", __entry->token, __entry->gva,
+		  __entry->np ? "not present" : "ready")
+);
+
+TRACE_EVENT(
+	kvm_async_pf_completed,
+	TP_PROTO(unsigned long address, struct page *page, u64 gva),
+	TP_ARGS(address, page, gva),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, address)
+		__field(struct page*, page)
+		__field(u64, gva)
+		),
+
+	TP_fast_assign(
+		__entry->address = address;
+		__entry->page = page;
+		__entry->gva = gva;
+		),
+
+	TP_printk("gva %#llx address %#lx pfn %lx",  __entry->gva,
+		  __entry->address, page_to_pfn(__entry->page))
+);
+#endif
+
 #endif /* _TRACE_KVM_MAIN_H */
 
 /* This part must be outside protection */
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 7f1178f..f63ccb0 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -15,3 +15,6 @@ config KVM_APIC_ARCHITECTURE
 
 config KVM_MMIO
        bool
+
+config KVM_ASYNC_PF
+       bool
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 733558c..0656054 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -78,6 +78,11 @@ static atomic_t hardware_enable_failed;
 struct kmem_cache *kvm_vcpu_cache;
 EXPORT_SYMBOL_GPL(kvm_vcpu_cache);
 
+#ifdef CONFIG_KVM_ASYNC_PF
+#define ASYNC_PF_PER_VCPU 100
+static struct kmem_cache *async_pf_cache;
+#endif
+
 static __read_mostly struct preempt_ops kvm_preempt_ops;
 
 struct dentry *kvm_debugfs_dir;
@@ -183,6 +188,11 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 	vcpu->kvm = kvm;
 	vcpu->vcpu_id = id;
 	init_waitqueue_head(&vcpu->wq);
+#ifdef CONFIG_KVM_ASYNC_PF
+	INIT_LIST_HEAD(&vcpu->async_pf_done);
+	INIT_LIST_HEAD(&vcpu->async_pf_queue);
+	spin_lock_init(&vcpu->async_pf_lock);
+#endif
 
 	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
 	if (!page) {
@@ -935,7 +945,7 @@ unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
 }
 EXPORT_SYMBOL_GPL(gfn_to_hva);
 
-static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr)
+static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr, bool *async)
 {
 	struct page *page[1];
 	int npages;
@@ -943,7 +953,19 @@ static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr)
 
 	might_sleep();
 
-	npages = get_user_pages_fast(addr, 1, 1, page);
+	if (async) {
+#ifdef CONFIG_X86
+		npages = __get_user_pages_fast(addr, 1, 1, page);
+#endif
+		if (unlikely(npages != 1)) {
+			down_read(&current->mm->mmap_sem);
+			npages = get_user_pages_noio(current, current->mm,
+						     addr, 1, 1, 0, page,
+						     NULL);
+			up_read(&current->mm->mmap_sem);
+		}
+	} else
+		npages = get_user_pages_fast(addr, 1, 1, page);
 
 	if (unlikely(npages != 1)) {
 		struct vm_area_struct *vma;
@@ -959,6 +981,9 @@ static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr)
 
 		if (vma == NULL || addr < vma->vm_start ||
 		    !(vma->vm_flags & VM_PFNMAP)) {
+			if (async && !(vma->vm_flags & VM_PFNMAP) &&
+			    (vma->vm_flags & VM_WRITE))
+				*async = true;
 			up_read(&current->mm->mmap_sem);
 			get_page(bad_page);
 			return page_to_pfn(bad_page);
@@ -973,25 +998,37 @@ static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr)
 	return pfn;
 }
 
-pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
+static inline pfn_t __gfn_to_pfn(struct kvm *kvm, gfn_t gfn, bool *async)
 {
 	unsigned long addr;
 
+	if (async)
+		*async = false;
 	addr = gfn_to_hva(kvm, gfn);
 	if (kvm_is_error_hva(addr)) {
 		get_page(bad_page);
 		return page_to_pfn(bad_page);
 	}
 
-	return hva_to_pfn(kvm, addr);
+	return hva_to_pfn(kvm, addr, async);
+}
+
+pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
+{
+	return __gfn_to_pfn(kvm, gfn, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn);
 
+pfn_t gfn_to_pfn_async(struct kvm *kvm, gfn_t gfn, bool *async)
+{
+	return __gfn_to_pfn(kvm, gfn, async);
+}
+
 pfn_t gfn_to_pfn_memslot(struct kvm *kvm,
 			 struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	unsigned long addr = gfn_to_hva_memslot(slot, gfn);
-	return hva_to_pfn(kvm, addr);
+	return hva_to_pfn(kvm, addr, NULL);
 }
 
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
@@ -1204,6 +1241,196 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
 	}
 }
 
+#ifdef CONFIG_KVM_ASYNC_PF
+static void async_pf_work_free(struct kvm_async_pf *apf)
+{
+	if (atomic_dec_and_test(&apf->used))
+		kmem_cache_free(async_pf_cache, apf);
+}
+
+static int async_pf_get_ref(struct slow_work *work)
+{
+	struct kvm_async_pf *apf =
+		container_of(work, struct kvm_async_pf, work);
+
+	atomic_inc(&apf->used);
+	return 0;
+}
+
+static void async_pf_put_ref(struct slow_work *work)
+{
+	struct kvm_async_pf *apf =
+		container_of(work, struct kvm_async_pf, work);
+
+	kvm_put_kvm(apf->vcpu->kvm);
+	async_pf_work_free(apf);
+}
+
+static void async_pf_execute(struct slow_work *work)
+{
+	struct page *page;
+	struct kvm_async_pf *apf =
+		container_of(work, struct kvm_async_pf, work);
+	wait_queue_head_t *q = &apf->vcpu->wq;
+
+	might_sleep();
+
+	down_read(&apf->mm->mmap_sem);
+	get_user_pages(current, apf->mm, apf->addr, 1, 1, 0, &page, NULL);
+	up_read(&apf->mm->mmap_sem);
+
+	spin_lock(&apf->vcpu->async_pf_lock);
+	list_add_tail(&apf->link, &apf->vcpu->async_pf_done);
+	apf->page = page;
+	spin_unlock(&apf->vcpu->async_pf_lock);
+
+	trace_kvm_async_pf_completed(apf->addr, apf->page, apf->gva);
+
+	if (waitqueue_active(q))
+		wake_up_interruptible(q);
+
+	mmdrop(apf->mm);
+}
+
+struct slow_work_ops async_pf_ops = {
+	.get_ref = async_pf_get_ref,
+	.put_ref = async_pf_put_ref,
+	.execute = async_pf_execute
+};
+
+void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
+{
+	/* cancel outstanding slow work item */
+	while (!list_empty(&vcpu->async_pf_queue)) {
+		struct kvm_async_pf *work =
+			list_entry(vcpu->async_pf_queue.next,
+				   typeof(*work), queue);
+		slow_work_cancel(&work->work);
+		list_del(&work->queue);
+		if (!work->page) /* work was canceled */
+			kmem_cache_free(async_pf_cache, work);
+	}
+
+	spin_lock(&vcpu->async_pf_lock);
+	while (!list_empty(&vcpu->async_pf_done)) {
+		struct kvm_async_pf *work =
+			list_entry(vcpu->async_pf_done.next,
+				   typeof(*work), link);
+		list_del(&work->link);
+		put_page(work->page);
+		kmem_cache_free(async_pf_cache, work);
+	}
+	spin_unlock(&vcpu->async_pf_lock);
+
+	vcpu->async_pf_queued = 0;
+	vcpu->async_pf_work = NULL;
+}
+
+void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
+{
+	struct kvm_async_pf *work = vcpu->async_pf_work;
+
+	if (work) {
+		vcpu->async_pf_work = NULL;
+		if (work->page == NULL) {
+			kvm_arch_inject_async_page_not_present(vcpu, work);
+			return;
+		} else {
+			spin_lock(&vcpu->async_pf_lock);
+			list_del(&work->link);
+			spin_unlock(&vcpu->async_pf_lock);
+			put_page(work->page);
+			async_pf_work_free(work);
+			list_del(&work->queue);
+			vcpu->async_pf_queued--;
+		}
+	}
+
+	if (list_empty_careful(&vcpu->async_pf_done) ||
+	    !kvm_arch_can_inject_async_page_present(vcpu))
+		return;
+
+	spin_lock(&vcpu->async_pf_lock);
+	work = list_first_entry(&vcpu->async_pf_done, typeof(*work), link);
+	list_del(&work->link);
+	spin_unlock(&vcpu->async_pf_lock);
+	list_del(&work->queue);
+	vcpu->async_pf_queued--;
+
+	kvm_arch_inject_async_page_present(vcpu, work);
+
+	put_page(work->page);
+	async_pf_work_free(work);
+}
+
+int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
+		       struct kvm_arch_async_pf *arch)
+{
+	struct kvm_async_pf *work;
+
+	if (vcpu->async_pf_queued >= ASYNC_PF_PER_VCPU)
+		return 0;
+
+	/* setup slow work */
+
+	/* do alloc atomic since if we are going to sleep anyway we
+	   may as well sleep faulting in page */
+	work = kmem_cache_zalloc(async_pf_cache, GFP_ATOMIC);
+	if (!work)
+		return 0;
+
+	atomic_set(&work->used, 1);
+	work->page = NULL;
+	work->vcpu = vcpu;
+	work->gva = gva;
+	work->addr = gfn_to_hva(vcpu->kvm, gfn);
+	work->arch = *arch;
+	work->mm = current->mm;
+	atomic_inc(&work->mm->mm_count);
+	kvm_get_kvm(work->vcpu->kvm);
+
+	/* this can't really happen otherwise gfn_to_pfn_async
+	   would succeed */
+	if (unlikely(kvm_is_error_hva(work->addr)))
+		goto retry_sync;
+
+	slow_work_init(&work->work, &async_pf_ops);
+	if (slow_work_enqueue(&work->work) != 0)
+		goto retry_sync;
+
+	vcpu->async_pf_work = work;
+	list_add_tail(&work->queue, &vcpu->async_pf_queue);
+	vcpu->async_pf_queued++;
+	return 1;
+retry_sync:
+	kvm_put_kvm(work->vcpu->kvm);
+	mmdrop(work->mm);
+	kmem_cache_free(async_pf_cache, work);
+	return 0;
+}
+
+int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu)
+{
+	struct kvm_async_pf *work;
+
+	if (!list_empty(&vcpu->async_pf_done))
+		return 0;
+
+	work = kmem_cache_zalloc(async_pf_cache, GFP_ATOMIC);
+	if (!work)
+		return -ENOMEM;
+
+	atomic_set(&work->used, 1);
+	work->page = bad_page;
+	get_page(bad_page);
+	INIT_LIST_HEAD(&work->queue); /* for list_del to work */
+
+	list_add_tail(&work->link, &vcpu->async_pf_done);
+	vcpu->async_pf_queued++;
+	return 0;
+}
+#endif
+
 /*
  * The vCPU has executed a HLT instruction with in-kernel mode enabled.
  */
@@ -2267,6 +2494,19 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
 		goto out_free_5;
 	}
 
+#ifdef CONFIG_KVM_ASYNC_PF
+	async_pf_cache = KMEM_CACHE(kvm_async_pf, 0);
+
+	if (!async_pf_cache) {
+		r = -ENOMEM;
+		goto out_free_6;
+	}
+
+	r = slow_work_register_user(THIS_MODULE);
+	if (r)
+		goto out_free;
+#endif
+
 	kvm_chardev_ops.owner = module;
 	kvm_vm_fops.owner = module;
 	kvm_vcpu_fops.owner = module;
@@ -2274,7 +2514,7 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
 	r = misc_register(&kvm_dev);
 	if (r) {
 		printk(KERN_ERR "kvm: misc device register failed\n");
-		goto out_free;
+		goto out_unreg;
 	}
 
 	kvm_preempt_ops.sched_in = kvm_sched_in;
@@ -2284,7 +2524,13 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
 
 	return 0;
 
+out_unreg:
+#ifdef CONFIG_KVM_ASYNC_PF
+	slow_work_unregister_user(THIS_MODULE);
 out_free:
+	kmem_cache_destroy(async_pf_cache);
+out_free_6:
+#endif
 	kmem_cache_destroy(kvm_vcpu_cache);
 out_free_5:
 	sysdev_unregister(&kvm_sysdev);
@@ -2314,6 +2560,11 @@ void kvm_exit(void)
 	kvm_exit_debug();
 	misc_deregister(&kvm_dev);
 	kmem_cache_destroy(kvm_vcpu_cache);
+#ifdef CONFIG_KVM_ASYNC_PF
+	if (async_pf_cache)
+		kmem_cache_destroy(async_pf_cache);
+	slow_work_unregister_user(THIS_MODULE);
+#endif
 	sysdev_unregister(&kvm_sysdev);
 	sysdev_class_unregister(&kvm_sysdev_class);
 	unregister_reboot_notifier(&kvm_reboot_notifier);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 09/12] Retry fault before vmentry
  2010-07-06 16:24 ` Gleb Natapov
@ 2010-07-06 16:24   ` Gleb Natapov
  -1 siblings, 0 replies; 67+ messages in thread
From: Gleb Natapov @ 2010-07-06 16:24 UTC (permalink / raw)
  To: kvm
  Cc: linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	riel, cl, mtosatti

When a page is swapped in it is mapped into guest memory only after the
guest tries to access it again and generates another fault. To avoid
this second fault we can map the page immediately, since we know that
the guest is going to access it.
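
As a rough illustration of the idea (a stand-alone user-space toy, not
the kernel code below; struct async_pf_ctx, replay_fault() and the
constants are made up), the fault parameters recorded at async-PF setup
time are simply replayed against the saved cr3 once the page is
resident:

#include <stdint.h>
#include <stdio.h>

struct async_pf_ctx {
	uint64_t cr3;		/* guest page-table root at fault time */
	uint64_t gva;		/* faulting guest virtual address */
	uint32_t error_code;	/* original page-fault error code */
};

static uint64_t current_cr3 = 0x1000;

/* stand-in for mmu.page_fault_other_cr3(): map gva under the saved cr3 */
static void replay_fault(const struct async_pf_ctx *work)
{
	uint64_t saved = current_cr3;

	if (saved != work->cr3)
		current_cr3 = work->cr3;	/* kvm_set_cr3() analogue */

	printf("map gva %#llx under cr3 %#llx (err %#x)\n",
	       (unsigned long long)work->gva,
	       (unsigned long long)current_cr3,
	       (unsigned)work->error_code);

	if (current_cr3 != saved)
		current_cr3 = saved;		/* switch back before vmentry */
}

int main(void)
{
	/* parameters recorded at async-PF setup time in the real code */
	struct async_pf_ctx work = { .cr3 = 0x2000, .gva = 0xdeadb000,
				     .error_code = 2 };

	replay_fault(&work);	/* runs when the "page ready" event fires */
	return 0;
}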

Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |    7 ++++++-
 arch/x86/kvm/mmu.c              |   27 ++++++++++++++++++++-------
 arch/x86/kvm/paging_tmpl.h      |   37 +++++++++++++++++++++++++++++++++----
 arch/x86/kvm/x86.c              |    9 +++++++++
 virt/kvm/kvm_main.c             |    2 ++
 5 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index db514ea..45e6c12 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -236,7 +236,8 @@ struct kvm_pio_request {
  */
 struct kvm_mmu {
 	void (*new_cr3)(struct kvm_vcpu *vcpu);
-	int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err);
+	int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err, bool sync);
+	int (*page_fault_other_cr3)(struct kvm_vcpu *vcpu, gpa_t cr3, gva_t gva, u32 err);
 	void (*free)(struct kvm_vcpu *vcpu);
 	gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
 			    u32 *error);
@@ -534,6 +535,8 @@ struct kvm_x86_ops {
 
 struct kvm_arch_async_pf {
 	u32 token;
+	gpa_t cr3;
+	u32 error_code;
 };
 
 extern struct kvm_x86_ops *kvm_x86_ops;
@@ -777,6 +780,8 @@ void kvm_arch_inject_async_page_not_present(struct kvm_vcpu *vcpu,
 					    struct kvm_async_pf *work);
 void kvm_arch_inject_async_page_present(struct kvm_vcpu *vcpu,
 					struct kvm_async_pf *work);
+void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
+			       struct kvm_async_pf *work);
 bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu);
 #endif /* _ASM_X86_KVM_HOST_H */
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a49565b..95a0a8b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2246,7 +2246,7 @@ static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr,
 }
 
 static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
-				u32 error_code)
+				u32 error_code, bool sync)
 {
 	gfn_t gfn;
 	int r;
@@ -2265,10 +2265,13 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
 			     error_code & PFERR_WRITE_MASK, gfn);
 }
 
-int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
+int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr3, gva_t gva,
+			    gfn_t gfn, u32 error_code)
 {
 	struct kvm_arch_async_pf arch;
 	arch.token = (vcpu->arch.async_pf_id++ << 12) | vcpu->vcpu_id;
+	arch.cr3 = cr3;
+	arch.error_code = error_code;
 	return kvm_setup_async_pf(vcpu, gva, gfn, &arch);
 }
 
@@ -2280,8 +2283,8 @@ static bool can_do_async_pf(struct kvm_vcpu *vcpu)
 	return !!kvm_x86_ops->get_cpl(vcpu);
 }
 
-static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
-				u32 error_code)
+static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
+			  bool sync)
 {
 	pfn_t pfn;
 	int r;
@@ -2304,7 +2307,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
 
-	if (can_do_async_pf(vcpu)) {
+	if (!sync && can_do_async_pf(vcpu)) {
 		pfn = gfn_to_pfn_async(vcpu->kvm, gfn, &async);
 		trace_kvm_try_async_get_page(async, pfn);
 	} else {
@@ -2314,7 +2317,8 @@ do_sync:
 	}
 
 	if (async) {
-		if (!kvm_arch_setup_async_pf(vcpu, gpa, gfn))
+		if (!kvm_arch_setup_async_pf(vcpu, vcpu->arch.cr3, gpa, gfn,
+					     error_code))
 			goto do_sync;
 		return 0;
 	}
@@ -2338,6 +2342,12 @@ out_unlock:
 	return 0;
 }
 
+static int tdp_page_fault_sync(struct kvm_vcpu *vcpu, gpa_t cr3, gva_t gpa,
+			       u32 error_code)
+{
+	return tdp_page_fault(vcpu, gpa, error_code, true);
+}
+
 static void nonpaging_free(struct kvm_vcpu *vcpu)
 {
 	mmu_free_roots(vcpu);
@@ -2468,6 +2478,7 @@ static int paging64_init_context_common(struct kvm_vcpu *vcpu, int level)
 	ASSERT(is_pae(vcpu));
 	context->new_cr3 = paging_new_cr3;
 	context->page_fault = paging64_page_fault;
+	context->page_fault_other_cr3 = paging64_page_fault_other_cr3;
 	context->gva_to_gpa = paging64_gva_to_gpa;
 	context->prefetch_page = paging64_prefetch_page;
 	context->sync_page = paging64_sync_page;
@@ -2492,6 +2503,7 @@ static int paging32_init_context(struct kvm_vcpu *vcpu)
 	reset_rsvds_bits_mask(vcpu, PT32_ROOT_LEVEL);
 	context->new_cr3 = paging_new_cr3;
 	context->page_fault = paging32_page_fault;
+	context->page_fault_other_cr3 = paging32_page_fault_other_cr3;
 	context->gva_to_gpa = paging32_gva_to_gpa;
 	context->free = paging_free;
 	context->prefetch_page = paging32_prefetch_page;
@@ -2515,6 +2527,7 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 
 	context->new_cr3 = nonpaging_new_cr3;
 	context->page_fault = tdp_page_fault;
+	context->page_fault_other_cr3 = tdp_page_fault_sync;
 	context->free = nonpaging_free;
 	context->prefetch_page = nonpaging_prefetch_page;
 	context->sync_page = nonpaging_sync_page;
@@ -2902,7 +2915,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code)
 	int r;
 	enum emulation_result er;
 
-	r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code);
+	r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code, false);
 	if (r < 0)
 		goto out;
 
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 26d6b74..cbc9729 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -410,8 +410,8 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
  *  Returns: 1 if we need to emulate the instruction, 0 otherwise, or
  *           a negative value on error.
  */
-static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
-			       u32 error_code)
+static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
+			     bool sync)
 {
 	int write_fault = error_code & PFERR_WRITE_MASK;
 	int user_fault = error_code & PFERR_USER_MASK;
@@ -456,7 +456,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
 
-	if (can_do_async_pf(vcpu)) {
+	if (!sync && can_do_async_pf(vcpu)) {
 		pfn = gfn_to_pfn_async(vcpu->kvm, walker.gfn, &async);
 		trace_kvm_try_async_get_page(async, pfn);
 	} else {
@@ -466,7 +466,8 @@ do_sync:
 	}
 
 	if (async) {
-		if (!kvm_arch_setup_async_pf(vcpu, addr, walker.gfn))
+		if (!kvm_arch_setup_async_pf(vcpu, vcpu->arch.cr3, addr,
+					     walker.gfn, error_code))
 			goto do_sync;
 		return 0;
 	}
@@ -500,6 +501,34 @@ out_unlock:
 	return 0;
 }
 
+static int FNAME(page_fault_other_cr3)(struct kvm_vcpu *vcpu, gpa_t cr3,
+				       gva_t addr, u32 error_code)
+{
+	int r = 0;
+	gpa_t curr_cr3 = vcpu->arch.cr3;
+
+	if (curr_cr3 != cr3) {
+		/*
+		 * We do the page fault on behalf of a process that is sleeping
+		 * because of an async PF. The PV guest takes a reference to the
+		 * mm that this cr3 belongs to, so it has to be valid here.
+		 */
+		kvm_set_cr3(vcpu, cr3);
+		if (kvm_mmu_reload(vcpu))
+			goto switch_cr3;
+	}
+
+	r = FNAME(page_fault)(vcpu, addr, error_code, true);
+
+switch_cr3:
+	if (curr_cr3 != vcpu->arch.cr3) {
+		kvm_set_cr3(vcpu, curr_cr3);
+		kvm_mmu_reload(vcpu);
+	}
+
+	return r;
+}
+
 static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 {
 	struct kvm_shadow_walk_iterator iterator;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b7542f..ae7164e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5716,6 +5716,15 @@ void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 }
 EXPORT_SYMBOL_GPL(kvm_set_rflags);
 
+void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
+			       struct kvm_async_pf *work)
+{
+	if (!vcpu->arch.mmu.page_fault_other_cr3 || is_error_page(work->page))
+		return;
+	vcpu->arch.mmu.page_fault_other_cr3(vcpu, work->arch.cr3, work->gva,
+					    work->arch.error_code);
+}
+
 static int apf_put_user(struct kvm_vcpu *vcpu, u32 val)
 {
 	if (unlikely(vcpu->arch.apf_memslot_ver !=
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0656054..409b9b9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1339,6 +1339,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
 			spin_lock(&vcpu->async_pf_lock);
 			list_del(&work->link);
 			spin_unlock(&vcpu->async_pf_lock);
+			kvm_arch_async_page_ready(vcpu, work);
 			put_page(work->page);
 			async_pf_work_free(work);
 			list_del(&work->queue);
@@ -1357,6 +1358,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
 	list_del(&work->queue);
 	vcpu->async_pf_queued--;
 
+	kvm_arch_async_page_ready(vcpu, work);
 	kvm_arch_inject_async_page_present(vcpu, work);
 
 	put_page(work->page);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 10/12] Handle async PF in non preemptable context
  2010-07-06 16:24 ` Gleb Natapov
@ 2010-07-06 16:24   ` Gleb Natapov
  -1 siblings, 0 replies; 67+ messages in thread
From: Gleb Natapov @ 2010-07-06 16:24 UTC (permalink / raw)
  To: kvm
  Cc: linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	riel, cl, mtosatti

If an async page fault is received by the idle task, or while
preempt_count is not zero, the guest cannot reschedule, so it does
sti; hlt and waits for the page to become ready. The vcpu can still
process interrupts while it waits.
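
To see why the halted wait cannot lose the wake-up, here is a
stand-alone user-space analogue (an illustration only, not part of the
patch): sigsuspend() plays the role of sti; hlt because it atomically
re-enables the wake-up "interrupt" and waits for it.

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t page_ready;

static void wakeup(int sig)
{
	(void)sig;
	page_ready = 1;			/* "page ready" notification arrived */
}

int main(void)
{
	sigset_t block, old;

	signal(SIGALRM, wakeup);

	/* "cli": keep the wake-up blocked while testing the condition */
	sigemptyset(&block);
	sigaddset(&block, SIGALRM);
	sigprocmask(SIG_BLOCK, &block, &old);

	alarm(1);			/* the page becomes ready a bit later */

	while (!page_ready)
		sigsuspend(&old);	/* "sti; hlt": unblock and wait atomically */

	sigprocmask(SIG_SETMASK, &old, NULL);
	printf("page ready, resume without calling schedule()\n");
	return 0;
}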

Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/kernel/kvm.c |   36 ++++++++++++++++++++++++++++++++----
 1 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index fa9f520..f4d87b3 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -37,6 +37,7 @@
 #include <asm/cpu.h>
 #include <asm/traps.h>
 #include <asm/desc.h>
+#include <asm/tlbflush.h>
 
 #define MMU_QUEUE_SIZE 1024
 
@@ -68,6 +69,8 @@ struct kvm_task_sleep_node {
 	wait_queue_head_t wq;
 	u32 token;
 	int cpu;
+	bool halted;
+	struct mm_struct *mm;
 };
 
 static struct kvm_task_sleep_head {
@@ -96,6 +99,11 @@ static void apf_task_wait(struct task_struct *tsk, u32 token)
 	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
 	struct kvm_task_sleep_node n, *e;
 	DEFINE_WAIT(wait);
+	int cpu, idle;
+
+	cpu = get_cpu();
+	idle = idle_cpu(cpu);
+	put_cpu();
 
 	spin_lock(&b->lock);
 	e = _find_apf_task(b, token);
@@ -109,17 +117,31 @@ static void apf_task_wait(struct task_struct *tsk, u32 token)
 
 	n.token = token;
 	n.cpu = smp_processor_id();
+	n.mm = percpu_read(cpu_tlbstate.active_mm);
+	n.halted = idle || preempt_count() > 1;
+	atomic_inc(&n.mm->mm_count);
 	init_waitqueue_head(&n.wq);
 	hlist_add_head(&n.link, &b->list);
 	spin_unlock(&b->lock);
 
 	for (;;) {
-		prepare_to_wait(&n.wq, &wait, TASK_UNINTERRUPTIBLE);
+		if (!n.halted)
+			prepare_to_wait(&n.wq, &wait, TASK_UNINTERRUPTIBLE);
 		if (hlist_unhashed(&n.link))
 			break;
-		schedule();
+
+		if (!n.halted) {
+			schedule();
+		} else {
+			/*
+			 * We cannot reschedule. So halt.
+			 */
+			native_safe_halt();
+			local_irq_disable();
+		}
 	}
-	finish_wait(&n.wq, &wait);
+	if (!n.halted)
+		finish_wait(&n.wq, &wait);
 
 	return;
 }
@@ -127,7 +149,12 @@ static void apf_task_wait(struct task_struct *tsk, u32 token)
 static void apf_task_wake_one(struct kvm_task_sleep_node *n)
 {
 	hlist_del_init(&n->link);
-	if (waitqueue_active(&n->wq))
+	if (!n->mm)
+		return;
+	mmdrop(n->mm);
+	if (n->halted)
+		smp_send_reschedule(n->cpu);
+	else if (waitqueue_active(&n->wq))
 		wake_up(&n->wq);
 }
 
@@ -157,6 +184,7 @@ again:
 		}
 		n->token = token;
 		n->cpu = smp_processor_id();
+		n->mm = NULL;
 		init_waitqueue_head(&n->wq);
 		hlist_add_head(&n->link, &b->list);
 	} else
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 11/12] Let host know whether the guest can handle async PF in non-userspace context.
  2010-07-06 16:24 ` Gleb Natapov
@ 2010-07-06 16:24   ` Gleb Natapov
  -1 siblings, 0 replies; 67+ messages in thread
From: Gleb Natapov @ 2010-07-06 16:24 UTC (permalink / raw)
  To: kvm
  Cc: linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	riel, cl, mtosatti

If the guest can detect whether it is running in a non-preemptible
context, it can handle async PFs at any time, so let the host know that
it may send async PFs even when the guest cpu is not in userspace.
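
For reference, the value the guest writes to MSR_KVM_ASYNC_PF_EN then
looks roughly like this (a stand-alone sketch; apf_msr_value() is a
made-up helper, the real code is in kvm_guest_cpu_init() in the diff
below):

#include <stdint.h>
#include <stdio.h>

#define KVM_ASYNC_PF_ENABLED		(1ULL << 0)
#define KVM_ASYNC_PF_SEND_ALWAYS	(1ULL << 1)	/* new in this patch */

/* illustrative only: build the value written to MSR_KVM_ASYNC_PF_EN */
static uint64_t apf_msr_value(uint64_t apf_reason_pa, int preemptible_kernel)
{
	uint64_t val = apf_reason_pa | KVM_ASYNC_PF_ENABLED;

	/* a CONFIG_PREEMPT guest can take async PFs outside userspace too */
	if (preemptible_kernel)
		val |= KVM_ASYNC_PF_SEND_ALWAYS;

	return val;	/* bits 2:5 must remain zero or the host rejects it */
}

int main(void)
{
	printf("%#llx\n", (unsigned long long)apf_msr_value(0x12340000ULL, 1));
	return 0;
}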

Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/include/asm/kvm_para.h |    1 +
 arch/x86/kernel/kvm.c           |    3 +++
 arch/x86/kvm/x86.c              |    5 +++--
 4 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 45e6c12..c675d5d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -367,6 +367,7 @@ struct kvm_vcpu_arch {
 	cpumask_var_t wbinvd_dirty_mask;
 
 	u32 __user *apf_data;
+	bool apf_send_user_only;
 	u32 apf_memslot_ver;
 	u64 apf_msr_val;
 	u32 async_pf_id;
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index edf07cf..a33372c 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -38,6 +38,7 @@
 #define KVM_MAX_MMU_OP_BATCH           32
 
 #define KVM_ASYNC_PF_ENABLED			(1 << 0)
+#define KVM_ASYNC_PF_SEND_ALWAYS		(1 << 1)
 
 /* Operations for KVM_HC_MMU_OP */
 #define KVM_MMU_OP_WRITE_PTE            1
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index f4d87b3..f2063c2 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -488,6 +488,9 @@ void __cpuinit kvm_guest_cpu_init(void)
 	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF)) {
 		u64 pa = __pa(&__get_cpu_var(apf_reason));
 
+#ifdef CONFIG_PREEMPT
+		pa |= KVM_ASYNC_PF_SEND_ALWAYS;
+#endif
 		if (native_write_msr_safe(MSR_KVM_ASYNC_PF_EN,
 					  pa | KVM_ASYNC_PF_ENABLED, pa >> 32))
 			return;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ae7164e..8f2ff7b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1220,8 +1220,8 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 	int offset = offset_in_page(gpa);
 	unsigned long addr;
 
-	/* Bits 1:5 are resrved, Should be zero */
-	if (data & 0x3e)
+	/* Bits 2:5 are resrved, Should be zero */
+	/* Bits 2:5 are reserved, should be zero */
 		return 1;
 
 	vcpu->arch.apf_msr_val = data;
@@ -1244,6 +1244,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 		return 1;
 	}
 	vcpu->arch.apf_memslot_ver = vcpu->kvm->memslot_version;
+	vcpu->arch.apf_send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
 	kvm_async_pf_wakeup_all(vcpu);
 	return 0;
 }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 12/12] Send async PF when guest is not in userspace too.
  2010-07-06 16:24 ` Gleb Natapov
@ 2010-07-06 16:25   ` Gleb Natapov
  -1 siblings, 0 replies; 67+ messages in thread
From: Gleb Natapov @ 2010-07-06 16:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	riel, cl, mtosatti


Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/kvm/mmu.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 95a0a8b..297f399 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2280,7 +2280,13 @@ static bool can_do_async_pf(struct kvm_vcpu *vcpu)
 	if (!vcpu->arch.apf_data || kvm_event_needs_reinjection(vcpu))
 		return false;
 
-	return !!kvm_x86_ops->get_cpl(vcpu);
+	if (vcpu->arch.apf_send_user_only)
+		return !!kvm_x86_ops->get_cpl(vcpu);
+
+	if (!kvm_x86_ops->interrupt_allowed(vcpu))
+		return false;
+
+	return true;
 }
 
 static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 03/12] Add async PF initialization to PV guest.
  2010-07-06 16:24   ` Gleb Natapov
@ 2010-07-07 15:41     ` Peter Zijlstra
  -1 siblings, 0 replies; 67+ messages in thread
From: Peter Zijlstra @ 2010-07-07 15:41 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: kvm, linux-mm, linux-kernel, avi, mingo, tglx, hpa, riel, cl, mtosatti

On Tue, 2010-07-06 at 19:24 +0300, Gleb Natapov wrote:
> @@ -329,6 +330,8 @@ notrace static void __cpuinit start_secondary(void *unused)
>         per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
>         x86_platform.nmi_init();
>  
> +       kvm_guest_cpu_init();
> +
>         /* enable local interrupts */
>         local_irq_enable(); 

CPU_STARTING hotplug notifier is too early?

called from:
   start_secondary()
     smp_callin()
       notify_cpu_starting() 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 01/12] Move kvm_smp_prepare_boot_cpu() from kvmclock.c to kvm.c.
  2010-07-06 16:24   ` Gleb Natapov
@ 2010-07-07 16:17     ` Rik van Riel
  -1 siblings, 0 replies; 67+ messages in thread
From: Rik van Riel @ 2010-07-07 16:17 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: kvm, linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	cl, mtosatti

On 07/06/2010 12:24 PM, Gleb Natapov wrote:
> Async PF also needs to hook into smp_prepare_boot_cpu so move the hook
> into generic code.
>
> Signed-off-by: Gleb Natapov<gleb@redhat.com>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 02/12] Add PV MSR to enable asynchronous page faults delivery.
  2010-07-06 16:24   ` Gleb Natapov
@ 2010-07-07 18:13     ` Rik van Riel
  -1 siblings, 0 replies; 67+ messages in thread
From: Rik van Riel @ 2010-07-07 18:13 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: kvm, linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	cl, mtosatti

On 07/06/2010 12:24 PM, Gleb Natapov wrote:

... a commit message would be useful when you submit these
patches for inclusion upstream.

> Signed-off-by: Gleb Natapov<gleb@redhat.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 04/12] Provide special async page fault handler when async PF capability is detected
  2010-07-06 16:24   ` Gleb Natapov
@ 2010-07-07 22:38     ` Rik van Riel
  -1 siblings, 0 replies; 67+ messages in thread
From: Rik van Riel @ 2010-07-07 22:38 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: kvm, linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	cl, mtosatti

On 07/06/2010 12:24 PM, Gleb Natapov wrote:
> When async PF capability is detected hook up special page fault handler
> that will handle async page fault events and bypass other page faults to
> regular page fault handler.
>
> Signed-off-by: Gleb Natapov<gleb@redhat.com>

I had some concerns with this patch, but it looks like patch
10/12 addresses all of those, so ...

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 05/12] Export __get_user_pages_fast.
  2010-07-06 16:24   ` Gleb Natapov
@ 2010-07-07 22:45     ` Rik van Riel
  -1 siblings, 0 replies; 67+ messages in thread
From: Rik van Riel @ 2010-07-07 22:45 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: kvm, linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	cl, mtosatti

On 07/06/2010 12:24 PM, Gleb Natapov wrote:
> KVM will use it to try and find a page without falling back to slow
> gup. That is why get_user_pages_fast() is not enough.
>
> Signed-off-by: Gleb Natapov<gleb@redhat.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 07/12] Maintain memslot version number
  2010-07-06 16:24   ` Gleb Natapov
@ 2010-07-08  0:19     ` Rik van Riel
  -1 siblings, 0 replies; 67+ messages in thread
From: Rik van Riel @ 2010-07-08  0:19 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: kvm, linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	cl, mtosatti

On 07/06/2010 12:24 PM, Gleb Natapov wrote:
> Code that depends on particular memslot layout can track changes and
> adjust to new layout.
>
> Signed-off-by: Gleb Natapov<gleb@redhat.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 08/12] Inject asynchronous page fault into a guest if page is swapped out.
  2010-07-06 16:24   ` Gleb Natapov
@ 2010-07-08  4:09     ` Rik van Riel
  -1 siblings, 0 replies; 67+ messages in thread
From: Rik van Riel @ 2010-07-08  4:09 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: kvm, linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	cl, mtosatti

On 07/06/2010 12:24 PM, Gleb Natapov wrote:
> If guest access swapped out memory do not swap it in from vcpu thread
> context. Setup slow work to do swapping and send async page fault to
> a guest.
>
> Allow async page fault injection only when guest is in user mode since
> otherwise guest may be in non-sleepable context and will not be able to
> reschedule.
>
> Signed-off-by: Gleb Natapov<gleb@redhat.com>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 09/12] Retry fault before vmentry
  2010-07-06 16:24   ` Gleb Natapov
@ 2010-07-08  4:21     ` Rik van Riel
  -1 siblings, 0 replies; 67+ messages in thread
From: Rik van Riel @ 2010-07-08  4:21 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: kvm, linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	cl, mtosatti

On 07/06/2010 12:24 PM, Gleb Natapov wrote:
> When page is swapped in it is mapped into guest memory only after guest
> tries to access it again and generate another fault. To save this fault
> we can map it immediately since we know that guest is going to access
> the page.
>
> Signed-off-by: Gleb Natapov<gleb@redhat.com>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 10/12] Handle async PF in non preemptable context
  2010-07-06 16:24   ` Gleb Natapov
@ 2010-07-08  4:22     ` Rik van Riel
  -1 siblings, 0 replies; 67+ messages in thread
From: Rik van Riel @ 2010-07-08  4:22 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: kvm, linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	cl, mtosatti

On 07/06/2010 12:24 PM, Gleb Natapov wrote:
> If async page fault is received by idle task or when preemp_count is
> not zero guest cannot reschedule, so do sti; hlt and wait for page to be
> ready. vcpu can still process interrupts while it waits for the page to
> be ready.
>
> Signed-off-by: Gleb Natapov<gleb@redhat.com>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 11/12] Let host know whether the guest can handle async PF in non-userspace context.
  2010-07-06 16:24   ` Gleb Natapov
@ 2010-07-08  4:28     ` Rik van Riel
  -1 siblings, 0 replies; 67+ messages in thread
From: Rik van Riel @ 2010-07-08  4:28 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: kvm, linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	cl, mtosatti

On 07/06/2010 12:24 PM, Gleb Natapov wrote:
> If guest can detect that it runs in non-preemptable context it can
> handle async PFs at any time, so let host know that it can send async
> PF even if guest cpu is not in userspace.

The code looks correct.  One question though - is there a
reason to implement the userspace-only async PF path at
all, since the handling of async PF in non-userspace context
is introduced simultaneously?

> Signed-off-by: Gleb Natapov<gleb@redhat.com>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 12/12] Send async PF when guest is not in userspace too.
  2010-07-06 16:25   ` Gleb Natapov
@ 2010-07-08  4:29     ` Rik van Riel
  -1 siblings, 0 replies; 67+ messages in thread
From: Rik van Riel @ 2010-07-08  4:29 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: kvm, linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	cl, mtosatti

On 07/06/2010 12:25 PM, Gleb Natapov wrote:
> Signed-off-by: Gleb Natapov<gleb@redhat.com>

This patch needs a commit message on the next submission.
Other than that:

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 03/12] Add async PF initialization to PV guest.
  2010-07-07 15:41     ` Peter Zijlstra
@ 2010-07-08 13:24       ` Gleb Natapov
  -1 siblings, 0 replies; 67+ messages in thread
From: Gleb Natapov @ 2010-07-08 13:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: kvm, linux-mm, linux-kernel, avi, mingo, tglx, hpa, riel, cl, mtosatti

On Wed, Jul 07, 2010 at 05:41:01PM +0200, Peter Zijlstra wrote:
> On Tue, 2010-07-06 at 19:24 +0300, Gleb Natapov wrote:
> > @@ -329,6 +330,8 @@ notrace static void __cpuinit start_secondary(void *unused)
> >         per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
> >         x86_platform.nmi_init();
> >  
> > +       kvm_guest_cpu_init();
> > +
> >         /* enable local interrupts */
> >         local_irq_enable(); 
> 
> CPU_STARTING hotplug notifier is too early?
> 
Actually no, it is not too early. I will move this call into the CPU notifier.

> called from:
>    start_secondary()
>      smp_callin()
>        notify_cpu_starting() 
> 

--
			Gleb.
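
For reference, a minimal sketch of what moving kvm_guest_cpu_init() into a CPU
notifier could look like with the notifier API of that era. This is not code
from the series; kvm_guest_cpu_online, kvm_cpu_notify and kvm_cpu_notifier are
illustrative names:

/* needs <linux/cpu.h>, <linux/notifier.h>, <linux/smp.h> */

static void kvm_guest_cpu_online(void *dummy)
{
	kvm_guest_cpu_init();
}

static int __cpuinit kvm_cpu_notify(struct notifier_block *self,
				    unsigned long action, void *hcpu)
{
	int cpu = (unsigned long)hcpu;

	switch (action) {
	case CPU_ONLINE:
	case CPU_DOWN_FAILED:
		/* run the per-cpu init on the CPU that just came up */
		smp_call_function_single(cpu, kvm_guest_cpu_online, NULL, 0);
		break;
	}
	return NOTIFY_OK;
}

static struct notifier_block __cpuinitdata kvm_cpu_notifier = {
	.notifier_call = kvm_cpu_notify,
};

/* registered once, e.g. from kvm_guest_init():
 *	register_cpu_notifier(&kvm_cpu_notifier);
 */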

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 11/12] Let host know whether the guest can handle async PF in non-userspace context.
  2010-07-08  4:28     ` Rik van Riel
@ 2010-07-08 13:35       ` Gleb Natapov
  -1 siblings, 0 replies; 67+ messages in thread
From: Gleb Natapov @ 2010-07-08 13:35 UTC (permalink / raw)
  To: Rik van Riel
  Cc: kvm, linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	cl, mtosatti

On Thu, Jul 08, 2010 at 12:28:18AM -0400, Rik van Riel wrote:
> On 07/06/2010 12:24 PM, Gleb Natapov wrote:
> >If guest can detect that it runs in non-preemptable context it can
> >handle async PFs at any time, so let host know that it can send async
> >PF even if guest cpu is not in userspace.
> 
> The code looks correct.  One question though - is there a
> reason to implement the userspace-only async PF path at
> all, since the handling of async PF in non-userspace context
> is introduced simultaneously?
> 
Guest userspace-only async PF handling is added in patch 4 and the
non-userspace handling in patch 10. The split is there to ease reviewing:
if I implemented everything in one patch it would be harder to see why
things are done the way they are, IMHO.

--
			Gleb.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 08/12] Inject asynchronous page fault into a guest if page is swapped out.
  2010-07-06 16:24   ` Gleb Natapov
@ 2010-07-08 15:59     ` Marcelo Tosatti
  -1 siblings, 0 replies; 67+ messages in thread
From: Marcelo Tosatti @ 2010-07-08 15:59 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: kvm, linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	riel, cl

On Tue, Jul 06, 2010 at 07:24:56PM +0300, Gleb Natapov wrote:
> If guest access swapped out memory do not swap it in from vcpu thread
> context. Setup slow work to do swapping and send async page fault to
> a guest.
> 
> Allow async page fault injection only when guest is in user mode since
> otherwise guest may be in non-sleepable context and will not be able to
> reschedule.
> 
> Signed-off-by: Gleb Natapov <gleb@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h |   16 +++
>  arch/x86/kvm/Kconfig            |    2 +
>  arch/x86/kvm/mmu.c              |   35 +++++-
>  arch/x86/kvm/paging_tmpl.h      |   17 +++-
>  arch/x86/kvm/x86.c              |   63 +++++++++-
>  include/linux/kvm_host.h        |   31 +++++
>  include/trace/events/kvm.h      |   60 +++++++++
>  virt/kvm/Kconfig                |    3 +
>  virt/kvm/kvm_main.c             |  263 ++++++++++++++++++++++++++++++++++++++-
>  9 files changed, 481 insertions(+), 9 deletions(-)
> 
> +	u32 apf_memslot_ver;
>  	u64 apf_msr_val;
> +	u32 async_pf_id;
>  };
>  
>  struct kvm_arch {
> @@ -444,6 +446,8 @@ struct kvm_vcpu_stat {
>  	u32 hypercalls;
>  	u32 irq_injections;
>  	u32 nmi_injections;
> +	u32 apf_not_present;
> +	u32 apf_present;
>  };
>  
>  struct kvm_x86_ops {
> @@ -528,6 +532,10 @@ struct kvm_x86_ops {
>  	const struct trace_print_flags *exit_reasons_str;
>  };
>  
> +struct kvm_arch_async_pf {
> +	u32 token;
> +};
> +
>  extern struct kvm_x86_ops *kvm_x86_ops;
>  
>  int kvm_mmu_module_init(void);
> @@ -763,4 +771,12 @@ void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
>  
>  bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
>  
> +struct kvm_async_pf;
> +
> +void kvm_arch_inject_async_page_not_present(struct kvm_vcpu *vcpu,
> +					    struct kvm_async_pf *work);
> +void kvm_arch_inject_async_page_present(struct kvm_vcpu *vcpu,
> +					struct kvm_async_pf *work);
> +bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu);
>  #endif /* _ASM_X86_KVM_HOST_H */
> +
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index 970bbd4..2461284 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -28,6 +28,8 @@ config KVM
>  	select HAVE_KVM_IRQCHIP
>  	select HAVE_KVM_EVENTFD
>  	select KVM_APIC_ARCHITECTURE
> +	select KVM_ASYNC_PF
> +	select SLOW_WORK
>  	select USER_RETURN_NOTIFIER
>  	select KVM_MMIO
>  	---help---
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index c515753..a49565b 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -21,6 +21,7 @@
>  #include "mmu.h"
>  #include "x86.h"
>  #include "kvm_cache_regs.h"
> +#include "x86.h"
>  
>  #include <linux/kvm_host.h>
>  #include <linux/types.h>
> @@ -2264,6 +2265,21 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
>  			     error_code & PFERR_WRITE_MASK, gfn);
>  }
>  
> +int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
> +{
> +	struct kvm_arch_async_pf arch;
> +	arch.token = (vcpu->arch.async_pf_id++ << 12) | vcpu->vcpu_id;
> +	return kvm_setup_async_pf(vcpu, gva, gfn, &arch);
> +}
> +
> +static bool can_do_async_pf(struct kvm_vcpu *vcpu)
> +{
> +	if (!vcpu->arch.apf_data || kvm_event_needs_reinjection(vcpu))
> +		return false;
> +
> +	return !!kvm_x86_ops->get_cpl(vcpu);
> +}
> +
>  static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
>  				u32 error_code)
>  {
> @@ -2272,6 +2288,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
>  	int level;
>  	gfn_t gfn = gpa >> PAGE_SHIFT;
>  	unsigned long mmu_seq;
> +	bool async;
>  
>  	ASSERT(vcpu);
>  	ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));
> @@ -2286,7 +2303,23 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
>  
>  	mmu_seq = vcpu->kvm->mmu_notifier_seq;
>  	smp_rmb();
> -	pfn = gfn_to_pfn(vcpu->kvm, gfn);
> +
> +	if (can_do_async_pf(vcpu)) {
> +		pfn = gfn_to_pfn_async(vcpu->kvm, gfn, &async);
> +		trace_kvm_try_async_get_page(async, pfn);
> +	} else {
> +do_sync:
> +		async = false;
> +		pfn = gfn_to_pfn(vcpu->kvm, gfn);
> +	}
> +
> +	if (async) {
> +		if (!kvm_arch_setup_async_pf(vcpu, gpa, gfn))
> +			goto do_sync;
> +		return 0;
> +	}
> +
> +	/* mmio */
>  	if (is_error_pfn(pfn))
>  		return kvm_handle_bad_page(vcpu->kvm, gfn, pfn);
>  	spin_lock(&vcpu->kvm->mmu_lock);
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 3350c02..26d6b74 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -423,6 +423,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
>  	pfn_t pfn;
>  	int level = PT_PAGE_TABLE_LEVEL;
>  	unsigned long mmu_seq;
> +	bool async;
>  
>  	pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
>  	kvm_mmu_audit(vcpu, "pre page fault");
> @@ -454,7 +455,21 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
>  
>  	mmu_seq = vcpu->kvm->mmu_notifier_seq;
>  	smp_rmb();
> -	pfn = gfn_to_pfn(vcpu->kvm, walker.gfn);
> +
> +	if (can_do_async_pf(vcpu)) {
> +		pfn = gfn_to_pfn_async(vcpu->kvm, walker.gfn, &async);
> +		trace_kvm_try_async_get_page(async, pfn);
> +	} else {
> +do_sync:
> +		async = false;
> +		pfn = gfn_to_pfn(vcpu->kvm, walker.gfn);
> +	}
> +
> +	if (async) {
> +		if (!kvm_arch_setup_async_pf(vcpu, addr, walker.gfn))
> +			goto do_sync;
> +		return 0;
> +	}
>  
>  	/* mmio */
>  	if (is_error_pfn(pfn))
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 744f8c1..6b7542f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -118,6 +118,8 @@ static DEFINE_PER_CPU(struct kvm_shared_msrs, shared_msrs);
>  struct kvm_stats_debugfs_item debugfs_entries[] = {
>  	{ "pf_fixed", VCPU_STAT(pf_fixed) },
>  	{ "pf_guest", VCPU_STAT(pf_guest) },
> +	{ "apf_not_present", VCPU_STAT(apf_not_present) },
> +	{ "apf_present", VCPU_STAT(apf_present) },
>  	{ "tlb_flush", VCPU_STAT(tlb_flush) },
>  	{ "invlpg", VCPU_STAT(invlpg) },
>  	{ "exits", VCPU_STAT(exits) },
> @@ -1226,6 +1228,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
>  
>  	if (!(data & KVM_ASYNC_PF_ENABLED)) {
>  		vcpu->arch.apf_data = NULL;
> +		kvm_clear_async_pf_completion_queue(vcpu);
>  		return 0;
>  	}
>  
> @@ -1240,6 +1243,8 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
>  		vcpu->arch.apf_data = NULL;
>  		return 1;
>  	}
> +	vcpu->arch.apf_memslot_ver = vcpu->kvm->memslot_version;
> +	kvm_async_pf_wakeup_all(vcpu);
>  	return 0;
>  }
>  
> @@ -4721,6 +4726,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	if (unlikely(r))
>  		goto out;
>  
> +	kvm_check_async_pf_completion(vcpu);
> +
>  	preempt_disable();
>  
>  	kvm_x86_ops->prepare_guest_switch(vcpu);
> @@ -5393,6 +5400,8 @@ int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu)
>  	vcpu->arch.apf_data = NULL;
>  	vcpu->arch.apf_msr_val = 0;
>  
> +	kvm_clear_async_pf_completion_queue(vcpu);
> +
>  	return kvm_x86_ops->vcpu_reset(vcpu);
>  }
>  
> @@ -5534,8 +5543,10 @@ static void kvm_free_vcpus(struct kvm *kvm)
>  	/*
>  	 * Unpin any mmu pages first.
>  	 */
> -	kvm_for_each_vcpu(i, vcpu, kvm)
> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> +		kvm_clear_async_pf_completion_queue(vcpu);
>  		kvm_unload_vcpu_mmu(vcpu);
> +	}
>  	kvm_for_each_vcpu(i, vcpu, kvm)
>  		kvm_arch_vcpu_free(vcpu);
>  
> @@ -5647,6 +5658,7 @@ void kvm_arch_flush_shadow(struct kvm *kvm)
>  int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
>  {
>  	return vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE
> +		|| !list_empty_careful(&vcpu->async_pf_done)
>  		|| vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
>  		|| vcpu->arch.nmi_pending ||
>  		(kvm_arch_interrupt_allowed(vcpu) &&
> @@ -5704,6 +5716,55 @@ void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
>  }
>  EXPORT_SYMBOL_GPL(kvm_set_rflags);
>  
> +static int apf_put_user(struct kvm_vcpu *vcpu, u32 val)
> +{
> +	if (unlikely(vcpu->arch.apf_memslot_ver !=
> +		     vcpu->kvm->memslot_version)) {
> +		u64 gpa = vcpu->arch.apf_msr_val & ~0x3f;
> +		unsigned long addr;
> +		int offset = offset_in_page(gpa);
> +
> +		addr = gfn_to_hva(vcpu->kvm, gpa >> PAGE_SHIFT);
> +		vcpu->arch.apf_data = (u32 __user*)(addr + offset);
> +		if (kvm_is_error_hva(addr)) {
> +			vcpu->arch.apf_data = NULL;
> +			return -EFAULT;
> +		}
> +	}
> +
> +	return put_user(val, vcpu->arch.apf_data);
> +}

Why not use kvm_write_guest?

> +int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
> +		       struct kvm_arch_async_pf *arch)
> +{
> +	struct kvm_async_pf *work;
> +
> +	if (vcpu->async_pf_queued >= ASYNC_PF_PER_VCPU)
> +		return 0;
> +
> +	/* setup slow work */
> +
> +	/* do alloc atomic since if we are going to sleep anyway we
> +	   may as well sleep faulting in page */
> +	work = kmem_cache_zalloc(async_pf_cache, GFP_ATOMIC);
> +	if (!work)
> +		return 0;

GFP_KERNEL is fine for this context.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 09/12] Retry fault before vmentry
  2010-07-06 16:24   ` Gleb Natapov
@ 2010-07-08 16:17     ` Marcelo Tosatti
  -1 siblings, 0 replies; 67+ messages in thread
From: Marcelo Tosatti @ 2010-07-08 16:17 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: kvm, linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	riel, cl

On Tue, Jul 06, 2010 at 07:24:57PM +0300, Gleb Natapov wrote:
> When page is swapped in it is mapped into guest memory only after guest
> tries to access it again and generate another fault. To save this fault
> we can map it immediately since we know that guest is going to access
> the page.
> 
> Signed-off-by: Gleb Natapov <gleb@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h |    7 ++++++-
>  arch/x86/kvm/mmu.c              |   27 ++++++++++++++++++++-------
>  arch/x86/kvm/paging_tmpl.h      |   37 +++++++++++++++++++++++++++++++++----
>  arch/x86/kvm/x86.c              |    9 +++++++++
>  virt/kvm/kvm_main.c             |    2 ++
>  5 files changed, 70 insertions(+), 12 deletions(-)
> 

> +static int FNAME(page_fault_other_cr3)(struct kvm_vcpu *vcpu, gpa_t cr3,
> +				       gva_t addr, u32 error_code)
> +{
> +	int r = 0;
> +	gpa_t curr_cr3 = vcpu->arch.cr3;
> +
> +	if (curr_cr3 != cr3) {
> +		/*
> +		 * We do page fault on behalf of a process that is sleeping
> +		 * because of async PF. PV guest takes reference to mm that cr3
> +		 * belongs too, so it has to be valid here.
> +		 */
> +		kvm_set_cr3(vcpu, cr3);
> +		if (kvm_mmu_reload(vcpu))
> +			goto switch_cr3;
> +	}
> +
> +	r = FNAME(page_fault)(vcpu, addr, error_code, true);

KVM_REQ_MMU_SYNC request generated here must be processed before
switching to a different cr3 (otherwise vcpu_enter_guest will process it 
with the wrong cr3 in place).
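
One way to honor that ordering, sketched purely as an illustration (the
switch_cr3: restore path and curr_cr3 are taken from the hunk above; the
explicit request handling below is an assumption, not code from the series):

	r = FNAME(page_fault)(vcpu, addr, error_code, true);

	/*
	 * Flush a sync request raised while the temporary cr3 was loaded,
	 * so vcpu_enter_guest() never processes it against the wrong cr3.
	 */
	if (test_and_clear_bit(KVM_REQ_MMU_SYNC, &vcpu->requests))
		kvm_mmu_sync_roots(vcpu);

switch_cr3:
	if (curr_cr3 != vcpu->arch.cr3) {
		kvm_set_cr3(vcpu, curr_cr3);
		kvm_mmu_reload(vcpu);
	}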


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 08/12] Inject asynchronous page fault into a guest if page is swapped out.
  2010-07-08 15:59     ` Marcelo Tosatti
@ 2010-07-08 18:05       ` Gleb Natapov
  -1 siblings, 0 replies; 67+ messages in thread
From: Gleb Natapov @ 2010-07-08 18:05 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: kvm, linux-mm, linux-kernel, avi, mingo, a.p.zijlstra, tglx, hpa,
	riel, cl

On Thu, Jul 08, 2010 at 12:59:20PM -0300, Marcelo Tosatti wrote:
> > +static int apf_put_user(struct kvm_vcpu *vcpu, u32 val)
> > +{
> > +	if (unlikely(vcpu->arch.apf_memslot_ver !=
> > +		     vcpu->kvm->memslot_version)) {
> > +		u64 gpa = vcpu->arch.apf_msr_val & ~0x3f;
> > +		unsigned long addr;
> > +		int offset = offset_in_page(gpa);
> > +
> > +		addr = gfn_to_hva(vcpu->kvm, gpa >> PAGE_SHIFT);
> > +		vcpu->arch.apf_data = (u32 __user*)(addr + offset);
> > +		if (kvm_is_error_hva(addr)) {
> > +			vcpu->arch.apf_data = NULL;
> > +			return -EFAULT;
> > +		}
> > +	}
> > +
> > +	return put_user(val, vcpu->arch.apf_data);
> > +}
> 
> Why not use kvm_write_guest?
Because I want to cache the gfn_to_hva() translation: this code tracks
memslot changes and redoes the translation only when needed (almost never).
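
(For comparison, the kvm_write_guest() variant Marcelo suggests would be
roughly the sketch below; it resolves the memslot on every completion, which
is exactly the per-call work the cached __user pointer avoids.)

	u64 gpa = vcpu->arch.apf_msr_val & ~0x3fULL;

	return kvm_write_guest(vcpu->kvm, gpa, &val, sizeof(val));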

> 
> > +int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
> > +		       struct kvm_arch_async_pf *arch)
> > +{
> > +	struct kvm_async_pf *work;
> > +
> > +	if (vcpu->async_pf_queued >= ASYNC_PF_PER_VCPU)
> > +		return 0;
> > +
> > +	/* setup slow work */
> > +
> > +	/* do alloc atomic since if we are going to sleep anyway we
> > +	   may as well sleep faulting in page */
> > +	work = kmem_cache_zalloc(async_pf_cache, GFP_ATOMIC);
> > +	if (!work)
> > +		return 0;
> 
> GFP_KERNEL is fine for this context.
But it can sleep, no? The comment explains why I don't want to sleep
here.

--
			Gleb.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 08/12] Inject asynchronous page fault into a guest if page is swapped out.
  2010-07-08 18:05       ` Gleb Natapov
  (?)
@ 2010-07-08 18:09         ` Peter Zijlstra
  -1 siblings, 0 replies; 67+ messages in thread
From: Peter Zijlstra @ 2010-07-08 18:09 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Marcelo Tosatti, kvm, linux-mm, linux-kernel, avi, mingo, tglx,
	hpa, riel, cl

On Thu, 2010-07-08 at 21:05 +0300, Gleb Natapov wrote:
> > > +   /* do alloc atomic since if we are going to sleep anyway we
> > > +      may as well sleep faulting in page */
> > > +   work = kmem_cache_zalloc(async_pf_cache, GFP_ATOMIC);
> > > +   if (!work)
> > > +           return 0;
> > 
> > GFP_KERNEL is fine for this context.
> But it can sleep, no? The comment explains why I don't want to sleep
> here. 

In that case, use 0, no use wasting __GFP_HIGH on something that doesn't
actually need it.
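
For context, the distinction being drawn is roughly this (per
include/linux/gfp.h of that era):

	#define GFP_ATOMIC	(__GFP_HIGH)			/* no sleeping, may dip into emergency reserves */
	#define GFP_NOWAIT	(GFP_ATOMIC & ~__GFP_HIGH)	/* no sleeping, no access to reserves */

Neither flag includes __GFP_WAIT, so neither can sleep; GFP_NOWAIT simply
drops the right to consume the reserves that atomic contexts may genuinely need.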

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 08/12] Inject asynchronous page fault into a guest if page is swapped out.
  2010-07-08 18:09         ` Peter Zijlstra
  (?)
@ 2010-07-08 18:10           ` Peter Zijlstra
  -1 siblings, 0 replies; 67+ messages in thread
From: Peter Zijlstra @ 2010-07-08 18:10 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Marcelo Tosatti, kvm, linux-mm, linux-kernel, avi, mingo, tglx,
	hpa, riel, cl

On Thu, 2010-07-08 at 20:09 +0200, Peter Zijlstra wrote:
> On Thu, 2010-07-08 at 21:05 +0300, Gleb Natapov wrote:
> > > > +   /* do alloc atomic since if we are going to sleep anyway we
> > > > +      may as well sleep faulting in page */
> > > > +   work = kmem_cache_zalloc(async_pf_cache, GFP_ATOMIC);
> > > > +   if (!work)
> > > > +           return 0;
> > > 
> > > GFP_KERNEL is fine for this context.
> > But it can sleep, no? The comment explains why I don't want to sleep
> > here. 
> 
> In that case, use 0, no use wasting __GFP_HIGH on something that doesn't
> actually need it.

Ah, I just saw we have GFP_NOWAIT for that.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 08/12] Inject asynchronous page fault into a guest if page is swapped out.
  2010-07-08 18:10           ` Peter Zijlstra
@ 2010-07-09 15:50             ` Gleb Natapov
  -1 siblings, 0 replies; 67+ messages in thread
From: Gleb Natapov @ 2010-07-09 15:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marcelo Tosatti, kvm, linux-mm, linux-kernel, avi, mingo, tglx,
	hpa, riel, cl

On Thu, Jul 08, 2010 at 08:10:31PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-07-08 at 20:09 +0200, Peter Zijlstra wrote:
> > On Thu, 2010-07-08 at 21:05 +0300, Gleb Natapov wrote:
> > > > > +   /* do alloc atomic since if we are going to sleep anyway we
> > > > > +      may as well sleep faulting in page */
> > > > > +   work = kmem_cache_zalloc(async_pf_cache, GFP_ATOMIC);
> > > > > +   if (!work)
> > > > > +           return 0;
> > > > 
> > > > GFP_KERNEL is fine for this context.
> > > But it can sleep, no? The comment explains why I don't want to sleep
> > > here. 
> > 
> > In that case, use 0, no use wasting __GFP_HIGH on something that doesn't
> > actually need it.
> 
> Ah, I just saw we have GFP_NOWAIT for that.
Indeed. Will use GFP_NOWAIT.

--
			Gleb.
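
So the allocation in kvm_setup_async_pf() would presumably end up as the
sketch below (the agreed change applied to the hunk quoted earlier, not a
posted patch):

	/* do not sleep here; if the allocation fails we simply fall back
	 * to the synchronous (sleeping) page-in path */
	work = kmem_cache_zalloc(async_pf_cache, GFP_NOWAIT);
	if (!work)
		return 0;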

^ permalink raw reply	[flat|nested] 67+ messages in thread

end of thread, other threads:[~2010-07-09 15:51 UTC | newest]

Thread overview: 67+ messages
2010-07-06 16:24 [PATCH v4 00/12] KVM: Add host swap event notifications for PV guest Gleb Natapov
2010-07-06 16:24 ` Gleb Natapov
2010-07-06 16:24 ` [PATCH v4 01/12] Move kvm_smp_prepare_boot_cpu() from kvmclock.c to kvm.c Gleb Natapov
2010-07-06 16:24   ` Gleb Natapov
2010-07-07 16:17   ` Rik van Riel
2010-07-07 16:17     ` Rik van Riel
2010-07-06 16:24 ` [PATCH v4 02/12] Add PV MSR to enable asynchronous page faults delivery Gleb Natapov
2010-07-06 16:24   ` Gleb Natapov
2010-07-07 18:13   ` Rik van Riel
2010-07-07 18:13     ` Rik van Riel
2010-07-06 16:24 ` [PATCH v4 03/12] Add async PF initialization to PV guest Gleb Natapov
2010-07-06 16:24   ` Gleb Natapov
2010-07-07 15:41   ` Peter Zijlstra
2010-07-07 15:41     ` Peter Zijlstra
2010-07-07 15:41     ` Peter Zijlstra
2010-07-08 13:24     ` Gleb Natapov
2010-07-08 13:24       ` Gleb Natapov
2010-07-06 16:24 ` [PATCH v4 04/12] Provide special async page fault handler when async PF capability is detected Gleb Natapov
2010-07-06 16:24   ` Gleb Natapov
2010-07-07 22:38   ` Rik van Riel
2010-07-07 22:38     ` Rik van Riel
2010-07-06 16:24 ` [PATCH v4 05/12] Export __get_user_pages_fast Gleb Natapov
2010-07-06 16:24   ` Gleb Natapov
2010-07-07 22:45   ` Rik van Riel
2010-07-07 22:45     ` Rik van Riel
2010-07-06 16:24 ` [PATCH v4 06/12] Add get_user_pages() variant that fails if major fault is required Gleb Natapov
2010-07-06 16:24   ` Gleb Natapov
2010-07-06 16:24 ` [PATCH v4 07/12] Maintain memslot version number Gleb Natapov
2010-07-06 16:24   ` Gleb Natapov
2010-07-08  0:19   ` Rik van Riel
2010-07-08  0:19     ` Rik van Riel
2010-07-06 16:24 ` [PATCH v4 08/12] Inject asynchronous page fault into a guest if page is swapped out Gleb Natapov
2010-07-06 16:24   ` Gleb Natapov
2010-07-08  4:09   ` Rik van Riel
2010-07-08  4:09     ` Rik van Riel
2010-07-08 15:59   ` Marcelo Tosatti
2010-07-08 15:59     ` Marcelo Tosatti
2010-07-08 18:05     ` Gleb Natapov
2010-07-08 18:05       ` Gleb Natapov
2010-07-08 18:09       ` Peter Zijlstra
2010-07-08 18:09         ` Peter Zijlstra
2010-07-08 18:09         ` Peter Zijlstra
2010-07-08 18:10         ` Peter Zijlstra
2010-07-08 18:10           ` Peter Zijlstra
2010-07-08 18:10           ` Peter Zijlstra
2010-07-09 15:50           ` Gleb Natapov
2010-07-09 15:50             ` Gleb Natapov
2010-07-06 16:24 ` [PATCH v4 09/12] Retry fault before vmentry Gleb Natapov
2010-07-06 16:24   ` Gleb Natapov
2010-07-08  4:21   ` Rik van Riel
2010-07-08  4:21     ` Rik van Riel
2010-07-08 16:17   ` Marcelo Tosatti
2010-07-08 16:17     ` Marcelo Tosatti
2010-07-06 16:24 ` [PATCH v4 10/12] Handle async PF in non preemptable context Gleb Natapov
2010-07-06 16:24   ` Gleb Natapov
2010-07-08  4:22   ` Rik van Riel
2010-07-08  4:22     ` Rik van Riel
2010-07-06 16:24 ` [PATCH v4 11/12] Let host know whether the guest can handle async PF in non-userspace context Gleb Natapov
2010-07-06 16:24   ` Gleb Natapov
2010-07-08  4:28   ` Rik van Riel
2010-07-08  4:28     ` Rik van Riel
2010-07-08 13:35     ` Gleb Natapov
2010-07-08 13:35       ` Gleb Natapov
2010-07-06 16:25 ` [PATCH v4 12/12] Send async PF when guest is not in userspace too Gleb Natapov
2010-07-06 16:25   ` Gleb Natapov
2010-07-08  4:29   ` Rik van Riel
2010-07-08  4:29     ` Rik van Riel
