linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GIT PULL] KVM updates for the 2.6.38 merge window
@ 2011-01-10  9:21 Avi Kivity
  2011-01-10 19:31 ` Linus Torvalds
  0 siblings, 1 reply; 10+ messages in thread
From: Avi Kivity @ 2011-01-10  9:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Marcelo Tosatti, linux-kernel, KVM list

Linus, please pull from

   git://git.kernel.org/pub/scm/virt/kvm/kvm.git kvm-updates/2.6.38

to receive the KVM updates for the 2.6.38 merge window.  Major features 
include

- asynchronous page faults, which allow a guest to continue processing 
interrupts even when its memory is being paged in; in the case of a 
Linux 2.6.38+ guest, it will receive a notification that the host is 
servicing a page fault, and may switch into another guest process
- AMD Bulldozer virtualization extensions: instruction decode assist, 
clean bits, xsave/avx, flush-by-asid

as well as other minor enhancements and fixes.

Alexander Graf (1):
       KVM: PPC: Fix compile warning

Andi Kleen (1):
       KVM: Move KVM context switch into own function

Andre Przywara (8):
       KVM: x86: fix CR8 handling
       KVM: move complete_insn_gp() into x86.c
       KVM: cleanup emulate_instruction
       KVM: SVM: add new SVM feature bit names
       KVM: SVM: enhance MOV CR intercept handler
       KVM: SVM: enhance mov DR intercept handler
       KVM: SVM: implement enhanced INVLPG intercept
       KVM: SVM: copy instruction bytes from VMCB

Anthony Liguori (1):
       KVM: VMX: add module parameter to avoid trapping HLT instructions 
(v5)

Avi Kivity (36):
       KVM: Don't reset mmu context unnecessarily when updating EFER
       KVM: SVM: Move guest register save out of interrupts disabled section
       KVM: SVM: Move svm->host_gs_base into a separate structure
       KVM: SVM: Move fs/gs/ldt save/restore to heavyweight exit path
       KVM: SVM: Fold save_host_msrs() and load_host_msrs() into their 
callers
       KVM: Avoid double interrupt injection with vapic
       KVM: VMX: Disallow NMI while blocked by STI
       KVM: SVM: Replace svm_has() by standard Linux cpuid accessors
       KVM: Mask KVM_GET_SUPPORTED_CPUID data with Linux cpuid info
       KVM: x86 emulator: drop unused #ifndef __KERNEL__
       KVM: x86 emulator: drop DPRINTF()
       KVM: x86 emulator: preserve an operand's segment identity
       KVM: x86 emulator: do not perform address calculations on linear 
addresses
       KVM: VMX: Fold __vmx_vcpu_run() into vmx_vcpu_run()
       KVM: Record instruction set in kvm_exit tracepoint
       KVM: Add instruction-set-specific exit qualifications to kvm_exit 
trace
       KVM: x86 emulator: introduce struct x86_exception to communicate 
faults
       KVM: x86 emulator: make emulator memory callbacks return full 
exception
       KVM: x86 emulator: drop dead pf injection in emulate_popf()
       KVM: x86 emulator: tighen up ->read_std() and ->write_std() error 
checks
       KVM: x86 emulator: simplify exception generation
       KVM: Push struct x86_exception info the various gva_to_gpa variants
       KVM: Push struct x86_exception into walk_addr()
       KVM: Pull extra page fault information into struct x86_exception
       KVM: Don't spin on virt instruction faults during reboot
       KVM: VMX: Return 0 from a failed VMREAD
       KVM: Fix build error on s390 due to missing tlbs_dirty
       KVM: MMU: Fix incorrect direct page write protection due to ro 
host page
       KVM: Correct kvm_pio tracepoint count field
       KVM guest: Fix kvm clock initialization when it's configured out
       KVM: VMX: Add definitions for more vm entry/exit control bits
       KVM: VMX: Optimize atomic EFER load
       KVM: MMU: Initialize base_role for tdp mmus
       KVM: VMX: Correct asm constraint in vmcs_load()/vmcs_clear()
       KVM: Replace reads of vcpu->arch.cr3 by an accessor
       KVM: Fetch guest cr3 from hardware on demand

Gleb Natapov (16):
       Add get_user_pages() variant that fails if major fault is required.
       KVM: Halt vcpu if page it tries to access is swapped out.
       KVM: Retry fault before vmentry
       KVM: Add memory slot versioning and use it to provide fast guest 
write interface
       KVM paravirt: Move kvm_smp_prepare_boot_cpu() from kvmclock.c to 
kvm.c.
       KVM: Add PV MSR to enable asynchronous page faults delivery.
       KVM paravirt: Add async PF initialization to PV guest.
       KVM: Handle async PF in a guest.
       KVM: Inject asynchronous page fault into a PV guest if page is 
swapped out.
       KVM paravirt: Handle async PF in non preemptable context
       KVM: Let host know whether the guest can handle async PF in 
non-userspace context.
       KVM: Send async PF when guest is not in userspace too.
       KVM: improve hva_to_pfn() readability
       KVM: x86: trace "exit to userspace" event
       KVM: handle exit due to INVD in VMX
       KVM: VMX: when entering real mode align segment base to 16 bytes

Heiko Carstens (2):
       KVM: add cast within kvm_clear_guest_page to fix warning
       KVM: get rid of warning within kvm_dev_ioctl_create_vm

Jan Kiszka (11):
       KVM: x86: Mark kvm_arch_setup_async_pf static
       KVM: x86: Add missing inline tag to kvm_read_and_reset_pf_reason
       KVM: x86: Avoid issuing wbinvd twice
       KVM: Refactor srcu struct release on early errors
       KVM: Clean up vm creation and release
       KVM: Clear assigned guest IRQ on release
       KVM: Switch assigned device IRQ forwarding to threaded handler
       KVM: Refactor IRQ names of assigned devices
       KVM: Save/restore state of assigned PCI device
       KVM: Clean up kvm_vm_ioctl_assigned_device
       KVM: Document device assigment API

Joerg Roedel (23):
       KVM: X86: Introduce generic guest-mode representation
       KVM: SVM: Make Use of the generic guest-mode functions
       KVM: X86: Don't report L2 emulation failures to user-space
       KVM: SVM: Add function to recalculate intercept masks
       KVM: SVM: Add manipulation functions for DRx intercepts
       KVM: SVM: Add manipulation functions for exception intercepts
       KVM: SVM: Add manipulation functions for misc intercepts
       KVM: SVM: Use get_host_vmcb function in svm_get_msr for TSC
       KVM: SVM: Add clean-bit for intercetps, tsc-offset and pause 
filter count
       KVM: SVM: Add clean-bit for IOPM_BASE and MSRPM_BASE
       KVM: SVM: Add clean-bit for the ASID
       KVM: SVM: Add clean-bit for interrupt state
       KVM: SVM: Add clean-bit for NPT state
       KVM: SVM: Add clean-bit for control registers
       KVM: SVM: Add clean-bit for DR6 and DR7
       KVM: SVM: Add clean-bit for GDT and IDT
       KVM: SVM: Add clean-bit for Segements and CPL
       KVM: SVM: Add clean-bit for CR2 register
       KVM: SVM: Add clean-bit for LBR state
       KVM: SVM: Remove flush_guest_tlb function
       KVM: SVM: Use svm_flush_tlb instead of force_new_asid
       KVM: SVM: Implement Flush-By-Asid feature
       KVM: SVM: Add xsetbv intercept

Lai Jiangshan (2):
       KVM: MMU: rename 'reset_host_protection' to 'host_writable'
       KVM: return true when user space query KVM_CAP_USER_NMI extension

Marcelo Tosatti (5):
       KVM: VMX: remove setting of shadow_base_ptes for EPT
       KVM: MMU: remove kvm_mmu_set_base_ptes
       KVM: MMU: flush TLBs on writable -> read-only spte overwrite
       KVM: propagate fault r/w information to gup(), allow read-only memory
       KVM: MMU: only write protect mappings at pagetable level

Michael S. Tsirkin (1):
       KVM: fast-path msi injection with irqfd

Roedel, Joerg (2):
       KVM: SVM: Add manipulation functions for CRx intercepts
       KVM: SVM: Add clean-bits infrastructure code

Shane Wang (1):
       KVM: VMX: Inform user about INTEL_TXT dependency

Takuya Yoshikawa (9):
       KVM: introduce wrapper functions for creating/destroying dirty 
bitmaps
       KVM: pre-allocate one more dirty bitmap to avoid vmalloc()
       KVM: use kmalloc() for small dirty bitmaps
       KVM: replace vmalloc and memset with vzalloc
       KVM: take kvm_lock for hardware_disable() during cpu hotplug
       KVM: rename hardware_[dis|en]able() to *_nolock() and add locking 
wrappers
       KVM: MMU: Avoid dropping accessed bit while removing write access
       KVM: MMU: Make the way of accessing lpage_info more generic
       KVM: Take missing slots_lock for kvm_io_bus_unregister_dev()

Tracey Dent (1):
       KVM: x86: Makefile clean up

Xiao Guangrong (22):
       KVM: fix tracing kvm_try_async_get_page
       KVM: cleanup async_pf tracepoints
       KVM: fix searching async gfn in kvm_async_pf_gfn_slot
       KVM: avoid unnecessary wait for a async pf
       KVM: handle more completed apfs if possible
       KVM: fix the race while wakeup all pv guest
       KVM: remove unused function declaration
       KVM: MMU: fix missing post sync audit
       KVM: MMU: clear apfs if page state is changed
       KVM: MMU: support apf for nonpaing guest
       KVM: MMU: fix apf prefault if nested guest is enabled
       KVM: MMU: don't mark spte notrap if reserved bit set
       KVM: MMU: fix forgot flush tlbs on sync_page path
       KVM: MMU: don't drop spte if overwrite it from W to RO
       KVM: MMU: remove 'clear_unsync' parameter
       KVM: MMU: abstract invalid guest pte mapping
       KVM: MMU: delay flush all tlbs on sync_page path
       KVM: MMU: rename 'no_apf' to 'prefault'
       KVM: MMU: fix accessed bit set on prefault path
       KVM: MMU: retry #PF for softmmu
       KVM: MMU: audit: allow audit more guests at the same time
       KVM: MMU: handle 'map_writable' in set_spte() function

  Documentation/kernel-parameters.txt |    3 +
  Documentation/kvm/api.txt           |  178 +++++++
  Documentation/kvm/cpuid.txt         |    3 +
  Documentation/kvm/msr.txt           |   36 ++-
  arch/ia64/include/asm/kvm_host.h    |    4 +
  arch/ia64/kvm/kvm-ia64.c            |   30 +-
  arch/powerpc/kvm/book3s.c           |    4 +-
  arch/powerpc/kvm/powerpc.c          |   20 +-
  arch/s390/kvm/kvm-s390.c            |   23 +-
  arch/x86/include/asm/kvm_emulate.h  |   35 +-
  arch/x86/include/asm/kvm_host.h     |   99 +++--
  arch/x86/include/asm/kvm_para.h     |   24 +
  arch/x86/include/asm/svm.h          |   57 ++-
  arch/x86/include/asm/traps.h        |    1 +
  arch/x86/include/asm/vmx.h          |   15 +
  arch/x86/kernel/entry_32.S          |   10 +
  arch/x86/kernel/entry_64.S          |    3 +
  arch/x86/kernel/kvm.c               |  317 +++++++++++++
  arch/x86/kernel/kvmclock.c          |   13 +-
  arch/x86/kvm/Kconfig                |    1 +
  arch/x86/kvm/Makefile               |    3 +-
  arch/x86/kvm/emulate.c              |  367 +++++++---------
  arch/x86/kvm/kvm_cache_regs.h       |   22 +
  arch/x86/kvm/lapic.c                |    3 +-
  arch/x86/kvm/mmu.c                  |  251 +++++++----
  arch/x86/kvm/mmu_audit.c            |   39 +-
  arch/x86/kvm/paging_tmpl.h          |  147 ++++---
  arch/x86/kvm/svm.c                  |  865 
++++++++++++++++++++++-------------
  arch/x86/kvm/trace.h                |   17 +-
  arch/x86/kvm/vmx.c                  |  156 +++++--
  arch/x86/kvm/x86.c                  |  471 ++++++++++++++-----
  fs/ncpfs/mmap.c                     |    2 +
  include/linux/kvm.h                 |    1 +
  include/linux/kvm_host.h            |  101 ++++-
  include/linux/kvm_types.h           |    7 +
  include/linux/mm.h                  |    5 +
  include/trace/events/kvm.h          |  121 +++++
  mm/filemap.c                        |    3 +
  mm/memory.c                         |   31 ++-
  mm/shmem.c                          |    8 +-
  virt/kvm/Kconfig                    |    3 +
  virt/kvm/assigned-dev.c             |  125 ++----
  virt/kvm/async_pf.c                 |  216 +++++++++
  virt/kvm/async_pf.h                 |   36 ++
  virt/kvm/eventfd.c                  |   91 ++++-
  virt/kvm/irq_comm.c                 |    7 +-
  virt/kvm/kvm_main.c                 |  342 ++++++++++----
  47 files changed, 3127 insertions(+), 1189 deletions(-)
  create mode 100644 virt/kvm/async_pf.c
  create mode 100644 virt/kvm/async_pf.h

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GIT PULL] KVM updates for the 2.6.38 merge window
  2011-01-10  9:21 [GIT PULL] KVM updates for the 2.6.38 merge window Avi Kivity
@ 2011-01-10 19:31 ` Linus Torvalds
  2011-01-11  9:25   ` Avi Kivity
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2011-01-10 19:31 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, linux-kernel, KVM list

On Mon, Jan 10, 2011 at 1:21 AM, Avi Kivity <avi@redhat.com> wrote:
>
> - asynchronous page faults, which allow a guest to continue processing
> interrupts even when its memory is being paged in; in the case of a Linux
> 2.6.38+ guest, it will receive a notification that the host is servicing a
> page fault, and may switch into another guest process

So quite frankly, I don't like how this was done.

When you touch files like mm/memory.c, you don't just touch them. You
get sign-offs and acks from the VM maintainers. Seriously.

In this case, I pulled, looked, and then unpulled. I just don't want
it, and I think the new FAULT_FLAG_MINOR is seriously mis-named and
hacky.

Is it about atomicity? Is it about IO? Why wasn't I notified
before-hand? Was Andrew cc'd?

Not pulled.

                     Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GIT PULL] KVM updates for the 2.6.38 merge window
  2011-01-10 19:31 ` Linus Torvalds
@ 2011-01-11  9:25   ` Avi Kivity
  2011-01-11 16:19     ` Linus Torvalds
  2011-01-12 20:33     ` Rik van Riel
  0 siblings, 2 replies; 10+ messages in thread
From: Avi Kivity @ 2011-01-11  9:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Marcelo Tosatti, linux-kernel, KVM list

On 01/10/2011 09:31 PM, Linus Torvalds wrote:
> On Mon, Jan 10, 2011 at 1:21 AM, Avi Kivity<avi@redhat.com>  wrote:
> >
> >  - asynchronous page faults, which allow a guest to continue processing
> >  interrupts even when its memory is being paged in; in the case of a Linux
> >  2.6.38+ guest, it will receive a notification that the host is servicing a
> >  page fault, and may switch into another guest process
>
> So quite frankly, I don't like how this was done.
>
> When you touch files like mm/memory.c, you don't just touch them. You
> get sign-offs and acks from the VM maintainers. Seriously.
>
> In this case, I pulled, looked, and then unpulled. I just don't want
> it, and I think the new FAULT_FLAG_MINOR is seriously mis-named and
> hacky.
>
> Is it about atomicity? Is it about IO?

IO.  It means, only allow a minor fault; fail for a major fault.  Can 
you suggest a better name?

>   Why wasn't I notified
> before-hand? Was Andrew cc'd?

Andrew and linux-mm were copied.  Rik was the only one who reviewed (and 
ack'ed) it.  I guess I should have explicitly asked for Nick's review.

How do you want to proceed?  I can pull this patch out and stub out the 
callers in kvm (which will neuter async page faults for the time being, 
but we can live with that), then fix it in the background, or we can try 
to resolve it now.

What are your issues with the patch?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GIT PULL] KVM updates for the 2.6.38 merge window
  2011-01-11  9:25   ` Avi Kivity
@ 2011-01-11 16:19     ` Linus Torvalds
  2011-01-11 17:14       ` Avi Kivity
  2011-01-12 20:33     ` Rik van Riel
  1 sibling, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2011-01-11 16:19 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, linux-kernel, KVM list

On Tue, Jan 11, 2011 at 1:25 AM, Avi Kivity <avi@redhat.com> wrote:
>
> What are your issues with the patch?

My issues are mainly two-fold:

 - I think "MINOR" is a totally idiotic and meaningless term. It has
no technical meaning. Why would IO be special? Is it because of
deadlock concerns with filesystem or block device layer locks? No. And
it clearly isn't about "sleeping", since a major fault can be
non-sleeping (think ramdisk, for example).

   Look at the other FAULT_FLAG_xyzzy flags. They have _hard_
technical reasons. There's no ambiguity. And we ALREADY HAVE the one
that says "return if it would need to wait", and it's called
FAULT_FLAG_ALLOW_RETRY.

The other issue is:

 - I wasn't aware of this, and clearly not enough other people were
either, or somebody would have told you that we already had people
working on the whole FAULT_FLAG_ALLOW_RETRY thing that is much fancier
and technically superior.

So it simply boils down to the fact that I don't think
FAULT_FLAG_MAJOR was a good idea. It's badly done, is a total and
utter hack, and I don't see why I should ever merge it.

                          Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GIT PULL] KVM updates for the 2.6.38 merge window
  2011-01-11 16:19     ` Linus Torvalds
@ 2011-01-11 17:14       ` Avi Kivity
  0 siblings, 0 replies; 10+ messages in thread
From: Avi Kivity @ 2011-01-11 17:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Marcelo Tosatti, linux-kernel, KVM list

On 01/11/2011 06:19 PM, Linus Torvalds wrote:
> On Tue, Jan 11, 2011 at 1:25 AM, Avi Kivity<avi@redhat.com>  wrote:
>> What are your issues with the patch?
> My issues are mainly two-fold:
>
>   - I think "MINOR" is a totally idiotic and meaningless term. It has
> no technical meaning. Why would IO be special? Is it because of
> deadlock concerns with filesystem or block device layer locks? No. And
> it clearly isn't about "sleeping", since a major fault can be
> non-sleeping (think ramdisk, for example).
>
>     Look at the other FAULT_FLAG_xyzzy flags. They have _hard_
> technical reasons. There's no ambiguity. And we ALREADY HAVE the one
> that says "return if it would need to wait", and it's called
> FAULT_FLAG_ALLOW_RETRY.
>

Okay; I'll drop that patch, and look at reusing the existing infrastructure.

> The other issue is:
>
>   - I wasn't aware of this, and clearly not enough other people were
> either, or somebody would have told you that we already had people
> working on the whole FAULT_FLAG_ALLOW_RETRY thing that is much fancier
> and technically superior.
>
> So it simply boils down to the fact that I don't think
> FAULT_FLAG_MAJOR was a good idea. It's badly done, is a total and
> utter hack, and I don't see why I should ever merge it.

And I'll improve the process on core patches as well.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GIT PULL] KVM updates for the 2.6.38 merge window
  2011-01-11  9:25   ` Avi Kivity
  2011-01-11 16:19     ` Linus Torvalds
@ 2011-01-12 20:33     ` Rik van Riel
  2011-01-12 20:53       ` Linus Torvalds
  1 sibling, 1 reply; 10+ messages in thread
From: Rik van Riel @ 2011-01-12 20:33 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Linus Torvalds, Marcelo Tosatti, linux-kernel, KVM list

On 01/11/2011 04:25 AM, Avi Kivity wrote:
> On 01/10/2011 09:31 PM, Linus Torvalds wrote:

>> Why wasn't I notified
>> before-hand? Was Andrew cc'd?
>
> Andrew and linux-mm were copied. Rik was the only one who reviewed (and
> ack'ed) it. I guess I should have explicitly asked for Nick's review.

Last time I reviewed it, FAULT_FLAG_ALLOW_RETRY did not
exist.  The async pagefault patches have been sitting in
limbo for quite a while...

Now that we have FAULT_FLAG_ALLOW_RETRY, the async
pagefault patches can be a little smaller.

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GIT PULL] KVM updates for the 2.6.38 merge window
  2011-01-12 20:33     ` Rik van Riel
@ 2011-01-12 20:53       ` Linus Torvalds
  2011-01-13 12:53         ` Gleb Natapov
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2011-01-12 20:53 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Avi Kivity, Marcelo Tosatti, linux-kernel, KVM list

[-- Attachment #1: Type: text/plain, Size: 863 bytes --]

On Wed, Jan 12, 2011 at 12:33 PM, Rik van Riel <riel@redhat.com> wrote:
>
> Now that we have FAULT_FLAG_ALLOW_RETRY, the async
> pagefault patches can be a little smaller.

I suspect you do still want a new page flag, to say that
FAULT_FLAG_ALLOW_RETRY shouldn't actually wait for the page that it
allows retry for.

But even then, that flag should not be named "MINOR", it should be
about what the behaviour is actually all about ("NOWAIT_RETRY" or
whatever - it presumably would also cause us to not drop the
mmap_sem).

IOW, these days I suspect the patch _should_ look something like the attached.

Anyway, with this, you should be able to use

  FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT

to basically get a non-waiting page fault (and it will return the
VM_FAULT_RETRY error code if it failed).

NOTE! TOTALLY UNTESTED!

                       Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1116 bytes --]

 include/linux/mm.h |    1 +
 mm/filemap.c       |    6 ++++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 721f451..dc83565 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -145,6 +145,7 @@ extern pgprot_t protection_map[16];
 #define FAULT_FLAG_NONLINEAR	0x02	/* Fault was via a nonlinear mapping */
 #define FAULT_FLAG_MKWRITE	0x04	/* Fault was mkwrite of existing pte */
 #define FAULT_FLAG_ALLOW_RETRY	0x08	/* Retry fault if blocking */
+#define FAULT_FLAG_RETRY_NOWAIT	0x10	/* Don't drop mmap_sem and wait when retrying */
 
 /*
  * This interface is used by x86 PAT code to identify a pfn mapping that is
diff --git a/mm/filemap.c b/mm/filemap.c
index ca38939..10eecf7 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -621,8 +621,10 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 		__lock_page(page);
 		return 1;
 	} else {
-		up_read(&mm->mmap_sem);
-		wait_on_page_locked(page);
+		if (!(flags & FAULT_FLAG_RETRY_NOWAIT)) {
+			up_read(&mm->mmap_sem);
+			wait_on_page_locked(page);
+		}
 		return 0;
 	}
 }

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [GIT PULL] KVM updates for the 2.6.38 merge window
  2011-01-12 20:53       ` Linus Torvalds
@ 2011-01-13 12:53         ` Gleb Natapov
  2011-01-13 15:43           ` Linus Torvalds
  0 siblings, 1 reply; 10+ messages in thread
From: Gleb Natapov @ 2011-01-13 12:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rik van Riel, Avi Kivity, Marcelo Tosatti, linux-kernel, KVM list

On Wed, Jan 12, 2011 at 12:53:14PM -0800, Linus Torvalds wrote:
> On Wed, Jan 12, 2011 at 12:33 PM, Rik van Riel <riel@redhat.com> wrote:
> >
> > Now that we have FAULT_FLAG_ALLOW_RETRY, the async
> > pagefault patches can be a little smaller.
> 
> I suspect you do still want a new page flag, to say that
> FAULT_FLAG_ALLOW_RETRY shouldn't actually wait for the page that it
> allows retry for.
> 
> But even then, that flag should not be named "MINOR", it should be
> about what the behaviour is actually all about ("NOWAIT_RETRY" or
> whatever - it presumably would also cause us to not drop the
> mmap_sem).
> 
> IOW, these days I suspect the patch _should_ look something like the attached.
> 
> Anyway, with this, you should be able to use
> 
>   FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT
> 
> to basically get a non-waiting page fault (and it will return the
> VM_FAULT_RETRY error code if it failed).
> 
I implemented get_user_pages_nowait() on top of your patch. In my testing
it works as expected when used inside KVM. Does this looks OK to you?

diff --git a/include/linux/mm.h b/include/linux/mm.h
index dc83565..d78e9e7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -859,6 +859,9 @@ extern int access_process_vm(struct task_struct *tsk, unsigned long addr, void *
 int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 			unsigned long start, int nr_pages, int write, int force,
 			struct page **pages, struct vm_area_struct **vmas);
+int get_user_pages_nowait(struct task_struct *tsk, struct mm_struct *mm,
+			unsigned long start, int nr_pages, int write, int force,
+			struct page **pages, struct vm_area_struct **vmas);
 int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 			struct page **pages);
 struct page *get_dump_page(unsigned long addr);
@@ -1416,6 +1419,8 @@ struct page *follow_page(struct vm_area_struct *, unsigned long address,
 #define FOLL_GET	0x04	/* do get_page on page */
 #define FOLL_DUMP	0x08	/* give error on hole if it would be zero */
 #define FOLL_FORCE	0x10	/* get_user_pages read/write w/o permission */
+#define FOLL_RETRY	0x20	/* if disk transfer is needed release mmap_sem and return error */
+#define FOLL_NOWAIT	0x40	/* if disk transfer is needed return error without releasing mmap_sem */
 
 typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
 			void *data);
diff --git a/mm/memory.c b/mm/memory.c
index 02e48aa..0a3d3b5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1443,8 +1443,12 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 				int ret;
 
 				ret = handle_mm_fault(mm, vma, start,
-					(foll_flags & FOLL_WRITE) ?
-					FAULT_FLAG_WRITE : 0);
+					((foll_flags & FOLL_WRITE) ?
+					FAULT_FLAG_WRITE : 0) |
+					((foll_flags & FOLL_RETRY) ?
+					FAULT_FLAG_ALLOW_RETRY : 0) |
+					((foll_flags & FOLL_NOWAIT) ?
+					FAULT_FLAG_RETRY_NOWAIT : 0));
 
 				if (ret & VM_FAULT_ERROR) {
 					if (ret & VM_FAULT_OOM)
@@ -1460,6 +1464,9 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 				else
 					tsk->min_flt++;
 
+				if (ret & VM_FAULT_RETRY)
+					return i ? i : -EFAULT;
+
 				/*
 				 * The VM_FAULT_WRITE bit tells us that
 				 * do_wp_page has broken COW when necessary,
@@ -1563,6 +1570,23 @@ int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 }
 EXPORT_SYMBOL(get_user_pages);
 
+int get_user_pages_nowait(struct task_struct *tsk, struct mm_struct *mm,
+               unsigned long start, int nr_pages, int write, int force,
+               struct page **pages, struct vm_area_struct **vmas)
+{
+       int flags = FOLL_TOUCH | FOLL_RETRY | FOLL_NOWAIT;
+
+       if (pages)
+               flags |= FOLL_GET;
+       if (write)
+               flags |= FOLL_WRITE;
+       if (force)
+               flags |= FOLL_FORCE;
+
+       return __get_user_pages(tsk, mm, start, nr_pages, flags, pages, vmas);
+}
+EXPORT_SYMBOL(get_user_pages_nowait);
+
 /**
  * get_dump_page() - pin user page in memory while writing it to core dump
  * @addr: user address
--
			Gleb.

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [GIT PULL] KVM updates for the 2.6.38 merge window
  2011-01-13 12:53         ` Gleb Natapov
@ 2011-01-13 15:43           ` Linus Torvalds
  2011-01-13 18:58             ` Hugh Dickins
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2011-01-13 15:43 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Rik van Riel, Avi Kivity, Marcelo Tosatti, linux-kernel, KVM list

On Thu, Jan 13, 2011 at 4:53 AM, Gleb Natapov <gleb@redhat.com> wrote:
>
> I implemented get_user_pages_nowait() on top of your patch. In my testing
> it works as expected when used inside KVM. Does this looks OK to you?

It looks reasonable, although I suspect the subtle behavior wrt the
mmap_sem means that you should not expose the magic bare
FAULT_FLAG_ALLOW_RETRY flag to the __get_user_pages() thing. It's just
too easy to introduce bugs, methinks.

So I'd suggest

 - drop FOLL_RETRY

 -  make FOLL_NOWAIT set both (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_NOWAIT)

and that way the get_user_pages() thing will never release the
mmap_sem, and you never have any subtle locking issues for that
particular interface.

But some other VM person should look at it too.

                                 Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GIT PULL] KVM updates for the 2.6.38 merge window
  2011-01-13 15:43           ` Linus Torvalds
@ 2011-01-13 18:58             ` Hugh Dickins
  0 siblings, 0 replies; 10+ messages in thread
From: Hugh Dickins @ 2011-01-13 18:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Gleb Natapov, Rik van Riel, Avi Kivity, Marcelo Tosatti,
	linux-kernel, KVM list, Andrew Morton

On Thu, Jan 13, 2011 at 7:43 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Jan 13, 2011 at 4:53 AM, Gleb Natapov <gleb@redhat.com> wrote:
>>
>> I implemented get_user_pages_nowait() on top of your patch. In my testing
>> it works as expected when used inside KVM. Does this looks OK to you?
>
> It looks reasonable, although I suspect the subtle behavior wrt the
> mmap_sem means that you should not expose the magic bare
> FAULT_FLAG_ALLOW_RETRY flag to the __get_user_pages() thing. It's just
> too easy to introduce bugs, methinks.
>
> So I'd suggest
>
>  - drop FOLL_RETRY
>
>  -  make FOLL_NOWAIT set both (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_NOWAIT)
>
> and that way the get_user_pages() thing will never release the
> mmap_sem, and you never have any subtle locking issues for that
> particular interface.
>
> But some other VM person should look at it too.

I haven't been following and am not really looking yet, but should
express a preference: that Gleb keep it as he had it (if that works),
rather than making FOLL_NOWAIT do a combination.

A couple of months ago I needed to add a FOLL flag myself, and made a
patch to use the same space for FOLL flags and FAULT_FLAGs, to end
such ugliness as you see in this patch:

+                                       ((foll_flags & FOLL_WRITE) ?
+                                       FAULT_FLAG_WRITE : 0) |
+                                       ((foll_flags & FOLL_RETRY) ?
+                                       FAULT_FLAG_ALLOW_RETRY : 0) |
+                                       ((foll_flags & FOLL_NOWAIT) ?
+                                       FAULT_FLAG_RETRY_NOWAIT : 0));

But I missed my window, I think several have been added since, with
more on the way: I intend to put it together again for mmotm once the
dust has settled a little.

__get_user_pages already has wrappers for friendly usage, I think it's
okay for the wrappers to have special knowledge of what's needed.

Hugh

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2011-01-13 18:58 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-10  9:21 [GIT PULL] KVM updates for the 2.6.38 merge window Avi Kivity
2011-01-10 19:31 ` Linus Torvalds
2011-01-11  9:25   ` Avi Kivity
2011-01-11 16:19     ` Linus Torvalds
2011-01-11 17:14       ` Avi Kivity
2011-01-12 20:33     ` Rik van Riel
2011-01-12 20:53       ` Linus Torvalds
2011-01-13 12:53         ` Gleb Natapov
2011-01-13 15:43           ` Linus Torvalds
2011-01-13 18:58             ` Hugh Dickins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).