kvm.vger.kernel.org archive mirror
* [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit
@ 2023-03-15  2:17 Anish Moorthy
  2023-03-15  2:17 ` [WIP Patch v2 01/14] KVM: selftests: Allow many vCPUs and reader threads per UFFD in demand paging test Anish Moorthy
                   ` (15 more replies)
  0 siblings, 16 replies; 60+ messages in thread
From: Anish Moorthy @ 2023-03-15  2:17 UTC (permalink / raw)
  To: seanjc; +Cc: jthoughton, kvm, Anish Moorthy

Hi Sean, here's what I'm planning to send up as v2 of the scalable
userfaultfd series.

Don't worry, I'm not asking you to review this all :) I just have a few
remaining questions regarding KVM_CAP_MEMORY_FAULT_EXIT which seem important
enough to mention before I ask for more attention from others, and they'll be
clearer with the patches in hand. Anything else I'm happy to find out about when
I send the actual v2.

I want your opinion on

1. The general API I've set up for KVM_CAP_MEMORY_FAULT_EXIT
   (described in the api.rst file)
2. Whether the UNKNOWN exit reason cases (everywhere but
   handle_error_pfn atm) would need to be given "real" reasons
   before this could be merged.
3. If you think I've missed sites that currently -EFAULT to userspace

About (3): after we agreed to only tackle cases where -EFAULT currently makes it
to userspace, I went through our list and tried to trace which EFAULTs actually
bubble up to KVM_RUN. That set ended up being suspiciously small, so I wanted to
sanity-check my findings with you. Let me know if you see obvious errors in my
list below.

--- EFAULTs under KVM_RUN ---

Confident that needs conversion (already converted)
---------------------------------------------------
* direct_map
* handle_error_pfn
* setup_vmgexit_scratch
* kvm_handle_page_fault
* FNAME(fetch)

EFAULT does not propagate to userspace (do not convert)
-------------------------------------------------------
* record_steal_time (arch/x86/kvm/x86.c:3463)
* hva_to_pfn_retry
* kvm_vcpu_map
* FNAME(update_accessed_dirty_bits)
* __kvm_gfn_to_hva_cache_init
  Might actually make it to userspace, but only through
  kvm_read|write_guest_offset_cached, so it would be covered by those conversions
* kvm_gfn_to_hva_cache_init
* __kvm_read_guest_page
* hva_to_pfn_remapped
  handle_error_pfn will handle this for the scalable uffd case. Don't think
  other callers -EFAULT to userspace.

Still unsure if needs conversion
--------------------------------
* __kvm_read_guest_atomic
  The EFAULT might be propagated through FNAME(sync_page)?
* kvm_write_guest_offset_cached (virt/kvm/kvm_main.c:3226)
* __kvm_write_guest_page
  Called from kvm_write_guest_offset_cached: if that needs change, this does too
* kvm_write_guest_page
  Two interesting paths:
      - kvm_pv_clock_pairing returns a custom KVM_EFAULT error here
        (arch/x86/kvm/x86.c:9578)
      - kvm_write_guest_offset_cached returns this directly (so if that needs
        change, this does too)
* kvm_read_guest_offset_cached
  I actually do see a path to userspace, but it's through hyper-v, which we've
  said is out of scope for round 1.

--- Actual Cover Letter ---

Omitted: hasn't changed much since v1 anyways

--- Changelog ---

WIP v2
  - Introduce KVM_CAP_X86_MEMORY_FAULT_EXIT.
  - API changes:
        - Gate KVM_CAP_MEMORY_FAULT_NOWAIT behind
          KVM_CAP_X86_MEMORY_FAULT_EXIT (on x86 only: arm has no such
          requirement).
        - Switched to a memslot flag.
  - Take Oliver's simplification to the "allow fast gup for readable
    faults" logic.
  - Slightly redefine the return code of user_mem_abort.
  - Fix documentation errors brought up by Marc.
  - Reword commit messages in the imperative mood.

v1: https://lore.kernel.org/kvm/20230215011614.725983-1-amoorthy@google.com/

Anish Moorthy (14):
  KVM: selftests: Allow many vCPUs and reader threads per UFFD in demand
    paging test
  KVM: selftests: Use EPOLL in userfaultfd_util reader threads and
    signal errors via TEST_ASSERT
  KVM: Allow hva_pfn_fast to resolve read-only faults.
  KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run
    field
  KVM: x86: Implement memory fault exit for direct_map
  KVM: x86: Implement memory fault exit for kvm_handle_page_fault
  KVM: x86: Implement memory fault exit for setup_vmgexit_scratch
  KVM: x86: Implement memory fault exit for FNAME(fetch)
  KVM: Introduce KVM_CAP_MEMORY_FAULT_NOWAIT without implementation
  KVM: x86: Implement KVM_CAP_MEMORY_FAULT_NOWAIT
  KVM: arm64: Allow user_mem_abort to return 0 to signal a 'normal' exit
  KVM: arm64: Implement KVM_CAP_MEMORY_FAULT_NOWAIT
  KVM: selftests: Add memslot_flags parameter to memstress_create_vm
  KVM: selftests: Handle memory fault exits in demand_paging_test

 Documentation/virt/kvm/api.rst                |  74 ++++-
 arch/arm64/kvm/arm.c                          |   1 +
 arch/arm64/kvm/mmu.c                          |  29 +-
 arch/x86/kvm/mmu/mmu.c                        |  42 ++-
 arch/x86/kvm/mmu/paging_tmpl.h                |   4 +-
 arch/x86/kvm/svm/sev.c                        |   4 +-
 arch/x86/kvm/x86.c                            |   2 +
 include/linux/kvm_host.h                      |  22 ++
 include/uapi/linux/kvm.h                      |  19 ++
 tools/include/uapi/linux/kvm.h                |  17 ++
 .../selftests/kvm/aarch64/page_fault_test.c   |   4 +-
 .../selftests/kvm/access_tracking_perf_test.c |   2 +-
 .../selftests/kvm/demand_paging_test.c        | 253 ++++++++++++++----
 .../selftests/kvm/dirty_log_perf_test.c       |   2 +-
 .../testing/selftests/kvm/include/memstress.h |   2 +-
 .../selftests/kvm/include/userfaultfd_util.h  |  18 +-
 tools/testing/selftests/kvm/lib/memstress.c   |   4 +-
 .../selftests/kvm/lib/userfaultfd_util.c      | 160 ++++++-----
 .../kvm/memslot_modification_stress_test.c    |   2 +-
 virt/kvm/kvm_main.c                           |  41 ++-
 20 files changed, 544 insertions(+), 158 deletions(-)

-- 
2.40.0.rc1.284.g88254d51c5-goog


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [WIP Patch v2 01/14] KVM: selftests: Allow many vCPUs and reader threads per UFFD in demand paging test
  2023-03-15  2:17 [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit Anish Moorthy
@ 2023-03-15  2:17 ` Anish Moorthy
  2023-03-15  2:17 ` [WIP Patch v2 02/14] KVM: selftests: Use EPOLL in userfaultfd_util reader threads and signal errors via TEST_ASSERT Anish Moorthy
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Anish Moorthy @ 2023-03-15  2:17 UTC (permalink / raw)
  To: seanjc; +Cc: jthoughton, kvm, Anish Moorthy

At the moment, demand_paging_test does not support profiling/testing
multiple vCPU threads concurrently faulting on a single uffd because

  (a) "-u" (run test in userfaultfd mode) creates a uffd for each vCPU's
      region, so that each uffd services a single vCPU thread.
  (b) "-u -o" (userfaultfd mode + overlapped vCPU memory accesses)
      simply doesn't work: the test tries to register the same memory
      to multiple uffds, causing an error.

Add support for many vCPUs per uffd by
  (1) Keeping "-u" behavior unchanged.
  (2) Making "-u -a" create a single uffd for all of guest memory.
  (3) Making "-u -o" implicitly pass "-a", solving the problem in (b).
In cases (2) and (3) all vCPU threads fault on a single uffd.

With potentially multiple vCPUs per UFFD, it makes sense to allow
configuring the number of reader threads per UFFD as well: add the
"-r" flag to do so.

Signed-off-by: Anish Moorthy <amoorthy@google.com>
Acked-by: James Houghton <jthoughton@google.com>
---
 .../selftests/kvm/aarch64/page_fault_test.c   |  4 +-
 .../selftests/kvm/demand_paging_test.c        | 62 +++++++++----
 .../selftests/kvm/include/userfaultfd_util.h  | 18 +++-
 .../selftests/kvm/lib/userfaultfd_util.c      | 86 +++++++++++++------
 4 files changed, 125 insertions(+), 45 deletions(-)

diff --git a/tools/testing/selftests/kvm/aarch64/page_fault_test.c b/tools/testing/selftests/kvm/aarch64/page_fault_test.c
index df10f1ffa20d9..3b6d228a9340d 100644
--- a/tools/testing/selftests/kvm/aarch64/page_fault_test.c
+++ b/tools/testing/selftests/kvm/aarch64/page_fault_test.c
@@ -376,14 +376,14 @@ static void setup_uffd(struct kvm_vm *vm, struct test_params *p,
 		*pt_uffd = uffd_setup_demand_paging(uffd_mode, 0,
 						    pt_args.hva,
 						    pt_args.paging_size,
-						    test->uffd_pt_handler);
+						    1, test->uffd_pt_handler);
 
 	*data_uffd = NULL;
 	if (test->uffd_data_handler)
 		*data_uffd = uffd_setup_demand_paging(uffd_mode, 0,
 						      data_args.hva,
 						      data_args.paging_size,
-						      test->uffd_data_handler);
+						      1, test->uffd_data_handler);
 }
 
 static void free_uffd(struct test_desc *test, struct uffd_desc *pt_uffd,
diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
index b0e1fc4de9e29..fc9c6ac76660c 100644
--- a/tools/testing/selftests/kvm/demand_paging_test.c
+++ b/tools/testing/selftests/kvm/demand_paging_test.c
@@ -58,7 +58,7 @@ static void vcpu_worker(struct memstress_vcpu_args *vcpu_args)
 }
 
 static int handle_uffd_page_request(int uffd_mode, int uffd,
-		struct uffd_msg *msg)
+									struct uffd_msg *msg)
 {
 	pid_t tid = syscall(__NR_gettid);
 	uint64_t addr = msg->arg.pagefault.address;
@@ -77,8 +77,15 @@ static int handle_uffd_page_request(int uffd_mode, int uffd,
 		copy.mode = 0;
 
 		r = ioctl(uffd, UFFDIO_COPY, &copy);
-		if (r == -1) {
-			pr_info("Failed UFFDIO_COPY in 0x%lx from thread %d with errno: %d\n",
+		/*
+		 * When multiple vCPU threads fault on a single page and there are
+		 * multiple readers for the UFFD, at least one of the UFFDIO_COPYs
+		 * will fail with EEXIST: handle that case without signaling an
+		 * error.
+		 */
+		if (r == -1 && errno != EEXIST) {
+			pr_info(
+				"Failed UFFDIO_COPY in 0x%lx from thread %d, errno = %d\n",
 				addr, tid, errno);
 			return r;
 		}
@@ -89,8 +96,10 @@ static int handle_uffd_page_request(int uffd_mode, int uffd,
 		cont.range.len = demand_paging_size;
 
 		r = ioctl(uffd, UFFDIO_CONTINUE, &cont);
-		if (r == -1) {
-			pr_info("Failed UFFDIO_CONTINUE in 0x%lx from thread %d with errno: %d\n",
+		/* See the note about EEXISTs in the UFFDIO_COPY branch. */
+		if (r == -1 && errno != EEXIST) {
+			pr_info(
+				"Failed UFFDIO_CONTINUE in 0x%lx from thread %d, errno = %d\n",
 				addr, tid, errno);
 			return r;
 		}
@@ -110,7 +119,9 @@ static int handle_uffd_page_request(int uffd_mode, int uffd,
 
 struct test_params {
 	int uffd_mode;
+	bool single_uffd;
 	useconds_t uffd_delay;
+	int readers_per_uffd;
 	enum vm_mem_backing_src_type src_type;
 	bool partition_vcpu_memory_access;
 };
@@ -133,7 +144,8 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	struct timespec start;
 	struct timespec ts_diff;
 	struct kvm_vm *vm;
-	int i;
+	int i, num_uffds = 0;
+	uint64_t uffd_region_size;
 
 	vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1,
 				 p->src_type, p->partition_vcpu_memory_access);
@@ -146,10 +158,13 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	memset(guest_data_prototype, 0xAB, demand_paging_size);
 
 	if (p->uffd_mode) {
-		uffd_descs = malloc(nr_vcpus * sizeof(struct uffd_desc *));
+		num_uffds = p->single_uffd ? 1 : nr_vcpus;
+		uffd_region_size = nr_vcpus * guest_percpu_mem_size / num_uffds;
+
+		uffd_descs = malloc(num_uffds * sizeof(struct uffd_desc *));
 		TEST_ASSERT(uffd_descs, "Memory allocation failed");
 
-		for (i = 0; i < nr_vcpus; i++) {
+		for (i = 0; i < num_uffds; i++) {
 			struct memstress_vcpu_args *vcpu_args;
 			void *vcpu_hva;
 			void *vcpu_alias;
@@ -160,8 +175,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 			vcpu_hva = addr_gpa2hva(vm, vcpu_args->gpa);
 			vcpu_alias = addr_gpa2alias(vm, vcpu_args->gpa);
 
-			prefault_mem(vcpu_alias,
-				vcpu_args->pages * memstress_args.guest_page_size);
+			prefault_mem(vcpu_alias, uffd_region_size);
 
 			/*
 			 * Set up user fault fd to handle demand paging
@@ -169,7 +183,8 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 			 */
 			uffd_descs[i] = uffd_setup_demand_paging(
 				p->uffd_mode, p->uffd_delay, vcpu_hva,
-				vcpu_args->pages * memstress_args.guest_page_size,
+				uffd_region_size,
+				p->readers_per_uffd,
 				&handle_uffd_page_request);
 		}
 	}
@@ -186,7 +201,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 
 	if (p->uffd_mode) {
 		/* Tell the user fault fd handler threads to quit */
-		for (i = 0; i < nr_vcpus; i++)
+		for (i = 0; i < num_uffds; i++)
 			uffd_stop_demand_paging(uffd_descs[i]);
 	}
 
@@ -206,14 +221,19 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 static void help(char *name)
 {
 	puts("");
-	printf("usage: %s [-h] [-m vm_mode] [-u uffd_mode] [-d uffd_delay_usec]\n"
-	       "          [-b memory] [-s type] [-v vcpus] [-o]\n", name);
+	printf("usage: %s [-h] [-m vm_mode] [-u uffd_mode] [-a]\n"
+		   "          [-d uffd_delay_usec] [-r readers_per_uffd] [-b memory]\n"
+		   "          [-s type] [-v vcpus] [-o]\n", name);
 	guest_modes_help();
 	printf(" -u: use userfaultfd to handle vCPU page faults. Mode is a\n"
 	       "     UFFD registration mode: 'MISSING' or 'MINOR'.\n");
+	printf(" -a: Use a single userfaultfd for all of guest memory, instead of\n"
+		   "     creating one for each region paged by a unique vCPU\n"
+		   "     Set implicitly with -o, and no effect without -u.\n");
 	printf(" -d: add a delay in usec to the User Fault\n"
 	       "     FD handler to simulate demand paging\n"
 	       "     overheads. Ignored without -u.\n");
+	printf(" -r: Set the number of reader threads per uffd.\n");
 	printf(" -b: specify the size of the memory region which should be\n"
 	       "     demand paged by each vCPU. e.g. 10M or 3G.\n"
 	       "     Default: 1G\n");
@@ -231,12 +251,14 @@ int main(int argc, char *argv[])
 	struct test_params p = {
 		.src_type = DEFAULT_VM_MEM_SRC,
 		.partition_vcpu_memory_access = true,
+		.readers_per_uffd = 1,
+		.single_uffd = false,
 	};
 	int opt;
 
 	guest_modes_append_default();
 
-	while ((opt = getopt(argc, argv, "hm:u:d:b:s:v:o")) != -1) {
+	while ((opt = getopt(argc, argv, "ahom:u:d:b:s:v:r:")) != -1) {
 		switch (opt) {
 		case 'm':
 			guest_modes_cmdline(optarg);
@@ -248,6 +270,9 @@ int main(int argc, char *argv[])
 				p.uffd_mode = UFFDIO_REGISTER_MODE_MINOR;
 			TEST_ASSERT(p.uffd_mode, "UFFD mode must be 'MISSING' or 'MINOR'.");
 			break;
+		case 'a':
+			p.single_uffd = true;
+			break;
 		case 'd':
 			p.uffd_delay = strtoul(optarg, NULL, 0);
 			TEST_ASSERT(p.uffd_delay >= 0, "A negative UFFD delay is not supported.");
@@ -265,6 +290,13 @@ int main(int argc, char *argv[])
 			break;
 		case 'o':
 			p.partition_vcpu_memory_access = false;
+			p.single_uffd = true;
+			break;
+		case 'r':
+			p.readers_per_uffd = atoi(optarg);
+			TEST_ASSERT(p.readers_per_uffd >= 1,
+						"Invalid number of readers per uffd %d: must be >=1",
+						p.readers_per_uffd);
 			break;
 		case 'h':
 		default:
diff --git a/tools/testing/selftests/kvm/include/userfaultfd_util.h b/tools/testing/selftests/kvm/include/userfaultfd_util.h
index 877449c345928..92cc1f9ec0686 100644
--- a/tools/testing/selftests/kvm/include/userfaultfd_util.h
+++ b/tools/testing/selftests/kvm/include/userfaultfd_util.h
@@ -17,18 +17,30 @@
 
 typedef int (*uffd_handler_t)(int uffd_mode, int uffd, struct uffd_msg *msg);
 
+struct uffd_reader_args {
+	int uffd_mode;
+	int uffd;
+	useconds_t delay;
+	uffd_handler_t handler;
+	/* Holds the read end of the pipe for killing the reader. */
+	int pipe;
+};
+
 struct uffd_desc {
 	int uffd_mode;
 	int uffd;
-	int pipefds[2];
 	useconds_t delay;
 	uffd_handler_t handler;
-	pthread_t thread;
+	uint64_t num_readers;
+	/* Holds the write ends of the pipes for killing the readers. */
+	int *pipefds;
+	pthread_t *readers;
+	struct uffd_reader_args *reader_args;
 };
 
 struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay,
 					   void *hva, uint64_t len,
-					   uffd_handler_t handler);
+					   uint64_t num_readers, uffd_handler_t handler);
 
 void uffd_stop_demand_paging(struct uffd_desc *uffd);
 
diff --git a/tools/testing/selftests/kvm/lib/userfaultfd_util.c b/tools/testing/selftests/kvm/lib/userfaultfd_util.c
index 92cef20902f1f..2723ee1e3e1b2 100644
--- a/tools/testing/selftests/kvm/lib/userfaultfd_util.c
+++ b/tools/testing/selftests/kvm/lib/userfaultfd_util.c
@@ -27,10 +27,8 @@
 
 static void *uffd_handler_thread_fn(void *arg)
 {
-	struct uffd_desc *uffd_desc = (struct uffd_desc *)arg;
-	int uffd = uffd_desc->uffd;
-	int pipefd = uffd_desc->pipefds[0];
-	useconds_t delay = uffd_desc->delay;
+	struct uffd_reader_args *reader_args = (struct uffd_reader_args *)arg;
+	int uffd = reader_args->uffd;
 	int64_t pages = 0;
 	struct timespec start;
 	struct timespec ts_diff;
@@ -44,7 +42,7 @@ static void *uffd_handler_thread_fn(void *arg)
 
 		pollfd[0].fd = uffd;
 		pollfd[0].events = POLLIN;
-		pollfd[1].fd = pipefd;
+		pollfd[1].fd = reader_args->pipe;
 		pollfd[1].events = POLLIN;
 
 		r = poll(pollfd, 2, -1);
@@ -92,9 +90,9 @@ static void *uffd_handler_thread_fn(void *arg)
 		if (!(msg.event & UFFD_EVENT_PAGEFAULT))
 			continue;
 
-		if (delay)
-			usleep(delay);
-		r = uffd_desc->handler(uffd_desc->uffd_mode, uffd, &msg);
+		if (reader_args->delay)
+			usleep(reader_args->delay);
+		r = reader_args->handler(reader_args->uffd_mode, uffd, &msg);
 		if (r < 0)
 			return NULL;
 		pages++;
@@ -110,7 +108,7 @@ static void *uffd_handler_thread_fn(void *arg)
 
 struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay,
 					   void *hva, uint64_t len,
-					   uffd_handler_t handler)
+					   uint64_t num_readers, uffd_handler_t handler)
 {
 	struct uffd_desc *uffd_desc;
 	bool is_minor = (uffd_mode == UFFDIO_REGISTER_MODE_MINOR);
@@ -118,14 +116,26 @@ struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay,
 	struct uffdio_api uffdio_api;
 	struct uffdio_register uffdio_register;
 	uint64_t expected_ioctls = ((uint64_t) 1) << _UFFDIO_COPY;
-	int ret;
+	int ret, i;
 
 	PER_PAGE_DEBUG("Userfaultfd %s mode, faults resolved with %s\n",
 		       is_minor ? "MINOR" : "MISSING",
 		       is_minor ? "UFFDIO_CONINUE" : "UFFDIO_COPY");
 
 	uffd_desc = malloc(sizeof(struct uffd_desc));
-	TEST_ASSERT(uffd_desc, "malloc failed");
+	TEST_ASSERT(uffd_desc, "Failed to malloc uffd descriptor");
+
+	uffd_desc->pipefds = malloc(sizeof(int) * num_readers);
+	TEST_ASSERT(uffd_desc->pipefds, "Failed to malloc pipes");
+
+	uffd_desc->readers = malloc(sizeof(pthread_t) * num_readers);
+	TEST_ASSERT(uffd_desc->readers, "Failed to malloc reader threads");
+
+	uffd_desc->reader_args = malloc(
+		sizeof(struct uffd_reader_args) * num_readers);
+	TEST_ASSERT(uffd_desc->reader_args, "Failed to malloc reader_args");
+
+	uffd_desc->num_readers = num_readers;
 
 	/* In order to get minor faults, prefault via the alias. */
 	if (is_minor)
@@ -148,18 +158,32 @@ struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay,
 	TEST_ASSERT((uffdio_register.ioctls & expected_ioctls) ==
 		    expected_ioctls, "missing userfaultfd ioctls");
 
-	ret = pipe2(uffd_desc->pipefds, O_CLOEXEC | O_NONBLOCK);
-	TEST_ASSERT(!ret, "Failed to set up pipefd");
-
 	uffd_desc->uffd_mode = uffd_mode;
 	uffd_desc->uffd = uffd;
 	uffd_desc->delay = delay;
 	uffd_desc->handler = handler;
-	pthread_create(&uffd_desc->thread, NULL, uffd_handler_thread_fn,
-		       uffd_desc);
 
-	PER_VCPU_DEBUG("Created uffd thread for HVA range [%p, %p)\n",
-		       hva, hva + len);
+	for (i = 0; i < uffd_desc->num_readers; ++i) {
+		int pipes[2];
+
+		ret = pipe2((int *) &pipes, O_CLOEXEC | O_NONBLOCK);
+		TEST_ASSERT(!ret, "Failed to set up pipefd %i for uffd_desc %p",
+					i, uffd_desc);
+
+		uffd_desc->pipefds[i] = pipes[1];
+
+		uffd_desc->reader_args[i].uffd_mode = uffd_mode;
+		uffd_desc->reader_args[i].uffd = uffd;
+		uffd_desc->reader_args[i].delay = delay;
+		uffd_desc->reader_args[i].handler = handler;
+		uffd_desc->reader_args[i].pipe = pipes[0];
+
+		pthread_create(&uffd_desc->readers[i], NULL, uffd_handler_thread_fn,
+					   &uffd_desc->reader_args[i]);
+
+		PER_VCPU_DEBUG("Created uffd thread %i for HVA range [%p, %p)\n",
+					   i, hva, hva + len);
+	}
 
 	return uffd_desc;
 }
@@ -167,19 +191,31 @@ struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay,
 void uffd_stop_demand_paging(struct uffd_desc *uffd)
 {
 	char c = 0;
-	int ret;
+	int i, ret;
 
-	ret = write(uffd->pipefds[1], &c, 1);
-	TEST_ASSERT(ret == 1, "Unable to write to pipefd");
+	for (i = 0; i < uffd->num_readers; ++i) {
+		ret = write(uffd->pipefds[i], &c, 1);
+		TEST_ASSERT(
+			ret == 1, "Unable to write to pipefd %i for uffd_desc %p", i, uffd);
+	}
 
-	ret = pthread_join(uffd->thread, NULL);
-	TEST_ASSERT(ret == 0, "Pthread_join failed.");
+	for (i = 0; i < uffd->num_readers; ++i) {
+		ret = pthread_join(uffd->readers[i], NULL);
+		TEST_ASSERT(
+			ret == 0,
+			"Pthread_join failed on reader thread %i for uffd_desc %p", i, uffd);
+	}
 
 	close(uffd->uffd);
 
-	close(uffd->pipefds[1]);
-	close(uffd->pipefds[0]);
+	for (i = 0; i < uffd->num_readers; ++i) {
+		close(uffd->pipefds[i]);
+		close(uffd->reader_args[i].pipe);
+	}
 
+	free(uffd->pipefds);
+	free(uffd->readers);
+	free(uffd->reader_args);
 	free(uffd);
 }
 
-- 
2.40.0.rc1.284.g88254d51c5-goog



* [WIP Patch v2 02/14] KVM: selftests: Use EPOLL in userfaultfd_util reader threads and signal errors via TEST_ASSERT
  2023-03-15  2:17 [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit Anish Moorthy
  2023-03-15  2:17 ` [WIP Patch v2 01/14] KVM: selftests: Allow many vCPUs and reader threads per UFFD in demand paging test Anish Moorthy
@ 2023-03-15  2:17 ` Anish Moorthy
  2023-03-15  2:17 ` [WIP Patch v2 03/14] KVM: Allow hva_pfn_fast to resolve read-only faults Anish Moorthy
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Anish Moorthy @ 2023-03-15  2:17 UTC (permalink / raw)
  To: seanjc; +Cc: jthoughton, kvm, Anish Moorthy

With multiple reader threads POLLing a single UFFD, the test suffers
from the thundering herd problem: performance degrades as the number of
reader threads is increased. Solve this issue [1] by switching the
polling mechanism to EPOLL + EPOLLEXCLUSIVE.

Also, change the error-handling convention of uffd_handler_thread_fn.
Instead of just printing errors and returning early from the polling
loop, check for them via TEST_ASSERT. "return NULL" is reserved for a
successful exit from uffd_handler_thread_fn, i.e. one triggered by a
write to the exit pipe.

Performance samples generated by the command in [2] are given below.

Num Reader Threads, Paging Rate (POLL), Paging Rate (EPOLL)
1      249k      185k
2      201k      235k
4      186k      155k
16     150k      217k
32     89k       198k

[1] Single-vCPU performance does suffer somewhat.
[2] ./demand_paging_test -u MINOR -s shmem -v 4 -o -r <num readers>

Signed-off-by: Anish Moorthy <amoorthy@google.com>
Acked-by: James Houghton <jthoughton@google.com>
---
 .../selftests/kvm/demand_paging_test.c        |  1 -
 .../selftests/kvm/lib/userfaultfd_util.c      | 76 +++++++++----------
 2 files changed, 37 insertions(+), 40 deletions(-)

diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
index fc9c6ac76660c..f8c1831614a9d 100644
--- a/tools/testing/selftests/kvm/demand_paging_test.c
+++ b/tools/testing/selftests/kvm/demand_paging_test.c
@@ -13,7 +13,6 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <time.h>
-#include <poll.h>
 #include <pthread.h>
 #include <linux/userfaultfd.h>
 #include <sys/syscall.h>
diff --git a/tools/testing/selftests/kvm/lib/userfaultfd_util.c b/tools/testing/selftests/kvm/lib/userfaultfd_util.c
index 2723ee1e3e1b2..863840d340105 100644
--- a/tools/testing/selftests/kvm/lib/userfaultfd_util.c
+++ b/tools/testing/selftests/kvm/lib/userfaultfd_util.c
@@ -16,6 +16,7 @@
 #include <poll.h>
 #include <pthread.h>
 #include <linux/userfaultfd.h>
+#include <sys/epoll.h>
 #include <sys/syscall.h>
 
 #include "kvm_util.h"
@@ -32,60 +33,56 @@ static void *uffd_handler_thread_fn(void *arg)
 	int64_t pages = 0;
 	struct timespec start;
 	struct timespec ts_diff;
+	int epollfd;
+	struct epoll_event evt;
+
+	epollfd = epoll_create(1);
+	TEST_ASSERT(epollfd >= 0, "Failed to create epollfd.");
+
+	evt.events = EPOLLIN | EPOLLEXCLUSIVE;
+	evt.data.u32 = 0;
+	TEST_ASSERT(epoll_ctl(epollfd, EPOLL_CTL_ADD, uffd, &evt) == 0,
+				"Failed to add uffd to epollfd");
+
+	evt.events = EPOLLIN;
+	evt.data.u32 = 1;
+	TEST_ASSERT(epoll_ctl(epollfd, EPOLL_CTL_ADD, reader_args->pipe, &evt) == 0,
+				"Failed to add pipe to epollfd");
 
 	clock_gettime(CLOCK_MONOTONIC, &start);
 	while (1) {
 		struct uffd_msg msg;
-		struct pollfd pollfd[2];
-		char tmp_chr;
 		int r;
 
-		pollfd[0].fd = uffd;
-		pollfd[0].events = POLLIN;
-		pollfd[1].fd = reader_args->pipe;
-		pollfd[1].events = POLLIN;
-
-		r = poll(pollfd, 2, -1);
-		switch (r) {
-		case -1:
-			pr_info("poll err");
-			continue;
-		case 0:
-			continue;
-		case 1:
-			break;
-		default:
-			pr_info("Polling uffd returned %d", r);
-			return NULL;
-		}
+		r = epoll_wait(epollfd, &evt, 1, -1);
+		TEST_ASSERT(
+			r == 1,
+			"Unexpected number of events (%d) returned by epoll, errno = %d",
+			r, errno);
 
-		if (pollfd[0].revents & POLLERR) {
-			pr_info("uffd revents has POLLERR");
-			return NULL;
-		}
+		if (evt.data.u32 == 1) {
+			char tmp_chr;
 
-		if (pollfd[1].revents & POLLIN) {
-			r = read(pollfd[1].fd, &tmp_chr, 1);
+			TEST_ASSERT(!(evt.events & (EPOLLERR | EPOLLHUP)),
+						"Reader thread received EPOLLERR or EPOLLHUP on pipe.");
+			r = read(reader_args->pipe, &tmp_chr, 1);
 			TEST_ASSERT(r == 1,
-				    "Error reading pipefd in UFFD thread\n");
+						"Error reading pipefd in uffd reader thread");
 			return NULL;
 		}
 
-		if (!(pollfd[0].revents & POLLIN))
-			continue;
+		TEST_ASSERT(!(evt.events & (EPOLLERR | EPOLLHUP)),
+					"Reader thread received EPOLLERR or EPOLLHUP on uffd.");
 
 		r = read(uffd, &msg, sizeof(msg));
 		if (r == -1) {
-			if (errno == EAGAIN)
-				continue;
-			pr_info("Read of uffd got errno %d\n", errno);
-			return NULL;
+			TEST_ASSERT(errno == EAGAIN,
+						"Error reading from UFFD: errno = %d", errno);
+			continue;
 		}
 
-		if (r != sizeof(msg)) {
-			pr_info("Read on uffd returned unexpected size: %d bytes", r);
-			return NULL;
-		}
+		TEST_ASSERT(r == sizeof(msg),
+					"Read on uffd returned unexpected number of bytes (%d)", r);
 
 		if (!(msg.event & UFFD_EVENT_PAGEFAULT))
 			continue;
@@ -93,8 +90,9 @@ static void *uffd_handler_thread_fn(void *arg)
 		if (reader_args->delay)
 			usleep(reader_args->delay);
 		r = reader_args->handler(reader_args->uffd_mode, uffd, &msg);
-		if (r < 0)
-			return NULL;
+		TEST_ASSERT(
+			r >= 0,
+			"Reader thread handler function returned negative value %d", r);
 		pages++;
 	}
 
-- 
2.40.0.rc1.284.g88254d51c5-goog



* [WIP Patch v2 03/14] KVM: Allow hva_pfn_fast to resolve read-only faults.
  2023-03-15  2:17 [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit Anish Moorthy
  2023-03-15  2:17 ` [WIP Patch v2 01/14] KVM: selftests: Allow many vCPUs and reader threads per UFFD in demand paging test Anish Moorthy
  2023-03-15  2:17 ` [WIP Patch v2 02/14] KVM: selftests: Use EPOLL in userfaultfd_util reader threads and signal errors via TEST_ASSERT Anish Moorthy
@ 2023-03-15  2:17 ` Anish Moorthy
  2023-03-15  2:17 ` [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field Anish Moorthy
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Anish Moorthy @ 2023-03-15  2:17 UTC (permalink / raw)
  To: seanjc; +Cc: jthoughton, kvm, Anish Moorthy

hva_to_pfn_fast currently just fails for read-only faults, which is
unnecessary. Instead, try pinning the page without passing FOLL_WRITE.
This allows read-only faults to (potentially) be resolved without
falling back to slow GUP.

Suggested-by: James Houghton <jthoughton@google.com>
Signed-off-by: Anish Moorthy <amoorthy@google.com>
---
 virt/kvm/kvm_main.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d255964ec331e..e38ddda05b261 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2479,7 +2479,7 @@ static inline int check_user_page_hwpoison(unsigned long addr)
 }
 
 /*
- * The fast path to get the writable pfn which will be stored in @pfn,
+ * The fast path to get the pfn which will be stored in @pfn,
  * true indicates success, otherwise false is returned.  It's also the
  * only part that runs if we can in atomic context.
  */
@@ -2487,16 +2487,14 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
 			    bool *writable, kvm_pfn_t *pfn)
 {
 	struct page *page[1];
-
 	/*
 	 * Fast pin a writable pfn only if it is a write fault request
 	 * or the caller allows to map a writable pfn for a read fault
 	 * request.
 	 */
-	if (!(write_fault || writable))
-		return false;
+	unsigned int gup_flags = (write_fault || writable) ? FOLL_WRITE : 0;
 
-	if (get_user_page_fast_only(addr, FOLL_WRITE, page)) {
+	if (get_user_page_fast_only(addr, gup_flags, page)) {
 		*pfn = page_to_pfn(page[0]);
 
 		if (writable)
-- 
2.40.0.rc1.284.g88254d51c5-goog



* [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field
  2023-03-15  2:17 [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit Anish Moorthy
                   ` (2 preceding siblings ...)
  2023-03-15  2:17 ` [WIP Patch v2 03/14] KVM: Allow hva_pfn_fast to resolve read-only faults Anish Moorthy
@ 2023-03-15  2:17 ` Anish Moorthy
  2023-03-17  0:02   ` Isaku Yamahata
  2023-03-17 18:35   ` Oliver Upton
  2023-03-15  2:17 ` [WIP Patch v2 05/14] KVM: x86: Implement memory fault exit for direct_map Anish Moorthy
                   ` (11 subsequent siblings)
  15 siblings, 2 replies; 60+ messages in thread
From: Anish Moorthy @ 2023-03-15  2:17 UTC (permalink / raw)
  To: seanjc; +Cc: jthoughton, kvm, Anish Moorthy

Memory fault exits allow KVM to return useful information from
KVM_RUN instead of having to -EFAULT when a guest memory access goes
wrong. Document the intent and API of the new capability, and introduce
helper functions which will be useful in places where it needs to be
implemented.

Also allow the capability to be enabled, even though that won't
currently *do* anything: implementations at the relevant -EFAULT sites
will be performed in subsequent commits.
---
 Documentation/virt/kvm/api.rst | 37 ++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c             |  1 +
 include/linux/kvm_host.h       | 16 +++++++++++++++
 include/uapi/linux/kvm.h       | 16 +++++++++++++++
 tools/include/uapi/linux/kvm.h | 15 ++++++++++++++
 virt/kvm/kvm_main.c            | 28 +++++++++++++++++++++++++
 6 files changed, 113 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 62de0768d6aa5..f9ca18bbec879 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6636,6 +6636,19 @@ array field represents return values. The userspace should update the return
 values of SBI call before resuming the VCPU. For more details on RISC-V SBI
 spec refer, https://github.com/riscv/riscv-sbi-doc.
 
+::
+
+		/* KVM_EXIT_MEMORY_FAULT */
+		struct {
+			__u64 flags;
+			__u64 gpa;
+			__u64 len; /* in bytes */
+		} memory_fault;
+
+Indicates a memory fault on the guest physical address range [gpa, gpa + len).
+flags is a bitfield describing the reason(s) for the fault. See
+KVM_CAP_X86_MEMORY_FAULT_EXIT for more details.
+
 ::
 
     /* KVM_EXIT_NOTIFY */
@@ -7669,6 +7682,30 @@ This capability is aimed to mitigate the threat that malicious VMs can
 cause CPU stuck (due to event windows don't open up) and make the CPU
 unavailable to host or other VMs.
 
+7.34 KVM_CAP_X86_MEMORY_FAULT_EXIT
+----------------------------------
+
+:Architectures: x86
+:Parameters: args[0] is a bitfield specifying what reasons to exit upon.
+:Returns: 0 on success, -EINVAL if unsupported or if an unrecognized exit
+          reason is specified.
+
+This capability transforms -EFAULTs returned by KVM_RUN in response to guest
+memory accesses into VM exits (KVM_EXIT_MEMORY_FAULT), with 'gpa' and 'len'
+describing the problematic range of memory and 'flags' describing the reason(s)
+for the fault.
+
+The implementation is currently incomplete. Please notify the maintainers if
+you encounter an -EFAULT from KVM_RUN that has not yet been converted.
+
+Through args[0], the capability can be set on a per-exit-reason basis.
+Currently, the only exit reasons supported are
+
+1. KVM_MEMFAULT_REASON_UNKNOWN (1 << 0)
+
+Memory fault exits with a reason of UNKNOWN should not be depended upon: they
+may be added, removed, or reclassified under a stable reason.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f706621c35b86..b3c1b2f57e680 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4425,6 +4425,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_VAPIC:
 	case KVM_CAP_ENABLE_CAP:
 	case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
+	case KVM_CAP_X86_MEMORY_FAULT_EXIT:
 		r = 1;
 		break;
 	case KVM_CAP_EXIT_HYPERCALL:
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8ada23756b0ec..d3ccfead73e42 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -805,6 +805,7 @@ struct kvm {
 	struct notifier_block pm_notifier;
 #endif
 	char stats_id[KVM_STATS_NAME_SIZE];
+	uint64_t memfault_exit_reasons;
 };
 
 #define kvm_err(fmt, ...) \
@@ -2278,4 +2279,19 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr)
 /* Max number of entries allowed for each kvm dirty ring */
 #define  KVM_DIRTY_RING_MAX_ENTRIES  65536
 
+/*
+ * If memory fault exits are enabled for any of the reasons given in
+ * exit_flags, sets up a KVM_EXIT_MEMORY_FAULT for the given guest physical
+ * address, length, and flags, then returns -1.
+ * Otherwise, returns -EFAULT.
+ */
+int kvm_memfault_exit_or_efault(
+	struct kvm_vcpu *vcpu, uint64_t gpa, uint64_t len, uint64_t exit_flags);
+
+/*
+ * Checks that all of the bits specified in 'reasons' correspond to known
+ * memory fault exit reasons.
+ */
+bool kvm_memfault_exit_flags_valid(uint64_t reasons);
+
 #endif
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d77aef872a0a0..0ba1d7f01346e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -264,6 +264,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_RISCV_SBI        35
 #define KVM_EXIT_RISCV_CSR        36
 #define KVM_EXIT_NOTIFY           37
+#define KVM_EXIT_MEMORY_FAULT     38
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -505,6 +506,17 @@ struct kvm_run {
 #define KVM_NOTIFY_CONTEXT_INVALID	(1 << 0)
 			__u32 flags;
 		} notify;
+		/* KVM_EXIT_MEMORY_FAULT */
+		struct {
+			/*
+			 * Indicates a memory fault on the guest physical address range
+			 * [gpa, gpa + len). flags is a bitfield describing the reason(s)
+			 * for the fault.
+			 */
+			__u64 flags;
+			__u64 gpa;
+			__u64 len; /* in bytes */
+		} memory_fault;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -1184,6 +1196,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224
 #define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225
 #define KVM_CAP_PMU_EVENT_MASKED_EVENTS 226
+#define KVM_CAP_X86_MEMORY_FAULT_EXIT 227
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -2237,4 +2250,7 @@ struct kvm_s390_zpci_op {
 /* flags for kvm_s390_zpci_op->u.reg_aen.flags */
 #define KVM_S390_ZPCIOP_REGAEN_HOST    (1 << 0)
 
+/* Exit reasons for KVM_EXIT_MEMORY_FAULT */
+#define KVM_MEMFAULT_REASON_UNKNOWN (1 << 0)
+
 #endif /* __LINUX_KVM_H */
diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index 55155e262646e..2b468345f25c3 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -264,6 +264,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_RISCV_SBI        35
 #define KVM_EXIT_RISCV_CSR        36
 #define KVM_EXIT_NOTIFY           37
+#define KVM_EXIT_MEMORY_FAULT     38
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -505,6 +506,17 @@ struct kvm_run {
 #define KVM_NOTIFY_CONTEXT_INVALID	(1 << 0)
 			__u32 flags;
 		} notify;
+		/* KVM_EXIT_MEMORY_FAULT */
+		struct {
+			/*
+			 * Indicates a memory fault on the guest physical address range
+			 * [gpa, gpa + len). flags is a bitfield describing the reason(s)
+			 * for the fault.
+			 */
+			__u64 flags;
+			__u64 gpa;
+			__u64 len; /* in bytes */
+		} memory_fault;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -2228,4 +2240,7 @@ struct kvm_s390_zpci_op {
 /* flags for kvm_s390_zpci_op->u.reg_aen.flags */
 #define KVM_S390_ZPCIOP_REGAEN_HOST    (1 << 0)
 
+/* Exit reasons for KVM_EXIT_MEMORY_FAULT */
+#define KVM_MEMFAULT_REASON_UNKNOWN (1 << 0)
+
 #endif /* __LINUX_KVM_H */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e38ddda05b261..00aec43860ff1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1142,6 +1142,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 	spin_lock_init(&kvm->mn_invalidate_lock);
 	rcuwait_init(&kvm->mn_memslots_update_rcuwait);
 	xa_init(&kvm->vcpu_array);
+	kvm->memfault_exit_reasons = 0;
 
 	INIT_LIST_HEAD(&kvm->gpc_list);
 	spin_lock_init(&kvm->gpc_lock);
@@ -4671,6 +4672,14 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
 
 		return r;
 	}
+	case KVM_CAP_X86_MEMORY_FAULT_EXIT: {
+		if (!kvm_vm_ioctl_check_extension(kvm, KVM_CAP_X86_MEMORY_FAULT_EXIT))
+			return -EINVAL;
+		else if (!kvm_memfault_exit_flags_valid(cap->args[0]))
+			return -EINVAL;
+		kvm->memfault_exit_reasons = cap->args[0];
+		return 0;
+	}
 	default:
 		return kvm_vm_ioctl_enable_cap(kvm, cap);
 	}
@@ -6172,3 +6181,22 @@ int kvm_vm_create_worker_thread(struct kvm *kvm, kvm_vm_thread_fn_t thread_fn,
 
 	return init_context.err;
 }
+
+int kvm_memfault_exit_or_efault(
+	struct kvm_vcpu *vcpu, uint64_t gpa, uint64_t len, uint64_t exit_flags)
+{
+	if (!(vcpu->kvm->memfault_exit_reasons & exit_flags))
+		return -EFAULT;
+	vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
+	vcpu->run->memory_fault.gpa = gpa;
+	vcpu->run->memory_fault.len = len;
+	vcpu->run->memory_fault.flags = exit_flags;
+	return -1;
+}
+
+bool kvm_memfault_exit_flags_valid(uint64_t reasons)
+{
+	uint64_t valid_flags = KVM_MEMFAULT_REASON_UNKNOWN;
+
+	return !(reasons & ~valid_flags);
+}
-- 
2.40.0.rc1.284.g88254d51c5-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [WIP Patch v2 05/14] KVM: x86: Implement memory fault exit for direct_map
  2023-03-15  2:17 [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit Anish Moorthy
                   ` (3 preceding siblings ...)
  2023-03-15  2:17 ` [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field Anish Moorthy
@ 2023-03-15  2:17 ` Anish Moorthy
  2023-03-15  2:17 ` [WIP Patch v2 06/14] KVM: x86: Implement memory fault exit for kvm_handle_page_fault Anish Moorthy
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Anish Moorthy @ 2023-03-15  2:17 UTC (permalink / raw)
  To: seanjc; +Cc: jthoughton, kvm, Anish Moorthy

TODO: The return value of this function is ignored in
kvm_arch_async_page_ready. Make sure that the side effects of
kvm_memfault_exit_or_efault are acceptable there.
---
 arch/x86/kvm/mmu/mmu.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c8ebe542c565f..0b02e2c360c08 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3193,7 +3193,10 @@ static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	}
 
 	if (WARN_ON_ONCE(it.level != fault->goal_level))
-		return -EFAULT;
+		return kvm_memfault_exit_or_efault(
+			vcpu, fault->gfn * PAGE_SIZE,
+			KVM_PAGES_PER_HPAGE(fault->goal_level),
+			KVM_MEMFAULT_REASON_UNKNOWN);
 
 	ret = mmu_set_spte(vcpu, fault->slot, it.sptep, ACC_ALL,
 			   base_gfn, fault->pfn, fault);
-- 
2.40.0.rc1.284.g88254d51c5-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [WIP Patch v2 06/14] KVM: x86: Implement memory fault exit for kvm_handle_page_fault
  2023-03-15  2:17 [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit Anish Moorthy
                   ` (4 preceding siblings ...)
  2023-03-15  2:17 ` [WIP Patch v2 05/14] KVM: x86: Implement memory fault exit for direct_map Anish Moorthy
@ 2023-03-15  2:17 ` Anish Moorthy
  2023-03-15  2:17 ` [WIP Patch v2 07/14] KVM: x86: Implement memory fault exit for setup_vmgexit_scratch Anish Moorthy
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Anish Moorthy @ 2023-03-15  2:17 UTC (permalink / raw)
  To: seanjc; +Cc: jthoughton, kvm, Anish Moorthy

---
 arch/x86/kvm/mmu/mmu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0b02e2c360c08..5e0140db384f6 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4375,7 +4375,9 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 #ifndef CONFIG_X86_64
 	/* A 64-bit CR2 should be impossible on 32-bit KVM. */
 	if (WARN_ON_ONCE(fault_address >> 32))
-		return -EFAULT;
+		return kvm_memfault_exit_or_efault(
+			vcpu, fault_address, PAGE_SIZE,
+			KVM_MEMFAULT_REASON_UNKNOWN);
 #endif
 
 	vcpu->arch.l1tf_flush_l1d = true;
-- 
2.40.0.rc1.284.g88254d51c5-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [WIP Patch v2 07/14] KVM: x86: Implement memory fault exit for setup_vmgexit_scratch
  2023-03-15  2:17 [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit Anish Moorthy
                   ` (5 preceding siblings ...)
  2023-03-15  2:17 ` [WIP Patch v2 06/14] KVM: x86: Implement memory fault exit for kvm_handle_page_fault Anish Moorthy
@ 2023-03-15  2:17 ` Anish Moorthy
  2023-03-15  2:17 ` [WIP Patch v2 08/14] KVM: x86: Implement memory fault exit for FNAME(fetch) Anish Moorthy
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Anish Moorthy @ 2023-03-15  2:17 UTC (permalink / raw)
  To: seanjc; +Cc: jthoughton, kvm, Anish Moorthy

---
 arch/x86/kvm/svm/sev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c25aeb550cd97..c042d385350de 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2683,7 +2683,9 @@ static int setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
 			pr_err("vmgexit: kvm_read_guest for scratch area failed\n");
 
 			kvfree(scratch_va);
-			return -EFAULT;
+			return kvm_memfault_exit_or_efault(
+				&svm->vcpu, scratch_gpa_beg, len,
+				KVM_MEMFAULT_REASON_UNKNOWN);
 		}
 
 		/*
-- 
2.40.0.rc1.284.g88254d51c5-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [WIP Patch v2 08/14] KVM: x86: Implement memory fault exit for FNAME(fetch)
  2023-03-15  2:17 [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit Anish Moorthy
                   ` (6 preceding siblings ...)
  2023-03-15  2:17 ` [WIP Patch v2 07/14] KVM: x86: Implement memory fault exit for setup_vmgexit_scratch Anish Moorthy
@ 2023-03-15  2:17 ` Anish Moorthy
  2023-03-15  2:17 ` [WIP Patch v2 09/14] KVM: Introduce KVM_CAP_MEMORY_FAULT_NOWAIT without implementation Anish Moorthy
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Anish Moorthy @ 2023-03-15  2:17 UTC (permalink / raw)
  To: seanjc; +Cc: jthoughton, kvm, Anish Moorthy

---
 arch/x86/kvm/mmu/paging_tmpl.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 57f0b75c80f9d..ed996dccc03bf 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -717,7 +717,9 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 	}
 
 	if (WARN_ON_ONCE(it.level != fault->goal_level))
-		return -EFAULT;
+		return kvm_memfault_exit_or_efault(
+			vcpu, fault->gfn * PAGE_SIZE, KVM_PAGES_PER_HPAGE(fault->goal_level),
+			KVM_MEMFAULT_REASON_UNKNOWN);
 
 	ret = mmu_set_spte(vcpu, fault->slot, it.sptep, gw->pte_access,
 			   base_gfn, fault->pfn, fault);
-- 
2.40.0.rc1.284.g88254d51c5-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [WIP Patch v2 09/14] KVM: Introduce KVM_CAP_MEMORY_FAULT_NOWAIT without implementation
  2023-03-15  2:17 [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit Anish Moorthy
                   ` (7 preceding siblings ...)
  2023-03-15  2:17 ` [WIP Patch v2 08/14] KVM: x86: Implement memory fault exit for FNAME(fetch) Anish Moorthy
@ 2023-03-15  2:17 ` Anish Moorthy
  2023-03-17 18:59   ` Oliver Upton
  2023-03-15  2:17 ` [WIP Patch v2 10/14] KVM: x86: Implement KVM_CAP_MEMORY_FAULT_NOWAIT Anish Moorthy
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 60+ messages in thread
From: Anish Moorthy @ 2023-03-15  2:17 UTC (permalink / raw)
  To: seanjc; +Cc: jthoughton, kvm, Anish Moorthy

Add documentation, memslot flags, useful helper functions, and the
actual new capability itself.

Memory fault exits on absent mappings are particularly useful for
userfaultfd-based live migration postcopy. When many vCPUs fault on a
single userfaultfd, the faults can take a while to surface to userspace
because they must contend for the uffd wait queue locks. Bypassing the uffd
entirely by triggering a vCPU exit avoids this contention and can improve
the fault rate by as much as 10x.
---
 Documentation/virt/kvm/api.rst | 37 +++++++++++++++++++++++++++++++---
 include/linux/kvm_host.h       |  6 ++++++
 include/uapi/linux/kvm.h       |  3 +++
 tools/include/uapi/linux/kvm.h |  2 ++
 virt/kvm/kvm_main.c            |  7 ++++++-
 5 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index f9ca18bbec879..4932c0f62eb3d 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1312,6 +1312,7 @@ yet and must be cleared on entry.
   /* for kvm_userspace_memory_region::flags */
   #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
   #define KVM_MEM_READONLY	(1UL << 1)
+  #define KVM_MEM_ABSENT_MAPPING_FAULT (1UL << 2)
 
 This ioctl allows the user to create, modify or delete a guest physical
 memory slot.  Bits 0-15 of "slot" specify the slot id and this value
@@ -1342,12 +1343,15 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
 be identical.  This allows large pages in the guest to be backed by large
 pages in the host.
 
-The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
-KVM_MEM_READONLY.  The former can be set to instruct KVM to keep track of
+The flags field supports three flags
+
+1.  KVM_MEM_LOG_DIRTY_PAGES: can be set to instruct KVM to keep track of
 writes to memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to know how to
-use it.  The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
+use it.
+2.  KVM_MEM_READONLY: can be set, if KVM_CAP_READONLY_MEM capability allows it,
 to make a new slot read-only.  In this case, writes to this memory will be
 posted to userspace as KVM_EXIT_MMIO exits.
+3.  KVM_MEM_ABSENT_MAPPING_FAULT: see KVM_CAP_MEMORY_FAULT_NOWAIT for details.
 
 When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
 the memory region are automatically reflected into the guest.  For example, an
@@ -7702,10 +7706,37 @@ Through args[0], the capability can be set on a per-exit-reason basis.
 Currently, the only exit reasons supported are
 
 1. KVM_MEMFAULT_REASON_UNKNOWN (1 << 0)
+2. KVM_MEMFAULT_REASON_ABSENT_MAPPING (1 << 1)
 
 Memory fault exits with a reason of UNKNOWN should not be depended upon: they
 may be added, removed, or reclassified under a stable reason.
 
+7.35 KVM_CAP_MEMORY_FAULT_NOWAIT
+--------------------------------
+
+:Architectures: x86, arm64
+:Returns: -EINVAL.
+
+The presence of this capability indicates that userspace may pass the
+KVM_MEM_ABSENT_MAPPING_FAULT flag to KVM_SET_USER_MEMORY_REGION to cause
+KVM_RUN to populate 'kvm_run.memory_fault' and exit to userspace (*) in
+response to page faults for which the userspace page tables do not contain
+present mappings. Attempting to enable the capability directly will fail.
+
+The 'gpa' and 'len' fields of kvm_run.memory_fault will be set to the starting
+address and length (in bytes) of the faulting page. 'flags' will be set to
+KVM_MEMFAULT_REASON_ABSENT_MAPPING.
+
+Userspace should determine how best to make the mapping present and take the
+appropriate action: for instance, establishing the mapping for the first time
+via UFFDIO_COPY/UFFDIO_CONTINUE, or faulting it in with
+MADV_POPULATE_READ/WRITE. After establishing the mapping, userspace can
+return to KVM to retry the previous memory access.
+
+(*) NOTE: On x86, KVM_CAP_X86_MEMORY_FAULT_EXIT must be enabled for the
+KVM_MEMFAULT_REASON_ABSENT_MAPPING reason: otherwise userspace will only
+receive a -EFAULT from KVM_RUN without any useful information.
+
 8. Other capabilities.
 ======================
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d3ccfead73e42..c28330f25526f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -593,6 +593,12 @@ static inline bool kvm_slot_dirty_track_enabled(const struct kvm_memory_slot *sl
 	return slot->flags & KVM_MEM_LOG_DIRTY_PAGES;
 }
 
+static inline bool kvm_slot_fault_on_absent_mapping(
+	const struct kvm_memory_slot *slot)
+{
+	return slot->flags & KVM_MEM_ABSENT_MAPPING_FAULT;
+}
+
 static inline unsigned long kvm_dirty_bitmap_bytes(struct kvm_memory_slot *memslot)
 {
 	return ALIGN(memslot->npages, BITS_PER_LONG) / 8;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0ba1d7f01346e..2146b27cdd61a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -102,6 +102,7 @@ struct kvm_userspace_memory_region {
  */
 #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
 #define KVM_MEM_READONLY	(1UL << 1)
+#define KVM_MEM_ABSENT_MAPPING_FAULT	(1UL << 2)
 
 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {
@@ -1197,6 +1198,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225
 #define KVM_CAP_PMU_EVENT_MASKED_EVENTS 226
 #define KVM_CAP_X86_MEMORY_FAULT_EXIT 227
+#define KVM_CAP_MEMORY_FAULT_NOWAIT 228
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -2252,5 +2254,6 @@ struct kvm_s390_zpci_op {
 
 /* Exit reasons for KVM_EXIT_MEMORY_FAULT */
 #define KVM_MEMFAULT_REASON_UNKNOWN (1 << 0)
+#define KVM_MEMFAULT_REASON_ABSENT_MAPPING (1 << 1)
 
 #endif /* __LINUX_KVM_H */
diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index 2b468345f25c3..1a1707d9f442a 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -102,6 +102,7 @@ struct kvm_userspace_memory_region {
  */
 #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
 #define KVM_MEM_READONLY	(1UL << 1)
+#define KVM_MEM_ABSENT_MAPPING_FAULT (1UL << 2)
 
 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {
@@ -2242,5 +2243,6 @@ struct kvm_s390_zpci_op {
 
 /* Exit reasons for KVM_EXIT_MEMORY_FAULT */
 #define KVM_MEMFAULT_REASON_UNKNOWN (1 << 0)
+#define KVM_MEMFAULT_REASON_ABSENT_MAPPING (1 << 1)
 
 #endif /* __LINUX_KVM_H */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 00aec43860ff1..aa3b59410a356 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1525,6 +1525,9 @@ static int check_memory_region_flags(const struct kvm_userspace_memory_region *m
 	valid_flags |= KVM_MEM_READONLY;
 #endif
 
+	if (kvm_vm_ioctl_check_extension(NULL, KVM_CAP_MEMORY_FAULT_NOWAIT))
+		valid_flags |= KVM_MEM_ABSENT_MAPPING_FAULT;
+
 	if (mem->flags & ~valid_flags)
 		return -EINVAL;
 
@@ -6196,7 +6199,9 @@ inline int kvm_memfault_exit_or_efault(
 
 bool kvm_memfault_exit_flags_valid(uint64_t reasons)
 {
-	uint64_t valid_flags = KVM_MEMFAULT_REASON_UNKNOWN;
+	uint64_t valid_flags
+		= KVM_MEMFAULT_REASON_UNKNOWN
+		| KVM_MEMFAULT_REASON_ABSENT_MAPPING;
 
 	return !(reasons & ~valid_flags);
 }
-- 
2.40.0.rc1.284.g88254d51c5-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [WIP Patch v2 10/14] KVM: x86: Implement KVM_CAP_MEMORY_FAULT_NOWAIT
  2023-03-15  2:17 [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit Anish Moorthy
                   ` (8 preceding siblings ...)
  2023-03-15  2:17 ` [WIP Patch v2 09/14] KVM: Introduce KVM_CAP_MEMORY_FAULT_NOWAIT without implementation Anish Moorthy
@ 2023-03-15  2:17 ` Anish Moorthy
  2023-03-17  0:32   ` Isaku Yamahata
  2023-03-15  2:17 ` [WIP Patch v2 11/14] KVM: arm64: Allow user_mem_abort to return 0 to signal a 'normal' exit Anish Moorthy
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 60+ messages in thread
From: Anish Moorthy @ 2023-03-15  2:17 UTC (permalink / raw)
  To: seanjc; +Cc: jthoughton, kvm, Anish Moorthy

When a memslot has the KVM_MEM_ABSENT_MAPPING_FAULT flag set, exit to
userspace upon encountering a page fault for which the userspace
page tables do not contain a present mapping.
---
 arch/x86/kvm/mmu/mmu.c | 33 +++++++++++++++++++++++++--------
 arch/x86/kvm/x86.c     |  1 +
 2 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5e0140db384f6..68bc4ab2bd942 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3214,7 +3214,9 @@ static void kvm_send_hwpoison_signal(struct kvm_memory_slot *slot, gfn_t gfn)
 	send_sig_mceerr(BUS_MCEERR_AR, (void __user *)hva, PAGE_SHIFT, current);
 }
 
-static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
+static int kvm_handle_error_pfn(
+	struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
+	bool faulted_on_absent_mapping)
 {
 	if (is_sigpending_pfn(fault->pfn)) {
 		kvm_handle_signal_exit(vcpu);
@@ -3234,7 +3236,11 @@ static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fa
 		return RET_PF_RETRY;
 	}
 
-	return -EFAULT;
+	return kvm_memfault_exit_or_efault(
+		vcpu, fault->gfn * PAGE_SIZE, PAGE_SIZE,
+		faulted_on_absent_mapping
+			? KVM_MEMFAULT_REASON_ABSENT_MAPPING
+			: KVM_MEMFAULT_REASON_UNKNOWN);
 }
 
 static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
@@ -4209,7 +4215,9 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true);
 }
 
-static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
+static int __kvm_faultin_pfn(
+	struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
+	bool fault_on_absent_mapping)
 {
 	struct kvm_memory_slot *slot = fault->slot;
 	bool async;
@@ -4242,9 +4250,15 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	}
 
 	async = false;
-	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, false, &async,
-					  fault->write, &fault->map_writable,
-					  &fault->hva);
+
+	fault->pfn = __gfn_to_pfn_memslot(
+		slot, fault->gfn,
+		fault_on_absent_mapping,
+		false,
+		fault_on_absent_mapping ? NULL : &async,
+		fault->write, &fault->map_writable,
+		&fault->hva);
+
 	if (!async)
 		return RET_PF_CONTINUE; /* *pfn has correct page already */
 
@@ -4274,16 +4288,19 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 			   unsigned int access)
 {
 	int ret;
+	bool fault_on_absent_mapping
+		= likely(fault->slot) && kvm_slot_fault_on_absent_mapping(fault->slot);
 
 	fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;
 	smp_rmb();
 
-	ret = __kvm_faultin_pfn(vcpu, fault);
+	ret = __kvm_faultin_pfn(
+		vcpu, fault, fault_on_absent_mapping);
 	if (ret != RET_PF_CONTINUE)
 		return ret;
 
 	if (unlikely(is_error_pfn(fault->pfn)))
-		return kvm_handle_error_pfn(vcpu, fault);
+		return kvm_handle_error_pfn(vcpu, fault, fault_on_absent_mapping);
 
 	if (unlikely(!fault->slot))
 		return kvm_handle_noslot_fault(vcpu, fault, access);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b3c1b2f57e680..41435324b41d7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4426,6 +4426,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ENABLE_CAP:
 	case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
 	case KVM_CAP_X86_MEMORY_FAULT_EXIT:
+	case KVM_CAP_MEMORY_FAULT_NOWAIT:
 		r = 1;
 		break;
 	case KVM_CAP_EXIT_HYPERCALL:
-- 
2.40.0.rc1.284.g88254d51c5-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [WIP Patch v2 11/14] KVM: arm64: Allow user_mem_abort to return 0 to signal a 'normal' exit
  2023-03-15  2:17 [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit Anish Moorthy
                   ` (9 preceding siblings ...)
  2023-03-15  2:17 ` [WIP Patch v2 10/14] KVM: x86: Implement KVM_CAP_MEMORY_FAULT_NOWAIT Anish Moorthy
@ 2023-03-15  2:17 ` Anish Moorthy
  2023-03-17 18:18   ` Oliver Upton
  2023-03-15  2:17 ` [WIP Patch v2 12/14] KVM: arm64: Implement KVM_CAP_MEMORY_FAULT_NOWAIT Anish Moorthy
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 60+ messages in thread
From: Anish Moorthy @ 2023-03-15  2:17 UTC (permalink / raw)
  To: seanjc; +Cc: jthoughton, kvm, Anish Moorthy

kvm_handle_guest_abort currently just returns 1 if user_mem_abort
returns 0. Since 1 is the "resume the guest" code, user_mem_abort is
essentially incapable of triggering a "normal" exit: it can only trigger
exits by returning a negative value, which indicates an error.

Remove the "if (ret == 0) ret = 1;" statement from
kvm_handle_guest_abort and refactor user_mem_abort slightly to allow it
to trigger 'normal' exits by returning 0.

Signed-off-by: Anish Moorthy <amoorthy@google.com>
---
 arch/arm64/kvm/mmu.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 7113587222ffe..735044859eb25 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1190,7 +1190,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_memory_slot *memslot, unsigned long hva,
 			  unsigned long fault_status)
 {
-	int ret = 0;
+	int ret = 1;
 	bool write_fault, writable, force_pte = false;
 	bool exec_fault;
 	bool device = false;
@@ -1281,8 +1281,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	    (logging_active && write_fault)) {
 		ret = kvm_mmu_topup_memory_cache(memcache,
 						 kvm_mmu_cache_min_pages(kvm));
-		if (ret)
+		if (ret < 0)
 			return ret;
+		else
+			ret = 1;
 	}
 
 	mmu_seq = vcpu->kvm->mmu_invalidate_seq;
@@ -1305,7 +1307,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 				   write_fault, &writable, NULL);
 	if (pfn == KVM_PFN_ERR_HWPOISON) {
 		kvm_send_hwpoison_signal(hva, vma_shift);
-		return 0;
+		return 1;
 	}
 	if (is_error_noslot_pfn(pfn))
 		return -EFAULT;
@@ -1387,6 +1389,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 					     KVM_PGTABLE_WALK_HANDLE_FAULT |
 					     KVM_PGTABLE_WALK_SHARED);
 
+	if (ret == 0)
+		ret = 1;
+
 	/* Mark the page dirty only if the fault is handled successfully */
 	if (writable && !ret) {
 		kvm_set_pfn_dirty(pfn);
@@ -1397,7 +1402,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	read_unlock(&kvm->mmu_lock);
 	kvm_set_pfn_accessed(pfn);
 	kvm_release_pfn_clean(pfn);
-	return ret != -EAGAIN ? ret : 0;
+	return ret != -EAGAIN ? ret : 1;
 }
 
 /* Resolve the access fault by making the page young again. */
@@ -1549,8 +1554,6 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	}
 
 	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
-	if (ret == 0)
-		ret = 1;
 out:
 	if (ret == -ENOEXEC) {
 		kvm_inject_pabt(vcpu, kvm_vcpu_get_hfar(vcpu));
-- 
2.40.0.rc1.284.g88254d51c5-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [WIP Patch v2 12/14] KVM: arm64: Implement KVM_CAP_MEMORY_FAULT_NOWAIT
  2023-03-15  2:17 [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit Anish Moorthy
                   ` (10 preceding siblings ...)
  2023-03-15  2:17 ` [WIP Patch v2 11/14] KVM: arm64: Allow user_mem_abort to return 0 to signal a 'normal' exit Anish Moorthy
@ 2023-03-15  2:17 ` Anish Moorthy
  2023-03-17 18:27   ` Oliver Upton
  2023-03-15  2:17 ` [WIP Patch v2 13/14] KVM: selftests: Add memslot_flags parameter to memstress_create_vm Anish Moorthy
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 60+ messages in thread
From: Anish Moorthy @ 2023-03-15  2:17 UTC (permalink / raw)
  To: seanjc; +Cc: jthoughton, kvm, Anish Moorthy

When a memslot has the KVM_MEM_ABSENT_MAPPING_FAULT flag set, exit to
userspace upon encountering a page fault for which the userspace
page tables do not contain a present mapping.

Signed-off-by: Anish Moorthy <amoorthy@google.com>
Acked-by: James Houghton <jthoughton@google.com>
---
 arch/arm64/kvm/arm.c |  1 +
 arch/arm64/kvm/mmu.c | 14 ++++++++++++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 3bd732eaf0872..f8337e757c777 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -220,6 +220,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_VCPU_ATTRIBUTES:
 	case KVM_CAP_PTP_KVM:
 	case KVM_CAP_ARM_SYSTEM_SUSPEND:
+	case KVM_CAP_MEMORY_FAULT_NOWAIT:
 		r = 1;
 		break;
 	case KVM_CAP_SET_GUEST_DEBUG2:
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 735044859eb25..0d04ffc81f783 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1206,6 +1206,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	unsigned long vma_pagesize, fault_granule;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
+	bool exit_on_memory_fault = kvm_slot_fault_on_absent_mapping(memslot);
 
 	fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level);
 	write_fault = kvm_is_write_fault(vcpu);
@@ -1303,8 +1304,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	smp_rmb();
 
-	pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
-				   write_fault, &writable, NULL);
+	pfn = __gfn_to_pfn_memslot(
+		memslot, gfn, exit_on_memory_fault, false, NULL,
+		write_fault, &writable, NULL);
+
+	if (exit_on_memory_fault && pfn == KVM_PFN_ERR_FAULT) {
+		vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
+		vcpu->run->memory_fault.flags = KVM_MEMFAULT_REASON_ABSENT_MAPPING;
+		vcpu->run->memory_fault.gpa = gfn << PAGE_SHIFT;
+		vcpu->run->memory_fault.len = vma_pagesize;
+		return 0;
+	}
 	if (pfn == KVM_PFN_ERR_HWPOISON) {
 		kvm_send_hwpoison_signal(hva, vma_shift);
 		return 1;
-- 
2.40.0.rc1.284.g88254d51c5-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [WIP Patch v2 13/14] KVM: selftests: Add memslot_flags parameter to memstress_create_vm
  2023-03-15  2:17 [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit Anish Moorthy
                   ` (11 preceding siblings ...)
  2023-03-15  2:17 ` [WIP Patch v2 12/14] KVM: arm64: Implement KVM_CAP_MEMORY_FAULT_NOWAIT Anish Moorthy
@ 2023-03-15  2:17 ` Anish Moorthy
  2023-03-15  2:17 ` [WIP Patch v2 14/14] KVM: selftests: Handle memory fault exits in demand_paging_test Anish Moorthy
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Anish Moorthy @ 2023-03-15  2:17 UTC (permalink / raw)
  To: seanjc; +Cc: jthoughton, kvm, Anish Moorthy

Memslot flags aren't currently exposed to the tests; they are always just
set to 0. Add a parameter to allow tests to set those flags themselves.

Signed-off-by: Anish Moorthy <amoorthy@google.com>
---
 tools/testing/selftests/kvm/access_tracking_perf_test.c     | 2 +-
 tools/testing/selftests/kvm/demand_paging_test.c            | 6 ++++--
 tools/testing/selftests/kvm/dirty_log_perf_test.c           | 2 +-
 tools/testing/selftests/kvm/include/memstress.h             | 2 +-
 tools/testing/selftests/kvm/lib/memstress.c                 | 4 ++--
 .../selftests/kvm/memslot_modification_stress_test.c        | 2 +-
 6 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/kvm/access_tracking_perf_test.c b/tools/testing/selftests/kvm/access_tracking_perf_test.c
index 3c7defd34f567..b51656b408b83 100644
--- a/tools/testing/selftests/kvm/access_tracking_perf_test.c
+++ b/tools/testing/selftests/kvm/access_tracking_perf_test.c
@@ -306,7 +306,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	struct kvm_vm *vm;
 	int nr_vcpus = params->nr_vcpus;
 
-	vm = memstress_create_vm(mode, nr_vcpus, params->vcpu_memory_bytes, 1,
+	vm = memstress_create_vm(mode, nr_vcpus, params->vcpu_memory_bytes, 1, 0,
 				 params->backing_src, !overlap_memory_access);
 
 	memstress_start_vcpu_threads(nr_vcpus, vcpu_thread_main);
diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
index f8c1831614a9d..607cd2846e39c 100644
--- a/tools/testing/selftests/kvm/demand_paging_test.c
+++ b/tools/testing/selftests/kvm/demand_paging_test.c
@@ -146,8 +146,10 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	int i, num_uffds = 0;
 	uint64_t uffd_region_size;
 
-	vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1,
-				 p->src_type, p->partition_vcpu_memory_access);
+	vm = memstress_create_vm(
+		mode, nr_vcpus, guest_percpu_mem_size,
+		1, 0,
+		p->src_type, p->partition_vcpu_memory_access);
 
 	demand_paging_size = get_backing_src_pagesz(p->src_type);
 
diff --git a/tools/testing/selftests/kvm/dirty_log_perf_test.c b/tools/testing/selftests/kvm/dirty_log_perf_test.c
index e9d6d1aecf89c..6c8749193cfa4 100644
--- a/tools/testing/selftests/kvm/dirty_log_perf_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_perf_test.c
@@ -224,7 +224,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	int i;
 
 	vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size,
-				 p->slots, p->backing_src,
+				 p->slots, 0, p->backing_src,
 				 p->partition_vcpu_memory_access);
 
 	pr_info("Random seed: %u\n", p->random_seed);
diff --git a/tools/testing/selftests/kvm/include/memstress.h b/tools/testing/selftests/kvm/include/memstress.h
index 72e3e358ef7bd..1cba965d2d331 100644
--- a/tools/testing/selftests/kvm/include/memstress.h
+++ b/tools/testing/selftests/kvm/include/memstress.h
@@ -56,7 +56,7 @@ struct memstress_args {
 extern struct memstress_args memstress_args;
 
 struct kvm_vm *memstress_create_vm(enum vm_guest_mode mode, int nr_vcpus,
-				   uint64_t vcpu_memory_bytes, int slots,
+				   uint64_t vcpu_memory_bytes, int slots, uint32_t slot_flags,
 				   enum vm_mem_backing_src_type backing_src,
 				   bool partition_vcpu_memory_access);
 void memstress_destroy_vm(struct kvm_vm *vm);
diff --git a/tools/testing/selftests/kvm/lib/memstress.c b/tools/testing/selftests/kvm/lib/memstress.c
index 5f1d3173c238c..7589b8cef6911 100644
--- a/tools/testing/selftests/kvm/lib/memstress.c
+++ b/tools/testing/selftests/kvm/lib/memstress.c
@@ -119,7 +119,7 @@ void memstress_setup_vcpus(struct kvm_vm *vm, int nr_vcpus,
 }
 
 struct kvm_vm *memstress_create_vm(enum vm_guest_mode mode, int nr_vcpus,
-				   uint64_t vcpu_memory_bytes, int slots,
+				   uint64_t vcpu_memory_bytes, int slots, uint32_t slot_flags,
 				   enum vm_mem_backing_src_type backing_src,
 				   bool partition_vcpu_memory_access)
 {
@@ -207,7 +207,7 @@ struct kvm_vm *memstress_create_vm(enum vm_guest_mode mode, int nr_vcpus,
 
 		vm_userspace_mem_region_add(vm, backing_src, region_start,
 					    MEMSTRESS_MEM_SLOT_INDEX + i,
-					    region_pages, 0);
+					    region_pages, slot_flags);
 	}
 
 	/* Do mapping for the demand paging memory slot */
diff --git a/tools/testing/selftests/kvm/memslot_modification_stress_test.c b/tools/testing/selftests/kvm/memslot_modification_stress_test.c
index 9855c41ca811f..0b19ec3ecc9cc 100644
--- a/tools/testing/selftests/kvm/memslot_modification_stress_test.c
+++ b/tools/testing/selftests/kvm/memslot_modification_stress_test.c
@@ -95,7 +95,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	struct test_params *p = arg;
 	struct kvm_vm *vm;
 
-	vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1,
+	vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1, 0,
 				 VM_MEM_SRC_ANONYMOUS,
 				 p->partition_vcpu_memory_access);
 
-- 
2.40.0.rc1.284.g88254d51c5-goog



* [WIP Patch v2 14/14] KVM: selftests: Handle memory fault exits in demand_paging_test
  2023-03-15  2:17 [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit Anish Moorthy
                   ` (12 preceding siblings ...)
  2023-03-15  2:17 ` [WIP Patch v2 13/14] KVM: selftests: Add memslot_flags parameter to memstress_create_vm Anish Moorthy
@ 2023-03-15  2:17 ` Anish Moorthy
  2023-03-17 17:43 ` [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit Oliver Upton
  2023-03-17 20:35 ` Sean Christopherson
  15 siblings, 0 replies; 60+ messages in thread
From: Anish Moorthy @ 2023-03-15  2:17 UTC (permalink / raw)
  To: seanjc; +Cc: jthoughton, kvm, Anish Moorthy

Demonstrate a (very basic) scheme for supporting memory fault exits.

From the vCPU threads:
1. Simply issue UFFDIO_COPY/CONTINUEs in response to memory fault exits,
   with the purpose of establishing the absent mappings. Do so with
   wake_waiters=false to avoid serializing on the userfaultfd wait queue
   locks.

2. When the UFFDIO_COPY/CONTINUE in (1) fails with EEXIST,
   assume that the mapping was already established but is currently
   absent [A] and attempt to populate it using MADV_POPULATE_WRITE.

Issue UFFDIO_COPY/CONTINUEs from the reader threads as well, but with
wake_waiters=true to ensure that any threads sleeping on the uffd are
eventually woken up.

A real VMM would track whether it had already COPY/CONTINUEd pages (e.g.,
via a bitmap) to avoid calls destined to fail with EEXIST. However, even the
naive approach is enough to demonstrate the performance advantages of
KVM_EXIT_MEMORY_FAULT.

[A] In reality it is much likelier that the vCPU thread simply lost a
    race to establish the mapping for the page.

Signed-off-by: Anish Moorthy <amoorthy@google.com>
Acked-by: James Houghton <jthoughton@google.com>
---
 .../selftests/kvm/demand_paging_test.c        | 220 +++++++++++++-----
 1 file changed, 164 insertions(+), 56 deletions(-)

diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
index 607cd2846e39c..dce72adcb1632 100644
--- a/tools/testing/selftests/kvm/demand_paging_test.c
+++ b/tools/testing/selftests/kvm/demand_paging_test.c
@@ -15,6 +15,7 @@
 #include <time.h>
 #include <pthread.h>
 #include <linux/userfaultfd.h>
+#include <sys/mman.h>
 #include <sys/syscall.h>
 
 #include "kvm_util.h"
@@ -31,6 +32,60 @@ static uint64_t guest_percpu_mem_size = DEFAULT_PER_VCPU_MEM_SIZE;
 static size_t demand_paging_size;
 static char *guest_data_prototype;
 
+static int num_uffds;
+static size_t uffd_region_size;
+static struct uffd_desc **uffd_descs;
+/*
+ * Delay when demand paging is performed through userfaultfd or directly by
+ * vcpu_worker in the case of a KVM_EXIT_MEMORY_FAULT.
+ */
+static useconds_t uffd_delay;
+static int uffd_mode;
+
+
+static int handle_uffd_page_request(
+	int uffd_mode, int uffd, uint64_t hva, bool is_vcpu
+);
+
+static void madv_write_or_err(uint64_t gpa)
+{
+	int r;
+	void *hva = addr_gpa2hva(memstress_args.vm, gpa);
+
+	r = madvise(hva, demand_paging_size, MADV_POPULATE_WRITE);
+	TEST_ASSERT(
+		r == 0,
+		"MADV_POPULATE_WRITE on hva 0x%lx (gpa 0x%lx) failed with errno %i\n",
+		(uintptr_t) hva, gpa, errno);
+}
+
+static void ready_page(uint64_t gpa)
+{
+	int r, uffd;
+
+	/*
+	 * This test only registers memslot 1 w/ userfaultfd. Any accesses outside
+	 * the registered ranges should fault in the physical pages through
+	 * MADV_POPULATE_WRITE.
+	 */
+	if ((gpa < memstress_args.gpa)
+		|| (gpa >= memstress_args.gpa + memstress_args.size)) {
+		madv_write_or_err(gpa);
+	} else {
+		if (uffd_delay)
+			usleep(uffd_delay);
+
+		uffd = uffd_descs[(gpa - memstress_args.gpa) / uffd_region_size]->uffd;
+
+		r = handle_uffd_page_request(
+			uffd_mode, uffd,
+			(uint64_t) addr_gpa2hva(memstress_args.vm, gpa), true);
+
+		if (r == EEXIST)
+			madv_write_or_err(gpa);
+	}
+}
+
 static void vcpu_worker(struct memstress_vcpu_args *vcpu_args)
 {
 	struct kvm_vcpu *vcpu = vcpu_args->vcpu;
@@ -42,25 +97,37 @@ static void vcpu_worker(struct memstress_vcpu_args *vcpu_args)
 
 	clock_gettime(CLOCK_MONOTONIC, &start);
 
-	/* Let the guest access its memory */
-	ret = _vcpu_run(vcpu);
-	TEST_ASSERT(ret == 0, "vcpu_run failed: %d\n", ret);
-	if (get_ucall(vcpu, NULL) != UCALL_SYNC) {
-		TEST_ASSERT(false,
-			    "Invalid guest sync status: exit_reason=%s\n",
-			    exit_reason_str(run->exit_reason));
-	}
+	while (true) {
+		/* Let the guest access its memory */
+		ret = _vcpu_run(vcpu);
+		TEST_ASSERT(ret == 0 || (run->exit_reason == KVM_EXIT_MEMORY_FAULT),
+					"vcpu_run failed: %d\n", ret);
+		if (get_ucall(vcpu, NULL) != UCALL_SYNC) {
+
+			if (run->exit_reason == KVM_EXIT_MEMORY_FAULT) {
+				TEST_ASSERT(run->memory_fault.flags == 0,
+							"Unrecognized flags 0x%llx on memory fault exit",
+							run->memory_fault.flags);
+				ready_page(run->memory_fault.gpa);
+				continue;
+			}
+
+			TEST_ASSERT(false,
+					"Invalid guest sync status: exit_reason=%s\n",
+					exit_reason_str(run->exit_reason));
+		}
 
-	ts_diff = timespec_elapsed(start);
-	PER_VCPU_DEBUG("vCPU %d execution time: %ld.%.9lds\n", vcpu_idx,
-		       ts_diff.tv_sec, ts_diff.tv_nsec);
+		ts_diff = timespec_elapsed(start);
+		PER_VCPU_DEBUG("vCPU %d execution time: %ld.%.9lds\n", vcpu_idx,
+				ts_diff.tv_sec, ts_diff.tv_nsec);
+		break;
+	}
 }
 
-static int handle_uffd_page_request(int uffd_mode, int uffd,
-									struct uffd_msg *msg)
+static int handle_uffd_page_request(
+	int uffd_mode, int uffd, uint64_t hva, bool is_vcpu)
 {
 	pid_t tid = syscall(__NR_gettid);
-	uint64_t addr = msg->arg.pagefault.address;
 	struct timespec start;
 	struct timespec ts_diff;
 	int r;
@@ -71,58 +138,81 @@ static int handle_uffd_page_request(int uffd_mode, int uffd,
 		struct uffdio_copy copy;
 
 		copy.src = (uint64_t)guest_data_prototype;
-		copy.dst = addr;
+		copy.dst = hva;
 		copy.len = demand_paging_size;
-		copy.mode = 0;
+		copy.mode = UFFDIO_COPY_MODE_DONTWAKE;
 
-		r = ioctl(uffd, UFFDIO_COPY, &copy);
 		/*
-		 * With multiple vCPU threads fault on a single page and there are
-		 * multiple readers for the UFFD, at least one of the UFFDIO_COPYs
-		 * will fail with EEXIST: handle that case without signaling an
-		 * error.
+		 * With multiple vCPU threads and either multiple reader threads or
+		 * vCPU memory-fault exits enabled, several threads will race to
+		 * UFFDIO_COPY the same absent page: at least one of the copies is
+		 * almost certain to fail with EEXIST, so allow that case.
 		 */
-		if (r == -1 && errno != EEXIST) {
-			pr_info(
-				"Failed UFFDIO_COPY in 0x%lx from thread %d, errno = %d\n",
-				addr, tid, errno);
-			return r;
-		}
+		r = ioctl(uffd, UFFDIO_COPY, &copy);
+		TEST_ASSERT(
+			r == 0 || errno == EEXIST,
+			"Thread 0x%x failed UFFDIO_COPY on hva 0x%lx, errno = %d",
+			gettid(), hva, errno);
 	} else if (uffd_mode == UFFDIO_REGISTER_MODE_MINOR) {
+		/* The comments in the UFFDIO_COPY branch also apply here. */
 		struct uffdio_continue cont = {0};
 
-		cont.range.start = addr;
+		cont.range.start = hva;
 		cont.range.len = demand_paging_size;
+		cont.mode = UFFDIO_CONTINUE_MODE_DONTWAKE;
 
 		r = ioctl(uffd, UFFDIO_CONTINUE, &cont);
-		/* See the note about EEXISTs in the UFFDIO_COPY branch. */
-		if (r == -1 && errno != EEXIST) {
-			pr_info(
-				"Failed UFFDIO_CONTINUE in 0x%lx from thread %d, errno = %d\n",
-				addr, tid, errno);
-			return r;
-		}
+		TEST_ASSERT(
+			r == 0 || errno == EEXIST,
+			"Thread 0x%x failed UFFDIO_CONTINUE on hva 0x%lx, errno = %d",
+			gettid(), hva, errno);
 	} else {
 		TEST_FAIL("Invalid uffd mode %d", uffd_mode);
 	}
 
+	/*
+	 * The UFFDIO_COPY/CONTINUE above is issued with DONTWAKE, so even on
+	 * success it does not wake threads waiting on the UFFD: wake them here.
+	 */
+	if (!is_vcpu) {
+		struct uffdio_range range = {
+			.start = hva,
+			.len = demand_paging_size
+		};
+		r = ioctl(uffd, UFFDIO_WAKE, &range);
+		TEST_ASSERT(
+			r == 0,
+			"Thread 0x%x failed UFFDIO_WAKE on hva 0x%lx, errno = %d",
+			gettid(), hva, errno);
+	}
+
 	ts_diff = timespec_elapsed(start);
 
 	PER_PAGE_DEBUG("UFFD page-in %d \t%ld ns\n", tid,
 		       timespec_to_ns(ts_diff));
 	PER_PAGE_DEBUG("Paged in %ld bytes at 0x%lx from thread %d\n",
-		       demand_paging_size, addr, tid);
+		       demand_paging_size, hva, tid);
 
 	return 0;
 }
 
+static int handle_uffd_page_request_from_uffd(
+	int uffd_mode, int uffd, struct uffd_msg *msg)
+{
+	TEST_ASSERT(
+		msg->event == UFFD_EVENT_PAGEFAULT,
+		"Received uffd message with event %d != UFFD_EVENT_PAGEFAULT",
+		msg->event);
+	return handle_uffd_page_request(
+		uffd_mode, uffd, msg->arg.pagefault.address, false);
+}
+
 struct test_params {
-	int uffd_mode;
 	bool single_uffd;
-	useconds_t uffd_delay;
 	int readers_per_uffd;
 	enum vm_mem_backing_src_type src_type;
 	bool partition_vcpu_memory_access;
+	bool memfault_exits;
 };
 
 static void prefault_mem(void *alias, uint64_t len)
@@ -139,18 +229,31 @@ static void prefault_mem(void *alias, uint64_t len)
 static void run_test(enum vm_guest_mode mode, void *arg)
 {
 	struct test_params *p = arg;
-	struct uffd_desc **uffd_descs = NULL;
 	struct timespec start;
 	struct timespec ts_diff;
 	struct kvm_vm *vm;
-	int i, num_uffds = 0;
-	uint64_t uffd_region_size;
+	int i;
+	uint32_t slot_flags = 0;
+	bool uffd_memfault_exits = uffd_mode && p->memfault_exits;
+
+	if (uffd_memfault_exits) {
+		TEST_ASSERT(kvm_has_cap(KVM_CAP_MEMORY_FAULT_NOWAIT) > 0,
+					"KVM does not have KVM_CAP_MEMORY_FAULT_NOWAIT");
+		slot_flags = KVM_MEM_ABSENT_MAPPING_FAULT;
+	}
 
 	vm = memstress_create_vm(
 		mode, nr_vcpus, guest_percpu_mem_size,
-		1, 0,
+		1, slot_flags,
 		p->src_type, p->partition_vcpu_memory_access);
 
+	if (uffd_memfault_exits) {
+		if (kvm_has_cap(KVM_CAP_X86_MEMORY_FAULT_EXIT))
+			vm_enable_cap(
+				vm, KVM_CAP_X86_MEMORY_FAULT_EXIT,
+				KVM_MEMFAULT_REASON_ABSENT_MAPPING);
+	}
+
 	demand_paging_size = get_backing_src_pagesz(p->src_type);
 
 	guest_data_prototype = malloc(demand_paging_size);
@@ -158,12 +261,12 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 		    "Failed to allocate buffer for guest data pattern");
 	memset(guest_data_prototype, 0xAB, demand_paging_size);
 
-	if (p->uffd_mode) {
+	if (uffd_mode) {
 		num_uffds = p->single_uffd ? 1 : nr_vcpus;
 		uffd_region_size = nr_vcpus * guest_percpu_mem_size / num_uffds;
 
 		uffd_descs = malloc(num_uffds * sizeof(struct uffd_desc *));
-		TEST_ASSERT(uffd_descs, "Memory allocation failed");
+		TEST_ASSERT(uffd_descs, "Failed to allocate memory for uffd descriptors");
 
 		for (i = 0; i < num_uffds; i++) {
 			struct memstress_vcpu_args *vcpu_args;
@@ -183,10 +286,10 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 			 * requests.
 			 */
 			uffd_descs[i] = uffd_setup_demand_paging(
-				p->uffd_mode, p->uffd_delay, vcpu_hva,
+				uffd_mode, uffd_delay, vcpu_hva,
 				uffd_region_size,
 				p->readers_per_uffd,
-				&handle_uffd_page_request);
+				&handle_uffd_page_request_from_uffd);
 		}
 	}
 
@@ -200,7 +303,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	ts_diff = timespec_elapsed(start);
 	pr_info("All vCPU threads joined\n");
 
-	if (p->uffd_mode) {
+	if (uffd_mode) {
 		/* Tell the user fault fd handler threads to quit */
 		for (i = 0; i < num_uffds; i++)
 			uffd_stop_demand_paging(uffd_descs[i]);
@@ -215,7 +318,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	memstress_destroy_vm(vm);
 
 	free(guest_data_prototype);
-	if (p->uffd_mode)
+	if (uffd_mode)
 		free(uffd_descs);
 }
 
@@ -224,7 +327,7 @@ static void help(char *name)
 	puts("");
 	printf("usage: %s [-h] [-m vm_mode] [-u uffd_mode] [-a]\n"
 		   "          [-d uffd_delay_usec] [-r readers_per_uffd] [-b memory]\n"
-		   "          [-s type] [-v vcpus] [-o]\n", name);
+		   "          [-w] [-s type] [-v vcpus] [-o]\n", name);
 	guest_modes_help();
 	printf(" -u: use userfaultfd to handle vCPU page faults. Mode is a\n"
 	       "     UFFD registration mode: 'MISSING' or 'MINOR'.\n");
@@ -235,6 +338,7 @@ static void help(char *name)
 	       "     FD handler to simulate demand paging\n"
 	       "     overheads. Ignored without -u.\n");
 	printf(" -r: Set the number of reader threads per uffd.\n");
+	printf(" -w: Enable kvm cap for memory fault exits.\n");
 	printf(" -b: specify the size of the memory region which should be\n"
 	       "     demand paged by each vCPU. e.g. 10M or 3G.\n"
 	       "     Default: 1G\n");
@@ -254,29 +358,30 @@ int main(int argc, char *argv[])
 		.partition_vcpu_memory_access = true,
 		.readers_per_uffd = 1,
 		.single_uffd = false,
+		.memfault_exits = false,
 	};
 	int opt;
 
 	guest_modes_append_default();
 
-	while ((opt = getopt(argc, argv, "ahom:u:d:b:s:v:r:")) != -1) {
+	while ((opt = getopt(argc, argv, "ahowm:u:d:b:s:v:r:")) != -1) {
 		switch (opt) {
 		case 'm':
 			guest_modes_cmdline(optarg);
 			break;
 		case 'u':
 			if (!strcmp("MISSING", optarg))
-				p.uffd_mode = UFFDIO_REGISTER_MODE_MISSING;
+				uffd_mode = UFFDIO_REGISTER_MODE_MISSING;
 			else if (!strcmp("MINOR", optarg))
-				p.uffd_mode = UFFDIO_REGISTER_MODE_MINOR;
-			TEST_ASSERT(p.uffd_mode, "UFFD mode must be 'MISSING' or 'MINOR'.");
+				uffd_mode = UFFDIO_REGISTER_MODE_MINOR;
+			TEST_ASSERT(uffd_mode, "UFFD mode must be 'MISSING' or 'MINOR'.");
 			break;
 		case 'a':
 			p.single_uffd = true;
 			break;
 		case 'd':
-			p.uffd_delay = strtoul(optarg, NULL, 0);
-			TEST_ASSERT(p.uffd_delay >= 0, "A negative UFFD delay is not supported.");
+			uffd_delay = strtoul(optarg, NULL, 0);
+			TEST_ASSERT(uffd_delay >= 0, "A negative UFFD delay is not supported.");
 			break;
 		case 'b':
 			guest_percpu_mem_size = parse_size(optarg);
@@ -299,6 +404,9 @@ int main(int argc, char *argv[])
 						"Invalid number of readers per uffd %d: must be >=1",
 						p.readers_per_uffd);
 			break;
+		case 'w':
+			p.memfault_exits = true;
+			break;
 		case 'h':
 		default:
 			help(argv[0]);
@@ -306,7 +414,7 @@ int main(int argc, char *argv[])
 		}
 	}
 
-	if (p.uffd_mode == UFFDIO_REGISTER_MODE_MINOR &&
+	if (uffd_mode == UFFDIO_REGISTER_MODE_MINOR &&
 	    !backing_src_is_shared(p.src_type)) {
 		TEST_FAIL("userfaultfd MINOR mode requires shared memory; pick a different -s");
 	}
-- 
2.40.0.rc1.284.g88254d51c5-goog



* Re: [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field
  2023-03-15  2:17 ` [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field Anish Moorthy
@ 2023-03-17  0:02   ` Isaku Yamahata
  2023-03-17 18:33     ` Anish Moorthy
  2023-03-17 18:35   ` Oliver Upton
  1 sibling, 1 reply; 60+ messages in thread
From: Isaku Yamahata @ 2023-03-17  0:02 UTC (permalink / raw)
  To: Anish Moorthy; +Cc: seanjc, jthoughton, kvm, isaku.yamahata

On Wed, Mar 15, 2023 at 02:17:28AM +0000,
Anish Moorthy <amoorthy@google.com> wrote:

> Memory fault exits allow KVM to return useful information from
> KVM_RUN instead of having to -EFAULT when a guest memory access goes
> wrong. Document the intent and API of the new capability, and introduce
> helper functions which will be useful in places where it needs to be
> implemented.
> 
> Also allow the capability to be enabled, even though that won't
> currently *do* anything: implementations at the relevant -EFAULT sites
> will be performed in subsequent commits.
> ---
>  Documentation/virt/kvm/api.rst | 37 ++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c             |  1 +
>  include/linux/kvm_host.h       | 16 +++++++++++++++
>  include/uapi/linux/kvm.h       | 16 +++++++++++++++
>  tools/include/uapi/linux/kvm.h | 15 ++++++++++++++
>  virt/kvm/kvm_main.c            | 28 +++++++++++++++++++++++++
>  6 files changed, 113 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 62de0768d6aa5..f9ca18bbec879 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6636,6 +6636,19 @@ array field represents return values. The userspace should update the return
>  values of SBI call before resuming the VCPU. For more details on RISC-V SBI
>  spec refer, https://github.com/riscv/riscv-sbi-doc.
>  
> +::
> +
> +		/* KVM_EXIT_MEMORY_FAULT */
> +		struct {
> +			__u64 flags;
> +			__u64 gpa;
> +			__u64 len; /* in bytes */
> +		} memory_fault;
> +
> +Indicates a memory fault on the guest physical address range [gpa, gpa + len).
> +flags is a bitfield describing the reason(s) for the fault. See
> +KVM_CAP_X86_MEMORY_FAULT_EXIT for more details.
> +
>  ::
>  
>      /* KVM_EXIT_NOTIFY */
> @@ -7669,6 +7682,30 @@ This capability is aimed to mitigate the threat that malicious VMs can
>  cause CPU stuck (due to event windows don't open up) and make the CPU
>  unavailable to host or other VMs.
>  
> +7.34 KVM_CAP_X86_MEMORY_FAULT_EXIT
> +----------------------------------
> +
> +:Architectures: x86

Why x86 specific?

> +:Parameters: args[0] is a bitfield specifying what reasons to exit upon.
> +:Returns: 0 on success, -EINVAL if unsupported or if unrecognized exit reason
> +          specified.
> +
> +This capability transforms -EFAULTs returned by KVM_RUN in response to guest
> +memory accesses into VM exits (KVM_EXIT_MEMORY_FAULT), with 'gpa' and 'len'
> +describing the problematic range of memory and 'flags' describing the reason(s)
> +for the fault.
> +
> +The implementation is currently incomplete. Please notify the maintainers if you
> +come across a case where it needs to be implemented.
> +
> +Through args[0], the capability can be set on a per-exit-reason basis.
> +Currently, the only exit reasons supported are
> +
> +1. KVM_MEMFAULT_REASON_UNKNOWN (1 << 0)
> +
> +Memory fault exits with a reason of UNKNOWN should not be depended upon: they
> +may be added, removed, or reclassified under a stable reason.
> +
>  8. Other capabilities.
>  ======================
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f706621c35b86..b3c1b2f57e680 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4425,6 +4425,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_VAPIC:
>  	case KVM_CAP_ENABLE_CAP:
>  	case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
> +	case KVM_CAP_X86_MEMORY_FAULT_EXIT:
>  		r = 1;
>  		break;
>  	case KVM_CAP_EXIT_HYPERCALL:
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 8ada23756b0ec..d3ccfead73e42 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -805,6 +805,7 @@ struct kvm {
>  	struct notifier_block pm_notifier;
>  #endif
>  	char stats_id[KVM_STATS_NAME_SIZE];
> +	uint64_t memfault_exit_reasons;
>  };
>  
>  #define kvm_err(fmt, ...) \
> @@ -2278,4 +2279,19 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr)
>  /* Max number of entries allowed for each kvm dirty ring */
>  #define  KVM_DIRTY_RING_MAX_ENTRIES  65536
>  
> +/*
> + * If memory fault exits are enabled for any of the reasons given in exit_flags
> + * then sets up a KVM_EXIT_MEMORY_FAULT for the given guest physical address,
> + * length, and flags and returns -1.
> + * Otherwise, returns -EFAULT
> + */
> +inline int kvm_memfault_exit_or_efault(
> +	struct kvm_vcpu *vcpu, uint64_t gpa, uint64_t len, uint64_t exit_flags);
> +
> +/*
> + * Checks that all of the bits specified in 'reasons' correspond to known
> + * memory fault exit reasons.
> + */
> +bool kvm_memfault_exit_flags_valid(uint64_t reasons);
> +
>  #endif
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index d77aef872a0a0..0ba1d7f01346e 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -264,6 +264,7 @@ struct kvm_xen_exit {
>  #define KVM_EXIT_RISCV_SBI        35
>  #define KVM_EXIT_RISCV_CSR        36
>  #define KVM_EXIT_NOTIFY           37
> +#define KVM_EXIT_MEMORY_FAULT     38
>  
>  /* For KVM_EXIT_INTERNAL_ERROR */
>  /* Emulate instruction failed. */
> @@ -505,6 +506,17 @@ struct kvm_run {
>  #define KVM_NOTIFY_CONTEXT_INVALID	(1 << 0)
>  			__u32 flags;
>  		} notify;
> +		/* KVM_EXIT_MEMORY_FAULT */
> +		struct {
> +			/*
> +			 * Indicates a memory fault on the guest physical address range
> +			 * [gpa, gpa + len). flags is a bitfield describing the reason(s)
> +			 * for the fault.
> +			 */
> +			__u64 flags;
> +			__u64 gpa;
> +			__u64 len; /* in bytes */
> +		} memory_fault;
>  		/* Fix the size of the union. */
>  		char padding[256];
>  	};
> @@ -1184,6 +1196,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224
>  #define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225
>  #define KVM_CAP_PMU_EVENT_MASKED_EVENTS 226
> +#define KVM_CAP_X86_MEMORY_FAULT_EXIT 227
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> @@ -2237,4 +2250,7 @@ struct kvm_s390_zpci_op {
>  /* flags for kvm_s390_zpci_op->u.reg_aen.flags */
>  #define KVM_S390_ZPCIOP_REGAEN_HOST    (1 << 0)
>  
> +/* Exit reasons for KVM_EXIT_MEMORY_FAULT */
> +#define KVM_MEMFAULT_REASON_UNKNOWN (1 << 0)
> +
>  #endif /* __LINUX_KVM_H */
> diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
> index 55155e262646e..2b468345f25c3 100644
> --- a/tools/include/uapi/linux/kvm.h
> +++ b/tools/include/uapi/linux/kvm.h
> @@ -264,6 +264,7 @@ struct kvm_xen_exit {
>  #define KVM_EXIT_RISCV_SBI        35
>  #define KVM_EXIT_RISCV_CSR        36
>  #define KVM_EXIT_NOTIFY           37
> +#define KVM_EXIT_MEMORY_FAULT     38
>  
>  /* For KVM_EXIT_INTERNAL_ERROR */
>  /* Emulate instruction failed. */
> @@ -505,6 +506,17 @@ struct kvm_run {
>  #define KVM_NOTIFY_CONTEXT_INVALID	(1 << 0)
>  			__u32 flags;
>  		} notify;
> +		/* KVM_EXIT_MEMORY_FAULT */
> +		struct {
> +			/*
> +			 * Indicates a memory fault on the guest physical address range
> +			 * [gpa, gpa + len). flags is a bitfield describing the reason(s)
> +			 * for the fault.
> +			 */
> +			__u64 flags;
> +			__u64 gpa;
> +			__u64 len; /* in bytes */
> +		} memory_fault;
>  		/* Fix the size of the union. */
>  		char padding[256];
>  	};
> @@ -2228,4 +2240,7 @@ struct kvm_s390_zpci_op {
>  /* flags for kvm_s390_zpci_op->u.reg_aen.flags */
>  #define KVM_S390_ZPCIOP_REGAEN_HOST    (1 << 0)
>  
> +/* Exit reasons for KVM_EXIT_MEMORY_FAULT */
> +#define KVM_MEMFAULT_REASON_UNKNOWN (1 << 0)
> +
>  #endif /* __LINUX_KVM_H */
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index e38ddda05b261..00aec43860ff1 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1142,6 +1142,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
>  	spin_lock_init(&kvm->mn_invalidate_lock);
>  	rcuwait_init(&kvm->mn_memslots_update_rcuwait);
>  	xa_init(&kvm->vcpu_array);
> +	kvm->memfault_exit_reasons = 0;
>  
>  	INIT_LIST_HEAD(&kvm->gpc_list);
>  	spin_lock_init(&kvm->gpc_lock);
> @@ -4671,6 +4672,14 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>  
>  		return r;
>  	}
> +	case KVM_CAP_X86_MEMORY_FAULT_EXIT: {
> +		if (!kvm_vm_ioctl_check_extension(kvm, KVM_CAP_X86_MEMORY_FAULT_EXIT))
> +			return -EINVAL;
> +		else if (!kvm_memfault_exit_flags_valid(cap->args[0]))
> +			return -EINVAL;
> +		kvm->memfault_exit_reasons = cap->args[0];
> +		return 0;
> +	}

Is KVM_CAP_X86_MEMORY_FAULT_EXIT really specific to x86?
If so, this should go to kvm_vm_ioctl_enable_cap() in arch/x86/kvm/x86.c.
(Or make it non-arch specific.)


>  	default:
>  		return kvm_vm_ioctl_enable_cap(kvm, cap);
>  	}
> @@ -6172,3 +6181,22 @@ int kvm_vm_create_worker_thread(struct kvm *kvm, kvm_vm_thread_fn_t thread_fn,
>  
>  	return init_context.err;
>  }
> +
> +inline int kvm_memfault_exit_or_efault(
> +	struct kvm_vcpu *vcpu, uint64_t gpa, uint64_t len, uint64_t exit_flags)
> +{
> +	if (!(vcpu->kvm->memfault_exit_reasons & exit_flags))
> +		return -EFAULT;
> +	vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
> +	vcpu->run->memory_fault.gpa = gpa;
> +	vcpu->run->memory_fault.len = len;
> +	vcpu->run->memory_fault.flags = exit_flags;
> +	return -1;

Why -1 and not 0? Anyway, enum exit_fastpath_completion is an x86 KVM MMU
internal convention. As WIP, it's okay for now, though.


> +}
> +
> +bool kvm_memfault_exit_flags_valid(uint64_t reasons)
> +{
> +	uint64_t valid_flags = KVM_MEMFAULT_REASON_UNKNOWN;
> +
> +	return !(reasons & ~valid_flags);
> +}
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
> 

-- 
Isaku Yamahata <isaku.yamahata@gmail.com>


* Re: [WIP Patch v2 10/14] KVM: x86: Implement KVM_CAP_MEMORY_FAULT_NOWAIT
  2023-03-15  2:17 ` [WIP Patch v2 10/14] KVM: x86: Implement KVM_CAP_MEMORY_FAULT_NOWAIT Anish Moorthy
@ 2023-03-17  0:32   ` Isaku Yamahata
  0 siblings, 0 replies; 60+ messages in thread
From: Isaku Yamahata @ 2023-03-17  0:32 UTC (permalink / raw)
  To: Anish Moorthy; +Cc: seanjc, jthoughton, kvm, isaku.yamahata

On Wed, Mar 15, 2023 at 02:17:34AM +0000,
Anish Moorthy <amoorthy@google.com> wrote:

> When a memslot has the KVM_MEM_MEMORY_FAULT_EXIT flag set, exit to
> userspace upon encountering a page fault for which the userspace
> page tables do not contain a present mapping.
> ---
>  arch/x86/kvm/mmu/mmu.c | 33 +++++++++++++++++++++++++--------
>  arch/x86/kvm/x86.c     |  1 +
>  2 files changed, 26 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 5e0140db384f6..68bc4ab2bd942 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3214,7 +3214,9 @@ static void kvm_send_hwpoison_signal(struct kvm_memory_slot *slot, gfn_t gfn)
>  	send_sig_mceerr(BUS_MCEERR_AR, (void __user *)hva, PAGE_SHIFT, current);
>  }
>  
> -static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> +static int kvm_handle_error_pfn(
> +	struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
> +	bool faulted_on_absent_mapping)
>  {
>  	if (is_sigpending_pfn(fault->pfn)) {
>  		kvm_handle_signal_exit(vcpu);
> @@ -3234,7 +3236,11 @@ static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fa
>  		return RET_PF_RETRY;
>  	}
>  
> -	return -EFAULT;
> +	return kvm_memfault_exit_or_efault(
> +		vcpu, fault->gfn * PAGE_SIZE, PAGE_SIZE,
> +		faulted_on_absent_mapping
> +			? KVM_MEMFAULT_REASON_ABSENT_MAPPING
> +			: KVM_MEMFAULT_REASON_UNKNOWN);
>  }
>  
>  static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
> @@ -4209,7 +4215,9 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
>  	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true);
>  }
>  
> -static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> +static int __kvm_faultin_pfn(
> +	struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
> +	bool fault_on_absent_mapping)
>  {
>  	struct kvm_memory_slot *slot = fault->slot;
>  	bool async;
> @@ -4242,9 +4250,15 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>  	}
>  
>  	async = false;
> -	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, false, &async,
> -					  fault->write, &fault->map_writable,
> -					  &fault->hva);
> +
> +	fault->pfn = __gfn_to_pfn_memslot(
> +		slot, fault->gfn,
> +		fault_on_absent_mapping,
> +		false,
> +		fault_on_absent_mapping ? NULL : &async,
> +		fault->write, &fault->map_writable,
> +		&fault->hva);
> +
>  	if (!async)
>  		return RET_PF_CONTINUE; /* *pfn has correct page already */
>  
> @@ -4274,16 +4288,19 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
>  			   unsigned int access)
>  {
>  	int ret;
> +	bool fault_on_absent_mapping
> +		= likely(fault->slot) && kvm_slot_fault_on_absent_mapping(fault->slot);

nit: Instead of passing around the value, we can add a new member to
struct kvm_page_fault::fault_on_absent_mapping.

  fault->fault_on_absent_mapping = likely(fault->slot) && kvm_slot_fault_on_absent_mapping(fault->slot);
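Isaku's suggestion amounts to computing the predicate once and caching it on the fault object instead of threading a bool through every helper. A self-contained sketch of that pattern, using mocked-up stand-ins for the KVM types (the struct layouts, helper names, and flag value here are illustrative, not the kernel's actual definitions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-ins for the KVM types under discussion. */
#define KVM_MEM_ABSENT_MAPPING_FAULT (1u << 2) /* illustrative value */

struct kvm_memory_slot {
	unsigned int flags;
};

struct kvm_page_fault {
	struct kvm_memory_slot *slot;
	/* Isaku's suggested new member: computed once, read everywhere. */
	bool fault_on_absent_mapping;
};

static bool kvm_slot_fault_on_absent_mapping(const struct kvm_memory_slot *slot)
{
	return slot->flags & KVM_MEM_ABSENT_MAPPING_FAULT;
}

/* Set up the cached predicate instead of passing a bool parameter around. */
static void kvm_page_fault_init(struct kvm_page_fault *fault,
				struct kvm_memory_slot *slot)
{
	fault->slot = slot;
	fault->fault_on_absent_mapping =
		slot && kvm_slot_fault_on_absent_mapping(slot);
}
```

With this shape, helpers such as __kvm_faultin_pfn() and kvm_handle_error_pfn() would read fault->fault_on_absent_mapping rather than growing an extra argument.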

Thanks,
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>

* Re: [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit
  2023-03-15  2:17 [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit Anish Moorthy
                   ` (13 preceding siblings ...)
  2023-03-15  2:17 ` [WIP Patch v2 14/14] KVM: selftests: Handle memory fault exits in demand_paging_test Anish Moorthy
@ 2023-03-17 17:43 ` Oliver Upton
  2023-03-17 18:13   ` Sean Christopherson
  2023-03-17 20:35 ` Sean Christopherson
  15 siblings, 1 reply; 60+ messages in thread
From: Oliver Upton @ 2023-03-17 17:43 UTC (permalink / raw)
  To: Anish Moorthy; +Cc: seanjc, jthoughton, kvm, maz

Anish,

Generally the 'RFC PATCH' prefix is used for patches that are for feedback
only (i.e. not to be considered for inclusion).

On Wed, Mar 15, 2023 at 02:17:24AM +0000, Anish Moorthy wrote:
> Hi Sean, here's what I'm planning to send up as v2 of the scalable
> userfaultfd series.

I don't see a ton of value in sending a targeted posting of a series to the
list. IOW, just CC all of the appropriate reviewers+maintainers. I promise,
we won't bite.

> Don't worry, I'm not asking you to review this all :) I just have a few
> remaining questions regarding KVM_CAP_MEMORY_FAULT_EXIT which seem important
> enough to mention before I ask for more attention from others, and they'll be
> clearer with the patches in hand. Anything else I'm happy to find out about when
> I send the actual v2.
> 
> I want your opinion on
> 
> 1. The general API I've set up for KVM_CAP_MEMORY_FAULT_EXIT
>    (described in the api.rst file)
> 2. Whether the UNKNOWN exit reason cases (everywhere but
>    handle_error_pfn atm) would need to be given "real" reasons
>    before this could be merged.
> 3. If you think I've missed sites that currently -EFAULT to userspace
> 
> About (3): after we agreed to only tackle cases where -EFAULT currently makes it
> to userspace, I went through our list and tried to trace which EFAULTs actually
> bubble up to KVM_RUN. That set ended up being suspiciously small, so I wanted to
> sanity-check my findings with you. Lmk if you see obvious errors in my list
> below.
> 
> --- EFAULTs under KVM_RUN ---
> 
> Confident that needs conversion (already converted)
> ---------------------------------------------------
> * direct_map
> * handle_error_pfn
> * setup_vmgexit_scratch
> * kvm_handle_page_fault
> * FNAME(fetch)
> 
> EFAULT does not propagate to userspace (do not convert)
> -------------------------------------------------------
> * record_steal_time (arch/x86/kvm/x86.c:3463)
> * hva_to_pfn_retry
> * kvm_vcpu_map
> * FNAME(update_accessed_dirty_bits)
> * __kvm_gfn_to_hva_cache_init
>   Might actually make it to userspace, but only through
>   kvm_read|write_guest_offset_cached- would be covered by those conversions
> * kvm_gfn_to_hva_cache_init
> * __kvm_read_guest_page
> * hva_to_pfn_remapped
>   handle_error_pfn will handle this for the scalable uffd case. Don't think
>   other callers -EFAULT to userspace.
> 
> Still unsure if needs conversion
> --------------------------------
> * __kvm_read_guest_atomic
>   The EFAULT might be propagated though FNAME(sync_page)?
> * kvm_write_guest_offset_cached (virt/kvm/kvm_main.c:3226)
> * __kvm_write_guest_page
>   Called from kvm_write_guest_offset_cached: if that needs change, this does too

The low-level accessors are common across architectures and can be called from
other contexts besides a vCPU. Is it possible for the caller to catch -EFAULT
and convert that into an exit?
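One shape the caller-side conversion Oliver asks about could take, sketched with mock types (the names and the exit-reason value are placeholders, not KVM's real definitions): the generic accessor keeps returning -EFAULT so it stays usable from non-vCPU contexts, and only the vCPU-context caller translates the error into a KVM_EXIT_MEMORY_FAULT exit.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define KVM_EXIT_MEMORY_FAULT 38 /* placeholder value */

struct kvm_run {
	uint32_t exit_reason;
	struct {
		uint64_t flags;
		uint64_t gpa;
		uint64_t len;
	} memory_fault;
};

struct kvm_vcpu {
	struct kvm_run *run;
};

/* Generic accessor: callable outside vCPU context, so it only reports -EFAULT. */
static int mock_kvm_read_guest(int hva_is_absent)
{
	return hva_is_absent ? -EFAULT : 0;
}

/* vCPU-context caller: owns the conversion from -EFAULT to a userspace exit. */
static int vcpu_read_guest(struct kvm_vcpu *vcpu, uint64_t gpa, uint64_t len,
			   int hva_is_absent)
{
	int r = mock_kvm_read_guest(hva_is_absent);

	if (r == -EFAULT) {
		vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
		vcpu->run->memory_fault.flags = 0;
		vcpu->run->memory_fault.gpa = gpa;
		vcpu->run->memory_fault.len = len;
		return 0; /* 0: complete a 'normal' exit to userspace */
	}
	return 1; /* 1: resume the guest */
}
```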

> * kvm_write_guest_page
>   Two interesting paths:
>       - kvm_pv_clock_pairing returns a custom KVM_EFAULT error here
>         (arch/x86/kvm/x86.c:9578)

This is a hypercall handler, so the return code is ABI with the guest. So it
shouldn't be converted to an exit to userspace.

-- 
Thanks,
Oliver

* Re: [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit
  2023-03-17 17:43 ` [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit Oliver Upton
@ 2023-03-17 18:13   ` Sean Christopherson
  2023-03-17 18:46     ` David Matlack
  0 siblings, 1 reply; 60+ messages in thread
From: Sean Christopherson @ 2023-03-17 18:13 UTC (permalink / raw)
  To: Oliver Upton; +Cc: Anish Moorthy, jthoughton, kvm, maz

On Fri, Mar 17, 2023, Oliver Upton wrote:
> On Wed, Mar 15, 2023 at 02:17:24AM +0000, Anish Moorthy wrote:
> > Hi Sean, here's what I'm planning to send up as v2 of the scalable
> > userfaultfd series.
> 
> I don't see a ton of value in sending a targeted posting of a series to the
> list. IOW, just CC all of the appropriate reviewers+maintainers. I promise,
> we won't bite.

+1.  And though I discourage off-list review, if something is really truly not
ready for public review, e.g. will do more harm than good by causing confusion,
then just send the patches off-list.  Half measures like this will just make folks
grumpy.

> > Don't worry, I'm not asking you to review this all :) I just have a few
> > remaining questions regarding KVM_CAP_MEMORY_FAULT_EXIT which seem important
> > enough to mention before I ask for more attention from others, and they'll be
> > clearer with the patches in hand. Anything else I'm happy to find out about when
> > I send the actual v2.
> > 
> > I want your opinion on
> > 
> > 1. The general API I've set up for KVM_CAP_MEMORY_FAULT_EXIT
> >    (described in the api.rst file)
> > 2. Whether the UNKNOWN exit reason cases (everywhere but
> >    handle_error_pfn atm) would need to be given "real" reasons
> >    before this could be merged.
> > 3. If you think I've missed sites that currently -EFAULT to userspace
> > 
> > About (3): after we agreed to only tackle cases where -EFAULT currently makes it
> > to userspace, I went through our list and tried to trace which EFAULTs actually
> > bubble up to KVM_RUN. That set ended up being suspiciously small, so I wanted to
> > sanity-check my findings with you. Lmk if you see obvious errors in my list
> > below.
> > 
> > --- EFAULTs under KVM_RUN ---
> > 
> > Confident that needs conversion (already converted)
> > ---------------------------------------------------
> > * direct_map
> > * handle_error_pfn
> > * setup_vmgexit_scratch
> > * kvm_handle_page_fault
> > * FNAME(fetch)
> > 
> > EFAULT does not propagate to userspace (do not convert)
> > -------------------------------------------------------
> > * record_steal_time (arch/x86/kvm/x86.c:3463)
> > * hva_to_pfn_retry
> > * kvm_vcpu_map
> > * FNAME(update_accessed_dirty_bits)
> > * __kvm_gfn_to_hva_cache_init
> >   Might actually make it to userspace, but only through
> >   kvm_read|write_guest_offset_cached- would be covered by those conversions
> > * kvm_gfn_to_hva_cache_init
> > * __kvm_read_guest_page
> > * hva_to_pfn_remapped
> >   handle_error_pfn will handle this for the scalable uffd case. Don't think
> >   other callers -EFAULT to userspace.
> >
> > Still unsure if needs conversion
> > --------------------------------
> > * __kvm_read_guest_atomic
> >   The EFAULT might be propagated though FNAME(sync_page)?
> > * kvm_write_guest_offset_cached (virt/kvm/kvm_main.c:3226)
> > * __kvm_write_guest_page
> >   Called from kvm_write_guest_offset_cached: if that needs change, this does too
> 
> The low-level accessors are common across architectures and can be called from
> other contexts besides a vCPU. Is it possible for the caller to catch -EFAULT
> and convert that into an exit?

Ya, as things stand today, the conversions _must_ be performed at the caller, as
there are (sadly) far too many flows where KVM squashes the error.  E.g. almost
all of x86's paravirt code just suppresses user memory faults :-(

Anish, when we discussed this off-list, what I meant by limiting the intial support
to existing -EFAULT cases was limiting support to existing cases where KVM directly
returns -EFAULT to userspace, not to all existing cases where -EFAULT is ever
returned _within KVM_ while handling KVM_RUN.  My apologies if I didn't make that clear.

* Re: [WIP Patch v2 11/14] KVM: arm64: Allow user_mem_abort to return 0 to signal a 'normal' exit
  2023-03-15  2:17 ` [WIP Patch v2 11/14] KVM: arm64: Allow user_mem_abort to return 0 to signal a 'normal' exit Anish Moorthy
@ 2023-03-17 18:18   ` Oliver Upton
  0 siblings, 0 replies; 60+ messages in thread
From: Oliver Upton @ 2023-03-17 18:18 UTC (permalink / raw)
  To: Anish Moorthy; +Cc: seanjc, jthoughton, kvm

On Wed, Mar 15, 2023 at 02:17:35AM +0000, Anish Moorthy wrote:
> kvm_handle_guest_abort currently just returns 1 if user_mem_abort
> returns 0. Since 1 is the "resume the guest" code, user_mem_abort is
> essentially incapable of triggering a "normal" exit: it can only trigger
> exits by returning a negative value, which indicates an error.
> 
> Remove the "if (ret == 0) ret = 1;" statement from
> kvm_handle_guest_abort and refactor user_mem_abort slightly to allow it
> to trigger 'normal' exits by returning 0.

You should append '()' to function names, as it makes it abundantly obvious to
the reader that the symbols you describe are indeed functions.

I find the changelog a bit too mechanical and doesn't capture the nuance.

  Generally, in the context of a vCPU exit, a return value of 1 is used
  to indicate KVM should return to the guest and 0 is used to complete a
  'normal' exit to userspace. user_mem_abort() deviates from this
  slightly, using 0 to return to the guest.

  Just return 1 from user_mem_abort() to return to the guest and drop
  the return code conversion from kvm_handle_guest_abort(). It is now
  possible to do a 'normal' exit to userspace from user_mem_abort(),
  which will be used in a later change.

> Signed-off-by: Anish Moorthy <amoorthy@google.com>
> ---
>  arch/arm64/kvm/mmu.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 7113587222ffe..735044859eb25 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1190,7 +1190,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			  struct kvm_memory_slot *memslot, unsigned long hva,
>  			  unsigned long fault_status)
>  {
> -	int ret = 0;
> +	int ret = 1;
>  	bool write_fault, writable, force_pte = false;
>  	bool exec_fault;
>  	bool device = false;
> @@ -1281,8 +1281,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	    (logging_active && write_fault)) {
>  		ret = kvm_mmu_topup_memory_cache(memcache,
>  						 kvm_mmu_cache_min_pages(kvm));
> -		if (ret)
> +		if (ret < 0)

There's no need to change this condition.

>  			return ret;
> +		else
> +			ret = 1;

I'd prefer if you set 'ret' close to where it is actually used, which I
believe is only if mmu_invalidate_retry():

	if (mmu_invalidate_retry(kvm, mmu_seq)) {
		ret = 1;
		goto out_unlock;
	}

Otherwise ret gets written to before exiting.

-- 
Thanks,
Oliver

* Re: [WIP Patch v2 12/14] KVM: arm64: Implement KVM_CAP_MEMORY_FAULT_NOWAIT
  2023-03-15  2:17 ` [WIP Patch v2 12/14] KVM: arm64: Implement KVM_CAP_MEMORY_FAULT_NOWAIT Anish Moorthy
@ 2023-03-17 18:27   ` Oliver Upton
  2023-03-17 19:00     ` Anish Moorthy
  0 siblings, 1 reply; 60+ messages in thread
From: Oliver Upton @ 2023-03-17 18:27 UTC (permalink / raw)
  To: Anish Moorthy; +Cc: seanjc, jthoughton, kvm

On Wed, Mar 15, 2023 at 02:17:36AM +0000, Anish Moorthy wrote:
> When a memslot has the KVM_MEM_MEMORY_FAULT_EXIT flag set, exit to
> userspace upon encountering a page fault for which the userspace
> page tables do not contain a present mapping.
> 
> Signed-off-by: Anish Moorthy <amoorthy@google.com>
> Acked-by: James Houghton <jthoughton@google.com>
> ---
>  arch/arm64/kvm/arm.c |  1 +
>  arch/arm64/kvm/mmu.c | 14 ++++++++++++--
>  2 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 3bd732eaf0872..f8337e757c777 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -220,6 +220,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_VCPU_ATTRIBUTES:
>  	case KVM_CAP_PTP_KVM:
>  	case KVM_CAP_ARM_SYSTEM_SUSPEND:
> +	case KVM_CAP_MEMORY_FAULT_NOWAIT:
>  		r = 1;
>  		break;
>  	case KVM_CAP_SET_GUEST_DEBUG2:
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 735044859eb25..0d04ffc81f783 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1206,6 +1206,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	unsigned long vma_pagesize, fault_granule;
>  	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>  	struct kvm_pgtable *pgt;
> +	bool exit_on_memory_fault = kvm_slot_fault_on_absent_mapping(memslot);
>  
>  	fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level);
>  	write_fault = kvm_is_write_fault(vcpu);
> @@ -1303,8 +1304,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	 */
>  	smp_rmb();
>  
> -	pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
> -				   write_fault, &writable, NULL);
> +	pfn = __gfn_to_pfn_memslot(
> +		memslot, gfn, exit_on_memory_fault, false, NULL,
> +		write_fault, &writable, NULL);

As stated before [*], this google3-esque style does not match the kernel style
guide. You may want to check if your work machine is setting up a G3-specific
editor configuration behind your back.

[*] https://lore.kernel.org/kvm/Y+0QRsZ4yWyUdpnc@google.com/

> +	if (exit_on_memory_fault && pfn == KVM_PFN_ERR_FAULT) {

nit: I don't think the local is explicitly necessary. I still find this
readable:

	if (pfn == KVM_PFN_ERR_FAULT && kvm_slot_fault_on_absent_mapping(memslot))

> +		vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
> +		vcpu->run->memory_fault.flags = 0;
> +		vcpu->run->memory_fault.gpa = gfn << PAGE_SHIFT;
> +		vcpu->run->memory_fault.len = vma_pagesize;
> +		return 0;
> +	}
>  	if (pfn == KVM_PFN_ERR_HWPOISON) {
>  		kvm_send_hwpoison_signal(hva, vma_shift);
>  		return 1;
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
> 
> 

-- 
Thanks,
Oliver

* Re: [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field
  2023-03-17  0:02   ` Isaku Yamahata
@ 2023-03-17 18:33     ` Anish Moorthy
  2023-03-17 19:30       ` Oliver Upton
  2023-03-17 21:50       ` Sean Christopherson
  0 siblings, 2 replies; 60+ messages in thread
From: Anish Moorthy @ 2023-03-17 18:33 UTC (permalink / raw)
  To: Isaku Yamahata, Marc Zyngier, Oliver Upton; +Cc: seanjc, jthoughton, kvm

On Thu, Mar 16, 2023 at 5:02 PM Isaku Yamahata <isaku.yamahata@gmail.com> wrote:

> > +7.34 KVM_CAP_X86_MEMORY_FAULT_EXIT
> > +----------------------------------
> > +
> > +:Architectures: x86
>
> Why x86 specific?

Sean was the only one to bring this functionality up and originally
did so in the context of some x86-specific functions, so I assumed
that x86 was the only ask and that maybe the other architectures had
alternative solutions. Admittedly I also wanted to avoid wading
through another big set of -EFAULT references :/

Those are the only reasons though. Marc, Oliver, should I bring this
capability to Arm as well?

> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index e38ddda05b261..00aec43860ff1 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -1142,6 +1142,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
> >       spin_lock_init(&kvm->mn_invalidate_lock);
> >       rcuwait_init(&kvm->mn_memslots_update_rcuwait);
> >       xa_init(&kvm->vcpu_array);
> > +     kvm->memfault_exit_reasons = 0;
> >
> >       INIT_LIST_HEAD(&kvm->gpc_list);
> >       spin_lock_init(&kvm->gpc_lock);
> > @@ -4671,6 +4672,14 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
> >
> >               return r;
> >       }
> > +     case KVM_CAP_X86_MEMORY_FAULT_EXIT: {
> > +             if (!kvm_vm_ioctl_check_extension(kvm, KVM_CAP_X86_MEMORY_FAULT_EXIT))
> > +                     return -EINVAL;
> > +             else if (!kvm_memfault_exit_flags_valid(cap->args[0]))
> > +                     return -EINVAL;
> > +             kvm->memfault_exit_reasons = cap->args[0];
> > +             return 0;
> > +     }
>
> Is KVM_CAP_X86_MEMORY_FAULT_EXIT really specific to x86?
> If so, this should go to kvm_vm_ioctl_enable_cap() in arch/x86/kvm/x86.c.
> (Or make it non-arch specific.)

Ah, thanks for the catch: I renamed my old non-x86 specific
capability, and forgot to move this block.

> > +inline int kvm_memfault_exit_or_efault(
> > +     struct kvm_vcpu *vcpu, uint64_t gpa, uint64_t len, uint64_t exit_flags)
> > +{
> > +     if (!(vcpu->kvm->memfault_exit_reasons & exit_flags))
> > +             return -EFAULT;
> > +     vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
> > +     vcpu->run->memory_fault.gpa = gpa;
> > +     vcpu->run->memory_fault.len = len;
> > +     vcpu->run->memory_fault.flags = exit_flags;
> > +     return -1;
>
> Why -1? 0? Anyway enum exit_fastpath_completion is x86 kvm mmu internal
> convention. As WIP, it's okay for now, though.

The -1 isn't to indicate a failure in this function itself, but to
allow callers to substitute this for "return -EFAULT." A return code
of zero would mask errors and cause KVM to proceed in ways that it
shouldn't. For instance, "setup_vmgexit_scratch" uses it like this

if (kvm_read_guest(svm->vcpu.kvm, scratch_gpa_beg, scratch_va, len)) {
    ...
-  return -EFAULT;
+ return kvm_memfault_exit_or_efault(...);
}

and looking at one of its callers (sev_handle_vmgexit) shows how a
return code of zero would cause a different control flow

case SVM_VMGEXIT_MMIO_READ:
ret = setup_vmgexit_scratch(svm, true, control->exit_info_2);
if (ret)
    break;

ret = kvm_sev_es_mmio_read(vcpu,
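The control-flow hazard Anish describes can be reduced to a compilable toy (the helper and caller names below are invented for illustration): because the conversion helper never returns 0, the caller's `if (ret) break;` check still aborts, whereas a 0 return would let the caller fall through to the MMIO handling it should have skipped.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for kvm_memfault_exit_or_efault(): deliberately never returns 0. */
static int memfault_exit_or_efault(int exit_enabled)
{
	return exit_enabled ? -1 : -EFAULT;
}

/* Stand-in for the sev_handle_vmgexit() dispatch pattern. */
static int handle_mmio_read(int exit_enabled, int *mmio_read_performed)
{
	int ret = memfault_exit_or_efault(exit_enabled);

	if (ret)
		return ret; /* non-zero: propagate toward KVM_RUN */

	/* A 0 return would wrongly continue to the MMIO read here. */
	*mmio_read_performed = 1;
	return 0;
}
```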

* Re: [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field
  2023-03-15  2:17 ` [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field Anish Moorthy
  2023-03-17  0:02   ` Isaku Yamahata
@ 2023-03-17 18:35   ` Oliver Upton
  1 sibling, 0 replies; 60+ messages in thread
From: Oliver Upton @ 2023-03-17 18:35 UTC (permalink / raw)
  To: Anish Moorthy; +Cc: seanjc, jthoughton, kvm

On Wed, Mar 15, 2023 at 02:17:28AM +0000, Anish Moorthy wrote:

[...]

> @@ -6172,3 +6181,22 @@ int kvm_vm_create_worker_thread(struct kvm *kvm, kvm_vm_thread_fn_t thread_fn,
>  
>  	return init_context.err;
>  }
> +
> +inline int kvm_memfault_exit_or_efault(
> +	struct kvm_vcpu *vcpu, uint64_t gpa, uint64_t len, uint64_t exit_flags)
> +{
> +	if (!(vcpu->kvm->memfault_exit_reasons & exit_flags))
> +		return -EFAULT;

<snip>

> +	vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
> +	vcpu->run->memory_fault.gpa = gpa;
> +	vcpu->run->memory_fault.len = len;
> +	vcpu->run->memory_fault.flags = exit_flags;

</snip>

Please spin this off into a helper and make use of it on the arm64 side.

> +	return -1;
> +}
> +
> +bool kvm_memfault_exit_flags_valid(uint64_t reasons)
> +{
> +	uint64_t valid_flags = KVM_MEMFAULT_REASON_UNKNOWN;
> +
> +	return !(reasons & !valid_flags);
> +}
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
> 
> 

-- 
Thanks,
Oliver

* Re: [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit
  2023-03-17 18:13   ` Sean Christopherson
@ 2023-03-17 18:46     ` David Matlack
  2023-03-17 18:54       ` Oliver Upton
  0 siblings, 1 reply; 60+ messages in thread
From: David Matlack @ 2023-03-17 18:46 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Oliver Upton, Anish Moorthy, jthoughton, kvm, maz

On Fri, Mar 17, 2023 at 11:13 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Mar 17, 2023, Oliver Upton wrote:
> > On Wed, Mar 15, 2023 at 02:17:24AM +0000, Anish Moorthy wrote:
> > > Hi Sean, here's what I'm planning to send up as v2 of the scalable
> > > userfaultfd series.
> >
> > I don't see a ton of value in sending a targeted posting of a series to the
> > list.

But isn't it already generating value as you were able to weigh in and
provide feedback on technical aspects that you would not otherwise
have been able to if Anish had just messaged Sean?

> > IOW, just CC all of the appropriate reviewers+maintainers. I promise,
> > we won't bite.

I disagree. While I think it's fine to reach out to someone off-list
to discuss a specific question, if you're going to message all
reviewers and maintainers, you should also CC the mailing list. That
allows more people to follow along and weigh in if necessary.

>
> +1.  And though I discourage off-list review, if something is really truly not
> > ready for public review, e.g. will do more harm than good by causing confusion,
> then just send the patches off-list.  Half measures like this will just make folks
> grumpy.

In this specific case, Anish very clearly laid out the reason for
sending the patches and asked very specific directed questions in the
cover letter and called it out as WIP. Yes "WIP" should have been
"RFC" but other than that should anything have been different?

* Re: [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit
  2023-03-17 18:46     ` David Matlack
@ 2023-03-17 18:54       ` Oliver Upton
  2023-03-17 18:59         ` David Matlack
  0 siblings, 1 reply; 60+ messages in thread
From: Oliver Upton @ 2023-03-17 18:54 UTC (permalink / raw)
  To: David Matlack; +Cc: Sean Christopherson, Anish Moorthy, jthoughton, kvm, maz

David,

On Fri, Mar 17, 2023 at 11:46:58AM -0700, David Matlack wrote:
> On Fri, Mar 17, 2023 at 11:13 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Fri, Mar 17, 2023, Oliver Upton wrote:
> > > On Wed, Mar 15, 2023 at 02:17:24AM +0000, Anish Moorthy wrote:
> > > > Hi Sean, here's what I'm planning to send up as v2 of the scalable
> > > > userfaultfd series.
> > >
> > > I don't see a ton of value in sending a targeted posting of a series to the
> > > list.
> 
> But isn't it already generating value as you were able to weigh in and
> provide feedback on technical aspects that you would not have been
> otherwise able to if Anish had just messaged Sean?

No, I only happened upon this series looking at lore. My problem is that
none of the affected maintainers or reviewers were cc'ed on the series.

> > > IOW, just CC all of the appropriate reviewers+maintainers. I promise,
> > > we won't bite.
> 
> I disagree. While I think it's fine to reach out to someone off-list
> to discuss a specific question, if you're going to message all
> reviewers and maintainers, you should also CC the mailing list. That
> allows more people to follow along and weigh in if necessary.

I think there may be a slight disconnect here :) I'm in no way encouraging
off-list discussion and instead asking that mail on the list arrives in
the right folks' inboxes.

Posting an RFC on the list was absolutely the right thing to do.

> >
> > +1.  And though I discourage off-list review, if something is really truly not
> > > ready for public review, e.g. will do more harm than good by causing confusion,
> > then just send the patches off-list.  Half measures like this will just make folks
> > grumpy.
> 
> In this specific case, Anish very clearly laid out the reason for
> sending the patches and asked very specific directed questions in the
> cover letter and called it out as WIP. Yes "WIP" should have been
> "RFC" but other than that should anything have been different?

See above

-- 
Thanks,
Oliver

* Re: [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit
  2023-03-17 18:54       ` Oliver Upton
@ 2023-03-17 18:59         ` David Matlack
  2023-03-17 19:53           ` Anish Moorthy
  0 siblings, 1 reply; 60+ messages in thread
From: David Matlack @ 2023-03-17 18:59 UTC (permalink / raw)
  To: Oliver Upton; +Cc: Sean Christopherson, Anish Moorthy, jthoughton, kvm, maz

On Fri, Mar 17, 2023 at 11:54 AM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> David,
>
> On Fri, Mar 17, 2023 at 11:46:58AM -0700, David Matlack wrote:
> > On Fri, Mar 17, 2023 at 11:13 AM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Fri, Mar 17, 2023, Oliver Upton wrote:
> > > > On Wed, Mar 15, 2023 at 02:17:24AM +0000, Anish Moorthy wrote:
> > > > > Hi Sean, here's what I'm planning to send up as v2 of the scalable
> > > > > userfaultfd series.
> > > >
> > > > I don't see a ton of value in sending a targeted posting of a series to the
> > > > list.
> >
> > But isn't it already generating value as you were able to weigh in and
> > provide feedback on technical aspects that you would not have been
> > otherwise able to if Anish had just messaged Sean?
>
> No, I only happened upon this series looking at lore. My problem is that
> none of the affected maintainers or reviewers were cc'ed on the series.
>
> > > > IOW, just CC all of the appropriate reviewers+maintainers. I promise,
> > > > we won't bite.
> >
> > I disagree. While I think it's fine to reach out to someone off-list
> > to discuss a specific question, if you're going to message all
> > reviewers and maintainers, you should also CC the mailing list. That
> > allows more people to follow along and weigh in if necessary.
>
> I think there may be a slight disconnect here :) I'm in no way encouraging
> off-list discussion and instead asking that mail on the list arrives in
> the right folks' inboxes.
>
> Posting an RFC on the list was absolutely the right thing to do.

Doh. I misunderstood what you meant. We are in violent agreement!

* Re: [WIP Patch v2 09/14] KVM: Introduce KVM_CAP_MEMORY_FAULT_NOWAIT without implementation
  2023-03-15  2:17 ` [WIP Patch v2 09/14] KVM: Introduce KVM_CAP_MEMORY_FAULT_NOWAIT without implementation Anish Moorthy
@ 2023-03-17 18:59   ` Oliver Upton
  2023-03-17 20:15     ` Anish Moorthy
  2023-03-17 20:17     ` Sean Christopherson
  0 siblings, 2 replies; 60+ messages in thread
From: Oliver Upton @ 2023-03-17 18:59 UTC (permalink / raw)
  To: Anish Moorthy; +Cc: seanjc, jthoughton, kvm

On Wed, Mar 15, 2023 at 02:17:33AM +0000, Anish Moorthy wrote:
> Add documentation, memslot flags, useful helper functions, and the
> actual new capability itself.
> 
> Memory fault exits on absent mappings are particularly useful for
> userfaultfd-based live migration postcopy. When many vCPUs fault upon a
> single userfaultfd the faults can take a while to surface to userspace
> due to having to contend for uffd wait queue locks. Bypassing the uffd
> entirely by triggering a vCPU exit avoids this contention and can improve
> the fault rate by as much as 10x.
> ---
>  Documentation/virt/kvm/api.rst | 37 +++++++++++++++++++++++++++++++---
>  include/linux/kvm_host.h       |  6 ++++++
>  include/uapi/linux/kvm.h       |  3 +++
>  tools/include/uapi/linux/kvm.h |  2 ++
>  virt/kvm/kvm_main.c            |  7 ++++++-
>  5 files changed, 51 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index f9ca18bbec879..4932c0f62eb3d 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -1312,6 +1312,7 @@ yet and must be cleared on entry.
>    /* for kvm_userspace_memory_region::flags */
>    #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
>    #define KVM_MEM_READONLY	(1UL << 1)
> +  #define KVM_MEM_ABSENT_MAPPING_FAULT (1UL << 2)

call it KVM_MEM_EXIT_ABSENT_MAPPING

>  
>  This ioctl allows the user to create, modify or delete a guest physical
>  memory slot.  Bits 0-15 of "slot" specify the slot id and this value
> @@ -1342,12 +1343,15 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
>  be identical.  This allows large pages in the guest to be backed by large
>  pages in the host.
>  
> -The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
> -KVM_MEM_READONLY.  The former can be set to instruct KVM to keep track of
> +The flags field supports three flags
> +
> +1.  KVM_MEM_LOG_DIRTY_PAGES: can be set to instruct KVM to keep track of
>  writes to memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to know how to
> -use it.  The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
> +use it.
> +2.  KVM_MEM_READONLY: can be set, if KVM_CAP_READONLY_MEM capability allows it,
>  to make a new slot read-only.  In this case, writes to this memory will be
>  posted to userspace as KVM_EXIT_MMIO exits.
> +3.  KVM_MEM_ABSENT_MAPPING_FAULT: see KVM_CAP_MEMORY_FAULT_NOWAIT for details.
>  
>  When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
>  the memory region are automatically reflected into the guest.  For example, an
> @@ -7702,10 +7706,37 @@ Through args[0], the capability can be set on a per-exit-reason basis.
>  Currently, the only exit reasons supported are
>  
>  1. KVM_MEMFAULT_REASON_UNKNOWN (1 << 0)
> +2. KVM_MEMFAULT_REASON_ABSENT_MAPPING (1 << 1)
>  
>  Memory fault exits with a reason of UNKNOWN should not be depended upon: they
>  may be added, removed, or reclassified under a stable reason.
>  
> +7.35 KVM_CAP_MEMORY_FAULT_NOWAIT
> +--------------------------------
> +
> +:Architectures: x86, arm64
> +:Returns: -EINVAL.
> +
> +The presence of this capability indicates that userspace may pass the
> +KVM_MEM_ABSENT_MAPPING_FAULT flag to KVM_SET_USER_MEMORY_REGION to cause KVM_RUN
> +to exit to populate 'kvm_run.memory_fault' and exit to userspace (*) in response
> +to page faults for which the userspace page tables do not contain present
> +mappings. Attempting to enable the capability directly will fail.
> +
> +The 'gpa' and 'len' fields of kvm_run.memory_fault will be set to the starting
> +address and length (in bytes) of the faulting page. 'flags' will be set to
> +KVM_MEMFAULT_REASON_ABSENT_MAPPING.
> +
> +Userspace should determine how best to make the mapping present, then take
> +appropriate action. For instance, in the case of absent mappings this might
> +involve establishing the mapping for the first time via UFFDIO_COPY/CONTINUE or
> +faulting the mapping in using MADV_POPULATE_READ/WRITE. After establishing the
> +mapping, userspace can return to KVM to retry the previous memory access.
> +
> +(*) NOTE: On x86, KVM_CAP_X86_MEMORY_FAULT_EXIT must be enabled for the
> +KVM_MEMFAULT_REASON_ABSENT_MAPPING reason: otherwise userspace will only receive
> +a -EFAULT from KVM_RUN without any useful information.
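
As a side note, the userspace flow described above can be sketched roughly as
follows. This is a hedged illustration only: the struct here is a mock with
field names taken from the quoted documentation, and slot_gpa/slot_hva are
hypothetical parameters a real VMM would look up in its own memslot table.

```c
#include <assert.h>
#include <stdint.h>

/* Mocked subset of kvm_run.memory_fault; the real layout is defined by
 * the kvm_run ABI, not by this sketch. */
struct memory_fault_info {
	uint64_t flags;
	uint64_t gpa;
	uint64_t len;
};

#define MEMFAULT_REASON_ABSENT_MAPPING (1ULL << 1)

/*
 * Translate the faulting GPA range into the HVA range that userspace must
 * make present (e.g. via UFFDIO_COPY/CONTINUE or MADV_POPULATE_READ/WRITE)
 * before re-entering KVM_RUN.  slot_gpa/slot_hva describe the memslot
 * backing the fault.
 */
static int fault_to_hva_range(const struct memory_fault_info *mf,
			      uint64_t slot_gpa, uint64_t slot_hva,
			      uint64_t *hva, uint64_t *len)
{
	if (!(mf->flags & MEMFAULT_REASON_ABSENT_MAPPING))
		return -1;	/* some other exit reason; handle elsewhere */
	*hva = slot_hva + (mf->gpa - slot_gpa);
	*len = mf->len;
	return 0;
}
```

After populating the returned range, userspace simply re-enters KVM_RUN to
retry the access, per the documentation above.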

I'm not a fan of this architecture-specific dependency. Userspace is already
explicitly opting in to this behavior by way of the memslot flag. These sort
of exits are entirely orthogonal to the -EFAULT conversion earlier in the
series.

>  8. Other capabilities.
>  ======================
>  
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index d3ccfead73e42..c28330f25526f 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -593,6 +593,12 @@ static inline bool kvm_slot_dirty_track_enabled(const struct kvm_memory_slot *sl
>  	return slot->flags & KVM_MEM_LOG_DIRTY_PAGES;
>  }
>  
> +static inline bool kvm_slot_fault_on_absent_mapping(
> +	const struct kvm_memory_slot *slot)

Style again...

I'd strongly recommend using 'exit' instead of 'fault' in the verbiage of the
KVM implementation. I understand we're giving userspace the illusion of a page
fault mechanism, but the term is then overloaded in KVM since we handle
literal faults from hardware.

-- 
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [WIP Patch v2 12/14] KVM: arm64: Implement KVM_CAP_MEMORY_FAULT_NOWAIT
  2023-03-17 18:27   ` Oliver Upton
@ 2023-03-17 19:00     ` Anish Moorthy
  2023-03-17 19:03       ` Oliver Upton
  2023-03-17 19:24       ` Sean Christopherson
  0 siblings, 2 replies; 60+ messages in thread
From: Anish Moorthy @ 2023-03-17 19:00 UTC (permalink / raw)
  To: Oliver Upton; +Cc: seanjc, jthoughton, kvm

On Fri, Mar 17, 2023 at 11:27 AM Oliver Upton <oliver.upton@linux.dev> wrote:

> > +     pfn = __gfn_to_pfn_memslot(
> > +             memslot, gfn, exit_on_memory_fault, false, NULL,
> > +             write_fault, &writable, NULL);
>
> As stated before [*], this google3-esque style does not match the kernel style
> guide. You may want to check if your work machine is setting up a G3-specific
> editor configuration behind your back.
>
> [*] https://lore.kernel.org/kvm/Y+0QRsZ4yWyUdpnc@google.com/

If you're referring to the indentation, then that was definitely me.
I'll give the style guide another readthrough before I submit the next
version then, since checkpatch.pl doesn't seem to complain here.

> > +     if (exit_on_memory_fault && pfn == KVM_PFN_ERR_FAULT) {
>
> nit: I don't think the local is explicitly necessary. I still find this
> readable:

The local was for keeping a consistent value between the two blocks of code here

    pfn = __gfn_to_pfn_memslot(
        memslot, gfn, exit_on_memory_fault, false, NULL,
        write_fault, &writable, NULL);

    if (exit_on_memory_fault && pfn == KVM_PFN_ERR_FAULT) {
        // Set up vCPU exit and return 0
    }

I wanted to avoid the possibility of causing an early
__gfn_to_pfn_memslot exit but then not populating the vCPU exit.

* Re: [WIP Patch v2 12/14] KVM: arm64: Implement KVM_CAP_MEMORY_FAULT_NOWAIT
  2023-03-17 19:00     ` Anish Moorthy
@ 2023-03-17 19:03       ` Oliver Upton
  2023-03-17 19:24       ` Sean Christopherson
  1 sibling, 0 replies; 60+ messages in thread
From: Oliver Upton @ 2023-03-17 19:03 UTC (permalink / raw)
  To: Anish Moorthy; +Cc: seanjc, jthoughton, kvm

On Fri, Mar 17, 2023 at 12:00:30PM -0700, Anish Moorthy wrote:
> On Fri, Mar 17, 2023 at 11:27 AM Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> > > +     pfn = __gfn_to_pfn_memslot(
> > > +             memslot, gfn, exit_on_memory_fault, false, NULL,
> > > +             write_fault, &writable, NULL);
> >
> > As stated before [*], this google3-esque style does not match the kernel style
> > guide. You may want to check if your work machine is setting up a G3-specific
> > editor configuration behind your back.
> >
> > [*] https://lore.kernel.org/kvm/Y+0QRsZ4yWyUdpnc@google.com/
> 
> If you're referring to the indentation, then that was definitely me.
> I'll give the style guide another readthrough before I submit the next
> version then, since checkpatch.pl doesn't seem to complain here.
> 
> > > +     if (exit_on_memory_fault && pfn == KVM_PFN_ERR_FAULT) {
> >
> > nit: I don't think the local is explicitly necessary. I still find this
> > readable:
> 
> The local was for keeping a consistent value between the two blocks of code here
> 
>     pfn = __gfn_to_pfn_memslot(
>         memslot, gfn, exit_on_memory_fault, false, NULL,
>         write_fault, &writable, NULL);
> 
>     if (exit_on_memory_fault && pfn == KVM_PFN_ERR_FAULT) {
>         // Set up vCPU exit and return 0
>     }
> 
> I wanted to avoid the possibility of causing an early
> __gfn_to_pfn_memslot exit but then not populating the vCPU exit.

Ignore me, I didn't see the other use of the local.

-- 
Thanks,
Oliver

* Re: [WIP Patch v2 12/14] KVM: arm64: Implement KVM_CAP_MEMORY_FAULT_NOWAIT
  2023-03-17 19:00     ` Anish Moorthy
  2023-03-17 19:03       ` Oliver Upton
@ 2023-03-17 19:24       ` Sean Christopherson
  1 sibling, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2023-03-17 19:24 UTC (permalink / raw)
  To: Anish Moorthy; +Cc: Oliver Upton, jthoughton, kvm

On Fri, Mar 17, 2023, Anish Moorthy wrote:
> On Fri, Mar 17, 2023 at 11:27 AM Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> > > +     pfn = __gfn_to_pfn_memslot(
> > > +             memslot, gfn, exit_on_memory_fault, false, NULL,
> > > +             write_fault, &writable, NULL);
> >
> > As stated before [*], this google3-esque style does not match the kernel style
> > guide. You may want to check if your work machine is setting up a G3-specific
> > editor configuration behind your back.
> >
> > [*] https://lore.kernel.org/kvm/Y+0QRsZ4yWyUdpnc@google.com/
> 
> If you're referring to the indentation, then that was definitely me.

The two issues are (1) don't put newlines immediately after an opening '(', and
(2) align indentation relative to the direct parent '(' that encapsulates the code.

Concretely, the above should be:

	pfn = __gfn_to_pfn_memslot(memslot, gfn, exit_on_memory_fault, false,
				   NULL, write_fault, &writable, NULL);

> I'll give the style guide another readthrough before I submit the next
> version then, since checkpatch.pl doesn't seem to complain here.

I don't think checkpatch looks for these particular style issues.  FWIW, you
really shouldn't need to read through the formal documentation for these "basic"
rules, just spend time poking around the code base.  If your code looks different
than everything else, then you're likely doing it wrong.

* Re: [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field
  2023-03-17 18:33     ` Anish Moorthy
@ 2023-03-17 19:30       ` Oliver Upton
  2023-03-17 21:50       ` Sean Christopherson
  1 sibling, 0 replies; 60+ messages in thread
From: Oliver Upton @ 2023-03-17 19:30 UTC (permalink / raw)
  To: Anish Moorthy; +Cc: Isaku Yamahata, Marc Zyngier, seanjc, jthoughton, kvm

On Fri, Mar 17, 2023 at 11:33:38AM -0700, Anish Moorthy wrote:
> On Thu, Mar 16, 2023 at 5:02 PM Isaku Yamahata <isaku.yamahata@gmail.com> wrote:
> 
> > > +7.34 KVM_CAP_X86_MEMORY_FAULT_EXIT
> > > +----------------------------------
> > > +
> > > +:Architectures: x86
> >
> > Why x86 specific?
> 
> Sean was the only one to bring this functionality up and originally
> did so in the context of some x86-specific functions, so I assumed
> that x86 was the only ask and that maybe the other architectures had
> alternative solutions. Admittedly I also wanted to avoid wading
> through another big set of -EFAULT references :/

There isn't much :) Sanity checks in mmu.c and some currently unhandled
failures to write guest memory in pvtime.c

> Those are the only reasons though. Marc, Oliver, should I bring this
> capability to Arm as well?

The x86 implementation shouldn't preclude UAPI reuse, but I'm not strongly
motivated in either direction on this. A clear use case where the exit
information is actionable rather than just informational would make the change
more desirable.

-- 
Thanks,
Oliver

* Re: [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit
  2023-03-17 18:59         ` David Matlack
@ 2023-03-17 19:53           ` Anish Moorthy
  2023-03-17 22:03             ` Sean Christopherson
  0 siblings, 1 reply; 60+ messages in thread
From: Anish Moorthy @ 2023-03-17 19:53 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Oliver Upton, jthoughton, kvm, maz, Isaku Yamahata

On Fri, Mar 17, 2023 at 12:00 PM David Matlack <dmatlack@google.com> wrote:
>
> On Fri, Mar 17, 2023 at 11:54 AM Oliver Upton <oliver.upton@linux.dev> wrote:
> >
> > David,
> >
> > On Fri, Mar 17, 2023 at 11:46:58AM -0700, David Matlack wrote:
> > > On Fri, Mar 17, 2023 at 11:13 AM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > On Fri, Mar 17, 2023, Oliver Upton wrote:
> > > > > On Wed, Mar 15, 2023 at 02:17:24AM +0000, Anish Moorthy wrote:
> > > > > > Hi Sean, here's what I'm planning to send up as v2 of the scalable
> > > > > > userfaultfd series.
> > > > >
> > > > > I don't see a ton of value in sending a targeted posting of a series to the
> > > > > list.
> > >
> > > But isn't it already generating value as you were able to weigh in and
> > > provide feedback on technical aspects that you would not have been
> > > otherwise able to if Anish had just messaged Sean?
> >
> > No, I only happened upon this series looking at lore. My problem is that
> > none of the affected maintainers or reviewers were cc'ed on the series.
> >
> > > > > IOW, just CC all of the appropriate reviewers+maintainers. I promise,
> > > > > we won't bite.
> > >
> > > I disagree. While I think it's fine to reach out to someone off-list
> > > to discuss a specific question, if you're going to message all
> > > reviewers and maintainers, you should also CC the mailing list. That
> > > allows more people to follow along and weigh in if necessary.
> >
> > I think there may be a slight disconnect here :) I'm in no way encouraging
> > off-list discussion and instead asking that mail on the list arrives in
> > the right folks' inboxes.
> >
> > Posting an RFC on the list was absolutely the right thing to do.
>
> Doh. I misunderstood what you meant. We are in violent agreement!

Noted. Also, thanks Oliver and Isaku for paying attention to the
series despite it being obscure.

On Fri, Mar 17, 2023 at 11:13 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Mar 17, 2023, Oliver Upton wrote:
> > > Still unsure if needs conversion
> > > --------------------------------
> > > * __kvm_read_guest_atomic
> > >   The EFAULT might be propagated though FNAME(sync_page)?
> > > * kvm_write_guest_offset_cached (virt/kvm/kvm_main.c:3226)
> > > * __kvm_write_guest_page
> > >   Called from kvm_write_guest_offset_cached: if that needs change, this does too
> >
> > The low-level accessors are common across architectures and can be called from
> > other contexts besides a vCPU. Is it possible for the caller to catch -EFAULT
> > and convert that into an exit?
>
> Ya, as things stand today, the conversions _must_ be performed at the caller, as
> there are (sadly) far too many flows where KVM squashes the error.  E.g. almost
> all of x86's paravirt code just suppresses user memory faults :-(
>
> Anish, when we discussed this off-list, what I meant by limiting the intial support
> to existing -EFAULT cases was limiting support to existing cases where KVM directly
> returns -EFAULT to userspace, not to all existing cases where -EFAULT is ever
> returned _within KVM_ while handling KVM_RUN.  My apologies if I didn't make that clear.

Don't worry, we eventually got there off-list :)

This brings us back to my original set of questions. As has already
been pointed out, I'll have to revisit my "Confident that needs
conversion" changes and tweak them so that the vCPU exit is populated
only for the call sites where the -EFAULT makes it to userspace. I
still want feedback on if I've mis-identified any of the functions in
my "EFAULT does not propagate to userspace" list and whether there are
functions/callers in the "Still unsure if needs conversion" which do
have return paths to KVM_RUN.

* Re: [WIP Patch v2 09/14] KVM: Introduce KVM_CAP_MEMORY_FAULT_NOWAIT without implementation
  2023-03-17 18:59   ` Oliver Upton
@ 2023-03-17 20:15     ` Anish Moorthy
  2023-03-17 20:54       ` Sean Christopherson
  2023-03-17 20:17     ` Sean Christopherson
  1 sibling, 1 reply; 60+ messages in thread
From: Anish Moorthy @ 2023-03-17 20:15 UTC (permalink / raw)
  To: Oliver Upton, Sean Christopherson; +Cc: jthoughton, kvm

On Fri, Mar 17, 2023 at 11:59 AM Oliver Upton <oliver.upton@linux.dev> wrote:

> > +  #define KVM_MEM_ABSENT_MAPPING_FAULT (1UL << 2)
>
> call it KVM_MEM_EXIT_ABSENT_MAPPING
> ...
> I'm not a fan of this architecture-specific dependency. Userspace is already
> explicitly opting in to this behavior by way of the memslot flag. These sort
> of exits are entirely orthogonal to the -EFAULT conversion earlier in the
> series.

I'm not a fan of the semantics varying between architectures either:
but the reason I have it like that (and that the EFAULT conversions
exist in this series in the first place) is (a) not having
KVM_CAP_MEMORY_FAULT_EXIT implemented for arm and (b) Sean's following
statement from https://lore.kernel.org/kvm/Y%2FfS0eab7GG0NVKS@google.com/

On Thu, Feb 23, 2023 at 12:55 PM Sean Christopherson <seanjc@google.com> wrote:
>
> The new memslot flag should depend on KVM_CAP_MEMORY_FAULT_EXIT, but
> KVM_CAP_MEMORY_FAULT_EXIT should be a standalone thing, i.e. should convert "all"
> guest-memory -EFAULTS to KVM_CAP_MEMORY_FAULT_EXIT.  All in quotes because I would
> likely be ok with a partial conversion for the initial implementation if there
> are paths that would require an absurd amount of work to convert.

The best way that I thought of how to do that was to have one cap
(KVM_CAP_MEMORY_FAULT_NOWAIT) to make KVM -EFAULT without calling slow
GUP, and KVM_CAP_MEMORY_FAULT_EXIT to transform efaults to useful vm
exits. But if you think the two are really orthogonal, then we need to
resolve the apparent disagreement.

* Re: [WIP Patch v2 09/14] KVM: Introduce KVM_CAP_MEMORY_FAULT_NOWAIT without implementation
  2023-03-17 18:59   ` Oliver Upton
  2023-03-17 20:15     ` Anish Moorthy
@ 2023-03-17 20:17     ` Sean Christopherson
  2023-03-20 22:22       ` Oliver Upton
  1 sibling, 1 reply; 60+ messages in thread
From: Sean Christopherson @ 2023-03-17 20:17 UTC (permalink / raw)
  To: Oliver Upton; +Cc: Anish Moorthy, jthoughton, kvm

On Fri, Mar 17, 2023, Oliver Upton wrote:
> On Wed, Mar 15, 2023 at 02:17:33AM +0000, Anish Moorthy wrote:
> > Add documentation, memslot flags, useful helper functions, and the
> > actual new capability itself.
> > 
> > Memory fault exits on absent mappings are particularly useful for
> > userfaultfd-based live migration postcopy. When many vCPUs fault upon a
> > single userfaultfd the faults can take a while to surface to userspace
> > due to having to contend for uffd wait queue locks. Bypassing the uffd
> > entirely by triggering a vCPU exit avoids this contention and can improve
> > the fault rate by as much as 10x.
> > ---
> >  Documentation/virt/kvm/api.rst | 37 +++++++++++++++++++++++++++++++---
> >  include/linux/kvm_host.h       |  6 ++++++
> >  include/uapi/linux/kvm.h       |  3 +++
> >  tools/include/uapi/linux/kvm.h |  2 ++
> >  virt/kvm/kvm_main.c            |  7 ++++++-
> >  5 files changed, 51 insertions(+), 4 deletions(-)
> > 
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index f9ca18bbec879..4932c0f62eb3d 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -1312,6 +1312,7 @@ yet and must be cleared on entry.
> >    /* for kvm_userspace_memory_region::flags */
> >    #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
> >    #define KVM_MEM_READONLY	(1UL << 1)
> > +  #define KVM_MEM_ABSENT_MAPPING_FAULT (1UL << 2)
> 
> call it KVM_MEM_EXIT_ABSENT_MAPPING

Ooh, look, a bikeshed!  :-)

I don't think it should have "EXIT" in the name.  The exit to userspace is a side
effect, e.g. KVM already exits to userspace on unresolved userfaults.  The only
thing this knob _directly_ controls is whether or not KVM attempts the slow path.
If we give the flag a name like "exit on absent userspace mappings", then KVM will
appear to do the wrong thing when KVM exits on a truly absent userspace mapping.

And as I argued in the last version[*], I am _strongly_ opposed to KVM speculating
on why KVM is exiting to userspace.  I.e. KVM should not set a special flag if
the memslot has "fast only" behavior.  The only thing the flag should do is control
whether or not KVM tries slow paths, what KVM does in response to an unresolved
fault should be an orthogonal thing.

E.g. If KVM encounters an unmapped page while prefetching SPTEs, KVM will (correctly)
not exit to userspace and instead simply terminate the prefetch.  Obviously we
could solve that through documentation, but I don't see any benefit in making this
more complex than it needs to be.

[*] https://lkml.kernel.org/r/Y%2B0RYMfw6pHrSLX4%40google.com

> > +7.35 KVM_CAP_MEMORY_FAULT_NOWAIT
> > +--------------------------------
> > +
> > +:Architectures: x86, arm64
> > +:Returns: -EINVAL.
> > +
> > +The presence of this capability indicates that userspace may pass the
> > +KVM_MEM_ABSENT_MAPPING_FAULT flag to KVM_SET_USER_MEMORY_REGION to cause KVM_RUN
> > +to populate 'kvm_run.memory_fault' and exit to userspace (*) in response
> > +to page faults for which the userspace page tables do not contain present
> > +mappings. Attempting to enable the capability directly will fail.
> > +
> > +The 'gpa' and 'len' fields of kvm_run.memory_fault will be set to the starting
> > +address and length (in bytes) of the faulting page. 'flags' will be set to
> > +KVM_MEMFAULT_REASON_ABSENT_MAPPING.
> > +
> > +Userspace should determine how best to make the mapping present, then take
> > +appropriate action. For instance, in the case of absent mappings this might
> > +involve establishing the mapping for the first time via UFFDIO_COPY/CONTINUE or
> > +faulting the mapping in using MADV_POPULATE_READ/WRITE. After establishing the
> > +mapping, userspace can return to KVM to retry the previous memory access.
> > +
> > +(*) NOTE: On x86, KVM_CAP_X86_MEMORY_FAULT_EXIT must be enabled for the
> > +KVM_MEMFAULT_REASON_ABSENT_MAPPING reason: otherwise userspace will only receive
> > +a -EFAULT from KVM_RUN without any useful information.
> 
> I'm not a fan of this architecture-specific dependency. Userspace is already
> explicitly opting in to this behavior by way of the memslot flag. These sort
> of exits are entirely orthogonal to the -EFAULT conversion earlier in the
> series.

Ya, yet another reason not to speculate on why KVM wasn't able to resolve a fault.

* Re: [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit
  2023-03-15  2:17 [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit Anish Moorthy
                   ` (14 preceding siblings ...)
  2023-03-17 17:43 ` [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit Oliver Upton
@ 2023-03-17 20:35 ` Sean Christopherson
  15 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2023-03-17 20:35 UTC (permalink / raw)
  To: Anish Moorthy; +Cc: jthoughton, kvm

On Wed, Mar 15, 2023, Anish Moorthy wrote:
> Still unsure if needs conversion
> --------------------------------
> * __kvm_read_guest_atomic
>   The EFAULT might be propagated though FNAME(sync_page)?
> * kvm_write_guest_offset_cached (virt/kvm/kvm_main.c:3226)
> * __kvm_write_guest_page
>   Called from kvm_write_guest_offset_cached: if that needs change, this does too
> * kvm_write_guest_page
>   Two interesting paths:
>       - kvm_pv_clock_pairing returns a custom KVM_EFAULT error here
>         (arch/x86/kvm/x86.c:9578)
>       - kvm_write_guest_offset_cached returns this directly (so if that needs
>         change, this does too)
> * kvm_read_guest_offset_cached
>   I actually do see a path to userspace, but it's through hyper-v, which we've
>   said is out of scope for round 1.

To clarify: I didn't intend to make Hyper-V explicitly out-of-scope, rather Hyper-V
happened to be out-of-scope because the existing code suppresses -EFAULT.  I don't
think we should make any particular feature/area out-of-scope, as that will lead
to even more arbitrary behavior than we already have.

What I intended, and what I still think we should do, is limit the scope of the
capability to existing paths that return -EFAULT to userspace.  Trying to fix all
of the paths that suppress -EFAULT is going to be ridiculously difficult as so
much of the behavior is arguably ABI, and there's no authoritative documentation
on what's supposed to happen.  I definitely would love to fix those paths in the
long term, but for the initial implementation/conversion, I think it makes sense
to punt on them, otherwise it'll take months/years to merge this code.

Back to the Hyper-V case, assuming you're referring to the use of kvm_hv_verify_vp_assist()
in nested_svm_vmrun(), that code is a mess.  KVM shouldn't inject a #GP and then
exit to userspace, e.g. the guest might see a spurious #GP if userspace fixes the
fault and resume the instruction.  And just a few lines below, KVM skips the
instruction if kvm_vcpu_map() returns -EFAULT.

As above, ideally that code would be converted to gracefully report the error,
but it's such a snafu that the easiest thing might be to change the "return ret;"
to "return 1;" until we fix all such KVM-on-HyperV code.

* Re: [WIP Patch v2 09/14] KVM: Introduce KVM_CAP_MEMORY_FAULT_NOWAIT without implementation
  2023-03-17 20:15     ` Anish Moorthy
@ 2023-03-17 20:54       ` Sean Christopherson
  2023-03-17 23:42         ` Anish Moorthy
  0 siblings, 1 reply; 60+ messages in thread
From: Sean Christopherson @ 2023-03-17 20:54 UTC (permalink / raw)
  To: Anish Moorthy; +Cc: Oliver Upton, jthoughton, kvm

On Fri, Mar 17, 2023, Anish Moorthy wrote:
> On Fri, Mar 17, 2023 at 11:59 AM Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> > > +  #define KVM_MEM_ABSENT_MAPPING_FAULT (1UL << 2)
> >
> > call it KVM_MEM_EXIT_ABSENT_MAPPING
> > ...
> > I'm not a fan of this architecture-specific dependency. Userspace is already
> > explicitly opting in to this behavior by way of the memslot flag. These sort
> > of exits are entirely orthogonal to the -EFAULT conversion earlier in the
> > series.
> 
> I'm not a fan of the semantics varying between architectures either:
> but the reason I have it like that (and that the EFAULT conversions
> exist in this series in the first place) is (a) not having
> KVM_CAP_MEMORY_FAULT_EXIT implemented for arm and (b) Sean's following
> statement from https://lore.kernel.org/kvm/Y%2FfS0eab7GG0NVKS@google.com/

Strictly speaking, if y'all buy my argument that the flag shouldn't control the
gup behavior, there won't be semantic differences for the memslot flag.  KVM will
(obviously) behave differently if KVM_CAP_MEMORY_FAULT_EXIT is not set, but that
will hold true for x86 as well.  The only difference is that x86 will also support
an orthogonal flag that makes the fast-only memslot flag useful in practice.

So yeah, there will be an arch dependency, but only because arch code needs to
actually perform the exit, and that's true no matter what.

That said, there's zero reason to put X86 in the name.  Just add the capability
as KVM_CAP_MEMORY_FAULT_EXIT or whatever and mark it as x86 in the documentation.

* Re: [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field
  2023-03-17 18:33     ` Anish Moorthy
  2023-03-17 19:30       ` Oliver Upton
@ 2023-03-17 21:50       ` Sean Christopherson
  2023-03-17 22:44         ` Anish Moorthy
  1 sibling, 1 reply; 60+ messages in thread
From: Sean Christopherson @ 2023-03-17 21:50 UTC (permalink / raw)
  To: Anish Moorthy; +Cc: Isaku Yamahata, Marc Zyngier, Oliver Upton, jthoughton, kvm

On Fri, Mar 17, 2023, Anish Moorthy wrote:
> On Thu, Mar 16, 2023 at 5:02 PM Isaku Yamahata <isaku.yamahata@gmail.com> wrote:
> > > +inline int kvm_memfault_exit_or_efault(
> > > +     struct kvm_vcpu *vcpu, uint64_t gpa, uint64_t len, uint64_t exit_flags)
> > > +{
> > > +     if (!(vcpu->kvm->memfault_exit_reasons & exit_flags))
> > > +             return -EFAULT;
> > > +     vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
> > > +     vcpu->run->memory_fault.gpa = gpa;
> > > +     vcpu->run->memory_fault.len = len;
> > > +     vcpu->run->memory_fault.flags = exit_flags;
> > > +     return -1;
> >
> > Why -1? 0? Anyway, enum exit_fastpath_completion is an x86 KVM MMU-internal
> > convention. As WIP, it's okay for now, though.
> 
> The -1 isn't to indicate a failure in this function itself, but to
> allow callers to substitute this for "return -EFAULT." A return code
> of zero would mask errors and cause KVM to proceed in ways that it
> shouldn't. For instance, "setup_vmgexit_scratch" uses it like this
> 
> if (kvm_read_guest(svm->vcpu.kvm, scratch_gpa_beg, scratch_va, len)) {
>     ...
> -  return -EFAULT;
> + return kvm_memfault_exit_or_efault(...);
> }
> 
> and looking at one of its callers (sev_handle_vmgexit) shows how a
> return code of zero would cause a different control flow
> 
> case SVM_VMGEXIT_MMIO_READ:
> ret = setup_vmgexit_scratch(svm, true, control->exit_info_2);
> if (ret)
>     break;
> 
> ret = kvm_sev_es_mmio_read(vcpu,

Hmm, I generally agree with Isaku, the helper should really return 0.  Returning
-1 might work, but it'll likely confuse userspace, and will definitely confuse
KVM developers.

The "0 means exit to userspace" behavior is definitely a pain though, and is likely
going to make this all extremely fragile.

I wonder if we can get away with returning -EFAULT, but still filling vcpu->run
with KVM_EXIT_MEMORY_FAULT and all the other metadata.  That would likely simplify
the implementation greatly, and would let KVM fill vcpu->run unconditionally.  KVM
would still need a capability to advertise support to userspace, but userspace
wouldn't need to opt in.  I think this may have been my very original thought, and
I just never actually wrote it down...
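
A rough sketch of that idea, with the relevant kvm_run fields mocked out (the
KVM_EXIT_MEMORY_FAULT number and field names below are placeholders for
illustration, not the real ABI):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define KVM_EXIT_MEMORY_FAULT 38	/* placeholder exit number */

/* Mocked subset of struct kvm_run. */
struct mock_run {
	uint32_t exit_reason;
	struct {
		uint64_t flags;
		uint64_t gpa;
		uint64_t len;
	} memory_fault;
};

/*
 * Unconditionally fill the exit info but keep returning -EFAULT, so no
 * existing caller's error handling changes: legacy userspace still sees a
 * bare -EFAULT from KVM_RUN, while aware userspace can read memory_fault.
 */
static int memfault_exit(struct mock_run *run, uint64_t gpa, uint64_t len,
			 uint64_t flags)
{
	run->exit_reason = KVM_EXIT_MEMORY_FAULT;
	run->memory_fault.gpa = gpa;
	run->memory_fault.len = len;
	run->memory_fault.flags = flags;
	return -EFAULT;
}
```

Callers could then do "return memfault_exit(...);" anywhere they currently do
"return -EFAULT;" without changing any intermediate error-propagation logic.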

* Re: [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit
  2023-03-17 19:53           ` Anish Moorthy
@ 2023-03-17 22:03             ` Sean Christopherson
  2023-03-20 15:56               ` Sean Christopherson
  0 siblings, 1 reply; 60+ messages in thread
From: Sean Christopherson @ 2023-03-17 22:03 UTC (permalink / raw)
  To: Anish Moorthy; +Cc: Oliver Upton, jthoughton, kvm, maz, Isaku Yamahata

On Fri, Mar 17, 2023, Anish Moorthy wrote:
> On Fri, Mar 17, 2023 at 12:00 PM David Matlack <dmatlack@google.com> wrote:
> > > The low-level accessors are common across architectures and can be called from
> > > other contexts besides a vCPU. Is it possible for the caller to catch -EFAULT
> > > and convert that into an exit?
> >
> > Ya, as things stand today, the conversions _must_ be performed at the caller, as
> > there are (sadly) far too many flows where KVM squashes the error.  E.g. almost
> > all of x86's paravirt code just suppresses user memory faults :-(
> >
> > Anish, when we discussed this off-list, what I meant by limiting the intial support
> > to existing -EFAULT cases was limiting support to existing cases where KVM directly
> > returns -EFAULT to userspace, not to all existing cases where -EFAULT is ever
> > returned _within KVM_ while handling KVM_RUN.  My apologies if I didn't make that clear.
> 
> Don't worry, we eventually got there off-list :)
> 
> This brings us back to my original set of questions. As has already
> been pointed out, I'll have to revisit my "Confident that needs
> conversion" changes and tweak them so that the vCPU exit is populated
> only for the call sites where the -EFAULT makes it to userspace. I
> still want feedback on if I've mis-identified any of the functions in
> my "EFAULT does not propagate to userspace" list and whether there are
> functions/callers in the "Still unsure if needs conversion" which do
> have return paths to KVM_RUN.

As you've probably gathered from the type of feedback you're receiving, identifying
the conversion touchpoints isn't going to be the long pole of this series.  Correctly
identifying all of the touchpoints may not be easy, but fixing any cases we get wrong
will likely be straightforward.  And realistically, no matter how many eyeballs look
at the code, odds are good we'll miss at least one case.  In other words, don't worry
too much about getting all the touchpoints correct on the first version.  Getting the
uAPI right is much more important.

And rather than rely on code review to get things right, we should be able to
detect issues programmatically.  E.g. use fault injection to make gup() and/or
uaccess fail (might even be wired up already?), and hack in a WARN in the KVM_RUN
path to assert that KVM_EXIT_MEMORY_FAULT is filled if the return code is -EFAULT
(assuming we don't try to get KVM to return 0 everywhere), e.g. something like
the below would at least flag the "misses", although debug could still prove to be
annoying.

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 67b890e54cf1..cccae0ad1436 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4100,6 +4100,8 @@ static long kvm_vcpu_ioctl(struct file *filp,
                }
                r = kvm_arch_vcpu_ioctl_run(vcpu);
                trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
+               WARN_ON(r == -EFAULT &&
+                       vcpu->run->exit_reason != KVM_EXIT_MEMORY_FAULT);
                break;
        }
        case KVM_GET_REGS: {


* Re: [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field
  2023-03-17 21:50       ` Sean Christopherson
@ 2023-03-17 22:44         ` Anish Moorthy
  2023-03-20 15:53           ` Sean Christopherson
  0 siblings, 1 reply; 60+ messages in thread
From: Anish Moorthy @ 2023-03-17 22:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Isaku Yamahata, Marc Zyngier, Oliver Upton, jthoughton, kvm

On Fri, Mar 17, 2023 at 2:50 PM Sean Christopherson <seanjc@google.com> wrote:
> I wonder if we can get away with returning -EFAULT, but still filling vcpu->run
> with KVM_EXIT_MEMORY_FAULT and all the other metadata.  That would likely simplify
> the implementation greatly, and would let KVM fill vcpu->run unconditonally.  KVM
> would still need a capability to advertise support to userspace, but userspace
> wouldn't need to opt in.  I think this may have been my very original though, and
> I just never actually wrote it down...

Oh, good to know that's actually an option. I thought of that too, but
assumed that returning a negative error code was a no-go for a proper
vCPU exit. But if that's not true then I think it's the obvious
solution because it precludes any uncaught behavior-change bugs.

A couple of notes
1. Since we'll likely miss some -EFAULT returns, we'll need to make
sure that the user can check for / doesn't see a stale
kvm_run::memory_fault field when a missed -EFAULT makes it to
userspace. It's a small and easy-to-fix detail, but I thought I'd
point it out.
2. I don't think this would simplify the series that much, since we
still need to find the call sites returning -EFAULT to userspace and
populate memory_fault only in those spots to avoid populating it for
-EFAULTs which don't make it to userspace. We *could* relax that
condition and just document that memory_fault should be ignored when
KVM_RUN does not return -EFAULT... but I don't think that's a good
solution from a coder/maintainer perspective.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [WIP Patch v2 09/14] KVM: Introduce KVM_CAP_MEMORY_FAULT_NOWAIT without implementation
  2023-03-17 20:54       ` Sean Christopherson
@ 2023-03-17 23:42         ` Anish Moorthy
  2023-03-20 15:13           ` Sean Christopherson
  0 siblings, 1 reply; 60+ messages in thread
From: Anish Moorthy @ 2023-03-17 23:42 UTC (permalink / raw)
  To: Sean Christopherson, Oliver Upton; +Cc: jthoughton, kvm

On Fri, Mar 17, 2023 at 1:17 PM Sean Christopherson <seanjc@google.com> wrote:
>
> And as I argued in the last version[*], I am _strongly_ opposed to KVM speculating
> on why KVM is exiting to userspace.  I.e. KVM should not set a special flag if
> the memslot has "fast only" behavior.  The only thing the flag should do is control
> whether or not KVM tries slow paths, what KVM does in response to an unresolved
> fault should be an orthogonal thing.

I'm guessing you would want changes to patch 10 of this series [1]
then, right? Setting a bit/exit reason in kvm_run::memory_fault.flags
depending on whether the failure originated from a "fast only" fault
is... exactly what I'm doing :/ I'm not totally clear on your usages
of the word "flag" above though, the "KVM should not set a special
flag... the only thing *the* flag should do" part is throwing me off a
bit. What I think you're saying is

"KVM should not set a special bit in kvm_run::memory_fault.flags if
the memslot has fast-only behavior. The only thing
KVM_MEM_ABSENT_MAPPING_FAULT should do is..."

[1] https://lore.kernel.org/all/20230315021738.1151386-11-amoorthy@google.com/

On Fri, Mar 17, 2023 at 1:54 PM Sean Christopherson <seanjc@google.com> wrote:
>
> Strictly speaking, if y'all buy my argument that the flag shouldn't control the
> gup behavior, there won't be semantic differences for the memslot flag.  KVM will
> (obviously) behave differently if KVM_CAP_MEMORY_FAULT_EXIT is not set, but that
> will hold true for x86 as well.  The only difference is that x86 will also support
> an orthogonal flag that makes the fast-only memslot flag useful in practice.
>
> So yeah, there will be an arch dependency, but only because arch code needs to
> actually perform the exit, and that's true no matter what.
>
> That said, there's zero reason to put X86 in the name.  Just add the capability
> as KVM_CAP_MEMORY_FAULT_EXIT or whatever and mark it as x86 in the documentation.

Again, a little confused on your first "flag" usage here. I figure you
can't mean the memslot flag because the whole point of that is to
control the GUP behavior, but I'm not sure what else you'd be
referring to.

Anyways the idea of having orthogonal features, one to -EFAULTing
early before a slow path and another to transform/augment -EFAULTs
into/with useful information does make sense to me. But I think the
issue here is that we want the fast-only memslot flag to be useful on
Arm as well, and with KVM_CAP_MEMORY_FAULT_NOWAIT written as it is now
there is a semantic difference between x86 and Arm.

I don't see a way to keep the two features here orthogonal on x86 and
linked on arm without keeping that semantic difference. Perhaps the
solution here is a bare-bones implementation of
KVM_CAP_MEMORY_FAULT_EXIT for Arm? All that actually *needs* to be
covered to resolve this difference is the one call site in
user_mem_abort, since KVM_CAP_MEMORY_FAULT_EXIT will be allowed to
have holes anyways.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [WIP Patch v2 09/14] KVM: Introduce KVM_CAP_MEMORY_FAULT_NOWAIT without implementation
  2023-03-17 23:42         ` Anish Moorthy
@ 2023-03-20 15:13           ` Sean Christopherson
  2023-03-20 19:53             ` Anish Moorthy
  0 siblings, 1 reply; 60+ messages in thread
From: Sean Christopherson @ 2023-03-20 15:13 UTC (permalink / raw)
  To: Anish Moorthy; +Cc: Oliver Upton, jthoughton, kvm

On Fri, Mar 17, 2023, Anish Moorthy wrote:
> On Fri, Mar 17, 2023 at 1:17 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > And as I argued in the last version[*], I am _strongly_ opposed to KVM speculating
> > on why KVM is exiting to userspace.  I.e. KVM should not set a special flag if
> > the memslot has "fast only" behavior.  The only thing the flag should do is control
> > whether or not KVM tries slow paths, what KVM does in response to an unresolved
> > fault should be an orthogonal thing.
> 
> I'm guessing you would want changes to patch 10 of this series [1]
> then, right? Setting a bit/exit reason in kvm_run::memory_fault.flags
> depending on whether the failure originated from a "fast only" fault
> is... exactly what I'm doing :/ I'm not totally clear on your usages
> of the word "flag" above though, the "KVM should not set a special
> flag... the only thing *the* flag should do" part is throwing me off a
> bit. What I think you're saying is

Heh, the second "the flag" is referring to the memslot flag.  Rewriting the above:

  KVM should not set a special flag in kvm_run::memory_fault.flags ... the
  only thing KVM_MEM_FAST_FAULT_ONLY should do is ..."

> "KVM should not set a special bit in kvm_run::memory_fault.flags if
> the memslot has fast-only behavior. The only thing
> KVM_MEM_ABSENT_MAPPING_FAULT should do is..."
> 
> [1] https://lore.kernel.org/all/20230315021738.1151386-11-amoorthy@google.com/
> 
> On Fri, Mar 17, 2023 at 1:54 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > Strictly speaking, if y'all buy my argument that the flag shouldn't control the
> > gup behavior, there won't be semantic differences for the memslot flag.  KVM will
> > (obviously) behave differently if KVM_CAP_MEMORY_FAULT_EXIT is not set, but that
> > will hold true for x86 as well.  The only difference is that x86 will also support
> > an orthogonal flag that makes the fast-only memslot flag useful in practice.
> >
> > So yeah, there will be an arch dependency, but only because arch code needs to
> > actually perform the exit, and that's true no matter what.
> >
> > That said, there's zero reason to put X86 in the name.  Just add the capability
> > as KVM_CAP_MEMORY_FAULT_EXIT or whatever and mark it as x86 in the documentation.
> 
> Again, a little confused on your first "flag" usage here. I figure you
> can't mean the memslot flag because the whole point of that is to
> control the GUP behavior, but I'm not sure what else you'd be
> referring to.
> 
> Anyways the idea of having orthogonal features, one to -EFAULTing
> early before a slow path and another to transform/augment -EFAULTs
> into/with useful information does make sense to me. But I think the
> issue here is that we want the fast-only memslot flag to be useful on
> Arm as well, and with KVM_CAP_MEMORY_FAULT_NOWAIT written as it is now
> there is a semantic difference between x86 and Arm.

If and only if userspace enables the capability that transforms -EFAULT.

> I don't see a way to keep the two features here orthogonal on x86 and
> linked on arm without keeping that semantic difference. Perhaps the
> solution here is a bare-bones implementation of
> KVM_CAP_MEMORY_FAULT_EXIT for Arm? All that actually *needs* to be
> covered to resolve this difference is the one call site in
> user_mem_abort, since KVM_CAP_MEMORY_FAULT_EXIT will be allowed to
> have holes anyways.

As above, so long as userspace must opt into transforming -EFAULT, and can do
so independent of KVM_MEM_FAST_FAULT_ONLY (or whatever we call it), the behavior
of KVM_MEM_FAST_FAULT_ONLY itself is semantically identical across all
architectures.

KVM_MEM_FAST_FAULT_ONLY is obviously not very useful without precise information
about the failing address, but IMO that's not reason enough to tie the two
together.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field
  2023-03-17 22:44         ` Anish Moorthy
@ 2023-03-20 15:53           ` Sean Christopherson
  2023-03-20 18:19             ` Anish Moorthy
  2023-03-20 22:11             ` Anish Moorthy
  0 siblings, 2 replies; 60+ messages in thread
From: Sean Christopherson @ 2023-03-20 15:53 UTC (permalink / raw)
  To: Anish Moorthy; +Cc: Isaku Yamahata, Marc Zyngier, Oliver Upton, jthoughton, kvm

On Fri, Mar 17, 2023, Anish Moorthy wrote:
> On Fri, Mar 17, 2023 at 2:50 PM Sean Christopherson <seanjc@google.com> wrote:
> > I wonder if we can get away with returning -EFAULT, but still filling vcpu->run
> > with KVM_EXIT_MEMORY_FAULT and all the other metadata.  That would likely simplify
> > the implementation greatly, and would let KVM fill vcpu->run unconditionally.  KVM
> > would still need a capability to advertise support to userspace, but userspace
> > wouldn't need to opt in.  I think this may have been my very original thought, and
> > I just never actually wrote it down...
> 
> Oh, good to know that's actually an option. I thought of that too, but
> assumed that returning a negative error code was a no-go for a proper
> vCPU exit. But if that's not true then I think it's the obvious
> solution because it precludes any uncaught behavior-change bugs.
> 
> A couple of notes
> 1. Since we'll likely miss some -EFAULT returns, we'll need to make
> sure that the user can check for / doesn't see a stale
> kvm_run::memory_fault field when a missed -EFAULT makes it to
> userspace. It's a small and easy-to-fix detail, but I thought I'd
> point it out.

Ya, this is the main concern for me as well.  I'm not as confident that it's
easy-to-fix/avoid though.

> 2. I don't think this would simplify the series that much, since we
> still need to find the call sites returning -EFAULT to userspace and
> populate memory_fault only in those spots to avoid populating it for
> -EFAULTs which don't make it to userspace.

Filling kvm_run::memory_fault even if KVM never exits to userspace is perfectly
ok.  It's not ideal, but it's ok.

> We *could* relax that condition and just document that memory_fault should be
> ignored when KVM_RUN does not return -EFAULT... but I don't think that's a
> good solution from a coder/maintainer perspective.

You've got things backward.  memory_fault _must_ be ignored if KVM doesn't return
the associated "magic combo", where the magic value is either "0+KVM_EXIT_MEMORY_FAULT"
or "-EFAULT+KVM_EXIT_MEMORY_FAULT".

Filling kvm_run::memory_fault but not exiting to userspace is ok because userspace
never sees the data, i.e. userspace is completely unaware.  This behavior is not
ideal from a KVM perspective as allowing KVM to fill the kvm_run union without
exiting to userspace can lead to other bugs, e.g. effective corruption of the
kvm_run union, but at least from a uABI perspective, the behavior is acceptable.

The reverse, userspace consuming kvm_run::memory_fault without being explicitly
told the data is valid, is not ok/safe.  KVM's contract is that fields contained
in kvm_run's big union are valid if and only if KVM returns '0' and the associated
exit reason is set in kvm_run::exit_reason.

From an ABI perspective, I don't see anything fundamentally wrong with bending
that rule slightly by saying that kvm_run::memory_fault is valid if KVM returns
-EFAULT+KVM_EXIT_MEMORY_FAULT.  It won't break existing userspace that is unaware
of KVM_EXIT_MEMORY_FAULT, and userspace can precisely check for the combination.

My big concern with piggybacking -EFAULT is that userspace will be fed stale data if
KVM exits with -EFAULT in a path that _doesn't_ fill kvm_run::memory_fault.
Returning a negative error code isn't hazardous in and of itself, e.g. KVM has
had bugs in the past where KVM returns '0' but doesn't fill kvm_run::exit_reason.
The big danger is that KVM has existing paths that return -EFAULT, i.e. we can
introduce bugs simply by doing nothing, whereas returning '0' would largely be
limited to new code.

The counter-argument is that propagating '0' correctly up the stack carries its
own risk due to plenty of code correctly treating '0' as "success" and not "exit
to userspace".

And we can mitigate the risk of using -EFAULT.  E.g. fill in kvm_run::memory_fault
even if we are 99.9999% confident the -EFAULT can't get out to userspace in the
context of KVM_RUN, and set kvm_run::exit_reason to some arbitrary value at the
start of KVM_RUN to prevent reusing memory_fault from a previous userspace exit.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit
  2023-03-17 22:03             ` Sean Christopherson
@ 2023-03-20 15:56               ` Sean Christopherson
  0 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2023-03-20 15:56 UTC (permalink / raw)
  To: Anish Moorthy; +Cc: Oliver Upton, jthoughton, kvm, maz, Isaku Yamahata

On Fri, Mar 17, 2023, Sean Christopherson wrote:
> On Fri, Mar 17, 2023, Anish Moorthy wrote:
> > On Fri, Mar 17, 2023 at 12:00 PM David Matlack <dmatlack@google.com> wrote:
> > > > The low-level accessors are common across architectures and can be called from
> > > > other contexts besides a vCPU. Is it possible for the caller to catch -EFAULT
> > > > and convert that into an exit?
> > >
> > > Ya, as things stand today, the conversions _must_ be performed at the caller, as
> > > there are (sadly) far too many flows where KVM squashes the error.  E.g. almost
> > > all of x86's paravirt code just suppresses user memory faults :-(
> > >
> > > Anish, when we discussed this off-list, what I meant by limiting the initial support
> > > to existing -EFAULT cases was limiting support to existing cases where KVM directly
> > > returns -EFAULT to userspace, not to all existing cases where -EFAULT is ever
> > > returned _within KVM_ while handling KVM_RUN.  My apologies if I didn't make that clear.
> > 
> > Don't worry, we eventually got there off-list :)
> > 
> > This brings us back to my original set of questions. As has already
> > been pointed out, I'll have to revisit my "Confident that needs
> > conversion" changes and tweak them so that the vCPU exit is populated
> > only for the call sites where the -EFAULT makes it to userspace. I
> > still want feedback on if I've mis-identified any of the functions in
> > my "EFAULT does not propagate to userspace" list and whether there are
> > functions/callers in the "Still unsure if needs conversion" which do
> > have return paths to KVM_RUN.
> 
> As you've probably gathered from the type of feedback you're receiving, identifying
> the conversion touchpoints isn't going to be the long pole of this series.  Correctly
> identifying all of the touchpoints may not be easy, but fixing any cases we get wrong
> will likely be straightforward.  And realistically, no matter how many eyeballs look
> at the code, odds are good we'll miss at least one case.  In other words, don't worry
> too much about getting all the touchpoints correct on the first version.  Getting the
> uAPI right is much more important.
> 
> And rather than rely on code review to get things right, we should be able to
> detect issues programmatically.  E.g. use fault injection to make gup() and/or
> uaccess fail (might even be wired up already?), and hack in a WARN in the KVM_RUN
> path to assert that KVM_EXIT_MEMORY_FAULT is filled if the return code is -EFAULT
> (assuming we don't try to get KVM to return 0 everywhere), e.g. something like
> the below would at least flag the "misses", although debugging could still prove to be
> annoying.
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 67b890e54cf1..cccae0ad1436 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4100,6 +4100,8 @@ static long kvm_vcpu_ioctl(struct file *filp,
>                 }
>                 r = kvm_arch_vcpu_ioctl_run(vcpu);
>                 trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
> +               WARN_ON(r == -EFAULT &&
> +                       vcpu->run->exit_reason == KVM_EXIT_MEMORY_FAULT);

Gah, I inverted the second check, this should be 

		WARN_ON(r == -EFAULT &&
			vcpu->run->exit_reason != KVM_EXIT_MEMORY_FAULT);
		
>                 break;
>         }
>         case KVM_GET_REGS: {
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field
  2023-03-20 15:53           ` Sean Christopherson
@ 2023-03-20 18:19             ` Anish Moorthy
  2023-03-20 22:11             ` Anish Moorthy
  1 sibling, 0 replies; 60+ messages in thread
From: Anish Moorthy @ 2023-03-20 18:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Isaku Yamahata, Marc Zyngier, Oliver Upton, jthoughton, kvm

On Mon, Mar 20, 2023 at 8:53 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Mar 17, 2023, Anish Moorthy wrote:
> > On Fri, Mar 17, 2023 at 2:50 PM Sean Christopherson <seanjc@google.com> wrote:
> > > I wonder if we can get away with returning -EFAULT, but still filling vcpu->run
> > > with KVM_EXIT_MEMORY_FAULT and all the other metadata.  That would likely simplify
> > > the implementation greatly, and would let KVM fill vcpu->run unconditionally.  KVM
> > > would still need a capability to advertise support to userspace, but userspace
> > > wouldn't need to opt in.  I think this may have been my very original thought, and
> > > I just never actually wrote it down...
> >
> > Oh, good to know that's actually an option. I thought of that too, but
> > assumed that returning a negative error code was a no-go for a proper
> > vCPU exit. But if that's not true then I think it's the obvious
> > solution because it precludes any uncaught behavior-change bugs.
> >
> > A couple of notes
> > 1. Since we'll likely miss some -EFAULT returns, we'll need to make
> > sure that the user can check for / doesn't see a stale
> > kvm_run::memory_fault field when a missed -EFAULT makes it to
> > userspace. It's a small and easy-to-fix detail, but I thought I'd
> > point it out.
>
> Ya, this is the main concern for me as well.  I'm not as confident that it's
> easy-to-fix/avoid though.
>
> > 2. I don't think this would simplify the series that much, since we
> > still need to find the call sites returning -EFAULT to userspace and
> > populate memory_fault only in those spots to avoid populating it for
> > -EFAULTs which don't make it to userspace.
>
> Filling kvm_run::memory_fault even if KVM never exits to userspace is perfectly
> ok.  It's not ideal, but it's ok.

Right- I was just pointing out that doing so could mislead readers of
the code if they assume that "kvm_run::memory_fault is populated iff it
was going to be associated w/ an exit to userspace," which I know I
would.

> > We *could* relax that condition and just document that memory_fault should be
> > ignored when KVM_RUN does not return -EFAULT... but I don't think that's a
> > good solution from a coder/maintainer perspective.
>
> You've got things backward.  memory_fault _must_ be ignored if KVM doesn't return
> the associated "magic combo", where the magic value is either "0+KVM_EXIT_MEMORY_FAULT"
> or "-EFAULT+KVM_EXIT_MEMORY_FAULT".

I think we're saying the same thing- I was using "should" to mean "must."

> Filling kvm_run::memory_fault but not exiting to userspace is ok because userspace
> never sees the data, i.e. userspace is completely unaware.  This behavior is not
> ideal from a KVM perspective as allowing KVM to fill the kvm_run union without
> exiting to userspace can lead to other bugs, e.g. effective corruption of the
> kvm_run union

Ooh, I didn't think of the corruption issue here: thanks for pointing it out.

> but at least from a uABI perspective, the behavior is acceptable.

This does complicate things for the KVM implementation though, right? In
particular, we'd have to make sure that KVM_RUN never conditionally
modifies its return value/exit reason based on reads from kvm_run:
that seems like a slightly weird thing to do, but I don't want to
assume anything here.

Anyways, unless that's not (and never will be) a problem, allowing
corruption of kvm_run seems very risky.

> The reverse, userspace consuming kvm_run::memory_fault without being explicitly
> told the data is valid, is not ok/safe.  KVM's contract is that fields contained
> in kvm_run's big union are valid if and only if KVM returns '0' and the associated
> exit reason is set in kvm_run::exit_reason.
>
> From an ABI perspective, I don't see anything fundamentally wrong with bending
> that rule slightly by saying that kvm_run::memory_fault is valid if KVM returns
> -EFAULT+KVM_EXIT_MEMORY_FAULT.  It won't break existing userspace that is unaware
> of KVM_EXIT_MEMORY_FAULT, and userspace can precisely check for the combination.
>
> My big concern with piggybacking -EFAULT is that userspace will be fed stale data if
> KVM exits with -EFAULT in a path that _doesn't_ fill kvm_run::memory_fault.
> Returning a negative error code isn't hazardous in and of itself, e.g. KVM has
> had bugs in the past where KVM returns '0' but doesn't fill kvm_run::exit_reason.
> The big danger is that KVM has existing paths that return -EFAULT, i.e. we can
> introduce bugs simply by doing nothing, whereas returning '0' would largely be
> limited to new code.
>
> The counter-argument is that propagating '0' correctly up the stack carries its
> own risk due to plenty of code correctly treating '0' as "success" and not "exit
> to userspace".
>
> And we can mitigate the risk of using -EFAULT.  E.g. fill in kvm_run::memory_fault
> even if we are 99.9999% confident the -EFAULT can't get out to userspace in the
> context of KVM_RUN, and set kvm_run::exit_reason to some arbitrary value at the
> start of KVM_RUN to prevent reusing memory_fault from a previous userspace exit.

Right, this is what I had in mind when I called this "small and
easy-to-fix." Piggybacking -EFAULT seems like the right thing to do to
me, but I'm still uneasy about possibly corrupting kvm_run for masked
-EFAULTs.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [WIP Patch v2 09/14] KVM: Introduce KVM_CAP_MEMORY_FAULT_NOWAIT without implementation
  2023-03-20 15:13           ` Sean Christopherson
@ 2023-03-20 19:53             ` Anish Moorthy
  0 siblings, 0 replies; 60+ messages in thread
From: Anish Moorthy @ 2023-03-20 19:53 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Oliver Upton, jthoughton, kvm

On Mon, Mar 20, 2023 at 8:13 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Mar 17, 2023, Anish Moorthy wrote:
> > On Fri, Mar 17, 2023 at 1:17 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > And as I argued in the last version[*], I am _strongly_ opposed to KVM speculating
> > > on why KVM is exiting to userspace.  I.e. KVM should not set a special flag if
> > > the memslot has "fast only" behavior.  The only thing the flag should do is control
> > > whether or not KVM tries slow paths, what KVM does in response to an unresolved
> > > fault should be an orthogonal thing.
> >
> > I'm guessing you would want changes to patch 10 of this series [1]
> > then, right? Setting a bit/exit reason in kvm_run::memory_fault.flags
> > depending on whether the failure originated from a "fast only" fault
> > is... exactly what I'm doing :/ I'm not totally clear on your usages
> > of the word "flag" above though, the "KVM should not set a special
> > flag... the only thing *the* flag should do" part is throwing me off a
> > bit. What I think you're saying is
>
> Heh, the second "the flag" is referring to the memslot flag.  Rewriting the above:
>
>   KVM should not set a special flag in kvm_run::memory_fault.flags ... the
>   only thing KVM_MEM_FAST_FAULT_ONLY should do is ..."
>
> > "KVM should not set a special bit in kvm_run::memory_fault.flags if
> > the memslot has fast-only behavior. The only thing
> > KVM_MEM_ABSENT_MAPPING_FAULT should do is..."
> >
> > [1] https://lore.kernel.org/all/20230315021738.1151386-11-amoorthy@google.com/

Ok so, just to be clear, you are not opposed to

(a) all -EFAULTs from kvm_faultin_pfn populating kvm_run.memory_fault
and setting kvm_run.memory_fault.flags to, say, FAULTIN_FAILURE if/when
kvm_cap_memory_fault_exit is enabled

but *are* opposed to

(b) the combination of the memslot flag and kvm_cap_memory_fault_exit
providing any additional information on top of that: for instance, a
kvm_run.memory_fault.flags of FAULTIN_FAILURE & FAST_FAULT_ONLY.

Is that right?


> > On Fri, Mar 17, 2023 at 1:54 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > Strictly speaking, if y'all buy my argument that the flag shouldn't control the
> > > gup behavior, there won't be semantic differences for the memslot flag.  KVM will
> > > (obviously) behave differently if KVM_CAP_MEMORY_FAULT_EXIT is not set, but that
> > > will hold true for x86 as well.  The only difference is that x86 will also support
> > > an orthogonal flag that makes the fast-only memslot flag useful in practice.
> > >
> > > So yeah, there will be an arch dependency, but only because arch code needs to
> > > actually perform the exit, and that's true no matter what.
> > >
> > > That said, there's zero reason to put X86 in the name.  Just add the capability
> > > as KVM_CAP_MEMORY_FAULT_EXIT or whatever and mark it as x86 in the documentation.
> >
> > Again, a little confused on your first "flag" usage here. I figure you
> > can't mean the memslot flag because the whole point of that is to
> > control the GUP behavior, but I'm not sure what else you'd be
> > referring to.
> >
> > Anyways the idea of having orthogonal features, one to -EFAULTing
> > early before a slow path and another to transform/augment -EFAULTs
> > into/with useful information does make sense to me. But I think the
> > issue here is that we want the fast-only memslot flag to be useful on
> > Arm as well, and with KVM_CAP_MEMORY_FAULT_NOWAIT written as it is now
> > there is a semantic differences between x86 and Arm.
>
> If and only if userspace enables the capability that transforms -EFAULT.
>
> > I don't see a way to keep the two features here orthogonal on x86 and
> > linked on arm without keeping that semantic difference. Perhaps the
> > solution here is a bare-bones implementation of
> > KVM_CAP_MEMORY_FAULT_EXIT for Arm? All that actually *needs* to be
> > covered to resolve this difference is the one call site in
> > user_mem_abort. since KVM_CAP_MEMORY_FAULT_EXIT will be allowed to
> > have holes anyways.
>
> As above, so long as userspace must opt into transforming -EFAULT, and can do
> so independent of KVM_MEM_FAST_FAULT_ONLY (or whatever we call it), the behavior
> of KVM_MEM_FAST_FAULT_ONLY itself is semantically identical across all
> architectures.
>
> KVM_MEM_FAST_FAULT_ONLY is obviously not very useful without precise information
> about the failing address, but IMO that's not reason enough to tie the two
> together.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field
  2023-03-20 15:53           ` Sean Christopherson
  2023-03-20 18:19             ` Anish Moorthy
@ 2023-03-20 22:11             ` Anish Moorthy
  2023-03-21 15:21               ` Sean Christopherson
  1 sibling, 1 reply; 60+ messages in thread
From: Anish Moorthy @ 2023-03-20 22:11 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Isaku Yamahata, Marc Zyngier, Oliver Upton, jthoughton, kvm

On Mon, Mar 20, 2023 at 8:53 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Mar 17, 2023, Anish Moorthy wrote:
> > On Fri, Mar 17, 2023 at 2:50 PM Sean Christopherson <seanjc@google.com> wrote:
> > > I wonder if we can get away with returning -EFAULT, but still filling vcpu->run
> > > with KVM_EXIT_MEMORY_FAULT and all the other metadata.  That would likely simplify
> > > the implementation greatly, and would let KVM fill vcpu->run unconditionally.  KVM
> > > would still need a capability to advertise support to userspace, but userspace
> > > wouldn't need to opt in.  I think this may have been my very original thought, and
> > > I just never actually wrote it down...
> >
> > Oh, good to know that's actually an option. I thought of that too, but
> > assumed that returning a negative error code was a no-go for a proper
> > vCPU exit. But if that's not true then I think it's the obvious
> > solution because it precludes any uncaught behavior-change bugs.
> >
> > A couple of notes
> > 1. Since we'll likely miss some -EFAULT returns, we'll need to make
> > sure that the user can check for / doesn't see a stale
> > kvm_run::memory_fault field when a missed -EFAULT makes it to
> > userspace. It's a small and easy-to-fix detail, but I thought I'd
> > point it out.
>
> Ya, this is the main concern for me as well.  I'm not as confident that it's
> easy-to-fix/avoid though.
>
> > 2. I don't think this would simplify the series that much, since we
> > still need to find the call sites returning -EFAULT to userspace and
> > populate memory_fault only in those spots to avoid populating it for
> > -EFAULTs which don't make it to userspace.
>
> Filling kvm_run::memory_fault even if KVM never exits to userspace is perfectly
> ok.  It's not ideal, but it's ok.
>
> > We *could* relax that condition and just document that memory_fault should be
> > ignored when KVM_RUN does not return -EFAULT... but I don't think that's a
> > good solution from a coder/maintainer perspective.
>
> You've got things backward.  memory_fault _must_ be ignored if KVM doesn't return
> the associated "magic combo", where the magic value is either "0+KVM_EXIT_MEMORY_FAULT"
> or "-EFAULT+KVM_EXIT_MEMORY_FAULT".
>
> Filling kvm_run::memory_fault but not exiting to userspace is ok because userspace
> never sees the data, i.e. userspace is completely unaware.  This behavior is not
> ideal from a KVM perspective as allowing KVM to fill the kvm_run union without
> exiting to userspace can lead to other bugs, e.g. effective corruption of the
> kvm_run union, but at least from a uABI perspective, the behavior is acceptable.

Actually, I don't think the idea of filling in kvm_run.memory_fault
for -EFAULTs which don't make it to userspace works at all. Consider
the direct_map function, which bubbles its -EFAULT to
kvm_mmu_do_page_fault. kvm_mmu_do_page_fault is called from both
kvm_arch_async_page_ready (which ignores the return value), and by
kvm_mmu_page_fault (where the return value does make it to userspace).
Populating kvm_run.memory_fault anywhere in or under
kvm_mmu_do_page_fault seems an immediate no-go, because a wayward
kvm_arch_async_page_ready could (presumably) overwrite/corrupt an
already-set kvm_run.memory_fault / other kvm_run field.

That in turn looks problematic for the
memory-fault-exit-on-fast-gup-failure part of this series, because
there are at least a couple of cases for which kvm_mmu_do_page_fault
will -EFAULT. One is the early-efault-on-fast-gup-failure case which
was the original purpose of this series. Another is a -EFAULT from
FNAME(fetch) (passed up through FNAME(page_fault)). There might be
other cases as well. But unless userspace can/should resolve *all*
such -EFAULTs in the same manner, a kvm_run.memory_fault populated in
"kvm_mmu_page_fault" wouldn't be actionable. At least, not without a
whole lot of plumbing code to make it so.

Sean, am I missing anything here?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [WIP Patch v2 09/14] KVM: Introduce KVM_CAP_MEMORY_FAULT_NOWAIT without implementation
  2023-03-17 20:17     ` Sean Christopherson
@ 2023-03-20 22:22       ` Oliver Upton
  2023-03-21 14:50         ` Sean Christopherson
  0 siblings, 1 reply; 60+ messages in thread
From: Oliver Upton @ 2023-03-20 22:22 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Anish Moorthy, jthoughton, kvm

Sean,

On Fri, Mar 17, 2023 at 01:17:22PM -0700, Sean Christopherson wrote:
> On Fri, Mar 17, 2023, Oliver Upton wrote:
> > On Wed, Mar 15, 2023 at 02:17:33AM +0000, Anish Moorthy wrote:
> > > Add documentation, memslot flags, useful helper functions, and the
> > > actual new capability itself.
> > > 
> > > Memory fault exits on absent mappings are particularly useful for
> > > userfaultfd-based live migration postcopy. When many vCPUs fault upon a
> > > single userfaultfd the faults can take a while to surface to userspace
> > > due to having to contend for uffd wait queue locks. Bypassing the uffd
> > > entirely by triggering a vCPU exit avoids this contention and can improve
> > > the fault rate by as much as 10x.
> > > ---
> > >  Documentation/virt/kvm/api.rst | 37 +++++++++++++++++++++++++++++++---
> > >  include/linux/kvm_host.h       |  6 ++++++
> > >  include/uapi/linux/kvm.h       |  3 +++
> > >  tools/include/uapi/linux/kvm.h |  2 ++
> > >  virt/kvm/kvm_main.c            |  7 ++++++-
> > >  5 files changed, 51 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > index f9ca18bbec879..4932c0f62eb3d 100644
> > > --- a/Documentation/virt/kvm/api.rst
> > > +++ b/Documentation/virt/kvm/api.rst
> > > @@ -1312,6 +1312,7 @@ yet and must be cleared on entry.
> > >    /* for kvm_userspace_memory_region::flags */
> > >    #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
> > >    #define KVM_MEM_READONLY	(1UL << 1)
> > > +  #define KVM_MEM_ABSENT_MAPPING_FAULT (1UL << 2)
> > 
> > call it KVM_MEM_EXIT_ABSENT_MAPPING
> 
> Ooh, look, a bikeshed!  :-)

Couldn't help myself :)

> I don't think it should have "EXIT" in the name.  The exit to userspace is a side
> effect, e.g. KVM already exits to userspace on unresolved userfaults.  The only
> thing this knob _directly_ controls is whether or not KVM attempts the slow path.
> If we give the flag a name like "exit on absent userspace mappings", then KVM will
> appear to do the wrong thing when KVM exits on a truly absent userspace mapping.
> 
> And as I argued in the last version[*], I am _strongly_ opposed to KVM speculating
> on why KVM is exiting to userspace.  I.e. KVM should not set a special flag if
> the memslot has "fast only" behavior.  The only thing the flag should do is control
> whether or not KVM tries slow paths, what KVM does in response to an unresolved
> fault should be an orthogonal thing.
> 
> E.g. If KVM encounters an unmapped page while prefetching SPTEs, KVM will (correctly)
> not exit to userspace and instead simply terminate the prefetch.  Obviously we
> could solve that through documentation, but I don't see any benefit in making this
> more complex than it needs to be.

I couldn't care less about what the user-facing portion of this thing is
called, TBH. We could just refer to it as KVM_MEM_BIT_2 /s

The only bit I wanted to avoid is having a collision in the kernel between
literal faults arising from hardware and exits to userspace that we are also
calling 'faults'.

> [*] https://lkml.kernel.org/r/Y%2B0RYMfw6pHrSLX4%40google.com
> 
> > > +7.35 KVM_CAP_MEMORY_FAULT_NOWAIT
> > > +--------------------------------
> > > +
> > > +:Architectures: x86, arm64
> > > +:Returns: -EINVAL.
> > > +
> > > +The presence of this capability indicates that userspace may pass the
> > > +KVM_MEM_ABSENT_MAPPING_FAULT flag to KVM_SET_USER_MEMORY_REGION to cause KVM_RUN
> > > +to populate 'kvm_run.memory_fault' and exit to userspace (*) in response
> > > +to page faults for which the userspace page tables do not contain present
> > > +mappings. Attempting to enable the capability directly will fail.
> > > +
> > > +The 'gpa' and 'len' fields of kvm_run.memory_fault will be set to the starting
> > > +address and length (in bytes) of the faulting page. 'flags' will be set to
> > > +KVM_MEMFAULT_REASON_ABSENT_MAPPING.
> > > +
> > > +Userspace should determine how best to make the mapping present, then take
> > > +appropriate action. For instance, in the case of absent mappings this might
> > > +involve establishing the mapping for the first time via UFFDIO_COPY/CONTINUE or
> > > +faulting the mapping in using MADV_POPULATE_READ/WRITE. After establishing the
> > > +mapping, userspace can return to KVM to retry the previous memory access.
> > > +
> > > +(*) NOTE: On x86, KVM_CAP_X86_MEMORY_FAULT_EXIT must be enabled for the
> > > +KVM_MEMFAULT_REASON_ABSENT_MAPPING reason: otherwise userspace will only receive
> > > +a -EFAULT from KVM_RUN without any useful information.
> > 
> > I'm not a fan of this architecture-specific dependency. Userspace is already
> > explicitly opting in to this behavior by way of the memslot flag. These sort
> > of exits are entirely orthogonal to the -EFAULT conversion earlier in the
> > series.
> 
> Ya, yet another reason not to speculate on why KVM wasn't able to resolve a fault.

Regardless of what we name this memslot flag, we're already getting explicit
opt-in from userspace for new behavior. There seems to be zero value in
supporting memslot_flag && !MEMORY_FAULT_EXIT (i.e. returning EFAULT),
so why even bother?

Requiring two levels of opt-in to have the intended outcome for a single
architecture seems nauseating from a userspace perspective.

-- 
Thanks,
Oliver


* Re: [WIP Patch v2 09/14] KVM: Introduce KVM_CAP_MEMORY_FAULT_NOWAIT without implementation
  2023-03-20 22:22       ` Oliver Upton
@ 2023-03-21 14:50         ` Sean Christopherson
  2023-03-21 20:23           ` Oliver Upton
  0 siblings, 1 reply; 60+ messages in thread
From: Sean Christopherson @ 2023-03-21 14:50 UTC (permalink / raw)
  To: Oliver Upton; +Cc: Anish Moorthy, jthoughton, kvm

On Mon, Mar 20, 2023, Oliver Upton wrote:
> On Fri, Mar 17, 2023 at 01:17:22PM -0700, Sean Christopherson wrote:
> > On Fri, Mar 17, 2023, Oliver Upton wrote:
> > > I'm not a fan of this architecture-specific dependency. Userspace is already
> > > explicitly opting in to this behavior by way of the memslot flag. These sort
> > > of exits are entirely orthogonal to the -EFAULT conversion earlier in the
> > > series.
> > 
> > Ya, yet another reason not to speculate on why KVM wasn't able to resolve a fault.
> 
> Regardless of what we name this memslot flag, we're already getting explicit
> opt-in from userspace for new behavior. There seems to be zero value in
> supporting memslot_flag && !MEMORY_FAULT_EXIT (i.e. returning EFAULT),
> so why even bother?

Because there are use cases for MEMORY_FAULT_EXIT beyond fast-only gup.  We could
have the memslot feature depend on the MEMORY_FAULT_EXIT capability, but I don't
see how that adds value for either KVM or userspace.

Filling MEMORY_FAULT_EXIT iff the memslot flag is set would also lead to a weird
ABI and/or funky KVM code.  E.g. if MEMORY_FAULT_EXIT is tied to the fast-only
memslot flag, what's the defined behavior if the gfn=>hva translation fails?  KVM
hasn't actually tried to gup() anything.  Obviously not the end of the world, but
I'd prefer to avoid introducing more oddities into KVM, however minor.

> Requiring two levels of opt-in to have the intended outcome for a single
> architecture seems nauseating from a userspace perspective.

If we usurp -EFAULT, I don't think we'll actually need an opt-in for
MEMORY_FAULT_EXIT.  KVM will need to add a capability so that userspace can query
KVM support, but the actual filling of kvm_run could be done unconditionally.

Even if we do end up making the behavior opt-in, I would expect them to be largely
orthogonal in userspace.  E.g. userspace would always enable MEMORY_FAULT_EXIT
during startup, and then toggle the memslot flag during postcopy.


* Re: [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field
  2023-03-20 22:11             ` Anish Moorthy
@ 2023-03-21 15:21               ` Sean Christopherson
  2023-03-21 18:01                 ` Anish Moorthy
  0 siblings, 1 reply; 60+ messages in thread
From: Sean Christopherson @ 2023-03-21 15:21 UTC (permalink / raw)
  To: Anish Moorthy; +Cc: Isaku Yamahata, Marc Zyngier, Oliver Upton, jthoughton, kvm

On Mon, Mar 20, 2023, Anish Moorthy wrote:
> On Mon, Mar 20, 2023 at 8:53 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Fri, Mar 17, 2023, Anish Moorthy wrote:
> > > On Fri, Mar 17, 2023 at 2:50 PM Sean Christopherson <seanjc@google.com> wrote:
> > > > I wonder if we can get away with returning -EFAULT, but still filling vcpu->run
> > > > with KVM_EXIT_MEMORY_FAULT and all the other metadata.  That would likely simplify
> > > > the implementation greatly, and would let KVM fill vcpu->run unconditionally.  KVM
> > > > would still need a capability to advertise support to userspace, but userspace
> > > > wouldn't need to opt in.  I think this may have been my very original thought, and
> > > > I just never actually wrote it down...
> > >
> > > Oh, good to know that's actually an option. I thought of that too, but
> > > assumed that returning a negative error code was a no-go for a proper
> > > vCPU exit. But if that's not true then I think it's the obvious
> > > solution because it precludes any uncaught behavior-change bugs.
> > >
> > > A couple of notes
> > > 1. Since we'll likely miss some -EFAULT returns, we'll need to make
> > > sure that the user can check for / doesn't see a stale
> > > kvm_run::memory_fault field when a missed -EFAULT makes it to
> > > userspace. It's a small and easy-to-fix detail, but I thought I'd
> > > point it out.
> >
> > Ya, this is the main concern for me as well.  I'm not as confident that it's
> > easy-to-fix/avoid though.
> >
> > > 2. I don't think this would simplify the series that much, since we
> > > still need to find the call sites returning -EFAULT to userspace and
> > > populate memory_fault only in those spots to avoid populating it for
> > > -EFAULTs which don't make it to userspace.
> >
> > Filling kvm_run::memory_fault even if KVM never exits to userspace is perfectly
> > ok.  It's not ideal, but it's ok.
> >
> > > We *could* relax that condition and just document that memory_fault should be
> > > ignored when KVM_RUN does not return -EFAULT... but I don't think that's a
> > > good solution from a coder/maintainer perspective.
> >
> > You've got things backward.  memory_fault _must_ be ignored if KVM doesn't return
> > the associated "magic combo", where the magic value is either "0+KVM_EXIT_MEMORY_FAULT"
> > or "-EFAULT+KVM_EXIT_MEMORY_FAULT".
> >
> > Filling kvm_run::memory_fault but not exiting to userspace is ok because userspace
> > never sees the data, i.e. userspace is completely unaware.  This behavior is not
> > ideal from a KVM perspective as allowing KVM to fill the kvm_run union without
> > exiting to userspace can lead to other bugs, e.g. effective corruption of the
> > kvm_run union, but at least from a uABI perspective, the behavior is acceptable.
> 
> Actually, I don't think the idea of filling in kvm_run.memory_fault
> for -EFAULTs which don't make it to userspace works at all. Consider
> the direct_map function, which bubbles its -EFAULT to
> kvm_mmu_do_page_fault. kvm_mmu_do_page_fault is called from both
> kvm_arch_async_page_ready (which ignores the return value), and by
> kvm_mmu_page_fault (where the return value does make it to userspace).
> Populating kvm_run.memory_fault anywhere in or under
> kvm_mmu_do_page_fault seems an immediate no-go, because a wayward
> kvm_arch_async_page_ready could (presumably) overwrite/corrupt an
> already-set kvm_run.memory_fault / other kvm_run field.

This particular case is a non-issue.  kvm_check_async_pf_completion() is called
only when the current task has control of the vCPU, i.e. is the current "running"
vCPU.  That's not a coincidence either, invoking kvm_mmu_do_page_fault() without
having control of the vCPU would be fraught with races, e.g. the entire KVM MMU
context would be unstable.

That will hold true for all cases.  Using a vCPU that is not loaded (not the
current "running" vCPU in KVM's misleading terminology) to access guest memory is
simply not safe, as the vCPU state is non-deterministic.  There are paths where
KVM accesses, and even modifies, vCPU state asynchronously, e.g. for IRQ delivery
and making requests, but those are very controlled flows with dedicated machinery
to make them SMP safe.

That said, I agree that there's a risk that KVM could clobber vcpu->run by
hitting an -EFAULT without the vCPU loaded, but that's a solvable problem, e.g.
the helper to fill KVM_EXIT_MEMORY_FAULT could be hardened to yell if called
without the target vCPU being loaded:

	int kvm_handle_efault(struct kvm_vcpu *vcpu, ...)
	{
		preempt_disable();
		if (WARN_ON_ONCE(vcpu != __this_cpu_read(kvm_running_vcpu)))
			goto out;

		vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
		...
	out:
		preempt_enable();
		return -EFAULT;
	}

FWIW, I completely agree that filling KVM_EXIT_MEMORY_FAULT without guaranteeing
that KVM "immediately" exits to userspace isn't ideal, but given the amount of
historical code that we need to deal with, it seems like the lesser of all evils.
Unless I'm misunderstanding the use cases, unnecessarily filling kvm_run is a far
better failure mode than KVM not filling kvm_run when it should, i.e. false
positives are ok, false negatives are fatal.

> That in turn looks problematic for the
> memory-fault-exit-on-fast-gup-failure part of this series, because
> there are at least a couple of cases for which kvm_mmu_do_page_fault
> will -EFAULT. One is the early-efault-on-fast-gup-failure case which
> was the original purpose of this series. Another is a -EFAULT from
> FNAME(fetch) (passed up through FNAME(page_fault)). There might be
> other cases as well. But unless userspace can/should resolve *all*
> such -EFAULTs in the same manner, a kvm_run.memory_fault populated in
> "kvm_mmu_page_fault" wouldn't be actionable.

Killing the VM, which is what all VMMs do today in response to -EFAULT, is an
action.  As I've pointed out elsewhere in this thread, userspace needs to be able
to identify "faults" that it (userspace) can resolve without a hint from KVM.

In other words, KVM is still returning -EFAULT (or a variant thereof), the _only_
difference, for all intents and purposes, is that userspace is given a bit more
information about the source of the -EFAULT.

> At least, not without a whole lot of plumbing code to make it so.

Plumbing where?


* Re: [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field
  2023-03-21 15:21               ` Sean Christopherson
@ 2023-03-21 18:01                 ` Anish Moorthy
  2023-03-21 19:43                   ` Sean Christopherson
  0 siblings, 1 reply; 60+ messages in thread
From: Anish Moorthy @ 2023-03-21 18:01 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Isaku Yamahata, Marc Zyngier, Oliver Upton, jthoughton, kvm

On Tue, Mar 21, 2023 at 8:21 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Mar 20, 2023, Anish Moorthy wrote:
> > On Mon, Mar 20, 2023 at 8:53 AM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Fri, Mar 17, 2023, Anish Moorthy wrote:
> > > > On Fri, Mar 17, 2023 at 2:50 PM Sean Christopherson <seanjc@google.com> wrote:
> > > > > I wonder if we can get away with returning -EFAULT, but still filling vcpu->run
> > > > > with KVM_EXIT_MEMORY_FAULT and all the other metadata.  That would likely simplify
> > > > > the implementation greatly, and would let KVM fill vcpu->run unconditionally.  KVM
> > > > > would still need a capability to advertise support to userspace, but userspace
> > > > > wouldn't need to opt in.  I think this may have been my very original thought, and
> > > > > I just never actually wrote it down...
> > > >
> > > > Oh, good to know that's actually an option. I thought of that too, but
> > > > assumed that returning a negative error code was a no-go for a proper
> > > > vCPU exit. But if that's not true then I think it's the obvious
> > > > solution because it precludes any uncaught behavior-change bugs.
> > > >
> > > > A couple of notes
> > > > 1. Since we'll likely miss some -EFAULT returns, we'll need to make
> > > > sure that the user can check for / doesn't see a stale
> > > > kvm_run::memory_fault field when a missed -EFAULT makes it to
> > > > userspace. It's a small and easy-to-fix detail, but I thought I'd
> > > > point it out.
> > >
> > > Ya, this is the main concern for me as well.  I'm not as confident that it's
> > > easy-to-fix/avoid though.
> > >
> > > > 2. I don't think this would simplify the series that much, since we
> > > > still need to find the call sites returning -EFAULT to userspace and
> > > > populate memory_fault only in those spots to avoid populating it for
> > > > -EFAULTs which don't make it to userspace.
> > >
> > > Filling kvm_run::memory_fault even if KVM never exits to userspace is perfectly
> > > ok.  It's not ideal, but it's ok.
> > >
> > > > We *could* relax that condition and just document that memory_fault should be
> > > > ignored when KVM_RUN does not return -EFAULT... but I don't think that's a
> > > > good solution from a coder/maintainer perspective.
> > >
> > > You've got things backward.  memory_fault _must_ be ignored if KVM doesn't return
> > > the associated "magic combo", where the magic value is either "0+KVM_EXIT_MEMORY_FAULT"
> > > or "-EFAULT+KVM_EXIT_MEMORY_FAULT".
> > >
> > > Filling kvm_run::memory_fault but not exiting to userspace is ok because userspace
> > > never sees the data, i.e. userspace is completely unaware.  This behavior is not
> > > ideal from a KVM perspective as allowing KVM to fill the kvm_run union without
> > > exiting to userspace can lead to other bugs, e.g. effective corruption of the
> > > kvm_run union, but at least from a uABI perspective, the behavior is acceptable.
> >
> > Actually, I don't think the idea of filling in kvm_run.memory_fault
> > for -EFAULTs which don't make it to userspace works at all. Consider
> > the direct_map function, which bubbles its -EFAULT to
> > kvm_mmu_do_page_fault. kvm_mmu_do_page_fault is called from both
> > kvm_arch_async_page_ready (which ignores the return value), and by
> > kvm_mmu_page_fault (where the return value does make it to userspace).
> > Populating kvm_run.memory_fault anywhere in or under
> > kvm_mmu_do_page_fault seems an immediate no-go, because a wayward
> > kvm_arch_async_page_ready could (presumably) overwrite/corrupt an
> > already-set kvm_run.memory_fault / other kvm_run field.
>
> This particular case is a non-issue.  kvm_check_async_pf_completion() is called
> only when the current task has control of the vCPU, i.e. is the current "running"
> vCPU.  That's not a coincidence either, invoking kvm_mmu_do_page_fault() without
> having control of the vCPU would be fraught with races, e.g. the entire KVM MMU
> context would be unstable.
>
> That will hold true for all cases.  Using a vCPU that is not loaded (not the
> current "running" vCPU in KVM's misleading terminology) to access guest memory is
> simply not safe, as the vCPU state is non-deterministic.  There are paths where
> KVM accesses, and even modifies, vCPU state asynchronously, e.g. for IRQ delivery
> and making requests, but those are very controlled flows with dedicated machinery
> to make them SMP safe.
>
> That said, I agree that there's a risk that KVM could clobber vcpu->run by
> hitting an -EFAULT without the vCPU loaded, but that's a solvable problem, e.g.
> the helper to fill KVM_EXIT_MEMORY_FAULT could be hardened to yell if called
> without the target vCPU being loaded:
>
>         int kvm_handle_efault(struct kvm_vcpu *vcpu, ...)
>         {
>                 preempt_disable();
>                 if (WARN_ON_ONCE(vcpu != __this_cpu_read(kvm_running_vcpu)))
>                         goto out;
>
>                 vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
>                 ...
>         out:
>                 preempt_enable();
>                 return -EFAULT;
>         }
>
> FWIW, I completely agree that filling KVM_EXIT_MEMORY_FAULT without guaranteeing
> that KVM "immediately" exits to userspace isn't ideal, but given the amount of
> historical code that we need to deal with, it seems like the lesser of all evils.
> Unless I'm misunderstanding the use cases, unnecessarily filling kvm_run is a far
> better failure mode than KVM not filling kvm_run when it should, i.e. false
> positives are ok, false negatives are fatal.

Don't you have this in reverse? False negatives will just result in
userspace not having useful extra information for the -EFAULT it
receives from KVM_RUN, in which case userspace can do what you
mentioned all VMMs do today and just terminate the VM. Whereas a false
positive might cause a double-write to the KVM_RUN struct, either
putting incorrect information in kvm_run.memory_fault or corrupting
another member of the union.

> > That in turn looks problematic for the
> > memory-fault-exit-on-fast-gup-failure part of this series, because
> > there are at least a couple of cases for which kvm_mmu_do_page_fault
> > will -EFAULT. One is the early-efault-on-fast-gup-failure case which
> > was the original purpose of this series. Another is a -EFAULT from
> > FNAME(fetch) (passed up through FNAME(page_fault)). There might be
> > other cases as well. But unless userspace can/should resolve *all*
> > such -EFAULTs in the same manner, a kvm_run.memory_fault populated in
> > "kvm_mmu_page_fault" wouldn't be actionable.
>
> Killing the VM, which is what all VMMs do today in response to -EFAULT, is an
> action.  As I've pointed out elsewhere in this thread, userspace needs to be able
> to identify "faults" that it (userspace) can resolve without a hint from KVM.
>
> In other words, KVM is still returning -EFAULT (or a variant thereof), the _only_
> difference, for all intents and purposes, is that userspace is given a bit more
> information about the source of the -EFAULT.
>
> > At least, not without a whole lot of plumbing code to make it so.
>
> Plumbing where?

In this example, I meant plumbing code to get a
kvm_run.memory_fault.flags which is more specific than (eg)
MEMFAULT_REASON_PAGE_FAULT_FAILURE from the -EFAULT paths under
kvm_mmu_page_fault. My idea for how userspace would distinguish
fast-gup failures was that kvm_faultin_pfn would set a special bit in
kvm_run.memory_fault.flags to indicate its failure. But (still
assuming that we shouldn't have false-positive kvm_run.memory_fault
fills) if the memory_fault can only be populated from
kvm_mmu_page_fault then either failures from FNAME(page_fault) and
kvm_faultin_pfn will be indistinguishable to userspace, or those
functions will need to plumb more specific exit reasons all the way up
to kvm_mmu_page_fault.

But, since you've made this point elsewhere, my guess is that your
answer is that it's actually userspace's job to detect the "specific"
reason for the fault and resolve it.


* Re: [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field
  2023-03-21 18:01                 ` Anish Moorthy
@ 2023-03-21 19:43                   ` Sean Christopherson
  2023-03-22 21:06                     ` Anish Moorthy
  2023-03-28 22:19                     ` Anish Moorthy
  0 siblings, 2 replies; 60+ messages in thread
From: Sean Christopherson @ 2023-03-21 19:43 UTC (permalink / raw)
  To: Anish Moorthy; +Cc: Isaku Yamahata, Marc Zyngier, Oliver Upton, jthoughton, kvm

On Tue, Mar 21, 2023, Anish Moorthy wrote:
> On Tue, Mar 21, 2023 at 8:21 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Mon, Mar 20, 2023, Anish Moorthy wrote:
> > > On Mon, Mar 20, 2023 at 8:53 AM Sean Christopherson <seanjc@google.com> wrote:
> > > > Filling kvm_run::memory_fault but not exiting to userspace is ok because userspace
> > > > never sees the data, i.e. userspace is completely unaware.  This behavior is not
> > > > ideal from a KVM perspective as allowing KVM to fill the kvm_run union without
> > > > exiting to userspace can lead to other bugs, e.g. effective corruption of the
> > > > kvm_run union, but at least from a uABI perspective, the behavior is acceptable.
> > >
> > > Actually, I don't think the idea of filling in kvm_run.memory_fault
> > > for -EFAULTs which don't make it to userspace works at all. Consider
> > > the direct_map function, which bubbles its -EFAULT to
> > > kvm_mmu_do_page_fault. kvm_mmu_do_page_fault is called from both
> > > kvm_arch_async_page_ready (which ignores the return value), and by
> > > kvm_mmu_page_fault (where the return value does make it to userspace).
> > > Populating kvm_run.memory_fault anywhere in or under
> > > kvm_mmu_do_page_fault seems an immediate no-go, because a wayward
> > > kvm_arch_async_page_ready could (presumably) overwrite/corrupt an
> > > already-set kvm_run.memory_fault / other kvm_run field.
> >
> > This particular case is a non-issue.  kvm_check_async_pf_completion() is called
> > only when the current task has control of the vCPU, i.e. is the current "running"
> > vCPU.  That's not a coincidence either, invoking kvm_mmu_do_page_fault() without
> > having control of the vCPU would be fraught with races, e.g. the entire KVM MMU
> > context would be unstable.
> >
> > That will hold true for all cases.  Using a vCPU that is not loaded (not the
> > current "running" vCPU in KVM's misleading terminology) to access guest memory is
> > simply not safe, as the vCPU state is non-deterministic.  There are paths where
> > KVM accesses, and even modifies, vCPU state asynchronously, e.g. for IRQ delivery
> > and making requests, but those are very controlled flows with dedicated machinery
> > to make them SMP safe.
> >
> > That said, I agree that there's a risk that KVM could clobber vcpu->run by
> > hitting an -EFAULT without the vCPU loaded, but that's a solvable problem, e.g.
> > the helper to fill KVM_EXIT_MEMORY_FAULT could be hardened to yell if called
> > without the target vCPU being loaded:
> >
> >         int kvm_handle_efault(struct kvm_vcpu *vcpu, ...)
> >         {
> >                 preempt_disable();
> >                 if (WARN_ON_ONCE(vcpu != __this_cpu_read(kvm_running_vcpu)))
> >                         goto out;
> >
> >                 vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
> >                 ...
> >         out:
> >                 preempt_enable();
> >                 return -EFAULT;
> >         }
> >
> > FWIW, I completely agree that filling KVM_EXIT_MEMORY_FAULT without guaranteeing
> > that KVM "immediately" exits to userspace isn't ideal, but given the amount of
> > historical code that we need to deal with, it seems like the lesser of all evils.
> > Unless I'm misunderstanding the use cases, unnecessarily filling kvm_run is a far
> > better failure mode than KVM not filling kvm_run when it should, i.e. false
> > positives are ok, false negatives are fatal.
> 
> Don't you have this in reverse?

No, I don't think so.

> False negatives will just result in userspace not having useful extra
> information for the -EFAULT it receives from KVM_RUN, in which case userspace
> can do what you mentioned all VMMs do today and just terminate the VM.

And that is _really_ bad behavior if we have any hope of userspace actually being
able to rely on this functionality.  E.g. any false negative when userspace is
trying to do postcopy demand paging will be fatal to the VM.

> Whereas a false positive might cause a double-write to the KVM_RUN struct,
> either putting incorrect information in kvm_run.memory_fault or

Recording unused information on -EFAULT in kvm_run doesn't make the information
incorrect.

> corrupting another member of the union.

Only if KVM accesses guest memory after initiating an exit to userspace, which
would be a KVM irrespective of kvm_run.memory_fault.  We actually have exactly
this type of bug today in the trainwreck that is KVM's MMIO emulation[*], but
KVM gets away with the shoddy behavior by virtue of the scenario simply not
triggered by any real-world code.

And if we're really concerned about clobbering state, we could add hardening/auditing
code to ensure that KVM actually exits when kvm_run.exit_reason is set (though there
are a non-zero number of exceptions, e.g. the aforementioned MMIO mess, nested SVM/VMX
pages, and probably a few others).

Prior to cleanups a few years back[2], emulation failures had issues similar to
what we are discussing, where KVM would fail to exit to userspace, not fill kvm_run,
etc.  Those are the types of bugs I want to avoid here.

[1] https://lkml.kernel.org/r/ZBNrWZQhMX8AHzWM%40google.com
[2] https://lore.kernel.org/kvm/20190823010709.24879-1-sean.j.christopherson@intel.com

> > > That in turn looks problematic for the
> > > memory-fault-exit-on-fast-gup-failure part of this series, because
> > > there are at least a couple of cases for which kvm_mmu_do_page_fault
> > > will -EFAULT. One is the early-efault-on-fast-gup-failure case which
> > > was the original purpose of this series. Another is a -EFAULT from
> > > FNAME(fetch) (passed up through FNAME(page_fault)). There might be
> > > other cases as well. But unless userspace can/should resolve *all*
> > > such -EFAULTs in the same manner, a kvm_run.memory_fault populated in
> > > "kvm_mmu_page_fault" wouldn't be actionable.
> >
> > Killing the VM, which is what all VMMs do today in response to -EFAULT, is an
> > action.  As I've pointed out elsewhere in this thread, userspace needs to be able
> > to identify "faults" that it (userspace) can resolve without a hint from KVM.
> >
> > In other words, KVM is still returning -EFAULT (or a variant thereof), the _only_
> > difference, for all intents and purposes, is that userspace is given a bit more
> > information about the source of the -EFAULT.
> >
> > > At least, not without a whole lot of plumbing code to make it so.
> >
> > Plumbing where?
> 
> In this example, I meant plumbing code to get a kvm_run.memory_fault.flags
> which is more specific than (eg) MEMFAULT_REASON_PAGE_FAULT_FAILURE from the
> -EFAULT paths under kvm_mmu_page_fault. My idea for how userspace would
> distinguish fast-gup failures was that kvm_faultin_pfn would set a special
> bit in kvm_run.memory_fault.flags to indicate its failure. But (still
> assuming that we shouldn't have false-positive kvm_run.memory_fault fills) if
> the memory_fault can only be populated from kvm_mmu_page_fault then either
> failures from FNAME(page_fault) and kvm_faultin_pfn will be indistinguishable
> to userspace, or those functions will need to plumb more specific exit
> reasons all the way up to kvm_mmu_page_fault.

Setting a flag that essentially says "failure when handling a guest page fault"
is problematic on multiple fronts.  Tying the ABI to KVM's internal implementation
is not an option, i.e. the ABI would need to be defined as "on page faults from
the guest".  And then the resulting behavior would be non-deterministic, e.g.
userspace would see different behavior if KVM accessed a "bad" gfn via emulation
instead of in response to a guest page fault.  And because of hardware TLBs, it
would even be possible for the behavior to be non-deterministic on the same
platform running the same guest code (though this would be extremely unlikely
in practice).

And even if userspace is ok with only handling guest page faults _today_, I highly
doubt that will hold forever.  I.e. at some point there will be a use case that
wants to react to uaccess failures on fast-only memslots.

Ignoring all of those issues, simply flagging "this -EFAULT occurred when
handling a guest page fault" isn't precise enough for userspace to blindly resolve
the failure.  Even if KVM went through the trouble of setting information if and
only if get_user_page_fast_only() failed while handling a guest page fault,
userspace would still need/want a way to verify that the failure was expected and
can be resolved, e.g. to guard against userspace bugs due to wrongly unmapping
or mprotecting a page.

> But, since you've made this point elsewhere, my guess is that your answer is
> that it's actually userspace's job to detect the "specific" reason for the
> fault and resolve it.

Yes, it's userspace's responsibility.  I simply don't see how KVM can provide
information that userspace doesn't already have without creating an unmaintainable
uABI, at least not without some deep, deep plumbing into gup().  I.e. unless gup()
were changed to explicitly communicate that it failed because of a uffd equivalent,
at best a flag in kvm_run would be a hint that userspace _might_ be able to resolve
the fault.  And even if we modified gup(), we'd still have all the open questions
about what to do when KVM encounters a fault on a uaccess.
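
As an illustration, the userspace-side verification described earlier can be a
simple range check.  A minimal sketch, assuming a hypothetical table of GPA
ranges that userspace knows are demand-paged (none of these names are existing
KVM uAPI):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping: GPA ranges userspace has intentionally made
 * demand-paged, e.g. regions still being migrated during postcopy. */
struct gpa_range {
	uint64_t start;
	uint64_t len;
};

/* Returns true iff the reported fault GPA lies in a range userspace
 * expects to fault, i.e. one it can resolve by populating the backing
 * memory.  Anything else is treated as a bug. */
static bool fault_is_expected(const struct gpa_range *ranges, size_t n,
			      uint64_t fault_gpa)
{
	for (size_t i = 0; i < n; i++) {
		if (fault_gpa >= ranges[i].start &&
		    fault_gpa - ranges[i].start < ranges[i].len)
			return true;
	}
	return false;
}
```

A fault outside the tracked ranges is then handled exactly as VMMs handle
-EFAULT today: terminate the VM.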

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [WIP Patch v2 09/14] KVM: Introduce KVM_CAP_MEMORY_FAULT_NOWAIT without implementation
  2023-03-21 14:50         ` Sean Christopherson
@ 2023-03-21 20:23           ` Oliver Upton
  2023-03-21 21:01             ` Sean Christopherson
  0 siblings, 1 reply; 60+ messages in thread
From: Oliver Upton @ 2023-03-21 20:23 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Anish Moorthy, jthoughton, kvm

On Tue, Mar 21, 2023 at 07:50:35AM -0700, Sean Christopherson wrote:
> On Mon, Mar 20, 2023, Oliver Upton wrote:
> > On Fri, Mar 17, 2023 at 01:17:22PM -0700, Sean Christopherson wrote:
> > > On Fri, Mar 17, 2023, Oliver Upton wrote:
> > > > I'm not a fan of this architecture-specific dependency. Userspace is already
> > > > explicitly opting in to this behavior by way of the memslot flag. These sorts
> > > > of exits are entirely orthogonal to the -EFAULT conversion earlier in the
> > > > series.
> > > 
> > > Ya, yet another reason not to speculate on why KVM wasn't able to resolve a fault.
> > 
> > Regardless of what we name this memslot flag, we're already getting explicit
> > opt-in from userspace for new behavior. There seems to be zero value in
> > supporting memslot_flag && !MEMORY_FAULT_EXIT (i.e. returning EFAULT),
> > so why even bother?
> 
> Because there are use cases for MEMORY_FAULT_EXIT beyond fast-only gup.

To be abundantly clear -- I have no issue with (nor care about) the other
MEMORY_FAULT_EXIT changes. If we go the route of explicit user opt-in then
that deserves its own distinct bit of UAPI. None of my objection pertains
to the conversion of existing -EFAULT exits.

> We could have the memslot feature depend on the MEMORY_FAULT_EXIT capability,
> but I don't see how that adds value for either KVM or userspace.

That is exactly what I want to avoid! My issue was the language here:

  +(*) NOTE: On x86, KVM_CAP_X86_MEMORY_FAULT_EXIT must be enabled for the
  +KVM_MEMFAULT_REASON_ABSENT_MAPPING reason: otherwise userspace will only receive
  +a -EFAULT from KVM_RUN without any useful information.

Which sounds to me as though there are *two* UAPI bits for the whole fast-gup
failed interaction (flip a bit in the CAP and set a bit on the memslot, but
only for x86).

What I'm asking for is this:

 1) A capability advertising MEMORY_FAULT_EXIT to userspace. Either usurp
   EFAULT or require userspace to enable this capability to convert
   _existing_ EFAULT exits to the new way of the world.

 2) A capability and a single memslot flag to enable the fast-gup-only
   behavior (naming TBD). This does not depend on (1) in any way, i.e.
   only setting (2) should still result in MEMORY_FAULT_EXITs when fast
   gup fails. IOW, enabling (2) should always yield precise fault
   information to userspace.

-- 
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [WIP Patch v2 09/14] KVM: Introduce KVM_CAP_MEMORY_FAULT_NOWAIT without implementation
  2023-03-21 20:23           ` Oliver Upton
@ 2023-03-21 21:01             ` Sean Christopherson
  0 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2023-03-21 21:01 UTC (permalink / raw)
  To: Oliver Upton; +Cc: Anish Moorthy, jthoughton, kvm

On Tue, Mar 21, 2023, Oliver Upton wrote:
> On Tue, Mar 21, 2023 at 07:50:35AM -0700, Sean Christopherson wrote:
> > On Mon, Mar 20, 2023, Oliver Upton wrote:
> > > On Fri, Mar 17, 2023 at 01:17:22PM -0700, Sean Christopherson wrote:
> > > > On Fri, Mar 17, 2023, Oliver Upton wrote:
> > > > > I'm not a fan of this architecture-specific dependency. Userspace is already
> > > > > explicitly opting in to this behavior by way of the memslot flag. These sorts
> > > > > of exits are entirely orthogonal to the -EFAULT conversion earlier in the
> > > > > series.
> > > > 
> > > > Ya, yet another reason not to speculate on why KVM wasn't able to resolve a fault.
> > > 
> > > Regardless of what we name this memslot flag, we're already getting explicit
> > > opt-in from userspace for new behavior. There seems to be zero value in
> > > supporting memslot_flag && !MEMORY_FAULT_EXIT (i.e. returning EFAULT),
> > > so why even bother?
> > 
> > Because there are use cases for MEMORY_FAULT_EXIT beyond fast-only gup.
> 
> To be abundantly clear -- I have no issue with (nor care about) the other
> MEMORY_FAULT_EXIT changes. If we go the route of explicit user opt-in then
> that deserves its own distinct bit of UAPI. None of my objection pertains
> to the conversion of existing -EFAULT exits.
> 
> > We could have the memslot feature depend on the MEMORY_FAULT_EXIT capability,
> > but I don't see how that adds value for either KVM or userspace.
> 
> That is exactly what I want to avoid! My issue was the language here:
> 
>   +(*) NOTE: On x86, KVM_CAP_X86_MEMORY_FAULT_EXIT must be enabled for the
>   +KVM_MEMFAULT_REASON_ABSENT_MAPPING reason: otherwise userspace will only receive
>   +a -EFAULT from KVM_RUN without any useful information.
> 
> Which sounds to me as though there are *two* UAPI bits for the whole fast-gup
> failed interaction (flip a bit in the CAP and set a bit on the memslot, but
> only for x86).

It won't be x86 only.  Anish's proposed patch has it as x86 specific, but I think
we're all in agreement that that is undesirable.  There will inevitably be per-arch
enabling and enumeration, e.g. to actually fill information and kick out to
userspace, but I don't see a sane way to avoid that since the common paths don't
have the vCPU (largely by design).

> What I'm asking for is this:
> 
>  1) A capability advertising MEMORY_FAULT_EXIT to userspace. Either usurp
>    EFAULT or require userspace to enable this capability to convert
>    _existing_ EFAULT exits to the new way of the world.
> 
>  2) A capability and a single memslot flag to enable the fast-gup-only
>    behavior (naming TBD). This does not depend on (1) in any way, i.e.
>    only setting (2) should still result in MEMORY_FAULT_EXITs when fast
>    gup fails. IOW, enabling (2) should always yield precise fault
>    information to userspace.

Ah, so 2.2, providing precise fault information on fast-gup-only failures, is the
biggest (only?) point of contention.

My objection to that behavior is that it's either going to be annoyingly difficult to
get right in KVM, and even more annoying to maintain, or we'll end up with "fuzzy"
behavior that userspace will inevitably come to rely on, and then we'll be in a real
pickle.  E.g. if KVM sets the information without checking if gup() itself actually
failed, then KVM _might_ fill the info, depending on when KVM detects a problem.

Conversely, if KVM's contract is that it provides precise information if and only
if gup() fails, then KVM needs to precisely propagate back up the stack that gup()
failed.

To avoid spending more time going in circles, I propose we try to usurp -EFAULT
and convert all userspace-exits-from-KVM_RUN -EFAULT paths on x86 (as a guinea pig)
without requiring userspace to opt-in.  If that approach pans out, then this point
of contention goes away because 2.2 Just Works.
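
For illustration, the resulting userspace dispatch might look like the sketch
below; the exit-reason constant and the annotation's layout are placeholders,
not the proposed uAPI:

```c
#include <errno.h>
#include <stdint.h>

/* Placeholder value; not the real exit-reason number. */
#define KVM_EXIT_MEMORY_FAULT	38

/* Illustrative layout of the proposed annotation. */
struct memory_fault_info {
	uint64_t flags;
	uint64_t gpa;
	uint64_t len;
};

enum vmm_action { VMM_RESOLVE_FAULT, VMM_DIE };

/* Called after KVM_RUN returns -1 with the given errno: only an
 * -EFAULT accompanied by a memory-fault annotation is potentially
 * resolvable; everything else keeps today's fatal behavior. */
static enum vmm_action handle_run_efault(int run_errno, uint32_t exit_reason,
					 const struct memory_fault_info *info)
{
	(void)info;	/* a real VMM would vet info->gpa here */

	if (run_errno != EFAULT || exit_reason != KVM_EXIT_MEMORY_FAULT)
		return VMM_DIE;

	return VMM_RESOLVE_FAULT;
}
```

The key property is that a bare -EFAULT with no annotation keeps today's
kill-the-VM behavior, so a missed conversion degrades to the status quo rather
than misleading userspace.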

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field
  2023-03-21 19:43                   ` Sean Christopherson
@ 2023-03-22 21:06                     ` Anish Moorthy
  2023-03-22 23:17                       ` Sean Christopherson
  2023-03-28 22:19                     ` Anish Moorthy
  1 sibling, 1 reply; 60+ messages in thread
From: Anish Moorthy @ 2023-03-22 21:06 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Isaku Yamahata, Marc Zyngier, Oliver Upton, jthoughton, kvm

On Tue, Mar 21, 2023 at 12:43 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Mar 21, 2023, Anish Moorthy wrote:
> > On Tue, Mar 21, 2023 at 8:21 AM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Mon, Mar 20, 2023, Anish Moorthy wrote:
> > > > On Mon, Mar 20, 2023 at 8:53 AM Sean Christopherson <seanjc@google.com> wrote:
> > > > > Filling kvm_run::memory_fault but not exiting to userspace is ok because userspace
> > > > > never sees the data, i.e. userspace is completely unaware.  This behavior is not
> > > > > ideal from a KVM perspective as allowing KVM to fill the kvm_run union without
> > > > > exiting to userspace can lead to other bugs, e.g. effective corruption of the
> > > > > kvm_run union, but at least from a uABI perspective, the behavior is acceptable.
> > > >
> > > > Actually, I don't think the idea of filling in kvm_run.memory_fault
> > > > for -EFAULTs which don't make it to userspace works at all. Consider
> > > > the direct_map function, which bubbles its -EFAULT to
> > > > kvm_mmu_do_page_fault. kvm_mmu_do_page_fault is called from both
> > > > kvm_arch_async_page_ready (which ignores the return value), and by
> > > > kvm_mmu_page_fault (where the return value does make it to userspace).
> > > > Populating kvm_run.memory_fault anywhere in or under
> > > > kvm_mmu_do_page_fault seems an immediate no-go, because a wayward
> > > > kvm_arch_async_page_ready could (presumably) overwrite/corrupt an
> > > > already-set kvm_run.memory_fault / other kvm_run field.
> > >
> > > This particular case is a non-issue.  kvm_check_async_pf_completion() is called
> > > only when the current task has control of the vCPU, i.e. is the current "running"
> > > vCPU.  That's not a coincidence either, invoking kvm_mmu_do_page_fault() without
> > > having control of the vCPU would be fraught with races, e.g. the entire KVM MMU
> > > context would be unstable.
> > >
> > > That will hold true for all cases.  Using a vCPU that is not loaded (not the
> > > current "running" vCPU in KVM's misleading terminology) to access guest memory is
> > > simply not safe, as the vCPU state is non-deterministic.  There are paths where
> > > KVM accesses, and even modifies, vCPU state asynchronously, e.g. for IRQ delivery
> > > and making requests, but those are very controlled flows with dedicated machinery
> > > to make them SMP safe.
> > >
> > > That said, I agree that there's a risk that KVM could clobber vcpu->run by
> > > hitting an -EFAULT without the vCPU loaded, but that's a solvable problem, e.g.
> > > the helper to fill KVM_EXIT_MEMORY_FAULT could be hardened to yell if called
> > > without the target vCPU being loaded:
> > >
> > >         int kvm_handle_efault(struct kvm_vcpu *vcpu, ...)
> > >         {
> > >                 preempt_disable();
> > >                 if (WARN_ON_ONCE(vcpu != __this_cpu_read(kvm_running_vcpu)))
> > >                         goto out;
> > >
> > >                 vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
> > >                 ...
> > >         out:
> > >                 preempt_enable();
> > >                 return -EFAULT;
> > >         }
> > >
> > > FWIW, I completely agree that filling KVM_EXIT_MEMORY_FAULT without guaranteeing
> > > that KVM "immediately" exits to userspace isn't ideal, but given the amount of
> > > historical code that we need to deal with, it seems like the lesser of all evils.
> > > Unless I'm misunderstanding the use cases, unnecessarily filling kvm_run is a far
> > > better failure mode than KVM not filling kvm_run when it should, i.e. false
> > > positives are ok, false negatives are fatal.
> >
> > Don't you have this in reverse?
>
> No, I don't think so.
>
> > False negatives will just result in userspace not having useful extra
> > information for the -EFAULT it receives from KVM_RUN, in which case userspace
> > can do what you mentioned all VMMs do today and just terminate the VM.
>
> And that is _really_ bad behavior if we have any hope of userspace actually being
> able to rely on this functionality.  E.g. any false negative when userspace is
> trying to do postcopy demand paging will be fatal to the VM.

But since -EFAULTs from KVM_RUN today are already fatal, there's no
new failure introduced by an -EFAULT w/o a populated memory_fault
field, right? Obviously that's of no real use to userspace, but that
seems like part of the point of starting with a partial conversion: to
allow for filling holes in the implementation in the future.

It seems like what you're really concerned about here is the
interaction with the memslot fast-gup-only flag. Obviously, failing to
populate kvm_run.memory_fault for new userspace-visible -EFAULTs
caused by that flag would cause new fatal failures for the guest,
which would make the feature actually harmful. But as far as I know
(and please lmk if I'm wrong), the memslot flag only needs to be used
by the kvm_handle_error_pfn (x86) and user_mem_abort (arm64)
functions, meaning that those are the only places where we need to
check/populate kvm_run.memory_fault for new userspace-visible
-EFAULTs.

> > Whereas a false positive might cause a double-write to the KVM_RUN struct,
> > either putting incorrect information in kvm_run.memory_fault or
>
> Recording unused information on -EFAULT in kvm_run doesn't make the information
> incorrect.
>
> > corrupting another member of the union.
>
> Only if KVM accesses guest memory after initiating an exit to userspace, which
> would be a KVM bug irrespective of kvm_run.memory_fault.

Ah good: I was concerned that this was a valid set of code paths in
KVM. Although I'm assuming that "initiating an exit to userspace"
includes the "returning -EFAULT from KVM_RUN" cases, because we
wouldn't want EFAULTs to stomp on each other as well (the
kvm_mmu_do_page_fault usages were supposed to be one such example,
though I'm glad to know that they're not a problem).

> And if we're really concerned about clobbering state, we could add hardening/auditing
> code to ensure that KVM actually exits when kvm_run.exit_reason is set (though there
> are a non-zero number of exceptions, e.g. the aformentioned MMIO mess, nested SVM/VMX
> pages, and probably a few others).
>
> Prior to cleanups a few years back[2], emulation failures had issues similar to
> what we are discussing, where KVM would fail to exit to userspace, not fill kvm_run,
> etc.  Those are the types of bugs I want to avoid here.
>
> [1] https://lkml.kernel.org/r/ZBNrWZQhMX8AHzWM%40google.com
> [2] https://lore.kernel.org/kvm/20190823010709.24879-1-sean.j.christopherson@intel.com
>
> > > > That in turn looks problematic for the
> > > > memory-fault-exit-on-fast-gup-failure part of this series, because
> > > > there are at least a couple of cases for which kvm_mmu_do_page_fault
> > > > will -EFAULT. One is the early-efault-on-fast-gup-failure case which
> > > > was the original purpose of this series. Another is a -EFAULT from
> > > > FNAME(fetch) (passed up through FNAME(page_fault)). There might be
> > > > other cases as well. But unless userspace can/should resolve *all*
> > > > such -EFAULTs in the same manner, a kvm_run.memory_fault populated in
> > > > "kvm_mmu_page_fault" wouldn't be actionable.
> > >
> > > Killing the VM, which is what all VMMs do today in response to -EFAULT, is an
> > > action.  As I've pointed out elsewhere in this thread, userspace needs to be able
> > > to identify "faults" that it (userspace) can resolve without a hint from KVM.
> > >
> > > In other words, KVM is still returning -EFAULT (or a variant thereof), the _only_
> > > difference, for all intents and purposes, is that userspace is given a bit more
> > > information about the source of the -EFAULT.
> > >
> > > > At least, not without a whole lot of plumbing code to make it so.
> > >
> > > Plumbing where?
> >
> > In this example, I meant plumbing code to get a kvm_run.memory_fault.flags
> > which is more specific than (e.g.) MEMFAULT_REASON_PAGE_FAULT_FAILURE from the
> > -EFAULT paths under kvm_mmu_page_fault. My idea for how userspace would
> > distinguish fast-gup failures was that kvm_faultin_pfn would set a special
> > bit in kvm_run.memory_fault.flags to indicate its failure. But (still
> > assuming that we shouldn't have false-positive kvm_run.memory_fault fills) if
> > the memory_fault can only be populated from kvm_mmu_page_fault then either
> > failures from FNAME(page_fault) and kvm_faultin_pfn will be indistinguishable
> > to userspace, or those functions will need to plumb more specific exit
> > reasons all the way up to kvm_mmu_page_fault.
>
> Setting a flag that essentially says "failure when handling a guest page fault"
> is problematic on multiple fronts.  Tying the ABI to KVM's internal implementation
> is not an option, i.e. the ABI would need to be defined as "on page faults from
> the guest".  And then the resulting behavior would be non-deterministic, e.g.
> userspace would see different behavior if KVM accessed a "bad" gfn via emulation
> instead of in response to a guest page fault.  And because of hardware TLBs, it
> would even be possible for the behavior to be non-deterministic on the same
> platform running the same guest code (though this would be extremely unlikely
> in practice).
>
> And even if userspace is ok with only handling guest page faults _today_, I highly
> doubt that will hold forever.  I.e. at some point there will be a use case that
> wants to react to uaccess failures on fast-only memslots.
>
> Ignoring all of those issues, simply flagging "this -EFAULT occurred when
> handling a guest page fault" isn't precise enough for userspace to blindly resolve
> the failure.  Even if KVM went through the trouble of setting information if and
> only if get_user_page_fast_only() failed while handling a guest page fault,
> userspace would still need/want a way to verify that the failure was expected and
> can be resolved, e.g. to guard against userspace bugs due to wrongly unmapping
> or mprotecting a page.
>
> > But, since you've made this point elsewhere, my guess is that your answer is
> > that it's actually userspace's job to detect the "specific" reason for the
> > fault and resolve it.
>
> Yes, it's userspace's responsibility.  I simply don't see how KVM can provide
> information that userspace doesn't already have without creating an unmaintainable
> uABI, at least not without some deep, deep plumbing into gup().  I.e. unless gup()
> were changed to explicitly communicate that it failed because of a uffd equivalent,
> at best a flag in kvm_run would be a hint that userspace _might_ be able to resolve
> the fault.  And even if we modified gup(), we'd still have all the open questions
> about what to do when KVM encounters a fault on a uaccess.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field
  2023-03-22 21:06                     ` Anish Moorthy
@ 2023-03-22 23:17                       ` Sean Christopherson
  0 siblings, 0 replies; 60+ messages in thread
From: Sean Christopherson @ 2023-03-22 23:17 UTC (permalink / raw)
  To: Anish Moorthy; +Cc: Isaku Yamahata, Marc Zyngier, Oliver Upton, jthoughton, kvm

On Wed, Mar 22, 2023, Anish Moorthy wrote:
> On Tue, Mar 21, 2023 at 12:43 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Tue, Mar 21, 2023, Anish Moorthy wrote:
> > > > FWIW, I completely agree that filling KVM_EXIT_MEMORY_FAULT without guaranteeing
> > > > that KVM "immediately" exits to userspace isn't ideal, but given the amount of
> > > > historical code that we need to deal with, it seems like the lesser of all evils.
> > > > Unless I'm misunderstanding the use cases, unnecessarily filling kvm_run is a far
> > > > better failure mode than KVM not filling kvm_run when it should, i.e. false
> > > > positives are ok, false negatives are fatal.
> > >
> > > Don't you have this in reverse?
> >
> > No, I don't think so.
> >
> > > False negatives will just result in userspace not having useful extra
> > > information for the -EFAULT it receives from KVM_RUN, in which case userspace
> > > can do what you mentioned all VMMs do today and just terminate the VM.
> >
> > And that is _really_ bad behavior if we have any hope of userspace actually being
> > able to rely on this functionality.  E.g. any false negative when userspace is
> > trying to do postcopy demand paging will be fatal to the VM.
> 
> But since -EFAULTs from KVM_RUN today are already fatal, there's no
> new failure introduced by an -EFAULT w/o a populated memory_fault
> field, right?

Yes, but it's a bit of a moot point since the goal of the feature is to avoid
killing the VM.

> Obviously that's of no real use to userspace, but that seems like part of the
> point of starting with a partial conversion: to allow for filling holes in
> the implementation in the future.

Yes, but I want a forcing function to reveal any holes we missed sooner than
later, otherwise the feature will languish since it won't be useful beyond the
fast-gup-only use case.

> It seems like what you're really concerned about here is the interaction with
> the memslot fast-gup-only flag. Obviously, failing to populate
> kvm_run.memory_fault for new userspace-visible -EFAULTs caused by that flag
> would cause new fatal failures for the guest, which would make the feature
> actually harmful. But as far as I know (and please lmk if I'm wrong), the
> memslot flag only needs to be used by the kvm_handle_error_pfn (x86) and
> user_mem_abort (arm64) functions, meaning that those are the only places
> where we need to check/populate kvm_run.memory_fault for new
> userspace-visible -EFAULTs.

No.  As you point out, the fast-gup-only case should be pretty easy to get correct,
i.e. this should all work just fine for _GCE's current_ use case.  I'm more concerned
with setting KVM up for success when future use cases come along that might not be ok
with unhandled faults in random guest accesses killing the VM.

To be clear, I do not expect us to get this 100% correct on the first attempt,
but I do want to have mechanisms in place that will detect any bugs/misses so
that we can fix the issues _before_ a use case comes along that needs 100%
accuracy.

> > > Whereas a false positive might cause a double-write to the KVM_RUN struct,
> > > either putting incorrect information in kvm_run.memory_fault or
> >
> > Recording unused information on -EFAULT in kvm_run doesn't make the information
> > incorrect.
> >
> > > corrupting another member of the union.
> >
> > Only if KVM accesses guest memory after initiating an exit to userspace, which
> > would be a KVM bug irrespective of kvm_run.memory_fault.
> 
> Ah good: I was concerned that this was a valid set of code paths in
> KVM. Although I'm assuming that "initiating an exit to userspace"
> includes the "returning -EFAULT from KVM_RUN" cases, because we
> wouldn't want EFAULTs to stomp on each other as well (the
> kvm_mmu_do_page_fault usages were supposed to be one such example,
> though I'm glad to know that they're not a problem).

This one gets into a bit of a grey area.  The "rule" is really about the intent,
i.e. once KVM intends to exit to userspace, it's a bug if KVM encounters something
else and runs into the weeds.

In no small part because of the myriad paths where KVM ignores what would be fatal errors
in most flows, e.g. record_steal_time(), simply returning -EFAULT from some low
level helper doesn't necessarily signal an intent to exit all the way to userspace.

To be honest, I don't have a clear idea of how difficult it will be to detect bugs.
In most cases, failure to exit to userspace leads to a fatal error fairly quickly.
With userspace faults, it's entirely possible that an exit could be missed and
nothing bad would happen.

Hmm, one idea would be to have the initial -EFAULT detection fill kvm_run.memory_fault,
but set kvm_run.exit_reason to some magic number, e.g. zero it out.  Then KVM could
WARN if something tries to overwrite kvm_run.exit_reason.  The WARN would need to
be buried by a Kconfig or something since kvm_run can be modified by userspace,
but other than that I think it would work.
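
A userspace-level toy model of that sentinel scheme, with the field names and
the warning mechanism standing in for the real kvm_run/WARN_ON_ONCE machinery:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy kvm_run: just the fields the idea needs. */
struct toy_kvm_run {
	uint32_t exit_reason;
	struct { uint64_t gpa; } memory_fault;
};

static bool warned;	/* stand-in for a WARN_ON_ONCE() firing */

/* The -EFAULT site fills the annotation and zeroes exit_reason as a
 * "pending annotation" sentinel. */
static void fill_memory_fault(struct toy_kvm_run *run, uint64_t gpa)
{
	run->memory_fault.gpa = gpa;
	run->exit_reason = 0;
}

/* The normal exit path yells if it is about to clobber a pending
 * annotation that never made it out to userspace. */
static void set_exit_reason(struct toy_kvm_run *run, uint32_t reason)
{
	if (run->exit_reason == 0 && run->memory_fault.gpa)
		warned = true;
	run->exit_reason = reason;
}
```

As noted, the real WARN would need to be gated behind a Kconfig or similar,
since userspace can scribble on kvm_run between exits.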

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field
  2023-03-21 19:43                   ` Sean Christopherson
  2023-03-22 21:06                     ` Anish Moorthy
@ 2023-03-28 22:19                     ` Anish Moorthy
  2023-04-04 19:34                       ` Sean Christopherson
  1 sibling, 1 reply; 60+ messages in thread
From: Anish Moorthy @ 2023-03-28 22:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Isaku Yamahata, Marc Zyngier, Oliver Upton, jthoughton, kvm

On Tue, Mar 21, 2023 at 12:43 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Mar 21, 2023, Anish Moorthy wrote:
> > On Tue, Mar 21, 2023 at 8:21 AM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Mon, Mar 20, 2023, Anish Moorthy wrote:
> > > > On Mon, Mar 20, 2023 at 8:53 AM Sean Christopherson <seanjc@google.com> wrote:
> > > > > Filling kvm_run::memory_fault but not exiting to userspace is ok because userspace
> > > > > never sees the data, i.e. userspace is completely unaware.  This behavior is not
> > > > > ideal from a KVM perspective as allowing KVM to fill the kvm_run union without
> > > > > exiting to userspace can lead to other bugs, e.g. effective corruption of the
> > > > > kvm_run union, but at least from a uABI perspective, the behavior is acceptable.
> > > >
> > > > Actually, I don't think the idea of filling in kvm_run.memory_fault
> > > > for -EFAULTs which don't make it to userspace works at all. Consider
> > > > the direct_map function, which bubbles its -EFAULT to
> > > > kvm_mmu_do_page_fault. kvm_mmu_do_page_fault is called from both
> > > > kvm_arch_async_page_ready (which ignores the return value), and by
> > > > kvm_mmu_page_fault (where the return value does make it to userspace).
> > > > Populating kvm_run.memory_fault anywhere in or under
> > > > kvm_mmu_do_page_fault seems an immediate no-go, because a wayward
> > > > kvm_arch_async_page_ready could (presumably) overwrite/corrupt an
> > > > already-set kvm_run.memory_fault / other kvm_run field.
> > >
> > > This particular case is a non-issue.  kvm_check_async_pf_completion() is called
> > > only when the current task has control of the vCPU, i.e. is the current "running"
> > > vCPU.  That's not a coincidence either, invoking kvm_mmu_do_page_fault() without
> > > having control of the vCPU would be fraught with races, e.g. the entire KVM MMU
> > > context would be unstable.
> > >
> > > That will hold true for all cases.  Using a vCPU that is not loaded (not the
> > > current "running" vCPU in KVM's misleading terminology) to access guest memory is
> > > simply not safe, as the vCPU state is non-deterministic.  There are paths where
> > > KVM accesses, and even modifies, vCPU state asynchronously, e.g. for IRQ delivery
> > > and making requests, but those are very controlled flows with dedicated machinery
> > > to make them SMP safe.
> > >
> > > That said, I agree that there's a risk that KVM could clobber vcpu->run by
> > > hitting an -EFAULT without the vCPU loaded, but that's a solvable problem, e.g.
> > > the helper to fill KVM_EXIT_MEMORY_FAULT could be hardened to yell if called
> > > without the target vCPU being loaded:
> > >
> > >         int kvm_handle_efault(struct kvm_vcpu *vcpu, ...)
> > >         {
> > >                 preempt_disable();
> > >                 if (WARN_ON_ONCE(vcpu != __this_cpu_read(kvm_running_vcpu)))
> > >                         goto out;
> > >
> > >                 vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
> > >                 ...
> > >         out:
> > >                 preempt_enable();
> > >                 return -EFAULT;
> > >         }
> > >
> > > FWIW, I completely agree that filling KVM_EXIT_MEMORY_FAULT without guaranteeing
> > > that KVM "immediately" exits to userspace isn't ideal, but given the amount of
> > > historical code that we need to deal with, it seems like the lesser of all evils.
> > > Unless I'm misunderstanding the use cases, unnecessarily filling kvm_run is a far
> > > better failure mode than KVM not filling kvm_run when it should, i.e. false
> > > positives are ok, false negatives are fatal.
> >
> > Don't you have this in reverse?
>
> No, I don't think so.
>
> > False negatives will just result in userspace not having useful extra
> > information for the -EFAULT it receives from KVM_RUN, in which case userspace
> > can do what you mentioned all VMMs do today and just terminate the VM.
>
> And that is _really_ bad behavior if we have any hope of userspace actually being
> able to rely on this functionality.  E.g. any false negative when userspace is
> trying to do postcopy demand paging will be fatal to the VM.
>
> > Whereas a false positive might cause a double-write to the KVM_RUN struct,
> > either putting incorrect information in kvm_run.memory_fault or
>
> Recording unused information on -EFAULT in kvm_run doesn't make the information
> incorrect.

Let's say that some function (converted to annotate its EFAULTs) fills
in kvm_run.memory_fault, but the EFAULT is suppressed from being
returned from kvm_run. What if, later within the same kvm_run call,
some other function (which we've completely overlooked) EFAULTs and
that return value actually does make it out to kvm_run? Userspace
would get stale information, which could be catastrophic.

Actually, even performing the annotations only in functions that
currently always bubble EFAULTs to userspace still seems brittle: if
new callers are ever added which don't bubble the EFAULTs, then we end
up in the same situation.
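
The stale-annotation hazard is easy to model in a few lines; all names here are
illustrative:

```c
#include <errno.h>
#include <stdint.h>

struct toy_run { uint64_t fault_gpa; };

/* A converted helper: annotates the run struct... */
static int converted_helper(struct toy_run *run)
{
	run->fault_gpa = 0x1000;
	return -EFAULT;		/* ...but its caller drops this */
}

/* An overlooked, unconverted helper: -EFAULT with no annotation. */
static int unconverted_helper(void)
{
	return -EFAULT;
}

/* Models one KVM_RUN call: the first -EFAULT is suppressed (think of
 * the async-page-ready path), the second escapes to userspace. */
static int toy_kvm_run(struct toy_run *run)
{
	(void)converted_helper(run);	/* return value ignored */
	return unconverted_helper();
}
```

Userspace then sees -EFAULT with a plausible-looking annotation that describes
a completely unrelated fault.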

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field
  2023-03-28 22:19                     ` Anish Moorthy
@ 2023-04-04 19:34                       ` Sean Christopherson
  2023-04-04 20:40                         ` Anish Moorthy
  0 siblings, 1 reply; 60+ messages in thread
From: Sean Christopherson @ 2023-04-04 19:34 UTC (permalink / raw)
  To: Anish Moorthy; +Cc: Isaku Yamahata, Marc Zyngier, Oliver Upton, jthoughton, kvm

On Tue, Mar 28, 2023, Anish Moorthy wrote:
> On Tue, Mar 21, 2023 at 12:43 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Tue, Mar 21, 2023, Anish Moorthy wrote:
> > > On Tue, Mar 21, 2023 at 8:21 AM Sean Christopherson <seanjc@google.com> wrote:
> > > > FWIW, I completely agree that filling KVM_EXIT_MEMORY_FAULT without guaranteeing
> > > > that KVM "immediately" exits to userspace isn't ideal, but given the amount of
> > > > historical code that we need to deal with, it seems like the lesser of all evils.
> > > > Unless I'm misunderstanding the use cases, unnecessarily filling kvm_run is a far
> > > > better failure mode than KVM not filling kvm_run when it should, i.e. false
> > > > positives are ok, false negatives are fatal.
> > >
> > > Don't you have this in reverse?
> >
> > No, I don't think so.
> >
> > > False negatives will just result in userspace not having useful extra
> > > information for the -EFAULT it receives from KVM_RUN, in which case userspace
> > > can do what you mentioned all VMMs do today and just terminate the VM.
> >
> > And that is _really_ bad behavior if we have any hope of userspace actually being
> > able to rely on this functionality.  E.g. any false negative when userspace is
> > trying to do postcopy demand paging will be fatal to the VM.
> >
> > > Whereas a false positive might cause a double-write to the KVM_RUN struct,
> > > either putting incorrect information in kvm_run.memory_fault or
> >
> > Recording unused information on -EFAULT in kvm_run doesn't make the information
> > incorrect.
> 
> Let's say that some function (converted to annotate its EFAULTs) fills
> in kvm_run.memory_fault, but the EFAULT is suppressed from being
> returned from KVM_RUN. What if, later within the same KVM_RUN call,
> some other function (which we've completely overlooked) EFAULTs and
> that return value actually does make it out to userspace? Userspace
> would get stale information, which could be catastrophic.

"catastrophic" is a bit hyperbolic.  Yes, it would be bad, but at _worst_ userspace
will kill the VM, which is the status quo today.

> Actually even performing the annotations only in functions that
> currently always bubble EFAULTs to userspace still seems brittle: if
> new callers are ever added which don't bubble the EFAULTs, then we end
> up in the same situation.

Because of KVM's semi-magical '1 == resume, -errno/0 == exit' "design", that's
true for literally every exit to userspace in KVM and every VM-Exit handler.
E.g. see commit 2368048bf5c2 ("KVM: x86: Signal #GP, not -EPERM, on bad
WRMSR(MCi_CTL/STATUS)"), where KVM returned '-1' instead of '1' when rejecting
MSR accesses and inadvertently killed the VM.  A similar bug would be if KVM
returned EFAULT instead of -EFAULT, in which case vcpu_run() would resume the
guest instead of exiting to userspace and likely put the vCPU into an infinite
loop.

Do I want to harden KVM to make things like this less brittle?  Absolutely.  Do I
think we should hold up this functionality just because it doesn't solve all of
pre-existing flaws in the related KVM code?  No.


* Re: [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field
  2023-04-04 19:34                       ` Sean Christopherson
@ 2023-04-04 20:40                         ` Anish Moorthy
  2023-04-04 22:07                           ` Sean Christopherson
  0 siblings, 1 reply; 60+ messages in thread
From: Anish Moorthy @ 2023-04-04 20:40 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Isaku Yamahata, Marc Zyngier, Oliver Upton, jthoughton, kvm

On Tue, Apr 4, 2023 at 12:35 PM Sean Christopherson <seanjc@google.com> wrote:
> > Let's say that some function (converted to annotate its EFAULTs) fills
> > in kvm_run.memory_fault, but the EFAULT is suppressed from being
> > returned from kvm_run. What if, later within the same kvm_run call,
> > some other function (which we've completely overlooked) EFAULTs and
> > that return value actually does make it out to kvm_run? Userspace
> > would get stale information, which could be catastrophic.
>
> "catastrophic" is a bit hyperbolic.  Yes, it would be bad, but at _worst_ userspace
> will kill the VM, which is the status quo today.

Well, what I'm saying is that in these cases userspace *wouldn't know*
that kvm_run.memory_fault contains incorrect information for the
-EFAULT it actually got (do you disagree?), which could presumably
cause it to do bad things like "resolve" faults on incorrect pages
and/or infinite-loop on KVM_RUN, etc.

Annotating the EFAULT information as valid only from the call sites
which return directly to userspace prevents this class of problem, at
the cost of allowing un-annotated EFAULTs to make it to userspace. But
to me, paying that cost to make sure the EFAULT information is always
correct seems by far preferable to not paying it and allowing
userspace to get silently incorrect information.

> > Actually even performing the annotations only in functions that
> > currently always bubble EFAULTs to userspace still seems brittle: if
> > new callers are ever added which don't bubble the EFAULTs, then we end
> > up in the same situation.
>
> Because of KVM's semi-magical '1 == resume, -errno/0 == exit' "design", that's
> true for literally every exit to userspace in KVM and every VM-Exit handler.
> E.g. see commit 2368048bf5c2 ("KVM: x86: Signal #GP, not -EPERM, on bad
> WRMSR(MCi_CTL/STATUS)"), where KVM returned '-1' instead of '1' when rejecting
> MSR accesses and inadvertently killed the VM.  A similar bug would be if KVM
> returned EFAULT instead of -EFAULT, in which case vcpu_run() would resume the
> guest instead of exiting to userspace and likely put the vCPU into an infinite
> loop.

Right, good point.


* Re: [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field
  2023-04-04 20:40                         ` Anish Moorthy
@ 2023-04-04 22:07                           ` Sean Christopherson
  2023-04-05 20:21                             ` Anish Moorthy
  0 siblings, 1 reply; 60+ messages in thread
From: Sean Christopherson @ 2023-04-04 22:07 UTC (permalink / raw)
  To: Anish Moorthy; +Cc: Isaku Yamahata, Marc Zyngier, Oliver Upton, jthoughton, kvm

On Tue, Apr 04, 2023, Anish Moorthy wrote:
> On Tue, Apr 4, 2023 at 12:35 PM Sean Christopherson <seanjc@google.com> wrote:
> > > Let's say that some function (converted to annotate its EFAULTs) fills
> > > in kvm_run.memory_fault, but the EFAULT is suppressed from being
> > > returned from KVM_RUN. What if, later within the same KVM_RUN call,
> > > some other function (which we've completely overlooked) EFAULTs and
> > > that return value actually does make it out to userspace? Userspace
> > > would get stale information, which could be catastrophic.
> >
> > "catastrophic" is a bit hyperbolic.  Yes, it would be bad, but at _worst_ userspace
> > will kill the VM, which is the status quo today.
> 
> Well, what I'm saying is that in these cases userspace *wouldn't know*
> that kvm_run.memory_fault contains incorrect information for the
> -EFAULT it actually got (do you disagree?),

I disagree in the sense that if the stale information causes a problem, then by
definition userspace has to know.  It's the whole "if a tree falls in a forest"
thing.  If KVM reports stale information and literally nothing bad happens, ever,
then is the superfluous exit really a problem?  Not saying it wouldn't be treated
as a bug, just that it might not even warrant a stable backport if the worst case
scenario is a spurious exit to userspace (for example).

> which could presumably cause it to do bad things like "resolve" faults on
> incorrect pages and/or infinite-loop on KVM_RUN, etc.

Putting the vCPU into an infinite loop is _very_ visible, e.g. see the entire
mess surrounding commit 31c25585695a ("Revert "KVM: SVM: avoid infinite loop on
NPF from bad address"").

As above, fixing pages that don't need to be fixed isn't itself a major problem.
If the extra exits lead to a performance issue, then _that_ is a problem, but
again _something_ has to detect the problem and thus it becomes a known thing.

> Annotating the EFAULT information as valid only from the call sites
> which return directly to userspace prevents this class of problem, at
> the cost of allowing un-annotated EFAULTs to make it to userspace. But
> to me, paying that cost to make sure the EFAULT information is always
> correct seems by far preferable to not paying it and allowing
> userspace to get silently incorrect information.

I don't think that's a maintainable approach.  Filling kvm_run if and only if the
-EFAULT has a direct path to userspace is (a) going to require a significant amount
of code churn and (b) falls apart the instant code further up the stack changes.
E.g. the relatively straightforward page fault case requires bouncing through 7+
functions to get from kvm_handle_error_pfn() to kvm_arch_vcpu_ioctl_run(), and not
all of those are obviously "direct":

	if (IS_ENABLED(CONFIG_RETPOLINE) && fault.is_tdp)
		r = kvm_tdp_page_fault(vcpu, &fault);
	else
		r = vcpu->arch.mmu->page_fault(vcpu, &fault);

	if (fault.write_fault_to_shadow_pgtable && emulation_type)
		*emulation_type |= EMULTYPE_WRITE_PF_TO_SP;

	/*
	 * Similar to above, prefetch faults aren't truly spurious, and the
	 * async #PF path doesn't do emulation.  Do count faults that are fixed
	 * by the async #PF handler though, otherwise they'll never be counted.
	 */
	if (r == RET_PF_FIXED)
		vcpu->stat.pf_fixed++;
	else if (prefetch)
		;
	else if (r == RET_PF_EMULATE)
		vcpu->stat.pf_emulate++;
	else if (r == RET_PF_SPURIOUS)
		vcpu->stat.pf_spurious++;
	return r;


...

	if (r == RET_PF_INVALID) {
		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
					  lower_32_bits(error_code), false,
					  &emulation_type);
		if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
			return -EIO;
	}

	if (r < 0)
		return r;
	if (r != RET_PF_EMULATE)
		return 1;

In other words, the "only if it's direct" rule requires visually auditing changes,
i.e. catching "violations" via code review, not only to code that adds a new -EFAULT
return, but to all code throughout rather large swaths of KVM.  The odds of us (or
whoever the future maintainers/reviewers are) remembering to enforce the "rule", let
alone actually having 100% accuracy, are basically nil.

On the flip side, if we add a helper to fill kvm_run and return -EFAULT, then we can
add a rule that the only time KVM is allowed to return a bare -EFAULT is immediately after
a uaccess, i.e. after copy_to/from_user() and the many variants.  And _that_ can be
enforced through static checkers, e.g. someone with more (read: any) awk/sed skills
than me could bang something out in a matter of minutes.  Such a static checker won't
catch everything, but there would be very, very few bare non-uaccess -EFAULTS left,
and those could be filtered out with an allowlist, e.g. similar to how the folks that
run smatch and whatnot deal with false positives.


* Re: [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field
  2023-04-04 22:07                           ` Sean Christopherson
@ 2023-04-05 20:21                             ` Anish Moorthy
  0 siblings, 0 replies; 60+ messages in thread
From: Anish Moorthy @ 2023-04-05 20:21 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Marc Zyngier, Oliver Upton, jthoughton, kvm

Ok. I'm still concerned about the implications of the "annotate
everywhere" approach, but I spoke with James and he shares your
opinion on the severity of the potential issues. I'll put the patches
together and send up a proper v3.


end of thread, other threads:[~2023-04-05 20:21 UTC | newest]

Thread overview: 60+ messages
2023-03-15  2:17 [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit Anish Moorthy
2023-03-15  2:17 ` [WIP Patch v2 01/14] KVM: selftests: Allow many vCPUs and reader threads per UFFD in demand paging test Anish Moorthy
2023-03-15  2:17 ` [WIP Patch v2 02/14] KVM: selftests: Use EPOLL in userfaultfd_util reader threads and signal errors via TEST_ASSERT Anish Moorthy
2023-03-15  2:17 ` [WIP Patch v2 03/14] KVM: Allow hva_pfn_fast to resolve read-only faults Anish Moorthy
2023-03-15  2:17 ` [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field Anish Moorthy
2023-03-17  0:02   ` Isaku Yamahata
2023-03-17 18:33     ` Anish Moorthy
2023-03-17 19:30       ` Oliver Upton
2023-03-17 21:50       ` Sean Christopherson
2023-03-17 22:44         ` Anish Moorthy
2023-03-20 15:53           ` Sean Christopherson
2023-03-20 18:19             ` Anish Moorthy
2023-03-20 22:11             ` Anish Moorthy
2023-03-21 15:21               ` Sean Christopherson
2023-03-21 18:01                 ` Anish Moorthy
2023-03-21 19:43                   ` Sean Christopherson
2023-03-22 21:06                     ` Anish Moorthy
2023-03-22 23:17                       ` Sean Christopherson
2023-03-28 22:19                     ` Anish Moorthy
2023-04-04 19:34                       ` Sean Christopherson
2023-04-04 20:40                         ` Anish Moorthy
2023-04-04 22:07                           ` Sean Christopherson
2023-04-05 20:21                             ` Anish Moorthy
2023-03-17 18:35   ` Oliver Upton
2023-03-15  2:17 ` [WIP Patch v2 05/14] KVM: x86: Implement memory fault exit for direct_map Anish Moorthy
2023-03-15  2:17 ` [WIP Patch v2 06/14] KVM: x86: Implement memory fault exit for kvm_handle_page_fault Anish Moorthy
2023-03-15  2:17 ` [WIP Patch v2 07/14] KVM: x86: Implement memory fault exit for setup_vmgexit_scratch Anish Moorthy
2023-03-15  2:17 ` [WIP Patch v2 08/14] KVM: x86: Implement memory fault exit for FNAME(fetch) Anish Moorthy
2023-03-15  2:17 ` [WIP Patch v2 09/14] KVM: Introduce KVM_CAP_MEMORY_FAULT_NOWAIT without implementation Anish Moorthy
2023-03-17 18:59   ` Oliver Upton
2023-03-17 20:15     ` Anish Moorthy
2023-03-17 20:54       ` Sean Christopherson
2023-03-17 23:42         ` Anish Moorthy
2023-03-20 15:13           ` Sean Christopherson
2023-03-20 19:53             ` Anish Moorthy
2023-03-17 20:17     ` Sean Christopherson
2023-03-20 22:22       ` Oliver Upton
2023-03-21 14:50         ` Sean Christopherson
2023-03-21 20:23           ` Oliver Upton
2023-03-21 21:01             ` Sean Christopherson
2023-03-15  2:17 ` [WIP Patch v2 10/14] KVM: x86: Implement KVM_CAP_MEMORY_FAULT_NOWAIT Anish Moorthy
2023-03-17  0:32   ` Isaku Yamahata
2023-03-15  2:17 ` [WIP Patch v2 11/14] KVM: arm64: Allow user_mem_abort to return 0 to signal a 'normal' exit Anish Moorthy
2023-03-17 18:18   ` Oliver Upton
2023-03-15  2:17 ` [WIP Patch v2 12/14] KVM: arm64: Implement KVM_CAP_MEMORY_FAULT_NOWAIT Anish Moorthy
2023-03-17 18:27   ` Oliver Upton
2023-03-17 19:00     ` Anish Moorthy
2023-03-17 19:03       ` Oliver Upton
2023-03-17 19:24       ` Sean Christopherson
2023-03-15  2:17 ` [WIP Patch v2 13/14] KVM: selftests: Add memslot_flags parameter to memstress_create_vm Anish Moorthy
2023-03-15  2:17 ` [WIP Patch v2 14/14] KVM: selftests: Handle memory fault exits in demand_paging_test Anish Moorthy
2023-03-17 17:43 ` [WIP Patch v2 00/14] Avoiding slow get-user-pages via memory fault exit Oliver Upton
2023-03-17 18:13   ` Sean Christopherson
2023-03-17 18:46     ` David Matlack
2023-03-17 18:54       ` Oliver Upton
2023-03-17 18:59         ` David Matlack
2023-03-17 19:53           ` Anish Moorthy
2023-03-17 22:03             ` Sean Christopherson
2023-03-20 15:56               ` Sean Christopherson
2023-03-17 20:35 ` Sean Christopherson
