From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-23.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BE2AAC433B4 for ; Wed, 19 May 2021 22:35:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8AD4E613AA for ; Wed, 19 May 2021 22:35:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229498AbhESWg0 (ORCPT ); Wed, 19 May 2021 18:36:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39310 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229465AbhESWgZ (ORCPT ); Wed, 19 May 2021 18:36:25 -0400 Received: from mail-il1-x132.google.com (mail-il1-x132.google.com [IPv6:2607:f8b0:4864:20::132]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0ED5DC061574 for ; Wed, 19 May 2021 15:35:05 -0700 (PDT) Received: by mail-il1-x132.google.com with SMTP id l15so1939059ilh.1 for ; Wed, 19 May 2021 15:35:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=8taWKY5LgEYAKZy1/shzdKgVUl6wjUkkU8aHH9O6HjM=; b=Pkw59cBUdOt2BsRLN3pKCcg0cRpOd8hDUQ5v3CIyWdCqMvo0HUogV6cjdSeKnwW/dp cxXsLgWuCI3sCcPrSS4mYKbuvOEOvfBfw+JWKXOB6jAS3SBzlkod60U/H90QpU6BziD2 iRg28PHFzNMgcwvrpQivl4uqBj5PMnk//40bg1IkjU3H2iNq7bRO2iFvL8vfEexvEP9o WdIzITvaxwdmoZaPvi4aDUDAVdBCseC9AF6SfWqtjDZZdsosKNRA11kmYiyTcEwPzjTT /4t0VsK+hWn0F3a/5R3nB+jINDaFulLoCit1QFImFHc2NCjz6zXv14+z4BCvfrXkRgUz cXNA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=8taWKY5LgEYAKZy1/shzdKgVUl6wjUkkU8aHH9O6HjM=; b=S4jdI1+/4Gkxgo4VCc8g2VghDv8ICJQfuPC/wZHrT+1o5dE6UyUEihjysW7CTkl5OT 4z9aQiQJSktoP44QjVWpfxOz+ubQINp5umwRWYnjahyaiBc2U/tvn0krwhII7rE3hlVM buW0yGuPu/g/GO4A808WRto389ffRxll8lOUqOF/QOLM+eiJU1gbfAc8Q6VmtJ7basH2 MkoR5Lv3yDxOoiC4EW25vUUhNhLgOKEUPjoSi/NUpMHQhyafAyaQ55uTon2pa1iVBoXv eKWPWNujwPE8SZP1UdP1yWv/hUsU0F8n/4eGGIoetWZxVW+E64pj0cMzZyHdIH1YZTEN I8iQ== X-Gm-Message-State: AOAM532Z5sOU5ETRe/To5mXiHUnVRmFxaP/Lv4/B77Dva0ZCLusc4Q5V doCarNcqR5TJhNVRaJhq01tcfznhH7rFKSztB2mhvw== X-Google-Smtp-Source: ABdhPJxLSIPs6OmtcLIIJUwUPBv+pyT0FSHXXjqDKhcwU578B0nnw+iM0scfKQJcUtHiIdMKau7x3aXjDCtKMEaDGjY= X-Received: by 2002:a92:dd04:: with SMTP id n4mr1648417ilm.165.1621463704103; Wed, 19 May 2021 15:35:04 -0700 (PDT) MIME-Version: 1.0 References: <20210519200339.829146-1-axelrasmussen@google.com> <20210519200339.829146-10-axelrasmussen@google.com> In-Reply-To: From: Axel Rasmussen Date: Wed, 19 May 2021 15:34:28 -0700 Message-ID: Subject: Re: [PATCH v2 09/10] KVM: selftests: allow using UFFD minor faults for demand paging To: Ben Gardon Cc: Aaron Lewis , Alexander Graf , Andrew Jones , Andrew Morton , Emanuele Giuseppe Esposito , Eric Auger , Jacob Xu , Makarand Sonare , Oliver Upton , Paolo Bonzini , Peter Xu , Shuah Khan , Yanan Wang , kvm , LKML , linux-kselftest@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org On Wed, May 19, 2021 at 3:21 PM Ben Gardon wrote: > > On Wed, May 19, 2021 at 1:04 PM Axel Rasmussen wrote: > > > > UFFD handling of MINOR faults is a new feature whose use case is to > > speed up demand paging (compared to MISSING faults). So, it's > > interesting to let this selftest exercise this new mode. > > > > Modify the demand paging test to have the option of using UFFD minor > > faults, as opposed to missing faults. Now, when turning on userfaultfd > > with '-u', the desired mode has to be specified ("MISSING" or "MINOR"). > > > > If we're in minor mode, before registering, prefault via the *alias*. > > This way, the guest will trigger minor faults, instead of missing > > faults, and we can UFFDIO_CONTINUE to resolve them. > > > > Modify the page fault handler function to use the right ioctl depending > > on the mode we're running in. In MINOR mode, use UFFDIO_CONTINUE. > > > > Signed-off-by: Axel Rasmussen > > --- > > .../selftests/kvm/demand_paging_test.c | 112 ++++++++++++------ > > 1 file changed, 79 insertions(+), 33 deletions(-) > > > > diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c > > index 01890a7b0155..df7190261923 100644 > > --- a/tools/testing/selftests/kvm/demand_paging_test.c > > +++ b/tools/testing/selftests/kvm/demand_paging_test.c > > @@ -74,33 +74,48 @@ static void *vcpu_worker(void *data) > > return NULL; > > } > > > > -static int handle_uffd_page_request(int uffd, uint64_t addr) > > +static int handle_uffd_page_request(int uffd_mode, int uffd, uint64_t addr) > > { > > - pid_t tid; > > + pid_t tid = syscall(__NR_gettid); > > struct timespec start; > > struct timespec ts_diff; > > - struct uffdio_copy copy; > > int r; > > > > - tid = syscall(__NR_gettid); > > + clock_gettime(CLOCK_MONOTONIC, &start); > > > > - copy.src = (uint64_t)guest_data_prototype; > > - copy.dst = addr; > > - copy.len = demand_paging_size; > > - copy.mode = 0; > > + if (uffd_mode == UFFDIO_REGISTER_MODE_MISSING) { > > + struct uffdio_copy copy; > > > > - clock_gettime(CLOCK_MONOTONIC, &start); > > + copy.src = (uint64_t)guest_data_prototype; > > + copy.dst = addr; > > + copy.len = demand_paging_size; > > + copy.mode = 0; > > + > > + r = ioctl(uffd, UFFDIO_COPY, ©); > > + if (r == -1) { > > + pr_info("Failed UFFDIO_COPY in 0x%lx from thread %d with errno: %d\n", > > + addr, tid, errno); > > + return r; > > + } > > + } else if (uffd_mode == UFFDIO_REGISTER_MODE_MINOR) { > > + struct uffdio_continue cont = {0}; > > + > > + cont.range.start = addr; > > + cont.range.len = demand_paging_size; > > > > - r = ioctl(uffd, UFFDIO_COPY, ©); > > - if (r == -1) { > > - pr_info("Failed Paged in 0x%lx from thread %d with errno: %d\n", > > - addr, tid, errno); > > - return r; > > + r = ioctl(uffd, UFFDIO_CONTINUE, &cont); > > + if (r == -1) { > > + pr_info("Failed UFFDIO_CONTINUE in 0x%lx from thread %d with errno: %d\n", > > + addr, tid, errno); > > + return r; > > + } > > + } else { > > + TEST_FAIL("Invalid uffd mode %d", uffd_mode); > > } > > > > ts_diff = timespec_elapsed(start); > > > > - PER_PAGE_DEBUG("UFFDIO_COPY %d \t%ld ns\n", tid, > > + PER_PAGE_DEBUG("UFFD page-in %d \t%ld ns\n", tid, > > timespec_to_ns(ts_diff)); > > PER_PAGE_DEBUG("Paged in %ld bytes at 0x%lx from thread %d\n", > > demand_paging_size, addr, tid); > > @@ -111,6 +126,7 @@ static int handle_uffd_page_request(int uffd, uint64_t addr) > > bool quit_uffd_thread; > > > > struct uffd_handler_args { > > + int uffd_mode; > > int uffd; > > int pipefd; > > useconds_t delay; > > @@ -187,7 +203,7 @@ static void *uffd_handler_thread_fn(void *arg) > > if (delay) > > usleep(delay); > > addr = msg.arg.pagefault.address; > > - r = handle_uffd_page_request(uffd, addr); > > + r = handle_uffd_page_request(uffd_args->uffd_mode, uffd, addr); > > if (r < 0) > > return NULL; > > pages++; > > @@ -203,13 +219,32 @@ static void *uffd_handler_thread_fn(void *arg) > > > > static void setup_demand_paging(struct kvm_vm *vm, > > pthread_t *uffd_handler_thread, int pipefd, > > - useconds_t uffd_delay, > > + int uffd_mode, useconds_t uffd_delay, > > struct uffd_handler_args *uffd_args, > > - void *hva, uint64_t len) > > + void *hva, void *alias, uint64_t len) > > { > > + bool is_minor = (uffd_mode == UFFDIO_REGISTER_MODE_MINOR); > > int uffd; > > struct uffdio_api uffdio_api; > > struct uffdio_register uffdio_register; > > + uint64_t expected_ioctls = ((uint64_t) 1) << _UFFDIO_COPY; > > + > > + PER_PAGE_DEBUG("Userfaultfd %s mode, faults resolved with %s\n", > > + is_minor ? "MINOR" : "MISSING", > > + is_minor ? "UFFDIO_CONINUE" : "UFFDIO_COPY"); > > + > > + /* In order to get minor faults, prefault via the alias. */ > > + if (is_minor) { > > + size_t p; > > + > > + expected_ioctls = ((uint64_t) 1) << _UFFDIO_CONTINUE; > > + > > + TEST_ASSERT(alias != NULL, "Alias required for minor faults"); > > + for (p = 0; p < (len / demand_paging_size); ++p) { > > + memcpy(alias + (p * demand_paging_size), > > + guest_data_prototype, demand_paging_size); > > + } > > + } > > Would it be worth timing this operation? I think we'd need to know how > long we spent prefaulting the memory to really be able to compare UDDF > modes using this test. It's easy to time it and print out a value, so I'm happy to add it. As for how useful it is, I'm not so sure. In general the way I think of it is, the prefaulting would happen during the precopy phase of live migration. During this phase, the VM is still running on the source machine, so the VM owner doesn't notice any performance degradation or slowness in this phase. > > > > > uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK); > > TEST_ASSERT(uffd >= 0, "uffd creation failed, errno: %d", errno); > > @@ -222,12 +257,13 @@ static void setup_demand_paging(struct kvm_vm *vm, > > > > uffdio_register.range.start = (uint64_t)hva; > > uffdio_register.range.len = len; > > - uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING; > > + uffdio_register.mode = uffd_mode; > > TEST_ASSERT(ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) != -1, > > "ioctl UFFDIO_REGISTER failed"); > > - TEST_ASSERT((uffdio_register.ioctls & UFFD_API_RANGE_IOCTLS) == > > - UFFD_API_RANGE_IOCTLS, "unexpected userfaultfd ioctl set"); > > + TEST_ASSERT((uffdio_register.ioctls & expected_ioctls) == > > + expected_ioctls, "missing userfaultfd ioctls"); > > > > + uffd_args->uffd_mode = uffd_mode; > > uffd_args->uffd = uffd; > > uffd_args->pipefd = pipefd; > > uffd_args->delay = uffd_delay; > > @@ -239,7 +275,7 @@ static void setup_demand_paging(struct kvm_vm *vm, > > } > > > > struct test_params { > > - bool use_uffd; > > + int uffd_mode; > > useconds_t uffd_delay; > > enum vm_mem_backing_src_type src_type; > > bool partition_vcpu_memory_access; > > @@ -276,7 +312,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) > > perf_test_setup_vcpus(vm, nr_vcpus, guest_percpu_mem_size, > > p->partition_vcpu_memory_access); > > > > - if (p->use_uffd) { > > + if (p->uffd_mode) { > > uffd_handler_threads = > > malloc(nr_vcpus * sizeof(*uffd_handler_threads)); > > TEST_ASSERT(uffd_handler_threads, "Memory allocation failed"); > > @@ -290,6 +326,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) > > for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) { > > vm_paddr_t vcpu_gpa; > > void *vcpu_hva; > > + void *vcpu_alias; > > uint64_t vcpu_mem_size; > > > > > > @@ -304,8 +341,9 @@ static void run_test(enum vm_guest_mode mode, void *arg) > > PER_VCPU_DEBUG("Added VCPU %d with test mem gpa [%lx, %lx)\n", > > vcpu_id, vcpu_gpa, vcpu_gpa + vcpu_mem_size); > > > > - /* Cache the HVA pointer of the region */ > > + /* Cache the host addresses of the region */ > > vcpu_hva = addr_gpa2hva(vm, vcpu_gpa); > > + vcpu_alias = addr_gpa2alias(vm, vcpu_gpa); > > > > /* > > * Set up user fault fd to handle demand paging > > @@ -316,8 +354,9 @@ static void run_test(enum vm_guest_mode mode, void *arg) > > TEST_ASSERT(!r, "Failed to set up pipefd"); > > > > setup_demand_paging(vm, &uffd_handler_threads[vcpu_id], > > - pipefds[vcpu_id * 2], p->uffd_delay, > > - &uffd_args[vcpu_id], vcpu_hva, > > + pipefds[vcpu_id * 2], p->uffd_mode, > > + p->uffd_delay, &uffd_args[vcpu_id], > > + vcpu_hva, vcpu_alias, > > vcpu_mem_size); > > } > > } > > @@ -346,7 +385,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) > > > > pr_info("All vCPU threads joined\n"); > > > > - if (p->use_uffd) { > > + if (p->uffd_mode) { > > char c; > > > > /* Tell the user fault fd handler threads to quit */ > > @@ -368,7 +407,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) > > > > free(guest_data_prototype); > > free(vcpu_threads); > > - if (p->use_uffd) { > > + if (p->uffd_mode) { > > free(uffd_handler_threads); > > free(uffd_args); > > free(pipefds); > > @@ -378,11 +417,11 @@ static void run_test(enum vm_guest_mode mode, void *arg) > > static void help(char *name) > > { > > puts(""); > > - printf("usage: %s [-h] [-m mode] [-u] [-d uffd_delay_usec]\n" > > + printf("usage: %s [-h] [-m mode] [-u mode] [-d uffd_delay_usec]\n" > > NIT: maybe use uffd_mode or some word other than mode here to > disambiguate with -m > > > " [-b memory] [-t type] [-v vcpus] [-o]\n", name); > > guest_modes_help(); > > - printf(" -u: use User Fault FD to handle vCPU page\n" > > - " faults.\n"); > > + printf(" -u: use userfaultfd to handle vCPU page faults. Mode is a\n" > > + " UFFD registration mode: 'MISSING' or 'MINOR'.\n"); > > printf(" -d: add a delay in usec to the User Fault\n" > > " FD handler to simulate demand paging\n" > > " overheads. Ignored without -u.\n"); > > @@ -409,13 +448,17 @@ int main(int argc, char *argv[]) > > > > guest_modes_append_default(); > > > > - while ((opt = getopt(argc, argv, "hm:ud:b:t:v:o")) != -1) { > > + while ((opt = getopt(argc, argv, "hm:u:d:b:t:v:o")) != -1) { > > switch (opt) { > > case 'm': > > guest_modes_cmdline(optarg); > > break; > > case 'u': > > - p.use_uffd = true; > > + if (!strcmp("MISSING", optarg)) > > + p.uffd_mode = UFFDIO_REGISTER_MODE_MISSING; > > + else if (!strcmp("MINOR", optarg)) > > + p.uffd_mode = UFFDIO_REGISTER_MODE_MINOR; > > + TEST_ASSERT(p.uffd_mode, "UFFD mode must be 'MISSING' or 'MINOR'."); > > break; > > case 'd': > > p.uffd_delay = strtoul(optarg, NULL, 0); > > @@ -442,6 +485,9 @@ int main(int argc, char *argv[]) > > } > > } > > > > + TEST_ASSERT(p.uffd_mode != UFFDIO_REGISTER_MODE_MINOR || p.src_type == VM_MEM_SRC_SHMEM, > > + "userfaultfd MINOR mode requires shared memory; pick a different -t"); > > + > > for_each_guest_mode(run_test, &p); > > > > return 0; > > -- > > 2.31.1.751.gd2f1c929bd-goog > >