From: Jim Mattson
Date: Thu, 27 Jan 2022 11:39:42 -0800
Subject: Re: Why do we need KVM_REQ_GET_NESTED_STATE_PAGES after all
To: Maxim Levitsky
Cc: kvm@vger.kernel.org, Sean Christopherson, Paolo Bonzini, David Gilbert, Vitaly Kuznetsov, Peter Xu

On Thu, Jan 27, 2022 at 8:04 AM Maxim Levitsky wrote:
>
> I would like to raise a question about this elephant in the room which I
> wanted to understand for quite a long time.
>
> For my nested AVIC work I once again need to change the
> KVM_REQ_GET_NESTED_STATE_PAGES code, and once again I am asking myself:
> maybe we can get rid of this code after all?
We (GCE) use it so that, during post-copy, a vCPU thread can exit to userspace and demand these pages from the source itself, rather than funneling all demands through a single "demand paging listener" thread, which I believe is the equivalent of qemu's userfaultfd "fault handler" thread.

Our (internal) post-copy mechanism scales quite well, because most demand paging requests are triggered by an EPT violation, which happens to be a convenient place to exit to userspace. Very few pages are typically demanded as a result of kvm_vcpu_{read,write}_guest, where the vCPU thread is so deep in the kernel call stack that it has to request the page via the demand paging listener thread. With nested virtualization, the various vmcs12 pages consulted directly by kvm (bypassing the EPT tables) were a scalability issue. (Note that, unlike upstream, we don't call nested_get_vmcs12_pages directly from VMLAUNCH/VMRESUME emulation; we always call it as a result of this request that you don't like.)

As we work on converting from our (hacky) demand paging scheme to userfaultfd, we will have to solve the scalability issue anyway (unless someone else beats us to it). Eventually, I expect that our need for this request will go away.

Honestly, without the exits to userspace, I don't really see how this request buys you anything upstream. When I originally submitted it, I was prepared for rejection, but Paolo said that qemu had a similar need for it, and I happily never questioned that assertion.
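
For anyone reading along without the source in front of them, the deferred-request pattern in question looks roughly like this (a simplified sketch of the upstream flow, not verbatim kernel code; error handling and surrounding context are elided):

/*
 * Abridged from upstream KVM (arch/x86/kvm/x86.c plus the vendor nested
 * code). After KVM_SET_NESTED_STATE, the vmcs12 pages cannot be mapped
 * immediately, because userspace may not have finished restoring the
 * guest memory that backs them, so the lookup is deferred to the next
 * vcpu_run(), where exiting to userspace is possible.
 */

/* Nested state load path: just record that the pages are needed. */
kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);

/* vcpu_enter_guest(), before (re)entering the guest: */
if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
	if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu))) {
		r = 0;	/* drop back out of the run loop */
		goto out;
	}
}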
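
And for readers unfamiliar with the qemu analogue mentioned above: a userfaultfd "fault handler" thread is essentially one loop that reads fault events and resolves them a page at a time, which is why funneling every nested-state page through a single such thread becomes the bottleneck. A minimal sketch, assuming the uffd has already been created and registered (UFFDIO_API + UFFDIO_REGISTER, omitted); fetch_page_from_source() is a hypothetical placeholder, not a real API:

#include <linux/userfaultfd.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical: pulls the faulting page's contents from the migration
 * source into buf; stands in for qemu's postcopy RAM fetch. */
extern void fetch_page_from_source(unsigned long addr, void *buf);

/* Single-threaded fault-handler loop: every missing-page fault on the
 * registered range funnels through here, one page at a time. */
static void handle_faults(int uffd, void *page_buf, unsigned long page_size)
{
	for (;;) {
		struct pollfd pfd = { .fd = uffd, .events = POLLIN };
		struct uffd_msg msg;

		if (poll(&pfd, 1, -1) < 0)
			break;
		if (read(uffd, &msg, sizeof(msg)) != sizeof(msg))
			continue;
		if (msg.event != UFFD_EVENT_PAGEFAULT)
			continue;

		fetch_page_from_source(msg.arg.pagefault.address, page_buf);

		/* Resolve the fault by copying the page into place. */
		struct uffdio_copy copy = {
			.dst = msg.arg.pagefault.address & ~(page_size - 1),
			.src = (unsigned long)page_buf,
			.len = page_size,
		};
		ioctl(uffd, UFFDIO_COPY, &copy);
	}
}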