Date: Wed, 20 Jan 2021 09:25:13 -0800
From: Sean Christopherson
To: Vitaly Kuznetsov
Cc: Igor Mammedov, kvm@vger.kernel.org, Paolo Bonzini, Wanpeng Li, Jim Mattson
Subject: Re: [PATCH RFC 3/4] KVM: Define KVM_USER_MEM_SLOTS in arch-neutral include/linux/kvm_host.h
References: <20210115131844.468982-1-vkuznets@redhat.com> <20210115131844.468982-4-vkuznets@redhat.com>
 <87wnwa608c.fsf@vitty.brq.redhat.com> <20210120123400.7936e526@redhat.com>
 <87czxz6cl9.fsf@vitty.brq.redhat.com>
In-Reply-To: <87czxz6cl9.fsf@vitty.brq.redhat.com>

On Wed, Jan 20, 2021, Vitaly Kuznetsov wrote:
> Igor Mammedov writes:
> 
> > On Tue, 19 Jan 2021 09:20:42 -0800
> > Sean Christopherson wrote:
> >
> >> Were you planning on adding a capability to check for the new and improved
> >> memslots limit, e.g. to know whether or not KVM might die on a large VM?
> >> If so, requiring the VMM to call an ioctl() to set a higher (or lower?)
> >> limit would be another option.  That wouldn't have the same permission
> >> requirements as a module param, but it would likely be a more effective
> >> safeguard in practice, e.g. use cases with a fixed number of memslots or a
> >> well-defined upper bound could use the capability to limit themselves.
> >
> > Currently QEMU uses KVM_CAP_NR_MEMSLOTS to get the limit, and depending on
> > where the limit is reached it either fails gracefully (i.e. it checks that
> > a free slot is available before allocating one) or aborts (in the case
> > where it tries to allocate a slot without checking).
> 
> FWIW, the 'synic problem' causes it to abort.
> 
> > A new ioctl() seems redundant as we already have the upper limit check
> > (unless it would allow going over that limit, which in turn defeats the
> > purpose of the limit).

Gah, and I even looked for an ioctl().  No idea how I didn't find NR_MEMSLOTS.

> Right, I didn't plan to add any new CAP as what we already have should
> be enough to query the limits.  Having an ioctl() to set the upper limit
> seems complicated: with device and CPU hotplug it may not be easy to
> guess what it should be upfront, so VMMs will likely just add a call to
> raise the limit in their memslot modification code, and then it won't be
> providing any new protection.

True.  Maybe the best approach, if we want to make the limit configurable, would
be to make a lower limit opt-in, i.e. default KVM_CAP_NR_MEMSLOTS=SHRT_MAX, with
an ioctl() to set a lower limit.  That would also allow us to defer adding a new
ioctl() until someone actually plans on using it.

> >> Thoughts?  An ioctl() feels a little over-engineered, but I suspect that
> >> adding a module param that defaults to N*KVM_MAX_VCPUS will be a waste,
> >> e.g. no one will ever touch the param and we'll end up with dead,
> >> rarely-tested code.
> 
> Alternatively, we can hard-code KVM_USER_MEM_SLOTS to N*KVM_MAX_VCPUS so
> no new parameter is needed, but personally I'd prefer to have it
> configurable (in case we decide to set it to something lower than
> SHRT_MAX, of course) even if it won't be altered very often (which is a
> good thing for 'general purpose' usage, right?).
> 
> First, it will allow tightening the limit for some very specific deployments
> (e.g. FaaS/Firecracker-style) to, say, '20', which should be enough.

A module param likely isn't usable for many such deployments though, as it would
require a completely isolated pool of systems.  That's why an ioctl() is
appealing; the expected number of memslots is a property of the VM, not of the
platform.
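E.g. a rough sketch of what the userspace side might look like (completely
untested; the KVM_CAP_NR_MEMSLOTS query is existing uAPI, but the
KVM_CAP_SET_NR_MEMSLOTS_LIMIT cap, its number, and its semantics are made up
here purely for illustration):

  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* NOT real uAPI: an invented cap number for this sketch only. */
  #define KVM_CAP_SET_NR_MEMSLOTS_LIMIT 0x7fff

  int main(void)
  {
          int kvm = open("/dev/kvm", O_RDWR);
          int vm = kvm < 0 ? -1 : ioctl(kvm, KVM_CREATE_VM, 0);

          if (vm < 0) {
                  perror("KVM_CREATE_VM");
                  return 1;
          }

          /* Existing uAPI: query how many user memslots KVM supports. */
          int max_slots = ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_NR_MEMSLOTS);
          printf("KVM supports %d user memslots\n", max_slots);

          /*
           * Hypothetical opt-in lower limit: a FaaS-style VMM that knows it
           * will never need more than 20 slots fences itself in at VM
           * creation.  On today's kernels this simply fails with EINVAL,
           * since the cap doesn't exist.
           */
          struct kvm_enable_cap cap = {
                  .cap = KVM_CAP_SET_NR_MEMSLOTS_LIMIT,
                  .args = { 20 },
          };
          if (ioctl(vm, KVM_ENABLE_CAP, &cap))
                  perror("KVM_ENABLE_CAP(SET_NR_MEMSLOTS_LIMIT)");

          /*
           * Either way, the VMM should compare its slot count against
           * max_slots before each KVM_SET_USER_MEMORY_REGION, which is the
           * "fails gracefully" path QEMU already implements.
           */
          return 0;
  }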
> Second, we may be overlooking some configurations where the number of
> memslots actually depends on the number of vCPUs, but nobody has complained
> so far just because those configurations use a fairly small number and the
> ceiling hasn't been hit yet.
> 
> One more spare thought.  Per-vCPU memslots may come in handy if someone
> decides to move some of the KVM PV features to userspace, e.g. I can
> imagine an attempt to move async_pf out of the kernel.

Memslots aren't per-vCPU though; any userspace feature that tries to treat them
as such will be flawed in some way.  Practically speaking, per-vCPU memslots
just aren't feasible, at least not without KVM changes that are likely
unpalatable, e.g. KVM would need to incorporate the existence of a vCPU-specific
memslot into the MMU role.
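To illustrate why that's painful (a grossly simplified, standalone sketch, not
a proposal; the real kvm_mmu_page_role has more fields and 'vcpu_view' is
invented): the role is effectively the hash key for shadow pages, so a
vCPU-specific memslot would force per-vCPU state into the key, and shadow
pages could no longer be shared across vCPUs:

  /* Standalone approximation of union kvm_mmu_page_role from
   * arch/x86/include/asm/kvm_host.h; most real bits elided. */
  typedef unsigned int u32;

  union kvm_mmu_page_role {
          u32 word;
          struct {
                  unsigned level:4;
                  unsigned direct:1;
                  /* ... the other real role bits ... */
                  unsigned smm:8;
                  /*
                   * Hypothetical: which vCPU's memslot view this shadow
                   * page was built for.  Every vCPU would then hash to its
                   * own shadow pages, fragmenting the MMU caches per-vCPU.
                   */
                  unsigned vcpu_view:10;
          };
  };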