Date: Thu, 10 Dec 2020 08:45:22 +0000
From: Marc Zyngier
To: Joel Fernandes, Quentin Perret
Cc: Sergey Senozhatsky, yezengruan, Will Deacon, LKML, Suleiman Souhlal,
    kvmarm@lists.cs.columbia.edu,
    "moderated list:ARM64 PORT (AARCH64 ARCHITECTURE)", "Wanghaibin (D)"
Subject: Re: [RFC][PATCH 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted
References: <20200721041742.197354-1-sergey.senozhatsky@gmail.com>
    <20200817020310.GA1210848@jagdpanzerIV.localdomain>
    <20200911085841.GB562@jagdpanzerIV.localdomain>
Message-ID: <78091359dab0d8decfc452f7c5c25971@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2020-12-10 01:39, Joel Fernandes wrote:

[...]

>> Quentin and I have discussed potential ways of improving guest scheduling
>> on terminally broken systems (otherwise known as big-little), in the
>> form of a capacity request from the guest to the host. I'm not really
>> keen on the host exposing its own capacity, as that doesn't tell the
>> host what the guest actually needs.
>
> I am not sure how a capacity request could work well. It seems the
> cost of a repeated hypercall could be prohibitive. In this case, a
> lighter approach might be for KVM to restrict vCPU threads to run on
> certain types of cores, and pass the capacity information to the guest
> at guest's boot time.

That seems like a very narrow use case. If you actually pin vcpus to
physical CPU classes, DT is the right place to put things, because
it is completely static. This is effectively creating a virtual
big-little, which is in my opinion a userspace job.

> This would be a one-time cost to pay. And then,
> then the guest scheduler can handle the scheduling appropriately
> without any more hypercalls. Thoughts?

Anything that is a one-off belongs to firmware configuration, IMO.

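
For illustration only, a minimal sketch of what the static, userspace-side
setup discussed above could look like: group host CPUs into capacity
classes, then restrict each vCPU thread to the class whose capacity the
VMM intends to advertise to the guest. The function names and the
capacity values are hypothetical and not taken from any existing VMM;
generating the guest DT itself is left out. os.sched_setaffinity() is
the standard Linux-only call in Python's os module.

```python
import os
from collections import defaultdict


def group_by_capacity(capacities):
    """Group host CPU ids into classes keyed by their capacity value.

    `capacities` maps a host CPU id to a capacity number (hypothetical
    input; on a real host it would have to be read from the platform).
    """
    classes = defaultdict(list)
    for cpu, cap in sorted(capacities.items()):
        classes[cap].append(cpu)
    return dict(classes)


def plan_pinning(classes, vcpu_caps):
    """Map each vCPU index to the set of host CPUs it may run on.

    vcpu_caps[i] is the capacity the VMM plans to describe for vCPU i
    in the guest firmware tables; it must match an existing host class.
    """
    plan = {}
    for vcpu, cap in enumerate(vcpu_caps):
        if cap not in classes:
            raise ValueError(f"no host CPU class with capacity {cap}")
        plan[vcpu] = set(classes[cap])
    return plan


def apply_pinning(plan, vcpu_tids):
    """Restrict each vCPU thread (Linux thread id) to its planned CPUs."""
    for vcpu, cpus in plan.items():
        os.sched_setaffinity(vcpu_tids[vcpu], cpus)
```

With a made-up host of two little CPUs (0, 1) and two big CPUs (2, 3),
plan_pinning(group_by_capacity({0: 446, 1: 446, 2: 1024, 3: 1024}),
[1024, 446]) confines vCPU 0 to CPUs 2-3 and vCPU 1 to CPUs 0-1. Nothing
here needs to change at runtime, which is why no hypercall is involved.
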
The case I'm concerned with is when vcpus are allowed to roam across
the system, and hit random physical CPUs because the host has no idea
of the workload the guest deals with (especially as the AMU counters
are either absent or unusable on any available core).

The cost of a hypercall really depends on where you terminate it.
If it is a shallow exit, that's only a few hundred cycles on any
half-baked CPU. Go all the way to userspace, and the host scheduler
is the limit. But the frequency of that hypercall obviously matters too.

How often do you expect the capacity request to fire? Probably not
on each and every time slice, right?

Quentin, can you shed some light on this?

Thanks,

        M.
-- 
Jazz is not dead. It just smells funny...