Subject: Re: [patch 2/3] KVM: x86: KVM_HC_RT_PRIO hypercall (host-side)
To: Marcelo Tosatti
Cc: Konrad Rzeszutek Wilk, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20170921113835.031375194@redhat.com> <20170921114039.364395490@redhat.com> <20170921133212.GN26248@char.us.oracle.com> <20170922010811.GA20133@amt.cnet>
From: Paolo Bonzini
Message-ID: <29aadd63-ddfe-0ddc-2d71-8c0391db0ba4@redhat.com>
Date: Fri, 22 Sep 2017 09:23:47 +0200
In-Reply-To: <20170922010811.GA20133@amt.cnet>

On 22/09/2017 03:08, Marcelo Tosatti wrote:
> On Thu, Sep 21, 2017 at 03:49:33PM +0200, Paolo Bonzini wrote:
>> On 21/09/2017 15:32, Konrad Rzeszutek Wilk wrote:
>>> So the guest can change the scheduling decisions at the host level?
>>> And the host HAS to follow it? There is no policy override for the
>>> host to say - nah, not going to do it?
>
> In that case the host should not even configure the guest with this
> option (this is QEMU's 'enable-rt-fifo-hc' option).
>
>>> Also wouldn't the guest want to always be at SCHED_FIFO? [I am thinking
>>> of a guest admin who wants all the CPU resources he can get]
>
> No. Because in the following code, executed by the housekeeping vCPU
> running at constant SCHED_FIFO priority:
>
>	1. Start disk I/O.
>	2. busy spin
>
> With the emulator thread sharing the same pCPU with the housekeeping
> vCPU, the emulator thread (which runs at SCHED_NORMAL), will never
> be scheduled in in place of the vcpu thread at SCHED_FIFO.
>
> This causes a hang.

But if the emulator thread can interrupt the housekeeping thread, the
emulator thread should also be SCHED_FIFO at higher priority; IIRC this
was in Jan's talk from a few years ago.

QEMU would also have to use PI mutexes (which is the main reason why
it's using QemuMutex instead of e.g. GMutex).

>> Yeah, I do not understand why there should be a housekeeping VCPU that
>> is running at SCHED_NORMAL.  If it hurts, don't do it...
>
> Hope explanation above makes sense (in fact, it was you who pointed
> out SCHED_FIFO should not be constant on the housekeeping vCPU,
> when sharing pCPU with emulator thread at SCHED_NORMAL).

The two are not exclusive... As you point out, it depends on the
workload.  For DPDK you can put both of them at SCHED_NORMAL.  For
kernel-intensive uses you must use SCHED_FIFO.

Perhaps we could consider running these threads at SCHED_RR instead.
Unlike with SCHED_NORMAL, I am not against a hypercall that temporarily
bumps SCHED_RR to SCHED_FIFO, but perhaps that's not even necessary.

Paolo
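
For readers following the thread, here is a minimal userspace sketch of the
two pieces mentioned above: a mutex with priority inheritance, and an
emulator thread placed at SCHED_FIFO above the vCPU thread. This is not
QEMU code. The function names are hypothetical, and "one level above the
vCPU priority" is only an illustrative assumption about how the priorities
might be chosen.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Initialise a mutex that uses priority inheritance (a "PI mutex").
 * Returns 0 on success or an errno-style value on failure. */
int pi_mutex_init(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int err;

    assert(m != NULL);
    err = pthread_mutexattr_init(&attr);
    if (err)
        return err;
    err = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (!err)
        err = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return err;
}

/* Hypothetical policy: run the emulator thread one level above the
 * vCPU thread, clamped to the valid SCHED_FIFO priority range. */
int emulator_fifo_prio(int vcpu_prio)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);
    int prio = vcpu_prio + 1;

    if (prio < lo)
        prio = lo;
    if (prio > hi)
        prio = hi;
    return prio;
}

/* Switch a thread to SCHED_FIFO at the given priority. This normally
 * needs CAP_SYS_NICE or a suitable RLIMIT_RTPRIO; without it the call
 * returns EPERM. */
int thread_set_fifo(pthread_t t, int prio)
{
    struct sched_param sp = { .sched_priority = prio };

    return pthread_setschedparam(t, SCHED_FIFO, &sp);
}
```

With priority inheritance, a lower-priority thread holding the lock is
boosted to the priority of the highest-priority waiter until it unlocks,
which is what keeps a SCHED_FIFO thread from waiting indefinitely on a lock
owner that never gets to run.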