From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S964909AbcIPPBt (ORCPT ); Fri, 16 Sep 2016 11:01:49 -0400
Received: from mx1.redhat.com ([209.132.183.28]:46254 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S935704AbcIPPAI (ORCPT ); Fri, 16 Sep 2016 11:00:08 -0400
Date: Fri, 16 Sep 2016 16:59:58 +0200
From: Radim Krčmář
To: Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, dmatlack@google.com, luto@kernel.org, peterhornyack@google.com, x86@kernel.org
Subject: Re: [PATCH 2/2] x86, kvm: use kvmclock to compute TSC deadline value
Message-ID: <20160916145957.GF17296@potion>
References: <1473200999-123004-1-git-send-email-pbonzini@redhat.com> <1473200999-123004-3-git-send-email-pbonzini@redhat.com> <20160915150851.GA15815@potion> <20160915195949.GA17095@potion>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To:
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Fri, 16 Sep 2016 15:00:02 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

2016-09-15 23:02+0200, Paolo Bonzini:
> On 15/09/2016 21:59, Radim Krčmář wrote:
>> 2016-09-15 18:00+0200, Paolo Bonzini:
>>>> When we are already going the paravirtual route, we could add an
>>>> interface that accepts the deadline in kvmclock nanoseconds.
>>>> It would be much more maintainable than adding a fragile paravirtual
>>>> layer on top of random interfaces.
>>>
>>> Good idea.
>>
>> I'll prepare a prototype.
>
> So how would this work? A single MSR, used after setting TSC deadline
> mode in LVTT? Could you write it and read TSC deadline or vice versa?

So far, I think that adding KVM_MSR_DEADLINE (probably with a more
descriptive name in the end) that works only in LVTT mode seems reasonable.
I am tempted to add a second LVTT-like MSR to completely isolate it from
LAPIC timers, but sharing the VMX_PREEMPTION_TIMER would be needlessly
complicated.

> Could you write it and read TSC deadline or vice versa?

KVM_MSR_DEADLINE would be an interface in kvmclock nanosecond values and
MSR_IA32_TSCDEADLINE an interface in TSC values.
KVM_MSR_DEADLINE would follow rules similar to MSR_IA32_TSCDEADLINE --
the interrupt fires when kvmclock reaches the value, you read what you
write, and 0 disarms it.

If the TSC deadline timer was enabled, then the guest could write to
both MSR_IA32_TSCDEADLINE and KVM_MSR_DEADLINE, but only one could be
armed at any time (a non-zero write to one will set the other to 0).

The dual interface would allow unconditional addition of the PV feature
without regressing users that currently use MSR_IA32_TSCDEADLINE and
have adapted their stack to handle KVM's TSC shortcomings ...

> My idea would be "yes" for writing nsec deadline and reading TSC
> deadline, but "no" for writing TSC deadline and reading nsec deadline.
> In the latter case, reading nsec deadline might return an impossible
> value such as -1;

Both MSRs would read back what was written, or 0 if the timer fired or
was disarmed in between.
I'm not sure if I understood what you meant, though.

> this lets userspace decide whether to set a nsec-based
> deadline or a TSC-based deadline after migration.

Hm, isn't switching to a TSC-based deadline after migration pointless?

We don't have any migration notifiers, so the guest would always have
to check which interface to use.

>>> This still wouldn't handle old hosts of course.
>>
>> The question is whether we want to carry around 150 LOC because of old
>> hosts. I'd just fix Linux to avoid deadline TSC without invariant TSC.
>> :)
>
> Yes, that would automatically blacklist it on KVM. You'd also need to
> update the recent optimization to the TSC deadline timer, to also work
> on other APIC timer modes or at least in your new PV mode.

Supporting all modes shouldn't be much harder than supporting just the
PV mode.

Thanks.
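
For illustration only, the MSR rules proposed above (read what you
write, 0 disarms, a non-zero write to one MSR zeroes the other, and a
fired timer reads back as 0) could be modelled roughly as below. This is
a standalone userspace sketch, not KVM code: struct deadline_timer,
enum deadline_msr and the deadline_*() helpers are hypothetical names,
and the choice that a zero write leaves the other MSR untouched is an
assumption, since the text above does not settle it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical selector for the two deadline MSRs discussed above. */
enum deadline_msr { DEADLINE_TSC, DEADLINE_KVMCLOCK };

/* Hypothetical model state; 0 means "disarmed" for either field. */
struct deadline_timer {
	uint64_t tsc_deadline;  /* models MSR_IA32_TSCDEADLINE, in TSC ticks */
	uint64_t nsec_deadline; /* models proposed KVM_MSR_DEADLINE, in kvmclock ns */
};

/*
 * Write one MSR.  A non-zero value arms it and sets the other to 0;
 * a zero value disarms only the written MSR (assumption).
 */
static void deadline_write(struct deadline_timer *t, enum deadline_msr msr,
			   uint64_t val)
{
	assert(t != NULL);

	if (msr == DEADLINE_TSC) {
		t->tsc_deadline = val;
		if (val)
			t->nsec_deadline = 0;
	} else {
		t->nsec_deadline = val;
		if (val)
			t->tsc_deadline = 0;
	}
}

/* Read back what was written, or 0 if fired/disarmed in between. */
static uint64_t deadline_read(const struct deadline_timer *t,
			      enum deadline_msr msr)
{
	assert(t != NULL);
	return msr == DEADLINE_TSC ? t->tsc_deadline : t->nsec_deadline;
}

/*
 * Check the armed deadline against the current clocks.  Returns true
 * if the interrupt would fire; firing clears the armed MSR.
 */
static bool deadline_check(struct deadline_timer *t, uint64_t tsc_now,
			   uint64_t kvmclock_now_ns)
{
	assert(t != NULL);

	if (t->tsc_deadline && tsc_now >= t->tsc_deadline) {
		t->tsc_deadline = 0;
		return true;
	}
	if (t->nsec_deadline && kvmclock_now_ns >= t->nsec_deadline) {
		t->nsec_deadline = 0;
		return true;
	}
	return false;
}
```

In this model, arming the kvmclock deadline after a TSC deadline leaves
MSR_IA32_TSCDEADLINE reading 0, so at most one of the two is ever armed.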