Date: Fri, 1 Oct 2021 16:11:17 -0300
From: Marcelo Tosatti
To: Paolo Bonzini
Cc: Catalin Marinas, kvm@vger.kernel.org, Peter Shier, Marc Zyngier,
 David Matlack, Will Deacon, kvmarm@lists.cs.columbia.edu,
 linux-arm-kernel@lists.infradead.org, Jim Mattson
Subject: Re: [PATCH v8 7/7] KVM: x86: Expose TSC offset controls to userspace
Message-ID: <20211001191117.GA69579@fuller.cnet>
In-Reply-To: <7901cb84-052d-92b6-1e6a-028396c2c691@redhat.com>
References: <20210916181538.968978-1-oupton@google.com>
 <20210916181538.968978-8-oupton@google.com>
 <20210930191416.GA19068@fuller.cnet>
 <48151d08-ee29-2b98-b6e1-f3c8a1ff26bc@redhat.com>
 <20211001103200.GA39746@fuller.cnet>
 <7901cb84-052d-92b6-1e6a-028396c2c691@redhat.com>

On Fri, Oct 01, 2021 at 05:12:20PM +0200, Paolo Bonzini wrote:
> On 01/10/21 12:32, Marcelo Tosatti wrote:
> > > +1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (t_0),
> > > +   kvmclock nanoseconds (k_0), and realtime nanoseconds (r_0).
> > > +
> > > [...]
> > > +4. Invoke the KVM_SET_CLOCK ioctl, providing the kvmclock nanoseconds
> > > +   (k_0) and realtime nanoseconds (r_0) in their respective fields.
> > > +   Ensure that the KVM_CLOCK_REALTIME flag is set in the provided
> > > +   structure. KVM will advance the VM's kvmclock to account for elapsed
> > > +   time since recording the clock values.
> >
> > You can't advance both kvmclock (kvmclock_offset variable) and the
> > TSCs, which would be double counting.
> >
> > So you have to either add the elapsed realtime (1) between
> > KVM_GET_CLOCK to kvmclock (which this patch is doing), or to the
> > TSCs. If you do both, there is double counting. Am i missing
> > something?
>
> Probably one of these two (but it's worth pointing out both of them):
>
> 1) the attribute that's introduced here *replaces*
> KVM_SET_MSR(MSR_IA32_TSC), so the TSC is not added.
>
> 2) the adjustment formula later in the algorithm does not care about how
> much time passed between step 1 and step 4. It just takes two well
> known (TSC, kvmclock) pairs, and uses them to ensure the guest TSC is
> the same on the destination as if the guest was still running on the
> source.
> It is irrelevant that one of them is before migration and one
> is after, all it matters is that one is on the source and one is on the
> destination.

OK, so it still relies on the NTP daemon to fix the CLOCK_REALTIME delay
which is introduced during migration (which is what I would guess is
the lower hanging fruit) (for guests using TSC).

My point was that, by advancing the _TSC value_ by:

T0. stop guest vcpus    (source)
T1. KVM_GET_CLOCK       (source)
T2. KVM_SET_CLOCK       (destination)
T3. Write guest TSCs    (destination)
T4. resume guest        (destination)

new_off_n = t_0 + off_n + (k_1 - k_0) * freq - t_1

t_0:    host TSC at KVM_GET_CLOCK time.
off_n:  TSC offset at vcpu-n (as long as no guest TSC writes are performed,
        TSC offset is fixed).
...

+4. Invoke the KVM_SET_CLOCK ioctl, providing the kvmclock nanoseconds
+   (k_0) and realtime nanoseconds (r_0) in their respective fields.
+   Ensure that the KVM_CLOCK_REALTIME flag is set in the provided
+   structure. KVM will advance the VM's kvmclock to account for elapsed
+   time since recording the clock values.

Only kvmclock is advanced (by passing r_0). But a guest might not use
kvmclock (hopefully modern guests on modern hosts will use the TSC
clocksource, whose clock_gettime is faster... some people are using that
already).

At some point QEMU should enable the invariant TSC flag by default?

That said, the point is: why not advance the _TSC_ values
(instead of kvmclock nanoseconds), as doing so would reduce
the "CLOCK_REALTIME delay which is introduced during migration"
for both kvmclock users and modern TSC clocksource users.

So yes, I also like this patchset, but would like it even more
if it fixed the case above as well (and I am not sure whether adding
the migration delta to KVMCLOCK makes it harder to fix the TSC case
later).

> Perhaps we can add to step 6 something like:
>
> > +6. Adjust the guest TSC offsets for every vCPU to account for (1) time
> > +   elapsed since recording state and (2) difference in TSCs between the
> > +   source and destination machine:
> > +
> > +   new_off_n = t_0 + off_n + (k_1 - k_0) * freq - t_1
> > +
>
> "off + t - k * freq" is the guest TSC value corresponding to a time of 0
> in kvmclock. The above formula ensures that it is the same on the
> destination as it was on the source.
>
> Also, the names are a bit hard to follow. Perhaps
>
>	t_0		tsc_src
>	t_1		tsc_dest
>	k_0		guest_src
>	k_1		guest_dest
>	r_0		host_src
>	off_n		ofs_src[i]
>	new_off_n	ofs_dest[i]
>
> Paolo
>

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
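For readers following the arithmetic, here is a minimal sketch of the step 6
offset formula and of the property Paolo describes, namely that
"off + t - k * freq" (the guest TSC value at kvmclock time 0) comes out the
same on the destination as on the source. It is not taken from the patch.
The function names, the use of a kHz frequency value, and the sample numbers
are all assumptions made for illustration; the thread does not state the
unit of freq, so the sketch converts kvmclock nanoseconds to TSC cycles
explicitly.

```python
def ns_to_cycles(ns, tsc_khz):
    """Convert nanoseconds to TSC cycles (assumption: frequency given in kHz)."""
    # tsc_khz kHz is tsc_khz * 1000 cycles per second,
    # i.e. tsc_khz / 1_000_000 cycles per nanosecond.
    return ns * tsc_khz // 1_000_000


def new_tsc_offset(t_0, off_n, k_0, t_1, k_1, tsc_khz):
    """new_off_n = t_0 + off_n + (k_1 - k_0) * freq - t_1, as quoted in the thread."""
    return t_0 + off_n + ns_to_cycles(k_1 - k_0, tsc_khz) - t_1


def guest_tsc_at_kvmclock_zero(host_tsc, offset, kvmclock_ns, tsc_khz):
    """'off + t - k * freq': the guest TSC value corresponding to kvmclock == 0."""
    return offset + host_tsc - ns_to_cycles(kvmclock_ns, tsc_khz)


# Made-up sample values: a 2 GHz guest TSC, so the conversion is exact.
TSC_KHZ = 2_000_000
t_0, off_n, k_0 = 1_000_000, -400_000, 5_000_000   # source (TSC, offset, kvmclock ns)
t_1, k_1 = 9_000_000, 5_250_000                    # destination (TSC, kvmclock ns)

new_off_n = new_tsc_offset(t_0, off_n, k_0, t_1, k_1, TSC_KHZ)

# The kvmclock-zero anchor is identical on both hosts.
src_anchor = guest_tsc_at_kvmclock_zero(t_0, off_n, k_0, TSC_KHZ)
dst_anchor = guest_tsc_at_kvmclock_zero(t_1, new_off_n, k_1, TSC_KHZ)
assert src_anchor == dst_anchor

# The guest TSC moved forward by exactly the kvmclock delta, in cycles.
guest_tsc_src = t_0 + off_n
guest_tsc_dst = t_1 + new_off_n
assert guest_tsc_dst - guest_tsc_src == ns_to_cycles(k_1 - k_0, TSC_KHZ)
```

In this sketch the guest TSC advances only by (k_1 - k_0), so whether the
migration downtime becomes visible to a TSC-clocksource guest depends on how
far kvmclock itself was advanced, which is the question raised above.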