From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andy Lutomirski Subject: Re: kvmclock doesn't work, help? Date: Wed, 9 Dec 2015 14:27:36 -0800 Message-ID: References: <56689A2B.6090500@redhat.com> <5668A76A.7050707@redhat.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary=e89a8f923faa0cbb3b05267e9a27 Cc: kvm list , Marcelo Tosatti , Radim Krcmar , X86 ML , Alexander Graf To: Paolo Bonzini Return-path: Received: from mail-ob0-f181.google.com ([209.85.214.181]:33746 "EHLO mail-ob0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751324AbbLIW14 (ORCPT ); Wed, 9 Dec 2015 17:27:56 -0500 Received: by obbww6 with SMTP id ww6so46732778obb.0 for ; Wed, 09 Dec 2015 14:27:55 -0800 (PST) In-Reply-To: <5668A76A.7050707@redhat.com> Sender: kvm-owner@vger.kernel.org List-ID: --e89a8f923faa0cbb3b05267e9a27 Content-Type: text/plain; charset=UTF-8 On Wed, Dec 9, 2015 at 2:12 PM, Paolo Bonzini wrote: > > > On 09/12/2015 22:49, Andy Lutomirski wrote: >> On Wed, Dec 9, 2015 at 1:16 PM, Paolo Bonzini wrote: >>> >>> >>> On 09/12/2015 22:10, Andy Lutomirski wrote: >>>> Can we please stop making kvmclock more complex? It's a beast right >>>> now, and not in a good way. It's far too tangled with the vclock >>>> machinery on both the host and guest sides, the pvclock stuff is not >>>> well thought out (even in principle in an ABI sense), and it's never >>>> been clear to my what problem exactly the kvmclock stuff is supposed >>>> to solve. >>> >>> It's supposed to solve the problem that: >>> >>> - not all hosts have a working TSC >> >> Fine, but we don't need any vdso integration for that. > > Well, you still want a fast time source. That was a given. :) If the host can't figure out how to give *itself* a fast time source, I'd be surprised if KVM can manage to give the guest a fast, reliable time source. > >>> - even if they all do, virtual machines can be migrated (or >>> saved/restored) to a host with a different TSC frequency >>> >>> - any MMIO- or PIO-based mechanism to access the current time is orders >>> of magnitude slower than the TSC and less precise too. >> >> Yup. But TSC by itself gets that benefit, too. > > Yes, the problem is if you want to solve all three of them. The first > two are solved by the ACPI PM timer with a decent resolution (70 > ns---much faster anyway than an I/O port access). The third is solved > by TSC. To solve all three, you need kvmclock. Still confused. Is kvmclock really used in cases where even the host can't pull of working TSC? > >>>> I'm somewhat tempted to suggest that we delete kvmclock entirely and >>>> start over. A correctly functioning KVM guest using TSC (i.e. >>>> ignoring kvmclock entirely) seems to work rather more reliably and >>>> considerably faster than a kvmclock guest. >>> >>> If all your hosts have a working TSC and you don't do migration or >>> save/restore, that's a valid configuration. It's not a good default, >>> however. >> >> Er? >> >> kvmclock is still really quite slow and buggy. > > Unless it takes 3-4000 clock cycles for a gettimeofday, which it > shouldn't even with vdso disabled, it's definitely not slower than PIO. > >> And the patch I identified is definitely a problem here: >> >> [ 136.131241] KVM: disabling fast timing permanently due to inability >> to recover from suspend >> >> I got that on the host with this whitespace-damaged patch: >> >> if (backwards_tsc) { >> u64 delta_cyc = max_tsc - local_tsc; >> + if (!backwards_tsc_observed) >> + pr_warn("KVM: disabling fast timing >> permanently due to inability to recover from suspend\n"); >> >> when I suspended and resumed. >> >> Can anyone explain what problem >> 16a9602158861687c78b6de6dc6a79e6e8a9136f is supposed to solve? On >> brief inspection, it just seems to be incorrect. Shouldn't KVM's >> normal TSC logic handle that case right? After all, all vcpus should >> be paused when we resume from suspend. At worst, we should just need >> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu) on all vcpus. (Actually, >> shouldn't we do that regardless of which way the TSC jumped on >> suspend/resume? After all, the jTSC-to-wall-clock offset is quite >> likely to change except on the very small handful of CPUs (if any) >> that keep the TSC running in S3 and hibernate. > > I don't recall the details of that patch, so Marcelo will have to answer > this, or Alex too since he chimed in the original thread. At least it > should be made conditional on the existence of a VM at suspend time (and > the master clock stuff should be made per VM, as I suggested at > https://www.mail-archive.com/kvm@vger.kernel.org/msg102316.html). > > It would indeed be great if the master clock could be dropped. But I'm > definitely missing some of the subtle details. :( Me, too. Anyway, see the attached untested patch. Marcelo? --Andy --e89a8f923faa0cbb3b05267e9a27 Content-Type: text/x-patch; charset=US-ASCII; name="0001-x86-kvm-On-KVM-re-enable-e.g.-after-suspect-update-c.patch" Content-Disposition: attachment; filename="0001-x86-kvm-On-KVM-re-enable-e.g.-after-suspect-update-c.patch" Content-Transfer-Encoding: base64 X-Attachment-Id: f_ihzdz9hx0 RnJvbSBlNGE1ZTgzNGQzZmI2ZmMyNDk5OTY2ZTFhZjQyY2I1YmQ1OWY0NDEwIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpNZXNzYWdlLUlkOiA8ZTRhNWU4MzRkM2ZiNmZjMjQ5OTk2NmUxYWY0MmNi NWJkNTlmNDQxMC4xNDQ5NzAwMDI0LmdpdC5sdXRvQGtlcm5lbC5vcmc+CkZyb206IEFuZHkgTHV0 b21pcnNraSA8bHV0b0BrZXJuZWwub3JnPgpEYXRlOiBXZWQsIDkgRGVjIDIwMTUgMTQ6MjE6MDUg LTA4MDAKU3ViamVjdDogW1BBVENIXSB4ODYva3ZtOiBPbiBLVk0gcmUtZW5hYmxlIChlLmcuIGFm dGVyIHN1c3BlY3QpLCB1cGRhdGUgY2xvY2tzCgpUaGlzIGdldHMgcmlkIG9mIHRoZSAiZGlkIFRT QyBnbyBiYWNrd2FyZHMiIGxvZ2ljIGFuZCBqdXN0IHVwZGF0ZXMKYWxsIGNsb2Nrcy4gIEl0IHNo b3VsZCB3b3JrIGJldHRlciAobm8gbW9yZSBkaXNhYmxpbmcgb2YgZmFzdAp0aW1pbmcpIGFuZCBt b3JlIHJlbGlhYmx5IChhbGwgb2YgdGhlIGNsb2NrcyBhcmUgYWN0dWFsbHkgdXBkYXRlZCkuCgpT aWduZWQtb2ZmLWJ5OiBBbmR5IEx1dG9taXJza2kgPGx1dG9Aa2VybmVsLm9yZz4KLS0tCiBhcmNo L3g4Ni9rdm0veDg2LmMgfCA3NSArKystLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t LS0tLS0tLS0tLS0tLS0tLS0KIDEgZmlsZSBjaGFuZ2VkLCAzIGluc2VydGlvbnMoKyksIDcyIGRl bGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL2FyY2gveDg2L2t2bS94ODYuYyBiL2FyY2gveDg2L2t2 bS94ODYuYwppbmRleCBlZWQzMjI4M2QyMmMuLmM4OGY5MWY0YjFhMyAxMDA2NDQKLS0tIGEvYXJj aC94ODYva3ZtL3g4Ni5jCisrKyBiL2FyY2gveDg2L2t2bS94ODYuYwpAQCAtMTIzLDggKzEyMyw2 IEBAIG1vZHVsZV9wYXJhbSh0c2NfdG9sZXJhbmNlX3BwbSwgdWludCwgU19JUlVHTyB8IFNfSVdV U1IpOwogdW5zaWduZWQgaW50IF9fcmVhZF9tb3N0bHkgbGFwaWNfdGltZXJfYWR2YW5jZV9ucyA9 IDA7CiBtb2R1bGVfcGFyYW0obGFwaWNfdGltZXJfYWR2YW5jZV9ucywgdWludCwgU19JUlVHTyB8 IFNfSVdVU1IpOwogCi1zdGF0aWMgYm9vbCBfX3JlYWRfbW9zdGx5IGJhY2t3YXJkc190c2Nfb2Jz ZXJ2ZWQgPSBmYWxzZTsKLQogI2RlZmluZSBLVk1fTlJfU0hBUkVEX01TUlMgMTYKIAogc3RydWN0 IGt2bV9zaGFyZWRfbXNyc19nbG9iYWwgewpAQCAtMTY3MSw3ICsxNjY5LDYgQEAgc3RhdGljIHZv aWQgcHZjbG9ja191cGRhdGVfdm1fZ3RvZF9jb3B5KHN0cnVjdCBrdm0gKmt2bSkKIAkJCQkJJmth LT5tYXN0ZXJfY3ljbGVfbm93KTsKIAogCWthLT51c2VfbWFzdGVyX2Nsb2NrID0gaG9zdF90c2Nf Y2xvY2tzb3VyY2UgJiYgdmNwdXNfbWF0Y2hlZAotCQkJCSYmICFiYWNrd2FyZHNfdHNjX29ic2Vy dmVkCiAJCQkJJiYgIWthLT5ib290X3ZjcHVfcnVuc19vbGRfa3ZtY2xvY2s7CiAKIAlpZiAoa2Et PnVzZV9tYXN0ZXJfY2xvY2spCkBAIC03MzY5LDg4ICs3MzY2LDIyIEBAIGludCBrdm1fYXJjaF9o YXJkd2FyZV9lbmFibGUodm9pZCkKIAlzdHJ1Y3Qga3ZtX3ZjcHUgKnZjcHU7CiAJaW50IGk7CiAJ aW50IHJldDsKLQl1NjQgbG9jYWxfdHNjOwotCXU2NCBtYXhfdHNjID0gMDsKLQlib29sIHN0YWJs ZSwgYmFja3dhcmRzX3RzYyA9IGZhbHNlOwogCiAJa3ZtX3NoYXJlZF9tc3JfY3B1X29ubGluZSgp OwogCXJldCA9IGt2bV94ODZfb3BzLT5oYXJkd2FyZV9lbmFibGUoKTsKIAlpZiAocmV0ICE9IDAp CiAJCXJldHVybiByZXQ7CiAKLQlsb2NhbF90c2MgPSByZHRzYygpOwotCXN0YWJsZSA9ICFjaGVj a190c2NfdW5zdGFibGUoKTsKIAlsaXN0X2Zvcl9lYWNoX2VudHJ5KGt2bSwgJnZtX2xpc3QsIHZt X2xpc3QpIHsKIAkJa3ZtX2Zvcl9lYWNoX3ZjcHUoaSwgdmNwdSwga3ZtKSB7Ci0JCQlpZiAoIXN0 YWJsZSAmJiB2Y3B1LT5jcHUgPT0gc21wX3Byb2Nlc3Nvcl9pZCgpKQorCQkJaWYgKHZjcHUtPmNw dSA9PSBzbXBfcHJvY2Vzc29yX2lkKCkpIHsKIAkJCQlrdm1fbWFrZV9yZXF1ZXN0KEtWTV9SRVFf Q0xPQ0tfVVBEQVRFLCB2Y3B1KTsKLQkJCWlmIChzdGFibGUgJiYgdmNwdS0+YXJjaC5sYXN0X2hv c3RfdHNjID4gbG9jYWxfdHNjKSB7Ci0JCQkJYmFja3dhcmRzX3RzYyA9IHRydWU7Ci0JCQkJaWYg KHZjcHUtPmFyY2gubGFzdF9ob3N0X3RzYyA+IG1heF90c2MpCi0JCQkJCW1heF90c2MgPSB2Y3B1 LT5hcmNoLmxhc3RfaG9zdF90c2M7CisJCQkJa3ZtX21ha2VfcmVxdWVzdChLVk1fUkVRX01BU1RF UkNMT0NLX1VQREFURSwKKwkJCQkJCSB2Y3B1KTsKIAkJCX0KIAkJfQogCX0KIAotCS8qCi0JICog U29tZXRpbWVzLCBldmVuIHJlbGlhYmxlIFRTQ3MgZ28gYmFja3dhcmRzLiAgVGhpcyBoYXBwZW5z IG9uCi0JICogcGxhdGZvcm1zIHRoYXQgcmVzZXQgVFNDIGR1cmluZyBzdXNwZW5kIG9yIGhpYmVy bmF0ZSBhY3Rpb25zLCBidXQKLQkgKiBtYWludGFpbiBzeW5jaHJvbml6YXRpb24uICBXZSBtdXN0 IGNvbXBlbnNhdGUuICBGb3J0dW5hdGVseSwgd2UgY2FuCi0JICogZGV0ZWN0IHRoYXQgY29uZGl0 aW9uIGhlcmUsIHdoaWNoIGhhcHBlbnMgZWFybHkgaW4gQ1BVIGJyaW5ndXAsCi0JICogYmVmb3Jl IGFueSBLVk0gdGhyZWFkcyBjYW4gYmUgcnVubmluZy4gIFVuZm9ydHVuYXRlbHksIHdlIGNhbid0 Ci0JICogYnJpbmcgdGhlIFRTQ3MgZnVsbHkgdXAgdG8gZGF0ZSB3aXRoIHJlYWwgdGltZSwgYXMg d2UgYXJlbid0IHlldCBmYXIKLQkgKiBlbm91Z2ggaW50byBDUFUgYnJpbmd1cCB0aGF0IHdlIGtu b3cgaG93IG11Y2ggcmVhbCB0aW1lIGhhcyBhY3R1YWxseQotCSAqIGVsYXBzZWQ7IG91ciBoZWxw ZXIgZnVuY3Rpb24sIGdldF9rZXJuZWxfbnMoKSB3aWxsIGJlIHVzaW5nIGJvb3QKLQkgKiB2YXJp YWJsZXMgdGhhdCBoYXZlbid0IGJlZW4gdXBkYXRlZCB5ZXQuCi0JICoKLQkgKiBTbyB3ZSBzaW1w bHkgZmluZCB0aGUgbWF4aW11bSBvYnNlcnZlZCBUU0MgYWJvdmUsIHRoZW4gcmVjb3JkIHRoZQot CSAqIGFkanVzdG1lbnQgdG8gVFNDIGluIGVhY2ggVkNQVS4gIFdoZW4gdGhlIFZDUFUgbGF0ZXIg Z2V0cyBsb2FkZWQsCi0JICogdGhlIGFkanVzdG1lbnQgd2lsbCBiZSBhcHBsaWVkLiAgTm90ZSB0 aGF0IHdlIGFjY3VtdWxhdGUKLQkgKiBhZGp1c3RtZW50cywgaW4gY2FzZSBtdWx0aXBsZSBzdXNw ZW5kIGN5Y2xlcyBoYXBwZW4gYmVmb3JlIHNvbWUgVkNQVQotCSAqIGdldHMgYSBjaGFuY2UgdG8g cnVuIGFnYWluLiAgSW4gdGhlIGV2ZW50IHRoYXQgbm8gS1ZNIHRocmVhZHMgZ2V0IGEKLQkgKiBj aGFuY2UgdG8gcnVuLCB3ZSB3aWxsIG1pc3MgdGhlIGVudGlyZSBlbGFwc2VkIHBlcmlvZCwgYXMg d2UnbGwgaGF2ZQotCSAqIHJlc2V0IGxhc3RfaG9zdF90c2MsIHNvIFZDUFVzIHdpbGwgbm90IGhh dmUgdGhlIFRTQyBhZGp1c3RlZCBhbmQgbWF5Ci0JICogbG9vc2UgY3ljbGUgdGltZS4gIFRoaXMg aXNuJ3QgdG9vIGJpZyBhIGRlYWwsIHNpbmNlIHRoZSBsb3NzIHdpbGwgYmUKLQkgKiB1bmlmb3Jt IGFjcm9zcyBhbGwgVkNQVXMgKG5vdCB0byBtZW50aW9uIHRoZSBzY2VuYXJpbyBpcyBleHRyZW1l bHkKLQkgKiB1bmxpa2VseSkuIEl0IGlzIHBvc3NpYmxlIHRoYXQgYSBzZWNvbmQgaGliZXJuYXRl IHJlY292ZXJ5IGhhcHBlbnMKLQkgKiBtdWNoIGZhc3RlciB0aGFuIGEgZmlyc3QsIGNhdXNpbmcg dGhlIG9ic2VydmVkIFRTQyBoZXJlIHRvIGJlCi0JICogc21hbGxlcjsgdGhpcyB3b3VsZCByZXF1 aXJlIGFkZGl0aW9uYWwgcGFkZGluZyBhZGp1c3RtZW50LCB3aGljaCBpcwotCSAqIHdoeSB3ZSBz ZXQgbGFzdF9ob3N0X3RzYyB0byB0aGUgbG9jYWwgdHNjIG9ic2VydmVkIGhlcmUuCi0JICoKLQkg KiBOLkIuIC0gdGhpcyBjb2RlIGJlbG93IHJ1bnMgb25seSBvbiBwbGF0Zm9ybXMgd2l0aCByZWxp YWJsZSBUU0MsCi0JICogYXMgdGhhdCBpcyB0aGUgb25seSB3YXkgYmFja3dhcmRzX3RzYyBpcyBz ZXQgYWJvdmUuICBBbHNvIG5vdGUKLQkgKiB0aGF0IHRoaXMgcnVucyBmb3IgQUxMIHZjcHVzLCB3 aGljaCBpcyBub3QgYSBidWc7IGFsbCBWQ1BVcyBzaG91bGQKLQkgKiBoYXZlIHRoZSBzYW1lIGRl bHRhX2N5YyBhZGp1c3RtZW50IGFwcGxpZWQgaWYgYmFja3dhcmRzX3RzYwotCSAqIGlzIGRldGVj dGVkLiAgTm90ZSBmdXJ0aGVyLCB0aGlzIGFkanVzdG1lbnQgaXMgb25seSBkb25lIG9uY2UsCi0J ICogYXMgd2UgcmVzZXQgbGFzdF9ob3N0X3RzYyBvbiBhbGwgVkNQVXMgdG8gc3RvcCB0aGlzIGZy b20gYmVpbmcKLQkgKiBjYWxsZWQgbXVsdGlwbGUgdGltZXMgKG9uZSBmb3IgZWFjaCBwaHlzaWNh bCBDUFUgYnJpbmd1cCkuCi0JICoKLQkgKiBQbGF0Zm9ybXMgd2l0aCB1bnJlbGlhYmxlIFRTQ3Mg ZG9uJ3QgaGF2ZSB0byBkZWFsIHdpdGggdGhpcywgdGhleQotCSAqIHdpbGwgYmUgY29tcGVuc2F0 ZWQgYnkgdGhlIGxvZ2ljIGluIHZjcHVfbG9hZCwgd2hpY2ggc2V0cyB0aGUgVFNDIHRvCi0JICog Y2F0Y2h1cCBtb2RlLiAgVGhpcyB3aWxsIGNhdGNodXAgYWxsIFZDUFVzIHRvIHJlYWwgdGltZSwg YnV0IGNhbm5vdAotCSAqIGd1YXJhbnRlZSB0aGF0IHRoZXkgc3RheSBpbiBwZXJmZWN0IHN5bmNo cm9uaXphdGlvbi4KLQkgKi8KLQlpZiAoYmFja3dhcmRzX3RzYykgewotCQl1NjQgZGVsdGFfY3lj ID0gbWF4X3RzYyAtIGxvY2FsX3RzYzsKLQkJYmFja3dhcmRzX3RzY19vYnNlcnZlZCA9IHRydWU7 Ci0JCWxpc3RfZm9yX2VhY2hfZW50cnkoa3ZtLCAmdm1fbGlzdCwgdm1fbGlzdCkgewotCQkJa3Zt X2Zvcl9lYWNoX3ZjcHUoaSwgdmNwdSwga3ZtKSB7Ci0JCQkJdmNwdS0+YXJjaC50c2Nfb2Zmc2V0 X2FkanVzdG1lbnQgKz0gZGVsdGFfY3ljOwotCQkJCXZjcHUtPmFyY2gubGFzdF9ob3N0X3RzYyA9 IGxvY2FsX3RzYzsKLQkJCQlrdm1fbWFrZV9yZXF1ZXN0KEtWTV9SRVFfTUFTVEVSQ0xPQ0tfVVBE QVRFLCB2Y3B1KTsKLQkJCX0KLQotCQkJLyoKLQkJCSAqIFdlIGhhdmUgdG8gZGlzYWJsZSBUU0Mg b2Zmc2V0IG1hdGNoaW5nLi4gaWYgeW91IHdlcmUKLQkJCSAqIGJvb3RpbmcgYSBWTSB3aGlsZSBp c3N1aW5nIGFuIFM0IGhvc3Qgc3VzcGVuZC4uLi4KLQkJCSAqIHlvdSBtYXkgaGF2ZSBzb21lIHBy b2JsZW0uICBTb2x2aW5nIHRoaXMgaXNzdWUgaXMKLQkJCSAqIGxlZnQgYXMgYW4gZXhlcmNpc2Ug dG8gdGhlIHJlYWRlci4KLQkJCSAqLwotCQkJa3ZtLT5hcmNoLmxhc3RfdHNjX25zZWMgPSAwOwot CQkJa3ZtLT5hcmNoLmxhc3RfdHNjX3dyaXRlID0gMDsKLQkJfQotCi0JfQogCXJldHVybiAwOwog fQogCi0tIAoyLjUuMAoK --e89a8f923faa0cbb3b05267e9a27--