From: Peter Xu <peterx@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>
Subject: Re: [PATCH RFC 04/15] KVM: Implement ring-based dirty memory tracking
Date: Tue, 10 Dec 2019 10:52:59 -0500
Message-ID: <20191210155259.GD3352@xz-x1>
In-Reply-To: <affd9d84-1b84-0c25-c431-a075c58c33dc@redhat.com>

On Tue, Dec 10, 2019 at 11:07:31AM +0100, Paolo Bonzini wrote:
> > I'm thinking whether I can start
> > to use this information in the next post on solving an issue I
> > encountered with the waitqueue.
> >
> > Current waitqueue is still problematic in that it could wait even with
> > the mmu lock held when with vcpu context.
>
> I think the idea of the soft limit is that the waiting just cannot
> happen.  That is, the number of dirtied pages _outside_ the guest (guest
> accesses are taken care of by PML, and are subtracted from the soft
> limit) cannot exceed hard_limit - (soft_limit + pml_size).

So the question goes back to whether this is guaranteed somehow.  Or
would you prefer that we keep the WARN_ON_ONCE until it triggers and
only then analyze it (which I doubt will ever happen)?

One thing to mention is that for the with-vcpu cases we can probably
even stop KVM_RUN immediately as soon as either the per-vm or the
per-vcpu ring reaches the soft limit, so for the vcpu case the
guarantee should be easier to provide.  What I want to know about is
the rest of the cases, like ioctls, or even things that do not come
from userspace at all (which I think I should read up on more later).
If the answer is yes, I'd be more than glad to drop the waitqueue.

> > The issue is KVM_RESET_DIRTY_RINGS needs the mmu lock to manipulate
> > the write bits, while it's the only interface to also wake up the
> > dirty ring sleepers.  They could deadlock like this:
> >
> >     main thread                  vcpu thread
> >     ===========                  ===========
> >                                  kvm page fault
> >                                    mark_page_dirty_in_slot
> >                                    mmu lock taken
> >                                    mark dirty, ring full
> >                                    queue on waitqueue
> >                                      (with mmu lock)
> >     KVM_RESET_DIRTY_RINGS
> >       take mmu lock              <------------ deadlock here
> >       reset ring gfns
> >       wakeup dirty ring sleepers
> >
> > And if mark_page_dirty_in_slot() is not called with a vcpu context
> > (e.g. kvm_mmu_page_fault) but with an ioctl context (those are the
> > cases where we'll use the per-vm dirty ring) then it's probably fine.
> >
> > My planned solution:
> >
> > - When kvm_get_running_vcpu() != NULL, we postpone the waitqueue waits
> >   until we have finished handling this page fault, probably somewhere
> >   around vcpu_enter_guest, so that we can do wait_event() after the
> >   mmu lock is released
>
> I think this can cause a race:
>
> 	vCPU 1			vCPU 2			host
> 	---------------------------------------------------------------
> 	mark page dirty
> 				write to page
> 							treat page as not dirty
> 				add page to ring
>
> where vCPU 2 skips the clean-page slow path entirely.

If we still follow the rule in userspace that we first do RESET and
only then collect and send the pages (just like what we've discussed
before), then IMHO it's fine for vCPU 2 to skip the slow path.  RESET
happens at "treat page as not dirty", so as long as we are sure that we
only collect and send pages after that point, the latest "write to
page" data from vCPU 2 won't be lost even if vCPU 2 is not blocked by
vCPU 1's full ring.
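(To make the ordering I'm assuming concrete, here is a rough userspace
sketch.  collect_dirty_gfns(), send_page(), and the function itself are
made-up placeholders for illustration only, not real QEMU code; it only
assumes the RFC's KVM_RESET_DIRTY_RINGS ioctl on the VM fd:)

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Placeholders, not real API: harvest ring entries / transfer a page. */
extern int collect_dirty_gfns(__u64 *gfns, int max);
extern void send_page(__u64 gfn);

static void dirty_ring_iteration(int vm_fd, __u64 *gfns, int max)
{
	/* 1. Harvest the published GFN entries from the vcpu rings. */
	int i, ngfns = collect_dirty_gfns(gfns, max);

	/* 2. Reset the rings: this is the "treat page as not dirty" point. */
	ioctl(vm_fd, KVM_RESET_DIRTY_RINGS, 0);

	/*
	 * 3. Only now read and send the page contents.  A guest write
	 *    racing with the reset simply re-dirties the page, so it is
	 *    harvested again in a later iteration and nothing is lost.
	 */
	for (i = 0; i < ngfns; i++)
		send_page(gfns[i]);
}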
Maybe we can also consider letting mark_page_dirty_in_slot() return a
value, so that the upper layer has a chance to skip the spte update
when mark_page_dirty_in_slot() fails to mark the dirty bit, and can
return directly with RET_PF_RETRY.

Thanks,

-- 
Peter Xu
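(For illustration, roughly what that suggestion could look like -- a
sketch only; the bool return and the helpers
kvm_slot_dirty_tracking_enabled() and kvm_dirty_ring_try_push() are
made-up names, not the RFC's actual code:)

/*
 * Sketch: let mark_page_dirty_in_slot() report whether the dirty state
 * was actually recorded; false means the ring was soft-full and
 * nothing was pushed.
 */
static bool mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_vcpu *vcpu,
				    struct kvm_memory_slot *slot, gfn_t gfn)
{
	if (!slot || !kvm_slot_dirty_tracking_enabled(slot))
		return true;	/* nothing to record, treat as success */

	/* Returns false when the ring is soft-full. */
	return kvm_dirty_ring_try_push(kvm, vcpu, slot, gfn);
}

/*
 * Caller side in the page fault path, before installing the writable
 * spte (sketch):
 *
 *	if (!mark_page_dirty_in_slot(kvm, vcpu, slot, gfn))
 *		return RET_PF_RETRY;	// skip the spte update, retry later
 */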