From: Jason Wang
To: "Michael S. Tsirkin" , Rob Miller
Cc: Virtio-Dev
Date: Mon, 9 Mar 2020 16:50:43 +0800
Subject: Re: [virtio-dev] Dirty Page Tracking (DPT)
In-Reply-To: <20200309030251-mutt-send-email-mst@kernel.org>
References: <20200309030251-mutt-send-email-mst@kernel.org>

On 2020/3/9 3:38 PM, Michael S. Tsirkin wrote:
> On Fri, Mar 06, 2020 at 10:40:13AM -0500, Rob Miller wrote:
>> I understand that DPT isn't really on the forefront of the vDPA
>> framework, but wanted to understand if there are any initial thoughts
>> on how this would work...
> And judging by the next few chapters, you are actually
> talking about vhost pci, right?
>
>> In the migration framework, in its simplest form, (I gather) it's QEMU
>> via KVM that is reading the dirty page table, converting bits to page
>> numbers, then flushing the remote VM/copying local page(s) -> remote
>> VM, etc.
>>
>> While this is fine for a VM (say VM1) dirtying its own memory, where
>> the accesses are trapped in the kernel and the log is updated, I'm not
>> sure what happens in the case of vhost, where a remote VM (say VM2) is
>> dirtying VM1's memory directly, since it can access it, during packet
>> reception for example.
>> Whatever technique is employed to catch this, how would it differ from
>> a HW-based virtio device doing DMA directly into a VM's DDR, with
>> respect to DPT? Is QEMU going to have a second place to query the
>> dirty logs - i.e. the vDPA layer?
> I don't think anyone has a good handle on vhost pci migration yet.
> But I think a reasonable way to handle that would be to
> activate dirty tracking in VM2's QEMU.
>
> And then VM2's QEMU would periodically copy the bits to the log - does
> this sound right?
>
>> Further, I heard about a SW-based DPT within the vDPA framework for
>> those devices that do not (yet) support DPT inherently in HW. How is
>> this envisioned to work?
> What I am aware of is simply switching to a software virtio
> for the duration of migration. The software can be pretty simple
> since the formats match: just copy available entries to the device
> ring, and for used entries, see a used ring entry, mark the page
> dirty and then copy the used entry to the guest ring.

That looks more heavyweight than e.g. just relaying the used ring (as
DPDK does), I believe?

>
> Another approach that I proposed, and that was prototyped at some point
> by Alex Duyck, is the guest driver touching the page in question before
> processing it within the guest, e.g. by an atomic xor with 0.
>

Sounds attractive but didn't perform all that well. Intel posted an i40e
software solution that traps queue tail/head writes, but I'm not sure
it's good enough.

https://lore.kernel.org/kvm/20191206082232.GH31791@joy-OptiPlex-7040/
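
To make the comparison a bit more concrete, below is a rough, untested
sketch of what "mark the page dirty while relaying used entries" could
look like for a split virtqueue. All of the names (struct relay,
relay_used, log_page) and the flat bit-per-page log are made up for
illustration; this is not an existing QEMU, DPDK or vDPA interface, and
indirect descriptors, index wrap-around and write-only checks are
ignored.

/* Illustrative only: relay the device's used ring to the guest-visible
 * used ring, logging the pages the device wrote.  Names and layout are
 * simplified assumptions, not an existing API. */

#include <stdint.h>

#define PAGE_SHIFT 12
#define VRING_DESC_F_NEXT 1

struct vring_desc      { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct vring_used_elem { uint32_t id; uint32_t len; };

struct relay {
    struct vring_desc      *desc;      /* descriptor table (guest memory)    */
    struct vring_used_elem *dev_used;  /* used ring written by the device    */
    struct vring_used_elem *gst_used;  /* used ring the guest actually sees  */
    uint16_t size;                     /* ring size                          */
    uint16_t last_used;                /* last device used index processed   */
    uint8_t *dirty_log;                /* one bit per guest page, from GPA 0 */
};

static void log_page(uint8_t *log, uint64_t gpa)
{
    uint64_t pfn = gpa >> PAGE_SHIFT;

    log[pfn / 8] |= 1u << (pfn % 8);
}

/* Called when the device's used index has advanced to dev_idx. */
static void relay_used(struct relay *r, uint16_t dev_idx, uint16_t *gst_idx)
{
    while (r->last_used != dev_idx) {
        struct vring_used_elem *e = &r->dev_used[r->last_used % r->size];
        uint32_t i = e->id;

        /* Mark every page of the descriptor chain dirty; a complete
         * version would only log device-writable buffers and would
         * also handle indirect descriptors. */
        for (;;) {
            struct vring_desc *d = &r->desc[i];
            uint64_t off;

            for (off = 0; off < d->len; off += 1u << PAGE_SHIFT)
                log_page(r->dirty_log, d->addr + off);
            if (!(d->flags & VRING_DESC_F_NEXT))
                break;
            i = d->next;
        }

        /* Forward the used element; the guest's used ring page is also
         * dirtied by this write and would need logging too (omitted
         * here since this sketch only works on descriptor GPAs). */
        r->gst_used[(*gst_idx)++ % r->size] = *e;
        r->last_used++;
    }
}

Presumably the available ring can be passed through untouched in this
scheme, which is why just relaying the used ring is lighter than the
full copy in both directions described above.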
>
>> Finally, for those HW vendors that do support DPT in HW, a mapping of
>> a bit -> page isn't really an option, since no one wants to do a
>> byte-wide read-modify-write across the PCI bus; mapping a whole byte
>> to a page is likely more desirable - the HW can just do non-posted
>> writes to the dirty page table. If byte-wise, then the QEMU/vDPA layer
>> has to either fix up the mapping (from byte -> bit) or have the
>> capability to handle the granularity differences.
>>
>> Thoughts?
>>
>> Rob Miller
>> rob.miller@broadcom.com
>> (919)721-3339
> If using an IOMMU, DPT can also be done using either PRI or the dirty
> bit in a PTE. PRI is an interrupt so it can kick off a thread to set
> bits in the log I guess, but if it's the dirty bit then I don't think
> there's an interrupt, and a polling thread does not sound attractive.
> I guess we'll need a new interface to notify vDPA that QEMU is looking
> for dirty logs, and then vDPA can send them to QEMU in some way. It
> will probably be good enough to support vendor-specific logging
> interfaces, too. I don't actually have hardware which supports either,
> so actually coding it up is not yet practical.

Yes, both PRI and the PTE dirty bit require special hardware support. We
can extend the vDPA API to support both. For the page fault case,
probably just an IOMMU page fault handler.

> Further, at my KVM Forum presentation I proposed a virtio-specific
> page fault handling interface. If there's a wish to standardize and
> implement that, let me know and I will try to write this up in a more
> formal way.

Besides page faults, if we want virtio to be more like vhost, we also
need to formalize the device state fetching, e.g. per-vq index etc.

Thanks
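
On the byte-vs-bit granularity point, the fix-up in the QEMU/vDPA layer
can be a simple scan. A rough sketch (hypothetical helper, not an
existing QEMU or vDPA call; it assumes the device writes one non-zero
byte per dirty page and that the bitmap covers the same page range)
might look like:

#include <stddef.h>
#include <stdint.h>

/* Fold a byte-per-page dirty log, as a device might DMA it, into the
 * bit-per-page bitmap that migration code consumes.  Illustrative
 * helper only; returns the number of dirty pages found. */
static size_t fold_byte_log(const uint8_t *byte_log,   /* one byte per page */
                            unsigned long *bitmap,     /* one bit per page  */
                            size_t npages)
{
    size_t dirty = 0;
    size_t pfn;

    for (pfn = 0; pfn < npages; pfn++) {
        if (!byte_log[pfn])
            continue;
        bitmap[pfn / (8 * sizeof(unsigned long))] |=
            1ul << (pfn % (8 * sizeof(unsigned long)));
        dirty++;
    }
    return dirty;
}

The scan only has to run when QEMU asks for the log, and clearing (or
double-buffering) the byte log at the same time keeps the device writing
whole bytes, which is the point of avoiding the bit-level
read-modify-write across the PCI bus in the first place.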