From: "Han, Weidong"
Subject: RE: [RFC][PATCH] 0/9 Populate-on-demand memory
Date: Thu, 25 Dec 2008 13:43:07 +0800
Message-ID: <715D42877B251141A38726ABF5CABF2C018E8307F7@pdsmsx503.ccr.corp.intel.com>
References: <0A882F4D99BBF6449D58E61AAFD7EDD603BB49E5@pdsmsx502.ccr.corp.intel.com> <0A882F4D99BBF6449D58E61AAFD7EDD603BB49F5@pdsmsx502.ccr.corp.intel.com>
In-Reply-To: <0A882F4D99BBF6449D58E61AAFD7EDD603BB49F5@pdsmsx502.ccr.corp.intel.com>
To: "Tian, Kevin", 'George Dunlap'
Cc: "xen-devel@lists.xensource.com"
List-Id: xen-devel@lists.xenproject.org

Tian, Kevin wrote:
>> From: George Dunlap
>> Sent: Wednesday, December 24, 2008 10:43 PM
>>> Another tricky point could be with VT-d. If one guest page is used
>>> as DMA target before balloon driver is installed, and no early
>>> access on that page (like start-of-day scrubber), then PoD action
>>> will not be triggered... Not sure the possibility of such
>>> condition, but you may need to have some thought or guard on that.
>>> em... after more thinking, actually PoD pages may be alive even
>>> after balloon driver is installed. I guess before coming up a
>>> solution you may add a check on whether target domain has
>>> passthrough device to decide whether this feature is on on-the-fly.
>>
>> Hmm, I haven't looked at VT-d integration; it at least requires some
>> examination. How are gfns translated to mfns for the VT-d hardware?
>> Does it use the hardware EPT tables? Is the transaction re-startable
>> if we get an EPT fault and then fix the EPT table?
>
> there's a VT-d page table walked by VT-d engine, which is similar to
> EPT content.
> When device dma request is intercepted by VT-d engine,
> VT-d page table corresponding to that device is walked for valid
> mapping. Not like EPT which is restartable, VT-d page fault is just
> for log purpose since pci bus doesn't support I/O restart yet
> (although pcisig is looking at this possibility). That says, if we
> can't find a chance to trigger a cpu page fault before PoD page is
> used as dma target, either one should be disabled if both are
> configured.
>
>>
>> A second issue is with the emergency sweep: if a page which happens
>> to be zero ends up being the target of a DMA, we may get:
>> * Device request to write to gfn X, which translates to mfn Y.
>> * Demand-fault on gfn Z, with no pages in the cache.
>> * Emergency sweep scans through gfn space, finds that mfn Y is empty.
>> It replaces gfn X with a PoD entry, and puts mfn Y behind gfn Z.
>> * The request finishes. Either the request then fails (because EPT
>> translation for gfn X is not valid anymore), or it silently succeeds
>> in writing to mfn Y, which is now behind gfn Z instead of gfn X.
>
> yes, this is also one issue. the request will fail since the dma
> address written to device is gfn, while X->Y mapping is cut off due
> to sweep.
>
>>
>> If we can't tell that there's an outstanding I/O on the page, then we
>> can't do an emergency sweep. If we have some way of knowing that
>> there's *some* outstanding I/O to *some* page, we could pause the
>> guest until the I/O completes, then do the sweep.
>
> one possibility is to have a pv dma engine or virtual VT-d engine
> within guest, but that's another story.
>
>>
>> At any rate, until we have that worked out, we should probably add
>> some "seatbelt" code to make sure that people don't use PoD for a
>> VT-d enabled domain. I know absolutely nothing about the VT-d code;
>> could you either write a patch to do this check, or give me an idea
>> of the simplest thing to check?
>
> Weidong works on VT-d and could give comments on exact point
> to check.
>

You can simply check "iommu_enabled" to know whether an IOMMU (VT-d or
AMD IOMMU) is in use.

Regards,
Weidong
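
For illustration only, the "seatbelt" check discussed in this thread could look
roughly like the sketch below. It is not taken from the patch series. Only the
name "iommu_enabled" comes from the reply above; here it is an ordinary
stand-in variable so the example compiles on its own, and struct pod_domain,
pod_enable() and the -EINVAL return value are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the flag named in the reply above. In this sketch it is a
 * plain variable so that the example builds without any hypervisor code. */
static bool iommu_enabled = false;

/* Hypothetical per-domain PoD state; not a real Xen structure. */
struct pod_domain {
    bool pod_active;           /* populate-on-demand is in use */
    unsigned long pod_target;  /* pages to be populated on demand */
};

/* Hypothetical seatbelt: refuse to turn PoD on while an IOMMU is in use,
 * because a faulting device DMA cannot be restarted once the fault is
 * fixed up (see the discussion quoted above). */
static int pod_enable(struct pod_domain *d, unsigned long target_pages)
{
    if (iommu_enabled)
        return -EINVAL;        /* assumed error code, chosen for the sketch */

    d->pod_active = true;
    d->pod_target = target_pages;
    return 0;
}
```

With the flag clear, pod_enable() succeeds and records the target; with the
flag set, it returns an error and leaves the domain untouched. Since the flag
only says whether an IOMMU is in use at all, a check like this is coarse: it
does not distinguish domains that actually have a passthrough device.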