From: Yu Zhang <yu.c.zhang@linux.intel.com>
To: George Dunlap <george.dunlap@citrix.com>,
Jan Beulich <JBeulich@suse.com>
Cc: Kevin Tian <kevin.tian@intel.com>,
George Dunlap <george.dunlap@eu.citrix.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Tim Deegan <tim@xen.org>,
xen-devel@lists.xen.org, Paul Durrant <paul.durrant@citrix.com>,
zhiyuan.lv@intel.com, Jun Nakajima <jun.nakajima@intel.com>
Subject: Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
Date: Mon, 20 Jun 2016 17:03:27 +0800
Message-ID: <5767B15F.80506@linux.intel.com>
In-Reply-To: <5763CE32.7090803@citrix.com>
On 6/17/2016 6:17 PM, George Dunlap wrote:
> On 16/06/16 10:55, Jan Beulich wrote:
>>> Previously, in the 2nd version, I used p2m_change_entry_type_global() to
>>> reset the outstanding p2m_ioreq_server entries back to p2m_ram_rw
>>> asynchronously after the de-registration. But we realized later that this
>>> approach means we cannot support live migration. And recalculating the
>>> whole p2m table forcefully when de-registration happens costs too much.
>>>
>>> Further discussion with Paul concluded that we can leave the
>>> responsibility of resetting the p2m type to the device model side; even
>>> if a device model fails to do so, only the current VM is affected -
>>> neither other VMs nor the hypervisor will get hurt.
>>>
>>> I thought we had reached agreement in the review process of version 2,
>>> so I removed this part from version 3.
>> In which case I would appreciate the commit message to explain
>> this (in particular I admit I don't recall why live migration would
>> be affected by the p2m_change_entry_type_global() approach,
>> but the request is also so that later readers have at least some
>> source of information other than searching the mailing list).
> Yes, I don't see why either. You wouldn't de-register the ioreq server
> until after the final sweep after the VM has been paused, right? At
> which point the lazy p2m re-calculation shouldn't really matter much I
> don't think.
Oh, it seems I need to give some explanation - and sorry for the late reply.

IIUC, p2m_change_entry_type_global() only sets the e.emt field to an
invalid value and turns on the e.recalc flag; the real p2m reset is done
later in resolve_misconfig(), either when an EPT misconfiguration fault
happens or when ept_set_entry() is called.
In the 2nd version of the patch, we leveraged this approach by adding
p2m_ioreq_server to the P2M_CHANGEABLE_TYPES and triggering
p2m_change_entry_type_global() when an ioreq server is unbound, hoping
that later accesses to these gfns would reset the p2m type back to
p2m_ram_rw. For the recalculation itself, this works.
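To make the mechanism concrete, here is a much-simplified, self-contained
model of the lazy recalculation path. The real code lives in
xen/arch/x86/mm/p2m-ept.c; the struct layout and helper names below are
illustrative only, not Xen's actual definitions:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of lazy EPT type recalculation: the entry's memory type is
 * deliberately invalidated so the next access faults, and the real type
 * is recomputed on demand instead of rewriting the whole table up front.
 * The fields mirror the e.emt / e.recalc bits discussed above. */
enum p2m_type { p2m_ram_rw, p2m_ioreq_server, p2m_log_dirty };

struct ept_entry {
    int emt;               /* memory type; -1 models an invalid EMT */
    bool recalc;           /* "recalculate me on next access" */
    enum p2m_type sa_p2mt; /* stored p2m type */
};

/* p2m_change_entry_type_global() analogue: cheap, touches only flags. */
static void change_type_global(struct ept_entry *e)
{
    e->emt = -1;
    e->recalc = true;
}

/* resolve_misconfig() analogue: invoked on an EPT misconfiguration
 * fault (or from ept_set_entry()); this is where the type actually
 * changes. */
static void resolve_misconfig(struct ept_entry *e)
{
    if (!e->recalc)
        return;
    if (e->sa_p2mt == p2m_ioreq_server)
        e->sa_p2mt = p2m_ram_rw;   /* the reset the v2 patch hoped for */
    e->emt = 0;                    /* recompute a valid memory type */
    e->recalc = false;
}
```

The key property is that change_type_global() leaves the stored type
untouched; only the later fault-time resolution rewrites it.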
However, there are conflicts once we take live migration into account:
if live migration is triggered by the user (perhaps unintentionally)
during the GPU emulation process, resolve_misconfig() will set all the
outstanding p2m_ioreq_server entries to p2m_log_dirty, which is not what
we want - our intention is only to reset the outdated p2m_ioreq_server
entries back to p2m_ram_rw.
Adding special treatment for p2m_ioreq_server in resolve_misconfig() is
not enough either, because we cannot judge whether GPU emulation is in
progress by checking if p2m->ioreq_server is NULL - the domain might be
detached from ioreq server A (with some p2m_ioreq_server entries left to
be recalculated) and then attached to ioreq server B.
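Both failure modes above can be shown with a tiny illustrative model of
the recalculation decision (the field and function names are invented
for this sketch, not taken from Xen):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum p2m_type { p2m_ram_rw, p2m_ioreq_server, p2m_log_dirty };

struct p2m {
    bool global_logdirty;      /* live migration in progress */
    const char *ioreq_server;  /* NULL when no server is attached */
};

/* Illustrative analogue of the type recalculation in
 * resolve_misconfig(): every "changeable" type folds to logdirty while
 * global log-dirty mode is on, so an outstanding p2m_ioreq_server entry
 * becomes p2m_log_dirty instead of the intended p2m_ram_rw.  The NULL
 * check is the attempted special case discussed above. */
static enum p2m_type recalc_type(const struct p2m *p2m, enum p2m_type old)
{
    if (p2m->global_logdirty)
        return p2m_log_dirty;          /* migration clobbers the entry */
    if (old == p2m_ioreq_server && p2m->ioreq_server == NULL)
        return p2m_ram_rw;             /* special case ... */
    return old;                        /* ... which misfires once a new
                                        * server has been attached */
}
```

The second test case below models the A-then-B scenario: stale entries
from server A survive recalculation because server B is now attached, so
p2m->ioreq_server is no longer NULL.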
So one solution is to disallow the log dirty feature in XenGT, i.e. just
return failure when enable_logdirty() is called from the toolstack. But
I'm afraid this would restrict XenGT's future live migration feature.
Another proposal is to reset the p2m type by traversing the EPT table
synchronously when the ioreq server is detached, but this approach is
time consuming.
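For reference, the synchronous alternative would look roughly like this
(an illustrative flat-array model; a real implementation would have to
walk the multi-level EPT structures for the whole guest physmap at
detach time, which is where the cost comes from):

```c
#include <assert.h>

enum p2m_type { p2m_ram_rw, p2m_ioreq_server };

/* Toy p2m as a flat gfn-indexed array.  The loop is O(guest physmap
 * size) and runs synchronously at server-detach time - the "time
 * consuming" part, since it cannot be deferred to fault time. */
static void sync_reset_ioreq_entries(enum p2m_type *t, unsigned long nr_gfns)
{
    for (unsigned long gfn = 0; gfn < nr_gfns; gfn++)
        if (t[gfn] == p2m_ioreq_server)
            t[gfn] = p2m_ram_rw;
}
```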
So after further discussion with Paul, our conclusion is that resetting
the p2m type after ioreq server detachment is not a must. The worst case
is that the wrong ioreq server gets notified, but this does not affect
other VMs or the hypervisor, and it should be the device model's
responsibility to take care of its own correctness. And if XenGT live
migration is to be supported in the future, we can still leverage the
log dirty code to keep track of normal guest ram pages; for the emulated
guest ram (i.e. GPU page tables), the device model's cooperation would
be necessary.
Thanks
Yu
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel