From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Jan Beulich" <JBeulich@suse.com>
Subject: Re: [PATCH] x86/HVM: honor p2m_ram_ro in
 hvm_map_guest_frame_rw()
Date: Fri, 24 Jul 2015 06:33:17 -0600
Message-ID: <55B24CAD020000780009528C@prv-mh.provo.novell.com>
References: <55B224660200007800095083@prv-mh.provo.novell.com>
	<55B22964.2030701@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
Received: from mail6.bemta3.messagelabs.com ([195.245.230.39])
	by lists.xen.org with esmtp (Exim 4.72)
	(envelope-from <JBeulich@suse.com>) id 1ZIcAD-0007cX-LL
	for xen-devel@lists.xenproject.org; Fri, 24 Jul 2015 12:33:21 +0000
In-Reply-To: <55B22964.2030701@citrix.com>
Content-Disposition: inline
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Kevin Tian <kevin.tian@intel.com>, Wei Liu <wei.liu2@citrix.com>, suravee.suthikulpanit@amd.com, Eddie Dong <eddie.dong@intel.com>, Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>, Jun Nakajima <jun.nakajima@intel.com>, xen-devel <xen-devel@lists.xenproject.org>, Boris Ostrovsky <boris.ostrovsky@oracle.com>, Keir Fraser <keir@xen.org>
List-Id: xen-devel@lists.xenproject.org

>>> On 24.07.15 at 14:02, <andrew.cooper3@citrix.com> wrote:
> On 24/07/15 10:41, Jan Beulich wrote:
>> ... and its callers.
>>
>> While all non-nested users are made fully honor the semantics of that
>> type, doing so in the nested case seemed insane (if doable at all,
>> considering VMCS shadowing), and hence there the respective operations
>> are simply made fail.
> 
> Sorry, but I can't parse this sentence.  Surely in the nested case, it 
> is the host p2m type which is relevant to whether a mapping should be 
> forced read only?

No, what I mean to say is:
- callers outside of nested-HVM code now properly obey the
  write-ignore semantics
- callers inside nested-HVM code would be too cumbersome (and
  maybe impossible) to fix, and hence they are made to return
  failure to their own callers.
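
To illustrate the distinction, here is a minimal stand-alone sketch. It
is not the patch itself: all names (frame_type, guest_frame,
map_frame_rw(), store_byte(), nested_store_byte()) are hypothetical and
only model the idea that a caller either accepts a "writable" indication
and discards writes, or passes NULL and gets a failure for read-only
frames.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not Xen's types. */
enum frame_type { FRAME_RAM_RW, FRAME_RAM_RO };

struct guest_frame {
    enum frame_type type;
    unsigned char data[16];
};

/*
 * Model of a read/write mapping helper. If 'writable' is non-NULL the
 * caller promises to discard writes when *writable comes back false.
 * If it is NULL the caller cannot do that, so a read-only frame fails.
 */
static unsigned char *map_frame_rw(struct guest_frame *f, bool *writable)
{
    bool rw = (f->type != FRAME_RAM_RO);

    if ( writable )
        *writable = rw;
    else if ( !rw )
        return NULL;

    return f->data;
}

/* Non-nested style caller: obeys write-ignore semantics. */
static int store_byte(struct guest_frame *f, size_t off, unsigned char val)
{
    bool writable;
    unsigned char *p = map_frame_rw(f, &writable);

    if ( !p || off >= sizeof(f->data) )
        return -1;
    if ( writable )
        p[off] = val;   /* otherwise the write is silently dropped */
    return 0;
}

/* Nested style caller: no write-ignore support, so it reports failure. */
static int nested_store_byte(struct guest_frame *f, size_t off,
                             unsigned char val)
{
    unsigned char *p = map_frame_rw(f, NULL);

    if ( !p || off >= sizeof(f->data) )
        return -1;
    p[off] = val;
    return 0;
}
```

In this model a store to a read-only frame succeeds without changing
memory in the first caller, and fails outright in the second.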

>> Beyond that log-dirty handling in _hvm_map_guest_frame() looks bogus
>> too: What if a XEN_DOMCTL_SHADOW_OP_* gets issued and acted upon
>> between the setting of the dirty flag and the actual write happening?
>> I.e. shouldn't the flag instead be set in hvm_unmap_guest_frame()?
> 
> It does indeed.  (Ideally the dirty bit should probably be held high for 
> the duration that a mapping exists, but that is absolutely infeasible to 
> do).

I don't see this being too difficult, all the more so since for
transient mappings it doesn't really matter (if there's a race, then
setting the flag after the write(s) is good enough). For permanent
mappings I can't see why we wouldn't be able to add a (short)
linked list of pages for which paging_log_dirty_op() should always
set the dirty flags.
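
A rough stand-alone model of that idea, just to make it concrete. Every
name here (model_domain, perm_mapping, unmap_transient(),
track_permanent(), log_dirty_clean()) is made up for the illustration,
and the bitmap handling is far simpler than what the real log-dirty code
does: transient mappings mark the page dirty at unmap time, while pages
on the permanent list are reported dirty on every log-dirty query.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_FRAMES 8   /* tiny model domain; purely illustrative */

/* One entry per permanently mapped page (hypothetical structure). */
struct perm_mapping {
    unsigned long gfn;
    struct perm_mapping *next;
};

struct model_domain {
    bool dirty[NR_FRAMES];            /* stand-in for the dirty bitmap */
    struct perm_mapping *perm_list;   /* short list of permanent mappings */
};

/* Transient mapping: set the dirty flag once the write(s) are done. */
static void unmap_transient(struct model_domain *d, unsigned long gfn)
{
    if ( gfn < NR_FRAMES )
        d->dirty[gfn] = true;
}

/* Permanent mapping: remember the page instead of marking it once. */
static void track_permanent(struct model_domain *d, struct perm_mapping *m,
                            unsigned long gfn)
{
    m->gfn = gfn;
    m->next = d->perm_list;
    d->perm_list = m;
}

/*
 * Model of a "report and clean" log-dirty operation: permanently mapped
 * pages are always flagged before the bitmap is copied out and reset.
 */
static void log_dirty_clean(struct model_domain *d, bool out[NR_FRAMES])
{
    const struct perm_mapping *m;
    size_t i;

    for ( m = d->perm_list; m; m = m->next )
        if ( m->gfn < NR_FRAMES )
            d->dirty[m->gfn] = true;

    for ( i = 0; i < NR_FRAMES; i++ )
    {
        out[i] = d->dirty[i];
        d->dirty[i] = false;
    }
}
```

With this, a page written through a transient mapping shows up in
exactly one query, whereas a permanently mapped page shows up in every
query for as long as it stays on the list.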

>> @@ -3797,6 +3805,7 @@ static int hvm_load_segment_selector(
>>               break;
>>           }
>>       } while ( !(desc.b & 0x100) && /* Ensure Accessed flag is set */
>> +              writable && /* except if we are to discard writes */
>>                 (cmpxchg(&pdesc->b, desc.b, desc.b | 0x100) != desc.b) );
> 
> I can't recall where I read it in the manual, but I believe it is a 
> faultable error to load a descriptor from RO memory if the accessed bit 
> is not already set.  This was to prevent a processor livelock when 
> running with gdtr pointing into ROM (which was a considered usecase).

I don't see why a processor would live-lock in such a case. It can do
the write, and ignore whether it actually took effect. I don't see why
it would e.g. spin until it sees the flag set. (Note that a
cmpxchg()-like loop alone wouldn't have that problem, i.e. for a live
lock to occur there would still need to be an outer loop doing the
checking.)

But even if there were such (perhaps even model specific) behavior,
without having a pointer to where this is specified (and hence what
precise fault [and error code] to raise), I wouldn't want to go that
route here.
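
For reference, the shape of the loop in the hunk quoted above can be
modelled in isolation roughly as follows. This is only a sketch using
C11 atomics in place of Xen's cmpxchg(); DESC_ACCESSED and
load_desc_high() are names invented here, and returning the local copy
with the flag forced on is a choice of the sketch. The point is that
with "writable" clear the update is skipped entirely and the loop exits
after a single pass.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DESC_ACCESSED 0x100u   /* Accessed flag in the descriptor's high word */

/*
 * Read the high word of a descriptor and try to set its Accessed flag.
 * When 'writable' is false the memory update is skipped (writes are to
 * be discarded), so the loop terminates without touching memory.
 */
static uint32_t load_desc_high(_Atomic uint32_t *pdesc_b, bool writable)
{
    for ( ; ; )
    {
        uint32_t b = atomic_load(pdesc_b);
        uint32_t expected = b;

        if ( (b & DESC_ACCESSED) || !writable )
            return b | DESC_ACCESSED;   /* flag forced in the local copy only */

        /* Retry only if the descriptor changed underneath us. */
        if ( atomic_compare_exchange_strong(pdesc_b, &expected,
                                            b | DESC_ACCESSED) )
            return b | DESC_ACCESSED;
    }
}
```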

Jan