From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Jan Beulich" <JBeulich@suse.com>
Subject: Re: [PATCH v5 10/15] x86/altp2m: add remaining support
 routines.
Date: Tue, 21 Jul 2015 00:38:22 -0600
Message-ID: <55AE04FE020000780009376F@prv-mh.provo.novell.com>
References: <1436832903-12639-1-git-send-email-edmund.h.white@intel.com>
	<1436832903-12639-11-git-send-email-edmund.h.white@intel.com>
	<55A539710200007800090CB0@mail.emea.novell.com>
	<DBC12B0F5509554280826E40BCDEE8BE54FD7766@ORSMSX104.amr.corp.intel.com>
	<55A796B20200007800091C61@mail.emea.novell.com>
	<DBC12B0F5509554280826E40BCDEE8BE54FD9227@ORSMSX104.amr.corp.intel.com>
	<55ACB6EF0200007800092E1B@mail.emea.novell.com>
	<DBC12B0F5509554280826E40BCDEE8BE54FDACDF@ORSMSX104.amr.corp.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <DBC12B0F5509554280826E40BCDEE8BE54FDACDF@ORSMSX104.amr.corp.intel.com>
Content-Disposition: inline
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Ravi Sahita <ravi.sahita@intel.com>
Cc: Tim Deegan <tim@xen.org>, Wei Liu <wei.liu2@citrix.com>, George Dunlap <george.dunlap@eu.citrix.com>, Andrew Cooper <andrew.cooper3@citrix.com>, Ian Jackson <ian.jackson@eu.citrix.com>, Edmund H White <edmund.h.white@intel.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>, "tlengyel@novetta.com" <tlengyel@novetta.com>, Daniel De Graaf <dgdegra@tycho.nsa.gov>
List-Id: xen-devel@lists.xenproject.org

>>> On 21.07.15 at 07:46, <ravi.sahita@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>Sent: Sunday, July 19, 2015 11:53 PM
>>>>> On 18.07.15 at 00:32, <ravi.sahita@intel.com> wrote:
>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>>>Sent: Thursday, July 16, 2015 2:34 AM
>>>>
>>>>>>> On 16.07.15 at 11:16, <ravi.sahita@intel.com> wrote:
>>>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>>>>>Sent: Tuesday, July 14, 2015 7:32 AM
>>>>>>>>> On 14.07.15 at 02:14, <edmund.h.white@intel.com> wrote:
>>>>>>> @@ -2965,9 +3003,15 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
>>>>>>unsigned long gla,
>>>>>>>          if ( npfec.write_access )
>>>>>>>          {
>>>>>>>              paging_mark_dirty(currd, mfn_x(mfn));
>>>>>>> +            /* If p2m is really an altp2m, unlock here to avoid
>>>>>>> + lock
>>>>> ordering
>>>>>>> +             * violation when the change below is propagated from
>>>>>>> + host p2m
>>>>> */
>>>>>>> +            if ( ap2m_active )
>>>>>>> +                __put_gfn(p2m, gfn);
>>>>>>>              p2m_change_type_one(currd, gfn, p2m_ram_logdirty,
>>>>>>> p2m_ram_rw);
>>>>>>
>>>>>>And this won't result in any races?
>>>>>
>>>>> No
>>>>
>>>>To be honest I expected a little more than just "no" here. Now I have
>>>>to ask - why?
>>>>
>>>
>>> Yes, I should have described it more than that :-)  so this part of
>>> the code is handling the log dirty transition of the page, and this
>>> page permission transition happens always on the hostp2m. Given the
>>> way the locking order is setup (hostp2m->altp2m-list-lock->altp2m and
>>> there was a separate writeup and discussion with George on that), at
>>> this point in this sequence there is a p2m lock (whether it's a
>>> hostp2m or altp2m lock depends on the mode of the
>>> domain) - the reason we have to drop the lock here first is due to
>>> what happens next; the permission changes in hostp2m will be serially
>>> propagated to altp2ms and not dropping the lock here would cause a
>>> locking order violation. Hope that clarifies.
>>
>>Sadly it doesn't at all: You re-explain why you need to drop the lock, while 
> you
>>fail to say anything on why this won't cause a race.
>>
> 
> It only drops the lock on the altp2m, which is no longer required in this 
> function anyway. The important aspect is that there is still a lock held on 
> the host p2m, and that is dropped after the log-dirty updates, as it would be 
> in the non-altp2m case (maybe that was the part I should have explained 
> clearly in the para above). Does that clarify or do you see a particular race 
> condition here? (We don't ).

Sounds okay then.

>>>>>>> +long p2m_init_altp2m_by_id(struct domain *d, uint16_t idx) {
>>>>>>> +    long rc = -EINVAL;
>>>>>>
>>>>>>Why long (for both variable and function return type)? (More of
>>>>>>these in functions below.)
>>>>>
>>>>> Because the error variable in the code that calls these (in hvm.c)
>>>>> is a long, and you had given feedback earlier to propagate the
>>>>> returns from these functions through that calling code.
>>>>
>>>>I don't see the connection. The function only returns zero or -E...
>>>>values, so why would its return type be "long"?
>>>>
>>>
>>> do_hvm_op declares a rc that is of type "long" and hence this returns
>>> a "long"
>>
>>What type your caller(s) return is of no interest at all here: What would you 
> do
>>if you had multiple callers with differing return types?
>>A function's return type should be chosen based on the range of values it may
>>return, and the result possibly widened to not yield inefficient code (like 
> in
>>some of the uint16_t cases elsewhere in the series would be necessary).
>>
> 
> What do you suggest the return type be?

For the case here - int (quite obviously I would say).

For the uint16_t ones - unsigned int.

>>>>>>> +void p2m_altp2m_propagate_change(struct domain *d, gfn_t gfn,
>>>>>>> +                                 mfn_t mfn, unsigned int page_order,
>>>>>>> +                                 p2m_type_t p2mt, p2m_access_t
>>>>>>> +p2ma) {
>>>>>>> +    struct p2m_domain *p2m;
>>>>>>> +    p2m_access_t a;
>>>>>>> +    p2m_type_t t;
>>>>>>> +    mfn_t m;
>>>>>>> +    uint16_t i;
>>>>>>> +    bool_t reset_p2m;
>>>>>>> +    unsigned int reset_count = 0;
>>>>>>> +    uint16_t last_reset_idx = ~0;
>>>>>>> +
>>>>>>> +    if ( !altp2m_active(d) )
>>>>>>> +        return;
>>>>>>> +
>>>>>>> +    altp2m_list_lock(d);
>>>>>>> +
>>>>>>> +    for ( i = 0; i < MAX_ALTP2M; i++ )
>>>>>>> +    {
>>>>>>> +        if ( d->arch.altp2m_eptp[i] == INVALID_MFN )
>>>>>>> +            continue;
>>>>>>> +
>>>>>>> +        p2m = d->arch.altp2m_p2m[i];
>>>>>>> +        m = get_gfn_type_access(p2m, gfn_x(gfn), &t, &a, 0,
>>>>>>> + NULL);
>>>>>>> +
>>>>>>> +        reset_p2m = 0;
>>>>>>> +
>>>>>>> +        /* Check for a dropped page that may impact this altp2m */
>>>>>>> +        if ( mfn_x(mfn) == INVALID_MFN &&
>>>>>>> +             gfn_x(gfn) >= p2m->min_remapped_gfn &&
>>>>>>> +             gfn_x(gfn) <= p2m->max_remapped_gfn )
>>>>>>> +            reset_p2m = 1;
>>>>>>
>>>>>>Considering that this looks like an optimization, what's the
>>>>>>downside of possibly having min=0 and max=<end-of-address-space>?
>>I.e.
>>>>>>can there a long latency operation result that's this way a guest can
>>effect?
>>>>>>
>>>>>
>>>>> ... A p2m is a gfn->mfn map, amongst other things. There is a
>>>>> reverse
>>>>> mfn->gfn map, but that is only valid for the host p2m. Unless the
>>>>> remap altp2m hypercall is used, the gfn->mfn map in every altp2m
>>>>> mirrors the gfn->mfn map in the host p2m (or a subset thereof, due
>>>>> to lazy-copy), so handling removal of an mfn from a guest is simple:
>>>>> do a reverse look up for the host p2m and mark the relevant gfn as
>>>>> invalid, then do the same for every altp2m where that gfn is currently
>>valid.
>>>>>
>>>>> Remap changes things: it says take gfn1 and replace ->mfn with the
>>>>> ->mfn of gfn2. Here is where the optimization is used and the
>>>>> ->invalidate
>>>>logic is:
>>>>> record the lowest and highest gfn2's that have been used in remap
>>>>> ops; if an mfn is dropped from the hostp2m, for the purposes of
>>>>> altp2m invalidation, see if the gfn derived from the host p2m
>>>>> reverse lookup falls within the range of used gfn2's. If it does, an
>>>>> invalidation is required. Which is why min and max are inited the
>>>>> way they are - hope the explanation clarifies this optimization.
>>>>
>>>>Sadly it doesn't, it just re-states what I already understood and
>>>>doesn't answer the question: What happens if min=0 and
>>>>max=<end-of-address-
>>>>space>? I.e. can the guest nullify the optimization by careful
>>>>space>fiddling
>>> issuing
>>>>some of the new hypercalls, and if so will this have any negative
>>>>impact on
>>> the
>>>>hypervisor? I'm asking this from a security standpoint ...
>>>>
>>>
>>> To take that exact case, If min=0 and max=<end of address space> then
>>> any hostp2m change where the first mfn is dropped, will cause all
>>> altp2ms to be reset even if the mfn dropped doesn't affect altp2ms at
>>> all, which wont serve as an optimization at all - Hope that clarifies.
>>
>>Again - no. I understand the optimization is gone then. But what's the effect?
>>I.e. will the guest, by extending this range to be arbitrarily wide, be able 
> to
>>cause a long latency hypervisor operation (i.e. a DoS)?
>>
> 
> The extent of the range affects the likelihood of an invalidation. It has no 
> impact on the cost of an invalidation (so no its not a DoS issue).
> I'm not sure what change you are suggesting here or just clarification (if 
> you think this optimization is confusing perhaps some documentation of this 
> optimization will help?)

Well, the optimization must be optimizing _something_. And hence
_something_ must go sub-optimal when the optimization is being
subverted. And the question is how much worse un-optimized is
compared to optimized.

It _looks like_ the overall effect really is just to avoid a one time
(for a given non-preemptible operation) reset, but whether that's
really the case depends on the calling contexts (which, as said a
couple of times before, is hard to see for a patch that introduces
functions without callers - hence the question).

>>>>Nor do I find my question answered why max can't be initialized to zero:
>>>>You don't care whether max is a valid GFN when a certain GFN doesn't
>>>>fall in the (then empty) [min, max] range. What am I missing?
>>>
>>> Since 0 is a valid GFN so we cannot initialize min or max to 0 - since
>>> matching this condition (gfn_x(gfn) >= p2m->min_remapped_gfn &&
>>> gfn_x(gfn) <=
>>> p2m->max_remapped_gfn) will cause a reset (throw away) of the altp2m
>>> p2m->to
>>> rebuild it from the hostp2m. So essentially what is being done here is
>>> the range is the non-existent set to start with, unless some altp2m
>>> changes occur, and then it is grown to be the smallest set around the gfns
>>affected.
>>
>>Again you just re-state what was already clear, yet you neglect answering the
>>actual question. Taking what you wrote above, when max=0 (and
>>min=INVALID_GFN), then
>>
>>	gfn_x(gfn) >= p2m->min_remapped_gfn &&
>>	 gfn_x(gfn) <= p2m->max_remapped_gfn
>>
>>will be false for any value of gfn; in fact the "max" part won't even be 
> looked
>>at because the "min" part will already be false for any valid gfn, i.e. only 
> the
>>INVALID_GFN case would make it to the "max" part.
>>
> 
> You are suggesting an alternative which will work, but what we have will 
> also produce the same results, and the results are correct for both cases - so 
> can we go with the approach we have taken currently?

Of course we can, but should we? I.e. why use sub-optimal code
when there clearly is a way to improve it? Counter question - why
are you insisting to use a model requiring (in the earlier pointed out
place) four comparisons when two can do? I realize this is only a
small inefficiency here, but you should realize that they add up if
we don't look to avoid them where possible.

Jan