All of lore.kernel.org
 help / color / mirror / Atom feed
From: George Dunlap <george.dunlap@citrix.com>
To: "Han, Huaitong" <huaitong.han@intel.com>, "Wu, Feng" <feng.wu@intel.com>
Cc: "tim@xen.org" <tim@xen.org>, "Tian, Kevin" <kevin.tian@intel.com>,
	"wei.liu2@citrix.com" <wei.liu2@citrix.com>,
	"ian.campbell@citrix.com" <ian.campbell@citrix.com>,
	"stefano.stabellini@eu.citrix.com"
	<stefano.stabellini@eu.citrix.com>,
	"george.dunlap@eu.citrix.com" <george.dunlap@eu.citrix.com>,
	"andrew.cooper3@citrix.com" <andrew.cooper3@citrix.com>,
	"Dong, Eddie" <eddie.dong@intel.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	"jbeulich@suse.com" <jbeulich@suse.com>,
	"Nakajima, Jun" <jun.nakajima@intel.com>,
	"keir@xen.org" <keir@xen.org>,
	"ian.jackson@eu.citrix.com" <ian.jackson@eu.citrix.com>
Subject: Re: [V3 PATCH 7/9] x86/hvm: pkeys, add pkeys support for guest_walk_tables
Date: Fri, 18 Dec 2015 10:03:32 +0000	[thread overview]
Message-ID: <5673D9F4.5050400@citrix.com> (raw)
In-Reply-To: <1450426913.4557.20.camel@intel.com>

On 18/12/15 08:21, Han, Huaitong wrote:
> On Wed, 2015-12-16 at 15:36 +0000, George Dunlap wrote:
>> [Adding Tim, the previous mm maintainer]
>>
>> On 11/12/15 09:16, Wu, Feng wrote:
>>>>> +{
>>>>> +    void *xsave_addr;
>>>>> +    unsigned int pkru = 0;
>>>>> +    bool_t pkru_ad, pkru_wd;
>>>>> +
>>>>> +    bool_t uf = !!(pfec & PFEC_user_mode);
>>>>> +    bool_t wf = !!(pfec & PFEC_write_access);
>>>>> +    bool_t ff = !!(pfec & PFEC_insn_fetch);
>>>>> +    bool_t rsvdf = !!(pfec & PFEC_reserved_bit);
>>>>> +    bool_t pkuf  = !!(pfec & PFEC_prot_key);
>>>>
>>>> So I'm just wondering out loud here -- is there actually any
>>>> situation
>>>> in which we would want guest_walk_tables to act differently than
>>>> the
>>>> real hardware?
>>>>
>>>> That is, is there actually any situation where, pku is enabled,
>>>> the vcpu
>>>> is in long mode, PFEC_write_access and/or PFEC_page_present is
>>>> set, and
>>>> the pkey is non-zero, that we want guest_walk_tables() to only
>>>> check the
>>>> write-protect bit for the pte, and not also check the pkru?
>>>>
>>>> Because if not, it seems like it would be much more robust to
>>>> simply
>>>> *always* check for pkru_ad if PFEC_page_present is set, and for
>>>> pkru_wd
>>>> if PFEC_write_access is set.
>>>
>>> guest_walk_tables() is also used by shadow code, though we don't
>>> plan to support pkeys for shadow now, however, in that case, the
>>> 'pfec'
>>> is generated by hardware, and the pkuf bit may be 0 or 1 depending
>>> on the real page fault. If we unconditionally check pkeys in
>>> guest_walk_tables(), it is not a good ideas for someone who may
>>> want to implement the pkeys for shadow in future, since we only
>>> need to check pkeys when the pkuf is set in pfec.
>>
>> So you'll have to forgive me for being a bit slow here -- I stepped
>> away
>> from the mm code for a while, and a lot has changed since I agreed to
>> become mm maintainer earlier this year.
>>
>> So here is what I see in the tree; please correct me if I missed
>> something important.
>>
>> The function guest_walk_tables():
>>  1. Accepts a pfec argument which is meant to describe the *access
>> type*
>> that is happening
>>  2. Returns a value with the "wrong" flags (i.e., flags which needed
>> to
>> be set that were missing, or flags that needed to be clear that were
>> set).
>>  3. It checks reserved bits in the pagetable regardless of whether
>> PFEC_reserved_bit is set in the caller, and returns
>> _PAGE_INVALID_BITS
>> if so.
>>  4. If PFEC_insn_fetch is set, it will only check for nx and smep if
>> those features are enabled for the guest.
>>
>> The main callers are the various gva_to_gfn(), which accept a
>> *pointer*
>> to a pfec argument.  This pointer is actually bidirectional: its
>> value
>> passed to guest_walk_tables() to determine what access types are
>> checked; what is returned is meant to be a pfec value which can be
>> passed to a guest as part of a fault (and is by several callers). 
>>  But
>> it may also return the Xen-internal flags, PFEC_page_{paged,shared}.
>> And, importantly, it modifies the pfec value based on the bits that
>> are
>> returned from guest_walk_tables(): It will clear PFEC_present if
>> guest_walk_tables() returns _PAGE_PRESENT, and it will set
>> PFEC_reserved_bit if guest_walk_tables() returns _PAGE_INVALID_BIT.
>>
>> The next logical level up are the
>> hvm_{copy,fetch}_{to,from}_guest_virt(), which also take a pfec
>> argument: but really the only purpose of the pfec argument is to
>> allow
>> the caller to add the PFEC_user_mode flag; the other access-type
>> flags
>> are set automatically by the functions themselves (e.g., "to" sets
>> PFEC_write_access, "fetch" sets PFEC_insn_fetch, &c).  Several
>> callers
>> set PFEC_present as well, but callers set any other bits.
>>
>> (hvm_fetch_from_guest_virt() seems to only set PFEC_insn_fetch if nx
>> or
>> smep are enabled in the guest.  This seems inconsistent to me with
>> the
>> treatment of PFEC_reserved_bit: it seems like
>> hvm_fetch_from_guest_virt() should always pass in PFEC_insn_fetch,
>> particularly as guest_walk_tables() will already gate the checks
>> based
>> on whether nx or smep is enabled in the guest.  Tim, you know of any
>> reason for this?)
>>
>> So there seems to me to be no reason to pass PFEC_prot_key into
>> guest_walk_tables() or gva_to_gfn().  The pfec value passed *into*
>> those
>> should simply indicate the type of memory access being done: present,
>> write, instruction fetch, user.
>>
>> With your current series, guest_walk_tables() already checks for
>> pkeys
>> being enabled in the guest before checking for them in the
>> pagetables.
>> For shadow mode, these will be false, and so no checks will be done. 
>>  If
>> anyone ever implements pkeys for shadow mode, then these will be
>> enabled, and the checks will be done, without any intervention on the
>> part of the caller.
> 
> I have understood it, but, the problem with shadow mode is that pfec
> may come from regs->error_code(hardware), just like:
> rc = sh_walk_guest_tables(v, va, &gw, regs->error_code);
> so, when regs->error_code does not have PFEC_prot_key,
> guest_walk_tables may still check PKEY when codes is writen according
> to what you said, and it maybe return a different result.

Under what situation would the *hardware* not generate PFEC_prot_key,
but guest_walk_tables() would, under exactly the same conditions,
generate PFEC_prot_key?  Shouldn't guest_walk_tables() be made to behave
identically to the hardware?

The only situation where that might legitimately happen is when the
shadow pagetables and the guest pagetables diverge; for instance, when a
page had been write-protected because Xen thought it was a pagetable.
Then the guest pte might have write permission, but the pkey read-only
permission; but the shadow pte would not have write permission.  In that
case, the hardware might give PFEC_write_access but not PFEC_prot_key, I
suppose (haven't read the manual to be sure); but in that case, we
*want* guest_walk_tables() to detect and add PFEC_prot_key, don't we?

 -George




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

  reply	other threads:[~2015-12-18 10:03 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-07  9:16 [V3 PATCH 0/9] x86/hvm: pkeys, add memory protection-key support Huaitong Han
2015-12-07  9:16 ` [V3 PATCH 1/9] x86/hvm: pkeys, add the flag to enable Memory Protection Keys Huaitong Han
2015-12-10 15:37   ` George Dunlap
2015-12-07  9:16 ` [V3 PATCH 2/9] x86/hvm: pkeys, add pkeys support when setting CR4 Huaitong Han
2015-12-07  9:16 ` [V3 PATCH 3/9] x86/hvm: pkeys, disable pkeys for guests in non-paging mode Huaitong Han
2015-12-07  9:16 ` [V3 PATCH 4/9] x86/hvm: pkeys, add functions to get pkeys value from PTE Huaitong Han
2015-12-10 15:48   ` George Dunlap
2015-12-10 18:47     ` Andrew Cooper
2015-12-07  9:16 ` [V3 PATCH 5/9] x86/hvm: pkeys, add functions to support PKRU access Huaitong Han
2015-12-10 18:48   ` Andrew Cooper
2015-12-07  9:16 ` [V3 PATCH 6/9] x86/hvm: pkeys, add xstate support for pkeys Huaitong Han
2015-12-10 17:39   ` George Dunlap
2015-12-10 18:57   ` Andrew Cooper
2015-12-11  9:36   ` Jan Beulich
2015-12-07  9:16 ` [V3 PATCH 7/9] x86/hvm: pkeys, add pkeys support for guest_walk_tables Huaitong Han
2015-12-10 18:19   ` George Dunlap
2015-12-11  9:16     ` Wu, Feng
2015-12-11  9:23       ` Jan Beulich
2015-12-16 15:36       ` George Dunlap
2015-12-16 16:28         ` Tim Deegan
2015-12-16 16:34           ` Andrew Cooper
2015-12-16 17:33             ` Tim Deegan
2015-12-16 16:50           ` George Dunlap
2015-12-16 17:21             ` Tim Deegan
2015-12-18  8:21         ` Han, Huaitong
2015-12-18 10:03           ` George Dunlap [this message]
2015-12-18 11:46           ` Tim Deegan
2015-12-11  9:23     ` Han, Huaitong
2015-12-11  9:50       ` Jan Beulich
2015-12-11  9:26     ` Jan Beulich
2015-12-15  8:14       ` Han, Huaitong
2015-12-15  9:02         ` Jan Beulich
2015-12-16  8:16           ` Han, Huaitong
2015-12-16  8:32             ` Jan Beulich
2015-12-16  9:03               ` Han, Huaitong
2015-12-16  9:12                 ` Jan Beulich
2015-12-17  9:18                   ` Han, Huaitong
2015-12-17 10:05                     ` Jan Beulich
2015-12-10 18:59   ` Andrew Cooper
2015-12-11  7:18     ` Han, Huaitong
2015-12-11  8:48       ` Andrew Cooper
2015-12-07  9:16 ` [V3 PATCH 8/9] x86/hvm: pkeys, add pkeys support for gva2gfn funcitons Huaitong Han
2015-12-07  9:16 ` [V3 PATCH 9/9] x86/hvm: pkeys, add pkeys support for cpuid handling Huaitong Han
2015-12-11  9:47   ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5673D9F4.5050400@citrix.com \
    --to=george.dunlap@citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=eddie.dong@intel.com \
    --cc=feng.wu@intel.com \
    --cc=george.dunlap@eu.citrix.com \
    --cc=huaitong.han@intel.com \
    --cc=ian.campbell@citrix.com \
    --cc=ian.jackson@eu.citrix.com \
    --cc=jbeulich@suse.com \
    --cc=jun.nakajima@intel.com \
    --cc=keir@xen.org \
    --cc=kevin.tian@intel.com \
    --cc=stefano.stabellini@eu.citrix.com \
    --cc=tim@xen.org \
    --cc=wei.liu2@citrix.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.