From mboxrd@z Thu Jan 1 00:00:00 1970 From: George Dunlap Subject: Re: [V3 PATCH 7/9] x86/hvm: pkeys, add pkeys support for guest_walk_tables Date: Wed, 16 Dec 2015 15:36:31 +0000 Message-ID: <567184FF.9030702@citrix.com> References: <1449479780-19146-1-git-send-email-huaitong.han@intel.com> <1449479780-19146-8-git-send-email-huaitong.han@intel.com> <5669C23F.6080203@citrix.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: "Wu, Feng" , "Han, Huaitong" , "jbeulich@suse.com" , "andrew.cooper3@citrix.com" , "Nakajima, Jun" , "Dong, Eddie" , "Tian, Kevin" , "george.dunlap@eu.citrix.com" , "ian.jackson@eu.citrix.com" , "stefano.stabellini@eu.citrix.com" , "ian.campbell@citrix.com" , "wei.liu2@citrix.com" , "keir@xen.org" Cc: Tim Deegan , "xen-devel@lists.xen.org" List-Id: xen-devel@lists.xenproject.org [Adding Tim, the previous mm maintainer] On 11/12/15 09:16, Wu, Feng wrote: >>> +{ >>> + void *xsave_addr; >>> + unsigned int pkru = 0; >>> + bool_t pkru_ad, pkru_wd; >>> + >>> + bool_t uf = !!(pfec & PFEC_user_mode); >>> + bool_t wf = !!(pfec & PFEC_write_access); >>> + bool_t ff = !!(pfec & PFEC_insn_fetch); >>> + bool_t rsvdf = !!(pfec & PFEC_reserved_bit); >>> + bool_t pkuf = !!(pfec & PFEC_prot_key); >> >> So I'm just wondering out loud here -- is there actually any situation >> in which we would want guest_walk_tables to act differently than the >> real hardware? >> >> That is, is there actually any situation where, pku is enabled, the vcpu >> is in long mode, PFEC_write_access and/or PFEC_page_present is set, and >> the pkey is non-zero, that we want guest_walk_tables() to only check the >> write-protect bit for the pte, and not also check the pkru? >> >> Because if not, it seems like it would be much more robust to simply >> *always* check for pkru_ad if PFEC_page_present is set, and for pkru_wd >> if PFEC_write_access is set. > > guest_walk_tables() is also used by shadow code, though we don't > plan to support pkeys for shadow now, however, in that case, the 'pfec' > is generated by hardware, and the pkuf bit may be 0 or 1 depending > on the real page fault. If we unconditionally check pkeys in > guest_walk_tables(), it is not a good ideas for someone who may > want to implement the pkeys for shadow in future, since we only > need to check pkeys when the pkuf is set in pfec. So you'll have to forgive me for being a bit slow here -- I stepped away from the mm code for a while, and a lot has changed since I agreed to become mm maintainer earlier this year. So here is what I see in the tree; please correct me if I missed something important. The function guest_walk_tables(): 1. Accepts a pfec argument which is meant to describe the *access type* that is happening 2. Returns a value with the "wrong" flags (i.e., flags which needed to be set that were missing, or flags that needed to be clear that were set). 3. It checks reserved bits in the pagetable regardless of whether PFEC_reserved_bit is set in the caller, and returns _PAGE_INVALID_BITS if so. 4. If PFEC_insn_fetch is set, it will only check for nx and smep if those features are enabled for the guest. The main callers are the various gva_to_gfn(), which accept a *pointer* to a pfec argument. This pointer is actually bidirectional: its value passed to guest_walk_tables() to determine what access types are checked; what is returned is meant to be a pfec value which can be passed to a guest as part of a fault (and is by several callers). But it may also return the Xen-internal flags, PFEC_page_{paged,shared}. And, importantly, it modifies the pfec value based on the bits that are returned from guest_walk_tables(): It will clear PFEC_present if guest_walk_tables() returns _PAGE_PRESENT, and it will set PFEC_reserved_bit if guest_walk_tables() returns _PAGE_INVALID_BIT. The next logical level up are the hvm_{copy,fetch}_{to,from}_guest_virt(), which also take a pfec argument: but really the only purpose of the pfec argument is to allow the caller to add the PFEC_user_mode flag; the other access-type flags are set automatically by the functions themselves (e.g., "to" sets PFEC_write_access, "fetch" sets PFEC_insn_fetch, &c). Several callers set PFEC_present as well, but callers set any other bits. (hvm_fetch_from_guest_virt() seems to only set PFEC_insn_fetch if nx or smep are enabled in the guest. This seems inconsistent to me with the treatment of PFEC_reserved_bit: it seems like hvm_fetch_from_guest_virt() should always pass in PFEC_insn_fetch, particularly as guest_walk_tables() will already gate the checks based on whether nx or smep is enabled in the guest. Tim, you know of any reason for this?) So there seems to me to be no reason to pass PFEC_prot_key into guest_walk_tables() or gva_to_gfn(). The pfec value passed *into* those should simply indicate the type of memory access being done: present, write, instruction fetch, user. With your current series, guest_walk_tables() already checks for pkeys being enabled in the guest before checking for them in the pagetables. For shadow mode, these will be false, and so no checks will be done. If anyone ever implements pkeys for shadow mode, then these will be enabled, and the checks will be done, without any intervention on the part of the caller. The only thing that's missing is: 1. for gva_to_gfn() to check for _PAGE_PKEY_BITS on the return from guest_walk_tables(), and set PFEC_prot_key if so 2. to set PFEC_insn_fetch when pkeys are enabled in hvm_fetch_from_guest_virt(), so that guest_walk_tables() knows that it's an instruction fetch. >> Then in patch 8, you wouldn't need to go around all the __hvm_copy >> functions adding in PFEC_prot; instead, you'd just need to add >> PFEC_insn_fetch to the "fetch" (as is already done for SMEP and NX), and >> you'd be done. > > Pkeys has no business with instructions fetch, why do we need to add > PFEC_insn_fetch? PFEC_insn_fetch passed *into* guest_walk_tables() indicates that the access is an instruction fetch. Your code treats instruction fetches differently than normal reads, yes? At the moment PFEC_insn_fetch is only set in hvm_fetch_from_guest_virt() if hvm_nx_enabled() or hvm_smep_enabled() are true. Which means that if you *don't* have nx or smep enabled, then the patch series as is will fault on instruction fetches when it shouldn't. (I don't remember anyone mentioning nx or smep being enabled as a prerequisite for pkeys.) (And it seems to me that it would be better to always set PFEC_insn_fetch in hvm_fetch_from_guest_virt(), but that's a bit beyond the scope of what you're trying to accomplish here.) -George