From: "Jan Beulich"
Subject: Re: [PATCH RFC 1/4] x86/mm: Shadow and p2m changes for PV mem_access
Date: Tue, 29 Apr 2014 09:50:38 +0100
Message-ID: <535F83FE020000780000D397@nat28.tlf.novell.com>
References: <1398746705-6658-1-git-send-email-aravindp@cisco.com> <1398746705-6658-2-git-send-email-aravindp@cisco.com>
In-Reply-To: <1398746705-6658-2-git-send-email-aravindp@cisco.com>
To: Aravindh Puthiyaparambil <aravindp@cisco.com>
Cc: xen-devel@lists.xenproject.org, Keir Fraser, Ian Jackson, Ian Campbell, Tim Deegan
List-Id: xen-devel@lists.xenproject.org

>>> On 29.04.14 at 06:45, <aravindp@cisco.com> wrote:
> +void p2m_access_to_flags(u32 *flags, u32 gflags, p2m_access_t access)
> +{
> +
> +    /* Restrict with access permissions */
> +    switch (access)
> +    {
> +    case p2m_access_r:
> +        *flags &= ~(_PAGE_RW);
> +        *flags |= (_PAGE_NX_BIT|_PAGE_PRESENT);
> +        break;
> +    case p2m_access_rx:
> +    case p2m_access_rx2rw:
> +        *flags &= ~(_PAGE_NX_BIT|_PAGE_RW);
> +        *flags |= _PAGE_PRESENT;
> +        break;
> +    case p2m_access_rw:
> +        *flags |= (_PAGE_NX_BIT|_PAGE_RW|_PAGE_PRESENT);
> +        break;
> +    case p2m_access_rwx:
> +    default:
> +        *flags &= ~(_PAGE_NX_BIT);
> +        *flags |= (_PAGE_RW|_PAGE_PRESENT);
> +        break;
> +    }

Tim will have the final say here, but I'd suggest dropping all these
needless parentheses (I see only one case where they're really
needed).

> +
> +    // Allow more restrictive guest flags to be propagated instead of access
> +    // permissions

Coding style (there are more of these, which I'm not going to
individually comment on).

> +    if ( !(gflags & _PAGE_RW) )
> +        *flags &= ~(_PAGE_RW);
> +
> +    if ( gflags & _PAGE_NX_BIT )
> +        *flags |= _PAGE_NX_BIT;

And now you even get things into an inconsistent state wrt
parentheses.

> +static int
> +p2m_mem_access_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
> +                         unsigned int page_order, p2m_type_t p2mt,
> +                         p2m_access_t p2ma)
> +{
> +    struct domain *d = p2m->domain;
> +    mfn_t *access_lookup_table = p2m->access_lookup_table;
> +    uint table_idx;
> +    uint page_idx;
> +    uint8_t *access_table_page;
> +
> +    ASSERT(shadow_mode_mem_access(d) && access_lookup_table != NULL);

Better split into two ASSERT()s, so that if one triggers you'd know
which of the two conditions wasn't met?

> +
> +    /* For PV domains we only support rw, rx, rx2rw, rwx access permissions */
> +    if ( unlikely(p2ma != p2m_access_r &&
> +                  p2ma != p2m_access_rw &&
> +                  p2ma != p2m_access_rx &&
> +                  p2ma != p2m_access_rwx &&
> +                  p2ma != p2m_access_rx2rw) )
> +        return -EINVAL;

Perhaps better done as a switch() statement.
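Just to illustrate what I mean (an untested sketch, using only the
access types already named above):

    switch ( p2ma )
    {
    case p2m_access_r:
    case p2m_access_rw:
    case p2m_access_rx:
    case p2m_access_rwx:
    case p2m_access_rx2rw:
        break;
    default:
        return -EINVAL;
    }

That avoids the long && chain and extends more naturally if further
access types get supported for PV later on.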
> +
> +    if ( page_get_owner(mfn_to_page(mfn)) != d )
> +        return -ENOENT;
> +
> +    gfn = get_gpfn_from_mfn(mfn_x(mfn));

You get "gfn" passed in and blindly overwrite it here? _If_ you need
to do this lookup, shouldn't you instead check it matches the passed
in one?

> +
> +    /*
> +     * Values with the MSB set denote MFNs that aren't really part of the
> +     * domain's pseudo-physical memory map (e.g., the shared info frame).
> +     * Nothing to do here.
> +     */
> +    if ( unlikely(!VALID_M2P(gfn)) )
> +        return 0;
> +
> +    if ( gfn > (d->tot_pages - 1) )
> +        return -EINVAL;

Hardly - a PV domain can easily make its address space sparse, and
iirc pv-ops Linux even does so by default to avoid PFN/MFN collisions
on MMIO space. (And as a side note, this would better be
"gfn >= d->tot_pages".)

> +    paging_lock(d);
> +
> +    table_idx = MEM_ACCESS_TABLE_IDX(gfn);
> +    page_idx = MEM_ACCESS_PAGE_IDX(gfn);
> +    access_table_page = map_domain_page(mfn_x(access_lookup_table[table_idx]));
> +    access_table_page[page_idx] = p2ma;
> +    unmap_domain_page(access_table_page);
> +
> +    if ( sh_remove_all_mappings(d->vcpu[0], mfn) )

Is there anything guaranteeing d->vcpu and d->vcpu[0] being non-NULL?

> +static mfn_t
> +p2m_mem_access_get_entry(struct p2m_domain *p2m, unsigned long gfn,
> +                         p2m_type_t *t, p2m_access_t *a, p2m_query_t q,
> +                         unsigned int *page_order)
> +{
> +    struct domain *d = p2m->domain;
> +    mfn_t *access_lookup_table = p2m->access_lookup_table;
> +    uint table_idx;
> +    uint page_idx;
> +    uint8_t *access_table_page;
> +    mfn_t mfn = _mfn(gfn); // For PV guests mfn == gfn
> +
> +    ASSERT(shadow_mode_mem_access(d) && access_lookup_table != NULL);
> +
> +    /* Not necessarily true, but for non-translated guests, we claim
> +     * it's the most generic kind of memory */

I think you copied this comment verbatim from elsewhere, but I don't
think it's correct as is.

> +    *t = p2m_ram_rw;
> +
> +    if ( page_get_owner(mfn_to_page(mfn)) != d )
> +        return _mfn(INVALID_MFN);
> +
> +    gfn = get_gpfn_from_mfn(mfn_x(mfn));

Same comment as earlier.

> +    if ( gfn > (d->tot_pages - 1) )

Ditto.

> +    table_idx = MEM_ACCESS_TABLE_IDX(gfn);
> +    page_idx = MEM_ACCESS_PAGE_IDX(gfn);
> +
> +    access_table_page = map_domain_page(mfn_x(access_lookup_table[table_idx]));
> +
> +    /* This is a hint to take the default permissions */
> +    if ( access_table_page[page_idx] == p2m_access_n )
> +        access_table_page[page_idx] = p2m->default_access;

We're in "get" here - why does that modify any global state?

> +void p2m_mem_access_teardown(struct p2m_domain *p2m)
> +{
> +    struct domain *d = p2m->domain;
> +    mfn_t *access_lookup_table = p2m->access_lookup_table;
> +    uint32_t nr_access_table_pages;
> +    uint32_t ctr;
> +
> +    /* Reset the set_entry and get_entry function pointers */
> +    p2m_pt_init(p2m);
> +
> +    if ( !access_lookup_table )
> +        return;
> +
> +    nr_access_table_pages = get_domain_nr_access_table_pages(d);
> +
> +    for ( ctr = 0; ctr < nr_access_table_pages; ctr++ )

No new effectively unbounded loops please.

> +int p2m_mem_access_init(struct p2m_domain *p2m)
> +{
> +    struct domain *d = p2m->domain;
> +    mfn_t *access_lookup_table;
> +    uint32_t nr_access_table_pages;
> +    uint32_t ctr;
> +
> +    nr_access_table_pages = get_domain_nr_access_table_pages(d);
> +    access_lookup_table = xzalloc_array(mfn_t, nr_access_table_pages);

This surely is going to be an order > 0 allocation, and you even risk
it being an order > MAX_ORDER one. The former is disallowed at
runtime by convention/agreement, the latter is an outright bug. (And
again just as a side note - you don't appear to need the zero
initialization here.)

> +    if ( !access_lookup_table )
> +        return -ENOMEM;
> +
> +    p2m->access_lookup_table = access_lookup_table;
> +
> +    for ( ctr = 0; ctr < nr_access_table_pages; ctr++ )

Same comment regarding the effective unboundedness of the loop.
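To sketch the kind of structure I'd expect instead (rough and
untested - alloc_one_access_table_page() is a made-up stand-in for
however the per-page allocation ends up being done, and the caller
would need to arrange for a continuation):

    for ( ctr = 0; ctr < nr_access_table_pages; ctr++ )
    {
        int rc = alloc_one_access_table_page(p2m, ctr);

        if ( rc )
            return rc;
        /* Bail periodically so a huge guest can't monopolize the CPU. */
        if ( hypercall_preempt_check() )
            return -EAGAIN;
    }

The same consideration applies to the teardown loop above.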
> @@ -1414,6 +1419,8 @@ long p2m_set_mem_access(struct domain *d, unsigned long pfn, uint32_t nr,
>      if ( pfn == ~0ul )
>      {
>          p2m->default_access = a;
> +        if ( is_pv_domain(d) )
> +            return p2m_mem_access_set_default(p2m);

Is it really correct to set p2m->default_access _before_ calling that
function? Perhaps it wouldn't be correct doing it the other way
around either - I suppose you need to hold the necessary lock across
both operations.

> @@ -585,9 +585,17 @@ int paging_domctl(struct domain *d, xen_domctl_shadow_op_t *sc,
>      {
>  
>      case XEN_DOMCTL_SHADOW_OP_ENABLE:
> +        /*
> +         * Shadow mem_access mode should only be enabled when mem_access is
> +         * enabled in XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE.
> +         */
> +        if ( sc->mode & XEN_DOMCTL_SHADOW_ENABLE_MEM_ACCESS )
> +            return -EINVAL;
> +
>          if ( !(sc->mode & XEN_DOMCTL_SHADOW_ENABLE_LOG_DIRTY) )
>              break;
>          /* Else fall through... */
> +
>      case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY:

Stray blank line - the fall-through here makes it preferable to keep
the two case blocks un-separated.

> @@ -2443,7 +2443,8 @@ int sh_remove_all_mappings(struct vcpu *v, mfn_t gmfn)
>      if ( !(shadow_mode_external(v->domain)
>             && (page->count_info & PGC_count_mask) <= 3
>             && ((page->u.inuse.type_info & PGT_count_mask)
> -               == !!is_xen_heap_page(page))) )
> +               == !!is_xen_heap_page(page)))
> +               && !(shadow_mode_mem_access(v->domain)) )

You're breaking indentation, there are pointless parentheses again,
but most importantly - why?

> @@ -625,6 +627,20 @@ _sh_propagate(struct vcpu *v,
>          }
>      }
>  
> +    /* Propagate access permissions */
> +    if ( unlikely((level == 1) &&
> +                  mem_event_check_ring(&d->mem_event->access) &&
> +                  !sh_mfn_is_a_page_table(target_mfn)) )

Just a general remark here - unlikely() used like this is pointless;
it ought to be used on the individual constituents of && or ||.

> +    {
> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +        p2m_access_t a;
> +        p2m_type_t t;
> +        mfn_t mfn;
> +        mfn = p2m->get_entry(p2m, mfn_x(target_mfn), &t, &a, 0, NULL);

Missing blank line between declarations and statements. Also this
statement could instead be an initializer.

> @@ -2822,6 +2838,8 @@ static int sh_page_fault(struct vcpu *v,
>      int r;
>      fetch_type_t ft = 0;
>      p2m_type_t p2mt;
> +    p2m_access_t p2ma;

This variable ...

> @@ -3009,7 +3027,80 @@ static int sh_page_fault(struct vcpu *v,
>  
>      /* What mfn is the guest trying to access? */
>      gfn = guest_l1e_get_gfn(gw.l1e);
> -    gmfn = get_gfn(d, gfn, &p2mt);
> +    if ( likely(!mem_event_check_ring(&d->mem_event->access)) )
> +        gmfn = get_gfn(d, gfn, &p2mt);
> +    /*
> +     * A mem_access listener is present, so we will first check if a violation
> +     * has occurred.
> +     */
> +    else
> +    {
> +        struct p2m_domain *p2m = p2m_get_hostp2m(v->domain);

... seems to be used only in this scope, and with you already adding
scope restricted variables it should go here.

> +
> +        gmfn = get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma, 0, NULL);
> +        if ( mfn_valid(gmfn) && !sh_mfn_is_a_page_table(gmfn)
> +             && (regs->error_code & PFEC_page_present)
> +             && !(regs->error_code & PFEC_reserved_bit) )
> +        {
> +            int violation = 0;
> +            bool_t access_w = !!(regs->error_code & PFEC_write_access);
> +            bool_t access_x = !!(regs->error_code & PFEC_insn_fetch);
> +            bool_t access_r = access_x ? 0 : !(access_w);

Stray parentheses again. I'm not going to repeat this - please just
check your code before submitting.
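(And the last line could, as far as I can tell without testing, be
simply

    bool_t access_r = !access_w && !access_x;

which says the same thing without the conditional expression.)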
> +            /*
> +             * Do not police writes to guest memory emanating from the Xen
> +             * kernel. Trying to do so will cause the same pagefault to occur
> +             * over and over again with an event being sent to the access
> +             * listener for each fault. If the access listener's vcpu is not
> +             * scheduled during this time, the violation is never resolved and
> +             * will eventually end with the host crashing.
> +             */
> +            if ( (violation && access_w) &&
> +                 (regs->eip >= XEN_VIRT_START && regs->eip <= XEN_VIRT_END) )
> +            {
> +                violation = 0;
> +                rc = p2m->set_entry(p2m, gfn_x(gfn), gmfn, PAGE_ORDER_4K,
> +                                    p2m_ram_rw, p2m_access_rw);
> +            }

This looks more like a hack than a proper solution - shouldn't the
listener be allowed to know of hypervisor side accesses?

> +/* Number of access table pages for a PV domain */
> +#define get_domain_nr_access_table_pages(d) \
> +    DIV_ROUND_UP(P2M_ACCESS_SIZE * (d->tot_pages - 1), PAGE_SIZE)

And once again. I wonder whether this code was tested on a suitably
big pv-ops PV guest.

Jan