From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ian Campbell
Subject: Re: [PATCH 6/7] xen/arm: flush D-cache and I-cache when appropriate
Date: Thu, 15 Nov 2012 10:02:33 +0000
Message-ID: <1352973753.3499.81.camel@zakaz.uk.xensource.com>
References: <1352821336-25837-6-git-send-email-stefano.stabellini@eu.citrix.com>
In-Reply-To: <1352821336-25837-6-git-send-email-stefano.stabellini@eu.citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Stefano Stabellini
Cc: "xen-devel@lists.xensource.com", "Tim (Xen.org)"
List-Id: xen-devel@lists.xenproject.org

On Tue, 2012-11-13 at 15:42 +0000, Stefano Stabellini wrote:
> - invalidate tlb after setting WXN;
> - flush D-cache and I-cache after relocation;
> - flush D-cache after writing to smp_up_cpu;
> - flush TLB before changing HTTBR;
> - flush I-cache after changing HTTBR;
> - flush I-cache and branch predictor after writing Xen text ptes.

Since the reasoning is pretty subtle I wonder if you could say a few
words in each case about why these flushes are necessary? Either here
or in the comments in the code.

> @@ -244,10 +245,17 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
>
>      /* Change pagetables to the copy in the relocated Xen */
>      boot_httbr = (unsigned long) xen_pgtable + phys_offset;
> +    flush_xen_dcache_va(&boot_httbr);
> +    flush_xen_dcache_va_range((void*)dest_va, _end - _start);
> +    isb();
> +    flush_xen_text_tlb();
> +
>      asm volatile (
>          STORE_CP64(0, HTTBR)          /* Change translation base */
>          "dsb;"                        /* Ensure visibility of HTTBR update */
> +        "isb;"
>          STORE_CP32(0, TLBIALLH)       /* Flush hypervisor TLB */
> +        STORE_CP32(0, ICIALLU)        /* Flush I-cache */
>          STORE_CP32(0, BPIALL)         /* Flush branch predictor */

This now looks exactly like flush_xen_text_tlb() -- shall we call it
instead of open-coding it?

>          "dsb;"                        /* Ensure completion of TLB+BP flush */
>          "isb;"
> @@ -256,6 +264,7 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
>      /* Undo the temporary map */
>      pte.bits = 0;
>      write_pte(xen_second + second_table_offset(dest_va), pte);
> +    isb();
>      flush_xen_text_tlb();

Do any calls to flush_xen_text_tlb() not require a preceding isb? I'd
have thought that by its nature an isb would be required every time --
in which case it may as well go in the macro.
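Not a proper patch, just an untested sketch of what I mean by putting
it in the macro (the trailing dummy operand/clobber line is as per the
existing function): since write_pte() already leaves a dsb behind it,
an isb at the top of the flush should be all that callers then need:

/* Flush all hypervisor mappings from the TLB and branch predictor.
 * This is needed after changing Xen code mappings.
 *
 * write_pte() already ends with a dsb, so an isb here is enough to
 * synchronise before the maintenance operations below. */
static inline void flush_xen_text_tlb(void)
{
    register unsigned long r0 asm ("r0");
    asm volatile (
        "isb;"                        /* Synchronise with the PTE writes */
        STORE_CP32(0, TLBIALLH)       /* Flush hypervisor TLB */
        STORE_CP32(0, ICIALLU)        /* Flush I-cache */
        STORE_CP32(0, BPIALL)         /* Flush branch predictor */
        "dsb;"                        /* Ensure completion of TLB+BP flush */
        "isb;"
        : : "r" (r0) /*dummy*/ : "memory");
}

That would also let the open-coded version in setup_pagetables() above
become a plain call.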
>      /* Link in the fixmap pagetable */
> @@ -291,11 +300,14 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
>            >> PAGE_SHIFT);
>      pte.pt.table = 1;
>      write_pte(xen_second + second_linear_offset(XEN_VIRT_START), pte);
> -    /* Have changed a mapping used for .text. Flush everything for safety. */
> -    flush_xen_text_tlb();
> +    /* ISB is needed because we changed the text mappings */
> +    isb();

Why is the text TLB flush not also required in this case?

>      /* From now on, no mapping may be both writable and executable. */
>      WRITE_CP32(READ_CP32(HSCTLR) | SCTLR_WXN, HSCTLR);
> +    isb();
> +    /* Flush everything after setting WXN bit. */
> +    flush_xen_text_tlb();
>  }
>
>  /* MMU setup for secondary CPUS (which already have paging enabled) */
> @@ -303,6 +315,8 @@ void __cpuinit mmu_init_secondary_cpu(void)
>  {
>      /* From now on, no mapping may be both writable and executable. */
>      WRITE_CP32(READ_CP32(HSCTLR) | SCTLR_WXN, HSCTLR);
> +    isb();
> +    flush_xen_text_tlb();
>  }
>
>  /* Create Xen's mappings of memory.
> diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
> index 9511c45..1b1d556 100644
> --- a/xen/include/asm-arm/page.h
> +++ b/xen/include/asm-arm/page.h
> @@ -228,27 +228,72 @@ static inline lpae_t mfn_to_p2m_entry(unsigned long mfn, unsigned int mattr)
>      return e;
>  }
>
> -/* Write a pagetable entry */
> +/* Write a pagetable entry.
> + *
> + * If the table entry is changing a text mapping, it is responsibility
> + * of the caller to issue an ISB after write_pte.
> + */
>  static inline void write_pte(lpae_t *p, lpae_t pte)
>  {
>      asm volatile (
> +        /* Ensure any writes have completed with the old mappings. */
> +        "dsb;"
>          /* Safely write the entry (STRD is atomic on CPUs that support LPAE) */
>          "strd %0, %H0, [%1];"
> +        "dsb;"
>          /* Push this cacheline to the PoC so the rest of the system sees it. */
>          STORE_CP32(1, DCCMVAC)
> +        /* Ensure that the data flush is completed before proceeding */
> +        "dsb;"
>          : : "r" (pte.bits), "r" (p) : "memory");
>  }
>
> +
> +#define MIN_CACHELINE_BYTES 32
> +extern int cacheline_bytes;
> +
> +/* Function for flushing medium-sized areas.
> + * if 'range' is large enough we might want to use model-specific
> + * full-cache flushes. */
> +static inline void flush_xen_dcache_va_range(void *p, unsigned long size)
> +{
> +    void *end;
> +    dsb();           /* So the CPU issues all writes to the range */
> +    for ( end = p + size; p < end; p += cacheline_bytes )

Tim asked if this memory read of the cached cacheline_bytes might not
be more expensive than hitting the (on-core) cp register every time,
which I also suspect will be the case.

Looking at Linux it seems to reread the register each time in
v7_flush_dcache_all, which suggests that reading the register is
faster or at least preferable. (There is a rough sketch of what I mean
at the end of this mail.)

> +        WRITE_CP32((uint32_t) p, DCCMVAC);
> +    dsb();           /* So we know the flushes happen before continuing */
> +}
> +
> +
> +/* Macro for flushing a single small item.  The predicate is always
> + * compile-time constant so this will compile down to 3 instructions in
> + * the common case.  Make sure to call it with the correct type of
> + * pointer! */

Do we need to worry about pointers to things which cross a cacheline
boundary, e.g. a 4-byte thing at offset 30 in a 32-byte cache line? Or
do ARM's C alignment rules ensure this can never happen? (I suspect the
answer is yes.)

> +#define flush_xen_dcache_va(p) do {                                     \
> +    typeof(p) _p = (p);                                                 \
> +    if ( (sizeof *_p) > MIN_CACHELINE_BYTES )                           \
> +        flush_xen_dcache_va_range(_p, sizeof *_p);                      \
> +    else                                                                \
> +        asm volatile (                                                  \
> +            "dsb;"   /* Finish all earlier writes */                    \
> +            STORE_CP32(0, DCCMVAC)                                      \
> +            "dsb;"   /* Finish flush before continuing */               \
> +            : : "r" (_p), "m" (*_p));                                   \
> +} while (0)
> +
> +
>  /*
>   * Flush all hypervisor mappings from the TLB and branch predictor.
> - * This is needed after changing Xen code mappings.
> + * This is needed after changing Xen code mappings.
> + *
> + * The caller needs to issue the necessary barriers before this functions.

Worth commenting on what those are and when they are needed?

>   */
>  static inline void flush_xen_text_tlb(void)
>  {
>      register unsigned long r0 asm ("r0");
>      asm volatile (
> -        "dsb;"                        /* Ensure visibility of PTE writes */
>          STORE_CP32(0, TLBIALLH)       /* Flush hypervisor TLB */
> +        STORE_CP32(0, ICIALLU)        /* Flush I-cache */
>          STORE_CP32(0, BPIALL)         /* Flush branch predictor */
>          "dsb;"                        /* Ensure completion of TLB+BP flush */
>          "isb;"
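Going back to Tim's cacheline_bytes point and the line-boundary
question above, below is the sort of thing I have in mind for the
range flush. It is only an untested sketch: it assumes cpregs.h has
(or grows) a CTR definition for the Cache Type Register (p15, 0, c0,
c0, 1); READ_CP32/WRITE_CP32, DCCMVAC and dsb() are the existing
helpers already used by this patch.

static inline void flush_xen_dcache_va_range(void *p, unsigned long size)
{
    /* CTR[19:16] (DminLine) is log2 of the smallest D-cache line in
     * words, so the line size in bytes is 4 << DminLine.  CTR is
     * assumed to be (or become) a cpregs.h alias for p15,0,c0,c0,1. */
    unsigned long line = 4UL << ((READ_CP32(CTR) >> 16) & 0xf);
    void *end = p + size;

    /* Round down so an item which straddles a line boundary has both
     * of the lines it touches cleaned. */
    p = (void *)((unsigned long)p & ~(line - 1));

    dsb();           /* So the CPU issues all writes to the range */
    for ( ; p < end; p += line )
        WRITE_CP32((uint32_t) p, DCCMVAC);   /* Clean line to PoC */
    dsb();           /* So we know the flushes happen before continuing */
}

Reading CTR once into a local should be at least as cheap as rereading
cacheline_bytes from memory on every iteration, and it removes the
need for the global altogether. Whether that is actually measurable is
another matter, of course.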