From: Stefano Stabellini
Subject: Re: [PATCH 6/7] xen/arm: flush D-cache and I-cache when appropriate
Date: Fri, 16 Nov 2012 15:36:34 +0000
In-Reply-To: <1352973753.3499.81.camel@zakaz.uk.xensource.com>
References: <1352821336-25837-6-git-send-email-stefano.stabellini@eu.citrix.com>
 <1352973753.3499.81.camel@zakaz.uk.xensource.com>
To: Ian Campbell
Cc: "xen-devel@lists.xensource.com", "Tim (Xen.org)", Stefano Stabellini

On Thu, 15 Nov 2012, Ian Campbell wrote:
> On Tue, 2012-11-13 at 15:42 +0000, Stefano Stabellini wrote:
> > - invalidate tlb after setting WXN;
> > - flush D-cache and I-cache after relocation;
> > - flush D-cache after writing to smp_up_cpu;
> > - flush TLB before changing HTTBR;
> > - flush I-cache after changing HTTBR;
> > - flush I-cache and branch predictor after writing Xen text ptes.
>
> Since the reasoning is pretty subtle I wonder if you could say a few
> words in each case about why these flushes are necessary? Either here
> or in the comments in the code.

I'll remove the TLB flush before changing HTTBR because it isn't
actually needed.
I'll write comments in the code regarding the D-cache flush after
writing to smp_up_cpu.
I think the others are self-explanatory: they are a consequence of the
fact that we are modifying the code we are running on, or the mapping
of it.

> > @@ -244,10 +245,17 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
> >
> >      /* Change pagetables to the copy in the relocated Xen */
> >      boot_httbr = (unsigned long) xen_pgtable + phys_offset;
> > +    flush_xen_dcache_va(&boot_httbr);
> > +    flush_xen_dcache_va_range((void*)dest_va, _end - _start);
> > +    isb();
> > +    flush_xen_text_tlb();
> > +
> >      asm volatile (
> >          STORE_CP64(0, HTTBR)          /* Change translation base */
> >          "dsb;"                        /* Ensure visibility of HTTBR update */
> > +        "isb;"
> >          STORE_CP32(0, TLBIALLH)       /* Flush hypervisor TLB */
> > +        STORE_CP32(0, ICIALLU)        /* Flush I-cache */
> >          STORE_CP32(0, BPIALL)         /* Flush branch predictor */
>
> This now looks exactly like flush_xen_text_tlb() -- shall we call it
> instead of open coding?

Yeah

> >          "dsb;"                        /* Ensure completion of TLB+BP flush */
> >          "isb;"
> > @@ -256,6 +264,7 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
> >      /* Undo the temporary map */
> >      pte.bits = 0;
> >      write_pte(xen_second + second_table_offset(dest_va), pte);
> > +    isb();
> >      flush_xen_text_tlb();
>
> Do any calls to flush_xen_text_tlb() not require a preceding isb? I'd
> have thought that by its nature an isb would be required every time --
> in which case it may as well go in the macro.

OK
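Something like this, with the isb folded in at the top (untested
sketch, written against the version quoted further down; the comment
wording is mine):

static inline void flush_xen_text_tlb(void)
{
    register unsigned long r0 asm ("r0");
    asm volatile (
        "isb;"                        /* Synchronize with previous changes to text */
        STORE_CP32(0, TLBIALLH)       /* Flush hypervisor TLB */
        STORE_CP32(0, ICIALLU)        /* Flush I-cache */
        STORE_CP32(0, BPIALL)         /* Flush branch predictor */
        "dsb;"                        /* Ensure completion of TLB+BP flush */
        "isb;"
        : : "r" (r0) /*dummy*/ : "memory");
}

That way callers don't need to remember the isb.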
> >      /* Link in the fixmap pagetable */
> > @@ -291,11 +300,14 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
> >                        >> PAGE_SHIFT);
> >      pte.pt.table = 1;
> >      write_pte(xen_second + second_linear_offset(XEN_VIRT_START), pte);
> > -    /* Have changed a mapping used for .text. Flush everything for safety. */
> > -    flush_xen_text_tlb();
> > +    /* ISB is needed because we changed the text mappings */
> > +    isb();
>
> Why is the text TLB flush not also required in this case?

You are right, it is required, but we can wait a bit longer and delay
it until we set the WXN bit. In fact we can also delay the isb. I'll
make that change and add a comment.

> >      /* From now on, no mapping may be both writable and executable. */
> >      WRITE_CP32(READ_CP32(HSCTLR) | SCTLR_WXN, HSCTLR);
> > +    isb();
> > +    /* Flush everything after setting WXN bit. */
> > +    flush_xen_text_tlb();
> >  }
> >
> >  /* MMU setup for secondary CPUS (which already have paging enabled) */
> > @@ -303,6 +315,8 @@ void __cpuinit mmu_init_secondary_cpu(void)
> >  {
> >      /* From now on, no mapping may be both writable and executable. */
> >      WRITE_CP32(READ_CP32(HSCTLR) | SCTLR_WXN, HSCTLR);
> > +    isb();
> > +    flush_xen_text_tlb();
> >  }
> >
> >  /* Create Xen's mappings of memory.

> > diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
> > index 9511c45..1b1d556 100644
> > --- a/xen/include/asm-arm/page.h
> > +++ b/xen/include/asm-arm/page.h
> > @@ -228,27 +228,72 @@ static inline lpae_t mfn_to_p2m_entry(unsigned long mfn, unsigned int mattr)
> >      return e;
> >  }
> >
> > -/* Write a pagetable entry */
> > +/* Write a pagetable entry.
> > + *
> > + * If the table entry is changing a text mapping, it is the
> > + * responsibility of the caller to issue an ISB after write_pte.
> > + */
> >  static inline void write_pte(lpae_t *p, lpae_t pte)
> >  {
> >      asm volatile (
> > +        /* Ensure any writes have completed with the old mappings. */
> > +        "dsb;"
> >          /* Safely write the entry (STRD is atomic on CPUs that support LPAE) */
> >          "strd %0, %H0, [%1];"
> > +        "dsb;"
> >          /* Push this cacheline to the PoC so the rest of the system sees it. */
> >          STORE_CP32(1, DCCMVAC)
> > +        /* Ensure that the data flush is completed before proceeding */
> > +        "dsb;"
> >          : : "r" (pte.bits), "r" (p) : "memory");
> >  }
> >
> > +
> > +#define MIN_CACHELINE_BYTES 32
> > +extern int cacheline_bytes;
> > +
> > +/* Function for flushing medium-sized areas.
> > + * If 'range' is large enough we might want to use model-specific
> > + * full-cache flushes. */
> > +static inline void flush_xen_dcache_va_range(void *p, unsigned long size)
> > +{
> > +    void *end;
> > +    dsb();           /* So the CPU issues all writes to the range */
> > +    for ( end = p + size; p < end; p += cacheline_bytes )
>
> Tim asked if this memory read of the cached cacheline_bytes might not
> be more expensive than hitting the (on-core) cp register every time,
> which I also suspect will be the case. Looking at Linux, it seems to
> reread the register each time in v7_flush_dcache_all, which suggests
> that reading the register is faster or at least preferable.

OK
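For reference, rereading it on every call means decoding the line size
out of CCSIDR each time, along these lines (untested sketch;
dcache_line_bytes is just an illustrative name; on ARMv7, CSSELR
selects the cache that CCSIDR describes, and CCSIDR[2:0] holds
log2(words per line) - 2):

static inline int dcache_line_bytes(void)
{
    uint32_t ccsidr;

    WRITE_CP32(0, CSSELR);      /* select the L1 data cache */
    isb();                      /* make the selection visible to CCSIDR */
    ccsidr = READ_CP32(CCSIDR);
    /* words per line = 1 << (field + 2), so bytes = 1 << (field + 4) */
    return 1 << ((ccsidr & 0x7) + 4);
}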
> > +        WRITE_CP32((uint32_t) p, DCCMVAC);
> > +    dsb();           /* So we know the flushes happen before continuing */
> > +}
> > +
> > +
> > +/* Macro for flushing a single small item. The predicate is always
> > + * compile-time constant so this will compile down to 3 instructions in
> > + * the common case. Make sure to call it with the correct type of
> > + * pointer! */
>
> Do we need to worry about pointers to things which cross a cacheline
> boundary? i.e. a 4 byte thing at offset 30 in a 32-byte cache line.
>
> Or do ARM's C alignment rules ensure this can never happen? (I suspect
> the answer is yes.)

I think you are right: we need to take this into consideration. I
couldn't find anything about cacheline alignment in the ARM C specs,
and from the gcc docs it seems pretty clear that we cannot rely on
items being aligned to the cacheline size.
I think we should change the macro into something like this (untested;
it rereads the line size from CCSIDR as above, assuming CSSELR already
selects the L1 D-cache, and falls back to the range flush when the item
straddles a cacheline boundary -- note the predicate is no longer a
compile-time constant):

#define flush_xen_dcache_va(p) do {                                     \
    /* CCSIDR[2:0] = log2(words per line) - 2 */                        \
    int cacheline_bytes = 1 << ((READ_CP32(CCSIDR) & 0x7) + 4);         \
    typeof(p) _p = (p);                                                 \
    if ( ((unsigned long)_p & ~(cacheline_bytes - 1)) !=                \
         (((unsigned long)_p + (sizeof *_p) - 1) &                      \
          ~(cacheline_bytes - 1)) )                                     \
        flush_xen_dcache_va_range(_p, sizeof *_p);                      \
    else                                                                \
        asm volatile (                                                  \
            "dsb;"   /* Finish all earlier writes */                    \
            STORE_CP32(0, DCCMVAC)                                      \
            "dsb;"   /* Finish flush before continuing */               \
            : : "r" (_p), "m" (*_p));                                   \
} while (0)

> > +#define flush_xen_dcache_va(p) do {                                \
> > +    typeof(p) _p = (p);                                            \
> > +    if ( (sizeof *_p) > MIN_CACHELINE_BYTES )                      \
> > +        flush_xen_dcache_va_range(_p, sizeof *_p);                 \
> > +    else                                                           \
> > +        asm volatile (                                             \
> > +            "dsb;"   /* Finish all earlier writes */               \
> > +            STORE_CP32(0, DCCMVAC)                                 \
> > +            "dsb;"   /* Finish flush before continuing */          \
> > +            : : "r" (_p), "m" (*_p));                              \
> > +} while (0)
> > +
> > +
> >  /*
> >   * Flush all hypervisor mappings from the TLB and branch predictor.
> > - * This is needed after changing Xen code mappings.
> > + * This is needed after changing Xen code mappings.
> > + *
> > + * The caller needs to issue the necessary barriers before this function.
>
> Worth commenting on what those are and when they are needed?

Given that we are moving the isb inside flush_xen_text_tlb, only a DSB
and the D-cache flushes are needed beforehand. I'll write that down.

> >   */
> >  static inline void flush_xen_text_tlb(void)
> >  {
> >      register unsigned long r0 asm ("r0");
> >      asm volatile (
> > -        "dsb;"                        /* Ensure visibility of PTE writes */
> >          STORE_CP32(0, TLBIALLH)       /* Flush hypervisor TLB */
> > +        STORE_CP32(0, ICIALLU)        /* Flush I-cache */
> >          STORE_CP32(0, BPIALL)         /* Flush branch predictor */
> >          "dsb;"                        /* Ensure completion of TLB+BP flush */
> >          "isb;"
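P.S. FWIW I double-checked the boundary predicate with a quick
userspace test (hypothetical test program, not Xen code;
crosses_cacheline mirrors the macro's check, assuming a 32-byte line):

#include <assert.h>
#include <stdint.h>

#define CACHELINE_BYTES 32UL

static int crosses_cacheline(uintptr_t p, unsigned long size)
{
    /* Compare the line containing the first byte with the line
     * containing the last byte. */
    return (p & ~(CACHELINE_BYTES - 1)) !=
           ((p + size - 1) & ~(CACHELINE_BYTES - 1));
}

int main(void)
{
    assert(!crosses_cacheline(0, 8));   /* fits within the first line */
    assert(!crosses_cacheline(24, 8));  /* ends exactly at byte 31 */
    assert(crosses_cacheline(30, 4));   /* your example: 4 bytes at offset 30 */
    assert(crosses_cacheline(28, 8));   /* straddles two lines */
    return 0;
}

The "- 1" matters: without it, an item ending exactly on a cacheline
boundary would take the range-flush path for no reason.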