linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Christophe Leroy <christophe.leroy@c-s.fr>
To: Joakim Tjernlund <joakim.tjernlund@transmode.se>,
	"scottwood@freescale.com" <scottwood@freescale.com>
Cc: "paulus@samba.org" <paulus@samba.org>,
	"mpe@ellerman.id.au" <mpe@ellerman.id.au>,
	"benh@kernel.crashing.org" <benh@kernel.crashing.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline
Date: Tue, 22 Sep 2015 22:57:48 +0200	[thread overview]
Message-ID: <5601C0CC.6090503@c-s.fr> (raw)
In-Reply-To: <1442954285.29498.79.camel@transmode.se>



Le 22/09/2015 22:38, Joakim Tjernlund a écrit :
> On Tue, 2015-09-22 at 15:35 -0500, Scott Wood wrote:
>> On Tue, 2015-09-22 at 20:32 +0000, Joakim Tjernlund wrote:
>>> On Tue, 2015-09-22 at 15:14 -0500, Scott Wood wrote:
>>>> On Tue, 2015-09-22 at 19:55 +0000, Joakim Tjernlund wrote:
>>>>> On Tue, 2015-09-22 at 14:42 -0500, Scott Wood wrote:
>>>>>> On Tue, 2015-09-22 at 19:34 +0000, Joakim Tjernlund wrote:
>>>>>>> On Tue, 2015-09-22 at 13:58 -0500, Scott Wood wrote:
>>>>>>>> On Tue, 2015-09-22 at 18:12 +0000, Joakim Tjernlund wrote:
>>>>>>>>> On Tue, 2015-09-22 at 18:51 +0200, Christophe Leroy wrote:
>>>>>>>>>> flush/clean/invalidate _dcache_range() functions are all very
>>>>>>>>>> similar and are quite short. They are mainly used in
>>>>>>>>>> __dma_sync()
>>>>>>>>>> perf_event locate them in the top 3 consumming functions
>>>>>>>>>> during
>>>>>>>>>> heavy ethernet activity
>>>>>>>>>>
>>>>>>>>>> They are good candidate for inlining, as __dma_sync() does
>>>>>>>>>> almost nothing but calling them
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>>>>>>>>>> ---
>>>>>>>>>> New in v2
>>>>>>>>>>
>>>>>>>>>>   arch/powerpc/include/asm/cacheflush.h | 55
>>>>>>>>>> +++++++++++++++++++++++++++--
>>>>>>>>>>   arch/powerpc/kernel/misc_32.S         | 65 ------------------
>>>>>>>>>> ----
>>>>>>>>>> ----
>>>>>>>>>> ----
>>>>>>>>>> -----
>>>>>>>>>>   arch/powerpc/kernel/ppc_ksyms.c       |  2 ++
>>>>>>>>>>   3 files changed, 54 insertions(+), 68 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/arch/powerpc/include/asm/cacheflush.h
>>>>>>>>>> b/arch/powerpc/include/asm/cacheflush.h
>>>>>>>>>> index 6229e6b..6169604 100644
>>>>>>>>>> --- a/arch/powerpc/include/asm/cacheflush.h
>>>>>>>>>> +++ b/arch/powerpc/include/asm/cacheflush.h
>>>>>>>>>> @@ -47,12 +47,61 @@ static inline void
>>>>>>>>>> __flush_dcache_icache_phys(unsigned long physaddr)
>>>>>>>>>>   }
>>>>>>>>>>   #endif
>>>>>>>>>>   
>>>>>>>>>> -extern void flush_dcache_range(unsigned long start, unsigned
>>>>>>>>>> long
>>>>>>>>>> stop);
>>>>>>>>>>   #ifdef CONFIG_PPC32
>>>>>>>>>> -extern void clean_dcache_range(unsigned long start, unsigned
>>>>>>>>>> long
>>>>>>>>>> stop);
>>>>>>>>>> -extern void invalidate_dcache_range(unsigned long start,
>>>>>>>>>> unsigned
>>>>>>>>>> long
>>>>>>>>>> stop);
>>>>>>>>>> +/*
>>>>>>>>>> + * Write any modified data cache blocks out to memory and
>>>>>>>>>> invalidate
>>>>>>>>>> them.
>>>>>>>>>> + * Does not invalidate the corresponding instruction cache
>>>>>>>>>> blocks.
>>>>>>>>>> + */
>>>>>>>>>> +static inline void flush_dcache_range(unsigned long start,
>>>>>>>>>> unsigned
>>>>>>>>>> long
>>>>>>>>>> stop)
>>>>>>>>>> +{
>>>>>>>>>> +   void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
>>>>>>>>>> +   unsigned int size = stop - (unsigned long)addr +
>>>>>>>>>> (L1_CACHE_BYTES -
>>>>>>>>>> 1);
>>>>>>>>>> +   unsigned int i;
>>>>>>>>>> +
>>>>>>>>>> +   for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr +=
>>>>>>>>>> L1_CACHE_BYTES)
>>>>>>>>>> +           dcbf(addr);
>>>>>>>>>> +   if (i)
>>>>>>>>>> +           mb();   /* sync */
>>>>>>>>>> +}
>>>>>>>>> This feels optimized for the uncommon case when there is no
>>>>>>>>> invalidation.
>>>>>>>> If you mean the "if (i)", yes, that looks odd.
>>>>>>> Yes.
>>>>>>>
>>>>>>>>> I THINK it would be better to bail early
>>>>>>>> Bail under what conditions?
>>>>>>> test for "i = 0" and return.
>>>>>> Why bother?
>>>>> I usally find it better to dela with special cases upfront så the rest
>>>>> doesn't need to
>>>>> bother. i=0 is a NOP and it is clearer to show that upfront.
>>>> No, I mean why bother special casing this at all?
>>> I just said why?
>>> You to found the if(i) mb() a bit odd and it took a little time to see why
>>> it was there.
>>> Ahh, you mean just skip the if(i) and have mb() unconditionally?
>> Yes.
>>
>>> That changes the semantics a little from the ASM version but perhaps that
>>> doesn't matter?
>> Adding more barriers than strictly necessary is a performance concern, not a
>> semantic change.
> Semantics :)
>
>>    How often are we really calling this function over an empty
>> range?
> Never hopefully, it does not make much sense.
>
>> Not that it matters much one way or another...
> probably not.
>

Here is what I get in asm. First one is with "if (i) mb();". We see gcc 
puts a beqlr. This is the form that is closest to what we had in the 
former misc_32.S
Second one if with "mb()". Here we get a branch to sync for a useless sync

c000e0ac <my_flush_dcache_range1>:
c000e0ac:       54 63 00 36     rlwinm  r3,r3,0,0,27
c000e0b0:       38 84 00 0f     addi    r4,r4,15
c000e0b4:       7d 23 20 50     subf    r9,r3,r4
c000e0b8:       55 29 e1 3f     rlwinm. r9,r9,28,4,31
c000e0bc:       4d 82 00 20     beqlr
c000e0c0:       7d 29 03 a6     mtctr   r9
c000e0c4:       7c 00 18 6c     dcbst   0,r3
c000e0c8:       38 63 00 10     addi    r3,r3,16
c000e0cc:       42 00 ff f8     bdnz    c000e0c4 
<my_flush_dcache_range1+0x18>
c000e0d0:       7c 00 04 ac     sync
c000e0d4:       4e 80 00 20     blr

c000e0d8 <my_flush_dcache_range2>:
c000e0d8:       54 63 00 36     rlwinm  r3,r3,0,0,27
c000e0dc:       38 84 00 0f     addi    r4,r4,15
c000e0e0:       7d 23 20 50     subf    r9,r3,r4
c000e0e4:       55 29 e1 3f     rlwinm. r9,r9,28,4,31
c000e0e8:       41 82 00 14     beq     c000e0fc 
<my_flush_dcache_range2+0x24>
c000e0ec:       7d 29 03 a6     mtctr   r9
c000e0f0:       7c 00 18 6c     dcbst   0,r3
c000e0f4:       38 63 00 10     addi    r3,r3,16
c000e0f8:       42 00 ff f8     bdnz    c000e0f0 
<my_flush_dcache_range2+0x18>
c000e0fc:       7c 00 04 ac     sync
c000e100:       4e 80 00 20     blr

Christophe

  reply	other threads:[~2015-09-22 20:57 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-22 16:50 [PATCH v2 00/25] powerpc/8xx: Use large pages for RAM and IMMR and other improvments Christophe Leroy
2015-09-22 16:50 ` [PATCH v2 01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler Christophe Leroy
2015-09-28 22:07   ` Scott Wood
2015-10-06 13:35     ` Christophe Leroy
2015-10-06 16:39       ` Scott Wood
2015-10-06 16:46       ` Scott Wood
2015-10-06 20:30         ` christophe leroy
2015-10-06 20:38           ` Scott Wood
2015-09-22 16:50 ` [PATCH v2 02/25] powerpc/8xx: Map linear kernel RAM with 8M pages Christophe Leroy
2015-09-22 16:50 ` [PATCH v2 03/25] powerpc: Update documentation for noltlbs kernel parameter Christophe Leroy
2015-09-22 16:50 ` [PATCH v2 04/25] powerpc/8xx: move setup_initial_memory_limit() into 8xx_mmu.c Christophe Leroy
2015-09-22 16:50 ` [PATCH v2 05/25] powerpc/8xx: Fix vaddr for IMMR early remap Christophe Leroy
2015-09-28 23:39   ` Scott Wood
2015-10-08 12:34     ` Christophe Leroy
2015-10-08 19:13       ` Scott Wood
2015-09-22 16:50 ` [PATCH v2 06/25] powerpc32: iounmap() cannot vunmap() area mapped by TLBCAMs either Christophe Leroy
2015-09-28 23:41   ` Scott Wood
2015-10-06 13:50     ` Christophe Leroy
2015-09-22 16:50 ` [PATCH v2 07/25] powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() together Christophe Leroy
2015-09-28 23:47   ` Scott Wood
2015-10-06 14:02     ` Christophe Leroy
2015-10-06 15:16       ` Scott Wood
2015-09-22 16:50 ` [PATCH v2 08/25] powerpc/8xx: Map IMMR area with 512k page at a fixed address Christophe Leroy
2015-09-24 11:41   ` David Laight
2015-09-24 20:14     ` Scott Wood
2015-09-25 14:46       ` David Laight
2015-09-25 17:09         ` Scott Wood
2015-09-28 23:53   ` Scott Wood
2015-09-22 16:50 ` [PATCH v2 09/25] powerpc/8xx: show IMMR area in startup memory layout Christophe Leroy
2015-09-22 16:50 ` [PATCH v2 10/25] powerpc/8xx: CONFIG_PIN_TLB unneeded for CONFIG_PPC_EARLY_DEBUG_CPM Christophe Leroy
2015-09-22 16:50 ` [PATCH v2 11/25] powerpc/8xx: map 16M RAM at startup Christophe Leroy
2015-09-28 23:58   ` Scott Wood
2015-10-06 14:10     ` Christophe Leroy
2015-10-06 15:17       ` Scott Wood
2015-09-22 16:50 ` [PATCH v2 12/25] powerpc32: Remove useless/wrong MMU:setio progress message Christophe Leroy
2015-09-22 16:50 ` [PATCH v2 13/25] powerpc/8xx: also use r3 in the ITLB miss in all situations Christophe Leroy
2015-09-29  0:00   ` Scott Wood
2015-10-06 14:12     ` Christophe Leroy
2015-10-06 16:48       ` Scott Wood
2015-09-22 16:50 ` [PATCH v2 14/25] powerpc32: remove ioremap_base Christophe Leroy
2015-09-29  0:38   ` Scott Wood
2015-09-22 16:50 ` [PATCH v2 15/25] powerpc/8xx: move 8xx SPRN defines into reg_8xx.h and add some missing ones Christophe Leroy
2015-09-29  0:03   ` Scott Wood
2015-10-06 14:35     ` Christophe Leroy
2015-10-06 16:56       ` Scott Wood
2015-09-22 16:51 ` [PATCH v2 16/25] powerpc/8xx: Handle CPU6 ERRATA directly in mtspr() macro Christophe Leroy
2015-09-22 16:51 ` [PATCH v2 17/25] powerpc/8xx: remove special handling of CPU6 errata in set_dec() Christophe Leroy
2015-09-22 16:51 ` [PATCH v2 18/25] powerpc/8xx: rewrite set_context() in C Christophe Leroy
2015-09-22 16:51 ` [PATCH v2 19/25] powerpc/8xx: rewrite flush_instruction_cache() " Christophe Leroy
2015-09-22 16:51 ` [PATCH v2 20/25] powerpc32: Remove clear_pages() and define clear_page() inline Christophe Leroy
2015-09-22 17:57   ` Joakim Tjernlund
2015-09-29  0:23   ` Scott Wood
2015-09-22 16:51 ` [PATCH v2 21/25] powerpc: add inline functions for cache related instructions Christophe Leroy
2015-09-29  0:25   ` Scott Wood
2015-09-22 16:51 ` [PATCH v2 22/25] powerpc32: move xxxxx_dcache_range() functions inline Christophe Leroy
2015-09-22 18:12   ` Joakim Tjernlund
2015-09-22 18:58     ` Scott Wood
2015-09-22 19:34       ` Joakim Tjernlund
2015-09-22 19:42         ` Scott Wood
2015-09-22 19:55           ` Joakim Tjernlund
2015-09-22 20:07             ` Joakim Tjernlund
2015-09-22 20:14             ` Scott Wood
2015-09-22 20:32               ` Joakim Tjernlund
2015-09-22 20:35                 ` Scott Wood
2015-09-22 20:38                   ` Joakim Tjernlund
2015-09-22 20:57                     ` Christophe Leroy [this message]
2015-09-22 22:34                       ` Scott Wood
2015-09-22 22:49                         ` Christophe Leroy
2015-09-22 22:52                           ` Scott Wood
2015-09-29  0:29   ` Scott Wood
2015-10-07 12:49     ` Christophe Leroy
2015-10-08 19:12       ` Scott Wood
2015-10-12 18:08         ` christophe leroy
2015-09-22 16:51 ` [PATCH v2 23/25] powerpc: Simplify test in __dma_sync() Christophe Leroy
2015-09-22 16:51 ` [PATCH v2 24/25] powerpc32: small optimisation in flush_icache_range() Christophe Leroy
2015-09-22 16:51 ` [PATCH v2 25/25] powerpc32: Remove one insn in mulhdu Christophe Leroy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5601C0CC.6090503@c-s.fr \
    --to=christophe.leroy@c-s.fr \
    --cc=benh@kernel.crashing.org \
    --cc=joakim.tjernlund@transmode.se \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=paulus@samba.org \
    --cc=scottwood@freescale.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).