* [PATCH] make __section_nr more efficient
From: Zhou Chengming @ 2016-07-20  4:18 UTC
  To: linux-kernel, linux-mm; +Cc: akpm, tj, guohanjun, huawei.libin, zhouchengming1

When CONFIG_SPARSEMEM_EXTREME is disabled, __section_nr() can get the
section number directly with a single subtraction.

Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
---
 mm/sparse.c |   12 +++++++-----
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 5d0cf45..36d7bbb 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -100,11 +100,7 @@ static inline int sparse_index_init(unsigned long section_nr, int nid)
 }
 #endif
 
-/*
- * Although written for the SPARSEMEM_EXTREME case, this happens
- * to also work for the flat array case because
- * NR_SECTION_ROOTS==NR_MEM_SECTIONS.
- */
+#ifdef CONFIG_SPARSEMEM_EXTREME
 int __section_nr(struct mem_section* ms)
 {
 	unsigned long root_nr;
@@ -123,6 +119,12 @@ int __section_nr(struct mem_section* ms)
 
 	return (root_nr * SECTIONS_PER_ROOT) + (ms - root);
 }
+#else
+int __section_nr(struct mem_section* ms)
+{
+	return (int)(ms - mem_section[0]);
+}
+#endif
 
 /*
  * During early boot, before section_mem_map is used for an actual
-- 
1.7.7
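
For anyone not steeped in sparsemem internals: in the !SPARSEMEM_EXTREME
configuration, mem_section is a statically sized two-dimensional array and
SECTIONS_PER_ROOT is 1, so all sections are contiguous in memory and the
pointer difference in the new #else branch is exactly the section number.
A minimal user-space sketch of that layout (kernel names reused, array
sizes invented purely for illustration):

/* Toy model of the flat (!SPARSEMEM_EXTREME) mem_section layout. */
#include <assert.h>
#include <stdio.h>

#define SECTIONS_PER_ROOT 1	/* flat case: one section per root */
#define NR_MEM_SECTIONS   8	/* made-up size, just for the demo */
#define NR_SECTION_ROOTS  (NR_MEM_SECTIONS / SECTIONS_PER_ROOT)

struct mem_section { unsigned long section_mem_map; };

/* Statically sized 2-D array, as in the flat case. */
static struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT];

/* Rows hold a single element each, so the distance from the start of
 * row 0 is the section number itself. */
static int section_nr(struct mem_section *ms)
{
	return (int)(ms - mem_section[0]);
}

int main(void)
{
	struct mem_section *ms = &mem_section[5][0];

	assert(section_nr(ms) == 5);
	printf("section %d\n", section_nr(ms));
	return 0;
}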


* Re: [PATCH] make __section_nr more efficient
From: Dave Hansen @ 2016-07-20 21:36 UTC
  To: Zhou Chengming, linux-kernel, linux-mm; +Cc: akpm, tj, guohanjun, huawei.libin

On 07/19/2016 09:18 PM, Zhou Chengming wrote:
> When CONFIG_SPARSEMEM_EXTREME is disabled, __section_nr() can get the
> section number directly with a single subtraction.

Does this actually *do* anything?

It was a long time ago, but if I remember correctly, the entire loop in
__section_nr() goes away because root_nr==NR_SECTION_ROOTS, so
root_nr=1, and the compiler optimizes away the entire subtraction.

So this basically adds an #ifdef and gets us nothing, although it makes
the situation much more explicit.  Perhaps the comment should say that
this works *and* is efficient because the compiler can optimize all the
extreme complexity away.
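
For reference, the loop in question looks roughly like this (a simplified
reconstruction pieced together from the context lines in the patch above,
not a verbatim quote of mm/sparse.c):

int __section_nr(struct mem_section *ms)
{
	unsigned long root_nr;
	struct mem_section *root = NULL;

	/* Linear search for the root whose sections contain ms. */
	for (root_nr = 0; root_nr < NR_SECTION_ROOTS; root_nr++) {
		root = __nr_to_section(root_nr * SECTIONS_PER_ROOT);
		if (!root)
			continue;
		if (ms >= root && ms < root + SECTIONS_PER_ROOT)
			break;
	}

	return (root_nr * SECTIONS_PER_ROOT) + (ms - root);
}

With SECTIONS_PER_ROOT == 1 this is a linear search that ends in a
subtraction, and it only collapses to the bare subtraction if the compiler
can prove the search always terminates at the right root.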


* Re: [PATCH] make __section_nr more efficient
From: zhouchengming @ 2016-07-21  1:55 UTC
  To: Dave Hansen; +Cc: linux-kernel, linux-mm, akpm, tj, guohanjun, huawei.libin

On 2016/7/21 5:36, Dave Hansen wrote:
> On 07/19/2016 09:18 PM, Zhou Chengming wrote:
>> When CONFIG_SPARSEMEM_EXTREME is disabled, __section_nr() can get the
>> section number directly with a single subtraction.
>
> Does this actually *do* anything?
>
> It was a long time ago, but if I remember correctly, the entire loop in
> __section_nr() goes away because root_nr==NR_SECTION_ROOTS, so
> root_nr=1, and the compiler optimizes away the entire subtraction.
>
> So this basically adds an #ifdef and gets us nothing, although it makes
> the situation much more explicit.  Perhaps the comment should say that
> this works *and* is efficient because the compiler can optimize all the
> extreme complexity away.
>

Thanks for your reply. I didn't know the compiler would optimize the loop
away. But when I look at the assembly code of __section_nr, the loop still
seems to be there.

My gcc version: gcc version 4.9.0 (GCC)
CONFIG_SPARSEMEM_EXTREME: disabled
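
(A guess about how these listings were produced: the unresolved
R_X86_64_32S relocations suggest the object file was disassembled
directly, e.g. with something like "objdump -dr mm/sparse.o"; the exact
command isn't stated here.)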

Before this patch:

0000000000000000 <__section_nr>:
    0:   55                      push   %rbp
    1:   48 c7 c2 00 00 00 00    mov    $0x0,%rdx
                         4: R_X86_64_32S mem_section
    8:   31 c0                   xor    %eax,%eax
    a:   48 89 e5                mov    %rsp,%rbp
    d:   eb 0d                   jmp    1c <__section_nr+0x1c>
    f:   48 83 c0 01             add    $0x1,%rax
   13:   48 81 fa 00 00 00 00    cmp    $0x0,%rdx
                         16: R_X86_64_32S        mem_section+0x800000
   1a:   74 26                   je     42 <__section_nr+0x42>
   1c:   48 89 d1                mov    %rdx,%rcx
   1f:   ba 10 00 00 00          mov    $0x10,%edx
   24:   48 85 c9                test   %rcx,%rcx
   27:   74 e6                   je     f <__section_nr+0xf>
   29:   48 39 cf                cmp    %rcx,%rdi
   2c:   48 8d 51 10             lea    0x10(%rcx),%rdx
   30:   72 dd                   jb     f <__section_nr+0xf>
   32:   48 39 d7                cmp    %rdx,%rdi
   35:   73 d8                   jae    f <__section_nr+0xf>
   37:   48 29 cf                sub    %rcx,%rdi
   3a:   48 c1 ff 04             sar    $0x4,%rdi
   3e:   01 f8                   add    %edi,%eax
   40:   5d                      pop    %rbp
   41:   c3                      retq
   42:   48 29 cf                sub    %rcx,%rdi
   45:   b8 00 00 08 00          mov    $0x80000,%eax
   4a:   48 c1 ff 04             sar    $0x4,%rdi
   4e:   01 f8                   add    %edi,%eax
   50:   5d                      pop    %rbp
   51:   c3                      retq
   52:   66 66 66 66 66 2e 0f    data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
   59:   1f 84 00 00 00 00 00

After this patch:

0000000000000000 <__section_nr>:
    0:   55                      push   %rbp
    1:   48 89 f8                mov    %rdi,%rax
    4:   48 2d 00 00 00 00       sub    $0x0,%rax
                         6: R_X86_64_32S mem_section
    a:   48 89 e5                mov    %rsp,%rbp
    d:   48 c1 f8 04             sar    $0x4,%rax
   11:   5d                      pop    %rbp
   12:   c3                      retq
   13:   66 66 66 66 2e 0f 1f    data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
   1a:   84 00 00 00 00 00


Thanks!


* Re: [PATCH] make __section_nr more efficient
From: Dave Hansen @ 2016-07-21 14:38 UTC
  To: zhouchengming; +Cc: linux-kernel, linux-mm, akpm, tj, guohanjun, huawei.libin

On 07/20/2016 06:55 PM, zhouchengming wrote:
> Thanks for your reply. I didn't know the compiler would optimize the loop
> away. But when I look at the assembly code of __section_nr, the loop still
> seems to be there.

Oh, well.  I guess it got broken in the last decade or so.  Your patch
looks good to me, and the fact that we ended up here means the original
approach was at least a little fragile.
