From: "Yuan,Zhaoxiong" <yuanzhaoxiong@baidu.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "mingo@redhat.com" <mingo@redhat.com>,
	"juri.lelli@redhat.com" <juri.lelli@redhat.com>,
	"vincent.guittot@linaro.org" <vincent.guittot@linaro.org>,
	"dietmar.eggemann@arm.com" <dietmar.eggemann@arm.com>,
	"rostedt@goodmis.org" <rostedt@goodmis.org>,
	"bsegall@google.com" <bsegall@google.com>,
	"mgorman@suse.de" <mgorman@suse.de>,
	"bristot@redhat.com" <bristot@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] sched: Optimize housekeeping_cpumask in for_each_cpu_and
Date: Tue, 20 Apr 2021 06:44:54 +0000
Message-ID: <830177B0-45E0-4768-80AB-A99B85D3A52F@baidu.com>
In-Reply-To: <YH1T2f96IWlR7aOi@hirez.programming.kicks-ass.net>



On 2021/4/19 at 5:57 PM, "Peter Zijlstra" <peterz@infradead.org> wrote:

    On Sat, Apr 17, 2021 at 11:01:37PM +0800, Yuan ZhaoXiong wrote:
    > On a 128-core AMD machine, 8 cores are in nohz_full mode and the
    > others are used for housekeeping. When many housekeeping CPUs are
    > idle, ftrace shows a large amount of time spent in the loop that
    > searches for the nearest busy housekeeping CPU.
    > 
    >    9)               |              get_nohz_timer_target() {
    >    9)               |                housekeeping_test_cpu() {
    >    9)   0.390 us    |                  housekeeping_get_mask.part.1();
    >    9)   0.561 us    |                }
    >    9)   0.090 us    |                __rcu_read_lock();
    >    9)   0.090 us    |                housekeeping_cpumask();
    >    9)   0.521 us    |                housekeeping_cpumask();
    >    9)   0.140 us    |                housekeeping_cpumask();
    > 
    >    ...
    > 
    >    9)   0.500 us    |                housekeeping_cpumask();
    >    9)               |                housekeeping_any_cpu() {
    >    9)   0.090 us    |                  housekeeping_get_mask.part.1();
    >    9)   0.100 us    |                  sched_numa_find_closest();
    >    9)   0.491 us    |                }
    >    9)   0.100 us    |                __rcu_read_unlock();
    >    9) + 76.163 us   |              }
    > 
    > for_each_cpu_and() is a macro, so in get_nohz_timer_target() the loop
    >         for_each_cpu_and(i, sched_domain_span(sd),
    >                 housekeeping_cpumask(HK_FLAG_TIMER))
    > expands to:
    >         for (i = -1; i = cpumask_next_and(i, sched_domain_span(sd),
    >                 housekeeping_cpumask(HK_FLAG_TIMER)), i < nr_cpu_ids;)
    > As a result, housekeeping_cpumask() is invoked on every iteration.
    > housekeeping_cpumask() returns a constant value, so it is unnecessary
    > to invoke it every time. This patch reduces the worst-case search
    > time from ~76us to ~16us in my testing.
    > 
    > Similarly, the find_new_ilb() function has the same problem.
    
    Would it not make sense to mark housekeeping_cpumask() __pure instead?
    
After marking housekeeping_cpumask() __pure and testing again, the results
show that the large time cost in the loop that searches for the nearest busy
housekeeping CPU still exists.

Disassembling vmlinux with objdump -D gives the following code for
get_nohz_timer_target():
ffffffff810b96c0 <get_nohz_timer_target>:
ffffffff810b96c0:       e8 db 7f 94 00          callq  ffffffff81a016a0 <__fentry__>
ffffffff810b96c5:       41 57                   push   %r15
ffffffff810b96c7:       41 56                   push   %r14
ffffffff810b96c9:       41 55                   push   %r13
ffffffff810b96cb:       41 54                   push   %r12
ffffffff810b96cd:       55                      push   %rbp
ffffffff810b96ce:       53                      push   %rbx
ffffffff810b96cf:       48 83 ec 08             sub    $0x8,%rsp
ffffffff810b96d3:       65 8b 1d 56 5a f5 7e    mov    %gs:0x7ef55a56(%rip),%ebx        # f130 <cpu_number>
ffffffff810b96da:       41 89 dc                mov    %ebx,%r12d
ffffffff810b96dd:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
ffffffff810b96e2:       4c 63 f3                movslq %ebx,%r14
ffffffff810b96e5:       48 c7 c5 40 0b 02 00    mov    $0x20b40,%rbp
ffffffff810b96ec:       4a 8b 04 f5 20 77 13    mov    -0x7dec88e0(,%r14,8),%rax
ffffffff810b96f3:       82
ffffffff810b96f4:       49 89 ed                mov    %rbp,%r13
ffffffff810b96f7:       4c 01 e8                add    %r13,%rax
ffffffff810b96fa:       48 8b 88 90 09 00 00    mov    0x990(%rax),%rcx
ffffffff810b9701:       48 39 88 88 09 00 00    cmp    %rcx,0x988(%rax)
ffffffff810b9708:       0f 84 ce 00 00 00       je     ffffffff810b97dc <get_nohz_timer_target+0x11c>
ffffffff810b970e:       48 83 c4 08             add    $0x8,%rsp
ffffffff810b9712:       44 89 e0                mov    %r12d,%eax
ffffffff810b9715:       5b                      pop    %rbx
ffffffff810b9716:       5d                      pop    %rbp
ffffffff810b9717:       41 5c                   pop    %r12
ffffffff810b9719:       41 5d                   pop    %r13
ffffffff810b971b:       41 5e                   pop    %r14
ffffffff810b971d:       41 5f                   pop    %r15
ffffffff810b971f:       c3                      retq
ffffffff810b9720:       be 01 00 00 00          mov    $0x1,%esi
ffffffff810b9725:       89 df                   mov    %ebx,%edi
ffffffff810b9727:       e8 74 87 02 00          callq  ffffffff810e1ea0 <housekeeping_test_cpu>
ffffffff810b972c:       84 c0                   test   %al,%al
ffffffff810b972e:       75 b2                   jne    ffffffff810b96e2 <get_nohz_timer_target+0x22>
ffffffff810b9730:       e8 0b ea 03 00          callq  ffffffff810f8140 <__rcu_read_lock>
ffffffff810b9735:       48 c7 c5 40 0b 02 00    mov    $0x20b40,%rbp
ffffffff810b973c:       48 63 d3                movslq %ebx,%rdx
ffffffff810b973f:       c7 44 24 04 ff ff ff    movl   $0xffffffff,0x4(%rsp)
ffffffff810b9746:       ff
ffffffff810b9747:       48 89 e8                mov    %rbp,%rax
ffffffff810b974a:       48 03 04 d5 20 77 13    add    -0x7dec88e0(,%rdx,8),%rax
ffffffff810b9751:       82
ffffffff810b9752:       4c 8b a8 d8 09 00 00    mov    0x9d8(%rax),%r13
ffffffff810b9759:       4d 85 ed                test   %r13,%r13
ffffffff810b975c:       0f 84 d3 00 00 00       je     ffffffff810b9835 <get_nohz_timer_target+0x175>
ffffffff810b9762:       41 be ff ff ff ff       mov    $0xffffffff,%r14d
ffffffff810b9768:       4d 8d a5 38 01 00 00    lea    0x138(%r13),%r12
ffffffff810b976f:       45 89 f7                mov    %r14d,%r15d
ffffffff810b9772:       bf 01 00 00 00          mov    $0x1,%edi
ffffffff810b9777:       e8 f4 86 02 00          callq  ffffffff810e1e70 <housekeeping_cpumask>
ffffffff810b977c:       44 89 ff                mov    %r15d,%edi
ffffffff810b977f:       48 89 c2                mov    %rax,%rdx
ffffffff810b9782:       4c 89 e6                mov    %r12,%rsi
ffffffff810b9785:       e8 b6 ea 79 00          callq  ffffffff81858240 <cpumask_next_and>
ffffffff810b978a:       3b 05 b4 4e 3e 01       cmp    0x13e4eb4(%rip),%eax        # ffffffff8249e644 <nr_cpu_ids>
ffffffff810b9790:       41 89 c7                mov    %eax,%r15d
ffffffff810b9793:       0f 83 84 00 00 00       jae    ffffffff810b981d <get_nohz_timer_target+0x15d>
ffffffff810b9799:       44 39 fb                cmp    %r15d,%ebx
ffffffff810b979c:       74 d4                   je     ffffffff810b9772 <get_nohz_timer_target+0xb2>
ffffffff810b979e:       49 63 c7                movslq %r15d,%rax
ffffffff810b97a1:       48 89 ea                mov    %rbp,%rdx
ffffffff810b97a4:       48 03 14 c5 20 77 13    add    -0x7dec88e0(,%rax,8),%rdx
ffffffff810b97ab:       82
ffffffff810b97ac:       48 8b 82 90 09 00 00    mov    0x990(%rdx),%rax
ffffffff810b97b3:       48 39 82 88 09 00 00    cmp    %rax,0x988(%rdx)
ffffffff810b97ba:       75 13                   jne    ffffffff810b97cf <get_nohz_timer_target+0x10f>
ffffffff810b97bc:       8b 42 04                mov    0x4(%rdx),%eax
ffffffff810b97bf:       85 c0                   test   %eax,%eax
ffffffff810b97c1:       75 0c                   jne    ffffffff810b97cf <get_nohz_timer_target+0x10f>
ffffffff810b97c3:       48 8b 82 20 0c 00 00    mov    0xc20(%rdx),%rax
ffffffff810b97ca:       48 85 c0                test   %rax,%rax
ffffffff810b97cd:       74 a3                   je     ffffffff810b9772 <get_nohz_timer_target+0xb2>
ffffffff810b97cf:       e8 1c 33 04 00          callq  ffffffff810fcaf0 <__rcu_read_unlock>
ffffffff810b97d4:       45 89 fc                mov    %r15d,%r12d
ffffffff810b97d7:       e9 32 ff ff ff          jmpq   ffffffff810b970e <get_nohz_timer_target+0x4e>
ffffffff810b97dc:       8b 50 04                mov    0x4(%rax),%edx
ffffffff810b97df:       85 d2                   test   %edx,%edx
ffffffff810b97e1:       0f 85 27 ff ff ff       jne    ffffffff810b970e <get_nohz_timer_target+0x4e>
ffffffff810b97e7:       48 8b 80 20 0c 00 00    mov    0xc20(%rax),%rax
ffffffff810b97ee:       48 85 c0                test   %rax,%rax
ffffffff810b97f1:       0f 85 17 ff ff ff       jne    ffffffff810b970e <get_nohz_timer_target+0x4e>
ffffffff810b97f7:       e8 44 e9 03 00          callq  ffffffff810f8140 <__rcu_read_lock>
ffffffff810b97fc:       4e 03 2c f5 20 77 13    add    -0x7dec88e0(,%r14,8),%r13
ffffffff810b9803:       82
ffffffff810b9804:       89 5c 24 04             mov    %ebx,0x4(%rsp)
ffffffff810b9808:       41 89 df                mov    %ebx,%r15d
ffffffff810b980b:       4d 8b ad d8 09 00 00    mov    0x9d8(%r13),%r13
ffffffff810b9812:       4d 85 ed                test   %r13,%r13
ffffffff810b9815:       0f 85 47 ff ff ff       jne    ffffffff810b9762 <get_nohz_timer_target+0xa2>
ffffffff810b981b:       eb 12                   jmp    ffffffff810b982f <get_nohz_timer_target+0x16f>
ffffffff810b981d:       4d 8b 6d 00             mov    0x0(%r13),%r13
ffffffff810b9821:       4d 85 ed                test   %r13,%r13
ffffffff810b9824:       0f 85 3e ff ff ff       jne    ffffffff810b9768 <get_nohz_timer_target+0xa8>
ffffffff810b982a:       44 8b 7c 24 04          mov    0x4(%rsp),%r15d
ffffffff810b982f:       41 83 ff ff             cmp    $0xffffffff,%r15d
ffffffff810b9833:       75 9a                   jne    ffffffff810b97cf <get_nohz_timer_target+0x10f>
ffffffff810b9835:       bf 01 00 00 00          mov    $0x1,%edi
ffffffff810b983a:       e8 91 86 02 00          callq  ffffffff810e1ed0 <housekeeping_any_cpu>
ffffffff810b983f:       41 89 c7                mov    %eax,%r15d
ffffffff810b9842:       eb 8b                   jmp    ffffffff810b97cf <get_nohz_timer_target+0x10f>
ffffffff810b9844:       66 90                   xchg   %ax,%ax
ffffffff810b9846:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
ffffffff810b984d:       00 00 00

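The disassembly shows that the __pure annotation has no effect here: the call
to housekeeping_cpumask() at ffffffff810b9777 is still inside the loop. Both
back-edges (the je instructions at ffffffff810b979c and ffffffff810b97cd) jump
to ffffffff810b9772, which reloads the argument and calls
housekeeping_cpumask() again before every cpumask_next_and().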
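
To illustrate why hoisting the call out of the loop helps, here is a minimal
user-space sketch, not kernel code. All names in it (hk_mask(), next_and(),
for_each_bit_and(), visit_inline(), visit_hoisted()) are hypothetical
stand-ins: a single 64-bit word plays the role of a cpumask, and a call
counter makes the re-evaluation of the macro argument visible. It only
assumes that the iterator macro has the same shape as for_each_cpu_and().

```c
#include <stdint.h>

#define NBITS 64                      /* stand-in for nr_cpu_ids */

static unsigned long mask_calls;      /* number of calls to hk_mask() */
static const uint64_t hk_bits = 0x00ffffffffffff00ULL;  /* bits 8..55 set */

/* Stand-in for housekeeping_cpumask(): out-of-line call, fixed result. */
static const uint64_t *hk_mask(void)
{
	mask_calls++;
	return &hk_bits;
}

/* Stand-in for cpumask_next_and(): next bit after n set in both masks. */
static int next_and(int n, const uint64_t *a, const uint64_t *b)
{
	for (n++; n < NBITS; n++)
		if (((*a & *b) >> n) & 1)
			return n;
	return NBITS;
}

/* Same shape as for_each_cpu_and(): the mask arguments sit in the loop
 * condition, so any expression passed in is evaluated on every pass. */
#define for_each_bit_and(i, a, b) \
	for ((i) = -1; (i) = next_and((i), (a), (b)), (i) < NBITS;)

/* Call passed as macro argument: one hk_mask() call per loop test. */
static unsigned long visit_inline(const uint64_t *span, int *visited)
{
	int i;

	mask_calls = 0;
	*visited = 0;
	for_each_bit_and(i, span, hk_mask())
		(*visited)++;
	return mask_calls;
}

/* Result cached in a local before the loop: exactly one hk_mask() call. */
static unsigned long visit_hoisted(const uint64_t *span, int *visited)
{
	const uint64_t *hk;
	int i;

	mask_calls = 0;
	*visited = 0;
	hk = hk_mask();
	for_each_bit_and(i, span, hk)
		(*visited)++;
	return mask_calls;
}
```

With a span of all ones, both variants visit the same 48 bits, but
visit_inline() reports 49 calls to hk_mask() (one per loop test, including
the final failing one) while visit_hoisted() reports 1. Caching the mask in a
local does not rely on the compiler honouring an attribute such as __pure.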

