From: "Ofer Levi(SW)" <oferle@mellanox.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "rusty@rustcorp.com.au" <rusty@rustcorp.com.au>,
"mingo@redhat.com" <mingo@redhat.com>,
"Vineet.Gupta1@synopsys.com" <Vineet.Gupta1@synopsys.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Tejun Heo <tj@kernel.org>
Subject: RE: hotplug support for arch/arc/plat-eznps platform
Date: Mon, 14 Aug 2017 07:54:57 +0000 [thread overview]
Message-ID: <VI1PR0501MB21104B62CC16F41412ED3588B28C0@VI1PR0501MB2110.eurprd05.prod.outlook.com> (raw)
In-Reply-To: <20170810154518.gl2w3llfabnszusr@hirez.programming.kicks-ass.net>
Sorry for the late response, but this patch is a step back. It's back to about 0.4 sec per CPU bring-up.
This is with the possible, present and isolcpus masks set to 16-4095.
Most time is spent at:
register_sched_domain_sysctl() calling sd_sysctl_header = register_sysctl_table(sd_ctl_root);
[ 22.150000] ## CPU16 LIVE ##: Executing Code...
[ 22.170000] partition_sched_domains start
[ 22.220000] register_sched_domain_sysctl start
[ 22.580000] register_sched_domain_sysctl end
[ 22.580000] partition_sched_domains end
> BTW, what physical size does your toy have? I'm thinking its less than
> multiple racks worth like the SGI systems were.
It's a single chip with 4K cpus, capable of 400Gbps duplex. Evaluation board is pizza box size.
Thanks
> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz@infradead.org]
> Sent: Thursday, August 10, 2017 6:45 PM
> To: Ofer Levi(SW) <oferle@mellanox.com>
> Cc: rusty@rustcorp.com.au; mingo@redhat.com;
> Vineet.Gupta1@synopsys.com; linux-kernel@vger.kernel.org; Tejun Heo
> <tj@kernel.org>
> Subject: Re: hotplug support for arch/arc/plat-eznps platform
>
> On Thu, Aug 10, 2017 at 11:19:05AM +0200, Peter Zijlstra wrote:
> > On Thu, Aug 10, 2017 at 07:40:16AM +0000, Ofer Levi(SW) wrote:
> > > Well, this definitely have pleased the little toy :) Thank you. I
> > > really appreciate your time and effort.
> > >
> > > If I may, one more newbie question. What do I need to do for the two
> > > patches to find their way into formal kernel code?
> >
> > I'll split the first patch into two separate patches and line them up.
> >
> > I'm not sure about this last patch, I'll speak with Ingo once he's
> > back to see what would be the thing to do here.
> >
> > I suspect we can make it work, that sysctl stuff is only debug crud
> > after all and that should never get in the way of getting work done.
>
> Can you test this instead of the second patch? It should have the same
> effect.
>
>
> ---
> Subject: sched/debug: Optimize sched_domain sysctl generation
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Thu Aug 10 17:10:26 CEST 2017
>
> Currently we unconditionally destroy all sysctl bits and regenerate them after
> we've rebuilt the domains (even if that rebuild is a no-op).
>
> And since we unconditionally (re)build the sysctl for all possible CPUs,
> onlining all CPUs gets us O(n^2) time. Instead change this to only rebuild the
> bits for CPUs we've actually installed new domains on.
>
> Reported-by: "Ofer Levi(SW)" <oferle@mellanox.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/sched/debug.c    | 68 ++++++++++++++++++++++++++++++++++++++----------
>  kernel/sched/sched.h    |  4 ++
>  kernel/sched/topology.c |  1
>  3 files changed, 59 insertions(+), 14 deletions(-)
>
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -327,38 +327,78 @@ static struct ctl_table *sd_alloc_ctl_cp
> return table;
> }
>
> +static cpumask_var_t sd_sysctl_cpus;
> static struct ctl_table_header *sd_sysctl_header;
> +
> void register_sched_domain_sysctl(void)
> {
> - int i, cpu_num = num_possible_cpus();
> - struct ctl_table *entry = sd_alloc_ctl_entry(cpu_num + 1);
> + static struct ctl_table *cpu_entries;
> + static struct ctl_table **cpu_idx;
> char buf[32];
> + int i;
> +
> + if (!cpu_entries) {
> + cpu_entries = sd_alloc_ctl_entry(num_possible_cpus() + 1);
> + if (!cpu_entries)
> + return;
> +
> + WARN_ON(sd_ctl_dir[0].child);
> + sd_ctl_dir[0].child = cpu_entries;
> + }
> +
> + if (!cpu_idx) {
> + struct ctl_table *e = cpu_entries;
> +
> +		cpu_idx = kcalloc(nr_cpu_ids, sizeof(struct ctl_table *), GFP_KERNEL);
> + if (!cpu_idx)
> + return;
> +
> + /* deal with sparse possible map */
> + for_each_possible_cpu(i) {
> + cpu_idx[i] = e;
> + e++;
> + }
> + }
>
> - WARN_ON(sd_ctl_dir[0].child);
> - sd_ctl_dir[0].child = entry;
> + if (!cpumask_available(sd_sysctl_cpus)) {
> + if (!alloc_cpumask_var(&sd_sysctl_cpus, GFP_KERNEL))
> + return;
>
> - if (entry == NULL)
> - return;
> + /* init to possible to not have holes in @cpu_entries */
> + cpumask_copy(sd_sysctl_cpus, cpu_possible_mask);
> + }
> +
> + for_each_cpu(i, sd_sysctl_cpus) {
> + struct ctl_table *e = cpu_idx[i];
> +
> + if (e->child)
> + sd_free_ctl_entry(&e->child);
> +
> + if (!e->procname) {
> + snprintf(buf, 32, "cpu%d", i);
> + e->procname = kstrdup(buf, GFP_KERNEL);
> + }
> + e->mode = 0555;
> + e->child = sd_alloc_ctl_cpu_table(i);
>
> - for_each_possible_cpu(i) {
> - snprintf(buf, 32, "cpu%d", i);
> - entry->procname = kstrdup(buf, GFP_KERNEL);
> - entry->mode = 0555;
> - entry->child = sd_alloc_ctl_cpu_table(i);
> - entry++;
> + __cpumask_clear_cpu(i, sd_sysctl_cpus);
> }
>
> WARN_ON(sd_sysctl_header);
> sd_sysctl_header = register_sysctl_table(sd_ctl_root);
> }
>
> +void dirty_sched_domain_sysctl(int cpu)
> +{
> +	if (cpumask_available(sd_sysctl_cpus))
> +		__cpumask_set_cpu(cpu, sd_sysctl_cpus);
> +}
> +
> /* may be called multiple times per register */
> void unregister_sched_domain_sysctl(void)
> {
> unregister_sysctl_table(sd_sysctl_header);
> sd_sysctl_header = NULL;
> - if (sd_ctl_dir[0].child)
> - sd_free_ctl_entry(&sd_ctl_dir[0].child);
> }
> #endif /* CONFIG_SYSCTL */
> #endif /* CONFIG_SMP */
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1120,11 +1120,15 @@ extern int group_balance_cpu(struct sche
>
> #if defined(CONFIG_SCHED_DEBUG) && defined(CONFIG_SYSCTL)
> void register_sched_domain_sysctl(void);
> +void dirty_sched_domain_sysctl(int cpu);
> void unregister_sched_domain_sysctl(void);
> #else
> static inline void register_sched_domain_sysctl(void)
> {
> }
> +static inline void dirty_sched_domain_sysctl(int cpu) { }
> static inline void unregister_sched_domain_sysctl(void)
> {
> }
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -461,6 +461,7 @@ cpu_attach_domain(struct sched_domain *s
> rq_attach_root(rq, rd);
> tmp = rq->sd;
> rcu_assign_pointer(rq->sd, sd);
> + dirty_sched_domain_sysctl(cpu);
> destroy_sched_domains(tmp);
>
> update_top_cache_domain(cpu);
Thread overview: 13+ messages
2017-08-06 5:53 hotplug support for arch/arc/plat-eznps platform Ofer Levi(SW)
2017-08-07 8:33 ` Peter Zijlstra
2017-08-07 13:41 ` Ofer Levi(SW)
2017-08-07 15:10 ` Peter Zijlstra
2017-08-08 6:49 ` Ofer Levi(SW)
2017-08-08 10:16 ` Peter Zijlstra
2017-08-09 15:19 ` Ofer Levi(SW)
2017-08-09 15:34 ` Peter Zijlstra
2017-08-10 7:40 ` Ofer Levi(SW)
2017-08-10 9:19 ` Peter Zijlstra
2017-08-10 11:51 ` Ofer Levi(SW)
2017-08-10 15:45 ` Peter Zijlstra
2017-08-14 7:54 ` Ofer Levi(SW) [this message]