linux-kernel.vger.kernel.org archive mirror
From: Quentin Perret <quentin.perret@arm.com>
To: Juri Lelli <juri.lelli@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	peterz@infradead.org, mingo@redhat.com,
	linux-kernel@vger.kernel.org, luca.abeni@santannapisa.it,
	claudio@evidence.eu.com, tommaso.cucinotta@santannapisa.it,
	bristot@redhat.com, mathieu.poirier@linaro.org,
	lizefan@huawei.com, cgroups@vger.kernel.org
Subject: Re: [PATCH v4 1/5] sched/topology: Add check to backup comment about hotplug lock
Date: Fri, 15 Jun 2018 09:39:51 +0100
Message-ID: <20180615083951.GO17720@e108498-lin.cambridge.arm.com>
In-Reply-To: <20180614143037.GH12032@localhost.localdomain>

On Thursday 14 Jun 2018 at 16:30:37 (+0200), Juri Lelli wrote:
> On 14/06/18 15:18, Quentin Perret wrote:
> > On Thursday 14 Jun 2018 at 16:11:18 (+0200), Juri Lelli wrote:
> > > On 14/06/18 14:58, Quentin Perret wrote:
> > > 
> > > [...]
> > > 
> > > > Hmm not sure if this can help but I think that rebuild_sched_domains()
> > > > does _not_ take the hotplug lock before calling partition_sched_domains()
> > > > when CONFIG_CPUSETS=n. But it does take it for CONFIG_CPUSETS=y.
> > > 
> > > Did you mean cpuset_mutex?
> > 
> > Nope, I really meant the cpu_hotplug_lock !
> > 
> > With CONFIG_CPUSETS=n, rebuild_sched_domains() calls
> > partition_sched_domains() directly:
> > 
> > https://elixir.bootlin.com/linux/latest/source/include/linux/cpuset.h#L255
> > 
> > But with CONFIG_CPUSETS=y, rebuild_sched_domains() calls
> > rebuild_sched_domains_locked(), which calls get_online_cpus() which
> > calls cpus_read_lock(), which does percpu_down_read(&cpu_hotplug_lock).
> > And all that happens before calling partition_sched_domains().
> 
> Ah, right!
>  
> > So yeah, the point I was trying to make is that there is an inconsistency
> > here, maybe for a good reason? Maybe related to the issue you're seeing?
> 
> The config that came with the 0day splat was indeed CONFIG_CPUSETS=n.
> 
> So, in this case IIUC we hit the !doms_new branch of partition_sched_domains,
> which uses cpu_active_mask (and cpu_possible_mask indirectly).
> Should this be still protected by the hotplug lock then?

Hmm, I'm not sure... But looking at your call trace, it seems that the
issue happens when sched_cpu_deactivate() is called (not sure why this
is called during boot BTW?), which calls cpuset_update_active_cpus().

And again, for CONFIG_CPUSETS=n, that defaults to a raw call to
partition_sched_domains(), with ndoms_new=1 and no lock taken.
I'm still not sure if this is done like that for a good reason, or if
this is actually an issue that this patch caught nicely...
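
To make the inconsistency concrete, here is a rough userspace model of
the two paths. This is only a sketch and not kernel code: the lock is
modelled as a plain counter, partition_sched_domains() is a stub that
records whether the modelled lock was held by its caller, and the
*_cpusets_y / *_cpusets_n function names and locked_during() are made up
for illustration. The real kernel functions have different signatures.

```c
/* Userspace model only; nothing here is taken from the kernel sources. */
#include <assert.h>
#include <stdbool.h>

static int hotplug_readers;        /* stands in for cpu_hotplug_lock */
static bool last_call_was_locked;  /* recorded by the stub below */

static void cpus_read_lock(void)
{
	hotplug_readers++;
}

static void cpus_read_unlock(void)
{
	assert(hotplug_readers > 0);
	hotplug_readers--;
}

/* Stub: only records whether the caller held the modelled hotplug lock. */
static void partition_sched_domains(int ndoms_new)
{
	(void)ndoms_new;
	last_call_was_locked = hotplug_readers > 0;
}

/* Model of the CONFIG_CPUSETS=y path: the lock is taken first. */
static void rebuild_sched_domains_cpusets_y(void)
{
	cpus_read_lock();	/* get_online_cpus() -> cpus_read_lock() */
	partition_sched_domains(1);
	cpus_read_unlock();
}

/* Model of the CONFIG_CPUSETS=n path: direct call, no lock. */
static void rebuild_sched_domains_cpusets_n(void)
{
	partition_sched_domains(1);
}

/* Model of cpuset_update_active_cpus() with CONFIG_CPUSETS=n. */
static void cpuset_update_active_cpus_cpusets_n(void)
{
	partition_sched_domains(1);	/* ndoms_new=1, no lock taken */
}

/* Hypothetical helper: run one path, report if the stub saw the lock held. */
static bool locked_during(void (*path)(void))
{
	last_call_was_locked = false;
	path();
	return last_call_was_locked;
}
```

Calling locked_during() on each of the three paths from a small main()
shows the difference: in this model only the CONFIG_CPUSETS=y variant
reaches partition_sched_domains() with the lock held.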

Quentin


Thread overview: 23+ messages
2018-06-13 12:17 [PATCH v4 0/5] sched/deadline: fix cpusets bandwidth accounting Juri Lelli
2018-06-13 12:17 ` [PATCH v4 1/5] sched/topology: Add check to backup comment about hotplug lock Juri Lelli
2018-06-14 13:33   ` Steven Rostedt
2018-06-14 13:42     ` Juri Lelli
2018-06-14 13:47       ` Steven Rostedt
2018-06-14 13:50         ` Juri Lelli
2018-06-14 13:58           ` Quentin Perret
2018-06-14 14:11             ` Juri Lelli
2018-06-14 14:18               ` Quentin Perret
2018-06-14 14:30                 ` Juri Lelli
2018-06-15  8:39                   ` Quentin Perret [this message]
2018-06-13 12:17 ` [PATCH v4 2/5] sched/topology: Adding function partition_sched_domains_locked() Juri Lelli
2018-06-14 13:35   ` Steven Rostedt
2018-06-14 13:47     ` Juri Lelli
2018-06-13 12:17 ` [PATCH v4 3/5] sched/core: Streamlining calls to task_rq_unlock() Juri Lelli
2018-06-14 13:42   ` Steven Rostedt
2018-06-13 12:17 ` [PATCH v4 4/5] sched/core: Prevent race condition between cpuset and __sched_setscheduler() Juri Lelli
2018-06-14 13:45   ` Steven Rostedt
2018-06-14 13:51     ` Juri Lelli
2018-06-14 20:11   ` Steven Rostedt
2018-06-15  7:01     ` Juri Lelli
2018-06-15 13:07       ` Juri Lelli
2018-06-13 12:17 ` [PATCH v4 5/5] cpuset: Rebuild root domain deadline accounting information Juri Lelli
