From: Zhang Qiao <zhangqiao22@huawei.com>
To: "Michal Koutný" <mkoutny@suse.com>
Cc: Tejun Heo <tj@kernel.org>, <lizefan.x@bytedance.com>,
<hannes@cmpxchg.org>, <cgroups@vger.kernel.org>,
<linux-kernel@vger.kernel.org>
Subject: Re: [Question] set_cpus_allowed_ptr() call failed at cpuset_attach()
Date: Thu, 20 Jan 2022 15:14:22 +0800
Message-ID: <ff49c096-39d9-4215-5b4f-8af2fd7c0c91@huawei.com>
In-Reply-To: <20220119130221.GA31037@blackbody.suse.cz>
Hello,
On 2022/1/19 21:02, Michal Koutný wrote:
> On Fri, Jan 14, 2022 at 09:15:06AM +0800, Zhang Qiao <zhangqiao22@huawei.com> wrote:
>> I found the following warning log on qemu. I migrated a task from one cpuset cgroup to
>> another, while I also performed the cpu hotplug operation, and got following calltrace.
>
> Do you have more information on what hotplug event and what error
> (from set_cpus_allowed_ptr() you observe? (And what's src/dst cpuset wrt
> root/non-root)?
I ran the LTP test cases together with a test script that hotplugs a random CPU at the same time.
The race is hard to hit, and I have not been able to reproduce it so far.
From reading the set_cpus_allowed_ptr() code, I think __set_cpus_allowed_ptr_locked() fails
when new_mask and cpu_active_mask do not intersect:
__set_cpus_allowed_ptr_locked()
{
	....
	const struct cpumask *cpu_valid_mask = cpu_active_mask;
	dest_cpu = cpumask_any_and_distribute(cpu_valid_mask, new_mask);
	if (dest_cpu >= nr_cpu_ids) {
		ret = -EINVAL;
		goto out;
	}
	....
}
>
>> Can we use cpus_read_lock()/cpus_read_unlock() to guarantee that set_cpus_allowed_ptr()
>> doesn't fail, as follows:
>
> I'm wondering what can be wrong with the current actors:
>
> cpuset_can_attach
>   down_read(cpuset_rwsem)
>   // check all migratees
>   up_read(cpuset_rwsem)
>                           [ _cpu_down / cpuhp_setup_state ]
>                           schedule_work
>                           ...
>                           cpuset_hotplug_update_tasks
>                             down_write(cpuset_rwsem)
>                             up_write(cpuset_rwsem)
>                           ... flush_work
>                           [ _cpu_down / cpu_up_down_serialize_trainwrecks ]
> cpuset_attach
>   down_write(cpuset_rwsem)
>   set_cpus_allowed_ptr(allowed_cpus_weird)
>   up_write(cpuset_rwsem)
>
I think the troublesome scenario is as follows:
cpuset_can_attach
  down_read(cpuset_rwsem)
  // check all migratees
  up_read(cpuset_rwsem)
                          [ _cpu_down / cpuhp_setup_state ]
cpuset_attach
  down_write(cpuset_rwsem)
  guarantee_online_cpus() // (load cpus_attach)
                          sched_cpu_deactivate
                            set_cpu_active(cpu, false) // changes cpu_active_mask
  set_cpus_allowed_ptr(cpus_attach)
    __set_cpus_allowed_ptr_locked()
    // (if the intersection of cpus_attach and
    //  cpu_active_mask is empty, it returns -EINVAL)
  up_write(cpuset_rwsem)
                          schedule_work
                          ...
                          cpuset_hotplug_update_tasks
                            down_write(cpuset_rwsem)
                            up_write(cpuset_rwsem)
                          ... flush_work
                          [ _cpu_down / cpu_up_down_serialize_trainwrecks ]
Regards,
Qiao
> The statement in cpuset_attach() about cpuset_can_attach() test is not
> so strong since task_can_attach() is mostly a pass for non-deadline
> tasks. Still, the use of cpuset_rwsem above should synchronize (I may be
> mistaken) the changes of cpuset's cpu masks, so I'd be interested about
> the details above to understand why the current approach doesn't work.
>
> The additional cpus_read_{,un}lock (when reordered wrt cpuset_rwsem)
> may work but your patch should explain why (in what situation).
>
> My .02€,
> Michal
> .
>