From: Zhang Qiao <zhangqiao22@huawei.com>
To: "Michal Koutný" <mkoutny@suse.com>
Cc: Tejun Heo <tj@kernel.org>, <lizefan.x@bytedance.com>,
	<hannes@cmpxchg.org>, <cgroups@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [Question] set_cpus_allowed_ptr() call failed at cpuset_attach()
Date: Thu, 20 Jan 2022 15:14:22 +0800
Message-ID: <ff49c096-39d9-4215-5b4f-8af2fd7c0c91@huawei.com>
In-Reply-To: <20220119130221.GA31037@blackbody.suse.cz>

Hello,

On 2022/1/19 21:02, Michal Koutný wrote:
> On Fri, Jan 14, 2022 at 09:15:06AM +0800, Zhang Qiao <zhangqiao22@huawei.com> wrote:
>> 	I found the following warning log on qemu. I migrated a task from one cpuset cgroup to
>> another, while I also performed the cpu hotplug operation, and got following calltrace.
> 
> Do you have more information on what hotplug event and what error
> (from set_cpus_allowed_ptr() you observe? (And what's src/dst cpuset wrt
> root/non-root)?
  I ran the LTP test cases together with a test script that hotplugs a random CPU at the same time.
  The race condition happened quickly, and I have not been able to reproduce it since.
  From reading the code of set_cpus_allowed_ptr(), I think __set_cpus_allowed_ptr_locked() fails
when new_mask and cpu_active_mask do not intersect, as follows:

__set_cpus_allowed_ptr_locked()
{
	...
	const struct cpumask *cpu_valid_mask = cpu_active_mask;
	dest_cpu = cpumask_any_and_distribute(cpu_valid_mask, new_mask);
	if (dest_cpu >= nr_cpu_ids) {
		/* new_mask contains no active CPU */
		ret = -EINVAL;
		goto out;
	}
	...
}
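
A minimal userspace sketch of the check quoted above, for illustration only. It is not kernel code: the masks are plain 32-bit bitmaps, and every name starting with model_ or MODEL_ is hypothetical. The kernel's cpumask_any_and_distribute() spreads its picks across CPUs; this model always takes the lowest common CPU, since only the empty-intersection case matters here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_NR_CPU_IDS 8u /* hypothetical CPU count for this model */

/* A cpumask modelled as a bitmap: bit n set means CPU n is in the mask. */
typedef uint32_t model_mask_t;

/* Build a mask that contains exactly one CPU. */
static model_mask_t model_mask_of(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPU_IDS);
	return (model_mask_t)1u << cpu;
}

/*
 * Return the lowest CPU present in both masks, or MODEL_NR_CPU_IDS when
 * the intersection is empty.
 */
static unsigned int model_any_and(model_mask_t a, model_mask_t b)
{
	model_mask_t both = a & b;

	for (unsigned int cpu = 0; cpu < MODEL_NR_CPU_IDS; cpu++)
		if (both & ((model_mask_t)1u << cpu))
			return cpu;
	return MODEL_NR_CPU_IDS;
}

/* Fail with -EINVAL when new_mask contains no active CPU. */
static int model_set_cpus_allowed(model_mask_t active_mask,
				  model_mask_t new_mask)
{
	unsigned int dest_cpu = model_any_and(active_mask, new_mask);

	if (dest_cpu >= MODEL_NR_CPU_IDS)
		return -EINVAL;
	return 0;
}
```

With only CPU 0 active and new_mask containing only CPU 3, model_set_cpus_allowed() returns -EINVAL. Once CPU 3 is also in the active mask, the same call returns 0.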


> 
>> 	Can we use cpus_read_lock()/cpus_read_unlock() to guarantee that set_cpus_allowed_ptr()
>> doesn't fail, as follows:
> 
> I'm wondering what can be wrong with the current actors:
> 
>     cpuset_can_attach
>       down_read(cpuset_rwsem)
>         // check all migratees
>       up_read(cpuset_rwsem)
>                                       [ _cpu_down / cpuhp_setup_state ]
>                                       schedule_work
>                                       ...
>                                       cpuset_hotplug_update_tasks
>                                         down_write(cpuset_rwsem)
>                                         up_write(cpuset_rwsem)
>                                       ... flush_work
>                                       [ _cpu_down / cpu_up_down_serialize_trainwrecks ]
>     cpuset_attach
>       down_write(cpuset_rwsem)
>         set_cpus_allowed_ptr(allowed_cpus_weird)
>       up_write(cpuset_rwsem)
> 

I think the troublesome scenario is as follows:
    cpuset_can_attach
      down_read(cpuset_rwsem)
        // check all migratees
      up_read(cpuset_rwsem)
                                                        [ _cpu_down / cpuhp_setup_state ]
    cpuset_attach
      down_write(cpuset_rwsem)
        guarantee_online_cpus() // (load cpus_attach)
                                                        sched_cpu_deactivate
                                                          set_cpu_active(cpu, false) // will change cpu_active_mask
        set_cpus_allowed_ptr(cpus_attach)
          __set_cpus_allowed_ptr_locked()
            // (if the intersection of cpus_attach and
            //  cpu_active_mask is empty, will return -EINVAL)
      up_write(cpuset_rwsem)
                                                        schedule_work
                                                        ...
                                                        cpuset_hotplug_update_tasks
                                                          down_write(cpuset_rwsem)
                                                          up_write(cpuset_rwsem)
                                                        ... flush_work
                                                        [ _cpu_down / cpu_up_down_serialize_trainwrecks ]
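
The interleaving above can be replayed deterministically in userspace. The following is only a sketch under stated assumptions, not kernel code: all race_ names are hypothetical, masks are 32-bit bitmaps, and the snapshot taken by guarantee_online_cpus() is assumed to be computed against an online mask that still contains the CPU being taken down (only the active mask has changed at that point).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define RACE_NR_CPU_IDS 4u /* hypothetical CPU count for this model */

struct race_state {
	uint32_t online;         /* models cpu_online_mask */
	uint32_t active;         /* models cpu_active_mask */
	uint32_t effective_cpus; /* CPUs of the destination cpuset */
};

/* Models sched_cpu_deactivate(): the CPU leaves the active mask first. */
static void race_deactivate(struct race_state *s, unsigned int cpu)
{
	assert(cpu < RACE_NR_CPU_IDS);
	s->active &= ~(UINT32_C(1) << cpu);
}

/* Models the check in __set_cpus_allowed_ptr_locked(). */
static int race_set_cpus_allowed(const struct race_state *s, uint32_t new_mask)
{
	return (s->active & new_mask) ? 0 : -EINVAL;
}

/*
 * Models cpuset_attach(). When hotplug_in_window is true, CPU `victim`
 * is deactivated after cpus_attach was loaded but before the affinity
 * call, which is the window shown in the diagram.
 */
static int race_cpuset_attach(struct race_state *s, bool hotplug_in_window,
			      unsigned int victim)
{
	/* assumed model of guarantee_online_cpus(): load cpus_attach */
	uint32_t cpus_attach = s->effective_cpus & s->online;

	if (hotplug_in_window)
		race_deactivate(s, victim);

	return race_set_cpus_allowed(s, cpus_attach);
}
```

With a destination cpuset that contains only CPU 1, race_cpuset_attach(&s, true, 1) returns -EINVAL, while the same call with hotplug_in_window set to false returns 0. Keeping the deactivation out of that window is what holding cpus_read_lock() across the attach would be expected to achieve.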


Regards,
Qiao

> The statement in cpuset_attach() about cpuset_can_attach() test is not
> so strong since task_can_attach() is mostly a pass for non-deadline
> tasks. Still, the use of cpuset_rwsem above should synchronize (I may be
> mistaken) the changes of cpuset's cpu masks, so I'd be interested about
> the details above to understand why the current approach doesn't work.
> 
> The additional cpus_read_{,un}lock (when reordered wrt cpuset_rwsem)
> may work but your patch should explain why (in what situation).
> 
> My .02€,
> Michal
> .
> 
