From: Feng Tang <feng.tang@intel.com>
To: Waiman Long <longman@redhat.com>
Cc: Tejun Heo <tj@kernel.org>, Zefan Li <lizefan.x@bytedance.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@kernel.org>,
	Dave Hansen <dave.hansen@intel.com>,
	ying.huang@intel.com, stable@vger.kernel.org
Subject: Re: [PATCH v2] cgroup/cpuset: Remove cpus_allowed/mems_allowed setup in cpuset_init_smp()
Date: Tue, 26 Apr 2022 11:23:37 +0800
Message-ID: <20220426032337.GA84190@shbuild999.sh.intel.com>
In-Reply-To: <20220425155505.1292896-1-longman@redhat.com>

Hi Waiman,

On Mon, Apr 25, 2022 at 11:55:05AM -0400, Waiman Long wrote:
> There are 3 places where the cpu and node masks of the top cpuset can
> be initialized in the order they are executed:
>  1) start_kernel -> cpuset_init()
>  2) start_kernel -> cgroup_init() -> cpuset_bind()
>  3) kernel_init_freeable() -> do_basic_setup() -> cpuset_init_smp()
> 
> The first cpuset_init() function just sets all the bits in the masks.
> The last one executed is cpuset_init_smp() which sets up cpu and node
> masks suitable for v1, but not v2.  cpuset_bind() does the right setup
> for both v1 and v2.
> 
> For systems with cgroup v2 setup, cpuset_bind() is called once. For
> systems with cgroup v1 setup, cpuset_bind() is called twice. It is
> first called before cpuset_init_smp() in cgroup v2 mode.  Then it is
> called again when cgroup v1 filesystem is mounted in v1 mode after
> cpuset_init_smp().
> 
>   [    2.609781] cpuset_bind() called - v2 = 1
>   [    3.079473] cpuset_init_smp() called
>   [    7.103710] cpuset_bind() called - v2 = 0

I ran some tests. On a server running CentOS, cpuset_bind() was indeed
called twice: first in v2 mode during kernel boot, and then in v1 mode
post-boot.

However, on a QEMU guest with a basic Debian rootfs image, the second
call to cpuset_bind() didn't happen.

> As a result, cpu and memory node hot add may fail to update the cpu and
> node masks of the top cpuset to include the newly added cpu or node in
> a cgroup v2 environment.
> 
> smp_init() is called after the first two init functions.  So we don't
> have a complete list of active cpus and memory nodes until later in
> cpuset_init_smp() which is the right time to set up effective_cpus
> and effective_mems.
> 
> To fix this problem, the potentially incorrect cpus_allowed &
> mems_allowed setup in cpuset_init_smp() are removed.  For cgroup v2
> systems, the initial cpuset_bind() call will set them up correctly.
> For cgroup v1 systems, the second call to cpuset_bind() will do the
> right setup.
> 
> cc: stable@vger.kernel.org
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/cgroup/cpuset.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 9390bfd9f1cd..6bd8f5ef40fe 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -3390,8 +3390,9 @@ static struct notifier_block cpuset_track_online_nodes_nb = {
>   */
>  void __init cpuset_init_smp(void)
>  {
> -	cpumask_copy(top_cpuset.cpus_allowed, cpu_active_mask);
> -	top_cpuset.mems_allowed = node_states[N_MEMORY];

So can we keep the line
  cpumask_copy(top_cpuset.cpus_allowed, cpu_active_mask);

and only remove the line
  top_cpuset.mems_allowed = node_states[N_MEMORY];
?

Thanks,
Feng

Thread overview: 12+ messages
2022-04-25 15:55 [PATCH v2] cgroup/cpuset: Remove cpus_allowed/mems_allowed setup in cpuset_init_smp() Waiman Long
2022-04-26  3:23 ` Feng Tang [this message]
2022-04-26  3:23   ` Feng Tang
2022-04-26 14:58   ` Waiman Long
2022-04-27  1:06     ` Feng Tang
2022-04-27  1:06       ` Feng Tang
2022-04-27  2:34       ` Waiman Long
2022-04-27  2:34         ` Waiman Long
2022-04-27 12:09         ` Feng Tang
2022-04-27 12:09           ` Feng Tang
2022-04-27 13:53 ` Michal Koutný
2022-04-27 14:33   ` Waiman Long
