* [PATCH v2] sched: cpuset: Don't rebuild sched domains on suspend-resume
From: Qais Yousef @ 2023-01-20 19:48 UTC
  To: Peter Zijlstra, Ingo Molnar, Juri Lelli, Waiman Long
  Cc: Steven Rostedt, tj, linux-kernel, luca.abeni, claudio,
	tommaso.cucinotta, bristot, mathieu.poirier, Dietmar Eggemann,
	cgroups, Vincent Guittot, Wei Wang, Rick Yiu, Quentin Perret,
	Qais Yousef

Commit f9a25f776d78 ("cpusets: Rebuild root domain deadline accounting information")
enabled rebuilding sched domains on cpuset and hotplug operations to
correct deadline accounting.

Rebuilding sched domains is a slow operation and we see 10+ ms delay on
suspend-resume because of that.

Since nothing is expected to change on a suspend-resume operation, skip
rebuilding the sched domains to regain the time lost.

Debugged-by: Rick Yiu <rickyiu@google.com>
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
---

    Changes in v2:
    
    	* Remove redundant check in update_tasks_root_domain() (Thanks Waiman)
    
    v1 link:
    
    	https://lore.kernel.org/lkml/20221216233501.gh6m75e7s66dmjgo@airbuntu/

 kernel/cgroup/cpuset.c  | 3 +++
 kernel/sched/deadline.c | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index a29c0b13706b..9a45f083459c 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1088,6 +1088,9 @@ static void rebuild_root_domains(void)
 	lockdep_assert_cpus_held();
 	lockdep_assert_held(&sched_domains_mutex);
 
+	if (cpuhp_tasks_frozen)
+		return;
+
 	rcu_read_lock();
 
 	/*
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 0d97d54276cc..42c1143a3956 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2575,6 +2575,9 @@ void dl_clear_root_domain(struct root_domain *rd)
 {
 	unsigned long flags;
 
+	if (cpuhp_tasks_frozen)
+		return;
+
 	raw_spin_lock_irqsave(&rd->dl_bw.lock, flags);
 	rd->dl_bw.total_bw = 0;
 	raw_spin_unlock_irqrestore(&rd->dl_bw.lock, flags);
-- 
2.25.1



* Re: [PATCH v2] sched: cpuset: Don't rebuild sched domains on suspend-resume
From: Waiman Long @ 2023-01-20 22:16 UTC
  To: Qais Yousef, Peter Zijlstra, Ingo Molnar, Juri Lelli
  Cc: Steven Rostedt, tj, linux-kernel, luca.abeni, claudio,
	tommaso.cucinotta, bristot, mathieu.poirier, Dietmar Eggemann,
	cgroups, Vincent Guittot, Wei Wang, Rick Yiu, Quentin Perret


On 1/20/23 14:48, Qais Yousef wrote:
> Commit f9a25f776d78 ("cpusets: Rebuild root domain deadline accounting information")
> enabled rebuilding sched domains on cpuset and hotplug operations to
> correct deadline accounting.
>
> Rebuilding sched domains is a slow operation and we see 10+ ms delay on
> suspend-resume because of that.
>
> Since nothing is expected to change on a suspend-resume operation, skip
> rebuilding the sched domains to regain the time lost.
>
> Debugged-by: Rick Yiu <rickyiu@google.com>
> Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
> ---
>
>      Changes in v2:
>      
>      	* Remove redundant check in update_tasks_root_domain() (Thanks Waiman)
>      
>      v1 link:
>      
>      	https://lore.kernel.org/lkml/20221216233501.gh6m75e7s66dmjgo@airbuntu/
>
>   kernel/cgroup/cpuset.c  | 3 +++
>   kernel/sched/deadline.c | 3 +++
>   2 files changed, 6 insertions(+)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index a29c0b13706b..9a45f083459c 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1088,6 +1088,9 @@ static void rebuild_root_domains(void)
>   	lockdep_assert_cpus_held();
>   	lockdep_assert_held(&sched_domains_mutex);
>   
> +	if (cpuhp_tasks_frozen)
> +		return;
> +
>   	rcu_read_lock();
>   
>   	/*
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 0d97d54276cc..42c1143a3956 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -2575,6 +2575,9 @@ void dl_clear_root_domain(struct root_domain *rd)
>   {
>   	unsigned long flags;
>   
> +	if (cpuhp_tasks_frozen)
> +		return;
> +
>   	raw_spin_lock_irqsave(&rd->dl_bw.lock, flags);
>   	rd->dl_bw.total_bw = 0;
>   	raw_spin_unlock_irqrestore(&rd->dl_bw.lock, flags);

cpuhp_tasks_frozen is set when thaw_secondary_cpus() or
freeze_secondary_cpus() is called. I don't know the exact suspend/resume
calling sequence; will cpuhp_tasks_frozen be cleared at the end of the
resume sequence? Maybe we should make sure that rebuild_root_domains() is
called at least once at the end of the resume operation.

Cheers,
Longman



* Re: [PATCH v2] sched: cpuset: Don't rebuild sched domains on suspend-resume
From: Qais Yousef @ 2023-01-25 16:35 UTC
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Steven Rostedt, tj,
	linux-kernel, luca.abeni, claudio, tommaso.cucinotta, bristot,
	mathieu.poirier, Dietmar Eggemann, cgroups, Vincent Guittot,
	Wei Wang, Rick Yiu, Quentin Perret

On 01/20/23 17:16, Waiman Long wrote:
> 
> On 1/20/23 14:48, Qais Yousef wrote:
> > [...]
> 
> cpuhp_tasks_frozen is set when thaw_secondary_cpus() or
> freeze_secondary_cpus() is called. I don't know the exact suspend/resume
> calling sequence; will cpuhp_tasks_frozen be cleared at the end of the
> resume sequence? Maybe we should make sure that rebuild_root_domains() is
> called at least once at the end of the resume operation.

Very good questions. It made me look at the logic again, and I realize now
that the way force_rebuild behaves is causing this issue.

I *think* we should only call rebuild_root_domains() if cpus_updated is set
in cpuset_hotplug_workfn().

cpuset_cpu_active() seems to be the source of force_rebuild in my case, and
it is only called after the last cpu is back online (which is what you
suggest). In this case we can end up with cpus_updated = false, but
force_rebuild = true.

Now you added a couple of new users of force_rebuild in 4b842da276a8a; I'm
trying to figure out what the conditions would be there. It seems we can
have corner cases where cpus_updated might not trigger correctly.

Could the below be a good cure?

AFAICT we must rebuild the root domains if something has changed in cpuset,
which should be captured by either having:

	* cpus_updated = true
	* force_rebuild && !cpuhp_tasks_frozen

/me goes to test the patch

--->8---

	diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
	index a29c0b13706b..363e4459559f 100644
	--- a/kernel/cgroup/cpuset.c
	+++ b/kernel/cgroup/cpuset.c
	@@ -1079,6 +1079,8 @@ static void update_tasks_root_domain(struct cpuset *cs)
		css_task_iter_end(&it);
	 }

	+static bool need_rebuild_rd = true;
	+
	 static void rebuild_root_domains(void)
	 {
		struct cpuset *cs = NULL;
	@@ -1088,6 +1090,9 @@ static void rebuild_root_domains(void)
		lockdep_assert_cpus_held();
		lockdep_assert_held(&sched_domains_mutex);

	+       if (!need_rebuild_rd)
	+               return;
	+
		rcu_read_lock();

		/*
	@@ -3627,7 +3632,9 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
		/* rebuild sched domains if cpus_allowed has changed */
		if (cpus_updated || force_rebuild) {
			force_rebuild = false;
	+               need_rebuild_rd = cpus_updated || (force_rebuild && !cpuhp_tasks_frozen);
			rebuild_sched_domains();
	+               need_rebuild_rd = true;
		}

		free_cpumasks(NULL, ptmp);


--->8---

Thanks!

--
Qais Yousef


* Re: [PATCH v2] sched: cpuset: Don't rebuild sched domains on suspend-resume
From: Qais Yousef @ 2023-01-29 16:56 UTC
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Steven Rostedt, tj,
	linux-kernel, luca.abeni, claudio, tommaso.cucinotta, bristot,
	mathieu.poirier, Dietmar Eggemann, cgroups, Vincent Guittot,
	Wei Wang, Rick Yiu, Quentin Perret

On 01/25/23 16:35, Qais Yousef wrote:
> On 01/20/23 17:16, Waiman Long wrote:
> > 
> > On 1/20/23 14:48, Qais Yousef wrote:
> > > [...]
> > 
> > cpuhp_tasks_frozen is set when thaw_secondary_cpus() or
> > freeze_secondary_cpus() is called. I don't know the exact suspend/resume
> > calling sequences, will cpuhp_tasks_frozen be cleared at the end of resume
> > sequence? Maybe we should make sure that rebuild_root_domain() is called at
> > least once at the end of resume operation.
> 
> Very good questions. It made me look at the logic again, and I realize now
> that the way force_rebuild behaves is causing this issue.
> 
> I *think* we should only call rebuild_root_domains() if cpus_updated is set
> in cpuset_hotplug_workfn().
> 
> cpuset_cpu_active() seems to be the source of force_rebuild in my case, and
> it is only called after the last cpu is back online (which is what you
> suggest). In this case we can end up with cpus_updated = false, but
> force_rebuild = true.
> 
> Now you added a couple of new users of force_rebuild in 4b842da276a8a; I'm
> trying to figure out what the conditions would be there. It seems we can
> have corner cases where cpus_updated might not trigger correctly.
> 
> Could the below be a good cure?
> 
> AFAICT we must rebuild the root domains if something has changed in cpuset,
> which should be captured by either having:
> 
> 	* cpus_updated = true
> 	* force_rebuild && !cpuhp_tasks_frozen
> 
> /me goes to test the patch

It works fine.

Can we assume cpus_updated will always be false in a suspend/resume cycle?
I can then check for (force_rebuild && !cpuhp_tasks_frozen) directly in
rebuild_root_domains().


Thanks!

--
Qais Yousef



* Re: [PATCH v2] sched: cpuset: Don't rebuild sched domains on suspend-resume
From: Waiman Long @ 2023-01-30  2:49 UTC
  To: Qais Yousef
  Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Steven Rostedt, tj,
	linux-kernel, luca.abeni, claudio, tommaso.cucinotta, bristot,
	mathieu.poirier, Dietmar Eggemann, cgroups, Vincent Guittot,
	Wei Wang, Rick Yiu, Quentin Perret

On 1/25/23 11:35, Qais Yousef wrote:
> On 01/20/23 17:16, Waiman Long wrote:
>> On 1/20/23 14:48, Qais Yousef wrote:
>>> [...]
>> cpuhp_tasks_frozen is set when thaw_secondary_cpus() or
>> freeze_secondary_cpus() is called. I don't know the exact suspend/resume
>> calling sequence; will cpuhp_tasks_frozen be cleared at the end of the
>> resume sequence? Maybe we should make sure that rebuild_root_domains() is
>> called at least once at the end of the resume operation.
> Very good questions. It made me look at the logic again, and I realize now
> that the way force_rebuild behaves is causing this issue.
>
> I *think* we should only call rebuild_root_domains() if cpus_updated is set
> in cpuset_hotplug_workfn().
>
> cpuset_cpu_active() seems to be the source of force_rebuild in my case, and
> it is only called after the last cpu is back online (which is what you
> suggest). In this case we can end up with cpus_updated = false, but
> force_rebuild = true.
>
> Now you added a couple of new users of force_rebuild in 4b842da276a8a; I'm
> trying to figure out what the conditions would be there. It seems we can
> have corner cases where cpus_updated might not trigger correctly.
>
> Could the below be a good cure?
>
> AFAICT we must rebuild the root domains if something has changed in cpuset,
> which should be captured by either having:
>
> 	* cpus_updated = true
> 	* force_rebuild && !cpuhp_tasks_frozen
>
> /me goes to test the patch
>
> --->8---
>
> 	diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> 	index a29c0b13706b..363e4459559f 100644
> 	--- a/kernel/cgroup/cpuset.c
> 	+++ b/kernel/cgroup/cpuset.c
> 	@@ -1079,6 +1079,8 @@ static void update_tasks_root_domain(struct cpuset *cs)
> 		css_task_iter_end(&it);
> 	 }
>
> 	+static bool need_rebuild_rd = true;
> 	+
> 	 static void rebuild_root_domains(void)
> 	 {
> 		struct cpuset *cs = NULL;
> 	@@ -1088,6 +1090,9 @@ static void rebuild_root_domains(void)
> 		lockdep_assert_cpus_held();
> 		lockdep_assert_held(&sched_domains_mutex);
>
> 	+       if (!need_rebuild_rd)
> 	+               return;
> 	+
> 		rcu_read_lock();
>
> 		/*
> 	@@ -3627,7 +3632,9 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
> 		/* rebuild sched domains if cpus_allowed has changed */
> 		if (cpus_updated || force_rebuild) {
> 			force_rebuild = false;
> 	+               need_rebuild_rd = cpus_updated || (force_rebuild && !cpuhp_tasks_frozen);
> 			rebuild_sched_domains();
> 	+               need_rebuild_rd = true;

You do the force_rebuild check after it is set to false in the previous
statement, which is definitely not correct. So it will be false whenever
cpus_updated is false.

If you just want to skip the rebuild_sched_domains() call for hotplug, why
not just skip the call here if the condition is right? Like

         /* rebuild sched domains if cpus_allowed has changed */
         if (cpus_updated || (force_rebuild && !cpuhp_tasks_frozen)) {
                 force_rebuild = false;
                 rebuild_sched_domains();
         }

Still, we will need to confirm that cpuhp_tasks_frozen will be cleared 
outside of the suspend/resume cycle.

Cheers,
Longman



* Re: [PATCH v2] sched: cpuset: Don't rebuild sched domains on suspend-resume
From: Waiman Long @ 2023-01-30  2:58 UTC
  To: Qais Yousef
  Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Steven Rostedt, tj,
	linux-kernel, luca.abeni, claudio, tommaso.cucinotta, bristot,
	mathieu.poirier, Dietmar Eggemann, cgroups, Vincent Guittot,
	Wei Wang, Rick Yiu, Quentin Perret

On 1/29/23 21:49, Waiman Long wrote:
> [...]
>
> You do the force_rebuild check after it is set to false in the previous
> statement, which is definitely not correct. So it will be false whenever
> cpus_updated is false.
>
> If you just want to skip the rebuild_sched_domains() call for hotplug, why
> not just skip the call here if the condition is right? Like
>
>         /* rebuild sched domains if cpus_allowed has changed */
>         if (cpus_updated || (force_rebuild && !cpuhp_tasks_frozen)) {
>                 force_rebuild = false;
>                 rebuild_sched_domains();
>         }
>
> Still, we will need to confirm that cpuhp_tasks_frozen will be cleared 
> outside of the suspend/resume cycle.

BTW, you also need to expand the comment to explain why we need to check 
for cpuhp_tasks_frozen.
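
Something along these lines, perhaps (wording is only a suggestion):

	/*
	 * Rebuild sched domains if cpus_allowed has changed.
	 *
	 * On suspend/resume (cpuhp_tasks_frozen set) the cpuset topology
	 * is not expected to change, so skip the expensive rebuild and
	 * keep the existing dl accounting.
	 */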

Cheers,
Longman



* Re: [PATCH v2] sched: cpuset: Don't rebuild sched domains on suspend-resume
From: Qais Yousef @ 2023-01-30 13:00 UTC
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Steven Rostedt, tj,
	linux-kernel, luca.abeni, claudio, tommaso.cucinotta, bristot,
	mathieu.poirier, Dietmar Eggemann, cgroups, Vincent Guittot,
	Wei Wang, Rick Yiu, Quentin Perret

On 01/29/23 21:49, Waiman Long wrote:
> On 1/25/23 11:35, Qais Yousef wrote:
> > On 01/20/23 17:16, Waiman Long wrote:
> > > On 1/20/23 14:48, Qais Yousef wrote:
> > > > Commit f9a25f776d78 ("cpusets: Rebuild root domain deadline accounting information")
> > > > enabled rebuilding sched domain on cpuset and hotplug operations to
> > > > correct deadline accounting.
> > > > 
> > > > Rebuilding sched domain is a slow operation and we see 10+ ms delay on
> > > > suspend-resume because of that.
> > > > 
> > > > Since nothing is expected to change on suspend-resume operation; skip
> > > > rebuilding the sched domains to regain the time lost.
> > > > 
> > > > Debugged-by: Rick Yiu <rickyiu@google.com>
> > > > Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
> > > > ---
> > > > 
> > > >       Changes in v2:
> > > >       	* Remove redundant check in update_tasks_root_domain() (Thanks Waiman)
> > > >       v1 link:
> > > >       	https://lore.kernel.org/lkml/20221216233501.gh6m75e7s66dmjgo@airbuntu/
> > > > 
> > > >    kernel/cgroup/cpuset.c  | 3 +++
> > > >    kernel/sched/deadline.c | 3 +++
> > > >    2 files changed, 6 insertions(+)
> > > > 
> > > > diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> > > > index a29c0b13706b..9a45f083459c 100644
> > > > --- a/kernel/cgroup/cpuset.c
> > > > +++ b/kernel/cgroup/cpuset.c
> > > > @@ -1088,6 +1088,9 @@ static void rebuild_root_domains(void)
> > > >    	lockdep_assert_cpus_held();
> > > >    	lockdep_assert_held(&sched_domains_mutex);
> > > > +	if (cpuhp_tasks_frozen)
> > > > +		return;
> > > > +
> > > >    	rcu_read_lock();
> > > >    	/*
> > > > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > > > index 0d97d54276cc..42c1143a3956 100644
> > > > --- a/kernel/sched/deadline.c
> > > > +++ b/kernel/sched/deadline.c
> > > > @@ -2575,6 +2575,9 @@ void dl_clear_root_domain(struct root_domain *rd)
> > > >    {
> > > >    	unsigned long flags;
> > > > +	if (cpuhp_tasks_frozen)
> > > > +		return;
> > > > +
> > > >    	raw_spin_lock_irqsave(&rd->dl_bw.lock, flags);
> > > >    	rd->dl_bw.total_bw = 0;
> > > >    	raw_spin_unlock_irqrestore(&rd->dl_bw.lock, flags);
> > > cpuhp_tasks_frozen is set when thaw_secondary_cpus() or
> > > freeze_secondary_cpus() is called. I don't know the exact suspend/resume
> > > calling sequences, will cpuhp_tasks_frozen be cleared at the end of resume
> > > sequence? Maybe we should make sure that rebuild_root_domain() is called at
> > > least once at the end of resume operation.
> > Very good questions. It made me look at the logic again and I realize now that
> > the way force_build behaves is causing this issue.
> > 
> > I *think* we should just make the call rebuild_root_domains() only if
> > cpus_updated in cpuset_hotplug_workfn().
> > 
> > cpuset_cpu_active() seems to be the source of force_rebuild in my case; which
> > seems to be called only after the last cpu is back online (what you suggest).
> > In this case we can end up with cpus_updated = false, but force_rebuild = true.
> > 
> > Now you added a couple of new users to force_rebuild in 4b842da276a8a; I'm
> > trying to figure out what the conditions would be there. It seems we can have
> > corner cases for cpus_update might not trigger correctly?
> > 
> > Could the below be a good cure?
> > 
> > AFAICT we must rebuild the root domains if something has changed in cpuset.
> > Which should be captured by either having:
> > 
> > 	* cpus_updated = true
> > 	* force_rebuild && !cpuhp_tasks_frozen
> > 
> > /me goes to test the patch
> > 
> > --->8---
> > 
> > 	diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> > 	index a29c0b13706b..363e4459559f 100644
> > 	--- a/kernel/cgroup/cpuset.c
> > 	+++ b/kernel/cgroup/cpuset.c
> > 	@@ -1079,6 +1079,8 @@ static void update_tasks_root_domain(struct cpuset *cs)
> > 		css_task_iter_end(&it);
> > 	 }
> > 
> > 	+static bool need_rebuild_rd = true;
> > 	+
> > 	 static void rebuild_root_domains(void)
> > 	 {
> > 		struct cpuset *cs = NULL;
> > 	@@ -1088,6 +1090,9 @@ static void rebuild_root_domains(void)
> > 		lockdep_assert_cpus_held();
> > 		lockdep_assert_held(&sched_domains_mutex);
> > 
> > 	+       if (!need_rebuild_rd)
> > 	+               return;
> > 	+
> > 		rcu_read_lock();
> > 
> > 		/*
> > 	@@ -3627,7 +3632,9 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
> > 		/* rebuild sched domains if cpus_allowed has changed */
> > 		if (cpus_updated || force_rebuild) {
> > 			force_rebuild = false;
> > 	+               need_rebuild_rd = cpus_updated || (force_rebuild && !cpuhp_tasks_frozen);
> > 			rebuild_sched_domains();
> > 	+               need_rebuild_rd = true;
> 
> You do the force_rebuild check after it is set to false in the previous
> statement, which is definitely not correct. So it will be false whenever
> cpus_updated is false.
> 
> If you just want to skip the rebuild_sched_domains() call for hotplug, why

We just need to skip rebuild_root_domains(). I think rebuild_sched_domains()
should still happen.

The issue, AFAIU, is that we assume this hotplug operation results in changes
in cpuset and the DEADLINE accounting is now wrong and must be re-calculated.
But s2ram will cause a hotplug operation without actually changing the cpuset
configuration - the re-calculation is not required. But it'd be good to get
a confirmation from Juri.

> not just skip the call here if the condition is right? Like
> 
>         /* rebuild sched domains if cpus_allowed has changed */
>         if (cpus_updated || (force_rebuild && !cpuhp_tasks_frozen)) {
>                 force_rebuild = false;
>                 rebuild_sched_domains();
>         }
> 
> Still, we will need to confirm that cpuhp_tasks_frozen will be cleared
> outside of the suspend/resume cycle.

I think it's fine to use this variable from the cpuhp callback context only,
which I think this cpuset workfn is considered an extension of.

But you're right, I can't use cpuhp_tasks_frozen directly in
rebuild_root_domains() as I did in v1 because it doesn't get cleared after
calling the last _cpu_up(). force_rebuild will only be set after the last cpu
is brought online though - so this should happen once at the end.

(will update the comment too)

It seems I still need more time to study the code. What appeared simple is
actually not..


Cheers

--
Qais Yousef


* Re: [PATCH v2] sched: cpuset: Don't rebuild sched domains on suspend-resume
From: Qais Yousef @ 2023-01-30 19:48 UTC
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Steven Rostedt, tj,
	linux-kernel, luca.abeni, claudio, tommaso.cucinotta, bristot,
	mathieu.poirier, Dietmar Eggemann, cgroups, Vincent Guittot,
	Wei Wang, Rick Yiu, Quentin Perret

On 01/30/23 11:29, Waiman Long wrote:
> On 1/30/23 08:00, Qais Yousef wrote:
> 
>         just skip the call here if the condition is right? Like
> 
>                 /* rebuild sched domains if cpus_allowed has changed */
>                 if (cpus_updated || (force_rebuild && !cpuhp_tasks_frozen)) {
>                         force_rebuild = false;
>                         rebuild_sched_domains();
>                 }
> 
>         Still, we will need to confirm that cpuhp_tasks_frozen will be cleared
>         outside of the suspend/resume cycle.
> 
>     I think it's fine to use this variable from the cpuhp callback context only,
>     which I think this cpuset workfn is considered an extension of.
> 
>     But you're right, I can't use cpuhp_tasks_frozen directly in
>     rebuild_root_domains() as I did in v1 because it doesn't get cleared after
>     calling the last _cpu_up().
> 
> That is what I suspect. So we can't use that cpuhp_tasks_frozen variable here
> in cpuset.
> 
>      force_rebuild will only be set after the last cpu
>     is brought online though - so this should happen once at the end.
> 
> Perhaps you can add another tracking variable for detecting if suspend/resume
> is in progress.

I think cpuhp_tasks_frozen is meant for that. All users who cared so far
belonged to the cpuhp callback. I think reading it from cpuset_hotplug_workfn()
is fine too, as this function will only run as a consequence of the cpuhp
callback AFAICS. cpuset_cpu_active() takes care of not forcing a rebuild of
sched_domains until the last cpu becomes active - so the part of it being done
once at the end of resume is handled too.
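
For reference, this is roughly what cpuset_cpu_active() does (paraphrased
from memory with my own comments - worth double checking against
kernel/sched/core.c):

	static void cpuset_cpu_active(void)
	{
		if (cpuhp_tasks_frozen) {
			/*
			 * Resume path: keep the temporary single sched
			 * domain until the last frozen cpu is back online.
			 */
			partition_sched_domains(1, NULL, NULL);
			if (--num_cpus_frozen)
				return;
			/*
			 * Last cpu is back; this is where force_rebuild
			 * gets set, once per resume.
			 */
			cpuset_force_rebuild();
		}
		cpuset_update_active_cpus();
	}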

It's just that rebuild_sched_domains() will always assume it needs to clear
and rebuild deadline accounting - which is not true for the suspend/resume
case. But now looking at other users of rebuild_sched_domains(), others might
be getting the hit too. For example rebuild_sched_domains_locked() is called
from update_relax_domain_level(), which AFAIU should not impact dl accounting.

FWIW, I did capture a worst case scenario of 21ms because of
rebuild_root_domains().

/me thinks rebuild_root_domains() is a misleading name too, as it just fixes
dl accounting but does not rebuild the rd itself.

What makes sense to me now is to pass whether dl accounting requires updating
to rebuild_sched_domains() as an arg so that the caller can decide whether the
reason can affect dl accounting.

Or maybe pull rebuild_root_domains() out of the chain and let the caller call
it directly. And probably rename it to update_dl_rd_accounting() or something.
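
For the first option, something along these lines (purely illustrative -
the signature is hypothetical and the flag would need threading through the
whole rebuild_sched_domains_locked() chain):

	/* hypothetical API - not what is in the tree today */
	void rebuild_sched_domains_dl(bool update_dl_accounting);

	static void cpuset_hotplug_workfn(struct work_struct *work)
	{
		...
		if (cpus_updated || force_rebuild) {
			/*
			 * suspend/resume leaves the cpuset configuration
			 * untouched, so dl accounting is still valid.
			 */
			bool update_dl = cpus_updated || !cpuhp_tasks_frozen;

			force_rebuild = false;
			rebuild_sched_domains_dl(update_dl);
		}
		...
	}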

I'll continue to dig more..


Thanks!

--
Qais Yousef


* Re: [PATCH v2] sched: cpuset: Don't rebuild sched domains on suspend-resume
From: Waiman Long @ 2023-01-30 19:57 UTC
  To: Qais Yousef
  Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Steven Rostedt, tj,
	linux-kernel, luca.abeni, claudio, tommaso.cucinotta, bristot,
	mathieu.poirier, Dietmar Eggemann, cgroups, Vincent Guittot,
	Wei Wang, Rick Yiu, Quentin Perret

On 1/30/23 14:48, Qais Yousef wrote:
> On 01/30/23 11:29, Waiman Long wrote:
>> [...]
> I think cpuhp_tasks_frozen is meant for that. All users who cared so far
> belonged to the cpuhp callback. I think reading it from cpuset_hotplug_workfn()
> is fine too, as this function will only run as a consequence of the cpuhp
> callback AFAICS. cpuset_cpu_active() takes care of not forcing a rebuild of
> sched_domains until the last cpu becomes active - so the part of it being done
> once at the end of resume is handled too.

Well we will have to add code to clear cpuhp_tasks_frozen at the end of 
resume then. We don't want to affect other callers unless we are sure 
that it won't affect them.

>
> It's just that rebuild_sched_domains() will always assume it needs to clear
> and rebuild deadline accounting - which is not true for the suspend/resume
> case. But now looking at other users of rebuild_sched_domains(), others might
> be getting the hit too. For example rebuild_sched_domains_locked() is called
> from update_relax_domain_level(), which AFAIU should not impact dl accounting.
>
> FWIW, I did capture a worst case scenario of 21ms because of
> rebuild_root_domains().
>
> /me thinks rebuild_root_domains() is a misleading name too, as it just fixes
> dl accounting but does not rebuild the rd itself.
>
> What makes sense to me now is to pass whether dl accounting requires updating
> to rebuild_sched_domains() as an arg so that the caller can decide whether the
> reason can affect dl accounting.
>
> Or maybe pull rebuild_root_domains() out of the chain and let the caller call
> it directly. And probably rename it to update_do_rd_accounting() or something.
>
> I'll continue to dig more..

Looking forward to seeing that.

Cheers,
Longman



* Re: [PATCH v2] sched: cpuset: Don't rebuild sched domains on suspend-resume
From: Qais Yousef @ 2023-01-31 19:22 UTC
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Steven Rostedt, tj,
	linux-kernel, luca.abeni, claudio, tommaso.cucinotta, bristot,
	mathieu.poirier, Dietmar Eggemann, cgroups, Vincent Guittot,
	Wei Wang, Rick Yiu, Quentin Perret

On 01/30/23 14:57, Waiman Long wrote:
> On 1/30/23 14:48, Qais Yousef wrote:
> > [...]
> 
> Well we will have to add code to clear cpuhp_tasks_frozen at the end of
> resume then. We don't want to affect other callers unless we are sure that
> it won't affect them.

Actually, I think since cpuset_hotplug_workfn() is called later, there's
a chance of racing with another cpuhp operation just after resume.

Anyway. I think we don't have to use this flag. But we'd have to better
distill the reasons why we force_rebuild.

Your 2 new users are tripping me up so far - do they handle errors where the
shape of cpuset changes? If yes, then we must take the dl accounting update
into consideration for these errors.

Juri, I'd still appreciate a confirmation from you that I'm not
understanding things completely wrong.

> 
> > [...]
> 
> Looking forward to seeing that.

Another thought I had: maybe it's worth trying to optimize the rebuild root
domain process. Interestingly, in my system there are no dl tasks, but

	rebuild_sched_domains()
	  cpuset_for_each_descendant_pre()
	    update_tasks_root_domain()
	      css_task_iter_next()
	        dl_add_task_root_domain()

seems to be going through every task in the hierarchy anyway, which would
explain the slowdown. We could have special variants to iterate through
hierarchies that have ever seen a dl task attached to them, and a special
variant to iterate through dl tasks only in a css - but I'm not sure if I'm
brave enough to go down this rabbit hole :D


Cheers

--
Qais Yousef


* Re: [PATCH v2] sched: cpuset: Don't rebuild sched domains on suspend-resume
From: Waiman Long @ 2023-01-31 19:33 UTC
  To: Qais Yousef
  Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Steven Rostedt, tj,
	linux-kernel, luca.abeni, claudio, tommaso.cucinotta, bristot,
	mathieu.poirier, Dietmar Eggemann, cgroups, Vincent Guittot,
	Wei Wang, Rick Yiu, Quentin Perret


On 1/31/23 14:22, Qais Yousef wrote:
> On 01/30/23 14:57, Waiman Long wrote:
>> [...]
> Actually, I think since cpuset_hotplug_workfn() is called later, there's
> a chance of racing with another cpuhp operation just after resume.
>
> Anyway. I think we don't have to use this flag. But we'd have to better
> distill the reasons why we force_rebuild.
>
> Your 2 new users are tripping me up so far - do they handle errors where the
> shape of cpuset changes? If yes, then we must take the dl accounting update
> into consideration for these errors.
The 2 new users are for the cpuset cpu partition, which is used to create
a secondary scheduling domain and hence has to call
rebuild_sched_domains() to set it up. Those should not be used that
frequently.

>
> Juri, I'd still appreciate a confirmation from you that I'm not
> understanding things completely wrong.
>
>> [...]
> Another thought I had: maybe it's worth trying to optimize the rebuild root
> domain process. Interestingly, in my system there are no dl tasks, but
>
> 	rebuild_sched_domains()
> 	  cpuset_for_each_descendant_pre()
> 	    update_tasks_root_domain()
> 	      css_task_iter_next()
> 	        dl_add_task_root_domain()
>
> seems to be going through every task in the hierarchy anyway, which would
> explain the slowdown. We could have special variants to iterate through
> hierarchies that have ever seen a dl task attached to them, and a special
> variant to iterate through dl tasks only in a css - but I'm not sure if I'm
> brave enough to go down this rabbit hole :D

Yes, it seems like we have to check every task in the system to see if
they are dl tasks. It can be expensive if there are a large number of
tasks. Maybe we should track the # of dl tasks in each cgroup and skip
this operation if there are none.
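
A rough sketch of what I have in mind (the new names are made up, and the
counter would also need updating on fork/exit and when a task changes
scheduling class):

	/* hypothetical field in struct cpuset */
	struct cpuset {
		...
		int nr_deadline_tasks;
	};

	/* bump it whenever a dl task is attached to the cpuset */
	static void cs_attach_task(struct cpuset *cs, struct task_struct *p)
	{
		if (dl_task(p))
			cs->nr_deadline_tasks++;
	}

	/* then the expensive per-task walk can bail out early */
	static void update_tasks_root_domain(struct cpuset *cs)
	{
		if (!cs->nr_deadline_tasks)
			return;
		...
	}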

Cheers,
Longman



* Re: [PATCH v2] sched: cpuset: Don't rebuild sched domains on suspend-resume
From: Qais Yousef @ 2023-02-02 19:20 UTC
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Steven Rostedt, tj,
	linux-kernel, luca.abeni, claudio, tommaso.cucinotta, bristot,
	mathieu.poirier, Dietmar Eggemann, cgroups, Vincent Guittot,
	Wei Wang, Rick Yiu, Quentin Perret

On 01/31/23 14:33, Waiman Long wrote:
> 
> On 1/31/23 14:22, Qais Yousef wrote:
> > [...]
> The 2 new users are for the cpuset cpu partition, which is used to create a
> secondary scheduling domain and hence has to call rebuild_sched_domains()
> to set it up. Those should not be used that frequently.

Okay, thanks. So honouring these looks important, unlike the force_rebuild on
suspend/resume.

> 
> > [...]
> 
> Yes, it seems like we have to check every task in the system to see if they
> are dl tasks. It can be expensive if there are a large number of tasks.
> Maybe we should track the # of dl tasks in each cgroup and skip this
> operation if there are none.

Yep, would be nice to have, hehe.


Cheers

--
Qais Yousef

