linux-kernel.vger.kernel.org archive mirror
* [PATCH] cgroup: rstat: Simplified cgroup_base_stat_flush() update last_bstat logic
@ 2023-05-18 12:41 Hao Jia
  2023-05-19  4:15 ` Hao Jia
  0 siblings, 1 reply; 8+ messages in thread
From: Hao Jia @ 2023-05-18 12:41 UTC (permalink / raw)
  To: tj, lizefan.x, hannes; +Cc: cgroups, linux-kernel, Hao Jia

In cgroup_base_stat_flush(), {rstatc, cgrp}->last_bstat needs to be
updated to the current {rstatc, cgrp}->bstat. Assign the value directly
instead of adding delta to the previous value.

Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
---
 kernel/cgroup/rstat.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 9c4c55228567..3e5c4c1c92c6 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -376,14 +376,14 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
 	/* propagate percpu delta to global */
 	cgroup_base_stat_sub(&delta, &rstatc->last_bstat);
 	cgroup_base_stat_add(&cgrp->bstat, &delta);
-	cgroup_base_stat_add(&rstatc->last_bstat, &delta);
+	rstatc->last_bstat = rstatc->bstat;
 
 	/* propagate global delta to parent (unless that's root) */
 	if (cgroup_parent(parent)) {
 		delta = cgrp->bstat;
 		cgroup_base_stat_sub(&delta, &cgrp->last_bstat);
 		cgroup_base_stat_add(&parent->bstat, &delta);
-		cgroup_base_stat_add(&cgrp->last_bstat, &delta);
+		cgrp->last_bstat = cgrp->bstat;
 	}
 }
 
-- 
2.37.0



* Re: [PATCH] cgroup: rstat: Simplified cgroup_base_stat_flush() update last_bstat logic
  2023-05-18 12:41 [PATCH] cgroup: rstat: Simplified cgroup_base_stat_flush() update last_bstat logic Hao Jia
@ 2023-05-19  4:15 ` Hao Jia
  2023-05-23 15:14   ` Michal Koutný
  0 siblings, 1 reply; 8+ messages in thread
From: Hao Jia @ 2023-05-19  4:15 UTC (permalink / raw)
  To: tj, lizefan.x, hannes; +Cc: cgroups, linux-kernel



On 2023/5/18 Hao Jia wrote:
> In cgroup_base_stat_flush() function, {rstatc, cgrp}->last_bstat
> needs to be updated to the current {rstatc, cgrp}->bstat, directly
> assigning values instead of adding the last value to delta.
> 
> Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
> ---
>   kernel/cgroup/rstat.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
> index 9c4c55228567..3e5c4c1c92c6 100644
> --- a/kernel/cgroup/rstat.c
> +++ b/kernel/cgroup/rstat.c
> @@ -376,14 +376,14 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
>   	/* propagate percpu delta to global */
>   	cgroup_base_stat_sub(&delta, &rstatc->last_bstat);  *(1)*
>   	cgroup_base_stat_add(&cgrp->bstat, &delta);
> -	cgroup_base_stat_add(&rstatc->last_bstat, &delta);
> +	rstatc->last_bstat = rstatc->bstat;		    *(2)*

Something is wrong here: the value of rstatc->bstat at (1) and (2) may
not be the same, because rstatc->bstat may be updated on another CPU in
between. Sorry for the noise.

>   
>   	/* propagate global delta to parent (unless that's root) */
>   	if (cgroup_parent(parent)) {
>   		delta = cgrp->bstat;
>   		cgroup_base_stat_sub(&delta, &cgrp->last_bstat);
>   		cgroup_base_stat_add(&parent->bstat, &delta);
> -		cgroup_base_stat_add(&cgrp->last_bstat, &delta);
> +		cgrp->last_bstat = cgrp->bstat;
>   	}
>   }
>   

Maybe something like this?


In cgroup_base_stat_flush() function, {rstatc, cgrp}->last_bstat
needs to be updated to the current {rstatc, cgrp}->bstat after the
calculation.

For the rstatc->last_bstat case, rstatc->bstat may be updated on other
CPUs during our calculation, so two reads of rstatc->bstat can return
inconsistent statistics. So we use the temporary variable @cur to
record the rstatc->bstat statistics that were read, and use @cur to
update rstatc->last_bstat.

For the cgrp->last_bstat case, we already hold cgroup_rstat_lock, so
cgrp->bstat will not change during the calculation process, and it can
be directly used to update cgrp->last_bstat.

It is better for us to assign directly instead of using
cgroup_base_stat_add() to update {rstatc, cgrp}->last_bstat.

Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
---
  kernel/cgroup/rstat.c | 9 +++++----
  1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 9c4c55228567..17a6a1fcc2d4 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -360,7 +360,7 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
  {
  	struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
  	struct cgroup *parent = cgroup_parent(cgrp);
-	struct cgroup_base_stat delta;
+	struct cgroup_base_stat delta, cur;
  	unsigned seq;

  	/* Root-level stats are sourced from system-wide CPU stats */
@@ -370,20 +370,21 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
  	/* fetch the current per-cpu values */
  	do {
  		seq = __u64_stats_fetch_begin(&rstatc->bsync);
-		delta = rstatc->bstat;
+		cur = rstatc->bstat;
  	} while (__u64_stats_fetch_retry(&rstatc->bsync, seq));

  	/* propagate percpu delta to global */
+	delta = cur;
  	cgroup_base_stat_sub(&delta, &rstatc->last_bstat);
  	cgroup_base_stat_add(&cgrp->bstat, &delta);
-	cgroup_base_stat_add(&rstatc->last_bstat, &delta);
+	rstatc->last_bstat = cur;

  	/* propagate global delta to parent (unless that's root) */
  	if (cgroup_parent(parent)) {
  		delta = cgrp->bstat;
  		cgroup_base_stat_sub(&delta, &cgrp->last_bstat);
  		cgroup_base_stat_add(&parent->bstat, &delta);
-		cgroup_base_stat_add(&cgrp->last_bstat, &delta);
+		cgrp->last_bstat = cgrp->bstat;
  	}
  }



* Re: [PATCH] cgroup: rstat: Simplified cgroup_base_stat_flush() update last_bstat logic
  2023-05-19  4:15 ` Hao Jia
@ 2023-05-23 15:14   ` Michal Koutný
  2023-05-24  6:54     ` [External] " Hao Jia
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Koutný @ 2023-05-23 15:14 UTC (permalink / raw)
  To: Hao Jia; +Cc: tj, lizefan.x, hannes, cgroups, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1293 bytes --]

Hello Jia.

On Fri, May 19, 2023 at 12:15:57PM +0800, Hao Jia <jiahao.os@bytedance.com> wrote:
> Maybe something like this?

(Next time please send with a version bump in subject.)


> In cgroup_base_stat_flush() function, {rstatc, cgrp}->last_bstat
> needs to be updated to the current {rstatc, cgrp}->bstat after the
> calculation.
> 
> For the rstatc->last_bstat case, rstatc->bstat may be updated on other
> cpus during our calculation, resulting in inconsistent rstatc->bstat
> statistics for the two reads. So we use the temporary variable @cur to
> record the read rstatc->bstat statistics, and use @cur to update
> rstatc->last_bstat.

If a concurrent update happens after the sample of bstat was taken for
the calculation, it won't be reflected in the flushed result.
But a subsequent flush will use the updated bstat, and the difference
from last_bstat will account for that concurrent change (and any other
changes between the flushes).

IOW, flushing cannot prevent concurrent updates, but it will give
eventually consistent results (when repeated without further updates).
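
The argument above can be illustrated with a minimal userspace sketch.
It is not the kernel code: struct percpu_stat, flush_from_sample() and
global_after_flushes() are hypothetical names, a single counter stands
in for struct cgroup_base_stat, and the "concurrent update" is simulated
by a plain increment placed after the sample is taken.

```c
#include <stdint.h>

/* Simplified stand-in for the per-CPU state (assumption: one counter). */
struct percpu_stat {
	uint64_t bstat;      /* live per-CPU value */
	uint64_t last_bstat; /* portion already propagated */
};

/* Propagate the difference between an earlier sample and last_bstat. */
static void flush_from_sample(struct percpu_stat *pc, uint64_t sample,
			      uint64_t *global)
{
	uint64_t delta = sample - pc->last_bstat;

	*global += delta;
	pc->last_bstat += delta; /* same arithmetic form as the existing code */
}

/*
 * Run nflushes flushes. During the first one, an update of +5 lands
 * after the sample was taken. Returns the accumulated global value.
 */
uint64_t global_after_flushes(int nflushes)
{
	struct percpu_stat pc = { .bstat = 10, .last_bstat = 0 };
	uint64_t global = 0;

	for (int i = 0; i < nflushes; i++) {
		uint64_t sample = pc.bstat;

		if (i == 0)
			pc.bstat += 5; /* simulated concurrent update */
		flush_from_sample(&pc, sample, &global);
	}
	return global;
}
```

In this sketch, global_after_flushes(1) misses the late update, while a
second flush picks it up and further flushes change nothing.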

> It is better for us to assign directly instead of using
> cgroup_base_stat_add() to update {rstatc, cgrp}->last_bstat.

Or do you mean the copying is faster than arithmetic?

Thanks,
Michal

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]


* Re: [External] Re: [PATCH] cgroup: rstat: Simplified cgroup_base_stat_flush() update last_bstat logic
  2023-05-23 15:14   ` Michal Koutný
@ 2023-05-24  6:54     ` Hao Jia
  2023-05-24  8:02       ` Michal Koutný
  0 siblings, 1 reply; 8+ messages in thread
From: Hao Jia @ 2023-05-24  6:54 UTC (permalink / raw)
  To: Michal Koutný; +Cc: tj, lizefan.x, hannes, cgroups, linux-kernel



On 2023/5/23 Michal Koutný wrote:
> Hello Jia.
> 
> On Fri, May 19, 2023 at 12:15:57PM +0800, Hao Jia <jiahao.os@bytedance.com> wrote:
>> Maybe something like this?
> 
> (Next time please send with a version bump in subject.)

Thanks for your review, I will do it in the next version.

> 
> 
>> In cgroup_base_stat_flush() function, {rstatc, cgrp}->last_bstat
>> needs to be updated to the current {rstatc, cgrp}->bstat after the
>> calculation.
>>
>> For the rstatc->last_bstat case, rstatc->bstat may be updated on other
>> cpus during our calculation, resulting in inconsistent rstatc->bstat
>> statistics for the two reads. So we use the temporary variable @cur to
>> record the read rstatc->bstat statistics, and use @cur to update
>> rstatc->last_bstat.
> 
> If a concurrent update happens after sample of bstat was taken for
> calculation, it won't be reflected in the flushed result.
> But subsequent flush will use the updated bstat and the difference from
> last_bstat would account for that concurrent change (and any other
> changes between the flushes).
> 
> IOW flushing cannot prevent concurrent updates but it will give
> eventually consistent (repeated without more updates) results.
> 

Yes, so we need @cur to record the bstat value after the sequence fetch
is completed.


>> It is better for us to assign directly instead of using
>> cgroup_base_stat_add() to update {rstatc, cgrp}->last_bstat.
> 
> Or do you mean the copying is faster than arithmetic?
> 

Yes, though the difference may not be noticeable.
Another reason is that once an update completes, we snapshot the current
bstat into last_bstat, which is easier for readers to understand. The
arithmetic is somewhat obscure.

Thanks,
Hao


* Re: [External] Re: [PATCH] cgroup: rstat: Simplified cgroup_base_stat_flush() update last_bstat logic
  2023-05-24  6:54     ` [External] " Hao Jia
@ 2023-05-24  8:02       ` Michal Koutný
  2023-05-24  8:41         ` Hao Jia
  2023-06-12  3:13         ` Hao Jia
  0 siblings, 2 replies; 8+ messages in thread
From: Michal Koutný @ 2023-05-24  8:02 UTC (permalink / raw)
  To: Hao Jia; +Cc: tj, lizefan.x, hannes, cgroups, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 648 bytes --]

On Wed, May 24, 2023 at 02:54:10PM +0800, Hao Jia <jiahao.os@bytedance.com> wrote:
> Yes, so we need @cur to record the bstat value after the sequence fetch is
> completed.

No, I still don't see a problem that it solves. If you find incorrect
data being reported, please explain it more/with an example.

> Yes, but it may not be obvious.
> Another reason is that when we complete an update, we snapshot last_bstat as
> the current bstat, which is better for readers to understand. Arithmetics is
> somewhat obscure.

The readability here is subjective. It'd be interesting to have some
data comparing arithmetics vs copying though.

HTH,
Michal

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]


* Re: [External] Re: [PATCH] cgroup: rstat: Simplified cgroup_base_stat_flush() update last_bstat logic
  2023-05-24  8:02       ` Michal Koutný
@ 2023-05-24  8:41         ` Hao Jia
  2023-06-12  3:13         ` Hao Jia
  1 sibling, 0 replies; 8+ messages in thread
From: Hao Jia @ 2023-05-24  8:41 UTC (permalink / raw)
  To: Michal Koutný; +Cc: tj, lizefan.x, hannes, cgroups, linux-kernel



On 2023/5/24 Michal Koutný wrote:
> On Wed, May 24, 2023 at 02:54:10PM +0800, Hao Jia <jiahao.os@bytedance.com> wrote:
>> Yes, so we need @cur to record the bstat value after the sequence fetch is
>> completed.
> 
> No, I still don't see a problem that it solves. If you find incorrect
> data being reported, please explain it more/with an example.

Sorry to confuse you.

My earliest patch is like this:

diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 9c4c55228567..3e5c4c1c92c6 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -376,14 +376,14 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
 	/* propagate percpu delta to global */
 	cgroup_base_stat_sub(&delta, &rstatc->last_bstat);	(1) <---
 	cgroup_base_stat_add(&cgrp->bstat, &delta);
-	cgroup_base_stat_add(&rstatc->last_bstat, &delta);
+	rstatc->last_bstat = rstatc->bstat;			(2) <---

 	/* propagate global delta to parent (unless that's root) */
 	if (cgroup_parent(parent)) {
 		delta = cgrp->bstat;
 		cgroup_base_stat_sub(&delta, &cgrp->last_bstat);
 		cgroup_base_stat_add(&parent->bstat, &delta);
-		cgroup_base_stat_add(&cgrp->last_bstat, &delta);
+		cgrp->last_bstat = cgrp->bstat;
 	}
 }

If I understand correctly, the rstatc->bstat value used at (1) and the
one read at (2) may differ: by the time we reach (2), rstatc->bstat may
have been updated on another CPU.
In other words, we should not read rstatc->bstat directly; we should
read it in the following way:

     do {
        seq = __u64_stats_fetch_begin(&rstatc->bsync);
        cur = rstatc->bstat;
     } while (__u64_stats_fetch_retry(&rstatc->bsync, seq));
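
The difference between re-reading rstatc->bstat at (2) and reusing the
sampled value can be sketched in userspace as follows. This is only an
illustration under stated assumptions: struct percpu_stat, flush_once()
and total_after_race() are hypothetical names, a single counter replaces
struct cgroup_base_stat, and the update from another CPU is simulated by
an increment between (1) and (2).

```c
#include <stdint.h>

/* Simplified stand-in for the per-CPU state (assumption: one counter). */
struct percpu_stat {
	uint64_t bstat;      /* live value, may change under us */
	uint64_t last_bstat; /* value already propagated */
};

/*
 * Flush once. If racing is nonzero, a simulated update of +5 from
 * another CPU lands between (1) and (2).
 */
static void flush_once(struct percpu_stat *pc, uint64_t *global,
		       int reread, int racing)
{
	uint64_t cur = pc->bstat;              /* sampled value */
	uint64_t delta = cur - pc->last_bstat; /* (1) */

	*global += delta;
	if (racing)
		pc->bstat += 5;                /* simulated concurrent update */
	if (reread)
		pc->last_bstat = pc->bstat;    /* (2): re-read the live value */
	else
		pc->last_bstat = cur;          /* reuse the sampled value */
}

/* Two flushes, the first racing with an update; returns the global total. */
uint64_t total_after_race(int reread)
{
	struct percpu_stat pc = { .bstat = 10, .last_bstat = 0 };
	uint64_t global = 0;

	flush_once(&pc, &global, reread, 1);
	flush_once(&pc, &global, reread, 0);
	return global;
}
```

With the re-read variant the racing update is recorded in last_bstat
without ever being added to the global value, so the second flush cannot
recover it; with the sampled value it is picked up by the second flush.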


> 
>> Yes, but it may not be obvious.
>> Another reason is that when we complete an update, we snapshot last_bstat as
>> the current bstat, which is better for readers to understand. Arithmetics is
>> somewhat obscure.
> 
> The readability here is subjective. It'd be interesting to have some
> data comparing arithmetics vs copying though.

Thanks for your suggestion, I plan to use RDTSC to compare the time 
consumption of arithmetics vs copying. Do you have better suggestions or 
tools?

Thanks,
Hao


* Re: [External] Re: [PATCH] cgroup: rstat: Simplified cgroup_base_stat_flush() update last_bstat logic
  2023-05-24  8:02       ` Michal Koutný
  2023-05-24  8:41         ` Hao Jia
@ 2023-06-12  3:13         ` Hao Jia
  2023-06-13 11:52           ` Michal Koutný
  1 sibling, 1 reply; 8+ messages in thread
From: Hao Jia @ 2023-06-12  3:13 UTC (permalink / raw)
  To: Michal Koutný; +Cc: tj, lizefan.x, hannes, cgroups, linux-kernel



On 2023/5/24 Michal Koutný wrote:
> On Wed, May 24, 2023 at 02:54:10PM +0800, Hao Jia <jiahao.os@bytedance.com> wrote:
>> Yes, so we need @cur to record the bstat value after the sequence fetch is
>> completed.
> 
> No, I still don't see a problem that it solves. If you find incorrect
> data being reported, please explain it more/with an example.
> 
>> Yes, but it may not be obvious.
>> Another reason is that when we complete an update, we snapshot last_bstat as
>> the current bstat, which is better for readers to understand. Arithmetics is
>> somewhat obscure.
> 
> The readability here is subjective. It'd be interesting to have some
> data comparing arithmetics vs copying though.
> 

Sorry for the late reply. I used RDTSC on my machine (an Intel Xeon(R)
Platinum 8260 CPU @ 2.40GHz with 2 NUMA nodes, each with 24 cores and
SMT2 enabled, so 96 CPUs in total) to compare the time taken by
arithmetic vs copying. There is almost no difference between the two.
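
For reference, a comparison of this kind could be set up in userspace
roughly as below. This is a sketch, not the measurement that was
actually run: it uses the standard clock() as a portable stand-in for
RDTSC, struct base_stat is a simplified stand-in for struct
cgroup_base_stat (the three-field layout is an assumption), and
run_flushes() / time_flushes() are hypothetical helpers.

```c
#include <stdint.h>
#include <time.h>

/* Simplified stand-in for struct cgroup_base_stat (field set assumed). */
struct base_stat {
	uint64_t utime;
	uint64_t stime;
	uint64_t sum_exec_runtime;
};

static void base_stat_add(struct base_stat *dst, const struct base_stat *src)
{
	dst->utime += src->utime;
	dst->stime += src->stime;
	dst->sum_exec_runtime += src->sum_exec_runtime;
}

static void base_stat_sub(struct base_stat *dst, const struct base_stat *src)
{
	dst->utime -= src->utime;
	dst->stime -= src->stime;
	dst->sum_exec_runtime -= src->sum_exec_runtime;
}

/*
 * Run iters flush steps against a steadily growing counter. by_copy
 * selects "last = cur" over "last += delta". Returns a checksum of the
 * final state so both variants can be compared.
 */
uint64_t run_flushes(unsigned long iters, int by_copy)
{
	struct base_stat cur = { 0 }, last = { 0 }, global = { 0 };

	for (unsigned long i = 0; i < iters; i++) {
		struct base_stat delta;

		cur.utime += 3;
		cur.stime += 2;
		cur.sum_exec_runtime += 5;

		delta = cur;
		base_stat_sub(&delta, &last);
		base_stat_add(&global, &delta);
		if (by_copy)
			last = cur;
		else
			base_stat_add(&last, &delta);
	}
	return last.utime + global.stime;
}

/* Elapsed processor time, in clock() ticks, for one run. */
clock_t time_flushes(unsigned long iters, int by_copy)
{
	volatile uint64_t sink; /* keeps the result from being discarded */
	clock_t start = clock();

	sink = run_flushes(iters, by_copy);
	(void)sink;
	return clock() - start;
}
```

Both variants end in the same state, so only the elapsed time can
differ. An optimizing compiler may fold much of this loop away, so any
numbers from such a sketch should be treated with caution.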



> HTH,
> Michal


* Re: [External] Re: [PATCH] cgroup: rstat: Simplified cgroup_base_stat_flush() update last_bstat logic
  2023-06-12  3:13         ` Hao Jia
@ 2023-06-13 11:52           ` Michal Koutný
  0 siblings, 0 replies; 8+ messages in thread
From: Michal Koutný @ 2023-06-13 11:52 UTC (permalink / raw)
  To: Hao Jia; +Cc: tj, lizefan.x, hannes, cgroups, linux-kernel

On Mon, Jun 12, 2023 at 11:13:41AM +0800, Hao Jia <jiahao.os@bytedance.com> wrote:
> Sorry for replying you so late. I am using RDTSC on my machine (an Intel
> Xeon(R) Platinum 8260 CPU@2.40GHz machine with 2 NUMA nodes each of which
> has 24 cores with SMT2 enabled, so 96 CPUs in total.) to compare the time
> consumption of arithmetics vs copying. There is almost no difference in the
> time consumption between arithmetics and copying.

Thanks for carrying this out and sharing the result, even though it
does not make a convincing case for the change.

Michal

