* [PATCH] rcu: Make jiffies_till_sched_qs writable
@ 2019-07-08  6:00 Byungchul Park
  2019-07-08 12:50 ` Paul E. McKenney
  0 siblings, 1 reply; 53+ messages in thread

From: Byungchul Park @ 2019-07-08 6:00 UTC (permalink / raw)
To: paulmck, josh, rostedt, mathieu.desnoyers, jiangshanlai, joel, rcu
Cc: linux-kernel, kernel-team

jiffies_till_sched_qs is useless if it's readonly as it is used to set
jiffies_to_sched_qs with its value regardless of first/next fqs jiffies.
And it should be applied immediately on change through sysfs.

The function for setting jiffies_to_sched_qs,
adjust_jiffies_till_sched_qs() will be called only if
the value from sysfs != ULONG_MAX. And the value won't be adjusted
unlike first/next fqs jiffies.

While at it, changed the positions of two module_param()s downward.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/rcu/tree.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a2f8ba2..a28e2fe 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -422,9 +422,7 @@ static int rcu_is_cpu_rrupt_from_idle(void)
  * quiescent-state help from rcu_note_context_switch().
  */
 static ulong jiffies_till_sched_qs = ULONG_MAX;
-module_param(jiffies_till_sched_qs, ulong, 0444);
 static ulong jiffies_to_sched_qs; /* See adjust_jiffies_till_sched_qs(). */
-module_param(jiffies_to_sched_qs, ulong, 0444); /* Display only! */

 /*
  * Make sure that we give the grace-period kthread time to detect any
@@ -450,6 +448,18 @@ static void adjust_jiffies_till_sched_qs(void)
 	WRITE_ONCE(jiffies_to_sched_qs, j);
 }

+static int param_set_sched_qs_jiffies(const char *val, const struct kernel_param *kp)
+{
+	ulong j;
+	int ret = kstrtoul(val, 0, &j);
+
+	if (!ret && j != ULONG_MAX) {
+		WRITE_ONCE(*(ulong *)kp->arg, j);
+		adjust_jiffies_till_sched_qs();
+	}
+	return ret;
+}
+
 static int param_set_first_fqs_jiffies(const char *val, const struct kernel_param *kp)
 {
 	ulong j;
@@ -474,6 +484,11 @@ static int param_set_next_fqs_jiffies(const char *val, const struct kernel_param
 	return ret;
 }

+static struct kernel_param_ops sched_qs_jiffies_ops = {
+	.set = param_set_sched_qs_jiffies,
+	.get = param_get_ulong,
+};
+
 static struct kernel_param_ops first_fqs_jiffies_ops = {
 	.set = param_set_first_fqs_jiffies,
 	.get = param_get_ulong,
@@ -484,8 +499,11 @@ static int param_set_next_fqs_jiffies(const char *val, const struct kernel_param
 	.get = param_get_ulong,
 };

+module_param_cb(jiffies_till_sched_qs, &sched_qs_jiffies_ops, &jiffies_till_sched_qs, 0644);
 module_param_cb(jiffies_till_first_fqs, &first_fqs_jiffies_ops, &jiffies_till_first_fqs, 0644);
 module_param_cb(jiffies_till_next_fqs, &next_fqs_jiffies_ops, &jiffies_till_next_fqs, 0644);
+
+module_param(jiffies_to_sched_qs, ulong, 0444); /* Display only! */
 module_param(rcu_kick_kthreads, bool, 0644);

 static void force_qs_rnp(int (*f)(struct rcu_data *rdp));
--
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable
  2019-07-08  6:00 [PATCH] rcu: Make jiffies_till_sched_qs writable Byungchul Park
@ 2019-07-08 12:50 ` Paul E. McKenney
  2019-07-08 13:03   ` Joel Fernandes
  0 siblings, 1 reply; 53+ messages in thread

From: Paul E. McKenney @ 2019-07-08 12:50 UTC (permalink / raw)
To: Byungchul Park
Cc: josh, rostedt, mathieu.desnoyers, jiangshanlai, joel, rcu,
	linux-kernel, kernel-team

On Mon, Jul 08, 2019 at 03:00:09PM +0900, Byungchul Park wrote:
> jiffies_till_sched_qs is useless if it's readonly as it is used to set
> jiffies_to_sched_qs with its value regardless of first/next fqs jiffies.
> And it should be applied immediately on change through sysfs.

Actually, the intent was to only allow this to be changed at boot time.
Of course, if there is now a good reason to adjust it, it needs
to be adjustable.  So what situation is making you want to change
jiffies_till_sched_qs at runtime?  To what values is it proving useful
to adjust it?  What (if any) relationships between this timeout and the
various other RCU timeouts need to be maintained?  What changes to
rcutorture should be applied in order to test the ability to change
this at runtime?

							Thanx, Paul

> The function for setting jiffies_to_sched_qs,
> adjust_jiffies_till_sched_qs() will be called only if
> the value from sysfs != ULONG_MAX. And the value won't be adjusted
> unlike first/next fqs jiffies.
>
> While at it, changed the positions of two module_param()s downward.
>
> Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> ---
> [snip the unchanged patch]
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable
  2019-07-08 12:50 ` Paul E. McKenney
@ 2019-07-08 13:03   ` Joel Fernandes
  2019-07-08 13:19     ` Paul E. McKenney
  2019-07-09  5:58     ` Byungchul Park
  0 siblings, 2 replies; 53+ messages in thread

From: Joel Fernandes @ 2019-07-08 13:03 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Byungchul Park, josh, rostedt, mathieu.desnoyers, jiangshanlai,
	rcu, linux-kernel, kernel-team

Good morning!

On Mon, Jul 08, 2019 at 05:50:13AM -0700, Paul E. McKenney wrote:
> On Mon, Jul 08, 2019 at 03:00:09PM +0900, Byungchul Park wrote:
> > jiffies_till_sched_qs is useless if it's readonly as it is used to set
> > jiffies_to_sched_qs with its value regardless of first/next fqs jiffies.
> > And it should be applied immediately on change through sysfs.

It is interesting that it can be set up at boot time, but not at runtime.
I think the changelog could mention that the parameter is not really
"read-only", because it can already be changed dynamically as a kernel
boot parameter.

> Actually, the intent was to only allow this to be changed at boot time.
> Of course, if there is now a good reason to adjust it, it needs
> to be adjustable.  So what situation is making you want to change
> jiffies_till_sched_qs at runtime?  To what values is it proving useful
> to adjust it?  What (if any) relationships between this timeout and the
> various other RCU timeouts need to be maintained?  What changes to
> rcutorture should be applied in order to test the ability to change
> this at runtime?

I am also interested in the context: are you changing it at runtime for
experimentation? I recently was doing some performance experiments and it is
quite interesting how reducing this value can shorten grace period times :)

Joel

> 							Thanx, Paul
>
> > The function for setting jiffies_to_sched_qs,
> > adjust_jiffies_till_sched_qs() will be called only if
> > the value from sysfs != ULONG_MAX. And the value won't be adjusted
> > unlike first/next fqs jiffies.
> >
> > While at it, changed the positions of two module_param()s downward.
> >
> > Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> > ---
> > [snip the unchanged patch]
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable
  2019-07-08 13:03 ` Joel Fernandes
@ 2019-07-08 13:19   ` Paul E. McKenney
  2019-07-08 14:15     ` Joel Fernandes
  2019-07-09  6:05     ` Byungchul Park
  1 sibling, 2 replies; 53+ messages in thread

From: Paul E. McKenney @ 2019-07-08 13:19 UTC (permalink / raw)
To: Joel Fernandes
Cc: Byungchul Park, josh, rostedt, mathieu.desnoyers, jiangshanlai,
	rcu, linux-kernel, kernel-team

On Mon, Jul 08, 2019 at 09:03:59AM -0400, Joel Fernandes wrote:
> Good morning!
>
> On Mon, Jul 08, 2019 at 05:50:13AM -0700, Paul E. McKenney wrote:
> > On Mon, Jul 08, 2019 at 03:00:09PM +0900, Byungchul Park wrote:
> > > jiffies_till_sched_qs is useless if it's readonly as it is used to set
> > > jiffies_to_sched_qs with its value regardless of first/next fqs jiffies.
> > > And it should be applied immediately on change through sysfs.
>
> It is interesting it can be setup at boot time, but not at runtime. I think
> this can be mentioned in the change log that it is not really "read-only",
> because it is something that can be dynamically changed as a kernel boot
> parameter.

In Byungchul's defense, the current module_param() permissions are
0444, which really is read-only.  Although I do agree that they can
be written at boot, one could use this same line of reasoning to argue
that const variables can be written at compile time (or, for on-stack
const variables, at function-invocation time).  But we still call them
"const".

> > Actually, the intent was to only allow this to be changed at boot time.
> > [snip]
>
> I am also interested in the context, are you changing it at runtime for
> experimentation? I recently was doing some performance experiments and it is
> quite interesting how reducing this value can shorten grace period times :)

If you -really- want to reduce grace-period latencies, you can always
boot with rcupdate.rcu_expedited=1.  ;-)

If you want to reduce grace-period latencies, but without all the IPIs
that expedited grace periods give you, the rcutree.jiffies_till_first_fqs
and rcutree.jiffies_till_next_fqs kernel boot parameters might be better
places to start than rcutree.jiffies_till_sched_qs.  For one thing,
adjusting these two affects the value of jiffies_till_sched_qs.

							Thanx, Paul

> Joel
>
> [snip the quoted patch]
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable
  2019-07-08 13:19 ` Paul E. McKenney
@ 2019-07-08 14:15   ` Joel Fernandes
  1 sibling, 0 replies; 53+ messages in thread

From: Joel Fernandes @ 2019-07-08 14:15 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Byungchul Park, josh, rostedt, mathieu.desnoyers, jiangshanlai,
	rcu, linux-kernel, kernel-team

On Mon, Jul 08, 2019 at 06:19:42AM -0700, Paul E. McKenney wrote:
[snip]
> > I am also interested in the context, are you changing it at runtime for
> > experimentation? I recently was doing some performance experiments and it is
> > quite interesting how reducing this value can shorten grace period times :)
>
> If you -really- want to reduce grace-period latencies, you can always
> boot with rcupdate.rcu_expedited=1.  ;-)
>
> If you want to reduce grace-period latencies, but without all the IPIs
> that expedited grace periods give you, the rcutree.jiffies_till_first_fqs
> and rcutree.jiffies_till_next_fqs kernel boot parameters might be better
> places to start than rcutree.jiffies_till_sched_qs.  For one thing,
> adjusting these two affects the value of jiffies_till_sched_qs.

Yes, agreed ;)

Joel

> [snip the quoted patch]
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable
  2019-07-08 13:19 ` Paul E. McKenney
  2019-07-08 14:15   ` Joel Fernandes
@ 2019-07-09  6:05   ` Byungchul Park
  2019-07-09 12:43     ` Paul E. McKenney
  1 sibling, 1 reply; 53+ messages in thread

From: Byungchul Park @ 2019-07-09 6:05 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Joel Fernandes, josh, rostedt, mathieu.desnoyers, jiangshanlai,
	rcu, linux-kernel, kernel-team

On Mon, Jul 08, 2019 at 06:19:42AM -0700, Paul E. McKenney wrote:
> On Mon, Jul 08, 2019 at 09:03:59AM -0400, Joel Fernandes wrote:
> > It is interesting it can be setup at boot time, but not at runtime. I think
> > this can be mentioned in the change log that it is not really "read-only",
> > because it is something that can be dynamically changed as a kernel boot
> > parameter.
>
> In Byungchul's defense, the current module_param() permissions are
> 0444, which really is read-only.  Although I do agree that they can
> be written at boot, one could use this same line of reasoning to argue
> that const variables can be written at compile time (or, for on-stack
> const variables, at function-invocation time).  But we still call them
> "const".

;-)

> If you -really- want to reduce grace-period latencies, you can always
> boot with rcupdate.rcu_expedited=1.  ;-)

It's quite a different mechanism at the moment though... :(

> If you want to reduce grace-period latencies, but without all the IPIs
> that expedited grace periods give you, the rcutree.jiffies_till_first_fqs
> and rcutree.jiffies_till_next_fqs kernel boot parameters might be better
> places to start than rcutree.jiffies_till_sched_qs.  For one thing,
> adjusting these two affects the value of jiffies_till_sched_qs.

Do you mean:

    adjusting these two affects the value of *jiffies_to_sched_qs*
    (instead of jiffies_till_sched_qs).

Right?

Thanks,
Byungchul

> [snip the quoted patch]
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-09 6:05 ` Byungchul Park @ 2019-07-09 12:43 ` Paul E. McKenney 0 siblings, 0 replies; 53+ messages in thread From: Paul E. McKenney @ 2019-07-09 12:43 UTC (permalink / raw) To: Byungchul Park Cc: Joel Fernandes, josh, rostedt, mathieu.desnoyers, jiangshanlai, rcu, linux-kernel, kernel-team On Tue, Jul 09, 2019 at 03:05:58PM +0900, Byungchul Park wrote: > On Mon, Jul 08, 2019 at 06:19:42AM -0700, Paul E. McKenney wrote: > > On Mon, Jul 08, 2019 at 09:03:59AM -0400, Joel Fernandes wrote: > > > Good morning! > > > > > > On Mon, Jul 08, 2019 at 05:50:13AM -0700, Paul E. McKenney wrote: > > > > On Mon, Jul 08, 2019 at 03:00:09PM +0900, Byungchul Park wrote: > > > > > jiffies_till_sched_qs is useless if it's readonly as it is used to set > > > > > jiffies_to_sched_qs with its value regardless of first/next fqs jiffies. > > > > > And it should be applied immediately on change through sysfs. > > > > > > It is interesting it can be setup at boot time, but not at runtime. I think > > > this can be mentioned in the change log that it is not really "read-only", > > > because it is something that can be dynamically changed as a kernel boot > > > parameter. > > > > In Byungchul's defense, the current module_param() permissions are > > 0444, which really is read-only. Although I do agree that they can > > be written at boot, one could use this same line of reasoning to argue > > that const variables can be written at compile time (or, for on-stack > > const variables, at function-invocation time). But we still call them > > "const". > > ;-) > > > > > Actually, the intent was to only allow this to be changed at boot time. > > > > Of course, if there is now a good reason to adjust it, it needs > > > > to be adjustable. So what situation is making you want to change > > > > jiffies_till_sched_qs at runtime? To what values is it proving useful > > > > to adjust it? 
What (if any) relationships between this timeout and the > > > > various other RCU timeouts need to be maintained? What changes to > > > > rcutorture should be applied in order to test the ability to change > > > > this at runtime? > > > > > > I am also interested in the context, are you changing it at runtime for > > > experimentation? I recently was doing some performance experiments and it is > > > quite interesting how reducing this value can shorten grace period times :) > > > > If you -really- want to reduce grace-period latencies, you can always > > boot with rcupdate.rcu_expedited=1. ;-) > > It's a quite different mechanism at the moment though... :( Quite true, but a rather effective mechanism for reducing grace-period latency, give or take the needs of aggressive real-time workloads. > > If you want to reduce grace-period latencies, but without all the IPIs > > that expedited grace periods give you, the rcutree.jiffies_till_first_fqs > > and rcutree.jiffies_till_next_fqs kernel boot parameters might be better > > places to start than rcutree.jiffies_till_sched_qs. For one thing, > > adjusting these two affects the value of jiffies_till_sched_qs. > > Do you mean: > > adjusting these two affects the value of *jiffies_to_sched_qs* (instead > of jiffies_till_sched_qs). > > Right? Yes, and apologies for my confusion! Thaxn, Paul > Thanks, > Byungchul > > > > > Thanx, Paul > > > > > Joel > > > > > > > > > > Thanx, Paul > > > > > > > > > The function for setting jiffies_to_sched_qs, > > > > > adjust_jiffies_till_sched_qs() will be called only if > > > > > the value from sysfs != ULONG_MAX. And the value won't be adjusted > > > > > unlike first/next fqs jiffies. > > > > > > > > > > While at it, changed the positions of two module_param()s downward. 
> > > > > > > > > > Signed-off-by: Byungchul Park <byungchul.park@lge.com> > > > > > --- > > > > > kernel/rcu/tree.c | 22 ++++++++++++++++++++-- > > > > > 1 file changed, 20 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > index a2f8ba2..a28e2fe 100644 > > > > > --- a/kernel/rcu/tree.c > > > > > +++ b/kernel/rcu/tree.c > > > > > @@ -422,9 +422,7 @@ static int rcu_is_cpu_rrupt_from_idle(void) > > > > > * quiescent-state help from rcu_note_context_switch(). > > > > > */ > > > > > static ulong jiffies_till_sched_qs = ULONG_MAX; > > > > > -module_param(jiffies_till_sched_qs, ulong, 0444); > > > > > static ulong jiffies_to_sched_qs; /* See adjust_jiffies_till_sched_qs(). */ > > > > > -module_param(jiffies_to_sched_qs, ulong, 0444); /* Display only! */ > > > > > > > > > > /* > > > > > * Make sure that we give the grace-period kthread time to detect any > > > > > @@ -450,6 +448,18 @@ static void adjust_jiffies_till_sched_qs(void) > > > > > WRITE_ONCE(jiffies_to_sched_qs, j); > > > > > } > > > > > > > > > > +static int param_set_sched_qs_jiffies(const char *val, const struct kernel_param *kp) > > > > > +{ > > > > > + ulong j; > > > > > + int ret = kstrtoul(val, 0, &j); > > > > > + > > > > > + if (!ret && j != ULONG_MAX) { > > > > > + WRITE_ONCE(*(ulong *)kp->arg, j); > > > > > + adjust_jiffies_till_sched_qs(); > > > > > + } > > > > > + return ret; > > > > > +} > > > > > + > > > > > static int param_set_first_fqs_jiffies(const char *val, const struct kernel_param *kp) > > > > > { > > > > > ulong j; > > > > > @@ -474,6 +484,11 @@ static int param_set_next_fqs_jiffies(const char *val, const struct kernel_param > > > > > return ret; > > > > > } > > > > > > > > > > +static struct kernel_param_ops sched_qs_jiffies_ops = { > > > > > + .set = param_set_sched_qs_jiffies, > > > > > + .get = param_get_ulong, > > > > > +}; > > > > > + > > > > > static struct kernel_param_ops first_fqs_jiffies_ops = { > > > > > .set = 
param_set_first_fqs_jiffies, > > > > > .get = param_get_ulong, > > > > > @@ -484,8 +499,11 @@ static int param_set_next_fqs_jiffies(const char *val, const struct kernel_param > > > > > .get = param_get_ulong, > > > > > }; > > > > > > > > > > +module_param_cb(jiffies_till_sched_qs, &sched_qs_jiffies_ops, &jiffies_till_sched_qs, 0644); > > > > > module_param_cb(jiffies_till_first_fqs, &first_fqs_jiffies_ops, &jiffies_till_first_fqs, 0644); > > > > > module_param_cb(jiffies_till_next_fqs, &next_fqs_jiffies_ops, &jiffies_till_next_fqs, 0644); > > > > > + > > > > > +module_param(jiffies_to_sched_qs, ulong, 0444); /* Display only! */ > > > > > module_param(rcu_kick_kthreads, bool, 0644); > > > > > > > > > > static void force_qs_rnp(int (*f)(struct rcu_data *rdp)); > > > > > -- > > > > > 1.9.1 > > > > > > > > > > > > > ^ permalink raw reply [flat|nested] 53+ messages in thread
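The core of the patch above is the module_param_cb() pattern: a custom kernel_param_ops .set callback that parses the sysfs write and immediately recomputes the derived value. The shape of that callback can be illustrated with a small userspace C sketch. This is not kernel code: kstrtoul(), WRITE_ONCE() and the kernel_param plumbing are replaced by plain strtoul() and direct assignment, adjust_sched_qs() is a simplified stand-in for adjust_jiffies_till_sched_qs(), and the function names here are hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Userspace stand-ins for the two kernel variables in the patch. */
static unsigned long jiffies_till_sched_qs = ULONG_MAX;
static unsigned long jiffies_to_sched_qs;

/* Simplified stand-in for adjust_jiffies_till_sched_qs(): propagate
 * the user's request into the value RCU actually consults. */
static void adjust_sched_qs(void)
{
	jiffies_to_sched_qs = jiffies_till_sched_qs;
}

/* Mirrors the shape of param_set_sched_qs_jiffies(): parse the string,
 * and only on success (and a non-sentinel value) store and re-adjust,
 * so the change takes effect immediately rather than at next boot. */
static int set_sched_qs(const char *val)
{
	char *end;
	unsigned long j;

	errno = 0;
	j = strtoul(val, &end, 0);
	if (errno || end == val || *end != '\0')
		return -EINVAL;		/* kstrtoul() would reject this too */
	if (j != ULONG_MAX) {		/* ULONG_MAX is the "unset" sentinel */
		jiffies_till_sched_qs = j;
		adjust_sched_qs();	/* apply immediately */
	}
	return 0;
}
```

The key design point, visible in both the sketch and the patch, is that a plain module_param() only stores the value, while a .set callback gives you a hook to run recomputation at write time.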
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-08 13:03 ` Joel Fernandes 2019-07-08 13:19 ` Paul E. McKenney @ 2019-07-09 5:58 ` Byungchul Park 2019-07-09 6:45 ` Byungchul Park 2019-07-09 12:41 ` Paul E. McKenney 1 sibling, 2 replies; 53+ messages in thread From: Byungchul Park @ 2019-07-09 5:58 UTC (permalink / raw) To: Joel Fernandes Cc: Paul E. McKenney, josh, rostedt, mathieu.desnoyers, jiangshanlai, rcu, linux-kernel, kernel-team On Mon, Jul 08, 2019 at 09:03:59AM -0400, Joel Fernandes wrote: > > Actually, the intent was to only allow this to be changed at boot time. > > Of course, if there is now a good reason to adjust it, it needs > > to be adjustable. So what situation is making you want to change > > jiffies_till_sched_qs at runtime? To what values is it proving useful > > to adjust it? What (if any) relationships between this timeout and the > > various other RCU timeouts need to be maintained? What changes to > > rcutorture should be applied in order to test the ability to change > > this at runtime? > > I am also interested in the context, are you changing it at runtime for > experimentation? I recently was doing some performance experiments and it is > quite interesting how reducing this value can shorten grace period times :) Hi Joel, I've read a thread talking about your experiment to see how the grace periods change depending on the tunable variables, which was interesting to me. While reading it, I found out that jiffies_till_sched_qs is not tunable at runtime, unlike jiffies_till_{first,next}_fqs, which looked like nonsense to me; that's why I tried this patch. :) Hi Paul, IMHO, just as we may want to tune the time for fqs to be initiated, we may also want to tune the time for the help from the scheduler to start. I thought the only difference between them is the level of urgency. I might be wrong. I would appreciate it if you would let me know if I am missing something. And it's ok even if the patch is turned down based on your criteria. 
:) Thanks, Byungchul > Joel > > > > Thanx, Paul > > > > > The function for setting jiffies_to_sched_qs, > > > adjust_jiffies_till_sched_qs() will be called only if > > > the value from sysfs != ULONG_MAX. And the value won't be adjusted > > > unlike first/next fqs jiffies. > > > > > > While at it, changed the positions of two module_param()s downward. > > > > > > Signed-off-by: Byungchul Park <byungchul.park@lge.com> > > > --- > > > kernel/rcu/tree.c | 22 ++++++++++++++++++++-- > > > 1 file changed, 20 insertions(+), 2 deletions(-) > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > index a2f8ba2..a28e2fe 100644 > > > --- a/kernel/rcu/tree.c > > > +++ b/kernel/rcu/tree.c > > > @@ -422,9 +422,7 @@ static int rcu_is_cpu_rrupt_from_idle(void) > > > * quiescent-state help from rcu_note_context_switch(). > > > */ > > > static ulong jiffies_till_sched_qs = ULONG_MAX; > > > -module_param(jiffies_till_sched_qs, ulong, 0444); > > > static ulong jiffies_to_sched_qs; /* See adjust_jiffies_till_sched_qs(). */ > > > -module_param(jiffies_to_sched_qs, ulong, 0444); /* Display only! 
*/ > > > > > > /* > > > * Make sure that we give the grace-period kthread time to detect any > > > @@ -450,6 +448,18 @@ static void adjust_jiffies_till_sched_qs(void) > > > WRITE_ONCE(jiffies_to_sched_qs, j); > > > } > > > > > > +static int param_set_sched_qs_jiffies(const char *val, const struct kernel_param *kp) > > > +{ > > > + ulong j; > > > + int ret = kstrtoul(val, 0, &j); > > > + > > > + if (!ret && j != ULONG_MAX) { > > > + WRITE_ONCE(*(ulong *)kp->arg, j); > > > + adjust_jiffies_till_sched_qs(); > > > + } > > > + return ret; > > > +} > > > + > > > static int param_set_first_fqs_jiffies(const char *val, const struct kernel_param *kp) > > > { > > > ulong j; > > > @@ -474,6 +484,11 @@ static int param_set_next_fqs_jiffies(const char *val, const struct kernel_param > > > return ret; > > > } > > > > > > +static struct kernel_param_ops sched_qs_jiffies_ops = { > > > + .set = param_set_sched_qs_jiffies, > > > + .get = param_get_ulong, > > > +}; > > > + > > > static struct kernel_param_ops first_fqs_jiffies_ops = { > > > .set = param_set_first_fqs_jiffies, > > > .get = param_get_ulong, > > > @@ -484,8 +499,11 @@ static int param_set_next_fqs_jiffies(const char *val, const struct kernel_param > > > .get = param_get_ulong, > > > }; > > > > > > +module_param_cb(jiffies_till_sched_qs, &sched_qs_jiffies_ops, &jiffies_till_sched_qs, 0644); > > > module_param_cb(jiffies_till_first_fqs, &first_fqs_jiffies_ops, &jiffies_till_first_fqs, 0644); > > > module_param_cb(jiffies_till_next_fqs, &next_fqs_jiffies_ops, &jiffies_till_next_fqs, 0644); > > > + > > > +module_param(jiffies_to_sched_qs, ulong, 0444); /* Display only! */ > > > module_param(rcu_kick_kthreads, bool, 0644); > > > > > > static void force_qs_rnp(int (*f)(struct rcu_data *rdp)); > > > -- > > > 1.9.1 > > > > > ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-09 5:58 ` Byungchul Park @ 2019-07-09 6:45 ` Byungchul Park 2019-07-09 12:41 ` Paul E. McKenney 1 sibling, 0 replies; 53+ messages in thread From: Byungchul Park @ 2019-07-09 6:45 UTC (permalink / raw) To: Joel Fernandes Cc: Paul E. McKenney, josh, rostedt, mathieu.desnoyers, jiangshanlai, rcu, linux-kernel, kernel-team On Tue, Jul 09, 2019 at 02:58:16PM +0900, Byungchul Park wrote: > On Mon, Jul 08, 2019 at 09:03:59AM -0400, Joel Fernandes wrote: > > > Actually, the intent was to only allow this to be changed at boot time. > > > Of course, if there is now a good reason to adjust it, it needs > > > to be adjustable. So what situation is making you want to change > > > jiffies_till_sched_qs at runtime? To what values is it proving useful > > > to adjust it? What (if any) relationships between this timeout and the > > > various other RCU timeouts need to be maintained? What changes to > > > rcutorture should be applied in order to test the ability to change > > > this at runtime? > > > > I am also interested in the context, are you changing it at runtime for > > experimentation? I recently was doing some performance experiments and it is > > quite interesting how reducing this value can shorten grace period times :) > > Hi Joel, > > I've read a thread talking about your experiment to see how the grace > periods change depending on the tunnable variables which was interesting > to me. While reading it, I found out jiffies_till_sched_qs is not > tunnable at runtime unlike jiffies_till_{first,next}_fqs which looks > like non-sense to me that's why I tried this patch. :) > > Hi Paul, > > IMHO, as much as we want to tune the time for fqs to be initiated, we > can also want to tune the time for the help from scheduler to start. Let me mention one more thing here... This is about jiffies_till_sched_qs, I think, used to directly set jiffies_to_sched_qs. 
> I thought only difference between them is a level of urgency. Of course, they are coupled when jiffies_to_sched_qs is calculated from jiffies_till_{first,next}_fqs; here I'm just talking about its original motivation or concept. Thanks, Byungchul ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-09 5:58 ` Byungchul Park 2019-07-09 6:45 ` Byungchul Park @ 2019-07-09 12:41 ` Paul E. McKenney 2019-07-10 1:20 ` Byungchul Park 1 sibling, 1 reply; 53+ messages in thread From: Paul E. McKenney @ 2019-07-09 12:41 UTC (permalink / raw) To: Byungchul Park Cc: Joel Fernandes, josh, rostedt, mathieu.desnoyers, jiangshanlai, rcu, linux-kernel, kernel-team On Tue, Jul 09, 2019 at 02:58:16PM +0900, Byungchul Park wrote: > On Mon, Jul 08, 2019 at 09:03:59AM -0400, Joel Fernandes wrote: > > > Actually, the intent was to only allow this to be changed at boot time. > > > Of course, if there is now a good reason to adjust it, it needs > > > to be adjustable. So what situation is making you want to change > > > jiffies_till_sched_qs at runtime? To what values is it proving useful > > > to adjust it? What (if any) relationships between this timeout and the > > > various other RCU timeouts need to be maintained? What changes to > > > rcutorture should be applied in order to test the ability to change > > > this at runtime? > > > > I am also interested in the context, are you changing it at runtime for > > experimentation? I recently was doing some performance experiments and it is > > quite interesting how reducing this value can shorten grace period times :) > > Hi Joel, > > I've read a thread talking about your experiment to see how the grace > periods change depending on the tunnable variables which was interesting > to me. While reading it, I found out jiffies_till_sched_qs is not > tunnable at runtime unlike jiffies_till_{first,next}_fqs which looks > like non-sense to me that's why I tried this patch. :) > > Hi Paul, > > IMHO, as much as we want to tune the time for fqs to be initiated, we > can also want to tune the time for the help from scheduler to start. > I thought only difference between them is a level of urgency. I might be > wrong. It would be appreciated if you let me know if I miss something. 
Hello, Byungchul, I understand that one hypothetically might want to tune this at runtime, but have you had need to tune this at runtime on a real production workload? If so, what problem was happening that caused you to want to do this tuning? > And it's ok even if the patch is turned down based on your criteria. :) If there is a real need, something needs to be provided to meet that need. But in the absence of a real need, past experience has shown that speculative tuning knobs usually do more harm than good. ;-) Hence my question to you about a real need. Thanx, Paul > Thanks, > Byungchul > > > Joel > > > > > > > Thanx, Paul > > > > > > > The function for setting jiffies_to_sched_qs, > > > > adjust_jiffies_till_sched_qs() will be called only if > > > > the value from sysfs != ULONG_MAX. And the value won't be adjusted > > > > unlike first/next fqs jiffies. > > > > > > > > While at it, changed the positions of two module_param()s downward. > > > > > > > > Signed-off-by: Byungchul Park <byungchul.park@lge.com> > > > > --- > > > > kernel/rcu/tree.c | 22 ++++++++++++++++++++-- > > > > 1 file changed, 20 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > index a2f8ba2..a28e2fe 100644 > > > > --- a/kernel/rcu/tree.c > > > > +++ b/kernel/rcu/tree.c > > > > @@ -422,9 +422,7 @@ static int rcu_is_cpu_rrupt_from_idle(void) > > > > * quiescent-state help from rcu_note_context_switch(). > > > > */ > > > > static ulong jiffies_till_sched_qs = ULONG_MAX; > > > > -module_param(jiffies_till_sched_qs, ulong, 0444); > > > > static ulong jiffies_to_sched_qs; /* See adjust_jiffies_till_sched_qs(). */ > > > > -module_param(jiffies_to_sched_qs, ulong, 0444); /* Display only! 
*/ > > > > > > > > /* > > > > * Make sure that we give the grace-period kthread time to detect any > > > > @@ -450,6 +448,18 @@ static void adjust_jiffies_till_sched_qs(void) > > > > WRITE_ONCE(jiffies_to_sched_qs, j); > > > > } > > > > > > > > +static int param_set_sched_qs_jiffies(const char *val, const struct kernel_param *kp) > > > > +{ > > > > + ulong j; > > > > + int ret = kstrtoul(val, 0, &j); > > > > + > > > > + if (!ret && j != ULONG_MAX) { > > > > + WRITE_ONCE(*(ulong *)kp->arg, j); > > > > + adjust_jiffies_till_sched_qs(); > > > > + } > > > > + return ret; > > > > +} > > > > + > > > > static int param_set_first_fqs_jiffies(const char *val, const struct kernel_param *kp) > > > > { > > > > ulong j; > > > > @@ -474,6 +484,11 @@ static int param_set_next_fqs_jiffies(const char *val, const struct kernel_param > > > > return ret; > > > > } > > > > > > > > +static struct kernel_param_ops sched_qs_jiffies_ops = { > > > > + .set = param_set_sched_qs_jiffies, > > > > + .get = param_get_ulong, > > > > +}; > > > > + > > > > static struct kernel_param_ops first_fqs_jiffies_ops = { > > > > .set = param_set_first_fqs_jiffies, > > > > .get = param_get_ulong, > > > > @@ -484,8 +499,11 @@ static int param_set_next_fqs_jiffies(const char *val, const struct kernel_param > > > > .get = param_get_ulong, > > > > }; > > > > > > > > +module_param_cb(jiffies_till_sched_qs, &sched_qs_jiffies_ops, &jiffies_till_sched_qs, 0644); > > > > module_param_cb(jiffies_till_first_fqs, &first_fqs_jiffies_ops, &jiffies_till_first_fqs, 0644); > > > > module_param_cb(jiffies_till_next_fqs, &next_fqs_jiffies_ops, &jiffies_till_next_fqs, 0644); > > > > + > > > > +module_param(jiffies_to_sched_qs, ulong, 0444); /* Display only! */ > > > > module_param(rcu_kick_kthreads, bool, 0644); > > > > > > > > static void force_qs_rnp(int (*f)(struct rcu_data *rdp)); > > > > -- > > > > 1.9.1 > > > > > > > > ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-09 12:41 ` Paul E. McKenney @ 2019-07-10 1:20 ` Byungchul Park 2019-07-11 12:30 ` Paul E. McKenney 0 siblings, 1 reply; 53+ messages in thread From: Byungchul Park @ 2019-07-10 1:20 UTC (permalink / raw) To: Paul E. McKenney Cc: Joel Fernandes, josh, rostedt, mathieu.desnoyers, jiangshanlai, rcu, linux-kernel, kernel-team On Tue, Jul 09, 2019 at 05:41:02AM -0700, Paul E. McKenney wrote: > > Hi Paul, > > > > IMHO, as much as we want to tune the time for fqs to be initiated, we > > can also want to tune the time for the help from scheduler to start. > > I thought only difference between them is a level of urgency. I might be > > wrong. It would be appreciated if you let me know if I miss something. > > Hello, Byungchul, > > I understand that one hypothetically might want to tune this at runtime, > but have you had need to tune this at runtime on a real production > workload? If so, what problem was happening that caused you to want to > do this tuning? Not actually. > > > And it's ok even if the patch is turned down based on your criteria. :) > > If there is a real need, something needs to be provided to meet that > need. But in the absence of a real need, past experience has shown > that speculative tuning knobs usually do more harm than good. ;-) It makes sense: "speculative tuning knobs usually do more harm than good". Then, it would be better to leave jiffies_till_{first,next}_fqs tunable but not jiffies_till_sched_qs until we need it. However, (1) In case that jiffies_till_sched_qs is tunable: We might need all of jiffies_till_{first,next}_fqs, jiffies_till_sched_qs and jiffies_to_sched_qs because jiffies_to_sched_qs can be affected by any of them. So we should be able to read each value at any time. 
(2) In case that jiffies_till_sched_qs is not tunable: I think we don't have to keep jiffies_till_sched_qs any longer, since it is only for setting jiffies_to_sched_qs at *booting time*, which can be done with jiffies_to_sched_qs too. It's meaningless to keep all three variables. The fewer knobs we have beyond what we really need, the better. What do you think about it? In the following patch, I (1) removed jiffies_till_sched_qs and then (2) renamed jiffies_*to*_sched_qs to jiffies_*till*_sched_qs because I think jiffies_till_sched_qs is a much better name for the purpose. I will resend it with a commit msg after knowing your opinion on it. Thanks, Byungchul ---8<--- diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index e72c184..94b58f5 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3792,10 +3792,6 @@ a value based on the most recent settings of rcutree.jiffies_till_first_fqs and rcutree.jiffies_till_next_fqs. - This calculated value may be viewed in - rcutree.jiffies_to_sched_qs. Any attempt to set - rcutree.jiffies_to_sched_qs will be cheerfully - overwritten. rcutree.kthread_prio= [KNL,BOOT] Set the SCHED_FIFO priority of the RCU per-CPU diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index a2f8ba2..ad9dc86 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -421,10 +421,8 @@ static int rcu_is_cpu_rrupt_from_idle(void) * How long the grace period must be before we start recruiting * quiescent-state help from rcu_note_context_switch(). */ -static ulong jiffies_till_sched_qs = ULONG_MAX; +static ulong jiffies_till_sched_qs = ULONG_MAX; /* See adjust_jiffies_till_sched_qs(). */ module_param(jiffies_till_sched_qs, ulong, 0444); -static ulong jiffies_to_sched_qs; /* See adjust_jiffies_till_sched_qs(). */ -module_param(jiffies_to_sched_qs, ulong, 0444); /* Display only! 
*/ /* * Make sure that we give the grace-period kthread time to detect any @@ -436,18 +434,13 @@ static void adjust_jiffies_till_sched_qs(void) { unsigned long j; - /* If jiffies_till_sched_qs was specified, respect the request. */ - if (jiffies_till_sched_qs != ULONG_MAX) { - WRITE_ONCE(jiffies_to_sched_qs, jiffies_till_sched_qs); - return; - } /* Otherwise, set to third fqs scan, but bound below on large system. */ j = READ_ONCE(jiffies_till_first_fqs) + 2 * READ_ONCE(jiffies_till_next_fqs); if (j < HZ / 10 + nr_cpu_ids / RCU_JIFFIES_FQS_DIV) j = HZ / 10 + nr_cpu_ids / RCU_JIFFIES_FQS_DIV; pr_info("RCU calculated value of scheduler-enlistment delay is %ld jiffies.\n", j); - WRITE_ONCE(jiffies_to_sched_qs, j); + WRITE_ONCE(jiffies_till_sched_qs, j); } static int param_set_first_fqs_jiffies(const char *val, const struct kernel_param *kp) @@ -1033,16 +1026,16 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp) /* * A CPU running for an extended time within the kernel can - * delay RCU grace periods: (1) At age jiffies_to_sched_qs, - * set .rcu_urgent_qs, (2) At age 2*jiffies_to_sched_qs, set + * delay RCU grace periods: (1) At age jiffies_till_sched_qs, + * set .rcu_urgent_qs, (2) At age 2*jiffies_till_sched_qs, set * both .rcu_need_heavy_qs and .rcu_urgent_qs. Note that the * unsynchronized assignments to the per-CPU rcu_need_heavy_qs * variable are safe because the assignments are repeated if this * CPU failed to pass through a quiescent state. This code - * also checks .jiffies_resched in case jiffies_to_sched_qs + * also checks .jiffies_resched in case jiffies_till_sched_qs * is set way high. 
*/ - jtsq = READ_ONCE(jiffies_to_sched_qs); + jtsq = READ_ONCE(jiffies_till_sched_qs); ruqp = per_cpu_ptr(&rcu_data.rcu_urgent_qs, rdp->cpu); rnhqp = &per_cpu(rcu_data.rcu_need_heavy_qs, rdp->cpu); if (!READ_ONCE(*rnhqp) && @@ -3383,7 +3376,8 @@ static void __init rcu_init_geometry(void) jiffies_till_first_fqs = d; if (jiffies_till_next_fqs == ULONG_MAX) jiffies_till_next_fqs = d; - adjust_jiffies_till_sched_qs(); + if (jiffies_till_sched_qs == ULONG_MAX) + adjust_jiffies_till_sched_qs(); /* If the compile-time values are accurate, just leave. */ if (rcu_fanout_leaf == RCU_FANOUT_LEAF && ^ permalink raw reply related [flat|nested] 53+ messages in thread
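The fallback that the patch above leaves in adjust_jiffies_till_sched_qs(), the first-fqs delay plus twice the next-fqs delay, bounded below on large systems, is easy to check numerically with a userspace C sketch. HZ and RCU_JIFFIES_FQS_DIV are build-time kernel constants; the values below (HZ=1000, RCU_JIFFIES_FQS_DIV=256) are illustrative assumptions, and calc_jiffies_to_sched_qs() is a hypothetical name for the calculation only.

```c
/* Mirrors the "third fqs scan, but bound below on large system"
 * calculation quoted above. */
#define HZ 1000				/* assumed; kernel-config dependent */
#define RCU_JIFFIES_FQS_DIV 256		/* assumed to match kernel/rcu/rcu.h */

static unsigned long calc_jiffies_to_sched_qs(unsigned long till_first_fqs,
					      unsigned long till_next_fqs,
					      unsigned long nr_cpu_ids)
{
	/* Time for roughly three fqs scans: one first-fqs wait plus
	 * two next-fqs waits. */
	unsigned long j = till_first_fqs + 2 * till_next_fqs;
	/* Lower bound: a tenth of a second, plus extra per 256 CPUs so
	 * large systems get enough time per scan. */
	unsigned long floor = HZ / 10 + nr_cpu_ids / RCU_JIFFIES_FQS_DIV;

	return j < floor ? floor : j;
}
```

With small fqs delays the floor dominates, which is why the discussion keeps returning to how these knobs interact: tuning jiffies_till_{first,next}_fqs only moves jiffies_to_sched_qs once their sum exceeds the floor.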
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-10 1:20 ` Byungchul Park @ 2019-07-11 12:30 ` Paul E. McKenney 2019-07-11 13:08 ` Joel Fernandes 2019-07-12 5:48 ` Byungchul Park 0 siblings, 2 replies; 53+ messages in thread From: Paul E. McKenney @ 2019-07-11 12:30 UTC (permalink / raw) To: Byungchul Park Cc: Joel Fernandes, josh, rostedt, mathieu.desnoyers, jiangshanlai, rcu, linux-kernel, kernel-team On Wed, Jul 10, 2019 at 10:20:25AM +0900, Byungchul Park wrote: > On Tue, Jul 09, 2019 at 05:41:02AM -0700, Paul E. McKenney wrote: > > > Hi Paul, > > > > > > IMHO, as much as we want to tune the time for fqs to be initiated, we > > > can also want to tune the time for the help from scheduler to start. > > > I thought only difference between them is a level of urgency. I might be > > > wrong. It would be appreciated if you let me know if I miss something. > > > > Hello, Byungchul, > > > > I understand that one hypothetically might want to tune this at runtime, > > but have you had need to tune this at runtime on a real production > > workload? If so, what problem was happening that caused you to want to > > do this tuning? > > Not actually. > > > > And it's ok even if the patch is turned down based on your criteria. :) > > > > If there is a real need, something needs to be provided to meet that > > need. But in the absence of a real need, past experience has shown > > that speculative tuning knobs usually do more harm than good. ;-) > > It makes sense, "A speculative tuning knobs do more harm than good". > > Then, it would be better to leave jiffies_till_{first,next}_fqs tunnable > but jiffies_till_sched_qs until we need it. > > However, > > (1) In case that jiffies_till_sched_qs is tunnable: > > We might need all of jiffies_till_{first,next}_qs, > jiffies_till_sched_qs and jiffies_to_sched_qs because > jiffies_to_sched_qs can be affected by any of them. So we > should be able to read each value at any time. 
> > (2) In case that jiffies_till_sched_qs is not tunnable: > > I think we don't have to keep the jiffies_till_sched_qs any > longer since that's only for setting jiffies_to_sched_qs at > *booting time*, which can be done with jiffies_to_sched_qs too. > It's meaningless to keep all of tree variables. > > The simpler and less knobs that we really need we have, the better. > > what do you think about it? > > In the following patch, I (1) removed jiffies_till_sched_qs and then > (2) renamed jiffies_*to*_sched_qs to jiffies_*till*_sched_qs because I > think jiffies_till_sched_qs is a much better name for the purpose. I > will resend it with a commit msg after knowing your opinion on it. I will give you a definite "maybe". Here are the two reasons for changing RCU's embarrassingly large array of tuning parameters: 1. They are causing a problem in production. This would represent a bug that clearly must be fixed. As you say, this change is not in this category. 2. The change simplifies either RCU's code or the process of tuning RCU, but without degrading RCU's ability to run everywhere and without removing debugging tools. The change below clearly simplifies things by removing a few lines of code, and it does not change RCU's default self-configuration. But are we sure about the debugging aspect? (Please keep in mind that many more sites are willing to change boot parameters than are willing to patch their kernels.) What do you think? Finally, I urge you to join with Joel Fernandes and go through these grace-period-duration tuning parameters. Once you guys get your heads completely around all of them and how they interact across the different possible RCU configurations, I bet that the two of you will have excellent ideas for improvement. 
Thanx, Paul > Thanks, > Byungchul > > ---8<--- > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > index e72c184..94b58f5 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -3792,10 +3792,6 @@ > a value based on the most recent settings > of rcutree.jiffies_till_first_fqs > and rcutree.jiffies_till_next_fqs. > - This calculated value may be viewed in > - rcutree.jiffies_to_sched_qs. Any attempt to set > - rcutree.jiffies_to_sched_qs will be cheerfully > - overwritten. > > rcutree.kthread_prio= [KNL,BOOT] > Set the SCHED_FIFO priority of the RCU per-CPU > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index a2f8ba2..ad9dc86 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -421,10 +421,8 @@ static int rcu_is_cpu_rrupt_from_idle(void) > * How long the grace period must be before we start recruiting > * quiescent-state help from rcu_note_context_switch(). > */ > -static ulong jiffies_till_sched_qs = ULONG_MAX; > +static ulong jiffies_till_sched_qs = ULONG_MAX; /* See adjust_jiffies_till_sched_qs(). */ > module_param(jiffies_till_sched_qs, ulong, 0444); > -static ulong jiffies_to_sched_qs; /* See adjust_jiffies_till_sched_qs(). */ > -module_param(jiffies_to_sched_qs, ulong, 0444); /* Display only! */ > > /* > * Make sure that we give the grace-period kthread time to detect any > @@ -436,18 +434,13 @@ static void adjust_jiffies_till_sched_qs(void) > { > unsigned long j; > > - /* If jiffies_till_sched_qs was specified, respect the request. */ > - if (jiffies_till_sched_qs != ULONG_MAX) { > - WRITE_ONCE(jiffies_to_sched_qs, jiffies_till_sched_qs); > - return; > - } > /* Otherwise, set to third fqs scan, but bound below on large system. 
*/ > j = READ_ONCE(jiffies_till_first_fqs) + > 2 * READ_ONCE(jiffies_till_next_fqs); > if (j < HZ / 10 + nr_cpu_ids / RCU_JIFFIES_FQS_DIV) > j = HZ / 10 + nr_cpu_ids / RCU_JIFFIES_FQS_DIV; > pr_info("RCU calculated value of scheduler-enlistment delay is %ld jiffies.\n", j); > - WRITE_ONCE(jiffies_to_sched_qs, j); > + WRITE_ONCE(jiffies_till_sched_qs, j); > } > > static int param_set_first_fqs_jiffies(const char *val, const struct kernel_param *kp) > @@ -1033,16 +1026,16 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp) > > /* > * A CPU running for an extended time within the kernel can > - * delay RCU grace periods: (1) At age jiffies_to_sched_qs, > - * set .rcu_urgent_qs, (2) At age 2*jiffies_to_sched_qs, set > + * delay RCU grace periods: (1) At age jiffies_till_sched_qs, > + * set .rcu_urgent_qs, (2) At age 2*jiffies_till_sched_qs, set > * both .rcu_need_heavy_qs and .rcu_urgent_qs. Note that the > * unsynchronized assignments to the per-CPU rcu_need_heavy_qs > * variable are safe because the assignments are repeated if this > * CPU failed to pass through a quiescent state. This code > - * also checks .jiffies_resched in case jiffies_to_sched_qs > + * also checks .jiffies_resched in case jiffies_till_sched_qs > * is set way high. > */ > - jtsq = READ_ONCE(jiffies_to_sched_qs); > + jtsq = READ_ONCE(jiffies_till_sched_qs); > ruqp = per_cpu_ptr(&rcu_data.rcu_urgent_qs, rdp->cpu); > rnhqp = &per_cpu(rcu_data.rcu_need_heavy_qs, rdp->cpu); > if (!READ_ONCE(*rnhqp) && > @@ -3383,7 +3376,8 @@ static void __init rcu_init_geometry(void) > jiffies_till_first_fqs = d; > if (jiffies_till_next_fqs == ULONG_MAX) > jiffies_till_next_fqs = d; > - adjust_jiffies_till_sched_qs(); > + if (jiffies_till_sched_qs == ULONG_MAX) > + adjust_jiffies_till_sched_qs(); > > /* If the compile-time values are accurate, just leave. */ > if (rcu_fanout_leaf == RCU_FANOUT_LEAF && ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-11 12:30 ` Paul E. McKenney @ 2019-07-11 13:08 ` Joel Fernandes 2019-07-11 15:02 ` Paul E. McKenney 2019-07-12 5:52 ` Byungchul Park 2019-07-12 5:48 ` Byungchul Park 1 sibling, 2 replies; 53+ messages in thread From: Joel Fernandes @ 2019-07-11 13:08 UTC (permalink / raw) To: Paul E. McKenney Cc: Byungchul Park, josh, rostedt, mathieu.desnoyers, jiangshanlai, rcu, linux-kernel, kernel-team On Thu, Jul 11, 2019 at 05:30:52AM -0700, Paul E. McKenney wrote: > On Wed, Jul 10, 2019 at 10:20:25AM +0900, Byungchul Park wrote: > > On Tue, Jul 09, 2019 at 05:41:02AM -0700, Paul E. McKenney wrote: > > > > Hi Paul, > > > > > > > > IMHO, as much as we want to tune the time for fqs to be initiated, we > > > > can also want to tune the time for the help from scheduler to start. > > > > I thought only difference between them is a level of urgency. I might be > > > > wrong. It would be appreciated if you let me know if I miss something. > > > > > > Hello, Byungchul, > > > > > > I understand that one hypothetically might want to tune this at runtime, > > > but have you had need to tune this at runtime on a real production > > > workload? If so, what problem was happening that caused you to want to > > > do this tuning? > > > > Not actually. > > > > > > And it's ok even if the patch is turned down based on your criteria. :) > > > > > > If there is a real need, something needs to be provided to meet that > > > need. But in the absence of a real need, past experience has shown > > > that speculative tuning knobs usually do more harm than good. ;-) > > > > It makes sense, "A speculative tuning knobs do more harm than good". > > > > Then, it would be better to leave jiffies_till_{first,next}_fqs tunnable > > but jiffies_till_sched_qs until we need it. 
> > > > However, > > > > (1) In case that jiffies_till_sched_qs is tunnable: > > > > We might need all of jiffies_till_{first,next}_qs, > > jiffies_till_sched_qs and jiffies_to_sched_qs because > > jiffies_to_sched_qs can be affected by any of them. So we > > should be able to read each value at any time. > > > > (2) In case that jiffies_till_sched_qs is not tunnable: > > > > I think we don't have to keep the jiffies_till_sched_qs any > > longer since that's only for setting jiffies_to_sched_qs at > > *booting time*, which can be done with jiffies_to_sched_qs too. > > It's meaningless to keep all of tree variables. > > > > The simpler and less knobs that we really need we have, the better. > > > > what do you think about it? > > > > In the following patch, I (1) removed jiffies_till_sched_qs and then > > (2) renamed jiffies_*to*_sched_qs to jiffies_*till*_sched_qs because I > > think jiffies_till_sched_qs is a much better name for the purpose. I > > will resend it with a commit msg after knowing your opinion on it. > > I will give you a definite "maybe". > > Here are the two reasons for changing RCU's embarrassingly large array > of tuning parameters: > > 1. They are causing a problem in production. This would represent a > bug that clearly must be fixed. As you say, this change is not > in this category. > > 2. The change simplifies either RCU's code or the process of tuning > RCU, but without degrading RCU's ability to run everywhere and > without removing debugging tools. > > The change below clearly simplifies things by removing a few lines of > code, and it does not change RCU's default self-configuration. But are > we sure about the debugging aspect? (Please keep in mind that many more > sites are willing to change boot parameters than are willing to patch > their kernels.) > > What do you think? Just to add that, independent of whether the runtime tunable makes sense or not, maybe it is still worth correcting the 0444 to 0644 in a separate patch? 
> Finally, I urge you to join with Joel Fernandes and go through these > grace-period-duration tuning parameters. Once you guys get your heads > completely around all of them and how they interact across the different > possible RCU configurations, I bet that the two of you will have excellent > ideas for improvement. Yes, I am quite happy to join forces. Byungchul, let me know your thoughts on this or other things you had in mind. I have some other RCU topics too that I am trying to get my head around, and I am planning to work on more patches. Paul, in case you had any other specific tunables or experiments in mind, let me know. I am quite happy to try out new experiments and learn something based on tuning something. thanks, - Joel > > Thanks, > > Byungchul > > > > ---8<--- > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > > index e72c184..94b58f5 100644 > > --- a/Documentation/admin-guide/kernel-parameters.txt > > +++ b/Documentation/admin-guide/kernel-parameters.txt > > @@ -3792,10 +3792,6 @@ > > a value based on the most recent settings > > of rcutree.jiffies_till_first_fqs > > and rcutree.jiffies_till_next_fqs. > > - This calculated value may be viewed in > > - rcutree.jiffies_to_sched_qs. Any attempt to set > > - rcutree.jiffies_to_sched_qs will be cheerfully > > - overwritten. > > > > rcutree.kthread_prio= [KNL,BOOT] > > Set the SCHED_FIFO priority of the RCU per-CPU > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > index a2f8ba2..ad9dc86 100644 > > --- a/kernel/rcu/tree.c > > +++ b/kernel/rcu/tree.c > > @@ -421,10 +421,8 @@ static int rcu_is_cpu_rrupt_from_idle(void) > > * How long the grace period must be before we start recruiting > > * quiescent-state help from rcu_note_context_switch(). > > */ > > -static ulong jiffies_till_sched_qs = ULONG_MAX; > > +static ulong jiffies_till_sched_qs = ULONG_MAX; /* See adjust_jiffies_till_sched_qs(). 
*/ > > module_param(jiffies_till_sched_qs, ulong, 0444); > > -static ulong jiffies_to_sched_qs; /* See adjust_jiffies_till_sched_qs(). */ > > -module_param(jiffies_to_sched_qs, ulong, 0444); /* Display only! */ > > > > /* > > * Make sure that we give the grace-period kthread time to detect any > > @@ -436,18 +434,13 @@ static void adjust_jiffies_till_sched_qs(void) > > { > > unsigned long j; > > > > - /* If jiffies_till_sched_qs was specified, respect the request. */ > > - if (jiffies_till_sched_qs != ULONG_MAX) { > > - WRITE_ONCE(jiffies_to_sched_qs, jiffies_till_sched_qs); > > - return; > > - } > > /* Otherwise, set to third fqs scan, but bound below on large system. */ > > j = READ_ONCE(jiffies_till_first_fqs) + > > 2 * READ_ONCE(jiffies_till_next_fqs); > > if (j < HZ / 10 + nr_cpu_ids / RCU_JIFFIES_FQS_DIV) > > j = HZ / 10 + nr_cpu_ids / RCU_JIFFIES_FQS_DIV; > > pr_info("RCU calculated value of scheduler-enlistment delay is %ld jiffies.\n", j); > > - WRITE_ONCE(jiffies_to_sched_qs, j); > > + WRITE_ONCE(jiffies_till_sched_qs, j); > > } > > > > static int param_set_first_fqs_jiffies(const char *val, const struct kernel_param *kp) > > @@ -1033,16 +1026,16 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp) > > > > /* > > * A CPU running for an extended time within the kernel can > > - * delay RCU grace periods: (1) At age jiffies_to_sched_qs, > > - * set .rcu_urgent_qs, (2) At age 2*jiffies_to_sched_qs, set > > + * delay RCU grace periods: (1) At age jiffies_till_sched_qs, > > + * set .rcu_urgent_qs, (2) At age 2*jiffies_till_sched_qs, set > > * both .rcu_need_heavy_qs and .rcu_urgent_qs. Note that the > > * unsynchronized assignments to the per-CPU rcu_need_heavy_qs > > * variable are safe because the assignments are repeated if this > > * CPU failed to pass through a quiescent state. This code > > - * also checks .jiffies_resched in case jiffies_to_sched_qs > > + * also checks .jiffies_resched in case jiffies_till_sched_qs > > * is set way high. 
> > */ > > - jtsq = READ_ONCE(jiffies_to_sched_qs); > > + jtsq = READ_ONCE(jiffies_till_sched_qs); > > ruqp = per_cpu_ptr(&rcu_data.rcu_urgent_qs, rdp->cpu); > > rnhqp = &per_cpu(rcu_data.rcu_need_heavy_qs, rdp->cpu); > > if (!READ_ONCE(*rnhqp) && > > @@ -3383,7 +3376,8 @@ static void __init rcu_init_geometry(void) > > jiffies_till_first_fqs = d; > > if (jiffies_till_next_fqs == ULONG_MAX) > > jiffies_till_next_fqs = d; > > - adjust_jiffies_till_sched_qs(); > > + if (jiffies_till_sched_qs == ULONG_MAX) > > + adjust_jiffies_till_sched_qs(); > > > > /* If the compile-time values are accurate, just leave. */ > > if (rcu_fanout_leaf == RCU_FANOUT_LEAF && > ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-11 13:08 ` Joel Fernandes @ 2019-07-11 15:02 ` Paul E. McKenney 2019-07-11 16:48 ` Joel Fernandes 2019-07-12 6:00 ` Byungchul Park 2019-07-12 5:52 ` Byungchul Park 1 sibling, 2 replies; 53+ messages in thread From: Paul E. McKenney @ 2019-07-11 15:02 UTC (permalink / raw) To: Joel Fernandes Cc: Byungchul Park, josh, rostedt, mathieu.desnoyers, jiangshanlai, rcu, linux-kernel, kernel-team On Thu, Jul 11, 2019 at 09:08:49AM -0400, Joel Fernandes wrote: > On Thu, Jul 11, 2019 at 05:30:52AM -0700, Paul E. McKenney wrote: > > On Wed, Jul 10, 2019 at 10:20:25AM +0900, Byungchul Park wrote: > > > On Tue, Jul 09, 2019 at 05:41:02AM -0700, Paul E. McKenney wrote: > > > > > Hi Paul, > > > > > > > > > > IMHO, as much as we want to tune the time for fqs to be initiated, we > > > > > can also want to tune the time for the help from scheduler to start. > > > > > I thought only difference between them is a level of urgency. I might be > > > > > wrong. It would be appreciated if you let me know if I miss something. > > > > > > > > Hello, Byungchul, > > > > > > > > I understand that one hypothetically might want to tune this at runtime, > > > > but have you had need to tune this at runtime on a real production > > > > workload? If so, what problem was happening that caused you to want to > > > > do this tuning? > > > > > > Not actually. > > > > > > > > And it's ok even if the patch is turned down based on your criteria. :) > > > > > > > > If there is a real need, something needs to be provided to meet that > > > > need. But in the absence of a real need, past experience has shown > > > > that speculative tuning knobs usually do more harm than good. ;-) > > > > > > It makes sense, "A speculative tuning knobs do more harm than good". > > > > > > Then, it would be better to leave jiffies_till_{first,next}_fqs tunnable > > > but jiffies_till_sched_qs until we need it. 
> > > > > > However, > > > > > > (1) In case that jiffies_till_sched_qs is tunnable: > > > > > > We might need all of jiffies_till_{first,next}_qs, > > > jiffies_till_sched_qs and jiffies_to_sched_qs because > > > jiffies_to_sched_qs can be affected by any of them. So we > > > should be able to read each value at any time. > > > > > > (2) In case that jiffies_till_sched_qs is not tunnable: > > > > > > I think we don't have to keep the jiffies_till_sched_qs any > > > longer since that's only for setting jiffies_to_sched_qs at > > > *booting time*, which can be done with jiffies_to_sched_qs too. > > > It's meaningless to keep all of tree variables. > > > > > > The simpler and less knobs that we really need we have, the better. > > > > > > what do you think about it? > > > > > > In the following patch, I (1) removed jiffies_till_sched_qs and then > > > (2) renamed jiffies_*to*_sched_qs to jiffies_*till*_sched_qs because I > > > think jiffies_till_sched_qs is a much better name for the purpose. I > > > will resend it with a commit msg after knowing your opinion on it. > > > > I will give you a definite "maybe". > > > > Here are the two reasons for changing RCU's embarrassingly large array > > of tuning parameters: > > > > 1. They are causing a problem in production. This would represent a > > bug that clearly must be fixed. As you say, this change is not > > in this category. > > > > 2. The change simplifies either RCU's code or the process of tuning > > RCU, but without degrading RCU's ability to run everywhere and > > without removing debugging tools. > > > > The change below clearly simplifies things by removing a few lines of > > code, and it does not change RCU's default self-configuration. But are > > we sure about the debugging aspect? (Please keep in mind that many more > > sites are willing to change boot parameters than are willing to patch > > their kernels.) > > > > What do you think? 
>
> Just to add that, independent of whether the runtime tunable makes sense or
> not, maybe it is still worth correcting the 0444 to be 0644 in a separate
> patch?

You lost me on this one. Doesn't changing from 0444 to 0644 make it be
a runtime tunable?

> > Finally, I urge you to join with Joel Fernandes and go through these
> > grace-period-duration tuning parameters. Once you guys get your heads
> > completely around all of them and how they interact across the different
> > possible RCU configurations, I bet that the two of you will have excellent
> > ideas for improvement.
>
> Yes, I am quite happy to join forces. Byungchul, let me know about this
> or other things you had in mind. I have some other RCU topics too I am trying
> to get my head around and planning to work on more patches.
>
> Paul, in case you had any other specific tunables or experiments in mind, let
> me know. I am quite happy to try out new experiments and learn something
> based on tuning something.

These would be the tunables controlling how quickly RCU takes its
various actions to encourage the current grace period to end quickly.
I would be happy to give you the exact list if you wish, but most of
them have appeared in this thread.

The experiments should be designed to work out whether the current
default settings have configurations where they act badly. This might
also come up with advice for people attempting hand-tuning, or proposed
parameter-checking code to avoid bad combinations.

For one example, setting the RCU CPU stall timeout too low will definitely
cause some unwanted splats. (Yes, one could argue that other things in
the kernel should change to allow this value to decrease, but things
like latency tracer and friends are probably more useful and important.)
Thanx, Paul > thanks, > > - Joel > > > > > > Thanks, > > > Byungchul > > > > > > ---8<--- > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > > > index e72c184..94b58f5 100644 > > > --- a/Documentation/admin-guide/kernel-parameters.txt > > > +++ b/Documentation/admin-guide/kernel-parameters.txt > > > @@ -3792,10 +3792,6 @@ > > > a value based on the most recent settings > > > of rcutree.jiffies_till_first_fqs > > > and rcutree.jiffies_till_next_fqs. > > > - This calculated value may be viewed in > > > - rcutree.jiffies_to_sched_qs. Any attempt to set > > > - rcutree.jiffies_to_sched_qs will be cheerfully > > > - overwritten. > > > > > > rcutree.kthread_prio= [KNL,BOOT] > > > Set the SCHED_FIFO priority of the RCU per-CPU > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > index a2f8ba2..ad9dc86 100644 > > > --- a/kernel/rcu/tree.c > > > +++ b/kernel/rcu/tree.c > > > @@ -421,10 +421,8 @@ static int rcu_is_cpu_rrupt_from_idle(void) > > > * How long the grace period must be before we start recruiting > > > * quiescent-state help from rcu_note_context_switch(). > > > */ > > > -static ulong jiffies_till_sched_qs = ULONG_MAX; > > > +static ulong jiffies_till_sched_qs = ULONG_MAX; /* See adjust_jiffies_till_sched_qs(). */ > > > module_param(jiffies_till_sched_qs, ulong, 0444); > > > -static ulong jiffies_to_sched_qs; /* See adjust_jiffies_till_sched_qs(). */ > > > -module_param(jiffies_to_sched_qs, ulong, 0444); /* Display only! */ > > > > > > /* > > > * Make sure that we give the grace-period kthread time to detect any > > > @@ -436,18 +434,13 @@ static void adjust_jiffies_till_sched_qs(void) > > > { > > > unsigned long j; > > > > > > - /* If jiffies_till_sched_qs was specified, respect the request. 
*/ > > > - if (jiffies_till_sched_qs != ULONG_MAX) { > > > - WRITE_ONCE(jiffies_to_sched_qs, jiffies_till_sched_qs); > > > - return; > > > - } > > > /* Otherwise, set to third fqs scan, but bound below on large system. */ > > > j = READ_ONCE(jiffies_till_first_fqs) + > > > 2 * READ_ONCE(jiffies_till_next_fqs); > > > if (j < HZ / 10 + nr_cpu_ids / RCU_JIFFIES_FQS_DIV) > > > j = HZ / 10 + nr_cpu_ids / RCU_JIFFIES_FQS_DIV; > > > pr_info("RCU calculated value of scheduler-enlistment delay is %ld jiffies.\n", j); > > > - WRITE_ONCE(jiffies_to_sched_qs, j); > > > + WRITE_ONCE(jiffies_till_sched_qs, j); > > > } > > > > > > static int param_set_first_fqs_jiffies(const char *val, const struct kernel_param *kp) > > > @@ -1033,16 +1026,16 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp) > > > > > > /* > > > * A CPU running for an extended time within the kernel can > > > - * delay RCU grace periods: (1) At age jiffies_to_sched_qs, > > > - * set .rcu_urgent_qs, (2) At age 2*jiffies_to_sched_qs, set > > > + * delay RCU grace periods: (1) At age jiffies_till_sched_qs, > > > + * set .rcu_urgent_qs, (2) At age 2*jiffies_till_sched_qs, set > > > * both .rcu_need_heavy_qs and .rcu_urgent_qs. Note that the > > > * unsynchronized assignments to the per-CPU rcu_need_heavy_qs > > > * variable are safe because the assignments are repeated if this > > > * CPU failed to pass through a quiescent state. This code > > > - * also checks .jiffies_resched in case jiffies_to_sched_qs > > > + * also checks .jiffies_resched in case jiffies_till_sched_qs > > > * is set way high. 
> > > */ > > > - jtsq = READ_ONCE(jiffies_to_sched_qs); > > > + jtsq = READ_ONCE(jiffies_till_sched_qs); > > > ruqp = per_cpu_ptr(&rcu_data.rcu_urgent_qs, rdp->cpu); > > > rnhqp = &per_cpu(rcu_data.rcu_need_heavy_qs, rdp->cpu); > > > if (!READ_ONCE(*rnhqp) && > > > @@ -3383,7 +3376,8 @@ static void __init rcu_init_geometry(void) > > > jiffies_till_first_fqs = d; > > > if (jiffies_till_next_fqs == ULONG_MAX) > > > jiffies_till_next_fqs = d; > > > - adjust_jiffies_till_sched_qs(); > > > + if (jiffies_till_sched_qs == ULONG_MAX) > > > + adjust_jiffies_till_sched_qs(); > > > > > > /* If the compile-time values are accurate, just leave. */ > > > if (rcu_fanout_leaf == RCU_FANOUT_LEAF && > > ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-11 15:02 ` Paul E. McKenney @ 2019-07-11 16:48 ` Joel Fernandes 2019-07-11 19:58 ` Joel Fernandes 2019-07-12 6:00 ` Byungchul Park 1 sibling, 1 reply; 53+ messages in thread From: Joel Fernandes @ 2019-07-11 16:48 UTC (permalink / raw) To: Paul E. McKenney Cc: Byungchul Park, josh, rostedt, mathieu.desnoyers, jiangshanlai, rcu, linux-kernel, kernel-team On Thu, Jul 11, 2019 at 08:02:15AM -0700, Paul E. McKenney wrote: > On Thu, Jul 11, 2019 at 09:08:49AM -0400, Joel Fernandes wrote: > > On Thu, Jul 11, 2019 at 05:30:52AM -0700, Paul E. McKenney wrote: > > > On Wed, Jul 10, 2019 at 10:20:25AM +0900, Byungchul Park wrote: > > > > On Tue, Jul 09, 2019 at 05:41:02AM -0700, Paul E. McKenney wrote: > > > > > > Hi Paul, > > > > > > > > > > > > IMHO, as much as we want to tune the time for fqs to be initiated, we > > > > > > can also want to tune the time for the help from scheduler to start. > > > > > > I thought only difference between them is a level of urgency. I might be > > > > > > wrong. It would be appreciated if you let me know if I miss something. > > > > > > > > > > Hello, Byungchul, > > > > > > > > > > I understand that one hypothetically might want to tune this at runtime, > > > > > but have you had need to tune this at runtime on a real production > > > > > workload? If so, what problem was happening that caused you to want to > > > > > do this tuning? > > > > > > > > Not actually. > > > > > > > > > > And it's ok even if the patch is turned down based on your criteria. :) > > > > > > > > > > If there is a real need, something needs to be provided to meet that > > > > > need. But in the absence of a real need, past experience has shown > > > > > that speculative tuning knobs usually do more harm than good. ;-) > > > > > > > > It makes sense, "A speculative tuning knobs do more harm than good". 
> > > > > > > > Then, it would be better to leave jiffies_till_{first,next}_fqs tunnable > > > > but jiffies_till_sched_qs until we need it. > > > > > > > > However, > > > > > > > > (1) In case that jiffies_till_sched_qs is tunnable: > > > > > > > > We might need all of jiffies_till_{first,next}_qs, > > > > jiffies_till_sched_qs and jiffies_to_sched_qs because > > > > jiffies_to_sched_qs can be affected by any of them. So we > > > > should be able to read each value at any time. > > > > > > > > (2) In case that jiffies_till_sched_qs is not tunnable: > > > > > > > > I think we don't have to keep the jiffies_till_sched_qs any > > > > longer since that's only for setting jiffies_to_sched_qs at > > > > *booting time*, which can be done with jiffies_to_sched_qs too. > > > > It's meaningless to keep all of tree variables. > > > > > > > > The simpler and less knobs that we really need we have, the better. > > > > > > > > what do you think about it? > > > > > > > > In the following patch, I (1) removed jiffies_till_sched_qs and then > > > > (2) renamed jiffies_*to*_sched_qs to jiffies_*till*_sched_qs because I > > > > think jiffies_till_sched_qs is a much better name for the purpose. I > > > > will resend it with a commit msg after knowing your opinion on it. > > > > > > I will give you a definite "maybe". > > > > > > Here are the two reasons for changing RCU's embarrassingly large array > > > of tuning parameters: > > > > > > 1. They are causing a problem in production. This would represent a > > > bug that clearly must be fixed. As you say, this change is not > > > in this category. > > > > > > 2. The change simplifies either RCU's code or the process of tuning > > > RCU, but without degrading RCU's ability to run everywhere and > > > without removing debugging tools. > > > > > > The change below clearly simplifies things by removing a few lines of > > > code, and it does not change RCU's default self-configuration. But are > > > we sure about the debugging aspect? 
(Please keep in mind that many more > > > sites are willing to change boot parameters than are willing to patch > > > their kernels.) > > > > > > What do you think? > > > > Just to add that independent of whether the runtime tunable make sense or > > not, may be it is still worth correcting the 0444 to be 0644 to be a separate > > patch? > > You lost me on this one. Doesn't changing from 0444 to 0644 make it be > a runtime tunable? I was going by our earlier discussion that the parameter is still writable at boot time. You mentioned something like the following: --- In Byungchul's defense, the current module_param() permissions are 0444, which really is read-only. Although I do agree that they can be written at boot, one could use this same line of reasoning to argue that const variables can be written at compile time (or, for on-stack const variables, at function-invocation time). But we still call them "const". --- Sorry if I got confused. You are right that we could leave it as read-only. > > > Finally, I urge you to join with Joel Fernandes and go through these > > > grace-period-duration tuning parameters. Once you guys get your heads > > > completely around all of them and how they interact across the different > > > possible RCU configurations, I bet that the two of you will have excellent > > > ideas for improvement. > > > > Yes, I am quite happy to join forces. Byungchul, let me know what about this > > or other things you had in mind. I have some other RCU topics too I am trying > > to get my head around and planning to work on more patches. > > > > Paul, in case you had any other specific tunables or experiments in mind, let > > me know. I am quite happy to try out new experiments and learn something > > based on tuning something. > > These would be the tunables controlling how quickly RCU takes its > various actions to encourage the current grace period to end quickly. 
> I would be happy to give you the exact list if you wish, but most of > them have appeared in this thread. > > The experiments should be designed to work out whether the current > default settings have configurations where they act badly. This might > also come up with advice for people attempting hand-tuning, or proposed > parameter-checking code to avoid bad combinations. > > For one example, setting the RCU CPU stall timeout too low will definitely > cause some unwanted splats. (Yes, one could argue that other things in > the kernel should change to allow this value to decrease, but things > like latency tracer and friends are probably more useful and important.) Ok, thank you for the hints. I will try to poke around with these as well. I am currently also thinking of spending my time on RCU with reviving the list API patches: https://lore.kernel.org/patchwork/project/lkml/list/?series=396464 I noticed the occasional splat even with my preempt-disable test. But that was just when trace dumping was interfering which we discussed at length last week. - Joel ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-11 16:48 ` Joel Fernandes @ 2019-07-11 19:58 ` Joel Fernandes 2019-07-12 6:32 ` Byungchul Park 2019-07-12 13:01 ` Paul E. McKenney 0 siblings, 2 replies; 53+ messages in thread From: Joel Fernandes @ 2019-07-11 19:58 UTC (permalink / raw) To: Paul E. McKenney Cc: Byungchul Park, josh, rostedt, mathieu.desnoyers, jiangshanlai, rcu, linux-kernel, kernel-team On Thu, Jul 11, 2019 at 12:48:18PM -0400, Joel Fernandes wrote: > On Thu, Jul 11, 2019 at 08:02:15AM -0700, Paul E. McKenney wrote: > > On Thu, Jul 11, 2019 at 09:08:49AM -0400, Joel Fernandes wrote: > > > On Thu, Jul 11, 2019 at 05:30:52AM -0700, Paul E. McKenney wrote: > > > > On Wed, Jul 10, 2019 at 10:20:25AM +0900, Byungchul Park wrote: > > > > > On Tue, Jul 09, 2019 at 05:41:02AM -0700, Paul E. McKenney wrote: > > > > > > > Hi Paul, > > > > > > > > > > > > > > IMHO, as much as we want to tune the time for fqs to be initiated, we > > > > > > > can also want to tune the time for the help from scheduler to start. > > > > > > > I thought only difference between them is a level of urgency. I might be > > > > > > > wrong. It would be appreciated if you let me know if I miss something. > > > > > > > > > > > > Hello, Byungchul, > > > > > > > > > > > > I understand that one hypothetically might want to tune this at runtime, > > > > > > but have you had need to tune this at runtime on a real production > > > > > > workload? If so, what problem was happening that caused you to want to > > > > > > do this tuning? > > > > > > > > > > Not actually. > > > > > > > > > > > > And it's ok even if the patch is turned down based on your criteria. :) > > > > > > > > > > > > If there is a real need, something needs to be provided to meet that > > > > > > need. But in the absence of a real need, past experience has shown > > > > > > that speculative tuning knobs usually do more harm than good. 
;-) > > > > > > > > > > It makes sense, "A speculative tuning knobs do more harm than good". > > > > > > > > > > Then, it would be better to leave jiffies_till_{first,next}_fqs tunnable > > > > > but jiffies_till_sched_qs until we need it. > > > > > > > > > > However, > > > > > > > > > > (1) In case that jiffies_till_sched_qs is tunnable: > > > > > > > > > > We might need all of jiffies_till_{first,next}_qs, > > > > > jiffies_till_sched_qs and jiffies_to_sched_qs because > > > > > jiffies_to_sched_qs can be affected by any of them. So we > > > > > should be able to read each value at any time. > > > > > > > > > > (2) In case that jiffies_till_sched_qs is not tunnable: > > > > > > > > > > I think we don't have to keep the jiffies_till_sched_qs any > > > > > longer since that's only for setting jiffies_to_sched_qs at > > > > > *booting time*, which can be done with jiffies_to_sched_qs too. > > > > > It's meaningless to keep all of tree variables. > > > > > > > > > > The simpler and less knobs that we really need we have, the better. > > > > > > > > > > what do you think about it? > > > > > > > > > > In the following patch, I (1) removed jiffies_till_sched_qs and then > > > > > (2) renamed jiffies_*to*_sched_qs to jiffies_*till*_sched_qs because I > > > > > think jiffies_till_sched_qs is a much better name for the purpose. I > > > > > will resend it with a commit msg after knowing your opinion on it. > > > > > > > > I will give you a definite "maybe". > > > > > > > > Here are the two reasons for changing RCU's embarrassingly large array > > > > of tuning parameters: > > > > > > > > 1. They are causing a problem in production. This would represent a > > > > bug that clearly must be fixed. As you say, this change is not > > > > in this category. > > > > > > > > 2. The change simplifies either RCU's code or the process of tuning > > > > RCU, but without degrading RCU's ability to run everywhere and > > > > without removing debugging tools. 
> > > > > > > > The change below clearly simplifies things by removing a few lines of > > > > code, and it does not change RCU's default self-configuration. But are > > > > we sure about the debugging aspect? (Please keep in mind that many more > > > > sites are willing to change boot parameters than are willing to patch > > > > their kernels.) > > > > > > > > What do you think? > > > > > > Just to add that independent of whether the runtime tunable make sense or > > > not, may be it is still worth correcting the 0444 to be 0644 to be a separate > > > patch? > > > > You lost me on this one. Doesn't changing from 0444 to 0644 make it be > > a runtime tunable? > > I was going by our earlier discussion that the parameter is still writable at > boot time. You mentioned something like the following: > --- > In Byungchul's defense, the current module_param() permissions are > 0444, which really is read-only. Although I do agree that they can > be written at boot, one could use this same line of reasoning to argue > that const variables can be written at compile time (or, for on-stack > const variables, at function-invocation time). But we still call them > "const". > --- > > Sorry if I got confused. You are right that we could leave it as read-only. > > > > > Finally, I urge you to join with Joel Fernandes and go through these > > > > grace-period-duration tuning parameters. Once you guys get your heads > > > > completely around all of them and how they interact across the different > > > > possible RCU configurations, I bet that the two of you will have excellent > > > > ideas for improvement. > > > > > > Yes, I am quite happy to join forces. Byungchul, let me know what about this > > > or other things you had in mind. I have some other RCU topics too I am trying > > > to get my head around and planning to work on more patches. > > > > > > Paul, in case you had any other specific tunables or experiments in mind, let > > > me know. 
I am quite happy to try out new experiments and learn something
> > > based on tuning something.
> >
> > These would be the tunables controlling how quickly RCU takes its
> > various actions to encourage the current grace period to end quickly.
> > I would be happy to give you the exact list if you wish, but most of
> > them have appeared in this thread.
> >
> > The experiments should be designed to work out whether the current
> > default settings have configurations where they act badly. This might
> > also come up with advice for people attempting hand-tuning, or proposed
> > parameter-checking code to avoid bad combinations.
> >
> > For one example, setting the RCU CPU stall timeout too low will definitely
> > cause some unwanted splats. (Yes, one could argue that other things in
> > the kernel should change to allow this value to decrease, but things
> > like latency tracer and friends are probably more useful and important.)
>
> Ok, thank you for the hints.

Hmm, speaking of grace period durations, it seems to me the maximum grace
period duration ever is recorded in rcu_state.gp_max. However, it is not read
from anywhere.

Any idea why it was added but not used?

I am interested in dumping this value just for fun, and seeing what I get.

I wonder also if it is useful to dump it in rcutorture/rcuperf to find any
issues, or even expose it in sys/proc fs to see what worst-case grace periods
look like.

- Joel

^ permalink raw reply	[flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-11 19:58 ` Joel Fernandes @ 2019-07-12 6:32 ` Byungchul Park 2019-07-12 12:51 ` Joel Fernandes 2019-07-12 13:01 ` Paul E. McKenney 1 sibling, 1 reply; 53+ messages in thread From: Byungchul Park @ 2019-07-12 6:32 UTC (permalink / raw) To: Joel Fernandes Cc: Paul E. McKenney, josh, rostedt, mathieu.desnoyers, jiangshanlai, rcu, linux-kernel, kernel-team

On Thu, Jul 11, 2019 at 03:58:39PM -0400, Joel Fernandes wrote:
> Hmm, speaking of grace period durations, it seems to me the maximum grace
> period duration ever is recorded in rcu_state.gp_max. However, it is not read
> from anywhere.
>
> Any idea why it was added but not used?
>
> I am interested in dumping this value just for fun, and seeing what I get.
>
> I wonder also if it is useful to dump it in rcutorture/rcuperf to find any
> issues, or even expose it in sys/proc fs to see what worst-case grace periods
> look like.

Hi,

commit ae91aa0adb14dc33114d566feca2f7cb7a96b8b7
rcu: Remove debugfs tracing

removed all debugfs tracing, gp_max included.

And that sounds great. And it even looks not that hard to add, like: :)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ad9dc86..86095ff 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1658,8 +1658,10 @@ static void rcu_gp_cleanup(void)
 	raw_spin_lock_irq_rcu_node(rnp);
 	rcu_state.gp_end = jiffies;
 	gp_duration = rcu_state.gp_end - rcu_state.gp_start;
-	if (gp_duration > rcu_state.gp_max)
+	if (gp_duration > rcu_state.gp_max) {
 		rcu_state.gp_max = gp_duration;
+		trace_rcu_grace_period(something something);
+	}

 	/*
 	 * We know the grace period is complete, but to everyone else

Thanks,
Byungchul

^ permalink raw reply related	[flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-12 6:32 ` Byungchul Park @ 2019-07-12 12:51 ` Joel Fernandes 2019-07-12 13:02 ` Paul E. McKenney 2019-07-13 8:47 ` Byungchul Park 0 siblings, 2 replies; 53+ messages in thread From: Joel Fernandes @ 2019-07-12 12:51 UTC (permalink / raw) To: Byungchul Park Cc: Paul E. McKenney, josh, rostedt, mathieu.desnoyers, jiangshanlai, rcu, linux-kernel, kernel-team On Fri, Jul 12, 2019 at 03:32:40PM +0900, Byungchul Park wrote: > On Thu, Jul 11, 2019 at 03:58:39PM -0400, Joel Fernandes wrote: > > Hmm, speaking of grace period durations, it seems to me the maximum grace > > period ever is recorded in rcu_state.gp_max. However it is not read from > > anywhere. > > > > Any idea why it was added but not used? > > > > I am interested in dumping this value just for fun, and seeing what I get. > > > > I wonder also it is useful to dump it in rcutorture/rcuperf to find any > > issues, or even expose it in sys/proc fs to see what worst case grace periods > > look like. > > Hi, > > commit ae91aa0adb14dc33114d566feca2f7cb7a96b8b7 > rcu: Remove debugfs tracing > > removed all debugfs tracing, gp_max also included. > > And you sounds great. And even looks not that hard to add it like, > > :) > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index ad9dc86..86095ff 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -1658,8 +1658,10 @@ static void rcu_gp_cleanup(void) > raw_spin_lock_irq_rcu_node(rnp); > rcu_state.gp_end = jiffies; > gp_duration = rcu_state.gp_end - rcu_state.gp_start; > - if (gp_duration > rcu_state.gp_max) > + if (gp_duration > rcu_state.gp_max) { > rcu_state.gp_max = gp_duration; > + trace_rcu_grace_period(something something); > + } Yes, that makes sense. But I think it is much better off as a readable value from a virtual fs. 
The drawbacks of tracing for this sort of thing are:
- Tracing will only catch it if tracing is on.
- Tracing data can be lost if there are too many events, and then no one has
  a clue what the max gp time is.
- The data is already available in rcu_state::gp_max, so copying it into the
  trace buffer seems a bit pointless IMHO.
- It is a lot easier on one's eyes to process a single counter than to
  process heaps of traces.

I think a minimal set of RCU counters exposed to /proc or /sys should not
hurt and could do more good than not. The scheduler already does this for
scheduler statistics. I have seen Peter complain a lot about new tracepoints
but not much (or ever) about new statistics.

Tracing has its strengths but may not apply well here IMO. I think a counter
like this could be useful for tuning of things like jiffies_*_sched_qs, the
stall timeouts, and also any other RCU knobs. What do you think?

- Joel

^ permalink raw reply	[flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-12 12:51 ` Joel Fernandes @ 2019-07-12 13:02 ` Paul E. McKenney 2019-07-12 13:43 ` Joel Fernandes 2019-07-13 8:47 ` Byungchul Park 1 sibling, 1 reply; 53+ messages in thread From: Paul E. McKenney @ 2019-07-12 13:02 UTC (permalink / raw) To: Joel Fernandes Cc: Byungchul Park, josh, rostedt, mathieu.desnoyers, jiangshanlai, rcu, linux-kernel, kernel-team On Fri, Jul 12, 2019 at 08:51:16AM -0400, Joel Fernandes wrote: > On Fri, Jul 12, 2019 at 03:32:40PM +0900, Byungchul Park wrote: > > On Thu, Jul 11, 2019 at 03:58:39PM -0400, Joel Fernandes wrote: > > > Hmm, speaking of grace period durations, it seems to me the maximum grace > > > period ever is recorded in rcu_state.gp_max. However it is not read from > > > anywhere. > > > > > > Any idea why it was added but not used? > > > > > > I am interested in dumping this value just for fun, and seeing what I get. > > > > > > I wonder also it is useful to dump it in rcutorture/rcuperf to find any > > > issues, or even expose it in sys/proc fs to see what worst case grace periods > > > look like. > > > > Hi, > > > > commit ae91aa0adb14dc33114d566feca2f7cb7a96b8b7 > > rcu: Remove debugfs tracing > > > > removed all debugfs tracing, gp_max also included. > > > > And you sounds great. And even looks not that hard to add it like, > > > > :) > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > index ad9dc86..86095ff 100644 > > --- a/kernel/rcu/tree.c > > +++ b/kernel/rcu/tree.c > > @@ -1658,8 +1658,10 @@ static void rcu_gp_cleanup(void) > > raw_spin_lock_irq_rcu_node(rnp); > > rcu_state.gp_end = jiffies; > > gp_duration = rcu_state.gp_end - rcu_state.gp_start; > > - if (gp_duration > rcu_state.gp_max) > > + if (gp_duration > rcu_state.gp_max) { > > rcu_state.gp_max = gp_duration; > > + trace_rcu_grace_period(something something); > > + } > > Yes, that makes sense. But I think it is much better off as a readable value > from a virtual fs. 
The drawback of tracing for this sort of thing are: > - Tracing will only catch it if tracing is on > - Tracing data can be lost if too many events, then no one has a clue what > the max gp time is. > - The data is already available in rcu_state::gp_max so copying it into the > trace buffer seems a bit pointless IMHO > - It is a lot easier on ones eyes to process a single counter than process > heaps of traces. > > I think a minimal set of RCU counters exposed to /proc or /sys should not > hurt and could do more good than not. The scheduler already does this for > scheduler statistics. I have seen Peter complain a lot about new tracepoints > but not much (or never) about new statistics. > > Tracing has its strengths but may not apply well here IMO. I think a counter > like this could be useful for tuning of things like the jiffies_*_sched_qs, > the stall timeouts and also any other RCU knobs. What do you think? Is this one of those cases where eBPF is the answer, regardless of the question? ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-12 13:02 ` Paul E. McKenney @ 2019-07-12 13:43 ` Joel Fernandes 2019-07-12 14:53 ` Paul E. McKenney 0 siblings, 1 reply; 53+ messages in thread From: Joel Fernandes @ 2019-07-12 13:43 UTC (permalink / raw) To: Paul E. McKenney Cc: Byungchul Park, josh, rostedt, mathieu.desnoyers, jiangshanlai, rcu, linux-kernel, kernel-team On Fri, Jul 12, 2019 at 06:02:42AM -0700, Paul E. McKenney wrote: > On Fri, Jul 12, 2019 at 08:51:16AM -0400, Joel Fernandes wrote: > > On Fri, Jul 12, 2019 at 03:32:40PM +0900, Byungchul Park wrote: > > > On Thu, Jul 11, 2019 at 03:58:39PM -0400, Joel Fernandes wrote: > > > > Hmm, speaking of grace period durations, it seems to me the maximum grace > > > > period ever is recorded in rcu_state.gp_max. However it is not read from > > > > anywhere. > > > > > > > > Any idea why it was added but not used? > > > > > > > > I am interested in dumping this value just for fun, and seeing what I get. > > > > > > > > I wonder also it is useful to dump it in rcutorture/rcuperf to find any > > > > issues, or even expose it in sys/proc fs to see what worst case grace periods > > > > look like. > > > > > > Hi, > > > > > > commit ae91aa0adb14dc33114d566feca2f7cb7a96b8b7 > > > rcu: Remove debugfs tracing > > > > > > removed all debugfs tracing, gp_max also included. > > > > > > And you sounds great. 
And even looks not that hard to add it like, > > > > > > :) > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > index ad9dc86..86095ff 100644 > > > --- a/kernel/rcu/tree.c > > > +++ b/kernel/rcu/tree.c > > > @@ -1658,8 +1658,10 @@ static void rcu_gp_cleanup(void) > > > raw_spin_lock_irq_rcu_node(rnp); > > > rcu_state.gp_end = jiffies; > > > gp_duration = rcu_state.gp_end - rcu_state.gp_start; > > > - if (gp_duration > rcu_state.gp_max) > > > + if (gp_duration > rcu_state.gp_max) { > > > rcu_state.gp_max = gp_duration; > > > + trace_rcu_grace_period(something something); > > > + } > > > > Yes, that makes sense. But I think it is much better off as a readable value > > from a virtual fs. The drawback of tracing for this sort of thing are: > > - Tracing will only catch it if tracing is on > > - Tracing data can be lost if too many events, then no one has a clue what > > the max gp time is. > > - The data is already available in rcu_state::gp_max so copying it into the > > trace buffer seems a bit pointless IMHO > > - It is a lot easier on ones eyes to process a single counter than process > > heaps of traces. > > > > I think a minimal set of RCU counters exposed to /proc or /sys should not > > hurt and could do more good than not. The scheduler already does this for > > scheduler statistics. I have seen Peter complain a lot about new tracepoints > > but not much (or never) about new statistics. > > > > Tracing has its strengths but may not apply well here IMO. I think a counter > > like this could be useful for tuning of things like the jiffies_*_sched_qs, > > the stall timeouts and also any other RCU knobs. What do you think? > > Is this one of those cases where eBPF is the answer, regardless of > the question? ;-) It could be. Except that people who fancy busybox still could benefit from the counter ;-) And also, eBPF uses RCU itself heavily, so it would help if the debug related counter itself didn't depend on it. 
;-)

thanks,
- Joel
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-12 13:43 ` Joel Fernandes @ 2019-07-12 14:53 ` Paul E. McKenney 0 siblings, 0 replies; 53+ messages in thread From: Paul E. McKenney @ 2019-07-12 14:53 UTC (permalink / raw) To: Joel Fernandes Cc: Byungchul Park, josh, rostedt, mathieu.desnoyers, jiangshanlai, rcu, linux-kernel, kernel-team On Fri, Jul 12, 2019 at 09:43:11AM -0400, Joel Fernandes wrote: > On Fri, Jul 12, 2019 at 06:02:42AM -0700, Paul E. McKenney wrote: > > On Fri, Jul 12, 2019 at 08:51:16AM -0400, Joel Fernandes wrote: > > > On Fri, Jul 12, 2019 at 03:32:40PM +0900, Byungchul Park wrote: > > > > On Thu, Jul 11, 2019 at 03:58:39PM -0400, Joel Fernandes wrote: > > > > > Hmm, speaking of grace period durations, it seems to me the maximum grace > > > > > period ever is recorded in rcu_state.gp_max. However it is not read from > > > > > anywhere. > > > > > > > > > > Any idea why it was added but not used? > > > > > > > > > > I am interested in dumping this value just for fun, and seeing what I get. > > > > > > > > > > I wonder also it is useful to dump it in rcutorture/rcuperf to find any > > > > > issues, or even expose it in sys/proc fs to see what worst case grace periods > > > > > look like. > > > > > > > > Hi, > > > > > > > > commit ae91aa0adb14dc33114d566feca2f7cb7a96b8b7 > > > > rcu: Remove debugfs tracing > > > > > > > > removed all debugfs tracing, gp_max also included. > > > > > > > > And you sounds great. 
And even looks not that hard to add it like, > > > > > > > > :) > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > index ad9dc86..86095ff 100644 > > > > --- a/kernel/rcu/tree.c > > > > +++ b/kernel/rcu/tree.c > > > > @@ -1658,8 +1658,10 @@ static void rcu_gp_cleanup(void) > > > > raw_spin_lock_irq_rcu_node(rnp); > > > > rcu_state.gp_end = jiffies; > > > > gp_duration = rcu_state.gp_end - rcu_state.gp_start; > > > > - if (gp_duration > rcu_state.gp_max) > > > > + if (gp_duration > rcu_state.gp_max) { > > > > rcu_state.gp_max = gp_duration; > > > > + trace_rcu_grace_period(something something); > > > > + } > > > > > > Yes, that makes sense. But I think it is much better off as a readable value > > > from a virtual fs. The drawback of tracing for this sort of thing are: > > > - Tracing will only catch it if tracing is on > > > - Tracing data can be lost if too many events, then no one has a clue what > > > the max gp time is. > > > - The data is already available in rcu_state::gp_max so copying it into the > > > trace buffer seems a bit pointless IMHO > > > - It is a lot easier on ones eyes to process a single counter than process > > > heaps of traces. > > > > > > I think a minimal set of RCU counters exposed to /proc or /sys should not > > > hurt and could do more good than not. The scheduler already does this for > > > scheduler statistics. I have seen Peter complain a lot about new tracepoints > > > but not much (or never) about new statistics. > > > > > > Tracing has its strengths but may not apply well here IMO. I think a counter > > > like this could be useful for tuning of things like the jiffies_*_sched_qs, > > > the stall timeouts and also any other RCU knobs. What do you think? > > > > Is this one of those cases where eBPF is the answer, regardless of > > the question? ;-) > > It could be. 
Except that people who fancy busybox still could benefit from > the counter ;-) ;-) Another approach might be for RCU to dump statistics, including this one, to the console every so often, for example, maybe every minute for the first hour, every hour for the first day, and every day after that. What else might work? (I am not a huge fan of bringing back RCU's debugfs due to testability concerns and due to the fact that very few people ever used it.) Plus the busybox people could always add a trace_printk() or similar. > And also, eBPF uses RCU itself heavily, so it would help if the debug related > counter itself didn't depend on it. ;-) I would think that eBPF printing a statistical counter from RCU wouldn't pose danger even given that eBPF uses RCU, so I suspect that your point about busybox carries more force in this argument. ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 53+ messages in thread
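[Editor's sketch: the console-dump cadence Paul suggests above (every minute for the first hour, every hour for the first day, every day after that) is a simple three-step backoff. A minimal sketch, assuming those exact thresholds; next_dump_interval() is a hypothetical helper name, not an existing kernel function.]

```c
#include <assert.h>

/* Return the interval, in seconds, until the next statistics dump,
 * given how long the system has been up:
 *   - first hour:  every minute
 *   - first day:   every hour
 *   - after that:  every day
 */
static unsigned long next_dump_interval(unsigned long uptime_sec)
{
	if (uptime_sec < 3600)		/* within the first hour */
		return 60;
	if (uptime_sec < 86400)		/* within the first day */
		return 3600;
	return 86400;			/* thereafter */
}
```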
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-12 12:51 ` Joel Fernandes 2019-07-12 13:02 ` Paul E. McKenney @ 2019-07-13 8:47 ` Byungchul Park 2019-07-13 14:20 ` Joel Fernandes 1 sibling, 1 reply; 53+ messages in thread From: Byungchul Park @ 2019-07-13 8:47 UTC (permalink / raw) To: Joel Fernandes Cc: Byungchul Park, Paul E. McKenney, josh, Steven Rostedt, Mathieu Desnoyers, jiangshanlai, rcu, linux-kernel, kernel-team On Fri, Jul 12, 2019 at 9:51 PM Joel Fernandes <joel@joelfernandes.org> wrote: > > On Fri, Jul 12, 2019 at 03:32:40PM +0900, Byungchul Park wrote: > > On Thu, Jul 11, 2019 at 03:58:39PM -0400, Joel Fernandes wrote: > > > Hmm, speaking of grace period durations, it seems to me the maximum grace > > > period ever is recorded in rcu_state.gp_max. However it is not read from > > > anywhere. > > > > > > Any idea why it was added but not used? > > > > > > I am interested in dumping this value just for fun, and seeing what I get. > > > > > > I wonder also it is useful to dump it in rcutorture/rcuperf to find any > > > issues, or even expose it in sys/proc fs to see what worst case grace periods > > > look like. > > > > Hi, > > > > commit ae91aa0adb14dc33114d566feca2f7cb7a96b8b7 > > rcu: Remove debugfs tracing > > > > removed all debugfs tracing, gp_max also included. > > > > And you sounds great. And even looks not that hard to add it like, > > > > :) > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > index ad9dc86..86095ff 100644 > > --- a/kernel/rcu/tree.c > > +++ b/kernel/rcu/tree.c > > @@ -1658,8 +1658,10 @@ static void rcu_gp_cleanup(void) > > raw_spin_lock_irq_rcu_node(rnp); > > rcu_state.gp_end = jiffies; > > gp_duration = rcu_state.gp_end - rcu_state.gp_start; > > - if (gp_duration > rcu_state.gp_max) > > + if (gp_duration > rcu_state.gp_max) { > > rcu_state.gp_max = gp_duration; > > + trace_rcu_grace_period(something something); > > + } > > Yes, that makes sense. 
But I think it is much better off as a readable value > from a virtual fs. The drawback of tracing for this sort of thing are: > - Tracing will only catch it if tracing is on > - Tracing data can be lost if too many events, then no one has a clue what > the max gp time is. > - The data is already available in rcu_state::gp_max so copying it into the > trace buffer seems a bit pointless IMHO > - It is a lot easier on ones eyes to process a single counter than process > heaps of traces. > > I think a minimal set of RCU counters exposed to /proc or /sys should not > hurt and could do more good than not. The scheduler already does this for > scheduler statistics. I have seen Peter complain a lot about new tracepoints > but not much (or never) about new statistics. > > Tracing has its strengths but may not apply well here IMO. I think a counter > like this could be useful for tuning of things like the jiffies_*_sched_qs, > the stall timeouts and also any other RCU knobs. What do you think? I prefer proc/sys knob for it to tracepoint. Why I've considered it is just it looks like undoing what Paul did at ae91aa0ad. I think you're rational enough. I just wondered how Paul think of it. -- Thanks, Byungchul ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-13 8:47 ` Byungchul Park @ 2019-07-13 14:20 ` Joel Fernandes 2019-07-13 15:13 ` Paul E. McKenney 0 siblings, 1 reply; 53+ messages in thread From: Joel Fernandes @ 2019-07-13 14:20 UTC (permalink / raw) To: Byungchul Park Cc: Byungchul Park, Paul E. McKenney, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, rcu, LKML, kernel-team On Sat, Jul 13, 2019 at 4:47 AM Byungchul Park <max.byungchul.park@gmail.com> wrote: > > On Fri, Jul 12, 2019 at 9:51 PM Joel Fernandes <joel@joelfernandes.org> wrote: > > > > On Fri, Jul 12, 2019 at 03:32:40PM +0900, Byungchul Park wrote: > > > On Thu, Jul 11, 2019 at 03:58:39PM -0400, Joel Fernandes wrote: > > > > Hmm, speaking of grace period durations, it seems to me the maximum grace > > > > period ever is recorded in rcu_state.gp_max. However it is not read from > > > > anywhere. > > > > > > > > Any idea why it was added but not used? > > > > > > > > I am interested in dumping this value just for fun, and seeing what I get. > > > > > > > > I wonder also it is useful to dump it in rcutorture/rcuperf to find any > > > > issues, or even expose it in sys/proc fs to see what worst case grace periods > > > > look like. > > > > > > Hi, > > > > > > commit ae91aa0adb14dc33114d566feca2f7cb7a96b8b7 > > > rcu: Remove debugfs tracing > > > > > > removed all debugfs tracing, gp_max also included. > > > > > > And you sounds great. 
And even looks not that hard to add it like, > > > > > > :) > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > index ad9dc86..86095ff 100644 > > > --- a/kernel/rcu/tree.c > > > +++ b/kernel/rcu/tree.c > > > @@ -1658,8 +1658,10 @@ static void rcu_gp_cleanup(void) > > > raw_spin_lock_irq_rcu_node(rnp); > > > rcu_state.gp_end = jiffies; > > > gp_duration = rcu_state.gp_end - rcu_state.gp_start; > > > - if (gp_duration > rcu_state.gp_max) > > > + if (gp_duration > rcu_state.gp_max) { > > > rcu_state.gp_max = gp_duration; > > > + trace_rcu_grace_period(something something); > > > + } > > > > Yes, that makes sense. But I think it is much better off as a readable value > > from a virtual fs. The drawback of tracing for this sort of thing are: > > - Tracing will only catch it if tracing is on > > - Tracing data can be lost if too many events, then no one has a clue what > > the max gp time is. > > - The data is already available in rcu_state::gp_max so copying it into the > > trace buffer seems a bit pointless IMHO > > - It is a lot easier on ones eyes to process a single counter than process > > heaps of traces. > > > > I think a minimal set of RCU counters exposed to /proc or /sys should not > > hurt and could do more good than not. The scheduler already does this for > > scheduler statistics. I have seen Peter complain a lot about new tracepoints > > but not much (or never) about new statistics. > > > > Tracing has its strengths but may not apply well here IMO. I think a counter > > like this could be useful for tuning of things like the jiffies_*_sched_qs, > > the stall timeouts and also any other RCU knobs. What do you think? > > I prefer proc/sys knob for it to tracepoint. Why I've considered it is just it > looks like undoing what Paul did at ae91aa0ad. > > I think you're rational enough. I just wondered how Paul think of it. I believe at least initially, a set of statistics can be made available only when rcutorture or rcuperf module is loaded. 
That way they are purely only for debugging and nothing needs to be exposed to normal kernels distributed thus reducing testability concerns. rcu_state::gp_max would be trivial to expose through this, but for other statistics that are more complicated - perhaps tracepoint_probe_register can be used to add hooks on to the tracepoints and generate statistics from them. Again the registration of the probe and the probe handler itself would all be in rcutorture/rcuperf test code and not a part of the kernel proper. Thoughts? - Joel ^ permalink raw reply [flat|nested] 53+ messages in thread
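[Editor's sketch: the scheme above keeps statistics code out of the kernel proper by having the test module attach a probe to an existing tracepoint and accumulate statistics inside the probe. The real kernel API is tracepoint_probe_register(); the following is only a userspace analogy of the registration-plus-accumulation pattern, with all names hypothetical.]

```c
#include <assert.h>
#include <stddef.h>

/* Probe signature: called once per grace period with its duration. */
typedef void (*probe_fn)(unsigned long gp_duration);

static probe_fn registered_probe;	/* at most one probe, for simplicity */

/* Statistics derived entirely inside the probe, i.e. entirely inside
 * the rcutorture/rcuperf module in the real scheme. */
static unsigned long stat_gp_max;
static unsigned long stat_gp_count;

static void stats_probe(unsigned long gp_duration)
{
	stat_gp_count++;
	if (gp_duration > stat_gp_max)
		stat_gp_max = gp_duration;
}

/* What the test module would do at load time
 * (stands in for tracepoint_probe_register()). */
static void register_probe(probe_fn fn)
{
	registered_probe = fn;
}

/* What the tracepoint call site in the kernel proper would do:
 * it only fires the event and knows nothing about statistics. */
static void fire_gp_event(unsigned long gp_duration)
{
	if (registered_probe)
		registered_probe(gp_duration);
}
```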
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-13 14:20 ` Joel Fernandes @ 2019-07-13 15:13 ` Paul E. McKenney 2019-07-13 15:42 ` Joel Fernandes 0 siblings, 1 reply; 53+ messages in thread From: Paul E. McKenney @ 2019-07-13 15:13 UTC (permalink / raw) To: Joel Fernandes Cc: Byungchul Park, Byungchul Park, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, rcu, LKML, kernel-team On Sat, Jul 13, 2019 at 10:20:02AM -0400, Joel Fernandes wrote: > On Sat, Jul 13, 2019 at 4:47 AM Byungchul Park > <max.byungchul.park@gmail.com> wrote: > > > > On Fri, Jul 12, 2019 at 9:51 PM Joel Fernandes <joel@joelfernandes.org> wrote: > > > > > > On Fri, Jul 12, 2019 at 03:32:40PM +0900, Byungchul Park wrote: > > > > On Thu, Jul 11, 2019 at 03:58:39PM -0400, Joel Fernandes wrote: > > > > > Hmm, speaking of grace period durations, it seems to me the maximum grace > > > > > period ever is recorded in rcu_state.gp_max. However it is not read from > > > > > anywhere. > > > > > > > > > > Any idea why it was added but not used? > > > > > > > > > > I am interested in dumping this value just for fun, and seeing what I get. > > > > > > > > > > I wonder also it is useful to dump it in rcutorture/rcuperf to find any > > > > > issues, or even expose it in sys/proc fs to see what worst case grace periods > > > > > look like. > > > > > > > > Hi, > > > > > > > > commit ae91aa0adb14dc33114d566feca2f7cb7a96b8b7 > > > > rcu: Remove debugfs tracing > > > > > > > > removed all debugfs tracing, gp_max also included. > > > > > > > > And you sounds great. 
And even looks not that hard to add it like, > > > > > > > > :) > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > index ad9dc86..86095ff 100644 > > > > --- a/kernel/rcu/tree.c > > > > +++ b/kernel/rcu/tree.c > > > > @@ -1658,8 +1658,10 @@ static void rcu_gp_cleanup(void) > > > > raw_spin_lock_irq_rcu_node(rnp); > > > > rcu_state.gp_end = jiffies; > > > > gp_duration = rcu_state.gp_end - rcu_state.gp_start; > > > > - if (gp_duration > rcu_state.gp_max) > > > > + if (gp_duration > rcu_state.gp_max) { > > > > rcu_state.gp_max = gp_duration; > > > > + trace_rcu_grace_period(something something); > > > > + } > > > > > > Yes, that makes sense. But I think it is much better off as a readable value > > > from a virtual fs. The drawback of tracing for this sort of thing are: > > > - Tracing will only catch it if tracing is on > > > - Tracing data can be lost if too many events, then no one has a clue what > > > the max gp time is. > > > - The data is already available in rcu_state::gp_max so copying it into the > > > trace buffer seems a bit pointless IMHO > > > - It is a lot easier on ones eyes to process a single counter than process > > > heaps of traces. > > > > > > I think a minimal set of RCU counters exposed to /proc or /sys should not > > > hurt and could do more good than not. The scheduler already does this for > > > scheduler statistics. I have seen Peter complain a lot about new tracepoints > > > but not much (or never) about new statistics. > > > > > > Tracing has its strengths but may not apply well here IMO. I think a counter > > > like this could be useful for tuning of things like the jiffies_*_sched_qs, > > > the stall timeouts and also any other RCU knobs. What do you think? > > > > I prefer proc/sys knob for it to tracepoint. Why I've considered it is just it > > looks like undoing what Paul did at ae91aa0ad. > > > > I think you're rational enough. I just wondered how Paul think of it. 
> > I believe at least initially, a set of statistics can be made > available only when rcutorture or rcuperf module is loaded. That way > they are purely only for debugging and nothing needs to be exposed to > normal kernels distributed thus reducing testability concerns. > > rcu_state::gp_max would be trivial to expose through this, but for > other statistics that are more complicated - perhaps > tracepoint_probe_register can be used to add hooks on to the > tracepoints and generate statistics from them. Again the registration > of the probe and the probe handler itself would all be in > rcutorture/rcuperf test code and not a part of the kernel proper. > Thoughts? It still feels like you guys are hyperfocusing on this one particular knob. I instead need you to look at the interrelating knobs as a group. On the debugging side, suppose someone gives you an RCU bug report. What information will you need? How can you best get that information without excessive numbers of over-and-back interactions with the guy reporting the bug? As part of this last question, what information is normally supplied with the bug? Alternatively, what information are bug reporters normally expected to provide when asked? Thanx, Paul ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-13 15:13 ` Paul E. McKenney @ 2019-07-13 15:42 ` Joel Fernandes 2019-07-13 17:41 ` Paul E. McKenney 0 siblings, 1 reply; 53+ messages in thread From: Joel Fernandes @ 2019-07-13 15:42 UTC (permalink / raw) To: Paul E. McKenney Cc: Byungchul Park, Byungchul Park, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, rcu, LKML, kernel-team On Sat, Jul 13, 2019 at 08:13:30AM -0700, Paul E. McKenney wrote: > On Sat, Jul 13, 2019 at 10:20:02AM -0400, Joel Fernandes wrote: > > On Sat, Jul 13, 2019 at 4:47 AM Byungchul Park > > <max.byungchul.park@gmail.com> wrote: > > > > > > On Fri, Jul 12, 2019 at 9:51 PM Joel Fernandes <joel@joelfernandes.org> wrote: > > > > > > > > On Fri, Jul 12, 2019 at 03:32:40PM +0900, Byungchul Park wrote: > > > > > On Thu, Jul 11, 2019 at 03:58:39PM -0400, Joel Fernandes wrote: > > > > > > Hmm, speaking of grace period durations, it seems to me the maximum grace > > > > > > period ever is recorded in rcu_state.gp_max. However it is not read from > > > > > > anywhere. > > > > > > > > > > > > Any idea why it was added but not used? > > > > > > > > > > > > I am interested in dumping this value just for fun, and seeing what I get. > > > > > > > > > > > > I wonder also it is useful to dump it in rcutorture/rcuperf to find any > > > > > > issues, or even expose it in sys/proc fs to see what worst case grace periods > > > > > > look like. > > > > > > > > > > Hi, > > > > > > > > > > commit ae91aa0adb14dc33114d566feca2f7cb7a96b8b7 > > > > > rcu: Remove debugfs tracing > > > > > > > > > > removed all debugfs tracing, gp_max also included. > > > > > > > > > > And you sounds great. 
And even looks not that hard to add it like, > > > > > > > > > > :) > > > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > index ad9dc86..86095ff 100644 > > > > > --- a/kernel/rcu/tree.c > > > > > +++ b/kernel/rcu/tree.c > > > > > @@ -1658,8 +1658,10 @@ static void rcu_gp_cleanup(void) > > > > > raw_spin_lock_irq_rcu_node(rnp); > > > > > rcu_state.gp_end = jiffies; > > > > > gp_duration = rcu_state.gp_end - rcu_state.gp_start; > > > > > - if (gp_duration > rcu_state.gp_max) > > > > > + if (gp_duration > rcu_state.gp_max) { > > > > > rcu_state.gp_max = gp_duration; > > > > > + trace_rcu_grace_period(something something); > > > > > + } > > > > > > > > Yes, that makes sense. But I think it is much better off as a readable value > > > > from a virtual fs. The drawback of tracing for this sort of thing are: > > > > - Tracing will only catch it if tracing is on > > > > - Tracing data can be lost if too many events, then no one has a clue what > > > > the max gp time is. > > > > - The data is already available in rcu_state::gp_max so copying it into the > > > > trace buffer seems a bit pointless IMHO > > > > - It is a lot easier on ones eyes to process a single counter than process > > > > heaps of traces. > > > > > > > > I think a minimal set of RCU counters exposed to /proc or /sys should not > > > > hurt and could do more good than not. The scheduler already does this for > > > > scheduler statistics. I have seen Peter complain a lot about new tracepoints > > > > but not much (or never) about new statistics. > > > > > > > > Tracing has its strengths but may not apply well here IMO. I think a counter > > > > like this could be useful for tuning of things like the jiffies_*_sched_qs, > > > > the stall timeouts and also any other RCU knobs. What do you think? > > > > > > I prefer proc/sys knob for it to tracepoint. Why I've considered it is just it > > > looks like undoing what Paul did at ae91aa0ad. > > > > > > I think you're rational enough. 
I just wondered how Paul think of it. > > > > I believe at least initially, a set of statistics can be made > > available only when rcutorture or rcuperf module is loaded. That way > > they are purely only for debugging and nothing needs to be exposed to > > normal kernels distributed thus reducing testability concerns. > > > > rcu_state::gp_max would be trivial to expose through this, but for > > other statistics that are more complicated - perhaps > > tracepoint_probe_register can be used to add hooks on to the > > tracepoints and generate statistics from them. Again the registration > > of the probe and the probe handler itself would all be in > > rcutorture/rcuperf test code and not a part of the kernel proper. > > Thoughts? > > It still feels like you guys are hyperfocusing on this one particular > knob. I instead need you to look at the interrelating knobs as a group. Thanks for the hints, we'll do that. > On the debugging side, suppose someone gives you an RCU bug report. > What information will you need? How can you best get that information > without excessive numbers of over-and-back interactions with the guy > reporting the bug? As part of this last question, what information is > normally supplied with the bug? Alternatively, what information are > bug reporters normally expected to provide when asked? I suppose I could dig out some of our Android bug reports of the past where there were RCU issues but if there's any fires you are currently fighting do send it our way as debugging homework ;-) thanks, - Joel ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-13 15:42 ` Joel Fernandes @ 2019-07-13 17:41 ` Paul E. McKenney 2019-07-14 13:39 ` Byungchul Park 2019-07-18 16:14 ` Joel Fernandes 0 siblings, 2 replies; 53+ messages in thread From: Paul E. McKenney @ 2019-07-13 17:41 UTC (permalink / raw) To: Joel Fernandes Cc: Byungchul Park, Byungchul Park, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, rcu, LKML, kernel-team On Sat, Jul 13, 2019 at 11:42:57AM -0400, Joel Fernandes wrote: > On Sat, Jul 13, 2019 at 08:13:30AM -0700, Paul E. McKenney wrote: > > On Sat, Jul 13, 2019 at 10:20:02AM -0400, Joel Fernandes wrote: > > > On Sat, Jul 13, 2019 at 4:47 AM Byungchul Park > > > <max.byungchul.park@gmail.com> wrote: > > > > > > > > On Fri, Jul 12, 2019 at 9:51 PM Joel Fernandes <joel@joelfernandes.org> wrote: > > > > > > > > > > On Fri, Jul 12, 2019 at 03:32:40PM +0900, Byungchul Park wrote: > > > > > > On Thu, Jul 11, 2019 at 03:58:39PM -0400, Joel Fernandes wrote: > > > > > > > Hmm, speaking of grace period durations, it seems to me the maximum grace > > > > > > > period ever is recorded in rcu_state.gp_max. However it is not read from > > > > > > > anywhere. > > > > > > > > > > > > > > Any idea why it was added but not used? > > > > > > > > > > > > > > I am interested in dumping this value just for fun, and seeing what I get. > > > > > > > > > > > > > > I wonder also it is useful to dump it in rcutorture/rcuperf to find any > > > > > > > issues, or even expose it in sys/proc fs to see what worst case grace periods > > > > > > > look like. > > > > > > > > > > > > Hi, > > > > > > > > > > > > commit ae91aa0adb14dc33114d566feca2f7cb7a96b8b7 > > > > > > rcu: Remove debugfs tracing > > > > > > > > > > > > removed all debugfs tracing, gp_max also included. > > > > > > > > > > > > And you sounds great. 
And even looks not that hard to add it like, > > > > > > > > > > > > :) > > > > > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > > index ad9dc86..86095ff 100644 > > > > > > --- a/kernel/rcu/tree.c > > > > > > +++ b/kernel/rcu/tree.c > > > > > > @@ -1658,8 +1658,10 @@ static void rcu_gp_cleanup(void) > > > > > > raw_spin_lock_irq_rcu_node(rnp); > > > > > > rcu_state.gp_end = jiffies; > > > > > > gp_duration = rcu_state.gp_end - rcu_state.gp_start; > > > > > > - if (gp_duration > rcu_state.gp_max) > > > > > > + if (gp_duration > rcu_state.gp_max) { > > > > > > rcu_state.gp_max = gp_duration; > > > > > > + trace_rcu_grace_period(something something); > > > > > > + } > > > > > > > > > > Yes, that makes sense. But I think it is much better off as a readable value > > > > > from a virtual fs. The drawback of tracing for this sort of thing are: > > > > > - Tracing will only catch it if tracing is on > > > > > - Tracing data can be lost if too many events, then no one has a clue what > > > > > the max gp time is. > > > > > - The data is already available in rcu_state::gp_max so copying it into the > > > > > trace buffer seems a bit pointless IMHO > > > > > - It is a lot easier on ones eyes to process a single counter than process > > > > > heaps of traces. > > > > > > > > > > I think a minimal set of RCU counters exposed to /proc or /sys should not > > > > > hurt and could do more good than not. The scheduler already does this for > > > > > scheduler statistics. I have seen Peter complain a lot about new tracepoints > > > > > but not much (or never) about new statistics. > > > > > > > > > > Tracing has its strengths but may not apply well here IMO. I think a counter > > > > > like this could be useful for tuning of things like the jiffies_*_sched_qs, > > > > > the stall timeouts and also any other RCU knobs. What do you think? > > > > > > > > I prefer proc/sys knob for it to tracepoint. 
Why I've considered it is just it > > > > looks like undoing what Paul did at ae91aa0ad. > > > > > > > > I think you're rational enough. I just wondered how Paul think of it. > > > > > > I believe at least initially, a set of statistics can be made > > > available only when rcutorture or rcuperf module is loaded. That way > > > they are purely only for debugging and nothing needs to be exposed to > > > normal kernels distributed thus reducing testability concerns. > > > > > > rcu_state::gp_max would be trivial to expose through this, but for > > > other statistics that are more complicated - perhaps > > > tracepoint_probe_register can be used to add hooks on to the > > > tracepoints and generate statistics from them. Again the registration > > > of the probe and the probe handler itself would all be in > > > rcutorture/rcuperf test code and not a part of the kernel proper. > > > Thoughts? > > > > It still feels like you guys are hyperfocusing on this one particular > > knob. I instead need you to look at the interrelating knobs as a group. > > Thanks for the hints, we'll do that. > > > On the debugging side, suppose someone gives you an RCU bug report. > > What information will you need? How can you best get that information > > without excessive numbers of over-and-back interactions with the guy > > reporting the bug? As part of this last question, what information is > > normally supplied with the bug? Alternatively, what information are > > bug reporters normally expected to provide when asked? > > I suppose I could dig out some of our Android bug reports of the past where > there were RCU issues but if there's any fires you are currently fighting do > send it our way as debugging homework ;-) Evading the questions, are we? OK, I can be flexible. Suppose that you were getting RCU CPU stall warnings featuring multi_cpu_stop() called from cpu_stopper_thread(). 
Of course, this really means that some other CPU/task is holding up multi_cpu_stop() without also blocking the current grace period. What is the best way to work out what is really holding things up? Thanx, Paul ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-13 17:41 ` Paul E. McKenney @ 2019-07-14 13:39 ` Byungchul Park 2019-07-14 13:56 ` Paul E. McKenney 2019-07-18 16:14 ` Joel Fernandes 1 sibling, 1 reply; 53+ messages in thread From: Byungchul Park @ 2019-07-14 13:39 UTC (permalink / raw) To: Paul E. McKenney Cc: Joel Fernandes, Byungchul Park, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, rcu, LKML, kernel-team On Sun, Jul 14, 2019 at 2:41 AM Paul E. McKenney <paulmck@linux.ibm.com> wrote: > > On Sat, Jul 13, 2019 at 11:42:57AM -0400, Joel Fernandes wrote: > > On Sat, Jul 13, 2019 at 08:13:30AM -0700, Paul E. McKenney wrote: > > > On Sat, Jul 13, 2019 at 10:20:02AM -0400, Joel Fernandes wrote: > > > > On Sat, Jul 13, 2019 at 4:47 AM Byungchul Park > > > > <max.byungchul.park@gmail.com> wrote: > > > > > > > > > > On Fri, Jul 12, 2019 at 9:51 PM Joel Fernandes <joel@joelfernandes.org> wrote: > > > > > > > > > > > > On Fri, Jul 12, 2019 at 03:32:40PM +0900, Byungchul Park wrote: > > > > > > > On Thu, Jul 11, 2019 at 03:58:39PM -0400, Joel Fernandes wrote: > > > > > > > > Hmm, speaking of grace period durations, it seems to me the maximum grace > > > > > > > > period ever is recorded in rcu_state.gp_max. However it is not read from > > > > > > > > anywhere. > > > > > > > > > > > > > > > > Any idea why it was added but not used? > > > > > > > > > > > > > > > > I am interested in dumping this value just for fun, and seeing what I get. > > > > > > > > > > > > > > > > I wonder also it is useful to dump it in rcutorture/rcuperf to find any > > > > > > > > issues, or even expose it in sys/proc fs to see what worst case grace periods > > > > > > > > look like. > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > commit ae91aa0adb14dc33114d566feca2f7cb7a96b8b7 > > > > > > > rcu: Remove debugfs tracing > > > > > > > > > > > > > > removed all debugfs tracing, gp_max also included. > > > > > > > > > > > > > > And you sounds great. 
And even looks not that hard to add it like, > > > > > > > > > > > > > > :) > > > > > > > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > > > index ad9dc86..86095ff 100644 > > > > > > > --- a/kernel/rcu/tree.c > > > > > > > +++ b/kernel/rcu/tree.c > > > > > > > @@ -1658,8 +1658,10 @@ static void rcu_gp_cleanup(void) > > > > > > > raw_spin_lock_irq_rcu_node(rnp); > > > > > > > rcu_state.gp_end = jiffies; > > > > > > > gp_duration = rcu_state.gp_end - rcu_state.gp_start; > > > > > > > - if (gp_duration > rcu_state.gp_max) > > > > > > > + if (gp_duration > rcu_state.gp_max) { > > > > > > > rcu_state.gp_max = gp_duration; > > > > > > > + trace_rcu_grace_period(something something); > > > > > > > + } > > > > > > > > > > > > Yes, that makes sense. But I think it is much better off as a readable value > > > > > > from a virtual fs. The drawback of tracing for this sort of thing are: > > > > > > - Tracing will only catch it if tracing is on > > > > > > - Tracing data can be lost if too many events, then no one has a clue what > > > > > > the max gp time is. > > > > > > - The data is already available in rcu_state::gp_max so copying it into the > > > > > > trace buffer seems a bit pointless IMHO > > > > > > - It is a lot easier on ones eyes to process a single counter than process > > > > > > heaps of traces. > > > > > > > > > > > > I think a minimal set of RCU counters exposed to /proc or /sys should not > > > > > > hurt and could do more good than not. The scheduler already does this for > > > > > > scheduler statistics. I have seen Peter complain a lot about new tracepoints > > > > > > but not much (or never) about new statistics. > > > > > > > > > > > > Tracing has its strengths but may not apply well here IMO. I think a counter > > > > > > like this could be useful for tuning of things like the jiffies_*_sched_qs, > > > > > > the stall timeouts and also any other RCU knobs. What do you think? 
> > > > > > > > > > I prefer proc/sys knob for it to tracepoint. Why I've considered it is just it > > > > > looks like undoing what Paul did at ae91aa0ad. > > > > > > > > > > I think you're rational enough. I just wondered how Paul think of it. > > > > > > > > I believe at least initially, a set of statistics can be made > > > > available only when rcutorture or rcuperf module is loaded. That way > > > > they are purely only for debugging and nothing needs to be exposed to > > > > normal kernels distributed thus reducing testability concerns. > > > > > > > > rcu_state::gp_max would be trivial to expose through this, but for > > > > other statistics that are more complicated - perhaps > > > > tracepoint_probe_register can be used to add hooks on to the > > > > tracepoints and generate statistics from them. Again the registration > > > > of the probe and the probe handler itself would all be in > > > > rcutorture/rcuperf test code and not a part of the kernel proper. > > > > Thoughts? > > > > > > It still feels like you guys are hyperfocusing on this one particular > > > knob. I instead need you to look at the interrelating knobs as a group. > > > > Thanks for the hints, we'll do that. > > > > > On the debugging side, suppose someone gives you an RCU bug report. > > > What information will you need? How can you best get that information > > > without excessive numbers of over-and-back interactions with the guy > > > reporting the bug? As part of this last question, what information is > > > normally supplied with the bug? Alternatively, what information are > > > bug reporters normally expected to provide when asked? > > > > I suppose I could dig out some of our Android bug reports of the past where > > there were RCU issues but if there's any fires you are currently fighting do > > send it our way as debugging homework ;-) > > Evading the questions, are we? > > OK, I can be flexible. 
Suppose that you were getting RCU CPU stall
> warnings featuring multi_cpu_stop() called from cpu_stopper_thread().
> Of course, this really means that some other CPU/task is holding up
> multi_cpu_stop() without also blocking the current grace period.
>
> What is the best way to work out what is really holding things up?

Either the stopper preempted another task that was in a critical section and something is wrong with the stopper itself (in the PREEMPT case), or the mechanisms for urgent quiescent-state control are not working correctly.

I don't know exactly what you intended, but I would check things like (1) irq-disable / eqs / tick / scheduler events and (2) whether the special handling for each level of qs urgency has started correctly. For that purpose, the full history of those events would be most useful.

And thinking about it more, we could come up with a good way to make use of those data to identify what the problem is. Do I understand the point correctly? If so, Joel and I can start to work on it. Otherwise, please correct me.

--
Thanks,
Byungchul

^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-14 13:39 ` Byungchul Park @ 2019-07-14 13:56 ` Paul E. McKenney 2019-07-15 17:39 ` Joel Fernandes 0 siblings, 1 reply; 53+ messages in thread From: Paul E. McKenney @ 2019-07-14 13:56 UTC (permalink / raw) To: Byungchul Park Cc: Joel Fernandes, Byungchul Park, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, rcu, LKML, kernel-team On Sun, Jul 14, 2019 at 10:39:58PM +0900, Byungchul Park wrote: > On Sun, Jul 14, 2019 at 2:41 AM Paul E. McKenney <paulmck@linux.ibm.com> wrote: > > > > On Sat, Jul 13, 2019 at 11:42:57AM -0400, Joel Fernandes wrote: > > > On Sat, Jul 13, 2019 at 08:13:30AM -0700, Paul E. McKenney wrote: > > > > On Sat, Jul 13, 2019 at 10:20:02AM -0400, Joel Fernandes wrote: > > > > > On Sat, Jul 13, 2019 at 4:47 AM Byungchul Park > > > > > <max.byungchul.park@gmail.com> wrote: > > > > > > > > > > > > On Fri, Jul 12, 2019 at 9:51 PM Joel Fernandes <joel@joelfernandes.org> wrote: > > > > > > > > > > > > > > On Fri, Jul 12, 2019 at 03:32:40PM +0900, Byungchul Park wrote: > > > > > > > > On Thu, Jul 11, 2019 at 03:58:39PM -0400, Joel Fernandes wrote: > > > > > > > > > Hmm, speaking of grace period durations, it seems to me the maximum grace > > > > > > > > > period ever is recorded in rcu_state.gp_max. However it is not read from > > > > > > > > > anywhere. > > > > > > > > > > > > > > > > > > Any idea why it was added but not used? > > > > > > > > > > > > > > > > > > I am interested in dumping this value just for fun, and seeing what I get. > > > > > > > > > > > > > > > > > > I wonder also it is useful to dump it in rcutorture/rcuperf to find any > > > > > > > > > issues, or even expose it in sys/proc fs to see what worst case grace periods > > > > > > > > > look like. 
> > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > commit ae91aa0adb14dc33114d566feca2f7cb7a96b8b7 > > > > > > > > rcu: Remove debugfs tracing > > > > > > > > > > > > > > > > removed all debugfs tracing, gp_max also included. > > > > > > > > > > > > > > > > And you sounds great. And even looks not that hard to add it like, > > > > > > > > > > > > > > > > :) > > > > > > > > > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > > > > index ad9dc86..86095ff 100644 > > > > > > > > --- a/kernel/rcu/tree.c > > > > > > > > +++ b/kernel/rcu/tree.c > > > > > > > > @@ -1658,8 +1658,10 @@ static void rcu_gp_cleanup(void) > > > > > > > > raw_spin_lock_irq_rcu_node(rnp); > > > > > > > > rcu_state.gp_end = jiffies; > > > > > > > > gp_duration = rcu_state.gp_end - rcu_state.gp_start; > > > > > > > > - if (gp_duration > rcu_state.gp_max) > > > > > > > > + if (gp_duration > rcu_state.gp_max) { > > > > > > > > rcu_state.gp_max = gp_duration; > > > > > > > > + trace_rcu_grace_period(something something); > > > > > > > > + } > > > > > > > > > > > > > > Yes, that makes sense. But I think it is much better off as a readable value > > > > > > > from a virtual fs. The drawback of tracing for this sort of thing are: > > > > > > > - Tracing will only catch it if tracing is on > > > > > > > - Tracing data can be lost if too many events, then no one has a clue what > > > > > > > the max gp time is. > > > > > > > - The data is already available in rcu_state::gp_max so copying it into the > > > > > > > trace buffer seems a bit pointless IMHO > > > > > > > - It is a lot easier on ones eyes to process a single counter than process > > > > > > > heaps of traces. > > > > > > > > > > > > > > I think a minimal set of RCU counters exposed to /proc or /sys should not > > > > > > > hurt and could do more good than not. The scheduler already does this for > > > > > > > scheduler statistics. 
I have seen Peter complain a lot about new tracepoints > > > > > > > but not much (or never) about new statistics. > > > > > > > > > > > > > > Tracing has its strengths but may not apply well here IMO. I think a counter > > > > > > > like this could be useful for tuning of things like the jiffies_*_sched_qs, > > > > > > > the stall timeouts and also any other RCU knobs. What do you think? > > > > > > > > > > > > I prefer proc/sys knob for it to tracepoint. Why I've considered it is just it > > > > > > looks like undoing what Paul did at ae91aa0ad. > > > > > > > > > > > > I think you're rational enough. I just wondered how Paul think of it. > > > > > > > > > > I believe at least initially, a set of statistics can be made > > > > > available only when rcutorture or rcuperf module is loaded. That way > > > > > they are purely only for debugging and nothing needs to be exposed to > > > > > normal kernels distributed thus reducing testability concerns. > > > > > > > > > > rcu_state::gp_max would be trivial to expose through this, but for > > > > > other statistics that are more complicated - perhaps > > > > > tracepoint_probe_register can be used to add hooks on to the > > > > > tracepoints and generate statistics from them. Again the registration > > > > > of the probe and the probe handler itself would all be in > > > > > rcutorture/rcuperf test code and not a part of the kernel proper. > > > > > Thoughts? > > > > > > > > It still feels like you guys are hyperfocusing on this one particular > > > > knob. I instead need you to look at the interrelating knobs as a group. > > > > > > Thanks for the hints, we'll do that. > > > > > > > On the debugging side, suppose someone gives you an RCU bug report. > > > > What information will you need? How can you best get that information > > > > without excessive numbers of over-and-back interactions with the guy > > > > reporting the bug? As part of this last question, what information is > > > > normally supplied with the bug? 
Alternatively, what information are > > > > bug reporters normally expected to provide when asked? > > > > > > I suppose I could dig out some of our Android bug reports of the past where > > > there were RCU issues but if there's any fires you are currently fighting do > > > send it our way as debugging homework ;-) > > > > Evading the questions, are we? > > > > OK, I can be flexible. Suppose that you were getting RCU CPU stall > > warnings featuring multi_cpu_stop() called from cpu_stopper_thread(). > > Of course, this really means that some other CPU/task is holding up > > multi_cpu_stop() without also blocking the current grace period. > > > > What is the best way to work out what is really holding things up? > > Either the stopper preempted another being in a critical section and > has something wrong itself in case of PREEMPT or mechanisms for > urgent control doesn't work correctly. > > I don't know what exactly you intended but I would check things like > (1) irq disable / eqs / tick / scheduler events and (2) whether special > handling for each level of qs urgency has started correctly. For that > purpose all the history of those events would be more useful. > > And with thinking it more, we could come up with a good way to > make use of those data to identify what the problem is. Do I catch > the point correctly? If so, me and Joel can start to work on it. > Otherwise, please correct me. I believe you are on the right track. In short, it would be great if the kernel would automatically dump out the needed information when cpu_stopper gets stalled, sort of like RCU does (much of the time, anyway) in its CPU stall warnings. Given a patch that did this, I would be quite happy to help advocate for it! Thanx, Paul ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-14 13:56 ` Paul E. McKenney @ 2019-07-15 17:39 ` Joel Fernandes 2019-07-15 20:09 ` Paul E. McKenney 0 siblings, 1 reply; 53+ messages in thread From: Joel Fernandes @ 2019-07-15 17:39 UTC (permalink / raw) To: Paul E. McKenney Cc: Byungchul Park, Byungchul Park, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, rcu, LKML, kernel-team On Sun, Jul 14, 2019 at 06:56:10AM -0700, Paul E. McKenney wrote: [snip] > > > > > > > > > > > > > > > > > > commit ae91aa0adb14dc33114d566feca2f7cb7a96b8b7 > > > > > > > > > rcu: Remove debugfs tracing > > > > > > > > > > > > > > > > > > removed all debugfs tracing, gp_max also included. > > > > > > > > > > > > > > > > > > And you sounds great. And even looks not that hard to add it like, > > > > > > > > > > > > > > > > > > :) > > > > > > > > > > > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > > > > > index ad9dc86..86095ff 100644 > > > > > > > > > --- a/kernel/rcu/tree.c > > > > > > > > > +++ b/kernel/rcu/tree.c > > > > > > > > > @@ -1658,8 +1658,10 @@ static void rcu_gp_cleanup(void) > > > > > > > > > raw_spin_lock_irq_rcu_node(rnp); > > > > > > > > > rcu_state.gp_end = jiffies; > > > > > > > > > gp_duration = rcu_state.gp_end - rcu_state.gp_start; > > > > > > > > > - if (gp_duration > rcu_state.gp_max) > > > > > > > > > + if (gp_duration > rcu_state.gp_max) { > > > > > > > > > rcu_state.gp_max = gp_duration; > > > > > > > > > + trace_rcu_grace_period(something something); > > > > > > > > > + } > > > > > > > > > > > > > > > > Yes, that makes sense. But I think it is much better off as a readable value > > > > > > > > from a virtual fs. The drawback of tracing for this sort of thing are: > > > > > > > > - Tracing will only catch it if tracing is on > > > > > > > > - Tracing data can be lost if too many events, then no one has a clue what > > > > > > > > the max gp time is. 
> > > > > > > > - The data is already available in rcu_state::gp_max so copying it into the > > > > > > > > trace buffer seems a bit pointless IMHO > > > > > > > > - It is a lot easier on ones eyes to process a single counter than process > > > > > > > > heaps of traces. > > > > > > > > > > > > > > > > I think a minimal set of RCU counters exposed to /proc or /sys should not > > > > > > > > hurt and could do more good than not. The scheduler already does this for > > > > > > > > scheduler statistics. I have seen Peter complain a lot about new tracepoints > > > > > > > > but not much (or never) about new statistics. > > > > > > > > > > > > > > > > Tracing has its strengths but may not apply well here IMO. I think a counter > > > > > > > > like this could be useful for tuning of things like the jiffies_*_sched_qs, > > > > > > > > the stall timeouts and also any other RCU knobs. What do you think? > > > > > > > > > > > > > > I prefer proc/sys knob for it to tracepoint. Why I've considered it is just it > > > > > > > looks like undoing what Paul did at ae91aa0ad. > > > > > > > > > > > > > > I think you're rational enough. I just wondered how Paul think of it. > > > > > > > > > > > > I believe at least initially, a set of statistics can be made > > > > > > available only when rcutorture or rcuperf module is loaded. That way > > > > > > they are purely only for debugging and nothing needs to be exposed to > > > > > > normal kernels distributed thus reducing testability concerns. > > > > > > > > > > > > rcu_state::gp_max would be trivial to expose through this, but for > > > > > > other statistics that are more complicated - perhaps > > > > > > tracepoint_probe_register can be used to add hooks on to the > > > > > > tracepoints and generate statistics from them. Again the registration > > > > > > of the probe and the probe handler itself would all be in > > > > > > rcutorture/rcuperf test code and not a part of the kernel proper. > > > > > > Thoughts? 
> > > > > > > > > > It still feels like you guys are hyperfocusing on this one particular > > > > > knob. I instead need you to look at the interrelating knobs as a group. > > > > > > > > Thanks for the hints, we'll do that. > > > > > > > > > On the debugging side, suppose someone gives you an RCU bug report. > > > > > What information will you need? How can you best get that information > > > > > without excessive numbers of over-and-back interactions with the guy > > > > > reporting the bug? As part of this last question, what information is > > > > > normally supplied with the bug? Alternatively, what information are > > > > > bug reporters normally expected to provide when asked? > > > > > > > > I suppose I could dig out some of our Android bug reports of the past where > > > > there were RCU issues but if there's any fires you are currently fighting do > > > > send it our way as debugging homework ;-) > > > > > > Evading the questions, are we? Sorry if I sounded like evading, I was just saying that it would be nice to work on some specific issue or bug report and then figure out what is needed from there, instead of trying to predict what data is useful. Of course, I think predicting and dumping data is also useful, but I was just wondering if you had some specific issue in mind that we could work off of (you did mention the CPU stopper below so thank you!) > > > OK, I can be flexible. Suppose that you were getting RCU CPU stall > > > warnings featuring multi_cpu_stop() called from cpu_stopper_thread(). > > > Of course, this really means that some other CPU/task is holding up > > > multi_cpu_stop() without also blocking the current grace period. > > > > > > What is the best way to work out what is really holding things up? > > > > Either the stopper preempted another being in a critical section and > > has something wrong itself in case of PREEMPT or mechanisms for > > urgent control doesn't work correctly. 
> > > > I don't know what exactly you intended but I would check things like > > (1) irq disable / eqs / tick / scheduler events and (2) whether special > > handling for each level of qs urgency has started correctly. For that > > purpose all the history of those events would be more useful. Agreed, these are all good. > > And with thinking it more, we could come up with a good way to > > make use of those data to identify what the problem is. Do I catch > > the point correctly? If so, me and Joel can start to work on it. > > Otherwise, please correct me. > > I believe you are on the right track. In short, it would be great if > the kernel would automatically dump out the needed information when > cpu_stopper gets stalled, sort of like RCU does (much of the time, > anyway) in its CPU stall warnings. Given a patch that did this, I would > be quite happy to help advocate for it! In case you have a LKML link to a thread or bug report to this specific cpu_stopper issue, please do pass it along. Happy to work on this with Byungchul, thanks, - Joel ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-15 17:39 ` Joel Fernandes @ 2019-07-15 20:09 ` Paul E. McKenney 0 siblings, 0 replies; 53+ messages in thread From: Paul E. McKenney @ 2019-07-15 20:09 UTC (permalink / raw) To: Joel Fernandes Cc: Byungchul Park, Byungchul Park, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, rcu, LKML, kernel-team On Mon, Jul 15, 2019 at 01:39:24PM -0400, Joel Fernandes wrote: > On Sun, Jul 14, 2019 at 06:56:10AM -0700, Paul E. McKenney wrote: > [snip] > > > > > > > > > > > > > > > > > > > > commit ae91aa0adb14dc33114d566feca2f7cb7a96b8b7 > > > > > > > > > > rcu: Remove debugfs tracing > > > > > > > > > > > > > > > > > > > > removed all debugfs tracing, gp_max also included. > > > > > > > > > > > > > > > > > > > > And you sounds great. And even looks not that hard to add it like, > > > > > > > > > > > > > > > > > > > > :) > > > > > > > > > > > > > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > > > > > > index ad9dc86..86095ff 100644 > > > > > > > > > > --- a/kernel/rcu/tree.c > > > > > > > > > > +++ b/kernel/rcu/tree.c > > > > > > > > > > @@ -1658,8 +1658,10 @@ static void rcu_gp_cleanup(void) > > > > > > > > > > raw_spin_lock_irq_rcu_node(rnp); > > > > > > > > > > rcu_state.gp_end = jiffies; > > > > > > > > > > gp_duration = rcu_state.gp_end - rcu_state.gp_start; > > > > > > > > > > - if (gp_duration > rcu_state.gp_max) > > > > > > > > > > + if (gp_duration > rcu_state.gp_max) { > > > > > > > > > > rcu_state.gp_max = gp_duration; > > > > > > > > > > + trace_rcu_grace_period(something something); > > > > > > > > > > + } > > > > > > > > > > > > > > > > > > Yes, that makes sense. But I think it is much better off as a readable value > > > > > > > > > from a virtual fs. 
The drawback of tracing for this sort of thing are: > > > > > > > > > - Tracing will only catch it if tracing is on > > > > > > > > > - Tracing data can be lost if too many events, then no one has a clue what > > > > > > > > > the max gp time is. > > > > > > > > > - The data is already available in rcu_state::gp_max so copying it into the > > > > > > > > > trace buffer seems a bit pointless IMHO > > > > > > > > > - It is a lot easier on ones eyes to process a single counter than process > > > > > > > > > heaps of traces. > > > > > > > > > > > > > > > > > > I think a minimal set of RCU counters exposed to /proc or /sys should not > > > > > > > > > hurt and could do more good than not. The scheduler already does this for > > > > > > > > > scheduler statistics. I have seen Peter complain a lot about new tracepoints > > > > > > > > > but not much (or never) about new statistics. > > > > > > > > > > > > > > > > > > Tracing has its strengths but may not apply well here IMO. I think a counter > > > > > > > > > like this could be useful for tuning of things like the jiffies_*_sched_qs, > > > > > > > > > the stall timeouts and also any other RCU knobs. What do you think? > > > > > > > > > > > > > > > > I prefer proc/sys knob for it to tracepoint. Why I've considered it is just it > > > > > > > > looks like undoing what Paul did at ae91aa0ad. > > > > > > > > > > > > > > > > I think you're rational enough. I just wondered how Paul think of it. > > > > > > > > > > > > > > I believe at least initially, a set of statistics can be made > > > > > > > available only when rcutorture or rcuperf module is loaded. That way > > > > > > > they are purely only for debugging and nothing needs to be exposed to > > > > > > > normal kernels distributed thus reducing testability concerns. 
> > > > > > > > > > > > > > rcu_state::gp_max would be trivial to expose through this, but for > > > > > > > other statistics that are more complicated - perhaps > > > > > > > tracepoint_probe_register can be used to add hooks on to the > > > > > > > tracepoints and generate statistics from them. Again the registration > > > > > > > of the probe and the probe handler itself would all be in > > > > > > > rcutorture/rcuperf test code and not a part of the kernel proper. > > > > > > > Thoughts? > > > > > > > > > > > > It still feels like you guys are hyperfocusing on this one particular > > > > > > knob. I instead need you to look at the interrelating knobs as a group. > > > > > > > > > > Thanks for the hints, we'll do that. > > > > > > > > > > > On the debugging side, suppose someone gives you an RCU bug report. > > > > > > What information will you need? How can you best get that information > > > > > > without excessive numbers of over-and-back interactions with the guy > > > > > > reporting the bug? As part of this last question, what information is > > > > > > normally supplied with the bug? Alternatively, what information are > > > > > > bug reporters normally expected to provide when asked? > > > > > > > > > > I suppose I could dig out some of our Android bug reports of the past where > > > > > there were RCU issues but if there's any fires you are currently fighting do > > > > > send it our way as debugging homework ;-) > > > > > > > > Evading the questions, are we? > > Sorry if I sounded like evading, I was just saying that it would be nice to > work on some specific issue or bug report and then figure out what is needed > from there, instead of trying to predict what data is useful. Of course, I > think predicting and dumping data is also useful, but I was just wondering if > you had some specific issue in mind that we could work off of (you did > mention the CPU stopper below so thank you!) No worries, and fair enough! > > > > OK, I can be flexible. 
Suppose that you were getting RCU CPU stall > > > > warnings featuring multi_cpu_stop() called from cpu_stopper_thread(). > > > > Of course, this really means that some other CPU/task is holding up > > > > multi_cpu_stop() without also blocking the current grace period. > > > > > > > > What is the best way to work out what is really holding things up? > > > > > > Either the stopper preempted another being in a critical section and > > > has something wrong itself in case of PREEMPT or mechanisms for > > > urgent control doesn't work correctly. > > > > > > I don't know what exactly you intended but I would check things like > > > (1) irq disable / eqs / tick / scheduler events and (2) whether special > > > handling for each level of qs urgency has started correctly. For that > > > purpose all the history of those events would be more useful. > > Agreed, these are all good. Just to reiterate -- it would be good if multi_cpu_stop() could automatically dump out useful information when it has been delayed too long, where "too long" is probably a bit shorter than the RCU CPU stall warning timeout. Useful information would include the stack of the task impeding multi_cpu_stop(). > > > And with thinking it more, we could come up with a good way to > > > make use of those data to identify what the problem is. Do I catch > > > the point correctly? If so, me and Joel can start to work on it. > > > Otherwise, please correct me. > > > > I believe you are on the right track. In short, it would be great if > > the kernel would automatically dump out the needed information when > > cpu_stopper gets stalled, sort of like RCU does (much of the time, > > anyway) in its CPU stall warnings. Given a patch that did this, I would > > be quite happy to help advocate for it! > > In case you have a LKML link to a thread or bug report to this specific > cpu_stopper issue, please do pass it along. I hope to have an rcutorture-based repeat-by in a few days, give or take. (Famous last words!) 
> Happy to work on this with Byungchul, thanks, Thank you both! Thanx, Paul ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-13 17:41 ` Paul E. McKenney 2019-07-14 13:39 ` Byungchul Park @ 2019-07-18 16:14 ` Joel Fernandes 2019-07-18 16:15 ` Joel Fernandes ` (2 more replies) 1 sibling, 3 replies; 53+ messages in thread From: Joel Fernandes @ 2019-07-18 16:14 UTC (permalink / raw) To: Paul E. McKenney; +Cc: Byungchul Park, Byungchul Park, rcu, LKML, kernel-team Trimming the list a bit to keep my noise level low, On Sat, Jul 13, 2019 at 1:41 PM Paul E. McKenney <paulmck@linux.ibm.com> wrote: [snip] > > It still feels like you guys are hyperfocusing on this one particular > > > knob. I instead need you to look at the interrelating knobs as a group. > > > > Thanks for the hints, we'll do that. > > > > > On the debugging side, suppose someone gives you an RCU bug report. > > > What information will you need? How can you best get that information > > > without excessive numbers of over-and-back interactions with the guy > > > reporting the bug? As part of this last question, what information is > > > normally supplied with the bug? Alternatively, what information are > > > bug reporters normally expected to provide when asked? > > > > I suppose I could dig out some of our Android bug reports of the past where > > there were RCU issues but if there's any fires you are currently fighting do > > send it our way as debugging homework ;-) > > Suppose that you were getting RCU CPU stall > warnings featuring multi_cpu_stop() called from cpu_stopper_thread(). > Of course, this really means that some other CPU/task is holding up > multi_cpu_stop() without also blocking the current grace period. > So I took a shot at this, trying to learn how CPU stoppers work in relation to this problem. I am assuming here that, say, CPU X has entered the MULTI_STOP_DISABLE_IRQ state in multi_cpu_stop() but another CPU Y has not yet entered this state. So CPU X is stalling RCU, but it is really because of CPU Y.
Now in the problem statement, you mentioned CPU Y is not holding up the grace period, which means Y doesn't have any of IRQ, BH, or preemption disabled; but it is still somehow stalling RCU indirectly by troubling X.

This can only happen if:
- CPU Y has a thread executing on it that is higher priority than CPU X's stopper thread, which prevents it from getting scheduled. But the CPU stopper thread (migration/..) is highest-priority RT, so this would be some kind of odd scheduler bug.
- There is a bug in the CPU stopper machinery itself preventing it from scheduling the stopper on Y, even though Y is not holding up the grace period.

Did I get that right? Would be exciting to run the rcutorture test once Paul has it available to reproduce this problem.

thanks,

- Joel

^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-18 16:14 ` Joel Fernandes @ 2019-07-18 16:15 ` Joel Fernandes 2019-07-18 21:34 ` Paul E. McKenney 2019-07-19 0:39 ` Byungchul Park 2 siblings, 0 replies; 53+ messages in thread From: Joel Fernandes @ 2019-07-18 16:15 UTC (permalink / raw) To: Paul E. McKenney; +Cc: Byungchul Park, Byungchul Park, rcu, LKML, kernel-team On Thu, Jul 18, 2019 at 12:14 PM Joel Fernandes <joel@joelfernandes.org> wrote: > > Trimming the list a bit to keep my noise level low, > > On Sat, Jul 13, 2019 at 1:41 PM Paul E. McKenney <paulmck@linux.ibm.com> wrote: > [snip] > > > It still feels like you guys are hyperfocusing on this one particular > > > > knob. I instead need you to look at the interrelating knobs as a group. > > > > > > Thanks for the hints, we'll do that. > > > > > > > On the debugging side, suppose someone gives you an RCU bug report. > > > > What information will you need? How can you best get that information > > > > without excessive numbers of over-and-back interactions with the guy > > > > reporting the bug? As part of this last question, what information is > > > > normally supplied with the bug? Alternatively, what information are > > > > bug reporters normally expected to provide when asked? > > > > > > I suppose I could dig out some of our Android bug reports of the past where > > > there were RCU issues but if there's any fires you are currently fighting do > > > send it our way as debugging homework ;-) > > > > Suppose that you were getting RCU CPU stall > > warnings featuring multi_cpu_stop() called from cpu_stopper_thread(). > > Of course, this really means that some other CPU/task is holding up > > multi_cpu_stop() without also blocking the current grace period. > > > > So I took a shot at this trying to learn how CPU stoppers work in > relation to this problem. 
> > I am assuming here say CPU X has entered MULTI_STOP_DISABLE_IRQ state > in multi_cpu_stop() but another CPU Y has not yet entered this state. > So CPU X is stalling RCU but it is really because of CPU Y. Now in the > problem statement, you mentioned CPU Y is not holding up the grace > period, which means Y doesn't have any of IRQ, BH or preemption > disabled ; but is still somehow stalling RCU indirectly by troubling > X. > > This can only happen if : > - CPU Y has a thread executing on it that is higher priority than CPU > X's stopper thread which prevents it from getting scheduled. - but the Sorry, here I meant "higher priority than CPU Y's stopper thread". ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-18 16:14 ` Joel Fernandes 2019-07-18 16:15 ` Joel Fernandes @ 2019-07-18 21:34 ` Paul E. McKenney 2019-07-19 0:48 ` Joel Fernandes 2019-07-19 0:54 ` Byungchul Park 2019-07-19 0:39 ` Byungchul Park 2 siblings, 2 replies; 53+ messages in thread From: Paul E. McKenney @ 2019-07-18 21:34 UTC (permalink / raw) To: Joel Fernandes; +Cc: Byungchul Park, Byungchul Park, rcu, LKML, kernel-team On Thu, Jul 18, 2019 at 12:14:22PM -0400, Joel Fernandes wrote: > Trimming the list a bit to keep my noise level low, > > On Sat, Jul 13, 2019 at 1:41 PM Paul E. McKenney <paulmck@linux.ibm.com> wrote: > [snip] > > > It still feels like you guys are hyperfocusing on this one particular > > > > knob. I instead need you to look at the interrelating knobs as a group. > > > > > > Thanks for the hints, we'll do that. > > > > > > > On the debugging side, suppose someone gives you an RCU bug report. > > > > What information will you need? How can you best get that information > > > > without excessive numbers of over-and-back interactions with the guy > > > > reporting the bug? As part of this last question, what information is > > > > normally supplied with the bug? Alternatively, what information are > > > > bug reporters normally expected to provide when asked? > > > > > > I suppose I could dig out some of our Android bug reports of the past where > > > there were RCU issues but if there's any fires you are currently fighting do > > > send it our way as debugging homework ;-) > > > > Suppose that you were getting RCU CPU stall > > warnings featuring multi_cpu_stop() called from cpu_stopper_thread(). > > Of course, this really means that some other CPU/task is holding up > > multi_cpu_stop() without also blocking the current grace period. > > > > So I took a shot at this trying to learn how CPU stoppers work in > relation to this problem. 
> > I am assuming here say CPU X has entered MULTI_STOP_DISABLE_IRQ state > in multi_cpu_stop() but another CPU Y has not yet entered this state. > So CPU X is stalling RCU but it is really because of CPU Y. Now in the > problem statement, you mentioned CPU Y is not holding up the grace > period, which means Y doesn't have any of IRQ, BH or preemption > disabled ; but is still somehow stalling RCU indirectly by troubling > X. > > This can only happen if : > - CPU Y has a thread executing on it that is higher priority than CPU > X's stopper thread which prevents it from getting scheduled. - but the > CPU stopper thread (migration/..) is highest priority RT so this would > be some kind of an odd scheduler bug. > - There is a bug in the CPU stopper machinery itself preventing it > from scheduling the stopper on Y. Even though Y is not holding up the > grace period. - CPU Y might have already passed through its quiescent state for the current grace period, then disabled IRQs indefinitely. Now, CPU Y would block a later grace period, but CPU X is preventing the current grace period from ending, so no such later grace period can start. > Did I get that right? Would be exciting to run the rcutorture test > once Paul has it available to reproduce this problem. Working on it! Slow, I know! Thanx, Paul ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-18 21:34 ` Paul E. McKenney @ 2019-07-19 0:48 ` Joel Fernandes 2019-07-19 0:54 ` Byungchul Park 1 sibling, 0 replies; 53+ messages in thread From: Joel Fernandes @ 2019-07-19 0:48 UTC (permalink / raw) To: Paul E. McKenney; +Cc: Byungchul Park, Byungchul Park, rcu, LKML, kernel-team On Thu, Jul 18, 2019 at 02:34:19PM -0700, Paul E. McKenney wrote: > On Thu, Jul 18, 2019 at 12:14:22PM -0400, Joel Fernandes wrote: > > Trimming the list a bit to keep my noise level low, > > > > On Sat, Jul 13, 2019 at 1:41 PM Paul E. McKenney <paulmck@linux.ibm.com> wrote: > > [snip] > > > > It still feels like you guys are hyperfocusing on this one particular > > > > > knob. I instead need you to look at the interrelating knobs as a group. > > > > > > > > Thanks for the hints, we'll do that. > > > > > > > > > On the debugging side, suppose someone gives you an RCU bug report. > > > > > What information will you need? How can you best get that information > > > > > without excessive numbers of over-and-back interactions with the guy > > > > > reporting the bug? As part of this last question, what information is > > > > > normally supplied with the bug? Alternatively, what information are > > > > > bug reporters normally expected to provide when asked? > > > > > > > > I suppose I could dig out some of our Android bug reports of the past where > > > > there were RCU issues but if there's any fires you are currently fighting do > > > > send it our way as debugging homework ;-) > > > > > > Suppose that you were getting RCU CPU stall > > > warnings featuring multi_cpu_stop() called from cpu_stopper_thread(). > > > Of course, this really means that some other CPU/task is holding up > > > multi_cpu_stop() without also blocking the current grace period. > > > > > > > So I took a shot at this trying to learn how CPU stoppers work in > > relation to this problem. 
> > > > I am assuming here say CPU X has entered MULTI_STOP_DISABLE_IRQ state > > in multi_cpu_stop() but another CPU Y has not yet entered this state. > > So CPU X is stalling RCU but it is really because of CPU Y. Now in the > > problem statement, you mentioned CPU Y is not holding up the grace > > period, which means Y doesn't have any of IRQ, BH or preemption > > disabled ; but is still somehow stalling RCU indirectly by troubling > > X. > > > > This can only happen if : > > - CPU Y has a thread executing on it that is higher priority than CPU > > X's stopper thread which prevents it from getting scheduled. - but the > > CPU stopper thread (migration/..) is highest priority RT so this would > > be some kind of an odd scheduler bug. > > - There is a bug in the CPU stopper machinery itself preventing it > > from scheduling the stopper on Y. Even though Y is not holding up the > > grace period. > > - CPU Y might have already passed through its quiescent state for > the current grace period, then disabled IRQs indefinitely. > Now, CPU Y would block a later grace period, but CPU X is > preventing the current grace period from ending, so no such > later grace period can start. Ah totally possible, yes! thanks, - Joel ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-18 21:34 ` Paul E. McKenney 2019-07-19 0:48 ` Joel Fernandes @ 2019-07-19 0:54 ` Byungchul Park 1 sibling, 0 replies; 53+ messages in thread From: Byungchul Park @ 2019-07-19 0:54 UTC (permalink / raw) To: Paul E. McKenney; +Cc: Joel Fernandes, Byungchul Park, rcu, LKML, kernel-team On Thu, Jul 18, 2019 at 02:34:19PM -0700, Paul E. McKenney wrote: > On Thu, Jul 18, 2019 at 12:14:22PM -0400, Joel Fernandes wrote: > > Trimming the list a bit to keep my noise level low, > > > > On Sat, Jul 13, 2019 at 1:41 PM Paul E. McKenney <paulmck@linux.ibm.com> wrote: > > [snip] > > > > It still feels like you guys are hyperfocusing on this one particular > > > > > knob. I instead need you to look at the interrelating knobs as a group. > > > > > > > > Thanks for the hints, we'll do that. > > > > > > > > > On the debugging side, suppose someone gives you an RCU bug report. > > > > > What information will you need? How can you best get that information > > > > > without excessive numbers of over-and-back interactions with the guy > > > > > reporting the bug? As part of this last question, what information is > > > > > normally supplied with the bug? Alternatively, what information are > > > > > bug reporters normally expected to provide when asked? > > > > > > > > I suppose I could dig out some of our Android bug reports of the past where > > > > there were RCU issues but if there's any fires you are currently fighting do > > > > send it our way as debugging homework ;-) > > > > > > Suppose that you were getting RCU CPU stall > > > warnings featuring multi_cpu_stop() called from cpu_stopper_thread(). > > > Of course, this really means that some other CPU/task is holding up > > > multi_cpu_stop() without also blocking the current grace period. > > > > > > > So I took a shot at this trying to learn how CPU stoppers work in > > relation to this problem. 
> > I am assuming here say CPU X has entered MULTI_STOP_DISABLE_IRQ state > > in multi_cpu_stop() but another CPU Y has not yet entered this state. > > So CPU X is stalling RCU but it is really because of CPU Y. Now in the > > problem statement, you mentioned CPU Y is not holding up the grace > > period, which means Y doesn't have any of IRQ, BH or preemption > > disabled ; but is still somehow stalling RCU indirectly by troubling > > X. > > > > This can only happen if : > > - CPU Y has a thread executing on it that is higher priority than CPU > > X's stopper thread which prevents it from getting scheduled. - but the > > CPU stopper thread (migration/..) is highest priority RT so this would > > be some kind of an odd scheduler bug. > > - There is a bug in the CPU stopper machinery itself preventing it > > from scheduling the stopper on Y. Even though Y is not holding up the > > grace period. > > - CPU Y might have already passed through its quiescent state for > the current grace period, then disabled IRQs indefinitely. Or it disabled IRQs for longer than the period that RCU considers a stall. Or preemption was disabled for that long. Or the stopper on Y has not even been woken up inside the scheduler yet for some reason, perhaps locks. > Now, CPU Y would block a later grace period, but CPU X is > preventing the current grace period from ending, so no such > later grace period can start. > > > Did I get that right? Would be exciting to run the rcutorture test > > once Paul has it available to reproduce this problem. > > Working on it! Slow, I know! > > Thanx, Paul ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-18 16:14 ` Joel Fernandes 2019-07-18 16:15 ` Joel Fernandes 2019-07-18 21:34 ` Paul E. McKenney @ 2019-07-19 0:39 ` Byungchul Park 2019-07-19 0:52 ` Joel Fernandes 2 siblings, 1 reply; 53+ messages in thread From: Byungchul Park @ 2019-07-19 0:39 UTC (permalink / raw) To: Joel Fernandes; +Cc: Paul E. McKenney, Byungchul Park, rcu, LKML, kernel-team On Thu, Jul 18, 2019 at 12:14:22PM -0400, Joel Fernandes wrote: > Trimming the list a bit to keep my noise level low, > > On Sat, Jul 13, 2019 at 1:41 PM Paul E. McKenney <paulmck@linux.ibm.com> wrote: > [snip] > > > It still feels like you guys are hyperfocusing on this one particular > > > > knob. I instead need you to look at the interrelating knobs as a group. > > > > > > Thanks for the hints, we'll do that. > > > > > > > On the debugging side, suppose someone gives you an RCU bug report. > > > > What information will you need? How can you best get that information > > > > without excessive numbers of over-and-back interactions with the guy > > > > reporting the bug? As part of this last question, what information is > > > > normally supplied with the bug? Alternatively, what information are > > > > bug reporters normally expected to provide when asked? > > > > > > I suppose I could dig out some of our Android bug reports of the past where > > > there were RCU issues but if there's any fires you are currently fighting do > > > send it our way as debugging homework ;-) > > > > Suppose that you were getting RCU CPU stall > > warnings featuring multi_cpu_stop() called from cpu_stopper_thread(). > > Of course, this really means that some other CPU/task is holding up > > multi_cpu_stop() without also blocking the current grace period. > > > > So I took a shot at this trying to learn how CPU stoppers work in > relation to this problem. 
> > I am assuming here say CPU X has entered MULTI_STOP_DISABLE_IRQ state > in multi_cpu_stop() but another CPU Y has not yet entered this state. > So CPU X is stalling RCU but it is really because of CPU Y. Now in the > problem statement, you mentioned CPU Y is not holding up the grace > period, which means Y doesn't have any of IRQ, BH or preemption > disabled ; but is still somehow stalling RCU indirectly by troubling > X. > > This can only happen if : > - CPU Y has a thread executing on it that is higher priority than CPU > X's stopper thread which prevents it from getting scheduled. - but the > CPU stopper thread (migration/..) is highest priority RT so this would > be some kind of an odd scheduler bug. I think this bug can hardly happen. > - There is a bug in the CPU stopper machinery itself preventing it > from scheduling the stopper on Y. Even though Y is not holding up the > grace period. Or any thread on Y is busy with preemption/irq disabled, preventing the stopper from being scheduled on Y. Or something is stuck in ttwu() to wake up the stopper on Y due to any scheduler locks such as pi_lock or rq->lock or something. I think what you mentioned can happen easily. Basically we would need information about preemption/irq disabled sections on Y and the scheduler's current activity on every cpu at that time. Am I missing something? > Did I get that right? Would be exciting to run the rcutorture test > once Paul has it available to reproduce this problem. Thanks, Byungchul ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-19 0:39 ` Byungchul Park @ 2019-07-19 0:52 ` Joel Fernandes 2019-07-19 1:10 ` Byungchul Park 2019-07-19 7:43 ` Paul E. McKenney 0 siblings, 2 replies; 53+ messages in thread From: Joel Fernandes @ 2019-07-19 0:52 UTC (permalink / raw) To: Byungchul Park; +Cc: Paul E. McKenney, Byungchul Park, rcu, LKML, kernel-team On Thu, Jul 18, 2019 at 8:40 PM Byungchul Park <byungchul.park@lge.com> wrote: [snip] > > - There is a bug in the CPU stopper machinery itself preventing it > > from scheduling the stopper on Y. Even though Y is not holding up the > > grace period. > > Or any thread on Y is busy with preemption/irq disabled preventing the > stopper from being scheduled on Y. > > Or something is stuck in ttwu() to wake up the stopper on Y due to any > scheduler locks such as pi_lock or rq->lock or something. > > I think what you mentioned can happen easily. > > Basically we would need information about preemption/irq disabled > sections on Y and scheduler's current activity on every cpu at that time. I think all that's needed is an NMI backtrace on all CPUs. On ARM we don't have NMI solutions and only IPI or interrupt based backtrace works, which should at least catch the preempt disable and softirq disable cases. But yeah I don't see why just the stacks of those CPUs that are blocking the CPU X would not suffice for the trivial cases where a piece of misbehaving code disables interrupts / preemption and prevents the stopper thread from executing. Maybe once the test case is ready (no rush!), then it will be more clear what can help. J. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-19 0:52 ` Joel Fernandes @ 2019-07-19 1:10 ` Byungchul Park 2019-07-19 7:43 ` Paul E. McKenney 0 siblings, 0 replies; 53+ messages in thread From: Byungchul Park @ 2019-07-19 1:10 UTC (permalink / raw) To: Joel Fernandes; +Cc: Paul E. McKenney, Byungchul Park, rcu, LKML, kernel-team On Thu, Jul 18, 2019 at 08:52:52PM -0400, Joel Fernandes wrote: > On Thu, Jul 18, 2019 at 8:40 PM Byungchul Park <byungchul.park@lge.com> wrote: > [snip] > > > - There is a bug in the CPU stopper machinery itself preventing it > > > from scheduling the stopper on Y. Even though Y is not holding up the > > > grace period. > > > > Or any thread on Y is busy with preemption/irq disabled preventing the > > stopper from being scheduled on Y. > > > > Or something is stuck in ttwu() to wake up the stopper on Y due to any > > scheduler locks such as pi_lock or rq->lock or something. > > > > I think what you mentioned can happen easily. > > > > Basically we would need information about preemption/irq disabled > > sections on Y and scheduler's current activity on every cpu at that time. > > I think all that's needed is an NMI backtrace on all CPUs. An ARM we > don't have NMI solutions and only IPI or interrupt based backtrace > works which should at least catch and the preempt disable and softirq > disable cases. > > But yeah I don't see why just the stacks of those CPUs that are > blocking the CPU X would not suffice for the trivial cases where a > piece of misbehaving code disable interrupts / preemption and > prevented the stopper thread from executing. Right. So it makes it more interesting though! :-) > May be once the test case is ready (no rush!) , then it will be more > clear what can help. Yes. I'm really happy to help get things about RCU, which I love, fixed or improved. And with the test case or a real issue, I believe I can do more helpful work. Looking forward to it, too (no rush!).
Thanks, Byungchul ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-19 0:52 ` Joel Fernandes 2019-07-19 1:10 ` Byungchul Park @ 2019-07-19 7:43 ` Paul E. McKenney 2019-07-19 9:57 ` Byungchul Park 1 sibling, 1 reply; 53+ messages in thread From: Paul E. McKenney @ 2019-07-19 7:43 UTC (permalink / raw) To: Joel Fernandes; +Cc: Byungchul Park, Byungchul Park, rcu, LKML, kernel-team On Thu, Jul 18, 2019 at 08:52:52PM -0400, Joel Fernandes wrote: > On Thu, Jul 18, 2019 at 8:40 PM Byungchul Park <byungchul.park@lge.com> wrote: > [snip] > > > - There is a bug in the CPU stopper machinery itself preventing it > > > from scheduling the stopper on Y. Even though Y is not holding up the > > > grace period. > > > > Or any thread on Y is busy with preemption/irq disabled preventing the > > stopper from being scheduled on Y. > > > > Or something is stuck in ttwu() to wake up the stopper on Y due to any > > scheduler locks such as pi_lock or rq->lock or something. > > > > I think what you mentioned can happen easily. > > > > Basically we would need information about preemption/irq disabled > > sections on Y and scheduler's current activity on every cpu at that time. > > I think all that's needed is an NMI backtrace on all CPUs. An ARM we > don't have NMI solutions and only IPI or interrupt based backtrace > works which should at least catch and the preempt disable and softirq > disable cases. True, though people with systems having hundreds of CPUs might not thank you for forcing an NMI backtrace on each of them. Is it possible to NMI only the ones that are holding up the CPU stopper? Thanx, Paul > But yeah I don't see why just the stacks of those CPUs that are > blocking the CPU X would not suffice for the trivial cases where a > piece of misbehaving code disable interrupts / preemption and > prevented the stopper thread from executing. > > May be once the test case is ready (no rush!) , then it will be more > clear what can help. > > J. 
> ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-19 7:43 ` Paul E. McKenney @ 2019-07-19 9:57 ` Byungchul Park 2019-07-19 19:57 ` Paul E. McKenney 0 siblings, 1 reply; 53+ messages in thread From: Byungchul Park @ 2019-07-19 9:57 UTC (permalink / raw) To: Paul E. McKenney; +Cc: Joel Fernandes, Byungchul Park, rcu, LKML, kernel-team On Fri, Jul 19, 2019 at 4:43 PM Paul E. McKenney <paulmck@linux.ibm.com> wrote: > > On Thu, Jul 18, 2019 at 08:52:52PM -0400, Joel Fernandes wrote: > > On Thu, Jul 18, 2019 at 8:40 PM Byungchul Park <byungchul.park@lge.com> wrote: > > [snip] > > > > - There is a bug in the CPU stopper machinery itself preventing it > > > > from scheduling the stopper on Y. Even though Y is not holding up the > > > > grace period. > > > > > > Or any thread on Y is busy with preemption/irq disabled preventing the > > > stopper from being scheduled on Y. > > > > > > Or something is stuck in ttwu() to wake up the stopper on Y due to any > > > scheduler locks such as pi_lock or rq->lock or something. > > > > > > I think what you mentioned can happen easily. > > > > > > Basically we would need information about preemption/irq disabled > > > sections on Y and scheduler's current activity on every cpu at that time. > > > > I think all that's needed is an NMI backtrace on all CPUs. An ARM we > > don't have NMI solutions and only IPI or interrupt based backtrace > > works which should at least catch and the preempt disable and softirq > > disable cases. > > True, though people with systems having hundreds of CPUs might not > thank you for forcing an NMI backtrace on each of them. Is it possible > to NMI only the ones that are holding up the CPU stopper? What a good idea! I think it's possible! But we need to think about the case NMI doesn't work when the holding-up was caused by IRQ disabled. Though it's just around the corner of weekend, I will keep thinking on it during weekend! 
Thanks, Byungchul > Thanx, Paul > > > But yeah I don't see why just the stacks of those CPUs that are > > blocking the CPU X would not suffice for the trivial cases where a > > piece of misbehaving code disable interrupts / preemption and > > prevented the stopper thread from executing. > > > > May be once the test case is ready (no rush!) , then it will be more > > clear what can help. > > > > J. > > -- Thanks, Byungchul ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-19 9:57 ` Byungchul Park @ 2019-07-19 19:57 ` Paul E. McKenney 2019-07-19 20:33 ` Joel Fernandes 0 siblings, 1 reply; 53+ messages in thread From: Paul E. McKenney @ 2019-07-19 19:57 UTC (permalink / raw) To: Byungchul Park; +Cc: Joel Fernandes, Byungchul Park, rcu, LKML, kernel-team On Fri, Jul 19, 2019 at 06:57:58PM +0900, Byungchul Park wrote: > On Fri, Jul 19, 2019 at 4:43 PM Paul E. McKenney <paulmck@linux.ibm.com> wrote: > > > > On Thu, Jul 18, 2019 at 08:52:52PM -0400, Joel Fernandes wrote: > > > On Thu, Jul 18, 2019 at 8:40 PM Byungchul Park <byungchul.park@lge.com> wrote: > > > [snip] > > > > > - There is a bug in the CPU stopper machinery itself preventing it > > > > > from scheduling the stopper on Y. Even though Y is not holding up the > > > > > grace period. > > > > > > > > Or any thread on Y is busy with preemption/irq disabled preventing the > > > > stopper from being scheduled on Y. > > > > > > > > Or something is stuck in ttwu() to wake up the stopper on Y due to any > > > > scheduler locks such as pi_lock or rq->lock or something. > > > > > > > > I think what you mentioned can happen easily. > > > > > > > > Basically we would need information about preemption/irq disabled > > > > sections on Y and scheduler's current activity on every cpu at that time. > > > > > > I think all that's needed is an NMI backtrace on all CPUs. An ARM we > > > don't have NMI solutions and only IPI or interrupt based backtrace > > > works which should at least catch and the preempt disable and softirq > > > disable cases. > > > > True, though people with systems having hundreds of CPUs might not > > thank you for forcing an NMI backtrace on each of them. Is it possible > > to NMI only the ones that are holding up the CPU stopper? > > What a good idea! I think it's possible! > > But we need to think about the case NMI doesn't work when the > holding-up was caused by IRQ disabled. 
> > Though it's just around the corner of weekend, I will keep thinking > on it during weekend! Very good! Thanx, Paul > Thanks, > Byungchul > > > Thanx, Paul > > > > > But yeah I don't see why just the stacks of those CPUs that are > > > blocking the CPU X would not suffice for the trivial cases where a > > > piece of misbehaving code disable interrupts / preemption and > > > prevented the stopper thread from executing. > > > > > > May be once the test case is ready (no rush!) , then it will be more > > > clear what can help. > > > > > > J. > > > > > > > -- > Thanks, > Byungchul > ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-19 19:57 ` Paul E. McKenney @ 2019-07-19 20:33 ` Joel Fernandes 2019-07-23 11:05 ` Byungchul Park 0 siblings, 1 reply; 53+ messages in thread From: Joel Fernandes @ 2019-07-19 20:33 UTC (permalink / raw) To: Paul E. McKenney; +Cc: Byungchul Park, Byungchul Park, rcu, LKML, kernel-team On Fri, Jul 19, 2019 at 3:57 PM Paul E. McKenney <paulmck@linux.ibm.com> wrote: > > On Fri, Jul 19, 2019 at 06:57:58PM +0900, Byungchul Park wrote: > > On Fri, Jul 19, 2019 at 4:43 PM Paul E. McKenney <paulmck@linux.ibm.com> wrote: > > > > > > On Thu, Jul 18, 2019 at 08:52:52PM -0400, Joel Fernandes wrote: > > > > On Thu, Jul 18, 2019 at 8:40 PM Byungchul Park <byungchul.park@lge.com> wrote: > > > > [snip] > > > > > > - There is a bug in the CPU stopper machinery itself preventing it > > > > > > from scheduling the stopper on Y. Even though Y is not holding up the > > > > > > grace period. > > > > > > > > > > Or any thread on Y is busy with preemption/irq disabled preventing the > > > > > stopper from being scheduled on Y. > > > > > > > > > > Or something is stuck in ttwu() to wake up the stopper on Y due to any > > > > > scheduler locks such as pi_lock or rq->lock or something. > > > > > > > > > > I think what you mentioned can happen easily. > > > > > > > > > > Basically we would need information about preemption/irq disabled > > > > > sections on Y and scheduler's current activity on every cpu at that time. > > > > > > > > I think all that's needed is an NMI backtrace on all CPUs. An ARM we > > > > don't have NMI solutions and only IPI or interrupt based backtrace > > > > works which should at least catch and the preempt disable and softirq > > > > disable cases. > > > > > > True, though people with systems having hundreds of CPUs might not > > > thank you for forcing an NMI backtrace on each of them. Is it possible > > > to NMI only the ones that are holding up the CPU stopper? > > > > What a good idea! 
I think it's possible! > > > > But we need to think about the case NMI doesn't work when the > > holding-up was caused by IRQ disabled. > > > > Though it's just around the corner of weekend, I will keep thinking > > on it during weekend! > > Very good! Me too will think more about it ;-) Agreed with point about 100s of CPUs usecase, Thanks, have a great weekend, - Joel ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-19 20:33 ` Joel Fernandes @ 2019-07-23 11:05 ` Byungchul Park 2019-07-23 13:47 ` Paul E. McKenney 0 siblings, 1 reply; 53+ messages in thread From: Byungchul Park @ 2019-07-23 11:05 UTC (permalink / raw) To: Joel Fernandes; +Cc: Paul E. McKenney, Byungchul Park, rcu, LKML, kernel-team On Fri, Jul 19, 2019 at 04:33:56PM -0400, Joel Fernandes wrote: > On Fri, Jul 19, 2019 at 3:57 PM Paul E. McKenney <paulmck@linux.ibm.com> wrote: > > > > On Fri, Jul 19, 2019 at 06:57:58PM +0900, Byungchul Park wrote: > > > On Fri, Jul 19, 2019 at 4:43 PM Paul E. McKenney <paulmck@linux.ibm.com> wrote: > > > > > > > > On Thu, Jul 18, 2019 at 08:52:52PM -0400, Joel Fernandes wrote: > > > > > On Thu, Jul 18, 2019 at 8:40 PM Byungchul Park <byungchul.park@lge.com> wrote: > > > > > [snip] > > > > > > > - There is a bug in the CPU stopper machinery itself preventing it > > > > > > > from scheduling the stopper on Y. Even though Y is not holding up the > > > > > > > grace period. > > > > > > > > > > > > Or any thread on Y is busy with preemption/irq disabled preventing the > > > > > > stopper from being scheduled on Y. > > > > > > > > > > > > Or something is stuck in ttwu() to wake up the stopper on Y due to any > > > > > > scheduler locks such as pi_lock or rq->lock or something. > > > > > > > > > > > > I think what you mentioned can happen easily. > > > > > > > > > > > > Basically we would need information about preemption/irq disabled > > > > > > sections on Y and scheduler's current activity on every cpu at that time. > > > > > > > > > > I think all that's needed is an NMI backtrace on all CPUs. An ARM we > > > > > don't have NMI solutions and only IPI or interrupt based backtrace > > > > > works which should at least catch and the preempt disable and softirq > > > > > disable cases. 
> > > > > > > > True, though people with systems having hundreds of CPUs might not > > > > thank you for forcing an NMI backtrace on each of them. Is it possible > > > > to NMI only the ones that are holding up the CPU stopper? > > > > > > What a good idea! I think it's possible! > > > > > > But we need to think about the case NMI doesn't work when the > > > holding-up was caused by IRQ disabled. > > > > > > Though it's just around the corner of weekend, I will keep thinking > > > on it during weekend! > > > > Very good! > > Me too will think more about it ;-) Agreed with point about 100s of > CPUs usecase, > > Thanks, have a great weekend, BTW, if there's any long code section with irq/preemption disabled, then the problem would not only be about RCU stalls. And we can also use the latency tracer or something to detect the bad situation. So in this case, sending an IPI/NMI to the CPUs where the stoppers cannot be scheduled does not give us additional meaningful information. I think Paul started to think about this to solve some real problem. I seriously love to help RCU and it's my pleasure to dig deep into this kind of RCU stuff, but I've yet to define exactly what the problem is. Sorry. Could you share the real issue? I think you don't have to reproduce it. Just sharing the issue that inspired you is enough. Then I might be able to develop 'how' with Joel! :-) It's our pleasure! Thanks, Byungchul ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-23 11:05 ` Byungchul Park @ 2019-07-23 13:47 ` Paul E. McKenney 2019-07-23 16:54 ` Paul E. McKenney 2019-07-24 7:59 ` Byungchul Park 0 siblings, 2 replies; 53+ messages in thread From: Paul E. McKenney @ 2019-07-23 13:47 UTC (permalink / raw) To: Byungchul Park; +Cc: Joel Fernandes, Byungchul Park, rcu, LKML, kernel-team On Tue, Jul 23, 2019 at 08:05:21PM +0900, Byungchul Park wrote: > On Fri, Jul 19, 2019 at 04:33:56PM -0400, Joel Fernandes wrote: > > On Fri, Jul 19, 2019 at 3:57 PM Paul E. McKenney <paulmck@linux.ibm.com> wrote: > > > > > > On Fri, Jul 19, 2019 at 06:57:58PM +0900, Byungchul Park wrote: > > > > On Fri, Jul 19, 2019 at 4:43 PM Paul E. McKenney <paulmck@linux.ibm.com> wrote: > > > > > > > > > > On Thu, Jul 18, 2019 at 08:52:52PM -0400, Joel Fernandes wrote: > > > > > > On Thu, Jul 18, 2019 at 8:40 PM Byungchul Park <byungchul.park@lge.com> wrote: > > > > > > [snip] > > > > > > > > - There is a bug in the CPU stopper machinery itself preventing it > > > > > > > > from scheduling the stopper on Y. Even though Y is not holding up the > > > > > > > > grace period. > > > > > > > > > > > > > > Or any thread on Y is busy with preemption/irq disabled preventing the > > > > > > > stopper from being scheduled on Y. > > > > > > > > > > > > > > Or something is stuck in ttwu() to wake up the stopper on Y due to any > > > > > > > scheduler locks such as pi_lock or rq->lock or something. > > > > > > > > > > > > > > I think what you mentioned can happen easily. > > > > > > > > > > > > > > Basically we would need information about preemption/irq disabled > > > > > > > sections on Y and scheduler's current activity on every cpu at that time. > > > > > > > > > > > > I think all that's needed is an NMI backtrace on all CPUs. 
An ARM we > > > > > > don't have NMI solutions and only IPI or interrupt based backtrace > > > > > > works which should at least catch and the preempt disable and softirq > > > > > > disable cases. > > > > > > > > > > True, though people with systems having hundreds of CPUs might not > > > > > thank you for forcing an NMI backtrace on each of them. Is it possible > > > > > to NMI only the ones that are holding up the CPU stopper? > > > > > > > > What a good idea! I think it's possible! > > > > > > > > But we need to think about the case NMI doesn't work when the > > > > holding-up was caused by IRQ disabled. > > > > > > > > Though it's just around the corner of weekend, I will keep thinking > > > > on it during weekend! > > > > > > Very good! > > > > Me too will think more about it ;-) Agreed with point about 100s of > > CPUs usecase, > > > > Thanks, have a great weekend, > > BTW, if there's any long code section with irq/preemption disabled, then > the problem would be not only about RCU stall. And we can also use > latency tracer or something to detect the bad situation. > > So in this case, sending ipi/nmi to the CPUs where the stoppers cannot > to be scheduled does not give us additional meaningful information. > > I think Paul started to think about this to solve some real problem. I > seriously love to help RCU and it's my pleasure to dig deep into kind of > RCU stuff, but I've yet to define exactly what problem is. Sorry. > > Could you share the real issue? I think you don't have to reproduce it. > Just sharing the issue that you got inspired from is enough. Then I > might be able to develop 'how' with Joel! :-) It's our pleasure! It is unfortunately quite intermittent. I was hoping to find a way to make it happen more often. Part of the underlying problem appears to be lock contention, in that reducing contention made it even more intermittent. Which is good in general, but not for exercising the CPU-stopper issue. 
But perhaps your hardware will make this happen more readily than does mine. The repeat-by is simple, namely run TREE04 on branch "dev" on an eight-CPU system. It appears that the number of CPUs used by the test should match the number available on the system that you are running on, though perhaps affinity could allow mismatches.

So why not try it and see what happens?

Thanx, Paul

^ permalink raw reply [flat|nested] 53+ messages in thread
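Paul's repeat-by can be scripted roughly as below. This is a sketch assuming a mainline kernel tree checked out on his "dev" branch; the kvm.sh path and flags are the usual rcutorture-harness ones, shown commented out so the snippet is safe to run anywhere:

```shell
# Sketch of the repeat-by above: run the TREE04 rcutorture scenario,
# matching the scenario's CPU count to the machine as suggested.
ncpus=$(nproc)
echo "online CPUs: $ncpus"
# From the top of a kernel tree on the "dev" branch:
# tools/testing/selftests/rcutorture/bin/kvm.sh --cpus "$ncpus" \
#         --configs TREE04 --duration 30
```

The `--duration` value is an arbitrary choice here; given how intermittent the failure is, longer runs would presumably be needed.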
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-23 13:47 ` Paul E. McKenney @ 2019-07-23 16:54 ` Paul E. McKenney 2019-07-24 7:58 ` Byungchul Park 2019-07-24 7:59 ` Byungchul Park 1 sibling, 1 reply; 53+ messages in thread From: Paul E. McKenney @ 2019-07-23 16:54 UTC (permalink / raw) To: Byungchul Park; +Cc: Joel Fernandes, Byungchul Park, rcu, LKML, kernel-team On Tue, Jul 23, 2019 at 06:47:17AM -0700, Paul E. McKenney wrote: > On Tue, Jul 23, 2019 at 08:05:21PM +0900, Byungchul Park wrote: > > On Fri, Jul 19, 2019 at 04:33:56PM -0400, Joel Fernandes wrote: > > > On Fri, Jul 19, 2019 at 3:57 PM Paul E. McKenney <paulmck@linux.ibm.com> wrote: > > > > > > > > On Fri, Jul 19, 2019 at 06:57:58PM +0900, Byungchul Park wrote: > > > > > On Fri, Jul 19, 2019 at 4:43 PM Paul E. McKenney <paulmck@linux.ibm.com> wrote: > > > > > > > > > > > > On Thu, Jul 18, 2019 at 08:52:52PM -0400, Joel Fernandes wrote: > > > > > > > On Thu, Jul 18, 2019 at 8:40 PM Byungchul Park <byungchul.park@lge.com> wrote: > > > > > > > [snip] > > > > > > > > > - There is a bug in the CPU stopper machinery itself preventing it > > > > > > > > > from scheduling the stopper on Y. Even though Y is not holding up the > > > > > > > > > grace period. > > > > > > > > > > > > > > > > Or any thread on Y is busy with preemption/irq disabled preventing the > > > > > > > > stopper from being scheduled on Y. > > > > > > > > > > > > > > > > Or something is stuck in ttwu() to wake up the stopper on Y due to any > > > > > > > > scheduler locks such as pi_lock or rq->lock or something. > > > > > > > > > > > > > > > > I think what you mentioned can happen easily. > > > > > > > > > > > > > > > > Basically we would need information about preemption/irq disabled > > > > > > > > sections on Y and scheduler's current activity on every cpu at that time. > > > > > > > > > > > > > > I think all that's needed is an NMI backtrace on all CPUs. 
An ARM we > > > > > > > don't have NMI solutions and only IPI or interrupt based backtrace > > > > > > > works which should at least catch and the preempt disable and softirq > > > > > > > disable cases. > > > > > > > > > > > > True, though people with systems having hundreds of CPUs might not > > > > > > thank you for forcing an NMI backtrace on each of them. Is it possible > > > > > > to NMI only the ones that are holding up the CPU stopper? > > > > > > > > > > What a good idea! I think it's possible! > > > > > > > > > > But we need to think about the case NMI doesn't work when the > > > > > holding-up was caused by IRQ disabled. > > > > > > > > > > Though it's just around the corner of weekend, I will keep thinking > > > > > on it during weekend! > > > > > > > > Very good! > > > > > > Me too will think more about it ;-) Agreed with point about 100s of > > > CPUs usecase, > > > > > > Thanks, have a great weekend, > > > > BTW, if there's any long code section with irq/preemption disabled, then > > the problem would be not only about RCU stall. And we can also use > > latency tracer or something to detect the bad situation. > > > > So in this case, sending ipi/nmi to the CPUs where the stoppers cannot > > to be scheduled does not give us additional meaningful information. > > > > I think Paul started to think about this to solve some real problem. I > > seriously love to help RCU and it's my pleasure to dig deep into kind of > > RCU stuff, but I've yet to define exactly what problem is. Sorry. > > > > Could you share the real issue? I think you don't have to reproduce it. > > Just sharing the issue that you got inspired from is enough. Then I > > might be able to develop 'how' with Joel! :-) It's our pleasure! > > It is unfortunately quite intermittent. I was hoping to find a way > to make it happen more often. Part of the underlying problem appears > to be lock contention, in that reducing contention made it even more > intermittent. 
Which is good in general, but not for exercising the > CPU-stopper issue. > > But perhaps your hardware will make this happen more readily than does > mine. The repeat-by is simple, namely run TREE04 on branch "dev" on an > eight-CPU system. It appear that the number of CPUs used by the test > should match the number available on the system that you are running on, > though perhaps affinity could allow mismatches. > > So why not try it and see what happens? And another potential issue causing this is a CONFIG_NO_HZ_FULL=y kernel running in kernel mode (rcutorture on the one hand and callback invocation on the other) for extended periods of time with the scheduling clock disabled. Just started the tests for this. They will be running for quite some time, which this week is a good thing. ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-23 16:54 ` Paul E. McKenney @ 2019-07-24 7:58 ` Byungchul Park 0 siblings, 0 replies; 53+ messages in thread From: Byungchul Park @ 2019-07-24 7:58 UTC (permalink / raw) To: Paul E. McKenney; +Cc: Joel Fernandes, Byungchul Park, rcu, LKML, kernel-team On Tue, Jul 23, 2019 at 09:54:03AM -0700, Paul E. McKenney wrote: > On Tue, Jul 23, 2019 at 06:47:17AM -0700, Paul E. McKenney wrote: > > On Tue, Jul 23, 2019 at 08:05:21PM +0900, Byungchul Park wrote: > > > On Fri, Jul 19, 2019 at 04:33:56PM -0400, Joel Fernandes wrote: > > > > On Fri, Jul 19, 2019 at 3:57 PM Paul E. McKenney <paulmck@linux.ibm.com> wrote: > > > > > > > > > > On Fri, Jul 19, 2019 at 06:57:58PM +0900, Byungchul Park wrote: > > > > > > On Fri, Jul 19, 2019 at 4:43 PM Paul E. McKenney <paulmck@linux.ibm.com> wrote: > > > > > > > > > > > > > > On Thu, Jul 18, 2019 at 08:52:52PM -0400, Joel Fernandes wrote: > > > > > > > > On Thu, Jul 18, 2019 at 8:40 PM Byungchul Park <byungchul.park@lge.com> wrote: > > > > > > > > [snip] > > > > > > > > > > - There is a bug in the CPU stopper machinery itself preventing it > > > > > > > > > > from scheduling the stopper on Y. Even though Y is not holding up the > > > > > > > > > > grace period. > > > > > > > > > > > > > > > > > > Or any thread on Y is busy with preemption/irq disabled preventing the > > > > > > > > > stopper from being scheduled on Y. > > > > > > > > > > > > > > > > > > Or something is stuck in ttwu() to wake up the stopper on Y due to any > > > > > > > > > scheduler locks such as pi_lock or rq->lock or something. > > > > > > > > > > > > > > > > > > I think what you mentioned can happen easily. > > > > > > > > > > > > > > > > > > Basically we would need information about preemption/irq disabled > > > > > > > > > sections on Y and scheduler's current activity on every cpu at that time. > > > > > > > > > > > > > > > > I think all that's needed is an NMI backtrace on all CPUs. 
An ARM we > > > > > > > > don't have NMI solutions and only IPI or interrupt based backtrace > > > > > > > > works which should at least catch and the preempt disable and softirq > > > > > > > > disable cases. > > > > > > > > > > > > > > True, though people with systems having hundreds of CPUs might not > > > > > > > thank you for forcing an NMI backtrace on each of them. Is it possible > > > > > > > to NMI only the ones that are holding up the CPU stopper? > > > > > > > > > > > > What a good idea! I think it's possible! > > > > > > > > > > > > But we need to think about the case NMI doesn't work when the > > > > > > holding-up was caused by IRQ disabled. > > > > > > > > > > > > Though it's just around the corner of weekend, I will keep thinking > > > > > > on it during weekend! > > > > > > > > > > Very good! > > > > > > > > Me too will think more about it ;-) Agreed with point about 100s of > > > > CPUs usecase, > > > > > > > > Thanks, have a great weekend, > > > > > > BTW, if there's any long code section with irq/preemption disabled, then > > > the problem would be not only about RCU stall. And we can also use > > > latency tracer or something to detect the bad situation. > > > > > > So in this case, sending ipi/nmi to the CPUs where the stoppers cannot > > > to be scheduled does not give us additional meaningful information. > > > > > > I think Paul started to think about this to solve some real problem. I > > > seriously love to help RCU and it's my pleasure to dig deep into kind of > > > RCU stuff, but I've yet to define exactly what problem is. Sorry. > > > > > > Could you share the real issue? I think you don't have to reproduce it. > > > Just sharing the issue that you got inspired from is enough. Then I > > > might be able to develop 'how' with Joel! :-) It's our pleasure! > > > > It is unfortunately quite intermittent. I was hoping to find a way > > to make it happen more often. 
Part of the underlying problem appears > > to be lock contention, in that reducing contention made it even more > > intermittent. Which is good in general, but not for exercising the > > CPU-stopper issue. > > > > But perhaps your hardware will make this happen more readily than does > > mine. The repeat-by is simple, namely run TREE04 on branch "dev" on an > > eight-CPU system. It appear that the number of CPUs used by the test > > should match the number available on the system that you are running on, > > though perhaps affinity could allow mismatches. > > > > So why not try it and see what happens? > > And another potential issue causing this is a CONFIG_NO_HZ_FULL=y I see. This provides more insight into the problem. Thanks, Byungchul > kernel running in kernel mode (rcutorture on the one hand and callback > invocation on the other) for extended periods of time with the scheduling > clock disabled. Just started the tests for this. They will be running > for quite some time, which this week is a good thing. ;-) > > Thanx, Paul ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-23 13:47 ` Paul E. McKenney 2019-07-23 16:54 ` Paul E. McKenney @ 2019-07-24 7:59 ` Byungchul Park 1 sibling, 0 replies; 53+ messages in thread From: Byungchul Park @ 2019-07-24 7:59 UTC (permalink / raw) To: Paul E. McKenney; +Cc: Joel Fernandes, Byungchul Park, rcu, LKML, kernel-team On Tue, Jul 23, 2019 at 06:47:17AM -0700, Paul E. McKenney wrote: > On Tue, Jul 23, 2019 at 08:05:21PM +0900, Byungchul Park wrote: > > On Fri, Jul 19, 2019 at 04:33:56PM -0400, Joel Fernandes wrote: > > > On Fri, Jul 19, 2019 at 3:57 PM Paul E. McKenney <paulmck@linux.ibm.com> wrote: > > > > > > > > On Fri, Jul 19, 2019 at 06:57:58PM +0900, Byungchul Park wrote: > > > > > On Fri, Jul 19, 2019 at 4:43 PM Paul E. McKenney <paulmck@linux.ibm.com> wrote: > > > > > > > > > > > > On Thu, Jul 18, 2019 at 08:52:52PM -0400, Joel Fernandes wrote: > > > > > > > On Thu, Jul 18, 2019 at 8:40 PM Byungchul Park <byungchul.park@lge.com> wrote: > > > > > > > [snip] > > > > > > > > > - There is a bug in the CPU stopper machinery itself preventing it > > > > > > > > > from scheduling the stopper on Y. Even though Y is not holding up the > > > > > > > > > grace period. > > > > > > > > > > > > > > > > Or any thread on Y is busy with preemption/irq disabled preventing the > > > > > > > > stopper from being scheduled on Y. > > > > > > > > > > > > > > > > Or something is stuck in ttwu() to wake up the stopper on Y due to any > > > > > > > > scheduler locks such as pi_lock or rq->lock or something. > > > > > > > > > > > > > > > > I think what you mentioned can happen easily. > > > > > > > > > > > > > > > > Basically we would need information about preemption/irq disabled > > > > > > > > sections on Y and scheduler's current activity on every cpu at that time. > > > > > > > > > > > > > > I think all that's needed is an NMI backtrace on all CPUs. 
An ARM we > > > > > > > don't have NMI solutions and only IPI or interrupt based backtrace > > > > > > > works which should at least catch and the preempt disable and softirq > > > > > > > disable cases. > > > > > > > > > > > > True, though people with systems having hundreds of CPUs might not > > > > > > thank you for forcing an NMI backtrace on each of them. Is it possible > > > > > > to NMI only the ones that are holding up the CPU stopper? > > > > > > > > > > What a good idea! I think it's possible! > > > > > > > > > > But we need to think about the case NMI doesn't work when the > > > > > holding-up was caused by IRQ disabled. > > > > > > > > > > Though it's just around the corner of weekend, I will keep thinking > > > > > on it during weekend! > > > > > > > > Very good! > > > > > > Me too will think more about it ;-) Agreed with point about 100s of > > > CPUs usecase, > > > > > > Thanks, have a great weekend, > > > > BTW, if there's any long code section with irq/preemption disabled, then > > the problem would be not only about RCU stall. And we can also use > > latency tracer or something to detect the bad situation. > > > > So in this case, sending ipi/nmi to the CPUs where the stoppers cannot > > to be scheduled does not give us additional meaningful information. > > > > I think Paul started to think about this to solve some real problem. I > > seriously love to help RCU and it's my pleasure to dig deep into kind of > > RCU stuff, but I've yet to define exactly what problem is. Sorry. > > > > Could you share the real issue? I think you don't have to reproduce it. > > Just sharing the issue that you got inspired from is enough. Then I > > might be able to develop 'how' with Joel! :-) It's our pleasure! > > It is unfortunately quite intermittent. I was hoping to find a way > to make it happen more often. Part of the underlying problem appears > to be lock contention, in that reducing contention made it even more > intermittent. 
Which is good in general, but not for exercising the > CPU-stopper issue. > > But perhaps your hardware will make this happen more readily than does > mine. The repeat-by is simple, namely run TREE04 on branch "dev" on an > eight-CPU system. It appear that the number of CPUs used by the test > should match the number available on the system that you are running on, > though perhaps affinity could allow mismatches. > > So why not try it and see what happens? Thank you. I'll try it too. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-11 19:58 ` Joel Fernandes 2019-07-12 6:32 ` Byungchul Park @ 2019-07-12 13:01 ` Paul E. McKenney 2019-07-12 13:40 ` Joel Fernandes 1 sibling, 1 reply; 53+ messages in thread From: Paul E. McKenney @ 2019-07-12 13:01 UTC (permalink / raw) To: Joel Fernandes Cc: Byungchul Park, josh, rostedt, mathieu.desnoyers, jiangshanlai, rcu, linux-kernel, kernel-team On Thu, Jul 11, 2019 at 03:58:39PM -0400, Joel Fernandes wrote: > On Thu, Jul 11, 2019 at 12:48:18PM -0400, Joel Fernandes wrote: > > On Thu, Jul 11, 2019 at 08:02:15AM -0700, Paul E. McKenney wrote: > > > On Thu, Jul 11, 2019 at 09:08:49AM -0400, Joel Fernandes wrote: > > > > On Thu, Jul 11, 2019 at 05:30:52AM -0700, Paul E. McKenney wrote: > > > > > On Wed, Jul 10, 2019 at 10:20:25AM +0900, Byungchul Park wrote: > > > > > > On Tue, Jul 09, 2019 at 05:41:02AM -0700, Paul E. McKenney wrote: > > > > > > > > Hi Paul, > > > > > > > > > > > > > > > > IMHO, as much as we want to tune the time for fqs to be initiated, we > > > > > > > > can also want to tune the time for the help from scheduler to start. > > > > > > > > I thought only difference between them is a level of urgency. I might be > > > > > > > > wrong. It would be appreciated if you let me know if I miss something. > > > > > > > > > > > > > > Hello, Byungchul, > > > > > > > > > > > > > > I understand that one hypothetically might want to tune this at runtime, > > > > > > > but have you had need to tune this at runtime on a real production > > > > > > > workload? If so, what problem was happening that caused you to want to > > > > > > > do this tuning? > > > > > > > > > > > > Not actually. > > > > > > > > > > > > > > And it's ok even if the patch is turned down based on your criteria. :) > > > > > > > > > > > > > > If there is a real need, something needs to be provided to meet that > > > > > > > need. 
But in the absence of a real need, past experience has shown > > > > > > > that speculative tuning knobs usually do more harm than good. ;-) > > > > > > > > > > > > It makes sense, "A speculative tuning knobs do more harm than good". > > > > > > > > > > > > Then, it would be better to leave jiffies_till_{first,next}_fqs tunnable > > > > > > but jiffies_till_sched_qs until we need it. > > > > > > > > > > > > However, > > > > > > > > > > > > (1) In case that jiffies_till_sched_qs is tunnable: > > > > > > > > > > > > We might need all of jiffies_till_{first,next}_qs, > > > > > > jiffies_till_sched_qs and jiffies_to_sched_qs because > > > > > > jiffies_to_sched_qs can be affected by any of them. So we > > > > > > should be able to read each value at any time. > > > > > > > > > > > > (2) In case that jiffies_till_sched_qs is not tunnable: > > > > > > > > > > > > I think we don't have to keep the jiffies_till_sched_qs any > > > > > > longer since that's only for setting jiffies_to_sched_qs at > > > > > > *booting time*, which can be done with jiffies_to_sched_qs too. > > > > > > It's meaningless to keep all of tree variables. > > > > > > > > > > > > The simpler and less knobs that we really need we have, the better. > > > > > > > > > > > > what do you think about it? > > > > > > > > > > > > In the following patch, I (1) removed jiffies_till_sched_qs and then > > > > > > (2) renamed jiffies_*to*_sched_qs to jiffies_*till*_sched_qs because I > > > > > > think jiffies_till_sched_qs is a much better name for the purpose. I > > > > > > will resend it with a commit msg after knowing your opinion on it. > > > > > > > > > > I will give you a definite "maybe". > > > > > > > > > > Here are the two reasons for changing RCU's embarrassingly large array > > > > > of tuning parameters: > > > > > > > > > > 1. They are causing a problem in production. This would represent a > > > > > bug that clearly must be fixed. As you say, this change is not > > > > > in this category. 
> > > > > > > > > > 2. The change simplifies either RCU's code or the process of tuning > > > > > RCU, but without degrading RCU's ability to run everywhere and > > > > > without removing debugging tools. > > > > > > > > > > The change below clearly simplifies things by removing a few lines of > > > > > code, and it does not change RCU's default self-configuration. But are > > > > > we sure about the debugging aspect? (Please keep in mind that many more > > > > > sites are willing to change boot parameters than are willing to patch > > > > > their kernels.) > > > > > > > > > > What do you think? > > > > > > > > Just to add that independent of whether the runtime tunable make sense or > > > > not, may be it is still worth correcting the 0444 to be 0644 to be a separate > > > > patch? > > > > > > You lost me on this one. Doesn't changing from 0444 to 0644 make it be > > > a runtime tunable? > > > > I was going by our earlier discussion that the parameter is still writable at > > boot time. You mentioned something like the following: > > --- > > In Byungchul's defense, the current module_param() permissions are > > 0444, which really is read-only. Although I do agree that they can > > be written at boot, one could use this same line of reasoning to argue > > that const variables can be written at compile time (or, for on-stack > > const variables, at function-invocation time). But we still call them > > "const". > > --- > > > > Sorry if I got confused. You are right that we could leave it as read-only. > > > > > > > Finally, I urge you to join with Joel Fernandes and go through these > > > > > grace-period-duration tuning parameters. Once you guys get your heads > > > > > completely around all of them and how they interact across the different > > > > > possible RCU configurations, I bet that the two of you will have excellent > > > > > ideas for improvement. > > > > > > > > Yes, I am quite happy to join forces. 
Byungchul, let me know what about this > > > > or other things you had in mind. I have some other RCU topics too I am trying > > > > to get my head around and planning to work on more patches. > > > > > > > > Paul, in case you had any other specific tunables or experiments in mind, let > > > > me know. I am quite happy to try out new experiments and learn something > > > > based on tuning something. > > > > > > These would be the tunables controlling how quickly RCU takes its > > > various actions to encourage the current grace period to end quickly. > > > I would be happy to give you the exact list if you wish, but most of > > > them have appeared in this thread. > > > > > > The experiments should be designed to work out whether the current > > > default settings have configurations where they act badly. This might > > > also come up with advice for people attempting hand-tuning, or proposed > > > parameter-checking code to avoid bad combinations. > > > > > > For one example, setting the RCU CPU stall timeout too low will definitely > > > cause some unwanted splats. (Yes, one could argue that other things in > > > the kernel should change to allow this value to decrease, but things > > > like latency tracer and friends are probably more useful and important.) > > > > Ok, thank you for the hints. > > Hmm, speaking of grace period durations, it seems to me the maximum grace > period ever is recorded in rcu_state.gp_max. However it is not read from > anywhere. > > Any idea why it was added but not used?

If I remember correctly, it used to be used in debugfs prints. It is useful for working out how low you can decrease rcutorture.stall_cpu to without getting RCU CPU stall warnings. A rather infrequent need, given that the mainline default has been adjusted only once.

> I am interested in dumping this value just for fun, and seeing what I get.
> > I wonder also it is useful to dump it in rcutorture/rcuperf to find any > issues, or even expose it in sys/proc fs to see what worst case grace periods > look like. That might be worthwhile. Thanx, Paul ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-12 13:01 ` Paul E. McKenney @ 2019-07-12 13:40 ` Joel Fernandes 0 siblings, 0 replies; 53+ messages in thread From: Joel Fernandes @ 2019-07-12 13:40 UTC (permalink / raw) To: Paul E. McKenney Cc: Byungchul Park, josh, rostedt, mathieu.desnoyers, jiangshanlai, rcu, linux-kernel, kernel-team On Fri, Jul 12, 2019 at 06:01:05AM -0700, Paul E. McKenney wrote: > On Thu, Jul 11, 2019 at 03:58:39PM -0400, Joel Fernandes wrote: > > On Thu, Jul 11, 2019 at 12:48:18PM -0400, Joel Fernandes wrote: > > > On Thu, Jul 11, 2019 at 08:02:15AM -0700, Paul E. McKenney wrote: > > > > On Thu, Jul 11, 2019 at 09:08:49AM -0400, Joel Fernandes wrote: > > > > > On Thu, Jul 11, 2019 at 05:30:52AM -0700, Paul E. McKenney wrote: > > > > > > On Wed, Jul 10, 2019 at 10:20:25AM +0900, Byungchul Park wrote: > > > > > > > On Tue, Jul 09, 2019 at 05:41:02AM -0700, Paul E. McKenney wrote: > > > > > > > > > Hi Paul, > > > > > > > > > > > > > > > > > > IMHO, as much as we want to tune the time for fqs to be initiated, we > > > > > > > > > can also want to tune the time for the help from scheduler to start. > > > > > > > > > I thought only difference between them is a level of urgency. I might be > > > > > > > > > wrong. It would be appreciated if you let me know if I miss something. > > > > > > > > > > > > > > > > Hello, Byungchul, > > > > > > > > > > > > > > > > I understand that one hypothetically might want to tune this at runtime, > > > > > > > > but have you had need to tune this at runtime on a real production > > > > > > > > workload? If so, what problem was happening that caused you to want to > > > > > > > > do this tuning? > > > > > > > > > > > > > > Not actually. > > > > > > > > > > > > > > > > And it's ok even if the patch is turned down based on your criteria. :) > > > > > > > > > > > > > > > > If there is a real need, something needs to be provided to meet that > > > > > > > > need. 
But in the absence of a real need, past experience has shown > > > > > > > > that speculative tuning knobs usually do more harm than good. ;-) > > > > > > > > > > > > > > It makes sense, "A speculative tuning knobs do more harm than good". > > > > > > > > > > > > > > Then, it would be better to leave jiffies_till_{first,next}_fqs tunnable > > > > > > > but jiffies_till_sched_qs until we need it. > > > > > > > > > > > > > > However, > > > > > > > > > > > > > > (1) In case that jiffies_till_sched_qs is tunnable: > > > > > > > > > > > > > > We might need all of jiffies_till_{first,next}_qs, > > > > > > > jiffies_till_sched_qs and jiffies_to_sched_qs because > > > > > > > jiffies_to_sched_qs can be affected by any of them. So we > > > > > > > should be able to read each value at any time. > > > > > > > > > > > > > > (2) In case that jiffies_till_sched_qs is not tunnable: > > > > > > > > > > > > > > I think we don't have to keep the jiffies_till_sched_qs any > > > > > > > longer since that's only for setting jiffies_to_sched_qs at > > > > > > > *booting time*, which can be done with jiffies_to_sched_qs too. > > > > > > > It's meaningless to keep all of tree variables. > > > > > > > > > > > > > > The simpler and less knobs that we really need we have, the better. > > > > > > > > > > > > > > what do you think about it? > > > > > > > > > > > > > > In the following patch, I (1) removed jiffies_till_sched_qs and then > > > > > > > (2) renamed jiffies_*to*_sched_qs to jiffies_*till*_sched_qs because I > > > > > > > think jiffies_till_sched_qs is a much better name for the purpose. I > > > > > > > will resend it with a commit msg after knowing your opinion on it. > > > > > > > > > > > > I will give you a definite "maybe". > > > > > > > > > > > > Here are the two reasons for changing RCU's embarrassingly large array > > > > > > of tuning parameters: > > > > > > > > > > > > 1. They are causing a problem in production. 
This would represent a > > > > > > bug that clearly must be fixed. As you say, this change is not > > > > > > in this category. > > > > > > > > > > > > 2. The change simplifies either RCU's code or the process of tuning > > > > > > RCU, but without degrading RCU's ability to run everywhere and > > > > > > without removing debugging tools. > > > > > > > > > > > > The change below clearly simplifies things by removing a few lines of > > > > > > code, and it does not change RCU's default self-configuration. But are > > > > > > we sure about the debugging aspect? (Please keep in mind that many more > > > > > > sites are willing to change boot parameters than are willing to patch > > > > > > their kernels.) > > > > > > > > > > > > What do you think? > > > > > > > > > > Just to add that independent of whether the runtime tunable make sense or > > > > > not, may be it is still worth correcting the 0444 to be 0644 to be a separate > > > > > patch? > > > > > > > > You lost me on this one. Doesn't changing from 0444 to 0644 make it be > > > > a runtime tunable? > > > > > > I was going by our earlier discussion that the parameter is still writable at > > > boot time. You mentioned something like the following: > > > --- > > > In Byungchul's defense, the current module_param() permissions are > > > 0444, which really is read-only. Although I do agree that they can > > > be written at boot, one could use this same line of reasoning to argue > > > that const variables can be written at compile time (or, for on-stack > > > const variables, at function-invocation time). But we still call them > > > "const". > > > --- > > > > > > Sorry if I got confused. You are right that we could leave it as read-only. > > > > > > > > > Finally, I urge you to join with Joel Fernandes and go through these > > > > > > grace-period-duration tuning parameters. 
Once you guys get your heads > > > > > > completely around all of them and how they interact across the different > > > > > > possible RCU configurations, I bet that the two of you will have excellent > > > > > > ideas for improvement. > > > > > > > > > > Yes, I am quite happy to join forces. Byungchul, let me know what about this > > > > > or other things you had in mind. I have some other RCU topics too I am trying > > > > > to get my head around and planning to work on more patches. > > > > > > > > > > Paul, in case you had any other specific tunables or experiments in mind, let > > > > > me know. I am quite happy to try out new experiments and learn something > > > > > based on tuning something. > > > > > > > > These would be the tunables controlling how quickly RCU takes its > > > > various actions to encourage the current grace period to end quickly. > > > > I would be happy to give you the exact list if you wish, but most of > > > > them have appeared in this thread. > > > > > > > > The experiments should be designed to work out whether the current > > > > default settings have configurations where they act badly. This might > > > > also come up with advice for people attempting hand-tuning, or proposed > > > > parameter-checking code to avoid bad combinations. > > > > > > > > For one example, setting the RCU CPU stall timeout too low will definitely > > > > cause some unwanted splats. (Yes, one could argue that other things in > > > > the kernel should change to allow this value to decrease, but things > > > > like latency tracer and friends are probably more useful and important.) > > > > > > Ok, thank you for the hints. > > > > Hmm, speaking of grace period durations, it seems to me the maximum grace > > period ever is recorded in rcu_state.gp_max. However it is not read from > > anywhere. > > > > Any idea why it was added but not used? > > If I remember correclty, it used to be used in debugfs prints. 
It is > useful for working out how low you can decrease rcutorture.stall_cpu to > without getting RCU CPU stall warnings. A rather infrequent need, > given that the mainline default has been adjusted only once. Got it. > > I am interested in dumping this value just for fun, and seeing what I get. > > > > I wonder also it is useful to dump it in rcutorture/rcuperf to find any > > issues, or even expose it in sys/proc fs to see what worst case grace periods > > look like. > > That might be worthwhile. Thanks. I am thinking also it will help see whether something else like GP thread priority or other configuration could be causing long GP times that is otherwise not possible to see without a counter. I will work out with Byungchul and come up with something. Cheers, - Joel ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-11 15:02 ` Paul E. McKenney 2019-07-11 16:48 ` Joel Fernandes @ 2019-07-12 6:00 ` Byungchul Park 1 sibling, 0 replies; 53+ messages in thread From: Byungchul Park @ 2019-07-12 6:00 UTC (permalink / raw) To: Paul E. McKenney Cc: Joel Fernandes, josh, rostedt, mathieu.desnoyers, jiangshanlai, rcu, linux-kernel, kernel-team On Thu, Jul 11, 2019 at 08:02:15AM -0700, Paul E. McKenney wrote: > These would be the tunables controlling how quickly RCU takes its > various actions to encourage the current grace period to end quickly. Seriously one of the most interesting thing over all kernel works. > I would be happy to give you the exact list if you wish, but most of > them have appeared in this thread. Thank you. :) > The experiments should be designed to work out whether the current > default settings have configurations where they act badly. This might > also come up with advice for people attempting hand-tuning, or proposed > parameter-checking code to avoid bad combinations. Great. > For one example, setting the RCU CPU stall timeout too low will definitely > cause some unwanted splats. (Yes, one could argue that other things in > the kernel should change to allow this value to decrease, but things > like latency tracer and friends are probably more useful and important.) Agree. Thanks, Byungchul > > Thanx, Paul > > > thanks, > > > > - Joel > > > > > > > > > > Thanks, > > > > Byungchul > > > > > > > > ---8<--- > > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > > > > index e72c184..94b58f5 100644 > > > > --- a/Documentation/admin-guide/kernel-parameters.txt > > > > +++ b/Documentation/admin-guide/kernel-parameters.txt > > > > @@ -3792,10 +3792,6 @@ > > > > a value based on the most recent settings > > > > of rcutree.jiffies_till_first_fqs > > > > and rcutree.jiffies_till_next_fqs. 
> > > > - This calculated value may be viewed in > > > > - rcutree.jiffies_to_sched_qs. Any attempt to set > > > > - rcutree.jiffies_to_sched_qs will be cheerfully > > > > - overwritten. > > > > > > > > rcutree.kthread_prio= [KNL,BOOT] > > > > Set the SCHED_FIFO priority of the RCU per-CPU > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > index a2f8ba2..ad9dc86 100644 > > > > --- a/kernel/rcu/tree.c > > > > +++ b/kernel/rcu/tree.c > > > > @@ -421,10 +421,8 @@ static int rcu_is_cpu_rrupt_from_idle(void) > > > > * How long the grace period must be before we start recruiting > > > > * quiescent-state help from rcu_note_context_switch(). > > > > */ > > > > -static ulong jiffies_till_sched_qs = ULONG_MAX; > > > > +static ulong jiffies_till_sched_qs = ULONG_MAX; /* See adjust_jiffies_till_sched_qs(). */ > > > > module_param(jiffies_till_sched_qs, ulong, 0444); > > > > -static ulong jiffies_to_sched_qs; /* See adjust_jiffies_till_sched_qs(). */ > > > > -module_param(jiffies_to_sched_qs, ulong, 0444); /* Display only! */ > > > > > > > > /* > > > > * Make sure that we give the grace-period kthread time to detect any > > > > @@ -436,18 +434,13 @@ static void adjust_jiffies_till_sched_qs(void) > > > > { > > > > unsigned long j; > > > > > > > > - /* If jiffies_till_sched_qs was specified, respect the request. */ > > > > - if (jiffies_till_sched_qs != ULONG_MAX) { > > > > - WRITE_ONCE(jiffies_to_sched_qs, jiffies_till_sched_qs); > > > > - return; > > > > - } > > > > /* Otherwise, set to third fqs scan, but bound below on large system. 
*/ > > > > j = READ_ONCE(jiffies_till_first_fqs) + > > > > 2 * READ_ONCE(jiffies_till_next_fqs); > > > > if (j < HZ / 10 + nr_cpu_ids / RCU_JIFFIES_FQS_DIV) > > > > j = HZ / 10 + nr_cpu_ids / RCU_JIFFIES_FQS_DIV; > > > > pr_info("RCU calculated value of scheduler-enlistment delay is %ld jiffies.\n", j); > > > > - WRITE_ONCE(jiffies_to_sched_qs, j); > > > > + WRITE_ONCE(jiffies_till_sched_qs, j); > > > > } > > > > > > > > static int param_set_first_fqs_jiffies(const char *val, const struct kernel_param *kp) > > > > @@ -1033,16 +1026,16 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp) > > > > > > > > /* > > > > * A CPU running for an extended time within the kernel can > > > > - * delay RCU grace periods: (1) At age jiffies_to_sched_qs, > > > > - * set .rcu_urgent_qs, (2) At age 2*jiffies_to_sched_qs, set > > > > + * delay RCU grace periods: (1) At age jiffies_till_sched_qs, > > > > + * set .rcu_urgent_qs, (2) At age 2*jiffies_till_sched_qs, set > > > > * both .rcu_need_heavy_qs and .rcu_urgent_qs. Note that the > > > > * unsynchronized assignments to the per-CPU rcu_need_heavy_qs > > > > * variable are safe because the assignments are repeated if this > > > > * CPU failed to pass through a quiescent state. This code > > > > - * also checks .jiffies_resched in case jiffies_to_sched_qs > > > > + * also checks .jiffies_resched in case jiffies_till_sched_qs > > > > * is set way high. 
> > > > */ > > > > - jtsq = READ_ONCE(jiffies_to_sched_qs); > > > > + jtsq = READ_ONCE(jiffies_till_sched_qs); > > > > ruqp = per_cpu_ptr(&rcu_data.rcu_urgent_qs, rdp->cpu); > > > > rnhqp = &per_cpu(rcu_data.rcu_need_heavy_qs, rdp->cpu); > > > > if (!READ_ONCE(*rnhqp) && > > > > @@ -3383,7 +3376,8 @@ static void __init rcu_init_geometry(void) > > > > jiffies_till_first_fqs = d; > > > > if (jiffies_till_next_fqs == ULONG_MAX) > > > > jiffies_till_next_fqs = d; > > > > - adjust_jiffies_till_sched_qs(); > > > > + if (jiffies_till_sched_qs == ULONG_MAX) > > > > + adjust_jiffies_till_sched_qs(); > > > > > > > > /* If the compile-time values are accurate, just leave. */ > > > > if (rcu_fanout_leaf == RCU_FANOUT_LEAF && > > > ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-11 13:08 ` Joel Fernandes 2019-07-11 15:02 ` Paul E. McKenney @ 2019-07-12 5:52 ` Byungchul Park 1 sibling, 0 replies; 53+ messages in thread From: Byungchul Park @ 2019-07-12 5:52 UTC (permalink / raw) To: Joel Fernandes Cc: Paul E. McKenney, josh, rostedt, mathieu.desnoyers, jiangshanlai, rcu, linux-kernel, kernel-team On Thu, Jul 11, 2019 at 09:08:49AM -0400, Joel Fernandes wrote: > > Finally, I urge you to join with Joel Fernandes and go through these > > grace-period-duration tuning parameters. Once you guys get your heads > > completely around all of them and how they interact across the different > > possible RCU configurations, I bet that the two of you will have excellent > > ideas for improvement. > > Yes, I am quite happy to join forces. Byungchul, let me know what about this I love it. Let's talk later :) > or other things you had in mind. I have some other RCU topics too I am trying > to get my head around and planning to work on more patches. Great. Thanks, Byungchul > Paul, in case you had any other specific tunables or experiments in mind, let > me know. I am quite happy to try out new experiments and learn something > based on tuning something. > > thanks, > > - Joel > > > > > > Thanks, > > > Byungchul > > > > > > ---8<--- > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > > > index e72c184..94b58f5 100644 > > > --- a/Documentation/admin-guide/kernel-parameters.txt > > > +++ b/Documentation/admin-guide/kernel-parameters.txt > > > @@ -3792,10 +3792,6 @@ > > > a value based on the most recent settings > > > of rcutree.jiffies_till_first_fqs > > > and rcutree.jiffies_till_next_fqs. > > > - This calculated value may be viewed in > > > - rcutree.jiffies_to_sched_qs. Any attempt to set > > > - rcutree.jiffies_to_sched_qs will be cheerfully > > > - overwritten. 
> > > > > > rcutree.kthread_prio= [KNL,BOOT] > > > Set the SCHED_FIFO priority of the RCU per-CPU > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > index a2f8ba2..ad9dc86 100644 > > > --- a/kernel/rcu/tree.c > > > +++ b/kernel/rcu/tree.c > > > @@ -421,10 +421,8 @@ static int rcu_is_cpu_rrupt_from_idle(void) > > > * How long the grace period must be before we start recruiting > > > * quiescent-state help from rcu_note_context_switch(). > > > */ > > > -static ulong jiffies_till_sched_qs = ULONG_MAX; > > > +static ulong jiffies_till_sched_qs = ULONG_MAX; /* See adjust_jiffies_till_sched_qs(). */ > > > module_param(jiffies_till_sched_qs, ulong, 0444); > > > -static ulong jiffies_to_sched_qs; /* See adjust_jiffies_till_sched_qs(). */ > > > -module_param(jiffies_to_sched_qs, ulong, 0444); /* Display only! */ > > > > > > /* > > > * Make sure that we give the grace-period kthread time to detect any > > > @@ -436,18 +434,13 @@ static void adjust_jiffies_till_sched_qs(void) > > > { > > > unsigned long j; > > > > > > - /* If jiffies_till_sched_qs was specified, respect the request. */ > > > - if (jiffies_till_sched_qs != ULONG_MAX) { > > > - WRITE_ONCE(jiffies_to_sched_qs, jiffies_till_sched_qs); > > > - return; > > > - } > > > /* Otherwise, set to third fqs scan, but bound below on large system. 
*/ > > > j = READ_ONCE(jiffies_till_first_fqs) + > > > 2 * READ_ONCE(jiffies_till_next_fqs); > > > if (j < HZ / 10 + nr_cpu_ids / RCU_JIFFIES_FQS_DIV) > > > j = HZ / 10 + nr_cpu_ids / RCU_JIFFIES_FQS_DIV; > > > pr_info("RCU calculated value of scheduler-enlistment delay is %ld jiffies.\n", j); > > > - WRITE_ONCE(jiffies_to_sched_qs, j); > > > + WRITE_ONCE(jiffies_till_sched_qs, j); > > > } > > > > > > static int param_set_first_fqs_jiffies(const char *val, const struct kernel_param *kp) > > > @@ -1033,16 +1026,16 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp) > > > > > > /* > > > * A CPU running for an extended time within the kernel can > > > - * delay RCU grace periods: (1) At age jiffies_to_sched_qs, > > > - * set .rcu_urgent_qs, (2) At age 2*jiffies_to_sched_qs, set > > > + * delay RCU grace periods: (1) At age jiffies_till_sched_qs, > > > + * set .rcu_urgent_qs, (2) At age 2*jiffies_till_sched_qs, set > > > * both .rcu_need_heavy_qs and .rcu_urgent_qs. Note that the > > > * unsynchronized assignments to the per-CPU rcu_need_heavy_qs > > > * variable are safe because the assignments are repeated if this > > > * CPU failed to pass through a quiescent state. This code > > > - * also checks .jiffies_resched in case jiffies_to_sched_qs > > > + * also checks .jiffies_resched in case jiffies_till_sched_qs > > > * is set way high. 
> > > */ > > > - jtsq = READ_ONCE(jiffies_to_sched_qs); > > > + jtsq = READ_ONCE(jiffies_till_sched_qs); > > > ruqp = per_cpu_ptr(&rcu_data.rcu_urgent_qs, rdp->cpu); > > > rnhqp = &per_cpu(rcu_data.rcu_need_heavy_qs, rdp->cpu); > > > if (!READ_ONCE(*rnhqp) && > > > @@ -3383,7 +3376,8 @@ static void __init rcu_init_geometry(void) > > > jiffies_till_first_fqs = d; > > > if (jiffies_till_next_fqs == ULONG_MAX) > > > jiffies_till_next_fqs = d; > > > - adjust_jiffies_till_sched_qs(); > > > + if (jiffies_till_sched_qs == ULONG_MAX) > > > + adjust_jiffies_till_sched_qs(); > > > > > > /* If the compile-time values are accurate, just leave. */ > > > if (rcu_fanout_leaf == RCU_FANOUT_LEAF && > > ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-11 12:30 ` Paul E. McKenney 2019-07-11 13:08 ` Joel Fernandes @ 2019-07-12 5:48 ` Byungchul Park 2019-07-13 9:08 ` Byungchul Park 1 sibling, 1 reply; 53+ messages in thread From: Byungchul Park @ 2019-07-12 5:48 UTC (permalink / raw) To: Paul E. McKenney Cc: Joel Fernandes, josh, rostedt, mathieu.desnoyers, jiangshanlai, rcu, linux-kernel, kernel-team On Thu, Jul 11, 2019 at 05:30:52AM -0700, Paul E. McKenney wrote: > > > If there is a real need, something needs to be provided to meet that > > > need. But in the absence of a real need, past experience has shown > > > that speculative tuning knobs usually do more harm than good. ;-) > > > > It makes sense, "A speculative tuning knobs do more harm than good". > > > > Then, it would be better to leave jiffies_till_{first,next}_fqs tunnable > > but jiffies_till_sched_qs until we need it. > > > > However, > > > > (1) In case that jiffies_till_sched_qs is tunnable: > > > > We might need all of jiffies_till_{first,next}_qs, > > jiffies_till_sched_qs and jiffies_to_sched_qs because > > jiffies_to_sched_qs can be affected by any of them. So we > > should be able to read each value at any time. > > > > (2) In case that jiffies_till_sched_qs is not tunnable: > > > > I think we don't have to keep the jiffies_till_sched_qs any > > longer since that's only for setting jiffies_to_sched_qs at > > *booting time*, which can be done with jiffies_to_sched_qs too. > > It's meaningless to keep all of tree variables. > > > > The simpler and less knobs that we really need we have, the better. > > > > what do you think about it? > > > > In the following patch, I (1) removed jiffies_till_sched_qs and then > > (2) renamed jiffies_*to*_sched_qs to jiffies_*till*_sched_qs because I > > think jiffies_till_sched_qs is a much better name for the purpose. I > > will resend it with a commit msg after knowing your opinion on it. Hi Paul, > I will give you a definite "maybe". 
> > Here are the two reasons for changing RCU's embarrassingly large array > of tuning parameters: > > 1. They are causing a problem in production. This would represent a > bug that clearly must be fixed. As you say, this change is not > in this category. > > 2. The change simplifies either RCU's code or the process of tuning > RCU, but without degrading RCU's ability to run everywhere and > without removing debugging tools. Agree. > The change below clearly simplifies things by removing a few lines of > code, and it does not change RCU's default self-configuration. But are > we sure about the debugging aspect? (Please keep in mind that many more I'm sorry I don't get it. I don't think this patch affect any debugging ability. What do you think it hurts? Could you explain it more? > sites are willing to change boot parameters than are willing to patch > their kernels.) Right. > What do you think? > > Finally, I urge you to join with Joel Fernandes and go through these > grace-period-duration tuning parameters. Once you guys get your heads > completely around all of them and how they interact across the different > possible RCU configurations, I bet that the two of you will have excellent > ideas for improvement. Great. I'd be happy if I join the improvement and with Joel. I might need to ask you exactly what you expect in detail maybe. Anyway I will willingly go with it. :) Thanks, Byungchul > Thanx, Paul > > > Thanks, > > Byungchul > > > > ---8<--- > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > > index e72c184..94b58f5 100644 > > --- a/Documentation/admin-guide/kernel-parameters.txt > > +++ b/Documentation/admin-guide/kernel-parameters.txt > > @@ -3792,10 +3792,6 @@ > > a value based on the most recent settings > > of rcutree.jiffies_till_first_fqs > > and rcutree.jiffies_till_next_fqs. > > - This calculated value may be viewed in > > - rcutree.jiffies_to_sched_qs. 
Any attempt to set > > - rcutree.jiffies_to_sched_qs will be cheerfully > > - overwritten. > > > > rcutree.kthread_prio= [KNL,BOOT] > > Set the SCHED_FIFO priority of the RCU per-CPU > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > index a2f8ba2..ad9dc86 100644 > > --- a/kernel/rcu/tree.c > > +++ b/kernel/rcu/tree.c > > @@ -421,10 +421,8 @@ static int rcu_is_cpu_rrupt_from_idle(void) > > * How long the grace period must be before we start recruiting > > * quiescent-state help from rcu_note_context_switch(). > > */ > > -static ulong jiffies_till_sched_qs = ULONG_MAX; > > +static ulong jiffies_till_sched_qs = ULONG_MAX; /* See adjust_jiffies_till_sched_qs(). */ > > module_param(jiffies_till_sched_qs, ulong, 0444); > > -static ulong jiffies_to_sched_qs; /* See adjust_jiffies_till_sched_qs(). */ > > -module_param(jiffies_to_sched_qs, ulong, 0444); /* Display only! */ > > > > /* > > * Make sure that we give the grace-period kthread time to detect any > > @@ -436,18 +434,13 @@ static void adjust_jiffies_till_sched_qs(void) > > { > > unsigned long j; > > > > - /* If jiffies_till_sched_qs was specified, respect the request. */ > > - if (jiffies_till_sched_qs != ULONG_MAX) { > > - WRITE_ONCE(jiffies_to_sched_qs, jiffies_till_sched_qs); > > - return; > > - } > > /* Otherwise, set to third fqs scan, but bound below on large system. 
*/ > > j = READ_ONCE(jiffies_till_first_fqs) + > > 2 * READ_ONCE(jiffies_till_next_fqs); > > if (j < HZ / 10 + nr_cpu_ids / RCU_JIFFIES_FQS_DIV) > > j = HZ / 10 + nr_cpu_ids / RCU_JIFFIES_FQS_DIV; > > pr_info("RCU calculated value of scheduler-enlistment delay is %ld jiffies.\n", j); > > - WRITE_ONCE(jiffies_to_sched_qs, j); > > + WRITE_ONCE(jiffies_till_sched_qs, j); > > } > > > > static int param_set_first_fqs_jiffies(const char *val, const struct kernel_param *kp) > > @@ -1033,16 +1026,16 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp) > > > > /* > > * A CPU running for an extended time within the kernel can > > - * delay RCU grace periods: (1) At age jiffies_to_sched_qs, > > - * set .rcu_urgent_qs, (2) At age 2*jiffies_to_sched_qs, set > > + * delay RCU grace periods: (1) At age jiffies_till_sched_qs, > > + * set .rcu_urgent_qs, (2) At age 2*jiffies_till_sched_qs, set > > * both .rcu_need_heavy_qs and .rcu_urgent_qs. Note that the > > * unsynchronized assignments to the per-CPU rcu_need_heavy_qs > > * variable are safe because the assignments are repeated if this > > * CPU failed to pass through a quiescent state. This code > > - * also checks .jiffies_resched in case jiffies_to_sched_qs > > + * also checks .jiffies_resched in case jiffies_till_sched_qs > > * is set way high. > > */ > > - jtsq = READ_ONCE(jiffies_to_sched_qs); > > + jtsq = READ_ONCE(jiffies_till_sched_qs); > > ruqp = per_cpu_ptr(&rcu_data.rcu_urgent_qs, rdp->cpu); > > rnhqp = &per_cpu(rcu_data.rcu_need_heavy_qs, rdp->cpu); > > if (!READ_ONCE(*rnhqp) && > > @@ -3383,7 +3376,8 @@ static void __init rcu_init_geometry(void) > > jiffies_till_first_fqs = d; > > if (jiffies_till_next_fqs == ULONG_MAX) > > jiffies_till_next_fqs = d; > > - adjust_jiffies_till_sched_qs(); > > + if (jiffies_till_sched_qs == ULONG_MAX) > > + adjust_jiffies_till_sched_qs(); > > > > /* If the compile-time values are accurate, just leave. 
*/ > > if (rcu_fanout_leaf == RCU_FANOUT_LEAF && ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] rcu: Make jiffies_till_sched_qs writable 2019-07-12 5:48 ` Byungchul Park @ 2019-07-13 9:08 ` Byungchul Park 0 siblings, 0 replies; 53+ messages in thread From: Byungchul Park @ 2019-07-13 9:08 UTC (permalink / raw) To: Byungchul Park Cc: Paul E. McKenney, Joel Fernandes, josh, Steven Rostedt, Mathieu Desnoyers, jiangshanlai, rcu, linux-kernel, kernel-team On Fri, Jul 12, 2019 at 2:50 PM Byungchul Park <byungchul.park@lge.com> wrote: > > On Thu, Jul 11, 2019 at 05:30:52AM -0700, Paul E. McKenney wrote: > > > > If there is a real need, something needs to be provided to meet that > > > > need. But in the absence of a real need, past experience has shown > > > > that speculative tuning knobs usually do more harm than good. ;-) > > > > > > It makes sense, "A speculative tuning knobs do more harm than good". > > > > > > Then, it would be better to leave jiffies_till_{first,next}_fqs tunnable > > > but jiffies_till_sched_qs until we need it. > > > > > > However, > > > > > > (1) In case that jiffies_till_sched_qs is tunnable: > > > > > > We might need all of jiffies_till_{first,next}_qs, > > > jiffies_till_sched_qs and jiffies_to_sched_qs because > > > jiffies_to_sched_qs can be affected by any of them. So we > > > should be able to read each value at any time. > > > > > > (2) In case that jiffies_till_sched_qs is not tunnable: > > > > > > I think we don't have to keep the jiffies_till_sched_qs any > > > longer since that's only for setting jiffies_to_sched_qs at > > > *booting time*, which can be done with jiffies_to_sched_qs too. > > > It's meaningless to keep all of tree variables. > > > > > > The simpler and less knobs that we really need we have, the better. > > > > > > what do you think about it? > > > > > > In the following patch, I (1) removed jiffies_till_sched_qs and then > > > (2) renamed jiffies_*to*_sched_qs to jiffies_*till*_sched_qs because I > > > think jiffies_till_sched_qs is a much better name for the purpose. 
I > > > will resend it with a commit msg after knowing your opinion on it. > > Hi Paul, > > > I will give you a definite "maybe". > > > > Here are the two reasons for changing RCU's embarrassingly large array > > of tuning parameters: > > > > 1. They are causing a problem in production. This would represent a > > bug that clearly must be fixed. As you say, this change is not > > in this category. > > > > 2. The change simplifies either RCU's code or the process of tuning > > RCU, but without degrading RCU's ability to run everywhere and > > without removing debugging tools. > > Agree. > > > The change below clearly simplifies things by removing a few lines of > > code, and it does not change RCU's default self-configuration. But are > > we sure about the debugging aspect? (Please keep in mind that many more > > I'm sorry I don't get it. I don't think this patch affect any debugging > ability. What do you think it hurts? Could you explain it more? I meant we are able to achieve it by directly changing rcutree.jiffies_to_sched_qs even for the debugging purpose w/o changing code itself as well. Thoughts? -- Thanks, Byungchul > > sites are willing to change boot parameters than are willing to patch > > their kernels.) > > Right. > > > What do you think? > > > > Finally, I urge you to join with Joel Fernandes and go through these > > grace-period-duration tuning parameters. Once you guys get your heads > > completely around all of them and how they interact across the different > > possible RCU configurations, I bet that the two of you will have excellent > > ideas for improvement. > > Great. I'd be happy if I join the improvement and with Joel. I might > need to ask you exactly what you expect in detail maybe. Anyway I will > willingly go with it. 
:) > > Thanks, > Byungchul > > > Thanx, Paul > > > > > Thanks, > > > Byungchul > > > > > > ---8<--- > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > > > index e72c184..94b58f5 100644 > > > --- a/Documentation/admin-guide/kernel-parameters.txt > > > +++ b/Documentation/admin-guide/kernel-parameters.txt > > > @@ -3792,10 +3792,6 @@ > > > a value based on the most recent settings > > > of rcutree.jiffies_till_first_fqs > > > and rcutree.jiffies_till_next_fqs. > > > - This calculated value may be viewed in > > > - rcutree.jiffies_to_sched_qs. Any attempt to set > > > - rcutree.jiffies_to_sched_qs will be cheerfully > > > - overwritten. > > > > > > rcutree.kthread_prio= [KNL,BOOT] > > > Set the SCHED_FIFO priority of the RCU per-CPU > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > index a2f8ba2..ad9dc86 100644 > > > --- a/kernel/rcu/tree.c > > > +++ b/kernel/rcu/tree.c > > > @@ -421,10 +421,8 @@ static int rcu_is_cpu_rrupt_from_idle(void) > > > * How long the grace period must be before we start recruiting > > > * quiescent-state help from rcu_note_context_switch(). > > > */ > > > -static ulong jiffies_till_sched_qs = ULONG_MAX; > > > +static ulong jiffies_till_sched_qs = ULONG_MAX; /* See adjust_jiffies_till_sched_qs(). */ > > > module_param(jiffies_till_sched_qs, ulong, 0444); > > > -static ulong jiffies_to_sched_qs; /* See adjust_jiffies_till_sched_qs(). */ > > > -module_param(jiffies_to_sched_qs, ulong, 0444); /* Display only! */ > > > > > > /* > > > * Make sure that we give the grace-period kthread time to detect any > > > @@ -436,18 +434,13 @@ static void adjust_jiffies_till_sched_qs(void) > > > { > > > unsigned long j; > > > > > > - /* If jiffies_till_sched_qs was specified, respect the request. 
*/ > > > - if (jiffies_till_sched_qs != ULONG_MAX) { > > > - WRITE_ONCE(jiffies_to_sched_qs, jiffies_till_sched_qs); > > > - return; > > > - } > > > /* Otherwise, set to third fqs scan, but bound below on large system. */ > > > j = READ_ONCE(jiffies_till_first_fqs) + > > > 2 * READ_ONCE(jiffies_till_next_fqs); > > > if (j < HZ / 10 + nr_cpu_ids / RCU_JIFFIES_FQS_DIV) > > > j = HZ / 10 + nr_cpu_ids / RCU_JIFFIES_FQS_DIV; > > > pr_info("RCU calculated value of scheduler-enlistment delay is %ld jiffies.\n", j); > > > - WRITE_ONCE(jiffies_to_sched_qs, j); > > > + WRITE_ONCE(jiffies_till_sched_qs, j); > > > } > > > > > > static int param_set_first_fqs_jiffies(const char *val, const struct kernel_param *kp) > > > @@ -1033,16 +1026,16 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp) > > > > > > /* > > > * A CPU running for an extended time within the kernel can > > > - * delay RCU grace periods: (1) At age jiffies_to_sched_qs, > > > - * set .rcu_urgent_qs, (2) At age 2*jiffies_to_sched_qs, set > > > + * delay RCU grace periods: (1) At age jiffies_till_sched_qs, > > > + * set .rcu_urgent_qs, (2) At age 2*jiffies_till_sched_qs, set > > > * both .rcu_need_heavy_qs and .rcu_urgent_qs. Note that the > > > * unsynchronized assignments to the per-CPU rcu_need_heavy_qs > > > * variable are safe because the assignments are repeated if this > > > * CPU failed to pass through a quiescent state. This code > > > - * also checks .jiffies_resched in case jiffies_to_sched_qs > > > + * also checks .jiffies_resched in case jiffies_till_sched_qs > > > * is set way high. 
> > > */ > > > - jtsq = READ_ONCE(jiffies_to_sched_qs); > > > + jtsq = READ_ONCE(jiffies_till_sched_qs); > > > ruqp = per_cpu_ptr(&rcu_data.rcu_urgent_qs, rdp->cpu); > > > rnhqp = &per_cpu(rcu_data.rcu_need_heavy_qs, rdp->cpu); > > > if (!READ_ONCE(*rnhqp) && > > > @@ -3383,7 +3376,8 @@ static void __init rcu_init_geometry(void) > > > jiffies_till_first_fqs = d; > > > if (jiffies_till_next_fqs == ULONG_MAX) > > > jiffies_till_next_fqs = d; > > > - adjust_jiffies_till_sched_qs(); > > > + if (jiffies_till_sched_qs == ULONG_MAX) > > > + adjust_jiffies_till_sched_qs(); > > > > > > /* If the compile-time values are accurate, just leave. */ > > > if (rcu_fanout_leaf == RCU_FANOUT_LEAF && ^ permalink raw reply [flat|nested] 53+ messages in thread
end of thread, other threads:[~2019-07-24 8:00 UTC | newest] Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-07-08 6:00 [PATCH] rcu: Make jiffies_till_sched_qs writable Byungchul Park 2019-07-08 12:50 ` Paul E. McKenney 2019-07-08 13:03 ` Joel Fernandes 2019-07-08 13:19 ` Paul E. McKenney 2019-07-08 14:15 ` Joel Fernandes 2019-07-09 6:05 ` Byungchul Park 2019-07-09 12:43 ` Paul E. McKenney 2019-07-09 5:58 ` Byungchul Park 2019-07-09 6:45 ` Byungchul Park 2019-07-09 12:41 ` Paul E. McKenney 2019-07-10 1:20 ` Byungchul Park 2019-07-11 12:30 ` Paul E. McKenney 2019-07-11 13:08 ` Joel Fernandes 2019-07-11 15:02 ` Paul E. McKenney 2019-07-11 16:48 ` Joel Fernandes 2019-07-11 19:58 ` Joel Fernandes 2019-07-12 6:32 ` Byungchul Park 2019-07-12 12:51 ` Joel Fernandes 2019-07-12 13:02 ` Paul E. McKenney 2019-07-12 13:43 ` Joel Fernandes 2019-07-12 14:53 ` Paul E. McKenney 2019-07-13 8:47 ` Byungchul Park 2019-07-13 14:20 ` Joel Fernandes 2019-07-13 15:13 ` Paul E. McKenney 2019-07-13 15:42 ` Joel Fernandes 2019-07-13 17:41 ` Paul E. McKenney 2019-07-14 13:39 ` Byungchul Park 2019-07-14 13:56 ` Paul E. McKenney 2019-07-15 17:39 ` Joel Fernandes 2019-07-15 20:09 ` Paul E. McKenney 2019-07-18 16:14 ` Joel Fernandes 2019-07-18 16:15 ` Joel Fernandes 2019-07-18 21:34 ` Paul E. McKenney 2019-07-19 0:48 ` Joel Fernandes 2019-07-19 0:54 ` Byungchul Park 2019-07-19 0:39 ` Byungchul Park 2019-07-19 0:52 ` Joel Fernandes 2019-07-19 1:10 ` Byungchul Park 2019-07-19 7:43 ` Paul E. McKenney 2019-07-19 9:57 ` Byungchul Park 2019-07-19 19:57 ` Paul E. McKenney 2019-07-19 20:33 ` Joel Fernandes 2019-07-23 11:05 ` Byungchul Park 2019-07-23 13:47 ` Paul E. McKenney 2019-07-23 16:54 ` Paul E. McKenney 2019-07-24 7:58 ` Byungchul Park 2019-07-24 7:59 ` Byungchul Park 2019-07-12 13:01 ` Paul E. 
McKenney 2019-07-12 13:40 ` Joel Fernandes 2019-07-12 6:00 ` Byungchul Park 2019-07-12 5:52 ` Byungchul Park 2019-07-12 5:48 ` Byungchul Park 2019-07-13 9:08 ` Byungchul Park