* [PATCH RFC 0/8] rcu fixes, clean ups for rcu/dev
@ 2018-05-14  3:15 Joel Fernandes (Google)
  2018-05-14  3:15 ` [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works Joel Fernandes (Google)
  ` (7 more replies)
  0 siblings, 8 replies; 42+ messages in thread
From: Joel Fernandes (Google) @ 2018-05-14  3:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google), Paul E. McKenney, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park,
	kernel-team

Hi,

Here are some fixes, clean ups and some code comments changes mostly for
the new funnel locking, gp_seq changes and some tracing. Its based on
latest rcu/dev branch.

thanks,

- Joel

Joel Fernandes (Google) (8):
  rcu: Add comment documenting how rcu_seq_snap works
  rcu: Clarify usage of cond_resched for tasks-RCU
  rcu: Add back the cpuend tracepoint
  rcu: Get rid of old c variable from places in tree RCU
  rcu: Use rcu_node as temporary variable in funnel locking loop
  rcu: Add back the Startedleaf tracepoint
  rcu: trace CleanupMore condition only if needed
  rcu: Fix cpustart tracepoint gp_seq number

 include/linux/rcupdate.h   | 11 +++--
 include/trace/events/rcu.h | 19 ++++----
 kernel/rcu/rcu.h           | 24 +++++++++-
 kernel/rcu/tree.c          | 92 +++++++++++++++++++++++---------------
 4 files changed, 97 insertions(+), 49 deletions(-)

-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply	[flat|nested] 42+ messages in thread
* [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works
  2018-05-14  3:15 [PATCH RFC 0/8] rcu fixes, clean ups for rcu/dev Joel Fernandes (Google)
@ 2018-05-14  3:15 ` Joel Fernandes (Google)
  2018-05-14  3:47   ` Randy Dunlap
  2018-05-14 17:38   ` Paul E. McKenney
  2018-05-14  3:15 ` [PATCH RFC 2/8] rcu: Clarify usage of cond_resched for tasks-RCU Joel Fernandes (Google)
  ` (6 subsequent siblings)
  7 siblings, 2 replies; 42+ messages in thread
From: Joel Fernandes (Google) @ 2018-05-14  3:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google), Paul E. McKenney, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park,
	kernel-team

rcu_seq_snap may be tricky for someone looking at it for the first time.
Lets document how it works with an example to make it easier.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/rcu.h | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 003671825d62..fc3170914ac7 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp)
 	WRITE_ONCE(*sp, rcu_seq_endval(sp));
 }
 
-/* Take a snapshot of the update side's sequence number. */
+/*
+ * Take a snapshot of the update side's sequence number.
+ *
+ * This function predicts what the grace period number will be the next
+ * time an RCU callback will be executed, given the current grace period's
+ * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is
+ * already in progress.
+ *
+ * We do this with a single addition and masking.
+ * For example, if RCU_SEQ_STATE_MASK=1 and the least significant bit (LSB) of
+ * the seq is used to track if a GP is in progress or not, its sufficient if we
+ * add (2+1) and mask with ~1. Let's see why with an example:
+ *
+ * Say the current seq is 6 which is 0b110 (gp is 3 and state bit is 0).
+ * To get the next GP number, we have to at least add 0b10 to this (0x1 << 1)
+ * to account for the state bit. However, if the current seq is 7 (gp is 3 and
+ * state bit is 1), then it means the current grace period is already in
+ * progress so the next time the callback will run is at the end of grace
+ * period number gp+2. To account for the extra +1, we just overflow the LSB by
+ * adding another 0x1 and masking with ~0x1. In case no GP was in progress (RCU
+ * is idle), then the addition of the extra 0x1 and masking will have no
+ * effect. This is calculated as below.
+ */
 static inline unsigned long rcu_seq_snap(unsigned long *sp)
 {
 	unsigned long s;
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related	[flat|nested] 42+ messages in thread
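The add-then-mask trick in the comment above can be checked in isolation. Below is a standalone userspace sketch (not kernel code) of the toy RCU_SEQ_STATE_MASK=1 layout used in the comment's example; the `toy_` names are hypothetical, and the real helpers live in kernel/rcu/rcu.h.

```c
#include <assert.h>

/*
 * Toy single-state-bit layout from the comment above (NOT the kernel's
 * actual values): bit 0 flags a GP in progress, the remaining bits hold
 * the GP counter.
 */
#define TOY_SEQ_CTR_SHIFT	1
#define TOY_SEQ_STATE_MASK	((1UL << TOY_SEQ_CTR_SHIFT) - 1)	/* 0b1 */

/*
 * One addition and one mask, as the comment describes: add 0b10 for the
 * "next GP", add 0b1 so a set state bit overflows into the counter, then
 * clear the state bit.
 */
static unsigned long toy_seq_snap(unsigned long s)
{
	return (s + (1UL << TOY_SEQ_CTR_SHIFT) + 1) & ~TOY_SEQ_STATE_MASK;
}

/* Extract the GP-number part of a toy sequence value. */
static unsigned long toy_seq_ctr(unsigned long s)
{
	return s >> TOY_SEQ_CTR_SHIFT;
}
```

With seq 6 (gp 3, idle) the sketch yields 8 (gp 4, i.e. gp+1); with seq 7 (gp 3, GP in progress) it yields 10 (gp 5, i.e. gp+2), matching the two cases walked through in the comment.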
* Re: [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works
  2018-05-14  3:15 ` [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works Joel Fernandes (Google)
@ 2018-05-14  3:47   ` Randy Dunlap
  2018-05-14  5:05     ` Joel Fernandes
  2018-05-14 17:38   ` Paul E. McKenney
  1 sibling, 1 reply; 42+ messages in thread
From: Randy Dunlap @ 2018-05-14  3:47 UTC (permalink / raw)
  To: Joel Fernandes (Google), linux-kernel
  Cc: Paul E. McKenney, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On 05/13/2018 08:15 PM, Joel Fernandes (Google) wrote:
> rcu_seq_snap may be tricky for someone looking at it for the first time.
> Lets document how it works with an example to make it easier.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/rcu/rcu.h | 24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> index 003671825d62..fc3170914ac7 100644
> --- a/kernel/rcu/rcu.h
> +++ b/kernel/rcu/rcu.h
> @@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp)
>  	WRITE_ONCE(*sp, rcu_seq_endval(sp));
>  }
>  
> -/* Take a snapshot of the update side's sequence number. */
> +/*
> + * Take a snapshot of the update side's sequence number.
> + *
> + * This function predicts what the grace period number will be the next
> + * time an RCU callback will be executed, given the current grace period's
> + * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is
> + * already in progress.
> + *
> + * We do this with a single addition and masking.
> + * For example, if RCU_SEQ_STATE_MASK=1 and the least significant bit (LSB) of
> + * the seq is used to track if a GP is in progress or not, its sufficient if we

	it's

> + * add (2+1) and mask with ~1. Let's see why with an example:
> + *
> + * Say the current seq is 6 which is 0b110 (gp is 3 and state bit is 0).
> + * To get the next GP number, we have to at least add 0b10 to this (0x1 << 1)
> + * to account for the state bit. However, if the current seq is 7 (gp is 3 and
> + * state bit is 1), then it means the current grace period is already in
> + * progress so the next time the callback will run is at the end of grace
> + * period number gp+2. To account for the extra +1, we just overflow the LSB by
> + * adding another 0x1 and masking with ~0x1. In case no GP was in progress (RCU
> + * is idle), then the addition of the extra 0x1 and masking will have no
> + * effect. This is calculated as below.
> + */
> static inline unsigned long rcu_seq_snap(unsigned long *sp)
> {
> 	unsigned long s;
> -- 

~Randy

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works
  2018-05-14  3:47   ` Randy Dunlap
@ 2018-05-14  5:05     ` Joel Fernandes
  0 siblings, 0 replies; 42+ messages in thread
From: Joel Fernandes @ 2018-05-14  5:05 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: linux-kernel, Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team

On Sun, May 13, 2018 at 08:47:24PM -0700, Randy Dunlap wrote:
> On 05/13/2018 08:15 PM, Joel Fernandes (Google) wrote:
> > rcu_seq_snap may be tricky for someone looking at it for the first time.
> > Lets document how it works with an example to make it easier.
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  kernel/rcu/rcu.h | 24 +++++++++++++++++++++++-
> >  1 file changed, 23 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > index 003671825d62..fc3170914ac7 100644
> > --- a/kernel/rcu/rcu.h
> > +++ b/kernel/rcu/rcu.h
> > @@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp)
> >  	WRITE_ONCE(*sp, rcu_seq_endval(sp));
> >  }
> >  
> > -/* Take a snapshot of the update side's sequence number. */
> > +/*
> > + * Take a snapshot of the update side's sequence number.
> > + *
> > + * This function predicts what the grace period number will be the next
> > + * time an RCU callback will be executed, given the current grace period's
> > + * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is
> > + * already in progress.
> > + *
> > + * We do this with a single addition and masking.
> > + * For example, if RCU_SEQ_STATE_MASK=1 and the least significant bit (LSB) of
> > + * the seq is used to track if a GP is in progress or not, its sufficient if we
> 
> 	it's

Pardon my english. Fixed, thanks,

- Joel

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works
  2018-05-14  3:15 ` [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works Joel Fernandes (Google)
  2018-05-14  3:47   ` Randy Dunlap
@ 2018-05-14 17:38   ` Paul E. McKenney
  2018-05-15  1:51     ` Joel Fernandes
  1 sibling, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2018-05-14 17:38 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Sun, May 13, 2018 at 08:15:34PM -0700, Joel Fernandes (Google) wrote:
> rcu_seq_snap may be tricky for someone looking at it for the first time.
> Lets document how it works with an example to make it easier.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/rcu/rcu.h | 24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> index 003671825d62..fc3170914ac7 100644
> --- a/kernel/rcu/rcu.h
> +++ b/kernel/rcu/rcu.h
> @@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp)
>  	WRITE_ONCE(*sp, rcu_seq_endval(sp));
>  }
>  
> -/* Take a snapshot of the update side's sequence number. */
> +/*
> + * Take a snapshot of the update side's sequence number.
> + *
> + * This function predicts what the grace period number will be the next
> + * time an RCU callback will be executed, given the current grace period's
> + * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is
> + * already in progress.

How about something like this?

	This function returns the earliest value of the grace-period
	sequence number that will indicate that a full grace period has
	elapsed since the current time.  Once the grace-period sequence
	number has reached this value, it will be safe to invoke all
	callbacks that have been registered prior to the current time.
	This value is the current grace-period number plus two to the
	power of the number of low-order bits reserved for state, then
	rounded up to the next value in which the state bits are all zero.

> + *
> + * We do this with a single addition and masking.

Please either fold this sentence into rest of the paragraph or add a
blank line after it.

> + * For example, if RCU_SEQ_STATE_MASK=1 and the least significant bit (LSB) of
> + * the seq is used to track if a GP is in progress or not, its sufficient if we
> + * add (2+1) and mask with ~1. Let's see why with an example:
> + *
> + * Say the current seq is 6 which is 0b110 (gp is 3 and state bit is 0).
> + * To get the next GP number, we have to at least add 0b10 to this (0x1 << 1)
> + * to account for the state bit. However, if the current seq is 7 (gp is 3 and
> + * state bit is 1), then it means the current grace period is already in
> + * progress so the next time the callback will run is at the end of grace
> + * period number gp+2. To account for the extra +1, we just overflow the LSB by
> + * adding another 0x1 and masking with ~0x1. In case no GP was in progress (RCU
> + * is idle), then the addition of the extra 0x1 and masking will have no
> + * effect. This is calculated as below.
> + */

Having the explicit numbers is good, but please use RCU_SEQ_STATE_MASK=3,
since that is the current value.  One alternative (or perhaps addition)
is to have a short table of numbers showing the mapping from *sp to the
return value.  (I started from such a table when writing this function,
for whatever that is worth.)

							Thanx, Paul

> static inline unsigned long rcu_seq_snap(unsigned long *sp)
> {
> 	unsigned long s;
> -- 
> 2.17.0.441.gb46fe60e1d-goog
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread
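The table Paul suggests can be generated mechanically. Below is a hedged userspace sketch (not the kernel code itself) using the values this series works with (an assumed RCU_SEQ_CTR_SHIFT of 2, hence RCU_SEQ_STATE_MASK of 3) and the same add-then-mask arithmetic as rcu_seq_snap(), minus the kernel's READ_ONCE() and smp_mb():

```c
#include <assert.h>
#include <stdio.h>

#define RCU_SEQ_CTR_SHIFT	2
#define RCU_SEQ_STATE_MASK	((1UL << RCU_SEQ_CTR_SHIFT) - 1)	/* 0x3 */

/*
 * Same arithmetic as rcu_seq_snap(): take the sequence value directly
 * instead of a pointer, omit READ_ONCE()/smp_mb().
 */
static unsigned long seq_snap_val(unsigned long s)
{
	return (s + 2 * RCU_SEQ_STATE_MASK + 1) & ~RCU_SEQ_STATE_MASK;
}

/* Print the *sp -> return-value mapping over two GP counter increments. */
static void print_snap_table(void)
{
	unsigned long s;

	for (s = 0; s < 8; s++)
		printf("*sp=%lu (ctr=%lu state=%lu) -> %lu\n",
		       s, s >> RCU_SEQ_CTR_SHIFT, s & RCU_SEQ_STATE_MASK,
		       seq_snap_val(s));
}
```

The mapping this prints shows both behaviors discussed in the thread: an idle sequence (state bits zero, e.g. 0 or 4) maps to the counter plus one full GP, while any in-progress state value (1 through 3, or 5 through 7) rounds up to the counter plus two GPs' worth.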
* Re: [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works
  2018-05-14 17:38   ` Paul E. McKenney
@ 2018-05-15  1:51     ` Joel Fernandes
  2018-05-15  3:59       ` Paul E. McKenney
  0 siblings, 1 reply; 42+ messages in thread
From: Joel Fernandes @ 2018-05-15  1:51 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Mon, May 14, 2018 at 10:38:16AM -0700, Paul E. McKenney wrote:
> On Sun, May 13, 2018 at 08:15:34PM -0700, Joel Fernandes (Google) wrote:
> > rcu_seq_snap may be tricky for someone looking at it for the first time.
> > Lets document how it works with an example to make it easier.
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  kernel/rcu/rcu.h | 24 +++++++++++++++++++++++-
> >  1 file changed, 23 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > index 003671825d62..fc3170914ac7 100644
> > --- a/kernel/rcu/rcu.h
> > +++ b/kernel/rcu/rcu.h
> > @@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp)
> >  	WRITE_ONCE(*sp, rcu_seq_endval(sp));
> >  }
> >  
> > -/* Take a snapshot of the update side's sequence number. */
> > +/*
> > + * Take a snapshot of the update side's sequence number.
> > + *
> > + * This function predicts what the grace period number will be the next
> > + * time an RCU callback will be executed, given the current grace period's
> > + * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is
> > + * already in progress.
> 
> How about something like this?
> 
> 	This function returns the earliest value of the grace-period
> 	sequence number that will indicate that a full grace period has
> 	elapsed since the current time.  Once the grace-period sequence
> 	number has reached this value, it will be safe to invoke all
> 	callbacks that have been registered prior to the current time.
> 	This value is the current grace-period number plus two to the
> 	power of the number of low-order bits reserved for state, then
> 	rounded up to the next value in which the state bits are all zero.

This makes sense too, but do you disagree with what I said?

I was kind of thinking of snap along the lines of how the previous code
worked. Where you were calling rcu_cbs_completed() or a function with a
similar name. Now we call _snap.

So basically I connected these 2 facts together to mean that rcu_seq_snap
also does that same thing as rcu_cbs_completed - which is basically it gives
the "next GP" where existing callbacks have already run and new callbacks
will run at the end of this "next GP".

> > + *
> > + * We do this with a single addition and masking.
> 
> Please either fold this sentence into rest of the paragraph or add a
> blank line after it.
> 
> > + * For example, if RCU_SEQ_STATE_MASK=1 and the least significant bit (LSB) of
> > + * the seq is used to track if a GP is in progress or not, its sufficient if we
> > + * add (2+1) and mask with ~1. Let's see why with an example:
> > + *
> > + * Say the current seq is 6 which is 0b110 (gp is 3 and state bit is 0).
> > + * To get the next GP number, we have to at least add 0b10 to this (0x1 << 1)
> > + * to account for the state bit. However, if the current seq is 7 (gp is 3 and
> > + * state bit is 1), then it means the current grace period is already in
> > + * progress so the next time the callback will run is at the end of grace
> > + * period number gp+2. To account for the extra +1, we just overflow the LSB by
> > + * adding another 0x1 and masking with ~0x1. In case no GP was in progress (RCU
> > + * is idle), then the addition of the extra 0x1 and masking will have no
> > + * effect. This is calculated as below.
> > + */
> 
> Having the explicit numbers is good, but please use RCU_SEQ_STATE_MASK=3,
> since that is the current value.  One alternative (or perhaps addition)
> is to have a short table of numbers showing the mapping from *sp to the
> return value.  (I started from such a table when writing this function,
> for whatever that is worth.)

Ok I'll try to give a better example above. thanks,

Also just to let you know, thanks so much for elaborately providing an
example on the other thread where we are discussing the rcu_seq_done check. I
will take some time to trace this down and see if I can zero in on the same
understanding as yours.

I get why we use rcu_seq_snap there in rcu_start_this_gp but the way it its
used is 'c' is the requested GP obtained from _snap, and we are comparing
that with the existing rnp->gp_seq in rcu_seq_done. When that rnp->gp_seq
reaches 'c', it only means rnp->gp_seq is done, it doesn't tell us if 'c' is
done which is what we were trying to check in that loop... that's why I felt
that check wasn't correct - that's my (most likely wrong) take on the matter,
and I'll get back once I trace this a bit more hopefully today :-P

thanks!

- Joel

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works
  2018-05-15  1:51     ` Joel Fernandes
@ 2018-05-15  3:59       ` Paul E. McKenney
  2018-05-15  7:02         ` Joel Fernandes
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2018-05-15  3:59 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Mon, May 14, 2018 at 06:51:33PM -0700, Joel Fernandes wrote:
> On Mon, May 14, 2018 at 10:38:16AM -0700, Paul E. McKenney wrote:
> > On Sun, May 13, 2018 at 08:15:34PM -0700, Joel Fernandes (Google) wrote:
> > > rcu_seq_snap may be tricky for someone looking at it for the first time.
> > > Lets document how it works with an example to make it easier.
> > > 
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > ---
> > >  kernel/rcu/rcu.h | 24 +++++++++++++++++++++++-
> > >  1 file changed, 23 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > > index 003671825d62..fc3170914ac7 100644
> > > --- a/kernel/rcu/rcu.h
> > > +++ b/kernel/rcu/rcu.h
> > > @@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp)
> > >  	WRITE_ONCE(*sp, rcu_seq_endval(sp));
> > >  }
> > >  
> > > -/* Take a snapshot of the update side's sequence number. */
> > > +/*
> > > + * Take a snapshot of the update side's sequence number.
> > > + *
> > > + * This function predicts what the grace period number will be the next
> > > + * time an RCU callback will be executed, given the current grace period's
> > > + * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is
> > > + * already in progress.
> > 
> > How about something like this?
> > 
> > 	This function returns the earliest value of the grace-period
> > 	sequence number that will indicate that a full grace period has
> > 	elapsed since the current time.  Once the grace-period sequence
> > 	number has reached this value, it will be safe to invoke all
> > 	callbacks that have been registered prior to the current time.
> > 	This value is the current grace-period number plus two to the
> > 	power of the number of low-order bits reserved for state, then
> > 	rounded up to the next value in which the state bits are all zero.
> 
> This makes sense too, but do you disagree with what I said?

In a pedantic sense, definitely.  RCU callbacks are being executed pretty
much all the time on a busy system, so it is only the recently queued
ones that are guaranteed to be deferred that long.  And my experience
indicates that someone really will get confused by that distinction,
so I feel justified in being pedantic in this case.

> I was kind of thinking of snap along the lines of how the previous code
> worked. Where you were calling rcu_cbs_completed() or a function with a
> similar name. Now we call _snap.

You are quite correct that rcu_seq_snap() is the new analog of
rcu_cbs_completed(), though there are differences due to the old ->gpnum
and ->completed now being represented by a single ->gp_seq value.

> So basically I connected these 2 facts together to mean that rcu_seq_snap
> also does that same thing as rcu_cbs_completed - which is basically it gives
> the "next GP" where existing callbacks have already run and new callbacks
> will run at the end of this "next GP".

Ah, but you didn't have the qualifier "new" in your original.  And even
then, the definition of "new" might be different for different readers.

> > > + *
> > > + * We do this with a single addition and masking.
> > 
> > Please either fold this sentence into rest of the paragraph or add a
> > blank line after it.
> > 
> > > + * For example, if RCU_SEQ_STATE_MASK=1 and the least significant bit (LSB) of
> > > + * the seq is used to track if a GP is in progress or not, its sufficient if we
> > > + * add (2+1) and mask with ~1. Let's see why with an example:
> > > + *
> > > + * Say the current seq is 6 which is 0b110 (gp is 3 and state bit is 0).
> > > + * To get the next GP number, we have to at least add 0b10 to this (0x1 << 1)
> > > + * to account for the state bit. However, if the current seq is 7 (gp is 3 and
> > > + * state bit is 1), then it means the current grace period is already in
> > > + * progress so the next time the callback will run is at the end of grace
> > > + * period number gp+2. To account for the extra +1, we just overflow the LSB by
> > > + * adding another 0x1 and masking with ~0x1. In case no GP was in progress (RCU
> > > + * is idle), then the addition of the extra 0x1 and masking will have no
> > > + * effect. This is calculated as below.
> > > + */
> > 
> > Having the explicit numbers is good, but please use RCU_SEQ_STATE_MASK=3,
> > since that is the current value.  One alternative (or perhaps addition)
> > is to have a short table of numbers showing the mapping from *sp to the
> > return value.  (I started from such a table when writing this function,
> > for whatever that is worth.)
> 
> Ok I'll try to give a better example above. thanks,

Sounds good!

> Also just to let you know, thanks so much for elaborately providing an
> example on the other thread where we are discussing the rcu_seq_done check. I
> will take some time to trace this down and see if I can zero in on the same
> understanding as yours.
> 
> I get why we use rcu_seq_snap there in rcu_start_this_gp but the way it its
> used is 'c' is the requested GP obtained from _snap, and we are comparing
> that with the existing rnp->gp_seq in rcu_seq_done. When that rnp->gp_seq
> reaches 'c', it only means rnp->gp_seq is done, it doesn't tell us if 'c' is
> done which is what we were trying to check in that loop... that's why I felt
> that check wasn't correct - that's my (most likely wrong) take on the matter,
> and I'll get back once I trace this a bit more hopefully today :-P

If your point is that interrupts are disabled throughout, so there isn't
much chance of the grace period completing during that time, you are
mostly right.  The places you might not be right are the idle loop and
offline CPUs.  And yes, call_rcu() doesn't like queuing callbacks onto
offline CPUs, but IIRC it is just fine in the case where callbacks have
been offloaded from that CPU.

And if you instead say that "c" is the requested final ->gp_seq value
obtained from _snap(), the thought process might go more easily.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 42+ messages in thread
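The rcu_seq_done() semantics being debated here come down to a wrap-safe "has the sequence caught up to the snapshot" comparison. Below is a hedged userspace sketch of that comparison, modeled on the kernel's ULONG_CMP_GE() macro and rcu_seq_done() but with READ_ONCE() dropped and the current sequence value passed directly:

```c
#include <assert.h>
#include <limits.h>

/*
 * Wrap-safe "a is at or after b": the unsigned difference a - b is
 * treated as a signed-style distance, as in the kernel's ULONG_CMP_GE().
 */
#define ULONG_CMP_GE(a, b)	(ULONG_MAX / 2 >= (a) - (b))

/*
 * Has the grace period associated with snapshot s elapsed, given the
 * current sequence value cur?  Mirrors rcu_seq_done(), minus READ_ONCE()
 * and the pointer argument.
 */
static int seq_done(unsigned long cur, unsigned long s)
{
	return ULONG_CMP_GE(cur, s);
}
```

The wrap-safety is the point of the macro: even when the counter has wrapped past ULONG_MAX since the snapshot was taken, the subtraction still yields a small positive distance, so a snapshot taken near the top of the range is still correctly reported as done once the sequence wraps slightly past zero.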
* Re: [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works
  2018-05-15  3:59       ` Paul E. McKenney
@ 2018-05-15  7:02         ` Joel Fernandes
  2018-05-15 12:55           ` Paul E. McKenney
  0 siblings, 1 reply; 42+ messages in thread
From: Joel Fernandes @ 2018-05-15  7:02 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

Hi Paul,

Good morning, hope you're having a great Tuesday. I managed to find some
evening hours today to dig into this a bit more.

On Mon, May 14, 2018 at 08:59:52PM -0700, Paul E. McKenney wrote:
> On Mon, May 14, 2018 at 06:51:33PM -0700, Joel Fernandes wrote:
> > On Mon, May 14, 2018 at 10:38:16AM -0700, Paul E. McKenney wrote:
> > > On Sun, May 13, 2018 at 08:15:34PM -0700, Joel Fernandes (Google) wrote:
> > > > rcu_seq_snap may be tricky for someone looking at it for the first time.
> > > > Lets document how it works with an example to make it easier.
> > > > 
> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > ---
> > > >  kernel/rcu/rcu.h | 24 +++++++++++++++++++++++-
> > > >  1 file changed, 23 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > > > index 003671825d62..fc3170914ac7 100644
> > > > --- a/kernel/rcu/rcu.h
> > > > +++ b/kernel/rcu/rcu.h
> > > > @@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp)
> > > >  	WRITE_ONCE(*sp, rcu_seq_endval(sp));
> > > >  }
> > > >  
> > > > -/* Take a snapshot of the update side's sequence number. */
> > > > +/*
> > > > + * Take a snapshot of the update side's sequence number.
> > > > + *
> > > > + * This function predicts what the grace period number will be the next
> > > > + * time an RCU callback will be executed, given the current grace period's
> > > > + * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is
> > > > + * already in progress.
> > > 
> > > How about something like this?
> > > 
> > > 	This function returns the earliest value of the grace-period
> > > 	sequence number that will indicate that a full grace period has
> > > 	elapsed since the current time.  Once the grace-period sequence
> > > 	number has reached this value, it will be safe to invoke all
> > > 	callbacks that have been registered prior to the current time.
> > > 	This value is the current grace-period number plus two to the
> > > 	power of the number of low-order bits reserved for state, then
> > > 	rounded up to the next value in which the state bits are all zero.
> > 
> > This makes sense too, but do you disagree with what I said?
> 
> In a pedantic sense, definitely.  RCU callbacks are being executed pretty
> much all the time on a busy system, so it is only the recently queued
> ones that are guaranteed to be deferred that long.  And my experience
> indicates that someone really will get confused by that distinction,
> so I feel justified in being pedantic in this case.

Ok I agree, I'll include your comment above.

> > Also just to let you know, thanks so much for elaborately providing an
> > example on the other thread where we are discussing the rcu_seq_done check. I
> > will take some time to trace this down and see if I can zero in on the same
> > understanding as yours.
> > 
> > I get why we use rcu_seq_snap there in rcu_start_this_gp but the way it its
> > used is 'c' is the requested GP obtained from _snap, and we are comparing
> > that with the existing rnp->gp_seq in rcu_seq_done. When that rnp->gp_seq
> > reaches 'c', it only means rnp->gp_seq is done, it doesn't tell us if 'c' is
> > done which is what we were trying to check in that loop... that's why I felt
> > that check wasn't correct - that's my (most likely wrong) take on the matter,
> > and I'll get back once I trace this a bit more hopefully today :-P
> 
> If your point is that interrupts are disabled throughout, so there isn't
> much chance of the grace period completing during that time, you are
> mostly right.  The places you might not be right are the idle loop and
> offline CPUs.  And yes, call_rcu() doesn't like queuing callbacks onto
> offline CPUs, but IIRC it is just fine in the case where callbacks have
> been offloaded from that CPU.
> 
> And if you instead say that "c" is the requested final ->gp_seq value
> obtained from _snap(), the thought process might go more easily.

Yes I agree with c being the requested final value which is the GP for which
the callbacks will be queued. At the end of the GP c, the callbacks will have
executed.

About the rcu_seq_done check and why I believe its not right to use it in
that funnel locking loop, if you could allow me to try argument my point from
a different angle...

We agreed that the way gp_seq numbers work and are compared with each other
to identify if a GP is elapsed or not, is different from the way the previous
numbers (gp_num) were compared.

Most notably, before the gp_seq conversions - inorder to start a GP, we were
doing gp_num += 1, and completed had to catch up to gp_num + 1 to mark the
end.

Now with gp_seq, for a gp to start, we don't do the "+1", we just set the
state bits. To mark the end, we clear the state bits and increment the gp_num
part of gp_seq.

However, in the below commit 12d6c129fd0a ("rcu: Convert grace-period
requests to ->gp_seq"). You did a one-to-one replacement of the ULONG_CMP_GE
with rcu_seq_done. You did so even though the gp_seq numbers work differently
from previously used numbers (gp_num and completed).

I would then argue that because of the differences above, a one-to-one
replacement of the ULONG_CMP_GE with the rcu_seq_done wouldn't make sense.

I argue this because, in previous code - the ULONG_CMP_GE made sense for the
gp_num way of things because, if c == gp_num, that means that :
- c started already
- c has finished.

Which worked correctly, because we have nothing to do and we can bail without
setting any flag.

Where as now, with the gp_seq regime, c == gp_seq means:
- c-1 finished (I meant -1 subtracted from the gp_num part of c)

This would cause us to bail without setting any flag for starting c.

I did some tracing and I could never hit the rcu_seq_done check because it
never happens in my tracing that _snap returned something for which
rcu_seq_done returned true, so I'm not sure if this check is needed, but
you're the expert ;)

@@ -1629,16 +1583,16 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 	 * not be released.
 	 */
 	raw_lockdep_assert_held_rcu_node(rnp);
+	WARN_ON_ONCE(c & 0x2); /* Catch any lingering use of ->gpnum. */
+	WARN_ON_ONCE(((rnp->completed << RCU_SEQ_CTR_SHIFT) >> RCU_SEQ_CTR_SHIFT) != rcu_seq_ctr(rnp->gp_seq)); /* Catch any ->completed/->gp_seq mismatches. */
 	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
 	for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
 		if (rnp_root != rnp)
 			raw_spin_lock_rcu_node(rnp_root);
-		WARN_ON_ONCE(ULONG_CMP_LT(rnp_root->gpnum +
-					  need_future_gp_mask(), c));
 		if (need_future_gp_element(rnp_root, c) ||
-		    ULONG_CMP_GE(rnp_root->gpnum, c) ||
+		    rcu_seq_done(&rnp_root->gp_seq, c) ||

		    ^^^^ A direct replacement of ULONG_CMP_GE is bit weird? It
		    means we bail out if c-1 completed, and we don't set any
		    flag for starting c. That could result in the clean up
		    never starting c?

 		    (rnp != rnp_root &&
-		     rnp_root->gpnum != rnp_root->completed)) {
+		     rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))) {
 			trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
 			goto unlock_out;
 		}

^ permalink raw reply	[flat|nested] 42+ messages in thread
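The start/end regime described in this email can be sketched in a few lines. This is a hedged userspace model (not the kernel helpers themselves, which also carry memory barriers and WARN_ON_ONCE checks), following rcu_seq_start() and rcu_seq_endval() with an assumed RCU_SEQ_CTR_SHIFT of 2:

```c
#include <assert.h>

#define RCU_SEQ_CTR_SHIFT	2
#define RCU_SEQ_STATE_MASK	((1UL << RCU_SEQ_CTR_SHIFT) - 1)

/*
 * Start a GP: no "+1" on the counter part, just make the low-order
 * state bits nonzero (models rcu_seq_start() without smp_mb()).
 */
static unsigned long seq_start(unsigned long s)
{
	return s + 1;	/* state goes 0 -> 1 */
}

/*
 * End a GP: clear the state bits and bump the counter part in one step.
 * Setting all state bits and adding 1 carries into the counter (models
 * rcu_seq_endval()/rcu_seq_end()).
 */
static unsigned long seq_end(unsigned long s)
{
	return (s | RCU_SEQ_STATE_MASK) + 1;
}
```

Starting from an idle value of 4 (counter 1, state 0), seq_start() yields 5 (state bits set, counter unchanged), and seq_end() then yields 8 (state cleared, counter now 2), which is the "clear the state bits and increment the gp_num part" behavior the email contrasts against the old gp_num/completed pair.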
* Re: [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works
  2018-05-15  7:02         ` Joel Fernandes
@ 2018-05-15 12:55           ` Paul E. McKenney
  2018-05-15 18:41             ` Joel Fernandes
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2018-05-15 12:55 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Tue, May 15, 2018 at 12:02:43AM -0700, Joel Fernandes wrote:
> Hi Paul,
> 
> Good morning, hope you're having a great Tuesday. I managed to find some
> evening hours today to dig into this a bit more.
> 
> On Mon, May 14, 2018 at 08:59:52PM -0700, Paul E. McKenney wrote:
> > On Mon, May 14, 2018 at 06:51:33PM -0700, Joel Fernandes wrote:
> > > On Mon, May 14, 2018 at 10:38:16AM -0700, Paul E. McKenney wrote:
> > > > On Sun, May 13, 2018 at 08:15:34PM -0700, Joel Fernandes (Google) wrote:
> > > > > rcu_seq_snap may be tricky for someone looking at it for the first time.
> > > > > Lets document how it works with an example to make it easier.
> > > > > 
> > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > ---
> > > > >  kernel/rcu/rcu.h | 24 +++++++++++++++++++++++-
> > > > >  1 file changed, 23 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > > > > index 003671825d62..fc3170914ac7 100644
> > > > > --- a/kernel/rcu/rcu.h
> > > > > +++ b/kernel/rcu/rcu.h
> > > > > @@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp)
> > > > >  	WRITE_ONCE(*sp, rcu_seq_endval(sp));
> > > > >  }
> > > > >  
> > > > > -/* Take a snapshot of the update side's sequence number. */
> > > > > +/*
> > > > > + * Take a snapshot of the update side's sequence number.
> > > > > + *
> > > > > + * This function predicts what the grace period number will be the next
> > > > > + * time an RCU callback will be executed, given the current grace period's
> > > > > + * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is
> > > > > + * already in progress.
> > > > 
> > > > How about something like this?
> > > > 
> > > > 	This function returns the earliest value of the grace-period
> > > > 	sequence number that will indicate that a full grace period has
> > > > 	elapsed since the current time.  Once the grace-period sequence
> > > > 	number has reached this value, it will be safe to invoke all
> > > > 	callbacks that have been registered prior to the current time.
> > > > 	This value is the current grace-period number plus two to the
> > > > 	power of the number of low-order bits reserved for state, then
> > > > 	rounded up to the next value in which the state bits are all zero.
> > > 
> > > This makes sense too, but do you disagree with what I said?
> > 
> > In a pedantic sense, definitely.  RCU callbacks are being executed pretty
> > much all the time on a busy system, so it is only the recently queued
> > ones that are guaranteed to be deferred that long.  And my experience
> > indicates that someone really will get confused by that distinction,
> > so I feel justified in being pedantic in this case.
> 
> Ok I agree, I'll include your comment above.
> 
> > > Also just to let you know, thanks so much for elaborately providing an
> > > example on the other thread where we are discussing the rcu_seq_done check. I
> > > will take some time to trace this down and see if I can zero in on the same
> > > understanding as yours.
> > > 
> > > I get why we use rcu_seq_snap there in rcu_start_this_gp but the way it its
> > > used is 'c' is the requested GP obtained from _snap, and we are comparing
> > > that with the existing rnp->gp_seq in rcu_seq_done. When that rnp->gp_seq
> > > reaches 'c', it only means rnp->gp_seq is done, it doesn't tell us if 'c' is
> > > done which is what we were trying to check in that loop... that's why I felt
> > > that check wasn't correct - that's my (most likely wrong) take on the matter,
> > > and I'll get back once I trace this a bit more hopefully today :-P
> > 
> > If your point is that interrupts are disabled throughout, so there isn't
> > much chance of the grace period completing during that time, you are
> > mostly right.  The places you might not be right are the idle loop and
> > offline CPUs.  And yes, call_rcu() doesn't like queuing callbacks onto
> > offline CPUs, but IIRC it is just fine in the case where callbacks have
> > been offloaded from that CPU.
> > 
> > And if you instead say that "c" is the requested final ->gp_seq value
> > obtained from _snap(), the thought process might go more easily.
> 
> Yes I agree with c being the requested final value which is the GP for which
> the callbacks will be queued. At the end of the GP c, the callbacks will have
> executed.
> 
> About the rcu_seq_done check and why I believe its not right to use it in
> that funnel locking loop, if you could allow me to try argument my point from
> a different angle...
> 
> We agreed that the way gp_seq numbers work and are compared with each other
> to identify if a GP is elapsed or not, is different from the way the previous
> numbers (gp_num) were compared.
> 
> Most notably, before the gp_seq conversions - inorder to start a GP, we were
> doing gp_num += 1, and completed had to catch up to gp_num + 1 to mark the
> end.
> 
> Now with gp_seq, for a gp to start, we don't do the "+1", we just set the
> state bits. To mark the end, we clear the state bits and increment the gp_num
> part of gp_seq.
> 
> However, in the below commit 12d6c129fd0a ("rcu: Convert grace-period
> requests to ->gp_seq"). You did a one-to-one replacement of the ULONG_CMP_GE
> with rcu_seq_done. You did so even though the gp_seq numbers work differently
> from previously used numbers (gp_num and completed).
that's why I felt that check wasn't > > > correct - that's my (most likely wrong) take on the matter, and I'll get back > > > once I trace this a bit more hopefully today :-P > > > > If your point is that interrupts are disabled throughout, so there isn't > > much chance of the grace period completing during that time, you are > > mostly right. The places you might not be right are the idle loop and > > offline CPUs. And yes, call_rcu() doesn't like queuing callbacks onto > > offline CPUs, but IIRC it is just fine in the case where callbacks have > > been offloaded from that CPU. > > > > And if you instead say that "c" is the requested final ->gp_seq value > > obtained from _snap(), the thought process might go more easily. > > Yes I agree with c being the requested final value which is the GP for which > the callbacks will be queued. At the end of the GP c, the callbacks will have > executed. > > About the rcu_seq_done check and why I believe it's not right to use it in > that funnel locking loop, if you could allow me to try to argue my point from > a different angle... > > We agreed that the way gp_seq numbers work and are compared with each other > to identify if a GP is elapsed or not, is different from the way the previous > numbers (gp_num) were compared. > > Most notably, before the gp_seq conversions - in order to start a GP, we were > doing gp_num += 1, and completed had to catch up to gp_num + 1 to mark the > end.
> > I would then argue that because of the differences above, a one-to-one > replacement of the ULONG_CMP_GE with the rcu_seq_done wouldn't make sense. > > I argue this because, in previous code - the ULONG_CMP_GE made sense for the gp_num > way of things because, if c == gp_num, that means that : > - c started already > - c has finished. > Which worked correctly, because we have nothing to do and we can bail > without setting any flag. > > Where as now, with the gp_seq regime, c == gp_seq means: > - c-1 finished (I meant -1 subtracted from the gp_num part of c) > This would cause us to bail without setting any flag for starting c. > > I did some tracing and I could never hit the rcu_seq_done check because it > never happens in my tracing that _snap returned something for which > rcu_seq_done returned true, so I'm not sure if this check is needed, but > you're the expert ;) > > @@ -1629,16 +1583,16 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, > * not be released. > */ > raw_lockdep_assert_held_rcu_node(rnp); > + WARN_ON_ONCE(c & 0x2); /* Catch any lingering use of ->gpnum. */ > + WARN_ON_ONCE(((rnp->completed << RCU_SEQ_CTR_SHIFT) >> RCU_SEQ_CTR_SHIFT) != rcu_seq_ctr(rnp->gp_seq)); /* Catch any ->completed/->gp_seq mismatches. */ > trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf")); > for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) { > if (rnp_root != rnp) > raw_spin_lock_rcu_node(rnp_root); > - WARN_ON_ONCE(ULONG_CMP_LT(rnp_root->gpnum + > - need_future_gp_mask(), c)); > if (need_future_gp_element(rnp_root, c) || > - ULONG_CMP_GE(rnp_root->gpnum, c) || > + rcu_seq_done(&rnp_root->gp_seq, c) || > > ^^^^ > A direct replacement of ULONG_CMP_GE is bit weird? It > means we bail out if c-1 completed, and we don't set any > flag for starting c. That could result in the clean up > never starting c? Ah, I see what you are getting at now. 
What I do instead in 334dac2da529 ("rcu: Make rcu_nocb_wait_gp() check if GP already requested") is to push the request down to the leaves of the tree and to the rcu_data structure. Once that commit is in place, the check for the grace period already being in progress isn't all that helpful, though I suppose that it could be added. One way to do that would be to replace "rcu_seq_done(&rnp_root->gp_seq, c)" with ULONG_CMP_GE(rnp_root->gpnum, (c - RCU_SEQ_STATE_MASK))", but that seems a bit baroque to me. The point of the rcu_seq_done() is to catch long delays, but given the current implementation, the fact that interrupts are disabled across all calls should prevent the rcu_seq_done() from ever returning true. (Famous last words!) So, yes, it could be removed, in theory, at least. At least until the real-time guys force me to come up with a way to run this code with interrupts enabled (hopefully never!). If I were to do that, I would first wrap it with a WARN_ON_ONCE() and leave it that way for an extended period of testing. Yes, I am paranoid. Why do you ask? ;-) Thanx, Paul > (rnp != rnp_root && > - rnp_root->gpnum != rnp_root->completed)) { > + rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))) { > trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted")); > goto unlock_out; > } > > > > ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works 2018-05-15 12:55 ` Paul E. McKenney @ 2018-05-15 18:41 ` Joel Fernandes 2018-05-15 19:08 ` Paul E. McKenney 0 siblings, 1 reply; 42+ messages in thread From: Joel Fernandes @ 2018-05-15 18:41 UTC (permalink / raw) To: Paul E. McKenney Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Tue, May 15, 2018 at 05:55:07AM -0700, Paul E. McKenney wrote: > On Tue, May 15, 2018 at 12:02:43AM -0700, Joel Fernandes wrote: > > Hi Paul, > > Good morning, hope you're having a great Tuesday. I managed to find some > > evening hours today to dig into this a bit more. > > > > On Mon, May 14, 2018 at 08:59:52PM -0700, Paul E. McKenney wrote: > > > On Mon, May 14, 2018 at 06:51:33PM -0700, Joel Fernandes wrote: > > > > On Mon, May 14, 2018 at 10:38:16AM -0700, Paul E. McKenney wrote: > > > > > On Sun, May 13, 2018 at 08:15:34PM -0700, Joel Fernandes (Google) wrote: > > > > > > rcu_seq_snap may be tricky for someone looking at it for the first time. > > > > > > Lets document how it works with an example to make it easier. > > > > > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > > > --- > > > > > > kernel/rcu/rcu.h | 24 +++++++++++++++++++++++- > > > > > > 1 file changed, 23 insertions(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > > > > > > index 003671825d62..fc3170914ac7 100644 > > > > > > --- a/kernel/rcu/rcu.h > > > > > > +++ b/kernel/rcu/rcu.h > > > > > > @@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp) > > > > > > WRITE_ONCE(*sp, rcu_seq_endval(sp)); > > > > > > } > > > > > > > > > > > > -/* Take a snapshot of the update side's sequence number. */ > > > > > > +/* > > > > > > + * Take a snapshot of the update side's sequence number. 
> > > > > > + * > > > > > > + * This function predicts what the grace period number will be the next > > > > > > + * time an RCU callback will be executed, given the current grace period's > > > > > > + * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is > > > > > > + * already in progress. > > > > > > > > > > How about something like this? > > > > > > > > > > This function returns the earliest value of the grace-period > > > > > sequence number that will indicate that a full grace period has > > > > > elapsed since the current time. Once the grace-period sequence > > > > > number has reached this value, it will be safe to invoke all > > > > > callbacks that have been registered prior to the current time. > > > > > This value is the current grace-period number plus two to the > > > > > power of the number of low-order bits reserved for state, then > > > > > rounded up to the next value in which the state bits are all zero. > > > > > > > > This makes sense too, but do you disagree with what I said? > > > > > > In a pedantic sense, definitely. RCU callbacks are being executed pretty > > > much all the time on a busy system, so it is only the recently queued > > > ones that are guaranteed to be deferred that long. And my experience > > > indicates that someone really will get confused by that distinction, > > > so I feel justified in being pedantic in this case. > > > > Ok I agree, I'll include your comment above. > > > > > > Also just to let you know, thanks so much for elaborately providing an > > > > example on the other thread where we are discussing the rcu_seq_done check. I > > > > will take some time to trace this down and see if I can zero in on the same > > > > understanding as yours. > > > > > > > > I get why we use rcu_seq_snap there in rcu_start_this_gp but the way it its > > > > used is 'c' is the requested GP obtained from _snap, and we are comparing that with the existing > > > > rnp->gp_seq in rcu_seq_done. 
When that rnp->gp_seq reaches 'c', it only > > > > means rnp->gp_seq is done, it doesn't tell us if 'c' is done which is what > > > > we were trying to check in that loop... that's why I felt that check wasn't > > > > correct - that's my (most likely wrong) take on the matter, and I'll get back > > > > once I trace this a bit more hopefully today :-P > > > > > > If your point is that interrupts are disabled throughout, so there isn't > > > much chance of the grace period completing during that time, you are > > > mostly right. The places you might not be right are the idle loop and > > > offline CPUs. And yes, call_rcu() doesn't like queuing callbacks onto > > > offline CPUs, but IIRC it is just fine in the case where callbacks have > > > been offloaded from that CPU. > > > > > > And if you instead say that "c" is the requested final ->gp_seq value > > > obtained from _snap(), the thought process might go more easily. > > > > Yes I agree with c being the requested final value which is the GP for which > > the callbacks will be queued. At the end of the GP c, the callbacks will have > > executed. > > > > About the rcu_seq_done check and why I believe its not right to use it in > > that funnel locking loop, if you could allow me to try argument my point from > > a different angle... > > > > We agreed that the way gp_seq numbers work and are compared with each other > > to identify if a GP is elapsed or not, is different from the way the previous > > numbers (gp_num) were compared. > > > > Most notably, before the gp_seq conversions - inorder to start a GP, we were > > doing gp_num += 1, and completed had to catch up to gp_num + 1 to mark the > > end. > > > > Now with gp_seq, for a gp to start, we don't do the "+1", we just set the > > state bits. To mark the end, we clear the state bits and increment the gp_num > > part of gp_seq. > > > > However, in the below commit 12d6c129fd0a ("rcu: Convert grace-period > > requests to ->gp_seq"). 
You did a one-to-one replacement of the ULONG_CMP_GE > > with rcu_seq_done. You did so even though the gp_seq numbers work differently > > from previously used numbers (gp_num and completed). > > > > I would then argue that because of the differences above, a one-to-one > > replacement of the ULONG_CMP_GE with the rcu_seq_done wouldn't make sense. > > > > I argue this because, in previous code - the ULONG_CMP_GE made sense for the gp_num > > way of things because, if c == gp_num, that means that : > > - c started already > > - c has finished. > > Which worked correctly, because we have nothing to do and we can bail > > without setting any flag. > > > > Where as now, with the gp_seq regime, c == gp_seq means: > > - c-1 finished (I meant -1 subtracted from the gp_num part of c) > > This would cause us to bail without setting any flag for starting c. > > > > I did some tracing and I could never hit the rcu_seq_done check because it > > never happens in my tracing that _snap returned something for which > > rcu_seq_done returned true, so I'm not sure if this check is needed, but > > you're the expert ;) > > > > @@ -1629,16 +1583,16 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, > > * not be released. > > */ > > raw_lockdep_assert_held_rcu_node(rnp); > > + WARN_ON_ONCE(c & 0x2); /* Catch any lingering use of ->gpnum. */ > > + WARN_ON_ONCE(((rnp->completed << RCU_SEQ_CTR_SHIFT) >> RCU_SEQ_CTR_SHIFT) != rcu_seq_ctr(rnp->gp_seq)); /* Catch any ->completed/->gp_seq mismatches. */ > > trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf")); > > for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) { > > if (rnp_root != rnp) > > raw_spin_lock_rcu_node(rnp_root); > > - WARN_ON_ONCE(ULONG_CMP_LT(rnp_root->gpnum + > > - need_future_gp_mask(), c)); > > if (need_future_gp_element(rnp_root, c) || > > - ULONG_CMP_GE(rnp_root->gpnum, c) || > > + rcu_seq_done(&rnp_root->gp_seq, c) || > > > > ^^^^ > > A direct replacement of ULONG_CMP_GE is bit weird? 
It > > means we bail out if c-1 completed, and we don't set any > > flag for starting c. That could result in the clean up > > never starting c? > > Ah, I see what you are getting at now. > > What I do instead in 334dac2da529 ("rcu: Make rcu_nocb_wait_gp() check > if GP already requested") is to push the request down to the leaves of > the tree and to the rcu_data structure. Once that commit is in place, > the check for the grace period already being in progress isn't all > that helpful, though I suppose that it could be added. One way to > do that would be to replace "rcu_seq_done(&rnp_root->gp_seq, c)" with > ULONG_CMP_GE(rnp_root->gpnum, (c - RCU_SEQ_STATE_MASK))", but that seems > a bit baroque to me. > > The point of the rcu_seq_done() is to catch long delays, but given the > current implementation, the fact that interrupts are disabled across > all calls should prevent the rcu_seq_done() from ever returning true. > (Famous last words!) So, yes, it could be removed, in theory, at least. > At least until the real-time guys force me to come up with a way to > run this code with interrupts enabled (hopefully never!). > > If I were to do that, I would first wrap it with a WARN_ON_ONCE() and > leave it that way for an extended period of testing. Yes, I am paranoid. > Why do you ask? ;-) :-D Ah I see what you're doing in that commit where you're moving the furthest request down to the leaves, so that would protect against the scenario I was describing and set the gp_seq_needed of the leaf. The code would be correct then, but one issue is it would shout out the 'Prestarted' tracepoint for 'c' when that's not really true.. rcu_seq_done(&rnp_root->gp_seq, c) translates to ULONG_CMP_GE(&rnp_root->gp_seq, c) which translates to the fact that c-1 completed. So in this case if rcu_seq_done returns true, then saying that c has been 'Prestarted' seems a bit off to me. 
It should be 'Startedleaf' or something since what we really are doing is just marking the leaf as you mentioned in the unlock_out part for a future start. thanks! - Joel ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works 2018-05-15 18:41 ` Joel Fernandes @ 2018-05-15 19:08 ` Paul E. McKenney 2018-05-15 22:55 ` Joel Fernandes 0 siblings, 1 reply; 42+ messages in thread From: Paul E. McKenney @ 2018-05-15 19:08 UTC (permalink / raw) To: Joel Fernandes Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Tue, May 15, 2018 at 11:41:15AM -0700, Joel Fernandes wrote: > On Tue, May 15, 2018 at 05:55:07AM -0700, Paul E. McKenney wrote: > > On Tue, May 15, 2018 at 12:02:43AM -0700, Joel Fernandes wrote: > > > Hi Paul, > > > Good morning, hope you're having a great Tuesday. I managed to find some > > > evening hours today to dig into this a bit more. > > > > > > On Mon, May 14, 2018 at 08:59:52PM -0700, Paul E. McKenney wrote: > > > > On Mon, May 14, 2018 at 06:51:33PM -0700, Joel Fernandes wrote: > > > > > On Mon, May 14, 2018 at 10:38:16AM -0700, Paul E. McKenney wrote: > > > > > > On Sun, May 13, 2018 at 08:15:34PM -0700, Joel Fernandes (Google) wrote: > > > > > > > rcu_seq_snap may be tricky for someone looking at it for the first time. > > > > > > > Lets document how it works with an example to make it easier. > > > > > > > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > > > > --- > > > > > > > kernel/rcu/rcu.h | 24 +++++++++++++++++++++++- > > > > > > > 1 file changed, 23 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > > > > > > > index 003671825d62..fc3170914ac7 100644 > > > > > > > --- a/kernel/rcu/rcu.h > > > > > > > +++ b/kernel/rcu/rcu.h > > > > > > > @@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp) > > > > > > > WRITE_ONCE(*sp, rcu_seq_endval(sp)); > > > > > > > } > > > > > > > > > > > > > > -/* Take a snapshot of the update side's sequence number. 
*/ > > > > > > > +/* > > > > > > > + * Take a snapshot of the update side's sequence number. > > > > > > > + * > > > > > > > + * This function predicts what the grace period number will be the next > > > > > > > + * time an RCU callback will be executed, given the current grace period's > > > > > > > + * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is > > > > > > > + * already in progress. > > > > > > > > > > > > How about something like this? > > > > > > > > > > > > This function returns the earliest value of the grace-period > > > > > > sequence number that will indicate that a full grace period has > > > > > > elapsed since the current time. Once the grace-period sequence > > > > > > number has reached this value, it will be safe to invoke all > > > > > > callbacks that have been registered prior to the current time. > > > > > > This value is the current grace-period number plus two to the > > > > > > power of the number of low-order bits reserved for state, then > > > > > > rounded up to the next value in which the state bits are all zero. > > > > > > > > > > This makes sense too, but do you disagree with what I said? > > > > > > > > In a pedantic sense, definitely. RCU callbacks are being executed pretty > > > > much all the time on a busy system, so it is only the recently queued > > > > ones that are guaranteed to be deferred that long. And my experience > > > > indicates that someone really will get confused by that distinction, > > > > so I feel justified in being pedantic in this case. > > > > > > Ok I agree, I'll include your comment above. > > > > > > > > Also just to let you know, thanks so much for elaborately providing an > > > > > example on the other thread where we are discussing the rcu_seq_done check. I > > > > > will take some time to trace this down and see if I can zero in on the same > > > > > understanding as yours. 
> > > > > > > > > > I get why we use rcu_seq_snap there in rcu_start_this_gp but the way it its > > > > > used is 'c' is the requested GP obtained from _snap, and we are comparing that with the existing > > > > > rnp->gp_seq in rcu_seq_done. When that rnp->gp_seq reaches 'c', it only > > > > > means rnp->gp_seq is done, it doesn't tell us if 'c' is done which is what > > > > > we were trying to check in that loop... that's why I felt that check wasn't > > > > > correct - that's my (most likely wrong) take on the matter, and I'll get back > > > > > once I trace this a bit more hopefully today :-P > > > > > > > > If your point is that interrupts are disabled throughout, so there isn't > > > > much chance of the grace period completing during that time, you are > > > > mostly right. The places you might not be right are the idle loop and > > > > offline CPUs. And yes, call_rcu() doesn't like queuing callbacks onto > > > > offline CPUs, but IIRC it is just fine in the case where callbacks have > > > > been offloaded from that CPU. > > > > > > > > And if you instead say that "c" is the requested final ->gp_seq value > > > > obtained from _snap(), the thought process might go more easily. > > > > > > Yes I agree with c being the requested final value which is the GP for which > > > the callbacks will be queued. At the end of the GP c, the callbacks will have > > > executed. > > > > > > About the rcu_seq_done check and why I believe its not right to use it in > > > that funnel locking loop, if you could allow me to try argument my point from > > > a different angle... > > > > > > We agreed that the way gp_seq numbers work and are compared with each other > > > to identify if a GP is elapsed or not, is different from the way the previous > > > numbers (gp_num) were compared. > > > > > > Most notably, before the gp_seq conversions - inorder to start a GP, we were > > > doing gp_num += 1, and completed had to catch up to gp_num + 1 to mark the > > > end. 
> > > > > > Now with gp_seq, for a gp to start, we don't do the "+1", we just set the > > > state bits. To mark the end, we clear the state bits and increment the gp_num > > > part of gp_seq. > > > > > > However, in the below commit 12d6c129fd0a ("rcu: Convert grace-period > > > requests to ->gp_seq"). You did a one-to-one replacement of the ULONG_CMP_GE > > > with rcu_seq_done. You did so even though the gp_seq numbers work differently > > > from previously used numbers (gp_num and completed). > > > > > > I would then argue that because of the differences above, a one-to-one > > > replacement of the ULONG_CMP_GE with the rcu_seq_done wouldn't make sense. > > > > > > I argue this because, in previous code - the ULONG_CMP_GE made sense for the gp_num > > > way of things because, if c == gp_num, that means that : > > > - c started already > > > - c has finished. > > > Which worked correctly, because we have nothing to do and we can bail > > > without setting any flag. > > > > > > Where as now, with the gp_seq regime, c == gp_seq means: > > > - c-1 finished (I meant -1 subtracted from the gp_num part of c) > > > This would cause us to bail without setting any flag for starting c. > > > > > > I did some tracing and I could never hit the rcu_seq_done check because it > > > never happens in my tracing that _snap returned something for which > > > rcu_seq_done returned true, so I'm not sure if this check is needed, but > > > you're the expert ;) > > > > > > @@ -1629,16 +1583,16 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, > > > * not be released. > > > */ > > > raw_lockdep_assert_held_rcu_node(rnp); > > > + WARN_ON_ONCE(c & 0x2); /* Catch any lingering use of ->gpnum. */ > > > + WARN_ON_ONCE(((rnp->completed << RCU_SEQ_CTR_SHIFT) >> RCU_SEQ_CTR_SHIFT) != rcu_seq_ctr(rnp->gp_seq)); /* Catch any ->completed/->gp_seq mismatches. 
*/ > > > trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf")); > > > for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) { > > > if (rnp_root != rnp) > > > raw_spin_lock_rcu_node(rnp_root); > > > - WARN_ON_ONCE(ULONG_CMP_LT(rnp_root->gpnum + > > > - need_future_gp_mask(), c)); > > > if (need_future_gp_element(rnp_root, c) || > > > - ULONG_CMP_GE(rnp_root->gpnum, c) || > > > + rcu_seq_done(&rnp_root->gp_seq, c) || > > > > > > ^^^^ > > > A direct replacement of ULONG_CMP_GE is bit weird? It > > > means we bail out if c-1 completed, and we don't set any > > > flag for starting c. That could result in the clean up > > > never starting c? > > > > Ah, I see what you are getting at now. > > > > What I do instead in 334dac2da529 ("rcu: Make rcu_nocb_wait_gp() check > > if GP already requested") is to push the request down to the leaves of > > the tree and to the rcu_data structure. Once that commit is in place, > > the check for the grace period already being in progress isn't all > > that helpful, though I suppose that it could be added. One way to > > do that would be to replace "rcu_seq_done(&rnp_root->gp_seq, c)" with > > ULONG_CMP_GE(rnp_root->gpnum, (c - RCU_SEQ_STATE_MASK))", but that seems > > a bit baroque to me. > > > > The point of the rcu_seq_done() is to catch long delays, but given the > > current implementation, the fact that interrupts are disabled across > > all calls should prevent the rcu_seq_done() from ever returning true. > > (Famous last words!) So, yes, it could be removed, in theory, at least. > > At least until the real-time guys force me to come up with a way to > > run this code with interrupts enabled (hopefully never!). > > > > If I were to do that, I would first wrap it with a WARN_ON_ONCE() and > > leave it that way for an extended period of testing. Yes, I am paranoid. > > Why do you ask? 
;-) > :-D > > Ah I see what you're doing in that commit where you're moving the furthest > request down to the leaves, so that would protect against the scenario I was > describing and set the gp_seq_needed of the leaf. But I came up with a less baroque check for a grace period having started, at which point the question becomes "Why not just do both?", especially since a check for a grace period having started is satisfied by that grace period's having completed, which means minimal added overhead. Perhaps no added overhead for some compilers and architectures. Please see the end of this email for a prototype patch. > The code would be correct then, but one issue is it would shout out the > 'Prestarted' tracepoint for 'c' when that's not really true.. > > rcu_seq_done(&rnp_root->gp_seq, c) > > translates to ULONG_CMP_GE(&rnp_root->gp_seq, c) > > which translates to the fact that c-1 completed. > > So in this case if rcu_seq_done returns true, then saying that c has been > 'Prestarted' seems a bit off to me. It should be 'Startedleaf' or something > since what we really are doing is just marking the leaf as you mentioned in > the unlock_out part for a future start. Indeed, some of the tracing is not all that accurate. But the trace message itself contains the information needed to work out why the loop was exited, so perhaps something like 'EarlyExit'? Thanx, Paul ------------------------------------------------------------------------ commit 59a4f38edcffbef1521852fe3b26ed4ed85af16e Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Date: Tue May 15 11:53:41 2018 -0700 rcu: Make rcu_start_this_gp() check for grace period already started In the old days of ->gpnum and ->completed, the code requesting a new grace period checked to see if that grace period had already started, bailing early if so. The new-age ->gp_seq approach instead checks whether the grace period has already finished. 
A compensating change pushed the requested grace period down to the bottom of the tree, thus reducing lock contention and even eliminating it in some cases. But why not further reduce contention, especially on large systems, by doing both, especially given that the cost of doing both is extremely small? This commit therefore adds a new rcu_seq_started() function that checks whether a specified grace period has already started. It then uses this new function in place of rcu_seq_done() in the rcu_start_this_gp() function's funnel locking code. Reported-by: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 003671825d62..1c5cbd9d7c97 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -108,6 +108,15 @@ static inline unsigned long rcu_seq_current(unsigned long *sp)
 }
 
 /*
+ * Given a snapshot from rcu_seq_snap(), determine whether or not the
+ * corresponding update-side operation has started.
+ */
+static inline bool rcu_seq_started(unsigned long *sp, unsigned long s)
+{
+	return ULONG_CMP_LT((s - 1) & ~RCU_SEQ_STATE_MASK, READ_ONCE(*sp));
+}
+
+/*
  * Given a snapshot from rcu_seq_snap(), determine whether or not a
  * full update-side operation has occurred.
  */
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 9e900c5926cc..ed69f49b7054 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1580,7 +1580,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 		if (rnp_root != rnp)
 			raw_spin_lock_rcu_node(rnp_root);
 		if (ULONG_CMP_GE(rnp_root->gp_seq_needed, c) ||
-		    rcu_seq_done(&rnp_root->gp_seq, c) ||
+		    rcu_seq_started(&rnp_root->gp_seq, c) ||
 		    (rnp != rnp_root &&
 		     rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))) {
 			trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));

^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works 2018-05-15 19:08 ` Paul E. McKenney @ 2018-05-15 22:55 ` Joel Fernandes 2018-05-16 15:45 ` Paul E. McKenney 0 siblings, 1 reply; 42+ messages in thread From: Joel Fernandes @ 2018-05-15 22:55 UTC (permalink / raw) To: Paul E. McKenney Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Tue, May 15, 2018 at 12:08:01PM -0700, Paul E. McKenney wrote: > On Tue, May 15, 2018 at 11:41:15AM -0700, Joel Fernandes wrote: > > On Tue, May 15, 2018 at 05:55:07AM -0700, Paul E. McKenney wrote: > > > On Tue, May 15, 2018 at 12:02:43AM -0700, Joel Fernandes wrote: > > > > Hi Paul, > > > > Good morning, hope you're having a great Tuesday. I managed to find some > > > > evening hours today to dig into this a bit more. > > > > > > > > On Mon, May 14, 2018 at 08:59:52PM -0700, Paul E. McKenney wrote: > > > > > On Mon, May 14, 2018 at 06:51:33PM -0700, Joel Fernandes wrote: > > > > > > On Mon, May 14, 2018 at 10:38:16AM -0700, Paul E. McKenney wrote: > > > > > > > On Sun, May 13, 2018 at 08:15:34PM -0700, Joel Fernandes (Google) wrote: > > > > > > > > rcu_seq_snap may be tricky for someone looking at it for the first time. > > > > > > > > Lets document how it works with an example to make it easier. 
> > > > > > > > > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > > > > > --- > > > > > > > > kernel/rcu/rcu.h | 24 +++++++++++++++++++++++- > > > > > > > > 1 file changed, 23 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > > > > > > > > index 003671825d62..fc3170914ac7 100644 > > > > > > > > --- a/kernel/rcu/rcu.h > > > > > > > > +++ b/kernel/rcu/rcu.h > > > > > > > > @@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp) > > > > > > > > WRITE_ONCE(*sp, rcu_seq_endval(sp)); > > > > > > > > } > > > > > > > > > > > > > > > > -/* Take a snapshot of the update side's sequence number. */ > > > > > > > > +/* > > > > > > > > + * Take a snapshot of the update side's sequence number. > > > > > > > > + * > > > > > > > > + * This function predicts what the grace period number will be the next > > > > > > > > + * time an RCU callback will be executed, given the current grace period's > > > > > > > > + * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is > > > > > > > > + * already in progress. > > > > > > > > > > > > > > How about something like this? > > > > > > > > > > > > > > This function returns the earliest value of the grace-period > > > > > > > sequence number that will indicate that a full grace period has > > > > > > > elapsed since the current time. Once the grace-period sequence > > > > > > > number has reached this value, it will be safe to invoke all > > > > > > > callbacks that have been registered prior to the current time. > > > > > > > This value is the current grace-period number plus two to the > > > > > > > power of the number of low-order bits reserved for state, then > > > > > > > rounded up to the next value in which the state bits are all zero. > > > > > > > > > > > > This makes sense too, but do you disagree with what I said? > > > > > > > > > > In a pedantic sense, definitely. 
RCU callbacks are being executed pretty > > > > > much all the time on a busy system, so it is only the recently queued > > > > > ones that are guaranteed to be deferred that long. And my experience > > > > > indicates that someone really will get confused by that distinction, > > > > > so I feel justified in being pedantic in this case. > > > > > > > > Ok I agree, I'll include your comment above. > > > > > > > > > > Also just to let you know, thanks so much for elaborately providing an > > > > > > example on the other thread where we are discussing the rcu_seq_done check. I > > > > > > will take some time to trace this down and see if I can zero in on the same > > > > > > understanding as yours. > > > > > > > > > > > > I get why we use rcu_seq_snap there in rcu_start_this_gp but the way it its > > > > > > used is 'c' is the requested GP obtained from _snap, and we are comparing that with the existing > > > > > > rnp->gp_seq in rcu_seq_done. When that rnp->gp_seq reaches 'c', it only > > > > > > means rnp->gp_seq is done, it doesn't tell us if 'c' is done which is what > > > > > > we were trying to check in that loop... that's why I felt that check wasn't > > > > > > correct - that's my (most likely wrong) take on the matter, and I'll get back > > > > > > once I trace this a bit more hopefully today :-P > > > > > > > > > > If your point is that interrupts are disabled throughout, so there isn't > > > > > much chance of the grace period completing during that time, you are > > > > > mostly right. The places you might not be right are the idle loop and > > > > > offline CPUs. And yes, call_rcu() doesn't like queuing callbacks onto > > > > > offline CPUs, but IIRC it is just fine in the case where callbacks have > > > > > been offloaded from that CPU. > > > > > > > > > > And if you instead say that "c" is the requested final ->gp_seq value > > > > > obtained from _snap(), the thought process might go more easily. 
> > > > > > > > Yes I agree with c being the requested final value which is the GP for which > > > > the callbacks will be queued. At the end of the GP c, the callbacks will have > > > > executed. > > > > > > > > About the rcu_seq_done check and why I believe its not right to use it in > > > > that funnel locking loop, if you could allow me to try argument my point from > > > > a different angle... > > > > > > > > We agreed that the way gp_seq numbers work and are compared with each other > > > > to identify if a GP is elapsed or not, is different from the way the previous > > > > numbers (gp_num) were compared. > > > > > > > > Most notably, before the gp_seq conversions - inorder to start a GP, we were > > > > doing gp_num += 1, and completed had to catch up to gp_num + 1 to mark the > > > > end. > > > > > > > > Now with gp_seq, for a gp to start, we don't do the "+1", we just set the > > > > state bits. To mark the end, we clear the state bits and increment the gp_num > > > > part of gp_seq. > > > > > > > > However, in the below commit 12d6c129fd0a ("rcu: Convert grace-period > > > > requests to ->gp_seq"). You did a one-to-one replacement of the ULONG_CMP_GE > > > > with rcu_seq_done. You did so even though the gp_seq numbers work differently > > > > from previously used numbers (gp_num and completed). > > > > > > > > I would then argue that because of the differences above, a one-to-one > > > > replacement of the ULONG_CMP_GE with the rcu_seq_done wouldn't make sense. > > > > > > > > I argue this because, in previous code - the ULONG_CMP_GE made sense for the gp_num > > > > way of things because, if c == gp_num, that means that : > > > > - c started already > > > > - c has finished. > > > > Which worked correctly, because we have nothing to do and we can bail > > > > without setting any flag. 
> > > > > > > > Where as now, with the gp_seq regime, c == gp_seq means: > > > > - c-1 finished (I meant -1 subtracted from the gp_num part of c) > > > > This would cause us to bail without setting any flag for starting c. > > > > > > > > I did some tracing and I could never hit the rcu_seq_done check because it > > > > never happens in my tracing that _snap returned something for which > > > > rcu_seq_done returned true, so I'm not sure if this check is needed, but > > > > you're the expert ;) > > > > > > > > @@ -1629,16 +1583,16 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, > > > > * not be released. > > > > */ > > > > raw_lockdep_assert_held_rcu_node(rnp); > > > > + WARN_ON_ONCE(c & 0x2); /* Catch any lingering use of ->gpnum. */ > > > > + WARN_ON_ONCE(((rnp->completed << RCU_SEQ_CTR_SHIFT) >> RCU_SEQ_CTR_SHIFT) != rcu_seq_ctr(rnp->gp_seq)); /* Catch any ->completed/->gp_seq mismatches. */ > > > > trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf")); > > > > for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) { > > > > if (rnp_root != rnp) > > > > raw_spin_lock_rcu_node(rnp_root); > > > > - WARN_ON_ONCE(ULONG_CMP_LT(rnp_root->gpnum + > > > > - need_future_gp_mask(), c)); > > > > if (need_future_gp_element(rnp_root, c) || > > > > - ULONG_CMP_GE(rnp_root->gpnum, c) || > > > > + rcu_seq_done(&rnp_root->gp_seq, c) || > > > > > > > > ^^^^ > > > > A direct replacement of ULONG_CMP_GE is bit weird? It > > > > means we bail out if c-1 completed, and we don't set any > > > > flag for starting c. That could result in the clean up > > > > never starting c? > > > > > > Ah, I see what you are getting at now. > > > > > > What I do instead in 334dac2da529 ("rcu: Make rcu_nocb_wait_gp() check > > > if GP already requested") is to push the request down to the leaves of > > > the tree and to the rcu_data structure. 
Once that commit is in place, > > > the check for the grace period already being in progress isn't all > > > that helpful, though I suppose that it could be added. One way to > > > do that would be to replace "rcu_seq_done(&rnp_root->gp_seq, c)" with > > > ULONG_CMP_GE(rnp_root->gpnum, (c - RCU_SEQ_STATE_MASK))", but that seems > > > a bit baroque to me. > > > > > > The point of the rcu_seq_done() is to catch long delays, but given the > > > current implementation, the fact that interrupts are disabled across > > > all calls should prevent the rcu_seq_done() from ever returning true. > > > (Famous last words!) So, yes, it could be removed, in theory, at least. > > > At least until the real-time guys force me to come up with a way to > > > run this code with interrupts enabled (hopefully never!). > > > > > > If I were to do that, I would first wrap it with a WARN_ON_ONCE() and > > > leave it that way for an extended period of testing. Yes, I am paranoid. > > > Why do you ask? ;-) > > :-D > > > > Ah I see what you're doing in that commit where you're moving the furthest > > request down to the leaves, so that would protect against the scenario I was > > describing and set the gp_seq_needed of the leaf. > > But I came up with a less baroque check for a grace period having started, > at which point the question becomes "Why not just do both?", especially > since a check for a grace period having started is satisfied by that > grace period's having completed, which means minimal added overhead. > Perhaps no added overhead for some compilers and architectures. > > Please see the end of this email for a prototype patch. > > > The code would be correct then, but one issue is it would shout out the > > 'Prestarted' tracepoint for 'c' when that's not really true.. > > > > rcu_seq_done(&rnp_root->gp_seq, c) > > > > translates to ULONG_CMP_GE(&rnp_root->gp_seq, c) > > > > which translates to the fact that c-1 completed. 
> > > > So in this case if rcu_seq_done returns true, then saying that c has been > > 'Prestarted' seems a bit off to me. It should be 'Startedleaf' or something > > since what we really are doing is just marking the leaf as you mentioned in > > the unlock_out part for a future start. > > Indeed, some of the tracing is not all that accurate. But the trace > message itself contains the information needed to work out why the > loop was exited, so perhaps something like 'EarlyExit'? I think since you're now using rcu_seq_start to determine if c has started or completed since, the current 'Prestarted' trace will cover it. > > Thanx, Paul > > ------------------------------------------------------------------------ > > commit 59a4f38edcffbef1521852fe3b26ed4ed85af16e > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > Date: Tue May 15 11:53:41 2018 -0700 > > rcu: Make rcu_start_this_gp() check for grace period already started > > In the old days of ->gpnum and ->completed, the code requesting a new > grace period checked to see if that grace period had already started, > bailing early if so. The new-age ->gp_seq approach instead checks > whether the grace period has already finished. A compensating change > pushed the requested grace period down to the bottom of the tree, thus > reducing lock contention and even eliminating it in some cases. But why > not further reduce contention, especially on large systems, by doing both, > especially given that the cost of doing both is extremely small? > > This commit therefore adds a new rcu_seq_started() function that checks > whether a specified grace period has already started. It then uses > this new function in place of rcu_seq_done() in the rcu_start_this_gp() > function's funnel locking code. > > Reported-by: Joel Fernandes <joel@joelfernandes.org> > Signed-off-by: Paul E. 
McKenney <paulmck@linux.vnet.ibm.com> > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > index 003671825d62..1c5cbd9d7c97 100644 > --- a/kernel/rcu/rcu.h > +++ b/kernel/rcu/rcu.h > @@ -108,6 +108,15 @@ static inline unsigned long rcu_seq_current(unsigned long *sp) > } > > /* > + * Given a snapshot from rcu_seq_snap(), determine whether or not the > + * corresponding update-side operation has started. > + */ > +static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) > +{ > + return ULONG_CMP_LT((s - 1) & ~RCU_SEQ_STATE_MASK, READ_ONCE(*sp)); > +} > + > +/* > * Given a snapshot from rcu_seq_snap(), determine whether or not a > * full update-side operation has occurred. > */ > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 9e900c5926cc..ed69f49b7054 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -1580,7 +1580,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, > if (rnp_root != rnp) > raw_spin_lock_rcu_node(rnp_root); > if (ULONG_CMP_GE(rnp_root->gp_seq_needed, c) || > - rcu_seq_done(&rnp_root->gp_seq, c) || > + rcu_seq_started(&rnp_root->gp_seq, c) || Yes, this does exactly what I was wanting, thanks! I think this puts our discussion about this to rest :-) By the way I was starting to beautify this loop like below last week, with code comments. I felt it would be easier to parse this loop in the future for whoever was reading it. Are you interested in such a patch? If not, let me know and I'll drop this and focus on the other changes you requested. Something like... 
(just an example, actual code would be different)

	for (rnp_node = rnp; 1; rnp_node = rnp_node->parent) {
		int prestarted = 0;

		/* Acquire lock if non-leaf node */
		if (rnp_node != rnp)
			raw_spin_lock_rcu_node(rnp_node);

		/* Has the GP asked been recorded as a future need */
		if (ULONG_CMP_GE(rnp_node->gp_seq_needed, gp_seq_start))
			prestarted = 1;

		/* Has the GP requested for already been completed */
		if (!prestarted && rcu_seq_completed(&rnp_node->gp_seq, gp_seq_start))
			prestarted = 1;

		... etc...

		if (prestarted) {
			trace_rcu_this_gp(rnp_node, rdp, gp_seq_start,
					  TPS("Prestarted"));
			goto unlock_out;
		}

thanks,

- Joel

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works 2018-05-15 22:55 ` Joel Fernandes @ 2018-05-16 15:45 ` Paul E. McKenney 2018-05-16 23:21 ` Joel Fernandes 0 siblings, 1 reply; 42+ messages in thread From: Paul E. McKenney @ 2018-05-16 15:45 UTC (permalink / raw) To: Joel Fernandes Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Tue, May 15, 2018 at 03:55:09PM -0700, Joel Fernandes wrote: > On Tue, May 15, 2018 at 12:08:01PM -0700, Paul E. McKenney wrote: > > On Tue, May 15, 2018 at 11:41:15AM -0700, Joel Fernandes wrote: > > > On Tue, May 15, 2018 at 05:55:07AM -0700, Paul E. McKenney wrote: > > > > On Tue, May 15, 2018 at 12:02:43AM -0700, Joel Fernandes wrote: > > > > > Hi Paul, > > > > > Good morning, hope you're having a great Tuesday. I managed to find some > > > > > evening hours today to dig into this a bit more. > > > > > > > > > > On Mon, May 14, 2018 at 08:59:52PM -0700, Paul E. McKenney wrote: > > > > > > On Mon, May 14, 2018 at 06:51:33PM -0700, Joel Fernandes wrote: > > > > > > > On Mon, May 14, 2018 at 10:38:16AM -0700, Paul E. McKenney wrote: > > > > > > > > On Sun, May 13, 2018 at 08:15:34PM -0700, Joel Fernandes (Google) wrote: > > > > > > > > > rcu_seq_snap may be tricky for someone looking at it for the first time. > > > > > > > > > Lets document how it works with an example to make it easier. 
> > > > > > > > > > > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > > > > > > --- > > > > > > > > > kernel/rcu/rcu.h | 24 +++++++++++++++++++++++- > > > > > > > > > 1 file changed, 23 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > > > > > > > > > index 003671825d62..fc3170914ac7 100644 > > > > > > > > > --- a/kernel/rcu/rcu.h > > > > > > > > > +++ b/kernel/rcu/rcu.h > > > > > > > > > @@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp) > > > > > > > > > WRITE_ONCE(*sp, rcu_seq_endval(sp)); > > > > > > > > > } > > > > > > > > > > > > > > > > > > -/* Take a snapshot of the update side's sequence number. */ > > > > > > > > > +/* > > > > > > > > > + * Take a snapshot of the update side's sequence number. > > > > > > > > > + * > > > > > > > > > + * This function predicts what the grace period number will be the next > > > > > > > > > + * time an RCU callback will be executed, given the current grace period's > > > > > > > > > + * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is > > > > > > > > > + * already in progress. > > > > > > > > > > > > > > > > How about something like this? > > > > > > > > > > > > > > > > This function returns the earliest value of the grace-period > > > > > > > > sequence number that will indicate that a full grace period has > > > > > > > > elapsed since the current time. Once the grace-period sequence > > > > > > > > number has reached this value, it will be safe to invoke all > > > > > > > > callbacks that have been registered prior to the current time. > > > > > > > > This value is the current grace-period number plus two to the > > > > > > > > power of the number of low-order bits reserved for state, then > > > > > > > > rounded up to the next value in which the state bits are all zero. > > > > > > > > > > > > > > This makes sense too, but do you disagree with what I said? 
> > > > > > > > > > > > In a pedantic sense, definitely. RCU callbacks are being executed pretty > > > > > > much all the time on a busy system, so it is only the recently queued > > > > > > ones that are guaranteed to be deferred that long. And my experience > > > > > > indicates that someone really will get confused by that distinction, > > > > > > so I feel justified in being pedantic in this case. > > > > > > > > > > Ok I agree, I'll include your comment above. > > > > > > > > > > > > Also just to let you know, thanks so much for elaborately providing an > > > > > > > example on the other thread where we are discussing the rcu_seq_done check. I > > > > > > > will take some time to trace this down and see if I can zero in on the same > > > > > > > understanding as yours. > > > > > > > > > > > > > > I get why we use rcu_seq_snap there in rcu_start_this_gp but the way it its > > > > > > > used is 'c' is the requested GP obtained from _snap, and we are comparing that with the existing > > > > > > > rnp->gp_seq in rcu_seq_done. When that rnp->gp_seq reaches 'c', it only > > > > > > > means rnp->gp_seq is done, it doesn't tell us if 'c' is done which is what > > > > > > > we were trying to check in that loop... that's why I felt that check wasn't > > > > > > > correct - that's my (most likely wrong) take on the matter, and I'll get back > > > > > > > once I trace this a bit more hopefully today :-P > > > > > > > > > > > > If your point is that interrupts are disabled throughout, so there isn't > > > > > > much chance of the grace period completing during that time, you are > > > > > > mostly right. The places you might not be right are the idle loop and > > > > > > offline CPUs. And yes, call_rcu() doesn't like queuing callbacks onto > > > > > > offline CPUs, but IIRC it is just fine in the case where callbacks have > > > > > > been offloaded from that CPU. 
> > > > > > > > > > > > And if you instead say that "c" is the requested final ->gp_seq value > > > > > > obtained from _snap(), the thought process might go more easily. > > > > > > > > > > Yes I agree with c being the requested final value which is the GP for which > > > > > the callbacks will be queued. At the end of the GP c, the callbacks will have > > > > > executed. > > > > > > > > > > About the rcu_seq_done check and why I believe its not right to use it in > > > > > that funnel locking loop, if you could allow me to try argument my point from > > > > > a different angle... > > > > > > > > > > We agreed that the way gp_seq numbers work and are compared with each other > > > > > to identify if a GP is elapsed or not, is different from the way the previous > > > > > numbers (gp_num) were compared. > > > > > > > > > > Most notably, before the gp_seq conversions - inorder to start a GP, we were > > > > > doing gp_num += 1, and completed had to catch up to gp_num + 1 to mark the > > > > > end. > > > > > > > > > > Now with gp_seq, for a gp to start, we don't do the "+1", we just set the > > > > > state bits. To mark the end, we clear the state bits and increment the gp_num > > > > > part of gp_seq. > > > > > > > > > > However, in the below commit 12d6c129fd0a ("rcu: Convert grace-period > > > > > requests to ->gp_seq"). You did a one-to-one replacement of the ULONG_CMP_GE > > > > > with rcu_seq_done. You did so even though the gp_seq numbers work differently > > > > > from previously used numbers (gp_num and completed). > > > > > > > > > > I would then argue that because of the differences above, a one-to-one > > > > > replacement of the ULONG_CMP_GE with the rcu_seq_done wouldn't make sense. > > > > > > > > > > I argue this because, in previous code - the ULONG_CMP_GE made sense for the gp_num > > > > > way of things because, if c == gp_num, that means that : > > > > > - c started already > > > > > - c has finished. 
> > > > > Which worked correctly, because we have nothing to do and we can bail > > > > > without setting any flag. > > > > > > > > > > Where as now, with the gp_seq regime, c == gp_seq means: > > > > > - c-1 finished (I meant -1 subtracted from the gp_num part of c) > > > > > This would cause us to bail without setting any flag for starting c. > > > > > > > > > > I did some tracing and I could never hit the rcu_seq_done check because it > > > > > never happens in my tracing that _snap returned something for which > > > > > rcu_seq_done returned true, so I'm not sure if this check is needed, but > > > > > you're the expert ;) > > > > > > > > > > @@ -1629,16 +1583,16 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, > > > > > * not be released. > > > > > */ > > > > > raw_lockdep_assert_held_rcu_node(rnp); > > > > > + WARN_ON_ONCE(c & 0x2); /* Catch any lingering use of ->gpnum. */ > > > > > + WARN_ON_ONCE(((rnp->completed << RCU_SEQ_CTR_SHIFT) >> RCU_SEQ_CTR_SHIFT) != rcu_seq_ctr(rnp->gp_seq)); /* Catch any ->completed/->gp_seq mismatches. */ > > > > > trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf")); > > > > > for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) { > > > > > if (rnp_root != rnp) > > > > > raw_spin_lock_rcu_node(rnp_root); > > > > > - WARN_ON_ONCE(ULONG_CMP_LT(rnp_root->gpnum + > > > > > - need_future_gp_mask(), c)); > > > > > if (need_future_gp_element(rnp_root, c) || > > > > > - ULONG_CMP_GE(rnp_root->gpnum, c) || > > > > > + rcu_seq_done(&rnp_root->gp_seq, c) || > > > > > > > > > > ^^^^ > > > > > A direct replacement of ULONG_CMP_GE is bit weird? It > > > > > means we bail out if c-1 completed, and we don't set any > > > > > flag for starting c. That could result in the clean up > > > > > never starting c? > > > > > > > > Ah, I see what you are getting at now. 
> > > > > > > > What I do instead in 334dac2da529 ("rcu: Make rcu_nocb_wait_gp() check > > > > if GP already requested") is to push the request down to the leaves of > > > > the tree and to the rcu_data structure. Once that commit is in place, > > > > the check for the grace period already being in progress isn't all > > > > that helpful, though I suppose that it could be added. One way to > > > > do that would be to replace "rcu_seq_done(&rnp_root->gp_seq, c)" with > > > > ULONG_CMP_GE(rnp_root->gpnum, (c - RCU_SEQ_STATE_MASK))", but that seems > > > > a bit baroque to me. > > > > > > > > The point of the rcu_seq_done() is to catch long delays, but given the > > > > current implementation, the fact that interrupts are disabled across > > > > all calls should prevent the rcu_seq_done() from ever returning true. > > > > (Famous last words!) So, yes, it could be removed, in theory, at least. > > > > At least until the real-time guys force me to come up with a way to > > > > run this code with interrupts enabled (hopefully never!). > > > > > > > > If I were to do that, I would first wrap it with a WARN_ON_ONCE() and > > > > leave it that way for an extended period of testing. Yes, I am paranoid. > > > > Why do you ask? ;-) > > > :-D > > > > > > Ah I see what you're doing in that commit where you're moving the furthest > > > request down to the leaves, so that would protect against the scenario I was > > > describing and set the gp_seq_needed of the leaf. > > > > But I came up with a less baroque check for a grace period having started, > > at which point the question becomes "Why not just do both?", especially > > since a check for a grace period having started is satisfied by that > > grace period's having completed, which means minimal added overhead. > > Perhaps no added overhead for some compilers and architectures. > > > > Please see the end of this email for a prototype patch. 
> > > > > The code would be correct then, but one issue is it would shout out the > > > 'Prestarted' tracepoint for 'c' when that's not really true.. > > > > > > rcu_seq_done(&rnp_root->gp_seq, c) > > > > > > translates to ULONG_CMP_GE(&rnp_root->gp_seq, c) > > > > > > which translates to the fact that c-1 completed. > > > > > > So in this case if rcu_seq_done returns true, then saying that c has been > > > 'Prestarted' seems a bit off to me. It should be 'Startedleaf' or something > > > since what we really are doing is just marking the leaf as you mentioned in > > > the unlock_out part for a future start. > > > > Indeed, some of the tracing is not all that accurate. But the trace > > message itself contains the information needed to work out why the > > loop was exited, so perhaps something like 'EarlyExit'? > > I think since you're now using rcu_seq_start to determine if c has started or > completed since, the current 'Prestarted' trace will cover it. "My work is done!" ;-) > > ------------------------------------------------------------------------ > > > > commit 59a4f38edcffbef1521852fe3b26ed4ed85af16e > > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > Date: Tue May 15 11:53:41 2018 -0700 > > > > rcu: Make rcu_start_this_gp() check for grace period already started > > > > In the old days of ->gpnum and ->completed, the code requesting a new > > grace period checked to see if that grace period had already started, > > bailing early if so. The new-age ->gp_seq approach instead checks > > whether the grace period has already finished. A compensating change > > pushed the requested grace period down to the bottom of the tree, thus > > reducing lock contention and even eliminating it in some cases. But why > > not further reduce contention, especially on large systems, by doing both, > > especially given that the cost of doing both is extremely small? 
> > > > This commit therefore adds a new rcu_seq_started() function that checks > > whether a specified grace period has already started. It then uses > > this new function in place of rcu_seq_done() in the rcu_start_this_gp() > > function's funnel locking code. > > > > Reported-by: Joel Fernandes <joel@joelfernandes.org> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > > index 003671825d62..1c5cbd9d7c97 100644 > > --- a/kernel/rcu/rcu.h > > +++ b/kernel/rcu/rcu.h > > @@ -108,6 +108,15 @@ static inline unsigned long rcu_seq_current(unsigned long *sp) > > } > > > > /* > > + * Given a snapshot from rcu_seq_snap(), determine whether or not the > > + * corresponding update-side operation has started. > > + */ > > +static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) > > +{ > > + return ULONG_CMP_LT((s - 1) & ~RCU_SEQ_STATE_MASK, READ_ONCE(*sp)); > > +} > > + > > +/* > > * Given a snapshot from rcu_seq_snap(), determine whether or not a > > * full update-side operation has occurred. > > */ > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > index 9e900c5926cc..ed69f49b7054 100644 > > --- a/kernel/rcu/tree.c > > +++ b/kernel/rcu/tree.c > > @@ -1580,7 +1580,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, > > if (rnp_root != rnp) > > raw_spin_lock_rcu_node(rnp_root); > > if (ULONG_CMP_GE(rnp_root->gp_seq_needed, c) || > > - rcu_seq_done(&rnp_root->gp_seq, c) || > > + rcu_seq_started(&rnp_root->gp_seq, c) || > > Yes, this does exactly what I was wanting, thanks! I think this puts our > discussion about this to rest :-) > > By the way I was starting to beautify this loop like below last week, with > code comments. I felt it would be easier to parse this loop in the future > for whoever was reading it. Are you interested in such a patch? If not, let > me know and I'll drop this and focus on the other changes you requested. > > Something like... 
(just an example , actual code would be different) > > for (rnp_node = rnp; 1; rnp_node = rnp_node->parent) { > int prestarted = 0; > > /* Acquire lock if non-leaf node */ > if (rnp_node != rnp) > raw_spin_lock_rcu_node(rnp_node); > > /* Has the GP asked been recorded as a future need */ > if (ULONG_CMP_GE(rnp_node->gp_seq_needed, gp_seq_start)) > prestarted = 1; > > /* Has the GP requested for already been completed */ > if (!prestarted && rcu_seq_completed(&rnp_node->gp_seq, gp_seq_start)) > prestarted = 1; > > ... etc... > if (prestarted) { > trace_rcu_this_gp(rnp_node, rdp, gp_seq_start, > TPS("Prestarted")); > goto unlock_out; > } At the moment, I don't believe that the extra lines of code pay for themselves, but I do agree that this loop is a bit more obtuse than I would like. Thanx, Paul ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works 2018-05-16 15:45 ` Paul E. McKenney @ 2018-05-16 23:21 ` Joel Fernandes 0 siblings, 0 replies; 42+ messages in thread From: Joel Fernandes @ 2018-05-16 23:21 UTC (permalink / raw) To: Paul E. McKenney Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Wed, May 16, 2018 at 08:45:43AM -0700, Paul E. McKenney wrote: [...] > > > > The code would be correct then, but one issue is it would shout out the > > > > 'Prestarted' tracepoint for 'c' when that's not really true.. > > > > > > > > rcu_seq_done(&rnp_root->gp_seq, c) > > > > > > > > translates to ULONG_CMP_GE(&rnp_root->gp_seq, c) > > > > > > > > which translates to the fact that c-1 completed. > > > > > > > > So in this case if rcu_seq_done returns true, then saying that c has been > > > > 'Prestarted' seems a bit off to me. It should be 'Startedleaf' or something > > > > since what we really are doing is just marking the leaf as you mentioned in > > > > the unlock_out part for a future start. > > > > > > Indeed, some of the tracing is not all that accurate. But the trace > > > message itself contains the information needed to work out why the > > > loop was exited, so perhaps something like 'EarlyExit'? > > > > I think since you're now using rcu_seq_start to determine if c has started or > > completed since, the current 'Prestarted' trace will cover it. > > "My work is done!" ;-) :-D Its cool how a conversation can turn into a code improvement. > > > ------------------------------------------------------------------------ > > > > > > commit 59a4f38edcffbef1521852fe3b26ed4ed85af16e > > > Author: Paul E. 
McKenney <paulmck@linux.vnet.ibm.com> > > > Date: Tue May 15 11:53:41 2018 -0700 > > > > > > rcu: Make rcu_start_this_gp() check for grace period already started > > > > > > In the old days of ->gpnum and ->completed, the code requesting a new > > > grace period checked to see if that grace period had already started, > > > bailing early if so. The new-age ->gp_seq approach instead checks > > > whether the grace period has already finished. A compensating change > > > pushed the requested grace period down to the bottom of the tree, thus > > > reducing lock contention and even eliminating it in some cases. But why > > > not further reduce contention, especially on large systems, by doing both, > > > especially given that the cost of doing both is extremely small? > > > > > > This commit therefore adds a new rcu_seq_started() function that checks > > > whether a specified grace period has already started. It then uses > > > this new function in place of rcu_seq_done() in the rcu_start_this_gp() > > > function's funnel locking code. > > > > > > Reported-by: Joel Fernandes <joel@joelfernandes.org> > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > > > index 003671825d62..1c5cbd9d7c97 100644 > > > --- a/kernel/rcu/rcu.h > > > +++ b/kernel/rcu/rcu.h > > > @@ -108,6 +108,15 @@ static inline unsigned long rcu_seq_current(unsigned long *sp) > > > } > > > > > > /* > > > + * Given a snapshot from rcu_seq_snap(), determine whether or not the > > > + * corresponding update-side operation has started. > > > + */ > > > +static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) > > > +{ > > > + return ULONG_CMP_LT((s - 1) & ~RCU_SEQ_STATE_MASK, READ_ONCE(*sp)); > > > +} > > > + > > > +/* > > > * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > * full update-side operation has occurred. 
> > > */ > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > index 9e900c5926cc..ed69f49b7054 100644 > > > --- a/kernel/rcu/tree.c > > > +++ b/kernel/rcu/tree.c > > > @@ -1580,7 +1580,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, > > > if (rnp_root != rnp) > > > raw_spin_lock_rcu_node(rnp_root); > > > if (ULONG_CMP_GE(rnp_root->gp_seq_needed, c) || > > > - rcu_seq_done(&rnp_root->gp_seq, c) || > > > + rcu_seq_started(&rnp_root->gp_seq, c) || > > > > Yes, this does exactly what I was wanting, thanks! I think this puts our > > discussion about this to rest :-) > > > > By the way I was starting to beautify this loop like below last week, with > > code comments. I felt it would be easier to parse this loop in the future > > for whoever was reading it. Are you interested in such a patch? If not, let > > me know and I'll drop this and focus on the other changes you requested. > > > > Something like... (just an example , actual code would be different) > > > > for (rnp_node = rnp; 1; rnp_node = rnp_node->parent) { > > int prestarted = 0; > > > > /* Acquire lock if non-leaf node */ > > if (rnp_node != rnp) > > raw_spin_lock_rcu_node(rnp_node); > > > > /* Has the GP asked been recorded as a future need */ > > if (ULONG_CMP_GE(rnp_node->gp_seq_needed, gp_seq_start)) > > prestarted = 1; > > > > /* Has the GP requested for already been completed */ > > if (!prestarted && rcu_seq_completed(&rnp_node->gp_seq, gp_seq_start)) > > prestarted = 1; > > > > ... etc... > > if (prestarted) { > > trace_rcu_this_gp(rnp_node, rdp, gp_seq_start, > > TPS("Prestarted")); > > goto unlock_out; > > } > > At the moment, I don't believe that the extra lines of code pay for > themselves, but I do agree that this loop is a bit more obtuse than I > would like. Yeah I was also thinking the same. I'm glad I checked, thanks for the feedback. I'll focus on the other comments then. thanks, - Joel ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH RFC 2/8] rcu: Clarify usage of cond_resched for tasks-RCU 2018-05-14 3:15 [PATCH RFC 0/8] rcu fixes, clean ups for rcu/dev Joel Fernandes (Google) 2018-05-14 3:15 ` [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works Joel Fernandes (Google) @ 2018-05-14 3:15 ` Joel Fernandes (Google) 2018-05-14 14:54 ` Steven Rostedt 2018-05-14 3:15 ` [PATCH RFC 3/8] rcu: Add back the cpuend tracepoint Joel Fernandes (Google) ` (5 subsequent siblings) 7 siblings, 1 reply; 42+ messages in thread From: Joel Fernandes (Google) @ 2018-05-14 3:15 UTC (permalink / raw) To: linux-kernel Cc: Joel Fernandes (Google), Paul E. McKenney, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team Recently we had a discussion about cond_resched unconditionally recording a voluntary context switch [1]. Lets add a comment clarifying that how this API is to be used. [1] https://lkml.kernel.org/r/1526027434-21237-1-git-send-email-byungchul.park@lge.com Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> --- include/linux/rcupdate.h | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 743226176350..a9881007ece6 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -159,8 +159,12 @@ static inline void rcu_init_nohz(void) { } } while (0) /* - * Note a voluntary context switch for RCU-tasks benefit. This is a - * macro rather than an inline function to avoid #include hell. + * Note an attempt to perform a voluntary context switch for RCU-tasks benefit. + * + * This is called even in situations where a context switch didn't really + * happen even though it was requested. The caller uses it to indicate + * traversal of an RCU-tasks quiescent state. This is a macro rather than an + * inline function to avoid #include hell. 
*/ #ifdef CONFIG_TASKS_RCU #define rcu_note_voluntary_context_switch_lite(t) \ @@ -187,7 +191,8 @@ static inline void exit_tasks_rcu_finish(void) { } #endif /* #else #ifdef CONFIG_TASKS_RCU */ /** - * cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU + * cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU. + * The quiescent state report is made even if cond_resched() did nothing. * * This macro resembles cond_resched(), except that it is defined to * report potential quiescent states to RCU-tasks even if the cond_resched() -- 2.17.0.441.gb46fe60e1d-goog ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH RFC 2/8] rcu: Clarify usage of cond_resched for tasks-RCU 2018-05-14 3:15 ` [PATCH RFC 2/8] rcu: Clarify usage of cond_resched for tasks-RCU Joel Fernandes (Google) @ 2018-05-14 14:54 ` Steven Rostedt 2018-05-14 17:22 ` Paul E. McKenney 0 siblings, 1 reply; 42+ messages in thread From: Steven Rostedt @ 2018-05-14 14:54 UTC (permalink / raw) To: Joel Fernandes (Google) Cc: linux-kernel, Paul E. McKenney, Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Sun, 13 May 2018 20:15:35 -0700 "Joel Fernandes (Google)" <joel@joelfernandes.org> wrote: > Recently we had a discussion about cond_resched unconditionally > recording a voluntary context switch [1]. > > Lets add a comment clarifying that how this API is to be used. > > [1] https://lkml.kernel.org/r/1526027434-21237-1-git-send-email-byungchul.park@lge.com > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > --- > include/linux/rcupdate.h | 11 ++++++++--- > 1 file changed, 8 insertions(+), 3 deletions(-) > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > index 743226176350..a9881007ece6 100644 > --- a/include/linux/rcupdate.h > +++ b/include/linux/rcupdate.h > @@ -159,8 +159,12 @@ static inline void rcu_init_nohz(void) { } > } while (0) > > /* > - * Note a voluntary context switch for RCU-tasks benefit. This is a > - * macro rather than an inline function to avoid #include hell. > + * Note an attempt to perform a voluntary context switch for RCU-tasks benefit. > + * > + * This is called even in situations where a context switch didn't really > + * happen even though it was requested. The caller uses it to indicate > + * traversal of an RCU-tasks quiescent state. This is a macro rather than an > + * inline function to avoid #include hell. I don't know. I just don't like the wording. 
It sounds too much like it was written by someone that was confused for it being called when a context switch didn't occur ;-) What about something more like: /* * This is called to denote a RCU-task quiescent state. It is placed at * voluntary preemption points, as RCU-task critical sections may not * perform voluntary preemption or scheduling calls. It does not matter * if the task is scheduled out or not, just that a voluntary preemption * may be done. */ > */ > #ifdef CONFIG_TASKS_RCU > #define rcu_note_voluntary_context_switch_lite(t) \ > @@ -187,7 +191,8 @@ static inline void exit_tasks_rcu_finish(void) { } > #endif /* #else #ifdef CONFIG_TASKS_RCU */ > > /** > - * cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU > + * cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU. > + * The quiescent state report is made even if cond_resched() did nothing. Same thing here. * The quiescent state report does not depend on cond_resched() scheduling. -- Steve > * > * This macro resembles cond_resched(), except that it is defined to > * report potential quiescent states to RCU-tasks even if the cond_resched() ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH RFC 2/8] rcu: Clarify usage of cond_resched for tasks-RCU 2018-05-14 14:54 ` Steven Rostedt @ 2018-05-14 17:22 ` Paul E. McKenney 2018-05-15 0:35 ` Joel Fernandes 0 siblings, 1 reply; 42+ messages in thread From: Paul E. McKenney @ 2018-05-14 17:22 UTC (permalink / raw) To: Steven Rostedt Cc: Joel Fernandes (Google), linux-kernel, Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Mon, May 14, 2018 at 10:54:54AM -0400, Steven Rostedt wrote: > On Sun, 13 May 2018 20:15:35 -0700 > "Joel Fernandes (Google)" <joel@joelfernandes.org> wrote: > > > Recently we had a discussion about cond_resched unconditionally > > recording a voluntary context switch [1]. > > > > Lets add a comment clarifying that how this API is to be used. > > > > [1] https://lkml.kernel.org/r/1526027434-21237-1-git-send-email-byungchul.park@lge.com > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > --- > > include/linux/rcupdate.h | 11 ++++++++--- > > 1 file changed, 8 insertions(+), 3 deletions(-) > > > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > > index 743226176350..a9881007ece6 100644 > > --- a/include/linux/rcupdate.h > > +++ b/include/linux/rcupdate.h > > @@ -159,8 +159,12 @@ static inline void rcu_init_nohz(void) { } > > } while (0) > > > > /* > > - * Note a voluntary context switch for RCU-tasks benefit. This is a > > - * macro rather than an inline function to avoid #include hell. > > + * Note an attempt to perform a voluntary context switch for RCU-tasks benefit. > > + * > > + * This is called even in situations where a context switch didn't really > > + * happen even though it was requested. The caller uses it to indicate > > + * traversal of an RCU-tasks quiescent state. This is a macro rather than an > > + * inline function to avoid #include hell. > > I don't know. I just don't like the wording. 
It sounds too much like > it was written by someone that was confused for it being called when a > context switch didn't occur ;-) > > What about something more like: > > /* > * This is called to denote a RCU-task quiescent state. It is placed at > * voluntary preemption points, as RCU-task critical sections may not > * perform voluntary preemption or scheduling calls. It does not matter > * if the task is scheduled out or not, just that a voluntary preemption > * may be done. > */ s/RCU-task/RCU-tasks/ and I am good with this. Thanx, Paul > > */ > > #ifdef CONFIG_TASKS_RCU > > #define rcu_note_voluntary_context_switch_lite(t) \ > > @@ -187,7 +191,8 @@ static inline void exit_tasks_rcu_finish(void) { } > > #endif /* #else #ifdef CONFIG_TASKS_RCU */ > > > > /** > > - * cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU > > + * cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU. > > + * The quiescent state report is made even if cond_resched() did nothing. > > Same thing here. > > * The quiescent state report does not depend on cond_resched() scheduling. > > -- Steve > > > > * > > * This macro resembles cond_resched(), except that it is defined to > > * report potential quiescent states to RCU-tasks even if the cond_resched() > ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH RFC 2/8] rcu: Clarify usage of cond_resched for tasks-RCU 2018-05-14 17:22 ` Paul E. McKenney @ 2018-05-15 0:35 ` Joel Fernandes 2018-05-15 3:42 ` Paul E. McKenney 0 siblings, 1 reply; 42+ messages in thread From: Joel Fernandes @ 2018-05-15 0:35 UTC (permalink / raw) To: Paul E. McKenney Cc: Steven Rostedt, linux-kernel, Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Mon, May 14, 2018 at 10:22:05AM -0700, Paul E. McKenney wrote: > On Mon, May 14, 2018 at 10:54:54AM -0400, Steven Rostedt wrote: > > On Sun, 13 May 2018 20:15:35 -0700 > > "Joel Fernandes (Google)" <joel@joelfernandes.org> wrote: > > > > > Recently we had a discussion about cond_resched unconditionally > > > recording a voluntary context switch [1]. > > > > > > Lets add a comment clarifying that how this API is to be used. > > > > > > [1] https://lkml.kernel.org/r/1526027434-21237-1-git-send-email-byungchul.park@lge.com > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > --- > > > include/linux/rcupdate.h | 11 ++++++++--- > > > 1 file changed, 8 insertions(+), 3 deletions(-) > > > > > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > > > index 743226176350..a9881007ece6 100644 > > > --- a/include/linux/rcupdate.h > > > +++ b/include/linux/rcupdate.h > > > @@ -159,8 +159,12 @@ static inline void rcu_init_nohz(void) { } > > > } while (0) > > > > > > /* > > > - * Note a voluntary context switch for RCU-tasks benefit. This is a > > > - * macro rather than an inline function to avoid #include hell. > > > + * Note an attempt to perform a voluntary context switch for RCU-tasks benefit. > > > + * > > > + * This is called even in situations where a context switch didn't really > > > + * happen even though it was requested. The caller uses it to indicate > > > + * traversal of an RCU-tasks quiescent state. This is a macro rather than an > > > + * inline function to avoid #include hell. > > > > I don't know. 
I just don't like the wording. It sounds too much like > > it was written by someone that was confused for it being called when a > > context switch didn't occur ;-) True :) > > > > What about something more like: > > > > /* > > * This is called to denote a RCU-task quiescent state. It is placed at > > * voluntary preemption points, as RCU-task critical sections may not > > * perform voluntary preemption or scheduling calls. It does not matter > > * if the task is scheduled out or not, just that a voluntary preemption > > * may be done. > > */ > > s/RCU-task/RCU-tasks/ and I am good with this. Ok. I like Steve's comment better too. Btw, I see you just posted a change of the macro name from rcu_note_voluntary_context_switch_lite to rcu_tasks_qs which actually in itself is much more descriptive. Considering this, I feel the new name is quite self-documenting in itself. So I am more inclined to drop this patch in any series reposting, but let me know if you feel otherwise. thanks, - Joel ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH RFC 2/8] rcu: Clarify usage of cond_resched for tasks-RCU 2018-05-15 0:35 ` Joel Fernandes @ 2018-05-15 3:42 ` Paul E. McKenney 0 siblings, 0 replies; 42+ messages in thread From: Paul E. McKenney @ 2018-05-15 3:42 UTC (permalink / raw) To: Joel Fernandes Cc: Steven Rostedt, linux-kernel, Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Mon, May 14, 2018 at 05:35:55PM -0700, Joel Fernandes wrote: > On Mon, May 14, 2018 at 10:22:05AM -0700, Paul E. McKenney wrote: > > On Mon, May 14, 2018 at 10:54:54AM -0400, Steven Rostedt wrote: > > > On Sun, 13 May 2018 20:15:35 -0700 > > > "Joel Fernandes (Google)" <joel@joelfernandes.org> wrote: > > > > > > > Recently we had a discussion about cond_resched unconditionally > > > > recording a voluntary context switch [1]. > > > > > > > > Lets add a comment clarifying that how this API is to be used. > > > > > > > > [1] https://lkml.kernel.org/r/1526027434-21237-1-git-send-email-byungchul.park@lge.com > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > --- > > > > include/linux/rcupdate.h | 11 ++++++++--- > > > > 1 file changed, 8 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > > > > index 743226176350..a9881007ece6 100644 > > > > --- a/include/linux/rcupdate.h > > > > +++ b/include/linux/rcupdate.h > > > > @@ -159,8 +159,12 @@ static inline void rcu_init_nohz(void) { } > > > > } while (0) > > > > > > > > /* > > > > - * Note a voluntary context switch for RCU-tasks benefit. This is a > > > > - * macro rather than an inline function to avoid #include hell. > > > > + * Note an attempt to perform a voluntary context switch for RCU-tasks benefit. > > > > + * > > > > + * This is called even in situations where a context switch didn't really > > > > + * happen even though it was requested. The caller uses it to indicate > > > > + * traversal of an RCU-tasks quiescent state. 
This is a macro rather than an > > > > + * inline function to avoid #include hell. > > > > > > I don't know. I just don't like the wording. It sounds too much like > > > it was written by someone that was confused for it being called when a > > > context switch didn't occur ;-) > > True :) > > > > > > > What about something more like: > > > > > > /* > > > * This is called to denote a RCU-task quiescent state. It is placed at > > > * voluntary preemption points, as RCU-task critical sections may not > > > * perform voluntary preemption or scheduling calls. It does not matter > > > * if the task is scheduled out or not, just that a voluntary preemption > > > * may be done. > > > */ > > > > s/RCU-task/RCU-tasks/ and I am good with this. > > Ok. I like Steve's comment better too. > > Btw, I see you just posted a change of the macro name from > rcu_note_voluntary_context_switch_lite to rcu_tasks_qs which actually in > itself is much more descriptive. Considering this, I feel the new name is > quite self-documenting in itself. So I am more inclined to drop this patch in > any series reposting, but let me know if you feel otherwise. I agree, but let's see what other people think. Thanx, Paul ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH RFC 3/8] rcu: Add back the cpuend tracepoint 2018-05-14 3:15 [PATCH RFC 0/8] rcu fixes, clean ups for rcu/dev Joel Fernandes (Google) 2018-05-14 3:15 ` [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works Joel Fernandes (Google) 2018-05-14 3:15 ` [PATCH RFC 2/8] rcu: Clarify usage of cond_resched for tasks-RCU Joel Fernandes (Google) @ 2018-05-14 3:15 ` Joel Fernandes (Google) 2018-05-14 18:12 ` Paul E. McKenney 2018-05-14 3:15 ` [PATCH RFC 4/8] rcu: Get rid of old c variable from places in tree RCU Joel Fernandes (Google) ` (4 subsequent siblings) 7 siblings, 1 reply; 42+ messages in thread From: Joel Fernandes (Google) @ 2018-05-14 3:15 UTC (permalink / raw) To: linux-kernel Cc: Joel Fernandes (Google), Paul E. McKenney, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team Commit be4b8beed87d ("rcu: Move RCU's grace-period-change code to ->gp_seq") removed the cpuend grace period trace point. This patch adds it back. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> --- kernel/rcu/tree.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 9ad931bff409..29ccc60bdbfc 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -1774,10 +1774,12 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp, /* Handle the ends of any preceding grace periods first. */ if (rcu_seq_completed_gp(rdp->gp_seq, rnp->gp_seq) || - unlikely(READ_ONCE(rdp->gpwrap))) + unlikely(READ_ONCE(rdp->gpwrap))) { ret = rcu_advance_cbs(rsp, rnp, rdp); /* Advance callbacks. */ - else + trace_rcu_grace_period(rsp->name, rdp->gp_seq, TPS("cpuend")); + } else { ret = rcu_accelerate_cbs(rsp, rnp, rdp); /* Recent callbacks. */ + } /* Now handle the beginnings of any new-to-this-CPU grace periods. */ if (rcu_seq_new_gp(rdp->gp_seq, rnp->gp_seq) || -- 2.17.0.441.gb46fe60e1d-goog ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH RFC 3/8] rcu: Add back the cpuend tracepoint 2018-05-14 3:15 ` [PATCH RFC 3/8] rcu: Add back the cpuend tracepoint Joel Fernandes (Google) @ 2018-05-14 18:12 ` Paul E. McKenney 2018-05-15 0:43 ` Joel Fernandes 0 siblings, 1 reply; 42+ messages in thread From: Paul E. McKenney @ 2018-05-14 18:12 UTC (permalink / raw) To: Joel Fernandes (Google) Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Sun, May 13, 2018 at 08:15:36PM -0700, Joel Fernandes (Google) wrote: > Commit be4b8beed87d ("rcu: Move RCU's grace-period-change code to ->gp_seq") > removed the cpuend grace period trace point. This patch adds it back. > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Good catch!!! I queued this to be squashed into the offending commit, with attribution. Thanx, Paul > --- > kernel/rcu/tree.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 9ad931bff409..29ccc60bdbfc 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -1774,10 +1774,12 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp, > > /* Handle the ends of any preceding grace periods first. */ > if (rcu_seq_completed_gp(rdp->gp_seq, rnp->gp_seq) || > - unlikely(READ_ONCE(rdp->gpwrap))) > + unlikely(READ_ONCE(rdp->gpwrap))) { > ret = rcu_advance_cbs(rsp, rnp, rdp); /* Advance callbacks. */ > - else > + trace_rcu_grace_period(rsp->name, rdp->gp_seq, TPS("cpuend")); > + } else { > ret = rcu_accelerate_cbs(rsp, rnp, rdp); /* Recent callbacks. */ > + } > > /* Now handle the beginnings of any new-to-this-CPU grace periods. */ > if (rcu_seq_new_gp(rdp->gp_seq, rnp->gp_seq) || > -- > 2.17.0.441.gb46fe60e1d-goog > ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH RFC 3/8] rcu: Add back the cpuend tracepoint 2018-05-14 18:12 ` Paul E. McKenney @ 2018-05-15 0:43 ` Joel Fernandes 0 siblings, 0 replies; 42+ messages in thread From: Joel Fernandes @ 2018-05-15 0:43 UTC (permalink / raw) To: Paul E. McKenney Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Mon, May 14, 2018 at 11:12:27AM -0700, Paul E. McKenney wrote: > On Sun, May 13, 2018 at 08:15:36PM -0700, Joel Fernandes (Google) wrote: > > Commit be4b8beed87d ("rcu: Move RCU's grace-period-change code to ->gp_seq") > > removed the cpuend grace period trace point. This patch adds it back. > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > Good catch!!! I queued this to be squashed into the offending commit, > with attribution. Cool, thanks! - Joel
* [PATCH RFC 4/8] rcu: Get rid of old c variable from places in tree RCU 2018-05-14 3:15 [PATCH RFC 0/8] rcu fixes, clean ups for rcu/dev Joel Fernandes (Google) ` (2 preceding siblings ...) 2018-05-14 3:15 ` [PATCH RFC 3/8] rcu: Add back the cpuend tracepoint Joel Fernandes (Google) @ 2018-05-14 3:15 ` Joel Fernandes (Google) 2018-05-14 17:57 ` Paul E. McKenney 2018-05-14 3:15 ` [PATCH RFC 5/8] rcu: Use rcu_node as temporary variable in funnel locking loop Joel Fernandes (Google) ` (3 subsequent siblings) 7 siblings, 1 reply; 42+ messages in thread From: Joel Fernandes (Google) @ 2018-05-14 3:15 UTC (permalink / raw) To: linux-kernel Cc: Joel Fernandes (Google), Paul E. McKenney, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team The 'c' variable was used previously to store the grace period that is being requested. However it is not very meaningful for a code reader, this patch replaces it with gp_seq_start indicating that this is the grace period that was requested. Also updating tracing with the new name. Just a clean up patch, no logical change. 
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> --- include/trace/events/rcu.h | 15 ++++++------ kernel/rcu/tree.c | 47 ++++++++++++++++++++++---------------- 2 files changed, 35 insertions(+), 27 deletions(-) diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h index ce9d1a1cac78..539900a9f8c7 100644 --- a/include/trace/events/rcu.h +++ b/include/trace/events/rcu.h @@ -103,15 +103,16 @@ TRACE_EVENT(rcu_grace_period, */ TRACE_EVENT(rcu_future_grace_period, - TP_PROTO(const char *rcuname, unsigned long gp_seq, unsigned long c, - u8 level, int grplo, int grphi, const char *gpevent), + TP_PROTO(const char *rcuname, unsigned long gp_seq, + unsigned long gp_seq_start, u8 level, int grplo, int grphi, + const char *gpevent), - TP_ARGS(rcuname, gp_seq, c, level, grplo, grphi, gpevent), + TP_ARGS(rcuname, gp_seq, gp_seq_start, level, grplo, grphi, gpevent), TP_STRUCT__entry( __field(const char *, rcuname) __field(unsigned long, gp_seq) - __field(unsigned long, c) + __field(unsigned long, gp_seq_start) __field(u8, level) __field(int, grplo) __field(int, grphi) @@ -121,7 +122,7 @@ TRACE_EVENT(rcu_future_grace_period, TP_fast_assign( __entry->rcuname = rcuname; __entry->gp_seq = gp_seq; - __entry->c = c; + __entry->gp_seq_start = gp_seq_start; __entry->level = level; __entry->grplo = grplo; __entry->grphi = grphi; @@ -129,7 +130,7 @@ TRACE_EVENT(rcu_future_grace_period, ), TP_printk("%s %lu %lu %u %d %d %s", - __entry->rcuname, __entry->gp_seq, __entry->c, __entry->level, + __entry->rcuname, __entry->gp_seq, __entry->gp_seq_start, __entry->level, __entry->grplo, __entry->grphi, __entry->gpevent) ); @@ -751,7 +752,7 @@ TRACE_EVENT(rcu_barrier, #else /* #ifdef CONFIG_RCU_TRACE */ #define trace_rcu_grace_period(rcuname, gp_seq, gpevent) do { } while (0) -#define trace_rcu_future_grace_period(rcuname, gp_seq, c, \ +#define trace_rcu_future_grace_period(rcuname, gp_seq, gp_seq_start, \ level, grplo, grphi, event) \ do { } while (0) #define 
trace_rcu_grace_period_init(rcuname, gp_seq, level, grplo, grphi, \ diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 29ccc60bdbfc..9f5679ba413b 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -1541,13 +1541,18 @@ void rcu_cpu_stall_reset(void) /* Trace-event wrapper function for trace_rcu_future_grace_period. */ static void trace_rcu_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, - unsigned long c, const char *s) + unsigned long gp_seq_start, const char *s) { - trace_rcu_future_grace_period(rdp->rsp->name, rnp->gp_seq, c, + trace_rcu_future_grace_period(rdp->rsp->name, rnp->gp_seq, gp_seq_start, rnp->level, rnp->grplo, rnp->grphi, s); } /* + * rcu_start_this_gp - Request the start of a particular grace period + * @rnp: The leaf node of the CPU from which to start. + * @rdp: The rcu_data corresponding to the CPU from which to start. + * @gp_seq_start: The gp_seq of the grace period to start. + * * Start the specified grace period, as needed to handle newly arrived * callbacks. The required future grace periods are recorded in each * rcu_node structure's ->gp_seq_needed field. Returns true if there @@ -1555,9 +1560,11 @@ static void trace_rcu_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, * * The caller must hold the specified rcu_node structure's ->lock, which * is why the caller is responsible for waking the grace-period kthread. + * + * Returns true if the GP thread needs to be awakened else false. */ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, - unsigned long c) + unsigned long gp_seq_start) { bool ret = false; struct rcu_state *rsp = rdp->rsp; @@ -1573,18 +1580,19 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, * not be released. 
*/ raw_lockdep_assert_held_rcu_node(rnp); - trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf")); + trace_rcu_this_gp(rnp, rdp, gp_seq_start, TPS("Startleaf")); for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) { if (rnp_root != rnp) raw_spin_lock_rcu_node(rnp_root); - if (ULONG_CMP_GE(rnp_root->gp_seq_needed, c) || - rcu_seq_done(&rnp_root->gp_seq, c) || + if (ULONG_CMP_GE(rnp_root->gp_seq_needed, gp_seq_start) || + rcu_seq_done(&rnp_root->gp_seq, gp_seq_start) || (rnp != rnp_root && rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))) { - trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted")); + trace_rcu_this_gp(rnp_root, rdp, gp_seq_start, + TPS("Prestarted")); goto unlock_out; } - rnp_root->gp_seq_needed = c; + rnp_root->gp_seq_needed = gp_seq_start; if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq))) goto unlock_out; if (rnp_root != rnp && rnp_root->parent != NULL) @@ -1595,14 +1603,14 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, /* If GP already in progress, just leave, otherwise start one. */ if (rcu_gp_in_progress(rsp)) { - trace_rcu_this_gp(rnp_root, rdp, c, TPS("Startedleafroot")); + trace_rcu_this_gp(rnp_root, rdp, gp_seq_start, TPS("Startedleafroot")); goto unlock_out; } - trace_rcu_this_gp(rnp_root, rdp, c, TPS("Startedroot")); + trace_rcu_this_gp(rnp_root, rdp, gp_seq_start, TPS("Startedroot")); WRITE_ONCE(rsp->gp_flags, rsp->gp_flags | RCU_GP_FLAG_INIT); rsp->gp_req_activity = jiffies; if (!rsp->gp_kthread) { - trace_rcu_this_gp(rnp_root, rdp, c, TPS("NoGPkthread")); + trace_rcu_this_gp(rnp_root, rdp, gp_seq_start, TPS("NoGPkthread")); goto unlock_out; } trace_rcu_grace_period(rsp->name, READ_ONCE(rsp->gp_seq), TPS("newreq")); @@ -1611,9 +1619,9 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, if (rnp != rnp_root) raw_spin_unlock_rcu_node(rnp_root); /* Push furthest requested GP to leaf node and rcu_data structure. 
*/ - if (ULONG_CMP_GE(rnp_root->gp_seq_needed, c)) { - rnp->gp_seq_needed = c; - rdp->gp_seq_needed = c; + if (ULONG_CMP_GE(rnp_root->gp_seq_needed, gp_seq_start)) { + rnp->gp_seq_needed = gp_seq_start; + rdp->gp_seq_needed = gp_seq_start; } return ret; } @@ -1624,14 +1632,13 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, */ static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp) { - unsigned long c = rnp->gp_seq; bool needmore; struct rcu_data *rdp = this_cpu_ptr(rsp->rda); needmore = ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed); if (!needmore) rnp->gp_seq_needed = rnp->gp_seq; /* Avoid counter wrap. */ - trace_rcu_this_gp(rnp, rdp, c, + trace_rcu_this_gp(rnp, rdp, rnp->gp_seq, needmore ? TPS("CleanupMore") : TPS("Cleanup")); return needmore; } @@ -1667,7 +1674,7 @@ static void rcu_gp_kthread_wake(struct rcu_state *rsp) static bool rcu_accelerate_cbs(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_data *rdp) { - unsigned long c; + unsigned long gp_seq_start; bool ret = false; raw_lockdep_assert_held_rcu_node(rnp); @@ -1686,9 +1693,9 @@ static bool rcu_accelerate_cbs(struct rcu_state *rsp, struct rcu_node *rnp, * accelerating callback invocation to an earlier grace-period * number. */ - c = rcu_seq_snap(&rsp->gp_seq); - if (rcu_segcblist_accelerate(&rdp->cblist, c)) - ret = rcu_start_this_gp(rnp, rdp, c); + gp_seq_start = rcu_seq_snap(&rsp->gp_seq); + if (rcu_segcblist_accelerate(&rdp->cblist, gp_seq_start)) + ret = rcu_start_this_gp(rnp, rdp, gp_seq_start); /* Trace depending on how much we were able to accelerate. */ if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL)) -- 2.17.0.441.gb46fe60e1d-goog ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH RFC 4/8] rcu: Get rid of old c variable from places in tree RCU 2018-05-14 3:15 ` [PATCH RFC 4/8] rcu: Get rid of old c variable from places in tree RCU Joel Fernandes (Google) @ 2018-05-14 17:57 ` Paul E. McKenney 2018-05-15 0:41 ` Joel Fernandes 0 siblings, 1 reply; 42+ messages in thread From: Paul E. McKenney @ 2018-05-14 17:57 UTC (permalink / raw) To: Joel Fernandes (Google) Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Sun, May 13, 2018 at 08:15:37PM -0700, Joel Fernandes (Google) wrote: > The 'c' variable was used previously to store the grace period > that is being requested. However it is not very meaningful for > a code reader, this patch replaces it with gp_seq_start indicating that Good catch, but how about "gp_seq_req" instead of gp_seq_start? Thanx, Paul > this is the grace period that was requested. Also updating tracing with > the new name. > > Just a clean up patch, no logical change. > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > --- > include/trace/events/rcu.h | 15 ++++++------ > kernel/rcu/tree.c | 47 ++++++++++++++++++++++---------------- > 2 files changed, 35 insertions(+), 27 deletions(-) > > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h > index ce9d1a1cac78..539900a9f8c7 100644 > --- a/include/trace/events/rcu.h > +++ b/include/trace/events/rcu.h > @@ -103,15 +103,16 @@ TRACE_EVENT(rcu_grace_period, > */ > TRACE_EVENT(rcu_future_grace_period, > > - TP_PROTO(const char *rcuname, unsigned long gp_seq, unsigned long c, > - u8 level, int grplo, int grphi, const char *gpevent), > + TP_PROTO(const char *rcuname, unsigned long gp_seq, > + unsigned long gp_seq_start, u8 level, int grplo, int grphi, > + const char *gpevent), > > - TP_ARGS(rcuname, gp_seq, c, level, grplo, grphi, gpevent), > + TP_ARGS(rcuname, gp_seq, gp_seq_start, level, grplo, grphi, gpevent), > > TP_STRUCT__entry( > __field(const char *, rcuname) > 
__field(unsigned long, gp_seq) > - __field(unsigned long, c) > + __field(unsigned long, gp_seq_start) > __field(u8, level) > __field(int, grplo) > __field(int, grphi) > @@ -121,7 +122,7 @@ TRACE_EVENT(rcu_future_grace_period, > TP_fast_assign( > __entry->rcuname = rcuname; > __entry->gp_seq = gp_seq; > - __entry->c = c; > + __entry->gp_seq_start = gp_seq_start; > __entry->level = level; > __entry->grplo = grplo; > __entry->grphi = grphi; > @@ -129,7 +130,7 @@ TRACE_EVENT(rcu_future_grace_period, > ), > > TP_printk("%s %lu %lu %u %d %d %s", > - __entry->rcuname, __entry->gp_seq, __entry->c, __entry->level, > + __entry->rcuname, __entry->gp_seq, __entry->gp_seq_start, __entry->level, > __entry->grplo, __entry->grphi, __entry->gpevent) > ); > > @@ -751,7 +752,7 @@ TRACE_EVENT(rcu_barrier, > #else /* #ifdef CONFIG_RCU_TRACE */ > > #define trace_rcu_grace_period(rcuname, gp_seq, gpevent) do { } while (0) > -#define trace_rcu_future_grace_period(rcuname, gp_seq, c, \ > +#define trace_rcu_future_grace_period(rcuname, gp_seq, gp_seq_start, \ > level, grplo, grphi, event) \ > do { } while (0) > #define trace_rcu_grace_period_init(rcuname, gp_seq, level, grplo, grphi, \ > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 29ccc60bdbfc..9f5679ba413b 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -1541,13 +1541,18 @@ void rcu_cpu_stall_reset(void) > > /* Trace-event wrapper function for trace_rcu_future_grace_period. */ > static void trace_rcu_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, > - unsigned long c, const char *s) > + unsigned long gp_seq_start, const char *s) > { > - trace_rcu_future_grace_period(rdp->rsp->name, rnp->gp_seq, c, > + trace_rcu_future_grace_period(rdp->rsp->name, rnp->gp_seq, gp_seq_start, > rnp->level, rnp->grplo, rnp->grphi, s); > } > > /* > + * rcu_start_this_gp - Request the start of a particular grace period > + * @rnp: The leaf node of the CPU from which to start. 
> + * @rdp: The rcu_data corresponding to the CPU from which to start. > + * @gp_seq_start: The gp_seq of the grace period to start. > + * > * Start the specified grace period, as needed to handle newly arrived > * callbacks. The required future grace periods are recorded in each > * rcu_node structure's ->gp_seq_needed field. Returns true if there > @@ -1555,9 +1560,11 @@ static void trace_rcu_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, > * > * The caller must hold the specified rcu_node structure's ->lock, which > * is why the caller is responsible for waking the grace-period kthread. > + * > + * Returns true if the GP thread needs to be awakened else false. > */ > static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, > - unsigned long c) > + unsigned long gp_seq_start) > { > bool ret = false; > struct rcu_state *rsp = rdp->rsp; > @@ -1573,18 +1580,19 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, > * not be released. > */ > raw_lockdep_assert_held_rcu_node(rnp); > - trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf")); > + trace_rcu_this_gp(rnp, rdp, gp_seq_start, TPS("Startleaf")); > for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) { > if (rnp_root != rnp) > raw_spin_lock_rcu_node(rnp_root); > - if (ULONG_CMP_GE(rnp_root->gp_seq_needed, c) || > - rcu_seq_done(&rnp_root->gp_seq, c) || > + if (ULONG_CMP_GE(rnp_root->gp_seq_needed, gp_seq_start) || > + rcu_seq_done(&rnp_root->gp_seq, gp_seq_start) || > (rnp != rnp_root && > rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))) { > - trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted")); > + trace_rcu_this_gp(rnp_root, rdp, gp_seq_start, > + TPS("Prestarted")); > goto unlock_out; > } > - rnp_root->gp_seq_needed = c; > + rnp_root->gp_seq_needed = gp_seq_start; > if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq))) > goto unlock_out; > if (rnp_root != rnp && rnp_root->parent != NULL) > @@ -1595,14 +1603,14 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, 
struct rcu_data *rdp, > > /* If GP already in progress, just leave, otherwise start one. */ > if (rcu_gp_in_progress(rsp)) { > - trace_rcu_this_gp(rnp_root, rdp, c, TPS("Startedleafroot")); > + trace_rcu_this_gp(rnp_root, rdp, gp_seq_start, TPS("Startedleafroot")); > goto unlock_out; > } > - trace_rcu_this_gp(rnp_root, rdp, c, TPS("Startedroot")); > + trace_rcu_this_gp(rnp_root, rdp, gp_seq_start, TPS("Startedroot")); > WRITE_ONCE(rsp->gp_flags, rsp->gp_flags | RCU_GP_FLAG_INIT); > rsp->gp_req_activity = jiffies; > if (!rsp->gp_kthread) { > - trace_rcu_this_gp(rnp_root, rdp, c, TPS("NoGPkthread")); > + trace_rcu_this_gp(rnp_root, rdp, gp_seq_start, TPS("NoGPkthread")); > goto unlock_out; > } > trace_rcu_grace_period(rsp->name, READ_ONCE(rsp->gp_seq), TPS("newreq")); > @@ -1611,9 +1619,9 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, > if (rnp != rnp_root) > raw_spin_unlock_rcu_node(rnp_root); > /* Push furthest requested GP to leaf node and rcu_data structure. */ > - if (ULONG_CMP_GE(rnp_root->gp_seq_needed, c)) { > - rnp->gp_seq_needed = c; > - rdp->gp_seq_needed = c; > + if (ULONG_CMP_GE(rnp_root->gp_seq_needed, gp_seq_start)) { > + rnp->gp_seq_needed = gp_seq_start; > + rdp->gp_seq_needed = gp_seq_start; > } > return ret; > } > @@ -1624,14 +1632,13 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, > */ > static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp) > { > - unsigned long c = rnp->gp_seq; > bool needmore; > struct rcu_data *rdp = this_cpu_ptr(rsp->rda); > > needmore = ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed); > if (!needmore) > rnp->gp_seq_needed = rnp->gp_seq; /* Avoid counter wrap. */ > - trace_rcu_this_gp(rnp, rdp, c, > + trace_rcu_this_gp(rnp, rdp, rnp->gp_seq, > needmore ? 
TPS("CleanupMore") : TPS("Cleanup")); > return needmore; > } > @@ -1667,7 +1674,7 @@ static void rcu_gp_kthread_wake(struct rcu_state *rsp) > static bool rcu_accelerate_cbs(struct rcu_state *rsp, struct rcu_node *rnp, > struct rcu_data *rdp) > { > - unsigned long c; > + unsigned long gp_seq_start; > bool ret = false; > > raw_lockdep_assert_held_rcu_node(rnp); > @@ -1686,9 +1693,9 @@ static bool rcu_accelerate_cbs(struct rcu_state *rsp, struct rcu_node *rnp, > * accelerating callback invocation to an earlier grace-period > * number. > */ > - c = rcu_seq_snap(&rsp->gp_seq); > - if (rcu_segcblist_accelerate(&rdp->cblist, c)) > - ret = rcu_start_this_gp(rnp, rdp, c); > + gp_seq_start = rcu_seq_snap(&rsp->gp_seq); > + if (rcu_segcblist_accelerate(&rdp->cblist, gp_seq_start)) > + ret = rcu_start_this_gp(rnp, rdp, gp_seq_start); > > /* Trace depending on how much we were able to accelerate. */ > if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL)) > -- > 2.17.0.441.gb46fe60e1d-goog > ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH RFC 4/8] rcu: Get rid of old c variable from places in tree RCU 2018-05-14 17:57 ` Paul E. McKenney @ 2018-05-15 0:41 ` Joel Fernandes 0 siblings, 0 replies; 42+ messages in thread From: Joel Fernandes @ 2018-05-15 0:41 UTC (permalink / raw) To: Paul E. McKenney Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Mon, May 14, 2018 at 10:57:39AM -0700, Paul E. McKenney wrote: > On Sun, May 13, 2018 at 08:15:37PM -0700, Joel Fernandes (Google) wrote: > > The 'c' variable was used previously to store the grace period > > that is being requested. However it is not very meaningful for > > a code reader, this patch replaces it with gp_seq_start indicating that > > Good catch, but how about "gp_seq_req" instead of gp_seq_start? Yes, that works as well and is also two characters shorter. Will change it. thanks, - Joel ^ permalink raw reply [flat|nested] 42+ messages in thread
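[Editorial aside: the renamed gp_seq_req value comes from rcu_seq_snap(), documented in patch 1 of this series. Its arithmetic is easy to model; below is a simplified user-space sketch assuming a single grace-period state bit (RCU_SEQ_STATE_MASK == 1 here — the in-kernel mask is wider), matching patch 1's worked example: the snapshot is gp+1 if RCU is idle and gp+2 if a grace period is already in progress.]

```c
#include <assert.h>

/* Simplified model: low bit of gp_seq is 1 while a GP is in progress. */
#define RCU_SEQ_STATE_MASK	1UL

static unsigned long rcu_seq_ctr(unsigned long s)
{
	return s >> 1;			/* the grace-period number itself */
}

static unsigned long rcu_seq_state(unsigned long s)
{
	return s & RCU_SEQ_STATE_MASK;	/* nonzero if a GP is in flight */
}

/*
 * Predict the gp_seq value at which a callback queued now is safe to
 * invoke: add (2 * mask + 1) and clear the state bits.  Starting from
 * an idle seq this lands on gp+1; starting mid-GP it lands on gp+2,
 * because the in-flight grace period cannot cover the new callback.
 */
static unsigned long rcu_seq_snap(unsigned long s)
{
	return (s + (2 * RCU_SEQ_STATE_MASK + 1)) & ~RCU_SEQ_STATE_MASK;
}
```

With seq 6 (gp 3, idle) the snapshot is 8 (gp 4); with seq 7 (gp 3, GP in progress) it is 10 (gp 5).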
* [PATCH RFC 5/8] rcu: Use rcu_node as temporary variable in funnel locking loop 2018-05-14 3:15 [PATCH RFC 0/8] rcu fixes, clean ups for rcu/dev Joel Fernandes (Google) ` (3 preceding siblings ...) 2018-05-14 3:15 ` [PATCH RFC 4/8] rcu: Get rid of old c variable from places in tree RCU Joel Fernandes (Google) @ 2018-05-14 3:15 ` Joel Fernandes (Google) 2018-05-14 18:00 ` Paul E. McKenney 2018-05-14 3:15 ` [PATCH RFC 6/8] rcu: Add back the Startedleaf tracepoint Joel Fernandes (Google) ` (2 subsequent siblings) 7 siblings, 1 reply; 42+ messages in thread From: Joel Fernandes (Google) @ 2018-05-14 3:15 UTC (permalink / raw) To: linux-kernel Cc: Joel Fernandes (Google), Paul E. McKenney, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team The funnel locking loop in rcu_start_this_gp uses rcu_root as a temporary variable while walking the combining tree. This causes a tiresome exercise of a code reader reminding themselves that rcu_root may not be root. Lets just call it rcu_node, and then finally when rcu_node is the rcu_root, lets assign it at that time. Just a clean up patch, no logical change. 
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> --- kernel/rcu/tree.c | 34 ++++++++++++++++++---------------- 1 file changed, 18 insertions(+), 16 deletions(-) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 9f5679ba413b..40670047d22c 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -1568,7 +1568,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, { bool ret = false; struct rcu_state *rsp = rdp->rsp; - struct rcu_node *rnp_root; + struct rcu_node *rnp_node, *rnp_root = NULL; /* * Use funnel locking to either acquire the root rcu_node @@ -1581,24 +1581,26 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, */ raw_lockdep_assert_held_rcu_node(rnp); trace_rcu_this_gp(rnp, rdp, gp_seq_start, TPS("Startleaf")); - for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) { - if (rnp_root != rnp) - raw_spin_lock_rcu_node(rnp_root); - if (ULONG_CMP_GE(rnp_root->gp_seq_needed, gp_seq_start) || - rcu_seq_done(&rnp_root->gp_seq, gp_seq_start) || - (rnp != rnp_root && - rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))) { - trace_rcu_this_gp(rnp_root, rdp, gp_seq_start, + for (rnp_node = rnp; 1; rnp_node = rnp_node->parent) { + if (rnp_node != rnp) + raw_spin_lock_rcu_node(rnp_node); + if (ULONG_CMP_GE(rnp_node->gp_seq_needed, gp_seq_start) || + rcu_seq_done(&rnp_node->gp_seq, gp_seq_start) || + (rnp != rnp_node && + rcu_seq_state(rcu_seq_current(&rnp_node->gp_seq)))) { + trace_rcu_this_gp(rnp_node, rdp, gp_seq_start, TPS("Prestarted")); goto unlock_out; } - rnp_root->gp_seq_needed = gp_seq_start; + rnp_node->gp_seq_needed = gp_seq_start; if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq))) goto unlock_out; - if (rnp_root != rnp && rnp_root->parent != NULL) - raw_spin_unlock_rcu_node(rnp_root); - if (!rnp_root->parent) + if (rnp_node != rnp && rnp_node->parent != NULL) + raw_spin_unlock_rcu_node(rnp_node); + if (!rnp_node->parent) { + rnp_root = rnp_node; break; /* At root, and perhaps also leaf. 
*/ + } } /* If GP already in progress, just leave, otherwise start one. */ @@ -1616,10 +1618,10 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, trace_rcu_grace_period(rsp->name, READ_ONCE(rsp->gp_seq), TPS("newreq")); ret = true; /* Caller must wake GP kthread. */ unlock_out: - if (rnp != rnp_root) - raw_spin_unlock_rcu_node(rnp_root); + if (rnp != rnp_node) + raw_spin_unlock_rcu_node(rnp_node); /* Push furthest requested GP to leaf node and rcu_data structure. */ - if (ULONG_CMP_GE(rnp_root->gp_seq_needed, gp_seq_start)) { + if (ULONG_CMP_GE(rnp_node->gp_seq_needed, gp_seq_start)) { rnp->gp_seq_needed = gp_seq_start; rdp->gp_seq_needed = gp_seq_start; } -- 2.17.0.441.gb46fe60e1d-goog ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH RFC 5/8] rcu: Use rcu_node as temporary variable in funnel locking loop 2018-05-14 3:15 ` [PATCH RFC 5/8] rcu: Use rcu_node as temporary variable in funnel locking loop Joel Fernandes (Google) @ 2018-05-14 18:00 ` Paul E. McKenney 2018-05-15 0:43 ` Joel Fernandes 0 siblings, 1 reply; 42+ messages in thread From: Paul E. McKenney @ 2018-05-14 18:00 UTC (permalink / raw) To: Joel Fernandes (Google) Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Sun, May 13, 2018 at 08:15:38PM -0700, Joel Fernandes (Google) wrote: > The funnel locking loop in rcu_start_this_gp uses rcu_root as a > temporary variable while walking the combining tree. This causes a > tiresome exercise of a code reader reminding themselves that rcu_root > may not be root. Lets just call it rcu_node, and then finally when > rcu_node is the rcu_root, lets assign it at that time. > > Just a clean up patch, no logical change. > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> I agree that my names could use some improvement, but given that you called it rnp_node in the patch and rcu_node in the commit log, I would argue that rnp_node has a Hamming-distance problem. ;-) How about rnp_start for the formal parameter, rnp for the cursor running up the tree, and retaining rnp_root for the root? I considered and rejected the thought of rnp_leaf for the formal parameter because this function is not necessarily called with a leaf rcu_node structure. 
Thanx, Paul > --- > kernel/rcu/tree.c | 34 ++++++++++++++++++---------------- > 1 file changed, 18 insertions(+), 16 deletions(-) > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 9f5679ba413b..40670047d22c 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -1568,7 +1568,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, > { > bool ret = false; > struct rcu_state *rsp = rdp->rsp; > - struct rcu_node *rnp_root; > + struct rcu_node *rnp_node, *rnp_root = NULL; > > /* > * Use funnel locking to either acquire the root rcu_node > @@ -1581,24 +1581,26 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, > */ > raw_lockdep_assert_held_rcu_node(rnp); > trace_rcu_this_gp(rnp, rdp, gp_seq_start, TPS("Startleaf")); > - for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) { > - if (rnp_root != rnp) > - raw_spin_lock_rcu_node(rnp_root); > - if (ULONG_CMP_GE(rnp_root->gp_seq_needed, gp_seq_start) || > - rcu_seq_done(&rnp_root->gp_seq, gp_seq_start) || > - (rnp != rnp_root && > - rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))) { > - trace_rcu_this_gp(rnp_root, rdp, gp_seq_start, > + for (rnp_node = rnp; 1; rnp_node = rnp_node->parent) { > + if (rnp_node != rnp) > + raw_spin_lock_rcu_node(rnp_node); > + if (ULONG_CMP_GE(rnp_node->gp_seq_needed, gp_seq_start) || > + rcu_seq_done(&rnp_node->gp_seq, gp_seq_start) || > + (rnp != rnp_node && > + rcu_seq_state(rcu_seq_current(&rnp_node->gp_seq)))) { > + trace_rcu_this_gp(rnp_node, rdp, gp_seq_start, > TPS("Prestarted")); > goto unlock_out; > } > - rnp_root->gp_seq_needed = gp_seq_start; > + rnp_node->gp_seq_needed = gp_seq_start; > if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq))) > goto unlock_out; > - if (rnp_root != rnp && rnp_root->parent != NULL) > - raw_spin_unlock_rcu_node(rnp_root); > - if (!rnp_root->parent) > + if (rnp_node != rnp && rnp_node->parent != NULL) > + raw_spin_unlock_rcu_node(rnp_node); > + if (!rnp_node->parent) { > + rnp_root = 
rnp_node; > break; /* At root, and perhaps also leaf. */ > + } > } > > /* If GP already in progress, just leave, otherwise start one. */ > @@ -1616,10 +1618,10 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, > trace_rcu_grace_period(rsp->name, READ_ONCE(rsp->gp_seq), TPS("newreq")); > ret = true; /* Caller must wake GP kthread. */ > unlock_out: > - if (rnp != rnp_root) > - raw_spin_unlock_rcu_node(rnp_root); > + if (rnp != rnp_node) > + raw_spin_unlock_rcu_node(rnp_node); > /* Push furthest requested GP to leaf node and rcu_data structure. */ > - if (ULONG_CMP_GE(rnp_root->gp_seq_needed, gp_seq_start)) { > + if (ULONG_CMP_GE(rnp_node->gp_seq_needed, gp_seq_start)) { > rnp->gp_seq_needed = gp_seq_start; > rdp->gp_seq_needed = gp_seq_start; > } > -- > 2.17.0.441.gb46fe60e1d-goog > ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH RFC 5/8] rcu: Use rcu_node as temporary variable in funnel locking loop 2018-05-14 18:00 ` Paul E. McKenney @ 2018-05-15 0:43 ` Joel Fernandes 0 siblings, 0 replies; 42+ messages in thread From: Joel Fernandes @ 2018-05-15 0:43 UTC (permalink / raw) To: Paul E. McKenney Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Mon, May 14, 2018 at 11:00:27AM -0700, Paul E. McKenney wrote: > On Sun, May 13, 2018 at 08:15:38PM -0700, Joel Fernandes (Google) wrote: > > The funnel locking loop in rcu_start_this_gp uses rcu_root as a > > temporary variable while walking the combining tree. This causes a > > tiresome exercise of a code reader reminding themselves that rcu_root > > may not be root. Lets just call it rcu_node, and then finally when > > rcu_node is the rcu_root, lets assign it at that time. > > > > Just a clean up patch, no logical change. > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > I agree that my names could use some improvement, but given that you > called it rnp_node in the patch and rcu_node in the commit log, I would > argue that rnp_node has a Hamming-distance problem. ;-) > > How about rnp_start for the formal parameter, rnp for the cursor running > up the tree, and retaining rnp_root for the root? Ok, that's fine with me. Probably easier to read too. I will rewrite it accordingly. thanks, - Joel ^ permalink raw reply [flat|nested] 42+ messages in thread
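[Editorial aside: to make the control flow under discussion concrete, here is a hypothetical user-space model of the funnel-locking walk. The struct and function names below are invented for illustration; the real rcu_start_this_gp() operates on struct rcu_node with raw spinlocks, uses wrap-safe ULONG_CMP_GE() rather than the plain `>=` shown here, and also handles the in-progress-GP and GP-start cases omitted from this sketch.]

```c
#include <stddef.h>

/* Invented stand-in for struct rcu_node; "locked" models the spinlock. */
struct node {
	struct node *parent;
	unsigned long gp_seq_needed;
	int locked;
};

/*
 * Walk from the starting (usually leaf) node toward the root, locking
 * one level at a time, and record the requested grace period at each
 * level.  Bail early when a level already carries an equal-or-later
 * request, so contention "funnels" out low in the tree instead of
 * piling up on the root lock.
 */
static void funnel_mark(struct node *start, unsigned long gp_seq_req)
{
	struct node *np;

	for (np = start; ; np = np->parent) {
		if (np != start)
			np->locked = 1;	/* raw_spin_lock_rcu_node() */
		if (np->gp_seq_needed >= gp_seq_req) {	/* "Prestarted" */
			if (np != start)
				np->locked = 0;
			return;
		}
		np->gp_seq_needed = gp_seq_req;
		if (!np->parent)
			break;		/* reached the root */
		if (np != start)
			np->locked = 0;	/* drop before climbing higher */
	}
	if (np != start)
		np->locked = 0;		/* the "unlock_out" for the root */
}
```

Note that at most two node locks (the caller-held start node and the level being examined) are held at any instant, which is the point of the funnel.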
* [PATCH RFC 6/8] rcu: Add back the Startedleaf tracepoint 2018-05-14 3:15 [PATCH RFC 0/8] rcu fixes, clean ups for rcu/dev Joel Fernandes (Google) ` (4 preceding siblings ...) 2018-05-14 3:15 ` [PATCH RFC 5/8] rcu: Use rcu_node as temporary variable in funnel locking loop Joel Fernandes (Google) @ 2018-05-14 3:15 ` Joel Fernandes (Google) 2018-05-14 18:38 ` Paul E. McKenney 2018-05-14 3:15 ` [PATCH RFC 7/8] rcu: trace CleanupMore condition only if needed Joel Fernandes (Google) 2018-05-14 3:15 ` [PATCH RFC 8/8] rcu: Fix cpustart tracepoint gp_seq number Joel Fernandes (Google) 7 siblings, 1 reply; 42+ messages in thread From: Joel Fernandes (Google) @ 2018-05-14 3:15 UTC (permalink / raw) To: linux-kernel Cc: Joel Fernandes (Google), Paul E. McKenney, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team In recent discussion [1], the check for whether a leaf believes RCU is not idle, is being added back to funnel locking code, to avoid more locking. In this we are marking the leaf node for a future grace-period and bailing out since a GP is currently in progress. However the tracepoint is missing. Lets add it back. Also add a small comment about why we do this check (basically the point is to avoid locking intermediate nodes unnecessarily) and clarify the comments in the trace event header now that we are doing traversal of one or more intermediate nodes. [1] http://lkml.kernel.org/r/20180513190906.GL26088@linux.vnet.ibm.com Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> --- include/trace/events/rcu.h | 4 ++-- kernel/rcu/tree.c | 11 ++++++++++- 2 files changed, 12 insertions(+), 3 deletions(-) diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h index 539900a9f8c7..dc0bd11739c7 100644 --- a/include/trace/events/rcu.h +++ b/include/trace/events/rcu.h @@ -91,8 +91,8 @@ TRACE_EVENT(rcu_grace_period, * * "Startleaf": Request a grace period based on leaf-node data. 
* "Prestarted": Someone beat us to the request - * "Startedleaf": Leaf-node start proved sufficient. - * "Startedleafroot": Leaf-node start proved sufficient after checking root. + * "Startedleaf": Leaf and one or more non-root nodes marked for future start. + * "Startedleafroot": all non-root nodes from leaf to root marked for future start. * "Startedroot": Requested a nocb grace period based on root-node data. * "NoGPkthread": The RCU grace-period kthread has not yet started. * "StartWait": Start waiting for the requested grace period. diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 40670047d22c..8401a253e7de 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -1593,8 +1593,17 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, goto unlock_out; } rnp_node->gp_seq_needed = gp_seq_start; - if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq))) + + /* + * Check if leaf believes a GP is in progress, if yes we can + * bail and avoid more locking. We have already marked the leaf. + */ + if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq))) { + trace_rcu_this_gp(rnp_node, rdp, gp_seq_start, + TPS("Startedleaf")); goto unlock_out; + } + if (rnp_node != rnp && rnp_node->parent != NULL) raw_spin_unlock_rcu_node(rnp_node); if (!rnp_node->parent) { -- 2.17.0.441.gb46fe60e1d-goog ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH RFC 6/8] rcu: Add back the Startedleaf tracepoint 2018-05-14 3:15 ` [PATCH RFC 6/8] rcu: Add back the Startedleaf tracepoint Joel Fernandes (Google) @ 2018-05-14 18:38 ` Paul E. McKenney 2018-05-15 0:57 ` Joel Fernandes 0 siblings, 1 reply; 42+ messages in thread From: Paul E. McKenney @ 2018-05-14 18:38 UTC (permalink / raw) To: Joel Fernandes (Google) Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Sun, May 13, 2018 at 08:15:39PM -0700, Joel Fernandes (Google) wrote: > In recent discussion [1], the check for whether a leaf believes RCU is > not idle, is being added back to funnel locking code, to avoid more > locking. In this we are marking the leaf node for a future grace-period > and bailing out since a GP is currently in progress. However the > tracepoint is missing. Lets add it back. > > Also add a small comment about why we do this check (basically the point > is to avoid locking intermediate nodes unnecessarily) and clarify the > comments in the trace event header now that we are doing traversal of > one or more intermediate nodes. > > [1] http://lkml.kernel.org/r/20180513190906.GL26088@linux.vnet.ibm.com > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Looks like a good idea, but it does not apply -- which is not a surprise, given the change rate in this code. I hand-applied as a modification to c1b3f9fce26f ("rcu: Don't funnel-lock above leaf node if GP in progress") with attribution, but with the changes below. Please let me know if I am missing something. Ah, I see -- this commit depends on your earlier name-change commit. I therefore made this patch use the old names. 
> --- > include/trace/events/rcu.h | 4 ++-- > kernel/rcu/tree.c | 11 ++++++++++- > 2 files changed, 12 insertions(+), 3 deletions(-) > > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h > index 539900a9f8c7..dc0bd11739c7 100644 > --- a/include/trace/events/rcu.h > +++ b/include/trace/events/rcu.h > @@ -91,8 +91,8 @@ TRACE_EVENT(rcu_grace_period, > * > * "Startleaf": Request a grace period based on leaf-node data. > * "Prestarted": Someone beat us to the request > - * "Startedleaf": Leaf-node start proved sufficient. > - * "Startedleafroot": Leaf-node start proved sufficient after checking root. > + * "Startedleaf": Leaf and one or more non-root nodes marked for future start. Actually, we only get to that trace if all we did was mark the leaf node, right? > + * "Startedleafroot": all non-root nodes from leaf to root marked for future start. I got rid of the "non-root" part, given that we had to have marked the root to break out of the loop. Thanx, Paul > * "Startedroot": Requested a nocb grace period based on root-node data. > * "NoGPkthread": The RCU grace-period kthread has not yet started. > * "StartWait": Start waiting for the requested grace period. > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 40670047d22c..8401a253e7de 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -1593,8 +1593,17 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, > goto unlock_out; > } > rnp_node->gp_seq_needed = gp_seq_start; > - if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq))) > + > + /* > + * Check if leaf believes a GP is in progress, if yes we can > + * bail and avoid more locking. We have already marked the leaf. 
> + */ > + if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq))) { > + trace_rcu_this_gp(rnp_node, rdp, gp_seq_start, > + TPS("Startedleaf")); > goto unlock_out; > + } > + > if (rnp_node != rnp && rnp_node->parent != NULL) > raw_spin_unlock_rcu_node(rnp_node); > if (!rnp_node->parent) { > -- > 2.17.0.441.gb46fe60e1d-goog > ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH RFC 6/8] rcu: Add back the Startedleaf tracepoint 2018-05-14 18:38 ` Paul E. McKenney @ 2018-05-15 0:57 ` Joel Fernandes 2018-05-15 3:46 ` Paul E. McKenney 0 siblings, 1 reply; 42+ messages in thread From: Joel Fernandes @ 2018-05-15 0:57 UTC (permalink / raw) To: Paul E. McKenney Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Mon, May 14, 2018 at 11:38:23AM -0700, Paul E. McKenney wrote: > On Sun, May 13, 2018 at 08:15:39PM -0700, Joel Fernandes (Google) wrote: > > In recent discussion [1], the check for whether a leaf believes RCU is > > not idle, is being added back to funnel locking code, to avoid more > > locking. In this we are marking the leaf node for a future grace-period > > and bailing out since a GP is currently in progress. However the > > tracepoint is missing. Lets add it back. > > > > Also add a small comment about why we do this check (basically the point > > is to avoid locking intermediate nodes unnecessarily) and clarify the > > comments in the trace event header now that we are doing traversal of > > one or more intermediate nodes. > > > > [1] http://lkml.kernel.org/r/20180513190906.GL26088@linux.vnet.ibm.com > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > Looks like a good idea, but it does not apply -- which is not a surprise, > given the change rate in this code. I hand-applied as a modification > to c1b3f9fce26f ("rcu: Don't funnel-lock above leaf node if GP in progress") > with attribution, but with the changes below. Please let me know if I > am missing something. > > Ah, I see -- this commit depends on your earlier name-change commit. > I therefore made this patch use the old names. Ok, I'll check your new tree and rebase. 
> > --- > > include/trace/events/rcu.h | 4 ++-- > > kernel/rcu/tree.c | 11 ++++++++++- > > 2 files changed, 12 insertions(+), 3 deletions(-) > > > > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h > > index 539900a9f8c7..dc0bd11739c7 100644 > > --- a/include/trace/events/rcu.h > > +++ b/include/trace/events/rcu.h > > @@ -91,8 +91,8 @@ TRACE_EVENT(rcu_grace_period, > > * > > * "Startleaf": Request a grace period based on leaf-node data. > > * "Prestarted": Someone beat us to the request > > - * "Startedleaf": Leaf-node start proved sufficient. > > - * "Startedleafroot": Leaf-node start proved sufficient after checking root. > > + * "Startedleaf": Leaf and one or more non-root nodes marked for future start. > > Actually, we only get to that trace if all we did was mark the leaf > node, right? I didn't think so. In the code we are doing the check for rnp every time we walk up the tree. So even when we are on an intermediate node, we do the check of the node we started with. I thought that's what you wanted to do. It makes sense to me to do so too. > > + * "Startedleafroot": all non-root nodes from leaf to root marked for future start. > > I got rid of the "non-root" part, given that we had to have marked > the root to break out of the loop. Ah yes, sorry. That's absolutely true. thanks, - Joel ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH RFC 6/8] rcu: Add back the Startedleaf tracepoint 2018-05-15 0:57 ` Joel Fernandes @ 2018-05-15 3:46 ` Paul E. McKenney 2018-05-15 23:04 ` Joel Fernandes 0 siblings, 1 reply; 42+ messages in thread From: Paul E. McKenney @ 2018-05-15 3:46 UTC (permalink / raw) To: Joel Fernandes Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Mon, May 14, 2018 at 05:57:09PM -0700, Joel Fernandes wrote: > On Mon, May 14, 2018 at 11:38:23AM -0700, Paul E. McKenney wrote: > > On Sun, May 13, 2018 at 08:15:39PM -0700, Joel Fernandes (Google) wrote: > > > In recent discussion [1], the check for whether a leaf believes RCU is > > > not idle, is being added back to funnel locking code, to avoid more > > > locking. In this we are marking the leaf node for a future grace-period > > > and bailing out since a GP is currently in progress. However the > > > tracepoint is missing. Lets add it back. > > > > > > Also add a small comment about why we do this check (basically the point > > > is to avoid locking intermediate nodes unnecessarily) and clarify the > > > comments in the trace event header now that we are doing traversal of > > > one or more intermediate nodes. > > > > > > [1] http://lkml.kernel.org/r/20180513190906.GL26088@linux.vnet.ibm.com > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > Looks like a good idea, but it does not apply -- which is not a surprise, > > given the change rate in this code. I hand-applied as a modification > > to c1b3f9fce26f ("rcu: Don't funnel-lock above leaf node if GP in progress") > > with attribution, but with the changes below. Please let me know if I > > am missing something. > > > > Ah, I see -- this commit depends on your earlier name-change commit. > > I therefore made this patch use the old names. > > Ok, I'll check your new tree and rebase. Sounds good! 
> > > --- > > > include/trace/events/rcu.h | 4 ++-- > > > kernel/rcu/tree.c | 11 ++++++++++- > > > 2 files changed, 12 insertions(+), 3 deletions(-) > > > > > > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h > > > index 539900a9f8c7..dc0bd11739c7 100644 > > > --- a/include/trace/events/rcu.h > > > +++ b/include/trace/events/rcu.h > > > @@ -91,8 +91,8 @@ TRACE_EVENT(rcu_grace_period, > > > * > > > * "Startleaf": Request a grace period based on leaf-node data. > > > * "Prestarted": Someone beat us to the request > > > - * "Startedleaf": Leaf-node start proved sufficient. > > > - * "Startedleafroot": Leaf-node start proved sufficient after checking root. > > > + * "Startedleaf": Leaf and one or more non-root nodes marked for future start. > > > > Actually, we only get to that trace if all we did was mark the leaf > > node, right? > > I didn't think so. In the code we are doing the check for rnp every time we > walk up the tree. So even when we are on an intermediate node, we do the > check of the node we started with. I thought that's what you wanted to do. It > makes sense to me to do so too. If we are not on the initial (usually leaf) node, then the similar check in the previous "if" statement would have sent us to unlock_out, right? (And yes, I should have said "mark the initial node" above.) Thanx, Paul > > > + * "Startedleafroot": all non-root nodes from leaf to root marked for future start. > > > > I got rid of the "non-root" part, given that we had to have marked > > the root to break out of the loop. > > Ah yes, sorry. That's absolutely true. > > thanks, > > - Joel > ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH RFC 6/8] rcu: Add back the Startedleaf tracepoint 2018-05-15 3:46 ` Paul E. McKenney @ 2018-05-15 23:04 ` Joel Fernandes 2018-05-16 15:48 ` Paul E. McKenney 0 siblings, 1 reply; 42+ messages in thread From: Joel Fernandes @ 2018-05-15 23:04 UTC (permalink / raw) To: Paul E. McKenney Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Mon, May 14, 2018 at 08:46:03PM -0700, Paul E. McKenney wrote: > On Mon, May 14, 2018 at 05:57:09PM -0700, Joel Fernandes wrote: > > On Mon, May 14, 2018 at 11:38:23AM -0700, Paul E. McKenney wrote: > > > On Sun, May 13, 2018 at 08:15:39PM -0700, Joel Fernandes (Google) wrote: > > > > In recent discussion [1], the check for whether a leaf believes RCU is > > > > not idle, is being added back to funnel locking code, to avoid more > > > > locking. In this we are marking the leaf node for a future grace-period > > > > and bailing out since a GP is currently in progress. However the > > > > tracepoint is missing. Lets add it back. > > > > > > > > Also add a small comment about why we do this check (basically the point > > > > is to avoid locking intermediate nodes unnecessarily) and clarify the > > > > comments in the trace event header now that we are doing traversal of > > > > one or more intermediate nodes. > > > > > > > > [1] http://lkml.kernel.org/r/20180513190906.GL26088@linux.vnet.ibm.com > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > > > Looks like a good idea, but it does not apply -- which is not a surprise, > > > given the change rate in this code. I hand-applied as a modification > > > to c1b3f9fce26f ("rcu: Don't funnel-lock above leaf node if GP in progress") > > > with attribution, but with the changes below. Please let me know if I > > > am missing something. > > > > > > Ah, I see -- this commit depends on your earlier name-change commit. > > > I therefore made this patch use the old names. 
> > > > Ok, I'll check your new tree and rebase. > > Sounds good! > > > > > --- > > > > include/trace/events/rcu.h | 4 ++-- > > > > kernel/rcu/tree.c | 11 ++++++++++- > > > > 2 files changed, 12 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h > > > > index 539900a9f8c7..dc0bd11739c7 100644 > > > > --- a/include/trace/events/rcu.h > > > > +++ b/include/trace/events/rcu.h > > > > @@ -91,8 +91,8 @@ TRACE_EVENT(rcu_grace_period, > > > > * > > > > * "Startleaf": Request a grace period based on leaf-node data. > > > > * "Prestarted": Someone beat us to the request > > > > - * "Startedleaf": Leaf-node start proved sufficient. > > > > - * "Startedleafroot": Leaf-node start proved sufficient after checking root. > > > > + * "Startedleaf": Leaf and one or more non-root nodes marked for future start. > > > > > > Actually, we only get to that trace if all we did was mark the leaf > > > node, right? > > > > I didn't think so. In the code we are doing the check for rnp every time we > > walk up the tree. So even when we are on an intermediate node, we do the > > check of the node we started with. I thought that's what you wanted to do. It > > makes sense to me to do so too. > > If we are not on the initial (usually leaf) node, then the similar check > in the previous "if" statement would have sent us to unlock_out, right? > > (And yes, I should have said "mark the initial node" above.) I may have missed this, sorry. Yes, that would be true unless the check could be true not at the first iteration, but after the first iteration? (i.e. another path started the initially idle GP). That's why I changed it to "one or more non-root nodes marked". What do you think? thanks, - Joel ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH RFC 6/8] rcu: Add back the Startedleaf tracepoint 2018-05-15 23:04 ` Joel Fernandes @ 2018-05-16 15:48 ` Paul E. McKenney 2018-05-16 23:13 ` Joel Fernandes 0 siblings, 1 reply; 42+ messages in thread From: Paul E. McKenney @ 2018-05-16 15:48 UTC (permalink / raw) To: Joel Fernandes Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Tue, May 15, 2018 at 04:04:30PM -0700, Joel Fernandes wrote: > On Mon, May 14, 2018 at 08:46:03PM -0700, Paul E. McKenney wrote: > > On Mon, May 14, 2018 at 05:57:09PM -0700, Joel Fernandes wrote: > > > On Mon, May 14, 2018 at 11:38:23AM -0700, Paul E. McKenney wrote: > > > > On Sun, May 13, 2018 at 08:15:39PM -0700, Joel Fernandes (Google) wrote: > > > > > In recent discussion [1], the check for whether a leaf believes RCU is > > > > > not idle, is being added back to funnel locking code, to avoid more > > > > > locking. In this we are marking the leaf node for a future grace-period > > > > > and bailing out since a GP is currently in progress. However the > > > > > tracepoint is missing. Lets add it back. > > > > > > > > > > Also add a small comment about why we do this check (basically the point > > > > > is to avoid locking intermediate nodes unnecessarily) and clarify the > > > > > comments in the trace event header now that we are doing traversal of > > > > > one or more intermediate nodes. > > > > > > > > > > [1] http://lkml.kernel.org/r/20180513190906.GL26088@linux.vnet.ibm.com > > > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > > > > > Looks like a good idea, but it does not apply -- which is not a surprise, > > > > given the change rate in this code. I hand-applied as a modification > > > > to c1b3f9fce26f ("rcu: Don't funnel-lock above leaf node if GP in progress") > > > > with attribution, but with the changes below. Please let me know if I > > > > am missing something. 
> > > > > > > > Ah, I see -- this commit depends on your earlier name-change commit. > > > > I therefore made this patch use the old names. > > > > > > Ok, I'll check your new tree and rebase. > > > > Sounds good! > > > > > > > --- > > > > > include/trace/events/rcu.h | 4 ++-- > > > > > kernel/rcu/tree.c | 11 ++++++++++- > > > > > 2 files changed, 12 insertions(+), 3 deletions(-) > > > > > > > > > > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h > > > > > index 539900a9f8c7..dc0bd11739c7 100644 > > > > > --- a/include/trace/events/rcu.h > > > > > +++ b/include/trace/events/rcu.h > > > > > @@ -91,8 +91,8 @@ TRACE_EVENT(rcu_grace_period, > > > > > * > > > > > * "Startleaf": Request a grace period based on leaf-node data. > > > > > * "Prestarted": Someone beat us to the request > > > > > - * "Startedleaf": Leaf-node start proved sufficient. > > > > > - * "Startedleafroot": Leaf-node start proved sufficient after checking root. > > > > > + * "Startedleaf": Leaf and one or more non-root nodes marked for future start. > > > > > > > > Actually, we only get to that trace if all we did was mark the leaf > > > > node, right? > > > > > > I didn't think so. In the code we are doing the check for rnp every time we > > > walk up the tree. So even when we are on an intermediate node, we do the > > > check of the node we started with. I thought that's what you wanted to do. It > > > makes sense to me to do so too. > > > > If we are not on the initial (usually leaf) node, then the similar check > > in the previous "if" statement would have sent us to unlock_out, right? > > > > (And yes, I should have said "mark the initial node" above.) > > I may have missed this, sorry. > > Yes, that would be true unless the check could be true not at the firsti > iteration, but after the first iteration? (i.e. another path started the > initially idle GP). That's why I changed it to "one or more non-root nodes > marked". 
After the first iteration, the check after setting ->gp_seq_needed is dead code. If that check would have succeeded, the same check in the big "if" statement would have taken the early exit. Thanx, Paul > What do you think? > > thanks, > > - Joel > ^ permalink raw reply [flat|nested] 42+ messages in thread
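The dead-code observation above can be sketched in a minimal user-space model of the funnel walk. This is an illustrative reduction, not the actual kernel code (the real function is rcu_start_this_gp() in kernel/rcu/tree.c; the struct, helper, and trace strings here are hypothetical simplifications, with no locking and gp_seq's low bit standing in for the state bits):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, greatly simplified stand-in for struct rcu_node. */
struct node {
	unsigned long gp_seq;		/* low bit set: GP in progress */
	unsigned long gp_seq_needed;	/* future GP request mark */
	struct node *parent;
};

static bool gp_in_progress(const struct node *n)
{
	return n->gp_seq & 1;
}

/* Walk from the initial node toward the root; return the trace string. */
static const char *start_this_gp(struct node *start, unsigned long gp_seq_req)
{
	for (struct node *n = start; n; n = n->parent) {
		/*
		 * Early exit: the request was already recorded, or (for a
		 * non-initial node) a GP is in progress, meaning the levels
		 * above are already taken care of.
		 */
		if (n->gp_seq_needed >= gp_seq_req ||
		    (n != start && gp_in_progress(n)))
			return "Prestarted";
		n->gp_seq_needed = gp_seq_req;
		/*
		 * This check can only fire when n == start: on any later
		 * iteration the early exit above would have triggered
		 * first, which is the "dead code" point above.
		 */
		if (gp_in_progress(n))
			return "Startedleaf";
	}
	return "Startedroot";
}
```

With a GP in progress, the walk marks only the initial node before tracing "Startedleaf"; with RCU idle, it marks every level up to the root.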
* Re: [PATCH RFC 6/8] rcu: Add back the Startedleaf tracepoint 2018-05-16 15:48 ` Paul E. McKenney @ 2018-05-16 23:13 ` Joel Fernandes 0 siblings, 0 replies; 42+ messages in thread From: Joel Fernandes @ 2018-05-16 23:13 UTC (permalink / raw) To: Paul E. McKenney Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Wed, May 16, 2018 at 08:48:29AM -0700, Paul E. McKenney wrote: > On Tue, May 15, 2018 at 04:04:30PM -0700, Joel Fernandes wrote: > > On Mon, May 14, 2018 at 08:46:03PM -0700, Paul E. McKenney wrote: > > > On Mon, May 14, 2018 at 05:57:09PM -0700, Joel Fernandes wrote: > > > > On Mon, May 14, 2018 at 11:38:23AM -0700, Paul E. McKenney wrote: > > > > > On Sun, May 13, 2018 at 08:15:39PM -0700, Joel Fernandes (Google) wrote: > > > > > > In recent discussion [1], the check for whether a leaf believes RCU is > > > > > > not idle, is being added back to funnel locking code, to avoid more > > > > > > locking. In this we are marking the leaf node for a future grace-period > > > > > > and bailing out since a GP is currently in progress. However the > > > > > > tracepoint is missing. Lets add it back. > > > > > > > > > > > > Also add a small comment about why we do this check (basically the point > > > > > > is to avoid locking intermediate nodes unnecessarily) and clarify the > > > > > > comments in the trace event header now that we are doing traversal of > > > > > > one or more intermediate nodes. > > > > > > > > > > > > [1] http://lkml.kernel.org/r/20180513190906.GL26088@linux.vnet.ibm.com > > > > > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > > > > > > > Looks like a good idea, but it does not apply -- which is not a surprise, > > > > > given the change rate in this code. I hand-applied as a modification > > > > > to c1b3f9fce26f ("rcu: Don't funnel-lock above leaf node if GP in progress") > > > > > with attribution, but with the changes below. 
Please let me know if I > > > > > am missing something. > > > > > > > > > > Ah, I see -- this commit depends on your earlier name-change commit. > > > > > I therefore made this patch use the old names. > > > > > > > > Ok, I'll check your new tree and rebase. > > > > > > Sounds good! > > > > > > > > > --- > > > > > > include/trace/events/rcu.h | 4 ++-- > > > > > > kernel/rcu/tree.c | 11 ++++++++++- > > > > > > 2 files changed, 12 insertions(+), 3 deletions(-) > > > > > > > > > > > > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h > > > > > > index 539900a9f8c7..dc0bd11739c7 100644 > > > > > > --- a/include/trace/events/rcu.h > > > > > > +++ b/include/trace/events/rcu.h > > > > > > @@ -91,8 +91,8 @@ TRACE_EVENT(rcu_grace_period, > > > > > > * > > > > > > * "Startleaf": Request a grace period based on leaf-node data. > > > > > > * "Prestarted": Someone beat us to the request > > > > > > - * "Startedleaf": Leaf-node start proved sufficient. > > > > > > - * "Startedleafroot": Leaf-node start proved sufficient after checking root. > > > > > > + * "Startedleaf": Leaf and one or more non-root nodes marked for future start. > > > > > > > > > > Actually, we only get to that trace if all we did was mark the leaf > > > > > node, right? > > > > > > > > I didn't think so. In the code we are doing the check for rnp every time we > > > > walk up the tree. So even when we are on an intermediate node, we do the > > > > check of the node we started with. I thought that's what you wanted to do. It > > > > makes sense to me to do so too. > > > > > > If we are not on the initial (usually leaf) node, then the similar check > > > in the previous "if" statement would have sent us to unlock_out, right? > > > > > > (And yes, I should have said "mark the initial node" above.) > > > > I may have missed this, sorry. > > > > Yes, that would be true unless the check could be true not at the firsti > > iteration, but after the first iteration? (i.e. 
another path started the > > initially idle GP). That's why I changed it to "one or more non-root nodes > > marked". > > After the first iteration, the check after setting ->gp_seq_needed is > dead code. If that check would have succeeded, the same check in the > big "if" statement would have taken the early exit. Oh yes, of course!! I understand it now. thanks, - Joel ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH RFC 7/8] rcu: trace CleanupMore condition only if needed 2018-05-14 3:15 [PATCH RFC 0/8] rcu fixes, clean ups for rcu/dev Joel Fernandes (Google) ` (5 preceding siblings ...) 2018-05-14 3:15 ` [PATCH RFC 6/8] rcu: Add back the Startedleaf tracepoint Joel Fernandes (Google) @ 2018-05-14 3:15 ` Joel Fernandes (Google) 2018-05-14 19:20 ` Paul E. McKenney 2018-05-14 3:15 ` [PATCH RFC 8/8] rcu: Fix cpustart tracepoint gp_seq number Joel Fernandes (Google) 7 siblings, 1 reply; 42+ messages in thread From: Joel Fernandes (Google) @ 2018-05-14 3:15 UTC (permalink / raw) To: linux-kernel Cc: Joel Fernandes (Google), Paul E. McKenney, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team Currently the tree RCU clean up code records a CleanupMore trace event even if the GP was already in progress. This makes CleanupMore show up twice for no reason. Avoid it. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> --- kernel/rcu/tree.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 8401a253e7de..25c44328d071 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -2083,7 +2083,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp) rsp->gp_state = RCU_GP_IDLE; /* Check for GP requests since above loop. */ rdp = this_cpu_ptr(rsp->rda); - if (ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed)) { + if (!needgp && ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed)) { trace_rcu_this_gp(rnp, rdp, rnp->gp_seq_needed, TPS("CleanupMore")); needgp = true; -- 2.17.0.441.gb46fe60e1d-goog ^ permalink raw reply related [flat|nested] 42+ messages in thread
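For reference, the ULONG_CMP_LT() in the hunk above is the kernel's wraparound-safe unsigned comparison (defined in kernel/rcu/rcu.h); a minimal user-space sketch of its semantics, mirroring that macro:

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Wraparound-safe "a < b" for unsigned long sequence counters: a counts
 * as "less than" b when a trails b by at most half the counter space,
 * which stays correct even after the counter wraps past ULONG_MAX.
 */
static bool ulong_cmp_lt(unsigned long a, unsigned long b)
{
	return ULONG_MAX / 2 < a - b;
}
```

This is why gp_seq comparisons such as the one in rcu_gp_cleanup() remain correct across sequence-counter wrap.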
* Re: [PATCH RFC 7/8] rcu: trace CleanupMore condition only if needed 2018-05-14 3:15 ` [PATCH RFC 7/8] rcu: trace CleanupMore condition only if needed Joel Fernandes (Google) @ 2018-05-14 19:20 ` Paul E. McKenney 2018-05-15 1:01 ` Joel Fernandes 0 siblings, 1 reply; 42+ messages in thread From: Paul E. McKenney @ 2018-05-14 19:20 UTC (permalink / raw) To: Joel Fernandes (Google) Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Sun, May 13, 2018 at 08:15:40PM -0700, Joel Fernandes (Google) wrote: > Currently the tree RCU clean up code records a CleanupMore trace event > even if the GP was already in progress. This makes CleanupMore show up > twice for no reason. Avoid it. > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Good catch, and I applied this patch. I did rework the commit log a bit, so please look it over to make sure I didn't mess it up. Thanx, Paul ------------------------------------------------------------------------ commit 52c4e689efd975f5383895b1bc1b91bc90fdd372 Author: Joel Fernandes (Google) <joel@joelfernandes.org> Date: Sun May 13 20:15:40 2018 -0700 rcu: Produce last "CleanupMore" trace only if late-breaking request Currently the tree RCU clean-up code records a "CleanupMore" trace event in response to late-arriving grace-period requests even if the grace period was already requested. This makes "CleanupMore" show up an extra time (in addition to once for each rcu_node structure that was previously marked with the request) for no good reason. This commit therefore avoids emitting this trace message unless the only if the only request for this next grace period arrived during or after the cleanup scan of the rcu_node structures. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. 
McKenney <paulmck@linux.vnet.ibm.com> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 8063a0478870..de6447dd73de 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -2072,7 +2072,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp) rsp->gp_state = RCU_GP_IDLE; /* Check for GP requests since above loop. */ rdp = this_cpu_ptr(rsp->rda); - if (ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed)) { + if (!needgp && ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed)) { trace_rcu_this_gp(rnp, rdp, rnp->gp_seq_needed, TPS("CleanupMore")); needgp = true; ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH RFC 7/8] rcu: trace CleanupMore condition only if needed 2018-05-14 19:20 ` Paul E. McKenney @ 2018-05-15 1:01 ` Joel Fernandes 2018-05-15 3:47 ` Paul E. McKenney 0 siblings, 1 reply; 42+ messages in thread From: Joel Fernandes @ 2018-05-15 1:01 UTC (permalink / raw) To: Paul E. McKenney Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Mon, May 14, 2018 at 12:20:28PM -0700, Paul E. McKenney wrote: > On Sun, May 13, 2018 at 08:15:40PM -0700, Joel Fernandes (Google) wrote: > > Currently the tree RCU clean up code records a CleanupMore trace event > > even if the GP was already in progress. This makes CleanupMore show up > > twice for no reason. Avoid it. > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > Good catch, and I applied this patch. I did rework the commit log > a bit, so please look it over to make sure I didn't mess it up. > > Thanx, Paul > > ------------------------------------------------------------------------ > > commit 52c4e689efd975f5383895b1bc1b91bc90fdd372 > Author: Joel Fernandes (Google) <joel@joelfernandes.org> > Date: Sun May 13 20:15:40 2018 -0700 > > rcu: Produce last "CleanupMore" trace only if late-breaking request > > Currently the tree RCU clean-up code records a "CleanupMore" trace > event in response to late-arriving grace-period requests even if the > grace period was already requested. This makes "CleanupMore" show up an > extra time (in addition to once for each rcu_node structure that was > previously marked with the request) for no good reason. This commit > therefore avoids emitting this trace message unless the only if the only > request for this next grace period arrived during or after the cleanup > scan of the rcu_node structures. Yes, this is fine except "unless the only if the only" should be "unless the". thanks, - Joel > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > Signed-off-by: Paul E. 
McKenney <paulmck@linux.vnet.ibm.com> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 8063a0478870..de6447dd73de 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -2072,7 +2072,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp) > rsp->gp_state = RCU_GP_IDLE; > /* Check for GP requests since above loop. */ > rdp = this_cpu_ptr(rsp->rda); > - if (ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed)) { > + if (!needgp && ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed)) { > trace_rcu_this_gp(rnp, rdp, rnp->gp_seq_needed, > TPS("CleanupMore")); > needgp = true; > ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH RFC 7/8] rcu: trace CleanupMore condition only if needed 2018-05-15 1:01 ` Joel Fernandes @ 2018-05-15 3:47 ` Paul E. McKenney 0 siblings, 0 replies; 42+ messages in thread From: Paul E. McKenney @ 2018-05-15 3:47 UTC (permalink / raw) To: Joel Fernandes Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Mon, May 14, 2018 at 06:01:31PM -0700, Joel Fernandes wrote: > On Mon, May 14, 2018 at 12:20:28PM -0700, Paul E. McKenney wrote: > > On Sun, May 13, 2018 at 08:15:40PM -0700, Joel Fernandes (Google) wrote: > > > Currently the tree RCU clean up code records a CleanupMore trace event > > > even if the GP was already in progress. This makes CleanupMore show up > > > twice for no reason. Avoid it. > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > Good catch, and I applied this patch. I did rework the commit log > > a bit, so please look it over to make sure I didn't mess it up. > > > > Thanx, Paul > > > > ------------------------------------------------------------------------ > > > > commit 52c4e689efd975f5383895b1bc1b91bc90fdd372 > > Author: Joel Fernandes (Google) <joel@joelfernandes.org> > > Date: Sun May 13 20:15:40 2018 -0700 > > > > rcu: Produce last "CleanupMore" trace only if late-breaking request > > > > Currently the tree RCU clean-up code records a "CleanupMore" trace > > event in response to late-arriving grace-period requests even if the > > grace period was already requested. This makes "CleanupMore" show up an > > extra time (in addition to once for each rcu_node structure that was > > previously marked with the request) for no good reason. This commit > > therefore avoids emitting this trace message unless the only if the only > > request for this next grace period arrived during or after the cleanup > > scan of the rcu_node structures. > > Yes, this is fine except "unless the only if the only" should be "unless the". 
I did update this after sending, and I still have "the the". Will fix on next rebase. As my daughter recently reminded me, the Law of Conservation of Bugs. ;-) Thanx, Paul > thanks, > > - Joel > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > index 8063a0478870..de6447dd73de 100644 > > --- a/kernel/rcu/tree.c > > +++ b/kernel/rcu/tree.c > > @@ -2072,7 +2072,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp) > > rsp->gp_state = RCU_GP_IDLE; > > /* Check for GP requests since above loop. */ > > rdp = this_cpu_ptr(rsp->rda); > > - if (ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed)) { > > + if (!needgp && ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed)) { > > trace_rcu_this_gp(rnp, rdp, rnp->gp_seq_needed, > > TPS("CleanupMore")); > > needgp = true; > > > ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH RFC 8/8] rcu: Fix cpustart tracepoint gp_seq number 2018-05-14 3:15 [PATCH RFC 0/8] rcu fixes, clean ups for rcu/dev Joel Fernandes (Google) ` (6 preceding siblings ...) 2018-05-14 3:15 ` [PATCH RFC 7/8] rcu: trace CleanupMore condition only if needed Joel Fernandes (Google) @ 2018-05-14 3:15 ` Joel Fernandes (Google) 2018-05-14 20:33 ` Paul E. McKenney 7 siblings, 1 reply; 42+ messages in thread From: Joel Fernandes (Google) @ 2018-05-14 3:15 UTC (permalink / raw) To: linux-kernel Cc: Joel Fernandes (Google), Paul E. McKenney, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team cpustart shows a stale gp_seq. This is because rdp->gp_seq is updated only at the end of the __note_gp_changes function. For this reason, use rnp->gp_seq instead. I believe we can't update rdp->gp_seq too early so lets just use the gp_seq from rnp instead. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> --- kernel/rcu/tree.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 25c44328d071..58d2b68f8b98 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -1807,7 +1807,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp, * set up to detect a quiescent state, otherwise don't * go looking for one. */ - trace_rcu_grace_period(rsp->name, rdp->gp_seq, TPS("cpustart")); + trace_rcu_grace_period(rsp->name, rnp->gp_seq, TPS("cpustart")); need_gp = !!(rnp->qsmask & rdp->grpmask); rdp->cpu_no_qs.b.norm = need_gp; rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_dynticks.rcu_qs_ctr); -- 2.17.0.441.gb46fe60e1d-goog ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH RFC 8/8] rcu: Fix cpustart tracepoint gp_seq number 2018-05-14 3:15 ` [PATCH RFC 8/8] rcu: Fix cpustart tracepoint gp_seq number Joel Fernandes (Google) @ 2018-05-14 20:33 ` Paul E. McKenney 2018-05-15 1:02 ` Joel Fernandes 0 siblings, 1 reply; 42+ messages in thread From: Paul E. McKenney @ 2018-05-14 20:33 UTC (permalink / raw) To: Joel Fernandes (Google) Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Sun, May 13, 2018 at 08:15:41PM -0700, Joel Fernandes (Google) wrote: > cpustart shows a stale gp_seq. This is because rdp->gp_seq is updated > only at the end of the __note_gp_changes function. For this reason, use > rnp->gp_seq instead. I believe we can't update rdp->gp_seq too early so > lets just use the gp_seq from rnp instead. > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> I took this one with the usual commit-log update. Please take a look! Thanx, Paul ------------------------------------------------------------------------ commit 3e14c6a5225d201776edd9a96d9c9b4435855446 Author: Joel Fernandes (Google) <joel@joelfernandes.org> Date: Sun May 13 20:15:41 2018 -0700 rcu: Fix cpustart tracepoint gp_seq number The "cpustart" trace event shows a stale gp_seq. This is because it uses rdp->gp_seq, which is updated only at the end of the __note_gp_changes() function. This commit therefore instead uses rnp->gp_seq. An alternative fix would be to update rdp->gp_seq earlier, but this would break RCU's detection of the beginning of a new-to-this-CPU grace period. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. 
McKenney <paulmck@linux.vnet.ibm.com> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index de6447dd73de..23b855f5c5cb 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -1796,7 +1796,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp, * set up to detect a quiescent state, otherwise don't * go looking for one. */ - trace_rcu_grace_period(rsp->name, rdp->gp_seq, TPS("cpustart")); + trace_rcu_grace_period(rsp->name, rnp->gp_seq, TPS("cpustart")); need_gp = !!(rnp->qsmask & rdp->grpmask); rdp->cpu_no_qs.b.norm = need_gp; rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_dynticks.rcu_qs_ctr); ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH RFC 8/8] rcu: Fix cpustart tracepoint gp_seq number 2018-05-14 20:33 ` Paul E. McKenney @ 2018-05-15 1:02 ` Joel Fernandes 0 siblings, 0 replies; 42+ messages in thread From: Joel Fernandes @ 2018-05-15 1:02 UTC (permalink / raw) To: Paul E. McKenney Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team On Mon, May 14, 2018 at 01:33:20PM -0700, Paul E. McKenney wrote: > On Sun, May 13, 2018 at 08:15:41PM -0700, Joel Fernandes (Google) wrote: > > cpustart shows a stale gp_seq. This is because rdp->gp_seq is updated > > only at the end of the __note_gp_changes function. For this reason, use > > rnp->gp_seq instead. I believe we can't update rdp->gp_seq too early so > > lets just use the gp_seq from rnp instead. > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > I took this one with the usual commit-log update. Please take a look! Looks good, thanks! - Joel > > Thanx, Paul > > ------------------------------------------------------------------------ > > commit 3e14c6a5225d201776edd9a96d9c9b4435855446 > Author: Joel Fernandes (Google) <joel@joelfernandes.org> > Date: Sun May 13 20:15:41 2018 -0700 > > rcu: Fix cpustart tracepoint gp_seq number > > The "cpustart" trace event shows a stale gp_seq. This is because it uses > rdp->gp_seq, which is updated only at the end of the __note_gp_changes() > function. This commit therefore instead uses rnp->gp_seq. > > An alternative fix would be to update rdp->gp_seq earlier, but this would > break RCU's detection of the beginning of a new-to-this-CPU grace period. > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > Signed-off-by: Paul E. 
McKenney <paulmck@linux.vnet.ibm.com> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index de6447dd73de..23b855f5c5cb 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -1796,7 +1796,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp, > * set up to detect a quiescent state, otherwise don't > * go looking for one. > */ > - trace_rcu_grace_period(rsp->name, rdp->gp_seq, TPS("cpustart")); > + trace_rcu_grace_period(rsp->name, rnp->gp_seq, TPS("cpustart")); > need_gp = !!(rnp->qsmask & rdp->grpmask); > rdp->cpu_no_qs.b.norm = need_gp; > rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_dynticks.rcu_qs_ctr); > ^ permalink raw reply [flat|nested] 42+ messages in thread