On Thu, Dec 15, 2016 at 11:43:52AM +0000, Mark Rutland wrote:
> On Thu, Dec 15, 2016 at 10:42:00AM +0800, Boqun Feng wrote:
> > There are some places inside the RCU core where we need to iterate over
> > all mask bits (->qsmask, ->expmask, etc.) in a leaf node, in order to
> > iterate over all the corresponding CPUs. The current code iterates over
> > all possible CPUs in the leaf node and then checks the mask to see
> > whether each CPU's bit is set.
> > 
> > However, given that most bits in cpu_possible_mask are set but only a
> > few bits in an RCU leaf node mask are set (in other words, ->qsmask and
> > its friends are usually sparser than cpu_possible_mask), it's better to
> > iterate the other way round, that is, over the mask bits in a leaf node.
> > By doing so, we can save several checks in the loop; moreover, the
> > fast-path check (e.g. ->qsmask == 0) can then be consolidated into the
> > loop logic.
> > 
> > This patch introduces for_each_leaf_node_cpu() to iterate over mask bits
> > in a more efficient way.
> > 
> > By design, the CPUs whose bits are set in the leaf node masks should be
> > a subset of the possible CPUs, so we don't need an extra cpu_possible()
> > check; however, a WARN_ON_ONCE() is put in the loop to catch any nasty
> > cases we miss.
> > 
> > Signed-off-by: Boqun Feng
> > ---
> >  kernel/rcu/tree.h | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> > 
> > diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> > index c0a4bf8f1ed0..70ef44a082e0 100644
> > --- a/kernel/rcu/tree.h
> > +++ b/kernel/rcu/tree.h
> > @@ -295,6 +295,22 @@ struct rcu_node {
> >  	     cpu <= rnp->grphi; \
> >  	     cpu = cpumask_next((cpu), cpu_possible_mask))
> >  
> > +
> > +#define MASK_BITS(mask)	(BITS_PER_BYTE * sizeof(mask))
> > +/*
> > + * Iterate over all CPUs a leaf RCU node which are still masked in
> > + * @mask.
> > + *
> > + * Note @rnp has to be a leaf node and @mask has to belong to @rnp.
> 
> Not a big deal, but perhaps it's worth enforcing this?
> If we took just the name of the mask here (e.g. qsmask rather than
> rnp->qsmask), we could have the macro always use (rnp)->mask. That
> would also make the invocations shorter.
> 

I thought about this approach, but there are some cases where it seems
inappropriate. See patch #5: passing "qsmaskinitnext" directly to
for_each_leaf_node_cpu() might be OK, but it would break another
abstraction layer, the one rcu_rnp_online_cpus() provides.

> > And we
> > + * assume that no CPU is masked in @mask but not set in cpu_possible_mask. IOW,
> > + * masks of a leaf node never set a bit for an "impossible" CPU.
> > + */
> > +#define for_each_leaf_node_cpu(rnp, mask, cpu) \
> > +	for ((cpu) = (rnp)->grplo + find_first_bit(&(mask), MASK_BITS(mask)); \
> > +	     (cpu) <= (rnp)->grphi && !WARN_ON_ONCE(!cpu_possible(cpu)); \
> 
> If this happens, we'll exit the loop. If there are any remaining
> possible CPUs, we'll skip them, which would be less than ideal.
> 
> I guess this shouldn't happen anyway, but it might be worth continuing.
> 

I chose to break out of the loop on an impossible CPU only because I
wanted to avoid using the "if (...) else" trick in an iteration macro ;-)
I don't know whether this would be the first time something like this is
brought into the kernel, so I'm kind of hesitant to bring it in. But it
seems I've got you as a supporter ;-) Certainly, skipping is better than
stopping.

> > +	     (cpu) = (rnp)->grplo + find_next_bit(&(mask), MASK_BITS(mask), \
> > +					    (cpu) - (rnp)->grplo + 1))
> > +
> 
> I was going to ask if that + 1 was correct, but I see that it is!
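A minimal user-space sketch of why the "+ 1" is right, assuming a leaf mask fits in one unsigned long: bit (cpu - grplo) of the mask stands for cpu, so resuming the search at (cpu - grplo) + 1 starts just past the bit already visited. find_first_bit_ul(), find_next_bit_ul(), struct leaf_node and the two collect_*() helpers are hypothetical stand-ins, not the kernel API; CHAR_BIT replaces BITS_PER_BYTE, and the cpu_possible() check is left out. The "old way" helper walks grplo..grphi directly instead of cpu_possible_mask, as a simplification.

```c
#include <assert.h>
#include <limits.h>

/* User space: CHAR_BIT stands in for the kernel's BITS_PER_BYTE. */
#define MASK_BITS(mask) (CHAR_BIT * sizeof(mask))

/*
 * Hypothetical single-word stand-ins for find_next_bit()/find_first_bit():
 * return the index of the first set bit at or after @offset, or @size if
 * there is none (the "not found" convention of the kernel helpers).
 */
static unsigned long find_next_bit_ul(const unsigned long *addr,
				      unsigned long size, unsigned long offset)
{
	for (; offset < size; offset++)
		if ((*addr >> offset) & 1UL)
			return offset;
	return size;
}

static unsigned long find_first_bit_ul(const unsigned long *addr,
				       unsigned long size)
{
	return find_next_bit_ul(addr, size, 0);
}

/* Hypothetical leaf node: CPU (grplo + n) is represented by bit n. */
struct leaf_node {
	int grplo, grphi;
	unsigned long qsmask;
};

/* Same shape as the patch's macro, minus the cpu_possible() check. */
#define for_each_leaf_node_cpu(rnp, mask, cpu)				\
	for ((cpu) = (rnp)->grplo + find_first_bit_ul(&(mask), MASK_BITS(mask)); \
	     (cpu) <= (rnp)->grphi;						\
	     (cpu) = (rnp)->grplo + find_next_bit_ul(&(mask), MASK_BITS(mask), \
						     (cpu) - (rnp)->grplo + 1))

/* New way: the loop body runs only for CPUs whose bits are set. */
static int collect_by_bits(const struct leaf_node *rnp, int *out)
{
	int cpu, n = 0;

	for_each_leaf_node_cpu(rnp, rnp->qsmask, cpu)
		out[n++] = cpu;
	return n;
}

/* Old way (simplified): test every CPU in grplo..grphi against the mask. */
static int collect_by_range(const struct leaf_node *rnp, int *out)
{
	int cpu, n = 0;

	for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++)
		if ((rnp->qsmask >> (cpu - rnp->grplo)) & 1UL)
			out[n++] = cpu;
	return n;
}
```

With grplo = 16, grphi = 31 and bits 0, 5 and 15 set, both helpers report CPUs 16, 21 and 31, but the loop body of collect_by_bits() runs three times instead of testing all 16 CPUs of the leaf. The naive find_next_bit_ul() above still scans bit by bit; the kernel's find_next_bit() works a word at a time.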
> 
> So FWIW:
> 
> Acked-by: Mark Rutland
> 

Thanks ;-)

> 
> I had a go at handling my comments above, but I'm not sure it's any
> better:
> 
> #define cpu_to_grp(rnp, cpu) ((cpu) - (rnp)->grplo)
> 
> #define grp_to_cpu(rnp, cpu) ((cpu) + (rnp)->grplo)
> 
> #define node_first_cpu(rnp, mask) \
> 	grp_to_cpu(rnp, find_first_bit(&(rnp)->mask, MASK_BITS((rnp)->mask)))
> 
> #define node_next_cpu(rnp, mask, cpu) \
> 	grp_to_cpu(rnp, find_next_bit(&(rnp)->mask, MASK_BITS((rnp)->mask), \
> 		   cpu_to_grp(rnp, cpu) + 1))
> 

I tried something similar, but it seemed to bring in too many
abstraction layers just for one macro. I basically follow the rule: if
there are fewer than three potential users, there's no need for an
abstraction ;-) But thank you for looking into this ;-)

Regards,
Boqun

> #define for_each_leaf_node_cpu(rnp, mask, cpu) \
> 	for ((cpu) = node_first_cpu(rnp, mask); \
> 	     (cpu) <= (rnp)->grphi; \
> 	     (cpu) = node_next_cpu(rnp, mask, cpu)) \
> 		if (WARN_ON_ONCE(!cpu_possible(cpu))) \
> 			continue; \
> 		else
> 
> Thanks,
> Mark.
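For comparison, here is a user-space sketch of the skip-instead-of-stop variant built on Mark's helper macros, with rnp passed to grp_to_cpu() in node_first_cpu(). Everything other than the macro shapes is a hypothetical stand-in: cpu_possible_stub(), warn_on_once_stub(), possible_mask, find_first_bit_ul(), find_next_bit_ul() and struct leaf_node are not kernel APIs. In particular, warn_on_once_stub() reports only once per program, whereas WARN_ON_ONCE() reports once per call site.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

#define MASK_BITS(mask) (CHAR_BIT * sizeof(mask))

/* Hypothetical stand-ins for kernel state and helpers (sketch only). */
static unsigned long possible_mask;	/* bit n set => CPU n is "possible" */
static int warn_hits;			/* times the warning condition fired */

static int cpu_possible_stub(int cpu)
{
	return cpu >= 0 && cpu < (int)MASK_BITS(possible_mask) &&
	       ((possible_mask >> cpu) & 1UL);
}

/* Like WARN_ON_ONCE(): returns @cond, but prints only on the first hit. */
static int warn_on_once_stub(int cond)
{
	if (cond && warn_hits++ == 0)
		fprintf(stderr, "WARNING: impossible CPU in leaf mask\n");
	return cond;
}

/* Single-word stand-ins: return the next set bit, or @size if none. */
static unsigned long find_next_bit_ul(const unsigned long *addr,
				      unsigned long size, unsigned long offset)
{
	for (; offset < size; offset++)
		if ((*addr >> offset) & 1UL)
			return offset;
	return size;
}

static unsigned long find_first_bit_ul(const unsigned long *addr,
				       unsigned long size)
{
	return find_next_bit_ul(addr, size, 0);
}

struct leaf_node {
	int grplo, grphi;
	unsigned long qsmask;
};

/* Mark's helpers; @mask is a field name, so the macros use (rnp)->mask. */
#define cpu_to_grp(rnp, cpu) ((cpu) - (rnp)->grplo)
#define grp_to_cpu(rnp, cpu) ((cpu) + (rnp)->grplo)
#define node_first_cpu(rnp, mask) \
	grp_to_cpu(rnp, find_first_bit_ul(&(rnp)->mask, MASK_BITS((rnp)->mask)))
#define node_next_cpu(rnp, mask, cpu) \
	grp_to_cpu(rnp, find_next_bit_ul(&(rnp)->mask, MASK_BITS((rnp)->mask), \
					 cpu_to_grp(rnp, cpu) + 1))

/*
 * The caller's loop body becomes the "else" branch, so "continue" jumps
 * to node_next_cpu() and skips only the impossible CPU.
 */
#define for_each_leaf_node_cpu(rnp, mask, cpu)			\
	for ((cpu) = node_first_cpu(rnp, mask);			\
	     (cpu) <= (rnp)->grphi;				\
	     (cpu) = node_next_cpu(rnp, mask, cpu))		\
		if (warn_on_once_stub(!cpu_possible_stub(cpu)))	\
			continue;				\
		else

static int collect(const struct leaf_node *rnp, int *out)
{
	int cpu, n = 0;

	for_each_leaf_node_cpu(rnp, qsmask, cpu)
		out[n++] = cpu;
	return n;
}
```

With qsmask = 0xB5 (bits 0, 2, 4, 5 and 7) on a leaf covering CPUs 0..7 and CPU 4 marked impossible, collect() warns once, skips CPU 4 and still reports CPUs 0, 2, 5 and 7. With the patch's original loop condition, (cpu) <= (rnp)->grphi && !WARN_ON_ONCE(!cpu_possible(cpu)), the same walk would end at CPU 4 and report only CPUs 0 and 2.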