On Wed, Aug 06, 2014 at 05:09:59AM -0700, Paul E. McKenney wrote:
> > Or you could shoot all CPUs with resched_cpu() which would have them
> > cycle through schedule() even if there's nothing but the idle thread
> > to run. That guarantees they'll go to sleep again in a !trampoline.
>
> Good point, that would be an easier way to handle the idle threads than
> messing with rcu_tasks_kthread()'s affinity. Thank you!

One issue though: resched_cpu() doesn't wait for that to complete. We'd
need something that guarantees the remote CPU has actually gone through
schedule().

> > But I still very much hate the polling stuff...
> >
> > Can't we abuse the preempt notifiers? Say we make it possible to
> > install preemption notifiers cross-task, then task-rcu can install a
> > preempt-out notifier which completes the rcu-task wait.
> >
> > After all, since we tagged it, it was !running, and being scheduled
> > out means it ran (once) and therefore isn't on a trampoline anymore.
>
> Maybe I am being overly paranoid, but couldn't the task be preempted
> in a trampoline, be resumed, execute one instruction (still in the
> trampoline) and be preempted again?

Ah, what I failed to state is that we should also check the sleep
condition, so only 'voluntary' schedule() calls count.

Of course, if we made something specific to the trampoline thing rather
than 'task'-rcu, we could simply check whether the IP is inside a
trampoline or not.

> > And the tick, which checks to see if the task got to userspace, can
> > do the same: remove the notifier and then complete.
>
> My main concern with this sort of approach is that I have to deal
> with full-up concurrency (200 CPUs all completing tasks concurrently,
> for example), which would make for a much larger and more complex
> patch. Now, I do admit that it is quite possible that I will end up
> there anyway, for example, if more people start using RCU-tasks, but I
> see no need to hurry this process. ;-)

You mean cacheline contention on the struct completion? I'd first make
it simple and only fix it if/when it becomes a problem. 200 CPUs
contending on a single cacheline _once_ is annoying, but probably still
lots cheaper than polling the state of at least that many tasks.
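
To make the first point concrete, roughly what I mean by kicking all
CPUs is the below -- completely untested, rcu_tasks_kick_all_cpus() is
a made-up name, and the comment spells out the missing wait:

	#include <linux/cpu.h>
	#include <linux/cpumask.h>
	#include <linux/sched.h>

	/*
	 * Kick every CPU through schedule() so that even a CPU running
	 * only the idle task goes back to sleep somewhere that is not a
	 * trampoline.
	 */
	static void rcu_tasks_kick_all_cpus(void)
	{
		int cpu;

		get_online_cpus();
		for_each_online_cpu(cpu)
			resched_cpu(cpu);	/* TIF_NEED_RESCHED + IPI */
		put_online_cpus();

		/*
		 * The hole: resched_cpu() returns immediately, it does
		 * not wait for the remote CPU to actually pass through
		 * schedule(), so by itself this gives no completion
		 * guarantee.
		 */
	}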
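
And a very rough sketch of the notifier idea, not even compile tested.
The cross_task_preempt_notifier_*() calls don't exist today -- that is
exactly the cross-task bit we'd have to add -- and the unregister /
lifetime ordering is entirely hand-waved:

	#include <linux/completion.h>
	#include <linux/kernel.h>
	#include <linux/preempt.h>
	#include <linux/sched.h>

	/* Per-task state the grace-period waiter hangs off the watched task. */
	struct rcu_tasks_waiter {
		struct preempt_notifier notifier;
		struct completion done;
	};

	static void rcu_tasks_sched_in(struct preempt_notifier *pn, int cpu)
	{
		/* Only the sched-out side is interesting here. */
	}

	static void rcu_tasks_sched_out(struct preempt_notifier *pn,
					struct task_struct *next)
	{
		struct rcu_tasks_waiter *w =
			container_of(pn, struct rcu_tasks_waiter, notifier);

		/*
		 * The sleep condition: only a voluntary schedule() counts,
		 * a preemption can leave the IP sitting in a trampoline.
		 * The outgoing task is still 'current' at this point.
		 */
		if (current->state == TASK_RUNNING)
			return;

		/* Ignoring for now that this runs under the rq lock. */
		complete(&w->done);
	}

	static struct preempt_ops rcu_tasks_preempt_ops = {
		.sched_in	= rcu_tasks_sched_in,
		.sched_out	= rcu_tasks_sched_out,
	};

	/* Wait for @t to go through a voluntary context switch. */
	static void rcu_tasks_wait_for_task(struct task_struct *t)
	{
		struct rcu_tasks_waiter w;

		preempt_notifier_init(&w.notifier, &rcu_tasks_preempt_ops);
		init_completion(&w.done);

		/*
		 * Made up: preempt_notifier_register() today only attaches
		 * to current; this cross-task variant is the bit to add.
		 */
		cross_task_preempt_notifier_register(t, &w.notifier);

		wait_for_completion(&w.done);

		/* Also made up; see the ordering hand-waving above. */
		cross_task_preempt_notifier_unregister(t, &w.notifier);
	}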