On Thu, 23 Jun 2011, Tejun Heo wrote: > Hello, Ingo. > > On Thu, Jun 23, 2011 at 12:44 PM, Ingo Molnar wrote: > > > > * Tejun Heo wrote: > > > >> The patch description is simply untrue.  It does affect how wq > >> behaves under heavy CPU load.  The effect might be perfectly okay > >> but more likely it will result in subtle suboptimal behaviors under > >> certain load situations which would be difficult to characterize > >> and track down.  Again, the trade off (mostly killing of > >> ttwu_local) could be okay but you can't get away with just claiming > >> "there's no harm". Fair enough. I'm happy to reword this. > > Well, either it can be measured or not. If you can suggest a specific > > testing method to Thomas, please do. > > Crafting a test case where the suggested change results in worse > behavior isn't difficult (it ends up waking/creating workers which it > doesn't have to between ttwu and actual execution); however, as with > any micro benchmark, the problem is with assessing whether and how > much it would matter in actual workloads (whatever that means) and > workqueue is used throughout the kernel with widely varying patterns > making drawing conclusion a bit tricky. Given that, changing the > behavior for the worse just for this cleanup doesn't sound like too > sweet a deal. Is there any other reason to change the behavior > (latency, whatever) other than the ttuw_local ugliness? ttwu_local should have never been there in the first place. And yes, it's a latency and a correctness issue. Workqueues are not part of the scheduler, period. I'm fine with a call in pre/post schedule, so it's a separate code path and not convoluted into the guts of the scheduler code. The whole design is completely plugged into the scheduler, as you _abuse_ rq->lock to serialize the accounting and allow the unlocked access to the idle_list. It does not matter at all whether that code path is fast or not, it simply does not belong there. We could do a lot of other fancy accounting tricks with hooks inside the scheduler. If we tolerate workqueues hackery in there how do we rule out other crap which wants to have hooks plugged into that? There are issues which can't be solved other than from the scheduler core, e.g. the VCPU switch, but the workqueue overload accounting is definitely not in that category. The only reason why you want this precise accounting thing is a corner case of system overload, but not the common case. In case of system overload you can do extra work to figure out how many of these beasts are on the fly and whether it makes sense to start some more, simply because it does not matter in a overload situation whether you run through a list to gather information. Can we please stop adding stuff to the most crucial code pathes in the kernel just to avoid thinking a bit harder about problems? Thanks, tglx