On Thu, Nov 26 2020, Hillf Danton wrote: > On Fri, 20 Nov 2020 15:33:27 +1100 NeilBrown wrote: >> >>My first idea was to add WQ_CPU_INTENSIVE to the nfsiod workqueue, but >>Trond wondered what was special about NFS. Many filesystems call iput >>from a workqueue, so finding a solution that helps them all is best. > > In terms of iput, I think we can splice WQ_CPU_INTENSIVE to > WQ_MEM_RECLAIM. I'm actually starting to think that WQ_CPU_INTENSIVE is a mistake. If you really have cpu-intensive work, you should be using WQ_UNBOUND. It is possible that there might be work that is CPU intensive and which must be run on a particular CPU - such as clearing out per-cpu lists of recently freed slab allocations. But I don't WQ_CPU_INTENSIVE is currently used that way. I cannot find *any* users of WQ_CPU_INTENSIVE which call cond_resched() in the relevant work items. And if the code doesn't call cond_resched() (or similar), then it isn't really CPU-intensive. > >>I then suggested getting cond_resched() to do something more useful when >>called by a worker. PeterZ didn't like the overhead. >> >>Also, TJ seemed to be against auto-adjusting for cpu-intensive code, >>preferring the right sort of workqueue to be chosen up front. > > Actually WQ_EVENTS_LONG sounds better than WQ_CPU_INTENSIVE, given that > we have two events WQs with the same attr. There is no WQ_EVENTS_LONG > > system_wq = alloc_workqueue("events", 0, 0); > system_long_wq = alloc_workqueue("events_long", 0, 0); > > Then what are the boundaries we can draw in between WQ_MEM_RECLAIM, > WQ_CPU_INTENSIVE and WQ_EVENTS_LONG? I think system_long_wq is a design flaw. Some code (mistakenly) schedules work on system_wq, calls flush_workqueue(system_wq)) and expects that to complete reasonably quickly. To ensure this can work, system_long_wq was created and work items that might take a long time are encouraged to be run there. Instead, the mistaken code should create its own work queue, schedule work on that, and flush that queue. Thanks, NeilBrown > > > --- a/kernel/workqueue.c > +++ b/kernel/workqueue.c > @@ -4261,6 +4261,9 @@ struct workqueue_struct *alloc_workqueue > if ((flags & WQ_POWER_EFFICIENT) && wq_power_efficient) > flags |= WQ_UNBOUND; > > + if (flags & (WQ_MEM_RECLAIM | WQ_UNBOUND) == WQ_MEM_RECLAIM) > + flags |= WQ_CPU_INTENSIVE; > + > /* allocate wq and format name */ > if (flags & WQ_UNBOUND) > tbl_size = nr_node_ids * sizeof(wq->numa_pwq_tbl[0]);