On Wed, Nov 18 2020, Hillf Danton wrote:

> On Wed, 18 Nov 2020 16:11:44 +1100 NeilBrown wrote:
>> On Wed, Nov 18 2020, Hillf Danton wrote:
>> ...
>> I don't think this is a good idea.
>
> Let me add a few more words.
>
>> cond_resched() is expected to be called often.  Adding all this extra
>
> They are those only invoked in concurrency-managed worker contexts and
> are thus supposed to be less often than thought; what is more the callers
> know what they are doing if a schedule() follows up, needless to say it
> is an ant-antenna-size add-in to check WORKER_CPU_INTENSIVE given
> 	WARN_ON_ONCE(workqueue_mustnt_use_cpu())
> added in cond_resched().

"supposed to be less often" is the central point here.
In fact they sometimes happen with high frequency, despite what is
"supposed" to happen.
Either the assumption that CM-workers don't call cond_resched() is
wrong, or the code that schedules such workers on CM-queues is wrong.

I much prefer the perspective that the assumption is wrong.  If that is
agreed then we need to handle that circumstance without making
cond_resched() more expensive.
Note that adding WARN_ON_ONCE() does not make it more expensive, as it
is only enabled with DEBUG_KERNEL (and WQ_WATCHDOG, though the particular
config option could be changed). It isn't needed in production.
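
To make the cost argument concrete, here is a rough userspace sketch of
the pattern, not kernel code.  Every *_model name, the WQ_WATCHDOG_MODEL
macro and the two counters are made up for illustration; the real
helper and config option may look quite different.  The idea is only
that the helper collapses to a constant "false" when the debug option
is off, so the check in the hot path can be optimised away.

```c
#include <stdbool.h>

/* Counters so the behaviour can be observed; illustrative only. */
static int warnings_emitted;
static int resched_calls;

#ifdef WQ_WATCHDOG_MODEL	/* hypothetical stand-in for the debug option */
/* Debug build: report whether the caller is a CM-worker. */
static bool current_is_cm_worker;
static bool workqueue_mustnt_use_cpu_model(void)
{
	return current_is_cm_worker;
}
#else
/* Production build: constant false, so the check below folds away. */
static inline bool workqueue_mustnt_use_cpu_model(void)
{
	return false;
}
#endif

/* Model of a warn-once check: fires at most one time. */
static void warn_on_once_model(bool cond)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		warnings_emitted++;
	}
}

/* Model of the hot path: the check first, then the usual work. */
static void cond_resched_model(void)
{
	warn_on_once_model(workqueue_mustnt_use_cpu_model());
	resched_calls++;
}
```

Built without -DWQ_WATCHDOG_MODEL, calling cond_resched_model() any
number of times leaves warnings_emitted at zero.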

If the workqueue maintainers are immovable in the position that a
CM-workitem must never use excessive CPU, and so must not call
cond_resched(), then I can take that back to the NFS maintainers and
negotiate different workqueue settings.  But as I've said, I think this
forces the decision to be made in a place that is not well positioned
to make it.

Thanks,
NeilBrown