Re: [PATCH] sched: Fix adverse effects of NFS client on interactive response

From: Peter Williams <pwil3058@bigpond.net.au>
To: Helge Hafting <helgehaf@aitel.hist.no>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>,
	Ingo Molnar <mingo@elte.hu>, Con Kolivas <kernel@kolivas.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] sched: Fix adverse effects of NFS client on	interactive response
Date: Thu, 05 Jan 2006 08:51:15 +1100	[thread overview]
Message-ID: <43BC4353.3090704@bigpond.net.au> (raw)
In-Reply-To: <43BB2414.6060400@bigpond.net.au>

Peter Williams wrote:
> Peter Williams wrote:
> 
>> Helge Hafting wrote:
>>
>>> On Wed, Dec 21, 2005 at 05:32:52PM +1100, Peter Williams wrote:
>>>
>>>> Trond Myklebust wrote:
>>>
>>>
>>>
>>> [...]
>>>
>>>>> Sorry. That theory is just plain wrong. ALL of those case _ARE_
>>>>> interactive sleeps.
>>>>
>>>>
>>>>
>>>> It's not a theory.  It's a result of observing a -j 16 build with 
>>>> the sources on an NFS mounted file system with top with and without 
>>>> the patches and comparing that with the same builds with the sources 
>>>> on a local file system.  Without the patches the tasks in the kernel 
>>>> build all get the same dynamic priority as the X server and other 
>>>> interactive programs when the sources are on an NFS mounted file 
>>>> system.  With the patches they generally have dynamic priorities 
>>>> between 6 to 10 higher than the X server and other interactive 
>>>> programs.
>>>>
>>>
>>> A process waiting for NFS data looses cpu time, which is spent on 
>>> running something else.  Therefore, it gains some priority so it 
>>> won't be
>>> forever behind when it wakes up.  Same as for any other io waiting.
>>
>>
>>
>> That's more or less independent of this issue as the distribution of 
>> CPU to tasks is largely determined by the time slice mechanism and the 
>> dynamic priority is primarily about latency.  (This distinction is a 
>> little distorted by the fact that, under some circumstances, 
>> "interactive" tasks don't get moved to the expired list at the end of 
>> their time slice but this usually won't matter as genuine interactive 
>> tasks aren't generally CPU hogs.)  In other words, the issue that you 
>> raised is largely solved by the time tasks spend on the active queue 
>> before moving to the expired queue rather than the order in which they 
>> run when on the active queue.
>>
>> This problem is all about those tasks getting an inappropriate boost 
>> to improve their latency because they are mistakenly believed to be 
>> interactive.
> 
> 
> One of the unfortunate side effects of this is that it can effect 
> scheduler fairness because if these tasks get sufficient bonus points 
> the TASK_INTERACTIVE() macro will return true for them and they will be 
> rescheduled on the active queue instead of the expired queue at the end 
> of the time slice (provided EXPIRED_STARVING()) doesn't prevent this). 
> This will have an adverse effect on scheduling fairness.

I should have added here that if EXPIRED_STARVING() stops these tasks 
from being requeued on the active queue at the end of their time slice 
then it will also stop genuine interactive tasks from being requeued on 
the active queue with bad effects for interactive responsiveness.

> 
> The ideal design of the scheduler would be for the fairness mechanism 
> and the interactive responsiveness mechanism to be independent but this 
> is not the case due to the fact that requeueing interactive tasks on the 
> expired array could add unacceptably to their latency.  As I said above 
> this slight divergence from the ideal of perfect independence shouldn't 
> matter as genuine interactive processes aren't very CPU intensive.
> 
> In summary, inappropriate identification of CPU intensive tasks as 
> interactive has two bad effects: 1) responsiveness problems for genuine 
> interactive tasks due to the extra competition at their dynamic priority 
> and 2) a degradation of scheduling fairness; not just one.
> 
> For an example of the effect of inappropriate identification of CPU hogs 
> as interactive tasks see the thread "[SCHED] Totally WRONG priority 
> calculation with specific test-case (since 2.6.10-bk12)" in this list.
> 
>>  Having had a closer think about the way the scheduler works I'm now 
>> of the opinion that completely ignoring sleeps labelled as 
>> TASK_NONINTERACTIVE may be a mistake and that it might be more 
>> appropriate to treat them the same as TASK_UNITERRUPTIBLE but I'll bow 
>> to Ingo on this as he would have a better understanding of the issues 
>> involved.

I've changed my mind again on this and now think that, rather than 
treating TASK_NONINTERACTIVE sleeps the way TASK_UNINTERRUPTIBLE sleeps 
are currently treated, TASK_UNINTERRUPTIBLE sleeps should be ignored 
just like TASK_NONINTERACTIVE sleeps currently are.

>>
>>>
>>> Perhaps expecting a 16-way parallel make to have "no impact" is
>>> a bit optimistic.  How about nicing the make, explicitly telling
>>> linux that it isn't important?
>>
>>
>>
>> Yes, but that shouldn't be necessary.  If I do the same build on a 
>> local file system everything works OK and the tasks in the build have 
>> dynamic priorities 8 to 10 slots higher than the X server and other 
>> interactive programs.
> 
> 
> Further analysis indicates that this is not a complete solution as the 
> tasks would still be identified as interactive and given a bonus. 
> Although the change of nice value would be sufficient to stop these 
> tasks competing with the genuine interactive tasks, they would probably 
> still get a positive return value from TASK_INTERACTIVE() (as it's 
> effectively based on the bonus acquired i.e. difference between prio and 
> static_prio) and hence preferential treatment at the end of their time 
> slice with a consequent degradation of scheduling fairness.
> 
>>
>>>  Or how about giving important
>>> tasks extra priority?
>>
>>
>>
>> Only root can do that.  But some operating systems do just that e.g. 
>> Solaris has an IA scheduling class (which all X based programs are run 
>> in) that takes precedence over programs in the TS class (which is the 
>> equivalent of Linus's SCHED_NORMAL).  I'm not sure how they handle the 
>> privileges issues related to stopping inappropriate programs misusing 
>> the IA class.  IA is really just TS with a boost which is effectively 
>> just the reverse implementation of what the new SCHED_BATCH achieves. 
>> Arguably, SCHED_BATCH is the superior way of doing this as it doesn't 
>> cause any privilege issues as shifting to SCHED_BATCH can be done by 
>> the owner of the task.
>>
>> The main drawback to the SCHED_BATCH approach is that it (currently) 
>> requires the user to explicitly set it on the relevant tasks.  It's 
>> long term success would be greatly enhanced if programmers could be 
>> convinced to have their programs switch themselves to SCHED_BATCH 
>> unless they are genuine interactive processes.

I think that some of the harder to understand parts of the scheduler 
code are actually attempts to overcome the undesirable effects (such as 
those I've described) of inappropriately identifying tasks as 
interactive.  I think that it would have been better to attempt to fix 
the inappropriate identifications rather than their effects and I think 
the prudent use of TASK_NONINTERACTIVE is an important tool for 
achieving this.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce