linux-kernel.vger.kernel.org archive mirror
* process starvation with 2.6 scheduler
@ 2006-06-05 19:48 Kallol Biswas
  2006-06-05 23:49 ` (no subject) Hack Sung Lee
  2006-06-06  8:01 ` process starvation with 2.6 scheduler Mike Galbraith
  0 siblings, 2 replies; 16+ messages in thread
From: Kallol Biswas @ 2006-06-05 19:48 UTC (permalink / raw)
  To: linux-kernel

Hello,
We have a process starvation problem with our 2.6.11 kernel running on a PPC-440-based system.

We have a storage SoC based on the PPC-440. The SoC is emulated on Palladium, a system emulator from Cadence, and the emulated system runs at 400 kHz. It has three Ethernet ports, which are connected to the outside lab network through a speed bridge.

The netperf server, netserver, runs on the emulated system (2.6.11 kernel on Palladium). The netperf clients run on a Linux x86 box.

If netperf request/response (TCP_RR) traffic is run on all three ports, after some time only one port remains active; the netperf clients on the other two ports wait for a long time and eventually time out.

The netserver code has been instrumented. For one of the starved netserver processes, we found that the server has received the TCP_RR request from the netperf client on the x86 box and has called send() to return the reply, but send() never returns.

With an ICE connected to the Palladium emulator, I have dumped the kernel data structures of the starved process and the active process.


For Active Process:
  Time slice:        84
  Policy:            SCHED_NORMAL
  Dynamic priority:  118
  Static priority:   120
  Preempt_count:     0x20100
  Flags:             0
  State:             0 (TASK_RUNNING)

For Starved Process:
  Time slice:        77
  Policy:            SCHED_NORMAL
  Dynamic priority:  120
  Static priority:   120
  Preempt_count:     0x10000000 (PREEMPT_ACTIVE is set)
  Flags:             0
  State:             0 (TASK_RUNNING)

Any help debugging the problem is welcome.

Kallol

^ permalink raw reply	[flat|nested] 16+ messages in thread

* (no subject)
  2006-06-05 19:48 process starvation with 2.6 scheduler Kallol Biswas
@ 2006-06-05 23:49 ` Hack Sung Lee
  2006-06-06  8:01 ` process starvation with 2.6 scheduler Mike Galbraith
  1 sibling, 0 replies; 16+ messages in thread
From: Hack Sung Lee @ 2006-06-05 23:49 UTC (permalink / raw)
  To: linux-kernel

unsubscribe linux-kernel

* Re: process starvation with 2.6 scheduler
  2006-06-05 19:48 process starvation with 2.6 scheduler Kallol Biswas
  2006-06-05 23:49 ` (no subject) Hack Sung Lee
@ 2006-06-06  8:01 ` Mike Galbraith
  2006-06-06 16:55   ` Stephen Hemminger
  1 sibling, 1 reply; 16+ messages in thread
From: Mike Galbraith @ 2006-06-06  8:01 UTC (permalink / raw)
  To: Kallol Biswas; +Cc: linux-kernel

(please line wrap)

I'm having difficulty understanding.  Are you saying that the "starved"
tasks are runnable, but receiving _zero_ cpu?  That's impossible with
only one other SCHED_NORMAL task afaik, which makes me think you may
mean they're not receiving cpu frequently enough to keep clients from
timing out?  One task which has slept enough to acquire interactive
status (as above) can hold others off the cpu for quite a while if it
starts a burst of heavy cpu burning.  If your netperf clients are
choking on this latency, running the servers at nice 19 should prevent
the problem.

	-Mike


* Re: process starvation with 2.6 scheduler
  2006-06-06  8:01 ` process starvation with 2.6 scheduler Mike Galbraith
@ 2006-06-06 16:55   ` Stephen Hemminger
  0 siblings, 0 replies; 16+ messages in thread
From: Stephen Hemminger @ 2006-06-06 16:55 UTC (permalink / raw)
  To: linux-kernel

On Tue, 06 Jun 2006 10:01:58 +0200
Mike Galbraith <efault@gmx.de> wrote:

> (please line wrap)
> 
> I'm having difficulty understanding.  Are you saying that the "starved"
> tasks are runnable, but receiving _zero_ cpu?  That's impossible with
> only one other SCHED_NORMAL task afaik, which makes me think you may
> mean they're not receiving cpu frequently enough to keep clients from
> timing out?  One task which has slept enough to acquire interactive
> status (as above) can hold others off the cpu for quite a while if it
> starts a burst of heavy cpu burning.  If your netperf clients are
> choking on this latency, running the servers at nice 19 should prevent
> the problem.
> 


Is the processor getting consumed by network traffic in softirq?
If you are using a non-NAPI device driver, it is easy for softirq
processing to be overwhelmed with packets.

* RE: process starvation with 2.6 scheduler
  2006-06-14 17:56 Kallol Biswas
@ 2006-06-14 19:26 ` Mike Galbraith
  0 siblings, 0 replies; 16+ messages in thread
From: Mike Galbraith @ 2006-06-14 19:26 UTC (permalink / raw)
  To: Kallol Biswas; +Cc: Stephen Hemminger, linux-kernel, Radjendirane Codandaramane

On Wed, 2006-06-14 at 10:56 -0700, Kallol Biswas wrote:
> Yes, all 3 clients run on a Redhat 9 box.

Ok, what do the priorities and cpu distribution look like?

	-Mike


* RE: process starvation with 2.6 scheduler
@ 2006-06-14 17:56 Kallol Biswas
  2006-06-14 19:26 ` Mike Galbraith
  0 siblings, 1 reply; 16+ messages in thread
From: Kallol Biswas @ 2006-06-14 17:56 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Stephen Hemminger, linux-kernel, Radjendirane Codandaramane

Yes, all 3 clients run on a Red Hat 9 box.

* RE: process starvation with 2.6 scheduler
  2006-06-13 23:03 Kallol Biswas
@ 2006-06-14  5:13 ` Mike Galbraith
  0 siblings, 0 replies; 16+ messages in thread
From: Mike Galbraith @ 2006-06-14  5:13 UTC (permalink / raw)
  To: Kallol Biswas; +Cc: Stephen Hemminger, linux-kernel, Radjendirane Codandaramane

On Tue, 2006-06-13 at 16:03 -0700, Kallol Biswas wrote:
> It seems that with the priority set to 19 the netserver processes do not starve but still we have unfair scheduling issue. The netperf clients do not timeout now but one of the servers runs much less than the other. It seems that thorough understanding of scheduling algorithm is essential at this point.

Are the clients all on one box?

	-Mike


* RE: process starvation with 2.6 scheduler
@ 2006-06-13 23:03 Kallol Biswas
  2006-06-14  5:13 ` Mike Galbraith
  0 siblings, 1 reply; 16+ messages in thread
From: Kallol Biswas @ 2006-06-13 23:03 UTC (permalink / raw)
  To: Kallol Biswas, Stephen Hemminger, linux-kernel
  Cc: Mike Galbraith, Radjendirane Codandaramane

It seems that with the priority (nice value) set to 19 the netserver processes do not starve, but we still have an unfair scheduling issue. The netperf clients no longer time out, but one of the servers runs much less than the other. It seems that a thorough understanding of the scheduling algorithm is essential at this point.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

* RE: process starvation with 2.6 scheduler
  2006-06-06 21:58 Kallol Biswas
@ 2006-06-07  5:27 ` Mike Galbraith
  0 siblings, 0 replies; 16+ messages in thread
From: Mike Galbraith @ 2006-06-07  5:27 UTC (permalink / raw)
  To: Kallol Biswas; +Cc: Stephen Hemminger, linux-kernel

On Tue, 2006-06-06 at 14:58 -0700, Kallol Biswas wrote:
> Thanks for help. We do not see the issue if every netserver's priority is set to 19 with setpriority() call.

FYI, there have been changes to the scheduler since 2.6.11 days that
reduce the likelihood of this scenario somewhat.

	-Mike


* RE: process starvation with 2.6 scheduler
@ 2006-06-06 21:58 Kallol Biswas
  2006-06-07  5:27 ` Mike Galbraith
  0 siblings, 1 reply; 16+ messages in thread
From: Kallol Biswas @ 2006-06-06 21:58 UTC (permalink / raw)
  To: Kallol Biswas, Stephen Hemminger, linux-kernel; +Cc: Mike Galbraith

Thanks for the help. We do not see the issue if every netserver's priority (nice value) is set to 19 with a setpriority() call.

* RE: process starvation with 2.6 scheduler
@ 2006-06-06 18:51 Kallol Biswas
  0 siblings, 0 replies; 16+ messages in thread
From: Kallol Biswas @ 2006-06-06 18:51 UTC (permalink / raw)
  To: Kallol Biswas, Stephen Hemminger, linux-kernel; +Cc: Mike Galbraith

More information:
With CONFIG_SCHEDSTATS turned on, I have collected more information. Next I will try lowering the servers' priority by raising their nice value.

Starved process:
  sched_info.pcnt:          33
  sched_info.cpu_time:      64
  sched_info.run_delay:     113
  sched_info.last_arrival:  0xffc4a89

Active process:
  sched_info.pcnt:          238
  sched_info.cpu_time:      2852
  sched_info.run_delay:     190
  sched_info.last_arrival:  0xfffc4aa5

* RE: process starvation with 2.6 scheduler
@ 2006-06-06 17:55 Kallol Biswas
  0 siblings, 0 replies; 16+ messages in thread
From: Kallol Biswas @ 2006-06-06 17:55 UTC (permalink / raw)
  To: Stephen Hemminger, linux-kernel; +Cc: Mike Galbraith


I have verified that the starved tasks are in the runqueue (prio_array_t
array[0], with active pointing to array[0]); their timestamp and last_ran
values indicate that they have not run for a while.

The network traffic is request/response.

Client (external box): 3 ports ---- 3 cables ---- 3 ports: emulated host

The netperf clients run on the external box and the emulated host (ppc440)
runs the servers. A client sends a request to its server, the server returns
the reply, and then the client sends the next request. There are 3 clients
and 3 servers, one client-server pair per connection (3 ports on the external
box, 3 cables, 3 ports on the emulated host).

Since the traffic is request/response in nature and each packet reaches
user space (netserver) before the reply is turned around, I do not think the
slow CPU is the issue.

* RE: process starvation with 2.6 scheduler
@ 2006-06-05 23:09 Kallol Biswas
  0 siblings, 0 replies; 16+ messages in thread
From: Kallol Biswas @ 2006-06-05 23:09 UTC (permalink / raw)
  To: linux-kernel



-----Original Message-----
From: Kallol Biswas 
Sent: Monday, June 05, 2006 4:05 PM
To: 'linuxppc-dev@ozlabs.org'
Subject: RE: process starvation with 2.6 scheduler

Some more information:

For the active process:
            last_ran:  0x190458787
            timestamp: 0x190458787
For the starved process:
            last_ran:  0x14dc18cfd
            timestamp: 0x14dc18cfd

-----Original Message-----
From: Kallol Biswas 
Sent: Monday, June 05, 2006 3:09 PM
To: 'linuxppc-dev@ozlabs.org'
Subject: FW: process starvation with 2.6 scheduler



-----Original Message-----
From: Kallol Biswas 
Sent: Monday, June 05, 2006 2:30 PM
To: Kallol Biswas; linux-kernel@vger.kernel.org
Subject: RE: process starvation with 2.6 scheduler

I have checked the per-CPU runqueue data structure (we have only one CPU).

The active process is in queue list 118 of array[0], and the
starved process is in queue list 120 of array[0]. The active pointer points to array[0] and the expired pointer points to array[1].

-----Original Message-----
From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Kallol Biswas
Sent: Monday, June 05, 2006 1:47 PM
To: linux-kernel@vger.kernel.org
Subject: process starvation with 2.6 scheduler


From: Kallol Biswas 
Sent: Monday, June 05, 2006 12:49 PM
To: 'linux-kernel@vger.kernel.org'
Subject: process starvation with 2.6 scheduler

Hello,
       We have a process starvation problem with our 2.6.11 kernel running on a PPC-440-based system.

We have a storage SoC based on the PPC-440. The SoC is emulated on Palladium, a system emulator from Cadence. The emulated system runs at 400 kHz. It has three Ethernet ports, which are connected to the outside lab network through a speed bridge.

The netperf server (netserver) runs on the emulated system (2.6.11 kernel on Palladium). The netperf Linux clients run on an x86 box.

If netperf request/response (TCP_RR) traffic is run on all three ports, after some time only one port remains active; the netperf clients on the other two ports wait for a long time and eventually time out.

The netserver code has been instrumented. For one of the starved netserver processes, we found that the server received the TCP_RR request from the netperf client on the Linux x86 box and issued a send() call to return the reply, but send() never returns.

With an ICE connected to the Palladium emulator, I dumped the kernel data structures of the starved process and the active process.


For Active Process:
  Time_slice: 84
  Policy: SCHED_NORMAL
  Dynamic priority: 118
  Static priority: 120
  Preempt_count: 0x20100
  Thread_info->flags = 0
  State = 0 (TASK_RUNNING)

For Starved Process:
  Time_slice: 77
  Policy: SCHED_NORMAL
  Dynamic priority: 120
  Static priority: 120
  Preempt_count: 0x10000000 (PREEMPT_ACTIVE is set)
  Thread_info->flags = 0
  State = 0 (TASK_RUNNING)



CONFIG_PREEMPT is not set.
The system has a single CPU.


Any help debugging this problem is welcome.

Kallol

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: process starvation with 2.6 scheduler
@ 2006-06-05 21:30 Kallol Biswas
  0 siblings, 0 replies; 16+ messages in thread
From: Kallol Biswas @ 2006-06-05 21:30 UTC (permalink / raw)
  To: Kallol Biswas, linux-kernel

I have checked the per-processor run queue data structure (we have only one).

The active process is in queue list 118 of array[0], and the
starved process is in queue list 120 of array[0]. The active pointer points to array[0], and the expired pointer points to array[1].

-----Original Message-----
From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Kallol Biswas
Sent: Monday, June 05, 2006 1:47 PM
To: linux-kernel@vger.kernel.org
Subject: process starvation with 2.6 scheduler


From: Kallol Biswas 
Sent: Monday, June 05, 2006 12:49 PM
To: 'linux-kernel@vger.kernel.org'
Subject: process starvation with 2.6 scheduler

Hello,
       We have a process starvation problem with our 2.6.11 kernel running on a PPC-440-based system.

We have a storage SoC based on the PPC-440. The SoC is emulated on Palladium, a system emulator from Cadence. The emulated system runs at 400 kHz. It has three Ethernet ports, which are connected to the outside lab network through a speed bridge.

The netperf server (netserver) runs on the emulated system (2.6.11 kernel on Palladium). The netperf Linux clients run on an x86 box.

If netperf request/response (TCP_RR) traffic is run on all three ports, after some time only one port remains active; the netperf clients on the other two ports wait for a long time and eventually time out.

The netserver code has been instrumented. For one of the starved netserver processes, we found that the server received the TCP_RR request from the netperf client on the Linux x86 box and issued a send() call to return the reply, but send() never returns.

With an ICE connected to the Palladium emulator, I dumped the kernel data structures of the starved process and the active process.


For Active Process:
  Time_slice: 84
  Policy: SCHED_NORMAL
  Dynamic priority: 118
  Static priority: 120
  Preempt_count: 0x20100
  Flags = 0
  State = 0 (TASK_RUNNING)

For Starved Process:
  Time_slice: 77
  Policy: SCHED_NORMAL
  Dynamic priority: 120
  Static priority: 120
  Preempt_count: 0x10000000 (PREEMPT_ACTIVE is set)
  Flags = 0
  State = 0 (TASK_RUNNING)



CONFIG_PREEMPT is not set.
The system has a single CPU.


Any help debugging this problem is welcome.

Kallol

^ permalink raw reply	[flat|nested] 16+ messages in thread

* process starvation with 2.6 scheduler
@ 2006-06-05 20:46 Kallol Biswas
  0 siblings, 0 replies; 16+ messages in thread
From: Kallol Biswas @ 2006-06-05 20:46 UTC (permalink / raw)
  To: linux-kernel


From: Kallol Biswas 
Sent: Monday, June 05, 2006 12:49 PM
To: 'linux-kernel@vger.kernel.org'
Subject: process starvation with 2.6 scheduler

Hello,
       We have a process starvation problem with our 2.6.11 kernel running on a PPC-440-based system.

We have a storage SoC based on the PPC-440. The SoC is emulated on Palladium, a system emulator from Cadence. The emulated system runs at 400 kHz. It has three Ethernet ports, which are connected to the outside lab network through a speed bridge.

The netperf server (netserver) runs on the emulated system (2.6.11 kernel on Palladium). The netperf Linux clients run on an x86 box.

If netperf request/response (TCP_RR) traffic is run on all three ports, after some time only one port remains active; the netperf clients on the other two ports wait for a long time and eventually time out.

The netserver code has been instrumented. For one of the starved netserver processes, we found that the server received the TCP_RR request from the netperf client on the Linux x86 box and issued a send() call to return the reply, but send() never returns.

With an ICE connected to the Palladium emulator, I dumped the kernel data structures of the starved process and the active process.


For Active Process:
  Time_slice: 84
  Policy: SCHED_NORMAL
  Dynamic priority: 118
  Static priority: 120
  Preempt_count: 0x20100
  Flags = 0
  State = 0 (TASK_RUNNING)

For Starved Process:
  Time_slice: 77
  Policy: SCHED_NORMAL
  Dynamic priority: 120
  Static priority: 120
  Preempt_count: 0x10000000 (PREEMPT_ACTIVE is set)
  Flags = 0
  State = 0 (TASK_RUNNING)



CONFIG_PREEMPT is not set.
The system has a single CPU.


Any help debugging this problem is welcome.

Kallol

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: process starvation with 2.6 scheduler
@ 2006-06-05 20:36 Kallol Biswas
  0 siblings, 0 replies; 16+ messages in thread
From: Kallol Biswas @ 2006-06-05 20:36 UTC (permalink / raw)
  To: Kallol Biswas, linux-kernel

More information: CONFIG_PREEMPT is not set.
                  The system has a single CPU.

-----Original Message-----
From: Kallol Biswas 
Sent: Monday, June 05, 2006 12:49 PM
To: 'linux-kernel@vger.kernel.org'
Subject: process starvation with 2.6 scheduler

Hello,
       We have a process starvation problem with our 2.6.11 kernel running on a PPC-440-based system.

We have a storage SoC based on the PPC-440. The SoC is emulated on Palladium, a system emulator from Cadence. The emulated system runs at 400 kHz. It has three Ethernet ports, which are connected to the outside lab network through a speed bridge.

The netperf server (netserver) runs on the emulated system (2.6.11 kernel on Palladium). The netperf Linux clients run on an x86 box.

If netperf request/response (TCP_RR) traffic is run on all three ports, after some time only one port remains active; the netperf clients on the other two ports wait for a long time and eventually time out.

The netserver code has been instrumented. For one of the starved netserver processes, we found that the server received the TCP_RR request from the netperf client on the Linux x86 box and issued a send() call to return the reply, but send() never returns.

With an ICE connected to the Palladium emulator, I dumped the kernel data structures of the starved process and the active process.


For Active Process:
  Time_slice: 84
  Policy: SCHED_NORMAL
  Dynamic priority: 118
  Static priority: 120
  Preempt_count: 0x20100
  Flags = 0
  State = 0 (TASK_RUNNING)

For Starved Process:
  Time_slice: 77
  Policy: SCHED_NORMAL
  Dynamic priority: 120
  Static priority: 120
  Preempt_count: 0x10000000 (PREEMPT_ACTIVE is set)
  Flags = 0
  State = 0 (TASK_RUNNING)

Any help debugging this problem is welcome.

Kallol

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2006-06-14 19:22 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
2006-06-05 19:48 process starvation with 2.6 scheduler Kallol Biswas
2006-06-05 23:49 ` (no subject) Hack Sung Lee
2006-06-06  8:01 ` process starvation with 2.6 scheduler Mike Galbraith
2006-06-06 16:55   ` Stephen Hemminger
2006-06-05 20:36 Kallol Biswas
2006-06-05 20:46 Kallol Biswas
2006-06-05 21:30 Kallol Biswas
2006-06-05 23:09 Kallol Biswas
2006-06-06 17:55 Kallol Biswas
2006-06-06 18:51 Kallol Biswas
2006-06-06 21:58 Kallol Biswas
2006-06-07  5:27 ` Mike Galbraith
2006-06-13 23:03 Kallol Biswas
2006-06-14  5:13 ` Mike Galbraith
2006-06-14 17:56 Kallol Biswas
2006-06-14 19:26 ` Mike Galbraith
