linux-kernel.vger.kernel.org archive mirror
* possible latency issues in seq_read
@ 2007-07-20 21:15 Chris Friesen
  2007-07-20 22:18 ` Lee Revell
From: Chris Friesen @ 2007-07-20 21:15 UTC
  To: linux-kernel


We've run into an issue (on 2.6.10) where calling "lsof" triggers lost 
packets on our server.  Preempt is disabled, and NAPI is enabled.

It appears that for some reason the networking softirq is not being 
handled in a timely fashion, which means that the rx ring buffer fills 
up and incoming packets are dropped.

It appears that the problem path is:

seq_read
	tcp_seq_next
		established_get_next
			read_lock/read_unlock
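For readers less familiar with seq_file: seq_read() does not walk /proc/net/tcp itself. It repeatedly calls the iterator callbacks the /proc file registers (start, next, show and stop in struct seq_operations), and for /proc/net/tcp the next callback is tcp_seq_next(). The sketch below is a simplified user-space model of that loop, not the kernel's code: struct seq_ops_model, seq_read_model() and the int_iter example are hypothetical names, and the real callbacks take a struct seq_file * and a loff_t * rather than these parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified, hypothetical stand-in for struct seq_operations. */
struct seq_ops_model {
    void *(*start)(void *priv, long *pos);
    void *(*next)(void *priv, void *cur, long *pos);
    void  (*stop)(void *priv, void *cur);
    int   (*show)(void *priv, void *cur, char *buf, size_t len);
};

/* Roughly the shape of the seq_read() loop: start, then show/next until the
 * iterator is exhausted or the output buffer is full, then stop. Every
 * ->next call in between runs inside the same read() system call. */
static size_t seq_read_model(const struct seq_ops_model *ops, void *priv,
                             char *buf, size_t len)
{
    long pos = 0;
    size_t off = 0;
    void *cur = ops->start(priv, &pos);

    assert(buf != NULL);
    while (cur != NULL) {
        int n = ops->show(priv, cur, buf + off, len - off);
        if (n < 0 || (size_t)n >= len - off)
            break;                  /* record does not fit: stop here */
        off += (size_t)n;
        cur = ops->next(priv, cur, &pos);
    }
    ops->stop(priv, cur);
    return off;                     /* bytes of complete records produced */
}

/* Hypothetical example iterator over a fixed array of ints. */
struct int_iter { const int *vals; long count; };

static void *ii_start(void *priv, long *pos)
{
    struct int_iter *it = priv;
    return *pos < it->count ? (void *)&it->vals[*pos] : NULL;
}

static void *ii_next(void *priv, void *cur, long *pos)
{
    (void)cur;
    ++*pos;
    return ii_start(priv, pos);
}

static void ii_stop(void *priv, void *cur) { (void)priv; (void)cur; }

static int ii_show(void *priv, void *cur, char *buf, size_t len)
{
    (void)priv;
    return snprintf(buf, len, "%d\n", *(const int *)cur);
}
```

With a 64-byte buffer and the values 1, 22 and 333, seq_read_model() returns 9 bytes ("1\n22\n333\n"); with a 6-byte buffer it stops after 5 bytes. The point to notice is that all the work done inside each next call, however long, happens within one read() call.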

The issue appears to be related to how long this syscall takes.  While 
we're in the syscall the ksoftirqd thread cannot run, and so the rx 
buffer is not being cleaned.

The fact that there are kmalloc(GFP_KERNEL) calls in seq_read() seems to 
indicate that sleeping is safe, so would it be reasonable to call 
schedule() periodically (maybe based on elapsed time) to ensure that 
system latency is kept under control?

Thanks,

Chris


* Re: possible latency issues in seq_read
  2007-07-20 21:15 possible latency issues in seq_read Chris Friesen
@ 2007-07-20 22:18 ` Lee Revell
  2007-07-20 22:39   ` Chris Friesen
From: Lee Revell @ 2007-07-20 22:18 UTC
  To: Chris Friesen; +Cc: linux-kernel

On 7/20/07, Chris Friesen <cfriesen@nortel.com> wrote:
>
> We've run into an issue (on 2.6.10) where calling "lsof" triggers lost
> packets on our server.  Preempt is disabled, and NAPI is enabled.
>

Can you reproduce with a recent kernel?  Lots of latency issues have
been fixed since then.

Lee


* Re: possible latency issues in seq_read
  2007-07-20 22:18 ` Lee Revell
@ 2007-07-20 22:39   ` Chris Friesen
  2007-07-21  3:46     ` Eric Dumazet
From: Chris Friesen @ 2007-07-20 22:39 UTC
  To: Lee Revell; +Cc: linux-kernel

Lee Revell wrote:
> On 7/20/07, Chris Friesen <cfriesen@nortel.com> wrote:

>> We've run into an issue (on 2.6.10) where calling "lsof" triggers lost
>> packets on our server.  Preempt is disabled, and NAPI is enabled.

> Can you reproduce with a recent kernel?  Lots of latency issues have
> been fixed since then.

Unfortunately I have to fix it on this version (the bug was found on 
shipped product), so if there was a difference I'd have to isolate the 
changes and backport them.  Also, I can't run the software that triggers 
the problem on a newer kernel as it has dependencies on various patches 
that are not in mainline.

Basically what I'd like to know is whether calling schedule() in 
seq_read() is safe or whether it would break assumptions made by 
seq_file users.

Chris


* Re: possible latency issues in seq_read
  2007-07-20 22:39   ` Chris Friesen
@ 2007-07-21  3:46     ` Eric Dumazet
  2007-07-23 17:45       ` Chris Friesen
From: Eric Dumazet @ 2007-07-21  3:46 UTC
  To: Chris Friesen; +Cc: Lee Revell, linux-kernel, linux-net

Chris Friesen wrote:
> Lee Revell wrote:
>> On 7/20/07, Chris Friesen <cfriesen@nortel.com> wrote:
> 
>>> We've run into an issue (on 2.6.10) where calling "lsof" triggers lost
>>> packets on our server.  Preempt is disabled, and NAPI is enabled.
> 
>> Can you reproduce with a recent kernel?  Lots of latency issues have
>> been fixed since then.
> 
> Unfortunately I have to fix it on this version (the bug was found on 
> shipped product), so if there was a difference I'd have to isolate the 
> changes and backport them.  Also, I can't run the software that triggers 
> the problem on a newer kernel as it has dependencies on various patches 
> that are not in mainline.
> 
> Basically what I'd like to know is whether calling schedule() in 
> seq_read() is safe or whether it would break assumptions made by 
> seq_file users.
> 

It won't help much. seq_read() is fine in itself.

The problem is in established_get_next() and established_get_first() not 
allowing softirq processing while scanning a possibly huge hash table, even 
if only a few sockets are hashed in.
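The point about table size can be illustrated with a small user-space model: a chained hash table with many buckets and only a handful of entries still costs one lock/unlock (here merely counted) per bucket to walk. struct table, table_scan() and the bucket counts are hypothetical stand-ins for the TCP established hash, not the kernel's data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical chained hash table standing in for the established hash. */
struct entry { struct entry *next; unsigned int key; };
struct table { size_t nbuckets; struct entry **buckets; };

static int table_init(struct table *t, size_t nbuckets)
{
    t->nbuckets = nbuckets;
    t->buckets = calloc(nbuckets, sizeof *t->buckets);
    return t->buckets ? 0 : -1;
}

static void table_insert(struct table *t, struct entry *e)
{
    size_t b = e->key % t->nbuckets;
    e->next = t->buckets[b];
    t->buckets[b] = e;
}

/* Walk every entry the way a /proc-style iterator must: bucket by bucket,
 * taking (here: counting) the per-bucket lock even for empty buckets.
 * Returns the number of lock/unlock pairs; entries found go in *seen. */
static size_t table_scan(const struct table *t, size_t *seen)
{
    size_t lock_ops = 0;

    assert(t->buckets != NULL);
    *seen = 0;
    for (size_t b = 0; b < t->nbuckets; b++) {
        lock_ops++;                 /* models read_lock() ... read_unlock() */
        for (const struct entry *e = t->buckets[b]; e != NULL; e = e->next)
            (*seen)++;
    }
    return lock_ops;
}
```

A 65536-bucket table holding 3 entries still takes 65536 lock operations to scan, so the cost of one pass tracks the table size rather than the number of sockets.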

As cond_resched_softirq() was added in linux-2.6.11, you probably *need* to 
check the diffs between linux-2.6.10 & linux-2.6.11

Files:

include/linux/sched.h
net/core/sock.c      (__release_sock() latency)
net/ipv4/tcp_ipv4.c  (/proc/net/tcp latency)
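The following user-space model sketches why a yield point in seq_read() alone is too coarse, and why a resched point inside the bucket walk helps. It assumes, based on the description above, that the 2.6.11 change lets pending softirq work run between hash buckets; the actual net/ipv4/tcp_ipv4.c diff should be checked for where cond_resched_softirq() is placed. rx_drain() stands in for that softirq work, and every name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Where the (hypothetical) resched point sits during the scan. */
enum resched_mode {
    RESCHED_NONE,        /* never: softirqs wait for the whole scan */
    RESCHED_PER_ENTRY,   /* only after a socket is returned (seq_read level) */
    RESCHED_PER_BUCKET,  /* before every bucket (inside the hash walk) */
};

/* Rx ring that only the softirq (modelled by rx_drain) can empty. */
struct rx_ring { unsigned int size, used; unsigned long dropped; };

static void rx_deliver(struct rx_ring *r, unsigned int pkts)
{
    for (unsigned int i = 0; i < pkts; i++) {
        if (r->used < r->size)
            r->used++;
        else
            r->dropped++;           /* ring full: packet lost */
    }
}

static void rx_drain(struct rx_ring *r) { r->used = 0; }

/* Walk 'nbuckets' buckets, one socket every 'gap' buckets, while
 * 'pkts_per_bucket' packets arrive per bucket visited. Returns drops. */
static unsigned long scan_drops(unsigned int nbuckets, unsigned int gap,
                                unsigned int pkts_per_bucket,
                                unsigned int ring_size, enum resched_mode mode)
{
    struct rx_ring r = { ring_size, 0, 0 };

    assert(gap > 0);
    for (unsigned int b = 0; b < nbuckets; b++) {
        if (mode == RESCHED_PER_BUCKET)
            rx_drain(&r);           /* softirq gets to run between buckets */
        rx_deliver(&r, pkts_per_bucket);
        bool has_entry = (b % gap) == 0;
        if (has_entry && mode == RESCHED_PER_ENTRY)
            rx_drain(&r);           /* yield only once a socket is found */
    }
    rx_drain(&r);                   /* bottom halves re-enabled after the read */
    return r.dropped;
}
```

With 1024 buckets, one socket every 512 buckets, one packet arriving per bucket visited and a 128-entry ring, the model drops 896 packets with no resched point, 767 when it only yields after each returned socket (roughly what a schedule() in seq_read() would give), and none when it yields before every bucket.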




* Re: possible latency issues in seq_read
  2007-07-21  3:46     ` Eric Dumazet
@ 2007-07-23 17:45       ` Chris Friesen
From: Chris Friesen @ 2007-07-23 17:45 UTC
  To: Eric Dumazet; +Cc: Lee Revell, linux-kernel, linux-net

Eric Dumazet wrote:

> The problem is in established_get_next() and established_get_first() not 
> allowing softirq processing, while scanning a possibly huge hash table, 
> even if few sockets are hashed in.
> 
> As cond_resched_softirq() was added in linux-2.6.11, you probably *need* 
> to check the diffs between linux-2.6.10 & linux-2.6.11

Thanks for the pointers to the likely culprits.

Chris

