* possible latency issues in seq_read
@ 2007-07-20 21:15 Chris Friesen
2007-07-20 22:18 ` Lee Revell
0 siblings, 1 reply; 5+ messages in thread
From: Chris Friesen @ 2007-07-20 21:15 UTC
To: linux-kernel
We've run into an issue (on 2.6.10) where calling "lsof" triggers lost
packets on our server. Preempt is disabled, and NAPI is enabled.
It appears that for some reason the networking softirq is not being
handled in a timely fashion, which means that the rx ring buffer fills
up and packets overflow.
It appears that the problem path is:
seq_read
tcp_seq_next
established_get_next
read_lock/read_unlock
The issue appears to be related to the amount of time that this syscall
takes. While we're in the syscall we cannot run the ksoftirqd thread,
and so the rx buffer is not being cleaned.
The fact that there are kmalloc(GFP_KERNEL) calls in seq_read() seems to
indicate that sleeping is safe, so would it be reasonable to call
schedule() periodically (maybe based on elapsed time) to ensure that
system latency is kept under control?
Thanks,
Chris
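
[Editor's sketch] The "yield based on elapsed time" idea in the question above can be illustrated in plain userspace C. This is only a sketch under stated assumptions, not kernel code and not anything from 2.6.10: the names yield_budget, yield_budget_init and yield_budget_expired are hypothetical, and the caller is assumed to supply timestamps in nanoseconds from whatever clock it has.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: tracks when a long-running loop last yielded. */
struct yield_budget {
    uint64_t last_yield_ns;  /* timestamp of the most recent yield */
    uint64_t max_hold_ns;    /* longest stretch allowed without yielding */
};

/* Start a budget at time now_ns; max_hold_ns must be non-zero. */
static void yield_budget_init(struct yield_budget *b, uint64_t now_ns,
                              uint64_t max_hold_ns)
{
    assert(max_hold_ns > 0);
    b->last_yield_ns = now_ns;
    b->max_hold_ns = max_hold_ns;
}

/*
 * Returns true once at least max_hold_ns has passed since the last
 * yield, and restarts the budget from now_ns. The caller is expected
 * to give up the CPU whenever this returns true.
 */
static bool yield_budget_expired(struct yield_budget *b, uint64_t now_ns)
{
    if (now_ns - b->last_yield_ns < b->max_hold_ns)
        return false;
    b->last_yield_ns = now_ns;
    return true;
}
```

A long loop would call yield_budget_expired() once per iteration and yield only when it returns true, so the cost of the check stays small while the time between yields stays bounded.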
* Re: possible latency issues in seq_read
2007-07-20 21:15 possible latency issues in seq_read Chris Friesen
@ 2007-07-20 22:18 ` Lee Revell
2007-07-20 22:39 ` Chris Friesen
0 siblings, 1 reply; 5+ messages in thread
From: Lee Revell @ 2007-07-20 22:18 UTC
To: Chris Friesen; +Cc: linux-kernel
On 7/20/07, Chris Friesen <cfriesen@nortel.com> wrote:
>
> We've run into an issue (on 2.6.10) where calling "lsof" triggers lost
> packets on our server. Preempt is disabled, and NAPI is enabled.
>
Can you reproduce with a recent kernel? Lots of latency issues have
been fixed since then.
Lee
* Re: possible latency issues in seq_read
2007-07-20 22:18 ` Lee Revell
@ 2007-07-20 22:39 ` Chris Friesen
2007-07-21 3:46 ` Eric Dumazet
0 siblings, 1 reply; 5+ messages in thread
From: Chris Friesen @ 2007-07-20 22:39 UTC
To: Lee Revell; +Cc: linux-kernel
Lee Revell wrote:
> On 7/20/07, Chris Friesen <cfriesen@nortel.com> wrote:
>> We've run into an issue (on 2.6.10) where calling "lsof" triggers lost
>> packets on our server. Preempt is disabled, and NAPI is enabled.
> Can you reproduce with a recent kernel? Lots of latency issues have
> been fixed since then.
Unfortunately I have to fix it on this version (the bug was found on
shipped product), so if there was a difference I'd have to isolate the
changes and backport them. Also, I can't run the software that triggers
the problem on a newer kernel as it has dependencies on various patches
that are not in mainline.
Basically what I'd like to know is whether calling schedule() in
seq_read() is safe or whether it would break assumptions made by
seq_file users.
Chris
* Re: possible latency issues in seq_read
2007-07-20 22:39 ` Chris Friesen
@ 2007-07-21 3:46 ` Eric Dumazet
2007-07-23 17:45 ` Chris Friesen
0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2007-07-21 3:46 UTC
To: Chris Friesen; +Cc: Lee Revell, linux-kernel, linux-net
Chris Friesen wrote:
> Lee Revell wrote:
>> On 7/20/07, Chris Friesen <cfriesen@nortel.com> wrote:
>
>>> We've run into an issue (on 2.6.10) where calling "lsof" triggers lost
>>> packets on our server. Preempt is disabled, and NAPI is enabled.
>
>> Can you reproduce with a recent kernel? Lots of latency issues have
>> been fixed since then.
>
> Unfortunately I have to fix it on this version (the bug was found on
> shipped product), so if there was a difference I'd have to isolate the
> changes and backport them. Also, I can't run the software that triggers
> the problem on a newer kernel as it has dependencies on various patches
> that are not in mainline.
>
> Basically what I'd like to know is whether calling schedule() in
> seq_read() is safe or whether it would break assumptions made by
> seq_file users.
>
It won't help much. seq_read() is fine in itself.
The problem is in established_get_next() and established_get_first() not
allowing softirq processing, while scanning a possibly huge hash table, even
if few sockets are hashed in.
As cond_resched_softirq() was added in linux-2.6.11, you probably *need* to
check the diffs between linux-2.6.10 and linux-2.6.11 in these files:
include/linux/sched.h
net/core/sock.c (__release_sock() latency)
net/ipv4/tcp_ipv4.c (/proc/net/tcp latency)
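
[Editor's sketch] The point above, that the scan cost follows the size of the hash table rather than the number of sockets in it, can be shown with a small userspace C model. This is not the 2.6.11 change and not the real established_get_next(); struct table, walk_table, count_yield and the yield_every parameter are hypothetical names, and the yield callback merely stands in for a point where pending work would be allowed to run.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal chained hash table: each bucket is a singly linked list. */
struct node {
    struct node *next;
};

struct table {
    struct node **buckets;
    size_t nbuckets;
};

/* Hypothetical hook standing in for a "let pending work run" point. */
typedef void (*yield_fn)(void *ctx);

/* Example hook: counts how often the walker offered to yield. */
static void count_yield(void *ctx)
{
    (*(size_t *)ctx)++;
}

/*
 * Visit every bucket, including empty ones, and count the entries.
 * After every yield_every buckets the walker calls yield(), outside
 * any per-bucket critical section. Returns the number of entries seen.
 */
static size_t walk_table(const struct table *t, size_t yield_every,
                         yield_fn yield, void *ctx)
{
    size_t seen = 0;

    assert(yield_every > 0);
    for (size_t i = 0; i < t->nbuckets; i++) {
        /* A real walker would take the per-bucket read lock here... */
        for (struct node *n = t->buckets[i]; n != NULL; n = n->next)
            seen++;
        /* ...and drop it here, before reaching the yield point. */
        if ((i + 1) % yield_every == 0)
            yield(ctx);
    }
    return seen;
}
```

With an 8-bucket table holding two entries in one bucket and yield_every set to 4, the walker still touches all 8 buckets and offers to yield twice: the number of yield points depends on the bucket count, which is why a huge, sparsely populated table is still expensive to scan.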
* Re: possible latency issues in seq_read
2007-07-21 3:46 ` Eric Dumazet
@ 2007-07-23 17:45 ` Chris Friesen
0 siblings, 0 replies; 5+ messages in thread
From: Chris Friesen @ 2007-07-23 17:45 UTC
To: Eric Dumazet; +Cc: Lee Revell, linux-kernel, linux-net
Eric Dumazet wrote:
> The problem is in established_get_next() and established_get_first() not
> allowing softirq processing, while scanning a possibly huge hash table,
> even if few sockets are hashed in.
>
> As cond_resched_softirq() was added in linux-2.6.11, you probably *need*
> to check the diffs between linux-2.6.10 & linux-2.6.11
Thanks for the pointers to the likely culprits.
Chris