* Help? sendfile() blocked in sk_stream_wait_memory()
@ 2011-01-31 21:47 Patrick J. LoPresti
From: Patrick J. LoPresti @ 2011-01-31 21:47 UTC (permalink / raw)
  To: linux-kernel

Hello.  I have a client/server application that has been working fine
for years on dozens of systems deployed in the field.

I am working on upgrading our systems to newer versions of hardware
and Linux, and now my application is occasionally hanging in
sendfile().  The hang is moderately hard to reproduce.

My kernel version is 2.6.32.27-0.2-default (SUSE 11 SP1, latest
update).  I am working this problem through SUSE, but I am hoping
someone here could kindly give me some pointers as well.

Here is the backtrace from /proc/<pid>/stack:

[<ffffffff812efdc8>] sk_stream_wait_memory+0x1a8/0x250
[<ffffffff8132c9b9>] do_tcp_sendpages+0x209/0x500
[<ffffffff8132cd3e>] tcp_sendpage+0x8e/0xa0
[<ffffffff812e2446>] kernel_sendpage+0x16/0x30
[<ffffffff812e2495>] sock_sendpage+0x35/0x40
[<ffffffff8111f12f>] pipe_to_sendpage+0x5f/0x90
[<ffffffff8111f1cd>] splice_from_pipe_feed+0x6d/0x120
[<ffffffff8111f74e>] __splice_from_pipe+0x5e/0x80
[<ffffffff8111f7be>] splice_from_pipe+0x4e/0x70
[<ffffffff8111fcfb>] direct_splice_actor+0x1b/0x20
[<ffffffff81120474>] splice_direct_to_actor+0xe4/0x1c0
[<ffffffff8112059b>] do_splice_direct+0x4b/0x70
[<ffffffff810fd02e>] do_sendfile+0x19e/0x210
[<ffffffff810fd12e>] sys_sendfile64+0x8e/0xb0
[<ffffffff81002f7b>] system_call_fastpath+0x16/0x1b

(Briefly, the client uses sendfile() to push data to the server, which
uses recv() to receive it.)
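
For concreteness, here is a minimal sketch of that pattern. It is not the
actual application: the helper names (push_file, recv_exact,
loopback_demo) are hypothetical, and it uses a loopback TCP connection
with a small temporary file so that it runs on a single Linux machine.

```python
import os
import socket
import tempfile


def push_file(sock, path):
    """Send the whole file at `path` over `sock` via sendfile(); return bytes sent."""
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            # On a blocking socket this call sleeps while the send buffer is full.
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:
                break  # file is shorter than expected
            sent += n
    return sent


def recv_exact(sock, nbytes):
    """Call recv() until `nbytes` have arrived or the peer closes."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def loopback_demo(payload=b"x" * 4096):
    """Hypothetical demo: client pushes a temp file, server reads it back."""
    listener = socket.create_server(("127.0.0.1", 0))
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    try:
        with tempfile.NamedTemporaryFile() as tmp:
            tmp.write(payload)
            tmp.flush()
            sent = push_file(client, tmp.name)
        data = recv_exact(server, sent)
    finally:
        for s in (client, server, listener):
            s.close()
    return sent, data
```

The payload here is small enough to fit in the socket buffers, so the
sender never has to wait for buffer space; the hang described in this
message only occurs on the path where it does.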

Using gdb, I have verified that the client is blocked in sendfile()
and the server is blocked in recv() on the socket between them.

I have disassembled my vmlinux to verify that
sk_stream_wait_memory+0x1a8/0x250 is the address following a call to
schedule_timeout(), as one might expect.

netstat shows both sides of the socket in "CONNECTED" state.

I have hammered the network connection between these systems pretty
hard and it is not showing any problems that I can discern.  (This is
a 10GigE connection, for what it is worth.)  I am working on building
a duplicate system to help verify that it is not a hardware problem.

My question is this:  What is my next step for debugging this?  As far
as I can tell, the socket has just sort of...  "stopped", for no
apparent reason.  I am not afraid to add some instrumentation to my
kernel, but I do not understand the socket code well enough even to
know where to begin.
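
One low-cost check before instrumenting the kernel is to look at the
queue depths on both ends of the connection, which are the same numbers
netstat reports in its Send-Q and Recv-Q columns.  The sketch below
reads them from inside a process.  It assumes Linux, where SIOCOUTQ and
SIOCINQ share their values with TIOCOUTQ and FIONREAD; the helper names
are hypothetical.

```python
import fcntl
import socket
import struct
import termios


def _queue_ioctl(sock, request):
    # The kernel writes the result into a C int.
    buf = fcntl.ioctl(sock.fileno(), request, struct.pack("i", 0))
    return struct.unpack("i", buf)[0]


def unsent_bytes(sock):
    """Bytes in the send queue not yet acknowledged by the peer (SIOCOUTQ)."""
    return _queue_ioctl(sock, termios.TIOCOUTQ)


def unread_bytes(sock):
    """Bytes in the receive queue not yet read by the application (SIOCINQ)."""
    return _queue_ioctl(sock, termios.FIONREAD)


def loopback_pair():
    """Hypothetical helper: return a connected (client, server) TCP pair."""
    listener = socket.create_server(("127.0.0.1", 0))
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    return client, server
```

If the sender's queue stays full while the receiver's queue stays empty,
that would suggest the data is stuck somewhere between the two TCP
stacks rather than in either application.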

Alternatively, any ideas for changes I could make to my system
configuration or application (e.g., adjusting sndbuf size?), even if
it were just a work-around and not a fix, would be appreciated.
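
As a sketch of what such a work-around might look like (an illustration
under assumptions, not a known fix for this hang): set an explicit send
buffer size and a send timeout on the client socket.  With SO_SNDTIMEO
set, a sendfile() that stalls waiting for buffer space should fail with
EAGAIN instead of sleeping indefinitely, so the application can log the
event and retry or reconnect.  The function name is hypothetical, and
the struct timeval layout assumes 64-bit Linux.

```python
import socket
import struct


def configure_sender(sock, sndbuf_bytes, send_timeout_s):
    """Set SO_SNDBUF and SO_SNDTIMEO; return the effective send buffer size."""
    # Setting SO_SNDBUF explicitly turns off send-buffer auto-tuning for
    # this socket; Linux reports back roughly double the requested value.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf_bytes)

    # struct timeval as two C longs (assumption: 64-bit Linux layout).
    seconds = int(send_timeout_s)
    microseconds = int((send_timeout_s - seconds) * 1_000_000)
    timeval = struct.pack("ll", seconds, microseconds)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDTIMEO, timeval)

    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
```

For example, configure_sender(sock, 1 << 20, 30.0) requests a 1 MiB
buffer and a 30 second timeout; both values are arbitrary and would need
tuning for a 10GigE link.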

Thanks.

 - Pat
