linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* connection failure after "tcp: remove max_qlen_log"
@ 2016-01-23 23:54 Kui Zhang
  2016-01-24  0:11 ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Kui Zhang @ 2016-01-23 23:54 UTC (permalink / raw)
  To: edumazet, davem; +Cc: linux-kernel

Hello,

One of our java software would not start after this:

# first bad commit: [ef547f2ac16bd9d77a780a0e7c70857e69e8f23f] tcp:
remove max_qlen_log


Log from software:

STATUS | monitor  | 2016/01/23 14:22:55 | Launching a Service...
INFO   | buserver | 2016/01/23 14:23:00 | WrapperManager class
initialized by thread: main  Using classloader:
sun.misc.Launcher$AppClassLoader@2876b359
INFO   | buserver | 2016/01/23 14:23:00 |
INFO   | buserver | 2016/01/23 14:23:00 | Wrapper Manager: JVM #1
INFO   | buserver | 2016/01/23 14:23:00 | Running a 64-bit JVM.
INFO   | buserver | 2016/01/23 14:23:00 | Wrapper Manager: Registering
shutdown hook
INFO   | buserver | 2016/01/23 14:23:00 | Wrapper Manager: Using wrapper
INFO   | buserver | 2016/01/23 14:23:00 | Load native library.  One or
more attempts may fail if platform specific libraries do not exist.
INFO   | buserver | 2016/01/23 14:23:00 | Loading native library
failed: libwrapper-linux-x86-64.so  Cause:
java.lang.UnsatisfiedLinkError: no wrapper-linux-x86-64 in
java.library.path
INFO   | buserver | 2016/01/23 14:23:00 | Loaded native library: libwrapper.so
INFO   | buserver | 2016/01/23 14:23:00 | Calling native initialization method.
INFO   | buserver | 2016/01/23 14:23:00 | Inside native WrapperManager
initialization method
INFO   | buserver | 2016/01/23 14:23:00 | Java Version   :
1.7.0_80-b15 Java HotSpot(TM) 64-Bit Server VM
INFO   | buserver | 2016/01/23 14:23:00 | Java VM Vendor : Oracle Corporation
INFO   | buserver | 2016/01/23 14:23:00 |
INFO   | buserver | 2016/01/23 14:23:00 |
WrapperManager.start(com.r1soft.backup.server.BUServerWrapper@7e515f40,
args["wait"]) called by thread: main
INFO   | buserver | 2016/01/23 14:23:00 | Open socket to
wrapper...Wrapper-Connection
ERROR  | monitor  | 2016/01/23 14:23:24 | Startup failed: Timed out
waiting for a signal from the Service.


### normal output around this point
### INFO   | buserver | 2016/01/23 14:30:57 | Opened Socket from 31000 to 32000


ADVICE | monitor  | 2016/01/23 14:23:24 |
ADVICE | monitor  | 2016/01/23 14:23:24 |
------------------------------------------------------------------------
ADVICE | monitor  | 2016/01/23 14:23:24 | Advice:
ADVICE | monitor  | 2016/01/23 14:23:24 | The Wrapper consists of a
native component as well as a set of classes
ADVICE | monitor  | 2016/01/23 14:23:24 | which run within the Service
that it launches.  The Java component of the
ADVICE | monitor  | 2016/01/23 14:23:24 | Wrapper must be initialized
promptly after the Service is launched or the
ADVICE | monitor  | 2016/01/23 14:23:24 | Wrapper will timeout, as
just happened.  Most likely the main class
ADVICE | monitor  | 2016/01/23 14:23:24 | specified in the Wrapper
configuration file is not correctly initializing
ADVICE | monitor  | 2016/01/23 14:23:24 | the Wrapper classes:
ADVICE | monitor  | 2016/01/23 14:23:24 |     Main
ADVICE | monitor  | 2016/01/23 14:23:24 | While it is possible to do
so manually, the Wrapper ships with helper
ADVICE | monitor  | 2016/01/23 14:23:24 | classes to make this
initialization processes automatic.
ADVICE | monitor  | 2016/01/23 14:23:24 | Please review the
integration section of the Wrapper's documentation
ADVICE | monitor  | 2016/01/23 14:23:24 | for the various methods
which can be employed to launch an application
ADVICE | monitor  | 2016/01/23 14:23:24 | within the Wrapper:
ADVICE | monitor  | 2016/01/23 14:23:24 |
http://wrapper.tanukisoftware.org/doc/english/integrate.html
ADVICE | monitor  | 2016/01/23 14:23:24 |
------------------------------------------------------------------------
ADVICE | monitor  | 2016/01/23 14:23:24 |
ERROR  | monitor  | 2016/01/23 14:23:24 | JVM did not exit on request,
terminated
DEBUG  | monitor  | 2016/01/23 14:23:24 | Signal trapped.  Details:
DEBUG  | monitor  | 2016/01/23 14:23:24 |   signal number=17
(SIGCHLD), source="unknown"
DEBUG  | monitor  | 2016/01/23 14:23:24 | Received SIGCHLD, calling wait().
DEBUG  | monitor  | 2016/01/23 14:23:24 | wait() returned, child
process should be gone.
STATUS | monitor  | 2016/01/23 14:23:24 | Service Restarts disabled.
Shutting down.
STATUS | monitor  | 2016/01/23 14:23:24 | <-- Monitor Stopped



strace snippet:

10685 nanosleep({0, 100000000},  <unfinished ...>
10684 <... nanosleep resumed> NULL)     = 0
10684 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 4
10684 fcntl(4, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
10684 bind(4, {sa_family=AF_INET, sin_port=htons(32000),
sin_addr=inet_addr("127.0.0.1")}, 16) = 0
10684 stat("/etc/localtime", {st_dev=makedev(9, 2), st_ino=1828322,
st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096,
st_blocks=8, st_size=2875, st_atime=2015/10/13-11:06:06,
st_mtime=2015/10/13-11:06:06, st_ctime=2015/10/13-11:06:06}) = 0
10684 listen(4, 0)                      = 0
10684 accept(4, 0x7ffcae409990, [16])   = -1 EAGAIN (Resource
temporarily unavailable)
10684 write(3, "STATUS | monitor  | 2016/01/22 15:28:05 | --> Monitor
Started as Daemon\nDEBUG  | monitor  | 2016/01/22 15:28:05 | Using
tick timer.\nDEBUG  | monitorp | 2016/01/22 15:28:05 | server
listening on port 32000.\n", 206) = 206
10684 nanosleep({0, 100000000},  <unfinished ...>
10685 <... nanosleep resumed> NULL)     = 0
10685 nanosleep({0, 100000000},  <unfinished ...>
10684 <... nanosleep resumed> NULL)     = 0
10684 accept(4, 0x7ffcae409990, [16])   = -1 EAGAIN (Resource
temporarily unavailable)
10684 stat("/etc/localtime", {st_dev=makedev(9, 2), st_ino=1828322,
st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096,
st_blocks=8, st_size=2875, st_atime=2015/10/13-11:06:06,
st_mtime=2015/10/13-11:06:06, st_ctime=2015/10/13-11:06:06}) = 0


10684 <... nanosleep resumed> NULL)     = 0
10684 read(5, 0x7ffcae4095c0, 1024)     = -1 EAGAIN (Resource
temporarily unavailable)
10684 accept(4,  <unfinished ...>
10686 lstat("/usr",  <unfinished ...>
10684 <... accept resumed> 0x7ffcae409990, [16]) = -1 EAGAIN (Resource
temporarily unavailable)
10686 <... lstat resumed> {st_dev=makedev(9, 2), st_ino=5922817,
st_mode=S_IFDIR|0755, st_nlink=11, st_uid=0, st_gid=0,
st_blksize=4096, st_blocks=8, st_size=4096,
st_atime=2010/04/27-15:01:55, st_mtime=2012/07/19-10:30:24,
st_ctime=2012/07/19-10:30:24}) = 0
10684 wait4(10686,  <unfinished ...>


10684 read(5, 0x7ffcae4095c0, 1024)     = -1 EAGAIN (Resource
temporarily unavailable)
10684 accept(4, 0x7ffcae409990, [16])   = -1 EAGAIN (Resource
temporarily unavailable)
10684 wait4(10686, 0x7ffcae409a04, WNOHANG, NULL) = 0
10684 nanosleep({0, 100000000},  <unfinished ...>
10702 <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
10702 futex(0x23b8128, FUTEX_WAKE_PRIVATE, 1) = 0
10702 futex(0x23b8154, FUTEX_WAIT_BITSET_PRIVATE, 1, {3795,
461801741}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
10702 futex(0x23b8128, FUTEX_WAKE_PRIVATE, 1) = 0
10702 futex(0x23b8154, FUTEX_WAIT_BITSET_PRIVATE, 1, {3795,
511937320}, ffffffff <unfinished ...>
10703 <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
10703 futex(0x2a49428, FUTEX_WAKE_PRIVATE, 1) = 0
10703 futex(0x2a49454, FUTEX_WAIT_BITSET_PRIVATE, 1, {3795,
577866793}, ffffffff <unfinished ...>
10695 <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
10695 futex(0x237ea28, FUTEX_WAKE_PRIVATE, 1) = 0
10695 futex(0x237ea54, FUTEX_WAIT_BITSET_PRIVATE, 1, {3796,
488884445}, ffffffff <unfinished ...>
10685 <... nanosleep resumed> NULL)     = 0
10685 nanosleep({0, 100000000},  <unfinished ...>
10684 <... nanosleep resumed> NULL)     = 0
10684 read(5, 0x7ffcae4095c0, 1024)     = -1 EAGAIN (Resource
temporarily unavailable)
10684 accept(4, 0x7ffcae409990, [16])   = -1 EAGAIN (Resource
temporarily unavailable)
10684 wait4(10686, 0x7ffcae409a04, WNOHANG, NULL) = 0
10684 nanosleep({0, 100000000},  <unfinished ...>
10702 <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
10702 futex(0x23b8128, FUTEX_WAKE_PRIVATE, 1) = 0
10702 futex(0x23b8154, FUTEX_WAIT_BITSET_PRIVATE, 1, {3795,
562081947}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
10702 futex(0x23b8128, FUTEX_WAKE_PRIVATE, 1) = 0
10702 futex(0x23b8154, FUTEX_WAIT_BITSET_PRIVATE, 1, {3795,
612211605}, ffffffff <unfinished ...>
10703 <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
10703 futex(0x2a49428, FUTEX_WAKE_PRIVATE, 1) = 0
10703 futex(0x2a49454, FUTEX_WAIT_BITSET_PRIVATE, 1, {3795,
678010858}, ffffffff <unfinished ...>
10685 <... nanosleep resumed> NULL)     = 0
10685 nanosleep({0, 100000000},  <unfinished ...>
10684 <... nanosleep resumed> NULL)     = 0
10684 read(5, 0x7ffcae4095c0, 1024)     = -1 EAGAIN (Resource
temporarily unavailable)
10684 accept(4, 0x7ffcae409990, [16])   = -1 EAGAIN (Resource
temporarily unavailable)


thanks
Kui.Z

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: connection failure after "tcp: remove max_qlen_log"
  2016-01-23 23:54 connection failure after "tcp: remove max_qlen_log" Kui Zhang
@ 2016-01-24  0:11 ` Eric Dumazet
  2016-01-24  2:08   ` Kui Zhang
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2016-01-24  0:11 UTC (permalink / raw)
  To: Kui Zhang; +Cc: edumazet, davem, linux-kernel

On Sat, 2016-01-23 at 15:54 -0800, Kui Zhang wrote:
> Hello,
> 
> One of our java software would not start after this:
> 
> # first bad commit: [ef547f2ac16bd9d77a780a0e7c70857e69e8f23f] tcp:
> remove max_qlen_log
> 
> 
> Log from software:
> 
> STATUS | monitor  | 2016/01/23 14:22:55 | Launching a Service...
> INFO   | buserver | 2016/01/23 14:23:00 | WrapperManager class
> initialized by thread: main  Using classloader:
> sun.misc.Launcher$AppClassLoader@2876b359
> INFO   | buserver | 2016/01/23 14:23:00 |
> INFO   | buserver | 2016/01/23 14:23:00 | Wrapper Manager: JVM #1
> INFO   | buserver | 2016/01/23 14:23:00 | Running a 64-bit JVM.
> INFO   | buserver | 2016/01/23 14:23:00 | Wrapper Manager: Registering
> shutdown hook
> INFO   | buserver | 2016/01/23 14:23:00 | Wrapper Manager: Using wrapper
> INFO   | buserver | 2016/01/23 14:23:00 | Load native library.  One or
> more attempts may fail if platform specific libraries do not exist.
> INFO   | buserver | 2016/01/23 14:23:00 | Loading native library
> failed: libwrapper-linux-x86-64.so  Cause:
> java.lang.UnsatisfiedLinkError: no wrapper-linux-x86-64 in
> java.library.path
> INFO   | buserver | 2016/01/23 14:23:00 | Loaded native library: libwrapper.so
> INFO   | buserver | 2016/01/23 14:23:00 | Calling native initialization method.
> INFO   | buserver | 2016/01/23 14:23:00 | Inside native WrapperManager
> initialization method
> INFO   | buserver | 2016/01/23 14:23:00 | Java Version   :
> 1.7.0_80-b15 Java HotSpot(TM) 64-Bit Server VM
> INFO   | buserver | 2016/01/23 14:23:00 | Java VM Vendor : Oracle Corporation
> INFO   | buserver | 2016/01/23 14:23:00 |
> INFO   | buserver | 2016/01/23 14:23:00 |
> WrapperManager.start(com.r1soft.backup.server.BUServerWrapper@7e515f40,
> args["wait"]) called by thread: main
> INFO   | buserver | 2016/01/23 14:23:00 | Open socket to
> wrapper...Wrapper-Connection
> ERROR  | monitor  | 2016/01/23 14:23:24 | Startup failed: Timed out
> waiting for a signal from the Service.
> 
> 
> ### normal output around this point
> ### INFO   | buserver | 2016/01/23 14:30:57 | Opened Socket from 31000 to 32000
> 
> 
> ADVICE | monitor  | 2016/01/23 14:23:24 |
> ADVICE | monitor  | 2016/01/23 14:23:24 |
> ------------------------------------------------------------------------
> ADVICE | monitor  | 2016/01/23 14:23:24 | Advice:
> ADVICE | monitor  | 2016/01/23 14:23:24 | The Wrapper consists of a
> native component as well as a set of classes
> ADVICE | monitor  | 2016/01/23 14:23:24 | which run within the Service
> that it launches.  The Java component of the
> ADVICE | monitor  | 2016/01/23 14:23:24 | Wrapper must be initialized
> promptly after the Service is launched or the
> ADVICE | monitor  | 2016/01/23 14:23:24 | Wrapper will timeout, as
> just happened.  Most likely the main class
> ADVICE | monitor  | 2016/01/23 14:23:24 | specified in the Wrapper
> configuration file is not correctly initializing
> ADVICE | monitor  | 2016/01/23 14:23:24 | the Wrapper classes:
> ADVICE | monitor  | 2016/01/23 14:23:24 |     Main
> ADVICE | monitor  | 2016/01/23 14:23:24 | While it is possible to do
> so manually, the Wrapper ships with helper
> ADVICE | monitor  | 2016/01/23 14:23:24 | classes to make this
> initialization processes automatic.
> ADVICE | monitor  | 2016/01/23 14:23:24 | Please review the
> integration section of the Wrapper's documentation
> ADVICE | monitor  | 2016/01/23 14:23:24 | for the various methods
> which can be employed to launch an application
> ADVICE | monitor  | 2016/01/23 14:23:24 | within the Wrapper:
> ADVICE | monitor  | 2016/01/23 14:23:24 |
> http://wrapper.tanukisoftware.org/doc/english/integrate.html
> ADVICE | monitor  | 2016/01/23 14:23:24 |
> ------------------------------------------------------------------------
> ADVICE | monitor  | 2016/01/23 14:23:24 |
> ERROR  | monitor  | 2016/01/23 14:23:24 | JVM did not exit on request,
> terminated
> DEBUG  | monitor  | 2016/01/23 14:23:24 | Signal trapped.  Details:
> DEBUG  | monitor  | 2016/01/23 14:23:24 |   signal number=17
> (SIGCHLD), source="unknown"
> DEBUG  | monitor  | 2016/01/23 14:23:24 | Received SIGCHLD, calling wait().
> DEBUG  | monitor  | 2016/01/23 14:23:24 | wait() returned, child
> process should be gone.
> STATUS | monitor  | 2016/01/23 14:23:24 | Service Restarts disabled.
> Shutting down.
> STATUS | monitor  | 2016/01/23 14:23:24 | <-- Monitor Stopped
> 
> 
> 
> strace snippet:
> 
> 10685 nanosleep({0, 100000000},  <unfinished ...>
> 10684 <... nanosleep resumed> NULL)     = 0
> 10684 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 4
> 10684 fcntl(4, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
> 10684 bind(4, {sa_family=AF_INET, sin_port=htons(32000),
> sin_addr=inet_addr("127.0.0.1")}, 16) = 0
> 10684 stat("/etc/localtime", {st_dev=makedev(9, 2), st_ino=1828322,
> st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096,
> st_blocks=8, st_size=2875, st_atime=2015/10/13-11:06:06,
> st_mtime=2015/10/13-11:06:06, st_ctime=2015/10/13-11:06:06}) = 0
> 10684 listen(4, 0)                      = 0


listen(fd, 0) ?

> 10684 accept(4, 0x7ffcae409990, [16])   = -1 EAGAIN (Resource
> temporarily unavailable)
...
> 10684 accept(4, 0x7ffcae409990, [16])   = -1 EAGAIN (Resource
> temporarily unavailable)
...

It seems the bug is in user space program.

Use listen(fd, 10) or listen(fd, 1000) ?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: connection failure after "tcp: remove max_qlen_log"
  2016-01-24  0:11 ` Eric Dumazet
@ 2016-01-24  2:08   ` Kui Zhang
  2016-01-24  2:45     ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Kui Zhang @ 2016-01-24  2:08 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, davem, linux-kernel

Per man page, listen(fd, 0) is valid.

A  backlog  argument  of  0 may allow the socket to accept
connections, in which case the length of the listen queue may be set
to an implementation-defined minimum value.

http://pubs.opengroup.org/onlinepubs/009695399/functions/listen.html


I just notice additional entries in dmesg. Which do not appear previously.

[101304.240385] TCP: request_sock_TCP: Possible SYN flooding on port
32000. Sending cookies.  Check SNMP counters.
[178237.343790] TCP: request_sock_TCP: Possible SYN flooding on port
42000. Sending cookies.  Check SNMP counters.


#!/usr/bin/python

import socket
sock = socket.socket()
sock.bind(('127.0.0.1',42000))
sock.listen(0)
s, a= sock.accept()

s.send('test')


And telnet to it.





On Sat, Jan 23, 2016 at 4:11 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sat, 2016-01-23 at 15:54 -0800, Kui Zhang wrote:
>> Hello,
>>
>> One of our java software would not start after this:
>>
>> # first bad commit: [ef547f2ac16bd9d77a780a0e7c70857e69e8f23f] tcp:
>> remove max_qlen_log
>>
>>
>> Log from software:
>>
>> STATUS | monitor  | 2016/01/23 14:22:55 | Launching a Service...
>> INFO   | buserver | 2016/01/23 14:23:00 | WrapperManager class
>> initialized by thread: main  Using classloader:
>> sun.misc.Launcher$AppClassLoader@2876b359
>> INFO   | buserver | 2016/01/23 14:23:00 |
>> INFO   | buserver | 2016/01/23 14:23:00 | Wrapper Manager: JVM #1
>> INFO   | buserver | 2016/01/23 14:23:00 | Running a 64-bit JVM.
>> INFO   | buserver | 2016/01/23 14:23:00 | Wrapper Manager: Registering
>> shutdown hook
>> INFO   | buserver | 2016/01/23 14:23:00 | Wrapper Manager: Using wrapper
>> INFO   | buserver | 2016/01/23 14:23:00 | Load native library.  One or
>> more attempts may fail if platform specific libraries do not exist.
>> INFO   | buserver | 2016/01/23 14:23:00 | Loading native library
>> failed: libwrapper-linux-x86-64.so  Cause:
>> java.lang.UnsatisfiedLinkError: no wrapper-linux-x86-64 in
>> java.library.path
>> INFO   | buserver | 2016/01/23 14:23:00 | Loaded native library: libwrapper.so
>> INFO   | buserver | 2016/01/23 14:23:00 | Calling native initialization method.
>> INFO   | buserver | 2016/01/23 14:23:00 | Inside native WrapperManager
>> initialization method
>> INFO   | buserver | 2016/01/23 14:23:00 | Java Version   :
>> 1.7.0_80-b15 Java HotSpot(TM) 64-Bit Server VM
>> INFO   | buserver | 2016/01/23 14:23:00 | Java VM Vendor : Oracle Corporation
>> INFO   | buserver | 2016/01/23 14:23:00 |
>> INFO   | buserver | 2016/01/23 14:23:00 |
>> WrapperManager.start(com.r1soft.backup.server.BUServerWrapper@7e515f40,
>> args["wait"]) called by thread: main
>> INFO   | buserver | 2016/01/23 14:23:00 | Open socket to
>> wrapper...Wrapper-Connection
>> ERROR  | monitor  | 2016/01/23 14:23:24 | Startup failed: Timed out
>> waiting for a signal from the Service.
>>
>>
>> ### normal output around this point
>> ### INFO   | buserver | 2016/01/23 14:30:57 | Opened Socket from 31000 to 32000
>>
>>
>> ADVICE | monitor  | 2016/01/23 14:23:24 |
>> ADVICE | monitor  | 2016/01/23 14:23:24 |
>> ------------------------------------------------------------------------
>> ADVICE | monitor  | 2016/01/23 14:23:24 | Advice:
>> ADVICE | monitor  | 2016/01/23 14:23:24 | The Wrapper consists of a
>> native component as well as a set of classes
>> ADVICE | monitor  | 2016/01/23 14:23:24 | which run within the Service
>> that it launches.  The Java component of the
>> ADVICE | monitor  | 2016/01/23 14:23:24 | Wrapper must be initialized
>> promptly after the Service is launched or the
>> ADVICE | monitor  | 2016/01/23 14:23:24 | Wrapper will timeout, as
>> just happened.  Most likely the main class
>> ADVICE | monitor  | 2016/01/23 14:23:24 | specified in the Wrapper
>> configuration file is not correctly initializing
>> ADVICE | monitor  | 2016/01/23 14:23:24 | the Wrapper classes:
>> ADVICE | monitor  | 2016/01/23 14:23:24 |     Main
>> ADVICE | monitor  | 2016/01/23 14:23:24 | While it is possible to do
>> so manually, the Wrapper ships with helper
>> ADVICE | monitor  | 2016/01/23 14:23:24 | classes to make this
>> initialization processes automatic.
>> ADVICE | monitor  | 2016/01/23 14:23:24 | Please review the
>> integration section of the Wrapper's documentation
>> ADVICE | monitor  | 2016/01/23 14:23:24 | for the various methods
>> which can be employed to launch an application
>> ADVICE | monitor  | 2016/01/23 14:23:24 | within the Wrapper:
>> ADVICE | monitor  | 2016/01/23 14:23:24 |
>> http://wrapper.tanukisoftware.org/doc/english/integrate.html
>> ADVICE | monitor  | 2016/01/23 14:23:24 |
>> ------------------------------------------------------------------------
>> ADVICE | monitor  | 2016/01/23 14:23:24 |
>> ERROR  | monitor  | 2016/01/23 14:23:24 | JVM did not exit on request,
>> terminated
>> DEBUG  | monitor  | 2016/01/23 14:23:24 | Signal trapped.  Details:
>> DEBUG  | monitor  | 2016/01/23 14:23:24 |   signal number=17
>> (SIGCHLD), source="unknown"
>> DEBUG  | monitor  | 2016/01/23 14:23:24 | Received SIGCHLD, calling wait().
>> DEBUG  | monitor  | 2016/01/23 14:23:24 | wait() returned, child
>> process should be gone.
>> STATUS | monitor  | 2016/01/23 14:23:24 | Service Restarts disabled.
>> Shutting down.
>> STATUS | monitor  | 2016/01/23 14:23:24 | <-- Monitor Stopped
>>
>>
>>
>> strace snippet:
>>
>> 10685 nanosleep({0, 100000000},  <unfinished ...>
>> 10684 <... nanosleep resumed> NULL)     = 0
>> 10684 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 4
>> 10684 fcntl(4, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
>> 10684 bind(4, {sa_family=AF_INET, sin_port=htons(32000),
>> sin_addr=inet_addr("127.0.0.1")}, 16) = 0
>> 10684 stat("/etc/localtime", {st_dev=makedev(9, 2), st_ino=1828322,
>> st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096,
>> st_blocks=8, st_size=2875, st_atime=2015/10/13-11:06:06,
>> st_mtime=2015/10/13-11:06:06, st_ctime=2015/10/13-11:06:06}) = 0
>> 10684 listen(4, 0)                      = 0
>
>
> listen(fd, 0) ?
>
>> 10684 accept(4, 0x7ffcae409990, [16])   = -1 EAGAIN (Resource
>> temporarily unavailable)
> ...
>> 10684 accept(4, 0x7ffcae409990, [16])   = -1 EAGAIN (Resource
>> temporarily unavailable)
> ...
>
> It seems the bug is in user space program.
>
> Use listen(fd, 10) or listen(fd, 1000) ?
>
>
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: connection failure after "tcp: remove max_qlen_log"
  2016-01-24  2:08   ` Kui Zhang
@ 2016-01-24  2:45     ` Eric Dumazet
  2016-01-24 10:17       ` Kui Zhang
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2016-01-24  2:45 UTC (permalink / raw)
  To: Kui Zhang; +Cc: Eric Dumazet, David Miller, linux-kernel

On Sat, Jan 23, 2016 at 6:08 PM, Kui Zhang <kuizhang@gmail.com> wrote:
> Per man page, listen(fd, 0) is valid.
>
> A  backlog  argument  of  0 may allow the socket to accept
> connections, in which case the length of the listen queue may be set
> to an implementation-defined minimum value.
>
> http://pubs.opengroup.org/onlinepubs/009695399/functions/listen.html
>

It is perfectly valid, as you pointed out, to set the implementation
defined minimum to 0.

Since you do not want to depend on some magic number, just set the
backlog to 1, or even better 1000



Linux man page :


       int listen(int sockfd, int backlog);

DESCRIPTION
       listen() marks the socket referred to by sockfd as a passive
socket, that is, as a socket that will be used
       to accept incoming connection requests using accept(2).

       The sockfd argument is a file descriptor that refers to a
socket of type SOCK_STREAM or SOCK_SEQPACKET.

       The backlog argument defines the maximum length to which the
queue of pending connections  for  sockfd  may
       grow.   If  a  connection  request  arrives when the queue is
full, the client may receive an error with an
       indication of ECONNREFUSED or, if the underlying protocol
supports  retransmission,  the  request  may  be
       ignored so that a later reattempt at connection succeeds.

maximum length = 0

For the second problem it was already fixed.

commit acb4a6bfc80ddeea4c44074dd630f916259e909e
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Oct 6 14:49:58 2015 -0700

    tcp: ensure prior synack rtx behavior with small backlogs

    Some applications use a listen() backlog of 1.
-----------------------------------------------------------------------------------------------


Setting the backlog to 0 is a way to not accept connections, while
still keeping the port bound.

A listener is now able to temporarily not accept new flows.


If you want to accept connections, just set the backlog to something reasonable.

As a bonus, your daemon will not reject a connection attempt just
because few SYN_RECV sockets are waiting for the 3rd packet of 3WHS,
when dealing with large RTT.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: connection failure after "tcp: remove max_qlen_log"
  2016-01-24  2:45     ` Eric Dumazet
@ 2016-01-24 10:17       ` Kui Zhang
  2016-01-25  3:18         ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Kui Zhang @ 2016-01-24 10:17 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, David Miller, linux-kernel

We licensed the java software. As far as I can tell, the connection is
for IPC with a child process. There should not be large RTT.

I will contact vendor regarding to listen(fd,0) issue, on Monday.

I am not fully convinced, that is the problem. I saw that man page for
listen. However accept() works, in python, with backlog = 0. The java
software works with kernel build one commit before.


strace for blocking:

listen(3, 0)                            = 0
accept(3, {sa_family=AF_INET, sin_port=htons(37562),
sin_addr=inet_addr("127.0.0.1")}, [16]) = 4
sendto(4, "ping\n", 5, 0, NULL, 0)      = 5
recvfrom(4, "sd\n", 100, 0, NULL, NULL) = 3
close(4)                                = 0
accept(3, {sa_family=AF_INET, sin_port=htons(37564),
sin_addr=inet_addr("127.0.0.1")}, [16]) = 4
sendto(4, "ping\n", 5, 0, NULL, 0)      = 5
recvfrom(4, "sd\n", 100, 0, NULL, NULL) = 3
close(4)                                = 0
accept(3, {sa_family=AF_INET, sin_port=htons(37572),
sin_addr=inet_addr("127.0.0.1")}, [16]) = 4
sendto(4, "ping\n", 5, 0, NULL, 0)      = 5
recvfrom(4, "bbb\n", 100, 0, NULL, NULL) = 4
close(4)                                = 0
accept(3, {sa_family=AF_INET, sin_port=htons(37574),
sin_addr=inet_addr("127.0.0.1")}, [16]) = 4
sendto(4, "ping\n", 5, 0, NULL, 0)      = 5
recvfrom(4, "aaa\n", 100, 0, NULL, NULL) = 4
close(4)                                = 0
accept(3, {sa_family=AF_INET, sin_port=htons(37636),
sin_addr=inet_addr("127.0.0.1")}, [16]) = 4
sendto(4, "ping\n", 5, 0, NULL, 0)      = 5
recvfrom(4, "s\n", 100, 0, NULL, NULL)  = 2
close(4)                                = 0
accept(3, {sa_family=AF_INET, sin_port=htons(37638),
sin_addr=inet_addr("127.0.0.1")}, [16]) = 4
sendto(4, "ping\n", 5, 0, NULL, 0)      = 5
recvfrom(4, "\n\n\n", 100, 0, NULL, NULL) = 3
close(4)                                = 0



strace for non-blocking:

5539  listen(3, 0)                      = 0
5539  select(4, [3], [], [3], {30, 0})  = 0 (Timeout)
5539  select(4, [3], [], [3], {30, 0})  = 1 (in [3], left {16, 316258})
5539  accept(3, {sa_family=AF_INET, sin_port=htons(50364),
sin_addr=inet_addr("127.0.0.1")}, [16]) = 4
5539  fcntl(4, F_GETFL)                 = 0x2 (flags O_RDWR)
5539  fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
5539  getpeername(4, {sa_family=AF_INET, sin_port=htons(50364),
sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
5539  select(5, [3 4], [], [3 4], {30, 0}) = 1 (in [3], left {27, 898930})
5539  accept(3, {sa_family=AF_INET, sin_port=htons(50366),
sin_addr=inet_addr("127.0.0.1")}, [16]) = 5
5539  fcntl(5, F_GETFL)                 = 0x2 (flags O_RDWR)
5539  fcntl(5, F_SETFL, O_RDWR|O_NONBLOCK) = 0
5539  getpeername(5, {sa_family=AF_INET, sin_port=htons(50366),
sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
5539  select(6, [3 4 5], [], [3 4 5], {30, 0}) = 1 (in [3], left {27, 455817})
5539  accept(3, {sa_family=AF_INET, sin_port=htons(50368),
sin_addr=inet_addr("127.0.0.1")}, [16]) = 6
5539  fcntl(6, F_GETFL)                 = 0x2 (flags O_RDWR)
5539  fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0
5539  getpeername(6, {sa_family=AF_INET, sin_port=htons(50368),
sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
5539  select(7, [3 4 5 6], [], [3 4 5 6], {30, 0}) = 1 (in [6], left
{15, 227130})
5539  recvfrom(6, "a1111\n", 1024, 0, NULL, NULL) = 6
5539  select(7, [3 4 5 6], [6], [3 4 5 6], {30, 0}) = 1 (out [6], left
{29, 999984})
5539  sendto(6, "a1111\n", 6, 0, NULL, 0) = 6
5539  select(7, [3 4 5 6], [], [3 4 5 6], {30, 0}) = 1 (in [5], left
{28, 150454})
5539  recvfrom(5, "b11111\n", 1024, 0, NULL, NULL) = 7
5539  select(7, [3 4 5 6], [5], [3 4 5 6], {30, 0}) = 1 (out [5], left
{29, 999997})
5539  sendto(5, "b11111\n", 7, 0, NULL, 0) = 7
5539  select(7, [3 4 5 6], [], [3 4 5 6], {30, 0}) = 1 (in [4], left
{28, 459804})
5539  recvfrom(4, "c11111\n", 1024, 0, NULL, NULL) = 7
5539  select(7, [3 4 5 6], [4], [3 4 5 6], {30, 0}) = 1 (out [4], left
{29, 999996})
5539  sendto(4, "c11111\n", 7, 0, NULL, 0) = 7


What am i missing ?

thanks


On Sat, Jan 23, 2016 at 6:45 PM, Eric Dumazet <edumazet@google.com> wrote:
> On Sat, Jan 23, 2016 at 6:08 PM, Kui Zhang <kuizhang@gmail.com> wrote:
>> Per man page, listen(fd, 0) is valid.
>>
>> A  backlog  argument  of  0 may allow the socket to accept
>> connections, in which case the length of the listen queue may be set
>> to an implementation-defined minimum value.
>>
>> http://pubs.opengroup.org/onlinepubs/009695399/functions/listen.html
>>
>
> It is perfectly valid, as you pointed out, to set the implementation
> defined minimum to 0.
>
> Since you do not want to depend on some magic number, just set the
> backlog to 1, or even better 1000
>
>
>
> Linux man page :
>
>
>        int listen(int sockfd, int backlog);
>
> DESCRIPTION
>        listen() marks the socket referred to by sockfd as a passive
> socket, that is, as a socket that will be used
>        to accept incoming connection requests using accept(2).
>
>        The sockfd argument is a file descriptor that refers to a
> socket of type SOCK_STREAM or SOCK_SEQPACKET.
>
>        The backlog argument defines the maximum length to which the
> queue of pending connections  for  sockfd  may
>        grow.   If  a  connection  request  arrives when the queue is
> full, the client may receive an error with an
>        indication of ECONNREFUSED or, if the underlying protocol
> supports  retransmission,  the  request  may  be
>        ignored so that a later reattempt at connection succeeds.
>
> maximum length = 0
>
> For the second problem it was already fixed.
>
> commit acb4a6bfc80ddeea4c44074dd630f916259e909e
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Tue Oct 6 14:49:58 2015 -0700
>
>     tcp: ensure prior synack rtx behavior with small backlogs
>
>     Some applications use a listen() backlog of 1.
> -----------------------------------------------------------------------------------------------
>
>
> Setting the backlog to 0 is a way to not accept connections, while
> still keeping the port bound.
>
> A listener is now able to temporarily not accept new flows.
>
>
> If you want to accept connections, just set the backlog to something reasonable.
>
> As a bonus, your daemon will not reject a connection attempt just
> because few SYN_RECV sockets are waiting for the 3rd packet of 3WHS,
> when dealing with large RTT.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: connection failure after "tcp: remove max_qlen_log"
  2016-01-24 10:17       ` Kui Zhang
@ 2016-01-25  3:18         ` Eric Dumazet
  2016-01-26  7:40           ` Kui Zhang
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2016-01-25  3:18 UTC (permalink / raw)
  To: Kui Zhang; +Cc: Eric Dumazet, David Miller, linux-kernel

On Sun, Jan 24, 2016 at 2:17 AM, Kui Zhang <kuizhang@gmail.com> wrote:
> We licensed the java software. As far as I can tell, the connection is
> for IPC with a child process. There should not be large RTT.
>
> I will contact vendor regarding to listen(fd,0) issue, on Monday.
>
> I am not fully convinced, that is the problem. I saw that man page for
> listen. However accept() works, in python, with backlog = 0. The java
> software works with kernel build one commit before.

Note that python (2.4.6 here) translates sock.listen(0) to listen(fd,
1), maybe to avoid surprises on some OS ;)

$ cat lis.py
#!/usr/bin/python

import socket
sock = socket.socket()
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(('127.0.0.1',42000))
sock.listen(0)
s, a= sock.accept()

s.send('test')

$ strace ./lis.py
...
close(5)                                = 0
close(4)                                = 0
close(3)                                = 0
futex(0x8d8e460, FUTEX_WAKE, 1)         = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(3, {sa_family=AF_INET, sin_port=htons(42000),
sin_addr=inet_addr("127.0.0.1")}, 16) = 0
listen(3, 1)                            = 0
accept(3,

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: connection failure after "tcp: remove max_qlen_log"
  2016-01-25  3:18         ` Eric Dumazet
@ 2016-01-26  7:40           ` Kui Zhang
  2016-02-27 19:56             ` Kui Zhang
  0 siblings, 1 reply; 8+ messages in thread
From: Kui Zhang @ 2016-01-26  7:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, David Miller, linux-kernel

I was using 2.7.11.

Older version did set backlog to 1 ( http://bugs.python.org/issue8498 ).



On Sun, Jan 24, 2016 at 7:18 PM, Eric Dumazet <edumazet@google.com> wrote:
> On Sun, Jan 24, 2016 at 2:17 AM, Kui Zhang <kuizhang@gmail.com> wrote:
>> We licensed the java software. As far as I can tell, the connection is
>> for IPC with a child process. There should not be large RTT.
>>
>> I will contact vendor regarding to listen(fd,0) issue, on Monday.
>>
>> I am not fully convinced, that is the problem. I saw that man page for
>> listen. However accept() works, in python, with backlog = 0. The java
>> software works with kernel build one commit before.
>
> Note that python (2.4.6 here) translates sock.listen(0) to listen(fd,
> 1), maybe to avoid surprises on some OS ;)
>
> $ cat lis.py
> #!/usr/bin/python
>
> import socket
> sock = socket.socket()
> sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
> sock.bind(('127.0.0.1',42000))
> sock.listen(0)
> s, a= sock.accept()
>
> s.send('test')
>
> $ strace ./lis.py
> ...
> close(5)                                = 0
> close(4)                                = 0
> close(3)                                = 0
> futex(0x8d8e460, FUTEX_WAKE, 1)         = 0
> socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
> setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> bind(3, {sa_family=AF_INET, sin_port=htons(42000),
> sin_addr=inet_addr("127.0.0.1")}, 16) = 0
> listen(3, 1)                            = 0
> accept(3,

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: connection failure after "tcp: remove max_qlen_log"
  2016-01-26  7:40           ` Kui Zhang
@ 2016-02-27 19:56             ` Kui Zhang
  0 siblings, 0 replies; 8+ messages in thread
From: Kui Zhang @ 2016-02-27 19:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, David Miller, linux-kernel

Finally tracked down some additional info:

sock.listen(0) behavior appears to be affected by tcp_syncookies


sysctl -w 'net.ipv4.tcp_syncookies=1'; python lis.py
[/tmp]# telnet 127.0.0.1 42000
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
test
Connection closed by foreign host.


sysctl -w 'net.ipv4.tcp_syncookies=0'; python lis.py
[/tmp]# telnet 127.0.0.1 42000
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection timed out




>>> $ cat lis.py
>>> #!/usr/bin/python
>>>
>>> import socket
>>> sock = socket.socket()
>>> sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
>>> sock.bind(('127.0.0.1',42000))
>>> sock.listen(0)
>>> s, a= sock.accept()
>>>
>>> s.send('test')
>>>
~






On Mon, Jan 25, 2016 at 11:40 PM, Kui Zhang <kuizhang@gmail.com> wrote:
> I was using 2.7.11.
>
> Older version did set backlog to 1 ( http://bugs.python.org/issue8498 ).
>
>
>
> On Sun, Jan 24, 2016 at 7:18 PM, Eric Dumazet <edumazet@google.com> wrote:
>> On Sun, Jan 24, 2016 at 2:17 AM, Kui Zhang <kuizhang@gmail.com> wrote:
>>> We licensed the java software. As far as I can tell, the connection is
>>> for IPC with a child process. There should not be large RTT.
>>>
>>> I will contact vendor regarding to listen(fd,0) issue, on Monday.
>>>
>>> I am not fully convinced, that is the problem. I saw that man page for
>>> listen. However accept() works, in python, with backlog = 0. The java
>>> software works with kernel build one commit before.
>>
>> Note that python (2.4.6 here) translates sock.listen(0) to listen(fd,
>> 1), maybe to avoid surprises on some OS ;)
>>
>> $ cat lis.py
>> #!/usr/bin/python
>>
>> import socket
>> sock = socket.socket()
>> sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
>> sock.bind(('127.0.0.1',42000))
>> sock.listen(0)
>> s, a= sock.accept()
>>
>> s.send('test')
>>
>> $ strace ./lis.py
>> ...
>> close(5)                                = 0
>> close(4)                                = 0
>> close(3)                                = 0
>> futex(0x8d8e460, FUTEX_WAKE, 1)         = 0
>> socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
>> setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
>> bind(3, {sa_family=AF_INET, sin_port=htons(42000),
>> sin_addr=inet_addr("127.0.0.1")}, 16) = 0
>> listen(3, 1)                            = 0
>> accept(3,

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-02-27 19:56 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-23 23:54 connection failure after "tcp: remove max_qlen_log" Kui Zhang
2016-01-24  0:11 ` Eric Dumazet
2016-01-24  2:08   ` Kui Zhang
2016-01-24  2:45     ` Eric Dumazet
2016-01-24 10:17       ` Kui Zhang
2016-01-25  3:18         ` Eric Dumazet
2016-01-26  7:40           ` Kui Zhang
2016-02-27 19:56             ` Kui Zhang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).