* 2.6.34: Problem with UDP traffic on lo + poll(?)
From: Krzysztof Oledzki @ 2010-09-06 17:11 UTC
  To: netdev

Hello,

For the last two days I have been trying to track down a strange problem I 
ran into after upgrading my kernel from 2.6.31.12 to 2.6.34.6.

The problem is that several times a day, nagios logs that plugins are not 
able to resolve DNS hostnames of monitored hosts. The DNS service is 
provided locally by the host itself, so all traffic is handled over the 
loopback interface. The host handles rather moderate traffic: ~1000pps 
and ~30 DNS requests per second. This DNS service is also provided to 
other hosts that are also running 2.6.34.6 and are connected over an 
Ethernet network, but the problem exists only locally.

After a long investigation I found that I'm able to reproduce this problem 
by adding: "*.t IN A 127.0.0.1" to the "lan" zone and using the following 
script:

--- cut here ---
# Loop until check_icmp fails, resolving a new name (N.t.lan) on each pass.
a=0
while strace -o /tmp/s.log.1 -s 1024 /usr/lib64/nagios/plugins/check_icmp -H "$a.t.lan"; do
  date
  sleep 0.1
  a=$((a+1))
done
--- cut here ---

Strace shows that the problem is in receiving responses from the 
nameserver:

socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 4
connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.130.53")}, 28) = 0
poll([{fd=4, events=POLLOUT}], 1, 0)    = 1 ([{fd=4, revents=POLLOUT}])
sendto(4, "\333b\1\0\0\1\0\0\0\0\0\0\0041817\1t\3lan\0\0\1\0\1", 28, MSG_NOSIGNAL, NULL, 0) = 28
poll([{fd=4, events=POLLIN}], 1, 5000)  = 0 (Timeout)
poll([{fd=4, events=POLLOUT}], 1, 0)    = 1 ([{fd=4, revents=POLLOUT}])
sendto(4, "\333b\1\0\0\1\0\0\0\0\0\0\0041817\1t\3lan\0\0\1\0\1", 28, MSG_NOSIGNAL, NULL, 0) = 28
poll([{fd=4, events=POLLIN}], 1, 5000)  = 0 (Timeout)
close(4)                                = 0
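
For anyone who wants to exercise the same syscall pattern outside of 
check_icmp, here is a minimal sketch in Python. It assumes Python 3 on 
Linux. The names query_once, echo_once and demo, the 127.0.0.1 echo 
responder and the b"ping" payload are illustrative and not part of the 
setup described here. It is not claimed to reproduce the problem; it only 
mirrors the connect() / poll(POLLOUT) / send / poll(POLLIN) sequence 
from the trace.

```python
import select
import socket
import threading


def query_once(server, payload, timeout_ms=5000):
    """Send one datagram on a connected, non-blocking UDP socket.

    Returns the reply bytes, or None if poll() reports nothing in time.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setblocking(False)
        sock.connect(server)  # connected UDP socket, as in the trace

        poller = select.poll()
        poller.register(sock.fileno(), select.POLLOUT)
        if not poller.poll(0):  # poll([{fd, POLLOUT}], 1, 0)
            return None
        sock.send(payload)

        poller.modify(sock.fileno(), select.POLLIN)
        if not poller.poll(timeout_ms):  # poll([{fd, POLLIN}], 1, timeout)
            return None  # the timeout seen in the trace
        return sock.recv(4096)
    finally:
        sock.close()


def echo_once(server_sock):
    """Hypothetical stand-in for the nameserver: echo a single datagram."""
    try:
        data, peer = server_sock.recvfrom(4096)
        server_sock.sendto(data, peer)
    except socket.timeout:
        pass


def demo():
    """Run one request/response round trip over the loopback interface."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server_sock.bind(("127.0.0.1", 0))  # assumption: any free local port
    server_sock.settimeout(5)
    responder = threading.Thread(target=echo_once, args=(server_sock,))
    responder.start()
    reply = query_once(server_sock.getsockname(), b"ping", 2000)
    responder.join()
    server_sock.close()
    return reply


if __name__ == "__main__":
    print(demo())
```

Calling demo() in a loop and counting None results would be one way to 
look for lost replies, under the assumptions above.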

However, tcpdump attached to lo shows that both the request and 
the response are properly delivered:

03:00:47.181529 IP (tos 0x0, ttl 64, id 47869, offset 0, flags [DF], proto UDP (17), length 56)
     192.168.130.53.41083 > 192.168.130.53.53: 56162+ A? 1817.t.lan. (28)
03:00:47.181585 IP (tos 0x0, ttl 64, id 29563, offset 0, flags [none], proto UDP (17), length 112)
     192.168.130.53.53 > 192.168.130.53.41083: 56162* 1/1/1 1817.t.lan. A 127.0.0.1 (84)
--
03:00:52.186465 IP (tos 0x0, ttl 64, id 47870, offset 0, flags [DF], proto UDP (17), length 56)
     192.168.130.53.41083 > 192.168.130.53.53: 56162+ A? 1817.t.lan. (28)
03:00:52.186580 IP (tos 0x0, ttl 64, id 29576, offset 0, flags [none], proto UDP (17), length 112)
     192.168.130.53.53 > 192.168.130.53.41083: 56162* 1/1/1 1817.t.lan. A 127.0.0.1 (84)

03:00:57.298221 IP (tos 0x0, ttl 64, id 57985, offset 0, flags [DF], proto UDP (17), length 60)
     192.168.130.53.39370 > 192.168.130.53.53: 145+ A? 1817.t.lan.lan. (32)
03:00:57.298300 IP (tos 0x0, ttl 64, id 29584, offset 0, flags [none], proto UDP (17), length 116)
     192.168.130.53.53 > 192.168.130.53.39370: 145 NXDomain* 0/1/0 (88)
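
The sendto() payload from the strace output can be decoded by hand to 
confirm that it is the same query tcpdump shows on lo. A small sketch, 
assuming Python 3; parse_query is a hypothetical helper that handles only 
a single uncompressed question, which is all this packet contains.

```python
import struct

# Payload copied from the sendto() line in the strace output.
QUERY = b"\333b\1\0\0\1\0\0\0\0\0\0\0041817\1t\3lan\0\0\1\0\1"


def parse_query(packet):
    """Decode the header and the first question of a DNS query packet."""
    ident, flags, qdcount = struct.unpack("!HHH", packet[:6])

    # The question name starts right after the 12-byte DNS header and is
    # a sequence of length-prefixed labels terminated by a zero byte.
    labels = []
    pos = 12
    while packet[pos] != 0:
        length = packet[pos]
        labels.append(packet[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length

    qtype, qclass = struct.unpack("!HH", packet[pos + 1:pos + 5])
    return {
        "id": ident,
        "rd": bool(flags & 0x0100),  # recursion desired
        "qdcount": qdcount,
        "name": ".".join(labels),
        "qtype": qtype,    # 1 = A
        "qclass": qclass,  # 1 = IN
        "length": len(packet),
    }


if __name__ == "__main__":
    print(parse_query(QUERY))
```

For the payload above this gives ID 56162 with the RD flag set (the "+" 
in the tcpdump line) and an A query for 1817.t.lan in 28 bytes, matching 
the captured request.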

In most cases it takes from 2 to 15 minutes to trigger this error, and so 
far I have not been able to reproduce it in my lab environment. 
Downgrading the kernel back to 2.6.31 cures the issue.

I have a very short service window, so bisecting is nearly impossible. 
During the next few days I should be able to find out whether this problem 
was introduced in 2.6.32 or 2.6.33, but if you have any clues about what to 
check first, or ideas for some smart debug patches, I would be very grateful.

Best regards,

 			Krzysztof Olędzki

Thread overview: 34+ messages
2010-09-06 17:11 2.6.34: Problem with UDP traffic on lo + poll(?) Krzysztof Oledzki
2010-09-06 19:42 ` Eric Dumazet
2010-09-06 19:55   ` Krzysztof Olędzki
2010-09-06 20:29     ` Eric Dumazet
2010-09-06 20:44       ` Krzysztof Olędzki
2010-09-06 20:48         ` Krzysztof Olędzki
2010-09-07 15:37           ` Krzysztof Olędzki
2010-09-07 16:36             ` Eric Dumazet
2010-09-07 19:20               ` Krzysztof Olędzki
2010-09-07 19:26               ` Eric Dumazet
2010-09-07 19:59                 ` David Miller
2010-09-07 21:35                   ` [PATCH] inet: dont set inet_rcv_saddr in connect() Eric Dumazet
2010-09-07 21:52                     ` Krzysztof Olędzki
2010-09-08  2:16                       ` David Miller
2010-09-08  4:13                         ` Eric Dumazet
2010-09-08  2:34                     ` Brian Haley
2010-09-08  3:34                       ` David Miller
2010-09-08  4:42                         ` Eric Dumazet
2010-09-08  5:51                           ` David Miller
2010-09-08  4:57                       ` Eric Dumazet
2010-09-08  5:36                         ` David Miller
2010-09-08  5:52                           ` Eric Dumazet
2010-09-08 10:10                             ` [PATCH] udp: add rehash on connect() Eric Dumazet
2010-09-08 15:06                               ` Krzysztof Olędzki
2010-09-08 15:17                                 ` Eric Dumazet
2010-09-08 15:29                                   ` Krzysztof Olędzki
2010-09-08 15:08                               ` [PATCH v2] " Eric Dumazet
2010-09-08 16:52                                 ` Krzysztof Olędzki
2010-09-09  4:39                                   ` David Miller
2010-09-08 14:27                             ` [PATCH] inet: dont set inet_rcv_saddr in connect() Eric Dumazet
2010-09-07 21:28                 ` 2.6.34: Problem with UDP traffic on lo + poll(?) Krzysztof Olędzki
2010-09-07 21:39                   ` Eric Dumazet
2010-09-07 21:51                     ` Krzysztof Olędzki
2010-09-08  4:12                       ` Eric Dumazet