netdev.vger.kernel.org archive mirror
* Possible networking regression in 3.6.0
@ 2012-09-17 15:44 Chris Clayton
  2012-09-18 14:21 ` Chris Clayton
  0 siblings, 1 reply; 59+ messages in thread
From: Chris Clayton @ 2012-09-17 15:44 UTC (permalink / raw)
  To: netdev

Hi,

I'm having a problem with networking. I'm running Windows XP as a KVM 
guest on a laptop running kernel 3.6.0-rc6. The identical configuration 
works fine with kernels 3.5.4 and 3.4.11 (and has done so, largely 
unchanged, since KVM was introduced in 2.6.<whatever>).

The configuration is:

XP guest:	192.168.200.1 (gateway 192.168.200.254)
tap0:		192.168.200.254
host:		192.168.0.40 (gateway 192.168.0.1)
router:		192.168.0.1

The script that starts up the firewall includes the following commands:

# Load the connection-sharing for qemu/kvm guests
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
...
# allow traffic to and from the qemu/kvm virtual networks
NETS="200 201"
for net in $NETS; do
   iptables -A INPUT -s 192.168.$net.0/24 -j ACCEPT
   iptables -A OUTPUT -d 192.168.$net.0/24 -j ACCEPT
done
...
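
The loop above can be checked without touching the live ruleset. The following is only a sketch: `print_guest_rules` is a hypothetical helper, not part of the script quoted here, and it merely prints the commands the loop would run for each third octet passed to it.

```shell
# Dry run: print the rules the loop would add instead of applying them.
# print_guest_rules is an illustrative helper, not part of the real script.
print_guest_rules() {
    for net in "$@"; do
        echo "iptables -A INPUT -s 192.168.$net.0/24 -j ACCEPT"
        echo "iptables -A OUTPUT -d 192.168.$net.0/24 -j ACCEPT"
    done
}

# Same network list as NETS="200 201" in the excerpt.
print_guest_rules 200 201
```

Run with `200 201`, it prints four lines, two per guest network. Only the INPUT and OUTPUT rules shown in the excerpt are reproduced; whatever the elided parts of the script do is not modelled.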

The network-related modules that are loaded are:

$ lsmod
Module                  Size  Used by
tun                    12412  0
xt_state                 891  1
iptable_filter           852  1
ipt_MASQUERADE          1222  1
iptable_nat             3087  1
nf_nat                 10901  2 ipt_MASQUERADE,iptable_nat
nf_conntrack_ipv4       4942  4 nf_nat,iptable_nat
nf_defrag_ipv4           815  1 nf_conntrack_ipv4
nf_conntrack           37644  5 ipt_MASQUERADE,nf_nat,xt_state,iptable_nat,nf_conntrack_ipv4
...
r8169                  47159  0

From the host I can successfully ping the guest, tap0 and the router as 
you would expect, but from the guest, although I can ping the host and 
tap0, I cannot ping the router. In practice, this means I have no 
internet access from the guest. As I say, this configuration works 
perfectly under 3.5.x and 3.4.x kernels.

I'll do a coarse-grained "bisect" of Linus' 3.6 release candidates and 
report back, but does anyone have any prime-suspect patches that may be 
the cause of this problem?

Let me know if there are any other diagnostics I can provide. Also, as 
I'm not subscribed to netdev, please cc me on any reply.

Thanks,

Chris


* Re: Possible networking regression in 3.6.0
  2012-09-17 15:44 Possible networking regression in 3.6.0 Chris Clayton
@ 2012-09-18 14:21 ` Chris Clayton
  2012-09-18 14:31   ` Chris Clayton
  0 siblings, 1 reply; 59+ messages in thread
From: Chris Clayton @ 2012-09-18 14:21 UTC (permalink / raw)
  To: netdev



-rc1 turned out to have the problem so I've bisected between 3.5 and 
3.6-rc1. I arrived at:

$ git bisect bad
d2d68ba9fe8b38eb03124b3176a013bb8aa2b5e5 is the first bad commit
commit d2d68ba9fe8b38eb03124b3176a013bb8aa2b5e5
Author: David S. Miller <davem@davemloft.net>
Date:   Tue Jul 17 12:58:50 2012 -0700

     ipv4: Cache input routes in fib_info nexthops.

     Caching input routes is slightly simpler than output routes, since we
     don't need to be concerned with nexthop exceptions.  (locally
     destined, and routed packets, never trigger PMTU events or redirects
     that will be processed by us).

     However, we have to elide caching for the DIRECTSRC and non-zero itag
     cases.

     Signed-off-by: David S. Miller <davem@davemloft.net>

:040000 040000 6bbc75c1cbe62bf84ea412d3b98adf2b614779cd 3ad7256b4a71e63ca4530977c0550121ea803d35 M      include
:040000 040000 18c2a950a53c4eec9bfa12185d1e382dfed74af8 a2ab6157d6cd54930da395758c6ded3a225d1f04 M      net

The bisect log:
git bisect start
# bad: [0d7614f09c1ebdbaa1599a5aba7593f147bf96ee] Linux 3.6-rc1
git bisect bad 0d7614f09c1ebdbaa1599a5aba7593f147bf96ee
# good: [28a33cbc24e4256c143dce96c7d93bf423229f92] Linux 3.5
git bisect good 28a33cbc24e4256c143dce96c7d93bf423229f92
# bad: [614a6d4341b3760ca98a1c2c09141b71db5d1e90] Merge branch 'for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
git bisect bad 614a6d4341b3760ca98a1c2c09141b71db5d1e90
# bad: [320f5ea0cedc08ef65d67e056bcb9d181386ef2c] genetlink: define lockdep_genl_is_held() when CONFIG_LOCKDEP
git bisect bad 320f5ea0cedc08ef65d67e056bcb9d181386ef2c
# good: [0cd06647b7c24f6633e32a505930a9aa70138c22] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
git bisect good 0cd06647b7c24f6633e32a505930a9aa70138c22
# good: [dbfa600148a25903976910863c75dae185f8d187] cxgb3: set maximal number of default RSS queues
git bisect good dbfa600148a25903976910863c75dae185f8d187
# good: [efdfad3205403e1d1c5c0bdcbdb647ddd89bfaa3] bnx2: Try to recover from PCI block reset
git bisect good efdfad3205403e1d1c5c0bdcbdb647ddd89bfaa3
# good: [1bf91cdc1bba94ea062a9147d924815c13f029f2] ixgbe: Drop references to deprecated pci_ DMA api and instead use dma_ API
git bisect good 1bf91cdc1bba94ea062a9147d924815c13f029f2
# good: [b6dfd939fdc249fcf8cd7b8006f76239b33eb581] ixgbe: add support for new 82599 device
git bisect good b6dfd939fdc249fcf8cd7b8006f76239b33eb581
# good: [3ba97381343b271296487bf073eb670d5465a8b8] net: ethernet: davinci_emac: add pm_runtime support
git bisect good 3ba97381343b271296487bf073eb670d5465a8b8
# bad: [5e9965c15ba88319500284e590733f4a4629a288] Merge branch 'kill_rtcache'
git bisect bad 5e9965c15ba88319500284e590733f4a4629a288
# good: [f5b0a8743601a4477419171f5046bd07d1c080a0] net: Document dst->obsolete better.
git bisect good f5b0a8743601a4477419171f5046bd07d1c080a0
# bad: [ba3f7f04ef2b19aace38f855aedd17fe43035d50] ipv4: Kill FLOWI_FLAG_RT_NOCACHE and associated code.
git bisect bad ba3f7f04ef2b19aace38f855aedd17fe43035d50
# good: [f2bb4bedf35d5167a073dcdddf16543f351ef3ae] ipv4: Cache output routes in fib_info nexthops.
git bisect good f2bb4bedf35d5167a073dcdddf16543f351ef3ae
# bad: [d2d68ba9fe8b38eb03124b3176a013bb8aa2b5e5] ipv4: Cache input routes in fib_info nexthops.
git bisect bad d2d68ba9fe8b38eb03124b3176a013bb8aa2b5e5

Checking out the parent commit 
(f2bb4bedf35d5167a073dcdddf16543f351ef3ae) and building and installing 
the kernel gives a working configuration, so I'm pretty confident in the 
outcome of the bisect. Reversing the patch gives errors, so I've not 
tested master with the patch reversed.
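
For anyone who wants to repeat this kind of search, the mechanics can be tried on a throwaway repository first. This is a toy sketch, not the kernel bisect itself: the repository, the `state` file, and the commit in which it "breaks" are all invented for illustration. Only `git bisect start`, `git bisect run` and `git bisect reset` are the real commands being demonstrated.

```shell
#!/bin/sh
# Toy demonstration of an automated bisect in a temporary repository.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.email "bisect-demo@example.invalid"
git config user.name "Bisect Demo"

# Eight commits; the invented "regression" first appears in commit 6.
i=1
while [ "$i" -le 8 ]; do
    if [ "$i" -ge 6 ]; then echo broken > state; else echo working > state; fi
    echo "$i" > counter
    git add state counter
    git commit -q -m "commit $i"
    i=$((i + 1))
done

# Known-bad is HEAD, known-good is the root commit.
good=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$good" >/dev/null

# The test command exits 0 for a good tree and 1 for a bad one.
git bisect run grep -q working state >/dev/null

# refs/bisect/bad now points at the first bad commit.
first_bad_subject=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad: $first_bad_subject"
```

With a kernel tree the test step is a build, boot and ping rather than a `grep`, so each step is usually marked by hand with `git bisect good` or `git bisect bad`, as in the log above.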

Let me know if I can help in any way to identify a fix.

Chris



* Re: Possible networking regression in 3.6.0
  2012-09-18 14:21 ` Chris Clayton
@ 2012-09-18 14:31   ` Chris Clayton
  2012-09-18 14:40     ` Eric Dumazet
  2012-09-18 14:44     ` Possible networking regression in 3.6.0 Chris Clayton
  0 siblings, 2 replies; 59+ messages in thread
From: Chris Clayton @ 2012-09-18 14:31 UTC (permalink / raw)
  To: netdev

Sorry, I forgot to say that I have also tried running TinyCore Linux as 
a KVM guest on a 3.6.0-rc6 kernel, and I can ping the router fine, so 
the problem seems to be specifically related to running Windows XP as 
the guest. I don't have any other guests installed, so that's as much 
as I can say, although I could maybe install a Win7 guest tomorrow if 
that would help.



* Re: Possible networking regression in 3.6.0
  2012-09-18 14:31   ` Chris Clayton
@ 2012-09-18 14:40     ` Eric Dumazet
  2012-09-18 15:51       ` Chris Clayton
  2012-09-19 15:26       ` Chris Clayton
  2012-09-18 14:44     ` Possible networking regression in 3.6.0 Chris Clayton
  1 sibling, 2 replies; 59+ messages in thread
From: Eric Dumazet @ 2012-09-18 14:40 UTC (permalink / raw)
  To: Chris Clayton; +Cc: netdev

On Tue, 2012-09-18 at 15:31 +0100, Chris Clayton wrote:
> Sorry, I forgot to say that I also have tried running TinyCore Linux as 
> a KVM guest on a 3.6.0-rc6 kernel, and I can ping the router fine, so 
> the problem seems to be something specifically related to running 
> Windows XP as the guest. I don't have any other guests installed so 
> that's as much as I can say, although I could maybe install a Win7 guest 
> tomorrow if that would help.

It would help to have some traffic sample, maybe.

Especially if the problem is not easily reproducible for us.

(I don't have Windows XP or Win7)

Also, the bisect might point to a commit with a bug that has already 
been fixed:

commit 4331debc51ee1ce319f4a389484e0e8e05de2aca
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Jul 25 05:11:23 2012 +0000

    ipv4: rt_cache_valid must check expired routes
    
    commit d2d68ba9fe8 (ipv4: Cache input routes in fib_info nexthops.)
    introduced rt_cache_valid() helper. It unfortunately doesn't check if
    route is expired before caching it.
    
    I noticed sk_setup_caps() was constantly called on a tcp workload.
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>


* Re: Possible networking regression in 3.6.0
  2012-09-18 14:31   ` Chris Clayton
  2012-09-18 14:40     ` Eric Dumazet
@ 2012-09-18 14:44     ` Chris Clayton
  1 sibling, 0 replies; 59+ messages in thread
From: Chris Clayton @ 2012-09-18 14:44 UTC (permalink / raw)
  To: netdev

>>
> Sorry, I forgot to say that I also have tried running TinyCore Linux as
> a KVM guest on a 3.6.0-rc6 kernel, and I can ping the router fine, so
> the problem seems to be something specifically related to running
> Windows XP as the guest. I don't have any other guests installed so
> that's as much as I can say, although I could maybe install a Win7 guest
> tomorrow if that would help.
>

Sorry again, but please ignore the message above. I used the wrong 
kernel in that test. In fact, the TinyCore guest also fails to ping the 
router when the host runs a 3.6.0-rc6 kernel.

Apologies for the noise.

Chris


* Re: Possible networking regression in 3.6.0
  2012-09-18 14:40     ` Eric Dumazet
@ 2012-09-18 15:51       ` Chris Clayton
  2012-09-19 15:26       ` Chris Clayton
  1 sibling, 0 replies; 59+ messages in thread
From: Chris Clayton @ 2012-09-18 15:51 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

Thanks for the reply, Eric.

>> Sorry, I forgot to say that I also have tried running TinyCore Linux as
>> a KVM guest on a 3.6.0-rc6 kernel, and I can ping the router fine, so
>> the problem seems to be something specifically related to running
>> Windows XP as the guest. I don't have any other guests installed so
>> that's as much as I can say, although I could maybe install a Win7 guest
>> tomorrow if that would help.
>


I hope you've seen my later email, in which I reported the error in my 
testing that led me to believe that all was OK with a Linux client. In 
fact, the router is inaccessible from both the Windows XP and the Linux 
clients.

> It would help to have some traffic sample, maybe.
>

I'll need help here. How would I go about collecting that traffic? I 
have wireshark installed, but haven't used it for years. Would a trace 
from that be helpful? It might take me a while to figure out how to 
capture it.

> Especially if the problem is not easily reproducible for us.
>
> (I don't have Windows XP or Win7)
>
> Also the bisect might point to a commit with an already fixed bug :

This fix is already in 3.6.0-rc6. BTW, I've pulled the latest changes 
from kernel.org this afternoon, but that hasn't helped.
>
> commit 4331debc51ee1ce319f4a389484e0e8e05de2aca
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Wed Jul 25 05:11:23 2012 +0000
>
>      ipv4: rt_cache_valid must check expired routes
>
>      commit d2d68ba9fe8 (ipv4: Cache input routes in fib_info nexthops.)
>      introduced rt_cache_valid() helper. It unfortunately doesn't check if
>      route is expired before caching it.
>
>      I noticed sk_setup_caps() was constantly called on a tcp workload.
>
>      Signed-off-by: Eric Dumazet <edumazet@google.com>
>      Signed-off-by: David S. Miller <davem@davemloft.net>
>
>


* Re: Possible networking regression in 3.6.0
  2012-09-18 14:40     ` Eric Dumazet
  2012-09-18 15:51       ` Chris Clayton
@ 2012-09-19 15:26       ` Chris Clayton
  2012-09-22  6:26         ` Chris Clayton
  1 sibling, 1 reply; 59+ messages in thread
From: Chris Clayton @ 2012-09-19 15:26 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

>
> It would help to have some traffic sample, maybe.
>
> Especially if the problem is not easily reproducible for us.
>

OK, I've used netsniff-ng to capture the traffic on all interfaces on 
the host (that would be tap0 and eth0, I guess) whilst attempting to 
ping the router from the WinXP KVM client. The result is a pcap file 
that I processed with tcpdump to produce:

reading from file net-trace.pcap, link-type EN10MB (Ethernet)
14:56:31.406336 ARP, Request who-has 192.168.200.254 tell 192.168.200.1, 
length 28
         0x0000:  0001 0800 0604 0001 5254 0c3b 1728 c0a8
         0x0010:  c801 0000 0000 0000 c0a8 c8fe
14:56:31.406357 ARP, Reply 192.168.200.254 is-at 46:83:93:8f:f0:7e, 
length 28
         0x0000:  0001 0800 0604 0002 4683 938f f07e c0a8
         0x0010:  c8fe 5254 0c3b 1728 c0a8 c801
14:56:31.406534 IP 192.168.200.1 > 192.168.0.1: ICMP echo request, id 
512, seq 4352, length 40
         0x0000:  4500 003c 0195 0000 8001 efd8 c0a8 c801
         0x0010:  c0a8 0001 0800 3a5c 0200 1100 6162 6364
         0x0020:  6566 6768 696a 6b6c 6d6e 6f70 7172 7374
         0x0030:  7576 7761 6263 6465 6667 6869
14:56:31.406566 ARP, Request who-has 192.168.0.1 tell 192.168.0.40, 
length 28
         0x0000:  0001 0800 0604 0001 5c9a d85c 6331 c0a8
         0x0010:  0028 0000 0000 0000 c0a8 0001
14:56:31.410830 ARP, Reply 192.168.0.1 is-at 00:1f:33:80:09:44, length 46
         0x0000:  0001 0800 0604 0002 001f 3380 0944 c0a8
         0x0010:  0001 5c9a d85c 6331 c0a8 0028 c0a8 0001
         0x0020:  e000 0001 1164 ee9b 0000 0000 4500
14:56:31.410851 IP 192.168.0.40 > 192.168.0.1: ICMP echo request, id 
512, seq 4352, length 40
         0x0000:  4500 003c 0195 0000 7f01 b8b2 c0a8 0028
         0x0010:  c0a8 0001 0800 3a5c 0200 1100 6162 6364
         0x0020:  6566 6768 696a 6b6c 6d6e 6f70 7172 7374
         0x0030:  7576 7761 6263 6465 6667 6869
14:56:31.414474 IP 192.168.0.1 > 192.168.0.40: ICMP echo reply, id 512, 
seq 4352, length 40
         0x0000:  4500 003c cf4f 0000 ff01 6af7 c0a8 0001
         0x0010:  c0a8 0028 0000 425c 0200 1100 6162 6364
         0x0020:  6566 6768 696a 6b6c 6d6e 6f70 7172 7374
         0x0030:  7576 7761 6263 6465 6667 6869
14:56:36.404781 ARP, Request who-has 192.168.0.40 tell 192.168.0.1, 
length 46
         0x0000:  0001 0800 0604 0001 001f 3380 0944 c0a8
         0x0010:  0001 0000 0000 0000 c0a8 0028 c0a8 0001
         0x0020:  c0a8 0028 0000 425c 0200 1100 6162
14:56:36.404806 ARP, Reply 192.168.0.40 is-at 5c:9a:d8:5c:63:31, length 28
         0x0000:  0001 0800 0604 0002 5c9a d85c 6331 c0a8
         0x0010:  0028 001f 3380 0944 c0a8 0001
14:56:36.689750 IP 192.168.200.1 > 192.168.0.1: ICMP echo request, id 
512, seq 4608, length 40
         0x0000:  4500 003c 0196 0000 8001 efd7 c0a8 c801
         0x0010:  c0a8 0001 0800 395c 0200 1200 6162 6364
         0x0020:  6566 6768 696a 6b6c 6d6e 6f70 7172 7374
         0x0030:  7576 7761 6263 6465 6667 6869
14:56:36.689774 IP 192.168.0.40 > 192.168.0.1: ICMP echo request, id 
512, seq 4608, length 40
         0x0000:  4500 003c 0196 0000 7f01 b8b1 c0a8 0028
         0x0010:  c0a8 0001 0800 395c 0200 1200 6162 6364
         0x0020:  6566 6768 696a 6b6c 6d6e 6f70 7172 7374
         0x0030:  7576 7761 6263 6465 6667 6869
14:56:36.693330 IP 192.168.0.1 > 192.168.0.40: ICMP echo reply, id 512, 
seq 4608, length 40
         0x0000:  4500 003c cf50 0000 ff01 6af6 c0a8 0001
         0x0010:  c0a8 0028 0000 415c 0200 1200 6162 6364
         0x0020:  6566 6768 696a 6b6c 6d6e 6f70 7172 7374
         0x0030:  7576 7761 6263 6465 6667 6869
14:56:42.189424 IP 192.168.200.1 > 192.168.0.1: ICMP echo request, id 
512, seq 4864, length 40
         0x0000:  4500 003c 0197 0000 8001 efd6 c0a8 c801
         0x0010:  c0a8 0001 0800 385c 0200 1300 6162 6364
         0x0020:  6566 6768 696a 6b6c 6d6e 6f70 7172 7374
         0x0030:  7576 7761 6263 6465 6667 6869
14:56:42.189447 IP 192.168.0.40 > 192.168.0.1: ICMP echo request, id 
512, seq 4864, length 40
         0x0000:  4500 003c 0197 0000 7f01 b8b0 c0a8 0028
         0x0010:  c0a8 0001 0800 385c 0200 1300 6162 6364
         0x0020:  6566 6768 696a 6b6c 6d6e 6f70 7172 7374
         0x0030:  7576 7761 6263 6465 6667 6869
14:56:42.193029 IP 192.168.0.1 > 192.168.0.40: ICMP echo reply, id 512, seq 4864, length 40
         0x0000:  4500 003c cf51 0000 ff01 6af5 c0a8 0001
         0x0010:  c0a8 0028 0000 405c 0200 1300 6162 6364
         0x0020:  6566 6768 696a 6b6c 6d6e 6f70 7172 7374
         0x0030:  7576 7761 6263 6465 6667 6869
14:56:47.689414 IP 192.168.200.1 > 192.168.0.1: ICMP echo request, id 512, seq 5120, length 40
         0x0000:  4500 003c 0198 0000 8001 efd5 c0a8 c801
         0x0010:  c0a8 0001 0800 375c 0200 1400 6162 6364
         0x0020:  6566 6768 696a 6b6c 6d6e 6f70 7172 7374
         0x0030:  7576 7761 6263 6465 6667 6869
14:56:47.689439 IP 192.168.0.40 > 192.168.0.1: ICMP echo request, id 512, seq 5120, length 40
         0x0000:  4500 003c 0198 0000 7f01 b8af c0a8 0028
         0x0010:  c0a8 0001 0800 375c 0200 1400 6162 6364
         0x0020:  6566 6768 696a 6b6c 6d6e 6f70 7172 7374
         0x0030:  7576 7761 6263 6465 6667 6869
14:56:47.693661 IP 192.168.0.1 > 192.168.0.40: ICMP echo reply, id 512, seq 5120, length 40
         0x0000:  4500 003c cf52 0000 ff01 6af4 c0a8 0001
         0x0010:  c0a8 0028 0000 3f5c 0200 1400 6162 6364
         0x0020:  6566 6768 696a 6b6c 6d6e 6f70 7172 7374
         0x0030:  7576 7761 6263 6465 6667 6869
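
The first 20 bytes of each IP hex dump above are the IPv4 header, so the
trace can be checked mechanically. What follows is only a minimal Python
sketch, assuming headers without IP options; the helper name
parse_ipv4_header is made up for illustration and is not part of tcpdump
or netsniff-ng.

```python
import socket
import struct


def parse_ipv4_header(hex_dump: str) -> dict:
    """Decode the fixed 20-byte IPv4 header from tcpdump-style hex words."""
    # Drop the spaces between the 16-bit words and keep the first 20 bytes.
    raw = bytes.fromhex("".join(hex_dump.split()))[:20]
    (_ver_ihl, _tos, total_len, ident, _frag,
     ttl, proto, _csum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw)

    # The header is intact when the one's-complement sum of its ten 16-bit
    # words (checksum field included) folds to 0xffff.
    total = sum(struct.unpack("!10H", raw))
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)

    return {
        "id": ident,
        "ttl": ttl,
        "proto": proto,          # 1 means ICMP
        "length": total_len,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        "checksum_ok": total == 0xFFFF,
    }


# The three packets with seq 4608 from the trace above.
from_guest = parse_ipv4_header("4500 003c 0196 0000 8001 efd7 c0a8 c801 c0a8 0001")
after_nat = parse_ipv4_header("4500 003c 0196 0000 7f01 b8b1 c0a8 0028 c0a8 0001")
reply = parse_ipv4_header("4500 003c cf50 0000 ff01 6af6 c0a8 0001 c0a8 0028")

for name, hdr in (("from guest", from_guest), ("after NAT", after_nat), ("reply", reply)):
    print(name, hdr["src"], "->", hdr["dst"], "ttl", hdr["ttl"], "ok", hdr["checksum_ok"])
```

Read this way, the guest's request leaves 192.168.200.1 with TTL 128, the
masqueraded copy leaves 192.168.0.40 with TTL 127, and the router's reply
comes back to 192.168.0.40. The trace shows no packet addressed back to
192.168.200.1.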

Is this what you asked for?

Chris

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-09-19 15:26       ` Chris Clayton
@ 2012-09-22  6:26         ` Chris Clayton
  2012-09-27 11:50           ` Chris Clayton
  0 siblings, 1 reply; 59+ messages in thread
From: Chris Clayton @ 2012-09-22  6:26 UTC (permalink / raw)
  To: Chris Clayton; +Cc: Eric Dumazet, netdev

I guess you network developer folks are either very busy or this 
regression is proving a bit troublesome to identify, so I've opened a 
bugzilla report to keep track of it. The report number is 47761.

Chris

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-09-22  6:26         ` Chris Clayton
@ 2012-09-27 11:50           ` Chris Clayton
  2012-09-27 12:14             ` Eric Dumazet
  0 siblings, 1 reply; 59+ messages in thread
From: Chris Clayton @ 2012-09-27 11:50 UTC (permalink / raw)
  To: Chris Clayton; +Cc: Eric Dumazet, netdev, gpiez

Just for information - I've pulled Linus' tree this morning and the 
problem is still present. Also, Gunther Piaz has reported, via the 
bugzilla entry, that he too has hit this regression.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-09-27 11:50           ` Chris Clayton
@ 2012-09-27 12:14             ` Eric Dumazet
  2012-09-27 18:05               ` Chris Clayton
  0 siblings, 1 reply; 59+ messages in thread
From: Eric Dumazet @ 2012-09-27 12:14 UTC (permalink / raw)
  To: Chris Clayton; +Cc: netdev, gpiez

On Thu, 2012-09-27 at 12:50 +0100, Chris Clayton wrote:
> Just for information - I've pulled Linus' tree this morning and the 
> problem is still present. Also, Gunther Piaz has reported, via the 
> bugzilla entry, that he too has hit this regression.

I tried to reproduce the bug, and my kvm guests have no problem.

I guess you need to precisely describe how you set up your network, so
that I can reproduce the problem and eventually fix it.

Thanks

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-09-27 12:14             ` Eric Dumazet
@ 2012-09-27 18:05               ` Chris Clayton
  2012-09-27 21:03                 ` Eric Dumazet
  0 siblings, 1 reply; 59+ messages in thread
From: Chris Clayton @ 2012-09-27 18:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, gpiez

On 09/27/12 13:14, Eric Dumazet wrote:
> On Thu, 2012-09-27 at 12:50 +0100, Chris Clayton wrote:
>> Just for information - I've pulled Linus' tree this morning and the
>> problem is still present. Also, Gunther Piaz has reported, via the
>> bugzilla entry, that he too has hit this regression.
>
> I tried to reproduce the bug, and my kvm guests have no problem.
>
> I guess you need to precisely describe how you setup your network, so
> that I can reproduce the problem and eventually fix it.
>

You've seen the bits from my firewall setup script that relate to this 
issue. I start the WinXP client with another script:

#!/bin/sh
if [ -e $HOME/kvm/var/run/kvm-winxp.pid ]; then
     echo "winxp is already running ..." >&2
     exit 1
fi

# make sure the kvm modules are loaded
if test -z "$(grep '\<kvm\>' /proc/misc)"; then
     sudo modprobe kvm-intel
     while test -z "$(grep '\<kvm\>' /proc/misc)"; do
         true
     done
fi

# make sure tun module is loaded
if test ! -e /dev/net/tun; then
     sudo modprobe tun
fi

# figure out the cpu to use
QVER=$(qemu-kvm --version | cut -d' ' -f 4 | sed 's/,/./')
# assumes major version is 1
MINORVER=$(echo $QVER | cut -d'.' -f 2)
if [ $MINORVER -ge 1 ]; then
     CPU="host"
else
     CPU="qemu64"
fi

# set up the network interface
TAPDEV=$(sudo tunctl -b -u $(whoami))
sudo ifconfig $TAPDEV 192.168.200.254 netmask 255.255.255.0 \
     broadcast 192.168.200.255

# start Windows XP
qemu-kvm -drive file=$HOME/kvm/winxp.qcow2,index=0,cache=none,if=virtio \
     -cpu $CPU -smp cores=1,threads=2 -soundhw es1370 \
     -m 768 -net nic,model=virtio,macaddr=$(getmacaddr) \
     -net tap,ifname=$TAPDEV -startdate $(date +%Y-%m-%dT%H:%M:%S) \
     -name kxplaptop -pidfile $HOME/kvm/var/run/kvm-winxp.pid $*

# stop the network interface
sudo ifconfig $TAPDEV down
sudo tunctl -d $TAPDEV >/dev/null 2>&1

# tidy up
rm -f $HOME/kvm/var/run/kvm-winxp.pid


The call to getmacaddr just returns the next in a sequence of mac 
addresses. qemu-kvm is a symlink to /usr/bin/qemu-system-i386. I first 
found the problem whilst running qemu-kvm version 1.1.1 although I've 
since updated to 1.2.0.

By the way, I doubt it will make a difference, but, although my laptop 
has a 64-bit CPU, I am running a 32-bit kernel and, obviously, user space.

Let me know if you need anything else.

Thanks

> Thanks
>
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-09-27 18:05               ` Chris Clayton
@ 2012-09-27 21:03                 ` Eric Dumazet
  2012-09-27 21:17                   ` Eric Dumazet
  0 siblings, 1 reply; 59+ messages in thread
From: Eric Dumazet @ 2012-09-27 21:03 UTC (permalink / raw)
  To: Chris Clayton; +Cc: netdev, gpiez

On Thu, 2012-09-27 at 19:05 +0100, Chris Clayton wrote:
> On 09/27/12 13:14, Eric Dumazet wrote:
> > On Thu, 2012-09-27 at 12:50 +0100, Chris Clayton wrote:
> >> Just for information - I've pulled Linus' tree this morning and the
> >> problem is still present. Also, Gunther Piaz has reported, via the
> >> bugzilla entry, that he too has hit this regression.
> >
> > I tried to reproduce the bug, and my kvm guests have no problem.
> >
> > I guess you need to precisely describe how you setup your network, so
> > that I can reproduce the problem and eventually fix it.
> >
> 
> You've seen the bits from my firewall setup script that relate to this 
> issue. I start the WinXP client with another script:
> 
> #!/bin/sh
> if [ -e $HOME/kvm/var/run/kvm-winxp.pid ]; then
>      echo "winxp is already running ..." > /dev/stderr
>      exit 1
> fi
> 
> # make sure the kvm modules are loaded
> if test -z "$(grep '\<kvm\>' /proc/misc)"; then
>      sudo modprobe kvm-intel
>      while test -z "$(grep '\<kvm\>' /proc/misc)"; do
>          true
>      done
> fi
> 
> # make sure tun module is loaded
> if test ! -e /dev/net/tun; then
>      sudo modprobe tun
> fi
> 
> # figure out the cpu to use
> QVER=$(qemu-kvm --version | cut -d' ' -f 4 | sed 's/,/./')
> # assumes major version is 1
> MINORVER=$(echo $QVER | cut -d'.' -f 2)
> if [ $MINORVER -ge 1 ]; then
>      CPU="host"
> else
>      CPU="qemu64"
> fi
> 
> # set up the network interface
> TAPDEV=$(sudo tunctl -b -u $(whoami))
> sudo ifconfig $TAPDEV 192.168.200.254 netmask 255.255.255.0 broadcast 
> 192.168.200.255
> 
> # start Windows XP
> qemu-kvm -drive file=$HOME/kvm/winxp.qcow2,index=0,cache=none,if=virtio 
> -cpu $CPU -smp cores=1,threads=2 -soundhw es1370 \
>      -m 768 -net nic,model=virtio,macaddr=$(getmacaddr) -net 
> tap,ifname=$TAPDEV -startdate $(date +%Y-%m-%dT%H:%M:%S) \
>      -name kxplaptop -pidfile $HOME/kvm/var/run/kvm-winxp.pid $*
> 
> # stop the network interface
> sudo ifconfig $TAPDEV down
> sudo tunctl -d $TAPDEV &>/dev/null
> 
> # tidy up
> rm -f $HOME/kvm/var/run/kvm-winxp.pid
> 
> 
> The call to getmacaddr just returns the next in a sequence of mac 
> addresses. qemu-kvm is a symlink to /usr/bin/qemu-system-i386. I first 
> found the problem whilst running qemu-kvm version 1.1.1 although I've 
> since updated to 1.2.0.
> 
> By the way, I doubt it will make a difference, but, although my laptop 
> has a 64bit CPU, I am running a 32 bit kernel and, obviously, user space.
> 
> Let me know if you need anything else.

It works for me.

Hmm, maybe your guest is using DHCP and DHCP fails?

Could you check?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-09-27 21:03                 ` Eric Dumazet
@ 2012-09-27 21:17                   ` Eric Dumazet
  2012-09-28  6:53                     ` David Miller
  2012-09-28  9:22                     ` Chris Clayton
  0 siblings, 2 replies; 59+ messages in thread
From: Eric Dumazet @ 2012-09-27 21:17 UTC (permalink / raw)
  To: Chris Clayton, David Miller; +Cc: netdev, gpiez

On Thu, 2012-09-27 at 23:03 +0200, Eric Dumazet wrote:
> On Thu, 2012-09-27 at 19:05 +0100, Chris Clayton wrote:
> > On 09/27/12 13:14, Eric Dumazet wrote:
> > > On Thu, 2012-09-27 at 12:50 +0100, Chris Clayton wrote:
> > >> Just for information - I've pulled Linus' tree this morning and the
> > >> problem is still present. Also, Gunther Piaz has reported, via the
> > >> bugzilla entry, that he too has hit this regression.
> > >
> > > I tried to reproduce the bug, and my kvm guests have no problem.
> > >
> > > I guess you need to precisely describe how you setup your network, so
> > > that I can reproduce the problem and eventually fix it.
> > >
> > 
> > You've seen the bits from my firewall setup script that relate to this 
> > issue. I start the WinXP client with another script:
> > 
> > #!/bin/sh
> > if [ -e $HOME/kvm/var/run/kvm-winxp.pid ]; then
> >      echo "winxp is already running ..." > /dev/stderr
> >      exit 1
> > fi
> > 
> > # make sure the kvm modules are loaded
> > if test -z "$(grep '\<kvm\>' /proc/misc)"; then
> >      sudo modprobe kvm-intel
> >      while test -z "$(grep '\<kvm\>' /proc/misc)"; do
> >          true
> >      done
> > fi
> > 
> > # make sure tun module is loaded
> > if test ! -e /dev/net/tun; then
> >      sudo modprobe tun
> > fi
> > 
> > # figure out the cpu to use
> > QVER=$(qemu-kvm --version | cut -d' ' -f 4 | sed 's/,/./')
> > # assumes major version is 1
> > MINORVER=$(echo $QVER | cut -d'.' -f 2)
> > if [ $MINORVER -ge 1 ]; then
> >      CPU="host"
> > else
> >      CPU="qemu64"
> > fi
> > 
> > # set up the network interface
> > TAPDEV=$(sudo tunctl -b -u $(whoami))
> > sudo ifconfig $TAPDEV 192.168.200.254 netmask 255.255.255.0 broadcast 
> > 192.168.200.255
> > 
> > # start Windows XP
> > qemu-kvm -drive file=$HOME/kvm/winxp.qcow2,index=0,cache=none,if=virtio 
> > -cpu $CPU -smp cores=1,threads=2 -soundhw es1370 \
> >      -m 768 -net nic,model=virtio,macaddr=$(getmacaddr) -net 
> > tap,ifname=$TAPDEV -startdate $(date +%Y-%m-%dT%H:%M:%S) \
> >      -name kxplaptop -pidfile $HOME/kvm/var/run/kvm-winxp.pid $*
> > 
> > # stop the network interface
> > sudo ifconfig $TAPDEV down
> > sudo tunctl -d $TAPDEV &>/dev/null
> > 
> > # tidy up
> > rm -f $HOME/kvm/var/run/kvm-winxp.pid
> > 
> > 
> > The call to getmacaddr just returns the next in a sequence of mac 
> > addresses. qemu-kvm is a symlink to /usr/bin/qemu-system-i386. I first 
> > found the problem whilst running qemu-kvm version 1.1.1 although I've 
> > since updated to 1.2.0.
> > 
> > By the way, I doubt it will make a difference, but, although my laptop 
> > has a 64bit CPU, I am running a 32 bit kernel and, obviously, user space.
> > 
> > Let me know if you need anything else.
> 
> It works for me.
> 
> Hmm, maybe your guest is using DHCP and DHCP fails ?

Yes, that seems to be the problem. On the host I tried:

# ip ro get 8.8.8.8 from 192.168.200.1 iif tap1
8.8.8.8 from 192.168.200.1 via 172.30.42.1 dev eth0 
    cache  iif *

So if the guest tries to send a frame to 8.8.8.8, we are going to forward
the packet to eth0.

But if the guest tries to send to 255.255.255.255, we try to deliver the
packet to the host itself, instead of broadcasting to eth0.

# ip ro get 255.255.255.255 from 192.168.200.1 iif tap1
broadcast 255.255.255.255 from 192.168.200.1 dev lo 
    cache <local,brd>  iif *
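
The two lookups above differ only in the class of the destination
address. A rough Python sketch of that distinction follows, assuming the
guest network 192.168.200.0/24 from the setup described earlier in the
thread. classify_destination and route_get_argv are hypothetical helpers,
and the returned labels are illustrative names, not kernel route types.

```python
import ipaddress

LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def classify_destination(dst: str, guest_net: str = "192.168.200.0/24") -> str:
    """Label a destination as seen from the host's tap interface."""
    addr = ipaddress.IPv4Address(dst)
    net = ipaddress.IPv4Network(guest_net)
    if addr == LIMITED_BROADCAST:
        return "limited-broadcast"   # the 255.255.255.255 case above
    if addr == net.broadcast_address:
        return "subnet-broadcast"    # 192.168.200.255 for the assumed /24
    if addr in net:
        return "on-link"
    return "routed"                  # e.g. 8.8.8.8, forwarded via eth0


def route_get_argv(dst: str, src: str = "192.168.200.1", iif: str = "tap1") -> list:
    """Build the argv for the 'ip route get' lookups shown above."""
    return ["ip", "route", "get", dst, "from", src, "iif", iif]


for dst in ("8.8.8.8", "255.255.255.255"):
    print(classify_destination(dst), " ".join(route_get_argv(dst)))
```

Passing the argv to subprocess.run on the host would repeat the lookup;
the interface name tap1 is taken from the commands above and will differ
on other setups.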


David, maybe you'll have an idea?

Thanks

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-09-27 21:17                   ` Eric Dumazet
@ 2012-09-28  6:53                     ` David Miller
  2012-09-28  9:14                       ` Chris Clayton
  2012-09-28  9:22                     ` Chris Clayton
  1 sibling, 1 reply; 59+ messages in thread
From: David Miller @ 2012-09-28  6:53 UTC (permalink / raw)
  To: eric.dumazet; +Cc: chris2553, netdev, gpiez

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 27 Sep 2012 23:17:04 +0200

> Yes it seems the problem. On the host I tried :
> 
> # ip ro get 8.8.8.8 from 192.168.200.1 iif tap1
> 8.8.8.8 from 192.168.200.1 via 172.30.42.1 dev eth0 
>     cache  iif *
> 
> So if the guest tries to send a frame to 8.8.8.8 we are going to forward
> the packet to eth0
> 
> But if the guest tries to send to 255.255.255.255, we try to deliver the
> packet to the host itself, instead of broadcasting to eth0
> 
> # ip ro get 255.255.255.255 from 192.168.200.1 iif tap1
> broadcast 255.255.255.255 from 192.168.200.1 dev lo 
>     cache <local,brd>  iif *
> 
> David, maybe you'll have an idea ?

Perhaps this was introduced by:

commit 7bd86cc282a458b66c41e3f6676de6656c99b8db
Author: Yan, Zheng <zheng.z.yan@intel.com>
Date:   Sun Aug 12 20:09:59 2012 +0000

    ipv4: Cache local output routes
    
    Commit caacf05e5ad1abf causes big drop of UDP loop back performance.
    The cause of the regression is that we do not cache the local output
    routes. Each time we send a datagram from unconnected UDP socket,
    the kernel allocates a dst_entry and adds it to the rt_uncached_list.
    It creates lock contention on the rt_uncached_lock.
    
    Reported-by: Alex Shi <alex.shi@intel.com>
    Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index e4ba974..fd9ecb5 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2028,7 +2028,6 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
 		}
 		dev_out = net->loopback_dev;
 		fl4->flowi4_oif = dev_out->ifindex;
-		res.fi = NULL;
 		flags |= RTCF_LOCAL;
 		goto make_route;
 	}

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-09-28  6:53                     ` David Miller
@ 2012-09-28  9:14                       ` Chris Clayton
  0 siblings, 0 replies; 59+ messages in thread
From: Chris Clayton @ 2012-09-28  9:14 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev, gpiez



On 09/28/12 07:53, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 27 Sep 2012 23:17:04 +0200
>
>> Yes it seems the problem. On the host I tried :
>>
>> # ip ro get 8.8.8.8 from 192.168.200.1 iif tap1
>> 8.8.8.8 from 192.168.200.1 via 172.30.42.1 dev eth0
>>      cache  iif *
>>
>> So if the guest tries to send a frame to 8.8.8.8 we are going to forward
>> the packet to eth0
>>
>> But if the guest tries to send to 255.255.255.255, we try to deliver the
>> packet to the host itself, instead of broadcasting to eth0
>>
>> # ip ro get 255.255.255.255 from 192.168.200.1 iif tap1
>> broadcast 255.255.255.255 from 192.168.200.1 dev lo
>>      cache <local,brd>  iif *
>>
>> David, maybe you'll have an idea ?
>
> Perhaps this was introduced by:

Thanks, David.

Unfortunately, reversing that patch does not fix the problem. The pings 
from the KVM client to the router still time out.

I have bisected this (see
http://marc.info/?l=linux-netdev&m=134797809611847&w=2) and the result was:

$ git bisect bad
d2d68ba9fe8b38eb03124b3176a013bb8aa2b5e5 is the first bad commit
commit d2d68ba9fe8b38eb03124b3176a013bb8aa2b5e5
Author: David S. Miller <davem@davemloft.net>
Date:   Tue Jul 17 12:58:50 2012 -0700

     ipv4: Cache input routes in fib_info nexthops.

     Caching input routes is slightly simpler than output routes, since we
     don't need to be concerned with nexthop exceptions.  (locally
     destined, and routed packets, never trigger PMTU events or redirects
     that will be processed by us).

     However, we have to elide caching for the DIRECTSRC and non-zero itag
     cases.

     Signed-off-by: David S. Miller <davem@davemloft.net>

:040000 040000 6bbc75c1cbe62bf84ea412d3b98adf2b614779cd 3ad7256b4a71e63ca4530977c0550121ea803d35 M      include
:040000 040000 18c2a950a53c4eec9bfa12185d1e382dfed74af8 a2ab6157d6cd54930da395758c6ded3a225d1f04 M      net

Unfortunately, the related patches don't reverse cleanly, but a kernel 
built from a git checkout of the parent commit
(f2bb4bedf35d5167a073dcdddf16543f351ef3ae) works fine.
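
For readers unfamiliar with bisection, the search behind that result can
be sketched in a few lines of Python. This is an illustration of the
idea only, assuming a linear history with a known-good first entry and a
known-bad last entry; it is not how git bisect is implemented, and the
history list below is a made-up stand-in apart from the two abbreviated
commit ids quoted above.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in an oldest-to-newest list.

    Assumes commits[0] is good, commits[-1] is bad, and that every commit
    after the first bad one is bad as well.
    """
    lo, hi = 0, len(commits) - 1     # lo is always good, hi is always bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):     # in practice: build, boot, ping the router
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical linear history; only f2bb4bed and d2d68ba9 come from the thread.
history = ["v3.5", "older-commit", "f2bb4bed", "d2d68ba9", "later-commit", "v3.6-rc6"]
culprit = first_bad(history, lambda c: history.index(c) >= history.index("d2d68ba9"))
print(culprit)
```

Each test halves the remaining range, which is why the parent commit
ends up as the last good kernel and its child as the first bad one.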

>
> commit 7bd86cc282a458b66c41e3f6676de6656c99b8db
> Author: Yan, Zheng <zheng.z.yan@intel.com>
> Date:   Sun Aug 12 20:09:59 2012 +0000
>
>      ipv4: Cache local output routes
>
>      Commit caacf05e5ad1abf causes big drop of UDP loop back performance.
>      The cause of the regression is that we do not cache the local output
>      routes. Each time we send a datagram from unconnected UDP socket,
>      the kernel allocates a dst_entry and adds it to the rt_uncached_list.
>      It creates lock contention on the rt_uncached_lock.
>
>      Reported-by: Alex Shi <alex.shi@intel.com>
>      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
>      Signed-off-by: David S. Miller <davem@davemloft.net>
>
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index e4ba974..fd9ecb5 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -2028,7 +2028,6 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
>   		}
>   		dev_out = net->loopback_dev;
>   		fl4->flowi4_oif = dev_out->ifindex;
> -		res.fi = NULL;
>   		flags |= RTCF_LOCAL;
>   		goto make_route;
>   	}
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-09-27 21:17                   ` Eric Dumazet
  2012-09-28  6:53                     ` David Miller
@ 2012-09-28  9:22                     ` Chris Clayton
  2012-09-28 11:26                       ` Eric Dumazet
  1 sibling, 1 reply; 59+ messages in thread
From: Chris Clayton @ 2012-09-28  9:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, gpiez



On 09/27/12 22:17, Eric Dumazet wrote:
> On Thu, 2012-09-27 at 23:03 +0200, Eric Dumazet wrote:
>> On Thu, 2012-09-27 at 19:05 +0100, Chris Clayton wrote:
>>> On 09/27/12 13:14, Eric Dumazet wrote:
>>>> On Thu, 2012-09-27 at 12:50 +0100, Chris Clayton wrote:
>>>>> Just for information - I've pulled Linus' tree this morning and the
>>>>> problem is still present. Also, Gunther Piaz has reported, via the
>>>>> bugzilla entry, that he too has hit this regression.
>>>>
>>>> I tried to reproduce the bug, and my kvm guests have no problem.
>>>>
>>>> I guess you need to precisely describe how you setup your network, so
>>>> that I can reproduce the problem and eventually fix it.
>>>>
>>>
>>> You've seen the bits from my firewall setup script that relate to this
>>> issue. I start the WinXP client with another script:
>>>
>>> #!/bin/sh
>>> if [ -e $HOME/kvm/var/run/kvm-winxp.pid ]; then
>>>       echo "winxp is already running ..." > /dev/stderr
>>>       exit 1
>>> fi
>>>
>>> # make sure the kvm modules are loaded
>>> if test -z "$(grep '\<kvm\>' /proc/misc)"; then
>>>       sudo modprobe kvm-intel
>>>       while test -z "$(grep '\<kvm\>' /proc/misc)"; do
>>>           true
>>>       done
>>> fi
>>>
>>> # make sure tun module is loaded
>>> if test ! -e /dev/net/tun; then
>>>       sudo modprobe tun
>>> fi
>>>
>>> # figure out the cpu to use
>>> QVER=$(qemu-kvm --version | cut -d' ' -f 4 | sed 's/,/./')
>>> # assumes major version is 1
>>> MINORVER=$(echo $QVER | cut -d'.' -f 2)
>>> if [ $MINORVER -ge 1 ]; then
>>>       CPU="host"
>>> else
>>>       CPU="qemu64"
>>> fi
>>>
>>> # set up the network interface
>>> TAPDEV=$(sudo tunctl -b -u $(whoami))
>>> sudo ifconfig $TAPDEV 192.168.200.254 netmask 255.255.255.0 broadcast 192.168.200.255
>>>
>>> # start Windows XP
>>> qemu-kvm -drive file=$HOME/kvm/winxp.qcow2,index=0,cache=none,if=virtio \
>>>       -cpu $CPU -smp cores=1,threads=2 -soundhw es1370 \
>>>       -m 768 -net nic,model=virtio,macaddr=$(getmacaddr) \
>>>       -net tap,ifname=$TAPDEV -startdate $(date +%Y-%m-%dT%H:%M:%S) \
>>>       -name kxplaptop -pidfile $HOME/kvm/var/run/kvm-winxp.pid $*
>>>
>>> # stop the network interface
>>> sudo ifconfig $TAPDEV down
>>> sudo tunctl -d $TAPDEV &>/dev/null
>>>
>>> # tidy up
>>> rm -f $HOME/kvm/var/run/kvm-winxp.pid
>>>
>>>
>>> The call to getmacaddr just returns the next in a sequence of mac
>>> addresses. qemu-kvm is a symlink to /usr/bin/qemu-system-i386. I first
>>> found the problem whilst running qemu-kvm version 1.1.1 although I've
>>> since updated to 1.2.0.
>>>
>>> By the way, I doubt it will make a difference, but, although my laptop
>>> has a 64bit CPU, I am running a 32 bit kernel and, obviously, user space.
>>>
>>> Let me know if you need anything else.
>>
>> It works for me.
>>
>> Hmm, maybe your guest is using DHCP and DHCP fails ?

No, the WinXP guest is configured with a fixed IP address 
(192.168.200.1). Subnet mask is 255.255.255.0, and default gateway is 
192.168.200.254. DNS is 192.168.0.1.

>
> Yes it seems the problem. On the host I tried :
>
> # ip ro get 8.8.8.8 from 192.168.200.1 iif tap1
> 8.8.8.8 from 192.168.200.1 via 172.30.42.1 dev eth0
>      cache  iif *
>
> So if the guest tries to send a frame to 8.8.8.8 we are going to forward
> the packet to eth0
>
> But if the guest tries to send to 255.255.255.255, we try to deliver the
> packet to the host itself, instead of broadcasting to eth0
>
> # ip ro get 255.255.255.255 from 192.168.200.1 iif tap1
> broadcast 255.255.255.255 from 192.168.200.1 dev lo
>      cache <local,brd>  iif *
>
>
> David, maybe you'll have an idea ?
>
> Thanks
>
>
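
For anyone repeating the two lookups quoted above, here is a minimal Python sketch that parses the first line of `ip route get` output and flags lookups that resolve to the loopback device instead of being forwarded. The helper names (`parse_route_get`, `is_forwarded`) are hypothetical, the list of route-type keywords is an assumed subset, and the parser only understands the one-line format shown in the quoted output.

```python
# Route-type keywords that iproute2 may print before the destination (assumed subset).
ROUTE_TYPES = {"unicast", "local", "broadcast", "multicast", "anycast",
               "unreachable", "prohibit", "blackhole", "throw", "nat"}


def parse_route_get(line):
    """Parse the first line of `ip route get` output into a dict (hypothetical helper)."""
    tokens = line.split()
    route = {"type": "unicast"}  # no leading type word means an ordinary unicast route
    if tokens[0] in ROUTE_TYPES:
        route["type"] = tokens.pop(0)
    route["dst"] = tokens.pop(0)
    # The remaining tokens are keyword/value pairs such as "dev eth0".
    for key, value in zip(tokens[0::2], tokens[1::2]):
        if key in ("from", "via", "dev"):
            route[key] = value
    return route


def is_forwarded(route):
    """True when the lookup is a unicast route leaving through a non-loopback device."""
    return route["type"] == "unicast" and route.get("dev") != "lo"


# The two result lines quoted above.
unicast = parse_route_get("8.8.8.8 from 192.168.200.1 via 172.30.42.1 dev eth0")
limited = parse_route_get("broadcast 255.255.255.255 from 192.168.200.1 dev lo")
print(unicast["dev"], is_forwarded(unicast))
print(limited["dev"], is_forwarded(limited))
```

The sketch only classifies the output of the lookup; it says nothing about why the kernel chose that route.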

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-09-28  9:22                     ` Chris Clayton
@ 2012-09-28 11:26                       ` Eric Dumazet
  2012-09-28 14:28                         ` Chris Clayton
  2012-09-30 15:26                         ` Chris Clayton
  0 siblings, 2 replies; 59+ messages in thread
From: Eric Dumazet @ 2012-09-28 11:26 UTC (permalink / raw)
  To: Chris Clayton; +Cc: David Miller, netdev, gpiez

On Fri, 2012-09-28 at 10:22 +0100, Chris Clayton wrote:

> No, the WinXP guest is configured with a fixed IP address 
> (192.168.200.1). Subnet mask is 255.255.255.0, and default gateway is 
> 192.168.200.254. DNS is 192.168.0.1.
> 

I have no problem with such a setup when using a Linux guest.

Could you send a tcpdump again, this time including the link-level
header (option -e)?

Ideally, you could send two traces: one taken on tap0 and another taken
on eth0.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-09-28 11:26                       ` Eric Dumazet
@ 2012-09-28 14:28                         ` Chris Clayton
  2012-09-30 15:26                         ` Chris Clayton
  1 sibling, 0 replies; 59+ messages in thread
From: Chris Clayton @ 2012-09-28 14:28 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, gpiez



On 09/28/12 12:26, Eric Dumazet wrote:
> On Fri, 2012-09-28 at 10:22 +0100, Chris Clayton wrote:
>
>> No, the WinXP guest is configured with a fixed IP address
>> (192.168.200.1). Subnet mask is 255.255.255.0, and default gateway is
>> 192.168.200.254. DNS is 192.168.0.1.
>>
>
> I have no problem with such a setup, with a linux guest.
>
> Could you send again a tcpdump, but including link-level header ?
> (option -e)
>
> Ideally, you could send two traces, one taken on tap0, and another taken
> on eth0.
>

Two traces

Trace 1 - tap0 (192.168.200.254) whilst pinging router (192.168.0.1) from 
KVM guest (192.168.200.1):

15:03:14.953599 52:54:0c:3b:17:38 > Broadcast, ethertype ARP (0x0806), 
length 42: Request who-has 192.168.200.254 tell 192.168.200.1, length 28
15:03:14.953617 9e:c3:0c:c8:65:8d > 52:54:0c:3b:17:38, ethertype ARP 
(0x0806), length 42: Reply 192.168.200.254 is-at 9e:c3:0c:c8:65:8d, 
length 28
15:03:14.953725 52:54:0c:3b:17:38 > 9e:c3:0c:c8:65:8d, ethertype IPv4 
(0x0800), length 74: 192.168.200.1 > 192.168.0.1: ICMP echo request, id 
512, seq 5376, length 40
15:03:20.427278 52:54:0c:3b:17:38 > 9e:c3:0c:c8:65:8d, ethertype IPv4 
(0x0800), length 74: 192.168.200.1 > 192.168.0.1: ICMP echo request, id 
512, seq 5632, length 40
15:03:25.942215 52:54:0c:3b:17:38 > 9e:c3:0c:c8:65:8d, ethertype IPv4 
(0x0800), length 74: 192.168.200.1 > 192.168.0.1: ICMP echo request, id 
512, seq 5888, length 40
15:03:31.455578 52:54:0c:3b:17:38 > 9e:c3:0c:c8:65:8d, ethertype IPv4 
(0x0800), length 74: 192.168.200.1 > 192.168.0.1: ICMP echo request, id 
512, seq 6144, length 40

Trace 2 - eth0 (192.168.0.40) whilst pinging router (192.168.0.1) from 
KVM guest (192.168.200.1):

15:04:06.427863 5c:9a:d8:5c:63:31 > 00:1f:33:80:09:44, ethertype IPv4 
(0x0800), length 74: 192.168.0.40 > 192.168.0.1: ICMP echo request, id 
512, seq 6400, length 40
15:04:06.432100 00:1f:33:80:09:44 > 5c:9a:d8:5c:63:31, ethertype IPv4 
(0x0800), length 74: 192.168.0.1 > 192.168.0.40: ICMP echo reply, id 
512, seq 6400, length 40
15:04:11.430877 00:1f:33:80:09:44 > 5c:9a:d8:5c:63:31, ethertype ARP 
(0x0806), length 60: Request who-has 192.168.0.40 tell 192.168.0.1, 
length 46
15:04:11.430898 5c:9a:d8:5c:63:31 > 00:1f:33:80:09:44, ethertype ARP 
(0x0806), length 42: Reply 192.168.0.40 is-at 5c:9a:d8:5c:63:31, length 28
15:04:11.567319 5c:9a:d8:5c:63:31 > 00:1f:33:80:09:44, ethertype IPv4 
(0x0800), length 74: 192.168.0.40 > 192.168.0.1: ICMP echo request, id 
512, seq 6656, length 40
15:04:11.571534 00:1f:33:80:09:44 > 5c:9a:d8:5c:63:31, ethertype IPv4 
(0x0800), length 74: 192.168.0.1 > 192.168.0.40: ICMP echo reply, id 
512, seq 6656, length 40
15:04:16.577137 5c:9a:d8:5c:63:31 > 00:1f:33:80:09:44, ethertype ARP 
(0x0806), length 42: Request who-has 192.168.0.1 tell 192.168.0.40, 
length 28
15:04:16.580373 00:1f:33:80:09:44 > 5c:9a:d8:5c:63:31, ethertype ARP 
(0x0806), length 60: Reply 192.168.0.1 is-at 00:1f:33:80:09:44, length 46
15:04:17.083328 5c:9a:d8:5c:63:31 > 00:1f:33:80:09:44, ethertype IPv4 
(0x0800), length 74: 192.168.0.40 > 192.168.0.1: ICMP echo request, id 
512, seq 6912, length 40
15:04:17.086854 00:1f:33:80:09:44 > 5c:9a:d8:5c:63:31, ethertype IPv4 
(0x0800), length 74: 192.168.0.1 > 192.168.0.40: ICMP echo reply, id 
512, seq 6912, length 40
15:04:22.585766 5c:9a:d8:5c:63:31 > 00:1f:33:80:09:44, ethertype IPv4 
(0x0800), length 74: 192.168.0.40 > 192.168.0.1: ICMP echo request, id 
512, seq 7168, length 40
15:04:22.589989 00:1f:33:80:09:44 > 5c:9a:d8:5c:63:31, ethertype IPv4 
(0x0800), length 74: 192.168.0.1 > 192.168.0.40: ICMP echo reply, id 
512, seq 7168, length 40
15:04:32.240422 00:19:fb:be:cb:55 > 01:00:5e:7f:ff:fa, ethertype IPv4 
(0x0800), length 446: 192.168.0.112.2704 > 239.255.255.250.1900: UDP, 
length 404
15:04:32.241404 00:19:fb:be:cb:55 > 01:00:5e:7f:ff:fa, ethertype IPv4 
(0x0800), length 455: 192.168.0.112.2704 > 239.255.255.250.1900: UDP, 
length 413
15:04:32.242915 00:19:fb:be:cb:55 > 01:00:5e:7f:ff:fa, ethertype IPv4 
(0x0800), length 494: 192.168.0.112.2704 > 239.255.255.250.1900: UDP, 
length 452
15:04:32.243986 00:19:fb:be:cb:55 > 01:00:5e:7f:ff:fa, ethertype IPv4 
(0x0800), length 490: 192.168.0.112.1434 > 239.255.255.250.1900: UDP, 
length 448
15:04:32.245476 00:19:fb:be:cb:55 > 01:00:5e:7f:ff:fa, ethertype IPv4 
(0x0800), length 486: 192.168.0.112.2901 > 239.255.255.250.1900: UDP, 
length 444
15:04:32.246545 00:19:fb:be:cb:55 > 01:00:5e:7f:ff:fa, ethertype IPv4 
(0x0800), length 486: 192.168.0.112.3828 > 239.255.255.250.1900: UDP, 
length 444
15:04:32.342459 00:19:fb:be:cb:55 > 01:00:5e:7f:ff:fa, ethertype IPv4 
(0x0800), length 446: 192.168.0.112.4445 > 239.255.255.250.1900: UDP, 
length 404
15:04:32.343506 00:19:fb:be:cb:55 > 01:00:5e:7f:ff:fa, ethertype IPv4 
(0x0800), length 455: 192.168.0.112.4445 > 239.255.255.250.1900: UDP, 
length 413
15:04:32.345017 00:19:fb:be:cb:55 > 01:00:5e:7f:ff:fa, ethertype IPv4 
(0x0800), length 494: 192.168.0.112.4445 > 239.255.255.250.1900: UDP, 
length 452
15:04:32.346087 00:19:fb:be:cb:55 > 01:00:5e:7f:ff:fa, ethertype IPv4 
(0x0800), length 490: 192.168.0.112.2735 > 239.255.255.250.1900: UDP, 
length 448
15:04:32.348314 00:19:fb:be:cb:55 > 01:00:5e:7f:ff:fa, ethertype IPv4 
(0x0800), length 486: 192.168.0.112.4940 > 239.255.255.250.1900: UDP, 
length 444
15:04:32.349362 00:19:fb:be:cb:55 > 01:00:5e:7f:ff:fa, ethertype IPv4 
(0x0800), length 486: 192.168.0.112.1029 > 239.255.255.250.1900: UDP, 
length 444

The second trace seems to contain some UPnP-related traffic involving my 
satellite TV box. If it would help, I can turn that off when my wife 
isn't watching TV, and run the traces again.
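
One rough way to read the two traces above is to list the echo requests that never get a reply on each interface. The Python sketch below does that with a regular expression. The function names are hypothetical, the samples are excerpts copied from the traces, and because the two traces were not taken at the same time the comparison is only indicative.

```python
import re

# The mail client wrapped the tcpdump lines, so the pattern allows any
# whitespace (including newlines) between the fields.
ECHO_RE = re.compile(r"ICMP echo (request|reply),\s+id\s+(\d+),\s+seq\s+(\d+)")


def echo_summary(trace_text):
    """Return (request sequence numbers, reply sequence numbers) seen in a trace."""
    requests, replies = set(), set()
    for kind, _ident, seq in ECHO_RE.findall(trace_text):
        (requests if kind == "request" else replies).add(int(seq))
    return requests, replies


def unanswered(trace_text):
    """Sequence numbers of echo requests with no matching reply in the same trace."""
    requests, replies = echo_summary(trace_text)
    return sorted(requests - replies)


# Excerpt from trace 1 (tap0): a request only.
TAP0_SAMPLE = (
    "15:03:14.953725 52:54:0c:3b:17:38 > 9e:c3:0c:c8:65:8d, ethertype IPv4 \n"
    "(0x0800), length 74: 192.168.200.1 > 192.168.0.1: ICMP echo request, id \n"
    "512, seq 5376, length 40\n"
)

# Excerpt from trace 2 (eth0): a request followed by its reply.
ETH0_SAMPLE = (
    "15:04:06.427863 5c:9a:d8:5c:63:31 > 00:1f:33:80:09:44, ethertype IPv4 \n"
    "(0x0800), length 74: 192.168.0.40 > 192.168.0.1: ICMP echo request, id \n"
    "512, seq 6400, length 40\n"
    "15:04:06.432100 00:1f:33:80:09:44 > 5c:9a:d8:5c:63:31, ethertype IPv4 \n"
    "(0x0800), length 74: 192.168.0.1 > 192.168.0.40: ICMP echo reply, id \n"
    "512, seq 6400, length 40\n"
)

print("tap0 unanswered:", unanswered(TAP0_SAMPLE))
print("eth0 unanswered:", unanswered(ETH0_SAMPLE))
```

Run against the complete traces, this should show every request on eth0 answered by the router and none of the requests on tap0 answered.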

Chris


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-09-28 11:26                       ` Eric Dumazet
  2012-09-28 14:28                         ` Chris Clayton
@ 2012-09-30 15:26                         ` Chris Clayton
  2012-09-30 19:45                           ` Eric Dumazet
  1 sibling, 1 reply; 59+ messages in thread
From: Chris Clayton @ 2012-09-30 15:26 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, gpiez

Hi Eric,

On 09/28/12 12:26, Eric Dumazet wrote:
> On Fri, 2012-09-28 at 10:22 +0100, Chris Clayton wrote:
>
>> No, the WinXP guest is configured with a fixed IP address
>> (192.168.200.1). Subnet mask is 255.255.255.0, and default gateway is
>> 192.168.200.254. DNS is 192.168.0.1.
>>
>
> I have no problem with such a setup, with a linux guest.
>
> Could you send again a tcpdump, but including link-level header ?
> (option -e)
>
> Ideally, you could send two traces, one taken on tap0, and another taken
> on eth0.
>
Below are two more traces that I think may well be more useful than 
those I sent on Friday. They are taken with tcpdump directly (after some 
reading up on that application) rather than tcpdump translations of pcap 
files captured with netsniff-ng. Also, they are taken concurrently, so 
they show the traffic on tap0 and eth0 at the time of an unsuccessful 
attempt to ping the router from the WinXP KVM client. The command was: 
sudo tcpdump -nev -i eth0 -Z chris >eth0.trace

tap0:
16:05:14.909057 52:54:0c:3b:17:39 > 62:4e:ff:6b:0d:ce, ethertype IPv4 
(0x0800), length 64: (tos 0x0, ttl 128, id 286, offset 0, flags [none], 
proto UDP (17), length 50)
     192.168.200.1.49391 > 192.168.0.1.domain: 33727+ A? wpad. (22)
16:05:21.909026 52:54:0c:3b:17:39 > Broadcast, ethertype IPv4 (0x0800), 
length 92: (tos 0x0, ttl 128, id 287, offset 0, flags [none], proto UDP 
(17), length 78)
     192.168.200.1.netbios-ns > 192.168.200.255.netbios-ns: NBT UDP 
PACKET(137): QUERY; REQUEST; BROADCAST
16:05:21.909123 62:4e:ff:6b:0d:ce > Broadcast, ethertype IPv4 (0x0800), 
length 264: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP 
(17), length 250)
     192.168.200.254.netbios-dgm > 192.168.200.255.netbios-dgm: NBT UDP 
PACKET(138)
16:05:21.909141 62:4e:ff:6b:0d:ce > Broadcast, ethertype IPv4 (0x0800), 
length 249: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP 
(17), length 235)
     192.168.200.254.netbios-dgm > 192.168.200.255.netbios-dgm: NBT UDP 
PACKET(138)
16:05:22.261009 52:54:0c:3b:17:39 > 62:4e:ff:6b:0d:ce, ethertype IPv4 
(0x0800), length 74: (tos 0x0, ttl 128, id 288, offset 0, flags [none], 
proto ICMP (1), length 60)
     192.168.200.1 > 192.168.0.1: ICMP echo request, id 512, seq 3840, 
length 40
16:05:22.704716 52:54:0c:3b:17:39 > Broadcast, ethertype IPv4 (0x0800), 
length 92: (tos 0x0, ttl 128, id 289, offset 0, flags [none], proto UDP 
(17), length 78)
     192.168.200.1.netbios-ns > 192.168.200.255.netbios-ns: NBT UDP 
PACKET(137): QUERY; REQUEST; BROADCAST
16:05:23.457224 52:54:0c:3b:17:39 > Broadcast, ethertype IPv4 (0x0800), 
length 92: (tos 0x0, ttl 128, id 290, offset 0, flags [none], proto UDP 
(17), length 78)
     192.168.200.1.netbios-ns > 192.168.200.255.netbios-ns: NBT UDP 
PACKET(137): QUERY; REQUEST; BROADCAST
16:05:24.208015 52:54:0c:3b:17:39 > 62:4e:ff:6b:0d:ce, ethertype IPv4 
(0x0800), length 82: (tos 0x0, ttl 128, id 291, offset 0, flags [none], 
proto UDP (17), length 68)
     192.168.200.1.56551 > 192.168.0.1.domain: 63293+ A? 
download.microsoft.com. (40)
16:05:25.204731 52:54:0c:3b:17:39 > 62:4e:ff:6b:0d:ce, ethertype IPv4 
(0x0800), length 82: (tos 0x0, ttl 128, id 292, offset 0, flags [none], 
proto UDP (17), length 68)
     192.168.200.1.56551 > 192.168.0.1.domain: 63293+ A? 
download.microsoft.com. (40)
16:05:26.204743 52:54:0c:3b:17:39 > 62:4e:ff:6b:0d:ce, ethertype IPv4 
(0x0800), length 82: (tos 0x0, ttl 128, id 293, offset 0, flags [none], 
proto UDP (17), length 68)
     192.168.200.1.56551 > 192.168.0.1.domain: 63293+ A? 
download.microsoft.com. (40)
16:05:27.580723 52:54:0c:3b:17:39 > 62:4e:ff:6b:0d:ce, ethertype IPv4 
(0x0800), length 74: (tos 0x0, ttl 128, id 294, offset 0, flags [none], 
proto ICMP (1), length 60)
     192.168.200.1 > 192.168.0.1: ICMP echo request, id 512, seq 4096, 
length 40
16:05:28.204764 52:54:0c:3b:17:39 > 62:4e:ff:6b:0d:ce, ethertype IPv4 
(0x0800), length 82: (tos 0x0, ttl 128, id 295, offset 0, flags [none], 
proto UDP (17), length 68)
     192.168.200.1.56551 > 192.168.0.1.domain: 63293+ A? 
download.microsoft.com. (40)
16:05:32.204731 52:54:0c:3b:17:39 > 62:4e:ff:6b:0d:ce, ethertype IPv4 
(0x0800), length 82: (tos 0x0, ttl 128, id 296, offset 0, flags [none], 
proto UDP (17), length 68)
     192.168.200.1.56551 > 192.168.0.1.domain: 63293+ A? 
download.microsoft.com. (40)
16:05:33.080759 52:54:0c:3b:17:39 > 62:4e:ff:6b:0d:ce, ethertype IPv4 
(0x0800), length 74: (tos 0x0, ttl 128, id 297, offset 0, flags [none], 
proto ICMP (1), length 60)
     192.168.200.1 > 192.168.0.1: ICMP echo request, id 512, seq 4352, 
length 40
16:05:38.582182 52:54:0c:3b:17:39 > 62:4e:ff:6b:0d:ce, ethertype IPv4 
(0x0800), length 74: (tos 0x0, ttl 128, id 298, offset 0, flags [none], 
proto ICMP (1), length 60)
     192.168.200.1 > 192.168.0.1: ICMP echo request, id 512, seq 4608, 
length 40
16:05:39.218737 52:54:0c:3b:17:39 > 62:4e:ff:6b:0d:ce, ethertype IPv4 
(0x0800), length 64: (tos 0x0, ttl 128, id 299, offset 0, flags [none], 
proto UDP (17), length 50)
     192.168.200.1.60955 > 192.168.0.1.domain: 26953+ A? wpad. (22)
16:05:40.204735 52:54:0c:3b:17:39 > 62:4e:ff:6b:0d:ce, ethertype IPv4 
(0x0800), length 64: (tos 0x0, ttl 128, id 300, offset 0, flags [none], 
proto UDP (17), length 50)
     192.168.200.1.60955 > 192.168.0.1.domain: 26953+ A? wpad. (22)
16:05:41.204721 52:54:0c:3b:17:39 > 62:4e:ff:6b:0d:ce, ethertype IPv4 
(0x0800), length 64: (tos 0x0, ttl 128, id 301, offset 0, flags [none], 
proto UDP (17), length 50)
     192.168.200.1.60955 > 192.168.0.1.domain: 26953+ A? wpad. (22)
16:05:43.238517 52:54:0c:3b:17:39 > 62:4e:ff:6b:0d:ce, ethertype IPv4 
(0x0800), length 64: (tos 0x0, ttl 128, id 302, offset 0, flags [none], 
proto UDP (17), length 50)
     192.168.200.1.60955 > 192.168.0.1.domain: 26953+ A? wpad. (22)
16:05:47.236721 52:54:0c:3b:17:39 > 62:4e:ff:6b:0d:ce, ethertype IPv4 
(0x0800), length 64: (tos 0x0, ttl 128, id 303, offset 0, flags [none], 
proto UDP (17), length 50)
     192.168.200.1.60955 > 192.168.0.1.domain: 26953+ A? wpad. (22)


eth0:

16:05:22.261037 5c:9a:d8:5c:63:31 > 00:1f:33:80:09:44, ethertype IPv4 
(0x0800), length 74: (tos 0x0, ttl 127, id 288, offset 0, flags [none], 
proto ICMP (1), length 60)
     192.168.0.40 > 192.168.0.1: ICMP echo request, id 512, seq 3840, 
length 40
16:05:22.264612 00:1f:33:80:09:44 > 5c:9a:d8:5c:63:31, ethertype IPv4 
(0x0800), length 74: (tos 0x0, ttl 255, id 53593, offset 0, flags 
[none], proto ICMP (1), length 60)
     192.168.0.1 > 192.168.0.40: ICMP echo reply, id 512, seq 3840, 
length 40
16:05:24.208041 5c:9a:d8:5c:63:31 > 00:1f:33:80:09:44, ethertype IPv4 
(0x0800), length 82: (tos 0x0, ttl 127, id 291, offset 0, flags [none], 
proto UDP (17), length 68)
     192.168.0.40.56551 > 192.168.0.1.domain: 63293+ A? 
download.microsoft.com. (40)
16:05:24.270825 00:1f:33:80:09:44 > 5c:9a:d8:5c:63:31, ethertype IPv4 
(0x0800), length 426: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], 
proto UDP (17), length 412)
     192.168.0.1.domain > 192.168.0.40.56551: 63293 7/8/0 
download.microsoft.com. CNAME download.microsoft.com.nsatc.net., 
download.microsoft.com.nsatc.net. CNAME main.dl.ms.akadns.net., 
main.dl.ms.akadns.net. CNAME intl.dl.ms.akadns.net., 
intl.dl.ms.akadns.net. CNAME dl.ms.georedirector.akadns.net., 
dl.ms.georedirector.akadns.net. CNAME a767.ms.akamai.net., 
a767.ms.akamai.net. A 90.223.216.161, a767.ms.akamai.net. A 
90.223.216.153 (384)
16:05:25.204745 5c:9a:d8:5c:63:31 > 00:1f:33:80:09:44, ethertype IPv4 
(0x0800), length 82: (tos 0x0, ttl 127, id 292, offset 0, flags [none], 
proto UDP (17), length 68)
     192.168.0.40.56551 > 192.168.0.1.domain: 63293+ A? 
download.microsoft.com. (40)
16:05:25.266414 00:1f:33:80:09:44 > 5c:9a:d8:5c:63:31, ethertype IPv4 
(0x0800), length 442: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], 
proto UDP (17), length 428)
     192.168.0.1.domain > 192.168.0.40.56551: 63293 7/8/1 
download.microsoft.com. CNAME download.microsoft.com.nsatc.net., 
download.microsoft.com.nsatc.net. CNAME main.dl.ms.akadns.net., 
main.dl.ms.akadns.net. CNAME intl.dl.ms.akadns.net., 
intl.dl.ms.akadns.net. CNAME dl.ms.georedirector.akadns.net., 
dl.ms.georedirector.akadns.net. CNAME a767.ms.akamai.net., 
a767.ms.akamai.net. A 90.223.216.153, a767.ms.akamai.net. A 
90.223.216.161 (400)
16:05:26.204761 5c:9a:d8:5c:63:31 > 00:1f:33:80:09:44, ethertype IPv4 
(0x0800), length 82: (tos 0x0, ttl 127, id 293, offset 0, flags [none], 
proto UDP (17), length 68)
     192.168.0.40.56551 > 192.168.0.1.domain: 63293+ A? 
download.microsoft.com. (40)
16:05:26.237788 00:1f:33:80:09:44 > 01:00:5e:00:00:01, ethertype IPv4 
(0x0800), length 60: (tos 0x0, ttl 1, id 0, offset 0, flags [DF], proto 
IGMP (2), length 28)
     192.168.0.1 > 224.0.0.1: igmp query v2
16:05:26.266706 00:1f:33:80:09:44 > 5c:9a:d8:5c:63:31, ethertype IPv4 
(0x0800), length 458: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], 
proto UDP (17), length 444)
     192.168.0.1.domain > 192.168.0.40.56551: 63293 7/8/2 
download.microsoft.com. CNAME download.microsoft.com.nsatc.net., 
download.microsoft.com.nsatc.net. CNAME main.dl.ms.akadns.net., 
main.dl.ms.akadns.net. CNAME intl.dl.ms.akadns.net., 
intl.dl.ms.akadns.net. CNAME dl.ms.georedirector.akadns.net., 
dl.ms.georedirector.akadns.net. CNAME a767.ms.akamai.net., 
a767.ms.akamai.net. A 90.223.216.161, a767.ms.akamai.net. A 
90.223.216.153 (416)
16:05:27.580742 5c:9a:d8:5c:63:31 > 00:1f:33:80:09:44, ethertype IPv4 
(0x0800), length 74: (tos 0x0, ttl 127, id 294, offset 0, flags [none], 
proto ICMP (1), length 60)
     192.168.0.40 > 192.168.0.1: ICMP echo request, id 512, seq 4096, 
length 40
16:05:27.585193 00:1f:33:80:09:44 > 5c:9a:d8:5c:63:31, ethertype IPv4 
(0x0800), length 74: (tos 0x0, ttl 255, id 53594, offset 0, flags 
[none], proto ICMP (1), length 60)
     192.168.0.1 > 192.168.0.40: ICMP echo reply, id 512, seq 4096, 
length 40
16:05:28.204783 5c:9a:d8:5c:63:31 > 00:1f:33:80:09:44, ethertype IPv4 
(0x0800), length 82: (tos 0x0, ttl 127, id 295, offset 0, flags [none], 
proto UDP (17), length 68)
     192.168.0.40.56551 > 192.168.0.1.domain: 63293+ A? 
download.microsoft.com. (40)
16:05:28.267047 00:1f:33:80:09:44 > 5c:9a:d8:5c:63:31, ethertype IPv4 
(0x0800), length 442: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], 
proto UDP (17), length 428)
     192.168.0.1.domain > 192.168.0.40.56551: 63293 7/8/1 
download.microsoft.com. CNAME download.microsoft.com.nsatc.net., 
download.microsoft.com.nsatc.net. CNAME main.dl.ms.akadns.net., 
main.dl.ms.akadns.net. CNAME intl.dl.ms.akadns.net., 
intl.dl.ms.akadns.net. CNAME dl.ms.georedirector.akadns.net., 
dl.ms.georedirector.akadns.net. CNAME a767.ms.akamai.net., 
a767.ms.akamai.net. A 90.223.216.161, a767.ms.akamai.net. A 
90.223.216.153 (400)
16:05:29.267032 00:1f:33:80:09:44 > 5c:9a:d8:5c:63:31, ethertype ARP 
(0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 
192.168.0.40 tell 192.168.0.1, length 46
16:05:29.267049 5c:9a:d8:5c:63:31 > 00:1f:33:80:09:44, ethertype ARP 
(0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 192.168.0.40 
is-at 5c:9a:d8:5c:63:31, length 28
16:05:32.204753 5c:9a:d8:5c:63:31 > 00:1f:33:80:09:44, ethertype IPv4 
(0x0800), length 82: (tos 0x0, ttl 127, id 296, offset 0, flags [none], 
proto UDP (17), length 68)
     192.168.0.40.56551 > 192.168.0.1.domain: 63293+ A? 
download.microsoft.com. (40)
16:05:32.267308 00:1f:33:80:09:44 > 5c:9a:d8:5c:63:31, ethertype IPv4 
(0x0800), length 458: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], 
proto UDP (17), length 444)
     192.168.0.1.domain > 192.168.0.40.56551: 63293 7/8/2 
download.microsoft.com. CNAME download.microsoft.com.nsatc.net., 
download.microsoft.com.nsatc.net. CNAME main.dl.ms.akadns.net., 
main.dl.ms.akadns.net. CNAME intl.dl.ms.akadns.net., 
intl.dl.ms.akadns.net. CNAME dl.ms.georedirector.akadns.net., 
dl.ms.georedirector.akadns.net. CNAME a767.ms.akamai.net., 
a767.ms.akamai.net. A 90.223.216.161, a767.ms.akamai.net. A 
90.223.216.153 (416)
16:05:33.080772 5c:9a:d8:5c:63:31 > 00:1f:33:80:09:44, ethertype IPv4 
(0x0800), length 74: (tos 0x0, ttl 127, id 297, offset 0, flags [none], 
proto ICMP (1), length 60)
     192.168.0.40 > 192.168.0.1: ICMP echo request, id 512, seq 4352, 
length 40
16:05:33.084435 00:1f:33:80:09:44 > 5c:9a:d8:5c:63:31, ethertype IPv4 
(0x0800), length 74: (tos 0x0, ttl 255, id 53595, offset 0, flags 
[none], proto ICMP (1), length 60)
     192.168.0.1 > 192.168.0.40: ICMP echo reply, id 512, seq 4352, 
length 40
16:05:35.277471 00:1f:33:80:09:44 > 01:00:5e:00:00:02, ethertype IPv4 
(0x0800), length 60: (tos 0x0, ttl 1, id 0, offset 0, flags [DF], proto 
IGMP (2), length 32, options (RA))
     192.168.0.1 > 224.0.0.2: igmp v2 report 224.0.0.2
16:05:38.582202 5c:9a:d8:5c:63:31 > 00:1f:33:80:09:44, ethertype IPv4 
(0x0800), length 74: (tos 0x0, ttl 127, id 298, offset 0, flags [none], 
proto ICMP (1), length 60)
     192.168.0.40 > 192.168.0.1: ICMP echo request, id 512, seq 4608, 
length 40
16:05:38.587143 00:1f:33:80:09:44 > 5c:9a:d8:5c:63:31, ethertype IPv4 
(0x0800), length 74: (tos 0x0, ttl 255, id 53596, offset 0, flags 
[none], proto ICMP (1), length 60)
     192.168.0.1 > 192.168.0.40: ICMP echo reply, id 512, seq 4608, 
length 40
16:05:39.218763 5c:9a:d8:5c:63:31 > 00:1f:33:80:09:44, ethertype IPv4 
(0x0800), length 64: (tos 0x0, ttl 127, id 299, offset 0, flags [none], 
proto UDP (17), length 50)
     192.168.0.40.60955 > 192.168.0.1.domain: 26953+ A? wpad. (22)
16:05:39.280065 00:1f:33:80:09:44 > 5c:9a:d8:5c:63:31, ethertype IPv4 
(0x0800), length 139: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], 
proto UDP (17), length 125)
     192.168.0.1.domain > 192.168.0.40.60955: 26953 NXDomain 0/1/0 (97)
16:05:40.204754 5c:9a:d8:5c:63:31 > 00:1f:33:80:09:44, ethertype IPv4 
(0x0800), length 64: (tos 0x0, ttl 127, id 300, offset 0, flags [none], 
proto UDP (17), length 50)
     192.168.0.40.60955 > 192.168.0.1.domain: 26953+ A? wpad. (22)
16:05:40.266317 00:1f:33:80:09:44 > 5c:9a:d8:5c:63:31, ethertype IPv4 
(0x0800), length 139: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], 
proto UDP (17), length 125)
     192.168.0.1.domain > 192.168.0.40.60955: 26953 NXDomain 0/1/0 (97)
16:05:41.204738 5c:9a:d8:5c:63:31 > 00:1f:33:80:09:44, ethertype IPv4 
(0x0800), length 64: (tos 0x0, ttl 127, id 301, offset 0, flags [none], 
proto UDP (17), length 50)
     192.168.0.40.60955 > 192.168.0.1.domain: 26953+ A? wpad. (22)
16:05:41.266343 00:1f:33:80:09:44 > 5c:9a:d8:5c:63:31, ethertype IPv4 
(0x0800), length 139: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], 
proto UDP (17), length 125)
     192.168.0.1.domain > 192.168.0.40.60955: 26953 NXDomain 0/1/0 (97)
16:05:43.238538 5c:9a:d8:5c:63:31 > 00:1f:33:80:09:44, ethertype IPv4 
(0x0800), length 64: (tos 0x0, ttl 127, id 302, offset 0, flags [none], 
proto UDP (17), length 50)
     192.168.0.40.60955 > 192.168.0.1.domain: 26953+ A? wpad. (22)
16:05:43.301692 00:1f:33:80:09:44 > 5c:9a:d8:5c:63:31, ethertype IPv4 
(0x0800), length 139: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], 
proto UDP (17), length 125)
     192.168.0.1.domain > 192.168.0.40.60955: 26953 NXDomain 0/1/0 (97)
16:05:44.230290 5c:9a:d8:5c:63:31 > 00:1f:33:80:09:44, ethertype ARP 
(0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 
192.168.0.1 tell 192.168.0.40, length 28
16:05:44.233532 00:1f:33:80:09:44 > 5c:9a:d8:5c:63:31, ethertype ARP 
(0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Reply 192.168.0.1 
is-at 00:1f:33:80:09:44, length 46
16:05:47.236740 5c:9a:d8:5c:63:31 > 00:1f:33:80:09:44, ethertype IPv4 
(0x0800), length 64: (tos 0x0, ttl 127, id 303, offset 0, flags [none], 
proto UDP (17), length 50)
     192.168.0.40.60955 > 192.168.0.1.domain: 26953+ A? wpad. (22)
16:05:47.296388 00:1f:33:80:09:44 > 5c:9a:d8:5c:63:31, ethertype IPv4 
(0x0800), length 139: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], 
proto UDP (17), length 125)
     192.168.0.1.domain > 192.168.0.40.60955: 26953 NXDomain 0/1/0 (97)
16:05:48.530940 00:19:fb:be:cb:55 > 01:00:5e:7f:ff:fa, ethertype IPv4 
(0x0800), length 446: (tos 0x0, ttl 4, id 0, offset 0, flags [DF], proto 
UDP (17), length 432)
     192.168.0.112.3829 > 239.255.255.250.1900: UDP, length 404
16:05:48.531962 00:19:fb:be:cb:55 > 01:00:5e:7f:ff:fa, ethertype IPv4 
(0x0800), length 455: (tos 0x0, ttl 4, id 0, offset 0, flags [DF], proto 
UDP (17), length 441)
     192.168.0.112.3829 > 239.255.255.250.1900: UDP, length 413
16:05:48.533472 00:19:fb:be:cb:55 > 01:00:5e:7f:ff:fa, ethertype IPv4 
(0x0800), length 494: (tos 0x0, ttl 4, id 0, offset 0, flags [DF], proto 
UDP (17), length 480)
     192.168.0.112.3829 > 239.255.255.250.1900: UDP, length 452
16:05:48.534564 00:19:fb:be:cb:55 > 01:00:5e:7f:ff:fa, ethertype IPv4 
(0x0800), length 490: (tos 0x0, ttl 4, id 0, offset 0, flags [DF], proto 
UDP (17), length 476)
     192.168.0.112.2600 > 239.255.255.250.1900: UDP, length 448
16:05:48.536749 00:19:fb:be:cb:55 > 01:00:5e:7f:ff:fa, ethertype IPv4 
(0x0800), length 486: (tos 0x0, ttl 4, id 0, offset 0, flags [DF], proto 
UDP (17), length 472)
     192.168.0.112.2411 > 239.255.255.250.1900: UDP, length 444
16:05:48.537798 00:19:fb:be:cb:55 > 01:00:5e:7f:ff:fa, ethertype IPv4 
(0x0800), length 486: (tos 0x0, ttl 4, id 0, offset 0, flags [DF], proto 
UDP (17), length 472)
     192.168.0.112.1205 > 239.255.255.250.1900: UDP, length 444
16:05:48.633492 00:19:fb:be:cb:55 > 01:00:5e:7f:ff:fa, ethertype IPv4 
(0x0800), length 446: (tos 0x0, ttl 4, id 0, offset 0, flags [DF], proto 
UDP (17), length 432)
     192.168.0.112.1378 > 239.255.255.250.1900: UDP, length 404
16:05:48.634558 00:19:fb:be:cb:55 > 01:00:5e:7f:ff:fa, ethertype IPv4 
(0x0800), length 455: (tos 0x0, ttl 4, id 0, offset 0, flags [DF], proto 
UDP (17), length 441)
     192.168.0.112.1378 > 239.255.255.250.1900: UDP, length 413
16:05:48.636069 00:19:fb:be:cb:55 > 01:00:5e:7f:ff:fa, ethertype IPv4 
(0x0800), length 494: (tos 0x0, ttl 4, id 0, offset 0, flags [DF], proto 
UDP (17), length 480)
     192.168.0.112.1378 > 239.255.255.250.1900: UDP, length 452
16:05:48.637119 00:19:fb:be:cb:55 > 01:00:5e:7f:ff:fa, ethertype IPv4 
(0x0800), length 490: (tos 0x0, ttl 4, id 0, offset 0, flags [DF], proto 
UDP (17), length 476)
     192.168.0.112.3487 > 239.255.255.250.1900: UDP, length 448
16:05:48.638631 00:19:fb:be:cb:55 > 01:00:5e:7f:ff:fa, ethertype IPv4 
(0x0800), length 486: (tos 0x0, ttl 4, id 0, offset 0, flags [DF], proto 
UDP (17), length 472)
     192.168.0.112.4415 > 239.255.255.250.1900: UDP, length 444
16:05:48.639702 00:19:fb:be:cb:55 > 01:00:5e:7f:ff:fa, ethertype IPv4 
(0x0800), length 486: (tos 0x0, ttl 4, id 0, offset 0, flags [DF], proto 
UDP (17), length 472)
     192.168.0.112.2700 > 239.255.255.250.1900: UDP, length 444
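
Since these two traces were taken concurrently, packets can be followed from tap0 to eth0 by their IP id. The Python sketch below is only a heuristic: it assumes a forwarded packet keeps its IP id and has its TTL reduced by one, and the regular expression is tuned to the wrapped ICMP and UDP lines shown above (other packet types, such as the IGMP lines, may confuse it). The function names and sample constants are hypothetical.

```python
import re

# Matches "ttl N, id N, ... length N)" followed by "src > dst:" in verbose,
# mail-wrapped tcpdump output. re.S lets ".*?" cross the wrapped line breaks.
PACKET_RE = re.compile(
    r"ttl\s+(\d+),\s+id\s+(\d+),.*?length\s+\d+\)\s+(\S+)\s+>\s+(\S+?):",
    re.S,
)


def index_by_ip_id(trace_text):
    """Map IP id -> packet summary; a later packet with the same id overwrites an earlier one."""
    packets = {}
    for ttl, ip_id, src, dst in PACKET_RE.findall(trace_text):
        packets[int(ip_id)] = {"ttl": int(ttl), "src": src, "dst": dst}
    return packets


def forwarded_pairs(tap_text, eth_text):
    """Pair packets seen on both interfaces with the same IP id and TTL reduced by one."""
    tap, eth = index_by_ip_id(tap_text), index_by_ip_id(eth_text)
    return {
        ip_id: (tap[ip_id], eth[ip_id])
        for ip_id in tap.keys() & eth.keys()
        if eth[ip_id]["ttl"] == tap[ip_id]["ttl"] - 1
    }


# Excerpts copied from the traces above.
TAP0_SAMPLE = (
    "16:05:22.261009 52:54:0c:3b:17:39 > 62:4e:ff:6b:0d:ce, ethertype IPv4 \n"
    "(0x0800), length 74: (tos 0x0, ttl 128, id 288, offset 0, flags [none], \n"
    "proto ICMP (1), length 60)\n"
    "     192.168.200.1 > 192.168.0.1: ICMP echo request, id 512, seq 3840, \n"
    "length 40\n"
)
ETH0_SAMPLE = (
    "16:05:22.261037 5c:9a:d8:5c:63:31 > 00:1f:33:80:09:44, ethertype IPv4 \n"
    "(0x0800), length 74: (tos 0x0, ttl 127, id 288, offset 0, flags [none], \n"
    "proto ICMP (1), length 60)\n"
    "     192.168.0.40 > 192.168.0.1: ICMP echo request, id 512, seq 3840, \n"
    "length 40\n"
    "16:05:22.264612 00:1f:33:80:09:44 > 5c:9a:d8:5c:63:31, ethertype IPv4 \n"
    "(0x0800), length 74: (tos 0x0, ttl 255, id 53593, offset 0, flags \n"
    "[none], proto ICMP (1), length 60)\n"
    "     192.168.0.1 > 192.168.0.40: ICMP echo reply, id 512, seq 3840, \n"
    "length 40\n"
)

pairs = forwarded_pairs(TAP0_SAMPLE, ETH0_SAMPLE)
for ip_id, (on_tap, on_eth) in pairs.items():
    print(ip_id, on_tap["src"], "->", on_eth["src"])
```

In the excerpt, the request with IP id 288 leaves tap0 from 192.168.200.1 and appears on eth0 from 192.168.0.40, so the outbound masquerading works; the router's reply (id 53593) shows up on eth0 only.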


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-09-30 15:26                         ` Chris Clayton
@ 2012-09-30 19:45                           ` Eric Dumazet
  2012-10-01  8:36                             ` Chris Clayton
  0 siblings, 1 reply; 59+ messages in thread
From: Eric Dumazet @ 2012-09-30 19:45 UTC (permalink / raw)
  To: Chris Clayton; +Cc: David Miller, netdev, gpiez

On Sun, 2012-09-30 at 16:26 +0100, Chris Clayton wrote:
> Hi Eric,
> 
> On 09/28/12 12:26, Eric Dumazet wrote:
> > On Fri, 2012-09-28 at 10:22 +0100, Chris Clayton wrote:
> >
> >> No, the WinXP guest is configured with a fixed IP address
> >> (192.168.200.1). Subnet mask is 255.255.255.0, and default gateway is
> >> 192.168.200.254. DNS is 192.168.0.1.
> >>
> >
> > I have no problem with such a setup, with a linux guest.
> >
> > Could you send again a tcpdump, but including link-level header ?
> > (option -e)
> >
> > Ideally, you could send two traces, one taken on tap0, and another taken
> > on eth0.
> >
> Below are two more traces that I think may well be more useful than 
> those I sent on Friday. They are taken with tcpdump directly (after some 
> reading up on that application) rather than tcpdump translations of pcap 
> files captured with netsniff-ng. Also, they are taken concurrently, so 
> they show the traffic on tap0 and eth0 at the time of an unsuccessful 
> attempt to ping the router from the WinXP KVM client. The command was: 
> sudo tcpdump -nev -i eth0 -Z chris >eth0.trace


Could you send the output of "netstat -s" taken before and after your tests?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-09-30 19:45                           ` Eric Dumazet
@ 2012-10-01  8:36                             ` Chris Clayton
  2012-10-01  9:15                               ` Eric Dumazet
  0 siblings, 1 reply; 59+ messages in thread
From: Chris Clayton @ 2012-10-01  8:36 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, gpiez



On 09/30/12 20:45, Eric Dumazet wrote:
> On Sun, 2012-09-30 at 16:26 +0100, Chris Clayton wrote:
>> Hi Eric,
>>
>> On 09/28/12 12:26, Eric Dumazet wrote:
>>> On Fri, 2012-09-28 at 10:22 +0100, Chris Clayton wrote:
>>>
>>>> No, the WinXP guest is configured with a fixed IP address
>>>> (192.168.200.1). Subnet mask is 255.255.255.0, and default gateway is
>>>> 192.168.200.254. DNS is 192.168.0.1.
>>>>
>>>
>>> I have no problem with such a setup, with a linux guest.
>>>
>>> Could you send again a tcpdump, but including link-level header ?
>>> (option -e)
>>>
>>> Ideally, you could send two traces, one taken on tap0, and another taken
>>> on eth0.
>>>
>> Below are two more traces that I think may well be more useful than
>> those I sent on Friday. They are taken with tcpdump directly (after some
>> reading up on that application) rather than tcpdump translations of pcap
>> files captured with netsniff-ng. Also, they are taken concurrently, so
>> they show the traffic on tap0 and eth0 at the time of an unsuccessful
>> attempt to ping the router from the WinXP KVM client. The command was:
>> sudo tcpdump -nev -i eth0 -Z chris >eth0.trace
>
>
> Could you send "netstat -s" before/after your tests ?
>
Before:

$ netstat -s
Ip:
     485 total packets received
     10 forwarded
     0 incoming packets discarded
     473 incoming packets delivered
     383 requests sent out
Icmp:
     0 ICMP messages received
     0 input ICMP message failed.
     ICMP input histogram:
     0 ICMP messages sent
     0 ICMP messages failed
     ICMP output histogram:
Tcp:
     12 active connections openings
     0 passive connection openings
     6 failed connection attempts
     0 connection resets received
     5 connections established
     374 segments received
     306 segments send out
     0 segments retransmited
     0 bad segments received.
     6 resets sent
Udp:
     164 packets received
     0 packets to unknown port received.
     0 packet receive errors
     67 packets sent
     RcvbufErrors: 0
     SndbufErrors: 0
UdpLite:
     InDatagrams: 0
     NoPorts: 0
     InErrors: 0
     OutDatagrams: 0
     RcvbufErrors: 0
     SndbufErrors: 0
error parsing /proc/net/snmp: Success

After:

$ netstat -s
Ip:
     519 total packets received
     21 forwarded
     0 incoming packets discarded
     496 incoming packets delivered
     406 requests sent out
Icmp:
     4 ICMP messages received
     4 input ICMP message failed.
     ICMP input histogram:
         echo replies: 4
     0 ICMP messages sent
     0 ICMP messages failed
     ICMP output histogram:
IcmpMsg:
         InType0: 4
Tcp:
     13 active connections openings
     0 passive connection openings
     6 failed connection attempts
     0 connection resets received
     5 connections established
     381 segments received
     316 segments send out
     0 segments retransmited
     0 bad segments received.
     6 resets sent
Udp:
     173 packets received
     0 packets to unknown port received.
     0 packet receive errors
     69 packets sent
     RcvbufErrors: 0
     SndbufErrors: 0
UdpLite:
     InDatagrams: 0
     NoPorts: 0
     InErrors: 0
     OutDatagrams: 0
     RcvbufErrors: 0
     SndbufErrors: 0
error parsing /proc/net/snmp: Success

>
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-01  8:36                             ` Chris Clayton
@ 2012-10-01  9:15                               ` Eric Dumazet
  2012-10-01 15:13                                 ` Chris Clayton
  2012-10-01 19:34                                 ` Dave Jones
  0 siblings, 2 replies; 59+ messages in thread
From: Eric Dumazet @ 2012-10-01  9:15 UTC (permalink / raw)
  To: Chris Clayton; +Cc: David Miller, netdev, gpiez

On Mon, 2012-10-01 at 09:36 +0100, Chris Clayton wrote:
> 

>      0 ICMP messages received
>      0 input ICMP message failed.
>      ICMP input histogram:
>      0 ICMP messages sent
>      0 ICMP messages failed
>      ICMP output histogram:

> 
> After:
> 
> $ netstat -s
> Icmp:
>      4 ICMP messages received
>      4 input ICMP message failed.
>      ICMP input histogram:
>          echo replies: 4

So icmp replies come back and are delivered to host instead of being
forwarded.

I wonder if MASQUERADE broke...

Could you send

iptables -t -nat -nvL
conntrack -L   # while ping is running from guest
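
The before/after comparison quoted above can be mechanized rather than read by eye. The following is a minimal sketch, assuming the two "netstat -s" outputs were saved to files; netstat_delta is a hypothetical helper name, not part of net-tools. It prints only the counters whose value changed, prefixed with their section.

```shell
#!/bin/sh
# netstat_delta BEFORE AFTER
# Print every counter that differs between two saved "netstat -s" outputs.
# Hypothetical helper for illustration; handles both "N description"
# lines and "Name: N" lines, and ignores everything else.
netstat_delta() {
    awk '
        # Section headers ("Ip:", "Icmp:", ...) start in column one.
        /^[A-Za-z]+:$/ { section = $1; next }
        {
            if ($1 ~ /^[0-9]+$/)       { value = $1;  $1 = "" }
            else if ($NF ~ /^[0-9]+$/) { value = $NF; $NF = "" }
            else next
            gsub(/^ +| +$/, "")            # trim what is left of the line
            key = section " " $0
            if (FNR == NR) { before[key] = value; next }   # first file
            delta = value - before[key]
            if (delta != 0) printf "%s %+d\n", key, delta
        }
    ' "$1" "$2"
}

# Example (file names are placeholders):
#   netstat -s > before.txt   # then run the ping test from the guest
#   netstat -s > after.txt
#   netstat_delta before.txt after.txt
```

Run against the two outputs in this thread, it would flag the Icmp counters that moved during the ping test, which is the evidence that the echo replies reached the host.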

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-01  9:15                               ` Eric Dumazet
@ 2012-10-01 15:13                                 ` Chris Clayton
  2012-10-01 15:31                                   ` Eric Dumazet
  2012-10-01 19:34                                 ` Dave Jones
  1 sibling, 1 reply; 59+ messages in thread
From: Chris Clayton @ 2012-10-01 15:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, gpiez



On 10/01/12 10:15, Eric Dumazet wrote:
> On Mon, 2012-10-01 at 09:36 +0100, Chris Clayton wrote:
>>
>
>>       0 ICMP messages received
>>       0 input ICMP message failed.
>>       ICMP input histogram:
>>       0 ICMP messages sent
>>       0 ICMP messages failed
>>       ICMP output histogram:
>
>>
>> After:
>>
>> $ netstat -s
>> Icmp:
>>       4 ICMP messages received
>>       4 input ICMP message failed.
>>       ICMP input histogram:
>>           echo replies: 4
>
> So icmp replies come back and are delivered to host instead of being
> forwarded.
>
> I wonder if MASQUERADE broke...
>
> Could you send
>
> iptables -t -nat -nvL

$ iptables -t -nat -nvL
iptables v1.4.15: can't initialize iptables table `-nat': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.

> conntrack -L   # while ping is running from guest

$ conntrack -L
conntrack v1.2.2 (conntrack-tools): Operation failed: invalid parameters

Forgive me for asking, but why is the problem not down to the change 
that I identified by bisecting? The title of the patch is "ipv4: Cache 
local output routes" and, although I'm a million miles from being an 
expert here, to me it does make it look a good candidate. 
http://marc.info/?l=linux-netdev&m=134797809611847&w=2

>
>
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-01 15:13                                 ` Chris Clayton
@ 2012-10-01 15:31                                   ` Eric Dumazet
  2012-10-01 16:19                                     ` Chris Clayton
  2012-10-01 18:34                                     ` Captain Obvious
  0 siblings, 2 replies; 59+ messages in thread
From: Eric Dumazet @ 2012-10-01 15:31 UTC (permalink / raw)
  To: Chris Clayton; +Cc: David Miller, netdev, gpiez

On Mon, 2012-10-01 at 16:13 +0100, Chris Clayton wrote:
> 
> On 10/01/12 10:15, Eric Dumazet wrote:
> > On Mon, 2012-10-01 at 09:36 +0100, Chris Clayton wrote:
> >>
> >
> >>       0 ICMP messages received
> >>       0 input ICMP message failed.
> >>       ICMP input histogram:
> >>       0 ICMP messages sent
> >>       0 ICMP messages failed
> >>       ICMP output histogram:
> >
> >>
> >> After:
> >>
> >> $ netstat -s
> >> Icmp:
> >>       4 ICMP messages received
> >>       4 input ICMP message failed.
> >>       ICMP input histogram:
> >>           echo replies: 4
> >
> > So icmp replies come back and are delivered to host instead of being
> > forwarded.
> >
> > I wonder if MASQUERADE broke...
> >
> > Could you send
> >
> > iptables -t -nat -nvL
> 
> $ iptables -t -nat -nvL
> iptables v1.4.15: can't initialize iptables table `-nat': Table does not 
> exist (do you need to insmod?)
> Perhaps iptables or your kernel needs to be upgraded.
> 
> > conntrack -L   # while ping is running from guest
> 
> $ conntrack -L
> conntrack v1.2.2 (conntrack-tools): Operation failed: invalid parameters
> 

That's not expected; you said you used the MASQUERADE target, so
"iptables -t nat -nvL" should display something.


> Forgive me for asking, but why is the problem not down to the change 
> that I identified by bisecting? The title of the patch is "ipv4: Cache 
> local output routes" and, although I'm a million miles from being an 
> expert here, to me it does make it look a good candidate. 
> http://marc.info/?l=linux-netdev&m=134797809611847&w=2

Because I can't reproduce your problem at all, using your setup.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-01 15:31                                   ` Eric Dumazet
@ 2012-10-01 16:19                                     ` Chris Clayton
  2012-10-01 16:37                                       ` Eric Dumazet
  2012-10-01 18:34                                     ` Captain Obvious
  1 sibling, 1 reply; 59+ messages in thread
From: Chris Clayton @ 2012-10-01 16:19 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, gpiez



On 10/01/12 16:31, Eric Dumazet wrote:
> On Mon, 2012-10-01 at 16:13 +0100, Chris Clayton wrote:
>>
>> On 10/01/12 10:15, Eric Dumazet wrote:
>>> On Mon, 2012-10-01 at 09:36 +0100, Chris Clayton wrote:
>>>>
>>>
>>>>        0 ICMP messages received
>>>>        0 input ICMP message failed.
>>>>        ICMP input histogram:
>>>>        0 ICMP messages sent
>>>>        0 ICMP messages failed
>>>>        ICMP output histogram:
>>>
>>>>
>>>> After:
>>>>
>>>> $ netstat -s
>>>> Icmp:
>>>>        4 ICMP messages received
>>>>        4 input ICMP message failed.
>>>>        ICMP input histogram:
>>>>            echo replies: 4
>>>
>>> So icmp replies come back and are delivered to host instead of being
>>> forwarded.
>>>
>>> I wonder if MASQUERADE broke...
>>>
>>> Could you send
>>>
>>> iptables -t -nat -nvL
>>
>> $ iptables -t -nat -nvL
>> iptables v1.4.15: can't initialize iptables table `-nat': Table does not
>> exist (do you need to insmod?)
>> Perhaps iptables or your kernel needs to be upgraded.
>>
>>> conntrack -L   # while ping is running from guest
>>
>> $ conntrack -L
>> conntrack v1.2.2 (conntrack-tools): Operation failed: invalid parameters
>>
>
> Thats not expected, you described you used MASQUERADE target, so
> "iptables -t nat -nvL" should display something.
>

To check this I've booted a 3.5.4 kernel. I get the same response to the 
two commands. I also double checked that, with a 3.5.4 kernel, pinging 
the router and browsing the internet from the client work and they do.

Except for the packets and bytes columns, the command iptables -nvL 
gives the following output under both 3.5.4 and 3.6.0 kernels:

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
 3757 3240K ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            state RELATED,ESTABLISHED
   14   840 ACCEPT     all  --  *      *       127.0.0.1            127.0.0.1
   41  4362 ACCEPT     all  --  *      *       192.168.0.0/24       0.0.0.0/0
   90 12780 ACCEPT     all  --  *      *       192.168.200.0/24     0.0.0.0/0
    0     0 ACCEPT     all  --  *      *       192.168.201.0/24     0.0.0.0/0
    0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0

Chain FORWARD (policy ACCEPT 4470 packets, 3065K bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 3243 packets, 349K bytes)
 pkts bytes target     prot opt in     out     source               destination
   64  8344 ACCEPT     all  --  *      *       0.0.0.0/0            192.168.200.0/24
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            192.168.201.0/24

>
>> Forgive me for asking, but why is the problem not down to the change
>> that I identified by bisecting? The title of the patch is "ipv4: Cache
>> local output routes" and, although I'm a million miles from being an
>> expert here, to me it does make it look a good candidate.
>> http://marc.info/?l=linux-netdev&m=134797809611847&w=2
>
> Because I cant reproduce your problem at all, using your setup.
>
OK, thanks.
>
>
>
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-01 16:19                                     ` Chris Clayton
@ 2012-10-01 16:37                                       ` Eric Dumazet
  2012-10-01 18:28                                         ` Chris Clayton
  0 siblings, 1 reply; 59+ messages in thread
From: Eric Dumazet @ 2012-10-01 16:37 UTC (permalink / raw)
  To: Chris Clayton; +Cc: David Miller, netdev, gpiez

On Mon, 2012-10-01 at 17:19 +0100, Chris Clayton wrote:
> 
> On 10/01/12 16:31, Eric Dumazet wrote:
> > On Mon, 2012-10-01 at 16:13 +0100, Chris Clayton wrote:
> >>
> >> On 10/01/12 10:15, Eric Dumazet wrote:
> >>> On Mon, 2012-10-01 at 09:36 +0100, Chris Clayton wrote:
> >>>>
> >>>
> >>>>        0 ICMP messages received
> >>>>        0 input ICMP message failed.
> >>>>        ICMP input histogram:
> >>>>        0 ICMP messages sent
> >>>>        0 ICMP messages failed
> >>>>        ICMP output histogram:
> >>>
> >>>>
> >>>> After:
> >>>>
> >>>> $ netstat -s
> >>>> Icmp:
> >>>>        4 ICMP messages received
> >>>>        4 input ICMP message failed.
> >>>>        ICMP input histogram:
> >>>>            echo replies: 4
> >>>
> >>> So icmp replies come back and are delivered to host instead of being
> >>> forwarded.
> >>>
> >>> I wonder if MASQUERADE broke...
> >>>
> >>> Could you send
> >>>
> >>> iptables -t -nat -nvL
> >>
> >> $ iptables -t -nat -nvL
> >> iptables v1.4.15: can't initialize iptables table `-nat': Table does not
> >> exist (do you need to insmod?)
> >> Perhaps iptables or your kernel needs to be upgraded.
> >>
> >>> conntrack -L   # while ping is running from guest
> >>
> >> $ conntrack -L
> >> conntrack v1.2.2 (conntrack-tools): Operation failed: invalid parameters
> >>
> >
> > Thats not expected, you described you used MASQUERADE target, so
> > "iptables -t nat -nvL" should display something.
> >
> 
> To check this I've booted a 3.5.4 kernel. I get the same response to the 
> two commands. I also double checked that, with a 3.5.4 kernel, pinging 
> the router and browsing the internet from the client work and they do.
> 
> Except for the packets and bytes columns, the command iptables -nvL 
> gives the following output under both 3.5.4 and 3.6.0 kernels:
> 
> Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
>   pkts bytes target     prot opt in     out     source destination
>   3757 3240K ACCEPT     all  --  *      *       0.0.0.0/0 0.0.0.0/0 
>         state RELATED,ESTABLISHED
>     14   840 ACCEPT     all  --  *      *       127.0.0.1 127.0.0.1
>     41  4362 ACCEPT     all  --  *      *       192.168.0.0/24 0.0.0.0/0
>     90 12780 ACCEPT     all  --  *      *       192.168.200.0/24 0.0.0.0/0
>      0     0 ACCEPT     all  --  *      *       192.168.201.0/24 0.0.0.0/0
>      0     0 DROP       all  --  *      *       0.0.0.0/0 0.0.0.0/0
> 
> Chain FORWARD (policy ACCEPT 4470 packets, 3065K bytes)
>   pkts bytes target     prot opt in     out     source destination
> 
> Chain OUTPUT (policy ACCEPT 3243 packets, 349K bytes)
>   pkts bytes target     prot opt in     out     source destination
>     64  8344 ACCEPT     all  --  *      *       0.0.0.0/0 192.168.200.0/24
>      0     0 ACCEPT     all  --  *      *       0.0.0.0/0 192.168.201.0/24

I am lost, since in your first mail you said:
-----------------------------------------------------------------------------
# Load the connection-sharing for qemu/kvm guests
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
...
# allow traffic to and from the qemu/kvm virtual networks
NETS="200 201"
for net in $NETS; do
   iptables -A INPUT -s 192.168.$net.0/24 -j ACCEPT
   iptables -A OUTPUT -d 192.168.$net.0/24 -j ACCEPT
done
...

The network-related modules that are loaded are:

$ lsmod
Module                  Size  Used by
tun                    12412  0
xt_state                 891  1
iptable_filter           852  1
ipt_MASQUERADE          1222  1
iptable_nat             3087  1
nf_nat                 10901  2 ipt_MASQUERADE,iptable_nat
nf_conntrack_ipv4       4942  4 nf_nat,iptable_nat
nf_defrag_ipv4           815  1 nf_conntrack_ipv4
nf_conntrack           37644  5 ipt_MASQUERADE,nf_nat,xt_state,iptable_nat,nf_conntrack_ipv4
...
r8169                  47159  0


-----------------------------------------------

Now you say you don't have nat?

Something is wrong.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-01 16:37                                       ` Eric Dumazet
@ 2012-10-01 18:28                                         ` Chris Clayton
  0 siblings, 0 replies; 59+ messages in thread
From: Chris Clayton @ 2012-10-01 18:28 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, gpiez



On 10/01/12 17:37, Eric Dumazet wrote:
> On Mon, 2012-10-01 at 17:19 +0100, Chris Clayton wrote:
>>
>> On 10/01/12 16:31, Eric Dumazet wrote:
>>> On Mon, 2012-10-01 at 16:13 +0100, Chris Clayton wrote:
>>>>
>>>> On 10/01/12 10:15, Eric Dumazet wrote:
>>>>> On Mon, 2012-10-01 at 09:36 +0100, Chris Clayton wrote:
>>>>>>
>>>>>
>>>>>>         0 ICMP messages received
>>>>>>         0 input ICMP message failed.
>>>>>>         ICMP input histogram:
>>>>>>         0 ICMP messages sent
>>>>>>         0 ICMP messages failed
>>>>>>         ICMP output histogram:
>>>>>
>>>>>>
>>>>>> After:
>>>>>>
>>>>>> $ netstat -s
>>>>>> Icmp:
>>>>>>         4 ICMP messages received
>>>>>>         4 input ICMP message failed.
>>>>>>         ICMP input histogram:
>>>>>>             echo replies: 4
>>>>>
>>>>> So icmp replies come back and are delivered to host instead of being
>>>>> forwarded.
>>>>>
>>>>> I wonder if MASQUERADE broke...
>>>>>
>>>>> Could you send
>>>>>
>>>>> iptables -t -nat -nvL
>>>>
>>>> $ iptables -t -nat -nvL
>>>> iptables v1.4.15: can't initialize iptables table `-nat': Table does not
>>>> exist (do you need to insmod?)
>>>> Perhaps iptables or your kernel needs to be upgraded.
>>>>
>>>>> conntrack -L   # while ping is running from guest
>>>>
>>>> $ conntrack -L
>>>> conntrack v1.2.2 (conntrack-tools): Operation failed: invalid parameters
>>>>
>>>
>>> Thats not expected, you described you used MASQUERADE target, so
>>> "iptables -t nat -nvL" should display something.
>>>
>>
>> To check this I've booted a 3.5.4 kernel. I get the same response to the
>> two commands. I also double checked that, with a 3.5.4 kernel, pinging
>> the router and browsing the internet from the client work and they do.
>>
>> Except for the packets and bytes columns, the command iptables -nvL
>> gives the following output under both 3.5.4 and 3.6.0 kernels:
>>
>> Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
>>    pkts bytes target     prot opt in     out     source destination
>>    3757 3240K ACCEPT     all  --  *      *       0.0.0.0/0 0.0.0.0/0
>>          state RELATED,ESTABLISHED
>>      14   840 ACCEPT     all  --  *      *       127.0.0.1 127.0.0.1
>>      41  4362 ACCEPT     all  --  *      *       192.168.0.0/24 0.0.0.0/0
>>      90 12780 ACCEPT     all  --  *      *       192.168.200.0/24 0.0.0.0/0
>>       0     0 ACCEPT     all  --  *      *       192.168.201.0/24 0.0.0.0/0
>>       0     0 DROP       all  --  *      *       0.0.0.0/0 0.0.0.0/0
>>
>> Chain FORWARD (policy ACCEPT 4470 packets, 3065K bytes)
>>    pkts bytes target     prot opt in     out     source destination
>>
>> Chain OUTPUT (policy ACCEPT 3243 packets, 349K bytes)
>>    pkts bytes target     prot opt in     out     source destination
>>      64  8344 ACCEPT     all  --  *      *       0.0.0.0/0 192.168.200.0/24
>>       0     0 ACCEPT     all  --  *      *       0.0.0.0/0 192.168.201.0/24
>
> I am lost, since n your first mail you said :
> -----------------------------------------------------------------------------
> # Load the connection-sharing for qemu/kvm guests
> echo 1 > /proc/sys/net/ipv4/ip_forward
> iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
> ...
> # allow traffic to and from the qemu/kvm virtual networks
> NETS="200 201"
> for net in $NETS; do
>     iptables -A INPUT -s 192.168.$net.0/24 -j ACCEPT
>     iptables -A OUTPUT -d 192.168.$net.0/24 -j ACCEPT
> done
> ...
>
> The network-related modules that are loaded are:
>
> $ lsmod
> Module                  Size  Used by
> tun                    12412  0
> xt_state                 891  1
> iptable_filter           852  1
> ipt_MASQUERADE          1222  1
> iptable_nat             3087  1
> nf_nat                 10901  2 ipt_MASQUERADE,iptable_nat
> nf_conntrack_ipv4       4942  4 nf_nat,iptable_nat
> nf_defrag_ipv4           815  1 nf_conntrack_ipv4
> nf_conntrack           37644  5
> ipt_MASQUERADE,nf_nat,xt_state,iptable_nat,nf_conntrack_ipv4
> ...
> r8169                  47159  0
>
>
> -----------------------------------------------
>
> Now you say you dont have nat ?
>
> Something is wrong.
>

Here's the complete script that starts up my firewall. I can't recall
having changed it at all for two or three years, other than when a
replacement router changed the network from 192.168.1.x, or when I add
(or remove) networks in the $NETS list for other KVM clients.

$ cat /etc/rc.d/rc.firewall
#! /bin/sh

case "$1" in
     stop)
         echo 0 > /proc/sys/net/ipv4/ip_forward
         # clear out the current settings
         iptables -F
         iptables -X
         iptables -Z
         ;;
     start)
         # Load the connection-sharing for qemu/kvm guests
         echo 1 > /proc/sys/net/ipv4/ip_forward
         iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

         iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

         # Allow anything internal to this machine (i.e. localhost)
         # is this really necessary?
         iptables -A INPUT -s 127.0.0.1 -d 127.0.0.1 -j ACCEPT

         # Allow any traffic from nodes on home network
         iptables -A INPUT -s 192.168.0.0/24 -j ACCEPT

         # and traffic to and from the qemu/kvm virtual networks
         NETS="200 201"
         for net in $NETS; do
             iptables -A INPUT -s 192.168.$net.0/24 -j ACCEPT
             iptables -A OUTPUT -d 192.168.$net.0/24 -j ACCEPT
          done

         # drop everything else
         # iptables -A INPUT -j LOG --log-level 4 --log-prefix "FIREWALL: "
         iptables -A INPUT -j DROP
         ;;
     restart|reload)
         $0 stop
         $0 start
         ;;
     status)
         iptables -L
         ;;
     *)
         echo "Usage: $0 {start|stop|restart|reload|status}"
         exit 1
         ;;
esac

>

eth0 is set up by calling /sbin/ifup from udev on the add event for eth0
(wlan0 is disabled on the laptop, so that won't be getting in the way).
Here's the script (the SSID is not really XXXXX):

$ cat /sbin/ifup
#!/bin/sh

PATH="/usr/bin:/usr/sbin:/sbin:/bin"
export PATH

SSID=XXXXX

#logger "$0 called with arguments $@"
if [ "$1" = "wlan0" ]; then

     # Bring the interface up before the iwconfig stuff below
     # assign ip address later else association with AP fails when using WPA
     ifconfig wlan0 up

     # Configure the wireless adapter
     iw wlan0 connect $SSID

     # start wpa_supplicant
     if [ -z `pgrep wpa_supplicant` ]; then
         wpa_supplicant -c/etc/wpa_supplicant/wpa_supplicant.conf -iwlan0 -Dwext -B -f/var/log/wpa_supplicant.log
     fi

     # wait until associated with the AP - can take a while with WPA
     secs=0
     until iw wlan0 link | grep -q "SSID: $SSID"; do
         let secs++
         if [ $secs -ge 20 ]; then
             logger -p user.err -t IFUP "Failed to associate with AP within 20 seconds"
             exit -1
         fi
         sleep 1
     done

     # set the regulatory domain (kernel >= 2.6.28)
     iw reg set GB

     ifconfig wlan0 192.168.0.140 netmask 255.255.255.0 up

     route add default gw 192.168.0.1 netmask 0.0.0.0 metric 1

     exit 0

fi

if [ "$1" = "eth0" ] ; then
     # load the module if necessary
     if ! grep -q eth0 /proc/net/dev; then
         modprobe r8169
     fi

     # wait up to 5 seconds for eth0 to appear
     secs=0
     until grep -q eth0 /proc/net/dev; do
         let secs++
         if [ $secs -ge 5 ]; then
             logger -p user.err -t IFUP "eth0 failed to appear within 5 seconds"
             exit -1
         fi
         sleep 1
     done

     ifconfig eth0 192.168.0.40 netmask 255.255.255.0 up

     route add default gw 192.168.0.1 netmask 0.0.0.0 metric 1

     exit 0
fi

When the KVM client is running the routing on the host is:

$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         router.local.la 0.0.0.0         UG    1      0        0 eth0
Unix            *               255.0.0.0       U     0      0        0 lo
local.lan       *               255.255.255.0   U     0      0        0 eth0
192.168.200.0   *               255.255.255.0   U     0      0        0 tap0

Like I say, the set up has been like this for ages and has worked. It's 
only since I started using 3.6 kernels that I've had a problem. I don't 
recall anything from the nat table ever having been listed by iptables -L.
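
For context on that last point: without a -t option, iptables operates on the filter table only, so a plain "iptables -L" would not be expected to show the MASQUERADE rule, which lives in the nat table. A minimal sketch of listing both tables follows; the IPTABLES variable and the function name are hypothetical, added only so the loop can be dry-run without root.

```shell
#!/bin/sh
# IPTABLES is a hypothetical override so the function can be dry-run
# (for example by pointing it at a stub); it defaults to the real binary.
IPTABLES=${IPTABLES:-iptables}

# List the rules of both tables that the firewall script touches.
# Without -t, iptables would only show the filter table.
list_all_tables() {
    for table in filter nat; do
        echo "== table: $table =="
        $IPTABLES -t "$table" -nvL
    done
}

# Example (needs root):
#   list_all_tables
```

The same default applies to -F, -X and -Z, so the stop branch of the script above clears the filter table and leaves the nat table as it was.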

>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-01 15:31                                   ` Eric Dumazet
  2012-10-01 16:19                                     ` Chris Clayton
@ 2012-10-01 18:34                                     ` Captain Obvious
  2012-10-01 19:21                                       ` Eric Dumazet
  2012-10-01 19:22                                       ` Chris Clayton
  1 sibling, 2 replies; 59+ messages in thread
From: Captain Obvious @ 2012-10-01 18:34 UTC (permalink / raw)
  To: Chris Clayton; +Cc: Eric Dumazet, David Miller, netdev, gpiez

Eric Dumazet <eric.dumazet@gmail.com> :
[...]
> > > Could you send
> > >
> > > iptables -t -nat -nvL
> > 
> > $ iptables -t -nat -nvL
                  ^ typo

Please try "iptables -t nat -nvL" as was also suggested.

-- 
Ueimor

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-01 18:34                                     ` Captain Obvious
@ 2012-10-01 19:21                                       ` Eric Dumazet
  2012-10-01 19:55                                         ` Chris Clayton
  2012-10-01 19:22                                       ` Chris Clayton
  1 sibling, 1 reply; 59+ messages in thread
From: Eric Dumazet @ 2012-10-01 19:21 UTC (permalink / raw)
  To: Captain Obvious; +Cc: Chris Clayton, David Miller, netdev, gpiez

On Mon, 2012-10-01 at 20:34 +0200, Captain Obvious wrote:
> Eric Dumazet <eric.dumazet@gmail.com> :
> [...]
> > > > Could you send
> > > >
> > > > iptables -t -nat -nvL
> > > 
> > > $ iptables -t -nat -nvL
>                   ^ typo
> 
> Please try "iptables -t nat -nvL" as was also suggested.
> 

Oh well, good catch ;)

And for conntrack -L, Chris, please add CONFIG_NF_CT_NETLINK=m to your
kernel .config.
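
The conntrack tool talks to the kernel over netlink, which is presumably why "conntrack -L" failed without that option. A minimal sketch for checking a config file before rebuilding; kconfig_enabled is a hypothetical helper name and the path in the example is a placeholder.

```shell
#!/bin/sh
# kconfig_enabled OPTION CONFIG_FILE
# Succeeds if OPTION is built in (=y) or modular (=m) in CONFIG_FILE.
# Hypothetical helper for illustration.
kconfig_enabled() {
    grep -Eq "^$1=(y|m)\$" "$2"
}

# Example: check the build tree's .config (path is a placeholder).
#   kconfig_enabled CONFIG_NF_CT_NETLINK /usr/src/linux/.config \
#       && echo "ctnetlink available" || echo "ctnetlink missing"
```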

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-01 18:34                                     ` Captain Obvious
  2012-10-01 19:21                                       ` Eric Dumazet
@ 2012-10-01 19:22                                       ` Chris Clayton
  1 sibling, 0 replies; 59+ messages in thread
From: Chris Clayton @ 2012-10-01 19:22 UTC (permalink / raw)
  To: Captain Obvious; +Cc: Eric Dumazet, David Miller, netdev, gpiez



On 10/01/12 19:34, Captain Obvious wrote:
> Eric Dumazet <eric.dumazet@gmail.com> :
> [...]
>>>> Could you send
>>>>
>>>> iptables -t -nat -nvL
>>>
>>> $ iptables -t -nat -nvL
>                    ^ typo
>
> Please try "iptables -t nat -nvL" as was also suggested.
>

Good catch, Captain. Thanks.

$ iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 58 packets, 7716 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain INPUT (policy ACCEPT 41 packets, 5895 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 1158 packets, 75559 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain POSTROUTING (policy ACCEPT 208 packets, 14279 bytes)
 pkts bytes target     prot opt in     out     source               destination
  951 61351 MASQUERADE  all  --  *      eth0    0.0.0.0/0            0.0.0.0/0

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-01  9:15                               ` Eric Dumazet
  2012-10-01 15:13                                 ` Chris Clayton
@ 2012-10-01 19:34                                 ` Dave Jones
  2012-10-01 20:01                                   ` David Miller
  1 sibling, 1 reply; 59+ messages in thread
From: Dave Jones @ 2012-10-01 19:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Chris Clayton, David Miller, netdev, gpiez

On Mon, Oct 01, 2012 at 11:15:50AM +0200, Eric Dumazet wrote:
 > > 
 > > $ netstat -s
 > > Icmp:
 > >      4 ICMP messages received
 > >      4 input ICMP message failed.
 > >      ICMP input histogram:
 > >          echo replies: 4
 > 
 > So icmp replies come back and are delivered to host instead of being
 > forwarded.
 > 
 > I wonder if MASQUERADE broke...

I hit something that sounds just like this a few months back..
http://lists.openwall.net/netdev/2012/07/25/53

It "went away" a few builds later, but I've seen it happen
again from time to time.

	Dave

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-01 19:21                                       ` Eric Dumazet
@ 2012-10-01 19:55                                         ` Chris Clayton
  0 siblings, 0 replies; 59+ messages in thread
From: Chris Clayton @ 2012-10-01 19:55 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Captain Obvious, David Miller, netdev, gpiez



On 10/01/12 20:21, Eric Dumazet wrote:
> On Mon, 2012-10-01 at 20:34 +0200, Captain Obvious wrote:
>> Eric Dumazet <eric.dumazet@gmail.com> :
>> [...]
>>>>> Could you send
>>>>>
>>>>> iptables -t -nat -nvL
>>>>
>>>> $ iptables -t -nat -nvL
>>                    ^ typo
>>
>> Please try "iptables -t nat -nvL" as was also suggested.
>>
>
> Oh well, good catch ;)
>
> And for conntrack -L, please Chris add CONFIG_NF_CT_NETLINK=m to your
> kernel .config
>

$ conntrack -L
unknown  2 566 src=192.168.0.1 dst=224.0.0.1 [UNREPLIED] src=224.0.0.1 dst=192.168.0.1 use=1
icmp     1 25 src=192.168.200.1 dst=192.168.0.1 type=8 code=0 id=512 src=192.168.0.1 dst=192.168.0.40 type=0 code=0 id=512 use=1
conntrack v1.2.2 (conntrack-tools): 2 flow entries have been shown.
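
In the icmp entry, the original direction has src=192.168.200.1 (the guest) while the reply direction has dst=192.168.0.40 (the host's eth0 address), so conntrack did record a masquerade mapping for the ping. A minimal sketch for pulling that pair out of each flow, assuming one flow entry per line as conntrack prints it; ct_nat_mapping is a hypothetical helper name.

```shell
#!/bin/sh
# ct_nat_mapping: read "conntrack -L" entries on stdin, one flow per line,
# and print "ORIG_SRC -> REPLY_DST" for each flow that has both tuples.
# Hypothetical helper for illustration.
ct_nat_mapping() {
    awk '{
        n = 0; orig_src = ""; reply_dst = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^src=/) {
                n++                                  # 1 = original, 2 = reply
                if (n == 1) orig_src = substr($i, 5)
            } else if ($i ~ /^dst=/ && n == 2) {
                reply_dst = substr($i, 5)
            }
        }
        if (n == 2) print orig_src " -> " reply_dst
    }'
}

# Example (needs root and ctnetlink support):
#   conntrack -L -p icmp | ct_nat_mapping
```

When the two addresses differ, the flow has been source-NATed; when they are equal, as in the multicast entry, no translation was applied.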

>
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-01 19:34                                 ` Dave Jones
@ 2012-10-01 20:01                                   ` David Miller
  2012-10-01 20:04                                     ` Eric Dumazet
  0 siblings, 1 reply; 59+ messages in thread
From: David Miller @ 2012-10-01 20:01 UTC (permalink / raw)
  To: davej; +Cc: eric.dumazet, chris2553, netdev, gpiez

From: Dave Jones <davej@redhat.com>
Date: Mon, 1 Oct 2012 15:34:34 -0400

> On Mon, Oct 01, 2012 at 11:15:50AM +0200, Eric Dumazet wrote:
>  > > 
>  > > $ netstat -s
>  > > Icmp:
>  > >      4 ICMP messages received
>  > >      4 input ICMP message failed.
>  > >      ICMP input histogram:
>  > >          echo replies: 4
>  > 
>  > So icmp replies come back and are delivered to host instead of being
>  > forwarded.
>  > 
>  > I wonder if MASQUERADE broke...
> 
> I hit something that sounds just like this a few months back..
> http://lists.openwall.net/netdev/2012/07/25/53
> 
> It "went away" a few builds later, but I've seen it happen
> again from time to time.

Yep, I remember that report.

If you can find a way to more reliably trigger the case, that would
help us immensely.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-01 20:01                                   ` David Miller
@ 2012-10-01 20:04                                     ` Eric Dumazet
  2012-10-02 15:27                                       ` Edivaldo de Araújo Pereira
  2012-10-02 15:35                                       ` Eric Dumazet
  0 siblings, 2 replies; 59+ messages in thread
From: Eric Dumazet @ 2012-10-01 20:04 UTC (permalink / raw)
  To: David Miller; +Cc: davej, chris2553, netdev, gpiez

On Mon, 2012-10-01 at 16:01 -0400, David Miller wrote:
> From: Dave Jones <davej@redhat.com>
> Date: Mon, 1 Oct 2012 15:34:34 -0400
> 
> > On Mon, Oct 01, 2012 at 11:15:50AM +0200, Eric Dumazet wrote:
> >  > > 
> >  > > $ netstat -s
> >  > > Icmp:
> >  > >      4 ICMP messages received
> >  > >      4 input ICMP message failed.
> >  > >      ICMP input histogram:
> >  > >          echo replies: 4
> >  > 
> >  > So icmp replies come back and are delivered to host instead of being
> >  > forwarded.
> >  > 
> >  > I wonder if MASQUERADE broke...
> > 
> > I hit something that sounds just like this a few months back..
> > http://lists.openwall.net/netdev/2012/07/25/53
> > 
> > It "went away" a few builds later, but I've seen it happen
> > again from time to time.
> 
> Yep, I remember that report.
> 
> If you can find a way to more reliably trigger the case, that would
> help us immensely.

I am building a KMEMCHECK kernel, as a last try before my night ;)

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-01 20:04                                     ` Eric Dumazet
@ 2012-10-02 15:27                                       ` Edivaldo de Araújo Pereira
  2012-10-02 15:35                                       ` Eric Dumazet
  1 sibling, 0 replies; 59+ messages in thread
From: Edivaldo de Araújo Pereira @ 2012-10-02 15:27 UTC (permalink / raw)
  To: netdev

Eric Dumazet <eric.dumazet <at> gmail.com> writes:

> 
> On Mon, 2012-10-01 at 16:01 -0400, David Miller wrote:
> > From: Dave Jones <davej <at> redhat.com>
> > Date: Mon, 1 Oct 2012 15:34:34 -0400
> > 
> > > On Mon, Oct 01, 2012 at 11:15:50AM +0200, Eric Dumazet wrote:
> > >  > > 
> > >  > > $ netstat -s
> > >  > > Icmp:
> > >  > >      4 ICMP messages received
> > >  > >      4 input ICMP message failed.
> > >  > >      ICMP input histogram:
> > >  > >          echo replies: 4
> > >  > 
> > >  > So icmp replies come back and are delivered to host instead of being
> > >  > forwarded.
> > >  > 
> > >  > I wonder if MASQUERADE broke...
> > > 
> > > I hit something that sounds just like this a few months back..
> > > http://lists.openwall.net/netdev/2012/07/25/53
> > > 
> > > It "went away" a few builds later, but I've seen it happen
> > > again from time to time.
> > 
> > Yep, I remember that report.
> > 
> > If you can find a way to more reliably trigger the case, that would
> > help us immensely.
> 
> I am building a KMEMCHECK kernel, as a last try before my night ;)
> 
> 
Hi,

I'm facing this kind of problem, too, but it is a little different; from the 
kvm guest I can ping the local host and any host outside my local (physical) 
network, but cannot ping other hosts in the local (physical) net. This happens 
with guests in a virtual switch (vde) or in any bridged tun/tap. I switched 
back to 3.5.4, for now.

Thanks
Edivaldo de Araújo Pereira

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-01 20:04                                     ` Eric Dumazet
  2012-10-02 15:27                                       ` Edivaldo de Araújo Pereira
@ 2012-10-02 15:35                                       ` Eric Dumazet
  2012-10-02 15:48                                         ` Eric Dumazet
  1 sibling, 1 reply; 59+ messages in thread
From: Eric Dumazet @ 2012-10-02 15:35 UTC (permalink / raw)
  To: David Miller; +Cc: davej, chris2553, netdev, gpiez

On Mon, 2012-10-01 at 22:04 +0200, Eric Dumazet wrote:
> On Mon, 2012-10-01 at 16:01 -0400, David Miller wrote:

> > If you can find a way to more reliably trigger the case, that would
> > help us immensely.
> 
> I am building a KMEMCHECK kernel, as a last try before my night ;)

This was a total disaster. KMEMCHECK dies horribly on my machine.

David, shouldn't we use a nh_rth_forward instead of a nh_rth_input in
__mkroute_input() ?

(And change rt_cache_route() as well ?)

I am testing a patch right now.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-02 15:35                                       ` Eric Dumazet
@ 2012-10-02 15:48                                         ` Eric Dumazet
  2012-10-02 15:57                                           ` Dave Jones
                                                             ` (4 more replies)
  0 siblings, 5 replies; 59+ messages in thread
From: Eric Dumazet @ 2012-10-02 15:48 UTC (permalink / raw)
  To: David Miller; +Cc: chris2553, netdev, gpiez, Dave Jones

From: Eric Dumazet <edumazet@google.com>

On Tue, 2012-10-02 at 17:35 +0200, Eric Dumazet wrote:
> On Mon, 2012-10-01 at 22:04 +0200, Eric Dumazet wrote:
> > On Mon, 2012-10-01 at 16:01 -0400, David Miller wrote:
> 
> > > If you can find a way to more reliably trigger the case, that would
> > > help us immensely.
> > 
> > I am building a KMEMCHECK kernel, as a last try before my night ;)
> 
> This was a total disaster. KMEMCHECK dies horribly on my machine
> 
> David, shouldnt we use a nh_rth_forward instead of a nh_rth_input in
> __mkroute_input() ?
> 
> (And change rt_cache_route() as well ?)
> 
> I am testing a patch right now.

Yeah, this patch seems to fix the bug for me.

[PATCH] ipv4: properly cache forward routes

commit d2d68ba9fe8 (ipv4: Cache input routes in fib_info nexthops.)
introduced a regression for forwarding.

This was hard to reproduce but the symptom was that packets were
delivered to local host instead of being forwarded.

Add a separate cache (nh_rth_forward) to solve the problem.

Many thanks to Chris Clayton for his patience and help.

Reported-by: Chris Clayton <chris2553@googlemail.com>
Bisected-by: Chris Clayton <chris2553@googlemail.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/ip_fib.h     |    1 +
 net/ipv4/fib_semantics.c |    1 +
 net/ipv4/route.c         |   16 ++++++++--------
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 926142e..ce7ffe9 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -85,6 +85,7 @@ struct fib_nh {
 	int			nh_saddr_genid;
 	struct rtable __rcu * __percpu *nh_pcpu_rth_output;
 	struct rtable __rcu	*nh_rth_input;
+	struct rtable __rcu	*nh_rth_forward;
 	struct fnhe_hash_bucket	*nh_exceptions;
 };
 
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 3509065..45b5d1d 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -208,6 +208,7 @@ static void free_fib_info_rcu(struct rcu_head *head)
 			free_nh_exceptions(nexthop_nh);
 		rt_fibinfo_free_cpus(nexthop_nh->nh_pcpu_rth_output);
 		rt_fibinfo_free(&nexthop_nh->nh_rth_input);
+		rt_fibinfo_free(&nexthop_nh->nh_rth_forward);
 	} endfor_nexthops(fi);
 
 	release_net(fi->fib_net);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index ff62206..50898d6 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1193,14 +1193,12 @@ static bool rt_bind_exception(struct rtable *rt, struct fib_nh_exception *fnhe,
 	return ret;
 }
 
-static bool rt_cache_route(struct fib_nh *nh, struct rtable *rt)
+static bool rt_cache_route(struct fib_nh *nh, struct rtable *rt, struct rtable **p)
 {
-	struct rtable *orig, *prev, **p;
+	struct rtable *orig, *prev;
 	bool ret = true;
 
-	if (rt_is_input_route(rt)) {
-		p = (struct rtable **)&nh->nh_rth_input;
-	} else {
+	if (!p) {
 		if (!nh->nh_pcpu_rth_output)
 			goto nocache;
 		p = (struct rtable **)__this_cpu_ptr(nh->nh_pcpu_rth_output);
@@ -1290,7 +1288,7 @@ static void rt_set_nexthop(struct rtable *rt, __be32 daddr,
 		if (unlikely(fnhe))
 			cached = rt_bind_exception(rt, fnhe, daddr);
 		else if (!(rt->dst.flags & DST_NOCACHE))
-			cached = rt_cache_route(nh, rt);
+			cached = rt_cache_route(nh, rt, NULL);
 	}
 	if (unlikely(!cached))
 		rt_add_uncached_list(rt);
@@ -1462,7 +1460,7 @@ static int __mkroute_input(struct sk_buff *skb,
 	do_cache = false;
 	if (res->fi) {
 		if (!itag) {
-			rth = rcu_dereference(FIB_RES_NH(*res).nh_rth_input);
+			rth = rcu_dereference(FIB_RES_NH(*res).nh_rth_forward);
 			if (rt_cache_valid(rth)) {
 				skb_dst_set_noref(skb, &rth->dst);
 				goto out;
@@ -1493,6 +1491,8 @@ static int __mkroute_input(struct sk_buff *skb,
 
 	rt_set_nexthop(rth, daddr, res, NULL, res->fi, res->type, itag);
 	skb_dst_set(skb, &rth->dst);
+	if (do_cache)
+		rt_cache_route(&FIB_RES_NH(*res), rth, &FIB_RES_NH(*res).nh_rth_forward);
 out:
 	err = 0;
  cleanup:
@@ -1663,7 +1663,7 @@ local_input:
 		rth->rt_flags 	&= ~RTCF_LOCAL;
 	}
 	if (do_cache)
-		rt_cache_route(&FIB_RES_NH(res), rth);
+		rt_cache_route(&FIB_RES_NH(res), rth, &FIB_RES_NH(res).nh_rth_input);
 	skb_dst_set(skb, &rth->dst);
 	err = 0;
 	goto out;

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-02 15:48                                         ` Eric Dumazet
@ 2012-10-02 15:57                                           ` Dave Jones
  2012-10-02 16:06                                             ` Eric Dumazet
  2012-10-02 18:25                                           ` David Miller
                                                             ` (3 subsequent siblings)
  4 siblings, 1 reply; 59+ messages in thread
From: Dave Jones @ 2012-10-02 15:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, chris2553, netdev, gpiez

On Tue, Oct 02, 2012 at 05:48:39PM +0200, Eric Dumazet wrote:
 > From: Eric Dumazet <edumazet@google.com>
 > 
 > On Tue, 2012-10-02 at 17:35 +0200, Eric Dumazet wrote:
 > > On Mon, 2012-10-01 at 22:04 +0200, Eric Dumazet wrote:
 > > > On Mon, 2012-10-01 at 16:01 -0400, David Miller wrote:
 > > 
 > > > > If you can find a way to more reliably trigger the case, that would
 > > > > help us immensely.
 > > > 
 > > > I am building a KMEMCHECK kernel, as a last try before my night ;)
 > > 
 > > This was a total disaster. KMEMCHECK dies horribly on my machine
 > > 
 > > David, shouldnt we use a nh_rth_forward instead of a nh_rth_input in
 > > __mkroute_input() ?
 > > 
 > > (And change rt_cache_route() as well ?)
 > > 
 > > I am testing a patch right now.
 > 
 > Yeah, this patch seems to fix the bug for me.

Good work! Any idea why it didn't happen on every build for me ?

From your description, this should have failed every time ?

	Dave

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-02 15:57                                           ` Dave Jones
@ 2012-10-02 16:06                                             ` Eric Dumazet
  0 siblings, 0 replies; 59+ messages in thread
From: Eric Dumazet @ 2012-10-02 16:06 UTC (permalink / raw)
  To: Dave Jones; +Cc: David Miller, chris2553, netdev, gpiez

On Tue, 2012-10-02 at 11:57 -0400, Dave Jones wrote:

> 
> Good work! Any idea why it didn't happen on every build for me ?
> 
> From your description, this should have failed every time ?

Well, it seems that as long as you had forwarded packets and no route
yet cached in nh_rth_input, we were using a brand new (and correct)
route.

But as soon as locally generated traffic cached a route in
nh_rth_input, forwarded packets immediately used this cache and
were delivered to (and dropped by) the local host.

Maybe my patch is not the right fix, but at least it's a step in
understanding the problem.

Thanks

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-02 15:48                                         ` Eric Dumazet
  2012-10-02 15:57                                           ` Dave Jones
@ 2012-10-02 18:25                                           ` David Miller
  2012-10-02 21:14                                             ` Alexander Duyck
  2012-10-02 23:24                                           ` Julian Anastasov
                                                             ` (2 subsequent siblings)
  4 siblings, 1 reply; 59+ messages in thread
From: David Miller @ 2012-10-02 18:25 UTC (permalink / raw)
  To: eric.dumazet; +Cc: chris2553, netdev, gpiez, davej

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 02 Oct 2012 17:48:39 +0200

> [PATCH] ipv4: properly cache forward routes
> 
> commit d2d68ba9fe8 (ipv4: Cache input routes in fib_info nexthops.)
> introduced a regression for forwarding.
> 
> This was hard to reproduce but the symptom was that packets were
> delivered to local host instead of being forwarded.
> 
> Add a separate cache (nh_rth_forward) to solve the problem.
> 
> Many thanks to Chris Clayton for his patience and help.
> 
> Reported-by: Chris Clayton <chris2553@googlemail.com>
> Bisected-by: Chris Clayton <chris2553@googlemail.com>
> Reported-by: Dave Jones <davej@redhat.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Thanks for figuring this out, I'll think about this more
deeply.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-02 18:25                                           ` David Miller
@ 2012-10-02 21:14                                             ` Alexander Duyck
  2012-10-02 21:35                                               ` Eric Dumazet
  0 siblings, 1 reply; 59+ messages in thread
From: Alexander Duyck @ 2012-10-02 21:14 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, chris2553, netdev, gpiez, davej

On 10/02/2012 11:25 AM, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 02 Oct 2012 17:48:39 +0200
>
>> [PATCH] ipv4: properly cache forward routes
>>
>> commit d2d68ba9fe8 (ipv4: Cache input routes in fib_info nexthops.)
>> introduced a regression for forwarding.
>>
>> This was hard to reproduce but the symptom was that packets were
>> delivered to local host instead of being forwarded.
>>
>> Add a separate cache (nh_rth_forward) to solve the problem.
>>
>> Many thanks to Chris Clayton for his patience and help.
>>
>> Reported-by: Chris Clayton <chris2553@googlemail.com>
>> Bisected-by: Chris Clayton <chris2553@googlemail.com>
>> Reported-by: Dave Jones <davej@redhat.com>
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Thanks for figuring this out, I'll think about this more
> deeply.
I think something may have been missed in this patch.

With it applied to net-next I am unable to remove the ixgbe driver after
running a routing traffic test.  The specific message I am getting is:
    unregister_netdevice: waiting for eth2 to become free. Usage count = -7

Thanks,

Alex

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-02 21:14                                             ` Alexander Duyck
@ 2012-10-02 21:35                                               ` Eric Dumazet
  0 siblings, 0 replies; 59+ messages in thread
From: Eric Dumazet @ 2012-10-02 21:35 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: David Miller, chris2553, netdev, gpiez, davej

On Tue, 2012-10-02 at 14:14 -0700, Alexander Duyck wrote:

> I think something may have been missed in this patch.
> 
> With it applied to net-next I am unable to remove the ixgbe driver after
> running a routing traffic test.  The specific message I am getting is:
>     unregister_netdevice: waiting for eth2 to become free. Usage count = -7

Yes, I realized later that rt_set_nexthop(), called from
__mkroute_input(), was responsible for doing the caching...

So another version is needed, I'll do that tomorrow unless David can fix
the problem while I sleep a bit ;)

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-02 15:48                                         ` Eric Dumazet
  2012-10-02 15:57                                           ` Dave Jones
  2012-10-02 18:25                                           ` David Miller
@ 2012-10-02 23:24                                           ` Julian Anastasov
  2012-10-03  3:10                                             ` David Miller
  2012-10-03  7:28                                             ` [PATCH] udp: increment UDP_MIB_NOPORTS in mcast receive Eric Dumazet
  2012-10-03  2:55                                           ` Possible networking regression in 3.6.0 David Miller
  2012-10-04 11:25                                           ` [PATCH] ipv4: add a fib_type to fib_info Eric Dumazet
  4 siblings, 2 replies; 59+ messages in thread
From: Julian Anastasov @ 2012-10-02 23:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, chris2553, netdev, gpiez, Dave Jones


	Hello,

On Tue, 2 Oct 2012, Eric Dumazet wrote:

> > David, shouldnt we use a nh_rth_forward instead of a nh_rth_input in
> > __mkroute_input() ?
> > 
> > (And change rt_cache_route() as well ?)
> > 
> > I am testing a patch right now.
> 
> Yeah, this patch seems to fix the bug for me.
> 
> [PATCH] ipv4: properly cache forward routes
> 
> commit d2d68ba9fe8 (ipv4: Cache input routes in fib_info nexthops.)
> introduced a regression for forwarding.
> 
> This was hard to reproduce but the symptom was that packets were
> delivered to local host instead of being forwarded.
> 
> Add a separate cache (nh_rth_forward) to solve the problem.

	Can it be a problem related to fib_info reuse
by different routes? For example, when a local IP address
is created for a subnet we have:

broadcast 192.168.0.255 dev DEV  proto kernel  scope link  src 192.168.0.1
192.168.0.0/24 dev DEV  proto kernel  scope link  src 192.168.0.1
local 192.168.0.1 dev DEV  proto kernel  scope host  src 192.168.0.1

	The "dev DEV  proto kernel  scope link  src 192.168.0.1" is
a reused fib_info structure where we put cached routes.
The result can be same fib_info for 192.168.0.255 and
192.168.0.0/24. RTN_BROADCAST is cached only for input
routes. Incoming broadcast to 192.168.0.255 can be cached
and can cause problems for traffic forwarded to 192.168.0.0/24.
So, this patch should solve the problem because it
separates the broadcast from unicast traffic.

	And the ip_route_input_slow caching will work for
local and broadcast input routes (above routes 1 and 3) just
because they differ in scope and use different fib_info.

	Another possible failure is for output routes:

multicast 224.0.0.0/4 fib_info
with unicast
192.168.0.0/24 fib_info

	The multicast route sets RTCF_MULTICAST | RTCF_LOCAL
and can cause problems for generated unicast traffic on
fib_info reuse. It depends on the scope; for multicast it is
usually scope global, so maybe it is unlikely to happen
in practice.

	__mkroute_output works for local/unicast routes
because they differ in scope.

> Many thanks to Chris Clayton for his patience and help.
> 
> Reported-by: Chris Clayton <chris2553@googlemail.com>
> Bisected-by: Chris Clayton <chris2553@googlemail.com>
> Reported-by: Dave Jones <davej@redhat.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  include/net/ip_fib.h     |    1 +
>  net/ipv4/fib_semantics.c |    1 +
>  net/ipv4/route.c         |   16 ++++++++--------
>  3 files changed, 10 insertions(+), 8 deletions(-)

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-02 15:48                                         ` Eric Dumazet
                                                             ` (2 preceding siblings ...)
  2012-10-02 23:24                                           ` Julian Anastasov
@ 2012-10-03  2:55                                           ` David Miller
  2012-10-04 11:25                                           ` [PATCH] ipv4: add a fib_type to fib_info Eric Dumazet
  4 siblings, 0 replies; 59+ messages in thread
From: David Miller @ 2012-10-03  2:55 UTC (permalink / raw)
  To: eric.dumazet; +Cc: chris2553, netdev, gpiez, davej

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 02 Oct 2012 17:48:39 +0200

> [PATCH] ipv4: properly cache forward routes
> 
> commit d2d68ba9fe8 (ipv4: Cache input routes in fib_info nexthops.)
> introduced a regression for forwarding.
> 
> This was hard to reproduce but the symptom was that packets were
> delivered to local host instead of being forwarded.
> 
> Add a separate cache (nh_rth_forward) to solve the problem.
> 
> Many thanks to Chris Clayton for his patience and help.
> 
> Reported-by: Chris Clayton <chris2553@googlemail.com>
> Bisected-by: Chris Clayton <chris2553@googlemail.com>
> Reported-by: Dave Jones <davej@redhat.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

I'm still having trouble understanding how this can happen,
which is probably why I introduced this bug in the first
place :-)

Only INPUT routes created by ip_route_input_slow() cache using
nh_rth_input.

Routes for local destinations vs. forwarded destinations will
resolve to different fib_info objects.

If at some point a new route is added which turns a local destination
into one for which we forward, normal invalidation of cached routes
ought to fix it.

There's some sequence of events I don't understand that causes the
corrupt route cache, can you show it to me?

Thanks.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-02 23:24                                           ` Julian Anastasov
@ 2012-10-03  3:10                                             ` David Miller
  2012-10-03 15:01                                               ` Chris Clayton
  2012-10-03 20:57                                               ` Julian Anastasov
  2012-10-03  7:28                                             ` [PATCH] udp: increment UDP_MIB_NOPORTS in mcast receive Eric Dumazet
  1 sibling, 2 replies; 59+ messages in thread
From: David Miller @ 2012-10-03  3:10 UTC (permalink / raw)
  To: ja; +Cc: eric.dumazet, chris2553, netdev, gpiez, davej

From: Julian Anastasov <ja@ssi.bg>
Date: Wed, 3 Oct 2012 02:24:53 +0300 (EEST)

> 	Can it be a problem related to fib_info reuse
> from different routes. For example, when local IP address
> is created for subnet we have:
> 
> broadcast 192.168.0.255 dev DEV  proto kernel  scope link  src 192.168.0.1
> 192.168.0.0/24 dev DEV  proto kernel  scope link  src 192.168.0.1
> local 192.168.0.1 dev DEV  proto kernel  scope host  src 192.168.0.1
> 
> 	The "dev DEV  proto kernel  scope link  src 192.168.0.1" is
> a reused fib_info structure where we put cached routes.
> The result can be same fib_info for 192.168.0.255 and
> 192.168.0.0/24. RTN_BROADCAST is cached only for input
> routes. Incoming broadcast to 192.168.0.255 can be cached
> and can cause problems for traffic forwarded to 192.168.0.0/24.
> So, this patch should solve the problem because it
> separates the broadcast from unicast traffic.

Now I understand the problem.

I think the way to fix this is to add cfg->fc_type as another
thing that fib_info objects are key'd by.

I think it also would fix your obscure output multicast case too.
 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH] udp: increment UDP_MIB_NOPORTS in mcast receive
  2012-10-02 23:24                                           ` Julian Anastasov
  2012-10-03  3:10                                             ` David Miller
@ 2012-10-03  7:28                                             ` Eric Dumazet
  2012-10-03 12:45                                               ` David Stevens
  1 sibling, 1 reply; 59+ messages in thread
From: Eric Dumazet @ 2012-10-03  7:28 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: David Miller, chris2553, netdev, gpiez, Dave Jones

On Wed, 2012-10-03 at 02:24 +0300, Julian Anastasov wrote:
> 	Hello,
> 
> On Tue, 2 Oct 2012, Eric Dumazet wrote:
> 
> > > David, shouldnt we use a nh_rth_forward instead of a nh_rth_input in
> > > __mkroute_input() ?
> > > 
> > > (And change rt_cache_route() as well ?)
> > > 
> > > I am testing a patch right now.
> > 
> > Yeah, this patch seems to fix the bug for me.
> > 
> > [PATCH] ipv4: properly cache forward routes
> > 
> > commit d2d68ba9fe8 (ipv4: Cache input routes in fib_info nexthops.)
> > introduced a regression for forwarding.
> > 
> > This was hard to reproduce but the symptom was that packets were
> > delivered to local host instead of being forwarded.
> > 
> > Add a separate cache (nh_rth_forward) to solve the problem.
> 
> 	Can it be a problem related to fib_info reuse
> from different routes. For example, when local IP address
> is created for subnet we have:
> 
> broadcast 192.168.0.255 dev DEV  proto kernel  scope link  src 192.168.0.1
> 192.168.0.0/24 dev DEV  proto kernel  scope link  src 192.168.0.1
> local 192.168.0.1 dev DEV  proto kernel  scope host  src 192.168.0.1
> 
> 	The "dev DEV  proto kernel  scope link  src 192.168.0.1" is
> a reused fib_info structure where we put cached routes.
> The result can be same fib_info for 192.168.0.255 and
> 192.168.0.0/24. RTN_BROADCAST is cached only for input
> routes. Incoming broadcast to 192.168.0.255 can be cached
> and can cause problems for traffic forwarded to 192.168.0.0/24.
> So, this patch should solve the problem because it
> separates the broadcast from unicast traffic.
> 
> 	And the ip_route_input_slow caching will work for
> local and broadcast input routes (above routes 1 and 3) just
> because they differ in scope and use different fib_info.
> 
> 	Another possible failure is for output routes:
> 
> multicast 224.0.0.0/4 fib_info
> with unicast
> 192.168.0.0/24 fib_info
> 
> 	The multicast sets RTCF_MULTICAST | RTCF_LOCAL
> and can cause problems for generated unicast traffic on
> fib_info reuse. Depends on the scope, for multicast it is
> usually scope global, so may be it is difficult to happen
> in practice.
> 
> 	__mkroute_output works for local/unicast routes
> because they differ in scope.


Thanks Julian for this information.

BTW, it seems we don't properly increase UDP MIB counters when a
multicast message is not delivered to at least one socket.

Let's fix this to ease future bug hunting.

I hate when "netstat -s" is useless and we have to use dropwatch to
figure out where we drop a frame.

[PATCH] udp: increment UDP_MIB_NOPORTS in multicast receive

We should increment UDP_MIB_NOPORTS in the case we found
no socket to deliver a copy of one incoming UDP message.

(RFC 4113 udpNoPorts)

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/udp.c |    1 +
 net/ipv6/udp.c |    1 +
 2 files changed, 2 insertions(+)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 79c8dbe..dfa73c5 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1591,6 +1591,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 			sock_put(stack[i]);
 	} else {
 		kfree_skb(skb);
+		UDP_INC_STATS_BH(net, UDP_MIB_NOPORTS, udptable != &udp_table);
 	}
 	return 0;
 }
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index fc99972..0be9ac2 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -748,6 +748,7 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 			sock_put(stack[i]);
 	} else {
 		kfree_skb(skb);
+		UDP6_INC_STATS_BH(net, UDP_MIB_NOPORTS, udptable != &udp_table);
 	}
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH] udp: increment UDP_MIB_NOPORTS in mcast receive
  2012-10-03  7:28                                             ` [PATCH] udp: increment UDP_MIB_NOPORTS in mcast receive Eric Dumazet
@ 2012-10-03 12:45                                               ` David Stevens
  2012-10-03 13:15                                                 ` Eric Dumazet
  0 siblings, 1 reply; 59+ messages in thread
From: David Stevens @ 2012-10-03 12:45 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: chris2553, Dave Jones, David Miller, gpiez, Julian Anastasov,
	netdev, netdev-owner

netdev-owner@vger.kernel.org wrote on 10/03/2012 03:28:48 AM:
 
> BTW, it seems we dont properly increase UDP MIB counters when a
> multicast message is not delivered to at least one socket.

If an interface is in promiscuous mode or there are false
positives in a multicast address filter, wouldn't this count as
"drops" packets that were never intended for this machine?

I think an otherwise valid multicast or broadcast packet that doesn't
have a local receiver is not an error and shouldn't be counted.

                                                        +-DLS

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH] udp: increment UDP_MIB_NOPORTS in mcast receive
  2012-10-03 12:45                                               ` David Stevens
@ 2012-10-03 13:15                                                 ` Eric Dumazet
  2012-10-03 14:09                                                   ` David Stevens
  0 siblings, 1 reply; 59+ messages in thread
From: Eric Dumazet @ 2012-10-03 13:15 UTC (permalink / raw)
  To: David Stevens
  Cc: chris2553, Dave Jones, David Miller, gpiez, Julian Anastasov,
	netdev, netdev-owner

On Wed, 2012-10-03 at 08:45 -0400, David Stevens wrote:
> netdev-owner@vger.kernel.org wrote on 10/03/2012 03:28:48 AM:
>  
> > BTW, it seems we dont properly increase UDP MIB counters when a
> > multicast message is not delivered to at least one socket.
> 
> If an interface is in promiscuous mode or there are false
> positives in a multicast address filter, wouldn't this count as
> "drops" packets that were never intended for this machine?
> 

Yes, probably. So we drop them and it's expected.

> I think an otherwise valid multicast or broadcast packet that doesn't
> have a local receiver is not an error and shouldn't be counted.

Hmmm

This counter is not an "error counter", just a "counter".

The RFC definitions are exactly:

udpNoPorts OBJECT-TYPE
       SYNTAX     Counter32
       MAX-ACCESS read-only
       STATUS     current
       DESCRIPTION
              "The total number of received UDP datagrams for which
               there was no application at the destination port.

udpInErrors OBJECT-TYPE
       SYNTAX     Counter32
       MAX-ACCESS read-only
       STATUS     current
       DESCRIPTION
              "The number of received UDP datagrams that could not be
               delivered for reasons other than the lack of an
               application at the destination port.


So when a host receives a UDP datagram but there was no application
at the destination port we should increment udpNoPorts, and it's not
an error but just a fact.

Now _if_ some reader interprets udpNoPorts increases as an indication
of errors, this reader is wrong.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH] udp: increment UDP_MIB_NOPORTS in mcast receive
  2012-10-03 13:15                                                 ` Eric Dumazet
@ 2012-10-03 14:09                                                   ` David Stevens
  2012-10-03 15:29                                                     ` Eric Dumazet
  2012-10-03 17:39                                                     ` Rick Jones
  0 siblings, 2 replies; 59+ messages in thread
From: David Stevens @ 2012-10-03 14:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: chris2553, Dave Jones, David Miller, gpiez, Julian Anastasov,
	netdev, netdev-owner

Eric Dumazet <eric.dumazet@gmail.com> wrote on 10/03/2012 09:15:51 AM:
 
> So when a host receives an UDP datagram but there was no application
> at the destination port we should increment udpNoPorts, and its not
> an error but just a fact.

        Of course. I think our difference is on the definition of "receives".
I don't think a packet delivered locally due to promiscuous mode, broadcast
or an imperfect multicast address filter match is a host UDP datagram receive.
These packets really shouldn't be delivered to UDP at all; they are not
addressed to this host (at least the non-broadcast, no-membership ones).
        A unicast UDP packet that doesn't match a local IP address does not
increment this counter. A promiscuous mode multicast delivery is no different,
except that the destination alone doesn't tell us if it is for us.

        I think counting these will primarily lead to administrators seeing
non-zero drops and wasting their time trying to track them down.

                                                                +-DLS

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-03  3:10                                             ` David Miller
@ 2012-10-03 15:01                                               ` Chris Clayton
  2012-10-03 20:57                                               ` Julian Anastasov
  1 sibling, 0 replies; 59+ messages in thread
From: Chris Clayton @ 2012-10-03 15:01 UTC (permalink / raw)
  To: David Miller; +Cc: ja, eric.dumazet, netdev, gpiez, davej



On 10/03/12 04:10, David Miller wrote:
> From: Julian Anastasov <ja@ssi.bg>
> Date: Wed, 3 Oct 2012 02:24:53 +0300 (EEST)
>
>> 	Can it be a problem related to fib_info reuse
>> from different routes. For example, when local IP address
>> is created for subnet we have:
>>
>> broadcast 192.168.0.255 dev DEV  proto kernel  scope link  src 192.168.0.1
>> 192.168.0.0/24 dev DEV  proto kernel  scope link  src 192.168.0.1
>> local 192.168.0.1 dev DEV  proto kernel  scope host  src 192.168.0.1
>>
>> 	The "dev DEV  proto kernel  scope link  src 192.168.0.1" is
>> a reused fib_info structure where we put cached routes.
>> The result can be same fib_info for 192.168.0.255 and
>> 192.168.0.0/24. RTN_BROADCAST is cached only for input
>> routes. Incoming broadcast to 192.168.0.255 can be cached
>> and can cause problems for traffic forwarded to 192.168.0.0/24.
>> So, this patch should solve the problem because it
>> separates the broadcast from unicast traffic.
>
> Now I understand the problem.
>
> I think the way to fix this is to add cfg->fc_type as another
> thing that fib_info objects are key'd by.
>
> I think it also would fix your obscure output multicast case too.
>
>

I've seen the discussion about whether Eric's patch is OK or not, but 
thought I'd give it a spin anyway. It applies to 3.6.0 with some fuzz, 
but I can confirm that with the patch applied I can now ping my router 
and browse the internet from a KVM client, so Eric's diagnosis 
matches the problem I reported.

However, after closing the client, I got an oops. I've taken a 
photograph of the screen and uploaded it to 
http://i714.photobucket.com/albums/ww149/chris2553/IMAG0059.jpg. As it's 
not the final patch, this may be a red herring, but I thought I'd better 
give a heads up anyway.

Chris

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH] udp: increment UDP_MIB_NOPORTS in mcast receive
  2012-10-03 14:09                                                   ` David Stevens
@ 2012-10-03 15:29                                                     ` Eric Dumazet
  2012-10-03 17:31                                                       ` David Stevens
  2012-10-03 17:39                                                     ` Rick Jones
  1 sibling, 1 reply; 59+ messages in thread
From: Eric Dumazet @ 2012-10-03 15:29 UTC (permalink / raw)
  To: David Stevens
  Cc: chris2553, Dave Jones, David Miller, gpiez, Julian Anastasov,
	netdev, netdev-owner

On Wed, 2012-10-03 at 10:09 -0400, David Stevens wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote on 10/03/2012 09:15:51 AM:
>  
> > So when a host receives a UDP datagram but there was no application
> > at the destination port we should increment udpNoPorts, and it's not
> > an error but just a fact.
> 
>         Of course. I think our difference is on the definition of 
> "receives".

A receive is a packet delivered to this host.
Interface being promiscuous or not doesn't really matter.

> I don't think a packet delivered locally due to promiscuous mode,
> broadcast or an imperfect multicast address filter match is a host UDP
> datagram receive.
> These packets really shouldn't be delivered to UDP at all; they are not
> addressed to this host (at least the non-broadcast, no-membership ones).

That's the bug we are currently tracking. If some error is happening and
a packet is delivered instead of being forwarded or dropped, we need a
counter to be incremented to catch the bug.

>         A unicast UDP packet that doesn't match a local IP address does
> not increment this counter.

It _does_ increment this counter right now, not sure what you mean.

We currently correctly increment udpNoPorts if we receive a unicast UDP
packet that doesn't find a matching socket (because socket(s) are bound
to specific addresses instead of INADDR_ANY).

This is an extension of the "there was no application at the destination
port" to "there was no application at the destination port and
destination address"
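
The "port and address" matching described above can be illustrated with a
small user-space model. This is only a sketch under stated assumptions, not
the kernel's socket lookup: struct toy_sock, struct toy_udp_stats,
TOY_ADDR_ANY and toy_deliver() are hypothetical names invented for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a wildcard bind; not a kernel constant. */
#define TOY_ADDR_ANY 0u

/* Hypothetical model of a bound UDP socket. */
struct toy_sock {
	uint32_t addr;	/* bound local address, or TOY_ADDR_ANY */
	uint16_t port;	/* bound local port */
};

/* Hypothetical counters, loosely modelled on InDatagrams and NoPorts. */
struct toy_udp_stats {
	unsigned long in_datagrams;
	unsigned long no_ports;
};

/*
 * Return the index of the first socket matching both destination port
 * and destination address. If none matches, count the datagram as a
 * "no port" event and return -1.
 */
static int toy_deliver(const struct toy_sock *socks, size_t n,
		       uint32_t daddr, uint16_t dport,
		       struct toy_udp_stats *st)
{
	assert(st != NULL);
	assert(socks != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (socks[i].port != dport)
			continue;
		/* A socket bound to a specific address only matches it. */
		if (socks[i].addr != TOY_ADDR_ANY && socks[i].addr != daddr)
			continue;
		st->in_datagrams++;
		return (int)i;
	}
	st->no_ports++;
	return -1;
}
```

In this model a datagram for a port that is in use, but only by a socket
bound to a different address, is counted the same way as a datagram for a
port nobody listens on.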

> A promiscuous mode multicast delivery is no different, except that the
> destination alone doesn't tell us if it is for us.
> 
>         I think counting these will primarily lead to administrators
> seeing non-zero drops and wasting their time trying to track them down.

Well, as I said, seeing increments of this counter is perfectly fine and
matches the RFC. It permits better diagnostics. Hiding bugs is not very
helpful.

Most of the time when I am trying to track a bug in the Linux network
stack, the very first thing I ask reporters is to post "netstat -s"
before/after their tests, exactly because I want to see _some_ counters
be incremented and catch obvious problems.

And alas, many drops in our stack are not correctly reported because we
forgot to increment a counter at the right place.

I am fine adding a new SNMP McastDrops counter if you feel it's better.

# grep Udp: /proc/net/snmp
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors McastDrops
Udp: 11449164 15473 514616 290821178 0 184352 134

"netstat -s -u" would display :

Udp:
    11449164 packets received
    15473 packets to unknown port received.
    514616 packet receive errors
    290821178 packets sent
    SndbufErrors: 184352
    McastDrops: 134
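
For readers who want to compare counters before and after a test without
netstat, the header/value line pair shown above can be parsed directly. The
following is a minimal user-space sketch; snmp_lookup() is a hypothetical
helper written for this illustration, and McastDrops only exists with the
unofficial patch below applied.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up one counter by name in a header/value line pair, such as the
 * two "Udp:" lines of /proc/net/snmp. Returns 0 and stores the value in
 * *out on success, or -1 if the name is absent or the lines are malformed.
 */
static int snmp_lookup(const char *header, const char *values,
		       const char *name, unsigned long long *out)
{
	assert(header && values && name && out);

	size_t name_len = strlen(name);

	/* Skip the "Udp:" style prefix on both lines. */
	const char *h = strchr(header, ':');
	const char *v = strchr(values, ':');
	if (!h || !v)
		return -1;
	h++;
	v++;

	for (;;) {
		/* Advance both lines to their next whitespace-separated field. */
		h += strspn(h, " \t\n");
		v += strspn(v, " \t\n");
		if (*h == '\0' || *v == '\0')
			return -1;

		size_t hlen = strcspn(h, " \t\n");
		size_t vlen = strcspn(v, " \t\n");

		if (hlen == name_len && strncmp(h, name, name_len) == 0) {
			char *end;
			unsigned long long val = strtoull(v, &end, 10);

			if (end == v)
				return -1;
			*out = val;
			return 0;
		}
		h += hlen;
		v += vlen;
	}
}
```

Calling it once before and once after a test run and subtracting the two
values gives the per-test increment of a counter such as NoPorts.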


Unofficial patch, since net-next is not open:

 include/linux/snmp.h |    1 +
 net/ipv4/proc.c      |    1 +
 net/ipv4/udp.c       |    2 ++
 net/ipv6/proc.c      |    2 ++
 net/ipv6/udp.c       |    2 ++
 5 files changed, 8 insertions(+)

diff --git a/include/linux/snmp.h b/include/linux/snmp.h
index 00bc189..321d643 100644
--- a/include/linux/snmp.h
+++ b/include/linux/snmp.h
@@ -145,6 +145,7 @@ enum
 	UDP_MIB_OUTDATAGRAMS,			/* OutDatagrams */
 	UDP_MIB_RCVBUFERRORS,			/* RcvbufErrors */
 	UDP_MIB_SNDBUFERRORS,			/* SndbufErrors */
+	UDP_MIB_MCASTDROPS,			/* McastDrops (linux extension) */
 	__UDP_MIB_MAX
 };
 
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 957acd1..1e932ee 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -172,6 +172,7 @@ static const struct snmp_mib snmp4_udp_list[] = {
 	SNMP_MIB_ITEM("OutDatagrams", UDP_MIB_OUTDATAGRAMS),
 	SNMP_MIB_ITEM("RcvbufErrors", UDP_MIB_RCVBUFERRORS),
 	SNMP_MIB_ITEM("SndbufErrors", UDP_MIB_SNDBUFERRORS),
+	SNMP_MIB_ITEM("McastDrops", UDP_MIB_MCASTDROPS),
 	SNMP_MIB_SENTINEL
 };
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 2814f66..4e2a4f7 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1591,6 +1591,8 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 			sock_put(stack[i]);
 	} else {
 		kfree_skb(skb);
+		UDP_INC_STATS_BH(net, UDP_MIB_MCASTDROPS,
+				 udptable != &udp_table);
 	}
 	return 0;
 }
diff --git a/net/ipv6/proc.c b/net/ipv6/proc.c
index 745a320..f2c12ea 100644
--- a/net/ipv6/proc.c
+++ b/net/ipv6/proc.c
@@ -129,6 +129,7 @@ static const struct snmp_mib snmp6_udp6_list[] = {
 	SNMP_MIB_ITEM("Udp6OutDatagrams", UDP_MIB_OUTDATAGRAMS),
 	SNMP_MIB_ITEM("Udp6RcvbufErrors", UDP_MIB_RCVBUFERRORS),
 	SNMP_MIB_ITEM("Udp6SndbufErrors", UDP_MIB_SNDBUFERRORS),
+	SNMP_MIB_ITEM("Udp6McastDrops", UDP_MIB_MCASTDROPS),
 	SNMP_MIB_SENTINEL
 };
 
@@ -139,6 +140,7 @@ static const struct snmp_mib snmp6_udplite6_list[] = {
 	SNMP_MIB_ITEM("UdpLite6OutDatagrams", UDP_MIB_OUTDATAGRAMS),
 	SNMP_MIB_ITEM("UdpLite6RcvbufErrors", UDP_MIB_RCVBUFERRORS),
 	SNMP_MIB_ITEM("UdpLite6SndbufErrors", UDP_MIB_SNDBUFERRORS),
+	SNMP_MIB_ITEM("UdpLite6McastDrops", UDP_MIB_MCASTDROPS),
 	SNMP_MIB_SENTINEL
 };
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 07e2bfe..c8caf1b 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -748,6 +748,8 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 			sock_put(stack[i]);
 	} else {
 		kfree_skb(skb);
+		UDP6_INC_STATS_BH(net, UDP_MIB_MCASTDROPS,
+				  udptable != &udp_table);
 	}
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH] udp: increment UDP_MIB_NOPORTS in mcast receive
  2012-10-03 15:29                                                     ` Eric Dumazet
@ 2012-10-03 17:31                                                       ` David Stevens
  2012-10-03 19:30                                                         ` David Miller
  0 siblings, 1 reply; 59+ messages in thread
From: David Stevens @ 2012-10-03 17:31 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: chris2553, Dave Jones, David Miller, gpiez, Julian Anastasov,
	netdev, netdev-owner

Eric Dumazet <eric.dumazet@gmail.com> wrote on 10/03/2012 11:29:13 AM:

> >         Of course. I think our difference is on the definition of 
> > "receives".
> 
> A receive is a packet delivered to this host.
> Interface being promiscuous or not doesn't really matter.

        A receive is a packet *addressed* to this host. My point was
that running tcpdump/wireshark to look at other hosts' traffic
shouldn't affect any UDP MIB (these are ordinarily filtered by IP),
but I forgot that we are checking in software, as well as the HW multicast
address filter, for multicast group membership. So promiscuous mode and
imperfect NIC MAF hashes shouldn't actually result in local delivery
and that problem isn't there at all.

        I do think, still, that it is common to have broadcasts and
multicasts (for joined groups, even) with traffic completely uninteresting
to this host and that having a drop counter going up for those will
appear to be losses and errors when they are completely harmless and
irrelevant.
        But since it can't be incremented for items that are not actually
addressed to the local host, as I originally thought, I don't object
anymore. Sorry for the sidetrack -- I should've verified that originally.

                                                        +-DLS

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH] udp: increment UDP_MIB_NOPORTS in mcast receive
  2012-10-03 14:09                                                   ` David Stevens
  2012-10-03 15:29                                                     ` Eric Dumazet
@ 2012-10-03 17:39                                                     ` Rick Jones
  1 sibling, 0 replies; 59+ messages in thread
From: Rick Jones @ 2012-10-03 17:39 UTC (permalink / raw)
  To: David Stevens
  Cc: Eric Dumazet, chris2553, Dave Jones, David Miller, gpiez,
	Julian Anastasov, netdev, netdev-owner

On 10/03/2012 07:09 AM, David Stevens wrote:
> Of course. I think our difference is on the definition of
> "receives". I don't think a packet delivered locally due to
> promiscuous mode, broadcast or an imperfect multicast address filter
> match is a host UDP datagram receive. These packets really shouldn't
> be delivered to UDP at all; they are not addressed to this host (at
> least the non-broadcast, no-membership ones). A unicast UDP packet
> that doesn't match a local IP address does not increment this
> counter. A promiscuous mode multicast delivery is no different,
> except that the destination alone doesn't tell us if it is for us.
>
> I think counting these will primarily lead to administrators seeing
> non-zero drops and wasting their time trying to track them down.

I would tend to agree with David on this one. Or they might cease trying 
to track them down because they've gotten so many "false positives."

Isn't "meant for me" vs "not meant for me" at the heart of "drops" 
versus "discards?"

Once the packet is in the host, is it tagged in some way with "this was 
received as promiscuous/whatnot?"

rick

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH] udp: increment UDP_MIB_NOPORTS in mcast receive
  2012-10-03 17:31                                                       ` David Stevens
@ 2012-10-03 19:30                                                         ` David Miller
  0 siblings, 0 replies; 59+ messages in thread
From: David Miller @ 2012-10-03 19:30 UTC (permalink / raw)
  To: dlstevens; +Cc: eric.dumazet, chris2553, davej, gpiez, ja, netdev, netdev-owner

From: David Stevens <dlstevens@us.ibm.com>
Date: Wed, 3 Oct 2012 13:31:30 -0400

> Eric Dumazet <eric.dumazet@gmail.com> wrote on 10/03/2012 11:29:13 AM:
> 
>> >         Of course. I think our difference is on the definition of 
>> > "receives".
>> 
>> A receive is a packet delivered to this host.
>> Interface being promiscuous or not doesn't really matter.
> 
>         A receive is a packet *addressed* to this host.

Although I'm largely ambivalent, this one sentence tipped me over
towards David's side on this issue.

But this is easy to resolve, Eric: simply make a new custom
counter that counts these new cases you care about and document it
properly.

Thanks.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Possible networking regression in 3.6.0
  2012-10-03  3:10                                             ` David Miller
  2012-10-03 15:01                                               ` Chris Clayton
@ 2012-10-03 20:57                                               ` Julian Anastasov
  1 sibling, 0 replies; 59+ messages in thread
From: Julian Anastasov @ 2012-10-03 20:57 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, chris2553, netdev, gpiez, davej


	Hello,

On Tue, 2 Oct 2012, David Miller wrote:

> From: Julian Anastasov <ja@ssi.bg>
> Date: Wed, 3 Oct 2012 02:24:53 +0300 (EEST)
> 
> > 	Can it be a problem related to fib_info reuse
> > from different routes. For example, when local IP address
> > is created for subnet we have:
> > 
> > broadcast 192.168.0.255 dev DEV  proto kernel  scope link  src 192.168.0.1
> > 192.168.0.0/24 dev DEV  proto kernel  scope link  src 192.168.0.1
> > local 192.168.0.1 dev DEV  proto kernel  scope host  src 192.168.0.1
> > 
> > 	The "dev DEV  proto kernel  scope link  src 192.168.0.1" is
> > a reused fib_info structure where we put cached routes.
> > The result can be same fib_info for 192.168.0.255 and
> > 192.168.0.0/24. RTN_BROADCAST is cached only for input
> > routes. Incoming broadcast to 192.168.0.255 can be cached
> > and can cause problems for traffic forwarded to 192.168.0.0/24.
> > So, this patch should solve the problem because it
> > separates the broadcast from unicast traffic.
> 
> Now I understand the problem.
> 
> I think the way to fix this is to add cfg->fc_type as another
> thing that fib_info objects are key'd by.
> 
> I think it also would fix your obscure output multicast case too.

	Agreed. I don't see a problem with this idea.
It will avoid confusion with rt_type.
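
	The effect of keying by type can be shown with a toy user-space
model. It is a sketch only: struct toy_fib_info, struct toy_table,
toy_find_or_create() and the TOY_* constants are hypothetical names for
this illustration and are much simpler than the real fib_info code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for route types; the numeric values are arbitrary. */
enum toy_type { TOY_UNICAST = 1, TOY_LOCAL, TOY_BROADCAST };

/* Arbitrary toy value standing in for "scope link". */
#define TOY_SCOPE_LINK 253

/* Much-reduced model of a shared route-info object. */
struct toy_fib_info {
	uint8_t  scope;
	uint8_t  type;
	uint32_t prefsrc;
};

#define TOY_MAX 8

struct toy_table {
	struct toy_fib_info entries[TOY_MAX];
	size_t count;
	bool key_by_type;	/* false: old behaviour, true: type is part of the key */
};

/*
 * Return an existing entry whose key matches, or create a new one.
 * Returns NULL when the table is full.
 */
static struct toy_fib_info *toy_find_or_create(struct toy_table *t,
					       uint8_t scope, uint8_t type,
					       uint32_t prefsrc)
{
	assert(t != NULL);

	for (size_t i = 0; i < t->count; i++) {
		struct toy_fib_info *fi = &t->entries[i];

		if (fi->scope == scope && fi->prefsrc == prefsrc &&
		    (!t->key_by_type || fi->type == type))
			return fi;
	}
	if (t->count == TOY_MAX)
		return NULL;

	struct toy_fib_info *fi = &t->entries[t->count++];

	fi->scope = scope;
	fi->type = type;
	fi->prefsrc = prefsrc;
	return fi;
}
```

	Without the type in the key, the broadcast route and the subnet
route from the example above resolve to one shared object, so anything
cached on it for one is visible to the other. With the type in the key
they get separate objects.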

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH] ipv4: add a fib_type to fib_info
  2012-10-02 15:48                                         ` Eric Dumazet
                                                             ` (3 preceding siblings ...)
  2012-10-03  2:55                                           ` Possible networking regression in 3.6.0 David Miller
@ 2012-10-04 11:25                                           ` Eric Dumazet
  2012-10-04 13:08                                             ` Chris Clayton
  4 siblings, 1 reply; 59+ messages in thread
From: Eric Dumazet @ 2012-10-04 11:25 UTC (permalink / raw)
  To: David Miller; +Cc: chris2553, netdev, gpiez, Dave Jones, Julian Anastasov

On Tue, 2012-10-02 at 17:48 +0200, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> On Tue, 2012-10-02 at 17:35 +0200, Eric Dumazet wrote:
> > On Mon, 2012-10-01 at 22:04 +0200, Eric Dumazet wrote:
> > > On Mon, 2012-10-01 at 16:01 -0400, David Miller wrote:
> > 
> > > > If you can find a way to more reliably trigger the case, that would
> > > > help us immensely.
> > > 
> > > I am building a KMEMCHECK kernel, as a last try before my night ;)
> > 
> > This was a total disaster. KMEMCHECK dies horribly on my machine
> > 
> > David, shouldn't we use a nh_rth_forward instead of a nh_rth_input in
> > __mkroute_input() ?
> > 
> > (And change rt_cache_route() as well ?)
> > 
> > I am testing a patch right now.
> 

OK, so I implemented David's idea and it seems to work.

Testers are needed, thanks! ;)

[PATCH] ipv4: add a fib_type to fib_info

commit d2d68ba9fe8 (ipv4: Cache input routes in fib_info nexthops.)
introduced a regression for forwarding.

This was hard to reproduce but the symptom was that packets were
delivered to local host instead of being forwarded.

David suggested adding fib_type to fib_info so that we don't
inadvertently share the same fib_info for different purposes.

With help from Julian Anastasov, who provided very helpful
hints, reproduced here:

<quote>
        Can it be a problem related to fib_info reuse
from different routes. For example, when local IP address
is created for subnet we have:

broadcast 192.168.0.255 dev DEV  proto kernel  scope link  src 192.168.0.1
192.168.0.0/24 dev DEV  proto kernel  scope link  src 192.168.0.1
local 192.168.0.1 dev DEV  proto kernel  scope host  src 192.168.0.1

        The "dev DEV  proto kernel  scope link  src 192.168.0.1" is
a reused fib_info structure where we put cached routes.
The result can be same fib_info for 192.168.0.255 and
192.168.0.0/24. RTN_BROADCAST is cached only for input
routes. Incoming broadcast to 192.168.0.255 can be cached
and can cause problems for traffic forwarded to 192.168.0.0/24.
So, this patch should solve the problem because it
separates the broadcast from unicast traffic.

        And the ip_route_input_slow caching will work for
local and broadcast input routes (above routes 1 and 3) just
because they differ in scope and use different fib_info.

</quote>

Many thanks to Chris Clayton for his patience and help.

Reported-by: Chris Clayton <chris2553@googlemail.com>
Bisected-by: Chris Clayton <chris2553@googlemail.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
---
 include/net/ip_fib.h     |    1 +
 net/ipv4/fib_semantics.c |    2 ++
 2 files changed, 3 insertions(+)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 926142e..9497be1 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -102,6 +102,7 @@ struct fib_info {
 	unsigned char		fib_dead;
 	unsigned char		fib_protocol;
 	unsigned char		fib_scope;
+	unsigned char		fib_type;
 	__be32			fib_prefsrc;
 	u32			fib_priority;
 	u32			*fib_metrics;
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 3509065..2677530 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -314,6 +314,7 @@ static struct fib_info *fib_find_info(const struct fib_info *nfi)
 		    nfi->fib_scope == fi->fib_scope &&
 		    nfi->fib_prefsrc == fi->fib_prefsrc &&
 		    nfi->fib_priority == fi->fib_priority &&
+		    nfi->fib_type == fi->fib_type &&
 		    memcmp(nfi->fib_metrics, fi->fib_metrics,
 			   sizeof(u32) * RTAX_MAX) == 0 &&
 		    ((nfi->fib_flags ^ fi->fib_flags) & ~RTNH_F_DEAD) == 0 &&
@@ -833,6 +834,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
 	fi->fib_flags = cfg->fc_flags;
 	fi->fib_priority = cfg->fc_priority;
 	fi->fib_prefsrc = cfg->fc_prefsrc;
+	fi->fib_type = cfg->fc_type;
 
 	fi->fib_nhs = nhs;
 	change_nexthops(fi) {

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH] ipv4: add a fib_type to fib_info
  2012-10-04 11:25                                           ` [PATCH] ipv4: add a fib_type to fib_info Eric Dumazet
@ 2012-10-04 13:08                                             ` Chris Clayton
  2012-10-04 13:32                                               ` Eric Dumazet
  0 siblings, 1 reply; 59+ messages in thread
From: Chris Clayton @ 2012-10-04 13:08 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, gpiez, Dave Jones, Julian Anastasov



On 10/04/12 12:25, Eric Dumazet wrote:
> On Tue, 2012-10-02 at 17:48 +0200, Eric Dumazet wrote:
>> From: Eric Dumazet <edumazet@google.com>
>>
>> On Tue, 2012-10-02 at 17:35 +0200, Eric Dumazet wrote:
>>> On Mon, 2012-10-01 at 22:04 +0200, Eric Dumazet wrote:
>>>> On Mon, 2012-10-01 at 16:01 -0400, David Miller wrote:
>>>
>>>>> If you can find a way to more reliably trigger the case, that would
>>>>> help us immensely.
>>>>
>>>> I am building a KMEMCHECK kernel, as a last try before my night ;)
>>>
>>> This was a total disaster. KMEMCHECK dies horribly on my machine
>>>
>>> David, shouldn't we use a nh_rth_forward instead of a nh_rth_input in
>>> __mkroute_input() ?
>>>
>>> (And change rt_cache_route() as well ?)
>>>
>>> I am testing a patch right now.
>>
>
> OK, so I implemented David's idea and it seems to work.
>
> Testers are needed, thanks! ;)
>

I've tested 3.6.0 with this patch applied and networking in a WinXP KVM 
client is now working fine. The patch applies cleanly to 3.6.0, so I 
assume the patch will be forwarded to stable in due course.

Tested-by: Chris Clayton <chris2553@googlemail.com>

> [PATCH] ipv4: add a fib_type to fib_info
>
> commit d2d68ba9fe8 (ipv4: Cache input routes in fib_info nexthops.)
> introduced a regression for forwarding.
>
> This was hard to reproduce but the symptom was that packets were
> delivered to local host instead of being forwarded.
>
> David suggested adding fib_type to fib_info so that we don't
> inadvertently share the same fib_info for different purposes.
>
> With help from Julian Anastasov, who provided very helpful
> hints, reproduced here:
>
> <quote>
>          Can it be a problem related to fib_info reuse
> from different routes. For example, when local IP address
> is created for subnet we have:
>
> broadcast 192.168.0.255 dev DEV  proto kernel  scope link  src 192.168.0.1
> 192.168.0.0/24 dev DEV  proto kernel  scope link  src 192.168.0.1
> local 192.168.0.1 dev DEV  proto kernel  scope host  src 192.168.0.1
>
>          The "dev DEV  proto kernel  scope link  src 192.168.0.1" is
> a reused fib_info structure where we put cached routes.
> The result can be same fib_info for 192.168.0.255 and
> 192.168.0.0/24. RTN_BROADCAST is cached only for input
> routes. Incoming broadcast to 192.168.0.255 can be cached
> and can cause problems for traffic forwarded to 192.168.0.0/24.
> So, this patch should solve the problem because it
> separates the broadcast from unicast traffic.
>
>          And the ip_route_input_slow caching will work for
> local and broadcast input routes (above routes 1 and 3) just
> because they differ in scope and use different fib_info.
>
> </quote>
>
> Many thanks to Chris Clayton for his patience and help.
>
> Reported-by: Chris Clayton <chris2553@googlemail.com>
> Bisected-by: Chris Clayton <chris2553@googlemail.com>
> Reported-by: Dave Jones <davej@redhat.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Julian Anastasov <ja@ssi.bg>
> ---
>   include/net/ip_fib.h     |    1 +
>   net/ipv4/fib_semantics.c |    2 ++
>   2 files changed, 3 insertions(+)
>
> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
> index 926142e..9497be1 100644
> --- a/include/net/ip_fib.h
> +++ b/include/net/ip_fib.h
> @@ -102,6 +102,7 @@ struct fib_info {
>   	unsigned char		fib_dead;
>   	unsigned char		fib_protocol;
>   	unsigned char		fib_scope;
> +	unsigned char		fib_type;
>   	__be32			fib_prefsrc;
>   	u32			fib_priority;
>   	u32			*fib_metrics;
> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> index 3509065..2677530 100644
> --- a/net/ipv4/fib_semantics.c
> +++ b/net/ipv4/fib_semantics.c
> @@ -314,6 +314,7 @@ static struct fib_info *fib_find_info(const struct fib_info *nfi)
>   		    nfi->fib_scope == fi->fib_scope &&
>   		    nfi->fib_prefsrc == fi->fib_prefsrc &&
>   		    nfi->fib_priority == fi->fib_priority &&
> +		    nfi->fib_type == fi->fib_type &&
>   		    memcmp(nfi->fib_metrics, fi->fib_metrics,
>   			   sizeof(u32) * RTAX_MAX) == 0 &&
>   		    ((nfi->fib_flags ^ fi->fib_flags) & ~RTNH_F_DEAD) == 0 &&
> @@ -833,6 +834,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
>   	fi->fib_flags = cfg->fc_flags;
>   	fi->fib_priority = cfg->fc_priority;
>   	fi->fib_prefsrc = cfg->fc_prefsrc;
> +	fi->fib_type = cfg->fc_type;
>
>   	fi->fib_nhs = nhs;
>   	change_nexthops(fi) {
>
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH] ipv4: add a fib_type to fib_info
  2012-10-04 13:08                                             ` Chris Clayton
@ 2012-10-04 13:32                                               ` Eric Dumazet
  2012-10-04 18:14                                                 ` David Miller
  0 siblings, 1 reply; 59+ messages in thread
From: Eric Dumazet @ 2012-10-04 13:32 UTC (permalink / raw)
  To: Chris Clayton; +Cc: David Miller, netdev, gpiez, Dave Jones, Julian Anastasov

On Thu, 2012-10-04 at 14:08 +0100, Chris Clayton wrote:

> I've tested 3.6.0 with this patch applied and networking in a WinXP KVM 
> client is now working fine. The patch applies cleanly to 3.6.0, so I 
> assume the patch will be forwarded to stable in due course.
> 
> Tested-by: Chris Clayton <chris2553@googlemail.com>

Thanks for testing.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH] ipv4: add a fib_type to fib_info
  2012-10-04 13:32                                               ` Eric Dumazet
@ 2012-10-04 18:14                                                 ` David Miller
  0 siblings, 0 replies; 59+ messages in thread
From: David Miller @ 2012-10-04 18:14 UTC (permalink / raw)
  To: eric.dumazet; +Cc: chris2553, netdev, gpiez, davej, ja

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 04 Oct 2012 15:32:08 +0200

> On Thu, 2012-10-04 at 14:08 +0100, Chris Clayton wrote:
> 
>> I've tested 3.6.0 with this patch applied and networking in a WinXP KVM 
>> client is now working fine. The patch applies cleanly to 3.6.0, so I 
>> assume the patch will be forwarded to stable in due course.
>> 
>> Tested-by: Chris Clayton <chris2553@googlemail.com>
> 
> Thanks for testing.

Applied and queued up for -stable, thanks everyone.

Note that this change means we can completely remove the type fields
from fib_alias and fib_result when net-next opens up, as the value can
be fetched from the fib_info directly now.

^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2012-10-04 18:14 UTC | newest]

Thread overview: 59+ messages
2012-09-17 15:44 Possible networking regression in 3.6.0 Chris Clayton
2012-09-18 14:21 ` Chris Clayton
2012-09-18 14:31   ` Chris Clayton
2012-09-18 14:40     ` Eric Dumazet
2012-09-18 15:51       ` Chris Clayton
2012-09-19 15:26       ` Chris Clayton
2012-09-22  6:26         ` Chris Clayton
2012-09-27 11:50           ` Chris Clayton
2012-09-27 12:14             ` Eric Dumazet
2012-09-27 18:05               ` Chris Clayton
2012-09-27 21:03                 ` Eric Dumazet
2012-09-27 21:17                   ` Eric Dumazet
2012-09-28  6:53                     ` David Miller
2012-09-28  9:14                       ` Chris Clayton
2012-09-28  9:22                     ` Chris Clayton
2012-09-28 11:26                       ` Eric Dumazet
2012-09-28 14:28                         ` Chris Clayton
2012-09-30 15:26                         ` Chris Clayton
2012-09-30 19:45                           ` Eric Dumazet
2012-10-01  8:36                             ` Chris Clayton
2012-10-01  9:15                               ` Eric Dumazet
2012-10-01 15:13                                 ` Chris Clayton
2012-10-01 15:31                                   ` Eric Dumazet
2012-10-01 16:19                                     ` Chris Clayton
2012-10-01 16:37                                       ` Eric Dumazet
2012-10-01 18:28                                         ` Chris Clayton
2012-10-01 18:34                                     ` Captain Obvious
2012-10-01 19:21                                       ` Eric Dumazet
2012-10-01 19:55                                         ` Chris Clayton
2012-10-01 19:22                                       ` Chris Clayton
2012-10-01 19:34                                 ` Dave Jones
2012-10-01 20:01                                   ` David Miller
2012-10-01 20:04                                     ` Eric Dumazet
2012-10-02 15:27                                       ` Edivaldo de Araújo Pereira
2012-10-02 15:35                                       ` Eric Dumazet
2012-10-02 15:48                                         ` Eric Dumazet
2012-10-02 15:57                                           ` Dave Jones
2012-10-02 16:06                                             ` Eric Dumazet
2012-10-02 18:25                                           ` David Miller
2012-10-02 21:14                                             ` Alexander Duyck
2012-10-02 21:35                                               ` Eric Dumazet
2012-10-02 23:24                                           ` Julian Anastasov
2012-10-03  3:10                                             ` David Miller
2012-10-03 15:01                                               ` Chris Clayton
2012-10-03 20:57                                               ` Julian Anastasov
2012-10-03  7:28                                             ` [PATCH] udp: increment UDP_MIB_NOPORTS in mcast receive Eric Dumazet
2012-10-03 12:45                                               ` David Stevens
2012-10-03 13:15                                                 ` Eric Dumazet
2012-10-03 14:09                                                   ` David Stevens
2012-10-03 15:29                                                     ` Eric Dumazet
2012-10-03 17:31                                                       ` David Stevens
2012-10-03 19:30                                                         ` David Miller
2012-10-03 17:39                                                     ` Rick Jones
2012-10-03  2:55                                           ` Possible networking regression in 3.6.0 David Miller
2012-10-04 11:25                                           ` [PATCH] ipv4: add a fib_type to fib_info Eric Dumazet
2012-10-04 13:08                                             ` Chris Clayton
2012-10-04 13:32                                               ` Eric Dumazet
2012-10-04 18:14                                                 ` David Miller
2012-09-18 14:44     ` Possible networking regression in 3.6.0 Chris Clayton
