* Re: 3.4.1 and 3.5-rc1 Packet lost at 250Mb/s
@ 2012-07-06  9:47 Nieścierowicz Adam
  2012-07-06 10:13 ` Eric Dumazet
  0 siblings, 1 reply; 18+ messages in thread
From: Nieścierowicz Adam @ 2012-07-06  9:47 UTC (permalink / raw)
  To: Eric Dumazet, Netdev

Hello,
Can I send something that will help determine the cause of the problem?


On 08.06.2012 11:41, Eric Dumazet wrote:

> On Fri, 2012-06-08 at 10:58 +0200, Nieścierowicz Adam wrote:
>
>> Hello, recently we changed on the router kernel from 2.6.38.1 to 3.4.1
>> and noticed 30% packet loss when traffic increases up to 250MB / s.
>> Similar is for kernel 3.5-rc1 Here a link to ifstat
>> http://wklej.org/id/767577/
>
> You should give as many details as possible about your setup (hardware,
> software)
>
> lspci
> cat /proc/cpuinfo
> cat /proc/interrupts
> ifconfig -a
> tc -s -d qdisc
> dmesg
> netstat -s

Currently running 2.6.38.1, and traffic is 100Mb/s.

lspci: http://wklej.org/id/769102/
/proc/cpuinfo: http://wklej.org/id/769104/
/proc/interrupts: http://wklej.org/id/769106/
ifconfig -a: http://wklej.org/id/769108/
tc -s -d qdisc: http://wklej.org/id/769109/
dmesg: here are some logs from iptables
netstat -s: http://wklej.org/id/769110/
lsmod: http://wklej.org/id/769117/
/proc/net/softnet_stat: http://wklej.org/id/769116/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.4.1 and 3.5-rc1 Packet lost at 250Mb/s
  2012-07-06  9:47 3.4.1 and 3.5-rc1 Packet lost at 250Mb/s Nieścierowicz Adam
@ 2012-07-06 10:13 ` Eric Dumazet
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2012-07-06 10:13 UTC (permalink / raw)
  To: adam.niescierowicz; +Cc: Netdev

On Fri, 2012-07-06 at 11:47 +0200, Nieścierowicz Adam wrote:
> Hello,
> Can I send something that will help determine the cause of the problem?
> 
> 
> On 08.06.2012 11:41, Eric Dumazet wrote:
> 
> > On Fri, 2012-06-08 at 10:58 +0200, Nieścierowicz Adam wrote:
> >
> >> Hello, recently we changed on the router kernel from 2.6.38.1 to 3.4.1
> >> and noticed 30% packet loss when traffic increases up to 250MB / s.
> >> Similar is for kernel 3.5-rc1 Here a link to ifstat
> >> http://wklej.org/id/767577/
> >
> > You should give as many details as possible about your setup (hardware,
> > software)
> >
> > lspci
> > cat /proc/cpuinfo
> > cat /proc/interrupts
> > ifconfig -a
> > tc -s -d qdisc
> > dmesg
> > netstat -s
> 
> currently running on 2.6.38.1 and traffic is 100Mb / s
> 
> lspci: http://wklej.org/id/769102/
> /proc/cpuinfo: http://wklej.org/id/769104/
> /proc/interrupts: http://wklej.org/id/769106/
> ifconfig -a: http://wklej.org/id/769108/
> tc -s -d qdisc: http://wklej.org/id/769109/
> dmesg: here are some logs from iptables
> netstat -s: http://wklej.org/id/769110/
> lsmod: http://wklej.org/id/769117/
> /proc/net/softnet_stat: http://wklej.org/id/769116/


The same information from a 3.5-rcX kernel would be nice.

What NIC is eth0? (dmesg please)

It seems all network traffic on 2.6.38 is handled by a single CPU (cpu0),
as seen in /proc/interrupts.

I suspect that with 3.4 or 3.5 kernels, traffic is handled by many CPUs,
and they hit false sharing and contention.

You would probably get better performance with some affinity tuning:

For example, 
  eth0 serviced by cpu0
  eth2 serviced by cpu1
  eth3 serviced by cpu2
  eth5 serviced by cpu3

and so on...

check and/or set /proc/irq/${NUM}/smp_affinity
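
As a rough illustration of the affinity tuning suggested above:
smp_affinity takes a hexadecimal CPU bitmask, so pinning an IRQ to cpuN
means writing the hex value of 1 << N. The helper names cpu_to_mask and
pin_irq are invented for this sketch, and it assumes CPU numbers below 32
so that the mask fits in a single word.

```shell
# Sketch only: helper names are hypothetical, not part of any tool.
# Assumes CPU numbers below 32 so the mask fits in one 32-bit word.

# Print the hex affinity mask that selects a single CPU.
cpu_to_mask() {
    printf '%x\n' $(( 1 << $1 ))
}

# Pin one IRQ to one CPU (needs root). Usage: pin_irq IRQ CPU
pin_irq() {
    cpu_to_mask "$2" > "/proc/irq/$1/smp_affinity"
}
```

The IRQ number for each interface has to be looked up in
/proc/interrupts first. If an IRQ balancing daemon is running, it may
rewrite the masks afterwards.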

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.4.1 and 3.5-rc1 Packet lost at 250Mb/s
  2012-10-09 19:56                 ` Nieścierowicz Adam
@ 2012-10-10  4:59                   ` Jeff Kirsher
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff Kirsher @ 2012-10-10  4:59 UTC (permalink / raw)
  To: adam.niescierowicz; +Cc: Andre Tomt, Eric Dumazet, Netdev, Jesse Brandeburg

On 10/09/2012 12:56 PM, Nieścierowicz Adam wrote:
> On 08.10.2012 14:59, Andre Tomt wrote:
>
>> On 08 Oct 2012 14:32, Andre Tomt wrote:
>>
>>> On 08 Oct 2012 14:13, Eric Dumazet wrote:
>>>
>>>> On Mon, 2012-10-08 at 14:00 +0200, Andre Tomt wrote:
>>>>
>>>>> On 08 Oct 2012 12:49, Nieścierowicz Adam wrote:
>>>>>
>>>>>> On 08.10.2012 11:47, Eric Dumazet wrote:
>>>>>>
>>>>>>> Anyway you dont say where are drops, (ifconfig give us very few
>>>>>>> drops)
>>>>>> you can see no losses(drop), but a temporary decline in traffic
>>>>>> on the interface to 0kb/s
>>>>> This sounds very familiar, could it be something similar to:
>>>>> http://marc.info/?l=linux-netdev&m=134594936016796&w=3
>>>>> The chip seems to be of the same family (though not model)
>>>> Yes, but Adam says 3.4.1 already has a problem, while commit
>>>> 2cb7a9cc008c25dc03314de563c00c107b3e5432 is in 3.5 only. Since Adam
>>>> uses Intel e1000e, it could be the BQL related problem.
>>> The other chips have had DMA burst flag enabled for longer, so that he
>>> sees the same problem in 3.4 while I'm not makes sense. Hmm, as 3.4 is
>>> when BQL went in (IIRC) it seems very likely that this BQL issue is the
>>> problem for both of us.
>>
>> To clarify; I think the DMA burst flag in the driver triggers the BQL
>> related issue. Judging by the patchwork link for wthresh=1 this seems
>> very related indeed.
>>
>> Removing the FLAG2_DMA_BURST flag for 82574 in the driver works for me.
>> Adam, it might be worth testing out a build on your system too with the
>> flag removed. If you try the attached patch (for 3.6, probably OK for
>> 3.5) and the problem disappears, we are probably at least talking about
>> the same bug.
>
> after applying the patch everything looks good, no visible loss
>
> Do you expect to correct the bug in mainline? 
Jesse Brandeburg is currently working on an upstream patch to fix
the issue.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.4.1 and 3.5-rc1 Packet lost at 250Mb/s
  2012-10-08 12:59               ` Andre Tomt
@ 2012-10-09 19:56                 ` Nieścierowicz Adam
  2012-10-10  4:59                   ` Jeff Kirsher
  0 siblings, 1 reply; 18+ messages in thread
From: Nieścierowicz Adam @ 2012-10-09 19:56 UTC (permalink / raw)
  To: Andre Tomt; +Cc: Eric Dumazet, Netdev

On 08.10.2012 14:59, Andre Tomt wrote:

> On 08 Oct 2012 14:32, Andre Tomt wrote:
>
>> On 08 Oct 2012 14:13, Eric Dumazet wrote:
>>
>>> On Mon, 2012-10-08 at 14:00 +0200, Andre Tomt wrote:
>>>
>>>> On 08 Oct 2012 12:49, Nieścierowicz Adam wrote:
>>>>
>>>>> On 08.10.2012 11:47, Eric Dumazet wrote:
>>>>>
>>>>>> Anyway you dont say where are drops, (ifconfig give us very few
>>>>>> drops)
>>>>> you can see no losses(drop), but a temporary decline in traffic
>>>>> on the interface to 0kb/s
>>>> This sounds very familiar, could it be something similar to:
>>>> http://marc.info/?l=linux-netdev&m=134594936016796&w=3
>>>> The chip seems to be of the same family (though not model)
>>> Yes, but Adam says 3.4.1 already has a problem, while commit
>>> 2cb7a9cc008c25dc03314de563c00c107b3e5432 is in 3.5 only. Since Adam
>>> uses Intel e1000e, it could be the BQL related problem.
>> The other chips have had DMA burst flag enabled for longer, so that he
>> sees the same problem in 3.4 while I'm not makes sense. Hmm, as 3.4 is
>> when BQL went in (IIRC) it seems very likely that this BQL issue is the
>> problem for both of us.
>
> To clarify; I think the DMA burst flag in the driver triggers the BQL
> related issue. Judging by the patchwork link for wthresh=1 this seems
> very related indeed.
>
> Removing the FLAG2_DMA_BURST flag for 82574 in the driver works for me.
> Adam, it might be worth testing out a build on your system too with the
> flag removed. If you try the attached patch (for 3.6, probably OK for
> 3.5) and the problem disappears, we are probably at least talking about
> the same bug.

After applying the patch everything looks good, with no visible loss.

Do you expect the bug to be fixed in mainline?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.4.1 and 3.5-rc1 Packet lost at 250Mb/s
  2012-10-08 12:32             ` Andre Tomt
@ 2012-10-08 12:59               ` Andre Tomt
  2012-10-09 19:56                 ` Nieścierowicz Adam
  0 siblings, 1 reply; 18+ messages in thread
From: Andre Tomt @ 2012-10-08 12:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: adam.niescierowicz, Netdev

[-- Attachment #1: Type: text/plain, Size: 1551 bytes --]

On 08 Oct 2012 14:32, Andre Tomt wrote:
> On 08 Oct 2012 14:13, Eric Dumazet wrote:
>> On Mon, 2012-10-08 at 14:00 +0200, Andre Tomt wrote:
>>> On 08 Oct 2012 12:49, Nieścierowicz Adam wrote:
>>>> On 08.10.2012 11:47, Eric Dumazet wrote:
>>>>> Anyway you dont say where are drops, (ifconfig give us very few drops)
>>>>
>>>> you can see no losses(drop), but a temporary decline in traffic on the
>>>> interface to 0kb/s
>>>
>>> This sounds very familiar, could it be something similar to:
>>> http://marc.info/?l=linux-netdev&m=134594936016796&w=3
>>>
>>> The chip seems to be of the same family (though not model)
>>
>> Yes, but Adam says 3.4.1 already has a problem, while
>> commit 2cb7a9cc008c25dc03314de563c00c107b3e5432 is in 3.5 only.
>  >
>> Since Adam uses Intel e1000e, it could be the BQL related problem.
>
> The other chips have had DMA burst flag enabled for longer, so that he
> sees the same problem in 3.4 while I'm not makes sense. Hmm, as 3.4 is
> when BQL went in (IIRC) it seems very likely that this BQL issue is the
> problem for both of us.

To clarify: I think the DMA burst flag in the driver triggers the
BQL-related issue. Judging by the patchwork link for wthresh=1, this seems
very related indeed.

Removing the FLAG2_DMA_BURST flag for 82574 in the driver works for me.
Adam, it might be worth testing a build on your system with the flag
removed too. If you try the attached patch (for 3.6, probably OK for 3.5)
and the problem disappears, we are probably at least talking about the
same bug.

[-- Attachment #2: e1000e-disable-dma-burst.patch --]
[-- Type: text/x-patch, Size: 1355 bytes --]

diff -Naur linux-3.6.1/drivers/net/ethernet/intel/e1000e/82571.c linux-3.6.1-2/drivers/net/ethernet/intel/e1000e/82571.c
--- linux-3.6.1/drivers/net/ethernet/intel/e1000e/82571.c	2012-10-07 17:41:28.000000000 +0200
+++ linux-3.6.1-2/drivers/net/ethernet/intel/e1000e/82571.c	2012-10-08 14:54:08.853095363 +0200
@@ -2031,8 +2031,7 @@
 				  | FLAG_RESET_OVERWRITES_LAA /* errata */
 				  | FLAG_TARC_SPEED_MODE_BIT /* errata */
 				  | FLAG_APME_CHECK_PORT_B,
-	.flags2			= FLAG2_DISABLE_ASPM_L1 /* errata 13 */
-				  | FLAG2_DMA_BURST,
+	.flags2			= FLAG2_DISABLE_ASPM_L1, /* errata 13 */
 	.pba			= 38,
 	.max_hw_frame_size	= DEFAULT_JUMBO,
 	.get_variants		= e1000_get_variants_82571,
@@ -2049,8 +2048,7 @@
 				  | FLAG_APME_IN_CTRL3
 				  | FLAG_HAS_CTRLEXT_ON_LOAD
 				  | FLAG_TARC_SPEED_MODE_BIT, /* errata */
-	.flags2			= FLAG2_DISABLE_ASPM_L1 /* errata 13 */
-				  | FLAG2_DMA_BURST,
+	.flags2			= FLAG2_DISABLE_ASPM_L1, /* errata 13 */
 	.pba			= 38,
 	.max_hw_frame_size	= DEFAULT_JUMBO,
 	.get_variants		= e1000_get_variants_82571,
@@ -2090,8 +2088,7 @@
 	.flags2			 = FLAG2_CHECK_PHY_HANG
 				  | FLAG2_DISABLE_ASPM_L0S
 				  | FLAG2_DISABLE_ASPM_L1
-				  | FLAG2_NO_DISABLE_RX
-				  | FLAG2_DMA_BURST,
+				  | FLAG2_NO_DISABLE_RX,
 	.pba			= 32,
 	.max_hw_frame_size	= DEFAULT_JUMBO,
 	.get_variants		= e1000_get_variants_82571,

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.4.1 and 3.5-rc1 Packet lost at 250Mb/s
  2012-10-08 12:13           ` Eric Dumazet
@ 2012-10-08 12:32             ` Andre Tomt
  2012-10-08 12:59               ` Andre Tomt
  0 siblings, 1 reply; 18+ messages in thread
From: Andre Tomt @ 2012-10-08 12:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: adam.niescierowicz, Netdev

On 08 Oct 2012 14:13, Eric Dumazet wrote:
> On Mon, 2012-10-08 at 14:00 +0200, Andre Tomt wrote:
>> On 08 Oct 2012 12:49, Nieścierowicz Adam wrote:
>>> On 08.10.2012 11:47, Eric Dumazet wrote:
>>>> Anyway you dont say where are drops, (ifconfig give us very few drops)
>>>
>>> you can see no losses(drop), but a temporary decline in traffic on the
>>> interface to 0kb/s
>>
>> This sounds very familiar, could it be something similar to:
>> http://marc.info/?l=linux-netdev&m=134594936016796&w=3
>>
>> The chip seems to be of the same family (though not model)
>
> Yes, but Adam says 3.4.1 already has a problem, while
> commit 2cb7a9cc008c25dc03314de563c00c107b3e5432 is in 3.5 only.
 >
> Since Adam uses Intel e1000e, it could be the BQL related problem.

The other chips have had the DMA burst flag enabled for longer, so it makes
sense that he sees the problem in 3.4 while I do not. Hmm, as 3.4 is when
BQL went in (IIRC), it seems very likely that this BQL issue is the
problem for both of us.

> (Not sure if Intel guys finally fixed the problem, if not, its really
> insane)
>
> http://patchwork.ozlabs.org/patch/163298/

Ugh. :)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.4.1 and 3.5-rc1 Packet lost at 250Mb/s
  2012-10-08 12:00         ` Andre Tomt
  2012-10-08 12:06           ` Nieścierowicz Adam
@ 2012-10-08 12:13           ` Eric Dumazet
  2012-10-08 12:32             ` Andre Tomt
  1 sibling, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2012-10-08 12:13 UTC (permalink / raw)
  To: Andre Tomt; +Cc: adam.niescierowicz, Netdev

On Mon, 2012-10-08 at 14:00 +0200, Andre Tomt wrote:
> On 08 Oct 2012 12:49, Nieścierowicz Adam wrote:
> > On 08.10.2012 11:47, Eric Dumazet wrote:
> >> Anyway you dont say where are drops, (ifconfig give us very few drops)
> >
> > you can see no losses(drop), but a temporary decline in traffic on the
> > interface to 0kb/s
> 
> This sounds very familiar, could it be something similar to:
> http://marc.info/?l=linux-netdev&m=134594936016796&w=3
> 
> The chip seems to be of the same family (though not model)

Yes, but Adam says 3.4.1 already has a problem, while
commit 2cb7a9cc008c25dc03314de563c00c107b3e5432 is in 3.5 only.

Since Adam uses Intel e1000e, it could be the BQL-related problem.

(Not sure if the Intel guys finally fixed the problem; if not, it's really
insane)

http://patchwork.ozlabs.org/patch/163298/
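
On kernels with BQL (Byte Queue Limits), each TX queue exposes its state
under the byte_queue_limits directory in sysfs, which can help when
checking whether a queue is stuck with bytes in flight. A minimal sketch
for reading it follows; bql_show is a name made up here, and the optional
second argument (an alternative sysfs root) exists only to make the
function easy to try out.

```shell
# Sketch only: bql_show is a hypothetical helper name.
# Print the BQL limit and in-flight byte count for every TX queue of a device.
# The optional second argument overrides the sysfs root (default: /sys).
bql_show() {
    dev=$1
    root=${2:-/sys}
    for q in "$root/class/net/$dev/queues"/tx-*; do
        # Skip queues without BQL state (or an unmatched glob).
        [ -d "$q/byte_queue_limits" ] || continue
        printf '%s limit=%s inflight=%s\n' "${q##*/}" \
            "$(cat "$q/byte_queue_limits/limit")" \
            "$(cat "$q/byte_queue_limits/inflight")"
    done
}
```

Running `bql_show eth0` a few times while traffic has dropped to zero
would show whether the in-flight counter stays non-zero.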

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.4.1 and 3.5-rc1 Packet lost at 250Mb/s
  2012-10-08 12:00         ` Andre Tomt
@ 2012-10-08 12:06           ` Nieścierowicz Adam
  2012-10-08 12:13           ` Eric Dumazet
  1 sibling, 0 replies; 18+ messages in thread
From: Nieścierowicz Adam @ 2012-10-08 12:06 UTC (permalink / raw)
  To: Andre Tomt; +Cc: Eric Dumazet, Netdev

On 08.10.2012 14:00, Andre Tomt wrote:

> On 08 Oct 2012 12:49, Nieścierowicz Adam wrote:
>
>> On 08.10.2012 11:47, Eric Dumazet wrote:
>>
>>> Anyway you dont say where are drops, (ifconfig give us very few
>>> drops)
>> you can see no losses(drop), but a temporary decline in traffic on the
>> interface to 0kb/s
>
> This sounds very familiar, could it be something similar to:
> http://marc.info/?l=linux-netdev&m=134594936016796&w=3
>
> The chip seems to be of the same family (though not model)

Indeed, it looks similar.
Here the problem occurs with kernels 3.4.1, 3.5-rcX, and 3.6.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.4.1 and 3.5-rc1 Packet lost at 250Mb/s
  2012-10-08 10:49       ` Nieścierowicz Adam
@ 2012-10-08 12:00         ` Andre Tomt
  2012-10-08 12:06           ` Nieścierowicz Adam
  2012-10-08 12:13           ` Eric Dumazet
  0 siblings, 2 replies; 18+ messages in thread
From: Andre Tomt @ 2012-10-08 12:00 UTC (permalink / raw)
  To: adam.niescierowicz; +Cc: Eric Dumazet, Netdev

On 08 Oct 2012 12:49, Nieścierowicz Adam wrote:
> On 08.10.2012 11:47, Eric Dumazet wrote:
>> Anyway you dont say where are drops, (ifconfig give us very few drops)
>
> you can see no losses(drop), but a temporary decline in traffic on the
> interface to 0kb/s

This sounds very familiar, could it be something similar to:
http://marc.info/?l=linux-netdev&m=134594936016796&w=3

The chip seems to be of the same family (though not model)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.4.1 and 3.5-rc1 Packet lost at 250Mb/s
  2012-10-08  9:47     ` Eric Dumazet
@ 2012-10-08 10:49       ` Nieścierowicz Adam
  2012-10-08 12:00         ` Andre Tomt
  0 siblings, 1 reply; 18+ messages in thread
From: Nieścierowicz Adam @ 2012-10-08 10:49 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Netdev

On 08.10.2012 11:47, Eric Dumazet wrote:

> On Mon, 2012-10-08 at 11:29 +0200, Nieścierowicz Adam wrote:
>
>>> You should use RPS on eth2/eth3 because they are non multi queue.
>>>
>>> Documentation/networking/scaling.txt should give you all the needed
>>> info
>>
>> I set processors for rps such as affinity, unfortunately it did not
>> help
>>
>> ---
>> cat /sys/class/net/eth{2,3,4,5}/queues/rx-0/rps_cpus
>> 0040
>> 0080
>> 0100
>> 0200
>> ---
>> CPU affinity http://wklej.org/id/843161/
>
> I said eth2 and eth3
>

eth2/eth3 and eth4/eth5 are on the same card, so I changed them as well.
Was that a mistake?

> And you should use cpu11->cpu15 instead of cpu6->cpu9 since they are in
> use...
>

I read in the file Documentation/networking/scaling.txt
---
if the rps_cpus for each queue are the ones that
share the same memory domain as the interrupting CPU for that queue
---

so I used the same CPUs. Did I misunderstand?


> Anyway you dont say where are drops, (ifconfig give us very few 
> drops)

No losses (drops) are visible, but traffic on the interface temporarily
falls to 0kb/s.

>
> Also your eth0 seems to have a strange balance :
>
> RX interrupts seems to be well balanced on 4 queues :
>
> 76: 503 0 169271690 0 0 0 PCI-MSI-edge eth0-rx-0
> 77: 405 0 0 164532538 0 0 PCI-MSI-edge eth0-rx-1
> 78: 408 0 0 0 152778723 0 PCI-MSI-edge eth0-rx-2
> 79: 349 0 0 0 0 155011301 PCI-MSI-edge eth0-rx-3
> 80: 144 0 443432394 0 0 0 PCI-MSI-edge eth0-tx-0
> 81: 18 0 0 2043311 0 0 PCI-MSI-edge eth0-tx-1
> 82: 30 0 0 0 1934537 0 PCI-MSI-edge eth0-tx-2
> 83: 137 0 0 0 0 1968272 PCI-MSI-edge eth0-tx-3
>
> But TX seems to mostly use queue 0
>
> Packets sent to eth0 are coming from where ?

Packets come mainly from two routers (edge BGP and local NAT).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.4.1 and 3.5-rc1 Packet lost at 250Mb/s
  2012-10-08  9:29   ` Nieścierowicz Adam
@ 2012-10-08  9:47     ` Eric Dumazet
  2012-10-08 10:49       ` Nieścierowicz Adam
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2012-10-08  9:47 UTC (permalink / raw)
  To: adam.niescierowicz; +Cc: Netdev

On Mon, 2012-10-08 at 11:29 +0200, Nieścierowicz Adam wrote:
> >
> > You should use RPS on eth2/eth3 because they are non multi queue.
> >
> > Documentation/networking/scaling.txt should give you all the needed 
> > info
> 
> I set processors for rps such as affinity, unfortunately it did not 
> help
> 
> ---
> cat /sys/class/net/eth{2,3,4,5}/queues/rx-0/rps_cpus
> 0040
> 0080
> 0100
> 0200
> ---
> CPU affinity http://wklej.org/id/843161/
> 
> 

I said eth2 and eth3

And you should use cpu11->cpu15 instead of cpu6->cpu9 since they are in
use...
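
The rps_cpus values quoted above (0040 to 0200) are hexadecimal CPU
bitmasks, which is how cpu6 to cpu9 can be read off them. A minimal
sketch for decoding such a mask follows; mask_to_cpus is a name invented
here, and masks wider than one word (comma-separated groups) are not
handled.

```shell
# Sketch only: mask_to_cpus is a hypothetical helper name.
# Print the CPU numbers selected by a hex bitmask (single word, no commas).
mask_to_cpus() {
    mask=$(( 0x$1 ))
    cpu=0
    out=""
    while [ "$mask" -ne 0 ]; do
        # Test the lowest bit, then shift the mask right by one.
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="$out${out:+ }$cpu"
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    printf '%s\n' "$out"
}
```

For example, `mask_to_cpus 0040` prints 6 and `mask_to_cpus 0200` prints 9.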

Anyway, you don't say where the drops are (ifconfig shows very few drops).

Also, your eth0 seems to have a strange balance:

RX interrupts seem to be well balanced across the 4 queues:

76:        503          0  169271690          0          0          0          PCI-MSI-edge      eth0-rx-0
77:        405          0          0  164532538          0          0          PCI-MSI-edge      eth0-rx-1
78:        408          0          0          0  152778723          0          PCI-MSI-edge      eth0-rx-2
79:        349          0          0          0          0  155011301          PCI-MSI-edge      eth0-rx-3
80:        144          0  443432394          0          0          0          PCI-MSI-edge      eth0-tx-0
81:         18          0          0    2043311          0          0          PCI-MSI-edge      eth0-tx-1
82:         30          0          0          0    1934537          0          PCI-MSI-edge      eth0-tx-2
83:        137          0          0          0          0    1968272          PCI-MSI-edge      eth0-tx-3

But TX seems to mostly use queue 0.

Where are the packets sent to eth0 coming from?
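
To compare queues without reading the per-CPU columns by eye, the counts
on each line can be summed. This is only a sketch: irq_totals is a
made-up helper name, and it assumes the usual /proc/interrupts layout
(IRQ number first, queue name last).

```shell
# Sketch only: irq_totals is a hypothetical helper name.
# Read /proc/interrupts-style lines on stdin and print "<name> <total>"
# for every line whose last field starts with the given prefix.
irq_totals() {
    awk -v prefix="$1" '
        index($NF, prefix) == 1 {
            total = 0
            # Field 1 is the IRQ number; sum the purely numeric columns.
            for (i = 2; i < NF; i++)
                if ($i ~ /^[0-9]+$/) total += $i
            printf "%s %.0f\n", $NF, total
        }'
}
```

Usage would be `irq_totals eth0-tx < /proc/interrupts`. On the figures
above, eth0-tx-0 totals about 443 million interrupts against roughly 2
million for each of the other TX queues.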

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.4.1 and 3.5-rc1 Packet lost at 250Mb/s
  2012-10-08  6:22 ` Eric Dumazet
@ 2012-10-08  9:29   ` Nieścierowicz Adam
  2012-10-08  9:47     ` Eric Dumazet
  0 siblings, 1 reply; 18+ messages in thread
From: Nieścierowicz Adam @ 2012-10-08  9:29 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Netdev

>
> You should use RPS on eth2/eth3 because they are non multi queue.
>
> Documentation/networking/scaling.txt should give you all the needed 
> info

I set the RPS CPUs to match the IRQ affinity; unfortunately it did not
help.

---
cat /sys/class/net/eth{2,3,4,5}/queues/rx-0/rps_cpus
0040
0080
0100
0200
---
CPU affinity http://wklej.org/id/843161/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.4.1 and 3.5-rc1 Packet lost at 250Mb/s
  2012-10-07 19:18 Nieścierowicz Adam
@ 2012-10-08  6:22 ` Eric Dumazet
  2012-10-08  9:29   ` Nieścierowicz Adam
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2012-10-08  6:22 UTC (permalink / raw)
  To: adam.niescierowicz; +Cc: Netdev

On Sun, 2012-10-07 at 21:18 +0200, Nieścierowicz Adam wrote:
> On 06.07.2012 12:13, Eric Dumazet wrote:
> 
> > On Fri, 2012-07-06 at 11:47 +0200, Nieścierowicz Adam wrote:
> >
> >> Hello,
> >> Can I send something that will help determine the cause of the problem?
> >>
> >> On 08.06.2012 11:41, Eric Dumazet wrote:
> >>
> >>> On Fri, 2012-06-08 at 10:58 +0200, Nieścierowicz Adam wrote:
> >>>
> >>>> Hello, recently we changed on the router kernel from 2.6.38.1 to
> >>>> 3.4.1 and noticed 30% packet loss when traffic increases up to
> >>>> 250MB / s. Similar is for kernel 3.5-rc1 Here a link to ifstat
> >>>> http://wklej.org/id/767577/
> >>>
> >>> You should give as many details as possible about your setup
> >>> (hardware, software)
> >>>
> >>> lspci
> >>> cat /proc/cpuinfo
> >>> cat /proc/interrupts
> >>> ifconfig -a
> >>> tc -s -d qdisc
> >>> dmesg
> >>> netstat -s
> >>
> >> currently running on 2.6.38.1 and traffic is 100Mb / s
> >>
> >> lspci: http://wklej.org/id/769102/
> >> /proc/cpuinfo: http://wklej.org/id/769104/
> >> /proc/interrupts: http://wklej.org/id/769106/
> >> ifconfig -a: http://wklej.org/id/769108/
> >> tc -s -d qdisc: http://wklej.org/id/769109/
> >> dmesg: here are some logs from iptables
> >> netstat -s: http://wklej.org/id/769110/
> >> lsmod: http://wklej.org/id/769117/
> >> /proc/net/softnet_stat: http://wklej.org/id/769116/
> >
> > Same infos of 3.5-rcX kernel would be nice.
> >
> > What NIC is eth0 ? (dmesg please)
> >
> > It seems all network traffic on 2.6.38 is handled by a single cpu 
> > (cpu0)
> >
> > (seen in /proc/interrupts)
> >
> > I suspect that with 3.4 or 3.5 kernels, traffic is handled by many 
> > cpus
> > and they hit false sharing and contention.
> >
> > You probably get better performance doing some affinity tuning :
> >
> > For example,
> > eth0 serviced by cpu0
> > eth2 serviced by cpu1
> > eth3 serviced by cpu2
> > eth5 serviced by cpu3
> >
> > and so on...
> >
> > check and/or set /proc/irq/${NUM}/smp_affinity
> 
> Hello,
> I would like to return to this earlier thread.
> 
> Kernel 3.6.0 is currently installed and the symptoms are the same.
> 
> About the configuration:
> 
> - affinity on
> - lspci: http://wklej.org/id/843156/
> - /proc/cpuinfo: http://wklej.org/id/843158/
> - /proc/interrupts: http://wklej.org/id/843161/
> - ifconfig -a: http://wklej.org/id/843162/
> - tc -s -d qdisc: http://wklej.org/id/843164/
> - dmesg: http://wklej.org/id/843166/
> - lsmod: http://wklej.org/id/843167/
> - /proc/net/softnet_stat: /proc/net/softnet_stat
> 
> Should I attach anything else?
> 
> Thanks

You should use RPS on eth2/eth3 because they are not multiqueue.

Documentation/networking/scaling.txt should give you all the needed info
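
For a single-queue NIC, RPS is enabled by writing a hexadecimal CPU
bitmask to the rps_cpus file of rx queue 0. A rough sketch, assuming CPU
numbers below 32; the helper names cpu_range_mask and set_rps are
invented for illustration, and which CPUs to choose depends on the
machine.

```shell
# Sketch only: helper names are hypothetical.
# Assumes CPU numbers below 32 (single mask word).

# Print the hex mask covering CPUs FIRST..LAST inclusive.
cpu_range_mask() {
    printf '%x\n' $(( ((1 << ($2 + 1)) - 1) & ~((1 << $1) - 1) ))
}

# Enable RPS on rx queue 0 of a device (needs root).
# Usage: set_rps DEV FIRST LAST
set_rps() {
    cpu_range_mask "$2" "$3" > "/sys/class/net/$1/queues/rx-0/rps_cpus"
}
```

For example, `cpu_range_mask 11 15` prints f800, so `set_rps eth2 11 15`
would spread eth2 receive processing over cpu11 to cpu15.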

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.4.1 and 3.5-rc1 Packet lost at 250Mb/s
@ 2012-10-07 19:18 Nieścierowicz Adam
  2012-10-08  6:22 ` Eric Dumazet
  0 siblings, 1 reply; 18+ messages in thread
From: Nieścierowicz Adam @ 2012-10-07 19:18 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Netdev

On 06.07.2012 12:13, Eric Dumazet wrote:

> On Fri, 2012-07-06 at 11:47 +0200, Nieścierowicz Adam wrote:
>
>> Hello,
>> Can I send something that will help determine the cause of the problem?
>>
>> On 08.06.2012 11:41, Eric Dumazet wrote:
>>
>>> On Fri, 2012-06-08 at 10:58 +0200, Nieścierowicz Adam wrote:
>>>
>>>> Hello, recently we changed on the router kernel from 2.6.38.1 to
>>>> 3.4.1 and noticed 30% packet loss when traffic increases up to
>>>> 250MB / s. Similar is for kernel 3.5-rc1 Here a link to ifstat
>>>> http://wklej.org/id/767577/
>>>
>>> You should give as many details as possible about your setup
>>> (hardware, software)
>>>
>>> lspci
>>> cat /proc/cpuinfo
>>> cat /proc/interrupts
>>> ifconfig -a
>>> tc -s -d qdisc
>>> dmesg
>>> netstat -s
>>
>> currently running on 2.6.38.1 and traffic is 100Mb / s
>>
>> lspci: http://wklej.org/id/769102/
>> /proc/cpuinfo: http://wklej.org/id/769104/
>> /proc/interrupts: http://wklej.org/id/769106/
>> ifconfig -a: http://wklej.org/id/769108/
>> tc -s -d qdisc: http://wklej.org/id/769109/
>> dmesg: here are some logs from iptables
>> netstat -s: http://wklej.org/id/769110/
>> lsmod: http://wklej.org/id/769117/
>> /proc/net/softnet_stat: http://wklej.org/id/769116/
>
> Same infos of 3.5-rcX kernel would be nice.
>
> What NIC is eth0 ? (dmesg please)
>
> It seems all network traffic on 2.6.38 is handled by a single cpu 
> (cpu0)
>
> (seen in /proc/interrupts)
>
> I suspect that with 3.4 or 3.5 kernels, traffic is handled by many 
> cpus
> and they hit false sharing and contention.
>
> You probably get better performance doing some affinity tuning :
>
> For example,
> eth0 serviced by cpu0
> eth2 serviced by cpu1
> eth3 serviced by cpu2
> eth5 serviced by cpu3
>
> and so on...
>
> check and/or set /proc/irq/${NUM}/smp_affinity

Hello,
I would like to return to this earlier thread.

Kernel 3.6.0 is currently installed and the symptoms are the same.

About the configuration:

- affinity on
- lspci: http://wklej.org/id/843156/
- /proc/cpuinfo: http://wklej.org/id/843158/
- /proc/interrupts: http://wklej.org/id/843161/
- ifconfig -a: http://wklej.org/id/843162/
- tc -s -d qdisc: http://wklej.org/id/843164/
- dmesg: http://wklej.org/id/843166/
- lsmod: http://wklej.org/id/843167/
- /proc/net/softnet_stat: /proc/net/softnet_stat

Should I attach anything else?

Thanks

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.4.1 and 3.5-rc1 Packet lost at 250Mb/s
  2012-06-08  9:41 ` Eric Dumazet
@ 2012-06-08  9:43   ` Eric Dumazet
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2012-06-08  9:43 UTC (permalink / raw)
  To: adam.niescierowicz; +Cc: Netdev

On Fri, 2012-06-08 at 11:41 +0200, Eric Dumazet wrote:
> On Fri, 2012-06-08 at 10:58 +0200, Nieścierowicz Adam wrote:

> lspci
> cat /proc/cpuinfo
> cat /proc/interrupts
> ifconfig -a
> tc -s -d qdisc
> dmesg
> netstat -s
> 

cat /proc/net/softnet_stat
lsmod
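
The requested outputs can be gathered in one pass so that the same set
is easy to capture again on each kernel under test. This is only a
sketch: collect_diag and the output file naming are made up here, and
some of the commands may need root or may not be installed.

```shell
# Sketch only: collect_diag is a hypothetical helper name.
# Save the output of each requested command or file under DIR,
# one file per item.
collect_diag() {
    dir=$1
    mkdir -p "$dir" || return 1
    for cmd in "lspci" "ifconfig -a" "tc -s -d qdisc" "dmesg" "netstat -s" "lsmod"; do
        # Build a file name from the command: "tc -s -d qdisc" -> tc_-s_-d_qdisc.txt
        name=$(printf '%s' "$cmd" | tr ' /' '__')
        # $cmd is left unquoted on purpose so it splits into command and arguments.
        # Keep going if a tool is missing or fails.
        $cmd > "$dir/$name.txt" 2>&1 || :
    done
    for f in /proc/cpuinfo /proc/interrupts /proc/net/softnet_stat; do
        name=$(printf '%s' "${f#/}" | tr '/' '_')
        cat "$f" > "$dir/$name.txt" 2>&1 || :
    done
}
```

Calling `collect_diag diag-$(uname -r)` on each kernel gives one
directory per kernel version that can be compared or uploaded.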

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.4.1 and 3.5-rc1 Packet lost at 250Mb/s
  2012-06-08  8:58 Nieścierowicz Adam
@ 2012-06-08  9:41 ` Eric Dumazet
  2012-06-08  9:43   ` Eric Dumazet
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2012-06-08  9:41 UTC (permalink / raw)
  To: adam.niescierowicz; +Cc: Netdev

On Fri, 2012-06-08 at 10:58 +0200, Nieścierowicz Adam wrote:
> Hello,
> 
> recently we changed on the router kernel from 2.6.38.1 to 3.4.1 and 
> noticed
> 30% packet loss when traffic increases up to 250MB / s.
> 
> Similar is for kernel 3.5-rc1
> 
> Here a link to ifstat http://wklej.org/id/767577/

You should give as many details as possible about your setup (hardware,
software)

lspci
cat /proc/cpuinfo
cat /proc/interrupts
ifconfig -a
tc -s -d qdisc
dmesg
netstat -s

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.4.1 and 3.5-rc1 Packet lost at 250Mb/s
@ 2012-06-08  9:31 Nieścierowicz Adam
  0 siblings, 0 replies; 18+ messages in thread
From: Nieścierowicz Adam @ 2012-06-08  9:31 UTC (permalink / raw)
  To: Eric Dumazet, Netdev

On 08.06.2012 11:41, Eric Dumazet wrote:

> On Fri, 2012-06-08 at 10:58 +0200, Nieścierowicz Adam wrote:
>
>> Hello, recently we changed on the router kernel from 2.6.38.1 to 3.4.1
>> and noticed 30% packet loss when traffic increases up to 250MB / s.
>> Similar is for kernel 3.5-rc1 Here a link to ifstat
>> http://wklej.org/id/767577/
>
> You should give as many details as possible about your setup (hardware,
> software)
>
> lspci
> cat /proc/cpuinfo
> cat /proc/interrupts
> ifconfig -a
> tc -s -d qdisc
> dmesg
> netstat -s

Currently running 2.6.38.1, and traffic is 100Mb/s.

lspci: http://wklej.org/id/769102/
/proc/cpuinfo: http://wklej.org/id/769104/
/proc/interrupts: http://wklej.org/id/769106/
ifconfig -a: http://wklej.org/id/769108/
tc -s -d qdisc: http://wklej.org/id/769109/
dmesg: here are some logs from iptables
netstat -s: http://wklej.org/id/769110/
lsmod: http://wklej.org/id/769117/
/proc/net/softnet_stat: http://wklej.org/id/769116/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* 3.4.1 and 3.5-rc1 Packet lost at 250Mb/s
@ 2012-06-08  8:58 Nieścierowicz Adam
  2012-06-08  9:41 ` Eric Dumazet
  0 siblings, 1 reply; 18+ messages in thread
From: Nieścierowicz Adam @ 2012-06-08  8:58 UTC (permalink / raw)
  To: Netdev

Hello,

We recently changed the kernel on our router from 2.6.38.1 to 3.4.1 and
noticed 30% packet loss when traffic increases to 250MB/s.

The same happens with kernel 3.5-rc1.

Here is a link to ifstat output: http://wklej.org/id/767577/

Regards

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2012-10-10  4:59 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-06  9:47 3.4.1 and 3.5-rc1 Packet lost at 250Mb/s Nieścierowicz Adam
2012-07-06 10:13 ` Eric Dumazet
  -- strict thread matches above, loose matches on Subject: below --
2012-10-07 19:18 Nieścierowicz Adam
2012-10-08  6:22 ` Eric Dumazet
2012-10-08  9:29   ` Nieścierowicz Adam
2012-10-08  9:47     ` Eric Dumazet
2012-10-08 10:49       ` Nieścierowicz Adam
2012-10-08 12:00         ` Andre Tomt
2012-10-08 12:06           ` Nieścierowicz Adam
2012-10-08 12:13           ` Eric Dumazet
2012-10-08 12:32             ` Andre Tomt
2012-10-08 12:59               ` Andre Tomt
2012-10-09 19:56                 ` Nieścierowicz Adam
2012-10-10  4:59                   ` Jeff Kirsher
2012-06-08  9:31 Nieścierowicz Adam
2012-06-08  8:58 Nieścierowicz Adam
2012-06-08  9:41 ` Eric Dumazet
2012-06-08  9:43   ` Eric Dumazet
