* IPoIB GRO
From: Markus Stockhausen @ 2013-11-03 10:58 UTC
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Or Gerlitz, Erez Shitrit

Hello,

I have a small update on the unfortunate IPoIB GRO behaviour I have
observed over the last few weeks in datagram mode on our ConnectX cards.
In the GRO receive path the kernel enters the inet_gro_receive() function
in net/ipv4/af_inet.c. If I read the code correctly, it compares two
IP packets and decides whether they belong to the same "flow".
Further checks in some subroutines narrow the comparison down to IPv4
and so on.

I added a debug message after the following comparison, which
seems to be the culprit:

inet_gro_receive()
  ...
  /* All fields must match except length and checksum. */
  NAPI_GRO_CB(p)->flush |=
    (iph->ttl ^ iph2->ttl) |
    (iph->tos ^ iph2->tos) |
    (__force int)((iph->frag_off ^ iph2->frag_off) & htons(IP_DF)) |
    ((u16)(ntohs(iph2->id) + NAPI_GRO_CB(p)->count) ^ id);
  /* Do some debug */
  printk("%i %i %i\n",ntohs(iph2->id),NAPI_GRO_CB(p)->count,id);
  ...

On an ordinary Gigabit Intel card the kernel output reads:

32933 12 32945 
32933 13 32946
32946 1 32947
32946 2 32948
...
32946 15 32961
32964 3 32967
32964 4 32968
...

My interpretation is that the ID of a new packet must equal the ID
of the initial (held) packet plus its count field. Only then do we
have a GRO candidate.
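A minimal userspace model of that rule, assuming only the ID comparison matters (TTL, TOS and the DF bit are ignored) and using a made-up function name, could look like this:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model of the IP ID test in inet_gro_receive(): a new packet
 * continues the held aggregate only if its ID equals the held packet's ID
 * plus the number of packets already merged (count). The sum is truncated
 * to 16 bits, so the check keeps working when the ID wraps past 65535.
 */
static bool gro_id_continues(uint16_t held_id, unsigned int count,
                             uint16_t new_id)
{
    return (uint16_t)(held_id + count) == new_id;
}
```

With the Intel samples above, gro_id_continues(32933, 12, 32945) is true, whereas an ib0 sample such as (35754, 1, 35754) is false, so that packet forces a flush.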

On our ib0 interface the count field seems to be 1 most of the time,
and the ID of the new packet nearly always equals the ID of the
initial packet:

35754 1 35754
35754 1 35754
35754 1 35754
...
35754 1 35786
35786 1 35786
35786 1 35786
...

That's why the flush flag is always set and GRO does not work at all.
I'm willing to dig deeper into this, but I'm unsure whether those fields
are filled in on the sender or the receiver side, and especially where in
the IPoIB stack. Maybe someone can point me in the right direction so
that I can investigate further and provide more information.

Best regards.

Markus

[-- Attachment #2: InterScan_Disclaimer.txt --]
[-- Type: text/plain, Size: 1650 bytes --]

****************************************************************************
Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte
Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail
irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und
vernichten Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte
Weitergabe dieser Mail ist nicht gestattet.

Über das Internet versandte E-Mails können unter fremden Namen erstellt oder
manipuliert werden. Deshalb ist diese als E-Mail verschickte Nachricht keine
rechtsverbindliche Willenserklärung.

Collogia
Unternehmensberatung AG
Ubierring 11
D-50678 Köln

Vorstand:
Kadir Akin
Dr. Michael Höhnerbach

Vorsitzender des Aufsichtsrates:
Hans Kristian Langva

Registergericht: Amtsgericht Köln
Registernummer: HRB 52 497

This e-mail may contain confidential and/or privileged information. If you
are not the intended recipient (or have received this e-mail in error)
please notify the sender immediately and destroy this e-mail. Any
unauthorized copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.

e-mails sent over the internet may have been written under a wrong name or
been manipulated. That is why this message sent as an e-mail is not a
legally binding declaration of intention.

Collogia
Unternehmensberatung AG
Ubierring 11
D-50678 Köln

executive board:
Kadir Akin
Dr. Michael Höhnerbach

President of the supervisory board:
Hans Kristian Langva

Registry office: district court Cologne
Register number: HRB 52 497

****************************************************************************

* Re: IPoIB GRO
From: Markus Stockhausen @ 2013-11-04  8:12 UTC
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Or Gerlitz, Erez Shitrit

> Thats why the flush flag is always set and the GRO stack does
> not work at all. I'm willing to dig deeper into this but I'm unsure
> if those fields are filled on sender or receiver side and especially
> where in the IPoIB stack. 

I may have found the reason for the strange ACK behaviour during
large NFS reads over IPoIB; hopefully someone can confirm it.

If I turn on TSO for an IPoIB datagram interface on the sender side,
GRO on the receiver side is completely broken. This is because TSO
hands large ~60 KB packets to the card, which splits them into
MTU-sized segments. Each of these segments carries the same ID in its
IP header. GRO expects the IDs to increase from packet to packet and
therefore issues a flush after each packet. Each flush results in an
ACK packet back to the server.

With TSO disabled, GRO can kick in: packets are built with
sequential IDs, and only every few packets are acknowledged.
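The effect on a whole stream can be sketched with a small model that counts flushes for a sequence of IP IDs from one flow. It is an illustration with hypothetical names, not kernel code; in particular the fixed segment cap is an assumption standing in for GRO's real aggregate limit, which is expressed in bytes:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count how often a simplified GRO would flush for packets of one flow
 * arriving with the given IP IDs. Only the ID rule is modelled: a packet
 * is merged if its ID equals held_id + count (mod 2^16); otherwise the
 * held aggregate is delivered and the new packet starts a fresh one.
 * max_segs is a hypothetical cap on the number of merged packets.
 */
static size_t count_gro_flushes(const uint16_t *ids, size_t n,
                                unsigned int max_segs)
{
    size_t flushes = 0;
    uint16_t held_id = 0;
    unsigned int count = 0;   /* packets in the current aggregate */

    for (size_t i = 0; i < n; i++) {
        if (count > 0 && count < max_segs &&
            (uint16_t)(held_id + count) == ids[i]) {
            count++;          /* merged into the held aggregate */
            continue;
        }
        if (count > 0)
            flushes++;        /* mismatch or cap reached: deliver it */
        held_id = ids[i];
        count = 1;
    }
    if (count > 0)
        flushes++;            /* deliver whatever is left at the end */
    return flushes;
}
```

Thirty-two packets that all carry the same ID, as in the ib0 trace, produce 32 flushes, one per packet, while 32 consecutive IDs with a cap of 64 produce a single merged packet.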

For a fully cached 6 GB file read the numbers are:

TSO on:  ~220 MByte/s, 1,522,679 mlx4 interrupts on the server
TSO off: ~550 MByte/s,   318,322 mlx4 interrupts on the server
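Expressed as ratios, the two runs can be compared with a trivial helper (the struct and function names are invented for illustration):

```c
/* Hypothetical helper: compare the TSO-on and TSO-off measurements. */
struct tso_cmp {
    double speedup;        /* throughput(TSO off) / throughput(TSO on) */
    double irq_reduction;  /* interrupts(TSO on) / interrupts(TSO off) */
};

static struct tso_cmp compare_tso(double mbps_on, double mbps_off,
                                  double irqs_on, double irqs_off)
{
    struct tso_cmp c = {
        .speedup = mbps_off / mbps_on,
        .irq_reduction = irqs_on / irqs_off,
    };
    return c;
}
```

For the figures above this gives a speedup of 2.5 (550 / 220) and about 4.8 times fewer mlx4 interrupts on the server (1,522,679 / 318,322) with TSO disabled.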

Is there any chance IPoIB TSO handling can be optimized?

Markus

* Re: IPoIB GRO
From: Erez Shitrit @ 2013-11-04  8:24 UTC
  To: Markus Stockhausen; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz

Hi Markus,

As Or already mentioned, we do see IP packets being aggregated when GRO
is enabled on an IB interface. From tcpdump on the receive side we see:

10:09:27.336951 IP 11.134.33.1.41377 > 11.134.41.1.35957: Flags [.], seq 3795959253:3796023381, ack 2, win 110, length 64128
10:09:27.336987 IP 11.134.41.1.35957 > 11.134.33.1.41377: Flags [.], ack 3796023381, win 2036, length 0
10:09:27.337022 IP 11.134.33.1.41377 > 11.134.41.1.35957: Flags [.], seq 3796023381:3796087509, ack 2, win 110, length 64128
10:09:27.337044 IP 11.134.41.1.35957 > 11.134.33.1.41377: Flags [.], ack 3796087509, win 3038, length 0
10:09:27.337083 IP 11.134.33.1.41377 > 11.134.41.1.35957: Flags [.], seq 3796087509:3796151637, ack 2, win 110, length 64128
10:09:27.337107 IP 11.134.41.1.35957 > 11.134.33.1.41377: Flags [.], ack 3796151637, win 4040, length 0
10:09:27.337142 IP 11.134.33.1.41377 > 11.134.41.1.35957: Flags [.], seq 3796151637:3796215765, ack 2, win 110, length 64128
...

Don't you see that behaviour in tcpdump? What kernel are you using?

I will take a look into the GRO code and our code to check whether
we missed something, and will update.

Thanks, Erez

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: IPoIB GRO
From: Markus Stockhausen @ 2013-11-04  8:40 UTC
  To: Erez Shitrit; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz

Hi Erez,

> don't you see that behaviour in tcpdump? what kernel are you using?

The server runs a 3.5 kernel and the client a 3.11 kernel, each with the
standard in-kernel drivers/modules. I see the same pattern of GRO
aggregation on the client that you mention, but only if I disable TSO for
ib0 on the server side.

The test I'm running on the client looks like this. The second and third
read runs are definitely served from the NFS server's cache.

sysctl -w net.ipv4.tcp_mem="4096 65536 4194304"
sysctl -w net.ipv4.tcp_rmem="4096 65536 4194304"
sysctl -w net.ipv4.tcp_wmem="4096 65536 4194304"
sysctl -w net.core.rmem_max=8388608
sysctl -w net.core.wmem_max=8388608

mount -o nfsvers=3,rsize=262144,wsize=262144 10.10.30.251:/export /mnt
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/xxx.iso of=/dev/null bs=1M count=5000
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/xxx.iso of=/dev/null bs=1M count=5000
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/xxx.iso of=/dev/null bs=1M count=5000
umount /mnt

Markus

* Re: Re: IPoIB GRO
From: Erez Shitrit @ 2013-11-04 12:41 UTC
  To: Markus Stockhausen; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz

Hi Markus,

Can you please tell me which FW version you have on your ConnectX
cards?

Thanks, Erez

* Re: Re: IPoIB GRO
From: Markus Stockhausen @ 2013-11-04 13:21 UTC
  To: Erez Shitrit; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz

> Hi Markus,
> 
> Can you please tell me what is the FW version you have on your ConnectX
> cards?

Of course. The server has:

root@client:~# ibstat
CA 'mlx4_0'
        CA type: MT26418
        Number of ports: 1
        Firmware version: 2.9.1000
        Hardware version: a0
        Node GUID: 0x0002c903000ec11a
        System image GUID: 0x0002c903000ec11d
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 4
                LMC: 0
                SM lid: 2
                Capability mask: 0x02510868
                Port GUID: 0x0002c903000ec11b

The client has an older 2.7.x firmware, mostly because of the X58
chipset's incompatibility with newer firmware. Your question suggests
that this behaviour may be related to the older firmware, so I moved
the client side of the test to another host with newer firmware.
Nevertheless, the TSO problem occurs there too.

root@client:~# ibstat
CA 'mlx4_0'
        CA type: MT25418
        Number of ports: 2
        Firmware version: 2.9.1000
        Hardware version: a0
        Node GUID: 0x001e0bffff4cf9c4
        System image GUID: 0x001e0bffff4cf9c7
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 14
                LMC: 0
                SM lid: 2
                Capability mask: 0x02510868
                Port GUID: 0x001e0bffff4cf9c5
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510868
                Port GUID: 0x001e0bffff4cf9c6

Best regards & thanks in advance.

Markus

* Re: Re: IPoIB GRO
From: Wendy Cheng @ 2013-11-04 21:17 UTC
  To: Markus Stockhausen
  Cc: Erez Shitrit, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz

I looked at the TSO code earlier this year. IIRC, if TSO is on, the upper
layer (e.g. IP) just sends the super-packet down (to IPoIB) without
segmenting it; if TSO is off, it does the segmentation (to match the MTU
size) before calling the device's send routine. For GSO, I would imagine
it needs some sort of segmentation sequence to know how to pull the
pieces together on the receive end. It looks to me as if "segmentation
offload" (TSO) and "receive offload" (GSO) are mutually exclusive? Check
out dev_gro_receive() (line numbers based on the 2.6.32 RHEL kernel):

   2980
   2981         if (skb_is_gso(skb) || skb_has_frags(skb))
   2982                 goto normal;


See how it bails out when TSO (skb_is_gso()) is on? So it looks like
an IPoIB bug that ipoib_ib_handle_rx_wc() unconditionally calls
napi_gro_receive(), regardless of adapter capability (and TSO setting).

Just a guess!

-- Wendy

* Re: Re: Re: IPoIB GRO
From: Or Gerlitz @ 2013-11-05  8:07 UTC
  To: Markus Stockhausen, Erez Shitrit; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 04/11/2013 15:21, Markus Stockhausen wrote:
> I changed the client side test to another  host with newer firmware. Nevertheless the TSO problem occurs there too.
>
> root@client:~# ibstat
> CA 'mlx4_0'
>          CA type: MT25418
>          Number of ports: 2
>          Firmware version: 2.9.1000
>          Hardware version: a0

I see. This didn't happen on our setups here since we test with newer
cards (ConnectX2/3/3-pro). With ConnectX1 (A0) and the firmware you are
using, it smells like something goes wrong. If possible, I would switch
to a newer card.

* Re: Re: IPoIB GRO
From: Or Gerlitz @ 2013-11-05  8:14 UTC
  To: Wendy Cheng, Markus Stockhausen
  Cc: Erez Shitrit, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 04/11/2013 23:17, Wendy Cheng wrote:
> Look to me that the "segmentation offload" (TSO) and "receive offload (GSO) are mutual exclusive ?

Wendy, the problem Markus stepped on was that GRO did not kick in on the
receive side because of bad ID-ing of packets on the sender side during
the TSO process.

GSO is SW TSO; it has nothing to do with GRO.

The code you referred to probably protects against an skb attempting to
go through both TSO/GSO and GRO (e.g. on forwarding), which isn't the
case here.

* Re: Re: Re: IPoIB GRO
From: Markus Stockhausen @ 2013-11-05  8:25 UTC
  To: Or Gerlitz, Erez Shitrit; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Wendy Cheng

> I see. This didn't happen on our setups here since we tests with 
> newer cards (ConnectX2/3/3-pro).
> For ConnectX1 (A0) and this firmware that you are using smells 
> like something goes wrong. If possible, I would change to newish 
> card.

No problem with that. My journey up to here was hard but very
interesting, especially when you expect everything in the system
to be consistent and new speedups with every new kernel or
driver version. Encountering a throughput drop of nearly 50%
after upgrading our NFS servers was a real challenge.

With TSO disabled on our old cards I'm back to LRO speeds and 
I'm more than happy with that. 

Just a final clarification for the interested reader: are the IP IDs
in a TSO setup generated by the firmware or by the software
stack? And if by the firmware: how does the card know how to
increase them? I would expect it to work only with IB packets
and to know nothing about the IP encapsulation.

Best regards.

Markus

* Re: Re: Re: Re: IPoIB GRO
From: Erez Shitrit @ 2013-11-05  8:48 UTC
  To: Markus Stockhausen
  Cc: Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Wendy Cheng

>> I see. This didn't happen on our setups here since we tests with
>> newer cards (ConnectX2/3/3-pro).
>> For ConnectX1 (A0) and this firmware that you are using smells
>> like something goes wrong. If possible, I would change to newish
>> card.
> No problem with that. My journey up to here was hard but very
> interesting. Especially when you expect everything in the system
> to be consistent and new speedups with every new kernel or
> driver version. Encountering a throughput drop of nearly 50%
> with the upgrade of our NFS servers I was challenged.
>
> With TSO disabled on our old cards I'm back to LRO speeds and
> I'm more than happy with that.
>
> Just a final clarification for the interested reader: Are the TCP Ids
> in an TSO setup generated through firmware or in the software
> stack? And if in firmware: How does the card know how to
> increase them? I would expect that it only works with IB packets
> and does not know of the IP encapsulation.
The card (HW) knows how to deal with IP packets; the card is configured
via the FW to increase the IP ID for each IP packet that is part of
the full message.

So, to summarize:
The HW does the work (it splits the big IP packet into a series of IP
packets, each of the relevant MTU size, and increases the IP ID for each).
The FW enables that work on the HW.
The FW on the A0 card doesn't enable that option for the HW.
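The summary above can be illustrated with a small userspace model of what an LSO engine is expected to do with one large send. It is a sketch under assumptions, not firmware or driver code: the struct and function names are hypothetical, and the example values assume an MSS of 2004 bytes (a 2044-byte IPoIB datagram MTU minus 40 bytes of IPv4 and TCP headers without options), chosen because it splits the 64128-byte payloads in the tcpdump earlier in the thread into exactly 32 segments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct seg {
    uint16_t ip_id;     /* IP identification of this wire packet */
    uint32_t tcp_seq;   /* TCP sequence number of its first payload byte */
    uint32_t len;       /* payload bytes carried */
};

/*
 * Split one large TCP payload into MSS-sized segments: consecutive
 * sequence numbers and, when bump_id is set, an IP ID that increases by
 * one per segment. With bump_id == 0 every segment repeats the first ID,
 * which models the behaviour reported for the A0 card.
 * Returns the number of segments written (at most max).
 */
static size_t lso_split(uint16_t first_id, uint32_t first_seq,
                        uint32_t total_len, uint32_t mss, int bump_id,
                        struct seg *out, size_t max)
{
    size_t n = 0;

    assert(mss > 0);
    for (uint32_t off = 0; off < total_len && n < max; off += mss, n++) {
        out[n].ip_id = bump_id ? (uint16_t)(first_id + n) : first_id;
        out[n].tcp_seq = first_seq + off;
        out[n].len = total_len - off < mss ? total_len - off : mss;
    }
    return n;
}
```

With bump_id set, a send starting at ID 35754 yields IDs 35754 to 35785; with it cleared, all 32 segments carry 35754. That 32-packet spacing appears to line up with the jump from 35754 to 35786 in the ib0 debug output earlier in the thread.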

>
> Best regards.
>
> Markus

* Re: Re: Re: Re: IPoIB GRO
From: Or Gerlitz @ 2013-11-05  8:49 UTC
  To: Markus Stockhausen, Erez Shitrit
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Wendy Cheng

On 05/11/2013 10:25, Markus Stockhausen wrote:
> Are the TCP Ids in an TSO setup generated through firmware or in the software stack?

In HW.
> And if in firmware: How does the card know how to increase them? I would expect that it only works with IB packets
> and does not know of the IP encapsulation.

All vendors' networking HW that does TSO gets a hint from the driver
that this is a TSO packet; in your case see mlx4_ib_post_send and
look for IB_WR_LSO.

* Re: Re: Re: Re: IPoIB GRO
From: Jason Gunthorpe @ 2013-11-05 17:24 UTC
  To: Erez Shitrit
  Cc: Markus Stockhausen, Or Gerlitz,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Wendy Cheng

On Tue, Nov 05, 2013 at 10:48:10AM +0200, Erez Shitrit wrote:

> So, to summarize:
> The HW does the work: it splits the big IP packet into a series of IP
> packets, each of the relevant MTU size, and increments the IP ID for
> each.
> The FW enables that work on the HW.
> The FW on the A0 card doesn't enable that option for the HW.

Sounds like this bug causes a performance regression, and it sounds
like it puts incorrect packets on the wire.

This should be patched: have the driver disable TSO for cards that
can't support it.

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* AW: AW: AW: AW: IPoIB GRO
       [not found]                                 ` <20131105172431.GA14706-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2013-11-05 18:08                                   ` Markus Stockhausen
       [not found]                                     ` <12EF8D94C6F8734FB2FF37B9FBEDD173585A508B-Xnr6BND5kcg29+KCeZIpYi5l6jQMEky5@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Markus Stockhausen @ 2013-11-05 18:08 UTC (permalink / raw)
  To: Jason Gunthorpe, Erez Shitrit
  Cc: Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Wendy Cheng

[-- Attachment #1: Type: text/plain, Size: 742 bytes --]

> > So, to summarize:
> > The HW does the work: it splits the big IP packet into a series of IP
> > packets, each of the relevant MTU size, and increments the IP ID for
> > each.
> > The FW enables that work on the HW.
> > The FW on the A0 card doesn't enable that option for the HW.
> 
> Sounds like this bug causes a performance regression, and it sounds
> like it puts incorrect packets on the wire.
> 
> This should be patched: have the driver disable TSO for cards that
> can't support it.
> 
> Jason

It's incredible how a card that does not support TSO can put big packets
on the wire that somehow get reassembled on the client side :) Maybe
a two-line change in mlx4_ib_query_device() could prevent further discussion.

Markus
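One possible shape for the two-line change suggested above, written as a standalone C sketch. CAP_UD_TSO, struct card_info and query_caps() are hypothetical; the thread does not say which flag mlx4_ib_query_device() would clear or how the driver would recognize an affected (A0) card.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bit advertising UD TSO/LSO support. */
#define CAP_UD_TSO (1u << 0)

/* Hypothetical summary of what the driver knows about the card. */
struct card_info {
	int hw_lso;           /* HW can segment large sends */
	int fw_increments_id; /* FW enables a fresh IP ID per segment */
};

/* Advertise TSO only when the segments the card emits will be correct. */
static uint32_t query_caps(const struct card_info *c, uint32_t caps)
{
	assert(c != NULL);
	if (c->hw_lso && c->fw_increments_id)
		caps |= CAP_UD_TSO;
	else
		caps &= ~CAP_UD_TSO; /* e.g. the A0 firmware discussed here */
	return caps;
}
```

With the bit cleared, the upper layer would fall back to segmenting in software, so each packet gets its own IP ID from the stack and the receiver's GRO check can match again.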

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re:  IPoIB GRO
       [not found]                                     ` <12EF8D94C6F8734FB2FF37B9FBEDD173585A508B-Xnr6BND5kcg29+KCeZIpYi5l6jQMEky5@public.gmane.org>
@ 2013-11-06  7:50                                       ` Or Gerlitz
       [not found]                                         ` <5279F4DB.8040202-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Or Gerlitz @ 2013-11-06  7:50 UTC (permalink / raw)
  To: Markus Stockhausen, Jason Gunthorpe, Erez Shitrit
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Wendy Cheng

On 05/11/2013 20:08, Markus Stockhausen wrote:
> Incredible how a card that does not support TSO can bring big packets
> on the wire that somehow get reassembled on the client side
Not sure I follow; you have shown they are **not** reassembled, correct?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* AW:  IPoIB GRO
       [not found]                                         ` <5279F4DB.8040202-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-11-06  7:58                                           ` Markus Stockhausen
  0 siblings, 0 replies; 16+ messages in thread
From: Markus Stockhausen @ 2013-11-06  7:58 UTC (permalink / raw)
  To: Or Gerlitz, Jason Gunthorpe, Erez Shitrit
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Wendy Cheng

[-- Attachment #1: Type: text/plain, Size: 852 bytes --]

> From: Or Gerlitz [ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org]
> Sent: Wednesday, 6 November 2013 08:50
> To: Markus Stockhausen; Jason Gunthorpe; Erez Shitrit
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Wendy Cheng
> Subject: Re:  IPoIB GRO
> 
> On 05/11/2013 20:08, Markus Stockhausen wrote:
> > Incredible how a card that does not support TSO can bring big packets
> > on the wire that somehow get reassembled on the client side
> Not sure I follow; you have shown they are **not** reassembled, correct?

Sorry, I was imprecise. I meant that activating TSO
on that particular card seems to do nothing more than
create fragments. They are reassembled, but not in the
GRO path. From my admittedly naive point of view, that could
have caused many more problems than GRO not working
correctly.
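To make the GRO side concrete, here is a simplified C model of the IP ID check from inet_gro_receive() quoted at the start of the thread. gro_id_mismatch() and count_merges() are hypothetical helpers, and they ignore the other fields (TTL, TOS, DF) the real code also compares. Incrementing IDs merge, while a run of identical IDs, as seen on ib0, never does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Mirrors ((u16)(ntohs(iph2->id) + NAPI_GRO_CB(p)->count) ^ id):
 * a new packet matches only if its ID equals the held packet's ID plus
 * the number of packets already in the flow (mod 2^16).
 * Returns nonzero when the flow would be flushed.
 */
static int gro_id_mismatch(uint16_t held_id, uint16_t count, uint16_t new_id)
{
	return (uint16_t)(held_id + count) != new_id;
}

/* Count how many packets of a received ID sequence would be merged. */
static unsigned count_merges(const uint16_t *ids, unsigned n)
{
	unsigned merged = 0, i;
	uint16_t held = 0, count = 0;

	assert(ids != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (i > 0 && !gro_id_mismatch(held, count, ids[i])) {
			merged++; /* ID continues the sequence: merge */
			count++;
		} else {
			held = ids[i]; /* flush and hold a new packet */
			count = 1;
		}
	}
	return merged;
}
```

For the sequence 100, 101, 102, 103 three packets merge into the first; for four copies of 35754 nothing merges, which matches the "35754 1 35754" debug output reported for ib0.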

Markus


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2013-11-06  7:58 UTC | newest]

Thread overview: 16+ messages
2013-11-03 10:58 IPoIB GRO Markus Stockhausen
     [not found] ` <12EF8D94C6F8734FB2FF37B9FBEDD173585A3E3B-Xnr6BND5kcg29+KCeZIpYi5l6jQMEky5@public.gmane.org>
2013-11-04  8:12   ` AW: " Markus Stockhausen
2013-11-04  8:24   ` Erez Shitrit
     [not found]     ` <527759C6.3070009-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2013-11-04  8:40       ` AW: " Markus Stockhausen
     [not found]         ` <12EF8D94C6F8734FB2FF37B9FBEDD173585A4301-Xnr6BND5kcg29+KCeZIpYi5l6jQMEky5@public.gmane.org>
2013-11-04 12:41           ` Erez Shitrit
     [not found]             ` <52779612.9020103-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2013-11-04 13:21               ` AW: " Markus Stockhausen
     [not found]                 ` <12EF8D94C6F8734FB2FF37B9FBEDD173585A45CF-Xnr6BND5kcg29+KCeZIpYi5l6jQMEky5@public.gmane.org>
2013-11-04 21:17                   ` Wendy Cheng
     [not found]                     ` <CABgxfbEom7fjdshX5AaSXT3P_y=3xFwN9T3V+QXkB0bK-EfNjA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-11-05  8:14                       ` Or Gerlitz
2013-11-05  8:07                   ` AW: " Or Gerlitz
     [not found]                     ` <5278A757.2070406-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-11-05  8:25                       ` AW: " Markus Stockhausen
     [not found]                         ` <12EF8D94C6F8734FB2FF37B9FBEDD173585A4B3D-Xnr6BND5kcg29+KCeZIpYi5l6jQMEky5@public.gmane.org>
2013-11-05  8:48                           ` Erez Shitrit
     [not found]                             ` <5278B0CA.9080305-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2013-11-05 17:24                               ` Jason Gunthorpe
     [not found]                                 ` <20131105172431.GA14706-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-11-05 18:08                                   ` AW: " Markus Stockhausen
     [not found]                                     ` <12EF8D94C6F8734FB2FF37B9FBEDD173585A508B-Xnr6BND5kcg29+KCeZIpYi5l6jQMEky5@public.gmane.org>
2013-11-06  7:50                                       ` Or Gerlitz
     [not found]                                         ` <5279F4DB.8040202-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-11-06  7:58                                           ` AW: " Markus Stockhausen
2013-11-05  8:49                           ` AW: AW: " Or Gerlitz
