* IPoIB GRO
@ 2013-11-03 10:58 Markus Stockhausen
[not found] ` <12EF8D94C6F8734FB2FF37B9FBEDD173585A3E3B-Xnr6BND5kcg29+KCeZIpYi5l6jQMEky5@public.gmane.org>
0 siblings, 1 reply; 16+ messages in thread
From: Markus Stockhausen @ 2013-11-03 10:58 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Or Gerlitz, Erez Shitrit
[-- Attachment #1: Type: text/plain, Size: 1911 bytes --]
Hello,
I have a little update on the unfortunate GRO IPoIB behaviour I have
observed over the last weeks in datagram mode on our ConnectX cards.
In the GRO receive path the kernel steps into the inet_gro_receive()
function of net/ipv4/af_inet.c. If I read the code right, it compares two
IP packets and decides whether they belong to the same "flow".
Further checks are done in some subroutines that narrow
down the comparison to IPv4 and so on.
I put a debugging message after the following comparison, which
seems to be the culprit of it all.
inet_gro_receive()
...
/* All fields must match except length and checksum. */
NAPI_GRO_CB(p)->flush |=
(iph->ttl ^ iph2->ttl) |
(iph->tos ^ iph2->tos) |
(__force int)((iph->frag_off ^ iph2->frag_off) & htons(IP_DF)) |
((u16)(ntohs(iph2->id) + NAPI_GRO_CB(p)->count) ^ id);
/* Do some debug */
printk("%i %i %i\n",ntohs(iph2->id),NAPI_GRO_CB(p)->count,id);
...
On a normal GBit Intel card the kernel output reads:
32933 12 32945
32933 13 32946
32946 1 32947
32946 2 32948
...
32946 15 32961
32964 3 32967
32964 4 32968
...
The interpretation of it all should be that packet ids must match
the sum of the initial packet id plus its count field. Then
we have a GRO candidate.
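To make the check concrete, here is a minimal userspace sketch of only the IP ID term of the expression quoted above. The function name ip_id_flush_bits is made up for illustration and is not a kernel symbol, and the sketch assumes all three values are already in host byte order, as they are in the printk output.

```c
#include <stdint.h>

/*
 * Userspace sketch of the IP ID term in the quoted flush expression.
 * Hypothetical helper, not part of the kernel.
 *
 * base_id : ID of the first held packet, host order (ntohs(iph2->id))
 * count   : number of packets merged so far (NAPI_GRO_CB(p)->count)
 * id      : ID of the newly arrived packet, host order
 *
 * Returns 0 when the new packet continues the ID sequence and a
 * non-zero value otherwise. The cast to 16 bits mirrors the (u16)
 * cast in the original, so wrap-around at 65535 is handled.
 */
unsigned int ip_id_flush_bits(uint16_t base_id, unsigned int count, uint16_t id)
{
    return (unsigned int)((uint16_t)(base_id + count) ^ id);
}
```

Fed with the logged values, (32933, 12, 32945) yields 0, so the packet stays a merge candidate, while (35754, 1, 35754) yields a non-zero value, which sets the flush flag.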
On our ib0 interface the count field of a received packet seems
to be 1 most of the time and the packet id always matches the
initial packet id:
35754 1 35754
35754 1 35754
35754 1 35754
...
35754 1 35786
35786 1 35786
35786 1 35786
...
That's why the flush flag is always set and the GRO stack does
not work at all. I'm willing to dig deeper into this but I'm unsure
if those fields are filled on sender or receiver side and especially
where in the IPoIB stack. Maybe someone can point me into the
right direction so that I can dig deeper and provide some more
information.
Best regards.
Markus
[-- Attachment #2: InterScan_Disclaimer.txt --]
[-- Type: text/plain, Size: 1650 bytes --]
****************************************************************************
Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte
Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail
irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und
vernichten Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte
Weitergabe dieser Mail ist nicht gestattet.
Über das Internet versandte E-Mails können unter fremden Namen erstellt oder
manipuliert werden. Deshalb ist diese als E-Mail verschickte Nachricht keine
rechtsverbindliche Willenserklärung.
Collogia
Unternehmensberatung AG
Ubierring 11
D-50678 Köln
Vorstand:
Kadir Akin
Dr. Michael Höhnerbach
Vorsitzender des Aufsichtsrates:
Hans Kristian Langva
Registergericht: Amtsgericht Köln
Registernummer: HRB 52 497
This e-mail may contain confidential and/or privileged information. If you
are not the intended recipient (or have received this e-mail in error)
please notify the sender immediately and destroy this e-mail. Any
unauthorized copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.
e-mails sent over the internet may have been written under a wrong name or
been manipulated. That is why this message sent as an e-mail is not a
legally binding declaration of intention.
Collogia
Unternehmensberatung AG
Ubierring 11
D-50678 Köln
executive board:
Kadir Akin
Dr. Michael Höhnerbach
President of the supervisory board:
Hans Kristian Langva
Registry office: district court Cologne
Register number: HRB 52 497
****************************************************************************
^ permalink raw reply [flat|nested] 16+ messages in thread
* AW: IPoIB GRO
[not found] ` <12EF8D94C6F8734FB2FF37B9FBEDD173585A3E3B-Xnr6BND5kcg29+KCeZIpYi5l6jQMEky5@public.gmane.org>
@ 2013-11-04 8:12 ` Markus Stockhausen
2013-11-04 8:24 ` Erez Shitrit
1 sibling, 0 replies; 16+ messages in thread
From: Markus Stockhausen @ 2013-11-04 8:12 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Or Gerlitz, Erez Shitrit
[-- Attachment #1: Type: text/plain, Size: 1158 bytes --]
> That's why the flush flag is always set and the GRO stack does
> not work at all. I'm willing to dig deeper into this but I'm unsure
> if those fields are filled on sender or receiver side and especially
> where in the IPoIB stack.
Maybe I have found the reason for the strange ACK behaviour during
large NFS over IPoIB reads, and hopefully someone can confirm this.
If I turn on TSO for an IPoIB datagram interface on the sender side,
GRO on the receiver side is totally broken. This is due to the fact that
TSO "generates" large 60k packets that are offloaded and split into
segments. Each of these segments has the same ID in the IP
header. GRO expects IDs to be in incremental order and issues a
flush after each packet. Each flush results in an ACK packet back
to the server.
With TSO disabled GRO can kick in. Packets are built with
sequential IDs. The receiver then only acknowledges every few packets.
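The effect described above can be illustrated with a small userspace model. It is a deliberate simplification and not the kernel's GRO code: it tracks a single flow with one held packet, looks only at the IP ID, and ignores every other flush condition (TTL, TOS, TCP flags, size limits). The function name count_id_flushes is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified single-flow model of the IP ID rule in GRO.
 * ids[] holds the IP IDs of the received segments in arrival order
 * (host byte order). A segment is merged when its ID equals the ID of
 * the first held segment plus the number of segments merged so far;
 * otherwise the held aggregate is flushed and a new one is started.
 *
 * Returns the number of flushes caused by ID mismatches.
 */
size_t count_id_flushes(const uint16_t *ids, size_t n)
{
    size_t flushes = 0;
    uint16_t base = 0;
    unsigned int count = 0;           /* 0 means nothing is held yet */

    for (size_t i = 0; i < n; i++) {
        if (count != 0 && (uint16_t)(base + count) == ids[i]) {
            count++;                  /* ID continues the sequence: merge */
            continue;
        }
        if (count != 0)
            flushes++;                /* mismatch: flush the held aggregate */
        base = ids[i];                /* start a new aggregate */
        count = 1;
    }
    return flushes;
}
```

In this model eight segments with IDs 100 to 107 produce no flush at all, while eight segments that all carry ID 35754 produce seven flushes, one per additional segment.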
For a fully cached file read of 6GB the numbers read:
TSO on: ~220MByte/s - 1,522,679 MLX4 Interrupts on server
TSO off: ~550MByte/s - 318,322 MLX4 Interrupts on server
Is there any chance IPoIB TSO handling can be optimized?
Markus
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: IPoIB GRO
[not found] ` <12EF8D94C6F8734FB2FF37B9FBEDD173585A3E3B-Xnr6BND5kcg29+KCeZIpYi5l6jQMEky5@public.gmane.org>
2013-11-04 8:12 ` AW: " Markus Stockhausen
@ 2013-11-04 8:24 ` Erez Shitrit
[not found] ` <527759C6.3070009-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
1 sibling, 1 reply; 16+ messages in thread
From: Erez Shitrit @ 2013-11-04 8:24 UTC (permalink / raw)
To: Markus Stockhausen; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz
Hi Markus,
As Or already mentioned, it seems that we do get accumulation of IP
packets when GRO is enabled over the ib interface. From tcpdump on the
receive side we can see:
10:09:27.336951 IP 11.134.33.1.41377 > 11.134.41.1.35957: Flags [.], seq
3795959253:3796023381, ack 2, win 110, length 64128
10:09:27.336987 IP 11.134.41.1.35957 > 11.134.33.1.41377: Flags [.], ack
3796023381, win 2036, length 0
10:09:27.337022 IP 11.134.33.1.41377 > 11.134.41.1.35957: Flags [.], seq
3796023381:3796087509, ack 2, win 110, length 64128
10:09:27.337044 IP 11.134.41.1.35957 > 11.134.33.1.41377: Flags [.], ack
3796087509, win 3038, length 0
10:09:27.337083 IP 11.134.33.1.41377 > 11.134.41.1.35957: Flags [.], seq
3796087509:3796151637, ack 2, win 110, length 64128
10:09:27.337107 IP 11.134.41.1.35957 > 11.134.33.1.41377: Flags [.], ack
3796151637, win 4040, length 0
10:09:27.337142 IP 11.134.33.1.41377 > 11.134.41.1.35957: Flags [.], seq
3796151637:3796215765, ack 2, win 110, length 64128
.....
....
don't you see that behaviour in tcpdump? what kernel are you using?
I will take a look into the gro/our code to check if we missed
something, and update.
Thanks, Erez
> Hello,
>
> I have a little update to the unlucky GRO IPoIB behaviour I observed
> in the last weeks in datagram mode on our ConnectX cards. In the
> GRO receive path the kernel steps into the inet_gro_receive() function
> of net/ipv4/af_inet.c. If I read the code right it compares two
> IP packets and decides if they come from the same "flow".
> Further checks are included in some subroutines that narrow
> down the comparison to IPv4 and so on.
>
> I put a debugging message into the following comparison that
> seems to be the culprit of it all.
>
> inet_gro_receive()
> ...
> /* All fields must match except length and checksum. */
> NAPI_GRO_CB(p)->flush |=
> (iph->ttl ^ iph2->ttl) |
> (iph->tos ^ iph2->tos) |
> (__force int)((iph->frag_off ^ iph2->frag_off) & htons(IP_DF)) |
> ((u16)(ntohs(iph2->id) + NAPI_GRO_CB(p)->count) ^ id);
> /* Do some debug */
> printk("%i %i %i\n",ntohs(iph2->id),NAPI_GRO_CB(p)->count,id);
> ...
>
> On a normal GBit Intel card the kernel output reads:
>
> 32933 12 32945
> 32933 13 32946
> 32946 1 32947
> 32946 2 32948
> ...
> 32946 15 32961
> 32964 3 32967
> 32964 4 32968
> ...
>
> The interpretation of it all should be that packet ids must match
> the sum of the initial packet id plus its count field. Then
> we have a GRO candidate.
>
> On our ib0 interface the count field of a received packet seems
> to be 1 most of the time and the packet id always matches the
> initial packet id:
>
> 35754 1 35754
> 35754 1 35754
> 35754 1 35754
> ...
> 35754 1 35786
> 35786 1 35786
> 35786 1 35786
> ...
>
> That's why the flush flag is always set and the GRO stack does
> not work at all. I'm willing to dig deeper into this but I'm unsure
> if those fields are filled on sender or receiver side and especially
> where in the IPoIB stack. Maybe someone can point me into the
> right direction so that I can dig deeper and provide some more
> information.
>
> Best regards.
>
> Markus
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 16+ messages in thread
* AW: IPoIB GRO
[not found] ` <527759C6.3070009-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2013-11-04 8:40 ` Markus Stockhausen
[not found] ` <12EF8D94C6F8734FB2FF37B9FBEDD173585A4301-Xnr6BND5kcg29+KCeZIpYi5l6jQMEky5@public.gmane.org>
0 siblings, 1 reply; 16+ messages in thread
From: Markus Stockhausen @ 2013-11-04 8:40 UTC (permalink / raw)
To: Erez Shitrit; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz
[-- Attachment #1: Type: text/plain, Size: 1044 bytes --]
Hi Erez,
> don't you see that behaviour in tcpdump? what kernel are you using?
On the server side we have a 3.5 kernel and on the client side a 3.11
kernel, each with the standard kernel drivers/modules. I can see the same
pattern of GRO aggregation on the client that you mention, but only if I
disable TSO for ib0 on the server side.
The test I'm running on the client looks like this. The second and third
read runs are definitely served by the NFS server side cache.
sysctl -w net.ipv4.tcp_mem="4096 65536 4194304"
sysctl -w net.ipv4.tcp_rmem="4096 65536 4194304"
sysctl -w net.ipv4.tcp_wmem="4096 65536 4194304"
sysctl -w net.core.rmem_max=8388608
sysctl -w net.core.wmem_max=8388608
mount -o nfsvers=3,rsize=262144,wsize=262144 10.10.30.251:/export /mnt
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/xxx.iso of=/dev/null bs=1M count=5000
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/xxx.iso of=/dev/null bs=1M count=5000
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/xxx.iso of=/dev/null bs=1M count=5000
umount /mnt
Markus
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: AW: IPoIB GRO
[not found] ` <12EF8D94C6F8734FB2FF37B9FBEDD173585A4301-Xnr6BND5kcg29+KCeZIpYi5l6jQMEky5@public.gmane.org>
@ 2013-11-04 12:41 ` Erez Shitrit
[not found] ` <52779612.9020103-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
0 siblings, 1 reply; 16+ messages in thread
From: Erez Shitrit @ 2013-11-04 12:41 UTC (permalink / raw)
To: Markus Stockhausen; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz
Hi Markus,
Can you please tell me what is the FW version you have on your ConnectX
cards?
Thanks, Erez
> Hi Erez,
>
>> don't you see that behaviour in tcpdump? what kernel are you using?
> On server side we have a 3.5 on client side a 3.11 kernel each of them with
> kernel standard drivers/modules. I can see the same pattern of GRO
> aggregation on the client that you mention but only if I disable TSO for
> ib0 on the server side.
>
> The test I'm running on the client is like this. The second and third read
> run are definitely served by the NFS server side cache.
>
> sysctl -w net.ipv4.tcp_mem="4096 65536 4194304"
> sysctl -w net.ipv4.tcp_rmem="4096 65536 4194304"
> sysctl -w net.ipv4.tcp_wmem="4096 65536 4194304"
> sysctl -w net.core.rmem_max=8388608
> sysctl -w net.core.wmem_max=8388608
>
> mount -o nfsvers=3,rsize=262144,wsize=262144 10.10.30.251:/export /mnt
> echo 3 > /proc/sys/vm/drop_caches
> dd if=/mnt/xxx.iso of=/dev/null bs=1M count=5000
> echo 3 > /proc/sys/vm/drop_caches
> dd if=/mnt/xxx.iso of=/dev/null bs=1M count=5000
> echo 3 > /proc/sys/vm/drop_caches
> dd if=/mnt/xxx.iso of=/dev/null bs=1M count=5000
> umount /mnt
>
> Markus
^ permalink raw reply [flat|nested] 16+ messages in thread
* AW: AW: IPoIB GRO
[not found] ` <52779612.9020103-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2013-11-04 13:21 ` Markus Stockhausen
[not found] ` <12EF8D94C6F8734FB2FF37B9FBEDD173585A45CF-Xnr6BND5kcg29+KCeZIpYi5l6jQMEky5@public.gmane.org>
0 siblings, 1 reply; 16+ messages in thread
From: Markus Stockhausen @ 2013-11-04 13:21 UTC (permalink / raw)
To: Erez Shitrit; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz
[-- Attachment #1: Type: text/plain, Size: 1850 bytes --]
> Hi Markus,
>
> Can you please tell me what is the FW version you have on your ConnectX
> cards?
Of course. The server has:
root@client:~# ibstat
CA 'mlx4_0'
CA type: MT26418
Number of ports: 1
Firmware version: 2.9.1000
Hardware version: a0
Node GUID: 0x0002c903000ec11a
System image GUID: 0x0002c903000ec11d
Port 1:
State: Active
Physical state: LinkUp
Rate: 20
Base lid: 4
LMC: 0
SM lid: 2
Capability mask: 0x02510868
Port GUID: 0x0002c903000ec11b
The client has an older 2.7.x firmware. Mostly because of
the X58 chipset incompatibility with newer firmwares. Your
question suggests that this behaviour may be related to the
older firmware. So I changed the client side test to another
host with newer firmware. Nevertheless the TSO problem
occurs there too.
root@client:~# ibstat
CA 'mlx4_0'
CA type: MT25418
Number of ports: 2
Firmware version: 2.9.1000
Hardware version: a0
Node GUID: 0x001e0bffff4cf9c4
System image GUID: 0x001e0bffff4cf9c7
Port 1:
State: Active
Physical state: LinkUp
Rate: 20
Base lid: 14
LMC: 0
SM lid: 2
Capability mask: 0x02510868
Port GUID: 0x001e0bffff4cf9c5
Port 2:
State: Down
Physical state: Polling
Rate: 10
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x02510868
Port GUID: 0x001e0bffff4cf9c6
Best regards & thanks in advance.
Markus
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: AW: IPoIB GRO
[not found] ` <12EF8D94C6F8734FB2FF37B9FBEDD173585A45CF-Xnr6BND5kcg29+KCeZIpYi5l6jQMEky5@public.gmane.org>
@ 2013-11-04 21:17 ` Wendy Cheng
[not found] ` <CABgxfbEom7fjdshX5AaSXT3P_y=3xFwN9T3V+QXkB0bK-EfNjA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-11-05 8:07 ` AW: " Or Gerlitz
1 sibling, 1 reply; 16+ messages in thread
From: Wendy Cheng @ 2013-11-04 21:17 UTC (permalink / raw)
To: Markus Stockhausen
Cc: Erez Shitrit, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz
I looked at the TSO code earlier this year. IIRC, if TSO is on, the upper
layer (e.g. IP) just sends the super-packet down (to IPoIB) without
segmentation (for send); if it is off, the upper layer does the
segmentation (to match the MTU size) before calling the device's send.
For GSO, I would imagine it needs some sort of segmentation sequence to
know how to pull them together on the receive end. It looks to me like
"segmentation offload" (TSO) and "receive offload" (GSO) are mutually
exclusive? Check out dev_gro_receive() (line numbers based on the 2.6.32
RHEL kernel):
2980
2981 if (skb_is_gso(skb) || skb_has_frags(skb))
2982 goto normal;
See how it bails out when TSO (skb_is_gso()) is on? So it looks like
an IPoIB bug that ipoib_ib_handle_rx_wc() does an unconditional
napi_gro_receive() regardless of adapter capability (and TSO setting).
Just a guess!
-- Wendy
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: AW: AW: IPoIB GRO
[not found] ` <12EF8D94C6F8734FB2FF37B9FBEDD173585A45CF-Xnr6BND5kcg29+KCeZIpYi5l6jQMEky5@public.gmane.org>
2013-11-04 21:17 ` Wendy Cheng
@ 2013-11-05 8:07 ` Or Gerlitz
[not found] ` <5278A757.2070406-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
1 sibling, 1 reply; 16+ messages in thread
From: Or Gerlitz @ 2013-11-05 8:07 UTC (permalink / raw)
To: Markus Stockhausen, Erez Shitrit; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
On 04/11/2013 15:21, Markus Stockhausen wrote:
> I changed the client side test to another host with newer firmware. Nevertheless the TSO problem occurs there too.
>
> root@client:~# ibstat
> CA 'mlx4_0'
> CA type: MT25418
> Number of ports: 2
> Firmware version: 2.9.1000
> Hardware version: a0
I see. This didn't happen on our setups here since we test with newer
cards (ConnectX2/3/3-pro).
With ConnectX1 (A0) and the firmware you are using, it smells like
something goes wrong. If possible,
I would change to a newer card.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: AW: IPoIB GRO
[not found] ` <CABgxfbEom7fjdshX5AaSXT3P_y=3xFwN9T3V+QXkB0bK-EfNjA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-11-05 8:14 ` Or Gerlitz
0 siblings, 0 replies; 16+ messages in thread
From: Or Gerlitz @ 2013-11-05 8:14 UTC (permalink / raw)
To: Wendy Cheng, Markus Stockhausen
Cc: Erez Shitrit, linux-rdma-u79uwXL29TY76Z2rM5mHXA
On 04/11/2013 23:17, Wendy Cheng wrote:
> Look to me that the "segmentation offload" (TSO) and "receive offload (GSO) are mutual exclusive ?
Wendy, the problem Markus stepped on was no GRO on the receive side b/c
of bad ID-ing of packets on the sender side during the TSO process.
GSO is SW TSO, has nothing to do with GRO.
The code you referred to probably protects against an skb attempting to
go through both TSO/GSO and GRO
(e.g. on forwarding), which isn't the case here.
^ permalink raw reply [flat|nested] 16+ messages in thread
* AW: AW: AW: IPoIB GRO
[not found] ` <5278A757.2070406-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-11-05 8:25 ` Markus Stockhausen
[not found] ` <12EF8D94C6F8734FB2FF37B9FBEDD173585A4B3D-Xnr6BND5kcg29+KCeZIpYi5l6jQMEky5@public.gmane.org>
0 siblings, 1 reply; 16+ messages in thread
From: Markus Stockhausen @ 2013-11-05 8:25 UTC (permalink / raw)
To: Or Gerlitz, Erez Shitrit; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Wendy Cheng
[-- Attachment #1: Type: text/plain, Size: 994 bytes --]
> I see. This didn't happen on our setups here since we tests with
> newer cards (ConnectX2/3/3-pro).
> For ConnectX1 (A0) and this firmware that you are using smells
> like something goes wrong. If possible, I would change to newish
> card.
No problem with that. My journey up to here was hard but very
interesting. Especially when you expect everything in the system
to be consistent and new speedups with every new kernel or
driver version. Encountering a throughput drop of nearly 50%
with the upgrade of our NFS servers I was challenged.
With TSO disabled on our old cards I'm back to LRO speeds and
I'm more than happy with that.
Just a final clarification for the interested reader: Are the IP IDs
in a TSO setup generated by the firmware or in the software
stack? And if in firmware: How does the card know how to
increase them? I would expect that it only works with IB packets
and does not know about the IP encapsulation.
Best regards.
Markus
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: AW: AW: AW: IPoIB GRO
[not found] ` <12EF8D94C6F8734FB2FF37B9FBEDD173585A4B3D-Xnr6BND5kcg29+KCeZIpYi5l6jQMEky5@public.gmane.org>
@ 2013-11-05 8:48 ` Erez Shitrit
[not found] ` <5278B0CA.9080305-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2013-11-05 8:49 ` AW: AW: " Or Gerlitz
1 sibling, 1 reply; 16+ messages in thread
From: Erez Shitrit @ 2013-11-05 8:48 UTC (permalink / raw)
To: Markus Stockhausen
Cc: Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Wendy Cheng
>> I see. This didn't happen on our setups here since we tests with
>> newer cards (ConnectX2/3/3-pro).
>> For ConnectX1 (A0) and this firmware that you are using smells
>> like something goes wrong. If possible, I would change to newish
>> card.
> No problem with that. My journey up to here was hard but very
> interesting. Especially when you expect everything in the system
> to be consistent and new speedups with every new kernel or
> driver version. Encountering a throughput drop of nearly 50%
> with the upgrade of our NFS servers I was challenged.
>
> With TSO disabled on our old cards I'm back to LRO speeds and
> I'm more than happy with that.
>
> Just a final clarification for the interested reader: Are the TCP Ids
> in an TSO setup generated through firmware or in the software
> stack? And if in firmware: How does the card know how to
> increase them? I would expect that it only works with IB packets
> and does not know of the IP encapsulation.
The card (HW) knows how to deal with IP packets. The card is configured
via the FW to increase the IP ID for each IP packet that is part of
the full message.
So, to summarize:
The HW does the work (it splits the big IP packet into a series of IP
packets, each sized to the relevant MTU, and increases the IP ID for each).
The FW enables that work on the HW.
The FW in the A0 card doesn't enable that option for the HW.
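As a rough illustration of the summary above, the following userspace sketch splits one large TCP payload into MSS-sized segments and assigns each segment an IP ID and a TCP sequence number. It only models the bookkeeping, not real hardware or driver behaviour. The struct and function names are hypothetical, and the increment_id switch stands in for the option that, per the summary, the firmware does or does not enable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one segment put on the wire. */
struct segment {
    uint16_t ip_id;    /* IP identification field, host order */
    uint32_t tcp_seq;  /* TCP sequence number of the first payload byte */
    uint32_t len;      /* TCP payload bytes in this segment */
};

/*
 * Split payload_len bytes into segments of at most mss bytes.
 * With increment_id != 0 every segment gets the next IP ID (the
 * behaviour GRO on the receiver expects); with increment_id == 0 all
 * segments repeat first_id (the behaviour observed on ib0).
 * Returns the number of segments written to out[] (at most max_out).
 */
size_t segment_super_packet(uint16_t first_id, uint32_t first_seq,
                            uint32_t payload_len, uint32_t mss,
                            int increment_id,
                            struct segment *out, size_t max_out)
{
    size_t n = 0;
    uint32_t off = 0;

    if (mss == 0)
        return 0;

    while (off < payload_len && n < max_out) {
        uint32_t len = payload_len - off;
        if (len > mss)
            len = mss;
        out[n].ip_id = increment_id ? (uint16_t)(first_id + n) : first_id;
        out[n].tcp_seq = first_seq + off;
        out[n].len = len;
        off += len;
        n++;
    }
    return n;
}
```

Assuming an MSS of 2004 bytes (a 2044-byte IPoIB datagram MTU minus 40 bytes of IP and TCP headers without options, which is an assumption and not stated in this thread), the 64128-byte packets from the tcpdump earlier in the thread split into exactly 32 segments. That would be consistent with the ID jump from 35754 to 35786 in the ib0 debug output, although the two observations come from different setups.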
>
> Best regards.
>
> Markus
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: AW: AW: AW: IPoIB GRO
[not found] ` <12EF8D94C6F8734FB2FF37B9FBEDD173585A4B3D-Xnr6BND5kcg29+KCeZIpYi5l6jQMEky5@public.gmane.org>
2013-11-05 8:48 ` Erez Shitrit
@ 2013-11-05 8:49 ` Or Gerlitz
1 sibling, 0 replies; 16+ messages in thread
From: Or Gerlitz @ 2013-11-05 8:49 UTC (permalink / raw)
To: Markus Stockhausen, Erez Shitrit
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Wendy Cheng
On 05/11/2013 10:25, Markus Stockhausen wrote:
> Are the TCP Ids in an TSO setup generated through firmware or in the software stack?
in HW
> And if in firmware: How does the card know how to increase them? I would expect that it only works with IB packets
> and does not know of the IP encapsulation.
All vendors' networking HW that does TSO gets a hint from the driver
that this is a TSO packet. In your case, see
mlx4_ib_post_send and look for IB_WR_LSO.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: AW: AW: AW: IPoIB GRO
[not found] ` <5278B0CA.9080305-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2013-11-05 17:24 ` Jason Gunthorpe
[not found] ` <20131105172431.GA14706-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2013-11-05 17:24 UTC (permalink / raw)
To: Erez Shitrit
Cc: Markus Stockhausen, Or Gerlitz,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, Wendy Cheng
On Tue, Nov 05, 2013 at 10:48:10AM +0200, Erez Shitrit wrote:
> so, to summarize:
> The HW does the work (truncates the big ip packet to series of ip
> packets, each with the relevant mtu size and increases the ip-id for
> each)
> The FW enables that work on the HW
> the FW in A0 card doesn't enable that option for the HW.
Sounds like this bug causes a performance regression, and it sounds
like it puts incorrect packets on the wire.
This should be patched, have the driver disable TSO for cards that
can't support it...
Jason
^ permalink raw reply [flat|nested] 16+ messages in thread
* AW: AW: AW: AW: IPoIB GRO
[not found] ` <20131105172431.GA14706-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2013-11-05 18:08 ` Markus Stockhausen
[not found] ` <12EF8D94C6F8734FB2FF37B9FBEDD173585A508B-Xnr6BND5kcg29+KCeZIpYi5l6jQMEky5@public.gmane.org>
0 siblings, 1 reply; 16+ messages in thread
From: Markus Stockhausen @ 2013-11-05 18:08 UTC (permalink / raw)
To: Jason Gunthorpe, Erez Shitrit
Cc: Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Wendy Cheng
[-- Attachment #1: Type: text/plain, Size: 742 bytes --]
> > so, to summarize:
> > The HW does the work (truncates the big ip packet to series of ip
> > packets, each with the relevant mtu size and increases the ip-id for
> > each)
> > The FW enables that work on the HW
> > the FW in A0 card doesn't enable that option for the HW.
>
> Sounds like this bug causes a performance regression, and it sounds
> like it puts incorrect packets on the wire.
>
> This should be patched, have the driver disable TSO for cards that
> can't support it...
>
> Jason
Incredible how a card that does not support TSO can bring big packets
on the wire that somehow get reassembled on the client side :) Maybe
a two liner in mlx4_ib_query_device() could prevent further discussions.
Markus
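
Editor's sketch: the kind of check proposed here could look roughly like the following userspace C model. Every name in it (`struct hw_caps`, `CAP_FW_TSO`, `FEATURE_UD_TSO`, `FEATURE_OTHER`, `usable_features`) is a hypothetical stand-in, not an actual mlx4 structure or flag. It only shows the idea of withholding the TSO feature when the firmware does not enable segmentation on the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CAP_FW_TSO     (1u << 0)  /* hypothetical: firmware enables HW segmentation */
#define FEATURE_UD_TSO (1u << 0)  /* hypothetical: TSO offered to the network stack */
#define FEATURE_OTHER  (1u << 1)  /* hypothetical: any unrelated feature bit */

/* Hypothetical capability record filled in from a firmware query. */
struct hw_caps {
    uint32_t fw_flags;
};

/*
 * Return the requested feature set with TSO removed when the firmware
 * does not report the segmentation capability. Other bits pass through.
 */
uint32_t usable_features(const struct hw_caps *caps, uint32_t requested)
{
    assert(caps != NULL);

    if (!(caps->fw_flags & CAP_FW_TSO))
        requested &= ~FEATURE_UD_TSO;

    return requested;
}
```

With such a gate the stack would segment in software on the affected cards, so each packet leaving the host would carry its own IP id.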
* Re: IPoIB GRO
[not found] ` <12EF8D94C6F8734FB2FF37B9FBEDD173585A508B-Xnr6BND5kcg29+KCeZIpYi5l6jQMEky5@public.gmane.org>
@ 2013-11-06 7:50 ` Or Gerlitz
[not found] ` <5279F4DB.8040202-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 16+ messages in thread
From: Or Gerlitz @ 2013-11-06 7:50 UTC (permalink / raw)
To: Markus Stockhausen, Jason Gunthorpe, Erez Shitrit
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Wendy Cheng
On 05/11/2013 20:08, Markus Stockhausen wrote:
> Incredible how a card that does not support TSO can bring big packets
> on the wire that somehow get reassembled on the client side
not sure to follow, you have shown they are **not** reassembled, correct?
* AW: IPoIB GRO
[not found] ` <5279F4DB.8040202-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-11-06 7:58 ` Markus Stockhausen
0 siblings, 0 replies; 16+ messages in thread
From: Markus Stockhausen @ 2013-11-06 7:58 UTC (permalink / raw)
To: Or Gerlitz, Jason Gunthorpe, Erez Shitrit
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Wendy Cheng
> From: Or Gerlitz [ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org]
> Sent: Wednesday, 6 November 2013 08:50
> To: Markus Stockhausen; Jason Gunthorpe; Erez Shitrit
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Wendy Cheng
> Subject: Re: IPoIB GRO
>
> On 05/11/2013 20:08, Markus Stockhausen wrote:
> > Incredible how a card that does not support TSO can bring big packets
> > on the wire that somehow get reassembled on the client side
> not sure to follow, you have shown they are **not** reassembled, correct?
Sorry for being imprecise. I meant that activating TSO
on that particular card seems to do nothing more than
create fragments. They are reassembled, but not in the
GRO path. From my naive point of view, that could have
caused far more problems than GRO not working
correctly.
Markus
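
Editor's sketch: why such packets miss the GRO path can be seen from the IP id term of the inet_gro_receive() comparison quoted in the first message of this thread. The userspace C model below reproduces only that one term. `struct held_pkt` and `gro_id_mismatch` are hypothetical names, and the ids are taken in host byte order, so the `ntohs()` conversion of the original is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the state GRO keeps about a held packet. */
struct held_pkt {
    uint16_t first_id;  /* IP id of the first packet, host byte order */
    uint16_t count;     /* number of packets merged so far */
};

/*
 * Model of the id term only: a new packet continues the flow when its
 * id equals first_id + count (modulo 2^16). A nonzero result means a
 * mismatch, which sets the flush flag in the real check.
 */
unsigned gro_id_mismatch(const struct held_pkt *p, uint16_t id)
{
    assert(p != NULL);

    return (uint16_t)(p->first_id + p->count) ^ id;
}
```

Fed with the values from the debug output in the first message, `{32933, 12}` against id 32945 gives zero (merge candidate), while `{35754, 1}` against id 35754 gives a nonzero result (flush).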
end of thread, other threads:[~2013-11-06 7:58 UTC | newest]
Thread overview: 16+ messages
2013-11-03 10:58 IPoIB GRO Markus Stockhausen
[not found] ` <12EF8D94C6F8734FB2FF37B9FBEDD173585A3E3B-Xnr6BND5kcg29+KCeZIpYi5l6jQMEky5@public.gmane.org>
2013-11-04 8:12 ` AW: " Markus Stockhausen
2013-11-04 8:24 ` Erez Shitrit
[not found] ` <527759C6.3070009-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2013-11-04 8:40 ` AW: " Markus Stockhausen
[not found] ` <12EF8D94C6F8734FB2FF37B9FBEDD173585A4301-Xnr6BND5kcg29+KCeZIpYi5l6jQMEky5@public.gmane.org>
2013-11-04 12:41 ` Erez Shitrit
[not found] ` <52779612.9020103-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2013-11-04 13:21 ` AW: " Markus Stockhausen
[not found] ` <12EF8D94C6F8734FB2FF37B9FBEDD173585A45CF-Xnr6BND5kcg29+KCeZIpYi5l6jQMEky5@public.gmane.org>
2013-11-04 21:17 ` Wendy Cheng
[not found] ` <CABgxfbEom7fjdshX5AaSXT3P_y=3xFwN9T3V+QXkB0bK-EfNjA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-11-05 8:14 ` Or Gerlitz
2013-11-05 8:07 ` AW: " Or Gerlitz
[not found] ` <5278A757.2070406-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-11-05 8:25 ` AW: " Markus Stockhausen
[not found] ` <12EF8D94C6F8734FB2FF37B9FBEDD173585A4B3D-Xnr6BND5kcg29+KCeZIpYi5l6jQMEky5@public.gmane.org>
2013-11-05 8:48 ` Erez Shitrit
[not found] ` <5278B0CA.9080305-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2013-11-05 17:24 ` Jason Gunthorpe
[not found] ` <20131105172431.GA14706-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-11-05 18:08 ` AW: " Markus Stockhausen
[not found] ` <12EF8D94C6F8734FB2FF37B9FBEDD173585A508B-Xnr6BND5kcg29+KCeZIpYi5l6jQMEky5@public.gmane.org>
2013-11-06 7:50 ` Or Gerlitz
[not found] ` <5279F4DB.8040202-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-11-06 7:58 ` AW: " Markus Stockhausen
2013-11-05 8:49 ` AW: AW: " Or Gerlitz