* BUG:af_packet fails to TX TSO frames @ 2017-10-11 8:39 Anton Ivanov 2017-10-11 13:39 ` Willem de Bruijn 0 siblings, 1 reply; 27+ messages in thread From: Anton Ivanov @ 2017-10-11 8:39 UTC (permalink / raw) To: netdev; +Cc: davem Hi all, I am having an issue with af_packet.c It fails to transmit any TSO frame submitted via raw socket + vnet headers. An identical frame is considered valid for tap. The frames are generated out of legit linux skbufs (in UML) and vnet headers work for checksumming on raw, so I should have the raw initialization right. The header is supposedly parsed correctly and the newly formed skbuf is sent to the device transmit routine (or enqueued) . I have debugged it as far as it reaching the following line in packet_snd() (line 2592 in 4.13): err = po->xmit(skb); This returns NET_XMIT_DROP for any TSO capable device I tested. They dislike the frame. Same frame is accepted by tap. I have went through the header parsing and skb allocation code in both af_packet and tap several times and I do not see any material difference (except the new zerocopy stuff). So, frankly, I am stuck. Can someone help me to debug this. I do not see an easy way to debug it, but this is not a part of the kernel I am familiar with. Is there a suitable helper function to try to segment the frame and see exactly what is wrong with it? Cc-ing DaveM as this has no specific maintainer so it falls under his umbrella remit. -- Anton R. Ivanov Cambridge Greys Limited, England and Wales company No 10273661 http://www.cambridgegreys.com/ ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: BUG:af_packet fails to TX TSO frames 2017-10-11 8:39 BUG:af_packet fails to TX TSO frames Anton Ivanov @ 2017-10-11 13:39 ` Willem de Bruijn 2017-10-11 13:50 ` Anton Ivanov 2017-10-13 7:25 ` Not BUG, feature :) af_packet " Anton Ivanov 0 siblings, 2 replies; 27+ messages in thread From: Willem de Bruijn @ 2017-10-11 13:39 UTC (permalink / raw) To: Anton Ivanov; +Cc: Network Development, David Miller On Wed, Oct 11, 2017 at 4:39 AM, Anton Ivanov <anton.ivanov@cambridgegreys.com> wrote: > Hi all, > > I am having an issue with af_packet.c > > It fails to transmit any TSO frame submitted via raw socket + vnet headers. > An identical frame is considered valid for tap. This may be due to validation. As of commit 104ba78c9880 ("packet: on direct_xmit, limit tso and csum to supported devices") the packet socket code validates TSO packets and HW support in the direct_xmit path. Do you have a test program or packet (incl. vnet hdr) to reproduce this with? I usually test this path with https://github.com/wdebruij/kerneltools/blob/master/tests/psock_txring_vnet.c > The frames are generated out of legit linux skbufs (in UML) and vnet headers > work for checksumming on raw, so I should have the raw initialization right. > > The header is supposedly parsed correctly and the newly formed skbuf is sent > to the device transmit routine (or enqueued) . I have debugged it as far as > it reaching the following line in packet_snd() (line 2592 in 4.13): > > err = po->xmit(skb); That maps either on to packet_direct_xmit or dev_queue_xmit. > > This returns NET_XMIT_DROP for any TSO capable device I tested. You can also try perf record -a -g -e skb:kfree_skb sleep 10 perf report to see where these packets are dropped. > They dislike > the frame. Same frame is accepted by tap. I have went through the header > parsing and skb allocation code in both af_packet and tap several times and > I do not see any material difference (except the new zerocopy stuff). So, > frankly, I am stuck. > > Can someone help me to debug this. I do not see an easy way to debug it, but > this is not a part of the kernel I am familiar with. Is there a suitable > helper function to try to segment the frame and see exactly what is wrong > with it? > > Cc-ing DaveM as this has no specific maintainer so it falls under his > umbrella remit. > > -- > Anton R. Ivanov > > Cambridge Greys Limited, England and Wales company No 10273661 > http://www.cambridgegreys.com/ > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: BUG:af_packet fails to TX TSO frames 2017-10-11 13:39 ` Willem de Bruijn @ 2017-10-11 13:50 ` Anton Ivanov 2017-10-11 15:54 ` Anton Ivanov 2017-10-13 7:25 ` Not BUG, feature :) af_packet " Anton Ivanov 1 sibling, 1 reply; 27+ messages in thread From: Anton Ivanov @ 2017-10-11 13:50 UTC (permalink / raw) To: Willem de Bruijn; +Cc: Network Development, David Miller On 10/11/17 14:39, Willem de Bruijn wrote: > On Wed, Oct 11, 2017 at 4:39 AM, Anton Ivanov > <anton.ivanov@cambridgegreys.com> wrote: >> Hi all, >> >> I am having an issue with af_packet.c >> >> It fails to transmit any TSO frame submitted via raw socket + vnet headers. >> An identical frame is considered valid for tap. > This may be due to validation. As of commit 104ba78c9880 ("packet: on > direct_xmit, limit tso and csum to supported devices") the packet socket > code validates TSO packets and HW support in the direct_xmit path. I will look at it. I have tried with bridge+vEth (raw socket on the vEth) and directly on a tg3 and e1000e. All of these should be tso capable. CSUM definitely works in both cases, it is only TSO and only via raw socket on TX which is refusing to work. > > Do you have a test program or packet (incl. vnet hdr) to reproduce this > with? I usually test this path with My test program at present is UML instance with my vector IO patches. Latest version just went to the uml mailing list and can be pulled from http://foswiki.kot-begemot.co.uk/Main/EatYourOwnDogFoodOnUML I am going to dump one of the frames being produced and isolate it for a standalone test case. > > https://github.com/wdebruij/kerneltools/blob/master/tests/psock_txring_vnet.c > >> The frames are generated out of legit linux skbufs (in UML) and vnet headers >> work for checksumming on raw, so I should have the raw initialization right. >> >> The header is supposedly parsed correctly and the newly formed skbuf is sent >> to the device transmit routine (or enqueued) . I have debugged it as far as >> it reaching the following line in packet_snd() (line 2592 in 4.13): >> >> err = po->xmit(skb); > That maps either on to packet_direct_xmit or dev_queue_xmit. I know. In my case it is direct_xmit as I have asked for QDISC bypass. > >> This returns NET_XMIT_DROP for any TSO capable device I tested. > You can also try > > perf record -a -g -e skb:kfree_skb sleep 10 > perf report > > to see where these packets are dropped. Thanks, will try that. > >> They dislike >> the frame. Same frame is accepted by tap. I have went through the header >> parsing and skb allocation code in both af_packet and tap several times and >> I do not see any material difference (except the new zerocopy stuff). So, >> frankly, I am stuck. >> >> Can someone help me to debug this. I do not see an easy way to debug it, but >> this is not a part of the kernel I am familiar with. Is there a suitable >> helper function to try to segment the frame and see exactly what is wrong >> with it? >> >> Cc-ing DaveM as this has no specific maintainer so it falls under his >> umbrella remit. >> >> -- >> Anton R. Ivanov >> >> Cambridge Greys Limited, England and Wales company No 10273661 >> http://www.cambridgegreys.com/ >> -- Anton R. Ivanov Cambridge Greys Limited, England and Wales company No 10273661 http://www.cambridgegreys.com/ ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: BUG:af_packet fails to TX TSO frames 2017-10-11 13:50 ` Anton Ivanov @ 2017-10-11 15:54 ` Anton Ivanov 2017-10-11 16:26 ` Willem de Bruijn 0 siblings, 1 reply; 27+ messages in thread From: Anton Ivanov @ 2017-10-11 15:54 UTC (permalink / raw) To: Willem de Bruijn; +Cc: Network Development, David Miller It is that patch. I rolled it back and immediately got it to work correctly on a Broadcom Tigon. I can test on all other scenarios, I have tried, I suspect it will come back alive on all of them. I am going to try to trace it through and see exactly where it drops a skb which the card has no issues in accepting. A. On 10/11/17 14:50, Anton Ivanov wrote: > > > On 10/11/17 14:39, Willem de Bruijn wrote: >> On Wed, Oct 11, 2017 at 4:39 AM, Anton Ivanov >> <anton.ivanov@cambridgegreys.com> wrote: >>> Hi all, >>> >>> I am having an issue with af_packet.c >>> >>> It fails to transmit any TSO frame submitted via raw socket + vnet >>> headers. >>> An identical frame is considered valid for tap. >> This may be due to validation. As of commit 104ba78c9880 ("packet: on >> direct_xmit, limit tso and csum to supported devices") the packet socket >> code validates TSO packets and HW support in the direct_xmit path. > > I will look at it. I have tried with bridge+vEth (raw socket on the > vEth) and directly on a tg3 and e1000e. All of these should be tso > capable. > > CSUM definitely works in both cases, it is only TSO and only via raw > socket on TX which is refusing to work. > >> >> Do you have a test program or packet (incl. vnet hdr) to reproduce this >> with? I usually test this path with > > My test program at present is UML instance with my vector IO patches. > Latest version just went to the uml mailing list and can be pulled from > > http://foswiki.kot-begemot.co.uk/Main/EatYourOwnDogFoodOnUML > > I am going to dump one of the frames being produced and isolate it for > a standalone test case. > >> >> https://github.com/wdebruij/kerneltools/blob/master/tests/psock_txring_vnet.c >> >> >>> The frames are generated out of legit linux skbufs (in UML) and vnet >>> headers >>> work for checksumming on raw, so I should have the raw >>> initialization right. >>> >>> The header is supposedly parsed correctly and the newly formed skbuf >>> is sent >>> to the device transmit routine (or enqueued) . I have debugged it as >>> far as >>> it reaching the following line in packet_snd() (line 2592 in 4.13): >>> >>> err = po->xmit(skb); >> That maps either on to packet_direct_xmit or dev_queue_xmit. > > I know. In my case it is direct_xmit as I have asked for QDISC bypass. > >> >>> This returns NET_XMIT_DROP for any TSO capable device I tested. >> You can also try >> >> perf record -a -g -e skb:kfree_skb sleep 10 >> perf report >> >> to see where these packets are dropped. > > Thanks, will try that. > >> >>> They dislike >>> the frame. Same frame is accepted by tap. I have went through the >>> header >>> parsing and skb allocation code in both af_packet and tap several >>> times and >>> I do not see any material difference (except the new zerocopy >>> stuff). So, >>> frankly, I am stuck. >>> >>> Can someone help me to debug this. I do not see an easy way to debug >>> it, but >>> this is not a part of the kernel I am familiar with. Is there a >>> suitable >>> helper function to try to segment the frame and see exactly what is >>> wrong >>> with it? >>> >>> Cc-ing DaveM as this has no specific maintainer so it falls under his >>> umbrella remit. >>> >>> -- >>> Anton R. Ivanov >>> >>> Cambridge Greys Limited, England and Wales company No 10273661 >>> http://www.cambridgegreys.com/ >>> > -- Anton R. Ivanov Cambridge Greys Limited, England and Wales company No 10273661 http://www.cambridgegreys.com/ ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: BUG:af_packet fails to TX TSO frames 2017-10-11 15:54 ` Anton Ivanov @ 2017-10-11 16:26 ` Willem de Bruijn 2017-10-11 16:32 ` Anton Ivanov 2017-10-11 18:39 ` Anton Ivanov 0 siblings, 2 replies; 27+ messages in thread From: Willem de Bruijn @ 2017-10-11 16:26 UTC (permalink / raw) To: Anton Ivanov; +Cc: Network Development, David Miller On Wed, Oct 11, 2017 at 11:54 AM, Anton Ivanov <anton.ivanov@cambridgegreys.com> wrote: > It is that patch. > > I rolled it back and immediately got it to work correctly on a Broadcom > Tigon. I can test on all other scenarios, I have tried, I suspect it will > come back alive on all of them. > > I am going to try to trace it through and see exactly where it drops a skb > which the card has no issues in accepting. It might be in the initialization of gso_type and csum. The virtio_net_hdr can encode various combinations of flags that are not allowed by the validation logic. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: BUG:af_packet fails to TX TSO frames 2017-10-11 16:26 ` Willem de Bruijn @ 2017-10-11 16:32 ` Anton Ivanov 2017-10-11 18:39 ` Anton Ivanov 1 sibling, 0 replies; 27+ messages in thread From: Anton Ivanov @ 2017-10-11 16:32 UTC (permalink / raw) To: Willem de Bruijn; +Cc: Network Development, David Miller Working through it at the moment. The validation logic is prohibiting what the hardware considers to be a perfectly legit skb. Once I narrow down the culprit I will come back with my findings. Thanks for pointing me where to look by the way. A. On 10/11/17 17:26, Willem de Bruijn wrote: > On Wed, Oct 11, 2017 at 11:54 AM, Anton Ivanov > <anton.ivanov@cambridgegreys.com> wrote: >> It is that patch. >> >> I rolled it back and immediately got it to work correctly on a Broadcom >> Tigon. I can test on all other scenarios, I have tried, I suspect it will >> come back alive on all of them. >> >> I am going to try to trace it through and see exactly where it drops a skb >> which the card has no issues in accepting. > It might be in the initialization of gso_type and csum. The virtio_net_hdr > can encode various combinations of flags that are not allowed by the > validation logic. > -- Anton R. Ivanov Cambridge Greys Limited, England and Wales company No 10273661 http://www.cambridgegreys.com/ ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: BUG:af_packet fails to TX TSO frames 2017-10-11 16:26 ` Willem de Bruijn 2017-10-11 16:32 ` Anton Ivanov @ 2017-10-11 18:39 ` Anton Ivanov 2017-10-11 18:57 ` Willem de Bruijn 1 sibling, 1 reply; 27+ messages in thread From: Anton Ivanov @ 2017-10-11 18:39 UTC (permalink / raw) To: Willem de Bruijn, Anton Ivanov; +Cc: Network Development, David Miller The check as now insists that the actual driver supports GSO_ROBUST, because we have marked the skb dodgy. The specific bit which does this check is in net_gso_ok() Now, lets's see how many Ethernet drivers set GSO_ROBUST. find drivers/net/ethernet -type f -name "*.[c,h]" -exec grep -H GSO_ROBUST {} \; That returns nothing in 4.x IMHO - af_packet allocates the skb, does all checks (and extra may be added) on the gso, why is this set dodgy in the first place? A. On 11/10/17 17:26, Willem de Bruijn wrote: > On Wed, Oct 11, 2017 at 11:54 AM, Anton Ivanov > <anton.ivanov@cambridgegreys.com> wrote: >> It is that patch. >> >> I rolled it back and immediately got it to work correctly on a Broadcom >> Tigon. I can test on all other scenarios, I have tried, I suspect it will >> come back alive on all of them. >> >> I am going to try to trace it through and see exactly where it drops a skb >> which the card has no issues in accepting. > It might be in the initialization of gso_type and csum. The virtio_net_hdr > can encode various combinations of flags that are not allowed by the > validation logic. > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: BUG:af_packet fails to TX TSO frames 2017-10-11 18:39 ` Anton Ivanov @ 2017-10-11 18:57 ` Willem de Bruijn 2017-10-11 19:39 ` Anton Ivanov 0 siblings, 1 reply; 27+ messages in thread From: Willem de Bruijn @ 2017-10-11 18:57 UTC (permalink / raw) To: Anton Ivanov; +Cc: Anton Ivanov, Network Development, David Miller On Wed, Oct 11, 2017 at 2:39 PM, Anton Ivanov <anton.ivanov@kot-begemot.co.uk> wrote: > The check as now insists that the actual driver supports GSO_ROBUST, because > we have marked the skb dodgy. > > The specific bit which does this check is in net_gso_ok() > > Now, lets's see how many Ethernet drivers set GSO_ROBUST. > > find drivers/net/ethernet -type f -name "*.[c,h]" -exec grep -H GSO_ROBUST > {} \; > > That returns nothing in 4.x > > IMHO - af_packet allocates the skb, does all checks (and extra may be added) > on the gso, why is this set dodgy in the first place? It is set when the header has to be validated. The segmentation logic will validate and fixup gso_segs. See for instance tcp_gso_segment: if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) { /* Packet is from an untrusted source, reset gso_segs. */ skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(skb->len, mss); segs = NULL; goto out; } If the device would have the robust bit set and otherwise supports the required features, fix up gso_segs and pass the large packet to the device. Else it continues to the software gso path. Large packets generated with psock_txring_vnet.c pass this test. I suspect that there is a subtle difference in the virtio_net_hdr fields that that generates vs. your program. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: BUG:af_packet fails to TX TSO frames 2017-10-11 18:57 ` Willem de Bruijn @ 2017-10-11 19:39 ` Anton Ivanov 2017-10-11 21:02 ` [uml-devel] Fwd: " Anton Ivanov 2017-10-11 21:05 ` Willem de Bruijn 0 siblings, 2 replies; 27+ messages in thread From: Anton Ivanov @ 2017-10-11 19:39 UTC (permalink / raw) To: Willem de Bruijn; +Cc: Anton Ivanov, Network Development, David Miller On 11/10/17 19:57, Willem de Bruijn wrote: > On Wed, Oct 11, 2017 at 2:39 PM, Anton Ivanov > <anton.ivanov@kot-begemot.co.uk> wrote: >> The check as now insists that the actual driver supports GSO_ROBUST, because >> we have marked the skb dodgy. >> >> The specific bit which does this check is in net_gso_ok() >> >> Now, lets's see how many Ethernet drivers set GSO_ROBUST. >> >> find drivers/net/ethernet -type f -name "*.[c,h]" -exec grep -H GSO_ROBUST >> {} \; >> >> That returns nothing in 4.x >> >> IMHO - af_packet allocates the skb, does all checks (and extra may be added) >> on the gso, why is this set dodgy in the first place? > It is set when the header has to be validated. > > The segmentation logic will validate and fixup gso_segs. See for > instance tcp_gso_segment: > > if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) { > /* Packet is from an untrusted source, reset gso_segs. */ > > skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(skb->len, mss); > > segs = NULL; > goto out; > } > > If the device would have the robust bit set and otherwise supports the > required features, fix up gso_segs and pass the large packet to the > device. > > Else it continues to the software gso path. > > Large packets generated with psock_txring_vnet.c pass this test. I That test is indeed a different path - this goes via the tpacket_snd which allocs via sock_alloc_send_skb. That results in a non-fragged skb as it calls pskb after that with data_len = 0 asking for a contiguous one. My stuff is using sendmmsg which ends up via packet_snd which allocs via sock_alloc_send_pskb which is invoked in a way which always creates 2 segments - one for the linear section and one for the rest (and more if needed). It is faster than tpacket by the way (several times). As a comparison tap and other virtual drivers use sock_alloc_send_pskb with non-zero data length which results in multiple frags. The code in packet_snd is in fact identical with tap (+/- some cosmetic differences). That is the difference between the tests and that is why your test works and mine fails. Now, alloc-ing a 64k contiguous skb every time IMHO is wrong. So the logic in the xmit check at present works only because it is given only a very corner case for a GSO frame and tested versus it. It should work with the generic case which is what comes out of sock_alloc_send_pskb (same as in tap). A. > suspect that there is a subtle difference in the virtio_net_hdr fields > that that generates vs. your program. > ^ permalink raw reply [flat|nested] 27+ messages in thread
* [uml-devel] Fwd: Re: BUG:af_packet fails to TX TSO frames 2017-10-11 19:39 ` Anton Ivanov @ 2017-10-11 21:02 ` Anton Ivanov 2017-10-11 21:05 ` Willem de Bruijn 1 sibling, 0 replies; 27+ messages in thread From: Anton Ivanov @ 2017-10-11 21:02 UTC (permalink / raw) To: user-mode-linux-devel [-- Attachment #1.1: Type: text/plain, Size: 3374 bytes --] And that is the culprit of why GSO does not work with raw sockets. There can be no workaround unfortunately, this simply will need to be enabled in the driver once the kernel is fixed. It should work for older kernels prior to the fix in comit104ba78c9880 too. So from that perspective there is nothing that can be done to the vector io drivers - the code is correct, it is the host kernel side which is broken :( A. -------- Forwarded Message -------- Subject: Re: BUG:af_packet fails to TX TSO frames Date: Wed, 11 Oct 2017 20:39:57 +0100 From: Anton Ivanov <anton.ivanov@kot-begemot.co.uk> To: Willem de Bruijn <willemdebruijn.kernel@gmail.com> CC: Anton Ivanov <anton.ivanov@cambridgegreys.com>, Network Development <netdev@vger.kernel.org>, David Miller <davem@davemloft.net> On 11/10/17 19:57, Willem de Bruijn wrote: > On Wed, Oct 11, 2017 at 2:39 PM, Anton Ivanov > <anton.ivanov@kot-begemot.co.uk> wrote: >> The check as now insists that the actual driver supports GSO_ROBUST, because >> we have marked the skb dodgy. >> >> The specific bit which does this check is in net_gso_ok() >> >> Now, lets's see how many Ethernet drivers set GSO_ROBUST. >> >> find drivers/net/ethernet -type f -name "*.[c,h]" -exec grep -H GSO_ROBUST >> {} \; >> >> That returns nothing in 4.x >> >> IMHO - af_packet allocates the skb, does all checks (and extra may be added) >> on the gso, why is this set dodgy in the first place? > It is set when the header has to be validated. > > The segmentation logic will validate and fixup gso_segs. See for > instance tcp_gso_segment: > > if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) { > /* Packet is from an untrusted source, reset gso_segs. */ > > skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(skb->len, mss); > > segs = NULL; > goto out; > } > > If the device would have the robust bit set and otherwise supports the > required features, fix up gso_segs and pass the large packet to the > device. > > Else it continues to the software gso path. > > Large packets generated with psock_txring_vnet.c pass this test. I That test is indeed a different path - this goes via the tpacket_snd which allocs via sock_alloc_send_skb. That results in a non-fragged skb as it calls pskb after that with data_len = 0 asking for a contiguous one. My stuff is using sendmmsg which ends up via packet_snd which allocs via sock_alloc_send_pskb which is invoked in a way which always creates 2 segments - one for the linear section and one for the rest (and more if needed). It is faster than tpacket by the way (several times). As a comparison tap and other virtual drivers use sock_alloc_send_pskb with non-zero data length which results in multiple frags. The code in packet_snd is in fact identical with tap (+/- some cosmetic differences). That is the difference between the tests and that is why your test works and mine fails. Now, alloc-ing a 64k contiguous skb every time IMHO is wrong. So the logic in the xmit check at present works only because it is given only a very corner case for a GSO frame and tested versus it. It should work with the generic case which is what comes out of sock_alloc_send_pskb (same as in tap). A. > suspect that there is a subtle difference in the virtio_net_hdr fields > that that generates vs. your program. > [-- Attachment #1.2: Type: text/html, Size: 5188 bytes --] [-- Attachment #2: Type: text/plain, Size: 202 bytes --] ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot [-- Attachment #3: Type: text/plain, Size: 194 bytes --] _______________________________________________ User-mode-linux-devel mailing list User-mode-linux-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: BUG:af_packet fails to TX TSO frames 2017-10-11 19:39 ` Anton Ivanov 2017-10-11 21:02 ` [uml-devel] Fwd: " Anton Ivanov @ 2017-10-11 21:05 ` Willem de Bruijn 2017-10-11 21:55 ` Anton Ivanov 1 sibling, 1 reply; 27+ messages in thread From: Willem de Bruijn @ 2017-10-11 21:05 UTC (permalink / raw) To: Anton Ivanov; +Cc: Anton Ivanov, Network Development, David Miller On Wed, Oct 11, 2017 at 3:39 PM, Anton Ivanov <anton.ivanov@kot-begemot.co.uk> wrote: > On 11/10/17 19:57, Willem de Bruijn wrote: >> On Wed, Oct 11, 2017 at 2:39 PM, Anton Ivanov >> <anton.ivanov@kot-begemot.co.uk> wrote: >>> The check as now insists that the actual driver supports GSO_ROBUST, because >>> we have marked the skb dodgy. >>> >>> The specific bit which does this check is in net_gso_ok() >>> >>> Now, lets's see how many Ethernet drivers set GSO_ROBUST. >>> >>> find drivers/net/ethernet -type f -name "*.[c,h]" -exec grep -H GSO_ROBUST >>> {} \; >>> >>> That returns nothing in 4.x >>> >>> IMHO - af_packet allocates the skb, does all checks (and extra may be added) >>> on the gso, why is this set dodgy in the first place? >> It is set when the header has to be validated. >> >> The segmentation logic will validate and fixup gso_segs. See for >> instance tcp_gso_segment: >> >> if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) { >> /* Packet is from an untrusted source, reset gso_segs. */ >> >> skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(skb->len, mss); >> >> segs = NULL; >> goto out; >> } >> >> If the device would have the robust bit set and otherwise supports the >> required features, fix up gso_segs and pass the large packet to the >> device. >> >> Else it continues to the software gso path. >> >> Large packets generated with psock_txring_vnet.c pass this test. I > > That test is indeed a different path The test can be run both with and without ring: psock_txring_vnet -l 8000 -s $src_ip -d $dst_ip -v psock_txring_vnet -l 8000 -s $src_ip -d $dst_ip -v -N both with and without qdisc bypass ('-q'). > - this goes via the tpacket_snd > which allocs via sock_alloc_send_skb. That results in a non-fragged skb > as it calls pskb after that with data_len = 0 asking for a contiguous one. but attached the ring slot as fragments in tpacket_fill_skb. > My stuff is using sendmmsg which ends up via packet_snd which allocs > via sock_alloc_send_pskb which is invoked in a way which always creates > 2 segments - one for the linear section and one for the rest (and more > if needed). It is faster than tpacket by the way (several times). > > As a comparison tap and other virtual drivers use sock_alloc_send_pskb > with non-zero data length which results in multiple frags. The code in > packet_snd is in fact identical with tap (+/- some cosmetic differences). > > That is the difference between the tests and that is why your test works > and mine fails. All the above test cases work for me, including those that build skbs with fragments. Could you try those. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: BUG:af_packet fails to TX TSO frames 2017-10-11 21:05 ` Willem de Bruijn @ 2017-10-11 21:55 ` Anton Ivanov 2017-10-11 22:01 ` Anton Ivanov 0 siblings, 1 reply; 27+ messages in thread From: Anton Ivanov @ 2017-10-11 21:55 UTC (permalink / raw) To: Willem de Bruijn; +Cc: Anton Ivanov, Network Development, David Miller [snip] > The test can be run both with and without ring: > > psock_txring_vnet -l 8000 -s $src_ip -d $dst_ip -v > psock_txring_vnet -l 8000 -s $src_ip -d $dst_ip -v -N > > both with and without qdisc bypass ('-q'). Thanks, apologies, I was being inpatient. Started reading the source, saw the tpacket bits and stopped there. > >> - this goes via the tpacket_snd >> which allocs via sock_alloc_send_skb. That results in a non-fragged skb >> as it calls pskb after that with data_len = 0 asking for a contiguous one. > but attached the ring slot as fragments in tpacket_fill_skb. > >> My stuff is using sendmmsg which ends up via packet_snd which allocs >> via sock_alloc_send_pskb which is invoked in a way which always creates >> 2 segments - one for the linear section and one for the rest (and more >> if needed). It is faster than tpacket by the way (several times). >> >> As a comparison tap and other virtual drivers use sock_alloc_send_pskb >> with non-zero data length which results in multiple frags. The code in >> packet_snd is in fact identical with tap (+/- some cosmetic differences). >> >> That is the difference between the tests and that is why your test works >> and mine fails. > All the above test cases work for me, including those that build skbs > with fragments. Could you try those. Tried it, works on all of the adapters and hosts where mine fails. I will step by step hack-in the differences so it behaves same as mine until I find the culprit. This will be tomorrow though, it is late here. The only obvious difference I can see at this point is that I am using iovs and sending the vnet header as iov[0] and the data in pieces after that while your code is doing a send() for the whole frame. This should not make any difference though - it all ends up as an iov internally in the kernel. A. > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: BUG:af_packet fails to TX TSO frames 2017-10-11 21:55 ` Anton Ivanov @ 2017-10-11 22:01 ` Anton Ivanov 2017-10-12 0:19 ` Willem de Bruijn 0 siblings, 1 reply; 27+ messages in thread From: Anton Ivanov @ 2017-10-11 22:01 UTC (permalink / raw) To: Anton Ivanov, Willem de Bruijn; +Cc: Network Development, David Miller [snip] > This will be tomorrow though, it is late here. > > The only obvious difference I can see at this point is that I am using > iovs and sending the vnet header as iov[0] and the data in pieces after > that while your code is doing a send() for the whole frame. This should > not make any difference though - it all ends up as an iov internally in > the kernel. Spoke too soon. It is not reporting any errors, but there is nothing coming out on the actual Ethernet. Leaving it till tomorrow. A. > > A. > > -- Anton R. Ivanov Cambridgegreys Limited. Registered in England. Company Number 10273661 ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: BUG:af_packet fails to TX TSO frames 2017-10-11 22:01 ` Anton Ivanov @ 2017-10-12 0:19 ` Willem de Bruijn 2017-10-12 6:11 ` Anton Ivanov 0 siblings, 1 reply; 27+ messages in thread From: Willem de Bruijn @ 2017-10-12 0:19 UTC (permalink / raw) To: Anton Ivanov; +Cc: Anton Ivanov, Network Development, David Miller On Wed, Oct 11, 2017 at 6:01 PM, Anton Ivanov <anton.ivanov@cambridgegreys.com> wrote: > [snip] > >> This will be tomorrow though, it is late here. >> >> The only obvious difference I can see at this point is that I am using >> iovs and sending the vnet header as iov[0] and the data in pieces after >> that while your code is doing a send() for the whole frame. This should >> not make any difference though - it all ends up as an iov internally in >> the kernel. > > Spoke too soon. It is not reporting any errors, but there is nothing > coming out on the actual Ethernet. It works for me on various platforms. On the receiver, drop these fake tcp packets in iptables and read them with tcpdump iptables -A PREROUTING -t raw -p tcp --dport 9 -j DROP tcpdump src $src_ip Note that not all combinations of flags are supported by the kernel and that some flags have non-obvious behavior (disable a feature, in place of enable it). Specifically, mtu sized packets either must not pass a vnet_hdr or must pass one with gso explicitly disabled ('-G'). psock_txring_vnet -s $src_ip $dst_ip -l 1400 psock_txring_vnet -s $src_ip $dst_ip -l 1400 -v -G psock_txring_vnet -s $src_ip $dst_ip -l 1400 -N psock_txring_vnet -s $src_ip $dst_ip -l 1400 -N -v -G Conversely, packets that exceed mtu have to have the gso flags in the virtio_net_hdr: psock_txring_vnet -s $src_ip $dst_ip -l 4400 -v psock_txring_vnet -s $src_ip $dst_ip -l 4400 -N -v When sending a large packet, but not passing a virtio_net_hdr along ('-v'), the test fails with psock_txring_vnet: send: Message too long When passing a header along, but not disabling gso, the packet is indeed dropped silently. I verified correct segmentation with three modes of ethtool ethtool -K eth0 tso off gso off ethtool -K eth0 tso off gso on ethtool -K eth0 tso on gso on by reading tcpdump on the sender. The receive side results are the same with dev_queue_xmit and packet_direct_xmit ('-q') mode. With direct_xmit, the packets are not observed on the send side. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: BUG:af_packet fails to TX TSO frames 2017-10-12 0:19 ` Willem de Bruijn @ 2017-10-12 6:11 ` Anton Ivanov 2017-10-12 8:46 ` Anton Ivanov 0 siblings, 1 reply; 27+ messages in thread From: Anton Ivanov @ 2017-10-12 6:11 UTC (permalink / raw) To: Willem de Bruijn; +Cc: Anton Ivanov, Network Development, David Miller On 12/10/17 01:19, Willem de Bruijn wrote: > On Wed, Oct 11, 2017 at 6:01 PM, Anton Ivanov > <anton.ivanov@cambridgegreys.com> wrote: >> [snip] >> >>> This will be tomorrow though, it is late here. >>> >>> The only obvious difference I can see at this point is that I am using >>> iovs and sending the vnet header as iov[0] and the data in pieces after >>> that while your code is doing a send() for the whole frame. This should >>> not make any difference though - it all ends up as an iov internally in >>> the kernel. >> Spoke too soon. It is not reporting any errors, but there is nothing >> coming out on the actual Ethernet. Different issue - switch was blacklisting the fake dst MAC after the first packet as a suspected flood attack. It worked after I changed the mac in the source to a "real" one. In fact, an argument for that would be nice :) I will go through it step by step now and figure out exactly what and where is wrong with the framing on my side. Thanks for your help. A. > It works for me on various platforms. On the receiver, drop these fake > tcp packets in iptables and read them with tcpdump > > iptables -A PREROUTING -t raw -p tcp --dport 9 -j DROP > tcpdump src $src_ip > > Note that not all combinations of flags are supported by the kernel > and that some flags have non-obvious behavior (disable a feature, in > place of enable it). > > Specifically, mtu sized packets either must not pass a vnet_hdr or > must pass one with gso explicitly disabled ('-G'). > > psock_txring_vnet -s $src_ip $dst_ip -l 1400 > psock_txring_vnet -s $src_ip $dst_ip -l 1400 -v -G > psock_txring_vnet -s $src_ip $dst_ip -l 1400 -N > psock_txring_vnet -s $src_ip $dst_ip -l 1400 -N -v -G > > Conversely, packets that exceed mtu have to have the gso flags in the > virtio_net_hdr: > > psock_txring_vnet -s $src_ip $dst_ip -l 4400 -v > psock_txring_vnet -s $src_ip $dst_ip -l 4400 -N -v > > When sending a large packet, but not passing a virtio_net_hdr along > ('-v'), the test fails with > > psock_txring_vnet: send: Message too long > > When passing a header along, but not disabling gso, the packet is > indeed dropped silently. > > I verified correct segmentation with three modes of ethtool > > ethtool -K eth0 tso off gso off > ethtool -K eth0 tso off gso on > ethtool -K eth0 tso on gso on > > by reading tcpdump on the sender. > > The receive side results are the same with dev_queue_xmit and > packet_direct_xmit ('-q') mode. With direct_xmit, the packets are not > observed on the send side. > -- Anton R. Ivanov Cambridgegreys Limited. Registered in England. Company Number 10273661 ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: BUG:af_packet fails to TX TSO frames 2017-10-12 6:11 ` Anton Ivanov @ 2017-10-12 8:46 ` Anton Ivanov 2017-10-12 13:39 ` Willem de Bruijn 0 siblings, 1 reply; 27+ messages in thread From: Anton Ivanov @ 2017-10-12 8:46 UTC (permalink / raw) To: Anton Ivanov, Willem de Bruijn; +Cc: Network Development, David Miller Hi Willem, Thanks for all your help so far. I modified your test program to do sendmmsg as an option in addition to send. I will send you a github pull req with the changes later today. This allows to send let's say n frames (let's say 64) in one syscall - probably the highest TX rate you can get out of Linux userspace. The test case works via that route too which is now getting into the realm of the surreal. If I produce a real vnet frame out of a live kernel frame using virtio_net_hdr_from_skb() and try to send it it fails on the check in af_packet, while succeeding for tap. If I remove the af_packet check the frame is accepted by the hardware too. If I produce it a synthetic frame + vnet header using the test program - it works. Go figure. I am going to continue digging into it. At the very least I now have a positive test case which uses the same semantics as my code so I have something to compare to. A. On 10/12/17 07:11, Anton Ivanov wrote: > On 12/10/17 01:19, Willem de Bruijn wrote: >> On Wed, Oct 11, 2017 at 6:01 PM, Anton Ivanov >> <anton.ivanov@cambridgegreys.com> wrote: >>> [snip] >>> >>>> This will be tomorrow though, it is late here. >>>> >>>> The only obvious difference I can see at this point is that I am using >>>> iovs and sending the vnet header as iov[0] and the data in pieces after >>>> that while your code is doing a send() for the whole frame. This should >>>> not make any difference though - it all ends up as an iov internally in >>>> the kernel. >>> Spoke too soon. It is not reporting any errors, but there is nothing >>> coming out on the actual Ethernet. > Different issue - switch was blacklisting the fake dst MAC after the > first packet as a suspected flood attack. > > It worked after I changed the mac in the source to a "real" one. In > fact, an argument for that > would be nice :) > > I will go through it step by step now and figure out exactly what and > where is wrong with the framing on my side. > > Thanks for your help. > > A. > >> It works for me on various platforms. On the receiver, drop these fake >> tcp packets in iptables and read them with tcpdump >> >> iptables -A PREROUTING -t raw -p tcp --dport 9 -j DROP >> tcpdump src $src_ip >> >> Note that not all combinations of flags are supported by the kernel >> and that some flags have non-obvious behavior (disable a feature, in >> place of enable it). >> >> Specifically, mtu sized packets either must not pass a vnet_hdr or >> must pass one with gso explicitly disabled ('-G'). >> >> psock_txring_vnet -s $src_ip $dst_ip -l 1400 >> psock_txring_vnet -s $src_ip $dst_ip -l 1400 -v -G >> psock_txring_vnet -s $src_ip $dst_ip -l 1400 -N >> psock_txring_vnet -s $src_ip $dst_ip -l 1400 -N -v -G >> >> Conversely, packets that exceed mtu have to have the gso flags in the >> virtio_net_hdr: >> >> psock_txring_vnet -s $src_ip $dst_ip -l 4400 -v >> psock_txring_vnet -s $src_ip $dst_ip -l 4400 -N -v >> >> When sending a large packet, but not passing a virtio_net_hdr along >> ('-v'), the test fails with >> >> psock_txring_vnet: send: Message too long >> >> When passing a header along, but not disabling gso, the packet is >> indeed dropped silently. >> >> I verified correct segmentation with three modes of ethtool >> >> ethtool -K eth0 tso off gso off >> ethtool -K eth0 tso off gso on >> ethtool -K eth0 tso on gso on >> >> by reading tcpdump on the sender. >> >> The receive side results are the same with dev_queue_xmit and >> packet_direct_xmit ('-q') mode. With direct_xmit, the packets are not >> observed on the send side. >> > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: BUG:af_packet fails to TX TSO frames 2017-10-12 8:46 ` Anton Ivanov @ 2017-10-12 13:39 ` Willem de Bruijn 2017-10-12 14:12 ` Anton Ivanov 0 siblings, 1 reply; 27+ messages in thread From: Willem de Bruijn @ 2017-10-12 13:39 UTC (permalink / raw) To: Anton Ivanov; +Cc: Anton Ivanov, Network Development, David Miller > If I produce a real vnet frame out of a live kernel frame using > virtio_net_hdr_from_skb() and try to send it it fails on the check in > af_packet, while succeeding for tap. If I remove the af_packet check the > frame is accepted by the hardware too. > > If I produce it a synthetic frame + vnet header using the test program - it > works. Go figure. Besides looking at the raw frame bytes, also compare the setup of virtio_net_header, as well as the tcp checksum field. The stack expects the pseudo header to have already been calculated. > I am going to continue digging into it. > > At the very least I now have a positive test case which uses the same > semantics as my code so I have something to compare to. Glad to hear that the test is helpful. I wrote it because I have run into these exact same issues in the past. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: BUG:af_packet fails to TX TSO frames 2017-10-12 13:39 ` Willem de Bruijn @ 2017-10-12 14:12 ` Anton Ivanov 2017-10-12 15:44 ` Anton Ivanov 0 siblings, 1 reply; 27+ messages in thread From: Anton Ivanov @ 2017-10-12 14:12 UTC (permalink / raw) To: Willem de Bruijn; +Cc: Network Development On 10/12/17 14:39, Willem de Bruijn wrote: >> If I produce a real vnet frame out of a live kernel frame using >> virtio_net_hdr_from_skb() and try to send it it fails on the check in >> af_packet, while succeeding for tap. If I remove the af_packet check the >> frame is accepted by the hardware too. >> >> If I produce it a synthetic frame + vnet header using the test program - it >> works. Go figure. > Besides looking at the raw frame bytes, also compare the setup > of virtio_net_header, as well as the tcp checksum field. The stack > expects the pseudo header to have already been calculated. I am feeding it a skb which is coming up in the tx routine of a User Mode Linux device which is marked as NETIF_F_HW_CSUM and SG - that results in a skb with csum-ed headers, body set up for CSUM_PARTIAL and multiple fragments (always at least 1 more frag besides the TCP head). That has everything in order as expected by virtio_net_hdr_from_skb and this is what I use to generate the vnet header. It works correctly for csum and GRO with af_packet and it works correctly for everything using a tap device. It fails only on GSO + af_packet TX. What I am doing is the same thing virtio_net does - it just takes the output of virtio_net_hdr_from_skb and does nothing more. There should be no need to do anything more :( It should just work. Unless there is a gremlin somewhere in the machinery and that gremlin needs some light to be flushed out. > >> I am going to continue digging into it. >> >> At the very least I now have a positive test case which uses the same >> semantics as my code so I have something to compare to. > Glad to hear that the test is helpful. I wrote it because I > have run into these exact same issues in the past. It is. I have changes ready for it so it also supports vector IO, need to finish fighting with it. A. > -- Anton R. Ivanov Cambridge Greys Limited, England and Wales company No 10273661 http://www.cambridgegreys.com/ ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: BUG:af_packet fails to TX TSO frames 2017-10-12 14:12 ` Anton Ivanov @ 2017-10-12 15:44 ` Anton Ivanov 2017-10-12 15:57 ` [uml-devel] " Anton Ivanov 2017-10-12 16:30 ` Willem de Bruijn 0 siblings, 2 replies; 27+ messages in thread From: Anton Ivanov @ 2017-10-12 15:44 UTC (permalink / raw) To: Willem de Bruijn; +Cc: Network Development, David Miller Found it. Two bugs canceling each other. The bind sequence in: psock_txring_vnet.c is wrong. It does the following addr.sll_protocol = htons(ETH_P_IP); before calling bind. If you set addr.sll_protocol to ETH_P_ALL where it should have been in the first place the test program blows up with -ENOBUFS I think what is happening is that this value is taken into account when looking at "what should I use to segment it with" in skb_mac_gso_segment which is invoked at the end of the verification chain which starts in packet_direct_xmit in af_packet.c I have not tried the other test cases like setting it to ETH_P_IP and giving it IPv6 traffic or the opposite, but my guess is that these will fail too if they need GSO to be applied. A. On 10/12/17 15:12, Anton Ivanov wrote: > > > On 10/12/17 14:39, Willem de Bruijn wrote: >>> If I produce a real vnet frame out of a live kernel frame using >>> virtio_net_hdr_from_skb() and try to send it it fails on the check in >>> af_packet, while succeeding for tap. If I remove the af_packet check >>> the >>> frame is accepted by the hardware too. >>> >>> If I produce it a synthetic frame + vnet header using the test >>> program - it >>> works. Go figure. >> Besides looking at the raw frame bytes, also compare the setup >> of virtio_net_header, as well as the tcp checksum field. The stack >> expects the pseudo header to have already been calculated. > > I am feeding it a skb which is coming up in the tx routine of a User > Mode Linux device which is marked as NETIF_F_HW_CSUM and SG - that > results in a skb with csum-ed headers, body set up for CSUM_PARTIAL > and multiple fragments (always at least 1 more frag besides the TCP > head). > > That has everything in order as expected by virtio_net_hdr_from_skb > and this is what I use to generate the vnet header. It works correctly > for csum and GRO with af_packet and it works correctly for everything > using a tap device. It fails only on GSO + af_packet TX. > > What I am doing is the same thing virtio_net does - it just takes the > output of virtio_net_hdr_from_skb and does nothing more. There should > be no need to do anything more :( > > It should just work. > > Unless there is a gremlin somewhere in the machinery and that gremlin > needs some light to be flushed out. >> >>> I am going to continue digging into it. >>> >>> At the very least I now have a positive test case which uses the same >>> semantics as my code so I have something to compare to. >> Glad to hear that the test is helpful. I wrote it because I >> have run into these exact same issues in the past. > > It is. I have changes ready for it so it also supports vector IO, need > to finish fighting with it. > > A. > >> > -- Anton R. Ivanov Cambridge Greys Limited, England and Wales company No 10273661 http://www.cambridgegreys.com/ ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: BUG:af_packet fails to TX TSO frames 2017-10-12 15:44 ` Anton Ivanov @ 2017-10-12 15:57 ` Anton Ivanov 2017-10-12 16:30 ` Willem de Bruijn 1 sibling, 0 replies; 27+ messages in thread From: Anton Ivanov @ 2017-10-12 15:57 UTC (permalink / raw) To: Willem de Bruijn; +Cc: Network Development, David Miller, user-mode-linux-devel Also confirmed using my UML patchset. This is inside the UML guest. root@blinky:~# iperf -c 192.168.98.1 ------------------------------------------------------------ Client connecting to 192.168.98.1, TCP port 5001 TCP window size: 414 KByte (default) ------------------------------------------------------------ [ 3] local 192.168.98.146 port 36744 connected with 192.168.98.1 port 5001 [ ID] Interval Transfer Bandwidth [ 3] 0.0-10.0 sec 4.52 GBytes 3.89 Gbits/sec This is GSO on a raw socket Virtual NIC to host. Normal speed without GSO on same machine is ~ 500. So for the time being I can turn off TSO for anything but v4 in the User Mode Linux patchset (declare the guest NIC as TSOV4 capable only) and enable it once the host kernel is fixed. Alternatively, there is the rather ugly approach of using multiple FDs - one for v4, one for v6, one for... A. On 10/12/17 16:44, Anton Ivanov wrote: > Found it. > > Two bugs canceling each other. > The bind sequence in: psock_txring_vnet.c is wrong. > > It does the following addr.sll_protocol = htons(ETH_P_IP); > before calling bind. > > If you set addr.sll_protocol to ETH_P_ALL where it should have been in > the first place the test program blows up with -ENOBUFS > > I think what is happening is that this value is taken into account > when looking at "what should I use to segment it with" in > skb_mac_gso_segment which is invoked at the end of the verification > chain which starts in packet_direct_xmit in af_packet.c > > I have not tried the other test cases like setting it to ETH_P_IP and > giving it IPv6 traffic or the opposite, but my guess is that these > will fail too if they need GSO to be applied. > > A. > > On 10/12/17 15:12, Anton Ivanov wrote: >> >> >> On 10/12/17 14:39, Willem de Bruijn wrote: >>>> If I produce a real vnet frame out of a live kernel frame using >>>> virtio_net_hdr_from_skb() and try to send it it fails on the check in >>>> af_packet, while succeeding for tap. If I remove the af_packet >>>> check the >>>> frame is accepted by the hardware too. >>>> >>>> If I produce it a synthetic frame + vnet header using the test >>>> program - it >>>> works. Go figure. >>> Besides looking at the raw frame bytes, also compare the setup >>> of virtio_net_header, as well as the tcp checksum field. The stack >>> expects the pseudo header to have already been calculated. >> >> I am feeding it a skb which is coming up in the tx routine of a User >> Mode Linux device which is marked as NETIF_F_HW_CSUM and SG - that >> results in a skb with csum-ed headers, body set up for CSUM_PARTIAL >> and multiple fragments (always at least 1 more frag besides the TCP >> head). >> >> That has everything in order as expected by virtio_net_hdr_from_skb >> and this is what I use to generate the vnet header. It works >> correctly for csum and GRO with af_packet and it works correctly for >> everything using a tap device. It fails only on GSO + af_packet TX. >> >> What I am doing is the same thing virtio_net does - it just takes the >> output of virtio_net_hdr_from_skb and does nothing more. There should >> be no need to do anything more :( >> >> It should just work. >> >> Unless there is a gremlin somewhere in the machinery and that gremlin >> needs some light to be flushed out. >>> >>>> I am going to continue digging into it. >>>> >>>> At the very least I now have a positive test case which uses the same >>>> semantics as my code so I have something to compare to. >>> Glad to hear that the test is helpful. I wrote it because I >>> have run into these exact same issues in the past. >> >> It is. I have changes ready for it so it also supports vector IO, >> need to finish fighting with it. >> >> A. >> >>> >> > ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot _______________________________________________ User-mode-linux-devel mailing list User-mode-linux-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [uml-devel] BUG:af_packet fails to TX TSO frames @ 2017-10-12 15:57 ` Anton Ivanov 0 siblings, 0 replies; 27+ messages in thread From: Anton Ivanov @ 2017-10-12 15:57 UTC (permalink / raw) To: Willem de Bruijn; +Cc: Network Development, David Miller, user-mode-linux-devel Also confirmed using my UML patchset. This is inside the UML guest. root@blinky:~# iperf -c 192.168.98.1 ------------------------------------------------------------ Client connecting to 192.168.98.1, TCP port 5001 TCP window size: 414 KByte (default) ------------------------------------------------------------ [ 3] local 192.168.98.146 port 36744 connected with 192.168.98.1 port 5001 [ ID] Interval Transfer Bandwidth [ 3] 0.0-10.0 sec 4.52 GBytes 3.89 Gbits/sec This is GSO on a raw socket Virtual NIC to host. Normal speed without GSO on same machine is ~ 500. So for the time being I can turn off TSO for anything but v4 in the User Mode Linux patchset (declare the guest NIC as TSOV4 capable only) and enable it once the host kernel is fixed. Alternatively, there is the rather ugly approach of using multiple FDs - one for v4, one for v6, one for... A. On 10/12/17 16:44, Anton Ivanov wrote: > Found it. > > Two bugs canceling each other. > The bind sequence in: psock_txring_vnet.c is wrong. > > It does the following addr.sll_protocol = htons(ETH_P_IP); > before calling bind. > > If you set addr.sll_protocol to ETH_P_ALL where it should have been in > the first place the test program blows up with -ENOBUFS > > I think what is happening is that this value is taken into account > when looking at "what should I use to segment it with" in > skb_mac_gso_segment which is invoked at the end of the verification > chain which starts in packet_direct_xmit in af_packet.c > > I have not tried the other test cases like setting it to ETH_P_IP and > giving it IPv6 traffic or the opposite, but my guess is that these > will fail too if they need GSO to be applied. > > A. > > On 10/12/17 15:12, Anton Ivanov wrote: >> >> >> On 10/12/17 14:39, Willem de Bruijn wrote: >>>> If I produce a real vnet frame out of a live kernel frame using >>>> virtio_net_hdr_from_skb() and try to send it it fails on the check in >>>> af_packet, while succeeding for tap. If I remove the af_packet >>>> check the >>>> frame is accepted by the hardware too. >>>> >>>> If I produce it a synthetic frame + vnet header using the test >>>> program - it >>>> works. Go figure. >>> Besides looking at the raw frame bytes, also compare the setup >>> of virtio_net_header, as well as the tcp checksum field. The stack >>> expects the pseudo header to have already been calculated. >> >> I am feeding it a skb which is coming up in the tx routine of a User >> Mode Linux device which is marked as NETIF_F_HW_CSUM and SG - that >> results in a skb with csum-ed headers, body set up for CSUM_PARTIAL >> and multiple fragments (always at least 1 more frag besides the TCP >> head). >> >> That has everything in order as expected by virtio_net_hdr_from_skb >> and this is what I use to generate the vnet header. It works >> correctly for csum and GRO with af_packet and it works correctly for >> everything using a tap device. It fails only on GSO + af_packet TX. >> >> What I am doing is the same thing virtio_net does - it just takes the >> output of virtio_net_hdr_from_skb and does nothing more. There should >> be no need to do anything more :( >> >> It should just work. >> >> Unless there is a gremlin somewhere in the machinery and that gremlin >> needs some light to be flushed out. >>> >>>> I am going to continue digging into it. >>>> >>>> At the very least I now have a positive test case which uses the same >>>> semantics as my code so I have something to compare to. >>> Glad to hear that the test is helpful. I wrote it because I >>> have run into these exact same issues in the past. >> >> It is. I have changes ready for it so it also supports vector IO, >> need to finish fighting with it. >> >> A. >> >>> >> > ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot _______________________________________________ User-mode-linux-devel mailing list User-mode-linux-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: BUG:af_packet fails to TX TSO frames 2017-10-12 15:44 ` Anton Ivanov 2017-10-12 15:57 ` [uml-devel] " Anton Ivanov @ 2017-10-12 16:30 ` Willem de Bruijn 2017-10-12 17:25 ` Willem de Bruijn 1 sibling, 1 reply; 27+ messages in thread From: Willem de Bruijn @ 2017-10-12 16:30 UTC (permalink / raw) To: Anton Ivanov; +Cc: Network Development, David Miller On Thu, Oct 12, 2017 at 11:44 AM, Anton Ivanov <anton.ivanov@cambridgegreys.com> wrote: > Found it. > > Two bugs canceling each other. > The bind sequence in: psock_txring_vnet.c is wrong. > > It does the following addr.sll_protocol = htons(ETH_P_IP); > before calling bind. > > If you set addr.sll_protocol to ETH_P_ALL where it should have been in the > first place the test program blows up with -ENOBUFS There is no such requirement that the socket should bind to ETH_P_ALL. > I think what is happening is that this value is taken into account when > looking at "what should I use to segment it with" in skb_mac_gso_segment > which is invoked at the end of the verification chain which starts in > packet_direct_xmit in af_packet.c packet_snd sets skb->protocol based on the protocol that the packet socket is bound to. Binding to ETH_P_IP is the right choice here. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: BUG:af_packet fails to TX TSO frames 2017-10-12 16:30 ` Willem de Bruijn @ 2017-10-12 17:25 ` Willem de Bruijn 2017-10-12 17:58 ` Anton Ivanov 0 siblings, 1 reply; 27+ messages in thread From: Willem de Bruijn @ 2017-10-12 17:25 UTC (permalink / raw) To: Anton Ivanov; +Cc: Network Development, David Miller On Thu, Oct 12, 2017 at 12:30 PM, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > On Thu, Oct 12, 2017 at 11:44 AM, Anton Ivanov > <anton.ivanov@cambridgegreys.com> wrote: >> Found it. >> >> Two bugs canceling each other. >> The bind sequence in: psock_txring_vnet.c is wrong. >> >> It does the following addr.sll_protocol = htons(ETH_P_IP); >> before calling bind. >> >> If you set addr.sll_protocol to ETH_P_ALL where it should have been in the >> first place the test program blows up with -ENOBUFS > > There is no such requirement that the socket should bind to ETH_P_ALL. > >> I think what is happening is that this value is taken into account when >> looking at "what should I use to segment it with" in skb_mac_gso_segment >> which is invoked at the end of the verification chain which starts in >> packet_direct_xmit in af_packet.c > > packet_snd sets skb->protocol based on the protocol that the packet > socket is bound to. Binding to ETH_P_IP is the right choice here. To avoid having to open multiple sockets for different protocols, sockaddr_ll can also be passed in the msg_name argument on each call. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: BUG:af_packet fails to TX TSO frames 2017-10-12 17:25 ` Willem de Bruijn @ 2017-10-12 17:58 ` Anton Ivanov 2017-10-12 18:41 ` Willem de Bruijn 0 siblings, 1 reply; 27+ messages in thread From: Anton Ivanov @ 2017-10-12 17:58 UTC (permalink / raw) To: Willem de Bruijn; +Cc: Network Development, David Miller On 10/12/17 18:25, Willem de Bruijn wrote: > On Thu, Oct 12, 2017 at 12:30 PM, Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: >> On Thu, Oct 12, 2017 at 11:44 AM, Anton Ivanov >> <anton.ivanov@cambridgegreys.com> wrote: >>> Found it. >>> >>> Two bugs canceling each other. >>> The bind sequence in: psock_txring_vnet.c is wrong. >>> >>> It does the following addr.sll_protocol = htons(ETH_P_IP); >>> before calling bind. >>> >>> If you set addr.sll_protocol to ETH_P_ALL where it should have been in the >>> first place the test program blows up with -ENOBUFS >> There is no such requirement that the socket should bind to ETH_P_ALL. There is no requirement to bind to ETH_P_IP either and most code examples going back more than 10 years to the days of TCP Illustrated use ALL. I just did the v6 test - if you bind with ETH_P_IP GSO on TCPv6 is broken and returns NOBUF and vice versa. >> >>> I think what is happening is that this value is taken into account when >>> looking at "what should I use to segment it with" in skb_mac_gso_segment >>> which is invoked at the end of the verification chain which starts in >>> packet_direct_xmit in af_packet.c >> packet_snd sets skb->protocol based on the protocol that the packet >> socket is bound to. Binding to ETH_P_IP is the right choice here. > To avoid having to open multiple sockets for different protocols, > sockaddr_ll can also be passed in the msg_name argument on > each call. Does not work for vnet headers - it honors what you bound with. I tried to bind with ETH_ALL and pass ETH_P_IP as an arg and it ENOBUF-ed > -- Anton R. Ivanov Cambridge Greys Limited, England and Wales company No 10273661 http://www.cambridgegreys.com/ ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: BUG:af_packet fails to TX TSO frames 2017-10-12 17:58 ` Anton Ivanov @ 2017-10-12 18:41 ` Willem de Bruijn 2017-10-12 19:55 ` Anton Ivanov 0 siblings, 1 reply; 27+ messages in thread From: Willem de Bruijn @ 2017-10-12 18:41 UTC (permalink / raw) To: Anton Ivanov; +Cc: Network Development, David Miller On Thu, Oct 12, 2017 at 1:58 PM, Anton Ivanov <anton.ivanov@cambridgegreys.com> wrote: > > > On 10/12/17 18:25, Willem de Bruijn wrote: >> >> On Thu, Oct 12, 2017 at 12:30 PM, Willem de Bruijn >> <willemdebruijn.kernel@gmail.com> wrote: >>> >>> On Thu, Oct 12, 2017 at 11:44 AM, Anton Ivanov >>> <anton.ivanov@cambridgegreys.com> wrote: >>>> >>>> Found it. >>>> >>>> Two bugs canceling each other. >>>> The bind sequence in: psock_txring_vnet.c is wrong. >>>> >>>> It does the following addr.sll_protocol = htons(ETH_P_IP); >>>> before calling bind. >>>> >>>> If you set addr.sll_protocol to ETH_P_ALL where it should have been in >>>> the >>>> first place the test program blows up with -ENOBUFS >>> >>> There is no such requirement that the socket should bind to ETH_P_ALL. > > > There is no requirement to bind to ETH_P_IP either and most code examples > going back more than 10 years to the days of TCP Illustrated use ALL. For send only sockets, it is often advised to pass 0 as protocol to socket(), so as to avoid having to spend cycles on packet reception at all. Commit c72219b75fde ("packet: infer protocol from ethernet header if unset") explicitly added logic to infer skb->protocol for this common case of sockets, if using rings. > I just did the v6 test - if you bind with ETH_P_IP GSO on TCPv6 is broken > and returns NOBUF and vice versa. Given that skb->protocol is set from proto, that is indeed not expected to work. >>>> I think what is happening is that this value is taken into account when >>>> looking at "what should I use to segment it with" in skb_mac_gso_segment >>>> which is invoked at the end of the verification chain which starts in >>>> packet_direct_xmit in af_packet.c >>> >>> packet_snd sets skb->protocol based on the protocol that the packet >>> socket is bound to. Binding to ETH_P_IP is the right choice here. >> >> To avoid having to open multiple sockets for different protocols, >> sockaddr_ll can also be passed in the msg_name argument on >> each call. > > > Does not work for vnet headers - it honors what you bound with. I tried to > bind with ETH_ALL and pass ETH_P_IP as an arg and it ENOBUF-ed Odd. The code for looking up proto in packet_snd looks fairly straightforward: /* * Get and verify the address. */ if (likely(saddr == NULL)) { dev = packet_cached_dev_get(po); proto = po->num; addr = NULL; } else { err = -EINVAL; if (msg->msg_namelen < sizeof(struct sockaddr_ll)) goto out; if (msg->msg_namelen < (saddr->sll_halen + offsetof(struct sockaddr_ll, sll_addr))) goto out; proto = saddr->sll_protocol; addr = saddr->sll_addr; dev = dev_get_by_index(sock_net(sk), saddr->sll_ifindex); } followed later by skb->protocol = proto; skb->dev = dev; ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: BUG:af_packet fails to TX TSO frames 2017-10-12 18:41 ` Willem de Bruijn @ 2017-10-12 19:55 ` Anton Ivanov 0 siblings, 0 replies; 27+ messages in thread From: Anton Ivanov @ 2017-10-12 19:55 UTC (permalink / raw) To: Willem de Bruijn; +Cc: Network Development, David Miller [snip] >> There is no requirement to bind to ETH_P_IP either and most code examples >> going back more than 10 years to the days of TCP Illustrated use ALL. > For send only sockets, it is often advised to pass 0 as protocol to > socket(), so as to avoid having to spend cycles on packet reception > at all. Normally, it would have been a single rw socket. Thankfully, I built into the UML vector IO patchset the ability to use different fds for read and write per virtual NIC. Coming back to "normally" - as an application developer I would have expected a rw socket to work and not to need 2+ sockets for this. Having to work around at that level is IMHO a bit over the top. > > Commit c72219b75fde ("packet: infer protocol from ethernet header > if unset") explicitly added logic to infer skb->protocol for this common > case of sockets, if using rings. I will look into it. > >> I just did the v6 test - if you bind with ETH_P_IP GSO on TCPv6 is broken >> and returns NOBUF and vice versa. > Given that skb->protocol is set from proto, that is indeed not expected to work. > >>>>> I think what is happening is that this value is taken into account when >>>>> looking at "what should I use to segment it with" in skb_mac_gso_segment >>>>> which is invoked at the end of the verification chain which starts in >>>>> packet_direct_xmit in af_packet.c >>>> packet_snd sets skb->protocol based on the protocol that the packet >>>> socket is bound to. Binding to ETH_P_IP is the right choice here. >>> To avoid having to open multiple sockets for different protocols, >>> sockaddr_ll can also be passed in the msg_name argument on >>> each call. >> >> Does not work for vnet headers - it honors what you bound with. I tried to >> bind with ETH_ALL and pass ETH_P_IP as an arg and it ENOBUF-ed > Odd. The code for looking up proto in packet_snd looks fairly straightforward: I will double-check it tomorrow and send you a pull request for the updated test application. > > /* > * Get and verify the address. > */ > > if (likely(saddr == NULL)) { > dev = packet_cached_dev_get(po); > proto = po->num; > addr = NULL; > } else { > err = -EINVAL; > if (msg->msg_namelen < sizeof(struct sockaddr_ll)) > goto out; > if (msg->msg_namelen < (saddr->sll_halen + > offsetof(struct sockaddr_ll, sll_addr))) > goto out; > proto = saddr->sll_protocol; > addr = saddr->sll_addr; > dev = dev_get_by_index(sock_net(sk), saddr->sll_ifindex); > } > > followed later by > > skb->protocol = proto; > skb->dev = dev; > -- Anton R. Ivanov Cambridgegreys Limited. Registered in England. Company Number 10273661 ^ permalink raw reply [flat|nested] 27+ messages in thread
* Not BUG, feature :) af_packet fails to TX TSO frames 2017-10-11 13:39 ` Willem de Bruijn 2017-10-11 13:50 ` Anton Ivanov @ 2017-10-13 7:25 ` Anton Ivanov 1 sibling, 0 replies; 27+ messages in thread From: Anton Ivanov @ 2017-10-13 7:25 UTC (permalink / raw) To: Willem de Bruijn; +Cc: Network Development Let's reset this one. I cannot say I am happy with the current number of caveats on using this. We should at least throw a small page similar to the one dedicated to tpacket_v2 into the kernel docs (or amend the offload docs). As I do not know all the corner cases well enough I find it difficult to volunteer on this one, it should be someone more familiar with the code. If they are documented, we can say that this is not a bug, it is a feature :) What I can volunteer on is contributing some code to the actual af_packet.c. When I did the digging in the source, I noticed that tap has an unfair advantage over raw by allowing the user to specify buffers from userspace. I am happy to port that, test it and contribute it. -- Anton R. Ivanov Cambridge Greys Limited, England and Wales company No 10273661 http://www.cambridgegreys.com/ ^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2017-10-13 7:25 UTC | newest] Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-10-11 8:39 BUG:af_packet fails to TX TSO frames Anton Ivanov 2017-10-11 13:39 ` Willem de Bruijn 2017-10-11 13:50 ` Anton Ivanov 2017-10-11 15:54 ` Anton Ivanov 2017-10-11 16:26 ` Willem de Bruijn 2017-10-11 16:32 ` Anton Ivanov 2017-10-11 18:39 ` Anton Ivanov 2017-10-11 18:57 ` Willem de Bruijn 2017-10-11 19:39 ` Anton Ivanov 2017-10-11 21:02 ` [uml-devel] Fwd: " Anton Ivanov 2017-10-11 21:05 ` Willem de Bruijn 2017-10-11 21:55 ` Anton Ivanov 2017-10-11 22:01 ` Anton Ivanov 2017-10-12 0:19 ` Willem de Bruijn 2017-10-12 6:11 ` Anton Ivanov 2017-10-12 8:46 ` Anton Ivanov 2017-10-12 13:39 ` Willem de Bruijn 2017-10-12 14:12 ` Anton Ivanov 2017-10-12 15:44 ` Anton Ivanov 2017-10-12 15:57 ` Anton Ivanov 2017-10-12 15:57 ` [uml-devel] " Anton Ivanov 2017-10-12 16:30 ` Willem de Bruijn 2017-10-12 17:25 ` Willem de Bruijn 2017-10-12 17:58 ` Anton Ivanov 2017-10-12 18:41 ` Willem de Bruijn 2017-10-12 19:55 ` Anton Ivanov 2017-10-13 7:25 ` Not BUG, feature :) af_packet " Anton Ivanov
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.