All of lore.kernel.org
 help / color / mirror / Atom feed
* BUG:af_packet fails to TX TSO frames
@ 2017-10-11  8:39 Anton Ivanov
  2017-10-11 13:39 ` Willem de Bruijn
  0 siblings, 1 reply; 27+ messages in thread
From: Anton Ivanov @ 2017-10-11  8:39 UTC (permalink / raw)
  To: netdev; +Cc: davem

Hi all,

I am having an issue with af_packet.c

It fails to transmit any TSO frame submitted via raw socket + vnet 
headers. An identical frame is considered valid for tap.

The frames are generated out of legit linux skbufs (in UML) and vnet 
headers work for checksumming on raw, so I should have the raw 
initialization right.

The header is supposedly parsed correctly and the newly formed skbuf is 
sent to the device transmit routine (or enqueued) . I have debugged it 
as far as it reaching the following line in packet_snd() (line 2592 in 
4.13):

err = po->xmit(skb);

This returns NET_XMIT_DROP for any TSO capable device I tested. They 
dislike the frame. Same frame is accepted by tap. I have went through 
the header parsing and skb allocation code in both af_packet and tap 
several times and I do not see any material difference (except the new 
zerocopy stuff). So, frankly, I am stuck.

Can someone help me to debug this. I do not see an easy way to debug it, 
but this is not a part of the kernel I am familiar with. Is there a 
suitable helper function to try to segment the frame and see exactly 
what is wrong with it?

Cc-ing DaveM as this has no specific maintainer so it falls under his 
umbrella remit.

-- 
Anton R. Ivanov

Cambridge Greys Limited, England and Wales company No 10273661
http://www.cambridgegreys.com/	

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: BUG:af_packet fails to TX TSO frames
  2017-10-11  8:39 BUG:af_packet fails to TX TSO frames Anton Ivanov
@ 2017-10-11 13:39 ` Willem de Bruijn
  2017-10-11 13:50   ` Anton Ivanov
  2017-10-13  7:25   ` Not BUG, feature :) af_packet " Anton Ivanov
  0 siblings, 2 replies; 27+ messages in thread
From: Willem de Bruijn @ 2017-10-11 13:39 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: Network Development, David Miller

On Wed, Oct 11, 2017 at 4:39 AM, Anton Ivanov
<anton.ivanov@cambridgegreys.com> wrote:
> Hi all,
>
> I am having an issue with af_packet.c
>
> It fails to transmit any TSO frame submitted via raw socket + vnet headers.
> An identical frame is considered valid for tap.

This may be due to validation. As of commit 104ba78c9880 ("packet: on
direct_xmit, limit tso and csum to supported devices") the packet socket
code validates TSO packets and HW support in the direct_xmit path.

Do you have a test program or packet (incl. vnet hdr) to reproduce this
with? I usually test this path with

https://github.com/wdebruij/kerneltools/blob/master/tests/psock_txring_vnet.c

> The frames are generated out of legit linux skbufs (in UML) and vnet headers
> work for checksumming on raw, so I should have the raw initialization right.
>
> The header is supposedly parsed correctly and the newly formed skbuf is sent
> to the device transmit routine (or enqueued) . I have debugged it as far as
> it reaching the following line in packet_snd() (line 2592 in 4.13):
>
> err = po->xmit(skb);

That maps either on to packet_direct_xmit or dev_queue_xmit.

>
> This returns NET_XMIT_DROP for any TSO capable device I tested.

You can also try

  perf record -a -g -e skb:kfree_skb sleep 10
  perf report

to see where these packets are dropped.

> They dislike
> the frame. Same frame is accepted by tap. I have went through the header
> parsing and skb allocation code in both af_packet and tap several times and
> I do not see any material difference (except the new zerocopy stuff). So,
> frankly, I am stuck.
>
> Can someone help me to debug this. I do not see an easy way to debug it, but
> this is not a part of the kernel I am familiar with. Is there a suitable
> helper function to try to segment the frame and see exactly what is wrong
> with it?
>
> Cc-ing DaveM as this has no specific maintainer so it falls under his
> umbrella remit.
>
> --
> Anton R. Ivanov
>
> Cambridge Greys Limited, England and Wales company No 10273661
> http://www.cambridgegreys.com/
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: BUG:af_packet fails to TX TSO frames
  2017-10-11 13:39 ` Willem de Bruijn
@ 2017-10-11 13:50   ` Anton Ivanov
  2017-10-11 15:54     ` Anton Ivanov
  2017-10-13  7:25   ` Not BUG, feature :) af_packet " Anton Ivanov
  1 sibling, 1 reply; 27+ messages in thread
From: Anton Ivanov @ 2017-10-11 13:50 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, David Miller



On 10/11/17 14:39, Willem de Bruijn wrote:
> On Wed, Oct 11, 2017 at 4:39 AM, Anton Ivanov
> <anton.ivanov@cambridgegreys.com> wrote:
>> Hi all,
>>
>> I am having an issue with af_packet.c
>>
>> It fails to transmit any TSO frame submitted via raw socket + vnet headers.
>> An identical frame is considered valid for tap.
> This may be due to validation. As of commit 104ba78c9880 ("packet: on
> direct_xmit, limit tso and csum to supported devices") the packet socket
> code validates TSO packets and HW support in the direct_xmit path.

I will look at it. I have tried with bridge+vEth (raw socket on the 
vEth) and directly on a tg3 and e1000e. All of these should be tso capable.

CSUM definitely works in both cases, it is only TSO and only via raw 
socket on TX which is refusing to work.

>
> Do you have a test program or packet (incl. vnet hdr) to reproduce this
> with? I usually test this path with

My test program at present is UML instance with my vector IO patches. 
Latest version just went to the uml mailing list and can be pulled from

http://foswiki.kot-begemot.co.uk/Main/EatYourOwnDogFoodOnUML

I am going to dump one of the frames being produced and isolate it for a 
standalone test case.

>
> https://github.com/wdebruij/kerneltools/blob/master/tests/psock_txring_vnet.c
>
>> The frames are generated out of legit linux skbufs (in UML) and vnet headers
>> work for checksumming on raw, so I should have the raw initialization right.
>>
>> The header is supposedly parsed correctly and the newly formed skbuf is sent
>> to the device transmit routine (or enqueued) . I have debugged it as far as
>> it reaching the following line in packet_snd() (line 2592 in 4.13):
>>
>> err = po->xmit(skb);
> That maps either on to packet_direct_xmit or dev_queue_xmit.

I know. In my case it is direct_xmit as I have asked for QDISC bypass.

>
>> This returns NET_XMIT_DROP for any TSO capable device I tested.
> You can also try
>
>    perf record -a -g -e skb:kfree_skb sleep 10
>    perf report
>
> to see where these packets are dropped.

Thanks, will try that.

>
>> They dislike
>> the frame. Same frame is accepted by tap. I have went through the header
>> parsing and skb allocation code in both af_packet and tap several times and
>> I do not see any material difference (except the new zerocopy stuff). So,
>> frankly, I am stuck.
>>
>> Can someone help me to debug this. I do not see an easy way to debug it, but
>> this is not a part of the kernel I am familiar with. Is there a suitable
>> helper function to try to segment the frame and see exactly what is wrong
>> with it?
>>
>> Cc-ing DaveM as this has no specific maintainer so it falls under his
>> umbrella remit.
>>
>> --
>> Anton R. Ivanov
>>
>> Cambridge Greys Limited, England and Wales company No 10273661
>> http://www.cambridgegreys.com/
>>

-- 
Anton R. Ivanov

Cambridge Greys Limited, England and Wales company No 10273661
http://www.cambridgegreys.com/

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: BUG:af_packet fails to TX TSO frames
  2017-10-11 13:50   ` Anton Ivanov
@ 2017-10-11 15:54     ` Anton Ivanov
  2017-10-11 16:26       ` Willem de Bruijn
  0 siblings, 1 reply; 27+ messages in thread
From: Anton Ivanov @ 2017-10-11 15:54 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, David Miller

It is that patch.

I rolled it back and immediately got it to work correctly on a Broadcom 
Tigon. I can test on all other scenarios, I have tried, I suspect it 
will come back alive on all of them.

I am going to try to trace it through and see exactly where it drops a 
skb which the card has no issues in accepting.

A.

On 10/11/17 14:50, Anton Ivanov wrote:
>
>
> On 10/11/17 14:39, Willem de Bruijn wrote:
>> On Wed, Oct 11, 2017 at 4:39 AM, Anton Ivanov
>> <anton.ivanov@cambridgegreys.com> wrote:
>>> Hi all,
>>>
>>> I am having an issue with af_packet.c
>>>
>>> It fails to transmit any TSO frame submitted via raw socket + vnet 
>>> headers.
>>> An identical frame is considered valid for tap.
>> This may be due to validation. As of commit 104ba78c9880 ("packet: on
>> direct_xmit, limit tso and csum to supported devices") the packet socket
>> code validates TSO packets and HW support in the direct_xmit path.
>
> I will look at it. I have tried with bridge+vEth (raw socket on the 
> vEth) and directly on a tg3 and e1000e. All of these should be tso 
> capable.
>
> CSUM definitely works in both cases, it is only TSO and only via raw 
> socket on TX which is refusing to work.
>
>>
>> Do you have a test program or packet (incl. vnet hdr) to reproduce this
>> with? I usually test this path with
>
> My test program at present is UML instance with my vector IO patches. 
> Latest version just went to the uml mailing list and can be pulled from
>
> http://foswiki.kot-begemot.co.uk/Main/EatYourOwnDogFoodOnUML
>
> I am going to dump one of the frames being produced and isolate it for 
> a standalone test case.
>
>>
>> https://github.com/wdebruij/kerneltools/blob/master/tests/psock_txring_vnet.c 
>>
>>
>>> The frames are generated out of legit linux skbufs (in UML) and vnet 
>>> headers
>>> work for checksumming on raw, so I should have the raw 
>>> initialization right.
>>>
>>> The header is supposedly parsed correctly and the newly formed skbuf 
>>> is sent
>>> to the device transmit routine (or enqueued) . I have debugged it as 
>>> far as
>>> it reaching the following line in packet_snd() (line 2592 in 4.13):
>>>
>>> err = po->xmit(skb);
>> That maps either on to packet_direct_xmit or dev_queue_xmit.
>
> I know. In my case it is direct_xmit as I have asked for QDISC bypass.
>
>>
>>> This returns NET_XMIT_DROP for any TSO capable device I tested.
>> You can also try
>>
>>    perf record -a -g -e skb:kfree_skb sleep 10
>>    perf report
>>
>> to see where these packets are dropped.
>
> Thanks, will try that.
>
>>
>>> They dislike
>>> the frame. Same frame is accepted by tap. I have went through the 
>>> header
>>> parsing and skb allocation code in both af_packet and tap several 
>>> times and
>>> I do not see any material difference (except the new zerocopy 
>>> stuff). So,
>>> frankly, I am stuck.
>>>
>>> Can someone help me to debug this. I do not see an easy way to debug 
>>> it, but
>>> this is not a part of the kernel I am familiar with. Is there a 
>>> suitable
>>> helper function to try to segment the frame and see exactly what is 
>>> wrong
>>> with it?
>>>
>>> Cc-ing DaveM as this has no specific maintainer so it falls under his
>>> umbrella remit.
>>>
>>> -- 
>>> Anton R. Ivanov
>>>
>>> Cambridge Greys Limited, England and Wales company No 10273661
>>> http://www.cambridgegreys.com/
>>>
>

-- 
Anton R. Ivanov

Cambridge Greys Limited, England and Wales company No 10273661
http://www.cambridgegreys.com/

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: BUG:af_packet fails to TX TSO frames
  2017-10-11 15:54     ` Anton Ivanov
@ 2017-10-11 16:26       ` Willem de Bruijn
  2017-10-11 16:32         ` Anton Ivanov
  2017-10-11 18:39         ` Anton Ivanov
  0 siblings, 2 replies; 27+ messages in thread
From: Willem de Bruijn @ 2017-10-11 16:26 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: Network Development, David Miller

On Wed, Oct 11, 2017 at 11:54 AM, Anton Ivanov
<anton.ivanov@cambridgegreys.com> wrote:
> It is that patch.
>
> I rolled it back and immediately got it to work correctly on a Broadcom
> Tigon. I can test on all other scenarios, I have tried, I suspect it will
> come back alive on all of them.
>
> I am going to try to trace it through and see exactly where it drops a skb
> which the card has no issues in accepting.

It might be in the initialization of gso_type and csum. The virtio_net_hdr
can encode various combinations of flags that are not allowed by the
validation logic.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: BUG:af_packet fails to TX TSO frames
  2017-10-11 16:26       ` Willem de Bruijn
@ 2017-10-11 16:32         ` Anton Ivanov
  2017-10-11 18:39         ` Anton Ivanov
  1 sibling, 0 replies; 27+ messages in thread
From: Anton Ivanov @ 2017-10-11 16:32 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, David Miller

Working through it at the moment.

The validation logic is prohibiting what the hardware considers to be a 
perfectly legit skb.

Once I narrow down the culprit I will come back with my findings.

Thanks for pointing me where to look by the way.

A.

On 10/11/17 17:26, Willem de Bruijn wrote:
> On Wed, Oct 11, 2017 at 11:54 AM, Anton Ivanov
> <anton.ivanov@cambridgegreys.com> wrote:
>> It is that patch.
>>
>> I rolled it back and immediately got it to work correctly on a Broadcom
>> Tigon. I can test on all other scenarios, I have tried, I suspect it will
>> come back alive on all of them.
>>
>> I am going to try to trace it through and see exactly where it drops a skb
>> which the card has no issues in accepting.
> It might be in the initialization of gso_type and csum. The virtio_net_hdr
> can encode various combinations of flags that are not allowed by the
> validation logic.
>

-- 
Anton R. Ivanov

Cambridge Greys Limited, England and Wales company No 10273661
http://www.cambridgegreys.com/

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: BUG:af_packet fails to TX TSO frames
  2017-10-11 16:26       ` Willem de Bruijn
  2017-10-11 16:32         ` Anton Ivanov
@ 2017-10-11 18:39         ` Anton Ivanov
  2017-10-11 18:57           ` Willem de Bruijn
  1 sibling, 1 reply; 27+ messages in thread
From: Anton Ivanov @ 2017-10-11 18:39 UTC (permalink / raw)
  To: Willem de Bruijn, Anton Ivanov; +Cc: Network Development, David Miller

The check as now insists that the actual driver supports GSO_ROBUST, 
because we have marked the skb dodgy.

The specific bit which does this check is in net_gso_ok()

Now, lets's see how many Ethernet drivers set GSO_ROBUST.

find drivers/net/ethernet -type f -name "*.[c,h]" -exec grep -H 
GSO_ROBUST {} \;

That returns nothing in 4.x

IMHO - af_packet allocates the skb, does all checks (and extra may be 
added) on the gso, why is this set dodgy in the first place?

A.


On 11/10/17 17:26, Willem de Bruijn wrote:
> On Wed, Oct 11, 2017 at 11:54 AM, Anton Ivanov
> <anton.ivanov@cambridgegreys.com> wrote:
>> It is that patch.
>>
>> I rolled it back and immediately got it to work correctly on a Broadcom
>> Tigon. I can test on all other scenarios, I have tried, I suspect it will
>> come back alive on all of them.
>>
>> I am going to try to trace it through and see exactly where it drops a skb
>> which the card has no issues in accepting.
> It might be in the initialization of gso_type and csum. The virtio_net_hdr
> can encode various combinations of flags that are not allowed by the
> validation logic.
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: BUG:af_packet fails to TX TSO frames
  2017-10-11 18:39         ` Anton Ivanov
@ 2017-10-11 18:57           ` Willem de Bruijn
  2017-10-11 19:39             ` Anton Ivanov
  0 siblings, 1 reply; 27+ messages in thread
From: Willem de Bruijn @ 2017-10-11 18:57 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: Anton Ivanov, Network Development, David Miller

On Wed, Oct 11, 2017 at 2:39 PM, Anton Ivanov
<anton.ivanov@kot-begemot.co.uk> wrote:
> The check as now insists that the actual driver supports GSO_ROBUST, because
> we have marked the skb dodgy.
>
> The specific bit which does this check is in net_gso_ok()
>
> Now, lets's see how many Ethernet drivers set GSO_ROBUST.
>
> find drivers/net/ethernet -type f -name "*.[c,h]" -exec grep -H GSO_ROBUST
> {} \;
>
> That returns nothing in 4.x
>
> IMHO - af_packet allocates the skb, does all checks (and extra may be added)
> on the gso, why is this set dodgy in the first place?

It is set when the header has to be validated.

The segmentation logic will validate and fixup gso_segs. See for
instance tcp_gso_segment:

        if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) {
                /* Packet is from an untrusted source, reset gso_segs. */

                skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(skb->len, mss);

                segs = NULL;
                goto out;
        }

If the device would have the robust bit set and otherwise supports the
required features, fix up gso_segs and pass the large packet to the
device.

Else it continues to the software gso path.

Large packets generated with psock_txring_vnet.c pass this test. I
suspect that there is a subtle difference in the virtio_net_hdr fields
that that generates vs. your program.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: BUG:af_packet fails to TX TSO frames
  2017-10-11 18:57           ` Willem de Bruijn
@ 2017-10-11 19:39             ` Anton Ivanov
  2017-10-11 21:02               ` [uml-devel] Fwd: " Anton Ivanov
  2017-10-11 21:05               ` Willem de Bruijn
  0 siblings, 2 replies; 27+ messages in thread
From: Anton Ivanov @ 2017-10-11 19:39 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Anton Ivanov, Network Development, David Miller

On 11/10/17 19:57, Willem de Bruijn wrote:
> On Wed, Oct 11, 2017 at 2:39 PM, Anton Ivanov
> <anton.ivanov@kot-begemot.co.uk> wrote:
>> The check as now insists that the actual driver supports GSO_ROBUST, because
>> we have marked the skb dodgy.
>>
>> The specific bit which does this check is in net_gso_ok()
>>
>> Now, lets's see how many Ethernet drivers set GSO_ROBUST.
>>
>> find drivers/net/ethernet -type f -name "*.[c,h]" -exec grep -H GSO_ROBUST
>> {} \;
>>
>> That returns nothing in 4.x
>>
>> IMHO - af_packet allocates the skb, does all checks (and extra may be added)
>> on the gso, why is this set dodgy in the first place?
> It is set when the header has to be validated.
>
> The segmentation logic will validate and fixup gso_segs. See for
> instance tcp_gso_segment:
>
>         if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) {
>                 /* Packet is from an untrusted source, reset gso_segs. */
>
>                 skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(skb->len, mss);
>
>                 segs = NULL;
>                 goto out;
>         }
>
> If the device would have the robust bit set and otherwise supports the
> required features, fix up gso_segs and pass the large packet to the
> device.
>
> Else it continues to the software gso path.
>
> Large packets generated with psock_txring_vnet.c pass this test. I

That test is indeed a different path - this goes via the tpacket_snd
which allocs via sock_alloc_send_skb. That results in a non-fragged skb
as it calls pskb after that with data_len = 0 asking for a contiguous one.

My stuff is using sendmmsg which ends up via packet_snd which allocs
via  sock_alloc_send_pskb which is invoked in a way which always creates
2 segments - one for the linear section and one for the rest (and more
if needed). It is faster than tpacket by the way (several times).

As a comparison tap and other virtual drivers use sock_alloc_send_pskb
with non-zero data length which results in multiple frags. The code in
packet_snd is in fact identical with tap (+/- some cosmetic differences).

That is the difference between the tests and that is why your test works
and mine fails.

Now, alloc-ing a 64k contiguous skb every time IMHO is wrong.

So the logic in the xmit check at present works only because it is given
only a very corner case for a GSO frame and tested versus it. It should
work with the generic case which is what comes out of
sock_alloc_send_pskb (same as in tap).

A.



> suspect that there is a subtle difference in the virtio_net_hdr fields
> that that generates vs. your program.
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [uml-devel] Fwd: Re: BUG:af_packet fails to TX TSO frames
  2017-10-11 19:39             ` Anton Ivanov
@ 2017-10-11 21:02               ` Anton Ivanov
  2017-10-11 21:05               ` Willem de Bruijn
  1 sibling, 0 replies; 27+ messages in thread
From: Anton Ivanov @ 2017-10-11 21:02 UTC (permalink / raw)
  To: user-mode-linux-devel


[-- Attachment #1.1: Type: text/plain, Size: 3374 bytes --]

And that is the culprit of why GSO does not work with raw sockets.

There can be no workaround unfortunately, this simply will need to be
enabled in the driver once the kernel is fixed.

It should work for older kernels prior to the fix in comit104ba78c9880 too.

So from that perspective there is nothing that can be done to the vector
io drivers - the code is correct, it is the host kernel side which is
broken :(

A.

-------- Forwarded Message --------
Subject: 	Re: BUG:af_packet fails to TX TSO frames
Date: 	Wed, 11 Oct 2017 20:39:57 +0100
From: 	Anton Ivanov <anton.ivanov@kot-begemot.co.uk>
To: 	Willem de Bruijn <willemdebruijn.kernel@gmail.com>
CC: 	Anton Ivanov <anton.ivanov@cambridgegreys.com>, Network Development
<netdev@vger.kernel.org>, David Miller <davem@davemloft.net>



On 11/10/17 19:57, Willem de Bruijn wrote:
> On Wed, Oct 11, 2017 at 2:39 PM, Anton Ivanov
> <anton.ivanov@kot-begemot.co.uk> wrote:
>> The check as now insists that the actual driver supports GSO_ROBUST, because
>> we have marked the skb dodgy.
>>
>> The specific bit which does this check is in net_gso_ok()
>>
>> Now, lets's see how many Ethernet drivers set GSO_ROBUST.
>>
>> find drivers/net/ethernet -type f -name "*.[c,h]" -exec grep -H GSO_ROBUST
>> {} \;
>>
>> That returns nothing in 4.x
>>
>> IMHO - af_packet allocates the skb, does all checks (and extra may be added)
>> on the gso, why is this set dodgy in the first place?
> It is set when the header has to be validated.
>
> The segmentation logic will validate and fixup gso_segs. See for
> instance tcp_gso_segment:
>
>         if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) {
>                 /* Packet is from an untrusted source, reset gso_segs. */
>
>                 skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(skb->len, mss);
>
>                 segs = NULL;
>                 goto out;
>         }
>
> If the device would have the robust bit set and otherwise supports the
> required features, fix up gso_segs and pass the large packet to the
> device.
>
> Else it continues to the software gso path.
>
> Large packets generated with psock_txring_vnet.c pass this test. I

That test is indeed a different path - this goes via the tpacket_snd
which allocs via sock_alloc_send_skb. That results in a non-fragged skb
as it calls pskb after that with data_len = 0 asking for a contiguous one.

My stuff is using sendmmsg which ends up via packet_snd which allocs
via  sock_alloc_send_pskb which is invoked in a way which always creates
2 segments - one for the linear section and one for the rest (and more
if needed). It is faster than tpacket by the way (several times).

As a comparison tap and other virtual drivers use sock_alloc_send_pskb
with non-zero data length which results in multiple frags. The code in
packet_snd is in fact identical with tap (+/- some cosmetic differences).

That is the difference between the tests and that is why your test works
and mine fails.

Now, alloc-ing a 64k contiguous skb every time IMHO is wrong.

So the logic in the xmit check at present works only because it is given
only a very corner case for a GSO frame and tested versus it. It should
work with the generic case which is what comes out of
sock_alloc_send_pskb (same as in tap).

A.



> suspect that there is a subtle difference in the virtio_net_hdr fields
> that that generates vs. your program.
>



[-- Attachment #1.2: Type: text/html, Size: 5188 bytes --]

[-- Attachment #2: Type: text/plain, Size: 202 bytes --]

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot

[-- Attachment #3: Type: text/plain, Size: 194 bytes --]

_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: BUG:af_packet fails to TX TSO frames
  2017-10-11 19:39             ` Anton Ivanov
  2017-10-11 21:02               ` [uml-devel] Fwd: " Anton Ivanov
@ 2017-10-11 21:05               ` Willem de Bruijn
  2017-10-11 21:55                 ` Anton Ivanov
  1 sibling, 1 reply; 27+ messages in thread
From: Willem de Bruijn @ 2017-10-11 21:05 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: Anton Ivanov, Network Development, David Miller

On Wed, Oct 11, 2017 at 3:39 PM, Anton Ivanov
<anton.ivanov@kot-begemot.co.uk> wrote:
> On 11/10/17 19:57, Willem de Bruijn wrote:
>> On Wed, Oct 11, 2017 at 2:39 PM, Anton Ivanov
>> <anton.ivanov@kot-begemot.co.uk> wrote:
>>> The check as now insists that the actual driver supports GSO_ROBUST, because
>>> we have marked the skb dodgy.
>>>
>>> The specific bit which does this check is in net_gso_ok()
>>>
>>> Now, lets's see how many Ethernet drivers set GSO_ROBUST.
>>>
>>> find drivers/net/ethernet -type f -name "*.[c,h]" -exec grep -H GSO_ROBUST
>>> {} \;
>>>
>>> That returns nothing in 4.x
>>>
>>> IMHO - af_packet allocates the skb, does all checks (and extra may be added)
>>> on the gso, why is this set dodgy in the first place?
>> It is set when the header has to be validated.
>>
>> The segmentation logic will validate and fixup gso_segs. See for
>> instance tcp_gso_segment:
>>
>>         if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) {
>>                 /* Packet is from an untrusted source, reset gso_segs. */
>>
>>                 skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(skb->len, mss);
>>
>>                 segs = NULL;
>>                 goto out;
>>         }
>>
>> If the device would have the robust bit set and otherwise supports the
>> required features, fix up gso_segs and pass the large packet to the
>> device.
>>
>> Else it continues to the software gso path.
>>
>> Large packets generated with psock_txring_vnet.c pass this test. I
>
> That test is indeed a different path

The test can be run both with and without ring:

  psock_txring_vnet -l 8000 -s $src_ip -d $dst_ip -v
  psock_txring_vnet -l 8000 -s $src_ip -d $dst_ip -v -N

both with and without qdisc bypass ('-q').

>  - this goes via the tpacket_snd
> which allocs via sock_alloc_send_skb. That results in a non-fragged skb
> as it calls pskb after that with data_len = 0 asking for a contiguous one.

but attached the ring slot as fragments in tpacket_fill_skb.

> My stuff is using sendmmsg which ends up via packet_snd which allocs
> via  sock_alloc_send_pskb which is invoked in a way which always creates
> 2 segments - one for the linear section and one for the rest (and more
> if needed). It is faster than tpacket by the way (several times).
>
> As a comparison tap and other virtual drivers use sock_alloc_send_pskb
> with non-zero data length which results in multiple frags. The code in
> packet_snd is in fact identical with tap (+/- some cosmetic differences).
>
> That is the difference between the tests and that is why your test works
> and mine fails.

All the above test cases work for me, including those that build skbs
with fragments. Could you try those.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: BUG:af_packet fails to TX TSO frames
  2017-10-11 21:05               ` Willem de Bruijn
@ 2017-10-11 21:55                 ` Anton Ivanov
  2017-10-11 22:01                   ` Anton Ivanov
  0 siblings, 1 reply; 27+ messages in thread
From: Anton Ivanov @ 2017-10-11 21:55 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Anton Ivanov, Network Development, David Miller

[snip]

> The test can be run both with and without ring:
>
>   psock_txring_vnet -l 8000 -s $src_ip -d $dst_ip -v
>   psock_txring_vnet -l 8000 -s $src_ip -d $dst_ip -v -N
>
> both with and without qdisc bypass ('-q').

Thanks, apologies, I was being inpatient. Started reading the source,
saw the tpacket bits and stopped there.

>
>>  - this goes via the tpacket_snd
>> which allocs via sock_alloc_send_skb. That results in a non-fragged skb
>> as it calls pskb after that with data_len = 0 asking for a contiguous one.
> but attached the ring slot as fragments in tpacket_fill_skb.
>
>> My stuff is using sendmmsg which ends up via packet_snd which allocs
>> via  sock_alloc_send_pskb which is invoked in a way which always creates
>> 2 segments - one for the linear section and one for the rest (and more
>> if needed). It is faster than tpacket by the way (several times).
>>
>> As a comparison tap and other virtual drivers use sock_alloc_send_pskb
>> with non-zero data length which results in multiple frags. The code in
>> packet_snd is in fact identical with tap (+/- some cosmetic differences).
>>
>> That is the difference between the tests and that is why your test works
>> and mine fails.
> All the above test cases work for me, including those that build skbs
> with fragments. Could you try those.

Tried it, works on all of the adapters and hosts where mine fails. I
will step by step hack-in the differences so it behaves same as mine
until I find the culprit.

This will be tomorrow though, it is late here.

The only obvious difference I can see at this point is that I am using
iovs and sending the vnet header as iov[0] and the data in pieces after
that while your code is doing a send() for the whole frame. This should
not make any difference though - it all ends up as an iov internally in
the kernel.

A.

>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: BUG:af_packet fails to TX TSO frames
  2017-10-11 21:55                 ` Anton Ivanov
@ 2017-10-11 22:01                   ` Anton Ivanov
  2017-10-12  0:19                     ` Willem de Bruijn
  0 siblings, 1 reply; 27+ messages in thread
From: Anton Ivanov @ 2017-10-11 22:01 UTC (permalink / raw)
  To: Anton Ivanov, Willem de Bruijn; +Cc: Network Development, David Miller

[snip]

> This will be tomorrow though, it is late here.
>
> The only obvious difference I can see at this point is that I am using
> iovs and sending the vnet header as iov[0] and the data in pieces after
> that while your code is doing a send() for the whole frame. This should
> not make any difference though - it all ends up as an iov internally in
> the kernel.

Spoke too soon. It is not reporting any errors, but there is nothing
coming out on the actual Ethernet.

Leaving it till tomorrow.

A.

>
> A.
>
>


-- 
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: BUG:af_packet fails to TX TSO frames
  2017-10-11 22:01                   ` Anton Ivanov
@ 2017-10-12  0:19                     ` Willem de Bruijn
  2017-10-12  6:11                       ` Anton Ivanov
  0 siblings, 1 reply; 27+ messages in thread
From: Willem de Bruijn @ 2017-10-12  0:19 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: Anton Ivanov, Network Development, David Miller

On Wed, Oct 11, 2017 at 6:01 PM, Anton Ivanov
<anton.ivanov@cambridgegreys.com> wrote:
> [snip]
>
>> This will be tomorrow though, it is late here.
>>
>> The only obvious difference I can see at this point is that I am using
>> iovs and sending the vnet header as iov[0] and the data in pieces after
>> that while your code is doing a send() for the whole frame. This should
>> not make any difference though - it all ends up as an iov internally in
>> the kernel.
>
> Spoke too soon. It is not reporting any errors, but there is nothing
> coming out on the actual Ethernet.

It works for me on various platforms. On the receiver, drop these fake
tcp packets in iptables and read them with tcpdump

   iptables -A PREROUTING -t raw -p tcp --dport 9 -j DROP
   tcpdump src $src_ip

Note that not all combinations of flags are supported by the kernel
and that some flags have non-obvious behavior (disable a feature, in
place of enable it).

Specifically, mtu sized packets either must not pass a vnet_hdr or
must pass one with gso explicitly disabled ('-G').

  psock_txring_vnet -s $src_ip $dst_ip -l 1400
  psock_txring_vnet -s $src_ip $dst_ip -l 1400 -v -G
  psock_txring_vnet -s $src_ip $dst_ip -l 1400 -N
  psock_txring_vnet -s $src_ip $dst_ip -l 1400 -N -v -G

Conversely, packets that exceed mtu have to have the gso flags in the
virtio_net_hdr:

  psock_txring_vnet -s $src_ip $dst_ip -l 4400 -v
  psock_txring_vnet -s $src_ip $dst_ip -l 4400 -N -v

When sending a large packet, but not passing a virtio_net_hdr along
('-v'), the test fails with

  psock_txring_vnet: send: Message too long

When passing a header along, but not disabling gso, the packet is
indeed dropped silently.

I verified correct segmentation with three modes of ethtool

  ethtool -K eth0 tso off gso off
  ethtool -K eth0 tso off gso on
  ethtool -K eth0 tso on gso on

by reading tcpdump on the sender.

The receive side results are the same with dev_queue_xmit and
packet_direct_xmit ('-q') mode. With direct_xmit, the packets are not
observed on the send side.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: BUG:af_packet fails to TX TSO frames
  2017-10-12  0:19                     ` Willem de Bruijn
@ 2017-10-12  6:11                       ` Anton Ivanov
  2017-10-12  8:46                         ` Anton Ivanov
  0 siblings, 1 reply; 27+ messages in thread
From: Anton Ivanov @ 2017-10-12  6:11 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Anton Ivanov, Network Development, David Miller

On 12/10/17 01:19, Willem de Bruijn wrote:
> On Wed, Oct 11, 2017 at 6:01 PM, Anton Ivanov
> <anton.ivanov@cambridgegreys.com> wrote:
>> [snip]
>>
>>> This will be tomorrow though, it is late here.
>>>
>>> The only obvious difference I can see at this point is that I am using
>>> iovs and sending the vnet header as iov[0] and the data in pieces after
>>> that while your code is doing a send() for the whole frame. This should
>>> not make any difference though - it all ends up as an iov internally in
>>> the kernel.
>> Spoke too soon. It is not reporting any errors, but there is nothing
>> coming out on the actual Ethernet.

Different issue - switch was blacklisting the fake dst MAC after the
first packet as a suspected flood attack.

It worked after I changed the mac in the source to a "real" one. In
fact, an argument for that
would be nice :)

I will go through it step by step now and figure out exactly what and
where is wrong with the framing on my side.

Thanks for your help.

A.

> It works for me on various platforms. On the receiver, drop these fake
> tcp packets in iptables and read them with tcpdump
>
>    iptables -A PREROUTING -t raw -p tcp --dport 9 -j DROP
>    tcpdump src $src_ip
>
> Note that not all combinations of flags are supported by the kernel
> and that some flags have non-obvious behavior (disable a feature, in
> place of enable it).
>
> Specifically, mtu sized packets either must not pass a vnet_hdr or
> must pass one with gso explicitly disabled ('-G').
>
>   psock_txring_vnet -s $src_ip $dst_ip -l 1400
>   psock_txring_vnet -s $src_ip $dst_ip -l 1400 -v -G
>   psock_txring_vnet -s $src_ip $dst_ip -l 1400 -N
>   psock_txring_vnet -s $src_ip $dst_ip -l 1400 -N -v -G
>
> Conversely, packets that exceed mtu have to have the gso flags in the
> virtio_net_hdr:
>
>   psock_txring_vnet -s $src_ip $dst_ip -l 4400 -v
>   psock_txring_vnet -s $src_ip $dst_ip -l 4400 -N -v
>
> When sending a large packet, but not passing a virtio_net_hdr along
> ('-v'), the test fails with
>
>   psock_txring_vnet: send: Message too long
>
> When passing a header along, but not disabling gso, the packet is
> indeed dropped silently.
>
> I verified correct segmentation with three modes of ethtool
>
>   ethtool -K eth0 tso off gso off
>   ethtool -K eth0 tso off gso on
>   ethtool -K eth0 tso on gso on
>
> by reading tcpdump on the sender.
>
> The receive side results are the same with dev_queue_xmit and
> packet_direct_xmit ('-q') mode. With direct_xmit, the packets are not
> observed on the send side.
>


-- 
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: BUG:af_packet fails to TX TSO frames
  2017-10-12  6:11                       ` Anton Ivanov
@ 2017-10-12  8:46                         ` Anton Ivanov
  2017-10-12 13:39                           ` Willem de Bruijn
  0 siblings, 1 reply; 27+ messages in thread
From: Anton Ivanov @ 2017-10-12  8:46 UTC (permalink / raw)
  To: Anton Ivanov, Willem de Bruijn; +Cc: Network Development, David Miller

Hi Willem,

Thanks for all your help so far.

I modified your test program to do sendmmsg as an option in addition to 
send. I will send you a github pull req with the changes later today. 
This allows to send let's say n frames (let's say 64) in one syscall - 
probably the highest TX rate you can get out of Linux userspace.

The test case works via that route too which is now getting into the 
realm of the surreal.

If I produce a real vnet frame out of a live kernel frame using 
virtio_net_hdr_from_skb() and try to send it it fails on the check in 
af_packet, while succeeding for tap. If I remove the af_packet check the 
frame is accepted by the hardware too.

If I produce it a synthetic frame + vnet header using the test program - 
it works. Go figure.

I am going to continue digging into it.

At the very least I now have a positive test case which uses the same 
semantics as my code so I have something to compare to.

A.

On 10/12/17 07:11, Anton Ivanov wrote:
> On 12/10/17 01:19, Willem de Bruijn wrote:
>> On Wed, Oct 11, 2017 at 6:01 PM, Anton Ivanov
>> <anton.ivanov@cambridgegreys.com> wrote:
>>> [snip]
>>>
>>>> This will be tomorrow though, it is late here.
>>>>
>>>> The only obvious difference I can see at this point is that I am using
>>>> iovs and sending the vnet header as iov[0] and the data in pieces after
>>>> that while your code is doing a send() for the whole frame. This should
>>>> not make any difference though - it all ends up as an iov internally in
>>>> the kernel.
>>> Spoke too soon. It is not reporting any errors, but there is nothing
>>> coming out on the actual Ethernet.
> Different issue - switch was blacklisting the fake dst MAC after the
> first packet as a suspected flood attack.
>
> It worked after I changed the mac in the source to a "real" one. In
> fact, an argument for that
> would be nice :)
>
> I will go through it step by step now and figure out exactly what and
> where is wrong with the framing on my side.
>
> Thanks for your help.
>
> A.
>
>> It works for me on various platforms. On the receiver, drop these fake
>> tcp packets in iptables and read them with tcpdump
>>
>>     iptables -A PREROUTING -t raw -p tcp --dport 9 -j DROP
>>     tcpdump src $src_ip
>>
>> Note that not all combinations of flags are supported by the kernel
>> and that some flags have non-obvious behavior (disable a feature, in
>> place of enable it).
>>
>> Specifically, mtu sized packets either must not pass a vnet_hdr or
>> must pass one with gso explicitly disabled ('-G').
>>
>>    psock_txring_vnet -s $src_ip $dst_ip -l 1400
>>    psock_txring_vnet -s $src_ip $dst_ip -l 1400 -v -G
>>    psock_txring_vnet -s $src_ip $dst_ip -l 1400 -N
>>    psock_txring_vnet -s $src_ip $dst_ip -l 1400 -N -v -G
>>
>> Conversely, packets that exceed mtu have to have the gso flags in the
>> virtio_net_hdr:
>>
>>    psock_txring_vnet -s $src_ip $dst_ip -l 4400 -v
>>    psock_txring_vnet -s $src_ip $dst_ip -l 4400 -N -v
>>
>> When sending a large packet, but not passing a virtio_net_hdr along
>> ('-v'), the test fails with
>>
>>    psock_txring_vnet: send: Message too long
>>
>> When passing a header along, but not disabling gso, the packet is
>> indeed dropped silently.
>>
>> I verified correct segmentation with three modes of ethtool
>>
>>    ethtool -K eth0 tso off gso off
>>    ethtool -K eth0 tso off gso on
>>    ethtool -K eth0 tso on gso on
>>
>> by reading tcpdump on the sender.
>>
>> The receive side results are the same with dev_queue_xmit and
>> packet_direct_xmit ('-q') mode. With direct_xmit, the packets are not
>> observed on the send side.
>>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: BUG:af_packet fails to TX TSO frames
  2017-10-12  8:46                         ` Anton Ivanov
@ 2017-10-12 13:39                           ` Willem de Bruijn
  2017-10-12 14:12                             ` Anton Ivanov
  0 siblings, 1 reply; 27+ messages in thread
From: Willem de Bruijn @ 2017-10-12 13:39 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: Anton Ivanov, Network Development, David Miller

> If I produce a real vnet frame out of a live kernel frame using
> virtio_net_hdr_from_skb() and try to send it it fails on the check in
> af_packet, while succeeding for tap. If I remove the af_packet check the
> frame is accepted by the hardware too.
>
> If I produce it a synthetic frame + vnet header using the test program - it
> works. Go figure.

Besides looking at the raw frame bytes, also compare the setup
of virtio_net_header, as well as the tcp checksum field. The stack
expects the pseudo header to have already been calculated.

> I am going to continue digging into it.
>
> At the very least I now have a positive test case which uses the same
> semantics as my code so I have something to compare to.

Glad to hear that the test is helpful. I wrote it because I
have run into these exact same issues in the past.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: BUG:af_packet fails to TX TSO frames
  2017-10-12 13:39                           ` Willem de Bruijn
@ 2017-10-12 14:12                             ` Anton Ivanov
  2017-10-12 15:44                               ` Anton Ivanov
  0 siblings, 1 reply; 27+ messages in thread
From: Anton Ivanov @ 2017-10-12 14:12 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development



On 10/12/17 14:39, Willem de Bruijn wrote:
>> If I produce a real vnet frame out of a live kernel frame using
>> virtio_net_hdr_from_skb() and try to send it it fails on the check in
>> af_packet, while succeeding for tap. If I remove the af_packet check the
>> frame is accepted by the hardware too.
>>
>> If I produce it a synthetic frame + vnet header using the test program - it
>> works. Go figure.
> Besides looking at the raw frame bytes, also compare the setup
> of virtio_net_header, as well as the tcp checksum field. The stack
> expects the pseudo header to have already been calculated.

I am feeding it a skb which is coming up in the tx routine of a User 
Mode Linux device which is marked as NETIF_F_HW_CSUM and SG - that 
results in a skb with csum-ed headers, body set up for CSUM_PARTIAL and 
multiple fragments (always at least 1 more frag besides the TCP head).

That has everything in order as expected by virtio_net_hdr_from_skb and 
this is what I use to generate the vnet header. It works correctly for 
csum and GRO with af_packet and it works correctly for everything using 
a tap device. It fails only on GSO + af_packet TX.

What I am doing is the same thing virtio_net does - it just takes the 
output of virtio_net_hdr_from_skb and does nothing more. There should be 
no need to do anything more :(

It should just work.

Unless there is a gremlin somewhere in the machinery and that gremlin 
needs some light to be flushed out.
>
>> I am going to continue digging into it.
>>
>> At the very least I now have a positive test case which uses the same
>> semantics as my code so I have something to compare to.
> Glad to hear that the test is helpful. I wrote it because I
> have run into these exact same issues in the past.

It is. I have changes ready for it so it also supports vector IO, need 
to finish fighting with it.

A.

>

-- 
Anton R. Ivanov

Cambridge Greys Limited, England and Wales company No 10273661
http://www.cambridgegreys.com/

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: BUG:af_packet fails to TX TSO frames
  2017-10-12 14:12                             ` Anton Ivanov
@ 2017-10-12 15:44                               ` Anton Ivanov
  2017-10-12 15:57                                   ` [uml-devel] " Anton Ivanov
  2017-10-12 16:30                                 ` Willem de Bruijn
  0 siblings, 2 replies; 27+ messages in thread
From: Anton Ivanov @ 2017-10-12 15:44 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, David Miller

Found it.

Two bugs canceling each other.
The bind sequence in:  psock_txring_vnet.c is wrong.

It does the following addr.sll_protocol =    htons(ETH_P_IP);
before calling bind.

If you set addr.sll_protocol to ETH_P_ALL where it should have been in 
the first place the test program blows up with -ENOBUFS

I think what is happening is that this value is taken into account when 
looking at "what should I use to segment it with" in skb_mac_gso_segment 
which is invoked at the end of the verification chain which starts in 
packet_direct_xmit in af_packet.c

I have not tried the other test cases like setting it to ETH_P_IP and 
giving it IPv6 traffic or the opposite, but my guess is that these will 
fail too if they need GSO to be applied.

A.

On 10/12/17 15:12, Anton Ivanov wrote:
>
>
> On 10/12/17 14:39, Willem de Bruijn wrote:
>>> If I produce a real vnet frame out of a live kernel frame using
>>> virtio_net_hdr_from_skb() and try to send it it fails on the check in
>>> af_packet, while succeeding for tap. If I remove the af_packet check 
>>> the
>>> frame is accepted by the hardware too.
>>>
>>> If I produce it a synthetic frame + vnet header using the test 
>>> program - it
>>> works. Go figure.
>> Besides looking at the raw frame bytes, also compare the setup
>> of virtio_net_header, as well as the tcp checksum field. The stack
>> expects the pseudo header to have already been calculated.
>
> I am feeding it a skb which is coming up in the tx routine of a User 
> Mode Linux device which is marked as NETIF_F_HW_CSUM and SG - that 
> results in a skb with csum-ed headers, body set up for CSUM_PARTIAL 
> and multiple fragments (always at least 1 more frag besides the TCP 
> head).
>
> That has everything in order as expected by virtio_net_hdr_from_skb 
> and this is what I use to generate the vnet header. It works correctly 
> for csum and GRO with af_packet and it works correctly for everything 
> using a tap device. It fails only on GSO + af_packet TX.
>
> What I am doing is the same thing virtio_net does - it just takes the 
> output of virtio_net_hdr_from_skb and does nothing more. There should 
> be no need to do anything more :(
>
> It should just work.
>
> Unless there is a gremlin somewhere in the machinery and that gremlin 
> needs some light to be flushed out.
>>
>>> I am going to continue digging into it.
>>>
>>> At the very least I now have a positive test case which uses the same
>>> semantics as my code so I have something to compare to.
>> Glad to hear that the test is helpful. I wrote it because I
>> have run into these exact same issues in the past.
>
> It is. I have changes ready for it so it also supports vector IO, need 
> to finish fighting with it.
>
> A.
>
>>
>

-- 
Anton R. Ivanov

Cambridge Greys Limited, England and Wales company No 10273661
http://www.cambridgegreys.com/

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: BUG:af_packet fails to TX TSO frames
  2017-10-12 15:44                               ` Anton Ivanov
@ 2017-10-12 15:57                                   ` Anton Ivanov
  2017-10-12 16:30                                 ` Willem de Bruijn
  1 sibling, 0 replies; 27+ messages in thread
From: Anton Ivanov @ 2017-10-12 15:57 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, David Miller, user-mode-linux-devel

Also confirmed using my UML patchset. This is inside the UML guest.

root@blinky:~# iperf -c 192.168.98.1
------------------------------------------------------------
Client connecting to 192.168.98.1, TCP port 5001
TCP window size:  414 KByte (default)
------------------------------------------------------------
[  3] local 192.168.98.146 port 36744 connected with 192.168.98.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  4.52 GBytes  3.89 Gbits/sec

This is GSO on a raw socket Virtual NIC to host. Normal speed without 
GSO on same machine is ~ 500.

So for the time being I can turn off TSO for anything but v4 in the User 
Mode Linux patchset (declare the guest NIC as TSOV4 capable only) and 
enable it once the host kernel is fixed. Alternatively, there is the 
rather ugly approach of using multiple FDs - one for v4, one for v6, one 
for...

A.

On 10/12/17 16:44, Anton Ivanov wrote:
> Found it.
>
> Two bugs canceling each other.
> The bind sequence in:  psock_txring_vnet.c is wrong.
>
> It does the following addr.sll_protocol =    htons(ETH_P_IP);
> before calling bind.
>
> If you set addr.sll_protocol to ETH_P_ALL where it should have been in 
> the first place the test program blows up with -ENOBUFS
>
> I think what is happening is that this value is taken into account 
> when looking at "what should I use to segment it with" in 
> skb_mac_gso_segment which is invoked at the end of the verification 
> chain which starts in packet_direct_xmit in af_packet.c
>
> I have not tried the other test cases like setting it to ETH_P_IP and 
> giving it IPv6 traffic or the opposite, but my guess is that these 
> will fail too if they need GSO to be applied.
>
> A.
>
> On 10/12/17 15:12, Anton Ivanov wrote:
>>
>>
>> On 10/12/17 14:39, Willem de Bruijn wrote:
>>>> If I produce a real vnet frame out of a live kernel frame using
>>>> virtio_net_hdr_from_skb() and try to send it it fails on the check in
>>>> af_packet, while succeeding for tap. If I remove the af_packet 
>>>> check the
>>>> frame is accepted by the hardware too.
>>>>
>>>> If I produce it a synthetic frame + vnet header using the test 
>>>> program - it
>>>> works. Go figure.
>>> Besides looking at the raw frame bytes, also compare the setup
>>> of virtio_net_header, as well as the tcp checksum field. The stack
>>> expects the pseudo header to have already been calculated.
>>
>> I am feeding it a skb which is coming up in the tx routine of a User 
>> Mode Linux device which is marked as NETIF_F_HW_CSUM and SG - that 
>> results in a skb with csum-ed headers, body set up for CSUM_PARTIAL 
>> and multiple fragments (always at least 1 more frag besides the TCP 
>> head).
>>
>> That has everything in order as expected by virtio_net_hdr_from_skb 
>> and this is what I use to generate the vnet header. It works 
>> correctly for csum and GRO with af_packet and it works correctly for 
>> everything using a tap device. It fails only on GSO + af_packet TX.
>>
>> What I am doing is the same thing virtio_net does - it just takes the 
>> output of virtio_net_hdr_from_skb and does nothing more. There should 
>> be no need to do anything more :(
>>
>> It should just work.
>>
>> Unless there is a gremlin somewhere in the machinery and that gremlin 
>> needs some light to be flushed out.
>>>
>>>> I am going to continue digging into it.
>>>>
>>>> At the very least I now have a positive test case which uses the same
>>>> semantics as my code so I have something to compare to.
>>> Glad to hear that the test is helpful. I wrote it because I
>>> have run into these exact same issues in the past.
>>
>> It is. I have changes ready for it so it also supports vector IO, 
>> need to finish fighting with it.
>>
>> A.
>>
>>>
>>
>

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] BUG:af_packet fails to TX TSO frames
@ 2017-10-12 15:57                                   ` Anton Ivanov
  0 siblings, 0 replies; 27+ messages in thread
From: Anton Ivanov @ 2017-10-12 15:57 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, David Miller, user-mode-linux-devel

Also confirmed using my UML patchset. This is inside the UML guest.

root@blinky:~# iperf -c 192.168.98.1
------------------------------------------------------------
Client connecting to 192.168.98.1, TCP port 5001
TCP window size:  414 KByte (default)
------------------------------------------------------------
[  3] local 192.168.98.146 port 36744 connected with 192.168.98.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  4.52 GBytes  3.89 Gbits/sec

This is GSO on a raw socket Virtual NIC to host. Normal speed without 
GSO on same machine is ~ 500.

So for the time being I can turn off TSO for anything but v4 in the User 
Mode Linux patchset (declare the guest NIC as TSOV4 capable only) and 
enable it once the host kernel is fixed. Alternatively, there is the 
rather ugly approach of using multiple FDs - one for v4, one for v6, one 
for...

A.

On 10/12/17 16:44, Anton Ivanov wrote:
> Found it.
>
> Two bugs canceling each other.
> The bind sequence in:  psock_txring_vnet.c is wrong.
>
> It does the following addr.sll_protocol =    htons(ETH_P_IP);
> before calling bind.
>
> If you set addr.sll_protocol to ETH_P_ALL where it should have been in 
> the first place the test program blows up with -ENOBUFS
>
> I think what is happening is that this value is taken into account 
> when looking at "what should I use to segment it with" in 
> skb_mac_gso_segment which is invoked at the end of the verification 
> chain which starts in packet_direct_xmit in af_packet.c
>
> I have not tried the other test cases like setting it to ETH_P_IP and 
> giving it IPv6 traffic or the opposite, but my guess is that these 
> will fail too if they need GSO to be applied.
>
> A.
>
> On 10/12/17 15:12, Anton Ivanov wrote:
>>
>>
>> On 10/12/17 14:39, Willem de Bruijn wrote:
>>>> If I produce a real vnet frame out of a live kernel frame using
>>>> virtio_net_hdr_from_skb() and try to send it it fails on the check in
>>>> af_packet, while succeeding for tap. If I remove the af_packet 
>>>> check the
>>>> frame is accepted by the hardware too.
>>>>
>>>> If I produce it a synthetic frame + vnet header using the test 
>>>> program - it
>>>> works. Go figure.
>>> Besides looking at the raw frame bytes, also compare the setup
>>> of virtio_net_header, as well as the tcp checksum field. The stack
>>> expects the pseudo header to have already been calculated.
>>
>> I am feeding it a skb which is coming up in the tx routine of a User 
>> Mode Linux device which is marked as NETIF_F_HW_CSUM and SG - that 
>> results in a skb with csum-ed headers, body set up for CSUM_PARTIAL 
>> and multiple fragments (always at least 1 more frag besides the TCP 
>> head).
>>
>> That has everything in order as expected by virtio_net_hdr_from_skb 
>> and this is what I use to generate the vnet header. It works 
>> correctly for csum and GRO with af_packet and it works correctly for 
>> everything using a tap device. It fails only on GSO + af_packet TX.
>>
>> What I am doing is the same thing virtio_net does - it just takes the 
>> output of virtio_net_hdr_from_skb and does nothing more. There should 
>> be no need to do anything more :(
>>
>> It should just work.
>>
>> Unless there is a gremlin somewhere in the machinery and that gremlin 
>> needs some light to be flushed out.
>>>
>>>> I am going to continue digging into it.
>>>>
>>>> At the very least I now have a positive test case which uses the same
>>>> semantics as my code so I have something to compare to.
>>> Glad to hear that the test is helpful. I wrote it because I
>>> have run into these exact same issues in the past.
>>
>> It is. I have changes ready for it so it also supports vector IO, 
>> need to finish fighting with it.
>>
>> A.
>>
>>>
>>
>

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: BUG:af_packet fails to TX TSO frames
  2017-10-12 15:44                               ` Anton Ivanov
  2017-10-12 15:57                                   ` [uml-devel] " Anton Ivanov
@ 2017-10-12 16:30                                 ` Willem de Bruijn
  2017-10-12 17:25                                   ` Willem de Bruijn
  1 sibling, 1 reply; 27+ messages in thread
From: Willem de Bruijn @ 2017-10-12 16:30 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: Network Development, David Miller

On Thu, Oct 12, 2017 at 11:44 AM, Anton Ivanov
<anton.ivanov@cambridgegreys.com> wrote:
> Found it.
>
> Two bugs canceling each other.
> The bind sequence in:  psock_txring_vnet.c is wrong.
>
> It does the following addr.sll_protocol =    htons(ETH_P_IP);
> before calling bind.
>
> If you set addr.sll_protocol to ETH_P_ALL where it should have been in the
> first place the test program blows up with -ENOBUFS

There is no such requirement that the socket should bind to ETH_P_ALL.

> I think what is happening is that this value is taken into account when
> looking at "what should I use to segment it with" in skb_mac_gso_segment
> which is invoked at the end of the verification chain which starts in
> packet_direct_xmit in af_packet.c

packet_snd sets skb->protocol based on the protocol that the packet
socket is bound to. Binding to ETH_P_IP is the right choice here.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: BUG:af_packet fails to TX TSO frames
  2017-10-12 16:30                                 ` Willem de Bruijn
@ 2017-10-12 17:25                                   ` Willem de Bruijn
  2017-10-12 17:58                                     ` Anton Ivanov
  0 siblings, 1 reply; 27+ messages in thread
From: Willem de Bruijn @ 2017-10-12 17:25 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: Network Development, David Miller

On Thu, Oct 12, 2017 at 12:30 PM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> On Thu, Oct 12, 2017 at 11:44 AM, Anton Ivanov
> <anton.ivanov@cambridgegreys.com> wrote:
>> Found it.
>>
>> Two bugs canceling each other.
>> The bind sequence in:  psock_txring_vnet.c is wrong.
>>
>> It does the following addr.sll_protocol =    htons(ETH_P_IP);
>> before calling bind.
>>
>> If you set addr.sll_protocol to ETH_P_ALL where it should have been in the
>> first place the test program blows up with -ENOBUFS
>
> There is no such requirement that the socket should bind to ETH_P_ALL.
>
>> I think what is happening is that this value is taken into account when
>> looking at "what should I use to segment it with" in skb_mac_gso_segment
>> which is invoked at the end of the verification chain which starts in
>> packet_direct_xmit in af_packet.c
>
> packet_snd sets skb->protocol based on the protocol that the packet
> socket is bound to. Binding to ETH_P_IP is the right choice here.

To avoid having to open multiple sockets for different protocols,
sockaddr_ll can also be passed in the msg_name argument on
each call.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: BUG:af_packet fails to TX TSO frames
  2017-10-12 17:25                                   ` Willem de Bruijn
@ 2017-10-12 17:58                                     ` Anton Ivanov
  2017-10-12 18:41                                       ` Willem de Bruijn
  0 siblings, 1 reply; 27+ messages in thread
From: Anton Ivanov @ 2017-10-12 17:58 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, David Miller



On 10/12/17 18:25, Willem de Bruijn wrote:
> On Thu, Oct 12, 2017 at 12:30 PM, Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
>> On Thu, Oct 12, 2017 at 11:44 AM, Anton Ivanov
>> <anton.ivanov@cambridgegreys.com> wrote:
>>> Found it.
>>>
>>> Two bugs canceling each other.
>>> The bind sequence in:  psock_txring_vnet.c is wrong.
>>>
>>> It does the following addr.sll_protocol =    htons(ETH_P_IP);
>>> before calling bind.
>>>
>>> If you set addr.sll_protocol to ETH_P_ALL where it should have been in the
>>> first place the test program blows up with -ENOBUFS
>> There is no such requirement that the socket should bind to ETH_P_ALL.

There is no requirement to bind to ETH_P_IP either and most code 
examples going back more than 10 years to the days of TCP Illustrated 
use ALL.

I just did the v6 test - if you bind with ETH_P_IP GSO on TCPv6 is 
broken and returns NOBUF and vice versa.

>>
>>> I think what is happening is that this value is taken into account when
>>> looking at "what should I use to segment it with" in skb_mac_gso_segment
>>> which is invoked at the end of the verification chain which starts in
>>> packet_direct_xmit in af_packet.c
>> packet_snd sets skb->protocol based on the protocol that the packet
>> socket is bound to. Binding to ETH_P_IP is the right choice here.
> To avoid having to open multiple sockets for different protocols,
> sockaddr_ll can also be passed in the msg_name argument on
> each call.

Does not work for vnet headers - it honors what you bound with. I tried 
to bind with ETH_ALL and pass ETH_P_IP as an arg and it ENOBUF-ed

>

-- 
Anton R. Ivanov

Cambridge Greys Limited, England and Wales company No 10273661
http://www.cambridgegreys.com/

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: BUG:af_packet fails to TX TSO frames
  2017-10-12 17:58                                     ` Anton Ivanov
@ 2017-10-12 18:41                                       ` Willem de Bruijn
  2017-10-12 19:55                                         ` Anton Ivanov
  0 siblings, 1 reply; 27+ messages in thread
From: Willem de Bruijn @ 2017-10-12 18:41 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: Network Development, David Miller

On Thu, Oct 12, 2017 at 1:58 PM, Anton Ivanov
<anton.ivanov@cambridgegreys.com> wrote:
>
>
> On 10/12/17 18:25, Willem de Bruijn wrote:
>>
>> On Thu, Oct 12, 2017 at 12:30 PM, Willem de Bruijn
>> <willemdebruijn.kernel@gmail.com> wrote:
>>>
>>> On Thu, Oct 12, 2017 at 11:44 AM, Anton Ivanov
>>> <anton.ivanov@cambridgegreys.com> wrote:
>>>>
>>>> Found it.
>>>>
>>>> Two bugs canceling each other.
>>>> The bind sequence in:  psock_txring_vnet.c is wrong.
>>>>
>>>> It does the following addr.sll_protocol =    htons(ETH_P_IP);
>>>> before calling bind.
>>>>
>>>> If you set addr.sll_protocol to ETH_P_ALL where it should have been in
>>>> the
>>>> first place the test program blows up with -ENOBUFS
>>>
>>> There is no such requirement that the socket should bind to ETH_P_ALL.
>
>
> There is no requirement to bind to ETH_P_IP either and most code examples
> going back more than 10 years to the days of TCP Illustrated use ALL.

For send only sockets, it is often advised to pass 0 as protocol to
socket(), so as to avoid having to spend cycles on packet reception
at all.

Commit c72219b75fde ("packet: infer protocol from ethernet header
if unset") explicitly added logic to infer skb->protocol for this common
case of sockets, if using rings.

> I just did the v6 test - if you bind with ETH_P_IP GSO on TCPv6 is broken
> and returns NOBUF and vice versa.

Given that skb->protocol is set from proto, that is indeed not expected to work.

>>>> I think what is happening is that this value is taken into account when
>>>> looking at "what should I use to segment it with" in skb_mac_gso_segment
>>>> which is invoked at the end of the verification chain which starts in
>>>> packet_direct_xmit in af_packet.c
>>>
>>> packet_snd sets skb->protocol based on the protocol that the packet
>>> socket is bound to. Binding to ETH_P_IP is the right choice here.
>>
>> To avoid having to open multiple sockets for different protocols,
>> sockaddr_ll can also be passed in the msg_name argument on
>> each call.
>
>
> Does not work for vnet headers - it honors what you bound with. I tried to
> bind with ETH_ALL and pass ETH_P_IP as an arg and it ENOBUF-ed

Odd. The code for looking up proto in packet_snd looks fairly straightforward:

        /*
         *      Get and verify the address.
         */

        if (likely(saddr == NULL)) {
                dev     = packet_cached_dev_get(po);
                proto   = po->num;
                addr    = NULL;
        } else {
                err = -EINVAL;
                if (msg->msg_namelen < sizeof(struct sockaddr_ll))
                        goto out;
                if (msg->msg_namelen < (saddr->sll_halen +
offsetof(struct sockaddr_ll, sll_addr)))
                        goto out;
                proto   = saddr->sll_protocol;
                addr    = saddr->sll_addr;
                dev = dev_get_by_index(sock_net(sk), saddr->sll_ifindex);
        }

followed later by

        skb->protocol = proto;
        skb->dev = dev;

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: BUG:af_packet fails to TX TSO frames
  2017-10-12 18:41                                       ` Willem de Bruijn
@ 2017-10-12 19:55                                         ` Anton Ivanov
  0 siblings, 0 replies; 27+ messages in thread
From: Anton Ivanov @ 2017-10-12 19:55 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, David Miller

[snip]

>> There is no requirement to bind to ETH_P_IP either and most code examples
>> going back more than 10 years to the days of TCP Illustrated use ALL.
> For send only sockets, it is often advised to pass 0 as protocol to
> socket(), so as to avoid having to spend cycles on packet reception
> at all.

Normally, it would have been a single rw socket. Thankfully, I built
into the UML vector IO patchset the ability to use different fds for
read and write per virtual NIC.

Coming back to "normally" - as an application developer I would have
expected a rw socket to work and not to need 2+ sockets for this. Having
to work around at that level is IMHO a bit over the top.

>
> Commit c72219b75fde ("packet: infer protocol from ethernet header
> if unset") explicitly added logic to infer skb->protocol for this common
> case of sockets, if using rings.
I will look into it.

>
>> I just did the v6 test - if you bind with ETH_P_IP GSO on TCPv6 is broken
>> and returns NOBUF and vice versa.
> Given that skb->protocol is set from proto, that is indeed not expected to work.
>
>>>>> I think what is happening is that this value is taken into account when
>>>>> looking at "what should I use to segment it with" in skb_mac_gso_segment
>>>>> which is invoked at the end of the verification chain which starts in
>>>>> packet_direct_xmit in af_packet.c
>>>> packet_snd sets skb->protocol based on the protocol that the packet
>>>> socket is bound to. Binding to ETH_P_IP is the right choice here.
>>> To avoid having to open multiple sockets for different protocols,
>>> sockaddr_ll can also be passed in the msg_name argument on
>>> each call.
>>
>> Does not work for vnet headers - it honors what you bound with. I tried to
>> bind with ETH_ALL and pass ETH_P_IP as an arg and it ENOBUF-ed
> Odd. The code for looking up proto in packet_snd looks fairly straightforward:

I will double-check it tomorrow and send you a pull request for the
updated test application.

>
>         /*
>          *      Get and verify the address.
>          */
>
>         if (likely(saddr == NULL)) {
>                 dev     = packet_cached_dev_get(po);
>                 proto   = po->num;
>                 addr    = NULL;
>         } else {
>                 err = -EINVAL;
>                 if (msg->msg_namelen < sizeof(struct sockaddr_ll))
>                         goto out;
>                 if (msg->msg_namelen < (saddr->sll_halen +
> offsetof(struct sockaddr_ll, sll_addr)))
>                         goto out;
>                 proto   = saddr->sll_protocol;
>                 addr    = saddr->sll_addr;
>                 dev = dev_get_by_index(sock_net(sk), saddr->sll_ifindex);
>         }
>
> followed later by
>
>         skb->protocol = proto;
>         skb->dev = dev;
>


-- 
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Not BUG, feature :) af_packet fails to TX TSO frames
  2017-10-11 13:39 ` Willem de Bruijn
  2017-10-11 13:50   ` Anton Ivanov
@ 2017-10-13  7:25   ` Anton Ivanov
  1 sibling, 0 replies; 27+ messages in thread
From: Anton Ivanov @ 2017-10-13  7:25 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development

Let's reset this one.

I cannot say I am happy with the current number of caveats on using 
this. We should at least throw a small page similar to the one dedicated 
to tpacket_v2 into the kernel docs (or amend the offload docs). As I do 
not know all the corner cases well enough I find it difficult to 
volunteer on this one, it should be someone more familiar with the code. 
If they are documented, we can say that this is not a bug, it is a 
feature :)

What I can volunteer on is contributing some code to the actual 
af_packet.c. When I did the digging in the source, I noticed that tap 
has an unfair advantage over raw by allowing the user to specify buffers 
from userspace. I am happy to port that, test it and contribute it.

-- 
Anton R. Ivanov

Cambridge Greys Limited, England and Wales company No 10273661
http://www.cambridgegreys.com/

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2017-10-13  7:25 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-11  8:39 BUG:af_packet fails to TX TSO frames Anton Ivanov
2017-10-11 13:39 ` Willem de Bruijn
2017-10-11 13:50   ` Anton Ivanov
2017-10-11 15:54     ` Anton Ivanov
2017-10-11 16:26       ` Willem de Bruijn
2017-10-11 16:32         ` Anton Ivanov
2017-10-11 18:39         ` Anton Ivanov
2017-10-11 18:57           ` Willem de Bruijn
2017-10-11 19:39             ` Anton Ivanov
2017-10-11 21:02               ` [uml-devel] Fwd: " Anton Ivanov
2017-10-11 21:05               ` Willem de Bruijn
2017-10-11 21:55                 ` Anton Ivanov
2017-10-11 22:01                   ` Anton Ivanov
2017-10-12  0:19                     ` Willem de Bruijn
2017-10-12  6:11                       ` Anton Ivanov
2017-10-12  8:46                         ` Anton Ivanov
2017-10-12 13:39                           ` Willem de Bruijn
2017-10-12 14:12                             ` Anton Ivanov
2017-10-12 15:44                               ` Anton Ivanov
2017-10-12 15:57                                 ` Anton Ivanov
2017-10-12 15:57                                   ` [uml-devel] " Anton Ivanov
2017-10-12 16:30                                 ` Willem de Bruijn
2017-10-12 17:25                                   ` Willem de Bruijn
2017-10-12 17:58                                     ` Anton Ivanov
2017-10-12 18:41                                       ` Willem de Bruijn
2017-10-12 19:55                                         ` Anton Ivanov
2017-10-13  7:25   ` Not BUG, feature :) af_packet " Anton Ivanov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.