* 64 bit time regression in recvmmsg()
From: Anton Ivanov @ 2019-11-29 14:34 UTC
To: linux-um

Hi all,

Unfortunately, it looks like the recent year 2038 changes have broken
compatibility for one particular syscall interface we use - recvmmsg.

The host now occasionally returns -22 (EINVAL) and the only way I see
for this to happen, looking at the source, is if it gets something
bogus as a timeout.

I think I have eliminated all other possible sources for this error.

The picture can be observed when using a 64 bit host 5.2 kernel on a
Debian 64 bit buster userspace (glibc compiled vs 4.19 headers).

The code as it is written at present retries and by sheer luck and
perseverance it manages to work, but this needs to be fixed.

Brgds,

--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um

* Re: 64 bit time regression in recvmmsg()
From: Geert Uytterhoeven @ 2019-11-29 15:05 UTC
To: Anton Ivanov; +Cc: linux-um, Arnd Bergmann

CC Arnd

On Fri, Nov 29, 2019 at 3:34 PM Anton Ivanov
<anton.ivanov@cambridgegreys.com> wrote:
> Unfortunately, it looks like the recent year 2038 changes have broken
> compatibility for one particular syscall interface we use - recvmmsg.
>
> The host now occasionally returns -22 (EINVAL) and the only way I see
> for this to happen, looking at the source, is if it gets something
> bogus as a timeout.
>
> I think I have eliminated all other possible sources for this error.
>
> The picture can be observed when using a 64 bit host 5.2 kernel on a
> Debian 64 bit buster userspace (glibc compiled vs 4.19 headers).
>
> The code as it is written at present retries and by sheer luck and
> perseverance it manages to work, but this needs to be fixed.

* Re: 64 bit time regression in recvmmsg()
From: Arnd Bergmann @ 2019-11-29 15:17 UTC
To: Geert Uytterhoeven; +Cc: Anton Ivanov, linux-um

On Fri, Nov 29, 2019 at 4:05 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> On Fri, Nov 29, 2019 at 3:34 PM Anton Ivanov
> <anton.ivanov@cambridgegreys.com> wrote:
> > Unfortunately, it looks like the recent year 2038 changes have broken
> > compatibility for one particular syscall interface we use - recvmmsg.
> >
> > The host now occasionally returns -22 (EINVAL) and the only way I see
> > for this to happen, looking at the source, is if it gets something
> > bogus as a timeout.
> >
> > I think I have eliminated all other possible sources for this error.
> >
> > The picture can be observed when using a 64 bit host 5.2 kernel on a
> > Debian 64 bit buster userspace (glibc compiled vs 4.19 headers).
> >
> > The code as it is written at present retries and by sheer luck and
> > perseverance it manages to work, but this needs to be fixed.

I only see a single call to recvmmsg() in arch/um, in uml_vector_recvmmsg(),
and this passes a NULL timeout pointer. Is this the one that broke, or
should I be looking at something else?

Do I understand you right that the regression is on a pure 64-bit system?

      Arnd

* Re: 64 bit time regression in recvmmsg()
From: Anton Ivanov @ 2019-11-29 16:34 UTC
To: Arnd Bergmann, Geert Uytterhoeven; +Cc: linux-um

On 29/11/2019 15:17, Arnd Bergmann wrote:
> On Fri, Nov 29, 2019 at 4:05 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>> On Fri, Nov 29, 2019 at 3:34 PM Anton Ivanov
>> <anton.ivanov@cambridgegreys.com> wrote:
>>> Unfortunately, it looks like the recent year 2038 changes have broken
>>> compatibility for one particular syscall interface we use - recvmmsg.
>>>
>>> The host now occasionally returns -22 (EINVAL) and the only way I see
>>> for this to happen, looking at the source, is if it gets something
>>> bogus as a timeout.
>>>
>>> I think I have eliminated all other possible sources for this error.
>>>
>>> The picture can be observed when using a 64 bit host 5.2 kernel on a
>>> Debian 64 bit buster userspace (glibc compiled vs 4.19 headers).
>>>
>>> The code as it is written at present retries and by sheer luck and
>>> perseverance it manages to work, but this needs to be fixed.
>
> I only see a single call to recvmmsg() in arch/um, in uml_vector_recvmmsg(),
> and this passes a NULL timeout pointer. Is this the one that broke, or
> should I be looking at something else?
>
> Do I understand you right that the regression is on a pure 64-bit system?

Yes.

64 bit host, Debian Buster with the stock 4.19 replaced by a 5.2.17
kernel upgrade, 5.2 and 5.3 64 bit UML also running Debian Buster inside.

It sporadically produces -EINVAL (-22) from the recvmmsg.

I went through the allocation and deallocation of the actual mmsg
several times to ensure it is not that.

IMHO it is the timespec conversion somewhere, but I cannot pinpoint the
actual cause.

The kernel command line is:

/var/autofs/local/src/linux-work/linux/vmlinux mem=2048M umid=OPX \
ubd0=/var/autofs/local/UML/OPX-3.0-Work.img \
vec0:transport=raw,ifname=pveth0,depth=128,gro=1,mac=92:9b:36:5e:38:69 \
root=/dev/ubda ro con=null con0=null,fd:2 con1=fd:0,fd:1

pveth0 is one half of a veth pair created by doing

ip link add lveth0 type veth peer name pveth0 && ifconfig pveth0 up

lveth0 is used for the host side.

Brgds,

>
>        Arnd
>

--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/

* Re: 64 bit time regression in recvmmsg()
From: Arnd Bergmann @ 2019-12-04 21:08 UTC
To: Anton Ivanov; +Cc: Geert Uytterhoeven, linux-um

On Fri, Nov 29, 2019 at 5:34 PM Anton Ivanov
<anton.ivanov@cambridgegreys.com> wrote:
> On 29/11/2019 15:17, Arnd Bergmann wrote:
> > On Fri, Nov 29, 2019 at 4:05 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> >> On Fri, Nov 29, 2019 at 3:34 PM Anton Ivanov
> >> <anton.ivanov@cambridgegreys.com> wrote:
> >>> Unfortunately, it looks like the recent year 2038 changes have broken
> >>> compatibility for one particular syscall interface we use - recvmmsg.
> >>>
> >>> The host now occasionally returns -22 (EINVAL) and the only way I see
> >>> for this to happen, looking at the source, is if it gets something
> >>> bogus as a timeout.
> >>>
> >>> I think I have eliminated all other possible sources for this error.
> >>>
> >>> The picture can be observed when using a 64 bit host 5.2 kernel on a
> >>> Debian 64 bit buster userspace (glibc compiled vs 4.19 headers).
> >>>
> >>> The code as it is written at present retries and by sheer luck and
> >>> perseverance it manages to work, but this needs to be fixed.
> >
> > I only see a single call to recvmmsg() in arch/um, in uml_vector_recvmmsg(),
> > and this passes a NULL timeout pointer. Is this the one that broke, or
> > should I be looking at something else?
> >
> > Do I understand you right that the regression is on a pure 64-bit system?
>
> Yes.
>
> 64 bit host, Debian Buster with the stock 4.19 replaced by a 5.2.17
> kernel upgrade, 5.2 and 5.3 64 bit UML also running Debian Buster inside.
>
> It sporadically produces -EINVAL (-22) from the recvmmsg.
>
> I went through the allocation and deallocation of the actual mmsg
> several times to ensure it is not that.
>
> IMHO it is the timespec conversion somewhere, but I cannot pinpoint the
> actual cause.
>

I've looked at the changes again as well now without finding anything
suspicious. The only commit of mine that significantly changes recvmmsg
is e11d4284e2f4 ("y2038: socket: Add compat_sys_recvmmsg_time64").
I tried reverting it on top of 5.2.17, but that causes a lot of conflicts.

The best suggestion I still have would be to check out the kernel before
and after this commit, to see if it is the root cause. This one was part
of the linux-5.0 merge window, so right in the middle of the range
you have identified.

      Arnd

* Re: 64 bit time regression in recvmmsg()
From: Anton Ivanov @ 2019-12-05 9:41 UTC
To: Arnd Bergmann; +Cc: Geert Uytterhoeven, linux-um

On 04/12/2019 21:08, Arnd Bergmann wrote:
> On Fri, Nov 29, 2019 at 5:34 PM Anton Ivanov
> <anton.ivanov@cambridgegreys.com> wrote:
>> On 29/11/2019 15:17, Arnd Bergmann wrote:
>>> On Fri, Nov 29, 2019 at 4:05 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>>>> On Fri, Nov 29, 2019 at 3:34 PM Anton Ivanov
>>>> <anton.ivanov@cambridgegreys.com> wrote:
>>>>> Unfortunately, it looks like the recent year 2038 changes have broken
>>>>> compatibility for one particular syscall interface we use - recvmmsg.
>>>>>
>>>>> The host now occasionally returns -22 (EINVAL) and the only way I see
>>>>> for this to happen, looking at the source, is if it gets something
>>>>> bogus as a timeout.
>>>>>
>>>>> I think I have eliminated all other possible sources for this error.
>>>>>
>>>>> The picture can be observed when using a 64 bit host 5.2 kernel on a
>>>>> Debian 64 bit buster userspace (glibc compiled vs 4.19 headers).
>>>>>
>>>>> The code as it is written at present retries and by sheer luck and
>>>>> perseverance it manages to work, but this needs to be fixed.
>>>
>>> I only see a single call to recvmmsg() in arch/um, in uml_vector_recvmmsg(),
>>> and this passes a NULL timeout pointer. Is this the one that broke, or
>>> should I be looking at something else?
>>>
>>> Do I understand you right that the regression is on a pure 64-bit system?
>>
>> Yes.
>>
>> 64 bit host, Debian Buster with the stock 4.19 replaced by a 5.2.17
>> kernel upgrade, 5.2 and 5.3 64 bit UML also running Debian Buster inside.
>>
>> It sporadically produces -EINVAL (-22) from the recvmmsg.
>>
>> I went through the allocation and deallocation of the actual mmsg
>> several times to ensure it is not that.
>>
>> IMHO it is the timespec conversion somewhere, but I cannot pinpoint the
>> actual cause.
>>
>
> I've looked at the changes again as well now without finding anything
> suspicious. The only commit of mine that significantly changes recvmmsg
> is e11d4284e2f4 ("y2038: socket: Add compat_sys_recvmmsg_time64").
> I tried reverting it on top of 5.2.17, but that causes a lot of conflicts.
>
> The best suggestion I still have would be to check out the kernel before
> and after this commit, to see if it is the root cause. This one was part
> of the linux-5.0 merge window, so right in the middle of the range
> you have identified.
>
>        Arnd
>

I am still chasing this - there is some host regression in play too.

Once I have a better understanding of what is going on I will post more
info.

Brgds,

--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/

* Re: 64 bit time regression in recvmmsg()
From: Anton Ivanov @ 2019-12-06 17:49 UTC
To: Arnd Bergmann, Geert Uytterhoeven; +Cc: linux-um

On 29/11/2019 16:34, Anton Ivanov wrote:
> On 29/11/2019 15:17, Arnd Bergmann wrote:
>> On Fri, Nov 29, 2019 at 4:05 PM Geert Uytterhoeven
>> <geert@linux-m68k.org> wrote:
>>> On Fri, Nov 29, 2019 at 3:34 PM Anton Ivanov
>>> <anton.ivanov@cambridgegreys.com> wrote:
>>>> Unfortunately, it looks like the recent year 2038 changes have broken
>>>> compatibility for one particular syscall interface we use - recvmmsg.
>>>>
>>>> The host now occasionally returns -22 (EINVAL) and the only way I see
>>>> for this to happen, looking at the source, is if it gets something
>>>> bogus as a timeout.
>>>>
>>>> I think I have eliminated all other possible sources for this error.
>>>>
>>>> The picture can be observed when using a 64 bit host 5.2 kernel on a
>>>> Debian 64 bit buster userspace (glibc compiled vs 4.19 headers).
>>>>
>>>> The code as it is written at present retries and by sheer luck and
>>>> perseverance it manages to work, but this needs to be fixed.
>>
>> I only see a single call to recvmmsg() in arch/um, in
>> uml_vector_recvmmsg(),
>> and this passes a NULL timeout pointer. Is this the one that broke, or
>> should I be looking at something else?
>>
>> Do I understand you right that the regression is on a pure 64-bit system?
>
> Yes.
>
> 64 bit host, Debian Buster with the stock 4.19 replaced by a 5.2.17
> kernel upgrade, 5.2 and 5.3 64 bit UML also running Debian Buster inside.
>
> It sporadically produces -EINVAL (-22) from the recvmmsg.
>
> I went through the allocation and deallocation of the actual mmsg
> several times to ensure it is not that.
>
> IMHO it is the timespec conversion somewhere, but I cannot pinpoint the
> actual cause.
>
> The kernel command line is:
>
> /var/autofs/local/src/linux-work/linux/vmlinux mem=2048M umid=OPX \
> ubd0=/var/autofs/local/UML/OPX-3.0-Work.img \
> vec0:transport=raw,ifname=pveth0,depth=128,gro=1,mac=92:9b:36:5e:38:69 \
> root=/dev/ubda ro con=null con0=null,fd:2 con1=fd:0,fd:1
>
> pveth0 is one half of a veth pair created by doing
>
> ip link add lveth0 type veth peer name pveth0 && ifconfig pveth0 up
>
> lveth0 is used for the host side.
>
> Brgds,
>
>>
>>         Arnd
>>
>

Arnd,

I apologize, the problem is elsewhere. I have narrowed it down: it is a
host regression in the end, not a recvmmsg/time one.

The EINVAL is occasionally returned from the guts of packet_rcv_vnet

https://elixir.bootlin.com/linux/latest/ident/packet_rcv_vnet

in af_packet. I am going to try to figure out exactly when it happens
and why.

My sincere apologies,

--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/

* Re: 64 bit time regression in recvmmsg()
From: Arnd Bergmann @ 2019-12-06 20:06 UTC
To: Anton Ivanov; +Cc: Geert Uytterhoeven, linux-um

On Fri, Dec 6, 2019 at 6:50 PM Anton Ivanov
<anton.ivanov@cambridgegreys.com> wrote:
> On 29/11/2019 16:34, Anton Ivanov wrote:
>
> I apologize, the problem is elsewhere. I have narrowed it down: it is a
> host regression in the end, not a recvmmsg/time one.
>
> The EINVAL is occasionally returned from the guts of packet_rcv_vnet
>
> https://elixir.bootlin.com/linux/latest/ident/packet_rcv_vnet
>
> in af_packet. I am going to try to figure out exactly when it happens
> and why.
>
> My sincere apologies,

No worries, I'm glad it wasn't me this time ;-)

      Arnd

* Invalid GSO - from 4.x (TBA) to 5.5-rc1. Was: Re: 64 bit time regression in recvmmsg()
From: Anton Ivanov @ 2019-12-09 9:19 UTC
To: Arnd Bergmann; +Cc: linux-um, netdev

On 06/12/2019 20:06, Arnd Bergmann wrote:
> On Fri, Dec 6, 2019 at 6:50 PM Anton Ivanov
> <anton.ivanov@cambridgegreys.com> wrote:
>> On 29/11/2019 16:34, Anton Ivanov wrote:
>>
>> I apologize, the problem is elsewhere. I have narrowed it down: it is a
>> host regression in the end, not a recvmmsg/time one.
>>
>> The EINVAL is occasionally returned from the guts of packet_rcv_vnet
>>
>> https://elixir.bootlin.com/linux/latest/ident/packet_rcv_vnet
>>
>> in af_packet. I am going to try to figure out exactly when it happens
>> and why.
>>
>> My sincere apologies,
>
> No worries, I'm glad it wasn't me this time ;-)

At some point in late 4.x (I am going to try narrowing down the
version), the gso code introduced a condition which should not occur:

We have

sinfo(skb)->gso_type = 0

while

sinfo(skb)->gso_size = 752
skb->len = 818
skb->data_len = 0

As a result we get a skb which is GSO, but has no GSO type.

This shows up in virtio_net_hdr_from_skb(), which detects the condition
as invalid and returns -EINVAL.

A few interesting things.

1. Size is always 752.

2. I have found it while tracing an -EINVAL when using recvmmsg() on raw
sockets. No idea if it shows up elsewhere.

3. The environment under test is reading/writing to a raw socket opened
on a veth pair configured as follows:

ip link add l-veth0 type veth peer name p-veth0 && ifconfig p-veth0 up
ifconfig l-veth0 192.168.97.1 netmask 255.255.255.252

The UML Linux instance used as a reader/writer relies on
recvmmsg/sendmmsg on a raw socket with vnet headers enabled.

virtio_net_hdr_from_skb() is called from the af_packet packet_rcv_vnet
function.

This condition simply should not occur. A skb with no data in the frags,
gso on, gso size less than MTU and no type looks bogus.

>
>        Arnd
>

--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
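
The condition described in the message above can be sketched as a small userspace model. This is only an illustration, not the kernel source: the struct, the enum values and the function name are hypothetical, and the model assumes that the header conversion accepts a GSO skb only when one of the TCP type bits is set and rejects everything else with -EINVAL.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's GSO type bits (not the real values). */
enum model_gso_type {
	MODEL_GSO_NONE  = 0,
	MODEL_GSO_TCPV4 = 1 << 0,
	MODEL_GSO_TCPV6 = 1 << 1,
};

/* Hypothetical, stripped-down view of the skb fields quoted in the report. */
struct model_skb {
	unsigned int len;	/* total frame length */
	unsigned int data_len;	/* bytes held in frags, 0 means fully linear */
	uint16_t gso_size;	/* non-zero means the skb is treated as GSO */
	unsigned int gso_type;	/* bitmask of MODEL_GSO_* */
};

/*
 * Model of the validity check: a frame with a GSO size but no recognised
 * GSO type cannot be described in a vnet header, so it is rejected.
 * Returns 0 when the frame is acceptable and -EINVAL otherwise.
 */
static int model_vnet_hdr_check(const struct model_skb *skb)
{
	if (skb->gso_size == 0)
		return 0;	/* not GSO, nothing to validate */

	if (skb->gso_type & (MODEL_GSO_TCPV4 | MODEL_GSO_TCPV6))
		return 0;	/* GSO with a known type */

	return -EINVAL;		/* GSO size set, but no type: the reported case */
}
```

With the values from the report (len 818, data_len 0, gso_size 752, gso_type 0) the model returns -EINVAL; the same frame with a TCP type bit set, or with gso_size 0, passes.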

* Re: Invalid GSO - from 4.x (TBA) to 5.5-rc1. Was: Re: 64 bit time regression in recvmmsg()
From: Anton Ivanov @ 2019-12-09 10:05 UTC
To: Arnd Bergmann; +Cc: netdev, linux-um

On 09/12/2019 09:19, Anton Ivanov wrote:
> On 06/12/2019 20:06, Arnd Bergmann wrote:
>> On Fri, Dec 6, 2019 at 6:50 PM Anton Ivanov
>> <anton.ivanov@cambridgegreys.com> wrote:
>>> On 29/11/2019 16:34, Anton Ivanov wrote:
>>>
>>> I apologize, the problem is elsewhere. I have narrowed it down: it is a
>>> host regression in the end, not a recvmmsg/time one.
>>>
>>> The EINVAL is occasionally returned from the guts of packet_rcv_vnet
>>>
>>> https://elixir.bootlin.com/linux/latest/ident/packet_rcv_vnet
>>>
>>> in af_packet. I am going to try to figure out exactly when it happens
>>> and why.
>>>
>>> My sincere apologies,
>>
>> No worries, I'm glad it wasn't me this time ;-)
>
> At some point in late 4.x (I am going to try narrowing down the
> version), the gso code introduced a condition which should not occur:
>
> We have
>
> sinfo(skb)->gso_type = 0
>
> while
>
> sinfo(skb)->gso_size = 752
> skb->len = 818
> skb->data_len = 0
>
> As a result we get a skb which is GSO, but has no GSO type.
>
> This shows up in virtio_net_hdr_from_skb(), which detects the condition
> as invalid and returns -EINVAL.
>
> A few interesting things.
>
> 1. Size is always 752.
>
> 2. I have found it while tracing an -EINVAL when using recvmmsg() on raw
> sockets. No idea if it shows up elsewhere.
>
> 3. The environment under test is reading/writing to a raw socket opened
> on a veth pair configured as follows:
>
> ip link add l-veth0 type veth peer name p-veth0 && ifconfig p-veth0 up
> ifconfig l-veth0 192.168.97.1 netmask 255.255.255.252
>
> The UML Linux instance used as a reader/writer relies on
> recvmmsg/sendmmsg on a raw socket with vnet headers enabled.
>
> virtio_net_hdr_from_skb() is called from the af_packet packet_rcv_vnet
> function.
>
> This condition simply should not occur. A skb with no data in the frags,
> gso on, gso size less than MTU and no type looks bogus.

While the gso marking on the frame is bogus, the frame itself actually
looks good (as seen from the UML used as a reader).

I get a ~10% throughput increase because recvmmsg does not get broken
into smaller vector sizes all the time, and no frames are reported as
dropped due to checksum or other issues.

I will send a patch with a proposed fix shortly.

We should still find who is producing these frames though.

>>
>>         Arnd
>>
>

--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
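
Why a single rejected frame breaks a receive vector into smaller pieces can be illustrated with a toy model. It is a sketch under stated assumptions, not the kernel implementation: every name below is hypothetical, the model follows the recvmmsg(2) manual page in that an error after at least one received message makes the call return the count so far and report the error on a later call, and it additionally assumes that the offending frame is discarded.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical frame descriptor: only records whether the receive path rejects it. */
struct model_frame {
	int bogus_gso;	/* non-zero: GSO size set but no GSO type */
};

/* Hypothetical socket state: a queue, a read cursor and a latched error. */
struct model_sock {
	const struct model_frame *queue;
	size_t queued;		/* number of frames in the queue */
	size_t next;		/* index of the next frame to read */
	int pending_err;	/* positive errno left over from an earlier call */
};

/*
 * Model of one batched receive of up to vlen frames.
 * Returns the number of frames received, or a negative errno.
 * An empty queue yields 0 here; a real non-blocking socket would
 * report EAGAIN instead.
 */
static int model_recvmmsg(struct model_sock *sk, size_t vlen)
{
	size_t got = 0;

	/* An error latched by the previous call is reported first. */
	if (sk->pending_err) {
		int err = sk->pending_err;

		sk->pending_err = 0;
		return -err;
	}

	while (got < vlen && sk->next < sk->queued) {
		const struct model_frame *f = &sk->queue[sk->next++];

		if (f->bogus_gso) {
			if (got == 0)
				return -EINVAL;		/* nothing received yet */
			sk->pending_err = EINVAL;	/* report on the next call */
			break;				/* the batch is cut short */
		}
		got++;
	}
	return (int)got;
}
```

For a queue of six frames in which the fourth is rejected, a caller asking for up to eight frames gets 3, then -EINVAL, then 2, so three calls are needed where one would have been enough without the bogus frame.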