All of lore.kernel.org
 help / color / mirror / Atom feed
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Fred Klassen <fklassen@appneta.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	Shuah Khan <shuah@kernel.org>,
	Network Development <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-kselftest@vger.kernel.org,
	Willem de Bruijn <willemb@google.com>
Subject: Re: [PATCH net 4/4] net/udpgso_bench_tx: audit error queue
Date: Tue, 28 May 2019 13:07:02 -0400	[thread overview]
Message-ID: <CAF=yD-KcX-zCgZFVVVMU7JFy+gJwRpUoViA_mWdM4QtHNr685g@mail.gmail.com> (raw)
In-Reply-To: <9811659B-6D5A-4C4F-9CF8-735E9CA6DE4E@appneta.com>

On Tue, May 28, 2019 at 12:57 PM Fred Klassen <fklassen@appneta.com> wrote:
>
>
>
> > On May 28, 2019, at 8:08 AM, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> >
>
> I will push up latest patches soon.
>
> I did some testing and discovered that only TCP audit tests failed. They
> failed much less often when enabling poll.  Once in about 20 runs
> still failed. Therefore I commented out the TCP audit tests.

Sounds good, thanks.

> You may be interested that I reduced test lengths from 4 to 3 seconds,
> but I am still getting 3 reports per test. I picked up the extra report by
> changing 'if (tnow > treport)’ to 'if (tnow >= treport)’

Nice!

> > The only issue specific to GSO is that xmit_more can forego this
> > doorbell until the last segment. We want to complicate this logic with
> > a special case based on tx_flags. A process that cares should either
> > not use GSO, or the timestamp should be associated with the last
> > segment as I've been arguing so far.
>
> This is the area I was thinking of looking into. I’m not sure it will work
> or that it will be too messy. It may be worth a little bit of digging to
> see if there is anything there. That will be down the road a bu

Sorry, I meant  we [do not (!)] want to complicate this logic. And
definitely don't want to read skb_shinfo where that cacheline isn't
accessed already.

Given xmit_more, do you still find the first segment the right one to
move the timestamp tx_flags to in __udp_gso_segment?

>
> >>
> >> I’ll get back to you when I have tested this more thoroughly. Early results
> >> suggest that adding the -P poll() option has fixed it without any appreciable
> >> performance hit. I’ll share raw results with you, and we can make a final
> >> decision together.
> >
> > In the main loop? It still is peculiar that notifications appear to go
> > missing unless the process blocks waiting for them. Nothing in
> > sock_zerocopy_callback or the queueing onto the error queue should
> > cause drops, as far as I know.
> >
>
> Now that I know the issue is only in TCP, I can speculate that all bytes are
> being reported, but done with fewer messages. It may warrant some
> investigation in case there is some kind of bug.

This would definitely still be a bug and should not happen. We have
quite a bit of experience with TCP zerocopy and I have not run into
this in practice, so I do think that it is somehow a test artifact.

> > Indeed. Ideally even run all tests, but return error if any failed,
> > like this recent patch
> >
> >  selftests/bpf: fail test_tunnel.sh if subtests fail
> >  https://patchwork.ozlabs.org/patch/1105221/
> >
> > but that may be a lot of code churn and better left to a separate patch.
>
> I like it. I have it coded up, and it seems to work well. I’ll make a
> separate commit in the patch set so we can yank it out if you feel
> it is too much

Great. Yes, it sounds like an independent improvement, in which case
it is easier to review as a separate patch. If you already have it, by
all means send it as part of the larger patchset.

WARNING: multiple messages have this Message-ID (diff)
From: willemdebruijn.kernel at gmail.com (Willem de Bruijn)
Subject: [PATCH net 4/4] net/udpgso_bench_tx: audit error queue
Date: Tue, 28 May 2019 13:07:02 -0400	[thread overview]
Message-ID: <CAF=yD-KcX-zCgZFVVVMU7JFy+gJwRpUoViA_mWdM4QtHNr685g@mail.gmail.com> (raw)
In-Reply-To: <9811659B-6D5A-4C4F-9CF8-735E9CA6DE4E@appneta.com>

On Tue, May 28, 2019 at 12:57 PM Fred Klassen <fklassen at appneta.com> wrote:
>
>
>
> > On May 28, 2019, at 8:08 AM, Willem de Bruijn <willemdebruijn.kernel at gmail.com> wrote:
> >
>
> I will push up latest patches soon.
>
> I did some testing and discovered that only TCP audit tests failed. They
> failed much less often when enabling poll.  Once in about 20 runs
> still failed. Therefore I commented out the TCP audit tests.

Sounds good, thanks.

> You may be interested that I reduced test lengths from 4 to 3 seconds,
> but I am still getting 3 reports per test. I picked up the extra report by
> changing 'if (tnow > treport)’ to 'if (tnow >= treport)’

Nice!

> > The only issue specific to GSO is that xmit_more can forego this
> > doorbell until the last segment. We want to complicate this logic with
> > a special case based on tx_flags. A process that cares should either
> > not use GSO, or the timestamp should be associated with the last
> > segment as I've been arguing so far.
>
> This is the area I was thinking of looking into. I’m not sure it will work
> or that it will be too messy. It may be worth a little bit of digging to
> see if there is anything there. That will be down the road a bu

Sorry, I meant  we [do not (!)] want to complicate this logic. And
definitely don't want to read skb_shinfo where that cacheline isn't
accessed already.

Given xmit_more, do you still find the first segment the right one to
move the timestamp tx_flags to in __udp_gso_segment?

>
> >>
> >> I’ll get back to you when I have tested this more thoroughly. Early results
> >> suggest that adding the -P poll() option has fixed it without any appreciable
> >> performance hit. I’ll share raw results with you, and we can make a final
> >> decision together.
> >
> > In the main loop? It still is peculiar that notifications appear to go
> > missing unless the process blocks waiting for them. Nothing in
> > sock_zerocopy_callback or the queueing onto the error queue should
> > cause drops, as far as I know.
> >
>
> Now that I know the issue is only in TCP, I can speculate that all bytes are
> being reported, but done with fewer messages. It may warrant some
> investigation in case there is some kind of bug.

This would definitely still be a bug and should not happen. We have
quite a bit of experience with TCP zerocopy and I have not run into
this in practice, so I do think that it is somehow a test artifact.

> > Indeed. Ideally even run all tests, but return error if any failed,
> > like this recent patch
> >
> >  selftests/bpf: fail test_tunnel.sh if subtests fail
> >  https://patchwork.ozlabs.org/patch/1105221/
> >
> > but that may be a lot of code churn and better left to a separate patch.
>
> I like it. I have it coded up, and it seems to work well. I’ll make a
> separate commit in the patch set so we can yank it out if you feel
> it is too much

Great. Yes, it sounds like an independent improvement, in which case
it is easier to review as a separate patch. If you already have it, by
all means send it as part of the larger patchset.

WARNING: multiple messages have this Message-ID (diff)
From: willemdebruijn.kernel@gmail.com (Willem de Bruijn)
Subject: [PATCH net 4/4] net/udpgso_bench_tx: audit error queue
Date: Tue, 28 May 2019 13:07:02 -0400	[thread overview]
Message-ID: <CAF=yD-KcX-zCgZFVVVMU7JFy+gJwRpUoViA_mWdM4QtHNr685g@mail.gmail.com> (raw)
Message-ID: <20190528170702.zMUYFASj2sWvfb4qeDillfybGeEKOk8AXubON3fJN18@z> (raw)
In-Reply-To: <9811659B-6D5A-4C4F-9CF8-735E9CA6DE4E@appneta.com>

On Tue, May 28, 2019@12:57 PM Fred Klassen <fklassen@appneta.com> wrote:
>
>
>
> > On May 28, 2019,@8:08 AM, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> >
>
> I will push up latest patches soon.
>
> I did some testing and discovered that only TCP audit tests failed. They
> failed much less often when enabling poll.  Once in about 20 runs
> still failed. Therefore I commented out the TCP audit tests.

Sounds good, thanks.

> You may be interested that I reduced test lengths from 4 to 3 seconds,
> but I am still getting 3 reports per test. I picked up the extra report by
> changing 'if (tnow > treport)’ to 'if (tnow >= treport)’

Nice!

> > The only issue specific to GSO is that xmit_more can forego this
> > doorbell until the last segment. We want to complicate this logic with
> > a special case based on tx_flags. A process that cares should either
> > not use GSO, or the timestamp should be associated with the last
> > segment as I've been arguing so far.
>
> This is the area I was thinking of looking into. I’m not sure it will work
> or that it will be too messy. It may be worth a little bit of digging to
> see if there is anything there. That will be down the road a bu

Sorry, I meant  we [do not (!)] want to complicate this logic. And
definitely don't want to read skb_shinfo where that cacheline isn't
accessed already.

Given xmit_more, do you still find the first segment the right one to
move the timestamp tx_flags to in __udp_gso_segment?

>
> >>
> >> I’ll get back to you when I have tested this more thoroughly. Early results
> >> suggest that adding the -P poll() option has fixed it without any appreciable
> >> performance hit. I’ll share raw results with you, and we can make a final
> >> decision together.
> >
> > In the main loop? It still is peculiar that notifications appear to go
> > missing unless the process blocks waiting for them. Nothing in
> > sock_zerocopy_callback or the queueing onto the error queue should
> > cause drops, as far as I know.
> >
>
> Now that I know the issue is only in TCP, I can speculate that all bytes are
> being reported, but done with fewer messages. It may warrant some
> investigation in case there is some kind of bug.

This would definitely still be a bug and should not happen. We have
quite a bit of experience with TCP zerocopy and I have not run into
this in practice, so I do think that it is somehow a test artifact.

> > Indeed. Ideally even run all tests, but return error if any failed,
> > like this recent patch
> >
> >  selftests/bpf: fail test_tunnel.sh if subtests fail
> >  https://patchwork.ozlabs.org/patch/1105221/
> >
> > but that may be a lot of code churn and better left to a separate patch.
>
> I like it. I have it coded up, and it seems to work well. I’ll make a
> separate commit in the patch set so we can yank it out if you feel
> it is too much

Great. Yes, it sounds like an independent improvement, in which case
it is easier to review as a separate patch. If you already have it, by
all means send it as part of the larger patchset.

  reply	other threads:[~2019-05-28 17:07 UTC|newest]

Thread overview: 102+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-23 21:06 [PATCH net 0/4] Allow TX timestamp with UDP GSO Fred Klassen
2019-05-23 21:06 ` Fred Klassen
2019-05-23 21:06 ` fklassen
2019-05-23 21:06 ` [PATCH net 1/4] net/udp_gso: " Fred Klassen
2019-05-23 21:06   ` Fred Klassen
2019-05-23 21:06   ` fklassen
2019-05-23 21:39   ` Willem de Bruijn
2019-05-23 21:39     ` Willem de Bruijn
2019-05-23 21:39     ` willemdebruijn.kernel
2019-05-24  1:38     ` Fred Klassen
2019-05-24  1:38       ` Fred Klassen
2019-05-24  1:38       ` fklassen
2019-05-24  4:53       ` Willem de Bruijn
2019-05-24  4:53         ` Willem de Bruijn
2019-05-24  4:53         ` willemdebruijn.kernel
2019-05-24 16:34         ` Fred Klassen
2019-05-24 16:34           ` Fred Klassen
2019-05-24 16:34           ` fklassen
2019-05-24 19:29           ` Willem de Bruijn
2019-05-24 19:29             ` Willem de Bruijn
2019-05-24 19:29             ` willemdebruijn.kernel
2019-05-24 22:01             ` Fred Klassen
2019-05-24 22:01               ` Fred Klassen
2019-05-24 22:01               ` fklassen
2019-05-25 15:20               ` Willem de Bruijn
2019-05-25 15:20                 ` Willem de Bruijn
2019-05-25 15:20                 ` willemdebruijn.kernel
2019-05-25 18:47                 ` Fred Klassen
2019-05-25 18:47                   ` Fred Klassen
2019-05-25 18:47                   ` fklassen
2019-05-27  1:30                   ` Willem de Bruijn
2019-05-27  1:30                     ` Willem de Bruijn
2019-05-27  1:30                     ` willemdebruijn.kernel
2019-05-27  2:09                     ` Willem de Bruijn
2019-05-27  2:09                       ` Willem de Bruijn
2019-05-27  2:09                       ` willemdebruijn.kernel
2019-05-25 20:46     ` Fred Klassen
2019-05-25 20:46       ` Fred Klassen
2019-05-25 20:46       ` fklassen
2019-05-23 21:59   ` Willem de Bruijn
2019-05-23 21:59     ` Willem de Bruijn
2019-05-23 21:59     ` willemdebruijn.kernel
2019-05-25 20:09     ` Fred Klassen
2019-05-25 20:09       ` Fred Klassen
2019-05-25 20:09       ` fklassen
2019-05-25 20:47     ` Fred Klassen
2019-05-25 20:47       ` Fred Klassen
2019-05-25 20:47       ` fklassen
2019-05-23 21:06 ` [PATCH net 2/4] net/udpgso_bench_tx: options to exercise TX CMSG Fred Klassen
2019-05-23 21:06   ` Fred Klassen
2019-05-23 21:06   ` fklassen
2019-05-23 21:45   ` Willem de Bruijn
2019-05-23 21:45     ` Willem de Bruijn
2019-05-23 21:45     ` willemdebruijn.kernel
2019-05-23 21:52   ` Willem de Bruijn
2019-05-23 21:52     ` Willem de Bruijn
2019-05-23 21:52     ` willemdebruijn.kernel
2019-05-24  2:10     ` Fred Klassen
2019-05-24  2:10       ` Fred Klassen
2019-05-24  2:10       ` fklassen
2019-05-23 21:06 ` [PATCH net 3/4] net/udpgso_bench_tx: fix sendmmsg on unconnected socket Fred Klassen
2019-05-23 21:06   ` Fred Klassen
2019-05-23 21:06   ` fklassen
2019-05-23 21:06 ` [PATCH net 4/4] net/udpgso_bench_tx: audit error queue Fred Klassen
2019-05-23 21:06   ` Fred Klassen
2019-05-23 21:06   ` fklassen
2019-05-23 21:56   ` Willem de Bruijn
2019-05-23 21:56     ` Willem de Bruijn
2019-05-23 21:56     ` willemdebruijn.kernel
2019-05-24  1:27     ` Fred Klassen
2019-05-24  1:27       ` Fred Klassen
2019-05-24  1:27       ` fklassen
2019-05-24  5:02       ` Willem de Bruijn
2019-05-24  5:02         ` Willem de Bruijn
2019-05-24  5:02         ` willemdebruijn.kernel
2019-05-27 21:30     ` Fred Klassen
2019-05-27 21:30       ` Fred Klassen
2019-05-27 21:30       ` fklassen
2019-05-27 21:46       ` Willem de Bruijn
2019-05-27 21:46         ` Willem de Bruijn
2019-05-27 21:46         ` willemdebruijn.kernel
2019-05-27 22:56         ` Fred Klassen
2019-05-27 22:56           ` Fred Klassen
2019-05-27 22:56           ` fklassen
2019-05-28  1:15           ` Willem de Bruijn
2019-05-28  1:15             ` Willem de Bruijn
2019-05-28  1:15             ` willemdebruijn.kernel
2019-05-28  5:19             ` Fred Klassen
2019-05-28  5:19               ` Fred Klassen
2019-05-28  5:19               ` fklassen
2019-05-28 15:08               ` Willem de Bruijn
2019-05-28 15:08                 ` Willem de Bruijn
2019-05-28 15:08                 ` willemdebruijn.kernel
2019-05-28 16:57                 ` Fred Klassen
2019-05-28 16:57                   ` Fred Klassen
2019-05-28 16:57                   ` fklassen
2019-05-28 17:07                   ` Willem de Bruijn [this message]
2019-05-28 17:07                     ` Willem de Bruijn
2019-05-28 17:07                     ` willemdebruijn.kernel
2019-05-28 17:11                     ` Willem de Bruijn
2019-05-28 17:11                       ` Willem de Bruijn
2019-05-28 17:11                       ` willemdebruijn.kernel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAF=yD-KcX-zCgZFVVVMU7JFy+gJwRpUoViA_mWdM4QtHNr685g@mail.gmail.com' \
    --to=willemdebruijn.kernel@gmail.com \
    --cc=davem@davemloft.net \
    --cc=fklassen@appneta.com \
    --cc=kuznet@ms2.inr.ac.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=shuah@kernel.org \
    --cc=willemb@google.com \
    --cc=yoshfuji@linux-ipv6.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.