From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Fred Klassen <fklassen@appneta.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	Shuah Khan <shuah@kernel.org>,
	Network Development <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-kselftest@vger.kernel.org
Subject: Re: [PATCH net 1/4] net/udp_gso: Allow TX timestamp with UDP GSO
Date: Sun, 26 May 2019 21:09:03 -0500
Message-ID: <CAF=yD-+h2qJP0M5XQrcFVfyn3TP7Jd0UJ1zFf0kbUeC9uKKNxQ@mail.gmail.com>
In-Reply-To: <CAF=yD-KTJGYY-yf=+zwa8SyrCNAfZjqjomJ=B=yFcs+juDeShA@mail.gmail.com>

On Sun, May 26, 2019 at 8:30 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Sat, May 25, 2019 at 1:47 PM Fred Klassen <fklassen@appneta.com> wrote:
> >
> >
> >
> > > On May 25, 2019, at 8:20 AM, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > On Fri, May 24, 2019 at 6:01 PM Fred Klassen <fklassen@appneta.com> wrote:
> > >>
> > >>
> > >>
> > >>> On May 24, 2019, at 12:29 PM, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> > >>>
> > >>> It is the last moment that a timestamp can be generated for the last
> > >>> byte; I don't see how that is "neither the start nor the end of a GSO
> > >>> packet".
> > >>
> > >> My misunderstanding. I thought TCP timestamped the last segment, not
> > >> the last byte. In that case, your statements make sense.
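For reference, here is a sketch of the ID semantics behind "last byte" versus per-send timestamping. It assumes SOF_TIMESTAMPING_OPT_ID is enabled, and on TCP it also assumes that no data was outstanding when the option was set. On stream (TCP) sockets the reported ID is the byte offset of the last byte of the send call. On datagram (UDP) sockets the ID is a per-send counter, so one UDP GSO send yields a single ID however many segments it becomes. The helper compute_tskeys() is a hypothetical userspace model, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: model the SOF_TIMESTAMPING_OPT_ID key that each
 * send call would report, counting from when the option was enabled.
 *   stream != 0: key = offset of the last byte of the send (TCP-like)
 *   stream == 0: key = index of the send call (UDP-like)
 * Arithmetic is u32 and wraps, like the kernel's u32 tskey. */
static void compute_tskeys(const uint32_t *lens, size_t n, int stream,
			   uint32_t *keys)
{
	uint32_t bytes = 0;

	for (size_t i = 0; i < n; i++) {
		bytes += lens[i];
		keys[i] = stream ? bytes - 1 : (uint32_t)i;
	}
}
```

With sends of 1000, 500 and 3000 bytes, a stream socket reports IDs 999, 1499 and 4499, while a datagram socket reports 0, 1 and 2.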
> > >>
> > >>>> It would be interesting if a practical case can be made for timestamping
> > >>>> the last segment. In my mind, I don’t see how that would be valuable.
> > >>>
> > >>> It depends whether you are interested in measuring network latency or
> > >>> host transmit path latency.
> > >>>
> > >>> For the latter, knowing the time from the start of the sendmsg call to
> > >>> the moment the last byte hits the wire is most relevant. Or, in the absence
> > >>> of (well-defined) hardware support, the time the last byte is queued to the
> > >>> device is the next best thing.
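Here is a minimal sketch of the host transmit path measurement. Assume the application calls clock_gettime(CLOCK_REALTIME, ...) just before sendmsg(). Later it reads the software TX timestamp from the error queue with recvmsg(..., MSG_ERRQUEUE); that timestamp is ts[0] of struct scm_timestamping and is also CLOCK_REALTIME. The latency is then a plain timespec difference. The helper name ts_delta_ns() is hypothetical.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: b - a in nanoseconds. Both timestamps must come
 * from the same clock (CLOCK_REALTIME for software TX timestamps). */
static int64_t ts_delta_ns(const struct timespec *a, const struct timespec *b)
{
	return (int64_t)(b->tv_sec - a->tv_sec) * 1000000000LL +
	       (int64_t)(b->tv_nsec - a->tv_nsec);
}
```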
> > >
> > > Sounds to me like both cases have legitimate uses, and we want
> > > to support both.
> > >
> > > Implementation constraints are that storage for this timestamp
> > > information is scarce and we cannot add new cold cacheline accesses in
> > > the datapath.
> > >
> > > The simplest approach would be to unconditionally timestamp both the
> > > first and last segment. With the same ID. Not terribly elegant. But it
> > > works.
> > >
> > > If conditional, tx_flags has only one bit left. I think we can harvest
> > > some, as not all defined bits are in use at the same stages in the
> > > datapath, but that is not a trivial change. Some might also be better
> > > placed in the skb instead of skb_shinfo, which would also avoid
> > > touching that cacheline. We could possibly repurpose bits from the u32
> > > tskey.
> > >
> > > All that can come later. Initially, unless we can come up with
> > > something more elegant, I would suggest that UDP follows the rule
> > > established by TCP and timestamps the last byte. And we add an
> > > explicit SOF_TIMESTAMPING_OPT_FIRSTBYTE that is initially only
> > > supported for UDP, sets a new SKBTX_TX_FB_TSTAMP bit in
> > > __sock_tx_timestamp and is interpreted in __udp_gso_segment.
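The options discussed here can be modeled as a simple rule applied to each segment when a GSO skb is split:

- last segment only (the default),
- first segment only (via the proposed SOF_TIMESTAMPING_OPT_FIRSTBYTE),
- both first and last segment, sharing one ID.

This is a hypothetical userspace model and not the kernel's __udp_gso_segment(). struct seg_model, enum tstamp_pos and mark_gso_tstamp() are illustrative names, and the tstamp field only stands in for a SKBTX_*_TSTAMP bit in the segment's tx_flags.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of where a TX timestamp request lands after GSO. */
enum tstamp_pos {
	TSTAMP_LAST_BYTE,   /* default: follow TCP, timestamp last segment */
	TSTAMP_FIRST_BYTE,  /* models the proposed OPT_FIRSTBYTE behavior */
	TSTAMP_BOTH,        /* first and last segment, same ID */
};

struct seg_model {
	bool tstamp;        /* stands in for a SKBTX_*_TSTAMP bit */
	uint32_t tskey;     /* ID reported with the timestamp */
};

/* Mark which of the n segments from one send request a timestamp.
 * All segments carry the ID of the original send call. */
static void mark_gso_tstamp(struct seg_model *segs, size_t n,
			    enum tstamp_pos pos, uint32_t tskey)
{
	for (size_t i = 0; i < n; i++) {
		bool first = (i == 0);
		bool last = (i == n - 1);

		segs[i].tstamp = (pos == TSTAMP_FIRST_BYTE && first) ||
				 (pos == TSTAMP_LAST_BYTE && last) ||
				 (pos == TSTAMP_BOTH && (first || last));
		segs[i].tskey = tskey;
	}
}
```

In the "both" mode a single-segment send still produces only one timestamp, because that segment is both first and last.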
> > >
> >
> > I don’t see how to practically TX timestamp the last byte of any packet
> > (UDP GSO or otherwise). The best we could do is timestamp the last
> > segment, or rather the time that the last segment is queued. Let me
> > attempt to explain.
> >
> > First, let’s look at software TX timestamps, which are generated
> > by skb_tx_timestamp() in nearly every network driver’s xmit routine. Its
> > comment states:
> >
> > —————————— cut ————————————
> >  * Ethernet MAC Drivers should call this function in their hard_xmit()
> >  * function immediately before giving the sk_buff to the MAC hardware.
> > —————————— cut ————————————
> >
> > That means that the sk_buff is timestamped just before, rather
> > than just after, it is sent. To truly capture the timestamp of the last
> > byte, this routine would have to be called a second time, right
> > after handing the buffer to the MAC hardware. The user program would
> > then have to sort out the two timestamps. My guess is that this isn’t
> > something that NIC vendors would be willing to implement in their drivers.
> >
> > So the best we can do is timestamp just before the last segment.
> > Suppose UDP GSO sends 3000 bytes to a 1500-byte MTU adapter.
> > If we set the SKBTX_HW_TSTAMP flag on the last segment, the timestamp
> > occurs halfway through the burst. But it may not be exactly halfway,
> > because the segments may get queued much faster than wire rate.
> > Therefore the time between segment 1 and segment 2 may be much
> > smaller than their spacing on the wire. I would not find this
> > useful.
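Some back-of-the-envelope arithmetic for this example, as a sketch assuming a 1 Gbit/s link and ignoring preamble, SFD and inter-frame gap. A 1500-byte frame occupies the wire for 1500 * 8 / 10^9 s = 12 µs. If the driver queues both segments within a few microseconds, a software timestamp on the last segment mostly reflects queueing rather than wire time. Also, with IPv4 and no options, the largest UDP payload per 1500-byte frame is 1500 - 20 - 8 = 1472 bytes. If the 3000 bytes are UDP payload, GSO would therefore produce three segments. Both helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: serialization time of one frame in ns, ignoring
 * preamble, SFD and inter-frame gap. */
static uint64_t frame_wire_ns(uint64_t frame_bytes, uint64_t link_bps)
{
	return frame_bytes * 8 * 1000000000ULL / link_bps;
}

/* Hypothetical helper: number of GSO segments for len payload bytes
 * split into gso_size chunks (the last one may be short). */
static uint64_t gso_segs(uint64_t len, uint64_t gso_size)
{
	return (len + gso_size - 1) / gso_size;
}
```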
>
> For measuring host queueing latency, a timestamp at the existing
> skb_tx_timestamp() for the last segment is perfectly informative.

In most cases, all segments will be sent in a single xmit_more train,
in which case the device doorbell is rung when the last segment is
queued.
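Here is a toy model of that common case. It assumes each segment carries a flag that plays the role of xmit_more, meaning more segments follow in this train. The doorbell is then rung once, for the last segment, so that segment's queue time coincides with the moment the device learns about the whole train. struct xmit_trace and model_xmit_train() are hypothetical names, not driver APIs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of one or more xmit_more trains in a driver. */
struct xmit_trace {
	size_t doorbells;          /* times the device doorbell was rung */
	size_t last_doorbell_seg;  /* index of the segment that rang it last */
};

static struct xmit_trace model_xmit_train(const bool *more, size_t n)
{
	struct xmit_trace t = { 0, 0 };

	for (size_t i = 0; i < n; i++) {
		/* A real driver would post the descriptor and call
		 * skb_tx_timestamp() here, before any doorbell. */
		if (!more[i]) {
			t.doorbells++;
			t.last_doorbell_seg = i;
		}
	}
	return t;
}
```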

A device may also pause in the middle of a train, causing the rest of
the list to be requeued and resent after a tx completion frees up
descriptors and wakes the device. This seems like a relevant exception
to be able to measure.
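Here is a rough sketch of why that exception matters. It makes simplifying assumptions:

- each segment has a fixed queueing cost,
- the ring has free_desc free descriptors and is fully replenished after each stall,
- each stall lasts a fixed time.

Under these assumptions, a timestamp on the first segment never sees the stall, while one on the last segment does. seg_queue_ns() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: time (ns, relative to segment 0) at which segment i
 * of a train is queued to the device. Every free_desc segments the ring
 * is full and the rest of the train waits stall_ns for a tx completion.
 * free_desc == 0 is treated as an unlimited ring (no stalls). */
static uint64_t seg_queue_ns(size_t i, size_t free_desc,
			     uint64_t per_seg_ns, uint64_t stall_ns)
{
	uint64_t stalls = free_desc ? i / free_desc : 0;

	return (uint64_t)i * per_seg_ns + stalls * stall_ns;
}
```

Take 4 segments, 2 free descriptors, 100 ns per segment and a 10 µs stall. The last segment is queued at 10.3 µs, versus 0.3 µs with a large ring, while the first segment is queued at 0 in both cases.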

That said, I am not opposed to timestamping the first segment if we have
to make a binary choice for a default. Either option has drawbacks. See
more specific revision requests in the v2 patch.

Thread overview:
2019-05-23 21:06 [PATCH net 0/4] Allow TX timestamp with UDP GSO Fred Klassen
2019-05-23 21:06 ` [PATCH net 1/4] net/udp_gso: " Fred Klassen
2019-05-23 21:39   ` Willem de Bruijn
2019-05-24  1:38     ` Fred Klassen
2019-05-24  4:53       ` Willem de Bruijn
2019-05-24 16:34         ` Fred Klassen
2019-05-24 19:29           ` Willem de Bruijn
2019-05-24 22:01             ` Fred Klassen
2019-05-25 15:20               ` Willem de Bruijn
2019-05-25 18:47                 ` Fred Klassen
2019-05-27  1:30                   ` Willem de Bruijn
2019-05-27  2:09                     ` Willem de Bruijn [this message]
2019-05-25 20:46     ` Fred Klassen
2019-05-23 21:59   ` Willem de Bruijn
2019-05-25 20:09     ` Fred Klassen
2019-05-25 20:47     ` Fred Klassen
2019-05-23 21:06 ` [PATCH net 2/4] net/udpgso_bench_tx: options to exercise TX CMSG Fred Klassen
2019-05-23 21:45   ` Willem de Bruijn
2019-05-23 21:52   ` Willem de Bruijn
2019-05-24  2:10     ` Fred Klassen
2019-05-23 21:06 ` [PATCH net 3/4] net/udpgso_bench_tx: fix sendmmsg on unconnected socket Fred Klassen
2019-05-23 21:06 ` [PATCH net 4/4] net/udpgso_bench_tx: audit error queue Fred Klassen
2019-05-23 21:56   ` Willem de Bruijn
2019-05-24  1:27     ` Fred Klassen
2019-05-24  5:02       ` Willem de Bruijn
2019-05-27 21:30     ` Fred Klassen
2019-05-27 21:46       ` Willem de Bruijn
2019-05-27 22:56         ` Fred Klassen
2019-05-28  1:15           ` Willem de Bruijn
2019-05-28  5:19             ` Fred Klassen
2019-05-28 15:08               ` Willem de Bruijn
2019-05-28 16:57                 ` Fred Klassen
2019-05-28 17:07                   ` Willem de Bruijn
2019-05-28 17:11                     ` Willem de Bruijn
