From: Martin KaFai Lau <kafai@fb.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: <netdev@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	David Miller <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, <kernel-team@fb.com>
Subject: Re: [RFC PATCH v2 net-next] net: Preserve skb delivery time during forward
Date: Thu, 16 Dec 2021 16:33:07 -0800
Message-ID: <20211217003307.hm6yoznmpfu5jd26@kafai-mbp.dhcp.thefacebook.com>
In-Reply-To: <CA+FuTSfsrMUAz-5Huf2j4f35ttqO5gpFKvsn4uJLXtRPqEaKEg@mail.gmail.com>

On Thu, Dec 16, 2021 at 05:58:49PM -0500, Willem de Bruijn wrote:
> > > > @@ -530,7 +538,14 @@ struct skb_shared_info {
> > > >         /* Warning: this field is not always filled in (UFO)! */
> > > >         unsigned short  gso_segs;
> > > >         struct sk_buff  *frag_list;
> > > > -       struct skb_shared_hwtstamps hwtstamps;
> > > > +       union {
> > > > +               /* If SKBTX_DELIVERY_TSTAMP is set in tx_flags,
> > > > +                * tx_delivery_tstamp is stored instead of
> > > > +                * hwtstamps.
> > > > +                */
> > >
> > > Should we just encode the timebase and/or type { timestamp,
> > > > delivery_time } in the lower bits of the timestamp field? Its
> > > resolution is higher than actual clock precision.
> > In skb->tstamp ?
> 
> Yes. Arguably a hack, but those bits are in the noise now, and it
> avoids the clone issue with skb_shinfo (and scarcity of flag bits
> there).
> 
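
To make the low-bit idea quoted above concrete, here is a minimal userspace sketch. It assumes a 64-bit nanosecond timestamp and borrows bit 0 as the type marker. The macro and function names are hypothetical, and nothing here is existing kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: bit 0 of a nanosecond timestamp marks a delivery time. */
#define TSTAMP_DELIVERY_BIT ((int64_t)1)

/* Store a delivery time: force the low bit to 1. */
static int64_t tstamp_encode_delivery(int64_t ns)
{
	assert(ns > 0);	/* 0 is reserved for "no timestamp set" */
	return ns | TSTAMP_DELIVERY_BIT;
}

/* Store a receive timestamp: force the low bit to 0. */
static int64_t tstamp_encode_rx(int64_t ns)
{
	return ns & ~TSTAMP_DELIVERY_BIT;
}

/* True only for a non-zero value whose low bit is set. */
static bool tstamp_is_delivery(int64_t t)
{
	return t != 0 && (t & TSTAMP_DELIVERY_BIT);
}

/* Recover the time value; the encoding costs at most 1 ns of precision. */
static int64_t tstamp_value(int64_t t)
{
	return t & ~TSTAMP_DELIVERY_BIT;
}
```

Because the marker travels inside the value itself, a clone carries it without touching skb_shinfo, which is the property argued for above.
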
> > >
> > > is non-zero skb->tstamp test not sufficient, instead of
> > > SKBTX_DELIVERY_TSTAMP_ALLOW_FWD.
> > >
> > > It is if only called on the egress path. Is bpf on ingress the only
> > > reason for this?
> > Ah. ic.  meaning testing non-zero skb->tstamp and then call
> > skb_save_delivery_time() only during the veth-egress-path:
> > somewhere in veth_xmit() => veth_forward_skb() but before
> > skb->tstamp was reset to 0 in __dev_forward_skb().
> 
> Right. If delivery_time is the only use of skb->tstamp on egress, and
> timestamp is the only use on ingress, then the only time the
> delivery_time needs to be cached is when looping from egress to
> ingress and this field is non-zero.
> 
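
A rough userspace model of the ordering discussed above: save a non-zero skb->tstamp on the veth egress path, before the forward step clears it. The struct, the saved_delivery_time field, and the model_* functions are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for an skb. */
struct loop_skb {
	int64_t tstamp;			/* delivery time on egress */
	int64_t saved_delivery_time;	/* cache that survives the loop */
};

/* Cache the delivery time only when one is actually set. */
static void model_save_delivery_time(struct loop_skb *skb)
{
	if (skb->tstamp)
		skb->saved_delivery_time = skb->tstamp;
}

/* Models the reset that the forward path applies to the timestamp. */
static void model_dev_forward(struct loop_skb *skb)
{
	skb->tstamp = 0;
}

/* Egress to ingress loop: save first, then forward. */
static void model_veth_xmit(struct loop_skb *skb)
{
	model_save_delivery_time(skb);
	model_dev_forward(skb);
}
```
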
> >
> > Keep *_forward() and bpf_out_*() unchanged (i.e. keep skb->tstamp = 0)
> > because the skb->tstamp could be stamped by net_timestamp_check().
> >
> > Then SKBTX_DELIVERY_TSTAMP_ALLOW_FWD is not needed.
> >
> > Did I understand your suggestion correctly?
> 
> I think so.
> 
> But the reality is complicated if something may be setting a delivery
> time on ingress (a BPF filter?)
If bpf@ingress needs to set a delivery_time, the only reasonable
use case is to eventually egress the skb by calling bpf_redirect_neigh().
One option is a new bpf_redirect_*() helper that takes a fifth
'delivery_time' argument and has skb_do_redirect() set
the delivery_time in the skb.  An extra helper is not ideal but probably
acceptable considering the other tricky constraints we are working with.
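
Roughly what I have in mind, as a userspace model only. All names are made up, the params/plen arguments of bpf_redirect_neigh() are left out for brevity, and the return values are simplified to 0 / -1:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for an skb. */
struct redir_skb {
	int64_t tstamp;
	uint32_t out_ifindex;
};

/* Models the redirect state that the helper fills in for later use. */
struct model_redirect_info {
	uint32_t tgt_index;
	uint64_t flags;
	int64_t delivery_time;
};

static struct model_redirect_info redirect_info;

/* Hypothetical helper: like a redirect helper plus a delivery_time arg. */
static int model_redirect_neigh_dtime(uint32_t ifindex, uint64_t flags,
				      int64_t delivery_time)
{
	if (flags)	/* this model accepts no flags */
		return -1;
	redirect_info.tgt_index = ifindex;
	redirect_info.flags = flags;
	redirect_info.delivery_time = delivery_time;
	return 0;
}

/* Models the redirect step: apply target and delivery time to the skb. */
static void model_skb_do_redirect(struct redir_skb *skb)
{
	skb->out_ifindex = redirect_info.tgt_index;
	skb->tstamp = redirect_info.delivery_time;
	redirect_info = (struct model_redirect_info){ 0 };
}
```

The point is only that the delivery_time is set at redirect time, after any ingress-side timestamp handling is done with skb->tstamp.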

Another potential issue is that,
after looping from egress to ingress, skb->tstamp holds the delivery_time.
If the skb is passed up the stack, skb->tstamp needs to be reset to a
timestamp (skb->tstamp = ktime_get_real()) before it is used.
Not sure what the best place to do it is (?) ....hmm
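
A userspace sketch of that reset, assuming there is some marker (here a hypothetical bool) that says skb->tstamp still holds a delivery_time. timespec_get() stands in for ktime_get_real(). Where such a fixup would actually live in the rx path is exactly the open question:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical, stripped-down stand-in for an skb on the rx path. */
struct rx_skb {
	int64_t tstamp;
	bool tstamp_is_delivery_time;	/* hypothetical marker */
};

/* Userspace stand-in for ktime_get_real(): wall-clock time in ns. */
static int64_t model_ktime_get_real(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Replace a leftover delivery time with a receive timestamp. */
static void model_fixup_rx_tstamp(struct rx_skb *skb)
{
	if (!skb->tstamp_is_delivery_time)
		return;	/* already a receive timestamp (or unset) */
	skb->tstamp = model_ktime_get_real();
	skb->tstamp_is_delivery_time = false;
}
```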

> >
> > However, we still need a bit to distinguish tx_delivery_tstamp
> > from hwtstamps.
> >
> > >
> > > > +{
> > > > +       if (skb_shinfo(skb)->tx_flags & SKBTX_DELIVERY_TSTAMP_ALLOW_FWD) {
> > > > +               skb_shinfo(skb)->tx_delivery_tstamp = skb->tstamp;
> > > > +               skb_shinfo(skb)->tx_flags |= SKBTX_DELIVERY_TSTAMP;
> > > > +               skb_shinfo(skb)->tx_flags &= ~SKBTX_DELIVERY_TSTAMP_ALLOW_FWD;
> > > > +       }
> > >
> > > Is this only called when there are no clones/shares?
> > > No, I don't think so.  TCP clones it.  I also started thinking about
> > > this after noticing a mistake in the change in __tcp_transmit_skb().
> > >
> > > There are other places that change tx_flags, e.g. tcp_offload.c.
> > > Is the skb not shared at those places, or are there specific points
> > > in the stack where it is safe to change?
> 
> The packet probably is not yet shared. Until the TCP stack gives a
> packet to the IP layer, it can treat it as exclusive.
> 
> Though it does seem that these fields are accessed in a possibly racy
> manner. Drivers with hardware tx timestamp offload may set
> skb_shinfo(orig_skb)->tx_flags & SKBTX_IN_PROGRESS without checking
> whether the skb may be cloned.
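
To illustrate the sharing concern quoted above, a small userspace model: clones point at one shared info block, so a flag written through one skb is visible through every clone. The structs and functions are simplified, hypothetical stand-ins. The exclusivity check only loosely mirrors the idea behind skb_cloned().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_TX_FLAG 0x01	/* hypothetical flag bit */

/* Models the shared info block that all clones point to. */
struct model_shinfo {
	uint8_t tx_flags;
	int dataref;	/* number of skbs sharing this block */
};

struct clone_skb {
	struct model_shinfo *shinfo;
	bool cloned;
};

/* A clone shares the same info block and bumps the reference count. */
static struct clone_skb model_skb_clone(struct clone_skb *orig)
{
	orig->cloned = true;
	orig->shinfo->dataref++;
	return (struct clone_skb){ .shinfo = orig->shinfo, .cloned = true };
}

/* Only write tx_flags when no other skb shares the info block. */
static bool model_set_tx_flag_if_exclusive(struct clone_skb *skb, uint8_t flag)
{
	if (skb->cloned && skb->shinfo->dataref > 1)
		return false;
	skb->shinfo->tx_flags |= flag;
	return true;
}
```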


Thread overview: 10+ messages
2021-12-15 20:11 [RFC PATCH v2 net-next] net: Preserve skb delivery time during forward Martin KaFai Lau
2021-12-15 22:07 ` Julian Anastasov
2021-12-16 15:32 ` Willem de Bruijn
2021-12-16 22:23   ` Martin KaFai Lau
2021-12-16 22:58     ` Willem de Bruijn
2021-12-16 23:42       ` Daniel Borkmann
2021-12-17  7:33         ` Martin KaFai Lau
2021-12-17 11:13           ` Daniel Borkmann
2021-12-17 17:57             ` Martin KaFai Lau
2021-12-17  0:33       ` Martin KaFai Lau [this message]
