All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martin KaFai Lau <kafai@fb.com>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	<netdev@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	David Miller <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, <kernel-team@fb.com>
Subject: Re: [RFC PATCH v2 net-next] net: Preserve skb delivery time during forward
Date: Fri, 17 Dec 2021 09:57:05 -0800	[thread overview]
Message-ID: <20211217175705.fpbsjoqeuaul3ydp@kafai-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <32c53d3c-8393-c5ba-4a43-6e40bd2ed258@iogearbox.net>

On Fri, Dec 17, 2021 at 12:13:14PM +0100, Daniel Borkmann wrote:
> On 12/17/21 8:33 AM, Martin KaFai Lau wrote:
> > On Fri, Dec 17, 2021 at 12:42:30AM +0100, Daniel Borkmann wrote:
> > > On 12/16/21 11:58 PM, Willem de Bruijn wrote:
> > > > > > > @@ -530,7 +538,14 @@ struct skb_shared_info {
> > > > > > >           /* Warning: this field is not always filled in (UFO)! */
> > > > > > >           unsigned short  gso_segs;
> > > > > > >           struct sk_buff  *frag_list;
> > > > > > > -       struct skb_shared_hwtstamps hwtstamps;
> > > > > > > +       union {
> > > > > > > +               /* If SKBTX_DELIVERY_TSTAMP is set in tx_flags,
> > > > > > > +                * tx_delivery_tstamp is stored instead of
> > > > > > > +                * hwtstamps.
> > > > > > > +                */
> > > > > > 
> > > > > > Should we just encode the timebase and/or type { timestamp,
> > > > > > delivery_time } in th lower bits of the timestamp field? Its
> > > > > > resolution is higher than actual clock precision.
> > > > > In skb->tstamp ?
> > > > 
> > > > Yes. Arguably a hack, but those bits are in the noise now, and it
> > > > avoids the clone issue with skb_shinfo (and scarcity of flag bits
> > > > there).
> > > > 
> > > > > > is non-zero skb->tstamp test not sufficient, instead of
> > > > > > SKBTX_DELIVERY_TSTAMP_ALLOW_FWD.
> > > > > > 
> > > > > > It is if only called on the egress path. Is bpf on ingress the only
> > > > > > reason for this?
> > > > > Ah. ic.  meaning testing non-zero skb->tstamp and then call
> > > > > skb_save_delivery_time() only during the veth-egress-path:
> > > > > somewhere in veth_xmit() => veth_forward_skb() but before
> > > > > skb->tstamp was reset to 0 in __dev_forward_skb().
> > > > 
> > > > Right. If delivery_time is the only use of skb->tstamp on egress, and
> > > > timestamp is the only use on ingress, then the only time the
> > > > delivery_time needs to be cached if when looping from egress to
> > > > ingress and this field is non-zero.
> > > > 
> > > > > Keep *_forward() and bpf_out_*() unchanged (i.e. keep skb->tstamp = 0)
> > > > > because the skb->tstamp could be stamped by net_timestamp_check().
> > > > > 
> > > > > Then SKBTX_DELIVERY_TSTAMP_ALLOW_FWD is not needed.
> > > > > 
> > > > > Did I understand your suggestion correctly?
> > > > 
> > > > I think so.
> > > > 
> > > > But the reality is complicated if something may be setting a delivery
> > > > time on ingress (a BPF filter?)
> > > 
> > > I'm not quite following the 'bpf_out_*() unchanged (i.e. keep skb->tstamp = 0)'
> > > part yet; in our case we would need to preserve it as well, for example, we are
> > > redirecting via bpf from bpf@tc-ingress@host-veth to bpf@tc-egress@phys-dev in
> > > the egress path and fq sits on phys-dev.. (I mean if needed we could easily do
> > > that as shown in my prev diff with a flag for the helper).
> > Right, we have the same use case:
> >      redirecting from bpf@tc-ingress@host-veth to bpf@tc-egress@phys-dev in
> >      the egress path and fq sits on phys-dev
> > 
> > My earlier comment was on having the delivery_time preserved in
> > the skb_shared_hwtstamps.  The delivery_time (e.g. EDT) and
> > timestamp (timestamp as RX timestamp) are separately stored when
> > looping from veth-egress to veth-ingress:
> > 
> > 	delivery_time in skb_shared_hwtstamps
> > 	rx timestamp in skb->tstamp
> > 
> > Thus, when bpf_redirect_neigh(phys-dev) happens, bpf_out_*() can
> > continue to reset skb->tstamp as-is while delivery_time will
> > automatically be kept in skb_shared_hwtstamps.  When the skb
> > reaches the egress@phys-dev (__dev_queue_xmit), the delivery_time
> > in skb_shared_hwtstamps will be restored into skb->tstamp (done
> > in skb_restore_delivery_time in this patch).
> 
> I think that could probably work, also in particular if it's restored once
> on stacked devs e.g. when instead of phys-dev we're dealing with upper
> tunnel dev (e.g. vxlan/geneve + BPF with collect_md). Wouldn't we still
> need something like SKBTX_DELIVERY_TSTAMP_ALLOW_FWD, e.g. when the phys
> driver sets skb_hwtstamps(skb)->hwtstamp on RX, and this gets carried on
> the ingress path to the target namespaces' socket?
In the opposite direction phys-dev => host-veth => netns-veth,
the phys driver can set the hwtstamp but it won't set the
SKBTX_DELIVERY_TSTAMP.  This hwstamp gets carried onto the ingress
path of the target namesapce which I think is the current behavior
also ?

From phys-dev => host-veth, the skb is forwarded and its
skb->tstamp reset to 0.  veth_xmit() won't save zero
skb->tstamp into hwstamp and SKBTX_DELIVERY_TSTAMP won't
be set also.

  reply	other threads:[~2021-12-17 17:57 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-15 20:11 [RFC PATCH v2 net-next] net: Preserve skb delivery time during forward Martin KaFai Lau
2021-12-15 22:07 ` Julian Anastasov
2021-12-16 15:32 ` Willem de Bruijn
2021-12-16 22:23   ` Martin KaFai Lau
2021-12-16 22:58     ` Willem de Bruijn
2021-12-16 23:42       ` Daniel Borkmann
2021-12-17  7:33         ` Martin KaFai Lau
2021-12-17 11:13           ` Daniel Borkmann
2021-12-17 17:57             ` Martin KaFai Lau [this message]
2021-12-17  0:33       ` Martin KaFai Lau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211217175705.fpbsjoqeuaul3ydp@kafai-mbp.dhcp.thefacebook.com \
    --to=kafai@fb.com \
    --cc=ast@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kernel-team@fb.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.