All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Borkmann <daniel@iogearbox.net>
To: Martin KaFai Lau <kafai@fb.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	netdev@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	David Miller <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	kernel-team@fb.com
Subject: Re: [RFC PATCH v2 net-next] net: Preserve skb delivery time during forward
Date: Fri, 17 Dec 2021 12:13:14 +0100	[thread overview]
Message-ID: <32c53d3c-8393-c5ba-4a43-6e40bd2ed258@iogearbox.net> (raw)
In-Reply-To: <20211217073351.k4thkro443m3fnme@kafai-mbp.dhcp.thefacebook.com>

On 12/17/21 8:33 AM, Martin KaFai Lau wrote:
> On Fri, Dec 17, 2021 at 12:42:30AM +0100, Daniel Borkmann wrote:
>> On 12/16/21 11:58 PM, Willem de Bruijn wrote:
>>>>>> @@ -530,7 +538,14 @@ struct skb_shared_info {
>>>>>>           /* Warning: this field is not always filled in (UFO)! */
>>>>>>           unsigned short  gso_segs;
>>>>>>           struct sk_buff  *frag_list;
>>>>>> -       struct skb_shared_hwtstamps hwtstamps;
>>>>>> +       union {
>>>>>> +               /* If SKBTX_DELIVERY_TSTAMP is set in tx_flags,
>>>>>> +                * tx_delivery_tstamp is stored instead of
>>>>>> +                * hwtstamps.
>>>>>> +                */
>>>>>
>>>>> Should we just encode the timebase and/or type { timestamp,
>>>>> delivery_time } in th lower bits of the timestamp field? Its
>>>>> resolution is higher than actual clock precision.
>>>> In skb->tstamp ?
>>>
>>> Yes. Arguably a hack, but those bits are in the noise now, and it
>>> avoids the clone issue with skb_shinfo (and scarcity of flag bits
>>> there).
>>>
>>>>> is non-zero skb->tstamp test not sufficient, instead of
>>>>> SKBTX_DELIVERY_TSTAMP_ALLOW_FWD.
>>>>>
>>>>> It is if only called on the egress path. Is bpf on ingress the only
>>>>> reason for this?
>>>> Ah. ic.  meaning testing non-zero skb->tstamp and then call
>>>> skb_save_delivery_time() only during the veth-egress-path:
>>>> somewhere in veth_xmit() => veth_forward_skb() but before
>>>> skb->tstamp was reset to 0 in __dev_forward_skb().
>>>
>>> Right. If delivery_time is the only use of skb->tstamp on egress, and
>>> timestamp is the only use on ingress, then the only time the
>>> delivery_time needs to be cached if when looping from egress to
>>> ingress and this field is non-zero.
>>>
>>>> Keep *_forward() and bpf_out_*() unchanged (i.e. keep skb->tstamp = 0)
>>>> because the skb->tstamp could be stamped by net_timestamp_check().
>>>>
>>>> Then SKBTX_DELIVERY_TSTAMP_ALLOW_FWD is not needed.
>>>>
>>>> Did I understand your suggestion correctly?
>>>
>>> I think so.
>>>
>>> But the reality is complicated if something may be setting a delivery
>>> time on ingress (a BPF filter?)
>>
>> I'm not quite following the 'bpf_out_*() unchanged (i.e. keep skb->tstamp = 0)'
>> part yet; in our case we would need to preserve it as well, for example, we are
>> redirecting via bpf from bpf@tc-ingress@host-veth to bpf@tc-egress@phys-dev in
>> the egress path and fq sits on phys-dev.. (I mean if needed we could easily do
>> that as shown in my prev diff with a flag for the helper).
> Right, we have the same use case:
>      redirecting from bpf@tc-ingress@host-veth to bpf@tc-egress@phys-dev in
>      the egress path and fq sits on phys-dev
> 
> My earlier comment was on having the delivery_time preserved in
> the skb_shared_hwtstamps.  The delivery_time (e.g. EDT) and
> timestamp (timestamp as RX timestamp) are separately stored when
> looping from veth-egress to veth-ingress:
> 
> 	delivery_time in skb_shared_hwtstamps
> 	rx timestamp in skb->tstamp
> 
> Thus, when bpf_redirect_neigh(phys-dev) happens, bpf_out_*() can
> continue to reset skb->tstamp as-is while delivery_time will
> automatically be kept in skb_shared_hwtstamps.  When the skb
> reaches the egress@phys-dev (__dev_queue_xmit), the delivery_time
> in skb_shared_hwtstamps will be restored into skb->tstamp (done
> in skb_restore_delivery_time in this patch).

I think that could probably work, also in particular if it's restored once
on stacked devs e.g. when instead of phys-dev we're dealing with upper
tunnel dev (e.g. vxlan/geneve + BPF with collect_md). Wouldn't we still
need something like SKBTX_DELIVERY_TSTAMP_ALLOW_FWD, e.g. when the phys
driver sets skb_hwtstamps(skb)->hwtstamp on RX, and this gets carried on
the ingress path to the target namespaces' socket?

Thanks,
Daniel

  reply	other threads:[~2021-12-17 11:13 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-15 20:11 [RFC PATCH v2 net-next] net: Preserve skb delivery time during forward Martin KaFai Lau
2021-12-15 22:07 ` Julian Anastasov
2021-12-16 15:32 ` Willem de Bruijn
2021-12-16 22:23   ` Martin KaFai Lau
2021-12-16 22:58     ` Willem de Bruijn
2021-12-16 23:42       ` Daniel Borkmann
2021-12-17  7:33         ` Martin KaFai Lau
2021-12-17 11:13           ` Daniel Borkmann [this message]
2021-12-17 17:57             ` Martin KaFai Lau
2021-12-17  0:33       ` Martin KaFai Lau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=32c53d3c-8393-c5ba-4a43-6e40bd2ed258@iogearbox.net \
    --to=daniel@iogearbox.net \
    --cc=ast@kernel.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kafai@fb.com \
    --cc=kernel-team@fb.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.