From: Daniel Borkmann <daniel@iogearbox.net>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	Martin KaFai Lau <kafai@fb.com>
Cc: netdev@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	David Miller <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	kernel-team@fb.com
Subject: Re: [RFC PATCH v2 net-next] net: Preserve skb delivery time during forward
Date: Fri, 17 Dec 2021 00:42:30 +0100
Message-ID: <ca11f6f6-86f9-52c9-4251-90bf0b6f588a@iogearbox.net>
In-Reply-To: <CA+FuTSfsrMUAz-5Huf2j4f35ttqO5gpFKvsn4uJLXtRPqEaKEg@mail.gmail.com>

On 12/16/21 11:58 PM, Willem de Bruijn wrote:
>>>> @@ -530,7 +538,14 @@ struct skb_shared_info {
>>>>          /* Warning: this field is not always filled in (UFO)! */
>>>>          unsigned short  gso_segs;
>>>>          struct sk_buff  *frag_list;
>>>> -       struct skb_shared_hwtstamps hwtstamps;
>>>> +       union {
>>>> +               /* If SKBTX_DELIVERY_TSTAMP is set in tx_flags,
>>>> +                * tx_delivery_tstamp is stored instead of
>>>> +                * hwtstamps.
>>>> +                */
>>>
>>> Should we just encode the timebase and/or type { timestamp,
>>> delivery_time } in the lower bits of the timestamp field? Its
>>> resolution is higher than actual clock precision.
>> In skb->tstamp ?
> 
> Yes. Arguably a hack, but those bits are in the noise now, and it
> avoids the clone issue with skb_shinfo (and scarcity of flag bits
> there).
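A minimal userspace sketch of the low-bit encoding suggested above, assuming a 64-bit nanosecond value and using bit 0 as the type tag. TSTAMP_DELIVERY_BIT and the tstamp_*() helpers are hypothetical names for illustration only, not existing kernel API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tag: bit 0 of a nanosecond timestamp marks a delivery time. */
#define TSTAMP_DELIVERY_BIT ((uint64_t)1)

static inline bool tstamp_is_delivery_time(uint64_t v)
{
	return (v & TSTAMP_DELIVERY_BIT) != 0;
}

/* Strip the tag bit and return the (1 ns coarser) time value. */
static inline uint64_t tstamp_ns(uint64_t v)
{
	return v & ~TSTAMP_DELIVERY_BIT;
}

/* Store the time with its type in the lowest bit; costs 1 ns of resolution. */
static inline uint64_t tstamp_encode(uint64_t ns, bool is_delivery_time)
{
	uint64_t v = ns & ~TSTAMP_DELIVERY_BIT;

	if (is_delivery_time)
		v |= TSTAMP_DELIVERY_BIT;
	assert(tstamp_is_delivery_time(v) == is_delivery_time);
	return v;
}
```

One side effect of this choice: a tagged delivery time is never 0, so existing "non-zero means set" tests on the field keep working, while an untagged rx timestamp still reads back unchanged apart from its lowest bit.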
> 
>>> Is a non-zero skb->tstamp test not sufficient, instead of
>>> SKBTX_DELIVERY_TSTAMP_ALLOW_FWD?
>>>
>>> It is if only called on the egress path. Is bpf on ingress the only
>>> reason for this?
>> Ah, I see. That means testing for a non-zero skb->tstamp and then calling
>> skb_save_delivery_time() only on the veth egress path:
>> somewhere in veth_xmit() => veth_forward_skb(), but before
>> skb->tstamp is reset to 0 in __dev_forward_skb().
> 
> Right. If delivery_time is the only use of skb->tstamp on egress, and
> timestamp is the only use on ingress, then the only time the
> delivery_time needs to be cached is when looping from egress to
> ingress while this field is non-zero.
> 
>> Keep *_forward() and bpf_out_*() unchanged (i.e. keep skb->tstamp = 0)
>> because the skb->tstamp could be stamped by net_timestamp_check().
>>
>> Then SKBTX_DELIVERY_TSTAMP_ALLOW_FWD is not needed.
>>
>> Did I understand your suggestion correctly?
> 
> I think so.
> 
> But the reality is complicated if something may be setting a delivery
> time on ingress (a BPF filter?)
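A simplified userspace model of the veth egress-to-ingress loop discussed above: cache the delivery time only when skb->tstamp is non-zero, before a __dev_forward_skb()-style reset clears it. struct fake_skb, its saved_delivery slot and both functions are hypothetical stand-ins, not the real sk_buff or veth code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for struct sk_buff. */
struct fake_skb {
	uint64_t tstamp;          /* egress: delivery time; ingress: rx timestamp */
	uint64_t saved_delivery;  /* hypothetical cache slot for the delivery time */
	int has_saved_delivery;
};

/* Models the skb->tstamp = 0 reset done in __dev_forward_skb(). */
static void fake_dev_forward_skb(struct fake_skb *skb)
{
	skb->tstamp = 0;
}

/* veth-style egress -> ingress loop: cache only a non-zero delivery time. */
static void fake_veth_forward(struct fake_skb *skb)
{
	if (skb->tstamp) {
		skb->saved_delivery = skb->tstamp;
		skb->has_saved_delivery = 1;
	}
	fake_dev_forward_skb(skb);
	assert(skb->tstamp == 0);
}
```

As the reply points out, this only holds if nothing on ingress (e.g. a BPF program) also writes a delivery time into skb->tstamp; the model does not cover that case.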

I'm not quite following the 'bpf_out_*() unchanged (i.e. keep skb->tstamp = 0)'
part yet; in our case we would need to preserve it there as well. For example, we
redirect via bpf from bpf@tc-ingress@host-veth to bpf@tc-egress@phys-dev in the
egress path, and fq sits on phys-dev. (If needed, we could easily do that with a
flag for the helper, as shown in my previous diff.)
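A rough model of why the redirect case matters: if the redirect path resets skb->tstamp to 0 (as the *_forward()/bpf_out_*() paths discussed above do), an earliest-departure-time qdisc such as fq on phys-dev no longer sees the delivery time. FAKE_REDIRECT_F_KEEP_TSTAMP is a hypothetical helper flag, not the one from the earlier diff, and fake_fq_can_send() is only a loose approximation of fq's pacing check:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper flag: keep skb->tstamp across the redirect. */
#define FAKE_REDIRECT_F_KEEP_TSTAMP (1U << 0)

struct fake_skb {
	uint64_t tstamp; /* 0 means "no delivery time set" */
};

/* Model of a tc-ingress (host veth) -> tc-egress (phys dev) redirect. */
static void fake_redirect_to_egress(struct fake_skb *skb, unsigned int flags)
{
	if (!(flags & FAKE_REDIRECT_F_KEEP_TSTAMP))
		skb->tstamp = 0; /* mirrors the skb->tstamp = 0 reset */
	assert(skb->tstamp == 0 || (flags & FAKE_REDIRECT_F_KEEP_TSTAMP));
}

/* Loose model of an EDT qdisc: hold the packet until its delivery time. */
static bool fake_fq_can_send(const struct fake_skb *skb, uint64_t now_ns)
{
	return skb->tstamp == 0 || skb->tstamp <= now_ns;
}
```

Without the flag, a packet stamped for t=2000 ns becomes sendable immediately at t=1000 ns, i.e. the pacing decision made by the sender is lost.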

>> However, we still need a bit to distinguish tx_delivery_tstamp
>> from hwtstamps.
>>
>>>
>>>> +{
>>>> +       if (skb_shinfo(skb)->tx_flags & SKBTX_DELIVERY_TSTAMP_ALLOW_FWD) {
>>>> +               skb_shinfo(skb)->tx_delivery_tstamp = skb->tstamp;
>>>> +               skb_shinfo(skb)->tx_flags |= SKBTX_DELIVERY_TSTAMP;
>>>> +               skb_shinfo(skb)->tx_flags &= ~SKBTX_DELIVERY_TSTAMP_ALLOW_FWD;
>>>> +       }
>>>
>>> Is this only called when there are no clones/shares?
>> No, I don't think so. TCP clones it. I also started thinking about
>> this after noticing a mistake in the change in __tcp_transmit_skb().
>>
>> There are other places that change tx_flags, e.g. tcp_offload.c.
>> Is the skb not shared at those places, or are there specific points
>> in the stack where it is safe to change?
> 
> The packet probably is not yet shared. Until the TCP stack gives a
> packet to the IP layer, it can treat it as exclusive.
> 
> Though it does seem that these fields are accessed in a possibly racy
> manner. Drivers with hardware tx timestamp offload may set
> skb_shinfo(orig_skb)->tx_flags & SKBTX_IN_PROGRESS without checking
> whether the skb may be cloned.
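A userspace model of the clone concern raised above: clones copy the skb header but share skb_shinfo(), so writing tx_flags or tx_delivery_tstamp through one skb is visible through every clone. The check in fake_shinfo_shared() is only loosely modelled on what skb_cloned() tests; all fake_* names and FAKE_SKBTX_DELIVERY_TSTAMP are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_SKBTX_DELIVERY_TSTAMP (1U << 0) /* hypothetical flag bit */

/* Shared by an skb and all of its clones, like skb_shinfo(). */
struct fake_shinfo {
	uint8_t tx_flags;
	uint64_t tx_delivery_tstamp;
	int dataref; /* number of skbs referencing this shinfo */
};

struct fake_skb {
	uint64_t tstamp;
	struct fake_shinfo *shinfo;
};

/* A clone copies the skb header but shares the shinfo. */
static struct fake_skb fake_clone(struct fake_skb *skb)
{
	assert(skb->shinfo->dataref >= 1);
	skb->shinfo->dataref++;
	return *skb;
}

static int fake_shinfo_shared(const struct fake_skb *skb)
{
	return skb->shinfo->dataref > 1;
}

/* Save the delivery time only when no clone can observe the write. */
static int fake_save_delivery_time(struct fake_skb *skb)
{
	if (fake_shinfo_shared(skb))
		return -1; /* writing would silently change the clone too */
	skb->shinfo->tx_delivery_tstamp = skb->tstamp;
	skb->shinfo->tx_flags |= FAKE_SKBTX_DELIVERY_TSTAMP;
	return 0;
}
```

Refusing the write is just one option for the sketch; the real code could instead unshare the data first or only save the value at a point where the skb is known to be exclusive, as suggested above for TCP before it hands the packet to IP.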


Thread overview: 10+ messages
2021-12-15 20:11 [RFC PATCH v2 net-next] net: Preserve skb delivery time during forward Martin KaFai Lau
2021-12-15 22:07 ` Julian Anastasov
2021-12-16 15:32 ` Willem de Bruijn
2021-12-16 22:23   ` Martin KaFai Lau
2021-12-16 22:58     ` Willem de Bruijn
2021-12-16 23:42       ` Daniel Borkmann [this message]
2021-12-17  7:33         ` Martin KaFai Lau
2021-12-17 11:13           ` Daniel Borkmann
2021-12-17 17:57             ` Martin KaFai Lau
2021-12-17  0:33       ` Martin KaFai Lau
