All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martin KaFai Lau <kafai@fb.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: <bpf@vger.kernel.org>, <netdev@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	David Miller <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, <kernel-team@fb.com>
Subject: Re: [RFC PATCH v3 net-next 1/4] net: Add skb->mono_delivery_time to distinguish mono delivery_time from (rcv) timestamp
Date: Sat, 22 Jan 2022 11:52:28 -0800	[thread overview]
Message-ID: <20220122195228.psu6bsodh3k3ds5q@kafai-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <CA+FuTSe1d91JC_bQvFGdoAaAEG4fur6KfzkNheA-ymnnMharXQ@mail.gmail.com>

On Sat, Jan 22, 2022 at 10:32:16AM -0500, Willem de Bruijn wrote:
> On Fri, Jan 21, 2022 at 2:30 AM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > skb->tstamp was first used as the (rcv) timestamp in real time clock base.
> > The major usage is to report it to the user (e.g. SO_TIMESTAMP).
> >
> > Later, skb->tstamp is also set as the (future) delivery_time (e.g. EDT in TCP)
> > during egress and used by the qdisc (e.g. sch_fq) to make decision on when
> > the skb can be passed to the dev.
> >
> > Currently, there is no way to tell skb->tstamp having the (rcv) timestamp
> > or the delivery_time, so it is always reset to 0 whenever forwarded
> > between egress and ingress.
> >
> > While it makes sense to always clear the (rcv) timestamp in skb->tstamp
> > to avoid confusing sch_fq that expects the delivery_time, it is a
> > performance issue [0] to clear the delivery_time if the skb finally
> > egress to a fq@phy-dev.  For example, when forwarding from egress to
> > ingress and then finally back to egress:
> >
> >             tcp-sender => veth@netns => veth@hostns => fq@eth0@hostns
> >                                      ^              ^
> >                                      reset          rest
> >
> > [0] (slide 22): https://linuxplumbersconf.org/event/11/contributions/953/attachments/867/1658/LPC_2021_BPF_Datapath_Extensions.pdf 
> >
> > This patch adds one bit skb->mono_delivery_time to flag the skb->tstamp
> > is storing the mono delivery_time instead of the (rcv) timestamp.
> >
> > The current use case is to keep the TCP mono delivery_time (EDT) and
> > to be used with sch_fq.  The later patch will also allow tc-bpf to read
> > and change the mono delivery_time.
> >
> > In the future, another bit (e.g. skb->user_delivery_time) can be added
> > for the SCM_TXTIME where the clock base is tracked by sk->sk_clockid.
> >
> > [ This patch is a prep work.  The following patch will
> >   get the other parts of the stack ready first.  Then another patch
> >   after that will finally set the skb->mono_delivery_time. ]
> >
> > skb_set_delivery_time() function is added.  It is used by the tcp_output.c
> > and during ip[6] fragmentation to assign the delivery_time to
> > the skb->tstamp and also set the skb->mono_delivery_time.
> >
> > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > ---
> >  include/linux/skbuff.h                     | 13 +++++++++++++
> >  net/bridge/netfilter/nf_conntrack_bridge.c |  5 +++--
> >  net/ipv4/ip_output.c                       |  5 +++--
> >  net/ipv4/tcp_output.c                      | 16 +++++++++-------
> >  net/ipv6/ip6_output.c                      |  5 +++--
> >  net/ipv6/netfilter.c                       |  5 +++--
> >  6 files changed, 34 insertions(+), 15 deletions(-)
> >
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index bf11e1fbd69b..b9e20187242a 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -720,6 +720,10 @@ typedef unsigned char *sk_buff_data_t;
> >   *     @dst_pending_confirm: need to confirm neighbour
> >   *     @decrypted: Decrypted SKB
> >   *     @slow_gro: state present at GRO time, slower prepare step required
> > + *     @mono_delivery_time: When set, skb->tstamap has the
> 
> tstamp
> 
> > + *             delivery_time in mono clock base (i.e. EDT).  Otherwise, the
> > + *             skb->tstamp has the (rcv) timestamp at ingress and
> > + *             delivery_time at egress.
> >   *     @napi_id: id of the NAPI struct this skb came from
> >   *     @sender_cpu: (aka @napi_id) source CPU in XPS
> >   *     @secmark: security marking
> > @@ -890,6 +894,7 @@ struct sk_buff {
> >         __u8                    decrypted:1;
> >  #endif
> >         __u8                    slow_gro:1;
> > +       __u8                    mono_delivery_time:1;
> 
> This bit fills a hole, does not change sk_buff size, right?
> 
> >
> >  #ifdef CONFIG_NET_SCHED
> >         __u16                   tc_index;       /* traffic control index */
> > @@ -3903,6 +3908,14 @@ static inline ktime_t net_invalid_timestamp(void)
> >         return 0;
> >  }
> >
> > +static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
> > +                                        bool mono)
> > +{
> > +       skb->tstamp = kt;
> > +       /* Setting mono_delivery_time will be enabled later */
> > +       /* skb->mono_delivery_time = kt && mono; */
> > +}
> > +
> >  static inline u8 skb_metadata_len(const struct sk_buff *skb)
> >  {
> >         return skb_shinfo(skb)->meta_len;
> > diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c
> > index fdbed3158555..ebfb2a5c59e4 100644
> > --- a/net/bridge/netfilter/nf_conntrack_bridge.c
> > +++ b/net/bridge/netfilter/nf_conntrack_bridge.c
> > @@ -32,6 +32,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
> >                                            struct sk_buff *))
> >  {
> >         int frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size;
> > +       bool mono_delivery_time = skb->mono_delivery_time;
> 
> This use of a local variable is just a style choice, not needed for
> correctness, correct?
It is needed.  The current code also saves the very first
skb->tstamp (two lines below):

> >         unsigned int hlen, ll_rs, mtu;
> >         ktime_t tstamp = skb->tstamp;

My understanding is all frags reuse the first (/original) skb tstamp info.
The frags output() is one-by-one, meaning when sending the later frags,
the first skb has already been output()-ed, so the first skb tstamp info
needs to be saved in local variables.

  reply	other threads:[~2022-01-22 19:53 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-21  7:30 [RFC PATCH v3 net-next 0/4] Preserve mono delivery time (EDT) in skb->tstamp Martin KaFai Lau
2022-01-21  7:30 ` [RFC PATCH v3 net-next 1/4] net: Add skb->mono_delivery_time to distinguish mono delivery_time from (rcv) timestamp Martin KaFai Lau
2022-01-22 15:32   ` Willem de Bruijn
2022-01-22 19:52     ` Martin KaFai Lau [this message]
2022-01-22 20:03     ` Martin KaFai Lau
2022-01-21  7:30 ` [RFC PATCH v3 net-next 2/4] net: Add skb_clear_tstamp() to keep the mono delivery_time Martin KaFai Lau
2022-01-21  7:30 ` [RFC PATCH v3 net-next 3/4] net: Set skb->mono_delivery_time and clear it when delivering locally Martin KaFai Lau
2022-01-21 12:02   ` Julian Anastasov
2022-01-22  3:28     ` Martin KaFai Lau
2022-01-21  7:30 ` [RFC PATCH v3 net-next 4/4] bpf: Add __sk_buff->mono_delivery_time and handle __sk_buff->tstamp based on tc_at_ingress Martin KaFai Lau
2022-01-21 18:50   ` sdf
2022-01-21 20:56     ` Martin KaFai Lau
2022-01-21 22:33       ` sdf
2022-01-22 15:43 ` [RFC PATCH v3 net-next 0/4] Preserve mono delivery time (EDT) in skb->tstamp Willem de Bruijn
2022-01-22 21:05   ` Martin KaFai Lau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220122195228.psu6bsodh3k3ds5q@kafai-mbp.dhcp.thefacebook.com \
    --to=kafai@fb.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kernel-team@fb.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.