netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Martin KaFai Lau <kafai@fb.com>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: <bpf@vger.kernel.org>, <netdev@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	David Miller <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, <kernel-team@fb.com>,
	Willem de Bruijn <willemb@google.com>
Subject: Re: [PATCH v4 net-next 5/8] bpf: Keep the (rcv) timestamp behavior for the existing tc-bpf@ingress
Date: Tue, 15 Feb 2022 21:51:42 -0800	[thread overview]
Message-ID: <20220216055142.n445wwtqmqewc57a@kafai-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <f0d76498-0bb1-36a6-0c8c-b334f6fb13c2@iogearbox.net>

On Wed, Feb 16, 2022 at 12:30:53AM +0100, Daniel Borkmann wrote:
> On 2/11/22 8:13 AM, Martin KaFai Lau wrote:
> > The current tc-bpf@ingress reads and writes the __sk_buff->tstamp
> > as a (rcv) timestamp.  This patch is to backward compatible with the
> > (rcv) timestamp expectation when the skb->tstamp has a mono delivery_time.
> > 
> > If needed, the patch first saves the mono delivery_time.  Depending on
> > the static key "netstamp_needed_key", it then resets the skb->tstamp to
> > either 0 or ktime_get_real() before running the tc-bpf@ingress.  After
> > the tc-bpf prog returns, if the (rcv) timestamp in skb->tstamp has not
> > been changed, it will restore the earlier saved mono delivery_time.
> > 
> > The current logic to run tc-bpf@ingress is refactored to a new
> > bpf_prog_run_at_ingress() function and shared between cls_bpf and act_bpf.
> > The above new delivery_time save/restore logic is also done together in
> > this function.
> > 
> > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > ---
> >   include/linux/filter.h | 28 ++++++++++++++++++++++++++++
> >   net/sched/act_bpf.c    |  5 +----
> >   net/sched/cls_bpf.c    |  6 +-----
> >   3 files changed, 30 insertions(+), 9 deletions(-)
> > 
> > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > index d23e999dc032..e43e1701a80e 100644
> > --- a/include/linux/filter.h
> > +++ b/include/linux/filter.h
> > @@ -699,6 +699,34 @@ static inline void bpf_compute_data_pointers(struct sk_buff *skb)
> >   	cb->data_end  = skb->data + skb_headlen(skb);
> >   }
> > +static __always_inline u32 bpf_prog_run_at_ingress(const struct bpf_prog *prog,
> > +						   struct sk_buff *skb)
> > +{
> > +	ktime_t tstamp, saved_mono_dtime = 0;
> > +	int filter_res;
> > +
> > +	if (unlikely(skb->mono_delivery_time)) {
> > +		saved_mono_dtime = skb->tstamp;
> > +		skb->mono_delivery_time = 0;
> > +		if (static_branch_unlikely(&netstamp_needed_key))
> > +			skb->tstamp = tstamp = ktime_get_real();
> > +		else
> > +			skb->tstamp = tstamp = 0;
> > +	}
> > +
> > +	/* It is safe to push/pull even if skb_shared() */
> > +	__skb_push(skb, skb->mac_len);
> > +	bpf_compute_data_pointers(skb);
> > +	filter_res = bpf_prog_run(prog, skb);
> > +	__skb_pull(skb, skb->mac_len);
> > +
> > +	/* __sk_buff->tstamp was not changed, restore the delivery_time */
> > +	if (unlikely(saved_mono_dtime) && skb_tstamp(skb) == tstamp)
> > +		skb_set_delivery_time(skb, saved_mono_dtime, true);
> 
> So above detour is for skb->tstamp backwards compatibility so users will see real time.
> I don't see why we special case {cls,act}_bpf-only, given this will also be the case
> for other subsystems (e.g. netfilter) when they read access plain skb->tstamp and get
> the egress one instead of ktime_get_real() upon deferred skb_clear_delivery_time().
> 
> If we would generally ignore it, then the above bpf_prog_run_at_ingress() save/restore
> detour is not needed (so patch 5/6 should be dropped). (Meaning, if we need to special
> case {cls,act}_bpf only, we could also have gone for simpler bpf-only solution..)
The limitation here is there is only one skb->tstamp field.  I don't see
a bpf-only solution or not will make a difference here.

Regarding the netfilter (good point!), I only see it is used in nfnetlink_log.c
and nfnetlink_queue.c.  Like the tapping cases (earlier than the bpf run-point)
and in general other ingress cases, it cannot assume the rcv timestamp is
always there, so they can be changed like af_packet in patch 3
which is a straight forward change.  I can make the change in v5.

Going back to the cls_bpf at ingress.  If the concern is on code cleanliness,
how about removing this dance for now while the current rcv tstamp usage is
unclear at ingress.  Meaning keep the delivery_time (if any) in skb->tstamp.
This dance could be brought in later when there was breakage and legit usecase
reported.  The new bpf prog will have to use the __sk_buff->delivery_time_type
regardless if it wants to use skb->tstamp as the delivery_time, so they won't
assume delivery_time is always in skb->tstamp and it will be fine even this
dance would be brought back in later.

I prefer to keep it as-is to avoid unlikely breakage but open to remove
it for now.

Regarding patch 6, it is unrelated.  It needs to clear the
mono_delivery_time bit if the bpf writes 0 to the skb->tstamp.

  reply	other threads:[~2022-02-16  5:52 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-11  7:12 [PATCH v4 net-next 0/8] Preserve mono delivery time (EDT) in skb->tstamp Martin KaFai Lau
2022-02-11  7:12 ` [PATCH v4 net-next 1/8] net: Add skb->mono_delivery_time to distinguish mono delivery_time from (rcv) timestamp Martin KaFai Lau
2022-02-12 19:13   ` Cong Wang
2022-02-12 23:27     ` Martin KaFai Lau
2022-02-15 20:49   ` Daniel Borkmann
2022-02-16  6:11     ` Martin KaFai Lau
2022-02-11  7:12 ` [PATCH v4 net-next 2/8] net: Add skb_clear_tstamp() to keep the mono delivery_time Martin KaFai Lau
2022-02-11  7:12 ` [PATCH v4 net-next 3/8] net: Set skb->mono_delivery_time and clear it after sch_handle_ingress() Martin KaFai Lau
2022-02-15 21:04   ` Daniel Borkmann
2022-02-11  7:12 ` [PATCH v4 net-next 4/8] net: Postpone skb_clear_delivery_time() until knowing the skb is delivered locally Martin KaFai Lau
2022-02-11  7:13 ` [PATCH v4 net-next 5/8] bpf: Keep the (rcv) timestamp behavior for the existing tc-bpf@ingress Martin KaFai Lau
2022-02-15 23:30   ` Daniel Borkmann
2022-02-16  5:51     ` Martin KaFai Lau [this message]
2022-02-17  0:03       ` Daniel Borkmann
2022-02-17  1:50         ` Martin KaFai Lau
2022-02-17  6:52           ` Martin KaFai Lau
2022-02-11  7:13 ` [PATCH v4 net-next 6/8] bpf: Clear skb->mono_delivery_time bit if needed after running tc-bpf@egress Martin KaFai Lau
2022-02-11  7:13 ` [PATCH v4 net-next 7/8] bpf: Add __sk_buff->delivery_time_type and bpf_skb_set_delivery_time() Martin KaFai Lau
2022-02-11  7:13 ` [PATCH v4 net-next 8/8] bpf: selftests: test skb->tstamp in redirect_neigh Martin KaFai Lau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220216055142.n445wwtqmqewc57a@kafai-mbp.dhcp.thefacebook.com \
    --to=kafai@fb.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kernel-team@fb.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).