netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Julian Anastasov <ja@ssi.bg>
To: Martin KaFai Lau <kafai@fb.com>
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	David Miller <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	kernel-team@fb.com, Willem de Bruijn <willemb@google.com>
Subject: Re: [RFC PATCH v3 net-next 3/4] net: Set skb->mono_delivery_time and clear it when delivering locally
Date: Fri, 21 Jan 2022 14:02:23 +0200 (EET)	[thread overview]
Message-ID: <ca728d81-80e8-3767-d5e-d44f6ad96e43@ssi.bg> (raw)
In-Reply-To: <20220121073045.4179438-1-kafai@fb.com>


	Hello,

On Thu, 20 Jan 2022, Martin KaFai Lau wrote:

> This patch sets the skb->mono_delivery_time to flag the skb->tstamp
> is used as the mono delivery_time (EDT) instead of the (rcv) timestamp.
> 
> skb_clear_delivery_time() is added to clear the delivery_time and set
> back to the (rcv) timestamp if needed when the skb is being delivered
> locally (to a sk).  skb_clear_delivery_time() is called in
> ip_local_deliver() and ip6_input().  In most of the regular ingress
> cases, the skb->tstamp should already have the (rcv) timestamp.
> For the egress loop back to ingress cases, the marking of the (rcv)
> timestamp is postponed from dev.c to ip_local_deliver() and
> ip6_input().
> 
> Another case needs to clear the delivery_time is the network
> tapping (e.g. af_packet by tcpdump).  Regardless of tapping at the ingress
> or egress,  the tapped skb is received by the af_packet socket, so
> it is ingress to the af_packet socket and it expects
> the (rcv) timestamp.
> 
> When tapping at egress, dev_queue_xmit_nit() is used.  It has already
> expected skb->tstamp may have delivery_time,  so it does
> skb_clone()+net_timestamp_set() to ensure the cloned skb has
> the (rcv) timestamp before passing to the af_packet sk.
> This patch only adds to clear the skb->mono_delivery_time
> bit in net_timestamp_set().
> 
> When tapping at ingress, it currently expects the skb->tstamp is either 0
> or has the (rcv) timestamp.  Meaning, the tapping at ingress path
> has already expected the skb->tstamp could be 0 and it will get
> the (rcv) timestamp by ktime_get_real() when needed.
> 
> There are two cases for tapping at ingress:
> 
> One case is af_packet queues the skb to its sk_receive_queue.  The skb
> is either not shared or new clone created.  The skb_clear_delivery_time()
> is called to clear the delivery_time (if any) before it is queued to the
> sk_receive_queue.
> 
> Another case, the ingress skb is directly copied to the rx_ring
> and tpacket_get_timestamp() is used to get the (rcv) timestamp.
> skb_tstamp() is used in tpacket_get_timestamp() to check
> the skb->mono_delivery_time bit before returning skb->tstamp.
> As mentioned earlier, the tapping@ingress has already expected
> the skb may not have the (rcv) timestamp (because no sk has asked
> for it) and has handled this case by directly calling ktime_get_real().
> 
> In __skb_tstamp_tx, it clones the egress skb and queues the clone to the
> sk_error_queue.  The outgoing skb may have the mono delivery_time while
> the (rcv) timestamp is expected for the clone, so the
> skb->mono_delivery_time bit is also cleared from the clone.
> 
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> ---
>  include/linux/skbuff.h | 27 +++++++++++++++++++++++++--
>  net/core/dev.c         |  4 +++-
>  net/core/skbuff.c      |  6 ++++--
>  net/ipv4/ip_input.c    |  1 +
>  net/ipv6/ip6_input.c   |  1 +
>  net/packet/af_packet.c |  4 +++-
>  6 files changed, 37 insertions(+), 6 deletions(-)
> 

> diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
> index 3a025c011971..35311ca75496 100644
> --- a/net/ipv4/ip_input.c
> +++ b/net/ipv4/ip_input.c
> @@ -244,6 +244,7 @@ int ip_local_deliver(struct sk_buff *skb)
>  	 */
>  	struct net *net = dev_net(skb->dev);
>  
> +	skb_clear_delivery_time(skb);

	Is it safe to move this line into ip_local_deliver_finish ?

>  	if (ip_is_fragment(ip_hdr(skb))) {
>  		if (ip_defrag(net, skb, IP_DEFRAG_LOCAL_DELIVER))
>  			return 0;
> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> index 80256717868e..84f93864b774 100644
> --- a/net/ipv6/ip6_input.c
> +++ b/net/ipv6/ip6_input.c
> @@ -469,6 +469,7 @@ static int ip6_input_finish(struct net *net, struct sock *sk, struct sk_buff *sk
>  
>  int ip6_input(struct sk_buff *skb)
>  {
> +	skb_clear_delivery_time(skb);

	Is it safe to move this line into ip6_input_finish?
The problem for both cases is that IPVS hooks at LOCAL_IN and
can decide to forward the packet by returning NF_STOLEN and
avoiding the _finish code. In short, before reaching the
_finish code it is still not decided that packet reaches the
sockets.

>  	return NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_IN,
>  		       dev_net(skb->dev), NULL, skb, skb->dev, NULL,
>  		       ip6_input_finish);

Regards

--
Julian Anastasov <ja@ssi.bg>


  reply	other threads:[~2022-01-21 12:08 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-21  7:30 [RFC PATCH v3 net-next 0/4] Preserve mono delivery time (EDT) in skb->tstamp Martin KaFai Lau
2022-01-21  7:30 ` [RFC PATCH v3 net-next 1/4] net: Add skb->mono_delivery_time to distinguish mono delivery_time from (rcv) timestamp Martin KaFai Lau
2022-01-22 15:32   ` Willem de Bruijn
2022-01-22 19:52     ` Martin KaFai Lau
2022-01-22 20:03     ` Martin KaFai Lau
2022-01-21  7:30 ` [RFC PATCH v3 net-next 2/4] net: Add skb_clear_tstamp() to keep the mono delivery_time Martin KaFai Lau
2022-01-21  7:30 ` [RFC PATCH v3 net-next 3/4] net: Set skb->mono_delivery_time and clear it when delivering locally Martin KaFai Lau
2022-01-21 12:02   ` Julian Anastasov [this message]
2022-01-22  3:28     ` Martin KaFai Lau
2022-01-21  7:30 ` [RFC PATCH v3 net-next 4/4] bpf: Add __sk_buff->mono_delivery_time and handle __sk_buff->tstamp based on tc_at_ingress Martin KaFai Lau
2022-01-21 18:50   ` sdf
2022-01-21 20:56     ` Martin KaFai Lau
2022-01-21 22:33       ` sdf
2022-01-22 15:43 ` [RFC PATCH v3 net-next 0/4] Preserve mono delivery time (EDT) in skb->tstamp Willem de Bruijn
2022-01-22 21:05   ` Martin KaFai Lau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ca728d81-80e8-3767-d5e-d44f6ad96e43@ssi.bg \
    --to=ja@ssi.bg \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kafai@fb.com \
    --cc=kernel-team@fb.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).