All of lore.kernel.org
 help / color / mirror / Atom feed
From: Julian Anastasov <ja@ssi.bg>
To: Martin KaFai Lau <kafai@fb.com>
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	David Miller <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	kernel-team@fb.com, Willem de Bruijn <willemb@google.com>
Subject: Re: [RFC PATCH v3 net-next 3/4] net: Set skb->mono_delivery_time and clear it when delivering locally
Date: Fri, 21 Jan 2022 14:02:23 +0200 (EET)	[thread overview]
Message-ID: <ca728d81-80e8-3767-d5e-d44f6ad96e43@ssi.bg> (raw)
In-Reply-To: <20220121073045.4179438-1-kafai@fb.com>


	Hello,

On Thu, 20 Jan 2022, Martin KaFai Lau wrote:

> This patch sets the skb->mono_delivery_time to flag the skb->tstamp
> is used as the mono delivery_time (EDT) instead of the (rcv) timestamp.
> 
> skb_clear_delivery_time() is added to clear the delivery_time and set
> back to the (rcv) timestamp if needed when the skb is being delivered
> locally (to a sk).  skb_clear_delivery_time() is called in
> ip_local_deliver() and ip6_input().  In most of the regular ingress
> cases, the skb->tstamp should already have the (rcv) timestamp.
> For the egress loop back to ingress cases, the marking of the (rcv)
> timestamp is postponed from dev.c to ip_local_deliver() and
> ip6_input().
> 
> Another case needs to clear the delivery_time is the network
> tapping (e.g. af_packet by tcpdump).  Regardless of tapping at the ingress
> or egress,  the tapped skb is received by the af_packet socket, so
> it is ingress to the af_packet socket and it expects
> the (rcv) timestamp.
> 
> When tapping at egress, dev_queue_xmit_nit() is used.  It has already
> expected skb->tstamp may have delivery_time,  so it does
> skb_clone()+net_timestamp_set() to ensure the cloned skb has
> the (rcv) timestamp before passing to the af_packet sk.
> This patch only adds to clear the skb->mono_delivery_time
> bit in net_timestamp_set().
> 
> When tapping at ingress, it currently expects the skb->tstamp is either 0
> or has the (rcv) timestamp.  Meaning, the tapping at ingress path
> has already expected the skb->tstamp could be 0 and it will get
> the (rcv) timestamp by ktime_get_real() when needed.
> 
> There are two cases for tapping at ingress:
> 
> One case is af_packet queues the skb to its sk_receive_queue.  The skb
> is either not shared or new clone created.  The skb_clear_delivery_time()
> is called to clear the delivery_time (if any) before it is queued to the
> sk_receive_queue.
> 
> Another case, the ingress skb is directly copied to the rx_ring
> and tpacket_get_timestamp() is used to get the (rcv) timestamp.
> skb_tstamp() is used in tpacket_get_timestamp() to check
> the skb->mono_delivery_time bit before returning skb->tstamp.
> As mentioned earlier, the tapping@ingress has already expected
> the skb may not have the (rcv) timestamp (because no sk has asked
> for it) and has handled this case by directly calling ktime_get_real().
> 
> In __skb_tstamp_tx, it clones the egress skb and queues the clone to the
> sk_error_queue.  The outgoing skb may have the mono delivery_time while
> the (rcv) timestamp is expected for the clone, so the
> skb->mono_delivery_time bit is also cleared from the clone.
> 
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> ---
>  include/linux/skbuff.h | 27 +++++++++++++++++++++++++--
>  net/core/dev.c         |  4 +++-
>  net/core/skbuff.c      |  6 ++++--
>  net/ipv4/ip_input.c    |  1 +
>  net/ipv6/ip6_input.c   |  1 +
>  net/packet/af_packet.c |  4 +++-
>  6 files changed, 37 insertions(+), 6 deletions(-)
> 

> diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
> index 3a025c011971..35311ca75496 100644
> --- a/net/ipv4/ip_input.c
> +++ b/net/ipv4/ip_input.c
> @@ -244,6 +244,7 @@ int ip_local_deliver(struct sk_buff *skb)
>  	 */
>  	struct net *net = dev_net(skb->dev);
>  
> +	skb_clear_delivery_time(skb);

	Is it safe to move this line into ip_local_deliver_finish ?

>  	if (ip_is_fragment(ip_hdr(skb))) {
>  		if (ip_defrag(net, skb, IP_DEFRAG_LOCAL_DELIVER))
>  			return 0;
> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> index 80256717868e..84f93864b774 100644
> --- a/net/ipv6/ip6_input.c
> +++ b/net/ipv6/ip6_input.c
> @@ -469,6 +469,7 @@ static int ip6_input_finish(struct net *net, struct sock *sk, struct sk_buff *sk
>  
>  int ip6_input(struct sk_buff *skb)
>  {
> +	skb_clear_delivery_time(skb);

	Is it safe to move this line into ip6_input_finish?
The problem for both cases is that IPVS hooks at LOCAL_IN and
can decide to forward the packet by returning NF_STOLEN and
avoiding the _finish code. In short, before reaching the
_finish code it is still not decided that packet reaches the
sockets.

>  	return NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_IN,
>  		       dev_net(skb->dev), NULL, skb, skb->dev, NULL,
>  		       ip6_input_finish);

Regards

--
Julian Anastasov <ja@ssi.bg>


  reply	other threads:[~2022-01-21 12:08 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-21  7:30 [RFC PATCH v3 net-next 0/4] Preserve mono delivery time (EDT) in skb->tstamp Martin KaFai Lau
2022-01-21  7:30 ` [RFC PATCH v3 net-next 1/4] net: Add skb->mono_delivery_time to distinguish mono delivery_time from (rcv) timestamp Martin KaFai Lau
2022-01-22 15:32   ` Willem de Bruijn
2022-01-22 19:52     ` Martin KaFai Lau
2022-01-22 20:03     ` Martin KaFai Lau
2022-01-21  7:30 ` [RFC PATCH v3 net-next 2/4] net: Add skb_clear_tstamp() to keep the mono delivery_time Martin KaFai Lau
2022-01-21  7:30 ` [RFC PATCH v3 net-next 3/4] net: Set skb->mono_delivery_time and clear it when delivering locally Martin KaFai Lau
2022-01-21 12:02   ` Julian Anastasov [this message]
2022-01-22  3:28     ` Martin KaFai Lau
2022-01-21  7:30 ` [RFC PATCH v3 net-next 4/4] bpf: Add __sk_buff->mono_delivery_time and handle __sk_buff->tstamp based on tc_at_ingress Martin KaFai Lau
2022-01-21 18:50   ` sdf
2022-01-21 20:56     ` Martin KaFai Lau
2022-01-21 22:33       ` sdf
2022-01-22 15:43 ` [RFC PATCH v3 net-next 0/4] Preserve mono delivery time (EDT) in skb->tstamp Willem de Bruijn
2022-01-22 21:05   ` Martin KaFai Lau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ca728d81-80e8-3767-d5e-d44f6ad96e43@ssi.bg \
    --to=ja@ssi.bg \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kafai@fb.com \
    --cc=kernel-team@fb.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.