netdev.vger.kernel.org archive mirror
From: Abhishek Chauhan <quic_abchauha@quicinc.com>
To: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Andrew Halaney <ahalaney@redhat.com>,
	Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>
Cc: kernel@quicinc.com
Subject: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
Date: Fri,  1 Mar 2024 12:13:48 -0800
Message-ID: <20240301201348.2815102-1-quic_abchauha@quicinc.com>

The bridge driver today has no support for forwarding packets that carry
a userspace timestamp and ends up resetting the timestamp. The ETF qdisc
then checks the packet coming from userspace, finds its timestamp to be
0, and drops the time-sensitive packet. These changes allow packets with
userspace timestamps to be forwarded from the bridge to NIC drivers.
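As a rough illustration of where these userspace timestamps come from: a
sender opts in with the SO_TXTIME socket option and then attaches an
absolute launch time (in ns, in the chosen clock) to each sendmsg() call
as an SCM_TXTIME control message. This is a minimal sketch, not code from
this patch; the helper names enable_txtime() and build_txtime_msg() are
hypothetical, and the SO_TXTIME fallback value assumes the asm-generic
socket numbers. On recent kernels, clocks other than CLOCK_MONOTONIC
(such as CLOCK_TAI, used with ETF) may require CAP_NET_ADMIN.

```c
#define _GNU_SOURCE
#include <linux/net_tstamp.h> /* struct sock_txtime, SOF_TXTIME_REPORT_ERRORS */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <time.h>

#ifndef SO_TXTIME
/* Assumption: asm-generic value; some architectures use a different one. */
#define SO_TXTIME 61
#define SCM_TXTIME SO_TXTIME
#endif

/* Hypothetical helper: opt a socket into SO_TXTIME with the given clock. */
static int enable_txtime(int fd, clockid_t clockid)
{
	struct sock_txtime cfg = {
		.clockid = clockid,
		.flags = SOF_TXTIME_REPORT_ERRORS,
	};

	return setsockopt(fd, SOL_SOCKET, SO_TXTIME, &cfg, sizeof(cfg));
}

/*
 * Hypothetical helper: prepare msg so that sendmsg() carries txtime_ns
 * (absolute, in the socket's SO_TXTIME clock) as an SCM_TXTIME cmsg.
 * cbuf must be suitably aligned and at least
 * CMSG_SPACE(sizeof(uint64_t)) bytes long.
 */
static void build_txtime_msg(struct msghdr *msg, struct iovec *iov,
			     char *cbuf, size_t cbuflen, uint64_t txtime_ns)
{
	struct cmsghdr *cm;

	memset(msg, 0, sizeof(*msg));
	memset(cbuf, 0, cbuflen);
	msg->msg_iov = iov;
	msg->msg_iovlen = 1;
	msg->msg_control = cbuf;
	msg->msg_controllen = CMSG_SPACE(sizeof(txtime_ns));

	cm = CMSG_FIRSTHDR(msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_TXTIME;
	cm->cmsg_len = CMSG_LEN(sizeof(txtime_ns));
	memcpy(CMSG_DATA(cm), &txtime_ns, sizeof(txtime_ns));
}
```

A sender would then call sendmsg(fd, &msg, 0) on a connected (or
msg_name-addressed) DGRAM or RAW socket. Per the diff, that transmit time
ends up in skb->tstamp in __ip_make_skb()/__ip6_make_skb() or in the raw
and af_packet send paths, which is where this patch now also sets
mono_delivery_time. Declaring the control buffer as a union with
struct cmsghdr is one simple way to satisfy the alignment requirement.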

Set the same bit (mono_delivery_time) for userspace-timestamped packets
to avoid dropping them in the forwarding path.
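To make the mechanism concrete, here is a userspace model (not kernel
code) of the two sk_buff fields involved. struct model_skb and the
model_* helpers are hypothetical. model_set_transmit_time() mirrors the
two lines this patch pairs in each send path; model_clear_tstamp() is a
simplified version modeled on the kernel's skb_clear_tstamp(), which
keeps skb->tstamp when mono_delivery_time is set; and model_etf_accepts()
is only a rough stand-in for ETF rejecting a txtime that is already in
the past (such as 0). The real ETF check also looks at the socket clock
and the last dequeued time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two sk_buff fields this patch touches. */
struct model_skb {
	uint64_t tstamp;                     /* ns; 0 means "no timestamp" */
	unsigned int mono_delivery_time : 1;
};

/* Mirrors "skb->tstamp = ...transmit_time;" plus the line the patch adds. */
static void model_set_transmit_time(struct model_skb *skb,
				    uint64_t transmit_time)
{
	skb->tstamp = transmit_time;
	skb->mono_delivery_time = !!skb->tstamp;
}

/* Simplified forwarding-path reset: a marked delivery time survives. */
static void model_clear_tstamp(struct model_skb *skb)
{
	if (skb->mono_delivery_time)
		return;
	skb->tstamp = 0;
}

/* Rough stand-in for ETF: a txtime already in the past is rejected. */
static bool model_etf_accepts(const struct model_skb *skb, uint64_t now)
{
	return skb->tstamp >= now;
}
```

With the bit set, a txtime such as 1707955395257170394 (from the Test 2
log) survives the reset and is still ahead of "now"
(1707955395256215810). Without the bit, the timestamp is cleared to 0 and
the packet is rejected, which matches the "tx time from SKB is 0" /
"dropped: invalid txtime" symptom.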

The existing functionality of mono_delivery_time remains unaltered; it
is only extended with userspace tstamp support for the bridge forwarding
path.

Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
---
Changes since v3
- Set mono_delivery_time at all instances where skb->tstamp is
  initialized with sockcm.transmit_time, as reviewed by Willem
- Removed repetitive comments from all the source files and limited them
  to skbuff.h, as suggested by Willem
- Rephrased the comment explanation in skbuff.h and made it simpler and
  more generic, as suggested by Willem

Changes since v2
- Updated the commit subject and message.
- Addressed a few comments from Willem to re-use mono_delivery_time,
  with comments and documentation in the header and source files.
- Addressed the comment from Andrew about the typo in the comment.
- Ran the existing selftests (so_txtime.sh) to make sure the existing
  implementation is not impacted, as requested by Paolo.
- Internal validation of UDP packets using iperf/so_priority/so_txtime
  with MQPRIO + ETF offload was executed as well.
- Test cases are included below

Test 1: FQ + ETF (SW path)

[root@ecbldauto-lvarm04-lnx ~]# ./so_txtime.sh
[  280.640551] q->last time is 1707955476143297550
[  283.338947] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
[  284.078429] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready

SO_TXTIME ipv4 clock monotonic
payload:a delay:109 expected:0 (us)

SO_TXTIME ipv6 clock monotonic
payload:a delay:140 expected:0 (us)

SO_TXTIME ipv6 clock monotonic
payload:a delay:12739 expected:10000 (us)

SO_TXTIME ipv4 clock monotonic
payload:a delay:10054 expected:10000 (us)
payload:b delay:20043 expected:20000 (us)

SO_TXTIME ipv6 clock monotonic
payload:b delay:20078 expected:20000 (us)
payload:a delay:20177 expected:20000 (us)

SO_TXTIME ipv4 clock tai
send: pkt a at -1707955482913ms dropped: invalid txtime
[  287.070504] now is set to 1707955482913404839
[  287.070509] tx time from SKB is 0
./so_txtime: recv: timeout: Resource temporarily unavailable

SO_TXTIME ipv6 clock tai
send: pkt a at 0ms dropped: invalid txtime
[  287.070510] q->last time is 0
[  287.420590] now is set to 1707955483263491298
[  287.420596] tx time from SKB is 1707955483263454527
./so_txtime: recv: timeout: Resource temporarily unavailable

SO_TXTIME ipv6 clock tai
[  287.420597] q->last time is 0
[  287.700598] now is set to 1707955483543498954
[  287.700604] tx time from SKB is 1707955483553463173
payload:a delay:9655 expected:10000 (us)

SO_TXTIME ipv4 clock tai
[  287.700605] q->last time is 0
[  288.100532] now is set to 1707955483943432391
[  288.100537] tx time from SKB is 1707955483953413016
payload:a delay:9668 expected:10000 (us)[  288.100538] q->last time is 1707955483553463173

[  288.100546] now is set to 1707955483943446975
[  288.100547] tx time from SKB is 1707955483963413016
payload:b delay:20484 expected:20000 (us)

SO_TXTIME ipv6 clock tai
[  288.100547] q->last time is 1707955483553463173
[  288.440582] now is set to 1707955484283482495
[  288.440587] tx time from SKB is 1707955484303452808
payload:b delay:9648 expected:10000 (us)[  288.440588] q->last time is 1707955483963413016

[  288.440598] now is set to 1707955484283499370
payload:a delay:22037 expected:20000 (us)
[  288.440599] tx time from SKB is 1707955484293452808
OK. All tests passed


Test 2: MQPRIO + ETF (HW offload)

[root@ecbldauto-lvarm04-lnx ~]# tc qdisc add dev eth0 handle 100: parent root mqprio num_tc 4 \
            map 0 2 1 3 3 2 2 2 2 2 2 2 2 2 2 2 \
            queues 1@0 1@1 1@2 1@3\
            hw 0
[root@ecbldauto-lvarm04-lnx ~]# tc qdisc replace dev eth0 parent 100:4 etf \
            clockid CLOCK_TAI delta 40000 offload skip_sock_check
[   89.145838] qcom-ethqos 23040000.ethernet eth0: enabled ETF for Queue test log 3, number of queues 4, qopt enable 1, tbs queue bit 1
[   89.145846] qcom-ethqos 23040000.ethernet eth0: enabled ETF for Queue 3


[root@ecbldauto-lvarm04-lnx ~]# ./a.out -4 -c tai -S 192.168.1.1 -D 192.168.1.2 a,1,b,2

SO_TXTIME ipv4 clock tai

 glob_tstat = 1707955395256170394
[  199.623650] now is set to 1707955395256215810
[  199.623655] tx time from SKB is 1707955395257170394
[  199.623656] q->last time is 0
[  199.623663] now is set to 1707955395256230029
[  199.623664] tx time from SKB is 1707955395258170394
[  199.623665] q->last time is 0
[  199.624589] qcom-ethqos 23040000.ethernet eth0: emac ethqos tx_xmit : lauching tbs packet at 1707955395 sec and 257170394 nsec
[  199.625573] qcom-ethqos 23040000.ethernet eth0: emac ethqos tx_xmit : lauching tbs packet at 1707955395 sec and 258170394 nsec
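The HW-offload run gives a small worked example of how the launch times
relate to the command line. With a,1,b,2, glob_tstat is
1707955395256170394 ns, and the two "tx time from SKB" values are exactly
1 ms and 2 ms later (1707955395257170394 and 1707955395258170394).
Assuming a,1,b,2 means "send a at +1 ms and b at +2 ms" relative to that
base (an inference from the log, not something the patch states), the
arithmetic is base + delay_ms * 1000000. The helper names below are
hypothetical.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>

#ifndef CLOCK_TAI
#define CLOCK_TAI 11 /* assumption: Linux value, for older libc headers */
#endif

#define NSEC_PER_MSEC 1000000ULL

/* Current time of clock clk in nanoseconds (0 on error). */
static uint64_t clock_now_ns(clockid_t clk)
{
	struct timespec ts;

	if (clock_gettime(clk, &ts))
		return 0;
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Absolute launch time delay_ms after base_ns, suitable for SCM_TXTIME. */
static uint64_t txtime_at(uint64_t base_ns, uint32_t delay_ms)
{
	return base_ns + (uint64_t)delay_ms * NSEC_PER_MSEC;
}
```

A sender would take base = clock_now_ns(CLOCK_TAI) and use
txtime_at(base, 1) for payload a and txtime_at(base, 2) for payload b.
Because the ETF qdisc above is configured with clockid CLOCK_TAI, the
base has to come from that same clock.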

Changes since v1
- Changed the commit subject, as I am replacing the mono_delivery_time
  bit with clockid_delivery_time.
- Took Willem's suggestion to use the same bit for userspace delivery
  time, as there are no conflicts between TCP and SCM_TXTIME: an
  explicit cmsg makes no sense for TCP, and only RAW and DGRAM sockets
  interpret it.
- A clear explanation of why this is needed is given below; this
  extends the work done by Martin for mono_delivery_time
  https://patchwork.kernel.org/project/netdevbpf/patch/20220302195525.3480280-1-kafai@fb.com/
- The v1 patch, which states the exact problem with tc-etf and the
  discussion that took place, can be referenced at the link below
  https://lore.kernel.org/all/20240215215632.2899370-1-quic_abchauha@quicinc.com/

 include/linux/skbuff.h | 6 +++---
 net/ipv4/ip_output.c   | 1 +
 net/ipv4/raw.c         | 1 +
 net/ipv6/ip6_output.c  | 2 +-
 net/ipv6/raw.c         | 2 +-
 net/packet/af_packet.c | 4 +++-
 6 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 2dde34c29203..4726298d4ed4 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -817,9 +817,9 @@ typedef unsigned char *sk_buff_data_t;
  *	@decrypted: Decrypted SKB
  *	@slow_gro: state present at GRO time, slower prepare step required
  *	@mono_delivery_time: When set, skb->tstamp has the
- *		delivery_time in mono clock base (i.e. EDT).  Otherwise, the
- *		skb->tstamp has the (rcv) timestamp at ingress and
- *		delivery_time at egress.
+ *		delivery_time in mono clock base (i.e., EDT) or a clock base chosen
+ *		by SO_TXTIME. If zero, skb->tstamp has the (rcv) timestamp at
+ *		ingress.
  *	@napi_id: id of the NAPI struct this skb came from
  *	@sender_cpu: (aka @napi_id) source CPU in XPS
  *	@alloc_cpu: CPU which did the skb allocation.
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 5b5a0adb927f..ff1df64c5697 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1455,6 +1455,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
 	skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
 	skb->mark = cork->mark;
 	skb->tstamp = cork->transmit_time;
+	skb->mono_delivery_time = !!skb->tstamp;
 	/*
 	 * Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec
 	 * on dst refcount
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index aea89326c697..c4c29fc5b73f 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -353,6 +353,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
 	skb->priority = READ_ONCE(sk->sk_priority);
 	skb->mark = sockc->mark;
 	skb->tstamp = sockc->transmit_time;
+	skb->mono_delivery_time = !!skb->tstamp;
 	skb_dst_set(skb, &rt->dst);
 	*rtp = NULL;
 
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index a722a43dd668..2fc1d03dc07d 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1922,7 +1922,7 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
 	skb->priority = READ_ONCE(sk->sk_priority);
 	skb->mark = cork->base.mark;
 	skb->tstamp = cork->base.transmit_time;
-
+	skb->mono_delivery_time = !!skb->tstamp;
 	ip6_cork_steal_dst(skb, cork);
 	IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTREQUESTS);
 	if (proto == IPPROTO_ICMPV6) {
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 03dbb874c363..13f54f8eea35 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -616,7 +616,7 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length,
 	skb->priority = READ_ONCE(sk->sk_priority);
 	skb->mark = sockc->mark;
 	skb->tstamp = sockc->transmit_time;
-
+	skb->mono_delivery_time = !!skb->tstamp;
 	skb_put(skb, length);
 	skb_reset_network_header(skb);
 	iph = ipv6_hdr(skb);
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index c9bbc2686690..0db31ca4982d 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2057,7 +2057,7 @@ static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg,
 	skb->priority = READ_ONCE(sk->sk_priority);
 	skb->mark = READ_ONCE(sk->sk_mark);
 	skb->tstamp = sockc.transmit_time;
-
+	skb->mono_delivery_time = !!skb->tstamp;
 	skb_setup_tx_timestamp(skb, sockc.tsflags);
 
 	if (unlikely(extra_len == 4))
@@ -2586,6 +2586,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
 	skb->priority = READ_ONCE(po->sk.sk_priority);
 	skb->mark = READ_ONCE(po->sk.sk_mark);
 	skb->tstamp = sockc->transmit_time;
+	skb->mono_delivery_time = !!skb->tstamp;
 	skb_setup_tx_timestamp(skb, sockc->tsflags);
 	skb_zcopy_set_nouarg(skb, ph.raw);
 
@@ -3064,6 +3065,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
 	skb->priority = READ_ONCE(sk->sk_priority);
 	skb->mark = sockc.mark;
 	skb->tstamp = sockc.transmit_time;
+	skb->mono_delivery_time = !!skb->tstamp;
 
 	if (unlikely(extra_len == 4))
 		skb->no_fcs = 1;
-- 
2.25.1


Thread overview: 28+ messages
2024-03-01 20:13 Abhishek Chauhan [this message]
2024-03-01 22:21 ` [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets Willem de Bruijn
2024-03-05 13:00 ` patchwork-bot+netdevbpf
2024-03-12 23:52 ` Martin KaFai Lau
2024-03-13  4:34   ` Abhishek Chauhan (ABC)
2024-03-13  5:32     ` Abhishek Chauhan (ABC)
2024-03-13  8:52   ` Willem de Bruijn
2024-03-13 18:42     ` Martin KaFai Lau
2024-03-13 19:36       ` Willem de Bruijn
2024-03-13 20:59         ` Abhishek Chauhan (ABC)
2024-03-13 21:19           ` Martin KaFai Lau
2024-03-13 21:41             ` Daniel Borkmann
2024-03-13 21:01         ` Martin KaFai Lau
2024-03-13 21:26           ` Abhishek Chauhan (ABC)
2024-03-13 21:40             ` Willem de Bruijn
2024-03-13 22:08               ` Martin KaFai Lau
2024-03-14  9:49                 ` Willem de Bruijn
2024-03-14 19:21                   ` Martin KaFai Lau
2024-03-14 20:28                     ` Willem de Bruijn
2024-03-14 20:53                       ` Abhishek Chauhan (ABC)
2024-03-14 21:48                         ` Martin KaFai Lau
2024-03-14 21:54                           ` Martin KaFai Lau
2024-03-14 22:29                           ` Abhishek Chauhan (ABC)
2024-03-18 19:02                             ` Abhishek Chauhan (ABC)
2024-03-19 19:46                               ` Martin KaFai Lau
2024-03-19 20:12                                 ` Abhishek Chauhan (ABC)
2024-03-20  6:22                               ` Abhishek Chauhan (ABC)
2024-03-20 20:30                                 ` Martin KaFai Lau
