* [PATCH net-next v2] ipv6: ioam: Support for Queue depth data field
@ 2021-12-24 13:50 Justin Iurman
  2021-12-24 17:53 ` Ido Schimmel
  0 siblings, 1 reply; 9+ messages in thread
From: Justin Iurman @ 2021-12-24 13:50 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, dsahern, yoshfuji, justin.iurman

v2:
 - Fix sparse warning (use rcu_dereference)

This patch adds support for the queue depth in IOAM trace data fields.

The draft [1] says the following:

   The "queue depth" field is a 4-octet unsigned integer field.  This
   field indicates the current length of the egress interface queue of
   the interface from where the packet is forwarded out.  The queue
   depth is expressed as the current amount of memory buffers used by
   the queue (a packet could consume one or more memory buffers,
   depending on its size).

An existing function (i.e., qdisc_qstats_qlen_backlog) is used to
retrieve the current queue length without reinventing the wheel.

Note: this was tested; qlen increases when an artificial delay is
added on the egress with tc.

  [1] https://datatracker.ietf.org/doc/html/draft-ietf-ippm-ioam-data#section-5.4.2.7

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
---
 net/ipv6/ioam6.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/ioam6.c b/net/ipv6/ioam6.c
index 122a3d47424c..969a5adbaf5c 100644
--- a/net/ipv6/ioam6.c
+++ b/net/ipv6/ioam6.c
@@ -13,10 +13,12 @@
 #include <linux/ioam6.h>
 #include <linux/ioam6_genl.h>
 #include <linux/rhashtable.h>
+#include <linux/netdevice.h>
 
 #include <net/addrconf.h>
 #include <net/genetlink.h>
 #include <net/ioam6.h>
+#include <net/sch_generic.h>
 
 static void ioam6_ns_release(struct ioam6_namespace *ns)
 {
@@ -717,7 +719,19 @@ static void __ioam6_fill_trace_data(struct sk_buff *skb,
 
 	/* queue depth */
 	if (trace->type.bit6) {
-		*(__be32 *)data = cpu_to_be32(IOAM6_U32_UNAVAILABLE);
+		struct netdev_queue *queue;
+		struct Qdisc *qdisc;
+		__u32 qlen, backlog;
+
+		if (skb_dst(skb)->dev->flags & IFF_LOOPBACK) {
+			*(__be32 *)data = cpu_to_be32(IOAM6_U32_UNAVAILABLE);
+		} else {
+			queue = skb_get_tx_queue(skb_dst(skb)->dev, skb);
+			qdisc = rcu_dereference(queue->qdisc);
+			qdisc_qstats_qlen_backlog(qdisc, &qlen, &backlog);
+
+			*(__be32 *)data = cpu_to_be32(qlen);
+		}
 		data += sizeof(__be32);
 	}
 
-- 
2.25.1



* Re: [PATCH net-next v2] ipv6: ioam: Support for Queue depth data field
  2021-12-24 13:50 [PATCH net-next v2] ipv6: ioam: Support for Queue depth data field Justin Iurman
@ 2021-12-24 17:53 ` Ido Schimmel
  2021-12-26 11:47   ` Justin Iurman
  0 siblings, 1 reply; 9+ messages in thread
From: Ido Schimmel @ 2021-12-24 17:53 UTC (permalink / raw)
  To: Justin Iurman; +Cc: netdev, davem, kuba, dsahern, yoshfuji

On Fri, Dec 24, 2021 at 02:50:00PM +0100, Justin Iurman wrote:
> v2:
>  - Fix sparse warning (use rcu_dereference)
> 
> This patch adds support for the queue depth in IOAM trace data fields.
> 
> The draft [1] says the following:
> 
>    The "queue depth" field is a 4-octet unsigned integer field.  This
>    field indicates the current length of the egress interface queue of
>    the interface from where the packet is forwarded out.  The queue
>    depth is expressed as the current amount of memory buffers used by
>    the queue (a packet could consume one or more memory buffers,
>    depending on its size).
> 
> An existing function (i.e., qdisc_qstats_qlen_backlog) is used to
> retrieve the current queue length without reinventing the wheel.
> 
> Note: this was tested; qlen increases when an artificial delay is
> added on the egress with tc.
> 
>   [1] https://datatracker.ietf.org/doc/html/draft-ietf-ippm-ioam-data#section-5.4.2.7
> 
> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> ---
>  net/ipv6/ioam6.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv6/ioam6.c b/net/ipv6/ioam6.c
> index 122a3d47424c..969a5adbaf5c 100644
> --- a/net/ipv6/ioam6.c
> +++ b/net/ipv6/ioam6.c
> @@ -13,10 +13,12 @@
>  #include <linux/ioam6.h>
>  #include <linux/ioam6_genl.h>
>  #include <linux/rhashtable.h>
> +#include <linux/netdevice.h>
>  
>  #include <net/addrconf.h>
>  #include <net/genetlink.h>
>  #include <net/ioam6.h>
> +#include <net/sch_generic.h>
>  
>  static void ioam6_ns_release(struct ioam6_namespace *ns)
>  {
> @@ -717,7 +719,19 @@ static void __ioam6_fill_trace_data(struct sk_buff *skb,
>  
>  	/* queue depth */
>  	if (trace->type.bit6) {
> -		*(__be32 *)data = cpu_to_be32(IOAM6_U32_UNAVAILABLE);
> +		struct netdev_queue *queue;
> +		struct Qdisc *qdisc;
> +		__u32 qlen, backlog;
> +
> +		if (skb_dst(skb)->dev->flags & IFF_LOOPBACK) {
> +			*(__be32 *)data = cpu_to_be32(IOAM6_U32_UNAVAILABLE);
> +		} else {
> +			queue = skb_get_tx_queue(skb_dst(skb)->dev, skb);
> +			qdisc = rcu_dereference(queue->qdisc);
> +			qdisc_qstats_qlen_backlog(qdisc, &qlen, &backlog);
> +
> +			*(__be32 *)data = cpu_to_be32(qlen);

Why 'qlen' is used and not 'backlog'? From the paragraph you quoted it
seems that queue depth needs to take into account the size of the
enqueued packets, not only their number.

Did you check what other IOAM implementations (SW/HW) report for queue
depth? I would assume that they report bytes.

> +		}
>  		data += sizeof(__be32);
>  	}
>  
> -- 
> 2.25.1
> 


* Re: [PATCH net-next v2] ipv6: ioam: Support for Queue depth data field
  2021-12-24 17:53 ` Ido Schimmel
@ 2021-12-26 11:47   ` Justin Iurman
  2021-12-26 12:40     ` Ido Schimmel
  0 siblings, 1 reply; 9+ messages in thread
From: Justin Iurman @ 2021-12-26 11:47 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: netdev, davem, kuba, dsahern, yoshfuji

On Dec 24, 2021, at 6:53 PM, Ido Schimmel idosch@idosch.org wrote:
> Why 'qlen' is used and not 'backlog'? From the paragraph you quoted it
> seems that queue depth needs to take into account the size of the
> enqueued packets, not only their number.

The quoted paragraph contains the following sentence:

   "The queue depth is expressed as the current amount of memory
    buffers used by the queue"

So my understanding is that we need their number, not their size.

> Did you check what other IOAM implementations (SW/HW) report for queue
> depth? I would assume that they report bytes.

Unfortunately, IOAM is quite new, and so IOAM implementations don't
grow on trees. The Linux kernel implementation is one of the first,
except for VPP and IOS (Cisco) which did not implement the queue
depth data field.


* Re: [PATCH net-next v2] ipv6: ioam: Support for Queue depth data field
  2021-12-26 11:47   ` Justin Iurman
@ 2021-12-26 12:40     ` Ido Schimmel
  2021-12-26 12:59       ` Justin Iurman
  0 siblings, 1 reply; 9+ messages in thread
From: Ido Schimmel @ 2021-12-26 12:40 UTC (permalink / raw)
  To: Justin Iurman; +Cc: netdev, davem, kuba, dsahern, yoshfuji

On Sun, Dec 26, 2021 at 12:47:51PM +0100, Justin Iurman wrote:
> On Dec 24, 2021, at 6:53 PM, Ido Schimmel idosch@idosch.org wrote:
> > Why 'qlen' is used and not 'backlog'? From the paragraph you quoted it
> > seems that queue depth needs to take into account the size of the
> > enqueued packets, not only their number.
> 
> The quoted paragraph contains the following sentence:
> 
>    "The queue depth is expressed as the current amount of memory
>     buffers used by the queue"
> 
> So my understanding is that we need their number, not their size.

It also says "a packet could consume one or more memory buffers,
depending on its size". If, for example, you define tc-red limit as 1M,
then it makes a lot of difference if the 1,000 packets you have in the
queue are 9,000 bytes in size or 64 bytes.

> 
> > Did you check what other IOAM implementations (SW/HW) report for queue
> > depth? I would assume that they report bytes.
> 
> Unfortunately, IOAM is quite new, and so IOAM implementations don't
> grow on trees. The Linux kernel implementation is one of the first,
> except for VPP and IOS (Cisco) which did not implement the queue
> depth data field.

At least on Mellanox/Nvidia switches, queue depth (not necessarily for
IOAM) is always reported in bytes. I have a colleague who authored a few
IOAM IETF drafts, I will ask for his input on this and share.


* Re: [PATCH net-next v2] ipv6: ioam: Support for Queue depth data field
  2021-12-26 12:40     ` Ido Schimmel
@ 2021-12-26 12:59       ` Justin Iurman
  2021-12-26 13:15         ` Ido Schimmel
  0 siblings, 1 reply; 9+ messages in thread
From: Justin Iurman @ 2021-12-26 12:59 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: netdev, davem, kuba, dsahern, yoshfuji

On Dec 26, 2021, at 1:40 PM, Ido Schimmel idosch@idosch.org wrote:
> On Sun, Dec 26, 2021 at 12:47:51PM +0100, Justin Iurman wrote:
>> On Dec 24, 2021, at 6:53 PM, Ido Schimmel idosch@idosch.org wrote:
>> > Why 'qlen' is used and not 'backlog'? From the paragraph you quoted it
>> > seems that queue depth needs to take into account the size of the
>> > enqueued packets, not only their number.
>> 
>> The quoted paragraph contains the following sentence:
>> 
>>    "The queue depth is expressed as the current amount of memory
>>     buffers used by the queue"
>> 
>> So my understanding is that we need their number, not their size.
> 
> It also says "a packet could consume one or more memory buffers,
> depending on its size". If, for example, you define tc-red limit as 1M,
> then it makes a lot of difference if the 1,000 packets you have in the
> queue are 9,000 bytes in size or 64 bytes.

Agree. We probably could use 'backlog' instead, regarding this
statement:

  "It should be noted that the semantics of some of the node data fields
   that are defined below, such as the queue depth and buffer occupancy,
   are implementation specific.  This approach is intended to allow IOAM
   nodes with various different architectures."

It would indeed make more sense, based on your example. However, the
limit (32 bits) could be reached faster using 'backlog' rather than
'qlen'. But I guess this tradeoff is the price to pay to be as close
as possible to the spec.


* Re: [PATCH net-next v2] ipv6: ioam: Support for Queue depth data field
  2021-12-26 12:59       ` Justin Iurman
@ 2021-12-26 13:15         ` Ido Schimmel
  2021-12-27 14:06           ` Justin Iurman
  0 siblings, 1 reply; 9+ messages in thread
From: Ido Schimmel @ 2021-12-26 13:15 UTC (permalink / raw)
  To: Justin Iurman; +Cc: netdev, davem, kuba, dsahern, yoshfuji

On Sun, Dec 26, 2021 at 01:59:08PM +0100, Justin Iurman wrote:
> On Dec 26, 2021, at 1:40 PM, Ido Schimmel idosch@idosch.org wrote:
> > On Sun, Dec 26, 2021 at 12:47:51PM +0100, Justin Iurman wrote:
> >> On Dec 24, 2021, at 6:53 PM, Ido Schimmel idosch@idosch.org wrote:
> >> > Why 'qlen' is used and not 'backlog'? From the paragraph you quoted it
> >> > seems that queue depth needs to take into account the size of the
> >> > enqueued packets, not only their number.
> >> 
> >> The quoted paragraph contains the following sentence:
> >> 
> >>    "The queue depth is expressed as the current amount of memory
> >>     buffers used by the queue"
> >> 
> >> So my understanding is that we need their number, not their size.
> > 
> > It also says "a packet could consume one or more memory buffers,
> > depending on its size". If, for example, you define tc-red limit as 1M,
> > then it makes a lot of difference if the 1,000 packets you have in the
> > queue are 9,000 bytes in size or 64 bytes.
> 
> Agree. We probably could use 'backlog' instead, regarding this
> statement:
> 
>   "It should be noted that the semantics of some of the node data fields
>    that are defined below, such as the queue depth and buffer occupancy,
>    are implementation specific.  This approach is intended to allow IOAM
>    nodes with various different architectures."
> 
> It would indeed make more sense, based on your example. However, the
> limit (32 bits) could be reached faster using 'backlog' rather than
> 'qlen'. But I guess this tradeoff is the price to pay to be as close
> as possible to the spec.

At least in Linux 'backlog' is 32 bits so we are OK :)
We don't have such big buffers in hardware and I'm not sure what
insights an operator will get from a queue depth larger than 4GB...

I just got an OOO auto-reply from my colleague so I'm not sure I will be
able to share his input before next week. Anyway, reporting 'backlog'
makes sense to me, FWIW.


* Re: [PATCH net-next v2] ipv6: ioam: Support for Queue depth data field
  2021-12-26 13:15         ` Ido Schimmel
@ 2021-12-27 14:06           ` Justin Iurman
  2021-12-30 14:47             ` Ido Schimmel
  0 siblings, 1 reply; 9+ messages in thread
From: Justin Iurman @ 2021-12-27 14:06 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: netdev, davem, kuba, dsahern, yoshfuji

On Dec 26, 2021, at 2:15 PM, Ido Schimmel idosch@idosch.org wrote:
> On Sun, Dec 26, 2021 at 01:59:08PM +0100, Justin Iurman wrote:
>> On Dec 26, 2021, at 1:40 PM, Ido Schimmel idosch@idosch.org wrote:
>> > On Sun, Dec 26, 2021 at 12:47:51PM +0100, Justin Iurman wrote:
>> >> On Dec 24, 2021, at 6:53 PM, Ido Schimmel idosch@idosch.org wrote:
>> >> > Why 'qlen' is used and not 'backlog'? From the paragraph you quoted it
>> >> > seems that queue depth needs to take into account the size of the
>> >> > enqueued packets, not only their number.
>> >> 
>> >> The quoted paragraph contains the following sentence:
>> >> 
>> >>    "The queue depth is expressed as the current amount of memory
>> >>     buffers used by the queue"
>> >> 
>> >> So my understanding is that we need their number, not their size.
>> > 
>> > It also says "a packet could consume one or more memory buffers,
>> > depending on its size". If, for example, you define tc-red limit as 1M,
>> > then it makes a lot of difference if the 1,000 packets you have in the
>> > queue are 9,000 bytes in size or 64 bytes.
>> 
>> Agree. We probably could use 'backlog' instead, regarding this
>> statement:
>> 
>>   "It should be noted that the semantics of some of the node data fields
>>    that are defined below, such as the queue depth and buffer occupancy,
>>    are implementation specific.  This approach is intended to allow IOAM
>>    nodes with various different architectures."
>> 
>> It would indeed make more sense, based on your example. However, the
>> limit (32 bits) could be reached faster using 'backlog' rather than
>> 'qlen'. But I guess this tradeoff is the price to pay to be as close
>> as possible to the spec.
> 
> At least in Linux 'backlog' is 32 bits so we are OK :)
> We don't have such big buffers in hardware and I'm not sure what
> insights an operator will get from a queue depth larger than 4GB...

Indeed :-)

> I just got an OOO auto-reply from my colleague so I'm not sure I will be
> able to share his input before next week. Anyway, reporting 'backlog'
> makes sense to me, FWIW.

Right. I read that Linus is planning to release a -rc8 so I think I can
wait another week before posting -v3.


* Re: [PATCH net-next v2] ipv6: ioam: Support for Queue depth data field
  2021-12-27 14:06           ` Justin Iurman
@ 2021-12-30 14:47             ` Ido Schimmel
  2021-12-30 16:50               ` Justin Iurman
  0 siblings, 1 reply; 9+ messages in thread
From: Ido Schimmel @ 2021-12-30 14:47 UTC (permalink / raw)
  To: Justin Iurman; +Cc: netdev, davem, kuba, dsahern, yoshfuji

On Mon, Dec 27, 2021 at 03:06:42PM +0100, Justin Iurman wrote:
> On Dec 26, 2021, at 2:15 PM, Ido Schimmel idosch@idosch.org wrote:
> > On Sun, Dec 26, 2021 at 01:59:08PM +0100, Justin Iurman wrote:
> >> On Dec 26, 2021, at 1:40 PM, Ido Schimmel idosch@idosch.org wrote:
> >> > On Sun, Dec 26, 2021 at 12:47:51PM +0100, Justin Iurman wrote:
> >> >> On Dec 24, 2021, at 6:53 PM, Ido Schimmel idosch@idosch.org wrote:
> >> >> > Why 'qlen' is used and not 'backlog'? From the paragraph you quoted it
> >> >> > seems that queue depth needs to take into account the size of the
> >> >> > enqueued packets, not only their number.
> >> >> 
> >> >> The quoted paragraph contains the following sentence:
> >> >> 
> >> >>    "The queue depth is expressed as the current amount of memory
> >> >>     buffers used by the queue"
> >> >> 
> >> >> So my understanding is that we need their number, not their size.
> >> > 
> >> > It also says "a packet could consume one or more memory buffers,
> >> > depending on its size". If, for example, you define tc-red limit as 1M,
> >> > then it makes a lot of difference if the 1,000 packets you have in the
> >> > queue are 9,000 bytes in size or 64 bytes.
> >> 
> >> Agree. We probably could use 'backlog' instead, regarding this
> >> statement:
> >> 
> >>   "It should be noted that the semantics of some of the node data fields
> >>    that are defined below, such as the queue depth and buffer occupancy,
> >>    are implementation specific.  This approach is intended to allow IOAM
> >>    nodes with various different architectures."
> >> 
> >> It would indeed make more sense, based on your example. However, the
> >> limit (32 bits) could be reached faster using 'backlog' rather than
> >> 'qlen'. But I guess this tradeoff is the price to pay to be as close
> >> as possible to the spec.
> > 
> > At least in Linux 'backlog' is 32 bits so we are OK :)
> > We don't have such big buffers in hardware and I'm not sure what
> > insights an operator will get from a queue depth larger than 4GB...
> 
> Indeed :-)
> 
> > I just got an OOO auto-reply from my colleague so I'm not sure I will be
> > able to share his input before next week. Anyway, reporting 'backlog'
> > makes sense to me, FWIW.
> 
> Right. I read that Linus is planning to release a -rc8 so I think I can
> wait another week before posting -v3.

The answer I got from my colleagues is that they expect the field to
either encode bytes (what Mellanox/Nvidia is doing) or "cells", which is
an "allocation granularity of memory within the shared buffer" (see man
devlink-sb).


* Re: [PATCH net-next v2] ipv6: ioam: Support for Queue depth data field
  2021-12-30 14:47             ` Ido Schimmel
@ 2021-12-30 16:50               ` Justin Iurman
  0 siblings, 0 replies; 9+ messages in thread
From: Justin Iurman @ 2021-12-30 16:50 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: netdev, davem, kuba, dsahern, yoshfuji

On Dec 30, 2021, at 3:47 PM, Ido Schimmel idosch@idosch.org wrote:
> On Mon, Dec 27, 2021 at 03:06:42PM +0100, Justin Iurman wrote:
>> On Dec 26, 2021, at 2:15 PM, Ido Schimmel idosch@idosch.org wrote:
>> > On Sun, Dec 26, 2021 at 01:59:08PM +0100, Justin Iurman wrote:
>> >> On Dec 26, 2021, at 1:40 PM, Ido Schimmel idosch@idosch.org wrote:
>> >> > On Sun, Dec 26, 2021 at 12:47:51PM +0100, Justin Iurman wrote:
>> >> >> On Dec 24, 2021, at 6:53 PM, Ido Schimmel idosch@idosch.org wrote:
>> >> >> > Why 'qlen' is used and not 'backlog'? From the paragraph you quoted it
>> >> >> > seems that queue depth needs to take into account the size of the
>> >> >> > enqueued packets, not only their number.
>> >> >> 
>> >> >> The quoted paragraph contains the following sentence:
>> >> >> 
>> >> >>    "The queue depth is expressed as the current amount of memory
>> >> >>     buffers used by the queue"
>> >> >> 
>> >> >> So my understanding is that we need their number, not their size.
>> >> > 
>> >> > It also says "a packet could consume one or more memory buffers,
>> >> > depending on its size". If, for example, you define tc-red limit as 1M,
>> >> > then it makes a lot of difference if the 1,000 packets you have in the
>> >> > queue are 9,000 bytes in size or 64 bytes.
>> >> 
>> >> Agree. We probably could use 'backlog' instead, regarding this
>> >> statement:
>> >> 
>> >>   "It should be noted that the semantics of some of the node data fields
>> >>    that are defined below, such as the queue depth and buffer occupancy,
>> >>    are implementation specific.  This approach is intended to allow IOAM
>> >>    nodes with various different architectures."
>> >> 
>> >> It would indeed make more sense, based on your example. However, the
>> >> limit (32 bits) could be reached faster using 'backlog' rather than
>> >> 'qlen'. But I guess this tradeoff is the price to pay to be as close
>> >> as possible to the spec.
>> > 
>> > At least in Linux 'backlog' is 32 bits so we are OK :)
>> > We don't have such big buffers in hardware and I'm not sure what
>> > insights an operator will get from a queue depth larger than 4GB...
>> 
>> Indeed :-)
>> 
>> > I just got an OOO auto-reply from my colleague so I'm not sure I will be
>> > able to share his input before next week. Anyway, reporting 'backlog'
>> > makes sense to me, FWIW.
>> 
>> Right. I read that Linus is planning to release a -rc8 so I think I can
>> wait another week before posting -v3.
> 
> The answer I got from my colleagues is that they expect the field to
> either encode bytes (what Mellanox/Nvidia is doing) or "cells", which is
> an "allocation granularity of memory within the shared buffer" (see man
> devlink-sb).

Thanks for that. It looks like devlink-sb would be gold for IOAM. But
based on what we discussed previously with Jakub, it cannot be used here
unfortunately. So I guess we have no choice but to use 'backlog' and
therefore report bytes, which is fine anyway. Thanks again for your
helpful comments, Ido. I appreciate it.

