netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH iproute2-next 0/6] ip: nexthop: Support resilient groups
@ 2021-03-12 17:23 Petr Machata
  2021-03-12 17:23 ` [PATCH iproute2-next 1/6] nexthop: Synchronize uAPI files Petr Machata
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Petr Machata @ 2021-03-12 17:23 UTC (permalink / raw)
  To: netdev, dsahern, stephen; +Cc: Ido Schimmel, Petr Machata

Support for resilient next-hop groups was recently accepted to Linux
kernel[1]. Resilient next-hop groups add a layer of indirection between the
SKB hash and the next hop. Thus the hash is used to reference a hash table
bucket, which is then used to reference a particular next hop. This allows
the system more flexibility when assigning SKB hash space to next hops.
Previously, each next hop had to be assigned a continuous range of SKB hash
space. With a hash table as an intermediate layer, it is possible to
reassign next hops with a hash table bucket granularity. In turn, this
mends issues with traffic flow redirection resulting from next hop removal
or adjustments in next-hop weights.

In this patch set, introduce support for resilient next-hop groups to
iproute2.

- Patch #1 brings include/uapi/linux/nexthop.h and /rtnetlink.h up to date.

- Patches #2 and #3 add new helpers that will be useful later.

- Patch #4 extends the ip/nexthop sub-tool to accept group type as a
  command line argument, and to dispatch based on the specified type.

- Patch #5 adds the support for resilient next-hop groups.

- Patch #6 adds the support for resilient next-hop group bucket interface.

To illustrate the usage, consider the following commands:

 # ip nexthop add id 1 via 192.0.2.2 dev dummy1
 # ip nexthop add id 2 via 192.0.2.3 dev dummy1
 # ip nexthop add id 10 group 1/2 type resilient \
	buckets 8 idle_timer 60 unbalanced_timer 300

The last command creates a resilient next-hop group. It will have 8
buckets, each bucket will be considered idle when no traffic hits it for at
least 60 seconds, and if the table remains out of balance for 300 seconds,
it will be forcefully brought into balance.

And this is how the next-hop group bucket interface looks:

 # ip nexthop bucket show id 10
 id 10 index 0 idle_time 5.59 nhid 1
 id 10 index 1 idle_time 5.59 nhid 1
 id 10 index 2 idle_time 8.74 nhid 2
 id 10 index 3 idle_time 8.74 nhid 2
 id 10 index 4 idle_time 8.74 nhid 1
 id 10 index 5 idle_time 8.74 nhid 1
 id 10 index 6 idle_time 8.74 nhid 1
 id 10 index 7 idle_time 8.74 nhid 1

[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=2a0186a37700b0d5b8cc40be202a62af44f02fa2

Ido Schimmel (4):
  nexthop: Synchronize uAPI files
  nexthop: Add ability to specify group type
  nexthop: Add support for resilient nexthop groups
  nexthop: Add support for nexthop buckets

Petr Machata (2):
  json_print: Add print_tv()
  nexthop: Extract a helper to parse a NH ID

 include/json_print.h           |   1 +
 include/libnetlink.h           |   3 +
 include/uapi/linux/nexthop.h   |  47 +++-
 include/uapi/linux/rtnetlink.h |   7 +
 ip/ip_common.h                 |   1 +
 ip/ipmonitor.c                 |   6 +
 ip/ipnexthop.c                 | 451 ++++++++++++++++++++++++++++++++-
 lib/json_print.c               |  13 +
 lib/libnetlink.c               |  26 ++
 man/man8/ip-nexthop.8          | 112 +++++++-
 10 files changed, 650 insertions(+), 17 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH iproute2-next 1/6] nexthop: Synchronize uAPI files
  2021-03-12 17:23 [PATCH iproute2-next 0/6] ip: nexthop: Support resilient groups Petr Machata
@ 2021-03-12 17:23 ` Petr Machata
  2021-03-12 17:23 ` [PATCH iproute2-next 2/6] json_print: Add print_tv() Petr Machata
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Petr Machata @ 2021-03-12 17:23 UTC (permalink / raw)
  To: netdev, dsahern, stephen; +Cc: Ido Schimmel, Petr Machata, Petr Machata

From: Petr Machata <me@pmachata.org>

From: Ido Schimmel <idosch@nvidia.com>

Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 include/uapi/linux/nexthop.h   | 47 +++++++++++++++++++++++++++++++++-
 include/uapi/linux/rtnetlink.h |  7 +++++
 2 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/nexthop.h b/include/uapi/linux/nexthop.h
index b0a5613905ef..37b14b4ea6c4 100644
--- a/include/uapi/linux/nexthop.h
+++ b/include/uapi/linux/nexthop.h
@@ -21,7 +21,10 @@ struct nexthop_grp {
 };
 
 enum {
-	NEXTHOP_GRP_TYPE_MPATH,  /* default type if not specified */
+	NEXTHOP_GRP_TYPE_MPATH,  /* hash-threshold nexthop group
+				  * default type if not specified
+				  */
+	NEXTHOP_GRP_TYPE_RES,    /* resilient nexthop group */
 	__NEXTHOP_GRP_TYPE_MAX,
 };
 
@@ -52,8 +55,50 @@ enum {
 	NHA_FDB,	/* flag; nexthop belongs to a bridge fdb */
 	/* if NHA_FDB is added, OIF, BLACKHOLE, ENCAP cannot be set */
 
+	/* nested; resilient nexthop group attributes */
+	NHA_RES_GROUP,
+	/* nested; nexthop bucket attributes */
+	NHA_RES_BUCKET,
+
 	__NHA_MAX,
 };
 
 #define NHA_MAX	(__NHA_MAX - 1)
+
+enum {
+	NHA_RES_GROUP_UNSPEC,
+	/* Pad attribute for 64-bit alignment. */
+	NHA_RES_GROUP_PAD = NHA_RES_GROUP_UNSPEC,
+
+	/* u16; number of nexthop buckets in a resilient nexthop group */
+	NHA_RES_GROUP_BUCKETS,
+	/* clock_t as u32; nexthop bucket idle timer (per-group) */
+	NHA_RES_GROUP_IDLE_TIMER,
+	/* clock_t as u32; nexthop unbalanced timer */
+	NHA_RES_GROUP_UNBALANCED_TIMER,
+	/* clock_t as u64; nexthop unbalanced time */
+	NHA_RES_GROUP_UNBALANCED_TIME,
+
+	__NHA_RES_GROUP_MAX,
+};
+
+#define NHA_RES_GROUP_MAX	(__NHA_RES_GROUP_MAX - 1)
+
+enum {
+	NHA_RES_BUCKET_UNSPEC,
+	/* Pad attribute for 64-bit alignment. */
+	NHA_RES_BUCKET_PAD = NHA_RES_BUCKET_UNSPEC,
+
+	/* u16; nexthop bucket index */
+	NHA_RES_BUCKET_INDEX,
+	/* clock_t as u64; nexthop bucket idle time */
+	NHA_RES_BUCKET_IDLE_TIME,
+	/* u32; nexthop id assigned to the nexthop bucket */
+	NHA_RES_BUCKET_NH_ID,
+
+	__NHA_RES_BUCKET_MAX,
+};
+
+#define NHA_RES_BUCKET_MAX	(__NHA_RES_BUCKET_MAX - 1)
+
 #endif
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index b34b9add5f65..f62cccc17651 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -178,6 +178,13 @@ enum {
 	RTM_GETVLAN,
 #define RTM_GETVLAN	RTM_GETVLAN
 
+	RTM_NEWNEXTHOPBUCKET = 116,
+#define RTM_NEWNEXTHOPBUCKET	RTM_NEWNEXTHOPBUCKET
+	RTM_DELNEXTHOPBUCKET,
+#define RTM_DELNEXTHOPBUCKET	RTM_DELNEXTHOPBUCKET
+	RTM_GETNEXTHOPBUCKET,
+#define RTM_GETNEXTHOPBUCKET	RTM_GETNEXTHOPBUCKET
+
 	__RTM_MAX,
 #define RTM_MAX		(((__RTM_MAX + 3) & ~3) - 1)
 };
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH iproute2-next 2/6] json_print: Add print_tv()
  2021-03-12 17:23 [PATCH iproute2-next 0/6] ip: nexthop: Support resilient groups Petr Machata
  2021-03-12 17:23 ` [PATCH iproute2-next 1/6] nexthop: Synchronize uAPI files Petr Machata
@ 2021-03-12 17:23 ` Petr Machata
  2021-03-12 17:23 ` [PATCH iproute2-next 3/6] nexthop: Extract a helper to parse a NH ID Petr Machata
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Petr Machata @ 2021-03-12 17:23 UTC (permalink / raw)
  To: netdev, dsahern, stephen; +Cc: Ido Schimmel, Petr Machata, Petr Machata

From: Petr Machata <me@pmachata.org>

From: Petr Machata <petrm@nvidia.com>

Add a helper to dump a timeval. Print by first converting to double and
then dispatching to print_color_float().

Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 include/json_print.h |  1 +
 lib/json_print.c     | 13 +++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/include/json_print.h b/include/json_print.h
index 6fcf9fd910ec..63eee3823fe4 100644
--- a/include/json_print.h
+++ b/include/json_print.h
@@ -81,6 +81,7 @@ _PRINT_FUNC(0xhex, unsigned long long)
 _PRINT_FUNC(luint, unsigned long)
 _PRINT_FUNC(lluint, unsigned long long)
 _PRINT_FUNC(float, double)
+_PRINT_FUNC(tv, struct timeval *)
 #undef _PRINT_FUNC
 
 #define _PRINT_NAME_VALUE_FUNC(type_name, type, format_char)		  \
diff --git a/lib/json_print.c b/lib/json_print.c
index 994a2f8d6ae0..1018bfb36d94 100644
--- a/lib/json_print.c
+++ b/lib/json_print.c
@@ -299,6 +299,19 @@ int print_color_null(enum output_type type,
 	return ret;
 }
 
+int print_color_tv(enum output_type type,
+		   enum color_attr color,
+		   const char *key,
+		   const char *fmt,
+		   struct timeval *tv)
+{
+	double usecs = tv->tv_usec;
+	double secs = tv->tv_sec;
+	double time = secs + usecs / 1000000;
+
+	return print_color_float(type, color, key, fmt, time);
+}
+
 /* Print line separator (if not in JSON mode) */
 void print_nl(void)
 {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH iproute2-next 3/6] nexthop: Extract a helper to parse a NH ID
  2021-03-12 17:23 [PATCH iproute2-next 0/6] ip: nexthop: Support resilient groups Petr Machata
  2021-03-12 17:23 ` [PATCH iproute2-next 1/6] nexthop: Synchronize uAPI files Petr Machata
  2021-03-12 17:23 ` [PATCH iproute2-next 2/6] json_print: Add print_tv() Petr Machata
@ 2021-03-12 17:23 ` Petr Machata
  2021-03-12 17:23 ` [PATCH iproute2-next 4/6] nexthop: Add ability to specify group type Petr Machata
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Petr Machata @ 2021-03-12 17:23 UTC (permalink / raw)
  To: netdev, dsahern, stephen; +Cc: Ido Schimmel, Petr Machata, Petr Machata

From: Petr Machata <me@pmachata.org>

From: Petr Machata <petrm@nvidia.com>

NH ID extraction is a common operation, and will become more common still
with the resilient NH groups support. Add a helper that does what it
usually done and returns the parsed NH ID.

Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 ip/ipnexthop.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/ip/ipnexthop.c b/ip/ipnexthop.c
index 20cde586596b..126b0b17cab4 100644
--- a/ip/ipnexthop.c
+++ b/ip/ipnexthop.c
@@ -327,6 +327,15 @@ static int add_nh_group_attr(struct nlmsghdr *n, int maxlen, char *argv)
 	return addattr_l(n, maxlen, NHA_GROUP, grps, count * sizeof(*grps));
 }
 
+static int ipnh_parse_id(const char *argv)
+{
+	__u32 id;
+
+	if (get_unsigned(&id, argv, 0))
+		invarg("invalid id value", argv);
+	return id;
+}
+
 static int ipnh_modify(int cmd, unsigned int flags, int argc, char **argv)
 {
 	struct {
@@ -343,12 +352,9 @@ static int ipnh_modify(int cmd, unsigned int flags, int argc, char **argv)
 
 	while (argc > 0) {
 		if (!strcmp(*argv, "id")) {
-			__u32 id;
-
 			NEXT_ARG();
-			if (get_unsigned(&id, *argv, 0))
-				invarg("invalid id value", *argv);
-			addattr32(&req.n, sizeof(req), NHA_ID, id);
+			addattr32(&req.n, sizeof(req), NHA_ID,
+				  ipnh_parse_id(*argv));
 		} else if (!strcmp(*argv, "dev")) {
 			int ifindex;
 
@@ -485,12 +491,8 @@ static int ipnh_list_flush(int argc, char **argv, int action)
 			if (!filter.master)
 				invarg("VRF does not exist\n", *argv);
 		} else if (!strcmp(*argv, "id")) {
-			__u32 id;
-
 			NEXT_ARG();
-			if (get_unsigned(&id, *argv, 0))
-				invarg("invalid id value", *argv);
-			return ipnh_get_id(id);
+			return ipnh_get_id(ipnh_parse_id(*argv));
 		} else if (!matches(*argv, "protocol")) {
 			__u32 proto;
 
@@ -536,8 +538,7 @@ static int ipnh_get(int argc, char **argv)
 	while (argc > 0) {
 		if (!strcmp(*argv, "id")) {
 			NEXT_ARG();
-			if (get_unsigned(&id, *argv, 0))
-				invarg("invalid id value", *argv);
+			id = ipnh_parse_id(*argv);
 		} else  {
 			usage();
 		}
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH iproute2-next 4/6] nexthop: Add ability to specify group type
  2021-03-12 17:23 [PATCH iproute2-next 0/6] ip: nexthop: Support resilient groups Petr Machata
                   ` (2 preceding siblings ...)
  2021-03-12 17:23 ` [PATCH iproute2-next 3/6] nexthop: Extract a helper to parse a NH ID Petr Machata
@ 2021-03-12 17:23 ` Petr Machata
  2021-03-14 15:55   ` David Ahern
  2021-03-12 17:23 ` [PATCH iproute2-next 5/6] nexthop: Add support for resilient nexthop groups Petr Machata
  2021-03-12 17:23 ` [PATCH iproute2-next 6/6] nexthop: Add support for nexthop buckets Petr Machata
  5 siblings, 1 reply; 9+ messages in thread
From: Petr Machata @ 2021-03-12 17:23 UTC (permalink / raw)
  To: netdev, dsahern, stephen; +Cc: Ido Schimmel, Petr Machata

From: Petr Machata <me@pmachata.org>

From: Ido Schimmel <idosch@nvidia.com>

Next patches are going to add a 'resilient' nexthop group type, so allow
users to specify the type using the 'type' argument. Currently, only
'mpath' type is supported.

These two command are equivalent:

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 ip/ipnexthop.c        | 32 +++++++++++++++++++++++++++++++-
 man/man8/ip-nexthop.8 | 18 ++++++++++++++++--
 2 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/ip/ipnexthop.c b/ip/ipnexthop.c
index 126b0b17cab4..5aae32629edd 100644
--- a/ip/ipnexthop.c
+++ b/ip/ipnexthop.c
@@ -42,8 +42,10 @@ static void usage(void)
 		"SELECTOR := [ id ID ] [ dev DEV ] [ vrf NAME ] [ master DEV ]\n"
 		"            [ groups ] [ fdb ]\n"
 		"NH := { blackhole | [ via ADDRESS ] [ dev DEV ] [ onlink ]\n"
-		"        [ encap ENCAPTYPE ENCAPHDR ] | group GROUP [ fdb ] }\n"
+		"        [ encap ENCAPTYPE ENCAPHDR ] |\n"
+		"        group GROUP [ fdb ] [ type TYPE ] }\n"
 		"GROUP := [ <id[,weight]>/<id[,weight]>/... ]\n"
+		"TYPE := { mpath }\n"
 		"ENCAPTYPE := [ mpls ]\n"
 		"ENCAPHDR := [ MPLSLABEL ]\n");
 	exit(-1);
@@ -327,6 +329,32 @@ static int add_nh_group_attr(struct nlmsghdr *n, int maxlen, char *argv)
 	return addattr_l(n, maxlen, NHA_GROUP, grps, count * sizeof(*grps));
 }
 
+static int read_nh_group_type(const char *name)
+{
+	if (strcmp(name, "mpath") == 0)
+		return NEXTHOP_GRP_TYPE_MPATH;
+
+	return __NEXTHOP_GRP_TYPE_MAX;
+}
+
+static void parse_nh_group_type(struct nlmsghdr *n, int maxlen, int *argcp,
+				char ***argvp)
+{
+	char **argv = *argvp;
+	int argc = *argcp;
+	__u16 type;
+
+	NEXT_ARG();
+	type = read_nh_group_type(*argv);
+	if (type > NEXTHOP_GRP_TYPE_MAX)
+		invarg("\"type\" value is invalid\n", *argv);
+
+	*argcp = argc;
+	*argvp = argv;
+
+	addattr16(n, maxlen, NHA_GROUP_TYPE, type);
+}
+
 static int ipnh_parse_id(const char *argv)
 {
 	__u32 id;
@@ -409,6 +437,8 @@ static int ipnh_modify(int cmd, unsigned int flags, int argc, char **argv)
 
 			if (add_nh_group_attr(&req.n, sizeof(req), *argv))
 				invarg("\"group\" value is invalid\n", *argv);
+		} else if (!strcmp(*argv, "type")) {
+			parse_nh_group_type(&req.n, sizeof(req), &argc, &argv);
 		} else if (matches(*argv, "protocol") == 0) {
 			__u32 prot;
 
diff --git a/man/man8/ip-nexthop.8 b/man/man8/ip-nexthop.8
index 4d55f4dbcc75..f02e0555a000 100644
--- a/man/man8/ip-nexthop.8
+++ b/man/man8/ip-nexthop.8
@@ -54,7 +54,9 @@ ip-nexthop \- nexthop object management
 .BR fdb " ] | "
 .B  group
 .IR GROUP " [ "
-.BR fdb " ] } "
+.BR fdb " ] [ "
+.B type
+.IR TYPE " ] } "
 
 .ti -8
 .IR ENCAP " := [ "
@@ -71,6 +73,10 @@ ip-nexthop \- nexthop object management
 .IR GROUP " := "
 .BR id "[," weight "[/...]"
 
+.ti -8
+.IR TYPE " := { "
+.BR mpath " }"
+
 .SH DESCRIPTION
 .B ip nexthop
 is used to manipulate entries in the kernel's nexthop tables.
@@ -122,9 +128,17 @@ is a set of encapsulation attributes specific to the
 .in -2
 
 .TP
-.BI group " GROUP"
+.BI group " GROUP [ " type " TYPE ]"
 create a nexthop group. Group specification is id with an optional
 weight (id,weight) and a '/' as a separator between entries.
+.sp
+.I TYPE
+is a string specifying the nexthop group type. Namely:
+
+.in +8
+.BI mpath
+- multipath nexthop group
+
 .TP
 .B blackhole
 create a blackhole nexthop
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH iproute2-next 5/6] nexthop: Add support for resilient nexthop groups
  2021-03-12 17:23 [PATCH iproute2-next 0/6] ip: nexthop: Support resilient groups Petr Machata
                   ` (3 preceding siblings ...)
  2021-03-12 17:23 ` [PATCH iproute2-next 4/6] nexthop: Add ability to specify group type Petr Machata
@ 2021-03-12 17:23 ` Petr Machata
  2021-03-12 17:23 ` [PATCH iproute2-next 6/6] nexthop: Add support for nexthop buckets Petr Machata
  5 siblings, 0 replies; 9+ messages in thread
From: Petr Machata @ 2021-03-12 17:23 UTC (permalink / raw)
  To: netdev, dsahern, stephen; +Cc: Ido Schimmel, Petr Machata

From: Petr Machata <me@pmachata.org>

From: Ido Schimmel <idosch@nvidia.com>

Add ability to configure resilient nexthop groups and show their current
configuration. Example:

 # ip nexthop add id 10 group 1/2 type resilient buckets 8
 # ip nexthop show id 10
 id 10 group 1/2 type resilient buckets 8 idle_timer 120 unbalanced_timer 0
 # ip -j -p nexthop show id 10
 [ {
         "id": 10,
         "group": [ {
                 "id": 1
             },{
                 "id": 2
             } ],
         "type": "resilient",
         "resilient_args": {
             "buckets": 8,
             "idle_timer": 120,
             "unbalanced_timer": 0
         },
         "flags": [ ]
     } ]

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 ip/ipnexthop.c        | 144 +++++++++++++++++++++++++++++++++++++++++-
 man/man8/ip-nexthop.8 |  55 +++++++++++++++-
 2 files changed, 193 insertions(+), 6 deletions(-)

diff --git a/ip/ipnexthop.c b/ip/ipnexthop.c
index 5aae32629edd..1d50bf7529c4 100644
--- a/ip/ipnexthop.c
+++ b/ip/ipnexthop.c
@@ -43,9 +43,12 @@ static void usage(void)
 		"            [ groups ] [ fdb ]\n"
 		"NH := { blackhole | [ via ADDRESS ] [ dev DEV ] [ onlink ]\n"
 		"        [ encap ENCAPTYPE ENCAPHDR ] |\n"
-		"        group GROUP [ fdb ] [ type TYPE ] }\n"
+		"        group GROUP [ fdb ] [ type TYPE [ TYPE_ARGS ] ] }\n"
 		"GROUP := [ <id[,weight]>/<id[,weight]>/... ]\n"
-		"TYPE := { mpath }\n"
+		"TYPE := { mpath | resilient }\n"
+		"TYPE_ARGS := [ RESILIENT_ARGS ]\n"
+		"RESILIENT_ARGS := [ buckets BUCKETS ] [ idle_timer IDLE ]\n"
+		"                  [ unbalanced_timer UNBALANCED ]\n"
 		"ENCAPTYPE := [ mpls ]\n"
 		"ENCAPHDR := [ MPLSLABEL ]\n");
 	exit(-1);
@@ -203,6 +206,66 @@ static void print_nh_group(FILE *fp, const struct rtattr *grps_attr)
 	close_json_array(PRINT_JSON, NULL);
 }
 
+static const char *nh_group_type_name(__u16 type)
+{
+	switch (type) {
+	case NEXTHOP_GRP_TYPE_MPATH:
+		return "mpath";
+	case NEXTHOP_GRP_TYPE_RES:
+		return "resilient";
+	default:
+		return "<unknown type>";
+	}
+}
+
+static void print_nh_group_type(FILE *fp, const struct rtattr *grp_type_attr)
+{
+	__u16 type = rta_getattr_u16(grp_type_attr);
+
+	if (type == NEXTHOP_GRP_TYPE_MPATH)
+		/* Do not print type in order not to break existing output. */
+		return;
+
+	print_string(PRINT_ANY, "type", "type %s ", nh_group_type_name(type));
+}
+
+static void print_nh_res_group(FILE *fp, const struct rtattr *res_grp_attr)
+{
+	struct rtattr *tb[NHA_RES_GROUP_MAX + 1];
+	struct rtattr *rta;
+	struct timeval tv;
+
+	parse_rtattr_nested(tb, NHA_RES_GROUP_MAX, res_grp_attr);
+
+	open_json_object("resilient_args");
+
+	if (tb[NHA_RES_GROUP_BUCKETS])
+		print_uint(PRINT_ANY, "buckets", "buckets %u ",
+			   rta_getattr_u16(tb[NHA_RES_GROUP_BUCKETS]));
+
+	if (tb[NHA_RES_GROUP_IDLE_TIMER]) {
+		rta = tb[NHA_RES_GROUP_IDLE_TIMER];
+		__jiffies_to_tv(&tv, rta_getattr_u32(rta));
+		print_tv(PRINT_ANY, "idle_timer", "idle_timer %g ", &tv);
+	}
+
+	if (tb[NHA_RES_GROUP_UNBALANCED_TIMER]) {
+		rta = tb[NHA_RES_GROUP_UNBALANCED_TIMER];
+		__jiffies_to_tv(&tv, rta_getattr_u32(rta));
+		print_tv(PRINT_ANY, "unbalanced_timer", "unbalanced_timer %g ",
+			 &tv);
+	}
+
+	if (tb[NHA_RES_GROUP_UNBALANCED_TIME]) {
+		rta = tb[NHA_RES_GROUP_UNBALANCED_TIME];
+		__jiffies_to_tv(&tv, rta_getattr_u32(rta));
+		print_tv(PRINT_ANY, "unbalanced_time", "unbalanced_time %g ",
+			 &tv);
+	}
+
+	close_json_object();
+}
+
 int print_nexthop(struct nlmsghdr *n, void *arg)
 {
 	struct nhmsg *nhm = NLMSG_DATA(n);
@@ -229,7 +292,7 @@ int print_nexthop(struct nlmsghdr *n, void *arg)
 	if (filter.proto && filter.proto != nhm->nh_protocol)
 		return 0;
 
-	parse_rtattr(tb, NHA_MAX, RTM_NHA(nhm), len);
+	parse_rtattr_flags(tb, NHA_MAX, RTM_NHA(nhm), len, NLA_F_NESTED);
 
 	open_json_object(NULL);
 
@@ -243,6 +306,12 @@ int print_nexthop(struct nlmsghdr *n, void *arg)
 	if (tb[NHA_GROUP])
 		print_nh_group(fp, tb[NHA_GROUP]);
 
+	if (tb[NHA_GROUP_TYPE])
+		print_nh_group_type(fp, tb[NHA_GROUP_TYPE]);
+
+	if (tb[NHA_RES_GROUP])
+		print_nh_res_group(fp, tb[NHA_RES_GROUP]);
+
 	if (tb[NHA_ENCAP])
 		lwt_print_encap(fp, tb[NHA_ENCAP_TYPE], tb[NHA_ENCAP]);
 
@@ -333,10 +402,70 @@ static int read_nh_group_type(const char *name)
 {
 	if (strcmp(name, "mpath") == 0)
 		return NEXTHOP_GRP_TYPE_MPATH;
+	else if (strcmp(name, "resilient") == 0)
+		return NEXTHOP_GRP_TYPE_RES;
 
 	return __NEXTHOP_GRP_TYPE_MAX;
 }
 
+static void parse_nh_group_type_res(struct nlmsghdr *n, int maxlen, int *argcp,
+				    char ***argvp)
+{
+	char **argv = *argvp;
+	struct rtattr *nest;
+	int argc = *argcp;
+
+	if (!NEXT_ARG_OK())
+		return;
+
+	nest = addattr_nest(n, maxlen, NHA_RES_GROUP);
+	nest->rta_type |= NLA_F_NESTED;
+
+	NEXT_ARG_FWD();
+	while (argc > 0) {
+		if (strcmp(*argv, "buckets") == 0) {
+			__u16 buckets;
+
+			NEXT_ARG();
+			if (get_u16(&buckets, *argv, 0))
+				invarg("invalid buckets value", *argv);
+
+			addattr16(n, maxlen, NHA_RES_GROUP_BUCKETS, buckets);
+		} else if (strcmp(*argv, "idle_timer") == 0) {
+			__u32 idle_timer;
+
+			NEXT_ARG();
+			if (get_unsigned(&idle_timer, *argv, 0) ||
+			    idle_timer >= ~0UL / 100)
+				invarg("invalid idle timer value", *argv);
+
+			addattr32(n, maxlen, NHA_RES_GROUP_IDLE_TIMER,
+				  idle_timer * 100);
+		} else if (strcmp(*argv, "unbalanced_timer") == 0) {
+			__u32 unbalanced_timer;
+
+			NEXT_ARG();
+			if (get_unsigned(&unbalanced_timer, *argv, 0) ||
+			    unbalanced_timer >= ~0UL / 100)
+				invarg("invalid unbalanced timer value", *argv);
+
+			addattr32(n, maxlen, NHA_RES_GROUP_UNBALANCED_TIMER,
+				  unbalanced_timer * 100);
+		} else {
+			break;
+		}
+		argc--; argv++;
+	}
+
+	/* argv is currently the first unparsed argument, but ipnh_modify()
+	 * will move to the next, so step back.
+	 */
+	*argcp = argc + 1;
+	*argvp = argv - 1;
+
+	addattr_nest_end(n, nest);
+}
+
 static void parse_nh_group_type(struct nlmsghdr *n, int maxlen, int *argcp,
 				char ***argvp)
 {
@@ -349,6 +478,15 @@ static void parse_nh_group_type(struct nlmsghdr *n, int maxlen, int *argcp,
 	if (type > NEXTHOP_GRP_TYPE_MAX)
 		invarg("\"type\" value is invalid\n", *argv);
 
+	switch (type) {
+	case NEXTHOP_GRP_TYPE_MPATH:
+		/* No additional arguments */
+		break;
+	case NEXTHOP_GRP_TYPE_RES:
+		parse_nh_group_type_res(n, maxlen, &argc, &argv);
+		break;
+	}
+
 	*argcp = argc;
 	*argvp = argv;
 
diff --git a/man/man8/ip-nexthop.8 b/man/man8/ip-nexthop.8
index f02e0555a000..c68fcc0f9cf5 100644
--- a/man/man8/ip-nexthop.8
+++ b/man/man8/ip-nexthop.8
@@ -56,7 +56,7 @@ ip-nexthop \- nexthop object management
 .IR GROUP " [ "
 .BR fdb " ] [ "
 .B type
-.IR TYPE " ] } "
+.IR TYPE " [ " TYPE_ARGS " ] ] }"
 
 .ti -8
 .IR ENCAP " := [ "
@@ -75,7 +75,20 @@ ip-nexthop \- nexthop object management
 
 .ti -8
 .IR TYPE " := { "
-.BR mpath " }"
+.BR mpath " | " resilient " }"
+
+.ti -8
+.IR TYPE_ARGS " := [ "
+.IR RESILIENT_ARGS " ] "
+
+.ti -8
+.IR RESILIENT_ARGS " := "
+.RB "[ " buckets
+.IR BUCKETS " ] [ "
+.B  idle_timer
+.IR IDLE " ] [ "
+.B  unbalanced_timer
+.IR UNBALANCED " ]"
 
 .SH DESCRIPTION
 .B ip nexthop
@@ -128,7 +141,7 @@ is a set of encapsulation attributes specific to the
 .in -2
 
 .TP
-.BI group " GROUP [ " type " TYPE ]"
+.BI group " GROUP [ " type " TYPE [ TYPE_ARGS ] ]"
 create a nexthop group. Group specification is id with an optional
 weight (id,weight) and a '/' as a separator between entries.
 .sp
@@ -138,6 +151,37 @@ is a string specifying the nexthop group type. Namely:
 .in +8
 .BI mpath
 - multipath nexthop group
+.sp
+.BI resilient
+- resilient nexthop group. Group is resilient to addition and deletion of
+nexthops
+
+.sp
+.in -8
+.I TYPE_ARGS
+is a set of attributes specific to the
+.I TYPE.
+
+.in +8
+.B resilient
+.in +2
+.B buckets
+.I BUCKETS
+- Number of nexthop buckets. Cannot be changed for an existing group
+.sp
+
+.B idle_timer
+.I IDLE
+- Time in seconds in which a nexthop bucket does not see traffic and is
+therefore considered idle. Default is 120 seconds
+
+.B unbalanced_timer
+.I UNBALANCED
+- Time in seconds in which a nexthop group is unbalanced and is therefore
+considered unbalanced. The kernel will try to rebalance unbalanced groups, which
+might result in some flows being reset. A value of 0 means that no
+rebalancing will take place. Default is 0 seconds
+.in -2
 
 .TP
 .B blackhole
@@ -224,6 +268,11 @@ ip nexthop add id 7 group 5/6 fdb
 Adds a fdb nexthop group with id 7. A fdb nexthop group can only have
 fdb nexthops.
 .RE
+.PP
+ip nexthop add id 10 group 1/2 type resilient buckets 32
+.RS 4
+Add a resilient nexthop group with id 10 and 32 nexthop buckets.
+.RE
 .SH SEE ALSO
 .br
 .BR ip (8)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH iproute2-next 6/6] nexthop: Add support for nexthop buckets
  2021-03-12 17:23 [PATCH iproute2-next 0/6] ip: nexthop: Support resilient groups Petr Machata
                   ` (4 preceding siblings ...)
  2021-03-12 17:23 ` [PATCH iproute2-next 5/6] nexthop: Add support for resilient nexthop groups Petr Machata
@ 2021-03-12 17:23 ` Petr Machata
  5 siblings, 0 replies; 9+ messages in thread
From: Petr Machata @ 2021-03-12 17:23 UTC (permalink / raw)
  To: netdev, dsahern, stephen; +Cc: Ido Schimmel, Petr Machata

From: Petr Machata <me@pmachata.org>

From: Ido Schimmel <idosch@nvidia.com>

Add ability to dump multiple nexthop buckets and get a specific one.
Example:

 # ip nexthop add id 10 group 1/2 type resilient buckets 8
 # ip nexthop
 id 1 via 192.0.2.2 dev dummy10 scope link
 id 2 via 192.0.2.19 dev dummy20 scope link
 id 10 group 1/2 type resilient buckets 8 idle_timer 120 unbalanced_timer 0 unbalanced_time 0
 # ip nexthop bucket
 id 10 index 0 idle_time 28.1 nhid 2
 id 10 index 1 idle_time 28.1 nhid 2
 id 10 index 2 idle_time 28.1 nhid 2
 id 10 index 3 idle_time 28.1 nhid 2
 id 10 index 4 idle_time 28.1 nhid 1
 id 10 index 5 idle_time 28.1 nhid 1
 id 10 index 6 idle_time 28.1 nhid 1
 id 10 index 7 idle_time 28.1 nhid 1
 # ip nexthop bucket show nhid 1
 id 10 index 4 idle_time 53.59 nhid 1
 id 10 index 5 idle_time 53.59 nhid 1
 id 10 index 6 idle_time 53.59 nhid 1
 id 10 index 7 idle_time 53.59 nhid 1
 # ip nexthop bucket get id 10 index 5
 id 10 index 5 idle_time 81 nhid 1
 # ip -j -p nexthop bucket get id 10 index 5
 [ {
         "id": 10,
         "bucket": {
             "index": 5,
             "idle_time": 104.89,
             "nhid": 1
         },
         "flags": [ ]
     } ]

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 include/libnetlink.h  |   3 +
 ip/ip_common.h        |   1 +
 ip/ipmonitor.c        |   6 +
 ip/ipnexthop.c        | 254 ++++++++++++++++++++++++++++++++++++++++++
 lib/libnetlink.c      |  26 +++++
 man/man8/ip-nexthop.8 |  45 ++++++++
 6 files changed, 335 insertions(+)

diff --git a/include/libnetlink.h b/include/libnetlink.h
index b9073a6a13ad..e8ed5d7fb495 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -97,6 +97,9 @@ int rtnl_dump_request_n(struct rtnl_handle *rth, struct nlmsghdr *n)
 int rtnl_nexthopdump_req(struct rtnl_handle *rth, int family,
 			 req_filter_fn_t filter_fn)
 	__attribute__((warn_unused_result));
+int rtnl_nexthop_bucket_dump_req(struct rtnl_handle *rth, int family,
+				 req_filter_fn_t filter_fn)
+	__attribute__((warn_unused_result));
 
 struct rtnl_ctrl_data {
 	int	nsid;
diff --git a/ip/ip_common.h b/ip/ip_common.h
index 9a31e837563f..55a5521c4275 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -53,6 +53,7 @@ int print_rule(struct nlmsghdr *n, void *arg);
 int print_netconf(struct rtnl_ctrl_data *ctrl,
 		  struct nlmsghdr *n, void *arg);
 int print_nexthop(struct nlmsghdr *n, void *arg);
+int print_nexthop_bucket(struct nlmsghdr *n, void *arg);
 void netns_map_init(void);
 void netns_nsid_socket_init(void);
 int print_nsid(struct nlmsghdr *n, void *arg);
diff --git a/ip/ipmonitor.c b/ip/ipmonitor.c
index 99f5fda8ba1f..d7f31cf5d1b5 100644
--- a/ip/ipmonitor.c
+++ b/ip/ipmonitor.c
@@ -90,6 +90,12 @@ static int accept_msg(struct rtnl_ctrl_data *ctrl,
 		print_nexthop(n, arg);
 		return 0;
 
+	case RTM_NEWNEXTHOPBUCKET:
+	case RTM_DELNEXTHOPBUCKET:
+		print_headers(fp, "[NEXTHOPBUCKET]", ctrl);
+		print_nexthop_bucket(n, arg);
+		return 0;
+
 	case RTM_NEWLINK:
 	case RTM_DELLINK:
 		ll_remember_index(n, NULL);
diff --git a/ip/ipnexthop.c b/ip/ipnexthop.c
index 1d50bf7529c4..0263307c49df 100644
--- a/ip/ipnexthop.c
+++ b/ip/ipnexthop.c
@@ -21,6 +21,8 @@ static struct {
 	unsigned int master;
 	unsigned int proto;
 	unsigned int fdb;
+	unsigned int id;
+	unsigned int nhid;
 } filter;
 
 enum {
@@ -39,8 +41,11 @@ static void usage(void)
 		"Usage: ip nexthop { list | flush } [ protocol ID ] SELECTOR\n"
 		"       ip nexthop { add | replace } id ID NH [ protocol ID ]\n"
 		"       ip nexthop { get | del } id ID\n"
+		"       ip nexthop bucket list BUCKET_SELECTOR\n"
+		"       ip nexthop bucket get id ID index INDEX\n"
 		"SELECTOR := [ id ID ] [ dev DEV ] [ vrf NAME ] [ master DEV ]\n"
 		"            [ groups ] [ fdb ]\n"
+		"BUCKET_SELECTOR := SELECTOR | [ nhid ID ]\n"
 		"NH := { blackhole | [ via ADDRESS ] [ dev DEV ] [ onlink ]\n"
 		"        [ encap ENCAPTYPE ENCAPHDR ] |\n"
 		"        group GROUP [ fdb ] [ type TYPE [ TYPE_ARGS ] ] }\n"
@@ -85,6 +90,36 @@ static int nh_dump_filter(struct nlmsghdr *nlh, int reqlen)
 	return 0;
 }
 
+static int nh_dump_bucket_filter(struct nlmsghdr *nlh, int reqlen)
+{
+	struct rtattr *nest;
+	int err = 0;
+
+	err = nh_dump_filter(nlh, reqlen);
+	if (err)
+		return err;
+
+	if (filter.id) {
+		err = addattr32(nlh, reqlen, NHA_ID, filter.id);
+		if (err)
+			return err;
+	}
+
+	if (filter.nhid) {
+		nest = addattr_nest(nlh, reqlen, NHA_RES_BUCKET);
+		nest->rta_type |= NLA_F_NESTED;
+
+		err = addattr32(nlh, reqlen, NHA_RES_BUCKET_NH_ID,
+				filter.nhid);
+		if (err)
+			return err;
+
+		addattr_nest_end(nlh, nest);
+	}
+
+	return err;
+}
+
 static struct rtnl_handle rth_del = { .fd = -1 };
 
 static int delete_nexthop(__u32 id)
@@ -266,6 +301,33 @@ static void print_nh_res_group(FILE *fp, const struct rtattr *res_grp_attr)
 	close_json_object();
 }
 
+static void print_nh_res_bucket(FILE *fp, const struct rtattr *res_bucket_attr)
+{
+	struct rtattr *tb[NHA_RES_BUCKET_MAX + 1];
+
+	parse_rtattr_nested(tb, NHA_RES_BUCKET_MAX, res_bucket_attr);
+
+	open_json_object("bucket");
+
+	if (tb[NHA_RES_BUCKET_INDEX])
+		print_uint(PRINT_ANY, "index", "index %u ",
+			   rta_getattr_u16(tb[NHA_RES_BUCKET_INDEX]));
+
+	if (tb[NHA_RES_BUCKET_IDLE_TIME]) {
+		struct rtattr *rta = tb[NHA_RES_BUCKET_IDLE_TIME];
+		struct timeval tv;
+
+		__jiffies_to_tv(&tv, rta_getattr_u64(rta));
+		print_tv(PRINT_ANY, "idle_time", "idle_time %g ", &tv);
+	}
+
+	if (tb[NHA_RES_BUCKET_NH_ID])
+		print_uint(PRINT_ANY, "nhid", "nhid %u ",
+			   rta_getattr_u32(tb[NHA_RES_BUCKET_NH_ID]));
+
+	close_json_object();
+}
+
 int print_nexthop(struct nlmsghdr *n, void *arg)
 {
 	struct nhmsg *nhm = NLMSG_DATA(n);
@@ -346,6 +408,50 @@ int print_nexthop(struct nlmsghdr *n, void *arg)
 	return 0;
 }
 
+int print_nexthop_bucket(struct nlmsghdr *n, void *arg)
+{
+	struct nhmsg *nhm = NLMSG_DATA(n);
+	struct rtattr *tb[NHA_MAX+1];
+	FILE *fp = (FILE *)arg;
+	int len;
+
+	if (n->nlmsg_type != RTM_DELNEXTHOPBUCKET &&
+	    n->nlmsg_type != RTM_NEWNEXTHOPBUCKET) {
+		fprintf(stderr, "Not a nexthop bucket: %08x %08x %08x\n",
+			n->nlmsg_len, n->nlmsg_type, n->nlmsg_flags);
+		return -1;
+	}
+
+	len = n->nlmsg_len - NLMSG_SPACE(sizeof(*nhm));
+	if (len < 0) {
+		close_json_object();
+		fprintf(stderr, "BUG: wrong nlmsg len %d\n", len);
+		return -1;
+	}
+
+	parse_rtattr_flags(tb, NHA_MAX, RTM_NHA(nhm), len, NLA_F_NESTED);
+
+	open_json_object(NULL);
+
+	if (n->nlmsg_type == RTM_DELNEXTHOP)
+		print_bool(PRINT_ANY, "deleted", "Deleted ", true);
+
+	if (tb[NHA_ID])
+		print_uint(PRINT_ANY, "id", "id %u ",
+			   rta_getattr_u32(tb[NHA_ID]));
+
+	if (tb[NHA_RES_BUCKET])
+		print_nh_res_bucket(fp, tb[NHA_RES_BUCKET]);
+
+	print_rt_flags(fp, nhm->nh_flags);
+
+	print_string(PRINT_FP, NULL, "%s", "\n");
+	close_json_object();
+	fflush(fp);
+
+	return 0;
+}
+
 static int add_nh_group_attr(struct nlmsghdr *n, int maxlen, char *argv)
 {
 	struct nexthop_grp *grps;
@@ -721,6 +827,151 @@ static int ipnh_get(int argc, char **argv)
 	return ipnh_get_id(id);
 }
 
+static int ipnh_bucket_list(int argc, char **argv)
+{
+	while (argc > 0) {
+		if (!matches(*argv, "dev")) {
+			NEXT_ARG();
+			filter.ifindex = ll_name_to_index(*argv);
+			if (!filter.ifindex)
+				invarg("Device does not exist\n", *argv);
+		} else if (!matches(*argv, "master")) {
+			NEXT_ARG();
+			filter.master = ll_name_to_index(*argv);
+			if (!filter.master)
+				invarg("Device does not exist\n", *argv);
+		} else if (matches(*argv, "vrf") == 0) {
+			NEXT_ARG();
+			if (!name_is_vrf(*argv))
+				invarg("Invalid VRF\n", *argv);
+			filter.master = ll_name_to_index(*argv);
+			if (!filter.master)
+				invarg("VRF does not exist\n", *argv);
+		} else if (!strcmp(*argv, "id")) {
+			NEXT_ARG();
+			filter.id = ipnh_parse_id(*argv);
+		} else if (!strcmp(*argv, "nhid")) {
+			NEXT_ARG();
+			filter.nhid = ipnh_parse_id(*argv);
+		} else if (matches(*argv, "help") == 0) {
+			usage();
+		} else {
+			invarg("", *argv);
+		}
+		argc--; argv++;
+	}
+
+	if (rtnl_nexthop_bucket_dump_req(&rth, preferred_family,
+					 nh_dump_bucket_filter) < 0) {
+		perror("Cannot send dump request");
+		return -2;
+	}
+
+	new_json_obj(json);
+
+	if (rtnl_dump_filter(&rth, print_nexthop_bucket, stdout) < 0) {
+		fprintf(stderr, "Dump terminated\n");
+		return -2;
+	}
+
+	delete_json_obj();
+	fflush(stdout);
+
+	return 0;
+}
+
+static int ipnh_bucket_get_id(__u32 id, __u16 bucket_index)
+{
+	struct {
+		struct nlmsghdr	n;
+		struct nhmsg	nhm;
+		char		buf[1024];
+	} req = {
+		.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct nhmsg)),
+		.n.nlmsg_flags = NLM_F_REQUEST,
+		.n.nlmsg_type  = RTM_GETNEXTHOPBUCKET,
+		.nhm.nh_family = preferred_family,
+	};
+	struct nlmsghdr *answer;
+	struct rtattr *nest;
+
+	addattr32(&req.n, sizeof(req), NHA_ID, id);
+
+	nest = addattr_nest(&req.n, sizeof(req), NHA_RES_BUCKET);
+	nest->rta_type |= NLA_F_NESTED;
+
+	addattr16(&req.n, sizeof(req), NHA_RES_BUCKET_INDEX, bucket_index);
+
+	addattr_nest_end(&req.n, nest);
+
+	if (rtnl_talk(&rth, &req.n, &answer) < 0)
+		return -2;
+
+	new_json_obj(json);
+
+	if (print_nexthop_bucket(answer, (void *)stdout) < 0) {
+		free(answer);
+		return -1;
+	}
+
+	delete_json_obj();
+	fflush(stdout);
+
+	free(answer);
+
+	return 0;
+}
+
+static int ipnh_bucket_get(int argc, char **argv)
+{
+	bool bucket_valid = false;
+	__u16 bucket_index;
+	__u32 id = 0;
+
+	while (argc > 0) {
+		if (!strcmp(*argv, "id")) {
+			NEXT_ARG();
+			id = ipnh_parse_id(*argv);
+		} else if (!strcmp(*argv, "index")) {
+			NEXT_ARG();
+			if (get_u16(&bucket_index, *argv, 0))
+				invarg("invalid bucket index value", *argv);
+			bucket_valid = true;
+		} else  {
+			usage();
+		}
+		argc--; argv++;
+	}
+
+	if (!id || !bucket_valid) {
+		usage();
+		return -1;
+	}
+
+	return ipnh_bucket_get_id(id, bucket_index);
+}
+
+static int do_ipnh_bucket(int argc, char **argv)
+{
+	if (argc < 1)
+		return ipnh_bucket_list(0, NULL);
+
+	if (!matches(*argv, "list") ||
+	    !matches(*argv, "show") ||
+	    !matches(*argv, "lst"))
+		return ipnh_bucket_list(argc-1, argv+1);
+
+	if (!matches(*argv, "get"))
+		return ipnh_bucket_get(argc-1, argv+1);
+
+	if (!matches(*argv, "help"))
+		usage();
+
+	fprintf(stderr,
+		"Command \"%s\" is unknown, try \"ip nexthop help\".\n", *argv);
+	exit(-1);
+}
+
 int do_ipnh(int argc, char **argv)
 {
 	if (argc < 1)
@@ -746,6 +997,9 @@ int do_ipnh(int argc, char **argv)
 	if (!matches(*argv, "flush"))
 		return ipnh_list_flush(argc-1, argv+1, IPNH_FLUSH);
 
+	if (!matches(*argv, "bucket"))
+		return do_ipnh_bucket(argc-1, argv+1);
+
 	if (!matches(*argv, "help"))
 		usage();
 
diff --git a/lib/libnetlink.c b/lib/libnetlink.c
index c958aa57d0cd..6885087b34f9 100644
--- a/lib/libnetlink.c
+++ b/lib/libnetlink.c
@@ -282,6 +282,32 @@ int rtnl_nexthopdump_req(struct rtnl_handle *rth, int family,
 	return send(rth->fd, &req, sizeof(req), 0);
 }
 
+int rtnl_nexthop_bucket_dump_req(struct rtnl_handle *rth, int family,
+				 req_filter_fn_t filter_fn)
+{
+	struct {
+		struct nlmsghdr nlh;
+		struct nhmsg nhm;
+		char buf[128];
+	} req = {
+		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct nhmsg)),
+		.nlh.nlmsg_type = RTM_GETNEXTHOPBUCKET,
+		.nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
+		.nlh.nlmsg_seq = rth->dump = ++rth->seq,
+		.nhm.nh_family = family,
+	};
+
+	if (filter_fn) {
+		int err;
+
+		err = filter_fn(&req.nlh, sizeof(req));
+		if (err)
+			return err;
+	}
+
+	return send(rth->fd, &req, sizeof(req), 0);
+}
+
 int rtnl_addrdump_req(struct rtnl_handle *rth, int family,
 		      req_filter_fn_t filter_fn)
 {
diff --git a/man/man8/ip-nexthop.8 b/man/man8/ip-nexthop.8
index c68fcc0f9cf5..d9ce634fb3d8 100644
--- a/man/man8/ip-nexthop.8
+++ b/man/man8/ip-nexthop.8
@@ -28,6 +28,14 @@ ip-nexthop \- nexthop object management
 .BR "ip nexthop" " { " get " | " del " } id "
 .I  ID
 
+.ti -8
+.BI "ip nexthop bucket list " BUCKET_SELECTOR
+
+.ti -8
+.BR "ip nexthop bucket get " id
+.I  ID
+.RI "index " INDEX
+
 .ti -8
 .IR SELECTOR " := "
 .RB "[ " id
@@ -41,6 +49,12 @@ ip-nexthop \- nexthop object management
 .BR  groups " ] [ "
 .BR  fdb " ]"
 
+.ti -8
+.IR BUCKET_SELECTOR " := "
+.IR SELECTOR
+.RB " | [ " nhid
+.IR ID " ]"
+
 .ti -8
 .IR NH " := { "
 .BR blackhole " | [ "
@@ -229,6 +243,37 @@ as show.
 ip nexthop get id ID
 get a single nexthop by id
 
+.TP
+ip nexthop bucket show
+show the contents of the nexthop bucket table or the nexthop buckets
+selected by some criteria.
+.RS
+.TP
+.BI id " ID "
+.in +0
+show the nexthop buckets that belong to a nexthop group with a given id
+.TP
+.BI nhid " ID "
+.in +0
+show the nexthop buckets that hold a nexthop with a given id
+.TP
+.BI dev " DEV "
+.in +0
+show the nexthop buckets using the given device
+.TP
+.BI vrf " NAME "
+.in +0
+show the nexthop buckets using devices associated with the vrf name
+.TP
+.BI master " DEV "
+.in +0
+show the nexthop buckets using devices enslaved to given master device
+.RE
+
+.TP
+ip nexthop bucket get id ID index INDEX
+get a single nexthop bucket by nexthop group id and bucket index
+
 .SH EXAMPLES
 .PP
 ip nexthop ls
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH iproute2-next 4/6] nexthop: Add ability to specify group type
  2021-03-12 17:23 ` [PATCH iproute2-next 4/6] nexthop: Add ability to specify group type Petr Machata
@ 2021-03-14 15:55   ` David Ahern
  2021-03-15 11:38     ` Petr Machata
  0 siblings, 1 reply; 9+ messages in thread
From: David Ahern @ 2021-03-14 15:55 UTC (permalink / raw)
  To: Petr Machata, netdev, stephen; +Cc: Ido Schimmel, Petr Machata

On 3/12/21 10:23 AM, Petr Machata wrote:
> From: Petr Machata <me@pmachata.org>
> 
> From: Ido Schimmel <idosch@nvidia.com>

All of the patches have the above. If Ido is the author and you are
sending, AIUI you add your Signed-off-by below his.

> 
> Next patches are going to add a 'resilient' nexthop group type, so allow
> users to specify the type using the 'type' argument. Currently, only
> 'mpath' type is supported.
> 
> These two command are equivalent:
> 
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  ip/ipnexthop.c        | 32 +++++++++++++++++++++++++++++++-
>  man/man8/ip-nexthop.8 | 18 ++++++++++++++++--
>  2 files changed, 47 insertions(+), 3 deletions(-)
> 

...

> diff --git a/man/man8/ip-nexthop.8 b/man/man8/ip-nexthop.8
> index 4d55f4dbcc75..f02e0555a000 100644
> --- a/man/man8/ip-nexthop.8
> +++ b/man/man8/ip-nexthop.8
> @@ -54,7 +54,9 @@ ip-nexthop \- nexthop object management
>  .BR fdb " ] | "
>  .B  group
>  .IR GROUP " [ "
> -.BR fdb " ] } "
> +.BR fdb " ] [ "
> +.B type
> +.IR TYPE " ] } "
>  
>  .ti -8
>  .IR ENCAP " := [ "
> @@ -71,6 +73,10 @@ ip-nexthop \- nexthop object management
>  .IR GROUP " := "
>  .BR id "[," weight "[/...]"
>  
> +.ti -8
> +.IR TYPE " := { "
> +.BR mpath " }"
> +
>  .SH DESCRIPTION
>  .B ip nexthop
>  is used to manipulate entries in the kernel's nexthop tables.
> @@ -122,9 +128,17 @@ is a set of encapsulation attributes specific to the
>  .in -2
>  
>  .TP
> -.BI group " GROUP"
> +.BI group " GROUP [ " type " TYPE ]"
>  create a nexthop group. Group specification is id with an optional
>  weight (id,weight) and a '/' as a separator between entries.
> +.sp
> +.I TYPE
> +is a string specifying the nexthop group type. Namely:
> +
> +.in +8
> +.BI mpath
> +- multipath nexthop group
> +

Add a comment that this is the default group type and refers to the
legacy hash-bashed multipath group.

The rest of the patches look ok to me.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH iproute2-next 4/6] nexthop: Add ability to specify group type
  2021-03-14 15:55   ` David Ahern
@ 2021-03-15 11:38     ` Petr Machata
  0 siblings, 0 replies; 9+ messages in thread
From: Petr Machata @ 2021-03-15 11:38 UTC (permalink / raw)
  To: David Ahern; +Cc: Petr Machata, netdev, stephen, Ido Schimmel, Petr Machata


David Ahern <dsahern@gmail.com> writes:

> On 3/12/21 10:23 AM, Petr Machata wrote:
>> From: Petr Machata <me@pmachata.org>
>> 
>> From: Ido Schimmel <idosch@nvidia.com>
>
> All of the patches have the above. If Ido is the author and you are
> sending, AIUI you add your Signed-off-by below his.

Sorry about that, that's a leftover from when I was sending the DCB
patches. I'll resend with the correct headers.

>> +.sp
>> +.I TYPE
>> +is a string specifying the nexthop group type. Namely:
>> +
>> +.in +8
>> +.BI mpath
>> +- multipath nexthop group
>> +
>
> Add a comment that this is the default group type and refers to the
> legacy hash-bashed multipath group.

OK.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-03-15 11:39 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-12 17:23 [PATCH iproute2-next 0/6] ip: nexthop: Support resilient groups Petr Machata
2021-03-12 17:23 ` [PATCH iproute2-next 1/6] nexthop: Synchronize uAPI files Petr Machata
2021-03-12 17:23 ` [PATCH iproute2-next 2/6] json_print: Add print_tv() Petr Machata
2021-03-12 17:23 ` [PATCH iproute2-next 3/6] nexthop: Extract a helper to parse a NH ID Petr Machata
2021-03-12 17:23 ` [PATCH iproute2-next 4/6] nexthop: Add ability to specify group type Petr Machata
2021-03-14 15:55   ` David Ahern
2021-03-15 11:38     ` Petr Machata
2021-03-12 17:23 ` [PATCH iproute2-next 5/6] nexthop: Add support for resilient nexthop groups Petr Machata
2021-03-12 17:23 ` [PATCH iproute2-next 6/6] nexthop: Add support for nexthop buckets Petr Machata

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).