All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next] docs: netlink: clarify the historical baggage of Netlink flags
@ 2022-09-27 21:23 Jakub Kicinski
  2022-09-28  7:03 ` Nikolay Aleksandrov
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Jakub Kicinski @ 2022-09-27 21:23 UTC (permalink / raw)
  To: netdev, davem, edumazet, pabeni
  Cc: Jakub Kicinski, Johannes Berg, Pablo Neira Ayuso,
	Florian Westphal, Jamal Hadi Salim, Jacob Keller,
	Florent Fourcot, Guillaume Nault, Nicolas Dichtel,
	Nikolay Aleksandrov, Hangbin Liu

nlmsg_flags are full of historical baggage, inconsistencies and
strangeness. Try to document it more thoroughly. Explain the meaning
of the ECHO flag (and while at it clarify the comment in the uAPI).
Handwave a little about the NEW request flags and how they make
sense on the surface but cater to really old paradigm before commands
were a thing.

I will add more notes on how to make use of ECHO and discouragement
for reuse of flags to the kernel-side documentation.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
--
CC: Johannes Berg <johannes@sipsolutions.net>
CC: Pablo Neira Ayuso <pablo@netfilter.org>
CC: Florian Westphal <fw@strlen.de>
CC: Jamal Hadi Salim <jhs@mojatatu.com>
CC: Jacob Keller <jacob.e.keller@intel.com>
CC: Florent Fourcot <florent.fourcot@wifirst.fr>
CC: Guillaume Nault <gnault@redhat.com>
CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
CC: Nikolay Aleksandrov <razor@blackwall.org>
CC: Hangbin Liu <liuhangbin@gmail.com>
---
 Documentation/userspace-api/netlink/intro.rst | 61 +++++++++++++++----
 include/uapi/linux/netlink.h                  |  2 +-
 2 files changed, 49 insertions(+), 14 deletions(-)

diff --git a/Documentation/userspace-api/netlink/intro.rst b/Documentation/userspace-api/netlink/intro.rst
index 8f1220756412..0955e9f203d3 100644
--- a/Documentation/userspace-api/netlink/intro.rst
+++ b/Documentation/userspace-api/netlink/intro.rst
@@ -623,22 +623,57 @@ Even though other protocols and Generic Netlink commands often use
 the same verbs in their message names (``GET``, ``SET``) the concept
 of request types did not find wider adoption.
 
-Message flags
--------------
+Notification echo
+-----------------
+
+``NLM_F_ECHO`` requests for notifications resulting from the request
+to be queued onto the requesting socket. This is useful to discover
+the impact of the request.
+
+Note that this feature is not universally implemented.
+
+Other request-type-specific flags
+---------------------------------
+
+Classic Netlink defined various flags for its ``GET``, ``NEW``
+and ``DEL`` requests in the upper byte of nlmsg_flags in struct nlmsghdr.
+Since request types have not been generalized the request type specific
+flags are rarely used (and considered deprecated for new families).
+
+For ``GET`` - ``NLM_F_ROOT`` and ``NLM_F_MATCH`` are combined into
+``NLM_F_DUMP``, and not used separately. ``NLM_F_ATOMIC`` is never used.
+
+For ``DEL`` - ``NLM_F_NONREC`` is only used by nftables and ``NLM_F_BULK``
+only by FDB some operations.
+
+The flags for ``NEW`` are used most commonly in classic Netlink. Unfortunately,
+the meaning is not crystal clear. The following description is based on the
+best guess of the intention of the authors, and in practice all families
+stray from it in one way or another. ``NLM_F_REPLACE`` asks to replace
+an existing object, if no matching object exists the operation should fail.
+``NLM_F_EXCL`` has the opposite semantics and only succeeds if object already
+existed.
+``NLM_F_CREATE`` asks for the object to be created if it does not
+exist, it can be combined with ``NLM_F_REPLACE`` and ``NLM_F_EXCL``.
+
+A comment in the main Netlink uAPI header states::
+
+   4.4BSD ADD		NLM_F_CREATE|NLM_F_EXCL
+   4.4BSD CHANGE	NLM_F_REPLACE
 
-The earlier section has already covered the basic request flags
-(``NLM_F_REQUEST``, ``NLM_F_ACK``, ``NLM_F_DUMP``) and the ``NLMSG_ERROR`` /
-``NLMSG_DONE`` flags (``NLM_F_CAPPED``, ``NLM_F_ACK_TLVS``).
-Dump flags were also mentioned (``NLM_F_MULTI``, ``NLM_F_DUMP_INTR``).
+   True CHANGE		NLM_F_CREATE|NLM_F_REPLACE
+   Append		NLM_F_CREATE
+   Check		NLM_F_EXCL
 
-Those are the main flags of note, with a small exception (of ``ieee802154``)
-Generic Netlink does not make use of other flags. If the protocol needs
-to communicate special constraints for a request it should use
-an attribute, not the flags in struct nlmsghdr.
+which seems to indicate that those flags predate request types.
+``NLM_F_REPLACE`` without ``NLM_F_CREATE`` was initially used instead
+of ``SET`` commands.
+``NLM_F_EXCL`` without ``NLM_F_CREATE`` was used to check if object exists
+without creating it, presumably predating ``GET`` commands.
 
-Classic Netlink, however, defined various flags for its ``GET``, ``NEW``
-and ``DEL`` requests. Since request types have not been generalized
-the request type specific flags should not be used either.
+``NLM_F_APPEND`` indicates that if one key can have multiple objects associated
+with it (e.g. multiple next-hop objects for a route) the new object should be
+added to the list rather than replacing the entire list.
 
 uAPI reference
 ==============
diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h
index e0689dbd2cde..e2ae82e3f9f7 100644
--- a/include/uapi/linux/netlink.h
+++ b/include/uapi/linux/netlink.h
@@ -62,7 +62,7 @@ struct nlmsghdr {
 #define NLM_F_REQUEST		0x01	/* It is request message. 	*/
 #define NLM_F_MULTI		0x02	/* Multipart message, terminated by NLMSG_DONE */
 #define NLM_F_ACK		0x04	/* Reply with ack, with zero or error code */
-#define NLM_F_ECHO		0x08	/* Echo this request 		*/
+#define NLM_F_ECHO		0x08	/* Receive resulting notifications */
 #define NLM_F_DUMP_INTR		0x10	/* Dump was inconsistent due to sequence change */
 #define NLM_F_DUMP_FILTERED	0x20	/* Dump was filtered as requested */
 
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] docs: netlink: clarify the historical baggage of Netlink flags
  2022-09-27 21:23 [PATCH net-next] docs: netlink: clarify the historical baggage of Netlink flags Jakub Kicinski
@ 2022-09-28  7:03 ` Nikolay Aleksandrov
  2022-09-28 14:21   ` Jakub Kicinski
  2022-09-28  8:04 ` Florent Fourcot
  2022-09-30  2:21 ` patchwork-bot+netdevbpf
  2 siblings, 1 reply; 20+ messages in thread
From: Nikolay Aleksandrov @ 2022-09-28  7:03 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, davem, edumazet, pabeni
  Cc: Johannes Berg, Pablo Neira Ayuso, Florian Westphal,
	Jamal Hadi Salim, Jacob Keller, Florent Fourcot, Guillaume Nault,
	Nicolas Dichtel, Hangbin Liu

On 28/09/2022 00:23, Jakub Kicinski wrote:
> nlmsg_flags are full of historical baggage, inconsistencies and
> strangeness. Try to document it more thoroughly. Explain the meaning
> of the ECHO flag (and while at it clarify the comment in the uAPI).
> Handwave a little about the NEW request flags and how they make
> sense on the surface but cater to really old paradigm before commands
> were a thing.
> 
> I will add more notes on how to make use of ECHO and discouragement
> for reuse of flags to the kernel-side documentation.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> --
> CC: Johannes Berg <johannes@sipsolutions.net>
> CC: Pablo Neira Ayuso <pablo@netfilter.org>
> CC: Florian Westphal <fw@strlen.de>
> CC: Jamal Hadi Salim <jhs@mojatatu.com>
> CC: Jacob Keller <jacob.e.keller@intel.com>
> CC: Florent Fourcot <florent.fourcot@wifirst.fr>
> CC: Guillaume Nault <gnault@redhat.com>
> CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> CC: Nikolay Aleksandrov <razor@blackwall.org>
> CC: Hangbin Liu <liuhangbin@gmail.com>
> ---
>  Documentation/userspace-api/netlink/intro.rst | 61 +++++++++++++++----
>  include/uapi/linux/netlink.h                  |  2 +-
>  2 files changed, 49 insertions(+), 14 deletions(-)
> 
> diff --git a/Documentation/userspace-api/netlink/intro.rst b/Documentation/userspace-api/netlink/intro.rst
> index 8f1220756412..0955e9f203d3 100644
> --- a/Documentation/userspace-api/netlink/intro.rst
> +++ b/Documentation/userspace-api/netlink/intro.rst
> @@ -623,22 +623,57 @@ Even though other protocols and Generic Netlink commands often use
>  the same verbs in their message names (``GET``, ``SET``) the concept
>  of request types did not find wider adoption.
>  
> -Message flags
> --------------
> +Notification echo
> +-----------------
> +
> +``NLM_F_ECHO`` requests for notifications resulting from the request
> +to be queued onto the requesting socket. This is useful to discover
> +the impact of the request.
> +
> +Note that this feature is not universally implemented.
> +
> +Other request-type-specific flags
> +---------------------------------
> +
> +Classic Netlink defined various flags for its ``GET``, ``NEW``
> +and ``DEL`` requests in the upper byte of nlmsg_flags in struct nlmsghdr.
> +Since request types have not been generalized the request type specific
> +flags are rarely used (and considered deprecated for new families).
> +
> +For ``GET`` - ``NLM_F_ROOT`` and ``NLM_F_MATCH`` are combined into
> +``NLM_F_DUMP``, and not used separately. ``NLM_F_ATOMIC`` is never used.
> +
> +For ``DEL`` - ``NLM_F_NONREC`` is only used by nftables and ``NLM_F_BULK``
> +only by FDB some operations.
> +

The part about NLM_F_BULK is correct now, but won't be soon. I have patches to add
bulk delete to mdbs as well, and IIRC there were plans for other object types.
I can update the doc once they are applied, but IMO it will be more useful to explain
why they are used instead of who's using them, i.e. the BULK was added to support
flush for FDBs w/ filtering initially and it's a flag so others can re-use it
(my first attempt targeted only FDBs[1], but after a discussion it became clear that
it will be more beneficial if other object types can re-use it so moved to a flag).
The first version of the fdb flush support used only netlink attributes to do the
flush via setlink, later moved to a specific RTM command (RTM_FLUSHNEIGH)[2] and
finally settled on the flag[3][4] so everyone can use it.

FWIW the rest looks good to me.

Thanks,
 Nik

[1] https://www.spinics.net/lists/netdev/msg812473.html
[2] https://lore.kernel.org/netdev/20220411230356.GB8838@u2004-local/T/
[3] https://lore.kernel.org/netdev/20220411230356.GB8838@u2004-local/T/#m293c48e788e1a82aa77696f657004e8b4ab72967
[4] https://www.spinics.net/lists/netdev/msg813149.html

> +The flags for ``NEW`` are used most commonly in classic Netlink. Unfortunately,
> +the meaning is not crystal clear. The following description is based on the
> +best guess of the intention of the authors, and in practice all families
> +stray from it in one way or another. ``NLM_F_REPLACE`` asks to replace
> +an existing object, if no matching object exists the operation should fail.
> +``NLM_F_EXCL`` has the opposite semantics and only succeeds if object already
> +existed.
> +``NLM_F_CREATE`` asks for the object to be created if it does not
> +exist, it can be combined with ``NLM_F_REPLACE`` and ``NLM_F_EXCL``.
> +
> +A comment in the main Netlink uAPI header states::
> +
> +   4.4BSD ADD		NLM_F_CREATE|NLM_F_EXCL
> +   4.4BSD CHANGE	NLM_F_REPLACE
>  
> -The earlier section has already covered the basic request flags
> -(``NLM_F_REQUEST``, ``NLM_F_ACK``, ``NLM_F_DUMP``) and the ``NLMSG_ERROR`` /
> -``NLMSG_DONE`` flags (``NLM_F_CAPPED``, ``NLM_F_ACK_TLVS``).
> -Dump flags were also mentioned (``NLM_F_MULTI``, ``NLM_F_DUMP_INTR``).
> +   True CHANGE		NLM_F_CREATE|NLM_F_REPLACE
> +   Append		NLM_F_CREATE
> +   Check		NLM_F_EXCL
>  
> -Those are the main flags of note, with a small exception (of ``ieee802154``)
> -Generic Netlink does not make use of other flags. If the protocol needs
> -to communicate special constraints for a request it should use
> -an attribute, not the flags in struct nlmsghdr.
> +which seems to indicate that those flags predate request types.
> +``NLM_F_REPLACE`` without ``NLM_F_CREATE`` was initially used instead
> +of ``SET`` commands.
> +``NLM_F_EXCL`` without ``NLM_F_CREATE`` was used to check if object exists
> +without creating it, presumably predating ``GET`` commands.
>  
> -Classic Netlink, however, defined various flags for its ``GET``, ``NEW``
> -and ``DEL`` requests. Since request types have not been generalized
> -the request type specific flags should not be used either.
> +``NLM_F_APPEND`` indicates that if one key can have multiple objects associated
> +with it (e.g. multiple next-hop objects for a route) the new object should be
> +added to the list rather than replacing the entire list.
>  
>  uAPI reference
>  ==============
> diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h
> index e0689dbd2cde..e2ae82e3f9f7 100644
> --- a/include/uapi/linux/netlink.h
> +++ b/include/uapi/linux/netlink.h
> @@ -62,7 +62,7 @@ struct nlmsghdr {
>  #define NLM_F_REQUEST		0x01	/* It is request message. 	*/
>  #define NLM_F_MULTI		0x02	/* Multipart message, terminated by NLMSG_DONE */
>  #define NLM_F_ACK		0x04	/* Reply with ack, with zero or error code */
> -#define NLM_F_ECHO		0x08	/* Echo this request 		*/
> +#define NLM_F_ECHO		0x08	/* Receive resulting notifications */
>  #define NLM_F_DUMP_INTR		0x10	/* Dump was inconsistent due to sequence change */
>  #define NLM_F_DUMP_FILTERED	0x20	/* Dump was filtered as requested */
>  


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] docs: netlink: clarify the historical baggage of Netlink flags
  2022-09-27 21:23 [PATCH net-next] docs: netlink: clarify the historical baggage of Netlink flags Jakub Kicinski
  2022-09-28  7:03 ` Nikolay Aleksandrov
@ 2022-09-28  8:04 ` Florent Fourcot
  2022-09-28  8:55   ` Nicolas Dichtel
  2022-09-30  2:21 ` patchwork-bot+netdevbpf
  2 siblings, 1 reply; 20+ messages in thread
From: Florent Fourcot @ 2022-09-28  8:04 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, davem, edumazet, pabeni
  Cc: Johannes Berg, Pablo Neira Ayuso, Florian Westphal,
	Jamal Hadi Salim, Jacob Keller, Guillaume Nault, Nicolas Dichtel,
	Nikolay Aleksandrov, Hangbin Liu

Hello,

About NLM_F_EXCL, I'm not sure that my comment is relevant for your 
intro.rst document, but it has another usage in ipset submodule. For 
IPSET_CMD_DEL, setting NLM_F_EXCL means "raise an error if entry does 
not exist before the delete".

For IPSET_CMD_ADD, setting NLM_F_EXCL will raises an error if it already 
exists (that is fine with your documentation). But without the flag, 
IPSET_CMD_ADD is more or less like a "replace" operation, since one can 
update fields like counters values (IPSET_ATTR_PACKETS / 
IPSET_ATTR_BYTES), without using NLM_F_REPLACE flag.

Best regards

-- 
*Ce message et toutes les pièces jointes (ci-après le "message") sont 
établis à l’intention exclusive des destinataires désignés. Il contient des 
informations confidentielles et pouvant être protégé par le secret 
professionnel. Si vous recevez ce message par erreur, merci d'en avertir 
immédiatement l'expéditeur et de détruire le message. Toute utilisation de 
ce message non conforme à sa destination, toute diffusion ou toute 
publication, totale ou partielle, est interdite, sauf autorisation expresse 
de l'émetteur*

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] docs: netlink: clarify the historical baggage of Netlink flags
  2022-09-28  8:04 ` Florent Fourcot
@ 2022-09-28  8:55   ` Nicolas Dichtel
  2022-09-28  9:21     ` Nikolay Aleksandrov
  0 siblings, 1 reply; 20+ messages in thread
From: Nicolas Dichtel @ 2022-09-28  8:55 UTC (permalink / raw)
  To: Florent Fourcot, Jakub Kicinski, netdev, davem, edumazet, pabeni
  Cc: Johannes Berg, Pablo Neira Ayuso, Florian Westphal,
	Jamal Hadi Salim, Jacob Keller, Guillaume Nault,
	Nikolay Aleksandrov, Hangbin Liu


Hello,

Le 28/09/2022 à 10:04, Florent Fourcot a écrit :
> Hello,
> 
> About NLM_F_EXCL, I'm not sure that my comment is relevant for your intro.rst
> document, but it has another usage in ipset submodule. For IPSET_CMD_DEL,
> setting NLM_F_EXCL means "raise an error if entry does not exist before the
> delete".
So NLM_F_EXCL could be used with DEL command for netfilter netlink but cannot be
used (it overlaps with NLM_F_BULK, see [1]) with DEL command for rtnetlink.
Sigh :(

[1] https://lore.kernel.org/netdev/0198618f-7b52-3023-5e9f-b38c49af1677@6wind.com/


Regards,
Nicolas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] docs: netlink: clarify the historical baggage of Netlink flags
  2022-09-28  8:55   ` Nicolas Dichtel
@ 2022-09-28  9:21     ` Nikolay Aleksandrov
  2022-09-28 14:37       ` Jakub Kicinski
  0 siblings, 1 reply; 20+ messages in thread
From: Nikolay Aleksandrov @ 2022-09-28  9:21 UTC (permalink / raw)
  To: nicolas.dichtel, Florent Fourcot, Jakub Kicinski, netdev, davem,
	edumazet, pabeni
  Cc: Johannes Berg, Pablo Neira Ayuso, Florian Westphal,
	Jamal Hadi Salim, Jacob Keller, Guillaume Nault, Hangbin Liu

On 28/09/2022 11:55, Nicolas Dichtel wrote:
> 
> Hello,
> 
> Le 28/09/2022 à 10:04, Florent Fourcot a écrit :
>> Hello,
>>
>> About NLM_F_EXCL, I'm not sure that my comment is relevant for your intro.rst
>> document, but it has another usage in ipset submodule. For IPSET_CMD_DEL,
>> setting NLM_F_EXCL means "raise an error if entry does not exist before the
>> delete".
> So NLM_F_EXCL could be used with DEL command for netfilter netlink but cannot be
> used (it overlaps with NLM_F_BULK, see [1]) with DEL command for rtnetlink.
> Sigh :(
> 
> [1] https://lore.kernel.org/netdev/0198618f-7b52-3023-5e9f-b38c49af1677@6wind.com/
> 
> 
> Regards,
> Nicolas

One could argue that's abuse of the api, but since it's part of a different family
I guess it's ok. NLM_F_EXCL is a modifier of NEW cmd as the comment above it says
and has never had rtnetlink DEL users.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] docs: netlink: clarify the historical baggage of Netlink flags
  2022-09-28  7:03 ` Nikolay Aleksandrov
@ 2022-09-28 14:21   ` Jakub Kicinski
  2022-09-28 14:40     ` Nikolay Aleksandrov
  2022-09-30 11:07     ` Jamal Hadi Salim
  0 siblings, 2 replies; 20+ messages in thread
From: Jakub Kicinski @ 2022-09-28 14:21 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, davem, edumazet, pabeni, Johannes Berg,
	Pablo Neira Ayuso, Florian Westphal, Jamal Hadi Salim,
	Jacob Keller, Florent Fourcot, Guillaume Nault, Nicolas Dichtel,
	Hangbin Liu

On Wed, 28 Sep 2022 10:03:07 +0300 Nikolay Aleksandrov wrote:
> The part about NLM_F_BULK is correct now, but won't be soon. I have patches to add
> bulk delete to mdbs as well, and IIRC there were plans for other object types.
> I can update the doc once they are applied, but IMO it will be more useful to explain
> why they are used instead of who's using them, i.e. the BULK was added to support
> flush for FDBs w/ filtering initially and it's a flag so others can re-use it
> (my first attempt targeted only FDBs[1], but after a discussion it became clear that
> it will be more beneficial if other object types can re-use it so moved to a flag).
> The first version of the fdb flush support used only netlink attributes to do the
> flush via setlink, later moved to a specific RTM command (RTM_FLUSHNEIGH)[2] and
> finally settled on the flag[3][4] so everyone can use it.

I thought that's all FDB-ish stuff. Not really looking forward to the
use of flags spreading but within rtnl it may make some sense. We can
just update the docs tho, no?

BTW how would you define the exact semantics of NLM_F_BULK vs it not
being set, in abstract terms?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] docs: netlink: clarify the historical baggage of Netlink flags
  2022-09-28  9:21     ` Nikolay Aleksandrov
@ 2022-09-28 14:37       ` Jakub Kicinski
  2022-09-28 14:46         ` Nikolay Aleksandrov
  2022-09-28 15:19         ` Nicolas Dichtel
  0 siblings, 2 replies; 20+ messages in thread
From: Jakub Kicinski @ 2022-09-28 14:37 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: nicolas.dichtel, Florent Fourcot, netdev, davem, edumazet,
	pabeni, Johannes Berg, Pablo Neira Ayuso, Florian Westphal,
	Jamal Hadi Salim, Jacob Keller, Guillaume Nault, Hangbin Liu

On Wed, 28 Sep 2022 12:21:57 +0300 Nikolay Aleksandrov wrote:
> On 28/09/2022 11:55, Nicolas Dichtel wrote:
> > Le 28/09/2022 à 10:04, Florent Fourcot a écrit :  
> >> About NLM_F_EXCL, I'm not sure that my comment is relevant for your intro.rst
> >> document, but it has another usage in ipset submodule. For IPSET_CMD_DEL,
> >> setting NLM_F_EXCL means "raise an error if entry does not exist before the
> >> delete".  

Interesting.

> > So NLM_F_EXCL could be used with DEL command for netfilter netlink but cannot be
> > used (it overlaps with NLM_F_BULK, see [1]) with DEL command for rtnetlink.
> > Sigh :(
> > 
> > [1] https://lore.kernel.org/netdev/0198618f-7b52-3023-5e9f-b38c49af1677@6wind.com/
> 
> One could argue that's abuse of the api, but since it's part of a different family
> I guess it's ok. NLM_F_EXCL is a modifier of NEW cmd as the comment above it says
> and has never had rtnetlink DEL users.

It's fine in the sense that it works, but it's rather pointless to call
the flags common if they have different semantics depending on the
corner of the kernel they are used in, right?

I was very tempted to send a patch which would validate the top
byte of flags in genetlink for new commands, this way we may some day
find a truly common (as in enforced by the code) use for the bits. 

WDYT?

diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 7c136de117eb..0fbaed49e23f 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -739,6 +739,22 @@ static int genl_family_rcv_msg_doit(const struct genl_family *family,
 	return err;
 }
 
+static int genl_header_check(struct nlmsghdr *nlh, struct genlmsghdr *hdr)
+{
+	if (hdr->reserved)
+		return -EINVAL;
+
+	/* Flags - lower byte check */
+	if (nlh->nlmsg_flags & 0xff & ~(NLM_F_REQUEST | NLM_F_ACK | NLM_F_ECHO))
+		return -EINVAL;
+	/* Flags - higher byte check */
+	if (nlh->nlmsg_flags & 0xff00 &&
+	    nlh->nlmsg_flags & 0xff00 != NLM_F_DUMP)
+		return -EINVAL;
+
+	return 0;
+}
+
 static int genl_family_rcv_msg(const struct genl_family *family,
 			       struct sk_buff *skb,
 			       struct nlmsghdr *nlh,
@@ -757,7 +773,7 @@ static int genl_family_rcv_msg(const struct genl_family *family,
 	if (nlh->nlmsg_len < nlmsg_msg_size(hdrlen))
 		return -EINVAL;
 
-	if (hdr->cmd >= family->resv_start_op && hdr->reserved)
+	if (hdr->cmd >= family->resv_start_op && genl_header_check(nlh, hdr))
 		return -EINVAL;
 
 	if (genl_get_cmd(hdr->cmd, family, &op))

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] docs: netlink: clarify the historical baggage of Netlink flags
  2022-09-28 14:21   ` Jakub Kicinski
@ 2022-09-28 14:40     ` Nikolay Aleksandrov
  2022-09-28 14:43       ` Nikolay Aleksandrov
  2022-09-30 11:07     ` Jamal Hadi Salim
  1 sibling, 1 reply; 20+ messages in thread
From: Nikolay Aleksandrov @ 2022-09-28 14:40 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, davem, edumazet, pabeni, Johannes Berg,
	Pablo Neira Ayuso, Florian Westphal, Jamal Hadi Salim,
	Jacob Keller, Florent Fourcot, Guillaume Nault, Nicolas Dichtel,
	Hangbin Liu

On 28/09/2022 17:21, Jakub Kicinski wrote:
> On Wed, 28 Sep 2022 10:03:07 +0300 Nikolay Aleksandrov wrote:
>> The part about NLM_F_BULK is correct now, but won't be soon. I have patches to add
>> bulk delete to mdbs as well, and IIRC there were plans for other object types.
>> I can update the doc once they are applied, but IMO it will be more useful to explain
>> why they are used instead of who's using them, i.e. the BULK was added to support
>> flush for FDBs w/ filtering initially and it's a flag so others can re-use it
>> (my first attempt targeted only FDBs[1], but after a discussion it became clear that
>> it will be more beneficial if other object types can re-use it so moved to a flag).
>> The first version of the fdb flush support used only netlink attributes to do the
>> flush via setlink, later moved to a specific RTM command (RTM_FLUSHNEIGH)[2] and
>> finally settled on the flag[3][4] so everyone can use it.
> 
> I thought that's all FDB-ish stuff. Not really looking forward to the
> use of flags spreading but within rtnl it may make some sense. We can
> just update the docs tho, no?
> 

Sure, that's ok.

> BTW how would you define the exact semantics of NLM_F_BULK vs it not
> being set, in abstract terms?

Well, BULK is a delete modified to act on multiple objects, so I'd say if it is not
set the DELETE targets a single object vs multiple objects if set. Obviously in more
formal terms, sorry not at a PC right now. :)




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] docs: netlink: clarify the historical baggage of Netlink flags
  2022-09-28 14:40     ` Nikolay Aleksandrov
@ 2022-09-28 14:43       ` Nikolay Aleksandrov
  0 siblings, 0 replies; 20+ messages in thread
From: Nikolay Aleksandrov @ 2022-09-28 14:43 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, davem, edumazet, pabeni, Johannes Berg,
	Pablo Neira Ayuso, Florian Westphal, Jamal Hadi Salim,
	Jacob Keller, Florent Fourcot, Guillaume Nault, Nicolas Dichtel,
	Hangbin Liu

On 28/09/2022 17:40, Nikolay Aleksandrov wrote:
> On 28/09/2022 17:21, Jakub Kicinski wrote:
>> On Wed, 28 Sep 2022 10:03:07 +0300 Nikolay Aleksandrov wrote:
>>> The part about NLM_F_BULK is correct now, but won't be soon. I have patches to add
>>> bulk delete to mdbs as well, and IIRC there were plans for other object types.
>>> I can update the doc once they are applied, but IMO it will be more useful to explain
>>> why they are used instead of who's using them, i.e. the BULK was added to support
>>> flush for FDBs w/ filtering initially and it's a flag so others can re-use it
>>> (my first attempt targeted only FDBs[1], but after a discussion it became clear that
>>> it will be more beneficial if other object types can re-use it so moved to a flag).
>>> The first version of the fdb flush support used only netlink attributes to do the
>>> flush via setlink, later moved to a specific RTM command (RTM_FLUSHNEIGH)[2] and
>>> finally settled on the flag[3][4] so everyone can use it.
>>
>> I thought that's all FDB-ish stuff. Not really looking forward to the
>> use of flags spreading but within rtnl it may make some sense. We can
>> just update the docs tho, no?
>>
> 
> Sure, that's ok.
> 
>> BTW how would you define the exact semantics of NLM_F_BULK vs it not
>> being set, in abstract terms?
> 
> Well, BULK is a delete modified to act on multiple objects, so I'd say if it is not
> set the DELETE targets a single object vs multiple objects if set. Obviously in more
> formal terms, sorry not at a PC right now. :)
> 

s/modified/modifier/
in terms of the comments above these flags in the header file



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] docs: netlink: clarify the historical baggage of Netlink flags
  2022-09-28 14:37       ` Jakub Kicinski
@ 2022-09-28 14:46         ` Nikolay Aleksandrov
  2022-09-28 15:15           ` Jakub Kicinski
  2022-09-28 15:19         ` Nicolas Dichtel
  1 sibling, 1 reply; 20+ messages in thread
From: Nikolay Aleksandrov @ 2022-09-28 14:46 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: nicolas.dichtel, Florent Fourcot, netdev, davem, edumazet,
	pabeni, Johannes Berg, Pablo Neira Ayuso, Florian Westphal,
	Jamal Hadi Salim, Jacob Keller, Guillaume Nault, Hangbin Liu

On 28/09/2022 17:37, Jakub Kicinski wrote:
> On Wed, 28 Sep 2022 12:21:57 +0300 Nikolay Aleksandrov wrote:
>> On 28/09/2022 11:55, Nicolas Dichtel wrote:
>>> Le 28/09/2022 à 10:04, Florent Fourcot a écrit :  
>>>> About NLM_F_EXCL, I'm not sure that my comment is relevant for your intro.rst
>>>> document, but it has another usage in ipset submodule. For IPSET_CMD_DEL,
>>>> setting NLM_F_EXCL means "raise an error if entry does not exist before the
>>>> delete".  
> 
> Interesting.
> 
>>> So NLM_F_EXCL could be used with DEL command for netfilter netlink but cannot be
>>> used (it overlaps with NLM_F_BULK, see [1]) with DEL command for rtnetlink.
>>> Sigh :(
>>>
>>> [1] https://lore.kernel.org/netdev/0198618f-7b52-3023-5e9f-b38c49af1677@6wind.com/
>>
>> One could argue that's abuse of the api, but since it's part of a different family
>> I guess it's ok. NLM_F_EXCL is a modifier of NEW cmd as the comment above it says
>> and has never had rtnetlink DEL users.
> 
> It's fine in the sense that it works, but it's rather pointless to call
> the flags common if they have different semantics depending on the
> corner of the kernel they are used in, right?
> 

Right, and their comments and docs become kind of meaningless because of that.

> I was very tempted to send a patch which would validate the top
> byte of flags in genetlink for new commands, this way we may some day
> find a truly common (as in enforced by the code) use for the bits. 
> 
> WDYT?
> 

I like it, can't check right now if we can get into the same issue as with BULK where
someone is passing unused/wrong flags with the command and we break him though.
But I'd bite the bullet and maybe issue an extack msg as well.

> diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
> index 7c136de117eb..0fbaed49e23f 100644
> --- a/net/netlink/genetlink.c
> +++ b/net/netlink/genetlink.c
> @@ -739,6 +739,22 @@ static int genl_family_rcv_msg_doit(const struct genl_family *family,
>  	return err;
>  }
>  
> +static int genl_header_check(struct nlmsghdr *nlh, struct genlmsghdr *hdr)
> +{
> +	if (hdr->reserved)
> +		return -EINVAL;
> +
> +	/* Flags - lower byte check */
> +	if (nlh->nlmsg_flags & 0xff & ~(NLM_F_REQUEST | NLM_F_ACK | NLM_F_ECHO))
> +		return -EINVAL;
> +	/* Flags - higher byte check */
> +	if (nlh->nlmsg_flags & 0xff00 &&
> +	    nlh->nlmsg_flags & 0xff00 != NLM_F_DUMP)
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
>  static int genl_family_rcv_msg(const struct genl_family *family,
>  			       struct sk_buff *skb,
>  			       struct nlmsghdr *nlh,
> @@ -757,7 +773,7 @@ static int genl_family_rcv_msg(const struct genl_family *family,
>  	if (nlh->nlmsg_len < nlmsg_msg_size(hdrlen))
>  		return -EINVAL;
>  
> -	if (hdr->cmd >= family->resv_start_op && hdr->reserved)
> +	if (hdr->cmd >= family->resv_start_op && genl_header_check(nlh, hdr))
>  		return -EINVAL;
>  
>  	if (genl_get_cmd(hdr->cmd, family, &op))


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] docs: netlink: clarify the historical baggage of Netlink flags
  2022-09-28 14:46         ` Nikolay Aleksandrov
@ 2022-09-28 15:15           ` Jakub Kicinski
  0 siblings, 0 replies; 20+ messages in thread
From: Jakub Kicinski @ 2022-09-28 15:15 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: nicolas.dichtel, Florent Fourcot, netdev, davem, edumazet,
	pabeni, Johannes Berg, Pablo Neira Ayuso, Florian Westphal,
	Jamal Hadi Salim, Jacob Keller, Guillaume Nault, Hangbin Liu

On Wed, 28 Sep 2022 17:46:28 +0300 Nikolay Aleksandrov wrote:
> I like it, can't check right now if we can get into the same issue as with BULK where
> someone is passing unused/wrong flags with the command and we break him though.

So it'd only be effective for new commands, hopefully that's good
enough:

-	if (hdr->cmd >= family->resv_start_op && hdr->reserved)
+	if (hdr->cmd >= family->resv_start_op && genl_header_check(nlh, hdr))

the resv_start_op thing I added in this cycle.

> But I'd bite the bullet and maybe issue an extack msg as well.

Fair point, more tests more magic. I'll add a msg.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] docs: netlink: clarify the historical baggage of Netlink flags
  2022-09-28 14:37       ` Jakub Kicinski
  2022-09-28 14:46         ` Nikolay Aleksandrov
@ 2022-09-28 15:19         ` Nicolas Dichtel
  1 sibling, 0 replies; 20+ messages in thread
From: Nicolas Dichtel @ 2022-09-28 15:19 UTC (permalink / raw)
  To: Jakub Kicinski, Nikolay Aleksandrov
  Cc: Florent Fourcot, netdev, davem, edumazet, pabeni, Johannes Berg,
	Pablo Neira Ayuso, Florian Westphal, Jamal Hadi Salim,
	Jacob Keller, Guillaume Nault, Hangbin Liu

Le 28/09/2022 à 16:37, Jakub Kicinski a écrit :
[snip]
> I was very tempted to send a patch which would validate the top
> byte of flags in genetlink for new commands, this way we may some day
> find a truly common (as in enforced by the code) use for the bits. 
> 
> WDYT?
I think it's worth trying it. In the worst case, this could be reverted.
It will help to see how people are using these flags.
The same kind of patch for rtnetlink could be interesting before someone adds a
new flag that overlaps.

> 
> diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
> index 7c136de117eb..0fbaed49e23f 100644
> --- a/net/netlink/genetlink.c
> +++ b/net/netlink/genetlink.c
> @@ -739,6 +739,22 @@ static int genl_family_rcv_msg_doit(const struct genl_family *family,
>  	return err;
>  }
>  
> +static int genl_header_check(struct nlmsghdr *nlh, struct genlmsghdr *hdr)
> +{
> +	if (hdr->reserved)
> +		return -EINVAL;
> +
> +	/* Flags - lower byte check */
> +	if (nlh->nlmsg_flags & 0xff & ~(NLM_F_REQUEST | NLM_F_ACK | NLM_F_ECHO))
> +		return -EINVAL;
> +	/* Flags - higher byte check */
> +	if (nlh->nlmsg_flags & 0xff00 &&
> +	    nlh->nlmsg_flags & 0xff00 != NLM_F_DUMP)
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
>  static int genl_family_rcv_msg(const struct genl_family *family,
>  			       struct sk_buff *skb,
>  			       struct nlmsghdr *nlh,
> @@ -757,7 +773,7 @@ static int genl_family_rcv_msg(const struct genl_family *family,
>  	if (nlh->nlmsg_len < nlmsg_msg_size(hdrlen))
>  		return -EINVAL;
>  
> -	if (hdr->cmd >= family->resv_start_op && hdr->reserved)
> +	if (hdr->cmd >= family->resv_start_op && genl_header_check(nlh, hdr))
>  		return -EINVAL;
>  
>  	if (genl_get_cmd(hdr->cmd, family, &op))

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] docs: netlink: clarify the historical baggage of Netlink flags
  2022-09-27 21:23 [PATCH net-next] docs: netlink: clarify the historical baggage of Netlink flags Jakub Kicinski
  2022-09-28  7:03 ` Nikolay Aleksandrov
  2022-09-28  8:04 ` Florent Fourcot
@ 2022-09-30  2:21 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 20+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-09-30  2:21 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, davem, edumazet, pabeni, johannes, pablo, fw, jhs,
	jacob.e.keller, florent.fourcot, gnault, nicolas.dichtel, razor,
	liuhangbin

Hello:

This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 27 Sep 2022 14:23:06 -0700 you wrote:
> nlmsg_flags are full of historical baggage, inconsistencies and
> strangeness. Try to document it more thoroughly. Explain the meaning
> of the ECHO flag (and while at it clarify the comment in the uAPI).
> Handwave a little about the NEW request flags and how they make
> sense on the surface but cater to really old paradigm before commands
> were a thing.
> 
> [...]

Here is the summary with links:
  - [net-next] docs: netlink: clarify the historical baggage of Netlink flags
    https://git.kernel.org/netdev/net-next/c/5493a2ad0d20

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] docs: netlink: clarify the historical baggage of Netlink flags
  2022-09-28 14:21   ` Jakub Kicinski
  2022-09-28 14:40     ` Nikolay Aleksandrov
@ 2022-09-30 11:07     ` Jamal Hadi Salim
  2022-09-30 11:29       ` Nikolay Aleksandrov
  1 sibling, 1 reply; 20+ messages in thread
From: Jamal Hadi Salim @ 2022-09-30 11:07 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Nikolay Aleksandrov, netdev, davem, edumazet, pabeni,
	Johannes Berg, Pablo Neira Ayuso, Florian Westphal, Jacob Keller,
	Florent Fourcot, Guillaume Nault, Nicolas Dichtel, Hangbin Liu

On Wed, Sep 28, 2022 at 10:21 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 28 Sep 2022 10:03:07 +0300 Nikolay Aleksandrov wrote:
> > The part about NLM_F_BULK is correct now, but won't be soon. I have patches to add
> > bulk delete to mdbs as well, and IIRC there were plans for other object types.
> > I can update the doc once they are applied, but IMO it will be more useful to explain
> > why they are used instead of who's using them, i.e. the BULK was added to support
> > flush for FDBs w/ filtering initially and it's a flag so others can re-use it
> > (my first attempt targeted only FDBs[1], but after a discussion it became clear that
> > it will be more beneficial if other object types can re-use it so moved to a flag).
> > The first version of the fdb flush support used only netlink attributes to do the
> > flush via setlink, later moved to a specific RTM command (RTM_FLUSHNEIGH)[2] and
> > finally settled on the flag[3][4] so everyone can use it.
>
> I thought that's all FDB-ish stuff. Not really looking forward to the
> use of flags spreading but within rtnl it may make some sense. We can
> just update the docs tho, no?
>
> BTW how would you define the exact semantics of NLM_F_BULK vs it not
> being set, in abstract terms?

I missed the discussion.
NLM_F_BULK looks strange. Why pollute the main header? Wouldnt NLM_F_ROOT
have sufficed to trigger the semantic? Or better create your own
"service header"
and put fdb specific into that object header? i.e the idea in netlink
being you specify:
{Main header: Object specific header: Object specific attributes in TLVs}
Main header should only be extended if you really really have to -
i.e requires much
longer review..

Historical context: We did try to push netlink as a wire protocol
(meaning it goes
across nodes instead of user-kernel), see:
https://www.rfc-editor.org/rfc/rfc3549.html and
https://datatracker.ietf.org/doc/html/draft-jhsrha-forces-netlink2-02
(there's an implementation
of netlink2 that i can dig up).
unfortunately there was much politiking and ForCES protocol was the compromise;
as a side, there's _no way in hell_ current netlink would be possible
to use on the wire
because of all these one-offs that were being added over time.
RFC3549 has some documentation which is not really up to date but still valid
(section 2.2/3 on the header). Note: The EXCL, APPEND etc are very
similar to file system
semantics (eg open())
Generic netlink was written because there were too many people
grabbing netlink ids
and we were going to run out of them at some point. IIRC, i called it
"foobar" originally
and Dave didnt like that name. It was intended to be the less
efficient cousin (subject to
more locks etc) - and the RPC semantic changes were mostly to
accommodate the fact
that you couldnt use netlink proper - although I am more a fan of
CRUD(restful) semantics
than RPC (netlink proper is salvagable for CRUD as is).

Not sure how much of this can be leveraged in the documentation (I
will take a look).

cheers,
jamal

cheers,
jamal

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] docs: netlink: clarify the historical baggage of Netlink flags
  2022-09-30 11:07     ` Jamal Hadi Salim
@ 2022-09-30 11:29       ` Nikolay Aleksandrov
  2022-09-30 14:24         ` Jamal Hadi Salim
  0 siblings, 1 reply; 20+ messages in thread
From: Nikolay Aleksandrov @ 2022-09-30 11:29 UTC (permalink / raw)
  To: Jamal Hadi Salim, Jakub Kicinski
  Cc: netdev, davem, edumazet, pabeni, Johannes Berg,
	Pablo Neira Ayuso, Florian Westphal, Jacob Keller,
	Florent Fourcot, Guillaume Nault, Nicolas Dichtel, Hangbin Liu

On 30/09/2022 14:07, Jamal Hadi Salim wrote:
> On Wed, Sep 28, 2022 at 10:21 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> On Wed, 28 Sep 2022 10:03:07 +0300 Nikolay Aleksandrov wrote:
>>> The part about NLM_F_BULK is correct now, but won't be soon. I have patches to add
>>> bulk delete to mdbs as well, and IIRC there were plans for other object types.
>>> I can update the doc once they are applied, but IMO it will be more useful to explain
>>> why they are used instead of who's using them, i.e. the BULK was added to support
>>> flush for FDBs w/ filtering initially and it's a flag so others can re-use it
>>> (my first attempt targeted only FDBs[1], but after a discussion it became clear that
>>> it will be more beneficial if other object types can re-use it so moved to a flag).
>>> The first version of the fdb flush support used only netlink attributes to do the
>>> flush via setlink, later moved to a specific RTM command (RTM_FLUSHNEIGH)[2] and
>>> finally settled on the flag[3][4] so everyone can use it.
>>
>> I thought that's all FDB-ish stuff. Not really looking forward to the
>> use of flags spreading but within rtnl it may make some sense. We can
>> just update the docs tho, no?
>>
>> BTW how would you define the exact semantics of NLM_F_BULK vs it not
>> being set, in abstract terms?
> 
> I missed the discussion.
> NLM_F_BULK looks strange. Why pollute the main header? Wouldnt NLM_F_ROOT
> have sufficed to trigger the semantic? Or better create your own
> "service header"
> and put fdb specific into that object header? i.e the idea in netlink
> being you specify:
> {Main header: Object specific header: Object specific attributes in TLVs}
> Main header should only be extended if you really really have to -
> i.e requires much
> longer review..
> 

I did that initially, my first submission used netlink attributes specifically
for fdbs, but then reviews suggested same functionality is needed for other object types
e.g. mdbs, vxlan db, neighbor entries and so on. My second attempt added a new RTM_ msg type that
deletes multiple objects, and finally all settled on the flag because all families can
make use of it, it modifies the delete op to act on multiple objects. For more context please
check the discussions bellow [1] is my first submission (all netlink), [2] is RTM_
and [3][4] are the flag:

[1] https://www.spinics.net/lists/netdev/msg812473.html
[2] https://lore.kernel.org/netdev/20220411230356.GB8838@u2004-local/T/
[3] https://lore.kernel.org/netdev/20220411230356.GB8838@u2004-local/T/#m293c48e788e1a82aa77696f657004e8b4ab72967
[4] https://www.spinics.net/lists/netdev/msg813149.html

> Historical context: We did try to push netlink as a wire protocol
> (meaning it goes
> across nodes instead of user-kernel), see:
> https://www.rfc-editor.org/rfc/rfc3549.html and
> https://datatracker.ietf.org/doc/html/draft-jhsrha-forces-netlink2-02
> (there's an implementation
> of netlink2 that i can dig up).
> unfortunately there was much politiking and ForCES protocol was the compromise;
> as a side, there's _no way in hell_ current netlink would be possible
> to use on the wire
> because of all these one-offs that were being added over time.
> RFC3549 has some documentation which is not really up to date but still valid
> (section 2.2/3 on the header). Note: The EXCL, APPEND etc are very
> similar to file system
> semantics (eg open())
> Generic netlink was written because there were too many people
> grabbing netlink ids
> and we were going to run out of them at some point. IIRC, i called it
> "foobar" originally
> and Dave didnt like that name. It was intended to be the less
> efficient cousin (subject to
> more locks etc) - and the RPC semantic changes were mostly to
> accommodate the fact
> that you couldnt use netlink proper - although I am more a fan of
> CRUD(restful) semantics
> than RPC (netlink proper is salvagable for CRUD as is).
> 
> Not sure how much of this can be leveraged in the documentation (I
> will take a look).
> 
> cheers,
> jamal
> 
> cheers,
> jamal


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] docs: netlink: clarify the historical baggage of Netlink flags
  2022-09-30 11:29       ` Nikolay Aleksandrov
@ 2022-09-30 14:24         ` Jamal Hadi Salim
  2022-09-30 14:34           ` Nikolay Aleksandrov
  0 siblings, 1 reply; 20+ messages in thread
From: Jamal Hadi Salim @ 2022-09-30 14:24 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Jakub Kicinski, netdev, davem, edumazet, pabeni, Johannes Berg,
	Pablo Neira Ayuso, Florian Westphal, Jacob Keller,
	Florent Fourcot, Guillaume Nault, Nicolas Dichtel, Hangbin Liu

On Fri, Sep 30, 2022 at 7:29 AM Nikolay Aleksandrov <razor@blackwall.org> wrote:
>
> On 30/09/2022 14:07, Jamal Hadi Salim wrote:
> > On Wed, Sep 28, 2022 at 10:21 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >>
> >> On Wed, 28 Sep 2022 10:03:07 +0300 Nikolay Aleksandrov wrote:
> >>> The part about NLM_F_BULK is correct now, but won't be soon. I have patches to add
> >>> bulk delete to mdbs as well, and IIRC there were plans for other object types.
> >>> I can update the doc once they are applied, but IMO it will be more useful to explain
> >>> why they are used instead of who's using them, i.e. the BULK was added to support
> >>> flush for FDBs w/ filtering initially and it's a flag so others can re-use it
> >>> (my first attempt targeted only FDBs[1], but after a discussion it became clear that
> >>> it will be more beneficial if other object types can re-use it so moved to a flag).
> >>> The first version of the fdb flush support used only netlink attributes to do the
> >>> flush via setlink, later moved to a specific RTM command (RTM_FLUSHNEIGH)[2] and
> >>> finally settled on the flag[3][4] so everyone can use it.
> >>
> >> I thought that's all FDB-ish stuff. Not really looking forward to the
> >> use of flags spreading but within rtnl it may make some sense. We can
> >> just update the docs tho, no?
> >>
> >> BTW how would you define the exact semantics of NLM_F_BULK vs it not
> >> being set, in abstract terms?
> >
> > I missed the discussion.
> > NLM_F_BULK looks strange. Why pollute the main header? Wouldnt NLM_F_ROOT
> > have sufficed to trigger the semantic? Or better create your own
> > "service header"
> > and put fdb specific into that object header? i.e the idea in netlink
> > being you specify:
> > {Main header: Object specific header: Object specific attributes in TLVs}
> > Main header should only be extended if you really really have to -
> > i.e requires much
> > longer review..
> >
>
> I did that initially, my first submission used netlink attributes specifically
> for fdbs, but then reviews suggested same functionality is needed for other object types
> e.g. mdbs, vxlan db, neighbor entries and so on. My second attempt added a new RTM_ msg type that
> deletes multiple objects, and finally all settled on the flag because all families can
> make use of it, it modifies the delete op to act on multiple objects. For more context please
> check the discussions bellow [1] is my first submission (all netlink), [2] is RTM_
> and [3][4] are the flag:

I think what you are looking for is a way to either get or delete
selective objects
(dump and flush dont filter - they mean "everything"); iow, you send a filtering
expression and a get/del command alongside it. The filtering
expression is very specific
to the object and needs to be specified as such a TLV is appropriate.

Really NLM_F_ROOT and _MATCH are sufficient. The filtering expression is
the challenge.
"functionality is needed for other objects" is a true statement; the question is
whether we keep hacking into the kernel for these filtering expressions.
An idea i toyed with at one point was to send alongside the netlink
request a lua
program that will execute your selector. Then you dont have to keep hacking
and extending netlink TLVs or adding more to the kernel.
Lua is very appealing because you dont have to compile anything - and the
whole program goes into the kernel as a string.
Second best alternative could be to install a bpf program as your selector and
invoke it from netlink code to select the entries that get deleted or returned.
You will have to keep transactional state.

cheers,
jamal

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] docs: netlink: clarify the historical baggage of Netlink flags
  2022-09-30 14:24         ` Jamal Hadi Salim
@ 2022-09-30 14:34           ` Nikolay Aleksandrov
  2022-09-30 16:36             ` Jamal Hadi Salim
  0 siblings, 1 reply; 20+ messages in thread
From: Nikolay Aleksandrov @ 2022-09-30 14:34 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jakub Kicinski, netdev, davem, edumazet, pabeni, Johannes Berg,
	Pablo Neira Ayuso, Florian Westphal, Jacob Keller,
	Florent Fourcot, Guillaume Nault, Nicolas Dichtel, Hangbin Liu

On 30/09/2022 17:24, Jamal Hadi Salim wrote:
> On Fri, Sep 30, 2022 at 7:29 AM Nikolay Aleksandrov <razor@blackwall.org> wrote:
>>
>> On 30/09/2022 14:07, Jamal Hadi Salim wrote:
>>> On Wed, Sep 28, 2022 at 10:21 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>>>
>>>> On Wed, 28 Sep 2022 10:03:07 +0300 Nikolay Aleksandrov wrote:
>>>>> The part about NLM_F_BULK is correct now, but won't be soon. I have patches to add
>>>>> bulk delete to mdbs as well, and IIRC there were plans for other object types.
>>>>> I can update the doc once they are applied, but IMO it will be more useful to explain
>>>>> why they are used instead of who's using them, i.e. the BULK was added to support
>>>>> flush for FDBs w/ filtering initially and it's a flag so others can re-use it
>>>>> (my first attempt targeted only FDBs[1], but after a discussion it became clear that
>>>>> it will be more beneficial if other object types can re-use it so moved to a flag).
>>>>> The first version of the fdb flush support used only netlink attributes to do the
>>>>> flush via setlink, later moved to a specific RTM command (RTM_FLUSHNEIGH)[2] and
>>>>> finally settled on the flag[3][4] so everyone can use it.
>>>>
>>>> I thought that's all FDB-ish stuff. Not really looking forward to the
>>>> use of flags spreading but within rtnl it may make some sense. We can
>>>> just update the docs tho, no?
>>>>
>>>> BTW how would you define the exact semantics of NLM_F_BULK vs it not
>>>> being set, in abstract terms?
>>>
>>> I missed the discussion.
>>> NLM_F_BULK looks strange. Why pollute the main header? Wouldnt NLM_F_ROOT
>>> have sufficed to trigger the semantic? Or better create your own
>>> "service header"
>>> and put fdb specific into that object header? i.e the idea in netlink
>>> being you specify:
>>> {Main header: Object specific header: Object specific attributes in TLVs}
>>> Main header should only be extended if you really really have to -
>>> i.e requires much
>>> longer review..
>>>
>>
>> I did that initially, my first submission used netlink attributes specifically
>> for fdbs, but then reviews suggested same functionality is needed for other object types
>> e.g. mdbs, vxlan db, neighbor entries and so on. My second attempt added a new RTM_ msg type that
>> deletes multiple objects, and finally all settled on the flag because all families can
>> make use of it, it modifies the delete op to act on multiple objects. For more context please
>> check the discussions bellow [1] is my first submission (all netlink), [2] is RTM_
>> and [3][4] are the flag:
> 
> I think what you are looking for is a way to either get or delete
> selective objects
> (dump and flush dont filter - they mean "everything"); iow, you send a filtering

They must be able to flush everything, too. Filter matching all/empty filter, we need
it for mdbs and possibly other object types would want that.

> expression and a get/del command alongside it. The filtering
> expression is very specific
> to the object and needs to be specified as such a TLV is appropriate.
> 

Right, and that is what got implemented. The filtering TLVs are bridge and fdb-specific
they don't affect any other subsystem. The BULK flag denotes the delete will
affect multiple objects.

> Really NLM_F_ROOT and _MATCH are sufficient. The filtering expression is
> the challenge.

NLM_F_ROOT isn't usable for a DEL expression because its bit is already used by NLM_F_NONREC
and it wouldn't be nice to change meaning of the bit based on the subsystem. NLM_F_MATCH's bit
actually matches NLM_F_BULK :)

> "functionality is needed for other objects" is a true statement; the question is
> whether we keep hacking into the kernel for these filtering expressions.> An idea i toyed with at one point was to send alongside the netlink
> request a lua
> program that will execute your selector. Then you dont have to keep hacking
> and extending netlink TLVs or adding more to the kernel.
> Lua is very appealing because you dont have to compile anything - and the
> whole program goes into the kernel as a string.
> Second best alternative could be to install a bpf program as your selector and
> invoke it from netlink code to select the entries that get deleted or returned.
> You will have to keep transactional state.
> 

Sometime back I played with a different idea - expressing the filters with the existing TLV objects
so whatever can be specified by user-space can also be used as a filter (also for filtering
dump requests) with some introspection. The lua idea sounds nice though.

> cheers,
> jamal


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] docs: netlink: clarify the historical baggage of Netlink flags
  2022-09-30 14:34           ` Nikolay Aleksandrov
@ 2022-09-30 16:36             ` Jamal Hadi Salim
  2022-09-30 18:19               ` Nikolay Aleksandrov
  0 siblings, 1 reply; 20+ messages in thread
From: Jamal Hadi Salim @ 2022-09-30 16:36 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Jakub Kicinski, netdev, davem, edumazet, pabeni, Johannes Berg,
	Pablo Neira Ayuso, Florian Westphal, Jacob Keller,
	Florent Fourcot, Guillaume Nault, Nicolas Dichtel, Hangbin Liu

On Fri, Sep 30, 2022 at 10:34 AM Nikolay Aleksandrov
<razor@blackwall.org> wrote:
>
> On 30/09/2022 17:24, Jamal Hadi Salim wrote:
> > On Fri, Sep 30, 2022 at 7:29 AM Nikolay Aleksandrov <razor@blackwall.org> wrote:

[..]

> >
> > I think what you are looking for is a way to either get or delete
> > selective objects
> > (dump and flush dont filter - they mean "everything"); iow, you send a filtering
>
> They must be able to flush everything, too. Filter matching all/empty filter, we need
> it for mdbs and possibly other object types would want that.

You only have one object type though per netlink request i.e you
dont have in the same message fdb and mdb objects?

> > expression and a get/del command alongside it. The filtering
> > expression is very specific
> > to the object and needs to be specified as such a TLV is appropriate.
> >
>
> Right, and that is what got implemented. The filtering TLVs are bridge and fdb-specific
> they don't affect any other subsystem. The BULK flag denotes the delete will
> affect multiple objects.
>

Isnt it sufficient to indicate what objects need to be deleted based on presence
of TLVs or the service header for that object?

> > Really NLM_F_ROOT and _MATCH are sufficient. The filtering expression is
> > the challenge.
>
> NLM_F_ROOT isn't usable for a DEL expression because its bit is already used by NLM_F_NONREC
> and it wouldn't be nice to change meaning of the bit based on the subsystem. NLM_F_MATCH's bit
> actually matches NLM_F_BULK :)
>

Ouch. Ok, it got messy over time i guess. We probably should have
spent more time
discussing NLM_F_NONREC since it has a single user with very specific
need and it
got imposed on all.
I get your point - i am still not sure if a global flag is the right answer.

>
> Sometime back I played with a different idea - expressing the filters with the existing TLV objects
> so whatever can be specified by user-space can also be used as a filter (also for filtering
> dump requests) with some introspection. The lua idea sounds nice though.

So what is the content of the TLV in that case?
I think ebpf may work with some acrobatics. We did try classical ebpf and it was
messy. Note for scaling, this is not just about Delete and Get but
also for generated
events, where one can send to the kernel a filter so they dont see a broadcast
of everything. See for example a use case here:
https://www.files.netdevconf.info/d/46fd7e152d1d4f6c88ac/files/?p=/LargeScaleTCPAnalytics.pdf

cheers,
jamal

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] docs: netlink: clarify the historical baggage of Netlink flags
  2022-09-30 16:36             ` Jamal Hadi Salim
@ 2022-09-30 18:19               ` Nikolay Aleksandrov
  2022-10-02 13:59                 ` Jamal Hadi Salim
  0 siblings, 1 reply; 20+ messages in thread
From: Nikolay Aleksandrov @ 2022-09-30 18:19 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jakub Kicinski, netdev, davem, edumazet, pabeni, Johannes Berg,
	Pablo Neira Ayuso, Florian Westphal, Jacob Keller,
	Florent Fourcot, Guillaume Nault, Nicolas Dichtel, Hangbin Liu

On 30/09/2022 19:36, Jamal Hadi Salim wrote:
> On Fri, Sep 30, 2022 at 10:34 AM Nikolay Aleksandrov
> <razor@blackwall.org> wrote:
>>
>> On 30/09/2022 17:24, Jamal Hadi Salim wrote:
>>> On Fri, Sep 30, 2022 at 7:29 AM Nikolay Aleksandrov <razor@blackwall.org> wrote:
> 
> [..]
> 
>>>
>>> I think what you are looking for is a way to either get or delete
>>> selective objects
>>> (dump and flush dont filter - they mean "everything"); iow, you send a filtering
>>
>> They must be able to flush everything, too. Filter matching all/empty filter, we need
>> it for mdbs and possibly other object types would want that.
> 
> You only have one object type though per netlink request i.e you
> dont have in the same message fdb and mdb objects?
> 

Yep, it is object-type and family- specific, as is the call itself.

>>> expression and a get/del command alongside it. The filtering
>>> expression is very specific
>>> to the object and needs to be specified as such a TLV is appropriate.
>>>
>>
>> Right, and that is what got implemented. The filtering TLVs are bridge and fdb-specific
>> they don't affect any other subsystem. The BULK flag denotes the delete will
>> affect multiple objects.
>>
> 
> Isnt it sufficient to indicate what objects need to be deleted based on presence
> of TLVs or the service header for that object?
> 

That was my initial proposal for the fdbs. :)  When flush attribute was present it would
act on it (and filter based on embedded filters). The only non-intuitive part was that it
happened through SETLINK (changelink), which is a bit strange for a delete op.

>>> Really NLM_F_ROOT and _MATCH are sufficient. The filtering expression is
>>> the challenge.
>>
>> NLM_F_ROOT isn't usable for a DEL expression because its bit is already used by NLM_F_NONREC
>> and it wouldn't be nice to change meaning of the bit based on the subsystem. NLM_F_MATCH's bit
>> actually matches NLM_F_BULK :)
>>
> 
> Ouch. Ok, it got messy over time i guess. We probably should have
> spent more time
> discussing NLM_F_NONREC since it has a single user with very specific
> need and it
> got imposed on all.
> I get your point - i am still not sure if a global flag is the right answer.
> 

Personally, I prefer the complete netlink approach (tlvs describing the operation and filters).
In the end the flag was close enough, I kept all of the family specific code the same just the entry
point was different and other families could use it as a modifier to their del commands.

>>
>> Sometime back I played with a different idea - expressing the filters with the existing TLV objects
>> so whatever can be specified by user-space can also be used as a filter (also for filtering
>> dump requests) with some introspection. The lua idea sounds nice though.
> 
> So what is the content of the TLV in that case?

My first approach, which wasn't using bpf, used the tlv type to define specific filters on the various
types, incl. binary (which at the time was only an exact match, could be improved though). BPF w/ btf
would be the obvious choice these days.

> I think ebpf may work with some acrobatics. We did try classical ebpf and it was
> messy. Note for scaling, this is not just about Delete and Get but
> also for generated
> events, where one can send to the kernel a filter so they dont see a broadcast

Yeah, I remember CL having scaling issues in some user-space software that was snooping
netlink messages and that's the reason I looked into filtering at that time.

> of everything. See for example a use case here:
> https://www.files.netdevconf.info/d/46fd7e152d1d4f6c88ac/files/?p=/LargeScaleTCPAnalytics.pdf
> 
> cheers,
> jamal


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] docs: netlink: clarify the historical baggage of Netlink flags
  2022-09-30 18:19               ` Nikolay Aleksandrov
@ 2022-10-02 13:59                 ` Jamal Hadi Salim
  0 siblings, 0 replies; 20+ messages in thread
From: Jamal Hadi Salim @ 2022-10-02 13:59 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Jakub Kicinski, netdev, davem, edumazet, pabeni, Johannes Berg,
	Pablo Neira Ayuso, Florian Westphal, Jacob Keller,
	Florent Fourcot, Guillaume Nault, Nicolas Dichtel, Hangbin Liu

On Fri, Sep 30, 2022 at 2:19 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
>
> On 30/09/2022 19:36, Jamal Hadi Salim wrote:
> > On Fri, Sep 30, 2022 at 10:34 AM Nikolay Aleksandrov
> > <razor@blackwall.org> wrote:

[..]

> > You only have one object type though per netlink request i.e you
> > dont have in the same message fdb and mdb objects?
> >
>
> Yep, it is object-type and family- specific, as is the call itself.
>

Ok, so that makes it easier.

[..]

> > Isnt it sufficient to indicate what objects need to be deleted based on presence
> > of TLVs or the service header for that object?
> >
>
> That was my initial proposal for the fdbs. :)  When flush attribute was present it would
> act on it (and filter based on embedded filters). The only non-intuitive part was that it
> happened through SETLINK (changelink), which is a bit strange for a delete op.
>
> >>> Really NLM_F_ROOT and _MATCH are sufficient. The filtering expression is
> >>> the challenge.
> >>
> >> NLM_F_ROOT isn't usable for a DEL expression because its bit is already used by NLM_F_NONREC
> >> and it wouldn't be nice to change meaning of the bit based on the subsystem. NLM_F_MATCH's bit
> >> actually matches NLM_F_BULK :)
> >>
> >
> > Ouch. Ok, it got messy over time i guess. We probably should have
> > spent more time
> > discussing NLM_F_NONREC since it has a single user with very specific
> > need and it
> > got imposed on all.
> > I get your point - i am still not sure if a global flag is the right answer.
> >
>
> Personally, I prefer the complete netlink approach (tlvs describing the operation and filters).
> In the end the flag was close enough, I kept all of the family specific code the same just the entry
> point was different and other families could use it as a modifier to their del commands.
>

BTW, it seems that nftables is an outlier. You should still be able to
use NLM_F_ROOT
acronmy for DELETE.
act_api uses NLM_F_ROOT on delete to flush the whole table of actions. My
git-archealogy-foo says since 2005. NLM_F_NONREC was added in 2017.
So you really should just be able to use NLM_F_ROOT to check for Delete
of the whole table and TLV specific to service to filter further.

> >>
> >> Sometime back I played with a different idea - expressing the filters with the existing TLV objects
> >> so whatever can be specified by user-space can also be used as a filter (also for filtering
> >> dump requests) with some introspection. The lua idea sounds nice though.
> >
> > So what is the content of the TLV in that case?
>
> My first approach, which wasn't using bpf, used the tlv type to define specific filters on the various
> types, incl. binary (which at the time was only an exact match, could be improved though). BPF w/ btf
> would be the obvious choice these days.
>

The filter TLVs are good because the rest of the world can use them.
The challenge is experessability. Like you say above,
exact match is easy; inet diag has its own DSL to describe things which could
be easily extended. A solution like a Lua script is second best and of course
not to rule out ebpf - but that requires more skills.

> > I think ebpf may work with some acrobatics. We did try classical ebpf and it was
> > messy. Note for scaling, this is not just about Delete and Get but
> > also for generated
> > events, where one can send to the kernel a filter so they dont see a broadcast
>
> Yeah, I remember CL having scaling issues in some user-space software that was snooping
> netlink messages and that's the reason I looked into filtering at that time.
>

They are related problems. When you have 1000s of potential events it
just doesnt scale.
My idea was to specify a filter to select a subset and then open
multiple sockets
each specifying a different filter subset.

cheers,
jamal

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-10-02 13:59 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-27 21:23 [PATCH net-next] docs: netlink: clarify the historical baggage of Netlink flags Jakub Kicinski
2022-09-28  7:03 ` Nikolay Aleksandrov
2022-09-28 14:21   ` Jakub Kicinski
2022-09-28 14:40     ` Nikolay Aleksandrov
2022-09-28 14:43       ` Nikolay Aleksandrov
2022-09-30 11:07     ` Jamal Hadi Salim
2022-09-30 11:29       ` Nikolay Aleksandrov
2022-09-30 14:24         ` Jamal Hadi Salim
2022-09-30 14:34           ` Nikolay Aleksandrov
2022-09-30 16:36             ` Jamal Hadi Salim
2022-09-30 18:19               ` Nikolay Aleksandrov
2022-10-02 13:59                 ` Jamal Hadi Salim
2022-09-28  8:04 ` Florent Fourcot
2022-09-28  8:55   ` Nicolas Dichtel
2022-09-28  9:21     ` Nikolay Aleksandrov
2022-09-28 14:37       ` Jakub Kicinski
2022-09-28 14:46         ` Nikolay Aleksandrov
2022-09-28 15:15           ` Jakub Kicinski
2022-09-28 15:19         ` Nicolas Dichtel
2022-09-30  2:21 ` patchwork-bot+netdevbpf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.