netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC libnetfilter_queue 0/1] Eliminate packet copy when constructing struct pkt_buff
@ 2021-05-04  2:34 Duncan Roe
  2021-05-04  2:34 ` [PATCH RFC libnetfilter_queue 1/1] " Duncan Roe
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Duncan Roe @ 2021-05-04  2:34 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel

Hi Pablo,

This is item 2 of 4 after which I think we could do a new release.

Item 3 is to eliminate packet copy when returning a mangled packet in a verdict.
I have this working in inline code, not yet factored into function calls.

Item 4 is to document how to use the library (i.e. using non-deprecated calls).
I haven't started it yet.
Perhaps I can get rid of the implicit forced-load of libnfnetlink (only used by
deprecated functions).

Cheers ... Duncan.

Duncan Roe (1):
  Eliminate packet copy when constructing struct pkt_buff

 examples/nf-queue.c                    | 22 ++++++-
 include/libnetfilter_queue/Makefile.am |  1 +
 include/libnetfilter_queue/callback.h  | 11 ++++
 include/libnetfilter_queue/pktbuff.h   |  2 +
 src/Makefile.am                        |  1 +
 src/extra/callback.c                   | 52 +++++++++++++++++
 src/extra/pktbuff.c                    | 80 ++++++++++++++++++--------
 7 files changed, 141 insertions(+), 28 deletions(-)
 create mode 100644 include/libnetfilter_queue/callback.h
 create mode 100644 src/extra/callback.c

-- 
2.17.5


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH RFC libnetfilter_queue 1/1] Eliminate packet copy when constructing struct pkt_buff
  2021-05-04  2:34 [PATCH RFC libnetfilter_queue 0/1] Eliminate packet copy when constructing struct pkt_buff Duncan Roe
@ 2021-05-04  2:34 ` Duncan Roe
  2021-05-18  3:08   ` [PATCH libnetfilter_queue v2 0/1] Speed-up Duncan Roe
  2021-05-18  3:08   ` [PATCH libnetfilter_queue v2 1/1] Eliminate packet copy when constructing struct pkt_buff Duncan Roe
  2021-05-04  2:50 ` [PATCH RFC libnetfilter_queue 0/1] " Duncan Roe
  2021-05-31  3:11 ` Duncan Roe
  2 siblings, 2 replies; 9+ messages in thread
From: Duncan Roe @ 2021-05-04  2:34 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel

To avoid a copy, the new code takes advantage of the fact that the netfilter
netlink queue never returns multipart messages.
This means that the buffer space following that callback data is available for
packet expansion when mangling.

nfq_cb_run() is a new nfq-specific callback runqueue for netlink messages.
The principal function of nfq_cb_run() is to pass to the called function what is
the length of free space after the packet.
As a side benefit, nfq_cb_run() also gives the called function a pointer to a
zeroised struct pkt_buff, avoiding the malloc / free that was previously needed.

nfq_cb_t is a new typedef for the function called by nfq_cb_run()
[c.f. mnl_cb_t / mnl_cb_run].

No doxygen documentation update yet -
will do once function prototypes are agreed.
In the meantime, examples/nf-queue.c is updated to demonstrate the new API.

Signed-off-by: Duncan Roe <duncan_roe@optusnet.com.au>
---
 examples/nf-queue.c                    | 22 ++++++-
 include/libnetfilter_queue/Makefile.am |  1 +
 include/libnetfilter_queue/callback.h  | 11 ++++
 include/libnetfilter_queue/pktbuff.h   |  2 +
 src/Makefile.am                        |  1 +
 src/extra/callback.c                   | 52 +++++++++++++++++
 src/extra/pktbuff.c                    | 80 ++++++++++++++++++--------
 7 files changed, 141 insertions(+), 28 deletions(-)
 create mode 100644 include/libnetfilter_queue/callback.h
 create mode 100644 src/extra/callback.c

diff --git a/examples/nf-queue.c b/examples/nf-queue.c
index 3da2c24..15c0d77 100644
--- a/examples/nf-queue.c
+++ b/examples/nf-queue.c
@@ -13,6 +13,8 @@
 #include <linux/types.h>
 #include <linux/netfilter/nfnetlink_queue.h>
 
+#include <libnetfilter_queue/pktbuff.h>
+#include <libnetfilter_queue/callback.h>
 #include <libnetfilter_queue/libnetfilter_queue.h>
 
 /* only for NFQA_CT, not needed otherwise: */
@@ -46,13 +48,17 @@ nfq_send_verdict(int queue_num, uint32_t id)
 	}
 }
 
-static int queue_cb(const struct nlmsghdr *nlh, void *data)
+static int queue_cb(const struct nlmsghdr *nlh, void *data,
+		    struct pkt_buff *supplied_pktb, size_t supplied_extra)
 {
 	struct nfqnl_msg_packet_hdr *ph = NULL;
 	struct nlattr *attr[NFQA_MAX+1] = {};
+	struct pkt_buff *pktb;
 	uint32_t id = 0, skbinfo;
 	struct nfgenmsg *nfg;
+	uint8_t *payload;
 	uint16_t plen;
+	int i;
 
 	if (nfq_nlmsg_parse(nlh, attr) < 0) {
 		perror("problems parsing");
@@ -69,7 +75,10 @@ static int queue_cb(const struct nlmsghdr *nlh, void *data)
 	ph = mnl_attr_get_payload(attr[NFQA_PACKET_HDR]);
 
 	plen = mnl_attr_get_payload_len(attr[NFQA_PAYLOAD]);
-	/* void *payload = mnl_attr_get_payload(attr[NFQA_PAYLOAD]); */
+	payload = mnl_attr_get_payload(attr[NFQA_PAYLOAD]);
+
+	pktb = pktb_populate(supplied_pktb, AF_INET, payload, plen,
+			     supplied_extra);
 
 	skbinfo = attr[NFQA_SKB_INFO] ? ntohl(mnl_attr_get_u32(attr[NFQA_SKB_INFO])) : 0;
 
@@ -97,6 +106,13 @@ static int queue_cb(const struct nlmsghdr *nlh, void *data)
 		printf(", checksum not ready");
 	puts(")");
 
+	printf("payload (len=%d) [", plen);
+
+	for (i = 0; i < pktb_len(pktb); i++)
+		printf("%02x", payload[i]);
+
+	printf("]\n");
+
 	nfq_send_verdict(ntohs(nfg->res_id), id);
 
 	return MNL_CB_OK;
@@ -168,7 +184,7 @@ int main(int argc, char *argv[])
 			exit(EXIT_FAILURE);
 		}
 
-		ret = mnl_cb_run(buf, ret, 0, portid, queue_cb, NULL);
+		ret = nfq_cb_run(buf, ret, sizeof_buf, portid, queue_cb, NULL);
 		if (ret < 0){
 			perror("mnl_cb_run");
 			exit(EXIT_FAILURE);
diff --git a/include/libnetfilter_queue/Makefile.am b/include/libnetfilter_queue/Makefile.am
index e436bab..ed6524b 100644
--- a/include/libnetfilter_queue/Makefile.am
+++ b/include/libnetfilter_queue/Makefile.am
@@ -5,4 +5,5 @@ pkginclude_HEADERS = libnetfilter_queue.h	\
 		     libnetfilter_queue_ipv6.h	\
 		     libnetfilter_queue_tcp.h	\
 		     libnetfilter_queue_udp.h	\
+		     callback.h			\
 		     pktbuff.h
diff --git a/include/libnetfilter_queue/callback.h b/include/libnetfilter_queue/callback.h
new file mode 100644
index 0000000..27bfe31
--- /dev/null
+++ b/include/libnetfilter_queue/callback.h
@@ -0,0 +1,11 @@
+#ifndef _LIBNETFILTER_QUEUE_CALLBACK_H_
+#define _LIBNETFILTER_QUEUE_CALLBACK_H_
+
+struct nlattr;
+struct pkt_buff;
+
+typedef int (*nfq_cb_t)(const struct nlmsghdr *nlh, void *data, struct pkt_buff *pktb, size_t extra);
+
+int nfq_cb_run(const void *buf, size_t buflen, size_t bufcap, unsigned int portid, nfq_cb_t cb_data, void *data);
+
+#endif
diff --git a/include/libnetfilter_queue/pktbuff.h b/include/libnetfilter_queue/pktbuff.h
index 42bc153..33829cc 100644
--- a/include/libnetfilter_queue/pktbuff.h
+++ b/include/libnetfilter_queue/pktbuff.h
@@ -6,6 +6,8 @@ struct pkt_buff;
 struct pkt_buff *pktb_alloc(int family, void *data, size_t len, size_t extra);
 void pktb_free(struct pkt_buff *pktb);
 
+struct pkt_buff *pktb_populate(struct pkt_buff *pktb, int family, void *data, size_t len, size_t extra);
+
 uint8_t *pktb_data(struct pkt_buff *pktb);
 uint32_t pktb_len(struct pkt_buff *pktb);
 
diff --git a/src/Makefile.am b/src/Makefile.am
index 079853e..376bc45 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -31,6 +31,7 @@ libnetfilter_queue_la_LDFLAGS = -Wc,-nostartfiles \
 libnetfilter_queue_la_SOURCES = libnetfilter_queue.c	\
 				nlmsg.c			\
 				extra/checksum.c	\
+				extra/callback.c	\
 				extra/icmp.c		\
 				extra/ipv6.c		\
 				extra/tcp.c		\
diff --git a/src/extra/callback.c b/src/extra/callback.c
new file mode 100644
index 0000000..50b15ce
--- /dev/null
+++ b/src/extra/callback.c
@@ -0,0 +1,52 @@
+/*
+ * (C) 2021 by Duncan Roe <duncan_roe@optusnet.com.au>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; either version 2.1 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <errno.h>
+#include <stdlib.h>
+#include <libmnl/libmnl.h>
+#include <linux/netlink.h>
+#include <libnetfilter_queue/callback.h>
+#include <libnetfilter_queue/libnetfilter_queue.h>
+#include "internal.h"
+
+/* ---------------------------------------------------------------------- */
+/* It would be less code to have local_cb() declared within nfq_cb_run(). */
+/* gcc is fine with that; unfortunately clang 8.0.1 is not.               */
+/* Use thread local dumps for the stack variables that lexical scoping    */
+/* would have allowed local_cb() to access therefore.                     */
+/* ---------------------------------------------------------------------- */
+
+static __thread nfq_cb_t cb_func_sv;
+static __thread size_t bufcap_sv, buflen_sv;
+
+static int local_cb(const struct nlmsghdr *nlh, void *data)
+{
+	struct pkt_buff pktb_instance = { };
+
+	return cb_func_sv(nlh, data, &pktb_instance, bufcap_sv - buflen_sv);
+}
+
+EXPORT_SYMBOL int nfq_cb_run(const void *buf, size_t buflen, size_t bufcap,
+			     unsigned int portid, nfq_cb_t cb_func, void *data)
+{
+	const struct nlmsghdr *nlh = buf;
+
+	cb_func_sv = cb_func;
+	bufcap_sv = bufcap;
+	buflen_sv = buflen;
+
+	/* Verify not multi-part */
+	if (nlh->nlmsg_flags & NLM_F_MULTI) {
+		errno = E2BIG;
+		return MNL_CB_ERROR;
+	}
+
+
+	return mnl_cb_run(buf, buflen, 0, portid, local_cb, data);
+}
diff --git a/src/extra/pktbuff.c b/src/extra/pktbuff.c
index 6dd0ca9..df23335 100644
--- a/src/extra/pktbuff.c
+++ b/src/extra/pktbuff.c
@@ -19,6 +19,8 @@
 #include <netinet/tcp.h>
 
 #include "internal.h"
+#include <libnetfilter_queue/pktbuff.h> /* I.e. local copy */
+					/* (to verify prototypes) */
 
 /**
  * \defgroup pktbuff User-space network packet buffer
@@ -29,6 +31,44 @@
  * @{
  */
 
+static int __pktb_setup(int family, struct pkt_buff *pktb)
+{
+	struct ethhdr *ethhdr;
+
+	switch (family) {
+	case AF_INET:
+	case AF_INET6:
+		pktb->network_header = pktb->data;
+		break;
+	case AF_BRIDGE:
+		ethhdr = (struct ethhdr *)pktb->data;
+		pktb->mac_header = pktb->data;
+
+		switch(ethhdr->h_proto) {
+		case ETH_P_IP:
+		case ETH_P_IPV6:
+			pktb->network_header = pktb->data + ETH_HLEN;
+			break;
+		default:
+			/* This protocol is unsupported. */
+			errno = EPROTONOSUPPORT;
+			return -1;
+		}
+		break;
+	}
+
+	return 0;
+}
+
+static void pktb_setup_metadata(struct pkt_buff *pktb, void *pkt_data,
+				size_t len, size_t extra)
+{
+	pktb->len = len;
+	pktb->data_len = len + extra;
+
+	pktb->data = pkt_data;
+}
+
 /**
  * pktb_alloc - allocate a new packet buffer
  * \param family Indicate what family. Currently supported families are
@@ -52,7 +92,6 @@ EXPORT_SYMBOL
 struct pkt_buff *pktb_alloc(int family, void *data, size_t len, size_t extra)
 {
 	struct pkt_buff *pktb;
-	struct ethhdr *ethhdr;
 	void *pkt_data;
 
 	pktb = calloc(1, sizeof(struct pkt_buff) + len + extra);
@@ -63,36 +102,27 @@ struct pkt_buff *pktb_alloc(int family, void *data, size_t len, size_t extra)
 	pkt_data = (uint8_t *)pktb + sizeof(struct pkt_buff);
 	memcpy(pkt_data, data, len);
 
-	pktb->len = len;
-	pktb->data_len = len + extra;
+	pktb_setup_metadata(pktb, pkt_data, len, extra);
 
-	pktb->data = pkt_data;
+	if (__pktb_setup(family, pktb) < 0) {
+		free(pktb);
+		return NULL;
+	}
 
-	switch(family) {
-	case AF_INET:
-	case AF_INET6:
-		pktb->network_header = pktb->data;
-		break;
-	case AF_BRIDGE:
-		ethhdr = (struct ethhdr *)pktb->data;
-		pktb->mac_header = pktb->data;
+	return pktb;
+}
 
-		switch(ethhdr->h_proto) {
-		case ETH_P_IP:
-		case ETH_P_IPV6:
-			pktb->network_header = pktb->data + ETH_HLEN;
-			break;
-		default:
-			/* This protocol is unsupported. */
-			errno = EPROTONOSUPPORT;
-			free(pktb);
-			return NULL;
-		}
-		break;
-	}
+EXPORT_SYMBOL
+struct pkt_buff *pktb_populate(struct pkt_buff *pktb, int family, void *data,
+			       size_t len, size_t extra)
+{
+	pktb_setup_metadata(pktb, data, len, extra);
+	if (__pktb_setup(family, pktb) < 0)
+		pktb = NULL;
 	return pktb;
 }
 
+
 /**
  * pktb_data - get pointer to network packet
  * \param pktb Pointer to userspace packet buffer
-- 
2.17.5


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC libnetfilter_queue 0/1] Eliminate packet copy when constructing struct pkt_buff
  2021-05-04  2:34 [PATCH RFC libnetfilter_queue 0/1] Eliminate packet copy when constructing struct pkt_buff Duncan Roe
  2021-05-04  2:34 ` [PATCH RFC libnetfilter_queue 1/1] " Duncan Roe
@ 2021-05-04  2:50 ` Duncan Roe
  2021-05-31  3:11 ` Duncan Roe
  2 siblings, 0 replies; 9+ messages in thread
From: Duncan Roe @ 2021-05-04  2:50 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

On Tue, May 04, 2021 at 12:34:30PM +1000, Duncan Roe wrote:
> Hi Pablo,
>
> This is item 2 of 4 after which I think we could do a new release.
>
[...]

Item 1 was "`make distcheck` passes with doxygen enabled", applied.

Cheers ... Duncan.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH libnetfilter_queue v2 0/1] Speed-up
  2021-05-04  2:34 ` [PATCH RFC libnetfilter_queue 1/1] " Duncan Roe
@ 2021-05-18  3:08   ` Duncan Roe
  2021-05-18  3:08   ` [PATCH libnetfilter_queue v2 1/1] Eliminate packet copy when constructing struct pkt_buff Duncan Roe
  1 sibling, 0 replies; 9+ messages in thread
From: Duncan Roe @ 2021-05-18  3:08 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, Duncan Roe

Hi Pablo,

The RFC didn't elicit any comments in 2 weeks so I guess the new function
prototypes are acceptable.

This update replaces use of __thread variables wih passing down the needed info
via the data arg. This saves 5 seconds CPU over 100,000,000 packets on my AMD
Ryzen 7. The original data arg is also passed down.

Please apply this patch when next you find time to look at user space,

Cheers ... Duncan.

Duncan Roe (1):
  Eliminate packet copy when constructing struct pkt_buff

 examples/nf-queue.c                    | 22 ++++++-
 include/libnetfilter_queue/Makefile.am |  1 +
 include/libnetfilter_queue/callback.h  | 11 ++++
 include/libnetfilter_queue/pktbuff.h   |  2 +
 src/Makefile.am                        |  1 +
 src/extra/callback.c                   | 60 +++++++++++++++++++
 src/extra/pktbuff.c                    | 80 ++++++++++++++++++--------
 7 files changed, 149 insertions(+), 28 deletions(-)
 create mode 100644 include/libnetfilter_queue/callback.h
 create mode 100644 src/extra/callback.c

--
2.17.5


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH libnetfilter_queue v2 1/1] Eliminate packet copy when constructing struct pkt_buff
  2021-05-04  2:34 ` [PATCH RFC libnetfilter_queue 1/1] " Duncan Roe
  2021-05-18  3:08   ` [PATCH libnetfilter_queue v2 0/1] Speed-up Duncan Roe
@ 2021-05-18  3:08   ` Duncan Roe
  2021-05-27 20:23     ` Pablo Neira Ayuso
  1 sibling, 1 reply; 9+ messages in thread
From: Duncan Roe @ 2021-05-18  3:08 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, Duncan Roe

To avoid a copy, the new code takes advantage of the fact that the netfilter
netlink queue never returns multipart messages.
This means that the buffer space following that callback data is available for
packet expansion when mangling.

nfq_cb_run() is a new nfq-specific callback runqueue for netlink messages.
The principal function of nfq_cb_run() is to pass to the called function what is
the length of free space after the packet.
As a side benefit, nfq_cb_run() also gives the called functio a pointer to a
zeroised struct pkt_buff, avoiding the malloc / free that was previously needed.

nfq_cb_t is a new typedef for the function called by nfq_cb_run()
[c.f. mnl_cb_t / mnl_cb_run].

No doxygen documentation update yet -
will do once function prototypes are agreed.
In the meantime, examples/nf-queue.c is updated to demonstrate the new API.

Signed-off-by: Duncan Roe <duncan_roe@optusnet.com.au>
---
 examples/nf-queue.c                    | 22 ++++++-
 include/libnetfilter_queue/Makefile.am |  1 +
 include/libnetfilter_queue/callback.h  | 11 ++++
 include/libnetfilter_queue/pktbuff.h   |  2 +
 src/Makefile.am                        |  1 +
 src/extra/callback.c                   | 60 +++++++++++++++++++
 src/extra/pktbuff.c                    | 80 ++++++++++++++++++--------
 7 files changed, 149 insertions(+), 28 deletions(-)
 create mode 100644 include/libnetfilter_queue/callback.h
 create mode 100644 src/extra/callback.c

diff --git a/examples/nf-queue.c b/examples/nf-queue.c
index 3da2c24..15c0d77 100644
--- a/examples/nf-queue.c
+++ b/examples/nf-queue.c
@@ -13,6 +13,8 @@
 #include <linux/types.h>
 #include <linux/netfilter/nfnetlink_queue.h>
 
+#include <libnetfilter_queue/pktbuff.h>
+#include <libnetfilter_queue/callback.h>
 #include <libnetfilter_queue/libnetfilter_queue.h>
 
 /* only for NFQA_CT, not needed otherwise: */
@@ -46,13 +48,17 @@ nfq_send_verdict(int queue_num, uint32_t id)
 	}
 }
 
-static int queue_cb(const struct nlmsghdr *nlh, void *data)
+static int queue_cb(const struct nlmsghdr *nlh, void *data,
+		    struct pkt_buff *supplied_pktb, size_t supplied_extra)
 {
 	struct nfqnl_msg_packet_hdr *ph = NULL;
 	struct nlattr *attr[NFQA_MAX+1] = {};
+	struct pkt_buff *pktb;
 	uint32_t id = 0, skbinfo;
 	struct nfgenmsg *nfg;
+	uint8_t *payload;
 	uint16_t plen;
+	int i;
 
 	if (nfq_nlmsg_parse(nlh, attr) < 0) {
 		perror("problems parsing");
@@ -69,7 +75,10 @@ static int queue_cb(const struct nlmsghdr *nlh, void *data)
 	ph = mnl_attr_get_payload(attr[NFQA_PACKET_HDR]);
 
 	plen = mnl_attr_get_payload_len(attr[NFQA_PAYLOAD]);
-	/* void *payload = mnl_attr_get_payload(attr[NFQA_PAYLOAD]); */
+	payload = mnl_attr_get_payload(attr[NFQA_PAYLOAD]);
+
+	pktb = pktb_populate(supplied_pktb, AF_INET, payload, plen,
+			     supplied_extra);
 
 	skbinfo = attr[NFQA_SKB_INFO] ? ntohl(mnl_attr_get_u32(attr[NFQA_SKB_INFO])) : 0;
 
@@ -97,6 +106,13 @@ static int queue_cb(const struct nlmsghdr *nlh, void *data)
 		printf(", checksum not ready");
 	puts(")");
 
+	printf("payload (len=%d) [", plen);
+
+	for (i = 0; i < pktb_len(pktb); i++)
+		printf("%02x", payload[i]);
+
+	printf("]\n");
+
 	nfq_send_verdict(ntohs(nfg->res_id), id);
 
 	return MNL_CB_OK;
@@ -168,7 +184,7 @@ int main(int argc, char *argv[])
 			exit(EXIT_FAILURE);
 		}
 
-		ret = mnl_cb_run(buf, ret, 0, portid, queue_cb, NULL);
+		ret = nfq_cb_run(buf, ret, sizeof_buf, portid, queue_cb, NULL);
 		if (ret < 0){
 			perror("mnl_cb_run");
 			exit(EXIT_FAILURE);
diff --git a/include/libnetfilter_queue/Makefile.am b/include/libnetfilter_queue/Makefile.am
index e436bab..ed6524b 100644
--- a/include/libnetfilter_queue/Makefile.am
+++ b/include/libnetfilter_queue/Makefile.am
@@ -5,4 +5,5 @@ pkginclude_HEADERS = libnetfilter_queue.h	\
 		     libnetfilter_queue_ipv6.h	\
 		     libnetfilter_queue_tcp.h	\
 		     libnetfilter_queue_udp.h	\
+		     callback.h			\
 		     pktbuff.h
diff --git a/include/libnetfilter_queue/callback.h b/include/libnetfilter_queue/callback.h
new file mode 100644
index 0000000..27bfe31
--- /dev/null
+++ b/include/libnetfilter_queue/callback.h
@@ -0,0 +1,11 @@
+#ifndef _LIBNETFILTER_QUEUE_CALLBACK_H_
+#define _LIBNETFILTER_QUEUE_CALLBACK_H_
+
+struct nlattr;
+struct pkt_buff;
+
+typedef int (*nfq_cb_t)(const struct nlmsghdr *nlh, void *data, struct pkt_buff *pktb, size_t extra);
+
+int nfq_cb_run(const void *buf, size_t buflen, size_t bufcap, unsigned int portid, nfq_cb_t cb_data, void *data);
+
+#endif
diff --git a/include/libnetfilter_queue/pktbuff.h b/include/libnetfilter_queue/pktbuff.h
index 42bc153..33829cc 100644
--- a/include/libnetfilter_queue/pktbuff.h
+++ b/include/libnetfilter_queue/pktbuff.h
@@ -6,6 +6,8 @@ struct pkt_buff;
 struct pkt_buff *pktb_alloc(int family, void *data, size_t len, size_t extra);
 void pktb_free(struct pkt_buff *pktb);
 
+struct pkt_buff *pktb_populate(struct pkt_buff *pktb, int family, void *data, size_t len, size_t extra);
+
 uint8_t *pktb_data(struct pkt_buff *pktb);
 uint32_t pktb_len(struct pkt_buff *pktb);
 
diff --git a/src/Makefile.am b/src/Makefile.am
index 079853e..376bc45 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -31,6 +31,7 @@ libnetfilter_queue_la_LDFLAGS = -Wc,-nostartfiles \
 libnetfilter_queue_la_SOURCES = libnetfilter_queue.c	\
 				nlmsg.c			\
 				extra/checksum.c	\
+				extra/callback.c	\
 				extra/icmp.c		\
 				extra/ipv6.c		\
 				extra/tcp.c		\
diff --git a/src/extra/callback.c b/src/extra/callback.c
new file mode 100644
index 0000000..2d67848
--- /dev/null
+++ b/src/extra/callback.c
@@ -0,0 +1,60 @@
+/*
+ * (C) 2021 by Duncan Roe <duncan_roe@optusnet.com.au>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; either version 2.1 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <errno.h>
+#include <stdlib.h>
+#include <libmnl/libmnl.h>
+#include <linux/netlink.h>
+#include <libnetfilter_queue/callback.h>
+#include <libnetfilter_queue/libnetfilter_queue.h>
+#include "internal.h"
+
+/* ---------------------------------------------------------------------- */
+/* It would be less code to have local_cb() declared within nfq_cb_run(). */
+/* gcc is fine with that; unfortunately clang 8.0.1 is not.               */
+/* Lexical scoping would have given access to the 3 outer stack variables */
+/* needed by local_cb(); absent that, pass them down through the data arg */
+/* ---------------------------------------------------------------------- */
+
+typedef struct qwerty
+{
+	nfq_cb_t cb_func;
+	size_t buflen;
+	size_t bufcap;
+	void *data;
+} werty;
+
+static int local_cb(const struct nlmsghdr *nlh, void *data)
+{
+	struct pkt_buff pktb_instance = { };
+	werty *w = (werty *)data;
+
+	return w->cb_func(nlh, w->data, &pktb_instance, w->bufcap - w->buflen);
+}
+
+EXPORT_SYMBOL int nfq_cb_run(const void *buf, size_t buflen, size_t bufcap,
+			     unsigned int portid, nfq_cb_t cb_func, void *data)
+{
+	const struct nlmsghdr *nlh = buf;
+	werty wert;
+
+	wert.cb_func = cb_func;
+	wert.bufcap = bufcap;
+	wert.buflen = buflen;
+	wert.data = data;
+
+	/* Verify not multi-part */
+	if (nlh->nlmsg_flags & NLM_F_MULTI) {
+		errno = E2BIG;
+		return MNL_CB_ERROR;
+	}
+
+
+	return mnl_cb_run(buf, buflen, 0, portid, local_cb, &wert);
+}
diff --git a/src/extra/pktbuff.c b/src/extra/pktbuff.c
index 6dd0ca9..df23335 100644
--- a/src/extra/pktbuff.c
+++ b/src/extra/pktbuff.c
@@ -19,6 +19,8 @@
 #include <netinet/tcp.h>
 
 #include "internal.h"
+#include <libnetfilter_queue/pktbuff.h> /* I.e. local copy */
+					/* (to verify prototypes) */
 
 /**
  * \defgroup pktbuff User-space network packet buffer
@@ -29,6 +31,44 @@
  * @{
  */
 
+static int __pktb_setup(int family, struct pkt_buff *pktb)
+{
+	struct ethhdr *ethhdr;
+
+	switch (family) {
+	case AF_INET:
+	case AF_INET6:
+		pktb->network_header = pktb->data;
+		break;
+	case AF_BRIDGE:
+		ethhdr = (struct ethhdr *)pktb->data;
+		pktb->mac_header = pktb->data;
+
+		switch(ethhdr->h_proto) {
+		case ETH_P_IP:
+		case ETH_P_IPV6:
+			pktb->network_header = pktb->data + ETH_HLEN;
+			break;
+		default:
+			/* This protocol is unsupported. */
+			errno = EPROTONOSUPPORT;
+			return -1;
+		}
+		break;
+	}
+
+	return 0;
+}
+
+static void pktb_setup_metadata(struct pkt_buff *pktb, void *pkt_data,
+				size_t len, size_t extra)
+{
+	pktb->len = len;
+	pktb->data_len = len + extra;
+
+	pktb->data = pkt_data;
+}
+
 /**
  * pktb_alloc - allocate a new packet buffer
  * \param family Indicate what family. Currently supported families are
@@ -52,7 +92,6 @@ EXPORT_SYMBOL
 struct pkt_buff *pktb_alloc(int family, void *data, size_t len, size_t extra)
 {
 	struct pkt_buff *pktb;
-	struct ethhdr *ethhdr;
 	void *pkt_data;
 
 	pktb = calloc(1, sizeof(struct pkt_buff) + len + extra);
@@ -63,36 +102,27 @@ struct pkt_buff *pktb_alloc(int family, void *data, size_t len, size_t extra)
 	pkt_data = (uint8_t *)pktb + sizeof(struct pkt_buff);
 	memcpy(pkt_data, data, len);
 
-	pktb->len = len;
-	pktb->data_len = len + extra;
+	pktb_setup_metadata(pktb, pkt_data, len, extra);
 
-	pktb->data = pkt_data;
+	if (__pktb_setup(family, pktb) < 0) {
+		free(pktb);
+		return NULL;
+	}
 
-	switch(family) {
-	case AF_INET:
-	case AF_INET6:
-		pktb->network_header = pktb->data;
-		break;
-	case AF_BRIDGE:
-		ethhdr = (struct ethhdr *)pktb->data;
-		pktb->mac_header = pktb->data;
+	return pktb;
+}
 
-		switch(ethhdr->h_proto) {
-		case ETH_P_IP:
-		case ETH_P_IPV6:
-			pktb->network_header = pktb->data + ETH_HLEN;
-			break;
-		default:
-			/* This protocol is unsupported. */
-			errno = EPROTONOSUPPORT;
-			free(pktb);
-			return NULL;
-		}
-		break;
-	}
+EXPORT_SYMBOL
+struct pkt_buff *pktb_populate(struct pkt_buff *pktb, int family, void *data,
+			       size_t len, size_t extra)
+{
+	pktb_setup_metadata(pktb, data, len, extra);
+	if (__pktb_setup(family, pktb) < 0)
+		pktb = NULL;
 	return pktb;
 }
 
+
 /**
  * pktb_data - get pointer to network packet
  * \param pktb Pointer to userspace packet buffer
-- 
2.17.5


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH libnetfilter_queue v2 1/1] Eliminate packet copy when constructing struct pkt_buff
  2021-05-18  3:08   ` [PATCH libnetfilter_queue v2 1/1] Eliminate packet copy when constructing struct pkt_buff Duncan Roe
@ 2021-05-27 20:23     ` Pablo Neira Ayuso
  2021-07-18  4:28       ` Duncan Roe
  2021-08-04  1:38       ` Duncan Roe
  0 siblings, 2 replies; 9+ messages in thread
From: Pablo Neira Ayuso @ 2021-05-27 20:23 UTC (permalink / raw)
  To: Duncan Roe; +Cc: netfilter-devel

On Tue, May 18, 2021 at 01:08:48PM +1000, Duncan Roe wrote:
> To avoid a copy, the new code takes advantage of the fact that the netfilter
> netlink queue never returns multipart messages.
> This means that the buffer space following that callback data is available for
> packet expansion when mangling.
> 
> nfq_cb_run() is a new nfq-specific callback runqueue for netlink messages.
> The principal function of nfq_cb_run() is to pass to the called function what is
> the length of free space after the packet.
> As a side benefit, nfq_cb_run() also gives the called functio a pointer to a
> zeroised struct pkt_buff, avoiding the malloc / free that was previously needed.
> 
> nfq_cb_t is a new typedef for the function called by nfq_cb_run()
> [c.f. mnl_cb_t / mnl_cb_run].

Interesting idea: let me get back to you with a proposal based on this
patch.

Meanwhile, I have pushed out the __pktb_setup() function which is
going to be needed:

http://git.netfilter.org/libnetfilter_queue/commit/?id=710f891c8a6116f520948f5cf448489947fb7d78

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC libnetfilter_queue 0/1] Eliminate packet copy when constructing struct pkt_buff
  2021-05-04  2:34 [PATCH RFC libnetfilter_queue 0/1] Eliminate packet copy when constructing struct pkt_buff Duncan Roe
  2021-05-04  2:34 ` [PATCH RFC libnetfilter_queue 1/1] " Duncan Roe
  2021-05-04  2:50 ` [PATCH RFC libnetfilter_queue 0/1] " Duncan Roe
@ 2021-05-31  3:11 ` Duncan Roe
  2 siblings, 0 replies; 9+ messages in thread
From: Duncan Roe @ 2021-05-31  3:11 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Netfilter Development

Hi Pablo,

On Tue, May 04, 2021 at 12:34:30PM +1000, Duncan Roe wrote:
> Hi Pablo,
>
> This is item 2 of 4 after which I think we could do a new release.
>
> Item 3 is to eliminate packet copy when returning a mangled packet
> in a verdict.
> I have this working in inline code, not yet factored into function calls.
>
[SNIP]

I have abandoned item 3. Timing tests showed sendmsg() of 3 or 4 buffers to be
slower than memcpy() them into 1 buf and send that (i.e. use
nfq_nlmsg_verdict_put_pkt() and mnl_socket_sendto()).

Instead, I'll take a look at stopping the automatic load of libnfnetlink, as we
discussed a while back.

Cheers ... Duncan.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH libnetfilter_queue v2 1/1] Eliminate packet copy when constructing struct pkt_buff
  2021-05-27 20:23     ` Pablo Neira Ayuso
@ 2021-07-18  4:28       ` Duncan Roe
  2021-08-04  1:38       ` Duncan Roe
  1 sibling, 0 replies; 9+ messages in thread
From: Duncan Roe @ 2021-07-18  4:28 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Netfilter Development

On Thu, May 27, 2021 at 10:23:15PM +0200, Pablo Neira Ayuso wrote:
> On Tue, May 18, 2021 at 01:08:48PM +1000, Duncan Roe wrote:
> > To avoid a copy, the new code takes advantage of the fact that the netfilter
> > netlink queue never returns multipart messages.
> > This means that the buffer space following that callback data is available for
> > packet expansion when mangling.
> >
> > nfq_cb_run() is a new nfq-specific callback runqueue for netlink messages.
> > The principal function of nfq_cb_run() is to pass to the called function what is
> > the length of free space after the packet.
> > As a side benefit, nfq_cb_run() also gives the called functio a pointer to a
> > zeroised struct pkt_buff, avoiding the malloc / free that was previously needed.
> >
> > nfq_cb_t is a new typedef for the function called by nfq_cb_run()
> > [c.f. mnl_cb_t / mnl_cb_run].
>
> Interesting idea: let me get back to you with a proposal based on this
> patch.
>
[...]

It occurred to me there is no real need to use a callback any more.

However, mnl_cb_run() does some checks before and after invoking the cb.
Some of these checks may still be valid, so leave it as_is?

This patch has been on the table for a while, any idea when you might find time
to respond?

Cheers ... Duncan.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH libnetfilter_queue v2 1/1] Eliminate packet copy when constructing struct pkt_buff
  2021-05-27 20:23     ` Pablo Neira Ayuso
  2021-07-18  4:28       ` Duncan Roe
@ 2021-08-04  1:38       ` Duncan Roe
  1 sibling, 0 replies; 9+ messages in thread
From: Duncan Roe @ 2021-08-04  1:38 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Netfilter Development

Hi Pablo,

On Thu, May 27, 2021 at 10:23:15PM +0200, Pablo Neira Ayuso wrote:
> On Tue, May 18, 2021 at 01:08:48PM +1000, Duncan Roe wrote:
> > To avoid a copy, the new code takes advantage of the fact that the netfilter
> > netlink queue never returns multipart messages.
[...]
>
> Interesting idea: let me get back to you with a proposal based on this
> patch.
>
> Meanwhile, I have pushed out the __pktb_setup() function which is
> going to be needed:
>
> http://git.netfilter.org/libnetfilter_queue/commit/?id=710f891c8a6116f520948f5cf448489947fb7d78
>
> Thanks.

It also occurred to me to wonder what is the benefit of having struct pkt_buff
be opaque? It's never going to have a buffer tacked on the end of it any more,
so can simply be declared to be sizeof(struct pkt_buff).

Users could read the values of struct members directly rather than having to
learn and use the current procedural interface. That would have to use less
instructions to achieve, but I have yet to benchmark to see if the improvement
is measureable.

We could document when (if ever) the structure may be written to directly but
even if developers break the rules, what damage can they do? This is a userspace
program: they're not going to crash the kernel.

I sidestepped this question in the code by passing down pktb_instance from
local_cb().

Regardless of the above, do you think you might have a proposal for me some time
soon?

Cheers ... Duncan.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-08-04  1:38 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-04  2:34 [PATCH RFC libnetfilter_queue 0/1] Eliminate packet copy when constructing struct pkt_buff Duncan Roe
2021-05-04  2:34 ` [PATCH RFC libnetfilter_queue 1/1] " Duncan Roe
2021-05-18  3:08   ` [PATCH libnetfilter_queue v2 0/1] Speed-up Duncan Roe
2021-05-18  3:08   ` [PATCH libnetfilter_queue v2 1/1] Eliminate packet copy when constructing struct pkt_buff Duncan Roe
2021-05-27 20:23     ` Pablo Neira Ayuso
2021-07-18  4:28       ` Duncan Roe
2021-08-04  1:38       ` Duncan Roe
2021-05-04  2:50 ` [PATCH RFC libnetfilter_queue 0/1] " Duncan Roe
2021-05-31  3:11 ` Duncan Roe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).