* [PATCH net-next 0/6 v9] Open vSwitch zercopy upcall optimiziation
@ 2013-12-13 14:22 Thomas Graf
2013-12-13 14:22 ` [PATCH net-next 1/6] net: Export skb_zerocopy() to zerocopy from one skb to another Thomas Graf
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: Thomas Graf @ 2013-12-13 14:22 UTC (permalink / raw)
To: jesse
Cc: dev, eric.dumazet, fleitner, ffusco, dborkmann, bhutchings,
netdev, davem
Series implementing a zerocopy method for OVS upcall messages. Based
on top of commit: (''openvswitch: Enable memory mapped Netlink i/o'')
Thomas Graf (6):
net: Export skb_zerocopy() to zerocopy from one skb to another
openvswitch: Allow user space to announce ability to accept unaligned
Netlink messages
openvswitch: Drop user features if old user space attempted to create
datapath
openvswitch: Pass datapath into userspace queue functions
openvswitch: Use skb_zerocopy() for upcall
openvswitch: Compute checksum in skb_gso_segment() if needed
include/linux/skbuff.h | 3 +
include/uapi/linux/openvswitch.h | 14 ++++-
net/core/skbuff.c | 85 ++++++++++++++++++++++++++++
net/netfilter/nfnetlink_queue_core.c | 59 ++-----------------
net/openvswitch/datapath.c | 106 ++++++++++++++++++++++++++---------
net/openvswitch/datapath.h | 2 +
6 files changed, 185 insertions(+), 84 deletions(-)
---
V9: - Dropped patches 1-3 (merged)
- Dropped skb argument from upcall_msg_size()
- Dropped dp_ifindex and net argument from all queue functions
- Added patch to remove NETIF_F_HW_CSUM from __skb_gso_segment()
V8: - Dropped patch adding NLM_F_REPLACE support, I'll pursue this in a
separate patch series. Addresses Jesse's comments.
- Improved comment describing OVS_DATAPATH_VERSION bump.
V7: - removed unintential kernel-doc comment
- WARN_ONCE() -> WARN(), message on single line, added \n
V6: - Added memory mapped netlink i/o support
- Drop user_features if old user space not aware of user features
reuses an existing datapath
V5: - Removed padding requirement in user space
- Added OVS_DP_F_UNALIGNED flag let user space signal ability to
receive unaligned Netlink messages, fall back to linear copy
otherwise.
V4: - Daniel Borkmann pointed out that the style in skbuff.h has changed
in net-next, adapted to no longer using extern in headers.
V3: - Removed unneeded alignment of nlmsg_len after padding
V2: - Added skb_zerocopy_headlen() to calculate headroom of destination
buffer. This also takes care of the from->head_frag issue.
- Attribute alignment for frag_list case
- API documentation
- performance data for CHECKSUM_PARTIAL tx case
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH net-next 1/6] net: Export skb_zerocopy() to zerocopy from one skb to another
2013-12-13 14:22 [PATCH net-next 0/6 v9] Open vSwitch zercopy upcall optimiziation Thomas Graf
@ 2013-12-13 14:22 ` Thomas Graf
2013-12-13 14:22 ` [PATCH net-next 2/6] openvswitch: Allow user space to announce ability to accept unaligned Netlink messages Thomas Graf
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Thomas Graf @ 2013-12-13 14:22 UTC (permalink / raw)
To: jesse
Cc: dev, eric.dumazet, fleitner, ffusco, dborkmann, bhutchings,
netdev, davem
Make the skb zerocopy logic written for nfnetlink queue available for
use by other modules.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
---
include/linux/skbuff.h | 3 ++
net/core/skbuff.c | 85 ++++++++++++++++++++++++++++++++++++
net/netfilter/nfnetlink_queue_core.c | 59 ++-----------------------
3 files changed, 92 insertions(+), 55 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 77c7aae..a6781af 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2363,6 +2363,9 @@ int skb_splice_bits(struct sk_buff *skb, unsigned int offset,
struct pipe_inode_info *pipe, unsigned int len,
unsigned int flags);
void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to);
+unsigned int skb_zerocopy_headlen(const struct sk_buff *from);
+void skb_zerocopy(struct sk_buff *to, const struct sk_buff *from,
+ int len, int hlen);
void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len);
int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen);
void skb_scrub_packet(struct sk_buff *skb, bool xnet);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 06e72d3..63116d7 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2122,6 +2122,91 @@ __wsum skb_copy_and_csum_bits(const struct sk_buff *skb, int offset,
}
EXPORT_SYMBOL(skb_copy_and_csum_bits);
+ /**
+ * skb_zerocopy_headlen - Calculate headroom needed for skb_zerocopy()
+ * @from: source buffer
+ *
+ * Calculates the amount of linear headroom needed in the 'to' skb passed
+ * into skb_zerocopy().
+ */
+unsigned int
+skb_zerocopy_headlen(const struct sk_buff *from)
+{
+ unsigned int hlen = 0;
+
+ if (!from->head_frag ||
+ skb_headlen(from) < L1_CACHE_BYTES ||
+ skb_shinfo(from)->nr_frags >= MAX_SKB_FRAGS)
+ hlen = skb_headlen(from);
+
+ if (skb_has_frag_list(from))
+ hlen = from->len;
+
+ return hlen;
+}
+EXPORT_SYMBOL_GPL(skb_zerocopy_headlen);
+
+/**
+ * skb_zerocopy - Zero copy skb to skb
+ * @to: destination buffer
+ * @source: source buffer
+ * @len: number of bytes to copy from source buffer
+ * @hlen: size of linear headroom in destination buffer
+ *
+ * Copies up to `len` bytes from `from` to `to` by creating references
+ * to the frags in the source buffer.
+ *
+ * The `hlen` as calculated by skb_zerocopy_headlen() specifies the
+ * headroom in the `to` buffer.
+ */
+void
+skb_zerocopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen)
+{
+ int i, j = 0;
+ int plen = 0; /* length of skb->head fragment */
+ struct page *page;
+ unsigned int offset;
+
+ BUG_ON(!from->head_frag && !hlen);
+
+ /* dont bother with small payloads */
+ if (len <= skb_tailroom(to)) {
+ skb_copy_bits(from, 0, skb_put(to, len), len);
+ return;
+ }
+
+ if (hlen) {
+ skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+ len -= hlen;
+ } else {
+ plen = min_t(int, skb_headlen(from), len);
+ if (plen) {
+ page = virt_to_head_page(from->head);
+ offset = from->data - (unsigned char *)page_address(page);
+ __skb_fill_page_desc(to, 0, page, offset, plen);
+ get_page(page);
+ j = 1;
+ len -= plen;
+ }
+ }
+
+ to->truesize += len + plen;
+ to->len += len + plen;
+ to->data_len += len + plen;
+
+ for (i = 0; i < skb_shinfo(from)->nr_frags; i++) {
+ if (!len)
+ break;
+ skb_shinfo(to)->frags[j] = skb_shinfo(from)->frags[i];
+ skb_shinfo(to)->frags[j].size = min_t(int, skb_shinfo(to)->frags[j].size, len);
+ len -= skb_shinfo(to)->frags[j].size;
+ skb_frag_ref(to, j);
+ j++;
+ }
+ skb_shinfo(to)->nr_frags = j;
+}
+EXPORT_SYMBOL_GPL(skb_zerocopy);
+
void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to)
{
__wsum csum;
diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
index 21258cf..615ee12 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -235,51 +235,6 @@ nfqnl_flush(struct nfqnl_instance *queue, nfqnl_cmpfn cmpfn, unsigned long data)
spin_unlock_bh(&queue->lock);
}
-static void
-nfqnl_zcopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen)
-{
- int i, j = 0;
- int plen = 0; /* length of skb->head fragment */
- struct page *page;
- unsigned int offset;
-
- /* dont bother with small payloads */
- if (len <= skb_tailroom(to)) {
- skb_copy_bits(from, 0, skb_put(to, len), len);
- return;
- }
-
- if (hlen) {
- skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
- len -= hlen;
- } else {
- plen = min_t(int, skb_headlen(from), len);
- if (plen) {
- page = virt_to_head_page(from->head);
- offset = from->data - (unsigned char *)page_address(page);
- __skb_fill_page_desc(to, 0, page, offset, plen);
- get_page(page);
- j = 1;
- len -= plen;
- }
- }
-
- to->truesize += len + plen;
- to->len += len + plen;
- to->data_len += len + plen;
-
- for (i = 0; i < skb_shinfo(from)->nr_frags; i++) {
- if (!len)
- break;
- skb_shinfo(to)->frags[j] = skb_shinfo(from)->frags[i];
- skb_shinfo(to)->frags[j].size = min_t(int, skb_shinfo(to)->frags[j].size, len);
- len -= skb_shinfo(to)->frags[j].size;
- skb_frag_ref(to, j);
- j++;
- }
- skb_shinfo(to)->nr_frags = j;
-}
-
static int
nfqnl_put_packet_info(struct sk_buff *nlskb, struct sk_buff *packet,
bool csum_verify)
@@ -304,7 +259,7 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
{
size_t size;
size_t data_len = 0, cap_len = 0;
- int hlen = 0;
+ unsigned int hlen = 0;
struct sk_buff *skb;
struct nlattr *nla;
struct nfqnl_msg_packet_hdr *pmsg;
@@ -356,14 +311,8 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
if (data_len > entskb->len)
data_len = entskb->len;
- if (!entskb->head_frag ||
- skb_headlen(entskb) < L1_CACHE_BYTES ||
- skb_shinfo(entskb)->nr_frags >= MAX_SKB_FRAGS)
- hlen = skb_headlen(entskb);
-
- if (skb_has_frag_list(entskb))
- hlen = entskb->len;
- hlen = min_t(int, data_len, hlen);
+ hlen = skb_zerocopy_headlen(entskb);
+ hlen = min_t(unsigned int, hlen, data_len);
size += sizeof(struct nlattr) + hlen;
cap_len = entskb->len;
break;
@@ -504,7 +453,7 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
nla->nla_type = NFQA_PAYLOAD;
nla->nla_len = nla_attr_size(data_len);
- nfqnl_zcopy(skb, entskb, data_len, hlen);
+ skb_zerocopy(skb, entskb, data_len, hlen);
}
nlh->nlmsg_len = skb->len;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net-next 2/6] openvswitch: Allow user space to announce ability to accept unaligned Netlink messages
2013-12-13 14:22 [PATCH net-next 0/6 v9] Open vSwitch zercopy upcall optimiziation Thomas Graf
2013-12-13 14:22 ` [PATCH net-next 1/6] net: Export skb_zerocopy() to zerocopy from one skb to another Thomas Graf
@ 2013-12-13 14:22 ` Thomas Graf
2013-12-13 14:22 ` [PATCH net-next 3/6] openvswitch: Drop user features if old user space attempted to create datapath Thomas Graf
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Thomas Graf @ 2013-12-13 14:22 UTC (permalink / raw)
To: jesse
Cc: dev, eric.dumazet, fleitner, ffusco, dborkmann, bhutchings,
netdev, davem
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
---
include/uapi/linux/openvswitch.h | 4 ++++
net/openvswitch/datapath.c | 14 ++++++++++++++
net/openvswitch/datapath.h | 2 ++
3 files changed, 20 insertions(+)
diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index d120f9f..07ef2c3 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -75,6 +75,7 @@ enum ovs_datapath_attr {
OVS_DP_ATTR_UPCALL_PID, /* Netlink PID to receive upcalls */
OVS_DP_ATTR_STATS, /* struct ovs_dp_stats */
OVS_DP_ATTR_MEGAFLOW_STATS, /* struct ovs_dp_megaflow_stats */
+ OVS_DP_ATTR_USER_FEATURES, /* OVS_DP_F_* */
__OVS_DP_ATTR_MAX
};
@@ -106,6 +107,9 @@ struct ovs_vport_stats {
__u64 tx_dropped; /* no space available in linux */
};
+/* Allow last Netlink attribute to be unaligned */
+#define OVS_DP_F_UNALIGNED (1 << 0)
+
/* Fixed logical ports. */
#define OVSP_LOCAL ((__u32)0)
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 0ac9cde..95d4424 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1067,6 +1067,7 @@ static const struct genl_ops dp_flow_genl_ops[] = {
static const struct nla_policy datapath_policy[OVS_DP_ATTR_MAX + 1] = {
[OVS_DP_ATTR_NAME] = { .type = NLA_NUL_STRING, .len = IFNAMSIZ - 1 },
[OVS_DP_ATTR_UPCALL_PID] = { .type = NLA_U32 },
+ [OVS_DP_ATTR_USER_FEATURES] = { .type = NLA_U32 },
};
static struct genl_family dp_datapath_genl_family = {
@@ -1125,6 +1126,9 @@ static int ovs_dp_cmd_fill_info(struct datapath *dp, struct sk_buff *skb,
&dp_megaflow_stats))
goto nla_put_failure;
+ if (nla_put_u32(skb, OVS_DP_ATTR_USER_FEATURES, dp->user_features))
+ goto nla_put_failure;
+
return genlmsg_end(skb, ovs_header);
nla_put_failure:
@@ -1171,6 +1175,12 @@ static struct datapath *lookup_datapath(struct net *net,
return dp ? dp : ERR_PTR(-ENODEV);
}
+static void ovs_dp_change(struct datapath *dp, struct nlattr **a)
+{
+ if (a[OVS_DP_ATTR_USER_FEATURES])
+ dp->user_features = nla_get_u32(a[OVS_DP_ATTR_USER_FEATURES]);
+}
+
static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
{
struct nlattr **a = info->attrs;
@@ -1229,6 +1239,8 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
parms.port_no = OVSP_LOCAL;
parms.upcall_portid = nla_get_u32(a[OVS_DP_ATTR_UPCALL_PID]);
+ ovs_dp_change(dp, a);
+
vport = new_vport(&parms);
if (IS_ERR(vport)) {
err = PTR_ERR(vport);
@@ -1332,6 +1344,8 @@ static int ovs_dp_cmd_set(struct sk_buff *skb, struct genl_info *info)
if (IS_ERR(dp))
goto unlock;
+ ovs_dp_change(dp, info->attrs);
+
reply = ovs_dp_cmd_build_info(dp, info, OVS_DP_CMD_NEW);
if (IS_ERR(reply)) {
err = PTR_ERR(reply);
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 4067ea4..193e2e0 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -88,6 +88,8 @@ struct datapath {
/* Network namespace ref. */
struct net *net;
#endif
+
+ u32 user_features;
};
/**
--
1.8.3.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net-next 3/6] openvswitch: Drop user features if old user space attempted to create datapath
2013-12-13 14:22 [PATCH net-next 0/6 v9] Open vSwitch zercopy upcall optimiziation Thomas Graf
2013-12-13 14:22 ` [PATCH net-next 1/6] net: Export skb_zerocopy() to zerocopy from one skb to another Thomas Graf
2013-12-13 14:22 ` [PATCH net-next 2/6] openvswitch: Allow user space to announce ability to accept unaligned Netlink messages Thomas Graf
@ 2013-12-13 14:22 ` Thomas Graf
2013-12-13 14:22 ` [PATCH net-next 4/6] openvswitch: Pass datapath into userspace queue functions Thomas Graf
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Thomas Graf @ 2013-12-13 14:22 UTC (permalink / raw)
To: jesse
Cc: dev, eric.dumazet, fleitner, ffusco, dborkmann, bhutchings,
netdev, davem
Drop user features if an outdated user space instance that does not
understand the concept of user_features attempted to create a new
datapath.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
include/uapi/linux/openvswitch.h | 10 +++++++++-
net/openvswitch/datapath.c | 21 +++++++++++++++++++++
2 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 07ef2c3..970553c 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -40,7 +40,15 @@ struct ovs_header {
#define OVS_DATAPATH_FAMILY "ovs_datapath"
#define OVS_DATAPATH_MCGROUP "ovs_datapath"
-#define OVS_DATAPATH_VERSION 0x1
+
+/* V2:
+ * - API users are expected to provide OVS_DP_ATTR_USER_FEATURES
+ * when creating the datapath.
+ */
+#define OVS_DATAPATH_VERSION 2
+
+/* First OVS datapath version to support features */
+#define OVS_DP_VER_FEATURES 2
enum ovs_datapath_cmd {
OVS_DP_CMD_UNSPEC,
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 95d4424..8eaa39a 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1175,6 +1175,18 @@ static struct datapath *lookup_datapath(struct net *net,
return dp ? dp : ERR_PTR(-ENODEV);
}
+static void ovs_dp_reset_user_features(struct sk_buff *skb, struct genl_info *info)
+{
+ struct datapath *dp;
+
+ dp = lookup_datapath(sock_net(skb->sk), info->userhdr, info->attrs);
+ if (!dp)
+ return;
+
+ WARN(dp->user_features, "Dropping previously announced user features\n");
+ dp->user_features = 0;
+}
+
static void ovs_dp_change(struct datapath *dp, struct nlattr **a)
{
if (a[OVS_DP_ATTR_USER_FEATURES])
@@ -1247,6 +1259,15 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
if (err == -EBUSY)
err = -EEXIST;
+ if (err == -EEXIST) {
+ /* An outdated user space instance that does not understand
+ * the concept of user_features has attempted to create a new
+ * datapath and is likely to reuse it. Drop all user features.
+ */
+ if (info->genlhdr->version < OVS_DP_VER_FEATURES)
+ ovs_dp_reset_user_features(skb, info);
+ }
+
goto err_destroy_ports_array;
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net-next 4/6] openvswitch: Pass datapath into userspace queue functions
2013-12-13 14:22 [PATCH net-next 0/6 v9] Open vSwitch zercopy upcall optimiziation Thomas Graf
` (2 preceding siblings ...)
2013-12-13 14:22 ` [PATCH net-next 3/6] openvswitch: Drop user features if old user space attempted to create datapath Thomas Graf
@ 2013-12-13 14:22 ` Thomas Graf
2013-12-13 14:22 ` [PATCH net-next 5/6] openvswitch: Use skb_zerocopy() for upcall Thomas Graf
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Thomas Graf @ 2013-12-13 14:22 UTC (permalink / raw)
To: jesse
Cc: dev, eric.dumazet, fleitner, ffusco, dborkmann, bhutchings,
netdev, davem
Allows removing the net and dp_ifindex argument and simplify the
code.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
net/openvswitch/datapath.c | 34 ++++++++++++++--------------------
1 file changed, 14 insertions(+), 20 deletions(-)
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 8eaa39a..8f6318c 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -108,10 +108,9 @@ int lockdep_ovsl_is_held(void)
#endif
static struct vport *new_vport(const struct vport_parms *);
-static int queue_gso_packets(struct net *, int dp_ifindex, struct sk_buff *,
+static int queue_gso_packets(struct datapath *dp, struct sk_buff *,
const struct dp_upcall_info *);
-static int queue_userspace_packet(struct net *, int dp_ifindex,
- struct sk_buff *,
+static int queue_userspace_packet(struct datapath *dp, struct sk_buff *,
const struct dp_upcall_info *);
/* Must be called with rcu_read_lock or ovs_mutex. */
@@ -277,7 +276,6 @@ int ovs_dp_upcall(struct datapath *dp, struct sk_buff *skb,
const struct dp_upcall_info *upcall_info)
{
struct dp_stats_percpu *stats;
- int dp_ifindex;
int err;
if (upcall_info->portid == 0) {
@@ -285,16 +283,10 @@ int ovs_dp_upcall(struct datapath *dp, struct sk_buff *skb,
goto err;
}
- dp_ifindex = get_dpifindex(dp);
- if (!dp_ifindex) {
- err = -ENODEV;
- goto err;
- }
-
if (!skb_is_gso(skb))
- err = queue_userspace_packet(ovs_dp_get_net(dp), dp_ifindex, skb, upcall_info);
+ err = queue_userspace_packet(dp, skb, upcall_info);
else
- err = queue_gso_packets(ovs_dp_get_net(dp), dp_ifindex, skb, upcall_info);
+ err = queue_gso_packets(dp, skb, upcall_info);
if (err)
goto err;
@@ -310,8 +302,7 @@ err:
return err;
}
-static int queue_gso_packets(struct net *net, int dp_ifindex,
- struct sk_buff *skb,
+static int queue_gso_packets(struct datapath *dp, struct sk_buff *skb,
const struct dp_upcall_info *upcall_info)
{
unsigned short gso_type = skb_shinfo(skb)->gso_type;
@@ -327,7 +318,7 @@ static int queue_gso_packets(struct net *net, int dp_ifindex,
/* Queue all of the segments. */
skb = segs;
do {
- err = queue_userspace_packet(net, dp_ifindex, skb, upcall_info);
+ err = queue_userspace_packet(dp, skb, upcall_info);
if (err)
break;
@@ -394,8 +385,7 @@ static size_t upcall_msg_size(const struct sk_buff *skb,
return size;
}
-static int queue_userspace_packet(struct net *net, int dp_ifindex,
- struct sk_buff *skb,
+static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
const struct dp_upcall_info *upcall_info)
{
struct ovs_header *upcall;
@@ -403,11 +393,15 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
struct sk_buff *user_skb; /* to be queued to userspace */
struct nlattr *nla;
struct genl_info info = {
- .dst_sk = net->genl_sock,
+ .dst_sk = ovs_dp_get_net(dp)->genl_sock,
.snd_portid = upcall_info->portid,
};
size_t len;
- int err;
+ int err, dp_ifindex;
+
+ dp_ifindex = get_dpifindex(dp);
+ if (!dp_ifindex)
+ return -ENODEV;
if (vlan_tx_tag_present(skb)) {
nskb = skb_clone(skb, GFP_ATOMIC);
@@ -452,7 +446,7 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
skb_copy_and_csum_dev(skb, nla_data(nla));
genlmsg_end(user_skb, upcall);
- err = genlmsg_unicast(net, user_skb, upcall_info->portid);
+ err = genlmsg_unicast(ovs_dp_get_net(dp), user_skb, upcall_info->portid);
out:
kfree_skb(nskb);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net-next 5/6] openvswitch: Use skb_zerocopy() for upcall
2013-12-13 14:22 [PATCH net-next 0/6 v9] Open vSwitch zercopy upcall optimiziation Thomas Graf
` (3 preceding siblings ...)
2013-12-13 14:22 ` [PATCH net-next 4/6] openvswitch: Pass datapath into userspace queue functions Thomas Graf
@ 2013-12-13 14:22 ` Thomas Graf
2013-12-13 14:22 ` [PATCH net-next 6/6] openvswitch: Compute checksum in skb_gso_segment() if needed Thomas Graf
2013-12-17 1:22 ` [PATCH net-next 0/6 v9] Open vSwitch zercopy upcall optimiziation Jesse Gross
6 siblings, 0 replies; 8+ messages in thread
From: Thomas Graf @ 2013-12-13 14:22 UTC (permalink / raw)
To: jesse
Cc: dev, eric.dumazet, fleitner, ffusco, dborkmann, bhutchings,
netdev, davem
Use of skb_zerocopy() can avoid the expensive call to memcpy()
when copying the packet data into the Netlink skb. Completes
checksum through skb_checksum_help() if not already done in
GSO segmentation.
Zerocopy is only performed if user space supported unaligned
Netlink messages. memory mapped netlink i/o is preferred over
zerocopy if it is set up.
Cost of upcall is significantly reduced from:
+ 7.48% vhost-8471 [k] memcpy
+ 5.57% ovs-vswitchd [k] memcpy
+ 2.81% vhost-8471 [k] csum_partial_copy_generic
to:
+ 5.72% ovs-vswitchd [k] memcpy
+ 3.32% vhost-5153 [k] memcpy
+ 0.68% vhost-5153 [k] skb_zerocopy
(megaflows disabled)
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
net/openvswitch/datapath.c | 37 +++++++++++++++++++++++++++++--------
1 file changed, 29 insertions(+), 8 deletions(-)
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 8f6318c..ffed1cd 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -371,11 +371,11 @@ static size_t key_attr_size(void)
+ nla_total_size(28); /* OVS_KEY_ATTR_ND */
}
-static size_t upcall_msg_size(const struct sk_buff *skb,
- const struct nlattr *userdata)
+static size_t upcall_msg_size(const struct nlattr *userdata,
+ unsigned int hdrlen)
{
size_t size = NLMSG_ALIGN(sizeof(struct ovs_header))
- + nla_total_size(skb->len) /* OVS_PACKET_ATTR_PACKET */
+ + nla_total_size(hdrlen) /* OVS_PACKET_ATTR_PACKET */
+ nla_total_size(key_attr_size()); /* OVS_PACKET_ATTR_KEY */
/* OVS_PACKET_ATTR_USERDATA */
@@ -397,6 +397,7 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
.snd_portid = upcall_info->portid,
};
size_t len;
+ unsigned int hlen;
int err, dp_ifindex;
dp_ifindex = get_dpifindex(dp);
@@ -421,7 +422,21 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
goto out;
}
- len = upcall_msg_size(skb, upcall_info->userdata);
+ /* Complete checksum if needed */
+ if (skb->ip_summed == CHECKSUM_PARTIAL &&
+ (err = skb_checksum_help(skb)))
+ goto out;
+
+ /* Older versions of OVS user space enforce alignment of the last
+ * Netlink attribute to NLA_ALIGNTO which would require extensive
+ * padding logic. Only perform zerocopy if padding is not required.
+ */
+ if (dp->user_features & OVS_DP_F_UNALIGNED)
+ hlen = skb_zerocopy_headlen(skb);
+ else
+ hlen = skb->len;
+
+ len = upcall_msg_size(upcall_info->userdata, hlen);
user_skb = genlmsg_new_unicast(len, &info, GFP_ATOMIC);
if (!user_skb) {
err = -ENOMEM;
@@ -441,13 +456,19 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
nla_len(upcall_info->userdata),
nla_data(upcall_info->userdata));
- nla = __nla_reserve(user_skb, OVS_PACKET_ATTR_PACKET, skb->len);
+ /* Only reserve room for attribute header, packet data is added
+ * in skb_zerocopy() */
+ if (!(nla = nla_reserve(user_skb, OVS_PACKET_ATTR_PACKET, 0))) {
+ err = -ENOBUFS;
+ goto out;
+ }
+ nla->nla_len = nla_attr_size(skb->len);
- skb_copy_and_csum_dev(skb, nla_data(nla));
+ skb_zerocopy(user_skb, skb, skb->len, hlen);
- genlmsg_end(user_skb, upcall);
- err = genlmsg_unicast(ovs_dp_get_net(dp), user_skb, upcall_info->portid);
+ ((struct nlmsghdr *) user_skb->data)->nlmsg_len = user_skb->len;
+ err = genlmsg_unicast(ovs_dp_get_net(dp), user_skb, upcall_info->portid);
out:
kfree_skb(nskb);
return err;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net-next 6/6] openvswitch: Compute checksum in skb_gso_segment() if needed
2013-12-13 14:22 [PATCH net-next 0/6 v9] Open vSwitch zercopy upcall optimiziation Thomas Graf
` (4 preceding siblings ...)
2013-12-13 14:22 ` [PATCH net-next 5/6] openvswitch: Use skb_zerocopy() for upcall Thomas Graf
@ 2013-12-13 14:22 ` Thomas Graf
2013-12-17 1:22 ` [PATCH net-next 0/6 v9] Open vSwitch zercopy upcall optimiziation Jesse Gross
6 siblings, 0 replies; 8+ messages in thread
From: Thomas Graf @ 2013-12-13 14:22 UTC (permalink / raw)
To: jesse
Cc: dev, eric.dumazet, fleitner, ffusco, dborkmann, bhutchings,
netdev, davem
The copy & csum optimization is no longer present with zerocopy
enabled. Compute the checksum in skb_gso_segment() directly by
dropping the HW CSUM capability from the features passed in.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
net/openvswitch/datapath.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index ffed1cd..20ac30e 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -311,7 +311,7 @@ static int queue_gso_packets(struct datapath *dp, struct sk_buff *skb,
struct sk_buff *segs, *nskb;
int err;
- segs = __skb_gso_segment(skb, NETIF_F_SG | NETIF_F_HW_CSUM, false);
+ segs = __skb_gso_segment(skb, NETIF_F_SG, false);
if (IS_ERR(segs))
return PTR_ERR(segs);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net-next 0/6 v9] Open vSwitch zercopy upcall optimiziation
2013-12-13 14:22 [PATCH net-next 0/6 v9] Open vSwitch zercopy upcall optimiziation Thomas Graf
` (5 preceding siblings ...)
2013-12-13 14:22 ` [PATCH net-next 6/6] openvswitch: Compute checksum in skb_gso_segment() if needed Thomas Graf
@ 2013-12-17 1:22 ` Jesse Gross
6 siblings, 0 replies; 8+ messages in thread
From: Jesse Gross @ 2013-12-17 1:22 UTC (permalink / raw)
To: Thomas Graf
Cc: dev, Eric Dumazet, fleitner, Francesco Fusco, dborkmann,
Ben Hutchings, netdev, David Miller
On Fri, Dec 13, 2013 at 6:22 AM, Thomas Graf <tgraf@suug.ch> wrote:
> Series implementing a zerocopy method for OVS upcall messages. Based
> on top of commit: (''openvswitch: Enable memory mapped Netlink i/o'')
Series applied, thanks.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2013-12-17 1:22 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-13 14:22 [PATCH net-next 0/6 v9] Open vSwitch zercopy upcall optimiziation Thomas Graf
2013-12-13 14:22 ` [PATCH net-next 1/6] net: Export skb_zerocopy() to zerocopy from one skb to another Thomas Graf
2013-12-13 14:22 ` [PATCH net-next 2/6] openvswitch: Allow user space to announce ability to accept unaligned Netlink messages Thomas Graf
2013-12-13 14:22 ` [PATCH net-next 3/6] openvswitch: Drop user features if old user space attempted to create datapath Thomas Graf
2013-12-13 14:22 ` [PATCH net-next 4/6] openvswitch: Pass datapath into userspace queue functions Thomas Graf
2013-12-13 14:22 ` [PATCH net-next 5/6] openvswitch: Use skb_zerocopy() for upcall Thomas Graf
2013-12-13 14:22 ` [PATCH net-next 6/6] openvswitch: Compute checksum in skb_gso_segment() if needed Thomas Graf
2013-12-17 1:22 ` [PATCH net-next 0/6 v9] Open vSwitch zercopy upcall optimiziation Jesse Gross
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).