* [PATCH net-next, 0/4] hv_netvsc: Add XDP support and some error handling fixes @ 2019-10-28 21:06 Haiyang Zhang 2019-10-28 21:07 ` [PATCH net-next, 1/4] hv_netvsc: Fix error handling in netvsc_set_features() Haiyang Zhang ` (3 more replies) 0 siblings, 4 replies; 13+ messages in thread From: Haiyang Zhang @ 2019-10-28 21:06 UTC (permalink / raw) To: sashal, linux-hyperv, netdev Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger, olaf, vkuznets, davem, linux-kernel This patch set fixes some error handling issues in netvsc driver, and add XDP support. Haiyang Zhang (4): hv_netvsc: Fix error handling in netvsc_set_features() hv_netvsc: Fix error handling in netvsc_attach() hv_netvsc: Add XDP support hv_netvsc: Update document for XDP support .../networking/device_drivers/microsoft/netvsc.txt | 14 ++ drivers/net/hyperv/Makefile | 2 +- drivers/net/hyperv/hyperv_net.h | 15 ++ drivers/net/hyperv/netvsc.c | 8 +- drivers/net/hyperv/netvsc_bpf.c | 211 +++++++++++++++++++++ drivers/net/hyperv/netvsc_drv.c | 150 ++++++++++++--- 6 files changed, 374 insertions(+), 26 deletions(-) create mode 100644 drivers/net/hyperv/netvsc_bpf.c -- 1.8.3.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH net-next, 1/4] hv_netvsc: Fix error handling in netvsc_set_features() 2019-10-28 21:06 [PATCH net-next, 0/4] hv_netvsc: Add XDP support and some error handling fixes Haiyang Zhang @ 2019-10-28 21:07 ` Haiyang Zhang 2019-10-28 21:07 ` [PATCH net-next, 2/4] hv_netvsc: Fix error handling in netvsc_attach() Haiyang Zhang ` (2 subsequent siblings) 3 siblings, 0 replies; 13+ messages in thread From: Haiyang Zhang @ 2019-10-28 21:07 UTC (permalink / raw) To: sashal, linux-hyperv, netdev Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger, olaf, vkuznets, davem, linux-kernel When an error is returned by rndis_filter_set_offload_params(), we should still assign the unaffected features to ndev->features. Otherwise, these features will be missing. Fixes: d6792a5a0747 ("hv_netvsc: Add handler for LRO setting change") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> --- drivers/net/hyperv/netvsc_drv.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c index 39dddcd..734e411 100644 --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -1807,8 +1807,10 @@ static int netvsc_set_features(struct net_device *ndev, ret = rndis_filter_set_offload_params(ndev, nvdev, &offloads); - if (ret) + if (ret) { features ^= NETIF_F_LRO; + ndev->features = features; + } syncvf: if (!vf_netdev) -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH net-next, 2/4] hv_netvsc: Fix error handling in netvsc_attach() 2019-10-28 21:06 [PATCH net-next, 0/4] hv_netvsc: Add XDP support and some error handling fixes Haiyang Zhang 2019-10-28 21:07 ` [PATCH net-next, 1/4] hv_netvsc: Fix error handling in netvsc_set_features() Haiyang Zhang @ 2019-10-28 21:07 ` Haiyang Zhang 2019-11-01 20:42 ` Markus Elfring 2019-10-28 21:07 ` [PATCH net-next, 3/4] hv_netvsc: Add XDP support Haiyang Zhang 2019-10-28 21:07 ` [PATCH net-next, 4/4] hv_netvsc: Update document for " Haiyang Zhang 3 siblings, 1 reply; 13+ messages in thread From: Haiyang Zhang @ 2019-10-28 21:07 UTC (permalink / raw) To: sashal, linux-hyperv, netdev Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger, olaf, vkuznets, davem, linux-kernel If rndis_filter_open() fails, we need to remove the rndis device created in earlier steps, before returning an error code. Otherwise, the retry of netvsc_attach() from its callers will fail and hang. Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> --- drivers/net/hyperv/netvsc_drv.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c index 734e411..a14fc8e 100644 --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -982,7 +982,7 @@ static int netvsc_attach(struct net_device *ndev, if (netif_running(ndev)) { ret = rndis_filter_open(nvdev); if (ret) - return ret; + goto err; rdev = nvdev->extension; if (!rdev->link_state) @@ -990,6 +990,13 @@ static int netvsc_attach(struct net_device *ndev, } return 0; + +err: + netif_device_detach(ndev); + + rndis_filter_device_remove(hdev, nvdev); + + return ret; } static int netvsc_set_channels(struct net_device *net, -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH net-next, 2/4] hv_netvsc: Fix error handling in netvsc_attach() 2019-10-28 21:07 ` [PATCH net-next, 2/4] hv_netvsc: Fix error handling in netvsc_attach() Haiyang Zhang @ 2019-11-01 20:42 ` Markus Elfring 2019-11-04 15:08 ` Haiyang Zhang 0 siblings, 1 reply; 13+ messages in thread From: Markus Elfring @ 2019-11-01 20:42 UTC (permalink / raw) To: Haiyang Zhang, linux-hyperv, netdev Cc: kernel-janitors, linux-kernel, David S. Miller, K. Y. Srinivasan, Olaf Hering, Sasha Levin, Stephen Hemminger, Vitaly Kuznetsov > If rndis_filter_open() fails, we need to remove the rndis device created > in earlier steps, before returning an error code. Otherwise, the retry of > netvsc_attach() from its callers will fail and hang. How do you think about to choose a more “imperative mood” for your change description? https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=0dbe6cb8f7e05bc9611602ef45980a6c57b245a3#n151 … > +++ b/drivers/net/hyperv/netvsc_drv.c > @@ -982,7 +982,7 @@ static int netvsc_attach(struct net_device *ndev, > if (netif_running(ndev)) { > ret = rndis_filter_open(nvdev); > if (ret) > - return ret; > + goto err; > > rdev = nvdev->extension; > if (!rdev->link_state) … I would prefer to specify the completed exception handling (addition of two function calls) by a compound statement in the shown if branch directly. If you would insist to use a goto statement, I find an other label more appropriate according to the Linux coding style. Regards, Markus ^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: [PATCH net-next, 2/4] hv_netvsc: Fix error handling in netvsc_attach() 2019-11-01 20:42 ` Markus Elfring @ 2019-11-04 15:08 ` Haiyang Zhang 0 siblings, 0 replies; 13+ messages in thread From: Haiyang Zhang @ 2019-11-04 15:08 UTC (permalink / raw) To: Markus Elfring, linux-hyperv, netdev Cc: kernel-janitors, linux-kernel, David S. Miller, KY Srinivasan, Olaf Hering, Sasha Levin, Stephen Hemminger, vkuznets > -----Original Message----- > From: Markus Elfring <Markus.Elfring@web.de> > Sent: Friday, November 1, 2019 4:43 PM > To: Haiyang Zhang <haiyangz@microsoft.com>; linux- > hyperv@vger.kernel.org; netdev@vger.kernel.org > Cc: kernel-janitors@vger.kernel.org; linux-kernel@vger.kernel.org; David S. > Miller <davem@davemloft.net>; KY Srinivasan <kys@microsoft.com>; Olaf > Hering <olaf@aepfle.de>; Sasha Levin <sashal@kernel.org>; Stephen > Hemminger <sthemmin@microsoft.com>; vkuznets <vkuznets@redhat.com> > Subject: Re: [PATCH net-next, 2/4] hv_netvsc: Fix error handling in > netvsc_attach() > > > If rndis_filter_open() fails, we need to remove the rndis device > > created in earlier steps, before returning an error code. Otherwise, > > the retry of > > netvsc_attach() from its callers will fail and hang. > > How do you think about to choose a more “imperative mood” for your > change description? > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.k > ernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git% > 2Ftree%2FDocumentation%2Fprocess%2Fsubmitting- > patches.rst%3Fid%3D0dbe6cb8f7e05bc9611602ef45980a6c57b245a3%23n151 > &data=02%7C01%7Chaiyangz%40microsoft.com%7C162aa016f45e4293c > abb08d75f0c0fee%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637 > 082377796295159&sdata=ytjxGYTPI2D4BoNbslKPvBbsfGUEM7hXj1YAiG > hn8Ik%3D&reserved=0 Agreed. Thanks. > > > … > > +++ b/drivers/net/hyperv/netvsc_drv.c > > @@ -982,7 +982,7 @@ static int netvsc_attach(struct net_device *ndev, > > if (netif_running(ndev)) { > > ret = rndis_filter_open(nvdev); > > if (ret) > > - return ret; > > + goto err; > > > > rdev = nvdev->extension; > > if (!rdev->link_state) > … > > I would prefer to specify the completed exception handling (addition of two > function calls) by a compound statement in the shown if branch directly. > > If you would insist to use a goto statement, I find an other label more > appropriate according to the Linux coding style. I will have more patches that make multiple entry points of error clean up steps -- so I used goto instead of putting the functions in each if-branch. I will name the labels more meaningfully. Thanks, - Haiyang ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH net-next, 3/4] hv_netvsc: Add XDP support 2019-10-28 21:06 [PATCH net-next, 0/4] hv_netvsc: Add XDP support and some error handling fixes Haiyang Zhang 2019-10-28 21:07 ` [PATCH net-next, 1/4] hv_netvsc: Fix error handling in netvsc_set_features() Haiyang Zhang 2019-10-28 21:07 ` [PATCH net-next, 2/4] hv_netvsc: Fix error handling in netvsc_attach() Haiyang Zhang @ 2019-10-28 21:07 ` Haiyang Zhang 2019-10-28 21:33 ` Jakub Kicinski 2019-10-28 21:07 ` [PATCH net-next, 4/4] hv_netvsc: Update document for " Haiyang Zhang 3 siblings, 1 reply; 13+ messages in thread From: Haiyang Zhang @ 2019-10-28 21:07 UTC (permalink / raw) To: sashal, linux-hyperv, netdev Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger, olaf, vkuznets, davem, linux-kernel This patch adds support of XDP in native mode for hv_netvsc driver, and transparently sets the XDP program on the associated VF NIC as well. XDP program cannot run with LRO (RSC) enabled, so you need to disable LRO before running XDP: ethtool -K eth0 lro off XDP actions not yet supported: XDP_TX, XDP_REDIRECT Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> --- drivers/net/hyperv/Makefile | 2 +- drivers/net/hyperv/hyperv_net.h | 15 +++ drivers/net/hyperv/netvsc.c | 8 +- drivers/net/hyperv/netvsc_bpf.c | 211 ++++++++++++++++++++++++++++++++++++++++ drivers/net/hyperv/netvsc_drv.c | 141 ++++++++++++++++++++++----- 5 files changed, 351 insertions(+), 26 deletions(-) create mode 100644 drivers/net/hyperv/netvsc_bpf.c diff --git a/drivers/net/hyperv/Makefile b/drivers/net/hyperv/Makefile index 3a2aa07..0db7cca 100644 --- a/drivers/net/hyperv/Makefile +++ b/drivers/net/hyperv/Makefile @@ -1,4 +1,4 @@ # SPDX-License-Identifier: GPL-2.0-only obj-$(CONFIG_HYPERV_NET) += hv_netvsc.o -hv_netvsc-y := netvsc_drv.o netvsc.o rndis_filter.o netvsc_trace.o +hv_netvsc-y := netvsc_drv.o netvsc.o rndis_filter.o netvsc_trace.o netvsc_bpf.o diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h index 670ef68..e5aa256 100644 --- a/drivers/net/hyperv/hyperv_net.h +++ b/drivers/net/hyperv/hyperv_net.h @@ -142,6 +142,8 @@ struct netvsc_device_info { u32 send_section_size; u32 recv_section_size; + struct bpf_prog *bprog; + u8 rss_key[NETVSC_HASH_KEYLEN]; }; @@ -199,6 +201,15 @@ int netvsc_recv_callback(struct net_device *net, void netvsc_channel_cb(void *context); int netvsc_poll(struct napi_struct *napi, int budget); +u32 netvsc_run_xdp(struct net_device *ndev, struct netvsc_channel *nvchan, + void **p_pbuf); +unsigned int netvsc_xdp_fraglen(unsigned int len); +struct bpf_prog *netvsc_xdp_get(struct netvsc_device *nvdev); +int netvsc_xdp_set(struct net_device *dev, struct bpf_prog *prog, + struct netvsc_device *nvdev); +int netvsc_vf_setxdp(struct net_device *vf_netdev, struct bpf_prog *prog); +int netvsc_bpf(struct net_device *dev, struct netdev_bpf *bpf); + int rndis_set_subchannel(struct net_device *ndev, struct netvsc_device *nvdev, struct netvsc_device_info *dev_info); @@ -865,6 +876,7 @@ struct netvsc_stats { u64 bytes; u64 broadcast; u64 multicast; + u64 xdp_drop; struct u64_stats_sync syncp; }; @@ -965,6 +977,9 @@ struct netvsc_channel { atomic_t queue_sends; struct nvsc_rsc rsc; + struct bpf_prog __rcu *bpf_prog; + struct xdp_rxq_info xdp_rxq; + struct netvsc_stats tx_stats; struct netvsc_stats rx_stats; }; diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index d22a36f..688487b 100644 --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -122,8 +122,10 @@ static void free_netvsc_device(struct rcu_head *head) vfree(nvdev->send_buf); kfree(nvdev->send_section_map); - for (i = 0; i < VRSS_CHANNEL_MAX; i++) + for (i = 0; i < VRSS_CHANNEL_MAX; i++) { + xdp_rxq_info_unreg(&nvdev->chan_table[i].xdp_rxq); vfree(nvdev->chan_table[i].mrc.slots); + } kfree(nvdev); } @@ -1370,6 +1372,10 @@ struct netvsc_device *netvsc_device_add(struct hv_device *device, nvchan->net_device = net_device; u64_stats_init(&nvchan->tx_stats.syncp); u64_stats_init(&nvchan->rx_stats.syncp); + + xdp_rxq_info_reg(&nvchan->xdp_rxq, ndev, i); + xdp_rxq_info_reg_mem_model(&nvchan->xdp_rxq, + MEM_TYPE_PAGE_SHARED, NULL); } /* Enable NAPI handler before init callbacks */ diff --git a/drivers/net/hyperv/netvsc_bpf.c b/drivers/net/hyperv/netvsc_bpf.c new file mode 100644 index 0000000..4d235ac --- /dev/null +++ b/drivers/net/hyperv/netvsc_bpf.c @@ -0,0 +1,211 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright (c) 2019, Microsoft Corporation. + * + * Author: + * Haiyang Zhang <haiyangz@microsoft.com> + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/netdevice.h> +#include <linux/etherdevice.h> +#include <linux/ethtool.h> +#include <linux/bpf.h> +#include <linux/bpf_trace.h> +#include <linux/kernel.h> +#include <net/xdp.h> + +#include <linux/mutex.h> +#include <linux/rtnetlink.h> + +#include "hyperv_net.h" + +u32 netvsc_run_xdp(struct net_device *ndev, struct netvsc_channel *nvchan, + void **p_pbuf) +{ + struct page *page = NULL; + void *data = nvchan->rsc.data[0]; + u32 len = nvchan->rsc.len[0]; + void *pbuf = data; + struct bpf_prog *prog; + struct xdp_buff xdp; + u32 act = XDP_PASS; + + *p_pbuf = NULL; + + rcu_read_lock(); + prog = rcu_dereference(nvchan->bpf_prog); + + if (!prog || nvchan->rsc.cnt > 1) + goto out; + + /* copy to a new page buffer if data are not within a page */ + if (virt_to_page(data) != virt_to_page(data + len - 1)) { + page = alloc_page(GFP_ATOMIC); + if (!page) + goto out; + + pbuf = page_address(page); + memcpy(pbuf, nvchan->rsc.data[0], len); + + *p_pbuf = pbuf; + } + + xdp.data_hard_start = pbuf; + xdp.data = xdp.data_hard_start; + xdp_set_data_meta_invalid(&xdp); + xdp.data_end = xdp.data + len; + xdp.rxq = &nvchan->xdp_rxq; + xdp.handle = 0; + + act = bpf_prog_run_xdp(prog, &xdp); + + switch (act) { + case XDP_PASS: + /* Pass to upper layers */ + break; + + case XDP_ABORTED: + trace_xdp_exception(ndev, prog, act); + break; + + case XDP_DROP: + break; + + default: + bpf_warn_invalid_xdp_action(act); + } + +out: + rcu_read_unlock(); + + if (page && act != XDP_PASS) { + *p_pbuf = NULL; + __free_page(page); + } + + return act; +} + +unsigned int netvsc_xdp_fraglen(unsigned int len) +{ + return SKB_DATA_ALIGN(len) + + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); +} + +struct bpf_prog *netvsc_xdp_get(struct netvsc_device *nvdev) +{ + return rtnl_dereference(nvdev->chan_table[0].bpf_prog); +} + +int netvsc_xdp_set(struct net_device *dev, struct bpf_prog *prog, + struct netvsc_device *nvdev) +{ + struct bpf_prog *old_prog; + int frag_max, i; + + old_prog = netvsc_xdp_get(nvdev); + + if (!old_prog && !prog) + return 0; + + frag_max = netvsc_xdp_fraglen(dev->mtu + ETH_HLEN); + if (prog && frag_max > PAGE_SIZE) { + netdev_err(dev, "XDP: mtu:%u too large, frag:%u\n", + dev->mtu, frag_max); + return -EOPNOTSUPP; + } + + if (prog && (dev->features & NETIF_F_LRO)) { + netdev_err(dev, "XDP: not support LRO\n"); + return -EOPNOTSUPP; + } + + if (prog) { + prog = bpf_prog_add(prog, nvdev->num_chn); + if (IS_ERR(prog)) + return PTR_ERR(prog); + } + + for (i = 0; i < nvdev->num_chn; i++) + rcu_assign_pointer(nvdev->chan_table[i].bpf_prog, prog); + + if (old_prog) + for (i = 0; i < nvdev->num_chn; i++) + bpf_prog_put(old_prog); + + return 0; +} + +int netvsc_vf_setxdp(struct net_device *vf_netdev, struct bpf_prog *prog) +{ + struct netdev_bpf xdp; + bpf_op_t ndo_bpf; + + ASSERT_RTNL(); + + if (!vf_netdev) + return 0; + + ndo_bpf = vf_netdev->netdev_ops->ndo_bpf; + if (!ndo_bpf) + return 0; + + memset(&xdp, 0, sizeof(xdp)); + + xdp.command = XDP_SETUP_PROG; + xdp.prog = prog; + + return ndo_bpf(vf_netdev, &xdp); +} + +static u32 netvsc_xdp_query(struct netvsc_device *nvdev) +{ + struct bpf_prog *prog = netvsc_xdp_get(nvdev); + + if (prog) + return prog->aux->id; + + return 0; +} + +int netvsc_bpf(struct net_device *dev, struct netdev_bpf *bpf) +{ + struct net_device_context *ndevctx = netdev_priv(dev); + struct netvsc_device *nvdev = rtnl_dereference(ndevctx->nvdev); + struct net_device *vf_netdev = rtnl_dereference(ndevctx->vf_netdev); + int ret; + + if (!nvdev || nvdev->destroy) { + if (bpf->command == XDP_QUERY_PROG) { + bpf->prog_id = 0; + return 0; /* Query must always succeed */ + } else { + return -ENODEV; + } + } + + switch (bpf->command) { + case XDP_SETUP_PROG: + ret = netvsc_xdp_set(dev, bpf->prog, nvdev); + + if (ret) + return ret; + + ret = netvsc_vf_setxdp(vf_netdev, bpf->prog); + + if (ret) { + netdev_err(dev, "vf_setxdp failed:%d\n", ret); + netvsc_xdp_set(dev, NULL, nvdev); + } + + return ret; + + case XDP_QUERY_PROG: + bpf->prog_id = netvsc_xdp_query(nvdev); + return 0; + + default: + return -EINVAL; + } +} diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c index a14fc8e..415f8db 100644 --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -25,6 +25,7 @@ #include <linux/slab.h> #include <linux/rtnetlink.h> #include <linux/netpoll.h> +#include <linux/bpf.h> #include <net/arp.h> #include <net/route.h> @@ -760,7 +761,8 @@ static void netvsc_comp_ipcsum(struct sk_buff *skb) } static struct sk_buff *netvsc_alloc_recv_skb(struct net_device *net, - struct netvsc_channel *nvchan) + struct netvsc_channel *nvchan, + void *pbuf) { struct napi_struct *napi = &nvchan->napi; const struct ndis_pkt_8021q_info *vlan = nvchan->rsc.vlan; @@ -769,16 +771,32 @@ static struct sk_buff *netvsc_alloc_recv_skb(struct net_device *net, struct sk_buff *skb; int i; - skb = napi_alloc_skb(napi, nvchan->rsc.pktlen); - if (!skb) - return skb; + if (pbuf) { + unsigned int len = nvchan->rsc.pktlen; + unsigned int frag_size = netvsc_xdp_fraglen(len); - /* - * Copy to skb. This copy is needed here since the memory pointed by - * hv_netvsc_packet cannot be deallocated - */ - for (i = 0; i < nvchan->rsc.cnt; i++) - skb_put_data(skb, nvchan->rsc.data[i], nvchan->rsc.len[i]); + skb = build_skb(pbuf, frag_size); + + if (!skb) { + __free_page(virt_to_page(pbuf)); + return NULL; + } + + skb_put(skb, len); + skb->dev = napi->dev; + } else { + skb = napi_alloc_skb(napi, nvchan->rsc.pktlen); + + if (!skb) + return NULL; + + /* Copy to skb. This copy is needed here since the memory + * pointed by hv_netvsc_packet cannot be deallocated. + */ + for (i = 0; i < nvchan->rsc.cnt; i++) + skb_put_data(skb, nvchan->rsc.data[i], + nvchan->rsc.len[i]); + } skb->protocol = eth_type_trans(skb, net); @@ -826,13 +844,25 @@ int netvsc_recv_callback(struct net_device *net, struct vmbus_channel *channel = nvchan->channel; u16 q_idx = channel->offermsg.offer.sub_channel_index; struct sk_buff *skb; - struct netvsc_stats *rx_stats; + struct netvsc_stats *rx_stats = &nvchan->rx_stats; + void *pbuf = NULL; /* page buffer */ + u32 act; if (net->reg_state != NETREG_REGISTERED) return NVSP_STAT_FAIL; + act = netvsc_run_xdp(net, nvchan, &pbuf); + + if (act != XDP_PASS) { + u64_stats_update_begin(&rx_stats->syncp); + rx_stats->xdp_drop++; + u64_stats_update_end(&rx_stats->syncp); + + return NVSP_STAT_SUCCESS; /* consumed by XDP */ + } + /* Allocate a skb - TODO direct I/O to pages? */ - skb = netvsc_alloc_recv_skb(net, nvchan); + skb = netvsc_alloc_recv_skb(net, nvchan, pbuf); if (unlikely(!skb)) { ++net_device_ctx->eth_stats.rx_no_memory; @@ -846,7 +876,6 @@ int netvsc_recv_callback(struct net_device *net, * on the synthetic device because modifying the VF device * statistics will not work correctly. */ - rx_stats = &nvchan->rx_stats; u64_stats_update_begin(&rx_stats->syncp); rx_stats->packets++; rx_stats->bytes += nvchan->rsc.pktlen; @@ -887,6 +916,7 @@ static void netvsc_get_channels(struct net_device *net, (struct netvsc_device *nvdev) { struct netvsc_device_info *dev_info; + struct bpf_prog *prog; dev_info = kzalloc(sizeof(*dev_info), GFP_ATOMIC); @@ -894,6 +924,8 @@ static void netvsc_get_channels(struct net_device *net, return NULL; if (nvdev) { + ASSERT_RTNL(); + dev_info->num_chn = nvdev->num_chn; dev_info->send_sections = nvdev->send_section_cnt; dev_info->send_section_size = nvdev->send_section_size; @@ -902,6 +934,13 @@ static void netvsc_get_channels(struct net_device *net, memcpy(dev_info->rss_key, nvdev->extension->rss_key, NETVSC_HASH_KEYLEN); + + prog = netvsc_xdp_get(nvdev); + if (prog) { + prog = bpf_prog_add(prog, 1); + if (!IS_ERR(prog)) + dev_info->bprog = prog; + } } else { dev_info->num_chn = VRSS_CHANNEL_DEFAULT; dev_info->send_sections = NETVSC_DEFAULT_TX; @@ -913,6 +952,17 @@ static void netvsc_get_channels(struct net_device *net, return dev_info; } +/* Free struct netvsc_device_info */ +static void netvsc_devinfo_put(struct netvsc_device_info *dev_info) +{ + if (dev_info->bprog) { + ASSERT_RTNL(); + bpf_prog_put(dev_info->bprog); + } + + kfree(dev_info); +} + static int netvsc_detach(struct net_device *ndev, struct netvsc_device *nvdev) { @@ -924,6 +974,8 @@ static int netvsc_detach(struct net_device *ndev, if (cancel_work_sync(&nvdev->subchan_work)) nvdev->num_chn = 1; + netvsc_xdp_set(ndev, NULL, nvdev); + /* If device was up (receiving) then shutdown */ if (netif_running(ndev)) { netvsc_tx_disable(nvdev, ndev); @@ -957,7 +1009,8 @@ static int netvsc_attach(struct net_device *ndev, struct hv_device *hdev = ndev_ctx->device_ctx; struct netvsc_device *nvdev; struct rndis_device *rdev; - int ret; + struct bpf_prog *prog; + int ret = 0; nvdev = rndis_filter_device_add(hdev, dev_info); if (IS_ERR(nvdev)) @@ -973,6 +1026,13 @@ static int netvsc_attach(struct net_device *ndev, } } + prog = dev_info->bprog; + if (prog) { + ret = netvsc_xdp_set(ndev, prog, nvdev); + if (ret) + goto err1; + } + /* In any case device is now ready */ netif_device_attach(ndev); @@ -982,7 +1042,7 @@ static int netvsc_attach(struct net_device *ndev, if (netif_running(ndev)) { ret = rndis_filter_open(nvdev); if (ret) - goto err; + goto err2; rdev = nvdev->extension; if (!rdev->link_state) @@ -991,9 +1051,10 @@ static int netvsc_attach(struct net_device *ndev, return 0; -err: +err2: netif_device_detach(ndev); +err1: rndis_filter_device_remove(hdev, nvdev); return ret; @@ -1043,7 +1104,7 @@ static int netvsc_set_channels(struct net_device *net, } out: - kfree(device_info); + netvsc_devinfo_put(device_info); return ret; } @@ -1150,7 +1211,7 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu) dev_set_mtu(vf_netdev, orig_mtu); out: - kfree(device_info); + netvsc_devinfo_put(device_info); return ret; } @@ -1375,8 +1436,8 @@ static int netvsc_set_mac_addr(struct net_device *ndev, void *p) /* statistics per queue (rx/tx packets/bytes) */ #define NETVSC_PCPU_STATS_LEN (num_present_cpus() * ARRAY_SIZE(pcpu_stats)) -/* 4 statistics per queue (rx/tx packets/bytes) */ -#define NETVSC_QUEUE_STATS_LEN(dev) ((dev)->num_chn * 4) +/* 5 statistics per queue (rx/tx packets/bytes, rx xdp_drop) */ +#define NETVSC_QUEUE_STATS_LEN(dev) ((dev)->num_chn * 5) static int netvsc_get_sset_count(struct net_device *dev, int string_set) { @@ -1408,6 +1469,7 @@ static void netvsc_get_ethtool_stats(struct net_device *dev, struct netvsc_ethtool_pcpu_stats *pcpu_sum; unsigned int start; u64 packets, bytes; + u64 xdp_drop; int i, j, cpu; if (!nvdev) @@ -1436,9 +1498,11 @@ static void netvsc_get_ethtool_stats(struct net_device *dev, start = u64_stats_fetch_begin_irq(&qstats->syncp); packets = qstats->packets; bytes = qstats->bytes; + xdp_drop = qstats->xdp_drop; } while (u64_stats_fetch_retry_irq(&qstats->syncp, start)); data[i++] = packets; data[i++] = bytes; + data[i++] = xdp_drop; } pcpu_sum = kvmalloc_array(num_possible_cpus(), @@ -1486,6 +1550,8 @@ static void netvsc_get_strings(struct net_device *dev, u32 stringset, u8 *data) p += ETH_GSTRING_LEN; sprintf(p, "rx_queue_%u_bytes", i); p += ETH_GSTRING_LEN; + sprintf(p, "rx_queue_%u_xdp_drop", i); + p += ETH_GSTRING_LEN; } for_each_present_cpu(cpu) { @@ -1782,10 +1848,27 @@ static int netvsc_set_ringparam(struct net_device *ndev, } out: - kfree(device_info); + netvsc_devinfo_put(device_info); return ret; } +static netdev_features_t netvsc_fix_features(struct net_device *ndev, + netdev_features_t features) +{ + struct net_device_context *ndevctx = netdev_priv(ndev); + struct netvsc_device *nvdev = rtnl_dereference(ndevctx->nvdev); + + if (!nvdev || nvdev->destroy) + return features; + + if ((features & NETIF_F_LRO) && netvsc_xdp_get(nvdev)) { + features ^= NETIF_F_LRO; + netdev_info(ndev, "Skip LRO - unsupported with XDP\n"); + } + + return features; +} + static int netvsc_set_features(struct net_device *ndev, netdev_features_t features) { @@ -1872,12 +1955,14 @@ static void netvsc_set_msglevel(struct net_device *ndev, u32 val) .ndo_start_xmit = netvsc_start_xmit, .ndo_change_rx_flags = netvsc_change_rx_flags, .ndo_set_rx_mode = netvsc_set_rx_mode, + .ndo_fix_features = netvsc_fix_features, .ndo_set_features = netvsc_set_features, .ndo_change_mtu = netvsc_change_mtu, .ndo_validate_addr = eth_validate_addr, .ndo_set_mac_address = netvsc_set_mac_addr, .ndo_select_queue = netvsc_select_queue, .ndo_get_stats64 = netvsc_get_stats64, + .ndo_bpf = netvsc_bpf, }; /* @@ -2164,6 +2249,7 @@ static int netvsc_register_vf(struct net_device *vf_netdev) { struct net_device_context *net_device_ctx; struct netvsc_device *netvsc_dev; + struct bpf_prog *prog; struct net_device *ndev; int ret; @@ -2208,6 +2294,9 @@ static int netvsc_register_vf(struct net_device *vf_netdev) vf_netdev->wanted_features = ndev->features; netdev_update_features(vf_netdev); + prog = netvsc_xdp_get(netvsc_dev); + netvsc_vf_setxdp(vf_netdev, prog); + return NOTIFY_OK; } @@ -2249,6 +2338,8 @@ static int netvsc_unregister_vf(struct net_device *vf_netdev) netdev_info(ndev, "VF unregistering: %s\n", vf_netdev->name); + netvsc_vf_setxdp(vf_netdev, NULL); + netdev_rx_handler_unregister(vf_netdev); netdev_upper_dev_unlink(vf_netdev, ndev); RCU_INIT_POINTER(net_device_ctx->vf_netdev, NULL); @@ -2362,14 +2453,14 @@ static int netvsc_probe(struct hv_device *dev, list_add(&net_device_ctx->list, &netvsc_dev_list); rtnl_unlock(); - kfree(device_info); + netvsc_devinfo_put(device_info); return 0; register_failed: rtnl_unlock(); rndis_filter_device_remove(dev, nvdev); rndis_failed: - kfree(device_info); + netvsc_devinfo_put(device_info); devinfo_failed: free_percpu(net_device_ctx->vf_stats); no_stats: @@ -2397,8 +2488,10 @@ static int netvsc_remove(struct hv_device *dev) rtnl_lock(); nvdev = rtnl_dereference(ndev_ctx->nvdev); - if (nvdev) + if (nvdev) { cancel_work_sync(&nvdev->subchan_work); + netvsc_xdp_set(net, NULL, nvdev); + } /* * Call to the vsc driver to let it know that the device is being -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH net-next, 3/4] hv_netvsc: Add XDP support 2019-10-28 21:07 ` [PATCH net-next, 3/4] hv_netvsc: Add XDP support Haiyang Zhang @ 2019-10-28 21:33 ` Jakub Kicinski 2019-10-29 19:17 ` Haiyang Zhang 0 siblings, 1 reply; 13+ messages in thread From: Jakub Kicinski @ 2019-10-28 21:33 UTC (permalink / raw) To: Haiyang Zhang Cc: sashal, linux-hyperv, netdev, KY Srinivasan, Stephen Hemminger, olaf, vkuznets, davem, linux-kernel On Mon, 28 Oct 2019 21:07:04 +0000, Haiyang Zhang wrote: > This patch adds support of XDP in native mode for hv_netvsc driver, and > transparently sets the XDP program on the associated VF NIC as well. > > XDP program cannot run with LRO (RSC) enabled, so you need to disable LRO > before running XDP: > ethtool -K eth0 lro off > > XDP actions not yet supported: > XDP_TX, XDP_REDIRECT I don't think we want to merge support without at least XDP_TX these days.. And without the ability to prepend headers this may be the least complete initial XDP implementation we've seen :( > Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> > diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c > index d22a36f..688487b 100644 > --- a/drivers/net/hyperv/netvsc.c > +++ b/drivers/net/hyperv/netvsc.c > @@ -122,8 +122,10 @@ static void free_netvsc_device(struct rcu_head *head) > vfree(nvdev->send_buf); > kfree(nvdev->send_section_map); > > - for (i = 0; i < VRSS_CHANNEL_MAX; i++) > + for (i = 0; i < VRSS_CHANNEL_MAX; i++) { > + xdp_rxq_info_unreg(&nvdev->chan_table[i].xdp_rxq); > vfree(nvdev->chan_table[i].mrc.slots); > + } > > kfree(nvdev); > } > @@ -1370,6 +1372,10 @@ struct netvsc_device *netvsc_device_add(struct hv_device *device, > nvchan->net_device = net_device; > u64_stats_init(&nvchan->tx_stats.syncp); > u64_stats_init(&nvchan->rx_stats.syncp); > + > + xdp_rxq_info_reg(&nvchan->xdp_rxq, ndev, i); > + xdp_rxq_info_reg_mem_model(&nvchan->xdp_rxq, > + MEM_TYPE_PAGE_SHARED, NULL); These can fail. > } > > /* Enable NAPI handler before init callbacks */ > diff --git a/drivers/net/hyperv/netvsc_bpf.c b/drivers/net/hyperv/netvsc_bpf.c > new file mode 100644 > index 0000000..4d235ac > --- /dev/null > +++ b/drivers/net/hyperv/netvsc_bpf.c > @@ -0,0 +1,211 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* Copyright (c) 2019, Microsoft Corporation. > + * > + * Author: > + * Haiyang Zhang <haiyangz@microsoft.com> > + */ > + > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > + > +#include <linux/netdevice.h> > +#include <linux/etherdevice.h> > +#include <linux/ethtool.h> > +#include <linux/bpf.h> > +#include <linux/bpf_trace.h> > +#include <linux/kernel.h> > +#include <net/xdp.h> > + > +#include <linux/mutex.h> > +#include <linux/rtnetlink.h> > + > +#include "hyperv_net.h" > + > +u32 netvsc_run_xdp(struct net_device *ndev, struct netvsc_channel *nvchan, > + void **p_pbuf) > +{ > + struct page *page = NULL; > + void *data = nvchan->rsc.data[0]; > + u32 len = nvchan->rsc.len[0]; > + void *pbuf = data; > + struct bpf_prog *prog; > + struct xdp_buff xdp; > + u32 act = XDP_PASS; > + > + *p_pbuf = NULL; > + > + rcu_read_lock(); > + prog = rcu_dereference(nvchan->bpf_prog); > + > + if (!prog || nvchan->rsc.cnt > 1) Can rsc.cnt == 1 not be ensured at setup time? This looks quite limiting if random frames could be forced to bypass the filter. > + goto out; > + > + /* copy to a new page buffer if data are not within a page */ > + if (virt_to_page(data) != virt_to_page(data + len - 1)) { > + page = alloc_page(GFP_ATOMIC); > + if (!page) > + goto out; Returning XDP_PASS on allocation failure seems highly questionable. > + pbuf = page_address(page); > + memcpy(pbuf, nvchan->rsc.data[0], len); > + > + *p_pbuf = pbuf; > + } > + > + xdp.data_hard_start = pbuf; > + xdp.data = xdp.data_hard_start; This patch also doesn't add any headroom for XDP to prepend data :( > + xdp_set_data_meta_invalid(&xdp); > + xdp.data_end = xdp.data + len; > + xdp.rxq = &nvchan->xdp_rxq; > + xdp.handle = 0; > + > + act = bpf_prog_run_xdp(prog, &xdp); > + > + switch (act) { > + case XDP_PASS: > + /* Pass to upper layers */ > + break; > + > + case XDP_ABORTED: > + trace_xdp_exception(ndev, prog, act); > + break; > + > + case XDP_DROP: > + break; > + > + default: > + bpf_warn_invalid_xdp_action(act); > + } > + > +out: > + rcu_read_unlock(); > + > + if (page && act != XDP_PASS) { > + *p_pbuf = NULL; > + __free_page(page); > + } > + > + return act; > +} > + > +unsigned int netvsc_xdp_fraglen(unsigned int len) > +{ > + return SKB_DATA_ALIGN(len) + > + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > +} > + > +struct bpf_prog *netvsc_xdp_get(struct netvsc_device *nvdev) > +{ > + return rtnl_dereference(nvdev->chan_table[0].bpf_prog); > +} > + > +int netvsc_xdp_set(struct net_device *dev, struct bpf_prog *prog, > + struct netvsc_device *nvdev) > +{ > + struct bpf_prog *old_prog; > + int frag_max, i; > + > + old_prog = netvsc_xdp_get(nvdev); > + > + if (!old_prog && !prog) > + return 0; I think this case is now handled by the core. > + frag_max = netvsc_xdp_fraglen(dev->mtu + ETH_HLEN); > + if (prog && frag_max > PAGE_SIZE) { > + netdev_err(dev, "XDP: mtu:%u too large, frag:%u\n", > + dev->mtu, frag_max); > + return -EOPNOTSUPP; > + } > + > + if (prog && (dev->features & NETIF_F_LRO)) { > + netdev_err(dev, "XDP: not support LRO\n"); Please report this via extack, that way users will see it in the console in which they're installing the program. > + return -EOPNOTSUPP; > + } > + > + if (prog) { > + prog = bpf_prog_add(prog, nvdev->num_chn); > + if (IS_ERR(prog)) > + return PTR_ERR(prog); > + } > + > + for (i = 0; i < nvdev->num_chn; i++) > + rcu_assign_pointer(nvdev->chan_table[i].bpf_prog, prog); > + > + if (old_prog) > + for (i = 0; i < nvdev->num_chn; i++) > + bpf_prog_put(old_prog); > + > + return 0; > +} > + > +int netvsc_vf_setxdp(struct net_device *vf_netdev, struct bpf_prog *prog) > +{ > + struct netdev_bpf xdp; > + bpf_op_t ndo_bpf; > + > + ASSERT_RTNL(); > + > + if (!vf_netdev) > + return 0; > + > + ndo_bpf = vf_netdev->netdev_ops->ndo_bpf; > + if (!ndo_bpf) > + return 0; > + > + memset(&xdp, 0, sizeof(xdp)); > + > + xdp.command = XDP_SETUP_PROG; > + xdp.prog = prog; > + > + return ndo_bpf(vf_netdev, &xdp); IMHO the automatic propagation is not a good idea. Especially if the propagation doesn't make the entire installation fail if VF doesn't have ndo_bpf. > +} ^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: [PATCH net-next, 3/4] hv_netvsc: Add XDP support 2019-10-28 21:33 ` Jakub Kicinski @ 2019-10-29 19:17 ` Haiyang Zhang 2019-10-29 19:53 ` Jakub Kicinski 2019-10-29 21:59 ` Stephen Hemminger 0 siblings, 2 replies; 13+ messages in thread From: Haiyang Zhang @ 2019-10-29 19:17 UTC (permalink / raw) To: Jakub Kicinski Cc: sashal, linux-hyperv, netdev, KY Srinivasan, Stephen Hemminger, olaf, vkuznets, davem, linux-kernel > -----Original Message----- > From: Jakub Kicinski <jakub.kicinski@netronome.com> > Sent: Monday, October 28, 2019 5:33 PM > To: Haiyang Zhang <haiyangz@microsoft.com> > Cc: sashal@kernel.org; linux-hyperv@vger.kernel.org; > netdev@vger.kernel.org; KY Srinivasan <kys@microsoft.com>; Stephen > Hemminger <sthemmin@microsoft.com>; olaf@aepfle.de; vkuznets > <vkuznets@redhat.com>; davem@davemloft.net; linux- > kernel@vger.kernel.org > Subject: Re: [PATCH net-next, 3/4] hv_netvsc: Add XDP support > > On Mon, 28 Oct 2019 21:07:04 +0000, Haiyang Zhang wrote: > > This patch adds support of XDP in native mode for hv_netvsc driver, and > > transparently sets the XDP program on the associated VF NIC as well. > > > > XDP program cannot run with LRO (RSC) enabled, so you need to disable > LRO > > before running XDP: > > ethtool -K eth0 lro off > > > > XDP actions not yet supported: > > XDP_TX, XDP_REDIRECT > > I don't think we want to merge support without at least XDP_TX these > days.. Thanks for your detailed comments -- I'm working on the XDP_TX... > > And without the ability to prepend headers this may be the least > complete initial XDP implementation we've seen :( The RNDIS packet buffer received by netvsc doesn't have a head room, but I'm considering copy the packets to the page buffer, with a head room space reserved for XDP. > > > Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> > > > diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c > > index d22a36f..688487b 100644 > > --- a/drivers/net/hyperv/netvsc.c > > +++ b/drivers/net/hyperv/netvsc.c > > @@ -122,8 +122,10 @@ static void free_netvsc_device(struct rcu_head > *head) > > vfree(nvdev->send_buf); > > kfree(nvdev->send_section_map); > > > > - for (i = 0; i < VRSS_CHANNEL_MAX; i++) > > + for (i = 0; i < VRSS_CHANNEL_MAX; i++) { > > + xdp_rxq_info_unreg(&nvdev->chan_table[i].xdp_rxq); > > vfree(nvdev->chan_table[i].mrc.slots); > > + } > > > > kfree(nvdev); > > } > > @@ -1370,6 +1372,10 @@ struct netvsc_device *netvsc_device_add(struct > hv_device *device, > > nvchan->net_device = net_device; > > u64_stats_init(&nvchan->tx_stats.syncp); > > u64_stats_init(&nvchan->rx_stats.syncp); > > + > > + xdp_rxq_info_reg(&nvchan->xdp_rxq, ndev, i); > > + xdp_rxq_info_reg_mem_model(&nvchan->xdp_rxq, > > + MEM_TYPE_PAGE_SHARED, NULL); > > These can fail. I will add error handling. > > > } > > > > /* Enable NAPI handler before init callbacks */ > > diff --git a/drivers/net/hyperv/netvsc_bpf.c > b/drivers/net/hyperv/netvsc_bpf.c > > new file mode 100644 > > index 0000000..4d235ac > > --- /dev/null > > +++ b/drivers/net/hyperv/netvsc_bpf.c > > @@ -0,0 +1,211 @@ > > +// SPDX-License-Identifier: GPL-2.0-only > > +/* Copyright (c) 2019, Microsoft Corporation. > > + * > > + * Author: > > + * Haiyang Zhang <haiyangz@microsoft.com> > > + */ > > + > > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > > + > > +#include <linux/netdevice.h> > > +#include <linux/etherdevice.h> > > +#include <linux/ethtool.h> > > +#include <linux/bpf.h> > > +#include <linux/bpf_trace.h> > > +#include <linux/kernel.h> > > +#include <net/xdp.h> > > + > > +#include <linux/mutex.h> > > +#include <linux/rtnetlink.h> > > + > > +#include "hyperv_net.h" > > + > > +u32 netvsc_run_xdp(struct net_device *ndev, struct netvsc_channel > *nvchan, > > + void **p_pbuf) > > +{ > > + struct page *page = NULL; > > + void *data = nvchan->rsc.data[0]; > > + u32 len = nvchan->rsc.len[0]; > > + void *pbuf = data; > > + struct bpf_prog *prog; > > + struct xdp_buff xdp; > > + u32 act = XDP_PASS; > > + > > + *p_pbuf = NULL; > > + > > + rcu_read_lock(); > > + prog = rcu_dereference(nvchan->bpf_prog); > > + > > + if (!prog || nvchan->rsc.cnt > 1) > > Can rsc.cnt == 1 not be ensured at setup time? This looks quite > limiting if random frames could be forced to bypass the filter. Yes, the setup code already check/ensure LRO is disabled. So rsc.cnt > 1 is NOT expected here. Just an error check. I will change the return value to XDP_ABORTED for this. > > > + goto out; > > + > > + /* copy to a new page buffer if data are not within a page */ > > + if (virt_to_page(data) != virt_to_page(data + len - 1)) { > > + page = alloc_page(GFP_ATOMIC); > > + if (!page) > > + goto out; > > Returning XDP_PASS on allocation failure seems highly questionable. I will change the return value to XDP_ABORTED for this too. > > > + pbuf = page_address(page); > > + memcpy(pbuf, nvchan->rsc.data[0], len); > > + > > + *p_pbuf = pbuf; > > + } > > + > > + xdp.data_hard_start = pbuf; > > + xdp.data = xdp.data_hard_start; > > This patch also doesn't add any headroom for XDP to prepend data :( I'm considering to add the headroom to the start of the page. > > > + xdp_set_data_meta_invalid(&xdp); > > + xdp.data_end = xdp.data + len; > > + xdp.rxq = &nvchan->xdp_rxq; > > + xdp.handle = 0; > > + > > + act = bpf_prog_run_xdp(prog, &xdp); > > + > > + switch (act) { > > + case XDP_PASS: > > + /* Pass to upper layers */ > > + break; > > + > > + case XDP_ABORTED: > > + trace_xdp_exception(ndev, prog, act); > > + break; > > + > > + case XDP_DROP: > > + break; > > + > > + default: > > + bpf_warn_invalid_xdp_action(act); > > + } > > + > > +out: > > + rcu_read_unlock(); > > + > > + if (page && act != XDP_PASS) { > > + *p_pbuf = NULL; > > + __free_page(page); > > + } > > + > > + return act; > > +} > > + > > +unsigned int netvsc_xdp_fraglen(unsigned int len) > > +{ > > + return SKB_DATA_ALIGN(len) + > > + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > +} > > + > > +struct bpf_prog *netvsc_xdp_get(struct netvsc_device *nvdev) > > +{ > > + return rtnl_dereference(nvdev->chan_table[0].bpf_prog); > > +} > > + > > +int netvsc_xdp_set(struct net_device *dev, struct bpf_prog *prog, > > + struct netvsc_device *nvdev) > > +{ > > + struct bpf_prog *old_prog; > > + int frag_max, i; > > + > > + old_prog = netvsc_xdp_get(nvdev); > > + > > + if (!old_prog && !prog) > > + return 0; > > I think this case is now handled by the core. Thanks for the reminder. I saw the code in dev_change_xdp_fd(), so the upper layer doesn't call XDP_SETUP_PROG with old/new prog both NULL. But this function is also called by other functions in our driver, like netvsc_detach(), netvsc_remove(), etc. Instead of checking for NULL in each place, I still keep the check inside netvsc_xdp_set(). > > > + frag_max = netvsc_xdp_fraglen(dev->mtu + ETH_HLEN); > > + if (prog && frag_max > PAGE_SIZE) { > > + netdev_err(dev, "XDP: mtu:%u too large, frag:%u\n", > > + dev->mtu, frag_max); > > + return -EOPNOTSUPP; > > + } > > + > > + if (prog && (dev->features & NETIF_F_LRO)) { > > + netdev_err(dev, "XDP: not support LRO\n"); > > Please report this via extack, that way users will see it in the console > in which they're installing the program. I will. > > > + return -EOPNOTSUPP; > > + } > > + > > + if (prog) { > > + prog = bpf_prog_add(prog, nvdev->num_chn); > > + if (IS_ERR(prog)) > > + return PTR_ERR(prog); > > + } > > + > > + for (i = 0; i < nvdev->num_chn; i++) > > + rcu_assign_pointer(nvdev->chan_table[i].bpf_prog, prog); > > + > > + if (old_prog) > > + for (i = 0; i < nvdev->num_chn; i++) > > + bpf_prog_put(old_prog); > > + > > + return 0; > > +} > > + > > +int netvsc_vf_setxdp(struct net_device *vf_netdev, struct bpf_prog > *prog) > > +{ > > + struct netdev_bpf xdp; > > + bpf_op_t ndo_bpf; > > + > > + ASSERT_RTNL(); > > + > > + if (!vf_netdev) > > + return 0; > > + > > + ndo_bpf = vf_netdev->netdev_ops->ndo_bpf; > > + if (!ndo_bpf) > > + return 0; > > + > > + memset(&xdp, 0, sizeof(xdp)); > > + > > + xdp.command = XDP_SETUP_PROG; > > + xdp.prog = prog; > > + > > + return ndo_bpf(vf_netdev, &xdp); > > IMHO the automatic propagation is not a good idea. Especially if the > propagation doesn't make the entire installation fail if VF doesn't > have ndo_bpf. On Hyperv and Azure hosts, VF is always acting as a slave below netvsc. And they are both active -- most data packets go to VF, but broadcast, multicast, and TCP SYN packets go to netvsc synthetic data path. The synthetic NIC (netvsc) is also a failover NIC when VF is not available. We ask customers to only use the synthetic NIC directly. So propagation of XDP setting to VF NIC is desired. But, I will change the return code to error, so the entire installation fails if VF is present but unable to set XDP prog. Thanks, - Haiyang ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next, 3/4] hv_netvsc: Add XDP support 2019-10-29 19:17 ` Haiyang Zhang @ 2019-10-29 19:53 ` Jakub Kicinski 2019-10-29 20:01 ` Haiyang Zhang 2019-10-29 21:59 ` Stephen Hemminger 1 sibling, 1 reply; 13+ messages in thread From: Jakub Kicinski @ 2019-10-29 19:53 UTC (permalink / raw) To: Haiyang Zhang Cc: sashal, linux-hyperv, netdev, KY Srinivasan, Stephen Hemminger, olaf, vkuznets, davem, linux-kernel On Tue, 29 Oct 2019 19:17:25 +0000, Haiyang Zhang wrote: > > > +int netvsc_xdp_set(struct net_device *dev, struct bpf_prog *prog, > > > + struct netvsc_device *nvdev) > > > +{ > > > + struct bpf_prog *old_prog; > > > + int frag_max, i; > > > + > > > + old_prog = netvsc_xdp_get(nvdev); > > > + > > > + if (!old_prog && !prog) > > > + return 0; > > > > I think this case is now handled by the core. > Thanks for the reminder. I saw the code in dev_change_xdp_fd(), so the upper layer > doesn't call XDP_SETUP_PROG with old/new prog both NULL. > But this function is also called by other functions in our driver, like netvsc_detach(), > netvsc_remove(), etc. Instead of checking for NULL in each place, I still keep the check inside > netvsc_xdp_set(). I see. Makes sense on a closer look. BTW would you do me a favour and reformat this line: static struct netvsc_device_info *netvsc_devinfo_get (struct netvsc_device *nvdev) to look like this: static struct netvsc_device_info *netvsc_devinfo_get(struct netvsc_device *nvdev) or static struct netvsc_device_info * netvsc_devinfo_get(struct netvsc_device *nvdev) Otherwise git diff gets confused about which function given chunk belongs to. (Incorrectly thinking your patch is touching netvsc_get_channels()). I spent few minutes trying to figure out what's going on there :) > > > > > + return -EOPNOTSUPP; > > > + } > > > + > > > + if (prog) { > > > + prog = bpf_prog_add(prog, nvdev->num_chn); > > > + if (IS_ERR(prog)) > > > + return PTR_ERR(prog); > > > + } > > > + > > > + for (i = 0; i < nvdev->num_chn; i++) > > > + rcu_assign_pointer(nvdev->chan_table[i].bpf_prog, prog); > > > + > > > + if (old_prog) > > > + for (i = 0; i < nvdev->num_chn; i++) > > > + bpf_prog_put(old_prog); > > > + > > > + return 0; > > > +} > > > + > > > +int netvsc_vf_setxdp(struct net_device *vf_netdev, struct bpf_prog *prog) > > > +{ > > > + struct netdev_bpf xdp; > > > + bpf_op_t ndo_bpf; > > > + > > > + ASSERT_RTNL(); > > > + > > > + if (!vf_netdev) > > > + return 0; > > > + > > > + ndo_bpf = vf_netdev->netdev_ops->ndo_bpf; > > > + if (!ndo_bpf) > > > + return 0; > > > + > > > + memset(&xdp, 0, sizeof(xdp)); > > > + > > > + xdp.command = XDP_SETUP_PROG; > > > + xdp.prog = prog; > > > + > > > + return ndo_bpf(vf_netdev, &xdp); > > > > IMHO the automatic propagation is not a good idea. Especially if the > > propagation doesn't make the entire installation fail if VF doesn't > > have ndo_bpf. > > On Hyperv and Azure hosts, VF is always acting as a slave below netvsc. > And they are both active -- most data packets go to VF, but broadcast, > multicast, and TCP SYN packets go to netvsc synthetic data path. The synthetic > NIC (netvsc) is also a failover NIC when VF is not available. > We ask customers to only use the synthetic NIC directly. So propagation > of XDP setting to VF NIC is desired. > But, I will change the return code to error, so the entire installation fails if VF is > present but unable to set XDP prog. Okay, if I read the rest of the code correctly you also fail attach if xdp propagation failed? If that's the case and we return an error here on missing NDO, then the propagation could be okay. So the semantics are these: (a) install on virt - potentially overwrites the existing VF prog; (b) install on VF is not noticed by virt; (c) uninstall on virt - clears both virt and VF, regardless what program was installed on virt; (d) uninstall on VF does not propagate; Since you're adding documentation it would perhaps be worth stating there that touching the program on the VF is not supported/may lead to breakage, and users should only touch/configure the program on the virt. ^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: [PATCH net-next, 3/4] hv_netvsc: Add XDP support 2019-10-29 19:53 ` Jakub Kicinski @ 2019-10-29 20:01 ` Haiyang Zhang 0 siblings, 0 replies; 13+ messages in thread From: Haiyang Zhang @ 2019-10-29 20:01 UTC (permalink / raw) To: Jakub Kicinski Cc: sashal, linux-hyperv, netdev, KY Srinivasan, Stephen Hemminger, olaf, vkuznets, davem, linux-kernel > -----Original Message----- > From: Jakub Kicinski <jakub.kicinski@netronome.com> > Sent: Tuesday, October 29, 2019 3:53 PM > To: Haiyang Zhang <haiyangz@microsoft.com> > Cc: sashal@kernel.org; linux-hyperv@vger.kernel.org; > netdev@vger.kernel.org; KY Srinivasan <kys@microsoft.com>; Stephen > Hemminger <sthemmin@microsoft.com>; olaf@aepfle.de; vkuznets > <vkuznets@redhat.com>; davem@davemloft.net; linux- > kernel@vger.kernel.org > Subject: Re: [PATCH net-next, 3/4] hv_netvsc: Add XDP support > > On Tue, 29 Oct 2019 19:17:25 +0000, Haiyang Zhang wrote: > > > > +int netvsc_xdp_set(struct net_device *dev, struct bpf_prog *prog, > > > > + struct netvsc_device *nvdev) > > > > +{ > > > > + struct bpf_prog *old_prog; > > > > + int frag_max, i; > > > > + > > > > + old_prog = netvsc_xdp_get(nvdev); > > > > + > > > > + if (!old_prog && !prog) > > > > + return 0; > > > > > > I think this case is now handled by the core. > > Thanks for the reminder. I saw the code in dev_change_xdp_fd(), so the > upper layer > > doesn't call XDP_SETUP_PROG with old/new prog both NULL. > > But this function is also called by other functions in our driver, like > netvsc_detach(), > > netvsc_remove(), etc. Instead of checking for NULL in each place, I still > keep the check inside > > netvsc_xdp_set(). > > I see. Makes sense on a closer look. > > BTW would you do me a favour and reformat this line: > > static struct netvsc_device_info *netvsc_devinfo_get > (struct netvsc_device *nvdev) > > to look like this: > > static > struct netvsc_device_info *netvsc_devinfo_get(struct netvsc_device > *nvdev) > > or > > static struct netvsc_device_info * > netvsc_devinfo_get(struct netvsc_device *nvdev) > > Otherwise git diff gets confused about which function given chunk > belongs to. (Incorrectly thinking your patch is touching > netvsc_get_channels()). I spent few minutes trying to figure out what's > going on there :) I will. > > > > > > > > + return -EOPNOTSUPP; > > > > + } > > > > + > > > > + if (prog) { > > > > + prog = bpf_prog_add(prog, nvdev->num_chn); > > > > + if (IS_ERR(prog)) > > > > + return PTR_ERR(prog); > > > > + } > > > > + > > > > + for (i = 0; i < nvdev->num_chn; i++) > > > > + rcu_assign_pointer(nvdev->chan_table[i].bpf_prog, prog); > > > > + > > > > + if (old_prog) > > > > + for (i = 0; i < nvdev->num_chn; i++) > > > > + bpf_prog_put(old_prog); > > > > + > > > > + return 0; > > > > +} > > > > + > > > > +int netvsc_vf_setxdp(struct net_device *vf_netdev, struct bpf_prog > *prog) > > > > +{ > > > > + struct netdev_bpf xdp; > > > > + bpf_op_t ndo_bpf; > > > > + > > > > + ASSERT_RTNL(); > > > > + > > > > + if (!vf_netdev) > > > > + return 0; > > > > + > > > > + ndo_bpf = vf_netdev->netdev_ops->ndo_bpf; > > > > + if (!ndo_bpf) > > > > + return 0; > > > > + > > > > + memset(&xdp, 0, sizeof(xdp)); > > > > + > > > > + xdp.command = XDP_SETUP_PROG; > > > > + xdp.prog = prog; > > > > + > > > > + return ndo_bpf(vf_netdev, &xdp); > > > > > > IMHO the automatic propagation is not a good idea. Especially if the > > > propagation doesn't make the entire installation fail if VF doesn't > > > have ndo_bpf. > > > > On Hyperv and Azure hosts, VF is always acting as a slave below netvsc. > > And they are both active -- most data packets go to VF, but broadcast, > > multicast, and TCP SYN packets go to netvsc synthetic data path. The > synthetic > > NIC (netvsc) is also a failover NIC when VF is not available. > > We ask customers to only use the synthetic NIC directly. So propagation > > of XDP setting to VF NIC is desired. > > But, I will change the return code to error, so the entire installation fails if > VF is > > present but unable to set XDP prog. > > Okay, if I read the rest of the code correctly you also fail attach > if xdp propagation failed? If that's the case and we return an error > here on missing NDO, then the propagation could be okay. > > So the semantics are these: > > (a) install on virt - potentially overwrites the existing VF prog; > (b) install on VF is not noticed by virt; > (c) uninstall on virt - clears both virt and VF, regardless what > program was installed on virt; > (d) uninstall on VF does not propagate; > > Since you're adding documentation it would perhaps be worth stating > there that touching the program on the VF is not supported/may lead > to breakage, and users should only touch/configure the program on the > virt. Sure I will document the recommended way of install xdp prog. Thanks, - Haiyang ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next, 3/4] hv_netvsc: Add XDP support 2019-10-29 19:17 ` Haiyang Zhang 2019-10-29 19:53 ` Jakub Kicinski @ 2019-10-29 21:59 ` Stephen Hemminger 2019-10-29 22:08 ` Haiyang Zhang 1 sibling, 1 reply; 13+ messages in thread From: Stephen Hemminger @ 2019-10-29 21:59 UTC (permalink / raw) To: Haiyang Zhang Cc: Jakub Kicinski, sashal, linux-hyperv, netdev, KY Srinivasan, Stephen Hemminger, olaf, vkuznets, davem, linux-kernel On Tue, 29 Oct 2019 19:17:25 +0000 Haiyang Zhang <haiyangz@microsoft.com> wrote: > > -----Original Message----- > > From: Jakub Kicinski <jakub.kicinski@netronome.com> > > Sent: Monday, October 28, 2019 5:33 PM > > To: Haiyang Zhang <haiyangz@microsoft.com> > > Cc: sashal@kernel.org; linux-hyperv@vger.kernel.org; > > netdev@vger.kernel.org; KY Srinivasan <kys@microsoft.com>; Stephen > > Hemminger <sthemmin@microsoft.com>; olaf@aepfle.de; vkuznets > > <vkuznets@redhat.com>; davem@davemloft.net; linux- > > kernel@vger.kernel.org > > Subject: Re: [PATCH net-next, 3/4] hv_netvsc: Add XDP support > > > > On Mon, 28 Oct 2019 21:07:04 +0000, Haiyang Zhang wrote: > > > This patch adds support of XDP in native mode for hv_netvsc driver, and > > > transparently sets the XDP program on the associated VF NIC as well. > > > > > > XDP program cannot run with LRO (RSC) enabled, so you need to disable > > LRO > > > before running XDP: > > > ethtool -K eth0 lro off > > > > > > XDP actions not yet supported: > > > XDP_TX, XDP_REDIRECT > > > > I don't think we want to merge support without at least XDP_TX these > > days.. > Thanks for your detailed comments -- > I'm working on the XDP_TX... > > > > > And without the ability to prepend headers this may be the least > > complete initial XDP implementation we've seen :( > The RNDIS packet buffer received by netvsc doesn't have a head room, but I'm > considering copy the packets to the page buffer, with a head room space > reserved for XDP. There is a small amount of headroom available by reusing the RNDIS header and packet space. Looks like 40 bytes or so. ^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: [PATCH net-next, 3/4] hv_netvsc: Add XDP support 2019-10-29 21:59 ` Stephen Hemminger @ 2019-10-29 22:08 ` Haiyang Zhang 0 siblings, 0 replies; 13+ messages in thread From: Haiyang Zhang @ 2019-10-29 22:08 UTC (permalink / raw) To: Stephen Hemminger Cc: Jakub Kicinski, sashal, linux-hyperv, netdev, KY Srinivasan, Stephen Hemminger, olaf, vkuznets, davem, linux-kernel > -----Original Message----- > From: Stephen Hemminger <stephen@networkplumber.org> > Sent: Tuesday, October 29, 2019 5:59 PM > To: Haiyang Zhang <haiyangz@microsoft.com> > Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; sashal@kernel.org; > linux-hyperv@vger.kernel.org; netdev@vger.kernel.org; KY Srinivasan > <kys@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>; > olaf@aepfle.de; vkuznets <vkuznets@redhat.com>; davem@davemloft.net; > linux-kernel@vger.kernel.org > Subject: Re: [PATCH net-next, 3/4] hv_netvsc: Add XDP support > > On Tue, 29 Oct 2019 19:17:25 +0000 > Haiyang Zhang <haiyangz@microsoft.com> wrote: > > > > -----Original Message----- > > > From: Jakub Kicinski <jakub.kicinski@netronome.com> > > > Sent: Monday, October 28, 2019 5:33 PM > > > To: Haiyang Zhang <haiyangz@microsoft.com> > > > Cc: sashal@kernel.org; linux-hyperv@vger.kernel.org; > > > netdev@vger.kernel.org; KY Srinivasan <kys@microsoft.com>; Stephen > > > Hemminger <sthemmin@microsoft.com>; olaf@aepfle.de; vkuznets > > > <vkuznets@redhat.com>; davem@davemloft.net; linux- > > > kernel@vger.kernel.org > > > Subject: Re: [PATCH net-next, 3/4] hv_netvsc: Add XDP support > > > > > > On Mon, 28 Oct 2019 21:07:04 +0000, Haiyang Zhang wrote: > > > > This patch adds support of XDP in native mode for hv_netvsc driver, > and > > > > transparently sets the XDP program on the associated VF NIC as well. > > > > > > > > XDP program cannot run with LRO (RSC) enabled, so you need to > disable > > > LRO > > > > before running XDP: > > > > ethtool -K eth0 lro off > > > > > > > > XDP actions not yet supported: > > > > XDP_TX, XDP_REDIRECT > > > > > > I don't think we want to merge support without at least XDP_TX these > > > days.. > > Thanks for your detailed comments -- > > I'm working on the XDP_TX... > > > > > > > > And without the ability to prepend headers this may be the least > > > complete initial XDP implementation we've seen :( > > The RNDIS packet buffer received by netvsc doesn't have a head room, but > I'm > > considering copy the packets to the page buffer, with a head room space > > reserved for XDP. > > > There is a small amount of headroom available by reusing the RNDIS > header and packet space. Looks like 40 bytes or so. Yes, I thought about the RNDIS header. But is this space sufficient? I saw some drivers, like virtio_net gives bigger headroom: 256 bytes. Thanks, - Haiyang ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH net-next, 4/4] hv_netvsc: Update document for XDP support 2019-10-28 21:06 [PATCH net-next, 0/4] hv_netvsc: Add XDP support and some error handling fixes Haiyang Zhang ` (2 preceding siblings ...) 2019-10-28 21:07 ` [PATCH net-next, 3/4] hv_netvsc: Add XDP support Haiyang Zhang @ 2019-10-28 21:07 ` Haiyang Zhang 3 siblings, 0 replies; 13+ messages in thread From: Haiyang Zhang @ 2019-10-28 21:07 UTC (permalink / raw) To: sashal, linux-hyperv, netdev Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger, olaf, vkuznets, davem, linux-kernel Added the new section in the document regarding XDP support by hv_netvsc driver. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> --- .../networking/device_drivers/microsoft/netvsc.txt | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/Documentation/networking/device_drivers/microsoft/netvsc.txt b/Documentation/networking/device_drivers/microsoft/netvsc.txt index 3bfa635..69ccfca 100644 --- a/Documentation/networking/device_drivers/microsoft/netvsc.txt +++ b/Documentation/networking/device_drivers/microsoft/netvsc.txt @@ -82,3 +82,17 @@ Features contain one or more packets. The send buffer is an optimization, the driver will use slower method to handle very large packets or if the send buffer area is exhausted. + + XDP support + ----------- + XDP (eXpress Data Path) is a feature that runs eBPF bytecode at the early + stage when packets arrive at a NIC card. The goal is to increase performance + for packet processing, reducing the overhead of SKB allocation and other + upper network layers. + + hv_netvsc supports XDP in native mode, and transparently sets the XDP + program on the associated VF NIC as well. + + XDP program cannot run with LRO (RSC) enabled, so you need to disable LRO + before running XDP: + ethtool -K eth0 lro off -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
end of thread, other threads:[~2019-11-04 15:09 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-10-28 21:06 [PATCH net-next, 0/4] hv_netvsc: Add XDP support and some error handling fixes Haiyang Zhang 2019-10-28 21:07 ` [PATCH net-next, 1/4] hv_netvsc: Fix error handling in netvsc_set_features() Haiyang Zhang 2019-10-28 21:07 ` [PATCH net-next, 2/4] hv_netvsc: Fix error handling in netvsc_attach() Haiyang Zhang 2019-11-01 20:42 ` Markus Elfring 2019-11-04 15:08 ` Haiyang Zhang 2019-10-28 21:07 ` [PATCH net-next, 3/4] hv_netvsc: Add XDP support Haiyang Zhang 2019-10-28 21:33 ` Jakub Kicinski 2019-10-29 19:17 ` Haiyang Zhang 2019-10-29 19:53 ` Jakub Kicinski 2019-10-29 20:01 ` Haiyang Zhang 2019-10-29 21:59 ` Stephen Hemminger 2019-10-29 22:08 ` Haiyang Zhang 2019-10-28 21:07 ` [PATCH net-next, 4/4] hv_netvsc: Update document for " Haiyang Zhang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).