* [PATCH bpf-next 0/3] veth: Bulk XDP_TX
From: Toshiaki Makita @ 2019-05-23 10:56 UTC (3 replies; 23+ messages in thread)
To: Alexei Starovoitov, Daniel Borkmann, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend
Cc: Toshiaki Makita, netdev, xdp-newbies, bpf

This adds an infrastructure for bulk XDP_TX and makes veth use it.
Improves XDP_TX performance by approximately 8%. The detailed
performance numbers are shown in patch 3.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>

Toshiaki Makita (3):
  xdp: Add bulk XDP_TX queue
  xdp: Add tracepoint for bulk XDP_TX
  veth: Support bulk XDP_TX

 drivers/net/veth.c         | 26 +++++++++++++++++++++++++-
 include/net/xdp.h          |  7 +++++++
 include/trace/events/xdp.h | 25 +++++++++++++++++++++++++
 kernel/bpf/core.c          |  1 +
 net/core/xdp.c             |  3 +++
 5 files changed, 61 insertions(+), 1 deletion(-)

-- 
1.8.3.1
* [PATCH bpf-next 1/3] xdp: Add bulk XDP_TX queue
From: Toshiaki Makita @ 2019-05-23 10:56 UTC
To: Alexei Starovoitov, Daniel Borkmann, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend
Cc: Toshiaki Makita, netdev, xdp-newbies, bpf

XDP_TX is similar to XDP_REDIRECT as it essentially redirects packets to
the device itself. XDP_REDIRECT has bulk transmit mechanism to avoid the
heavy cost of indirect call but it also reduces lock acquisition on the
destination device that needs locks like veth and tun.

XDP_TX does not use indirect calls but drivers which require locks can
benefit from the bulk transmit for XDP_TX as well.

This patch adds per-cpu queues which can be used for bulk transmit on
XDP_TX. I did not add functions like enqueue/flush but exposed the queue
directly because we should avoid indirect calls on XDP_TX.

Note that the queue must be flushed, i.e. "count" member needs to be set
to 0, when a NAPI handler which used this queue exits. Otherwise packets
left in the queue will be transmitted from totally unintentional devices.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
---
 include/net/xdp.h | 7 +++++++
 net/core/xdp.c    | 3 +++
 2 files changed, 10 insertions(+)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 0f25b36..30b36c8 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -84,6 +84,13 @@ struct xdp_frame {
 	struct net_device *dev_rx; /* used by cpumap */
 };
 
+#define XDP_TX_BULK_SIZE	16
+struct xdp_tx_bulk_queue {
+	struct xdp_frame *q[XDP_TX_BULK_SIZE];
+	unsigned int count;
+};
+DECLARE_PER_CPU(struct xdp_tx_bulk_queue, xdp_tx_bq);
+
 /* Clear kernel pointers in xdp_frame */
 static inline void xdp_scrub_frame(struct xdp_frame *frame)
 {
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 4b2b194..0622f2d 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -40,6 +40,9 @@ struct xdp_mem_allocator {
 	struct rcu_head rcu;
 };
 
+DEFINE_PER_CPU(struct xdp_tx_bulk_queue, xdp_tx_bq);
+EXPORT_PER_CPU_SYMBOL_GPL(xdp_tx_bq);
+
 static u32 xdp_mem_id_hashfn(const void *data, u32 len, u32 seed)
 {
 	const u32 *k = data;
-- 
1.8.3.1
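The flush requirement in the note above can be illustrated with a minimal user-space sketch. It is not kernel code: a plain static variable stands in for the per-CPU xdp_tx_bq, and fake_dev, fake_tx, fake_flush, leaked_frames and the from_dev field are hypothetical names made up for the illustration. Unlike the flush in patch 3, fake_flush also skips an empty queue. The sketch polls two devices one after the other on the same "CPU" and counts how many frames queued by the first device end up being transmitted by the second.

```c
#include <assert.h>

#define TX_BULK_SIZE 16	/* mirrors XDP_TX_BULK_SIZE from the patch */

/* Hypothetical stand-in for struct xdp_frame: remembers who queued it. */
struct frame {
	int from_dev;
};

/* Same shape as struct xdp_tx_bulk_queue in the patch. */
struct tx_bulk_queue {
	struct frame *q[TX_BULK_SIZE];
	unsigned int count;
};

/* Stand-in for the per-CPU queue: shared by every device polled here. */
static struct tx_bulk_queue cpu_bq;

/* Hypothetical device that records what it was asked to transmit. */
struct fake_dev {
	int id;
	unsigned int xmit_calls;	/* one "lock round trip" per call */
	unsigned int foreign_frames;	/* frames queued by another device */
};

/* Transmit everything in the shared queue through dev, then reset it. */
static void fake_flush(struct fake_dev *dev)
{
	unsigned int i;

	if (cpu_bq.count == 0)
		return;
	dev->xmit_calls++;
	for (i = 0; i < cpu_bq.count; i++)
		if (cpu_bq.q[i]->from_dev != dev->id)
			dev->foreign_frames++;
	cpu_bq.count = 0;
}

/* Queue one frame, flushing first if the queue is already full. */
static void fake_tx(struct fake_dev *dev, struct frame *f)
{
	if (cpu_bq.count == TX_BULK_SIZE)
		fake_flush(dev);
	f->from_dev = dev->id;
	cpu_bq.q[cpu_bq.count++] = f;
}

/*
 * Poll device A, then device B, each queueing n frames. Returns the number
 * of A's frames that B transmitted. flush_on_exit selects whether A's
 * handler flushes before it exits.
 */
static unsigned int leaked_frames(unsigned int n, int flush_on_exit)
{
	static struct frame pool[2 * TX_BULK_SIZE];
	struct fake_dev a = { .id = 1 }, b = { .id = 2 };
	unsigned int i;

	assert(n <= TX_BULK_SIZE);
	cpu_bq.count = 0;

	for (i = 0; i < n; i++)
		fake_tx(&a, &pool[i]);
	if (flush_on_exit)
		fake_flush(&a);

	for (i = 0; i < n; i++)
		fake_tx(&b, &pool[TX_BULK_SIZE + i]);
	fake_flush(&b);

	return b.foreign_frames;
}
```

With flush_on_exit set, leaked_frames() reports no leaked frames. Without it, every frame the first handler left behind goes out through the second device, which is the failure mode the note warns about.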
* Re: [PATCH bpf-next 1/3] xdp: Add bulk XDP_TX queue 2019-05-23 10:56 ` [PATCH bpf-next 1/3] xdp: Add bulk XDP_TX queue Toshiaki Makita @ 2019-05-23 11:11 ` Toke Høiland-Jørgensen 0 siblings, 0 replies; 23+ messages in thread From: Toke Høiland-Jørgensen @ 2019-05-23 11:11 UTC (permalink / raw) To: Toshiaki Makita, Alexei Starovoitov, Daniel Borkmann, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend Cc: Toshiaki Makita, netdev, xdp-newbies, bpf Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> writes: > XDP_TX is similar to XDP_REDIRECT as it essentially redirects packets to > the device itself. XDP_REDIRECT has bulk transmit mechanism to avoid the > heavy cost of indirect call but it also reduces lock acquisition on the > destination device that needs locks like veth and tun. > > XDP_TX does not use indirect calls but drivers which require locks can > benefit from the bulk transmit for XDP_TX as well. XDP_TX happens on the same device, so there's an implicit bulking happening because of the NAPI cycle. So why is an additional mechanism needed (in the general case)? -Toke ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH bpf-next 1/3] xdp: Add bulk XDP_TX queue 2019-05-23 11:11 ` Toke Høiland-Jørgensen (?) @ 2019-05-23 11:24 ` Toshiaki Makita 2019-05-23 11:33 ` Toke Høiland-Jørgensen -1 siblings, 1 reply; 23+ messages in thread From: Toshiaki Makita @ 2019-05-23 11:24 UTC (permalink / raw) To: Toke Høiland-Jørgensen, Alexei Starovoitov, Daniel Borkmann, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend Cc: netdev, xdp-newbies, bpf On 2019/05/23 20:11, Toke Høiland-Jørgensen wrote: > Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> writes: > >> XDP_TX is similar to XDP_REDIRECT as it essentially redirects packets to >> the device itself. XDP_REDIRECT has bulk transmit mechanism to avoid the >> heavy cost of indirect call but it also reduces lock acquisition on the >> destination device that needs locks like veth and tun. >> >> XDP_TX does not use indirect calls but drivers which require locks can >> benefit from the bulk transmit for XDP_TX as well. > > XDP_TX happens on the same device, so there's an implicit bulking > happening because of the NAPI cycle. So why is an additional mechanism > needed (in the general case)? Not sure what the implicit bulking you mention is. XDP_TX calls .ndo_xdp_xmit() for each packet, and it acquires a lock in veth and tun. To avoid this, we need additional storage for bulking like devmap for XDP_REDIRECT. -- Toshiaki Makita ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH bpf-next 1/3] xdp: Add bulk XDP_TX queue 2019-05-23 11:24 ` Toshiaki Makita @ 2019-05-23 11:33 ` Toke Høiland-Jørgensen 0 siblings, 0 replies; 23+ messages in thread From: Toke Høiland-Jørgensen @ 2019-05-23 11:33 UTC (permalink / raw) To: Toshiaki Makita, Alexei Starovoitov, Daniel Borkmann, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend Cc: netdev, xdp-newbies, bpf Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> writes: > On 2019/05/23 20:11, Toke Høiland-Jørgensen wrote: >> Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> writes: >> >>> XDP_TX is similar to XDP_REDIRECT as it essentially redirects packets to >>> the device itself. XDP_REDIRECT has bulk transmit mechanism to avoid the >>> heavy cost of indirect call but it also reduces lock acquisition on the >>> destination device that needs locks like veth and tun. >>> >>> XDP_TX does not use indirect calls but drivers which require locks can >>> benefit from the bulk transmit for XDP_TX as well. >> >> XDP_TX happens on the same device, so there's an implicit bulking >> happening because of the NAPI cycle. So why is an additional mechanism >> needed (in the general case)? > > Not sure what the implicit bulking you mention is. XDP_TX calls > .ndo_xdp_xmit() for each packet, and it acquires a lock in veth and > tun. To avoid this, we need additional storage for bulking like devmap > for XDP_REDIRECT. The bulking is in veth_poll(), where veth_xdp_flush() is only called at the end. But see my other reply to the veth.c patch for the lock contention issue... -Toke ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH bpf-next 2/3] xdp: Add tracepoint for bulk XDP_TX
From: Toshiaki Makita @ 2019-05-23 10:56 UTC
To: Alexei Starovoitov, Daniel Borkmann, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend
Cc: Toshiaki Makita, netdev, xdp-newbies, bpf

This is introduced for admins to check what is happening on XDP_TX when
bulk XDP_TX is in use.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
---
 include/trace/events/xdp.h | 25 +++++++++++++++++++++++++
 kernel/bpf/core.c          |  1 +
 2 files changed, 26 insertions(+)

diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index e95cb86..e06ea65 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -50,6 +50,31 @@
 		  __entry->ifindex)
 );
 
+TRACE_EVENT(xdp_bulk_tx,
+
+	TP_PROTO(const struct net_device *dev,
+		 int sent, int drops, int err),
+
+	TP_ARGS(dev, sent, drops, err),
+
+	TP_STRUCT__entry(
+		__field(int, ifindex)
+		__field(int, drops)
+		__field(int, sent)
+		__field(int, err)
+	),
+
+	TP_fast_assign(
+		__entry->ifindex	= dev->ifindex;
+		__entry->drops		= drops;
+		__entry->sent		= sent;
+		__entry->err		= err;
+	),
+
+	TP_printk("ifindex=%d sent=%d drops=%d err=%d",
+		  __entry->ifindex, __entry->sent, __entry->drops, __entry->err)
+);
+
 DECLARE_EVENT_CLASS(xdp_redirect_template,
 
 	TP_PROTO(const struct net_device *dev,
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 242a643..7687488 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2108,3 +2108,4 @@ int __weak skb_copy_bits(const struct sk_buff *skb, int offset, void *to,
 #include <linux/bpf_trace.h>
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_exception);
+EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_bulk_tx);
-- 
1.8.3.1
* Re: [PATCH bpf-next 2/3] xdp: Add tracepoint for bulk XDP_TX
From: Jesper Dangaard Brouer @ 2019-05-23 13:12 UTC
To: Toshiaki Makita
Cc: Alexei Starovoitov, Daniel Borkmann, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend, netdev, xdp-newbies, bpf, brouer

On Thu, 23 May 2019 19:56:47 +0900
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote:

> This is introduced for admins to check what is happening on XDP_TX when
> bulk XDP_TX is in use.
>
> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> ---
>  include/trace/events/xdp.h | 25 +++++++++++++++++++++++++
>  kernel/bpf/core.c          |  1 +
>  2 files changed, 26 insertions(+)
>
> diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
> index e95cb86..e06ea65 100644
> --- a/include/trace/events/xdp.h
> +++ b/include/trace/events/xdp.h
> @@ -50,6 +50,31 @@
>  		  __entry->ifindex)
>  );
>
> +TRACE_EVENT(xdp_bulk_tx,
> +

You are using this tracepoint like/instead of trace_xdp_devmap_xmit if
I understand correctly? Or maybe the trace_xdp_redirect tracepoint.

The point is that it will be good if the tracepoints can share the
TP_STRUCT layout beginning, as it allows for attaching and reusing eBPF
code that is only interested in the top part of the struct.

I would also want to see some identifier, that trace programs can use
to group and correlate these events, you do have ifindex, but most
other XDP tracepoints also have "prog_id".

> +	TP_PROTO(const struct net_device *dev,
> +		 int sent, int drops, int err),
> +
> +	TP_ARGS(dev, sent, drops, err),
> +
> +	TP_STRUCT__entry(
> +		__field(int, ifindex)
> +		__field(int, drops)
> +		__field(int, sent)
> +		__field(int, err)
> +	),

The xdp_redirect_template has:

	TP_STRUCT__entry(
		__field(int, prog_id)
		__field(u32, act)
		__field(int, ifindex)
		__field(int, err)
		__field(int, to_ifindex)
		__field(u32, map_id)
		__field(int, map_index)
	),

And e.g. TRACE_EVENT xdp_exception has:

	TP_STRUCT__entry(
		__field(int, prog_id)
		__field(u32, act)
		__field(int, ifindex)
	),

> +
> +	TP_fast_assign(
> +		__entry->ifindex	= dev->ifindex;
> +		__entry->drops		= drops;
> +		__entry->sent		= sent;
> +		__entry->err		= err;
> +	),
> +
> +	TP_printk("ifindex=%d sent=%d drops=%d err=%d",
> +		  __entry->ifindex, __entry->sent, __entry->drops, __entry->err)
> +);
> +
>  DECLARE_EVENT_CLASS(xdp_redirect_template,
>
>  	TP_PROTO(const struct net_device *dev,
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 242a643..7687488 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2108,3 +2108,4 @@ int __weak skb_copy_bits(const struct sk_buff *skb, int offset, void *to,
>  #include <linux/bpf_trace.h>
>
>  EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_exception);
> +EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_bulk_tx);

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
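The layout concern raised in this review can be shown with a small user-space sketch. The structs below copy only the field order quoted above. They leave out the common header that real trace records carry in front of these fields, and they assume a 4-byte int with no padding. The names xdp_common_prefix, ifindex_via_prefix and the two read_* helpers are hypothetical. A consumer that only knows the shared leading fields reads ifindex correctly from a redirect-style record, but reads a different field from the proposed xdp_bulk_tx layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Leading fields shared by xdp_exception and xdp_redirect_template. */
struct xdp_common_prefix {
	int prog_id;
	uint32_t act;
	int ifindex;
};

/* Field order of xdp_redirect_template as quoted in the review. */
struct redirect_entry {
	int prog_id;
	uint32_t act;
	int ifindex;
	int err;
	int to_ifindex;
	uint32_t map_id;
	int map_index;
};

/* Field order proposed by the patch for xdp_bulk_tx. */
struct bulk_tx_entry {
	int ifindex;
	int drops;
	int sent;
	int err;
};

/* Hypothetical consumer that only understands the common prefix. */
static int ifindex_via_prefix(const void *entry)
{
	struct xdp_common_prefix p;

	assert(entry != NULL);
	memcpy(&p, entry, sizeof(p));	/* copy only the shared leading bytes */
	return p.ifindex;
}

/* Builds a redirect-style record and reads it through the prefix. */
static int read_redirect_ifindex(int ifindex)
{
	struct redirect_entry e = { .prog_id = 7, .act = 1, .ifindex = ifindex };

	return ifindex_via_prefix(&e);
}

/* Builds a bulk_tx-style record and reads it through the same prefix. */
static int read_bulk_tx_ifindex(int ifindex, int sent)
{
	struct bulk_tx_entry e = { .ifindex = ifindex, .sent = sent };

	return ifindex_via_prefix(&e);
}
```

Under those assumptions the prefix reader returns the value stored in "sent" for the xdp_bulk_tx layout, because "sent" occupies the offset where the other events keep "ifindex".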
* Re: [PATCH bpf-next 2/3] xdp: Add tracepoint for bulk XDP_TX 2019-05-23 13:12 ` Jesper Dangaard Brouer @ 2019-05-24 1:33 ` Toshiaki Makita 0 siblings, 0 replies; 23+ messages in thread From: Toshiaki Makita @ 2019-05-24 1:33 UTC (permalink / raw) To: Jesper Dangaard Brouer Cc: Alexei Starovoitov, Daniel Borkmann, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend, netdev, xdp-newbies, bpf On 2019/05/23 22:12, Jesper Dangaard Brouer wrote: > On Thu, 23 May 2019 19:56:47 +0900 > Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote: > >> This is introduced for admins to check what is happening on XDP_TX when >> bulk XDP_TX is in use. >> >> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> >> --- >> include/trace/events/xdp.h | 25 +++++++++++++++++++++++++ >> kernel/bpf/core.c | 1 + >> 2 files changed, 26 insertions(+) >> >> diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h >> index e95cb86..e06ea65 100644 >> --- a/include/trace/events/xdp.h >> +++ b/include/trace/events/xdp.h >> @@ -50,6 +50,31 @@ >> __entry->ifindex) >> ); >> >> +TRACE_EVENT(xdp_bulk_tx, >> + > > You are using this tracepoint like/instead of trace_xdp_devmap_xmit if > I understand correctly? Or maybe the trace_xdp_redirect tracepoint. Yes, I have trace_xdp_devmap_xmit in mind, which is for XDP_REDIRECT. > The point is that is will be good if the tracepoints can share the > TP_STRUCT layout beginning, as it allows for attaching and reusing eBPF > code that is only interested in the top part of the struct. It's good, but this tracepoint does not have map concept so differs from xdp_devmap_xmit. > I would also want to see some identifier, that trace programs can use > to group and corrolate these events, you do have ifindex, but most > other XDP tracepoints also have "prog_id". I have considered that too. The problem is that we cannot pass a reliable prog_id since bulk xmit happens after RCU critical section of XDP_TX. 
xdp_devmap_xmit does not have prog_id and I guess there is a similar reason for it? -- Toshiaki Makita ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH bpf-next 3/3] veth: Support bulk XDP_TX
From: Toshiaki Makita @ 2019-05-23 10:56 UTC
To: Alexei Starovoitov, Daniel Borkmann, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend
Cc: Toshiaki Makita, netdev, xdp-newbies, bpf

This improves XDP_TX performance by about 8%.

Here are single core XDP_TX test results. CPU consumptions are taken
from "perf report --no-child".

- Before:

  7.26 Mpps

  _raw_spin_lock  7.83%
  veth_xdp_xmit  12.23%

- After:

  7.84 Mpps

  _raw_spin_lock  1.17%
  veth_xdp_xmit   6.45%

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
---
 drivers/net/veth.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 52110e5..4edc75f 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -442,6 +442,23 @@ static int veth_xdp_xmit(struct net_device *dev, int n,
 	return ret;
 }
 
+static void veth_xdp_flush_bq(struct net_device *dev)
+{
+	struct xdp_tx_bulk_queue *bq = this_cpu_ptr(&xdp_tx_bq);
+	int sent, i, err = 0;
+
+	sent = veth_xdp_xmit(dev, bq->count, bq->q, 0);
+	if (sent < 0) {
+		err = sent;
+		sent = 0;
+		for (i = 0; i < bq->count; i++)
+			xdp_return_frame(bq->q[i]);
+	}
+	trace_xdp_bulk_tx(dev, sent, bq->count - sent, err);
+
+	bq->count = 0;
+}
+
 static void veth_xdp_flush(struct net_device *dev)
 {
 	struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
@@ -449,6 +466,7 @@ static void veth_xdp_flush(struct net_device *dev)
 	struct veth_rq *rq;
 
 	rcu_read_lock();
+	veth_xdp_flush_bq(dev);
 	rcv = rcu_dereference(priv->peer);
 	if (unlikely(!rcv))
 		goto out;
@@ -466,12 +484,18 @@ static void veth_xdp_flush(struct net_device *dev)
 
 static int veth_xdp_tx(struct net_device *dev, struct xdp_buff *xdp)
 {
+	struct xdp_tx_bulk_queue *bq = this_cpu_ptr(&xdp_tx_bq);
 	struct xdp_frame *frame = convert_to_xdp_frame(xdp);
 
 	if (unlikely(!frame))
 		return -EOVERFLOW;
 
-	return veth_xdp_xmit(dev, 1, &frame, 0);
+	if (unlikely(bq->count == XDP_TX_BULK_SIZE))
+		veth_xdp_flush_bq(dev);
+
+	bq->q[bq->count++] = frame;
+
+	return 0;
 }
 
 static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
-- 
1.8.3.1
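As a quick check of the "about 8%" figure, the relative gain can be computed from the two throughput numbers reported in this patch (7.26 Mpps before, 7.84 Mpps after). The helper name gain_percent is hypothetical and only serves the arithmetic.

```c
#include <assert.h>

/* Relative throughput gain in percent: (after - before) / before * 100. */
static double gain_percent(double before_mpps, double after_mpps)
{
	assert(before_mpps > 0.0);
	return (after_mpps - before_mpps) / before_mpps * 100.0;
}
```

For the reported numbers, (7.84 - 7.26) / 7.26 is roughly 0.0799, i.e. just under 8%, which agrees with the figure quoted in the commit message.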
* Re: [PATCH bpf-next 3/3] veth: Support bulk XDP_TX 2019-05-23 10:56 ` [PATCH bpf-next 3/3] veth: Support " Toshiaki Makita @ 2019-05-23 11:25 ` Toke Høiland-Jørgensen 0 siblings, 0 replies; 23+ messages in thread From: Toke Høiland-Jørgensen @ 2019-05-23 11:25 UTC (permalink / raw) To: Toshiaki Makita, Alexei Starovoitov, Daniel Borkmann, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend Cc: Toshiaki Makita, netdev, xdp-newbies, bpf Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> writes: > This improves XDP_TX performance by about 8%. > > Here are single core XDP_TX test results. CPU consumptions are taken > from "perf report --no-child". > > - Before: > > 7.26 Mpps > > _raw_spin_lock 7.83% > veth_xdp_xmit 12.23% > > - After: > > 7.84 Mpps > > _raw_spin_lock 1.17% > veth_xdp_xmit 6.45% > > Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> > --- > drivers/net/veth.c | 26 +++++++++++++++++++++++++- > 1 file changed, 25 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/veth.c b/drivers/net/veth.c > index 52110e5..4edc75f 100644 > --- a/drivers/net/veth.c > +++ b/drivers/net/veth.c > @@ -442,6 +442,23 @@ static int veth_xdp_xmit(struct net_device *dev, int n, > return ret; > } > > +static void veth_xdp_flush_bq(struct net_device *dev) > +{ > + struct xdp_tx_bulk_queue *bq = this_cpu_ptr(&xdp_tx_bq); > + int sent, i, err = 0; > + > + sent = veth_xdp_xmit(dev, bq->count, bq->q, 0); Wait, veth_xdp_xmit() is just putting frames on a pointer ring. So you're introducing an additional per-cpu bulk queue, only to avoid lock contention around the existing pointer ring. But the pointer ring is per-rq, so if you have lock contention, this means you must have multiple CPUs servicing the same rq, no? So why not just fix that instead? -Toke ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH bpf-next 3/3] veth: Support bulk XDP_TX 2019-05-23 11:25 ` Toke Høiland-Jørgensen (?) @ 2019-05-23 11:35 ` Toshiaki Makita 2019-05-23 12:18 ` Toke Høiland-Jørgensen 2019-05-23 13:29 ` Jesper Dangaard Brouer -1 siblings, 2 replies; 23+ messages in thread From: Toshiaki Makita @ 2019-05-23 11:35 UTC (permalink / raw) To: Toke Høiland-Jørgensen, Alexei Starovoitov, Daniel Borkmann, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend Cc: netdev, xdp-newbies, bpf On 2019/05/23 20:25, Toke Høiland-Jørgensen wrote: > Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> writes: > >> This improves XDP_TX performance by about 8%. >> >> Here are single core XDP_TX test results. CPU consumptions are taken >> from "perf report --no-child". >> >> - Before: >> >> 7.26 Mpps >> >> _raw_spin_lock 7.83% >> veth_xdp_xmit 12.23% >> >> - After: >> >> 7.84 Mpps >> >> _raw_spin_lock 1.17% >> veth_xdp_xmit 6.45% >> >> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> >> --- >> drivers/net/veth.c | 26 +++++++++++++++++++++++++- >> 1 file changed, 25 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/net/veth.c b/drivers/net/veth.c >> index 52110e5..4edc75f 100644 >> --- a/drivers/net/veth.c >> +++ b/drivers/net/veth.c >> @@ -442,6 +442,23 @@ static int veth_xdp_xmit(struct net_device *dev, int n, >> return ret; >> } >> >> +static void veth_xdp_flush_bq(struct net_device *dev) >> +{ >> + struct xdp_tx_bulk_queue *bq = this_cpu_ptr(&xdp_tx_bq); >> + int sent, i, err = 0; >> + >> + sent = veth_xdp_xmit(dev, bq->count, bq->q, 0); > > Wait, veth_xdp_xmit() is just putting frames on a pointer ring. So > you're introducing an additional per-cpu bulk queue, only to avoid lock > contention around the existing pointer ring. But the pointer ring is > per-rq, so if you have lock contention, this means you must have > multiple CPUs servicing the same rq, no? Yes, it's possible. Not recommended though. > So why not just fix that > instead? 
The queues are shared with packets sent by the stack from the peer. That's why I needed the lock. I have tried to separate the queues, one for redirect and one for stack, but the receiver side got too complicated and it ended up with worse performance. -- Toshiaki Makita ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH bpf-next 3/3] veth: Support bulk XDP_TX 2019-05-23 11:35 ` Toshiaki Makita @ 2019-05-23 12:18 ` Toke Høiland-Jørgensen 2019-05-23 13:40 ` Toshiaki Makita 2019-05-23 13:29 ` Jesper Dangaard Brouer 1 sibling, 1 reply; 23+ messages in thread From: Toke Høiland-Jørgensen @ 2019-05-23 12:18 UTC (permalink / raw) To: Toshiaki Makita, Alexei Starovoitov, Daniel Borkmann, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend Cc: netdev, xdp-newbies, bpf Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> writes: > On 2019/05/23 20:25, Toke Høiland-Jørgensen wrote: >> Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> writes: >> >>> This improves XDP_TX performance by about 8%. >>> >>> Here are single core XDP_TX test results. CPU consumptions are taken >>> from "perf report --no-child". >>> >>> - Before: >>> >>> 7.26 Mpps >>> >>> _raw_spin_lock 7.83% >>> veth_xdp_xmit 12.23% >>> >>> - After: >>> >>> 7.84 Mpps >>> >>> _raw_spin_lock 1.17% >>> veth_xdp_xmit 6.45% >>> >>> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> >>> --- >>> drivers/net/veth.c | 26 +++++++++++++++++++++++++- >>> 1 file changed, 25 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c >>> index 52110e5..4edc75f 100644 >>> --- a/drivers/net/veth.c >>> +++ b/drivers/net/veth.c >>> @@ -442,6 +442,23 @@ static int veth_xdp_xmit(struct net_device *dev, int n, >>> return ret; >>> } >>> >>> +static void veth_xdp_flush_bq(struct net_device *dev) >>> +{ >>> + struct xdp_tx_bulk_queue *bq = this_cpu_ptr(&xdp_tx_bq); >>> + int sent, i, err = 0; >>> + >>> + sent = veth_xdp_xmit(dev, bq->count, bq->q, 0); >> >> Wait, veth_xdp_xmit() is just putting frames on a pointer ring. So >> you're introducing an additional per-cpu bulk queue, only to avoid lock >> contention around the existing pointer ring. But the pointer ring is >> per-rq, so if you have lock contention, this means you must have >> multiple CPUs servicing the same rq, no? 
> > Yes, it's possible. Not recommended though. > >> So why not just fix that instead? > > The queues are shared with packets from stack sent from peer. That's > because I needed the lock. I have tried to separate the queues, one for > redirect and one for stack, but receiver side got too complicated and it > ended up with worse performance. I meant fix it with configuration. How many receive queues are you running on the veth device in your benchmarks, and how have you configured the RPS? -Toke ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH bpf-next 3/3] veth: Support bulk XDP_TX 2019-05-23 12:18 ` Toke Høiland-Jørgensen @ 2019-05-23 13:40 ` Toshiaki Makita 0 siblings, 0 replies; 23+ messages in thread From: Toshiaki Makita @ 2019-05-23 13:40 UTC (permalink / raw) To: Toke Høiland-Jørgensen, Toshiaki Makita, Alexei Starovoitov, Daniel Borkmann, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend Cc: netdev, xdp-newbies, bpf On 19/05/23 (木) 21:18:25, Toke Høiland-Jørgensen wrote: > Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> writes: > >> On 2019/05/23 20:25, Toke Høiland-Jørgensen wrote: >>> Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> writes: >>> >>>> This improves XDP_TX performance by about 8%. >>>> >>>> Here are single core XDP_TX test results. CPU consumptions are taken >>>> from "perf report --no-child". >>>> >>>> - Before: >>>> >>>> 7.26 Mpps >>>> >>>> _raw_spin_lock 7.83% >>>> veth_xdp_xmit 12.23% >>>> >>>> - After: >>>> >>>> 7.84 Mpps >>>> >>>> _raw_spin_lock 1.17% >>>> veth_xdp_xmit 6.45% >>>> >>>> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> >>>> --- >>>> drivers/net/veth.c | 26 +++++++++++++++++++++++++- >>>> 1 file changed, 25 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c >>>> index 52110e5..4edc75f 100644 >>>> --- a/drivers/net/veth.c >>>> +++ b/drivers/net/veth.c >>>> @@ -442,6 +442,23 @@ static int veth_xdp_xmit(struct net_device *dev, int n, >>>> return ret; >>>> } >>>> >>>> +static void veth_xdp_flush_bq(struct net_device *dev) >>>> +{ >>>> + struct xdp_tx_bulk_queue *bq = this_cpu_ptr(&xdp_tx_bq); >>>> + int sent, i, err = 0; >>>> + >>>> + sent = veth_xdp_xmit(dev, bq->count, bq->q, 0); >>> >>> Wait, veth_xdp_xmit() is just putting frames on a pointer ring. So >>> you're introducing an additional per-cpu bulk queue, only to avoid lock >>> contention around the existing pointer ring. 
But the pointer ring is >>> per-rq, so if you have lock contention, this means you must have >>> multiple CPUs servicing the same rq, no? >> >> Yes, it's possible. Not recommended though. >> >>> So why not just fix that instead? >> >> The queues are shared with packets from stack sent from peer. That's >> because I needed the lock. I have tried to separate the queues, one for >> redirect and one for stack, but receiver side got too complicated and it >> ended up with worse performance. > > I meant fix it with configuration. Now many receive queues are you > running on the veth device in your benchmarks, and how have you > configured the RPS? As I wrote this test is a single queue test and does not have any contention. Per packet lock has some overhead even in that configuration. Toshiaki Makita ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH bpf-next 3/3] veth: Support bulk XDP_TX 2019-05-23 11:35 ` Toshiaki Makita 2019-05-23 12:18 ` Toke Høiland-Jørgensen @ 2019-05-23 13:29 ` Jesper Dangaard Brouer 2019-05-23 13:51 ` Toshiaki Makita 1 sibling, 1 reply; 23+ messages in thread From: Jesper Dangaard Brouer @ 2019-05-23 13:29 UTC (permalink / raw) To: Toshiaki Makita Cc: Toke Høiland-Jørgensen, Alexei Starovoitov, Daniel Borkmann, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend, netdev, xdp-newbies, bpf, brouer On Thu, 23 May 2019 20:35:50 +0900 Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote: > On 2019/05/23 20:25, Toke Høiland-Jørgensen wrote: > > Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> writes: > > > >> This improves XDP_TX performance by about 8%. > >> > >> Here are single core XDP_TX test results. CPU consumptions are taken > >> from "perf report --no-child". > >> > >> - Before: > >> > >> 7.26 Mpps > >> > >> _raw_spin_lock 7.83% > >> veth_xdp_xmit 12.23% > >> > >> - After: > >> > >> 7.84 Mpps > >> > >> _raw_spin_lock 1.17% > >> veth_xdp_xmit 6.45% > >> > >> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> > >> --- > >> drivers/net/veth.c | 26 +++++++++++++++++++++++++- > >> 1 file changed, 25 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/net/veth.c b/drivers/net/veth.c > >> index 52110e5..4edc75f 100644 > >> --- a/drivers/net/veth.c > >> +++ b/drivers/net/veth.c > >> @@ -442,6 +442,23 @@ static int veth_xdp_xmit(struct net_device *dev, int n, > >> return ret; > >> } > >> > >> +static void veth_xdp_flush_bq(struct net_device *dev) > >> +{ > >> + struct xdp_tx_bulk_queue *bq = this_cpu_ptr(&xdp_tx_bq); > >> + int sent, i, err = 0; > >> + > >> + sent = veth_xdp_xmit(dev, bq->count, bq->q, 0); > > > > Wait, veth_xdp_xmit() is just putting frames on a pointer ring. So > > you're introducing an additional per-cpu bulk queue, only to avoid lock > > contention around the existing pointer ring. 
> > But the pointer ring is per-rq, so if you have lock contention, this
> > means you must have multiple CPUs servicing the same rq, no?
>
> Yes, it's possible. Not recommended though.
>

I think the general per-cpu TX bulk queue is overkill. There is a loop
over packets in veth_xdp_rcv(struct veth_rq *rq, budget, *status), and
the caller veth_poll() will call veth_xdp_flush(rq->dev).

Why can't you store this "temp" bulk array in struct veth_rq?

You could even alloc/create it on the stack of veth_poll() and send it
along via a pointer to veth_xdp_rcv().

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
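For readers following the suggestion above, here is a minimal user-space sketch of a bulk queue that lives on the poll function's stack and is handed to the receive loop by pointer. It is not the veth code: `struct tx_bq`, `struct ring`, `BULK_SIZE`, `enqueue()`, `flush_bq()`, `rcv_loop()` and `poll_once()` are hypothetical names, the batch size of 16 is an assumption, and the lock is only simulated by a counter.

```c
#include <assert.h>
#include <stddef.h>

#define BULK_SIZE 16            /* hypothetical batch size */

struct frame { int id; };       /* stand-in for struct xdp_frame */

/* Temporary bulk queue; it lives on the poll function's stack. */
struct tx_bq {
	struct frame *q[BULK_SIZE];
	unsigned int count;
};

/* Stand-in for the peer's ring: counts lock round trips and frames. */
struct ring {
	unsigned int lock_acquisitions;
	unsigned int frames;
};

/* Push the whole batch under one (simulated) lock, then reset the queue. */
static void flush_bq(struct ring *r, struct tx_bq *bq)
{
	if (bq->count == 0)
		return;
	r->lock_acquisitions++;     /* one lock/unlock pair per batch */
	r->frames += bq->count;
	bq->count = 0;
}

/* Queue one XDP_TX frame, flushing first if the batch is full. */
static void enqueue(struct ring *r, struct tx_bq *bq, struct frame *f)
{
	if (bq->count == BULK_SIZE)
		flush_bq(r, bq);
	bq->q[bq->count++] = f;
}

/* Plays the role of veth_xdp_rcv(): the queue arrives via a pointer. */
static int rcv_loop(struct ring *r, struct frame *frames, int budget,
		    struct tx_bq *bq)
{
	int done;

	for (done = 0; done < budget; done++)
		enqueue(r, bq, &frames[done]);
	return done;
}

/* Plays the role of veth_poll(): owns the queue and always flushes it. */
static int poll_once(struct ring *r, struct frame *frames, int budget)
{
	struct tx_bq bq;
	int done;

	bq.count = 0;
	done = rcv_loop(r, frames, budget, &bq);
	flush_bq(r, &bq);           /* nothing may outlive this stack frame */
	assert(bq.count == 0);
	return done;
}
```

With a budget of 40 frames this sketch takes the simulated lock three times (two full batches of 16 plus a final batch of 8) instead of 40 times, which is the per-packet locking overhead the series is trying to avoid.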
* Re: [PATCH bpf-next 3/3] veth: Support bulk XDP_TX 2019-05-23 13:29 ` Jesper Dangaard Brouer @ 2019-05-23 13:51 ` Toshiaki Makita 2019-05-24 3:13 ` Jason Wang 2019-05-24 9:53 ` Jesper Dangaard Brouer 0 siblings, 2 replies; 23+ messages in thread From: Toshiaki Makita @ 2019-05-23 13:51 UTC (permalink / raw) To: Jesper Dangaard Brouer, Toshiaki Makita Cc: Toke Høiland-Jørgensen, Alexei Starovoitov, Daniel Borkmann, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend, netdev, xdp-newbies, bpf On 19/05/23 (木) 22:29:27, Jesper Dangaard Brouer wrote: > On Thu, 23 May 2019 20:35:50 +0900 > Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote: > >> On 2019/05/23 20:25, Toke Høiland-Jørgensen wrote: >>> Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> writes: >>> >>>> This improves XDP_TX performance by about 8%. >>>> >>>> Here are single core XDP_TX test results. CPU consumptions are taken >>>> from "perf report --no-child". >>>> >>>> - Before: >>>> >>>> 7.26 Mpps >>>> >>>> _raw_spin_lock 7.83% >>>> veth_xdp_xmit 12.23% >>>> >>>> - After: >>>> >>>> 7.84 Mpps >>>> >>>> _raw_spin_lock 1.17% >>>> veth_xdp_xmit 6.45% >>>> >>>> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> >>>> --- >>>> drivers/net/veth.c | 26 +++++++++++++++++++++++++- >>>> 1 file changed, 25 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c >>>> index 52110e5..4edc75f 100644 >>>> --- a/drivers/net/veth.c >>>> +++ b/drivers/net/veth.c >>>> @@ -442,6 +442,23 @@ static int veth_xdp_xmit(struct net_device *dev, int n, >>>> return ret; >>>> } >>>> >>>> +static void veth_xdp_flush_bq(struct net_device *dev) >>>> +{ >>>> + struct xdp_tx_bulk_queue *bq = this_cpu_ptr(&xdp_tx_bq); >>>> + int sent, i, err = 0; >>>> + >>>> + sent = veth_xdp_xmit(dev, bq->count, bq->q, 0); >>> >>> Wait, veth_xdp_xmit() is just putting frames on a pointer ring. 
>>> So you're introducing an additional per-cpu bulk queue, only to avoid lock
>>> contention around the existing pointer ring. But the pointer ring is
>>> per-rq, so if you have lock contention, this means you must have
>>> multiple CPUs servicing the same rq, no?
>>
>> Yes, it's possible. Not recommended though.
>>
>
> I think the general per-cpu TX bulk queue is overkill. There is a loop
> over packets in veth_xdp_rcv(struct veth_rq *rq, budget, *status), and
> the caller veth_poll() will call veth_xdp_flush(rq->dev).
>
> Why can't you store this "temp" bulk array in struct veth_rq?

Of course I can. But I thought tun has the same problem, and we can
decrease the memory footprint by sharing the same storage between devices.
Or, if other devices want to reduce the number of queues so that XDP can
be used on many-CPU servers, and introduce locks to do so, we can use
this storage for that case as well.

Do you still prefer a veth-specific solution?

>
> You could even alloc/create it on the stack of veth_poll() and send it
> along via a pointer to veth_xdp_rcv().
>

Toshiaki Makita
* Re: [PATCH bpf-next 3/3] veth: Support bulk XDP_TX 2019-05-23 13:51 ` Toshiaki Makita @ 2019-05-24 3:13 ` Jason Wang 2019-05-24 3:28 ` Toshiaki Makita 2019-05-24 9:53 ` Jesper Dangaard Brouer 1 sibling, 1 reply; 23+ messages in thread From: Jason Wang @ 2019-05-24 3:13 UTC (permalink / raw) To: Toshiaki Makita, Jesper Dangaard Brouer, Toshiaki Makita Cc: Toke Høiland-Jørgensen, Alexei Starovoitov, Daniel Borkmann, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend, netdev, xdp-newbies, bpf On 2019/5/23 下午9:51, Toshiaki Makita wrote: > On 19/05/23 (木) 22:29:27, Jesper Dangaard Brouer wrote: >> On Thu, 23 May 2019 20:35:50 +0900 >> Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote: >> >>> On 2019/05/23 20:25, Toke Høiland-Jørgensen wrote: >>>> Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> writes: >>>>> This improves XDP_TX performance by about 8%. >>>>> >>>>> Here are single core XDP_TX test results. CPU consumptions are taken >>>>> from "perf report --no-child". >>>>> >>>>> - Before: >>>>> >>>>> 7.26 Mpps >>>>> >>>>> _raw_spin_lock 7.83% >>>>> veth_xdp_xmit 12.23% >>>>> >>>>> - After: >>>>> >>>>> 7.84 Mpps >>>>> >>>>> _raw_spin_lock 1.17% >>>>> veth_xdp_xmit 6.45% >>>>> >>>>> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> >>>>> --- >>>>> drivers/net/veth.c | 26 +++++++++++++++++++++++++- >>>>> 1 file changed, 25 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c >>>>> index 52110e5..4edc75f 100644 >>>>> --- a/drivers/net/veth.c >>>>> +++ b/drivers/net/veth.c >>>>> @@ -442,6 +442,23 @@ static int veth_xdp_xmit(struct net_device >>>>> *dev, int n, >>>>> return ret; >>>>> } >>>>> +static void veth_xdp_flush_bq(struct net_device *dev) >>>>> +{ >>>>> + struct xdp_tx_bulk_queue *bq = this_cpu_ptr(&xdp_tx_bq); >>>>> + int sent, i, err = 0; >>>>> + >>>>> + sent = veth_xdp_xmit(dev, bq->count, bq->q, 0); >>>> >>>> Wait, veth_xdp_xmit() is just putting frames on a pointer ring. 
>>>> So you're introducing an additional per-cpu bulk queue, only to avoid
>>>> lock contention around the existing pointer ring. But the pointer ring is
>>>> per-rq, so if you have lock contention, this means you must have
>>>> multiple CPUs servicing the same rq, no?
>>>
>>> Yes, it's possible. Not recommended though.
>>>
>>
>> I think the general per-cpu TX bulk queue is overkill. There is a loop
>> over packets in veth_xdp_rcv(struct veth_rq *rq, budget, *status), and
>> the caller veth_poll() will call veth_xdp_flush(rq->dev).
>>
>> Why can't you store this "temp" bulk array in struct veth_rq?
>
> Of course I can. But I thought tun has the same problem, and we can
> decrease the memory footprint by sharing the same storage between devices.

For TUN and for its fast path where vhost passes a bulk of XDP frames
(through msg_control) to us, we probably just need a temporary bulk
array in tun_xdp_one() instead of a global one. I can post a patch, or
maybe you can if you're interested in this.

Thanks

> Or, if other devices want to reduce the number of queues so that XDP can
> be used on many-CPU servers, and introduce locks to do so, we can use
> this storage for that case as well.
>
> Do you still prefer a veth-specific solution?
>
>>
>> You could even alloc/create it on the stack of veth_poll() and send it
>> along via a pointer to veth_xdp_rcv().
>>
>
> Toshiaki Makita
* Re: [PATCH bpf-next 3/3] veth: Support bulk XDP_TX 2019-05-24 3:13 ` Jason Wang @ 2019-05-24 3:28 ` Toshiaki Makita 2019-05-24 3:54 ` Jason Wang 0 siblings, 1 reply; 23+ messages in thread From: Toshiaki Makita @ 2019-05-24 3:28 UTC (permalink / raw) To: Jason Wang, Toshiaki Makita, Jesper Dangaard Brouer Cc: Toke Høiland-Jørgensen, Alexei Starovoitov, Daniel Borkmann, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend, netdev, xdp-newbies, bpf On 2019/05/24 12:13, Jason Wang wrote: > On 2019/5/23 下午9:51, Toshiaki Makita wrote: >> On 19/05/23 (木) 22:29:27, Jesper Dangaard Brouer wrote: >>> On Thu, 23 May 2019 20:35:50 +0900 >>> Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote: >>> >>>> On 2019/05/23 20:25, Toke Høiland-Jørgensen wrote: >>>>> Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> writes: >>>>>> This improves XDP_TX performance by about 8%. >>>>>> >>>>>> Here are single core XDP_TX test results. CPU consumptions are taken >>>>>> from "perf report --no-child". 
[...]
>>> Why can't you store this "temp" bulk array in struct veth_rq?
>>
>> Of course I can. But I thought tun has the same problem, and we can
>> decrease the memory footprint by sharing the same storage between devices.
>
> For TUN and for its fast path where vhost passes a bulk of XDP frames
> (through msg_control) to us, we probably just need a temporary bulk
> array in tun_xdp_one() instead of a global one. I can post a patch, or
> maybe you can if you're interested in this.

Of course you/I can. What I'm concerned about is that it could waste a
cache line when softirq runs the veth NAPI handler and then the tun NAPI
handler.

[...]

-- 
Toshiaki Makita
* Re: [PATCH bpf-next 3/3] veth: Support bulk XDP_TX 2019-05-24 3:28 ` Toshiaki Makita @ 2019-05-24 3:54 ` Jason Wang 2019-05-24 4:52 ` Toshiaki Makita 0 siblings, 1 reply; 23+ messages in thread From: Jason Wang @ 2019-05-24 3:54 UTC (permalink / raw) To: Toshiaki Makita, Toshiaki Makita, Jesper Dangaard Brouer Cc: Toke Høiland-Jørgensen, Alexei Starovoitov, Daniel Borkmann, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend, netdev, xdp-newbies, bpf On 2019/5/24 上午11:28, Toshiaki Makita wrote: > On 2019/05/24 12:13, Jason Wang wrote: >> On 2019/5/23 下午9:51, Toshiaki Makita wrote: >>> On 19/05/23 (木) 22:29:27, Jesper Dangaard Brouer wrote: >>>> On Thu, 23 May 2019 20:35:50 +0900 >>>> Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote: >>>> >>>>> On 2019/05/23 20:25, Toke Høiland-Jørgensen wrote: >>>>>> Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> writes: >>>>>>> This improves XDP_TX performance by about 8%. >>>>>>> >>>>>>> Here are single core XDP_TX test results. CPU consumptions are taken >>>>>>> from "perf report --no-child". 
[...]
>> For TUN and for its fast path where vhost passes a bulk of XDP frames
>> (through msg_control) to us, we probably just need a temporary bulk
>> array in tun_xdp_one() instead of a global one. I can post a patch, or
>> maybe you can if you're interested in this.
> Of course you/I can. What I'm concerned about is that it could waste a
> cache line when softirq runs the veth NAPI handler and then the tun NAPI
> handler.
>

Well, technically the bulk queue passed to TUN could be reused. I admit
it may save a cacheline in the ideal case, but I wonder how much we could
gain on a real workload. (Note that TUN doesn't use a NAPI handler to do
XDP. It has a NAPI mode, but that was mainly used for hardening and XDP
was not implemented there; maybe we should fix this.)

Thanks

[...]
* Re: [PATCH bpf-next 3/3] veth: Support bulk XDP_TX 2019-05-24 3:54 ` Jason Wang @ 2019-05-24 4:52 ` Toshiaki Makita 0 siblings, 0 replies; 23+ messages in thread From: Toshiaki Makita @ 2019-05-24 4:52 UTC (permalink / raw) To: Jason Wang, Toshiaki Makita, Jesper Dangaard Brouer Cc: Toke Høiland-Jørgensen, Alexei Starovoitov, Daniel Borkmann, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend, netdev, xdp-newbies, bpf On 2019/05/24 12:54, Jason Wang wrote: > On 2019/5/24 上午11:28, Toshiaki Makita wrote: >> On 2019/05/24 12:13, Jason Wang wrote: >>> On 2019/5/23 下午9:51, Toshiaki Makita wrote: >>>> On 19/05/23 (木) 22:29:27, Jesper Dangaard Brouer wrote: >>>>> On Thu, 23 May 2019 20:35:50 +0900 >>>>> Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote: >>>>> >>>>>> On 2019/05/23 20:25, Toke Høiland-Jørgensen wrote: >>>>>>> Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> writes: >>>>>>>> This improves XDP_TX performance by about 8%. >>>>>>>> >>>>>>>> Here are single core XDP_TX test results. CPU consumptions are >>>>>>>> taken >>>>>>>> from "perf report --no-child". 
[...]
>> Of course you/I can. What I'm concerned about is that it could waste a
>> cache line when softirq runs the veth NAPI handler and then the tun NAPI
>> handler.
>>
>
> Well, technically the bulk queue passed to TUN could be reused. I admit
> it may save a cacheline in the ideal case, but I wonder how much we could
> gain on a real workload.

I see the veth_rq ptr_ring suffering from cacheline misses, which makes me
conservative about adding more buffers for xdp_frames. I'll wait for
some more feedback from others.

> (Note that TUN doesn't use a NAPI handler to do XDP. It has a NAPI mode,
> but that was mainly used for hardening and XDP was not implemented there;
> maybe we should fix this.)

Ah, that's true. Sorry for the confusion.

[...]

-- 
Toshiaki Makita
* Re: [PATCH bpf-next 3/3] veth: Support bulk XDP_TX 2019-05-23 13:51 ` Toshiaki Makita 2019-05-24 3:13 ` Jason Wang @ 2019-05-24 9:53 ` Jesper Dangaard Brouer 2019-05-27 6:08 ` Toshiaki Makita 1 sibling, 1 reply; 23+ messages in thread From: Jesper Dangaard Brouer @ 2019-05-24 9:53 UTC (permalink / raw) To: Toshiaki Makita Cc: Toshiaki Makita, Toke Høiland-Jørgensen, Alexei Starovoitov, Daniel Borkmann, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend, netdev, xdp-newbies, bpf, brouer On Thu, 23 May 2019 22:51:34 +0900 Toshiaki Makita <toshiaki.makita1@gmail.com> wrote: > On 19/05/23 (木) 22:29:27, Jesper Dangaard Brouer wrote: > > On Thu, 23 May 2019 20:35:50 +0900 > > Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote: > > > >> On 2019/05/23 20:25, Toke Høiland-Jørgensen wrote: > >>> Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> writes: > >>> > >>>> This improves XDP_TX performance by about 8%. > >>>> > >>>> Here are single core XDP_TX test results. CPU consumptions are taken > >>>> from "perf report --no-child". 
[...]
> > I think the general per-cpu TX bulk queue is overkill. There is a loop
> > over packets in veth_xdp_rcv(struct veth_rq *rq, budget, *status), and
> > the caller veth_poll() will call veth_xdp_flush(rq->dev).
> >
> > Why can't you store this "temp" bulk array in struct veth_rq?
>
> Of course I can. But I thought tun has the same problem, and we can
> decrease the memory footprint by sharing the same storage between devices.
> Or, if other devices want to reduce the number of queues so that XDP can
> be used on many-CPU servers, and introduce locks to do so, we can use
> this storage for that case as well.
>
> Do you still prefer a veth-specific solution?

Yes. Another reason is that with this shared/general per-cpu TX bulk
queue, I can easily see bugs resulting in xdp_frames getting
transmitted on a completely different NIC, which will be hard for people
to debug.

> > You could even alloc/create it on the stack of veth_poll() and send
> > it along via a pointer to veth_xdp_rcv().

IMHO it would be cleaner code-wise to place the "temp" bulk array in
struct veth_rq. But if you worry about performance and want a hot
cacheline for this, then you could just use the call stack of
veth_poll(), as I described. It should not be too ugly code-wise to do
this, I think.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
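The failure mode described above can be reproduced in a small user-space model. This is only an illustration, not kernel code: `shared_bq`, `struct dev`, `flush_shared()` and `poll_dev()` are hypothetical names, and the `skip_flush` argument exists purely to model a handler that forgets to flush the shared queue before it exits.

```c
#include <assert.h>

#define BULK_SIZE 16            /* hypothetical batch size */

struct frame { int origin_dev; };   /* stand-in for struct xdp_frame */

struct tx_bq {
	struct frame *q[BULK_SIZE];
	unsigned int count;
};

/* One queue shared by every device polled on this CPU (simulated). */
static struct tx_bq shared_bq;

struct dev {
	int id;
	unsigned int sent_own;      /* frames that belonged to this device */
	unsigned int sent_foreign;  /* frames queued by another device */
};

/* "Transmit" whatever is in the shared queue on device d. */
static void flush_shared(struct dev *d)
{
	unsigned int i;

	assert(shared_bq.count <= BULK_SIZE);
	for (i = 0; i < shared_bq.count; i++) {
		if (shared_bq.q[i]->origin_dev == d->id)
			d->sent_own++;
		else
			d->sent_foreign++;
	}
	shared_bq.count = 0;
}

/* NAPI-like handler; a non-zero skip_flush models the missing flush. */
static void poll_dev(struct dev *d, struct frame *frames, unsigned int n,
		     int skip_flush)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		if (shared_bq.count == BULK_SIZE)
			flush_shared(d);
		frames[i].origin_dev = d->id;
		shared_bq.q[shared_bq.count++] = &frames[i];
	}
	if (!skip_flush)
		flush_shared(d);
}
```

If device 1 queues three frames and skips the flush, and device 2 then queues two frames and flushes normally, device 2 ends up sending all five frames, three of which were never meant for it. With a queue on the handler's stack, the same bug would only lose the three frames.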
* Re: [PATCH bpf-next 3/3] veth: Support bulk XDP_TX 2019-05-24 9:53 ` Jesper Dangaard Brouer @ 2019-05-27 6:08 ` Toshiaki Makita 0 siblings, 0 replies; 23+ messages in thread From: Toshiaki Makita @ 2019-05-27 6:08 UTC (permalink / raw) To: Jesper Dangaard Brouer Cc: Toshiaki Makita, Toke Høiland-Jørgensen, Alexei Starovoitov, Daniel Borkmann, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend, netdev, xdp-newbies, bpf On 2019/05/24 18:53, Jesper Dangaard Brouer wrote: > On Thu, 23 May 2019 22:51:34 +0900 > Toshiaki Makita <toshiaki.makita1@gmail.com> wrote: > >> On 19/05/23 (木) 22:29:27, Jesper Dangaard Brouer wrote: >>> On Thu, 23 May 2019 20:35:50 +0900 >>> Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote: >>> >>>> On 2019/05/23 20:25, Toke Høiland-Jørgensen wrote: >>>>> Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> writes: >>>>> >>>>>> This improves XDP_TX performance by about 8%. >>>>>> >>>>>> Here are single core XDP_TX test results. CPU consumptions are taken >>>>>> from "perf report --no-child". 
[...]
>> Do you still prefer a veth-specific solution?
>
> Yes. Another reason is that with this shared/general per-cpu TX bulk
> queue, I can easily see bugs resulting in xdp_frames getting
> transmitted on a completely different NIC, which will be hard for people
> to debug.
>
>>> You could even alloc/create it on the stack of veth_poll() and send
>>> it along via a pointer to veth_xdp_rcv().
>
> IMHO it would be cleaner code-wise to place the "temp" bulk array in
> struct veth_rq. But if you worry about performance and want a hot
> cacheline for this, then you could just use the call stack of
> veth_poll(), as I described. It should not be too ugly code-wise to do
> this, I think.

Rethinking this, I agree to use the stack instead of a global queue.
On performance you are right: the stack should be as hot as the global
queue if other drivers use the stack as well. I was a bit concerned about
stack size, but 128 bytes is probably acceptable these days.

Wrt debugging, the global solution is indeed probably more difficult.
When we fail to flush bq, the stack solution can be tracked by something
like kmemleak, but the global one cannot. Also, the global solution risks
sending packets from unintended devices, which leads to a security
problem. With the stack solution, a missing flush just causes packet loss
and a memory leak.

-- 
Toshiaki Makita
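As a rough check on the 128-byte figure mentioned above, the sketch below computes the size of the pointer array in a bulk queue. The batch size of 16 is an assumption (the message does not state it), and `struct tx_bq`, `BULK_SIZE` and `bq_array_bytes()` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define BULK_SIZE 16   /* assumed batch size; not stated in the message */

struct frame;          /* opaque stand-in for struct xdp_frame */

struct tx_bq {
	struct frame *q[BULK_SIZE];
	unsigned int count;
};

/* Bytes taken by the pointer array alone (sizeof does not evaluate). */
static size_t bq_array_bytes(void)
{
	return sizeof(((struct tx_bq *)0)->q);
}
```

On a platform with 8-byte pointers, 16 entries come to 16 * 8 = 128 bytes for the array, with the `count` member and any padding on top of that.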