netdev.vger.kernel.org archive mirror
* [PATCH RFC v5 net-next 0/6] enable tx interrupts for virtio-net
@ 2015-02-09  8:39 Jason Wang
  2015-02-09  8:39 ` [PATCH RFC v5 net-next 1/6] virtio_ring: fix virtqueue_enable_cb() when only 1 buffer was pending Jason Wang
                   ` (5 more replies)
  0 siblings, 6 replies; 24+ messages in thread
From: Jason Wang @ 2015-02-09  8:39 UTC (permalink / raw)
  To: netdev, linux-kernel, virtualization, mst; +Cc: pagupta

Hi:

This is a new version of trying to enable tx interrupts for
virtio-net.

We used to avoid tx interrupts and orphan packets before
transmission in virtio-net. This breaks socket accounting and can
lead to several other side effects, e.g.:

- Several other functions which depend on socket accounting cannot
  work correctly (e.g. TCP Small Queues).
- No tx completion, which prevents BQL and the packet generator from
  working correctly.

This series tries to solve the issue by enabling tx interrupts. To
minimize the performance impact of this, several optimizations were
used:

- On the guest side, try to use delayed callbacks as much as possible.
- On the host side, try to use interrupt coalescing to reduce tx
  interrupts. This improves performance by about 10% - 15%.

Performance tests show:
- Only small regressions (less than 5%) were noticed for TCP_RR on a
  1 vcpu 1 queue guest.
- CPU utilization is increased in some cases.
- In all other cases, tx interrupts perform equal to or better than
  orphaning (much more obvious when testing small packet transmission,
  as a byproduct, since socket accounting works for TCP packets).

Changes from RFCv4:
- fix the virtqueue_enable_cb_delayed() return value when only 1
  buffer is pending.
- try to disable callbacks by publishing the event index in
  virtqueue_disable_cb(). Tests show about 2% - 3% improvement on
  multiple sessions of TCP_RR.
- Revert some of Michael's tweaks from RFC v1 (see patch 3 for
  details).
- use netif_wake_subqueue() instead of netif_start_subqueue() in
  free_old_xmit_skbs(), since it may be called in tx napi.
- in start_xmit(), try to enable the callback only when the current
  skb is the last in the list or tx has already been stopped. This
  avoids enabling the callbacks under heavy load.
- return ns instead of us in vhost_net_check_coalesce_and_signal()
- measure the time interval of real interrupts instead of calls to
  vhost_signal()
- drop bql from the series since it does not affect performance in
  the test results.
Changes from RFC v3:
- Don't free tx packets in ndo_start_xmit()
- Add interrupt coalescing support for virtio-net
Changes from RFC v2:
- clean up code, address issues raised by Jason
Changes from RFC v1:
- address comments by Jason Wang, use delayed cb everywhere
- rebased Jason's patch on top of mine and included it (with some
  tweaks)

Test result:

Environment:
Host: net-next.git
Guest: net-next.git
Qemu: qemu.git
CPU: Two Intel(R) Xeon(R) CPU E5620  @ 2.40GHz
Card: Two Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network
Connection (rev 01) connected back to back
Coalescing in host: rx-usecs: 1
Coalescing in guest:
- none if no tx interrupt
- tx-frames 8 tx-usecs 64
Zerocopy disabled for vhost_net

1 VCPU 1 queue:

Guest TX:
sessions/sizes/+-throughput%/+-cpu%/+-per_cpu_throughput%/
1/64/+41.2/-2.6/+45.1/
2/64/+40.8/-2.7/+44.7/
4/64/+38.5/-2.8/+42.5/
8/64/+36.1/-2.1/+38.9/
1/256/+158.5/-5.0/+172.3/
2/256/+155.3/-4.7/+167.9/
4/256/+156.4/-5.3/+170.6/
8/256/+155.4/-4.9/+168.6/
1/512/+276.5/-3.0/+288.3/
2/512/+280.1/-3.3/+293.1/
4/512/+276.1/-2.7/+286.7/
8/512/+267.1/-3.2/+279.1/
1/1024/+344.0/-6.9/+376.9/
2/1024/+362.3/-2.2/+372.8/
4/1024/+370.4/-1.1/+375.8/
8/1024/+374.0/+0.6/+371.1/
1/4096/+15.1/-30.4/+65.5/
2/4096/+6.3/-32.9/+58.4/
4/4096/+6.9/-32.4/+58.3/
8/4096/+6.7/-26.4/+45.0/
1/16384/-0.6/-4.6/+4.2/
2/16384/+0.0/-5.0/+5.4/
4/16384/+0.0/-2.5/+2.6/
8/16384/-0.0/-13.8/+13.2/
1/65535/-0.5/-0.5/0.0/
2/65535/-0.0/+4.0/-3.9/
4/65535/-0.0/+9.4/-8.6/
8/65535/+0.1/+8.6/-7.8/

TCP_RR:
sessions/sizes/+-throughput%/+-cpu%/+-per_cpu_throughput%/
1/1/-2.4/+16.3/-16.6/
50/1/-3.4/-0.2/-3.2/
1/64/+3.3/+0.7/+2.7/
50/64/-3.6/+0.2/-3.8/
1/256/-0.4/+16.6/-14.7/
50/256/-6.1/+0.2/-6.2/

Guest RX:
sessions/sizes/+-throughput%/+-cpu%/+-per_cpu_throughput%/
1/64/+17.3/+4.9/+10.0/
2/64/+13.8/+1.1/+12.3/
4/64/+3.8/+1.0/+2.7/
8/64/+1.4/+2.1/-0.7/
1/256/+24.5/-1.3/+19.1/
2/256/-3.9/+3.8/-7.5/
4/256/-1.3/+1.5/-2.8/
8/256/+1.5/+1.7/-0.2/
1/512/+2.9/+12.7/-8.6/
2/512/-0.0/+3.5/-3.4/
4/512/-1.4/+2.6/-4.0/
8/512/-1.3/+3.1/-4.2/
1/1024/-0.2/+6.0/-5.9/
2/1024/+0.0/+3.1/-3.1/
4/1024/+0.1/+0.9/-0.7/
8/1024/-1.1/-4.1/+3.2/
1/4096/-0.6/+4.0/-4.4/
2/4096/-0.0/+3.6/-3.4/
4/4096/+0.7/+3.8/-3.1/
8/4096/-0.5/+2.4/-2.9/
1/16384/-1.1/+4.9/-5.7/
2/16384/-0.0/+2.4/-2.4/
4/16384/-0.1/+4.7/-4.6/
8/16384/+1.3/-2.0/+3.2/
1/65535/+1.9/+6.0/-3.8/
2/65535/+0.0/+3.3/-3.2/
4/65535/+0.1/+5.1/-4.7/
8/65535/-0.7/+8.0/-8.5/

4 VCPU 1 queue:

Guest TX:
sessions/sizes/+-throughput%/+-cpu%/+-per_cpu_throughput%/
1/64/+45.9/+2.7/+41.9/
2/64/+28.8/+0.2/+28.4/
4/64/+11.2/-0.6/+11.9/
8/64/+10.0/-0.5/+10.6/
1/256/+162.4/+1.0/+160.1/
2/256/+99.7/-0.2/+99.9/
4/256/+10.2/-11.3/+24.2/
8/256/-0.0/-13.8/+16.0/
1/512/+232.4/-0.2/+232.8/
2/512/+105.9/-9.9/+128.5/
4/512/+6.7/-18.8/+31.4/
8/512/+0.0/-21.0/+26.6/
1/1024/+285.1/-2.9/+296.9/
2/1024/+99.3/-22.4/+156.8/
4/1024/+3.7/-24.0/+36.4/
8/1024/+0.1/-23.9/+31.6/
1/4096/+35.9/-17.4/+64.6/
2/4096/+11.4/-27.4/+53.4/
4/4096/+0.1/-36.0/+56.3/
8/4096/+0.0/-34.9/+53.7/
1/16384/-1.8/-9.5/+8.5/
2/16384/+0.0/-2.3/+2.4/
4/16384/+0.0/-3.0/+3.1/
8/16384/+0.1/-0.3/+0.4/
1/65535/-1.1/+3.1/-4.0/
2/65535/+0.0/+5.4/-5.1/
4/65535/+0.0/+7.2/-6.7/
8/65535/-0.0/+7.1/-6.7/

TCP_RR:
sessions/sizes/+-throughput%/+-cpu%/+-per_cpu_throughput%/
1/1/-0.5/+11.7/-11.0/
50/1/-1.9/+2.2/-4.1/
1/64/-0.9/+15.5/-14.3/
50/64/-3.3/+1.3/-4.5/
1/256/+0.6/+5.7/-5.5/
50/256/-1.6/+2.3/-3.8/

Guest RX:
sessions/sizes/+-throughput%/+-cpu%/+-per_cpu_throughput%/
1/64/+12.2/+10.7/+1.3/
2/64/-3.0/+1.9/-4.9/
4/64/+0.9/+0.6/+0.2/
8/64/-2.8/+11.9/-13.6/
1/256/+3.6/-1.8/+10.7/
2/256/+1.9/-0.8/+2.8/
4/256/+0.1/+2.1/-1.9/
8/256/+1.1/+4.3/-2.1/
1/512/+1.5/+1.8/-0.9/
2/512/-0.1/+5.5/-5.3/
4/512/+0.5/+12.9/-11.7/
8/512/+5.5/+10.2/-4.0/
1/1024/+2.6/+6.8/-4.0/
2/1024/-0.0/+3.3/-3.3/
4/1024/+0.1/+5.1/-4.7/
8/1024/+3.6/-2.9/+6.5/
1/4096/+1.0/+7.0/-5.7/
2/4096/-0.0/+1.7/-1.5/
4/4096/+0.1/+3.6/-3.4/
8/4096/-1.0/+2.6/-4.0/
1/16384/+0.6/+3.3/-2.7/
2/16384/+0.0/-2.9/+3.3/
4/16384/+1.5/+1.3/+0.7/
8/16384/+0.5/+8.5/-6.9/
1/65535/+0.5/+2.4/-2.0/
2/65535/+0.0/-1.2/+1.5/
4/65535/+0.1/-4.5/+4.9/
8/65535/+1.4/+7.1/-5.7/

4 VCPU 4 queues:

Guest TX:
sessions/sizes/+-throughput%/+-cpu%/+-per_cpu_throughput%/
1/64/+43.6/-7.9/+56.5/
2/64/+41.7/-3.8/+47.3/
4/64/+43.7/-3.7/+49.3/
8/64/+40.2/-3.5/+45.3/
1/256/+155.9/-2.9/+164.2/
2/256/+160.6/-3.2/+169.1/
4/256/+77.5/-23.2/+131.4/
8/256/+82.5/-21.1/+132.4/
1/512/+214.7/-0.7/+217.9/
2/512/+125.2/-10.1/+151.8/
4/512/+30.4/-37.0/+107.6/
8/512/+0.8/-17.1/+22.9/
1/1024/+250.0/+3.1/+239.5/
2/1024/+83.6/-19.7/+129.0/
4/1024/+18.0/-39.4/+95.2/
8/1024/+1.9/-20.5/+30.3/
1/4096/+26.6/-10.9/+42.1/
2/4096/+0.9/-2.3/+3.9/
4/4096/+0.9/-12.6/+16.8/
8/4096/-0.4/-29.4/+42.1/
1/16384/+12.0/-3.1/+15.9/
2/16384/-2.8/-2.1/+0.5/
4/16384/-1.6/-15.3/+18.2/
8/16384/+0.5/-17.9/+34.5/
1/65535/+12.2/-2.9/+15.8/
2/65535/-4.3/+4.7/-8.6/
4/65535/+1.1/-9.8/+15.9/
8/65535/-1.3/-21.8/+34.1/

TCP_RR:
sessions/sizes/+-throughput%/+-cpu%/+-per_cpu_throughput%/
1/1/+5.4/-0.4/+6.6/
50/1/+6.9/+2.8/+3.9/
1/64/+7.8/+8.4/-1.5/
50/64/+11.1/+0.8/+10.4/
1/256/+3.9/-2.9/+10.3/
50/256/+17.4/+2.7/+14.1/

Guest RX:
sessions/sizes/+-throughput%/+-cpu%/+-per_cpu_throughput%/
1/64/+3.9/+3.8/+9.8/
2/64/-11.0/+11.2/-20.4/
4/64/-5.2/+14.4/-16.6/
8/64/-8.2/+17.2/-21.6/
1/256/+37.3/-32.0/+47.7/
2/256/+16.2/-9.7/+27.7/
4/256/+0.7/+1.5/-0.9/
8/256/+2.1/+8.6/-5.8/
1/512/+2.7/-2.2/+5.5/
2/512/+1.5/+9.4/-7.4/
4/512/-0.0/+10.7/-9.6/
8/512/+4.0/+11.6/-7.0/
1/1024/-3.1/-5.7/+3.2/
2/1024/+0.0/+5.5/-4.9/
4/1024/-0.2/+7.0/-6.9/
8/1024/-1.5/+1.3/-2.6/
1/4096/-1.0/-10.5/+10.6/
2/4096/+0.0/-0.3/+1.0/
4/4096/+0.1/+15.1/-13.0/
8/4096/-1.3/+15.5/-14.6/
1/16384/-0.5/-11.6/+12.5/
2/16384/+0.0/+6.1/-5.9/
4/16384/-1.7/+24.2/-20.5/
8/16384/-2.2/+2.8/-2.6/
1/65535/-0.9/-12.4/+13.0/
2/65535/+0.1/+4.7/-4.3/
4/65535/0.0/+10.6/-9.5/
8/65535/-1.8/+9.4/-10.1/

I've also done tests on mlx4 and saw almost the same trends, so those
results are not posted here.

Thanks

Jason Wang (6):
  virtio_ring: fix virtqueue_enable_cb() when only 1 buffer was
    pending
  virtio_ring: try to disable event index callbacks in
    virtqueue_disable_cb()
  virtio_net: enable tx interrupt
  virtio-net: add basic interrupt coalescing support
  vhost: let vhost_signal() return whether signalled
  vhost_net: interrupt coalescing support

 drivers/net/virtio_net.c        | 203 +++++++++++++++++++++++++++++++---------
 drivers/vhost/net.c             | 199 +++++++++++++++++++++++++++++++++++++--
 drivers/vhost/vhost.c           |   7 +-
 drivers/vhost/vhost.h           |   2 +-
 drivers/virtio/virtio_ring.c    |   7 +-
 include/uapi/linux/vhost.h      |  12 +++
 include/uapi/linux/virtio_net.h |  12 +++
 7 files changed, 383 insertions(+), 59 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH RFC v5 net-next 1/6] virtio_ring: fix virtqueue_enable_cb() when only 1 buffer was pending
  2015-02-09  8:39 [PATCH RFC v5 net-next 0/6] enable tx interrupts for virtio-net Jason Wang
@ 2015-02-09  8:39 ` Jason Wang
  2015-02-10  1:03   ` Rusty Russell
  2015-02-09  8:39 ` [PATCH RFC v5 net-next 2/6] virtio_ring: try to disable event index callbacks in virtqueue_disable_cb() Jason Wang
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2015-02-09  8:39 UTC (permalink / raw)
  To: netdev, linux-kernel, virtualization, mst; +Cc: pagupta

We currently do:

bufs = (avail->idx - last_used_idx) * 3 / 4;

This is ok now since we only try to enable the delayed callbacks when
the queue is about to be full. It may not work well when there is
only one pending buffer in the virtqueue (which may be the case after
tx interrupts are enabled), since virtqueue_enable_cb_delayed() will
return false and cause unnecessary NAPI scheduling. This patch
corrects this by only taking three quarters when bufs is not one.
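
For illustration, the fixed threshold arithmetic can be modeled as
plain userspace C (a hypothetical model of the math only, not kernel
code; the function name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the threshold math in
 * virtqueue_enable_cb_delayed(). Index subtraction is done in u16 so
 * it stays correct across ring-index wraparound. */
static uint16_t delayed_cb_threshold(uint16_t avail_idx, uint16_t last_used_idx)
{
	uint16_t bufs = (uint16_t)(avail_idx - last_used_idx);

	/* With one pending buffer, 1 * 3 / 4 == 0 would publish an
	 * event index equal to last_used_idx and make the enable
	 * report a missed event; keep the threshold at 1 instead. */
	if (bufs != 1)
		bufs = bufs * 3 / 4;
	return bufs;
}
```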

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 00ec6b3..545fed5 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -636,7 +636,10 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
 	 * entry. Always do both to keep code simple. */
 	vq->vring.avail->flags &= cpu_to_virtio16(_vq->vdev, ~VRING_AVAIL_F_NO_INTERRUPT);
 	/* TODO: tune this threshold */
-	bufs = (u16)(virtio16_to_cpu(_vq->vdev, vq->vring.avail->idx) - vq->last_used_idx) * 3 / 4;
+	bufs = (u16)(virtio16_to_cpu(_vq->vdev, vq->vring.avail->idx) -
+		                     vq->last_used_idx);
+	if (bufs != 1)
+		bufs = bufs * 3 / 4;
 	vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs);
 	virtio_mb(vq->weak_barriers);
 	if (unlikely((u16)(virtio16_to_cpu(_vq->vdev, vq->vring.used->idx) - vq->last_used_idx) > bufs)) {
-- 
1.8.3.1

* [PATCH RFC v5 net-next 2/6] virtio_ring: try to disable event index callbacks in virtqueue_disable_cb()
  2015-02-09  8:39 [PATCH RFC v5 net-next 0/6] enable tx interrupts for virtio-net Jason Wang
  2015-02-09  8:39 ` [PATCH RFC v5 net-next 1/6] virtio_ring: fix virtqueue_enable_cb() when only 1 buffer was pending Jason Wang
@ 2015-02-09  8:39 ` Jason Wang
  2015-02-10  1:07   ` Rusty Russell
  2015-02-10 10:24   ` Michael S. Tsirkin
  2015-02-09  8:39 ` [PATCH RFC v5 net-next 3/6] virtio_net: enable tx interrupt Jason Wang
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 24+ messages in thread
From: Jason Wang @ 2015-02-09  8:39 UTC (permalink / raw)
  To: netdev, linux-kernel, virtualization, mst; +Cc: pagupta

Currently, we do nothing to prevent the callbacks in
virtqueue_disable_cb() when the event index is used. This may cause
spurious interrupts which may hurt performance. This patch tries to
publish the avail index as the used event to prevent the callbacks.
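
For reference, the device side signals only when the published event
index lies inside the window of newly used entries (the standard
vring_need_event() check from the virtio spec), so publishing
avail->idx as the used event moves the event past everything the
device is about to complete. A small userspace sketch of that check,
assuming the spec semantics:

```c
#include <assert.h>
#include <stdint.h>

/* Model of the event-index check (vring_need_event() in the virtio
 * spec): signal only if the published event index falls within the
 * range of entries used since the last check. All arithmetic is u16
 * so it survives index wraparound. */
static int need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
	return (uint16_t)(new_idx - event_idx - 1) <
	       (uint16_t)(new_idx - old_idx);
}
```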

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 545fed5..e9ffbfb 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -539,6 +539,8 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
 	vq->vring.avail->flags |= cpu_to_virtio16(_vq->vdev, VRING_AVAIL_F_NO_INTERRUPT);
+	vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev,
+						       vq->vring.avail->idx);
 }
 EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
 
-- 
1.8.3.1

* [PATCH RFC v5 net-next 3/6] virtio_net: enable tx interrupt
  2015-02-09  8:39 [PATCH RFC v5 net-next 0/6] enable tx interrupts for virtio-net Jason Wang
  2015-02-09  8:39 ` [PATCH RFC v5 net-next 1/6] virtio_ring: fix virtqueue_enable_cb() when only 1 buffer was pending Jason Wang
  2015-02-09  8:39 ` [PATCH RFC v5 net-next 2/6] virtio_ring: try to disable event index callbacks in virtqueue_disable_cb() Jason Wang
@ 2015-02-09  8:39 ` Jason Wang
  2015-02-09  8:39 ` [PATCH RFC v5 net-next 4/6] virtio-net: add basic interrupt coalescing support Jason Wang
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 24+ messages in thread
From: Jason Wang @ 2015-02-09  8:39 UTC (permalink / raw)
  To: netdev, linux-kernel, virtualization, mst; +Cc: pagupta

On newer hosts that support delayed tx interrupts,
we probably don't have much to gain from orphaning
packets early.

Note: this might degrade performance for
hosts without event idx support.
Should be addressed by the next patch.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
Changes from RFCv4:
- change:
        virtqueue_disable_cb(sq->vq);
        napi_schedule(&sq->napi);

  in skb_xmit_done() to:

  if (napi_schedule_prep(&sq->napi)) {
     virtqueue_disable_cb(sq->vq);
     __napi_schedule(&sq->napi);
  }

  to solve the race on architectures where atomic operations are not
  serialized, and to solve a similar issue in virtnet_poll_tx().
- use netif_wake_subqueue() instead of netif_start_subqueue() in
  free_old_xmit_skbs(), since it may be called in tx napi.
- in start_xmit(), try to enable the callback only when the current
  skb is the last in the list or tx has already been stopped. This
  avoids enabling the callbacks under heavy load.
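
The prep-then-disable ordering can be sketched with an atomic
test-and-set standing in for NAPI's SCHED bit (a deliberately
simplified, hypothetical model of the real NAPI state machine; names
are illustrative):

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model: the SCHED bit acts as an ownership flag, so only
 * the context that wins the test-and-set disables callbacks and
 * schedules the poll, even if skb_xmit_done() and the tx path race. */
static atomic_flag napi_sched = ATOMIC_FLAG_INIT;
static int cb_disabled;
static int times_scheduled;

static void try_schedule(void)
{
	/* napi_schedule_prep(): atomically claim the right to schedule */
	if (!atomic_flag_test_and_set(&napi_sched)) {
		cb_disabled = 1;	/* virtqueue_disable_cb() */
		times_scheduled++;	/* __napi_schedule() */
	}
}
```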
---
 drivers/net/virtio_net.c | 136 +++++++++++++++++++++++++++++++----------------
 1 file changed, 90 insertions(+), 46 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 11e2e81..cc5f5de 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -72,6 +72,8 @@ struct send_queue {
 
 	/* Name of the send queue: output.$index */
 	char name[40];
+
+	struct napi_struct napi;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -140,6 +142,9 @@ struct virtnet_info {
 
 	/* CPU hot plug notifier */
 	struct notifier_block nb;
+
+	/* Budget for polling tx completion */
+	u32 tx_work_limit;
 };
 
 struct padded_vnet_hdr {
@@ -207,15 +212,43 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 	return p;
 }
 
+static unsigned int free_old_xmit_skbs(struct netdev_queue *txq,
+				       struct send_queue *sq, int budget)
+{
+	struct sk_buff *skb;
+	unsigned int len;
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+	unsigned int packets = 0;
+
+	while (packets < budget &&
+	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+		pr_debug("Sent skb %p\n", skb);
+
+		u64_stats_update_begin(&stats->tx_syncp);
+		stats->tx_bytes += skb->len;
+		stats->tx_packets++;
+		u64_stats_update_end(&stats->tx_syncp);
+
+		dev_kfree_skb_any(skb);
+		packets++;
+	}
+
+	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
+		netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
+
+	return packets;
+}
+
 static void skb_xmit_done(struct virtqueue *vq)
 {
 	struct virtnet_info *vi = vq->vdev->priv;
+	struct send_queue *sq = &vi->sq[vq2txq(vq)];
 
-	/* Suppress further interrupts. */
-	virtqueue_disable_cb(vq);
-
-	/* We were probably waiting for more output buffers. */
-	netif_wake_subqueue(vi->dev, vq2txq(vq));
+	if (napi_schedule_prep(&sq->napi)) {
+		virtqueue_disable_cb(sq->vq);
+		__napi_schedule(&sq->napi);
+	}
 }
 
 static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
@@ -776,6 +809,30 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
 	return received;
 }
 
+static int virtnet_poll_tx(struct napi_struct *napi, int budget)
+{
+	struct send_queue *sq =
+		container_of(napi, struct send_queue, napi);
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
+	u32 limit = vi->tx_work_limit;
+	unsigned int r, sent;
+
+	__netif_tx_lock(txq, smp_processor_id());
+	sent = free_old_xmit_skbs(txq, sq, limit);
+	if (sent < limit) {
+		r = virtqueue_enable_cb_prepare(sq->vq);
+		napi_complete(napi);
+		if (unlikely(virtqueue_poll(sq->vq, r)) &&
+		    napi_schedule_prep(napi)) {
+			virtqueue_disable_cb(sq->vq);
+			__napi_schedule(napi);
+		}
+	}
+	__netif_tx_unlock(txq);
+	return sent < limit ? 0 : budget;
+}
+
 #ifdef CONFIG_NET_RX_BUSY_POLL
 /* must be called with local_bh_disable()d */
 static int virtnet_busy_poll(struct napi_struct *napi)
@@ -824,30 +881,12 @@ static int virtnet_open(struct net_device *dev)
 			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
 		virtnet_napi_enable(&vi->rq[i]);
+		napi_enable(&vi->sq[i].napi);
 	}
 
 	return 0;
 }
 
-static void free_old_xmit_skbs(struct send_queue *sq)
-{
-	struct sk_buff *skb;
-	unsigned int len;
-	struct virtnet_info *vi = sq->vq->vdev->priv;
-	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
-
-	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
-		pr_debug("Sent skb %p\n", skb);
-
-		u64_stats_update_begin(&stats->tx_syncp);
-		stats->tx_bytes += skb->len;
-		stats->tx_packets++;
-		u64_stats_update_end(&stats->tx_syncp);
-
-		dev_kfree_skb_any(skb);
-	}
-}
-
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 {
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
@@ -910,7 +949,9 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 		sg_set_buf(sq->sg, hdr, hdr_len);
 		num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len) + 1;
 	}
-	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
+
+	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb,
+				    GFP_ATOMIC);
 }
 
 static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
@@ -922,8 +963,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
 	bool kick = !skb->xmit_more;
 
-	/* Free up any pending old buffers before queueing new ones. */
-	free_old_xmit_skbs(sq);
+	virtqueue_disable_cb(sq->vq);
 
 	/* Try to transmit */
 	err = xmit_skb(sq, skb);
@@ -939,27 +979,19 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		return NETDEV_TX_OK;
 	}
 
-	/* Don't wait up for transmitted skbs to be freed. */
-	skb_orphan(skb);
-	nf_reset(skb);
-
 	/* Apparently nice girls don't return TX_BUSY; stop the queue
 	 * before it gets out of hand.  Naturally, this wastes entries. */
-	if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
+	if (sq->vq->num_free < 2+MAX_SKB_FRAGS)
 		netif_stop_subqueue(dev, qnum);
-		if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
-			/* More just got used, free them then recheck. */
-			free_old_xmit_skbs(sq);
-			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
-				netif_start_subqueue(dev, qnum);
-				virtqueue_disable_cb(sq->vq);
-			}
-		}
-	}
 
-	if (kick || netif_xmit_stopped(txq))
+	if (kick || netif_xmit_stopped(txq)) {
 		virtqueue_kick(sq->vq);
-
+		if (!virtqueue_enable_cb_delayed(sq->vq) &&
+		    napi_schedule_prep(&sq->napi)) {
+			virtqueue_disable_cb(sq->vq);
+			__napi_schedule(&sq->napi);
+		}
+	}
 	return NETDEV_TX_OK;
 }
 
@@ -1137,8 +1169,10 @@ static int virtnet_close(struct net_device *dev)
 	/* Make sure refill_work doesn't re-enable napi! */
 	cancel_delayed_work_sync(&vi->refill);
 
-	for (i = 0; i < vi->max_queue_pairs; i++)
+	for (i = 0; i < vi->max_queue_pairs; i++) {
 		napi_disable(&vi->rq[i].napi);
+		napi_disable(&vi->sq[i].napi);
+	}
 
 	return 0;
 }
@@ -1451,8 +1485,10 @@ static void virtnet_free_queues(struct virtnet_info *vi)
 {
 	int i;
 
-	for (i = 0; i < vi->max_queue_pairs; i++)
+	for (i = 0; i < vi->max_queue_pairs; i++) {
 		netif_napi_del(&vi->rq[i].napi);
+		netif_napi_del(&vi->sq[i].napi);
+	}
 
 	kfree(vi->rq);
 	kfree(vi->sq);
@@ -1606,6 +1642,8 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
 		netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll,
 			       napi_weight);
 		napi_hash_add(&vi->rq[i].napi);
+		netif_napi_add(vi->dev, &vi->sq[i].napi, virtnet_poll_tx,
+			       napi_weight);
 
 		sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
 		ewma_init(&vi->rq[i].mrg_avg_pkt_len, 1, RECEIVE_AVG_WEIGHT);
@@ -1830,6 +1868,8 @@ static int virtnet_probe(struct virtio_device *vdev)
 	if (err)
 		goto free_stats;
 
+	vi->tx_work_limit = napi_weight;
+
 #ifdef CONFIG_SYSFS
 	if (vi->mergeable_rx_bufs)
 		dev->sysfs_rx_queue_group = &virtio_net_mrg_rx_group;
@@ -1944,8 +1984,10 @@ static int virtnet_freeze(struct virtio_device *vdev)
 	if (netif_running(vi->dev)) {
 		for (i = 0; i < vi->max_queue_pairs; i++) {
 			napi_disable(&vi->rq[i].napi);
+			napi_disable(&vi->sq[i].napi);
 			napi_hash_del(&vi->rq[i].napi);
 			netif_napi_del(&vi->rq[i].napi);
+			netif_napi_del(&vi->sq[i].napi);
 		}
 	}
 
@@ -1970,8 +2012,10 @@ static int virtnet_restore(struct virtio_device *vdev)
 			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
 
-		for (i = 0; i < vi->max_queue_pairs; i++)
+		for (i = 0; i < vi->max_queue_pairs; i++) {
 			virtnet_napi_enable(&vi->rq[i]);
+			napi_enable(&vi->sq[i].napi);
+		}
 	}
 
 	netif_device_attach(vi->dev);
-- 
1.8.3.1

* [PATCH RFC v5 net-next 4/6] virtio-net: add basic interrupt coalescing support
  2015-02-09  8:39 [PATCH RFC v5 net-next 0/6] enable tx interrupts for virtio-net Jason Wang
                   ` (2 preceding siblings ...)
  2015-02-09  8:39 ` [PATCH RFC v5 net-next 3/6] virtio_net: enable tx interrupt Jason Wang
@ 2015-02-09  8:39 ` Jason Wang
  2015-02-10  1:32   ` Rusty Russell
  2015-02-09  8:39 ` [PATCH RFC v5 net-next 5/6] vhost: let vhost_signal() return whether signalled Jason Wang
  2015-02-09  8:39 ` [PATCH RFC v5 net-next 6/6] vhost_net: interrupt coalescing support Jason Wang
  5 siblings, 1 reply; 24+ messages in thread
From: Jason Wang @ 2015-02-09  8:39 UTC (permalink / raw)
  To: netdev, linux-kernel, virtualization, mst; +Cc: pagupta

This patch enables the interrupt coalescing setting through ethtool.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c        | 67 +++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/virtio_net.h | 12 ++++++++
 2 files changed, 79 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index cc5f5de..2b958fb 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -145,6 +145,11 @@ struct virtnet_info {
 
 	/* Budget for polling tx completion */
 	u32 tx_work_limit;
+
+	__u32 rx_coalesce_usecs;
+	__u32 rx_max_coalesced_frames;
+	__u32 tx_coalesce_usecs;
+	__u32 tx_max_coalesced_frames;
 };
 
 struct padded_vnet_hdr {
@@ -1404,12 +1409,73 @@ static void virtnet_get_channels(struct net_device *dev,
 	channels->other_count = 0;
 }
 
+static int virtnet_set_coalesce(struct net_device *dev,
+				struct ethtool_coalesce *ec)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct scatterlist sg;
+	struct virtio_net_ctrl_coalesce c;
+
+	if (!vi->has_cvq ||
+	    !virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_COALESCE))
+		return -EOPNOTSUPP;
+	if (vi->rx_coalesce_usecs != ec->rx_coalesce_usecs ||
+	    vi->rx_max_coalesced_frames != ec->rx_max_coalesced_frames) {
+		c.coalesce_usecs = ec->rx_coalesce_usecs;
+		c.max_coalesced_frames = ec->rx_max_coalesced_frames;
+		sg_init_one(&sg, &c, sizeof(c));
+		if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_COALESCE,
+					  VIRTIO_NET_CTRL_COALESCE_RX_SET,
+					  &sg)) {
+			dev_warn(&dev->dev, "Failed to set rx coalescing\n");
+			return -EINVAL;
+		}
+		vi->rx_coalesce_usecs = ec->rx_coalesce_usecs;
+		vi->rx_max_coalesced_frames = ec->rx_max_coalesced_frames;
+	}
+
+	if (vi->tx_coalesce_usecs != ec->tx_coalesce_usecs ||
+	    vi->tx_max_coalesced_frames != ec->tx_max_coalesced_frames) {
+		c.coalesce_usecs = ec->tx_coalesce_usecs;
+		c.max_coalesced_frames = ec->tx_max_coalesced_frames;
+		sg_init_one(&sg, &c, sizeof(c));
+		if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_COALESCE,
+					  VIRTIO_NET_CTRL_COALESCE_TX_SET,
+					  &sg)) {
+			dev_warn(&dev->dev, "Failed to set tx coalescing\n");
+			return -EINVAL;
+		}
+		vi->tx_coalesce_usecs = ec->tx_coalesce_usecs;
+		vi->tx_max_coalesced_frames = ec->tx_max_coalesced_frames;
+	}
+
+	vi->tx_work_limit = ec->tx_max_coalesced_frames_irq;
+
+	return 0;
+}
+
+static int virtnet_get_coalesce(struct net_device *dev,
+				struct ethtool_coalesce *ec)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+
+	ec->rx_coalesce_usecs = vi->rx_coalesce_usecs;
+	ec->rx_max_coalesced_frames = vi->rx_max_coalesced_frames;
+	ec->tx_coalesce_usecs = vi->tx_coalesce_usecs;
+	ec->tx_max_coalesced_frames = vi->tx_max_coalesced_frames;
+	ec->tx_max_coalesced_frames_irq = vi->tx_work_limit;
+
+	return 0;
+}
+
 static const struct ethtool_ops virtnet_ethtool_ops = {
 	.get_drvinfo = virtnet_get_drvinfo,
 	.get_link = ethtool_op_get_link,
 	.get_ringparam = virtnet_get_ringparam,
 	.set_channels = virtnet_set_channels,
 	.get_channels = virtnet_get_channels,
+	.set_coalesce = virtnet_set_coalesce,
+	.get_coalesce = virtnet_get_coalesce,
 };
 
 #define MIN_MTU 68
@@ -2048,6 +2114,7 @@ static unsigned int features[] = {
 	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ,
 	VIRTIO_NET_F_CTRL_MAC_ADDR,
 	VIRTIO_F_ANY_LAYOUT,
+	VIRTIO_NET_F_CTRL_COALESCE,
 };
 
 static struct virtio_driver virtio_net_driver = {
diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
index b5f1677..332009d 100644
--- a/include/uapi/linux/virtio_net.h
+++ b/include/uapi/linux/virtio_net.h
@@ -34,6 +34,7 @@
 /* The feature bitmap for virtio net */
 #define VIRTIO_NET_F_CSUM	0	/* Host handles pkts w/ partial csum */
 #define VIRTIO_NET_F_GUEST_CSUM	1	/* Guest handles pkts w/ partial csum */
+#define VIRTIO_NET_F_CTRL_COALESCE 3	/* Set coalescing */
 #define VIRTIO_NET_F_MAC	5	/* Host has given MAC address. */
 #define VIRTIO_NET_F_GSO	6	/* Host handles pkts w/ any GSO type */
 #define VIRTIO_NET_F_GUEST_TSO4	7	/* Guest can handle TSOv4 in. */
@@ -202,4 +203,15 @@ struct virtio_net_ctrl_mq {
  #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN        1
  #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX        0x8000
 
+struct virtio_net_ctrl_coalesce {
+	__u32 coalesce_usecs;
+	__u32 max_coalesced_frames;
+};
+
+#define VIRTIO_NET_CTRL_COALESCE 6
+ #define VIRTIO_NET_CTRL_COALESCE_TX_SET 0
+ #define VIRTIO_NET_CTRL_COALESCE_TX_GET 1
+ #define VIRTIO_NET_CTRL_COALESCE_RX_SET 2
+ #define VIRTIO_NET_CTRL_COALESCE_RX_GET 3
+
 #endif /* _LINUX_VIRTIO_NET_H */
-- 
1.8.3.1

* [PATCH RFC v5 net-next 5/6] vhost: let vhost_signal() return whether signalled
  2015-02-09  8:39 [PATCH RFC v5 net-next 0/6] enable tx interrupts for virtio-net Jason Wang
                   ` (3 preceding siblings ...)
  2015-02-09  8:39 ` [PATCH RFC v5 net-next 4/6] virtio-net: add basic interrupt coalescing support Jason Wang
@ 2015-02-09  8:39 ` Jason Wang
  2015-02-09  8:39 ` [PATCH RFC v5 net-next 6/6] vhost_net: interrupt coalescing support Jason Wang
  5 siblings, 0 replies; 24+ messages in thread
From: Jason Wang @ 2015-02-09  8:39 UTC (permalink / raw)
  To: netdev, linux-kernel, virtualization, mst; +Cc: pagupta, Jason Wang

Let vhost_signal() return whether or not vhost has injected an
interrupt into the guest. This is used by the interrupt coalescing
implementation to calculate the interval between two interrupts.
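
A sketch of why the boolean return matters (hypothetical userspace
model; function and variable names are illustrative): the caller can
timestamp only real interrupts, so the measured interval is between
injections rather than between calls.

```c
#include <assert.h>
#include <stdint.h>

static int64_t last_irq_ns;

/* Record a signal attempt: only update the inter-interrupt interval
 * when an interrupt was actually injected (what vhost_signal() now
 * reports), not on every call. */
static void note_signal(int signalled, int64_t now_ns, int64_t *interval_ns)
{
	if (!signalled)
		return;	/* no interrupt injected: interval unchanged */
	*interval_ns = now_ns - last_irq_ns;
	last_irq_ns = now_ns;
}
```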

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 7 +++++--
 drivers/vhost/vhost.h | 2 +-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index cb807d0..20d6b84 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1480,11 +1480,14 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 }
 
 /* This actually signals the guest, using eventfd. */
-void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+bool vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 {
 	/* Signal the Guest tell them we used something up. */
-	if (vq->call_ctx && vhost_notify(dev, vq))
+	if (vq->call_ctx && vhost_notify(dev, vq)) {
 		eventfd_signal(vq->call_ctx, 1);
+		return true;
+	}
+	return false;
 }
 EXPORT_SYMBOL_GPL(vhost_signal);
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 8c1c792..a482563 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -148,7 +148,7 @@ void vhost_add_used_and_signal(struct vhost_dev *, struct vhost_virtqueue *,
 			       unsigned int id, int len);
 void vhost_add_used_and_signal_n(struct vhost_dev *, struct vhost_virtqueue *,
 			       struct vring_used_elem *heads, unsigned count);
-void vhost_signal(struct vhost_dev *, struct vhost_virtqueue *);
+bool vhost_signal(struct vhost_dev *, struct vhost_virtqueue *);
 void vhost_disable_notify(struct vhost_dev *, struct vhost_virtqueue *);
 bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *);
 
-- 
1.8.3.1

* [PATCH RFC v5 net-next 6/6] vhost_net: interrupt coalescing support
  2015-02-09  8:39 [PATCH RFC v5 net-next 0/6] enable tx interrupts for virtio-net Jason Wang
                   ` (4 preceding siblings ...)
  2015-02-09  8:39 ` [PATCH RFC v5 net-next 5/6] vhost: let vhost_signal() return whether signalled Jason Wang
@ 2015-02-09  8:39 ` Jason Wang
  5 siblings, 0 replies; 24+ messages in thread
From: Jason Wang @ 2015-02-09  8:39 UTC (permalink / raw)
  To: netdev, linux-kernel, virtualization, mst; +Cc: pagupta

This patch implements interrupt coalescing support for vhost_net and
provides ioctl()s for userspace to get and set the coalescing
parameters. Two kinds of parameters can be set:

- max_coalesced_frames: the maximum number of packets allowed before
  issuing an irq.
- coalesced_usecs: the maximum number of microseconds allowed before
  issuing an irq if at least one packet is pending.

A per-virtqueue hrtimer is used for coalesced_usecs.
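
The resulting signal decision can be sketched as follows (hypothetical
userspace model of the parameters above; times are in nanoseconds,
matching the RFCv5 change, and the names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the per-virtqueue decision: signal once
 * max_coalesced_frames packets are pending, or once coalesce_usecs
 * have elapsed since the last interrupt while at least one packet
 * waits. */
struct coalesce_state {
	uint32_t coalesced;		/* packets since last interrupt */
	uint32_t max_coalesced_frames;
	uint32_t coalesce_usecs;
	int64_t last_signal_ns;
};

static int should_signal(const struct coalesce_state *s, int64_t now_ns)
{
	if (s->coalesced == 0)
		return 0;	/* nothing pending, nothing to report */
	if (s->coalesced >= s->max_coalesced_frames)
		return 1;	/* frame budget reached */
	/* otherwise, signal only after the time budget elapses */
	return now_ns - s->last_signal_ns >=
	       (int64_t)s->coalesce_usecs * 1000;
}
```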

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
Changes from RFCv4:
- return ns instead of us in vhost_net_check_coalesce_and_signal()
- measure the time interval of real interrupts instead of calls to vhost_signal().
---
 drivers/vhost/net.c        | 199 +++++++++++++++++++++++++++++++++++++++++++--
 include/uapi/linux/vhost.h |  12 +++
 2 files changed, 202 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 6906f76..3222ac9 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -18,6 +18,7 @@
 #include <linux/file.h>
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
+#include <linux/timer.h>
 
 #include <linux/net.h>
 #include <linux/if_packet.h>
@@ -62,7 +63,8 @@ enum {
 	VHOST_NET_FEATURES = VHOST_FEATURES |
 			 (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
 			 (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
-			 (1ULL << VIRTIO_F_VERSION_1),
+			 (1ULL << VIRTIO_F_VERSION_1) |
+			 (1ULL << VIRTIO_NET_F_CTRL_COALESCE),
 };
 
 enum {
@@ -100,6 +102,15 @@ struct vhost_net_virtqueue {
 	/* Reference counting for outstanding ubufs.
 	 * Protected by vq mutex. Writers must also take device mutex. */
 	struct vhost_net_ubuf_ref *ubufs;
+	/* Microseconds after at least 1 packet is processed before
+	 * generating an interrupt.
+	 */
+	__u32 coalesce_usecs;
+	/* Maximum packets processed before generating an interrupt. */
+	__u32 max_coalesced_frames;
+	__u32 coalesced;
+	ktime_t last_signal;
+	struct hrtimer c_timer;
 };
 
 struct vhost_net {
@@ -197,11 +208,16 @@ static void vhost_net_vq_reset(struct vhost_net *n)
 	vhost_net_clear_ubuf_info(n);
 
 	for (i = 0; i < VHOST_NET_VQ_MAX; i++) {
+		hrtimer_cancel(&n->vqs[i].c_timer);
 		n->vqs[i].done_idx = 0;
 		n->vqs[i].upend_idx = 0;
 		n->vqs[i].ubufs = NULL;
 		n->vqs[i].vhost_hlen = 0;
 		n->vqs[i].sock_hlen = 0;
+		n->vqs[i].max_coalesced_frames = 0;
+		n->vqs[i].coalesce_usecs = 0;
+		n->vqs[i].last_signal = ktime_get();
+		n->vqs[i].coalesced = 0;
 	}
 
 }
@@ -273,6 +289,55 @@ static void copy_iovec_hdr(const struct iovec *from, struct iovec *to,
 	}
 }
 
+static int vhost_net_check_coalesce_and_signal(struct vhost_dev *dev,
+					       struct vhost_net_virtqueue *nvq)
+{
+	struct vhost_virtqueue *vq = &nvq->vq;
+	int left = 0;
+	ktime_t now;
+
+	if (nvq->coalesced) {
+		now = ktime_get();
+		left = nvq->coalesce_usecs -
+		       ktime_to_us(ktime_sub(now, nvq->last_signal));
+		if (left <= 0) {
+			vhost_signal(dev, vq);
+			nvq->last_signal = now;
+			nvq->coalesced = 0;
+		}
+	}
+
+	return left * NSEC_PER_USEC;
+}
+
+static bool vhost_net_add_used_and_signal_n(struct vhost_dev *dev,
+					    struct vhost_net_virtqueue *nvq,
+					    struct vring_used_elem *heads,
+					    unsigned count)
+{
+	struct vhost_virtqueue *vq = &nvq->vq;
+	bool can_coalesce = nvq->max_coalesced_frames && nvq->coalesce_usecs;
+	bool ret = false;
+
+	vhost_add_used_n(vq, heads, count);
+
+	if (can_coalesce) {
+		ktime_t now = ktime_get();
+
+		nvq->coalesced += count;
+		if (((nvq->coalesced >= nvq->max_coalesced_frames) ||
+		     (ktime_to_us(ktime_sub(now, nvq->last_signal)) >=
+		      nvq->coalesce_usecs)) && vhost_signal(dev, vq)) {
+			nvq->coalesced = 0;
+			nvq->last_signal = now;
+			ret = true;
+		}
+	} else {
+		vhost_signal(dev, vq);
+	}
+	return ret;
+}
+
 /* In case of DMA done not in order in lower device driver for some reason.
  * upend_idx is used to track end of used idx, done_idx is used to track head
  * of used idx. Once lower device DMA done contiguously, we will signal KVM
@@ -297,8 +362,8 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
 	}
 	while (j) {
 		add = min(UIO_MAXIOV - nvq->done_idx, j);
-		vhost_add_used_and_signal_n(vq->dev, vq,
-					    &vq->heads[nvq->done_idx], add);
+		vhost_net_add_used_and_signal_n(vq->dev, nvq,
+						&vq->heads[nvq->done_idx], add);
 		nvq->done_idx = (nvq->done_idx + add) % UIO_MAXIOV;
 		j -= add;
 	}
@@ -351,6 +416,7 @@ static void handle_tx(struct vhost_net *net)
 	struct socket *sock;
 	struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
 	bool zcopy, zcopy_used;
+	int left;
 
 	mutex_lock(&vq->mutex);
 	sock = vq->private_data;
@@ -362,6 +428,8 @@ static void handle_tx(struct vhost_net *net)
 	hdr_size = nvq->vhost_hlen;
 	zcopy = nvq->ubufs;
 
+	vhost_net_check_coalesce_and_signal(&net->dev, nvq);
+
 	for (;;) {
 		/* Release DMAs done buffers first */
 		if (zcopy)
@@ -444,10 +512,15 @@ static void handle_tx(struct vhost_net *net)
 		if (err != len)
 			pr_debug("Truncated TX packet: "
 				 " len %d != %zd\n", err, len);
-		if (!zcopy_used)
-			vhost_add_used_and_signal(&net->dev, vq, head, 0);
-		else
+
+		if (!zcopy_used) {
+			struct vring_used_elem heads = { head, 0 };
+
+			vhost_net_add_used_and_signal_n(&net->dev,
+							nvq, &heads, 1);
+		} else {
 			vhost_zerocopy_signal_used(net, vq);
+		}
 		total_len += len;
 		vhost_net_tx_packet(net);
 		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
@@ -455,6 +528,12 @@ static void handle_tx(struct vhost_net *net)
 			break;
 		}
 	}
+
+	left = vhost_net_check_coalesce_and_signal(&net->dev, nvq);
+	if (left > 0)
+		hrtimer_start(&nvq->c_timer, ns_to_ktime(left),
+			      HRTIMER_MODE_REL);
+
 out:
 	mutex_unlock(&vq->mutex);
 }
@@ -574,7 +653,7 @@ static void handle_rx(struct vhost_net *net)
 		.hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE
 	};
 	size_t total_len = 0;
-	int err, mergeable;
+	int err, mergeable, left;
 	s16 headcount;
 	size_t vhost_hlen, sock_hlen;
 	size_t vhost_len, sock_len;
@@ -593,6 +672,8 @@ static void handle_rx(struct vhost_net *net)
 		vq->log : NULL;
 	mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF);
 
+	vhost_net_check_coalesce_and_signal(&net->dev, nvq);
+
 	while ((sock_len = peek_head_len(sock->sk))) {
 		sock_len += sock_hlen;
 		vhost_len = sock_len + vhost_hlen;
@@ -658,8 +739,10 @@ static void handle_rx(struct vhost_net *net)
 			vhost_discard_vq_desc(vq, headcount);
 			break;
 		}
-		vhost_add_used_and_signal_n(&net->dev, vq, vq->heads,
-					    headcount);
+
+		vhost_net_add_used_and_signal_n(&net->dev, nvq,
+						vq->heads, headcount);
+
 		if (unlikely(vq_log))
 			vhost_log_write(vq, vq_log, log, vhost_len);
 		total_len += vhost_len;
@@ -668,6 +751,12 @@ static void handle_rx(struct vhost_net *net)
 			break;
 		}
 	}
+
+	left = vhost_net_check_coalesce_and_signal(&net->dev, nvq);
+	if (left > 0)
+		hrtimer_start(&nvq->c_timer, ns_to_ktime(left),
+			HRTIMER_MODE_REL);
+
 out:
 	mutex_unlock(&vq->mutex);
 }
@@ -704,6 +793,18 @@ static void handle_rx_net(struct vhost_work *work)
 	handle_rx(net);
 }
 
+static enum hrtimer_restart vhost_net_timer_handler(struct hrtimer *timer)
+{
+	struct vhost_net_virtqueue *nvq = container_of(timer,
+						struct vhost_net_virtqueue,
+						c_timer);
+	struct vhost_virtqueue *vq = &nvq->vq;
+
+	vhost_poll_queue(&vq->poll);
+
+	return HRTIMER_NORESTART;
+}
+
 static int vhost_net_open(struct inode *inode, struct file *f)
 {
 	struct vhost_net *n;
@@ -735,6 +836,13 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 		n->vqs[i].done_idx = 0;
 		n->vqs[i].vhost_hlen = 0;
 		n->vqs[i].sock_hlen = 0;
+		n->vqs[i].max_coalesced_frames = 0;
+		n->vqs[i].coalesce_usecs = 0;
+		n->vqs[i].last_signal = ktime_get();
+		n->vqs[i].coalesced = 0;
+		hrtimer_init(&n->vqs[i].c_timer, CLOCK_MONOTONIC,
+			     HRTIMER_MODE_REL);
+		n->vqs[i].c_timer.function = vhost_net_timer_handler;
 	}
 	vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
 
@@ -911,6 +1019,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 	struct vhost_virtqueue *vq;
 	struct vhost_net_virtqueue *nvq;
 	struct vhost_net_ubuf_ref *ubufs, *oldubufs = NULL;
+	unsigned int coalesced;
 	int r;
 
 	mutex_lock(&n->dev.mutex);
@@ -939,6 +1048,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 
 	/* start polling new socket */
 	oldsock = vq->private_data;
+	coalesced = nvq->coalesced;
 	if (sock != oldsock) {
 		ubufs = vhost_net_ubuf_alloc(vq,
 					     sock && vhost_sock_zcopy(sock));
@@ -973,6 +1083,12 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 		mutex_unlock(&vq->mutex);
 	}
 
+	if (coalesced) {
+		mutex_lock(&vq->mutex);
+		vhost_signal(&n->dev, vq);
+		mutex_unlock(&vq->mutex);
+	}
+
 	if (oldsock) {
 		vhost_net_flush_vq(n, index);
 		sockfd_put(oldsock);
@@ -1080,6 +1196,67 @@ out:
 	return r;
 }
 
+static long vhost_net_set_vring_coalesce(struct vhost_dev *d, void __user *argp)
+{
+	u32 __user *idxp = argp;
+	u32 idx;
+	int r;
+	struct vhost_virtqueue *vq;
+	struct vhost_net_vring_coalesce c;
+	struct vhost_net_virtqueue *nvq;
+
+	r = get_user(idx, idxp);
+	if (r < 0)
+		return r;
+	if (idx >= d->nvqs)
+		return -ENOBUFS;
+
+	vq = d->vqs[idx];
+	nvq = container_of(vq, struct vhost_net_virtqueue, vq);
+
+	r = copy_from_user(&c, argp, sizeof(c));
+	if (r)
+		return -EFAULT;
+
+	mutex_lock(&vq->mutex);
+	nvq->coalesce_usecs = c.coalesce_usecs;
+	nvq->max_coalesced_frames = c.max_coalesced_frames;
+	mutex_unlock(&vq->mutex);
+
+	return 0;
+}
+
+static long vhost_net_get_vring_coalesce(struct vhost_dev *d, void __user *argp)
+{
+	u32 __user *idxp = argp;
+	u32 idx;
+	int r;
+	struct vhost_virtqueue *vq;
+	struct vhost_net_vring_coalesce c;
+	struct vhost_net_virtqueue *nvq;
+
+	r = get_user(idx, idxp);
+	if (r < 0)
+		return r;
+	if (idx >= d->nvqs)
+		return -ENOBUFS;
+
+	vq = d->vqs[idx];
+	nvq = container_of(vq, struct vhost_net_virtqueue, vq);
+
+	mutex_lock(&vq->mutex);
+	c.index = idx;
+	c.coalesce_usecs = nvq->coalesce_usecs;
+	c.max_coalesced_frames = nvq->max_coalesced_frames;
+	mutex_unlock(&vq->mutex);
+
+	r = copy_to_user(argp, &c, sizeof(c));
+	if (r)
+		return -EFAULT;
+
+	return 0;
+}
+
 static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
 			    unsigned long arg)
 {
@@ -1110,6 +1287,10 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
 		return vhost_net_reset_owner(n);
 	case VHOST_SET_OWNER:
 		return vhost_net_set_owner(n);
+	case VHOST_NET_SET_VRING_COALESCE:
+		return vhost_net_set_vring_coalesce(&n->dev, argp);
+	case VHOST_NET_GET_VRING_COALESCE:
+		return vhost_net_get_vring_coalesce(&n->dev, argp);
 	default:
 		mutex_lock(&n->dev.mutex);
 		r = vhost_dev_ioctl(&n->dev, ioctl, argp);
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index bb6a5b4..6799cc1 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -27,6 +27,12 @@ struct vhost_vring_file {
 
 };
 
+struct vhost_net_vring_coalesce {
+	unsigned int index;
+	__u32 coalesce_usecs;
+	__u32 max_coalesced_frames;
+};
+
 struct vhost_vring_addr {
 	unsigned int index;
 	/* Option flags. */
@@ -121,6 +127,12 @@ struct vhost_memory {
  * device.  This can be used to stop the ring (e.g. for migration). */
 #define VHOST_NET_SET_BACKEND _IOW(VHOST_VIRTIO, 0x30, struct vhost_vring_file)
 
+/* Setting interrupt coalescing parameters. */
+#define VHOST_NET_SET_VRING_COALESCE \
+	_IOW(VHOST_VIRTIO, 0x31, struct vhost_net_vring_coalesce)
+/* Getting interrupt coalescing parameters. */
+#define VHOST_NET_GET_VRING_COALESCE \
+	_IOWR(VHOST_VIRTIO, 0x32, struct vhost_net_vring_coalesce)
 /* Feature bits */
 /* Log all write descriptors. Can be changed while device is active. */
 #define VHOST_F_LOG_ALL 26
-- 
1.8.3.1


* Re: [PATCH RFC v5 net-next 1/6] virtio_ring: fix virtqueue_enable_cb() when only 1 buffers were pending
  2015-02-09  8:39 ` [PATCH RFC v5 net-next 1/6] virtio_ring: fix virtqueue_enable_cb() when only 1 buffers were pending Jason Wang
@ 2015-02-10  1:03   ` Rusty Russell
  2015-02-10  6:26     ` Jason Wang
  2015-02-10 10:18     ` Michael S. Tsirkin
  0 siblings, 2 replies; 24+ messages in thread
From: Rusty Russell @ 2015-02-10  1:03 UTC (permalink / raw)
  To: Jason Wang, netdev, linux-kernel, virtualization, mst; +Cc: pagupta

Jason Wang <jasowang@redhat.com> writes:
> We currently do:
>
> bufs = (avail->idx - last_used_idx) * 3 / 4;
>
> This is OK for now, since we only try to enable the delayed callbacks
> when the queue is about to be full. It may not work well when there is
> only one pending buffer in the virtqueue (which may be the case after
> the tx interrupt was enabled), since virtqueue_enable_cb() will return
> false, which may cause unnecessary triggering of napis. This patch
> corrects this by only applying the three-quarters scaling when bufs is
> not one.

I mildly prefer to avoid the branch, by changing the calculation like
so:

        /* Set bufs >= 1, even if there's only one pending buffer */
        bufs = (bufs + 1) * 3 / 4;

But it's not clear to me how much this happens.  I'm happy with the
patch though, as currently virtqueue_enable_cb_delayed() is the same
as virtqueue_enable_cb() if there's only been one buffer added.

Cheers,
Rusty.

> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/virtio/virtio_ring.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 00ec6b3..545fed5 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -636,7 +636,10 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
>  	 * entry. Always do both to keep code simple. */
>  	vq->vring.avail->flags &= cpu_to_virtio16(_vq->vdev, ~VRING_AVAIL_F_NO_INTERRUPT);
>  	/* TODO: tune this threshold */
> -	bufs = (u16)(virtio16_to_cpu(_vq->vdev, vq->vring.avail->idx) - vq->last_used_idx) * 3 / 4;
> +	bufs = (u16)(virtio16_to_cpu(_vq->vdev, vq->vring.avail->idx) -
> +		                     vq->last_used_idx);
> +	if (bufs != 1)
> +		bufs = bufs * 3 / 4;
>  	vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs);
>  	virtio_mb(vq->weak_barriers);
>  	if (unlikely((u16)(virtio16_to_cpu(_vq->vdev, vq->vring.used->idx) - vq->last_used_idx) > bufs)) {
> -- 
> 1.8.3.1
>
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization


* Re: [PATCH RFC v5 net-next 2/6] virtio_ring: try to disable event index callbacks in virtqueue_disable_cb()
  2015-02-09  8:39 ` [PATCH RFC v5 net-next 2/6] virtio_ring: try to disable event index callbacks in virtqueue_disable_cb() Jason Wang
@ 2015-02-10  1:07   ` Rusty Russell
  2015-02-10 10:24   ` Michael S. Tsirkin
  1 sibling, 0 replies; 24+ messages in thread
From: Rusty Russell @ 2015-02-10  1:07 UTC (permalink / raw)
  To: Jason Wang, netdev, linux-kernel, virtualization, mst; +Cc: pagupta

Jason Wang <jasowang@redhat.com> writes:
> Currently, we do nothing to prevent the callbacks in
> virtqueue_disable_cb() when the event index is used. This may cause
> spurious interrupts, which may hurt performance. This patch tries to
> publish the avail index as the used event to suppress the callbacks.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Rusty Russell <rusty@rustcorp.com.au>

> ---
>  drivers/virtio/virtio_ring.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 545fed5..e9ffbfb 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -539,6 +539,8 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
>  	struct vring_virtqueue *vq = to_vvq(_vq);
>  
>  	vq->vring.avail->flags |= cpu_to_virtio16(_vq->vdev, VRING_AVAIL_F_NO_INTERRUPT);
> +	vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev,
> +						       vq->vring.avail->idx);
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
>  
> -- 
> 1.8.3.1
>


* Re: [PATCH RFC v5 net-next 4/6] virtio-net: add basic interrupt coalescing support
  2015-02-09  8:39 ` [PATCH RFC v5 net-next 4/6] virtio-net: add basic interrupt coalescing support Jason Wang
@ 2015-02-10  1:32   ` Rusty Russell
  2015-02-10  6:51     ` Jason Wang
  2015-02-10 10:40     ` Michael S. Tsirkin
  0 siblings, 2 replies; 24+ messages in thread
From: Rusty Russell @ 2015-02-10  1:32 UTC (permalink / raw)
  To: Jason Wang, netdev, linux-kernel, virtualization, mst; +Cc: pagupta

Jason Wang <jasowang@redhat.com> writes:
> This patch enables the interrupt coalescing setting through ethtool.

The problem is that there's nothing network specific about interrupt
coalescing.  I can see other devices wanting exactly the same thing,
which means we'd deprecate this in the next virtio standard.

I think the right answer is to extend like we did with
vring_used_event(), eg:

1) Add a new feature VIRTIO_F_RING_COALESCE.
2) Add another 32-bit field after vring_used_event(), eg:
        #define vring_used_delay(vr) (*(u32 *)&(vr)->avail->ring[(vr)->num + 2])

This loses the ability to coalesce by number of frames, but we can still
do number of sg entries, as we do now with used_event, and we could
change virtqueue_enable_cb_delayed() to take a precise number if we
wanted.

My feeling is that this should be a v1.0-only feature though
(eg. feature bit 33).

Cheers,
Rusty.

> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/virtio_net.c        | 67 +++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/virtio_net.h | 12 ++++++++
>  2 files changed, 79 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index cc5f5de..2b958fb 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -145,6 +145,11 @@ struct virtnet_info {
>  
>  	/* Budget for polling tx completion */
>  	u32 tx_work_limit;
> +
> +	__u32 rx_coalesce_usecs;
> +	__u32 rx_max_coalesced_frames;
> +	__u32 tx_coalesce_usecs;
> +	__u32 tx_max_coalesced_frames;
>  };
>  
>  struct padded_vnet_hdr {
> @@ -1404,12 +1409,73 @@ static void virtnet_get_channels(struct net_device *dev,
>  	channels->other_count = 0;
>  }
>  
> +static int virtnet_set_coalesce(struct net_device *dev,
> +				struct ethtool_coalesce *ec)
> +{
> +	struct virtnet_info *vi = netdev_priv(dev);
> +	struct scatterlist sg;
> +	struct virtio_net_ctrl_coalesce c;
> +
> +	if (!vi->has_cvq ||
> +	    !virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_COALESCE))
> +		return -EOPNOTSUPP;
> +	if (vi->rx_coalesce_usecs != ec->rx_coalesce_usecs ||
> +	    vi->rx_max_coalesced_frames != ec->rx_max_coalesced_frames) {
> +		c.coalesce_usecs = ec->rx_coalesce_usecs;
> +		c.max_coalesced_frames = ec->rx_max_coalesced_frames;
> +		sg_init_one(&sg, &c, sizeof(c));
> +		if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_COALESCE,
> +					  VIRTIO_NET_CTRL_COALESCE_RX_SET,
> +					  &sg)) {
> +			dev_warn(&dev->dev, "Failed to set rx coalescing\n");
> +			return -EINVAL;
> +		}
> +		vi->rx_coalesce_usecs = ec->rx_coalesce_usecs;
> +		vi->rx_max_coalesced_frames = ec->rx_max_coalesced_frames;
> +	}
> +
> +	if (vi->tx_coalesce_usecs != ec->tx_coalesce_usecs ||
> +	    vi->tx_max_coalesced_frames != ec->tx_max_coalesced_frames) {
> +		c.coalesce_usecs = ec->tx_coalesce_usecs;
> +		c.max_coalesced_frames = ec->tx_max_coalesced_frames;
> +		sg_init_one(&sg, &c, sizeof(c));
> +		if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_COALESCE,
> +					  VIRTIO_NET_CTRL_COALESCE_TX_SET,
> +					  &sg)) {
> +			dev_warn(&dev->dev, "Failed to set tx coalescing\n");
> +			return -EINVAL;
> +		}
> +		vi->tx_coalesce_usecs = ec->tx_coalesce_usecs;
> +		vi->tx_max_coalesced_frames = ec->tx_max_coalesced_frames;
> +	}
> +
> +	vi->tx_work_limit = ec->tx_max_coalesced_frames_irq;
> +
> +	return 0;
> +}
> +
> +static int virtnet_get_coalesce(struct net_device *dev,
> +				struct ethtool_coalesce *ec)
> +{
> +	struct virtnet_info *vi = netdev_priv(dev);
> +
> +	ec->rx_coalesce_usecs = vi->rx_coalesce_usecs;
> +	ec->rx_max_coalesced_frames = vi->rx_max_coalesced_frames;
> +	ec->tx_coalesce_usecs = vi->tx_coalesce_usecs;
> +	ec->tx_max_coalesced_frames = vi->tx_max_coalesced_frames;
> +	ec->tx_max_coalesced_frames_irq = vi->tx_work_limit;
> +
> +	return 0;
> +}
> +
>  static const struct ethtool_ops virtnet_ethtool_ops = {
>  	.get_drvinfo = virtnet_get_drvinfo,
>  	.get_link = ethtool_op_get_link,
>  	.get_ringparam = virtnet_get_ringparam,
>  	.set_channels = virtnet_set_channels,
>  	.get_channels = virtnet_get_channels,
> +	.set_coalesce = virtnet_set_coalesce,
> +	.get_coalesce = virtnet_get_coalesce,
>  };
>  
>  #define MIN_MTU 68
> @@ -2048,6 +2114,7 @@ static unsigned int features[] = {
>  	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ,
>  	VIRTIO_NET_F_CTRL_MAC_ADDR,
>  	VIRTIO_F_ANY_LAYOUT,
> +	VIRTIO_NET_F_CTRL_COALESCE,
>  };
>  
>  static struct virtio_driver virtio_net_driver = {
> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> index b5f1677..332009d 100644
> --- a/include/uapi/linux/virtio_net.h
> +++ b/include/uapi/linux/virtio_net.h
> @@ -34,6 +34,7 @@
>  /* The feature bitmap for virtio net */
>  #define VIRTIO_NET_F_CSUM	0	/* Host handles pkts w/ partial csum */
>  #define VIRTIO_NET_F_GUEST_CSUM	1	/* Guest handles pkts w/ partial csum */
> +#define VIRTIO_NET_F_CTRL_COALESCE 3	/* Set coalescing */
>  #define VIRTIO_NET_F_MAC	5	/* Host has given MAC address. */
>  #define VIRTIO_NET_F_GSO	6	/* Host handles pkts w/ any GSO type */
>  #define VIRTIO_NET_F_GUEST_TSO4	7	/* Guest can handle TSOv4 in. */
> @@ -202,4 +203,15 @@ struct virtio_net_ctrl_mq {
>   #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN        1
>   #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX        0x8000
>  
> +struct virtio_net_ctrl_coalesce {
> +	__u32 coalesce_usecs;
> +	__u32 max_coalesced_frames;
> +};
> +
> +#define VIRTIO_NET_CTRL_COALESCE 6
> + #define VIRTIO_NET_CTRL_COALESCE_TX_SET 0
> + #define VIRTIO_NET_CTRL_COALESCE_TX_GET 1
> + #define VIRTIO_NET_CTRL_COALESCE_RX_SET 2
> + #define VIRTIO_NET_CTRL_COALESCE_RX_GET 3
> +
>  #endif /* _LINUX_VIRTIO_NET_H */
> -- 
> 1.8.3.1


* Re: [PATCH RFC v5 net-next 1/6] virtio_ring: fix virtqueue_enable_cb() when only 1 buffers were pending
  2015-02-10  1:03   ` Rusty Russell
@ 2015-02-10  6:26     ` Jason Wang
  2015-02-10 10:18     ` Michael S. Tsirkin
  1 sibling, 0 replies; 24+ messages in thread
From: Jason Wang @ 2015-02-10  6:26 UTC (permalink / raw)
  To: Rusty Russell; +Cc: pagupta, netdev, mst, linux-kernel, virtualization



On Tue, Feb 10, 2015 at 9:03 AM, Rusty Russell <rusty@rustcorp.com.au> 
wrote:
> Jason Wang <jasowang@redhat.com> writes:
>>  We currently do:
>> 
>>  bufs = (avail->idx - last_used_idx) * 3 / 4;
>> 
>>  This is OK for now, since we only try to enable the delayed
>>  callbacks when the queue is about to be full. It may not work well
>>  when there is only one pending buffer in the virtqueue (which may
>>  be the case after the tx interrupt was enabled), since
>>  virtqueue_enable_cb() will return false, which may cause
>>  unnecessary triggering of napis. This patch corrects this by only
>>  applying the three-quarters scaling when bufs is not one.
> 
> I mildly prefer to avoid the branch, by changing the calculation like
> so:
> 
>         /* Set bufs >= 1, even if there's only one pending buffer */
>         bufs = (bufs + 1) * 3 / 4;

Ok.
> 
> But it's not clear to me how much this happens. 

It depends on the traffic type. In some cases, e.g. a one-session
TCP_RR test (which has at most one pending packet), it may happen very
often.
>  I'm happy with the
> patch though, as currently virtqueue_enable_cb_delayed() is the same
> as virtqueue_enable_cb() if there's only been one buffer added.

Yes. Thanks.
> 
> Cheers,
> Rusty.
> 
>>  Signed-off-by: Jason Wang <jasowang@redhat.com>
>>  ---
>>   drivers/virtio/virtio_ring.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>> 
>>  diff --git a/drivers/virtio/virtio_ring.c 
>> b/drivers/virtio/virtio_ring.c
>>  index 00ec6b3..545fed5 100644
>>  --- a/drivers/virtio/virtio_ring.c
>>  +++ b/drivers/virtio/virtio_ring.c
>>  @@ -636,7 +636,10 @@ bool virtqueue_enable_cb_delayed(struct 
>> virtqueue *_vq)
>>   	 * entry. Always do both to keep code simple. */
>>   	vq->vring.avail->flags &= cpu_to_virtio16(_vq->vdev, 
>> ~VRING_AVAIL_F_NO_INTERRUPT);
>>   	/* TODO: tune this threshold */
>>  -	bufs = (u16)(virtio16_to_cpu(_vq->vdev, vq->vring.avail->idx) - 
>> vq->last_used_idx) * 3 / 4;
>>  +	bufs = (u16)(virtio16_to_cpu(_vq->vdev, vq->vring.avail->idx) -
>>  +		                     vq->last_used_idx);
>>  +	if (bufs != 1)
>>  +		bufs = bufs * 3 / 4;
>>   	vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, 
>> vq->last_used_idx + bufs);
>>   	virtio_mb(vq->weak_barriers);
>>   	if (unlikely((u16)(virtio16_to_cpu(_vq->vdev, 
>> vq->vring.used->idx) - vq->last_used_idx) > bufs)) {
>>  -- 
>>  1.8.3.1
>> 


* Re: [PATCH RFC v5 net-next 4/6] virtio-net: add basic interrupt coalescing support
  2015-02-10  1:32   ` Rusty Russell
@ 2015-02-10  6:51     ` Jason Wang
  2015-02-10 10:25       ` Michael S. Tsirkin
  2015-02-10 10:40     ` Michael S. Tsirkin
  1 sibling, 1 reply; 24+ messages in thread
From: Jason Wang @ 2015-02-10  6:51 UTC (permalink / raw)
  To: Rusty Russell; +Cc: pagupta, netdev, mst, linux-kernel, virtualization



On Tue, Feb 10, 2015 at 9:32 AM, Rusty Russell <rusty@rustcorp.com.au> 
wrote:
> Jason Wang <jasowang@redhat.com> writes:
>>  This patch enables the interrupt coalescing setting through ethtool.
> 
> The problem is that there's nothing network specific about interrupt
> coalescing.  I can see other devices wanting exactly the same thing,
> which means we'd deprecate this in the next virtio standard.
> 
> I think the right answer is to extend like we did with
> vring_used_event(), eg:
> 
> 1) Add a new feature VIRTIO_F_RING_COALESCE.
> 2) Add another 32-bit field after vring_used_event(), eg:
>         #define vring_used_delay(vr) (*(u32 *)&(vr)->avail->ring[(vr)->num + 2])

Yes. This looks better, and we don't even need a device-specific
configuration method.

> 
> This loses the ability to coalesce by number of frames, but we can 
> still
> do number of sg entries, as we do now with used_event, and we could
> change virtqueue_enable_cb_delayed() to take a precise number if we
> wanted.

Can we give this a device-specific meaning? For virtio-net, we want to
expose the coalescing settings through ethtool (tx-frames). It is
usually used together with a timer, so we probably want another field
after vring_used_delay() for the timer interval, to trigger the
interrupt if no new used buffers arrive within that interval.

> 
> 
> My feeling is that this should be a v1.0-only feature though
> (eg. feature bit 33).

Yes it should.

> 
> Cheers,
> Rusty.
> 
>>  Cc: Rusty Russell <rusty@rustcorp.com.au>
>>  Cc: Michael S. Tsirkin <mst@redhat.com>
>>  Signed-off-by: Jason Wang <jasowang@redhat.com>
>>  ---
>>   drivers/net/virtio_net.c        | 67 
>> +++++++++++++++++++++++++++++++++++++++++
>>   include/uapi/linux/virtio_net.h | 12 ++++++++
>>   2 files changed, 79 insertions(+)
>> 
>>  diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>  index cc5f5de..2b958fb 100644
>>  --- a/drivers/net/virtio_net.c
>>  +++ b/drivers/net/virtio_net.c
>>  @@ -145,6 +145,11 @@ struct virtnet_info {
>>   
>>   	/* Budget for polling tx completion */
>>   	u32 tx_work_limit;
>>  +
>>  +	__u32 rx_coalesce_usecs;
>>  +	__u32 rx_max_coalesced_frames;
>>  +	__u32 tx_coalesce_usecs;
>>  +	__u32 tx_max_coalesced_frames;
>>   };
>>   
>>   struct padded_vnet_hdr {
>>  @@ -1404,12 +1409,73 @@ static void virtnet_get_channels(struct 
>> net_device *dev,
>>   	channels->other_count = 0;
>>   }
>>   
>>  +static int virtnet_set_coalesce(struct net_device *dev,
>>  +				struct ethtool_coalesce *ec)
>>  +{
>>  +	struct virtnet_info *vi = netdev_priv(dev);
>>  +	struct scatterlist sg;
>>  +	struct virtio_net_ctrl_coalesce c;
>>  +
>>  +	if (!vi->has_cvq ||
>>  +	    !virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_COALESCE))
>>  +		return -EOPNOTSUPP;
>>  +	if (vi->rx_coalesce_usecs != ec->rx_coalesce_usecs ||
>>  +	    vi->rx_max_coalesced_frames != ec->rx_max_coalesced_frames) {
>>  +		c.coalesce_usecs = ec->rx_coalesce_usecs;
>>  +		c.max_coalesced_frames = ec->rx_max_coalesced_frames;
>>  +		sg_init_one(&sg, &c, sizeof(c));
>>  +		if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_COALESCE,
>>  +					  VIRTIO_NET_CTRL_COALESCE_RX_SET,
>>  +					  &sg)) {
>>  +			dev_warn(&dev->dev, "Failed to set rx coalescing\n");
>>  +			return -EINVAL;
>>  +		}
>>  +		vi->rx_coalesce_usecs = ec->rx_coalesce_usecs;
>>  +		vi->rx_max_coalesced_frames = ec->rx_max_coalesced_frames;
>>  +	}
>>  +
>>  +	if (vi->tx_coalesce_usecs != ec->tx_coalesce_usecs ||
>>  +	    vi->tx_max_coalesced_frames != ec->tx_max_coalesced_frames) {
>>  +		c.coalesce_usecs = ec->tx_coalesce_usecs;
>>  +		c.max_coalesced_frames = ec->tx_max_coalesced_frames;
>>  +		sg_init_one(&sg, &c, sizeof(c));
>>  +		if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_COALESCE,
>>  +					  VIRTIO_NET_CTRL_COALESCE_TX_SET,
>>  +					  &sg)) {
>>  +			dev_warn(&dev->dev, "Failed to set tx coalescing\n");
>>  +			return -EINVAL;
>>  +		}
>>  +		vi->tx_coalesce_usecs = ec->tx_coalesce_usecs;
>>  +		vi->tx_max_coalesced_frames = ec->tx_max_coalesced_frames;
>>  +	}
>>  +
>>  +	vi->tx_work_limit = ec->tx_max_coalesced_frames_irq;
>>  +
>>  +	return 0;
>>  +}
>>  +
>>  +static int virtnet_get_coalesce(struct net_device *dev,
>>  +				struct ethtool_coalesce *ec)
>>  +{
>>  +	struct virtnet_info *vi = netdev_priv(dev);
>>  +
>>  +	ec->rx_coalesce_usecs = vi->rx_coalesce_usecs;
>>  +	ec->rx_max_coalesced_frames = vi->rx_max_coalesced_frames;
>>  +	ec->tx_coalesce_usecs = vi->tx_coalesce_usecs;
>>  +	ec->tx_max_coalesced_frames = vi->tx_max_coalesced_frames;
>>  +	ec->tx_max_coalesced_frames_irq = vi->tx_work_limit;
>>  +
>>  +	return 0;
>>  +}
>>  +
>>   static const struct ethtool_ops virtnet_ethtool_ops = {
>>   	.get_drvinfo = virtnet_get_drvinfo,
>>   	.get_link = ethtool_op_get_link,
>>   	.get_ringparam = virtnet_get_ringparam,
>>   	.set_channels = virtnet_set_channels,
>>   	.get_channels = virtnet_get_channels,
>>  +	.set_coalesce = virtnet_set_coalesce,
>>  +	.get_coalesce = virtnet_get_coalesce,
>>   };
>>   
>>   #define MIN_MTU 68
>>  @@ -2048,6 +2114,7 @@ static unsigned int features[] = {
>>   	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ,
>>   	VIRTIO_NET_F_CTRL_MAC_ADDR,
>>   	VIRTIO_F_ANY_LAYOUT,
>>  +	VIRTIO_NET_F_CTRL_COALESCE,
>>   };
>>   
>>   static struct virtio_driver virtio_net_driver = {
>>  diff --git a/include/uapi/linux/virtio_net.h 
>> b/include/uapi/linux/virtio_net.h
>>  index b5f1677..332009d 100644
>>  --- a/include/uapi/linux/virtio_net.h
>>  +++ b/include/uapi/linux/virtio_net.h
>>  @@ -34,6 +34,7 @@
>>   /* The feature bitmap for virtio net */
>>   #define VIRTIO_NET_F_CSUM	0	/* Host handles pkts w/ partial csum */
>>   #define VIRTIO_NET_F_GUEST_CSUM	1	/* Guest handles pkts w/ partial 
>> csum */
>>  +#define VIRTIO_NET_F_CTRL_COALESCE 3	/* Set coalescing */
>>   #define VIRTIO_NET_F_MAC	5	/* Host has given MAC address. */
>>   #define VIRTIO_NET_F_GSO	6	/* Host handles pkts w/ any GSO type */
>>   #define VIRTIO_NET_F_GUEST_TSO4	7	/* Guest can handle TSOv4 in. */
>>  @@ -202,4 +203,15 @@ struct virtio_net_ctrl_mq {
>>    #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN        1
>>    #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX        0x8000
>>   
>>  +struct virtio_net_ctrl_coalesce {
>>  +	__u32 coalesce_usecs;
>>  +	__u32 max_coalesced_frames;
>>  +};
>>  +
>>  +#define VIRTIO_NET_CTRL_COALESCE 6
>>  + #define VIRTIO_NET_CTRL_COALESCE_TX_SET 0
>>  + #define VIRTIO_NET_CTRL_COALESCE_TX_GET 1
>>  + #define VIRTIO_NET_CTRL_COALESCE_RX_SET 2
>>  + #define VIRTIO_NET_CTRL_COALESCE_RX_GET 3
>>  +
>>   #endif /* _LINUX_VIRTIO_NET_H */
>>  -- 
>>  1.8.3.1
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC v5 net-next 1/6] virtio_ring: fix virtqueue_enable_cb() when only 1 buffers were pending
  2015-02-10  1:03   ` Rusty Russell
  2015-02-10  6:26     ` Jason Wang
@ 2015-02-10 10:18     ` Michael S. Tsirkin
  2015-02-10 23:58       ` Rusty Russell
  2015-02-11  5:41       ` Jason Wang
  1 sibling, 2 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2015-02-10 10:18 UTC (permalink / raw)
  To: Rusty Russell; +Cc: pagupta, netdev, linux-kernel, virtualization

On Tue, Feb 10, 2015 at 11:33:52AM +1030, Rusty Russell wrote:
> Jason Wang <jasowang@redhat.com> writes:
> > We currently do:
> >
> > bufs = (avail->idx - last_used_idx) * 3 / 4;
> >
> > This is ok now since we only try to enable the delayed callbacks when
> > the queue is about to be full. This may not work well when there is
> > only one pending buffer in the virtqueue (this may be the case after
> > tx interrupt was enabled), since virtqueue_enable_cb() will return
> > false, which may cause unnecessary triggering of NAPI. This patch
> > corrects this by computing the three-fourths scaling only when bufs is not one.
> 
> I mildly prefer to avoid the branch, by changing the calculation like
> so:
> 
>         /* Set bufs >= 1, even if there's only one pending buffer */
>         bufs = (bufs + 1) * 3 / 4;

Or bufs * 3/4 + 1
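
For illustration, the candidate formulas only differ at small queue depths; a quick
standalone sketch (helper names are mine, not from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Threshold candidates for virtqueue_enable_cb_delayed(), operating on
 * the number of pending buffers (avail->idx - last_used_idx).
 * "orig" is the current driver calculation, "patched" is Jason's branch,
 * "rusty" and "mst" are the two branch-free alternatives above. */
static uint16_t thresh_orig(uint16_t bufs)    { return bufs * 3 / 4; }
static uint16_t thresh_patched(uint16_t bufs) { return bufs != 1 ? bufs * 3 / 4 : bufs; }
static uint16_t thresh_rusty(uint16_t bufs)   { return (bufs + 1) * 3 / 4; }
static uint16_t thresh_mst(uint16_t bufs)     { return bufs * 3 / 4 + 1; }
```

With one pending buffer the original yields 0 (the used event index lands on
last_used_idx, i.e. it degenerates into plain virtqueue_enable_cb()), while all
three alternatives yield 1; at larger depths they stay within one of each other.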

> But it's not clear to me how much this happens.  I'm happy with the
> patch though, as currently virtqueue_enable_cb_delayed() is the same
> as virtqueue_enable_cb() if there's only been one buffer added.
> 
> Cheers,
> Rusty.

But isn't this by design?
The documentation says:

 * This re-enables callbacks but hints to the other side to delay
 * interrupts until most of the available buffers have been processed;

Clearly, this implies that when there's one buffer it must behave
exactly the same.

So I'm not very happy - this changes the meaning of the API without
updating the documentation.


> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> >  drivers/virtio/virtio_ring.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 00ec6b3..545fed5 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -636,7 +636,10 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
> >  	 * entry. Always do both to keep code simple. */
> >  	vq->vring.avail->flags &= cpu_to_virtio16(_vq->vdev, ~VRING_AVAIL_F_NO_INTERRUPT);
> >  	/* TODO: tune this threshold */
> > -	bufs = (u16)(virtio16_to_cpu(_vq->vdev, vq->vring.avail->idx) - vq->last_used_idx) * 3 / 4;
> > +	bufs = (u16)(virtio16_to_cpu(_vq->vdev, vq->vring.avail->idx) -
> > +		                     vq->last_used_idx);
> > +	if (bufs != 1)
> > +		bufs = bufs * 3 / 4;
> >  	vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs);
> >  	virtio_mb(vq->weak_barriers);
> >  	if (unlikely((u16)(virtio16_to_cpu(_vq->vdev, vq->vring.used->idx) - vq->last_used_idx) > bufs)) {
> > -- 
> > 1.8.3.1
> >
> > _______________________________________________
> > Virtualization mailing list
> > Virtualization@lists.linux-foundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC v5 net-next 2/6] virtio_ring: try to disable event index callbacks in virtqueue_disable_cb()
  2015-02-09  8:39 ` [PATCH RFC v5 net-next 2/6] virtio_ring: try to disable event index callbacks in virtqueue_disable_cb() Jason Wang
  2015-02-10  1:07   ` Rusty Russell
@ 2015-02-10 10:24   ` Michael S. Tsirkin
  2015-02-11  5:55     ` Jason Wang
  1 sibling, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2015-02-10 10:24 UTC (permalink / raw)
  To: Jason Wang; +Cc: pagupta, netdev, linux-kernel, virtualization

On Mon, Feb 09, 2015 at 03:39:21AM -0500, Jason Wang wrote:
> Currently, we do nothing to prevent the callbacks in
> virtqueue_disable_cb() when event index is used. This may cause
> spurious interrupts which may damage the performance. This patch tries
> to publish the avail event as the used event to prevent the callbacks.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

I'm surprised that this ever happens though.
Normally we call this after getting an interrupt, and
> interrupts won't trigger again until the ring wraps around.

When I tested this, touching an extra cache line was more
expensive.

Does this really reduce number of interrupts?
Could you pls share some numbers with and without this patch?


> ---
>  drivers/virtio/virtio_ring.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 545fed5..e9ffbfb 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -539,6 +539,8 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
>  	struct vring_virtqueue *vq = to_vvq(_vq);
>  
>  	vq->vring.avail->flags |= cpu_to_virtio16(_vq->vdev, VRING_AVAIL_F_NO_INTERRUPT);
> +	vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev,
> +						       vq->vring.avail->idx);

Hmm in fact, can't this actually cause an extra interrupt
when avail->idx is completed?

I think that if you really can show disabling interrupts like this helps, you should
set some invalid value like 0xffff, or move it back to vq->vring.avail->idx - 1.
No?



>  }
>  EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
>  
> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC v5 net-next 4/6] virtio-net: add basic interrupt coalescing support
  2015-02-10  6:51     ` Jason Wang
@ 2015-02-10 10:25       ` Michael S. Tsirkin
  0 siblings, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2015-02-10 10:25 UTC (permalink / raw)
  To: Jason Wang; +Cc: pagupta, netdev, linux-kernel, virtualization

On Tue, Feb 10, 2015 at 06:59:30AM +0008, Jason Wang wrote:
> 
> 
> On Tue, Feb 10, 2015 at 9:32 AM, Rusty Russell <rusty@rustcorp.com.au>
> wrote:
> >Jason Wang <jasowang@redhat.com> writes:
> >> This patch enables the interrupt coalescing setting through ethtool.
> >
> >The problem is that there's nothing network specific about interrupt
> >coalescing.  I can see other devices wanting exactly the same thing,
> >which means we'd deprecate this in the next virtio standard.
> >
> >I think the right answer is to extend like we did with
> >vring_used_event(), eg:
> >
> >1) Add a new feature VIRTIO_F_RING_COALESCE.
> >2) Add another 32-bit field after vring_used_event(), eg:
> >        #define vring_used_delay(vr) (*(u32 *)((vr)->avail->ring[(vr)->num + 2]))
> 
> Yes. This looks better and we don't even need device specific configuration
> method.
> 
> >
> >This loses the ability to coalesce by number of frames, but we can still
> >do number of sg entries, as we do now with used_event, and we could
> >change virtqueue_enable_cb_delayed() to take a precise number if we
> >wanted.
> 
> Can we give a device-specific meaning for this? For virtio-net, we want to
> expose the coalescing settings through ethtool (tx-frames). And it is
> usually used with a timer, so we probably need another field after
> vring_used_delay() holding a timer interval, to trigger the interrupt if no
> new used buffers come within that interval.

I think what Rusty has in mind is precisely sticking the delay
in vring_used_delay.
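
The tx-frames plus timer semantics being discussed can be sketched as host-side
logic (purely illustrative; struct and field names are hypothetical, not from
this series or from vhost): fire the tx interrupt once max_coalesced_frames
used buffers have accumulated, or when the timeout elapses with completions
still pending.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side coalescing state for one tx queue. */
struct coalesce_state {
	uint32_t max_frames;    /* tx-frames: interrupt after this many */
	uint32_t usecs;         /* timeout if fewer frames are pending */
	uint32_t pending;       /* used buffers not yet signalled */
	uint64_t first_pending; /* time (us) the oldest completion has waited since */
};

/* Called each time the host marks a buffer used; returns true if the
 * guest should be interrupted now. */
static bool coalesce_on_used(struct coalesce_state *c, uint64_t now_us)
{
	if (c->pending++ == 0)
		c->first_pending = now_us;
	if (c->pending >= c->max_frames ||
	    now_us - c->first_pending >= c->usecs) {
		c->pending = 0;
		return true;    /* fire the interrupt */
	}
	return false;           /* keep coalescing */
}
```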


> >
> >
> >My feeling is that this should be a v1.0-only feature though
> >(eg. feature bit 33).
> 
> Yes it should.
> 
> >
> >Cheers,
> >Rusty.
> >
> >> Cc: Rusty Russell <rusty@rustcorp.com.au>
> >> Cc: Michael S. Tsirkin <mst@redhat.com>
> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >> ---
> >>  drivers/net/virtio_net.c        | 67
> >>+++++++++++++++++++++++++++++++++++++++++
> >>  include/uapi/linux/virtio_net.h | 12 ++++++++
> >>  2 files changed, 79 insertions(+)
> >>
> >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >> index cc5f5de..2b958fb 100644
> >> --- a/drivers/net/virtio_net.c
> >> +++ b/drivers/net/virtio_net.c
> >> @@ -145,6 +145,11 @@ struct virtnet_info {
> >>  	/* Budget for polling tx completion */
> >>  	u32 tx_work_limit;
> >> +
> >> +	__u32 rx_coalesce_usecs;
> >> +	__u32 rx_max_coalesced_frames;
> >> +	__u32 tx_coalesce_usecs;
> >> +	__u32 tx_max_coalesced_frames;
> >>  };
> >>  struct padded_vnet_hdr {
> >> @@ -1404,12 +1409,73 @@ static void virtnet_get_channels(struct
> >>net_device *dev,
> >>  	channels->other_count = 0;
> >>  }
> >> +static int virtnet_set_coalesce(struct net_device *dev,
> >> +				struct ethtool_coalesce *ec)
> >> +{
> >> +	struct virtnet_info *vi = netdev_priv(dev);
> >> +	struct scatterlist sg;
> >> +	struct virtio_net_ctrl_coalesce c;
> >> +
> >> +	if (!vi->has_cvq ||
> >> +	    !virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_COALESCE))
> >> +		return -EOPNOTSUPP;
> >> +	if (vi->rx_coalesce_usecs != ec->rx_coalesce_usecs ||
> >> +	    vi->rx_max_coalesced_frames != ec->rx_max_coalesced_frames) {
> >> +		c.coalesce_usecs = ec->rx_coalesce_usecs;
> >> +		c.max_coalesced_frames = ec->rx_max_coalesced_frames;
> >> +		sg_init_one(&sg, &c, sizeof(c));
> >> +		if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_COALESCE,
> >> +					  VIRTIO_NET_CTRL_COALESCE_RX_SET,
> >> +					  &sg)) {
> >> +			dev_warn(&dev->dev, "Fail to set rx coalescing\n");
> >> +			return -EINVAL;
> >> +		}
> >> +		vi->rx_coalesce_usecs = ec->rx_coalesce_usecs;
> >> +		vi->rx_max_coalesced_frames = ec->rx_max_coalesced_frames;
> >> +	}
> >> +
> >> +	if (vi->tx_coalesce_usecs != ec->tx_coalesce_usecs ||
> >> +	    vi->tx_max_coalesced_frames != ec->tx_max_coalesced_frames) {
> >> +		c.coalesce_usecs = ec->tx_coalesce_usecs;
> >> +		c.max_coalesced_frames = ec->tx_max_coalesced_frames;
> >> +		sg_init_one(&sg, &c, sizeof(c));
> >> +		if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_COALESCE,
> >> +					  VIRTIO_NET_CTRL_COALESCE_TX_SET,
> >> +					  &sg)) {
> >> +			dev_warn(&dev->dev, "Fail to set tx coalescing\n");
> >> +			return -EINVAL;
> >> +		}
> >> +		vi->tx_coalesce_usecs = ec->tx_coalesce_usecs;
> >> +		vi->tx_max_coalesced_frames = ec->tx_max_coalesced_frames;
> >> +	}
> >> +
> >> +	vi->tx_work_limit = ec->tx_max_coalesced_frames_irq;
> >> +
> >> +	return 0;
> >> +}
> >> +
> >> +static int virtnet_get_coalesce(struct net_device *dev,
> >> +				struct ethtool_coalesce *ec)
> >> +{
> >> +	struct virtnet_info *vi = netdev_priv(dev);
> >> +
> >> +	ec->rx_coalesce_usecs = vi->rx_coalesce_usecs;
> >> +	ec->rx_max_coalesced_frames = vi->rx_max_coalesced_frames;
> >> +	ec->tx_coalesce_usecs = vi->tx_coalesce_usecs;
> >> +	ec->tx_max_coalesced_frames = vi->tx_max_coalesced_frames;
> >> +	ec->tx_max_coalesced_frames_irq = vi->tx_work_limit;
> >> +
> >> +	return 0;
> >> +}
> >> +
> >>  static const struct ethtool_ops virtnet_ethtool_ops = {
> >>  	.get_drvinfo = virtnet_get_drvinfo,
> >>  	.get_link = ethtool_op_get_link,
> >>  	.get_ringparam = virtnet_get_ringparam,
> >>  	.set_channels = virtnet_set_channels,
> >>  	.get_channels = virtnet_get_channels,
> >> +	.set_coalesce = virtnet_set_coalesce,
> >> +	.get_coalesce = virtnet_get_coalesce,
> >>  };
> >>  #define MIN_MTU 68
> >> @@ -2048,6 +2114,7 @@ static unsigned int features[] = {
> >>  	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ,
> >>  	VIRTIO_NET_F_CTRL_MAC_ADDR,
> >>  	VIRTIO_F_ANY_LAYOUT,
> >> +	VIRTIO_NET_F_CTRL_COALESCE,
> >>  };
> >>  static struct virtio_driver virtio_net_driver = {
> >> diff --git a/include/uapi/linux/virtio_net.h
> >>b/include/uapi/linux/virtio_net.h
> >> index b5f1677..332009d 100644
> >> --- a/include/uapi/linux/virtio_net.h
> >> +++ b/include/uapi/linux/virtio_net.h
> >> @@ -34,6 +34,7 @@
> >>  /* The feature bitmap for virtio net */
> >>  #define VIRTIO_NET_F_CSUM	0	/* Host handles pkts w/ partial csum */
> >>  #define VIRTIO_NET_F_GUEST_CSUM	1	/* Guest handles pkts w/ partial
> >>csum */
> >> +#define VIRTIO_NET_F_CTRL_COALESCE 3	/* Set coalescing */
> >>  #define VIRTIO_NET_F_MAC	5	/* Host has given MAC address. */
> >>  #define VIRTIO_NET_F_GSO	6	/* Host handles pkts w/ any GSO type */
> >>  #define VIRTIO_NET_F_GUEST_TSO4	7	/* Guest can handle TSOv4 in. */
> >> @@ -202,4 +203,15 @@ struct virtio_net_ctrl_mq {
> >>   #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN        1
> >>   #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX        0x8000
> >> +struct virtio_net_ctrl_coalesce {
> >> +	__u32 coalesce_usecs;
> >> +	__u32 max_coalesced_frames;
> >> +};
> >> +
> >> +#define VIRTIO_NET_CTRL_COALESCE 6
> >> + #define VIRTIO_NET_CTRL_COALESCE_TX_SET 0
> >> + #define VIRTIO_NET_CTRL_COALESCE_TX_GET 1
> >> + #define VIRTIO_NET_CTRL_COALESCE_RX_SET 2
> >> + #define VIRTIO_NET_CTRL_COALESCE_RX_GET 3
> >> +
> >>  #endif /* _LINUX_VIRTIO_NET_H */
> >> --  1.8.3.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC v5 net-next 4/6] virtio-net: add basic interrupt coalescing support
  2015-02-10  1:32   ` Rusty Russell
  2015-02-10  6:51     ` Jason Wang
@ 2015-02-10 10:40     ` Michael S. Tsirkin
  2015-02-13  2:52       ` Rusty Russell
  1 sibling, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2015-02-10 10:40 UTC (permalink / raw)
  To: Rusty Russell; +Cc: pagupta, netdev, linux-kernel, virtualization

On Tue, Feb 10, 2015 at 12:02:37PM +1030, Rusty Russell wrote:
> Jason Wang <jasowang@redhat.com> writes:
> > This patch enables the interrupt coalescing setting through ethtool.
> 
> The problem is that there's nothing network specific about interrupt
> coalescing.  I can see other devices wanting exactly the same thing,
> which means we'd deprecate this in the next virtio standard.
> 
> I think the right answer is to extend like we did with
> vring_used_event(), eg:
> 
> 1) Add a new feature VIRTIO_F_RING_COALESCE.
> 2) Add another a 32-bit field after vring_used_event(), eg:
>         #define vring_used_delay(vr) (*(u32 *)((vr)->avail->ring[(vr)->num + 2]))
> 
> This loses the ability to coalesce by number of frames, but we can still
> do number of sg entries, as we do now with used_event, and we could
> change virtqueue_enable_cb_delayed() to take a precise number if we
> wanted.

But do we expect delay to be updated dynamically?
If not, why not stick it in config space?
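
For reference, Rusty's sketch extends the same trick used for
vring_used_event()/vring_avail_event(), which park extra fields after the ring
arrays. A standalone layout illustration (hypothetical struct with vr->num
fixed at 4; the kernel expresses this with macros over the flexible ring, and
note the address-of that the one-line macro abbreviates):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified legacy avail ring layout showing where the existing event
 * index and the proposed 32-bit delay field would live. */
struct avail_sketch {
	uint16_t flags;       /* offset 0 */
	uint16_t idx;         /* offset 2 */
	uint16_t ring[4];     /* offsets 4..11; vr->num == 4 here */
	uint16_t used_event;  /* offset 12: vring_used_event(vr), i.e. ring[num] */
	uint16_t pad;         /* offset 14: keeps the next field 4-byte aligned */
	uint32_t delay;       /* offset 16: proposed vring_used_delay(vr), ring[num + 2] */
};

/* In the style of the kernel's vring_avail_event()/vring_used_event(). */
#define sketch_used_event(a)  (*(uint16_t *)&(a)->ring[4])
#define sketch_used_delay(a)  (*(uint32_t *)&(a)->ring[4 + 2])
```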

> My feeling is that this should be a v1.0-only feature though
> (eg. feature bit 33).
> 
> Cheers,
> Rusty.

Yes, e.g. we can't extend config space for legacy virtio pci.

> > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> >  drivers/net/virtio_net.c        | 67 +++++++++++++++++++++++++++++++++++++++++
> >  include/uapi/linux/virtio_net.h | 12 ++++++++
> >  2 files changed, 79 insertions(+)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index cc5f5de..2b958fb 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -145,6 +145,11 @@ struct virtnet_info {
> >  
> >  	/* Budget for polling tx completion */
> >  	u32 tx_work_limit;
> > +
> > +	__u32 rx_coalesce_usecs;
> > +	__u32 rx_max_coalesced_frames;
> > +	__u32 tx_coalesce_usecs;
> > +	__u32 tx_max_coalesced_frames;
> >  };
> >  
> >  struct padded_vnet_hdr {
> > @@ -1404,12 +1409,73 @@ static void virtnet_get_channels(struct net_device *dev,
> >  	channels->other_count = 0;
> >  }
> >  
> > +static int virtnet_set_coalesce(struct net_device *dev,
> > +				struct ethtool_coalesce *ec)
> > +{
> > +	struct virtnet_info *vi = netdev_priv(dev);
> > +	struct scatterlist sg;
> > +	struct virtio_net_ctrl_coalesce c;
> > +
> > +	if (!vi->has_cvq ||
> > +	    !virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_COALESCE))
> > +		return -EOPNOTSUPP;
> > +	if (vi->rx_coalesce_usecs != ec->rx_coalesce_usecs ||
> > +	    vi->rx_max_coalesced_frames != ec->rx_max_coalesced_frames) {
> > +		c.coalesce_usecs = ec->rx_coalesce_usecs;
> > +		c.max_coalesced_frames = ec->rx_max_coalesced_frames;
> > +		sg_init_one(&sg, &c, sizeof(c));
> > +		if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_COALESCE,
> > +					  VIRTIO_NET_CTRL_COALESCE_RX_SET,
> > +					  &sg)) {
> > +			dev_warn(&dev->dev, "Fail to set rx coalescing\n");
> > +			return -EINVAL;
> > +		}
> > +		vi->rx_coalesce_usecs = ec->rx_coalesce_usecs;
> > +		vi->rx_max_coalesced_frames = ec->rx_max_coalesced_frames;
> > +	}
> > +
> > +	if (vi->tx_coalesce_usecs != ec->tx_coalesce_usecs ||
> > +	    vi->tx_max_coalesced_frames != ec->tx_max_coalesced_frames) {
> > +		c.coalesce_usecs = ec->tx_coalesce_usecs;
> > +		c.max_coalesced_frames = ec->tx_max_coalesced_frames;
> > +		sg_init_one(&sg, &c, sizeof(c));
> > +		if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_COALESCE,
> > +					  VIRTIO_NET_CTRL_COALESCE_TX_SET,
> > +					  &sg)) {
> > +			dev_warn(&dev->dev, "Fail to set tx coalescing\n");
> > +			return -EINVAL;
> > +		}
> > +		vi->tx_coalesce_usecs = ec->tx_coalesce_usecs;
> > +		vi->tx_max_coalesced_frames = ec->tx_max_coalesced_frames;
> > +	}
> > +
> > +	vi->tx_work_limit = ec->tx_max_coalesced_frames_irq;
> > +
> > +	return 0;
> > +}
> > +
> > +static int virtnet_get_coalesce(struct net_device *dev,
> > +				struct ethtool_coalesce *ec)
> > +{
> > +	struct virtnet_info *vi = netdev_priv(dev);
> > +
> > +	ec->rx_coalesce_usecs = vi->rx_coalesce_usecs;
> > +	ec->rx_max_coalesced_frames = vi->rx_max_coalesced_frames;
> > +	ec->tx_coalesce_usecs = vi->tx_coalesce_usecs;
> > +	ec->tx_max_coalesced_frames = vi->tx_max_coalesced_frames;
> > +	ec->tx_max_coalesced_frames_irq = vi->tx_work_limit;
> > +
> > +	return 0;
> > +}
> > +
> >  static const struct ethtool_ops virtnet_ethtool_ops = {
> >  	.get_drvinfo = virtnet_get_drvinfo,
> >  	.get_link = ethtool_op_get_link,
> >  	.get_ringparam = virtnet_get_ringparam,
> >  	.set_channels = virtnet_set_channels,
> >  	.get_channels = virtnet_get_channels,
> > +	.set_coalesce = virtnet_set_coalesce,
> > +	.get_coalesce = virtnet_get_coalesce,
> >  };
> >  
> >  #define MIN_MTU 68
> > @@ -2048,6 +2114,7 @@ static unsigned int features[] = {
> >  	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ,
> >  	VIRTIO_NET_F_CTRL_MAC_ADDR,
> >  	VIRTIO_F_ANY_LAYOUT,
> > +	VIRTIO_NET_F_CTRL_COALESCE,
> >  };
> >  
> >  static struct virtio_driver virtio_net_driver = {
> > diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> > index b5f1677..332009d 100644
> > --- a/include/uapi/linux/virtio_net.h
> > +++ b/include/uapi/linux/virtio_net.h
> > @@ -34,6 +34,7 @@
> >  /* The feature bitmap for virtio net */
> >  #define VIRTIO_NET_F_CSUM	0	/* Host handles pkts w/ partial csum */
> >  #define VIRTIO_NET_F_GUEST_CSUM	1	/* Guest handles pkts w/ partial csum */
> > +#define VIRTIO_NET_F_CTRL_COALESCE 3	/* Set coalescing */
> >  #define VIRTIO_NET_F_MAC	5	/* Host has given MAC address. */
> >  #define VIRTIO_NET_F_GSO	6	/* Host handles pkts w/ any GSO type */
> >  #define VIRTIO_NET_F_GUEST_TSO4	7	/* Guest can handle TSOv4 in. */
> > @@ -202,4 +203,15 @@ struct virtio_net_ctrl_mq {
> >   #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN        1
> >   #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX        0x8000
> >  
> > +struct virtio_net_ctrl_coalesce {
> > +	__u32 coalesce_usecs;
> > +	__u32 max_coalesced_frames;
> > +};
> > +
> > +#define VIRTIO_NET_CTRL_COALESCE 6
> > + #define VIRTIO_NET_CTRL_COALESCE_TX_SET 0
> > + #define VIRTIO_NET_CTRL_COALESCE_TX_GET 1
> > + #define VIRTIO_NET_CTRL_COALESCE_RX_SET 2
> > + #define VIRTIO_NET_CTRL_COALESCE_RX_GET 3
> > +
> >  #endif /* _LINUX_VIRTIO_NET_H */
> > -- 
> > 1.8.3.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC v5 net-next 1/6] virtio_ring: fix virtqueue_enable_cb() when only 1 buffers were pending
  2015-02-10 10:18     ` Michael S. Tsirkin
@ 2015-02-10 23:58       ` Rusty Russell
  2015-02-11  5:41       ` Jason Wang
  1 sibling, 0 replies; 24+ messages in thread
From: Rusty Russell @ 2015-02-10 23:58 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: pagupta, netdev, linux-kernel, virtualization

"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Tue, Feb 10, 2015 at 11:33:52AM +1030, Rusty Russell wrote:
>> Jason Wang <jasowang@redhat.com> writes:
>> > We currently does:
>> >
>> > bufs = (avail->idx - last_used_idx) * 3 / 4;
>> >
>> > This is ok now since we only try to enable the delayed callbacks when
>> > the queue is about to be full. This may not work well when there is
>> > only one pending buffer in the virtqueue (this may be the case after
>> > tx interrupt was enabled). Since virtqueue_enable_cb() will return
>> > false which may cause unnecessary triggering of napis. This patch
>> > correct this by only calculate the four thirds when bufs is not one.
>> 
>> I mildly prefer to avoid the branch, by changing the calculation like
>> so:
>> 
>>         /* Set bufs >= 1, even if there's only one pending buffer */
>>         bufs = (bufs + 1) * 3 / 4;
>
> Or bus * 3/4 + 1
>
>> But it's not clear to me how much this happens.  I'm happy with the
>> patch though, as currently virtqueue_enable_cb_delayed() is the same
>> as virtqueue_enable_cb() if there's only been one buffer added.
>> 
>> Cheers,
>> Rusty.
>
> But isn't this by design?
> The documentation says:
>
>  * This re-enables callbacks but hints to the other side to delay
>  * interrupts until most of the available buffers have been processed;
>
> Clearly, this implies that when there's one buffer it must behave
> exactly the same.

Yes, my mistake.  We could hit the "never gets notified" case with this
change, and that's a much bigger problem.

So I don't think we can accept this patch...
Rusty.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC v5 net-next 1/6] virtio_ring: fix virtqueue_enable_cb() when only 1 buffers were pending
  2015-02-10 10:18     ` Michael S. Tsirkin
  2015-02-10 23:58       ` Rusty Russell
@ 2015-02-11  5:41       ` Jason Wang
  1 sibling, 0 replies; 24+ messages in thread
From: Jason Wang @ 2015-02-11  5:41 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: pagupta, netdev, linux-kernel, virtualization



On Tue, Feb 10, 2015 at 6:18 PM, Michael S. Tsirkin <mst@redhat.com> 
wrote:
> On Tue, Feb 10, 2015 at 11:33:52AM +1030, Rusty Russell wrote:
>>  Jason Wang <jasowang@redhat.com> writes:
>>  > We currently does:
>>  >
>>  > bufs = (avail->idx - last_used_idx) * 3 / 4;
>>  >
>>  > This is ok now since we only try to enable the delayed callbacks 
>> when
>>  > the queue is about to be full. This may not work well when there 
>> is
>>  > only one pending buffer in the virtqueue (this may be the case 
>> after
>>  > tx interrupt was enabled). Since virtqueue_enable_cb() will return
>>  > false which may cause unnecessary triggering of napis. This patch
>>  > correct this by only calculate the four thirds when bufs is not 
>> one.
>>  
>>  I mildly prefer to avoid the branch, by changing the calculation 
>> like
>>  so:
>>  
>>          /* Set bufs >= 1, even if there's only one pending buffer */
>>          bufs = (bufs + 1) * 3 / 4;
> 
> Or bus * 3/4 + 1
> 
>>  But it's not clear to me how much this happens.  I'm happy with the
>>  patch though, as currently virtqueue_enable_cb_delayed() is the same
>>  as virtqueue_enable_cb() if there's only been one buffer added.
>>  
>>  Cheers,
>>  Rusty.
> 
> But isn't this by design?
> The documentation says:
> 
>  * This re-enables callbacks but hints to the other side to delay
>  * interrupts until most of the available buffers have been processed;
> 
> Clearly, this implies that when there's one buffer it must behave
> exactly the same.
> 
> So I'm not very happy - this changes the meaning of the API without
> updating the documentation.

Thought hard about this, and it looks like you are right. The patch may
in fact cause more trouble, e.g. if the only pending buffer is consumed
before the final check between used->idx and last_used_idx.

Will drop this.

Thanks	

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC v5 net-next 2/6] virtio_ring: try to disable event index callbacks in virtqueue_disable_cb()
  2015-02-10 10:24   ` Michael S. Tsirkin
@ 2015-02-11  5:55     ` Jason Wang
  0 siblings, 0 replies; 24+ messages in thread
From: Jason Wang @ 2015-02-11  5:55 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, virtualization, pagupta



On Tue, Feb 10, 2015 at 6:24 PM, Michael S. Tsirkin <mst@redhat.com> 
wrote:
> On Mon, Feb 09, 2015 at 03:39:21AM -0500, Jason Wang wrote:
>>  Currently, we do nothing to prevent the callbacks in
>>  virtqueue_disable_cb() when event index is used. This may cause
>>  spurious interrupts which may damage the performance. This patch 
>> tries
>>  to publish avail event as the used even to prevent the callbacks.
>>  
>>  Signed-off-by: Jason Wang <jasowang@redhat.com>
> 
> I'm surprised that this ever happens though.
> Normally we call this after getting an interrupt, and
> interrupts won't trigger again until the rings wraps around.

It was used to disable tx interrupt during start_xmit().
> 
> When I tested this, touching an extra cache line was more
> expensive.
> 
> Does this really reduce number of interrupts?
> Could you pls share some numbers with and without this patch?

Yes, it does. See the following test results for multiple sessions of
TCP_RR:

Datacopy:
/size/sessions/+-throughput%/+-cpu%/+-per_cpu_throughput%/+-tx_interrupts%/
/1/50/+0.1/+0.7/-0.6/-72.3/
/64/50/+1.7/+1.0/+0.7/-72.2/
/256/50/+0.0/-0.8/+0.9/-72.4/

Zerocopy:
/size/sessions/+-throughput%/+-cpu%/+-per_cpu_throughput%/+-tx_interrupts%/
/1/50/+3.2/-1.3/+4.6/-71.7/
/64/50/+1.9/-0.2/+2.1/-70.8/
/256/50/+4.4/+0.6/+3.9/-84.2/

No obvious changes for stream rx and tx.
> 
> 
> 
>>  ---
>>   drivers/virtio/virtio_ring.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>  
>>  diff --git a/drivers/virtio/virtio_ring.c 
>> b/drivers/virtio/virtio_ring.c
>>  index 545fed5..e9ffbfb 100644
>>  --- a/drivers/virtio/virtio_ring.c
>>  +++ b/drivers/virtio/virtio_ring.c
>>  @@ -539,6 +539,8 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
>>   	struct vring_virtqueue *vq = to_vvq(_vq);
>>   
>>   	vq->vring.avail->flags |= cpu_to_virtio16(_vq->vdev, 
>> VRING_AVAIL_F_NO_INTERRUPT);
>>  +	vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev,
>>  +						       vq->vring.avail->idx);
> 
> Hmm in fact, can't this actually cause an extra interrupt
> when avail->idx is completed?

We need to try our best to avoid any interrupt after this, and
virtqueue_disable_cb() always needs a virtqueue_enable_cb() or
virtqueue_enable_cb_delayed() afterwards. Those two functions will check
the pending used buffers and set a proper used event.
> 
> 
> I think that if you really can show disabling interrupts like this 
> helps, you should
> set some invalid value like 0xfffff, or move it back to 
> vq->vring.avail->idx - 1.
> No?

I tested avail->idx + vq.num, but saw no obvious change in performance
compared to just using avail->idx here, so 0xffff probably won't help
much. Since we call virtqueue_enable_cb() or
virtqueue_enable_cb_delayed() afterwards, it doesn't matter
whether avail->idx or avail->idx - 1 is used.
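
The suppression works because, with event index negotiated, the device only
signals when its used index steps past the published event index; the check
below mirrors vring_need_event() from include/uapi/linux/virtio_ring.h:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same logic as the kernel's vring_need_event(): interrupt only if
 * new_idx has just stepped past event_idx, all in mod-2^16 arithmetic. */
static bool need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}
```

With the used event published as avail->idx (ahead of everything the device has
consumed), need_event() stays false until the device's used index catches up
with avail->idx, so no interrupt fires while callbacks are meant to be disabled.
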
> 
> 
> 
> 
>>   }
>>   EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
>>   
>>  -- 
>>  1.8.3.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC v5 net-next 4/6] virtio-net: add basic interrupt coalescing support
  2015-02-10 10:40     ` Michael S. Tsirkin
@ 2015-02-13  2:52       ` Rusty Russell
  2015-02-13 12:41         ` Pawel Moll
  2015-02-13 18:19         ` Cornelia Huck
  0 siblings, 2 replies; 24+ messages in thread
From: Rusty Russell @ 2015-02-13  2:52 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, netdev, linux-kernel, virtualization, pagupta,
	Pawel Moll, Cornelia Huck

"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Tue, Feb 10, 2015 at 12:02:37PM +1030, Rusty Russell wrote:
>> Jason Wang <jasowang@redhat.com> writes:
>> > This patch enables the interrupt coalescing setting through ethtool.
>> 
>> The problem is that there's nothing network specific about interrupt
>> coalescing.  I can see other devices wanting exactly the same thing,
>> which means we'd deprecate this in the next virtio standard.
>> 
>> I think the right answer is to extend like we did with
>> vring_used_event(), eg:
>> 
>> 1) Add a new feature VIRTIO_F_RING_COALESCE.
>> 2) Add another a 32-bit field after vring_used_event(), eg:
>>         #define vring_used_delay(vr) (*(u32 *)((vr)->avail->ring[(vr)->num + 2]))
>> 
>> This loses the ability to coalesce by number of frames, but we can still
>> do number of sg entries, as we do now with used_event, and we could
>> change virtqueue_enable_cb_delayed() to take a precise number if we
>> wanted.
>
> But do we expect delay to be update dynamically?
> If not, why not stick it in config space?

Hmm, we could update it dynamically (and will, in the case of ethtool).
But it won't be common, so we could append a field to
virtio_pci_common_cfg for PCI.

I think MMIO and CCW would be easy to extend too, but CC'd to check.

>> My feeling is that this should be a v1.0-only feature though
>> (eg. feature bit 33).
>
> Yes, e.g. we can't extend config space for legacy virtio pci.

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC v5 net-next 4/6] virtio-net: add basic interrupt coalescing support
  2015-02-13  2:52       ` Rusty Russell
@ 2015-02-13 12:41         ` Pawel Moll
  2015-02-16  3:07           ` Rusty Russell
  2015-02-13 18:19         ` Cornelia Huck
  1 sibling, 1 reply; 24+ messages in thread
From: Pawel Moll @ 2015-02-13 12:41 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Michael S. Tsirkin, Jason Wang, netdev, linux-kernel,
	virtualization, pagupta, Cornelia Huck

On Fri, 2015-02-13 at 02:52 +0000, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > On Tue, Feb 10, 2015 at 12:02:37PM +1030, Rusty Russell wrote:
> >> Jason Wang <jasowang@redhat.com> writes:
> >> > This patch enables the interrupt coalescing setting through ethtool.
> >> 
> >> The problem is that there's nothing network specific about interrupt
> >> coalescing.  I can see other devices wanting exactly the same thing,
> >> which means we'd deprecate this in the next virtio standard.
> >> 
> >> I think the right answer is to extend like we did with
> >> vring_used_event(), eg:
> >> 
> >> 1) Add a new feature VIRTIO_F_RING_COALESCE.
> >> 2) Add another a 32-bit field after vring_used_event(), eg:
> >>         #define vring_used_delay(vr) (*(u32 *)((vr)->avail->ring[(vr)->num + 2]))
> >> 
> >> This loses the ability to coalesce by number of frames, but we can still
> >> do number of sg entries, as we do now with used_event, and we could
> >> change virtqueue_enable_cb_delayed() to take a precise number if we
> >> wanted.
> >
> > But do we expect delay to be update dynamically?
> > If not, why not stick it in config space?
> 
> Hmm, we could update it dynamically (and will, in the case of ethtool).
> But it won't be common, so we could append a field to
> virtio_pci_common_cfg for PCI.
> 
> I think MMIO and CCW would be easy to extend too, but CC'd to check.

As far as I understand the virtio_pci_common_cfg principle (just had a
look, for the first time ;-), it's now an equivalent of the MMIO control
registers block. I see no major problem with adding another one.

Or were you thinking about introducing some standard for the "real"
config space? (fine with me as well - the transport will have nothing to
do :-)

Paweł

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC v5 net-next 4/6] virtio-net: add basic interrupt coalescing support
  2015-02-13  2:52       ` Rusty Russell
  2015-02-13 12:41         ` Pawel Moll
@ 2015-02-13 18:19         ` Cornelia Huck
  2015-02-16  3:19           ` Rusty Russell
  1 sibling, 1 reply; 24+ messages in thread
From: Cornelia Huck @ 2015-02-13 18:19 UTC (permalink / raw)
  To: Rusty Russell
  Cc: pagupta, Pawel Moll, Michael S. Tsirkin, netdev, linux-kernel,
	virtualization

On Fri, 13 Feb 2015 13:22:09 +1030
Rusty Russell <rusty@rustcorp.com.au> wrote:

> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > On Tue, Feb 10, 2015 at 12:02:37PM +1030, Rusty Russell wrote:
> >> Jason Wang <jasowang@redhat.com> writes:
> >> > This patch enables the interrupt coalescing setting through ethtool.
> >> 
> >> The problem is that there's nothing network specific about interrupt
> >> coalescing.  I can see other devices wanting exactly the same thing,
> >> which means we'd deprecate this in the next virtio standard.
> >> 
> >> I think the right answer is to extend like we did with
> >> vring_used_event(), eg:
> >> 
> >> 1) Add a new feature VIRTIO_F_RING_COALESCE.
> >> 2) Add another 32-bit field after vring_used_event(), eg:
> >>         #define vring_used_delay(vr) (*(u32 *)&(vr)->avail->ring[(vr)->num + 2])
> >> 
> >> This loses the ability to coalesce by number of frames, but we can still
> >> do number of sg entries, as we do now with used_event, and we could
> >> change virtqueue_enable_cb_delayed() to take a precise number if we
> >> wanted.
> >
> > But do we expect delay to be updated dynamically?
> > If not, why not stick it in config space?
> 
> Hmm, we could update it dynamically (and will, in the case of ethtool).
> But it won't be common, so we could append a field to
> virtio_pci_common_cfg for PCI.
> 
> I think MMIO and CCW would be easy to extend too, but CC'd to check.

If this is a simple extension of the config space, it should just work
for ccw (the Linux guest driver currently uses 0x100 as max config
space size, which I grabbed from pci at the time I wrote it).

But looking at this virtio_pci_common_cfg stuff, it seems to contain a
lot of things that are handled via ccws on virtio-ccw (like number of
queues or device status). Having an extra ccw just for changing this
delay value seems like overkill.

On the basic topic of interrupt coalescing: With adapter interrupts,
virtio-ccw already has some kind of coalescing: The summary indicator
is set just once and an interrupt is made pending, then individual
queue indicators are switched on and no further interrupt is generated
if the summary indicator has not been cleared by the guest yet. I'm not
sure how it would be different if an individual queue indicator is
switched on later. Chances are that the guest code processing the
indicators has not even reached that individual indicator yet, so
it wouldn't matter if it was set with a delay. It is probably something that
has to be tried out.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC v5 net-next 4/6] virtio-net: add basic interrupt coalescing support
  2015-02-13 12:41         ` Pawel Moll
@ 2015-02-16  3:07           ` Rusty Russell
  0 siblings, 0 replies; 24+ messages in thread
From: Rusty Russell @ 2015-02-16  3:07 UTC (permalink / raw)
  To: Pawel Moll
  Cc: pagupta, Michael S. Tsirkin, netdev, linux-kernel, virtualization

Pawel Moll <pawel.moll@arm.com> writes:
> On Fri, 2015-02-13 at 02:52 +0000, Rusty Russell wrote:
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> > On Tue, Feb 10, 2015 at 12:02:37PM +1030, Rusty Russell wrote:
>> >> Jason Wang <jasowang@redhat.com> writes:
>> >> > This patch enables the interrupt coalescing setting through ethtool.
>> >> 
>> >> The problem is that there's nothing network specific about interrupt
>> >> coalescing.  I can see other devices wanting exactly the same thing,
>> >> which means we'd deprecate this in the next virtio standard.
>> >> 
>> >> I think the right answer is to extend like we did with
>> >> vring_used_event(), eg:
>> >> 
>> >> 1) Add a new feature VIRTIO_F_RING_COALESCE.
>> >> 2) Add another 32-bit field after vring_used_event(), eg:
>> >>         #define vring_used_delay(vr) (*(u32 *)&(vr)->avail->ring[(vr)->num + 2])
>> >> 
>> >> This loses the ability to coalesce by number of frames, but we can still
>> >> do number of sg entries, as we do now with used_event, and we could
>> >> change virtqueue_enable_cb_delayed() to take a precise number if we
>> >> wanted.
>> >
>> > But do we expect delay to be updated dynamically?
>> > If not, why not stick it in config space?
>> 
>> Hmm, we could update it dynamically (and will, in the case of ethtool).
>> But it won't be common, so we could append a field to
>> virtio_pci_common_cfg for PCI.
>> 
>> I think MMIO and CCW would be easy to extend too, but CC'd to check.
>
> As far as I understand the virtio_pci_common_cfg principle (just had a
> look, for the first time ;-), it's now an equivalent of the MMIO control
> registers block. I see no major problem with adding another one.

OK, thanks.

> Or were you thinking about introducing some standard for the "real"
> config space? (fine with me as well - the transport will have nothing to
> do :-)

No, that'd not be possible at this point.  I think it's a per-transport
decision.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC v5 net-next 4/6] virtio-net: add basic interrupt coalescing support
  2015-02-13 18:19         ` Cornelia Huck
@ 2015-02-16  3:19           ` Rusty Russell
  0 siblings, 0 replies; 24+ messages in thread
From: Rusty Russell @ 2015-02-16  3:19 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Michael S. Tsirkin, Jason Wang, netdev, linux-kernel,
	virtualization, pagupta, Pawel Moll

Cornelia Huck <cornelia.huck@de.ibm.com> writes:
> On Fri, 13 Feb 2015 13:22:09 +1030
> Rusty Russell <rusty@rustcorp.com.au> wrote:
>
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> > On Tue, Feb 10, 2015 at 12:02:37PM +1030, Rusty Russell wrote:
>> >> Jason Wang <jasowang@redhat.com> writes:
>> >> > This patch enables the interrupt coalescing setting through ethtool.
>> >> 
>> >> The problem is that there's nothing network specific about interrupt
>> >> coalescing.  I can see other devices wanting exactly the same thing,
>> >> which means we'd deprecate this in the next virtio standard.
>> >> 
>> >> I think the right answer is to extend like we did with
>> >> vring_used_event(), eg:
>> >> 
>> >> 1) Add a new feature VIRTIO_F_RING_COALESCE.
>> >> 2) Add another 32-bit field after vring_used_event(), eg:
>> >>         #define vring_used_delay(vr) (*(u32 *)&(vr)->avail->ring[(vr)->num + 2])
>> >> 
>> >> This loses the ability to coalesce by number of frames, but we can still
>> >> do number of sg entries, as we do now with used_event, and we could
>> >> change virtqueue_enable_cb_delayed() to take a precise number if we
>> >> wanted.
>> >
>> > But do we expect delay to be updated dynamically?
>> > If not, why not stick it in config space?
>> 
>> Hmm, we could update it dynamically (and will, in the case of ethtool).
>> But it won't be common, so we could append a field to
>> virtio_pci_common_cfg for PCI.
>> 
>> I think MMIO and CCW would be easy to extend too, but CC'd to check.
>
> If this is a simple extension of the config space, it should just work
> for ccw (the Linux guest driver currently uses 0x100 as max config
> space size, which I grabbed from pci at the time I wrote it).
>
> But looking at this virtio_pci_common_cfg stuff, it seems to contain a
> lot of things that are handled via ccws on virtio-ccw (like number of
> queues or device status). Having an extra ccw just for changing this
> delay value seems like overkill.

Yes, possibly.

> On the basic topic of interrupt coalescing: With adapter interrupts,
> virtio-ccw already has some kind of coalescing: The summary indicator
> is set just once and an interrupt is made pending, then individual
> queue indicators are switched on and no further interrupt is generated
> if the summary indicator has not been cleared by the guest yet. I'm not
> sure how it would be different if an individual queue indicator is
> switched on later. Chances are that the guest code processing the
> indicators has not even reached that individual indicator yet, so
> it wouldn't matter if it was set with a delay. It is probably something that
> has to be tried out.

The network driver will do this at the virtio level too: no more rx
interrupts will be received until all packets have been processed.

But it is particularly useful for network transmit interrupts: we want
to be notified of the packet's finishing, but a little delay (hence more
batching) is better.  For rx, I can envision a case where the guest is
too fast and thus keeps getting interrupted after each packet.  A user
might decide to trade off some latency to increase batching; seems
a bit like a benchmark hack to me, though...

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2015-02-16  4:10 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-09  8:39 [PATCH RFC v5 net-next 0/6] enable tx interrupts for virtio-net Jason Wang
2015-02-09  8:39 ` [PATCH RFC v5 net-next 1/6] virtio_ring: fix virtqueue_enable_cb() when only 1 buffers were pending Jason Wang
2015-02-10  1:03   ` Rusty Russell
2015-02-10  6:26     ` Jason Wang
2015-02-10 10:18     ` Michael S. Tsirkin
2015-02-10 23:58       ` Rusty Russell
2015-02-11  5:41       ` Jason Wang
2015-02-09  8:39 ` [PATCH RFC v5 net-next 2/6] virtio_ring: try to disable event index callbacks in virtqueue_disable_cb() Jason Wang
2015-02-10  1:07   ` Rusty Russell
2015-02-10 10:24   ` Michael S. Tsirkin
2015-02-11  5:55     ` Jason Wang
2015-02-09  8:39 ` [PATCH RFC v5 net-next 3/6] virtio_net: enable tx interrupt Jason Wang
2015-02-09  8:39 ` [PATCH RFC v5 net-next 4/6] virtio-net: add basic interrupt coalescing support Jason Wang
2015-02-10  1:32   ` Rusty Russell
2015-02-10  6:51     ` Jason Wang
2015-02-10 10:25       ` Michael S. Tsirkin
2015-02-10 10:40     ` Michael S. Tsirkin
2015-02-13  2:52       ` Rusty Russell
2015-02-13 12:41         ` Pawel Moll
2015-02-16  3:07           ` Rusty Russell
2015-02-13 18:19         ` Cornelia Huck
2015-02-16  3:19           ` Rusty Russell
2015-02-09  8:39 ` [PATCH RFC v5 net-next 5/6] vhost: let vhost_signal() returns whether signalled Jason Wang
2015-02-09  8:39 ` [PATCH RFC v5 net-next 6/6] vhost_net: interrupt coalescing support Jason Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).