linux-kernel.vger.kernel.org archive mirror
* [RFC V7 PATCH 0/7] enable tx interrupts for virtio-net
From: Jason Wang @ 2015-05-25  5:23 UTC (permalink / raw)
  To: mst, virtualization, linux-kernel, netdev; +Cc: rusty, Jason Wang

Hi:

This is a new version of trying to enable tx interrupts for
virtio-net.

We used to avoid tx interrupts and to orphan packets before
transmission in virtio-net. This breaks socket accounting and can
lead to several other side effects, e.g.:

- Functions which depend on socket accounting can not work correctly
  (e.g. TCP Small Queues).
- There is no tx completion, so BQL and the packet generator can not
  work correctly.
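To see why orphaning defeats the accounting that TSQ-style limits rely
on, here is a tiny user-space model (all names here -- sock_model,
TSQ_LIMIT, packets_before_throttle -- are invented for illustration,
not kernel code): once a packet is uncharged at xmit time, the
in-flight byte count the limit watches never grows, so the sender is
never throttled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of sender-side byte accounting. */
struct sock_model { unsigned long wmem; };

enum { TSQ_LIMIT = 128 * 1024, PKT_TRUESIZE = 4096 };

/* Charge a packet to the socket at transmit time. */
static void charge(struct sock_model *sk, unsigned long truesize)
{
	sk->wmem += truesize;
}

/* Uncharge on tx completion -- or immediately, if the driver orphans. */
static void uncharge(struct sock_model *sk, unsigned long truesize)
{
	sk->wmem -= truesize;
}

static bool tsq_throttled(const struct sock_model *sk)
{
	return sk->wmem >= TSQ_LIMIT;
}

/* How many packets a sender can queue before the limit throttles it,
 * with (orphan = true) or without early orphaning. */
static int packets_before_throttle(bool orphan, int max)
{
	struct sock_model sk = { 0 };
	int sent = 0;

	while (sent < max && !tsq_throttled(&sk)) {
		charge(&sk, PKT_TRUESIZE);
		if (orphan)
			uncharge(&sk, PKT_TRUESIZE); /* skb_orphan() at xmit */
		sent++;
	}
	return sent;
}
```

With orphaning the loop never stops (nothing stays charged); with
completion-time accounting the sender is throttled once TSQ_LIMIT
bytes are in flight.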

This series tries to solve the issue by enabling tx interrupts. To
minimize the performance impact, several optimizations were used:

- On the guest side, use delayed callbacks as much as possible.
- On the host side, use interrupt coalescing to reduce the number of
  interrupts. This improved performance by about 10% - 15%.
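The host-side coalescing can be sketched as a small model (illustrative
only -- the struct and function names here are invented, not vhost's):
completions accumulate, and an interrupt is injected once either the
frame budget or the time window configured via tx-frames/tx-usecs is
exceeded; a timer (not modeled here) covers the tail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative coalescing state; field names are made up. */
struct coalesce {
	uint32_t max_frames;	/* tx-frames: fire after this many buffers */
	uint32_t max_usecs;	/* tx-usecs: or after this much delay */
	uint32_t pending;	/* completed buffers not yet signalled */
	uint64_t first_us;	/* when the oldest pending buffer completed */
};

/* Account one completed buffer at time @now_us; return true if an
 * interrupt should be injected now. */
static bool coalesce_signal(struct coalesce *c, uint64_t now_us)
{
	if (c->pending == 0)
		c->first_us = now_us;
	c->pending++;

	if (c->pending >= c->max_frames ||
	    now_us - c->first_us >= c->max_usecs) {
		c->pending = 0;	/* interrupt injected, window resets */
		return true;
	}
	return false;	/* keep coalescing */
}
```

With tx-frames=8 tx-usecs=64 (the values used in the tests), 16
back-to-back completions spaced 1 us apart cost only 2 interrupts
instead of 16.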

Performance tests show:
- A few regressions (10% - 15%) were noticed for TCP_RR. These
  regressions were not seen in the previous version; the reason is
  still unclear.
- CPU utilization is increased in some cases.
- In all other cases, tx interrupts perform equal to or better than
  orphaning, especially for small packet tx.

TODO:
- Try to fix the TCP_RR regressions
- Determine suitable coalescing parameters

Test Environment:
- Two Intel Xeon E5620 @ 2.40GHz machines connected back to back
  through Intel 82599EB NICs
- Both host and guest ran 4.1-rc4
- Vhost zerocopy disabled
- idle=poll
- Netperf 2.6.0
- tx-frames=8 tx-usecs=64 (chosen as the best-performing combination
  during testing)
- Irqbalance was disabled on the host, and SMP affinity was set
  manually
- Default ixgbe coalescing parameters

Test Result:

1 VCPU guest 1 Queue

Guest TX
size/session/+thu%/+normalize%
   64/     1/  +22%/  +23%
   64/     2/  +25%/  +26%
   64/     4/  +24%/  +24%
   64/     8/  +24%/  +25%
  256/     1/ +134%/ +141%
  256/     2/ +126%/ +132%
  256/     4/ +126%/ +134%
  256/     8/ +130%/ +135%
  512/     1/ +157%/ +170%
  512/     2/ +155%/ +169%
  512/     4/ +153%/ +168%
  512/     8/ +162%/ +176%
 1024/     1/  +84%/ +119%
 1024/     2/ +120%/ +146%
 1024/     4/ +105%/ +131%
 1024/     8/ +103%/ +134%
 2048/     1/  +20%/  +97%
 2048/     2/  +29%/  +76%
 2048/     4/    0%/  +11%
 2048/     8/    0%/   +3%
16384/     1/    0%/   -5%
16384/     2/    0%/  -10%
16384/     4/    0%/   -3%
16384/     8/    0%/    0%
65535/     1/    0%/  -10%
65535/     2/    0%/   -5%
65535/     4/    0%/   -3%
65535/     8/    0%/   -5%

TCP_RR
size/session/+thu%/+normalize%
    1/     1/    0%/   -9%
    1/    25/   -5%/   -5%
    1/    50/   -4%/   -3%
   64/     1/    0%/   -7%
   64/    25/   -5%/   -6%
   64/    50/   -5%/   -6%
  256/     1/    0%/   -6%
  256/    25/  -14%/  -14%
  256/    50/  -14%/  -14%

Guest RX
size/session/+thu%/+normalize%
   64/     1/    0%/   -1%
   64/     2/   +3%/   +3%
   64/     4/    0%/   -1%
   64/     8/    0%/    0%
  256/     1/   +5%/   +1%
  256/     2/   -9%/  -13%
  256/     4/    0%/   -2%
  256/     8/    0%/   -3%
  512/     1/   +1%/   -2%
  512/     2/   -3%/   -6%
  512/     4/    0%/   -3%
  512/     8/    0%/   -1%
 1024/     1/  +11%/  +16%
 1024/     2/    0%/   -3%
 1024/     4/    0%/   -2%
 1024/     8/    0%/   -1%
 2048/     1/    0%/   -3%
 2048/     2/    0%/   -1%
 2048/     4/    0%/   -1%
 2048/     8/    0%/   -2%
16384/     1/    0%/   -2%
16384/     2/    0%/   -4%
16384/     4/    0%/   -3%
16384/     8/    0%/   -3%
65535/     1/    0%/   -2%
65535/     2/    0%/   -5%
65535/     4/    0%/   -1%
65535/     8/   +1%/    0%

4 VCPU guest 4 Queue
Guest TX
size/session/+thu%/+normalize%
   64/     1/  +42%/  +38%
   64/     2/  +33%/  +33%
   64/     4/  +16%/  +19%
   64/     8/  +19%/  +22%
  256/     1/ +139%/ +134%
  256/     2/  +43%/  +52%
  256/     4/   +1%/   +6%
  256/     8/    0%/   +4%
  512/     1/ +171%/ +175%
  512/     2/   -1%/  +26%
  512/     4/   +9%/   +8%
  512/     8/  +48%/  +31%
 1024/     1/ +162%/ +171%
 1024/     2/    0%/   +2%
 1024/     4/   +3%/    0%
 1024/     8/   +6%/   +2%
 2048/     1/  +60%/  +94%
 2048/     2/    0%/   +2%
 2048/     4/  +23%/  +11%
 2048/     8/   -1%/   -6%
16384/     1/    0%/  -12%
16384/     2/    0%/   -8%
16384/     4/    0%/   -9%
16384/     8/    0%/  -11%
65535/     1/    0%/  -15%
65535/     2/    0%/  -10%
65535/     4/    0%/   -6%
65535/     8/   +1%/  -10%

TCP_RR
size/session/+thu%/+normalize%
    1/     1/    0%/  -15%
    1/    25/  -14%/   -9%
    1/    50/   +3%/   +3%
   64/     1/   -3%/  -10%
   64/    25/  -13%/   -4%
   64/    50/   -7%/   -4%
  256/     1/   -1%/  -19%
  256/    25/  -15%/   -3%
  256/    50/  -16%/   -9%

Guest RX
size/session/+thu%/+normalize%
   64/     1/   +4%/  +21%
   64/     2/  +81%/ +140%
   64/     4/  +51%/ +196%
   64/     8/  -10%/  +33%
  256/     1/ +139%/ +216%
  256/     2/  +53%/ +114%
  256/     4/   -9%/   -5%
  256/     8/   -9%/  -14%
  512/     1/ +257%/ +413%
  512/     2/  +11%/  +32%
  512/     4/   -4%/   -6%
  512/     8/   -7%/  -10%
 1024/     1/  +98%/ +138%
 1024/     2/   -6%/   -9%
 1024/     4/   -3%/   -4%
 1024/     8/   -7%/  -10%
 2048/     1/  +32%/  +29%
 2048/     2/   -7%/  -14%
 2048/     4/   -3%/   -3%
 2048/     8/   -7%/   -3%
16384/     1/  -13%/  -19%
16384/     2/   -3%/   -9%
16384/     4/   -7%/   -9%
16384/     8/   -9%/  -10%
65535/     1/    0%/   -3%
65535/     2/   -2%/  -10%
65535/     4/   -6%/  -11%
65535/     8/   -9%/   -9%

4 VCPU guest 4 Queue
Guest TX
size/session/+thu%/+normalize%
   64/     1/  +33%/  +31%
   64/     2/  +26%/  +29%
   64/     4/  +24%/  +29%
   64/     8/  +19%/  +24%
  256/     1/ +117%/ +128%
  256/     2/  +96%/ +109%
  256/     4/ +123%/ +198%
  256/     8/  +54%/ +111%
  512/     1/ +153%/ +171%
  512/     2/  +77%/ +135%
  512/     4/    0%/  +11%
  512/     8/    0%/   +2%
 1024/     1/ +133%/ +156%
 1024/     2/  +21%/  +78%
 1024/     4/    0%/   +3%
 1024/     8/    0%/   -7%
 2048/     1/  +41%/  +60%
 2048/     2/  +50%/ +153%
 2048/     4/    0%/  -10%
 2048/     8/   +2%/   -3%
16384/     1/    0%/   -7%
16384/     2/    0%/   -3%
16384/     4/   +1%/   -9%
16384/     8/   +4%/   -9%
65535/     1/    0%/   -7%
65535/     2/    0%/   -7%
65535/     4/   +5%/   -2%
65535/     8/    0%/   -5%

TCP_RR
size/session/+thu%/+normalize%
    1/     1/    0%/   -6%
    1/    25/  -17%/  -15%
    1/    50/  -24%/  -21%
   64/     1/   -1%/   -1%
   64/    25/  -14%/  -12%
   64/    50/  -23%/  -21%
  256/     1/    0%/  -12%
  256/    25/   -4%/   -8%
  256/    50/   -7%/   -8%

Guest RX
size/session/+thu%/+normalize%
   64/     1/   +3%/   -4%
   64/     2/  +32%/  +41%
   64/     4/   +5%/   -3%
   64/     8/   +7%/    0%
  256/     1/    0%/  -10%
  256/     2/  -15%/  -26%
  256/     4/    0%/   -5%
  256/     8/   -1%/  -11%
  512/     1/   +4%/   -7%
  512/     2/   -6%/    0%
  512/     4/    0%/   -8%
  512/     8/    0%/   -8%
 1024/     1/  +71%/   -2%
 1024/     2/   -4%/    0%
 1024/     4/    0%/  -11%
 1024/     8/    0%/   -9%
 2048/     1/   -1%/   +9%
 2048/     2/   -2%/   -2%
 2048/     4/    0%/   -6%
 2048/     8/    0%/  -10%
16384/     1/    0%/   -3%
16384/     2/    0%/  -14%
16384/     4/    0%/  -10%
16384/     8/   -2%/  -13%
65535/     1/    0%/   -4%
65535/     2/   +1%/  -16%
65535/     4/   +1%/   -8%
65535/     8/   +4%/   -6%

Changes from RFC v5:
- Rebase onto the current HEAD.
- Move net-specific code into generic virtio/vhost code.
- Drop the wrong virtqueue_enable_cb_delayed() optimization from the
  series.
- Limit the enabling of tx interrupts to hosts with interrupt
  coalescing. This reduces the performance impact on older hosts.
- Avoid expensive division in the vhost code.
- Avoid the overhead of the timer callback by using mutex_trylock()
  and injecting the irq directly from the timer callback.

Changes from RFC v4:
- Fix the virtqueue_enable_cb_delayed() return value when only 1
  buffer is pending.
- Try to disable callbacks by publishing the event index in
  virtqueue_disable_cb(). Tests show about 2% - 3% improvement on
  multiple sessions of TCP_RR.
- Revert some of Michael's tweaks from RFC v1 (see patch 3 for
  details).
- Use netif_wake_subqueue() instead of netif_start_subqueue() in
  free_old_xmit_skbs(), since it may be called from tx napi.
- In start_xmit(), enable the callback only when the current skb is
  the last in the list or tx has already been stopped. This avoids
  enabling callbacks under heavy load.
- Return ns instead of us in vhost_net_check_coalesce_and_signal().
- Measure the time interval of real interrupts instead of calls to
  vhost_signal().
- Drop BQL from the series since it does not affect performance in
  the test results.
Changes from RFC v3:
- Don't free tx packets in ndo_start_xmit().
- Add interrupt coalescing support for virtio-net.

Changes from RFC v2:
- Clean up code; address issues raised by Jason.

Changes from RFC v1:
- Address comments by Jason Wang; use delayed callbacks everywhere.
- Rebase Jason's patch on top of mine and include it (with some
  tweaks).

Jason Wang (7):
  virtio-pci: add coalescing parameters setting
  virtio_ring: try to disable event index callbacks in
    virtqueue_disable_cb()
  virtio-net: optimize free_old_xmit_skbs stats
  virtio-net: add basic interrupt coalescing support
  virtio_net: enable tx interrupt
  vhost: interrupt coalescing support
  vhost_net: add interrupt coalescing support

 drivers/net/virtio_net.c           | 266 +++++++++++++++++++++++++++++++------
 drivers/vhost/net.c                |   8 ++
 drivers/vhost/vhost.c              |  88 +++++++++++-
 drivers/vhost/vhost.h              |  20 +++
 drivers/virtio/virtio_pci_modern.c |  15 +++
 drivers/virtio/virtio_ring.c       |   3 +
 include/linux/virtio_config.h      |   8 ++
 include/uapi/linux/vhost.h         |  13 +-
 include/uapi/linux/virtio_pci.h    |   4 +
 include/uapi/linux/virtio_ring.h   |   1 +
 10 files changed, 382 insertions(+), 44 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC V7 PATCH 1/7] virtio-pci: add coalescing parameters setting
From: Jason Wang @ 2015-05-25  5:23 UTC (permalink / raw)
  To: mst, virtualization, linux-kernel, netdev; +Cc: rusty, Jason Wang

This patch introduces a transport-specific method to set the
coalescing parameters and implements it for virtio-pci.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_pci_modern.c | 15 +++++++++++++++
 include/linux/virtio_config.h      |  8 ++++++++
 include/uapi/linux/virtio_pci.h    |  4 ++++
 3 files changed, 27 insertions(+)

diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index e88e099..ce801ae 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -266,6 +266,16 @@ static void vp_set_status(struct virtio_device *vdev, u8 status)
 	vp_iowrite8(status, &vp_dev->common->device_status);
 }
 
+static void vp_set_coalesce(struct virtio_device *vdev, int n,
+			    u32 coalesce_count, u32 coalesce_us)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+
+	vp_iowrite16(n, &vp_dev->common->queue_select);
+	vp_iowrite32(coalesce_count, &vp_dev->common->queue_coalesce_count);
+	vp_iowrite32(coalesce_us, &vp_dev->common->queue_coalesce_us);
+}
+
 static void vp_reset(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
@@ -481,6 +491,7 @@ static const struct virtio_config_ops virtio_pci_config_ops = {
 	.generation	= vp_generation,
 	.get_status	= vp_get_status,
 	.set_status	= vp_set_status,
+	.set_coalesce   = vp_set_coalesce,
 	.reset		= vp_reset,
 	.find_vqs	= vp_modern_find_vqs,
 	.del_vqs	= vp_del_vqs,
@@ -588,6 +599,10 @@ static inline void check_offsets(void)
 		     offsetof(struct virtio_pci_common_cfg, queue_used_lo));
 	BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_USEDHI !=
 		     offsetof(struct virtio_pci_common_cfg, queue_used_hi));
+	BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_COALESCE_C !=
+		offsetof(struct virtio_pci_common_cfg, queue_coalesce_count));
+	BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_COALESCE_U !=
+		offsetof(struct virtio_pci_common_cfg, queue_coalesce_us));
 }
 
 /* the PCI probing function */
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 1e306f7..d100c32 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -28,6 +28,12 @@
  * @set_status: write the status byte
  *	vdev: the virtio_device
  *	status: the new status byte
+ * @set_coalesce: set coalescing parameters
+ *	vdev: the virtio_device
+ *	n: the queue index
+ *	coalesce_count: maximum coalesced count before issuing interrupt
+ *	coalesce_us: maximum microseconds to wait if there's a
+ *	pending buffer
  * @reset: reset the device
  *	vdev: the virtio device
  *	After this, status and feature negotiation must be done again
@@ -66,6 +72,8 @@ struct virtio_config_ops {
 	u32 (*generation)(struct virtio_device *vdev);
 	u8 (*get_status)(struct virtio_device *vdev);
 	void (*set_status)(struct virtio_device *vdev, u8 status);
+	void (*set_coalesce)(struct virtio_device *vdev, int n,
+			     u32 coalesce_count, u32 coalesce_us);
 	void (*reset)(struct virtio_device *vdev);
 	int (*find_vqs)(struct virtio_device *, unsigned nvqs,
 			struct virtqueue *vqs[],
diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
index 7530146..3396026 100644
--- a/include/uapi/linux/virtio_pci.h
+++ b/include/uapi/linux/virtio_pci.h
@@ -155,6 +155,8 @@ struct virtio_pci_common_cfg {
 	__le32 queue_avail_hi;		/* read-write */
 	__le32 queue_used_lo;		/* read-write */
 	__le32 queue_used_hi;		/* read-write */
+	__le32 queue_coalesce_count;    /* read-write */
+	__le32 queue_coalesce_us;       /* read-write */
 };
 
 /* Macro versions of offsets for the Old Timers! */
@@ -187,6 +189,8 @@ struct virtio_pci_common_cfg {
 #define VIRTIO_PCI_COMMON_Q_AVAILHI	44
 #define VIRTIO_PCI_COMMON_Q_USEDLO	48
 #define VIRTIO_PCI_COMMON_Q_USEDHI	52
+#define VIRTIO_PCI_COMMON_Q_COALESCE_C  56
+#define VIRTIO_PCI_COMMON_Q_COALESCE_U  60
 
 #endif /* VIRTIO_PCI_NO_MODERN */
 
-- 
1.8.3.1



* [RFC V7 PATCH 2/7] virtio_ring: try to disable event index callbacks in virtqueue_disable_cb()
From: Jason Wang @ 2015-05-25  5:23 UTC (permalink / raw)
  To: mst, virtualization, linux-kernel, netdev; +Cc: rusty, Jason Wang

Currently, we do nothing to prevent the callbacks in
virtqueue_disable_cb() when the event index is used. This may cause
spurious interrupts which can hurt performance. This patch publishes
last_used_idx as the used event to suppress the callbacks.
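For reference, the check the device side applies when event index is
negotiated is vring_need_event() from include/uapi/linux/virtio_ring.h
(that helper is real; this is a standalone copy): an interrupt is
needed only if the used index stepped across the guest-published event
index. Publishing last_used_idx keeps the event at an index the guest
has already consumed, which is what narrows the window for spurious
callbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Event-index rule: signal only if the index moved from @old to
 * @new_idx across @event_idx.  All arithmetic is mod 2^16, which
 * makes the comparison wraparound-safe. */
static bool vring_need_event(uint16_t event_idx, uint16_t new_idx,
			     uint16_t old)
{
	return (uint16_t)(new_idx - event_idx - 1) <
	       (uint16_t)(new_idx - old);
}
```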

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 096b857..a83aebc 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -538,6 +538,7 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
 	vq->vring.avail->flags |= cpu_to_virtio16(_vq->vdev, VRING_AVAIL_F_NO_INTERRUPT);
+	vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, vq->last_used_idx);
 }
 EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
 
-- 
1.8.3.1



* [RFC V7 PATCH 3/7] virtio-net: optimize free_old_xmit_skbs stats
From: Jason Wang @ 2015-05-25  5:24 UTC (permalink / raw)
  To: mst, virtualization, linux-kernel, netdev; +Cc: rusty, Jason Wang

We already have counters for sent packets and sent bytes.
Use them to reduce the number of u64_stats_update_begin/end().

Take care not to bother with stats update when called
speculatively.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/net/virtio_net.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 63c7810..744f0b1 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -826,17 +826,27 @@ static void free_old_xmit_skbs(struct send_queue *sq)
 	unsigned int len;
 	struct virtnet_info *vi = sq->vq->vdev->priv;
 	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+	unsigned int packets = 0, bytes = 0;
 
 	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
 		pr_debug("Sent skb %p\n", skb);
 
-		u64_stats_update_begin(&stats->tx_syncp);
-		stats->tx_bytes += skb->len;
-		stats->tx_packets++;
-		u64_stats_update_end(&stats->tx_syncp);
+		bytes += skb->len;
+		packets++;
 
 		dev_kfree_skb_any(skb);
 	}
+
+	/* Avoid overhead when no packets have been processed
+	 * happens when called speculatively from start_xmit.
+	 */
+	if (!packets)
+		return ;
+
+	u64_stats_update_begin(&stats->tx_syncp);
+	stats->tx_bytes += bytes;
+	stats->tx_packets += packets;
+	u64_stats_update_end(&stats->tx_syncp);
 }
 
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
-- 
1.8.3.1



* [RFC V7 PATCH 4/7] virtio-net: add basic interrupt coalescing support
From: Jason Wang @ 2015-05-25  5:24 UTC (permalink / raw)
  To: mst, virtualization, linux-kernel, netdev; +Cc: rusty, Jason Wang

This patch enables the interrupt coalescing setting through ethtool.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c         | 62 ++++++++++++++++++++++++++++++++++++++++
 drivers/virtio/virtio_ring.c     |  2 ++
 include/uapi/linux/virtio_ring.h |  1 +
 3 files changed, 65 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 744f0b1..4ad739f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -140,6 +140,14 @@ struct virtnet_info {
 
 	/* CPU hot plug notifier */
 	struct notifier_block nb;
+
+	/* Budget for polling tx completion */
+	u32 tx_work_limit;
+
+	__u32 rx_coalesce_usecs;
+	__u32 rx_max_coalesced_frames;
+	__u32 tx_coalesce_usecs;
+	__u32 tx_max_coalesced_frames;
 };
 
 struct padded_vnet_hdr {
@@ -1384,6 +1392,58 @@ static void virtnet_get_channels(struct net_device *dev,
 	channels->other_count = 0;
 }
 
+static int virtnet_set_coalesce(struct net_device *dev,
+				struct ethtool_coalesce *ec)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	int i;
+
+	if (!vi->vdev->config->set_coalesce) {
+		dev_warn(&dev->dev, "Transport does not support coalescing.\n");
+		return -EINVAL;
+	}
+
+	if (vi->rx_coalesce_usecs != ec->rx_coalesce_usecs ||
+	    vi->rx_max_coalesced_frames != ec->rx_max_coalesced_frames) {
+		for (i = 0; i < vi->max_queue_pairs; i++) {
+			vi->vdev->config->set_coalesce(vi->vdev, rxq2vq(i),
+						ec->rx_max_coalesced_frames,
+						ec->rx_coalesce_usecs);
+		}
+		vi->rx_coalesce_usecs = ec->rx_coalesce_usecs;
+		vi->rx_max_coalesced_frames = ec->rx_max_coalesced_frames;
+	}
+
+	if (vi->tx_coalesce_usecs != ec->tx_coalesce_usecs ||
+	    vi->tx_max_coalesced_frames != ec->tx_max_coalesced_frames) {
+		for (i = 0; i < vi->max_queue_pairs; i++) {
+			vi->vdev->config->set_coalesce(vi->vdev, txq2vq(i),
+						ec->tx_max_coalesced_frames,
+						ec->tx_coalesce_usecs);
+		}
+		vi->tx_coalesce_usecs = ec->tx_coalesce_usecs;
+		vi->tx_max_coalesced_frames = ec->tx_max_coalesced_frames;
+	}
+
+	vi->tx_work_limit = ec->tx_max_coalesced_frames_irq;
+
+	return 0;
+}
+
+static int virtnet_get_coalesce(struct net_device *dev,
+				struct ethtool_coalesce *ec)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+
+	ec->rx_coalesce_usecs = vi->rx_coalesce_usecs;
+	ec->rx_max_coalesced_frames = vi->rx_max_coalesced_frames;
+	ec->tx_coalesce_usecs = vi->tx_coalesce_usecs;
+	ec->tx_max_coalesced_frames = vi->tx_max_coalesced_frames;
+	ec->tx_max_coalesced_frames_irq = vi->tx_work_limit;
+
+	return 0;
+}
+
 static const struct ethtool_ops virtnet_ethtool_ops = {
 	.get_drvinfo = virtnet_get_drvinfo,
 	.get_link = ethtool_op_get_link,
@@ -1391,6 +1451,8 @@ static const struct ethtool_ops virtnet_ethtool_ops = {
 	.set_channels = virtnet_set_channels,
 	.get_channels = virtnet_get_channels,
 	.get_ts_info = ethtool_op_get_ts_info,
+	.set_coalesce = virtnet_set_coalesce,
+	.get_coalesce = virtnet_get_coalesce,
 };
 
 #define MIN_MTU 68
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index a83aebc..a2cdbe3 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -780,6 +780,8 @@ void vring_transport_features(struct virtio_device *vdev)
 			break;
 		case VIRTIO_RING_F_EVENT_IDX:
 			break;
+		case VIRTIO_RING_F_INTR_COALESCING:
+			break;
 		case VIRTIO_F_VERSION_1:
 			break;
 		default:
diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
index 915980a..e9756d8 100644
--- a/include/uapi/linux/virtio_ring.h
+++ b/include/uapi/linux/virtio_ring.h
@@ -58,6 +58,7 @@
 /* The Host publishes the avail index for which it expects a kick
  * at the end of the used ring. Guest should ignore the used->flags field. */
 #define VIRTIO_RING_F_EVENT_IDX		29
+#define VIRTIO_RING_F_INTR_COALESCING   31
 
 /* Virtio ring descriptors: 16 bytes.  These can chain together via "next". */
 struct vring_desc {
-- 
1.8.3.1



* [RFC V7 PATCH 5/7] virtio_net: enable tx interrupt
From: Jason Wang @ 2015-05-25  5:24 UTC (permalink / raw)
  To: mst, virtualization, linux-kernel, netdev; +Cc: rusty, Jason Wang

This patch enables tx interrupts for the virtio-net driver. This
makes socket accounting work again and helps to reduce bufferbloat.
To reduce the performance impact, tx interrupts are only enabled on
newer hosts with interrupt coalescing support.
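The tx NAPI flow this enables follows the usual budget contract; a
simplified stand-alone model (the toy_* names are invented, not from
the driver): reclaim at most a budget of completions per poll, and
re-enable interrupts only once the queue is drained below the budget.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy completion ring standing in for the tx virtqueue. */
struct toy_vq {
	int done;		/* completed buffers awaiting reclaim */
	bool irq_enabled;
};

static bool toy_get_buf(struct toy_vq *vq)
{
	if (!vq->done)
		return false;
	vq->done--;
	return true;
}

/* NAPI-style tx poll: reclaim at most @budget completions.
 * Returning less than @budget means "done, interrupts re-enabled";
 * returning @budget keeps the poll scheduled, interrupts still off. */
static int toy_poll_tx(struct toy_vq *vq, int budget)
{
	int sent = 0;

	while (sent < budget && toy_get_buf(vq))
		sent++;

	if (sent < budget)
		vq->irq_enabled = true;	/* napi_complete + enable_cb */
	return sent;
}
```

The real virtnet_poll_tx() additionally takes the tx queue lock and
rechecks for a racing completion after re-enabling callbacks.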

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 214 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 164 insertions(+), 50 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4ad739f..a48b1f9 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -72,6 +72,8 @@ struct send_queue {
 
 	/* Name of the send queue: output.$index */
 	char name[40];
+
+	struct napi_struct napi;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -123,6 +125,9 @@ struct virtnet_info {
 	/* Host can handle any s/g split between our header and packet data */
 	bool any_header_sg;
 
+	/* Host can coalesce interrupts */
+	bool intr_coalescing;
+
 	/* Packet virtio header size */
 	u8 hdr_len;
 
@@ -215,15 +220,54 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 	return p;
 }
 
+static unsigned int free_old_xmit_skbs(struct netdev_queue *txq,
+				       struct send_queue *sq, int budget)
+{
+	struct sk_buff *skb;
+	unsigned int len;
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+	unsigned int packets = 0, bytes = 0;
+
+	while (packets < budget &&
+	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+		pr_debug("Sent skb %p\n", skb);
+
+		bytes += skb->len;
+		packets++;
+
+		dev_kfree_skb_any(skb);
+	}
+
+	if (vi->intr_coalescing &&
+	    sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
+		netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
+
+	u64_stats_update_begin(&stats->tx_syncp);
+	stats->tx_bytes += bytes;
+	stats->tx_packets += packets;
+	u64_stats_update_end(&stats->tx_syncp);
+
+	return packets;
+}
+
 static void skb_xmit_done(struct virtqueue *vq)
 {
 	struct virtnet_info *vi = vq->vdev->priv;
+	struct send_queue *sq = &vi->sq[vq2txq(vq)];
 
-	/* Suppress further interrupts. */
-	virtqueue_disable_cb(vq);
+	if (vi->intr_coalescing) {
+		if (napi_schedule_prep(&sq->napi)) {
+			virtqueue_disable_cb(sq->vq);
+			__napi_schedule(&sq->napi);
+		}
+	} else {
+		/* Suppress further interrupts. */
+		virtqueue_disable_cb(vq);
 
-	/* We were probably waiting for more output buffers. */
-	netif_wake_subqueue(vi->dev, vq2txq(vq));
+		/* We were probably waiting for more output buffers. */
+		netif_wake_subqueue(vi->dev, vq2txq(vq));
+	}
 }
 
 static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
@@ -775,6 +819,30 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
 	return received;
 }
 
+static int virtnet_poll_tx(struct napi_struct *napi, int budget)
+{
+	struct send_queue *sq =
+		container_of(napi, struct send_queue, napi);
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
+	u32 limit = vi->tx_work_limit;
+	unsigned int r, sent;
+
+	__netif_tx_lock(txq, smp_processor_id());
+	sent = free_old_xmit_skbs(txq, sq, limit);
+	if (sent < limit) {
+		r = virtqueue_enable_cb_prepare(sq->vq);
+		napi_complete(napi);
+		if (unlikely(virtqueue_poll(sq->vq, r)) &&
+		    napi_schedule_prep(napi)) {
+			virtqueue_disable_cb(sq->vq);
+			__napi_schedule(napi);
+		}
+	}
+	__netif_tx_unlock(txq);
+	return sent < limit ? 0 : budget;
+}
+
 #ifdef CONFIG_NET_RX_BUSY_POLL
 /* must be called with local_bh_disable()d */
 static int virtnet_busy_poll(struct napi_struct *napi)
@@ -823,40 +891,12 @@ static int virtnet_open(struct net_device *dev)
 			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
 		virtnet_napi_enable(&vi->rq[i]);
+		napi_enable(&vi->sq[i].napi);
 	}
 
 	return 0;
 }
 
-static void free_old_xmit_skbs(struct send_queue *sq)
-{
-	struct sk_buff *skb;
-	unsigned int len;
-	struct virtnet_info *vi = sq->vq->vdev->priv;
-	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
-	unsigned int packets = 0, bytes = 0;
-
-	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
-		pr_debug("Sent skb %p\n", skb);
-
-		bytes += skb->len;
-		packets++;
-
-		dev_kfree_skb_any(skb);
-	}
-
-	/* Avoid overhead when no packets have been processed
-	 * happens when called speculatively from start_xmit.
-	 */
-	if (!packets)
-		return ;
-
-	u64_stats_update_begin(&stats->tx_syncp);
-	stats->tx_bytes += bytes;
-	stats->tx_packets += packets;
-	u64_stats_update_end(&stats->tx_syncp);
-}
-
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 {
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
@@ -921,7 +961,9 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 		sg_set_buf(sq->sg, hdr, hdr_len);
 		num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len) + 1;
 	}
-	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
+
+	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb,
+				    GFP_ATOMIC);
 }
 
 static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
@@ -934,7 +976,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	bool kick = !skb->xmit_more;
 
 	/* Free up any pending old buffers before queueing new ones. */
-	free_old_xmit_skbs(sq);
+	free_old_xmit_skbs(txq, sq, virtqueue_get_vring_size(sq->vq));
 
 	/* timestamp packet in software */
 	skb_tx_timestamp(skb);
@@ -957,21 +999,13 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	skb_orphan(skb);
 	nf_reset(skb);
 
-	/* If running out of space, stop queue to avoid getting packets that we
-	 * are then unable to transmit.
-	 * An alternative would be to force queuing layer to requeue the skb by
-	 * returning NETDEV_TX_BUSY. However, NETDEV_TX_BUSY should not be
-	 * returned in a normal path of operation: it means that driver is not
-	 * maintaining the TX queue stop/start state properly, and causes
-	 * the stack to do a non-trivial amount of useless work.
-	 * Since most packets only take 1 or 2 ring slots, stopping the queue
-	 * early means 16 slots are typically wasted.
-	 */
+	/* Apparently nice girls don't return TX_BUSY; stop the queue
+	 * before it gets out of hand.  Naturally, this wastes entries. */
 	if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
 		netif_stop_subqueue(dev, qnum);
 		if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
 			/* More just got used, free them then recheck. */
-			free_old_xmit_skbs(sq);
+			free_old_xmit_skbs(txq, sq, virtqueue_get_vring_size(sq->vq));
 			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
 				netif_start_subqueue(dev, qnum);
 				virtqueue_disable_cb(sq->vq);
@@ -985,6 +1019,50 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return NETDEV_TX_OK;
 }
 
+static netdev_tx_t start_xmit_txintr(struct sk_buff *skb, struct net_device *dev)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	int qnum = skb_get_queue_mapping(skb);
+	struct send_queue *sq = &vi->sq[qnum];
+	int err;
+	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
+	bool kick = !skb->xmit_more;
+
+	virtqueue_disable_cb(sq->vq);
+
+	/* timestamp packet in software */
+	skb_tx_timestamp(skb);
+
+	/* Try to transmit */
+	err = xmit_skb(sq, skb);
+
+	/* This should not happen! */
+	if (unlikely(err)) {
+		dev->stats.tx_fifo_errors++;
+		if (net_ratelimit())
+			dev_warn(&dev->dev,
+				 "Unexpected TXQ (%d) queue failure: %d\n", qnum, err);
+		dev->stats.tx_dropped++;
+		dev_kfree_skb_any(skb);
+		return NETDEV_TX_OK;
+	}
+
+	/* Apparently nice girls don't return TX_BUSY; stop the queue
+	 * before it gets out of hand.  Naturally, this wastes entries. */
+	if (sq->vq->num_free < 2+MAX_SKB_FRAGS)
+		netif_stop_subqueue(dev, qnum);
+
+	if (kick || netif_xmit_stopped(txq)) {
+		virtqueue_kick(sq->vq);
+		if (!virtqueue_enable_cb_delayed(sq->vq) &&
+		    napi_schedule_prep(&sq->napi)) {
+			virtqueue_disable_cb(sq->vq);
+			__napi_schedule(&sq->napi);
+		}
+	}
+	return NETDEV_TX_OK;
+}
+
 /*
  * Send command via the control virtqueue and check status.  Commands
  * supported by the hypervisor, as indicated by feature bits, should
@@ -1159,8 +1237,10 @@ static int virtnet_close(struct net_device *dev)
 	/* Make sure refill_work doesn't re-enable napi! */
 	cancel_delayed_work_sync(&vi->refill);
 
-	for (i = 0; i < vi->max_queue_pairs; i++)
+	for (i = 0; i < vi->max_queue_pairs; i++) {
 		napi_disable(&vi->rq[i].napi);
+		napi_disable(&vi->sq[i].napi);
+	}
 
 	return 0;
 }
@@ -1485,6 +1565,25 @@ static const struct net_device_ops virtnet_netdev = {
 #endif
 };
 
+static const struct net_device_ops virtnet_netdev_txintr = {
+	.ndo_open            = virtnet_open,
+	.ndo_stop   	     = virtnet_close,
+	.ndo_start_xmit      = start_xmit_txintr,
+	.ndo_validate_addr   = eth_validate_addr,
+	.ndo_set_mac_address = virtnet_set_mac_address,
+	.ndo_set_rx_mode     = virtnet_set_rx_mode,
+	.ndo_change_mtu	     = virtnet_change_mtu,
+	.ndo_get_stats64     = virtnet_stats,
+	.ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
+	.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
+#ifdef CONFIG_NET_POLL_CONTROLLER
+	.ndo_poll_controller = virtnet_netpoll,
+#endif
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	.ndo_busy_poll		= virtnet_busy_poll,
+#endif
+};
+
 static void virtnet_config_changed_work(struct work_struct *work)
 {
 	struct virtnet_info *vi =
@@ -1531,6 +1630,7 @@ static void virtnet_free_queues(struct virtnet_info *vi)
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		napi_hash_del(&vi->rq[i].napi);
 		netif_napi_del(&vi->rq[i].napi);
+		netif_napi_del(&vi->sq[i].napi);
 	}
 
 	kfree(vi->rq);
@@ -1685,6 +1785,8 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
 		netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll,
 			       napi_weight);
 		napi_hash_add(&vi->rq[i].napi);
+		netif_napi_add(vi->dev, &vi->sq[i].napi, virtnet_poll_tx,
+			       napi_weight);
 
 		sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
 		ewma_init(&vi->rq[i].mrg_avg_pkt_len, 1, RECEIVE_AVG_WEIGHT);
@@ -1819,7 +1921,10 @@ static int virtnet_probe(struct virtio_device *vdev)
 
 	/* Set up network device as normal. */
 	dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIVE_ADDR_CHANGE;
-	dev->netdev_ops = &virtnet_netdev;
+	if (virtio_has_feature(vdev, VIRTIO_RING_F_INTR_COALESCING))
+		dev->netdev_ops = &virtnet_netdev_txintr;
+	else
+		dev->netdev_ops = &virtnet_netdev;
 	dev->features = NETIF_F_HIGHDMA;
 
 	dev->ethtool_ops = &virtnet_ethtool_ops;
@@ -1906,6 +2011,9 @@ static int virtnet_probe(struct virtio_device *vdev)
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
 		vi->has_cvq = true;
 
+	if (virtio_has_feature(vdev, VIRTIO_RING_F_INTR_COALESCING))
+		vi->intr_coalescing = true;
+
 	if (vi->any_header_sg)
 		dev->needed_headroom = vi->hdr_len;
 
@@ -1918,6 +2026,8 @@ static int virtnet_probe(struct virtio_device *vdev)
 	if (err)
 		goto free_stats;
 
+	vi->tx_work_limit = napi_weight;
+
 #ifdef CONFIG_SYSFS
 	if (vi->mergeable_rx_bufs)
 		dev->sysfs_rx_queue_group = &virtio_net_mrg_rx_group;
@@ -2030,8 +2140,10 @@ static int virtnet_freeze(struct virtio_device *vdev)
 	cancel_delayed_work_sync(&vi->refill);
 
 	if (netif_running(vi->dev)) {
-		for (i = 0; i < vi->max_queue_pairs; i++)
+		for (i = 0; i < vi->max_queue_pairs; i++) {
 			napi_disable(&vi->rq[i].napi);
+			napi_disable(&vi->sq[i].napi);
+		}
 	}
 
 	remove_vq_common(vi);
@@ -2055,8 +2167,10 @@ static int virtnet_restore(struct virtio_device *vdev)
 			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
 
-		for (i = 0; i < vi->max_queue_pairs; i++)
+		for (i = 0; i < vi->max_queue_pairs; i++) {
 			virtnet_napi_enable(&vi->rq[i]);
+			napi_enable(&vi->sq[i].napi);
+		}
 	}
 
 	netif_device_attach(vi->dev);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC V7 PATCH 6/7] vhost: interrupt coalescing support
  2015-05-25  5:23 [RFC V7 PATCH 0/7] enable tx interrupts for virtio-net Jason Wang
                   ` (4 preceding siblings ...)
  2015-05-25  5:24 ` [RFC V7 PATCH 5/7] virtio_net: enable tx interrupt Jason Wang
@ 2015-05-25  5:24 ` Jason Wang
  2015-05-25  5:24 ` [RFC V7 PATCH 7/7] vhost_net: add " Jason Wang
  6 siblings, 0 replies; 10+ messages in thread
From: Jason Wang @ 2015-05-25  5:24 UTC (permalink / raw)
  To: mst, virtualization, linux-kernel, netdev; +Cc: rusty, Jason Wang

This patch implements basic interrupt coalescing support. This is done
by introducing two new per-virtqueue parameters:

- max_coalesced_buffers: maximum number of buffers before trying to
  issue an interrupt.
- coalesce_usecs: maximum number of microseconds to wait, when at least
  one buffer is pending, before trying to issue an interrupt.

New ioctls were also introduced for userspace to set or get the above
two values.

The number of coalesced buffers is increased in vhost_add_used_n(),
and vhost_signal() was modified so that it only tries to issue an
interrupt when:

- the number of coalesced buffers exceeds or equals
  max_coalesced_buffers, or
- the time since the last signal exceeds or equals coalesce_usecs.

When neither of the above conditions is met, the interrupt is
delayed. On exit from a round of processing, device-specific code
calls vhost_check_coalesce_and_signal() to check the two conditions
again and, if they are still not met, schedules a timer for a delayed
interrupt.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c      | 88 ++++++++++++++++++++++++++++++++++++++++++++--
 drivers/vhost/vhost.h      | 20 +++++++++++
 include/uapi/linux/vhost.h | 13 ++++++-
 3 files changed, 117 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 2ee2826..7739112 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -199,6 +199,11 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	vq->call = NULL;
 	vq->log_ctx = NULL;
 	vq->memory = NULL;
+	vq->coalesce_usecs = ktime_set(0, 0);
+	vq->max_coalesced_buffers = 0;
+	vq->coalesced = 0;
+	vq->last_signal = ktime_get();
+	hrtimer_cancel(&vq->ctimer);
 }
 
 static int vhost_worker(void *data)
@@ -291,6 +296,23 @@ static void vhost_dev_free_iovecs(struct vhost_dev *dev)
 		vhost_vq_free_iovecs(dev->vqs[i]);
 }
 
+void vhost_check_coalesce_and_signal(struct vhost_dev *dev,
+				     struct vhost_virtqueue *vq,
+				     bool timer);
+static enum hrtimer_restart vhost_ctimer_handler(struct hrtimer *timer)
+{
+	struct vhost_virtqueue *vq =
+		container_of(timer, struct vhost_virtqueue, ctimer);
+
+	if (mutex_trylock(&vq->mutex)) {
+		vhost_check_coalesce_and_signal(vq->dev, vq, false);
+		mutex_unlock(&vq->mutex);
+	} else
+		vhost_poll_queue(&vq->poll);
+
+	return HRTIMER_NORESTART;
+}
+
 void vhost_dev_init(struct vhost_dev *dev,
 		    struct vhost_virtqueue **vqs, int nvqs)
 {
@@ -315,6 +337,8 @@ void vhost_dev_init(struct vhost_dev *dev,
 		vq->heads = NULL;
 		vq->dev = dev;
 		mutex_init(&vq->mutex);
+		hrtimer_init(&vq->ctimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+		vq->ctimer.function = vhost_ctimer_handler;
 		vhost_vq_reset(dev, vq);
 		if (vq->handle_kick)
 			vhost_poll_init(&vq->poll, vq->handle_kick,
@@ -640,6 +664,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, void __user *argp)
 	struct vhost_vring_state s;
 	struct vhost_vring_file f;
 	struct vhost_vring_addr a;
+	struct vhost_vring_coalesce c;
 	u32 idx;
 	long r;
 
@@ -696,6 +721,19 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, void __user *argp)
 		if (copy_to_user(argp, &s, sizeof s))
 			r = -EFAULT;
 		break;
+	case VHOST_SET_VRING_COALESCE:
+		if (copy_from_user(&c, argp, sizeof c)) {
+			r = -EFAULT;
+			break;
+		}
+		vq->coalesce_usecs = ns_to_ktime(c.coalesce_usecs * NSEC_PER_USEC);
+		vq->max_coalesced_buffers = c.max_coalesced_buffers;
+		break;
+	case VHOST_GET_VRING_COALESCE:
+		c.index = idx;
+		c.coalesce_usecs = ktime_to_us(vq->coalesce_usecs);
+		c.max_coalesced_buffers = vq->max_coalesced_buffers;
+		if (copy_to_user(argp, &c, sizeof c))
+			r = -EFAULT;
+		break;
 	case VHOST_SET_VRING_ADDR:
 		if (copy_from_user(&a, argp, sizeof a)) {
 			r = -EFAULT;
@@ -1415,6 +1453,9 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
 {
 	int start, n, r;
 
+	if (vq->max_coalesced_buffers && ktime_to_ns(vq->coalesce_usecs))
+		vq->coalesced += count;
+
 	start = vq->last_used_idx % vq->num;
 	n = vq->num - start;
 	if (n < count) {
@@ -1440,6 +1481,7 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
 		if (vq->log_ctx)
 			eventfd_signal(vq->log_ctx, 1);
 	}
+
 	return r;
 }
 EXPORT_SYMBOL_GPL(vhost_add_used_n);
@@ -1481,15 +1523,55 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 	return vring_need_event(vhost16_to_cpu(vq, event), new, old);
 }
 
+static void __vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+{
+	if (vq->call_ctx && vhost_notify(dev, vq)) {
+		eventfd_signal(vq->call_ctx, 1);
+	}
+
+	vq->coalesced = 0;
+	vq->last_signal = ktime_get();
+}
+
 /* This actually signals the guest, using eventfd. */
 void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 {
-	/* Signal the Guest tell them we used something up. */
-	if (vq->call_ctx && vhost_notify(dev, vq))
-		eventfd_signal(vq->call_ctx, 1);
+	bool can_coalesce = vq->max_coalesced_buffers &&
+		            ktime_to_ns(vq->coalesce_usecs);
+
+	if (can_coalesce) {
+		ktime_t passed = ktime_sub(ktime_get(), vq->last_signal);
+
+		if ((vq->coalesced >= vq->max_coalesced_buffers) ||
+		     !ktime_before(passed, vq->coalesce_usecs))
+			__vhost_signal(dev, vq);
+	} else {
+		__vhost_signal(dev, vq);
+	}
 }
 EXPORT_SYMBOL_GPL(vhost_signal);
 
+void vhost_check_coalesce_and_signal(struct vhost_dev *dev,
+				     struct vhost_virtqueue *vq,
+				     bool timer)
+{
+	bool can_coalesce = vq->max_coalesced_buffers &&
+		            ktime_to_ns(vq->coalesce_usecs);
+
+	hrtimer_try_to_cancel(&vq->ctimer);
+	if (can_coalesce && vq->coalesced) {
+		ktime_t passed = ktime_sub(ktime_get(), vq->last_signal);
+		ktime_t left = ktime_sub(vq->coalesce_usecs, passed);
+
+		if (ktime_to_ns(left) <= 0) {
+			__vhost_signal(dev, vq);
+		} else if (timer) {
+			hrtimer_start(&vq->ctimer, left, HRTIMER_MODE_REL);
+		}
+	}
+}
+EXPORT_SYMBOL_GPL(vhost_check_coalesce_and_signal);
+
 /* And here's the combo meal deal.  Supersize me! */
 void vhost_add_used_and_signal(struct vhost_dev *dev,
 			       struct vhost_virtqueue *vq,
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 8c1c792..2e6754d 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -92,6 +92,23 @@ struct vhost_virtqueue {
 	/* Last used index value we have signalled on */
 	bool signalled_used_valid;
 
+	/* Maximum microseconds waited after at least one buffer is
+	 * processed before generating an interrupt.
+	 */
+	ktime_t coalesce_usecs;
+
+	/* Maximum number of pending buffers before generating an interrupt. */
+	__u32 max_coalesced_buffers;
+
+	/* The number of buffers whose interrupts are coalesced */
+	__u32 coalesced;
+
+	/* Last time we signalled the guest. */
+	ktime_t last_signal;
+
+	/* Timer used to trigger a coalesced interrupt. */
+	struct hrtimer ctimer;
+
 	/* Log writes to used structure. */
 	bool log_used;
 	u64 log_addr;
@@ -149,6 +166,9 @@ void vhost_add_used_and_signal(struct vhost_dev *, struct vhost_virtqueue *,
 void vhost_add_used_and_signal_n(struct vhost_dev *, struct vhost_virtqueue *,
 			       struct vring_used_elem *heads, unsigned count);
 void vhost_signal(struct vhost_dev *, struct vhost_virtqueue *);
+void vhost_check_coalesce_and_signal(struct vhost_dev *dev,
+				     struct vhost_virtqueue *vq,
+				     bool timer);
 void vhost_disable_notify(struct vhost_dev *, struct vhost_virtqueue *);
 bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *);
 
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index bb6a5b4..6362e6e 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -27,6 +27,12 @@ struct vhost_vring_file {
 
 };
 
+struct vhost_vring_coalesce {
+	unsigned int index;
+	__u32 coalesce_usecs;
+	__u32 max_coalesced_buffers;
+};
+
 struct vhost_vring_addr {
 	unsigned int index;
 	/* Option flags. */
@@ -102,7 +108,12 @@ struct vhost_memory {
 #define VHOST_SET_VRING_BASE _IOW(VHOST_VIRTIO, 0x12, struct vhost_vring_state)
 /* Get accessor: reads index, writes value in num */
 #define VHOST_GET_VRING_BASE _IOWR(VHOST_VIRTIO, 0x12, struct vhost_vring_state)
-
+/* Set coalescing parameters for the ring. */
+#define VHOST_SET_VRING_COALESCE _IOW(VHOST_VIRTIO, 0x13, \
+				      struct vhost_vring_coalesce)
+/* Get accessor: reads index, writes coalescing parameters */
+#define VHOST_GET_VRING_COALESCE _IOWR(VHOST_VIRTIO, 0x14, \
+				       struct vhost_vring_coalesce)
 /* The following ioctls use eventfd file descriptors to signal and poll
  * for events. */
 
-- 
1.8.3.1



* [RFC V7 PATCH 7/7] vhost_net: add interrupt coalescing support
  2015-05-25  5:23 [RFC V7 PATCH 0/7] enable tx interrupts for virtio-net Jason Wang
                   ` (5 preceding siblings ...)
  2015-05-25  5:24 ` [RFC V7 PATCH 6/7] vhost: interrupt coalescing support Jason Wang
@ 2015-05-25  5:24 ` Jason Wang
  2015-05-26 18:02   ` Stephen Hemminger
  6 siblings, 1 reply; 10+ messages in thread
From: Jason Wang @ 2015-05-25  5:24 UTC (permalink / raw)
  To: mst, virtualization, linux-kernel, netdev; +Cc: rusty, Jason Wang

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 7d137a4..5ee28b7 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -320,6 +320,9 @@ static void handle_tx(struct vhost_net *net)
 	hdr_size = nvq->vhost_hlen;
 	zcopy = nvq->ubufs;
 
+	/* Finish pending interrupts first */
+	vhost_check_coalesce_and_signal(vq->dev, vq, false);
+
 	for (;;) {
 		/* Release DMAs done buffers first */
 		if (zcopy)
@@ -415,6 +418,7 @@ static void handle_tx(struct vhost_net *net)
 		}
 	}
 out:
+	vhost_check_coalesce_and_signal(vq->dev, vq, true);
 	mutex_unlock(&vq->mutex);
 }
 
@@ -554,6 +558,9 @@ static void handle_rx(struct vhost_net *net)
 		vq->log : NULL;
 	mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF);
 
+	/* Finish pending interrupts first */
+	vhost_check_coalesce_and_signal(vq->dev, vq, false);
+
 	while ((sock_len = peek_head_len(sock->sk))) {
 		sock_len += sock_hlen;
 		vhost_len = sock_len + vhost_hlen;
@@ -638,6 +645,7 @@ static void handle_rx(struct vhost_net *net)
 		}
 	}
 out:
+	vhost_check_coalesce_and_signal(vq->dev, vq, true);
 	mutex_unlock(&vq->mutex);
 }
 
-- 
1.8.3.1



* Re: [RFC V7 PATCH 7/7] vhost_net: add interrupt coalescing support
  2015-05-25  5:24 ` [RFC V7 PATCH 7/7] vhost_net: add " Jason Wang
@ 2015-05-26 18:02   ` Stephen Hemminger
  2015-05-27  8:30     ` Jason Wang
  0 siblings, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2015-05-26 18:02 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, virtualization, linux-kernel, netdev, rusty

On Mon, 25 May 2015 01:24:04 -0400
Jason Wang <jasowang@redhat.com> wrote:

> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vhost/net.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 7d137a4..5ee28b7 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -320,6 +320,9 @@ static void handle_tx(struct vhost_net *net)
>  	hdr_size = nvq->vhost_hlen;
>  	zcopy = nvq->ubufs;
>  
> +	/* Finish pending interrupts first */
> +	vhost_check_coalesce_and_signal(vq->dev, vq, false);
> +
>  	for (;;) {
>  		/* Release DMAs done buffers first */
>  		if (zcopy)
> @@ -415,6 +418,7 @@ static void handle_tx(struct vhost_net *net)
>  		}
>  	}
>  out:
> +	vhost_check_coalesce_and_signal(vq->dev, vq, true);
>  	mutex_unlock(&vq->mutex);
>  }
>  
> @@ -554,6 +558,9 @@ static void handle_rx(struct vhost_net *net)
>  		vq->log : NULL;
>  	mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF);
>  
> +	/* Finish pending interrupts first */
> +	vhost_check_coalesce_and_signal(vq->dev, vq, false);
> +
>  	while ((sock_len = peek_head_len(sock->sk))) {
>  		sock_len += sock_hlen;
>  		vhost_len = sock_len + vhost_hlen;
> @@ -638,6 +645,7 @@ static void handle_rx(struct vhost_net *net)
>  		}
>  	}
>  out:
> +	vhost_check_coalesce_and_signal(vq->dev, vq, true);
>  	mutex_unlock(&vq->mutex);
>  }
>  

Could you implement ethtool control of these coalescing parameters?


* Re: [RFC V7 PATCH 7/7] vhost_net: add interrupt coalescing support
  2015-05-26 18:02   ` Stephen Hemminger
@ 2015-05-27  8:30     ` Jason Wang
  0 siblings, 0 replies; 10+ messages in thread
From: Jason Wang @ 2015-05-27  8:30 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: mst, virtualization, linux-kernel, netdev, rusty



On 05/27/2015 02:02 AM, Stephen Hemminger wrote:
> On Mon, 25 May 2015 01:24:04 -0400
> Jason Wang <jasowang@redhat.com> wrote:
>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>  drivers/vhost/net.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index 7d137a4..5ee28b7 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -320,6 +320,9 @@ static void handle_tx(struct vhost_net *net)
>>  	hdr_size = nvq->vhost_hlen;
>>  	zcopy = nvq->ubufs;
>>  
>> +	/* Finish pending interrupts first */
>> +	vhost_check_coalesce_and_signal(vq->dev, vq, false);
>> +
>>  	for (;;) {
>>  		/* Release DMAs done buffers first */
>>  		if (zcopy)
>> @@ -415,6 +418,7 @@ static void handle_tx(struct vhost_net *net)
>>  		}
>>  	}
>>  out:
>> +	vhost_check_coalesce_and_signal(vq->dev, vq, true);
>>  	mutex_unlock(&vq->mutex);
>>  }
>>  
>> @@ -554,6 +558,9 @@ static void handle_rx(struct vhost_net *net)
>>  		vq->log : NULL;
>>  	mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF);
>>  
>> +	/* Finish pending interrupts first */
>> +	vhost_check_coalesce_and_signal(vq->dev, vq, false);
>> +
>>  	while ((sock_len = peek_head_len(sock->sk))) {
>>  		sock_len += sock_hlen;
>>  		vhost_len = sock_len + vhost_hlen;
>> @@ -638,6 +645,7 @@ static void handle_rx(struct vhost_net *net)
>>  		}
>>  	}
>>  out:
>> +	vhost_check_coalesce_and_signal(vq->dev, vq, true);
>>  	mutex_unlock(&vq->mutex);
>>  }
>>  
> Could you implement ethtool control of these coalescing parameters?

I believe you mean guest ethtool control. If yes, it has been
implemented in [RFC V7 PATCH 4/7] virtio-net: add basic interrupt
coalescing support.



end of thread, other threads:[~2015-05-27  8:30 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-25  5:23 [RFC V7 PATCH 0/7] enable tx interrupts for virtio-net Jason Wang
2015-05-25  5:23 ` [RFC V7 PATCH 1/7] virito-pci: add coalescing parameters setting Jason Wang
2015-05-25  5:23 ` [RFC V7 PATCH 2/7] virtio_ring: try to disable event index callbacks in virtqueue_disable_cb() Jason Wang
2015-05-25  5:24 ` [RFC V7 PATCH 3/7] virtio-net: optimize free_old_xmit_skbs stats Jason Wang
2015-05-25  5:24 ` [RFC V7 PATCH 4/7] virtio-net: add basic interrupt coalescing support Jason Wang
2015-05-25  5:24 ` [RFC V7 PATCH 5/7] virtio_net: enable tx interrupt Jason Wang
2015-05-25  5:24 ` [RFC V7 PATCH 6/7] vhost: interrupt coalescing support Jason Wang
2015-05-25  5:24 ` [RFC V7 PATCH 7/7] vhost_net: add " Jason Wang
2015-05-26 18:02   ` Stephen Hemminger
2015-05-27  8:30     ` Jason Wang
