From patchwork Mon Feb 9 08:39:20 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jason Wang
X-Patchwork-Id: 541310
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S933088AbbBIIj5 (ORCPT ); Mon, 9 Feb 2015 03:39:57 -0500
Received: from mx1.redhat.com ([209.132.183.28]:46090 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S933042AbbBIIjt (ORCPT ); Mon, 9 Feb 2015 03:39:49 -0500
From: Jason Wang
To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, mst@redhat.com
Cc: pagupta@redhat.com, Jason Wang
Subject: [PATCH RFC v5 net-next 1/6] virtio_ring: fix virtqueue_enable_cb() when only 1 buffers were pending
Date: Mon, 9 Feb 2015 03:39:20 -0500
Message-Id: <1423471165-34243-2-git-send-email-jasowang@redhat.com>
In-Reply-To: <1423471165-34243-1-git-send-email-jasowang@redhat.com>
References: <1423471165-34243-1-git-send-email-jasowang@redhat.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 1760
Lines: 40

We currently do:

	bufs = (avail->idx - last_used_idx) * 3 / 4;

This is OK for now, since we only try to enable the delayed callbacks
when the queue is about to be full. It may not work well when there is
only one pending buffer in the virtqueue (which may be the case after
the tx interrupt is enabled): with integer arithmetic bufs becomes 0,
so virtqueue_enable_cb_delayed() may return false as soon as that
buffer is used, which may cause unnecessary triggering of NAPI. This
patch corrects this by applying the three-quarters factor only when
bufs is not one.
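
To make the arithmetic concrete, here is a minimal user-space sketch of
the threshold computation before and after this change. The helper names
pending_bufs(), threshold_old() and threshold_new() are hypothetical and
model only the bufs calculation; the endianness conversions and the rest
of the kernel function are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Number of buffers made available but not yet seen as used.
 * The 16-bit subtraction wraps correctly when the ring indices overflow. */
static uint16_t pending_bufs(uint16_t avail_idx, uint16_t last_used_idx)
{
	return (uint16_t)(avail_idx - last_used_idx);
}

/* Old behaviour: always take three quarters; 1 pending buffer yields 0. */
static uint16_t threshold_old(uint16_t avail_idx, uint16_t last_used_idx)
{
	return (uint16_t)(pending_bufs(avail_idx, last_used_idx) * 3 / 4);
}

/* New behaviour: leave a single pending buffer alone. */
static uint16_t threshold_new(uint16_t avail_idx, uint16_t last_used_idx)
{
	uint16_t bufs = pending_bufs(avail_idx, last_used_idx);

	if (bufs != 1)
		bufs = bufs * 3 / 4;
	return bufs;
}
```

With one pending buffer the old helper yields a threshold of 0 and the
new one yields 1; for any other count the two agree.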

Signed-off-by: Jason Wang
---
 drivers/virtio/virtio_ring.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 00ec6b3..545fed5 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -636,7 +636,10 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
 	 * entry. Always do both to keep code simple. */
 	vq->vring.avail->flags &= cpu_to_virtio16(_vq->vdev, ~VRING_AVAIL_F_NO_INTERRUPT);
 	/* TODO: tune this threshold */
-	bufs = (u16)(virtio16_to_cpu(_vq->vdev, vq->vring.avail->idx) - vq->last_used_idx) * 3 / 4;
+	bufs = (u16)(virtio16_to_cpu(_vq->vdev, vq->vring.avail->idx) -
+				    vq->last_used_idx);
+	if (bufs != 1)
+		bufs = bufs * 3 / 4;
 	vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs);
 	virtio_mb(vq->weak_barriers);
 	if (unlikely((u16)(virtio16_to_cpu(_vq->vdev, vq->vring.used->idx) - vq->last_used_idx) > bufs)) {

From patchwork Mon Feb 9 08:39:21 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jason Wang
X-Patchwork-Id: 541311
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S933122AbbBIIkB (ORCPT ); Mon, 9 Feb 2015 03:40:01 -0500
Received: from mx1.redhat.com ([209.132.183.28]:46089 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S933041AbbBIIjt (ORCPT ); Mon, 9 Feb 2015 03:39:49 -0500
From: Jason Wang
To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, mst@redhat.com
Cc: pagupta@redhat.com, Jason Wang
Subject: [PATCH RFC v5 net-next 2/6] virtio_ring: try to disable event index callbacks in virtqueue_disable_cb()
Date: Mon, 9 Feb 2015 03:39:21 -0500
Message-Id: <1423471165-34243-3-git-send-email-jasowang@redhat.com>
In-Reply-To: <1423471165-34243-1-git-send-email-jasowang@redhat.com>
References: <1423471165-34243-1-git-send-email-jasowang@redhat.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 1152
Lines: 31

Currently, virtqueue_disable_cb() does nothing to suppress callbacks
when the event index is in use. This may cause spurious interrupts,
which can hurt performance. This patch publishes the current avail
index as the used event in order to suppress the callbacks.

Signed-off-by: Jason Wang
Acked-by: Rusty Russell
---
 drivers/virtio/virtio_ring.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 545fed5..e9ffbfb 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -539,6 +539,8 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
 	vq->vring.avail->flags |= cpu_to_virtio16(_vq->vdev, VRING_AVAIL_F_NO_INTERRUPT);
+	vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev,
+						       vq->vring.avail->idx);
 }
 EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
 

From patchwork Mon Feb 9 08:39:22 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jason Wang
X-Patchwork-Id: 541312
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S933104AbbBIIj7 (ORCPT ); Mon, 9 Feb 2015 03:39:59 -0500
Received: from mx1.redhat.com ([209.132.183.28]:46108 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S933046AbbBIIjw (ORCPT ); Mon, 9 Feb 2015 03:39:52 -0500
From: Jason Wang
To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, mst@redhat.com
Cc: pagupta@redhat.com, Jason Wang
Subject: [PATCH RFC v5 net-next 3/6] virtio_net: enable tx interrupt
Date: Mon, 9 Feb 2015 03:39:22 -0500
Message-Id: <1423471165-34243-4-git-send-email-jasowang@redhat.com>
In-Reply-To: <1423471165-34243-1-git-send-email-jasowang@redhat.com>
References: <1423471165-34243-1-git-send-email-jasowang@redhat.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 8933
Lines: 298

On newer hosts that support delayed tx interrupts, we probably don't
have much to gain from orphaning packets early.

Note: this might degrade performance for hosts without event idx
support. This should be addressed by the next patch.

Signed-off-by: Jason Wang
---
Changes from RFCv4:
- change:
	virtqueue_disable_cb(sq->vq);
	napi_schedule(&sq->napi);
  in skb_xmit_done() to:
	if (napi_schedule_prep(&sq->napi)) {
		virtqueue_disable_cb(sq->vq);
		__napi_schedule(&sq->napi);
	}
  to solve the race on architectures where atomic operations are not
  serialized. Also solve a similar issue in virtnet_poll_tx().
- use netif_wake_subqueue() instead of netif_start_subqueue() in
  free_old_xmit_skbs(), since it may be called from tx NAPI.
- in start_xmit(), try to enable the callback only when the current
  skb is the last in the list or tx has already been stopped. This
  avoids enabling callbacks under heavy load.
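
The first item above can be illustrated with a small user-space model.
This is only a sketch under stated assumptions: struct model_txq and the
model_*() helpers are hypothetical stand-ins, and an atomic flag plays
the role of the NAPI scheduled bit. It shows one reading of the fix,
namely that callback state is touched only by the context that wins the
right to schedule the poll.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a tx queue; not the driver's struct send_queue. */
struct model_txq {
	atomic_bool scheduled;	/* models the NAPI "scheduled" state bit */
	bool cb_enabled;	/* models whether virtqueue callbacks are on */
	int polls_queued;	/* how many times a poll was actually queued */
};

/* Models napi_schedule_prep(): true only for the caller that wins the bit. */
static bool model_schedule_prep(struct model_txq *q)
{
	return !atomic_exchange(&q->scheduled, true);
}

/* Models the interrupt handler: change the callback state only after
 * winning the scheduled bit, so a caller that loses leaves it untouched. */
static void model_xmit_done(struct model_txq *q)
{
	if (model_schedule_prep(q)) {
		q->cb_enabled = false;
		q->polls_queued++;
	}
}

/* Models the end of a poll: re-enable callbacks, then release the bit. */
static void model_poll_complete(struct model_txq *q)
{
	q->cb_enabled = true;
	atomic_store(&q->scheduled, false);
}
```

Calling model_xmit_done() twice without an intervening
model_poll_complete() queues only one poll, and the second call does not
modify cb_enabled.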
---
 drivers/net/virtio_net.c | 136 +++++++++++++++++++++++++++++++----------------
 1 file changed, 90 insertions(+), 46 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 11e2e81..cc5f5de 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -72,6 +72,8 @@ struct send_queue {
 
 	/* Name of the send queue: output.$index */
 	char name[40];
+
+	struct napi_struct napi;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -140,6 +142,9 @@ struct virtnet_info {
 
 	/* CPU hot plug notifier */
 	struct notifier_block nb;
+
+	/* Budget for polling tx completion */
+	u32 tx_work_limit;
 };
 
 struct padded_vnet_hdr {
@@ -207,15 +212,43 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 	return p;
 }
 
+static unsigned int free_old_xmit_skbs(struct netdev_queue *txq,
+				       struct send_queue *sq, int budget)
+{
+	struct sk_buff *skb;
+	unsigned int len;
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+	unsigned int packets = 0;
+
+	while (packets < budget &&
+	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+		pr_debug("Sent skb %p\n", skb);
+
+		u64_stats_update_begin(&stats->tx_syncp);
+		stats->tx_bytes += skb->len;
+		stats->tx_packets++;
+		u64_stats_update_end(&stats->tx_syncp);
+
+		dev_kfree_skb_any(skb);
+		packets++;
+	}
+
+	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
+		netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
+
+	return packets;
+}
+
 static void skb_xmit_done(struct virtqueue *vq)
 {
 	struct virtnet_info *vi = vq->vdev->priv;
+	struct send_queue *sq = &vi->sq[vq2txq(vq)];
 
-	/* Suppress further interrupts. */
-	virtqueue_disable_cb(vq);
-
-	/* We were probably waiting for more output buffers. */
-	netif_wake_subqueue(vi->dev, vq2txq(vq));
+	if (napi_schedule_prep(&sq->napi)) {
+		virtqueue_disable_cb(sq->vq);
+		__napi_schedule(&sq->napi);
+	}
 }
 
 static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
@@ -776,6 +809,30 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
 	return received;
 }
 
+static int virtnet_poll_tx(struct napi_struct *napi, int budget)
+{
+	struct send_queue *sq =
+		container_of(napi, struct send_queue, napi);
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
+	u32 limit = vi->tx_work_limit;
+	unsigned int r, sent;
+
+	__netif_tx_lock(txq, smp_processor_id());
+	sent = free_old_xmit_skbs(txq, sq, limit);
+	if (sent < limit) {
+		r = virtqueue_enable_cb_prepare(sq->vq);
+		napi_complete(napi);
+		if (unlikely(virtqueue_poll(sq->vq, r)) &&
+		    napi_schedule_prep(napi)) {
+			virtqueue_disable_cb(sq->vq);
+			__napi_schedule(napi);
+		}
+	}
+	__netif_tx_unlock(txq);
+	return sent < limit ? 0 : budget;
+}
+
 #ifdef CONFIG_NET_RX_BUSY_POLL
 /* must be called with local_bh_disable()d */
 static int virtnet_busy_poll(struct napi_struct *napi)
@@ -824,30 +881,12 @@ static int virtnet_open(struct net_device *dev)
 			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
 		virtnet_napi_enable(&vi->rq[i]);
+		napi_enable(&vi->sq[i].napi);
 	}
 
 	return 0;
 }
 
-static void free_old_xmit_skbs(struct send_queue *sq)
-{
-	struct sk_buff *skb;
-	unsigned int len;
-	struct virtnet_info *vi = sq->vq->vdev->priv;
-	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
-
-	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
-		pr_debug("Sent skb %p\n", skb);
-
-		u64_stats_update_begin(&stats->tx_syncp);
-		stats->tx_bytes += skb->len;
-		stats->tx_packets++;
-		u64_stats_update_end(&stats->tx_syncp);
-
-		dev_kfree_skb_any(skb);
-	}
-}
-
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 {
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
@@ -910,7 +949,9 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 		sg_set_buf(sq->sg, hdr, hdr_len);
 		num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len) + 1;
 	}
-	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
+
+	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb,
+				    GFP_ATOMIC);
 }
 
 static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
@@ -922,8 +963,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
 	bool kick = !skb->xmit_more;
 
-	/* Free up any pending old buffers before queueing new ones. */
-	free_old_xmit_skbs(sq);
+	virtqueue_disable_cb(sq->vq);
 
 	/* Try to transmit */
 	err = xmit_skb(sq, skb);
@@ -939,27 +979,19 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		return NETDEV_TX_OK;
 	}
 
-	/* Don't wait up for transmitted skbs to be freed. */
-	skb_orphan(skb);
-	nf_reset(skb);
-
 	/* Apparently nice girls don't return TX_BUSY; stop the queue
 	 * before it gets out of hand. Naturally, this wastes entries. */
-	if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
+	if (sq->vq->num_free < 2+MAX_SKB_FRAGS)
 		netif_stop_subqueue(dev, qnum);
-		if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
-			/* More just got used, free them then recheck. */
-			free_old_xmit_skbs(sq);
-			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
-				netif_start_subqueue(dev, qnum);
-				virtqueue_disable_cb(sq->vq);
-			}
-		}
-	}
 
-	if (kick || netif_xmit_stopped(txq))
+	if (kick || netif_xmit_stopped(txq)) {
 		virtqueue_kick(sq->vq);
-
+		if (!virtqueue_enable_cb_delayed(sq->vq) &&
+		    napi_schedule_prep(&sq->napi)) {
+			virtqueue_disable_cb(sq->vq);
+			__napi_schedule(&sq->napi);
+		}
+	}
 	return NETDEV_TX_OK;
 }
 
@@ -1137,8 +1169,10 @@ static int virtnet_close(struct net_device *dev)
 	/* Make sure refill_work doesn't re-enable napi! */
 	cancel_delayed_work_sync(&vi->refill);
 
-	for (i = 0; i < vi->max_queue_pairs; i++)
+	for (i = 0; i < vi->max_queue_pairs; i++) {
 		napi_disable(&vi->rq[i].napi);
+		napi_disable(&vi->sq[i].napi);
+	}
 
 	return 0;
 }
@@ -1451,8 +1485,10 @@ static void virtnet_free_queues(struct virtnet_info *vi)
 {
 	int i;
 
-	for (i = 0; i < vi->max_queue_pairs; i++)
+	for (i = 0; i < vi->max_queue_pairs; i++) {
 		netif_napi_del(&vi->rq[i].napi);
+		netif_napi_del(&vi->sq[i].napi);
+	}
 
 	kfree(vi->rq);
 	kfree(vi->sq);
@@ -1606,6 +1642,8 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
 		netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll,
 			       napi_weight);
 		napi_hash_add(&vi->rq[i].napi);
+		netif_napi_add(vi->dev, &vi->sq[i].napi, virtnet_poll_tx,
+			       napi_weight);
 
 		sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
 		ewma_init(&vi->rq[i].mrg_avg_pkt_len, 1, RECEIVE_AVG_WEIGHT);
@@ -1830,6 +1868,8 @@ static int virtnet_probe(struct virtio_device *vdev)
 	if (err)
 		goto free_stats;
 
+	vi->tx_work_limit = napi_weight;
+
 #ifdef CONFIG_SYSFS
 	if (vi->mergeable_rx_bufs)
 		dev->sysfs_rx_queue_group = &virtio_net_mrg_rx_group;
@@ -1944,8 +1984,10 @@ static int virtnet_freeze(struct virtio_device *vdev)
 	if (netif_running(vi->dev)) {
 		for (i = 0; i < vi->max_queue_pairs; i++) {
 			napi_disable(&vi->rq[i].napi);
+			napi_disable(&vi->sq[i].napi);
 			napi_hash_del(&vi->rq[i].napi);
 			netif_napi_del(&vi->rq[i].napi);
+			netif_napi_del(&vi->sq[i].napi);
 		}
 	}
 
@@ -1970,8 +2012,10 @@ static int virtnet_restore(struct virtio_device *vdev)
 			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
 
-		for (i = 0; i < vi->max_queue_pairs; i++)
+		for (i = 0; i < vi->max_queue_pairs; i++) {
 			virtnet_napi_enable(&vi->rq[i]);
+			napi_enable(&vi->sq[i].napi);
+		}
 	}
 
 	netif_device_attach(vi->dev);

From patchwork Mon Feb 9 08:39:23 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jason Wang
X-Patchwork-Id: 541313
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S933177AbbBIIky (ORCPT ); Mon, 9 Feb 2015 03:40:54 -0500
Received: from mx1.redhat.com ([209.132.183.28]:34804 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S933123AbbBIIkE (ORCPT ); Mon, 9 Feb 2015 03:40:04 -0500
From: Jason Wang
To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, mst@redhat.com
Cc: pagupta@redhat.com, Jason Wang, Rusty Russell
Subject: [PATCH RFC v5 net-next 4/6] virtio-net: add basic interrupt coalescing support
Date: Mon, 9 Feb 2015 03:39:23 -0500
Message-Id: <1423471165-34243-5-git-send-email-jasowang@redhat.com>
In-Reply-To: <1423471165-34243-1-git-send-email-jasowang@redhat.com>
References: <1423471165-34243-1-git-send-email-jasowang@redhat.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4886
Lines: 144

This patch makes the interrupt coalescing settings configurable through
ethtool.

Cc: Rusty Russell
Cc: Michael S. Tsirkin
Signed-off-by: Jason Wang
---
 drivers/net/virtio_net.c        | 67 +++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/virtio_net.h | 12 ++++++++
 2 files changed, 79 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index cc5f5de..2b958fb 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -145,6 +145,11 @@ struct virtnet_info {
 
 	/* Budget for polling tx completion */
 	u32 tx_work_limit;
+
+	__u32 rx_coalesce_usecs;
+	__u32 rx_max_coalesced_frames;
+	__u32 tx_coalesce_usecs;
+	__u32 tx_max_coalesced_frames;
 };
 
 struct padded_vnet_hdr {
@@ -1404,12 +1409,73 @@ static void virtnet_get_channels(struct net_device *dev,
 	channels->other_count = 0;
 }
 
+static int virtnet_set_coalesce(struct net_device *dev,
+				struct ethtool_coalesce *ec)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct scatterlist sg;
+	struct virtio_net_ctrl_coalesce c;
+
+	if (!vi->has_cvq ||
+	    !virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_COALESCE))
+		return -EOPNOTSUPP;
+	if (vi->rx_coalesce_usecs != ec->rx_coalesce_usecs ||
+	    vi->rx_max_coalesced_frames != ec->rx_max_coalesced_frames) {
+		c.coalesce_usecs = ec->rx_coalesce_usecs;
+		c.max_coalesced_frames = ec->rx_max_coalesced_frames;
+		sg_init_one(&sg, &c, sizeof(c));
+		if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_COALESCE,
+					  VIRTIO_NET_CTRL_COALESCE_RX_SET,
+					  &sg)) {
+			dev_warn(&dev->dev, "Fail to set rx coalescing\n");
+			return -EINVAL;
+		}
+		vi->rx_coalesce_usecs = ec->rx_coalesce_usecs;
+		vi->rx_max_coalesced_frames = ec->rx_max_coalesced_frames;
+	}
+
+	if (vi->tx_coalesce_usecs != ec->tx_coalesce_usecs ||
+	    vi->tx_max_coalesced_frames != ec->tx_max_coalesced_frames) {
+		c.coalesce_usecs = ec->tx_coalesce_usecs;
+		c.max_coalesced_frames = ec->tx_max_coalesced_frames;
+		sg_init_one(&sg, &c, sizeof(c));
+		if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_COALESCE,
+					  VIRTIO_NET_CTRL_COALESCE_TX_SET,
+					  &sg)) {
+			dev_warn(&dev->dev, "Fail to set tx coalescing\n");
+			return -EINVAL;
+		}
+		vi->tx_coalesce_usecs = ec->tx_coalesce_usecs;
+		vi->tx_max_coalesced_frames = ec->tx_max_coalesced_frames;
+	}
+
+	vi->tx_work_limit = ec->tx_max_coalesced_frames_irq;
+
+	return 0;
+}
+
+static int virtnet_get_coalesce(struct net_device *dev,
+				struct ethtool_coalesce *ec)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+
+	ec->rx_coalesce_usecs = vi->rx_coalesce_usecs;
+	ec->rx_max_coalesced_frames = vi->rx_max_coalesced_frames;
+	ec->tx_coalesce_usecs = vi->tx_coalesce_usecs;
+	ec->tx_max_coalesced_frames = vi->tx_max_coalesced_frames;
+	ec->tx_max_coalesced_frames_irq = vi->tx_work_limit;
+
+	return 0;
+}
+
 static const struct ethtool_ops virtnet_ethtool_ops = {
 	.get_drvinfo = virtnet_get_drvinfo,
 	.get_link = ethtool_op_get_link,
 	.get_ringparam = virtnet_get_ringparam,
 	.set_channels = virtnet_set_channels,
 	.get_channels = virtnet_get_channels,
+	.set_coalesce = virtnet_set_coalesce,
+	.get_coalesce = virtnet_get_coalesce,
 };
 
 #define MIN_MTU 68
@@ -2048,6 +2114,7 @@ static unsigned int features[] = {
 	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ,
 	VIRTIO_NET_F_CTRL_MAC_ADDR,
 	VIRTIO_F_ANY_LAYOUT,
+	VIRTIO_NET_F_CTRL_COALESCE,
 };
 
 static struct virtio_driver virtio_net_driver = {
diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
index b5f1677..332009d 100644
--- a/include/uapi/linux/virtio_net.h
+++ b/include/uapi/linux/virtio_net.h
@@ -34,6 +34,7 @@
 /* The feature bitmap for virtio net */
 #define VIRTIO_NET_F_CSUM	0	/* Host handles pkts w/ partial csum */
 #define VIRTIO_NET_F_GUEST_CSUM	1	/* Guest handles pkts w/ partial csum */
+#define VIRTIO_NET_F_CTRL_COALESCE 3	/* Set coalescing */
 #define VIRTIO_NET_F_MAC	5	/* Host has given MAC address. */
 #define VIRTIO_NET_F_GSO	6	/* Host handles pkts w/ any GSO type */
 #define VIRTIO_NET_F_GUEST_TSO4	7	/* Guest can handle TSOv4 in. */
@@ -202,4 +203,15 @@ struct virtio_net_ctrl_mq {
  #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN 1
  #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX 0x8000
 
+struct virtio_net_ctrl_coalesce {
+	__u32 coalesce_usecs;
+	__u32 max_coalesced_frames;
+};
+
+#define VIRTIO_NET_CTRL_COALESCE 6
+ #define VIRTIO_NET_CTRL_COALESCE_TX_SET 0
+ #define VIRTIO_NET_CTRL_COALESCE_TX_GET 1
+ #define VIRTIO_NET_CTRL_COALESCE_RX_SET 2
+ #define VIRTIO_NET_CTRL_COALESCE_RX_GET 3
+
 #endif /* _LINUX_VIRTIO_NET_H */

From patchwork Mon Feb 9 08:39:24 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jason Wang
X-Patchwork-Id: 541314
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S933143AbbBIIkH (ORCPT ); Mon, 9 Feb 2015 03:40:07 -0500
Received: from mx1.redhat.com ([209.132.183.28]:34787 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S933117AbbBIIkB (ORCPT ); Mon, 9 Feb 2015 03:40:01 -0500
From: Jason Wang
To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, mst@redhat.com
Cc: pagupta@redhat.com, Jason Wang
Subject: [PATCH RFC v5 net-next 5/6] vhost: let vhost_signal() returns whether signalled
Date: Mon, 9 Feb 2015 03:39:24 -0500
Message-Id: <1423471165-34243-6-git-send-email-jasowang@redhat.com>
In-Reply-To: <1423471165-34243-1-git-send-email-jasowang@redhat.com>
References: <1423471165-34243-1-git-send-email-jasowang@redhat.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2021
Lines: 52

Let vhost_signal() return whether or not vhost has injected an
interrupt into the guest. The interrupt coalescing implementation uses
this to calculate the interval between two interrupts.
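
As an illustration of why the return value matters, here is a minimal
user-space sketch. struct model_vq and both helpers are hypothetical
names, and guest_wants_irq merely stands in for whatever decides that a
notification is needed. The point is that the timestamp advances only
when an interrupt was really sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a virtqueue's notification state. */
struct model_vq {
	bool guest_wants_irq;	/* stands in for the notify decision */
	uint64_t last_irq_us;	/* time of the last interrupt actually sent */
	unsigned int irqs_sent;
};

/* Models a signal function that reports whether it really signalled. */
static bool model_signal(struct model_vq *vq)
{
	if (!vq->guest_wants_irq)
		return false;
	vq->irqs_sent++;
	return true;
}

/* Advance the timestamp only on a real interrupt, so the measured
 * interval is between two interrupts rather than two attempts. */
static void model_signal_at(struct model_vq *vq, uint64_t now_us)
{
	if (model_signal(vq))
		vq->last_irq_us = now_us;
}
```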

Signed-off-by: Jason Wang
---
 drivers/vhost/vhost.c | 7 +++++--
 drivers/vhost/vhost.h | 2 +-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index cb807d0..20d6b84 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1480,11 +1480,14 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 }
 
 /* This actually signals the guest, using eventfd. */
-void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+bool vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 {
 	/* Signal the Guest tell them we used something up. */
-	if (vq->call_ctx && vhost_notify(dev, vq))
+	if (vq->call_ctx && vhost_notify(dev, vq)) {
 		eventfd_signal(vq->call_ctx, 1);
+		return true;
+	}
+	return false;
 }
 EXPORT_SYMBOL_GPL(vhost_signal);
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 8c1c792..a482563 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -148,7 +148,7 @@ void vhost_add_used_and_signal(struct vhost_dev *, struct vhost_virtqueue *,
 			       unsigned int id, int len);
 void vhost_add_used_and_signal_n(struct vhost_dev *, struct vhost_virtqueue *,
 			       struct vring_used_elem *heads, unsigned count);
-void vhost_signal(struct vhost_dev *, struct vhost_virtqueue *);
+bool vhost_signal(struct vhost_dev *, struct vhost_virtqueue *);
 void vhost_disable_notify(struct vhost_dev *, struct vhost_virtqueue *);
 bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *);
 

From patchwork Mon Feb 9 08:39:25 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jason Wang
X-Patchwork-Id: 541315
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S933192AbbBIIkz (ORCPT ); Mon, 9 Feb 2015 03:40:55 -0500
Received: from mx1.redhat.com ([209.132.183.28]:58654 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S933041AbbBIIkE (ORCPT ); Mon, 9 Feb 2015 03:40:04 -0500
From: Jason Wang
To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, mst@redhat.com
Cc: pagupta@redhat.com, Jason Wang
Subject: [PATCH RFC v5 net-next 6/6] vhost_net: interrupt coalescing support
Date: Mon, 9 Feb 2015 03:39:25 -0500
Message-Id: <1423471165-34243-7-git-send-email-jasowang@redhat.com>
In-Reply-To: <1423471165-34243-1-git-send-email-jasowang@redhat.com>
References: <1423471165-34243-1-git-send-email-jasowang@redhat.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 12088
Lines: 416

This patch implements interrupt coalescing support for vhost_net and
provides ioctl()s for userspace to get and set the coalescing
parameters. Two parameters can be set:

- max_coalesced_frames: the maximum number of packets allowed before
  issuing an irq.
- coalesce_usecs: the maximum number of microseconds allowed before
  issuing an irq if at least one packet is pending.

A per-virtqueue hrtimer is used for coalesce_usecs.

Cc: Michael S. Tsirkin
Signed-off-by: Jason Wang
---
Changes from RFCv4:
- return ns instead of us in vhost_net_check_coalesce_and_signal()
- measure the time interval between real interrupts instead of between
  calls to vhost_signal().
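
The policy described by the two parameters can be sketched in user space
as follows. This is a simplified model under stated assumptions, not the
vhost_net code: struct coalesce_state and both helpers are hypothetical,
time is passed in explicitly instead of being read from a clock, and the
sketch assumes every signal attempt succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-queue coalescing state; field names follow the
 * parameters described above, but this is not the vhost_net structure. */
struct coalesce_state {
	uint32_t coalesce_usecs;	/* max delay before an irq, in us */
	uint32_t max_coalesced_frames;	/* max packets before an irq */
	uint32_t coalesced;		/* packets held back so far */
	uint64_t last_signal_us;	/* time of the last irq, in us */
};

/* Account for `count` packets completed at `now_us`.
 * Returns true when an interrupt should be issued now. */
static bool coalesce_add(struct coalesce_state *s, uint32_t count,
			 uint64_t now_us)
{
	/* Coalescing is treated as off unless both limits are non-zero. */
	if (!s->max_coalesced_frames || !s->coalesce_usecs)
		return true;

	s->coalesced += count;
	if (s->coalesced >= s->max_coalesced_frames ||
	    now_us - s->last_signal_us >= s->coalesce_usecs) {
		s->coalesced = 0;
		s->last_signal_us = now_us;
		return true;
	}
	return false;
}

/* Time left, in nanoseconds, before held-back packets must be signalled;
 * 0 means nothing is pending or the deadline has already passed. */
static uint64_t coalesce_time_left_ns(const struct coalesce_state *s,
				      uint64_t now_us)
{
	uint64_t elapsed = now_us - s->last_signal_us;

	if (!s->coalesced || elapsed >= s->coalesce_usecs)
		return 0;
	return (s->coalesce_usecs - elapsed) * 1000;
}
```

A non-zero result from coalesce_time_left_ns() is the kind of value one
would hand to a one-shot relative timer so that a lone pending packet is
not delayed indefinitely.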
---
 drivers/vhost/net.c        | 199 +++++++++++++++++++++++++++++++++++++++++++--
 include/uapi/linux/vhost.h |  12 +++
 2 files changed, 202 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 6906f76..3222ac9 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -62,7 +63,8 @@ enum {
 	VHOST_NET_FEATURES = VHOST_FEATURES |
 			 (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
 			 (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
-			 (1ULL << VIRTIO_F_VERSION_1),
+			 (1ULL << VIRTIO_F_VERSION_1) |
+			 (1ULL << VIRTIO_NET_F_CTRL_COALESCE),
 };
 
 enum {
@@ -100,6 +102,15 @@ struct vhost_net_virtqueue {
 	/* Reference counting for outstanding ubufs.
 	 * Protected by vq mutex. Writers must also take device mutex. */
 	struct vhost_net_ubuf_ref *ubufs;
+	/* Microseconds after at least 1 packet is processed before
+	 * generating an interrupt.
+	 */
+	__u32 coalesce_usecs;
+	/* Number of packets processed before generating an interrupt. */
+	__u32 max_coalesced_frames;
+	__u32 coalesced;
+	ktime_t last_signal;
+	struct hrtimer c_timer;
 };
 
 struct vhost_net {
@@ -197,11 +208,16 @@ static void vhost_net_vq_reset(struct vhost_net *n)
 	vhost_net_clear_ubuf_info(n);
 
 	for (i = 0; i < VHOST_NET_VQ_MAX; i++) {
+		hrtimer_cancel(&n->vqs[i].c_timer);
 		n->vqs[i].done_idx = 0;
 		n->vqs[i].upend_idx = 0;
 		n->vqs[i].ubufs = NULL;
 		n->vqs[i].vhost_hlen = 0;
 		n->vqs[i].sock_hlen = 0;
+		n->vqs[i].max_coalesced_frames = 0;
+		n->vqs[i].coalesce_usecs = 0;
+		n->vqs[i].last_signal = ktime_get();
+		n->vqs[i].coalesced = 0;
 	}
 
 }
@@ -273,6 +289,55 @@ static void copy_iovec_hdr(const struct iovec *from, struct iovec *to,
 	}
 }
 
+static int vhost_net_check_coalesce_and_signal(struct vhost_dev *dev,
+					       struct vhost_net_virtqueue *nvq)
+{
+	struct vhost_virtqueue *vq = &nvq->vq;
+	int left = 0;
+	ktime_t now;
+
+	if (nvq->coalesced) {
+		now = ktime_get();
+		left = nvq->coalesce_usecs -
+		       ktime_to_us(ktime_sub(now, nvq->last_signal));
+		if (left <= 0) {
+			vhost_signal(dev, vq);
+			nvq->last_signal = now;
+			nvq->coalesced = 0;
+		}
+	}
+
+	return left * NSEC_PER_USEC;
+}
+
+static bool vhost_net_add_used_and_signal_n(struct vhost_dev *dev,
+					     struct vhost_net_virtqueue *nvq,
+					     struct vring_used_elem *heads,
+					     unsigned count)
+{
+	struct vhost_virtqueue *vq = &nvq->vq;
+	bool can_coalesce = nvq->max_coalesced_frames && nvq->coalesce_usecs;
+	bool ret = false;
+
+	vhost_add_used_n(vq, heads, count);
+
+	if (can_coalesce) {
+		ktime_t now = ktime_get();
+
+		nvq->coalesced += count;
+		if (((nvq->coalesced >= nvq->max_coalesced_frames) ||
+		     (ktime_to_us(ktime_sub(now, nvq->last_signal)) >=
+		      nvq->coalesce_usecs)) && vhost_signal(dev, vq)) {
+			nvq->coalesced = 0;
+			nvq->last_signal = now;
+			ret = true;
+		}
+	} else {
+		vhost_signal(dev, vq);
+	}
+	return ret;
+}
+
 /* In case of DMA done not in order in lower device driver for some reason.
  * upend_idx is used to track end of used idx, done_idx is used to track head
  * of used idx. Once lower device DMA done contiguously, we will signal KVM
@@ -297,8 +362,8 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
 	}
 	while (j) {
 		add = min(UIO_MAXIOV - nvq->done_idx, j);
-		vhost_add_used_and_signal_n(vq->dev, vq,
-					    &vq->heads[nvq->done_idx], add);
+		vhost_net_add_used_and_signal_n(vq->dev, nvq,
+						&vq->heads[nvq->done_idx], add);
 		nvq->done_idx = (nvq->done_idx + add) % UIO_MAXIOV;
 		j -= add;
 	}
@@ -351,6 +416,7 @@ static void handle_tx(struct vhost_net *net)
 	struct socket *sock;
 	struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
 	bool zcopy, zcopy_used;
+	int left;
 
 	mutex_lock(&vq->mutex);
 	sock = vq->private_data;
@@ -362,6 +428,8 @@ static void handle_tx(struct vhost_net *net)
 	hdr_size = nvq->vhost_hlen;
 	zcopy = nvq->ubufs;
 
+	vhost_net_check_coalesce_and_signal(&net->dev, nvq);
+
 	for (;;) {
 		/* Release DMAs done buffers first */
 		if (zcopy)
@@ -444,10 +512,15 @@ static void handle_tx(struct vhost_net *net)
 		if (err != len)
 			pr_debug("Truncated TX packet: "
 				 " len %d != %zd\n", err, len);
-		if (!zcopy_used)
-			vhost_add_used_and_signal(&net->dev, vq, head, 0);
-		else
+
+		if (!zcopy_used) {
+			struct vring_used_elem heads = { head, 0 };
+
+			vhost_net_add_used_and_signal_n(&net->dev,
+							nvq, &heads, 1);
+		} else {
 			vhost_zerocopy_signal_used(net, vq);
+		}
 		total_len += len;
 		vhost_net_tx_packet(net);
 		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
@@ -455,6 +528,12 @@ static void handle_tx(struct vhost_net *net)
 			break;
 		}
 	}
+
+	left = vhost_net_check_coalesce_and_signal(&net->dev, nvq);
+	if (left > 0)
+		hrtimer_start(&nvq->c_timer, ns_to_ktime(left),
+			      HRTIMER_MODE_REL);
+
 out:
 	mutex_unlock(&vq->mutex);
 }
@@ -574,7 +653,7 @@ static void handle_rx(struct vhost_net *net)
 		.hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE
 	};
 	size_t total_len = 0;
-	int err, mergeable;
+	int err, mergeable, left;
 	s16 headcount;
 	size_t vhost_hlen, sock_hlen;
 	size_t vhost_len, sock_len;
@@ -593,6 +672,8 @@ static void handle_rx(struct vhost_net *net)
 		vq->log : NULL;
 	mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF);
 
+	vhost_net_check_coalesce_and_signal(&net->dev, nvq);
+
 	while ((sock_len = peek_head_len(sock->sk))) {
 		sock_len += sock_hlen;
 		vhost_len = sock_len + vhost_hlen;
@@ -658,8 +739,10 @@ static void handle_rx(struct vhost_net *net)
 			vhost_discard_vq_desc(vq, headcount);
 			break;
 		}
-		vhost_add_used_and_signal_n(&net->dev, vq, vq->heads,
-					    headcount);
+
+		vhost_net_add_used_and_signal_n(&net->dev, nvq,
+						vq->heads, headcount);
+
 		if (unlikely(vq_log))
 			vhost_log_write(vq, vq_log, log, vhost_len);
 		total_len += vhost_len;
@@ -668,6 +751,12 @@ static void handle_rx(struct vhost_net *net)
 			break;
 		}
 	}
+
+	left = vhost_net_check_coalesce_and_signal(&net->dev, nvq);
+	if (left > 0)
+		hrtimer_start(&nvq->c_timer, ns_to_ktime(left),
+			      HRTIMER_MODE_REL);
+
 out:
 	mutex_unlock(&vq->mutex);
 }
@@ -704,6 +793,18 @@ static void handle_rx_net(struct vhost_work *work)
 	handle_rx(net);
 }
 
+static enum hrtimer_restart vhost_net_timer_handler(struct hrtimer *timer)
+{
+	struct vhost_net_virtqueue *nvq = container_of(timer,
+						struct vhost_net_virtqueue,
+						c_timer);
+	struct vhost_virtqueue *vq = &nvq->vq;
+
+	vhost_poll_queue(&vq->poll);
+
+	return HRTIMER_NORESTART;
+}
+
 static int vhost_net_open(struct inode *inode, struct file *f)
 {
 	struct vhost_net *n;
@@ -735,6 +836,13 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 		n->vqs[i].done_idx = 0;
 		n->vqs[i].vhost_hlen = 0;
 		n->vqs[i].sock_hlen = 0;
+		n->vqs[i].max_coalesced_frames = 0;
+		n->vqs[i].coalesce_usecs = 0;
+		n->vqs[i].last_signal = ktime_get();
+		n->vqs[i].coalesced = 0;
+		hrtimer_init(&n->vqs[i].c_timer, CLOCK_MONOTONIC,
+			     HRTIMER_MODE_REL);
+		n->vqs[i].c_timer.function = vhost_net_timer_handler;
 	}
 	vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
 
@@ -911,6 +1019,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 	struct vhost_virtqueue *vq;
 	struct vhost_net_virtqueue *nvq;
 	struct vhost_net_ubuf_ref *ubufs, *oldubufs = NULL;
+	unsigned int coalesced;
 	int r;
 
 	mutex_lock(&n->dev.mutex);
@@ -939,6 +1048,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 
 	/* start polling new socket */
 	oldsock = vq->private_data;
+	coalesced = nvq->coalesced;
 	if (sock != oldsock) {
 		ubufs = vhost_net_ubuf_alloc(vq,
 					     sock && vhost_sock_zcopy(sock));
@@ -973,6 +1083,12 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 		mutex_unlock(&vq->mutex);
 	}
 
+	if (coalesced) {
+		mutex_lock(&vq->mutex);
+		vhost_signal(&n->dev, vq);
+		mutex_unlock(&vq->mutex);
+	}
+
 	if (oldsock) {
 		vhost_net_flush_vq(n, index);
 		sockfd_put(oldsock);
@@ -1080,6 +1196,67 @@ out:
 	return r;
 }
 
+static long vhost_net_set_vring_coalesce(struct vhost_dev *d, void __user *argp)
+{
+	u32 __user *idxp = argp;
+	u32 idx;
+	int r;
+	struct vhost_virtqueue *vq;
+	struct vhost_net_vring_coalesce c;
+	struct vhost_net_virtqueue *nvq;
+
+	r = get_user(idx, idxp);
+	if (r < 0)
+		return r;
+	if (idx >= d->nvqs)
+		return -ENOBUFS;
+
+	vq = d->vqs[idx];
+	nvq = container_of(vq, struct vhost_net_virtqueue, vq);
+
+	r = copy_from_user(&c, argp, sizeof(c));
+	if (r)
+		return -EFAULT;
+
+	mutex_lock(&vq->mutex);
+	nvq->coalesce_usecs = c.coalesce_usecs;
+	nvq->max_coalesced_frames = c.max_coalesced_frames;
+	mutex_unlock(&vq->mutex);
+
+	return 0;
+}
+
+static long vhost_net_get_vring_coalesce(struct vhost_dev *d, void __user *argp)
+{
+	u32 __user *idxp = argp;
+	u32 idx;
+	int r;
+	struct vhost_virtqueue *vq;
+	struct vhost_net_vring_coalesce c;
+	struct vhost_net_virtqueue *nvq;
+
+	r = get_user(idx, idxp);
+	if (r < 0)
+		return r;
+	if (idx >= d->nvqs)
+		return -ENOBUFS;
+
+	vq = d->vqs[idx];
+	nvq = container_of(vq, struct vhost_net_virtqueue, vq);
+
+	mutex_lock(&vq->mutex);
+	c.index = idx;
+	c.coalesce_usecs = nvq->coalesce_usecs;
+	c.max_coalesced_frames = nvq->max_coalesced_frames;
+	mutex_unlock(&vq->mutex);
+
+	r = copy_to_user(argp, &c, sizeof(c));
+	if (r)
+		return -EFAULT;
+
+	return 0;
+}
+
 static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
 			    unsigned long arg)
 {
@@ -1110,6 +1287,10 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
 		return vhost_net_reset_owner(n);
 	case VHOST_SET_OWNER:
 		return vhost_net_set_owner(n);
+	case VHOST_NET_SET_VRING_COALESCE:
+		return vhost_net_set_vring_coalesce(&n->dev, argp);
+	case VHOST_NET_GET_VRING_COALESCE:
+		return vhost_net_get_vring_coalesce(&n->dev, argp);
 	default:
 		mutex_lock(&n->dev.mutex);
 		r = vhost_dev_ioctl(&n->dev, ioctl, argp);
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index bb6a5b4..6799cc1 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -27,6 +27,12 @@ struct vhost_vring_file {
 
 };
 
+struct vhost_net_vring_coalesce {
+	unsigned int index;
+	__u32 coalesce_usecs;
+	__u32 max_coalesced_frames;
+};
+
 struct vhost_vring_addr {
 	unsigned int index;
 	/* Option flags. */
@@ -121,6 +127,12 @@ struct vhost_memory {
  * device. This can be used to stop the ring (e.g. for migration). */
 #define VHOST_NET_SET_BACKEND _IOW(VHOST_VIRTIO, 0x30, struct vhost_vring_file)
 
+/* Setting interrupt coalescing parameters. */
+#define VHOST_NET_SET_VRING_COALESCE \
+	_IOW(VHOST_VIRTIO, 0x31, struct vhost_net_vring_coalesce)
+/* Getting interrupt coalescing parameters. */
+#define VHOST_NET_GET_VRING_COALESCE \
+	_IOWR(VHOST_VIRTIO, 0x32, struct vhost_net_vring_coalesce)
 /* Feature bits */
 /* Log all write descriptors. Can be changed while device is active. */
 #define VHOST_F_LOG_ALL 26