linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] xen/blk: persistent grant rework
@ 2018-08-06 11:33 Juergen Gross
  2018-08-06 11:33 ` [PATCH 1/4] xen/blkback: don't keep persistent grants too long Juergen Gross
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Juergen Gross @ 2018-08-06 11:33 UTC (permalink / raw)
  To: linux-kernel, xen-devel, linux-block
  Cc: konrad.wilk, roger.pau, axboe, boris.ostrovsky, Juergen Gross

Persistent grants are used in Xen's blkfront/blkback drivers to
avoid mapping/unmapping of I/O buffers in the backend for each I/O.

While this speeds up processing quite a bit, there are problems related
to persistent grants in some configurations: domains with multiple
block devices making use of persistent grants might suffer from a lack
of grants if each of the block devices experienced a high I/O load at
some point. This is because the number of persistent grants per device
is only bounded by a rather high maximum value, while grants are never
released even after long periods without any I/O.

This series modifies xen-blkback to unmap any domU page mapped via a
persistent grant after a timeout (default: 60 seconds). The timeout
is restarted whenever the persistent grant is used for an I/O.
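
As a rough illustration of the expiry test added on the blkback side,
here is a standalone sketch with simplified names (HZ and jiffies are
stand-ins for the kernel's timekeeping, not the real kernel API):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the grant-expiry check; names are illustrative. */
#define HZ 250                            /* ticks per second (stand-in) */
static unsigned long jiffies;             /* simulated tick counter */
static unsigned int pgrant_timeout = 60;  /* seconds, 0 = never expire */

struct pgrant {
	unsigned long last_used;          /* tick of last I/O using this grant */
};

/* Unsigned subtraction handles counter wrap-around, the same idiom the
 * kernel relies on for jiffies arithmetic. */
static bool pgrant_timed_out(const struct pgrant *g)
{
	return pgrant_timeout &&
	       jiffies - g->last_used >= (unsigned long)HZ * pgrant_timeout;
}
```

A grant becomes eligible for purging once a full timeout interval has
elapsed since its `last_used` stamp; setting the timeout to 0 disables
expiry entirely, matching the module parameter semantics below.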

xen-blkfront is modified to scan every 10 seconds for persistent grants
no longer in use by blkback and to remove such grants.

The last 2 patches are small cleanups of blkfront and blkback drivers.

Juergen Gross (4):
  xen/blkback: don't keep persistent grants too long
  xen/blkfront: cleanup stale persistent grants
  xen/blkfront: reorder tests in xlblk_init()
  xen/blkback: remove unused pers_gnts_lock from struct xen_blkif_ring

 drivers/block/xen-blkback/blkback.c |  77 +++++++++++++++---------
 drivers/block/xen-blkback/common.h  |   2 +-
 drivers/block/xen-blkfront.c        | 115 ++++++++++++++++++++++++++++++++----
 3 files changed, 153 insertions(+), 41 deletions(-)

-- 
2.13.7


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/4] xen/blkback: don't keep persistent grants too long
  2018-08-06 11:33 [PATCH 0/4] xen/blk: persistent grant rework Juergen Gross
@ 2018-08-06 11:33 ` Juergen Gross
  2018-08-06 15:58   ` Roger Pau Monné
  2018-08-06 11:34 ` [PATCH] xen/blkfront: remove unused macros Juergen Gross
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Juergen Gross @ 2018-08-06 11:33 UTC (permalink / raw)
  To: linux-kernel, xen-devel, linux-block
  Cc: konrad.wilk, roger.pau, axboe, boris.ostrovsky, Juergen Gross

Persistent grants are allocated until a per-ring threshold is
reached. Those grants won't be freed until the ring is destroyed,
meaning resources are kept busy even though they might no longer be
used.

Instead of freeing persistent grants only when the threshold is
reached, add a timestamp and remove all persistent grants that have
not been in use for a minute.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/block/xen-blkback/blkback.c | 77 +++++++++++++++++++++++--------------
 drivers/block/xen-blkback/common.h  |  1 +
 2 files changed, 50 insertions(+), 28 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index b55b245e8052..485e3ecab144 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -84,6 +84,18 @@ MODULE_PARM_DESC(max_persistent_grants,
                  "Maximum number of grants to map persistently");
 
 /*
+ * How long a persistent grant is allowed to remain allocated without being in
+ * use. The time is in seconds, 0 means indefinitely long.
+ */
+
+unsigned int xen_blkif_pgrant_timeout = 60;
+module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout,
+		   uint, 0644);
+MODULE_PARM_DESC(persistent_grant_unused_seconds,
+		 "Time in seconds an unused persistent grant is allowed to "
+		 "remain allocated. Default is 60, 0 means unlimited.");
+
+/*
  * Maximum number of rings/queues blkback supports, allow as many queues as there
  * are CPUs if user has not specified a value.
  */
@@ -123,6 +135,13 @@ module_param(log_stats, int, 0644);
 /* Number of free pages to remove on each call to gnttab_free_pages */
 #define NUM_BATCH_FREE_PAGES 10
 
+static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt)
+{
+	return xen_blkif_pgrant_timeout &&
+	       (jiffies - persistent_gnt->last_used >=
+		HZ * xen_blkif_pgrant_timeout);
+}
+
 static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
 {
 	unsigned long flags;
@@ -278,6 +297,7 @@ static void put_persistent_gnt(struct xen_blkif_ring *ring,
 {
 	if(!test_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags))
 		pr_alert_ratelimited("freeing a grant already unused\n");
+	persistent_gnt->last_used = jiffies;
 	set_bit(PERSISTENT_GNT_WAS_ACTIVE, persistent_gnt->flags);
 	clear_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags);
 	atomic_dec(&ring->persistent_gnt_in_use);
@@ -374,23 +394,23 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring)
 	bool scan_used = false, clean_used = false;
 	struct rb_root *root;
 
-	if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
-	    (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
-	    !ring->blkif->vbd.overflow_max_grants)) {
-		goto out;
-	}
-
 	if (work_busy(&ring->persistent_purge_work)) {
 		pr_alert_ratelimited("Scheduled work from previous purge is still busy, cannot purge list\n");
 		goto out;
 	}
 
-	num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
-	num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants + num_clean;
-	num_clean = min(ring->persistent_gnt_c, num_clean);
-	if ((num_clean == 0) ||
-	    (num_clean > (ring->persistent_gnt_c - atomic_read(&ring->persistent_gnt_in_use))))
-		goto out;
+	if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
+	    (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
+	    !ring->blkif->vbd.overflow_max_grants)) {
+		num_clean = 0;
+	} else {
+		num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
+		num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants +
+			    num_clean;
+		num_clean = min(ring->persistent_gnt_c, num_clean);
+		pr_debug("Going to purge at least %u persistent grants\n",
+			 num_clean);
+	}
 
 	/*
 	 * At this point, we can assure that there will be no calls
@@ -401,9 +421,7 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring)
          * number of grants.
 	 */
 
-	total = num_clean;
-
-	pr_debug("Going to purge %u persistent grants\n", num_clean);
+	total = 0;
 
 	BUG_ON(!list_empty(&ring->persistent_purge_list));
 	root = &ring->persistent_gnts;
@@ -419,39 +437,42 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring)
 
 		if (test_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags))
 			continue;
-		if (!scan_used &&
+		if (!scan_used && !persistent_gnt_timeout(persistent_gnt) &&
 		    (test_bit(PERSISTENT_GNT_WAS_ACTIVE, persistent_gnt->flags)))
 			continue;
+		if (scan_used && total >= num_clean)
+			continue;
 
 		rb_erase(&persistent_gnt->node, root);
 		list_add(&persistent_gnt->remove_node,
 			 &ring->persistent_purge_list);
-		if (--num_clean == 0)
-			goto finished;
+		total++;
 	}
 	/*
-	 * If we get here it means we also need to start cleaning
+	 * Check whether we also need to start cleaning
 	 * grants that were used since last purge in order to cope
 	 * with the requested num
 	 */
-	if (!scan_used && !clean_used) {
-		pr_debug("Still missing %u purged frames\n", num_clean);
+	if (!scan_used && !clean_used && total < num_clean) {
+		pr_debug("Still missing %u purged frames\n", num_clean - total);
 		scan_used = true;
 		goto purge_list;
 	}
-finished:
-	if (!clean_used) {
+
+	if (!clean_used && num_clean) {
 		pr_debug("Finished scanning for grants to clean, removing used flag\n");
 		clean_used = true;
 		goto purge_list;
 	}
 
-	ring->persistent_gnt_c -= (total - num_clean);
-	ring->blkif->vbd.overflow_max_grants = 0;
+	if (total) {
+		ring->persistent_gnt_c -= total;
+		ring->blkif->vbd.overflow_max_grants = 0;
 
-	/* We can defer this work */
-	schedule_work(&ring->persistent_purge_work);
-	pr_debug("Purged %u/%u\n", (total - num_clean), total);
+		/* We can defer this work */
+		schedule_work(&ring->persistent_purge_work);
+		pr_debug("Purged %u/%u\n", num_clean, total);
+	}
 
 out:
 	return;
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index ecb35fe8ca8d..26710602d463 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -250,6 +250,7 @@ struct persistent_gnt {
 	struct page *page;
 	grant_ref_t gnt;
 	grant_handle_t handle;
+	unsigned long last_used;
 	DECLARE_BITMAP(flags, PERSISTENT_GNT_FLAGS_SIZE);
 	struct rb_node node;
 	struct list_head remove_node;
-- 
2.13.7



* [PATCH] xen/blkfront: remove unused macros
  2018-08-06 11:33 [PATCH 0/4] xen/blk: persistent grant rework Juergen Gross
  2018-08-06 11:33 ` [PATCH 1/4] xen/blkback: don't keep persistent grants too long Juergen Gross
@ 2018-08-06 11:34 ` Juergen Gross
  2018-08-06 11:36   ` Juergen Gross
  2018-08-06 11:34 ` [PATCH 2/4] xen/blkfront: cleanup stale persistent grants Juergen Gross
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Juergen Gross @ 2018-08-06 11:34 UTC (permalink / raw)
  To: linux-kernel, xen-devel, linux-block
  Cc: konrad.wilk, roger.pau, axboe, boris.ostrovsky, Juergen Gross

Remove some macros not used anywhere.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/block/xen-blkfront.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index b5cedccb5d7d..94300dbe358b 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -251,14 +251,9 @@ static DEFINE_SPINLOCK(minor_lock);
 #define GRANTS_PER_INDIRECT_FRAME \
 	(XEN_PAGE_SIZE / sizeof(struct blkif_request_segment))
 
-#define PSEGS_PER_INDIRECT_FRAME	\
-	(GRANTS_INDIRECT_FRAME / GRANTS_PSEGS)
-
 #define INDIRECT_GREFS(_grants)		\
 	DIV_ROUND_UP(_grants, GRANTS_PER_INDIRECT_FRAME)
 
-#define GREFS(_psegs)	((_psegs) * GRANTS_PER_PSEG)
-
 static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo);
 static void blkfront_gather_backend_features(struct blkfront_info *info);
 static int negotiate_mq(struct blkfront_info *info);
-- 
2.13.7



* [PATCH 2/4] xen/blkfront: cleanup stale persistent grants
  2018-08-06 11:33 [PATCH 0/4] xen/blk: persistent grant rework Juergen Gross
  2018-08-06 11:33 ` [PATCH 1/4] xen/blkback: don't keep persistent grants too long Juergen Gross
  2018-08-06 11:34 ` [PATCH] xen/blkfront: remove unused macros Juergen Gross
@ 2018-08-06 11:34 ` Juergen Gross
  2018-08-06 16:16   ` Roger Pau Monné
  2018-08-06 11:34 ` [PATCH 3/4] xen/blkfront: reorder tests in xlblk_init() Juergen Gross
  2018-08-06 11:34 ` [PATCH 4/4] xen/blkback: remove unused pers_gnts_lock from struct xen_blkif_ring Juergen Gross
  4 siblings, 1 reply; 16+ messages in thread
From: Juergen Gross @ 2018-08-06 11:34 UTC (permalink / raw)
  To: linux-kernel, xen-devel, linux-block
  Cc: konrad.wilk, roger.pau, axboe, boris.ostrovsky, Juergen Gross

Add a periodic cleanup function to remove old persistent grants which
are no longer in use on the backend side. This avoids starvation in
case there are lots of persistent grants for a device which is no
longer involved in any I/O.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/block/xen-blkfront.c | 99 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 95 insertions(+), 4 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index b5cedccb5d7d..19feb8835fc4 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -46,6 +46,7 @@
 #include <linux/scatterlist.h>
 #include <linux/bitmap.h>
 #include <linux/list.h>
+#include <linux/workqueue.h>
 
 #include <xen/xen.h>
 #include <xen/xenbus.h>
@@ -121,6 +122,9 @@ static inline struct blkif_req *blkif_req(struct request *rq)
 
 static DEFINE_MUTEX(blkfront_mutex);
 static const struct block_device_operations xlvbd_block_fops;
+static struct delayed_work blkfront_work;
+static LIST_HEAD(info_list);
+static bool blkfront_work_active;
 
 /*
  * Maximum number of segments in indirect requests, the actual value used by
@@ -216,6 +220,7 @@ struct blkfront_info
 	/* Save uncomplete reqs and bios for migration. */
 	struct list_head requests;
 	struct bio_list bio_list;
+	struct list_head info_list;
 };
 
 static unsigned int nr_minors;
@@ -1764,6 +1769,12 @@ static int write_per_ring_nodes(struct xenbus_transaction xbt,
 	return err;
 }
 
+static void free_info(struct blkfront_info *info)
+{
+	list_del(&info->info_list);
+	kfree(info);
+}
+
 /* Common code used when first setting up, and when resuming. */
 static int talk_to_blkback(struct xenbus_device *dev,
 			   struct blkfront_info *info)
@@ -1885,7 +1896,10 @@ static int talk_to_blkback(struct xenbus_device *dev,
  destroy_blkring:
 	blkif_free(info, 0);
 
-	kfree(info);
+	mutex_lock(&blkfront_mutex);
+	free_info(info);
+	mutex_unlock(&blkfront_mutex);
+
 	dev_set_drvdata(&dev->dev, NULL);
 
 	return err;
@@ -1996,6 +2010,10 @@ static int blkfront_probe(struct xenbus_device *dev,
 	info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
 	dev_set_drvdata(&dev->dev, info);
 
+	mutex_lock(&blkfront_mutex);
+	list_add(&info->info_list, &info_list);
+	mutex_unlock(&blkfront_mutex);
+
 	return 0;
 }
 
@@ -2306,6 +2324,15 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
 	if (indirect_segments <= BLKIF_MAX_SEGMENTS_PER_REQUEST)
 		indirect_segments = 0;
 	info->max_indirect_segments = indirect_segments;
+
+	if (info->feature_persistent) {
+		mutex_lock(&blkfront_mutex);
+		if (!blkfront_work_active) {
+			blkfront_work_active = true;
+			schedule_delayed_work(&blkfront_work, HZ * 10);
+		}
+		mutex_unlock(&blkfront_mutex);
+	}
 }
 
 /*
@@ -2487,7 +2514,9 @@ static int blkfront_remove(struct xenbus_device *xbdev)
 	mutex_unlock(&info->mutex);
 
 	if (!bdev) {
-		kfree(info);
+		mutex_lock(&blkfront_mutex);
+		free_info(info);
+		mutex_unlock(&blkfront_mutex);
 		return 0;
 	}
 
@@ -2507,7 +2536,9 @@ static int blkfront_remove(struct xenbus_device *xbdev)
 	if (info && !bdev->bd_openers) {
 		xlvbd_release_gendisk(info);
 		disk->private_data = NULL;
-		kfree(info);
+		mutex_lock(&blkfront_mutex);
+		free_info(info);
+		mutex_unlock(&blkfront_mutex);
 	}
 
 	mutex_unlock(&bdev->bd_mutex);
@@ -2590,7 +2621,7 @@ static void blkif_release(struct gendisk *disk, fmode_t mode)
 		dev_info(disk_to_dev(bdev->bd_disk), "releasing disk\n");
 		xlvbd_release_gendisk(info);
 		disk->private_data = NULL;
-		kfree(info);
+		free_info(info);
 	}
 
 out:
@@ -2623,6 +2654,62 @@ static struct xenbus_driver blkfront_driver = {
 	.is_ready = blkfront_is_ready,
 };
 
+static void purge_persistent_grants(struct blkfront_info *info)
+{
+	unsigned int i;
+	unsigned long flags;
+
+	for (i = 0; i < info->nr_rings; i++) {
+		struct blkfront_ring_info *rinfo = &info->rinfo[i];
+		struct grant *gnt_list_entry, *tmp;
+
+		spin_lock_irqsave(&rinfo->ring_lock, flags);
+
+		if (rinfo->persistent_gnts_c == 0) {
+			spin_unlock_irqrestore(&rinfo->ring_lock, flags);
+			continue;
+		}
+
+		list_for_each_entry_safe(gnt_list_entry, tmp, &rinfo->grants,
+					 node) {
+			if (gnt_list_entry->gref == GRANT_INVALID_REF ||
+			    gnttab_query_foreign_access(gnt_list_entry->gref))
+				continue;
+
+			list_del(&gnt_list_entry->node);
+			gnttab_end_foreign_access(gnt_list_entry->gref, 0, 0UL);
+			rinfo->persistent_gnts_c--;
+			__free_page(gnt_list_entry->page);
+			kfree(gnt_list_entry);
+		}
+
+		spin_unlock_irqrestore(&rinfo->ring_lock, flags);
+	}
+}
+
+static void blkfront_delay_work(struct work_struct *work)
+{
+	struct blkfront_info *info;
+
+	mutex_lock(&blkfront_mutex);
+
+	blkfront_work_active = false;
+
+	list_for_each_entry(info, &info_list, info_list) {
+		if (info->feature_persistent) {
+			blkfront_work_active = true;
+			mutex_lock(&info->mutex);
+			purge_persistent_grants(info);
+			mutex_unlock(&info->mutex);
+		}
+	}
+
+	if (blkfront_work_active)
+		schedule_delayed_work(&blkfront_work, HZ * 10);
+
+	mutex_unlock(&blkfront_mutex);
+}
+
 static int __init xlblk_init(void)
 {
 	int ret;
@@ -2655,6 +2742,8 @@ static int __init xlblk_init(void)
 		return -ENODEV;
 	}
 
+	INIT_DELAYED_WORK(&blkfront_work, blkfront_delay_work);
+
 	ret = xenbus_register_frontend(&blkfront_driver);
 	if (ret) {
 		unregister_blkdev(XENVBD_MAJOR, DEV_NAME);
@@ -2668,6 +2757,8 @@ module_init(xlblk_init);
 
 static void __exit xlblk_exit(void)
 {
+	cancel_delayed_work_sync(&blkfront_work);
+
 	xenbus_unregister_driver(&blkfront_driver);
 	unregister_blkdev(XENVBD_MAJOR, DEV_NAME);
 	kfree(minors);
-- 
2.13.7



* [PATCH 3/4] xen/blkfront: reorder tests in xlblk_init()
  2018-08-06 11:33 [PATCH 0/4] xen/blk: persistent grant rework Juergen Gross
                   ` (2 preceding siblings ...)
  2018-08-06 11:34 ` [PATCH 2/4] xen/blkfront: cleanup stale persistent grants Juergen Gross
@ 2018-08-06 11:34 ` Juergen Gross
  2018-08-06 16:18   ` Roger Pau Monné
  2018-08-06 11:34 ` [PATCH 4/4] xen/blkback: remove unused pers_gnts_lock from struct xen_blkif_ring Juergen Gross
  4 siblings, 1 reply; 16+ messages in thread
From: Juergen Gross @ 2018-08-06 11:34 UTC (permalink / raw)
  To: linux-kernel, xen-devel, linux-block
  Cc: konrad.wilk, roger.pau, axboe, boris.ostrovsky, Juergen Gross

In case we don't want pv block devices we should not test parameters
for sanity and possibly print out error messages. So test the
precluding conditions before checking parameters.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/block/xen-blkfront.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 19feb8835fc4..f72d96384326 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -2718,6 +2718,15 @@ static int __init xlblk_init(void)
 	if (!xen_domain())
 		return -ENODEV;
 
+	if (!xen_has_pv_disk_devices())
+		return -ENODEV;
+
+	if (register_blkdev(XENVBD_MAJOR, DEV_NAME)) {
+		pr_warn("xen_blk: can't get major %d with name %s\n",
+			XENVBD_MAJOR, DEV_NAME);
+		return -ENODEV;
+	}
+
 	if (xen_blkif_max_segments < BLKIF_MAX_SEGMENTS_PER_REQUEST)
 		xen_blkif_max_segments = BLKIF_MAX_SEGMENTS_PER_REQUEST;
 
@@ -2733,15 +2742,6 @@ static int __init xlblk_init(void)
 		xen_blkif_max_queues = nr_cpus;
 	}
 
-	if (!xen_has_pv_disk_devices())
-		return -ENODEV;
-
-	if (register_blkdev(XENVBD_MAJOR, DEV_NAME)) {
-		printk(KERN_WARNING "xen_blk: can't get major %d with name %s\n",
-		       XENVBD_MAJOR, DEV_NAME);
-		return -ENODEV;
-	}
-
 	INIT_DELAYED_WORK(&blkfront_work, blkfront_delay_work);
 
 	ret = xenbus_register_frontend(&blkfront_driver);
-- 
2.13.7



* [PATCH 4/4] xen/blkback: remove unused pers_gnts_lock from struct xen_blkif_ring
  2018-08-06 11:33 [PATCH 0/4] xen/blk: persistent grant rework Juergen Gross
                   ` (3 preceding siblings ...)
  2018-08-06 11:34 ` [PATCH 3/4] xen/blkfront: reorder tests in xlblk_init() Juergen Gross
@ 2018-08-06 11:34 ` Juergen Gross
  2018-08-06 16:20   ` Roger Pau Monné
  4 siblings, 1 reply; 16+ messages in thread
From: Juergen Gross @ 2018-08-06 11:34 UTC (permalink / raw)
  To: linux-kernel, xen-devel, linux-block
  Cc: konrad.wilk, roger.pau, axboe, boris.ostrovsky, Juergen Gross

pers_gnts_lock isn't being used anywhere. Remove it.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/block/xen-blkback/common.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index 26710602d463..4b7747159c38 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -279,7 +279,6 @@ struct xen_blkif_ring {
 	wait_queue_head_t	pending_free_wq;
 
 	/* Tree to store persistent grants. */
-	spinlock_t		pers_gnts_lock;
 	struct rb_root		persistent_gnts;
 	unsigned int		persistent_gnt_c;
 	atomic_t		persistent_gnt_in_use;
-- 
2.13.7



* Re: [PATCH] xen/blkfront: remove unused macros
  2018-08-06 11:34 ` [PATCH] xen/blkfront: remove unused macros Juergen Gross
@ 2018-08-06 11:36   ` Juergen Gross
  0 siblings, 0 replies; 16+ messages in thread
From: Juergen Gross @ 2018-08-06 11:36 UTC (permalink / raw)
  To: linux-kernel, xen-devel, linux-block
  Cc: konrad.wilk, roger.pau, axboe, boris.ostrovsky

On 06/08/18 13:34, Juergen Gross wrote:
> Remove some macros not used anywhere.

Meh, ignore that one, please. Doesn't belong to this series.


Juergen


* Re: [PATCH 1/4] xen/blkback: don't keep persistent grants too long
  2018-08-06 11:33 ` [PATCH 1/4] xen/blkback: don't keep persistent grants too long Juergen Gross
@ 2018-08-06 15:58   ` Roger Pau Monné
  2018-08-07  6:34     ` Juergen Gross
  0 siblings, 1 reply; 16+ messages in thread
From: Roger Pau Monné @ 2018-08-06 15:58 UTC (permalink / raw)
  To: Juergen Gross
  Cc: linux-kernel, xen-devel, linux-block, konrad.wilk, axboe,
	boris.ostrovsky

On Mon, Aug 06, 2018 at 01:33:59PM +0200, Juergen Gross wrote:
> Persistent grants are allocated until a per-ring threshold is
> reached. Those grants won't be freed until the ring is destroyed,
> meaning resources are kept busy even though they might no longer be
> used.
> 
> Instead of freeing persistent grants only when the threshold is
> reached, add a timestamp and remove all persistent grants that have
> not been in use for a minute.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  drivers/block/xen-blkback/blkback.c | 77 +++++++++++++++++++++++--------------
>  drivers/block/xen-blkback/common.h  |  1 +
>  2 files changed, 50 insertions(+), 28 deletions(-)

You should document this new parameter in
Documentation/ABI/testing/sysfs-driver-xen-blkback.

> 
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index b55b245e8052..485e3ecab144 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -84,6 +84,18 @@ MODULE_PARM_DESC(max_persistent_grants,
>                   "Maximum number of grants to map persistently");
>  
>  /*
> + * How long a persistent grant is allowed to remain allocated without being in
> + * use. The time is in seconds, 0 means indefinitely long.
> + */
> +
> +unsigned int xen_blkif_pgrant_timeout = 60;
> +module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout,
> +		   uint, 0644);
> +MODULE_PARM_DESC(persistent_grant_unused_seconds,
> +		 "Time in seconds an unused persistent grant is allowed to "
> +		 "remain allocated. Default is 60, 0 means unlimited.");
> +
> +/*
>   * Maximum number of rings/queues blkback supports, allow as many queues as there
>   * are CPUs if user has not specified a value.
>   */
> @@ -123,6 +135,13 @@ module_param(log_stats, int, 0644);
>  /* Number of free pages to remove on each call to gnttab_free_pages */
>  #define NUM_BATCH_FREE_PAGES 10
>  
> +static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt)
> +{
> +	return xen_blkif_pgrant_timeout &&
> +	       (jiffies - persistent_gnt->last_used >=
> +		HZ * xen_blkif_pgrant_timeout);
> +}
> +
>  static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
>  {
>  	unsigned long flags;
> @@ -278,6 +297,7 @@ static void put_persistent_gnt(struct xen_blkif_ring *ring,
>  {
>  	if(!test_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags))
>  		pr_alert_ratelimited("freeing a grant already unused\n");
> +	persistent_gnt->last_used = jiffies;
>  	set_bit(PERSISTENT_GNT_WAS_ACTIVE, persistent_gnt->flags);
>  	clear_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags);
>  	atomic_dec(&ring->persistent_gnt_in_use);
> @@ -374,23 +394,23 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring)
>  	bool scan_used = false, clean_used = false;
>  	struct rb_root *root;
>  
> -	if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
> -	    (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
> -	    !ring->blkif->vbd.overflow_max_grants)) {
> -		goto out;
> -	}
> -
>  	if (work_busy(&ring->persistent_purge_work)) {
>  		pr_alert_ratelimited("Scheduled work from previous purge is still busy, cannot purge list\n");
>  		goto out;
>  	}
>  
> -	num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
> -	num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants + num_clean;
> -	num_clean = min(ring->persistent_gnt_c, num_clean);
> -	if ((num_clean == 0) ||
> -	    (num_clean > (ring->persistent_gnt_c - atomic_read(&ring->persistent_gnt_in_use))))
> -		goto out;
> +	if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
> +	    (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
> +	    !ring->blkif->vbd.overflow_max_grants)) {
> +		num_clean = 0;
> +	} else {
> +		num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
> +		num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants +
> +			    num_clean;
> +		num_clean = min(ring->persistent_gnt_c, num_clean);
> +		pr_debug("Going to purge at least %u persistent grants\n",
> +			 num_clean);
> +	}
>  
>  	/*
>  	 * At this point, we can assure that there will be no calls
> @@ -401,9 +421,7 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring)
>           * number of grants.
>  	 */
>  
> -	total = num_clean;
> -
> -	pr_debug("Going to purge %u persistent grants\n", num_clean);
> +	total = 0;
>  
>  	BUG_ON(!list_empty(&ring->persistent_purge_list));
>  	root = &ring->persistent_gnts;
> @@ -419,39 +437,42 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring)
>  
>  		if (test_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags))
>  			continue;
> -		if (!scan_used &&
> +		if (!scan_used && !persistent_gnt_timeout(persistent_gnt) &&
>  		    (test_bit(PERSISTENT_GNT_WAS_ACTIVE, persistent_gnt->flags)))

If you store the jiffies of the time when the grant was last used it
seems like we could get rid of the PERSISTENT_GNT_WAS_ACTIVE flag and
instead use the per-grant jiffies and the jiffies from the last scan
in order to decide which grants to remove?
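
A wrap-safe way to express that comparison could look like the sketch
below. This is purely hypothetical code illustrating the suggestion, not
code from the series; `used_since()` mirrors the kernel's
time_after_eq() macro:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: keep the jiffies of the previous purge pass and
 * treat a grant as "recently active" when its last_used stamp is not
 * older than that. The signed cast of the unsigned difference keeps the
 * comparison correct across counter wrap, like time_after_eq(). */
static bool used_since(unsigned long last_used, unsigned long last_purge)
{
	return (long)(last_used - last_purge) >= 0;
}
```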

Thanks, Roger.


* Re: [PATCH 2/4] xen/blkfront: cleanup stale persistent grants
  2018-08-06 11:34 ` [PATCH 2/4] xen/blkfront: cleanup stale persistent grants Juergen Gross
@ 2018-08-06 16:16   ` Roger Pau Monné
  2018-08-07  6:31     ` Juergen Gross
  0 siblings, 1 reply; 16+ messages in thread
From: Roger Pau Monné @ 2018-08-06 16:16 UTC (permalink / raw)
  To: Juergen Gross
  Cc: linux-kernel, xen-devel, linux-block, konrad.wilk, axboe,
	boris.ostrovsky

On Mon, Aug 06, 2018 at 01:34:01PM +0200, Juergen Gross wrote:
> Add a periodic cleanup function to remove old persistent grants which
> are no longer in use on the backend side. This avoids starvation in
> case there are lots of persistent grants for a device which is no
> longer involved in any I/O.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  drivers/block/xen-blkfront.c | 99 ++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 95 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index b5cedccb5d7d..19feb8835fc4 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -46,6 +46,7 @@
>  #include <linux/scatterlist.h>
>  #include <linux/bitmap.h>
>  #include <linux/list.h>
> +#include <linux/workqueue.h>
>  
>  #include <xen/xen.h>
>  #include <xen/xenbus.h>
> @@ -121,6 +122,9 @@ static inline struct blkif_req *blkif_req(struct request *rq)
>  
>  static DEFINE_MUTEX(blkfront_mutex);
>  static const struct block_device_operations xlvbd_block_fops;
> +static struct delayed_work blkfront_work;
> +static LIST_HEAD(info_list);
> +static bool blkfront_work_active;
>  
>  /*
>   * Maximum number of segments in indirect requests, the actual value used by
> @@ -216,6 +220,7 @@ struct blkfront_info
>  	/* Save uncomplete reqs and bios for migration. */
>  	struct list_head requests;
>  	struct bio_list bio_list;
> +	struct list_head info_list;
>  };
>  
>  static unsigned int nr_minors;
> @@ -1764,6 +1769,12 @@ static int write_per_ring_nodes(struct xenbus_transaction xbt,
>  	return err;
>  }
>  
> +static void free_info(struct blkfront_info *info)
> +{
> +	list_del(&info->info_list);
> +	kfree(info);
> +}
> +
>  /* Common code used when first setting up, and when resuming. */
>  static int talk_to_blkback(struct xenbus_device *dev,
>  			   struct blkfront_info *info)
> @@ -1885,7 +1896,10 @@ static int talk_to_blkback(struct xenbus_device *dev,
>   destroy_blkring:
>  	blkif_free(info, 0);
>  
> -	kfree(info);
> +	mutex_lock(&blkfront_mutex);
> +	free_info(info);
> +	mutex_unlock(&blkfront_mutex);
> +
>  	dev_set_drvdata(&dev->dev, NULL);
>  
>  	return err;
> @@ -1996,6 +2010,10 @@ static int blkfront_probe(struct xenbus_device *dev,
>  	info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
>  	dev_set_drvdata(&dev->dev, info);
>  
> +	mutex_lock(&blkfront_mutex);
> +	list_add(&info->info_list, &info_list);
> +	mutex_unlock(&blkfront_mutex);
> +
>  	return 0;
>  }
>  
> @@ -2306,6 +2324,15 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
>  	if (indirect_segments <= BLKIF_MAX_SEGMENTS_PER_REQUEST)
>  		indirect_segments = 0;
>  	info->max_indirect_segments = indirect_segments;
> +
> +	if (info->feature_persistent) {
> +		mutex_lock(&blkfront_mutex);
> +		if (!blkfront_work_active) {
> +			blkfront_work_active = true;
> +			schedule_delayed_work(&blkfront_work, HZ * 10);

Does it make sense to provide a module parameter to tune the schedule
of the cleanup routine?

> +		}
> +		mutex_unlock(&blkfront_mutex);

Is it really necessary to have the blkfront_work_active boolean? What
happens if you queue the same delayed work more than once?
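
For reference, schedule_delayed_work() is already idempotent for
pending work: it returns false and leaves the existing timer untouched
when the work item is still queued. A toy model of that contract (model
only; the real API also arms a timer and runs the work on a workqueue):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the schedule_delayed_work() contract: a second schedule
 * attempt while the work is still pending is a no-op reporting false. */
struct delayed_work_model {
	bool pending;
};

static bool model_schedule(struct delayed_work_model *w)
{
	if (w->pending)
		return false;   /* already queued: nothing happens */
	w->pending = true;      /* queued; cleared when the work runs */
	return true;
}

static void model_run(struct delayed_work_model *w)
{
	w->pending = false;     /* worker executed, may be rescheduled */
}
```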

Thanks, Roger.


* Re: [PATCH 3/4] xen/blkfront: reorder tests in xlblk_init()
  2018-08-06 11:34 ` [PATCH 3/4] xen/blkfront: reorder tests in xlblk_init() Juergen Gross
@ 2018-08-06 16:18   ` Roger Pau Monné
  0 siblings, 0 replies; 16+ messages in thread
From: Roger Pau Monné @ 2018-08-06 16:18 UTC (permalink / raw)
  To: Juergen Gross
  Cc: linux-kernel, xen-devel, linux-block, konrad.wilk, axboe,
	boris.ostrovsky

On Mon, Aug 06, 2018 at 01:34:02PM +0200, Juergen Gross wrote:
> In case we don't want pv block devices we should not test parameters
> for sanity and possibly print out error messages. So test the
> precluding conditions before checking parameters.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.


* Re: [PATCH 4/4] xen/blkback: remove unused pers_gnts_lock from struct xen_blkif_ring
  2018-08-06 11:34 ` [PATCH 4/4] xen/blkback: remove unused pers_gnts_lock from struct xen_blkif_ring Juergen Gross
@ 2018-08-06 16:20   ` Roger Pau Monné
  0 siblings, 0 replies; 16+ messages in thread
From: Roger Pau Monné @ 2018-08-06 16:20 UTC (permalink / raw)
  To: Juergen Gross
  Cc: linux-kernel, xen-devel, linux-block, konrad.wilk, axboe,
	boris.ostrovsky

On Mon, Aug 06, 2018 at 01:34:03PM +0200, Juergen Gross wrote:
> pers_gnts_lock isn't being used anywhere. Remove it.

That was likely a leftover from when the list of persistent grants was
shared between rings.

> Signed-off-by: Juergen Gross <jgross@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] xen/blkfront: cleanup stale persistent grants
  2018-08-06 16:16   ` Roger Pau Monné
@ 2018-08-07  6:31     ` Juergen Gross
  2018-08-07 14:14       ` Roger Pau Monné
  0 siblings, 1 reply; 16+ messages in thread
From: Juergen Gross @ 2018-08-07  6:31 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: linux-kernel, xen-devel, linux-block, konrad.wilk, axboe,
	boris.ostrovsky

On 06/08/18 18:16, Roger Pau Monné wrote:
> On Mon, Aug 06, 2018 at 01:34:01PM +0200, Juergen Gross wrote:
>> Add a periodic cleanup function to remove old persistent grants which
>> are no longer in use on the backend side. This avoids starvation in
>> case there are lots of persistent grants for a device which is no
>> longer doing any I/O.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>  drivers/block/xen-blkfront.c | 99 ++++++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 95 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>> index b5cedccb5d7d..19feb8835fc4 100644
>> --- a/drivers/block/xen-blkfront.c
>> +++ b/drivers/block/xen-blkfront.c
>> @@ -46,6 +46,7 @@
>>  #include <linux/scatterlist.h>
>>  #include <linux/bitmap.h>
>>  #include <linux/list.h>
>> +#include <linux/workqueue.h>
>>  
>>  #include <xen/xen.h>
>>  #include <xen/xenbus.h>
>> @@ -121,6 +122,9 @@ static inline struct blkif_req *blkif_req(struct request *rq)
>>  
>>  static DEFINE_MUTEX(blkfront_mutex);
>>  static const struct block_device_operations xlvbd_block_fops;
>> +static struct delayed_work blkfront_work;
>> +static LIST_HEAD(info_list);
>> +static bool blkfront_work_active;
>>  
>>  /*
>>   * Maximum number of segments in indirect requests, the actual value used by
>> @@ -216,6 +220,7 @@ struct blkfront_info
>>  	/* Save uncomplete reqs and bios for migration. */
>>  	struct list_head requests;
>>  	struct bio_list bio_list;
>> +	struct list_head info_list;
>>  };
>>  
>>  static unsigned int nr_minors;
>> @@ -1764,6 +1769,12 @@ static int write_per_ring_nodes(struct xenbus_transaction xbt,
>>  	return err;
>>  }
>>  
>> +static void free_info(struct blkfront_info *info)
>> +{
>> +	list_del(&info->info_list);
>> +	kfree(info);
>> +}
>> +
>>  /* Common code used when first setting up, and when resuming. */
>>  static int talk_to_blkback(struct xenbus_device *dev,
>>  			   struct blkfront_info *info)
>> @@ -1885,7 +1896,10 @@ static int talk_to_blkback(struct xenbus_device *dev,
>>   destroy_blkring:
>>  	blkif_free(info, 0);
>>  
>> -	kfree(info);
>> +	mutex_lock(&blkfront_mutex);
>> +	free_info(info);
>> +	mutex_unlock(&blkfront_mutex);
>> +
>>  	dev_set_drvdata(&dev->dev, NULL);
>>  
>>  	return err;
>> @@ -1996,6 +2010,10 @@ static int blkfront_probe(struct xenbus_device *dev,
>>  	info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
>>  	dev_set_drvdata(&dev->dev, info);
>>  
>> +	mutex_lock(&blkfront_mutex);
>> +	list_add(&info->info_list, &info_list);
>> +	mutex_unlock(&blkfront_mutex);
>> +
>>  	return 0;
>>  }
>>  
>> @@ -2306,6 +2324,15 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
>>  	if (indirect_segments <= BLKIF_MAX_SEGMENTS_PER_REQUEST)
>>  		indirect_segments = 0;
>>  	info->max_indirect_segments = indirect_segments;
>> +
>> +	if (info->feature_persistent) {
>> +		mutex_lock(&blkfront_mutex);
>> +		if (!blkfront_work_active) {
>> +			blkfront_work_active = true;
>> +			schedule_delayed_work(&blkfront_work, HZ * 10);
> 
> Does it make sense to provide a module parameter to tune the schedule
> of the cleanup routine?

I don't think this is something anyone would like to tune.

In case you think it should be tunable I can add a parameter, of course.

> 
>> +		}
>> +		mutex_unlock(&blkfront_mutex);
> 
> Is it really necessary to have the blkfront_work_active boolean? What
> happens if you queue the same delayed work more than once?

In case there is already work queued, later calls of
schedule_delayed_work() will be ignored.

So yes, I can drop the global boolean (I still need a local flag in
blkfront_delay_work() for controlling the need to call
schedule_delayed_work() again).
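
[Editorial note: the "already queued" behaviour referred to above can be
modelled in plain userspace C. This is a sketch only — the mock_* names
below are invented for illustration and are not kernel code.]

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Minimal model of a delayed work item. The kernel's
 * schedule_delayed_work() returns false and does nothing when the
 * work item is already pending; it only queues it otherwise.
 */
struct mock_delayed_work {
	bool pending;
	unsigned long expires;
};

static bool mock_schedule_delayed_work(struct mock_delayed_work *w,
				       unsigned long delay,
				       unsigned long now)
{
	if (w->pending)
		return false;	/* already queued: this call is ignored */
	w->pending = true;
	w->expires = now + delay;
	return true;
}

/* The pending state is cleared when the work runs; it may re-arm itself. */
static void mock_run_work(struct mock_delayed_work *w)
{
	w->pending = false;
}
```

Under this model, repeated schedule attempts while the work is pending
are harmless no-ops, which is why the global blkfront_work_active flag
can be dropped as discussed above.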


Juergen

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] xen/blkback: don't keep persistent grants too long
  2018-08-06 15:58   ` Roger Pau Monné
@ 2018-08-07  6:34     ` Juergen Gross
  0 siblings, 0 replies; 16+ messages in thread
From: Juergen Gross @ 2018-08-07  6:34 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: linux-kernel, xen-devel, linux-block, konrad.wilk, axboe,
	boris.ostrovsky

On 06/08/18 17:58, Roger Pau Monné wrote:
> On Mon, Aug 06, 2018 at 01:33:59PM +0200, Juergen Gross wrote:
>> Persistent grants are allocated until a threshold per ring is
>> reached. Those grants won't be freed until the ring is destroyed,
>> meaning resources are kept busy which might no longer be used.
>>
>> Instead of freeing only persistent grants until the threshold is
>> reached add a timestamp and remove all persistent grants not having
>> been in use for a minute.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>  drivers/block/xen-blkback/blkback.c | 77 +++++++++++++++++++++++--------------
>>  drivers/block/xen-blkback/common.h  |  1 +
>>  2 files changed, 50 insertions(+), 28 deletions(-)
> 
> You should document this new parameter in
> Documentation/ABI/testing/sysfs-driver-xen-blkback.

Yes.

> 
>>
>> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
>> index b55b245e8052..485e3ecab144 100644
>> --- a/drivers/block/xen-blkback/blkback.c
>> +++ b/drivers/block/xen-blkback/blkback.c
>> @@ -84,6 +84,18 @@ MODULE_PARM_DESC(max_persistent_grants,
>>                   "Maximum number of grants to map persistently");
>>  
>>  /*
>> + * How long a persistent grant is allowed to remain allocated without being in
>> + * use. The time is in seconds, 0 means indefinitely long.
>> + */
>> +
>> +unsigned int xen_blkif_pgrant_timeout = 60;
>> +module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout,
>> +		   uint, 0644);
>> +MODULE_PARM_DESC(persistent_grant_unused_seconds,
>> +		 "Time in seconds an unused persistent grant is allowed to "
>> +		 "remain allocated. Default is 60, 0 means unlimited.");
>> +
>> +/*
>>   * Maximum number of rings/queues blkback supports, allow as many queues as there
>>   * are CPUs if user has not specified a value.
>>   */
>> @@ -123,6 +135,13 @@ module_param(log_stats, int, 0644);
>>  /* Number of free pages to remove on each call to gnttab_free_pages */
>>  #define NUM_BATCH_FREE_PAGES 10
>>  
>> +static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt)
>> +{
>> +	return xen_blkif_pgrant_timeout &&
>> +	       (jiffies - persistent_gnt->last_used >=
>> +		HZ * xen_blkif_pgrant_timeout);
>> +}
>> +
>>  static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
>>  {
>>  	unsigned long flags;
>> @@ -278,6 +297,7 @@ static void put_persistent_gnt(struct xen_blkif_ring *ring,
>>  {
>>  	if(!test_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags))
>>  		pr_alert_ratelimited("freeing a grant already unused\n");
>> +	persistent_gnt->last_used = jiffies;
>>  	set_bit(PERSISTENT_GNT_WAS_ACTIVE, persistent_gnt->flags);
>>  	clear_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags);
>>  	atomic_dec(&ring->persistent_gnt_in_use);
>> @@ -374,23 +394,23 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring)
>>  	bool scan_used = false, clean_used = false;
>>  	struct rb_root *root;
>>  
>> -	if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
>> -	    (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
>> -	    !ring->blkif->vbd.overflow_max_grants)) {
>> -		goto out;
>> -	}
>> -
>>  	if (work_busy(&ring->persistent_purge_work)) {
>>  		pr_alert_ratelimited("Scheduled work from previous purge is still busy, cannot purge list\n");
>>  		goto out;
>>  	}
>>  
>> -	num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
>> -	num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants + num_clean;
>> -	num_clean = min(ring->persistent_gnt_c, num_clean);
>> -	if ((num_clean == 0) ||
>> -	    (num_clean > (ring->persistent_gnt_c - atomic_read(&ring->persistent_gnt_in_use))))
>> -		goto out;
>> +	if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
>> +	    (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
>> +	    !ring->blkif->vbd.overflow_max_grants)) {
>> +		num_clean = 0;
>> +	} else {
>> +		num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
>> +		num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants +
>> +			    num_clean;
>> +		num_clean = min(ring->persistent_gnt_c, num_clean);
>> +		pr_debug("Going to purge at least %u persistent grants\n",
>> +			 num_clean);
>> +	}
>>  
>>  	/*
>>  	 * At this point, we can assure that there will be no calls
>> @@ -401,9 +421,7 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring)
>>           * number of grants.
>>  	 */
>>  
>> -	total = num_clean;
>> -
>> -	pr_debug("Going to purge %u persistent grants\n", num_clean);
>> +	total = 0;
>>  
>>  	BUG_ON(!list_empty(&ring->persistent_purge_list));
>>  	root = &ring->persistent_gnts;
>> @@ -419,39 +437,42 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring)
>>  
>>  		if (test_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags))
>>  			continue;
>> -		if (!scan_used &&
>> +		if (!scan_used && !persistent_gnt_timeout(persistent_gnt) &&
>>  		    (test_bit(PERSISTENT_GNT_WAS_ACTIVE, persistent_gnt->flags)))
> 
> If you store the jiffies of the time when the grant was last used it
> seems like we could get rid of the PERSISTENT_GNT_WAS_ACTIVE flag and
> instead use the per-grant jiffies and the jiffies from the last scan
> in order to decide which grants to remove?

True. This might make the control flow a little bit easier to
understand.
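
[Editorial note: the expiry check quoted above relies on unsigned
wrap-around arithmetic to compute elapsed jiffies. A userspace sketch —
MOCK_HZ, mock_timeout and the struct are invented stand-ins, not the
driver's code — behaves correctly even when the jiffies counter wraps:]

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MOCK_HZ 250UL			/* stand-in for the kernel's HZ */

static unsigned long mock_timeout = 60;	/* seconds; 0 means never expire */

struct mock_pgrant {
	unsigned long last_used;	/* "jiffies" at last put */
};

/*
 * Same shape as persistent_gnt_timeout() in the patch: because the
 * subtraction is done on unsigned longs, the elapsed time is computed
 * correctly even if 'now' has wrapped past ULONG_MAX since last_used.
 */
static bool mock_pgrant_expired(const struct mock_pgrant *g,
				unsigned long now)
{
	return mock_timeout &&
	       (now - g->last_used >= MOCK_HZ * mock_timeout);
}
```

The kernel also offers time_after()/time_before() helpers for such
comparisons; the open-coded unsigned subtraction in the patch is
equivalent in effect for this elapsed-time test.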


Juergen

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] xen/blkfront: cleanup stale persistent grants
  2018-08-07  6:31     ` Juergen Gross
@ 2018-08-07 14:14       ` Roger Pau Monné
  2018-08-07 15:56         ` Juergen Gross
  0 siblings, 1 reply; 16+ messages in thread
From: Roger Pau Monné @ 2018-08-07 14:14 UTC (permalink / raw)
  To: Juergen Gross
  Cc: linux-kernel, xen-devel, linux-block, konrad.wilk, axboe,
	boris.ostrovsky

On Tue, Aug 07, 2018 at 08:31:31AM +0200, Juergen Gross wrote:
> On 06/08/18 18:16, Roger Pau Monné wrote:
> > On Mon, Aug 06, 2018 at 01:34:01PM +0200, Juergen Gross wrote:
> >> Add a periodic cleanup function to remove old persistent grants which
> >> are no longer in use on the backend side. This avoids starvation in
> >> case there are lots of persistent grants for a device which is no
> >> longer doing any I/O.
> >>
> >> Signed-off-by: Juergen Gross <jgross@suse.com>
> >> ---
> >>  drivers/block/xen-blkfront.c | 99 ++++++++++++++++++++++++++++++++++++++++++--
> >>  1 file changed, 95 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> >> index b5cedccb5d7d..19feb8835fc4 100644
> >> --- a/drivers/block/xen-blkfront.c
> >> +++ b/drivers/block/xen-blkfront.c
> >> @@ -46,6 +46,7 @@
> >>  #include <linux/scatterlist.h>
> >>  #include <linux/bitmap.h>
> >>  #include <linux/list.h>
> >> +#include <linux/workqueue.h>
> >>  
> >>  #include <xen/xen.h>
> >>  #include <xen/xenbus.h>
> >> @@ -121,6 +122,9 @@ static inline struct blkif_req *blkif_req(struct request *rq)
> >>  
> >>  static DEFINE_MUTEX(blkfront_mutex);
> >>  static const struct block_device_operations xlvbd_block_fops;
> >> +static struct delayed_work blkfront_work;
> >> +static LIST_HEAD(info_list);
> >> +static bool blkfront_work_active;
> >>  
> >>  /*
> >>   * Maximum number of segments in indirect requests, the actual value used by
> >> @@ -216,6 +220,7 @@ struct blkfront_info
> >>  	/* Save uncomplete reqs and bios for migration. */
> >>  	struct list_head requests;
> >>  	struct bio_list bio_list;
> >> +	struct list_head info_list;
> >>  };
> >>  
> >>  static unsigned int nr_minors;
> >> @@ -1764,6 +1769,12 @@ static int write_per_ring_nodes(struct xenbus_transaction xbt,
> >>  	return err;
> >>  }
> >>  
> >> +static void free_info(struct blkfront_info *info)
> >> +{
> >> +	list_del(&info->info_list);
> >> +	kfree(info);
> >> +}
> >> +
> >>  /* Common code used when first setting up, and when resuming. */
> >>  static int talk_to_blkback(struct xenbus_device *dev,
> >>  			   struct blkfront_info *info)
> >> @@ -1885,7 +1896,10 @@ static int talk_to_blkback(struct xenbus_device *dev,
> >>   destroy_blkring:
> >>  	blkif_free(info, 0);
> >>  
> >> -	kfree(info);
> >> +	mutex_lock(&blkfront_mutex);
> >> +	free_info(info);
> >> +	mutex_unlock(&blkfront_mutex);
> >> +
> >>  	dev_set_drvdata(&dev->dev, NULL);
> >>  
> >>  	return err;
> >> @@ -1996,6 +2010,10 @@ static int blkfront_probe(struct xenbus_device *dev,
> >>  	info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
> >>  	dev_set_drvdata(&dev->dev, info);
> >>  
> >> +	mutex_lock(&blkfront_mutex);
> >> +	list_add(&info->info_list, &info_list);
> >> +	mutex_unlock(&blkfront_mutex);
> >> +
> >>  	return 0;
> >>  }
> >>  
> >> @@ -2306,6 +2324,15 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
> >>  	if (indirect_segments <= BLKIF_MAX_SEGMENTS_PER_REQUEST)
> >>  		indirect_segments = 0;
> >>  	info->max_indirect_segments = indirect_segments;
> >> +
> >> +	if (info->feature_persistent) {
> >> +		mutex_lock(&blkfront_mutex);
> >> +		if (!blkfront_work_active) {
> >> +			blkfront_work_active = true;
> >> +			schedule_delayed_work(&blkfront_work, HZ * 10);
> > 
> > Does it make sense to provide a module parameter to tune the schedule
> > of the cleanup routine?
> 
> I don't think this is something anyone would like to tune.
> 
> In case you think it should be tunable I can add a parameter, of course.

We can always add it later if required. I'm fine as-is now.

> > 
> >> +		}
> >> +		mutex_unlock(&blkfront_mutex);
> > 
> > Is it really necessary to have the blkfront_work_active boolean? What
> > happens if you queue the same delayed work more than once?
> 
> In case there is already work queued, later calls of
> schedule_delayed_work() will be ignored.
> 
> So yes, I can drop the global boolean (I still need a local flag in
> blkfront_delay_work() for controlling the need to call
> schedule_delayed_work() again).

Can't you just call schedule_delayed_work if info->feature_persistent
is set, even if that means calling it multiple times if multiple
blkfront instances are using persistent grants?

Thanks, Roger.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] xen/blkfront: cleanup stale persistent grants
  2018-08-07 14:14       ` Roger Pau Monné
@ 2018-08-07 15:56         ` Juergen Gross
  2018-08-08  8:27           ` Roger Pau Monné
  0 siblings, 1 reply; 16+ messages in thread
From: Juergen Gross @ 2018-08-07 15:56 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: linux-kernel, xen-devel, linux-block, konrad.wilk, axboe,
	boris.ostrovsky

On 07/08/18 16:14, Roger Pau Monné wrote:
> On Tue, Aug 07, 2018 at 08:31:31AM +0200, Juergen Gross wrote:
>> On 06/08/18 18:16, Roger Pau Monné wrote:
>>> On Mon, Aug 06, 2018 at 01:34:01PM +0200, Juergen Gross wrote:
>>>> Add a periodic cleanup function to remove old persistent grants which
>>>> are no longer in use on the backend side. This avoids starvation in
>>>> case there are lots of persistent grants for a device which is no
>>>> longer doing any I/O.
>>>>
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>> ---
>>>>  drivers/block/xen-blkfront.c | 99 ++++++++++++++++++++++++++++++++++++++++++--
>>>>  1 file changed, 95 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>>> index b5cedccb5d7d..19feb8835fc4 100644
>>>> --- a/drivers/block/xen-blkfront.c
>>>> +++ b/drivers/block/xen-blkfront.c
>>>> @@ -46,6 +46,7 @@
>>>>  #include <linux/scatterlist.h>
>>>>  #include <linux/bitmap.h>
>>>>  #include <linux/list.h>
>>>> +#include <linux/workqueue.h>
>>>>  
>>>>  #include <xen/xen.h>
>>>>  #include <xen/xenbus.h>
>>>> @@ -121,6 +122,9 @@ static inline struct blkif_req *blkif_req(struct request *rq)
>>>>  
>>>>  static DEFINE_MUTEX(blkfront_mutex);
>>>>  static const struct block_device_operations xlvbd_block_fops;
>>>> +static struct delayed_work blkfront_work;
>>>> +static LIST_HEAD(info_list);
>>>> +static bool blkfront_work_active;
>>>>  
>>>>  /*
>>>>   * Maximum number of segments in indirect requests, the actual value used by
>>>> @@ -216,6 +220,7 @@ struct blkfront_info
>>>>  	/* Save uncomplete reqs and bios for migration. */
>>>>  	struct list_head requests;
>>>>  	struct bio_list bio_list;
>>>> +	struct list_head info_list;
>>>>  };
>>>>  
>>>>  static unsigned int nr_minors;
>>>> @@ -1764,6 +1769,12 @@ static int write_per_ring_nodes(struct xenbus_transaction xbt,
>>>>  	return err;
>>>>  }
>>>>  
>>>> +static void free_info(struct blkfront_info *info)
>>>> +{
>>>> +	list_del(&info->info_list);
>>>> +	kfree(info);
>>>> +}
>>>> +
>>>>  /* Common code used when first setting up, and when resuming. */
>>>>  static int talk_to_blkback(struct xenbus_device *dev,
>>>>  			   struct blkfront_info *info)
>>>> @@ -1885,7 +1896,10 @@ static int talk_to_blkback(struct xenbus_device *dev,
>>>>   destroy_blkring:
>>>>  	blkif_free(info, 0);
>>>>  
>>>> -	kfree(info);
>>>> +	mutex_lock(&blkfront_mutex);
>>>> +	free_info(info);
>>>> +	mutex_unlock(&blkfront_mutex);
>>>> +
>>>>  	dev_set_drvdata(&dev->dev, NULL);
>>>>  
>>>>  	return err;
>>>> @@ -1996,6 +2010,10 @@ static int blkfront_probe(struct xenbus_device *dev,
>>>>  	info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
>>>>  	dev_set_drvdata(&dev->dev, info);
>>>>  
>>>> +	mutex_lock(&blkfront_mutex);
>>>> +	list_add(&info->info_list, &info_list);
>>>> +	mutex_unlock(&blkfront_mutex);
>>>> +
>>>>  	return 0;
>>>>  }
>>>>  
>>>> @@ -2306,6 +2324,15 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
>>>>  	if (indirect_segments <= BLKIF_MAX_SEGMENTS_PER_REQUEST)
>>>>  		indirect_segments = 0;
>>>>  	info->max_indirect_segments = indirect_segments;
>>>> +
>>>> +	if (info->feature_persistent) {
>>>> +		mutex_lock(&blkfront_mutex);
>>>> +		if (!blkfront_work_active) {
>>>> +			blkfront_work_active = true;
>>>> +			schedule_delayed_work(&blkfront_work, HZ * 10);
>>>
>>> Does it make sense to provide a module parameter to tune the schedule
>>> of the cleanup routine?
>>
>> I don't think this is something anyone would like to tune.
>>
>> In case you think it should be tunable I can add a parameter, of course.
> 
> We can always add it later if required. I'm fine as-is now.
> 
>>>
>>>> +		}
>>>> +		mutex_unlock(&blkfront_mutex);
>>>
>>> Is it really necessary to have the blkfront_work_active boolean? What
>>> happens if you queue the same delayed work more than once?
>>
>> In case there is already work queued, later calls of
>> schedule_delayed_work() will be ignored.
>>
>> So yes, I can drop the global boolean (I still need a local flag in
>> blkfront_delay_work() for controlling the need to call
>> schedule_delayed_work() again).
> 
> Can't you just call schedule_delayed_work if info->feature_persistent
> is set, even if that means calling it multiple times if multiple
> blkfront instances are using persistent grants?

I don't like that. With mq there is a high chance of multiple instances
using persistent grants, and a local bool is much cheaper than unneeded
calls of schedule_delayed_work().
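
[Editorial note: a possible shape for the rearm logic under discussion.
This is hypothetical — blkfront_delay_work() itself is not quoted in
this thread, so the structure below is an assumption, modelled as a
userspace C sketch with an invented, simplified mock_info type.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Invented, simplified stand-in for struct blkfront_info. */
struct mock_info {
	bool feature_persistent;
	struct mock_info *next;
};

/*
 * Model of the local-flag pattern: walk all instances once, perform the
 * per-instance cleanup, and report whether the single delayed work
 * should be re-armed — i.e. whether at least one instance still uses
 * persistent grants.
 */
static bool mock_delay_work(struct mock_info *head)
{
	bool need_schedule = false;
	struct mock_info *i;

	for (i = head; i; i = i->next) {
		if (i->feature_persistent) {
			/* ... purge stale grants for this instance ... */
			need_schedule = true;
		}
	}
	return need_schedule;	/* caller re-arms the work iff true */
}
```

With this shape, schedule_delayed_work() is called at most once per
scan, regardless of how many instances use persistent grants.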


Juergen

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] xen/blkfront: cleanup stale persistent grants
  2018-08-07 15:56         ` Juergen Gross
@ 2018-08-08  8:27           ` Roger Pau Monné
  0 siblings, 0 replies; 16+ messages in thread
From: Roger Pau Monné @ 2018-08-08  8:27 UTC (permalink / raw)
  To: Juergen Gross
  Cc: linux-kernel, xen-devel, linux-block, konrad.wilk, axboe,
	boris.ostrovsky

On Tue, Aug 07, 2018 at 05:56:38PM +0200, Juergen Gross wrote:
> On 07/08/18 16:14, Roger Pau Monné wrote:
> > On Tue, Aug 07, 2018 at 08:31:31AM +0200, Juergen Gross wrote:
> >> On 06/08/18 18:16, Roger Pau Monné wrote:
> >>> On Mon, Aug 06, 2018 at 01:34:01PM +0200, Juergen Gross wrote:
> >>>> Add a periodic cleanup function to remove old persistent grants which
> >>>> are no longer in use on the backend side. This avoids starvation in
> >>>> case there are lots of persistent grants for a device which no longer
> >>>> is involved in I/O business.
> >>>>
> >>>> Signed-off-by: Juergen Gross <jgross@suse.com>
> >>>> ---
> >>>>  drivers/block/xen-blkfront.c | 99 ++++++++++++++++++++++++++++++++++++++++++--
> >>>>  1 file changed, 95 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> >>>> index b5cedccb5d7d..19feb8835fc4 100644
> >>>> --- a/drivers/block/xen-blkfront.c
> >>>> +++ b/drivers/block/xen-blkfront.c
> >>>> @@ -46,6 +46,7 @@
> >>>>  #include <linux/scatterlist.h>
> >>>>  #include <linux/bitmap.h>
> >>>>  #include <linux/list.h>
> >>>> +#include <linux/workqueue.h>
> >>>>  
> >>>>  #include <xen/xen.h>
> >>>>  #include <xen/xenbus.h>
> >>>> @@ -121,6 +122,9 @@ static inline struct blkif_req *blkif_req(struct request *rq)
> >>>>  
> >>>>  static DEFINE_MUTEX(blkfront_mutex);
> >>>>  static const struct block_device_operations xlvbd_block_fops;
> >>>> +static struct delayed_work blkfront_work;
> >>>> +static LIST_HEAD(info_list);
> >>>> +static bool blkfront_work_active;
> >>>>  
> >>>>  /*
> >>>>   * Maximum number of segments in indirect requests, the actual value used by
> >>>> @@ -216,6 +220,7 @@ struct blkfront_info
> >>>>  	/* Save uncomplete reqs and bios for migration. */
> >>>>  	struct list_head requests;
> >>>>  	struct bio_list bio_list;
> >>>> +	struct list_head info_list;
> >>>>  };
> >>>>  
> >>>>  static unsigned int nr_minors;
> >>>> @@ -1764,6 +1769,12 @@ static int write_per_ring_nodes(struct xenbus_transaction xbt,
> >>>>  	return err;
> >>>>  }
> >>>>  
> >>>> +static void free_info(struct blkfront_info *info)
> >>>> +{
> >>>> +	list_del(&info->info_list);
> >>>> +	kfree(info);
> >>>> +}
> >>>> +
> >>>>  /* Common code used when first setting up, and when resuming. */
> >>>>  static int talk_to_blkback(struct xenbus_device *dev,
> >>>>  			   struct blkfront_info *info)
> >>>> @@ -1885,7 +1896,10 @@ static int talk_to_blkback(struct xenbus_device *dev,
> >>>>   destroy_blkring:
> >>>>  	blkif_free(info, 0);
> >>>>  
> >>>> -	kfree(info);
> >>>> +	mutex_lock(&blkfront_mutex);
> >>>> +	free_info(info);
> >>>> +	mutex_unlock(&blkfront_mutex);
> >>>> +
> >>>>  	dev_set_drvdata(&dev->dev, NULL);
> >>>>  
> >>>>  	return err;
> >>>> @@ -1996,6 +2010,10 @@ static int blkfront_probe(struct xenbus_device *dev,
> >>>>  	info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
> >>>>  	dev_set_drvdata(&dev->dev, info);
> >>>>  
> >>>> +	mutex_lock(&blkfront_mutex);
> >>>> +	list_add(&info->info_list, &info_list);
> >>>> +	mutex_unlock(&blkfront_mutex);
> >>>> +
> >>>>  	return 0;
> >>>>  }
> >>>>  
> >>>> @@ -2306,6 +2324,15 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
> >>>>  	if (indirect_segments <= BLKIF_MAX_SEGMENTS_PER_REQUEST)
> >>>>  		indirect_segments = 0;
> >>>>  	info->max_indirect_segments = indirect_segments;
> >>>> +
> >>>> +	if (info->feature_persistent) {
> >>>> +		mutex_lock(&blkfront_mutex);
> >>>> +		if (!blkfront_work_active) {
> >>>> +			blkfront_work_active = true;
> >>>> +			schedule_delayed_work(&blkfront_work, HZ * 10);
> >>>
> >>> Does it make sense to provide a module parameter to tune the schedule
> >>> of the cleanup routine?
> >>
> >> I don't think this is something anyone would like to tune.
> >>
> >> In case you think it should be tunable I can add a parameter, of course.
> > 
> > We can always add it later if required. I'm fine as-is now.
> > 
> >>>
> >>>> +		}
> >>>> +		mutex_unlock(&blkfront_mutex);
> >>>
> >>> Is it really necessary to have the blkfront_work_active boolean? What
> >>> happens if you queue the same delayed work more than once?
> >>
> >> In case there is already work queued, later calls of
> >> schedule_delayed_work() will be ignored.
> >>
> >> So yes, I can drop the global boolean (I still need a local flag in
> >> blkfront_delay_work() for controlling the need to call
> >> schedule_delayed_work() again).
> > 
> > Can't you just call schedule_delayed_work if info->feature_persistent
> > is set, even if that means calling it multiple times if multiple
> > blkfront instances are using persistent grants?
> 
> I don't like that. With mq there is a high chance of multiple instances
> using persistent grants, and a local bool is much cheaper than unneeded
> calls of schedule_delayed_work().

OK, I'm convinced with the local bool.

Thanks, Roger.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2018-08-08  8:27 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-06 11:33 [PATCH 0/4] xen/blk: persistent grant rework Juergen Gross
2018-08-06 11:33 ` [PATCH 1/4] xen/blkback: don't keep persistent grants too long Juergen Gross
2018-08-06 15:58   ` Roger Pau Monné
2018-08-07  6:34     ` Juergen Gross
2018-08-06 11:34 ` [PATCH] xen/blkfront: remove unused macros Juergen Gross
2018-08-06 11:36   ` Juergen Gross
2018-08-06 11:34 ` [PATCH 2/4] xen/blkfront: cleanup stale persistent grants Juergen Gross
2018-08-06 16:16   ` Roger Pau Monné
2018-08-07  6:31     ` Juergen Gross
2018-08-07 14:14       ` Roger Pau Monné
2018-08-07 15:56         ` Juergen Gross
2018-08-08  8:27           ` Roger Pau Monné
2018-08-06 11:34 ` [PATCH 3/4] xen/blkfront: reorder tests in xlblk_init() Juergen Gross
2018-08-06 16:18   ` Roger Pau Monné
2018-08-06 11:34 ` [PATCH 4/4] xen/blkback: remove unused pers_gnts_lock from struct xen_blkif_ring Juergen Gross
2018-08-06 16:20   ` Roger Pau Monné
