linux-kernel.vger.kernel.org archive mirror
* [RFC v3 0/2] virtio-pmem: Asynchronous flush
@ 2022-01-11 16:19 Pankaj Gupta
  2022-01-11 16:19 ` [RFC v3 1/2] virtio-pmem: Async virtio-pmem flush Pankaj Gupta
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Pankaj Gupta @ 2022-01-11 16:19 UTC (permalink / raw)
  To: nvdimm, virtualization, linux-kernel
  Cc: dan.j.williams, jmoyer, stefanha, david, mst, cohuck,
	vishal.l.verma, dave.jiang, ira.weiny, pankaj.gupta,
	Pankaj Gupta

 Jeff reported a preflush ordering issue with the existing implementation
 of the virtio pmem preflush. Dan suggested[1] implementing an asynchronous
 flush for virtio pmem using a work queue, as done in md/RAID. This patch
 series intends to solve the preflush ordering issue and to make the flush
 asynchronous for the submitting thread. It also adds flush coalescing logic.

 Submitting this RFC v3 series for review. Thank you!

 RFC v2 -> RFC v3
 - Improve commit log message - patch 1.
 - Improve return error handling for async flush.
 - Declare 'INIT_WORK' only once.
 - More testing and bug fixes.

 [1] https://marc.info/?l=linux-kernel&m=157446316409937&w=2

Pankaj Gupta (2):
  virtio-pmem: Async virtio-pmem flush
  pmem: enable pmem_submit_bio for asynchronous flush

 drivers/nvdimm/nd_virtio.c   | 74 +++++++++++++++++++++++++++---------
 drivers/nvdimm/pmem.c        | 15 ++++++--
 drivers/nvdimm/region_devs.c |  4 +-
 drivers/nvdimm/virtio_pmem.c | 10 +++++
 drivers/nvdimm/virtio_pmem.h | 16 ++++++++
 5 files changed, 98 insertions(+), 21 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC v3 1/2] virtio-pmem: Async virtio-pmem flush
  2022-01-11 16:19 [RFC v3 0/2] virtio-pmem: Asynchronous flush Pankaj Gupta
@ 2022-01-11 16:19 ` Pankaj Gupta
  2022-02-16  3:21   ` Dan Williams
  2022-01-11 16:19 ` [RFC v3 2/2] pmem: enable pmem_submit_bio for asynchronous flush Pankaj Gupta
  2022-01-28 10:11 ` [RFC v3 0/2] virtio-pmem: Asynchronous flush Pankaj Gupta
  2 siblings, 1 reply; 12+ messages in thread
From: Pankaj Gupta @ 2022-01-11 16:19 UTC (permalink / raw)
  To: nvdimm, virtualization, linux-kernel
  Cc: dan.j.williams, jmoyer, stefanha, david, mst, cohuck,
	vishal.l.verma, dave.jiang, ira.weiny, pankaj.gupta,
	Pankaj Gupta

Enable asynchronous flush for virtio pmem using work queue. Also,
coalesce the flush requests when a flush is already in process.
This functionality is copied from md/RAID code.

When a flush is already in process, new flush requests wait till
previous flush completes in another context (work queue). For all
the requests come between ongoing flush and new flush start time, only
single flush executes, thus adhers to flush coalscing logic. This is
important for maintaining the flush request order with request coalscing.
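The coalescing scheme described above can be modeled in userspace (a pthreads sketch under stated assumptions: all names such as `flush_state` and `do_flush` are hypothetical stand-ins, the logical `clock` replaces `ktime_get_boottime()`, and the mutex/condvar pair replaces `wait_event_lock_irq` plus the kernel workqueue):

```c
#include <pthread.h>

/* Minimal model of flush coalescing: requests arriving while a flush
 * is running wait; a flush that *started* after a request arrived also
 * covers that request, so at most one extra flush runs for all of them. */
struct flush_state {
	pthread_mutex_t lock;       /* guards everything below */
	pthread_cond_t  done;       /* signaled when a flush completes */
	int  in_flight;             /* a flush is currently executing */
	long prev_flush_start;      /* start time of last completed flush */
	long clock;                 /* logical clock, stands in for ktime */
	int  flushes_executed;      /* for observation only */
};

static void do_flush(struct flush_state *s)
{
	/* stand-in for virtio_pmem_flush(): the actual device flush */
	s->flushes_executed++;
}

/* Returns when this request's data is known durable. */
void flush_request(struct flush_state *s)
{
	pthread_mutex_lock(&s->lock);
	long req_start = ++s->clock;

	/* Wait while a flush that started before us is in progress. */
	while (s->in_flight && s->prev_flush_start < req_start)
		pthread_cond_wait(&s->done, &s->lock);

	if (s->prev_flush_start > req_start) {
		/* A flush started after we arrived: coalesced, done. */
		pthread_mutex_unlock(&s->lock);
		return;
	}

	/* We own the next flush; drop the lock while flushing. */
	s->in_flight = 1;
	long start = ++s->clock;
	pthread_mutex_unlock(&s->lock);

	do_flush(s);

	pthread_mutex_lock(&s->lock);
	s->prev_flush_start = start;
	s->in_flight = 0;
	pthread_cond_broadcast(&s->done);
	pthread_mutex_unlock(&s->lock);
}
```

With N concurrent callers this executes between 1 and N flushes, preserving the ordering guarantee that every caller returns only after a flush that started no earlier than its own submission.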

Signed-off-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
---
 drivers/nvdimm/nd_virtio.c   | 74 +++++++++++++++++++++++++++---------
 drivers/nvdimm/virtio_pmem.c | 10 +++++
 drivers/nvdimm/virtio_pmem.h | 16 ++++++++
 3 files changed, 83 insertions(+), 17 deletions(-)

diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
index 10351d5b49fa..179ea7a73338 100644
--- a/drivers/nvdimm/nd_virtio.c
+++ b/drivers/nvdimm/nd_virtio.c
@@ -100,26 +100,66 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
 /* The asynchronous flush callback function */
 int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
 {
-	/*
-	 * Create child bio for asynchronous flush and chain with
-	 * parent bio. Otherwise directly call nd_region flush.
+	/* queue asynchronous flush and coalesce the flush requests */
+	struct virtio_device *vdev = nd_region->provider_data;
+	struct virtio_pmem *vpmem  = vdev->priv;
+	ktime_t req_start = ktime_get_boottime();
+	int ret = -EINPROGRESS;
+
+	spin_lock_irq(&vpmem->lock);
+	/* flush requests wait until ongoing flush completes,
+	 * hence coalescing all the pending requests.
 	 */
-	if (bio && bio->bi_iter.bi_sector != -1) {
-		struct bio *child = bio_alloc(GFP_ATOMIC, 0);
-
-		if (!child)
-			return -ENOMEM;
-		bio_copy_dev(child, bio);
-		child->bi_opf = REQ_PREFLUSH;
-		child->bi_iter.bi_sector = -1;
-		bio_chain(child, bio);
-		submit_bio(child);
-		return 0;
+	wait_event_lock_irq(vpmem->sb_wait,
+			    !vpmem->flush_bio ||
+			    ktime_before(req_start, vpmem->prev_flush_start),
+			    vpmem->lock);
+	/* new request after previous flush is completed */
+	if (ktime_after(req_start, vpmem->prev_flush_start)) {
+		WARN_ON(vpmem->flush_bio);
+		vpmem->flush_bio = bio;
+		bio = NULL;
+	}
+	spin_unlock_irq(&vpmem->lock);
+
+	if (!bio)
+		queue_work(vpmem->pmem_wq, &vpmem->flush_work);
+	else {
+	/* flush completed in other context while we waited */
+		if (bio && (bio->bi_opf & REQ_PREFLUSH))
+			bio->bi_opf &= ~REQ_PREFLUSH;
+		else if (bio && (bio->bi_opf & REQ_FUA))
+			bio->bi_opf &= ~REQ_FUA;
+
+		ret = vpmem->prev_flush_err;
 	}
-	if (virtio_pmem_flush(nd_region))
-		return -EIO;
 
-	return 0;
+	return ret;
 };
 EXPORT_SYMBOL_GPL(async_pmem_flush);
+
+void submit_async_flush(struct work_struct *ws)
+{
+	struct virtio_pmem *vpmem = container_of(ws, struct virtio_pmem, flush_work);
+	struct bio *bio = vpmem->flush_bio;
+
+	vpmem->start_flush = ktime_get_boottime();
+	vpmem->prev_flush_err = virtio_pmem_flush(vpmem->nd_region);
+	vpmem->prev_flush_start = vpmem->start_flush;
+	vpmem->flush_bio = NULL;
+	wake_up(&vpmem->sb_wait);
+
+	if (vpmem->prev_flush_err)
+		bio->bi_status = errno_to_blk_status(-EIO);
+
+	/* Submit parent bio only for PREFLUSH */
+	if (bio && (bio->bi_opf & REQ_PREFLUSH)) {
+		bio->bi_opf &= ~REQ_PREFLUSH;
+		submit_bio(bio);
+	} else if (bio && (bio->bi_opf & REQ_FUA)) {
+		bio->bi_opf &= ~REQ_FUA;
+		bio_endio(bio);
+	}
+}
+EXPORT_SYMBOL_GPL(submit_async_flush);
 MODULE_LICENSE("GPL");
diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
index 726c7354d465..75ed9b7ddea1 100644
--- a/drivers/nvdimm/virtio_pmem.c
+++ b/drivers/nvdimm/virtio_pmem.c
@@ -24,6 +24,7 @@ static int init_vq(struct virtio_pmem *vpmem)
 		return PTR_ERR(vpmem->req_vq);
 
 	spin_lock_init(&vpmem->pmem_lock);
+	spin_lock_init(&vpmem->lock);
 	INIT_LIST_HEAD(&vpmem->req_list);
 
 	return 0;
@@ -57,7 +58,14 @@ static int virtio_pmem_probe(struct virtio_device *vdev)
 		dev_err(&vdev->dev, "failed to initialize virtio pmem vq's\n");
 		goto out_err;
 	}
+	vpmem->pmem_wq = alloc_workqueue("vpmem_wq", WQ_MEM_RECLAIM, 0);
+	if (!vpmem->pmem_wq) {
+		err = -ENOMEM;
+		goto out_err;
+	}
 
+	INIT_WORK(&vpmem->flush_work, submit_async_flush);
+	init_waitqueue_head(&vpmem->sb_wait);
 	virtio_cread_le(vpmem->vdev, struct virtio_pmem_config,
 			start, &vpmem->start);
 	virtio_cread_le(vpmem->vdev, struct virtio_pmem_config,
@@ -90,10 +98,12 @@ static int virtio_pmem_probe(struct virtio_device *vdev)
 		goto out_nd;
 	}
 	nd_region->provider_data = dev_to_virtio(nd_region->dev.parent->parent);
+	vpmem->nd_region = nd_region;
 	return 0;
 out_nd:
 	nvdimm_bus_unregister(vpmem->nvdimm_bus);
 out_vq:
+	destroy_workqueue(vpmem->pmem_wq);
 	vdev->config->del_vqs(vdev);
 out_err:
 	return err;
diff --git a/drivers/nvdimm/virtio_pmem.h b/drivers/nvdimm/virtio_pmem.h
index 0dddefe594c4..495dc20e1cdb 100644
--- a/drivers/nvdimm/virtio_pmem.h
+++ b/drivers/nvdimm/virtio_pmem.h
@@ -35,9 +35,24 @@ struct virtio_pmem {
 	/* Virtio pmem request queue */
 	struct virtqueue *req_vq;
 
+	struct bio *flush_bio;
+	/* last_flush is when the last completed flush was started */
+	ktime_t prev_flush_start, start_flush;
+	int prev_flush_err;
+
+	/* work queue for deferred flush */
+	struct work_struct flush_work;
+	struct workqueue_struct *pmem_wq;
+
+	/* Synchronize flush wait queue data */
+	spinlock_t lock;
+	/* for waiting for previous flush to complete */
+	wait_queue_head_t sb_wait;
+
 	/* nvdimm bus registers virtio pmem device */
 	struct nvdimm_bus *nvdimm_bus;
 	struct nvdimm_bus_descriptor nd_desc;
+	struct nd_region *nd_region;
 
 	/* List to store deferred work if virtqueue is full */
 	struct list_head req_list;
@@ -52,4 +67,5 @@ struct virtio_pmem {
 
 void virtio_pmem_host_ack(struct virtqueue *vq);
 int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
+void submit_async_flush(struct work_struct *ws);
 #endif
-- 
2.25.1



* [RFC v3 2/2] pmem: enable pmem_submit_bio for asynchronous flush
  2022-01-11 16:19 [RFC v3 0/2] virtio-pmem: Asynchronous flush Pankaj Gupta
  2022-01-11 16:19 ` [RFC v3 1/2] virtio-pmem: Async virtio-pmem flush Pankaj Gupta
@ 2022-01-11 16:19 ` Pankaj Gupta
  2022-02-16  3:34   ` Dan Williams
  2022-01-28 10:11 ` [RFC v3 0/2] virtio-pmem: Asynchronous flush Pankaj Gupta
  2 siblings, 1 reply; 12+ messages in thread
From: Pankaj Gupta @ 2022-01-11 16:19 UTC (permalink / raw)
  To: nvdimm, virtualization, linux-kernel
  Cc: dan.j.williams, jmoyer, stefanha, david, mst, cohuck,
	vishal.l.verma, dave.jiang, ira.weiny, pankaj.gupta,
	Pankaj Gupta

Return from "pmem_submit_bio" when an asynchronous flush is
still in progress in another context.

Signed-off-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
---
 drivers/nvdimm/pmem.c        | 15 ++++++++++++---
 drivers/nvdimm/region_devs.c |  4 +++-
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index fe7ece1534e1..f20e30277a68 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -201,8 +201,12 @@ static void pmem_submit_bio(struct bio *bio)
 	struct pmem_device *pmem = bio->bi_bdev->bd_disk->private_data;
 	struct nd_region *nd_region = to_region(pmem);
 
-	if (bio->bi_opf & REQ_PREFLUSH)
+	if (bio->bi_opf & REQ_PREFLUSH) {
 		ret = nvdimm_flush(nd_region, bio);
+		/* asynchronous flush completes in other context */
+		if (ret == -EINPROGRESS)
+			return;
+	}
 
 	do_acct = blk_queue_io_stat(bio->bi_bdev->bd_disk->queue);
 	if (do_acct)
@@ -222,13 +226,18 @@ static void pmem_submit_bio(struct bio *bio)
 	if (do_acct)
 		bio_end_io_acct(bio, start);
 
-	if (bio->bi_opf & REQ_FUA)
+	if (bio->bi_opf & REQ_FUA) {
 		ret = nvdimm_flush(nd_region, bio);
+		/* asynchronous flush completes in other context */
+		if (ret == -EINPROGRESS)
+			return;
+	}
 
 	if (ret)
 		bio->bi_status = errno_to_blk_status(ret);
 
-	bio_endio(bio);
+	if (bio)
+		bio_endio(bio);
 }
 
 static int pmem_rw_page(struct block_device *bdev, sector_t sector,
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 9ccf3d608799..8512d2eaed4e 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1190,7 +1190,9 @@ int nvdimm_flush(struct nd_region *nd_region, struct bio *bio)
 	if (!nd_region->flush)
 		rc = generic_nvdimm_flush(nd_region);
 	else {
-		if (nd_region->flush(nd_region, bio))
+		rc = nd_region->flush(nd_region, bio);
+		/* ongoing flush in other context */
+		if (rc && rc != -EINPROGRESS)
 			rc = -EIO;
 	}
 
-- 
2.25.1



* Re: [RFC v3 0/2] virtio-pmem: Asynchronous flush
  2022-01-11 16:19 [RFC v3 0/2] virtio-pmem: Asynchronous flush Pankaj Gupta
  2022-01-11 16:19 ` [RFC v3 1/2] virtio-pmem: Async virtio-pmem flush Pankaj Gupta
  2022-01-11 16:19 ` [RFC v3 2/2] pmem: enable pmem_submit_bio for asynchronous flush Pankaj Gupta
@ 2022-01-28 10:11 ` Pankaj Gupta
  2 siblings, 0 replies; 12+ messages in thread
From: Pankaj Gupta @ 2022-01-28 10:11 UTC (permalink / raw)
  To: Linux NVDIMM, virtualization, LKML
  Cc: Dan Williams, jmoyer, Stefan Hajnoczi, David Hildenbrand,
	Michael S . Tsirkin, Cornelia Huck, Vishal Verma, Dave Jiang,
	Ira Weiny, Pankaj Gupta

ping

>  Jeff reported preflush order issue with the existing implementation
>  of virtio pmem preflush. Dan suggested[1] to implement asynchronous flush
>  for virtio pmem using work queue as done in md/RAID. This patch series
>  intends to solve the preflush ordering issue and makes the flush asynchronous
>  for the submitting thread. Also, adds the flush coalscing logic.
>
>  Submitting this RFC v3 series for review. Thank You!
>
>  RFC v2 -> RFC v3
>  - Improve commit log message - patch1.
>  - Improve return error handling for Async flush.
>  - declare'INIT_WORK' only once.
>  - More testing and bug fix.
>
>  [1] https://marc.info/?l=linux-kernel&m=157446316409937&w=2
>
> Pankaj Gupta (2):
>   virtio-pmem: Async virtio-pmem flush
>   pmem: enable pmem_submit_bio for asynchronous flush
>
>  drivers/nvdimm/nd_virtio.c   | 74 +++++++++++++++++++++++++++---------
>  drivers/nvdimm/pmem.c        | 15 ++++++--
>  drivers/nvdimm/region_devs.c |  4 +-
>  drivers/nvdimm/virtio_pmem.c | 10 +++++
>  drivers/nvdimm/virtio_pmem.h | 16 ++++++++
>  5 files changed, 98 insertions(+), 21 deletions(-)
>
> --
> 2.25.1
>


* Re: [RFC v3 1/2] virtio-pmem: Async virtio-pmem flush
  2022-01-11 16:19 ` [RFC v3 1/2] virtio-pmem: Async virtio-pmem flush Pankaj Gupta
@ 2022-02-16  3:21   ` Dan Williams
  2022-02-16  8:47     ` Pankaj Gupta
  0 siblings, 1 reply; 12+ messages in thread
From: Dan Williams @ 2022-02-16  3:21 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: Linux NVDIMM, virtualization, Linux Kernel Mailing List, jmoyer,
	Stefan Hajnoczi, David Hildenbrand, Michael S. Tsirkin,
	Cornelia Huck, Vishal L Verma, Dave Jiang, Weiny, Ira,
	Pankaj Gupta

On Tue, Jan 11, 2022 at 8:23 AM Pankaj Gupta
<pankaj.gupta.linux@gmail.com> wrote:
>
> Enable asynchronous flush for virtio pmem using work queue. Also,
> coalesce the flush requests when a flush is already in process.
> This functionality is copied from md/RAID code.
>
> When a flush is already in process, new flush requests wait till
> previous flush completes in another context (work queue). For all
> the requests come between ongoing flush and new flush start time, only
> single flush executes, thus adhers to flush coalscing logic. This is

s/adhers/adheres/

s/coalscing/coalescing/

> important for maintaining the flush request order with request coalscing.

s/coalscing/coalescing/

>
> Signed-off-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> ---
>  drivers/nvdimm/nd_virtio.c   | 74 +++++++++++++++++++++++++++---------
>  drivers/nvdimm/virtio_pmem.c | 10 +++++
>  drivers/nvdimm/virtio_pmem.h | 16 ++++++++
>  3 files changed, 83 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> index 10351d5b49fa..179ea7a73338 100644
> --- a/drivers/nvdimm/nd_virtio.c
> +++ b/drivers/nvdimm/nd_virtio.c
> @@ -100,26 +100,66 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
>  /* The asynchronous flush callback function */
>  int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
>  {
> -       /*
> -        * Create child bio for asynchronous flush and chain with
> -        * parent bio. Otherwise directly call nd_region flush.
> +       /* queue asynchronous flush and coalesce the flush requests */
> +       struct virtio_device *vdev = nd_region->provider_data;
> +       struct virtio_pmem *vpmem  = vdev->priv;
> +       ktime_t req_start = ktime_get_boottime();
> +       int ret = -EINPROGRESS;
> +
> +       spin_lock_irq(&vpmem->lock);

Why a new lock and not continue to use ->pmem_lock?

Have you tested this with CONFIG_PROVE_LOCKING?

Along those lines do you have a selftest that can be added to the
kernel as well so that 0day or other bots could offer early warnings
on regressions?

> +       /* flush requests wait until ongoing flush completes,
> +        * hence coalescing all the pending requests.
>          */
> -       if (bio && bio->bi_iter.bi_sector != -1) {
> -               struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> -
> -               if (!child)
> -                       return -ENOMEM;
> -               bio_copy_dev(child, bio);
> -               child->bi_opf = REQ_PREFLUSH;
> -               child->bi_iter.bi_sector = -1;
> -               bio_chain(child, bio);
> -               submit_bio(child);
> -               return 0;
> +       wait_event_lock_irq(vpmem->sb_wait,
> +                           !vpmem->flush_bio ||
> +                           ktime_before(req_start, vpmem->prev_flush_start),
> +                           vpmem->lock);
> +       /* new request after previous flush is completed */
> +       if (ktime_after(req_start, vpmem->prev_flush_start)) {
> +               WARN_ON(vpmem->flush_bio);
> +               vpmem->flush_bio = bio;
> +               bio = NULL;
> +       }
> +       spin_unlock_irq(&vpmem->lock);
> +
> +       if (!bio)
> +               queue_work(vpmem->pmem_wq, &vpmem->flush_work);
> +       else {
> +       /* flush completed in other context while we waited */
> +               if (bio && (bio->bi_opf & REQ_PREFLUSH))
> +                       bio->bi_opf &= ~REQ_PREFLUSH;
> +               else if (bio && (bio->bi_opf & REQ_FUA))
> +                       bio->bi_opf &= ~REQ_FUA;
> +
> +               ret = vpmem->prev_flush_err;
>         }
> -       if (virtio_pmem_flush(nd_region))
> -               return -EIO;
>
> -       return 0;
> +       return ret;
>  };
>  EXPORT_SYMBOL_GPL(async_pmem_flush);
> +
> +void submit_async_flush(struct work_struct *ws)

This name is too generic to be exported from drivers/nvdimm/nd_virtio.c

...it strikes me that there is little reason for nd_virtio and
virtio_pmem to be separate modules. They are both enabled by the same
Kconfig, so why not combine them into one module and drop the exports?

> +{
> +       struct virtio_pmem *vpmem = container_of(ws, struct virtio_pmem, flush_work);
> +       struct bio *bio = vpmem->flush_bio;
> +
> +       vpmem->start_flush = ktime_get_boottime();
> +       vpmem->prev_flush_err = virtio_pmem_flush(vpmem->nd_region);
> +       vpmem->prev_flush_start = vpmem->start_flush;
> +       vpmem->flush_bio = NULL;
> +       wake_up(&vpmem->sb_wait);
> +
> +       if (vpmem->prev_flush_err)
> +               bio->bi_status = errno_to_blk_status(-EIO);
> +
> +       /* Submit parent bio only for PREFLUSH */
> +       if (bio && (bio->bi_opf & REQ_PREFLUSH)) {
> +               bio->bi_opf &= ~REQ_PREFLUSH;
> +               submit_bio(bio);
> +       } else if (bio && (bio->bi_opf & REQ_FUA)) {
> +               bio->bi_opf &= ~REQ_FUA;
> +               bio_endio(bio);
> +       }
> +}
> +EXPORT_SYMBOL_GPL(submit_async_flush);
>  MODULE_LICENSE("GPL");
> diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> index 726c7354d465..75ed9b7ddea1 100644
> --- a/drivers/nvdimm/virtio_pmem.c
> +++ b/drivers/nvdimm/virtio_pmem.c
> @@ -24,6 +24,7 @@ static int init_vq(struct virtio_pmem *vpmem)
>                 return PTR_ERR(vpmem->req_vq);
>
>         spin_lock_init(&vpmem->pmem_lock);
> +       spin_lock_init(&vpmem->lock);
>         INIT_LIST_HEAD(&vpmem->req_list);
>
>         return 0;
> @@ -57,7 +58,14 @@ static int virtio_pmem_probe(struct virtio_device *vdev)
>                 dev_err(&vdev->dev, "failed to initialize virtio pmem vq's\n");
>                 goto out_err;
>         }
> +       vpmem->pmem_wq = alloc_workqueue("vpmem_wq", WQ_MEM_RECLAIM, 0);
> +       if (!vpmem->pmem_wq) {
> +               err = -ENOMEM;
> +               goto out_err;
> +       }
>
> +       INIT_WORK(&vpmem->flush_work, submit_async_flush);
> +       init_waitqueue_head(&vpmem->sb_wait);
>         virtio_cread_le(vpmem->vdev, struct virtio_pmem_config,
>                         start, &vpmem->start);
>         virtio_cread_le(vpmem->vdev, struct virtio_pmem_config,
> @@ -90,10 +98,12 @@ static int virtio_pmem_probe(struct virtio_device *vdev)
>                 goto out_nd;
>         }
>         nd_region->provider_data = dev_to_virtio(nd_region->dev.parent->parent);
> +       vpmem->nd_region = nd_region;
>         return 0;
>  out_nd:
>         nvdimm_bus_unregister(vpmem->nvdimm_bus);
>  out_vq:
> +       destroy_workqueue(vpmem->pmem_wq);
>         vdev->config->del_vqs(vdev);
>  out_err:
>         return err;
> diff --git a/drivers/nvdimm/virtio_pmem.h b/drivers/nvdimm/virtio_pmem.h
> index 0dddefe594c4..495dc20e1cdb 100644
> --- a/drivers/nvdimm/virtio_pmem.h
> +++ b/drivers/nvdimm/virtio_pmem.h
> @@ -35,9 +35,24 @@ struct virtio_pmem {
>         /* Virtio pmem request queue */
>         struct virtqueue *req_vq;
>
> +       struct bio *flush_bio;
> +       /* last_flush is when the last completed flush was started */
> +       ktime_t prev_flush_start, start_flush;
> +       int prev_flush_err;
> +
> +       /* work queue for deferred flush */
> +       struct work_struct flush_work;
> +       struct workqueue_struct *pmem_wq;
> +
> +       /* Synchronize flush wait queue data */
> +       spinlock_t lock;
> +       /* for waiting for previous flush to complete */
> +       wait_queue_head_t sb_wait;
> +
>         /* nvdimm bus registers virtio pmem device */
>         struct nvdimm_bus *nvdimm_bus;
>         struct nvdimm_bus_descriptor nd_desc;
> +       struct nd_region *nd_region;
>
>         /* List to store deferred work if virtqueue is full */
>         struct list_head req_list;
> @@ -52,4 +67,5 @@ struct virtio_pmem {
>
>  void virtio_pmem_host_ack(struct virtqueue *vq);
>  int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
> +void submit_async_flush(struct work_struct *ws);
>  #endif
> --
> 2.25.1
>
>


* Re: [RFC v3 2/2] pmem: enable pmem_submit_bio for asynchronous flush
  2022-01-11 16:19 ` [RFC v3 2/2] pmem: enable pmem_submit_bio for asynchronous flush Pankaj Gupta
@ 2022-02-16  3:34   ` Dan Williams
  2022-02-16  8:39     ` Pankaj Gupta
  0 siblings, 1 reply; 12+ messages in thread
From: Dan Williams @ 2022-02-16  3:34 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: Linux NVDIMM, virtualization, Linux Kernel Mailing List, jmoyer,
	Stefan Hajnoczi, David Hildenbrand, Michael S. Tsirkin,
	Cornelia Huck, Vishal L Verma, Dave Jiang, Weiny, Ira,
	Pankaj Gupta

On Tue, Jan 11, 2022 at 8:21 AM Pankaj Gupta
<pankaj.gupta.linux@gmail.com> wrote:
>
> Return from "pmem_submit_bio" when asynchronous flush is
> still in progress in other context.
>
> Signed-off-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> ---
>  drivers/nvdimm/pmem.c        | 15 ++++++++++++---
>  drivers/nvdimm/region_devs.c |  4 +++-
>  2 files changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index fe7ece1534e1..f20e30277a68 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -201,8 +201,12 @@ static void pmem_submit_bio(struct bio *bio)
>         struct pmem_device *pmem = bio->bi_bdev->bd_disk->private_data;
>         struct nd_region *nd_region = to_region(pmem);
>
> -       if (bio->bi_opf & REQ_PREFLUSH)
> +       if (bio->bi_opf & REQ_PREFLUSH) {
>                 ret = nvdimm_flush(nd_region, bio);
> +               /* asynchronous flush completes in other context */

I think a negative error code is a confusing way to capture the case
of "bio successfully coalesced to a previously pending flush request".
Perhaps reserve negative codes for failure, 0 for synchronously
completed, and > 0 for a coalesced flush request.
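The convention Dan suggests could look roughly like this (a userspace sketch, not the actual driver API; `example_flush` and `caller_must_end_bio` are illustrative names, and the caller pattern mirrors the `pmem_submit_bio()` hunks quoted below):

```c
#include <errno.h>

#define FLUSH_SYNC_DONE  0   /* flush completed synchronously */
#define FLUSH_COALESCED  1   /* handed off; completion happens elsewhere */

/* Illustrative: < 0 on failure, 0 when done here, > 0 when coalesced. */
int example_flush(int device_busy, int device_error)
{
	if (device_error)
		return -EIO;            /* negative: failure */
	if (device_busy)
		return FLUSH_COALESCED; /* > 0: another context completes it */
	return FLUSH_SYNC_DONE;         /* 0: completed in this context */
}

/* Caller pattern: only end the bio ourselves when the flush did not
 * transfer completion responsibility to another context. */
int caller_must_end_bio(int flush_ret)
{
	if (flush_ret > 0)
		return 0;   /* ownership transferred; leave the bio alone */
	return 1;           /* sync completion or failure: end it here */
}
```

This keeps the negative space free for real errors, so a caller can still propagate them via `errno_to_blk_status()` unchanged.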

> +               if (ret == -EINPROGRESS)
> +                       return;
> +       }
>
>         do_acct = blk_queue_io_stat(bio->bi_bdev->bd_disk->queue);
>         if (do_acct)
> @@ -222,13 +226,18 @@ static void pmem_submit_bio(struct bio *bio)
>         if (do_acct)
>                 bio_end_io_acct(bio, start);
>
> -       if (bio->bi_opf & REQ_FUA)
> +       if (bio->bi_opf & REQ_FUA) {
>                 ret = nvdimm_flush(nd_region, bio);
> +               /* asynchronous flush completes in other context */
> +               if (ret == -EINPROGRESS)
> +                       return;
> +       }
>
>         if (ret)
>                 bio->bi_status = errno_to_blk_status(ret);
>
> -       bio_endio(bio);
> +       if (bio)
> +               bio_endio(bio);
>  }
>
>  static int pmem_rw_page(struct block_device *bdev, sector_t sector,
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index 9ccf3d608799..8512d2eaed4e 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -1190,7 +1190,9 @@ int nvdimm_flush(struct nd_region *nd_region, struct bio *bio)
>         if (!nd_region->flush)
>                 rc = generic_nvdimm_flush(nd_region);
>         else {
> -               if (nd_region->flush(nd_region, bio))
> +               rc = nd_region->flush(nd_region, bio);
> +               /* ongoing flush in other context */
> +               if (rc && rc != -EINPROGRESS)
>                         rc = -EIO;

Why change this to -EIO vs just let the error code through untranslated?

>         }
>
> --
> 2.25.1
>
>


* Re: [RFC v3 2/2] pmem: enable pmem_submit_bio for asynchronous flush
  2022-02-16  3:34   ` Dan Williams
@ 2022-02-16  8:39     ` Pankaj Gupta
  2022-02-16 16:25       ` Dan Williams
  0 siblings, 1 reply; 12+ messages in thread
From: Pankaj Gupta @ 2022-02-16  8:39 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux NVDIMM, virtualization, Linux Kernel Mailing List, jmoyer,
	Stefan Hajnoczi, David Hildenbrand, Michael S. Tsirkin,
	Cornelia Huck, Vishal L Verma, Dave Jiang, Weiny, Ira,
	Pankaj Gupta

> >
> > Return from "pmem_submit_bio" when asynchronous flush is
> > still in progress in other context.
> >
> > Signed-off-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> > ---
> >  drivers/nvdimm/pmem.c        | 15 ++++++++++++---
> >  drivers/nvdimm/region_devs.c |  4 +++-
> >  2 files changed, 15 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> > index fe7ece1534e1..f20e30277a68 100644
> > --- a/drivers/nvdimm/pmem.c
> > +++ b/drivers/nvdimm/pmem.c
> > @@ -201,8 +201,12 @@ static void pmem_submit_bio(struct bio *bio)
> >         struct pmem_device *pmem = bio->bi_bdev->bd_disk->private_data;
> >         struct nd_region *nd_region = to_region(pmem);
> >
> > -       if (bio->bi_opf & REQ_PREFLUSH)
> > +       if (bio->bi_opf & REQ_PREFLUSH) {
> >                 ret = nvdimm_flush(nd_region, bio);
> > +               /* asynchronous flush completes in other context */
>
> I think a negative error code is a confusing way to capture the case
> of "bio successfully coalesced to previously pending flush request.
> Perhaps reserve negative codes for failure, 0 for synchronously
> completed, and > 0 for coalesced flush request.

Yes, I implemented it that way previously; will revert to that. Thanks!

>
> > +               if (ret == -EINPROGRESS)
> > +                       return;
> > +       }
> >
> >         do_acct = blk_queue_io_stat(bio->bi_bdev->bd_disk->queue);
> >         if (do_acct)
> > @@ -222,13 +226,18 @@ static void pmem_submit_bio(struct bio *bio)
> >         if (do_acct)
> >                 bio_end_io_acct(bio, start);
> >
> > -       if (bio->bi_opf & REQ_FUA)
> > +       if (bio->bi_opf & REQ_FUA) {
> >                 ret = nvdimm_flush(nd_region, bio);
> > +               /* asynchronous flush completes in other context */
> > +               if (ret == -EINPROGRESS)
> > +                       return;
> > +       }
> >
> >         if (ret)
> >                 bio->bi_status = errno_to_blk_status(ret);
> >
> > -       bio_endio(bio);
> > +       if (bio)
> > +               bio_endio(bio);
> >  }
> >
> >  static int pmem_rw_page(struct block_device *bdev, sector_t sector,
> > diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> > index 9ccf3d608799..8512d2eaed4e 100644
> > --- a/drivers/nvdimm/region_devs.c
> > +++ b/drivers/nvdimm/region_devs.c
> > @@ -1190,7 +1190,9 @@ int nvdimm_flush(struct nd_region *nd_region, struct bio *bio)
> >         if (!nd_region->flush)
> >                 rc = generic_nvdimm_flush(nd_region);
> >         else {
> > -               if (nd_region->flush(nd_region, bio))
> > +               rc = nd_region->flush(nd_region, bio);
> > +               /* ongoing flush in other context */
> > +               if (rc && rc != -EINPROGRESS)
> >                         rc = -EIO;
>
> Why change this to -EIO vs just let the error code through untranslated?

The reason was to return a generic error code instead of passing
host-side return codes through to the guest?

Thanks!
Pankaj
>
> >         }
> >
> > --
> > 2.25.1
> >
> >


* Re: [RFC v3 1/2] virtio-pmem: Async virtio-pmem flush
  2022-02-16  3:21   ` Dan Williams
@ 2022-02-16  8:47     ` Pankaj Gupta
  2022-02-16 16:23       ` Dan Williams
  0 siblings, 1 reply; 12+ messages in thread
From: Pankaj Gupta @ 2022-02-16  8:47 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux NVDIMM, virtualization, Linux Kernel Mailing List, jmoyer,
	Stefan Hajnoczi, David Hildenbrand, Michael S. Tsirkin,
	Cornelia Huck, Vishal L Verma, Dave Jiang, Weiny, Ira,
	Pankaj Gupta

> >
> > Enable asynchronous flush for virtio pmem using work queue. Also,
> > coalesce the flush requests when a flush is already in process.
> > This functionality is copied from md/RAID code.
> >
> > When a flush is already in process, new flush requests wait till
> > previous flush completes in another context (work queue). For all
> > the requests come between ongoing flush and new flush start time, only
> > single flush executes, thus adhers to flush coalscing logic. This is
>
> s/adhers/adheres/
>
> s/coalscing/coalescing/
>
> > important for maintaining the flush request order with request coalscing.
>
> s/coalscing/coalescing/

OK, sorry for the spelling mistakes.

>
> >
> > Signed-off-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> > ---
> >  drivers/nvdimm/nd_virtio.c   | 74 +++++++++++++++++++++++++++---------
> >  drivers/nvdimm/virtio_pmem.c | 10 +++++
> >  drivers/nvdimm/virtio_pmem.h | 16 ++++++++
> >  3 files changed, 83 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> > index 10351d5b49fa..179ea7a73338 100644
> > --- a/drivers/nvdimm/nd_virtio.c
> > +++ b/drivers/nvdimm/nd_virtio.c
> > @@ -100,26 +100,66 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
> >  /* The asynchronous flush callback function */
> >  int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> >  {
> > -       /*
> > -        * Create child bio for asynchronous flush and chain with
> > -        * parent bio. Otherwise directly call nd_region flush.
> > +       /* queue asynchronous flush and coalesce the flush requests */
> > +       struct virtio_device *vdev = nd_region->provider_data;
> > +       struct virtio_pmem *vpmem  = vdev->priv;
> > +       ktime_t req_start = ktime_get_boottime();
> > +       int ret = -EINPROGRESS;
> > +
> > +       spin_lock_irq(&vpmem->lock);
>
> Why a new lock and not continue to use ->pmem_lock?

This spinlock protects entry into 'wait_event_lock_irq',
and the other spinlock protects the virtio queue data.
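The split makes sense because the lock passed to `wait_event_lock_irq` must be the one guarding the wait predicate, independent of the lock guarding the virtqueue. In userspace terms (a sketch only; `two_locks` and its fields are hypothetical names modeling `vpmem->lock` vs. `vpmem->pmem_lock`):

```c
#include <pthread.h>

/* Two independent locks for two independent pieces of state. The
 * condition variable must be paired with the mutex that guards the
 * predicate it waits on, or wakeups can be missed; the queue lock
 * never participates in the wait logic. */
struct two_locks {
	pthread_mutex_t wait_lock;   /* guards flush_pending only */
	pthread_cond_t  wakeup;      /* paired with wait_lock */
	int flush_pending;

	pthread_mutex_t queue_lock;  /* guards queue_depth only */
	int queue_depth;
};

void wait_for_flush(struct two_locks *s)
{
	pthread_mutex_lock(&s->wait_lock);
	while (s->flush_pending)
		pthread_cond_wait(&s->wakeup, &s->wait_lock);
	pthread_mutex_unlock(&s->wait_lock);
}

void complete_flush(struct two_locks *s)
{
	pthread_mutex_lock(&s->wait_lock);
	s->flush_pending = 0;
	pthread_cond_broadcast(&s->wakeup);
	pthread_mutex_unlock(&s->wait_lock);
}

void submit_request(struct two_locks *s)
{
	/* touches only queue state; no interaction with wait_lock */
	pthread_mutex_lock(&s->queue_lock);
	s->queue_depth++;
	pthread_mutex_unlock(&s->queue_lock);
}
```

Whether the kernel's lockdep (CONFIG_PROVE_LOCKING) is happy with the two locks is exactly the question Dan raises next.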
>
> Have you tested this with CONFIG_PROVE_LOCKING?
No, I have only run xfstests and some of my unit test programs.
I will enable CONFIG_PROVE_LOCKING and test with it.

>
> Along those lines do you have a selftest that can be added to the
> kernel as well so that 0day or other bots could offer early warnings
> on regressions?

Will try to add one.

Thank you Dan for the feedback!

Best regards,
Pankaj
>
> > +       /* flush requests wait until ongoing flush completes,
> > +        * hence coalescing all the pending requests.
> >          */
> > -       if (bio && bio->bi_iter.bi_sector != -1) {
> > -               struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> > -
> > -               if (!child)
> > -                       return -ENOMEM;
> > -               bio_copy_dev(child, bio);
> > -               child->bi_opf = REQ_PREFLUSH;
> > -               child->bi_iter.bi_sector = -1;
> > -               bio_chain(child, bio);
> > -               submit_bio(child);
> > -               return 0;
> > +       wait_event_lock_irq(vpmem->sb_wait,
> > +                           !vpmem->flush_bio ||
> > +                           ktime_before(req_start, vpmem->prev_flush_start),
> > +                           vpmem->lock);
> > +       /* new request after previous flush is completed */
> > +       if (ktime_after(req_start, vpmem->prev_flush_start)) {
> > +               WARN_ON(vpmem->flush_bio);
> > +               vpmem->flush_bio = bio;
> > +               bio = NULL;
> > +       }
> > +       spin_unlock_irq(&vpmem->lock);
> > +
> > +       if (!bio)
> > +               queue_work(vpmem->pmem_wq, &vpmem->flush_work);
> > +       else {
> > +       /* flush completed in other context while we waited */
> > +               if (bio && (bio->bi_opf & REQ_PREFLUSH))
> > +                       bio->bi_opf &= ~REQ_PREFLUSH;
> > +               else if (bio && (bio->bi_opf & REQ_FUA))
> > +                       bio->bi_opf &= ~REQ_FUA;
> > +
> > +               ret = vpmem->prev_flush_err;
> >         }
> > -       if (virtio_pmem_flush(nd_region))
> > -               return -EIO;
> >
> > -       return 0;
> > +       return ret;
> >  };
> >  EXPORT_SYMBOL_GPL(async_pmem_flush);
> > +
> > +void submit_async_flush(struct work_struct *ws)
>
> This name is too generic to be exported from drivers/nvdimm/nd_virtio.c
>
> ...it strikes me that there is little reason for nd_virtio and
> virtio_pmem to be separate modules. They are both enabled by the same
> Kconfig, so why not combine them into one module and drop the exports?

makes sense.
>
> > +{
> > +       struct virtio_pmem *vpmem = container_of(ws, struct virtio_pmem, flush_work);
> > +       struct bio *bio = vpmem->flush_bio;
> > +
> > +       vpmem->start_flush = ktime_get_boottime();
> > +       vpmem->prev_flush_err = virtio_pmem_flush(vpmem->nd_region);
> > +       vpmem->prev_flush_start = vpmem->start_flush;
> > +       vpmem->flush_bio = NULL;
> > +       wake_up(&vpmem->sb_wait);
> > +
> > +       if (vpmem->prev_flush_err)
> > +               bio->bi_status = errno_to_blk_status(-EIO);
> > +
> > +       /* Submit parent bio only for PREFLUSH */
> > +       if (bio && (bio->bi_opf & REQ_PREFLUSH)) {
> > +               bio->bi_opf &= ~REQ_PREFLUSH;
> > +               submit_bio(bio);
> > +       } else if (bio && (bio->bi_opf & REQ_FUA)) {
> > +               bio->bi_opf &= ~REQ_FUA;
> > +               bio_endio(bio);
> > +       }
> > +}
> > +EXPORT_SYMBOL_GPL(submit_async_flush);
> >  MODULE_LICENSE("GPL");
> > diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> > index 726c7354d465..75ed9b7ddea1 100644
> > --- a/drivers/nvdimm/virtio_pmem.c
> > +++ b/drivers/nvdimm/virtio_pmem.c
> > @@ -24,6 +24,7 @@ static int init_vq(struct virtio_pmem *vpmem)
> >                 return PTR_ERR(vpmem->req_vq);
> >
> >         spin_lock_init(&vpmem->pmem_lock);
> > +       spin_lock_init(&vpmem->lock);
> >         INIT_LIST_HEAD(&vpmem->req_list);
> >
> >         return 0;
> > @@ -57,7 +58,14 @@ static int virtio_pmem_probe(struct virtio_device *vdev)
> >                 dev_err(&vdev->dev, "failed to initialize virtio pmem vq's\n");
> >                 goto out_err;
> >         }
> > +       vpmem->pmem_wq = alloc_workqueue("vpmem_wq", WQ_MEM_RECLAIM, 0);
> > +       if (!vpmem->pmem_wq) {
> > +               err = -ENOMEM;
> > +               goto out_err;
> > +       }
> >
> > +       INIT_WORK(&vpmem->flush_work, submit_async_flush);
> > +       init_waitqueue_head(&vpmem->sb_wait);
> >         virtio_cread_le(vpmem->vdev, struct virtio_pmem_config,
> >                         start, &vpmem->start);
> >         virtio_cread_le(vpmem->vdev, struct virtio_pmem_config,
> > @@ -90,10 +98,12 @@ static int virtio_pmem_probe(struct virtio_device *vdev)
> >                 goto out_nd;
> >         }
> >         nd_region->provider_data = dev_to_virtio(nd_region->dev.parent->parent);
> > +       vpmem->nd_region = nd_region;
> >         return 0;
> >  out_nd:
> >         nvdimm_bus_unregister(vpmem->nvdimm_bus);
> >  out_vq:
> > +       destroy_workqueue(vpmem->pmem_wq);
> >         vdev->config->del_vqs(vdev);
> >  out_err:
> >         return err;
> > diff --git a/drivers/nvdimm/virtio_pmem.h b/drivers/nvdimm/virtio_pmem.h
> > index 0dddefe594c4..495dc20e1cdb 100644
> > --- a/drivers/nvdimm/virtio_pmem.h
> > +++ b/drivers/nvdimm/virtio_pmem.h
> > @@ -35,9 +35,24 @@ struct virtio_pmem {
> >         /* Virtio pmem request queue */
> >         struct virtqueue *req_vq;
> >
> > +       struct bio *flush_bio;
> > +       /* last_flush is when the last completed flush was started */
> > +       ktime_t prev_flush_start, start_flush;
> > +       int prev_flush_err;
> > +
> > +       /* work queue for deferred flush */
> > +       struct work_struct flush_work;
> > +       struct workqueue_struct *pmem_wq;
> > +
> > +       /* Synchronize flush wait queue data */
> > +       spinlock_t lock;
> > +       /* for waiting for previous flush to complete */
> > +       wait_queue_head_t sb_wait;
> > +
> >         /* nvdimm bus registers virtio pmem device */
> >         struct nvdimm_bus *nvdimm_bus;
> >         struct nvdimm_bus_descriptor nd_desc;
> > +       struct nd_region *nd_region;
> >
> >         /* List to store deferred work if virtqueue is full */
> >         struct list_head req_list;
> > @@ -52,4 +67,5 @@ struct virtio_pmem {
> >
> >  void virtio_pmem_host_ack(struct virtqueue *vq);
> >  int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
> > +void submit_async_flush(struct work_struct *ws);
> >  #endif
> > --
> > 2.25.1
> >
> >

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC v3 1/2] virtio-pmem: Async virtio-pmem flush
  2022-02-16  8:47     ` Pankaj Gupta
@ 2022-02-16 16:23       ` Dan Williams
  2022-02-16 17:04         ` Pankaj Gupta
  0 siblings, 1 reply; 12+ messages in thread
From: Dan Williams @ 2022-02-16 16:23 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: Linux NVDIMM, virtualization, Linux Kernel Mailing List, jmoyer,
	Stefan Hajnoczi, David Hildenbrand, Michael S. Tsirkin,
	Cornelia Huck, Vishal L Verma, Dave Jiang, Weiny, Ira,
	Pankaj Gupta

On Wed, Feb 16, 2022 at 12:47 AM Pankaj Gupta
<pankaj.gupta.linux@gmail.com> wrote:
>
> > >
> > > Enable asynchronous flush for virtio pmem using work queue. Also,
> > > coalesce the flush requests when a flush is already in process.
> > > This functionality is copied from md/RAID code.
> > >
> > > When a flush is already in process, new flush requests wait till
> > > previous flush completes in another context (work queue). For all
> > > the requests come between ongoing flush and new flush start time, only
> > > single flush executes, thus adhers to flush coalscing logic. This is
> >
> > s/adhers/adheres/
> >
> > s/coalscing/coalescing/
> >
> > > important for maintaining the flush request order with request coalscing.
> >
> > s/coalscing/coalescing/
>
> o.k. Sorry for the spelling mistakes.
>
> >
> > >
> > > Signed-off-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> > > ---
> > >  drivers/nvdimm/nd_virtio.c   | 74 +++++++++++++++++++++++++++---------
> > >  drivers/nvdimm/virtio_pmem.c | 10 +++++
> > >  drivers/nvdimm/virtio_pmem.h | 16 ++++++++
> > >  3 files changed, 83 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> > > index 10351d5b49fa..179ea7a73338 100644
> > > --- a/drivers/nvdimm/nd_virtio.c
> > > +++ b/drivers/nvdimm/nd_virtio.c
> > > @@ -100,26 +100,66 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
> > >  /* The asynchronous flush callback function */
> > >  int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> > >  {
> > > -       /*
> > > -        * Create child bio for asynchronous flush and chain with
> > > -        * parent bio. Otherwise directly call nd_region flush.
> > > +       /* queue asynchronous flush and coalesce the flush requests */
> > > +       struct virtio_device *vdev = nd_region->provider_data;
> > > +       struct virtio_pmem *vpmem  = vdev->priv;
> > > +       ktime_t req_start = ktime_get_boottime();
> > > +       int ret = -EINPROGRESS;
> > > +
> > > +       spin_lock_irq(&vpmem->lock);
> >
> > Why a new lock and not continue to use ->pmem_lock?
>
> This spinlock protects the wait entry in 'wait_event_lock_irq',
> and the other spinlock (pmem_lock) protects the virtio queue data.

Understood, but md shares the mddev->lock for both purposes, so I
would ask that you either document what motivates the locking split,
or just reuse the lock until a strong reason to split them arises.


* Re: [RFC v3 2/2] pmem: enable pmem_submit_bio for asynchronous flush
  2022-02-16  8:39     ` Pankaj Gupta
@ 2022-02-16 16:25       ` Dan Williams
  2022-02-16 17:01         ` Pankaj Gupta
  0 siblings, 1 reply; 12+ messages in thread
From: Dan Williams @ 2022-02-16 16:25 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: Linux NVDIMM, virtualization, Linux Kernel Mailing List, jmoyer,
	Stefan Hajnoczi, David Hildenbrand, Michael S. Tsirkin,
	Cornelia Huck, Vishal L Verma, Dave Jiang, Weiny, Ira,
	Pankaj Gupta

On Wed, Feb 16, 2022 at 12:39 AM Pankaj Gupta
<pankaj.gupta.linux@gmail.com> wrote:
>
> > >
> > > Return from "pmem_submit_bio" when asynchronous flush is
> > > still in progress in other context.
> > >
> > > Signed-off-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> > > ---
> > >  drivers/nvdimm/pmem.c        | 15 ++++++++++++---
> > >  drivers/nvdimm/region_devs.c |  4 +++-
> > >  2 files changed, 15 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> > > index fe7ece1534e1..f20e30277a68 100644
> > > --- a/drivers/nvdimm/pmem.c
> > > +++ b/drivers/nvdimm/pmem.c
> > > @@ -201,8 +201,12 @@ static void pmem_submit_bio(struct bio *bio)
> > >         struct pmem_device *pmem = bio->bi_bdev->bd_disk->private_data;
> > >         struct nd_region *nd_region = to_region(pmem);
> > >
> > > -       if (bio->bi_opf & REQ_PREFLUSH)
> > > +       if (bio->bi_opf & REQ_PREFLUSH) {
> > >                 ret = nvdimm_flush(nd_region, bio);
> > > +               /* asynchronous flush completes in other context */
> >
> > I think a negative error code is a confusing way to capture the case
> > of "bio successfully coalesced to previously pending flush request".
> > Perhaps reserve negative codes for failure, 0 for synchronously
> > completed, and > 0 for coalesced flush request.
>
> Yes, I implemented it this way previously; will revert to that. Thanks!
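To make the suggested convention concrete, an illustrative userspace model
(names invented here, not the actual kernel code) would be:

```c
#include <assert.h>

/*
 * Illustrative model of the suggested return convention for
 * nvdimm_flush():
 *   < 0  flush failed, caller completes the bio with an error
 *     0  flush completed synchronously, caller ends the bio
 *   > 0  request was coalesced; completion happens later in the
 *        work queue, so the caller must not touch the bio again
 */
enum flush_disposition { FLUSH_ERROR, FLUSH_DONE_SYNC, FLUSH_DEFERRED };

static enum flush_disposition classify_flush_ret(int ret)
{
	if (ret < 0)
		return FLUSH_ERROR;
	if (ret == 0)
		return FLUSH_DONE_SYNC;
	return FLUSH_DEFERRED;
}
```

This keeps negative codes unambiguous as failures, unlike overloading
-EINPROGRESS for the successful-coalesce case.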
>
> >
> > > +               if (ret == -EINPROGRESS)
> > > +                       return;
> > > +       }
> > >
> > >         do_acct = blk_queue_io_stat(bio->bi_bdev->bd_disk->queue);
> > >         if (do_acct)
> > > @@ -222,13 +226,18 @@ static void pmem_submit_bio(struct bio *bio)
> > >         if (do_acct)
> > >                 bio_end_io_acct(bio, start);
> > >
> > > -       if (bio->bi_opf & REQ_FUA)
> > > +       if (bio->bi_opf & REQ_FUA) {
> > >                 ret = nvdimm_flush(nd_region, bio);
> > > +               /* asynchronous flush completes in other context */
> > > +               if (ret == -EINPROGRESS)
> > > +                       return;
> > > +       }
> > >
> > >         if (ret)
> > >                 bio->bi_status = errno_to_blk_status(ret);
> > >
> > > -       bio_endio(bio);
> > > +       if (bio)
> > > +               bio_endio(bio);
> > >  }
> > >
> > >  static int pmem_rw_page(struct block_device *bdev, sector_t sector,
> > > diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> > > index 9ccf3d608799..8512d2eaed4e 100644
> > > --- a/drivers/nvdimm/region_devs.c
> > > +++ b/drivers/nvdimm/region_devs.c
> > > @@ -1190,7 +1190,9 @@ int nvdimm_flush(struct nd_region *nd_region, struct bio *bio)
> > >         if (!nd_region->flush)
> > >                 rc = generic_nvdimm_flush(nd_region);
> > >         else {
> > > -               if (nd_region->flush(nd_region, bio))
> > > +               rc = nd_region->flush(nd_region, bio);
> > > +               /* ongoing flush in other context */
> > > +               if (rc && rc != -EINPROGRESS)
> > >                         rc = -EIO;
> >
> > Why change this to -EIO vs just let the error code through untranslated?
>
> The reason was to return a generic error code instead of exposing
> host-side return codes to the guest.

Ok, maybe add a comment to indicate the need to avoid exposing these error
codes to a guest, so someone does not ask the same question in the
future?


* Re: [RFC v3 2/2] pmem: enable pmem_submit_bio for asynchronous flush
  2022-02-16 16:25       ` Dan Williams
@ 2022-02-16 17:01         ` Pankaj Gupta
  0 siblings, 0 replies; 12+ messages in thread
From: Pankaj Gupta @ 2022-02-16 17:01 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux NVDIMM, virtualization, Linux Kernel Mailing List, jmoyer,
	Stefan Hajnoczi, David Hildenbrand, Michael S. Tsirkin,
	Cornelia Huck, Vishal L Verma, Dave Jiang, Weiny, Ira,
	Pankaj Gupta

> > > > Return from "pmem_submit_bio" when asynchronous flush is
> > > > still in progress in other context.
> > > >
> > > > Signed-off-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> > > > ---
> > > >  drivers/nvdimm/pmem.c        | 15 ++++++++++++---
> > > >  drivers/nvdimm/region_devs.c |  4 +++-
> > > >  2 files changed, 15 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> > > > index fe7ece1534e1..f20e30277a68 100644
> > > > --- a/drivers/nvdimm/pmem.c
> > > > +++ b/drivers/nvdimm/pmem.c
> > > > @@ -201,8 +201,12 @@ static void pmem_submit_bio(struct bio *bio)
> > > >         struct pmem_device *pmem = bio->bi_bdev->bd_disk->private_data;
> > > >         struct nd_region *nd_region = to_region(pmem);
> > > >
> > > > -       if (bio->bi_opf & REQ_PREFLUSH)
> > > > +       if (bio->bi_opf & REQ_PREFLUSH) {
> > > >                 ret = nvdimm_flush(nd_region, bio);
> > > > +               /* asynchronous flush completes in other context */
> > >
> > > I think a negative error code is a confusing way to capture the case
> > > of "bio successfully coalesced to previously pending flush request".
> > > Perhaps reserve negative codes for failure, 0 for synchronously
> > > completed, and > 0 for coalesced flush request.
> >
> > Yes, I implemented it this way previously; will revert to that. Thanks!
> >
> > >
> > > > +               if (ret == -EINPROGRESS)
> > > > +                       return;
> > > > +       }
> > > >
> > > >         do_acct = blk_queue_io_stat(bio->bi_bdev->bd_disk->queue);
> > > >         if (do_acct)
> > > > @@ -222,13 +226,18 @@ static void pmem_submit_bio(struct bio *bio)
> > > >         if (do_acct)
> > > >                 bio_end_io_acct(bio, start);
> > > >
> > > > -       if (bio->bi_opf & REQ_FUA)
> > > > +       if (bio->bi_opf & REQ_FUA) {
> > > >                 ret = nvdimm_flush(nd_region, bio);
> > > > +               /* asynchronous flush completes in other context */
> > > > +               if (ret == -EINPROGRESS)
> > > > +                       return;
> > > > +       }
> > > >
> > > >         if (ret)
> > > >                 bio->bi_status = errno_to_blk_status(ret);
> > > >
> > > > -       bio_endio(bio);
> > > > +       if (bio)
> > > > +               bio_endio(bio);
> > > >  }
> > > >
> > > >  static int pmem_rw_page(struct block_device *bdev, sector_t sector,
> > > > diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> > > > index 9ccf3d608799..8512d2eaed4e 100644
> > > > --- a/drivers/nvdimm/region_devs.c
> > > > +++ b/drivers/nvdimm/region_devs.c
> > > > @@ -1190,7 +1190,9 @@ int nvdimm_flush(struct nd_region *nd_region, struct bio *bio)
> > > >         if (!nd_region->flush)
> > > >                 rc = generic_nvdimm_flush(nd_region);
> > > >         else {
> > > > -               if (nd_region->flush(nd_region, bio))
> > > > +               rc = nd_region->flush(nd_region, bio);
> > > > +               /* ongoing flush in other context */
> > > > +               if (rc && rc != -EINPROGRESS)
> > > >                         rc = -EIO;
> > >
> > > Why change this to -EIO vs just let the error code through untranslated?
> >
> > The reason was to return a generic error code instead of exposing
> > host-side return codes to the guest.
>
> Ok, maybe add a comment to indicate the need to avoid exposing these error
> codes to a guest, so someone does not ask the same question in the
> future?

Sure.

Thanks,
Pankaj


* Re: [RFC v3 1/2] virtio-pmem: Async virtio-pmem flush
  2022-02-16 16:23       ` Dan Williams
@ 2022-02-16 17:04         ` Pankaj Gupta
  0 siblings, 0 replies; 12+ messages in thread
From: Pankaj Gupta @ 2022-02-16 17:04 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux NVDIMM, virtualization, Linux Kernel Mailing List, jmoyer,
	Stefan Hajnoczi, David Hildenbrand, Michael S. Tsirkin,
	Cornelia Huck, Vishal L Verma, Dave Jiang, Weiny, Ira,
	Pankaj Gupta

> >
> > > >
> > > > Enable asynchronous flush for virtio pmem using work queue. Also,
> > > > coalesce the flush requests when a flush is already in process.
> > > > This functionality is copied from md/RAID code.
> > > >
> > > > When a flush is already in process, new flush requests wait till
> > > > previous flush completes in another context (work queue). For all
> > > > the requests come between ongoing flush and new flush start time, only
> > > > single flush executes, thus adhers to flush coalscing logic. This is
> > >
> > > s/adhers/adheres/
> > >
> > > s/coalscing/coalescing/
> > >
> > > > important for maintaining the flush request order with request coalscing.
> > >
> > > s/coalscing/coalescing/
> >
> > o.k. Sorry for the spelling mistakes.
> >
> > >
> > > >
> > > > Signed-off-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> > > > ---
> > > >  drivers/nvdimm/nd_virtio.c   | 74 +++++++++++++++++++++++++++---------
> > > >  drivers/nvdimm/virtio_pmem.c | 10 +++++
> > > >  drivers/nvdimm/virtio_pmem.h | 16 ++++++++
> > > >  3 files changed, 83 insertions(+), 17 deletions(-)
> > > >
> > > > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> > > > index 10351d5b49fa..179ea7a73338 100644
> > > > --- a/drivers/nvdimm/nd_virtio.c
> > > > +++ b/drivers/nvdimm/nd_virtio.c
> > > > @@ -100,26 +100,66 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
> > > >  /* The asynchronous flush callback function */
> > > >  int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> > > >  {
> > > > -       /*
> > > > -        * Create child bio for asynchronous flush and chain with
> > > > -        * parent bio. Otherwise directly call nd_region flush.
> > > > +       /* queue asynchronous flush and coalesce the flush requests */
> > > > +       struct virtio_device *vdev = nd_region->provider_data;
> > > > +       struct virtio_pmem *vpmem  = vdev->priv;
> > > > +       ktime_t req_start = ktime_get_boottime();
> > > > +       int ret = -EINPROGRESS;
> > > > +
> > > > +       spin_lock_irq(&vpmem->lock);
> > >
> > > Why a new lock and not continue to use ->pmem_lock?
> >
> > This spinlock protects the wait entry in 'wait_event_lock_irq',
> > and the other spinlock (pmem_lock) protects the virtio queue data.
>
> Understood, but md shares the mddev->lock for both purposes, so I
> would ask that you either document what motivates the locking split,
> or just reuse the lock until a strong reason to split them arises.

O.k. Will check again whether we can use the same lock, or document the split.
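As a reference point, the md-style single-lock scheme can be sketched in
userspace with pthreads (all names and the fake clock below are
illustrative, not the kernel code):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Userspace model of the single-lock scheme: one mutex guards BOTH the
 * coalescing state (pending flag, timestamps) and the simulated device
 * submission, the way mddev->lock does in md/RAID.
 */
struct flusher {
	pthread_mutex_t lock;
	pthread_cond_t flush_done;
	bool flush_in_progress;
	long prev_flush_start;	/* start time of last completed flush */
	long clock;		/* fake monotonic clock */
	int flushes_executed;
};

/*
 * Returns true if this request had to run a flush itself, false if it
 * was coalesced into a flush that completed while it waited.
 */
static bool flusher_request(struct flusher *f)
{
	pthread_mutex_lock(&f->lock);
	long req_start = ++f->clock;
	/* wait until no flush is running, or one newer than us finished */
	while (f->flush_in_progress && req_start > f->prev_flush_start)
		pthread_cond_wait(&f->flush_done, &f->lock);
	if (req_start <= f->prev_flush_start) {
		/* a flush that started after us completed: coalesced */
		pthread_mutex_unlock(&f->lock);
		return false;
	}
	f->flush_in_progress = true;
	long start = ++f->clock;
	f->flushes_executed++;		/* "device flush" happens here */
	f->prev_flush_start = start;
	f->flush_in_progress = false;
	pthread_cond_broadcast(&f->flush_done);
	pthread_mutex_unlock(&f->lock);
	return true;
}
```

In the real driver the flush runs in the work queue rather than under
the lock, which is what motivates the question of whether one lock can
still cover both pieces of state.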

Thanks,
Pankaj


end of thread, other threads:[~2022-02-16 17:04 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-11 16:19 [RFC v3 0/2] virtio-pmem: Asynchronous flush Pankaj Gupta
2022-01-11 16:19 ` [RFC v3 1/2] virtio-pmem: Async virtio-pmem flush Pankaj Gupta
2022-02-16  3:21   ` Dan Williams
2022-02-16  8:47     ` Pankaj Gupta
2022-02-16 16:23       ` Dan Williams
2022-02-16 17:04         ` Pankaj Gupta
2022-01-11 16:19 ` [RFC v3 2/2] pmem: enable pmem_submit_bio for asynchronous flush Pankaj Gupta
2022-02-16  3:34   ` Dan Williams
2022-02-16  8:39     ` Pankaj Gupta
2022-02-16 16:25       ` Dan Williams
2022-02-16 17:01         ` Pankaj Gupta
2022-01-28 10:11 ` [RFC v3 0/2] virtio-pmem: Asynchronous flush Pankaj Gupta
