All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] dm log writes: make sure the log super sectors are written in order
@ 2019-06-03 14:18 zhangyi (F)
  2019-06-03 14:46 ` Mike Snitzer
  0 siblings, 1 reply; 5+ messages in thread
From: zhangyi (F) @ 2019-06-03 14:18 UTC (permalink / raw)
  To: dm-devel; +Cc: houtao1, snitzer, agk, yi.zhang

Currently, although we submit super bios in log-write thread orderly
(the super.nr_entries is incremented by each logged entry), the
submit_bio() cannot make sure that each super sector is written to log
device in order. So the submitting bio of each super sector may be
out-of-order, and then the final nr_entries maybe small than the real
entries submitted.

This problem can be reproduced by the xfstests generic/455 with ext4,
which may complained below after running the test:

  QA output created by 455
 -Silence is golden
 +mark 'end' does not exist

This patch serialize submitting super secotrs to make sure each super
sectors are written to log disk in order.

Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
---
 drivers/md/dm-log-writes.c | 56 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 54 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index 9ea2b02..37088c7 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -60,6 +60,7 @@
 
 #define WRITE_LOG_VERSION 1ULL
 #define WRITE_LOG_MAGIC 0x6a736677736872ULL
+#define WRITE_LOG_SUPER_SECTOR 0
 
 /*
  * The disk format for this is braindead simple.
@@ -115,6 +116,8 @@ struct log_writes_c {
 	struct list_head logging_blocks;
 	wait_queue_head_t wait;
 	struct task_struct *log_kthread;
+	bool submitting_super;
+	wait_queue_head_t wait_super;
 };
 
 struct pending_block {
@@ -180,6 +183,34 @@ static void log_end_io(struct bio *bio)
 	bio_put(bio);
 }
 
+static void log_end_super(struct bio *bio)
+{
+	struct log_writes_c *lc = bio->bi_private;
+	unsigned long flags;
+
+	spin_lock_irqsave(&lc->blocks_lock, flags);
+	if (bio->bi_status) {
+		DMERR("Error writing super block, error=%d",
+		      bio->bi_status);
+		lc->logging_enabled = false;
+	}
+
+	WARN_ON(!lc->submitting_super);
+	lc->submitting_super = false;
+	spin_unlock_irqrestore(&lc->blocks_lock, flags);
+
+	/*
+	 * Wake up log-write kthread that previous super sector has
+	 * been written to disk.
+	 */
+	if (waitqueue_active(&lc->wait_super))
+		wake_up(&lc->wait_super);
+
+	bio_free_pages(bio);
+	put_io_block(lc);
+	bio_put(bio);
+}
+
 /*
  * Meant to be called if there is an error, it will free all the pages
  * associated with the block.
@@ -215,7 +246,8 @@ static int write_metadata(struct log_writes_c *lc, void *entry,
 	bio->bi_iter.bi_size = 0;
 	bio->bi_iter.bi_sector = sector;
 	bio_set_dev(bio, lc->logdev->bdev);
-	bio->bi_end_io = log_end_io;
+	bio->bi_end_io = (sector == WRITE_LOG_SUPER_SECTOR) ?
+			  log_end_super : log_end_io;
 	bio->bi_private = lc;
 	bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
 
@@ -418,7 +450,25 @@ static int log_super(struct log_writes_c *lc)
 	super.nr_entries = cpu_to_le64(lc->logged_entries);
 	super.sectorsize = cpu_to_le32(lc->sectorsize);
 
-	if (write_metadata(lc, &super, sizeof(super), NULL, 0, 0)) {
+	/*
+	 * Super sector should be writen in-order, or else the
+	 * nr_entries could be small than the real submitted entries.
+	 * So wait previous super sector submitted here.
+	 */
+	if (!lc->submitting_super)
+		goto write_super;
+
+	spin_lock_irq(&lc->blocks_lock);
+	if (!lc->submitting_super) {
+		spin_unlock_irq(&lc->blocks_lock);
+		goto write_super;
+	}
+	spin_unlock_irq(&lc->blocks_lock);
+	wait_event(lc->wait_super, !lc->submitting_super);
+write_super:
+	lc->submitting_super = true;
+	if (write_metadata(lc, &super, sizeof(super), NULL, 0,
+			   WRITE_LOG_SUPER_SECTOR)) {
 		DMERR("Couldn't write super");
 		return -1;
 	}
@@ -531,6 +581,7 @@ static int log_writes_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	INIT_LIST_HEAD(&lc->unflushed_blocks);
 	INIT_LIST_HEAD(&lc->logging_blocks);
 	init_waitqueue_head(&lc->wait);
+	init_waitqueue_head(&lc->wait_super);
 	atomic_set(&lc->io_blocks, 0);
 	atomic_set(&lc->pending_blocks, 0);
 
@@ -570,6 +621,7 @@ static int log_writes_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	lc->logging_enabled = true;
 	lc->end_sector = logdev_last_sector(lc);
 	lc->device_supports_discard = true;
+	lc->submitting_super = false;
 
 	ti->num_flush_bios = 1;
 	ti->flush_supported = true;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: dm log writes: make sure the log super sectors are written in order
  2019-06-03 14:18 [PATCH] dm log writes: make sure the log super sectors are written in order zhangyi (F)
@ 2019-06-03 14:46 ` Mike Snitzer
  2019-06-03 19:02   ` Josef Bacik
  0 siblings, 1 reply; 5+ messages in thread
From: Mike Snitzer @ 2019-06-03 14:46 UTC (permalink / raw)
  To: zhangyi (F), Josef Bacik; +Cc: dm-devel, agk, houtao1

On Mon, Jun 03 2019 at 10:18am -0400,
zhangyi (F) <yi.zhang@huawei.com> wrote:

> Currently, although we submit super bios in log-write thread orderly
> (the super.nr_entries is incremented by each logged entry), the
> submit_bio() cannot make sure that each super sector is written to log
> device in order. So the submitting bio of each super sector may be
> out-of-order, and then the final nr_entries maybe small than the real
> entries submitted.
> 
> This problem can be reproduced by the xfstests generic/455 with ext4,
> which may complained below after running the test:
> 
>   QA output created by 455
>  -Silence is golden
>  +mark 'end' does not exist
> 
> This patch serialize submitting super secotrs to make sure each super
> sectors are written to log disk in order.
> 
> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>

This doesn't feel right.

You raised 2 things you're trying to address:
1) IO is out of order
2) accounting (nr_entries) isn't correct

I'll need to reviewer closer but serializing "super" bios doesn't seem
like the best fix.

Josef, any chance you can weigh in on this?  AFAIK you are still "the
man" for dm-log-writes ;)

Mike


> ---
>  drivers/md/dm-log-writes.c | 56 ++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 54 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
> index 9ea2b02..37088c7 100644
> --- a/drivers/md/dm-log-writes.c
> +++ b/drivers/md/dm-log-writes.c
> @@ -60,6 +60,7 @@
>  
>  #define WRITE_LOG_VERSION 1ULL
>  #define WRITE_LOG_MAGIC 0x6a736677736872ULL
> +#define WRITE_LOG_SUPER_SECTOR 0
>  
>  /*
>   * The disk format for this is braindead simple.
> @@ -115,6 +116,8 @@ struct log_writes_c {
>  	struct list_head logging_blocks;
>  	wait_queue_head_t wait;
>  	struct task_struct *log_kthread;
> +	bool submitting_super;
> +	wait_queue_head_t wait_super;
>  };
>  
>  struct pending_block {
> @@ -180,6 +183,34 @@ static void log_end_io(struct bio *bio)
>  	bio_put(bio);
>  }
>  
> +static void log_end_super(struct bio *bio)
> +{
> +	struct log_writes_c *lc = bio->bi_private;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&lc->blocks_lock, flags);
> +	if (bio->bi_status) {
> +		DMERR("Error writing super block, error=%d",
> +		      bio->bi_status);
> +		lc->logging_enabled = false;
> +	}
> +
> +	WARN_ON(!lc->submitting_super);
> +	lc->submitting_super = false;
> +	spin_unlock_irqrestore(&lc->blocks_lock, flags);
> +
> +	/*
> +	 * Wake up log-write kthread that previous super sector has
> +	 * been written to disk.
> +	 */
> +	if (waitqueue_active(&lc->wait_super))
> +		wake_up(&lc->wait_super);
> +
> +	bio_free_pages(bio);
> +	put_io_block(lc);
> +	bio_put(bio);
> +}
> +
>  /*
>   * Meant to be called if there is an error, it will free all the pages
>   * associated with the block.
> @@ -215,7 +246,8 @@ static int write_metadata(struct log_writes_c *lc, void *entry,
>  	bio->bi_iter.bi_size = 0;
>  	bio->bi_iter.bi_sector = sector;
>  	bio_set_dev(bio, lc->logdev->bdev);
> -	bio->bi_end_io = log_end_io;
> +	bio->bi_end_io = (sector == WRITE_LOG_SUPER_SECTOR) ?
> +			  log_end_super : log_end_io;
>  	bio->bi_private = lc;
>  	bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
>  
> @@ -418,7 +450,25 @@ static int log_super(struct log_writes_c *lc)
>  	super.nr_entries = cpu_to_le64(lc->logged_entries);
>  	super.sectorsize = cpu_to_le32(lc->sectorsize);
>  
> -	if (write_metadata(lc, &super, sizeof(super), NULL, 0, 0)) {
> +	/*
> +	 * Super sector should be writen in-order, or else the
> +	 * nr_entries could be small than the real submitted entries.
> +	 * So wait previous super sector submitted here.
> +	 */
> +	if (!lc->submitting_super)
> +		goto write_super;
> +
> +	spin_lock_irq(&lc->blocks_lock);
> +	if (!lc->submitting_super) {
> +		spin_unlock_irq(&lc->blocks_lock);
> +		goto write_super;
> +	}
> +	spin_unlock_irq(&lc->blocks_lock);
> +	wait_event(lc->wait_super, !lc->submitting_super);
> +write_super:
> +	lc->submitting_super = true;
> +	if (write_metadata(lc, &super, sizeof(super), NULL, 0,
> +			   WRITE_LOG_SUPER_SECTOR)) {
>  		DMERR("Couldn't write super");
>  		return -1;
>  	}
> @@ -531,6 +581,7 @@ static int log_writes_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  	INIT_LIST_HEAD(&lc->unflushed_blocks);
>  	INIT_LIST_HEAD(&lc->logging_blocks);
>  	init_waitqueue_head(&lc->wait);
> +	init_waitqueue_head(&lc->wait_super);
>  	atomic_set(&lc->io_blocks, 0);
>  	atomic_set(&lc->pending_blocks, 0);
>  
> @@ -570,6 +621,7 @@ static int log_writes_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  	lc->logging_enabled = true;
>  	lc->end_sector = logdev_last_sector(lc);
>  	lc->device_supports_discard = true;
> +	lc->submitting_super = false;
>  
>  	ti->num_flush_bios = 1;
>  	ti->flush_supported = true;
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: dm log writes: make sure the log super sectors are written in order
  2019-06-03 14:46 ` Mike Snitzer
@ 2019-06-03 19:02   ` Josef Bacik
  2019-06-04  4:20     ` zhangyi (F)
  0 siblings, 1 reply; 5+ messages in thread
From: Josef Bacik @ 2019-06-03 19:02 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Josef Bacik, dm-devel, houtao1, agk, zhangyi (F)

On Mon, Jun 03, 2019 at 10:46:08AM -0400, Mike Snitzer wrote:
> On Mon, Jun 03 2019 at 10:18am -0400,
> zhangyi (F) <yi.zhang@huawei.com> wrote:
> 
> > Currently, although we submit super bios in log-write thread orderly
> > (the super.nr_entries is incremented by each logged entry), the
> > submit_bio() cannot make sure that each super sector is written to log
> > device in order. So the submitting bio of each super sector may be
> > out-of-order, and then the final nr_entries maybe small than the real
> > entries submitted.
> > 
> > This problem can be reproduced by the xfstests generic/455 with ext4,
> > which may complained below after running the test:
> > 
> >   QA output created by 455
> >  -Silence is golden
> >  +mark 'end' does not exist
> > 
> > This patch serialize submitting super secotrs to make sure each super
> > sectors are written to log disk in order.
> > 
> > Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
> 
> This doesn't feel right.
> 
> You raised 2 things you're trying to address:
> 1) IO is out of order
> 2) accounting (nr_entries) isn't correct
> 
> I'll need to reviewer closer but serializing "super" bios doesn't seem
> like the best fix.
> 
> Josef, any chance you can weigh in on this?  AFAIK you are still "the
> man" for dm-log-writes ;)
> 

Well the #2 is caused by #1, we submit the bio for a super two times in a row
and it's a crapshoot which one makes it to disk.  So he's right, and it's kind
of funny because this is the sort of problem that dm-log-writes was written to
catch, and I fucked it up here ;).  That being said this is a bit
over-engineered, can we just add a completion to the log buff and do a
wait_for_completion() when we're writing out the super?  It's not like this thing
needs to be super performant.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: dm log writes: make sure the log super sectors are written in order
  2019-06-03 19:02   ` Josef Bacik
@ 2019-06-04  4:20     ` zhangyi (F)
  2019-06-04 17:46       ` Josef Bacik
  0 siblings, 1 reply; 5+ messages in thread
From: zhangyi (F) @ 2019-06-04  4:20 UTC (permalink / raw)
  To: Josef Bacik, Mike Snitzer; +Cc: Josef Bacik, dm-devel, agk, houtao1

On 2019/6/4 3:02, Josef Bacik Wrote:
> On Mon, Jun 03, 2019 at 10:46:08AM -0400, Mike Snitzer wrote:
>> On Mon, Jun 03 2019 at 10:18am -0400,
>> zhangyi (F) <yi.zhang@huawei.com> wrote:
>>
>>> Currently, although we submit super bios in log-write thread orderly
>>> (the super.nr_entries is incremented by each logged entry), the
>>> submit_bio() cannot make sure that each super sector is written to log
>>> device in order. So the submitting bio of each super sector may be
>>> out-of-order, and then the final nr_entries maybe small than the real
>>> entries submitted.
>>>
>>> This problem can be reproduced by the xfstests generic/455 with ext4,
>>> which may complained below after running the test:
>>>
>>>   QA output created by 455
>>>  -Silence is golden
>>>  +mark 'end' does not exist
>>>
>>> This patch serialize submitting super secotrs to make sure each super
>>> sectors are written to log disk in order.
>>>
>>> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
>>
>> This doesn't feel right.
>>
>> You raised 2 things you're trying to address:
>> 1) IO is out of order
>> 2) accounting (nr_entries) isn't correct
>>
>> I'll need to reviewer closer but serializing "super" bios doesn't seem
>> like the best fix.
>>
>> Josef, any chance you can weigh in on this?  AFAIK you are still "the
>> man" for dm-log-writes ;)
>>
> 
> Well the #2 is caused by #1, we submit the bio for a super two times in a row
> and it's a crapshoot which one makes it to disk.  So he's right, and it's kind
> of funny because this is the sort of problem that dm-log-writes was written to
> catch, and I fucked it up here ;).  That being said this is a bit
> over-engineered, can we just add a completion to the log buff and do a
> wait_for_completion() when we're writing out the super?  It's not like this thing
> needs to be super performant.  Thanks,
> 

Hi, Josef

Thanks for your suggestions. It's fine by me to switch to use completion
instead. Some thing like this?

...
@@ -180,6 +182,15 @@ static void log_end_io(struct bio *bio)
        bio_put(bio);
 }

+static void log_end_super(struct bio *bio)
+{
+       struct log_writes_c *lc = bio->bi_private;
+
+       /* Wake up log-write kthread that super has been written */
+       complete(&lc->super_comp);
+       log_end_io(bio);
+}
+
 /*
  * Meant to be called if there is an error, it will free all the pages
  * associated with the block.
@@ -215,7 +226,8 @@ static int write_metadata(struct log_writes_c *lc, void *entry,
        bio->bi_iter.bi_size = 0;
        bio->bi_iter.bi_sector = sector;
        bio_set_dev(bio, lc->logdev->bdev);
-       bio->bi_end_io = log_end_io;
+       bio->bi_end_io = (sector == WRITE_LOG_SUPER_SECTOR) ?
+                         log_end_super : log_end_io;
        bio->bi_private = lc;
        bio_set_op_attrs(bio, REQ_OP_WRITE, 0);

@@ -418,11 +430,19 @@ static int log_super(struct log_writes_c *lc)
        super.nr_entries = cpu_to_le64(lc->logged_entries);
        super.sectorsize = cpu_to_le32(lc->sectorsize);

-       if (write_metadata(lc, &super, sizeof(super), NULL, 0, 0)) {
+       if (write_metadata(lc, &super, sizeof(super), NULL, 0,
+                          WRITE_LOG_SUPER_SECTOR)) {
                DMERR("Couldn't write super");
                return -1;
        }

+       /*
+        * Super sector should be writen in-order, or else the
+        * nr_entries could be rewritten by the old bio and small
+        * than the real submitted entries.
+        */
+       wait_for_completion_io(&lc->super_comp);
+
        return 0;
 }
...

But one thing need to mention is that, IIUC, If we use completion, the
log_writes_kthread have to wait on writing out every super bio, so it cannot
keep on submitting subsequent log bios. It maybe more performance impact
than my v1 (it only wait on previous super if it was not finished).

Thanks,
Yi.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: dm log writes: make sure the log super sectors are written in order
  2019-06-04  4:20     ` zhangyi (F)
@ 2019-06-04 17:46       ` Josef Bacik
  0 siblings, 0 replies; 5+ messages in thread
From: Josef Bacik @ 2019-06-04 17:46 UTC (permalink / raw)
  To: zhangyi (F)
  Cc: Mike Snitzer, Josef Bacik, Josef Bacik, dm-devel, houtao1, agk

On Tue, Jun 04, 2019 at 12:20:53PM +0800, zhangyi (F) wrote:
> On 2019/6/4 3:02, Josef Bacik Wrote:
> > On Mon, Jun 03, 2019 at 10:46:08AM -0400, Mike Snitzer wrote:
> >> On Mon, Jun 03 2019 at 10:18am -0400,
> >> zhangyi (F) <yi.zhang@huawei.com> wrote:
> >>
> >>> Currently, although we submit super bios in log-write thread orderly
> >>> (the super.nr_entries is incremented by each logged entry), the
> >>> submit_bio() cannot make sure that each super sector is written to log
> >>> device in order. So the submitting bio of each super sector may be
> >>> out-of-order, and then the final nr_entries maybe small than the real
> >>> entries submitted.
> >>>
> >>> This problem can be reproduced by the xfstests generic/455 with ext4,
> >>> which may complained below after running the test:
> >>>
> >>>   QA output created by 455
> >>>  -Silence is golden
> >>>  +mark 'end' does not exist
> >>>
> >>> This patch serialize submitting super secotrs to make sure each super
> >>> sectors are written to log disk in order.
> >>>
> >>> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
> >>
> >> This doesn't feel right.
> >>
> >> You raised 2 things you're trying to address:
> >> 1) IO is out of order
> >> 2) accounting (nr_entries) isn't correct
> >>
> >> I'll need to reviewer closer but serializing "super" bios doesn't seem
> >> like the best fix.
> >>
> >> Josef, any chance you can weigh in on this?  AFAIK you are still "the
> >> man" for dm-log-writes ;)
> >>
> > 
> > Well the #2 is caused by #1, we submit the bio for a super two times in a row
> > and it's a crapshoot which one makes it to disk.  So he's right, and it's kind
> > of funny because this is the sort of problem that dm-log-writes was written to
> > catch, and I fucked it up here ;).  That being said this is a bit
> > over-engineered, can we just add a completion to the log buff and do a
> > wait_for_completion() when we're writing out the super?  It's not like this thing
> > needs to be super performant.  Thanks,
> > 
> 
> Hi, Josef
> 
> Thanks for your suggestions. It's fine by me to switch to use completion
> instead. Some thing like this?
> 
> ...
> @@ -180,6 +182,15 @@ static void log_end_io(struct bio *bio)
>         bio_put(bio);
>  }
> 
> +static void log_end_super(struct bio *bio)
> +{
> +       struct log_writes_c *lc = bio->bi_private;
> +
> +       /* Wake up log-write kthread that super has been written */
> +       complete(&lc->super_comp);
> +       log_end_io(bio);
> +}
> +
>  /*
>   * Meant to be called if there is an error, it will free all the pages
>   * associated with the block.
> @@ -215,7 +226,8 @@ static int write_metadata(struct log_writes_c *lc, void *entry,
>         bio->bi_iter.bi_size = 0;
>         bio->bi_iter.bi_sector = sector;
>         bio_set_dev(bio, lc->logdev->bdev);
> -       bio->bi_end_io = log_end_io;
> +       bio->bi_end_io = (sector == WRITE_LOG_SUPER_SECTOR) ?
> +                         log_end_super : log_end_io;
>         bio->bi_private = lc;
>         bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
> 
> @@ -418,11 +430,19 @@ static int log_super(struct log_writes_c *lc)
>         super.nr_entries = cpu_to_le64(lc->logged_entries);
>         super.sectorsize = cpu_to_le32(lc->sectorsize);
> 
> -       if (write_metadata(lc, &super, sizeof(super), NULL, 0, 0)) {
> +       if (write_metadata(lc, &super, sizeof(super), NULL, 0,
> +                          WRITE_LOG_SUPER_SECTOR)) {
>                 DMERR("Couldn't write super");
>                 return -1;
>         }
> 
> +       /*
> +        * Super sector should be writen in-order, or else the
> +        * nr_entries could be rewritten by the old bio and small
> +        * than the real submitted entries.
> +        */
> +       wait_for_completion_io(&lc->super_comp);
> +
>         return 0;
>  }
> ...
> 
> But one thing need to mention is that, IIUC, If we use completion, the
> log_writes_kthread have to wait on writing out every super bio, so it cannot
> keep on submitting subsequent log bios. It maybe more performance impact
> than my v1 (it only wait on previous super if it was not finished).
>

Yeah but like I said, we're not worried about performance here.  It's in an
async thread, the fs will never be waiting on these IO's to finish.  It may slow
down the end of xfstests that need to wait for the log to finish flushing, but I
think that's probably fine.

Another solution is to only write the super when we destroy the device, cause
frankly we don't really need the super until we're done anyway.  Thanks,

Josef 

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-06-04 17:46 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-03 14:18 [PATCH] dm log writes: make sure the log super sectors are written in order zhangyi (F)
2019-06-03 14:46 ` Mike Snitzer
2019-06-03 19:02   ` Josef Bacik
2019-06-04  4:20     ` zhangyi (F)
2019-06-04 17:46       ` Josef Bacik

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.