* [PATCH] dm writecache: fix data corruption when reloading the target
@ 2020-04-08 19:02 Mikulas Patocka
  2020-04-14 19:05 ` Mike Snitzer
  0 siblings, 1 reply; 8+ messages in thread
From: Mikulas Patocka @ 2020-04-08 19:02 UTC (permalink / raw)
  To: Mike Snitzer, David Teigland; +Cc: dm-devel

The dm-writecache reads metadata in the target constructor. However, when 
we reload the target, there could be another active instance running on 
the same device. This is the sequence of operations when doing a reload:

1. construct new target
2. suspend old target
3. resume new target
4. destroy old target

Metadata that were written by the old target between steps 1 and 2 would
not be visible to the new target.

This patch fixes the data corruption by loading the metadata in the resume
handler.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org	# v4.18+
Fixes: 48debafe4f2f ("dm: add writecache target")

---
 drivers/md/dm-writecache.c |   44 ++++++++++++++++++++++++++++++--------------
 1 file changed, 30 insertions(+), 14 deletions(-)

Index: linux-2.6/drivers/md/dm-writecache.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-writecache.c	2020-04-08 14:47:17.000000000 +0200
+++ linux-2.6/drivers/md/dm-writecache.c	2020-04-08 20:59:15.000000000 +0200
@@ -931,6 +931,24 @@ static int writecache_alloc_entries(stru
 	return 0;
 }
 
+static int writecache_read_metadata(struct dm_writecache *wc, sector_t n_sectors)
+{
+	struct dm_io_region region;
+	struct dm_io_request req;
+
+	region.bdev = wc->ssd_dev->bdev;
+	region.sector = wc->start_sector;
+	region.count = wc->metadata_sectors;
+	req.bi_op = REQ_OP_READ;
+	req.bi_op_flags = REQ_SYNC;
+	req.mem.type = DM_IO_VMA;
+	req.mem.ptr.vma = (char *)wc->memory_map;
+	req.client = wc->dm_io;
+	req.notify.fn = NULL;
+
+	return dm_io(&req, 1, &region, NULL);
+}
+
 static void writecache_resume(struct dm_target *ti)
 {
 	struct dm_writecache *wc = ti->private;
@@ -941,8 +959,16 @@ static void writecache_resume(struct dm_
 
 	wc_lock(wc);
 
-	if (WC_MODE_PMEM(wc))
+	if (WC_MODE_PMEM(wc)) {
 		persistent_memory_invalidate_cache(wc->memory_map, wc->memory_map_size);
+	} else {
+		r = writecache_read_metadata(wc, wc->metadata_sectors);
+		if (r) {
+			writecache_error(wc, r, "unable to read metadata: %d", r);
+			memset((char *)wc->memory_map + offsetof(struct wc_memory_superblock, entries), -1,
+			       (wc->metadata_sectors << SECTOR_SHIFT) - offsetof(struct wc_memory_superblock, entries));
+		}
+	}
 
 	wc->tree = RB_ROOT;
 	INIT_LIST_HEAD(&wc->lru);
@@ -2200,8 +2226,6 @@ invalid_optional:
 			goto bad;
 		}
 	} else {
-		struct dm_io_region region;
-		struct dm_io_request req;
 		size_t n_blocks, n_metadata_blocks;
 		uint64_t n_bitmap_bits;
 
@@ -2258,17 +2282,9 @@ invalid_optional:
 			goto bad;
 		}
 
-		region.bdev = wc->ssd_dev->bdev;
-		region.sector = wc->start_sector;
-		region.count = wc->metadata_sectors;
-		req.bi_op = REQ_OP_READ;
-		req.bi_op_flags = REQ_SYNC;
-		req.mem.type = DM_IO_VMA;
-		req.mem.ptr.vma = (char *)wc->memory_map;
-		req.client = wc->dm_io;
-		req.notify.fn = NULL;
-
-		r = dm_io(&req, 1, &region, NULL);
+		r = writecache_read_metadata(wc,
+			min((sector_t)bdev_logical_block_size(wc->ssd_dev->bdev) >> SECTOR_SHIFT,
+			    (sector_t)wc->metadata_sectors));
 		if (r) {
 			ti->error = "Unable to read metadata";
 			goto bad;


* Re: dm writecache: fix data corruption when reloading the target
  2020-04-08 19:02 [PATCH] dm writecache: fix data corruption when reloading the target Mikulas Patocka
@ 2020-04-14 19:05 ` Mike Snitzer
  2020-04-15  8:14   ` Mikulas Patocka
  0 siblings, 1 reply; 8+ messages in thread
From: Mike Snitzer @ 2020-04-14 19:05 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: dm-devel, David Teigland

On Wed, Apr 08 2020 at  3:02pm -0400,
Mikulas Patocka <mpatocka@redhat.com> wrote:

> The dm-writecache reads metadata in the target constructor. However, when 
> we reload the target, there could be another active instance running on 
> the same device. This is the sequence of operations when doing a reload:
> 
> 1. construct new target
> 2. suspend old target
> 3. resume new target
> 4. destroy old target
> 
> Metadata that were written by the old target between steps 1 and 2 would
> not be visible to the new target.
> 
> This patch fixes the data corruption by loading the metadata in the resume
> handler.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Cc: stable@vger.kernel.org	# v4.18+
> Fixes: 48debafe4f2f ("dm: add writecache target")
> 
> ---
>  drivers/md/dm-writecache.c |   44 ++++++++++++++++++++++++++++++--------------
>  1 file changed, 30 insertions(+), 14 deletions(-)
> 
> Index: linux-2.6/drivers/md/dm-writecache.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm-writecache.c	2020-04-08 14:47:17.000000000 +0200
> +++ linux-2.6/drivers/md/dm-writecache.c	2020-04-08 20:59:15.000000000 +0200
> @@ -931,6 +931,24 @@ static int writecache_alloc_entries(stru
>  	return 0;
>  }
>  
> +static int writecache_read_metadata(struct dm_writecache *wc, sector_t n_sectors)
> +{
> +	struct dm_io_region region;
> +	struct dm_io_request req;
> +
> +	region.bdev = wc->ssd_dev->bdev;
> +	region.sector = wc->start_sector;
> +	region.count = wc->metadata_sectors;
> +	req.bi_op = REQ_OP_READ;
> +	req.bi_op_flags = REQ_SYNC;
> +	req.mem.type = DM_IO_VMA;
> +	req.mem.ptr.vma = (char *)wc->memory_map;
> +	req.client = wc->dm_io;
> +	req.notify.fn = NULL;
> +
> +	return dm_io(&req, 1, &region, NULL);
> +}
> +

You aren't using the passed n_sectors (for region.count?)


>  static void writecache_resume(struct dm_target *ti)
>  {
>  	struct dm_writecache *wc = ti->private;
> @@ -941,8 +959,16 @@ static void writecache_resume(struct dm_
>  
>  	wc_lock(wc);
>  
> -	if (WC_MODE_PMEM(wc))
> +	if (WC_MODE_PMEM(wc)) {
>  		persistent_memory_invalidate_cache(wc->memory_map, wc->memory_map_size);
> +	} else {
> +		r = writecache_read_metadata(wc, wc->metadata_sectors);
> +		if (r) {
> +			writecache_error(wc, r, "unable to read metadata: %d", r);
> +			memset((char *)wc->memory_map + offsetof(struct wc_memory_superblock, entries), -1,
> +			       (wc->metadata_sectors << SECTOR_SHIFT) - offsetof(struct wc_memory_superblock, entries));
> +		}
> +	}
>  
>  	wc->tree = RB_ROOT;
>  	INIT_LIST_HEAD(&wc->lru);
> @@ -2200,8 +2226,6 @@ invalid_optional:
>  			goto bad;
>  		}
>  	} else {
> -		struct dm_io_region region;
> -		struct dm_io_request req;
>  		size_t n_blocks, n_metadata_blocks;
>  		uint64_t n_bitmap_bits;
>  
> @@ -2258,17 +2282,9 @@ invalid_optional:
>  			goto bad;
>  		}
>  
> -		region.bdev = wc->ssd_dev->bdev;
> -		region.sector = wc->start_sector;
> -		region.count = wc->metadata_sectors;
> -		req.bi_op = REQ_OP_READ;
> -		req.bi_op_flags = REQ_SYNC;
> -		req.mem.type = DM_IO_VMA;
> -		req.mem.ptr.vma = (char *)wc->memory_map;
> -		req.client = wc->dm_io;
> -		req.notify.fn = NULL;
> -
> -		r = dm_io(&req, 1, &region, NULL);
> +		r = writecache_read_metadata(wc,
> +			min((sector_t)bdev_logical_block_size(wc->ssd_dev->bdev) >> SECTOR_SHIFT,
> +			    (sector_t)wc->metadata_sectors));

Can you explain why this is needed?  Why isn't wc->metadata_sectors
already compatible with wc->ssd_dev->bdev ?

Yet you just use wc->metadata_sectors in the new call to
writecache_read_metadata() in writecache_resume()...

Mike


* Re: dm writecache: fix data corruption when reloading the target
  2020-04-14 19:05 ` Mike Snitzer
@ 2020-04-15  8:14   ` Mikulas Patocka
  2020-04-15 12:31     ` [PATCH v2] " Mikulas Patocka
  2020-04-15 13:01     ` Mike Snitzer
  0 siblings, 2 replies; 8+ messages in thread
From: Mikulas Patocka @ 2020-04-15  8:14 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, David Teigland



On Tue, 14 Apr 2020, Mike Snitzer wrote:

> On Wed, Apr 08 2020 at  3:02pm -0400,
> Mikulas Patocka <mpatocka@redhat.com> wrote:
> 
> > The dm-writecache reads metadata in the target constructor. However, when 
> > we reload the target, there could be another active instance running on 
> > the same device. This is the sequence of operations when doing a reload:
> > 
> > 1. construct new target
> > 2. suspend old target
> > 3. resume new target
> > 4. destroy old target
> > 
> > Metadata that were written by the old target between steps 1 and 2 would
> > not be visible to the new target.
> > 
> > This patch fixes the data corruption by loading the metadata in the resume
> > handler.
> > 
> > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > Cc: stable@vger.kernel.org	# v4.18+
> > Fixes: 48debafe4f2f ("dm: add writecache target")
> > 
> > ---
> >  drivers/md/dm-writecache.c |   44 ++++++++++++++++++++++++++++++--------------
> >  1 file changed, 30 insertions(+), 14 deletions(-)
> > 
> > Index: linux-2.6/drivers/md/dm-writecache.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/md/dm-writecache.c	2020-04-08 14:47:17.000000000 +0200
> > +++ linux-2.6/drivers/md/dm-writecache.c	2020-04-08 20:59:15.000000000 +0200
> > @@ -931,6 +931,24 @@ static int writecache_alloc_entries(stru
> >  	return 0;
> >  }
> >  
> > +static int writecache_read_metadata(struct dm_writecache *wc, sector_t n_sectors)
> > +{
> > +	struct dm_io_region region;
> > +	struct dm_io_request req;
> > +
> > +	region.bdev = wc->ssd_dev->bdev;
> > +	region.sector = wc->start_sector;
> > +	region.count = wc->metadata_sectors;
> > +	req.bi_op = REQ_OP_READ;
> > +	req.bi_op_flags = REQ_SYNC;
> > +	req.mem.type = DM_IO_VMA;
> > +	req.mem.ptr.vma = (char *)wc->memory_map;
> > +	req.client = wc->dm_io;
> > +	req.notify.fn = NULL;
> > +
> > +	return dm_io(&req, 1, &region, NULL);
> > +}
> > +
> 
> You aren't using the passed n_sectors (for region.count?)
> 
> 
> >  static void writecache_resume(struct dm_target *ti)
> >  {
> >  	struct dm_writecache *wc = ti->private;
> > @@ -941,8 +959,16 @@ static void writecache_resume(struct dm_
> >  
> >  	wc_lock(wc);
> >  
> > -	if (WC_MODE_PMEM(wc))
> > +	if (WC_MODE_PMEM(wc)) {
> >  		persistent_memory_invalidate_cache(wc->memory_map, wc->memory_map_size);
> > +	} else {
> > +		r = writecache_read_metadata(wc, wc->metadata_sectors);
> > +		if (r) {
> > +			writecache_error(wc, r, "unable to read metadata: %d", r);
> > +			memset((char *)wc->memory_map + offsetof(struct wc_memory_superblock, entries), -1,
> > +			       (wc->metadata_sectors << SECTOR_SHIFT) - offsetof(struct wc_memory_superblock, entries));
> > +		}
> > +	}
> >  
> >  	wc->tree = RB_ROOT;
> >  	INIT_LIST_HEAD(&wc->lru);
> > @@ -2200,8 +2226,6 @@ invalid_optional:
> >  			goto bad;
> >  		}
> >  	} else {
> > -		struct dm_io_region region;
> > -		struct dm_io_request req;
> >  		size_t n_blocks, n_metadata_blocks;
> >  		uint64_t n_bitmap_bits;
> >  
> > @@ -2258,17 +2282,9 @@ invalid_optional:
> >  			goto bad;
> >  		}
> >  
> > -		region.bdev = wc->ssd_dev->bdev;
> > -		region.sector = wc->start_sector;
> > -		region.count = wc->metadata_sectors;
> > -		req.bi_op = REQ_OP_READ;
> > -		req.bi_op_flags = REQ_SYNC;
> > -		req.mem.type = DM_IO_VMA;
> > -		req.mem.ptr.vma = (char *)wc->memory_map;
> > -		req.client = wc->dm_io;
> > -		req.notify.fn = NULL;
> > -
> > -		r = dm_io(&req, 1, &region, NULL);
> > +		r = writecache_read_metadata(wc,
> > +			min((sector_t)bdev_logical_block_size(wc->ssd_dev->bdev) >> SECTOR_SHIFT,
> > +			    (sector_t)wc->metadata_sectors));
> 
> Can you explain why this is needed?  Why isn't wc->metadata_sectors
> already compatible with wc->ssd_dev->bdev ?

bdev_logical_block_size is the minimum size accepted by the device. If we
used just bdev_logical_block_size(wc->ssd_dev->bdev), someone could (by
using an extremely small device with a large logical_block_size) trigger
writing past the end of the allocated memory.
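
For a concrete (hypothetical) illustration of the clamp quoted above, using
variable names that are only for exposition: a 4096-byte logical block size
is 4096 >> SECTOR_SHIFT = 8 sectors, so the constructor reads
min(8, wc->metadata_sectors) sectors, i.e. at least one logical block but
never more than the vmalloc()ed metadata buffer.

	/* illustration only, with assumed example values */
	sector_t lbs_sectors = bdev_logical_block_size(wc->ssd_dev->bdev)
			       >> SECTOR_SHIFT;	/* e.g. 4096 >> 9 = 8 */
	sector_t n_sectors = min(lbs_sectors, (sector_t)wc->metadata_sectors);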

> Yet you just use wc->metadata_sectors in the new call to
> writecache_read_metadata() in writecache_resume()...

This was my mistake. Change it to "region.count = n_sectors";

> Mike

Mikulas


* [PATCH v2] dm writecache: fix data corruption when reloading the target
  2020-04-15  8:14   ` Mikulas Patocka
@ 2020-04-15 12:31     ` Mikulas Patocka
  2020-04-15 13:01     ` Mike Snitzer
  1 sibling, 0 replies; 8+ messages in thread
From: Mikulas Patocka @ 2020-04-15 12:31 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, David Teigland

Here I resubmit it.

The reason for this is that we want to read as little data as possible (that
is, bdev_logical_block_size); however, if the user has a pathological setup
where wc->metadata_sectors < bdev_logical_block_size >> SECTOR_SHIFT, we
don't want to over-read the allocated memory. Note that if
wc->metadata_sectors < bdev_logical_block_size >> SECTOR_SHIFT, it won't
work anyway, but I tried to prevent kernel memory corruption.

+               r = writecache_read_metadata(wc,
+                       min((sector_t)bdev_logical_block_size(wc->ssd_dev->bdev) >> SECTOR_SHIFT,
+                           (sector_t)wc->metadata_sectors));
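
To make the pathological case concrete (with assumed example numbers, not
taken from the patch): if wc->metadata_sectors were 1 (a 512-byte metadata
area) on a device with a 4096-byte logical block size, the clamp reads
min(8, 1) = 1 sector. That single-sector read is smaller than the device's
logical block size and will fail, so the constructor errors out, but it can
no longer overrun the 512-byte vmalloc()ed buffer.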



From: Mikulas Patocka <mpatocka@redhat.com>

dm writecache: fix data corruption when reloading the target

The dm-writecache reads metadata in the target constructor. However, when
we reload the target, there could be another active instance running on
the same device. This is the sequence of operations when doing a reload:

1. construct new target
2. suspend old target
3. resume new target
4. destroy old target

Metadata that were written by the old target between steps 1 and 2 would
not be visible to the new target.

This patch fixes the data corruption by loading the metadata in the resume
handler.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org	# v4.18+
Fixes: 48debafe4f2f ("dm: add writecache target")

---
 drivers/md/dm-writecache.c |   44 ++++++++++++++++++++++++++++++--------------
 1 file changed, 30 insertions(+), 14 deletions(-)

Index: linux-2.6/drivers/md/dm-writecache.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-writecache.c	2020-04-13 18:27:52.000000000 +0200
+++ linux-2.6/drivers/md/dm-writecache.c	2020-04-15 09:39:27.000000000 +0200
@@ -931,6 +931,24 @@ static int writecache_alloc_entries(stru
 	return 0;
 }
 
+static int writecache_read_metadata(struct dm_writecache *wc, sector_t n_sectors)
+{
+	struct dm_io_region region;
+	struct dm_io_request req;
+
+	region.bdev = wc->ssd_dev->bdev;
+	region.sector = wc->start_sector;
+	region.count = n_sectors;
+	req.bi_op = REQ_OP_READ;
+	req.bi_op_flags = REQ_SYNC;
+	req.mem.type = DM_IO_VMA;
+	req.mem.ptr.vma = (char *)wc->memory_map;
+	req.client = wc->dm_io;
+	req.notify.fn = NULL;
+
+	return dm_io(&req, 1, &region, NULL);
+}
+
 static void writecache_resume(struct dm_target *ti)
 {
 	struct dm_writecache *wc = ti->private;
@@ -941,8 +959,16 @@ static void writecache_resume(struct dm_
 
 	wc_lock(wc);
 
-	if (WC_MODE_PMEM(wc))
+	if (WC_MODE_PMEM(wc)) {
 		persistent_memory_invalidate_cache(wc->memory_map, wc->memory_map_size);
+	} else {
+		r = writecache_read_metadata(wc, wc->metadata_sectors);
+		if (r) {
+			writecache_error(wc, r, "unable to read metadata: %d", r);
+			memset((char *)wc->memory_map + offsetof(struct wc_memory_superblock, entries), -1,
+			       (wc->metadata_sectors << SECTOR_SHIFT) - offsetof(struct wc_memory_superblock, entries));
+		}
+	}
 
 	wc->tree = RB_ROOT;
 	INIT_LIST_HEAD(&wc->lru);
@@ -2200,8 +2226,6 @@ invalid_optional:
 			goto bad;
 		}
 	} else {
-		struct dm_io_region region;
-		struct dm_io_request req;
 		size_t n_blocks, n_metadata_blocks;
 		uint64_t n_bitmap_bits;
 
@@ -2258,17 +2282,9 @@ invalid_optional:
 			goto bad;
 		}
 
-		region.bdev = wc->ssd_dev->bdev;
-		region.sector = wc->start_sector;
-		region.count = wc->metadata_sectors;
-		req.bi_op = REQ_OP_READ;
-		req.bi_op_flags = REQ_SYNC;
-		req.mem.type = DM_IO_VMA;
-		req.mem.ptr.vma = (char *)wc->memory_map;
-		req.client = wc->dm_io;
-		req.notify.fn = NULL;
-
-		r = dm_io(&req, 1, &region, NULL);
+		r = writecache_read_metadata(wc,
+			min((sector_t)bdev_logical_block_size(wc->ssd_dev->bdev) >> SECTOR_SHIFT,
+			    (sector_t)wc->metadata_sectors));
 		if (r) {
 			ti->error = "Unable to read metadata";
 			goto bad;


* Re: dm writecache: fix data corruption when reloading the target
  2020-04-15  8:14   ` Mikulas Patocka
  2020-04-15 12:31     ` [PATCH v2] " Mikulas Patocka
@ 2020-04-15 13:01     ` Mike Snitzer
  2020-04-15 14:49       ` Mikulas Patocka
  1 sibling, 1 reply; 8+ messages in thread
From: Mike Snitzer @ 2020-04-15 13:01 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: dm-devel, David Teigland

On Wed, Apr 15 2020 at  4:14am -0400,
Mikulas Patocka <mpatocka@redhat.com> wrote:

> 
> 
> On Tue, 14 Apr 2020, Mike Snitzer wrote:
> 
> > On Wed, Apr 08 2020 at  3:02pm -0400,
> > Mikulas Patocka <mpatocka@redhat.com> wrote:
> > 
> > > The dm-writecache reads metadata in the target constructor. However, when 
> > > we reload the target, there could be another active instance running on 
> > > the same device. This is the sequence of operations when doing a reload:
> > > 
> > > 1. construct new target
> > > 2. suspend old target
> > > 3. resume new target
> > > 4. destroy old target
> > > 
> > > Metadata that were written by the old target between steps 1 and 2 would
> > > not be visible to the new target.
> > > 
> > > This patch fixes the data corruption by loading the metadata in the resume
> > > handler.
> > > 
> > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > > Cc: stable@vger.kernel.org	# v4.18+
> > > Fixes: 48debafe4f2f ("dm: add writecache target")
> > > 
> > > ---
> > >  drivers/md/dm-writecache.c |   44 ++++++++++++++++++++++++++++++--------------
> > >  1 file changed, 30 insertions(+), 14 deletions(-)
> > > 
> > > Index: linux-2.6/drivers/md/dm-writecache.c
> > > ===================================================================
> > > --- linux-2.6.orig/drivers/md/dm-writecache.c	2020-04-08 14:47:17.000000000 +0200
> > > +++ linux-2.6/drivers/md/dm-writecache.c	2020-04-08 20:59:15.000000000 +0200
> > > @@ -931,6 +931,24 @@ static int writecache_alloc_entries(stru
> > >  	return 0;
> > >  }
> > >  
> > > +static int writecache_read_metadata(struct dm_writecache *wc, sector_t n_sectors)
> > > +{
> > > +	struct dm_io_region region;
> > > +	struct dm_io_request req;
> > > +
> > > +	region.bdev = wc->ssd_dev->bdev;
> > > +	region.sector = wc->start_sector;
> > > +	region.count = wc->metadata_sectors;
> > > +	req.bi_op = REQ_OP_READ;
> > > +	req.bi_op_flags = REQ_SYNC;
> > > +	req.mem.type = DM_IO_VMA;
> > > +	req.mem.ptr.vma = (char *)wc->memory_map;
> > > +	req.client = wc->dm_io;
> > > +	req.notify.fn = NULL;
> > > +
> > > +	return dm_io(&req, 1, &region, NULL);
> > > +}
> > > +
> > 
> > You aren't using the passed n_sectors (for region.count?)
> > 
> > 
> > >  static void writecache_resume(struct dm_target *ti)
> > >  {
> > >  	struct dm_writecache *wc = ti->private;
> > > @@ -941,8 +959,16 @@ static void writecache_resume(struct dm_
> > >  
> > >  	wc_lock(wc);
> > >  
> > > -	if (WC_MODE_PMEM(wc))
> > > +	if (WC_MODE_PMEM(wc)) {
> > >  		persistent_memory_invalidate_cache(wc->memory_map, wc->memory_map_size);
> > > +	} else {
> > > +		r = writecache_read_metadata(wc, wc->metadata_sectors);
> > > +		if (r) {
> > > +			writecache_error(wc, r, "unable to read metadata: %d", r);
> > > +			memset((char *)wc->memory_map + offsetof(struct wc_memory_superblock, entries), -1,
> > > +			       (wc->metadata_sectors << SECTOR_SHIFT) - offsetof(struct wc_memory_superblock, entries));
> > > +		}
> > > +	}
> > >  
> > >  	wc->tree = RB_ROOT;
> > >  	INIT_LIST_HEAD(&wc->lru);
> > > @@ -2200,8 +2226,6 @@ invalid_optional:
> > >  			goto bad;
> > >  		}
> > >  	} else {
> > > -		struct dm_io_region region;
> > > -		struct dm_io_request req;
> > >  		size_t n_blocks, n_metadata_blocks;
> > >  		uint64_t n_bitmap_bits;
> > >  
> > > @@ -2258,17 +2282,9 @@ invalid_optional:
> > >  			goto bad;
> > >  		}
> > >  
> > > -		region.bdev = wc->ssd_dev->bdev;
> > > -		region.sector = wc->start_sector;
> > > -		region.count = wc->metadata_sectors;
> > > -		req.bi_op = REQ_OP_READ;
> > > -		req.bi_op_flags = REQ_SYNC;
> > > -		req.mem.type = DM_IO_VMA;
> > > -		req.mem.ptr.vma = (char *)wc->memory_map;
> > > -		req.client = wc->dm_io;
> > > -		req.notify.fn = NULL;
> > > -
> > > -		r = dm_io(&req, 1, &region, NULL);
> > > +		r = writecache_read_metadata(wc,
> > > +			min((sector_t)bdev_logical_block_size(wc->ssd_dev->bdev) >> SECTOR_SHIFT,
> > > +			    (sector_t)wc->metadata_sectors));
> > 
> > Can you explain why this is needed?  Why isn't wc->metadata_sectors
> > already compatible with wc->ssd_dev->bdev ?
> 
> bdev_logical_block_size is the minimum size accepted by the device. If we 
> used just bdev_logical_block_size(wc->ssd_dev->bdev), someone could (by 
> using extremely small device with large logical_block_size) trigger 
> writing out of the allocated memory.

OK...
 
> > Yet you just use wc->metadata_sectors in the new call to
> > writecache_read_metadata() in writecache_resume()...
> 
> This was my mistake. Change it to "region.count = n_sectors";

sure, that addresses one aspect.  But I'm also asking:
given what you said above about reading past the end of a smaller device, why
is it safe to do this in writecache_resume?

r = writecache_read_metadata(wc, wc->metadata_sectors);

Shouldn't ctr do extra validation and then all calls to
writecache_read_metadata() use wc->metadata_sectors?  Which would remove
need to pass extra 'n_sectors' arg to writecache_read_metadata()?

Mike


* Re: dm writecache: fix data corruption when reloading the target
  2020-04-15 13:01     ` Mike Snitzer
@ 2020-04-15 14:49       ` Mikulas Patocka
  2020-04-15 14:53         ` Mike Snitzer
  2020-04-15 15:01         ` Mikulas Patocka
  0 siblings, 2 replies; 8+ messages in thread
From: Mikulas Patocka @ 2020-04-15 14:49 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, David Teigland



On Wed, 15 Apr 2020, Mike Snitzer wrote:

> > > > +		r = writecache_read_metadata(wc,
> > > > +			min((sector_t)bdev_logical_block_size(wc->ssd_dev->bdev) >> SECTOR_SHIFT,
> > > > +			    (sector_t)wc->metadata_sectors));
> > > 
> > > Can you explain why this is needed?  Why isn't wc->metadata_sectors
> > > already compatible with wc->ssd_dev->bdev ?
> > 
> > bdev_logical_block_size is the minimum size accepted by the device. If we
> > used just bdev_logical_block_size(wc->ssd_dev->bdev), someone could (by
> > using an extremely small device with a large logical_block_size) trigger
> > writing past the end of the allocated memory.
> 
> OK...
>  
> > > Yet you just use wc->metadata_sectors in the new call to
> > > writecache_read_metadata() in writecache_resume()...
> > 
> > This was my mistake. Change it to "region.count = n_sectors";
> 
> sure, that addresses one aspect.  But I'm also asking:
> given what you said above about reading past the end of a smaller device, why
> is it safe to do this in writecache_resume?
> 
> r = writecache_read_metadata(wc, wc->metadata_sectors);
> 
> Shouldn't ctr do extra validation and then all calls to
> writecache_read_metadata() use wc->metadata_sectors?  Which would remove
> need to pass extra 'n_sectors' arg to writecache_read_metadata()?
> 
> Mike

wc->memory_map = vmalloc(n_metadata_blocks << wc->block_size_bits);
...
wc->metadata_sectors = n_metadata_blocks << (wc->block_size_bits - SECTOR_SHIFT);

So we are always sure that we can read/write wc->metadata_sectors safely. 

The problem is - what if bdev_logical_block_size is larger than 
wc->metadata_sectors? Then, we would overread past the end of allocated 
memory. The device wouldn't work anyway in this case, so perhaps a better 
solution would be to reject this as an error in the constructor.
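
A sketch of the constructor-time check being suggested here (illustrative
only; the helper name and error string are made up and this is not code from
the posted patch):

	/*
	 * The device cannot service a read smaller than its logical block
	 * size, and reading a full logical block would overrun the
	 * vmalloc()ed wc->memory_map, so such a setup cannot work; fail
	 * construction up front.
	 */
	static int writecache_validate_metadata_size(struct dm_writecache *wc,
						     struct dm_target *ti)
	{
		sector_t lbs_sectors =
			bdev_logical_block_size(wc->ssd_dev->bdev) >> SECTOR_SHIFT;

		if (lbs_sectors > wc->metadata_sectors) {
			ti->error = "Metadata device too small for its logical block size";
			return -EINVAL;
		}
		return 0;
	}

With a check like this in the constructor, later calls such as
writecache_read_metadata(wc, wc->metadata_sectors) stay within the allocated
buffer, and passing a smaller n_sectors in the constructor becomes purely an
optimization rather than a safety requirement.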

Mikulas


* Re: dm writecache: fix data corruption when reloading the target
  2020-04-15 14:49       ` Mikulas Patocka
@ 2020-04-15 14:53         ` Mike Snitzer
  2020-04-15 15:01         ` Mikulas Patocka
  1 sibling, 0 replies; 8+ messages in thread
From: Mike Snitzer @ 2020-04-15 14:53 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: dm-devel, David Teigland

On Wed, Apr 15 2020 at 10:49am -0400,
Mikulas Patocka <mpatocka@redhat.com> wrote:

> 
> 
> On Wed, 15 Apr 2020, Mike Snitzer wrote:
> 
> > > > > +		r = writecache_read_metadata(wc,
> > > > > +			min((sector_t)bdev_logical_block_size(wc->ssd_dev->bdev) >> SECTOR_SHIFT,
> > > > > +			    (sector_t)wc->metadata_sectors));
> > > > 
> > > > Can you explain why this is needed?  Why isn't wc->metadata_sectors
> > > > already compatible with wc->ssd_dev->bdev ?
> > > 
> > > bdev_logical_block_size is the minimum size accepted by the device. If we
> > > used just bdev_logical_block_size(wc->ssd_dev->bdev), someone could (by
> > > using an extremely small device with a large logical_block_size) trigger
> > > writing past the end of the allocated memory.
> > 
> > OK...
> >  
> > > > Yet you just use wc->metadata_sectors in the new call to
> > > > writecache_read_metadata() in writecache_resume()...
> > > 
> > > This was my mistake. Change it to "region.count = n_sectors";
> > 
> > sure, that addresses one aspect.  But I'm also asking:
> > given what you said above about reading past the end of a smaller device, why
> > is it safe to do this in writecache_resume?
> > 
> > r = writecache_read_metadata(wc, wc->metadata_sectors);
> > 
> > Shouldn't ctr do extra validation and then all calls to
> > writecache_read_metadata() use wc->metadata_sectors?  Which would remove
> > need to pass extra 'n_sectors' arg to writecache_read_metadata()?
> > 
> > Mike
> 
> wc->memory_map = vmalloc(n_metadata_blocks << wc->block_size_bits);
> ...
> wc->metadata_sectors = n_metadata_blocks << (wc->block_size_bits - SECTOR_SHIFT);
> 
> So we are always sure that we can read/write wc->metadata_sectors safely. 
> 
> The problem is - what if bdev_logical_block_size is larger than 
> wc->metadata_sectors? Then, we would overread past the end of allocated 
> memory. The device wouldn't work anyway in this case, so perhaps a better 
> solution would be to reject this as an error in the constructor.

Yes, please reject in ctr.  No point allowing writecache to limp along
only to fail IO later.

By failing accordingly in ctr that'll allow writecache_read_metadata()
to not need an n_sectors override.

Thanks,
Mike


* Re: dm writecache: fix data corruption when reloading the target
  2020-04-15 14:49       ` Mikulas Patocka
  2020-04-15 14:53         ` Mike Snitzer
@ 2020-04-15 15:01         ` Mikulas Patocka
  1 sibling, 0 replies; 8+ messages in thread
From: Mikulas Patocka @ 2020-04-15 15:01 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, David Teigland



On Wed, 15 Apr 2020, Mikulas Patocka wrote:

> 
> 
> On Wed, 15 Apr 2020, Mike Snitzer wrote:
> 
> > > > > +		r = writecache_read_metadata(wc,
> > > > > +			min((sector_t)bdev_logical_block_size(wc->ssd_dev->bdev) >> SECTOR_SHIFT,
> > > > > +			    (sector_t)wc->metadata_sectors));
> > > > 
> > > > Can you explain why this is needed?  Why isn't wc->metadata_sectors
> > > > already compatible with wc->ssd_dev->bdev ?
> > > 
> > > bdev_logical_block_size is the minimum size accepted by the device. If we
> > > used just bdev_logical_block_size(wc->ssd_dev->bdev), someone could (by
> > > using an extremely small device with a large logical_block_size) trigger
> > > writing past the end of the allocated memory.
> > 
> > OK...
> >  
> > > > Yet you just use wc->metadata_sectors in the new call to
> > > > writecache_read_metadata() in writecache_resume()...
> > > 
> > > This was my mistake. Change it to "region.count = n_sectors";
> > 
> > sure, that addresses one aspect.  But I'm also asking:
> > given what you said above about reading past the end of a smaller device, why
> > is it safe to do this in writecache_resume?
> > 
> > r = writecache_read_metadata(wc, wc->metadata_sectors);
> > 
> > Shouldn't ctr do extra validation and then all calls to
> > writecache_read_metadata() use wc->metadata_sectors?  Which would remove
> > need to pass extra 'n_sectors' arg to writecache_read_metadata()?
> > 
> > Mike
> 
> wc->memory_map = vmalloc(n_metadata_blocks << wc->block_size_bits);
> ...
> wc->metadata_sectors = n_metadata_blocks << (wc->block_size_bits - SECTOR_SHIFT);
> 
> So we are always sure that we can read/write wc->metadata_sectors safely. 
> 
> The problem is - what if bdev_logical_block_size is larger than 
> wc->metadata_sectors? Then, we would overread past the end of allocated 
> memory. The device wouldn't work anyway in this case, so perhaps a better 
> solution would be to reject this as an error in the constructor.
> 
> Mikulas

... or, we can use wc->block_size >> SECTOR_SHIFT. It is guaranteed that 
n_metadata_blocks is at least one block, so it won't over-read past the
end of the device.

The problem with bdev_logical_block_size is that it may change if the 
device under us is reloaded, so it is not safe to rely on it being stable.
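
A sketch of that alternative for the constructor-side read (an illustration
of the suggestion, not the posted patch; the error string is made up):

	/*
	 * wc->block_size is a property of the writecache target itself, so it
	 * cannot change when the underlying device is reloaded, and the
	 * metadata area always spans at least one such block.
	 */
	r = writecache_read_metadata(wc, wc->block_size >> SECTOR_SHIFT);
	if (r) {
		ti->error = "Unable to read first block of metadata";
		goto bad;
	}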

Mikulas

