All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] dm clone: Fix discard handling and overflow bugs which could cause data corruption
@ 2020-03-27 14:01 Nikos Tsironis
  2020-03-27 14:01 ` [PATCH 1/4] dm clone: Fix handling of partial region discards Nikos Tsironis
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Nikos Tsironis @ 2020-03-27 14:01 UTC (permalink / raw)
  To: agk, snitzer, dm-devel; +Cc: ntsironis

There is a bug in the way dm-clone handles partial region discards,
which can lead to discarding the wrong blocks or trying to discard
blocks beyond the end of the device.

This could lead to data corruption, if the destination device indeed
discards the underlying blocks, i.e., if the discard operation results
in the original contents of a block to be lost.

The bug manifests when we try to discard part of a single region, i.e.,
when we try to discard a block with size < region_size, and the discard
request both starts at an offset with respect to the beginning of that
region and ends before the end of the region.

The root of the bug is the code that calculates the range of regions
covered by a discard request and decides which regions to discard.

For more information, please see the relevant commit.

As part of fixing this bug, I also audited dm-clone for other
arithmetic/overflow related bugs and found the following:

1. Missing overflow check for the total number of regions
2. Missing casts when converting from regions to sectors
3. Wrong return type of dm_clone_nr_of_hydrated_regions(), which caused
   an unwanted sign extension to occur.

Again, more information can be found in the relevant commits.

Nikos Tsironis (4):
  dm clone: Fix handling of partial region discards
  dm clone: Add overflow check for number of regions
  dm clone: Add missing casts to prevent overflows and data corruption
  dm clone metadata: Fix return type of
    dm_clone_nr_of_hydrated_regions()

 drivers/md/dm-clone-metadata.c | 15 +++++++++-
 drivers/md/dm-clone-metadata.h |  2 +-
 drivers/md/dm-clone-target.c   | 66 ++++++++++++++++++++++++++++++------------
 3 files changed, 62 insertions(+), 21 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/4] dm clone: Fix handling of partial region discards
  2020-03-27 14:01 [PATCH 0/4] dm clone: Fix discard handling and overflow bugs which could cause data corruption Nikos Tsironis
@ 2020-03-27 14:01 ` Nikos Tsironis
  2020-03-27 14:01 ` [PATCH 2/4] dm clone: Add overflow check for number of regions Nikos Tsironis
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Nikos Tsironis @ 2020-03-27 14:01 UTC (permalink / raw)
  To: agk, snitzer, dm-devel; +Cc: ntsironis

There is a bug in the way dm-clone handles discards, which can lead to
discarding the wrong blocks or trying to discard blocks beyond the end
of the device.

This could lead to data corruption, if the destination device indeed
discards the underlying blocks, i.e., if the discard operation results
in the original contents of a block to be lost.

The root of the problem is the code that calculates the range of regions
covered by a discard request and decides which regions to discard.

Since dm-clone handles the device in units of regions, we don't discard
parts of a region, only whole regions.

The range is calculated as:

    rs = dm_sector_div_up(bio->bi_iter.bi_sector, clone->region_size);
    re = bio_end_sector(bio) >> clone->region_shift;

, where 'rs' is the first region to discard and (re - rs) is the number
of regions to discard.

The bug manifests when we try to discard part of a single region, i.e.,
when we try to discard a block with size < region_size, and the discard
request both starts at an offset with respect to the beginning of that
region and ends before the end of the region.

The root cause is the following comparison:

  if (rs == re)
    // skip discard and complete original bio immediately

, which doesn't take into account that 'rs' might be greater than 're'.

Thus, we then issue a discard request for the wrong blocks, instead of
skipping the discard all together.

Fix the check to also take into account the above case, so we don't end
up discarding the wrong blocks.

Also, add some range checks to dm_clone_set_region_hydrated() and
dm_clone_cond_set_range(), which update dm-clone's region bitmap.

Note that the aforementioned bug doesn't cause invalid memory accesses,
because dm_clone_is_range_hydrated() returns True for this case, so the
checks are just precautionary.

Fixes: 7431b7835f55 ("dm: add clone target")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
---
 drivers/md/dm-clone-metadata.c | 13 +++++++++++++
 drivers/md/dm-clone-target.c   | 43 ++++++++++++++++++++++++++++--------------
 2 files changed, 42 insertions(+), 14 deletions(-)

diff --git a/drivers/md/dm-clone-metadata.c b/drivers/md/dm-clone-metadata.c
index c05b12110456..199e7af00858 100644
--- a/drivers/md/dm-clone-metadata.c
+++ b/drivers/md/dm-clone-metadata.c
@@ -850,6 +850,12 @@ int dm_clone_set_region_hydrated(struct dm_clone_metadata *cmd, unsigned long re
 	struct dirty_map *dmap;
 	unsigned long word, flags;
 
+	if (unlikely(region_nr >= cmd->nr_regions)) {
+		DMERR("Region %lu out of range (total number of regions %lu)",
+		      region_nr, cmd->nr_regions);
+		return -ERANGE;
+	}
+
 	word = region_nr / BITS_PER_LONG;
 
 	spin_lock_irqsave(&cmd->bitmap_lock, flags);
@@ -879,6 +885,13 @@ int dm_clone_cond_set_range(struct dm_clone_metadata *cmd, unsigned long start,
 	struct dirty_map *dmap;
 	unsigned long word, region_nr;
 
+	if (unlikely(start >= cmd->nr_regions || (start + nr_regions) < start ||
+		     (start + nr_regions) > cmd->nr_regions)) {
+		DMERR("Invalid region range: start %lu, nr_regions %lu (total number of regions %lu)",
+		      start, nr_regions, cmd->nr_regions);
+		return -ERANGE;
+	}
+
 	spin_lock_irq(&cmd->bitmap_lock);
 
 	if (cmd->read_only) {
diff --git a/drivers/md/dm-clone-target.c b/drivers/md/dm-clone-target.c
index d1e1b5b56b1b..022dddcad647 100644
--- a/drivers/md/dm-clone-target.c
+++ b/drivers/md/dm-clone-target.c
@@ -293,10 +293,17 @@ static inline unsigned long bio_to_region(struct clone *clone, struct bio *bio)
 
 /* Get the region range covered by the bio */
 static void bio_region_range(struct clone *clone, struct bio *bio,
-			     unsigned long *rs, unsigned long *re)
+			     unsigned long *rs, unsigned long *nr_regions)
 {
+	unsigned long end;
+
 	*rs = dm_sector_div_up(bio->bi_iter.bi_sector, clone->region_size);
-	*re = bio_end_sector(bio) >> clone->region_shift;
+	end = bio_end_sector(bio) >> clone->region_shift;
+
+	if (*rs >= end)
+		*nr_regions = 0;
+	else
+		*nr_regions = end - *rs;
 }
 
 /* Check whether a bio overwrites a region */
@@ -454,7 +461,7 @@ static void trim_bio(struct bio *bio, sector_t sector, unsigned int len)
 
 static void complete_discard_bio(struct clone *clone, struct bio *bio, bool success)
 {
-	unsigned long rs, re;
+	unsigned long rs, nr_regions;
 
 	/*
 	 * If the destination device supports discards, remap and trim the
@@ -463,9 +470,9 @@ static void complete_discard_bio(struct clone *clone, struct bio *bio, bool succ
 	 */
 	if (test_bit(DM_CLONE_DISCARD_PASSDOWN, &clone->flags) && success) {
 		remap_to_dest(clone, bio);
-		bio_region_range(clone, bio, &rs, &re);
+		bio_region_range(clone, bio, &rs, &nr_regions);
 		trim_bio(bio, rs << clone->region_shift,
-			 (re - rs) << clone->region_shift);
+			 nr_regions << clone->region_shift);
 		generic_make_request(bio);
 	} else
 		bio_endio(bio);
@@ -473,12 +480,21 @@ static void complete_discard_bio(struct clone *clone, struct bio *bio, bool succ
 
 static void process_discard_bio(struct clone *clone, struct bio *bio)
 {
-	unsigned long rs, re;
+	unsigned long rs, nr_regions;
 
-	bio_region_range(clone, bio, &rs, &re);
-	BUG_ON(re > clone->nr_regions);
+	bio_region_range(clone, bio, &rs, &nr_regions);
+	if (!nr_regions) {
+		bio_endio(bio);
+		return;
+	}
 
-	if (unlikely(rs == re)) {
+	if (WARN_ON(rs >= clone->nr_regions || (rs + nr_regions) < rs ||
+		    (rs + nr_regions) > clone->nr_regions)) {
+		DMERR("%s: Invalid range (%lu + %lu, total regions %lu) for discard (%llu + %u)",
+		      clone_device_name(clone), rs, nr_regions,
+		      clone->nr_regions,
+		      (unsigned long long)bio->bi_iter.bi_sector,
+		      bio_sectors(bio));
 		bio_endio(bio);
 		return;
 	}
@@ -487,7 +503,7 @@ static void process_discard_bio(struct clone *clone, struct bio *bio)
 	 * The covered regions are already hydrated so we just need to pass
 	 * down the discard.
 	 */
-	if (dm_clone_is_range_hydrated(clone->cmd, rs, re - rs)) {
+	if (dm_clone_is_range_hydrated(clone->cmd, rs, nr_regions)) {
 		complete_discard_bio(clone, bio, true);
 		return;
 	}
@@ -1169,7 +1185,7 @@ static void process_deferred_discards(struct clone *clone)
 	int r = -EPERM;
 	struct bio *bio;
 	struct blk_plug plug;
-	unsigned long rs, re;
+	unsigned long rs, nr_regions;
 	struct bio_list discards = BIO_EMPTY_LIST;
 
 	spin_lock_irq(&clone->lock);
@@ -1185,14 +1201,13 @@ static void process_deferred_discards(struct clone *clone)
 
 	/* Update the metadata */
 	bio_list_for_each(bio, &discards) {
-		bio_region_range(clone, bio, &rs, &re);
+		bio_region_range(clone, bio, &rs, &nr_regions);
 		/*
 		 * A discard request might cover regions that have been already
 		 * hydrated. There is no need to update the metadata for these
 		 * regions.
 		 */
-		r = dm_clone_cond_set_range(clone->cmd, rs, re - rs);
-
+		r = dm_clone_cond_set_range(clone->cmd, rs, nr_regions);
 		if (unlikely(r))
 			break;
 	}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/4] dm clone: Add overflow check for number of regions
  2020-03-27 14:01 [PATCH 0/4] dm clone: Fix discard handling and overflow bugs which could cause data corruption Nikos Tsironis
  2020-03-27 14:01 ` [PATCH 1/4] dm clone: Fix handling of partial region discards Nikos Tsironis
@ 2020-03-27 14:01 ` Nikos Tsironis
  2020-03-27 14:01 ` [PATCH 3/4] dm clone: Add missing casts to prevent overflows and data corruption Nikos Tsironis
  2020-03-27 14:01 ` [PATCH 4/4] dm clone metadata: Fix return type of dm_clone_nr_of_hydrated_regions() Nikos Tsironis
  3 siblings, 0 replies; 5+ messages in thread
From: Nikos Tsironis @ 2020-03-27 14:01 UTC (permalink / raw)
  To: agk, snitzer, dm-devel; +Cc: ntsironis

Add overflow check for clone->nr_regions variable, which holds the
number of regions of the target.

The overflow can occur with sufficiently large devices, if BITS_PER_LONG
== 32. E.g., if the region size is 8 sectors (4K), the overflow would
occur for device sizes > 34359738360 sectors (~16TB).

This could result in multiple device sectors wrongly mapping to the same
region number, due to the truncation from 64 bits to 32 bits, which
would lead to data corruption.

Fixes: 7431b7835f55 ("dm: add clone target")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
---
 drivers/md/dm-clone-target.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-clone-target.c b/drivers/md/dm-clone-target.c
index 022dddcad647..6ee85fb3388a 100644
--- a/drivers/md/dm-clone-target.c
+++ b/drivers/md/dm-clone-target.c
@@ -1790,6 +1790,7 @@ static int copy_ctr_args(struct clone *clone, int argc, const char **argv, char
 static int clone_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 {
 	int r;
+	sector_t nr_regions;
 	struct clone *clone;
 	struct dm_arg_set as;
 
@@ -1831,7 +1832,16 @@ static int clone_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		goto out_with_source_dev;
 
 	clone->region_shift = __ffs(clone->region_size);
-	clone->nr_regions = dm_sector_div_up(ti->len, clone->region_size);
+	nr_regions = dm_sector_div_up(ti->len, clone->region_size);
+
+	/* Check for overflow */
+	if (nr_regions != (unsigned long)nr_regions) {
+		ti->error = "Too many regions. Consider increasing the region size";
+		r = -EOVERFLOW;
+		goto out_with_source_dev;
+	}
+
+	clone->nr_regions = nr_regions;
 
 	r = validate_nr_regions(clone->nr_regions, &ti->error);
 	if (r)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 3/4] dm clone: Add missing casts to prevent overflows and data corruption
  2020-03-27 14:01 [PATCH 0/4] dm clone: Fix discard handling and overflow bugs which could cause data corruption Nikos Tsironis
  2020-03-27 14:01 ` [PATCH 1/4] dm clone: Fix handling of partial region discards Nikos Tsironis
  2020-03-27 14:01 ` [PATCH 2/4] dm clone: Add overflow check for number of regions Nikos Tsironis
@ 2020-03-27 14:01 ` Nikos Tsironis
  2020-03-27 14:01 ` [PATCH 4/4] dm clone metadata: Fix return type of dm_clone_nr_of_hydrated_regions() Nikos Tsironis
  3 siblings, 0 replies; 5+ messages in thread
From: Nikos Tsironis @ 2020-03-27 14:01 UTC (permalink / raw)
  To: agk, snitzer, dm-devel; +Cc: ntsironis

Add missing casts when converting from regions to sectors.

In case BITS_PER_LONG == 32, the lack of the appropriate casts can lead
to overflows and miscalculation of the device sector.

As a result, we could end up discarding and/or copying the wrong parts
of the device, thus corrupting the device's data.

Fixes: 7431b7835f55 ("dm: add clone target")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
---
 drivers/md/dm-clone-target.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-clone-target.c b/drivers/md/dm-clone-target.c
index 6ee85fb3388a..ca5020c58f7c 100644
--- a/drivers/md/dm-clone-target.c
+++ b/drivers/md/dm-clone-target.c
@@ -282,7 +282,7 @@ static bool bio_triggers_commit(struct clone *clone, struct bio *bio)
 /* Get the address of the region in sectors */
 static inline sector_t region_to_sector(struct clone *clone, unsigned long region_nr)
 {
-	return (region_nr << clone->region_shift);
+	return ((sector_t)region_nr << clone->region_shift);
 }
 
 /* Get the region number of the bio */
@@ -471,7 +471,7 @@ static void complete_discard_bio(struct clone *clone, struct bio *bio, bool succ
 	if (test_bit(DM_CLONE_DISCARD_PASSDOWN, &clone->flags) && success) {
 		remap_to_dest(clone, bio);
 		bio_region_range(clone, bio, &rs, &nr_regions);
-		trim_bio(bio, rs << clone->region_shift,
+		trim_bio(bio, region_to_sector(clone, rs),
 			 nr_regions << clone->region_shift);
 		generic_make_request(bio);
 	} else
@@ -804,11 +804,14 @@ static void hydration_copy(struct dm_clone_region_hydration *hd, unsigned int nr
 	struct dm_io_region from, to;
 	struct clone *clone = hd->clone;
 
+	if (WARN_ON(!nr_regions))
+		return;
+
 	region_size = clone->region_size;
 	region_start = hd->region_nr;
 	region_end = region_start + nr_regions - 1;
 
-	total_size = (nr_regions - 1) << clone->region_shift;
+	total_size = region_to_sector(clone, nr_regions - 1);
 
 	if (region_end == clone->nr_regions - 1) {
 		/*
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 4/4] dm clone metadata: Fix return type of dm_clone_nr_of_hydrated_regions()
  2020-03-27 14:01 [PATCH 0/4] dm clone: Fix discard handling and overflow bugs which could cause data corruption Nikos Tsironis
                   ` (2 preceding siblings ...)
  2020-03-27 14:01 ` [PATCH 3/4] dm clone: Add missing casts to prevent overflows and data corruption Nikos Tsironis
@ 2020-03-27 14:01 ` Nikos Tsironis
  3 siblings, 0 replies; 5+ messages in thread
From: Nikos Tsironis @ 2020-03-27 14:01 UTC (permalink / raw)
  To: agk, snitzer, dm-devel; +Cc: ntsironis

dm_clone_nr_of_hydrated_regions() returns the number of regions that
have been hydrated so far. In order to do so it employs bitmap_weight().

Until now, the return type of dm_clone_nr_of_hydrated_regions() was
unsigned long.

Because bitmap_weight() returns an int, in case BITS_PER_LONG == 64 and
the return value of bitmap_weight() is 2^31 (the maximum allowed number
of regions for a device), the result is sign extended from 32 bits to 64
bits and an incorrect value is displayed, in the status output of
dm-clone, as the number of hydrated regions.

Fix this by having dm_clone_nr_of_hydrated_regions() return an unsigned
int.

Fixes: 7431b7835f55 ("dm: add clone target")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
---
 drivers/md/dm-clone-metadata.c | 2 +-
 drivers/md/dm-clone-metadata.h | 2 +-
 drivers/md/dm-clone-target.c   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-clone-metadata.c b/drivers/md/dm-clone-metadata.c
index 199e7af00858..17712456fa63 100644
--- a/drivers/md/dm-clone-metadata.c
+++ b/drivers/md/dm-clone-metadata.c
@@ -656,7 +656,7 @@ bool dm_clone_is_range_hydrated(struct dm_clone_metadata *cmd,
 	return (bit >= (start + nr_regions));
 }
 
-unsigned long dm_clone_nr_of_hydrated_regions(struct dm_clone_metadata *cmd)
+unsigned int dm_clone_nr_of_hydrated_regions(struct dm_clone_metadata *cmd)
 {
 	return bitmap_weight(cmd->region_map, cmd->nr_regions);
 }
diff --git a/drivers/md/dm-clone-metadata.h b/drivers/md/dm-clone-metadata.h
index 14af1ebd853f..d848b8799c07 100644
--- a/drivers/md/dm-clone-metadata.h
+++ b/drivers/md/dm-clone-metadata.h
@@ -156,7 +156,7 @@ bool dm_clone_is_range_hydrated(struct dm_clone_metadata *cmd,
 /*
  * Returns the number of hydrated regions.
  */
-unsigned long dm_clone_nr_of_hydrated_regions(struct dm_clone_metadata *cmd);
+unsigned int dm_clone_nr_of_hydrated_regions(struct dm_clone_metadata *cmd);
 
 /*
  * Returns the first unhydrated region with region_nr >= @start
diff --git a/drivers/md/dm-clone-target.c b/drivers/md/dm-clone-target.c
index ca5020c58f7c..5ce96ddf1ce1 100644
--- a/drivers/md/dm-clone-target.c
+++ b/drivers/md/dm-clone-target.c
@@ -1473,7 +1473,7 @@ static void clone_status(struct dm_target *ti, status_type_t type,
 			goto error;
 		}
 
-		DMEMIT("%u %llu/%llu %llu %lu/%lu %u ",
+		DMEMIT("%u %llu/%llu %llu %u/%lu %u ",
 		       DM_CLONE_METADATA_BLOCK_SIZE,
 		       (unsigned long long)(nr_metadata_blocks - nr_free_metadata_blocks),
 		       (unsigned long long)nr_metadata_blocks,
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-03-27 14:01 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-27 14:01 [PATCH 0/4] dm clone: Fix discard handling and overflow bugs which could cause data corruption Nikos Tsironis
2020-03-27 14:01 ` [PATCH 1/4] dm clone: Fix handling of partial region discards Nikos Tsironis
2020-03-27 14:01 ` [PATCH 2/4] dm clone: Add overflow check for number of regions Nikos Tsironis
2020-03-27 14:01 ` [PATCH 3/4] dm clone: Add missing casts to prevent overflows and data corruption Nikos Tsironis
2020-03-27 14:01 ` [PATCH 4/4] dm clone metadata: Fix return type of dm_clone_nr_of_hydrated_regions() Nikos Tsironis

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.