dm-devel.redhat.com archive mirror
* [dm-devel] [PATCH 0/1] dm era: Fix digestion bug that can lead to lost writes
@ 2021-01-22 15:22 Nikos Tsironis
  2021-01-22 15:22 ` [dm-devel] [PATCH 1/1] dm era: Reinitialize bitset cache before digesting a new writeset Nikos Tsironis
From: Nikos Tsironis @ 2021-01-22 15:22 UTC
  To: snitzer, agk, dm-devel; +Cc: ejt, ntsironis

For devices with at most 64 blocks, digesting consecutive eras applies
the writeset of the first era to every era being digested, which leads
to lost writes: we lose the information about which blocks were written
during the affected eras.

The root cause of the bug is a failure to reinitialize the on-disk
bitset cache when the digestion code starts digesting a new writeset.

Steps to reproduce
------------------

1. Create two LVs, one for data and one for metadata

   # lvcreate -n eradata -L1G datavg
   # lvcreate -n erameta -L64M datavg

2. Fill the whole data device with zeroes

   # dd if=/dev/zero of=/dev/datavg/eradata oflag=direct bs=1M

3. Create a dm-delay device, which delays writes to the metadata LV by
   500 ms and leaves reads undelayed

   # dmsetup create delaymeta --table "0 `blockdev --getsz \
     /dev/datavg/erameta` delay /dev/datavg/erameta 0 0 /dev/datavg/erameta 0 500"

4. Create a 256MiB dm-era device (64 blocks of 4MiB each), using the data
   LV for data and the dm-delay device for its metadata. We set the
   tracking granularity to 4MiB (8192 512-byte sectors).

   # dmsetup create eradev --table "0 524288 era /dev/mapper/delaymeta \
     /dev/datavg/eradata 8192"

5. Run the following script:

   #!/bin/bash

   # Write to block #0 during era 1
   dd if=/dev/urandom of=/dev/mapper/eradev oflag=direct bs=4K count=1

   # Increase era to 2
   dmsetup message eradev 0 checkpoint

   # Write to block #1 during era 2
   dd if=/dev/urandom of=/dev/mapper/eradev oflag=direct bs=4K count=1 seek=1024 &

   # Increase era to 3
   dmsetup message eradev 0 checkpoint

   # Sync the device
   sync /dev/mapper/eradev

6. Remove the device, so we can examine its metadata

   # dmsetup remove eradev

7. Examine the device's metadata with `era_dump --logical /dev/mapper/delaymeta`

   <superblock uuid="" block_size="8192" nr_blocks="64" current_era="3">
       <era_array>
           <era block="0" era="2"/>
           <era block="1" era="0"/>
           <era block="2" era="0"/>
           ...
           <era block="63" era="0"/>
       </era_array>
   </superblock>

   We see that:
    a. Block #0 is marked as last written during era 2, whereas we wrote
       to it only during era 1
    b. Block #1 is not marked as written at all, whereas we wrote to it
       during era 2

8. Examining the data device, e.g., with `hexdump /dev/datavg/eradata`,
   we can see that both blocks #0 and #1 are written, as expected.
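
The numbers used in steps 4 and 5 can be checked with a small sketch. This
is only an illustration of the arithmetic, assuming 512-byte sectors; the
helper names are hypothetical and are not part of dm-era, and rounding the
block count up is an assumption of this sketch.

```c
#include <stdint.h>

#define SECTOR_SIZE 512ULL	/* assumed device-mapper sector size */

/* Convert a size in bytes to a length in 512-byte sectors. */
static uint64_t bytes_to_sectors(uint64_t bytes)
{
	return bytes / SECTOR_SIZE;
}

/* Number of era blocks covering a device (rounded up, by assumption). */
static uint64_t nr_era_blocks(uint64_t dev_sectors, uint64_t block_sectors)
{
	return (dev_sectors + block_sectors - 1) / block_sectors;
}

/* Index of the era block that contains the given byte offset. */
static uint64_t era_block_of_offset(uint64_t byte_offset,
				    uint64_t block_sectors)
{
	return byte_offset / (block_sectors * SECTOR_SIZE);
}
```

For the table above, 256MiB is 524288 sectors and 4MiB is 8192 sectors,
which gives 64 blocks. The second dd seeks 1024 * 4KiB = 4MiB into the
device, so it lands in block #1, while the first dd writes to block #0.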

Nikos Tsironis (1):
  dm era: Reinitialize bitset cache before digesting a new writeset

 drivers/md/dm-era-target.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

-- 
2.11.0

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel



* [dm-devel] [PATCH 1/1] dm era: Reinitialize bitset cache before digesting a new writeset
  2021-01-22 15:22 [dm-devel] [PATCH 0/1] dm era: Fix digestion bug that can lead to lost writes Nikos Tsironis
@ 2021-01-22 15:22 ` Nikos Tsironis
From: Nikos Tsironis @ 2021-01-22 15:22 UTC
  To: snitzer, agk, dm-devel; +Cc: ejt, ntsironis

For devices with at most 64 blocks, digesting consecutive eras applies
the writeset of the first era to every era being digested, which leads
to lost writes: we lose the information about which blocks were written
during the affected eras.

The digestion code uses a dm_disk_bitset object to access the archived
writesets. This structure includes a one-word (64-bit) cache to reduce
the number of array lookups.

This structure is initialized only once, in metadata_digest_start(),
when we kick off digestion.

However, when a new writeset is inserted into the writeset tree before
the digestion of the previous one is done, or equivalently when the
writeset tree holds multiple writesets to digest, all of these writesets
are digested using the same cache, which is not reinitialized when
moving from one writeset to the next.

For devices with more than 64 blocks, i.e., more than the size of the
cache, the cache is indirectly invalidated when we move to the next set
of blocks, so we avoid the bug.

But for devices with at most 64 blocks we end up using the same cached
data for digesting all archived writesets, i.e., the cache is loaded
when digesting the first writeset and it never gets reloaded, until the
digestion is done.

As a result, the writeset of the first era to digest is used as the
writeset of all the following archived eras, leading to lost writes.

Fix this by reinitializing the dm_disk_bitset structure, and thus
invalidating the cache, every time the digestion code starts digesting a
new writeset.
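
To illustrate the failure mode, here is a minimal userspace model. It is
not the kernel code: all names (model_writeset, model_bitset_cache,
model_digest, and so on) are hypothetical, and the only behaviour it
assumes is the one described above, namely a one-word read cache that is
reloaded only when the word index changes, never when the writeset
changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BITS_PER_WORD 64
#define MAX_WORDS 4

/* Hypothetical stand-in for an archived writeset: one bit per block. */
struct model_writeset {
	uint64_t words[MAX_WORDS];
};

/*
 * Hypothetical stand-in for the bitset info object: a one-word read cache
 * keyed only by word index, not by the writeset it was loaded from.
 */
struct model_bitset_cache {
	bool index_set;
	uint32_t index;
	uint64_t bits;
};

static void model_cache_init(struct model_bitset_cache *c)
{
	memset(c, 0, sizeof(*c));
}

static bool model_test_bit(struct model_bitset_cache *c,
			   const struct model_writeset *ws, uint32_t bit)
{
	uint32_t word = bit / BITS_PER_WORD;

	/* Reload only when the word index changes. */
	if (!c->index_set || c->index != word) {
		c->bits = ws->words[word];
		c->index = word;
		c->index_set = true;
	}

	return (c->bits >> (bit % BITS_PER_WORD)) & 1;
}

/*
 * Digest nr_sets writesets in era order (sets[0] belongs to first_era) and
 * record, per block, the last era in which the block was seen as written.
 * reinit_per_writeset selects the behaviour before (false) or after (true)
 * the fix.
 */
static void model_digest(const struct model_writeset *sets, uint32_t nr_sets,
			 uint32_t first_era, uint32_t nr_blocks,
			 bool reinit_per_writeset, uint32_t *era_array)
{
	struct model_bitset_cache cache;
	uint32_t s, b;

	assert(nr_blocks <= MAX_WORDS * BITS_PER_WORD);
	model_cache_init(&cache);	/* once, when digestion starts */

	for (s = 0; s < nr_sets; s++) {
		if (reinit_per_writeset)
			model_cache_init(&cache);

		for (b = 0; b < nr_blocks; b++)
			if (model_test_bit(&cache, &sets[s], b))
				era_array[b] = first_era + s;
	}
}
```

In this model, take a 64-block device with two writesets, where era 1
marks block 0 and era 2 marks block 1. With reinit_per_writeset set to
false, model_digest() records block 0 as era 2 and leaves block 1 at era
0, which matches the era_dump output in the cover letter. With it set to
true, the blocks are recorded as eras 1 and 2. With 128 blocks the result
is correct even without the reinitialization, because moving to word 1
and back to word 0 forces a reload.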

Fixes: eec40579d84873 ("dm: add era target")
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
---
 drivers/md/dm-era-target.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/md/dm-era-target.c b/drivers/md/dm-era-target.c
index b24e3839bb3a..951e6df409d4 100644
--- a/drivers/md/dm-era-target.c
+++ b/drivers/md/dm-era-target.c
@@ -746,6 +746,12 @@ static int metadata_digest_lookup_writeset(struct era_metadata *md,
 	ws_unpack(&disk, &d->writeset);
 	d->value = cpu_to_le32(key);
 
+	/*
+	 * We initialise another bitset info to avoid any caching side effects
+	 * with the previous one.
+	 */
+	dm_disk_bitset_init(md->tm, &d->info);
+
 	d->nr_bits = min(d->writeset.nr_bits, md->nr_blocks);
 	d->current_bit = 0;
 	d->step = metadata_digest_transcribe_writeset;
@@ -759,12 +765,6 @@ static int metadata_digest_start(struct era_metadata *md, struct digest *d)
 		return 0;
 
 	memset(d, 0, sizeof(*d));
-
-	/*
-	 * We initialise another bitset info to avoid any caching side
-	 * effects with the previous one.
-	 */
-	dm_disk_bitset_init(md->tm, &d->info);
 	d->step = metadata_digest_lookup_writeset;
 
 	return 0;
-- 
2.11.0
