dm-devel.redhat.com archive mirror
 help / color / mirror / Atom feed
* [dm-devel] [PATCH] dm: call remove_single_exception_chunk before commit_merge
@ 2023-03-10 10:05 Jiazi.Li
  2023-03-22 20:15 ` Mikulas Patocka
  0 siblings, 1 reply; 2+ messages in thread
From: Jiazi.Li @ 2023-03-10 10:05 UTC (permalink / raw)
  To: Alasdair Kergon, Mike Snitzer; +Cc: Jiazi.Li, dm-devel

Assume that the metadata of cow on the disk is corrupted after init
for some reason:
old chunk-id	new chunk-id
0               2
...
x ---> 0        y
After starting merge, old chunk 0 will be updated twice, and old
chunk x will not be updated.
And dm-snap will print err log after merge new chunk 2 to old chunk 0:

<3>[  731.921642]  (1)[4092:kworker/1:0]device-mapper: snapshots:
Corruption detected: exception for block 0 is on disk but not in memory
then set snap->merge_failed to true.

If userspace use "sectors_allocated == metadata_sectors" to determine
whether the merge is completed, there maybe the following race that
makes the userspace unable to know merge fail event:

kernel merge kworker                 userspace process
merge_callback
  ->commit_merge
                                     get snapshot_status by ioctl
  ->remove_single_exception_chunk
  set merge_failed to true
                                     think merge has been completed,
				     switch device to another target

Could we call remove_single_exception_chunk first to solve this race?

Signed-off-by: Jiazi.Li <jiazi.li@transsion.com>
---
 drivers/md/dm-snap.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index f766c21408f1..f658d05752f2 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -1141,15 +1141,15 @@ static void merge_callback(int read_err, unsigned long write_err, void *context)
 		goto shut;
 	}
 
+	if (remove_single_exception_chunk(s) < 0)
+		goto shut;
+
 	if (s->store->type->commit_merge(s->store,
 					 s->num_merging_chunks) < 0) {
 		DMERR("Write error in exception store: shutting down merge");
 		goto shut;
 	}
 
-	if (remove_single_exception_chunk(s) < 0)
-		goto shut;
-
 	snapshot_merge_next_chunks(s);
 
 	return;
-- 
2.17.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [dm-devel] [PATCH] dm: call remove_single_exception_chunk before commit_merge
  2023-03-10 10:05 [dm-devel] [PATCH] dm: call remove_single_exception_chunk before commit_merge Jiazi.Li
@ 2023-03-22 20:15 ` Mikulas Patocka
  0 siblings, 0 replies; 2+ messages in thread
From: Mikulas Patocka @ 2023-03-22 20:15 UTC (permalink / raw)
  To: Jiazi.Li; +Cc: Jiazi.Li, dm-devel, Mike Snitzer, Alasdair Kergon

Hi

I don't feel strong need to fix it.

You claim that if you corrupt the snapshot metadata, there is a small race 
condition where the corruption may not be reported to the user.

If you corrupt the snapshot metadata, you obviously get corrupted result 
of the merge. "garbage in - garbage out". I don't think that we need to 
improve handling of corrupted metadata.

I wrote this code 13 years ago, then I forgot the details about it, and I 
don't feel confident touching it unless there is some strong reason.

Mikulas



On Fri, 10 Mar 2023, Jiazi.Li wrote:

> Assume that the metadata of cow on the disk is corrupted after init
> for some reason:
> old chunk-id	new chunk-id
> 0               2
> ...
> x ---> 0        y
> After starting merge, old chunk 0 will be updated twice, and old
> chunk x will not be updated.
> And dm-snap will print err log after merge new chunk 2 to old chunk 0:
> 
> <3>[  731.921642]  (1)[4092:kworker/1:0]device-mapper: snapshots:
> Corruption detected: exception for block 0 is on disk but not in memory
> then set snap->merge_failed to true.
> 
> If userspace use "sectors_allocated == metadata_sectors" to determine
> whether the merge is completed, there maybe the following race that
> makes the userspace unable to know merge fail event:
> 
> kernel merge kworker                 userspace process
> merge_callback
>   ->commit_merge
>                                      get snapshot_status by ioctl
>   ->remove_single_exception_chunk
>   set merge_failed to true
>                                      think merge has been completed,
> 				     switch device to another target
> 
> Could we call remove_single_exception_chunk first to solve this race?
> 
> Signed-off-by: Jiazi.Li <jiazi.li@transsion.com>
> ---
>  drivers/md/dm-snap.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
> index f766c21408f1..f658d05752f2 100644
> --- a/drivers/md/dm-snap.c
> +++ b/drivers/md/dm-snap.c
> @@ -1141,15 +1141,15 @@ static void merge_callback(int read_err, unsigned long write_err, void *context)
>  		goto shut;
>  	}
>  
> +	if (remove_single_exception_chunk(s) < 0)
> +		goto shut;
> +
>  	if (s->store->type->commit_merge(s->store,
>  					 s->num_merging_chunks) < 0) {
>  		DMERR("Write error in exception store: shutting down merge");
>  		goto shut;
>  	}
>  
> -	if (remove_single_exception_chunk(s) < 0)
> -		goto shut;
> -
>  	snapshot_merge_next_chunks(s);
>  
>  	return;
> -- 
> 2.17.1
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://listman.redhat.com/mailman/listinfo/dm-devel
> 
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2023-03-22 20:15 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-10 10:05 [dm-devel] [PATCH] dm: call remove_single_exception_chunk before commit_merge Jiazi.Li
2023-03-22 20:15 ` Mikulas Patocka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).