All of lore.kernel.org
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.com>
To: Mike Snitzer <snitzer@redhat.com>, Mikulas Patocka <mpatocka@redhat.com>
Cc: device-mapper development <dm-devel@redhat.com>,
	Milan Broz <gmazyland@gmail.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [dm-devel] DM Regression in 4.16-rc1 - read() returns data when    it shouldn't
Date: Thu, 15 Feb 2018 11:07:11 +1100	[thread overview]
Message-ID: <87y3jvp13k.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <20180214230526.GA25114@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 6581 bytes --]

On Wed, Feb 14 2018, Mike Snitzer wrote:

> On Wed, Feb 14 2018 at  3:39pm -0500,
> NeilBrown <neilb@suse.com> wrote:
>
>> On Wed, Feb 14 2018, Milan Broz wrote:
>> 
>> > Hi,
>> >
>> > the commit (found by bisect)
>> >
>> >   commit 18a25da84354c6bb655320de6072c00eda6eb602
>> >   Author: NeilBrown <neilb@suse.com>
>> >   Date:   Wed Sep 6 09:43:28 2017 +1000
>> >
>> >     dm: ensure bio submission follows a depth-first tree walk
>> >
>> > cause serious regression while reading from DM device.
>> >
>> > The reproducer is below, basically it tries to simulate failure we see in cryptsetup
>> > regression test: we have DM device with error and zero target and try to read
>> > "passphrase" from it (it is test for 64 bit offset error path):
>> >
>> > Test device:
>> > # dmsetup table test
>> > 0 10000000 error 
>> > 10000000 1000000 zero 
>> >
>> > We try to run this operation:
>> >   lseek64(fd, 5119999988, SEEK_CUR); // this should seek to error target sector
>> >   read(fd, buf, 13); // this should fail, if we seek to error part of the device
>> >
>> > While on 4.15 the read properly fails:
>> >   Seek returned 5119999988.
>> >   Read returned -1.
>> >
>> > for 4.16 it actually succeeds returning some random data
>> > (perhaps kernel memory, so this bug is even more dangerous):
>> >   Seek returned 5119999988.
>> >   Read returned 13.
>> >
>> > Full reproducer below:
>> >
>> > #define _GNU_SOURCE
>> > #define _LARGEFILE64_SOURCE
>> > #include <stdio.h>
>> > #include <stddef.h>
>> > #include <stdint.h>
>> > #include <stdlib.h>
>> > #include <unistd.h>
>> > #include <fcntl.h>
>> > #include <inttypes.h>
>> >
>> > int main (int argc, char *argv[])
>> > {
>> >         char buf[13];
>> >         int fd;
>> >         //uint64_t offset64 = 5119999999;
>> >         uint64_t offset64 =   5119999988;
>> >         off64_t r;
>> >         ssize_t bytes;
>> >
>> >         system("echo -e \'0 10000000 error\'\\\\n\'10000000 1000000 zero\' | dmsetup create test");
>> >
>> >         fd = open("/dev/mapper/test", O_RDONLY);
>> >         if (fd == -1) {
>> >                 printf("open fail\n");
>> >                 return 1;
>> >         }
>> >
>> >         r = lseek64(fd, offset64, SEEK_CUR);
>> >         printf("Seek returned %" PRIu64 ".\n", r);
>> >         if (r < 0) {
>> >                 printf("seek fail\n");
>> >                 close(fd);
>> >                 return 2;
>> >         }
>> >
>> >         bytes = read(fd, buf, 13);
>> >         printf("Read returned %d.\n", (int)bytes);
>> >
>> >         close(fd);
>> >         return 0;
>> > }
>> >
>> >
>> > Please let me know if you need more info to reproduce it.
>> 
>> Thanks for the detailed report.  I haven't tried to reproduce, but the
>> code looks very weird.
>> The patch I posted "Date: Wed, 06 Sep 2017 09:43:28 +1000" had:
>>       +				struct bio *b = bio_clone_bioset(bio, GFP_NOIO,
>>       +								 md->queue->bio_split);
>>       +				bio_advance(b, (bio_sectors(b) - ci.sector_count) << 9);
>>       +				bio_chain(b, bio);
>>       +				generic_make_request(b);
>>       +				break;
>> 
>> The code in Linux has:
>> 
>> 				struct bio *b = bio_clone_bioset(bio, GFP_NOIO,
>> 								 md->queue->bio_split);
>> 				ci.io->orig_bio = b;
>> 				bio_advance(bio, (bio_sectors(bio) - ci.sector_count) << 9);
>> 				bio_chain(b, bio);
>> 				ret = generic_make_request(bio);
>> 				break;
>> 
>> So the wrong bio is sent to generic_make_request().
>> Mike: do you remember how that change happened?  I think there were
>> discussions following the patch, but I cannot find anything about making
>> that change.
>
> Mikulas had this feedback:
> https://www.redhat.com/archives/dm-devel/2017-November/msg00159.html
>
> You replied with:
> https://www.redhat.com/archives/dm-devel/2017-November/msg00165.html
>
> Where you said "Yes, you are right something like that would be better."
>
> And you then provided a follow-up patch:
> https://www.redhat.com/archives/dm-devel/2017-November/msg00175.html
>
> That we discussed and I said I'd just fold it into the original, and you
> agreed and thanked me for checking with you ;)
> https://www.redhat.com/archives/dm-devel/2017-November/msg00208.html
>
> Anyway, I'm just about to switch to Daddy Daycare mode (need to get my
> daughter up from her nap, feed her dinner, etc) so: I'll circle back to
> this tomorrow morning.
>

Thanks for the reminder - it's coming back to me now.

And looking again at the code, it doesn't seem so bad.  I was worried
about reassigning ci.io->orig_bio after calling
__split_and_process_non_flush(), but the important thing is to set it
before the last dec_pending() call, and the code gets that right.

I think I've found the problem though:

			/* done with normal IO or empty flush */
			bio->bi_status = io_error;

When we have chained bios, there are multiple sources for error which
need to be merged into bi_status:  all bios that were chained to this
one, and this bio itself.

__bio_chain_endio uses:

	if (!parent->bi_status)
		parent->bi_status = bio->bi_status;

so the new status won't over-ride the old status (unless there are
races....), but dm doesn't do that - I guess it didn't have to worry
about chains before.

So this:

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index d6de00f367ef..68136806d365 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -903,7 +903,8 @@ static void dec_pending(struct dm_io *io, blk_status_t error)
 			queue_io(md, bio);
 		} else {
 			/* done with normal IO or empty flush */
-			bio->bi_status = io_error;
+			if (io_error)
+				bio->bi_status = io_error;
 			bio_endio(bio);
 		}
 	}

should fix it.  And __bio_chain_endio should probably be

diff --git a/block/bio.c b/block/bio.c
index e1708db48258..ad77140edc6f 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -312,7 +312,7 @@ static struct bio *__bio_chain_endio(struct bio *bio)
 {
 	struct bio *parent = bio->bi_private;
 
-	if (!parent->bi_status)
+	if (bio->bi_status)
 		parent->bi_status = bio->bi_status;
 	bio_put(bio);
 	return parent;

to remove the chance that a race will hide a real error.

Milan: could you please check that the patch fixes the dm-crypt bug?
I've confirmed that it fixes your test case.
When you confirm, I'll send a proper patch.

I guess I should audit all code that assigns to ->bi_status and make
sure nothing ever assigns 0 to it.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

  reply	other threads:[~2018-02-15  0:07 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-14 13:02 DM Regression in 4.16-rc1 - read() returns data when it shouldn't Milan Broz
2018-02-14 20:39 ` NeilBrown
2018-02-14 23:05   ` Mike Snitzer
2018-02-15  0:07     ` NeilBrown [this message]
2018-02-15  7:37       ` [dm-devel] " Milan Broz
2018-02-15  8:52         ` NeilBrown
2018-02-15  9:00 ` [PATCH] dm: correctly handle chained bios in dec_pending() NeilBrown
2018-02-15  9:01 ` [RFC PATCH] dm: don't assign zero to ->bi_status of an active bio NeilBrown
2018-02-15  9:09 ` [PATCH] block: be more careful about status in __bio_chain_endio NeilBrown
2018-02-15  9:09   ` NeilBrown
2019-02-22 21:10   ` Mike Snitzer
2019-02-22 22:46     ` Jens Axboe
2019-02-22 23:55       ` Mike Snitzer
2019-02-23  2:02         ` John Dorminy
2019-02-23  2:02           ` John Dorminy
2019-02-23  2:44           ` Mike Snitzer
2019-02-23  3:10             ` John Dorminy
2019-06-12  2:56               ` John Dorminy
2019-06-12  7:01                 ` Christoph Hellwig
2019-06-17  7:32                   ` Hannes Reinecke
2018-02-19 13:44 ` DM Regression in 4.16-rc1 - read() returns data when it shouldn't Thorsten Leemhuis
2018-02-19 17:15   ` Mike Snitzer
2018-02-19 17:15     ` Mike Snitzer
2018-02-26 10:14     ` Thorsten Leemhuis
2018-02-26 11:01       ` NeilBrown
2018-02-26 17:31         ` Thorsten Leemhuis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y3jvp13k.fsf@notabene.neil.brown.name \
    --to=neilb@suse.com \
    --cc=dm-devel@redhat.com \
    --cc=gmazyland@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mpatocka@redhat.com \
    --cc=snitzer@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.