linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] dm: fix inflight IO check
@ 2018-12-10 22:45 Jens Axboe
  2018-12-11  0:32 ` Mike Snitzer
  0 siblings, 1 reply; 5+ messages in thread
From: Jens Axboe @ 2018-12-10 22:45 UTC (permalink / raw)
  To: linux-block, device-mapper development; +Cc: Mike Snitzer, Christoph Hellwig

After switching to percpu inflight counters, the inflight check
is totally buggy. It's perfectly valid for some counters to be
non-zero while having a total inflight IO count of 0, that's how
these kinds of counters work (inc on one CPU, dec on another).
Fix the md_in_flight() check to sum all counters before returning
a false positive, potentially.

While at it, remove the inflight read for IO completion. We don't
need it, just wake anyone that's waiting for the IO count to drop
to zero. The caller needs to re-check that value anyway when woken,
which it does.

Fixes: 6f75723190d8 ("dm: remove the pending IO accounting")
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

---

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 70568f8b6c53..79ad4b3d215c 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -650,14 +650,14 @@ static bool md_in_flight(struct mapped_device *md)
 {
 	int cpu;
 	struct hd_struct *part = &dm_disk(md)->part0;
+	long sum = 0;
 
 	for_each_possible_cpu(cpu) {
-		if (part_stat_local_read_cpu(part, in_flight[0], cpu) ||
-		    part_stat_local_read_cpu(part, in_flight[1], cpu))
-			return true;
+		sum += part_stat_local_read_cpu(part, in_flight[0], cpu);
+		sum += part_stat_local_read_cpu(part, in_flight[1], cpu);
 	}
 
-	return false;
+	return sum != 0;
 }
 
 static void start_io_acct(struct dm_io *io)
@@ -691,10 +691,8 @@ static void end_io_acct(struct dm_io *io)
 				    true, duration, &io->stats_aux);
 
 	/* nudge anyone waiting on suspend queue */
-	if (unlikely(waitqueue_active(&md->wait))) {
-		if (!md_in_flight(md))
-			wake_up(&md->wait);
-	}
+	if (unlikely(waitqueue_active(&md->wait)))
+		wake_up(&md->wait);
 }
 
 /*

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: dm: fix inflight IO check
  2018-12-10 22:45 [PATCH] dm: fix inflight IO check Jens Axboe
@ 2018-12-11  0:32 ` Mike Snitzer
  2018-12-12  2:34   ` xfstests generic/347 was never correct [was: Re: dm: fix inflight IO check] Mike Snitzer
  0 siblings, 1 reply; 5+ messages in thread
From: Mike Snitzer @ 2018-12-11  0:32 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, device-mapper development, Christoph Hellwig

On Mon, Dec 10 2018 at  5:45pm -0500,
Jens Axboe <axboe@kernel.dk> wrote:

> After switching to percpu inflight counters, the inflight check
> is totally buggy. It's perfectly valid for some counters to be
> non-zero while having a total inflight IO count of 0, that's how
> these kinds of counters work (inc on one CPU, dec on another).
> Fix the md_in_flight() check to sum all counters before returning
> a false positive, potentially.
> 
> While at it, remove the inflight read for IO completion. We don't
> need it, just wake anyone that's waiting for the IO count to drop
> to zero. The caller needs to re-check that value anyway when woken,
> which it does.
> 
> Fixes: 6f75723190d8 ("dm: remove the pending IO accounting")
> Reported-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

I'm seeing that device-mapper-test-suite's "resize_io" test doesn't
pass.  Glad this resolves the xfstest issue but I think more work is
needed, so I'll build any additional changes on this fix.

Thanks.

Acked-by: Mike Snitzer <snitzer@redhat.com>


> 
> ---
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 70568f8b6c53..79ad4b3d215c 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -650,14 +650,14 @@ static bool md_in_flight(struct mapped_device *md)
>  {
>  	int cpu;
>  	struct hd_struct *part = &dm_disk(md)->part0;
> +	long sum = 0;
>  
>  	for_each_possible_cpu(cpu) {
> -		if (part_stat_local_read_cpu(part, in_flight[0], cpu) ||
> -		    part_stat_local_read_cpu(part, in_flight[1], cpu))
> -			return true;
> +		sum += part_stat_local_read_cpu(part, in_flight[0], cpu);
> +		sum += part_stat_local_read_cpu(part, in_flight[1], cpu);
>  	}
>  
> -	return false;
> +	return sum != 0;
>  }

Heh, amazing any tests passed.. sorry for this

^ permalink raw reply	[flat|nested] 5+ messages in thread

* xfstests generic/347 was never correct [was: Re: dm: fix inflight IO check]
  2018-12-11  0:32 ` Mike Snitzer
@ 2018-12-12  2:34   ` Mike Snitzer
  2018-12-12  2:58     ` Eric Sandeen
  0 siblings, 1 reply; 5+ messages in thread
From: Mike Snitzer @ 2018-12-12  2:34 UTC (permalink / raw)
  To: Jens Axboe, sandeen, fstests
  Cc: linux-block, device-mapper development, Christoph Hellwig

On Mon, Dec 10 2018 at  7:32pm -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Mon, Dec 10 2018 at  5:45pm -0500,
> Jens Axboe <axboe@kernel.dk> wrote:
> 
> > After switching to percpu inflight counters, the inflight check
> > is totally buggy. It's perfectly valid for some counters to be
> > non-zero while having a total inflight IO count of 0, that's how
> > these kinds of counters work (inc on one CPU, dec on another).
> > Fix the md_in_flight() check to sum all counters before returning
> > a false positive, potentially.
> > 
> > While at it, remove the inflight read for IO completion. We don't
> > need it, just wake anyone that's waiting for the IO count to drop
> > to zero. The caller needs to re-check that value anyway when woken,
> > which it does.
> > 
> > Fixes: 6f75723190d8 ("dm: remove the pending IO accounting")
> > Reported-by: Christoph Hellwig <hch@lst.de>
> > Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> I'm seeing that device-mapper-test-suite's "resize_io" test doesn't
> pass.  Glad this resolves the xfstest issue but I think more work is
> needed, so I'll build any additional changes on this fix.

Turns out the device-mapper-test-suite 'resize_io' issue I was seeing is
just a long lingering race that just got much more likely on my newer
testbed.  This fixes what I was seeing:
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.20&id=f6c367585d0d851349d3a9e607c43e5bea993fa1

But pivoting back to xfstests generic/347: that test was broken from the
start, if it ever _really_ worked it would've been a bug in dm-thinp.
And even after Jens' fix (above) the test still just times out after 60
secs while in out-of-data-space (OODS) mode and gives a false "Passed
all 1 tests".

generic/347 exhausting the thin-pool's data space and then issuing
"sync" will cause IO to hang for 60 secs unless, in parallel, another
process comes along and grows the thin-pool to have more data space.
But even if xfstests' _dmthin_grow function were to be issued in 
parallel with another process it'll fail because it uses
_dmthin_reload_table which "dmsetup suspend"s the thin device (that was
just used by XFS to fill the thin-pool) _without_ using the --noflush
flag for "dmsetup suspend".  So the thin device will wait forever for
its  outstanding IO to finish before generic/347's _dmthin_grow will be
able to resize the thin-pool.  Leading to deadlock.  Or until the 60
secs (dm_thin_pool.ko "no_space_timeout" modparam default) causes the
thin-pool to switch from OODS "queue IO" to OODS "error IO" mode:

# tail -f /var/log/messages &
# ./check generic/347
Dec 11 21:12:57 thegoat kernel: XFS (dm-1): Mounting V5 Filesystem
Dec 11 21:12:57 thegoat kernel: XFS (dm-1): Ending clean mount
FSTYP         -- xfs (non-debug)
PLATFORM      -- Linux/x86_64 thegoat 4.20.0-rc6.snitm+
MKFS_OPTIONS  -- -f -bsize=4096 /dev/mapper/test-metadata
MOUNT_OPTIONS -- /dev/mapper/test-metadata /scratch

Dec 11 21:12:58 thegoat kernel: device-mapper: thin: 253:4: reached low water mark for data device: sending event.
Dec 11 21:12:58 thegoat kernel: device-mapper: thin: 253:4: switching pool to out-of-data-space (queue IO) mode
Dec 11 21:14:01 thegoat kernel: device-mapper: thin: 253:4: switching pool to out-of-data-space (error IO) mode
Dec 11 21:14:01 thegoat kernel: xfs_destroy_ioend: 136 callbacks suppressed
Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1354752
Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1358592
Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1362432
Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1366272
Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1370112
Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1373952
Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1377792
Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1381632
Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1385472
Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1389312
Dec 11 21:14:02 thegoat kernel: device-mapper: thin: 253:4: switching pool to write mode
Dec 11 21:14:02 thegoat kernel: device-mapper: thin: 253:4: growing the data device from 1000 to 1200 blocks
Dec 11 21:14:02 thegoat kernel: device-mapper: thin: 253:4: reached low water mark for data device: sending event.
Dec 11 21:14:02 thegoat kernel: XFS (dm-5): Unmounting Filesystem
 65s
Ran: generic/347
Passed all 1 tests

Dec 11 21:14:02 thegoat kernel: XFS (dm-1): Unmounting Filesystem

SO generic/347 needs to account for 2 things:
1) issue $XFS_IO_PROG IO in the background with other process(es)
2) in parallel, in main generic/347 shell, run _dmthin_grow but the
"thin" device must be suspended with dmsetup suspend --noflush

Eric Sandeen: can you or someone else xfstests inclined use this info to
fix generic/347 please?

Thanks,
Mike

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: xfstests generic/347 was never correct [was: Re: dm: fix inflight IO check]
  2018-12-12  2:34   ` xfstests generic/347 was never correct [was: Re: dm: fix inflight IO check] Mike Snitzer
@ 2018-12-12  2:58     ` Eric Sandeen
  2018-12-12 15:18       ` Mike Snitzer
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Sandeen @ 2018-12-12  2:58 UTC (permalink / raw)
  To: Mike Snitzer, Jens Axboe, sandeen, fstests
  Cc: linux-block, device-mapper development, Christoph Hellwig

On 12/11/18 8:34 PM, Mike Snitzer wrote:
> On Mon, Dec 10 2018 at  7:32pm -0500,
> Mike Snitzer <snitzer@redhat.com> wrote:
> 
>> On Mon, Dec 10 2018 at  5:45pm -0500,
>> Jens Axboe <axboe@kernel.dk> wrote:
>>
>>> After switching to percpu inflight counters, the inflight check
>>> is totally buggy. It's perfectly valid for some counters to be
>>> non-zero while having a total inflight IO count of 0, that's how
>>> these kinds of counters work (inc on one CPU, dec on another).
>>> Fix the md_in_flight() check to sum all counters before returning
>>> a false positive, potentially.
>>>
>>> While at it, remove the inflight read for IO completion. We don't
>>> need it, just wake anyone that's waiting for the IO count to drop
>>> to zero. The caller needs to re-check that value anyway when woken,
>>> which it does.
>>>
>>> Fixes: 6f75723190d8 ("dm: remove the pending IO accounting")
>>> Reported-by: Christoph Hellwig <hch@lst.de>
>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>
>> I'm seeing that device-mapper-test-suite's "resize_io" test doesn't
>> pass.  Glad this resolves the xfstest issue but I think more work is
>> needed, so I'll build any additional changes on this fix.
> 
> Turns out the device-mapper-test-suite 'resize_io' issue I was seeing is
> just a long lingering race that just got much more likely on my newer
> testbed.  This fixes what I was seeing:
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.20&id=f6c367585d0d851349d3a9e607c43e5bea993fa1
> 
> But pivoting back to xfstests generic/347: that test was broken from the
> start, if it ever _really_ worked it would've been a bug in dm-thinp.
> And even after Jens' fix (above) the test still just times out after 60
> secs while in out-of-data-space (OODS) mode and gives a false "Passed
> all 1 tests".
> 
> generic/347 exhausting the thin-pool's data space and then issuing
> "sync" will cause IO to hang for 60 secs unless, in parallel, another
> process comes along and grows the thin-pool to have more data space.
> But even if xfstests' _dmthin_grow function were to be issued in 
> parallel with another process it'll fail because it uses
> _dmthin_reload_table which "dmsetup suspend"s the thin device (that was
> just used by XFS to fill the thin-pool) _without_ using the --noflush
> flag for "dmsetup suspend".  So the thin device will wait forever for
> its  outstanding IO to finish before generic/347's _dmthin_grow will be
> able to resize the thin-pool.  Leading to deadlock.  Or until the 60
> secs (dm_thin_pool.ko "no_space_timeout" modparam default) causes the
> thin-pool to switch from OODS "queue IO" to OODS "error IO" mode:
> 
> # tail -f /var/log/messages &
> # ./check generic/347
> Dec 11 21:12:57 thegoat kernel: XFS (dm-1): Mounting V5 Filesystem
> Dec 11 21:12:57 thegoat kernel: XFS (dm-1): Ending clean mount
> FSTYP         -- xfs (non-debug)
> PLATFORM      -- Linux/x86_64 thegoat 4.20.0-rc6.snitm+
> MKFS_OPTIONS  -- -f -bsize=4096 /dev/mapper/test-metadata
> MOUNT_OPTIONS -- /dev/mapper/test-metadata /scratch
> 
> Dec 11 21:12:58 thegoat kernel: device-mapper: thin: 253:4: reached low water mark for data device: sending event.
> Dec 11 21:12:58 thegoat kernel: device-mapper: thin: 253:4: switching pool to out-of-data-space (queue IO) mode
> Dec 11 21:14:01 thegoat kernel: device-mapper: thin: 253:4: switching pool to out-of-data-space (error IO) mode
> Dec 11 21:14:01 thegoat kernel: xfs_destroy_ioend: 136 callbacks suppressed
> Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1354752
> Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1358592
> Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1362432
> Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1366272
> Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1370112
> Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1373952
> Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1377792
> Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1381632
> Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1385472
> Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1389312
> Dec 11 21:14:02 thegoat kernel: device-mapper: thin: 253:4: switching pool to write mode
> Dec 11 21:14:02 thegoat kernel: device-mapper: thin: 253:4: growing the data device from 1000 to 1200 blocks
> Dec 11 21:14:02 thegoat kernel: device-mapper: thin: 253:4: reached low water mark for data device: sending event.
> Dec 11 21:14:02 thegoat kernel: XFS (dm-5): Unmounting Filesystem
>  65s
> Ran: generic/347
> Passed all 1 tests
> 
> Dec 11 21:14:02 thegoat kernel: XFS (dm-1): Unmounting Filesystem
> 
> SO generic/347 needs to account for 2 things:
> 1) issue $XFS_IO_PROG IO in the background with other process(es)
> 2) in parallel, in main generic/347 shell, run _dmthin_grow but the
> "thin" device must be suspended with dmsetup suspend --noflush
> 
> Eric Sandeen: can you or someone else xfstests inclined use this info to
> fix generic/347 please?

It's not at all clear to me what is wrong with the test.  "Do a bunch of IO
including a sync, completely filling the backing store, then grow the backing store"
is surely a real-world scenario that dm-thinp might encounter, no?  Things
do recover, subsequent writes succeed, and when all is said and done the filesystem
is checked and it's still consistent.  That sounds like some version of success to me.

Even if it's emulating a dumb user/administrator, that doesn't seem invalid.

What is wrong here?

-Eric

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: xfstests generic/347 was never correct [was: Re: dm: fix inflight IO check]
  2018-12-12  2:58     ` Eric Sandeen
@ 2018-12-12 15:18       ` Mike Snitzer
  0 siblings, 0 replies; 5+ messages in thread
From: Mike Snitzer @ 2018-12-12 15:18 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Jens Axboe, sandeen, fstests, linux-block,
	device-mapper development, Christoph Hellwig

On Tue, Dec 11 2018 at  9:58pm -0500,
Eric Sandeen <sandeen@sandeen.net> wrote:

> On 12/11/18 8:34 PM, Mike Snitzer wrote:
> > On Mon, Dec 10 2018 at  7:32pm -0500,
> > Mike Snitzer <snitzer@redhat.com> wrote:
> > 
> > SO generic/347 needs to account for 2 things:
> > 1) issue $XFS_IO_PROG IO in the background with other process(es)
> > 2) in parallel, in main generic/347 shell, run _dmthin_grow but the
> > "thin" device must be suspended with dmsetup suspend --noflush
> > 
> > Eric Sandeen: can you or someone else xfstests inclined use this info to
> > fix generic/347 please?
> 
> It's not at all clear to me what is wrong with the test.  "Do a bunch of IO
> including a sync, completely filling the backing store, then grow the backing store"
> is surely a real-world scenario that dm-thinp might encounter, no?  Things
> do recover, subsequent writes succeed, and when all is said and done the filesystem
> is checked and it's still consistent.  That sounds like some version of success to me.
> 
> Even if it's emulating a dumb user/administrator, that doesn't seem invalid.
> 
> What is wrong here?

I told you in elaborate detail.

Your willingness to leave generic/347 as is makes little sense; but that
is your call.  I'll just move on...

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-12-12 15:18 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-10 22:45 [PATCH] dm: fix inflight IO check Jens Axboe
2018-12-11  0:32 ` Mike Snitzer
2018-12-12  2:34   ` xfstests generic/347 was never correct [was: Re: dm: fix inflight IO check] Mike Snitzer
2018-12-12  2:58     ` Eric Sandeen
2018-12-12 15:18       ` Mike Snitzer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).