* Hard LOCKUP on 4.15-rc9 + 'blkmq/for-next' branch
@ 2018-01-22 23:31 David Zarzycki
  2018-01-22 23:34 ` Jens Axboe
  0 siblings, 1 reply; 9+ messages in thread
From: David Zarzycki @ 2018-01-22 23:31 UTC (permalink / raw)
  To: linux-block

Hello,

I previously reported a hang when building LLVM+clang on a block
multi-queue device (NVMe _or_ loopback onto tmpfs with the ’none’
scheduler).

I’ve since updated the kernel to 4.15-rc9, merged the ‘blkmq/for-next’
branch, disabled nohz_full parameter (used for testing), and tried
again. Both NVMe and loopback now lock up hard (ext4 if it matters).
Here are the backtraces:

NVMe:      http://znu.io/IMG_0366.jpg
Loopback:  http://znu.io/IMG_0367.jpg

What should I try next to help debug this?

Dave


* Re: Hard LOCKUP on 4.15-rc9 + 'blkmq/for-next' branch
  2018-01-22 23:31 Hard LOCKUP on 4.15-rc9 + 'blkmq/for-next' branch David Zarzycki
@ 2018-01-22 23:34 ` Jens Axboe
  2018-01-23  1:05   ` David Zarzycki
  0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2018-01-22 23:34 UTC (permalink / raw)
  To: David Zarzycki, linux-block

On 1/22/18 4:31 PM, David Zarzycki wrote:
> Hello,
> 
> I previously reported a hang when building LLVM+clang on a block multi-queue device (NVMe _or_ loopback onto tmpfs with the ’none’ scheduler).
> 
> I’ve since updated the kernel to 4.15-rc9, merged the ‘blkmq/for-next’ branch, disabled nohz_full parameter (used for testing), and tried again. Both NVMe and loopback now lock up hard (ext4 if it matters). Here are the backtraces:
> 
> NVMe:      http://znu.io/IMG_0366.jpg
> Loopback:  http://znu.io/IMG_0367.jpg

I tried to reproduce this today using the exact recipe that you provide,
but it ran fine for hours. Similar setup, nvme on a dual socket box
with 48 threads.

> What should I try next to help debug this?

This one looks different than the other one. Are you sure your hw
is sane? I'd probably try and enable lockdep debugging etc and
see if you catch anything.

-- 
Jens Axboe


* Re: Hard LOCKUP on 4.15-rc9 + 'blkmq/for-next' branch
  2018-01-22 23:34 ` Jens Axboe
@ 2018-01-23  1:05   ` David Zarzycki
  2018-01-23  1:20     ` Jens Axboe
  0 siblings, 1 reply; 9+ messages in thread
From: David Zarzycki @ 2018-01-23  1:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block



> On Jan 22, 2018, at 18:34, Jens Axboe <axboe@kernel.dk> wrote:
> 
> On 1/22/18 4:31 PM, David Zarzycki wrote:
>> Hello,
>> 
>> I previously reported a hang when building LLVM+clang on a block
>> multi-queue device (NVMe _or_ loopback onto tmpfs with the ’none’
>> scheduler).
>> 
>> I’ve since updated the kernel to 4.15-rc9, merged the ‘blkmq/for-next’
>> branch, disabled nohz_full parameter (used for testing), and tried
>> again. Both NVMe and loopback now lock up hard (ext4 if it matters).
>> Here are the backtraces:
>> 
>> NVMe:      http://znu.io/IMG_0366.jpg
>> Loopback:  http://znu.io/IMG_0367.jpg
> 
> I tried to reproduce this today using the exact recipe that you provide,
> but it ran fine for hours. Similar setup, nvme on a dual socket box
> with 48 threads.

Hi Jens,

Thanks for the quick reply and thanks for trying to reproduce this.
I’m not sure if this makes a difference, but this dual Skylake
machine has 96 threads, not 48 threads. Also, just to be clear, NVMe
doesn’t seem to matter. I hit this bug with a tmpfs loopback
device set up like so:

dd if=/dev/zero bs=1024k count=10000 of=/tmp/loopdisk
losetup /dev/loop0 /tmp/loopdisk
echo none > /sys/block/loop0/queue/scheduler
mkfs -t ext4 -L loopy /dev/loop0
mount /dev/loop0 /l
### build LLVM+clang in /l
### 'ninja check-all’ in a loop in /l

(No swap is setup because the machine has 192 GiB of RAM.)

> 
>> What should I try next to help debug this?
> 
> This one looks different than the other one. Are you sure your hw is
> sane?

I can build LLVM+clang in /tmp (tmpfs) reliably, which suggests the
fundamental hardware is sane. It’s only when the software multi-queue
layer gets involved that I see quick crashes/hangs.

As for the different backtraces, that's probably because I removed
nohz_full from the kernel boot parameters.

> I'd probably try and enable lockdep debugging etc and see if you catch
> anything.

Thanks. I turned on lockdep plus other lock debugging. Here is the
resulting backtrace:

http://znu.io/IMG_0368.jpg

Here is the resulting backtrace with transparent huge pages disabled:

http://znu.io/IMG_0369.jpg

Here is the resulting backtrace with transparent huge pages disabled AND
with systemd-coredumps disabled too:

http://znu.io/IMG_0370.jpg

I’m open to trying anything at this point. Thanks for helping,
Dave


* Re: Hard LOCKUP on 4.15-rc9 + 'blkmq/for-next' branch
  2018-01-23  1:05   ` David Zarzycki
@ 2018-01-23  1:20     ` Jens Axboe
  2018-01-23 13:48       ` David Zarzycki
  0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2018-01-23  1:20 UTC (permalink / raw)
  To: David Zarzycki; +Cc: linux-block

On 1/22/18 6:05 PM, David Zarzycki wrote:
> 
> 
>> On Jan 22, 2018, at 18:34, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 1/22/18 4:31 PM, David Zarzycki wrote:
>>> Hello,
>>>
>>> I previously reported a hang when building LLVM+clang on a block
>>> multi-queue device (NVMe _or_ loopback onto tmpfs with the ’none’
>>> scheduler).
>>>
>>> I’ve since updated the kernel to 4.15-rc9, merged the
>>> ‘blkmq/for-next’ branch, disabled nohz_full parameter (used for
>>> testing), and tried again. Both NVMe and loopback now lock up hard
>>> (ext4 if it matters). Here are the backtraces:
>>>
>>> NVMe:      http://znu.io/IMG_0366.jpg
>>> Loopback:  http://znu.io/IMG_0367.jpg
>>
>> I tried to reproduce this today using the exact recipe that you provide,
>> but it ran fine for hours. Similar setup, nvme on a dual socket box
>> with 48 threads.
> 
> Hi Jens,
> 
> Thanks for the quick reply and thanks for trying to reproduce this.
> I’m not sure if this makes a difference, but this dual Skylake machine
> has 96 threads, not 48 threads. Also, just to be clear, NVMe doesn’t
> seem to matter. I hit this bug with a tmpfs loopback device set up
> like so:
>
> dd if=/dev/zero bs=1024k count=10000 of=/tmp/loopdisk
> losetup /dev/loop0 /tmp/loopdisk
> echo none > /sys/block/loop0/queue/scheduler
> mkfs -t ext4 -L loopy /dev/loop0
> mount /dev/loop0 /l
> ### build LLVM+clang in /l
> ### 'ninja check-all’ in a loop in /l
> 
> (No swap is setup because the machine has 192 GiB of RAM.)

The 48 vs 96 is probably not that significant. Just to be clear, you can
reproduce something else on tmpfs loopback; the two don't look related
apart from the fact that they are both lockups off the IO completion
path.

>>> What should I try next to help debug this?
>>
>> This one looks different than the other one. Are you sure your hw is
>> sane?
>
> I can build LLVM+clang in /tmp (tmpfs) reliably, which suggests the
> fundamental hardware is sane. It’s only when the software multi-queue
> layer gets involved that I see quick crashes/hangs.
> 
> As for the different backtraces, that's probably because I removed
> nohz_full from the kernel boot parameters.

Hardware issues can manifest themselves in mysterious ways. It might very
well be a software bug, but it'd be the first one of its kind that I've
seen reported. Which does make me a little skeptical; it might just be
the canary in this case.

>> I'd probably try and enable lockdep debugging etc and see if you
>> catch anything.
> 
> Thanks. I turned on lockdep plus other lock debugging. Here is the
> resulting backtrace:
> 
> http://znu.io/IMG_0368.jpg
> 
> Here is the resulting backtrace with transparent huge pages disabled:
> 
> http://znu.io/IMG_0369.jpg
> 
> Here is the resulting backtrace with transparent huge pages disabled
> AND with systemd-coredumps disabled too:
> 
> http://znu.io/IMG_0370.jpg

All of these are off the blk-wbt completion path. I suggested earlier to
try and disable CONFIG_BLK_WBT to see if it goes away, or at least to
see if the pattern changes.

Lockdep didn't catch anything. Maybe try some of the other debugging
features, like page poisoning, memory allocation debugging, slub debug
on-by-default.

> I’m open to trying anything at this point. Thanks for helping,

I'd try other types of stress testing. Has the machine otherwise been
stable, or is it a new box?

-- 
Jens Axboe


* Re: Hard LOCKUP on 4.15-rc9 + 'blkmq/for-next' branch
  2018-01-23  1:20     ` Jens Axboe
@ 2018-01-23 13:48       ` David Zarzycki
  2018-01-23 15:34         ` Jens Axboe
  0 siblings, 1 reply; 9+ messages in thread
From: David Zarzycki @ 2018-01-23 13:48 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block



> On Jan 22, 2018, at 20:20, Jens Axboe <axboe@kernel.dk> wrote:
> 
> All of these are off the blk-wbt completion path. I suggested earlier to
> try and disable CONFIG_BLK_WBT to see if it goes away, or at least to
> see if the pattern changes.

Hi Jens,

Bingo! Disabling CONFIG_BLK_WBT makes the problem go away.

> 
>> I’m open to trying anything at this point. Thanks for helping,
> 
> I'd try other types of stress testing. Has the machine otherwise been
> stable, or is it a new box?

It is a new box. Other than the CONFIG_BLK_WBT problem, it handles
stress just fine. If you want to debug this further, I’m willing to
run instrumented code.

Thanks for helping me,
Dave


* Re: Hard LOCKUP on 4.15-rc9 + 'blkmq/for-next' branch
  2018-01-23 13:48       ` David Zarzycki
@ 2018-01-23 15:34         ` Jens Axboe
  2018-01-23 17:54           ` David Zarzycki
  0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2018-01-23 15:34 UTC (permalink / raw)
  To: David Zarzycki; +Cc: linux-block

On 1/23/18 6:48 AM, David Zarzycki wrote:
> 
> 
>> On Jan 22, 2018, at 20:20, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> All of these are off the blk-wbt completion path. I suggested earlier to
>> try and disable CONFIG_BLK_WBT to see if it goes away, or at least to
>> see if the pattern changes.
> 
> Hi Jens,
> 
> Bingo! Disabling CONFIG_BLK_WBT makes the problem go away.

Interesting. The only thing I can think of is
block/blk-wbt.c:get_rq_wait() returning a bogus pointer, but your
compiler would need to be broken for that. And I think your lockdep
would have exploded if that was the case. See below for a quick'n dirty
patch you can try and run to disprove that theory.

>>> I’m open to trying anything at this point. Thanks for helping,
>>
>> I'd try other types of stress testing. Has the machine otherwise been
>> stable, or is it a new box?
> 
> It is a new box. Other than the CONFIG_BLK_WBT problem, it handles
> stress just fine. If you want to debug this further, I’m willing to
> run instrumented code.

The below is a long shot, but I'll try and think about it some more. I
haven't had any reports like this, ever, so it's very puzzling.


diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index ae8de9780085..5a45e9245d89 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -103,7 +103,7 @@ static bool wb_recent_wait(struct rq_wb *rwb)
 
 static inline struct rq_wait *get_rq_wait(struct rq_wb *rwb, bool is_kswapd)
 {
-	return &rwb->rq_wait[is_kswapd];
+	return &rwb->rq_wait[!!is_kswapd];
 }
 
 static void rwb_wake_all(struct rq_wb *rwb)
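
To spell out what that one-liner guards against: if the byte backing the
bool ever held a value other than 0 or 1 (memory corruption, or the
broken-compiler case above), using it directly as an index would point
past the end of the small rq_wait array, while !! folds any nonzero value
back to exactly 1. Here is a standalone illustration of just that
clamping, with toy stand-ins rather than the real blk-wbt definitions
(the two-slot array and the unsigned char are assumptions made for the
demo):

#include <stdio.h>

/* Toy stand-ins for the demo only, not the kernel's structures. */
struct rq_wait_demo { int inflight; };

int main(void)
{
	struct rq_wait_demo rq_wait[2] = { { 0 }, { 0 } };
	/* An unsigned char stands in for a bool whose storage was scribbled
	 * on, so the demonstration stays well-defined C. */
	unsigned char is_kswapd = 0x7f;

	printf("raw index:     %d (would index far past rq_wait[1])\n", is_kswapd);
	printf("clamped index: %d (always 0 or 1)\n", !!is_kswapd);

	/* &rq_wait[is_kswapd] would be a bogus pointer; &rq_wait[!!is_kswapd]
	 * always lands inside the two-element array. */
	return 0;
}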

-- 
Jens Axboe


* Re: Hard LOCKUP on 4.15-rc9 + 'blkmq/for-next' branch
  2018-01-23 15:34         ` Jens Axboe
@ 2018-01-23 17:54           ` David Zarzycki
  2018-01-23 18:00             ` Jens Axboe
  0 siblings, 1 reply; 9+ messages in thread
From: David Zarzycki @ 2018-01-23 17:54 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block

Hi Jens,

The bug still reproduces with this change. How confident are we that
kernel objects are properly reference counted while they are throttled?

Dave


> On Jan 23, 2018, at 10:34, Jens Axboe <axboe@kernel.dk> wrote:
> 
> On 1/23/18 6:48 AM, David Zarzycki wrote:
>> 
>> 
>>> On Jan 22, 2018, at 20:20, Jens Axboe <axboe@kernel.dk> wrote:
>>> 
>>> All of these are off the blk-wbt completion path. I suggested earlier to
>>> try and disable CONFIG_BLK_WBT to see if it goes away, or at least to
>>> see if the pattern changes.
>> 
>> Hi Jens,
>> 
>> Bingo! Disabling CONFIG_BLK_WBT makes the problem go away.
> 
> Interesting. The only thing I can think of is
> block/blk-wbt.c:get_rq_wait() returning a bogus pointer, but your
> compiler would need to be broken for that. And I think your lockdep
> would have exploded if that was the case. See below for a quick'n dirty
> patch you can try and run to disprove that theory.
> 
>>>> I’m open to trying anything at this point. Thanks for helping,
>>> 
>>> I'd try other types of stress testing. Has the machine otherwise been
>>> stable, or is it a new box?
>> 
>> It is a new box. Other than the CONFIG_BLK_WBT problem, it handles
>> stress just fine. If you want to debug this further, I’m willing to
>> run instrumented code.
> 
> The below is a long shot, but I'll try and think about it some more. I
> haven't had any reports like this, ever, so it's very puzzling.
> 
> 
> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
> index ae8de9780085..5a45e9245d89 100644
> --- a/block/blk-wbt.c
> +++ b/block/blk-wbt.c
> @@ -103,7 +103,7 @@ static bool wb_recent_wait(struct rq_wb *rwb)
> 
> static inline struct rq_wait *get_rq_wait(struct rq_wb *rwb, bool is_kswapd)
> {
> -	return &rwb->rq_wait[is_kswapd];
> +	return &rwb->rq_wait[!!is_kswapd];
> }
> 
> static void rwb_wake_all(struct rq_wb *rwb)
> 
> -- 
> Jens Axboe
> 


* Re: Hard LOCKUP on 4.15-rc9 + 'blkmq/for-next' branch
  2018-01-23 17:54           ` David Zarzycki
@ 2018-01-23 18:00             ` Jens Axboe
  2018-01-23 18:12               ` David Zarzycki
  0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2018-01-23 18:00 UTC (permalink / raw)
  To: David Zarzycki; +Cc: linux-block

On 1/23/18 10:54 AM, David Zarzycki wrote:
> Hi Jens,
> 
> The bug still reproduces with this change. How confident are we that
> kernel objects are properly reference counted while they are
> throttled?

I would be surprised if it made a change; thanks for checking. Since
you're not pulling devices out of your system, there is really nothing
that needs ref counting here. You don't have something running that
turns wbt on/off during the run, I assume?

The whole thing is very odd. You do have lots of processes exiting with
segfaults and similar. The only thing I can think of is the wait queue
entry becoming invalid since it's on the stack, but I don't see how that
can happen. We should exit the wait path normally and remove ourselves
from the list.
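
To make that concern concrete, the pattern in question looks roughly like
the sketch below. It uses the generic wait-queue API for discussion
purposes and is not the actual blk-wbt source; the function and parameter
names are made up. The wait entry lives in the waiter's stack frame, gets
linked into a shared wait queue head, and has to be unlinked via
finish_wait() before the function returns, because a later wake_up()
would otherwise walk into a dead stack frame.

#include <linux/sched.h>
#include <linux/wait.h>

/*
 * Sketch of an on-stack wait: 'wait' is allocated in this stack frame, so
 * it must be off the wait queue by the time this function returns.
 */
static void example_throttle_wait(wait_queue_head_t *wq, bool (*may_proceed)(void))
{
	DEFINE_WAIT(wait);		/* wait entry lives on our stack */

	do {
		prepare_to_wait_exclusive(wq, &wait, TASK_UNINTERRUPTIBLE);
		if (may_proceed())
			break;
		io_schedule();		/* sleep until the completion side wakes us */
	} while (1);

	finish_wait(wq, &wait);		/* unlink from wq before the frame goes away */
}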

-- 
Jens Axboe


* Re: Hard LOCKUP on 4.15-rc9 + 'blkmq/for-next' branch
  2018-01-23 18:00             ` Jens Axboe
@ 2018-01-23 18:12               ` David Zarzycki
  0 siblings, 0 replies; 9+ messages in thread
From: David Zarzycki @ 2018-01-23 18:12 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block



> On Jan 23, 2018, at 13:00, Jens Axboe <axboe@kernel.dk> wrote:
> 
> On 1/23/18 10:54 AM, David Zarzycki wrote:
>> Hi Jens,
>> 
>> The bug still reproduces with this change. How confident are we that
>> kernel objects are properly reference counted while they are
>> throttled?
> 
> I would be surprised if it made a change; thanks for checking. Since
> you're not pulling devices out of your system, there is really nothing
> that needs ref counting here.

I’m not talking about hardware object reference counting, just
basic memory management reference counting. That being said, I’m
not a Linux kernel programmer, so perhaps Linux avoids classic
reference counting somehow.

> You don't have something running that turns wbt on/off during the run,
> I assume?

I doubt it. This is just a plain Red Hat Fedora 27 workstation install
with many of the frills uninstalled afterwards. I didn’t even know
that WBT could be dynamically enabled/disabled.

> 
> The whole thing is very odd. You do have lots of processes exiting with
> segfaults and similar. The only thing I can think of is the wait queue
> entry becoming invalid since it's on the stack, but I don't see how that
> can happen. We should exit the wait path normally and remove ourselves
> from the list.

For whatever it may be worth, I hit this bug once while rebuilding the
Fedora kernel RPM, which is similar in that they both have short-lived
processes, but is different in that the Fedora kernel RPM rebuild
doesn’t intentionally segfault to validate correctness.

Dave


