* Bug in dm_any_congested?
@ 2015-11-10 13:14 Boštjan Škufca @ Teon.si
  2015-11-10 14:39 ` Zdenek Kabelac
  0 siblings, 1 reply; 4+ messages in thread
From: Boštjan Škufca @ Teon.si @ 2015-11-10 13:14 UTC (permalink / raw)
  To: dm-devel

Hi all,

(I am not sure if this is the right address for this bug, but dm_* is
the first function listed in the trace.)

This happened when I tried dd-ing an LV, which resides on RAID10
software storage, to the network via netcat. The user-visible
manifestation was a segfault of the dd process used for reading.

On the first attempt the error reported was "unable to handle kernel
NULL pointer dereference...", but on subsequent tries this changed to
"general protection fault"; see below.

The HW is a bit dated, but I had no problems with it up to now, and
software RAID is used here. The kernel was 4.2.4.

Is this the right mailing list for such a bug?

Tnx,
b.
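
(For context, below is a hypothetical sketch of the read side; it is not
the exact command used, and the device path and block size are
placeholders. dd's role here is just a sequential read() loop over the
LV's block device, which is the same path visible in the traces that
follow: SyS_read -> vfs_read -> blkdev_read_iter ->
generic_file_read_iter -> page_cache_async_readahead.)

/* Hypothetical sketch: dd-style sequential read of a block device. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	static char buf[1 << 20];                     /* 1 MiB read buffer */
	int fd = open("/dev/vg0/some_lv", O_RDONLY);  /* placeholder path  */
	ssize_t n;

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Copy loop: read the device and stream it to stdout (e.g. netcat). */
	while ((n = read(fd, buf, sizeof(buf))) > 0) {
		if (write(STDOUT_FILENO, buf, (size_t)n) != n) {
			perror("write");
			break;
		}
	}
	if (n < 0)
		perror("read");

	close(fd);
	return 0;
}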

2015-11-10T02:03:26+00:00 ring kernel: [1251477.590992] BUG: unable to
handle kernel NULL pointer dereference at 00000000000004f0
2015-11-10T02:03:26+00:00 ring kernel: [1251477.591361] IP:
[<ffffffff818637c6>] dm_any_congested+0x26/0x50
2015-11-10T02:03:26+00:00 ring kernel: [1251477.591565] PGD 1afdc6067
PUD 5106e8067 PMD 0
2015-11-10T02:03:26+00:00 ring kernel: [1251477.591827] Oops: 0000 [#1] SMP
2015-11-10T02:03:26+00:00 ring kernel: [1251477.592057] CPU: 2 PID:
4277 Comm: dd Tainted: G        W       4.2.4 #1
2015-11-10T02:03:26+00:00 ring kernel: [1251477.592226] Hardware name:
HP ProLiant DL360 G5, BIOS P58 04/26/2010
2015-11-10T02:03:26+00:00 ring kernel: [1251477.592392] task:
ffff88080b41a400 ti: ffff8800caae8000 task.ti: ffff8800caae8000
2015-11-10T02:03:26+00:00 ring kernel: [1251477.592682] RIP:
0010:[<ffffffff818637c6>]  [<ffffffff818637c6>]
dm_any_congested+0x26/0x50
2015-11-10T02:03:26+00:00 ring kernel: [1251477.593007] RSP:
0018:ffff8800caaebcf0  EFLAGS: 00010206
2015-11-10T02:03:26+00:00 ring kernel: [1251477.593175] RAX:
0000000000000002 RBX: ffff88080ac30770 RCX: 6b7369440f000000
2015-11-10T02:03:26+00:00 ring kernel: [1251477.593465] RDX:
0000000000000000 RSI: 0000000000000002 RDI: ffff880805a34800
2015-11-10T02:03:26+00:00 ring kernel: [1251477.593753] RBP:
ffff8800caaebd28 R08: 0000000000000100 R09: 0000000000000100
2015-11-10T02:03:26+00:00 ring kernel: [1251477.594041] R10:
0000000000000200 R11: ffff8800caaebcf0 R12: 0000000000000000
2015-11-10T02:03:26+00:00 ring kernel: [1251477.594331] R13:
ffff88001b7b0d00 R14: ffffea00138c6340 R15: ffff8800caaebe80
2015-11-10T02:03:26+00:00 ring kernel: [1251477.594621] FS:
00007f1f99c91740(0000) GS:ffff88082fa80000(0000)
knlGS:0000000000000000
2015-11-10T02:03:26+00:00 ring kernel: [1251477.594911] CS:  0010 DS:
0000 ES: 0000 CR0: 000000008005003b
2015-11-10T02:03:26+00:00 ring kernel: [1251477.595076] CR2:
00000000000004f0 CR3: 00000002254cb000 CR4: 00000000000026e0
2015-11-10T02:03:26+00:00 ring kernel: [1251477.595365] Stack:
2015-11-10T02:03:26+00:00 ring kernel: [1251477.595524]
ffffffff81200798 ffff8800caaebd18 ffffffff8116a18e 0000000000000000
2015-11-10T02:03:26+00:00 ring kernel: [1251477.595943]
ffff88001b7b0da0 ffff88080ac308d8 ffff88001b7b0d00 ffff8800caaebd68
2015-11-10T02:03:26+00:00 ring kernel: [1251477.596362]
ffffffff8117629e 0000000000000292 0000000000000100 0000000000000100
2015-11-10T02:03:26+00:00 ring kernel: [1251477.596784] Call Trace:
2015-11-10T02:03:26+00:00 ring kernel: [1251477.596947]
[<ffffffff81200798>] ? inode_congested+0x58/0x120
2015-11-10T02:03:26+00:00 ring kernel: [1251477.597115]
[<ffffffff8116a18e>] ? find_get_entry+0x1e/0x80
2015-11-10T02:03:26+00:00 ring kernel: [1251477.597282]
[<ffffffff8117629e>] page_cache_async_readahead+0x4e/0x70
2015-11-10T02:03:26+00:00 ring kernel: [1251477.597450]
[<ffffffff8116b211>] generic_file_read_iter+0x391/0x5b0
2015-11-10T02:03:26+00:00 ring kernel: [1251477.597617]
[<ffffffff8120c677>] blkdev_read_iter+0x37/0x40
2015-11-10T02:03:26+00:00 ring kernel: [1251477.597784]
[<ffffffff811d7757>] __vfs_read+0xa7/0xd0
2015-11-10T02:03:26+00:00 ring kernel: [1251477.597949]
[<ffffffff811d7802>] vfs_read+0x82/0x130
2015-11-10T02:03:26+00:00 ring kernel: [1251477.598114]
[<ffffffff811d7c76>] SyS_read+0x46/0xb0
2015-11-10T02:03:26+00:00 ring kernel: [1251477.598280]
[<ffffffff81a7556e>] entry_SYSCALL_64_fastpath+0x12/0x71
2015-11-10T02:03:26+00:00 ring kernel: [1251477.598446] Code: 80 00 00
00 00 66 66 66 66 90 48 8b 97 28 01 00 00 89 f0 83 e2 01 75 38 48 8b
8f e8 00 00 00 48 85 c9 74 1c 48 8b 97 30 01 00 00 <48> 8b b2 f0 04 00
00 f7 c6 00 08 00 00 74 07 23 82 d0 01 00 00
2015-11-10T02:03:26+00:00 ring kernel: [1251477.600930] RIP
[<ffffffff818637c6>] dm_any_congested+0x26/0x50
2015-11-10T02:03:26+00:00 ring kernel: [1251477.600930]  RSP <ffff8800caaebcf0>
2015-11-10T02:03:26+00:00 ring kernel: [1251477.600930] CR2: 00000000000004f0
2015-11-10T02:03:26+00:00 ring kernel: [1251477.601561] ---[ end trace
17914a53c55dd503 ]---



Second and subsequent occurrences:
2015-11-10T02:04:46+00:00 ring kernel: [1251557.440486] general
protection fault: 0000 [#2] SMP
2015-11-10T02:04:46+00:00 ring kernel: [1251557.440732] CPU: 7 PID:
4806 Comm: dd Tainted: G      D W       4.2.4 #1
2015-11-10T02:04:46+00:00 ring kernel: [1251557.440904] Hardware name:
HP ProLiant DL360 G5, BIOS P58 04/26/2010
2015-11-10T02:04:46+00:00 ring kernel: [1251557.441071] task:
ffff88019b6c9800 ti: ffff88062c96c000 task.ti: ffff88062c96c000
2015-11-10T02:04:46+00:00 ring kernel: [1251557.441369] RIP:
0010:[<ffffffff818637c6>]  [<ffffffff818637c6>]
dm_any_congested+0x26/0x50
2015-11-10T02:04:46+00:00 ring kernel: [1251557.441699] RSP:
0018:ffff88062c96fcf0  EFLAGS: 00010206
2015-11-10T02:04:46+00:00 ring kernel: [1251557.441863] RAX:
0000000000000002 RBX: ffff88080ac30770 RCX: 6566667542207972
2015-11-10T02:04:46+00:00 ring kernel: [1251557.442152] RDX:
1b0a1bbb00007972 RSI: 0000000000000002 RDI: ffff880805a34800
2015-11-10T02:04:46+00:00 ring kernel: [1251557.442445] RBP:
ffff88062c96fd28 R08: 0000000000000100 R09: 0000000000000100
2015-11-10T02:04:46+00:00 ring kernel: [1251557.442733] R10:
0000000000000200 R11: ffff88062c96fcf0 R12: 0000000000000000
2015-11-10T02:04:46+00:00 ring kernel: [1251557.443021] R13:
ffff88080587ec00 R14: ffffea001699af40 R15: ffff88062c96fe80
2015-11-10T02:04:46+00:00 ring kernel: [1251557.443310] FS:
00007f0dc3de2740(0000) GS:ffff88082fbc0000(0000)
knlGS:0000000000000000
2015-11-10T02:04:46+00:00 ring kernel: [1251557.443603] CS:  0010 DS:
0000 ES: 0000 CR0: 000000008005003b
2015-11-10T02:04:46+00:00 ring kernel: [1251557.443768] CR2:
00000000b768a0a0 CR3: 00000000cab12000 CR4: 00000000000026e0
2015-11-10T02:04:46+00:00 ring kernel: [1251557.447111] Stack:
2015-11-10T02:04:46+00:00 ring kernel: [1251557.447269]
ffffffff81200798 ffff88062c96fd18 ffffffff8116a18e 0000000000000000
2015-11-10T02:04:46+00:00 ring kernel: [1251557.447688]
ffff88080587eca0 ffff88080ac308d8 ffff88080587ec00 ffff88062c96fd68
2015-11-10T02:04:46+00:00 ring kernel: [1251557.448109]
ffffffff8117629e 0000000000000001 0000000000000100 0000000000000100
2015-11-10T02:04:46+00:00 ring kernel: [1251557.448528] Call Trace:
2015-11-10T02:04:46+00:00 ring kernel: [1251557.448691]
[<ffffffff81200798>] ? inode_congested+0x58/0x120
2015-11-10T02:04:46+00:00 ring kernel: [1251557.448859]
[<ffffffff8116a18e>] ? find_get_entry+0x1e/0x80
2015-11-10T02:04:46+00:00 ring kernel: [1251557.449028]
[<ffffffff8117629e>] page_cache_async_readahead+0x4e/0x70
2015-11-10T02:04:46+00:00 ring kernel: [1251557.449195]
[<ffffffff8116b211>] generic_file_read_iter+0x391/0x5b0
2015-11-10T02:04:46+00:00 ring kernel: [1251557.449362]
[<ffffffff8120c677>] blkdev_read_iter+0x37/0x40
2015-11-10T02:04:46+00:00 ring kernel: [1251557.449530]
[<ffffffff811d7757>] __vfs_read+0xa7/0xd0
2015-11-10T02:04:46+00:00 ring kernel: [1251557.449695]
[<ffffffff811d7802>] vfs_read+0x82/0x130
2015-11-10T02:04:46+00:00 ring kernel: [1251557.449859]
[<ffffffff811d7c76>] SyS_read+0x46/0xb0
2015-11-10T02:04:46+00:00 ring kernel: [1251557.450030]
[<ffffffff81a7556e>] entry_SYSCALL_64_fastpath+0x12/0x71
2015-11-10T02:04:46+00:00 ring kernel: [1251557.450197] Code: 80 00 00
00 00 66 66 66 66 90 48 8b 97 28 01 00 00 89 f0 83 e2 01 75 38 48 8b
8f e8 00 00 00 48 85 c9 74 1c 48 8b 97 30 01 00 00 <48> 8b b2 f0 04 00
00 f7 c6 00 08 00 00 74 07 23 82 d0 01 00 00
2015-11-10T02:04:46+00:00 ring kernel: [1251557.450465] RIP
[<ffffffff818637c6>] dm_any_congested+0x26/0x50
2015-11-10T02:04:46+00:00 ring kernel: [1251557.450465]  RSP <ffff88062c96fcf0>
2015-11-10T02:04:46+00:00 ring kernel: [1251557.453232] ---[ end trace
17914a53c55dd504 ]---


* Re: Bug in dm_any_congested?
  2015-11-10 13:14 Bug in dm_any_congested? Boštjan Škufca @ Teon.si
@ 2015-11-10 14:39 ` Zdenek Kabelac
  2015-11-10 15:02   ` Boštjan Škufca @ Teon.si
  0 siblings, 1 reply; 4+ messages in thread
From: Zdenek Kabelac @ 2015-11-10 14:39 UTC (permalink / raw)
  To: device-mapper development

On 10.11.2015 at 14:14, Boštjan Škufca @ Teon.si wrote:
> Hi all,
>
> (I am not sure if this is the right address for this bug, but dm_* is
> the first function listed in the trace.)
>
> This happened when I tried dd-ing an LV, which resides on RAID10
> software storage, to the network via netcat. The user-visible
> manifestation was a segfault of the dd process used for reading.
>
> On the first attempt the error reported was "unable to handle kernel
> NULL pointer dereference...", but on subsequent tries this changed to
> "general protection fault"; see below.
>
> The HW is a bit dated, but I had no problems with it up to now, and
> software RAID is used here. The kernel was 4.2.4.
>
> Is this the right mailing list for such a bug?

Hi

Yes, the issue is known, but its source is not fully understood.
I've opened a public BZ: https://bugzilla.redhat.com/1279941
There is a potential fix, but it is unclear exactly what it solves:
http://git.kernel.org/linus/ad5f498f610

Regards

Zdenek


* Re: Bug in dm_any_congested?
  2015-11-10 14:39 ` Zdenek Kabelac
@ 2015-11-10 15:02   ` Boštjan Škufca @ Teon.si
  2015-11-10 17:27     ` Linux >= 4.2 dm_any_congested bug due to bad data from vfs/mm? [was: Bug in dm_any_congested?] Mike Snitzer
  0 siblings, 1 reply; 4+ messages in thread
From: Boštjan Škufca @ Teon.si @ 2015-11-10 15:02 UTC (permalink / raw)
  To: device-mapper development

On 10 November 2015 at 15:39, Zdenek Kabelac <zkabelac@redhat.com> wrote:
> On 10.11.2015 at 14:14, Boštjan Škufca @ Teon.si wrote:
>>
>> Hi all,
>>
>> The HW is a bit dated, but I had no problems with it up to now, and
>> software RAID is used here. The kernel was 4.2.4.
>>
>> Is this the right mailing list for such a bug?
>
>
> Hi
>
> Yes, the issue is known, but its source is not fully understood.
> I've opened a public BZ: https://bugzilla.redhat.com/1279941
> There is a potential fix, but it is unclear exactly what it solves:
> http://git.kernel.org/linus/ad5f498f610

So is 4.1.13 OK in this respect, or is this unknown at the moment?

Does it depend on the underlying storage at all, or not? MD does not
seem to be listed in the stack trace.

b.



* Linux >= 4.2 dm_any_congested bug due to bad data from vfs/mm? [was: Bug in dm_any_congested?]
  2015-11-10 15:02   ` Boštjan Škufca @ Teon.si
@ 2015-11-10 17:27     ` Mike Snitzer
  0 siblings, 0 replies; 4+ messages in thread
From: Mike Snitzer @ 2015-11-10 17:27 UTC (permalink / raw)
  To: Boštjan Škufca @ Teon.si
  Cc: device-mapper development, linux-kernel, linux-fsdevel

[Cc'ing LKML and linux-fsdevel to cast a wider net and raise awareness]

On Tue, Nov 10 2015 at 10:02am -0500,
Boštjan Škufca @ Teon.si <bostjan@teon.si> wrote:

> On 10 November 2015 at 15:39, Zdenek Kabelac <zkabelac@redhat.com> wrote:
> > On 10.11.2015 at 14:14, Boštjan Škufca @ Teon.si wrote:
> >>
> >> Hi all,
> >>
> >> The HW is a bit dated, but I had no problems with it up to now, and
> >> software RAID is used here. The kernel was 4.2.4.
> >>
> >> Is this the right mailing list for such a bug?
> >
> >
> > Hi
> >
> > Yes, the issue is known, but its source is not fully understood.
> > I've opened a public BZ: https://bugzilla.redhat.com/1279941
> > There is a potential fix, but it is unclear exactly what it solves:
> > http://git.kernel.org/linus/ad5f498f610
> 
> So is 4.1.13 OK in this respect, or is this unknown at the moment?
>
> Does it depend on the underlying storage at all, or not? MD does not
> seem to be listed in the stack trace.

We don't yet have a reliable reproducer, so if your test reliably
reproduces the issue for you then we may be able to make much quicker
progress.

While the bug manifests as a crash in dm_any_congested (either a NULL
pointer dereference or a GPF), it _seems_ that the problem is further up
the stack in the vfs and/or mm (garbage being passed into
dm_any_congested via the call to queue->backing_dev_info.congested_fn).
But all possibilities are still on the table... again, not much to go on
yet.
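
For illustration, here is a minimal userspace sketch of the callback
pattern involved; every name in it is a hypothetical mock, not the
actual kernel code. The block layer holds only a congested_fn function
pointer plus an opaque congested_data pointer, and the callback
dereferences what it is handed (the sketch simplifies by clobbering the
opaque pointer itself; in the real traces the bad pointer may equally be
one the callback loads from its private data). If I'm reading the Code:
bytes right, the faulting instruction in both traces is a load at offset
0x4f0 from %rdx, which is NULL in the first oops (matching CR2 =
00000000000004f0) and non-canonical garbage in the second, consistent
with a clobbered pointer somewhere along exactly this chain.

/*
 * Illustration only: a userspace mock, NOT kernel code.  The caller
 * keeps a function pointer plus an opaque congested_data pointer and
 * the callback trusts that pointer unconditionally, so a NULL or stale
 * value faults inside the callback even though the bad value originated
 * elsewhere.
 */
#include <stdio.h>
#include <stdlib.h>

struct mock_mapped_device {
	char pad[0x4f0];       /* place the field at offset 0x4f0, matching */
	unsigned long state;   /* the faulting offset seen in the oops      */
};

struct mock_backing_dev_info {
	int  (*congested_fn)(void *congested_data, int bdi_bits);
	void *congested_data;  /* opaque, owned by the "driver"             */
};

/* Rough analogue of dm_any_congested(): blindly dereferences its data. */
static int mock_any_congested(void *congested_data, int bdi_bits)
{
	struct mock_mapped_device *md = congested_data;

	return (int)(md->state & bdi_bits);  /* faults if md is NULL/garbage */
}

int main(void)
{
	struct mock_mapped_device *md = calloc(1, sizeof(*md));
	struct mock_backing_dev_info bdi = {
		.congested_fn   = mock_any_congested,
		.congested_data = md,
	};

	/* Normal case: valid pointer, the query works. */
	printf("congested: %d\n", bdi.congested_fn(bdi.congested_data, 0x2));

	/*
	 * Failure mode analogous to the oops: the opaque pointer gets
	 * clobbered (NULL here, arbitrary garbage in the GPF case).  The
	 * caller cannot tell, and the next query dereferences address
	 * 0x4f0, the same faulting address as in the first trace.
	 */
	bdi.congested_data = NULL;
	printf("congested: %d\n", bdi.congested_fn(bdi.congested_data, 0x2));

	free(md);
	return 0;
}

With the pointer zeroed, the mock's second query faults at address
0x4f0, just like the first trace.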

Please feel free to test using the 4.4 stable@ commit Zdenek referenced
(but I'm skeptical it'll fix this issue if you aren't reactivating
volumes or anything): http://git.kernel.org/linus/ad5f498f610

Also, you're welcome to update this BZ as you collect additional info:
https://bugzilla.redhat.com/1279941

Thanks,
Mike

