linux-kernel.vger.kernel.org archive mirror
* [BUG 4.9/4.10] crash in __d_lookup() due to corrupted dentry_hashtable
@ 2017-03-03 13:31 Heiko Carstens
  2017-03-20 12:08 ` Heiko Carstens
From: Heiko Carstens @ 2017-03-03 13:31 UTC (permalink / raw)
  To: Al Viro; +Cc: Gustavo Luiz Ferreira Walbon, linux-fsdevel, linux-kernel

Hello Al,

Gustavo reported the crash below within __d_lookup() on s390. I'm wondering
if you can make any sense of it:

Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: fffffffffffff000 TEID: fffffffffffff803
Fault in home space mode while using kernel ASCE.
AS:0000000000ec8007 R3:00000003dc5f4007 S:0000000000000020 
Oops: 0038 ilc:3 [#1] SMP 
Modules linked in: dm_mirror dm_region_hash dm_log raid1 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx dm_service_time paes_s390 pkey zcrypt rng_core xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter aes_s390 des_s390 des_generic sha512_s390 sha256_s390 vmur sha1_s390 sha_common vhost_net nfsd tun vhost macvtap auth_rpcgss macvlan nfs_acl lockd kvm sch_fq_codel grace sunrpc dm_multipath dm_mod ip_tables x_tables autofs4
CPU: 1 PID: 31708 Comm: fio Tainted: G        W       4.10.0-20170228.0.1e014e7.d661408.fc24.s390xperformance #1
Hardware name: IBM              2964 N96              704              (z/VM)
task: 00000000f7e55d00 task.stack: 000000023124c000
Krnl PSW : 0704e00180000000 000000000033539e (__d_lookup+0x6e/0x168)
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 000000000000002f 000003e0806248f8 0000000082313080 0000000000d2abe0
           000000023124fc00 000000023124fbfc 00000000003b08a8 00000003dafac220
           0000000082313080 0000000000000000 000000023124fdc8 000000026248fcce
           fffffffffffffffe 000000000094b420 000000023124faf0 000000023124faa0
Krnl Code: 0000000000335392: b9040082            lgr     %r8,%r2
           0000000000335396: a7980000            lhi     %r9,0
          #000000000033539a: a7f40008            brc     15,3353aa
          >000000000033539e: e3c0c0000004        lg      %r12,0(%r12)
           00000000003353a4: ecc8001d007c        cgij    %r12,0,8,3353de
           00000000003353aa: 59b0c01c            c       %r11,28(%r12)
           00000000003353ae: a774fff8            brc     7,33539e
           00000000003353b2: 4120c050            la      %r2,80(%r12)
Call Trace:
([<0000000082313080>] 0x82313080)
 [<0000000000327026>] lookup_fast+0x19e/0x340 
 [<0000000000327530>] walk_component+0x48/0x358 
 [<0000000000327980>] link_path_walk+0x140/0x508 
 [<0000000000328806>] path_openat+0xae/0x1320 
 [<000000000032af26>] do_filp_open+0x86/0x108 
 [<0000000000315b5c>] do_sys_open+0x174/0x250 
 [<0000000000922d5c>] system_call+0xc4/0x264 
Last Breaking-Event-Address:
 [<00000000003353ae>] __d_lookup+0x7e/0x168
 
Kernel panic - not syncing: Fatal exception: panic_on_oops

Looking at the relevant part of __d_lookup:

struct dentry *__d_lookup(const struct dentry *parent, const struct qstr *name)
{
	unsigned int hash = name->hash;
	struct hlist_bl_head *b = d_hash(hash);  <--- points to corrupted entry
	struct hlist_bl_node *node;
	struct dentry *found = NULL;
	struct dentry *dentry;

	rcu_read_lock();
	
	hlist_bl_for_each_entry_rcu(dentry, node, b, d_hash) {

		if (dentry->d_name.hash != hash)
			continue;
...

The contents of *b in the dump are:

> struct hlist_bl_head 000003e0806248f8
struct hlist_bl_head {
	first = 0xffffffffffffffff
}

Note that 0x000003e0806248f8 is a valid address within the
dentry_hashtable. In addition all other entries look ok, as far as I can
tell. This is the only entry that contains a -1UL value.
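As a cross-check of the register dump against this head value, here is a minimal userspace sketch. It assumes the SMP definition of LIST_BL_LOCKMASK from include/linux/list_bl.h (bit 0 of "first" is the per-bucket bit lock and is masked off before the pointer is followed). The helper names bl_first() and page_base() are made up for illustration and are not kernel functions.

```c
#include <stdint.h>

/* Assumption: mirrors LIST_BL_LOCKMASK on SMP kernels, where bit 0 of
 * hlist_bl_head.first is the bucket's bit spinlock, not part of the pointer. */
#define LIST_BL_LOCKMASK 1ULL

/* Illustrative helper: the node pointer the list walk would dereference
 * for a given raw head value, i.e. the head with the lock bit cleared. */
static uint64_t bl_first(uint64_t raw_first)
{
	return raw_first & ~LIST_BL_LOCKMASK;
}

/* Illustrative helper: round an address down to its 4 KiB page. */
static uint64_t page_base(uint64_t addr)
{
	return addr & ~0xfffULL;
}
```

With first = 0xffffffffffffffff, bl_first() yields 0xfffffffffffffffe, which is the value in %r12 at the faulting "lg %r12,0(%r12)", and its page base is fffffffffffff000, which matches the reported failing address. So the oops looks fully consistent with the walk starting from this one bad head rather than from a corrupted dentry further down the chain.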

We also have a second dump with a similar crash with a 4.9 kernel. In that
case there are in total three entries spread within the dentry_hashtable
with a -1UL value, while all other entries seem to look ok. So there seems
to be a pattern.
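The check for such entries can be sketched as a plain scan over a copy of the bucket heads. This is only an illustration of the idea, not the tooling used on the dumps; count_poisoned() and POISON are hypothetical names, and the table is assumed to be available as an array of 64-bit head values.

```c
#include <stddef.h>
#include <stdint.h>

#define POISON (~(uint64_t)0)	/* the -1UL pattern seen in the dumps */

/* Hypothetical helper: count bucket heads equal to -1UL in a copy of the
 * hash table. If first_idx is non-NULL and at least one hit is found, the
 * index of the first hit is stored there. */
static size_t count_poisoned(const uint64_t *heads, size_t n, size_t *first_idx)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++) {
		if (heads[i] != POISON)
			continue;
		if (hits == 0 && first_idx)
			*first_idx = i;
		hits++;
	}
	return hits;
}
```

A scan like this only finds heads that are exactly -1UL; it says nothing about whether the remaining chains are intact, which would need a walk of each bucket.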

Note: these kernels do contain addon patches that are not mainline, but I
don't believe that any of those can explain these corruptions.


* Re: [BUG 4.9/4.10] crash in __d_lookup() due to corrupted dentry_hashtable
  2017-03-03 13:31 [BUG 4.9/4.10] crash in __d_lookup() due to corrupted dentry_hashtable Heiko Carstens
@ 2017-03-20 12:08 ` Heiko Carstens
From: Heiko Carstens @ 2017-03-20 12:08 UTC (permalink / raw)
  To: Al Viro, Gustavo Luiz Ferreira Walbon, linux-fsdevel, linux-kernel

On Fri, Mar 03, 2017 at 02:31:50PM +0100, Heiko Carstens wrote:
> Hello Al,
> 
> Gustavo reported the crash below within __d_lookup() on s390. I'm wondering
> if you can make any sense of it:
> 
> Unable to handle kernel pointer dereference in virtual kernel address space
> Failing address: fffffffffffff000 TEID: fffffffffffff803
> Fault in home space mode while using kernel ASCE.

...

> Kernel panic - not syncing: Fatal exception: panic_on_oops
> 
> Looking at the relevant part of __d_lookup:
> 
> struct dentry *__d_lookup(const struct dentry *parent, const struct qstr *name)
> {
> 	unsigned int hash = name->hash;
> 	struct hlist_bl_head *b = d_hash(hash);  <--- points to corrupted entry
> 	struct hlist_bl_node *node;
> 	struct dentry *found = NULL;
> 	struct dentry *dentry;
> 
> 	rcu_read_lock();
> 	
> 	hlist_bl_for_each_entry_rcu(dentry, node, b, d_hash) {
> 
> 		if (dentry->d_name.hash != hash)
> 			continue;
> ...
> 
> The contents of *b within the dump is:
> 
> > struct hlist_bl_head 000003e0806248f8
> struct hlist_bl_head {
> 	first = 0xffffffffffffffff
> }
> 
> Note that 0x000003e0806248f8 is a valid address within the
> dentry_hashtable. In addition all other entries look ok, as far as I can
> tell. This is the only entry that contains a -1UL value.
> 
> We also have a second dump with a similar crash with a 4.9 kernel. In that
> case there are in total three entries spread within the dentry_hashtable
> with a -1UL value, while all other entries seem to look ok. So there seems
> to be a pattern.
> 
> Note: these kernels do contain addon patches that are not mainline, but I
> don't believe that any of those can explain these corruptions.

Famous last words... it looks like it was indeed one of our addon patches.

At least, with the bug fixed, Gustavo reported that the system now survives
a 60h stress test, which it previously didn't.
