From: Ronnie Sahlberg <lsahlber@redhat.com>
To: David Wysochanski <dwysocha@redhat.com>
Cc: linux-cifs <linux-cifs@vger.kernel.org>,
Frank Sorenson <sorenson@redhat.com>
Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
Date: Wed, 16 Oct 2019 20:17:18 -0400 (EDT)
Message-ID: <1206360169.6955748.1571271438699.JavaMail.zimbra@redhat.com>
In-Reply-To: <CALF+zOkugWpn6aCApqj8dF+AovgbQ8zgC-Hf8_0uvwqwHYTPiw@mail.gmail.com>
I cannot reproduce this :-(
I have run it for a few hours, restarting samba in a loop with up to 30 threads.

Can you check:
1. Does this reproduce only for the root of the share, or also for a subdirectory?
2. Does it also reproduce if you use the "nohandlecache" mount option?
   This disables the cached open of the root handle, i.e. open_shroot().
3. When this happens, can you check the contents of the mid entry, in particular these fields:
   mid->mid_flags, mid->handle (a function pointer; what does it point to?), and
   mid->command. Maybe print the whole structure.
regards
ronnie sahlberg
----- Original Message -----
> From: "David Wysochanski" <dwysocha@redhat.com>
> To: "linux-cifs" <linux-cifs@vger.kernel.org>
> Cc: "Frank Sorenson" <sorenson@redhat.com>
> Sent: Thursday, 17 October, 2019 5:27:02 AM
> Subject: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
>
> I think this has been there for a long time, since we first saw this
> on a 4.18.0 based kernel but I just noticed the bug recently.
> I just retested on 5.4-rc3 and it's still there. Easy to repro with a
> fairly simple but invasive server restart test - takes only maybe a
> couple minutes on my VM.
>
>
> From Frank Sorenson:
>
> mount off a samba server:
>
> # mount //vm1/share /mnt/vm1 -o vers=2.1,hard,sec=ntlmssp,credentials=/root/.smb_creds
>
>
> on the client, start 10 'find' loops:
>
> # export test_path=/mnt/vm1
> # do_find() { while true ; do find $test_path >/dev/null 2>&1 ; done }
>
> # for i in {1..10} ; do do_find & done
>
>
> optional: also start something to monitor for when the hang occurs:
>
> # while true ; do count=$(grep smb2_reconnect /proc/*/stack -A3 | grep -c open_shroot) ; [[ $count -gt 0 ]] && { echo "$(date): reproduced bug" ; break ; } ; echo "$(date): stayin' alive" ; sleep 2 ; done
>
>
>
> On the samba server: restart smb.service (loop it in case it requires
> more than one restart):
>
> # while true ; do echo "$(date): restarting" ; systemctl restart smb.service ; sleep 5 ; done | tee /var/tmp/smb_restart_log.out
>
>
>
>
> [ 430.454897] list_del corruption. prev->next should be ffff98d3a8f316c0, but was 2e885cb266355469
> [ 430.464668] ------------[ cut here ]------------
> [ 430.466569] kernel BUG at lib/list_debug.c:51!
> [ 430.468476] invalid opcode: 0000 [#1] SMP PTI
> [ 430.470286] CPU: 0 PID: 13267 Comm: cifsd Kdump: loaded Not tainted 5.4.0-rc3+ #19
> [ 430.473472] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [ 430.475872] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
> [ 430.478129] Code: 5e 15 8e e8 54 a3 c5 ff 0f 0b 48 c7 c7 78 5f 15 8e e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 38 5f 15 8e e8 32 a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 00 5f 15 8e e8 1e a3 c5 ff 0f 0b
> [ 430.485563] RSP: 0018:ffffb4db0042fd38 EFLAGS: 00010246
> [ 430.487665] RAX: 0000000000000054 RBX: ffff98d3aabb8800 RCX: 0000000000000000
> [ 430.490513] RDX: 0000000000000000 RSI: ffff98d3b7a17908 RDI: ffff98d3b7a17908
> [ 430.493383] RBP: ffff98d3a8f316c0 R08: ffff98d3b7a17908 R09: 0000000000000285
> [ 430.496258] R10: ffffb4db0042fbf0 R11: ffffb4db0042fbf5 R12: ffff98d3aabb89c0
> [ 430.499113] R13: ffffb4db0042fd48 R14: 2e885cb266355469 R15: ffff98d3b24c4480
> [ 430.501981] FS: 0000000000000000(0000) GS:ffff98d3b7a00000(0000) knlGS:0000000000000000
> [ 430.505232] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 430.507546] CR2: 00007f08cd17b9c0 CR3: 000000023484a000 CR4: 00000000000406f0
> [ 430.510426] Call Trace:
> [ 430.511500] cifs_reconnect+0x25e/0x610 [cifs]
> [ 430.513350] cifs_readv_from_socket+0x220/0x250 [cifs]
> [ 430.515464] cifs_read_from_socket+0x4a/0x70 [cifs]
> [ 430.517452] ? try_to_wake_up+0x212/0x650
> [ 430.519122] ? cifs_small_buf_get+0x16/0x30 [cifs]
> [ 430.521086] ? allocate_buffers+0x66/0x120 [cifs]
> [ 430.523019] cifs_demultiplex_thread+0xdc/0xc30 [cifs]
> [ 430.525116] kthread+0xfb/0x130
> [ 430.526421] ? cifs_handle_standard+0x190/0x190 [cifs]
> [ 430.528514] ? kthread_park+0x90/0x90
> [ 430.530019] ret_from_fork+0x35/0x40
> [ 430.531487] Modules linked in: cifs libdes libarc4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables crct10dif_pclmul crc32_pclmul joydev virtio_balloon ghash_clmulni_intel i2c_piix4 nfsd nfs_acl lockd auth_rpcgss grace sunrpc xfs libcrc32c virtio_net net_failover crc32c_intel virtio_console serio_raw virtio_blk ata_generic failover pata_acpi qemu_fw_cfg
> [ 430.552782] ---[ end trace c91d4468f8689482 ]---
> [ 430.554948] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
> [ 430.557251] Code: 5e 15 8e e8 54 a3 c5 ff 0f 0b 48 c7 c7 78 5f 15 8e e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 38 5f 15 8e e8 32 a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 00 5f 15 8e e8 1e a3 c5 ff 0f 0b
> [ 430.565019] RSP: 0018:ffffb4db0042fd38 EFLAGS: 00010246
> [ 430.567181] RAX: 0000000000000054 RBX: ffff98d3aabb8800 RCX: 0000000000000000
> [ 430.570073] RDX: 0000000000000000 RSI: ffff98d3b7a17908 RDI: ffff98d3b7a17908
> [ 430.572955] RBP: ffff98d3a8f316c0 R08: ffff98d3b7a17908 R09: 0000000000000285
> [ 430.575854] R10: ffffb4db0042fbf0 R11: ffffb4db0042fbf5 R12: ffff98d3aabb89c0
> [ 430.578745] R13: ffffb4db0042fd48 R14: 2e885cb266355469 R15: ffff98d3b24c4480
> [ 430.581624] FS: 0000000000000000(0000) GS:ffff98d3b7a00000(0000) knlGS:0000000000000000
> [ 430.584881] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 430.587230] CR2: 00007f08cd17b9c0 CR3: 000000023484a000 CR4: 00000000000406f0
>
>
> crash> dis -lr cifs_reconnect+0x25e | tail --lines=20
> 0xffffffffc062dc26 <cifs_reconnect+0x226>: movb $0x0,0xbb36b(%rip) # 0xffffffffc06e8f98 <GlobalMid_Lock>
> /mnt/build/kernel/fs/cifs/connect.c: 572
> 0xffffffffc062dc2d <cifs_reconnect+0x22d>: mov %r12,%rdi
> 0xffffffffc062dc30 <cifs_reconnect+0x230>: callq 0xffffffff8d9d5a20 <mutex_unlock>
> /mnt/build/kernel/fs/cifs/connect.c: 574
> 0xffffffffc062dc35 <cifs_reconnect+0x235>: testb $0x1,0xbb300(%rip) # 0xffffffffc06e8f3c <cifsFYI>
> 0xffffffffc062dc3c <cifs_reconnect+0x23c>: je 0xffffffffc062dc43 <cifs_reconnect+0x243>
> /mnt/build/kernel/./arch/x86/include/asm/jump_label.h: 25
> 0xffffffffc062dc3e <cifs_reconnect+0x23e>: data32 data32 data32 xchg %ax,%ax
> /mnt/build/kernel/fs/cifs/connect.c: 575
> 0xffffffffc062dc43 <cifs_reconnect+0x243>: mov 0x8(%rsp),%rbp
> 0xffffffffc062dc48 <cifs_reconnect+0x248>: mov 0x0(%rbp),%r14
> 0xffffffffc062dc4c <cifs_reconnect+0x24c>: cmp %r13,%rbp
> 0xffffffffc062dc4f <cifs_reconnect+0x24f>: jne 0xffffffffc062dc56 <cifs_reconnect+0x256>
> 0xffffffffc062dc51 <cifs_reconnect+0x251>: jmp 0xffffffffc062dc90 <cifs_reconnect+0x290>
> 0xffffffffc062dc53 <cifs_reconnect+0x253>: mov %rax,%r14
> /mnt/build/kernel/./include/linux/list.h: 190
> 0xffffffffc062dc56 <cifs_reconnect+0x256>: mov %rbp,%rdi
> 0xffffffffc062dc59 <cifs_reconnect+0x259>: callq 0xffffffff8d4e6b00 <__list_del_entry_valid>
> 0xffffffffc062dc5e <cifs_reconnect+0x25e>: test %al,%al
>
>
> fs/cifs/connect.c
> 566             mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> 567             if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
> 568                     mid_entry->mid_state = MID_RETRY_NEEDED;
> 569             list_move(&mid_entry->qhead, &retry_list);
> 570     }
> 571     spin_unlock(&GlobalMid_Lock);
> 572     mutex_unlock(&server->srv_mutex);
> 573
> 574     cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
> 575-->  list_for_each_safe(tmp, tmp2, &retry_list) {
> 576             mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> 577             list_del_init(&mid_entry->qhead);
> 578             mid_entry->callback(mid_entry);
> 579     }
> 580
> 581     if (cifs_rdma_enabled(server)) {
> 582             mutex_lock(&server->srv_mutex);
> 583             smbd_destroy(server);
> 584             mutex_unlock(&server->srv_mutex);
>